Article: 2020, Vol. 38, Issue 2: 434-441
Cite this article:
XU Dong, WANG Xin, MENG Yulong, ZHANG Ziying. A Discretization Algorithm Based on Forest Optimization Network and Variable Precision Rough Set[J]. Journal of Northwestern Polytechnical University, 2020, 38(2): 434-441

A Discretization Algorithm Based on Forest Optimization Network and Variable Precision Rough Set
XU Dong, WANG Xin, MENG Yulong, ZHANG Ziying
College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
Abstract:
Discretizing multidimensional attributes can improve both the training speed and the accuracy of machine learning algorithms. Existing discretization algorithms perform relatively poorly, and most of them discretize one attribute at a time, ignoring potential associations between attributes. To address this, we propose a discretization algorithm based on forest optimization and rough sets (FORDA). Targeting the discretization of multidimensional continuous attributes, the algorithm designs a fitness function according to variable precision rough set theory, then constructs a forest optimization network and iteratively searches for the optimal subset of breakpoints. Experimental results on UCI datasets show that, compared with current mainstream discretization algorithms, the proposed algorithm avoids local optima, significantly improves the classification accuracy of an SVM classifier, achieves better discretization performance, and has a degree of generality, which verifies its effectiveness.
Key words:    discretization    forest optimization network    multiple dimensions    variable precision rough set    breakpoint subset    nonlinear systems    SVM    algorithms
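The abstract describes scoring a candidate breakpoint subset with a function derived from variable precision rough set theory. The paper's actual fitness function and forest search are not reproduced on this page, so the following is only a minimal sketch of the general idea: it applies one candidate breakpoint subset to several continuous attributes at once and scores it by a beta-dependency measure in the style of variable precision rough sets. The function names `discretize` and `beta_dependency`, the threshold `beta`, and the toy data are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def discretize(rows, breakpoints):
    """Replace each continuous value by the index of its interval.

    breakpoints[j] is the sorted list of cut points for attribute j
    (hypothetical representation of a candidate breakpoint subset).
    """
    return [
        tuple(bisect_right(breakpoints[j], value) for j, value in enumerate(row))
        for row in rows
    ]


def beta_dependency(codes, labels, beta=0.8):
    """Share of samples in blocks whose majority class ratio is >= beta.

    Samples with identical discretized codes form one block; this is an
    illustrative score, not the fitness function used in the paper.
    """
    blocks = defaultdict(list)
    for code, label in zip(codes, labels):
        blocks[code].append(label)
    covered = 0
    for block_labels in blocks.values():
        majority = Counter(block_labels).most_common(1)[0][1]
        if majority / len(block_labels) >= beta:
            covered += len(block_labels)
    return covered / len(labels)


# Toy data: two continuous attributes and a class label per sample.
rows = [(0.2, 5.1), (0.4, 4.8), (1.6, 5.0), (1.9, 7.2), (2.1, 7.5), (0.3, 7.9)]
labels = ["a", "a", "b", "b", "b", "a"]

# One cut point per attribute, evaluated jointly.
breakpoints = [[1.0], [6.0]]
codes = discretize(rows, breakpoints)
score = beta_dependency(codes, labels, beta=0.8)
```

A search procedure such as forest optimization would compare many candidate values of `breakpoints` by a score of this kind. In this toy data, cutting only the second attribute (`[[], [6.0]]`) leaves both blocks mixed, which illustrates why evaluating attributes jointly can matter.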
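Received: 2019-04-10     Revised:
DOI: 10.1051/jnwpu/20203820434
Corresponding author: MENG Yulong (1976-), associate professor and master's supervisor at Harbin Engineering University; his research focuses on machine learning and trusted computing. E-mail: mengyulong@hrbeu.edu.cn
About the author: XU Dong (1969-), professor and master's supervisor at Harbin Engineering University; his research focuses on computer networks and information security.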
