Article: 2012, Vol. 30, Issue 6: 968-973
Cite this article:
Zhang Zhenhai, Li Shining, Li Zhigang. A Multi-Label Classification Algorithm Using Correlation Information Entropy[J]. Northwestern Polytechnical University

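A Multi-Label Classification Algorithm Using Correlation Information Entropy
Zhang Zhenhai, Li Shining, Li Zhigang
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China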
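Abstract:
In multi-label classification, the correlation among labels is an important factor. To exploit it, this paper proposes a multi-label classification algorithm based on correlation information entropy, which is used to measure how strongly labels are correlated. The algorithm first finds the set of k-label combinations with the largest correlation information entropy values, and then trains an LP (Label Powerset) classifier on each label combination. Experimental results on seven datasets show that the proposed algorithm outperforms the compared classification algorithms on most of the datasets, while the compared algorithms outperform the proposed algorithm on only one dataset.
Keywords:    multi-label classification    data processing    correlation information entropy    correlation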
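The abstract does not spell out how correlation information entropy is computed. As a hedged illustration only, one eigenvalue-based formulation from the nonlinear-correlation literature is sketched below; treating it as the definition used in this paper is an assumption. Here $R$ is a $k \times k$ correlation matrix of the $k$ labels in a combination and $\lambda_i$ are its eigenvalues (both symbols are introduced for this sketch).

```latex
% Assumed form: eigenvalue-based correlation information entropy of k labels.
% R is a k x k correlation matrix with unit diagonal, so \sum_i \lambda_i = k.
H_R = 1 + \sum_{i=1}^{k} \frac{\lambda_i}{k} \log_k \frac{\lambda_i}{k}
% Uncorrelated labels: R = I, every \lambda_i = 1, so H_R = 1 + \log_k(1/k) = 0.
% Fully correlated labels: one \lambda_i = k, the rest 0, so H_R = 1 (with 0 \log 0 := 0).
```

Under this form, a larger value indicates a more strongly correlated label combination, which matches the selection of the combinations with the largest entropy values described in the abstract.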
A Multi-Label Classification Algorithm Using Correlation Information Entropy
Zhang Zhenhai, Li Shining, Li Zhigang
Department of Computer Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China
Abstract:
The LP (label powerset) classifier may place uncorrelated labels in the same label set and train that set as a single label. To address this problem, multi-label classification should exploit the correlations among labels. We therefore propose a multi-label classification algorithm using correlation information entropy (MLCACIE), in which correlation information entropy measures the strength of label correlation. Its core consists of two steps: (1) given the number of classifiers (CN) to be trained, we find the CN subsets of k labels with the strongest correlation; (2) we train one LP classifier on each of these CN k-label subsets. Finally, we run experiments on seven datasets, with the decision tree as the base classifier, and compare MLCACIE with other classification algorithms. The experimental results, given in Table 3, and their analysis show preliminarily that: (1) MLCACIE outperforms the other classification algorithms on most datasets because it makes use of the correlations among labels, while the other algorithms outperform MLCACIE on only one of the seven datasets; (2) using the correlations among labels can enhance multi-label classification performance.
Key words:    algorithms    classification (of information)    correlation theory    data processing    decision trees    entropy    information theory    labels    correlation information entropy    multi-label classification
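The two steps in the abstract (pick the CN most strongly correlated k-label subsets, then train one LP classifier per subset) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the squared Pearson correlation between binary label columns stands in for whatever correlation matrix the paper uses, and the score is an assumed eigenvalue-based entropy that equals 0 for uncorrelated labels and 1 for fully correlated ones. Every label column is assumed to be non-constant.

```python
from itertools import combinations

import numpy as np


def correlation_entropy(Y_sub):
    """Assumed eigenvalue-based correlation entropy of the label columns in Y_sub.

    Computes 1 + sum_i (l_i / k) * log_k(l_i / k) over the eigenvalues l_i of a
    k x k correlation matrix; larger values mean stronger correlation.
    """
    k = Y_sub.shape[1]
    # Assumption: squared Pearson correlation as the correlation matrix.
    # Squaring element-wise keeps the matrix positive semidefinite.
    R = np.corrcoef(Y_sub, rowvar=False) ** 2
    eigvals = np.clip(np.linalg.eigvalsh(R), 0.0, None)  # guard against round-off
    p = eigvals / k
    p = p[p > 1e-12]  # treat 0 * log(0) as 0
    return 1.0 + float(np.sum(p * np.log(p)) / np.log(k))


def top_labelsets(Y, k, cn):
    """Return the cn k-label subsets (as index tuples) with the highest score."""
    scored = [
        (correlation_entropy(Y[:, list(subset)]), subset)
        for subset in combinations(range(Y.shape[1]), k)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [subset for _, subset in scored[:cn]]


def to_powerset_classes(Y_sub):
    """Label Powerset transform: map each distinct label combination to a class id."""
    _, class_ids = np.unique(Y_sub, axis=0, return_inverse=True)
    return np.asarray(class_ids).reshape(-1)
```

For each subset returned by `top_labelsets`, the class ids from `to_powerset_classes` could then be used as targets for any single-label learner, such as the decision tree named in the abstract. Note that scoring every k-subset is exhaustive, so the cost grows combinatorially with the number of labels.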
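Received: 2011-12-11
Funding: Supported by the National Science and Technology Major Project of China (2012ZX03005007)
About the author: Zhang Zhenhai (born 1984), Ph.D. candidate at Northwestern Polytechnical University, whose research focuses on wireless sensor networks and data processing technology.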
