[1] NIE Bin, LI Huan, LUO Jigen, et al. The Study on Classification of C4.5 Algorithms with GINI Index[J]. Journal of Jiangxi Normal University: Natural Science Edition, 2019, (05): 469-472. [doi:10.16357/j.cnki.issn1000-5862.2019.05.05]

The Study on Classification of C4.5 Algorithms with GINI Index

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2019, No. 05
Pages:
469-472
Publication date:
2019-10-10

Article Info

Title:
The Study on Classification of C4.5 Algorithms with GINI Index
Article ID:
1000-5862(2019)05-0469-04
Author(s):
NIE Bin, LI Huan, LUO Jigen, DU Jianqiang, ZHOU Li, HUANG Qiang
School of Computer Science, Jiangxi University of Traditional Chinese Medicine, Nanchang, Jiangxi 330004, China
Keywords:
C4.5 algorithm; GINI index; decision tree; Chinese medicine information
CLC number:
TP 301.6
DOI:
10.16357/j.cnki.issn1000-5862.2019.05.05
Document code:
A
Abstract:
The information gain ratio tends to favor attributes with fewer distinct values and to produce unbalanced partitions, whereas the GINI index favors attributes with more distinct values and partitions that tend toward balance. Based on this, an improved C4.5 algorithm that incorporates the GINI index is proposed. The algorithm first computes the information gain ratio and the GINI index of each candidate attribute, then computes the ratio of the information gain ratio to the GINI index, and finally selects the attribute with the largest ratio as the split node, addressing this shortcoming of the C4.5 algorithm. Using the accuracy of 10 runs of 10-fold cross-validation and the running time as evaluation metrics, the improved algorithm is tested on five UCI data sets and compared with the ID3, C4.5, and CART algorithms. The results show that the C4.5 algorithm incorporating the GINI index reduces the influence of the number of attribute values on split-node selection and alleviates the imbalance of the partitions, improving classification accuracy and running efficiency. The algorithm is more stable, and is feasible and effective.
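The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formulas, so the sketch assumes the standard textbook definitions of gain ratio and of the weighted GINI index for discrete attributes. All function names are hypothetical, and the small `eps` term that guards against a zero GINI index (a perfectly pure split) is an assumption of this sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group class labels by the value each row takes on attribute `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return list(groups.values())


def gain_ratio(rows, labels, attr):
    """C4.5 gain ratio: information gain divided by split information."""
    n = len(labels)
    parts = partition(rows, labels, attr)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in parts)
    return gain / split_info if split_info > 0 else 0.0


def gini_index(rows, labels, attr):
    """Weighted Gini impurity of the partitions induced by `attr`."""
    n = len(labels)
    return sum(len(p) / n * gini(p) for p in partition(rows, labels, attr))


def fused_score(rows, labels, attr, eps=1e-12):
    """Ratio of gain ratio to GINI index; `eps` is an assumed guard
    against division by zero when every partition is pure."""
    return gain_ratio(rows, labels, attr) / (gini_index(rows, labels, attr) + eps)


def best_attribute(rows, labels, attrs):
    """Pick the candidate attribute with the largest fused score."""
    return max(attrs, key=lambda a: fused_score(rows, labels, a))


# Toy data (made up for illustration, not from the paper).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

In this toy example `outlook` separates the classes perfectly (gain ratio 1, GINI index 0), while `windy` carries no information (gain ratio 0, GINI index 0.5), so `outlook` is chosen. Continuous attributes, pruning, and the recursive tree construction are omitted.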

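Memo:
Received: 2019-01-03. Funding: supported by the National Natural Science Foundation of China (61562045), the Traditional Chinese Medicine Research Program (General) of the Health and Family Planning Commission of Jiangxi Province (2017A282), and the Key Research and Development Program of the Jiangxi Provincial Department of Science and Technology (20171ACE50021). Author biography: NIE Bin (1972-), male, from Xiajiang, Jiangxi; associate professor; research interests include data mining, machine learning, artificial intelligence, and Chinese medicine informatics. E-mail: ncunb@163.com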
Last Update: 2019-10-10