[1] WANG Zhuo, NIE Bin*, LUO Jigen, et al. The Improvement Decision Tree Algorithm for Harmonic Mean Optimization on Selection Attributes[J]. Journal of Jiangxi Normal University: Natural Science Edition, 2018, (04): 384-388. [doi:10.16357/j.cnki.issn1000-5862.2018.04.11]

The Improvement Decision Tree Algorithm for Harmonic Mean Optimization on Selection Attributes

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2018, No. 04
Pages:
384-388
Publication date:
2018-08-20

Article Info

Title:
The Improvement Decision Tree Algorithm for Harmonic Mean Optimization on Selection Attributes
Article ID:
1000-5862(2018)04-0384-05
Author(s):
WANG Zhuo(1), NIE Bin(2)*, LUO Jigen(2), DU Jianqiang(2), CHEN Ai(1), ZHOU Li(2)
1. School of Software, Nanchang University, Nanchang, Jiangxi 330047, China; 2. School of Computer, Jiangxi University of Traditional Chinese Medicine, Nanchang, Jiangxi 330004, China
Keywords:
decision tree; information gain ratio; harmonic mean; traditional Chinese medicine information
CLC number:
TP 391
DOI:
10.16357/j.cnki.issn1000-5862.2018.04.11
Document code:
A
Abstract:
To address the bias of information gain and information gain ratio toward the number of values an attribute takes, an improved decision tree algorithm is proposed that uses the harmonic mean to optimize the selection of the splitting attribute. First, the information gain of each candidate splitting attribute is calculated, and the attributes whose information gain is above the average are identified. Then, for each of these attributes, the harmonic mean of its information gain ratio and its information gain is calculated, and the attribute with the largest harmonic mean is selected to establish the branch decision. Finally, the decision tree is built recursively. In experiments on four data sets of different sizes, information gain, information gain ratio, the GINI index, and the proposed method are each used as the splitting criterion, and accuracy is examined on the training set, on the test set, under ten repetitions of ten-fold cross-validation (or five repetitions of five-fold cross-validation), and as the average of these results. The results show that the proposed method achieves good accuracy with a short running time and has certain advantages.
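
The selection rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes discrete attributes stored as dictionaries, takes the harmonic mean as 2GR/(G+R) for information gain G and gain ratio R, and uses "greater than or equal to the average, within a small tolerance" so that the candidate set is never empty when gains tie. The function names (`gain_and_ratio`, `choose_attribute`, `build_tree`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_and_ratio(rows, labels, attr):
    """Return (information gain, gain ratio) for a discrete attribute."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups.values())
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers; 0 when both are 0."""
    return 2 * a * b / (a + b) if a + b > 0 else 0.0


def choose_attribute(rows, labels, attrs):
    """Pick the attribute with the largest harmonic mean of gain and gain ratio,
    restricted to attributes whose gain is at least the average gain."""
    scores = {a: gain_and_ratio(rows, labels, a) for a in attrs}
    mean_gain = sum(g for g, _ in scores.values()) / len(scores)
    # Assumption: ">=" with a tolerance, so ties never leave the set empty.
    candidates = [a for a in attrs if scores[a][0] >= mean_gain - 1e-12]
    return max(candidates, key=lambda a: harmonic_mean(*scores[a]))


def build_tree(rows, labels, attrs):
    """Recursively build a nested-dict tree; leaves are majority class labels."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(rows, labels, attrs)
    rest = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        tree[best][value] = build_tree(list(sub_rows), list(sub_labels), rest)
    return tree


# Hypothetical toy data: "id" is unique per row, "x" separates the classes
# with two values, and "noise" carries no information about the class.
rows = [
    {"id": 1, "x": "a", "noise": "p"},
    {"id": 2, "x": "a", "noise": "q"},
    {"id": 3, "x": "b", "noise": "p"},
    {"id": 4, "x": "b", "noise": "q"},
]
labels = ["yes", "yes", "no", "no"]
best = choose_attribute(rows, labels, ["id", "x", "noise"])
tree = build_tree(rows, labels, ["id", "x", "noise"])
```

On this toy data, "id" and "x" both reach the maximum information gain, but "id" has the lower gain ratio because it splits the data into four singleton branches, so the harmonic mean favors "x". This is the kind of many-valued attribute that information gain alone would not penalize.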



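Memo

Received: 2017-08-06
Funding: Supported by the National Natural Science Foundation of China (61562045, 61363042), the Major Project of the Natural Science Foundation of Jiangxi Province (20152AXCB20007), the Jiangxi Provincial University Science and Technology Landing Program (LD12038), a General Project of the Jiangxi Provincial Education Science "Twelfth Five-Year" Plan (15YB005), and the Natural Science Foundation of Jiangxi University of Traditional Chinese Medicine (2013ZR0068).
Corresponding author: NIE Bin (born 1972), male, native of Xiajiang, Jiangxi, associate professor; research interests: traditional Chinese medicine informatics, data mining, and artificial intelligence. E-mail: 864860723@qq.com
Last Update: 2018-08-20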