[1] WAN Hanyong, ZUO Jiali, WAN Jianyi, et al. The KNN Text Classification Based on Sample Importance Principals [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2015, (03): 297-303.

The KNN Text Classification Based on Sample Importance Principals

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2015, No. 03
Pages:
297-303
Column:
Publication date:
2015-05-31

Article Info

Title:
The KNN Text Classification Based on Sample Importance Principals
Author(s):
WAN Hanyong; ZUO Jiali; WAN Jianyi; WANG Mingwen
College of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Keywords:
text classification; KNN; sample importance principle; SI-KNN
Classification number (CLC):
TP 391
Document code:
A
Abstract:
KNN is one of the top ten data mining algorithms and performs well in text classification. The traditional KNN method gives every sample the same weight and so ignores the fact that different samples contribute differently to classification. To address this problem, a sample importance principle is proposed and a KNN classifier is constructed on top of it. A random walk algorithm is used to identify the samples near class boundaries and to compute a boundary value for each sample; an importance score for each sample is then derived from its boundary value. Combining sample importance with the KNN method yields a new classification model, SI-KNN. Experiments on Chinese and English text corpora show that SI-KNN achieves some improvement over the traditional KNN method.
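
The pipeline summarized in the abstract (random walk, boundary value, importance score, weighted KNN vote) can be illustrated with a minimal sketch. This record does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the cosine-similarity neighbor graph, the fixed number of walk steps, the definition of the boundary value as the probability that a short walk ends in a different class, the choice importance = 1 - boundary value, and the names `boundary_scores` and `si_knn_predict`. The paper may define or weight these quantities differently.

```python
import numpy as np

def cosine_sim(A, B):
    # Pairwise cosine similarity; assumes no all-zero rows.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def boundary_scores(X, y, k=3, steps=2):
    """Assumed boundary value: probability that a `steps`-step random walk
    on the k-nearest-neighbor graph ends at a sample of a different class."""
    n = len(X)
    S = cosine_sim(X, X)
    np.fill_diagonal(S, -np.inf)          # exclude self-loops
    P = np.zeros((n, n))                  # row-stochastic transition matrix
    for i in range(n):
        nbrs = np.argsort(-S[i])[:k]
        w = np.clip(S[i, nbrs], 1e-12, None)
        P[i, nbrs] = w / w.sum()
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)   # one-hot labels
    walk = np.linalg.matrix_power(P, steps) @ Y          # class mass after the walk
    own = walk[np.arange(n), np.searchsorted(classes, y)]
    return 1.0 - own

def si_knn_predict(X_train, y_train, X_test, k=3, steps=2):
    """KNN vote in which each neighbor's vote is scaled by its similarity
    to the query and by its (assumed) importance score."""
    importance = 1.0 - boundary_scores(X_train, y_train, k, steps)
    classes = np.unique(y_train)
    S = cosine_sim(X_test, X_train)
    preds = []
    for s in S:
        nbrs = np.argsort(-s)[:k]
        votes = np.zeros(len(classes))
        for j in nbrs:
            votes[np.searchsorted(classes, y_train[j])] += s[j] * importance[j]
        preds.append(classes[np.argmax(votes)])
    return np.array(preds)
```

With document vectors such as TF-IDF rows in `X_train`, calling `si_knn_predict(X_train, y_train, X_test, k=5)` returns one predicted label per test row. Setting every importance score to 1 reduces the vote to ordinary similarity-weighted KNN, which is the baseline the abstract compares against.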

Memo:
Supported by the National Natural Science Foundation of China (61272212, 61163006, 61203313, 61365002, 61462045)
Last Update: 1900-01-01