[1] WAN Zhongying, WANG Mingwen, ZUO Jiali, et al. The New Boundary Sample Selection Method and Its Application in the Text Classification[J]. Journal of Jiangxi Normal University: Natural Science Edition, 2019, (01): 76-83. [doi:10.16357/j.cnki.issn1000-5862.2019.01.13]

A New Sample Selection Algorithm and Its Application in Text Classification

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2019, No. 01
Pages:
76-83
Section:
Information Science and Technology
Publication date:
2019-02-10

Article Info

Title:
The New Boundary Sample Selection Method and Its Application in the Text Classification
Article ID:
1000-5862(2019)01-0076-08
Author(s):
WAN Zhongying, WANG Mingwen, ZUO Jiali, LIU Changhong
School of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Keywords:
boundary samples; sample selection; text classification; SVM; KNN
CLC number:
TP 391
DOI:
10.16357/j.cnki.issn1000-5862.2019.01.13
Document code:
A
Abstract:
How to select a subset of important samples from a large training set without degrading classification performance is an important problem in pattern classification. To address this problem, a new sample selection algorithm is proposed and applied to text classification. Experiments are carried out on the standard Reuters-21578 collection, the Fudan document collection, and the 20 Newsgroups collection. The results show that the proposed method effectively selects boundary samples, and that SVM and KNN classifiers achieve good classification results with it, especially on imbalanced document collections.

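Memo

Received: 2018-05-22
Funding: Supported by the National Natural Science Foundation of China (61462045, 61462043, 61163006) and the Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ150354).
About the author: WAN Zhongying (b. 1977), female, from Nanchang, Jiangxi; associate professor; main research interests: information retrieval and text mining. E-mail: libby2005@126.com
Last Update: 2019-02-10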