[1] PAN Min, WANG Ming-wen, WANG Xiao-qing, et al. A Research on the Text Incremental Clustering Based on Cluster Features [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2014, (01): 95-101.

A Research on the Text Incremental Clustering Based on Cluster Features

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
Issue:
2014, No. 01
Pages:
95-101
Section:
Publication date:
2014-02-28

Article Info

Title:
A Research on the Text Incremental Clustering Based on Cluster Features
Author(s):
PAN Min; WANG Ming-wen; WANG Xiao-qing; JIE An-quan
School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
Keywords:
incremental clustering; text clustering; central moment; cluster features
CLC number:
TP311
Document code:
A
Abstract:
This paper presents an incremental text clustering algorithm based on cluster features. First, the simple and effective k-means algorithm performs the initial clustering. Second, for each resulting cluster, the cluster center, mean, variance, number of documents, third central moment, and fourth central moment are retained as that cluster's features. Finally, when new documents arrive, they are clustered using the cluster features of the initial clusters. Experimental results on the 20newsgroups data set show that the algorithm has certain advantages over re-clustering the entire data set.
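
The three steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which quantity the mean, variance, and central moments describe, nor how they drive the assignment of new documents. The sketch assumes documents are already numeric vectors (for example TF-IDF), that the statistics are taken over each member's Euclidean distance to its cluster center, and that a new document is accepted by its nearest cluster when its distance is within `n_std` standard deviations of that cluster's mean distance. The function names, the farthest-first initialization, and the `n_std` threshold are all hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-first start (an assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance from each point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)

    def nearest(C):
        # Index of the closest center for every row of X.
        return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, nearest(centers)


def cluster_features(X, centers, labels):
    """Per-cluster summary: center, size, and moments of member-to-center distance."""
    feats = []
    for j, c in enumerate(centers):
        d = np.linalg.norm(X[labels == j] - c, axis=1)
        mu = float(d.mean()) if len(d) else 0.0
        dev = d - mu
        feats.append({
            "center": c,
            "n_docs": len(d),
            "mean": mu,
            "variance": float((dev ** 2).mean()) if len(d) else 0.0,
            "m3": float((dev ** 3).mean()) if len(d) else 0.0,  # 3rd central moment
            "m4": float((dev ** 4).mean()) if len(d) else 0.0,  # 4th central moment
        })
    return feats


def assign_new(doc, feats, n_std=3.0):
    """Assign a new vector using only stored features (hypothetical rule).

    Returns (cluster index, whether the document fits that cluster).
    """
    dists = [np.linalg.norm(doc - f["center"]) for f in feats]
    j = int(np.argmin(dists))
    limit = feats[j]["mean"] + n_std * np.sqrt(feats[j]["variance"])
    return j, bool(dists[j] <= limit)
```

Calling `kmeans` and `cluster_features` once on the initial collection, then `assign_new` for each arriving document, avoids touching the original documents again. The third and fourth central moments are stored here but not used by the simple threshold; how the paper uses them is not stated in the abstract.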

Memo:
National Natural Science Foundation of China (60963014); Natural Science Foundation of Jiangxi Province (20114BAB201037)
Last Update: 1900-01-01