[1] CHENG Yan, MIAO Yong-chun. The Study on Clustering-Based Outlier Detection Algorithm for High-Dimensional Data Stream[J]. Journal of Jiangxi Normal University (Natural Science Edition), 2014, (05): 449-453.

The Study on Clustering-Based Outlier Detection Algorithm for High-Dimensional Data Stream

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2014, No. 05
Pages:
449-453
Column:
Publication date:
2014-10-31

Article Info

Title:
The Study on Clustering-Based Outlier Detection Algorithm for High-Dimensional Data Stream
Author(s):
CHENG Yan; MIAO Yong-chun
College of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
Keywords:
high-dimensional data stream; sliding window; attribute reduction; K-means; micro-clustering; information entropy; outlier detection
CLC number:
TP311; TP391
Document code:
A
Abstract:
Existing clustering-based outlier detection algorithms suffer from low efficiency and accuracy on high-dimensional data streams. To address this problem, a clustering-based outlier detection algorithm for high-dimensional data streams (CODHD-Stream) is proposed. The algorithm first partitions the data stream with a sliding window, then reduces the dimensionality of the high-dimensional data with an attribute reduction algorithm. Finally, a K-means clustering algorithm with a distance-based information entropy filtering mechanism divides the data set into micro-clusters and detects the outliers they contain. Experimental results show that the proposed algorithm effectively improves the efficiency and accuracy of outlier detection in high-dimensional data streams.
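The abstract names the stages of the pipeline but gives no implementation details, so the following Python sketch is only an illustration of the general shape (window, reduce, cluster, flag), not the CODHD-Stream algorithm itself. Every function name and parameter here is hypothetical. Three simplifications are assumed: highest-variance column selection stands in for the paper's attribute reduction algorithm, plain Lloyd's K-means with deterministic initialization stands in for its clustering step, and a mean-plus-z-standard-deviations distance cutoff stands in for the distance-based information entropy filter.

```python
import math
from collections import deque
from statistics import mean, pstdev, pvariance


def sliding_windows(stream, size, step):
    """Yield successive windows of `size` points, advancing `step` points each time."""
    buf = deque(maxlen=size)
    for i, point in enumerate(stream, start=1):
        buf.append(point)
        if i >= size and (i - size) % step == 0:
            yield list(buf)


def reduce_attributes(window, keep):
    """Keep the `keep` highest-variance columns.

    Assumption: a simple stand-in for the paper's attribute reduction step.
    """
    dims = range(len(window[0]))
    ranked = sorted(dims, key=lambda d: pvariance([row[d] for row in window]),
                    reverse=True)
    cols = sorted(ranked[:keep])
    return [[row[d] for d in cols] for row in window]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; initial centroids are evenly spaced input points.

    Assumes the window holds at least `k` points.
    """
    step = max(1, len(points) // k)
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [mean(col) for col in zip(*members)]
    return centroids, labels


def detect_outliers(window, k=2, keep=2, z=2.0):
    """Return indices (within the window) of points far from their centroid.

    Assumption: the cutoff mean + z * stdev of centroid distances replaces
    the paper's distance-based information entropy filter.
    """
    reduced = reduce_attributes(window, keep)
    centroids, labels = kmeans(reduced, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(reduced, labels)]
    cutoff = mean(dists) + z * pstdev(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]
```

A stream would be processed by calling `detect_outliers` on each window produced by `sliding_windows`. The values of `k`, `keep`, and `z` are illustrative defaults and would need tuning for real data.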

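Memo:
National Social Science Fund of China, Education Youth Project "Research on Methods for Swarm-Intelligence-Based Construction of Educational Virtual Communities" (CCA110109); National Natural Science Foundation of China, Regional Fund (61262080)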
Last Update: 1900-01-01