[1] MIAO Yong-chun, CHENG Yan. The Outlier Detection Method and Its Improvement in the Era of Big Data [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2014, (05): 454-458.

The Outlier Detection Method and Its Improvement in the Era of Big Data

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2014, No. 05
Pages:
454-458
Column:
Publication date:
2014-10-31

Article Info

Title:
The Outlier Detection Method and Its Improvement in the Era of Big Data
Author(s):
MIAO Yong-chun (苗永春); CHENG Yan (程艳)
Affiliation:
College of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Keywords:
big data; outlier detection method; improvement strategies
CLC number:
TP391; TP311
Document code:
A
Abstract:
This paper analyzes and compares the representative outlier detection methods currently in use and summarizes the characteristics, strengths, and weaknesses of each. In view of the large volume and high dimensionality of big data, it then analyzes improvement strategies for outlier detection methods, using the T-ODCD and AROD algorithms as examples to illustrate these strategies further.


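Memo:
Supported by the National Social Science Fund of China, Youth Project in Education, "Research on Swarm-Intelligence-Based Construction Methods for Educational Virtual Communities" (CCA110109)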
Last Update: 1900-01-01