[1] LU Yijun, TENG Shaohua*. The Generation of Minority Sample Data and Its Application in Abnormal Detection[J]. Journal of Jiangxi Normal University (Natural Science Edition), 2020, (04): 385-393. [doi:10.16357/j.cnki.issn1000-5862.2020.04.10]

The Generation of Minority Sample Data and Its Application in Abnormal Detection

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2020, No. 04
Pages:
385-393
Section:
Information Science and Technology
Publication date:
2020-08-10

Article Info

Title:
The Generation of Minority Sample Data and Its Application in Abnormal Detection
Article ID:
1000-5862(2020)04-0385-09
Author(s):
LU Yijun 1,2; TENG Shaohua 1,*
1. College of Computer, Guangdong University of Technology, Guangzhou, Guangdong 510006, China; 2. Guangdong Information Technology Security Evaluation Center, Guangzhou, Guangdong 510095, China
Keywords:
convolutional neural networks; generative adversarial networks; sample generation; host-based intrusion detection; neural network
CLC number:
TP 183
DOI:
10.16357/j.cnki.issn1000-5862.2020.04.10
Document code:
A
Abstract:
In applications involving imbalanced data, the scarcity of negative samples (abnormal data) is often a major cause of low detection accuracy. In host-based anomaly detection, for example, too few abnormal samples lead to poor detection results. To address this problem, this paper improves the deep convolutional generative adversarial network (DCGAN) so that it converges more easily and generates samples more readily. The improved DCGAN is then trained on the abnormal samples of the intrusion detection evaluation data set ADFA-LD to construct additional abnormal samples and make the data set more balanced. Finally, to verify the quality of the generated samples, several anomaly detection methods are applied to the balanced data set. The results show that all newly added abnormal samples are detected and that no previously detected abnormal sample is missed, yielding a high detection rate and a low false positive rate. Comparative experiments indicate that the proposed minority-sample generation method can effectively address some data imbalance problems in practice.

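Memo:
Received: 2020-01-23
Funding: Supported by the National Natural Science Foundation of China (61702110, 61772141, 61972102), the Key-Area Research and Development Program of Guangdong Province (2020B010166006), projects of the Department of Education of Guangdong Province (Yue Jiao Gao Han [2018] No. 179, Yue Jiao Gao Han [2018] No. 1), and the Guangzhou Science and Technology Program (201903010107).
Corresponding author: TENG Shaohua (1962-), male, born in Nanchang, Jiangxi; professor, Ph.D.; research interests include big data, data mining, digital audio analysis and processing, and network security. E-mail: shteng@gdut.edu.cn
Last Update: 2020-08-10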