LI Song, WU Runxiu*, KANG Ping, et al. The ADP-Tri-Training: Tri-Training with Adaptive Editing and Probability Parameters[J]. Journal of Jiangxi Normal University (Natural Science Edition), 2023, (05): 490-496. [doi:10.16357/j.cnki.issn1000-5862.2023.05.08]

Tri-Training Algorithm Based on Adaptive Editing and Probability Parameters

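Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue: 2023, No. 05
Pages: 490-496
Section: Information Science and Technology
Publication date: 2023-09-25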

Article Info

Title: The ADP-Tri-Training: Tri-Training with Adaptive Editing and Probability Parameters
Article number: 1000-5862(2023)05-0490-07
Authors: LI Song, WU Runxiu*, KANG Ping, ZHAO Jia
(School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi 330099, China)
Keywords: semi-supervised learning; adaptive strategy; probability parameter; Tri-training
CLC number: TP 311
DOI: 10.16357/j.cnki.issn1000-5862.2023.05.08
Document code: A
Abstract:
Semi-supervised learning learns from a small amount of labeled data together with a large amount of unlabeled data. Tri-training is a disagreement-based semi-supervised classification algorithm. During pseudo-labeling, mislabeled samples introduce noise into the training set, which degrades the algorithm's classification performance. To reduce the impact of mislabeling, this paper proposes ADP-Tri-training, Tri-training with adaptive editing and probability parameters (ADPT). For newly labeled data that trigger the adaptive editing strategy, the algorithm identifies and removes noise with the nearest-neighbor-based RemoveOnly data editing technique; for labeled data that do not trigger the strategy, it identifies and removes noise with a probability-parameter method. To verify the classification performance of the algorithm, experiments were run on 9 UCI datasets using four evaluation metrics, and the results were compared with those of related algorithms. The experimental results show that the proposed algorithm has clear advantages over the compared algorithms in accuracy, precision, recall and F-measure.
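For background, the rule that the original Tri-training (Zhou and Li, 2005) uses to decide whether a classifier may absorb a new batch of pseudo-labeled samples can be sketched as follows. It is derived from Angluin and Laird's result that learning to error ε under a classification-noise rate η needs on the order of m = c / (ε²(1 - 2η)²) samples, and it requires the product of the estimated noise rate and the pseudo-labeled set size to shrink from one update to the next. This is the baseline behavior ADPT builds on, not the ADPT noise filter itself; the function and argument names are illustrative.

```python
import math


def tri_training_keep_count(e_i: float, e_prev: float,
                            l_prev: int, n_candidates: int) -> int:
    """How many newly pseudo-labeled samples classifier h_i may use this round.

    Sketch of the update rule of the original Tri-training:
      e_i          -- current error estimate of the other two classifiers h_j, h_k,
                      measured on labeled samples on which they agree
      e_prev       -- e'_i, the value of e_i at h_i's last update (0.5 initially)
      l_prev       -- l'_i, size of h_i's pseudo-labeled set at that update (0 initially)
      n_candidates -- |L_i|, unlabeled samples on which h_j and h_k now agree
    Returns 0 when h_i should not be retrained this round.
    """
    # The estimated noise rate must strictly decrease between updates.
    if not e_i < e_prev:
        return 0
    # First update: derive an initial l'_i from the noise bound.
    if l_prev == 0:
        l_prev = math.floor(e_i / (e_prev - e_i) + 1)
    # The pseudo-labeled set must grow, otherwise nothing is gained.
    if l_prev >= n_candidates:
        return 0
    # Condition e_i * |L_i| < e'_i * l'_i: keep every candidate.
    if e_i * n_candidates < e_prev * l_prev:
        return n_candidates
    # Otherwise subsample so that the condition holds while |L_i| still exceeds l'_i.
    if l_prev > e_i / (e_prev - e_i):
        return math.ceil(e_prev * l_prev / e_i - 1)
    return 0
```

The function would be called once per classifier and round; after a successful update, the caller stores the current e_i and the kept count as e'_i and l'_i for the next round.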
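The two noise filters described in the abstract can be pictured with the following sketch. It is not the ADPT implementation: the abstract does not define when the adaptive editing strategy is triggered or what the probability-parameter test computes, so here the trigger is a caller-supplied flag, and the probability branch is a hypothetical stand-in that keeps a pseudo-labeled sample only if its confidence score reaches a threshold `p_min`. The RemoveOnly branch assumes a Wilson-style k-nearest-neighbor rule: a sample whose pseudo-label disagrees with the majority label of its k nearest labeled neighbors is discarded, never relabeled. All function and parameter names are illustrative.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def knn_majority(x: Point, ref_X: Sequence[Point], ref_y: Sequence[int], k: int) -> int:
    """Majority label among the k reference points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(ref_X)), key=lambda i: math.dist(x, ref_X[i]))[:k]
    return Counter(ref_y[i] for i in nearest).most_common(1)[0][0]


def remove_only_edit(X, y, ref_X, ref_y, k=3):
    """Assumed RemoveOnly rule: drop samples whose pseudo-label disagrees
    with their k-NN majority label; suspect samples are never relabeled."""
    kept = [(xi, yi) for xi, yi in zip(X, y) if knn_majority(xi, ref_X, ref_y, k) == yi]
    return [xi for xi, _ in kept], [yi for _, yi in kept]


def probability_filter(X, y, conf, p_min=0.8):
    """HYPOTHETICAL stand-in for the probability-parameter test:
    keep a sample only if its confidence score is at least p_min."""
    kept = [(xi, yi) for xi, yi, ci in zip(X, y, conf) if ci >= p_min]
    return [xi for xi, _ in kept], [yi for _, yi in kept]


def filter_pseudo_labels(X, y, conf, ref_X, ref_y, triggered, k=3, p_min=0.8):
    """Route a batch of pseudo-labeled samples to one of the two filters.
    `triggered` is supplied by the caller; the real trigger condition is not given here."""
    if triggered:
        return remove_only_edit(X, y, ref_X, ref_y, k)
    return probability_filter(X, y, conf, p_min)
```

For example, with labeled points clustered around (0, 0) as class 0 and around (5, 5) as class 1, a pseudo-labeled point at (0.2, 0.1) carrying label 1 is dropped by the editing branch because all three of its nearest labeled neighbors are class 0.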
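The four evaluation metrics named in the abstract can be computed as in the following sketch. The abstract does not state how precision, recall and F-measure were averaged over classes, so macro-averaging is assumed here, with F-measure taken as the mean of the per-class F1 scores (another common convention takes the harmonic mean of macro precision and macro recall). The function name and result keys are illustrative.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F-measure (F1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f1s) / n,
    }
```

Macro-averaging weights every class equally, so on imbalanced UCI datasets it can differ noticeably from accuracy.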

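Memo

Received: 2023-05-11
Funding: Supported by the National Natural Science Foundation of China (52069014) and the Science and Technology Research Projects of the Jiangxi Provincial Department of Education (GJJ180940, GJJ201915).
Corresponding author: WU Runxiu (born 1971), female, a native of Nanfeng, Jiangxi; professor; research interests: big data analysis and swarm intelligence algorithms. E-mail: wrx@nit.edu.cn
Last update: 2023-09-25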