HE Silan, ZUO Jiali*, ZHU Hongkun, et al. The Study on the Neural Machine Translation of Ancient Chinese-Modern Chinese [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2023(5): 483-489. [doi:10.16357/j.cnki.issn1000-5862.2023.05.07]

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1000-5862/CN:36-1092/N]

Volume:
Issue:
2023, No. 05
Pages:
483-489
Section:
Information Science and Technology
Publication Date:
2023-09-25

Article Info

Title:
The Study on the Neural Machine Translation of Ancient Chinese-Modern Chinese
Article Number:
1000-5862(2023)05-0483-07
Author(s):
HE Silan, ZUO Jiali*, ZHU Hongkun, WANG Mingwen
(School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China)
Keywords:
ancient Chinese-modern Chinese neural machine translation; Seq2Seq model; Transformer
CLC Number:
TP 181
DOI:
10.16357/j.cnki.issn1000-5862.2023.05.07
Document Code:
A
Abstract:
China's classical texts survive in vast numbers and are treasures of Chinese culture, yet they are extremely difficult for modern readers to understand, and translating them all by hand is an impossible task. This paper therefore studies neural machine translation from ancient (classical) Chinese to modern Chinese: applying the Seq2Seq and Transformer models, it examines how the size of the training corpus affects translation performance. The results show that, at the current corpus scale, whether or not the text is word-segmented greatly affects the performance of the Seq2Seq model. In addition, when the training and test corpora differ in genre, model performance also suffers.
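
The finding that word segmentation strongly affects the Seq2Seq model comes down to a preprocessing choice: Chinese is written without spaces, so each sentence can be fed to the model either as a sequence of characters or as a sequence of segmenter-produced words. Below is a minimal Python sketch of the two options; the sample sentence pair, the choice of the jieba segmenter, and the vocabulary helper are illustrative assumptions, not details taken from the paper.

    # Character-level vs. word-segmented tokenization for a parallel
    # ancient-modern Chinese sentence pair. jieba is one widely used
    # segmenter; the paper does not name its toolchain, so this choice
    # is an assumption.
    import jieba

    src = "学而时习之，不亦说乎"                  # classical Chinese source
    tgt = "学了又时常温习和练习，不也是很愉快吗"  # modern Chinese target

    # Option 1: character-level tokens (no segmentation).
    src_chars, tgt_chars = list(src), list(tgt)

    # Option 2: word-level tokens produced by the segmenter.
    src_words, tgt_words = jieba.lcut(src), jieba.lcut(tgt)

    def build_vocab(streams):
        """Map each distinct token to an integer id; ids 0-3 are specials."""
        vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2, "<unk>": 3}
        for stream in streams:
            for tok in stream:
                vocab.setdefault(tok, len(vocab))
        return vocab

    char_vocab = build_vocab([src_chars, tgt_chars])
    word_vocab = build_vocab([src_words, tgt_words])
    # Sequence lengths and vocabulary sizes differ between the two schemes.
    print(len(src_chars), len(src_words))
    print(len(char_vocab), len(word_vocab))

On a full corpus, segmentation trades shorter sequences for a larger and sparser vocabulary, which is one plausible mechanism for the performance gap reported at a limited corpus scale.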
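
Both systems studied are standard encoder-decoder architectures. Purely as a reference sketch, a minimal Transformer translation model built on PyTorch's nn.Transformer is shown below; the hyperparameters, vocabulary sizes, and random toy batch are assumptions rather than the paper's configuration, and positional encodings are omitted for brevity (a real model must add them, since nn.Transformer does not).

    # Minimal encoder-decoder Transformer for ancient-to-modern Chinese
    # translation, sketched with PyTorch's nn.Transformer. Hyperparameters
    # and the toy batch are assumptions, not the paper's setup.
    import torch
    import torch.nn as nn

    class TranslationModel(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, d_model=256):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, d_model)
            self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
            self.transformer = nn.Transformer(
                d_model=d_model, nhead=4,
                num_encoder_layers=3, num_decoder_layers=3,
                batch_first=True)
            self.out = nn.Linear(d_model, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            # Causal mask keeps the decoder from attending to future tokens.
            tgt_mask = self.transformer.generate_square_subsequent_mask(
                tgt_ids.size(1))
            h = self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids),
                                 tgt_mask=tgt_mask)
            return self.out(h)  # logits of shape (batch, tgt_len, tgt_vocab)

    # Toy teacher-forced training step on random token ids.
    model = TranslationModel(src_vocab=5000, tgt_vocab=8000)
    src = torch.randint(0, 5000, (2, 7))   # 2 source sentences, length 7
    tgt = torch.randint(0, 8000, (2, 9))   # 2 target sentences, length 9
    logits = model(src, tgt[:, :-1])       # decoder input: target shifted right
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 8000),
                                 tgt[:, 1:].reshape(-1))
    loss.backward()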

Memo:
Received: 2022-10-19
Funding: Supported by the National Natural Science Foundation of China (61866018, 62266023).
Corresponding author: ZUO Jiali (b. 1982), female, a native of Yichun, Jiangxi; associate professor, Ph.D.; her research focuses on natural language processing. E-mail: zjl@jxnu.edu.cn
Last Update: 2023-09-25