[1] YANG Yuqing, WU Shuixiu*, ZUO Jiali. The Modified Chinese Word Embeddings Model[J]. Journal of Jiangxi Normal University (Natural Science Edition), 2021, (02): 131-136. [doi:10.16357/j.cnki.issn1000-5862.2021.02.04]

The Modified Chinese Word Embeddings Model

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
Issue:
2021, No. 02
Pages:
131-136
Section:
Information Science and Technology
Publication date:
2021-04-10

Article Info

Title:
The Modified Chinese Word Embeddings Model
Article ID:
1000-5862(2021)02-0131-06
Author(s):
YANG Yuqing (杨雨晴), WU Shuixiu* (吴水秀), ZUO Jiali (左家莉)
College of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Keywords:
word embedding; language model; natural language processing
CLC number:
TP 311
DOI:
10.16357/j.cnki.issn1000-5862.2021.02.04
Document code:
A
Abstract:
Current Chinese word embedding models do not adequately capture the semantic information carried by the glyph structure of Chinese characters. To address this, an improved Chinese word embedding model is proposed. The model learns jointly at the granularities of words, characters, and components (Wubi codes), and constructs each word embedding by combining its components, characters, and the word itself. This allows the model to learn the semantic information contained in the glyph structure of Chinese characters and improves the quality of Chinese word embeddings to a certain extent.
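
The idea of building a word embedding from words, characters, and Wubi components can be illustrated with a minimal sketch. The abstract does not say how the three granularities are combined, so the plain averaging below is an assumption made for illustration, not the authors' method. The names `WUBI`, `make_table`, `mean`, and `compose` are hypothetical, and the Wubi codes in the toy table are illustrative entries that should be checked against a real Wubi code table.

```python
import random

DIM = 8  # embedding size, chosen only for this illustration

# Toy character -> Wubi code table (assumption: illustrative entries only;
# verify against a real Wubi code table before any actual use).
WUBI = {"中": "khk", "国": "lgyi"}


def make_table(keys, dim=DIM, seed=0):
    """Create a randomly initialised embedding table for the given keys."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for k in keys}


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def compose(word, word_emb, char_emb, comp_emb):
    """Combine word-, character-, and component-level vectors.

    Assumption: the three granularities are simply averaged. The paper
    may use a different combination scheme.
    """
    chars = list(word)
    # Each letter of a character's Wubi code is treated as one component.
    comps = [letter for ch in chars for letter in WUBI[ch]]
    parts = [
        word_emb[word],
        mean([char_emb[ch] for ch in chars]),
        mean([comp_emb[c] for c in comps]),
    ]
    return mean(parts)


word_emb = make_table(["中国"], seed=1)
char_emb = make_table(["中", "国"], seed=2)
comp_emb = make_table(sorted(set("khklgyi")), seed=3)
vec = compose("中国", word_emb, char_emb, comp_emb)
print(len(vec))
```

In a trained model these tables would be learned jointly from a corpus rather than drawn at random; the sketch only shows how sub-word information can enter the final word vector.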

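Memo

Received: 2020-09-13
Funding: Supported by the National Natural Science Foundation of China (60866018).
Corresponding author: WU Shuixiu (born 1975), female, from Nanfeng, Jiangxi Province; associate professor; research interests include information retrieval, Chinese information processing, and machine learning. E-mail: wushuixiu@jxnu.edu.cn
Last Update: 2021-04-10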