[1] QIAN Peng, HUANG Xuanjing. The Statistical Modeling and Macro-Analysis of Chinese Classical Poetry[J]. Journal of Jiangxi Normal University (Natural Science Edition), 2015, (02): 117-123.

The Statistical Modeling and Macro-Analysis of Chinese Classical Poetry

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2015, No. 02
Pages:
117-123
Column:
Publication date:
2015-04-10

Article Info

Title:
The Statistical Modeling and Macro-Analysis of Chinese Classical Poetry
Author(s):
QIAN Peng; HUANG Xuanjing
1. Department of Chinese Language and Literature, Fudan University, Shanghai 200433; 2. School of Computer Science, Fudan University, Shanghai 201203
Keywords:
Chinese classical poetry; statistical modeling; word segmentation; topic model
CLC number:
TP 391
Document code:
A
Abstract:
Applying natural language processing techniques to literary texts has become a popular topic in computational linguistics in recent years. This paper automatically discovers the vocabulary of Chinese classical poetry by combining pointwise mutual information (PMI) with a frequency threshold. Based on the resulting poetic lexicon, a heuristic forward maximum matching algorithm is used to segment the poems. A topic model (latent Dirichlet allocation) is then used to build a statistical model of the segmented poems, and on this basis an exploratory analysis of topic evolution and of the stylistic network of poet groups is carried out. Experiments on the Quan Tang Shi (Complete Tang Poems) corpus show that the topic model yields a statistical model of Chinese classical poetry with good explanatory power, corroborates existing research in literary history, and, beyond the traditional paradigm of close reading, offers a macro-level characterization, description, and interpretation of Chinese poetics from a new perspective.
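The first two steps of the abstract (PMI plus a frequency threshold for vocabulary discovery, then forward maximum matching for segmentation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `discover_words` and `forward_max_match`, the threshold values, the restriction to two-character candidates, and the toy corpus are all assumptions made for illustration. The abstract does not specify the heuristic used in the matching step, so plain forward maximum matching is shown instead.

```python
import math
from collections import Counter


def discover_words(lines, min_freq=2, min_pmi=1.0):
    """Collect two-character candidates that pass both thresholds.

    Assumption: `lines` are punctuation-free verse lines, and only
    two-character candidates are scored. Thresholds are illustrative.
    """
    char_counts = Counter()
    bigram_counts = Counter()
    for line in lines:
        char_counts.update(line)
        bigram_counts.update(line[i:i + 2] for i in range(len(line) - 1))

    total_chars = sum(char_counts.values())
    total_bigrams = sum(bigram_counts.values())

    lexicon = set()
    for bigram, count in bigram_counts.items():
        if count < min_freq:  # frequency threshold
            continue
        p_xy = count / total_bigrams
        p_x = char_counts[bigram[0]] / total_chars
        p_y = char_counts[bigram[1]] / total_chars
        pmi = math.log2(p_xy / (p_x * p_y))  # pointwise mutual information
        if pmi >= min_pmi:
            lexicon.add(bigram)
    return lexicon


def forward_max_match(line, lexicon, max_len=2):
    """Segment left to right, always taking the longest lexicon entry.

    A single character is emitted when no longer entry matches.
    """
    tokens = []
    i = 0
    while i < len(line):
        for size in range(min(max_len, len(line) - i), 0, -1):
            piece = line[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy corpus (an assumption, far smaller than any real poetry corpus).
corpus = ["明月松间照", "明月出天山", "举头望明月"]
lexicon = discover_words(corpus)
segmented = [forward_max_match(line, lexicon) for line in corpus]
```

The segmented lines could then be passed as bag-of-words documents to any LDA implementation for the topic modeling step, which is left out here to keep the sketch dependency-free.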

Memo:
National Natural Science Foundation of China (61472088)
Last Update: 1900-01-01