[1] ZENG Dao-jian, LAI Si-wei, ZHANG Yuan-zhe, et al. Open Entity Attribute-Value Extraction from Unstructured Text [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2013, (03): 279-283.

Open Entity Attribute-Value Extraction from Unstructured Text
Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2013, No. 03
Pages:
279-283
Column:
Publication date:
2013-05-01

Article Info

Title:
Open Entity Attribute-Value Extraction from Unstructured Text
Author(s):
ZENG Dao-jian; LAI Si-wei; ZHANG Yuan-zhe; LIU Kang; ZHAO Jun
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Keywords:
attribute-value extraction; unstructured text; Infobox; Baidu encyclopedia
CLC number:
TP391
Document code:
A
Abstract:
This paper extracts the attributes and attribute values of a given entity from unstructured text, treating attribute extraction as a sequence labeling problem. To avoid manually annotating a training corpus, the structured content already available in Baidu encyclopedia Infoboxes is used to back-label the unstructured text and generate training data automatically. Given this corpus, multidimensional features suited to the characteristics of Chinese are selected to train a sequence labeling model, and contextual information is used to further improve system performance; the trained model then extracts attributes and attribute values from unstructured text. Experimental results show that the method is effective across multiple categories of Baidu encyclopedia and can be extended directly to attribute extraction from similar unstructured text.
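The back-labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exact-string matching, the character-level BIO tag scheme, the function name `back_label`, and the sample infobox and sentence are all assumptions made here for illustration. The idea is only that each infobox value found in the text marks a span that can serve as an automatically generated training label.

```python
from typing import Dict, List, Tuple


def back_label(sentence: str, infobox: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each character of `sentence` with a BIO label derived from infobox values.

    Hypothetical helper: exact substring matching is an assumption, not the
    matching rule used in the paper.
    """
    tags = ["O"] * len(sentence)
    # Try longer values first so a short value cannot claim part of a longer one.
    for attr, value in sorted(infobox.items(), key=lambda kv: -len(kv[1])):
        if not value:
            continue
        start = sentence.find(value)
        while start != -1:
            end = start + len(value)
            # Only label spans that are still entirely unlabeled.
            if all(t == "O" for t in tags[start:end]):
                tags[start] = f"B-{attr}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{attr}"
            start = sentence.find(value, end)
    return list(zip(sentence, tags))


# Hypothetical infobox entries and sentence, for illustration only.
infobox = {"birthplace": "Beijing", "occupation": "writer"}
sentence = "Born in Beijing, she became a writer."
labeled = back_label(sentence, infobox)
```

Character-level tags are used in this sketch because Chinese text has no whitespace word boundaries; the resulting (character, tag) pairs are the kind of input a sequence labeling model could be trained on.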

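Memo:
Supported by the National Natural Science Foundation of China (61070106); the National Basic Research Program of China ("973" Program) (2012CB316300); and a fund of the Tsinghua National Laboratory for Information Science and Technology (TNList).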