Title
Attribute Extraction by Combing Feature Ranking and Sequence Labeling
Abstract
The characteristics of the Chinese language make knowledge extraction from Chinese text documents challenging. This paper proposes an attribute extraction method based on feature ranking and sequence labeling. First, we build a training corpus by annotating Wikipedia texts with attribute information extracted from Wikipedia infoboxes. To improve the quality of the training corpus, trigger keywords are filtered by information entropy. Attribute extraction is then treated as a sequence labeling problem that exploits multidimensional features such as part of speech and word context, and a conditional random field model is trained on the corpus to extract attributes from unstructured text. Experimental results show that keyword filtering effectively improves the quality of the training corpus and hence the performance of attribute extraction. Compared with rule-based attribute extraction methods, our method can be extended to other fields and therefore offers better portability and extensibility.
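The entropy-based keyword filtering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the entropy is computed, so this sketch assumes it is the Shannon entropy of each candidate keyword's distribution over attribute labels, and that low-entropy keywords are kept as triggers. The function names, the `(keyword, attribute)` input format, and the `max_entropy` threshold are all hypothetical, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def keyword_entropy(pairs):
    """Return the Shannon entropy (in bits) of each keyword's attribute distribution.

    pairs: iterable of (keyword, attribute) co-occurrences (assumed input format).
    """
    counts = defaultdict(Counter)
    for keyword, attribute in pairs:
        counts[keyword][attribute] += 1

    entropies = {}
    for keyword, attr_counts in counts.items():
        total = sum(attr_counts.values())
        entropies[keyword] = -sum(
            (n / total) * math.log2(n / total) for n in attr_counts.values()
        )
    return entropies


def filter_keywords(pairs, max_entropy=0.5):
    """Keep keywords concentrated on few attributes (entropy <= max_entropy).

    The threshold is an illustrative assumption, not a value from the paper.
    """
    entropies = keyword_entropy(list(pairs))
    return {k for k, h in entropies.items() if h <= max_entropy}


# Toy co-occurrence data: "born" always signals one attribute, "in" is ambiguous.
pairs = [
    ("born", "birth_date"), ("born", "birth_date"), ("born", "birth_date"),
    ("in", "birth_date"), ("in", "birth_place"), ("in", "nationality"),
]
kept = filter_keywords(pairs)
```

In this toy data, "born" co-occurs with a single attribute and is kept, while "in" is spread evenly over three attributes and is dropped. Sentences annotated through the surviving keywords would then feed the sequence labeling step.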
Year: 2018
DOI: 10.1109/BigComp.2018.00094
Venue: 2018 IEEE International Conference on Big Data and Smart Computing (BigComp)
Keywords: attribute extraction, feature ranking, sequence labeling, conditional random field
Field: Conditional random field, Sequence labeling, Computer science, Part of speech, Feature extraction, Natural language processing, Encyclopedia, Software portability, Artificial intelligence, Knowledge extraction, Entropy (information theory)
DocType: Conference
ISSN: 2375-933X
ISBN: 978-1-5386-3650-3
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name            Order  Citations  PageRank
Bin Peng        1      6          6.41
Xiaoming Zhang  2      263        35.42
Yueying He      3      0          0.34
Zhoujun Li      4      964        115.99