Attribute Extraction by Combing Feature Ranking and Sequence Labeling

Paper Info

Title
Attribute Extraction by Combing Feature Ranking and Sequence Labeling

Abstract
Due to the characteristics of the Chinese language, knowledge extraction from Chinese text documents is challenging. In this paper, an attribute extraction method based on feature ranking and sequence labeling is proposed. First, we obtain a training corpus by annotating Wikipedia texts with the attribute information extracted from Wikipedia infoboxes. To improve the quality of the training corpus, trigger keywords are filtered based on information entropy. Attribute extraction is treated as a sequence labeling problem that exploits multidimensional features such as part of speech and word context. A conditional random field (CRF) model is then trained on the corpus to extract attributes from unstructured text. Experimental results show that the keyword filtering technique effectively improves the quality of the training corpus and hence the performance of attribute extraction. Compared with rule-based attribute extraction methods, our method can be extended to other domains and offers better portability and extensibility.
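The abstract does not say how the entropy is computed. A minimal sketch, assuming each candidate trigger keyword w is scored by the Shannon entropy of the infobox attributes it co-occurs with in the automatically annotated sentences, H(w) = -Σ_a p(a|w) log2 p(a|w): a keyword tied to a single attribute has low entropy and is kept, while one spread across many attributes is ambiguous and is dropped. The function names, the `(keyword, attribute)` input format, and the `max_entropy` threshold are hypothetical, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def keyword_entropy(pairs):
    """Shannon entropy (bits) of the attribute distribution of each keyword.

    pairs: iterable of (keyword, attribute) co-occurrences harvested from the
    automatically annotated sentences (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for keyword, attribute in pairs:
        counts[keyword][attribute] += 1

    entropies = {}
    for keyword, attr_counts in counts.items():
        total = sum(attr_counts.values())
        h = 0.0
        for c in attr_counts.values():
            p = c / total  # p(attribute | keyword)
            h -= p * math.log2(p)
        entropies[keyword] = h
    return entropies


def filter_triggers(pairs, max_entropy=1.0):
    """Keep keywords whose attribute distribution is concentrated.

    max_entropy is an illustrative threshold, not a value from the paper.
    """
    return {k for k, h in keyword_entropy(pairs).items() if h <= max_entropy}
```

For example, with a threshold of 0.5 bits, a keyword such as "born" that always co-occurs with `birth_date` (entropy 0) survives, while "in", spread evenly over four attributes (entropy 2 bits), is filtered out.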

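As a rough illustration of the sequence labeling setup, the sketch below projects infobox values onto a sentence as BIO labels and builds one feature dictionary per token from the word, its part of speech, and a ±2-token context window. It assumes pre-segmented Chinese tokens with POS tags and a BIO label scheme, neither of which the abstract specifies; the function names and window size are hypothetical. The list-of-dicts-per-sentence output mirrors the input format accepted by CRF toolkits such as sklearn-crfsuite, but no CRF library is called here.

```python
def bio_annotate(tokens, infobox):
    """Project infobox values onto a segmented sentence as BIO labels.

    tokens:  word-segmented sentence (list of str). Chinese is written without
             spaces, so a value matches a span when the concatenated tokens
             equal the value string exactly (a simplifying assumption).
    infobox: dict mapping attribute name -> value string.
    """
    labels = ["O"] * len(tokens)
    for attribute, value in infobox.items():
        for start in range(len(tokens)):
            span = ""
            for end in range(start, len(tokens)):
                span += tokens[end]
                if span == value:
                    # Only label spans not already claimed by another attribute.
                    if all(tag == "O" for tag in labels[start:end + 1]):
                        labels[start] = "B-" + attribute
                        for k in range(start + 1, end + 1):
                            labels[k] = "I-" + attribute
                    break
                if not value.startswith(span):
                    break  # span can no longer grow into the value
    return labels


def token_features(tokens, pos_tags, i, window=2):
    """Feature dict for token i: the word, its POS tag, and the words and
    POS tags within +/- window positions (padded at sentence boundaries)."""
    feats = {"bias": 1.0, "word": tokens[i], "pos": pos_tags[i]}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j] if inside else "<PAD>"
        feats[f"pos[{offset:+d}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


def sentence_features(tokens, pos_tags, window=2):
    """One feature dict per token, i.e. one CRF input sequence."""
    return [token_features(tokens, pos_tags, i, window) for i in range(len(tokens))]
```

Pairing each `sentence_features(...)` output with its `bio_annotate(...)` labels gives the (X, y) sequences a linear-chain CRF is trained on; at extraction time, contiguous B-/I- runs in the predicted labels are read off as attribute values. Exact string matching is a simplification: infobox values often differ in surface form from the running text (dates, aliases), which is one reason automatically annotated corpora are noisy.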
Year: 2018
DOI: 10.1109/BigComp.2018.00094
Venue: 2018 IEEE International Conference on Big Data and Smart Computing (BigComp)
Keywords: attribute extraction, feature ranking, sequence labeling, conditional random field
Field: Conditional random field, Sequence labeling, Computer science, Part of speech, Feature extraction, Natural language processing, Encyclopedia, Software portability, Artificial intelligence, Knowledge extraction, Entropy (information theory)
DocType: Conference
ISSN: 2375-933X
ISBN: 978-1-5386-3650-3
Citations: 0
PageRank: 0.34
References: 0
Authors: 4

Authors

Name	Order	Citations	PageRank
Bin Peng	1	6	6.41
Xiaoming Zhang	2	263	35.42
Yueying He	3	0	0.34
Zhoujun Li	4	964	115.99
