Using Wikipedia for Cross-Language Named Entity Recognition. - Citegraph

Paper Info

Title
Using Wikipedia for Cross-Language Named Entity Recognition.

Abstract
Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on manually annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate partially annotated corpora for NERC by exploiting the link structure of Wikipedia. Firstly, Wikipedia entries in the source language are labeled with the NERC tag set. Secondly, Wikipedia language links are exploited to propagate the annotations to the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
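
The three steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy dictionaries, and the use of exact token matching to find mentions are all assumptions made for illustration, and the paper may instead rely on Wikipedia's in-text links. Tokens that match no known entity are left as None (unknown) rather than tagged as non-entities, which is what makes the resulting corpus partially annotated.

```python
from typing import Dict, List, Optional


def propagate_labels(source_labels: Dict[str, str],
                     language_links: Dict[str, str]) -> Dict[str, str]:
    """Step 2 (hypothetical helper): carry each source-language entry's tag
    over to the target-language entry it is linked to."""
    return {
        language_links[title]: tag
        for title, tag in source_labels.items()
        if title in language_links  # entries without a language link are dropped
    }


def annotate_mentions(tokens: List[str],
                      target_labels: Dict[str, str]) -> List[Optional[str]]:
    """Step 3 (hypothetical helper): tag tokens covered by a labeled entry title.
    Unmatched tokens stay None, meaning 'unknown', not 'not an entity'."""
    tags: List[Optional[str]] = [None] * len(tokens)
    # Try longer titles first so multi-word entities win over their prefixes.
    entries = sorted(
        ((title.split(), tag) for title, tag in target_labels.items() if title.strip()),
        key=lambda entry: -len(entry[0]),
    )
    i = 0
    while i < len(tokens):
        for words, tag in entries:
            n = len(words)
            if tokens[i:i + n] == words:
                tags[i:i + n] = [tag] * n
                i += n
                break
        else:
            i += 1  # no entry starts at this position
    return tags


# Step 1 is assumed to be given: source-language entries already carry tags.
source_labels = {"Germany": "LOC", "Angela Merkel": "PER"}
# Toy language links from source titles to target titles (assumed data).
language_links = {"Germany": "Alemania", "Angela Merkel": "Angela Merkel"}

target_labels = propagate_labels(source_labels, language_links)
sentence = "Angela Merkel vive en Alemania".split()
partial_tags = annotate_mentions(sentence, target_labels)
# partial_tags == ["PER", "PER", None, None, "LOC"]
```

A learner for such data would need to treat the None positions as unobserved labels, which is the role the abstract assigns to the extended hidden Markov models and structural perceptrons.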

Year	2014
DOI	10.1007/978-3-319-29009-6_1
Venue	MSM/MUSE/SenseML
Field	Data mining, NERC Tag, Computer science, Natural language processing, Artificial intelligence, Conditional random field, Question answering, Information retrieval, Exploit, Information extraction, Hidden Markov model, Named-entity recognition, Perceptron, Machine learning
DocType	Conference
Citations	0
PageRank	0.34
References	28
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Eraldo R. Fernandes	1	76	6.09
Ulf Brefeld	2	633	51.89
Roi Blanco	3	872	57.42
Jordi Atserias	4	268	31.84
