Abstract | ||
---|---|---|
One of the unique challenges in Chinese Language Processing is cross-strait named entity recognition. Due to the adoption of different transliteration strategies, foreign name transliterations can vary greatly between the PRC and Taiwan. This situation poses a serious problem for NLP tasks, including data mining, translation, and information retrieval. In this paper, we introduce a novel approach to the automatic extraction of divergent transliterations of foreign named entities by bootstrapping co-occurrence statistics from tagged Chinese corpora. In this study, we use Chinese Word Sketch. The automatically bootstrapped transliteration pairs are further screened based on phonetic similarity. The precision is evaluated to be more than 90% against manually corrected transliteration pairs. |
Year | DOI | Venue |
---|---|---|
2008 | 10.1142/S1793840608001780 | International Journal of Computer Processing of Languages |
Keywords | Field | DocType |
data mining, transliteration | Entity linking, Word sketch, Computer science, Natural language processing, Artificial intelligence, Named-entity recognition, Transliteration | Journal |
Citations | PageRank | References |
0 | 0.34 | 2 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Petr Simon | 1 | 2 | 2.87 |
Chu-Ren Huang | 2 | 600 | 136.84 |
Shu-kai Hsieh | 3 | 47 | 21.47 |
Jia-Fei Hong | 4 | 18 | 9.06 |