Title
Translating Chinese Romanized Name into Chinese Idiographic Characters via Corpus and Web Validation
Abstract
Cross-language information retrieval performance depends on the quality of the translation resources used to pass from a user's source language query to target language documents. Translation lists of proper names are rare but vital resources for cross-language retrieval between languages using different character sets. Named-entity translation dictionaries can be extracted from bilingual corpora with some degree of success, but the problem of the coverage of these scarce bilingual corpora remains. In this article, we present a technique for finding Chinese transliterations for any Chinese name written in English script. Our system performs transliteration of Pinyin (the standard Romanization for Chinese) to Chinese characters via corpus and web validation. Though Chinese family names form a small set, the number and variety of multisyllabic first names is great, and treatment is complicated by the fact that one Pinyin transliteration can correspond to hundreds of different Chinese characters. Our method finds the best translations of a Chinese name written in Pinyin by filtering out unlikely translations using a bigram model derived from a very large monolingual Chinese corpus, and then vetting the remaining candidate transliterations using Web statistics. We experimentally validate our method using an independent gold standard.
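The pipeline described in the abstract (enumerate candidate character sequences, filter them with corpus bigrams, then rank the survivors by Web statistics) can be illustrated with a minimal sketch. It is not the authors' implementation. The lookup tables PINYIN_TO_CHARS, BIGRAM_COUNTS and WEB_HITS, the function names, and the min_count threshold are all hypothetical toy stand-ins for the Pinyin lexicon, the monolingual corpus model and the Web hit counts. The counts are invented for illustration only.

```python
from itertools import product

# Hypothetical toy lexicon: each Pinyin syllable maps to a few candidate characters.
# A real lexicon would list far more characters per syllable.
PINYIN_TO_CHARS = {
    "li": ["李", "丽", "力"],
    "yi": ["一", "毅", "宜"],
    "ping": ["平", "萍", "瓶"],
}

# Hypothetical bigram counts standing in for a large monolingual Chinese corpus.
BIGRAM_COUNTS = {
    ("李", "毅"): 12,
    ("李", "宜"): 3,
    ("毅", "平"): 7,
    ("宜", "平"): 5,
    ("宜", "萍"): 2,
}

# Hypothetical hit counts standing in for Web statistics.
WEB_HITS = {"李毅平": 40, "李宜平": 95, "李宜萍": 10}


def candidates(syllables):
    """Enumerate every character sequence the Pinyin syllables could spell."""
    pools = [PINYIN_TO_CHARS.get(s, []) for s in syllables]
    return ["".join(chars) for chars in product(*pools)]


def passes_bigram_filter(name, min_count=1):
    """Keep a candidate only if every adjacent character pair was seen in the corpus."""
    return all(BIGRAM_COUNTS.get(pair, 0) >= min_count
               for pair in zip(name, name[1:]))


def transliterate(syllables, top_k=3):
    """Filter candidates with bigrams, then rank the survivors by Web hit count."""
    survivors = [c for c in candidates(syllables) if passes_bigram_filter(c)]
    ranked = sorted(survivors, key=lambda c: WEB_HITS.get(c, 0), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print(transliterate(["li", "yi", "ping"]))
```

With these toy tables, three syllables with three characters each give 27 candidates, and the bigram filter leaves only the sequences whose adjacent pairs both occur in the corpus table. Filtering before ranking keeps the number of Web lookups small, which is presumably why the abstract orders the two steps this way.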
Year
2005
Venue
CORIA
Keywords
system performance, proper names, gold standard
Field
Romanization, Chinese characters, Pinyin, Computer science, Passer, Corpus linguistics, Bigram, Proper noun, Linguistics, Transliteration
DocType
Conference
Citations
3
PageRank
0.45
References
6
Authors
2
Name, Order, Citations, PageRank
Yiping Li, 1, 6, 2.52
Gregory Grafenstette, 2, 3, 0.78