Abstract | ||
---|---|---|
Proper name transliteration, the pronunciation-based translation of a proper name, is important to many multilingual natural language processing tasks, such as Statistical Machine Translation (SMT) and Cross-Lingual Information Retrieval (CLIR). The task is extremely challenging because of pronunciation differences between the source and target languages: a given proper name can lead to many different transliterations. Past research efforts have demonstrated a 30-50% error rate when using the top-1 reference for transliteration, and this error degrades performance in many applications. In this paper, a novel approach is proposed to verify a given proper name transliteration pair using a discrete variant Hidden Markov Model (HMM) alignment. The state emission probabilities are derived from SMT phrase tables. The proposed method yields an Equal Error Rate (EER) of 3.73% on a test set of 300 matched and 1000 unmatched name pairs. By comparison, the commonly used SMT framework yields a 6.5% EER under its best configuration, and the widely used edit distance approach has an EER of 22%. The new method achieves high accuracy with low complexity, and provides an alternative for name transliteration in CLIR and other cross-lingual natural language applications such as word alignment and machine translation. |
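
The abstract compares methods by Equal Error Rate and names edit distance as a baseline. The sketch below illustrates how such a figure could be computed; it is not the authors' implementation. It assumes both names are available in the same (for example, romanized) script, uses a length-normalized Levenshtein similarity as the verification score, and estimates the EER by sweeping a decision threshold. The function names `edit_distance`, `similarity`, and `equal_error_rate` are hypothetical, and the EER is approximated as the mean of the false accept and false reject rates at the threshold where they are closest.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (illustrative baseline)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Length-normalized similarity in [0, 1]; an assumed scoring choice."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def equal_error_rate(matched_scores, unmatched_scores) -> float:
    """Approximate EER: a pair is accepted when its score >= threshold."""
    thresholds = sorted(set(matched_scores) | set(unmatched_scores))
    thresholds.append(thresholds[-1] + 1.0)  # threshold that rejects every pair
    best_gap, best_eer = None, None
    for t in thresholds:
        # False reject rate: matched pairs scored below the threshold.
        frr = sum(s < t for s in matched_scores) / len(matched_scores)
        # False accept rate: unmatched pairs scored at or above the threshold.
        far = sum(s >= t for s in unmatched_scores) / len(unmatched_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

With a labeled test set, the matched and unmatched pairs would each be scored with `similarity` and the two score lists passed to `equal_error_rate`; any other verification score, such as an alignment likelihood, could be substituted without changing the EER computation.
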
Year | DOI | Venue |
---|---|---|
2010 | 10.1109/ISCSLP.2010.5684842 | ISCSLP |
Keywords | Field | DocType |
multilingual natural language processing task,smt phrase tables,information retrieval,transliteration,cross lingual information retrieval,state emission probabilities,equal error rate,language translation,proper name transliteration verification,pronunciation based translation,natural language processing,cross lingual ir,hidden markov models,component,machine translation,probability,discrete variant hidden markov model,natural language,computational modeling,edit distance,noise measurement,proper names,hidden markov model,decoding,kernel | Edit distance,Language translation,Computer science,Machine translation,Artificial intelligence,Natural language processing,Pattern recognition,Word error rate,Speech recognition,Natural language,Hidden Markov model,Proper noun,Transliteration | Conference |
ISBN | Citations | PageRank |
978-1-4244-6244-5 | 0 | 0.34 |
References | Authors | |
16 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jan, E.-E. | 1 | 148 | 39.33 |
Niyu Ge | 2 | 195 | 21.69 |
Shih-Hsiang Lin | 3 | 142 | 14.07 |
Salim Roukos | 4 | 6248 | 845.50 |
Jeffrey S. Sorensen | 5 | 154 | 16.12 |