Abstract |
---|
This paper presents models for automatic transliteration of proper names between languages that use different alphabets. The models extend our work on automatic discovery of patterns of etymological sound change, which is based on the Minimum Description Length (MDL) principle. The pairwise alignment models are extended with prediction algorithms that generate transliterated names. We present results on 13 parallel corpora for 7 languages, including English, Russian, and Farsi, extracted from Wikipedia headlines. The transliteration corpora are released for public use. The models achieve up to 88% word-level accuracy and up to 99% symbol-level F-score. We discuss the results from several perspectives and analyze how corpus size, the language pair, the type of names (persons, locations), and noise in the data affect performance. |
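
The abstract reports two evaluation measures, word-level accuracy and symbol-level F-score, without defining them. The sketch below assumes definitions similar to those used in the NEWS transliteration shared tasks: word-level accuracy is the fraction of names whose predicted transliteration exactly matches the reference, and symbol-level F-score is the harmonic mean of precision and recall computed from the longest common subsequence (LCS) of prediction and reference. The paper may define the symbol-level measure differently (for example, over aligned symbols); the function names `lcs_length`, `symbol_f_score`, and `evaluate` and the sample names are hypothetical.

```python
from typing import Iterable, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (O(len(a) * len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the LCS of the two prefixes without these symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Best of dropping the last symbol from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def symbol_f_score(predicted: str, reference: str) -> float:
    """Assumed symbol-level F-score: harmonic mean of LCS-based precision and recall."""
    lcs = lcs_length(predicted, reference)
    if lcs == 0:
        # Also covers empty strings, so no division by zero below.
        return 0.0
    precision = lcs / len(predicted)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (word-level accuracy, mean symbol-level F-score) over (predicted, reference) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, reference) pair")
    exact = sum(1 for p, r in pairs if p == r)
    f_total = sum(symbol_f_score(p, r) for p, r in pairs)
    return exact / len(pairs), f_total / len(pairs)
```

For example, `evaluate([("Moskva", "Moskva"), ("Moskwa", "Moskva")])` gives a word-level accuracy of 0.5 and a mean symbol-level F-score of about 0.917: the second prediction is wrong as a whole word, but it shares a 5-symbol LCS with the 6-symbol reference. This illustrates why symbol-level scores can be much higher than word-level accuracy.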

Year | DOI | Venue |
---|---|---|
2013 | 10.1007/978-3-642-39593-2_18 | SLSP |

Keywords | Field | DocType |
---|---|---|
corpus size, automatic discovery, Wikipedia headline, language pair, transliteration corpus, public use, automatic transliteration, transliteration generation, MDL-based model, minimum description length principle, different alphabet, etymological sound change | Pairwise comparison, Computer science, Minimum description length, Parallel corpora, Speech recognition, Natural language processing, Artificial intelligence, Sound change, Proper noun, Transliteration | Conference |

Citations | PageRank | References |
---|---|---|
2 | 0.41 | 22 |

Authors |
---|
3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Javad Nouri | 1 | 2 | 1.76 |
Lidia Pivovarova | 2 | 16 | 7.04 |
Roman Yangarber | 3 | 411 | 62.85 |