Title
MDL-based models for transliteration generation
Abstract
This paper presents models for automatic transliteration of proper names between languages that use different alphabets. The models extend our earlier work on the automatic discovery of patterns of etymological sound change, which is based on the Minimum Description Length (MDL) principle. The pairwise alignment models are extended with prediction algorithms that generate transliterated names. We present results on 13 parallel corpora covering 7 languages, including English, Russian, and Farsi, extracted from Wikipedia headlines. The transliteration corpora are released for public use. The models achieve up to 88% word-level accuracy and up to 99% symbol-level F-score. We discuss the results from several perspectives and analyze how corpus size, the language pair, the type of name (person, location), and noise in the data affect performance.
Year
2013
DOI
10.1007/978-3-642-39593-2_18
Venue
SLSP
Keywords
corpus size, automatic discovery, wikipedia headline, language pair, transliteration corpus, public use, automatic transliteration, transliteration generation, mdl-based model, minimum description length principle, different alphabet, etymological sound change
Field
Pairwise comparison, Computer science, Minimum description length, Parallel corpora, Speech recognition, Natural language processing, Artificial intelligence, Sound change, Proper noun, Transliteration
DocType
Conference
Citations
2
PageRank
0.41
References
22
Authors
3
Name
Order
Citations
PageRank
Javad Nouri121.76
Lidia Pivovarova2167.04
Roman Yangarber341162.85