MDL-based models for transliteration generation - Citegraph

Paper Info

Title
MDL-based models for transliteration generation

Abstract
This paper presents models for automatic transliteration of proper names between languages that use different alphabets. The models are an extension of our work on automatic discovery of patterns of etymological sound change, based on the Minimum Description Length Principle. The models for pairwise alignment are extended with algorithms for prediction that produce transliterated names. We present results on 13 parallel corpora for 7 languages, including English, Russian, and Farsi, extracted from Wikipedia headlines. The transliteration corpora are released for public use. The models achieve up to 88% on word-level accuracy and up to 99% on symbol-level F-score. We discuss the results from several perspectives, and analyze how corpus size, the language pair, the type of names (persons, locations), and noise in the data affect the performance.

Year
2013

DOI
10.1007/978-3-642-39593-2_18

Venue
SLSP

Keywords
corpus size, automatic discovery, wikipedia headline, language pair, transliteration corpus, public use, automatic transliteration, transliteration generation, mdl-based model, minimum description length principle, different alphabet, etymological sound change

Field
Pairwise comparison, Computer science, Minimum description length, Parallel corpora, Speech recognition, Natural language processing, Artificial intelligence, Sound change, Proper noun, Transliteration

DocType
Conference

Citations
2

PageRank
0.41

References
22

Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Javad Nouri	1	2	1.76
Lidia Pivovarova	2	16	7.04
Roman Yangarber	3	411	62.85
