Title
Extraction of Name and Transliteration in Monolingual and Parallel Corpora
Abstract
Named entities in free text pose a challenge for text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system, and they are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations will certainly ensure a high precision rate. We use a seed list of proper names and transliterations to train a machine transliteration model. With this model, proper names and their transliterations can be extracted from monolingual or parallel corpora with high precision and recall.
Year: 2004
DOI: 10.1007/978-3-540-30194-3_20
Venue: Lecture Notes in Computer Science
Keywords: machine translation, proper names, text analysis
Field: Bilingual dictionary, Computer science, Multilingualism, Precision and recall, Machine translation, Computational linguistics, Speech recognition, Artificial intelligence, Natural language processing, Proper noun, Cross-language information retrieval, Transliteration
DocType: Conference
Volume: 3265
ISSN: 0302-9743
Citations: 3
PageRank: 0.50
References: 4
Authors: 3
Name            Order  Citations  PageRank
Tracy Lin       1      13         2.77
Jian-Cheng Wu   2      70         13.30
Jason S. Chang  3      345        62.64