Finding ideographic representations of Japanese names written in Latin script via language identification and corpus validation - Citegraph

Paper Info

Title
Finding ideographic representations of Japanese names written in Latin script via language identification and corpus validation

Abstract
Multilingual applications frequently involve dealing with proper names, but names are often missing in bilingual lexicons. This problem is exacerbated for applications involving translation between Latin-scripted languages and Asian languages such as Chinese, Japanese and Korean (CJK) where simple string copying is not a solution. We present a novel approach for generating the ideographic representations of a CJK name written in a Latin script. The proposed approach involves first identifying the origin of the name, and then back-transliterating the name to all possible Chinese characters using language-specific mappings. To reduce the massive number of possibilities for computation, we apply a three-tier filtering process by filtering first through a set of attested bigrams, then through a set of attested terms, and lastly through the WWW for a final validation. We illustrate the approach with English-to-Japanese back-transliteration. Against test sets of Japanese given names and surnames, we have achieved average precisions of 73% and 90%, respectively.
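The generate-then-filter pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the language-identification step is assumed to have already labelled the name as Japanese, the romanized name is assumed to be pre-segmented, and every name here (`READINGS`, `BIGRAMS`, `TERMS`, `HITS`, `back_transliterate`) and all of the toy data are hypothetical. The web validation tier is stood in for by a plain lookup function.

```python
from itertools import product
from typing import Callable, Dict, List, Set

# Hypothetical toy resources; a real system would derive these from large corpora.
READINGS: Dict[str, List[str]] = {
    "ku": ["久", "九", "苦"],
    "bo": ["保", "母"],
    "ta": ["田", "多", "太"],
}
BIGRAMS: Set[str] = {"久保", "保田", "保多"}   # attested character bigrams
TERMS: Set[str] = {"久保田"}                   # attested terms
HITS: Dict[str, int] = {"久保田": 1200}        # stand-in for web hit counts


def candidates(segments: List[str], readings: Dict[str, List[str]]) -> List[str]:
    """Expand each romanized segment into every character it may map to."""
    pools = [readings.get(seg, []) for seg in segments]
    return ["".join(chars) for chars in product(*pools)]


def passes_bigrams(cand: str, bigrams: Set[str]) -> bool:
    """Tier 1: every adjacent character pair must be an attested bigram."""
    return all(cand[i:i + 2] in bigrams for i in range(len(cand) - 1))


def back_transliterate(
    segments: List[str],
    readings: Dict[str, List[str]],
    bigrams: Set[str],
    terms: Set[str],
    web_hits: Callable[[str], int],
    min_hits: int = 1,
) -> List[str]:
    """Apply the three filtering tiers in order, cheapest first."""
    tier1 = [c for c in candidates(segments, readings) if passes_bigrams(c, bigrams)]
    tier2 = [c for c in tier1 if c in terms]                 # Tier 2: attested terms
    tier3 = [c for c in tier2 if web_hits(c) >= min_hits]    # Tier 3: final validation
    return sorted(tier3, key=web_hits, reverse=True)


def toy_web_hits(cand: str) -> int:
    """Hypothetical replacement for a web query; returns 0 for unseen strings."""
    return HITS.get(cand, 0)


result = back_transliterate(["ku", "bo", "ta"], READINGS, BIGRAMS, TERMS, toy_web_hits)
```

With this toy data, the 18 generated candidates are narrowed by the bigram tier, then by the term tier, before the stand-in validation step is consulted at all, which is the point of ordering the filters from cheapest to most expensive.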

Year: 2004
DOI: 10.3115/1218955.1218979
Venue: ACL
Keywords: english-to-japanese back-transliteration, attested term, novel approach, latin script, cjk name, attested bigrams, ideographic representation, corpus validation, test set, asian language, proper name, possible chinese character, language identification, japanese name, scripting language, proper names
Field: Chinese characters, Computer science, Copying, Latin script, Natural language processing, Language identification, Bigram, Artificial intelligence, Proper noun, Linguistics
DocType: Conference
Volume: P04-1
Citations: 24
PageRank: 1.29
References: 5
Authors: 2

Authors (2 rows)

Name	Order	Citations	PageRank
Yan Qu	1	24	1.29
Gregory Grefenstette	2	1129	147.00
