Inducing a Bilingual Lexicon from Short Parallel Multiword Sequences. - Citegraph

Paper Info

Title
Inducing a Bilingual Lexicon from Short Parallel Multiword Sequences.

Abstract
This article proposes a technique for mining bilingual lexicons from pairs of parallel short word sequences. The technique builds a generative model from a corpus of training data consisting of such pairs. The model is a hierarchical nonparametric Bayesian model that directly induces a bilingual lexicon while training. The model learns in an unsupervised manner and is designed to exploit characteristics of the language pairs being mined. The proposed model is capable of utilizing commonly used word-pair frequency information and additionally can employ the internal character alignments within the words themselves. It is thereby capable of mining transliterations and can use reliably aligned transliteration pairs to support the mining of other words in their context. The model is also capable of performing word reordering and word deletion during the alignment process, and it is furthermore capable of operating in the absence of full segmentation information. In this work, we study two mining tasks based on English-Japanese and English-Chinese language pairs, and compare the proposed approach to baselines based on simpler models that use only word-pair frequency information. Our results show that the proposed method is able to mine bilingual word pairs at higher levels of precision and recall than the baselines.

Year	2017
DOI	10.1145/3003726
Venue	ACM Trans. Asian & Low-Resource Lang. Inf. Process.
Keywords	Bilingual lexicon, mining, alignment
Field	Training set, Word deletion, Bilingual lexicon, Segmentation, Computer science, Precision and recall, Speech recognition, Exploit, Artificial intelligence, Natural language processing, Transliteration, Generative model
DocType	Journal
Volume	16
Issue	3
ISSN	2375-4699
Citations	0
PageRank	0.34
References	13
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Andrew Finch	1	144	19.05
Taisuke Harada	2	0	0.34
Kumiko Tanaka-Ishii	3	261	36.69
Eiichiro Sumita	4	1466	190.87