Title
Neural-Network Lexical Translation for Cross-lingual IR from Text and Speech
Abstract
We propose a neural network model to estimate word translation probabilities for Cross-Lingual Information Retrieval (CLIR). The model estimates better probabilities for word translations than automatic word alignments alone, and generalizes to unseen source-target word pairs. We further improve the lexical neural translation model (and subsequently CLIR), by incorporating source word context, and by encoding the character sequences of input source words to generate translations of out-of-vocabulary words. To be effective, neural network models typically need training on large amounts of data labeled directly on the final task, in this case relevance to queries. In contrast, our approach only requires parallel data to train the translation model, and uses an unsupervised model to compute CLIR relevance scores. We report results on the retrieval of text and speech documents from three morphologically complex languages with limited training data resources (Swahili, Tagalog, and Somali) and short English queries. Despite training on only about 2M words of parallel training data for each language, we obtain neural network translation models that are very effective for this task. We also obtain further improvements using (i) a modified relevance model, which uses the probability of occurrence of a translation of each query term in the source document, and (ii) confusion networks (instead of 1-best output) that encode multiple transcription alternatives in the output of an Automatic Speech Recognition (ASR) system. We achieve overall MAP relative improvements of up to 24% on Swahili, 50% on Tagalog, and 39% on Somali over the baseline probabilistic model, and larger improvements over monolingual retrieval from machine translation output.
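As a rough illustration of item (i) in the abstract, the sketch below scores a source-language document by the probability that at least one of its words translates to each query term. This is only one possible reading of the abstract, not the paper's formulation: the independence assumption across document words, the log-probability floor `eps`, the function names, and the toy Swahili-English probabilities are all assumptions made for the example. In the described approach the table `trans_prob` would come from the neural lexical translation model trained on parallel data.

```python
import math

def occurrence_prob(query_term, doc_tokens, trans_prob):
    """Probability that at least one document token translates to query_term.

    Assumes (for illustration only) that tokens translate independently.
    trans_prob maps source word -> {target word: p(target | source)}.
    """
    p_none = 1.0
    for src in doc_tokens:
        p_none *= 1.0 - trans_prob.get(src, {}).get(query_term, 0.0)
    return 1.0 - p_none

def relevance_score(query_terms, doc_tokens, trans_prob, eps=1e-9):
    """Sum of log occurrence probabilities over query terms.

    eps is a hypothetical floor so that a missing term does not yield log(0).
    """
    return sum(
        math.log(max(occurrence_prob(q, doc_tokens, trans_prob), eps))
        for q in query_terms
    )

# Toy translation table with invented probabilities (not from the paper).
toy_trans_prob = {
    "maji": {"water": 0.9, "juice": 0.05},
    "safi": {"clean": 0.8, "pure": 0.15},
    "chakula": {"food": 0.95},
}

doc_a = ["maji", "safi"]   # source-language document about clean water
doc_b = ["chakula"]        # source-language document about food
query = ["clean", "water"]

score_a = relevance_score(query, doc_a, toy_trans_prob)
score_b = relevance_score(query, doc_b, toy_trans_prob)
```

With these toy values, `doc_a` scores higher than `doc_b` for the query, since both query terms have a likely translation source in `doc_a` and none in `doc_b`. No relevance labels are involved, which matches the abstract's point that only the translation model needs training data.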
Year: 2019
DOI: 10.1145/3331184.3331222
Venue: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Keywords: cross-lingual information retrieval, machine translation, neural networks, probabilistic modeling, speech recognition
Field: Tagalog, ENCODE, Cross lingual, Information retrieval, Computer science, Swahili, Machine translation, Artificial intelligence, Natural language processing, Statistical model, Artificial neural network, Encoding (memory)
DocType: Conference
ISBN: 978-1-4503-6172-9
Citations: 0
PageRank: 0.34
References: 0
Authors: 11
Name                 Order  Citations  PageRank
Rabih Zbib           1      250        26.70
Lingjun Zhao         2      0          0.34
Damianos Karakos     3      221        19.35
William Hartmann     4      64         10.66
Jay DeYoung          5      0          2.37
Zhongqiang Huang     6      217        20.41
Zhuolin Jiang        7      1335       37.93
Noah Rivkin          8      0          0.34
Le Zhang             9      268        32.16
Richard M. Schwartz  10     2839       717.76
John Makhoul         11     399        236.78