Title
Discovering multilingual concepts from unaligned web documents by exploring associated images
Abstract
The Internet is experiencing an explosion of information presented in many different languages. Although written in different languages, some articles implicitly share common concepts. In this paper, we propose a novel framework to mine cross-language common concepts from unaligned web documents. Specifically, visual words extracted from the images associated with the articles are used to bridge articles in different languages, and common concepts across multiple languages are then learned with an existing topic modeling algorithm. Using the multilingual concepts mined by our method, we conduct cross-lingual text classification on a real-world data set. The experimental results show that our approach is effective at mining cross-lingual common concepts.
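The pipeline described in the abstract (shared visual words as a bridge, then a topic model, then cross-lingual classification) can be illustrated with a minimal sketch. The abstract does not say which topic model is used ("an existing topic modeling algorithm") or how visual words are obtained, so the following are assumptions: plain LDA trained by collapsed Gibbs sampling, visual words already quantized into integer IDs, language-tagged tokens such as "en:" and "zh:", and a nearest-centroid classifier in topic space. All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def build_corpus(docs):
    """Map text words and visual words of every document to integer ids.

    Text tokens carry a (hypothetical) language tag, so they never collide
    across languages; visual words share one vocabulary, which is what links
    an English article to a Chinese one in this sketch.
    """
    vocab, corpus = {}, []
    for doc in docs:
        tokens = list(doc["words"]) + [f"vw:{v}" for v in doc["visual_words"]]
        corpus.append([vocab.setdefault(t, len(vocab)) for t in tokens])
    return corpus, vocab


def lda_gibbs(corpus, n_topics, n_words, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling; returns (theta, phi)."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in corpus]          # topic counts per document
    nkw = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                             # token count per topic
    z = []                                          # topic assignment per token
    for d, doc in enumerate(corpus):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the current assignment from the counts
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_words * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k  # record the resampled topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
             for d, doc in enumerate(corpus)]
    phi = [[(nkw[t][w] + beta) / (nk[t] + n_words * beta) for w in range(n_words)]
           for t in range(n_topics)]
    return theta, phi


def nearest_centroid(train_x, train_y, test_x):
    """Label each test document with the closest class centroid in topic space."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda y: dist(x, centroids[y])) for x in test_x]


# Toy data: labeled English articles, unlabeled Chinese articles; the two
# languages share no text tokens, only visual words from their images.
docs = [
    {"words": ["en:football", "en:goal", "en:match"], "visual_words": [1, 2, 1, 2]},
    {"words": ["en:recipe", "en:soup", "en:cook"], "visual_words": [7, 8, 7, 8]},
    {"words": ["zh:足球", "zh:比赛"], "visual_words": [1, 2, 2, 1]},
    {"words": ["zh:汤", "zh:烹饪"], "visual_words": [8, 7, 7, 8]},
]
corpus, vocab = build_corpus(docs)
theta, phi = lda_gibbs(corpus, n_topics=2, n_words=len(vocab))
pred = nearest_centroid(theta[:2], ["sports", "food"], theta[2:])
print(pred)
```

Because the Chinese documents share no text tokens with the English ones, any signal the classifier uses to label them has to flow through the shared visual words into the topic proportions, which is the bridging idea the abstract describes; the actual framework may differ in model and features.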
Year: 2013
DOI: 10.1145/2487788.2487874
Venue: WWW (Companion Volume)
Keywords: different language, cross-lingual text classification, experiment result, multiple language, cross-language common concept, cross-lingual common concept, common concept, mined multilingual concept, existing topic modeling algorithm, novel framework, unaligned web document, image
Field: Data mining, World Wide Web, Computer science, Topic model, Visual Word, The Internet
DocType: Conference
ISBN: 978-1-4503-2038-2
Citations: 0
PageRank: 0.34
References: 3
Authors: 4
Name | Order | Citations | PageRank
Xiaochen Zhang | 1 | 5 | 0.77
Xiaoming Jin | 2 | 315 | 23.42
Lianghao Li | 3 | 121 | 5.60
Dou Shen | 4 | 1224 | 59.46