Title
Cross-language query classification using web search for exogenous knowledge
Abstract
The non-English Web is growing at phenomenal speed, but available language processing tools and resources are predominantly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either not available or of arguable quality. Given that building comprehensive taxonomies for each language is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable text processing tasks in other languages. Our experimental results confirm that the answer is affirmative with respect to at least one task. In this study we focus on query classification, which is essential for understanding the user intent both in Web search and in online advertising. We propose a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems. In particular, we show that by considering the Web search results in the query's original language as additional sources of information, we can alleviate the effect of erroneous machine translation. Empirical evaluation on query sets in languages as diverse as Chinese and Russian yields very encouraging results; consequently, we believe that our approach is also applicable to many additional languages.
Year
2009
DOI
10.1145/1498759.1498811
Venue
WSDM
Keywords
non-english web,exogenous knowledge,web search,existing english text classifier,english web,english taxonomy,erroneous machine translation,cross-language query classification,additional language,web search result,available language processing tool,machine translation,online advertising,query classification
Field
Data mining,Query language,Relevance feedback,Computer science,Machine translation,Web query classification,Natural language processing,Artificial intelligence,Classifier (linguistics),Text processing,Web search query,Query expansion,Information retrieval
DocType
Conference
Citations
3
PageRank
0.50
References
21
Authors
5
Name                 Order  Citations  PageRank
Xuerui Wang          1      1735       123.38
Andrei Broder        2      7357       920.20
Evgeniy Gabrilovich  3      4573       224.48
Vanja Josifovski     4      2265       148.84
Bo Pang              5      5795       451.00