Abstract |
---|
The non-English Web is growing at phenomenal speed, but available language processing tools and resources are predominantly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either not available or of arguable quality. Given that building comprehensive taxonomies for each language is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable text processing tasks in other languages. Our experimental results confirm that the answer is affirmative with respect to at least one task. In this study we focus on query classification, which is essential for understanding the user intent both in Web search and in online advertising. We propose a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems. In particular, we show that by considering the Web search results in the query's original language as additional sources of information, we can alleviate the effect of erroneous machine translation. Empirical evaluation on query sets in languages as diverse as Chinese and Russian yields very encouraging results; consequently, we believe that our approach is also applicable to many additional languages. |
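
The approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `classify_cross_language`, the majority-vote aggregation, and all `toy_*` stand-ins (search engine, machine translation, English classifier) are hypothetical and serve only to show how search results in the query's original language might offset a misleading translation of the query itself.

```python
from collections import Counter
from typing import Callable, List


def classify_cross_language(
    query: str,
    search: Callable[[str], List[str]],   # returns result snippets in the query's language
    translate: Callable[[str], str],      # machine translation into English
    classify_en: Callable[[str], str],    # existing English text classifier
    top_k: int = 10,
) -> str:
    """Classify a non-English query into an English taxonomy.

    Assumption: labels of the translated query and of the translated
    search results are combined by a simple majority vote.
    """
    texts = [query] + list(search(query))[:top_k]
    votes = Counter(classify_en(translate(text)) for text in texts)
    return votes.most_common(1)[0][0]


# --- Hypothetical toy stand-ins, for illustration only ---
TOY_TRANSLATIONS = {
    "ягуар цена": "jaguar price",
    "ягуар автомобиль дилер": "jaguar car dealer",
    "купить ягуар седан": "buy jaguar sedan",
}


def toy_search(query: str) -> List[str]:
    # Pretend these snippets came from a Russian-language Web search.
    return ["ягуар автомобиль дилер", "купить ягуар седан"]


def toy_translate(text: str) -> str:
    # Dictionary lookup standing in for an off-the-shelf MT system.
    return TOY_TRANSLATIONS.get(text, text)


def toy_classify(text: str) -> str:
    # Keyword rules standing in for a trained English classifier.
    words = set(text.split())
    if words & {"car", "sedan", "dealer"}:
        return "Autos"
    if "jaguar" in words:
        return "Animals"
    return "Other"


label = classify_cross_language("ягуар цена", toy_search, toy_translate, toy_classify)
print(label)
```

In this toy setup, the translated query alone is labeled `Animals`, while the two translated search results are labeled `Autos`, so the vote yields `Autos`. A real system would likely weight results by rank or classifier confidence rather than count them equally.
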
Year | DOI | Venue |
---|---|---|
2009 | 10.1145/1498759.1498811 | WSDM |
Keywords | Field | DocType |
non-english web, exogenous knowledge, web search, existing english text classifier, english web, english taxonomy, erroneous machine translation, cross-language query classification, additional language, web search result, available language processing tool, machine translation, online advertising, query classification | Data mining, Query language, Relevance feedback, Computer science, Machine translation, Web query classification, Natural language processing, Artificial intelligence, Classifier (linguistics), Text processing, Web search query, Query expansion, Information retrieval | Conference |
Citations | PageRank | References |
3 | 0.50 | 21 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xuerui Wang | 1 | 1735 | 123.38 |
Andrei Broder | 2 | 7357 | 920.20 |
Evgeniy Gabrilovich | 3 | 4573 | 224.48 |
Vanja Josifovski | 4 | 2265 | 148.84 |
Bo Pang | 5 | 5795 | 451.00 |