Title
Concept-Based Information Retrieval Using Explicit Semantic Analysis
Abstract
Information retrieval systems traditionally rely on textual keywords to index and retrieve documents. Keyword-based retrieval may return inaccurate and incomplete results when different keywords are used to describe the same concept in the documents and in the queries. Furthermore, the relationship between these related keywords may be semantic rather than syntactic, and capturing it thus requires access to comprehensive human world knowledge. Concept-based retrieval methods have attempted to tackle these difficulties by using manually built thesauri, by relying on term cooccurrence data, or by extracting latent word relationships and concepts from a corpus. In this article we introduce a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features, automatically extracted from massive human knowledge repositories such as Wikipedia. Our approach generates new text features automatically, and we have found that high-quality feature selection becomes crucial in this setting to make the retrieval more focused. However, due to the lack of labeled data, traditional feature selection methods cannot be used, hence we propose new methods that use self-generated labeled training data. The resulting system is evaluated on several TREC datasets, showing superior performance over previous state-of-the-art results.
Year: 2011
DOI: 10.1145/1961209.1961211
Venue: ACM Trans. Inf. Syst.
Keywords: feature selection, concept-based information retrieval, concept-based retrieval method, concept-based feature, information retrieval system, new concept-based retrieval approach, concept-based retrieval, explicit semantic analysis, training data, term cooccurrence data, comprehensive human world knowledge, keyword-based retrieval, new text, new method, semantic search, information retrieval
Field: Data mining, Cognitive models of information retrieval, Computer science, Explicit semantic analysis, Natural language processing, Artificial intelligence, Term Discrimination, Vector space model, Human–computer information retrieval, Information retrieval, Relevance (information retrieval), Concept search, Visual Word
DocType: Journal
Volume: 29
Issue: 2
Citations: 120
PageRank: 3.31
References: 62
Authors: 3
Name                 Order  Citations  PageRank
Ofer Egozi           1      138        4.51
Shaul Markovitch     2      3010       262.77
Evgeniy Gabrilovich  3      4573       224.48