Title
Topic Models Ensembles For Ad-Hoc Information Retrieval
Abstract
Ad hoc information retrieval (ad hoc IR) is a challenging task that consists of ranking text documents for bag-of-words (BOW) queries. Classic approaches based on query and document text vectors use term-weighting functions to rank the documents. One limitation of these methods is their inability to handle polysemic concepts. In addition, these methods introduce spurious orthogonalities between semantically related words. To address these limitations, model-based IR approaches based on topics have been explored. Specifically, topic models based on Latent Dirichlet Allocation (LDA) allow building representations of text documents in the latent space of topics, which models polysemy better and avoids generating orthogonal representations for related terms. We extend LDA-based IR strategies using different ensemble strategies. Model selection follows the ensemble learning paradigm, for which we test two successful approaches widely used in supervised learning. We study Boosting and Bagging techniques for topic models, using each model as a weak IR expert. Then, we merge the ranking lists obtained from each model using a simple but effective top-k list fusion approach. We show that our proposal improves precision and recall, outperforming classic IR models and strong baselines based on topic models.
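The abstract does not say which top-k list fusion rule is used, so the following is only an illustrative sketch of the general idea of merging ranked lists produced by several ensemble members. It assumes a Borda-style count as the fusion rule; the function name `fuse_top_k`, the document ids, and the example rankings are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fuse_top_k(rankings, k):
    """Merge ranked lists of document ids with a Borda-style count.

    Assumption: each list awards k points to its first document,
    k - 1 to its second, and so on. Documents beyond position k in a
    list receive no points from that list.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking[:k]):
            scores[doc_id] += k - position
    # Sort by descending score; break ties by document id so the
    # result is deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:k]


# Hypothetical top-3 lists, one per topic model in the ensemble.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = fuse_top_k(rankings, k=3)
print(fused)
```

Under this assumed rule, a document ranked highly by several models accumulates more points than one ranked first by a single model, which is the intuition behind treating each topic model as a weak IR expert.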
Year
2021
DOI
10.3390/info12090360
Venue
INFORMATION
Keywords
ad hoc information retrieval, Latent Dirichlet Allocation (LDA), Bagging, boosting
DocType
Journal
Volume
12
Issue
9
Citations
0
PageRank
0.34
References
0
Authors
3
Name
Order
Citations
PageRank
Pablo Ormeño100.34
Marcelo Mendoza2150285.81
Carlos Valle3218.20