Title
Exploiting rich features for promoting diversity in biomedical information retrieval
Abstract
Promoting diversity in ranking for information retrieval (IR) has become an important topic in the past decade [2], [4] because of the increasing demand for personalization and for disambiguation of users' queries. Beyond the relevance between documents and the query, diversity IR takes into account the relationships among documents in the ranked list in order to promote diversity and reduce redundancy. Promoting diversity means providing various aspects of information in the ranked results list, while reducing redundancy means cutting down repeatedly mentioned information. Diversity IR has drawn great attention and has proven beneficial in previous studies when queries are ambiguous, especially in biomedical IR as investigated in the TREC 2006 and 2007 Genomics Tracks, where biologists tend to query for a certain type of entity covering different aspects related to the question, for example, genes, proteins, diseases, and mutations [1]. However, to the best of our knowledge, no learning-to-rank algorithm has approached biomedical information retrieval by exploiting domain-specific features that may reflect the novelty of a single document and the diversity of the whole ranked list. We argue that it is promising to define and make use of diversity-reflecting features to better model diversity information. Unlike previous studies, we tackle this problem from a learning-to-rank [3] perspective. The main challenges are how to find salient features for biomedical data and how to utilize dynamic features with learning-to-rank technology. In this paper, we propose a novel approach that combines dynamic diversity features with learning-to-rank technology. First, we rank results using a general learning-to-rank model. Second, we use Wikipedia to detect the topics of each retrieved result, which facilitates the generation of diversity-biased features.
(Table I lists examples of diversity features.) Then a diversity-favored ranking model, which rewards high-novelty and low-redundancy ranking results, is learned from the dataset represented by all features. The final results are given by a combination of both models. Experimental results on the TREC 2006 and 2007 Genomics collections show that our proposed method outperforms BM25, the language model with Dirichlet smoothing, and a general learning-to-rank model. The major contributions of this paper are two-fold. First, we propose several diversity-reflecting features by studying the relationships among documents. Second, we propose a learning-to-rank framework that combines the diversity-biased model with a general ranking model learned from the common features. Extensive experiments on the TREC 2006 and 2007 Genomics Tracks [1] demonstrate that using diversity-based features is beneficial for promoting diversity in biomedical IR.
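The abstract does not define the individual diversity features (those are listed in Table I) or state how the two models are combined, so the following Python sketch only illustrates the general idea under explicit assumptions. The general learning-to-rank scores and the Wikipedia topics of each document are taken as given. `novelty` (the share of a document's topics not yet covered by higher-ranked documents) and `redundancy` (the maximum topic Jaccard overlap with any higher-ranked document) are hypothetical stand-ins for the paper's features, and a fixed linear diversity score interpolated with a hypothetical weight `lam` replaces the learned diversity-favored model. Because the features depend on which documents are already ranked, they are recomputed after each placement, so the sketch builds the list greedily. All names (`Doc`, `rerank`, and so on) are illustrative, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    base_score: float       # score from the general learning-to-rank model (assumed given)
    topics: frozenset[str]  # Wikipedia topics detected for the document (assumed given)


def novelty(doc: Doc, covered: set[str]) -> float:
    """Hypothetical feature: fraction of the doc's topics not covered by higher-ranked docs."""
    if not doc.topics:
        return 0.0
    return len(doc.topics - covered) / len(doc.topics)


def redundancy(doc: Doc, selected: list[Doc]) -> float:
    """Hypothetical feature: maximum Jaccard topic overlap with any higher-ranked doc."""
    best = 0.0
    for other in selected:
        union = doc.topics | other.topics
        if union:
            best = max(best, len(doc.topics & other.topics) / len(union))
    return best


def combined_score(doc, covered, selected, lam, w_novelty, w_redundancy):
    # Assumed linear diversity model; stands in for the learned diversity-favored model.
    diversity = w_novelty * novelty(doc, covered) - w_redundancy * redundancy(doc, selected)
    # Assumed linear interpolation of the general and diversity-biased scores.
    return lam * doc.base_score + (1 - lam) * diversity


def rerank(docs, lam=0.5, w_novelty=1.0, w_redundancy=1.0):
    """Greedy re-ranking: the dynamic features are recomputed after every pick."""
    remaining = list(docs)
    selected: list[Doc] = []
    covered: set[str] = set()
    while remaining:
        best = max(
            remaining,
            key=lambda d: combined_score(d, covered, selected, lam, w_novelty, w_redundancy),
        )
        selected.append(best)
        covered |= best.topics
        remaining.remove(best)
    return [d.doc_id for d in selected]


# Made-up example: d2 repeats d1's topics, d3 covers a different aspect.
docs = [
    Doc("d1", 0.9, frozenset({"BRCA1", "breast cancer"})),
    Doc("d2", 0.85, frozenset({"BRCA1", "breast cancer"})),
    Doc("d3", 0.6, frozenset({"TP53", "mutation"})),
]
print(rerank(docs))  # ['d1', 'd3', 'd2']
```

With `lam=1.0` the diversity term vanishes and the list simply follows the base scores (`d1`, `d2`, `d3`); with the default `lam=0.5`, the redundant `d2` drops below `d3`, which contributes new topics. The scores and topics above are invented for illustration only.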
Year
2013
DOI
10.1109/BIBM.2013.6732579
Venue
BIBM
Keywords
wikipedia, medical information systems, user queries, biomedical information retrieval, genomics, information retrieval, 2007 genomics tracks, information ranking results list, diversity information, rich features, documents, trec 2006, personalization, diversity-favored ranking model, disambiguation, learning-to-rank technology
Field
Learning to rank, Computer science, Ranking (information retrieval), Redundancy (engineering), Artificial intelligence, Language model, Personalization, Information retrieval, Ranking, Smoothing, Bioinformatics, Novelty, Machine learning
DocType
Conference
ISSN
2156-1125
Citations
0
PageRank
0.34
References
1
Authors
3
Name | Order | Citations | PageRank
Jiajin Wu | 1 | 16 | 3.93
Jimmy Huang | 2 | 3 | 1.74
Zheng Ye | 3 | 45 | 3.01