Title
Learning to rank with SoftRank and Gaussian processes
Abstract
In this paper we address the issue of learning to rank for document retrieval using Thurstonian models based on sparse Gaussian processes. Thurstonian models represent each document for a given query as a probability distribution in a score space; these distributions over scores naturally give rise to distributions over document rankings. However, in general we do not have observed rankings with which to train the model; instead, each document in the training set is judged to have a particular relevance level: for example "Bad", "Fair", "Good", or "Excellent". The performance of the model is then evaluated using information retrieval (IR) metrics such as Normalised Discounted Cumulative Gain (NDCG). Recently Taylor et al. presented a method called SoftRank which allows the direct gradient optimisation of a smoothed version of NDCG using a Thurstonian model. In this approach, document scores are represented by the outputs of a neural network, and score distributions are created artificially by adding random noise to the scores. The SoftRank mechanism is a general one; it can be applied to different IR metrics, and make use of different underlying models. In this paper we extend the SoftRank framework to make use of the score uncertainties which are naturally provided by a Gaussian process (GP), which is a probabilistic non-linear regression model. We further develop the model by using sparse Gaussian process techniques, which give improved performance and efficiency, and show competitive results against baseline methods when tested on the publicly available LETOR OHSUMED data set. We also explore how the available uncertainty information can be used in prediction and how it affects model performance.
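As a rough illustration of the Thurstonian idea in the abstract, the minimal Python sketch below gives each document a Gaussian score distribution, samples scores to obtain rankings, and averages NDCG over those rankings. It is not the SoftRank method itself, which according to the abstract optimises a smoothed NDCG by gradient; here the expectation is only estimated by Monte Carlo sampling. The gain `2**rel - 1`, the log2 position discount, the integer coding of relevance labels (Bad = 0 up to Excellent = 3), and all function names are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def dcg(relevances):
    """Discounted cumulative gain of relevance labels given in ranked order.

    Assumes gain 2**rel - 1 and discount 1 / log2(position + 1), with
    positions starting at 1 (a common convention, not taken from the paper).
    """
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending-label) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def expected_ndcg(means, stds, labels, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[NDCG] when the score of document i is
    drawn from a Gaussian with mean means[i] and std deviation stds[i]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw one score per document, then rank by descending score.
        scores = [rng.gauss(m, s) for m, s in zip(means, stds)]
        order = sorted(range(len(scores)),
                       key=lambda i: scores[i], reverse=True)
        total += ndcg([labels[i] for i in order])
    return total / n_samples


# Hypothetical query with four judged documents.
labels = [3, 0, 2, 1]            # Excellent, Bad, Good, Fair
means = [2.0, 0.1, 1.5, 0.5]     # predicted mean scores
confident = expected_ndcg(means, [0.01] * 4, labels)
uncertain = expected_ndcg(means, [2.0] * 4, labels)
print(confident, uncertain)
```

With near-zero score variance the ranking is effectively fixed by the means, whereas larger variances spread probability over many rankings and pull the expected NDCG down. This is the sense in which score uncertainty, such as the predictive variance of a Gaussian process, influences the metric.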
Year: 2008
DOI: 10.1145/1390334.1390380
Venue: SIGIR
Keywords: probabilistic non-linear regression model, gaussian process, document score, thurstonian model, model performance, different underlying model, document retrieval, document ranking, softrank framework, softrank mechanism, cumulant, non linear regression, probability distribution, neural network, ranking, learning to rank, information retrieval
Field: Learning to rank, Data mining, Computer science, Probability distribution, Artificial intelligence, Gaussian process, Document retrieval, Probabilistic logic, Thurstonian model, Ranking, Information retrieval, Machine learning, Discounted cumulative gain
DocType: Conference
Citations: 28
PageRank: 1.12
References: 13
Authors: 2
Name            Order  Citations  PageRank
John Guiver     1      482        21.48
Edward Snelson  2      610        41.42