Abstract | ||
---|---|---|
This paper addresses the problem of devising a new document prior for the language modeling (LM) approach to Information Retrieval. The prior is based on term statistics, is derived in a probabilistic fashion, and offers a novel way of accounting for document length. Furthermore, we develop a new way of combining document length priors with the query likelihood estimate, based on the risk of accepting the latter as a score. The prior is combined with a document retrieval language model that uses Jelinek-Mercer (JM) smoothing, a technique that does not take document length into account. Adding the prior boosts retrieval performance, so that the model outperforms an LM with a document-length-dependent smoothing component (Dirichlet prior) and another state-of-the-art, high-performing scoring function (BM25). The improvements are significant and robust across different collections and query sizes. |
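
To make the setting in the abstract concrete, here is a minimal Python sketch of query likelihood with Jelinek-Mercer smoothing plus a document length prior added in log space. Everything in it is an illustrative assumption: the function names, the toy documents, the value `lam=0.5`, and the convention that `lam` weights the collection model. In particular, `linear_length_log_prior` is a simple length-proportional stand-in, not the probabilistic prior derived in the paper, and the plain sum in `score` is the usual way to attach a prior, not the risk-based combination the abstract describes.

```python
import math
from collections import Counter


def jm_log_likelihood(query, doc, coll_tf, coll_len, lam=0.5):
    """Log query likelihood under Jelinek-Mercer smoothing.

    P(t|d) = (1 - lam) * tf(t, d) / |d| + lam * cf(t) / |C|
    Here lam weights the collection model (an assumed convention).
    """
    tf = Counter(doc)
    dlen = len(doc)
    total = 0.0
    for t in query:
        p_doc = tf[t] / dlen if dlen else 0.0
        p_coll = coll_tf.get(t, 0) / coll_len
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term never occurs in the collection
        total += math.log(p)
    return total


def linear_length_log_prior(doc, coll_len):
    """Hypothetical stand-in prior, proportional to document length.

    Assumes a non-empty document. This is NOT the paper's prior.
    """
    return math.log(len(doc) / coll_len)


def score(query, doc, coll_tf, coll_len, lam=0.5):
    """Log prior plus log likelihood (plain sum, not the paper's combination)."""
    return (jm_log_likelihood(query, doc, coll_tf, coll_len, lam)
            + linear_length_log_prior(doc, coll_len))


# Toy collection: d2 is d1 repeated twice, so relative term frequencies match.
docs = {
    "d1": ["language", "model", "retrieval"],
    "d2": ["language", "model", "retrieval"] * 2,
}
coll_tf = Counter(t for d in docs.values() for t in d)
coll_len = sum(len(d) for d in docs.values())
query = ["language"]

jm_scores = {name: jm_log_likelihood(query, d, coll_tf, coll_len)
             for name, d in docs.items()}
full_scores = {name: score(query, d, coll_tf, coll_len)
               for name, d in docs.items()}
```

In this toy collection the two documents have identical relative term frequencies, so the JM likelihood alone cannot tell them apart. Only the added prior separates them, which illustrates why a length-independent smoothing method leaves room for a length prior.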
Year | DOI | Venue |
---|---|---|
2008 | 10.1007/978-3-540-78646-7_36 | ECIR |
Keywords | Field | DocType |
document retrieval language model,prior boost,new document,probabilistic document length prior,dependent smoothing component,account document length,document length prior,query likelihood estimation,language modeling,query size,document length,score function,document retrieval,language model | Data mining,Information retrieval,Document clustering,Computer science,Smoothing,Artificial intelligence,Dirichlet distribution,Document retrieval,Probabilistic logic,Prior probability,Machine learning,Language model | Conference |
Volume | ISSN | ISBN |
4956 | 0302-9743 | 3-540-78645-7 |
Citations | PageRank | References |
13 | 0.78 | 14 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Roi Blanco | 1 | 872 | 57.42 |
Alvaro Barreiro | 2 | 226 | 22.42 |