Probabilistic document length priors for language models - Citegraph

Paper Info

Title
Probabilistic document length priors for language models

Abstract
This paper addresses the problem of devising a new document prior for the language modeling (LM) approach to Information Retrieval. The prior is based on term statistics, is derived in a probabilistic fashion, and offers a novel way of accounting for document length. Furthermore, we develop a new way of combining document length priors with the query likelihood estimate, based on the risk of accepting the latter as a score. The prior is combined with a document retrieval language model that uses Jelinek-Mercer (JM) smoothing, a technique that does not take document length into account. Adding the prior boosts retrieval performance, so that the combined model outperforms both an LM with a document-length-dependent smoothing component (Dirichlet prior) and another state-of-the-art, high-performing scoring function (BM25). The improvements are significant and robust across different collections and query sizes.
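The abstract contrasts Jelinek-Mercer smoothing, whose interpolation weight is the same for every document, with Dirichlet smoothing, whose effective weight shrinks as a document gets longer. It also describes adding a document prior to the query likelihood. The sketch below illustrates that general setup under stated assumptions. It combines the two parts log-linearly as log P(q|d) + log P(d) and uses a simple prior proportional to document length as a stand-in. Neither choice is the paper's method: the paper derives its prior from term statistics and combines it according to the risk of accepting the query likelihood as a score. The function names, the toy documents, and the parameter values (`lam=0.1`, `mu=2000`) are hypothetical and for illustration only.

```python
import math
from collections import Counter


def jm_log_likelihood(query, doc, coll_tf, coll_len, lam=0.1):
    """log P(q|d) under Jelinek-Mercer smoothing.

    P(t|d) = (1 - lam) * tf(t, d) / |d| + lam * P(t|C)
    lam is fixed for all documents, so this smoothing ignores |d|.
    Assumes doc is non-empty.
    """
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p_c = coll_tf[t] / coll_len
        if p_c == 0:  # term unseen in the collection: skipped (an assumption)
            continue
        score += math.log((1 - lam) * tf[t] / len(doc) + lam * p_c)
    return score


def dirichlet_log_likelihood(query, doc, coll_tf, coll_len, mu=2000):
    """log P(q|d) under Dirichlet-prior smoothing, shown for comparison.

    P(t|d) = (tf(t, d) + mu * P(t|C)) / (|d| + mu)
    The effective smoothing weight mu / (|d| + mu) depends on |d|.
    """
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p_c = coll_tf[t] / coll_len
        if p_c == 0:
            continue
        score += math.log((tf[t] + mu * p_c) / (len(doc) + mu))
    return score


def log_length_prior(doc):
    """Hypothetical baseline prior P(d) proportional to |d|.

    The normalising constant is dropped since it does not change the ranking.
    This is NOT the term-statistics prior proposed in the paper.
    """
    return math.log(len(doc))


def rank(query, docs, lam=0.1):
    """Rank documents by JM query log-likelihood plus the log prior."""
    coll_tf = Counter(t for d in docs.values() for t in d)
    coll_len = sum(coll_tf.values())
    scores = {
        doc_id: jm_log_likelihood(query, d, coll_tf, coll_len, lam)
        + log_length_prior(d)
        for doc_id, d in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Usage on two toy documents (hypothetical data).
docs = {
    "d1": "language models for retrieval".split(),
    "d2": "smoothing language models with a document length prior for retrieval".split(),
}
order, scores = rank("language retrieval".split(), docs)
print(order, scores)
```

Keeping the prior as a separate additive term in log space makes it easy to swap in a different prior, or a different combination rule, without touching the JM likelihood itself.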

Year: 2008
DOI: 10.1007/978-3-540-78646-7_36
Venue: ECIR
Keywords: document retrieval language model, prior boost, new document, probabilistic document length prior, dependent smoothing component, account document length, document length prior, query likelihood estimation, language modeling, query size, document length, score function, document retrieval, language model
Field: Data mining, Information retrieval, Document clustering, Computer science, Smoothing, Artificial intelligence, Dirichlet distribution, Document retrieval, Probabilistic logic, Prior probability, Machine learning, Language model
DocType: Conference
Volume: 4956
ISSN: 0302-9743
ISBN: 3-540-78645-7
Citations: 13
PageRank: 0.78
References: 14
Authors
2


Name	Order	Citations	PageRank
Roi Blanco	1	872	57.42
Alvaro Barreiro	2	226	22.42
