Abstract | ||
---|---|---|
This paper extends the state-of-the-art probabilistic model BM25 to utilize term proximity from a new perspective. Most previous work considers only dependencies between pairs of terms and regards phrases as additional independent evidence. It is difficult to estimate the importance of a phrase and its extra contribution to a relevance score, because the phrase overlaps with its component terms. This paper proposes a new approach. First, query terms are grouped locally into non-overlapping phrases that may contain one or more query terms. Second, these phrases are not scored independently but are instead treated as providing a context for the component query terms. The relevance contribution of a term occurrence is measured by how many query terms occur in the context phrase and how compact they are. Third, we replace term frequency with the accumulated relevance contribution. Consequently, term proximity is easily integrated into the probabilistic model. Experimental results on the TREC-10 and TREC-11 collections show stable improvements in average precision and significant improvements in precision at top ranks. |
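
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the grouping rule or the contribution formula, so everything specific here is an assumption made for illustration. In particular, the `max_gap` threshold, the rule that a repeated term starts a new phrase, the contribution form `n**x / width**z`, the exponents `x` and `z`, and all function names are hypothetical. Only the final scoring step uses the standard BM25 saturation form, with the accumulated contribution substituted for term frequency.

```python
from collections import defaultdict


def find_hits(doc_tokens, query_terms):
    """Return (position, term) pairs for every query-term occurrence."""
    return [(i, tok) for i, tok in enumerate(doc_tokens) if tok in query_terms]


def group_into_spans(hits, max_gap=3):
    """Step 1 (assumed rule): split hits into non-overlapping phrases.

    A new phrase starts when the gap to the previous hit exceeds max_gap
    or when the term already occurs in the current phrase.
    """
    spans, current, seen = [], [], set()
    for pos, term in hits:
        if current and (pos - current[-1][0] > max_gap or term in seen):
            spans.append(current)
            current, seen = [], set()
        current.append((pos, term))
        seen.add(term)
    if current:
        spans.append(current)
    return spans


def accumulate_contributions(spans, x=2.0, z=1.0):
    """Steps 2 and 3 (assumed formula): sum per-term contributions.

    Each occurrence contributes n**x / width**z, where n is the number of
    distinct query terms in its phrase and width is the phrase length in
    tokens. A single-term phrase contributes 1, like plain term frequency.
    """
    rc = defaultdict(float)
    for span in spans:
        n_terms = len({term for _, term in span})
        width = span[-1][0] - span[0][0] + 1
        contribution = (n_terms ** x) / (width ** z)
        for _, term in span:
            rc[term] += contribution
    return dict(rc)


def bm25_with_contributions(rc, idf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25-style score with accumulated contribution in place of tf."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return sum(idf[t] * (k1 + 1.0) * v / (norm + v) for t, v in rc.items())


def score_document(doc_tokens, query_terms, idf, avg_doc_len):
    """Run the whole pipeline for one tokenized document."""
    spans = group_into_spans(find_hits(doc_tokens, query_terms))
    rc = accumulate_contributions(spans)
    return bm25_with_contributions(rc, idf, len(doc_tokens), avg_doc_len)


if __name__ == "__main__":
    query = {"a", "b"}
    idf = {"a": 1.0, "b": 1.0}  # made-up weights, for illustration only
    near = "a b x x x x x x".split()
    far = "a x x x x x x b".split()
    print(score_document(near, query, idf, avg_doc_len=8.0))
    print(score_document(far, query, idf, avg_doc_len=8.0))
```

Under these assumptions, the document with the two query terms adjacent receives a higher score than the equally long document with the terms far apart, because each occurrence in the compact two-term phrase contributes more than an isolated occurrence does.
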
Year | DOI | Venue |
---|---|---|
2008 | 10.1007/978-3-540-78646-7_32 | ECIR |
Keywords | Field | DocType |
component query term,component term,term occurrence,context phrase,non-overlapping phrase,viewing term proximity,query term,term frequency,extra contribution,different perspective,relevance contribution,term proximity,inverse document frequency,probabilistic model | Data mining,Information retrieval,Computer science,Markov random field,Phrase,Statistical model,Language model | Conference |
Volume | ISSN | ISBN |
4956 | 0302-9743 | 3-540-78645-7 |
Citations | PageRank | References |
35 | 1.53 | 20 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ruihua Song | 1 | 1138 | 59.33 |
Michael J. Taylor | 2 | 749 | 41.75 |
Ji-Rong Wen | 3 | 4431 | 265.98 |
Hsiao-Wuen Hon | 4 | 1719 | 354.37 |
Yong Yu | 5 | 7637 | 380.66 |