Abstract |
---|
Various information retrieval models have been studied for decades. Most traditional retrieval models are based on bag-of-term representations, and they model relevance based on various collection statistics. Despite these efforts, it seems that the performance of "bag-of-term" based retrieval functions has reached a plateau, and it becomes increasingly difficult to further improve retrieval performance. Thus, one important research question is whether we can provide any theoretical justification for the empirical performance bound of basic retrieval functions. In this paper, we start with single-term queries and aim to estimate the performance bound of retrieval functions that leverage only basic ranking signals such as document term frequency, inverse document frequency, and document length normalization. Specifically, we demonstrate that, when only single-term queries are considered, there is a general function that can cover many basic retrieval functions. We then propose to estimate the upper-bound performance of this function by applying a cost/gain analysis to search for the optimal value of the function. |
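The abstract refers to retrieval functions built from three basic signals: term frequency, inverse document frequency, and document length normalization. As a concrete illustration (not the paper's general function itself), the widely used BM25 formula combines exactly these signals for a single-term query; the sketch below assumes standard BM25 parameter defaults (`k1=1.2`, `b=0.75`):

```python
import math

def bm25_single_term(tf, df, N, dl, avgdl, k1=1.2, b=0.75):
    """Score one document for a single-term query using BM25.

    tf    -- term frequency of the query term in the document
    df    -- number of documents containing the term
    N     -- total number of documents in the collection
    dl    -- length of this document (in terms)
    avgdl -- average document length in the collection
    """
    # Inverse document frequency component (rare terms score higher).
    idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
    # Document length normalization (penalizes longer documents).
    norm = 1 - b + b * dl / avgdl
    # Saturating term-frequency component.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)
```

Functions of this shape are instances of the family the paper analyzes: the score is increasing and saturating in `tf`, increasing in IDF, and decreasing in normalized document length.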
Year | DOI | Venue |
---|---|---|
2016 | 10.1145/2970398.2970428 | ICTIR |
Field | DocType | Citations |
---|---|---|
Data mining, Learning to rank, Divergence-from-randomness model, Normalization (statistics), Ranking, Information retrieval, tf–idf, Computer science, Upper and lower bounds, Vector space model, Term Discrimination | Conference | 1 |
PageRank | References | Authors |
---|---|---|
0.36 | 10 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Peilin Yang | 1 | 100 | 12.00 |
Hui Fang | 2 | 918 | 63.03 |