Using the past to score the present: extending term weighting models through revision history analysis - Citegraph

Paper Info

Title
Using the past to score the present: extending term weighting models through revision history analysis

Abstract
The generative process underlies many information retrieval models, notably statistical language models. Yet these models only examine one (current) version of the document, effectively ignoring the actual document generation process. We posit that a considerable amount of information is encoded in the document authoring process, and this information is complementary to the word occurrence statistics upon which most modern retrieval models are based. We propose a new term weighting model, Revision History Analysis (RHA), which uses the revision history of a document (e.g., the edit history of a page in Wikipedia) to redefine term frequency - a key indicator of document topic/relevance for many retrieval models and text processing tasks. We then apply RHA to document ranking by extending two state-of-the-art text retrieval models, namely, BM25 and the generative statistical language model (LM). To the best of our knowledge, our paper is the first attempt to directly incorporate document authoring history into retrieval models. Empirical results show that RHA provides consistent improvements for state-of-the-art retrieval models, using standard retrieval tasks and benchmarks.
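As a rough illustration of the idea in the abstract, the Python sketch below shows one way a revision-aware term frequency could be fed into a BM25 term score. It is not the formula from the paper: the function names, the rank-based revision weights (`decay`), the interpolation weight (`mix`), and the `log(1 + ...)` IDF variant are all assumptions made for illustration.

```python
import math


def revision_weighted_tf(revisions, term, decay=1.0, mix=0.5):
    """Blend a term's count in the current version with its revision history.

    revisions: list of token lists, oldest first; the last one is the
    current version. Both `decay` and `mix` are hypothetical parameters.
    """
    current_tf = revisions[-1].count(term)
    # Assumed weighting: revision j (1-indexed, oldest first) gets weight
    # j**decay, so later revisions contribute more than early ones.
    weights = [(j + 1) ** decay for j in range(len(revisions))]
    history_tf = sum(
        w * rev.count(term) for w, rev in zip(weights, revisions)
    ) / sum(weights)
    # Interpolate between the historical and the current term frequency.
    return mix * history_tf + (1 - mix) * current_tf


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Standard BM25 score for one term, accepting a (possibly fractional) tf."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Toy revision history of one page, oldest first.
revs = [["a", "b"], ["a", "a", "b"], ["a", "a", "a", "c"]]
tf_a = revision_weighted_tf(revs, "a")
score_a = bm25_term_score(tf_a, doc_len=4, avg_doc_len=5, n_docs=100, doc_freq=10)
```

With `mix=0` the function falls back to the plain term count of the current version, so the unmodified BM25 score is recovered as a special case. A term that appeared in earlier revisions but was later removed (here `"b"`) still receives a small non-zero weight.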

Year	DOI	Venue	DocType
2010	10.1145/1871437.1871519	CIKM	Conference

Citations	References	Authors	PageRank
17	30	4	0.84

Keywords
modern retrieval model, standard retrieval task, document topic, term weighting model, actual document generation process, revision history, generative process, information retrieval model, revision history analysis, state-of-the-art text retrieval model, state-of-the-art retrieval model, retrieval model, information retrieval, term frequency

Authors (4 rows)

Name	Order	Citations	PageRank
Ablimit Aji	1	277	14.26
Yu Wang	2	138	6.99
Eugene Agichtein	3	4549	269.70
Evgeniy Gabrilovich	4	4573	224.48
