How Informative is a Term?: Dispersion as a measure of Term Specificity. - Citegraph

Paper Info

Title
How Informative is a Term?: Dispersion as a measure of Term Specificity.

Abstract
Similarity functions assign scores to documents in response to queries. These functions require as input statistics about the terms in the queries and documents, where the intention is that the statistics are estimates of the relative informativeness of the terms. Common measures of informativeness use the number of documents containing each term (the document frequency) as a key measure. We argue in this paper that the distribution of within-document frequencies across a collection is also pertinent to informativeness, a measure that has not been considered in prior work: the most informative words tend to be those whose frequency of occurrence has high variance. We propose use of relative standard deviation (RSD) as a measure of variability incorporating within-document frequencies, and show that RSD compares favourably with inverse document frequency (IDF), in both in-principle analysis and in practice in retrieval, with small but consistent gains.
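The contrast the abstract draws between IDF and RSD can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so the code below rests on assumptions: RSD is taken as the population standard deviation of a term's within-document frequency divided by its mean, computed over every document in the collection (documents lacking the term count as zero), and IDF is taken as the natural log of N/df. The function name `term_stats` and the toy collection are hypothetical.

```python
import math
from collections import Counter


def term_stats(docs):
    """Return {term: (idf, rsd)} for a list of tokenised documents.

    Assumed definitions (not taken from the paper):
      idf = ln(N / df)
      rsd = population std dev of within-document frequency / its mean,
            with zero counts included for documents lacking the term.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    vocab = set().union(*counts)
    stats = {}
    for term in vocab:
        freqs = [c[term] for c in counts]          # Counter yields 0 if absent
        df = sum(1 for f in freqs if f > 0)        # document frequency
        mean = sum(freqs) / n
        var = sum((f - mean) ** 2 for f in freqs) / n
        stats[term] = (math.log(n / df), math.sqrt(var) / mean)
    return stats


# Hypothetical toy collection: "cat" and "dog" each occur in two documents,
# but "dog" is concentrated heavily in one of them.
docs = [
    ["the", "cat"] + ["dog"] * 5,
    ["the", "cat", "dog"],
    ["the"],
    ["the"],
]

for term, (idf, rsd) in sorted(term_stats(docs).items()):
    print(f"{term}: idf={idf:.3f} rsd={rsd:.3f}")
```

In this toy collection "cat" and "dog" have the same document frequency and therefore the same IDF, yet "dog" receives the higher RSD because its occurrences are spread less evenly. That is the kind of distinction the abstract argues document frequency alone cannot make.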

Year	2016
DOI	10.1145/2911451.2914687
Venue	SIGIR
Field	Dispersion (optics), Data mining, tf–idf, Computer science, Statistics, Relative standard deviation
DocType	Conference
Citations	0
PageRank	0.34
References	6
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Rodney McDonell	1	0	0.34
Justin Zobel	2	1	1.11
Bodo Billerbeck	3	272	14.24
