Abstract | ||
---|---|---|
We describe heuristics, based on information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique and can therefore be treated as a population described by a probability distribution. The four interestingness measures presented here are based on common measures of the diversity of a population: variance, the Simpson index, and the Shannon index. Each proposed measure assigns a summary a single real value that describes its interestingness. Our experimental results show that the ranks assigned by the four interestingness measures are highly correlated. |
Year | DOI | Venue |
---|---|---|
1999 | 10.1007/3-540-48912-6_28 | PAKDD |
Keywords | Field | DocType |
proposed measure, Shannon index, common measure, interestingness measure, information theory, single real value, discovered knowledge, probability distribution, Simpson index | Information theory, Data mining, Population, Ranking, Tuple, Computer science, Probability distribution, Heuristics, Knowledge extraction, Knowledge base | Conference |
ISBN | Citations | PageRank |
3-540-65866-1 | 9 | 1.11 |
References | Authors | |
11 | 2 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Robert J. Hilderman | 1 | 270 | 29.86 |
Howard J. Hamilton | 2 | 1501 | 145.55 |
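
The abstract names three diversity measures (variance, the Simpson index, and the Shannon index) without giving formulas. As a rough illustration only, the sketch below computes textbook versions of each from the tuple counts of a summary. The function names, the use of counts as input, and the exact formulas (in particular, measuring variance against a uniform distribution) are assumptions made for this example; the paper's own definitions may differ.

```python
import math


def proportions(counts):
    """Convert raw tuple counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def variance_measure(counts):
    """Sample variance of the proportions around the uniform value 1/m.

    Assumed formulation: sum((p_i - 1/m)^2) / (m - 1).
    """
    p = proportions(counts)
    m = len(p)
    if m < 2:
        return 0.0
    q = 1.0 / m
    return sum((pi - q) ** 2 for pi in p) / (m - 1)


def simpson_index(counts):
    """Simpson index: sum of squared proportions (1/m when uniform, 1 when concentrated)."""
    return sum(pi ** 2 for pi in proportions(counts))


def shannon_index(counts):
    """Shannon index in bits: -sum(p_i * log2(p_i)), skipping zero proportions."""
    return -sum(pi * math.log2(pi) for pi in proportions(counts) if pi > 0)


if __name__ == "__main__":
    # Two hypothetical summaries: one uniform, one skewed.
    for counts in ([1, 1, 1, 1], [7, 1, 1, 1]):
        print(counts,
              variance_measure(counts),
              simpson_index(counts),
              shannon_index(counts))
```

Under these assumed definitions, a uniform summary has zero variance, the lowest Simpson index, and the highest Shannon index, so ranking summaries by any one of the three tends to order them similarly, which is consistent with the high rank correlation reported in the abstract.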