Title
Evaluation of Interestingness Measures for Ranking Discovered Knowledge
Abstract
When mining a large database, the number of patterns discovered can easily exceed a human user's capacity to identify the interesting results. To address this problem, various techniques have been proposed to reduce and/or order the patterns before presenting them to the user. In this paper, we focus on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. These measures have previously been used in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen measures. Empirical results, obtained using synthetic data, show that the distributions of index values tend to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain insight into the behaviour that can be expected from each of the measures in practice.
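The abstract does not list the thirteen measures or say how a summary is encoded, so the following is only a minimal sketch under stated assumptions. A summary is represented as a hypothetical list of tuple counts (one count per generalized tuple), and two textbook diversity measures, Simpson's concentration index (ecology) and Shannon entropy (information theory), stand in as examples; whether they match the paper's exact formulations is an assumption. The names `Summary`, `simpson`, `shannon`, and `rank_by_simpson` are illustrative, and treating a more concentrated (less uniform) distribution as more interesting is also an assumption.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical encoding (assumption): a summary is the list of counts of its
# generalized tuples; only the shape of the distribution matters here.
Summary = List[int]


def probabilities(counts: Summary) -> List[float]:
    """Convert raw tuple counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("a summary must contain at least one positive count")
    return [c / total for c in counts]


def simpson(counts: Summary) -> float:
    """Simpson's concentration index, sum of p_i^2.

    Equals 1/m for a uniform distribution over m tuples and 1.0 when a
    single tuple holds all the mass.
    """
    return sum(p * p for p in probabilities(counts))


def shannon(counts: Summary) -> float:
    """Shannon entropy in bits.

    0 for a fully concentrated distribution, log2(m) for a uniform one,
    so lower entropy means a less diverse summary.
    """
    return -sum(p * math.log2(p) for p in probabilities(counts) if p > 0)


def rank_by_simpson(summaries: Dict[str, Summary]) -> List[Tuple[str, float]]:
    """Rank summaries from most to least concentrated (highest index first)."""
    scored = [(name, simpson(counts)) for name, counts in summaries.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the paper.
    summaries = {
        "uniform": [25, 25, 25, 25],
        "skewed": [70, 10, 10, 10],
        "two_tuple": [50, 50],
    }
    for name, score in rank_by_simpson(summaries):
        print(f"{name:10s} simpson={score:.3f} shannon={shannon(summaries[name]):.3f}")
```

With these illustrative counts, the ranking places `skewed` (0.520) ahead of `two_tuple` (0.500) and `uniform` (0.250). Both functions depend only on the multiset of probabilities, so reordering the tuples of a summary does not change its score.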
Year
2001
DOI
10.1007/3-540-45357-1_28
Venue
PAKDD
Keywords
different way, thirteen diversity, empirical result, thirteen diversity measure, interestingness measures, human user, various discipline, middle index value, discovered knowledge, ranking summary, heuristic measure, various technique
Field
Data mining, Normal distribution, Computer science, Synthetic data, Artificial intelligence, Hierarchy, Information theory, Heuristic, Information retrieval, Ranking, Association rule learning, Knowledge extraction, Machine learning
DocType
Conference
ISBN
3-540-41910-1
Citations
49
PageRank
4.47
References
16
Authors
2
Name                 Order  Citations  PageRank
Robert J. Hilderman  1      270        29.86
Howard J. Hamilton   2      1501       145.55