Abstract | ||
---|---|---|
We study data mining where the task is description by summarization, the representation language is generalized relations, the evaluation criteria are based on heuristic measures of interestingness, and the method for searching is the Multi-Attribute Generalization algorithm for domain generalization graphs. We present and empirically compare four heuristics for ranking the interestingness of generalized relations (or summaries). The measures are based on common measures of the diversity of a population: statistical variance, the Simpson index, and the Shannon index. All four measures rank less complex summaries (i.e., those with few tuples and/or non-ANY attributes) as most interesting. Highly ranked summaries provide a reasonable starting point for further analysis of discovered knowledge. |
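
The diversity measures named in the abstract can be illustrated with a minimal sketch. It assumes a summary is represented only by the list of counts attached to its tuples, and it uses textbook forms of the variance, Simpson, and Shannon measures; the function names are hypothetical, and the paper's four heuristics may be defined differently.

```python
import math


def proportions(counts):
    """Convert raw tuple counts into proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def variance_measure(counts):
    """Sample variance of the proportions around the uniform value 1/m.

    Assumption: textbook sample variance, not necessarily the paper's form.
    """
    p = proportions(counts)
    m = len(p)
    if m < 2:
        return 0.0  # a single tuple has no spread
    mean = 1.0 / m
    return sum((x - mean) ** 2 for x in p) / (m - 1)


def simpson_index(counts):
    """Simpson index: sum of squared proportions (higher = more concentrated)."""
    return sum(x * x for x in proportions(counts))


def shannon_index(counts):
    """Shannon index in bits: -sum(p * log2(p)) (higher = more even)."""
    return -sum(x * math.log2(x) for x in proportions(counts) if x > 0)
```

For example, a two-tuple summary with counts `[3, 1]` is more concentrated than one with counts `[2, 2]`, so it gets a larger variance and Simpson value and a smaller Shannon value. How such scores map to an interestingness ranking is a design choice this sketch does not cover.
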
Year | Venue | Keywords |
---|---|---|
1999 | FLAIRS Conference | data mining systems, indexation, data mining |
Field | DocType | ISBN |
Population, Data mining, Computer science, Heuristics, Artificial intelligence, Natural language processing, Automatic summarization, Graph, Heuristic, Information retrieval, Ranking, Tuple, Representation language | Conference | 1-57735-080-4 |
Citations | PageRank | References |
8 | 2.91 | 16 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Robert J. Hilderman | 1 | 270 | 29.86 |
Howard J. Hamilton | 2 | 1501 | 145.55 |
Brock Barber | 3 | 86 | 9.48 |