Title
Low-Entropy Set Selection
Abstract
Most pattern discovery algorithms easily generate very large numbers of patterns, making the results impossible to understand and hard to use. Recently, the problem of instead selecting a small subset of informative patterns from a large collection of patterns has attracted a lot of interest. In this paper we present a succinct way of representing data on the basis of itemsets that identify strong interactions. This new approach, LESS, provides a more powerful and more general technique for data description than existing approaches. Low-entropy sets consider the data symmetrically and as such identify strong interactions between attributes, not just between items that are present. Selection of these patterns is executed through the MDL criterion. This results in only a handful of sets that together form a compact lossless description of the data. By using entropy-based elements for the data description, we can successfully apply the maximum likelihood principle to locally cover the data optimally. Further, it allows for a fast, natural and well-performing heuristic. Based on these approaches we present two algorithms that provide high-quality descriptions of the data in terms of strongly interacting variables. Experiments on these methods show that high-quality results are mined: very small pattern sets are returned that are easily interpretable and understandable descriptions of the data, and can be straightforwardly visualized. Swap randomization experiments and high compression ratios show that they capture the structure of the data well.
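As a rough illustration of what "low entropy" means for a set of attributes, the sketch below computes the empirical joint entropy of a few columns in a toy binary dataset. It is not the LESS algorithm and does not perform MDL-based selection; the function name `joint_entropy` and the toy rows are assumptions made for this example only.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, attrs):
    """Empirical joint entropy (in bits) of the columns `attrs` over `rows`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Count how often each combination of values occurs on the chosen columns.
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Toy binary data (an assumption): columns 0 and 1 always agree,
# while column 2 varies independently of column 0.
rows = [
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 0),
    (0, 0, 1),
]

h_pair = joint_entropy(rows, (0, 1))   # columns that interact strongly
h_other = joint_entropy(rows, (0, 2))  # columns with no interaction
print(h_pair, h_other)
```

In this toy data the pair of columns 0 and 1 has lower joint entropy than the pair 0 and 2, because the two columns agree both when the items are present (1, 1) and when they are absent (0, 0). This is the symmetric treatment of the data that the abstract contrasts with looking only at items that are present.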
Year
2009
Venue
SDM
Keywords
maximum likelihood principle, low-entropy sets, dense data, pattern subset selection, MDL, compression ratio, randomized experiment
Field
Heuristic, Maximum likelihood principle, Pattern recognition, Computer science, Compression ratio, Artificial intelligence, Swap (finance), Data description, Lossless compression
DocType
Conference
Citations
10
PageRank
0.67
References
12
Authors
4
Name                 Order   Citations   PageRank
Hannes Heikinheimo   1       55          3.54
Jilles Vreeken       2       977         62.94
Arno Siebes          3       902         102.05
Heikki Mannila       4       6595        1495.69