Title
Mixture Models and Frequent Sets: Combining Global and Local Methods for 0-1 Data
Abstract
We study the interaction between global and local techniques in data mining. Specifically, we study the collections of frequent sets in clusters produced by probabilistic clustering with mixtures of Bernoulli models. That is, we first analyze 0-1 datasets with a global technique (probabilistic clustering using the EM algorithm) and then perform a local analysis (discovery of frequent sets) in each of the clusters. The results indicate that using clustering as a preliminary phase in finding frequent sets produces clusters that have significantly different collections of frequent sets. We also test the significance of the differences between the frequent set collections of the different clusters by obtaining estimates of the underlying joint density. To get from the local patterns in each cluster back to distributions, we use the maximum entropy technique [17] to obtain a local model for each cluster, and then combine these local models into a mixture model. The combined model gives clear improvements in approximation quality over using either the mixture model or the maximum entropy model alone.
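The two-phase analysis described in the abstract can be illustrated with a minimal sketch: EM for a mixture of independent Bernoulli components, followed by level-wise frequent set discovery inside each cluster. This is not the authors' implementation. The function names, the toy data, the smoothing constants, the deterministic initialisation from data rows, and the 0.7 support threshold are all assumptions made for illustration, and the maximum entropy step is omitted.

```python
import math
from itertools import combinations


def em_bernoulli_mixture(data, k, iters=30):
    """Fit a k-component Bernoulli mixture with EM and return hard labels."""
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Assumption: each component starts from an evenly spaced data row,
    # softened to 0.25/0.75 so that no parameter is exactly 0 or 1.
    theta = [[0.75 if v else 0.25 for v in data[j * n // k]] for j in range(k)]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibilities, computed in log space.
        resp = []
        for x in data:
            logp = []
            for j in range(k):
                ll = math.log(weights[j])
                for i in range(d):
                    ll += math.log(theta[j][i] if x[i] else 1.0 - theta[j][i])
                logp.append(ll)
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights and parameters with light smoothing.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + 1e-6) / (n + k * 1e-6)
            for i in range(d):
                hits = sum(r[j] * x[i] for r, x in zip(resp, data))
                theta[j][i] = (hits + 0.01) / (nj + 0.02)
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def frequent_sets(rows, min_support, max_size=3):
    """Level-wise search for itemsets with relative frequency >= min_support."""
    if not rows:
        return {}
    d = len(rows[0])
    result = {}
    for size in range(1, max_size + 1):
        found = False
        for items in combinations(range(d), size):
            # Prune candidates that have an infrequent (size-1)-subset.
            if size > 1 and any(
                sub not in result for sub in combinations(items, size - 1)
            ):
                continue
            support = sum(all(r[i] for i in items) for r in rows) / len(rows)
            if support >= min_support:
                result[items] = support
                found = True
        if not found:
            break
    return result


# Hypothetical toy data: two groups of rows over six 0-1 attributes.
data = (
    [[1, 1, 1, 0, 0, 0]] * 20 + [[1, 1, 0, 0, 0, 0]] * 5
    + [[0, 0, 0, 1, 1, 1]] * 20 + [[0, 0, 0, 0, 1, 1]] * 5
)
labels = em_bernoulli_mixture(data, k=2)
global_sets = frequent_sets(data, min_support=0.7)
cluster_sets = {
    c: frequent_sets([x for x, l in zip(data, labels) if l == c], 0.7)
    for c in sorted(set(labels))
}
```

On this toy data no itemset reaches the threshold in the full dataset, while each cluster yields its own collection of frequent sets, which is the kind of difference between clusters that the abstract reports. Real data would typically need several random restarts of EM and a more efficient frequent set miner.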
Year
2003
Venue
SIAM Proceedings Series
Keywords
maximum entropy, maximum entropy model, mixture model, data mining, EM algorithm
Field
Data mining, Maximum-entropy Markov model, Pattern recognition, Expectation–maximization algorithm, Computer science, Binary entropy function, Artificial intelligence, Principle of maximum entropy, Binary data, Cluster analysis, Local analysis, Mixture model
DocType
Conference
Citations
10
PageRank
0.69
References
12
Authors
3
Name                 Order    Citations    PageRank
Jaakko Hollmén       1        272          30.98
Jouni K. Seppänen    2        124          9.09
Heikki Mannila       3        6595         1495.69