Model-based clustering of high-dimensional data: Variable selection versus facet determination

Paper Info

Title
Model-based clustering of high-dimensional data: Variable selection versus facet determination

Abstract
Variable selection is an important problem for cluster analysis of high-dimensional data. It is also a difficult one. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case the effort to find one subset of attributes that presumably gives the "best" clustering may be misguided. It makes more sense to identify various facets of a data set (each being based on a subset of attributes), cluster the data along each one, and present the results to the domain experts for appraisal and selection. In this paper, we propose a generalization of the Gaussian mixture models and demonstrate its ability to automatically identify natural facets of data and cluster data along each of those facets simultaneously. We present empirical results to show that facet determination usually leads to better clustering results than variable selection.

Year	DOI	Venue
2013	10.1016/j.ijar.2012.08.001	Int. J. Approx. Reasoning
Keywords	Field	DocType
empirical result,cluster data,model-based clustering,high-dimensional data,gaussian mixture model,facet determination,clustering result,class information,domain expert,cluster analysis,variable selection,gaussian mixture models	Data mining,Clustering high-dimensional data,Pattern recognition,Feature selection,Facet (geometry),Artificial intelligence,Cluster analysis,Mathematics,Mixture model,Machine learning	Journal
Volume	Issue	ISSN
54	1	0888-613X
Citations	PageRank	References
7	0.53	29
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Leonard K. M. Poon	1	94	10.96
Nevin L. Zhang	2	895	97.21
Tengfei Liu	3	92	7.09
Tengfei Liu	4	488	34.13
