Title
A Parallel EM Algorithm for Model-Based Clustering Applied to the Exploration of Large Spatio-Temporal Data
Abstract
We develop a parallel expectation-maximization (EM) algorithm for multivariate Gaussian mixture models and use it to perform model-based clustering of a large climate dataset. Three variants of the EM algorithm are reformulated in parallel and a new variant that is faster is presented. All are implemented using the single program, multiple data programming model, which is able to take advantage of the combined collective memory of large distributed computer architectures to process larger datasets. Displays of the estimated mixture model rather than the data allow us to explore multivariate relationships in a way that scales to arbitrary size data. We study the performance of our methodology on simulated data and apply our methodology to a high-resolution climate dataset produced by the Community Atmosphere Model (CAM5). This article has supplementary material online.
Year: 2013
DOI: 10.1080/00401706.2013.826146
Venue: TECHNOMETRICS
Keywords: Parallel computing, Parallel coordinate plot, Spatial time series, Unsupervised learning
Field: Data mining, Programming paradigm, Expectation–maximization algorithm, Unsupervised learning, Multivariate normal distribution, Temporal database, Parallel coordinates, Cluster analysis, Statistics, Mathematics, Mixture model
DocType: Journal
Volume: 55
Issue: 4
ISSN: 0040-1706
Citations: 1
PageRank: 0.36
References: 7
Authors: 5
Name               Order  Citations  PageRank
Wei-Chen Chen      1      12         1.78
George Ostrouchov  2      142        18.13
Dave Pugmire       3      152        18.62
Prabhat            4      456        34.79
Michael Wehner     5      100        13.13