Title
Scalable computation of distributions from large scale data sets
Abstract
As we approach the era of exascale computing, the role of distributions to summarize, analyze and visualize large scale data is becoming more and more important. Since histograms continue to be a popular way of modeling the underlying data distribution, we propose a scalable and distributed framework for computing histograms from scalar and vector data at different levels of detail required by various types of analysis algorithms. We present efficient parallel techniques for histogram computation from regular as well as rectilinear grid data. We also study a technique called cross-validation to estimate the quality of computed histograms as a model of the actual data distribution. We parallelize cross-validation in a scalable manner to support histogram evaluation and selection of histogram parameters such as number of bins. We also present our distributed software framework for supporting science applications which require large scale distribution-based data analysis. The presented case studies highlight how the proposed algorithms and the related software benefit information theoretic and other distribution-driven analysis.
Year
2012
DOI
10.1109/LDAV.2012.6378985
Venue
Large Data Analysis and Visualization
Keywords
data analysis, data visualisation, information theory, parallel processing, cross-validation, data distribution, distributed software framework, distribution scalable computation, distribution-based data analysis, distribution-driven analysis, exascale computing, histogram computation, histogram evaluation, histogram parameter estimation, information theoretic analysis, large scale data analysis, large scale data sets, large scale data summarization, large scale data visualization, parallel techniques, rectilinear grid data, regular grid data, scalar data, vector data
Field
Exascale computing, Information theory, Histogram, Data mining, Data visualization, Data set, Computer science, Software, Grid, Scalability
DocType
Conference
ISBN
978-1-4673-4732-7
Citations
5
PageRank
0.45
References
0
Authors
8
Name              Order  Citations  PageRank
Abon Chaudhuri    1      52         4.49
Teng-Yok Lee      2      324        19.31
Bo Zhou           3      58         11.39
Cong Wang         4      5          0.45
Tiantian Xu       5      29         2.48
Han-Wei Shen      6      2204       148.60
Tom Peterka       7      531        49.78
Yi-jen Chiang     8      503        38.21