Title
P-DOT: A model of computation for big data
Abstract
In response to the high demand for big data analytics, several programming models for large, distributed cluster systems have been proposed and implemented, such as MapReduce, Dryad and Pregel. However, compared with the high performance computing field, the basis and principles of the computation and communication behavior of big data analytics are not well studied. In this paper, we review the current big data computational models, DOT and DOTA, and propose a more general and practical model, p-DOT (p-phases DOT). p-DOT is not a simple extension but has profound significance: in generality, any big data analytics job execution expressed in the DOT model or the BSP model can be represented by it; in practice, it accounts for I/O behavior when evaluating performance overhead. Moreover, we provide a cost function implying that, for a fixed algorithm and workload, the optimal number of machines is nearly linear in the square root of the input size, and we demonstrate the effectiveness of the function through several experiments.
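The abstract states the scaling result but does not give the cost function itself. As a hedged illustration of how a square-root law of this kind can arise, the Python sketch below assumes a hypothetical toy cost T(n) = αN/n + βn, where N is the input size, n the number of machines, α a per-unit compute cost and β a per-machine coordination overhead. This is not the p-DOT cost function from the paper; the names `toy_cost` and `optimal_machines` and all constants are illustrative assumptions. Setting dT/dn = 0 gives n* = √(αN/β), so quadrupling N doubles the optimal machine count.

```python
import math


def toy_cost(n_machines: int, input_size: float, alpha: float, beta: float) -> float:
    """Hypothetical job time (an assumption, not the paper's model):
    per-machine compute shrinks as alpha * N / n, while
    coordination overhead grows linearly as beta * n."""
    return alpha * input_size / n_machines + beta * n_machines


def optimal_machines(input_size: float, alpha: float, beta: float) -> int:
    """Return the integer machine count minimizing toy_cost."""
    # Continuous optimum from dT/dn = -alpha*N/n^2 + beta = 0.
    n_star = math.sqrt(alpha * input_size / beta)
    # T(n) is convex for n > 0, so the best integer is floor or ceil of n_star.
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    return min(candidates, key=lambda n: toy_cost(n, input_size, alpha, beta))


if __name__ == "__main__":
    # Each fourfold increase in N doubles the optimum under this toy model.
    for N in (1e4, 4e4, 1.6e5, 6.4e5):
        n = optimal_machines(N, alpha=1.0, beta=1.0)
        print(f"N={N:>9.0f}  optimal n={n}")
```

Because the toy cost is convex in n, checking only the floor and ceiling of the continuous optimum is enough to find the best integer machine count; a real cost model with I/O and multiple phases, as p-DOT describes, would need its own derivation.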
Year: 2013
DOI: 10.1109/BigData.2013.6691551
Venue: International Journal of Parallel, Emergent and Distributed Systems
Keywords: p-phases dot, p-dot, parallel processing, big data, i/o behavior, bsp model, distributed cluster systems, data analysis, bulk synchronous parallel model, big data computational model, mapreduce, pregel, programming models, dryad, high performance computing areas, big data analytics, dota, computational model, distributed system, cost function
Field: Data mining, Programming with Big Data in R, Computer science, Theoretical computer science, Model of computation, Artificial intelligence, Computation, Programming paradigm, Supercomputer, Workload, Square root, Big data, Machine learning
DocType: Conference
Volume: 31
Issue: 3
ISSN: 1744-5760
Citations: 2
PageRank: 0.39
References: 12
Authors: 4
Name           Order  Citations  PageRank
Tao Luo        1      3          0.74
Yin Liao       2      2          1.06
Guoliang Chen  3      305        46.48
Yunquan Zhang  4      327        43.92