Title
Exploiting Matrix Dependency for Efficient Distributed Matrix Computation
Abstract
Distributed matrix computation is a popular approach for many large-scale data analysis and machine learning tasks. However, existing distributed matrix computation systems generally incur heavy communication costs at runtime, which degrades overall performance. In this paper, we propose a novel matrix computation system, named DMac, which exploits the matrix dependencies in matrix programs for efficient matrix computation in distributed environments. We decompose each matrix program into a sequence of operations and reveal the matrix dependencies between the operations in the program. We next design a dependency-oriented cost model to select an optimal execution strategy for each operation, and generate a communication-efficient execution plan for the matrix program. To facilitate matrix computation in distributed systems, we further divide the execution plan into multiple un-interleaved stages, which can run in a distributed cluster with an efficient local execution strategy on each worker. DMac has been implemented on Spark, a popular general-purpose data processing framework. The experimental results demonstrate that our techniques can significantly improve the performance of a wide range of matrix programs.
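To make the idea of a dependency-oriented cost model concrete, here is a toy Python sketch. It is not DMac's actual cost model, strategies, or API. The partition schemes (row, column, broadcast), the two strategies per operation, the cost constants, the assumed 4-worker cluster, and the rule that any data movement starts a new stage are all hypothetical simplifications. What it does show is the dependency effect: the layout produced by one operation is carried forward, so a later operation can reuse it without moving any data.

```python
from dataclasses import dataclass

# Hypothetical partition schemes for a distributed matrix (not DMac's actual API).
ROW, COL, BCAST = "row", "col", "broadcast"
NUM_WORKERS = 4  # assumed cluster size

@dataclass(frozen=True)
class Op:
    name: str      # human-readable label, e.g. "C = A * B"
    kind: str      # "matmul" or "add"
    inputs: tuple  # names of the input matrices
    output: str    # name of the output matrix

# Assumed strategies: (required input schemes, resulting output scheme).
STRATEGIES = {
    "matmul": [((ROW, BCAST), ROW), ((BCAST, COL), COL)],
    "add": [((ROW, ROW), ROW), ((COL, COL), COL)],
}

def comm_cost(current, required, size):
    """Toy cost of converting a matrix of `size` elements between schemes."""
    if current == required or current == BCAST:
        return 0                    # already in place, or sliced locally from a full copy
    if required == BCAST:
        return size * NUM_WORKERS   # every worker receives a full copy
    return size                     # a shuffle moves each element once

def plan(program, sizes, layout):
    """Pick the cheapest strategy per op, propagating output layouts to later ops."""
    layout = dict(layout)
    choices, stages, stage = [], [], []
    for op in program:
        best = None
        for required, out_scheme in STRATEGIES[op.kind]:
            cost = sum(comm_cost(layout[m], r, sizes[m])
                       for m, r in zip(op.inputs, required))
            if best is None or cost < best[0]:
                best = (cost, required, out_scheme)
        cost, required, out_scheme = best
        if cost > 0 and stage:          # data movement before this op starts a new stage
            stages.append(stage)
            stage = []
        stage.append(op.name)
        choices.append((op.name, cost))
        for m, r in zip(op.inputs, required):
            layout[m] = r
        layout[op.output] = out_scheme  # the dependency: later ops see this layout
    if stage:
        stages.append(stage)
    return choices, stages

program = [
    Op("C = A * B", "matmul", ("A", "B"), "C"),
    Op("D = C + E", "add", ("C", "E"), "D"),
    Op("F = D * G", "matmul", ("D", "G"), "F"),
]
sizes = {"A": 100, "B": 10, "C": 100, "D": 100, "E": 100, "F": 100, "G": 100}
layout = {"A": ROW, "B": ROW, "E": ROW, "G": COL}
choices, stages = plan(program, sizes, layout)
print(choices)  # [('C = A * B', 40), ('D = C + E', 0), ('F = D * G', 400)]
print(stages)   # [['C = A * B', 'D = C + E'], ['F = D * G']]
```

In this toy run, "D = C + E" costs nothing because C was already produced row-partitioned by the previous multiplication. Only the third operation forces data movement, so it is placed in a second stage. A real optimizer would weigh many more strategies and would use measured sizes rather than fixed constants.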
Year: 2015
DOI: 10.1145/2723372.2723712
Venue: ACM SIGMOD Conference
Keywords: matrix computing, dependency analysis, distributed system
Field: Data processing, Spark (mathematics), Distributed Computing Environment, Matrix (mathematics), Computer science, Parallel computing, Exploit, Numerical linear algebra, Database, Distributed computing
DocType: Conference
Citations: 9
PageRank: 0.47
References: 13
Authors: 3
Name           Order  Citations  PageRank
Lele Yu        1      70         6.93
Yingxia Shao   2      213        24.25
Bin Cui        3      1843       124.59