Title
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms
Abstract
Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to the optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy, and hence more parallelism, in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix-matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly reduces the communication cost and improves the overall performance on large-scale platforms.
Year
2015
DOI
10.1007/s11227-014-1133-x
Venue
The Journal of Supercomputing
Keywords
Matrix Multiplication, Parallel Computing, Exascale Computing, Communication Cost, Grid5000, BlueGene, Hierarchy
Field
Exascale computing, Supercomputer, Computer science, Parallel algorithm, Parallel computing, Multiplication, Hierarchy, Matrix multiplication, Computing systems, Distributed computing
DocType
Journal
Volume
71
Issue
11
ISSN
0920-8542
Citations
11
PageRank
0.63
References
26
Authors
3
Name, Order, Citations, PageRank
Khalid Hasanov, 1, 28, 3.35
Jean-Noël Quintin, 2, 28, 4.11
Alexey Lastovetsky, 3, 763, 84.50