Title
Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication
Abstract
Sparse times dense matrix multiplication (SpMM) finds applications in well-established fields such as computational linear algebra as well as emerging fields such as graph neural networks. In this study, we evaluate the performance of various techniques for performing SpMM as a distributed computation across many nodes, focusing on GPU accelerators. We examine how the actual local computational performance of state-of-the-art SpMM implementations affects computational efficiency as dimensions change when we scale to large numbers of nodes, which proves to be an unexpectedly important bottleneck. We also consider various distribution strategies, including A-Stationary, B-Stationary, and C-Stationary algorithms, 1.5D and 2D algorithms, and RDMA-based and bulk synchronous methods of data transfer. Our results show that the best choice of algorithm and implementation technique depends not only on the cost of communication for particular matrix sizes and dimensions, but also on the performance of local SpMM operations. Our evaluations reveal that with the involvement of GPU accelerators, the best design choices for SpMM differ from the conventional algorithms that are known to perform well for dense matrix-matrix or sparse matrix-sparse matrix multiplication.
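To make the operation concrete: the abstract concerns SpMM, C = A · B, where A is sparse and B is a tall-skinny dense matrix. The following is an illustrative serial sketch of a local CSR-based SpMM kernel, not the paper's distributed implementation; all function and variable names here are hypothetical.

```python
# Illustrative sketch (not from the paper): serial SpMM, C = A @ B,
# with A stored in CSR format and B a tall-skinny dense matrix.
def spmm_csr(indptr, indices, data, B, n_rows):
    """Multiply a CSR sparse matrix (indptr, indices, data) by dense B."""
    n_cols = len(B[0])  # B is tall-skinny: few columns
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Iterate over the nonzeros of row i of A.
        for k in range(indptr[i], indptr[i + 1]):
            j, a = indices[k], data[k]
            # Accumulate a * (row j of B) into row i of C.
            for c in range(n_cols):
                C[i][c] += a * B[j][c]
    return C
```

In the distributed algorithms the paper compares, each node would apply such a local kernel (in practice a tuned GPU library routine) to the sub-blocks of A and B it holds, with the stationary-matrix choice dictating which operand stays put and which is communicated.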
Year
2021
DOI
10.1145/3447818.3461472
Venue
ICS
DocType
Conference
Citations
2
PageRank
0.38
References
0
Authors
6
Name                 Order  Citations  PageRank
Oguz Selvitopi       1      2          1.73
Benjamin Brock       2      8          2.30
Israt Nisa           3      2          0.38
Alok Tripathy        4      2          0.38
Katherine A. Yelick  5      3494       407.23
Aydin Buluc          6      1057       67.49