Title
Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication
Abstract
Sparse times dense matrix multiplication (SpMM) finds applications in well-established fields such as computational linear algebra as well as emerging fields such as graph neural networks. In this study, we evaluate the performance of various techniques for performing SpMM as a distributed computation across many nodes, focusing on GPU accelerators. We examine how the actual local computational performance of state-of-the-art SpMM implementations affects computational efficiency as dimensions change when we scale to large numbers of nodes, which proves to be an unexpectedly important bottleneck. We also consider various distribution strategies, including A-Stationary, B-Stationary, and C-Stationary algorithms, 1.5D and 2D algorithms, and RDMA-based and bulk synchronous methods of data transfer. Our results show that the best choice of algorithm and implementation technique depends not only on the cost of communication for particular matrix sizes and dimensions, but also on the performance of local SpMM operations. Our evaluations reveal that with the involvement of GPU accelerators, the best design choices for SpMM differ from the conventional algorithms that are known to perform well for dense matrix-matrix or sparse matrix-sparse matrix multiplication.
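To make the operation concrete: the abstract concerns SpMM, C = A · B, where A is sparse and B is a tall-skinny dense matrix. The following is an illustrative serial sketch of a local CSR-based SpMM kernel, not the paper's distributed implementation; all function and variable names here are hypothetical.

```python
# Illustrative sketch (not from the paper): serial SpMM, C = A @ B,
# with A stored in CSR format and B a tall-skinny dense matrix.
def spmm_csr(indptr, indices, data, B, n_rows):
    """Multiply a CSR sparse matrix (indptr, indices, data) by dense B."""
    n_cols = len(B[0])  # B is tall-skinny: few columns
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Iterate over the nonzeros of row i of A.
        for k in range(indptr[i], indptr[i + 1]):
            j, a = indices[k], data[k]
            # Accumulate a * (row j of B) into row i of C.
            for c in range(n_cols):
                C[i][c] += a * B[j][c]
    return C
```

In the distributed algorithms the paper compares, each node would apply such a local kernel (in practice a tuned GPU library routine) to the sub-blocks of A and B it holds, with the stationary-matrix choice dictating which operand stays put and which is communicated.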
Year
2021
DOI
10.1145/3447818.3461472
Venue
ICS
DocType
Conference
Citations
2
PageRank
0.38
References
0
Authors
6
Name                 Order  Citations  PageRank
Oguz Selvitopi       1      2          1.73
Benjamin Brock       2      8          2.30
Israt Nisa           3      2          0.38
Alok Tripathy        4      2          0.38
Katherine A. Yelick  5      3494       407.23
Aydin Buluc          6      1057       67.49