Title
Parallel Fast Multipole Method Accelerated FFT on HPC Clusters
Abstract
As distributed systems grow, so does the risk of communication bottlenecks, and the past decade has seen growing interest in communication-avoiding algorithms. The distributed-memory Fast Fourier Transform (FFT) is an important algorithm that suffers from major communication bottlenecks. In this work, we examine FMM-FFT, an existing communication-avoiding alternative to the FFT that uses the Fast Multipole Method (FMM) to reduce communication to a single all-to-all exchange. We present a detailed implementation of FMM-FFT that relies on modern libraries and demonstrate it on two distinct distributed-memory architectures: a traditional Intel Xeon based HPC cluster and a Beowulf cluster. We show that while FMM-FFT is significantly slower than FFT on the traditional HPC cluster, on the Beowulf cluster it outperforms standard FFT, consistently achieving speedups of 1.5x or more over FFTW. We then show how the communication-to-computation cost metric helps explain the performance of FMM-FFT relative to standard FFT. The source code for this work is being made publicly available on GitHub under a permissive open source licence.
Year
2021
DOI
10.1016/j.parco.2021.102783
Venue
Parallel Computing
Keywords
Fast Fourier Transform, Fast Multipole Method, Beowulf cluster, Communication avoiding algorithms, Parallel programming, High performance computing
DocType
Journal
Volume
104
ISSN
0167-8191
Citations
0
PageRank
0.34
References
0
Authors
4
Name               Order  Citations  PageRank
Chahak Mehta       1      0          0.34
Amarnath Karthi    2      0          0.34
Vishrut Jetly      3      0          0.34
Bhaskar Chaudhury  4      1          2.78