Title
An Implementation of Parallel 1-D FFT on the K Computer
Abstract
In this paper, we propose an implementation of a parallel one-dimensional fast Fourier transform (FFT) on the K computer. The proposed algorithm is based on the six-step FFT algorithm, which can be altered into the recursive six-step FFT algorithm to reduce the number of cache misses. The recursive six-step FFT algorithm improves performance by utilizing the cache memory effectively. We use the recursive six-step FFT algorithm to implement the parallel one-dimensional FFT algorithm. The performance results of one-dimensional FFTs on the K computer are reported. We successfully achieved a performance of over 18 TFlops on 8192 nodes of the K computer (82944 nodes, 128 GFlops/node, 10.6 PFlops peak performance) for a 2^41-point FFT.
Year
DOI
Venue
2012
10.1109/HPCC.2012.53
HPCC-ICESS
Keywords
Field
DocType
pflops peak performance,41-point fft,k computer,performance result,cache memory,one-dimensional ffts,parallel one-dimensional fft algorithm,1-d fft,parallel one-dimensional,proposed algorithm,six-step fft algorithm,parallel processing,tflops,distributed databases,fast fourier transform,fast fourier transforms,indexes,kernel,registers
Cache-oblivious algorithm,Split-radix FFT algorithm,Prime-factor FFT algorithm,CPU cache,Computer science,Parallel computing,Cooley–Tukey FFT algorithm,Fast Fourier transform,Rader's FFT algorithm,Bruun's FFT algorithm
Conference
ISSN
Citations 
PageRank 
2576-3504
5
0.51
References 
Authors
8
3
Name
Order
Citations
PageRank
Daisuke Takahashi129739.92
Atsuya Uno28712.94
Mitsuo Yokokawa322751.71