Abstract | ||
---|---|---|
In this paper, we propose an implementation of a parallel one-dimensional fast Fourier transform (FFT) on the K computer. The proposed algorithm is based on the six-step FFT algorithm, which can be altered into the recursive six-step FFT algorithm to reduce the number of cache misses. The recursive six-step FFT algorithm improves performance by utilizing the cache memory effectively. We use the recursive six-step FFT algorithm to implement the parallel one-dimensional FFT algorithm. The performance results of one-dimensional FFTs on the K computer are reported. We successfully achieved a performance of over 18 TFlops on 8192 nodes of the K computer (82944 nodes, 128 GFlops/node, 10.6 PFlops peak performance) for a 2^41-point FFT. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/HPCC.2012.53 | HPCC-ICESS |
Keywords | Field | DocType |
pflops peak performance,41-point fft,k computer,performance result,cache memory,one-dimensional ffts,parallel one-dimensional fft algorithm,1-d fft,parallel one-dimensional,proposed algorithm,six-step fft algorithm,parallel processing,tflops,distributed databases,fast fourier transform,fast fourier transforms,indexes,kernel,registers | Cache-oblivious algorithm,Split-radix FFT algorithm,Prime-factor FFT algorithm,CPU cache,Computer science,Parallel computing,Cooley–Tukey FFT algorithm,Fast Fourier transform,Rader's FFT algorithm,Bruun's FFT algorithm | Conference |
ISSN | Citations | PageRank |
2576-3504 | 5 | 0.51 |
References | Authors | |
8 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Daisuke Takahashi | 1 | 297 | 39.92 |
Atsuya Uno | 2 | 87 | 12.94 |
Mitsuo Yokokawa | 3 | 227 | 51.71 |