Title
Parallelizing the spectral transform method, part II
Abstract
The spectral transform method is a widely used numerical technique for solving partial differential equations on the sphere in global climate modeling. This paper describes the parallelization and performance of the spectral method for solving the non-linear shallow water equations on the surface of a sphere using a 128-node Intel iPSC/860 hypercube. Solving the shallow water equations represents a computational kernel of more complex climate models. This work is part of a research program to develop climate models capable of much longer simulations at significantly finer resolution than current models. Such models are important for understanding the effects of increasing atmospheric concentrations of greenhouse gases, and their computational requirements are so large that massively parallel multiprocessors will be necessary to run climate model simulations in a reasonable amount of time.

The spectral method involves the transformation of data between the physical, Fourier and spectral domains, each of which is two-dimensional. To evaluate the discrete spectral transform, the method performs Fourier transforms in the longitude direction followed by summation in the latitude direction. A simple way of parallelizing the spectral code is to decompose the physical problem domain in the latitude direction only. This allows an optimized sequential FFT algorithm to be used in the longitude direction, but it limits the number of processors that can be brought to bear on the problem. Decomposing the problem over both directions exploits the parallelism inherent in the problem more effectively: the grain size is reduced, so more processors can be used.

Results are presented showing that decomposing over both directions does yield a more rapid solution of the problem.
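The two-stage structure of the transform (an FFT along each latitude row, then a weighted sum over latitudes) can be illustrated with a minimal serial sketch in Python with NumPy. This is not the authors' code: it assumes Gaussian latitudes from `numpy.polynomial.legendre.leggauss`, orthonormal associated Legendre functions computed by a standard three-term recurrence, and triangular truncation at wavenumber `N`; the names `legendre_table`, `forward_transform` and `inverse_transform` are hypothetical. In a latitude-only decomposition the FFT stage touches only locally stored rows, whereas each spectral coefficient's latitude sum spans every processor's rows and so needs a reduction across processors.

```python
import numpy as np

def legendre_table(m, N, x):
    """Orthonormal associated Legendre functions Pbar_n^m(x) for n = m..N.

    Returns shape (N - m + 1, len(x)); row k holds degree n = m + k.
    Normalized so the integral of Pbar_n^m squared over [-1, 1] is 1
    (Condon-Shortley phase omitted; this normalization is an assumption).
    """
    s = np.sqrt(1.0 - x * x)                    # cos(latitude) when x = sin(latitude)
    pmm = np.full_like(x, np.sqrt(0.5))         # Pbar_0^0
    for k in range(1, m + 1):                   # climb the diagonal to Pbar_m^m
        pmm = pmm * np.sqrt((2 * k + 1) / (2 * k)) * s
    P = np.zeros((N - m + 1, x.size))
    P[0] = pmm
    if N > m:
        P[1] = x * np.sqrt(2 * m + 3) * pmm     # Pbar_{m+1}^m
    for n in range(m + 2, N + 1):               # three-term recurrence in degree n
        a = np.sqrt((4 * n * n - 1) / (n * n - m * m))
        b = np.sqrt(((n - 1) ** 2 - m * m) / (4 * (n - 1) ** 2 - 1))
        P[n - m] = a * (x * P[n - m - 1] - b * P[n - m - 2])
    return P

def forward_transform(field, N):
    """Physical grid (nlat, nlon) -> spectral coefficients {m: array over n = m..N}."""
    nlat, nlon = field.shape
    mu, w = np.polynomial.legendre.leggauss(nlat)    # Gaussian latitudes and weights
    # Stage 1: FFT along longitude; every latitude row is independent.
    fourier = np.fft.rfft(field, axis=1) / nlon
    # Stage 2: for each zonal wavenumber m, a quadrature sum over all latitudes.
    return {m: legendre_table(m, N, mu) @ (w * fourier[:, m]) for m in range(N + 1)}

def inverse_transform(spec, N, nlat, nlon):
    """Spectral coefficients -> physical grid (nlat, nlon) on Gaussian latitudes."""
    mu, _ = np.polynomial.legendre.leggauss(nlat)
    fourier = np.zeros((nlat, nlon // 2 + 1), dtype=complex)
    for m in range(N + 1):
        # Sum over degree n at every latitude to get Fourier coefficient m.
        fourier[:, m] = spec[m] @ legendre_table(m, N, mu)
    # Inverse FFT along longitude (undo the 1/nlon scaling used above).
    return np.fft.irfft(fourier * nlon, n=nlon, axis=1)
```

A quick check of the sketch is a round trip: synthesize a grid field from random coefficients with `inverse_transform`, then apply `forward_transform`. With `nlat >= N + 1` Gaussian latitudes and `nlon > 2 * N` longitudes the coefficients come back up to rounding error, because Gaussian quadrature integrates the Legendre products involved exactly.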
The results show that, for a given problem and number of processors, the optimum decomposition has approximately equal numbers of processors in each direction. Load imbalance also affects the performance of the method. The importance of minimizing communication latency and of overlapping communication with calculation is stressed, and general methods for doing this, which may be applied to many other problems, are discussed.
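The decomposition trade-off can be made concrete with a small Python sketch. It assumes a simple contiguous block decomposition of the latitude-longitude grid over a `p_lat` x `p_lon` processor grid. The helper names (`processor_grid`, `block_sizes`, `decompose`) and the 64 x 128 grid are illustrative choices, not taken from the paper. The sketch only picks the near-square factorization suggested by the reported results and measures load imbalance caused by uneven block sizes. It does not model communication cost, which is what actually determines the optimum in the paper's experiments.

```python
import math

def processor_grid(P):
    """Split P processors into a (p_lat, p_lon) grid that is as close to square as possible."""
    p_lat = math.isqrt(P)
    while P % p_lat:          # step down to the nearest divisor of P
        p_lat -= 1
    return p_lat, P // p_lat

def block_sizes(n, p):
    """Contiguous block sizes for n grid points over p processors (sizes differ by at most 1)."""
    q, r = divmod(n, p)
    return [q + 1 if k < r else q for k in range(p)]

def decompose(nlat, nlon, p_lat, p_lon):
    """Return (largest local block, load imbalance = largest block / average block)."""
    loads = [a * b for a in block_sizes(nlat, p_lat) for b in block_sizes(nlon, p_lon)]
    average = nlat * nlon / (p_lat * p_lon)
    return max(loads), max(loads) / average

if __name__ == "__main__":
    nlat, nlon = 64, 128                       # illustrative grid size only
    print(processor_grid(128))                 # (8, 16)
    print(decompose(nlat, nlon, 128, 1))       # latitude-only: (128, 2.0)
    print(decompose(nlat, nlon, 8, 16))        # 2-D grid: (64, 1.0)
```

On 128 processors, a latitude-only decomposition of 64 latitudes leaves half the processors with no rows (imbalance 2.0). The near-square 8 x 16 grid gives every processor an 8 x 8 block. Because 128 is a power of two but not a perfect square, the closest factorization on a hypercube is 8 x 16 rather than exactly square.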
Year: 1992
DOI: 10.1002/cpe.4330040703
Venue: Concurrency - Practice and Experience
Keywords: part II
DocType: Journal
Volume: 4
Issue: 7
ISSN: 1040-3108
Citations: 5
PageRank: 3.25
References: 3
Authors: 3
Name | Order | Citations | PageRank
David W. Walker | 1 | 1158 | 129.14
Patrick H. Worley | 2 | 488 | 101.02
John B. Drake | 3 | 140 | 57.27