Title |
---|
Efficient implementation of quantum materials simulations on distributed CPU-GPU systems |
Abstract |
---|
We present a scalable implementation of the Linearized Augmented Plane Wave method for distributed-memory systems. It relies on an efficient distributed, block-cyclic setup of the Hamiltonian and overlap matrices and allows us to turn around highly accurate all-electron quantum materials simulations of 1000+ atoms on clusters with a few hundred nodes. The implementation runs efficiently on standard multi-core CPU nodes as well as on hybrid CPU-GPU nodes. The key to the latter is a novel algorithm for solving the generalized eigenvalue problem for dense, complex Hermitian matrices on distributed hybrid CPU-GPU systems. Performance tests for Li-intercalated CoO2 supercells containing 1501 atoms demonstrate that high-accuracy, transferable quantum simulations can now be used in throughput materials search problems. While our application achieves scalable performance with CPU-only libraries such as ScaLAPACK or ELPA2, our new hybrid solver enables the efficient use of GPUs. It shows that a hybrid CPU-GPU architecture reaches a desired performance using substantially fewer cluster nodes and, notably, is considerably more energy efficient than traditional multi-core CPU-only systems for such complex applications. |
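
The block-cyclic setup mentioned in the abstract can be illustrated with a minimal sketch of the standard 2D block-cyclic index mapping, the layout convention used by ScaLAPACK-style libraries. This is not the authors' code: the function names (`owner`, `local_index`, `local_count`), the block size, and the process grid shape are all hypothetical and chosen only for illustration, and the sketch assumes the distribution starts at process (0, 0).

```python
def owner(i, j, nb, p_rows, p_cols):
    """Return the (process row, process column) that stores global element (i, j).

    Assumes square nb x nb blocks dealt out cyclically over a
    p_rows x p_cols process grid, starting at process (0, 0).
    """
    return (i // nb) % p_rows, (j // nb) % p_cols


def local_index(g, nb, p):
    """Map a global index g along one dimension to its local index on the owner."""
    block = g // nb                    # global block that contains g
    return (block // p) * nb + g % nb  # local block offset plus offset inside block


def local_count(n, nb, p, coord):
    """Count the indices of an n-long dimension stored at process coordinate coord."""
    return sum(1 for g in range(n) if (g // nb) % p == coord)


if __name__ == "__main__":
    # Hypothetical example: a 10 x 10 matrix, 2 x 2 blocks, 2 x 3 process grid.
    n, nb, p_rows, p_cols = 10, 2, 2, 3
    print(owner(5, 7, nb, p_rows, p_cols))
    print(local_index(5, nb, p_rows))
    print([local_count(n, nb, p_cols, c) for c in range(p_cols)])
```

Because consecutive blocks land on different processes, every process holds pieces from all regions of the Hamiltonian and overlap matrices, which tends to keep the work balanced during factorization and eigenvalue solves.
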
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2807591.2807654 | International Conference for High Performance Computing, Networking, Storage, and Analysis |
Keywords | Field | DocType |
---|---|---|
distributed CPU-GPU systems, linearized augmented plane wave method, distributed memory system, efficient distributed block-cyclic setup, Hamiltonian matrices, overlap matrices, all-electron quantum materials simulation, multicore CPU node, hybrid CPU-GPU node, generalized eigenvalue problem, dense complex Hermitian matrices, Li-intercalated CoO2 supercells, high-accuracy transferable quantum simulation, ScaLAPACK, ELPA2, hybrid CPU-GPU architecture | Computer science, Matrix (mathematics), Load balancing (computing), Efficient energy use, Parallel computing, ScaLAPACK, Eigendecomposition of a matrix, Solver, Hermitian matrix, Scalability, Distributed computing | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-5090-0273-3 | 4 | 0.49 |
References | Authors |
---|---|
17 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Raffaele Solcà | 1 | 35 | 3.74 |
Anton Kozhevnikov | 2 | 6 | 1.48 |
Azzam Haidar | 3 | 409 | 35.39 |
Stanimire Tomov | 4 | 1214 | 102.02 |
Jack J. Dongarra | 5 | 17625 | 2615.79 |
Thomas C. Schulthess | 6 | 106 | 15.16 |