Abstract |
---|
We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on Nvidia Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices we adopted, and compare our performance results with an implementation optimized for latest-generation multi-core CPUs. Our program runs at ≈30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1007/978-3-642-31464-3_65 | Lecture Notes in Computer Science |
Keywords | DocType | Volume |
---|---|---|
double-precision peak performance,performance result,parallel implementation,implementation choice,lattice boltzmann code,multi-gpu implementation,latest generation multi-core cpus,gp-gpu architecture,multi-gpu cluster,nvidia fermi processor,linear scaling,compressible lattice boltzmann code,computational fluid dynamics,lattice boltzmann methods | Conference | 7203 |
ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 7 | 0.86 |
References | Authors |
---|---|
6 | 9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Luca Biferale | 1 | 18 | 3.72 |
Filippo Mantovani | 2 | 82 | 14.72 |
Marcello Pivanti | 3 | 62 | 7.43 |
Fabio Pozzati | 4 | 18 | 2.37 |
M. Sbragaglia | 5 | 30 | 5.62 |
Andrea Scagliarini | 6 | 18 | 2.37 |
Sebastiano Fabio Schifano | 7 | 191 | 28.37 |
Federico Toschi | 8 | 30 | 8.29 |
Raffaele Tripiccione | 9 | 145 | 20.46 |