Title
Utilizing many-core accelerators for halo and center finding within a cosmology simulation
Abstract
Efficiently finding and computing statistics about "halos" (regions of high density) are essential analysis steps for N-body cosmology simulations. However, in state-of-the-art simulation codes, these analysis operators do not currently take advantage of the shared-memory data-parallelism available on multi-core and many-core architectures. The Hybrid / Hardware Accelerated Cosmology Code (HACC) is designed as an MPI+X code, but the analysis operators are parallelized only among MPI ranks, because of the difficulty in porting different X implementations (e.g., OpenMP, CUDA) across all architectures on which it is run. In this paper, we present portable data-parallel algorithms for several variations of halo finding and halo center finding algorithms. These are implemented with the PISTON component of the VTK-m framework, which uses Nvidia's Thrust library to construct data-parallel algorithms that allow a single implementation to be compiled to multiple backends to target a variety of multi-core and many-core architectures. Finally, we compare the performance of our halo and center finding algorithms against the original HACC implementations on the Moonlight, Stampede, and Titan supercomputers. The portability of Thrust allowed the same code to run efficiently on each of these architectures. On Titan, the performance improvements using our code have enabled halo analysis to be performed on a very large data set (8192³ particles across 16,384 nodes of Titan) for which analysis using only the existing CPU algorithms was not feasible.
Year
2015
DOI
10.1109/LDAV.2015.7348076
Venue
LDAV
Keywords
D.1.3 [Software]: Programming Techniques, Concurrent Programming
DocType
Conference
ISSN
2373-7514
Citations
1
PageRank
0.36
References
9
Authors
5
Name                 Order  Citations  PageRank
Christopher Sewell   1      164        14.96
Li-ta Lo             2      75         7.03
Katrin Heitmann      3      144        14.49
Salman Habib         4      98         15.24
James Ahrens         5      233        35.07