Title
Inverted indices for particle tracking in petascale cosmological simulations
Abstract
We describe the challenges arising from tracking dark matter particles in state-of-the-art cosmological simulations. We are in the process of running the Indra suite of simulations, with an aggregate count of more than 35 trillion particles and 1.1 PB of total raw data volume. However, it is not enough just to store the particle positions and velocities in an efficient manner; analyses also need to be able to track individual particles efficiently through the temporal history of the simulation. The required inverted indices can easily have raw sizes comparable to the original simulation. We explore various strategies for creating an efficient index for such data, using additional insight from the physical properties of the particle motions for a greatly compressed data representation. The basic particle data are stored in a relational database in coarse-grained containers corresponding to leaves of a fixed-depth oct-tree labeled by their Peano-Hilbert (PH) index. Within each container the individual objects are sorted by their Lagrangian identifier. Thus each particle has a multi-level address: the PH key of the container and the index of the particle within the sorted array (the slot). Given the nature of the cosmological simulations and the choice of the PH-box sizes, in consecutive snapshots particles can only cross into spatially adjacent boxes. Also, the slot number of a particle in adjacent snapshots is typically adjusted up or down by a small number. As a result, a special version of delta encoding over the multi-tier address already results in a dramatic reduction of the data that needs to be stored. We follow next with an efficient bit-compression, adapting to the statistical properties of the two-part addresses, achieving a final compression ratio better than a factor of 9. The final size of the full inverted index is projected to be 22.5 TB for a petabyte ensemble of simulations.
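The delta-encoding idea in the abstract can be illustrated with a small sketch. This is not the authors' encoding, which the abstract does not specify; it is a toy version under stated assumptions: a particle's track is a list of hypothetical (PH key, slot) pairs, one per snapshot, and each pair is stored as a difference from the previous snapshot using zigzag mapping and variable-length bytes (a generic stand-in for the paper's statistics-adapted bit-compression). The function names are invented for illustration. Note that spatially adjacent boxes do not necessarily have numerically close PH keys, so a real scheme would more plausibly store a small neighbor code in place of the raw key difference used here.

```python
# Toy delta encoding of a two-part (ph_key, slot) address per snapshot.
# All names are hypothetical; this is a sketch, not the paper's method.

def zigzag(n):
    """Map signed ints to unsigned so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1

def unzigzag(z):
    """Inverse of zigzag."""
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)

def write_varint(value, out):
    """Append a non-negative int using 7 payload bits per byte."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)

def read_varint(buf, pos):
    """Read one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7

def encode_track(track):
    """Encode [(ph_key, slot), ...] as deltas between snapshots.

    The first address is stored as a delta from (0, 0), i.e. in full.
    """
    out = bytearray()
    prev_key, prev_slot = 0, 0
    for key, slot in track:
        write_varint(zigzag(key - prev_key), out)
        write_varint(zigzag(slot - prev_slot), out)
        prev_key, prev_slot = key, slot
    return bytes(out)

def decode_track(buf):
    """Rebuild the list of (ph_key, slot) addresses from the deltas."""
    track = []
    pos = 0
    key = slot = 0
    while pos < len(buf):
        dk, pos = read_varint(buf, pos)
        ds, pos = read_varint(buf, pos)
        key += unzigzag(dk)
        slot += unzigzag(ds)
        track.append((key, slot))
    return track

if __name__ == "__main__":
    # Made-up addresses: the particle stays in box 1000, then moves to 1001.
    track = [(1000, 52341), (1000, 52339), (1001, 17), (1001, 20)]
    encoded = encode_track(track)
    assert decode_track(encoded) == track
    print(len(encoded), "bytes for", len(track), "snapshots")
```

In this toy layout a snapshot in which the particle stays in its box and shifts by a few slots costs two bytes, while a box crossing costs more because the slot is renumbered in the new container.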
Year: 2013
DOI: 10.1145/2484838.2484882
Venue: SSDBM
Keywords: trillion particle, dark matter particle, particle motion, particle position, consecutive snapshots particle, inverted index, individual particle, particle tracking, peano-hilbert index, total raw data volume, basic particle data, data representation, petascale cosmological simulation
Field: Inverted index, External Data Representation, Relational database, Identifier, Computer science, Sorted array, Petascale computing, Delta encoding, Database, Particle
DocType: Conference
Citations: 2
PageRank: 0.42
References: 5
Authors: 6
Name | Order | Citations | PageRank
Daniel Crankshaw | 1 | 356 | 12.24
Randal Burns | 2 | 1955 | 115.15
Bridget Falck | 3 | 2 | 0.42
Tamás Budavári | 4 | 24 | 4.17
Alexander S. Szalay | 5 | 959 | 105.36
Jie Wang | 6 | 2 | 0.42