Title
Numerical reproducibility for the parallel reduction on multi- and many-core architectures
Abstract
A parallel algorithm to compute correctly-rounded floating-point sums.
Highly-optimized implementations for modern CPUs, GPUs and Xeon Phi.
As fast as memory bandwidth allows for large sums with moderate dynamic range.
Scales well with the problem size and resources used on a cluster of compute nodes.

On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and, therefore, non-reproducible mainly due to the non-associativity of floating-point operations. We introduce an approach to compute the correctly rounded sums of large floating-point vectors accurately and efficiently, achieving deterministic results by construction. Our multi-level algorithm consists of two main stages: first, a filtering stage that relies on fast vectorized floating-point expansion; second, an accumulation stage based on superaccumulators in a high-radix carry-save representation. We present implementations on recent Intel desktop and server processors, Intel Xeon Phi co-processors, and both AMD and NVIDIA GPUs. We show that numerical reproducibility and bit-perfect accuracy can be achieved at no additional cost for large sums that have dynamic ranges of up to 90 orders of magnitude by leveraging arithmetic units that are left underused by standard reduction algorithms.
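To make the abstract's two ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes Knuth's TwoSum as the error-free transformation behind floating-point expansions, and it uses exact rational arithmetic (`fractions.Fraction`) as a slow stand-in for a superaccumulator; the paper's vectorized filtering and high-radix carry-save layout are not reproduced. The function names `naive_sum`, `two_sum`, and `exact_sum` are hypothetical.

```python
import math
from fractions import Fraction


def naive_sum(values):
    """Left-to-right floating-point sum; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def exact_sum(values):
    """Correctly rounded sum via exact rational accumulation.

    Assumption: Fraction plays the role of a wide exact accumulator here.
    Every finite float converts to a Fraction exactly, the additions are
    exact, and the final float() conversion rounds once.
    """
    acc = Fraction(0)
    for v in values:
        acc += Fraction(v)
    return float(acc)


if __name__ == "__main__":
    vals = [1e100, 1.0, -1e100, 1.0]
    print(naive_sum(vals))          # 1.0: the first 1.0 is absorbed by 1e100
    print(naive_sum(sorted(vals)))  # 0.0: a different order, a different result
    print(exact_sum(vals))          # 2.0, regardless of order
    print(two_sum(1e100, 1.0))      # (1e+100, 1.0): the rounding error is recovered
```

Because the accumulation is exact and rounding happens only once at the end, `exact_sum` returns the same value for every permutation of its input, which is the sense in which such a sum is reproducible by construction. Python's built-in `math.fsum` provides a comparable order-independent sum and can serve as a cross-check.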
Year: 2015
DOI: 10.1016/j.parco.2015.09.001
Venue: Parallel Computing
Keywords: Parallel floating-point summation, Reproducibility, Accuracy, Long accumulator, Error-free transformations, Multi- and many-core architectures
Field: Orders of magnitude (numbers), Memory bandwidth, Xeon Phi, Parallel algorithm, Computer science, Parallel computing, Filter (signal processing), Implementation, Theoretical computer science, Xeon, Computation
DocType: Journal
Volume: 49
Issue: C
ISSN: 0167-8191
Citations: 11
PageRank: 0.81
References: 14
Authors: 4
Name               Order   Citations   PageRank
Sylvain Collange   1       14          2.58
David Defour       2       131         18.28
Stef Graillat      3       92          16.06
Roman Iakymchuk    4       32          5.98