Title
Balancing Locality and Concurrency: Solving Sparse Triangular Systems on GPUs
Abstract
Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). To accelerate the solution of such equations, two types of approaches have been used: on GPUs, concurrency has been prioritised to the disadvantage of data locality, while on multi-core CPUs, data locality has been prioritised to the disadvantage of concurrency. In this paper, we discuss the interaction between data locality and concurrency in the solution of STLs on GPUs, and we present a new algorithm that balances both. We demonstrate empirically that, subject to there being enough concurrency available in the input matrix, our algorithm outperforms Nvidia's concurrency-prioritising CUSPARSE algorithm for GPUs. Experimental results show a maximum speedup of 5.8-fold. Our solution algorithm, which we have implemented in OpenCL, requires a pre-processing phase that partitions the graph associated with the input matrix into sub-graphs, whose data can be stored in low-latency local memories. This preliminary analysis phase is expensive, but because it depends only on the input matrix, its cost can be amortised when solving for many different right-hand sides.
Year
DOI
Venue
2016
10.1109/HiPC.2016.030
2016 IEEE 23rd International Conference on High Performance Computing (HiPC)
Keywords
Field
DocType
GPU,OpenCL,CUSPARSE,sparse,linear algebra,systems of equations,concurrency,data locality
Linear algebra,Linear equation,Locality,System of linear equations,Computer science,Concurrency,Matrix (mathematics),Parallel computing,Sparse matrix,Distributed computing,Speedup
Conference
ISSN
ISBN
Citations 
1094-7256
978-1-5090-5412-1
0
PageRank 
References 
Authors
0.34
8
5
Name
Order
Citations
PageRank
Andrea Picciau100.34
G. Inggs2376.61
John Wickerson314210.08
Eric C. Kerrigan432936.97
George A. Constantinides51391160.26