Title
Fast block distributed CUDA implementation of the Hungarian algorithm
Abstract
The Hungarian algorithm solves the linear assignment problem in polynomial time. A GPU/CUDA implementation of this algorithm is proposed. GPUs are massive parallel machines. In this implementation, the alternating path search phase of the algorithm is distributed by several blocks in a way to minimize global device synchronization. This phase is very important and has a big contribution to the execution time. Other advanced features also implemented are: parallel graph traversal; the parallel detection of multiple alternating paths in a single iteration; a simplified and fast matrix compression that stores the zeros of the slack matrix, resulting in very fast graph traversal; highly optimized reductions for the initial slack matrix calculation and update. This results in a fast implementation for moderate size problems.
Year
DOI
Venue
2019
10.1016/j.jpdc.2019.03.014
Journal of Parallel and Distributed Computing
Keywords
Field
DocType
Hungarian algorithm,Linear assignment problem,GPU
Hungarian algorithm,Synchronization,Graph traversal,Computer science,Matrix (mathematics),CUDA,Parallel computing,Assignment problem,Execution time,Time complexity
Journal
Volume
ISSN
Citations 
130
0743-7315
1
PageRank 
References 
Authors
0.36
0
4