Title
Parallelization techniques for implementing trellis algorithms on graphics processors
Abstract
In this paper, we study different schemes to parallelize trellis algorithms for efficient implementation on a GPU. We consider parallelization schemes at the packet level, subblock level and trellis level to increase the number of threads in a GPU implementation. At the trellis level, we consider state-level, forward-backward traversal and branch-metric parallelism. To evaluate the performance of the different schemes, an LTE uplink Turbo decoder is implemented on an NVIDIA GTX470 GPU. Tradeoffs between throughput, latency and bit error rate are presented. Our most balanced configuration processes multiple subblocks of a packet simultaneously, combined with recovery schemes and trellis-level parallelism; it achieves a throughput of 19.65 Mbps with a latency of 0.56 ms at a bit error rate of 10^-5 for a channel SNR of 1.3 dB. We also show how different combinations of parallelization schemes can be used to satisfy systems with widely varying requirements of throughput, latency and bit error rate.
Year: 2013
DOI: 10.1109/ISCAS.2013.6572072
Venue: Circuits and Systems
Keywords: Long Term Evolution, decoding, graphics processing units, turbo codes, GPU implementation, LTE uplink turbo decoder, NVIDIA GTX470 GPU, bit error rate, branch-metric parallelism, graphics processors, packet-level parallelism, parallelization technique, recovery scheme, state-level forward-backward traversal parallelism, subblock-level parallelism, trellis algorithm, trellis-level parallelism
Field: Instruction-level parallelism, Task parallelism, Computer science, Parallel computing, Network packet, Turbo code, Communication channel, Algorithm, Data parallelism, Throughput, Bit error rate
DocType: Conference
ISSN: 0271-4302
ISBN: 978-1-4673-5760-9
Citations: 2
PageRank: 0.53
References: 4
Authors: 4
Name | Order | Citations | PageRank
Zheng, Q. | 1 | 2 | 0.53
Chen-Yi Lee | 2 | 1211 | 152.40
Dreslinski, R. | 3 | 43 | 2.37
Chaitali Chakrabarti | 4 | 1978 | 184.17