Abstract |
---|
Training Hidden Markov Model (HMM) based speech recognition systems is often very time consuming because of its high computational complexity. Modern parallel hardware such as the GPU provides multi-threaded processing and very high floating-point throughput. We exploit the GPU to accelerate a popular HMM-based speech recognition toolkit, HTK. Starting from the sequential HTK code, we design "paraTraining", a parallel training model for HTK, and develop several optimizations to improve its performance on the GPU: unrolling the nested loops and using "reduction add" to maximize the number of threads per block, exploiting the GPU's warp mechanism to reduce synchronization latency, and building different thread indices to address data efficiently. Experimental results show that a speedup of more than 20x can be achieved without loss of accuracy. We also discuss a multi-GPU implementation of our method, which achieves roughly twice the speedup of the single-GPU version. |
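The "reduction add" mentioned in the abstract is the standard parallel tree reduction used to sum per-thread partial results inside a GPU thread block. A minimal Python sketch of that per-block pattern (the function name and simulation loop are illustrative, not taken from HTK's code):

```python
def block_reduce_add(values):
    """Simulate the tree-based "reduction add" a GPU thread block performs:
    at each step, thread t adds the element `stride` positions away, halving
    the number of active threads until a single partial sum remains."""
    data = list(values)
    n = len(data)        # assume n is a power of two, like a CUDA block size
    stride = n // 2
    while stride > 0:
        for t in range(stride):          # threads 0..stride-1 stay active
            data[t] += data[t + stride]  # each adds its partner's value
        stride //= 2                     # half as many active threads next step
    return data[0]

# A "block" of 8 threads, each holding one partial accumulation term:
print(block_reduce_add([1, 2, 3, 4, 5, 6, 7, 8]))  # → 36
```

The tree shape finishes in O(log n) steps instead of O(n), which is what keeps many threads per block busy; in a real CUDA kernel the array would live in shared memory with a `__syncthreads()` barrier between steps, and the final steps within a single warp can proceed in lockstep without explicit synchronization, matching the abstract's point about using the warp mechanism to reduce synchronization latency.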
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/IPDPSW.2012.235 | IPDPS Workshops |
Keywords | Field | DocType
---|---|---|
optimisation,reduction add,thread indices,warp mechanism,floating-point capability,optimization methods,htk training,speech recognition,speedup,parallel architectures,graphics processing units,training,multi-threading,performance improvement,htk,sequential code,cuda,synchronization latency reduction,parallel hardware,hmm-based speech recognition,computational complexity,paraTraining,stream processor,parallel training model,hidden markov models,nested loops,floating point arithmetic,data parallel computing,instruction sets,computational modeling,gpu computing,vectors | Multithreading,CUDA,Computer science,Parallel computing,Thread (computing),Graphics processing unit,Hidden Markov model,Stream processing,Nested loop join,Speedup | Conference
ISSN | ISBN | Citations
---|---|---|
2164-7062 | 978-1-4673-0974-5 | 0
PageRank | References | Authors
---|---|---|
0.34 | 5 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Peng Liu, Zhihui Du | 1 | 383 | 48.74
Xiangyu Li | 2 | 27 | 10.25 |
Ji Wu | 3 | 226 | 32.62 |