Title
A Fully Data Parallel WFST-Based Large Vocabulary Continuous Speech Recognition On A Graphics Processing Unit
Abstract
Tremendous compute throughput is becoming available in personal desktop and laptop systems through the use of graphics processing units (GPUs). However, exploiting this resource requires re-architecting an application to fit a data parallel programming model. The complex graph traversal routines in the inference process for large vocabulary continuous speech recognition (LVCSR) have been considered by many as unsuitable for extensive parallelization. We explore and demonstrate a fully data parallel implementation of a speech inference engine on NVIDIA's GTX280 GPU. Our implementation consists of two phases: a compute-intensive observation probability computation phase and a communication-intensive graph traversal phase. We take advantage of dynamic elimination of redundant computation in the compute-intensive phase while maintaining close-to-peak execution efficiency. We also demonstrate the importance of exploring application-level trade-offs in the communication-intensive graph traversal phase to adapt the algorithm to data parallel execution on GPUs. On a 3.1-hour speech data set, we achieve more than 11x speedup compared to a highly optimized sequential implementation on an Intel Core i7, without sacrificing accuracy.
Year
2009
Venue
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5
Keywords
Data parallel, Continuous Speech Recognition, Graphics Processing Unit
Field
Graphics, Graph traversal, Computer science, Speech recognition, Parallel programming model, Inference engine, Throughput, Graphics processing unit, Vocabulary, Speedup
DocType
Conference
Citations
18
PageRank
1.05
References
12
Authors
4
Name              Order  Citations  PageRank
Jike Chong        1      136        11.62
Ekaterina Gonina  2      73         6.50
Youngmin Yi       3      281        25.93
Kurt Keutzer      4      5040       801.67