Title
Rapid Training Of Acoustic Models Using Graphics Processing Units
Abstract
Robust and accurate speech recognition systems can only be realized with adequately trained acoustic models. For common languages, state-of-the-art systems are now trained on thousands of hours of speech data. Even with a large cluster of machines, the entire training process can take many weeks. To overcome this development bottleneck, we propose a new framework for rapid training of acoustic models using highly parallel graphics processing units (GPUs). In this paper, we focus on Viterbi training and describe the optimizations required for effective throughput on GPU processors. Using a single NVIDIA GTX580 GPU, our proposed approach is shown to be 51x faster than a sequential CPU implementation, enabling a moderately sized acoustic model to be trained on 1000 hours of speech data in just over 9 hours. Moreover, we show that our implementation on a two-GPU system runs 67% faster than a standard parallel reference implementation on a high-end 32-core Xeon server. Our GPU-based training platform empowers research groups to rapidly evaluate new ideas and build accurate and robust acoustic models on very large training corpora.
Year
2011
Venue
12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5
Keywords
Continuous Speech Recognition, Acoustic Model Training, Graphics Processing Unit
Field
Computer graphics (images), Computer science, Speech recognition, Graphics processing unit
DocType
Conference
Citations
0
PageRank
0.34
References
1
Authors
3
Name
Order
Citations
PageRank
Senaka Buthpitiya11239.07
Ian R. Lane225933.64
Jike Chong313611.62