Title
Improved Single System Conversational Telephone Speech Recognition With VGG Bottleneck Features
Abstract
On small datasets, discriminatively trained bottleneck features from deep networks commonly outperform more traditional spectral or cepstral features. While these features are typically trained with small, fully-connected networks, recent studies have used more sophisticated networks with great success. We take the recent deep CNN (VGG) network for bottleneck feature extraction, previously used only for low-resource tasks, and apply it to the Switchboard English conversational telephone speech task. Unlike features derived from traditional MLP networks, the VGG features outperform cepstral features even when used with BLSTM acoustic models trained on large amounts of data. We achieve the best BBN single system performance when combining the VGG features with a BLSTM acoustic model. When decoding with an n-gram language model, the kind used for deployable systems, we have a realistic production system with a WER of 7.4%. This result is competitive with the current state of the art in the literature. While our focus is on realistic single system performance, we further reduce the WER to 6.1% through system combination and expensive neural network language model rescoring.
Year: 2017
DOI: 10.21437/Interspeech.2017-1513
Venue: 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION
Keywords: Conversational speech recognition, VGG, bottleneck features, Switchboard
Field: Bottleneck, Computer science, Speech recognition
DocType: Conference
ISSN: 2308-457X
Citations: 1
PageRank: 0.38
References: 10
Authors: 6
Name, Order, Citations, PageRank
William Hartmann, 1, 64, 10.66
Roger Hsiao, 2, 57, 3.32
Tim Ng, 3, 122, 9.38
Jeff Z. Ma, 4, 133, 15.79
Francis Keith, 5, 1, 1.06
Manhung Siu, 6, 464, 61.40