Improved Single System Conversational Telephone Speech Recognition With Vgg Bottleneck Features - Citegraph

Paper Info

Title
Improved Single System Conversational Telephone Speech Recognition With Vgg Bottleneck Features

Abstract
On small datasets, discriminatively trained bottleneck features from deep networks commonly outperform more traditional spectral or cepstral features. While these features are typically trained with small, fully-connected networks, recent studies have used more sophisticated networks with great success. We use the recent deep CNN (VGG) network for bottleneck feature extraction, previously used only for low-resource tasks, and apply it to the Switchboard English conversational telephone speech task. Unlike features derived from traditional MLP networks, the VGG features outperform cepstral features even when used with BLSTM acoustic models trained on large amounts of data. We achieve the best BBN single system performance when combining the VGG features with a BLSTM acoustic model. When decoding with an n-gram language model, which is what deployable systems use, we have a realistic production system with a WER of 7.4%. This result is competitive with the current state-of-the-art in the literature. While our focus is on realistic single system performance, we further reduce the WER to 6.1% through system combination and expensive neural network language model rescoring.

Year	DOI	Venue
2017	10.21437/Interspeech.2017-1513	18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION
Keywords	Field	DocType
Conversational speech recognition, VGG, bottleneck features, Switchboard	Bottleneck,Computer science,Speech recognition	Conference
ISSN	Citations	PageRank
2308-457X	1	0.38
References	Authors
10	6

Authors (6 rows)

Name	Order	Citations	PageRank
William Hartmann	1	64	10.66
Roger Hsiao	2	57	3.32
Tim Ng	3	122	9.38
Jeff Z. Ma	4	133	15.79
Francis Keith	5	1	1.06
Manhung Siu	6	464	61.40
