Title
The NAIST ASR system for the 2015 Multi-Genre Broadcast challenge: On combination of deep learning systems using a rank-score function
Abstract
The Multi-Genre Broadcast challenge is an official challenge of the IEEE Automatic Speech Recognition and Understanding Workshop. This paper presents NAIST's contribution to the inaugural edition of this challenge. The presented speech-to-text system for English makes use of various front-ends (e.g., MFCC, i-vector and FBANK), DNN acoustic models, and several language models for decoding and rescoring (N-gram, RNNLM). Training-data subsets of varying sizes were evaluated with respect to the overall training quality. Two speech segmentation systems were developed for the challenge, one based on DNNs and one on GMM-HMMs. Recognition was performed in three stages: decoding, lattice rescoring, and system combination. This paper focuses on the system combination experiments and presents a rank-score-based system weighting approach, which outperformed a standard system combination strategy. The DNN-based ASR system trained on MFCC + i-vector features with the sMBR training criterion gives the best performance of 27.8% WER, and thus significantly outperforms the baseline DNN-HMM sMBR system, which yields 33.7% WER.
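To put the two WER figures in context, the following is a minimal sketch of the standard word error rate definition (word-level edit distance divided by reference length) and of the relative reduction between the baseline and the best system. It is not the scoring tool used in the challenge, which this record does not describe; the function names `word_error_rate` and `relative_reduction` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical sentences, for illustration only.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # WER figures quoted in the abstract: 33.7% baseline, 27.8% best system.
    print(f"{relative_reduction(33.7, 27.8):.1%}")
```

With the figures quoted in the abstract, the absolute improvement is 5.9 WER points, which corresponds to a relative reduction of roughly 17.5% over the baseline.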
Year
2015
DOI
10.1109/ASRU.2015.7404858
Venue
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Keywords
speech recognition, ASRU MGB, broadcast, evaluation system, system development
Field
Broadcasting, Mel-frequency cepstrum, Weighting, Pattern recognition, Computer science, Speech recognition, Artificial intelligence, Decoding methods, Deep learning, Score, Speech segmentation, Language model
DocType
Conference
Citations
2
PageRank
0.39
References
11
Authors
6
Name              Order  Citations  PageRank
Do Quoc Truong    1      10         5.67
Michael Heck      2      16         5.20
Sakriani Sakti    3      257        65.02
Graham Neubig     4      989        130.31
Tomoki Toda       5      1874       167.18
Satoshi Nakamura  6      1099       194.59