Title
JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning
Abstract
This paper describes the JHU team's Kaldi system submission to the Arabic MGB-3 challenge (Arabic Speech Recognition in the Wild) at ASRU 2017. We use a weight transfer approach to adapt a neural network trained on the out-of-domain MGB-2 multi-dialect Arabic TV broadcast corpus to the MGB-3 Egyptian YouTube video corpus. The neural network has a TDNN-LSTM architecture and is trained using the lattice-free maximum mutual information (LF-MMI) objective, followed by state-level minimum Bayes risk (sMBR) discriminative training. For supervision, we fuse transcripts from four independent transcribers into confusion network training graphs. We also describe our own approach to speaker diarization and audio-transcript alignment, which we use to prepare lightly supervised transcriptions for training the seed system that is then adapted to MGB-3. Our primary submission to the challenge achieves a multi-reference WER of 32.78% on the MGB-3 test set.
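To make the reported metric concrete, the following is a minimal sketch of word error rate scored against several reference transcripts. It is a simplified stand-in and not the challenge's official multi-reference scoring: it assumes whitespace tokenization and simply keeps the lowest per-reference WER. The function names `edit_distance` and `best_reference_wer` are hypothetical and do not come from the paper or from Kaldi.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = cur
    return prev[-1]


def best_reference_wer(references, hypothesis):
    """Lowest WER of the hypothesis against any non-empty reference.

    Assumption: this "best of N references" rule is an illustrative
    simplification, not the official MGB-3 multi-reference WER.
    """
    hyp = hypothesis.split()
    scores = []
    for ref_text in references:
        ref = ref_text.split()
        if ref:  # skip empty references to avoid division by zero
            scores.append(edit_distance(ref, hyp) / len(ref))
    if not scores:
        raise ValueError("at least one non-empty reference is required")
    return min(scores)
```

For example, `best_reference_wer(["a b c d", "a b x d"], "a b x d")` scores the hypothesis against both transcripts and keeps the exact match with the second one.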
Year
2017
DOI
10.1109/ASRU.2017.8268956
Venue
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Keywords
Multi-genre broadcast, Automatic speech recognition, Lightly-supervised training, LF-MMI, Segmentation
DocType
Conference
ISBN
978-1-5090-4789-5
Citations
0
PageRank
0.34
References
0
Authors
3
Name               Order  Citations  PageRank
Vimal Manohar      1      54         7.99
Daniel Povey       2      2442       231.75
Sanjeev Khudanpur  3      2155       202.00