Title |
---|
JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning |
Abstract |
---|
This paper describes the JHU team's Kaldi system submission to Arabic MGB-3, the Arabic Speech Recognition in the Wild Challenge at ASRU 2017. We use a weight transfer approach to adapt a neural network trained on the out-of-domain MGB-2 multi-dialect Arabic TV broadcast corpus to the MGB-3 Egyptian YouTube video corpus. The neural network has a TDNN-LSTM architecture and is trained with the lattice-free maximum mutual information (LF-MMI) objective, followed by sMBR discriminative training. For supervision, we fuse transcripts from four independent transcribers into confusion network training graphs. We also describe our own approach to speaker diarization and audio-transcript alignment, which we use to prepare lightly supervised transcriptions for training the seed system that is then adapted to MGB-3. Our primary submission to the challenge gives a multi-reference WER of 32.78% on the MGB-3 test set. |
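
The abstract reports its headline result as a word error rate (WER). As a rough illustration of the quantity involved, the following is a minimal sketch of ordinary single-reference WER: word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and this is not the challenge's multi-reference scoring, which is not specified in this record and is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Single-reference WER: word-level Levenshtein distance / reference length.

    Illustrative only; the MGB-3 multi-reference scoring is NOT implemented here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.
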
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/ASRU.2017.8268956 | 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Keywords | DocType | ISBN |
---|---|---|
Multi-genre broadcast, Automatic speech recognition, Lightly-supervised training, LF-MMI, Segmentation | Conference | 978-1-5090-4789-5 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vimal Manohar | 1 | 54 | 7.99 |
Daniel Povey | 2 | 2442 | 231.75 |
Sanjeev Khudanpur | 3 | 2155 | 202.00 |