Abstract | ||
---|---|---|
We describe the speech recognition systems we have created for MGB-3, the 3rd Multi-Genre Broadcast challenge, whose task this year was to build a system for transcribing Egyptian Dialect Arabic speech using a large audio corpus of primarily Modern Standard Arabic speech and only a small amount (5 hours) of Egyptian adaptation data. Our system, a combination of different acoustic models, language models and lexical units, achieved a Multi-Reference Word Error Rate of 29.25%, the lowest in the competition. On the older MGB-2 task, which was run again to indicate progress, we also achieved the lowest error rate: 13.2%. The result comes from combining state-of-the-art speech recognition methods: simple dialect adaptation for a Time-Delay Neural Network (TDNN) acoustic model (−27% errors compared to the baseline), Recurrent Neural Network Language Model (RNNLM) rescoring (an additional −5%), and system combination with Minimum Bayes Risk (MBR) decoding (yet another −10%). We also explored the use of morph and character language models, which was particularly beneficial in providing a rich pool of systems for the MBR decoding. |
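
The three error reductions quoted in the abstract (−27%, −5%, −10%) can be read as successive relative improvements. The sketch below shows how such reductions would compound under the assumption that each percentage is relative to the system produced by the preceding step; the abstract does not state this explicitly, and the function name `cumulative_relative_reduction` is hypothetical, introduced here for illustration only.

```python
def cumulative_relative_reduction(reductions):
    """Compound a sequence of relative error reductions.

    Each entry is a fraction in [0, 1), e.g. 0.27 for a 27% relative
    reduction. Assumption (not confirmed by the abstract): every
    reduction is measured relative to the previous system, so the
    remaining error fractions multiply.
    """
    remaining = 1.0
    for r in reductions:
        if not 0.0 <= r < 1.0:
            raise ValueError(f"reduction must be in [0, 1), got {r}")
        remaining *= 1.0 - r  # fraction of errors left after this step
    return 1.0 - remaining


if __name__ == "__main__":
    # Figures taken from the abstract: TDNN adaptation, RNNLM rescoring, MBR combination.
    total = cumulative_relative_reduction([0.27, 0.05, 0.10])
    print(f"{total:.1%}")  # prints 37.6%
```

Under that reading the three steps together would remove roughly 38% of the baseline errors, which is slightly less than the 42% obtained by simply adding the percentages.
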
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/ASRU.2017.8268955 | 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Keywords | DocType | ISBN |
speech recognition,dialect adaptation,subwords,neural network language models,system combination | Conference | 978-1-5090-4789-5 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Peter Smit | 1 | 18 | 5.08 |
Siva Reddy Gangireddy | 2 | 10 | 2.30 |
Seppo Enarvi | 3 | 4 | 2.44 |
Sami Virpioja | 4 | 299 | 25.51 |
Mikko Kurimo | 5 | 908 | 93.37 |