Title |
---|
Improving Audio-Visual Speech Recognition Performance With Cross-Modal Student-Teacher Training |
Abstract |
---|
In this paper, we propose a cross-modal student-teacher learning framework that makes full use of abundant external acoustic data, in addition to a given task-specific audio-visual training database, to improve speech recognition performance under low signal-to-noise ratio (SNR) and acoustic mismatch conditions. First, a teacher model is trained on large audio-only databases. Next, a student, namely a deep neural network (DNN) model, is trained on a small audio-visual database to minimize the Kullback-Leibler (KL) divergence between its output and the posterior distribution of the teacher. We evaluate the proposed approach in both matched and mismatched acoustic conditions for phone recognition on the NTCD-TIMIT database. Compared to a DNN recognition system trained on the original audio-visual data only, the proposed solution reduces the phone error rate (PER) from 26.7% to 21.3% in the matched acoustic scenario. In the mismatched conditions, the PER is reduced from 47.9% to 42.9%. Moreover, we show that the posteriors generated by the teacher contain environmental information, which enables the proposed student-teacher learning to act as environment-aware training; PER reductions are observed in all SNR conditions. |
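
The student objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the array shapes, and the direction of the divergence, KL(teacher || student), are assumptions made here for illustration, and the abstract does not specify them.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def kl_student_teacher_loss(teacher_post, student_logits, eps=1e-12):
    """Mean per-frame KL(teacher || student).

    teacher_post:   (frames, classes) posteriors from the audio-only teacher.
    student_logits: (frames, classes) pre-softmax outputs of the
                    audio-visual student (hypothetical shapes).
    """
    student_post = softmax(student_logits)
    # eps guards against log(0); it is an illustrative choice.
    log_ratio = np.log(teacher_post + eps) - np.log(student_post + eps)
    kl_per_frame = np.sum(teacher_post * log_ratio, axis=-1)
    return float(kl_per_frame.mean())


# Toy example: two frames, three phone classes (made-up values).
teacher = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.1, 0.8]])
student = np.array([[2.0, 0.5, 0.1],
                    [0.2, 0.1, 1.5]])
loss = kl_student_teacher_loss(teacher, student)
```

Because the teacher posteriors are fixed during student training, minimizing this quantity is equivalent to minimizing the cross-entropy between the teacher's soft targets and the student's output.
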
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/icassp.2019.8682868 | 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Keywords | DocType | ISSN |
Audio-visual speech recognition, deep neural network, cross-modal training, student-teacher training, transfer learning, environmental-aware training | Conference | 1520-6149 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wei Li | 1 | 436 | 140.67 |
Sicheng Wang | 2 | 0 | 0.68 |
Ming Lei | 3 | 10 | 8.36 |
Sabato Marco Siniscalchi | 4 | 310 | 30.21 |
Chin-Hui Lee | 5 | 6101 | 852.71 |