Title
Controlling the Noise Robustness of End-to-End Automatic Speech Recognition Systems
Abstract
In this work, we propose a novel training scheme to modularize end-to-end systems. The scheme alters the flow of information in an end-to-end system so that the kernels of this system can be reused in a second system that fulfills a different task. We apply this scheme to extract the noise reduction capabilities of a noise-robust automatic speech recognition (ASR) system and build a speech enhancer from them. This enhancer receives spectral representations of unfiltered audio and outputs cleaned spectral representations. The enhancer can be integrated into an ASR system as a front-end, is trainable, and reduces background noise. Our front-end uses a decoder to clean speech based on the hidden activations of the ASR system Jasper. During training, we adapt only the weights of our decoder and the batch normalization in Jasper. The resulting spectral representations show less background noise. Further, areas of the spectral features are not reconstructed if they do not contribute to speech recognition. We demonstrate that our front-end can be combined with a pre-trained ASR system as a back-end and supports speech recognition in noisy conditions. Further, we show that training another ASR system with our front-end improves that system's performance in both noisy and noiseless conditions. The improvement is especially pronounced on challenging speech datasets.
Year
2021
DOI
10.1109/IJCNN52387.2021.9533390
Venue
2021 International Joint Conference on Neural Networks (IJCNN)
DocType
Conference
ISSN
2161-4393
Citations
0
PageRank
0.34
References
2
Authors
4
Name               Order  Citations  PageRank
Matthias Möller    1      1          4.08
Johannes Twiefel   2      12         4.48
Cornelius Weber    3      318        41.92
Stefan Wermter     4      1100       151.62