Title
BLSTM supported GEV beamformer front-end for the 3RD CHiME challenge
Abstract
We present a new beamformer front-end for Automatic Speech Recognition and apply it to the 3rd CHiME Speech Separation and Recognition Challenge. Without any modification of the back-end, we achieve a 53% relative reduction of the word error rate over the best baseline enhancement system on the relevant test data set. Our approach uses a bi-directional Long Short-Term Memory network to robustly estimate soft masks for a subsequent beamforming step. The Generalized Eigenvalue beamformer, with an optional Blind Analytic Normalization, does not rely on a Direction-of-Arrival estimate and can cope with multi-path sound propagation while introducing only very limited speech distortion. Our comparatively simple setup exploits simulated training data yet still generalizes well to the fairly different real data. Finally, combining our front-end with data augmentation and another language model yields a reduction of nearly 64% in the word error rate on the real data test set.
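The mask-then-beamform pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the speech and noise masks are already available (here they stand in for the BLSTM output), that they are used as weights for per-frequency spatial covariance (PSD) estimates, and that the beamformer is taken as the principal generalized eigenvector of the speech and noise PSD matrices. The function names `masked_psd`, `gev_beamformer` and `apply_beamformer` are hypothetical, and the optional Blind Analytic Normalization is omitted.

```python
import numpy as np


def masked_psd(Y, mask):
    """Mask-weighted spatial covariance matrix per frequency bin.

    Y:    complex STFT, shape (F, T, D) = (frequency, frame, channel)
    mask: real weights in [0, 1], shape (F, T); assumed given here
    Returns an array of shape (F, D, D).
    """
    weighted = Y * mask[..., None]
    # Element (d, e) is sum_t mask(t) * y_d(t) * conj(y_e(t)).
    psd = np.einsum('ftd,fte->fde', weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)  # guard against empty masks
    return psd / norm[:, None, None]


def gev_beamformer(psd_speech, psd_noise):
    """Per-bin weights maximizing (w^H Phi_XX w) / (w^H Phi_NN w).

    Solved here as the ordinary eigenproblem of Phi_NN^{-1} Phi_XX,
    a simple stand-in for a dedicated generalized Hermitian solver.
    Assumes every noise PSD matrix is invertible.
    """
    F, D, _ = psd_speech.shape
    W = np.empty((F, D), dtype=complex)
    for f in range(F):
        A = np.linalg.solve(psd_noise[f], psd_speech[f])
        vals, vecs = np.linalg.eig(A)
        # Keep the eigenvector belonging to the largest eigenvalue.
        W[f] = vecs[:, np.argmax(vals.real)]
    return W


def apply_beamformer(W, Y):
    """Beamformer output w_f^H y(f, t) for every bin and frame, shape (F, T)."""
    return np.einsum('fd,ftd->ft', W.conj(), Y)
```

Because the weights depend only on the two PSD matrices, no Direction-of-Arrival estimate enters the computation, which matches the property highlighted in the abstract. The eigenvector is defined only up to a complex scale per frequency bin; that ambiguity is what a normalization step would have to resolve.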
Year: 2015
DOI: 10.1109/ASRU.2015.7404829
Venue: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Keywords: Robust Speech Recognition, Beamforming, Feature Enhancement, Neural Networks
Field: Beamforming, Speech processing, Pattern recognition, Computer science, Voice activity detection, Word error rate, Speech recognition, Time delay neural network, Test data, Artificial intelligence, Test set, Acoustic model
DocType: Conference
Citations: 14
PageRank: 0.71
References: 11
Authors (4)
Name                  Order  Citations  PageRank
Jahn Heymann          1      102        10.29
Lukas Drude           2      95         11.10
Aleksej Chinaev       3      22         3.05
Reinhold Haeb-Umbach  4      1487       211.71