Title
Improve Multichannel Speech Recognition With Temporal And Spatial Information
Abstract
In this letter, we explore the use of spatio-temporal information in one unified framework to improve the performance of multichannel speech recognition. Generalized cross correlation (GCC) serves as spatial feature compensation, and an attention mechanism across time is embedded within long short-term memory (LSTM) neural networks. Experiments on the AMI meeting corpus show that the proposed method provides an 8.2% relative improvement in word error rate (WER) over the model trained directly on the concatenation of multiple microphone outputs.
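As a rough illustration of the kind of spatial feature the abstract refers to, the sketch below computes a generalized cross correlation between two microphone channels. The abstract does not say which GCC weighting, lag range, or frame size the authors use, so the PHAT weighting, the function name `gcc_phat`, the `max_lag` parameter, and the synthetic signals are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """GCC values for lags -max_lag..max_lag (max_lag >= 1) between two channels.

    PHAT weighting is an assumption of this sketch; the source only says "GCC".
    """
    n = len(x) + len(y)              # zero-pad so circular correlation acts like linear
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # PHAT: keep phase only, discard magnitude
    cc = np.fft.irfft(cross, n=n)    # cc[k] peaks where x equals y delayed by k samples
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))

# Synthetic check (hypothetical data): channel x is channel y delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delay = 5
x = np.concatenate((np.zeros(delay), s))
y = np.concatenate((s, np.zeros(delay)))

max_lag = 10
feats = gcc_phat(x, y, max_lag)             # one vector of 2 * max_lag + 1 values
est_lag = int(np.argmax(feats)) - max_lag   # lag at the correlation peak
```

In a setup like the one the abstract describes, a vector of this kind would presumably be computed per frame and per microphone pair and supplied alongside the acoustic features; how the authors actually combine them is not stated in the abstract.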
Year
2018
DOI
10.1587/transinf.2017EDL8268
Venue
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Keywords
multichannel speech recognition, long short-term memory, attention mechanism, generalized cross correlation
Field
Spatial analysis, Computer vision, Computer science, Speech recognition, Artificial intelligence
DocType
Journal
Volume
E101D
Issue
7
ISSN
1745-1361
Citations
0
PageRank
0.34
References
0
Authors
3
Name
Order
Citations
PageRank
Yu Zhang 1 29498.00
Pengyuan Zhang 2 5019.46
Qingwei Zhao 3 8020.70