Abstract | ||
---|---|---|
In this letter, we explored the use of spatio-temporal information in one unified framework to improve the performance of multichannel speech recognition. Generalized cross correlation (GCC) serves as spatial feature compensation, and an attention mechanism across time is embedded within long short-term memory (LSTM) neural networks. Experiments on the AMI meeting corpus show that the proposed method provides an 8.2% relative improvement in word error rate (WER) over the model trained directly on the concatenation of multiple microphone outputs. |
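
The abstract names GCC as the spatial feature but does not give its exact form. As a rough illustration only, the sketch below computes GCC with PHAT weighting between two microphone signals. The PHAT weighting, the function names `dft` and `gcc_phat`, the `max_lag` parameter, and the naive standard-library DFT are all assumptions made for this example, not details taken from the paper.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative; an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig_a, sig_b, max_lag):
    """Return GCC-PHAT values for lags -max_lag..max_lag.

    A peak at a positive lag means sig_b is delayed relative to sig_a.
    Assumes max_lag is smaller than the zero-padded length.
    """
    n = len(sig_a) + len(sig_b)  # zero-pad so circular correlation does not wrap
    a = list(sig_a) + [0.0] * (n - len(sig_a))
    b = list(sig_b) + [0.0] * (n - len(sig_b))
    spec_a, spec_b = dft(a), dft(b)
    # Cross-power spectrum between the two channels.
    cross = [fb * fa.conjugate() for fa, fb in zip(spec_a, spec_b)]
    # PHAT weighting (an assumption here): keep only the phase of each bin.
    weighted = [c / (abs(c) + 1e-12) for c in cross]
    corr = [v.real for v in dft(weighted, inverse=True)]
    return [corr[lag % n] for lag in range(-max_lag, max_lag + 1)]


# Hypothetical two-channel example: channel b is channel a delayed by 3 samples.
rng = random.Random(0)
chan_a = [rng.gauss(0.0, 1.0) for _ in range(32)]
chan_b = [0.0] * 3 + chan_a
gcc = gcc_phat(chan_a, chan_b, max_lag=5)
estimated_lag = max(range(len(gcc)), key=lambda i: gcc[i]) - 5
print("estimated lag (samples):", estimated_lag)
```

In a setup like the one the abstract describes, a vector of such GCC values per microphone pair and frame could be appended to the acoustic features. How the paper actually combines them with the LSTM input is not stated in this record.
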
Year | DOI | Venue |
---|---|---|
2018 | 10.1587/transinf.2017EDL8268 | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS |
Keywords | Field | DocType |
multichannel speech recognition, long short-term memory, attention mechanism, generalized cross correlation | Spatial analysis, Computer vision, Computer science, Speech recognition, Artificial intelligence | Journal |
Volume | Issue | ISSN |
E101D | 7 | 1745-1361 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yu Zhang | 1 | 294 | 98.00 |
Pengyuan Zhang | 2 | 50 | 19.46 |
Qingwei Zhao | 3 | 80 | 20.70 |