Abstract
---
Long Short-Term Memory (Plain-LSTM) networks are effective for acoustic modeling in automatic speech recognition systems, but their training is obstructed by the vanishing and exploding gradient problems. To alleviate this, the paper introduces an improved space residual LSTM (S-RES-LSTM), which, unlike the previous RES-LSTM, uses the output before rather than after the LSTM projection layer as the spatial shortcut connection. Experiments on distant speech recognition with the AMI SDM corpus show that the 9-layer S-RES-LSTM achieves 5% absolute WER (over) and 5.9% absolute WER (non-over) reductions over the Plain-LSTM on the eval set, and a 0.6% absolute WER reduction over the 9-layer RES-LSTM. To further enhance the information flow of S-RES-LSTM, the space and time residual LSTM (ST-RES-LSTM) is proposed, which adds a novel residual connection in the temporal dimension. Compared with the Plain-LSTM and the RES-LSTM, the 9-layer ST-RES-LSTM achieves 5.5% and 1% absolute WER (over) reductions, respectively, on the eval set.
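As a rough illustration of the architectural change the abstract describes, the sketch below implements one plausible reading in PyTorch: each layer's spatial shortcut is the pre-projection LSTM output of the layer below, rather than the projected output used by RES-LSTM. This is a minimal sketch under stated assumptions, not the authors' implementation; the class name `SResLSTMLayer` and all layer sizes are hypothetical.

```python
# Hypothetical sketch of the S-RES-LSTM spatial shortcut (not the paper's code):
# each LSTMP layer adds the *pre-projection* output of the layer below as a
# residual, which is what the abstract says distinguishes S-RES-LSTM from RES-LSTM.
import torch
import torch.nn as nn


class SResLSTMLayer(nn.Module):
    """One assumed S-RES-LSTM layer: LSTM cell + projection, with the
    spatial shortcut taken before (not after) the projection."""

    def __init__(self, input_size: int, hidden_size: int, proj_size: int):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, proj_size)

    def forward(self, x, shortcut=None):
        out, _ = self.lstm(x)        # pre-projection output, shape (B, T, hidden_size)
        if shortcut is not None:
            out = out + shortcut     # spatial residual: add lower layer's pre-projection output
        # Projected output feeds the next layer; raw output is the next shortcut.
        return self.proj(out), out


# Stacking a 9-layer network (all sizes are illustrative, not from the paper):
feat_dim, hidden, proj, depth = 40, 512, 256, 9
layers = nn.ModuleList(
    [SResLSTMLayer(feat_dim if i == 0 else proj, hidden, proj) for i in range(depth)]
)
h, shortcut = torch.randn(8, 100, feat_dim), None  # (batch, frames, features)
for layer in layers:
    h, shortcut = layer(h, shortcut)
print(h.shape)  # torch.Size([8, 100, 256])
```

The ST-RES-LSTM additionally adds a residual connection along the temporal dimension; the abstract does not specify its exact form, so it is not sketched here.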
Year | DOI | Venue
---|---|---
2018 | 10.1109/ISCSLP.2018.8706565 | 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) |
Keywords | Field | DocType
---|---|---
Speech recognition, Training, Mathematical model, Hidden Markov models, Microphones, Acoustics, Neural networks | Space time, Residual, Pattern recognition, Computer science, Speech recognition, Artificial intelligence, Hidden Markov model, Artificial neural network | Conference
ISBN | Citations | PageRank
---|---|---
978-1-5386-5627-3 | 0 | 0.34
References | Authors
---|---
0 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Long Wu | 1 | 2 | 1.59 |
Li Wang | 2 | 250 | 56.88 |
Pengyuan Zhang | 3 | 50 | 19.46 |
Ta Li | 4 | 2 | 2.06 |
Yonghong Yan 0002 | 5 | 83 | 19.58 |