Title | ||
---|---|---|
Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition |
Abstract | ||
---|---|---|
Multi-task learning (MTL) involves the simultaneous training of two or more related tasks over shared representations. In this work, we apply MTL to audio-visual automatic speech recognition (AV-ASR). Our primary task is to learn a mapping between audio-visual fused features and frame labels obtained from an acoustic GMM/HMM model. This is combined with an auxiliary task that maps visual features to frame labels obtained from a separate visual GMM/HMM model. The MTL model is tested at various levels of babble noise, and the results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results indicate that MTL is especially useful at higher noise levels. Compared to the baseline, up to 7% relative improvement in WER is reported at -3 dB SNR. |
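
The two-task setup described in the abstract can be illustrated with a minimal sketch of a combined loss for one frame. Everything beyond the abstract is an assumption made for illustration: the single shared ReLU layer, the layer sizes, the auxiliary weight `aux_weight`, and the choice to feed both output heads from the fused input are hypothetical and are not taken from the paper.

```python
import math
import random

def dense(x, weights, bias):
    """Affine layer: x has in_dim floats, weights has in_dim rows of out_dim floats."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def init_layer(rng, in_dim, out_dim):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
    return weights, [0.0] * out_dim

def mtl_loss(params, fused, primary_label, aux_label, aux_weight=0.3):
    """Cross-entropy of both heads for one frame, combined as primary + aux_weight * aux.

    Assumption: both heads share one hidden layer fed by the fused feature vector.
    """
    h = relu(dense(fused, *params["shared"]))          # shared representation
    p_primary = softmax(dense(h, *params["primary"]))  # acoustic-model frame labels
    p_aux = softmax(dense(h, *params["aux"]))          # visual-model frame labels
    loss_primary = -math.log(p_primary[primary_label])
    loss_aux = -math.log(p_aux[aux_label])
    return loss_primary + aux_weight * loss_aux, loss_primary, loss_aux

rng = random.Random(0)
params = {
    "shared": init_layer(rng, 6, 8),   # 6 fused input dims, 8 hidden units (hypothetical)
    "primary": init_layer(rng, 8, 4),  # 4 primary label classes (hypothetical)
    "aux": init_layer(rng, 8, 3),      # 3 auxiliary label classes (hypothetical)
}
fused_frame = [rng.gauss(0.0, 1.0) for _ in range(6)]
total, loss_primary, loss_aux = mtl_loss(params, fused_frame, primary_label=2, aux_label=1)
print(total, loss_primary, loss_aux)
```

Setting `aux_weight` to zero reduces the objective to the primary task alone, which is one way to compare a multi-task model against a single-task baseline under otherwise identical settings.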
Year | Venue | Field |
---|---|---|
2017 | arXiv: Computation and Language | Multi-task learning, Computer science, Speech recognition, Artificial intelligence, Hidden Markov model, Deep neural networks, Machine learning |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1701.02477 | 2 |
PageRank | References | Authors |
---|---|---|
0.36 | 11 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Abhinav Thanda | 1 | 2 | 1.04 |
Shankar M. Venkatesan | 2 | 89 | 12.61 |