Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition. - Citegraph

Paper Info

Title
Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition.

Abstract
Multi-task learning (MTL) involves the simultaneous training of two or more related tasks over shared representations. In this work, we apply MTL to audio-visual automatic speech recognition (AV-ASR). Our primary task is to learn a mapping between audio-visual fused features and frame labels obtained from an acoustic GMM/HMM model. This is combined with an auxiliary task which maps visual features to frame labels obtained from a separate visual GMM/HMM model. The MTL model is tested at various levels of babble noise, and the results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results indicate that MTL is especially useful at higher levels of noise. Compared to the baseline, a relative improvement in WER of up to 7% is reported at -3 dB SNR.
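
The two-task objective described in the abstract can be illustrated with a minimal sketch. It assumes that each task is scored with a per-frame softmax cross-entropy and that the two losses are combined as a weighted sum. The abstract does not state the loss, the weighting, or the network layout, so the function names and the `aux_weight` parameter here are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one frame: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(primary_logits, primary_label,
                   aux_logits, aux_label, aux_weight=0.5):
    """Weighted sum of the primary and auxiliary frame losses.

    primary_*: output head trained on acoustic GMM/HMM frame labels.
    aux_*:     output head trained on visual GMM/HMM frame labels.
    aux_weight is an assumed hyperparameter, not a value from the paper.
    """
    primary = softmax_cross_entropy(primary_logits, primary_label)
    auxiliary = softmax_cross_entropy(aux_logits, aux_label)
    return primary + aux_weight * auxiliary
```

Under this assumption, setting `aux_weight` to 0 reduces the objective to the single-task loss, and the auxiliary head would typically be dropped at decoding time so that only the primary output is used.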

Year	Venue	Field
2017	arXiv: Computation and Language	Multi-task learning,Computer science,Speech recognition,Artificial intelligence,Hidden Markov model,Deep neural networks,Machine learning
DocType	Volume	Citations
Journal	abs/1701.02477	2
PageRank	References	Authors
0.36	11	2

Authors (2 rows)

Name	Order	Citations	PageRank
Abhinav Thanda	1	2	1.04
Shankar M. Venkatesan	2	89	12.61
