Title
Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition.
Abstract
Multi-task learning (MTL) involves the simultaneous training of two or more related tasks over shared representations. In this work, we apply MTL to audio-visual automatic speech recognition (AV-ASR). Our primary task is to learn a mapping between audio-visual fused features and frame labels obtained from an acoustic GMM/HMM model. This is combined with an auxiliary task which maps visual features to frame labels obtained from a separate visual GMM/HMM model. The MTL model is tested at various levels of babble noise and the results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results indicate that MTL is especially useful at higher levels of noise. Compared to the baseline, up to 7% relative improvement in WER is reported at -3 dB SNR.
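The abstract does not say how the two task losses are combined or how the relative WER figure is computed, so the following is only a minimal sketch under stated assumptions. It assumes a per-frame cross-entropy loss for each task and a weighted sum with a hypothetical weight `aux_weight`. The function names `mtl_loss` and `relative_wer_improvement` are illustrative and are not taken from the paper, and the WER values in the usage lines are made-up placeholders, not reported results.

```python
import math


def cross_entropy(posteriors, label):
    """Negative log-probability of the reference frame label."""
    return -math.log(posteriors[label])


def mtl_loss(primary_post, primary_label, aux_post, aux_label, aux_weight=0.5):
    """Weighted sum of the primary and auxiliary frame-level losses.

    primary_post: posteriors over labels from the acoustic GMM/HMM alignment
    aux_post:     posteriors over labels from the visual GMM/HMM alignment
    aux_weight:   hypothetical interpolation weight (an assumption, not from the paper)
    """
    primary = cross_entropy(primary_post, primary_label)
    auxiliary = cross_entropy(aux_post, aux_label)
    return primary + aux_weight * auxiliary


def relative_wer_improvement(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Usage with placeholder numbers (hypothetical, for illustration only).
loss = mtl_loss([0.7, 0.2, 0.1], 0, [0.5, 0.5], 1, aux_weight=0.5)
gain = relative_wer_improvement(20.0, 18.6)  # roughly a 7% relative reduction
```

Note that a relative improvement is measured against the baseline WER, so in the placeholder example a drop of 1.4 absolute points from 20.0 corresponds to about 7% relative.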
Year
2017
Venue
arXiv: Computation and Language
Field
Multi-task learning, Computer science, Speech recognition, Artificial intelligence, Hidden Markov model, Deep neural networks, Machine learning
DocType
Journal
Volume
abs/1701.02477
Citations
2
PageRank
0.36
References
11
Authors
2
Name                    Order   Citations   PageRank
Abhinav Thanda          1       2           1.04
Shankar M. Venkatesan   2       89          12.61