Title
Multi-View Visual Speech Recognition Based On Multi Task Learning
Abstract
Visual speech recognition (VSR), also known as lipreading, is the task of recognizing a word or phrase from a video clip of lip movement. Traditional VSR methods are limited in that they rely mostly on frontal-view facial movement. This limitation should be relaxed to include lip movement from all angles. In this paper, we propose a pose-invariant network that can recognize a word spoken from input of any arbitrary view. An architecture that combines a convolutional neural network (CNN) with a bidirectional long short-term memory (LSTM) network is trained in a multi-task manner such that the pose and the spoken word are jointly classified, with pose classification serving as the auxiliary task. The performance of the proposed multi-task learning is comparatively evaluated on the OuluVS2 benchmark dataset. The experimental results show that the deep model learned with the proposed multi-task learning method outperforms both previous single-view VSR methods and previous multi-view lipreading methods, achieving a recognition accuracy of 95.0% on the OuluVS2 dataset.
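The abstract describes a per-frame CNN encoder followed by a bidirectional LSTM, with two classification heads trained jointly on the spoken word (main task) and the camera pose (auxiliary task). The following is a minimal sketch of that multi-task setup; the layer choices, dimensions, class counts, and the auxiliary loss weight are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class MultiViewVSR(nn.Module):
    """Hypothetical CNN + BiLSTM with joint word/pose heads (not the paper's exact model)."""
    def __init__(self, num_words=10, num_poses=5, feat_dim=256, hidden=256):
        super().__init__()
        # Per-frame CNN encoder (placeholder stack; the paper's CNN is not specified here)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Bidirectional LSTM aggregates the frame features over time
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.word_head = nn.Linear(2 * hidden, num_words)  # main task: spoken word
        self.pose_head = nn.Linear(2 * hidden, num_poses)  # auxiliary task: view/pose

    def forward(self, clips):                       # clips: (B, T, 1, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # encode each frame
        seq, _ = self.lstm(feats)
        pooled = seq.mean(dim=1)                    # average the BiLSTM outputs over time
        return self.word_head(pooled), self.pose_head(pooled)

# Joint multi-task loss: word loss plus a weighted auxiliary pose loss.
model = MultiViewVSR()
clips = torch.randn(4, 20, 1, 64, 64)               # dummy batch: 4 clips of 20 frames
word_labels = torch.randint(0, 10, (4,))
pose_labels = torch.randint(0, 5, (4,))
word_logits, pose_logits = model(clips)
lam = 0.5                                           # assumed auxiliary-task weight
loss = nn.functional.cross_entropy(word_logits, word_labels) \
     + lam * nn.functional.cross_entropy(pose_logits, pose_labels)
loss.backward()
```

The weight lam trades off the auxiliary pose objective against the main word objective; its value here is an assumption for illustration only.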
Year
2017
Venue
2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)
Keywords
lipreading, multi view, multi task, pose-invariant, Visual Speech Recognition
Field
Multi-task learning, Task analysis, Pattern recognition, Visualization, Convolutional neural network, Computer science, Phrase, Speech recognition, Artificial intelligence, Facial movement
DocType
Conference
ISSN
1522-4880
Citations
0
PageRank
0.34
References
0
Authors
3

Name          Order  Citations  PageRank
HouJeung Han  1      0          0.34
Sunghun Kang  2      5          2.00
Chang D. Yoo  3      375        45.88