Abstract |
---|
Visual speech recognition (VSR), also known as lipreading, is the task of recognizing a word or phrase from a video clip of lip movement. Traditional VSR methods are limited in that they rely mostly on frontal views of facial movement. This limitation should be relaxed to include lip movement from arbitrary angles. In this paper, we propose a pose-invariant network that can recognize words spoken in input captured from any arbitrary view. An architecture combining a convolutional neural network (CNN) with a bidirectional long short-term memory (LSTM) network is trained in a multi-task manner such that the pose and the spoken word are jointly classified, with pose classification treated as the auxiliary task. The performance of the proposed multi-task learning is evaluated on the OuluVS2 benchmark dataset. The experimental results show that the deep model trained with the proposed multi-task learning method outperforms both previous single-view VSR methods and previous multi-view lipreading methods, achieving a recognition accuracy of 95.0% on OuluVS2. |
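The architecture described in the abstract (a per-frame CNN feeding a bidirectional LSTM, with word classification as the main task and pose classification as the auxiliary task) can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact model: the layer sizes, the number of word/pose classes, and the auxiliary loss weight of 0.5 are all assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskLipNet(nn.Module):
    """CNN + bidirectional LSTM with two heads: word classification
    (main task) and pose classification (auxiliary task).
    Layer sizes are illustrative assumptions, not the paper's."""
    def __init__(self, num_words=10, num_poses=5, feat_dim=64, hidden=32):
        super().__init__()
        # Per-frame CNN feature extractor, applied to every video frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        # Bidirectional LSTM over the sequence of frame features.
        self.lstm = nn.LSTM(feat_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.word_head = nn.Linear(2 * hidden, num_words)  # main task
        self.pose_head = nn.Linear(2 * hidden, num_poses)  # auxiliary task

    def forward(self, clips):          # clips: (batch, time, 1, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.lstm(feats)
        last = seq[:, -1]              # summary at the final time step
        return self.word_head(last), self.pose_head(last)

# Joint multi-task loss: word loss plus a weighted auxiliary pose loss.
model = MultiTaskLipNet()
clips = torch.randn(2, 6, 1, 32, 32)   # 2 dummy clips, 6 frames each
word_logits, pose_logits = model(clips)
loss = nn.functional.cross_entropy(word_logits, torch.tensor([0, 1])) \
     + 0.5 * nn.functional.cross_entropy(pose_logits, torch.tensor([2, 3]))
```

Backpropagating the joint loss trains the shared CNN and LSTM on both objectives, which is the mechanism by which the auxiliary pose task encourages pose-invariant word features.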
Year | Venue | Keywords |
---|---|---|
2017 | 2017 24th IEEE International Conference on Image Processing (ICIP) | lipreading, multi view, multi task, pose-invariant, Visual Speech Recognition
Field | DocType | ISSN
---|---|---
Multi-task learning, Task analysis, Pattern recognition, Visualization, Convolutional neural network, Computer science, Phrase, Speech recognition, Artificial intelligence, Facial movement | Conference | 1522-4880
Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
HouJeung Han | 1 | 0 | 0.34 |
Sunghun Kang | 2 | 5 | 2.00 |
Chang D. Yoo | 3 | 375 | 45.88 |