Title
Towards View-Independent Viseme Recognition Based on CNNs and Synthetic Data
Abstract
Visual Speech Recognition is the task of interpreting spoken text from video information alone. To address this task automatically, recent works have employed Deep Learning and obtained high accuracy on the recognition of words and sentences uttered in controlled environments with limited head-pose variation. However, accuracy drops for multi-view datasets, and when it comes to interpreting isolated mouth shapes, such as visemes, the reported values are considerably lower, as shorter segments of speech lack temporal and contextual information. In this work, we evaluate the applicability of synthetic datasets for assisting viseme recognition in real-world data acquired under controlled and uncontrolled environments, using the GRID and AVICAR datasets, respectively. We create two large-scale synthetic 2D datasets based on realistic 3D facial models, one with near-frontal and one with multi-view mouth images. Our experiments indicate that a transfer learning approach using synthetic data can achieve higher accuracy than training from scratch on real data only, in both scenarios.
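A minimal sketch of the synthetic-to-real transfer learning setup the abstract describes: pre-train a CNN viseme classifier on a large synthetic mouth-image dataset, then fine-tune it on real data instead of training from scratch. PyTorch, the ResNet-18 backbone, the dataset paths, the 64x64 crop size, and the 14-class viseme inventory are all assumptions for illustration; the abstract does not specify the framework, architecture, or hyperparameters.

```python
# Hedged sketch of synthetic-pretraining + real-data fine-tuning.
# All names, paths, and hyperparameters below are hypothetical.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

def make_loader(root, batch_size=64):
    """ImageFolder layout assumed: one sub-directory per viseme class."""
    tfm = transforms.Compose([
        transforms.Resize((64, 64)),  # assumed mouth-crop resolution
        transforms.ToTensor(),
    ])
    return DataLoader(datasets.ImageFolder(root, tfm),
                      batch_size=batch_size, shuffle=True)

def train(model, loader, epochs, lr):
    """Plain supervised training loop with cross-entropy loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

NUM_VISEMES = 14  # placeholder; viseme inventories vary by phoneme-to-viseme map
model = models.resnet18(num_classes=NUM_VISEMES)  # stand-in CNN

# 1) Pre-train on the large synthetic 2D mouth-image dataset.
train(model, make_loader("synthetic_visemes/"), epochs=10, lr=1e-3)

# 2) Fine-tune on real mouth crops (e.g. from GRID or AVICAR) at a lower
#    learning rate, rather than training from scratch on real data alone.
train(model, make_loader("real_visemes/"), epochs=5, lr=1e-4)
```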
Year
2018
Venue
2018 25th IEEE International Conference on Image Processing (ICIP)
Keywords
Image recognition, Speech recognition, Computer graphics, Machine learning
Field
Pattern recognition, Task analysis, Viseme, Computer science, Transfer of learning, Synthetic data, Solid modeling, Artificial intelligence, Deep learning, Hidden Markov model, Grid
DocType
Conference
ISSN
1522-4880
Citations
0
PageRank
0.34
References
0
Authors
3