Title
Regression based landmark estimation and multi-feature fusion for visual speech recognition
Abstract
Visual speech recognition, also known as lipreading, can improve the robustness of automatic acoustic speech recognition, especially in noisy environments. However, it remains challenging because of the variety of speaking characteristics and the confusion between visual speech features. In this paper, we propose an automatic lipreading method that combines a new lip tracking method with the fusion of multiple visual features. First, regression-based face landmark estimation is employed for lip detection, on the basis of which a geometric shape invariant feature (SIF) is put forward. Moreover, it can also be applied to remove non-speaking utterances. Then, motion interchange patterns and spatial-temporal descriptors are also adopted to describe the lip information, with a Bayes combination strategy applied to fuse the features. The proposed method is evaluated on three benchmark data sets: AVLetters2, OuluVS and PKUVS. Experimental results are promising and show the effectiveness of the proposed approach. © 2015 IEEE.
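The abstract names a Bayes combination strategy but does not state its exact form. As a minimal sketch, assuming the common product rule for combining classifiers (feature streams treated as conditionally independent given the class), per-feature posteriors could be fused as below. The function name `bayes_product_fusion`, the three example streams and all numbers are hypothetical illustrations, not taken from the paper.

```python
import math


def bayes_product_fusion(posteriors, priors=None, eps=1e-12):
    """Fuse per-feature class posteriors with the product rule.

    Assumption (not from the paper): streams are conditionally
    independent given the class, so
        P(c | x_1..x_K)  is proportional to  P(c)^(1-K) * prod_k P(c | x_k).

    posteriors: list of K lists, each holding C probabilities P(c | x_k).
    priors:     optional list of C class priors; uniform when None.
    Returns a list of C fused probabilities that sums to 1.
    """
    num_streams = len(posteriors)
    num_classes = len(posteriors[0])
    if priors is None:
        priors = [1.0 / num_classes] * num_classes

    # Work in log space so that products of small numbers do not underflow.
    log_scores = []
    for c in range(num_classes):
        score = (1 - num_streams) * math.log(priors[c] + eps)
        for stream in posteriors:
            score += math.log(stream[c] + eps)
        log_scores.append(score)

    # Normalise with the max-subtraction trick for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical posteriors over three word classes from three feature streams.
shape_stream = [0.6, 0.3, 0.1]
motion_stream = [0.5, 0.4, 0.1]
spatiotemporal_stream = [0.2, 0.7, 0.1]

fused = bayes_product_fusion([shape_stream, motion_stream, spatiotemporal_stream])
best_class = max(range(len(fused)), key=lambda c: fused[c])
```

In this made-up example two streams favour class 0, yet the fused decision is class 1, because the third stream is much more confident. That sensitivity to a single confident stream is the usual trade-off of the product rule compared with averaging.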
Year: 2015
DOI: 10.1109/ICIP.2015.7350911
Venue: Proceedings - International Conference on Image Processing, ICIP
Keywords: Visual Speech Recognition, Shape Invariant Features, Motion Interchange Patterns, Bayes Combination
Field: Computer vision, Feature fusion, Regression, Pattern recognition, Computer science, Speech recognition, Artificial intelligence, Landmark
DocType: Conference
Volume: 2015-December
ISSN: 1522-4880
Citations: 0
PageRank: 0.34
References: 10
Authors: 3
Name, Order, Citations, PageRank
Hong Liu, 1, 747, 82.65
Xue-Wu Zhang, 2, 43, 11.98
Wu Pingping, 3, 32, 4.36