Title | | |
---|---|---|
Regression based landmark estimation and multi-feature fusion for visual speech recognition |
Abstract | | |
---|---|---|
Visual speech recognition, also known as lipreading, can improve the robustness of automatic acoustic speech recognition, especially in noisy environments. However, it remains a challenging topic given the variety of speaking characteristics and the confusion between visual speech features. In this paper, we propose an automatic lipreading method that uses a new lip tracking method and the fusion of multiple kinds of visual information to tackle the problem. First, a regression-based face landmark estimation method is employed for lip detection, based on which a geometric shape invariant feature (SIF) is put forward. It can also be applied to the removal of non-speaking utterances. Motion interchange patterns and spatio-temporal descriptors are then adopted to describe the lip information, and the features are fused using a Bayes combination strategy. The proposed method is evaluated on three benchmark data sets: AVLetters2, OuluVS and PKUVS. Experimental results are promising and demonstrate the effectiveness of the proposed approach. © 2015 IEEE. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/ICIP.2015.7350911 | Proceedings - International Conference on Image Processing, ICIP |
Keywords | Field | DocType |
---|---|---|
Visual Speech Recognition, Shape Invariant Features, Motion Interchange Patterns, Bayes Combination | Computer vision, Feature fusion, Regression, Pattern recognition, Computer science, Speech recognition, Artificial intelligence, Landmark | Conference |
Volume | ISSN | Citations |
---|---|---|
2015-December | 1522-4880 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hong Liu | 1 | 747 | 82.65 |
Xue-Wu Zhang | 2 | 43 | 11.98 |
Wu Pingping | 3 | 32 | 4.36 |