Title
A Comparative Study of 2D and 3D Lip Tracking Methods for AV ASR
Abstract
Over the past two decades, many algorithms have been proposed to detect and track a human face and its facial features. Of particular interest to the Automatic Speech Recognition (ASR) community are algorithms that can track the shape of the lips, as such visual speech input can then be used in an auditory-visual (AV) ASR system to improve the recognition accuracy of traditional audio-only ASR systems, particularly in the presence of acoustic noise. Despite the large number of face and lip tracking algorithms that have been proposed over the years, there is a lack of a comparative study that evaluates such algorithms in the context of AV ASR performance. In this paper, the performance of various 2D and 3D lip tracking algorithms is compared from a point of view of AV ASR. In particular, the focus of this study is on algorithms that use explicit lip models. A number of variants of the recently popular Active Appearance Models (AAMs) are compared with a 3D lip tracking algorithm that uses stereo vision. All performance evaluations are made using the AVOZES data corpus.
Index Terms: Lip tracking, auditory-visual automatic speech recognition, active appearance model
Year: 2008
Venue: AVSP
DocType: Conference
Citations: 0
PageRank: 0.34
References: 17
Authors
1
Name
Order
Citations
PageRank
Akshay Asthana172925.02