Title
An Embedded System for In-Vehicle Visual Speech Activity Detection
Abstract
We present a system for automatically detecting the driver's speech in the automobile domain using visual-only information extracted from the driver's mouth region. The work is motivated by the desire to eliminate manual push-to-talk activation of the speech recognition engine in newly designed voice interfaces in the typically noisy car environment, aiming at reducing driver cognitive load and increasing the naturalness of the interaction. The proposed system uses a camera mounted on the rearview mirror to monitor the driver, detect face boundaries and facial features, and finally employ lip motion clues to recognize visual speech activity. In particular, the designed algorithm has very low computational cost, which allows real-time implementation on currently available inexpensive embedded platforms, as described in the paper. Experiments on a small multi-speaker database collected in moving automobiles are also reported and demonstrate promising accuracy.

In this paper, we investigate an alternative approach to the problem of automatically detecting when the driver speaks. The proposed solution exploits visual information from the driver's mouth region, which of course is not affected by the noisy acoustic car environment. The approach is motivated by ongoing work on audio-visual automatic speech recognition (AVASR), where automatic lipreading has been repeatedly demonstrated to significantly improve ASR accuracy, especially in noisy environments [10]. The system proposed here uses a similar principle: it captures visual information by an appropriately designed and placed camera and processes it through a sequence of simple algorithmic steps in order to drive voice activity detection (VAD). The particular algorithms are chosen to allow real-time implementation on the limited-resource platforms typically used in automobiles, enabling low-cost solution integration. Details of the proposed system are presented in Section II, with the embedded platform implementation described in Section IV. The algorithm is tested on a small multi-subject database recorded in moving automobiles, as reported in Section III. Finally, Section V closes the paper with a short summary.
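To make the pipeline described above concrete, here is a minimal sketch of a visual voice activity detector with the same general shape: face detection, mouth-region extraction, lip-motion measurement, and a speech/non-speech decision. This is not the paper's algorithm. It assumes OpenCV's stock Haar face cascade in place of the authors' face and facial-feature detectors, uses simple inter-frame differencing over the lower face as a stand-in for their lip motion clues, and the MOTION_THRESHOLD constant is an illustrative tuning value, not a figure from the paper.

    # Hypothetical visual-VAD sketch: face detection -> mouth ROI ->
    # lip-motion energy -> speech/non-speech decision. Not the authors'
    # method; a simplified stand-in built on OpenCV primitives.
    import cv2

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    MOTION_THRESHOLD = 4.0  # assumed tuning constant, not from the paper
    prev_mouth = None

    def mouth_roi(gray, face):
        """Crop the lower third of the detected face as a crude mouth region."""
        x, y, w, h = face
        return gray[y + 2 * h // 3 : y + h, x : x + w]

    def is_speaking(frame):
        """Return True when mean inter-frame pixel change in the mouth ROI
        exceeds the threshold; a cheap proxy for visual speech activity."""
        global prev_mouth
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3,
                                              minNeighbors=5)
        if len(faces) == 0:
            prev_mouth = None
            return False
        mouth = mouth_roi(gray, faces[0])
        speaking = False
        if prev_mouth is not None and prev_mouth.shape == mouth.shape:
            # Mean absolute frame difference as a lip-motion energy measure.
            speaking = cv2.absdiff(mouth, prev_mouth).mean() > MOTION_THRESHOLD
        prev_mouth = mouth
        return speaking

    cap = cv2.VideoCapture(0)  # in the paper, a camera on the rearview mirror
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        print("speech" if is_speaking(frame) else "silence")

A deployed detector would typically also smooth the per-frame decision over a short window (a hangover scheme), both to bridge brief lip closures within speech and to suppress isolated motion spikes from head movement or lighting changes.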
Year
2007
DOI
10.1109/MMSP.2007.4412866
Venue
Crete
Keywords
driver information systems, embedded systems, face recognition, feature extraction, speech recognition, automobile domain, driver speech, embedded system, face boundary detection, in-vehicle visual speech activity detection, lip motion clues, push-to-talk activation, speech recognition engine, visual-only information extraction, voice interfaces
Field
Computer vision, Facial recognition system, Mouth region, Voice activity detection, Computer science, Naturalness, Feature extraction, Speech recognition, Artificial intelligence, Cognitive load
DocType
Conference
ISBN
978-1-4244-1274-7
Citations
5
PageRank
0.47
References
3
Authors
4
Name                 Order   Citations   PageRank
Libal, V.            1       5           0.47
Jonathan Connell     2       5           0.81
Potamianos, G.       3       28          1.66
Etienne Marcheret    4       100         11.15