Title
Lipreading by Neural Networks: Visual Preprocessing, Learning, and Sensory Integration
Abstract
We have developed visual preprocessing algorithms for extracting phonologically relevant features from the grayscale video image of a speaker, to provide speaker-independent inputs for an automatic lipreading ("speechreading") system. Visual features such as mouth open/closed, tongue visible/not-visible, teeth visible/not-visible, and several shape descriptors of the mouth and its motion are all rapidly computable in a manner quite insensitive to lighting conditions. We formed a hybrid speechreading system consisting of two time-delay neural networks (video and acoustic) and integrated their responses by means of independent opinion pooling, the Bayesian optimal method given conditional independence, which seems to hold for our data. This hybrid system had an error rate 25% lower than that of the acoustic subsystem alone on a five-utterance speaker-independent task, indicating that video can be used to improve speech recognition.
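As a concrete illustration of the integration step described in the abstract, the following is a minimal sketch of independent opinion pooling, not the paper's actual implementation: assuming each time-delay network emits per-class posterior probabilities, conditional independence of the video and acoustic evidence given the class gives P(c | v, a) proportional to P(c | v) * P(c | a) / P(c). The function name, the example posteriors, and the uniform class priors below are all hypothetical.

import numpy as np

def independent_opinion_pool(p_video, p_audio, priors):
    """Fuse per-class posteriors from two modalities assumed to be
    conditionally independent given the class:
        P(c | v, a)  proportional to  P(c | v) * P(c | a) / P(c)
    All arguments are 1-D arrays over the same set of classes."""
    fused = p_video * p_audio / priors
    return fused / fused.sum()  # renormalize to a proper distribution

# Hypothetical network outputs for a five-class (five-utterance) task.
p_video = np.array([0.50, 0.20, 0.15, 0.10, 0.05])  # video TDNN posteriors
p_audio = np.array([0.30, 0.35, 0.20, 0.10, 0.05])  # acoustic TDNN posteriors
priors = np.full(5, 0.2)                            # uniform class priors

print(independent_opinion_pool(p_video, p_audio, priors))

Because the pooled posterior is a normalized product, evidence that either modality assigns low probability to a class suppresses that class in the combined output, which is the mechanism by which the video channel can correct acoustic confusions.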
Year
1993
Venue
NIPS
Keywords
neural network
Field
Computer science, Conditional independence, Word error rate, Speech recognition, Preprocessor, Artificial intelligence, Artificial neural network, Sensory system, Speechreading, Hybrid system, Machine learning, Grayscale
DocType
Conference
Citations
21
PageRank
13.45
References
4
Authors
4
Name                 Order  Citations  PageRank
Gregory J. Wolff     1      212        40.46
K. Venkatesh Prasad  2      128        24.66
David G. Stork       3      627        106.17
Marcus E. Hennecke   4      36         18.65