Title |
---|
Lipreading by Neural Networks: Visual Preprocessing, Learning, and Sensory Integration |
Abstract |
---|
We have developed visual preprocessing algorithms for extracting phonologically relevant features from the grayscale video image of a speaker, to provide speaker-independent inputs for an automatic lipreading ("speechreading") system. Visual features such as mouth open/closed, tongue visible/not-visible, teeth visible/not-visible, and several shape descriptors of the mouth and its motion are all rapidly computable in a manner quite insensitive to lighting conditions. We formed a hybrid speechreading system consisting of two time delay neural networks (video and acoustic) and integrated their responses by means of independent opinion pooling - the Bayesian optimal method given conditional independence, which seems to hold for our data. This hybrid system had an error rate 25% lower than that of the acoustic subsystem alone on a five-utterance speaker-independent task, indicating that video can be used to improve speech recognition. |
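The independent opinion pooling mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each time delay network outputs a posterior distribution over the five utterance classes, and the function name and probability values are hypothetical. Under conditional independence (and a uniform class prior), the Bayes-optimal combination multiplies the two posteriors class-wise and renormalizes.

```python
import numpy as np

def opinion_pool(p_video, p_acoustic):
    """Independent opinion pooling: multiply per-class posteriors from
    two (assumed conditionally independent) subsystems, then renormalize
    so the result is again a probability distribution."""
    joint = np.asarray(p_video) * np.asarray(p_acoustic)
    return joint / joint.sum()

# Illustrative posteriors over five utterance classes from each network.
p_video    = np.array([0.40, 0.30, 0.10, 0.10, 0.10])
p_acoustic = np.array([0.25, 0.45, 0.10, 0.10, 0.10])

combined = opinion_pool(p_video, p_acoustic)
print(combined.argmax())  # the pooled evidence favors class 1
```

Note that pooling can overturn the video network's top choice: the acoustic network's stronger evidence for class 1 dominates the product even though the video network preferred class 0.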
Year | Venue | Keywords |
---|---|---|
1993 | NIPS | neural network |
Field | DocType | Citations
---|---|---|
Computer science, Conditional independence, Word error rate, Speech recognition, Preprocessor, Artificial intelligence, Artificial neural network, Sensory system, Speechreading, Hybrid system, Machine learning, Grayscale | Conference | 21
PageRank | References | Authors
---|---|---|
13.45 | 4 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gregory J. Wolff | 1 | 212 | 40.46 |
K. Venkatesh Prasad | 2 | 128 | 24.66 |
David G. Stork | 3 | 627 | 106.17 |
Marcus E. Hennecke | 4 | 36 | 18.65 |