Title
Lipreading by Neural Networks: Visual Preprocessing, Learning, and Sensory Integration
Abstract
We have developed visual preprocessing algorithms for extracting phonologically relevant features from the grayscale video image of a speaker, to provide speaker-independent inputs for an automatic lipreading ("speechreading") system. Visual features such as mouth open/closed, tongue visible/not-visible, teeth visible/not-visible, and several shape descriptors of the mouth and its motion are all rapidly computable in a manner quite insensitive to lighting conditions. We formed a hybrid speechreading system consisting of two time-delay neural networks (video and acoustic) and integrated their responses by means of independent opinion pooling, the Bayesian optimal method given conditional independence, which seems to hold for our data. This hybrid system had an error rate 25% lower than that of the acoustic subsystem alone on a five-utterance speaker-independent task, indicating that video can be used to improve speech recognition.
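As a concrete illustration of the integration step described in the abstract, the following is a minimal sketch of independent opinion pooling, not the paper's actual implementation: assuming each time-delay network emits per-class posterior probabilities, conditional independence of the video and acoustic evidence given the class gives P(c | v, a) proportional to P(c | v) * P(c | a) / P(c). The function name, the example posteriors, and the uniform class priors below are all hypothetical.

import numpy as np

def independent_opinion_pool(p_video, p_audio, priors):
    """Fuse per-class posteriors from two modalities assumed to be
    conditionally independent given the class:
        P(c | v, a)  proportional to  P(c | v) * P(c | a) / P(c)
    All arguments are 1-D arrays over the same set of classes."""
    fused = p_video * p_audio / priors
    return fused / fused.sum()  # renormalize to a proper distribution

# Hypothetical network outputs for a five-class (five-utterance) task.
p_video = np.array([0.50, 0.20, 0.15, 0.10, 0.05])  # video TDNN posteriors
p_audio = np.array([0.30, 0.35, 0.20, 0.10, 0.05])  # acoustic TDNN posteriors
priors = np.full(5, 0.2)                            # uniform class priors

print(independent_opinion_pool(p_video, p_audio, priors))

Because the pooled posterior is a normalized product, evidence that either modality assigns low probability to a class suppresses that class in the combined output, which is the mechanism by which the video channel can correct acoustic confusions.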
Year
1993
Venue
NIPS
Keywords
neural network
Field
Computer science, Conditional independence, Word error rate, Speech recognition, Preprocessor, Artificial intelligence, Artificial neural network, Sensory system, Speechreading, Hybrid system, Machine learning, Grayscale
DocType
Conference
Citations
21
PageRank
13.45
References
4
Authors
4
Name                 Order  Citations  PageRank
Gregory J. Wolff     1      212        40.46
K. Venkatesh Prasad  2      128        24.66
David G. Stork       3      627        106.17
Marcus E. Hennecke   4      36         18.65