Title
Visual speech recognition: aligning terminologies for better understanding.
Abstract
We are at an exciting time for machine lipreading. Traditional research stemmed from the adaptation of audio recognition systems. But now, the computer vision community is also participating. This joining of two previously disparate areas with different perspectives on computer lipreading is creating opportunities for collaborations, but in doing so the literature is experiencing challenges in knowledge sharing due to multiple uses of terms and phrases and the range of methods for scoring results. In particular we highlight three areas with the intention to improve communication between those researching lipreading; the effects of interchanging between speech reading and lipreading; speaker dependence across train, validation, and test splits; and the use of accuracy, correctness, errors, and varying units (phonemes, visemes, words, and sentences) to measure system performance. We make recommendations as to how we can be more consistent.
Year
2017
Venue
arXiv: Computer Vision and Pattern Recognition
Field
Knowledge sharing, Computer science, Viseme, Correctness, Speech recognition, Natural language processing, Artificial intelligence
DocType
Volume
ISSN
Journal
abs/1710.01292
Helen L Bear and Sarah Taylor. Visual speech recognition: aligning terminologies for better understanding. British Machine Vision Conference (BMVC) Deep learning for machine lip reading workshop. 2017
Citations
0
PageRank
0.34
References
10
Authors
2
Name             Order  Citations  PageRank
Helen L. Bear    1      30         7.10
Sarah L. Taylor  2      67         4.77