Abstract | ||
---|---|---|
We are at an exciting time for machine lipreading. Traditionally, research stemmed from the adaptation of audio recognition systems, but now the computer vision community is also participating. This joining of two previously disparate areas, each with its own perspective on computer lipreading, is creating opportunities for collaboration. In doing so, however, the literature is experiencing challenges in knowledge sharing, because terms and phrases are used in multiple ways and results are scored with a range of different methods. With the intention of improving communication between those researching lipreading, we highlight three areas in particular: the effects of using speech reading and lipreading interchangeably; speaker dependence across train, validation, and test splits; and the use of accuracy, correctness, errors, and varying units (phonemes, visemes, words, and sentences) to measure system performance. We make recommendations as to how we can be more consistent. |
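Accuracy and correctness are easy to conflate, so a small illustration may help. The sketch below assumes the conventional HTK-style definitions: correctness = (N - S - D) / N and accuracy = (N - S - D - I) / N, where N is the number of reference units and S, D and I are the substitutions, deletions and insertions found by aligning the recognised sequence to the reference. These are not necessarily the exact definitions the paper recommends. The unit-cost alignment and the names `EditCounts` and `align_counts` are hypothetical, chosen for illustration. Because the function works on plain token lists, the same code scores words, phonemes, or viseme labels, which is why reporting the unit alongside the measure matters.

```python
from dataclasses import dataclass


@dataclass
class EditCounts:
    """Edit counts from aligning a hypothesis against a reference (hypothetical helper)."""
    n_ref: int  # N: number of reference units (assumed non-zero)
    subs: int   # S
    dels: int   # D
    ins: int    # I

    @property
    def correctness(self) -> float:
        # Correct units are N - S - D; insertions are ignored entirely.
        return (self.n_ref - self.subs - self.dels) / self.n_ref

    @property
    def accuracy(self) -> float:
        # Accuracy also penalises insertions, so it can fall below zero.
        return (self.n_ref - self.subs - self.dels - self.ins) / self.n_ref

    @property
    def error_rate(self) -> float:
        # (S + D + I) / N, which equals 1 - accuracy.
        return (self.subs + self.dels + self.ins) / self.n_ref


def align_counts(ref: list[str], hyp: list[str]) -> EditCounts:
    """Unit-cost Levenshtein alignment of two token sequences, then tally edit types."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Backtrace one optimal path and count each edit type along it.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + mismatch:
            subs += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return EditCounts(n, subs, dels, ins)


# Word units: one substitution and one insertion against six reference words.
words = align_counts("the cat sat on the mat".split(), "the cat sat on a mat mat".split())
print(round(words.correctness, 3), round(words.accuracy, 3))  # 0.833 0.667

# Phoneme units: every reference phoneme is found, but three spurious ones are added.
phones = align_counts(["p", "ah", "t"], ["p", "p", "ah", "ah", "t", "t"])
print(phones.correctness, phones.accuracy)  # 1.0 0.0
```

The second example shows why the two measures should not be swapped: a recogniser that over-generates can score perfect correctness while its accuracy is zero. Note also that when several alignments share the same minimum cost, the split between S, D and I depends on tie-breaking (HTK, for instance, uses weighted penalties), so reported counts can differ between scoring tools.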
Year | Venue | Field |
---|---|---|
2017 | arXiv: Computer Vision and Pattern Recognition | Knowledge sharing, Computer science, Viseme, Correctness, Speech recognition, Natural language processing, Artificial intelligence |
DocType | Volume | ISSN |
---|---|---|
Journal | abs/1710.01292 | |
Citation | ||
---|---|---|
Helen L. Bear and Sarah L. Taylor. Visual speech recognition: aligning terminologies for better understanding. British Machine Vision Conference (BMVC) Deep learning for machine lip reading workshop, 2017. |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 10 |
Authors | ||
---|---|---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Helen L. Bear | 1 | 30 | 7.10 |
Sarah L. Taylor | 2 | 67 | 4.77 |