Abstract |
---|
The objective of this work is to annotate sign instances across a broad vocabulary in continuous sign language. We train a Transformer model to ingest a continuous signing stream and output a sequence of written tokens, using a large-scale collection of signing footage with weakly-aligned subtitles. We show that through this training the model acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation. Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/CVPR46437.2021.01658 | 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021) |
DocType | ISSN | Citations |
---|---|---|
Conference | 1063-6919 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 14 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gül Varol | 1 | 243 | 10.32 |
Liliane Momeni | 2 | 1 | 1.37 |
Samuel Albanie | 3 | 40 | 9.91 |
Triantafyllos Afouras | 4 | 121 | 9.19 |
Andrew Zisserman | 5 | 45998 | 3200.71 |