Title
Read and Attend: Temporal Localisation in Sign Language Videos
Abstract
The objective of this work is to annotate sign instances across a broad vocabulary in continuous sign language. We train a Transformer model on a large-scale collection of signing footage with weakly-aligned subtitles to ingest a continuous signing stream and output a sequence of written tokens. We show that through this training the model acquires the ability to attend to a large vocabulary of sign instances in the input sequence, enabling their localisation. Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the data newly annotated by our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.
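To illustrate the idea sketched in the abstract, the snippet below shows how decoder cross-attention over video features can double as a temporal localiser: each predicted written token carries an attention distribution over input frames, and the peak of that distribution gives a candidate location for the corresponding sign. This is a minimal, hypothetical PyTorch sketch; the class name CrossAttentionLocaliser, the dimensions, and the single attention layer are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class CrossAttentionLocaliser(nn.Module):
        """Toy model: written-token queries attend over video features.

        Hypothetical sketch, not the paper's implementation; a real
        Transformer would stack several self- and cross-attention layers.
        """
        def __init__(self, d_model=256, n_heads=4, vocab_size=1000):
            super().__init__()
            self.token_emb = nn.Embedding(vocab_size, d_model)
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                    batch_first=True)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, video_feats, token_ids):
            # video_feats: (B, T, d_model) features of the signing stream
            # token_ids:   (B, L) subtitle tokens (teacher forcing)
            queries = self.token_emb(token_ids)
            ctx, attn = self.cross_attn(queries, video_feats, video_feats,
                                        need_weights=True)  # attn: (B, L, T)
            logits = self.out(ctx)           # per-token vocabulary logits
            locations = attn.argmax(dim=-1)  # (B, L): peak frame per token
            return logits, attn, locations

    # Example: localise each of 12 subtitle tokens within a 100-frame clip.
    model = CrossAttentionLocaliser()
    feats = torch.randn(2, 100, 256)
    tokens = torch.randint(0, 1000, (2, 12))
    logits, attn, locations = model(feats, tokens)
    print(locations.shape)  # torch.Size([2, 12])

Under these assumptions, turning attention maps into automatic sign annotations amounts to reading off where each written token's attention peaks and keeping confident peaks as localised instances.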
Year
2021
DOI
10.1109/CVPR46437.2021.01658
Venue
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021
DocType
Conference
ISSN
1063-6919
Citations
0
PageRank
0.34
References
14
Authors
5
Name | Order | Citations | PageRank
Gül Varol | 1 | 243 | 10.32
Liliane Momeni | 2 | 1 | 1.37
Samuel Albanie | 3 | 40 | 9.91
Triantafyllos Afouras | 4 | 121 | 9.19
Andrew Zisserman | 5 | 45998 | 3200.71