Abstract |
---|
Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is ineffective, either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, the available datasets used for training and testing V-VAD lack content variability. We introduce a novel methodology to automatically create and annotate very large in-the-wild datasets - WildVVAD - based on combining A-VAD with face detection and tracking. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with this dataset. |
Year | DOI | Venue
---|---|---
2020 | 10.1109/ICPR48806.2021.9412884 | 2020 25th International Conference on Pattern Recognition (ICPR)

DocType | ISSN | Citations
---|---|---
Conference | 1051-4651 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Sylvain Guy | 1 | 0 | 0.34 |
Stéphane Lathuilière | 2 | 33 | 5.98 |
Pablo Mesejo | 3 | 16 | 3.01 |
Radu Horaud | 4 | 2776 | 261.99 |