Personal VAD - Speaker-Conditioned Voice Activity Detection. - Citegraph

Paper Info

Title
Personal VAD - Speaker-Conditioned Voice Activity Detection.

Abstract
In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level. This system is useful for gating the inputs to a streaming speech recognition system, such that it only triggers for the target user, which helps reduce the computational cost and battery consumption. We achieve this by training a VAD-like neural network that is conditioned on the target speaker embedding or the speaker verification score. For every frame, personal VAD outputs the scores for three classes: non-speech, target speaker speech, and non-target speaker speech. With our optimal setup, we are able to train a 130 KB model that outperforms a baseline system where individually trained standard VAD and speaker recognition networks are combined to perform the same task.
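
The frame-level behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a single linear layer with random, untrained weights, conditioning by simply concatenating each frame's features with the target speaker embedding, and a hypothetical gating threshold of 0.5. The names `frame_scores`, `gate_frames`, `FEATURE_DIM`, and `EMBED_DIM` are made up for illustration.

```python
import math
import random

# The three per-frame classes named in the abstract.
CLASSES = ("non-speech", "target speaker speech", "non-target speaker speech")


def softmax(logits):
    """Turn raw logits into scores that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_scores(frame_features, speaker_embedding, weights, biases):
    """Score one frame for the three classes.

    Assumption: conditioning is done by concatenating the frame features
    with the target speaker embedding before one linear layer.
    """
    x = list(frame_features) + list(speaker_embedding)
    logits = [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def gate_frames(frames, speaker_embedding, weights, biases, threshold=0.5):
    """Return one boolean per frame: True when the target-speaker score
    exceeds the (hypothetical) threshold, i.e. the frame would be passed
    on to the speech recognizer."""
    return [
        frame_scores(f, speaker_embedding, weights, biases)[1] > threshold
        for f in frames
    ]


# Toy data: random stand-ins for a trained model and real features.
rng = random.Random(0)
FEATURE_DIM, EMBED_DIM = 4, 3
weights = [
    [rng.uniform(-1, 1) for _ in range(FEATURE_DIM + EMBED_DIM)]
    for _ in CLASSES
]
biases = [0.0] * len(CLASSES)
embedding = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
frames = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(5)]

scores = [frame_scores(f, embedding, weights, biases) for f in frames]
gates = gate_frames(frames, embedding, weights, biases)

for score, keep in zip(scores, gates):
    print([round(s, 3) for s in score], "forward to ASR" if keep else "drop")
```

In a real system the weights would come from training and the embedding from a speaker verification model; only the data flow (per-frame features plus a fixed target embedding in, three class scores out, gating on the target-speaker score) is meant to carry over.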

Year	DOI	Venue	DocType	Citations	PageRank	References	Authors
2020	10.21437/Odyssey.2020-62	Odyssey	Conference	1	0.35	0	5

Authors (5 rows)

Name	Order	Citations	PageRank
Shaojin Ding	1	1	0.35
Quan Wang	2	115	20.15
Shuo-Yiin Chang	3	27	4.71
Li Wan	4	1	2.38
Ignacio Lopez Moreno	5	1	0.35
