Abstract |
---|
In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) a speaker recognition network that produces speaker-discriminative embeddings; (2) a spectrogram masking network that takes both the noisy spectrogram and the speaker embedding as input, and produces a mask. Our system significantly reduces the speech recognition WER on multi-speaker signals, with minimal WER degradation on single-speaker signals. |
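The two-stage inference pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: both networks are stubbed with hypothetical random linear layers (`speaker_encoder`, `masking_network` are illustrative names), and only the data flow — reference audio → speaker embedding → frame-wise conditioning → soft mask → masked spectrogram — mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_encoder(reference_spec):
    # Hypothetical stand-in for the speaker recognition network:
    # pool frames and project to a fixed-size, unit-norm embedding.
    W = rng.standard_normal((reference_spec.shape[1], 256))
    emb = reference_spec.mean(axis=0) @ W
    return emb / np.linalg.norm(emb)

def masking_network(noisy_spec, speaker_emb):
    # Hypothetical stand-in for the spectrogram masking network:
    # concatenate the embedding to every frame, then predict a
    # soft mask in [0, 1] via a sigmoid.
    frames, bins = noisy_spec.shape
    tiled = np.repeat(speaker_emb[None, :], frames, axis=0)
    x = np.concatenate([noisy_spec, tiled], axis=1)
    W = rng.standard_normal((x.shape[1], bins)) * 0.01
    return 1.0 / (1.0 + np.exp(-(x @ W)))

# Toy magnitude spectrograms, shape (frames, frequency bins):
reference = np.abs(rng.standard_normal((50, 257)))   # clean target-speaker audio
noisy = np.abs(rng.standard_normal((120, 257)))      # multi-speaker mixture

d_vector = speaker_encoder(reference)
mask = masking_network(noisy, d_vector)
enhanced = mask * noisy  # element-wise masking of the noisy spectrogram
```

The enhanced spectrogram has the same shape as the noisy input; in the real system it would be fed to the speech recognizer in place of the mixture.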
Year | DOI | Venue
---|---|---|
2019 | 10.21437/Interspeech.2019-1101 | arXiv: Audio and Speech Processing

DocType | Volume | Citations
---|---|---|
Conference | abs/1810.04826 | 13

PageRank | References | Authors
---|---|---|
0.60 | 8 | 10
Name | Order | Citations | PageRank |
---|---|---|---|
Quan Wang | 1 | 115 | 20.15 |
Hannah Muckenhirn | 2 | 29 | 3.08 |
Kevin W. Wilson | 3 | 348 | 28.35 |
Prashant Sridhar | 4 | 14 | 1.28 |
Zelin Wu | 5 | 15 | 2.00 |
John R. Hershey | 6 | 844 | 65.57 |
Rif Saurous | 7 | 148 | 10.49 |
Ron J. Weiss | 8 | 443 | 29.47 |
Ye Jia | 9 | 58 | 4.68 |
Ignacio Lopez-Moreno | 10 | 187 | 14.97 |