Title
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
Abstract
In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) a speaker recognition network that produces speaker-discriminative embeddings; (2) a spectrogram masking network that takes both the noisy spectrogram and the speaker embedding as input and produces a mask. Our system significantly reduces the speech recognition word error rate (WER) on multi-speaker signals, with minimal WER degradation on single-speaker signals.
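The data flow described in the abstract can be illustrated with a toy sketch. It is not the authors' model: the abstract does not specify the network architectures, so a single linear layer with a sigmoid stands in for the masking network, the weights are random rather than trained, and the speaker embedding is a made-up vector instead of the output of a speaker recognition network. All names (`predict_mask`, `apply_mask`) and sizes are hypothetical. The sketch only shows the conditioning idea: the embedding is appended to every spectrogram frame, a mask with values between 0 and 1 is predicted, and the mask is multiplied element-wise with the noisy spectrogram.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_mask(spectrogram, embedding, weights, bias):
    """Toy stand-in for the masking network (assumption: one linear layer).

    spectrogram: list of T frames, each a list of F non-negative magnitudes.
    embedding:   list of D floats representing the target speaker.
    weights:     F rows of length F + D; bias: list of F floats.
    Returns a T x F mask with every value in [0, 1].
    """
    mask = []
    for frame in spectrogram:
        # Condition each frame on the speaker by appending the embedding.
        features = frame + embedding
        mask.append([
            sigmoid(sum(w * x for w, x in zip(w_row, features)) + b)
            for w_row, b in zip(weights, bias)
        ])
    return mask


def apply_mask(spectrogram, mask):
    """Element-wise product of the mask and the noisy spectrogram."""
    return [
        [m * s for m, s in zip(m_row, s_row)]
        for m_row, s_row in zip(mask, spectrogram)
    ]


# Hypothetical sizes: 4 frames, 3 frequency bins, 2-dimensional embedding.
rng = random.Random(0)
T, F, D = 4, 3, 2
noisy = [[rng.random() for _ in range(F)] for _ in range(T)]
embedding = [rng.uniform(-1, 1) for _ in range(D)]
weights = [[rng.uniform(-1, 1) for _ in range(F + D)] for _ in range(F)]
bias = [0.0] * F

enhanced = apply_mask(noisy, predict_mask(noisy, embedding, weights, bias))
```

Because the mask values lie between 0 and 1, each bin of `enhanced` is an attenuated copy of the corresponding bin of `noisy`. In a trained system the attenuation would depend on whether the bin is dominated by the target speaker.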
Year: 2019
DOI: 10.21437/Interspeech.2019-1101
Venue: arXiv: Audio and Speech Processing
DocType: Conference
Volume: abs/1810.04826
Citations: 13
PageRank: 0.60
References: 8
Authors: 10
Name                   Order  Citations  PageRank
Quan Wang              1      115        20.15
Hannah Muckenhirn      2      29         3.08
Kevin W. Wilson        3      348        28.35
Prashant Sridhar       4      14         1.28
Zelin Wu               5      15         2.00
John R. Hershey        6      844        65.57
Rif Saurous            7      148        10.49
Ron J. Weiss           8      443        29.47
Ye Jia                 9      58         4.68
Ignacio Lopez-Moreno   10     187        14.97