Title
Vector-to-Vector Regression via Distributional Loss for Speech Enhancement
Abstract
In this work, we leverage a novel distributional loss to improve vector-to-vector regression for feature-based speech enhancement (SE). The distributional loss function is devised from the Kullback-Leibler divergence between a chosen target distribution and a conditional distribution, learned from the data, of each coefficient in the clean speech vector given the noisy input features. A deep model with one softmax layer per coefficient parametrizes the conditional distribution, and the model parameters are found by minimizing a weighted sum of the cross-entropies between its outputs and the respective target distributions. Experiments with convolutional neural networks (CNNs) on a publicly available noisy speech dataset derived from the Voice Bank corpus show consistent improvements over conventional solutions based on the mean squared error (MSE) and the least absolute deviation (LAD). Moreover, our approach compares favourably, in terms of both speech quality and intelligibility, with mixture density networks (MDNs), which also compute parametric conditional distributions, based on Gaussian mixture models (GMMs), with a neural architecture. Comparisons against GAN-based solutions are presented as well.
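The training objective described in the abstract (a per-coefficient softmax over discretized values, trained with cross-entropy against a target distribution centered on the clean value) can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: the bin centers, the Gaussian target width `sigma`, and all function names are assumptions made for this example.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the bin axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def discretized_gaussian_target(y, centers, sigma=0.1):
    # Target distribution per coefficient: a Gaussian centered on the
    # clean value y, evaluated at the bin centers and renormalized.
    # (The choice of a Gaussian target here is illustrative.)
    logits = -0.5 * ((centers[None, :] - y[:, None]) / sigma) ** 2
    return softmax(logits, axis=-1)

def distributional_loss(pred_logits, y, centers, sigma=0.1):
    # Sum of per-coefficient cross-entropies between the target
    # distributions and the model's softmax outputs (uniform weights
    # here; the paper uses a weighted sum).
    q = softmax(pred_logits, axis=-1)                    # (D, K) predicted
    p = discretized_gaussian_target(y, centers, sigma)   # (D, K) targets
    return -(p * np.log(q + 1e-12)).sum(axis=-1).mean()
```

At inference, the enhanced value of each coefficient can then be recovered from its predicted distribution, for instance as the expectation over the bin centers.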
Year
2021
DOI
10.1109/LSP.2021.3050386
Venue
IEEE Signal Processing Letters
Keywords
Deep neural network, distributional loss, knowledge distillation, multi-task learning, speech enhancement
DocType
Journal
Volume
28
Issue
99
ISSN
1070-9908
Citations
0
PageRank
0.34
References
0
Authors
1
Name
Sabato Marco Siniscalchi
Order
1
Citations
310
PageRank
30.21