Title
Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN
Abstract
In this paper, we propose a low-latency speech enhancement technique for electrolaryngeal (EL) speech based on a multi-task CLDNN. Although an electrolarynx can produce relatively intelligible speech, laryngectomees still suffer from degraded speech naturalness caused by its mechanical excitation signals. To address this problem, an EL speech enhancement technique based on a CLDNN, which consists of convolutional, recurrent, and fully connected layers, has been proposed. In this technique, an input feature vector of the EL speech is converted into several vocoder parameters, such as excitation parameters and spectral parameters, by expert CLDNNs optimized for each feature. However, this technique is difficult to use in speech communication because its bi-directional recurrent layers introduce a large delay: the system must wait for the end of the utterance. To address this issue, we propose a multi-task CLDNN with uni-directional recurrent layers for low-latency EL speech enhancement. Moreover, to achieve performance comparable to that of the bi-directional CLDNN, we also propose the following techniques: 1) knowledge distillation, 2) data augmentation, and 3) phonetic regularization. The experimental results demonstrate that the proposed method achieves objective results comparable to those of the bi-directional CLDNN and outperforms it in naturalness and speech intelligibility under noisy conditions.
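The latency argument in the abstract can be illustrated with a toy recurrence. The sketch below is not the paper's CLDNN. It is a minimal illustration, assuming a scalar leaky-average recurrence in place of a real recurrent layer, and the names `uni_directional`, `bi_directional`, and `decay` are hypothetical. It shows that a uni-directional recurrence can emit each output frame as soon as its input frame arrives, whereas a bi-directional one changes its earlier outputs when later frames are added and so has to wait for the whole utterance.

```python
from typing import Iterable, Iterator, List


def uni_directional(frames: Iterable[float], decay: float = 0.5) -> Iterator[float]:
    """Toy causal recurrence (assumed stand-in for a uni-directional layer).

    The state h_t depends only on frames up to time t, so every output can be
    yielded immediately after its input frame is read.
    """
    h = 0.0
    for x in frames:
        h = decay * h + (1.0 - decay) * x
        yield h


def bi_directional(frames: List[float], decay: float = 0.5) -> List[float]:
    """Toy forward-plus-backward recurrence (assumed stand-in for a
    bi-directional layer).

    The backward pass starts from the last frame, so no output is final
    until the complete utterance has been observed.
    """
    forward = list(uni_directional(frames, decay))
    backward = list(uni_directional(reversed(frames), decay))[::-1]
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    utterance = [1.0, 2.0, 3.0, 4.0]
    prefix = utterance[:3]

    # Causal outputs for the first three frames do not change when a
    # fourth frame arrives later.
    print(list(uni_directional(prefix)) == list(uni_directional(utterance))[:3])

    # Bi-directional outputs for the same three frames do change.
    print(bi_directional(prefix) == bi_directional(utterance)[:3])
```

In a real system the per-frame delay would also depend on the convolutional context window and the vocoder, which this sketch does not model.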
Year: 2020
DOI: 10.23919/Eusipco47968.2020.9287721
Venue: 2020 28th European Signal Processing Conference (EUSIPCO)
Keywords: electrolaryngeal speech, low-latency speech enhancement, voice conversion, deep neural network
DocType: Conference
ISSN: 2219-5491
ISBN: 978-1-7281-5001-7
Citations: 0
PageRank: 0.34
References: 9
Authors: 2
Name | Order | Citations | PageRank
Kazuhiro Kobayashi | 1 | 66 | 9.91
Tomoki Toda | 2 | 1874 | 167.18