Title
Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation
Abstract
The challenge in deep learning for speaker-independent speech separation comes from the label ambiguity, or permutation, problem. The utterance-level permutation invariant training (uPIT) technique, a state-of-the-art deep learning approach, solves this problem by minimizing the mean square error (MSE) over all permutations between outputs and targets. However, uPIT only minimizes the MSE of the chosen permutation (the one with the lowest error) and does not discriminate it from the other permutations, which may increase the possibility of remixing the separated sources. In this paper, we propose uPIT with discriminative learning (uPITDL) to solve this problem by adding a regularization term to the cost function. In other words, the difference between the model outputs and their corresponding reference signals is minimized, while the dissimilarity between each prediction and the targets of the other sources is maximized. We evaluate the proposed model on the WSJ0-2mix dataset. Experimental results show 22.0% and 24.8% relative improvements over the uPIT baseline under the closed and open conditions, respectively.
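As a rough illustration of the objective described above, a minimal LaTeX sketch of such a discriminative cost is given here; the notation (estimated magnitudes \hat{X}_s, reference magnitudes X_s, uPIT-selected permutation \phi^*, and weight \alpha) is assumed for illustration and is not taken from the paper itself.

J_{\text{uPITDL}} \;=\; \sum_{s=1}^{S} \big\| \hat{X}_s - X_{\phi^*(s)} \big\|_F^2 \;-\; \alpha \sum_{s=1}^{S} \sum_{s' \neq \phi^*(s)} \big\| \hat{X}_s - X_{s'} \big\|_F^2

Here \phi^* denotes the output-target permutation with the lowest total MSE selected by uPIT, the first term is the standard uPIT loss, and the second (negatively weighted) term pushes each estimate away from the targets of the other sources, discouraging remixed outputs.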
Year
DOI
Venue
2018
10.1109/ISCSLP.2018.8706611
2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Keywords
Field
DocType
Training, Cost function, Deep learning, Signal to noise ratio, Linear programming, Adaptation models, Speech recognition
Pattern recognition, Computer science, Permutation, Signal-to-noise ratio, Mean squared error, Regularization (mathematics), Linear programming, Invariant (mathematics), Artificial intelligence, Deep learning, Ambiguity
Conference
ISBN
Citations 
PageRank 
978-1-5386-5627-3
0
0.34
References 
Authors
0
6
Name           Order   Citations   PageRank
Cunhang Fan    1       2           3.79
Bin Liu        2       191         35.02
Jianhua Tao    3       848         138.00
Zhengqi Wen    4       86          24.41
Jiangyan Yi    5       19          17.99
Ye Bai         6       7           5.52