Title
End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform
Abstract
Overlapped speech processing has attracted growing attention in recent years, and it is a key problem in handling multi-talker mixed speech in the cocktail party scenario. The performance of overlapped speech processing is commonly observed to improve significantly when the number of speakers is known in advance. However, such prior knowledge is often unavailable in real-world conditions, so a robust overlapped speech detection and speaker counting system is needed. Most existing works combine different handcrafted features to tackle these tasks, which can be sub-optimal because the features are not directly tied to the task. In this work, we address both problems in an end-to-end manner. First, an end-to-end framework for overlapped speech detection and speaker counting is proposed, which extracts features directly from the raw waveform. Then, a curriculum learning strategy is applied to make better use of the training data. The proposed methods are evaluated on multi-talker mixed speech generated from the LibriSpeech corpus. Experimental results show that the proposed methods outperform the model with handcrafted features on both tasks, achieving more than 2% and 4% absolute accuracy improvement on overlapped speech detection and speaker counting, respectively.
Year: 2019
DOI: 10.1109/ASRU46091.2019.9003962
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Keywords: end-to-end, raw waveform, overlapped speech detection, speaker counting, deep learning
DocType: Conference
ISBN: 978-1-7281-0307-5
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name            Order  Citations  PageRank
Wangyou Zhang   1      12         5.44
Man Sun         2      0          0.34
Lan Wang        3      0          0.68
Yanmin Qian     4      295        44.44