End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform - Citegraph

Paper Info

Title
End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform

Abstract
Overlapped speech processing has attracted increasing attention in recent years, and it is a key problem when processing multi-talker mixed speech in the cocktail party scenario. It is commonly observed that the performance of overlapped speech processing can be significantly improved if the number of speakers is given in advance. However, such prior knowledge is often unavailable in real-world conditions, so a robust overlapped speech detection and speaker counting system is needed. Most existing works focus on combining different handcrafted features to tackle this task, which can be sub-optimal since there is no direct connection between the features and the task. In this work, we try to solve these two problems in an end-to-end manner. First, an end-to-end framework for overlapped speech detection and speaker counting is proposed, which extracts features directly from the raw waveform. Then a curriculum learning strategy is applied to make better use of the training data. The proposed methods are evaluated on multi-talker mixed speech generated from the LibriSpeech corpus. Experimental results show that the proposed methods outperform the model with handcrafted features on both tasks, achieving more than 2% and 4% absolute accuracy improvement on overlapped speech detection and speaker counting, respectively.

Year	DOI	Venue
2019	10.1109/ASRU46091.2019.9003962	2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Keywords	DocType	ISBN
end-to-end,raw waveform,overlapped speech detection,speaker counting,deep learning	Conference	978-1-7281-0307-5
Citations	PageRank	References
0	0.34	0
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Wangyou Zhang	1	12	5.44
Man Sun	2	0	0.34
Lan Wang	3	0	0.68
Yanmin Qian	4	295	44.44
