Title |
---|
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers |
Abstract |
---|
In this paper, we propose a joint model for simultaneous speaker counting, speech recognition, and speaker identification on monaural overlapped speech. Our model is built on serialized output training (SOT) with an attention-based encoder-decoder, a recently proposed method for recognizing overlapped speech comprising an arbitrary number of speakers. We extend the SOT model by introducing a speaker inventory as an auxiliary input to produce speaker labels as well as multi-speaker transcriptions. All model parameters are optimized with a speaker-attributed maximum mutual information criterion, which represents a joint probability for overlapped speech recognition and speaker identification. Experiments on the LibriSpeech corpus show that our proposed method achieves a significantly better speaker-attributed word error rate than a baseline that performs overlapped speech recognition and speaker identification separately. |
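The abstract does not spell out how a serialized target is formed, so the following is only a rough sketch of the general idea behind serialized output training, not the authors' implementation. It assumes that the transcriptions of overlapping utterances are ordered by start time and joined by a separator token, and that a speaker label is kept for each utterance. The token names `<sc>` and `<eos>`, the `Utterance` class, and the `serialize` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical token names; the abstract does not specify them.
SPEAKER_CHANGE = "<sc>"
END_OF_SEQUENCE = "<eos>"


@dataclass
class Utterance:
    speaker: str   # speaker label, e.g. an entry in a speaker inventory
    start: float   # start time in seconds within the mixed recording
    text: str      # reference transcription


def serialize(utterances):
    """Build one token sequence and one speaker sequence for a mixture.

    Assumption: utterances are emitted in order of start time and are
    separated by a speaker-change token.
    """
    ordered = sorted(utterances, key=lambda u: u.start)
    tokens = []
    speakers = []
    for index, utterance in enumerate(ordered):
        if index > 0:
            tokens.append(SPEAKER_CHANGE)
        tokens.extend(utterance.text.split())
        speakers.append(utterance.speaker)
    tokens.append(END_OF_SEQUENCE)
    return tokens, speakers


mixture = [
    Utterance("spk2", 1.5, "how are you"),
    Utterance("spk1", 0.0, "hello there"),
]
tokens, speakers = serialize(mixture)
# Under this assumed format, the speaker count is the number of separators plus one.
speaker_count = tokens.count(SPEAKER_CHANGE) + 1
```

Under this assumed format, a single output sequence carries the transcriptions, the number of speakers, and one label slot per speaker, which is one way to picture how counting, recognition, and identification could share a single decoder.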
Year | DOI | Venue |
---|---|---|
2020 | 10.21437/Interspeech.2020-1085 | INTERSPEECH |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Naoyuki Kanda | 1 | 103 | 19.45 |
Yashesh Gaur | 2 | 15 | 9.06 |
Xiaofei Wang | 3 | 5 | 4.14 |
Zhong Meng | 4 | 33 | 14.95 |
Zhuo Chen | 5 | 153 | 24.33 |
Tianyan Zhou | 6 | 12 | 4.79 |
Takuya Yoshioka | 7 | 585 | 49.20 |