Abstract | ||
---|---|---|
The LEAP submission for the DIHARD-III challenge is described in this paper. The proposed system is composed of a speech bandwidth classifier and diarization systems fine-tuned separately for narrowband and wideband speech. We use an end-to-end speaker diarization system for the narrowband conversational telephone speech recordings. For the wideband multi-speaker recordings, we use a neural embedding-based clustering approach, similar to the baseline system. The embeddings (called x-vectors) are extracted from a time-delay neural network and then clustered using the graph-based path integral clustering (PIC) approach. The LEAP system showed 24% and 18% relative improvements for Track-1 and Track-2, respectively, over the baseline system provided by the organizers. This paper describes the challenge submission, the post-evaluation analysis, and the improvements observed on the DIHARD-III dataset. |
Year | DOI | Venue |
---|---|---|
2021 | 10.21437/Interspeech.2021-728 | Interspeech |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 5 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Prachi Singh | 1 | 0 | 0.68 |
Rajat Varma | 2 | 0 | 0.68 |
Venkat Krishnamohan | 3 | 0 | 0.68 |
Srikanth Raj Chetupalli | 4 | 0 | 1.69 |
Sriram Ganapathy | 5 | 252 | 39.62 |