Title | ||
---|---|---|
Sequence-Level Speaker Change Detection With Difference-Based Continuous Integrate-and-Fire |
Abstract | ||
---|---|---|
Speaker change detection is an important task in multi-party interactions such as meetings and conversations. In this paper, we address the speaker change detection task from the perspective of sequence transduction. Specifically, we propose a novel encoder-decoder framework that directly converts the input feature sequence to the speaker identity sequence. The difference-based continuous integrate-and-fire mechanism is designed to support this framework. It detects speaker changes by integrating the speaker difference between the encoder outputs frame-by-frame and transfers encoder outputs to segment-level speaker embeddings according to the detected speaker changes. The whole framework is supervised by the speaker identity sequence, a weaker label than the precise speaker change points. The experiments on the AMI and DIHARD-I corpora show that our sequence-level method consistently outperforms a strong frame-level baseline that uses the precise speaker change labels. |
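The integrate-and-fire idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper the speaker difference is produced inside a trained encoder-decoder, whereas here a fixed, clipped cosine distance between adjacent frames stands in for it. The function names, the threshold of 1.0, the reset-to-zero rule, and the mean pooling are all assumptions made for illustration.

```python
import math


def frame_difference(a, b):
    """Assumed stand-in for the learned speaker difference:
    1 - cosine similarity, clipped to the range [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - dot / (norm_a * norm_b)))


def mean_vector(frames):
    """Average a non-empty list of equal-length vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def integrate_and_fire(frames, threshold=1.0):
    """Accumulate frame-to-frame differences and fire when the
    accumulated amount reaches the (assumed) threshold.

    Returns the frame indices where a change was fired and one
    mean-pooled embedding per resulting segment.
    """
    boundaries = []
    segments = []
    accumulated = 0.0
    start = 0
    for t in range(1, len(frames)):
        accumulated += frame_difference(frames[t - 1], frames[t])
        if accumulated >= threshold:
            boundaries.append(t)                      # change detected at frame t
            segments.append(mean_vector(frames[start:t]))
            start = t
            accumulated = 0.0                         # simplification: drop any remainder
    if frames:
        segments.append(mean_vector(frames[start:]))  # close the last segment
    return boundaries, segments


# Toy input: three frames of one "speaker", then two of another.
toy_frames = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_boundaries, toy_segments = integrate_and_fire(toy_frames)
```

On the toy input, the only non-zero difference occurs between the third and fourth frames, so a single change is fired there and two segment-level vectors are produced. In a real system the segment vectors would feed a decoder that predicts the speaker identity sequence, which is the only supervision the abstract mentions.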
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/LSP.2022.3185955 | IEEE Signal Processing Letters |
Keywords | DocType | Volume |
---|---|---|
Task analysis, Decoding, Training, Transforms, Recording, Predictive models, Partitioning algorithms, Difference-based continuous integrate-and-fire, sequence transduction, speaker change detection | Journal | 29 |
ISSN | Citations | PageRank |
---|---|---|
1070-9908 | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhiyun Fan | 1 | 0 | 0.34 |
Linhao Dong | 2 | 4 | 2.81 |
Meng Cai | 3 | 0 | 1.01 |
Zejun Ma | 4 | 0 | 0.68 |
Bo Xu | 5 | 111 | 27.31 |