Title | ||
---|---|---|
Attentive Temporal Pooling for Conformer-Based Streaming Language Identification in Long-Form Speech |
Abstract | ||
---|---|---|
In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism that lets the model carry information across long-form audio in a recurrent form, so that inference can be performed in a streaming fashion. Additionally, we introduce a simple domain adaptation mechanism for adapting an existing language identification model to a new domain where the prior language distribution is different. We perform a comparative study of different model topologies under different model size constraints, and find that conformer-based models outperform LSTM- and transformer-based models. Our experiments also show that attentive temporal pooling and domain adaptation significantly improve model accuracy. |
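The abstract does not give the pooling equations, so the following is only a minimal sketch of one way an attention-weighted temporal average can be written in recurrent form. It is not the authors' formulation. It assumes each frame has a scalar attention score (in a real model this would come from a learned layer; here it is passed in directly), and the names `init_state`, `update`, `pooled`, and `batch_pool` are hypothetical. The state holds a running maximum score, a running normalizer, and a running weighted sum, so each new frame is folded in without revisiting earlier audio.

```python
import math
from typing import List, Sequence, Tuple

# State = (running max score, running normalizer, running weighted sum).
# All names here are illustrative assumptions, not taken from the paper.
State = Tuple[float, float, List[float]]


def init_state(dim: int) -> State:
    """Empty state before any frame has been seen."""
    return (-math.inf, 0.0, [0.0] * dim)


def update(state: State, frame: Sequence[float], score: float) -> State:
    """Fold one frame and its attention score into the running state."""
    m, z, s = state
    new_m = max(m, score)
    # Rescale the old accumulators to the new max for numerical stability.
    scale = math.exp(m - new_m) if z > 0.0 else 0.0
    w = math.exp(score - new_m)
    new_z = z * scale + w
    new_s = [si * scale + w * x for si, x in zip(s, frame)]
    return (new_m, new_z, new_s)


def pooled(state: State) -> List[float]:
    """Softmax-weighted average of all frames seen so far."""
    _, z, s = state
    return [si / z for si in s]


def batch_pool(frames: Sequence[Sequence[float]],
               scores: Sequence[float]) -> List[float]:
    """Non-streaming reference: softmax attention over the whole sequence."""
    m = max(scores)
    ws = [math.exp(a - m) for a in scores]
    z = sum(ws)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(ws, frames)) / z for d in range(dim)]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    scores = [0.1, 1.5, -0.3]
    state = init_state(2)
    for f, a in zip(frames, scores):
        state = update(state, f, a)
        print(pooled(state))  # a pooled embedding is available after every frame
```

Under these assumptions the streaming result after the last frame matches the batch softmax-weighted average over the full sequence, which is the property that makes frame-by-frame inference possible.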
Year | DOI | Venue |
---|---|---|
2022 | 10.21437/Odyssey.2022-36 | The Speaker and Language Recognition Workshop (Odyssey) |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Quan Wang | 1 | 115 | 20.15 |
Yang Yu | 2 | 24 | 13.21 |
Jason Pelecanos | 3 | 0 | 0.68 |
Yiling Huang | 4 | 0 | 0.68 |
Ignacio Lopez-Moreno | 5 | 187 | 14.97 |