Title
Attentive Temporal Pooling for Conformer-Based Streaming Language Identification in Long-Form Speech
Abstract
In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism to allow the model to carry information in long-form audio via a recurrent form, such that the inference can be performed in a streaming fashion. Additionally, a simple domain adaptation mechanism is introduced to allow adapting an existing language identification model to a new domain where the prior language distribution is different. We perform a comparative study of different model topologies under different constraints of model size, and find that conformer-based models outperform LSTM and transformer-based models. Our experiments also show that attentive temporal pooling and domain adaptation significantly improve the model accuracy.
Year
2022
DOI
10.21437/Odyssey.2022-36
Venue
The Speaker and Language Recognition Workshop (Odyssey)
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
5
Name | Order | Citations | PageRank
Quan Wang | 1 | 115 | 20.15
Yang Yu | 2 | 24 | 13.21
Jason Pelecanos | 3 | 0 | 0.68
Yiling Huang | 4 | 0 | 0.68
Ignacio Lopez-Moreno | 5 | 187 | 14.97