Title
Dynamic Multi-Scale Convolution for Dialect Identification.
Abstract
Time Delay Neural Networks (TDNN)-based methods are widely used in dialect identification. However, in previous work with TDNN application, subtle variant is being neglected in different feature scales. To address this issue, we propose a new architecture, named dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling. Dynamic kernel convolution captures features between short-term and long-term context adaptively. Local multi-scale learning, which represents multi-scale features at a granular level, is able to increase the range of receptive fields for convolution operation. Besides, global multi-scale pooling is applied to aggregate features from different bottleneck layers in order to collect information from multiple aspects. The proposed architecture significantly outperforms state-of-the-art system on the AP20-OLR-dialect-task of oriental language recognition (OLR) challenge 2020, with the best average cost performance (Cavg) of 0.067 and the best equal error rate (EER) of 6.52%. Compared with the known best results, our method achieves 9% of Cavg and 45% of EER relative improvement, respectively. Furthermore, the parameters of proposed model are 91% fewer than the best known model.
Year
DOI
Venue
2021
10.21437/Interspeech.2021-56
Interspeech
DocType
Citations 
PageRank 
Conference
0
0.34
References 
Authors
0
9
Name
Order
Citations
PageRank
Tianlong Kong100.34
Shouyi Yin211.71
Dawei Zhang302.37
Wang Geng400.34
Xin Wang519453.80
Dandan Song615019.44
Jinwen Huang700.34
Huiyu Shi800.34
Xiaorui Wang901.69