Dynamic Multi-Scale Convolution for Dialect Identification. - Citegraph

Paper Info

Title
Dynamic Multi-Scale Convolution for Dialect Identification.

Abstract
Time Delay Neural Network (TDNN)-based methods are widely used in dialect identification. However, previous TDNN-based work has neglected subtle variations across different feature scales. To address this issue, we propose a new architecture, named dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling. Dynamic kernel convolution adaptively captures features between short-term and long-term context. Local multi-scale learning, which represents multi-scale features at a granular level, increases the range of receptive fields of the convolution operation. In addition, global multi-scale pooling aggregates features from different bottleneck layers in order to collect information from multiple aspects. The proposed architecture significantly outperforms state-of-the-art systems on the AP20-OLR-dialect-task of the Oriental Language Recognition (OLR) Challenge 2020, with a best average cost performance (Cavg) of 0.067 and a best equal error rate (EER) of 6.52%. Compared with the best known results, our method achieves relative improvements of 9% in Cavg and 45% in EER. Furthermore, the proposed model has 91% fewer parameters than the best known model.
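
The relative improvements quoted in the abstract can be read with a small calculation. The sketch below assumes the usual definition for a lower-is-better metric, (baseline - new) / baseline; the abstract does not state which definition the authors used. The function names are hypothetical, and the baseline values the code derives are back-of-the-envelope estimates from rounded percentages, not figures reported on this page.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Fractional reduction of a lower-is-better metric (assumed definition)."""
    return (baseline - new) / baseline


def implied_baseline(new: float, rel_improvement: float) -> float:
    """Invert the definition above to estimate the baseline value."""
    return new / (1.0 - rel_improvement)


# Figures quoted in the abstract: new value and relative gain for each metric.
cavg_new, cavg_gain = 0.067, 0.09
eer_new, eer_gain = 6.52, 0.45  # EER in percent

# Estimated previous-best values (assumption: gains are exact, not rounded).
cavg_base = implied_baseline(cavg_new, cavg_gain)
eer_base = implied_baseline(eer_new, eer_gain)

print(f"implied previous-best Cavg: {cavg_base:.4f}")
print(f"implied previous-best EER:  {eer_base:.2f}%")
```

Because the percentages in the abstract are rounded, the implied baselines are only approximate and should be checked against the paper itself.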

Year	DOI	Venue
2021	10.21437/Interspeech.2021-56	Interspeech
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	9

Authors (9 rows)

Name	Order	Citations	PageRank
Tianlong Kong	1	0	0.34
Shouyi Yin	2	1	1.71
Dawei Zhang	3	0	2.37
Wang Geng	4	0	0.34
Xin Wang	5	194	53.80
Dandan Song	6	150	19.44
Jinwen Huang	7	0	0.34
Huiyu Shi	8	0	0.34
Xiaorui Wang	9	0	1.69
