Title
Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification
Abstract
Attention-based models have recently shown strong representation learning ability in speaker recognition. However, most attention-based models focus primarily on the pooling layers. In this work, we present an end-to-end speaker verification system that leverages time-frequency and channel features hierarchically. To further improve system performance, we adopt Large Margin Cosine Loss as the loss function for optimizing the model. We carry out experiments on the VoxCeleb1 dataset to evaluate the effectiveness of our methods. The results suggest that our best system outperforms the i-vector + PLDA and x-vector systems by 53.3% and 7.6%, respectively.
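For readers unfamiliar with the Large Margin Cosine Loss named in the abstract, the following is a minimal sketch of its commonly published form for a single training example: the cosine between the L2-normalized embedding and the target-class weight is reduced by a margin, all cosines are multiplied by a scale, and softmax cross-entropy is applied. The function names and the default `scale` and `margin` values are illustrative assumptions and are not taken from the paper.

```python
import math


def l2_normalize(vector):
    """Return the vector scaled to unit Euclidean length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(a * a for a in vector))
    return [a / norm for a in vector]


def lmcl_loss(embedding, class_weights, label, scale=30.0, margin=0.35):
    """Large Margin Cosine Loss for one example.

    embedding:     speaker embedding, a list of floats
    class_weights: one weight vector per training speaker
    label:         index of the true speaker
    scale, margin: hyperparameters; the defaults here are assumed values
    """
    x = l2_normalize(embedding)
    # Cosine similarity between the embedding and every class weight vector.
    cosines = [
        sum(a * b for a, b in zip(x, l2_normalize(w))) for w in class_weights
    ]
    # Subtract the margin from the target-class cosine only, then apply the scale.
    logits = [
        scale * (c - margin) if j == label else scale * c
        for j, c in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp, then softmax cross-entropy.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]
```

With `margin=0.0` this reduces to ordinary softmax cross-entropy on scaled cosines, so the margin can be read as an extra penalty that forces the target-class cosine to exceed the others by a fixed gap.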
Year
2021
DOI
10.1109/ISCSLP49672.2021.9362054
Venue
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Keywords
speaker verification, speaker embedding, Large Margin Cosine Loss, CBAM
DocType
Conference
ISBN
978-1-7281-6995-8
Citations
0
PageRank
0.34
References
0
Authors
5
Name
Order
Citations
PageRank
Chenglong Wang100.34
Jiangyan Yi21917.99
Jianhua Tao3848138.00
Ye Bai401.35
Zhengkun Tian535.79