Abstract | ||
---|---|---|
We propose a learnable mel-frequency cepstral coefficients (MFCCs) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven versions of the four linear transforms in a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). The reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort. |
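
The four linear transforms named in the abstract can be illustrated with a minimal NumPy sketch of a conventional, static MFCC extractor for a single frame. This is not the authors' implementation: the function names, the Hamming window, the 16 kHz sample rate, the 40 mel filters and the 20 cepstral coefficients are all assumptions chosen for illustration. Each transform is written as an explicit array so that it is easy to see which quantities a learnable front-end could treat as trainable parameters.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def dft_matrix(n_fft):
    """One-sided DFT matrix, shape (n_fft // 2 + 1, n_fft)."""
    k = np.arange(n_fft // 2 + 1)[:, None]
    t = np.arange(n_fft)[None, :]
    return np.exp(-2j * np.pi * k * t / n_fft)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_matrix(n_ceps, n_filters):
    """Unnormalised DCT-II basis, shape (n_ceps, n_filters)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))


def static_mfcc(frame, sample_rate=16000, n_filters=40, n_ceps=20):
    """Static MFCCs of one frame; all default values are illustrative."""
    n_fft = frame.shape[0]
    window = np.hamming(n_fft)                            # 1) windowing
    dft = dft_matrix(n_fft)                               # 2) DFT
    mel = mel_filterbank(n_filters, n_fft, sample_rate)   # 3) mel filterbank
    dct = dct_matrix(n_ceps, n_filters)                   # 4) DCT
    power = np.abs(dft @ (window * frame)) ** 2           # power spectrum
    log_mel = np.log(mel @ power + 1e-10)                 # floor avoids log(0)
    return dct @ log_mel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # 25 ms at the assumed 16 kHz rate
    print(static_mfcc(frame).shape)
```

In this sketch `window`, `dft`, `mel` and `dct` are fixed arrays; the squared magnitude and the logarithm are the only non-linear steps between them. How the paper initialises, constrains or trains the data-driven counterparts of these arrays is not specified in this record, so the sketch covers only the static baseline.
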
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ISCAS51556.2021.9401593 | 2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS) |
Keywords | DocType | ISSN |
Speaker verification, feature extraction, mel-frequency cepstral coefficients (MFCCs) | Conference | 0271-4302 |
Citations | PageRank | References |
1 | 0.37 | 0 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xuechen Liu | 1 | 1 | 0.71 |
Md. Sahidullah | 2 | 326 | 24.99 |
Tomi Kinnunen | 3 | 1323 | 86.67 |