Abstract

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Although past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. Our method helps preserve a balanced level of spectral leakage and variance, providing greater robustness, with a maximum improvement of 25.8% in equal error rate over the static taper on the SITW corpus.
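For context, the sketch below illustrates the conventional static multi-taper estimate that the paper takes as its starting point: a weighted sum of periodograms computed over Slepian (DPSS) tapers. The taper count, frame length, and FFT size are illustrative assumptions, not values from the paper; the paper's contribution is to learn the taper weighting jointly with the deep ASV network rather than fixing it in advance, which is not reproduced here.

```python
# Minimal sketch of a static multi-taper spectrum estimate: K tapered
# periodograms combined with fixed weights. In the paper these weights are
# optimized jointly with the deep ASV network; here they are uniform.
# Taper count, frame length, and FFT size are illustrative assumptions.
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrum(frame, n_tapers=6, nfft=512, weights=None):
    """Weighted average of tapered periodograms of one speech frame."""
    n = len(frame)
    # Slepian (DPSS) tapers; one common static-taper choice
    tapers = dpss(n, NW=(n_tapers + 1) / 2, Kmax=n_tapers)  # shape (K, n)
    if weights is None:
        weights = np.full(n_tapers, 1.0 / n_tapers)  # uniform taper weights
    # Periodogram of each tapered copy of the frame
    spectra = np.abs(np.fft.rfft(tapers * frame, n=nfft, axis=1)) ** 2
    return weights @ spectra  # one-sided spectrum, shape (nfft//2 + 1,)

# Example: one 25 ms frame at 16 kHz (400 samples); random stand-in signal
frame = np.random.default_rng(0).standard_normal(400)
print(multitaper_spectrum(frame).shape)  # (257,)
```

Averaging over several orthogonal tapers lowers the variance of the spectral estimate relative to a single-window periodogram, at the cost of some extra spectral leakage; the learned weighting described in the abstract aims to balance these two effects.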
Year | DOI | Venue
---|---|---
2021 | 10.1109/LSP.2021.3122796 | IEEE Signal Processing Letters

Keywords | DocType | Volume
---|---|---
Feature extraction, Discrete Fourier transforms, Task analysis, Neural networks, Mel frequency cepstral coefficient, Stochastic processes, Standards, Multi-taper spectrum, speaker verification | Journal | 28

Issue | ISSN | Citations
---|---|---
1 | 1070-9908 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Xuechen Liu | 1 | 0 | 0.34 |
Md. Sahidullah | 2 | 326 | 24.99 |
Tomi Kinnunen | 3 | 1323 | 86.67 |