Abstract |
---|
Voice anti-spoofing aims to classify a given speech input as either a bona fide human sample or a spoofing attack (e.g. a synthetic or replayed sample). Numerous voice anti-spoofing methods have been proposed, but most of them fail to generalize across domains (corpora), and we do not know *why*. We outline a novel interpretative framework for gauging the impact of data quality on anti-spoofing performance. Our within- and between-domain experiments pool data from seven public corpora and use three anti-spoofing methods based on Gaussian mixture and convolutional neural network models. We assess the impact of long-term spectral information, speaker population (through x-vector speaker embeddings), signal-to-noise ratio, and selected voice quality features. |
Year | DOI | Venue |
---|---|---|
2021 | 10.21437/Interspeech.2021-1180 | Interspeech |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bhusan Chettri | 1 | 2 | 2.79 |
Rosa González Hautamäki | 2 | 30 | 3.87 |
Md. Sahidullah | 3 | 326 | 24.99 |
Tomi Kinnunen | 4 | 1323 | 86.67 |