Abstract |
---|
The second Automatic Speaker Verification Spoofing and Countermeasures challenge (ASVspoof 2017) focused on replay attack detection. The best deep-learning systems to compete in ASVspoof 2017 used Convolutional Neural Networks (CNNs) as feature extractors. In this paper, we study their performance in an end-to-end setting. We find that these architectures generalize poorly to the evaluation dataset, but we identify a compact architecture that generalizes well on the development data. We show that, for this dataset, it is difficult to achieve a similar level of generalization on both the development and evaluation data. This raises several open questions: what the differences in the data are, why these differences are more evident in an end-to-end setting, and how these issues can be overcome by increasing the training data. |
Year | Venue | Field |
---|---|---|
2018 | arXiv: Audio and Speech Processing | Training set, Speaker verification, Spoofing attack, End-to-end principle, Convolutional neural network, Computer science, Speech recognition, Extractor, Replay attack, Anti-spoofing |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1805.09164 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 6 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bhusan Chettri | 1 | 2 | 2.79 |
Saumitra Mishra | 2 | 2 | 1.75 |
Bob L. Sturm | 3 | 241 | 29.88 |
Emmanouil Benetos | 4 | 557 | 52.48 |