Abstract |
---|
Probabilistic linear discriminant analysis (PLDA) is the de facto standard for backends in i-vector speaker recognition. If we extend the PLDA paradigm using non-linear models, e.g., deep neural networks, the posterior distributions of the latent variables and the marginal likelihood become intractable. In this paper, we propose to approach this problem using stochastic gradient variational Bayes. We generalize the PLDA model to let i-vectors depend non-linearly on the latent factors. We approximate the evidence lower bound (ELBO) by Monte Carlo sampling using the reparametrization trick. This enables us to optimize the ELBO using backpropagation to jointly estimate the parameters that define the model and the approximate posteriors of the latent factors. We also present a reformulation of the likelihood ratio, which we call Q-scoring. Q-scoring makes it possible to efficiently score the speaker verification trials for this model. Experimental results on NIST SRE10 suggest that more data might be required to exploit the potential of this method. |
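The core training idea in the abstract — estimating the ELBO by Monte Carlo sampling with the reparametrization trick so it can be optimized by backpropagation — can be illustrated with a minimal sketch. This is not the paper's model: the linear-Gaussian decoder, the dimensions, and all function names below are illustrative assumptions; only the reparametrized sampling `z = mu + sigma * eps` and the MC ELBO estimate reflect the technique described.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy decoder (not the paper's network): x = W z + noise,
# with a standard-normal prior on the latent factor z.
d_z, d_x = 2, 5
W = rng.standard_normal((d_x, d_z))

def log_lik(x, z, noise_var=0.1):
    """log p(x | z) under the assumed linear-Gaussian decoder."""
    diff = x - W @ z
    return -0.5 * (diff @ diff / noise_var
                   + d_x * np.log(2 * np.pi * noise_var))

def elbo_mc(x, mu, log_sigma, n_samples=64):
    """Monte Carlo ELBO estimate with the reparametrization trick:
    z = mu + sigma * eps, eps ~ N(0, I), so the sample is a smooth
    function of (mu, sigma) and gradients can flow through it."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((n_samples, d_z))
    z = mu + sigma * eps                      # reparametrized samples
    expected_ll = np.mean([log_lik(x, zi) for zi in z])
    # KL(q(z) || N(0, I)) has a closed form for diagonal Gaussians.
    kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * log_sigma)
    return expected_ll - kl

x = W @ np.array([1.0, -1.0]) + 0.1 * rng.standard_normal(d_x)
print(elbo_mc(x, mu=np.zeros(d_z), log_sigma=np.zeros(d_z)))
```

In an autodiff framework the same estimator would be differentiated with respect to `mu`, `log_sigma`, and the decoder parameters jointly, which is what allows model parameters and approximate posteriors to be trained together by backpropagation.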
Year | DOI | Venue
---|---|---
2017 | 10.21437/Interspeech.2017-1018 | 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017)
Keywords | Field | DocType
---|---|---
speaker recognition, i-vectors, variational autoencoders, stochastic variational inference, PLDA | i-vector, Autoencoder, Pattern recognition, Computer science, Speech recognition, Speaker recognition, Artificial intelligence | Conference
ISSN | Citations | PageRank
---|---|---
2308-457X | 1 | 0.35
References | Authors
---|---
5 | 3
Name | Order | Citations | PageRank
---|---|---|---
Jesús Villalba | 1 | 41 | 5.11
Niko Brümmer | 2 | 595 | 44.01
N. Dehak | 3 | 1269 | 92.64