Title
Tied Variational Autoencoder Backends For I-Vector Speaker Recognition
Abstract
Probabilistic linear discriminant analysis (PLDA) is the de facto standard for backends in i-vector speaker recognition. If we try to extend the PLDA paradigm using non-linear models, e.g., deep neural networks, the posterior distributions of the latent variables and the marginal likelihood become intractable. In this paper, we propose to approach this problem using stochastic gradient variational Bayes. We generalize the PLDA model to let i-vectors depend non-linearly on the latent factors. We approximate the evidence lower bound (ELBO) by Monte Carlo sampling using the reparametrization trick. This enables us to optimize the ELBO using backpropagation to jointly estimate the parameters that define the model and the approximate posteriors of the latent factors. We also present a reformulation of the likelihood ratio, which we call Q-scoring. Q-scoring makes it possible to efficiently score the speaker verification trials for this model. Experimental results on NIST SRE10 suggest that more data might be required to exploit the potential of this method.
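The abstract's core machinery — a Monte Carlo ELBO estimate with the reparametrization trick — can be illustrated with a minimal sketch. This is not the paper's tied-VAE backend; it assumes a diagonal-Gaussian posterior q(z|x), a unit-Gaussian prior, a unit-variance Gaussian likelihood, and a hypothetical linear `decode` standing in for the non-linear network:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), so the sample is a
    differentiable function of (mu, log_var) — the reparametrization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def elbo_estimate(x, mu, log_var, decode, rng, n_samples=10):
    """Monte Carlo estimate of the ELBO for q(z|x) = N(mu, diag(exp(log_var)))
    with prior N(0, I); `decode` maps z to the mean of a unit-variance
    Gaussian likelihood p(x|z)."""
    # KL(q(z|x) || N(0, I)) has a closed form for diagonal Gaussians.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    # Monte Carlo estimate of E_q[log p(x|z)] (up to an additive constant).
    recon = 0.0
    for _ in range(n_samples):
        z = reparameterize(mu, log_var, rng)
        recon += -0.5 * np.sum((x - decode(z)) ** 2)
    recon /= n_samples
    return recon - kl

# Toy usage with a hypothetical linear decoder (the paper replaces
# this with a deep neural network).
W = rng.standard_normal((4, 2))
x = rng.standard_normal(4)
mu, log_var = np.zeros(2), np.zeros(2)
print(elbo_estimate(x, mu, log_var, lambda z: W @ z, rng))
```

Because the noise `eps` is sampled outside the parameter path, gradients of this estimate with respect to `mu` and `log_var` can flow through `z`, which is what allows the ELBO to be optimized by backpropagation as described.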
Year: 2017
DOI: 10.21437/Interspeech.2017-1018
Venue: 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction
Keywords: speaker recognition, i-vectors, variational autoencoders, stochastic variational inference, PLDA
Field: I-vector, Autoencoder, Pattern recognition, Computer science, Speech recognition, Speaker recognition, Artificial intelligence
DocType: Conference
ISSN: 2308-457X
Citations: 1
PageRank: 0.35
References: 5
Authors: 3
Name | Order | Citations | PageRank
Jesus Villalba | 1 | 4 | 15.11
Niko Brümmer | 2 | 595 | 44.01
N. Dehak | 3 | 1269 | 92.64