Title
A study on the roles of total variability space and session variability modeling in speaker recognition
Abstract
Speaker verification (SV) based on the i-vector concept has become the state of the art. In this technique, speakers are projected onto the total variability space and represented by vectors called i-vectors. During testing, the i-vectors of the test speech segment and of the claimant are conditioned to compensate for session variability before scoring. An i-vector system can therefore be viewed as two processing blocks: the total variability space and a post-processing module. Several questions arise: (i) which part of the i-vector system plays the major role in speaker verification, the total variability space or the post-processing task; and (ii) is the post-processing module intrinsic to the total variability space? The motivation of this paper is to partially answer these questions by proposing several simpler speaker characterization systems for speaker verification, in which speakers are represented by speaker characterization vectors (SCVs). The SCVs are obtained by uniform segmentation of the speakers' Gaussian mixture model (GMM) and maximum likelihood linear regression (MLLR) super-vectors. We consider two adaptation approaches for the GMM super-vector: maximum a posteriori (MAP) and MLLR. As with i-vectors, SCVs are post-processed for session variability compensation during testing. The proposed systems show promising performance compared to the classical i-vector system, which indicates that the post-processing task plays a major role in i-vector based SV systems and is not intrinsic to the total variability space. All experimental results are reported on the NIST 2008 SRE core condition.
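The abstract mentions uniform segmentation of a super-vector but does not spell out the procedure. The following is only a minimal sketch of one plausible reading: the adapted GMM component means are concatenated into a flat super-vector, which is then cut into equal-length, non-overlapping segments. The function names `build_supervector` and `uniform_segments`, the toy mean values, and the choice of two segments are all assumptions made for illustration; how the paper turns segments into SCVs and post-processes them is not shown here.

```python
from typing import List, Sequence


def build_supervector(component_means: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-component mean vectors into one flat super-vector."""
    return [value for mean in component_means for value in mean]


def uniform_segments(supervector: Sequence[float], num_segments: int) -> List[List[float]]:
    """Split a super-vector into equal-length, non-overlapping segments.

    Assumption: the super-vector length is an exact multiple of num_segments.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if len(supervector) % num_segments != 0:
        raise ValueError("super-vector length must be divisible by num_segments")
    size = len(supervector) // num_segments
    return [list(supervector[i * size:(i + 1) * size]) for i in range(num_segments)]


# Toy example (hypothetical values): 4 mixture components, 2-dimensional means.
means = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
sv = build_supervector(means)        # flat super-vector of length 8
segments = uniform_segments(sv, 2)   # two segments of length 4 each
```

In a real system the super-vector would come from a MAP- or MLLR-adapted model with far more components and dimensions; the segment count here is arbitrary.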
Year: 2016
DOI: 10.1007/s10772-015-9324-2
Venue: International Journal of Speech Technology
Keywords: MLLR, MAP, Super-vector, Uniform segmentation, i-Vector, Speaker verification
Field: Speaker verification, Pattern recognition, Segmentation, Computer science, Speech recognition, Maximum likelihood linear regression, NIST, Speaker recognition, Speaker diarisation, Artificial intelligence, Maximum a posteriori estimation, Mixture model
DocType:
Journal:
Volume: 19
Issue: 1
ISSN: 1572-8110
Citations: 1
PageRank: 0.39
References: 14
Authors: 3
Name                     Order   Citations   PageRank
Achintya Kumar Sarkar    1       23          7.81
Jean-François Bonastre   2       493         36.03
Driss Matrouf            3       404         41.80