Title
Using VTLN Matrices For Rapid And Computationally-Efficient Speaker Adaptation With Robustness To First-Pass Transcription Errors
Abstract
In this paper, we propose to combine the rapid adaptation capability of conventional Vocal Tract Length Normalization (VTLN) with the computational efficiency of transform-based adaptation such as MLLR or CMLLR. VTLN requires the estimation of only one parameter and is therefore best suited to cases where there is little adaptation data (i.e., rapid adaptation). In contrast, transform-based adaptation methods require the estimation of matrices. However, the drawback of conventional VTLN is that it is computationally expensive, since it requires multiple spectral-warping operations to generate VTLN-warped features. We have recently shown that VTLN warping can be implemented by a linear transformation (LT) of the conventional MFCC features. These LTs are analytically pre-computed and stored. In this framework of LT VTLN, the computational complexity of VTLN is similar to that of transform-based adaptation, since warp-factor estimation can be done using the same sufficient statistics as those used in CMLLR. We show that VTLN provides significant improvement in performance over transform-based adaptation methods when there is little adaptation data. We also show that the use of an additional decorrelating transform, MLLT, along with the VTLN matrices gives performance that is better than MLLR and comparable to SAT with MLLT, even for large amounts of adaptation data. Further, we show that in the mismatched train and test case (i.e., poor first-pass transcription), VTLN provides significant improvement over the transform-based adaptation methods. We compare the performance of the different methods on the WSJ, RM, and TIDIGITS databases.
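As a rough illustration of the idea in the abstract, the sketch below picks a warp factor by scoring a small set of pre-computed matrices against statistics that are accumulated only once. Everything in it is an assumption made for illustration: the function names are hypothetical, a single diagonal-covariance Gaussian stands in for the acoustic model, and the toy scaling matrices are placeholders rather than analytically derived VTLN matrices. It is not the authors' implementation.

```python
import math

def accumulate_stats(frames):
    """Accumulate the count and the first- and second-order sums once."""
    d = len(frames[0])
    first = [0.0] * d
    second = [[0.0] * d for _ in range(d)]
    for x in frames:
        for i in range(d):
            first[i] += x[i]
            for j in range(d):
                second[i][j] += x[i] * x[j]
    return len(frames), first, second

def log_abs_det(matrix):
    """log|det A| via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    total = 0.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return float("-inf")  # singular candidate matrix
        a[c], a[pivot] = a[pivot], a[c]
        total += math.log(abs(a[c][c]))
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return total

def score(matrix, stats, mean, var):
    """Log-likelihood of the transformed data, computed from the statistics only."""
    count, first, second = stats
    d = len(mean)
    ll = count * log_abs_det(matrix)  # Jacobian term of the feature transform
    for i in range(d):
        row = matrix[i]
        quad = sum(row[j] * second[j][k] * row[k]
                   for j in range(d) for k in range(d))
        lin = sum(row[j] * first[j] for j in range(d))
        ll -= 0.5 * (quad - 2.0 * mean[i] * lin + count * mean[i] ** 2) / var[i]
        ll -= 0.5 * count * math.log(2.0 * math.pi * var[i])
    return ll

def pick_warp_factor(matrices, stats, mean, var):
    """Return the warp factor whose stored matrix scores highest."""
    return max(matrices, key=lambda alpha: score(matrices[alpha], stats, mean, var))

# Toy data: placeholder matrices (plain scalings), NOT real VTLN matrices.
matrices = {
    0.9: [[0.9, 0.0], [0.0, 0.9]],
    1.0: [[1.0, 0.0], [0.0, 1.0]],
    1.1: [[1.1, 0.0], [0.0, 1.1]],
}
mean, var = [1.0, -1.0], [1.0, 1.0]
warped = [[2.0, -1.0], [0.0, -1.0], [1.0, 0.0], [1.0, -2.0]]
frames = [[v / 1.1 for v in y] for y in warped]  # data that the 1.1 matrix maps onto the model
stats = accumulate_stats(frames)
best = pick_warp_factor(matrices, stats, mean, var)
```

Because each candidate is scored from the accumulated sums, the frames are visited once regardless of how many warp factors are tried, which is the source of the efficiency the abstract describes. In a real system the statistics would be accumulated per Gaussian using state occupancies from a first-pass alignment, which this sketch omits.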
Year
2009
Venue
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5
Keywords
VTLN, Rapid Adaptation, MLLT, CAT, Linear Transform
Field
Mel-frequency cepstrum, Normalization (statistics), Pattern recognition, Matrix (mathematics), Computer science, Robustness (computer science), Speech recognition, Artificial intelligence, Sufficient statistic, Speaker adaptation, Vocal tract, Computational complexity theory
DocType
Conference
Citations
1
PageRank
0.36
References
9
Authors
3
Name
Order
Citations
PageRank
S. P. Rath1212.97
Srinivasan Umesh29316.31
Achintya Kumar Sarkar3237.81