VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC - Citegraph

Paper Info

Title
VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC

Abstract
In this paper, we propose a method to analytically obtain a linear-transformation on conventional Mel frequency cepstral coefficient (MFCC) features that corresponds to conventional vocal tract length normalization (VTLN)-warped MFCC features, thereby simplifying VTLN processing. There have been many attempts to obtain such a linear-transformation, but all previously proposed approaches either modify the signal processing (and therefore do not yield conventional MFCC), use a linear-transformation that does not correspond to conventional VTLN-warping, or estimate matrices that are data dependent. In short, the conventional VTLN part of an automatic speech recognition (ASR) system cannot be simply replaced with any of the previously proposed methods. Umesh et al. proposed the idea of using band-limited interpolation to perform VTLN-warping on MFCC using plain cepstra. Motivated by this work, Panchapagesan and Alwan proposed a linear-transformation to perform VTLN-warping on conventional MFCC. However, in their approach, VTLN-warping is specified in the Mel-frequency domain and is not equivalent to conventional VTLN. In this paper, we present an approach that also draws inspiration from the work of Umesh et al. and that, we believe, for the first time performs conventional VTLN as a linear-transformation on conventional MFCC using the ideas of band-limited interpolation. Deriving such a linear-transformation to perform VTLN would allow us to use the VTLN matrices in a transform-based adaptation framework, with its associated advantages, while requiring the estimation of only a single parameter. Using four different tasks, we show that our proposed approach has almost identical recognition performance to conventional VTLN on both clean and noisy speech data.
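The central claim of the abstract, that a frequency warp can be applied as a matrix acting directly on cepstra, can be illustrated with a toy sketch. This is not the paper's derivation: it assumes an orthonormal DCT with no cepstral truncation, and it warps along the filterbank channel axis with plain linear interpolation rather than using band-limited interpolation or a conventional VTLN warp in the linear-frequency domain. The names `dct_matrix`, `interp_warp_matrix`, `cepstral_warp_matrix` and the warp factor `alpha` are hypothetical. It shows only that any linear warping operator W on log filterbank outputs induces the cepstral-domain matrix A = C W Cᵀ, which depends on a single parameter.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its inverse is its transpose."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def interp_warp_matrix(n, alpha):
    """W such that W @ m resamples m at positions alpha * j (clamped to the
    valid range) by linear interpolation. Illustrative assumption only."""
    pos = np.clip(alpha * np.arange(n), 0, n - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = pos - lo
    w = np.zeros((n, n))
    rows = np.arange(n)
    w[rows, lo] += 1.0 - frac
    w[rows, hi] += frac
    return w


def cepstral_warp_matrix(n, alpha):
    """Matrix A with A @ c equal to the cepstrum of the warped log spectrum."""
    c = dct_matrix(n)
    return c @ interp_warp_matrix(n, alpha) @ c.T


if __name__ == "__main__":
    n, alpha = 20, 0.9
    rng = np.random.default_rng(0)
    log_energies = rng.normal(size=n)        # stand-in for log filterbank outputs
    cepstra = dct_matrix(n) @ log_energies   # untruncated "MFCC-like" vector

    # Route 1: warp in the spectral domain, then take the DCT.
    pos = np.clip(alpha * np.arange(n), 0, n - 1)
    direct = dct_matrix(n) @ np.interp(pos, np.arange(n), log_energies)

    # Route 2: apply a single matrix to the unwarped cepstra.
    via_matrix = cepstral_warp_matrix(n, alpha) @ cepstra

    print(np.allclose(direct, via_matrix))
```

In a real front end the cepstra are truncated and the warp acts before the Mel filterbank, so the equivalence is no longer this simple; handling that case analytically is what the abstract says the paper addresses.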

Year	2012
DOI	10.1109/TASL.2012.2186289
Venue	IEEE Transactions on Audio, Speech, and Language Processing
Keywords	parameter estimation,frequency domain,mel frequency cepstral coefficients,speech processing,automatic speech recognition,signal processing,speech recognition,speech,mel frequency cepstral coefficient,linear transformation,discrete cosine transform
Field	Signal processing,Mel-frequency cepstrum,Speech processing,Normalization (statistics),Image warping,Pattern recognition,Computer science,Interpolation,Speech recognition,Artificial intelligence,Estimation theory,Vocal tract
DocType	Journal
Volume	20
Issue	5
ISSN	1558-7916
Citations	6
PageRank	0.62
References	20
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
D. R. Sanand	1	28	3.02
Srinivasan Umesh	2	93	16.31
