Title
Automatic voice onset time detection for unvoiced stops (/p/,/t/,/k/) with application to accent classification
Abstract
Articulation characteristics of particular phonemes can provide cues to distinguish accents in spoken English. For example, as shown in Arslan and Hansen (1996, 1997), Voice Onset Time (VOT) can be used to classify Mandarin-, Turkish-, German- and American-accented English. Our goal in this study is to develop an automatic system that classifies accents using VOT in unvoiced stops. VOT is an important temporal feature which is often overlooked in speech perception, speech recognition, and accent detection. Fixed-length frame-based speech processing inherently ignores VOT. In this paper, a more effective VOT detection scheme is introduced for unvoiced stops (/p/, /t/ and /k/); it applies the Teager Energy Operator (TEO), a non-linear energy tracking algorithm, across a sub-frequency band partition. The proposed VOT detection algorithm also incorporates spectral differences between the Voice Onset Region (VOR) and the succeeding vowel of a given stop-vowel sequence to classify speakers having accents due to different ethnic origin. The spectral cues are enhanced using one of four feature parameter extraction methods: Discrete Mellin Transform (DMT), Discrete Mellin Fourier Transform (DMFT), and Discrete Wavelet Transform using the lowest and the highest frequency resolutions (DWTlfr and DWThfr). A Hidden Markov Model (HMM) classifier is employed with these extracted parameters and applied to the problem of accent classification. Three different language groups (American English, Chinese, and Indian) are used from the CU-Accent database. The VOT is detected with less than 10% error when compared to the manually detected VOT, with a success rate of 79.90%, 87.32% and 47.73% for English, Chinese and Indian speakers, respectively (the Indian figure includes atypical cases). It is noted that the DMT and DWTlfr features are good for parameterizing speech samples which exhibit substitution of the succeeding vowel after the stop in accented speech.
The successful accent classification rates of DMT and DWTlfr features are 66.13% and 71.67%, for /p/ and /t/ respectively, for pairwise accent detection. Alternatively, the DMFT feature works on all accent sensitive words considered, with a success rate of 70.63%. This study shows that effective VOT detection can be achieved using an integrated TEO processing with spectral difference analysis in the VOR that can be employed for accent classification.
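The abstract names the Teager Energy Operator as the energy tracker behind the VOT detector. As a minimal sketch only, the standard discrete form of the operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], can be written as below. The function name, the toy signal, and the fixed threshold are illustrative assumptions; the sub-band partition and the detection rule used in the paper are not described in this record and are not reproduced here.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    inner = [x[i] ** 2 - x[i - 1] * x[i + 1] for i in range(1, len(x) - 1)]
    # The endpoints lack a two-sided neighbourhood; copy the nearest value.
    return [inner[0]] + inner + [inner[-1]]


# Toy check (assumed values): for a pure tone A*cos(omega*n) the operator
# returns the constant A^2 * sin(omega)^2.
amp = 0.5
omega = 2 * math.pi * 1000.0 / 16000.0
tone = [amp * math.cos(omega * n) for n in range(400)]
psi_tone = teager_energy(tone)
expected = amp ** 2 * math.sin(omega) ** 2

# Toy onset example (assumed threshold, not the paper's rule): 100 samples
# of silence followed by the tone; report the first sample whose Teager
# energy exceeds the threshold.
signal = [0.0] * 100 + tone
psi = teager_energy(signal)
threshold = 0.01
onset = next(i for i, p in enumerate(psi) if p > threshold)
```

Because the operator uses only three adjacent samples, it responds within a sample or two of an abrupt energy change, which is one reason it suits short events that fixed-length frames tend to smear.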
Year
2010
DOI
10.1016/j.specom.2010.05.004
Venue
Speech Communication
Keywords
teager energy operator (teo), unvoiced stop, accent detection, voice onset time (vot), accent classification, successful accent classification rate, success rate, dwtlfr feature, voice onset region (vor), effective vot detection scheme, accent sensitive word, pairwise accent detection, effective vot detection, automatic voice onset time, speech processing, voice onset time, speech perception, discrete wavelet transform, mellin transform, fourier transform, hidden markov model, speech recognition
Field
Voice-onset time, Speech processing, Pattern recognition, Computer science, Feature extraction, Speech recognition, American English, Artificial intelligence, Vowel, Speech perception, Classifier (linguistics), Hidden Markov model
DocType
Journal
Volume
52
Issue
10
ISSN
Speech Communication
Citations
10
PageRank
0.83
References
12
Authors
3
Name
Order
Citations
PageRank
John H. L. Hansen 1 3215365.75
Sharmistha S. Gray 2 211.36
Wooil Kim 3 12016.95