Title
Non-native pronunciation variation modeling using an indirect data driven method
Abstract
In this paper, we propose a pronunciation variation modeling method that improves the performance of a non-native automatic speech recognition (ASR) system without degrading the performance of a native ASR system. The proposed method is based on an indirect data-driven approach, in which pronunciation variability is investigated from the training speech data, and variant rules are then derived and applied to the ASR pronunciation dictionary to compensate for that variability. To this end, native utterances are first recognized using a phoneme recognizer, and the variant phoneme patterns of native speech are obtained by aligning the recognized phoneme sequences with the reference phonetic sequences. The reference sequences are transcribed using each of three methods: canonical, knowledge-based, and hand-labeled. Similarly, the variant phoneme patterns of non-native speech are obtained by recognizing non-native utterances and comparing the recognized phoneme sequences with the reference phonetic transcriptions. Finally, variant rules are derived from the native and non-native variant phoneme patterns using decision trees and are applied to adapt the dictionary for both non-native and native ASR systems. In this paper, Korean spoken by native speakers of Chinese is considered as the non-native speech. Non-native ASR experiments show that an ASR system using the dictionary constructed by the proposed method achieves a relative reduction of 18.5% in average word error rate (WER) compared to the baseline ASR system using a canonically transcribed dictionary. In addition, the WER of a native ASR system using the proposed dictionary is relatively reduced by 1.1% compared to the baseline native ASR system with a canonically constructed dictionary.
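The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes a plain Levenshtein (edit-distance) alignment with unit costs, which the abstract does not specify. The function names `align` and `variant_patterns` and the phoneme symbols are hypothetical. The sketch only counts mismatched phoneme pairs. It does not reproduce the decision-tree rule derivation used in the paper.

```python
from collections import Counter


def align(reference, recognized):
    """Align two phoneme sequences by edit distance (assumed unit costs).

    Returns a list of (reference_phone, recognized_phone) pairs, where
    None marks a deletion (left side kept) or an insertion (right side kept).
    """
    n, m = len(reference), len(recognized)
    # cost[i][j] = edit distance between reference[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(reference[i - 1] != recognized[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(reference[i - 1] != recognized[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                pairs.append((reference[i - 1], recognized[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((reference[i - 1], None))
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def variant_patterns(pairs):
    """Count aligned pairs where the recognized phone differs from the reference."""
    return Counter((ref, rec) for ref, rec in pairs if ref != rec)


# Hypothetical phoneme sequences, for illustration only.
reference = ["k", "a", "m", "s", "a"]
recognized = ["k", "a", "n", "s", "a"]
patterns = variant_patterns(align(reference, recognized))
```

In a pipeline like the one the abstract outlines, counts of this kind, gathered over many utterances together with their phonetic context, would be the input from which variant rules are learned.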
Year
2007
DOI
10.1109/ASRU.2007.4430114
Venue
ASRU
Keywords
nonnative pronunciation variation modeling,reference phonetic sequence,indirect data driven method,asr pronunciation dictionary,non-native speech recognition,natural languages,word error rate,pronunciation variation,learning (artificial intelligence),vocabulary,dictionaries,native speech phoneme pattern,chinese native speakers,speaker recognition,indirect data-driven approach,dictionary adaptation,phoneme recognizer,automatic speech recognition system,speech recognition,decision trees,decision tree,native speaker,knowledge base,automatic speech recognition,learning artificial intelligence
Field
Pronunciation,Transcription (linguistics),Decision tree,Data-driven,Computer science,Word error rate,Speech recognition,Natural language,Speaker recognition,Natural language processing,Artificial intelligence,Vocabulary
DocType
Conference
ISBN
978-1-4244-1746-9
Citations
8
PageRank
0.67
References
7
Authors
3
Name            Order  Citations  PageRank
Mina Kim        1      8          0.67
Yoo Rhee Oh     2      27         3.41
Hong Kook Kim   3      258        51.67