Title
An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling
Abstract
We present an integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling. In the target corpus, almost all utterances are in the host language, Mandarin, but many contain embedded terms (mostly course-specific terminology) spoken in the guest language, English. For acoustic modeling, we propose a state mapping approach that merges English states with similar Mandarin states to address the very limited amount of English data, and we integrate it with multi-path speaker adaptation. For language modeling, we integrate class-based n-grams based on perplexity or part-of-speech (POS) features, random forest language models, and model adaptation. The combined framework yielded very encouraging improvements in performance.
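The state mapping idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how similarity between states is measured, so everything below is an assumption made for illustration: each HMM state is summarized by a single diagonal Gaussian, similarity is a symmetric KL divergence, and the state names (`e_ae_s2`, `m_a_s2`, and so on) and the helpers `sym_kl_diag` and `map_states` are hypothetical.

```python
def sym_kl_diag(mean_a, var_a, mean_b, var_b):
    """Symmetric KL divergence between two diagonal Gaussians.

    Assumption: one Gaussian per state; the paper's actual similarity
    criterion is not given in the abstract.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff2 = (ma - mb) ** 2
        # KL(a||b) + KL(b||a) per dimension; the log-variance terms cancel.
        total += 0.5 * ((va + diff2) / vb + (vb + diff2) / va - 2.0)
    return total


def map_states(english_states, mandarin_states):
    """Map each English state name to the most similar Mandarin state name."""
    mapping = {}
    for e_name, (e_mean, e_var) in english_states.items():
        mapping[e_name] = min(
            mandarin_states,
            key=lambda m_name: sym_kl_diag(e_mean, e_var, *mandarin_states[m_name]),
        )
    return mapping


# Toy, made-up state statistics: name -> (mean vector, variance vector).
mandarin_states = {
    "m_a_s2": ([0.0, 0.0], [1.0, 1.0]),
    "m_i_s2": ([3.0, 3.0], [1.0, 1.0]),
}
english_states = {
    "e_ae_s2": ([0.2, -0.1], [1.1, 0.9]),
    "e_iy_s2": ([2.8, 3.1], [1.0, 1.2]),
}

mapping = map_states(english_states, mandarin_states)
print(mapping)
```

In a setup like this, each data-poor English state would share parameters or training data with the Mandarin state it maps to. How the merge itself is carried out is not described in the abstract.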
Year
2010
DOI
10.1109/ISCSLP.2010.5684908
Venue
ISCSLP
Keywords
class-based n-gram,state-mapping,language modeling,acoustic modeling,mllr,rflm,model adaptation,mandarin state,pos,multipath speaker adaptation,mandarin english code,perplexity,adaptation,code-mixing,speech coding,bilingual,natural language processing,pos feature,component,map,accuracy,language model,silicon,acoustics,merging,random forest,data models
Field
Data modeling,Perplexity,Transcription (linguistics),Computer science,Modeling language,Speech recognition,Natural language processing,Artificial intelligence,Random forest,Code-mixing,Language model,Mandarin Chinese
DocType
Conference
ISBN
978-1-4244-6244-5
Citations
9
PageRank
0.66
References
8
Authors
4
Name, Order, Citations, PageRank
Ching-feng Yeh, 1, 71, 6.88
Chao-Yu Huang, 2, 37, 3.01
Liang-Che Sun, 3, 36, 3.43
Lin-shan Lee, 4, 1525, 182.03