Title |
---|
An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling |
Abstract |
---|
In this paper, we present an integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling. In the target corpus, almost all utterances are in the host language, Mandarin, but many of them contain embedded terms (mostly course-specific terminology) produced in the guest language, English. For acoustic modeling, we propose a state mapping approach that merges English states with similar Mandarin states to address the very limited amount of English training data, and we integrate it with multi-path speaker adaptation. For language modeling, we integrate class-based n-grams (with word classes derived from perplexity or part-of-speech (POS) features), random forest language models, and model adaptation. These techniques yielded very encouraging improvements in recognition performance. |
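
The abstract does not say how English and Mandarin states are matched or how the class-based n-grams are scored. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each HMM state can be reduced to a single diagonal-covariance Gaussian, uses symmetric Kullback-Leibler divergence as the similarity measure, and maps an English state onto its nearest Mandarin state only when that divergence falls below a hypothetical `threshold`. The second function shows the standard class-based bigram factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) P(w_i | c_i). The word-to-class assignment (perplexity-driven or POS-based, per the abstract) is taken as given. All function names, parameters, and state labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumption: each HMM state is approximated by ONE diagonal-covariance
# Gaussian, stored as (mean vector, variance vector). Real systems usually
# use Gaussian mixtures per state; this is a simplification for illustration.
Gaussian = Tuple[List[float], List[float]]


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension closed form for univariate Gaussians.
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrized KL divergence, used here as a hypothetical state distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def map_states(english: Dict[str, Gaussian],
               mandarin: Dict[str, Gaussian],
               threshold: float) -> Dict[str, str]:
    """Map each English state to its closest Mandarin state if the distance
    is below `threshold`; otherwise the English state maps to itself and
    keeps its own (data-sparse) model."""
    mapping: Dict[str, str] = {}
    for e_name, e_state in english.items():
        best_name, best_dist = e_name, math.inf
        for m_name, m_state in mandarin.items():
            d = symmetric_kl(e_state, m_state)
            if d < best_dist:
                best_name, best_dist = m_name, d
        mapping[e_name] = best_name if best_dist < threshold else e_name
    return mapping


def class_bigram_prob(prev_word: str, word: str,
                      word2class: Dict[str, str],
                      class_bigram: Dict[Tuple[str, str], float],
                      word_given_class: Dict[Tuple[str, str], float]) -> float:
    """Class-based bigram: P(w | w_prev) = P(c | c_prev) * P(w | c)."""
    c_prev, c_cur = word2class[prev_word], word2class[word]
    return class_bigram[(c_prev, c_cur)] * word_given_class[(word, c_cur)]


if __name__ == "__main__":
    # Toy 2-dimensional "states" (hypothetical values).
    english = {"EN_s1": ([0.0, 1.0], [1.0, 1.0]),
               "EN_s2": ([5.0, 5.0], [1.0, 1.0])}
    mandarin = {"ZH_a": ([0.1, 1.0], [1.0, 1.0]),
                "ZH_b": ([-3.0, 0.0], [2.0, 2.0])}
    print(map_states(english, mandarin, threshold=1.0))
    # {'EN_s1': 'ZH_a', 'EN_s2': 'EN_s2'}
```

In this toy example `EN_s1` lies very close to `ZH_a` and is merged with it, so it can share Mandarin training data, while `EN_s2` has no Mandarin state within the threshold and keeps its own model. The threshold controls the trade-off between borrowing Mandarin data and preserving English-specific acoustics.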
Year | DOI | Venue |
---|---|---|
2010 | 10.1109/ISCSLP.2010.5684908 | ISCSLP |
Keywords | Field | DocType |
---|---|---|
class-based n-gram,state-mapping,language modeling,acoustic modeling,mllr,rflm,model adaptation,mandarin state,pos,multipath speaker adaptation,mandarin english code,perplexity,adaptation,code-mixing,speech coding,bilingual,natural language processing,pos feature,component,map,accuracy,language model,silicon,acoustics,merging,random forest,data models | Data modeling,Perplexity,Transcription (linguistics),Computer science,Modeling language,Speech recognition,Natural language processing,Artificial intelligence,Random forest,Code-mixing,Language model,Mandarin Chinese | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-4244-6244-5 | 9 | 0.66 |
References | Authors |
---|---|
8 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ching-feng Yeh | 1 | 71 | 6.88 |
Chao-Yu Huang | 2 | 37 | 3.01 |
Liang-Che Sun | 3 | 36 | 3.43 |
Lin-shan Lee | 4 | 1525 | 182.03 |