Abstract
---
Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switched sub-tasks that constitute bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint modeling framework can be conditionally factorized such that the final bilingual output, which may or may not be code-switched, is obtained given only monolingual information. We show that this conditionally factorized joint framework can be modeled by an end-to-end differentiable neural network. We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora.
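One plausible reading of the conditional factorization sketched in the abstract is the following (an illustrative sketch in our own notation, not necessarily the paper's: $X$ denotes the acoustic input, $Y$ the final bilingual transcript, and $Y^{\mathrm{EN}}, Y^{\mathrm{ZH}}$ the label-to-frame synchronized monolingual label sequences):

```latex
% Illustrative sketch only: the bilingual output Y is produced
% conditioned solely on the two monolingual sub-task outputs,
% each of which is predicted directly from the acoustics X.
P(Y \mid X) \;\approx\;
  P\bigl(Y \mid Y^{\mathrm{EN}}, Y^{\mathrm{ZH}}\bigr)\,
  P\bigl(Y^{\mathrm{EN}} \mid X\bigr)\,
  P\bigl(Y^{\mathrm{ZH}} \mid X\bigr)
```

Under this reading, the first factor needs only monolingual information, which is what allows the model to cover both purely monolingual and intra-sententially code-switched utterances within one framework.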
Year | DOI | Venue
---|---|---
2022 | 10.1109/ICASSP43922.2022.9747537 | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

DocType | Citations | PageRank
---|---|---
Conference | 0 | 0.34

References | Authors
---|---
0 | 9
Name | Order | Citations | PageRank |
---|---|---|---|
Brian Yan | 1 | 0 | 2.37 |
Chunlei Zhang | 2 | 37 | 7.43 |
Meng Yu | 3 | 524 | 66.52 |
Shi-Xiong Zhang | 4 | 18 | 6.75 |
Siddharth Dalmia | 5 | 0 | 0.34 |
Dan Berrebbi | 6 | 0 | 1.35 |
Chao Weng | 7 | 113 | 19.75 |
Shinji Watanabe | 8 | 1158 | 139.38 |
Dong Yu | 9 | 6264 | 475.73 |