| Title |
| --- |
| Unsupervised domain adaptation for lip reading based on cross-modal knowledge distillation |
| Abstract |
| --- |
| We present an unsupervised domain adaptation (UDA) method for a lip-reading model, i.e., an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data contains unknown classes, such as out-of-vocabulary words. In this paper, we propose a cross-modal knowledge distillation (KD)-based domain adaptation method, in which the intermediate-layer output of an audio-based speech recognition model serves as a teacher for the unlabeled adaptation data. Because the audio signal carries more information for recognizing speech than lip images do, the knowledge of the audio-based model can act as a powerful teacher when the unlabeled adaptation data consists of audio-visual parallel data. In addition, because the proposed intermediate-layer-based KD expresses the teacher at the sub-class (sub-word) level, the method allows data of unknown classes to be used for adaptation. Through experiments on an image-based word recognition task, we demonstrate that the proposed approach not only improves UDA performance but also makes unknown-class adaptation data usable. |
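The abstract's core idea, matching a student's intermediate-layer features to a frozen audio teacher's features on parallel data, can be sketched as a simple frame-level feature-regression loss. This is a minimal illustration, not the authors' implementation: the array shapes, variable names, and the choice of mean-squared error are all assumptions for the sketch.

```python
import numpy as np

# Hypothetical shapes: 4 parallel audio-visual clips, T=10 frames, D=32-dim
# intermediate features. In the paper's setting, the "teacher" features come
# from an intermediate layer of a pretrained audio-based ASR model and the
# "student" features from the lip-reading model on the same (unlabeled) clips.
rng = np.random.default_rng(0)
teacher_feat = rng.normal(size=(4, 10, 32))   # audio model, frozen
student_feat = rng.normal(size=(4, 10, 32))   # lip-reading model, trainable

def distillation_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """Frame-level mean-squared error between student and teacher features.

    No class labels are required, which is why unknown-class data
    (e.g., out-of-vocabulary words) can still be used for adaptation.
    """
    return float(np.mean((student - teacher) ** 2))

loss = distillation_loss(student_feat, teacher_feat)
```

During adaptation the student would be updated to minimize this loss while the audio teacher stays fixed; a real system would use a learned projection if the two models' feature dimensions differ.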
| Year | DOI | Venue |
| --- | --- | --- |
| 2021 | 10.1186/s13636-021-00232-5 | EURASIP Journal on Audio, Speech, and Music Processing |

| Keywords | DocType | Volume |
| --- | --- | --- |
| Lip reading, Knowledge distillation, Multimodal, Unsupervised domain adaptation | Journal | 2021 |

| Issue | ISSN | Citations |
| --- | --- | --- |
| 1 | 1687-4722 | 0 |

| PageRank | References | Authors |
| --- | --- | --- |
| 0.34 | 1 | 7 |
| Name | Order | Citations | PageRank |
| --- | --- | --- | --- |
| Takashima, Yuki | 1 | 0 | 0.34 |
| Takashima, Ryoichi | 2 | 0 | 0.34 |
| Tsunoda, Ryota | 3 | 0 | 0.34 |
| Aihara, Ryo | 4 | 0 | 0.34 |
| Takiguchi, Tetsuya | 5 | 85 | 8.77 |
| Ariki, Yasuo | 6 | 519 | 88.94 |
| Motoyama, Nobuaki | 7 | 0 | 0.34 |