Title
Unsupervised domain adaptation for lip reading based on cross-modal knowledge distillation
Abstract
We present an unsupervised domain adaptation (UDA) method for a lip-reading model, i.e., an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data contains unknown classes, such as out-of-vocabulary words. In this paper, we propose a cross-modal knowledge distillation (KD)-based domain adaptation method in which the intermediate-layer output of an audio-based speech recognition model serves as the teacher for the unlabeled adaptation data. Because the audio signal carries more information for recognizing speech than lip images do, the audio-based model can act as a powerful teacher whenever the unlabeled adaptation data consists of audio-visual parallel data. Moreover, because the proposed intermediate-layer KD expresses the teacher signal as a sub-class (sub-word)-level representation, the method can also exploit adaptation data from unknown classes. Through experiments on an image-based word recognition task, we demonstrate that the proposed approach not only improves UDA performance but also makes unknown-class adaptation data usable.
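The following is a minimal sketch, in PyTorch-style Python, of the intermediate-layer cross-modal KD loss the abstract describes. The encoder modules, tensor shapes, and names (audio_encoder, visual_encoder, cross_modal_kd_loss) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def cross_modal_kd_loss(audio_encoder: nn.Module,
                        visual_encoder: nn.Module,
                        audio_feats: torch.Tensor,
                        lip_images: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of intermediate-layer cross-modal KD.

    audio_encoder: pretrained audio model up to the chosen intermediate
        layer (teacher, kept frozen).
    visual_encoder: lip-reading model up to the corresponding
        intermediate layer (student, being adapted).
    audio_feats / lip_images: time-aligned audio-visual parallel data;
        no class labels are needed, so unknown-class (e.g.,
        out-of-vocabulary) adaptation data can be used.
    """
    with torch.no_grad():                            # teacher is not updated
        teacher_repr = audio_encoder(audio_feats)    # (batch, frames, dim)
    student_repr = visual_encoder(lip_images)        # (batch, frames, dim)
    # Match the student's intermediate representation to the teacher's.
    return F.mse_loss(student_repr, teacher_repr)

Because the distillation target is a frame-level (sub-word-level) representation rather than a class posterior, the loss is defined even for words outside the recognizer's vocabulary.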
Year
2021
DOI
10.1186/s13636-021-00232-5
Venue
EURASIP Journal on Audio, Speech, and Music Processing
Keywords
Lip reading, Knowledge distillation, Multimodal, Unsupervised domain adaptation
DocType
Journal
Volume
2021
Issue
1
ISSN
1687-4722
Citations
0
PageRank
0.34
References
1
Authors
7
Name | Order | Citations | PageRank
Takashima, Yuki | 1 | 0 | 0.34
Takashima, Ryoichi | 2 | 0 | 0.34
Tsunoda, Ryota | 3 | 0 | 0.34
Aihara, Ryo | 4 | 0 | 0.34
Tetsuya Takiguchi | 5 | 85 | 8.77
Yasuo Ariki | 6 | 519 | 88.94
Motoyama, Nobuaki | 7 | 0 | 0.34