Title
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
Abstract
We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion. We combine the motion and speech audio of the speaker using a motion-audio cross attention transformer. Furthermore, we enable non-deterministic prediction by learning a discrete latent representation of realistic listener motion with a novel motion-encoding VQ-VAE. Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions. Moreover, it produces realistic 3D listener facial motion synchronous with the speaker (see video). We demonstrate that our method outperforms baselines qualitatively and quantitatively via a rich suite of experiments. To facilitate this line of research, we introduce a novel and large in-the-wild dataset of dyadic conversations. Code, data, and videos available at https://evonneng.github.io/learning2listen/.
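The abstract names two key components: a cross-attention transformer that fuses the speaker's facial motion with speech audio, and a VQ-VAE that discretizes listener motion so that plausible responses can be sampled autoregressively. Below is a minimal, illustrative PyTorch sketch of that kind of pipeline; it is not the authors' released implementation, and the feature dimensions (motion_dim, audio_dim), codebook size, layer counts, and module names are placeholder assumptions chosen for the example.

```python
# Illustrative sketch only -- NOT the authors' released code. Dimensions,
# layer counts, and the codebook size are placeholder assumptions.
import torch
import torch.nn as nn

class ListenerMotionSketch(nn.Module):
    """Fuse speaker motion and audio with cross-attention, then autoregressively
    predict discrete listener-motion codes from a learned (VQ-VAE-style) codebook."""

    def __init__(self, motion_dim=56, audio_dim=128, d_model=256, codebook_size=256):
        super().__init__()
        self.motion_proj = nn.Linear(motion_dim, d_model)   # speaker facial-motion features
        self.audio_proj = nn.Linear(audio_dim, d_model)     # speaker speech-audio features
        # cross-attention: motion tokens (queries) attend to audio tokens (keys/values)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.code_emb = nn.Embedding(codebook_size, d_model)  # embeddings of discrete motion codes
        self.head = nn.Linear(d_model, codebook_size)         # logits over the next listener code

    def forward(self, speaker_motion, speaker_audio, prev_codes):
        m = self.motion_proj(speaker_motion)                  # (B, T_speaker, d_model)
        a = self.audio_proj(speaker_audio)                    # (B, T_speaker, d_model)
        fused, _ = self.cross_attn(m, a, a)                   # audio-conditioned speaker context
        tgt = self.code_emb(prev_codes)                       # (B, T_listener, d_model)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.decoder(tgt, fused, tgt_mask=causal)
        return self.head(h)                                   # (B, T_listener, codebook_size)

# Sampling (rather than taking the argmax) over the predicted code distribution
# yields multiple plausible listener reactions for the same speaker input.
model = ListenerMotionSketch()
logits = model(torch.randn(1, 64, 56),                        # speaker motion features
               torch.randn(1, 64, 128),                       # speaker audio features
               torch.zeros(1, 8, dtype=torch.long))           # previously generated codes
next_code = torch.multinomial(logits[:, -1].softmax(dim=-1), num_samples=1)
```

Sampling codebook indices step by step is what makes the output non-deterministic; decoding the sampled code sequence with the VQ-VAE decoder (omitted here) would then produce one of many plausible listener motion sequences.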
Year
2022
Venue
IEEE Conference on Computer Vision and Pattern Recognition
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
7
Name             Order  Citations  PageRank
Evonne Ng        1      0          0.34
Joo Hanbyul      2      2          2.38
Liwen Hu         3      0          0.68
Hao Li           4      9604       6.39
Trevor Darrell   5      224131     800.67
Angjoo Kanazawa  6      2721       0.36
Shiry Ginosar    7      10         1.31