Multimodal Dance Generation Networks Based On Audio-Visual Analysis - Citegraph

Paper Info

Title
Multimodal Dance Generation Networks Based On Audio-Visual Analysis

Abstract
3D human dance generation from music is an interesting and challenging task in which the aim is to estimate 3D pose from visual and audio information. Existing methods use only skeleton information to complete this task, which can produce jittery results. In addition, because appropriate evaluation metrics for this task are lacking, it is difficult to evaluate the quality of the generated results. In this paper, the authors explore multimodal dance generation networks by constructing the correspondence between visual and audio cues. Specifically, they propose a 2D prediction module that predicts future frames by fusing visual and audio features. They also propose a 3D conversion module that generates the 3D skeleton from the 2D skeleton. Finally, they propose new evaluation metrics for human dance generation to assess the quality of the generated results. Experimental results indicate that the proposed modules can meet the requirements of authenticity and diversity.

Year	2021
DOI	10.4018/IJMDEM.2021010102
Venue	International Journal of Multimedia Data Engineering & Management
Keywords	3D Pose, Audio-Visual, Classification, Dance Generation, LSTM, Metrics, Mixture Density Networks, Multimodal, Skeleton, VAE
DocType	Journal
Volume	12
Issue	1
ISSN	1947-8534
Citations	0
PageRank	0.34
References	0
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Lijuan Duan	1	0	1.01
Xiao Xu	2	26	3.61
Qing En	3	0	1.35
