Title
Multimodal Dance Generation Networks Based On Audio-Visual Analysis
Abstract
3D human dance generation from music is an interesting and challenging task in which the aim is to estimate 3D pose from visual and audio information. Existing methods use only skeleton information to complete this task, which may produce jittery results. In addition, the lack of appropriate evaluation metrics for this task makes it difficult to evaluate the quality of the generated results. In this paper, the authors explore multimodal dance generation networks by constructing the correspondence between visual and audio cues. Specifically, they propose a 2D prediction module that predicts future frames by fusing visual and audio features. They also propose a 3D conversion module that generates the 3D skeleton from the 2D skeleton. Finally, new evaluation metrics for human dance generation are proposed to assess the quality of the generated results. Experimental results indicate that the proposed modules can meet the requirements of authenticity and diversity.
Year
2021
DOI
10.4018/IJMDEM.2021010102
Venue
INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT
Keywords
3D Pose, Audio-Visual, Classification, Dance Generation, LSTM, Metrics, Mixture Density Networks, Multimodal, Skeleton, VAE
DocType
Journal
Volume
12
Issue
1
ISSN
1947-8534
Citations
0
PageRank
0.34
References
0
Authors
3
Name           Order   Citations   PageRank
Lijuan Duan    1       0           1.01
Xiao Xu        2       26          3.61
Qing En        3       0           1.35