Title
Audio-Driven Stylized Gesture Generation with Flow-Based Model
Abstract
Generating stylized audio-driven gestures for robots and virtual avatars has attracted increasing attention recently. Existing methods require either style labels (e.g., speaker identities) or complex data preprocessing to obtain style control parameters. In this paper, we propose a new end-to-end flow-based model that can generate audio-driven gestures of arbitrary styles with neither preprocessing nor style labels. To achieve this, we introduce a global encoder and a gesture perceptual loss into the classic generative flow model to capture both global and local information. We conduct extensive experiments on two benchmark datasets: the TED Dataset and the Trinity Dataset. Both quantitative and qualitative evaluations show that the proposed model outperforms state-of-the-art models.
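Below is a minimal PyTorch sketch, not the authors' implementation, of the mechanism the abstract describes: a Glow-style affine-coupling flow over per-frame pose vectors, conditioned on audio features concatenated with a global style vector produced by an encoder over the whole gesture sequence. All names and sizes here (GestureFlow, GlobalEncoder, CouplingLayer, POSE_DIM, AUDIO_DIM, STYLE_DIM) are illustrative assumptions, and the paper's gesture perceptual loss is only noted in a comment.

import torch
import torch.nn as nn

# Hypothetical feature sizes; the paper's actual dimensions are not given here.
POSE_DIM, AUDIO_DIM, STYLE_DIM = 48, 26, 16


class GlobalEncoder(nn.Module):
    """Encodes a whole gesture sequence into a single global style vector."""

    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(POSE_DIM, STYLE_DIM, batch_first=True)

    def forward(self, poses):                 # poses: (B, T, POSE_DIM)
        _, h = self.gru(poses)                # final hidden state: (1, B, STYLE_DIM)
        return h.squeeze(0)                   # (B, STYLE_DIM)


class CouplingLayer(nn.Module):
    """Glow-style affine coupling: half of each pose vector is rescaled and
    shifted with parameters predicted from the other half plus the
    conditioning (audio features and the global style vector)."""

    def __init__(self):
        super().__init__()
        half = POSE_DIM // 2
        in_dim = half + AUDIO_DIM + STYLE_DIM
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 2 * half))

    def forward(self, x, cond):
        x_a, x_b = x.chunk(2, dim=-1)
        s, t = self.net(torch.cat([x_a, cond], dim=-1)).chunk(2, dim=-1)
        s = torch.tanh(s)                     # bound the log-scales for stability
        y_b = x_b * torch.exp(s) + t
        log_det = s.sum(dim=-1)               # log |det| of the affine map
        return torch.cat([x_a, y_b], dim=-1), log_det


class GestureFlow(nn.Module):
    def __init__(self, n_layers=4):
        super().__init__()
        self.encoder = GlobalEncoder()
        self.layers = nn.ModuleList(CouplingLayer() for _ in range(n_layers))

    def forward(self, poses, audio):          # (B, T, POSE_DIM), (B, T, AUDIO_DIM)
        # Global style comes from the gesture sequence itself during training;
        # at inference it would come from a reference clip of the target style.
        style = self.encoder(poses)
        cond = torch.cat(
            [audio, style.unsqueeze(1).expand(-1, poses.size(1), -1)], dim=-1)
        z, log_det = poses, 0.0
        for layer in self.layers:
            z, ld = layer(z, cond)
            log_det = log_det + ld
            # Swap halves so both get transformed across layers (a permutation,
            # so it does not change the log-determinant).
            z = torch.roll(z, shifts=POSE_DIM // 2, dims=-1)
        # Negative log-likelihood under a standard normal base density
        # (additive constant omitted); the paper additionally uses a gesture
        # perceptual loss, which this sketch leaves out.
        nll = 0.5 * (z ** 2).sum(dim=-1) - log_det
        return nll.mean()


# Usage with dummy data: 2 clips of 40 frames each.
model = GestureFlow()
poses = torch.randn(2, 40, POSE_DIM)
audio = torch.randn(2, 40, AUDIO_DIM)
loss = model(poses, audio)
loss.backward()

Conditioning every coupling layer on both the per-frame audio and the sequence-level style vector is what lets a single flow cover local audio-gesture alignment and global style at once, which matches the abstract's stated motivation for the global encoder.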
Year: 2022
DOI: 10.1007/978-3-031-20065-6_41
Venue: European Conference on Computer Vision
Keywords: Stylized gesture, Flow-based model, Global encoder
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 8
Name            Order   Citations   PageRank
Sheng Ye        1       0           0.34
Yu-Hui Wen      2       0           1.01
Yanan Sun       3       0           0.34
Ying He         4       1264        105.35
Ziyang Zhang    5       10          5.97
Yaoyuan Wang    6       0           1.69
Weihua He       7       0           0.68
Yong-Jin Liu    8       0           0.68