Title
A survey on multimodal-guided visual content synthesis
Abstract
With growing interest in creative scenarios such as social media, film production, and intelligent education, people increasingly expect to create rich visual content according to their own ideas and practical needs. In this context, visual content synthesis techniques based on multimodal data have attracted much attention in recent years. Compared with traditional generative methods, multimodal data offer more flexible and concrete cues, providing an interactive and controllable way to generate the desired visual content. In this survey, we comprehensively summarize recent advances in multimodal-guided visual content synthesis. We first formulate a taxonomy of visual content synthesis and divide the field into four subfields according to the input modality: visual-guided, text-guided, audio-guided, and other-modality-guided visual content synthesis. For each subfield, we describe the paradigm of the corresponding modality-guided synthesis and discuss representative methods, most of which are based on Generative Adversarial Networks (GANs). Next, we present commonly used benchmark datasets and evaluation metrics, together with detailed comparisons between different methods. Finally, we provide insight into current research challenges and possible future research directions.
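As a concrete illustration of the paradigm the abstract describes, the sketch below conditions both the generator and the discriminator of a GAN on a modality embedding (for instance, a text feature vector). This is a minimal PyTorch sketch of conditional adversarial training in general; the network shapes, dimensions, and the single training step are illustrative assumptions, not the design of any method covered by the survey.

# Minimal sketch of a conditional GAN for modality-guided image synthesis.
# Illustrative only: all names, sizes, and hyperparameters are assumptions,
# not the architecture of any specific surveyed method.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_dim=100, cond_dim=128, img_dim=64 * 64 * 3):
        super().__init__()
        # The condition vector (e.g., a text or audio embedding) is
        # concatenated with the noise vector to steer generation.
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, img_dim),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, noise, cond):
        return self.net(torch.cat([noise, cond], dim=1))

class Discriminator(nn.Module):
    def __init__(self, cond_dim=128, img_dim=64 * 64 * 3):
        super().__init__()
        # The discriminator also sees the condition, so it judges both
        # realism and image-condition consistency.
        self.net = nn.Sequential(
            nn.Linear(img_dim + cond_dim, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 1),
        )

    def forward(self, img, cond):
        return self.net(torch.cat([img, cond], dim=1))

# One adversarial update step with the standard GAN loss.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(8, 64 * 64 * 3) * 2 - 1  # stand-in for real images
cond = torch.randn(8, 128)                 # stand-in for a modality embedding
noise = torch.randn(8, 100)

# Discriminator step: real images scored as 1, generated images as 0.
fake = G(noise, cond).detach()
loss_d = bce(D(real, cond), torch.ones(8, 1)) + bce(D(fake, cond), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator into scoring fakes as real.
loss_g = bce(D(G(noise, cond), cond), torch.ones(8, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()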
Year
2022
DOI
10.1016/j.neucom.2022.04.126
Venue
Neurocomputing
Keywords
Deep learning, Multimodal, Visual content synthesis, GAN
DocType
Journal
Volume
497
ISSN
0925-2312
Citations
0
PageRank
0.34
References
97
Authors
5
Name         Order  Citations  PageRank
Ziqi Zhang   1      0          0.34
Zeyu Li      2      0          0.34
Kun Wei      3      12         4.55
Siduo Pan    4      0          0.34
Cheng Deng   5      1283       85.48