Abstract |
---|
While predicting video content is challenging given the huge unconstrained search space, this work explores cross-modality constraints to guide the video generation process and improve content prediction. Observing the underlying correspondence between sound and object movement, we propose a novel cross-modality video generation network. Via adversarial training, this network directly links sound with the movement parameters of the manipulated object and automatically outputs the corresponding object motion according to the rhythm of the given audio signal. We experiment on both rigid-object and non-rigid-object motion prediction tasks and show that, guided by the associated audio information, our method significantly reduces motion uncertainty in the generated video content. |
Year | DOI | Venue
---|---|---
2019 | 10.1016/j.cviu.2019.03.006 | Computer Vision and Image Understanding
Keywords | Field | DocType
---|---|---
Video generation, Cross-modality constraint, Adversarial learning | Audio signal, Computer vision, Parametrization, Safeguard, Artificial intelligence, Motion prediction, Cross modality, Mathematics | Journal
Volume | Issue | ISSN
---|---|---
183 | 1 | 1077-3142
Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors |
---|
5 |
Name | Order | Citations | PageRank
---|---|---|---
Yichao Yan | 1 | 90 | 6.70 |
Bingbing Ni | 2 | 1421 | 82.90 |
Wendong Zhang | 3 | 15 | 1.85 |
Jun Tang | 4 | 0 | 0.34 |
Xiaokang Yang | 5 | 3581 | 238.09 |