Abstract |
---|
From infancy, humans intuitively develop the ability to correlate input from different sensory modalities such as vision, audio, and text. In machine learning, however, cross-modal learning is a nontrivial task because different modalities are heterogeneous in nature. Prior work suggests that bridges exist among modalities: from a neurological and psychological perspective, humans can link one modality with another, e.g., associating a picture of a bird with merely hearing its song, and vice versa. Can a machine learning algorithm recover the scene given only the audio signal? In this paper, we propose a novel Cascade Attention-Guided Residue GAN (CAR-GAN), which aims to reconstruct scenes from their corresponding audio signals. In particular, we present a residue module that progressively mitigates the gap between modalities. Moreover, a cascade attention-guided network with a novel classification loss function is designed to tackle the cross-modal learning task. Our model maintains consistency in the high-level semantic label domain and is able to balance the two modalities. Experimental results demonstrate that our model achieves state-of-the-art cross-modal audio-visual generation on the challenging Sub-URMP dataset. |
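The abstract's "residue module" suggests stages that each add a learned correction to their input, so later stages only have to model the remaining cross-modal gap. A minimal sketch of that residual-refinement pattern, assuming a toy two-layer correction with hand-picked dimensions and weights (all illustrative, not the paper's actual architecture):

```python
# Sketch of progressive residual refinement: each block computes
# features + correction, so cascading blocks refines rather than
# replaces the representation. Shapes and weights are assumptions.

def relu(x):
    return [max(v, 0.0) for v in x]

def linear(x, w):
    # w: one row of weights per output unit, each of length len(x)
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def residue_block(features, w1, w2):
    """Return features plus a small learned correction (the 'residue')."""
    correction = linear(relu(linear(features, w1)), w2)
    return [f + c for f, c in zip(features, correction)]

# Toy 3-dim feature vector and tiny fixed weights (3 -> 2 -> 3).
feats = [0.5, -1.0, 2.0]
w1 = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0]]
w2 = [[0.3, 0.0], [0.0, 0.3], [0.1, 0.1]]

# Cascading several blocks refines the features step by step while
# preserving their shape, which is what makes the cascade possible.
out = feats
for _ in range(3):
    out = residue_block(out, w1, w2)

print(len(out))  # 3: output shape matches input shape
```

Because each block's output has the same shape as its input, an arbitrary number of refinement stages can be cascaded, mirroring the "progressively mitigate the gap" idea in the abstract.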
Year | DOI | Venue
---|---|---
2020 | 10.1109/ICPR48806.2021.9412890 | 2020 25th International Conference on Pattern Recognition (ICPR)

DocType | ISSN | Citations
---|---|---
Conference | 1051-4651 | 0

PageRank | References | Authors
---|---|---
0.34 | 28 | 5