Abstract |
---|
Real-world data are typically described by multiple modalities or multiple types of descriptors, each of which can be regarded as a view. Because data from different modalities lie in different subspaces, representations associated with the same semantics can differ across views. To address this problem, many approaches have been proposed that fuse representations from multiple views. Although effective, most existing models suffer from gradient diffusion, which limits their precision. We propose the Asymmetric Multimodal Variational Autoencoder (AMVAE) to reduce this effect. The proposed model has two key components: multiple autoencoders and a multimodal variational autoencoder. The autoencoders encode view-specific data, while the multimodal variational autoencoder guides the generation of the fusion representation, effectively alleviating the low-precision problem. Experimental results show that our method achieves state-of-the-art performance on several benchmark datasets for both clustering and classification tasks. |
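The abstract describes a two-part architecture: view-specific autoencoders that produce per-modality codes, and a multimodal variational autoencoder that fuses those codes into a shared representation. A minimal forward-pass sketch of that idea is below; it is an illustration under assumed names and dimensions (random untrained weights, two toy views, an 8-dimensional code per view), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # Random weights stand in for trained parameters (illustration only).
    return rng.standard_normal((in_dim, out_dim)) * 0.1

class ViewAutoencoder:
    """View-specific autoencoder: encodes one modality into a private code."""
    def __init__(self, in_dim, code_dim):
        self.W_enc = linear(in_dim, code_dim)
        self.W_dec = linear(code_dim, in_dim)
    def encode(self, x):
        return np.tanh(x @ self.W_enc)
    def decode(self, h):
        return h @ self.W_dec

class MultimodalVAE:
    """Maps concatenated view codes to a shared fusion representation z."""
    def __init__(self, code_dims, z_dim):
        total = sum(code_dims)
        self.W_mu = linear(total, z_dim)
        self.W_logvar = linear(total, z_dim)
    def fuse(self, codes):
        h = np.concatenate(codes, axis=-1)
        mu, logvar = h @ self.W_mu, h @ self.W_logvar
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
        return z, mu, logvar

# Two toy views of one sample (e.g. image features and text features).
view1 = rng.standard_normal((1, 20))
view2 = rng.standard_normal((1, 30))
ae1, ae2 = ViewAutoencoder(20, 8), ViewAutoencoder(30, 8)
mvae = MultimodalVAE([8, 8], z_dim=4)
z, mu, logvar = mvae.fuse([ae1.encode(view1), ae2.encode(view2)])
print(z.shape)  # fusion representation: (1, 4)
```

In the paper, this fusion representation is what the downstream clustering and classification tasks would consume; the asymmetry and the training losses (reconstruction and KL terms) are omitted here.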
Year | DOI | Venue |
---|---|---|
2021 | 10.1007/978-3-030-86362-3_32 | ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT I |
Keywords | DocType | Volume |
---|---|---|
Multi-view representation, Variational autoencoder, Deep learning | Conference | 12891 |

ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 0 | 0.34 |

References | Authors |
---|---|
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wen Youpeng | 1 | 0 | 0.34 |
Hongxiang Lin | 2 | 0 | 0.34 |
Guo Yiju | 3 | 0 | 0.34 |
Liang Zhao | 4 | 5 | 1.75 |