Abstract |
---|
We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. To find an optimal architecture for a given dataset within this search space, we leverage an efficient sequential model-based exploration approach tailored to the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem through extensive experiments on a toy dataset and two real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance on problems from different domains and with different dataset sizes, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/CVPR.2019.00713 | 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) |
Field | DocType | Volume |
---|---|---|
Architecture, Action recognition, Fusion, Artificial intelligence, RGB color model, Search problem, Sequential model, Machine learning, Mathematics | Journal | abs/1903.06496 |
ISSN | Citations | PageRank |
---|---|---|
1063-6919 | 7 | 0.40 |
References | Authors |
---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Juan-Manuel Perez-Rua | 1 | 17 | 1.54 |
Valentin Vielzeuf | 2 | 7 | 0.40 |
Stéphane Pateux | 3 | 24 | 2.06 |
Moez Baccouche | 4 | 45 | 2.14 |
Frédéric Jurie | 5 | 3924 | 235.82 |