Expanding Large Pre-trained Unimodal Models with Multimodal Information Injection for Image-Text Multimodal Classification - Citegraph

Paper Info

Title
Expanding Large Pre-trained Unimodal Models with Multimodal Information Injection for Image-Text Multimodal Classification

Abstract
Fine-tuning pre-trained models for downstream tasks is mainstream practice in deep learning. However, a pre-trained model can only be fine-tuned on data from the specific modality it was built for. For example, DenseNet, a visual model, cannot directly take textual data as its input. Hence, although large pre-trained models such as DenseNet and BERT have great potential for downstream recognition tasks, they cannot readily leverage multimodal information, an emerging trend in deep learning. This work focuses on fine-tuning pre-trained unimodal models with multimodal inputs of image-text pairs, expanding them for image-text multimodal recognition. To this end, we propose the Multimodal Information Injection Plug-in (MI2P), which is attached to different layers of the unimodal models (e.g., DenseNet and BERT). The MI2P unit provides a path for integrating information from the other modality into a unimodal model. Specifically, MI2P performs cross-modal feature transformation by learning fine-grained correlations between the visual and textual features. Through MI2P, we can inject language information into the vision backbone by attending the word-wise textual features to different visual channels, and inject visual information into the language backbone by attending the channel-wise visual features to different textual words. With MI2P attachments, pre-trained unimodal models can be expanded to process multimodal data without changing their network structures.
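The abstract describes MI2P only in words. As a rough illustration of the bidirectional cross-modal attention it mentions, the Python sketch below lets each visual channel attend over the word features (language-to-vision injection) and each word attend over the channel features (vision-to-language injection), then adds the attended context back residually. It is a minimal sketch under stated assumptions, not the authors' implementation: channel descriptors and word features are assumed to be already projected to a shared dimension d, there are no learned query/key/value projections, fusion is plain addition, and all function names (`cross_attend`, `add_residual`, `mi2p_like_block`) are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # each row is one feature vector


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Scaled dot-product attention with no learned projections (assumption).

    Each query row receives a convex combination of the keys_values rows,
    weighted by softmax(q . k / sqrt(d)).
    """
    d = len(keys_values[0])
    if any(len(q) != d for q in queries):
        raise ValueError("queries and keys_values must share dimension d")
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out


def add_residual(target: Matrix, context: Matrix) -> Matrix:
    """Inject context into target by element-wise addition (assumed fusion)."""
    return [[t + c for t, c in zip(tr, cr)] for tr, cr in zip(target, context)]


def mi2p_like_block(visual_channels: Matrix, words: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical bidirectional injection step, not the paper's exact MI2P.

    visual_channels: C rows, one d-dim descriptor per visual channel.
    words: N rows, one d-dim feature per word token.
    """
    # Language -> vision: each visual channel attends over the words.
    text_context = cross_attend(visual_channels, words)
    # Vision -> language: each word attends over the visual channels.
    visual_context = cross_attend(words, visual_channels)
    return (add_residual(visual_channels, text_context),
            add_residual(words, visual_context))


if __name__ == "__main__":
    channels = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]  # C = 3, d = 2 (toy values)
    tokens = [[1.0, 2.0]]                            # N = 1 word
    new_channels, new_tokens = mi2p_like_block(channels, tokens)
    # With a single word, each channel puts attention weight 1.0 on it,
    # so every channel simply gains that word's feature vector.
    print(new_channels)  # [[1.5, 2.0], [1.0, 2.5], [2.0, 3.0]]
```

In a trainable version, learned projections (and possibly a fusion other than addition) would be needed for the correlations to be learned as the abstract describes; they are omitted here only to keep the attention pattern easy to follow.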

Year	2022
DOI	10.1109/CVPR52688.2022.01505
Venue	IEEE Conference on Computer Vision and Pattern Recognition
Keywords	Vision + language
DocType	Conference
Volume	2022
Issue	1
Citations	0
PageRank	0.34
References	0
Authors	6

Authors

Name	Order	Citations	PageRank
Fengmao Lv	1	27	3.49
Guosheng Lin	2	688	33.91
Mingyang Wan	3	0	0.34
Tianrui Li	4	3176	191.76
Guojun Ma	5	0	0.34
Fengmao Lv	6	0	0.34