Title
Expanding Large Pre-trained Unimodal Models with Multimodal Information Injection for Image-Text Multimodal Classification
Abstract
Fine-tuning pre-trained models for downstream tasks is mainstream in deep learning. However, the pre-trained models are limited to be fine-tuned by data from a specific modality. For example, as a visual model, DenseNet cannot directly take the textual data as its input. Hence, although the large pre-trained models such as DenseNet or BERT have a great potential for the downstream recognition tasks, they have weaknesses in leveraging multimodal information, which is a new trend of deep learning. This work focuses on fine-tuning pre-trained unimodal models with multimodal inputs of image-text pairs and expanding them for image-text multimodal recognition. To this end, we propose the Multimodal Information Injection Plug-in (MI2P) which is attached to different layers of the unimodal models (e.g., DenseNet and BERT). The proposed MI2P unit provides the path to integrate the information of other modalities into the unimodal models. Specifically, MI2P performs cross-modal feature transformation by learning the fine-grained correlations between the visual and textual features. Through the proposed MI2P unit, we can inject the language information into the vision backbone by attending the word-wise textual features to different visual channels, as well as inject the visual information into the language backbone by attending the channel-wise visual features to different textual words. Armed with the MI2P attachments, the pre-trained unimodal models can be expanded to process multimodal data without the need to change the network structures.
Year
DOI
Venue
2022
10.1109/CVPR52688.2022.01505
IEEE Conference on Computer Vision and Pattern Recognition
Keywords
DocType
Volume
Vision + language
Conference
2022
Issue
Citations 
PageRank 
1
0
0.34
References 
Authors
0
6
Name
Order
Citations
PageRank
Fengmao Lv1273.49
Guosheng Lin268833.91
Mingyang Wan300.34
Tianrui Li43176191.76
Guojun Ma500.34
Fengmao Lv600.34