Title
Label graph learning for multi-label image recognition with cross-modal fusion
Abstract
Learning the correlations between labels has become popular in multi-label image recognition. Existing approaches construct a label graph to learn label dependencies, but they converge inefficiently when fusing image features with label embeddings, which limits recognition performance on multi-label images. To overcome this challenge, we propose a label graph learning model (termed LGLM) for multi-label image recognition, which integrates a multi-modal fusion component to efficiently fuse cross-modal embeddings. First, LGLM uses a convolutional neural network to learn the features of each image. Second, LGLM constructs a label graph from the word vector of each object and then adopts a graph convolutional network to learn the label correlations and generate label co-occurrence embeddings. Finally, the multi-modal fusion component fuses the image features and label co-occurrence embeddings to produce an end-to-end image recognition model. We conduct extensive experiments on MS-COCO and FLICKR25K, and the results demonstrate the superiority of LGLM over state-of-the-art image recognition methods. The code of LGLM has been released on GitHub: https://github.com/lzHZWZ/LGLM
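The three stages named in the abstract (image features, label graph with graph convolution, cross-modal fusion) can be illustrated with a minimal NumPy sketch. This is not the released LGLM code. Several pieces are assumptions made only for illustration: random vectors stand in for CNN image features and label word vectors, the graph is built by thresholding cosine similarity between word vectors, and fusion is a simple low-rank bilinear score. All function names, dimensions, and the threshold are hypothetical.

```python
import numpy as np


def build_label_graph(word_vecs, threshold=0.5):
    """Adjacency from cosine similarity of label word vectors (assumed rule)."""
    unit = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    sim = unit @ unit.T
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops keep every node's degree >= 1
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} used by standard GCN layers."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, h, w):
    """One graph convolution step: propagate, project, apply ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)


def fuse(image_feat, label_emb, u, v):
    """Assumed fusion: project both modalities to k dims, then take inner products."""
    img = image_feat @ u   # (batch, k)
    lab = label_emb @ v    # (num_labels, k)
    return img @ lab.T     # (batch, num_labels) logits


# Hypothetical sizes, chosen only to keep the demo small.
rng = np.random.default_rng(0)
num_labels, word_dim, hidden_dim = 5, 8, 6
batch, feat_dim, k = 3, 16, 4

word_vecs = rng.normal(size=(num_labels, word_dim))   # stand-in for word embeddings
image_feat = rng.normal(size=(batch, feat_dim))       # stand-in for CNN features

a_hat = normalize_adjacency(build_label_graph(word_vecs))
w1 = rng.normal(size=(word_dim, hidden_dim))
w2 = rng.normal(size=(hidden_dim, hidden_dim))
label_emb = gcn_layer(a_hat, gcn_layer(a_hat, word_vecs, w1), w2)

u = rng.normal(size=(feat_dim, k))
v = rng.normal(size=(hidden_dim, k))
logits = fuse(image_feat, label_emb, u, v)
probs = 1.0 / (1.0 + np.exp(-logits))  # independent per-label probabilities
```

Each row of `probs` holds one independent probability per label, which is the usual output form for multi-label recognition. In a trained model the weight matrices would be learned jointly with the CNN rather than sampled at random.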
Year
2022
DOI
10.1007/s11042-022-12397-y
Venue
Multimedia Tools and Applications
Keywords
Multi-label image recognition, Label graph, Graph convolution network, Multi-modal fusion
DocType
Journal
Volume
81
Issue
18
ISSN
1380-7501
Citations
0
PageRank
0.34
References
6
Authors
4
Name           Order  Citations  PageRank
Yanzhao Xie    1      2          1.39
Yangtao Wang   2      27         5.85
Yu Liu         3      492        30.80
Ke Zhou        4      452        51.98