Abstract

Learning the correlations between labels has become popular in multi-label image recognition. Existing approaches construct a label graph to learn label dependencies, but they suffer from low convergence efficiency when fusing image features and label embeddings, which limits their performance on multi-label images. To overcome this challenge, we propose a label graph learning model (termed LGLM) for multi-label image recognition, which integrates a multi-modal fusion component to efficiently fuse cross-modal embeddings. First, LGLM uses a convolutional neural network (CNN) to extract a feature for each image. Second, LGLM constructs a label graph from the word vector of each object label and then adopts a graph convolutional network (GCN) to learn label correlations and generate label co-occurrence embeddings. Finally, the multi-modal fusion component efficiently fuses the image features and label co-occurrence embeddings to yield an end-to-end image recognition model. We conduct extensive experiments on MS-COCO and FLICKR25K, and the results demonstrate the superiority of LGLM over state-of-the-art image recognition methods. The code of LGLM has been released on GitHub: https://github.com/lzHZWZ/LGLM.
Year | DOI | Venue
---|---|---
2022 | 10.1007/s11042-022-12397-y | Multimedia Tools and Applications

Keywords | DocType | Volume
---|---|---
Multi-label image recognition, Label graph, Graph convolution network, Multi-modal fusion | Journal | 81

Issue | ISSN | Citations
---|---|---
18 | 1380-7501 | 0

PageRank | References | Authors
---|---|---
0.34 | 6 | 4
Name | Order | Citations | PageRank |
---|---|---|---
Yanzhao Xie | 1 | 2 | 1.39 |
Yangtao Wang | 2 | 27 | 5.85 |
Yu Liu | 3 | 492 | 30.80 |
Ke Zhou | 4 | 452 | 51.98 |