Abstract

Learning the correlations between labels has become popular in multi-label image recognition. Existing approaches construct a label graph to learn label dependencies, but they suffer from low convergence efficiency when fusing image features and label embeddings, which limits their performance on multi-label images. To overcome this challenge, we propose a label graph learning model (termed LGLM) for multi-label image recognition, which integrates a multi-modal fusion component to efficiently fuse cross-modal embeddings. First, LGLM uses a convolutional neural network (CNN) to extract a feature for each image. Second, LGLM constructs a label graph from the word vector of each object label and then adopts a graph convolutional network (GCN) to learn label correlations and generate label co-occurrence embeddings. Finally, the multi-modal fusion component efficiently fuses the image features and label co-occurrence embeddings to yield an end-to-end image recognition model. We conduct extensive experiments on MS-COCO and FLICKR25K, and the results demonstrate the superiority of LGLM over state-of-the-art image recognition methods. The code of LGLM has been released on GitHub: https://github.com/lzHZWZ/LGLM.
Year | DOI | Venue
---|---|---
2022 | 10.1007/s11042-022-12397-y | Multimedia Tools and Applications

Keywords | DocType | Volume
---|---|---
Multi-label image recognition, Label graph, Graph convolution network, Multi-modal fusion | Journal | 81

Issue | ISSN | Citations
---|---|---
18 | 1380-7501 | 0

PageRank | References | Authors
---|---|---
0.34 | 6 | 4
Name | Order | Citations | PageRank |
---|---|---|---
Yanzhao Xie | 1 | 2 | 1.39 |
Yangtao Wang | 2 | 27 | 5.85 |
Yu Liu | 3 | 492 | 30.80 |
Ke Zhou | 4 | 452 | 51.98 |