Title | ||
---|---|---|
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes With Semantic Consistency and Attention Mechanism |
Abstract | ||
---|---|---|
Food retrieval is an important task for analyzing food-related information, where we are interested in retrieving relevant information about the queried food item, such as ingredients and cooking instructions. In this paper, we investigate cross-modal retrieval between food images and cooking recipes. The goal is to learn an embedding of images and recipes in a common feature space, such that the embeddings of a corresponding image-recipe pair lie close to one another. Two major challenges in addressing this problem are 1) large intra-variance and small inter-variance across cross-modal food data; and 2) difficulties in obtaining discriminative recipe representations. To address these two problems, we propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities. In addition, we exploit a self-attention mechanism to improve the embedding of recipes. We evaluate the proposed method on the large-scale Recipe1M dataset and show that it outperforms several state-of-the-art cross-modal retrieval strategies for food images and cooking recipes by a significant margin. |
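
The two ideas in the abstract, retrieval in a common feature space and alignment of output semantic probabilities, can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything here is an assumption made for illustration: cosine similarity as the ranking score, a symmetric KL divergence as the alignment term, and the helper names `retrieve` and `semantic_consistency_loss`. It should not be read as the SCAN implementation.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def semantic_consistency_loss(image_logits, recipe_logits):
    """Assumed alignment term: symmetric KL between the class
    probabilities predicted from the image and from the recipe."""
    p_img = softmax(image_logits)
    p_rec = softmax(recipe_logits)
    return 0.5 * (kl_divergence(p_img, p_rec) + kl_divergence(p_rec, p_img))


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_embedding, candidate_embeddings):
    """Return candidate indices ordered from most to least similar."""
    scores = [cosine_similarity(query_embedding, c) for c in candidate_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy data: one image embedding queried against three recipes.
image_embedding = [1.0, 0.0]
recipe_embeddings = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ranking = retrieve(image_embedding, recipe_embeddings)

# Identical class scores from both modalities give a loss of zero;
# disagreeing scores give a positive loss.
aligned = semantic_consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
misaligned = semantic_consistency_loss([3.0, 0.0, 0.0], [0.0, 0.0, 3.0])
```

In this sketch the alignment term is zero only when both modalities predict the same class distribution, so minimizing it alongside a retrieval objective would push matching image and recipe embeddings toward the same semantic category.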
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/TMM.2021.3083109 | IEEE TRANSACTIONS ON MULTIMEDIA |
Keywords | DocType | Volume |
---|---|---|
Semantics, Task analysis, Data models, Correlation, Visualization, Training, Sugar, Deep learning, cross-modal retrieval, vision-and-language | Journal | 24 |
ISSN | Citations | PageRank |
---|---|---|
1520-9210 | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hao Wang | 1 | 16 | 3.28 |
Doyen Sahoo | 2 | 83 | 9.94 |
Chenghao Liu | 3 | 334 | 32.66 |
Shu Ke | 4 | 4 | 1.11 |
Palakorn Achananuparp | 5 | 302 | 23.16 |
Ee-Peng Lim | 6 | 5889 | 754.17 |
Steven C. H. Hoi | 7 | 3830 | 174.61 |