Title |
---|
Learning Image Representation via Attribute-Aware Attention Networks for Fashion Classification |
Abstract |
---|
Attribute descriptions enrich the characteristics of fashion products and play an essential role in fashion image research. We propose a fashion classification model (M2Fashion) based on multi-modal data (text + image). Guided by attributes and an attention mechanism, it exploits intra-modal and inter-modal correlations to locate relevant image regions. Compared with traditional single-modal feature representations, embeddings learned from multi-modal features better capture fine-grained image characteristics. We adopt a multi-task learning framework that combines category classification and attribute prediction. Extensive experiments on the public DeepFashion dataset show the superiority of M2Fashion over state-of-the-art methods: it improves top-3 accuracy by 1.3% in category classification and top-3 recall by 5.6%/3.7% in part/shape attribute prediction, respectively. A supplementary attribute-specific image retrieval experiment on the DARN dataset also demonstrates the effectiveness of M2Fashion. |
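The attribute-guided attention idea summarized in the abstract can be sketched roughly as follows: region features are scored against an attribute embedding, the scores are normalized with a softmax, and the image representation is the weighted sum of regions. This is a minimal illustrative sketch; the function names, dimensions, and dot-product scoring are assumptions, not the paper's actual M2Fashion architecture (which also involves multi-modal embeddings and multi-task heads).

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attribute_guided_attention(regions, attr_emb):
    """Attend over image region features using an attribute embedding as query.

    regions:  list of R region feature vectors (each of length D).
    attr_emb: attribute embedding (length D) acting as the attention query.
    Returns the attended D-dim image feature and the R attention weights.
    """
    # Dot-product affinity between each region and the attribute query.
    scores = [sum(r_i * a_i for r_i, a_i in zip(r, attr_emb)) for r in regions]
    weights = softmax(scores)  # normalized region-attribute affinities
    # Weighted sum of region features -> attribute-aware image representation.
    attended = [sum(w * r[d] for w, r in zip(weights, regions))
                for d in range(len(attr_emb))]
    return attended, weights

# Toy example: 3 regions with 4-dim features and one attribute query
# that is aligned with the fourth feature dimension.
regions = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.5, 0.5, 0.0, 1.0]]
attr = [0.0, 0.0, 0.0, 2.0]
feat, w = attribute_guided_attention(regions, attr)
```

Here the third region, whose feature overlaps the attribute query, receives the largest attention weight, so the pooled feature is dominated by that region.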
Year | DOI | Venue
---|---|---|
2022 | 10.1007/978-3-030-98358-1_6 | MULTIMEDIA MODELING (MMM 2022), PT I

Keywords | DocType | Volume
---|---|---|
Multi-modal, Classification, Prediction, Attention mechanism | Conference | 13141

ISSN | Citations | PageRank
---|---|---|
0302-9743 | 0 | 0.34

References | Authors
---|---|
0 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Yongquan Wan | 1 | 2 | 2.08 |
Cairong Yan | 2 | 19 | 9.25 |
Bofeng Zhang | 3 | 0 | 0.68 |
Guobing Zou | 4 | 95 | 20.12 |