Title
Learning Image Representation via Attribute-Aware Attention Networks for Fashion Classification
Abstract
Attribute descriptions enrich the characteristics of fashion products and play an essential role in fashion image research. We propose a fashion classification model (M2Fashion) based on multi-modal data (text + image). It exploits intra-modal and inter-modal correlations to locate relevant image regions under the guidance of attributes and an attention mechanism. Compared with traditional single-modal feature representations, embeddings learned from multi-modal features better capture fine-grained image characteristics. We adopt a multi-task learning framework that combines category classification and attribute prediction. Extensive experiments on the public DeepFashion dataset show the superiority of the proposed M2Fashion over state-of-the-art methods: it improves top-3 accuracy by 1.3% in category classification and top-3 recall by 5.6%/3.7% in part/shape attribute prediction, respectively. A supplementary experiment on attribute-specific image retrieval on the DARN dataset further demonstrates the effectiveness of M2Fashion.
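The abstract outlines two components that lend themselves to a concrete sketch: attribute-guided attention that weights image regions by their relevance to a textual attribute embedding, and a multi-task head that jointly performs category classification and multi-label attribute prediction. The Python/PyTorch sketch below illustrates only this general pattern; the module names, dimensions, and loss weighting alpha are assumptions for illustration, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeAwareAttention(nn.Module):
    # Hypothetical sketch: an attribute (text) embedding acts as the query
    # over a grid of CNN image-region features, so attention concentrates
    # on regions relevant to that attribute.
    def __init__(self, region_dim: int, attr_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.q = nn.Linear(attr_dim, hidden_dim)    # project attribute embedding
        self.k = nn.Linear(region_dim, hidden_dim)  # project region features (keys)
        self.v = nn.Linear(region_dim, hidden_dim)  # project region features (values)

    def forward(self, regions: torch.Tensor, attr_emb: torch.Tensor) -> torch.Tensor:
        # regions:  (B, N, region_dim) image region features
        # attr_emb: (B, attr_dim) embedding of an attribute description
        q = self.q(attr_emb).unsqueeze(1)             # (B, 1, H)
        k = self.k(regions)                           # (B, N, H)
        v = self.v(regions)                           # (B, N, H)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5  # (B, N) scaled dot product
        weights = F.softmax(scores, dim=-1)           # attend to relevant regions
        return (weights.unsqueeze(-1) * v).sum(1)     # (B, H) attended feature

class MultiTaskHead(nn.Module):
    # Shared feature feeds two tasks: single-label category classification
    # and multi-label attribute prediction.
    def __init__(self, feat_dim: int, num_categories: int, num_attributes: int):
        super().__init__()
        self.category = nn.Linear(feat_dim, num_categories)
        self.attribute = nn.Linear(feat_dim, num_attributes)

    def forward(self, feat: torch.Tensor):
        return self.category(feat), self.attribute(feat)

def multitask_loss(cat_logits, attr_logits, cat_target, attr_target, alpha=1.0):
    # Category: cross-entropy over classes; attributes: per-label BCE
    # (attr_target is a float 0/1 multi-hot vector). alpha is an assumed
    # task-balancing weight.
    return (F.cross_entropy(cat_logits, cat_target)
            + alpha * F.binary_cross_entropy_with_logits(attr_logits, attr_target))

In this sketch the attribute embedding serves as the attention query over region features, mirroring the abstract's description of locating relevant image regions under attribute guidance; the attended feature is then shared by both task heads, reflecting the multi-task framework that combines category classification with attribute prediction.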
Year
2022
DOI
10.1007/978-3-030-98358-1_6
Venue
MULTIMEDIA MODELING (MMM 2022), PT I
Keywords
Multi-modal, Classification, Prediction, Attention mechanism
DocType
Conference
Volume
13141
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
4
Name            Order   Citations   PageRank
Yongquan Wan    1       2           2.08
Cairong Yan     2       19          9.25
Bofeng Zhang    3       0           0.68
Guobing Zou     4       95          20.12