Title
Learning Image Representation via Attribute-Aware Attention Networks for Fashion Classification
Abstract
Attribute descriptions enrich the characteristics of fashion products and play an essential role in fashion image research. We propose a fashion classification model (M2Fashion) based on multi-modal data (text + image). It exploits intra-modal and inter-modal correlations to locate relevant image regions under the guidance of attributes and an attention mechanism. Compared with traditional single-modal feature representations, embeddings learned from multi-modal features better capture fine-grained image characteristics. We adopt a multi-task learning framework that combines category classification and attribute prediction. Extensive experiments on the public DeepFashion dataset show the superiority of the proposed M2Fashion over state-of-the-art methods: it improves top-3 accuracy by 1.3% in category classification and top-3 recall by 5.6%/3.7% in part/shape attribute prediction, respectively. A supplementary experiment on attribute-specific image retrieval on the DARN dataset further demonstrates the effectiveness of M2Fashion.
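The abstract outlines two components that lend themselves to a concrete sketch: attribute-guided attention that weights image regions by their relevance to a textual attribute embedding, and a multi-task head that jointly performs category classification and multi-label attribute prediction. The Python/PyTorch sketch below illustrates only this general pattern; the module names, dimensions, and loss weighting alpha are assumptions for illustration, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeAwareAttention(nn.Module):
    # Hypothetical sketch: an attribute (text) embedding acts as the query
    # over a grid of CNN image-region features, so attention concentrates
    # on regions relevant to that attribute.
    def __init__(self, region_dim: int, attr_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.q = nn.Linear(attr_dim, hidden_dim)    # project attribute embedding
        self.k = nn.Linear(region_dim, hidden_dim)  # project region features (keys)
        self.v = nn.Linear(region_dim, hidden_dim)  # project region features (values)

    def forward(self, regions: torch.Tensor, attr_emb: torch.Tensor) -> torch.Tensor:
        # regions:  (B, N, region_dim) image region features
        # attr_emb: (B, attr_dim) embedding of an attribute description
        q = self.q(attr_emb).unsqueeze(1)             # (B, 1, H)
        k = self.k(regions)                           # (B, N, H)
        v = self.v(regions)                           # (B, N, H)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5  # (B, N) scaled dot product
        weights = F.softmax(scores, dim=-1)           # attend to relevant regions
        return (weights.unsqueeze(-1) * v).sum(1)     # (B, H) attended feature

class MultiTaskHead(nn.Module):
    # Shared feature feeds two tasks: single-label category classification
    # and multi-label attribute prediction.
    def __init__(self, feat_dim: int, num_categories: int, num_attributes: int):
        super().__init__()
        self.category = nn.Linear(feat_dim, num_categories)
        self.attribute = nn.Linear(feat_dim, num_attributes)

    def forward(self, feat: torch.Tensor):
        return self.category(feat), self.attribute(feat)

def multitask_loss(cat_logits, attr_logits, cat_target, attr_target, alpha=1.0):
    # Category: cross-entropy over classes; attributes: per-label BCE
    # (attr_target is a float 0/1 multi-hot vector). alpha is an assumed
    # task-balancing weight.
    return (F.cross_entropy(cat_logits, cat_target)
            + alpha * F.binary_cross_entropy_with_logits(attr_logits, attr_target))

In this sketch the attribute embedding serves as the attention query over region features, mirroring the abstract's description of locating relevant image regions under attribute guidance; the attended feature is then shared by both task heads, reflecting the multi-task framework that combines category classification with attribute prediction.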
Year
2022
DOI
10.1007/978-3-030-98358-1_6
Venue
MULTIMEDIA MODELING (MMM 2022), PT I
Keywords
Multi-modal, Classification, Prediction, Attention mechanism
DocType
Conference
Volume
13141
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
4
Name            Order   Citations   PageRank
Yongquan Wan    1       2           2.08
Cairong Yan     2       19          9.25
Bofeng Zhang    3       0           0.68
Guobing Zou     4       95          20.12