Title | ||
---|---|---|
EXPLORING VISUAL-AUDIO COMPOSITION ALIGNMENT NETWORK FOR QUALITY FASHION RETRIEVAL IN VIDEO |
Abstract | ||
---|---|---|
Fashion retrieval in video suffers from imperfect visual representations and low-quality search results in e-commerce settings. Previous works generally focus on retrieving identical images from a visual perspective only and do not leverage multi-modal information to surface high-quality commodities. Because video-to-shop is a cross-domain problem, instructional or exhibiting audio provides rich semantic information that can facilitate the task. In this paper, we present a novel Visual-Audio Composition Alignment Network (VACANet) for quality fashion retrieval in video. First, we introduce a visual-audio composition module in VACANet that aims to distinguish attentive and residual entities by learning semantic embeddings from both the visual and audio streams. Second, we design a quality alignment training scheme that combines quality-aware triplet mining with a domain alignment constraint for video-to-image adaptation. Finally, extensive experiments on challenging video datasets demonstrate the scalable effectiveness of our model for quality fashion retrieval. |
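
The abstract names two ingredients, a visual-audio composition and a quality-aware triplet objective, without giving their exact form. The following is only a minimal sketch of how such pieces are commonly written, not the VACANet implementation: the function names `compose` and `quality_weighted_triplet_loss`, the convex mixing weight `alpha`, the scalar `quality` weight, and the `margin` value are all assumptions introduced here for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compose(visual, audio, alpha=0.5):
    """Hypothetical composition: a convex mix of visual and audio embeddings.

    The paper's composition module is learned; this fixed mix is an assumption.
    """
    return [alpha * v + (1.0 - alpha) * a for v, a in zip(visual, audio)]


def quality_weighted_triplet_loss(anchor, positive, negative,
                                  quality=1.0, margin=0.2):
    """Standard hinge triplet loss scaled by an assumed quality weight.

    anchor:   video-side embedding (e.g. the output of compose)
    positive: embedding of the matching shop image
    negative: embedding of a non-matching shop image
    quality:  hypothetical score in [0, 1] for the positive item
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return quality * max(0.0, d_pos - d_neg + margin)


# Toy usage with 2-D embeddings (values are illustrative only).
video_embedding = compose([1.0, 0.0], [0.0, 1.0], alpha=0.5)
loss = quality_weighted_triplet_loss(
    video_embedding, positive=[0.5, 0.5], negative=[3.0, 4.0], quality=0.8
)
```

In this toy case the positive coincides with the composed embedding and the negative is far away, so the hinge is inactive and the loss is zero. A triplet whose negative is closer than the positive would contribute a penalty scaled by `quality`.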
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ICASSP39728.2021.9413617 | 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) |
Keywords | DocType | Citations |
---|---|---|
Fashion Retrieval, Visual-Audio Embedding, Multi-modal Learning, Cross-domain Alignment | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yanhao Zhang | 1 | 180 | 13.90 |
Jianmin Wu | 2 | 110 | 9.91 |
Xiong Xiong | 3 | 0 | 0.34 |
Dangwei Li | 4 | 0 | 0.34 |
Chenwei Xie | 5 | 0 | 1.35 |
Yun Zheng | 6 | 59 | 11.91 |
Pan Pan | 7 | 10 | 4.29 |
Yinghui Xu | 8 | 0 | 0.34 |