Title
BIDIRECTIONAL FOCUSED SEMANTIC ALIGNMENT ATTENTION NETWORK FOR CROSS-MODAL RETRIEVAL
Abstract
Cross-modal retrieval is a very challenging and significant task in intelligent understanding. Researchers have tried to capture modal semantic information through a weighted attention mechanism. Still, they cannot eliminate irrelevant semantic information's negative effects and cannot capture fine-grained modal semantic information. In order to further accurately capture the multi-modal semantic information, a bidirectional focused semantic alignment attention network (BFSAAN) is proposed to handle cross-modal retrieval tasks. Core ideas of BFSAAN are as follows: 1) Bidirectional focused attention mechanism is adopted to share modal semantic information, further eliminating the negative influence of irrelevant semantic information. 2) Strip pooling is applied to image and text modalities, a lightweight spatial attention mechanism to capture modal spatial semantic information. 3) Second-order covariance pooling is explored to obtain multi-modal semantic representation, capturing modal channel semantic information and achieving semantic alignment between image-text modalities. The experiment is executed in two standard cross-modal retrieval datasets (Flickr30K and MS COCO). The experimental design includes four aspects: performance comparison, ablation analysis, algorithm convergence, and visual analysis. Experimental results show that BFSAAN has better cross-modal retrieval performance.
Year
DOI
Venue
2021
10.1109/ICASSP39728.2021.9414382
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021)
Keywords
DocType
Citations 
Cross-modal retrieval, bidirectional focused attention, semantic alignment, attention mechanism
Conference
0
PageRank 
References 
Authors
0.34
0
4
Name
Order
Citations
PageRank
Shuli Cheng167.59
Liejun Wang272.86
Anyu Du344.19
Yongming Li400.34