Abstract |
---|
A novel model based on the self-attention mechanism is proposed to learn more effective multi-modal representations. |
The DSACA model is proposed to capture the internal dependencies and cross-modal correlation between the image and the question sentence. |
Extensive experiments and analysis confirm the superiority of the proposed DSACA. |

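
The highlights name two ingredients. Self-attention relates elements within one modality, which is one way to read "internal dependencies" among image regions or among question words. Co-attention lets one modality attend to the other, which is one way to capture their "cross-modal correlation". The following is a minimal sketch of generic scaled dot-product attention in plain Python, not the authors' DSACA implementation. It assumes pre-extracted feature vectors of equal dimension for image regions and question words. It also omits the learned query/key/value projections, multiple heads and feed-forward layers that practical models add. The choice to let image regions query the question words in the co-attention step is an assumption, and the helper names (`attention`, `self_attention`, `co_attention`) and toy features are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major list of feature vectors


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an n x k matrix by a k x m matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    scores = matmul(queries, transpose(keys))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), values)


def self_attention(features: Matrix) -> Matrix:
    # Queries, keys and values all come from the same modality, so each
    # element is re-expressed as a weighted mix of its own modality.
    return attention(features, features, features)


def co_attention(image_regions: Matrix, question_words: Matrix) -> Matrix:
    # Assumption: image regions act as queries over the question words,
    # giving one question-conditioned vector per image region.
    return attention(image_regions, question_words, question_words)


# Hypothetical toy features: 3 question words and 2 image regions, d = 2.
question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image = [[0.5, 0.5], [1.0, 0.0]]

question_ctx = self_attention(question)    # 3 x 2
image_ctx = co_attention(image, question)  # 2 x 2
print(question_ctx)
print(image_ctx)
```

Because each row of the softmax weights sums to one, every output vector is a convex combination of the value vectors. With an all-zero query the weights are uniform and the output is simply the mean of the values.
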
Year | DOI | Venue |
---|---|---|
2021 | 10.1016/j.patcog.2021.107956 | Pattern Recognition |

Keywords | DocType | Volume |
---|---|---|
Self-attention, Visual-textual co-attention, Visual question answering | Journal | 117 |

Issue | ISSN | Citations |
---|---|---|
1 | 0031-3203 | 2 |

PageRank | References | Authors |
---|---|---|
0.37 | 0 | 7 |

Name | Order | Citations | PageRank |
---|---|---|---|
Liu Yun | 1 | 280 | 57.13 |
Xiaoming Zhang | 2 | 28 | 2.87 |
Qianyun Zhang | 3 | 2 | 0.37 |
Chaozhuo Li | 4 | 47 | 8.45 |
Feiran Huang | 5 | 50 | 8.30 |
Xianghong Tang | 6 | 2 | 0.37 |
Zhoujun Li | 7 | 964 | 115.99 |