Abstract | ||
---|---|---|
• Systematic investigation of Accuracy vs. Complexity trade-off for VQA Models. • Often additional complexity does not guarantee higher VQA accuracy. • SeNet features are more generalizable than ResNet features. • Superior bilinear fusion with visual attention results in higher VQA accuracy. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1016/j.patcog.2021.108106 | Pattern Recognition |
Keywords | DocType | Volume |
Visual question answering, Visual feature extraction, Language features, Multi-modal fusion, Speed-accuracy trade-off | Journal | 120 |
Issue | ISSN | Citations |
1 | 0031-3203 | 0 |
PageRank | References | Authors |
0.34 | 15 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Farazi Moshiur R. | 1 | 0 | 0.34 |
Salman Khan | 2 | 387 | 41.05 |
Nick Barnes | 3 | 577 | 68.68 |