RegionCLIP: Region-based Language-Image Pretraining | 0 | 0.34 | 2022 |
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning | 0 | 0.34 | 2022 |
How Much Can CLIP Benefit Vision-and-Language Tasks? | 0 | 0.34 | 2022 |
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning | 0 | 0.34 | 2021 |
Efficient Contextual Representation Learning With Continuous Outputs | 0 | 0.34 | 2019 |
Efficient Contextual Representation Learning Without Softmax Layer | 0 | 0.34 | 2019 |