Abstract
---
Combining simple architectures with large-scale pre-training has led to massive improvements in image classification. For object detection, pre-training and scaling approaches are less well established, especially in the long-tailed and open-vocabulary setting, where training data is relatively scarce. In this paper, we propose a strong recipe for transferring image-text models to open-vocabulary object detection. We use a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning. Our analysis of the scaling properties of this setup shows that increasing image-level pre-training and model size yield consistent improvements on the downstream detection task. We provide the adaptation strategies and regularizations needed to attain very strong performance on zero-shot text-conditioned and one-shot image-conditioned object detection. Code and models are available on GitHub at github.com/google-research/scenic/tree/main/scenic/projects/owl_vit.
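The abstract describes zero-shot, text-conditioned detection: free-text class names are embedded with the contrastively pre-trained text encoder and used as queries at inference time, so no detection labels are needed for new categories. The following is a minimal sketch of that usage pattern via the Hugging Face port of OWL-ViT rather than the authors' Scenic release linked above; the checkpoint name, image path, query strings, and score threshold are illustrative assumptions, not values from the paper.

```python
# Sketch: zero-shot text-conditioned detection with the Hugging Face OWL-ViT port.
# The image path and text queries below are placeholders for illustration.
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("example.jpg")                      # placeholder image path
queries = [["a photo of a cat", "a photo of a dog"]]   # free-text class queries

# Encode image and queries together; each predicted box is scored against
# every text embedding (open-vocabulary classification).
inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw outputs to per-image boxes, scores, and label indices in pixels.
target_sizes = torch.tensor([image.size[::-1]])        # (height, width)
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.1, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"{queries[0][label]}: {score:.2f} at {box.tolist()}")
```

One-shot image-conditioned detection mentioned in the abstract works analogously, with a query image embedding taking the place of the text embedding.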
Year | DOI | Venue
---|---|---
2022 | 10.1007/978-3-031-20080-9_42 | European Conference on Computer Vision

Keywords | DocType | Citations
---|---|---
Open-vocabulary detection, Transformer, Vision transformer, Zero-shot detection, Image-conditioned detection, One-shot object detection, Contrastive learning, Image-text models, Foundation models, CLIP | Conference | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 14

Name | Order | Citations | PageRank
---|---|---|---
Matthias Minderer | 1 | 0 | 0.68
Alexey Gritsenko | 2 | 0 | 0.34
Austin Stone | 3 | 0 | 0.34
Maxim Neumann | 4 | 0 | 0.34
Dirk Weissenborn | 5 | 0 | 0.34
Alexey Dosovitskiy | 6 | 1797 | 80.48
Aravindh Mahendran | 7 | 0 | 0.34
Anurag Arnab | 8 | 0 | 0.34
Mostafa Dehghani | 9 | 0 | 0.34
Zhuoran Shen | 10 | 0 | 0.34
Xiao Wang | 11 | 0 | 0.34
Xiaohua Zhai | 12 | 209 | 13.00
Thomas Kipf | 13 | 0 | 0.34
Neil Houlsby | 14 | 153 | 14.73