Title
A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation.
Abstract
This work presents a simple vision transformer design as a strong baseline for object localization and instance segmentation tasks. Transformers have recently demonstrated competitive performance in image classification. To adapt ViT to object detection and dense prediction tasks, many works inherit the multistage design from convolutional networks and highly customized ViT architectures. The goal behind this design is a better trade-off between computational cost and effective aggregation of multiscale global contexts. However, existing works adopt the multistage architectural design as a black-box solution without a clear understanding of its true benefits. In this paper, we comprehensively study three architecture design choices on ViT (spatial reduction, doubled channels, and multiscale features) and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy. We further complete a scaling rule to optimize our model’s trade-off between accuracy and computation cost or model size. By leveraging a constant feature resolution and hidden size throughout the encoder blocks, we propose a simple and compact ViT architecture called Universal Vision Transformer (UViT) that achieves strong performance on the COCO object detection and instance segmentation benchmarks. Our code is available at https://github.com/tensorflow/models/tree/master/official/projects/uvit.
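The contrast the abstract draws between a multistage backbone (spatial reduction plus doubled channels) and a single-scale encoder (constant feature resolution and hidden size) can be illustrated with simple shape arithmetic. The sketch below is only an illustration: the function names, image size, patch sizes, and channel widths are hypothetical and are not taken from the paper or from the released UViT code.

```python
def multistage_shapes(image_size, patch_size, base_dim, num_stages):
    """Return (height, width, channels) of the feature map at each stage
    of a pyramid-style backbone. Illustrative only."""
    side = image_size // patch_size
    dim = base_dim
    shapes = []
    for _ in range(num_stages):
        shapes.append((side, side, dim))
        side //= 2  # spatial reduction between stages
        dim *= 2    # doubled channels between stages
    return shapes


def single_scale_shapes(image_size, patch_size, hidden_dim, depth):
    """Return (height, width, channels) after each encoder block when the
    resolution and hidden size stay constant. Illustrative only."""
    side = image_size // patch_size
    return [(side, side, hidden_dim)] * depth


if __name__ == "__main__":
    # Hypothetical settings, chosen only to make the arithmetic easy to follow.
    print(multistage_shapes(896, 4, 96, 4))
    print(single_scale_shapes(896, 8, 384, 4))
```

With these assumed settings, the multistage variant produces a different feature-map shape at every stage, while the single-scale variant returns the same shape for every block.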
Year
2022
DOI
10.1007/978-3-031-20080-9_41
Venue
European Conference on Computer Vision
Keywords
Vision transformer, Self-attention, Object detection, Instance segmentation
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
11
Name              Order   Citations   PageRank
Wuyang Chen       1       6           4.11
Xianzhi Du        2       46          4.20
Fan Yang          3       113         4.53
Lucas Beyer       4       232         13.50
Xiaohua Zhai      5       209         13.00
Tsung-Yi Lin      6       2957        111.64
Huizhong Chen     7       0           0.34
Jing Li           8       0           0.34
Xiaodan Song      9       733         54.42
Zhangyang Wang    10      437         75.27
Denny Zhou        11      0           0.68