Title
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
Abstract
Vision Transformers (ViTs) have achieved superior performance on computer vision tasks compared to convolutional neural network (CNN)-based models. However, ViTs are mainly designed for image classification and generate single-scale low-resolution representations, which makes dense prediction tasks such as semantic segmentation challenging. We therefore propose HRViT, which enhances ViTs to learn semantically rich and spatially precise multi-scale representations by integrating high-resolution multi-branch architectures with ViTs. We balance the model performance and efficiency of HRViT through various branch-block co-optimization techniques: we explore heterogeneous branch designs, reduce redundancy in the linear layers, and augment the attention block for enhanced expressiveness. These techniques push the Pareto frontier of performance and efficiency on semantic segmentation to a new level, as our evaluation results on ADE20K and Cityscapes show. HRViT achieves 50.20% mIoU on ADE20K and 83.16% mIoU on Cityscapes, surpassing state-of-the-art MiT and CSWin backbones with an average +1.78 mIoU improvement, 28% parameter saving, and 21% FLOPs reduction, demonstrating the potential of HRViT as a strong vision backbone for semantic segmentation. Our code is publicly available at https://github.com/facebookresearch/HRViT.
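The core idea described in the abstract, maintaining parallel branches at multiple resolutions and repeatedly exchanging features across them, can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the helper names (`avg_pool_2x`, `nearest_up_2x`, `fuse`) are hypothetical, feature maps are represented as plain 2D lists of floats, and HRViT's actual cross-resolution fusion uses learned projections and attention blocks rather than plain addition.

```python
# Illustrative sketch (assumed, simplified) of HRNet-style cross-resolution
# fusion between a high-resolution and a low-resolution branch.

def avg_pool_2x(fm):
    """Downsample a 2D feature map by 2x average pooling."""
    h, w = len(fm), len(fm[0])
    return [[(fm[i][j] + fm[i][j + 1] + fm[i + 1][j] + fm[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def nearest_up_2x(fm):
    """Upsample a 2D feature map by 2x nearest-neighbor replication."""
    out = []
    for row in fm:
        up_row = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(up_row)
        out.append(list(up_row))                   # duplicate each row
    return out

def fuse(high, low):
    """Each branch absorbs the other branch's resampled features by
    element-wise addition (a simplification; HRViT uses learned fusion)."""
    new_high = [[h + u for h, u in zip(h_row, u_row)]
                for h_row, u_row in zip(high, nearest_up_2x(low))]
    new_low = [[l + d for l, d in zip(l_row, d_row)]
               for l_row, d_row in zip(low, avg_pool_2x(high))]
    return new_high, new_low

# Example: a 4x4 high-res branch fused with a 2x2 low-res branch.
high = [[1.0] * 4 for _ in range(4)]
low = [[2.0] * 2 for _ in range(2)]
new_high, new_low = fuse(high, low)  # both branches keep their resolutions
```

The key property the sketch demonstrates is that fusion preserves each branch's resolution while injecting information from the other scale, which is what lets the architecture stay both semantically rich (low-res branch) and spatially precise (high-res branch).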
Year
2022
DOI
10.1109/CVPR52688.2022.01178
Venue
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Keywords
Deep learning architectures and techniques, Efficient learning and inferences, Representation learning, Segmentation, grouping and shape analysis
DocType
Conference
Volume
2022
Issue
1
Citations
0
PageRank
0.34
References
0
Authors
9
Name, Order
Jiaqi Gu, 1
H Kwon, 2
Dongming Wang, 3
W Ye, 4
M Li, 5
Huan Chen, 6
Luhua Lai, 7
Vikas Chandra, 8
David Z. Pan, 9