Title
Mobile-Former: Bridging MobileNet and Transformer.
Abstract
We present Mobile-Former, a parallel design of MobileNet and Transformer with a two-way bridge in between. This structure leverages the strength of MobileNet at local processing and of the transformer at global interaction, and the bridge enables bidirectional fusion of local and global features. Unlike recent work on vision transformers, the transformer in Mobile-Former contains very few tokens (e.g., fewer than 6) that are randomly initialized, resulting in low computational cost. Combined with the proposed lightweight cross attention that models the bridge, Mobile-Former is not only computationally efficient but also has more representation power, outperforming MobileNetV3 in the low-FLOP regime from 25M to 500M FLOPs on ImageNet classification. For instance, it achieves 77.9% top-1 accuracy at 294M FLOPs, gaining 1.3% over MobileNetV3 while saving 17% of the computation. When transferred to object detection, Mobile-Former outperforms MobileNetV3 by 8.6 AP.
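To make the two-way bridge concrete, here is a minimal NumPy sketch of cross attention between a handful of global tokens and a flattened local feature map. It is an illustration under stated assumptions, not the authors' implementation: the single-head form, the residual connections, the choice to project only the token side, and all names (`local_to_global`, `global_to_local`, `w_q`, `w_o`, `w_k`, `w_v`) and sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_to_global(tokens, feats, w_q, w_o):
    """Tokens query the local feature map (assumed single-head form).

    tokens: (M, d) global tokens, feats: (N, C) flattened feature map,
    w_q: (d, C) query projection, w_o: (C, d) output projection.
    """
    q = tokens @ w_q                                       # (M, C)
    attn = softmax(q @ feats.T / np.sqrt(feats.shape[1]))  # (M, N)
    return tokens + (attn @ feats) @ w_o                   # (M, d), residual update

def global_to_local(feats, tokens, w_k, w_v):
    """Local features query the global tokens (assumed single-head form).

    w_k, w_v: (d, C) key and value projections applied to the tokens.
    """
    k = tokens @ w_k                                       # (M, C)
    v = tokens @ w_v                                       # (M, C)
    attn = softmax(feats @ k.T / np.sqrt(feats.shape[1]))  # (N, M)
    return feats + attn @ v                                # (N, C), residual update

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, d, N, C = 4, 32, 49, 16  # hypothetical sizes: 4 tokens, 7x7 feature map
    tokens = rng.standard_normal((M, d))  # randomly initialized tokens
    feats = rng.standard_normal((N, C))
    w_q, w_k, w_v = (rng.standard_normal((d, C)) * 0.1 for _ in range(3))
    w_o = rng.standard_normal((C, d)) * 0.1

    new_tokens = local_to_global(tokens, feats, w_q, w_o)
    new_feats = global_to_local(feats, new_tokens, w_k, w_v)
    print(new_tokens.shape, new_feats.shape)
```

In this sketch each attention matrix has only M x N entries, so with M fixed at a few tokens the bridge cost grows linearly with the number of spatial positions N, rather than quadratically as in self-attention over all positions.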
Year
2022
DOI
10.1109/CVPR52688.2022.00520
Venue
IEEE Conference on Computer Vision and Pattern Recognition
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
7
Name            Order  Citations  PageRank
Yinpeng Chen    1      186        23.77
Xiyang Dai      2      25         6.88
Dongdong Chen   3      52         19.10
Mengchen Liu    4      426        16.26
X. Dong         5      33         8.20
Lu Yuan         6      801        48.29
Zicheng Liu     7      3662       199.64