Mobile-Former: Bridging MobileNet and Transformer. - Citegraph

Paper Info

Title
Mobile-Former: Bridging MobileNet and Transformer.

Abstract
We present Mobile-Former, a parallel design of MobileNet and Transformer with a two-way bridge in between. This structure leverages the advantages of MobileNet at local processing and of the transformer at global interaction, and the bridge enables bidirectional fusion of local and global features. Unlike recent work on vision transformers, the transformer in Mobile-Former contains very few tokens (e.g. fewer than 6) that are randomly initialized, resulting in low computational cost. Combined with the proposed light-weight cross attention that models the bridge, Mobile-Former is not only computationally efficient but also has more representation power, outperforming MobileNetV3 in the low-FLOP regime from 25M to 500M FLOPs on ImageNet classification. For instance, it achieves 77.9% top-1 accuracy at 294M FLOPs, gaining 1.3% over MobileNetV3 while saving 17% of computations. When transferred to object detection, Mobile-Former outperforms MobileNetV3 by 8.6 AP.
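
The local-to-global half of the bridge described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single attention head, that "light-weight" means only the token side is projected (the feature-map positions serve directly as keys and values), and a residual update of the tokens. The names `cross_attention`, `w_q`, `w_o` and all sizes are hypothetical, chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a length-n vector by an n x m matrix, giving a length-m vector."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def cross_attention(tokens, features, w_q, w_o):
    """Fuse local features into global tokens (single head, residual update).

    tokens:   list of token vectors, each of length token_dim
    features: list of per-position feature vectors, each of length channels
    w_q:      token_dim x channels query projection (assumed)
    w_o:      channels x token_dim output projection (assumed)
    """
    channels = len(features[0])
    updated = []
    for tok in tokens:
        query = matvec(tok, w_q)
        # Keys and values are the raw features: no projection on the local side.
        scores = [
            sum(q * f for q, f in zip(query, feat)) / math.sqrt(channels)
            for feat in features
        ]
        weights = softmax(scores)
        context = [
            sum(w * feat[j] for w, feat in zip(weights, features))
            for j in range(channels)
        ]
        delta = matvec(context, w_o)
        updated.append([t + d for t, d in zip(tok, delta)])
    return updated


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned or initialized values."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_tokens, token_dim = 4, 8       # a handful of randomly initialized tokens
num_positions, channels = 49, 4    # e.g. a flattened 7x7 feature map

tokens = random_matrix(num_tokens, token_dim, rng)
features = random_matrix(num_positions, channels, rng)
w_q = random_matrix(token_dim, channels, rng)
w_o = random_matrix(channels, token_dim, rng)

fused = cross_attention(tokens, features, w_q, w_o)
```

In this sketch the attention step performs on the order of num_tokens x num_positions x channels multiplications, so with only a few tokens it grows linearly with the number of feature-map positions rather than quadratically, which is consistent with the low cost the abstract attributes to the small token count.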

Year	DOI	Venue
2022	10.1109/CVPR52688.2022.00520	IEEE Conference on Computer Vision and Pattern Recognition
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	7

Authors (7 rows)

Name	Order	Citations	PageRank
Yinpeng Chen	1	186	23.77
Xiyang Dai	2	25	6.88
Dongdong Chen	3	52	19.10
Mengchen Liu	4	426	16.26
X. Dong	5	33	8.20
Lu Yuan	6	801	48.29
Zicheng Liu	7	3662	199.64
