Abstract | ||
---|---|---|
Recent works have been applying self-attention to various fields in computer vision and natural language processing. However, the memory and computational demands of existing self-attention operations grow quadratically with the spatiotemporal size of the input. This prohibits the application of self-attention on large inputs, e.g., long sequences, high-definition images, or large videos. To remedy this drawback, this paper proposes a novel decomposed attention (DA) module with substantially less memory and computational consumption. The resource-efficiency allows more widespread and flexible application. Empirical evaluations on object recognition demonstrated the effectiveness of these advantages. DA-augmented models achieved state-of-the-art performance for object recognition on MS-COCO 2017 and significant improvement for image classification on ImageNet. Further, the resource-efficiency of DA democratizes self-attention to fields where the prohibitively high costs have been preventing its application. The state-of-the-art result for stereo depth estimation on the Scene Flow dataset exemplified this. |
Year | Venue | DocType |
---|---|---|
2018 | arXiv: Computer Vision and Pattern Recognition | Journal |
Volume | Citations | PageRank |
abs/1812.01243 | 0 | 0.34 |
References | Authors | |
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhuoran Shen | 1 | 0 | 0.34 |
Mingyuan Zhang | 2 | 0 | 1.01 |
Shuai Yi | 3 | 167 | 14.21 |
Junjie Yan | 4 | 1288 | 58.19 |
Haiyu Zhao | 5 | 65 | 6.28 |