VA-RED2: Video Adaptive Redundancy Reduction - Citegraph

Paper Info

Title
VA-RED2: Video Adaptive Redundancy Reduction

Abstract
Performing inference on deep learning models for videos remains a challenge due to the large amount of computational resources required to achieve robust recognition. An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both. The type of redundant features depends on the dynamics and type of events in the video: static videos have more temporal redundancy while videos focusing on objects tend to have more channel redundancy. Here we present a redundancy reduction framework, termed VA-RED2, which is input-dependent. Specifically, our VA-RED2 framework uses an input-dependent policy to decide how many features need to be computed for temporal and channel dimensions. To keep the capacity of the original model, after fully computing the necessary features, we reconstruct the remaining redundant features from those using cheap linear operations. We learn the adaptive policy jointly with the network weights in a differentiable way with a shared-weight mechanism, making it highly efficient. Extensive experiments on multiple video datasets and different visual tasks show that our framework achieves 20%−40% reduction in computation (FLOPs) when compared to state-of-the-art methods without any performance loss. Project page: http://people.csail.mit.edu/bpan/va-red/.
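
The abstract's core idea (an input-dependent policy picks how many features to compute fully, and the rest are reconstructed with cheap linear operations) can be illustrated with a minimal NumPy sketch for the channel dimension only. This is not the authors' implementation. The function names, the candidate ratios (1, 2, 4), the argmax policy, and the per-channel scale-and-shift reconstruction are all assumptions made for illustration, and the output channel count is assumed to be divisible by every candidate ratio.

```python
import numpy as np

def choose_ratio(x, w_policy, ratios=(1, 2, 4)):
    """Hypothetical policy: pool the input over time, score each ratio, take the argmax."""
    pooled = x.mean(axis=1)        # (C_in,)
    logits = w_policy @ pooled     # (len(ratios),)
    return ratios[int(np.argmax(logits))]

def adaptive_layer(x, w_full, scale, shift, w_policy):
    """x: (C_in, T) features. Returns (C_out, T) features and the chosen ratio.

    Only C_out / r channels are computed with the full weights; the remaining
    channels are rebuilt from them with a per-channel affine map (an assumed
    stand-in for the "cheap linear operations" mentioned in the abstract).
    """
    c_out = w_full.shape[0]
    r = choose_ratio(x, w_policy)
    k = c_out // r                 # number of fully computed channels
    computed = w_full[:k] @ x      # the same weight matrix is shared across ratios
    if k == c_out:
        return computed, r
    source = np.tile(computed, (r - 1, 1))              # (C_out - k, T)
    cheap = scale[k:, None] * source + shift[k:, None]  # cheap reconstruction
    return np.concatenate([computed, cheap], axis=0), r
```

In this sketch the policy is a hard argmax, which suits inference only. The abstract states that the policy is learned jointly with the network weights in a differentiable way, which a training-time version would need to replace the argmax with.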

Year	Venue	DocType
2021	ICLR	Conference
Citations	PageRank	References	Authors
0	0.34	0	9

Authors (9 rows)

Name	Order	Citations	PageRank
Bowen Pan	1	0	2.03
Rameswar Panda	2	85	14.02
Camilo Luciano Fosco	3	0	0.34
Chung-Ching Lin	4	45	9.19
Alex Andonian	5	6	2.14
Yue Meng	6	8	2.20
Kate Saenko	7	4478	202.48
Aude Oliva	8	5121	298.19
Rogério Feris	9	1529	89.95
