Title
Long-Short Term Cross-Transformer in Compressed Domain for Few-Shot Video Classification.
Abstract
Compared with image few-shot learning, most existing few-shot video classification methods perform worse on feature matching, because they fail to sufficiently exploit temporal information and relations. Specifically, frames are usually evenly sampled, which may miss important frames. On the other hand, the heuristic model simply encodes the equally treated frames in sequence, which results in a lack of both long-term and short-term temporal modeling and interaction. To alleviate these limitations, we take advantage of compressed domain knowledge and propose a long-short term Cross-Transformer (LSTC) for few-shot video classification. In the short term, the motion vector (MV) contains temporal cues and reflects the importance of each frame. In the long term, a video can be natively divided into a sequence of GOPs (Groups of Pictures). Using this compressed domain knowledge helps to obtain a more accurate spatial-temporal feature space. Consequently, we design the long-short term selection module, short-term module, and long-term module to comprise the LSTC. Long-short term selection is performed to select informative compressed domain data. Long/short-term modules are utilized to sufficiently exploit the temporal information so that the query and support can be well-matched by cross-attention. Experimental results show the superiority of our method on various datasets.
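The idea of using motion vectors to pick informative frames within each GOP can be illustrated with a small sketch. This is not the authors' selection module: the function name `select_frames_by_motion`, the per-frame score `mv_energy` (for example, a mean motion-vector magnitude), and the fixed `gop_size` are all assumptions made for illustration only.

```python
from typing import List


def select_frames_by_motion(mv_energy: List[float], gop_size: int, k: int) -> List[int]:
    """Pick the k highest-motion frames inside each GOP.

    mv_energy[i] is a hypothetical per-frame score, e.g. the mean
    motion-vector magnitude of frame i. A fixed GOP length is assumed.
    Returned indices are in temporal order.
    """
    if gop_size <= 0 or k <= 0:
        raise ValueError("gop_size and k must be positive")
    selected: List[int] = []
    for start in range(0, len(mv_energy), gop_size):
        gop = range(start, min(start + gop_size, len(mv_energy)))
        # Rank the frames of this GOP by motion score, highest first.
        ranked = sorted(gop, key=lambda i: -mv_energy[i])
        # Keep the top k and restore temporal order.
        selected.extend(sorted(ranked[:k]))
    return selected


scores = [0.1, 0.9, 0.3, 0.2, 0.05, 0.4, 0.8, 0.7]
chosen = select_frames_by_motion(scores, gop_size=4, k=2)
print(chosen)
```

In contrast to even sampling, a selection of this kind keeps at least some frames from every GOP while favoring those with the most motion. How the scores are computed and how many frames are kept are choices the abstract does not specify.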
Year
2022
DOI
10.24963/ijcai.2022/174
Venue
European Conference on Artificial Intelligence
Keywords
Computer Vision: Recognition (object detection, categorization), Computer Vision: Video analysis and understanding, Machine Learning: Few-shot learning
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
6
Name          Order  Citations  PageRank
Wenyang Luo   1      0          0.68
Yufan Liu     2      15         3.93
Bing Li       3      217        60.28
Weiming Hu    4      5300       261.38
Yanan Miao    5      0          0.34
Li Yangxi     6      34         5.75