Title
Residual Attention-Based Fusion for Video Classification
Abstract
Video data is inherently multimodal and sequential; deep learning models therefore need to aggregate all data modalities while capturing the most relevant spatio-temporal information from a given video. This paper presents a multimodal deep learning framework for video classification built on a Residual Attention-based Fusion (RAF) method. Specifically, the framework extracts spatio-temporal features from each modality using a residual attention-based bidirectional Long Short-Term Memory network and fuses the resulting information with a weighted Support Vector Machine to handle imbalanced data. Experimental results on a natural disaster video dataset show that our approach improves upon the state-of-the-art by 5% and 8% on the F1 and MAP metrics, respectively. Most notably, the proposed residual attention model reaches a 0.95 F1-score and a 0.92 MAP on this dataset.
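The abstract outlines a two-stage pipeline: a per-modality residual attention-based bidirectional LSTM encoder, followed by fusion through a weighted Support Vector Machine. Since this record gives no implementation details, the following is a minimal sketch of that idea, assuming PyTorch for the encoder and scikit-learn for the SVM; the hidden size, the additive form of the attention, the mean-pooled residual path, and concatenation-based fusion are all illustrative assumptions, not the authors' actual design.

```python
# Hypothetical RAF-style pipeline sketch; all architectural details below
# are assumptions, since the record contains no implementation.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC


class ResidualAttentionBiLSTM(nn.Module):
    """Per-modality encoder: BiLSTM with additive temporal attention and a
    residual skip path (mean-pooled BiLSTM states) around the attention."""

    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # one score per time step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim) feature sequence for one modality
        h, _ = self.bilstm(x)                         # (batch, time, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time
        attended = (weights * h).sum(dim=1)           # (batch, 2*hidden)
        residual = h.mean(dim=1)                      # skip path around attention
        return attended + residual                    # residual attention feature


def fuse_and_classify(per_modality_feats, labels):
    """Concatenate per-modality features and fit a class-weighted SVM."""
    fused = np.concatenate(per_modality_feats, axis=1)
    clf = SVC(kernel="rbf", class_weight="balanced")
    return clf.fit(fused, labels)
```

Here `class_weight="balanced"` reweights each class inversely to its frequency, which is one standard way to realize the "weighted Support Vector Machine for imbalanced data" mentioned in the abstract.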
Year
2019
DOI
10.1109/CVPRW.2019.00064
Venue
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Keywords
imbalanced data, natural disaster video dataset, residual attention model, video classification, video data, data modalities, relevant spatio-temporal information, multimodal deep learning framework, Residual Attention-based Fusion method, spatio-temporal features, residual attention-based bidirectional Long Short-Term Memory, weighted Support Vector Machine
Field
Computer vision, Residual, Pattern recognition, Computer science, Fusion, Artificial intelligence
DocType
Conference
ISSN
2160-7508
ISBN
978-1-7281-2507-7
Citations
0
PageRank
0.34
References
1
Authors
3
Name              Order  Citations  PageRank
Samira Pouyanfar    1        141      13.06
Tianyi Wang         2        294      27.78
Shu-Ching Chen      3       1978     182.74