Abstract

Video data is inherently multimodal and sequential. Deep learning models therefore need to aggregate all data modalities while capturing the most relevant spatio-temporal information from a given video. This paper presents a multimodal deep learning framework for video classification using a Residual Attention-based Fusion (RAF) method. Specifically, the framework extracts spatio-temporal features from each modality using a residual attention-based bidirectional Long Short-Term Memory network, then fuses the information using a weighted Support Vector Machine to handle the imbalanced data. Experimental results on a natural disaster video dataset show that our approach improves upon the state of the art by 5% and 8% on the F1 and MAP metrics, respectively. Most remarkably, our proposed residual attention model reaches a 0.95 F1 score and a 0.92 MAP on this dataset.
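
The abstract describes a two-stage pipeline: a residual attention-based bidirectional LSTM that extracts spatio-temporal features from each modality, and a weighted SVM that fuses those features while compensating for class imbalance. Below is a minimal sketch of such a pipeline, assuming PyTorch and scikit-learn; the layer sizes, the exact residual and attention wiring, and the `class_weight='balanced'` setting are illustrative assumptions rather than details taken from the paper.

```python
# A minimal sketch, assuming PyTorch and scikit-learn. Layer sizes, the
# residual wiring, and class_weight='balanced' are illustrative assumptions,
# not details confirmed by the paper.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.svm import SVC

class ResidualAttentionBiLSTM(nn.Module):
    """Per-modality spatio-temporal feature extractor (hypothetical)."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)          # scores each time step
        self.proj = nn.Linear(input_dim, 2 * hidden_dim)  # residual skip path

    def forward(self, x):                     # x: (batch, time, input_dim)
        h, _ = self.bilstm(x)                 # (batch, time, 2*hidden_dim)
        w = F.softmax(self.attn(h), dim=1)    # attention weights over time
        attended = (w * h).sum(dim=1)         # attention-pooled summary
        return attended + self.proj(x).mean(dim=1)  # residual connection

# Toy stand-ins for two modalities, e.g. visual and audio frame features.
visual = torch.randn(64, 30, 512)             # 64 clips, 30 time steps each
audio = torch.randn(64, 30, 128)
labels = np.array([0] * 40 + [1] * 12 + [2] * 8 + [3] * 4)  # imbalanced classes

with torch.no_grad():
    f_vis = ResidualAttentionBiLSTM(512, 128)(visual).numpy()
    f_aud = ResidualAttentionBiLSTM(128, 64)(audio).numpy()

# Fusion stage: concatenate the per-modality features and train a
# class-weighted SVM; 'balanced' weights classes inversely to frequency.
clf = SVC(kernel="rbf", class_weight="balanced")
clf.fit(np.hstack([f_vis, f_aud]), labels)
```

The residual path here simply adds a projection of the raw input sequence to the attention-pooled BiLSTM summary, and the balanced class weights make misclassifying rare classes costlier, one common way to counter imbalanced training data.
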
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/CVPRW.2019.00064 | 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) |

Keywords | Field | DocType
---|---|---|
imbalanced data, natural disaster video dataset, residual attention model, video classification, video data, data modalities, relevant spatio-temporal information, multimodal deep learning framework, Residual Attention-based Fusion method, spatio-temporal features, residual attention-based bidirectional Long Short-Term Memory, weighted Support Vector Machine | Computer vision, Residual, Pattern recognition, Computer science, Fusion, Artificial intelligence | Conference

ISSN | ISBN | Citations
---|---|---|
2160-7508 | 978-1-7281-2507-7 | 0

PageRank | References | Authors
---|---|---|
0.34 | 1 | 3

Name | Order | Citations | PageRank |
---|---|---|---|
Samira Pouyanfar | 1 | 141 | 13.06 |
Tianyi Wang | 2 | 294 | 27.78 |
Shu-Ching Chen | 3 | 1978 | 182.74 |