Title
A Unimodal Representation Learning and Recurrent Decomposition Fusion Structure for Utterance-Level Multimodal Embedding Learning
Abstract
Learning a unified embedding for utterance-level video has attracted significant attention recently due to the rapid development of social media and its broad applications. An utterance normally contains not only spoken language but also nonverbal behaviors such as facial expressions and vocal patterns. Instead of directly learning the utterance embedding from low-level features, we first explore a high-level representation for each modality separately via a unimodal representation learning gyroscope structure. In this way, the learnt unimodal representations are more representative and contain more abstract semantic information. In the gyroscope structure, we introduce multi-scale kernel learning, 'channel expansion' and 'channel fusion' operations to explore high-level features both spatially and channelwise. Another insight of our method is that we fuse the representations of all modalities into a unified embedding by interpreting the fusion procedure as the flow of inter-modality information between modalities, which is more specialized in terms of both the information to be fused and the fusion process. Specifically, considering that each modality carries modality-specific and cross-modality interactions, we propose to decompose unimodal representations into intra- and inter-modality dynamics using a gating mechanism, and then fuse the inter-modality dynamics by passing them from previous modalities to the following one using a recurrent neural fusion architecture. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multiple benchmark datasets.
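The decomposition-then-fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no equations, so the sigmoid gate, the tanh recurrent update, the shared gate weights, and all names (`decompose`, `recurrent_fuse`, `utterance_embedding`, `W_g`, `W_in`, `W_state`) are assumptions chosen only to show one plausible reading, in which a gate splits each unimodal representation into an intra- and an inter-modality part and the inter-modality parts are passed from one modality to the next through a recurrent state.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def decompose(h, W_g, b_g):
    """Split h into (intra, inter) parts with an element-wise sigmoid gate.

    Assumption: the gated share is treated as inter-modality dynamics and
    the remainder as intra-modality dynamics, so intra + inter == h.
    """
    gate = [1.0 / (1.0 + math.exp(-(z + b))) for z, b in zip(matvec(W_g, h), b_g)]
    inter = [g * v for g, v in zip(gate, h)]
    intra = [(1.0 - g) * v for g, v in zip(gate, h)]
    return intra, inter


def recurrent_fuse(inter_parts, W_in, W_state, b):
    """Pass inter-modality parts through a plain tanh recurrence, in order."""
    state = [0.0] * len(b)
    for x in inter_parts:
        pre = [a + c + bias
               for a, c, bias in zip(matvec(W_in, x), matvec(W_state, state), b)]
        state = [math.tanh(p) for p in pre]
    return state


def utterance_embedding(reps, W_g, b_g, W_in, W_state, b):
    """Concatenate every intra-modality part with the fused inter-modality state."""
    intras, inters = [], []
    for h in reps:  # one representation per modality, e.g. language, audio, vision
        intra, inter = decompose(h, W_g, b_g)
        intras.append(intra)
        inters.append(inter)
    fused = recurrent_fuse(inters, W_in, W_state, b)
    return [v for part in intras for v in part] + fused
```

With three modalities of dimension d and a recurrent state of size k, this sketch returns a vector of length 3d + k. Sharing one gate across modalities keeps the example short; a per-modality gate would be an equally reasonable assumption.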
Year: 2022
DOI: 10.1109/TMM.2021.3082398
Venue: IEEE TRANSACTIONS ON MULTIMEDIA
Keywords: Multimodal utterance embedding, unimodal representation learning, intra- and inter-modality dynamics, recurrent decomposition fusion network
DocType: Journal
Volume: 24
ISSN: 1520-9210
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name             Order   Citations   PageRank
Sijie Mai        1       6           4.87
Haifeng Hu       2       270         60.38
Songlong Xing    3       6           2.49