Title
JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features.
Abstract
Learning representations of social media content underpins many real-world applications, such as information retrieval and recommendation systems. Unlike previous work, which focuses mainly on single-modal or bi-modal learning, we propose to learn social media content representations by jointly fusing textual, acoustic, and visual information (JTAV). We propose effective strategies, namely attBiGRU and DCRNN, to extract fine-grained features of each modality. We also introduce cross-modal fusion and attentive pooling techniques to integrate the multi-modal information comprehensively. Extensive experiments on real-world datasets demonstrate that the proposed model outperforms state-of-the-art approaches by a large margin.
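The abstract names cross-modal fusion and attentive pooling but does not define them. The following is a generic sketch of those two ideas under stated assumptions. It is not JTAV's actual formulation. It assumes each modality has already been encoded into a fixed-size vector of the same dimension, standing in for the attBiGRU/DCRNN outputs. It then uses pairwise element-wise products as a hypothetical fusion step and pools the resulting vectors with softmax weights computed against a hypothetical learned query vector. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def elementwise_product(a: Vector, b: Vector) -> Vector:
    return [x * y for x, y in zip(a, b)]


def cross_modal_fusion(text: Vector, audio: Vector, visual: Vector) -> List[Vector]:
    """Hypothetical fusion: keep each unimodal vector and add the element-wise
    product of every modality pair, so pairwise interactions become explicit."""
    return [
        text,
        audio,
        visual,
        elementwise_product(text, audio),
        elementwise_product(text, visual),
        elementwise_product(audio, visual),
    ]


def attentive_pooling(vectors: List[Vector], query: Vector) -> Vector:
    """Score each vector by its dot product with a query (assumed to be learned),
    normalize the scores with softmax, and return the weighted sum."""
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


if __name__ == "__main__":
    # Toy 2-dimensional "modality encodings" (made-up numbers for illustration).
    fused = cross_modal_fusion([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
    pooled = attentive_pooling(fused, query=[0.5, 0.5])
    print(pooled)
```

With an all-zero query, every vector receives the same weight, so attentive pooling reduces to mean pooling. A query that aligns strongly with one vector concentrates almost all of the weight on it. This is what lets the pooled representation emphasize the most informative modality or modality pair.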
Year
2018
Venue
COLING
DocType
Conference
Volume
abs/1806.01483
Citations
0
PageRank
0.34
References
0
Authors
7
Name | Order | Citations | PageRank
Hongru Liang | 1 | 1 | 2.73
Haozheng Wang | 2 | 9 | 2.19
J. Wang | 3 | 93 | 36.24
Shaodi You | 4 | 123 | 20.49
Zhe Sun | 5 | 66 | 18.02
Jin-Mao Wei | 6 | 135 | 15.90
Zhenglu Yang | 7 | 257 | 35.45