JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features. - Citegraph

Paper Info

Title
JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features.

Abstract
Learning social media content is the basis of many real-world applications, including information retrieval and recommendation systems. In contrast with previous works that focus mainly on single-modal or bi-modal learning, we propose to learn social media content by jointly fusing textual, acoustic, and visual information (JTAV). We propose effective strategies, namely attBiGRU and DCRNN, to extract fine-grained features of each modality. We also introduce cross-modal fusion and attentive pooling techniques to integrate multi-modal information comprehensively. Extensive experimental evaluation conducted on real-world datasets demonstrates that our proposed model outperforms the state-of-the-art approaches by a large margin.
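
The abstract names attentive pooling as one way of integrating the modalities but does not spell it out. As a rough illustration only, the sketch below shows a generic dot-product attention pooling over three per-modality feature vectors. It is not taken from the paper: the function names `softmax` and `attentive_pool`, the `query` vector, and the toy feature values are all assumptions made for this example, and the paper's actual formulation may differ.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_pool(features, query):
    # Hypothetical helper: score each modality vector by its dot product
    # with a query vector, normalise the scores with softmax, and return
    # the weighted sum of the modality vectors together with the weights.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, features))
              for i in range(dim)]
    return pooled, weights

# Toy, made-up feature vectors standing in for the three modalities.
textual = [0.2, 0.1, 0.4]
acoustic = [0.5, 0.3, 0.1]
visual = [0.1, 0.6, 0.2]
query = [1.0, 0.0, 0.0]  # assumed query; in practice this would be learned

pooled, weights = attentive_pool([textual, acoustic, visual], query)
```

With this query, the acoustic vector has the largest dot product and therefore receives the largest weight. A zero query would give every modality an equal weight, reducing the pooling to a plain average.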

Year	Venue	DocType	Volume	Citations	PageRank	References	Authors
2018	COLING	Conference	abs/1806.01483	0	0.34	0	7

Authors (7 rows)

Name	Order	Citations	PageRank
Hongru Liang	1	1	2.73
Haozheng Wang	2	9	2.19
J. Wang	3	93	36.24
Shaodi You	4	123	20.49
Zhe Sun	5	66	18.02
Jin-Mao Wei	6	135	15.90
Zhenglu Yang	7	257	35.45
