Title |
---|
JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features. |
Abstract |
---|
Learning social media content is the basis of many real-world applications, including information retrieval and recommendation systems. In contrast with previous works that focus mainly on single-modal or bi-modal learning, we propose to learn social media content by jointly fusing textual, acoustic, and visual information (JTAV). Effective strategies, namely attBiGRU and DCRNN, are proposed to extract fine-grained features of each modality. We also introduce cross-modal fusion and attentive pooling techniques to integrate multi-modal information comprehensively. Extensive experimental evaluation conducted on real-world datasets demonstrates that our proposed model outperforms the state-of-the-art approaches by a large margin. |
Year | Venue | DocType |
---|---|---|
2018 | COLING | Conference |
Volume | Citations | PageRank |
---|---|---|
abs/1806.01483 | 0 | 0.34 |
References | Authors |
---|---|
0 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hongru Liang | 1 | 1 | 2.73 |
Haozheng Wang | 2 | 9 | 2.19 |
J. Wang | 3 | 93 | 36.24 |
Shaodi You | 4 | 123 | 20.49 |
Zhe Sun | 5 | 66 | 18.02 |
Jin-Mao Wei | 6 | 135 | 15.90 |
Zhenglu Yang | 7 | 257 | 35.45 |