Deep Coordinated Textual And Visual Network For Sentiment-Oriented Cross-Modal Retrieval - Citegraph

Paper Info

Title
Deep Coordinated Textual And Visual Network For Sentiment-Oriented Cross-Modal Retrieval

Abstract
Cross-modal retrieval, which lets people efficiently retrieve desired information from large amounts of multimedia data, has attracted growing attention. Most cross-modal retrieval methods focus only on aligning the objects in images and text, yet sentiment alignment is also essential for applications such as entertainment and advertising. This paper studies the problem of retrieving visual sentiment concepts, with the goal of extracting sentiment-oriented information from social multimedia content, i.e., sentiment-oriented cross-media retrieval. The problem is inherently challenging because adjectives such as "sad" and "awesome" are subjective and ambiguous. We therefore model visual sentiment concepts with adjective-noun pairs, e.g., "sad dog" and "awesome flower", where associating adjectives with concrete objects makes the concepts more tractable. This paper proposes a deep coordinated textual and visual network with two branches to learn a joint semantic embedding space for both images and texts. The visual branch is based on a convolutional neural network (CNN) pre-trained on a large dataset and is optimized with a classification loss. The textual branch is added on the fully-connected layer and provides supervision of the textual semantic space. To learn a coordinated representation across modalities, a multi-task loss function is optimized during end-to-end training. We have conducted extensive experiments on a subset of the large-scale VSO dataset. The results show that the proposed model is able to retrieve sentiment-oriented data and performs favorably against state-of-the-art methods.
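
The abstract describes a two-branch model trained with a multi-task loss and used for retrieval in a joint embedding space, but it does not give the exact formulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the multi-task loss is a weighted sum of a softmax cross-entropy term (visual branch) and a cosine-distance term that pulls the image embedding toward the text embedding of its adjective-noun pair (textual branch). The weight `lam`, the choice of cosine distance, and all function names are hypothetical and chosen only for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against the integer class `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 when u and v point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def multi_task_loss(logits, label, visual_embedding, text_embedding, lam=1.0):
    """Classification loss plus `lam` times an embedding-alignment loss.

    Assumption: the weighted sum and the cosine alignment term are
    illustrative; the abstract does not specify the paper's loss.
    """
    cls = softmax_cross_entropy(logits, label)
    align = cosine_distance(visual_embedding, text_embedding)
    return cls + lam * align


def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to `query`."""
    return sorted(range(len(candidates)),
                  key=lambda i: cosine_distance(query, candidates[i]))


# Toy usage: 3 adjective-noun-pair classes and a 2-D joint space.
loss = multi_task_loss([2.0, 0.5, -1.0], 0, [1.0, 0.2], [0.9, 0.1], lam=0.5)
order = rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
```

In this sketch, retrieval amounts to embedding a query from one modality and ranking items from the other modality by distance in the shared space; a real system would compute the embeddings with the trained network branches rather than hand-written vectors.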

Year	DOI	Venue
2018	10.1007/978-3-319-97304-3_52	PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I
Keywords	Field	DocType
Cross-modal retrieval, Visual sentiment analysis, Convolutional neural network	Modalities, Embedding, Information retrieval, Convolutional neural network, Computer science, Entertainment, Artificial intelligence, Social multimedia, Ambiguity, Machine learning, Modal, Semantic space	Conference
Volume	ISSN	Citations
11012	0302-9743	0
PageRank	References	Authors
0.34	25	5

Authors (5 rows)

Name	Order	Citations	PageRank
Jiamei Fu	1	3	1.06
Dongyu She	2	43	4.65
Xingxu Yao	3	6	2.09
Yuxiang Zhang	4	167	15.28
Jufeng Yang	5	78	12.04
