Title
Multimodal Representation of Advertisements Using Segment-level Autoencoders
Abstract
Automatic analysis of advertisements (ads) poses an interesting problem for learning multimodal representations. A promising direction of research is the development of deep neural network autoencoders to obtain inter-modal and intra-modal representations. In this work, we propose a system to obtain segment-level unimodal and joint representations. These features are concatenated, then averaged across the duration of an ad to obtain a single multimodal representation. The autoencoders are trained on segments generated by time-aligning frames between the audio and video modalities with forward and backward context. To assess the multimodal representations, we consider the tasks of classifying an ad as funny or exciting in a publicly available dataset of 2,720 ads. For this purpose, we train the segment-level autoencoders on a larger, unlabeled dataset of 9,740 ads, agnostic of the test set. Our experiments show that: 1) the multimodal representations outperform joint and unimodal representations, 2) the different representations we learn are complementary to each other, and 3) the segment-level multimodal representations perform better than classical autoencoders and cross-modal representations, within the context of the two classification tasks. We obtain an improvement of about 5% in classification accuracy over a competitive baseline.
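The abstract describes segment-level autoencoders that encode time-aligned audio and video features into unimodal and joint codes, which are concatenated per segment and averaged over the ad to produce one multimodal vector. Below is a minimal PyTorch sketch of that idea; all layer sizes, dimensions, and names (SegmentAutoencoder, ad_level_representation) are illustrative assumptions, not the authors' actual architecture.

```python
# Sketch only: hypothetical dimensions and layers, not the paper's configuration.
import torch
import torch.nn as nn

class SegmentAutoencoder(nn.Module):
    def __init__(self, audio_dim=128, video_dim=2048, hidden_dim=256, joint_dim=64):
        super().__init__()
        # Unimodal (intra-modal) encoders.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden_dim), nn.ReLU())
        # Joint (inter-modal) bottleneck over the concatenated unimodal codes.
        self.joint = nn.Linear(2 * hidden_dim, joint_dim)
        # Decoders reconstruct each modality from the joint code.
        self.audio_dec = nn.Linear(joint_dim, audio_dim)
        self.video_dec = nn.Linear(joint_dim, video_dim)

    def forward(self, audio, video):
        a = self.audio_enc(audio)                   # unimodal audio code
        v = self.video_enc(video)                   # unimodal video code
        j = self.joint(torch.cat([a, v], dim=-1))   # joint code
        # Training (not shown) would minimize reconstruction losses, e.g.
        # MSE(audio_dec(j), audio) + MSE(video_dec(j), video).
        return a, v, j, self.audio_dec(j), self.video_dec(j)

def ad_level_representation(model, audio_segments, video_segments):
    """Concatenate unimodal and joint codes per segment, then average
    across the ad's segments to get a single multimodal vector."""
    with torch.no_grad():
        a, v, j, _, _ = model(audio_segments, video_segments)
        per_segment = torch.cat([a, v, j], dim=-1)  # (num_segments, feat_dim)
        return per_segment.mean(dim=0)              # one vector per ad
```

Each row of audio_segments/video_segments would correspond to one time-aligned segment (built with forward and backward context, per the abstract), so the mean pools a variable number of segments into a fixed-size ad representation.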
Year
2018
DOI
10.1145/3242969.3243026
Venue
ICMI
Keywords
multimodal joint representation, autoencoders, advertisements
Field
Modalities, Advertising, Computer science, Concatenation, Artificial neural network, Test set
DocType
Conference
ISBN
978-1-4503-5692-3
Citations
0
PageRank
0.34
References
14
Authors
4
Name                 Order  Citations  PageRank
Krishna S.           1      9          8.31
Victor R. Martinez   2      3          4.78
Naveen Kumar         3      20         4.24
Shrikanth Narayanan  4      5558       439.23