**Abstract**

We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.
| Year | Venue | DocType |
|---|---|---|
| 2020 | ACL | Conference |

| Volume | Citations | PageRank |
|---|---|---|
| 2020.acl-main | 0 | 0.34 |
| References | Authors |
|---|---|
| 0 | 7 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Manling Li | 1 | 8 | 7.89 |
| Alireza Zareian | 2 | 7 | 4.20 |
| Qi Zeng | 3 | 8 | 3.20 |
| Spencer Whitehead | 4 | 6 | 4.45 |
| Di Lu | 5 | 41 | 14.62 |
| Heng Ji | 6 | 1544 | 127.27 |
| Shih-Fu Chang | 7 | 13015 | 1101.53 |