**Abstract**

We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.
| Year | Venue | DocType |
|---|---|---|
| 2020 | ACL | Conference |

| Volume | Citations | PageRank |
|---|---|---|
| 2020.acl-main | 0 | 0.34 |
| References | Authors |
|---|---|
| 0 | 7 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Manling Li | 1 | 8 | 7.89 |
| Alireza Zareian | 2 | 7 | 4.20 |
| Qi Zeng | 3 | 8 | 3.20 |
| Spencer Whitehead | 4 | 6 | 4.45 |
| Di Lu | 5 | 41 | 14.62 |
| Heng Ji | 6 | 1544 | 127.27 |
| Shih-Fu Chang | 7 | 13015 | 1101.53 |