Title
GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution
Abstract
Learning from multimodal data has become a popular research topic in recent years. Multimodal coreference resolution (MCR) is an important task in this area: it involves resolving references across different modalities, e.g., text and images, a crucial capability for building next-generation conversational agents. MCR is challenging because it requires encoding information from different modalities and modeling the associations between them. Although significant progress has been made on visual-linguistic tasks such as visual grounding, most current work involves single-turn utterances and focuses on simple coreference resolution. In this work, we propose an MCR model that resolves coreferences made in multi-turn dialogues grounded in scene images. We present GRAVL-BERT, a unified MCR framework that combines visual relationships between objects, background scenes, dialogue, and metadata by integrating Graph Neural Networks with VL-BERT. We report results on the SIMMC 2.0 multimodal conversational dataset, achieving rank 1 on the DSTC-10 SIMMC 2.0 MCR challenge with an F1 score of 0.783. Our code is available at https://github.com/alexa/gravl-bert.
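The abstract's core idea, contextualising object features with a graph neural network over visual relationships before fusing them with dialogue text in a BERT-style encoder, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that): the class names, the 2048-d region features, the 256-d hidden size, and the per-object binary coreference head are all illustrative assumptions.

import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: each object embedding is combined with
    the mean of its neighbours' embeddings under the relation graph."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, obj_feats, adj):
        # obj_feats: (batch, num_objs, dim); adj: (batch, num_objs, num_objs)
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)  # avoid divide-by-zero
        neighbour_mean = adj @ obj_feats / deg
        # residual term acts as an implicit self-loop
        return torch.relu(self.linear(obj_feats + neighbour_mean))

class GraphVLCorefSketch(nn.Module):
    """Toy stand-in for a GRAVL-BERT-style model: graph-contextualised
    object features and text embeddings share one transformer encoder,
    and each object position is scored as referenced / not referenced."""
    def __init__(self, vocab_size=30522, dim=256, num_layers=2):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, dim)
        self.obj_proj = nn.Linear(2048, dim)   # e.g. CNN region features
        self.gcn = SimpleGCNLayer(dim)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.coref_head = nn.Linear(dim, 1)    # per-object binary score

    def forward(self, token_ids, obj_feats, adj):
        text = self.text_emb(token_ids)                 # (B, T, dim)
        objs = self.gcn(self.obj_proj(obj_feats), adj)  # (B, O, dim)
        fused = self.encoder(torch.cat([text, objs], dim=1))
        obj_states = fused[:, token_ids.size(1):]       # object positions
        return self.coref_head(obj_states).squeeze(-1)  # (B, O) logits

model = GraphVLCorefSketch()
tokens = torch.randint(0, 30522, (2, 16))     # fake dialogue tokens
objects = torch.randn(2, 5, 2048)             # 5 objects per scene
adj = torch.randint(0, 2, (2, 5, 5)).float()  # visual-relation graph
print(model(tokens, objects, adj).shape)      # torch.Size([2, 5])

In this reading, coreference resolution reduces to scoring every candidate object in the scene against the dialogue context; the graph step lets an object's score depend on its visually related neighbours, which is the role the paper attributes to the GNN component.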
Year: 2022
Venue: International Conference on Computational Linguistics
DocType: Conference
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Citations: 0
PageRank: 0.34
References: 0
Authors: 9
Name             Order  Citations  PageRank
Danfeng Guo      1      0          0.34
Arpit Gupta      2      2          2.05
Sanchit Agarwal  3      14         1.79
Jiun-Yu Kao      4      0          0.34
Shuyang Gao      5      24         1.86
Arijit Biswas    6      4          4.50
Chien-Wei Lin    7      0          0.34
Tagyoung Chung   8      68         8.83
Mohit Bansal     9      0          2.03