Title
MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction.
Abstract
Acronym extraction (AE) is the task of identifying acronyms and their expanded forms in text, and it is necessary for various NLP applications. Despite major progress on this task in recent years, existing AE research is limited to the English language and to certain domains (i.e., scientific and biomedical). The challenges of AE in other languages and domains remain largely unexplored, and the lack of annotated datasets in multiple languages and domains has been a major obstacle to research in this direction. To address this limitation, we propose a new dataset for multilingual and multi-domain AE. Specifically, 27,200 sentences in 6 different languages and 2 new domains, i.e., legal and scientific, are manually annotated for AE. Our experiments on the dataset show that AE in different languages and learning settings poses unique challenges, emphasizing the need for further research on multilingual and multi-domain AE.
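To make the task concrete, the following is a minimal sketch of a rule-based extractor for the common English pattern "long form (ACRONYM)". It is an illustration only and is not the method or baseline used by the authors. The function name `extract_acronym_pairs` and the heuristic itself (match the acronym letters against the initials of the preceding words) are assumptions introduced here.

```python
import re


def extract_acronym_pairs(sentence):
    """Return (acronym, long_form) pairs of the shape 'long form (ACRONYM)'.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = []
    # Find parenthesized all-caps tokens of 2 to 10 letters, e.g. "(AE)".
    for match in re.finditer(r"\(([A-Z]{2,10})\)", sentence):
        acronym = match.group(1)
        # Words that precede the opening parenthesis.
        preceding = sentence[:match.start()].split()
        # Candidate long form: as many words as the acronym has letters.
        candidate = preceding[-len(acronym):]
        if len(candidate) < len(acronym):
            continue
        # Accept the candidate only if its initials spell the acronym.
        initials = "".join(word[0].upper() for word in candidate)
        if initials == acronym:
            pairs.append((acronym, " ".join(candidate)))
    return pairs


example = ("We study acronym extraction (AE) for natural language "
           "processing (NLP) systems.")
print(extract_acronym_pairs(example))
```

A heuristic like this depends on English capitalization and word order, so it would likely miss many cases in other languages and in legal text, which is the kind of gap the abstract says the dataset is meant to expose.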
Year
2022
Venue
International Conference on Computational Linguistics
DocType
Conference
Volume
Proceedings of the 29th International Conference on Computational Linguistics
Citations
0
PageRank
0.34
References
0
Authors
6
Name                      Order   Citations   PageRank
Amir Pouran Ben Veyseh    1       0           5.75
Nicole Meister            2       0           0.68
Seunghyun Yoon            3       0           0.34
Rajiv Jain                4       4           5.16
Franck Dernoncourt        5       149         35.39
Thuy Thanh Nguyen         6       236         32.55