Title | ||
---|---|---|
Fine-Grained Named Entity Recognition with Distant Supervision in COVID-19 Literature |
Abstract | ||
---|---|---|
Biomedical named entity recognition (BioNER) is a fundamental step for mining COVID-19 literature. Existing BioNER datasets cover a few common coarse-grained entity types (e.g., genes, chemicals, and diseases), which cannot be used to recognize highly domain-specific entity types (e.g., animal models of diseases) or emerging ones (e.g., coronaviruses) for COVID-19 studies. We present CORD-NER, a fine-grained named entity recognition dataset of COVID-19 literature (up until May 19, 2020). CORD-NER contains over 12 million sentences annotated via distant supervision. Also included in CORD-NER are 2,000 manually curated sentences as a test set for performance evaluation. CORD-NER covers 75 fine-grained entity types. In addition to the common biomedical entity types, it covers new entity types specifically related to COVID-19 studies, such as coronaviruses, viral proteins, evolution, and immune responses. The dictionaries of these fine-grained entity types are collected from existing knowledge bases and human-input seed sets. We further present DISTNER, a distantly supervised NER model that relies on a massive unlabeled corpus and a collection of dictionaries to annotate the COVID-19 corpus. DISTNER provides a benchmark performance on the CORD-NER test set for future research. |
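
The abstract describes annotating a corpus via distant supervision from entity-type dictionaries. As a rough illustration of that general idea only, the sketch below tags tokens with BIO labels by longest-match dictionary lookup. It is not the DISTNER model, whose details are not given here; the function name `distant_label`, the entity-type labels, and the example dictionary entries are all hypothetical.

```python
from typing import Dict, List, Tuple


def distant_label(tokens: List[str], dictionaries: Dict[str, List[str]]) -> List[str]:
    """Assign BIO tags by case-insensitive longest-match dictionary lookup.

    Illustrative only: a plain dictionary matcher, not the DISTNER model.
    """
    # Map each lowercased, whitespace-tokenized phrase to its entity type.
    lookup: Dict[Tuple[str, ...], str] = {}
    max_len = 1
    for entity_type, phrases in dictionaries.items():
        for phrase in phrases:
            key = tuple(phrase.lower().split())
            if not key:
                continue  # ignore empty dictionary entries
            lookup[key] = entity_type
            max_len = max(max_len, len(key))

    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = lookup.get(tuple(lowered[i:i + n]))
            if entity_type is not None:
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + entity_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical dictionaries; the type names are placeholders for illustration.
example_dictionaries = {
    "CORONAVIRUS": ["SARS-CoV-2"],
    "VIRAL_PROTEIN": ["spike protein"],
    "GENE": ["ACE2"],
}
example_tokens = ["SARS-CoV-2", "spike", "protein", "binds", "ACE2"]
example_tags = distant_label(example_tokens, example_dictionaries)
```

Labels produced this way are noisy (ambiguous strings and missing dictionary entries both cause errors), which is why a manually curated test set, like the one the abstract mentions, is needed for evaluation.
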
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/BIBM49941.2020.9313126 | 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) |
Keywords | DocType | ISBN |
---|---|---|
fine-grained named entity recognition, distant supervision, COVID-19 | Conference | 978-1-7281-6216-4 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
---|---|---|
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xuan Wang | 1 | 28 | 5.96 |
Xiangchen Song | 2 | 0 | 0.68 |
Bangzheng Li | 3 | 0 | 0.34 |
Kang Zhou | 4 | 0 | 0.34 |
Qi Li | 5 | 13 | 3.20 |
Jiawei Han | 6 | 43085 | 3824.48 |