Title
GrantExtractor: A Winning System for Extracting Grant Support Information from Biomedical Literature
Abstract
Grant support (GS) information in the MEDLINE database records the funding agencies and contract numbers associated with an article. For funding organizations, GS plays a crucial role in tracking funding outcomes. In this paper, we present a pipeline system called GrantExtractor that automatically extracts funding information from biomedical literature. GrantExtractor is a novel solution to the practical problem of GS information extraction, which involves both named entity recognition and relation extraction. Our approach integrates several modern machine learning techniques. First, a sentence classifier identifies funding sentences in articles. Grant numbers and agencies are then extracted from these funding sentences by a bi-directional LSTM with a CRF layer (BiLSTM-CRF), as well as by pattern matching. After a multi-class model removes noisy numbers, each grant number is matched with its corresponding agency. Experimental results on benchmark datasets show that GrantExtractor clearly outperformed all baseline methods. In addition, GrantExtractor won first place in Task 5C of the 2017 BioASQ challenge, achieving a Micro-recall of 0.9526 on 22,610 articles. This is 33% higher in relative terms than 0.7174, the highest score of the “BioASQ Filtering” baseline provided by the National Library of Medicine (NLM). Moreover, GrantExtractor achieved a Micro F-measure as high as 0.90 in the task of extracting grant pairs.
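The abstract reports micro-averaged scores (Micro-recall and Micro F-measure). The sketch below shows how such scores are commonly computed: true positives, false positives, and false negatives are pooled over all articles before the ratios are taken. It is a minimal illustration, not the official BioASQ evaluation code. The function name `micro_scores`, the (grant number, agency) tuple representation, and the toy grant identifiers are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation of one grant pair: (grant number, agency).
Pair = Tuple[str, str]


def micro_scores(
    gold: Dict[str, Set[Pair]], pred: Dict[str, Set[Pair]]
) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) pooled over all articles."""
    tp = fp = fn = 0
    for article_id in gold.keys() | pred.keys():
        g = gold.get(article_id, set())
        p = pred.get(article_id, set())
        tp += len(g & p)  # pairs that were extracted and are correct
        fp += len(p - g)  # pairs that were extracted but are wrong
        fn += len(g - p)  # correct pairs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy data with invented identifiers, for illustration only.
gold = {
    "a1": {("R01 GM000001", "NIGMS"), ("U54 CA000002", "NCI")},
    "a2": {("MR/X000003", "MRC")},
}
pred = {
    "a1": {("R01 GM000001", "NIGMS")},
    "a2": {("MR/X000003", "MRC"), ("12345", "MRC")},
}
precision, recall, f1 = micro_scores(gold, pred)
```

Pooling counts before dividing means that articles with many grants weigh more than articles with few, which is what distinguishes micro-averaging from averaging per-article scores.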
Year
2018
DOI
10.1109/BIBM.2018.8621579
Venue
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Keywords
machine learning techniques, grant pairs, noisy numbers, grant numbers, sentence classifier, relation extraction, name entity recognition, GS information extraction, pipeline system, funding organizations, contract numbers, funding agencies, biomedical literature, winning system, GrantExtractor
Field
Information retrieval, Computer science, Filter (signal processing), Information extraction, Artificial intelligence, Classifier (linguistics), MEDLINE, Pattern matching, Sentence, Machine learning, Relationship extraction
DocType
Conference
ISSN
2156-1125
ISBN
978-1-5386-5489-7
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order   Citations   PageRank
Suyang Dai      1       5           1.75
Zihan Zhang     2       1           0.70
Wenxuan Zuo     3       0           0.68
Xiaodi Huang    4       342         40.33
Shanfeng Zhu    5       429         35.04