Title
Exploring the boundaries: gene and protein identification in biomedical text
Abstract
Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
Year: 2005
DOI: 10.1186/1471-2105-6-S1-S5
Venue: BMC Bioinformatics
Keywords: microarrays, bioinformatics, maximum entropy, algorithms
Field: Information retrieval, Protein identification, Computer science, Information extraction, Bioinformatics, Automatic processing, Named-entity recognition
DocType: Journal
Volume: 6
Issue: S1
ISSN: 1471-2105
Citations: 61
PageRank: 4.68
References: 21
Authors: 6
Name | Order | Citations | PageRank
Jenny Rose Finkel | 1 | 1275 | 68.58
Shipra Dingare | 2 | 155 | 11.59
Christopher D. Manning | 3 | 22579 | 1126.22
Malvina Nissim | 4 | 479 | 51.48
Beatrice Alex | 5 | 237 | 25.59
Claire Grover | 6 | 729 | 100.15