Title
Disease named entity recognition and normalization with DNorm
Abstract
Automated techniques for locating and identifying key biomedical entities such as diseases in biomedical publications have a wide range of applications, including semantic literature indexing, biocuration support and knowledge discovery. Machine learning methods to automatically locate entity mentions -- the task of named entity recognition (NER) -- have matured significantly, providing high performance and facilitating the adaptation of systems to new domains. Many applications, however, require the mentions found to be identified within a controlled vocabulary such as SNOMED-CT or MeSH, the task of normalization. One task a normalization system must address is term variation: the identification of terms which are functionally similar but textually distinct (e.g. "nephropathy" and "kidney disease"). DNorm [1] is the first machine learning method to address term variation by learning similarities between mentions and entity names from a controlled vocabulary directly from training data. We apply DNorm to PubMed abstracts using the NCBI Disease Corpus, resulting in 80.3% precision and 76.3% recall -- an increase of 19.1% precision and 7.8% recall over the highest results achieved by other techniques. We also use DNorm to normalize diseases in clinical narrative text to concepts in SNOMED-CT through a community-wide shared task, ShARe/CLEF eHealth 2013, where it achieved the highest performance of all participants [2]. We conclude that DNorm -- the first machine learning method to normalize disease names -- provides a new state of the art for disease normalization and also represents a promising step towards normalization methods that are both high performing and fully adaptable. DNorm is open source and available with an online demonstration at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/DNorm/
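The term variation problem described in the abstract can be illustrated with a small sketch. This is not DNorm's method: it is a naive lexical baseline (exact lookup plus token-overlap similarity), shown only to make clear why purely string-based matching cannot link "nephropathy" to "kidney disease". The one-entry vocabulary, the identifier `CONCEPT:0001`, and the function names are all hypothetical.

```python
from typing import Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; 0.0 means no shared tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Hypothetical vocabulary: the identifier is a placeholder,
# not a real MeSH or SNOMED-CT code.
VOCABULARY = {"kidney disease": "CONCEPT:0001"}


def exact_lookup(mention: str) -> Optional[str]:
    """Return the concept identifier only if the mention matches a name exactly."""
    return VOCABULARY.get(mention.lower())


mention = "nephropathy"
exact_hit = exact_lookup(mention)              # no exact match in the vocabulary
overlap = jaccard(mention, "kidney disease")   # no tokens in common
print(exact_hit, overlap)
```

Both lexical signals come up empty for this pair even though the two terms are functionally similar, which is the gap a similarity function learned from training data is meant to close.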
Year
2014
DOI
10.1145/2649387.2660780
Venue
BCB
Keywords
health, algorithms, named entity recognition, experimentation, normalization, biology and genetics, text processing, information extraction, machine learning, text analysis
Field
Normalization (statistics), Computer science, Controlled vocabulary, Search engine indexing, Natural language processing, Artificial intelligence, Clef, Information retrieval, Information extraction, Knowledge extraction, Bioinformatics, Named-entity recognition, Recall
DocType
Conference
Citations
1
PageRank
0.35
References
1
Authors
2
Name            Order  Citations  PageRank
Robert Leaman   1      914        39.98
Zhiyong Lu      2      2735       171.27