DNorm: disease name normalization with pairwise learning to rank. - Citegraph

Paper Info

Title
DNorm: disease name normalization with pairwise learning to rank.

Abstract
Motivation: Despite the central role of diseases in biomedical research, there have been far fewer attempts at disease name normalization (DNorm), the task of automatically determining which diseases are mentioned in a text, than at other normalization tasks in biomedical text mining research. Methods: In this article, we introduce the first machine learning approach for DNorm, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM. Our method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data. The technique is based on pairwise learning to rank, which has not previously been applied to the normalization task but has proven successful in large optimization problems for information retrieval. Results: We compare our method with several techniques based on lexical normalization and matching, MetaMap and Lucene. Our algorithm achieves 0.782 micro-averaged F-measure and 0.809 macro-averaged F-measure, an increase over the highest performing baseline method of 0.121 and 0.098, respectively.
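
The abstract describes learning mention-to-name similarities with pairwise learning to rank but gives no formulas. The sketch below shows one way the idea could look; it is not the authors' implementation. The bilinear score m^T W n, the bag-of-words vectors, the margin, the learning rate, and the toy names and mentions are all assumptions made for illustration.

```python
import math

def vectorize(text, vocab):
    """Unit-length bag-of-words vector over a fixed token list (assumed representation)."""
    tokens = text.lower().split()
    counts = [float(tokens.count(tok)) for tok in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def score(W, m, n):
    """Bilinear similarity m^T W n between a mention vector and a name vector."""
    return sum(m[i] * W[i][j] * n[j]
               for i in range(len(m)) for j in range(len(n)))

def train(pairs, names, vocab, epochs=20, lr=0.5, margin=1.0):
    """Pairwise ranking: push the correct name above every other name by a margin."""
    size = len(vocab)
    # Start from the identity, so the initial score is plain cosine similarity.
    W = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    name_vecs = {name: vectorize(name, vocab) for name in names}
    for _ in range(epochs):
        for mention, correct in pairs:
            m = vectorize(mention, vocab)
            pos = name_vecs[correct]
            for wrong in names:
                if wrong == correct:
                    continue
                neg = name_vecs[wrong]
                # Update only when the pair (correct, wrong) violates the margin.
                if score(W, m, pos) - score(W, m, neg) < margin:
                    for i in range(size):
                        for j in range(size):
                            W[i][j] += lr * m[i] * (pos[j] - neg[j])
    return W

def rank(W, mention, names, vocab):
    """Return candidate names sorted from best to worst score for the mention."""
    m = vectorize(mention, vocab)
    return sorted(names,
                  key=lambda name: score(W, m, vectorize(name, vocab)),
                  reverse=True)

# Hypothetical toy data, not taken from the NCBI disease corpus or MEDIC.
names = ["breast neoplasms", "colorectal neoplasms", "lung neoplasms"]
pairs = [("breast cancer", "breast neoplasms"),
         ("colon cancer", "colorectal neoplasms"),
         ("lung cancer", "lung neoplasms")]
vocab = sorted({tok for text in names + [m for m, _ in pairs]
                for tok in text.lower().split()})

W = train(pairs, names, vocab)
print(rank(W, "colon cancer", names, vocab))
```

In this toy setup, "colon cancer" shares no token with any candidate name, so token overlap alone scores every name equally. The trained weights are what associate "colon" with "colorectal", which is the kind of similarity the abstract says is learned directly from training data.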

Year	DOI	Venue
2013	10.1093/bioinformatics/btt474	BIOINFORMATICS
Field	DocType	Volume
Training set, Data mining, Normalization (statistics), Information retrieval, Source code, Computer science, Biomedical text mining, Bioinformatics, Pairwise learning, Vocabulary, The Internet	Journal	29
Issue	ISSN	Citations
22	1367-4803	114
PageRank	References	Authors
3.27	31	3

Authors (3 rows)

Name	Order	Citations	PageRank
Robert Leaman	1	914	39.98
Rezarta Islamaj-Doğan	2	419	20.65
Zhiyong Lu	3	2735	171.27
