Title
Analysis of Biomedical Text for Chemical Names: A Comparison of Three Methods
Abstract
At the National Library of Medicine (NLM), a variety of biomedical vocabularies are found in data pertinent to its mission. In addition to standard medical terminology, there are specialized vocabularies, including that of chemical nomenclature. Normal language tools, including the lexically based ones used by the Unified Medical Language System (R) (UMLS (R)) to manipulate and normalize text, do not work well on chemical nomenclature. To improve NLM's capabilities in chemical text processing, two approaches to the problem of recognizing chemical nomenclature were explored. The first approach was a lexical one and consisted of analyzing text for the presence of a fixed set of chemical segments. This approach was extended with general chemical patterns and also with terms from NLM's indexing vocabulary, MeSH (R), and the NLM SPECIALIST (TM) lexicon. The second approach applied Bayesian classification to n-grams of text via two different methods. The single lexical method and the two statistical methods were tested against data from the 1999 UMLS Metathesaurus (R). One of the statistical methods had an overall classification accuracy of 97%.
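The abstract does not say how the two Bayesian n-gram methods were built, so the following is only a minimal sketch of the general idea: a multinomial naive Bayes classifier over character trigrams with add-one smoothing. The class name `NgramNaiveBayes`, the helper `char_ngrams`, the choice of trigrams, and the eight toy training terms are all assumptions made for illustration; none of them come from the paper or from UMLS data.

```python
import math
from collections import Counter


def char_ngrams(term, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded term."""
    padded = " " * (n - 1) + term.lower() + " " * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}        # label -> Counter of n-gram frequencies
        self.totals = {}        # label -> total number of n-grams seen
        self.docs = Counter()   # label -> number of training terms
        self.vocab = set()      # all n-grams seen in training

    def fit(self, terms, labels):
        for term, label in zip(terms, labels):
            grams = char_ngrams(term, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_scores(self, term):
        """Log prior plus summed log likelihoods, with add-one smoothing."""
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counter in self.counts.items():
            score = math.log(self.docs[label] / n_docs)
            for gram in char_ngrams(term, self.n):
                score += math.log((counter[gram] + 1) / (self.totals[label] + vocab_size))
            scores[label] = score
        return scores

    def predict(self, term):
        scores = self.log_scores(term)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch (not from the Metathesaurus).
terms = [
    "2-chloroethanol", "methylbenzene", "sodium chloride", "ethyl acetate",
    "heart failure", "kidney transplant", "blood pressure", "chronic pain",
]
labels = ["chemical"] * 4 + ["other"] * 4

model = NgramNaiveBayes(n=3).fit(terms, labels)
print(model.predict("chloromethane"))
print(model.predict("heart pressure"))
```

Character n-grams are a natural fit here because chemical names are built from recurring fragments such as "chl", "eth", and "met", which a word-level lexicon tends to miss. A classifier trained on so few terms says nothing about the accuracy reported in the abstract.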
Year
1999
Venue
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
Field
Medical terminology, Information retrieval, Naive Bayes classifier, Chemical nomenclature, Computer science, Search engine indexing, Lexicon, Natural language processing, Artificial intelligence, Vocabulary, Unified Medical Language System, Text processing
DocType
Conference
Issue
SUPnan
ISSN
1067-5027
Citations
32
PageRank
7.06
References
4
Authors
6
Name                   Order  Citations  PageRank
W. John Wilbur         1      430        45.66
George F. Hazard Jr.   2      32         7.06
Guy Divita             3      138        24.59
James G. Mork          4      647        65.22
Alan R. Aronson        5      2551       260.67
Allen C. Browne        6      184        32.81