Title
Predicting Medical Subject Headings Based on Abstract Similarity and Citations to MEDLINE Records.
Abstract
We describe a classifier-enhanced nearest neighbor approach to assigning Medical Subject Headings (MeSH) to unlabeled documents using a combination of abstract similarities and direct citations to labeled MEDLINE records. The approach decomposes the classification problem into sets of siblings in the MeSH hierarchy (e.g., training a classifier to predict "Heterocyclic Compounds, 2-Ring" vs. other "Heterocyclic Compounds"). Preliminary experiments using a small but diverse set of MeSH terms show the highest performance when abstracts and citations are used together rather than each alone, and when they are coupled with a non-naive classifier: 90+% precision and recall with 10-fold cross-validation. NLM's Medical Text Indexer (MTI) tool achieves similar overall performance but varies more across the terms tested. For example, MTI performs better on "Heterocyclic Compounds, 2-Ring", while our approach performs better on "Alzheimer Disease". Our approach can be applied broadly to documents with abstracts that are similar to (or cite) MEDLINE abstracts, which would help with linking and searching across bibliographic databases beyond MEDLINE.
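The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions: abstracts are compared with a plain bag-of-words cosine similarity, and two neighbor-vote features (one from the most similar labeled abstracts, one from cited labeled records) are computed for a target MeSH term. The function names, the parameter `k`, and the toy records are hypothetical and invented for illustration; they are not the authors' method or data. In a fuller pipeline, features like these would presumably be passed to a trained classifier.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a real text representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def neighbor_features(abstract, cited_ids, labeled, target, k=3):
    """Return (sim_share, cite_share) for the target term.

    sim_share: share of the k most similar labeled abstracts carrying `target`.
    cite_share: share of cited labeled records carrying `target`.
    """
    query = bag_of_words(abstract)
    ranked = sorted(
        labeled,
        key=lambda rid: cosine(query, bag_of_words(labeled[rid]["abstract"])),
        reverse=True,
    )[:k]
    sim_share = (
        sum(target in labeled[r]["mesh"] for r in ranked) / len(ranked)
        if ranked else 0.0
    )
    cited = [r for r in cited_ids if r in labeled]
    cite_share = (
        sum(target in labeled[r]["mesh"] for r in cited) / len(cited)
        if cited else 0.0
    )
    return sim_share, cite_share


# Toy labeled records (invented for illustration, not real MEDLINE data).
labeled = {
    "r1": {"abstract": "amyloid plaques and memory loss in dementia patients",
           "mesh": {"Alzheimer Disease"}},
    "r2": {"abstract": "tau protein and memory decline in dementia",
           "mesh": {"Alzheimer Disease"}},
    "r3": {"abstract": "motor tremor and dopamine loss in patients",
           "mesh": {"Parkinson Disease"}},
    "r4": {"abstract": "synthesis of indole ring compounds",
           "mesh": {"Heterocyclic Compounds, 2-Ring"}},
}

features = neighbor_features(
    "memory loss and amyloid in dementia",  # unlabeled abstract
    ["r2", "r3"],                           # labeled records it cites
    labeled,
    target="Alzheimer Disease",
    k=2,
)
```

Keeping the similarity-based and citation-based shares as separate features lets a downstream classifier weigh the two evidence sources differently, which is one plausible way to read "using both abstracts and citations".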
Year
2016
DOI
10.1145/2910896.2910920
Venue
JCDL
Keywords
Controlled vocabularies, Medical subject headings, Machine Learning, Curation of bibliographic databases
Field
k-nearest neighbors algorithm, Information retrieval, Computer science, Indexer, Precision and recall, Controlled vocabulary, Hierarchy, Classifier (linguistics), MEDLINE
DocType
Conference
ISSN
2575-7865
ISBN
978-1-4503-4229-2
Citations
1
PageRank
0.34
References
8
Authors
2
Name              Order  Citations  PageRank
Adam K. Kehoe     1      1          0.34
Vetle I. Torvik   2      430        27.15