Title
deepBioWSD: effective deep neural word sense disambiguation of biomedical text data.
Abstract
Objective: In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm is needed to prevent downstream difficulties in the natural language processing pipeline. Supervised WSD algorithms largely outperform unsupervised, semisupervised, and knowledge-based methods; however, they train a separate classifier for each ambiguous term, which requires a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. Materials and Methods: Built on recent advances in deep learning, our deepBioWSD model uses a single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. In the model, the Unified Medical Language System sense embeddings are first computed from their text definitions; the network is then initialized with these embeddings and trained on all available training data collectively. The method also includes a novel technique for automatically collecting training data from PubMed to (pre)train the network in an unsupervised manner. Results: We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD, achieving state-of-the-art performance of 96.82% macro accuracy. Conclusions: Apart from the disambiguation improvement and unsupervised training, deepBioWSD depends on considerably less expert-labeled data because it learns the target and the context terms jointly. These properties make deepBioWSD conveniently deployable in real-time biomedical applications.
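The abstract reports macro and micro accuracy without defining them. The sketch below follows the usual reading in WSD evaluation: macro accuracy averages the per-term accuracies, while micro accuracy is the fraction of all test instances labeled correctly. It is a minimal illustration under that assumption, not the authors' evaluation code; the function name, the record layout, and the sense identifiers (`S1`, `S2`, `S3`) are hypothetical.

```python
from collections import defaultdict


def macro_micro_accuracy(records):
    """Compute (macro, micro) accuracy for WSD predictions.

    records: iterable of (ambiguous_term, gold_sense, predicted_sense).
    Assumed definitions: macro = mean of per-term accuracies,
    micro = correct instances / all instances.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for term, gold, predicted in records:
        total[term] += 1
        correct[term] += int(gold == predicted)
    if not total:
        raise ValueError("no records to evaluate")
    macro = sum(correct[t] / total[t] for t in total) / len(total)
    micro = sum(correct.values()) / sum(total.values())
    return macro, micro


# Toy data with placeholder sense identifiers (not real UMLS CUIs).
records = [
    ("cold", "S1", "S1"),
    ("cold", "S1", "S2"),
    ("cold", "S2", "S2"),
    ("cold", "S2", "S2"),
    ("BAT", "S3", "S3"),
]
macro, micro = macro_micro_accuracy(records)
```

In this toy case "cold" is disambiguated correctly in 3 of 4 instances and "BAT" in 1 of 1, so the two metrics differ: macro accuracy weights each term equally, whereas micro accuracy weights each instance equally.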
Year
2019
DOI
10.1093/jamia/ocy189
Venue
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
Keywords
word sense disambiguation, biomedical text mining, deep neural networks, bidirectional long short-term memory network, zero-shot learning
Field
Data mining, Natural language processing, Artificial intelligence, Medicine, Word-sense disambiguation
DocType
Journal
Volume
26
Issue
5
ISSN
1067-5027
Citations
1
PageRank
0.35
References
18
Authors
4
Name                 Order  Citations  PageRank
Ahmad Pesaranghader  1      28         4.20
Stan Matwin          2      3025       344.20
Marina Sokolova      3      720        28.40
Ali Pesaranghader    4      29         3.16