Abstract | ||
---|---|---|
Objective: In biomedicine, a wealth of information is hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm is needed to prevent downstream difficulties in natural language processing (NLP) application pipelines. Supervised WSD algorithms largely outperform unsupervised, semisupervised, and knowledge-based methods; however, they train a separate classifier for each ambiguous term, which necessitates a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with vocabulary size is desirable. Materials and Methods: Building on recent advances in deep learning, our deepBioWSD model uses a single bidirectional long short-term memory (BiLSTM) network that makes sense predictions for any ambiguous term. In the model, sense embeddings for Unified Medical Language System (UMLS) concepts are first computed from their text definitions; the network is then initialized with these embeddings and trained on all available training data collectively. The method also introduces a novel technique for automatically collecting training data from PubMed to (pre)train the network in an unsupervised manner. Results: We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracy as evaluation metrics. deepBioWSD outperforms existing models for biomedical text WSD, achieving state-of-the-art performance of 96.82% macro accuracy. Conclusions: Apart from improved disambiguation and unsupervised training, deepBioWSD requires considerably less expert-labeled data because it learns the target and context terms jointly. These properties make deepBioWSD convenient to deploy in real-time biomedical applications. |
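
The results above are reported as macro and micro accuracy on MSH WSD. As these metrics are commonly defined in biomedical WSD evaluation, micro accuracy pools all test instances, while macro accuracy first computes accuracy per ambiguous term and then averages those values. The minimal sketch below assumes each prediction is recorded as a (term, gold sense, predicted sense) tuple; the function name, input format, and the `term_A`/`sense_1` labels are hypothetical and not taken from deepBioWSD itself.

```python
from collections import defaultdict


def micro_macro_accuracy(records):
    """Compute (micro, macro) accuracy from (term, gold_sense, predicted_sense) tuples.

    Hypothetical helper for illustration; not part of deepBioWSD.
    """
    correct = defaultdict(int)  # correct predictions per ambiguous term
    total = defaultdict(int)    # test instances per ambiguous term
    for term, gold, pred in records:
        total[term] += 1
        correct[term] += int(gold == pred)
    if not total:
        raise ValueError("no records to evaluate")
    # Micro accuracy: every test instance counts equally.
    micro = sum(correct.values()) / sum(total.values())
    # Macro accuracy: unweighted mean of per-term accuracies.
    macro = sum(correct[t] / total[t] for t in total) / len(total)
    return micro, macro


# Illustrative, made-up predictions for two ambiguous terms.
records = [
    ("term_A", "sense_1", "sense_1"),
    ("term_A", "sense_2", "sense_2"),
    ("term_A", "sense_1", "sense_1"),
    ("term_B", "sense_1", "sense_2"),
]
micro, macro = micro_macro_accuracy(records)
```

Here micro accuracy is 3/4 = 0.75, while macro accuracy is (1.0 + 0.0) / 2 = 0.5: the single misclassified `term_B` instance pulls the macro score down because each ambiguous term carries equal weight regardless of how many test instances it has.
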
Year | DOI | Venue |
---|---|---|
2019 | 10.1093/jamia/ocy189 | JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION |
Keywords | Field | DocType |
word sense disambiguation, biomedical text mining, deep neural networks, bidirectional long short-term memory network, zero-shot learning | Data mining, Natural language processing, Artificial intelligence, Medicine, Word-sense disambiguation | Journal |
Volume | Issue | ISSN |
26 | 5 | 1067-5027 |
Citations | PageRank | References |
1 | 0.35 | 18 |
Authors | ||
---|---|---|
4 | ||
Name | Order | Citations | PageRank |
---|---|---|---|
Ahmad Pesaranghader | 1 | 28 | 4.20 |
Stan Matwin | 2 | 3025 | 344.20 |
Marina Sokolova | 3 | 720 | 28.40 |
Ali Pesaranghader | 4 | 29 | 3.16 |