Title
Do You Need Embeddings Trained on a Massive Specialized Corpus for Your Clinical Natural Language Processing Task?
Abstract
We explore the impact of the data source used to train word representations on two clinical NLP tasks in French: natural language understanding and text classification. We compared word embeddings (fastText) and language models (ELMo), learned either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations learned on EHR data for one of the two tasks (+7% and +8% gain in F1-score).
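The general-domain vs. specialized-corpus comparison summarized above can be illustrated with a minimal sketch; this is not code from the paper, and the vector file names and query term below are assumptions, standing in for whatever pretrained fastText models are available.

```python
# Minimal sketch (not from the paper): contrast nearest neighbours of a clinical
# term under general-domain vs. (hypothetical) EHR-trained fastText vectors.
from gensim.models.fasttext import load_facebook_vectors

# File names are placeholders; any pretrained fastText .bin files would work here.
general = load_facebook_vectors("cc.fr.300.bin")     # general-domain French vectors
clinical = load_facebook_vectors("ehr_fr.300.bin")   # hypothetical EHR-trained vectors

term = "insuffisance"  # example French clinical term
print("general :", [w for w, _ in general.most_similar(term, topn=5)])
print("clinical:", [w for w, _ in clinical.most_similar(term, topn=5)])
```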
Year
2019
DOI
10.3233/SHTI190533
Venue
Studies in Health Technology and Informatics
Keywords
Natural language processing, electronic health records
Field
Natural language processing, Artificial intelligence, Medicine
DocType
Conference
Volume
264
ISSN
0926-9630
Citations
0
PageRank
0.34
References
0
Authors
8
Name                        Order  Citations  PageRank
Antoine Neuraz              1      16         4.22
Vincent Looten              2      0          0.34
Bastien Rance               3      65         11.91
Nicolas Daniel              4      0          0.34
N Garcelon                  5      40         6.01
Leonardo Campillos Llanos   6      9          8.39
Anita Burgun                7      506        57.91
Sophie Rosset               8      393        61.66