The Impact Of Specialized Corpora For Word Embeddings In Natural Langage Understanding

Paper Info

Title
The Impact Of Specialized Corpora For Word Embeddings In Natural Langage Understanding

Abstract
Recent studies in the biomedical domain suggest that learning statistical word representations (static or contextualized word embeddings) on large corpora of specialized data improves results on downstream natural language processing (NLP) tasks. In this paper, we explore the impact of the data source of word representations on a natural language understanding task. We compared fastText (static) and ELMo (contextualized) embeddings, trained either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations learned on EHR data for both sub-tasks (F1-score gains of +7% and +4%). Moreover, the ELMo representations were trained on only a fraction of the data used for fastText.

Year: 2020
DOI: 10.3233/SHTI200197
Venue: Digital Personalized Health and Medicine
Keywords: Natural language processing, Contextual word embeddings, Natural language understanding
DocType: Conference
Volume: 270
ISSN: 0926-9630
Citations: 0
PageRank: 0.34
References: 0
Authors: 6

Authors

Name	Order	Citations	PageRank
Antoine Neuraz	1	16	4.22
Bastien Rance	2	65	11.91
N Garcelon	3	40	6.01
Leonardo Campillos Llanos	4	9	8.39
Anita Burgun	5	506	57.91
Sophie Rosset	6	393	61.66
