Title
Inexpensive Domain Adaptation of Pretrained Language Models: Case Studies on Biomedical NER and Covid-19 QA
Abstract
Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by unsupervised pretraining on target-domain text. While successful, this approach is expensive in terms of hardware, runtime and CO2 emissions. Here, we propose a cheaper alternative: We train Word2Vec on target-domain text and align the resulting word vectors with the wordpiece vectors of a general-domain PTLM. We evaluate on eight English biomedical Named Entity Recognition (NER) tasks and compare against the recently proposed BioBERT model. We cover over 60% of the BioBERT-BERT F1 delta, at 5% of BioBERT’s CO2 footprint and 2% of its cloud compute cost. We also show how to quickly adapt an existing general-domain Question Answering (QA) model to an emerging domain: the Covid-19 pandemic.
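The sketch below illustrates the idea summarized in the abstract; it is not the authors' released code. It trains Word2Vec on target-domain text and aligns the resulting word vectors with the wordpiece embedding space of a general-domain BERT via a least-squares fit on the shared vocabulary (the paper's exact alignment objective may differ). It assumes gensim >= 4.0, transformers, numpy, and torch; the placeholder sentences and the "bert-base-cased" checkpoint are illustrative choices.

```python
# Illustrative sketch, not the authors' code: Word2Vec on target-domain text,
# then a linear alignment into BERT's wordpiece embedding space.
import numpy as np
from gensim.models import Word2Vec
from transformers import BertModel, BertTokenizer

# 1. Train Word2Vec on tokenized target-domain sentences (placeholder data;
#    in practice this would be e.g. PubMed abstracts or Covid-19 articles).
domain_sentences = [
    ["the", "patient", "received", "cisplatin", "and", "etoposide"],
    ["covid", "19", "is", "caused", "by", "sars", "cov", "2"],
]
w2v = Word2Vec(sentences=domain_sentences, vector_size=768, min_count=1, workers=4)

# 2. Load a general-domain BERT and extract its wordpiece embedding matrix.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
bert = BertModel.from_pretrained("bert-base-cased")
wp_vocab = tokenizer.get_vocab()                                  # wordpiece -> id
wp_emb = bert.get_input_embeddings().weight.detach().numpy()      # (vocab_size, 768)

# 3. Build paired matrices over the vocabulary shared by both models.
shared = [w for w in w2v.wv.key_to_index if w in wp_vocab]
X = np.stack([w2v.wv[w] for w in shared])            # Word2Vec vectors
Y = np.stack([wp_emb[wp_vocab[w]] for w in shared])  # corresponding BERT vectors

# 4. Fit a linear map W minimizing ||X W - Y||_F (plain least squares here).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# 5. Project all domain word vectors into BERT's embedding space; the aligned
#    vectors can then be used to extend the model's input vocabulary.
aligned_domain_vectors = w2v.wv.vectors @ W
```

Because only Word2Vec training and a linear fit are required, this step avoids the GPU/TPU pretraining that dominates the hardware, runtime, and CO2 cost mentioned in the abstract.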
Year: 2020
DOI: 10.18653/V1/2020.FINDINGS-EMNLP.134
Venue: EMNLP
DocType: Conference
Volume: 2020.findings-emnlp
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name             Order  Citations  PageRank
Nina Poerner     1      0          1.35
Ulli Waltinger   2      64         10.76
Hinrich Schütze  3      2113       362.21