Title
Inexpensive Domain Adaptation of Pretrained Language Models: Case Studies on Biomedical NER and Covid-19 QA
Abstract
Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by unsupervised pretraining on target-domain text. While successful, this approach is expensive in terms of hardware, runtime and CO2 emissions. Here, we propose a cheaper alternative: We train Word2Vec on target-domain text and align the resulting word vectors with the wordpiece vectors of a general-domain PTLM. We evaluate on eight English biomedical Named Entity Recognition (NER) tasks and compare against the recently proposed BioBERT model. We cover over 60% of the BioBERT-BERT F1 delta, at 5% of BioBERT’s CO2 footprint and 2% of its cloud compute cost. We also show how to quickly adapt an existing general-domain Question Answering (QA) model to an emerging domain: the Covid-19 pandemic.
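The sketch below illustrates the idea summarized in the abstract; it is not the authors' released code. It trains Word2Vec on target-domain text and aligns the resulting word vectors with the wordpiece embedding space of a general-domain BERT via a least-squares fit on the shared vocabulary (the paper's exact alignment objective may differ). It assumes gensim >= 4.0, transformers, numpy, and torch; the placeholder sentences and the "bert-base-cased" checkpoint are illustrative choices.

```python
# Illustrative sketch, not the authors' code: Word2Vec on target-domain text,
# then a linear alignment into BERT's wordpiece embedding space.
import numpy as np
from gensim.models import Word2Vec
from transformers import BertModel, BertTokenizer

# 1. Train Word2Vec on tokenized target-domain sentences (placeholder data;
#    in practice this would be e.g. PubMed abstracts or Covid-19 articles).
domain_sentences = [
    ["the", "patient", "received", "cisplatin", "and", "etoposide"],
    ["covid", "19", "is", "caused", "by", "sars", "cov", "2"],
]
w2v = Word2Vec(sentences=domain_sentences, vector_size=768, min_count=1, workers=4)

# 2. Load a general-domain BERT and extract its wordpiece embedding matrix.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
bert = BertModel.from_pretrained("bert-base-cased")
wp_vocab = tokenizer.get_vocab()                                  # wordpiece -> id
wp_emb = bert.get_input_embeddings().weight.detach().numpy()      # (vocab_size, 768)

# 3. Build paired matrices over the vocabulary shared by both models.
shared = [w for w in w2v.wv.key_to_index if w in wp_vocab]
X = np.stack([w2v.wv[w] for w in shared])            # Word2Vec vectors
Y = np.stack([wp_emb[wp_vocab[w]] for w in shared])  # corresponding BERT vectors

# 4. Fit a linear map W minimizing ||X W - Y||_F (plain least squares here).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# 5. Project all domain word vectors into BERT's embedding space; the aligned
#    vectors can then be used to extend the model's input vocabulary.
aligned_domain_vectors = w2v.wv.vectors @ W
```

Because only Word2Vec training and a linear fit are required, this step avoids the GPU/TPU pretraining that dominates the hardware, runtime, and CO2 cost mentioned in the abstract.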
Year: 2020
DOI: 10.18653/V1/2020.FINDINGS-EMNLP.134
Venue: EMNLP
DocType: Conference
Volume: 2020.findings-emnlp
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name             Order  Citations  PageRank
Nina Poerner     1      0          1.35
Ulli Waltinger   2      64         10.76
Hinrich Schütze  3      2113       362.21