Title
CamemBERT: a Tasty French Language Model
Abstract
Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have been trained either on English data or on the concatenation of data in multiple languages. This makes the practical use of such models, in all languages except English, very limited. Aiming to address this issue for French, we release CamemBERT, a French version of Bidirectional Encoder Representations from Transformers (BERT). We measure the performance of CamemBERT relative to multilingual models on multiple downstream tasks, namely part-of-speech tagging, dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art for most of the tasks considered. We release the pretrained CamemBERT model in the hope of fostering research and downstream applications in French NLP.
Year
2020
DOI
10.18653/V1/2020.ACL-MAIN.645
Venue
ACL
DocType
Conference
Volume
2020.acl-main
Citations
1
PageRank
0.36
References
27
Authors
8
Name                        Order  Citations  PageRank
Louis Martin                1      1          1.37
Benjamin Müller             2      1          1.04
Pedro Javier Ortiz Suárez   3      1          0.36
Yoann Dupont                4      1          4.42
Laurent Romary              5      511        102.00
Éric de la Clergerie        6      238        39.48
Djamé Seddah                7      161        19.20
Benoît Sagot                8      326        49.52