Abstract
We present BERTweet, the first public large-scale pre-trained language model for English Tweets. BERTweet is trained using the RoBERTa pre-training procedure (Liu et al., 2019), with the same model configuration as BERT-base (Devlin et al., 2019). Experiments show that BERTweet outperforms the strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), surpassing the previous state-of-the-art models on three Tweet NLP tasks: Part-of-speech tagging, Named-entity recognition, and text classification. We release BERTweet to facilitate future research and downstream applications on Tweet data. BERTweet is available at: https://github.com/VinAIResearch/BERTweet
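A minimal sketch of loading the released checkpoint for feature extraction, assuming the HuggingFace transformers library and the model id `vinai/bertweet-base` from the project's public release; neither is stated in this abstract itself.

```python
# Sketch: extract contextual embeddings from BERTweet via HuggingFace transformers.
# Assumptions: the "vinai/bertweet-base" Hub id from the project's release page,
# and transformers >= 4.x with the emoji package installed for normalization.
import torch
from transformers import AutoModel, AutoTokenizer

# normalization=True applies the Tweet-specific preprocessing (user mentions,
# URLs, emoji) that BERTweet's tokenizer expects.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True)
model = AutoModel.from_pretrained("vinai/bertweet-base")

tweet = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER"
inputs = tokenizer(tweet, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One embedding per subword token: shape (1, seq_len, 768), i.e. the
# BERT-base hidden size noted in the abstract.
print(outputs.last_hidden_state.shape)
```

These token-level features are what the downstream POS tagging, NER, and text classification heads in the paper's experiments are built on.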
| Year | DOI | Venue |
|---|---|---|
| 2020 | 10.18653/V1/2020.EMNLP-DEMOS.2 | EMNLP |

| DocType | Volume | Citations |
|---|---|---|
| Conference | 2020.emnlp-demos | 0 |

| PageRank | References | Authors |
|---|---|---|
| 0.34 | 0 | 3 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Dat Quoc Nguyen | 1 | 246 | 25.87 |
| Thanh Vu | 2 | 40 | 6.87 |
| Anh Tuan Nguyen | 3 | 1 | 5.09 |