Abstract |
---|
Bidirectional Encoder Representations from Transformers (BERT) has shown remarkable improvements across various NLP tasks, and numerous variants have since been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways; in particular, it adopts a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as baselines, including BERT, RoBERTa, ELECTRA, and RBT. We carry out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also provide detailed ablations with several findings that may help future research. We open-source our pre-trained language models to further facilitate research in the community.(1) |
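To give a rough sense of the whole word masking (wwm) idea mentioned in the abstract, the sketch below masks Chinese characters at the word level rather than individually. This is a minimal, hypothetical illustration, not the paper's implementation: the function name `whole_word_mask`, the `word_spans` input, and the masking rate are assumptions, and the actual pipeline would additionally rely on a Chinese word segmenter (e.g., LTP) and WordPiece tokenization.

```python
import random

def whole_word_mask(chars, word_spans, mask_rate=0.15, mask_token="[MASK]"):
    """Hypothetical sketch of whole word masking: if a word is selected,
    every character inside it is replaced with the mask token."""
    masked = list(chars)
    for start, end in word_spans:
        # The masking decision is made once per word, not per character,
        # which is the essence of whole word masking.
        if random.random() < mask_rate:
            for i in range(start, end):
                masked[i] = mask_token
    return masked

# Hypothetical usage: "使用语言模型" segmented as 使用 / 语言 / 模型.
chars = list("使用语言模型")
word_spans = [(0, 2), (2, 4), (4, 6)]
print(whole_word_mask(chars, word_spans, mask_rate=0.5))
```

Under character-level masking, a single character of a word such as 语言 could be masked in isolation; with whole word masking, both characters are masked together, making the prediction task depend on word-level context.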
Year | DOI | Venue
---|---|---
2021 | 10.1109/TASLP.2021.3124365 | IEEE/ACM Transactions on Audio, Speech, and Language Processing

Keywords | DocType | Volume
---|---|---
Bit error rate, Task analysis, Computational modeling, Training, Analytical models, Adaptation models, Predictive models, Pre-trained language model, representation learning, natural language processing | Journal | 29

Issue | ISSN | Citations
---|---|---
1 | 2329-9290 | 4

PageRank | References | Authors
---|---|---
0.46 | 7 | 7
Name | Order | Citations | PageRank |
---|---|---|---
Yiming Cui | 1 | 87 | 13.40 |
Wanxiang Che | 2 | 711 | 66.39 |
Ting Liu | 3 | 2735 | 232.31 |
Bing Qin | 4 | 1076 | 72.82 |
Ziqing Yang | 5 | 4 | 1.82 |
Shijin Wang | 6 | 180 | 31.56 |
Guoping Hu | 7 | 309 | 37.32 |