Title
Pre-Training With Whole Word Masking for Chinese BERT
Abstract
Bidirectional Encoder Representations from Transformers (BERT) has shown marvelous improvements across various NLP tasks, and consecutive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways; in particular, we propose a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as our baselines, including BERT, RoBERTa, ELECTRA, and RBT. We carry out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also provide ablation studies with several findings that may help future research. We open-source our pre-trained language models to further facilitate our research community.
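As a rough illustration of the whole word masking (wwm) idea mentioned in the abstract, the sketch below masks Chinese characters one whole word at a time rather than individually. This is a minimal, self-contained example, not the authors' released pre-training code: the function name whole_word_mask, the default 15% masking rate, the fixed random seed, and the assumption that the input has already been segmented into words by an external Chinese word segmenter are all illustrative choices.

import random

MASK = "[MASK]"

def whole_word_mask(segmented_words, mask_rate=0.15, seed=0):
    """Mask roughly `mask_rate` of the characters, always whole words at a time."""
    rng = random.Random(seed)
    tokens = [ch for word in segmented_words for ch in word]
    budget = max(1, round(len(tokens) * mask_rate))

    # Character span (start, end) of each word in the flat token sequence.
    offsets, start = [], 0
    for word in segmented_words:
        offsets.append((start, start + len(word)))
        start += len(word)

    # Pick whole words at random until the masking budget is reached.
    candidates = list(range(len(segmented_words)))
    rng.shuffle(candidates)

    masked = list(tokens)
    labels = [None] * len(tokens)   # original characters at masked positions
    covered = 0
    for idx in candidates:
        if covered >= budget:
            break
        lo, hi = offsets[idx]
        for pos in range(lo, hi):   # mask every character of the chosen word
            labels[pos] = masked[pos]
            masked[pos] = MASK
        covered += hi - lo
    return masked, labels

# "使用语言模型" segmented as 使用 / 语言 / 模型: the characters of a chosen word are masked together.
print(whole_word_mask([["使", "用"], ["语", "言"], ["模", "型"]], mask_rate=0.3))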
Year
2021
DOI
10.1109/TASLP.2021.3124365
Venue
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Keywords
Bit error rate, Task analysis, Computational modeling, Training, Analytical models, Adaptation models, Predictive models, Pre-trained language model, representation learning, natural language processing
DocType
Journal
Volume
29
Issue
1
ISSN
2329-9290
Citations
4
PageRank
0.46
References
7
Authors
7
Name           Order  Citations  PageRank
Yiming Cui     1      87         13.40
Wanxiang Che   2      711        66.39
Ting Liu       3      2735       232.31
Bing Qin       4      1076       72.82
Ziqing Yang    5      4          1.82
Shijin Wang    6      180        31.56
Guoping Hu     7      309        37.32