Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators - Citegraph

Paper Info

Title
Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

Abstract
We present a new framework, AMOS, that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Unlike ELECTRA, which trains one MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better with challenging replaced tokens, we learn mixture weights over the auxiliary MLMs' outputs to maximize the discriminator loss by backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we propose a way to assemble multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models.
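
As a rough illustration of the mechanism the abstract describes, the sketch below mixes the token distributions of several generators with softmax-normalized weights and then draws a relaxed (Gumbel-Softmax) sample of the replaced token. It is not the authors' implementation: the function names, the toy sizes (3 generators, 8 vocabulary items), the temperature, and the choice to mix at the probability level are all assumptions made for illustration, and the NumPy code shows only the forward pass, not the adversarial gradient update of the mixture weights.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mix_generators(gen_logits, mix_logits):
    """Combine K generators' predictions for one masked position.

    gen_logits: (K, V) vocabulary logits, one row per generator (hypothetical).
    mix_logits: (K,) unnormalized mixture weights (hypothetical, learnable).
    Returns a (V,) mixed probability distribution over the vocabulary.
    """
    weights = softmax(mix_logits)             # (K,), sums to 1
    gen_probs = softmax(gen_logits, axis=-1)  # (K, V), each row sums to 1
    return weights @ gen_probs                # convex combination of the rows

def gumbel_softmax_sample(probs, tau, rng):
    """Relaxed one-hot sample from `probs` at temperature `tau`."""
    u = rng.uniform(1e-10, 1.0, size=probs.shape)
    gumbel = -np.log(-np.log(u))              # Gumbel(0, 1) noise
    return softmax((np.log(probs + 1e-20) + gumbel) / tau)

rng = np.random.default_rng(0)
gen_logits = rng.normal(size=(3, 8))   # toy values: 3 generators, vocab of 8
mix_logits = np.zeros(3)               # start from a uniform mixture
mixed = mix_generators(gen_logits, mix_logits)
relaxed = gumbel_softmax_sample(mixed, tau=0.5, rng=rng)
replaced_token = int(relaxed.argmax()) # hard token id fed to the discriminator
```

In an autodiff framework the relaxed sample would let the discriminator loss propagate gradients back to `mix_logits`, which is the role the abstract assigns to Gumbel-Softmax.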

Year	Venue	Keywords
2022	International Conference on Learning Representations (ICLR)	Language Model Pretraining
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	7

Authors (7 rows)

Name	Order	Citations	PageRank
Yu Meng	1	49	11.09
Chen-Yan Xiong	2	405	30.82
Payal Bajaj	3	6	1.44
Saurabh Tiwary	4	29	3.86
Paul N. Bennett	5	1500	87.93
Jiawei Han	6	0	7.44
Xia Song	7	30	3.19
