Abstract
---
Recent years have witnessed a diverse set of knowledge injection models for pre-trained language models (PTMs); however, most previous studies neglect the PTMs' own ability, namely the large amount of implicit knowledge already stored in their parameters. A recent study [2] observed knowledge neurons in the Feed-Forward Network (FFN) that are responsible for expressing factual knowledge. In this work, we propose a simple model, Kformer, which exploits both the knowledge stored in PTMs and external knowledge via knowledge injection in the Transformer FFN layers. Empirical results on two knowledge-intensive tasks, commonsense reasoning (i.e., SocialIQA) and medical question answering (i.e., MedQA-USMLE), demonstrate that Kformer yields better performance than other knowledge injection approaches such as concatenation or attention-based injection. We believe the proposed simple model and empirical findings may help the community develop more powerful knowledge injection methods (code available at https://github.com/zjunlp/Kformer).
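The abstract describes injecting external knowledge directly into the Transformer's FFN layers, which act as key-value memories. The following is a minimal, hypothetical PyTorch sketch of that idea: retrieved knowledge embeddings are projected into extra FFN "keys" and "values" that participate in the same lookup the FFN already performs. All class, dimension, and projection names here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class KnowledgeInjectedFFN(nn.Module):
    """Sketch of FFN-level knowledge injection in the spirit of Kformer.

    A standard Transformer FFN computes f(x @ W1) @ W2. Here, N external
    knowledge embeddings are projected into extra key/value rows that are
    concatenated with the FFN's own weights, so retrieved knowledge joins
    the layer's key-value computation. Names and sizes are assumptions.
    """

    def __init__(self, d_model: int, d_ff: int, d_knowledge: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)   # original FFN keys
        self.w2 = nn.Linear(d_ff, d_model)   # original FFN values
        self.act = nn.GELU()
        # Hypothetical projections mapping knowledge embeddings into the
        # FFN's key and value spaces.
        self.proj_k = nn.Linear(d_knowledge, d_model)
        self.proj_v = nn.Linear(d_knowledge, d_model)

    def forward(self, x: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); knowledge: (batch, N, d_knowledge)
        k = self.proj_k(knowledge)                   # (batch, N, d_model)
        v = self.proj_v(knowledge)                   # (batch, N, d_model)
        # Inner activations: original FFN keys plus knowledge keys.
        h_ffn = self.w1(x)                           # (batch, seq, d_ff)
        h_know = torch.einsum("bsd,bnd->bsn", x, k)  # (batch, seq, N)
        h = self.act(torch.cat([h_ffn, h_know], dim=-1))
        d_ff = self.w2.in_features
        # Mix original FFN values with the projected knowledge values.
        out = self.w2(h[..., :d_ff])
        out = out + torch.einsum("bsn,bnd->bsd", h[..., d_ff:], v)
        return out

# Illustrative usage with assumed sizes:
ffn = KnowledgeInjectedFFN(d_model=768, d_ff=3072, d_knowledge=768)
x = torch.randn(2, 16, 768)
knowledge = torch.randn(2, 8, 768)  # 8 retrieved knowledge embeddings
y = ffn(x, knowledge)               # (2, 16, 768)
```

In the paper, this kind of injection is applied only in the top FFN layers, leaving the lower layers' computation unchanged.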
Year | DOI | Venue |
---|---|---|
2022 | 10.1007/978-3-031-17120-8_11 | Natural Language Processing and Chinese Computing (NLPCC 2022), Part I
Keywords | DocType | Volume
---|---|---
Transformer, Feed Forward Network, Knowledge injection | Conference | 13551
ISSN | Citations | PageRank
---|---|---
0302-9743 | 0 | 0.34
References | Authors
---|---
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---
Yunzhi Yao | 1 | 0 | 1.01 |
Shaohan Huang | 2 | 57 | 10.29 |
Li Dong | 3 | 0 | 0.34 |
Furu Wei | 4 | 582 | 31.86
Huanhuan Chen | 5 | 731 | 101.79 |
Ningyu Zhang | 6 | 63 | 18.56 |