Abstract
We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that can achieve impressive performance on machine translation tasks without ever modifying the weight initializations. To find subnetworks within a one-layer randomly weighted network, we apply different binary masks to the same weight matrix to generate different layers. Hidden within a one-layer randomly weighted Transformer, we find subnetworks that achieve 29.45/17.29 BLEU on IWSLT14/WMT14. Using a fixed pre-trained embedding layer, these subnetworks are smaller than a trained Transformer small/base, yet match 98%/92% of its performance (34.14/25.24 BLEU) on IWSLT14/WMT14. Furthermore, we demonstrate the effectiveness of larger and deeper Transformers in this setting, as well as the impact of different initialization methods. We release the source code at https://github.com/sIncerass/one_layer_lottery_ticket.
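The abstract's core mechanism, reusing one frozen random weight matrix and learning only a per-layer binary mask, can be sketched as below. This is a minimal illustration assuming a PyTorch-style setup with edge-popup-style top-k score masking; the class name, sparsity level, and scoring rule are illustrative assumptions rather than the paper's exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedWeightMaskedLinear(nn.Module):
    """Each 'layer' reuses the same frozen random weight matrix and only
    learns a binary mask over it (straight-through top-k scoring, in the
    spirit of edge-popup; the exact scoring rule here is an assumption)."""

    def __init__(self, shared_weight: nn.Parameter, sparsity: float = 0.5):
        super().__init__()
        self.weight = shared_weight  # shared across layers, never updated
        self.scores = nn.Parameter(torch.randn_like(shared_weight))  # per-layer mask scores
        self.sparsity = sparsity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = self.scores.numel()
        k = max(1, int(n * (1.0 - self.sparsity)))  # number of weights to keep
        threshold = self.scores.flatten().kthvalue(n - k + 1).values
        hard_mask = (self.scores >= threshold).float()
        # Straight-through estimator: forward uses the binary mask,
        # gradients flow back to the continuous scores.
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)


# One random weight matrix, frozen at initialization, shared by all layers;
# only the per-layer masks differ.
shared = nn.Parameter(0.02 * torch.randn(512, 512), requires_grad=False)
layers = nn.ModuleList(SharedWeightMaskedLinear(shared) for _ in range(6))
```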
Year | Venue | DocType
---|---|---
2021 | EMNLP | Conference

Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors (5)
Name | Order | Citations | PageRank |
---|---|---|---
Sheng Shen | 1 | 1 | 1.02 |
Zhewei Yao | 2 | 31 | 10.58 |
Douwe Kiela | 3 | 0 | 0.34 |
Kurt Keutzer | 4 | 5040 | 801.67 |
Michael W. Mahoney | 5 | 3297 | 218.10 |