Memory-Efficient Differentiable Transformer Architecture Search. - Citegraph

Paper Info

Title
Memory-Efficient Differentiable Transformer Architecture Search.

Abstract
Differentiable architecture search (DARTS) has been successfully applied in many vision tasks. However, directly using DARTS for Transformers is memory-intensive, which renders the search process infeasible. To this end, we propose a multi-split reversible network and combine it with DARTS. Specifically, we devise a backpropagation-with-reconstruction algorithm so that we only need to store the last layer's outputs. Relieving the memory burden of DARTS allows us to search with a larger hidden size and more candidate operations. We evaluate the searched architecture on three sequence-to-sequence datasets, i.e., WMT'14 English-German, WMT'14 English-French, and WMT'14 English-Czech. Experimental results show that our network consistently outperforms standard Transformers across the tasks. Moreover, our method compares favorably with big-size Evolved Transformers, reducing search computation by an order of magnitude.
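
The reconstruction idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's multi-split design: it assumes the simpler, well-known two-split reversible coupling, and the sub-layers `f` and `g` are hypothetical stand-ins for real Transformer operations. It only shows why storing the last layer's outputs is enough to recover every earlier activation.

```python
import math

def f(x):
    # Hypothetical sub-layer F (placeholder for e.g. attention); any deterministic function works.
    return [math.tanh(v) for v in x]

def g(x):
    # Hypothetical sub-layer G (placeholder for e.g. a feed-forward block).
    return [0.5 * v * v for v in x]

def forward(x1, x2):
    # Two-split reversible coupling: y1 = x1 + F(x2), y2 = x2 + G(y1).
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def reconstruct(y1, y2):
    # Invert the coupling: x2 = y2 - G(y1), then x1 = y1 - F(x2).
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

def run_stack(x1, x2, depth):
    # Forward pass through `depth` layers, keeping only the final outputs.
    for _ in range(depth):
        x1, x2 = forward(x1, x2)
    return x1, x2

def recover_inputs(y1, y2, depth):
    # Walk back layer by layer, rebuilding each layer's inputs from its outputs.
    for _ in range(depth):
        y1, y2 = reconstruct(y1, y2)
    return y1, y2

x1, x2 = [0.1, -0.2], [0.3, 0.4]
out1, out2 = run_stack(x1, x2, depth=4)
rec1, rec2 = recover_inputs(out1, out2, depth=4)
```

In a real training loop, each reconstructed input would be used to recompute that layer's gradients during the backward pass, trading extra computation for lower activation memory.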

Year	Venue	DocType	Volume	Citations	PageRank	References	Authors
2021	ACL/IJCNLP	Conference	2021.findings-acl	0	0.34	0	6

Authors (6 rows)

Name	Order	Citations	PageRank
Yuekai Zhao	1	1	1.71
Li Dong	2	582	31.86
Yelong Shen	3	709	35.97
Zhihua Zhang	4	646	62.89
Furu Wei	5	1956	107.57
Weizhu Chen	6	597	38.77
