Abstract
A common technique for compressing a neural network is to compute the rank-$k$ $\ell_2$ approximation $A_k$ of the matrix $A \in \mathbb{R}^{n \times d}$ that corresponds to a fully connected layer (or embedding layer), via SVD. Here, $d$ is the number of input neurons in the layer, $n$ is the number of neurons in the next layer, and $A_k$ is stored in $O((n+d)k)$ memory instead of $O(nd)$. A fine-tuning step is then used to improve this initial compression. However, end users may not have the computational resources, time, or budget to run this fine-tuning stage. Furthermore, the original training set may not be available. In this paper, we provide an algorithm for compressing neural networks with a similar initial compression time (to common techniques) but without the fine-tuning step. The main idea is to replace the rank-$k$ $\ell_2$ approximation with an $\ell_p$ approximation, for $p \in [1,2]$, which is known to be less sensitive to outliers but much harder to compute. Our main technical result is a practical and provable approximation algorithm that computes it for any $p \ge 1$, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing the networks BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage.
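The baseline compression the abstract refers to (truncated SVD, which yields the optimal rank-$k$ $\ell_2$ approximation by the Eckart–Young theorem) can be sketched in a few lines of NumPy. This illustrates only the standard compression step and its $O((n+d)k)$ storage, not the paper's $\ell_p$ algorithm; the layer sizes and function name are hypothetical.

```python
import numpy as np

def rank_k_svd_compress(A: np.ndarray, k: int):
    """Return factors (L, R) with A ~= L @ R, the optimal rank-k
    l2 (Frobenius) approximation of A, computed via truncated SVD."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    L = U[:, :k] * S[:k]   # shape (n, k): left factor, singular values absorbed
    R = Vt[:k, :]          # shape (k, d): right factor
    return L, R

# Hypothetical fully connected layer: d = 3072 input neurons, n = 768 outputs.
A = np.random.randn(768, 3072)
L, R = rank_k_svd_compress(A, k=64)
print(L.size + R.size, "stored values vs.", A.size)  # (n+d)k vs. n*d
```

At inference time, the compressed layer replaces the single product $Ax$ with two smaller ones, $L(Rx)$, so the memory saving also reduces the number of multiply-adds whenever $k < nd/(n+d)$.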
| Field | Value |
|---|---|
| Year | 2021 |
| DOI | 10.3390/s21165599 |
| Venue | SENSORS |
| Keywords | matrix factorization, neural networks compression, robust low rank approximation, Löwner ellipsoid |
| DocType | Journal |
| Volume | 21 |
| Issue | 16 |
| ISSN | 1424-8220 |
| Citations | 0 |
| PageRank | 0.34 |
| References | 0 |
| Authors | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tukan Murad | 1 | 0 | 2.03 |
Alaa Maalouf | 2 | 0 | 0.34 |
Matan Weksler | 3 | 0 | 0.34 |
Dan Feldman | 4 | 94 | 5.06 |