Abstract
Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single model. We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases.
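Read as an algorithm, the abstract combines two ingredients: a cheap proxy that scores each frozen expert on the target data, and adapters that let many experts share one backbone. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the leave-one-out k-NN score is one plausible cheap proxy, the `experts` mapping and `select_expert` helper are hypothetical interfaces, and `ResidualAdapter` is a generic Houlsby-style bottleneck module rather than the paper's exact architecture.

```python
import numpy as np
from collections import Counter

def knn_proxy_accuracy(features, labels, k=5):
    """Leave-one-out k-NN accuracy on frozen features: a cheap proxy
    for transfer quality that requires no fine-tuning of the expert."""
    sq = (features ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    correct = 0
    for i, y in enumerate(labels):
        nearest = np.argpartition(dists[i], k)[:k]
        vote = Counter(labels[j] for j in nearest).most_common(1)[0][0]
        correct += (vote == y)
    return correct / len(labels)

def select_expert(experts, images, labels, k=5):
    """Score every expert with the proxy and keep the best one.

    `experts` is a hypothetical mapping from expert name to a frozen
    feature-extractor callable; only the winner is later fine-tuned."""
    scores = {name: knn_proxy_accuracy(fe(images), labels, k)
              for name, fe in experts.items()}
    return max(scores, key=scores.get)

class ResidualAdapter:
    """Generic bottleneck adapter (in the spirit of Houlsby et al., 2019):
    a small residual MLP added to a frozen shared backbone, so that many
    experts can share one backbone and differ only in adapter weights."""
    def __init__(self, dim, bottleneck, rng):
        self.down = rng.normal(scale=0.01, size=(dim, bottleneck))
        self.up = np.zeros((bottleneck, dim))  # zero init: starts as identity

    def __call__(self, x):
        return x + np.maximum(x @ self.down, 0.0) @ self.up  # ReLU bottleneck
```

Because the proxy touches only the small target dataset and the frozen expert features, selection cost grows with the number of experts but never with the pre-training data, which is where the abstract's claimed 2-3 orders of magnitude speed-up over approaches that revisit the source data would come from.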
Year | Venue | DocType |
---|---|---
2021 | ICLR | Conference |
Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors (8)
Name | Order | Citations | PageRank |
---|---|---|---
Joan Puigcerver | 1 | 1 | 2.08 |
Carlos Riquelme | 2 | 11 | 2.30 |
Basil Mustafa | 3 | 5 | 3.15
Cedric Renggli | 4 | 9 | 4.23
André Susano Pinto | 5 | 0 | 1.69 |
Sylvain Gelly | 6 | 760 | 59.74 |
Daniel Keysers | 7 | 1737 | 140.59 |
Neil Houlsby | 8 | 153 | 14.73 |