Abstract |
---|
Efficient deep learning inference requires algorithm and hardware codesign to enable specialization: we usually need to change the algorithm to reduce memory footprint and improve energy efficiency. However, the extra degree of freedom from neural architecture design makes the design space much larger: it is not only about designing the hardware architecture but also about codesigning the neural architecture to fit the hardware architecture. It is difficult for human engineers to exhaustively explore this design space with heuristics. We propose design automation techniques for architecting efficient neural networks given a target hardware platform. We investigate automatically designing specialized and fast models, automated channel pruning, and automated mixed-precision quantization. We demonstrate that such learning-based, automated design achieves better performance and efficiency than rule-based human design. Moreover, we shorten the design cycle by 200× compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/MM.2019.2953153 | IEEE Micro |
Keywords | Field | DocType
---|---|---|
AutoML, Neural Architecture Search, Channel Pruning, Mixed-Precision, Quantization, Specialization, Efficient Inference | Computer science, Parallel computing, Artificial neural network, Distributed computing | Journal
Volume | Issue | ISSN
---|---|---|
40 | 1 | 0272-1732
Citations | PageRank | References
---|---|---|
2 | 0.37 | 0
Authors |
---|
8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Han Cai | 1 | 223 | 10.39 |
Ji Lin | 2 | 79 | 8.18
Yujun Lin | 3 | 101 | 7.03 |
Zhijian Liu | 4 | 59 | 9.80 |
Kuan Wang | 5 | 45 | 3.06 |
Tianzhe Wang | 6 | 10 | 1.79 |
Ligen Zhu | 7 | 83 | 5.19 |
Song Han | 8 | 2102 | 79.81 |