Abstract |
---|
To facilitate the efficient execution of convolutional neural networks (CNNs) on cloud servers, this paper proposes Yin Yang (YY), an input-driven synergistic deep learning system, which dynamically distributes CNN computation between a complex (Yang) and a simple (Yin) CNN. YY runs most inferences on Yin and invokes Yang only when Yin has low confidence. On average, compared to the traditional CNN-as-a-service approach, YY improves datacenter throughput by 1.8× and reduces inference latency by 31% on an NVIDIA TITAN X GPU without any accuracy loss across 21 CNNs. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/PACT.2019.00065 | 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT) |
Keywords | Field | DocType |
efficient neural network, inference, cloud servers | Convolutional neural network,Computer science,Inference,Latency (engineering),Parallel computing,Server,Artificial intelligence,Throughput,Deep learning,Artificial neural network,Computation | Conference |
ISSN | ISBN | Citations |
1089-795X | 978-1-7281-3614-1 | 0 |
PageRank | References | Authors |
0.34 | 2 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Babak Zamirai | 1 | 58 | 3.64 |
Salar Latifi | 2 | 1 | 1.04 |
Scott Mahlke | 3 | 4811 | 312.08 |
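
The dispatch idea described in the abstract (run the simple Yin network first, fall back to the complex Yang network only when Yin is not confident) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how confidence is measured, so the sketch assumes the top softmax probability as the confidence score, and the names `cascade_predict`, `yin`, `yang`, and the default `threshold=0.9` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping an input to a sequence of class logits.
Model = Callable[[object], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


def cascade_predict(
    x: object,
    yin: Model,
    yang: Model,
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[int, str]:
    """Return (predicted label, name of the model that produced it).

    Assumption: confidence is the top softmax probability of the Yin output.
    """
    yin_probs = softmax(yin(x))
    label = top_class(yin_probs)
    if yin_probs[label] >= threshold:
        # Yin is confident enough: skip the expensive model.
        return label, "yin"
    # Low confidence: fall back to the complex model.
    yang_probs = softmax(yang(x))
    return top_class(yang_probs), "yang"
```

With such a scheme, the fraction of inputs answered by the cheap model depends on the threshold, so raising it trades throughput for more frequent use of the complex model.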