Title
POSTER: Pairing Up CNNs for High Throughput Deep Learning
Abstract
To facilitate the efficient execution of convolutional neural networks (CNNs) on cloud servers, this paper proposes Yin Yang (YY), an input-driven synergistic deep learning system, which dynamically distributes CNN computation between a complex (Yang) and a simple (Yin) CNN. YY runs most of the inferences on Yin, while Yang is invoked only when Yin has low confidence. On average, compared to the traditional CNN-as-a-service approach, YY improves datacenter throughput by 1.8× and reduces inference latency by 31% on an NVIDIA TITAN X GPU without any accuracy loss across 21 CNNs.
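The dispatch rule described in the abstract (answer with Yin, fall back to Yang when Yin is unsure) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how confidence is measured, so the sketch assumes the highest class probability serves as the confidence score. The names `cascade_predict`, `toy_yin`, `toy_yang` and the threshold value 0.9 are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A "model" here is any callable that maps an input to class probabilities.
Probabilities = Sequence[float]
Model = Callable[[object], Probabilities]


def argmax(probs: Probabilities) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def cascade_predict(x, yin: Model, yang: Model,
                    threshold: float = 0.9) -> Tuple[int, str]:
    """Return (predicted class index, name of the model that answered).

    Assumption: confidence is the top class probability of the simple model,
    and 0.9 is an arbitrary illustrative threshold.
    """
    probs = yin(x)                  # cheap model runs on every input
    label = argmax(probs)
    if probs[label] >= threshold:   # confident enough: stop here
        return label, "yin"
    probs = yang(x)                 # low confidence: invoke the complex model
    return argmax(probs), "yang"


# Hypothetical stand-ins for the two CNNs, for demonstration only.
def toy_yin(x) -> Probabilities:
    return [0.95, 0.05] if x == "easy" else [0.55, 0.45]


def toy_yang(x) -> Probabilities:
    return [0.10, 0.90]


if __name__ == "__main__":
    print(cascade_predict("easy", toy_yin, toy_yang))
    print(cascade_predict("hard", toy_yin, toy_yang))
```

In this sketch the "easy" input is answered by the simple model alone, while the "hard" input triggers the complex model. Any throughput gain depends on how often the first branch is taken, which the threshold controls.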
Year: 2019
DOI: 10.1109/PACT.2019.00065
Venue: 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Keywords: efficient neural network, inference, cloud servers
Field: Convolutional neural network, Computer science, Inference, Latency (engineering), Parallel computing, Server, Artificial intelligence, Throughput, Deep learning, Artificial neural network, Computation
DocType: Conference
ISSN: 1089-795X
ISBN: 978-1-7281-3614-1
Citations: 0
PageRank: 0.34
References: 2
Authors: 3
Name | Order | Citations | PageRank
Babak Zamirai | 1 | 58 | 3.64
Salar Latifi | 2 | 1 | 1.04
Scott Mahlke | 3 | 4811 | 312.08