Title
E<sup>2</sup>bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services
Abstract
We tackle existing problems in deep learning serving on GPUs from a systems perspective. GPUs have been widely adopted to serve online deep learning-based services that have stringent quality-of-service (QoS) requirements. However, emerging deep learning serving systems often deliver poor responsiveness and low inference throughput, which damage the user experience and increase the number of GPUs required to host an online service. Our investigation shows that inefficient batching and the lack of overlap between data transfer and computation are the root causes of the poor responsiveness and low throughput. To this end, we propose E<sup>2</sup>bird, a deep learning serving system that comprises a GPU-resident memory pool, a multi-granularity inference engine, and an elastic batch scheduler. The memory pool eliminates unnecessary waiting in the batching operation and enables data transfer-computation overlap. The inference engine enables concurrent execution of different batches, improving GPU resource utilization. The batch scheduler organizes inferences elastically to guarantee the QoS. Our experimental results on an Nvidia Titan RTX GPU show that, compared with TensorFlow Serving, E<sup>2</sup>bird reduces the response latency of inferences by up to 82.4 percent and improves the throughput by up to 62.8 percent while guaranteeing the QoS target.
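The idea of organizing inferences elastically under a QoS target can be illustrated with a minimal sketch. This is not the scheduler from the paper: the `Request` type, the `pick_batch` function, and the linear latency model with its constants are all hypothetical, chosen only to show how a batch size might be adapted to the time the oldest queued request has already waited.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    req_id: int
    arrival_ms: float  # time the request entered the queue


def predicted_latency_ms(batch_size: int, base_ms: float = 4.0,
                         per_item_ms: float = 1.5) -> float:
    """Hypothetical linear latency model; real values would come from profiling."""
    return base_ms + per_item_ms * batch_size


def pick_batch(queue: List[Request], now_ms: float, qos_ms: float,
               max_batch: int = 32) -> List[Request]:
    """Return the largest FIFO prefix of the queue that still meets the QoS target.

    Assumes the queue is ordered by arrival time, so the first request has
    waited longest; if it meets the deadline, every request in the batch does.
    """
    if not queue:
        return []
    waited_ms = now_ms - queue[0].arrival_ms
    best = 1  # always serve at least the oldest request, even if it is already late
    for size in range(1, min(len(queue), max_batch) + 1):
        if waited_ms + predicted_latency_ms(size) <= qos_ms:
            best = size
    return queue[:best]
```

With ten requests that arrived at 0 to 9 ms, a call at `now_ms=10.0` with `qos_ms=20.0` leaves 10 ms of budget for the oldest request, so this sketch would pick a batch of four; a looser target lets the batch grow to the whole queue.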
Year: 2021
DOI: 10.1109/TPDS.2020.3047638
Venue: IEEE Transactions on Parallel and Distributed Systems
Keywords: GPUs, DL serving, latency, throughput, responsiveness
DocType: Journal
Volume: 32
Issue: 6
ISSN: 1045-9219
Citations: 3
PageRank: 0.37
References: 0
Authors: 6
Name | Order | Citations | PageRank
Weihao Cui | 1 | 13 | 3.27
Quan Chen | 2 | 175 | 21.86
Han Zhao | 3 | 8 | 1.81
Mengze Wei | 4 | 3 | 0.37
Xiaoxin Tang | 5 | 6 | 0.79
Minyi Guo | 6 | 3969 | 332.25