Title
Collage Inference: Using Coded Redundancy for Lowering Latency Variation in Distributed Image Classification Systems
Abstract
MLaaS (ML-as-a-Service) offerings by cloud computing platforms are becoming increasingly popular. Hosting pre-trained machine learning models in the cloud enables elastic scalability as the demand grows. But providing low latency and reducing the latency variance is a key requirement. Variance is harder to control in a cloud deployment due to uncertainties in resource allocations across many virtual instances. We propose the collage inference technique, which uses a novel convolutional neural network model, collage-cnn, to provide low-cost redundancy. A collage-cnn model takes a collage image formed by combining multiple images and performs multi-image classification in one shot, albeit at slightly lower accuracy. We augment a collection of traditional single image classifier models with a single collage-cnn classifier, which acts as their low-cost redundant backup. Collage-cnn provides backup classification results if any single image classification requests experience a slowdown. Deploying the collage-cnn models in the cloud, we demonstrate that the 99th percentile tail latency of inference can be reduced by 1.2× to 2× compared to replication-based approaches while providing high accuracy. Variation in inference latency can be reduced by 1.8× to 15×.
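The two ideas in the abstract, forming a collage from several images and falling back to the collage result for slow requests, can be illustrated with a minimal sketch. This is not the authors' implementation: the square grid layout, the nested-list image representation, and the names `make_collage` and `merge_predictions` are assumptions made for illustration, and the classifier models themselves are omitted.

```python
from typing import List, Optional, Sequence

# Hypothetical image type: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


def make_collage(images: Sequence[Image], grid: int) -> Image:
    """Tile grid*grid equally sized images into one collage (row-major order).

    The square grid layout is an assumption for this sketch.
    """
    if len(images) != grid * grid:
        raise ValueError("expected exactly grid*grid images")
    height = len(images[0])
    collage: Image = []
    for tile_row in range(grid):
        tiles = images[tile_row * grid:(tile_row + 1) * grid]
        for y in range(height):
            # Concatenate pixel row y of every tile in this row of the grid.
            collage.append([px for tile in tiles for px in tile[y]])
    return collage


def merge_predictions(single: Sequence[Optional[str]],
                      backup: Sequence[str]) -> List[str]:
    """Prefer each single-image result; use the collage result for stragglers.

    A value of None in `single` marks a request that did not finish in time
    (hypothetical convention for this sketch).
    """
    if len(single) != len(backup):
        raise ValueError("need one backup prediction per image")
    return [s if s is not None else b for s, b in zip(single, backup)]
```

For example, with four 2×2 images in a 2×2 grid, `make_collage` yields a single 4×4 image, and `merge_predictions(["cat", None, "dog", "car"], ["cat", "bird", "dog", "bus"])` keeps the three single-image results and fills the missing second slot with the collage prediction `"bird"`.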
Year
2020
DOI
10.1109/ICDCS47774.2020.00024
Venue
2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)
Keywords
Cloud Computing, Availability, Stragglers, Neural Networks, Deep Learning, Coded Computing, Redundancy, Machine Learning, Tail latency
DocType
Conference
ISSN
1063-6927
ISBN
978-1-7281-7003-9
Citations
1
PageRank
0.38
References
0
Authors
5
Name, Order, Citations, PageRank
Hema Venkata Krishna Giri Narra, 1, 1, 0.38
Zhifeng Lin, 2, 10, 6.30
Ganesh Ananthanarayanan, 3, 1407, 72.93
Amir Salman Avestimehr, 4, 1880, 157.39
Murali Annavaram, 5, 1685, 113.77