Title
Astrea: Auto-Serverless Analytics Towards Cost-Efficiency and QoS-Awareness
Abstract
By simplifying code deployment to a one-click upload and offering lightweight execution, serverless computing has emerged as an increasingly popular paradigm. However, open challenges remain in adapting data-intensive analytics applications to the serverless context: users of serverless analytics find it difficult to coordinate computation across stages and to provision resources within a large configuration space. This paper presents our design and implementation of Astrea, which autonomously configures and orchestrates serverless analytics jobs while taking flexibly specified user requirements into account. Astrea relies on performance and cost models that characterize the intricate interplay among multi-dimensional factors (e.g., function memory size and the degree of parallelism at each stage). We formulate an optimization problem driven by user-specific requirements for either performance enhancement or cost reduction, and develop a set of graph-theoretic algorithms to determine the optimal job execution. We deploy Astrea on the AWS Lambda platform and conduct real-world experiments with representative benchmarks, including Big Data analytics and machine learning workloads, at different scales. Extensive results demonstrate that Astrea achieves the optimal execution decision for serverless data analytics compared with various provisioning and deployment baselines. For example, compared with three provisioning baselines, Astrea reduces job completion time by 21% to 69% under a given budget constraint, and saves 20% to 84% of the cost without violating performance requirements.
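The abstract describes the approach only at a high level. As a rough illustration of how a per-stage performance/cost model and a graph formulation can fit together, the Python sketch below treats each stage as one layer of a graph whose nodes are (memory size, parallelism) choices, so that a source-to-sink path is a complete job configuration. It then finds the fastest path within a cost budget, or the cheapest path within a completion-time bound. This is not Astrea's model or algorithm: the linear speedup model, the `Stage` fields, the helper names (`stage_time_and_cost`, `pareto_prune`, `plan_job`), the assumption that stages run back to back, and the price constants are all hypothetical placeholders, and the search uses a swapped-in Pareto-label method for the resource-constrained shortest path problem.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical placeholder prices for the toy cost model (not authoritative AWS figures).
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.0000002


@dataclass(frozen=True)
class Stage:
    name: str
    work: float      # hypothetical: seconds the stage would take on one 1 GB function
    overhead: float  # hypothetical: fixed startup/coordination seconds for the stage


def stage_time_and_cost(stage, memory_mb, parallelism):
    """Toy model: work splits evenly across functions and speed scales linearly with memory."""
    memory_gb = memory_mb / 1024
    duration = stage.work / (parallelism * memory_gb) + stage.overhead
    # Each of the `parallelism` functions is billed for `duration` per GB-second plus one request.
    cost = parallelism * (duration * memory_gb * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST)
    return duration, cost


def pareto_prune(labels):
    """Keep only (time, cost, plan) labels that no other label beats on both time and cost."""
    kept, best_cost = [], float("inf")
    for t, c, plan in sorted(labels, key=lambda lab: (lab[0], lab[1])):
        if c < best_cost:  # strictly cheaper than every faster-or-equal label kept so far
            kept.append((t, c, plan))
            best_cost = c
    return kept


def plan_job(stages, memory_options, parallelism_options, budget=None, deadline=None):
    """Constrained shortest path over a layered graph: one layer per stage, one node per
    (memory, parallelism) choice. Assumes stages run sequentially, so times add up."""
    nodes = list(product(memory_options, parallelism_options))
    labels = [(0.0, 0.0, ())]  # partial paths ending at the previous layer
    for stage in stages:
        extended = []
        for (t, c, plan), (mem, par) in product(labels, nodes):
            dt, dc = stage_time_and_cost(stage, mem, par)
            extended.append((t + dt, c + dc, plan + ((stage.name, mem, par),)))
        labels = pareto_prune(extended)  # dominated partial paths can never win later
    if budget is not None:  # performance goal: fastest plan within the cost budget
        feasible = [lab for lab in labels if lab[1] <= budget]
        return min(feasible, key=lambda lab: lab[0]) if feasible else None
    if deadline is not None:  # cost goal: cheapest plan meeting the completion-time bound
        feasible = [lab for lab in labels if lab[0] <= deadline]
        return min(feasible, key=lambda lab: lab[1]) if feasible else None
    return min(labels, key=lambda lab: lab[0])
```

For example, `plan_job([Stage("map", 120.0, 0.5), Stage("reduce", 30.0, 0.8)], [512, 1024, 2048], [1, 4, 16], budget=0.0028)` returns the fastest `(time, cost, plan)` triple whose modeled cost fits the budget, or `None` if no configuration does. Pruning dominated labels keeps the search exact on a layered graph: every extension adds the same time and cost to any two partial paths, so a partial path that is both slower and costlier can never overtake the one that dominates it.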
Year
2022
DOI
10.1109/TPDS.2022.3172069
Venue
IEEE Transactions on Parallel and Distributed Systems
Keywords
Cloud computing, serverless computing, resource provisioning, modeling, optimization
DocType
Journal
Volume
33
Issue
12
ISSN
1045-9219
Citations
0
PageRank
0.34
References
21
Authors
4
Name | Order | Citations | PageRank
Jananie Jarachanthan | 1 | 0 | 0.34
Li Chen | 2 | 2 | 1.06
Fei Xu | 3 | 0 | 0.68
Baochun Li | 4 | 9416 | 614.20