Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks - Citegraph

Paper Info

Title
Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks

Abstract
Deep Neural Networks (DNNs) have reinvigorated real-world applications that rely on learning patterns of data and are permeating into different industries and markets. Cloud infrastructure and accelerators that offer INFerence-as-a-Service (INFaaS) have become the enabler of this rather quick and invasive shift in the industry. To that end, mostly accelerator-based INFaaS (Google's TPU [1], NVIDIA T4 [2], Microsoft Brainwave [3], etc.) has become the backbone of many real-life applications. However, as the demand for such services grows, merely scaling-out the number of accelerators is not economically cost-effective. Although multi-tenancy has propelled datacenter scalability, it has not been a primary factor in designing DNN accelerators due to the arms race for higher speed and efficiency. This paper sets out to explore this timely requirement of multi-tenancy through a new dimension: dynamic architecture fission. To that end, we define Planaria that can dynamically fission (break) into multiple smaller yet full-fledged DNN engines at runtime. This microarchitectural capability enables spatially co-locating multiple DNN inference services on the same hardware, offering simultaneous multi-tenant DNN acceleration. To realize this dynamic reconfigurability, we first devise breakable omni-directional systolic arrays for DNN acceleration that allow omni-directional flow of data. Second, it uses this capability and a unique organization of on-chip memory, interconnection, and compute resources to enable fission in systolic array based DNN accelerators. Architecture fission and its associated flexibility enable an extra degree of freedom for task scheduling that even allows breaking the accelerator with regard to the server load, DNN topology, and task priority. As such, it can simultaneously co-locate DNNs to enhance utilization, throughput, QoS, and fairness.
We compare the proposed design to PREMA [4], a recent effort that offers multi-tenancy by time-multiplexing the DNN accelerator across multiple tasks. We use the same frequency, the same amount of compute and memory resources for both accelerators. The results show significant benefits with (soft, medium, hard) QoS requirements, in throughput (7.4×, 7.2×, 12.2×), SLA satisfaction rate (45%, 15%, 16%), and fairness (2.1×, 2.3×, 1.9×).

Year	DOI	Venue
2020	10.1109/MICRO50266.2020.00062	2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
Keywords	DocType	ISBN
Accelerators,Deep Neural Networks,DNN,DNN Acceleration,Multi-Tenancy,Spatial DNN Task Co-Location,Multi-Tenant DNN Acceleration,Dynamic Architecture Fission,Omni-Directional Systolic Arrays	Conference	978-1-7281-7384-9
Citations	PageRank	References
5	0.44	51
Authors
12

Authors (12 rows)

Name	Order	Citations	PageRank
Soroush Ghodrati	1	13	1.94
Byung Hoon Ahn	2	5	0.44
Joon Kyung Kim	3	60	2.38
Sean Kinzer	4	7	2.14
Brahmendra Reddy Yatham	5	5	0.44
Navateja Alla	6	5	0.44
Hardik Sharma	7	5	0.44
Mohammad Alian	8	38	6.22
Eiman Ebrahimi	9	479	17.23
Nam Sung Kim	10	3268	225.99
Cliff Young	11	411	46.31
Hadi Esmaeilzadeh	12	1443	69.71
