Title
ParallelFusion: Towards Maximum Utilization of Mobile GPU for DNN Inference
Abstract
Mobile GPUs are heavily under-utilized for DNN computations across different mobile deep learning frameworks and multiple DNNs of various complexities. We explore the feasibility of batching, which improves throughput by up to 35%. However, real-time mobile applications issue too few requests to benefit from batching. To tackle this challenge, we present ParallelFusion, a technique that enables concurrent execution of heterogeneous operators to further utilize the mobile GPU. We implemented ParallelFusion on top of the MNN framework and evaluated it on 6 state-of-the-art DNNs. Our evaluation shows that ParallelFusion achieves up to 195% and 218% of the throughput of single DNN inference with fused execution of 2 and 3 operators, respectively.
Year: 2021
DOI: 10.1145/3469116.3470014
Venue: MOBISYS
DocType: Conference
Citations: 1
PageRank: 0.36
References: 0
Authors: 3
Name         Order  Citations  PageRank
Jingyu Lee   1      3          1.45
Yunxin Liu   2      694        54.18
Youngki Lee  3      832        70.33