Abstract |
---|
Tensor computation, i.e., computation on high-dimensional arrays, is widely used in deep learning, image processing, and scientific computing, and GPUs have become the mainstream platform for accelerating it. We propose an algorithm that efficiently finds a promising schedule to exploit the parallelism and locality of a computation on a GPU. In particular, we design an empirical model that jointly considers the locality, load balance, and parallelism sufficiency of a computation on a given GPU model in order to measure the quality of a candidate schedule. We also introduce empirical constraints that significantly reduce the schedule search space, to polynomial complexity in terms of the computation dimensions. Compared with the state-of-the-art tool Tensor Comprehensions, our algorithm finds a promising schedule 5-45× faster, and the corresponding scheduled code runs 1.5-10× faster. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00084 | 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) |
Keywords | DocType | ISBN |
---|---|---|
Tensor computation, scheduling, GPU | Conference | 978-1-7281-4329-3 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yuxiang Zhang | 1 | 11 | 15.58 |
Yu Zhang | 2 | 109 | 20.13 |