Abstract |
---|
The rapid growth of the Internet has led to web applications that produce large, unstructured, sparse datasets (e.g., text and ratings). Machine learning (ML) algorithms underpin many important analytics workloads that extract knowledge from these datasets. This paper characterizes such workloads on a high-end server using real-world datasets and shows that a set of sparse matrix operations dominates runtime. These operations run inefficiently because of their low compute-per-byte ratio and challenging thread-scaling behavior. We therefore propose a hardware accelerator that performs these operations with extreme efficiency. Simulations and RTL synthesis to a 14nm ASIC demonstrate significant performance and performance/Watt improvements over conventional processors, with only a small area overhead. |
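
The abstract attributes the inefficiency to a low compute-per-byte ratio. As a hedged illustration only, the sketch below assumes sparse matrix-vector multiplication (SpMV) over a matrix stored in compressed sparse row (CSR) format as a representative sparse operation; the abstract does not say which operations the accelerator targets, and the function names `csr_spmv` and `spmv_flops_per_byte` are hypothetical. Each nonzero contributes one multiply and one add, yet it also requires loading a value and a column index, which keeps arithmetic intensity low.

```python
from typing import List, Sequence


def csr_spmv(indptr: Sequence[int], indices: Sequence[int],
             data: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x, where A is given in CSR format (illustrative sketch)."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]  # 1 multiply + 1 add per nonzero
        y[row] = acc
    return y


def spmv_flops_per_byte(nnz: int, n_rows: int, n_cols: int,
                        value_bytes: int = 8, index_bytes: int = 4) -> float:
    """Rough arithmetic intensity of one CSR SpMV.

    Assumption: every array is streamed from memory exactly once, including
    x (optimistic, since irregular accesses to x may miss in cache).
    """
    flops = 2 * nnz
    bytes_moved = (nnz * (value_bytes + index_bytes)  # data + column indices
                   + (n_rows + 1) * index_bytes       # row pointers
                   + n_cols * value_bytes             # x, read once
                   + n_rows * value_bytes)            # y, written once
    return flops / bytes_moved


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
    y = csr_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0],
                 [1.0, 1.0, 1.0])
    print(y)  # [3.0, 3.0, 9.0]
    print(spmv_flops_per_byte(10**6, 10**5, 10**5))
```

Under the sketch's assumptions (8-byte values, 4-byte indices), at least 12 bytes move per nonzero while only 2 floating-point operations are performed, so the ratio cannot exceed 2/12 ≈ 0.17 flop/byte regardless of matrix size. Real cache behavior and storage formats will shift the exact figure.
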
Year | Venue | Keywords |
---|---|---|
2016 | Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) | Hardware accelerator, analytics, machine learning |
Field | DocType | ISSN |
---|---|---|
Algorithm design, Computer science, Server, Parallel computing, Thread (computing), Application-specific integrated circuit, Real-time computing, Hardware acceleration, Analytics, Sparse matrix, The Internet | Conference | 1530-1591 |
Citations | PageRank | References |
---|---|---|
1 | 0.48 | 13 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Eriko Nurvitadhi | 1 | 399 | 33.08 |
Asit K. Mishra | 2 | 1216 | 46.21 |
Yajun Wang | 3 | 3185 | 163.17 |
Ganesh Venkatesh | 4 | 274 | 17.97 |
Debbie Marr | 5 | 175 | 12.39 |