Abstract | ||
---|---|---|
Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations whose unique, irregular memory access patterns make them fundamentally challenging to accelerate. This paper proposes a lightweight, commodity-DRAM-compliant, near-memory processing solution to accelerate personalized recommendation inference. An in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator-, and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP, which provides a scalable solution to improve system throughput and supports a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques, such as memory-side caching, table-aware packet scheduling, and hot entry profiling, are studied, providing up to $9.8 \times$ memory latency speedup over a highly optimized baseline. Overall, RecNMP offers $4.2 \times$ throughput improvement and 45.8% memory energy savings. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/ISCA45697.2020.00070 | 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) |
DocType | ISSN | ISBN |
Conference | 0884-7495 | 978-1-7281-4661-4 |
Citations | PageRank | References |
19 | 0.83 | 60 |
Authors | ||
21 |
Name | Order | Citations | PageRank |
---|---|---|---|
Liu Ke | 1 | 29 | 3.20 |
Udit Gupta | 2 | 74 | 6.27 |
Benjamin Youngjae Cho | 3 | 19 | 0.83 |
David Brooks | 4 | 5518 | 422.08 |
Vikas Chandra | 5 | 691 | 59.76 |
Utku Diril | 6 | 19 | 0.83 |
Amin Firoozshahian | 7 | 19 | 0.83 |
Kim M. Hazelwood | 8 | 2465 | 110.46 |
Bill Jia | 9 | 126 | 5.90 |
Hsien-Hsin Sean Lee | 10 | 1657 | 102.66 |
Meng Li | 11 | 19 | 1.84 |
Bert Maher | 12 | 19 | 0.83 |
Dheevatsa Mudigere | 13 | 289 | 19.84 |
Maxim Naumov | 14 | 68 | 10.29 |
Martin Schatz | 15 | 19 | 0.83 |
Mikhail Smelyanskiy | 16 | 1160 | 65.96 |
Xiaodong Wang | 17 | 126 | 6.24 |
Brandon Reagen | 18 | 210 | 13.90 |
Carole-Jean Wu | 19 | 432 | 23.81 |
Mark Hempstead | 20 | 980 | 81.39 |
Xuan Zhang | 21 | 93 | 25.30 |