Abstract |
---|
We present FlashEmbedding, a hardware/software co-design solution for storing embedding tables on SSDs for large-scale recommendation inference on memory-capacity-limited systems. FlashEmbedding leverages an embedding-semantic-aware SSD, an embedding-oriented software cache, and pipelining techniques to improve overall performance. We evaluate FlashEmbedding with our FPGA-based prototype SSD on a real-world public dataset. FlashEmbedding achieves up to 17.44x lower latency in embedding lookups and 2.89x lower end-to-end latency than the baseline solution in a memory-capacity-limited system. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3476886.3477511 | APSys '21: Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems |
Keywords | DocType | Citations |
---|---|---|
Recommender systems, Embedding, Solid-state drive (SSD) | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hu Wan | 1 | 0 | 0.34 |
Xuan Sun | 2 | 0 | 1.69 |
Yufei Cui | 3 | 6 | 7.02 |
Chia-Lin Yang | 4 | 1033 | 76.39 |
Tei-Wei Kuo | 5 | 3203 | 326.35 |
Chun Jason Xue | 6 | 0 | 0.68 |