Title
Jointly Optimize Capacity, Latency and Engagement in Large-scale Recommendation Systems
Abstract
As the recommendation systems behind commercial services scale up and apply increasingly sophisticated machine learning models, optimizing computational cost (capacity) and runtime latency becomes important alongside the traditional objective of user engagement. Caching recommended results and reusing them later is a common technique for reducing capacity demand and latency, but the standard caching approach degrades user engagement. To overcome this challenge, this paper presents an approach that optimizes capacity, latency and engagement simultaneously. We propose a smart caching system with a lightweight adjuster model that refreshes cached ranking scores, achieving significant capacity savings without impacting ranking quality. To further optimize latency, we introduce a prefetching strategy that leverages the smart cache. Our production deployment on Facebook Marketplace demonstrates that the approach reduces capacity demand by 50% and p75 end-to-end latency by 35%. While Facebook Marketplace serves as the case study, the approach is applicable to other industrial recommendation systems as well.
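The core idea in the abstract — serve cached ranking scores but refresh them with a cheap adjuster model instead of rerunning the expensive ranker — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class, method names, TTL policy, and the toy age-decay adjuster are all assumptions.

```python
import time

class SmartScoreCache:
    """Sketch of a smart score cache: on a hit, cached scores are
    refreshed by a lightweight adjuster rather than served stale;
    on a miss, the expensive full ranker runs once and fills the cache.
    All names and the adjuster below are illustrative assumptions."""

    def __init__(self, ttl_s=600.0):
        self.ttl_s = ttl_s
        self.store = {}  # user_id -> (timestamp, [(item, score), ...])

    def get(self, user_id, full_ranker, adjuster):
        entry = self.store.get(user_id)
        now = time.time()
        if entry is not None and now - entry[0] < self.ttl_s:
            ts, scored = entry
            age_s = now - ts
            # Cache hit: refresh each stale score with the cheap
            # adjuster, saving the capacity of a full ranking pass.
            refreshed = [(item, adjuster(score, age_s)) for item, score in scored]
            return sorted(refreshed, key=lambda p: p[1], reverse=True)
        # Cache miss or expired entry: pay the full ranking cost, then cache.
        scored = full_ranker(user_id)
        self.store[user_id] = (now, scored)
        return sorted(scored, key=lambda p: p[1], reverse=True)

def toy_adjuster(score, age_s):
    # Illustrative stand-in for the learned adjuster model:
    # decay scores slightly as the cached entry ages.
    return score * (1.0 - 0.01 * min(age_s, 60.0) / 60.0)
```

A second request for the same user within the TTL is served from the cache via the adjuster, so the full ranker runs only once — this is the capacity saving the abstract refers to; the prefetching strategy would populate this cache ahead of the request.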
Year
2021
DOI
10.1145/3460231.3474606
Venue
RECSYS
Keywords
caching, multi-objective optimization, transfer learning
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
7
Name                Order   Citations   PageRank
Hitesh Khandelwal   1       0           0.34
Viet Ha-Thuc        2       0           1.01
Avishek Dutta       3       0           0.68
Yining Lu           4       0           0.34
Nan Du              5       0           0.34
Zhihao Li           6       136         17.95
Qi Hu               7       0           0.34