Title
GLMix: Generalized Linear Mixed Models For Large-Scale Response Prediction
Abstract
Generalized linear models (GLMs) are a widely used class of models for statistical inference and response prediction problems. For instance, to recommend relevant content to a user or to optimize for revenue, many web companies use logistic regression models to predict the probability that a user clicks on an item (e.g., an ad, news article, or job). When data is abundant, a more fine-grained model at the user or item level can potentially yield more accurate predictions, because it better captures a user's personal preferences for items and an item's specific appeal to users. One common approach is to introduce ID-level regression coefficients in addition to the global regression coefficients in a GLM setting; such models are called generalized linear mixed models (GLMix) in the statistical literature. However, for big data sets with a large number of ID-level coefficients, fitting a GLMix model can be computationally challenging. In this paper, we report how we overcame the scalability bottleneck by applying parallelized block coordinate descent under the Bulk Synchronous Parallel (BSP) paradigm. We deployed the model in the LinkedIn job recommender system and generated 20% to 40% more job applications for job seekers on LinkedIn.
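The idea in the abstract, global coefficients plus ID-level coefficients fitted by block coordinate descent, can be illustrated with a small single-process sketch. This is not the authors' implementation: it assumes a logistic model whose per-user coefficients act on the same features as the global ones, uses plain gradient descent inside each block, and runs the per-user updates in an ordinary loop where a BSP system would distribute them. All names (`fit_block`, `fit_glmix`, `predict`, `objective`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def fit_block(xs, ys, offsets, coef, l2, steps=100):
    # Gradient descent on summed logistic loss + (l2/2)*||coef||^2, with the
    # other block's contribution held fixed as a per-row offset.
    # Step size 1/L, where L upper-bounds the curvature of this objective.
    lipschitz = 0.25 * sum(dot(x, x) for x in xs) + l2
    coef = list(coef)
    for _ in range(steps):
        grad = [l2 * c for c in coef]
        for x, y, off in zip(xs, ys, offsets):
            err = sigmoid(off + dot(coef, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        coef = [c - g / lipschitz for c, g in zip(coef, grad)]
    return coef

def fit_glmix(rows, dim, l2_global=0.1, l2_user=0.1, sweeps=5):
    # rows: list of (user_id, feature_vector, label in {0, 1}).
    w = [0.0] * dim
    per_user = defaultdict(lambda: [0.0] * dim)
    by_user = defaultdict(list)
    for user, x, y in rows:
        by_user[user].append((x, y))
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    for _ in range(sweeps):
        # Block 1: global coefficients, per-user scores held fixed.
        offsets = [dot(per_user[u], x) for u, x, _ in rows]
        w = fit_block(xs, ys, offsets, w, l2_global)
        # Block 2: each user's coefficients, global scores held fixed.
        # The users are independent here, so this loop could run in parallel.
        for user, data in by_user.items():
            uxs = [x for x, _ in data]
            uys = [y for _, y in data]
            uoff = [dot(w, x) for x in uxs]
            per_user[user] = fit_block(uxs, uys, uoff, per_user[user], l2_user)
    return w, dict(per_user)

def predict(w, per_user, user, x):
    # Unseen users fall back to the global coefficients alone.
    b = per_user.get(user, [0.0] * len(w))
    return sigmoid(dot(w, x) + dot(b, x))

def objective(rows, w, per_user, l2_global=0.1, l2_user=0.1):
    loss = 0.0
    for user, x, y in rows:
        p = predict(w, per_user, user, x)
        loss -= math.log(p) if y == 1 else math.log(1.0 - p)
    loss += 0.5 * l2_global * dot(w, w)
    loss += 0.5 * l2_user * sum(dot(b, b) for b in per_user.values())
    return loss

# Toy data: user "a" responds to positive feature values, user "b" to negative.
features = [-1.0, -0.5, 0.5, 1.0]
rows = [("a", [1.0, f], 1 if f > 0 else 0) for f in features]
rows += [("b", [1.0, f], 1 if f < 0 else 0) for f in features]
w, per_user = fit_glmix(rows, dim=2)
```

On this toy data the two users have opposite preferences, so a purely global model cannot separate them, while the per-user coefficients can. That is the motivation for ID-level coefficients described above.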
Year
2016
DOI
10.1145/2939672.2939684
Venue
KDD
Field
Recommender system,Data mining,Computer science,Generalized linear model,Artificial intelligence,Statistical model,Statistical inference,Coordinate descent,Generalized linear mixed model,Bulk synchronous parallel,Machine learning,Linear regression
DocType
Conference
Citations
16
PageRank
1.12
References
16
Authors
6
Name              Order  Citations  PageRank
XianXing Zhang    1      16         1.12
Bee-Chung Chen    2      1131       62.51
Liang Zhang       3      138        10.45
Yitong Zhou       4      29         2.21
Yiming Ma         5      2451       154.28
Deepak Agarwal    6      1391       83.44