Title
An Extensive Examination of Regression Models with a Binary Outcome Variable
Abstract
Linear regression is among the most popular statistical models in social sciences research, and researchers in various disciplines use linear probability models (LPMs), that is, linear regression models applied to a binary outcome. Surprisingly, LPMs are rare in the IS literature, where researchers typically use logit and probit models for binary outcomes. Researchers have examined specific aspects of LPMs but have not thoroughly evaluated their practical pros and cons for different research goals under different scenarios. We perform an extensive simulation study to evaluate the advantages and dangers of LPMs, especially with respect to big data, which is now common in IS research. We evaluate LPMs for three common uses of binary outcome models: inference and estimation, prediction and classification, and selection bias. We compare their performance to that of logit and probit models under different sample sizes, error distributions, and more. We find that LPMs yield coefficient directions, statistical significance, and marginal effects similar to those of logit and probit models. In addition, LPM estimators are consistent for the true parameters up to a multiplicative scalar. This scalar, although rarely required, can be estimated assuming an appropriate error distribution. For classification and selection bias, LPMs are on par with logit and probit models in terms of class separation and ranking and are a viable alternative in selection models. LPMs are lacking when the predicted probabilities themselves are of interest because predicted probabilities can fall outside the unit interval. We illustrate some of these results by modeling price in online auctions using data from eBay.
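As a rough illustration of two points in the abstract (the LPM recovers the direction of the effect, but its fitted values can leave the unit interval), the following minimal sketch simulates a binary outcome from a logistic model and fits an LPM by ordinary least squares. It is not the authors' simulation design: the function names `simulate` and `fit_lpm`, the single standard-normal predictor, the sample size, and the coefficient values are all assumptions chosen for illustration.

```python
import math
import random


def simulate(n, b0, b1, seed=0):
    """Draw (x, y) pairs with P(y = 1 | x) = logistic(b0 + b1 * x).

    Hypothetical data-generating process, assumed for illustration only.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


def fit_lpm(xs, ys):
    """OLS of the 0/1 outcome on one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed true logit coefficients: intercept 0.0, slope 2.0.
xs, ys = simulate(n=5000, b0=0.0, b1=2.0)
a, b = fit_lpm(xs, ys)

# Fitted "probabilities" from the linear model are not bounded to [0, 1].
fitted = [a + b * x for x in xs]
outside = sum(1 for f in fitted if f < 0.0 or f > 1.0)

print(f"LPM slope: {b:.3f} (true logit coefficient: 2.0)")
print(f"fitted values outside [0, 1]: {outside} of {len(xs)}")
```

In this setup the LPM slope has the same sign as the logit coefficient but a smaller magnitude, in line with the abstract's statement that the estimates agree only up to a multiplicative scalar. Because the fitted values are a monotone function of the predictor, they rank observations in the same order as the true probabilities, even where they fall below 0 or above 1.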
Year
2017
Venue
Journal of the Association for Information Systems
Keywords
Linear Regression, Linear Probability Model, Binary Outcome, Selection Bias, Estimation, Inference, Prediction, Big Data, Logit, Probit
Field
Econometrics, Binomial regression, Computer science, Regression analysis, Proper linear model, Bayesian multivariate linear regression, Statistics, Linear predictor function, Logistic regression, Regression dilution, Linear regression
DocType
Journal
Volume
18
Issue
4
ISSN
1536-9323
Citations
0
PageRank
0.34
References
0
Authors
2
Name            Order  Citations  PageRank
Suneel Chatla   1      0          1.35
Galit Shmueli   2      265        23.00