Title
Comparing Prediction Performance for Crash Injury Severity Among Various Machine Learning and Statistical Methods.
Abstract
Crash injury severity prediction is a promising research target in traffic safety. Traditionally, various statistical methods have been used to model crash injury severity. In recent years, machine learning-based methods have become popular because of their good predictive performance. However, machine learning-based models are often criticized for behaving like black boxes. In this paper, we compare the predictive performance, including prediction accuracy and the estimation of variable importance, of various machine learning and statistical methods with distinct modeling logic for crash severity analysis. Crash severity, road geometry, and traffic flow data were collected at freeway diverge areas in Florida. We estimated two of the most commonly used statistical methods, the ordered probit (OP) model and the multinomial logit model, and four popular machine learning methods: K-Nearest Neighbor, Decision Tree, Random Forest (RF), and Support Vector Machine. The correct prediction rate for each crash severity level and the overall correct prediction rate were calculated. The results showed that the machine learning methods had higher prediction accuracy than the statistical methods, although they suffered from overfitting. The RF method gave the best predictions both overall and for severe crashes, while OP was the weakest. We compared the importance of variables for crash severity via perturbation-based sensitivity analyses. The results showed that the inferences of variable importance drawn from different methods were not always consistent and should be interpreted with care.
Year
2018
DOI
10.1109/ACCESS.2018.2874979
Venue
IEEE ACCESS
Keywords
Crash severity, statistical model, machine learning, accuracy, variable importance
Field
Decision tree, Crash, Traffic flow, Multinomial logistic regression, Computer science, Ordered probit, Support vector machine, Artificial intelligence, Random forest, Machine learning, Statistical analysis
DocType
Journal
Volume
6
ISSN
2169-3536
Citations
0
PageRank
0.34
References
0
Authors
4
Name            Order  Citations  PageRank
Jian Zhang      1      0          0.34
Zhibin Li       2      41         6.93
Ziyuan Pu       3      5          1.75
Chengcheng Xu   4      6          6.32