Title
Supervised Massive Data Analysis for Telecommunication Customer Churn Prediction
Abstract
Customer churn management is increasingly critical for telecommunication companies in the competitive mobile market. To retain customers before they switch to competitors, an accurate churn analysis model is needed to predict which customers are likely to leave within the next two or three months. A two-month window is practical because it gives companies time to design retention strategies, but predicting that far ahead introduces substantial uncertainty and makes the task harder. Customer churn prediction modeling faces three main difficulties. First, real churn data sets are substantially imbalanced. Second, the samples are relatively scattered in feature space. Third, the feature space is high-dimensional, so dimension reduction is necessary for algorithmic efficiency. To overcome these difficulties, we propose a new supervised one-sided sampling technique to pre-process the imbalanced data set. The k-means method is applied to partition the data set into meaningful clusters, and one-sided sampling is then applied within each cluster to remove noisy and redundant negative samples. The random forest method is used for dimension reduction and for selecting important variables. A C5.0 decision tree is the classifier used in this study to predict customer churn two or three months ahead. Data on about 2.7 million fourth-generation (4G) telecommunication customers are used in the experiments. We obtain a precision of 80.42% with a recall of 52.43%. The proposed model provides satisfactory prediction results that can be used in practice to retain customers who are likely to churn.
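The abstract does not spell out the per-cluster sampling step, so the following is only a minimal sketch of one common way to remove noisy negatives: dropping the negative member of every Tomek link (a pair of mutual nearest neighbours with opposite labels) inside each cluster. Several things here are assumptions rather than the authors' method: the function names, the label coding (1 = churner, 0 = non-churner), the use of Tomek links, and the fact that cluster assignments are taken as given (for example from a prior k-means run). Removal of redundant negatives, random forest variable selection, and the C5.0 classifier are left out.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(i, members, points):
    """Index (taken from members) of the point closest to points[i], excluding i."""
    return min((j for j in members if j != i),
               key=lambda j: sq_dist(points[i], points[j]))


def tomek_negatives(members, points, labels):
    """Negative samples (label 0) in one cluster that belong to a Tomek link."""
    if len(members) < 2:
        return set()
    nn = {i: nearest_neighbor(i, members, points) for i in members}
    drop = set()
    for i in members:
        j = nn[i]
        # Tomek link: mutual nearest neighbours with opposite labels.
        if nn[j] == i and labels[i] != labels[j] and labels[i] == 0:
            drop.add(i)
    return drop


def one_sided_clean(points, labels, clusters):
    """Return the indices kept after per-cluster removal of Tomek-link negatives."""
    by_cluster = defaultdict(list)
    for idx, c in enumerate(clusters):
        by_cluster[c].append(idx)
    drop = set()
    for members in by_cluster.values():
        drop |= tomek_negatives(members, points, labels)
    return [i for i in range(len(points)) if i not in drop]


# Hypothetical toy data: two clusters, label 1 = churner, 0 = non-churner.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.2, 1.0),
          (5.0, 5.0), (5.0, 5.3), (6.0, 6.0)]
labels = [0, 1, 0, 0, 0, 0, 1]
clusters = [0, 0, 0, 0, 1, 1, 1]
kept = one_sided_clean(points, labels, clusters)
```

In this toy data, sample 0 is a non-churner whose mutual nearest neighbour is churner 1, so it is treated as borderline noise and dropped; every churner is kept. The brute-force neighbour search is quadratic in cluster size, so a data set of the scale described above would need an indexed nearest-neighbour search instead.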
Year
2016
DOI
10.1109/BDCloud-SocialCom-SustainCom.2016.35
Venue
2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom)
Keywords
customer churn prediction, massive data analysis, imbalanced data, random forest, decision tree
Field
Data mining, Decision tree, Feature vector, Algorithmic efficiency, Telecommunications, Dimensionality reduction, Computer science, Sampling (statistics), Classifier (linguistics), Random forest, Mobile telephony
DocType
Conference
ISBN
978-1-5090-3937-1
Citations
1
PageRank
0.34
References
4
Authors
5
Name            Order  Citations  PageRank
Hui Li          1      1          0.34
Deliang Yang    2      1          0.34
Lingling Yang   3      6          2.24
Yao Lu          4      12         4.66
Xiaola Lin      5      1099       78.09