Supervised Massive Data Analysis for Telecommunication Customer Churn Prediction - Citegraph

Paper Info

Title
Supervised Massive Data Analysis for Telecommunication Customer Churn Prediction

Abstract
Customer churn management is increasingly critical for telecommunication companies in the competitive mobile market. To retain customers before they switch to competitors, an accurate customer churn analysis model is needed to predict which customers are likely to be lost within two or three months. A two-month window is practical for telecommunication companies to design strategies to retain potential lost customers. However, it brings large uncertainty and increases the difficulty of prediction. There are three main difficulties in customer churn prediction modeling. First, real-world customer churn data sets are substantially imbalanced. Second, the samples are relatively scattered in feature space. Third, the feature space is high-dimensional, so dimension reduction is necessary for algorithm efficiency. To overcome these difficulties, we propose a new supervised one-sided sampling technique to pre-process the imbalanced data set. The K-means method is applied to cluster the data set into meaningful clusters, and one-sided sampling is then applied within each cluster to remove noisy and redundant negative samples. The random forest method is used for dimension reduction and for selecting important variables. A C5.0 decision tree is the classifier applied in this study to predict customer churn in two or three months. Data on about 2.7 million fourth-generation (4G) telecommunication customers are used in the experiments. We obtain a precision of 80.42% with a recall of 52.43%. The proposed model provides satisfactory prediction results that can be used in practice to retain potential lost customers.
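
The pre-processing idea in the abstract (cluster first, then thin out negative samples inside each cluster) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify their supervised one-sided sampling, so the sketch substitutes a simpler, well-known step, removing negatives that form Tomek links (mutual nearest neighbours of opposite class), and it omits redundant-sample removal, random forest variable selection, and the C5.0 classifier. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

def nearest(i, idx, points):
    """Index in idx of the point closest to points[i], excluding i itself."""
    return min((j for j in idx if j != i),
               key=lambda j: math.dist(points[i], points[j]))

def tomek_negatives(idx, points, y):
    """Negatives (y == 0) in idx whose mutual nearest neighbour is a positive."""
    drop = set()
    if len(idx) < 2:
        return drop
    for i in idx:
        if y[i] == 0:
            j = nearest(i, idx, points)
            if y[j] == 1 and nearest(j, idx, points) == i:
                drop.add(i)
    return drop

def cluster_then_undersample(points, y, k=2, seed=0):
    """Cluster with k-means, then drop Tomek-link negatives within each cluster.

    Returns the indices of the samples that are kept. Positives are never removed.
    """
    labels = kmeans(points, k, seed=seed)
    drop = set()
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        drop |= tomek_negatives(idx, points, y)
    return [i for i in range(len(points)) if i not in drop]

# Toy data (hypothetical): 1 = churner (positive), 0 = non-churner (negative).
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10.1, 10), (10, 12), (12, 10)]
toy_y = [0, 0, 0, 0, 1, 0, 1, 1]
kept = cluster_then_undersample(toy_points, toy_y, k=2)
```

In the toy data, the negative at (10.1, 10) sits next to a positive and is the kind of noisy sample this step targets. Only negatives are ever removed, which is what makes the sampling one-sided.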

Year	DOI	Venue
2016	10.1109/BDCloud-SocialCom-SustainCom.2016.35	2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom)
Keywords	Field	DocType
customer churn prediction, massive data analysis, imbalanced data, random forest, decision tree	Data mining, Decision tree, Feature vector, Algorithmic efficiency, Telecommunications, Dimensionality reduction, Computer science, Sampling (statistics), Classifier (linguistics), Random forest, Mobile telephony	Conference
ISBN	Citations	PageRank
978-1-5090-3937-1	1	0.34
References	Authors
4	5

Authors (5 rows)

Name	Order	Citations	PageRank
Hui Li	1	1	0.34
Deliang Yang	2	1	0.34
Lingling Yang	3	6	2.24
Yao Lu	4	12	4.66
Xiaola Lin	5	1099	78.09
