Churn Data
OVERVIEW
Introduction
Research questions
Operational churn definition
Data
Survival analysis
Predictive churn models
Tests and results
Conclusions and recommendations
Questions
INTRODUCTION
Mobile telecommunications industry
The market has changed from rapid growth into a state of saturation and fierce competition. The focus has shifted from building a large customer base to retaining existing customers.
INTRODUCTION
Churn
Churn prevention:
Acquiring more loyal customers initially
Identifying customers most likely to churn
INTRODUCTION
Predictive churn modelling
Trained on snapshots of churned and non-churned customers. Disadvantage: the time aspect that these problems often involve is neglected.
Survival analysis
INTRODUCTION
Prepaid versus postpaid
RESEARCH QUESTIONS
Is it possible to make a prepaid churn model based on the theory of survival analysis?
What is a proper, practical and measurable prepaid churn definition?
How well do survival models perform in comparison to the established predictive models?
Do survival models have an added value compared to the established predictive models?
RESEARCH QUESTIONS
To answer the 2nd and 3rd sub-questions, a second predictive model is considered: a decision tree. The two models are compared directly in the tests and results.
+ is used as a threshold.
DATA
Database provided by Vodafone.
Data already aggregated per month.
Only usage and billing information.
Derived variables capture customer behaviour in a better way, for example:
recharge this month (yes/no)
time since last recharge
SURVIVAL ANALYSIS
Survival analysis is a collection of statistical methods which model time-to-event data. The time until the event occurs is of interest. In our case the event is churn.
SURVIVAL ANALYSIS
Survival function S(t):
The survival at time t is the probability that a subject will survive to that point in time.
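In symbols, the definition above can be written as follows. The random variable $T$ (the time at which a subject churns) is notation assumed here; it does not appear on the slides.

```latex
S(t) = P(T > t), \qquad S(0) = 1, \qquad S(t) \text{ is non-increasing in } t .
```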
SURVIVAL ANALYSIS
Hazard rate function:
Discrete time: the probability that the event occurs in the current interval, given that it has not already occurred.
Continuous time: the hazard (rate) at time t describes the instantaneous frequency of occurrence of the event, in events per <time period>.
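The discrete-time reading of the hazard can be illustrated with a small sketch. It assumes no censoring (every customer is observed until churn or until the end of the table), and the function name and the counts are made up for illustration; they are not taken from the Vodafone data.

```python
def hazard_and_survival(at_risk_start, churned_per_month):
    """Return per-month hazard and survival values.

    at_risk_start: number of customers active at the start of month 1.
    churned_per_month: number of customers churning in each month.
    Assumes no censoring, so the risk set only shrinks through churn.
    """
    hazards, survival = [], []
    at_risk = at_risk_start
    surviving_share = 1.0
    for churned in churned_per_month:
        # Hazard: share of customers still at risk who churn this month.
        hazard = churned / at_risk if at_risk else 0.0
        # Survival: probability of not having churned up to and including this month.
        surviving_share *= 1.0 - hazard
        hazards.append(hazard)
        survival.append(surviving_share)
        at_risk -= churned
    return hazards, survival


# Illustrative counts: a constant 10% monthly hazard.
hazards, survival = hazard_and_survival(1000, [100, 90, 81])
```

With a constant hazard the survival curve decays geometrically, which is why the hazard is often the more informative of the two curves when looking for the months in which churn risk peaks.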
SURVIVAL ANALYSIS
commitment date
SURVIVAL ANALYSIS
How can the hazard be adapted to an individual customer?
SURVIVAL MODEL
Cox model
Hazard for individual i at time t.
Baseline hazard: the average hazard curve.
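For reference, the Cox proportional hazards model is usually written as below. The notation is assumed here and may differ from the author's: $h_i(t)$ is the hazard for individual $i$ at time $t$, $h_0(t)$ is the baseline hazard, $x_{ik}$ are the (time-independent) variables of individual $i$, and $\beta_k$ are the regression coefficients.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}\right)
```

The exponential factor scales the baseline hazard up or down for each individual but does not itself change over time.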
SURVIVAL MODEL
Cox model
Drawback: the hazard changes over time only through the baseline hazard, because the variables themselves are fixed in time. We want to include time-dependent covariates: variables that vary over time, e.g. the number of SMS messages per month.
SURVIVAL MODEL
Extended Cox model
Now we can compute the hazard at time t, but what we actually want is a forecast. By the time it can be used, this month's data is already outdated, so the variables must be lagged.
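Lagging can be sketched as pairing each month's churn outcome with the variables observed some months earlier. The sketch below handles one customer; the keys `month`, `sms_count` and `churned`, the function name and the lag of one month are all hypothetical choices for illustration, not the setup used in the study.

```python
def lag_variables(monthly_rows, lag=1):
    """Pair each month's churn label with variables observed `lag` months earlier.

    monthly_rows: rows for ONE customer, ordered by month, each a dict with
    the hypothetical keys 'month', 'sms_count' and 'churned'.
    """
    lagged = []
    for i in range(lag, len(monthly_rows)):
        past, now = monthly_rows[i - lag], monthly_rows[i]
        lagged.append({
            "month": now["month"],
            "sms_count_lagged": past["sms_count"],  # value known `lag` months ago
            "churned": now["churned"],              # outcome to be predicted
        })
    return lagged


# Illustrative rows for a single customer.
rows = [
    {"month": 1, "sms_count": 40, "churned": 0},
    {"month": 2, "sms_count": 25, "churned": 0},
    {"month": 3, "sms_count": 3, "churned": 0},
    {"month": 4, "sms_count": 0, "churned": 1},
]
lagged = lag_variables(rows, lag=1)
```

The first `lag` months drop out because no earlier observation exists for them; a model fitted on the lagged rows only ever uses information that would have been available at prediction time.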
SURVIVAL MODEL
Principal component regression
Principal component analysis (PCA): reduce the dimensionality of the dataset while retaining as much as possible of the variation present in it. The variables are transformed into new ones: the principal components.
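A minimal PCA sketch using NumPy's singular value decomposition is given below. It assumes the variables are only centred, not rescaled, and the data is synthetic; whether the original analysis standardised the variables is not stated on the slides.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the first n_components principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value (and hence decreasing variance).
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = s**2 / (X.shape[0] - 1)
    scores = X_centered @ Vt[:n_components].T
    return scores, explained_variance[:n_components], Vt[:n_components]


# Synthetic data: the second column is almost a multiple of the first,
# so the two are strongly collinear.
rng = np.random.default_rng(0)
calls = rng.normal(size=200)
X = np.column_stack([
    calls,
    2 * calls + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
scores, variances, components = pca(X, n_components=2)
```

The columns of `scores` are uncorrelated by construction, which is the property that makes principal components attractive as regression inputs when the original variables are collinear.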
SURVIVAL MODEL
Principal component regression
Principal component regression: use the principal components as variables in the model. First reason: it reduces collinearity. Collinearity causes inaccurate estimates of the regression coefficients.
SURVIVAL MODEL
Principal component regression
Second reason: it reduces dimensionality. The first 20 components are chosen. This is a safe choice, because the principal components with the largest variances are not necessarily the best predictors.
SURVIVAL MODEL
Extended Cox model
SURVIVAL MODEL
Example
DECISION TREE
Used as a benchmark for the performance of the extended Cox model.
DECISION TREE
Recursive partitioning. An iterative process of splitting the data up into (in this case) two partitions.
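One step of recursive partitioning can be sketched as a search for the best binary split on a single variable. The Gini impurity used below is an assumption (the slides do not name the splitting criterion), and the variable and data are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best split `value <= threshold`."""
    best = (None, gini(labels))  # no split: impurity of the unsplit node
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Illustrative data: months since last recharge versus churn (1 = churned).
months_since_recharge = [0, 1, 1, 2, 5, 6, 7, 8]
churned = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_split(months_since_recharge, churned)
```

A full tree repeats this search over all variables and then applies it again to each of the two resulting partitions until a stopping rule is met.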
DECISION TREE
Optimal tree size
10-fold cross-validation
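The fold construction behind 10-fold cross-validation can be sketched as follows. The function name, the seed and the sample size are illustrative assumptions; how the folds were actually drawn in the study is not stated.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(95, k=10))
```

To choose a tree size, one would typically grow a tree of each candidate size on every training part, average the error over the ten validation parts, and keep the size with the lowest average error.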
DECISION TREE
Oversampling
Oversampling: alter the proportion of the outcomes in the training set by increasing the proportion of the less frequent outcome (churn). Why? Otherwise the tree is not sensitive enough. The proportion is changed to 1/3 churn and 2/3 non-churn.
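One way to reach the 1/3 to 2/3 proportion is to resample churners with replacement, as sketched below. This is an assumption: the slides do not say whether churners were duplicated or non-churners were dropped. The record layout and function name are hypothetical.

```python
import random


def oversample(rows, label_key="churned", target_share=1 / 3, seed=0):
    """Add resampled churners until they make up about target_share of the rows."""
    rng = random.Random(seed)
    churners = [r for r in rows if r[label_key] == 1]
    others = [r for r in rows if r[label_key] == 0]
    # Solve n_churn / (n_churn + len(others)) = target_share for n_churn.
    n_churn = round(target_share * len(others) / (1 - target_share))
    extra = [rng.choice(churners) for _ in range(max(0, n_churn - len(churners)))]
    return others + churners + extra


# Illustrative training set: 10 churners among 100 customers.
training = [{"id": i, "churned": 1 if i < 10 else 0} for i in range(100)]
balanced = oversample(training)
```

Only the training set should be rebalanced in this way; the test set keeps the original proportions so that the measured performance reflects the real churn rate.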
DECISION TREE
Churn definition 1
DECISION TREE
Churn definition 2
Goal: gain insight into the performance of the extended Cox model. Same test set for extended Cox model and decision tree. Direct comparison possible.
The extended Cox model gives satisfactory results, with both a high sensitivity and a high specificity. However, the decision tree performs even better. The time aspect incorporated by the extended Cox model does not provide an advantage over the decision tree in this particular problem.
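For reference, sensitivity and specificity can be computed from actual and predicted labels as sketched below. The labels are invented for illustration and are not the results of either model.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for 0/1 labels, where 1 means churn."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # churners caught
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # churners missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # non-churners left alone
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn)  # share of actual churners predicted as churn
    specificity = tn / (tn + fp)  # share of actual non-churners predicted as non-churn
    return sensitivity, specificity


# Illustrative labels: 4 churners and 6 non-churners.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sensitivity, specificity = sensitivity_specificity(actual, predicted)
```

Evaluating both models on the same test set, as done here, means the two pairs of numbers can be compared directly.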
The results should be put in perspective: they depend on the churn definition. There is already a difference between churn definitions 1 and 2, so a new and different churn definition is likely to yield different results. Is the churn definition too simple? Size of the decision trees.
How well do survival models perform in comparison to the established predictive models?
Survival model = Extended Cox model. Established predictive model = Decision tree. High sensitivity and specificity. However, not better than the decision tree.
Do survival models have an added value compared to the established predictive models?
Models the time aspect through the baseline hazard. Can handle censored data. Allows stratification of customer groups. If only time-independent variables are used, it can predict at a future time.
Is it possible to make a prepaid churn model based on the theory of survival analysis?
Yes! We have shown that it gives results with both a high sensitivity and a high specificity. In this particular prepaid problem, however, it offers no benefit over the decision tree.
Switching of SIM cards.
Neural networks for survival data can handle nonlinear relationships. Other scoring methods.
QUESTIONS