The Architecture of A Churn Prediction System Based On Stream Mining
1. Introduction
2. Preliminaries
As in every business sector, operators are always seeking a better understanding
of their customers’ preferences and satisfaction in order to offer them
better products and services. With the market mature, subscriber growth
slowing, and revenues flat, churn management has become
one of the most pressing problems faced by operators. Many factors
may influence a subscriber’s decision to churn. Some of them are:
• In contrast to post-paid customers, prepaid customers are not bound by
a contract. The central problem concerning prepaid customers is that the
actual churn date in most cases is difficult to assess.
• Customer loyalty is directly related to the customer service and service
experience. Lack of connection capabilities or quality in places where the
customer requires service can cause customers to abandon their current
service provider in favor of others with broader reach or a more robust
network. Besides, slow responses to customer complaints or billing errors
are sure paths to a customer relations disaster.
• Finally, other factors such as pricing, lack of features (customers may
switch carriers for features not provided by their current providers) or
coverage, new technology introduced by competitors (for example, high-speed
data), or the entry of new competitors into the market also
influence the churn rate.
Churn prediction modeling techniques attempt to understand the precise cus-
tomer behaviors and attributes which signal the risk of customer churn. The ac-
curacy of the technique used is obviously critical to the success of any proactive
retention efforts. To run successful loyalty programs, operators need to
analyze their customers along several parameters, such as spending and usage
of each service and product, as well as their behavior and traffic patterns. All these
decisions should be based on the analysis of customer data. Time
is critical, which means that online approaches are gaining momentum as a way to
support operational decisions in almost real time. By means of online analysis, an
operator can quickly launch and execute actions aimed at retaining
customers, and measure the impact of a campaign in real time, which allows
adapting it instantly as required.
3. Requirements
4. Architecture
In this paper, we describe the architecture proposed for our prototype. It is de-
picted in Figure 1, and its main components are described next.
4.2. Integrator
The system processes as input a number of streams with diverse information. For
example, a stream of call records, a stream with billing actions by the company
and bill payments by the subscribers, contents from social networking services
such as Twitter, etc. The Integrator module integrates all these streams,
generating a logically unique stream of events. We distinguish several types of
events, including at least a subscriber joining the company, calls and SMSes,
complaints, bills emitted and bills paid, tweets by a subscriber, and “churn”
events (e.g., a user has explicitly left the company or, in the case of a prepaid
user, has been declared a churner according to the company’s criterion).
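The merging step can be illustrated with a minimal sketch. It assumes that every input stream is already sorted by timestamp and that events carry the fields shown; the `Event` class, its field names, and the `integrate` function are hypothetical and not taken from the prototype.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass(order=True)
class Event:
    """One element of the unified stream (hypothetical field names)."""
    timestamp: float                              # the only field used for ordering
    kind: str = field(compare=False)              # e.g. "call", "complaint", "churn"
    subscriber_id: str = field(compare=False)
    payload: dict = field(compare=False, default_factory=dict)


def integrate(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge time-sorted input streams into one time-sorted stream.

    Assumption: each input stream is itself ordered by timestamp.
    """
    return heapq.merge(*streams)
```

Because `heapq.merge` is lazy, the merged stream can be consumed event by event without buffering the inputs, which suits unbounded streams.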
The Data Manager module manages the customer information database and the
Pending Predictions queue.
The customer information database contains basic information about our
subscribers (age, address, type of contract, etc.) as well as highly dynamic
information (e.g., last numbers called, most frequently called numbers). The Pending
Predictions queue contains all predictions that are awaiting confirmation or
refutation in the future (that is, whether the subscriber to which the prediction
refers has churned or not within a specified time frame).
These two structures, the customer information database and the Pending
Predictions queue, can easily become the two main bottlenecks in the system if
not carefully implemented, both for time and for memory usage. If they do not
fit in RAM, a write-optimized disk-resident database will be required.
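As an illustration of the highly dynamic information mentioned above, the following sketch keeps the last numbers called and the most frequently called numbers for each subscriber in memory. The class name, the window of five numbers, and the dictionary standing in for the database are assumptions made for the example only.

```python
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass
class SubscriberState:
    """Dynamic per-subscriber information (hypothetical layout)."""
    # Bounded window: the oldest number is dropped automatically.
    recent_numbers: deque = field(default_factory=lambda: deque(maxlen=5))
    # Unbounded in this sketch; a real system would cap or decay it.
    call_counts: Counter = field(default_factory=Counter)

    def record_call(self, number: str) -> None:
        self.recent_numbers.append(number)
        self.call_counts[number] += 1

    def most_frequent(self, k: int = 3) -> list:
        return [number for number, _ in self.call_counts.most_common(k)]


# Stand-in for the customer information database: subscriber id -> state.
customer_db = {}


def on_call(subscriber_id: str, number: str) -> None:
    """Update the dynamic state of a subscriber after an outgoing call."""
    customer_db.setdefault(subscriber_id, SubscriberState()).record_call(number)
```

Bounding the per-subscriber structures, as the `deque` does here, is one way to keep memory per subscriber predictable.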
The Record Generator receives the stream of Events generated by the Integrator
and uses it for two purposes. First, it updates the customer information database
according to each Event. Second, it generates one or more Records from each
Event using information from the customer information database. Thus, it creates
a stream of Records passed downstream.
A Record is a vector of features, the first of which is a subscriber identifier,
and the rest contain all the information about that subscriber’s state that is
considered relevant for prediction. Many of them cannot be directly derived from the
Event, but are aggregations of information about the customer precomputed in
the database. One of the features (say, the last one) indicates whether the subscriber
has churned so far: it is true when the Event originating the Record was a
churn event.
The prototype currently uses the following set of features for prediction, which are
among the most widely reported as useful in the literature:
• Age, sex, income range
• Contract type (mobile or landline, pre-paid or post-paid)
• Average call duration during last month
• Number of calls last week and last month
• Increase or decrease in number of calls in the last 2 weeks and the last 2
months
• % of calls by/to this subscriber where the caller/recipient belongs to an-
other company
• # of complaints in the last 2 months, and % of these that were resolved
satisfactorily
• Average bill value
• Increase or decrease in value over the last 2 bills
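A few of the call-related features in the list above can be computed as in the following sketch. It assumes a call log of (start time, duration) pairs, a 30-day month, and that the 2-week trend is the difference between the most recent week and the week before; none of these details is specified in the text, and the function and key names are hypothetical.

```python
from datetime import datetime, timedelta


def call_features(calls, now):
    """Compute call features for one subscriber.

    calls: list of (start_time: datetime, duration_seconds: float) pairs.
    Assumption: a "month" is 30 days and the trend compares the last week
    with the week before it.
    """
    week, month = timedelta(days=7), timedelta(days=30)
    last_week = [d for t, d in calls if now - week < t <= now]
    prev_week = [d for t, d in calls if now - 2 * week < t <= now - week]
    last_month = [d for t, d in calls if now - month < t <= now]
    return {
        "calls_last_week": len(last_week),
        "calls_last_month": len(last_month),
        "avg_duration_last_month": (
            sum(last_month) / len(last_month) if last_month else 0.0
        ),
        "call_trend_2_weeks": len(last_week) - len(prev_week),
    }
```

In a streaming setting these aggregates would be maintained incrementally in the customer information database rather than recomputed from the full log on every Event.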
We made the following optimization for efficiency. Every event gives rise to at
least one record, with the exception that only one call per day made or received by a
subscriber generates a record and a prediction. That is, if a subscriber makes or
receives 20 calls in a day, all of them will be used to update his/her statistics
in the customer database, but only the first one will generate a record and a
prediction. This introduces a delay of (at most) one day in flagging this customer
as a churner, but greatly reduces overhead.
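The once-per-day rule for calls can be sketched as a small filter that remembers, for each subscriber, the last day on which a call produced a record. The class and method names are hypothetical.

```python
from datetime import date


class DailyCallThrottle:
    """Let only the first call of each day generate a record per subscriber."""

    def __init__(self):
        self._last_record_day = {}   # subscriber id -> day of last call record

    def should_emit(self, subscriber_id: str, day: date) -> bool:
        """Return True for the first call of `day`, False for later ones."""
        if self._last_record_day.get(subscriber_id) == day:
            return False
        self._last_record_day[subscriber_id] = day
        return True
```

The statistics update in the customer database would still run for every call; only record generation and prediction are gated by `should_emit`.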
The Record Processor is the heart of the system: it builds, maintains, and applies
the predictive models. It therefore contains the data mining or machine learning
algorithms that make prediction possible.
When a record not indicating churn arrives, the Record Processor passes it through the
current model and makes a churn prediction for it. The record with its prediction
is stored in the Pending Predictions queue, waiting for future confirmation. When
a record indicating churn arrives, records for that subscriber are searched in the
Pending Predictions queue and, if found, passed to the model trainer as positive
instances of churn. Expired records in the Pending Predictions queue
(corresponding to subscribers that did not churn within a specified time) are passed to
the model trainer as negative instances of churn. All records (describing current
states of subscribers) are passed to a clustering submodule to build subscriber
profiles. We have used both 1) clustering methods available in MOA and 2) the split
induced by the Hoeffding tree branches to define customer segments; they may
give alternative segmentations of potential use for analysis.
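The delayed-labeling flow described above can be sketched as follows. This is an illustration under stated assumptions, not the prototype's code: `MajorityModel` is a trivial stand-in for an incremental learner such as a Hoeffding tree, time is a plain number, and the pending queue is scanned in full on every record, which a real implementation would avoid by ordering entries by time.

```python
from collections import defaultdict


class MajorityModel:
    """Trivial stand-in for an incremental classifier (assumption)."""

    def __init__(self):
        self.counts = {True: 0, False: 0}

    def predict(self, features) -> bool:
        return self.counts[True] > self.counts[False]

    def learn(self, features, churned: bool) -> None:
        self.counts[churned] += 1


class RecordProcessor:
    def __init__(self, model, horizon):
        self.model = model
        self.horizon = horizon             # time after which a prediction expires
        self.pending = defaultdict(list)   # subscriber id -> [(time, features, prediction)]

    def process(self, now, subscriber_id, features, churned):
        """Handle one record; return the prediction, or None for churn records."""
        self._expire(now)
        if churned:
            # Pending records of a churner become positive training instances.
            for _, feats, _ in self.pending.pop(subscriber_id, []):
                self.model.learn(feats, True)
            return None
        prediction = self.model.predict(features)
        self.pending[subscriber_id].append((now, features, prediction))
        return prediction

    def _expire(self, now):
        # Records older than the horizon become negative training instances.
        for sid in list(self.pending):
            kept = []
            for t, feats, pred in self.pending[sid]:
                if now - t > self.horizon:
                    self.model.learn(feats, False)
                else:
                    kept.append((t, feats, pred))
            if kept:
                self.pending[sid] = kept
            else:
                del self.pending[sid]
```

The key point is that the model never sees a label at prediction time; labels arrive later, either through a churn record or through expiry of the waiting period.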
Additionally, a background process periodically scans for customers for which
no Event has occurred and injects a special record indicating “no activity”, so
that 1) a prediction for the customer is generated from time to time (in case, e.g.,
inactivity indicates a propensity to churn) and 2) the system is also trained to
predict well during periods of customer inactivity.
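A minimal sketch of such a scan is given below. It assumes the system tracks the time of the last Event per subscriber; the function names, the record layout, and the idle threshold are hypothetical.

```python
def inactive_subscribers(last_event_time, now, max_idle):
    """Return the ids of subscribers whose last Event is older than max_idle."""
    return sorted(
        sid for sid, t in last_event_time.items() if now - t > max_idle
    )


def no_activity_records(last_event_time, now, max_idle):
    """Build one special "no activity" record per inactive subscriber."""
    return [
        {"subscriber_id": sid, "kind": "no_activity", "time": now}
        for sid in inactive_subscribers(last_event_time, now, max_idle)
    ]
```

The resulting records would then be fed to the Record Processor like any other non-churn record.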
The Record Processor module thus produces predictions, statistics, and profiles of
the predicted churners. The predicted churners’ IDs and their current profiles are
passed to the user interface or other parts of the customer management system
so that adequate actions can be assessed and taken.
The subscriber profiles provide information for human analysts to build
understandable portraits of churners and the causes of their churning. Moreover, they
allow retention efforts, such as calling with a promotion, to be focused on
subsets of subscribers with a propensity to churn, even before they are flagged
as churners by the system.
4.7. Implementation
[Figure: graphical model of a simulated subscriber, with nodes C, I, Mood_t, Call_t, and Complaint.]
The actions of each user are governed by a dynamic Markovian model whose
current state determines the user’s “mood”, which is one of four states:
{happy, neutral, angry, churn}. The dynamics of this model are as follows:
This internal mood state, which is unknown to the provider, affects the behavior
of the user in multiple ways.
1. A user only complains if “Angry” (and the longer the time in “Angry”,
the more s/he complains).
2. A user only churns if “Angry” (and this becomes more likely the longer
s/he has been “Angry”).
3. The longer the time in “Angry”, the less s/he calls.
4. The longer the time in “Happy”, the more s/he calls.
5. When s/he goes back to “Neutral”, the rate of calls per day slowly returns
to the default value for the user.
6. Both the duration and the number of calls depend on the hidden parameter C.
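[Figure: state-transition diagram of the mood model, with states Neutral, Happy, Angry, and Churn and time-dependent transition probabilities $p_t^{n,h}$, $p_t^{n,a}$, $p_t^{h,n}$, $p_t^{a,n}$, $p_t^{h,h}$, $p_t^{a,a}$, and $p_t^{a,c}$.]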
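The mood dynamics can be illustrated with a small simulator. The sketch keeps the transition structure of the diagram (Neutral to Happy or Angry, Happy and Angry back to Neutral, Churn reachable only from Angry) but uses fixed, made-up probabilities; the time-dependent values used in the prototype are not given in the text, so every number below is an assumption.

```python
import random

# Hypothetical, time-independent transition probabilities (assumption).
# In the model described above they vary with the time spent in a state.
TRANSITIONS = {
    "happy":   {"happy": 0.8, "neutral": 0.2},
    "neutral": {"happy": 0.1, "neutral": 0.8, "angry": 0.1},
    "angry":   {"angry": 0.7, "neutral": 0.2, "churn": 0.1},
    "churn":   {"churn": 1.0},   # absorbing state
}


def step(mood, rng):
    """Sample the next mood from the row of the current mood."""
    states = list(TRANSITIONS[mood])
    weights = list(TRANSITIONS[mood].values())
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(days, seed=0, start="neutral"):
    """Simulate one user's daily mood until churn or until `days` have passed."""
    rng = random.Random(seed)
    moods = [start]
    while len(moods) < days and moods[-1] != "churn":
        moods.append(step(moods[-1], rng))
    return moods
```

Calls and complaints would then be generated from the mood sequence according to rules 1 to 6, for instance by making the daily call rate a function of the time spent in the current state.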
We obtain good levels of recall and precision, roughly up to the limit that the
randomness we placed in the generator allows. That is, we can predict
which users will churn with an accuracy that is close to the probability with which
they (randomly) decide to churn or not given their internal states. In particular, if we
make the subscribers fully deterministic, we get results close to
100%. Of course, the absolute “goodness” of these figures does not mean much,
beyond reflecting how difficult or easy to predict we made our synthetic data. The point
is that the system is able to correctly remember and put to use the information
in the event stream for one particular purpose, that of churn prediction.
We also checked that Hoeffding trees are extremely good at adapting to
changes. Via the prototype GUI we can vary, during execution, parameters
such as the prices of our company and the competition, the frequency of complaint calls
and the % of those resolved satisfactorily, the average number of calls per subscriber, etc.,
which affect our subscribers’ churn rate. We verified that after a change,
prediction accuracy falls because the predictor gets out of sync, but after a few thousand
calls, the new behaviors are captured by the tree and accuracy rises to almost
optimal levels again.
On a commodity PC, the system processes about 10,000 records per second.
Average memory consumption is about 40 MB per 1,000 subscribers with
more than realistic levels of average activity (40 calls per day, 2% daily churn rate,
etc.). Thus, there is ample room for scaling up using higher-end machines.
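To put the memory figure in perspective, the following back-of-the-envelope sketch extrapolates it to larger customer bases. It assumes that memory grows linearly with the number of subscribers, which the measurements above do not establish; the function name is hypothetical.

```python
MB_PER_1000_SUBSCRIBERS = 40   # figure reported for the prototype


def memory_gb(subscribers: int) -> float:
    """Estimated memory in GB, assuming linear scaling (assumption)."""
    megabytes = subscribers / 1000 * MB_PER_1000_SUBSCRIBERS
    return megabytes / 1000    # using 1 GB = 1000 MB


# Under this assumption, one million subscribers need about 40 GB
# and ten million about 400 GB.
```

Under the linear assumption, a customer base in the millions no longer fits comfortably in the RAM of a single commodity machine.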
For deployment by large operators, with possibly many millions of subscribers, it
is clear that scaling out by distributed processing would be necessary.
Additionally, the customer base would be geographically distributed over the planet, so
communication latencies among datacenters and traffic splitting and routing must
be taken into account. Finally, the emergence of new technologies and services,
as well as company culture, will undoubtedly put additional constraints on the
processing. From a data mining point of view, techniques for distributed model
building will have to be incorporated. In fact, building several models at
geographically distinct locations may be advantageous to capture different customer
patterns in different zones. Since the models themselves are compact, they could
possibly be exchanged among machines and sites and be used cooperatively (e.g.,
with ensemble methods) for better accuracy.
We have hopefully shown that stream mining technology may help customer churn
prediction on high-volume streams originating from customer activity. The main
difference with existing, batch-oriented data mining approaches to the problem
is the ability of these technologies to react and adapt quickly to changes in
customer behavior, without human intervention, which may have a direct impact
on revenue and image for companies. Although the system is a prototype far from
being deployable, we have shown that even on a single low-end machine we can
deal with quite high data rates and gracefully handle the whole churn prediction
process, including user segmentation and connection with the customer relationship
management subsystem.
Further work includes testing the system with real
subscriber data collected from live networks and combining additional data sources
from outside operators’ boundaries.
Acknowledgements