Customer Churn Prediction in The Telecommunication Sector Using A Rough Set Approach
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
ARTICLE INFO

Communicated by Zidong Wang

Keywords: Classification; Churn prediction; Data mining; Feature selection; Rough Set theory

ABSTRACT

Customer churn is a critical and challenging problem affecting business and industry, in particular the rapidly growing, highly competitive telecommunication sector. It is of substantial interest to both academic researchers and industrial practitioners interested in forecasting the behavior of customers in order to differentiate churn from non-churn customers. The primary motivation is the dire need of businesses to retain existing customers, coupled with the high cost associated with acquiring new ones. A review of the field has revealed a lack of efficient, rule-based Customer Churn Prediction (CCP) approaches in the telecommunication sector. This study proposes an intelligent rule-based decision-making technique, based on rough set theory (RST), to extract important decision rules related to customer churn and non-churn. The proposed approach effectively performs classification of churn from non-churn customers, along with prediction of those customers who will churn or may possibly churn in the near future. Extensive simulation experiments are carried out to evaluate the performance of our proposed RST based CCP approach using four rule-generation mechanisms, namely, the Exhaustive Algorithm (EA), Genetic Algorithm (GA), Covering Algorithm (CA) and the LEM2 algorithm (LA). Empirical results show that RST based on GA is the most efficient technique for extracting implicit knowledge in the form of decision rules from the publicly available, benchmark telecom dataset. Further, comparative results demonstrate that our proposed approach offers a globally optimal solution for CCP in the telecom sector, when benchmarked against several state-of-the-art methods. Finally, we show how attribute-level analysis can pave the way for developing a successful customer retention policy that could form an indispensable part of the strategic decision making and planning process in the telecom sector.
⁎ Corresponding author. E-mail address: [email protected] (S. Anwar).
http://dx.doi.org/10.1016/j.neucom.2016.12.009
Received 30 May 2016; Received in revised form 14 November 2016; Accepted 3 December 2016
Available online 07 December 2016
0925-2312/ © 2016 Elsevier B.V. All rights reserved.
A. Amin et al. Neurocomputing 237 (2017) 242–254
social network services, telecommunication, airlines, online gaming, and banking [7].

Researchers have also used various machine learning (ML) techniques (e.g. Random Forest, Balanced Random Forest, Rotation Forest and RotBoost) for dealing with the problem of customer churn prediction, but these ML techniques lack the required effectiveness for predicting customer churn [8]. On the other hand, the RST proposed by Pawlak [9] is an effective technique for discovering hidden rules, handling uncertainty and dealing with the unknown distribution of data [6]. However, the application of RST has not been widely studied in customer churn prediction, specifically in the telecommunication sector. Therefore, this study is an initial exploration of RST for CCP in the telecommunication sector, constructing more appropriate predictive classifiers that forecast churn based on accumulated knowledge.

More specifically, an RST based benchmarking and empirical approach is proposed in this study and compared with previous state-of-the-art machine learning based methods, in order to address the challenging CCP problem. As part of the aim to develop an improved CCP technique, we evaluate the performance of four well-known rule extraction algorithms (EA, GA, CA, and LA), in order to empirically determine which is most suitable for use with RST and to extract more useful rules from hidden patterns in the telecom sector. The extraction of decision rules based on RST, coupled with a best performing rule-generation algorithm, followed by appropriate attribute-level analysis, can lead to a more strategic decision making and efficient planning process within the telecom industry. For example, based on the extracted rules, decision makers can readily devise and adopt new retention policies, and improve the overall performance of the organization. Finally, with attribute-level analysis, decision makers can identify reasons for customer churn and develop an appropriate retention policy.

The rest of the paper is organized as follows: the next section presents customer churn and a review of CCP approaches; the preliminary study on RST is explored in Section 3 and evaluation methods are described in Section 4; the evaluation setup and experiments are discussed in Section 5, followed by results and comparisons in Section 6; the paper is concluded in Section 7.

2. Related work

2.1. Customer churn

Customer churn, shifting from one service provider to the next competitor in the market, is a key challenge in highly competitive markets and is very much observed in the telecommunication sector [5–7,10–12]. Churn customers are those targeted customers who have decided to leave a service provider, product, or even a company and have shifted to a competitor in the market. The literature reveals the following three types of customer churn [13]:

1. Active churner (volunteer): those customers who want to quit the contract and move to the next provider.
2. Passive churner (non-volunteer): when a company discontinues the service to a customer.
3. Rotational churner (silent): those customers who discontinue the contract without the prior knowledge of both parties (customer and company), where each party (e.g. customer or company) may suddenly terminate the contract without any notification.

The first two types of churn can be predicted easily with the help of traditional approaches in terms of a Boolean class value; however, the third type of churn is difficult to predict, since there is a possibility of customers who might churn in the near future for a variety of reasons that are either not known or difficult to predict. It should be the goal of the decision maker to decrease the churn ratio, for it is a well-known fact that existing customers are the most valuable assets for companies as compared to acquiring new ones [1]. Customer churn behavior has certain impacts on the company's performance, which are summarized as follows: (i) a negative impact on the overall performance of the company, (ii) a potential cause of low sales because new/short-term customers buy fewer services, (iii) it helps competitors to gain dissatisfied customers through business promotion(s), (iv) it leads to revenue losses, (v) it puts a negative impact on long-term customers, (vi) it increases uncertainty, which reduces the ratio of possible new customers, (vii) attracting new customers is more expensive than retaining existing ones, and (viii) it puts the company's image at risk in the competitive market with loss of customer base.

Churn prediction has been widely studied in the recent decade, particularly in the following domains: Open Social Networks [14–19], the Banking sector [20–24], Credit Card & Financial Service providers [7,25,26], the Online Gaming industry [27–29], Human Resource departments of competitive organizations [12,30,31], the Subscription service market [32,33], Question and Answer (Q&A) forums [34] and Insurance service providers [35]. It is clear from this discussion that customer churn as a problem is crucial for various organizations. Simultaneously, CCP is rapidly being observed in the telecommunication industry around the globe as well. Table 1 provides a brief overview of previous studies on CCP in the telecommunication industry.

2.2. Review of CCP approaches

A prediction model can be defined as the process of discovering hidden patterns from data and predicting future events [36]. The marketing strategy within competitive companies has evolved from a product-oriented approach to a customer-centric one, due to advancements in the field of ML [1]. Database technologies not only provide useful information to the organization's decision makers about their past and current customers' behavior, but also provide future prediction with the help of prediction modeling techniques.

Churn prediction is an equally alarming issue for the service sector. Keaveney [37] published an early and influential study by conducting a survey to find out the reasons why customers switch services (i.e. customer churn). According to this study, the critical incidents between customers and firms can be categorized into two broad cases: firstly, Core Service Failure (CSF) and secondly Service Encounter Failure (SEF). It was also found that the occurrence of either of the two cases (i.e. CSF or SEF) can be the reason for customer churn. Another study [38] identified that service quality and customer response are two important drivers of churn, and also established a link between these drivers and user churn.

Bloemer et al. [22] assumed that customer satisfaction and service quality are directly proportional to each other. When customers get a higher degree of satisfaction, it has a positive impact on the overall performance of the company in a competitive market.

Various machine learning techniques have been previously used for CCP, including the Support Vector Machine (SVM) [39–43], Neural Networks [2,13,41,44,45,46], Decision trees [4,41,44], Regression analysis [44], Naïve Bayes and Bayesian Networks [4] and Neuro-Fuzzy systems [47]. However, the most important question, specifically which classification technique approaches churn prediction in the most appropriate fashion, still remains an open research problem. Although the literature [39] argues that SVM is one of the state-of-the-art classification approaches due to its ability to efficiently model arbitrary nonlinearities, SVM also generates a black box model [48], which is considered its drawback. Similarly, several other studies [13,49] have reported that computationally expensive multi-layered neural networks can outperform other conventional ML algorithms.

A benchmarking empirical approach is proposed in this study with the aim of achieving the best performance accuracy, through efficient
Table 1
Overview of previous studies on churn prediction in the telecommunication industry.

[51] Dataset: Malaysian telecom subscriber database. Technique: Data mining by evolutionary learning (DMEL). Tool/evaluation: Evolutionary learning process. Data: 100,000 subscribers and 251 variables per subscriber. Findings: DMEL effectively discovered rules and accurately predicted churn in telecom data.

[10] Dataset: A proprietary dataset of a Taiwan telecom company. Technique: Decision tree (C5.0) and back-propagation neural network (BPN). Tool/evaluation: LIFT and hit ratio to assess model performance; exploratory data analysis. Data: 160,000 subscribers with 14,000 churners. Findings: Both DT and NN techniques deliver accurate predictions, while BPN performs better than DT without segmentation.

[18] Dataset: Proprietary Call Detail Record dataset of the largest mobile operator. Technique: Social network analysis, J48 decision tree, link-based & collective classification. Tool/evaluation: Weka. Data: 60 GB covering voice calls, SMS, value-added calls etc. Findings: Proposed a technique which efficiently predicted potential churners through underlying social network analysis.

[40] Dataset: CRM dataset from Duke University. Technique: Support Vector Machine with Recursive Feature Elimination (SVM-RFE). Tool/evaluation: MATLAB. Data: 65,000 customers with 171 attributes. Findings: Introduced SVM-RFE for attribute selection in churn prediction, showing satisfactory predictive results.

[52] Dataset: Six real-life proprietary European churn datasets (Bank1, Bank2, Mobile telecom, Pay TV, Newspaper, Supermarket). Technique: Random Forests and Logistic Regression. Tool/evaluation: Weka. Data: the telecom dataset includes 100,205 customers, of whom 2983 are churners. Findings: Under-sampling increased predictive performance, while the advanced sampling technique CUBE did not show better accuracy.

[53] Dataset: In-house customer database, proprietary call records from the company & a research survey. Technique: Back-propagation neural network. Tool/evaluation: Data mining using MATLAB. Data: 895 customers, 17.67% churn and 82.33% non-churn. Findings: Predicted at-risk customers who may possibly churn.

[2] Dataset: Telecom dataset, UCI Repository, University of California, Irvine. Technique: Artificial neural network. Tool/evaluation: Clementine data mining, SPSS. Data: 2427 objects. Findings: Artificial neural network based model obtained 92% accuracy.

[3] Dataset: Telecom dataset, KDD Library, UCI Repository. Technique: C4.5 and RIPPER. Tool/evaluation: AntMiner+ and Weka. Data: 5000 instances. Findings: ALBA with C4.5 & RIPPER showed the highest accuracy, while AntMiner+ showed high performance.

[47] Dataset: Telecom dataset, UCI Repository, University of California, Irvine. Technique: Adaptive neuro-fuzzy inference system (ANFIS). Tool/evaluation: Fuzzy Logic Toolbox, MATLAB. Data: 5000 customers. Findings: Neuro-fuzzy obtained better accuracy, specificity and sensitivity than C4.5 and RIPPER.

[41] Dataset: Obtained from an anonymous mobile service provider. Technique: Decision tree, neural network, and SVM. Tool/evaluation: Weka. Data: 5000 instances. Findings: Decision tree accuracy 77.9%; neural network accuracy 83.7%; SVM accuracy 83.7%.

[54] Dataset: 11 wireless telecom datasets including KDD Cup 2009, UCI, Duke and other private datasets. Technique: NN, linSVM, rbfSVM, linLSSVM, rbfLSSVM, RIPPER, PART, C4.5, CART, ADT, RF, LMT, Bag, Boost, RBFN, VP, Logit, KNN10, KNN100, BN, NB. Tool/evaluation: Weka, MATLAB, SAS, and R. Data: smallest dataset contains 2180 and largest up to 338,874 observations. Findings: Oversampling does not improve classifiers' performance on churn prediction in the telecom sector, and a large group of classifiers yields comparable performance.

[4] Dataset: Proprietary European telecommunication company dataset. Technique: Naive Bayes, Bayesian Network and C4.5 decision tree. Tool/evaluation: Weka. Data: 106,405 customers with 5.6% churners. Findings: All predictive models performed with improved prediction rates; the decision tree performed better on accuracy while the other two achieved a higher true positive rate.

[8] Dataset: Publicly available Orange large & Cell2Cell datasets for telecom churn prediction. Technique: RotBoost (RB), Random Forest, Rotation Forest and Decorate (DEC) ensembles with mRMR. Tool/evaluation: Weka and MATLAB. Data: 50,000 instances with 260 features, of which 190 are numerical and 70 nominal. Findings: mRMR returns more suitable features than Fisher's ratio and F-score for the ensembles; Random Forest + Rotation Forest and RB + DEC with mRMR performed best.

[39] Dataset: Telecom dataset, UCI Repository, University of California, Irvine. Technique: Support Vector Machine (SVM). Tool/evaluation: IBM SPSS. Data: 3333 unique customers. Findings: SVM based model obtained 88.56% accuracy.

[55] Dataset: Telecom dataset including customer personal information and CDR data. Technique: Logistic regression and multilayer perceptron neural networks. Tool/evaluation: MATLAB. Data: 89,412 instances with 9.7% churners. Findings: Developed an efficient approach using the SPA method as a propagation process.

[56] Dataset: Asian mobile telecom operator dataset. Technique: Logistic regression, voted perceptron. Tool/evaluation: Weka. Data: 2000 subscribers and 23 variables, with 534 churners. Findings: Proposed a hybrid learning model to predict churn which shows the most accurate results.

[57] Dataset: Telecom dataset, UCI Repository, University of California, Irvine. Technique: Rough Set Theory. Tool/evaluation: RSES. Data: 3333 instances, 85.51% non-churn & 14.49% churn. Findings: Rough sets as a multi-class classifier can provide more accurate results for a binary classification problem.

[46] Dataset: Telecom dataset, UCI Repository, University of California, Irvine. Technique: SVM, decision tree, artificial neural network, Naïve Bayes, regression analysis, boosting. Tool/evaluation: Package C5.0 in the R language for statistical computing. Data: 5000 instances. Findings: The best overall classifier was SVM-POLY using AdaBoost.
extraction of appropriate decision rules from hidden existing patterns and attribute-level analysis. Based on these rules, the decision maker can easily adopt new retention policies and improve the overall performance of the organization. With attribute-level analysis, the decision maker is able to identify the reasons for customer churn and develop a corresponding retention policy as well. Decision rules which require additional information about the data, e.g. the value of possibility in a fuzzy set, the grade of membership, the basic probability in Dempster-Shafer theory, or probability in statistics, will result in less realistic decisions in comparison to decisions based on decision rules which do not require additional information. RST is unique in that it does not require any such additional information. Hence, the ultimate decision made based on RST is argued to be more realistic [50].

3. Rough set theory

RST can be used as a mathematical tool to approach vagueness [9]. The RST philosophy is based on the assumption that we can associate information, i.e. knowledge or data, with every object of the universe of discourse (also known as U). RST has precise concepts of the lower and upper approximation and the boundary region. The boundary region separates the lower approximation from the upper approximation (i.e. the boundary line); it contains those objects that cannot be classified with certainty as elements of either the lower or upper approximation. It is clear that borderline instances cannot be characterized in terms of the available knowledge about the elements. Therefore, any rough concept is replaced by a lower and an upper approximation of the vague concept [9,58,59].

Mathematically, the concepts of lower approximation, upper approximation and boundary region are defined as follows. Suppose a set X ⊆ U and an equivalence relation B in an information system IS = (U, A) of non-empty, finite sets U and A, where U is the universe of objects and A is a set which contains attributes (the partition of the universe U creates new subsets of interest from U which have the same value of the outcome attribute). Then LB = ∪{Y ∈ U/B : Y ⊆ X} is the lower approximation, whose elements are exact members of X, while UB = ∪{Y ∈ U/B : Y ∩ X ≠ ∅} is the upper approximation, whose elements are possible members of X. BR = UB − LB is the boundary region. The whole RST based classification system can be visualized as in Fig. 1.

3.1. Decision table

A special case of an information system (IS) is known as a decision table, which is usually known as a training set or sample in ML. An IS can be represented by a data table with rows "R" and columns "C", where R is labeled by objects and C is labeled by attributes. Formally, an information system is IS = (U, A), where U is a non-empty finite set of instances called the universe and A is a non-empty finite set of attributes or properties, i.e. A = {a1, a2, a3, …, an} such that a: U → Va for every a ∈ A, where Va represents the value set of attribute a. A decision system is any information system of the form S = (U, C ∪ {d}), where C is the set of conditional attributes and d ∉ C is the decision attribute. The union of C and {d} constitutes the set A [60].

3.2. Indiscernibility relation

The notion of a concept approximation is precisely based on the similarity between objects, which is referred to as the indiscernibility (IND) relation. A decision table may contain a large number of unnecessary or redundant objects or attributes. For example, if objects (customers) exhibit certain critical behavior, i.e. churn, symptoms of the churn behavior can be obtained from the information about the customers. These objects (customers) can be characterized by the same information, and are indiscernible (similar) in the pattern of the existing information about them. The RST indiscernibility approach is defined relative to a given set of properties (attributes). Any set of IND objects is called an elementary set, which is also known as a basic granule of information about the universe, whilst any union of elementary sets is called a crisp set. If a union of elementary sets is not a crisp set, then it is referred to as vague (rough) [9,58,59].

Suppose we are given an IS = (U, A) of non-empty, finite sets U and A, where A contains a set of attributes (i.e. a: U → Va, where Va is the set of values of attribute a) and U is the universe of objects. Any subset B of A determines a binary relation I(B) on the universe U, called an indiscernibility relation, which can be defined as:

xI(B)y iff a(x) = a(y) for every a ∈ B,

where a(x), a(y) are the values of attribute a for objects x and y respectively. Obviously, I(B) is an equivalence relation determined by B. If (x, y) ∈ I(B) then we can say that x and y are B-indiscernible, and the equivalence classes of I(B) are the B-elementary sets of the rough set. In RST an elementary set (granule) is a unit of available knowledge about reality [61]. For any subset of attributes B ⊆ A the indiscernibility relation IND(B) is defined as follows: if IND(B) = IND(B − {a}) (i.e. the equivalence relation is unchanged), then a ∈ B is dispensable, otherwise it is indispensable in B, while the set B can be called independent if all of its attributes are indispensable. If (i, j) ∈ U × U belongs to IND(B), then we can say that i and j are indiscernible by the attributes from B.

3.3. Reduction of attributes and core set

The reducts (reduction) and the discernibility relation are the two key notions in conventional RST [60]. The data table contains some superfluous data; therefore, a reduction process is used to remove some data from the decision table whilst preserving its basic properties [61]. Let us express the reduction of attributes idea more precisely: if C, D ⊆ A, where C is a set of condition attributes and D is a set of decision attributes, and X ⊆ C is a D-reduct (reduct with respect to D) of C, then X is a minimal subset of C such that γ(C, D) = γ(X, D). Every element of the core belongs to a reduct set, in such a way that the core contains the most important subset of attributes and none of its elements can be removed without affecting the classification power [61]. So the core is the intersection of all reducts, Core(B) = ∩ Red(B), where Red(B) is the set of all reducts of B.
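The approximation machinery above can be illustrated on a toy decision table. The sketch below (illustrative only; the customer IDs, attribute names and values are invented, not taken from the paper's dataset) computes the B-elementary sets, the lower approximation LB, the upper approximation UB and the boundary region BR for the churn concept X:

```python
from collections import defaultdict

# Toy decision table: each object (customer) maps condition attributes -> values,
# plus the decision attribute "churn". All values are invented for illustration.
table = {
    "c1": {"plan": "basic",   "calls": "high", "churn": "yes"},
    "c2": {"plan": "basic",   "calls": "high", "churn": "no"},   # B-indiscernible from c1
    "c3": {"plan": "premium", "calls": "low",  "churn": "no"},
    "c4": {"plan": "basic",   "calls": "low",  "churn": "yes"},
}
B = ["plan", "calls"]  # condition attributes B ⊆ A

def ind_classes(table, B):
    """Partition U into B-elementary sets: x I(B) y iff a(x) == a(y) for all a in B."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in B)].add(obj)
    return list(classes.values())

X = {o for o, r in table.items() if r["churn"] == "yes"}  # the churn concept
granules = ind_classes(table, B)
lower = set().union(*[g for g in granules if g <= X])     # LB: certainly churn
upper = set().union(*[g for g in granules if g & X])      # UB: possibly churn
boundary = upper - lower                                  # BR = UB - LB

print(sorted(lower), sorted(upper), sorted(boundary))
# → ['c4'] ['c1', 'c2', 'c4'] ['c1', 'c2']
```

Here c1 and c2 agree on every attribute in B but differ on the decision, so they fall into the boundary region: they can only be approximated, not classified with certainty.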
may ultimately reduce the size of the value sets of attributes and thus reduces the computational cost. On the other hand, the grouping of an attribute's data in the discretization process is actually based on the calculated cuts, where the continuous variables are converted into discrete attributes or intervals, i.e. groupings of values [6]. The idea of a cut can be incorporated in the discretization process; a cut mostly appears in the discretization process as a pair (a, c), where a is a continuous attribute and c is a cut point that splits its value set into two disjoint sub-intervals [6]. For example, there may exist some unseen objects which cannot match the rules, or which slow down the ML process and increase computational costs. Therefore, cut and discretization methods are used to obtain a high quality of classification [60].

3.5. Rules generation

Decision rules are often denoted as "IF C THEN D", where C is the set of condition attributes and D represents the decision attribute in the decision table [61]. Given two unary predicate formulae ∝(x) and β(x), where x ranges over a finite set U, Łukasiewicz [62] assigned to ∝(x) the value card(||∝(x)||)/card(U), where ||∝(x)|| = {x ∈ U : x satisfies ∝}; the fractional value assigned to the implication ∝(x) => β(x) is then card(||∝(x) ∧ β(x)||)/card(||∝(x)||), under the assumption that ||∝(x)|| ≠ ∅. The decision rules can be constructed by overlaying the reduct sets over the decision table. Mathematically, a rule can be expressed as (a_i1 = v1) ∧ … ∧ (a_ik = vk) => d = vd, where 1 ≤ i1 < … < ik ≤ m and vi ∈ Va_i. For simplification, this can be represented as an IF-THEN statement, i.e. IF C THEN D, where C is a set of conditions and D is the decision value.

To extract the decision rules, the following four well-known rule-generation algorithms are used [60]:

– EA: takes subsets of features incrementally and then returns the reduct set and minimal decision rules. The generated decision rules are those which have minimal descriptors in the conditional attributes. It needs care because it may lead to extensive computations in the case of a complex and large decision table. It is based on a Boolean reasoning approach [63].
– GA: is based on an order-based genetic algorithm coupled with a heuristic. It is used to reduce the computational cost in a large and complex decision table [64,65].
– CA: is a customized implementation of the LEM1 algorithm idea and is implemented in the RSES covering method. It was introduced by Jerzy Grzymala [66].
– LA: is a divide-and-conquer technique paired with the lower and upper approximations of RST, based on the local covering determination of each object from the decision class [66,67].

Since RST is employed as the base classifier in the proposed study, we have provided a brief overview of RST to help the reader understand the basic terminology used in RST (specifically, the decision table, indiscernibility, reduct & core sets, cut & discretization, and decision rules). In Section 2, the reasons for using RST in the proposed study are outlined. Moreover, the implementation of RST for the CCP problem is illustrated pictorially in Fig. 2, which depicts the three major parts: (i) data preprocessing (creating a decision table), (ii) the training process (applying rule-generation algorithms followed by the reduct and core set concepts of RST), and (iii) the classification process to validate the prediction performance of the RST based approach for the CCP problem. Each part is explained in Section 5 with step-wise details in the context of the fundamental study of RST. This study is an extension of our previous work [80]. The next section describes the evaluation measures which are used to evaluate the performance of the proposed approach.

4. Evaluation measures

It is nearly impossible to build a perfect classifier or a model that could perfectly characterize all the instances of the test set [52]. To assess the classification results, we count the number of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN). An FN actually belongs to the positive class P (TP + FN = P) but is wrongly classified as negative, while an FP is actually part of the negative class N (TN + FP = N) but is wrongly classified as positive. The following measures were used for the evaluation of the proposed classifiers.

– Sensitivity/recall: measures the fraction of churn customers who are correctly identified as true churn.

Sensitivity (Recall) = TP / P    (1)

– Specificity: measures the fraction of true non-churn customers who are correctly identified.

Specificity = TN / N, so that 1 − Specificity = FP / N    (2)

– Precision: characterized by the number of correctly predicted churns over the total number of churns predicted by the proposed approach. Formally, it can be expressed as:

Precision = TP / (TP + FP)    (3)

– Accuracy: the overall accuracy of the classifier can be calculated as:

Accuracy = (TP + TN) / (P + N)    (4)

– Misclassification error: refers to a classification error where an instance is falsely classified to a class to which it does not belong. The different types of misclassification errors can be calculated as:

Misclassification = 1 − Accuracy    (5)

Type-I Error = 1 − Specificity = FP / (FP + TN)    (6)

Type-II Error = 1 − Sensitivity = FN / (TP + FN)    (7)

– F-Measure: a composite measure of precision and recall to compute the test's accuracy. It can be interpreted as a weighted average of precision and recall.

F-Measure = 2 · (Precision · Recall) / (Precision + Recall)    (8)

– Coverage: the ratio of classified objects that are recognized by a classifier from the class to the total number of objects in the class, where C is a classifier, A is a decision table and Match_A(C) is the subset of objects in A that are classified by classifier C.

Coverage_A(C) = |Match_A(C)| / |A|    (9)

5. Evaluation setup
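All of the measures in Eqs. (1)–(8) follow directly from the four confusion-matrix counts. A minimal sanity-check sketch, using invented counts rather than any result reported in this paper:

```python
# Illustrative confusion-matrix counts (invented, not the paper's results).
TP, FN = 110, 27    # churn class:     P = TP + FN
TN, FP = 820, 43    # non-churn class: N = TN + FP
P, N = TP + FN, TN + FP

sensitivity = TP / P                     # Eq. (1), a.k.a. recall
specificity = TN / N                     # Eq. (2)
precision   = TP / (TP + FP)             # Eq. (3)
accuracy    = (TP + TN) / (P + N)        # Eq. (4)
misclass    = 1 - accuracy               # Eq. (5)
type_i      = FP / (FP + TN)             # Eq. (6), equals 1 - specificity
type_ii     = FN / (TP + FN)             # Eq. (7), equals 1 - sensitivity
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (8)

print(f"recall={sensitivity:.3f} precision={precision:.3f} F={f_measure:.3f}")
```

Note that on an imbalanced churn dataset such as the one used in this study, accuracy alone is misleading (predicting "non-churn" for everyone already scores highly), which is why recall, precision and the F-measure are reported alongside it.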
In this section, we have evaluated the performance of RST based on
four rule-generation algorithms for customer churn prediction using the RSES toolkit [60]. An analytical environment was set up to perform the proposed technique as shown in Fig. 2. The RSES toolkit is very helpful to: (i) convert the dataset into a decision table, (ii) apply cut and discretization, (iii) extract the decision rules set from the training set, and (iv) validate the result. These experiments were carried out to fulfill the objectives of the proposed study as well as to address the following points:

P1: Which features are more indicative for churn prediction in the telecom sector?
P2: Which algorithm (EA, GA, CA, and LA) is more appropriate for generating rule sets for the RST classification approach in the telecommunication sector?
P3: What is the predictive power of the proposed approach for churn prediction in the telecom sector?
P4: Can the derived rules help the decision makers in the strategic decision making and planning process?

5.1. Data preparation and feature selection

Evaluation of data mining approaches on a publicly available dataset has many benefits in terms of comparability of results, ranking techniques and evaluation of existing methodologies against new ones [68]. For this study, we have used a publicly available dataset; a description of the dataset can be obtained from the URL1. The dataset consists of 3333 instances, where 85.5% of customers are Non-churn (NC) and 14.49% are Churn (C). The number of churners was much smaller than non-churn customers in the selected dataset, which can make the learning process difficult for the churn prediction classifier. The training set contains 2333 customers, where 85.16% are NC and 14.8% are C. The test set contains 1000 instances, where 86.3% are NC and 13.7% are C. Table 2 reflects descriptive statistics of the target dataset attributes that were selected after applying the "Information Gain Attribute Evaluation" method.

1 "Data Source Link http://www.sgi.com/tech/mlc/db/"

Data preparation and feature selection are important steps in the knowledge discovery process. In order to identify those variables or attributes, from a large number of attributes in a dataset, that are relevant and will reduce the computational cost [64,69,70], the selection of the most appropriate attributes from the dataset at hand was carried out using a feature ranking method known as the "Information Gain Attribute Evaluator" in the Weka toolkit [71]. It evaluates each attribute's worth by measuring the information gain with respect to the class value. It ranks the attributes and selects the top-ranked ones, which significantly improves computational efficiency and classification. After feature ranking, the most relevant, top-ranked attributes are included in the decision table.

5.2. Preparation of a decision table, cut and discretization

The preparation of the decision table is an important stage of the proposed study. The decision table, which consists of objects, conditional attributes, and decision attributes, is organized as in Table 3.

Cut and discretization is a plausible approach to reduce the dataset horizontally in order to handle large data efficiently. It is a common approach used in RST, where the variables containing continuous values are partitioned into a finite number of intervals or groups [72]. The cut and discretization process was carefully performed on the prepared decision table using the RSES toolkit. The cuts are added to the decision table by the toolkit at every iteration, generating a small number of cuts [60]. For example, in this study the cuts of the attribute "Day_Mins" were grouped after the discretization process as listed in Table 4. The first column shows the groups, which are represented by numeric data for simplicity purposes and are listed in ascending order. The second column represents the intervals that are obtained after the discretization process. The third column is the count of the attribute's values that fall into a certain group, while the last column is the percentage of the variable's values in each interval. It is clear from Table 4 that the value of Day_Mins has been changed from its continuous nature into 14 different intervals or groups after the cut and discretization process.

5.3. Training and test sets

In data mining, validation is extremely important to ensure that the
247
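The cut-based grouping of a continuous attribute such as Day_Mins (Section 5.1) can be sketched as follows; this is an illustrative sketch only, with hypothetical cut points, not the RSES implementation:

```python
# Illustrative sketch of cut-based discretization: a continuous value is
# mapped to the interval (group) delimited by a sorted list of cut points.
# The cut values below are hypothetical, chosen only for demonstration.
from bisect import bisect_right

def discretize(value, cuts):
    """Map a continuous value to an integer group label.

    Group k collects the values lying between cuts[k-1] and cuts[k];
    values above the last cut fall into the final open interval.
    """
    return bisect_right(cuts, value)  # number of cuts at or below the value

day_mins_cuts = [120.5, 180.0, 229.1]  # hypothetical cut points
groups = [discretize(v, day_mins_cuts) for v in [50.0, 150.2, 300.0]]
```

After discretization, every continuous attribute is replaced by such group labels, which is what allows the rule-generation algorithms to match objects on interval conditions rather than raw values.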
A. Amin et al. Neurocomputing 237 (2017) 242–254
Table 2
Statistical information about ranked attributes.

Attributes  Distinct counts  Min value  Max value  Means  StdDev  Ranks  Values  Description

Table 3
Organization of attributes for decision table.

Sets  Description

Table 7
Evaluation of four rules generation methods through RST classification approach.

  EA  GA  CA  LA

Table 6
Statistics about rules induced using four methods.

Description                                                 EA    GA    CA   LEM2
Total no. of rules                                          4184  9468  369  625
No. of rules induced that classify customers as churn       1221  2674  122  160
No. of rules induced that classify customers as non-churn   2963  6715  247  465

The decision rules can be obtained from the training set by selecting any of the rules-generation methods (EA, GA, CA and LA). The EA and GA scan the training set object by object and generate rule sets by matching the objects and attributes with reducts, whereas the CA and LA can induce rule sets without matching against reduct sets, using the RSES toolkit.

The GA encodes the problem domain objects into chromosomes and applies selection, mutation and reproduction operators from the theory of natural selection to the population of chromosomes, which are candidate solutions. In each generation, the fittest chromosomes are retained after evaluation through a fitness function, and eventually the algorithm converges after a defined number of generations.
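The GA mechanism just described (chromosome encoding, selection, crossover, mutation, fitness evaluation) can be sketched on a toy reduct search: finding a small attribute subset that still discriminates the decisions of a decision table. The decision table and the fitness function below are our own assumptions for illustration, not the RSES implementation:

```python
# A minimal GA sketch (toy data, assumed fitness) for reduct search: a
# bitmask selects attributes; fitness rewards masks that keep the decision
# table consistent while using as few attributes as possible.
import random

TABLE = [  # (conditional attribute values, decision) - invented examples
    ((0, 1, 0, 1), 'C'),
    ((0, 1, 1, 1), 'C'),
    ((1, 0, 0, 0), 'NC'),
    ((1, 1, 0, 0), 'NC'),
    ((0, 0, 1, 0), 'NC'),
]
N_ATTRS = 4

def consistent(mask):
    """True if objects equal on the selected attributes share one decision."""
    seen = {}
    for values, decision in TABLE:
        key = tuple(v for v, m in zip(values, mask) if m)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def fitness(mask):
    # Reward consistency strongly; prefer fewer selected attributes.
    return (N_ATTRS - sum(mask)) + (N_ATTRS * 2 if consistent(mask) else 0)

def evolve(pop_size=20, generations=50, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_ATTRS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, N_ATTRS)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < p_mut:             # bit-flip mutation
                i = rng.randrange(N_ATTRS)
                child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

In this toy table the fourth attribute alone already separates churn from non-churn, so the GA converges toward very small consistent masks; the same search pressure, at scale, is what keeps the induced rule set concise.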
Fig. 3. Marginal percent of each discretized group within all attributes. The discretized groups of each variable are labeled by integer numbers (1, 2, ..., N) on the x-axis of each graph, while the y-axis indicates the marginal percent.

The GA is applied to the CCP problem using subset evaluation as the fitness function. Each generated rule is passed through the genetic operators, and the best among the rules, those having maximum support and confidence, are selected as the final candidate solutions; rules with lower support and confidence are simply discarded. The parameters for the GA are described in Table 5.

The GA for computing reducts has a probability of 60%, starting with an initial population of 50 chromosomes. The chromosomes are designed on the basis of the number of features in the dataset, with one bit allocated for the class label. The GA converges within 100 generations and produces an exhaustive rule base containing more than nine thousand rules. In the proposed study, important decision rules were extracted from the training set through these four different rules-generation algorithms. The decision rule set specifies rules in the form "if C then D", where C is a condition and D refers to the decision attribute. For example:

If CustServ_Calls = (3.5, *) Then Churn = (True)

Based on these simple and easily interpretable rules, the decision makers can understand the flow of customer churn behavior and can adopt a more suitable strategic plan to retain their customers. The total number of decision rules induced from the training set is summarized in Table 6.

6. Results and discussion

This section presents the performance of the classifiers (Section 6.1), the analysis of the discretized groups (Section 6.2), the analysis of features (Section 6.3) and, finally, the RST approximation analysis (Section 6.4). We also compare the proposed approach with other related techniques applied to the same dataset in Section 6.5.
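A decision rule of this "if C then D" form, together with the support/confidence scores used to filter candidate rules, can be sketched as below. The customer records and churn labels are invented for illustration; the paper's actual rules come from the RSES toolkit:

```python
# Hypothetical sketch: one decision rule in "if C then D" form, plus the
# support and confidence measures used to rank candidate rules.
def rule_customer_churn(record):
    """If CustServ_Calls is in (3.5, *) then Churn = True."""
    return record["CustServ_Calls"] > 3.5

def support_and_confidence(rule, records, churned):
    """Fraction of records the rule fires on, and how often it is correct."""
    fired = [c for rec, c in zip(records, churned) if rule(rec)]
    support = len(fired) / len(records)
    confidence = sum(fired) / len(fired) if fired else 0.0
    return support, confidence

customers = [
    {"CustServ_Calls": 5, "Day_Mins": 210.0},  # heavy service-call usage
    {"CustServ_Calls": 1, "Day_Mins": 95.5},   # light service-call usage
]
predictions = [rule_customer_churn(c) for c in customers]
```

Rules whose support and confidence fall below a threshold would be discarded, mirroring the filtering step described above for the GA-generated rule set.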
Table 8
Summary of the statistically most significant discretized groups.

Day_Mins        12  229.09
Intl_Mins        6   28.38
Eve_Mins         6   14.69
Intl_Calls       1   21.69
CustServ_Calls   5  298.08
Intl_Plan        2  203.24
Vmail_Plan       2   25.16

Fig. 5. Sensitivity (y-axis) versus 1-specificity (x-axis) for each individual feature.

Table 9
Approximation analysis.

Decision class  No. of objects  Lower approximation  Upper approximation  Accuracy
NC              2850            2850                 2850                 1.000
C               483             483                  483                  1.000

with an RST based classification approach. All four methods are applied to the same telecom dataset. Table 7 presents the comparison of these four rules-generation algorithms (GA, CA, EA, LA) with RST based classification. The GA with RST showed the most suitable predictive capacity, which answers P3 (Section 5). The LA gives the maximum accuracy, about 0.993; however, it has a coverage of only 66.8% of customers (it classified only 668 instances, while 332 customers were ignored). Similarly, the covering algorithm classified only 64% of customers, with 0.878 accuracy, the lowest among the four rules-generation algorithms. Although the EA achieved lower accuracy (92.6%) than the LA, it performed better than both the LA and CA in terms of coverage, recall and F-measure. On the other hand, Table 7 shows that the overall best performance is achieved by the GA.
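The approximation analysis of Table 9 follows the standard RST definitions. The sketch below, on a toy decision table of our own, computes the lower and upper approximations of a decision class under the indiscernibility relation and the accuracy of approximation |lower|/|upper|; when the two approximations coincide for every class, as in Table 9, the accuracy is 1.000 and there is no vagueness:

```python
# Sketch of RST approximations on a toy decision table (invented data).
# Objects with identical attribute tuples are indiscernible; a block is in
# the lower approximation if it lies entirely inside the target class, and
# in the upper approximation if it merely intersects it.
from collections import defaultdict

def approximations(objects, decisions, target):
    """objects: attribute tuples; decisions: class labels; target: a class."""
    blocks = defaultdict(list)               # indiscernibility classes
    for idx, obj in enumerate(objects):
        blocks[obj].append(idx)
    target_idx = {i for i, d in enumerate(decisions) if d == target}
    lower, upper = set(), set()
    for members in blocks.values():
        members = set(members)
        if members <= target_idx:            # block entirely inside the class
            lower |= members
        if members & target_idx:             # block intersects the class
            upper |= members
    return lower, upper

objs = [(0, 1), (0, 1), (1, 0), (1, 1)]
decs = ['C', 'NC', 'NC', 'C']
lo, up = approximations(objs, decs, 'C')
accuracy = len(lo) / len(up)                 # < 1 here, i.e. a vague class
```

In this toy table the first two objects are indiscernible yet carry different decisions, so the class 'C' is rough (accuracy below 1); the real dataset exhibits no such conflicts, which is why Table 9 reports accuracy 1.000 for both classes.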
For generating the rule set using the RST based classification approach for churn prediction in the telecommunication sector, the GA correctly predicted 86% of true churns, fully classified false churns, and achieved 98.2% overall accuracy with a minimum misclassification error of about 1.9% and 100% coverage of instances. Therefore, the GA is indicated as the best approach among the targeted algorithms (CA, LA and EA) for RST based prediction of customer churn, which answers P2 (Section 5).

It is observed that group 1 of Intl_Calls has a high churn ratio (20.84%), which shows that the customers who made the fewest international calls have churned. Similarly, group 1 of VMail_Plan also has the maximum number of churns (16.72%), which shows that customers who have not activated the voice mail plan have a greater tendency to churn. Finally, the customers who have activated the international call plan have a high churn ratio (42.41%). The decision makers can use this information when designing their customer retention policy, which answers P4.

Table 10
Comparison of predictive performance of the proposed and previous approaches applied to the same dataset.
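The evaluation measures reported in Table 7 (coverage, accuracy, precision, recall, F-measure) can be derived from a confusion matrix together with the count of instances the rule set leaves unclassified. The counts below are hypothetical, not the paper's results:

```python
# Sketch of the evaluation measures for a rule-based classifier that may
# abstain: coverage counts how many instances were classified at all, and
# accuracy is computed over the classified instances only.
def evaluate(tp, fp, tn, fn, unclassified):
    covered = tp + fp + tn + fn
    coverage = covered / (covered + unclassified)
    accuracy = (tp + tn) / covered
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"coverage": coverage, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f_measure": f_measure}

# Hypothetical counts for a 1000-instance test set, fully covered.
m = evaluate(tp=118, fp=10, tn=850, fn=22, unclassified=0)
```

Computing accuracy over covered instances only is what lets a low-coverage rule set (such as the LA here) report a very high accuracy while ignoring a third of the customers, which is the trade-off discussed in Section 6.5.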
6.4. Approximation analysis

We have also analyzed the vagueness in the dataset by using the upper and lower approximations of RST, as described in Table 9. Based on the results of the approximation analysis, it is observed that there is no vagueness in the samples.

6.5. Comparison and discussion of simulation results

In this study, the proposed approach evaluated four well-known rules-generation algorithms (discussed in Section 3.5) based on RST, to mine concise rules for efficiently predicting customer churn. The total numbers of rules induced from the telecom dataset by applying the rules-generation algorithms are summarized in Table 6. The LEM2 algorithm is seen to perform well on symbolic attributes only [75]. Fig. 6 gives a graphical depiction of the induced concise rules.

It is observed that applying either the CA or the LA in the proposed approach for rule generation tends to produce fewer rules than the GA and EA. Specifically, the CA and LA produced 369 and 625 rules respectively, as opposed to the 4184 and 9468 decision rules generated by the EA and GA respectively. In this empirical study, the GA produced the maximum number of decision rules compared to the other rules-generation algorithms. The CA and LA are also faster on average than the EA and GA, because they search among fewer rules when predicting customer churn. Although, out of these four rules-generation algorithms, CCP using the LA yields the maximum accuracy, both the CA and LA return very few valuable and meaningful rules, which directly affects the performance of both algorithms. Further, they also produced very low coverage compared to the EA and GA (see Table 7). On the other hand, Nandita & Jaya [76] concluded from their empirical results that the CA yields the best accuracy compared to both the GA and LA. However, they focused only on accuracy, without considering the coverage evaluation measure. In our case, the CA and LA covered 64% and 66% of customers respectively: the CA ignored 36% of customers with 87% accuracy, whereas the LA ignored 34% of customers with 99.3% accuracy. Therefore, we cannot recommend the CA and LA for use in such a scenario, where coverage would be compromised in favor of accuracy. On the other hand, the EA checks all possible combinations in search of the absolute optimum solution; the EA may therefore turn out to be unacceptably slow, since the required time is proportional to the maximum number of possible solutions [77]. It was also observed that the GA and EA performed best compared to the CA and LA, because both covered all the objects while offering reasonable performance, as shown in Table 7. However, although the GA and EA demonstrated the best overall performance, the GA performed much better than the EA, CA and LA in recognition of true churns (precision) and true non-churns (recall). To show the balance between these two measures, we evaluated the rules-generation algorithms on the F-measure, which combines precision and recall; the GA obtained the best score, a 92.5% F-measure, compared to all other rules-generation algorithms.

In the proposed study, the EA could not achieve optimal results compared to the GA, since the EA with RST was found to be relatively ineffective in reducing the large data in the decision table (information system) for producing effective decision rules. The GA, on the other hand, begins by treating the problem as a population of candidate solutions and generates new offspring through a crossover and mutation process, with the aim of having the best-fit candidates available when the next evaluation step starts, and so on [78]. It is concluded that a classifier trained on a large decision table using RST classification, coupled with the GA for inducing efficient decision rules, can produce the optimal solution for CCP in the telecommunication sector, subject to optimized parameter settings (see Table 5). More importantly, the proposed approach is also flexible in evolving itself to a new situation following a new generation, since background knowledge about the dataset is not really required [79].

Finally, it is clear from the comparative results that the proposed approach performed very well compared to the techniques previously applied to the same benchmark dataset from the telecommunication industry (see Table 10).

7. Conclusion and future work

In this study, the application of RST is explored to predict customer churn in the telecommunication sector by constructing a predictive classifier that can forecast churn behavior based on accumulated knowledge. To evaluate the results of the proposed approach, a benchmarking study is applied and the performance of four different rules-generation algorithms (EA, GA, CA and LA) is investigated. It is found that RST classification based on the GA outperforms the other rules-generation algorithms in terms of precision, recall, rate of misclassification, lift, coverage, accuracy and F-measure. The discretization process applied to the different attributes revealed some important insights into the reasons for customer churn that can ultimately help the decision makers to develop retention policies accordingly. The proposed approach also outperforms other techniques applied to the same dataset in terms of precision, recall, accuracy, F-measure and rate of misclassification. It is important to note that the study pertains to the specific dataset used; results may vary with other datasets.

This study has shed some light on the performance of popular ML techniques for CCP, and supported the advantage of applying RST in the proposed approach. In future, we intend to investigate the proposed approach further, both theoretically and experimentally, while considering several other pertinent issues. Firstly, churn datasets exhibit the class imbalance problem, whereby the churn class (the minority class, or class of interest) contains fewer samples than the non-churn class (the majority class). This makes it difficult for some ML techniques to recognize the minority class, although they may achieve high overall accuracy. Secondly, detecting and eliminating outliers would greatly contribute to providing better results. Finally, in this study the profiles of predicted churn customers were not considered, while these might be of interest to organizations in decision making related to retaining specific churn customers or letting them go; worthy churn customers in focus could possibly have greater lifetime value. We wish to address these challenging issues in future research.

Acknowledgements

The authors are grateful to the anonymous reviewers for their insightful comments and suggestions, which helped improve the quality of this paper. Professor A. Hussain is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) grant no. EP/M026981/1.

References

[1] J. Hadden, A. Tiwari, R. Roy, D. Ruta, Computer assisted customer churn management: state-of-the-art and future trends, Comput. Oper. Res. 34 (10) (2007) 2902–2917.
[2] A. Sharma, P. Prabin Kumar, A neural network based approach for predicting customer churn in cellular network services, Int. J. Comput. Appl. 27 (11) (2011) 26–31.
[3] W. Verbeke, D. Martens, C. Mues, B. Baesens, Building comprehensible customer churn prediction models with advanced rule induction techniques, Expert Syst. Appl. 38 (3) (2011) 2354–2364.
[4] C. Kirui, L. Hong, W. Cheruiyot, H. Kirui, Predicting customer churn in mobile telephony industry using probabilistic classifiers in data mining, IJCSI Int. J. Comput. Sci. Issues 10 (2) (2013) 165–172.
[5] B. Huang, M.T. Kechadi, B. Buckley, Customer churn prediction in telecommunications, Expert Syst. Appl. 39 (1) (2012) 1414–1425.
[6] C.-S. Lin, G.-H. Tzeng, Y.-C. Chin, Combined rough set theory and flow network graph to predict customer churn in credit card accounts, Expert Syst. Appl. 38 (1)
(2011) 8–15.
[7] R.H. Wolniewicz, R. Dodier, Predicting customer behavior in telecommunications, IEEE Intell. Syst. 19 (2) (2004) 50–58.
[8] A. Idris, A. Khan, Y. Soo, Intelligent churn prediction in telecom: employing mRMR feature selection and RotBoost based ensemble classification, Springer Science Business Media, New York, 2013, pp. 659–672.
[9] Z. Pawlak, Rough sets, Int. J. Comput. Inf. Sci. 11 (5) (1982) 341–356.
[10] S.-Y. Hung, D.C. Yen, H.-Y. Wang, Applying data mining to telecom churn management, Expert Syst. Appl. 31 (3) (2006) 515–524.
[11] C.-P. Wei, I.-T. Chiu, Turning telecommunications call details to churn prediction: a data mining approach, Expert Syst. Appl. 23 (2) (2002) 103–112.
[12] V.V. Saradhi, G.K. Palshikar, Employee churn prediction, Expert Syst. Appl. 38 (3) (2011) 1999–2006.
[13] V. Lazarov, M. Capota, Churn prediction, Bus. Anal. Course, TUM Comput. Sci. (2007).
[14] L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan, Group formation in large social networks, in: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '06, pp. 44–54, 2006.
[15] L. Xi, Y. Wenjing, L. An, N. Haiying, H. Lixian, Q. Luo, C. Yan, Churn analysis of online social network users using data mining techniques, in: Proceedings of the International MultiConference of Engineers and Computer Scientists, no. 1, pp. 14–16, 2012.
[16] W. Verbeke, D. Martens, B. Baesens, Social network analysis for customer churn prediction, Appl. Soft Comput. 14 (2014) 431–446.
[17] D. Archambault, N. Hurley, C.T. Tu, ChurnVis: visualizing mobile telecommunications churn on a social network with attributes, in: Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM, pp. 894–901, 2013.
[18] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A.A. Nanavati, A. Joshi, Social ties and their relevance to churn in mobile telecom networks, in: Proceedings of the 11th International Conference on Extending Database Technology - EDBT '08, pp. 668–677, 2008.
[19] J. David Nunez-Gonzalez, M. Grana, B. Apolloni, Reputation features for trust prediction in social networks, Neurocomputing 166 (2014) 1–7.
[20] U. Prasad Devi, S. Madhavi, Prediction of churn behavior of bank customers, Bus. Intell. J. 5 (1) (2012) 96–101.
[21] K. Chitra, B. Subashini, Customer retention in banking sector using predictive data mining technique, in: ICIT 2011 5th International Conference on Information Technology, 2011.
[22] J. Bloemer, K. de Ruyter, P. Peeters, Investigating drivers of bank loyalty: the complex relationship between image, service quality and satisfaction, Int. J. Bank Mark. 16 (1998) 276–286.
[23] N. Nguyen, G. LeBlanc, The mediating role of corporate image on customers' retention decisions: an investigation in financial services, Int. J. Bank Mark. 16 (2) (1998) 52–65.
[24] A. Zakaryazad, E. Duman, A profit-driven artificial neural network (ANN) with applications to fraud detection and direct marketing, Neurocomputing 175 (2016) 121–131.
[25] K.C. Lee, N. Chung, K. Shin, An artificial intelligence-based data mining approach to extracting strategies for reducing the churning rate in credit card industry, J. Intell. Inf. Syst. 8 (2) (2002) 15–35.
[26] D. Van den Poel, B. Larivière, Customer attrition analysis for financial services using proportional hazard models, Eur. J. Oper. Res. 157 (1) (2004) 196–217.
[27] J. Kawale, A. Pal, J. Srivastava, Churn prediction in MMORPGs: a social influence based approach, in: Proceedings of the 2009 International Conference on Computational Science and Engineering, vol. 4, pp. 423–428, 2009.
[28] M. Suznjevic, I. Stupar, M. Matijasevic, MMORPG player behavior model based on player action categories, in: Proceedings of the 10th Annual Workshop on Network and Systems Support for Games, IEEE Press, 2011.
[29] K. Chen, C. Lei, Network game design: hints and implications of player interaction, in: Proceedings of the 5th ACM SIGCOMM Workshop on Network and System Support for Games, pp. 1–9, 2006.
[30] S. Meaghan, B. Nick, Voluntary turnover: knowledge management, J. Intellect. Cap. 3 (3) (2002) 303–322.
[31] M. Kane, Laura, Predictive Models of Employee Voluntary Turnover, 2007.
[32] J. Burez, D. Van den Poel, CRM at a pay-TV company: using analytical models to reduce customer attrition by targeted marketing for subscription services, Expert Syst. Appl. 32 (2) (2007) 277–288.
[33] K. Coussement, D. Van den Poel, Churn prediction in subscription services: an application of support vector machines while comparing two parameter-selection techniques, Expert Syst. Appl. 34 (1) (2008) 313–327.
[34] G. Dror, D. Pelleg, O. Rokhlenko, I. Szpektor, Churn prediction in new users of Yahoo! answers, in: Proceedings of the 21st International Conference Companion on World Wide Web - WWW '12 Companion, pp. 829–834, 2012.
[35] R.A. Soeini, K.V. Rodpysh, Applying data mining to insurance customer churn management, Int. Proc. Comput. Sci. Inf. Technol. 30 (2012) 82–92.
[36] C. Rygielski, J.-C. Wang, C.D. Yen, Data mining techniques for customer relationship management, Technol. Soc. 24 (4) (2002) 483–502.
[37] S.M. Keaveney, Customer switching behavior in service industries: an exploratory study, J. Mark. 59 (2) (1995) 71–82.
[38] B. Padmanabhan, A. Hevner, C. Michael, S. Crystal, From information to operations: service quality and customer retention, ACM Trans. Manag. Inf. Syst. 2 (4) (2011).
[39] I. Brandusoiu, G. Toderean, Churn prediction in the telecommunications sector using support vector machines, Ann. ORADEA Univ. Fascicle Manag. Technol. Eng. (1) (2013).
[40] C. Kang, S. Pei-ji, Customer churn prediction based on SVM-RFE, in: 2008 International Seminar on Business and Information Management, vol. 1, pp. 306–309, 2008.
[41] E. Shaaban, Y. Helmy, A. Khedr, M. Nasr, A proposed churn prediction model, Int. J. Eng. Res. Appl. 2 (4) (2012) 693–697.
[42] H. Kaizhu, Y. Haiqin, M.R. Lyu, Machine Learning: Modeling Data Locally and Globally, Advanced Topics in Science and Technology in China, Springer-Verlag Berlin Heidelberg, 2008.
[43] H. Kaizhu, D. Zheng, J. Sun, Y. Hotta, K. Fujimoto, S. Naoi, Sparse learning for support vector classification, Pattern Recognit. Lett. 31 (13) (2010) 1944–1951.
[44] S.A. Qureshi, A.S. Rehman, A.M. Qamar, A. Kamal, A. Rehman, Telecommunication subscribers' churn prediction model using machine learning, in: Proceedings of the Eighth International Conference on Digital Information Management (ICDIM 2013), pp. 131–136, 2013.
[45] Z. Khawar Malik, A. Hussain, Q.M.J. Wu, Multi-layered echo state machine: a novel architecture and algorithm, IEEE Trans. Cybern. (2016).
[46] T. Vafeiadis, K.I. Diamantaras, G. Sarigiannidis, K.C. Chatzisavvas, A comparison of machine learning techniques for customer churn prediction, Simul. Model. Pract. Theory 55 (2015) 1–9.
[47] H. Abbasimehr, A neuro-fuzzy classifier for customer churn prediction, Int. J. Comput. Appl. 19 (8) (2011) 35–41.
[48] M.A.H. Farquad, V. Ravi, S.B. Raju, Churn prediction using comprehensible support vector machine: an analytical CRM application, Appl. Soft Comput. 19 (2014) 31–40.
[49] M.C. Mozer, R. Wolniewicz, D.B. Grimes, E. Johnson, H. Kaushansky, Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry, IEEE Trans. Neural Netw. 11 (3) (2000) 690–696.
[50] Z. Pawlak, Rough Sets and Data Mining, Kluwer Acad. Publ., 1997, p. 6.
[51] W. Au, K.C.C. Chan, X. Yao, A novel evolutionary data mining algorithm with applications to churn prediction, IEEE Trans. Evol. Comput. 7 (6) (2003) 532–545.
[52] J. Burez, D. Van den Poel, Handling class imbalance in customer churn prediction, Expert Syst. Appl. 36 (3) (2009) 4626–4636.
[53] R.J. Jadhav, U.T. Pawar, Churn prediction in telecommunication using data mining technology, Int. J. Adv. Comput. Sci. Appl. 2 (2) (2011) 17–19.
[54] W. Verbeke, K. Dejaeger, D. Martens, J. Hur, B. Baesens, New insights into churn prediction in the telecommunication sector: a profit driven data mining approach, Eur. J. Oper. Res. 218 (1) (2012) 211–229.
[55] K. Kim, C.-H. Jun, J. Lee, Improved churn prediction in telecommunication industry by analyzing a large network, Expert Syst. Appl. 41 (15) (2014) 6575–6584.
[56] G. Olle, A hybrid churn prediction model in mobile telecommunication industry, Int. J. e-Educ. e-Bus. e-Manag. e-Learn. 4 (1) (2014) 55–62.
[57] A. Amin, C. Khan, I. Ali, S. Anwar, Customer churn prediction in telecommunication industry: with and without counter-example, in: Proceedings of the 13th Mexican International Conference on Artificial Intelligence, MICAI 2014, Springer, pp. 206–218, 2014.
[58] Z. Pawlak, Rough sets, rough relations and rough functions, Fundam. Inform. 27 (2–3) (1996) 103–108.
[59] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Acad. Publ., Dordrecht, 1991.
[60] J.G. Bazan, M. Szczuka, The rough set exploration system, in: Transactions on Rough Sets III, Springer Berlin Heidelberg, 2005, pp. 37–56.
[61] Z. Pawlak, A. Skowron, Rough Sets and Conflict Analysis, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
[62] J. Łukasiewicz, Die logischen Grundlagen der Wahrscheinlichkeitsrechnung, Akademie der Wissenschaften, Krakau, 1913.
[63] S.H. Nguyen, H.S. Nguyen, Analysis of STULONG data by Rough Set Exploration System (RSES), in: Proceedings of the ECML/PKDD Workshop, pp. 71–82, 2003.
[64] J.G. Bazan, H.S. Nguyen, S.H. Nguyen, P. Synak, J. Wroblewski, Rough set algorithms in classification problem, in: Rough Set Methods and Applications, Physica-Verlag, Heidelberg, 2000, pp. 49–88.
[65] J. Wróblewski, Genetic algorithms in decomposition and classification problems, Rough Sets Knowl. Discov. 2 19 (1998) 471–487.
[66] J.W. Grzymala-Busse, A new version of the rule induction system LERS, Fundam. Inform. 31 (1) (1997) 27–39.
[67] J. Grzymala-Busse, LERS: a system for learning from examples based on rough sets, in: R. Slowinski (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, 1992.
[68] O. Vandecruys, D. Martens, B. Baesens, C. Mues, M. De Backer, R. Haesen, Mining software repositories for comprehensible software fault prediction models, J. Syst. Softw. 81 (5) (2008) 823–839.
[69] Z. Khawar Malik, A. Hussain, W. Jonathan, An online generalized eigenvalue version of Laplacian Eigenmaps for visual big data, Neurocomputing 173 (2016) 127–136.
[70] M.B. Stojanović, M.M. Božić, M.M. Stanković, Z.P. Stajić, A methodology for training set instance selection using mutual information in time series prediction, Neurocomputing 141 (2014) 236–245.
[71] G. Holmes, A. Donkin, I.H. Witten, WEKA: a machine learning workbench, in: Proceedings of ANZIIS '94 - Australian and New Zealand Intelligent Information Systems Conference, pp. 357–361, 1994.
[72] F. He, X. Wang, B. Liu, Attack detection by rough set theory in recommendation system, in: 2010 IEEE International Conference on Granular Computing, pp. 692–695, 2010.
[73] R. Bellazzi, B. Zupan, Predictive data mining in clinical medicine: current issues and guidelines, Int. J. Med. Inform. 77 (2) (2008) 81–97.
[74] J. Hadden, A Customer Profiling Methodology for Churn Prediction (PhD thesis), Cranfield Univ., 2008.
[75] J.W. Grzymala-Busse, A comparison of three strategies to rule induction from data with numerical attributes, in: Proceedings of the International Workshop on Rough Sets in Knowledge Discovery (RSKD 2003), pp. 132–140, 2003.
[76] N. Sengupta, J. Sil, Comparison of different rule calculation method for rough set theory, Int. J. Inf. Electron. Eng. 2 (3) (2012) 464–466.
[77] J. Nievergelt, Exhaustive search, combinatorial optimization and enumeration: exploring the potential of raw computing power, in: SOFSEM 2000: Theory and Practice of Informatics, LNCS 1963 (2000) 18–35.
[78] N. Ariffin, M. Zin, S. Norul, H. Sheikh, N. Faridatul, A. Zainal, A comparison of exhaustive, heuristic and genetic algorithm for travelling salesman problem in PROLOG, Int. J. Adv. Sci. Eng. Inf. Technol. 2 (2012) 49–53.
[79] C. Hor, P.A. Crossley, D.L. Millar, Application of genetic algorithm and rough set
theory for knowledge extraction, in: Power Tech 2007, IEEE Lausanne, pp. 1117–1122, 2007.
[80] A. Amin, S. Shehzad, C. Khan, I. Ali, S. Anwar, Churn prediction in telecommunication industry using rough set approach, in: New Trends in Computational Collective Intelligence, Springer, 2015, pp. 83–95.

Adnan Amin received the MSc degree in Computer Science from the University of Peshawar, and the MS degree (with Distinction) in Computer Science, majoring in Databases, from the Institute of Management Sciences Peshawar, Pakistan, in 2008 and 2015 respectively. He is currently a PhD scholar and Lecturer at the Department of Computing Sciences, Institute of Management Sciences Peshawar. His research interests include data mining, databases, big data and machine learning.

Sajid Anwar obtained his BSc (Computer Science) and MSc (Computer Science) degrees from the University of Peshawar in 1997 and 1999 respectively. He obtained his MS (Computer Science) and PhD (in Software Architecture) from the University of NUCES-FAST, Pakistan, in 2007 and 2011 respectively. He is currently Assistant Professor of Computing Science, and coordinator of the BS Software Engineering program at the Institute of Management Sciences Peshawar, Pakistan. His research interests are concerned with software architecture, software requirement engineering, bio-search based software engineering and mining software repositories.

Awais Adnan is Assistant Professor and Coordinator of the Master Program, Department of Computer Science, Institute of Management Sciences Peshawar. He received his PhD from IMSciences Peshawar and his MS from NUST Islamabad. He is manager of ORIC at IMSciences Peshawar, promoting and facilitating research students in the commercialization of their research. His major areas of interest are multimedia and machine learning.

Muhammad Nawaz received his MSc (Computer Science) and MS in Information Technology from the University of Peshawar, Pakistan. He worked as a lecturer at the University of Peshawar, then as a computer programmer at Khyber Teaching Hospital, Peshawar, and was subsequently appointed Assistant Professor in Multimedia at the Institute of Management Sciences, Peshawar, a position he still holds. Currently he is the Head of PhD and MS Computer Sciences at IMSciences Peshawar.

Khalid Alawfi received the BSc degree in computer engineering in 1999 from King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia, and MSc and PhD degrees in Informatics from Bradford University, UK, in 2002 and 2006 respectively. During 2002–2006, he worked as part of the Networks and Performance Engineering Research Group at Bradford University. Currently he is an Associate Professor in Computer Science, and Dean of the College of Computer Science and Engineering at Taibah University in Saudi Arabia. He is also a Senior Honorary Fellow at the Cognitive Big Data Informatics (CogBID) Research Laboratory at the University of Stirling, Scotland, UK.

Kaizhu Huang is currently an Associate Professor at Xi'an Jiaotong-Liverpool University, China. Before that, he was an Associate Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). He was a student of the Special Class for Gifted Youth at Xi'an Jiaotong University and received the BSc degree in Engineering in 1997. He received the MSc degree in Engineering from CASIA in July 2000 and the PhD degree from The Chinese University of Hong Kong (CUHK) in 2004. He worked as a research scientist at Fujitsu R&D Centre from 2004 to 2007. During 2008 and 2009, he was a research fellow at CUHK and a researcher at the University of Bristol, UK. He is the recipient of the 2011 Asia Pacific Neural Network Assembly (APNNA) Distinguished Younger Researcher Award, and also received the Best Book Award in the National "Three 100" Competition 2009. He has published 6 books with Springer and over 110 international research papers (40 SCI-indexed international journal papers and 60+ EI conference papers), e.g., in journals (JMLR, Neural Computation, IEEE T-PAMI, IEEE T-NN, IEEE T-BME, IEEE T-SMC, NN) and conferences (NIPS, IJCAI, SIGIR, UAI, CIKM, ICDM, ICML, ECML, CVPR). He serves as an Advisory Board Member of the Springer Book Series Neuroinformatics, and is a member of the CCF Technical Committee of Artificial Intelligence and Pattern Recognition. He has served on the programme committees of many international conferences such as ICONIP, IJCNN, IWACI, EANN and KDIR. In particular, he has served as chair in several major conferences or workshops, e.g., AAAI 2016 (Area Chair), ACML 2016 (Publication co-Chair), ICONIP 2014 (Program co-Chair), DMC 2012–2016 (Organizing co-Chair), ICDAR 2011 (Publication Chair), ACPR 2011 (Publicity Chair), ICONIP 2006, 2009–2011 (Session Chair).

Amir Hussain received the BEng (with highest first class Hons.) and the PhD degree in novel neural network architectures and algorithms from the University of Strathclyde, Glasgow, UK, in 1992 and 1997, respectively. He is currently full Professor of Computing Science, and founding Director of the Cognitive Big Data Informatics (CogBID) Research Laboratory at the University of Stirling in Scotland, UK. He has conducted and led collaborative research with industry, partnered in major European and international research programs, and supervised over 30 PhD students. He has (co)authored over 300 papers, including over a dozen books and 100+ journal papers. He is founding Editor-in-Chief of the journals Cognitive Computation (Springer Nature) and Big Data Analytics (BioMed Central), and Chief Editor of the Springer Book Series on Socio-Affective Computing and Cognitive Computation Trends. He is Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems, the IEEE Transactions on Systems, Man, and Cybernetics: Systems, and the IEEE Computational Intelligence Magazine. He is Chapter Chair of the IEEE UK & RI Industry Applications Society, Vice-Chair of the Emerging Technologies Technical Committee of the IEEE Computational Intelligence Society (CIS), and founding General Chair of the IEEE CIS sponsored, flagship IEEE SSCI (CICARE Symposium) series. He is a Senior Fellow of the Brain Science Foundation (USA).