Data Preparation
Executive summary
This study aimed to develop and test a data mining model for predicting credit card customer
churn for a bank. The bank issued four types of credit card, namely Blue, Gold, Silver and
Platinum. The dataset was downloaded from a secondary source, Kaggle.com. The data were
preprocessed and prepared for model development. Decision tree analysis was performed to
predict customer churn, and the results were interpreted in the form of a decision tree.
Table of Contents
Executive summary
Introduction
1 Data preparation
2 Modelling
3 Test Design
4 Analysis and conclusion
Discussion
Conclusion
Introduction
A successful banking service focuses on making a profit while minimizing risk. Profit comes
from various sources according to fund management decisions, but minimizing risk has become a
heavy task in banking operations, especially in credit card management, which involves
critical risk factors. Here, customer attrition is considered a high-risk factor at the level
of individual customers, and it indicates the bank's volatility in terms of fund management
and risk management. Risk handling is a process of identifying, analysing and accepting risk
and finding solutions to overcome it. Any growing bank will put its customers first and, at
the same time, adopt a systematic way of keeping customer satisfaction as high as possible on
a regular basis. This systematic approach should be executed effectively with two aspects in
mind: prediction, and proactive decisions that reduce customer attrition quickly. This operation
is a challenging task for top management, who must make effective decisions to protect
potential revenue sources for the organization. Credit card customer churn is a common
phenomenon in the banking sector around the world, but from a business point of view it is not
a healthy sign for any organisation. In this case, the bank manager observed a gradual drop in
the number of credit card customers, who were leaving the bank's service. We took the data
from the source and analysed the reasons behind the issue. The first and foremost step is to
improve customer loyalty by measuring it and adopting a specific technique to reach the
customers. This will significantly reduce credit card customer attrition at the bank. In
addition, a productive analysis effort will have a good effect on the customers' mindset, so
that they continue their activity with the bank with confidence. Retaining customers is a
cumbersome task, but it yields good returns. The bank needs to invest in technology-based
operational solutions that support and simplify decision making. This is a long and continuous
process that starts on the first day of the bank's operation, and the technology needs to be
improved regularly in line with prevailing trends. While bank management was aggressive in
mobilizing funds, the falling number of credit card customers alerted top management at the
decision-making level. The goal of retaining customers can be attained with a
technology-driven solution that keeps them satisfied on a day-to-day basis through simplified
systems.
Management is focused on further developing operations with technology-driven methods that
give decision makers at the bank a complete view. This technology provides a churn data
analysis that keeps decision-making errors to a minimum. The churn prediction for credit card
attrition will be performed using the KNIME Analytics software.
1 Data preparation
1.1 Selection of dataset
The bank issued four different types of credit card, namely Blue, Gold, Silver and Platinum. A
high customer attrition rate was observed during the last fiscal year. There are 23 variables
in this dataset. The Bank Churners dataset file is read with the CSV Reader node in the KNIME
workflow.
Output
1.3 Feature construction and extraction
The data were constructed using a feature extraction method. The variables average open to buy
and number of contacts with the call center were excluded. All remaining attributes were
chosen for the study. The retained attributes include attrition flag, age, gender, number of
dependents of the customer, educational status, marital status, income, card type (Blue,
Silver, Gold or Platinum), total number of months on book, credit limit of the customer, total
revolving balance, total change in amount transacted between Q1 and Q4, total amount
transacted, total count of transactions made and average utilization rate.
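Outside KNIME, the column filtering described above could be reproduced roughly as follows. This is only a minimal sketch using the Python standard library. The column names (for example `Avg_Open_To_Buy` and `Contacts_Count_12_mon`) are assumptions about the Kaggle file rather than names confirmed by this report, and the two-row sample is made up for illustration.

```python
import csv
import io

# Assumed names of the two excluded variables (not confirmed by the report).
EXCLUDED = {"Avg_Open_To_Buy", "Contacts_Count_12_mon"}


def drop_columns(rows, excluded=EXCLUDED):
    """Return copies of the rows without the excluded columns."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


# Made-up two-row sample standing in for the real CSV file.
SAMPLE_CSV = """Attrition_Flag,Customer_Age,Card_Category,Avg_Open_To_Buy,Contacts_Count_12_mon,Total_Trans_Ct
Existing Customer,45,Blue,11914,3,42
Attrited Customer,62,Gold,7392,2,20
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = drop_columns(rows)
print(list(selected[0].keys()))
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle; everything else stays the same.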
2 Modelling
2.1 Selection of modelling task
Churn prediction is a typical problem faced by customer service industries. Commonly used
churn prediction methods involve data mining techniques such as logistic regression,
clustering, KNN, Naïve Bayes and decision tree analysis. The top five predictive analytics
models are:
Classification model: Considered the simplest model, it categorizes data for simple and direct
query response. An example use case is answering the question "Is this a fraudulent transaction?"
Clustering model: This model groups data together by common attributes. It works by grouping
things or people with shared characteristics or behaviours and plans strategies for each group
at a larger scale. An example is determining credit risk for a loan applicant based on what
other people in the same or a similar situation did in the past.
Forecasting model: This is a popular model, and it works on anything with a numerical value
based on learning from historical data. For example, in answering how much lettuce a
restaurant should order next week or how many calls a customer support agent should be able to
handle per day or week, the system looks back to historical data.
Outlier model: This model works by analysing abnormal or outlying data points. For example, a
bank might use an outlier model to identify fraud by asking whether a transaction is outside
the customer's normal buying habits or whether an expense in a given category is normal or
not. For instance, a $1,000 credit card charge for a washer and dryer in the cardholder's
preferred big box store would not be alarming, but $1,000 spent on designer clothing in a
location where the customer has never charged other items might be indicative of a breached
account.
Time series model: This model evaluates a sequence of data points based on time. For example,
the number of stroke patients admitted to the hospital in the last four months is used to
predict how many patients the hospital might expect to admit next week, next month or for the
rest of the year. A single metric measured and compared over time is thus more meaningful than
a simple average.
Decision tree analysis using the random forest algorithm was used to classify the dataset. It
is a supervised machine learning algorithm. The dataset was partitioned into 66% for model
building and 34% for testing the model. Features were selected using a forward feature
selection loop. The decision tree analysis was performed and the tree image was produced for
both training and testing data. The ROC curve and accuracy statistics of the model were
computed.
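As an illustration of the 66%/34% partition, the sketch below shuffles the records and cuts the list at the requested fraction. It is a hedged approximation and not the KNIME partitioning itself: the random shuffle, the fixed seed and the `partition` helper are all assumptions introduced here.

```python
import random


def partition(records, train_fraction=0.66, seed=42):
    """Shuffle a copy of the records and split it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # placeholder for the customer rows
train, test = partition(records)
print(len(train), len(test))
```

A stratified split, which keeps the churn proportion equal in both parts, would be a reasonable alternative when the classes are imbalanced.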
2.3 Assumptions
1. The entire training data set is considered as the root of the decision tree.
2. The feature attributes are categorical.
3. The records are distributed recursively.
4. Attribute selection is based on the Gini index.
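To make the last assumption concrete, the sketch below computes the Gini impurity of a set of class labels and the weighted impurity after splitting on a categorical attribute. The attribute with the lowest weighted impurity would be chosen for the split. The toy records and the helper names `gini` and `gini_of_split` are hypothetical and serve only as illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_of_split(rows, attribute, target):
    """Weighted Gini impurity after splitting the rows on a categorical attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    return sum(len(group) / n * gini(group) for group in groups.values())


# Made-up toy records; churn is 1 for attrited and 0 for existing customers.
rows = [
    {"card": "Blue", "gender": "F", "churn": 1},
    {"card": "Blue", "gender": "M", "churn": 1},
    {"card": "Gold", "gender": "F", "churn": 0},
    {"card": "Gold", "gender": "M", "churn": 0},
]
best = min(["card", "gender"], key=lambda a: gini_of_split(rows, a, "churn"))
print(best)
```

In this toy data the card type separates the classes perfectly, so its weighted impurity is zero and it is preferred over gender.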
3 Test Design
The data were partitioned into 66% for model building and the remaining 34% for model
evaluation. The pruning strategy is based on the number of records. The number of times that
the predicted class coincides with the original churn class is the basis for any measure of
model quality, as calculated by the Scorer node. Existing customers (Churn=0) are, as hoped,
far more numerous than attrited customers (Churn=1). To take this imbalance into account and
give more weight to errors made on the class Churn=1, an Equal Size Sampling node can be
introduced on the test set to under-sample the more numerous class Churn=0. The Scorer node,
or any other scoring node, also makes it possible to evaluate and compare different models. A
subsequent Sorter node would then allow only the best performing model to be selected and
retained.
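The idea behind equal size sampling can be sketched in a few lines: every class is randomly reduced to the size of the smallest class. This is an illustrative approximation, not the KNIME node's implementation. The `equal_size_sample` helper, the seed and the made-up class counts are assumptions.

```python
import random
from collections import defaultdict


def equal_size_sample(rows, label_key, seed=0):
    """Under-sample every class down to the size of the smallest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    sampled = []
    for group in by_class.values():
        sampled.extend(rng.sample(group, smallest))  # sampling without replacement
    return sampled


# Made-up test set: 84 rows with Churn=0 and 16 rows with Churn=1.
test_set = [{"Churn": 0}] * 84 + [{"Churn": 1}] * 16
balanced = equal_size_sample(test_set, "Churn")
print(len(balanced))
```

After sampling, both classes contribute the same number of rows, so errors on the minority class weigh as much as errors on the majority class in the accuracy figure.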
Model building
Parameters
Model description
The records were color-coded by class (existing or attrited customer). The data were filtered
and partitioned into 66% and 34%. The decision tree analysis predicts churn based on the
selected variables. The accuracy of the prediction model is evaluated using specificity
analysis and the ROC curve.
The model estimated that Blue card holders show the highest rate of churn. However, most of
the Gold credit card users are also estimated as churned. A lower attrition rate was observed
among Silver credit card users. Platinum credit card users are estimated to have a zero churn
rate.
Specificity
Specificity is high for the Silver, Gold and Platinum card types; only the Blue type shows
lower specificity.
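For reference, specificity is the share of actual negatives that the model predicts as negative, TN / (TN + FP), and accuracy is the share of all predictions that are correct. The sketch below derives both from a pair of label lists. The labels are made up and do not reproduce the study's results, and the helper names are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that match the actual class."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def specificity(tp, fp, tn, fn):
    """Share of actual negatives that were predicted as negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Made-up labels: 1 = attrited customer, 0 = existing customer.
actual = [1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts, accuracy(*counts), specificity(*counts))
```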
ROC Curve
The ROC curve shows that most of the variables contribute to the true positive rate. Total
transaction count, total transacted amount, revolving balance and average utilization rate
account for the largest area under the curve.
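The area under the ROC curve can be read as the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner. The sketch below computes it directly from that definition by comparing every positive/negative pair, with ties counted as one half. The scores are made up, and `roc_auc` is a hypothetical helper, not part of any tool used in the study.

```python
def roc_auc(actual, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute the AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up churn labels and predicted churn probabilities.
actual = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(actual, scores)
print(round(auc, 3))
```

The pairwise comparison is quadratic in the number of records, which is fine for a sketch; a rank-based formula would be preferable for the full dataset.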
Discussion
Accuracy is a key factor in financial management because it underpins both ethics and
performance. Performance combines accuracy and speed with simplicity. Human ability is
limited, and handling large amounts of data manually has become impossible. Technology
improves performance and makes the operational flow quick and efficient, with adequate
security measures. Under a manual system, customer handling capacity was minimal and the risk
involved was very high because of delays in data handling, which led to more customer
attrition. The main aim of bank management is to curtail the credit card customer attrition
rate on a war footing. A Pinnacle Customer Engagement programme, which meets customers face to
face, is the best way to bring complete satisfaction to both customers and the bank. In this
process, the bank comes to understand each customer’s issues at first hand and can offer a
fruitful solution that retains them as customers for good. The issues raised by customers may
be genuine and new, reflecting the current situation in banking operations. The Government
changes its financial goals as and when required. The banking sector earns its money from the
interest rates applicable to credit cards, loans, etc. Similarly, customers’ financial
strength fluctuates with the factors that affect their income.
In the modern banking industry, acquiring new genuine customers costs more than retaining
existing ones, because by retaining customers banks keep the income from current customers and
save the money, effort and marketing needed to win new ones. Effective data analysis gives
banks a clear, technology-driven basis for decisions that increase customer loyalty by
predicting and substantially reducing the attrition rate. An effective predictive model
reveals the risk of dormancy at the level of the individual customer. The key elements of this
model are predicting and preventing customer attrition, using micro-level examination
parameters to identify the ‘customer-at-risk’. Technology-driven analysis gives banks a clear
view of how to handle customers according to their level of risk.
The study observed that the average utilization rate is higher for existing customers than for
attrited customers. The current dataset contains 16% attrited customers, with utilization
rates close to 0.99. The estimated total transaction figure for churned customers is less than
833, and it is higher for existing customers. The total change in amount between Q4 and Q1 is
0.3% for churned customers and 98% for existing customers.
Conclusion
The study recommends developing a strategy to avoid customer churn on credit cards. The study
observed that an estimated 16% of customers churned. Most of these churned customers used Gold
and Blue credit cards. The transaction amount was less than 900 USD for the churned customers.
Hence, a few offers on Blue and Gold credit cards should be introduced to reduce such churn
rates.