
Predictive Analysis For Retail Banking

Logistic regression was chosen as the best predictive model for a retail banking marketing campaign dataset, based on an accuracy of 80.9%. Exploratory data analysis, including a correlation heatmap, was performed, and the features were min-max scaled before model selection. Logistic regression outperformed k-nearest neighbors, decision trees, random forests, and support vector machines on this imbalanced classification problem, which predicts whether a customer subscribes to a bank term deposit.

Predictive Analytics for Retail Banking

Done by
vamsi
WHAT IS RETAIL BANKING?

▪ Typical mass-market banking in which individual customers use local branches of larger commercial banks. Services offered include savings and checking accounts, mortgages, personal loans, and debit/credit cards. The focus is on the customer.
▪ The main challenges in this sector are:
• What is the most suitable product to recommend to a customer?
• What is the best time to market the product?
• Which is the most effective channel to contact a customer?
PROBLEM STATEMENT

▪ In this problem, the data relates to the direct marketing campaigns of a banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed to ('yes') or not ('no'). The goal is to predict whether the client will subscribe to a term deposit.
ABOUT DATASET

▪ This is the classic bank marketing dataset, originally uploaded to the UCI Machine Learning Repository. The dataset describes a marketing campaign of a financial institution, which can be analysed to find strategies for improving the bank's future marketing campaigns.
Here is what the columns in the dataset represent:
▪ Age: Age of the client - (numeric)
▪ Job: Client's occupation - (categorical) (admin, blue-collar, entrepreneur, housemaid, management, retired, self-employed, services, student, technician, unemployed, unknown)
▪ Marital: Client's marital status - (categorical) (divorced, married, single, unknown; note: divorced means divorced or widowed)
▪ Education: Client's education level - (categorical)
▪ Default: Indicates if the client has credit in default - (categorical) (no, yes)
▪ Balance: Average yearly balance, in euros - (numeric)
▪ Housing: Does the client have a housing loan? - (categorical) (no, yes)
▪ Loan: Does the client have a personal loan? - (categorical) (no, yes)
▪ Contact: Type of communication contact - (categorical) (unknown, cellular, telephone)
▪ Day: Day of last contact with client
▪ Month: Month of last contact with client - (categorical) (Jan - Dec)
▪ Duration: Duration of last contact with client, in seconds - (numeric). For benchmark purposes only, and not reliable for predictive modelling, because the duration is not known before the call is made.
▪ Campaign: Number of contacts performed during this campaign for this client - (numeric, includes last contact)
▪ Pdays: Number of days that passed since the client was last contacted - (numeric) (-1 means the client was not previously contacted)
▪ Previous: Number of client contacts performed before this campaign - (numeric)
▪ Poutcome: Previous marketing campaign outcome - (categorical)
▪ Deposit: Whether the client subscribed to a term deposit - (output) (yes, no)


EXPLORATORY DATA ANALYSIS (EDA)
CORRELATION USING HEATMAP
Marital status encoding: 0 = divorced, 1 = married, 2 = single
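
A correlation heatmap is a coloured view of a pairwise correlation matrix. The sketch below shows one way such a matrix could be computed after encoding marital status with the codes listed above. The toy rows, the helper names (pearson, correlation_matrix), and the use of Pearson correlation are assumptions made for illustration; the slides do not state how the heatmap was produced.

```python
from math import sqrt

# Encoding shown on the slide: 0 = divorced, 1 = married, 2 = single.
MARITAL_CODES = {"divorced": 0, "married": 1, "single": 2}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(a, b): r} for every ordered pair of named numeric columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy rows, invented for illustration only (not the real data).
marital = ["married", "single", "divorced", "married", "single"]
columns = {
    "marital": [MARITAL_CODES[m] for m in marital],
    "age": [45, 27, 52, 38, 30],
    "deposit": [0, 1, 0, 0, 1],  # 'yes' -> 1, 'no' -> 0
}

matrix = correlation_matrix(columns)
```

Note that integer codes impose an arbitrary order on a nominal variable such as marital status, so correlations involving it should be read with caution.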
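Model Selection
Why Min-Max Scaler?

▪ The output variable is coded as 0s and 1s, so we scale the feature variables down to the range 0 to 1 as well. This also keeps large-valued features such as balance from dominating distance-based models like KNN and SVM.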
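
Min-max scaling maps each value x of a column to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. Here is a minimal sketch of that formula in plain Python. The function name min_max_scale and the sample balances are hypothetical, and the slides themselves most likely relied on a library scaler such as scikit-learn's MinMaxScaler rather than hand-written code.

```python
def min_max_scale(values):
    """Rescale a numeric column to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical balances in euros, invented for illustration only.
balance = [-200, 0, 1500, 2300, 800]
scaled_balance = min_max_scale(balance)
```

In practice the minimum and maximum are usually taken from the training split only and then reused on the test split, so that no information from the test data leaks into the model.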
TEST SIZE

80-20 (80% training, 20% test) *Recommended for banking sector
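
An 80-20 split can be sketched as below: the rows are shuffled once with a fixed seed and the first 20% are held out for testing. The function name split_rows, the seed value, and the stand-in rows are assumptions made for illustration; the slides do not say how the split was performed.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 customer records
train_rows, test_rows = split_rows(rows)
```

For an imbalanced target, a stratified split that preserves the yes/no ratio in both parts is often preferable to a purely random one.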


Accuracies compared:

• K-Nearest Neighbours: 75.3%
• Logistic Regression: 80.9%
• Decision Tree: 78.2%
• Random Forest Classifier: 78%
• Support Vector Machine: 53%
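
Accuracy is the fraction of test rows whose predicted label matches the true label. The sketch below defines that metric and ranks the five models by the accuracies reported above. The figures are copied from the slide; the helper name accuracy and the variable names are hypothetical.

```python
# Accuracies as reported on the slide, in percent.
reported_accuracy = {
    "K-Nearest Neighbours": 75.3,
    "Logistic Regression": 80.9,
    "Decision Tree": 78.2,
    "Random Forest Classifier": 78.0,
    "Support Vector Machine": 53.0,
}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Rank the models from best to worst by reported accuracy.
ranking = sorted(reported_accuracy, key=reported_accuracy.get, reverse=True)
best_model = ranking[0]
```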
Confusion Matrices

[Figures: confusion matrices for KNN, Logistic Regression, Decision Tree, Random Forest, and SVM]
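
A confusion matrix breaks predictions down into true negatives, false positives, false negatives, and true positives. On an imbalanced target this says more than accuracy alone. The sketch below counts the four cells for binary labels and derives precision and recall from them. The labels are invented for illustration, and the function is a hand-written stand-in, not the code used for the slides.

```python
def confusion_matrix(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels where 1 = subscribed."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical labels, invented for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)  # share of predicted subscribers who subscribed
recall = tp / (tp + fn)     # share of actual subscribers who were found
```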


GRAPHS
WE CHOOSE

LOGISTIC REGRESSION
Accuracy = 80.9%
CONCLUSION

▪ Most classification problems in the real world are imbalanced. Also, data sets almost always have missing values. In this post, we covered strategies to deal with both missing values and imbalanced data sets. We also explored different ways of building ensembles in sklearn. Below are some takeaway points:
▪ Sometimes we may be willing to give up some improvement to the model if it would increase complexity far more than it improves the evaluation metrics.
▪ When building ensemble models, try to use good models that are as different as possible, to reduce correlation between the base learners. We could have enhanced our stacked ensemble model by adding a dense neural network and other kinds of base learners, as well as by adding more layers to the stacked model.
▪ Easy Ensemble usually performs better than other resampling methods.
