Predictive Analysis For Retail Banking
Done by
vamsi
RETAIL BANKING ??!
In this problem, the data relates to the direct marketing campaigns of a banking
institution. The marketing campaigns were based on phone calls. Often, more
than one contact with the same client was required in order to assess whether the product
(a bank term deposit) would be subscribed ('yes') or not ('no'). The goal is to
predict whether the client will subscribe to a term deposit.
ABOUT DATASET
This is the classic bank marketing dataset, originally uploaded to the UCI
Machine Learning Repository. It contains information about a
marketing campaign of a financial institution, which you can analyse
to find strategies for improving the bank's future
marketing campaigns.
Here is what the columns in the data set represent:
Age : Age of the client - (numeric)
Job : Client’s occupation - (categorical) (admin, blue-collar, entrepreneur, housemaid, management, retired,
self-employed, services, student, technician, unemployed, unknown)
Marital : Client’s marital status - (categorical) (divorced, married, single, unknown; note: "divorced" means
divorced or widowed)
Education : Client’s education level - (categorical)
Default : Indicates if the client has credit in default - (categorical) (no, yes)
Balance : Average yearly balance, in euros - (numeric)
Housing : Does the client have a housing loan? - (categorical) (no, yes)
Loan : Does the client have a personal loan? - (categorical) (no, yes)
Contact : Type of communication contact - (categorical) (unknown, cellular, telephone)
Day : Day of the month of last contact with client - (numeric)
Month : Month of last contact with client - (categorical) (Jan - Dec)
Duration : Duration of last contact with client, in seconds - (numeric)
For benchmark purposes only, and not reliable for predictive modelling, because the duration is not known before the call is made.
Campaign : Number of contacts performed during this campaign and for this client,
including the last contact - (numeric)
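As a rough illustration of how columns like these might be prepared for a model, the sketch below builds a tiny hand-made table, drops the duration column, and one-hot encodes the categorical columns. The rows are invented for illustration and are not taken from the dataset. The lowercase column names and the target column name `y` are assumptions, and only a subset of the columns is shown.

```python
import pandas as pd

# Invented example rows; real data would be loaded from the dataset file.
rows = [
    {"age": 34, "job": "technician", "marital": "single", "balance": 1200,
     "housing": "yes", "loan": "no", "duration": 210, "campaign": 1, "y": "no"},
    {"age": 51, "job": "retired", "marital": "married", "balance": 3400,
     "housing": "no", "loan": "no", "duration": 640, "campaign": 2, "y": "yes"},
    {"age": 42, "job": "blue-collar", "marital": "divorced", "balance": -150,
     "housing": "yes", "loan": "yes", "duration": 95, "campaign": 4, "y": "no"},
]
df = pd.DataFrame(rows)

# Target: 1 if the client subscribed to the term deposit, else 0.
target = (df["y"] == "yes").astype(int)

# Drop the target and 'duration' (only known once the call has ended).
features = df.drop(columns=["y", "duration"])

# One-hot encode the categorical columns; numeric columns pass through.
categorical = ["job", "marital", "housing", "loan"]
X = pd.get_dummies(features, columns=categorical)

print(X.columns.tolist())
print(target.tolist())
```

One-hot encoding is only one option here; the two-valued columns (housing, loan) could equally be mapped to a single 0/1 column each.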
LOGISTIC REGRESSION
Accuracy = 80.9%
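Accuracy on its own can look flattering when one class dominates, so it helps to read it next to a majority-class baseline. The following is a minimal sketch using scikit-learn, assuming synthetic data from `make_classification` as a stand-in for the bank data. It does not reproduce the 80.9% figure, and the class ratio, feature count, and scaling step are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the bank data (roughly 85% 'no').
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
base_pred = baseline.predict(X_test)

acc = accuracy_score(y_test, pred)
base_acc = accuracy_score(y_test, base_pred)
bal_acc = balanced_accuracy_score(y_test, pred)

print(f"logistic regression accuracy: {acc:.3f}")
print(f"majority-class baseline accuracy: {base_acc:.3f}")
print(f"logistic regression balanced accuracy: {bal_acc:.3f}")
```

If the model's accuracy is close to the baseline, a metric such as balanced accuracy, recall, or ROC AUC may say more about how well the 'yes' class is being predicted.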
CONCLUSION
Most classification problems in the real world are imbalanced, and data
sets almost always have missing values. In this post, we covered strategies for dealing with both missing
values and imbalanced data sets. We also explored different ways of building ensembles
in sklearn. Below are some takeaway points:
Sometimes it is worth giving up a small improvement to the model if achieving it would
increase the complexity far more than it improves the
evaluation metrics.
When building ensemble models, try to use good models that are as different as possible
to reduce correlation between the base learners. We could have enhanced our stacked
ensemble model by adding a dense neural network and other kinds of base learners, as
well as by adding more layers to the stacked model.
Easy Ensemble often performs better than other resampling methods, although the results depend on the data set.
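The point about using base learners that are as different as possible can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative setup on synthetic data, not the stacked model referred to above: the choice of base learners (random forest, k-nearest neighbours, logistic regression), the meta-learner, and all hyperparameters are assumptions, and no neural network is included.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 85% negative class).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Base learners from different model families, so that their errors
# are less likely to be correlated.
base_learners = [
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", make_pipeline(StandardScaler(),
                          KNeighborsClassifier(n_neighbors=5))),
    ("logreg", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
]

# The meta-learner is trained on out-of-fold predictions from the
# base learners (5-fold cross-validation on the training set).
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)

score = stack.score(X_test, y_test)
print(f"stacked ensemble accuracy: {score:.3f}")
```

Whether the extra complexity of stacking pays off is worth checking against the simplest single model, in line with the first takeaway point.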