Bank Marketing Data Set Analysis
ANALYSIS
BY ROHAN SANGHAVI
1. What is this data regarding?
The data relates to the direct marketing campaigns of a Portuguese banking institution. The
marketing campaigns were based on phone calls. Often, more than one contact with the same
client was required in order to assess whether the product (bank term deposit) would be ('yes') or
not ('no') subscribed.
Further details are given below:
Input variables:
# bank client data:
1 - age (numeric)
2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid',
'management','retired','self-employed','services','student','technician','unemployed','unknown')
3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced'
means divorced or widowed)
4 - education (categorical: 'basic.4y','basic.6y','basic.9y','highschool','illiterate',
'professional.course','university.degree','unknown')
5 - default: has credit in default? (categorical: 'no','yes','unknown')
6 - housing: has housing loan? (categorical: 'no','yes','unknown')
7 - loan: has personal loan? (categorical: 'no','yes','unknown')
# related with the last contact of the current campaign:
8 - contact: contact communication type (categorical: 'cellular','telephone')
9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly
affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known
before a call is performed. Also, after the end of the call y is obviously known. Thus, this
input should only be included for benchmark purposes and should be discarded if the
intention is to have a realistic predictive model.
# other attributes:
12 - campaign: number of contacts performed during this campaign and for this client
(numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous
campaign (numeric; 999 means clients were not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client
(numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical:
'failure','nonexistent','success')
# social and economic context attributes
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)
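The notes on attributes 11 and 13 can be sketched in code. The following is a minimal illustration, assuming each client is held as a plain dictionary keyed by the attribute names above; the helper name `prepare_for_model`, the added `previously_contacted` flag, and the sample records are hypothetical and not part of the original analysis.

```python
# Sentinel described for attribute 13: 999 means "not previously contacted".
PDAYS_NOT_CONTACTED = 999

def prepare_for_model(record):
    """Return a copy without `duration` and with the pdays sentinel made explicit."""
    # duration is only known after the call, so it is dropped for a realistic model.
    cleaned = {k: v for k, v in record.items() if k != "duration"}
    never_contacted = cleaned["pdays"] == PDAYS_NOT_CONTACTED
    cleaned["previously_contacted"] = not never_contacted
    # Replace the sentinel with None so 999 is never treated as a real day count.
    cleaned["pdays"] = None if never_contacted else cleaned["pdays"]
    return cleaned

# Hypothetical sample records, for illustration only.
sample = [
    {"age": 41, "duration": 210, "campaign": 2, "pdays": 999, "y": "no"},
    {"age": 35, "duration": 640, "campaign": 1, "pdays": 6, "y": "yes"},
]
prepared = [prepare_for_model(r) for r in sample]
print(prepared)
```

Keeping the sentinel out of the numeric column avoids averaging 999 together with genuine day counts.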
HEAD OF DATA
Columns: age, job, marital, education, default, balance, housing, loan, contact, day, month,
duration, campaign, pdays, previous, poutcome, y
Row   poutcome   y
1     unknown    no
2     unknown    no
3     unknown    no
4     unknown    no
5     unknown    no
6     unknown    no
TAIL OF DATA
Columns: age, job, marital, education, default, balance, housing, loan, contact, day, month,
duration, campaign, pdays, previous, poutcome, y
Row     poutcome   y
45210   unknown    no
45211   other      no
SUMMARY OF DATA
SR No   ATTRIBUTE   MEAN, MEDIAN, QUARTILE RANGE VALUES, MIN/MAX
8       job         blue-collar: 9732
23      education   secondary: 23202
29      default     no: 44396
38      balance     Median: 448
                    This indicates that the people have an average balance in their account
39      balance     Mean: 1362
43      housing     no: 20081
44      housing     yes: 25130
50      loan        no: 37967
66      day         Median: 16.00
81      duration    Mean: 258.2
97      pdays       Max.: 871.0
113     y           no: 39922
The above summary gives us the minimum, maximum, quartiles, mean and median of every numeric
attribute in the data set, and the number of clients at each level of every categorical attribute.
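A summary of this kind can be sketched with Python's standard library. The helper `summarize` and the two small lists below are illustrative assumptions, not the tool or the data used for this report; the toy balance values were picked only so that their median and mean equal the reported 448 and 1362.

```python
from collections import Counter
from statistics import mean, median, quantiles

def summarize(values):
    """Five-number summary plus mean for numbers, level counts otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        return {"min": min(values), "q1": q1, "median": median(values),
                "mean": mean(values), "q3": q3, "max": max(values)}
    return dict(Counter(values))

# Toy values, not the real data.
balance = [0, 200, 448, 1500, 4662]
housing = ["yes", "no", "yes", "yes", "no"]

balance_summary = summarize(balance)
housing_summary = summarize(housing)
print(balance_summary)
print(housing_summary)
```

A mean well above the median, as reported for balance, usually points to a few very large values pulling the average up.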
ANALYSIS
The histogram indicates that the majority of customers are in the age group of 30 to 40 years.
This suggests that the bank is trying to appeal to a younger audience, as they
are more likely to open an account with the bank. Very few customers are above 70 years of
age. That may be because they are retired and feel no urgency to open a new
bank account, since they may already have one at some other bank.
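The binning behind such a histogram can be sketched as follows. The function `age_histogram`, the 10-year bin width, and the sample ages are assumptions made for illustration; they are not the plotting code or the data behind the figure.

```python
from collections import Counter

def age_histogram(ages, width=10):
    """Count ages per bin; each key is the lower edge of a `width`-year bin."""
    bins = Counter((age // width) * width for age in ages)
    return dict(sorted(bins.items()))

# Hypothetical ages, for illustration only.
ages = [24, 31, 33, 35, 38, 39, 42, 47, 55, 61, 74]
hist = age_histogram(ages)
busiest_bin = max(hist, key=hist.get)
print(hist, busiest_bin)
```

The bin with the highest count is the one a statement such as "most customers are aged 30 to 40" rests on.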
2. BOXPLOT OF AGE DATA
ANALYSIS
This boxplot clearly shows the outliers, all of which are above 70 years of age, as supported
by the age histogram.
Also, the majority are in the range of 30-40 years, which is also supported by the histogram.
Hence, we can say that the bank wants to appeal to a younger audience.
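Boxplots conventionally flag as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that common rule; whether the plot in this report used exactly this rule is an assumption, and the helper `boxplot_outliers` and the sample ages are hypothetical.

```python
from statistics import quantiles

def boxplot_outliers(values, whisker=1.5):
    """Return (outliers, (low_fence, high_fence)) using the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high], (low, high)

# Hypothetical ages, for illustration only.
ages = [28, 30, 33, 33, 36, 39, 42, 45, 48, 48, 55, 72, 85]
outliers, fences = boxplot_outliers(ages)
print(outliers, fences)
```

With these toy values the upper fence falls just above 70, which is the kind of cut-off that would make every outlier an older client.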
3. MARRIAGE STATUS OF CLIENTS
ANALYSIS
The graphs above indicate that most clients are married (27,214 clients).
Only about 5,000 clients are divorced and the rest are single.
The density plot tells us that the data for married clients is approximately normally distributed.
The bank appears to want to appeal to married people because, after marriage,
monetary needs are greater due to family and children. Such people are more
likely to open an account with the bank than clients who are single or divorced.
ANALYSIS
The above bar graph indicates that the majority of the clients who were contacted
have attained secondary or tertiary education.
Around 23,000 clients have attained secondary education and around 13,000 clients have
attained tertiary education.
This suggests that the bank wants to appeal to educated clients, as they are
more likely to understand the advantages of opening an account with a bank than people
who are not educated. More education often means a more senior position, and when
more money is involved there is a higher chance of opening an account with the bank.
The clear answer is “NO”.
From the above graph it is clear that there is no clear relation between education and
job status. Even some people with tertiary education are unemployed, and some people with only
primary education are self-employed. So, in light of the previous graph of clients' education
levels, the bank is taking a chance by assuming there is a direct relation between
education and job status.
Also, more data needs to be collected regarding clients with “unknown” education
status.
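Individual counterexamples, such as an unemployed client with tertiary education, do not by themselves rule out an association. One way to check is to compare the share of each job within each education level. The sketch below assumes the data is available as (education, job) pairs; the helper `row_shares` and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict

def row_shares(pairs):
    """For each row key, return the share of each column key within that row."""
    counts = defaultdict(Counter)
    for row_key, col_key in pairs:
        counts[row_key][col_key] += 1
    return {
        row: {col: n / sum(cols.values()) for col, n in cols.items()}
        for row, cols in counts.items()
    }

# Hypothetical (education, job) pairs, for illustration only.
clients = [
    ("tertiary", "management"), ("tertiary", "unemployed"),
    ("tertiary", "management"), ("tertiary", "management"),
    ("primary", "blue-collar"), ("primary", "self-employed"),
    ("primary", "blue-collar"), ("primary", "blue-collar"),
]
shares = row_shares(clients)
print(shares)
```

If the job shares look similar across education levels, that supports the "no relation" reading; clearly different shares would argue against it.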
ANALYSIS
From the graph above it is clear that there is no direct relationship between education
and marital status.
Both divorced clients and married clients include people with tertiary education.
More data is needed regarding clients with “unknown” status to help determine any sort
of direct relationship.
FURTHER DETAILS
The above graph shows that many highly educated clients are married or single.
Very few clients with primary education are divorced or single; most of them are married.
6. EDUCATION STATUS RELATION WITH BALANCE IN ACCOUNT
ANALYSIS
The above graph indicates that there is a direct relation between education and
balance. Educated clients have balances as high as 100,000, while uneducated
clients have a lower balance of about 60,000. More education means that the clients hold
more senior positions or are self-employed, which supports the claim of them having a high bank
balance.
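If the figures quoted above are the highest balances in each group, they describe individual clients rather than the typical one. A per-group median alongside the maximum gives a fuller comparison. The sketch below assumes (education, balance) pairs; the helper `balance_by_group` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def balance_by_group(rows):
    """Return the median and maximum balance for each education level."""
    groups = defaultdict(list)
    for education, balance in rows:
        groups[education].append(balance)
    return {e: {"median": median(b), "max": max(b)} for e, b in groups.items()}

# Hypothetical (education, balance) pairs, for illustration only.
rows = [
    ("tertiary", 300), ("tertiary", 900), ("tertiary", 100000),
    ("primary", 250), ("primary", 700), ("primary", 60000),
]
stats = balance_by_group(rows)
print(stats)
```

In this toy example the maxima differ by 40,000 while the medians differ by only 200, which shows how much a single large account can dominate the comparison.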
7. FURTHER ANALYSIS ON BANK BALANCE
ANALYSIS
The above boxplot and density plot indicate that the majority of clients have a
very low bank balance.
This is supported by the numerous outliers in the boxplot and by the density curve, which tells us
that the majority of clients have a balance between 5,000 and 25,000 only.
8. HOUSING LOAN DATA
ANALYSIS
The majority of clients (about 25,000) have taken a housing loan and about 20,000 have not
taken a housing loan.
This suggests that most clients are from average financial backgrounds, which supports
the claim that most clients have a bank balance of up to 25,000.
9. HAVE PERSONAL LOANS BEEN TAKEN?
ANALYSIS
The data shows that about 38,000 clients (37,967 in the summary of data) have not taken any
personal loans. This suggests that they are more likely to open an account
with the bank because they carry less personal debt.
Hence this criterion has been successfully satisfied.
10. EDUCATION STATUS RELATION WITH CAMPAIGN EFFORTS
ANALYSIS
The above graph indicates that the bank does not need to contact educated people with
degrees again and again, and the campaign duration for them is only 40-50 days.
On the other hand, people with lower educational qualifications need more time to reach a
decision. They take more than 70 days.
This may be because educated clients know the advantages of opening a bank account,
so it is much easier for the bank to explain its policies to them.
11. OUTCOME OF THE PREVIOUS CAMPAIGN
ANALYSIS
The data indicates that around 4,900 clients refused to open an account in
the previous campaign. Only about 1,500 clients agreed to do so.
However, a major chunk of the data is not known to us.
So, more data needs to be collected regarding the results of previous campaigns.
12. RELATION BETWEEN BALANCE AND DESIRED TARGET
ANALYSIS
The graph shows that clients with a bank balance of 100,000 have not agreed to
open an account, while people with a moderate balance are opening an account.
This is probably because rich people with a high balance have already opened an
account at some other bank, so unlike the middle class they have no urgent need to
open one.
13. RELATIONSHIP BETWEEN AGE AND EDUCATION STATUS OF CLIENTS
ANALYSIS
The above density plot shows that the majority of clients with no education status on
record are about 50 years of age, and clients who are highly educated with degrees are about 25
years old.
14. RELATION BETWEEN CLIENT AGE AND OUTCOME OF CAMPAIGN
ANALYSIS
The density plot indicates that young clients have a higher success rate
than old clients. However, all old clients who have been contacted have agreed to
open an account in the bank.
This may be because they are long-time clients who are well acquainted with the
bank's policies.
15. RELATION BETWEEN OUTCOME OF PRESENT AND PREVIOUS
CAMPAIGN
ANALYSIS
The graph indicates that the campaign was partially successful, because many of
the clients who did not agree to open an account last time did not agree this time either.
However, the success rate is high among clients who agreed to open an account last time or
did not give any clear indication.
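The comparison described here amounts to a success rate per previous outcome. The sketch below assumes each client is a dictionary with `poutcome` and `y` fields as in the data header; the helper `success_rate_by` and the sample records are hypothetical.

```python
from collections import Counter

def success_rate_by(records, key):
    """Share of records with y == 'yes' for each value of `key`."""
    totals, wins = Counter(), Counter()
    for record in records:
        totals[record[key]] += 1
        if record["y"] == "yes":
            wins[record[key]] += 1
    return {value: wins[value] / totals[value] for value in totals}

# Hypothetical records, for illustration only.
records = (
    [{"poutcome": "failure", "y": "no"}] * 3
    + [{"poutcome": "failure", "y": "yes"}]
    + [{"poutcome": "success", "y": "yes"}] * 3
    + [{"poutcome": "success", "y": "no"}]
)
rates = success_rate_by(records, "poutcome")
print(rates)
```

The same helper could be keyed on another categorical field, for example `job`, to compare success rates across occupations.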
16. DEFAULTER DATA
ANALYSIS
The graph indicates that the majority of clients (44,396 in the summary of data) have not
defaulted on any loans and hence are more likely to open an account, having no debt or financial crisis.
17. RELATION BETWEEN JOB AND OUTCOME
ANALYSIS
This indicates that most failures come from those who do blue-collar
menial jobs, and the highest success rate is from those who do white-collar
management jobs or administrative work.
This may be because clients who do menial jobs are living hand to mouth and do not feel
the need to open a bank account.
18. RELATION BETWEEN DURATION OF LAST CALL AND LENGTH OF
CAMPAIGN FOR EACH CLIENT
ANALYSIS
The scatter plot indicates that the longer the campaign went on for a client (the more
contacts were made), the shorter the last call to that client was, which is quite natural.
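The direction and strength of such a relationship can be expressed as a Pearson correlation coefficient, where a value near -1 means one quantity falls as the other rises. The sketch below is an illustration only; the helper `pearson` and the sample values are hypothetical and are not read off the scatter plot.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical values: contacts per client and last call duration in seconds.
campaign = [1, 2, 3, 5, 8]
duration = [600, 420, 300, 180, 90]
r = pearson(campaign, duration)
print(round(r, 3))
```

A coefficient close to zero on the real data would mean the pattern seen in the plot is weak, even if it is visible.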
FINAL RESULT
5,289 clients have agreed to open an account while 39,922 clients have not agreed to do so
(the summary of data shows y = no for 39,922 clients), so clearly the campaign is not fully
successful. However, since the outcome has improved from last time (graph 15), we can say
that, relatively, the campaign has been partially successful.
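The overall success rate follows from two numbers already in this report: the count of y = no in the summary of data (39,922) and the last row index in the tail of the data (45,211). The sketch below assumes that the last row index equals the total number of clients; the variable names are illustrative.

```python
total_clients = 45211  # last row index shown in the tail of the data
declined = 39922       # "y no" count from the summary of data

subscribed = total_clients - declined
success_rate = subscribed / total_clients
print(f"subscribed={subscribed}, rate={success_rate:.1%}")
```

Under that assumption, roughly one contacted client in nine agreed.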
CONCLUSION
Hence, we can say the campaign was partially successful.