Bank Marketing Data Set Analysis

The data set relates to a direct marketing campaign conducted by a Portuguese banking institution. It contains information on 45,211 clients that were contacted via phone calls and includes 21 attributes related to client demographics, previous banking interactions, and campaign details. Summary statistics show that most clients were between 30-40 years old, married, had secondary or tertiary education, and were contacted in May via cellular phones. This suggests the bank's marketing strategy targeted younger, educated individuals.

BANK MARKETING DATA SET ANALYSIS
BY ROHAN SANGHAVI

Pritesh Tiwari | Data Science - Develearn | 06/05/2019

1. What is this data regarding?
The data relates to the direct marketing campaigns of a Portuguese banking institution. The
marketing campaigns were based on phone calls. Often, more than one contact with the same
client was required in order to assess whether the product (a bank term deposit) would be
subscribed ('yes') or not ('no').
Further details are given below:
Input variables:
# bank client data:
1 - age (numeric)
2 - job: type of job (categorical: 'admin.', 'blue-collar', 'entrepreneur', 'housemaid',
'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed',
'unknown')
3 - marital: marital status (categorical: 'divorced', 'married', 'single', 'unknown'; note:
'divorced' means divorced or widowed)
4 - education (categorical: 'basic.4y', 'basic.6y', 'basic.9y', 'high.school', 'illiterate',
'professional.course', 'university.degree', 'unknown')
5 - default: has credit in default? (categorical: 'no', 'yes', 'unknown')
6 - housing: has housing loan? (categorical: 'no', 'yes', 'unknown')
7 - loan: has personal loan? (categorical: 'no', 'yes', 'unknown')
# related to the last contact of the current campaign:
8 - contact: contact communication type (categorical: 'cellular', 'telephone')
9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
10 - day_of_week: last contact day of the week (categorical: 'mon', 'tue', 'wed', 'thu', 'fri')
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly
affects the output target (e.g., if duration=0 then y='no'). Yet the duration is not known
before a call is performed, and after the end of the call y is obviously known. Thus, this
input should only be included for benchmark purposes and should be discarded if the
intention is to have a realistic predictive model.
# other attributes:
12 - campaign: number of contacts performed during this campaign and for this client
(numeric, includes last contact)
13 - pdays: number of days that passed after the client was last contacted in a previous
campaign (numeric; 999 means the client was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client
(numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical:
'failure', 'nonexistent', 'success')
# social and economic context attributes:
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)

Output variable (desired target):
21 - y: has the client subscribed to a term deposit? (binary: 'yes', 'no')

Note: the head, tail and summary shown in the following sections have 17 columns, including
balance and day, and record pdays as -1 and poutcome as 'unknown' when there was no previous
contact. Some names and codings therefore differ from the 21-attribute list above, which
appears to describe a different version of the same data set.
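
The advice on duration (attribute 11) can be sketched in code. This is a minimal Python sketch, assuming each client record is held as a dictionary keyed by attribute name; the helper name drop_leaky_features and the sample record are illustrative and are not part of the original analysis.

```python
# Attributes that are only known once the call has ended (see the note on 'duration').
LEAKY_FEATURES = {"duration"}


def drop_leaky_features(record, leaky=LEAKY_FEATURES):
    """Return a copy of one client record without the leaky attributes."""
    return {name: value for name, value in record.items() if name not in leaky}


# Illustrative record using a few fields from the first row of the head of the data.
sample = {"age": 58, "job": "management", "duration": 261, "campaign": 1, "y": "no"}
realistic = drop_leaky_features(sample)
print(realistic)
```

The original record is left unchanged, so the same data can still be used with duration for benchmark purposes.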

HEAD OF DATA

   age  job           marital  education  default  balance  housing  loan  contact  day  month
1  58   management    married  tertiary   no       2143     yes      no    unknown  5    may
2  44   technician    single   secondary  no       29       yes      no    unknown  5    may
3  33   entrepreneur  married  secondary  no       2        yes      yes   unknown  5    may
4  47   blue-collar   married  unknown    no       1506     yes      no    unknown  5    may
5  33   unknown       single   unknown    no       1        no       no    unknown  5    may
6  35   management    married  tertiary   no       231      yes      no    unknown  5    may

   duration  campaign  pdays  previous  poutcome  y
1  261       1         -1     0         unknown   no
2  151       1         -1     0         unknown   no
3  76        1         -1     0         unknown   no
4  92        1         -1     0         unknown   no
5  198       1         -1     0         unknown   no
6  139       1         -1     0         unknown   no

TAIL OF DATA

       age  job           marital   education  default  balance  housing  loan  contact    day  month
45206  25   technician    single    secondary  no       505      no       yes   cellular   17   nov
45207  51   technician    married   tertiary   no       825      no       no    cellular   17   nov
45208  71   retired       divorced  primary    no       1729     no       no    cellular   17   nov
45209  72   retired       married   secondary  no       5715     no       no    cellular   17   nov
45210  57   blue-collar   married   secondary  no       668      no       no    telephone  17   nov
45211  37   entrepreneur  married   secondary  no       2971     no       no    cellular   17   nov

       duration  campaign  pdays  previous  poutcome  y
45206  386       2         -1     0         unknown   yes
45207  977       3         -1     0         unknown   yes
45208  456       2         -1     0         unknown   yes
45209  1127      5         184    3         success   yes
45210  508       4         -1     0         unknown   no
45211  361       2         188    11        other     no

SUMMARY OF DATA

SR No  ATTRIBUTE  MEAN, MEDIAN, QUARTILE RANGE VALUES, MIN/MAX

1      age        Min.       : 18.00
2      age        1st Qu.    : 33.00
3      age        Median     : 39.00
4      age        Mean       : 40.94
5      age        3rd Qu.    : 48.00
6      age        Max.       : 95.00
8      job        blue-collar: 9732
9      job        management : 9458
10     job        technician : 7597
11     job        admin.     : 5171
12     job        services   : 4154
13     job        retired    : 2264
14     job        (Other)    : 6835
15     marital    divorced   : 5207
16     marital    married    : 27214
22     education  primary    : 6851
23     education  secondary  : 23202
24     education  tertiary   : 13301
25     education  unknown    : 1857
29     default    no         : 44396
30     default    yes        : 815
36     balance    Min.       : -8019
37     balance    1st Qu.    : 72
38     balance    Median     : 448
39     balance    Mean       : 1362
40     balance    3rd Qu.    : 1428
41     balance    Max.       : 102127
43     housing    no         : 20081
44     housing    yes        : 25130
50     loan       no         : 37967
51     loan       yes        : 7244
57     contact    cellular   : 29285
58     contact    telephone  : 2906
59     contact    unknown    : 13020
64     day        Min.       : 1.00
65     day        1st Qu.    : 8.00
66     day        Median     : 16.00
67     day        Mean       : 15.81
68     day        3rd Qu.    : 21.00
69     day        Max.       : 31.00
71     month      may        : 13766
72     month      jul        : 6895
73     month      aug        : 6247
74     month      jun        : 5341
75     month      nov        : 3970
76     month      apr        : 2932
77     month      (Other)    : 6060
78     duration   Min.       : 0.0
79     duration   1st Qu.    : 103.0
80     duration   Median     : 180.0
81     duration   Mean       : 258.2
82     duration   3rd Qu.    : 319.0
83     duration   Max.       : 4918.0
85     campaign   Min.       : 1.000
86     campaign   1st Qu.    : 1.000
87     campaign   Median     : 2.000
88     campaign   Mean       : 2.764
89     campaign   3rd Qu.    : 3.000
90     campaign   Max.       : 63.000
92     pdays      Min.       : -1.0
93     pdays      1st Qu.    : -1.0
94     pdays      Median     : -1.0
95     pdays      Mean       : 40.2
96     pdays      3rd Qu.    : -1.0
97     pdays      Max.       : 871.0
99     previous   Min.       : 0.0000
100    previous   1st Qu.    : 0.0000
101    previous   Median     : 0.0000
102    previous   Mean       : 0.5803
103    previous   3rd Qu.    : 0.0000
104    previous   Max.       : 275.0000
106    poutcome   failure    : 4901
107    poutcome   other      : 1840
108    poutcome   success    : 1511
109    poutcome   unknown    : 36959
113    y          no         : 39922
114    y          yes        : 5289

Note on row 38: the median balance is 448, well below the mean of 1362, which indicates that
the typical client holds only a modest balance in their account.

The summary above gives the minimum, maximum, quartiles, mean and median of every numeric
attribute in the data set, and the count of each level of every categorical attribute.
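
How such a summary is computed for a numeric attribute can be sketched as follows. This is a minimal Python sketch using the standard statistics module; the function name summarise is illustrative, the sample is only the six ages from the head of the data, and the "inclusive" quartile method is an assumption about the convention behind the table above.

```python
import statistics


def summarise(values):
    """Return min, quartiles, median, mean and max of a list of numbers."""
    # method="inclusive" interpolates between data points; it is assumed here to
    # match the quartile convention of the tool that produced the report's table.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


# Ages of the six clients in the head of the data (illustrative sample only).
head_ages = [58, 44, 33, 47, 33, 35]
print(summarise(head_ages))
```

Run on the full age column, a function like this would be expected to give values comparable to rows 1 to 6 of the table.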

GRAPHICAL IN-DEPTH ANALYSIS OF ATTRIBUTES

1. CLIENT AGE HISTOGRAM

ANALYSIS
The histogram indicates that the majority of customers are in the 30 to 40 age group. This
suggests that the bank is trying to appeal to a younger audience, which it expects to be more
likely to open an account. Very few customers are above 70 years of age. That may be because
they are retired and feel no urgency to open a new bank account, since they may already hold
one at another bank.

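The binning behind an age histogram can be sketched as follows. This is a minimal Python sketch that uses only the twelve ages visible in the head and tail listings, so the counts are illustrative and do not reproduce the full histogram; the function name bin_ages and the 10-year bin width are assumptions.

```python
from collections import Counter

# Ages from the six head rows and six tail rows shown earlier (not the full data).
ages = [58, 44, 33, 47, 33, 35, 25, 51, 71, 72, 57, 37]


def bin_ages(values, width=10):
    """Count how many values fall into each [start, start + width) bin."""
    return Counter((value // width) * width for value in values)


counts = bin_ages(ages)
for start in sorted(counts):
    # One '#' per client in the bin, e.g. "30-39: ####".
    print(f"{start}-{start + 9}: {'#' * counts[start]}")
```
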
2. BOXPLOT OF AGE DATA

ANALYSIS
This boxplot shows that the outliers all lie above the age of 70, which is consistent with
the age histogram.
Also, the majority of clients are in the range of 30 to 40 years, which the histogram also supports.
Hence, the evidence suggests that the bank wants to appeal to younger people.

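The cutoff of about 70 years can be checked against the summary of data. This is a minimal Python sketch, assuming the boxplot uses the common 1.5 x IQR whisker rule (the report does not state which rule was used); the quartiles are the age values from the summary, and the function name iqr_fences is illustrative.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; points outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# 1st and 3rd quartiles of age from the summary of data.
lower, upper = iqr_fences(33.0, 48.0)
print(lower, upper)  # 10.5 70.5
```

Under that assumption, ages above 70.5 count as outliers, and since the minimum age is 18 no client falls below the lower fence.
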
3. MARRIAGE STATUS OF CLIENTS

ANALYSIS
The graphs above indicate that the largest group of clients is married (27,214 clients).
Only about 5,000 clients are divorced, and the rest are single.
The density plot suggests that the data for married clients is approximately normally distributed.
The bank appears to want to appeal to married people because, after marriage, monetary
needs increase due to family and children. Such people are considered more likely to open
an account in the bank than clients who are single or divorced.
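
The marital split can be expressed as shares of all clients. This is a minimal Python sketch based on the counts in the summary of data; the count for 'single' is not listed there, so it is derived here as the remainder of the 45,211 clients, which assumes no other marital level is present.

```python
TOTAL_CLIENTS = 45211  # number of rows, from the tail of the data

# Counts from the summary of data; 'single' is derived as the remainder
# (an assumption that no other level exists).
marital = {"married": 27214, "divorced": 5207}
marital["single"] = TOTAL_CLIENTS - sum(marital.values())


def shares(counts):
    """Convert raw counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


print(marital["single"])  # 12790
print(shares(marital))
```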

CLIENT EDUCATION ANALYSIS

ANALYSIS
The bar graph above indicates that the majority of the clients who were contacted
have attained secondary or tertiary education.
Around 23,000 clients have attained secondary education and around 13,000 clients have
attained tertiary education.
This suggests that the bank wants to appeal to educated clients, on the assumption that they
are more likely to understand the advantages of opening an account than people who are not
educated. More education is assumed to mean a higher post at work, and when more money is
involved there is a higher chance of opening an account in the bank.

4. IS THERE A RELATIONSHIP BETWEEN EDUCATION STATUS AND JOB FOR THESE CLIENTS?

The clear answer is "NO".
From the graph above it is clear that there is no direct relation between education and
job status. Some people with tertiary education are unemployed, and some people with only
primary education are self-employed. So, in light of the earlier client education bar graph,
the bank is taking a chance by assuming there is a direct relation between education and
job status.
Also, more data needs to be collected regarding clients with "unknown" education
status.
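
A relationship between two categorical attributes is usually examined through a cross-tabulation. This is a minimal Python sketch that counts (education, job) pairs for the twelve rows visible in the head and tail listings only, so it illustrates the technique and says nothing about the full data set; the variable names are illustrative.

```python
from collections import Counter

# (education, job) pairs from the head and tail listings only; the full data
# set has 45,211 rows, so these counts are purely illustrative.
rows = [
    ("tertiary", "management"), ("secondary", "technician"),
    ("secondary", "entrepreneur"), ("unknown", "blue-collar"),
    ("unknown", "unknown"), ("tertiary", "management"),
    ("secondary", "technician"), ("tertiary", "technician"),
    ("primary", "retired"), ("secondary", "retired"),
    ("secondary", "blue-collar"), ("secondary", "entrepreneur"),
]

crosstab = Counter(rows)  # maps (education, job) -> number of clients

for (education, job), n in sorted(crosstab.items()):
    print(f"{education:<10} {job:<13} {n}")
```

Computed over all rows, a table like this is the kind of input from which a bar graph of job by education would be drawn.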

5. MARRIAGE-EDUCATION RELATIONSHIP FOR CLIENTS

ANALYSIS
From the graph above it is clear that there is no direct relationship between education
and marital status.
Both divorced clients and married clients include people with tertiary education.
More data is needed regarding clients with "unknown" status to help determine any
direct relationship.

FURTHER DETAILS

The graph above shows that many highly educated clients are married or single.
Very few clients with primary education are divorced or single; most of them are married.

6. EDUCATION STATUS RELATION WITH BALANCE IN ACCOUNT

ANALYSIS
The graph above indicates that there is a direct relation between education and
balance. Educated clients have balances as high as 100,000, while less educated clients
have lower balances of up to 60,000. More education is taken to mean that the clients hold
a higher post or are self-employed, which supports the claim that they have a high bank
balance.

7. FURTHER ANALYSIS ON BANK BALANCE

ANALYSIS
The boxplot and density plot above indicate that the majority of clients have a
very low bank balance.
This is supported by the numerous outliers in the boxplot and by the density curve, which
suggests that most clients have a balance between 5,000 and 25,000 only. Note, however, that
the summary of data gives a median balance of 448 and a third quartile of 1,428, so most
balances lie below even that range.

8. HOUSING LOAN DATA

ANALYSIS
The majority of clients (about 25,000) have taken a housing loan, and about 20,000 have not.
This suggests that most clients are from average financial backgrounds, which supports
the claim that most clients have a bank balance of up to 25,000.

9. HAVE PERSONAL LOANS BEEN TAKEN?

ANALYSIS
The data shows that more than 30,000 clients have not taken any personal loan. This
suggests that they are more likely to open an account in the bank, since they carry less
personal debt.
Hence this criterion has been satisfied.

10. EDUCATION STATUS RELATION WITH CAMPAIGN EFFORTS

ANALYSIS
The graph above indicates that the bank does not need to contact educated people with
degrees again and again, and that the campaign duration for them is only 40 to 50 days.
On the other hand, people with lower educational qualifications need more time to reach a
decision. They take more than 70 days.
This is probably because educated clients know the advantages of opening a bank account,
so it is much easier for the bank to explain its policies to them.

11. OUTCOME OF THE PREVIOUS CAMPAIGN

ANALYSIS
The data indicates that around 4,900 clients refused to open an account in the previous
campaign. Only about 1,500 clients agreed to do so.
However, the outcome for a major chunk of the clients (about 37,000) is not known to us.
So, more data needs to be collected regarding the results of previous campaigns.

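The size of the unknown group can be quantified from the summary of data. This is a minimal Python sketch using the poutcome counts listed there; restricting the success rate to recorded successes and failures is a choice made for this illustration, not something the report states.

```python
# Counts of poutcome from the summary of data.
poutcome = {"failure": 4901, "other": 1840, "success": 1511, "unknown": 36959}

total = sum(poutcome.values())
unknown_share = poutcome["unknown"] / total

# Success rate among clients whose previous outcome is a recorded success or failure.
decided = poutcome["success"] + poutcome["failure"]
success_rate = poutcome["success"] / decided

print(total)                               # 45211
print(f"{unknown_share:.1%} unknown")      # 81.7% unknown
print(f"{success_rate:.1%} success rate")  # 23.6% success rate
```
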
12. RELATION BETWEEN BALANCE AND DESIRED TARGET

ANALYSIS
The graph shows that clients with a bank balance of around 100,000 have not agreed to
open an account, while people with a moderate balance are opening one.
This is probably because wealthy clients with a high balance already hold an account at
another bank, so opening one is not as urgent for them as it is for the middle class.

13. RELATIONSHIP BETWEEN AGE AND EDUCATION STATUS OF CLIENTS

ANALYSIS
The density plot above shows that the majority of clients with no education status on
record are about 50 years of age, while clients who are highly educated with degrees are
about 25 years old.

14. RELATION BETWEEN CLIENT AGE AND OUTCOME OF CAMPAIGN

ANALYSIS
The density plot indicates that young clients have a higher success rate than old
clients. However, the old clients who were contacted have all agreed to open an account
in the bank.
This may be because they are long-time clients who are familiar with the bank's
policies.

15. RELATION BETWEEN OUTCOME OF PRESENT AND PREVIOUS
CAMPAIGN

ANALYSIS
The graph indicates that the campaign was partially successful, because many of the
clients who did not agree to open an account last time did not agree this time either.
However, the success rate is high among clients who agreed to open an account last time
or whose previous outcome was unclear.

16. DEFAULTER DATA

ANALYSIS
The graph indicates that the majority of clients have not defaulted on any loan and hence
are more likely to open an account, being free of debt or financial crisis.

17. RELATION BETWEEN JOB AND OUTCOME

ANALYSIS
This indicates that most failures come from clients in blue-collar jobs, and that the
highest success rate comes from clients in white-collar management jobs or in
administrative (admin.) roles.
This may be because clients in manual jobs live hand to mouth and do not feel the need
to open a bank account.

18. RELATION BETWEEN DURATION OF LAST CALL AND LENGTH OF
CAMPAIGN FOR EACH CLIENT

ANALYSIS
The scatter plot indicates that the longer the campaign went on for a client, the shorter
the last call to that client was, which seems natural.

FINAL RESULT

5,289 clients have agreed to open an account while 39,922 clients have not agreed to do so,
so the campaign is clearly not fully successful. However, since the outcome has improved
from last time, we can say that the campaign has been partially successful in relative terms
(graph 15).

CONCLUSION
Hence, we can say the campaign was partially successful.
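
The arithmetic behind this result can be sketched as follows. This is a minimal Python sketch using the counts of y ('yes': 5289, 'no': 39922) and of previous successes (1511) from the summary of data; comparing raw success counts across campaigns is a simplification made for this illustration, since the previous outcome is unknown for most clients.

```python
# Counts of the target y from the summary of data.
outcome = {"no": 39922, "yes": 5289}
previous_successes = 1511  # poutcome 'success' from the summary of data

total = outcome["no"] + outcome["yes"]
conversion_rate = outcome["yes"] / total

# How many times more clients said 'yes' now than had a recorded success before.
improvement = outcome["yes"] / previous_successes

print(f"conversion rate: {conversion_rate:.1%}")  # conversion rate: 11.7%
print(f"improvement: {improvement:.1f}x")         # improvement: 3.5x
```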
