EDA Credit Case Study (Karan Pratap Singh)

This case study aims to analyze loan applicant data using exploratory data analysis (EDA) techniques to identify patterns that indicate if a client may have difficulty repaying an installment. The analysis identifies relationships between various numerical and categorical variables such as income, credit score, education, and age, and the likelihood of default. Key findings include that applicants with lower income, lower credit scores, and fewer years of employment have higher rates of default. The results can be used by the company to reduce risk when providing loans.

CREDIT EDA CASE STUDY

SUBMITTED BY: KARAN PRATAP SINGH


INTRODUCTION

This case study aims to give you an idea of applying EDA in a real business scenario. In this
case study, apart from applying the techniques that you have learnt in the EDA module, you
will also develop a basic understanding of risk analytics in banking and financial services
and understand how data is used to minimize the risk of losing money while lending to
customers.
BUSINESS UNDERSTANDING
Loan providers find it hard to lend to people with insufficient or
non-existent credit history. Some consumers take advantage of this
by defaulting on their loans.
Suppose you work for a consumer finance company that specializes in
lending various types of loans to urban customers. You have to use EDA
to analyze the patterns present in the data. This will ensure that
applicants capable of repaying the loan are not rejected.
OBJECTIVE
This case study aims to identify patterns that indicate whether a client has difficulty paying
their installments. These patterns may be used for taking actions such as denying the loan,
reducing the amount of the loan, lending (to risky applicants) at a higher interest rate, etc.
This will ensure that consumers capable of repaying the loan are not rejected.
Identifying such applicants using EDA is the aim of this case study.

In other words, the company wants to understand the driving factors (or driver variables)
behind loan default, i.e. the variables which are strong indicators of default. The
company can utilize this knowledge for its portfolio and risk assessment.
To develop your understanding of the domain, you are advised to independently
research a little about risk analytics (understanding the types of variables and their
significance should be enough).
CHECKING FOR OUTLIERS
• The outliers in Income are most likely genuine values. They could be binned for analysis.
• The outliers in Credit are most likely genuine values. They could be binned for analysis.
• The Days Employed column has an invalid value of '1000' for a lot of entries. These can be treated as missing values.
• The outliers in Annuity are most likely genuine values. They could be binned for analysis.
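The two treatments described above (treating the invalid placeholder as missing, and binning a skewed column) can be sketched with pandas as follows. This is a minimal illustration on made-up data: the column names `income` and `years_employed`, the bin edges, and the exact form of the placeholder are assumptions, not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Toy data; column names and the sentinel value are illustrative assumptions.
df = pd.DataFrame({
    "income": [90_000, 150_000, 250_000, 600_000, 1_500_000],
    "years_employed": [2.0, 1000.0, 7.5, 1000.0, 12.0],
})

# Treat the invalid placeholder as missing rather than as a real tenure.
df["years_employed"] = df["years_employed"].replace(1000.0, np.nan)

# Bin income (1L = 100,000) so that extreme values land in a top bucket
# instead of stretching the scale of every plot.
bins = [0, 100_000, 200_000, 500_000, 1_000_000, np.inf]
labels = ["<1L", "1L-2L", "2L-5L", "5L-10L", ">10L"]
df["income_group"] = pd.cut(df["income"], bins=bins, labels=labels)

print(df)
```

Binning keeps the large but plausible values in the analysis while preventing them from dominating the charts.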
ANALYSIS
UNIVARIATE ANALYSIS
I. NUMERICAL
a. Income

In the above graph there is a distinct peak observed in the low income range (1L-2L) in the
case of defaulters.
b. Annuity

In the graph above, the annuity amounts of defaulters are less widely spread than those of non-defaulters, which
extend to higher amounts.
c. Credit Amount

In the graph above, defaulters are concentrated in the lower credit amount region.
d. Ext Source Score 2 & 3

- The mean score of Payment Defaulters in EXT_SOURCE_2 is less than 0.5.
- The mean score of Re-payers in EXT_SOURCE_2 is greater than 0.5.

- The mean score of Payment Defaulters in EXT_SOURCE_3 is less than 0.4.
- The mean score of Re-payers in EXT_SOURCE_3 is greater than 0.5.
e. Years Employed

The following can be observed in the graph above:
- Payment Defaulters have been employed for an average of less than 3 years.
- Re-payers have been employed for an average of 5+ years.
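Group averages like the ones quoted for the external scores and years employed can be obtained with a single `groupby`. The sketch below uses made-up numbers and assumes a `TARGET` column in which 1 marks clients with payment difficulty and 0 marks re-payers; `years_employed` is a hypothetical column name.

```python
import pandas as pd

# Toy data; TARGET = 1 is assumed to mark clients with payment difficulty.
df = pd.DataFrame({
    "TARGET":         [0, 0, 0, 1, 1, 1],
    "EXT_SOURCE_2":   [0.60, 0.55, 0.65, 0.40, 0.45, 0.35],
    "EXT_SOURCE_3":   [0.55, 0.50, 0.60, 0.30, 0.35, 0.40],
    "years_employed": [6.0, 5.0, 7.0, 2.0, 3.0, 1.5],
})

# Mean of each numerical variable within each target segment.
means = df.groupby("TARGET")[["EXT_SOURCE_2", "EXT_SOURCE_3", "years_employed"]].mean()
print(means)
```

Comparing the two rows of `means` side by side is the tabular counterpart of the distribution plots described above.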
UNIVARIATE ANALYSIS
II. CATEGORICAL
a. Gender

In the graph here we can observe that females take out the majority of loans compared with males.
b. Education

The following can be observed from the graph:
• The Secondary education category takes out the most loans.
• The other categories account for a much smaller proportion.
c. Age

The following can be observed from the above graph:
• The 30-40 age group takes out the most loans.
• The 60-70 age group takes out the fewest.
d. Income

People with income between 1 and 2 Lakhs take out the most loans.
e. Dependants

From the graph here we can observe that people with two dependents take out the most loans.
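For categorical variables such as gender, education or number of dependents, the share of loans per category can be read off `value_counts(normalize=True)`. The following is a minimal sketch on made-up data with hypothetical column names `gender` and `education`.

```python
import pandas as pd

# Toy data; column names and category labels are illustrative assumptions.
df = pd.DataFrame({
    "gender":    ["F", "F", "M", "F", "M", "F"],
    "education": ["Secondary", "Higher", "Secondary",
                  "Secondary", "Higher", "Secondary"],
})

# Proportion of applications in each category, largest first.
gender_share = df["gender"].value_counts(normalize=True)
education_share = df["education"].value_counts(normalize=True)

print(gender_share)
print(education_share)
```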
BIVARIATE ANALYSIS
I. Numerical-Numerical
1. Income vs Credit

• The majority of payment defaulters are in the low income group.

• In many cases a high credit amount is given to low income applicants; this must be looked into.
2. Goods Price Vs Credit Amount
3. Income vs Annuity

• No significant correlation observed.
• High annuity amounts observed for low incomes.
4. Income vs EXT_SOURCE_3

• No significant correlation between income and EXT_SOURCE_3.
5. Age vs EXT_SOURCE_3

• No correlation observed between age (days birth) and EXT_SOURCE_2.
6. Credit Amount vs Annuity

• In the graph here a positive correlation is observed between credit and annuity amount.
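The numerical pairs above can also be checked numerically with a correlation matrix alongside the scatter plots. The sketch below uses made-up values and hypothetical column names; it computes Pearson correlation, which only captures linear association.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "credit":  [100_000, 200_000, 300_000, 400_000, 500_000],
    "annuity": [6_000, 11_000, 15_500, 21_000, 24_500],
    "income":  [150_000, 90_000, 300_000, 120_000, 200_000],
})

# Pairwise Pearson correlation between all numerical columns.
corr = df.corr(method="pearson")
print(corr.round(2))
```

A value near 1 for credit versus annuity matches a clear upward trend in the scatter plot, while values near 0 correspond to the "no significant correlation" observations.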
II. Categorical-Numerical
1. EDUCATION TYPE VS EXT_SOURCE_3

The following can be observed from the graph here:
• Separated people, despite having a greater average score, have made significant defaults in payments.
• Non-defaulters have an average score greater than 0.4.
2. Education vs Credit Amount

• In the graph here we can observe that the Higher Education category has received the highest credit amount.
3. Gender vs Score

• In the graph here we can observe that non-defaulters have a higher average score (greater than 0.5) than those with payment difficulties (less than 0.4).
4. EDUCATION VS SCORE

The following can be observed in the graph:
• Academic degree holders with an EXT_SOURCE_3 score below 0.4 will most likely default.
• Lower secondary males are the highest defaulters.
5. Region Rating vs EXT_SOURCE_2

The following can be observed in the graph here:
• Those with payment difficulty have a lower average EXT_SOURCE_2 score.
• Those with region rating 3 and a score less than 0.5 will most likely have payment difficulty.
III. Categorical-Categorical
1. AGE VS INCOME

The following are the observations from the graph:
• The <30 age group earning less than 1 Lakh is most likely to have payment difficulties.
• The 5L-10L income group shows the lowest chance of payment difficulty.
• The 'more than 10L' income group consists of outliers, so these cases are treated as isolated events.
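Observations of this kind compare the default rate across combinations of two categorical variables. One way to tabulate that is a pivot table of the mean of the target, with a cross-tabulation of counts beside it to show how many applicants support each cell. The data and the `TARGET` coding (1 = payment difficulty) below are assumptions for illustration.

```python
import pandas as pd

# Toy data; TARGET = 1 is assumed to mark clients with payment difficulty.
df = pd.DataFrame({
    "age_group":    ["<30", "<30", "<30", "30-40", "30-40", "30-40", "<30", "30-40"],
    "income_group": ["<1L", "<1L", "5L-10L", "<1L", "5L-10L", "5L-10L", "5L-10L", "<1L"],
    "TARGET":       [1, 1, 0, 0, 0, 0, 0, 1],
})

# Share of applicants with payment difficulty in each age/income cell.
default_rate = df.pivot_table(index="age_group", columns="income_group",
                              values="TARGET", aggfunc="mean")

# Number of applicants per cell; small counts make a rate unreliable.
cell_counts = pd.crosstab(df["age_group"], df["income_group"])

print(default_rate)
print(cell_counts)
```

Reading the rate together with the count helps separate a genuinely risky segment from a cell that only looks extreme because it contains a handful of applicants.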
2. AGE VS INCOME TYPE

The following observations are derived from the graph:
• The Unemployed category shows the highest chance of payment difficulty.
• Pensioners in the <30 age group are at greater risk of payment difficulty.
• The values for 'Unemployed', 'Student', 'Businessman' and 'Maternity leave' are inconclusive because there are very few applicants.
3. FAMILY STATUS VS EDUCATION TYPE

The following observations are gathered from the graph:
• Among those with lower secondary education, Civil Marriage and Single people are the riskiest category for payment difficulty.
• Widows show a lower percentage of payment difficulty across all education types.
• The Academic degree education type shows a lower chance of payment difficulty.
4. AGE VS HOUSING TYPE

The following are the observations from the graph:
• People living in office apartments have the lowest risk of payment difficulty.
• People living in rented apartments show the highest chance of payment difficulty.
• The <30 age group living in rented apartments shows a greater chance of payment difficulty.
5. GENDER VS OCCUPATION TYPE

• Low-skill laborers are the riskiest group.
• Male realty agents show a 17% likelihood of payment difficulty.
• Female waiter/barmen staff are more likely to have payment difficulty than male waiter/barmen staff.
• Accountants are the least risky.
ANALYSIS OF MERGED DATA SETS
UNIVARIATE ANALYSIS
A. Categorical
1. NAME_CONTRACT_STATUS

• Among defaulters whose previous loan applications were approved, males form a higher proportion.
• As expected, payment defaulters had a higher proportion of previous loan applications that were refused.
2. NAME_CLIENT_TYPE

• Among recurring customers who had difficulty with loan repayments, a higher share had previous applications rejected.
3. NAME_CONTRACT_TYPE

• Consumer/Retail loan applicants were in higher proportion in both groups (those with payment difficulty and those without).
UNIVARIATE ANALYSIS
B. Numerical
1. AMT_CREDIT_y

• In both segments, 75% of applicants were credited a lower amount (less than 220K).
2. AMT_APPLICATION

• In both segments, 75% of applicants applied for a lower amount of credit (less than 200K).
3. AMT_ANNUITY

• 75% of applicants were approved for loans with lower annuities (longer-term and/or lower-interest loans).
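The `_y` suffix in AMT_CREDIT_y is what pandas appends by default when two merged tables share a column name, and the 75th-percentile figures correspond to `quantile(0.75)` within each target segment. The sketch below is illustrative only: `customer_id` is a hypothetical join key, the amounts are made up, and `TARGET` = 1 is assumed to mark payment difficulty.

```python
import pandas as pd

# Current applications (toy data; customer_id is a hypothetical join key).
current = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "TARGET":      [0, 1, 0],
    "AMT_CREDIT":  [500_000, 300_000, 800_000],
})

# Previous applications of the same customers (toy data).
previous = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3],
    "AMT_CREDIT":  [100_000, 200_000, 50_000, 250_000, 150_000, 300_000],
})

# Overlapping column names get the default suffixes: _x (left), _y (right).
merged = current.merge(previous, on="customer_id", how="inner")

# 75th percentile of the previously credited amount per target segment.
p75 = merged.groupby("TARGET")["AMT_CREDIT_y"].quantile(0.75)
print(p75)
```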
BIVARIATE ANALYSIS
A. Categorical - Numerical
1. Name_contract_status vs AMT_CREDIT_Y

• Surprisingly, up to the 75th percentile, applications that were approved in previous years were credited lower amounts in the current loan, whereas both the highest credit amount and the 75th-percentile amount for applications rejected in previous years are much higher. This could be flagged as a potential leakage, where applicants with previous rejections are approved for higher credit without due consideration of the reasons behind those rejections.
2. NAME_CONTRACT_STATUS VS EXT_SOURCE_3

• Based on the box plot, the external credit score appears reliable: above the 25th percentile, applicants who were approved earlier fall in a higher score bracket than those who were rejected in previous applications. However, the credit score should always be considered along with other drivers when assessing whether a customer will default.
3. NAME_CONTRACT_STATUS VS EXT_SOURCE_2

• Based on the box plot, the external credit score appears reliable: above the 25th percentile, applicants who were approved earlier fall in a higher score bracket than those who were rejected in previous applications. However, the credit score should always be considered along with other drivers when assessing whether a customer will default.
B. Numerical - Numerical
1. APPLICATION AMOUNT VS CREDIT AMOUNT

• As expected, the amount credited is directly correlated with the loan amount applied for. There are
no outliers where an exceptionally high amount was credited against a low amount applied for.
2. ANNUITY AMOUNT VS CREDIT AMOUNT

• A small increase in annuity corresponds to a larger increase in the amount credited. So for a moderately risky
customer the credit amount can be left unchanged, provided the applicant is willing to accept a higher annuity,
a higher interest charge, and/or a shorter loan term.
3. CREDIT AMOUNT VS GOODS PRICE

• As expected, the amount credited is directly correlated with the price of the goods under loan consideration.
There are no outliers where an exceptionally high amount was credited for a low-value good purchased by an
applicant. From a forensic perspective, the visualization does not suggest fraud or collusion between an applicant
and a bank employee, or a violation of the bank's norms.
C. Categorical - Categorical
1. PORTFOLIO VS CONTRACT STATUS

• POS portfolio loans have the highest payment defaults and, accordingly, a low approval rate.
• Defaults on car loans are very rare.
2. GENDER VS CONTRACT STATUS

• Males had a higher proportion of their previous loans rejected compared to females.
3. YIELD GROUP VS CONTRACT STATUS

• High yield loans have a high risk of default.
• The refusal rate is low for 'low action' loans.
CONTINUOUS BIVARIATE ANALYSIS
1. Credit Score vs Previous Loan Approved : Rejection ratio

• As evident from the scatter plot, credit score and previous application status have no evident relation to each other.
• The correlation between credit score and the previous application Approved : Rejection ratio is weak. We expected a higher
positive correlation.
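The slides do not spell out how the Approved : Rejection ratio is computed. One plausible reading, shown here purely as an assumption, is the share of a customer's previous applications that were approved out of those that were either approved or refused. The `customer_id` key and the status labels "Approved" and "Refused" are also assumed for illustration.

```python
import pandas as pd

# Toy previous-application records; key and status labels are assumptions.
previous = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2, 2, 3],
    "NAME_CONTRACT_STATUS": ["Approved", "Approved", "Approved", "Refused",
                             "Refused", "Approved", "Refused"],
})

# Count approved and refused applications per customer.
counts = pd.crosstab(previous["customer_id"], previous["NAME_CONTRACT_STATUS"])

# Assumed definition: approved / (approved + refused).
counts["approval_ratio"] = counts["Approved"] / (counts["Approved"] + counts["Refused"])
print(counts)
```

Under this definition a value of 0.75 means three of four previous applications were approved; a per-customer ratio like this can then be plotted against credit score or income.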
2. Income Vs Approved : Rejection ratio (between target = 0 & target =1)

• The low income group (200K to 400K) showed the most payment defaults, whereas the high income group
repaid their loans without default. Still, a good share of low income people made no defaults on their current loan.
Surprisingly, the previous application rejection ratio has no influence on this.
3. Credit Amount Vs Approved : Rejection ratio (between target = 0 & target =1)
4. Income VS Credit Score (between High Risk & Low Risk Customers)

• High risk customers (those with a higher rejection ratio) fall mostly in the low income group. As evident from the
second graph, the high income group is generally less risky.
• A customer's income is a major driving factor of his/her creditworthiness.
5. Income VS Age of Customers (between High Risk & Low Risk Customers)

• Age has no evident influence over the risk exposure of a customer vis-a-vis the customer's income as well as the previous
loans rejection ratio
6. AGE GROUP VS PREVIOUS APPLICATION STATUS
OBSERVATIONS
• Customers with a higher rejection ratio who defaulted on the current application (lower part of graph 1,
age group 20-50) have high risk profiles. They could be refused loans if their external credit score is low,
their income level is relatively low, and the credit amount applied for is high.

• More than 25% of people who made no default on the current loan have a lower rejection ratio (higher approval
ratio, > 0.75) across all age groups. They are low risk customers who could be given higher credit in future
applications. One thing to keep in mind here is the number of approved loans a customer currently holds.

• Customers with a higher rejection ratio who have not defaulted on the current application (lower part of
graph 2, across age groups) have moderate risk profiles. They could be granted loans if their external credit
score is high and reliable, their income level is relatively high, and the credit amount applied for is low.
a. They could be granted loans with a lower credit amount
b. They could be granted credit at a higher interest rate, provided income is higher and the credit score is reliably higher

• Customers who had a higher approval ratio but defaulted on the current application could be granted loans
with a lower credit amount, since a higher credit amount would attract higher interest charges and that would further
stress the customer financially
RISKY APPLICANTS
The following types of applicants carry a high risk:

• EXT_SOURCE_3 mean score is less than 0.4
• EXT_SOURCE_2 mean score is less than 0.5
• Have been employed for an average of less than 3 years
• Men with lower secondary education
• Age group <30 earning less than 1 Lakh
• Those with region rating 3 and a score less than 0.5
• Unemployed category
• Pensioners in the age group <30
• Lower secondary education: Civil Marriage & Single people
• People living in rented apartments, especially the <30 age group
• Low-skill laborers
• Male realty agents
• Female waiter/barmen staff
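As a rough illustration of how such a list might be turned into a screening aid, the sketch below counts how many of three numerical criteria each applicant meets. Using the reported group means (0.4, 0.5 and 3 years) as individual cut-offs is an assumption made here for demonstration, and `years_employed` is a hypothetical column name; this is not a scoring model from the case study.

```python
import pandas as pd

# Toy applicants; values and the years_employed column are illustrative.
applicants = pd.DataFrame({
    "EXT_SOURCE_2":   [0.62, 0.45, 0.30, 0.48],
    "EXT_SOURCE_3":   [0.58, 0.35, 0.38, 0.45],
    "years_employed": [6.0, 2.0, 5.0, 4.0],
})

# One boolean column per criterion (thresholds reused as cut-offs by assumption).
flags = pd.DataFrame({
    "low_ext_source_3": applicants["EXT_SOURCE_3"] < 0.4,
    "low_ext_source_2": applicants["EXT_SOURCE_2"] < 0.5,
    "short_employment": applicants["years_employed"] < 3,
})

# Number of risk criteria each applicant meets.
applicants["risk_flag_count"] = flags.sum(axis=1)
print(applicants)
```

A count like this is only a way to shortlist applications for closer review; the categorical criteria in the list would need their own flags.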
CONCLUSION
We identified the following Variables significantly driving Credit Default by an applicant:

• Income
• Previous Loan Rejection Ratio
• External Credit Score
• Education Category
• Family Status
• Occupation Type
• No. of Approved Loans currently outstanding

We also identified the following patterns indicating that a client has difficulty paying their installments:

• Males exhibited more payment difficulty, though there were more female applicants
• Applicants living in rented apartments or with parents exhibited a greater chance of payment default
• Low-skill laborers are prone to payment default
• As expected, the unemployed and the lower income group exhibited a higher tendency to default
RECOMMENDATION
Based on the customer profile and credit default drivers we identify and recommend as follows:

• High Risk profiles: Loan applications could be rejected if the external credit score is low, income levels are low, and the
credit amount applied for is high.

• Low Risk profiles: They should be extended higher credit in future applications. The number of previously approved
loans the customer currently holds should be checked.

• Moderately Low Risk profiles: Could be granted loans with a lower credit amount. Could be granted credit at a higher interest
rate, provided income is higher and the credit score is reliably higher.

• Medium Risk profiles: Could be granted loans with a lower credit amount, since a higher credit amount would attract a higher
annuity and that would further stress an already defaulting customer.
