EDA Credit Case Study (Karan Pratap Singh)
STUDY
This case study aims to give you an idea of applying EDA in a real business scenario. In this
case study, apart from applying the techniques that you have learnt in the EDA module, you
will also develop a basic understanding of risk analytics in banking and financial services
and understand how data is used to minimize the risk of losing money while lending to
customers.
BUSINESS UNDERSTANDING
Lending companies find it hard to give loans to people
with insufficient or non-existent credit history. Some consumers
take advantage of this and end up defaulting.
Suppose you work for a consumer finance company which specializes in
lending various types of loans to urban customers. You have to use EDA
to analyze the patterns present in the data. This will ensure that the
applicants capable of repaying the loan are not rejected.
OBJECTIVE
This case study aims to identify patterns that indicate whether a client has difficulty paying
their installments. These patterns may be used to take actions such as denying the loan,
reducing the loan amount, or lending (to risky applicants) at a higher interest rate.
This will ensure that the consumers capable of repaying the loan are not rejected.
Identification of such applicants using EDA is the aim of this case study.
In other words, the company wants to understand the driving factors (or driver variables)
behind loan default, i.e. the variables which are strong indicators of default. The
company can utilize this knowledge for its portfolio and risk assessment.
To develop your understanding of the domain, you are advised to independently
research a little about risk analytics (understanding the types of variables and their
significance should be enough).
CHECKING FOR OUTLIERS
• The outliers in Income are most likely genuine values rather than errors. These
values could be binned when analyzing.
In the above graph there is a distinct peak observed in the low income range (1L-2L) in the
case of defaulters.
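The outlier check and the binning suggested above can be sketched as follows. This is a minimal illustration, not the original notebook: the column names `AMT_INCOME_TOTAL` and `TARGET`, the sample values, the 1.5 × IQR fence and the band edges are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "AMT_INCOME_TOTAL": [90_000, 120_000, 150_000, 180_000,
                         210_000, 250_000, 400_000, 9_000_000],
    "TARGET": [1, 1, 0, 1, 0, 0, 0, 0],  # 1 = defaulter, 0 = non-defaulter
})

income = df["AMT_INCOME_TOTAL"]

# Flag high outliers with the common 1.5 * IQR rule (an assumed choice).
q1, q3 = income.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
df["INCOME_OUTLIER"] = income > upper_fence

# Keep the outliers but group incomes into bands (1L = 100,000).
bins = [0, 100_000, 200_000, 300_000, 500_000, float("inf")]
labels = ["<1L", "1L-2L", "2L-3L", "3L-5L", ">5L"]
df["INCOME_BAND"] = pd.cut(income, bins=bins, labels=labels)

# Share of defaulters within each income band.
default_rate_by_band = df.groupby("INCOME_BAND", observed=True)["TARGET"].mean()
print(default_rate_by_band)
```

Binning keeps very high incomes in the analysis (they land in the top band) instead of dropping them, which fits the view that these values are relevant.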
b. Annuity
In the graph above, the annuity amounts of defaulters are spread over a narrower range than those of non-defaulters, which
extend to higher amounts.
c. Credit Amount
In the graph above we can see that defaulters are concentrated in the lower credit amount region.
d. Ext Source Score 2 & 3
In the graph here we can observe that females take the majority of loans compared with males.
b. Education
• In many cases a higher credit amount is given to low-income applicants; this must be looked into.
2. Goods Price Vs Credit Amount
3. Income Vs Annuity
• In the graph here a positive correlation is observed between credit and annuity amount.
II. Categorical-Numerical
1. EDUCATION TYPE VS EXT_SOURCE_3
• 75% of applicants (up to the 75th percentile) were credited a lower amount (less than 220K) in both segments of
applicants.
2. AMT_APPLICATION
• 75% of applicants (up to the 75th percentile) applied for a lower amount of credit (less than 200K) in both segments.
3. AMT_ANNUITY
• 75% of applicants (up to the 75th percentile) were approved for loans with a lower annuity (either longer-term loans and/or low
interest loans)
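The percentile statements above can be reproduced with a grouped quantile. The sketch below assumes that the two segments are the values of a `TARGET` column (0 = non-defaulter, 1 = defaulter) and that the credited amount is stored in a column named `AMT_CREDIT`; the data is hypothetical and does not reproduce the figures quoted in the bullets.

```python
import pandas as pd

# Hypothetical previous-application data, one row per application.
prev = pd.DataFrame({
    "TARGET":          [0, 0, 0, 0, 1, 1, 1, 1],
    "AMT_CREDIT":      [100_000, 200_000, 300_000, 400_000,
                        50_000, 100_000, 150_000, 200_000],
    "AMT_APPLICATION": [90_000, 180_000, 280_000, 390_000,
                        45_000, 95_000, 140_000, 190_000],
    "AMT_ANNUITY":     [5_000, 10_000, 15_000, 20_000,
                        4_000, 8_000, 12_000, 16_000],
})

amount_cols = ["AMT_CREDIT", "AMT_APPLICATION", "AMT_ANNUITY"]

# 75th percentile of each amount within each segment.
p75 = prev.groupby("TARGET")[amount_cols].quantile(0.75)
print(p75)

# Share of applications per segment credited below a chosen cut-off
# (220K is the figure from the text, used here only as a parameter).
share_below = (prev["AMT_CREDIT"] < 220_000).groupby(prev["TARGET"]).mean()
print(share_below)
```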
BIVARIATE ANALYSIS
A. Categorical - Numerical
1. Name_contract_status vs AMT_CREDIT_Y
• As expected, the amount credited is directly correlated with the loan amount applied for. No outliers appear in
which an exceptionally high amount was credited against a low amount applied for.
2. ANNUITY AMOUNT VS CREDIT AMOUNT
• A small increase in annuity corresponds to a larger increase in the amount credited. So for a moderately risky
customer the credit amount can be left unchanged, provided the applicant is willing to accept a higher annuity,
a higher interest charge and/or a shorter loan term.
3. CREDIT AMOUNT VS GOODS PRICE
• As expected, the amount credited is directly correlated with the price of the goods under loan consideration.
No outliers appear in which an exceptionally high amount was credited for a low-value good purchased by an
applicant. From a forensic perspective, the visualization shows no sign of potential fraud or collusion between an applicant
and a bank employee, or of a violation of bank norms.
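A check of this kind can be sketched numerically as well as visually. The example below is an illustration under assumptions: `AMT_GOODS_PRICE` is an assumed column name, the data is hypothetical, and the 1.5 ratio used to flag unusually high credit is an arbitrary threshold chosen for the example.

```python
import pandas as pd

# Hypothetical data; the last row is deliberately credited far above the goods price.
prev = pd.DataFrame({
    "AMT_APPLICATION": [50_000, 100_000, 150_000, 200_000, 250_000],
    "AMT_CREDIT":      [55_000, 105_000, 160_000, 210_000, 600_000],
    "AMT_GOODS_PRICE": [50_000, 100_000, 150_000, 200_000, 250_000],
})

# Pairwise Pearson correlation between the three amounts.
corr = prev[["AMT_APPLICATION", "AMT_CREDIT", "AMT_GOODS_PRICE"]].corr()
print(corr)

# Flag applications where the credit is much larger than the goods price.
credit_to_goods = prev["AMT_CREDIT"] / prev["AMT_GOODS_PRICE"]
suspicious = prev[credit_to_goods > 1.5]  # 1.5 is an assumed threshold
print(suspicious)
```

An empty `suspicious` frame on the real data would support the observation that no exceptional amounts were credited for low-value goods.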
C. Categorical - Categorical
1. PORTFOLIO VS CONTRACT STATUS
• The correlation between Credit Score and the Previous Application Approved : Rejection Ratio is low. We expected a higher
positive correlation. As evident from the scatter plot, credit score and previous application status have no evident relation
to each other.
2. Income Vs Approved : Rejection ratio (between target = 0 & target =1)
• The low income group (200K to 400K) showed the most payment defaults, whereas the high income group
repaid their loans correctly. Still, a good share of low-income applicants made no defaults on their current loan.
Surprisingly, the previous application rejection ratio has no influence on this.
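The text does not spell out how the Approved : Rejection ratio is computed. One plausible reading, sketched below, is the share of a customer's previous applications that were approved or refused. The customer key `SK_ID_CURR`, the status labels "Approved" and "Refused", and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical previous applications, one row per application.
prev = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 1, 1, 2, 2, 3],
    "NAME_CONTRACT_STATUS": ["Approved", "Approved", "Approved", "Refused",
                             "Refused", "Refused", "Approved"],
})

# Count applications per customer and status.
counts = pd.crosstab(prev["SK_ID_CURR"], prev["NAME_CONTRACT_STATUS"])
# Guarantee both columns exist even if one status never occurs.
counts = counts.reindex(columns=["Approved", "Refused"], fill_value=0)

# Other statuses are ignored; a customer with neither status would get NaN.
decided = counts["Approved"] + counts["Refused"]
ratios = pd.DataFrame({
    "APPROVAL_RATIO": counts["Approved"] / decided,
    "REJECTION_RATIO": counts["Refused"] / decided,
})
print(ratios)
```

Joining `ratios` to the current applications on the customer key would then allow the ratio to be compared between target = 0 and target = 1.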
3. Credit Amount Vs Approved : Rejection ratio (between target = 0 & target =1)
4. Income VS Credit Score (between High Risk & Low Risk Customers)
• High Risk Customers (those with a higher rejection ratio) fall mostly in the low income group. As evident from the
second graph, the high income group is generally less risky.
• Income of a customer is a major driving factor for his/her creditworthiness
5. Income VS Age of Customers (between High Risk & Low Risk Customers)
• Age has no evident influence over the risk exposure of a customer vis-a-vis the customer's income as well as the previous
loans rejection ratio
6. AGE GROUP VS PREVIOUS APPLICATION STATUS
OBSERVATIONS
• Customers with a higher rejection ratio who defaulted on the current application (at the bottom of graph 1,
age group 20 - 50) have high risk profiles. Their loans could be rejected if their external credit score is low,
income levels are relatively low and the credit amount applied for is high
• More than 25% of people who made no default on the current loan have a lower rejection ratio (higher approval
ratio, >.75) across all age groups. They are low risk customers who could be given higher credit loans in future
applications. One thing to be mindful of here is the number of approved loans a particular customer holds to date.
• Customers with a higher rejection ratio who have not defaulted on the current application (at the bottom of
graph 2, across age groups) have moderate risk profiles. They could be granted loans if their external credit
score is high and reliable, income levels are relatively high and the credit amount applied for is low.
a. They could be granted loans with lesser credit amount
b. They could be granted credit at a higher interest rate provided income is higher and credit score is reliably higher
• Customers who had a higher approval ratio but defaulted on the current application could be granted loans
with a lesser credit amount, since a higher credit amount would attract higher interest charges and that would further
stress the customer financially
RISKY APPLICANTS
The following factors are associated with high-risk applicants:
• Income
• Previous Loan Rejection Ratio
• External Credit Score
• Education Category
• Family Status
• Occupation Type
• No. of Approved Loans currently outstanding
We also identified the following patterns ‘if a client has difficulty paying their installments’:
• Males exhibited more payment difficulty though there were more female applicants
• Applicants living in rented apartments and with parents exhibited a greater chance of payment default
• Low-skilled laborers are prone to payment default
• As expected, unemployed and lower-income applicants exhibited a higher tendency to default
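Patterns like these come from comparing the default rate, not the raw count, across categories. A minimal sketch, assuming a categorical column named `CODE_GENDER` and a `TARGET` flag (1 = payment difficulty), with hypothetical rows:

```python
import pandas as pd

# Hypothetical applications: more female applicants, but a higher male default rate.
app = pd.DataFrame({
    "CODE_GENDER": ["F", "F", "F", "F", "M", "M", "M", "F"],
    "TARGET":      [0,   0,   0,   1,   1,   1,   0,   0],
})

# Number of applicants and share of defaulters per category.
summary = app.groupby("CODE_GENDER")["TARGET"].agg(
    applicants="count",
    default_rate="mean",
)
print(summary)
```

The same grouping can be applied to any other categorical column (housing type, occupation, education) by changing the column passed to `groupby`.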
RECOMMENDATION
Based on the customer profile and credit default drivers we identify and recommend as follows:
• High Risk profiles: Loan applications could be rejected if the external credit score is low, income levels are low and the credit
amount applied for is high.
• Low Risk profiles: They should be extended higher credit loans in future applications. The number of previously approved
loans a customer holds to date should be checked.
• Moderately Low Risk profiles: Could be granted loans with a lesser credit amount. Could be granted credit at a higher interest
rate provided income is higher and the credit score is reliably higher.
• Medium Risk profiles: Could be granted loans with a lesser credit amount, since a higher credit amount would attract a higher
annuity and that would further stress an already defaulting customer.
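The four profiles above can be read as a two-way split on previous approval ratio and current default status. The function below is a rough sketch of that reading, not a scoring model: the name `risk_profile` is hypothetical, and using 0.75 (the approval ratio quoted in the observations for low risk customers) as the single cut-off for every profile is an assumption.

```python
def risk_profile(approval_ratio: float, defaulted: bool,
                 high_approval: float = 0.75) -> str:
    """Map a customer to one of the four profiles named in the recommendation.

    approval_ratio: share of previous applications that were approved (0 to 1).
    defaulted: whether the customer defaulted on the current loan.
    high_approval: assumed cut-off separating high from low approval ratios.
    """
    mostly_approved = approval_ratio > high_approval
    if defaulted:
        # Defaulted despite a good approval history -> Medium, otherwise High.
        return "Medium Risk" if mostly_approved else "High Risk"
    # No default: good history -> Low, frequent rejections -> Moderately Low.
    return "Low Risk" if mostly_approved else "Moderately Low Risk"


for ratio, default in [(0.9, False), (0.9, True), (0.3, False), (0.3, True)]:
    print(ratio, default, risk_profile(ratio, default))
```

Income, external credit score and credit amount applied for would then decide the action taken within each profile, as described in the bullets.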