
Report On Lending Co.

The document describes a statistical study conducted on lending data from Lending Club, a peer-to-peer lending company, from 2007-2011. The study aimed to determine factors that influence loan repayment and default rates in order to improve the company's profits and business. Variables analyzed included borrower demographics, loan characteristics, and customer repayment behavior. Descriptive statistics were calculated on the data, including mean, median, and standard deviation values for key variables such as loan amount, interest rate, income, and debt-to-income ratio.

Uploaded by Abhishek

Copyright © All Rights Reserved

A STATISTICAL STUDY ON LENDING CLUB COMPANY

BY:
AMAN ARYA- 1828003
IKSHEK MISRI- 1828013
PRANAY PATVARI- 1828022
SAKSHAM SOLANKI- 1828026
RISHIKA SHARMA- 1828051
SARA SUSAN - 1828053
INTRODUCTION

Lending Club is the largest peer-to-peer marketplace connecting borrowers with lenders. Borrowers apply through an online platform, where they are assigned an internal score. Lenders decide (1) whether to lend and (2) the terms of the loan, such as the interest rate, monthly instalment and tenure. Popular products include credit card loans, debt consolidation loans, house loans and car loans.

We analysed the data of the Lending Club Company to identify the scope for improvement in its business. We obtained a data dump from the company for the period 2007 to 2011 and cleaned it to remove null values and irrelevant data. We then used random sampling to draw 500 records out of approximately 39,000 and conducted our statistical study on this sample.
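The sampling step can be sketched in Python. This is a minimal sketch, not the authors' procedure: the integer IDs are a hypothetical stand-in for the roughly 39,000 cleaned loan records, and the fixed seed (42) is an arbitrary assumption added only so that the draw is reproducible. `random.Random.sample` draws without replacement, so no record can be picked twice.

```python
import random

def draw_sample(records, n=500, seed=42):
    """Simple random sample without replacement; the seed is an arbitrary choice for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(records, n)

# Hypothetical stand-in for the ~39,000 cleaned loan records (row IDs only).
population = list(range(39000))
sample = draw_sample(population)
print(len(sample), len(set(sample)))  # 500 500
```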

OBJECTIVES

The main objectives of our study are:

1. To decide whether the company should give a loan to a particular customer.

2. To increase the profits of the company.

3. To determine the least-selling and best-selling products of the company and adjust (increase or decrease) their sales accordingly.

Based on these main objectives, the following sub-objectives were derived:

• To identify variables that are strong indicators of default and potentially use the insights in approval/rejection decision making.

• To check in which group the average loan amount is higher, customers who rent or customers who own their house, so that Lending Club can decide which group should be given priority for loans.

• To find the difference in interest rate between Grade A and Grade B.

• To find whether home ownership and loan status are related, so that Lending Club makes sure to check one or both of these parameters before lending loans.

• To find whether verification status and the new loan status are related, so that Lending Club makes sure to check one or both of these parameters before lending loans.

• To check which of Grade C and Grade D borrowers receives the higher average loan amount, in order to improve the business.

• To check which of two loan purposes (credit card and small business) carries the higher average interest rate, as this will help identify a sector where the company would like to improve its business.

• To find the debt-to-income (DTI) ratio, which measures an individual's ability to manage monthly payments and debts, and to help Lending Club assess whether the customer should be given a loan.

• To make sure that the revolving balance is kept low.

• To examine the relationship between loan purpose and the amount of loan applied for.

• To examine the relationship between loan grade and interest rate.

TYPES OF VARIABLES

• Customer (Applicant) Demographics

• Loan Related Information & Characteristics

• Customer Behaviour (If The Loan Is Granted)

Customer Demographics        | Loan Information & Characteristics | Customer Behaviour Variables
Employment Length            | Loan Amount                        | Loan Purpose
Annual Income vs Loan Amount | Loan Term                          | DTI Ratio
Home Ownership               | Interest Rate                      | Revolving Balance
City Address                 | Loan Status                        | Revolving Utility
                             | Loan Grade                         | Earliest Credit Line
                             |                                    | Recoveries
                             |                                    | Delinquent Years

CUSTOMER DEMOGRAPHICS:-

Employment Length:- Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years.

Loan Amount:- The listed amount of the loan applied for by the borrower. If at some point the credit department reduces the loan amount, the reduction is reflected in this value.

Home Ownership:- The home ownership status provided by the borrower during registration. Possible values are RENT, OWN, MORTGAGE and OTHER.

Address State:- The state provided by the borrower in the loan application. It is sometimes also referred to as the contact address.

LOAN INFORMATION & CHARACTERISTICS:-

Loan Amount:- A loan is a sum of money that is lent, usually in return for interest, under an agreement or contract that specifies the terms and conditions of repayment. Here, loan amount refers to the total amount that the organisation has lent to individual customers. Organisations generally issue loans against some collateral security.

Loan Term:- Loan Term refers to time period for which the loan was given.

Interest Rate And Average Loan Amount:- Interest rate is the amount charged by a lender to a borrower for the use of assets, expressed as a percentage of the principal. The loan amount is the amount specified in the loan contract that the borrower agrees to pay back; points and various other costs can make it differ from the cash actually disbursed by the lender.

Loan Grade:- Loan grading involves assigning a quality score to a loan based on credit history, quality of collateral and likelihood of repayment. A score can also be applied to a portfolio of loans.

Loan Status:- After a loan is made in any context, whether through a traditional bank, an online marketplace lender or even informally between friends, it moves through various stages until it is either paid off or not. These stages may be categorised in different ways. On Prosper, another peer-to-peer lender, loan status is defined as Fully Paid, Charged Off and Current; the Lending Club data used in this study uses the same three categories.

CUSTOMER BEHAVIOUR VARIABLES:-

Loan Purpose:- Loan purpose is the term used in the United States for the underlying reason an applicant is seeking a loan. The lender uses the purpose of the loan to assess risk, and it may even affect the interest rate that is offered. Loan purpose is particularly important when obtaining mortgages or business loans connected with specific types of business activity.

DTI:- The debt-to-income (DTI) ratio is a personal finance measure that compares an individual's monthly debt payments to his or her overall income. Lenders, including mortgage lenders, use it as one way to measure an individual's ability to manage monthly payments and repay debts. DTI is calculated by dividing total recurring monthly debt by gross monthly income, and it is expressed as a percentage.
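The DTI formula above can be written as a short Python function. This is a sketch for illustration: the borrower's three monthly debt payments and the $5,000 gross monthly income are hypothetical figures, not values from the Lending Club data.

```python
def dti_percent(monthly_debt_payments, gross_monthly_income):
    """DTI = total recurring monthly debt / gross monthly income, expressed as a percentage."""
    if gross_monthly_income <= 0:
        raise ValueError("gross monthly income must be positive")
    return 100.0 * sum(monthly_debt_payments) / gross_monthly_income

# Hypothetical borrower: car loan, credit card minimum and student loan payments.
print(dti_percent([350.0, 120.0, 280.0], 5000.0))  # 15.0
```

A result of 15.0 sits exactly at the 15% benchmark used in the one-sample t test later in this report.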

Revolving Balance:- In credit card terms, a revolving balance is the portion of credit card spending that goes unpaid at the end of a billing cycle. The amount can vary, going up or down depending on the amount borrowed and the amount repaid. Revolving credit is a line of credit where the customer pays a commitment fee to a financial institution to borrow money, and is then allowed to use the funds when needed. It is usually used for operating purposes, and the amount drawn can fluctuate each month depending on the customer's current cash flow needs. Revolving lines of credit can be taken out by corporations or individuals.

Revolving Utility:- Revolving utilization, also known as the “debt-to-limit ratio” or “credit utilization”, measures the proportion of the available revolving credit limit that the borrower is currently using. Revolving balances can also be paid in full without incurring finance charges if they are paid within the “grace period”.
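As a rough sketch, utilization is the revolving balance divided by the total revolving credit limit. The Revolving Utility figures reported later (mean 0.49438, range 0.999) appear to be stored as a fraction rather than a percentage, so the function below returns a fraction. The balance and limit in the example are hypothetical.

```python
def revolving_utilization(revolving_balance, revolving_credit_limit):
    """Fraction of the revolving credit limit in use (between 0 and 1 while the balance is within the limit)."""
    if revolving_credit_limit <= 0:
        raise ValueError("credit limit must be positive")
    return revolving_balance / revolving_credit_limit

# Hypothetical card holder carrying $4,500 on a $10,000 limit.
print(revolving_utilization(4500, 10000))  # 0.45
```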

Earliest Credit Line:- A credit line is a pool of money available for borrowing. Also known as a line of credit, it has a maximum limit, and borrowers can borrow any amount up to that limit or not use any of the money at all. The earliest credit line is the month or year in which the borrower's earliest reported credit line was opened.

Recoveries:- When the borrower is unable to pay the loan, it is charged off. Charged-off loans are then sold, and the amount recovered through this sale is called recoveries. The loans are sold to cover some percentage of the loss incurred by charging them off.

Delinquency:- The term “delinquent” refers to a situation where the borrower is late or overdue on a payment. For this study, we considered the number of 30+ days past-due incidences of delinquency in the borrower's credit file over the past two years. This has been combined with each grade group to study borrower behaviour.

DESCRIPTIVE STATISTICS

MEAN

The mean, or average, is probably the most commonly used measure of central tendency. To compute the mean, add up all the values and divide by the number of values.

The mean Loan Amount is 11492.95

The mean Loan Term is 42.816

The mean Interest Rate is 12.261

The mean Installment is 330.1862

The mean Annual Income is 66850.53

The mean DTI ratio is 13.5382

The mean Delinquency is 0.164

The mean Revolving Balance is 13063.8

The mean Revolving Utility is 0.49438

The mean Recoveries is 34.06592

MEDIAN

The median is the value found at the exact middle of the set of values. To compute the median, list all values in numerical order and then locate the value in the centre of the sample (or take the average of the two central values when the number of values is even).

The median of Loan Amount is 10000

The median of Loan Term is 36

The median of Interest Rate is 12.18

The median of Installment is 277.35

The median of Annual Income is 55657.5

The median of DTI ratio is 55657.5

The median of Delinquency is 0

The median of Revolving Balance is 8952

The median of Revolving Utility is 0.493

The median of Recoveries is 0

MODE

The mode is the most frequently occurring value in the set of scores. To determine the mode, list all the scores in numerical order, then count each one. The most frequently occurring value is the mode.

The mode of Loan Amount is 10000

The mode of Loan Term is 36

The mode of Interest Rate is 5.42

The mode of Installment is 90.48

The mode of Annual Income is 50000

The mode of DTI ratio is 50000

The mode of Delinquency is 0

The mode of Revolving Balance is 0

The mode of Revolving Utility is 0

The mode of Recoveries is 0
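The three measures of central tendency defined above map directly onto Python's standard `statistics` module. The sketch below applies them to a handful of hypothetical loan amounts (not values from the report's sample). Note that `statistics.mode` returns the first value encountered when there is a tie; tie-breaking in other tools such as Excel may differ.

```python
import statistics

def central_tendency(values):
    """Mean, median and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),  # averages the two middle values when n is even
        "mode": statistics.mode(values),      # Python 3.8+: first value encountered among ties
    }

# Hypothetical loan amounts, not taken from the Lending Club sample.
loan_amounts = [5000, 10000, 12000, 10000, 25000, 7500]
summary = central_tendency(loan_amounts)
print(summary)
```

As in the report's Loan Amount figures, a single large value (25,000 here) pulls the mean above the median and mode, which is a sign of right skew.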

STANDARD DEVIATION

The standard deviation is a more accurate and detailed estimate of dispersion than the range, because a single outlier can greatly exaggerate the range. The standard deviation describes how the set of scores is spread around the mean of the sample.

The Standard Deviation of Loan Amount is 7601.1596

The Standard Deviation of Loan Term is 10.83332201

The Standard Deviation of Interest Rate is 3.7696

The Standard Deviation of Installment is 213.27822

The Standard Deviation of Annual Income is 42219.656

The Standard Deviation of DTI ratio is 6.686992

The Standard Deviation of Delinquency is 0.5601174

The Standard Deviation of Revolving Balance is 15038.1

The Standard Deviation of Revolving Utility is 0.29417

The Standard Deviation of Recoveries is 198.9455

RANGE

The range is simply the highest minus the lowest value.

The range of Loan Amount is 34100

The range of Loan Term is 24

The range of Interest Rate is 18.98

The range of Installment is 1208.09

The range of Annual Income is 381720

The range of DTI ratio is 381720

The range of Delinquency is 7

The range of Revolving Balance is 116150

The range of Revolving Utility is 0.999

The range of Recoveries is 2815.31

SAMPLE VARIANCE

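The sample variance, s², is used to measure how varied a sample is: it is the sum of squared deviations from the sample mean divided by n - 1, and its square root is the standard deviation. A sample is a selected number of items taken from a population.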

The sample variance of Loan Amount is 57777627

The sample variance of Loan Term is 117.3608657

The sample variance of Interest Rate is 14.21

The sample variance of Installment is 45487.6

The sample variance of Annual Income is 1782499335

The sample variance of DTI ratio is 44.71585567

The sample variance of Delinquency is 0.3137315

The sample variance of Revolving Balance is 226145390

The sample variance of Revolving Utility is 0.08654

The sample variance of Recoveries is 39579.32
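The range, sample variance and standard deviation can likewise be computed with Python's `statistics` module, which uses the n - 1 denominator in `variance` and `stdev`. The loan amounts below are hypothetical. The last line is a quick consistency check on the reported Loan Amount figures: squaring the reported standard deviation should reproduce the reported sample variance.

```python
import statistics

def dispersion(values):
    """Range, sample variance (n - 1 denominator) and sample standard deviation."""
    return {
        "range": max(values) - min(values),
        "sample_variance": statistics.variance(values),
        "sample_sd": statistics.stdev(values),  # square root of the sample variance
    }

loan_amounts = [5000, 10000, 12000, 10000, 25000, 7500]  # hypothetical values
print(dispersion(loan_amounts))

# Reported Loan Amount SD squared vs reported sample variance (57777627).
print(round(7601.1596 ** 2))  # 57777627
```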

KURTOSIS

Kurtosis is a measure of the “tailedness” of the probability distribution of a real-valued random variable. Like skewness, kurtosis describes the shape of a probability distribution, and, just as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample drawn from a population.

The kurtosis of Loan Amount is 0.4926191

The kurtosis of Loan Term is 1.081029213

The kurtosis of Interest Rate is 0.1597

The kurtosis of Installment is 1.0571941

The kurtosis of Annual Income is 11.763276

The kurtosis of DTI ratio is 11.76327596

The kurtosis of Delinquency is 58.067543

The kurtosis of Revolving Balance is 15.25793468

The kurtosis of Revolving Utility is 1.19599

The kurtosis of Recoveries is 103.4227

SKEWNESS

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random

variable about its mean. The skewness value can be positive or negative, or undefined.

The skewness of Loan Amount is 1.0161942

The skewness of Loan Term is 0.960890717

The skewness of Interest Rate is 0.2837

The skewness of Installment is 1.1235838

The skewness of Annual Income is 2.56632039

The skewness of DTI ratio is 2.566320392

The skewness of Delinquency is 6.2833914

The skewness of Revolving Balance is 3.191226138

The skewness of Revolving Utility is 0.04157

The skewness of Recoveries is 9.291197
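Because several sample formulas exist for skewness and kurtosis, it matters which one produced the figures above. The sketch below implements the adjusted sample formulas documented for Excel's SKEW and KURT functions; KURT reports excess kurtosis, which is about 0 for normally distributed data. That the report's values came from these Excel formulas is an assumption, and the small data sets are hypothetical.

```python
import math

def _mean_and_sample_sd(values):
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))  # n - 1 denominator
    return n, mean, sd

def excel_skew(values):
    """Adjusted sample skewness, same formula as Excel's SKEW (needs n >= 3)."""
    n, mean, sd = _mean_and_sample_sd(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)

def excel_kurt(values):
    """Excess kurtosis, same formula as Excel's KURT (needs n >= 4)."""
    n, mean, sd = _mean_and_sample_sd(values)
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(excel_skew([1, 2, 3, 4, 5]), excel_kurt([1, 2, 3, 4, 5]))  # symmetric, light-tailed data
print(excel_skew([1, 1, 1, 2, 10]))  # one large value creates a long right tail
```

Large positive values such as the report's 9.29 skewness and 103.4 kurtosis for Recoveries are what this kind of data produces: mostly zeros with a few very large values.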

ONE SAMPLE T TEST
1. Debt-to-income (DTI) Ratio
Introduction:
The DTI ratio is calculated using the borrower's total monthly debt payments on total debt obligations, excluding mortgage and the requested LC (Lending Club) loan, divided by the borrower's self-reported monthly income. Through this test, we want to check whether the average DTI ratio of Lending Club's customers is below the ideal level.
Research Question and Hypothesis:
The purpose of this test is to find out whether the average debt-to-income ratio is below the ideal level of 15%.
Research Question:

To find the Debt to Income ratio (DTI) which will measure an individual's ability to manage monthly
payments and debts and assist the Lending Club to assess if the customer should be given loan or
not.
Hypothesis:
H0 : The average debt-to-income ratio of the customers is greater than or equal to fifteen percent.
H1 : The average debt-to-income ratio of the customers is less than fifteen percent.
H0 :µ ≥ 15%
H1 :µ< 15%
t-Test: Two-Sample Assuming Unequal Variances

dti dummy
Mean 13.5382 0
Variance 44.71585567 0
Observations 500 500
Hypothesized Mean Difference 15
df 499
t Stat -4.88812369
P(T<=t) one-tail 0.000001
t Critical one-tail 1.647912984
P(T<=t) two-tail 0.000001
t Critical two-tail 1.964729391
Interpretation:
Comparing the alpha value (0.05) with the one-tailed p-value of the output (0.000001), the p-value is smaller than alpha, so the null hypothesis is rejected.
Since this is a left-tailed test, the relevant critical value is -1.65 (the negative of the reported one-tail critical value of 1.65). The t statistic (-4.89) is below -1.65, which again leads us to reject the null hypothesis.

Conclusion:
A sample of 500 customers was taken for the test, and the null hypothesis is rejected. The alternative hypothesis, that the average debt-to-income ratio is less than 15%, is therefore supported. This means the average customer has a low debt-to-income ratio, which is a good sign for the company, since a low DTI is preferred.
2. Revolving Balance
Introduction:
Total credit revolving balance. A revolving balance is the portion of credit card spending that goes unpaid at the end of a billing cycle. With this test, we want to check whether the average revolving balance of Lending Club's customers is below the preferred level.
Research Question and Hypothesis:
The purpose of conducting this test is to check if the revolving credit balance of the customers is
below the preferred mark of 20,000.
Research Question:
To make sure that the revolving balance value is maintained low.
Hypothesis:
H0 : The average revolving credit balance of the customers is greater than or equal to 20,000.
H1 : The average revolving credit balance of the customers is less than 20,000.
H0 : µ ≥ 20,000
H1 : µ < 20,000
t-Test: Two-Sample Assuming Unequal Variances

revol_bal Dummy
Mean 13063.842 0
Variance 226145390 0
Observations 500 500
Hypothesized Mean Difference 20000
df 499
t Stat -10.3135959
P(T<=t) one-tail 0.000000
t Critical one-tail 1.647912984
P(T<=t) two-tail 0.000000
t Critical two-tail 1.964729391
Interpretation:
Comparing the alpha value (0.05) with the one-tailed p-value of the output (0.000000), the p-value is smaller than alpha, so the null hypothesis is rejected.
Since this is a left-tailed test, the relevant critical value is -1.65. The t statistic (-10.31) is below -1.65, so we again reject the null hypothesis.

Conclusion:
A sample of 500 customers was taken and a one-sample t test was done for the variable revolving balance. The test rejects the null hypothesis, which means the average revolving balance of the customers is less than 20,000. This is good for Lending Club, as a lower balance is preferred.
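Both Excel outputs above come from the two-sample unequal-variance tool run against a constant dummy column. Because the dummy has variance 0, the standard error reduces to sqrt(s²/n) and the Welch degrees of freedom reduce to n - 1, so the procedure is equivalent to a one-sample t test. A minimal Python sketch working only from the reported summary statistics follows; it stops at the t statistic and the critical-value comparison, since an exact p-value would need a t-distribution CDF (for example from SciPy), which is not used here.

```python
import math

def one_sample_t(sample_mean, sample_variance, n, mu0):
    """One-sample t statistic t = (mean - mu0) / sqrt(s^2 / n) and its n - 1 degrees of freedom."""
    standard_error = math.sqrt(sample_variance / n)
    return (sample_mean - mu0) / standard_error, n - 1

# Summary statistics copied from the two Excel outputs above.
tests = {
    "DTI (H0: mean >= 15)": (13.5382, 44.71585567, 500, 15),
    "revol_bal (H0: mean >= 20,000)": (13063.842, 226145390, 500, 20000),
}
T_CRIT_ONE_TAIL = 1.647912984  # one-tail critical value for df = 499, from the Excel output

for name, stats in tests.items():
    t, df = one_sample_t(*stats)
    # Left-tailed test: reject H0 when t falls below the negative critical value.
    decision = "reject H0" if t < -T_CRIT_ONE_TAIL else "fail to reject H0"
    print(f"{name}: t = {t:.4f}, df = {df}, {decision}")
```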
TWO SAMPLE T TEST

3. Loan Amount And Loan Grade:

Introduction:

Loan Amount is the listed amount of the loan applied for by the borrower. If at some point the credit department reduces the loan amount, the reduction is reflected in this value. In other words, loan amount describes the total amount that a borrower is authorised to borrow. Loan amounts are used in standard loans, credit cards and line-of-credit accounts. Loan Grade, on the other hand, is the grade assigned to the loan by the Lending Club Company. Here, we perform a two-sample t-test assuming unequal variances to obtain useful results for the company.

Research Question and Hypothesis:

The goal of this study is to find out how the amount of loan distributed varies across the company's loan grades. This information can be useful in improving the business of the company.

RQ:- To check which of Grade C and Grade D borrowers is given the higher average loan amount.

H0 :- The average Loan Amount of Grade C borrowers is less than or equal to the average Loan Amount of Grade D borrowers.

HA :- The average Loan Amount of Grade C borrowers is greater than the average Loan Amount of Grade D borrowers.

t-Test: Two-Sample Assuming Unequal Variances

Grade C Grade D
Mean 11977.25225 12472.95082
Variance 53986676.7 72213297.81
Observations 111 61
Hypothesized Mean Difference 0
df 109
t Stat -0.383561133
P(T<=t) one-tail 0.35102543
t Critical one-tail 1.658953458
P(T<=t) two-tail 0.70205086
t Critical two-tail 1.98196749
Interpretation:

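The one-tailed p-value is greater than α, the level of significance (0.3510 > 0.05), and the t statistic is smaller than the one-tail critical value (-0.3836 < 1.6590). We therefore fail to reject the null hypothesis: the sample gives no evidence that the average loan amount of Grade C borrowers exceeds that of Grade D borrowers. The sample mean for Grade D (12,472.95) is higher than for Grade C (11,977.25), but the two-tailed p-value of 0.702 shows that this difference is not statistically significant.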

Conclusion:

From the analysis, the average loan amount given to Grade D borrowers is somewhat higher in this sample, although the difference from Grade C is not statistically significant. Since Grade D loans carry a higher risk than Grade C loans, the company should consider focusing more on Grade C borrowers while keeping in mind the risk involved in giving too many loans to Grade D borrowers.
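The two-sample figures in the Excel output can be reproduced from the summary statistics with Welch's formulas. The sketch below is an illustration rather than the report's own computation, and `welch_t` is a hypothetical helper name. The same function applies unchanged to the later Grade A vs Grade B, credit card vs small business, and rent vs own comparisons. The formula gives a fractional df (about 109.4 here), while the Excel output shows the whole number 109.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    a, b = var1 / n1, var2 / n2  # squared standard errors of each sample mean
    t = (mean1 - mean2 - hypothesized_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Grade C vs Grade D loan amounts, summary statistics from the Excel output above.
t, df = welch_t(11977.25225, 53986676.7, 111, 12472.95082, 72213297.81, 61)
print(f"t = {t:.4f}, df = {df:.2f}")
```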

4. Loan Purpose (Credit Card & Small Business) And Interest Rate:

Introduction:

Loan Purpose is the purpose for which the customer has taken the loan from the lending company. Interest rate is the rate charged on the loan amount taken by the customer. Here, we use a two-sample t-test assuming unequal variances to compare the two purposes for the company.

Research Question and Hypothesis:

The goal of this study is to compare the average interest rate applied to loans taken for credit cards and for small businesses. This can tell the company which sector it should focus on more.

RQ:- To find out in which sector the average interest rate is higher.

H0 :- The average Interest Rate for the Credit Card purpose is equal to the average Interest Rate for Small Business.

HA :- The average Interest Rate for the Credit Card purpose is not equal to the average Interest Rate for Small Business.

t-Test: Two-Sample Assuming Unequal Variances

Credit Card | Small Business
Mean 12.65246154 13.4562069
Variance 15.6798251 19.28727438
Observations 65 29
Hypothesized Mean Difference 0
df 49
t Stat -0.844269167
P(T<=t) one-tail 0.20131111
t Critical one-tail 1.676550893
P(T<=t) two-tail 0.402622221
t Critical two-tail 2.009575237
Interpretation:

The two-tailed p-value is greater than α, the level of significance (0.4026 > 0.05), and the absolute value of the t statistic is smaller than the two-tail critical value (|-0.8443| < 2.0096). We therefore fail to reject the null hypothesis: there is no statistically significant difference between the average interest rates for credit card and small business loans.

Conclusion:

The sample mean interest rate is 12.65% for credit card loans and 13.46% for small business loans, and the test shows that this difference is not statistically significant. In other words, the company appears to be pricing these two sectors similarly, so it should look to other sectors to improve its business.

5. Grade and Interest rate

Introduction:

Loan Grade refers to the grade given to individual customer depending upon the credit worthiness of

the individual. Interest rate refers to the interest taken from the individual for the loan.

The objective is to find whether the interest rate of Grade A and Grade B people are same or not.

Research Question and Hypothesis:

The goal of this study is to find out whether the interest rate charged for Grade A and Grade B

customer is equal or not.

H0: The interest rate charged for Grade A and Grade B customers are equal.

HA: The interest rate charged for Grade A and Grade B customers are not equal.

OR

H0:µ1=µ2

H1:µ1≠µ2

Two sample T-Test assuming unequal variance

Grade A Grade B
Mean 7.242368421 11.25268
Variance 1.303037704 0.824288
Observations 114 153
Hypothesized Mean Difference 0
df 210
t Stat -30.92397463
P(T<=t) one-tail 0.000000
t Critical one-tail 1.652141981
P(T<=t) two-tail 0.000000
t Critical two-tail 1.971324793
Interpretation:

Out of the sample of 500 customers, 114 Grade A and 153 Grade B loans were compared. The two-tailed p-value (0.000000) is less than alpha, the 5% level of significance, so we reject the null hypothesis that the two means are equal.

Comparing the t statistic with the two-tail critical value gives the same result: |-30.9240| > 1.9713, which again leads us to reject the null hypothesis.

Conclusion:

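It can be concluded from the study that the interest rate charged differs significantly between the two grades: Grade B customers pay a higher average interest rate (11.25%) than Grade A customers (7.24%). Grade B customers therefore provide more interest revenue per unit of principal, which is consistent with the higher risk associated with a lower grade.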

6. Home Ownership and Annual Income

Introduction:

Home ownership is the status provided by the borrower during registration; possible values are RENT, OWN, MORTGAGE and OTHER. Annual income is the self-reported annual income provided by the borrower. We compare customers who rent with customers who own their house to see which group has the higher average annual income, so that Lending Club can decide which group should be given priority for loans.

Research Question and Hypothesis:

The goal of this study is to find out whether there is any relationship between home ownership and annual income.

H0: The average annual income of customers who rent is less than or equal to that of customers who own their house.

HA: The average annual income of customers who rent is greater than that of customers who own their house.

OR

H0: µr ≤ µo

HA: µr > µo

t-Test: Two-Sample Assuming Unequal Variances

RENT OWN
Mean 56461.03155 55383.46872
Variance 1073324868 1422666749
Observations 238 39
Hypothesized Mean Difference 0
df 48
t Stat 0.168310847
P(T<=t) one-tail 0.433522974
t Critical one-tail 1.677224196
P(T<=t) two-tail 0.867045949
t Critical two-tail 2.010634758
Interpretation:

The one-tailed p-value is greater than alpha, the level of significance (0.4335 > 0.05), and the t statistic is smaller than the one-tail critical value (0.1683 < 1.6772). We therefore fail to reject the null hypothesis: the sample gives no evidence that renters earn more than owners. The two-tailed p-value (0.867) also shows no significant difference in either direction.

Conclusion:

From this study we conclude that there is no significant difference in average annual income between customers who rent (56,461.03 in this sample) and customers who own their house (55,383.47). Annual income alone therefore does not distinguish these two groups, so Lending Club should rely on other factors as well when judging the creditworthiness of renters and owners in future decisions.

CHI SQUARE TEST

Our Approach:

The chi-square statistic is commonly used for testing relationships between categorical variables. A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether the distributions of categorical variables differ from each other.

• A very small chi-square test statistic means that the observed data fit the expected (independence) frequencies very well. In other words, there is no evidence of a relationship.
• A very large chi-square test statistic means that the observed data do not fit the expected frequencies well. In other words, the variables appear to be related.

Hypothesis:

A null hypothesis and an alternative hypothesis are formed. The null hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population; they are independent.

Contingency Table

Observed Frequency Table: A pivot table is made using the categorical variables “X” in the

column and “Y” in the row. The count of X is observed across different Y data.

Expected Frequency Table: The expected frequency table is prepared by the formula:

Eij = (Row i Total)(Column j Total)/Sample Size

Test Statistics:

The test statistic is found by the formula:

χ² = Σ (Oij - Eij)² / Eij

where Oij and Eij are the observed and expected frequencies in row i and column j, and the sum runs over all cells. Given the p-value, the test statistic can also be recovered in Excel with the formula CHISQ.INV.RT(p,f),

where p is the p-value (probability) and f is the degrees of freedom:

f = (m-1)(n-1)

where m is the number of rows and n the number of columns.

The p-value is calculated by the formula

P = CHISQ.TEST(Observed Range, Expected Range)

Interpretation:

In the majority of analyses, an alpha of 0.05 is used as the cut-off for significance. If the p-value is less than 0.05, we reject the null hypothesis of independence and conclude that the variables are related. If the p-value is greater than 0.05, we fail to reject the null hypothesis.

7. Loan Status and Home Ownership

Objective:

The objective is to find whether home ownership and loan status are related.

This is so that the Lending Club makes sure to check one/both of the parameters, loan status and

home ownership before lending loans.

Research question and Hypotheses:

The goal of the study is to examine the relationship between home ownership and loan status. Using

these variables, we sought to answer the following research question.

RQ: Does the home ownership of the consumer have a relationship with the loan status?

H0: Home ownership and loan status are independent.

H1: Home ownership and loan status have a relationship with each other.

Chi Square Test 1:

A pivot table is made using the categorical variables “Loan Status” in the column and “Home

Ownership” in the row. The count of loan status is observed across different home ownership data.

Observed Frequency Table:

Count of loan_status (rows: loan status, columns: home ownership)

Loan Status MORTGAGE OWN RENT Grand Total
Charged Off 31 6 34 71
Current 10 4 8 22
Fully Paid 159 115 130 404
Grand Total 200 125 172 497

Expected Frequency Table:

The expected frequency table is prepared by the formula:

Eij = (Row i Total)(Column j Total)/Sample Size

Row Labels MORTGAGE OWN RENT Grand Total


Charged Off 28.57 17.8 24.57 70.94
Current 8.85 5.53 7.61 21.99
Fully Paid 162.58 101.61 139.81 404
Grand Total 200 124.94 171.99 496.93

The test statistic is found by the formula:

χ² = Σ (Oij - Eij)² / Eij

Given the p-value, the test statistic can also be recovered in Excel with the formula CHISQ.INV.RT(p,f), where p is the p-value (probability) and f is the degrees of freedom:

f = (m-1)(n-1)

where m is the number of rows and n the number of columns.

Here m = 3, n = 3

Therefore, f = 4

The p-value is calculated by the formula

P = CHISQ.TEST(Observed Range, Expected Range)

Test Statistic 14.77281048
Degrees of Freedom 4
p-Value 0.005196379

Observation:

In the majority of analyses, an alpha of 0.05 is used as the cut-off for significance. If the p-value is less than 0.05, we reject the null hypothesis of independence and conclude that the variables are related.

The p-value 0.005196 is less than the alpha value 0.05.

Interpretation:

Since p < 0.05, we reject the null hypothesis, which means home ownership and loan status are related. Lending Club should therefore check home ownership along with loan status before lending loans.
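The chi-square procedure described in this section can be sketched in Python with only the standard library. Two assumptions apply. First, the p-value uses the closed-form chi-square tail probability, which holds only for an even number of degrees of freedom (true for both tests in this report, f = 4 and f = 2). Second, expected counts are left unrounded, so the statistic for the home ownership table comes out near 14.82 rather than the reported 14.77; the gap most likely comes from rounded entries in the expected table (for example 17.8 instead of 17.86). The p-value stays around 0.005 either way, below 0.05.

```python
import math

def chi_square_independence(observed):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Eij = (row i total)(column j total) / sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(col_totals) - 1)  # f = (m-1)(n-1)
    return stat, df, expected

def chi_square_sf_even_df(x, df):
    """Right-tail probability P(X > x); this closed form is exact only for even df."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Rows: Charged Off, Current, Fully Paid; columns: MORTGAGE, OWN, RENT (observed table above).
observed = [[31, 6, 34], [10, 4, 8], [159, 115, 130]]
stat, df, _ = chi_square_independence(observed)
p = chi_square_sf_even_df(stat, df)
print(f"chi-square = {stat:.2f}, f = {df}, p = {p:.4f}")
```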

8. Verification Status and New Status

Objective:

The objective is to find whether the verification status and the new status (loan status regrouped into Defaulted and Paid/current) are related.

This is so that Lending Club makes sure to check one or both of the parameters, new status and verification status, before lending loans.

Research question and Hypotheses:

The goal of the study is to examine the relationship between verification status and new status. Using these variables, we sought to answer the following research question.

RQ: Does the verification status of the consumer have a relationship with the new status?

H0: Verification status and new status are independent.

H1: Verification status and new status have a relationship with each other.

Chi Square Test 2:

A pivot table is made using the categorical variables “Verification Status” in the column and “New Status” in the row. The count of verification status is observed across the different new-status categories.

Observed Frequency Table:

Count of verification_status (rows: new status, columns: verification status)

New Status   | Not Verified | Source Verified | Verified | Grand Total
Defaulted    | 27           | 14              | 31       | 72
Paid/current | 202          | 92              | 134      | 428
Grand Total  | 229          | 106             | 165      | 500

Expected Frequency Table:

The expected frequency table is prepared by the formula:

Eij = (Row i Total)(Column j Total)/Sample Size

Row Labels      Not Verified   Source Verified   Verified   Grand Total
Defaulted             32.976            15.264      23.76            72
Paid/current         196.024            90.736     141.24           428
Grand Total              229               106        165           500
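The expected-frequency formula can be sketched in Python as follows. The helper name `expected_frequencies` is hypothetical, and the observed counts are copied from the Observed Frequency Table; the output should reproduce the expected table above.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row i total)(column j total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    sample_size = sum(row_totals)
    return [[r * c / sample_size for c in col_totals] for r in row_totals]


# Observed counts from the Observed Frequency Table
# rows: Defaulted, Paid/current; columns: Not Verified, Source Verified, Verified
observed = [
    [27, 14, 31],
    [202, 92, 134],
]

expected = expected_frequencies(observed)
for label, row in zip(["Defaulted", "Paid/current"], expected):
    print(label, [round(e, 3) for e in row])
```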

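The test statistic is found by the formula:

χ² = Σ (Oij - Eij)² / Eij

where Oij and Eij are the observed and expected frequencies in row i and column j, and the sum runs over every cell of the table.

Given the p-value, the test statistic can also be recovered in Excel with the formula CHISQ.INV.RT(p, f),

where p is the p-value (probability) and f is the degrees of freedom:

f = (m-1)(n-1)

where m is the number of rows and n the number of columns.

Here m = 2 (Defaulted, Paid/current) and n = 3 (Not Verified, Source Verified, Verified).

Therefore, f = 2.

The p-value is calculated in Excel by the formula

P = CHISQ.TEST(Observed Range, Expected Range)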

Test Statistic        3.964702927
Degrees of Freedom    2
p-value               0.137744954
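The test statistic, degrees of freedom and p-value above can be reproduced from the two printed tables with a short Python sketch. It takes the tables exactly as printed and relies on the fact that, for exactly 2 degrees of freedom, the chi-square right-tail probability equals exp(-x/2). Excel's CHISQ.TEST handles any number of degrees of freedom, so this shortcut is only an illustration; `chi_square_statistic` is a hypothetical helper name.

```python
import math

# Tables exactly as printed in the report.
# Rows: Defaulted, Paid/current; columns: Not Verified, Source Verified, Verified
observed = [[27, 14, 31], [202, 92, 134]]
expected = [[32.976, 15.264, 23.76], [196.024, 90.736, 141.24]]


def chi_square_statistic(obs, exp):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )


stat = chi_square_statistic(observed, expected)
m, n = len(observed), len(observed[0])  # m rows, n columns
dof = (m - 1) * (n - 1)

# Shortcut valid only for df = 2: P(X > x) = exp(-x / 2)
if dof != 2:
    raise ValueError("the exp(-x/2) tail formula assumes exactly 2 degrees of freedom")
p_value = math.exp(-stat / 2)

print(f"test statistic = {stat:.6f}, df = {dof}, p-value = {p_value:.6f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```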

Observation:

The null hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population; they are independent.

The p-value, 0.1377, is greater than the alpha value 0.05.

Interpretation:

Since p > 0.05, we fail to reject the null hypothesis: the data give no evidence that new status and verification status are related.

ANOVA

9. Loan Amount and Loan Purpose

Introduction:

Loan purpose refers to the purpose for which the customer took the loan. Loan amount refers to the amount of loan the customer applied for. We want to find out whether the mean loan amount applied for is the same across loan purposes. Identifying the purpose with the highest loan amounts would help identify the loan category with the highest demand.

Research question and Hypotheses:

The goal of this study was to examine the relationship between loan purpose and the amount of loan applied for. Using these variables, we sought to answer the following research question.

RQ: Is the mean loan amount taken by consumers the same for different loan purposes?

H0: The mean loan amount is the same for every loan purpose, i.e.

µCar = µCredit Card = µDebt Consolidation = µEducational = µHome Improvement = µHouse Construction = µMajor Purchase = µMoving = µMedical = µVacation = µSmall Business = µRenewable Energy = µWedding = µOther

H1: The mean loan amount differs for at least one loan purpose.

ANOVA: Single Factor

SUMMARY
Groups               Count       Sum        Average    Variance
Car                    249   1813150    7281.726908    23146429
Credit Card            249   3137375     12599.8996    43240552
Debt Consolidation     249   3219225    12928.61446    59320869
Educational            249   1684925    6766.767068    27819095
Home Improvement       249   3343800    13428.91566    85441918
House Construction     249   3581700    14384.33735    80382677
Major Purchase         249   3378775    13569.37751    81546944
Medical                249   2180025    8755.120482    45090561
Moving                 249   1762950    7080.120482    40404926
Other                  249   2140850    8597.791165    46205066
Renewable Energy       249   2548506    10234.96386    70898746
Small Business         249   3981225    15988.85542    86651153
Vacation               249   1415675    5685.441767    21235233
Wedding                249   2553650    10255.62249    45749852

ANOVA
Source of Variation   SS                 df     MS             F          P-value       F crit
Between Groups        34726436827        13     2671264371     49.39377   0.000000000   1.722972
Within Groups         187769237144.10    3472   54081001.48
Total                 222495673970.61    3485
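Single-factor ANOVA can be reproduced from the SUMMARY table alone, because each group's count, sum and variance determine the between-group and within-group sums of squares. The sketch below is our own illustration (the report used Excel's ANOVA: Single Factor tool), and `one_way_anova` is a hypothetical helper name. It computes F but not the p-value, since the F distribution is not in Python's standard library, and instead compares F with the F crit value Excel reported.

```python
# Loan amount by purpose, copied from the SUMMARY table:
# purpose -> (count, sum, sample variance)
LOAN_PURPOSE_SUMMARY = {
    "Car": (249, 1813150, 23146429),
    "Credit Card": (249, 3137375, 43240552),
    "Debt Consolidation": (249, 3219225, 59320869),
    "Educational": (249, 1684925, 27819095),
    "Home Improvement": (249, 3343800, 85441918),
    "House Construction": (249, 3581700, 80382677),
    "Major Purchase": (249, 3378775, 81546944),
    "Medical": (249, 2180025, 45090561),
    "Moving": (249, 1762950, 40404926),
    "Other": (249, 2140850, 46205066),
    "Renewable Energy": (249, 2548506, 70898746),
    "Small Business": (249, 3981225, 86651153),
    "Vacation": (249, 1415675, 21235233),
    "Wedding": (249, 2553650, 45749852),
}
F_CRIT = 1.722972  # Excel's F crit for alpha = 0.05, df = (13, 3472)


def one_way_anova(groups):
    """Single-factor ANOVA from (count, sum, variance) per group.

    Returns (ss_between, df_between, ss_within, df_within, f_statistic).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(s for _, s, _ in groups) / total_n
    # Between groups: spread of each group mean around the grand mean
    ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups)
    # Within groups: (n - 1) * sample variance = group's sum of squared deviations
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_statistic


ss_b, df_b, ss_w, df_w, f_stat = one_way_anova(list(LOAN_PURPOSE_SUMMARY.values()))
print(f"SS between = {ss_b:.0f}, df = {df_b}")
print(f"SS within  = {ss_w:.0f}, df = {df_w}")
print(f"F = {f_stat:.5f} vs F crit = {F_CRIT}")
print("reject H0" if f_stat > F_CRIT else "fail to reject H0")
```

Passing the loan-grade SUMMARY figures from section 10 in the same (count, sum, variance) form gives F of about 8,780.7, which matches the Excel output there. Small differences in the last digits arise because the printed variances are rounded.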


Interpretation:

A sample of 249 loans was taken for each of the 14 loan purposes (3,486 loans in total). The table shows that the highest average loan amount was for the Small Business category (15,988.86) and the lowest was for Vacation (5,685.44), with Educational (6,766.77) the second lowest. Since F (49.39) > F crit (1.72), we reject the null hypothesis and accept the alternative hypothesis, i.e. the mean loan amount is not the same across loan purposes.

Conclusion:

From the analysis it can be concluded that the mean loan amount differs by purpose. The highest average amount was taken for small business, indicating that small-business borrowers seek the largest loans. Firms could develop strategies to attract small-business borrowers.

10. Loan Grade and Interest Rate:

Introduction:

Loan grade refers to the grade assigned to each customer based on the individual's creditworthiness. Interest rate refers to the interest charged to the individual on the loan. We want to find out whether the interest rate the organisation charges is the same across grade categories. This would help reveal the organisation's interest rate policy.

Research question and Hypotheses:

The goal of this study was to examine the relationship between loan grade and interest rate. Using these variables, we sought to answer the following research question.

RQ: Is the mean interest rate charged for different loan grades the same?

H0: The mean interest rate charged is the same for every loan grade, i.e.

µGrade A = µGrade B = µGrade C = µGrade D = µGrade E = µGrade F = µGrade G

H1: The mean interest rate differs for at least one loan grade.

ANOVA: Single Factor

SUMMARY
Groups     Count       Sum       Average        Variance
Grade A      249   19.1355   0.076849398   0.000091276
Grade B      249   28.7777   0.115573092   0.000107898
Grade C      249   36.1238   0.145075502   0.000068531
Grade D      249   42.7201   0.171566667   0.000035742
Grade E      249   48.2116   0.193620884   0.000036525
Grade F      249   53.2395   0.213813253   0.000033856
Grade G      249   53.9795   0.216785141   0.000171060

ANOVA
Source of Variation   SS         df     MS            F             P-value   F crit
Between Groups        4.100999   6      0.683499789   8780.706384   0         2.103797
Within Groups         0.135132   1736   0.000077841
Total                 4.236131   1742


Interpretation:

A sample of 249 loans was taken for each of the seven loan grades (1,743 loans in total). The table shows that the highest average interest rate was charged to Grade G (about 21.7%) and the lowest to Grade A (about 7.7%). Since F (8780.71) > F crit (2.10), we reject the null hypothesis and accept the alternative hypothesis, i.e. the mean interest rate differs across loan grades.

Conclusion:

It can be concluded from the study that Lending Club charges different interest rates to different categories of individuals depending on their creditworthiness. Grade A borrowers were given loans at the lowest interest rates, whereas the highest rates were charged to Grade G borrowers. This is because borrowers with better grades have greater creditworthiness, so the chance of recovering their loans is higher.

CORRELATION ANALYSIS:

Meaning and Interpretation:

A heat map is a data visualization tool that uses colour the way a bar graph uses height and width. For example, on a web page a heat map shows which areas get the most attention in a visual form that is easy to take in and act on. Here, the heat map represents the correlation between variables using colours, and it can be read with the following scale:

1 = Perfect positive correlation

0 = No correlation

-1 = Perfect negative correlation

Green and shades of green represent positive correlation: when one variable increases, the other tends to increase as well.

Yellow and shades of yellow represent moderate correlation: the two variables move together only moderately.

Red and shades of red represent negative correlation: when one variable increases, the other tends to decrease.

Correlation measures association only; it does not show that a change in one variable causes a change in the other.

CONCLUSION

We studied Lending Club to identify its areas of weakness and suggest improvements that could help it expand its business. We divided the information provided to us into three categories: customer demographics, loan information and characteristics, and customer behaviour variables. The focus of the study was to identify which groups of customers to concentrate on and to examine the company's customer retention to see how it has grown.

A sample of 500 was taken from the population and tested to derive the following conclusions:

1. Most customers have a low debt-to-income (DTI) ratio, which is a good sign for the company, because a low DTI ratio means a smaller share of income is already committed to debt.

2. Most customers have a revolving balance below 20,000, which is good for Lending Club, as a lower balance is preferred.

3. The average loan amount given to Grade D customers is higher than that given to Grade C customers, but the higher risk of Grade D loans must also be taken into account. The company should focus more on Grade C customers and review the risk involved in giving too many loans to Grade D customers.

4. The variance of loan amounts for credit card loans is almost equal to that for small business loans. Because these figures come from a sample, they are only approximately equal; the similar mean values for the two purposes point the same way, although a sample alone cannot show that the population values are identical. On this basis, the company already appears to focus equally on both sectors, so it should look to other sectors to improve its business.

5. Almost the same interest rate is charged to Grade A and Grade B customers, with only slight variation; therefore, customers of both grades provide similar interest revenue to the firm.

6. Customers who live in rented homes have lower annual incomes than customers who own their homes. This can help Lending Club judge which customers are more or less creditworthy, making future lending decisions easier.

7. The mean loan amount differs across loan purposes. The highest average amount was taken for small business, indicating that small-business borrowers seek the largest loans. Firms could develop strategies to attract small-business borrowers.

8. Lending Club charges different interest rates to different categories of individuals depending on their creditworthiness. Grade A borrowers were given loans at the lowest interest rates, whereas the highest rates were charged to Grade G borrowers. This is because borrowers with better grades have greater creditworthiness, so the chance of recovering their loans is higher.

Under customer demographics, we learned that customers who have worked for more than 10 years have taken the most loans, while the 9-year group has taken the fewest, so Lending Club has to find ways to approach this group more. People with an annual income of $50,000 to $100,000 took the most loans; Lending Club should work on the group with an annual income above $200,000. Customers who live in rented homes have taken the most loans, while those who own their houses have taken fewer. The company should offer homeowners loan options such as car loans or loans for vacations.

Under loan information, most customers have taken loans below $10,000. The company should offer attractive interest rates to increase borrowing and advertise its different types of loans. Most loans are taken for a 3-year term; the company could offer loans for various terms after surveying customers' needs. Grade B customers have good credit scores but have taken fewer loans, so the company should encourage them to borrow more, as this group has the capacity to repay. Most of the loans have been fully paid, which is a good sign for the company, but the company should find ways to avoid charged-off loans in the future to prevent losses.

Under customer behaviour variables, we observed that loans are taken mostly for debt consolidation. The company should advertise more to promote its other loan options, such as home loans and educational loans. Customers with higher incomes have lower DTI (debt-to-income) ratios, in the ranges 10-15 and 15-20, while the highest DTI ratios belong to customers with lower annual incomes. We recommend that Lending Club accept borrowers with higher annual incomes so that the DTI ratio does not rise above 20. Customers have relatively high revolving balances, which may indicate low credit scores and is not good for the company. Revolving utilization was calculated; it helps the company judge each customer's creditworthiness and decide on future loans accordingly. The graph also shows that many of the customers who have taken loans have an earliest credit line dating back to 1990-2000. We read this as a sign of good customer retention, although the earliest credit line reflects a borrower's overall credit history rather than their history with Lending Club. The company has not been able to recover much from its charged-off loans; it should avoid this by ensuring that customers have a proper source of funds to repay. It should keep a particular watch on Grade F customers, as they have the most payment defaults, which harms the smooth running of the company.

The company should examine all these variables, decide what changes to make in each area, and implement them for the growth of Lending Club.

RECOMMENDATIONS

Lending Club should make studying each customer's loan behaviour and the related data variables mandatory. Before lending, it should focus on the following data: customer grade, interest rate, DTI ratio, loan status and revolving balance. To attract more customers, the company could lower the interest rate on loans to lower-income groups or give them the option of a longer repayment period. Education loans are very few, so the company could focus more on college students to help them with their education, which would also contribute to shaping the future generation. It could also introduce different loan terms, such as short-term loans that can be paid off within a year.
