
Assignment 2

This document contains an assignment submission form for a statistical methods course. It includes the course name, the assignment title, and the names and student IDs of the 4 students submitting the assignment. The assignment analyzes a dataset on loans to answer 10 questions. It finds that the claim that 1/4 of customers are repeat customers is inaccurate, that FICO scores are approximately normally distributed, and that original and repeat loans differ significantly in FICO scores but not in other factors such as years in business or satisfied accounts. It recommends a sample size of over 2000 to estimate the population average PRSM score.

Uploaded by

Abhishek Batra
Copyright © All Rights Reserved

Statistical Methods for Managerial Decisions - Indian School of Business

DHM-2, 2021-22 Assignment 1

ASSIGNMENT SUBMISSION FORM

Course Name: Statistical Methods for Managerial Decisions


Assignment Title: Assignment 1
Submitted by: L10 (Section L)

Student Name PG ID
Saumya Shukla 62110149
Abhishek Batra 62110322
Samank Singh 62110162
Mukund Bubna 62110646

ISB Honour Code

· I will represent myself in a truthful manner.
· I will not fabricate or plagiarize any information with regard to the curriculum.
· I will not seek, receive or obtain an unfair advantage over other students.
· I will not be a party to any violation of the ISB Honour Code.
· I will personally uphold and abide, in theory and practice, the values, purpose and rules of
the ISB Honour Code.
· I will report all violations of the ISB Honour Code by members of the ISB community.
· I will respect the rights and property of all in the ISB community.
· I will abide by all the rules and regulations that are prescribed by ISB.
SMMD Assignment 2
[Note: We have removed the outlier row in PRSM score from all analyses.]
1. The variable Loan Type indicates a new or repeat customer. It has been claimed that 1/4 of
customers who get these loans have had them before. Does that seem to be an appropriate
claim about the population based on your data? Offer a brief explanation.
Let H0: π = ¼ and HA: π ≠ ¼,
where H0 is the null hypothesis and π is the population proportion of repeat customers.
We use a two-sided test for a population proportion.

The two-sided p-value is < 0.0001: if the true proportion of repeat customers were ¼, a sample
proportion at least this far from ¼ would occur with probability below 0.0001.
We therefore reject the null hypothesis at any conventional significance level (e.g., α = 0.05). The
data provide strong evidence that the population proportion of repeat customers is not ¼, so the
claim does not appear appropriate.
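A minimal sketch of a two-sided one-proportion test, assuming the normal-approximation z-test. The software output behind the reported p-value may have used a different test (for example an exact binomial or chi-square test), so this is an illustration rather than a reproduction. The repeat-customer count below is hypothetical; only n = 627 (from Question 6) and π0 = ¼ come from this assignment, and the function name is ours.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Two-sided z-test of H0: pi = p0 against HA: pi != p0.

    Uses the normal approximation to the binomial; returns (z, p_value).
    """
    p_hat = successes / n
    # Under H0 the standard error is computed from the hypothesized p0, not p_hat.
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    # Two-sided p-value: probability that |Z| >= |z| when H0 is true.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Hypothetical count of repeat loans (NOT the assignment's actual data).
z, p = one_proportion_z_test(successes=100, n=627, p0=0.25)
print(f"z = {z:.2f}, two-sided p-value = {p:.1e}")
```

With a sample proportion this far from ¼, |z| is large and the p-value falls below 0.0001, the same kind of result reported above.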
2. Does the FICO score of the borrower appear to be normally distributed? Justify your answer.

The distribution of FICO scores is approximately normal, with a slight positive skewness of 0.2248. In
the normal quantile plot, most observations lie along the normal reference line (within the confidence
bounds and approximately straight), with some departure in the tails. The median is slightly greater
than the mean. Overall, the sample observations are well approximated by a normal distribution.
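Skewness can also be checked numerically. A minimal sketch, assuming the adjusted Fisher-Pearson definition that many statistics packages report; we have not confirmed that this is the exact formula behind the 0.2248 above, and the data below are illustrative only.

```python
from statistics import mean, stdev


def sample_skewness(xs: list[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness:

    G1 = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), with s the sample SD.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    m = mean(xs)
    s = stdev(xs)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


# Illustrative data only: a long right tail gives positive skewness.
print(sample_skewness([1, 1, 1, 2, 10]))
```

A value close to 0, such as the 0.2248 reported for FICO, indicates only mild asymmetry; symmetric data give exactly 0 up to rounding.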
3. State the 95% confidence interval for the average FICO score and give a brief interpretation of
what this interval means. Be sure to check that it is appropriate to form such a confidence
interval for your data.
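The 95% confidence interval for the mean FICO score is 568.728 to 579.8398. If we repeatedly drew
samples of this size and built an interval this way each time, about 95% of those intervals would
contain the true population mean; based on this sample, we are 95% confident that the population
mean FICO score lies between 568.728 and 579.8398.
Forming this interval is appropriate if the sample is unbiased (random) and the observations are
independent. The sample is also large enough for the Central Limit Theorem to apply, and the FICO
scores are approximately normal (Question 2), so the t-based interval is valid.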
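A sketch of the t-based interval computed from raw data. The Python standard library has no Student-t quantile function, so this sketch substitutes the Cornish-Fisher approximation in powers of 1/df, which is very close to the exact value for samples as large as this one; with SciPy available, its exact t quantile would be the natural replacement. The helper names are ours, not the software's.

```python
import math
from statistics import NormalDist, mean, stdev


def t_quantile(p: float, df: float) -> float:
    """Approximate Student-t quantile (Cornish-Fisher expansion in 1/df)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def t_confidence_interval(xs: list[float], level: float = 0.95) -> tuple[float, float]:
    """Two-sided t interval for a population mean: xbar +/- t * s / sqrt(n)."""
    n = len(xs)
    xbar = mean(xs)
    se = stdev(xs) / math.sqrt(n)
    t_crit = t_quantile(1 - (1 - level) / 2, n - 1)
    return xbar - t_crit * se, xbar + t_crit * se
```

The reported interval is centred at the sample mean, (568.728 + 579.8398) / 2 ≈ 574.28, with a margin of error of about 5.56 points.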
4. A manager at the loan operation claimed that the average age (the Years in Business variable)
of all the businesses served by this firm is less than 8 years. Do you agree, based on your data?
Explain briefly.
Let H0: µ ≤ 8 and HA: µ > 8,
where H0 is the null hypothesis and µ is the population mean years in business of the borrowing
companies.

The p-value is < 0.0001: if H0 were true, a result at least this extreme would occur with probability
below 0.0001.
We therefore reject the null hypothesis in favour of HA: the data indicate that the population mean
years in business is greater than 8, so we do not agree with the manager's claim that it is less than 8.
[Note: We assume that the population is normally distributed.]
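A minimal sketch of the one-sided test H0: µ ≤ 8 vs HA: µ > 8, assuming raw data are available. To stay within the standard library, the p-value uses the standard normal in place of the t distribution, a large-sample approximation that we are substituting here; the function name and data are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev


def t_test_greater(xs: list[float], mu0: float) -> tuple[float, float]:
    """One-sided test of H0: mu <= mu0 vs HA: mu > mu0.

    Returns (t statistic, approximate one-sided p-value). The p-value uses the
    standard normal in place of the t distribution (large-sample approximation).
    """
    n = len(xs)
    t = (mean(xs) - mu0) / (stdev(xs) / math.sqrt(n))
    p_upper = NormalDist().cdf(-t)  # area to the right of t
    return t, p_upper


# Illustrative data only (years in business), not the assignment's sample.
years = [9.0, 10.0, 11.0] * 50
t, p = t_test_greater(years, mu0=8.0)
print(f"t = {t:.1f}, one-sided p = {p:.1e}")
```

For the one-sided alternative HA: µ > 8, the relevant p-value is the upper-tail area (Prob > t), not the two-sided Prob > |t|.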
5. To identify the average PRSM in the population to 2 decimal places (i.e., to have the margin of
error less than 0.005), how large a sample would you recommend?
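Let n be the sample size, S the sample standard deviation of PRSM, and DMOE = 0.005 the desired
margin of error. For a 95% confidence interval (α = 0.05), n must satisfy

$$ n \ge \left( \frac{t_{1-\alpha/2,\; n-1}\, S}{\mathrm{DMOE}} \right)^2 $$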
From the approximately normal distribution of PRSM scores, we take S = 0.115201. Because the
critical value t depends on n, we solve the inequality iteratively at 95% confidence, which gives a
required sample size of 2042 loans.

Hence, with a random sample of 2042 loans, we can estimate the population mean PRSM score with a
margin of error of at most 0.005 at 95% confidence.
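The iterative search can be sketched as follows, assuming S = 0.115201 and DMOE = 0.005 as above. Because the standard library lacks a Student-t quantile, the sketch substitutes a Cornish-Fisher approximation (our choice, not necessarily what the original software used); at degrees of freedom near 2000 it is accurate to far more decimals than this calculation needs.

```python
import math
from statistics import NormalDist


def t_quantile(p: float, df: float) -> float:
    """Approximate Student-t quantile (Cornish-Fisher expansion in 1/df)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def required_sample_size(s: float, moe: float, level: float = 0.95) -> int:
    """Smallest n satisfying n >= (t_{1-alpha/2, n-1} * s / moe) ** 2."""
    p = 1 - (1 - level) / 2
    # Start from the z-based bound: since t > z for every finite df,
    # no smaller n can satisfy the inequality.
    n = math.ceil((NormalDist().inv_cdf(p) * s / moe) ** 2)
    # Step up until the t-based requirement (which depends on n) is met.
    while n < (t_quantile(p, n - 1) * s / moe) ** 2:
        n += 1
    return n


print(required_sample_size(s=0.115201, moe=0.005))  # prints 2042
```

The z-based starting point is 2040; the t correction raises the requirement to about 2041.7, so the smallest whole number of loans that works is 2042.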
6. Is the population average PRSM score statistically significantly different from 1?
(a) Indicate an answer to the question, with a brief account of your analysis.
Let H0: µ = 1 and HA: µ ≠ 1,
where H0 is the null hypothesis and µ is the population mean PRSM score of the loans.

The p-value (Prob > |t|) is < 0.0001: if the population mean PRSM score were 1, a sample mean at
least this far from 1 would occur with probability below 0.0001.
We therefore reject the null hypothesis: the population mean PRSM score is statistically significantly
different from 1.
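As a worked check, the t statistic can be computed from summary figures given elsewhere in this assignment: the sample mean 0.7998 and n = 627 from part (b), and S = 0.115201 from Question 5. This assumes that S is the standard deviation of the same 627-loan sample; the function name is ours.

```python
import math


def t_statistic_from_summary(xbar: float, s: float, n: int, mu0: float) -> float:
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Summary figures from this assignment: mean 0.7998, S = 0.115201, n = 627.
t = t_statistic_from_summary(xbar=0.7998, s=0.115201, n=627, mu0=1.0)
print(f"t = {t:.1f}")  # prints t = -43.5
```

A statistic this large in magnitude lies far in the tail of a t distribution with 626 degrees of freedom, consistent with the reported p < 0.0001.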

(b) What are the implications of your answer for the business?
The sample mean PRSM score of the 627 loans is 0.7998, and the test above shows that the population
mean is significantly different from 1. Since the sample mean is below 1, the data suggest that the
population mean PRSM score is less than 1. This is a negative indicator for the company, because the
loans are not being paid back at the required rate. Slower repayment may increase the chance of
default and reduce the company's working capital.
7. The Chief Risk Officer is particularly concerned about the percentage of loans that have PRSM
scores of less than 0.7. What can you tell her about this percentage from your data? Approach
this question by creating a confidence interval for the proportion of loans in the population that
have a PRSM score of 0.7 or less.

Based on the sample of 627 loans, the 95% confidence interval for the population proportion of loans
with a PRSM score of 0.7 or less is 0.1665 to 0.22903.
This means we are 95% confident that between about 16.7% and 22.9% of all loans have a PRSM score
of 0.7 or less: intervals built this way capture the true population proportion in 95% of repeated samples.
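A minimal sketch of a confidence interval for a proportion, using the textbook normal-approximation (Wald) interval. Statistical software often uses a different method (for example the Wilson score interval), so this sketch is not guaranteed to reproduce the bounds above exactly. The count of low-PRSM loans is hypothetical; only n = 627 comes from this assignment.

```python
import math
from statistics import NormalDist


def wald_proportion_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) interval: p_hat +/- z * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a proportion cannot leave that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical count of loans with PRSM <= 0.7 out of 627 (illustrative only).
low, high = wald_proportion_ci(successes=120, n=627)
print(f"95% CI: {low:.4f} to {high:.4f}")
```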
8. Do loans from ISO SPS have significantly different average PRSM scores than those from EZ
Check?
Let H0: µ1 - µ2 = 0 and HA: µ1 - µ2 ≠ 0,
where H0 is the null hypothesis, µ1 is the population mean PRSM score of loans from ISO SPS, and µ2
is the population mean PRSM score of loans from EZ Check.
The p-value (Prob > |t|) is < 0.0001: if the two population means were equal, a difference in sample
means at least this large would occur with probability below 0.0001.
We therefore reject the null hypothesis: loans from ISO SPS have a significantly different average
PRSM score than loans from EZ Check.
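A sketch of a two-sample comparison of means. The text does not say whether the pooled (equal-variance) or unequal-variance version of the t-test was used; this sketch shows Welch's unequal-variance statistic with Welch-Satterthwaite degrees of freedom, which is one common choice. The function name and data are illustrative.

```python
import math
from statistics import mean, variance


def welch_t(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(xs), len(ys)
    # Squared standard errors of each sample mean (sample variances, n - 1 denominator).
    v1, v2 = variance(xs) / n1, variance(ys) / n2
    t = (mean(xs) - mean(ys)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Illustrative PRSM-like scores for two groups (not the assignment's data).
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f}, df = {df:.1f}")
```

The statistic is then compared with a t distribution on `df` degrees of freedom to obtain Prob > |t|.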

9. Is it possible that the results of the previous comparison have been confounded by a lurking
variable? If so, suggest a possible lurking variable that could influence the comparison.
Otherwise, explain briefly why it is not possible.
It is possible that the results have been confounded by a lurking variable if the types of loans
granted through SPS and EZ Check differ systematically. Our comparison treats SPS and EZ Check loans
as broadly similar, i.e., it assumes both ISOs issued similar loans (loan amount, interest rate,
repayment terms, etc.), but this assumption may not hold. Loan terms and amounts are not assigned at
random; the loan amount, interest rate and repayment terms depend on the screening and application
procedure. For example, EZ Check might be able to obtain loans at a lower interest rate (because it is
deemed less risky) and hence repay faster. In that case, the effect of a lurking variable, the loan
interest rate, would be confounded with the effect of the ISO, and we would need more information
about the types of loans to separate the two.
10. It is the case for most datasets that there are significantly more original loans than repeat
loans. Using two-sample comparisons, do the original/repeat loans look different in any of the
following respects? (Just report a p-value and provide a conclusion for each of the following
variables; no elaborate comparison is necessary.)
In this question, we conduct a two-sample comparison test to check whether original and repeat
loans look different in each of the following respects.
Null hypothesis H0: µ1 - µ2 = 0
Alternative hypothesis HA: µ1 - µ2 ≠ 0
where µ1 is the population mean for original loans and µ2 is the population mean for repeat loans,
for each variable tested below.

(a) FICO

The p-value is 0.0406. At significance level α = 0.05, we reject the null hypothesis that original and
repeat loans have the same population mean FICO score, and conclude that their population mean
FICO scores differ. [If the population means were equal, a difference at least this large would occur
with probability 0.0406.]
(b) Years in business

The p-value is 0.1474. At α = 0.05, we fail to reject the null hypothesis that original and repeat loans
have the same population mean years in business, so the data show no significant difference in this
respect. [If the population means were equal, a difference at least this large would occur with
probability 0.1474.]
(c) Satisfied accounts

The p-value is 0.1130. At α = 0.05, we fail to reject the null hypothesis that original and repeat loans
have the same population mean number of satisfied accounts, so the data show no significant
difference in this respect. [If the population means were equal, a difference at least this large would
occur with probability 0.1130.]
(d) Current delinquent credit lines

The p-value is 0.0912. At α = 0.05, we fail to reject the null hypothesis that original and repeat loans
have the same population mean number of current delinquent credit lines, so the data show no
significant difference in this respect. [If the population means were equal, a difference at least this
large would occur with probability 0.0912.]
(e) Number of derogatory legal items

The p-value is 0.8108. At α = 0.05, we fail to reject the null hypothesis that original and repeat loans
have the same population mean number of derogatory legal items, so the data show no significant
difference in this respect. [If the population means were equal, a difference at least this large would
occur with probability 0.8108.]
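The decision rule applied in (a) to (e) can be summarised in a few lines. This sketch simply takes the p-values reported above and compares each with α = 0.05; the dictionary and function names are ours.

```python
# p-values reported above for original vs. repeat loans.
REPORTED_P_VALUES = {
    "FICO": 0.0406,
    "Years in business": 0.1474,
    "Satisfied accounts": 0.1130,
    "Current delinquent credit lines": 0.0912,
    "Number of derogatory legal items": 0.8108,
}


def reject_null(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """True where p < alpha, i.e. where H0 (equal population means) is rejected."""
    return {name: p < alpha for name, p in p_values.items()}


for name, rejected in reject_null(REPORTED_P_VALUES).items():
    print(f"{name}: {'reject H0' if rejected else 'fail to reject H0'}")
```

Only FICO falls below α = 0.05. Note that running five separate tests at α = 0.05 raises the chance of at least one false rejection across the set.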
