
Final Report Data Analysis Example 1 Template

This study analyzes relationships between annual income, home ownership, and loan amount using the loan50 dataset. Key findings include: most individuals owned homes with mortgages, median income was highest for mortgage holders, and higher incomes correlated with larger loan amounts. Descriptive statistics and visualizations revealed patterns around central tendencies, dispersions, and potential outliers.

Uploaded by

周姿馨

Exploring the Dynamics of Annual Income, Home Ownership, and Loan Amount: An In-depth Exploratory Data Analysis


Name & Student ID

Abstract
This study employs exploratory data analysis methods to examine the relationships between annual income, home ownership, and loan amount. Through statistical analysis and data visualization, we investigate correlations, patterns, and potential outliers within the loan50 dataset. The findings provide valuable insights for financial institutions, policymakers, and individuals making housing and loan decisions. The methodology establishes a foundation for future socio-economic research and modeling.

Keywords: annual income, loan amount, home ownership

Introduction
The increasing popularity of online lending platforms, such as Lending Club, has revolutionized the way individuals access loans. As the lending industry evolves, it becomes crucial to explore the dynamics between the key variables that influence loan decisions. Previous research in personal finance and lending has focused on factors like credit scores and debt-to-income ratios (Agarwal et al. 2015). However, limited attention has been given to the interplay between annual income, home ownership, and loan amount. Understanding the relationships between these variables can provide valuable insights into the borrowing process and inform better lending practices. (Research Background)

The objective of this study is to conduct an in-depth exploratory data analysis of the dynamics between annual income, home ownership, and loan amount within the loan50 dataset. By analyzing the relationships and patterns that emerge from the data, this study aims to uncover potential insights and identify any notable associations between these variables. The findings will contribute to our understanding of the factors influencing loan decisions and provide valuable information for financial institutions, policymakers, and individuals making housing and loan choices. Additionally, the study aims to lay the groundwork for future socio-economic research and modeling in the lending industry. (Statement of Purpose)

Methods
The loan50 dataset was obtained from the "OpenIntro Statistics" website and consists of 50 observations and 18 variables. This analysis focuses on three variables: homeownership, annual_income, and loan_amount. The homeownership variable represents the ownership status of the applicant's residence, annual_income represents the applicant's annual income, and loan_amount represents the amount of the loan the applicant received. (Data Description)

The homeownership variable is categorical, with the categories rent, mortgage, and own. The annual_income and loan_amount variables are numerical, representing continuous values. (Nature of Variables)

To gain insights from the data, descriptive statistics were calculated and visualizations were created. The descriptive statistics provide measures of central tendency (mean and median) and dispersion (standard deviation) for each numerical variable. Four visualizations were created to explore the relationships, patterns, and potential outliers within the loan50 dataset: a histogram of annual_income, a bar chart of homeownership, a scatter plot of annual_income against loan_amount, and parallel box plots of annual_income by homeownership. The parallel box plots allow a visual comparison of the annual income distributions among the homeownership groups (rent, mortgage, own). By examining the positions and shapes of the boxes and whiskers, we can gain insights into potential variations in income levels. The scatter plot helps determine whether there is a relationship between annual income and loan amount. If the data points cluster tightly around a linear pattern, this suggests a strong linear correlation. If the points are more scattered and form no clear linear pattern, this suggests a weak correlation or none. Together, these methods give an overview of the dataset and support further analysis and interpretation of the findings. (Descriptive Statistics and Data Visualization)
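The summary step described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used for the report. The function names summarize and count_categories and the small input lists are hypothetical stand-ins for the loan50 columns.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize(values):
    """Return mean, median, and sample standard deviation of a numeric column."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample SD (n - 1 denominator); an assumption
    }


def count_categories(labels):
    """Return category frequencies, the quantity a bar chart displays."""
    return Counter(labels)


# Hypothetical values for illustration only; these are NOT loan50 records.
incomes = [40000, 55000, 60000, 75000, 120000]
statuses = ["rent", "mortgage", "mortgage", "own", "rent"]

income_summary = summarize(incomes)
status_counts = count_categories(statuses)
print(income_summary)
print(status_counts.most_common())
```

Whether the report used the sample or the population standard deviation is not stated, so the n - 1 version here is only one reasonable choice.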

Results
Table 1 shows that the mean annual income in the dataset is $82,276. The median income is slightly lower at $75,000, which suggests that the income distribution may be slightly right-skewed (Figure 1). The standard deviation of annual income is relatively high at $66,631.74, indicating a wide dispersion of incomes around the mean. The mean loan amount is $17,989.74, and the median of $16,000 is again slightly lower than the mean, which also suggests a slightly right-skewed distribution. Loan amounts range from a minimum of $5,825 to a maximum of $40,000. The standard deviation of loan amount is $8,195.35, indicating considerable variation in loan amounts around the mean.


Table 1. The descriptive statistics for the annual_income and loan_amount variables.

Variable        Mean         Median    Standard Deviation
Annual Income   $82,276      $75,000   $66,631.74
Loan Amount     $17,989.74   $16,000   $8,195.35
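One rough way to quantify the gap between mean and median in Table 1 is Pearson's second skewness coefficient, 3 × (mean − median) / SD. The sketch below applies it to the Table 1 figures. Using this coefficient is an assumption made for illustration and is not part of the original analysis, and the function name pearson_median_skewness is hypothetical.

```python
def pearson_median_skewness(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / sd


# Inputs are the values reported in Table 1.
income_skew = pearson_median_skewness(82276, 75000, 66631.74)
loan_skew = pearson_median_skewness(17989.74, 16000, 8195.35)

print(f"annual_income: {income_skew:.2f}")
print(f"loan_amount:   {loan_skew:.2f}")
```

A positive value is consistent with a right-skewed distribution. Because the coefficient uses only three summary numbers, it is a coarse check and does not replace inspecting the histogram.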

Figure 1: The histogram with density plot for annual_income.


Figure 2 shows that the most common homeownership status in the dataset is "mortgage," which suggests that a large share of individuals in the sample own homes with mortgage loans.

Figure 2: The bar plot for homeownership variable.

It is also worth exploring the relationship between homeownership status (rent, mortgage, and own) and annual income. In Figure 3, the horizontal line inside each box marks the median annual income for that homeownership category. Figure 3 shows that the median annual income of mortgage holders is higher than that of renters and of outright owners (the "own" category). The interquartile range (IQR) of annual income is larger for renters than for outright owners and mortgage holders, meaning that the middle 50% of renters' incomes spans a wider range than in the other two groups. The box plots for renters and mortgage holders each show a few outliers.

Figure 3: The parallel box plots of the variables homeownership and annual_income.
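The quantities read off a parallel box plot (median, IQR, and outliers per group) can also be computed directly. The sketch below assumes the common 1.5 × IQR rule for flagging outliers and the "inclusive" quartile convention. The report does not say which conventions its plotting software used, and the function names and input values are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_summary(values):
    """Return quartiles, IQR, and points outside the 1.5 * IQR fences."""
    # method="inclusive" is one of several quartile conventions; plotting
    # libraries may use a different one, so results can differ slightly.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


def summarize_by_group(pairs):
    """Group (category, value) pairs and summarize each group."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {category: iqr_summary(vals) for category, vals in groups.items()}


# Hypothetical (homeownership, annual_income) pairs; NOT loan50 records.
records = [("rent", v) for v in [1, 2, 3, 4, 5, 6, 7, 8, 100]] + [
    ("own", v) for v in [10, 11, 12, 13, 14]
]
by_group = summarize_by_group(records)
for category, stats in by_group.items():
    print(category, stats)
```

Comparing the "iqr" entries across groups gives the same comparison as comparing box heights in the figure.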

The scatter plot in Figure 4 shows a general positive relationship between annual income and loan amount: as annual income increases, the loan amount also tends to increase. This suggests that individuals with higher incomes tend to receive larger loans.


Figure 4: The scatter plot of the variables annual_income and loan_amount.
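The strength of a linear relationship like the one in the scatter plot is usually summarized with the Pearson correlation coefficient. The report does not give a value for it, so the sketch below only shows how it could be computed. The function name pearson_r and the input values are hypothetical and are not taken from loan50.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only; NOT loan50 records.
incomes = [30000, 50000, 70000, 90000, 110000]
loans = [8000, 12000, 11000, 20000, 24000]
r = pearson_r(incomes, loans)
print(f"r = {r:.3f}")
```

Values near 1 correspond to points tightly clustered around an upward-sloping line, while values near 0 correspond to no clear linear pattern.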

Conclusions
In Table 1, the median income of $75,000 is slightly lower than the mean ($82,276), suggesting a slightly right-skewed distribution of incomes (Figure 1). This skewness indicates that there may be a few high-income outliers or a group of individuals with relatively higher incomes. The standard deviation of annual income, $66,631.74, is relatively high and indicates a wide dispersion of incomes around the mean, with some individuals having considerably higher or lower incomes than the mean.

The most common homeownership status in the dataset is "mortgage," implying that a large share of individuals in the sample own homes with mortgage loans (Figure 2). This finding aligns with the expectation that buying a home with a mortgage is a common housing arrangement. The mean loan amount of $17,989.74 describes the loans recorded in the dataset and should be read separately from the borrowers' mortgages.

The slightly right-skewed distribution of loan amounts also indicates that there may be a few higher loan amount outliers or a group of individuals with relatively larger loans. The standard deviation for loan amount ($8,195.35) indicates a considerable variation in loan amounts around the mean. This suggests that there is a range of loan sizes within the dataset, with some individuals borrowing substantially higher or lower amounts compared to the average.
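Because income and loan amount are on different scales, their standard deviations are easier to compare once each is divided by its mean (the coefficient of variation). The sketch below applies this to the Table 1 figures. The coefficient of variation is not used in the original analysis and is introduced here only as an illustrative assumption, and the function name is hypothetical.

```python
def coefficient_of_variation(mean, sd):
    """Return sd / mean, a unit-free measure of relative spread."""
    if mean <= 0:
        raise ValueError("mean must be positive for this measure")
    return sd / mean


# Inputs are the means and standard deviations reported in Table 1.
income_cv = coefficient_of_variation(82276, 66631.74)
loan_cv = coefficient_of_variation(17989.74, 8195.35)

print(f"annual_income CV: {income_cv:.2f}")
print(f"loan_amount CV:   {loan_cv:.2f}")
```

On these figures, income varies more relative to its mean than loan amount does.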

Based on the findings in Figure 3, there appears to be a disparity in income between homeownership groups, with mortgage holders having the highest median income and renters the lowest. Annual income is also more tightly concentrated for outright owners than for renters or mortgage holders, which suggests less variation in income levels among outright owners. The outliers in the box plots for renters and mortgage holders indicate that some people in these groups have very high or very low incomes relative to the rest of their group. This could be due to a number of factors, such as differences in employment status, education level, or occupation.

Figure 4 shows a positive association between annual income and loan amount: borrowers with higher incomes tend to have larger loans. This relationship is consistent with general expectations, as higher incomes provide a stronger financial foundation for borrowing larger sums. The majority of data points are concentrated in the lower range of both annual income and loan amount, which indicates that the dataset predominantly consists of individuals with lower to moderate incomes and smaller loan amounts. A few outliers deviate significantly from the general pattern. The points are spread out rather than tightly clustered, suggesting that the positive relationship is far from perfect. Loan amounts vary even among borrowers with similar incomes, which indicates that other factors also influence loan amounts.

This study provides initial insights into the relationship between homeownership, annual income, and loan amounts. However, the conclusions are based on a relatively small dataset of only 50 observations, so the findings may not be representative of the entire population or generalize to broader contexts. Further research is warranted to explore additional factors that may influence loan decisions, such as credit scores, employment status, or loan terms. Additionally, studying a larger dataset and conducting more sophisticated statistical analyses, such as hypothesis testing or regression analysis, would enhance the reliability and validity of the conclusions.

References
Agarwal, S., I. Ben-David, and V. W. Yao. 2015. Collateral Valuation and Borrower Financial Constraints: Evidence from the Residential Real Estate Market. Management Science 61: 2220–2240.
