Final Report Data Analysis Example 1 Template
Abstract
This study employs exploratory data analysis methods to examine the
relationships between annual income, home ownership, and loan amount. Through
potential outliers within the loan50 dataset. The findings provide valuable insights for
Introduction
The increasing popularity of online lending platforms, such as Lending Club,
has revolutionized the way individuals access loans. As the lending industry evolves,
it becomes crucial to explore the dynamics between key variables that influence loan
decisions. Previous research in the field of personal finance and lending has focused
on factors like credit scores and debt-to-income ratios (Agarwal et al. 2015).
However, limited attention has been given to the interplay between annual income,
home ownership, and loan amount. Understanding the relationships between these
variables can provide valuable insights into the loan borrowing process and inform
within the loan50 dataset. By analyzing the relationships and patterns that emerge
from the data, this study aims to uncover potential insights and identify any
significant associations between these variables. The findings will contribute to our
informed housing and loan choices. Additionally, the study aims to lay the
groundwork for future socio-economic research and modeling in the lending industry.
(Statement of Purpose)
Methods
The loan50 dataset was obtained from the "OpenIntro Statistics" website and
consists of 50 samples and 18 variables. For this analysis, the focus is on three
the applicants' annual income, and loan_amount represents the amount of the loan
mortgage, and own. The annual_income and loan_amount variables are numerical,
To gain insights from the data, descriptive statistics were calculated and
tendency (mean and median) and data dispersion (standard deviation) for each
variable, a bar chart for the homeownership variable, a scatter plot of the variables
annual_income and loan_amount, and parallel box plots of the variables
patterns, and potential outliers within the loan50 dataset. The parallel boxplots allow
homeownership groups (e.g., rent, mortgage, own). By examining the positions and
shapes of the boxes and whiskers, we can gain insights into potential variations in
income levels. The scatter plot can help us determine whether there is a relationship
between annual income and loan amount. If the data points are tightly clustered
around a linear pattern, it suggests a strong correlation. On the other hand, if the data
points are more scattered and do not form a clear linear pattern, it suggests a weak or
nonexistent linear relationship.
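As a minimal sketch of the descriptive statistics described in this section, the Python snippet below computes the mean, median, and sample standard deviation using only the standard library. The income values are hypothetical and are not taken from the loan50 dataset; the variable names are illustrative as well.

```python
import statistics

# Hypothetical annual incomes in dollars; NOT values from the loan50 dataset.
incomes = [42000, 55000, 61000, 68000, 75000, 82000, 96000, 150000]

mean_income = statistics.mean(incomes)      # measure of central tendency
median_income = statistics.median(incomes)  # robust measure of central tendency
sd_income = statistics.stdev(incomes)       # sample standard deviation (n - 1)

# A mean above the median is a common informal hint of right skew.
# It is only a hint; a histogram should be checked before concluding anything.
skew_hint = "right-skewed" if mean_income > median_income else "not right-skewed"

print(f"mean={mean_income:.2f} median={median_income:.2f} sd={sd_income:.2f}")
print(f"informal skew check: {skew_hint}")
```

In this made-up sample the single large value pulls the mean above the median, which is the same pattern the report uses to argue for slight right skew.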
Results
Table 1 shows the mean annual income in the dataset is $82,276. The median
income is slightly lower at $75,000, indicating that the distribution of incomes may be
slightly right-skewed (Figure 1). The standard deviation for annual income is
the mean. The mean loan amount is $17,989.74, representing the average loan amount
in the dataset. The median loan amount of $16,000 is slightly lower than the mean,
the spread of loan amounts in the dataset. The standard deviation for loan amount is
(rent, mortgage, and own) and annual income. In Figure 3, the horizontal lines inside
each box represent the median annual income for each homeownership category.
Figure 3 shows that the median annual income for mortgage holders is higher than the
median annual income for renters and homeowners. The interquartile range (IQR) for
annual income is higher for renters than for homeowners and mortgage holders. This
means that there is a wider range of incomes among renters than among homeowners
and mortgage holders. There are a few outliers in the box plot for annual income for
renters and mortgage holders.
Figure 3: The parallel box plots of the variables homeownership and annual_income.
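The quantities read off a parallel box plot (median, IQR, and points flagged as outliers) can be sketched numerically as follows. This is an illustration under stated assumptions: the incomes are hypothetical rather than loan50 values, `box_summary` is a helper name introduced here, and the 1.5 x IQR fence is the common box plot convention, which may differ from the rule used to draw Figure 3.

```python
import statistics

def box_summary(values):
    """Return the median, IQR, and 1.5*IQR outliers for one group."""
    # "inclusive" treats the data as the full population of interest;
    # other quartile conventions give slightly different cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"median": median, "iqr": iqr, "outliers": outliers}

# Hypothetical annual incomes by homeownership group; NOT loan50 data.
incomes_by_group = {
    "rent": [30000, 40000, 50000, 60000, 70000, 200000],
    "mortgage": [80000, 85000, 90000, 95000, 100000],
    "own": [50000, 55000, 60000, 65000, 70000],
}

summaries = {group: box_summary(v) for group, v in incomes_by_group.items()}
for group, summary in summaries.items():
    print(group, summary)
```

Comparing the `median` and `iqr` entries across groups corresponds to comparing the box centers and box heights in the figure.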
From the scatter plot in Figure 4, we can see a general positive relationship
between annual income and loan amount. As annual income increases, there is a
tendency for the loan amount to also increase. This suggests that individuals with
slightly lower than the mean ($82,276), suggesting a slightly right-skewed distribution
of incomes (Figure 1). This skewness indicates that there may be a few high-income
wide dispersion of income values around the mean. The large standard deviation
suggests significant variability in income levels within the dataset, with some
implying that a significant number of individuals in the sample own homes with
mortgage loans (Figure 2). This finding aligns with the expectation that
The slightly right-skewed distribution of loan amounts also indicates that there
may be a few higher loan amount outliers or a group of individuals with relatively
larger loans. The standard deviation for loan amount ($8,195.35) indicates
considerable variation in loan amounts around the mean. This suggests that there is a
range of loan sizes within the dataset, with some individuals borrowing substantially
the highest median income and renters having the lowest. Annual income
is also more evenly distributed for homeowners than for renters or mortgage
holders. This suggests that there is less variation in income levels among homeowners
than among renters or mortgage holders. The presence of outliers in the box plots for
renters and mortgage holders suggests that there are some people in these groups with
very high or very low incomes. This could be due to a number of factors, such as
Figure 4 shows a positive correlation between annual income and loan amount:
borrowers with higher incomes tend to take larger loans. This relationship is consistent with
borrowing larger sums. The majority of data points appear to be concentrated in the
lower range of both annual income and loan amount. This indicates that the dataset
loan amounts. There are a few outliers in the scatter plot, represented by data points
that deviate significantly from the general pattern. The scatter plot reveals that the
data points are spread out rather than forming a tightly clustered pattern, suggesting
variation in loan amounts even within similar income levels, indicating the influence
of other factors in determining loan amounts.
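The visual impression of a positive but loosely clustered relationship can be quantified with the Pearson correlation coefficient. The sketch below is one way to compute it by hand; the paired values are hypothetical and not drawn from loan50, and `pearson_r` is a name introduced here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (annual income, loan amount) pairs in dollars; NOT loan50 data.
annual_income = [40000, 55000, 70000, 85000, 100000]
loan_amount = [8000, 20000, 12000, 25000, 18000]

r = pearson_r(annual_income, loan_amount)
print(f"r = {r:.2f}")
```

A value of r between 0 and 1 that is well below 1 matches the description above: loan amounts tend to rise with income, yet they vary considerably at similar income levels.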
This study provides initial insights into the relationship between homeownership,
annual income, and loan amounts. However, it is important to acknowledge that the
conclusions are based on a relatively small dataset with only 50 samples. Therefore,
broader contexts. Further research and analysis are warranted to explore additional
factors that may influence loan decisions, such as credit scores, employment status, or
loan terms. Additionally, studying a larger dataset and conducting more sophisticated
References
Agarwal, S., I. Ben-David and V.W. Yao. 2015. Collateral Valuation and Borrower
Financial Constraints: Evidence from the Residential Real Estate Market.
Management Science 61: 2220–2240.