Project 1
Introduction:
In this project, you will take on the role of data analysts working for a real estate agency.
The agency is interested in gaining a deeper understanding of the factors that influence
housing prices within a specific city. Your analysis will contribute to more informed
property valuation and investment decisions.
Objective:
Dataset Description:
You will be provided with a dataset containing information on houses within the city.
The dataset includes the following variables:
1. House Price: The price of the house. This is the dependent variable you'll be
analyzing.
2. Square Footage: The size of the house in square feet.
3. Number of Bedrooms: The number of bedrooms in the house.
4. Number of Bathrooms: The number of bathrooms in the house.
5. Neighborhood Quality: A rating of the neighborhood where the house is
located.
6. Year Built: The year the house was constructed.
7. Garage Presence: A binary variable indicating the presence or absence of a
garage.
8. Garage Size: The size or capacity of the garage.
9. Backyard Size: The size of the backyard in square feet.
10. School Rating: A rating of the nearest school's quality.
11. Distance to Work: The distance from the house to a common workplace
location.
12. Crime Rate: A rating of the crime rate in the neighborhood.
Project Tasks:
1. Data Exploration:
Begin by exploring the dataset to understand its structure and key
statistics.
Identify any missing values and decide on strategies for handling them.
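A first pass at this step might look like the sketch below. It is only an
illustration: the data are synthetic, and the column names (square_footage,
bedrooms, garage_size, house_price) are assumptions, not the names used in the
provided dataset. Median imputation and row deletion are shown as two possible
strategies, not as the required ones.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the project dataset; column names are assumed.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "square_footage": rng.normal(1800, 400, n),
    "bedrooms": rng.integers(1, 6, n).astype(float),
    "garage_size": rng.integers(0, 4, n).astype(float),
})
df["house_price"] = 50_000 + 120 * df["square_footage"] + rng.normal(0, 20_000, n)

# Knock out a few values to imitate missing data.
df.loc[[3, 17, 42], "square_footage"] = np.nan
df.loc[[5, 8], "garage_size"] = np.nan

# Structure and key statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Strategy A: fill each numeric column with its median.
imputed = df.fillna(df.median(numeric_only=True))

# Strategy B: drop any row that has a missing value.
dropped = df.dropna()
print(len(df), len(dropped))
```

Which strategy is appropriate depends on how many values are missing and on
whether the missingness is related to the price.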
2. Correlation Analysis:
Conduct a correlation analysis to determine which independent variables
are strongly correlated with house prices. Identify at least three
independent variables that exhibit a significant correlation with house
prices.
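The ranking step can be sketched as follows. The data are synthetic and the
column names are assumptions; in this made-up example backyard_size is generated
to be unrelated to price, which need not hold in the real dataset. The t-statistic
uses the standard formula t = r * sqrt((n - 2) / (1 - r^2)) for a Pearson
correlation.

```python
import numpy as np
import pandas as pd

# Synthetic data; column names and coefficients are assumptions.
rng = np.random.default_rng(1)
n = 500
sqft = rng.normal(1800, 400, n)
school = rng.uniform(1, 10, n)
distance = rng.uniform(1, 30, n)
backyard = rng.normal(500, 100, n)  # unrelated to price in this toy example
price = (40_000 + 110 * sqft + 8_000 * school - 2_500 * distance
         + rng.normal(0, 15_000, n))

df = pd.DataFrame({
    "house_price": price,
    "square_footage": sqft,
    "school_rating": school,
    "distance_to_work": distance,
    "backyard_size": backyard,
})

# Pearson correlation of every predictor with the price.
corr_with_price = df.corr()["house_price"].drop("house_price")

# Rank by absolute value so strong negative correlations are not missed.
order = corr_with_price.abs().sort_values(ascending=False).index
ranked = corr_with_price.loc[order]

# t-statistic for H0: correlation = 0.
t_stats = ranked * np.sqrt((n - 2) / (1 - ranked ** 2))

top_three = list(ranked.index[:3])
print(ranked)
print(t_stats)
print(top_three)
```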
3. Regression Analysis:
Build a multiple linear regression model with house price as the
dependent variable.
Include the three independent variables identified in the correlation
analysis as predictors.
Analyze the coefficients and significance of the independent variables in
predicting house prices.
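A minimal sketch of this step using ordinary least squares written out with
NumPy is shown below. The data are synthetic, and the choice of three predictors
and their true coefficients are assumptions made for illustration. A statistics
package would report the same quantities; the explicit version is used here only
to show where the standard errors and t-statistics come from.

```python
import numpy as np

def ols(X, y):
    """OLS for y = X b + e: returns coefficients, standard errors, t-statistics."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # unbiased error variance
    se = np.sqrt(np.diag(sigma2 * xtx_inv))    # classical standard errors
    return beta, se, beta / se

# Synthetic data; the three predictors are assumed to be the selected ones.
rng = np.random.default_rng(2)
n = 500
sqft = rng.normal(1800, 400, n)
school = rng.uniform(1, 10, n)
distance = rng.uniform(1, 30, n)
price = (40_000 + 100 * sqft + 7_000 * school - 2_000 * distance
         + rng.normal(0, 15_000, n))

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), sqft, school, distance])
beta, se, t_stats = ols(X, price)

for name, b, s, t in zip(["intercept", "sqft", "school", "distance"],
                         beta, se, t_stats):
    print(f"{name:>10}: coef={b:12.2f}  se={s:10.2f}  t={t:8.2f}")
```

As a rule of thumb, an absolute t-statistic above about 2 corresponds to
significance at roughly the 5% level in a sample of this size.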
4. Interpretation:
Interpret the coefficients of the independent variables to understand their
impact on house prices.
Provide insights into which factors contribute most significantly to housing
price variations.
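One possible way to compare the contribution of predictors measured in different
units is to look at standardized coefficients, beta_j * sd(x_j) / sd(y). The
sketch below uses synthetic data with assumed variable names; it is one approach
among several, not a prescribed method.

```python
import numpy as np

# Synthetic data; names and coefficients are assumptions.
rng = np.random.default_rng(4)
n = 500
sqft = rng.normal(1800, 400, n)
school = rng.uniform(1, 10, n)
price = 40_000 + 110 * sqft + 8_000 * school + rng.normal(0, 15_000, n)

X = np.column_stack([np.ones(n), sqft, school])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

# Raw coefficients: change in price per one-unit change in the predictor.
print("per square foot:", beta[1])
print("per school-rating point:", beta[2])

# Standardized coefficients: change in price (in standard deviations)
# per one-standard-deviation change in the predictor.
std_beta = beta[1:] * np.array([sqft.std(), school.std()]) / price.std()
print("standardized:", std_beta)
```

In this toy example the raw coefficient on the school rating is far larger than
the one on square footage, yet the standardized coefficients rank square footage
first, because the two predictors are measured on very different scales.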
5. Discussion on Omitted Variable Bias (OVB):
Explore the concept of omitted variable bias (OVB) in the context of the
dataset. What happens if we omit the garage size variable from the
analysis, given its strong correlation with garage presence?
Discuss how an omitted variable could affect the coefficients and
inferences drawn from the model.
Generate an artificial instrumental variable that is highly correlated with
the omitted variable, and use it as an instrument to address the omitted
variable bias.
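The effect of omitting garage size can be illustrated with the sketch below,
which covers only the bias itself and not the instrumental-variable step. The
data are synthetic and the coefficients are assumptions. It checks the standard
OVB identity: the coefficient on garage presence in the short regression equals
its coefficient in the full regression plus (coefficient on garage size) times
(slope from regressing garage size on garage presence).

```python
import numpy as np

# Synthetic data; coefficients are assumptions for illustration.
rng = np.random.default_rng(3)
n = 1000
garage_presence = rng.integers(0, 2, n).astype(float)
# Garage size (car capacity): 0 without a garage, 1 to 3 with one.
garage_size = garage_presence * rng.integers(1, 4, n)
price = (200_000 + 10_000 * garage_presence + 15_000 * garage_size
         + rng.normal(0, 20_000, n))

def fit(columns, y):
    """OLS coefficients (intercept first) of y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

full = fit([garage_presence, garage_size], price)   # both variables included
short = fit([garage_presence], price)               # garage size omitted
aux = fit([garage_presence], garage_size)           # omitted on included

# OVB identity: short coefficient = full coefficient + bias term.
implied = full[1] + full[2] * aux[1]

print("garage presence, full model: ", full[1])
print("garage presence, short model:", short[1])
print("implied by OVB formula:      ", implied)
```

In this example the short regression attributes the effect of garage size to
garage presence, so its coefficient is much larger than the value used to
generate the data.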
6. Heteroskedasticity Problem:
Neighborhood Quality and Crime Rate are two strongly correlated
variables.
Discuss how this strong correlation affects inference, particularly when
the sample is small.
You can restrict yourself to a very small portion of the sample to describe
the heteroskedasticity problem.
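The effect of two strongly correlated predictors on standard errors (often
discussed under the name multicollinearity) can be sketched as below. The data
are synthetic; the correlation of -0.98 between the two ratings, the sample
sizes, and the coefficients are all assumptions. The variance inflation factor
for a two-predictor model is 1 / (1 - r^2), where r is the sample correlation
between the predictors.

```python
import numpy as np

def ols_se(X, y):
    """OLS coefficients and classical standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, np.sqrt(np.diag(sigma2 * xtx_inv))

def simulate(n, rho, rng):
    """Draw n houses whose two ratings have population correlation rho."""
    quality = rng.normal(0, 1, n)
    crime = rho * quality + np.sqrt(1 - rho ** 2) * rng.normal(0, 1, n)
    price = 300 + 20 * quality - 15 * crime + rng.normal(0, 30, n)
    X = np.column_stack([np.ones(n), quality, crime])
    beta, se = ols_se(X, price)
    r = np.corrcoef(quality, crime)[0, 1]
    vif = 1 / (1 - r ** 2)   # variance inflation factor
    return beta, se, vif

rng = np.random.default_rng(6)
_, se_small_corr, vif_small_corr = simulate(30, -0.98, rng)
_, se_small_uncorr, vif_small_uncorr = simulate(30, 0.0, rng)
_, se_large_corr, vif_large_corr = simulate(1000, -0.98, rng)

print("n=30,   correlated:   se =", se_small_corr[1:], "VIF =", vif_small_corr)
print("n=30,   uncorrelated: se =", se_small_uncorr[1:], "VIF =", vif_small_uncorr)
print("n=1000, correlated:   se =", se_large_corr[1:], "VIF =", vif_large_corr)
```

With correlated predictors and only 30 observations, the standard errors are
large relative to the coefficients, so individual effects are hard to
distinguish even when the predictors jointly matter.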
7. Inclusion of Polynomial Terms:
Include squares, cubes, and interactions of the predictors, and discuss how
including these terms affects the results.
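One way to set up this comparison is sketched below, with synthetic data in which
the true relationship contains a squared term and an interaction. The variable
names, units (thousands of square feet, thousands of dollars), and coefficients
are assumptions. Adjusted R-squared is reported alongside R-squared because the
latter can never decrease when terms are added to the model.

```python
import numpy as np

# Synthetic data; units and coefficients are assumptions.
rng = np.random.default_rng(5)
n = 400
sqft = rng.uniform(0.8, 3.5, n)    # thousands of square feet
school = rng.uniform(1, 10, n)     # school rating
price = (100 + 80 * sqft - 10 * sqft ** 2 + 3 * sqft * school
         + rng.normal(0, 10, n))   # thousands of dollars

def fit_stats(X, y):
    """Return (R-squared, adjusted R-squared) for an OLS fit of y on X."""
    n_obs, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k)
    return r2, adj_r2

ones = np.ones(n)

# Baseline: linear terms only.
X_linear = np.column_stack([ones, sqft, school])

# Extended: squares, cubes, and the interaction.
X_poly = np.column_stack([
    ones, sqft, school,
    sqft ** 2, sqft ** 3,
    school ** 2, school ** 3,
    sqft * school,
])

r2_linear, adj_linear = fit_stats(X_linear, price)
r2_poly, adj_poly = fit_stats(X_poly, price)

print(f"linear:     R2={r2_linear:.4f}  adjusted R2={adj_linear:.4f}")
print(f"polynomial: R2={r2_poly:.4f}  adjusted R2={adj_poly:.4f}")
```

Higher-order terms are strongly correlated with the original variables, so their
individual coefficients may have large standard errors even when the overall fit
improves.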
8. Conclusions and Recommendations:
Summarize the findings of your analysis, including key factors influencing
housing prices.
Provide recommendations to the real estate agency based on your
analysis.
Presentation:
At the end of the project, you will be required to present your findings and insights to
the real estate agency. You will explain the methodology, results, and recommendations
based on your analysis.