HousePricePrediction Zillow Solution Methodology
Business Objective
Buying a house that suits their preferences is every person's desire, which is why it is called a dream house. There are several aspects to consider when buying a house, starting from the budget, the location, the number of rooms available, and many more. But how does one find a house that satisfies all of these requirements? This is not a quick and easy task.
Fortunately, homebuyers can nowadays find their dream home with the click of a button. Zillow, considered one of the top real estate marketplaces for buying a house in the United States, offers a popular online home-valuation tool called the Zestimate. It allows homebuyers to search for a house that satisfies their requirements for location, area, budget, and so on.
The Zillow Zestimate gives homebuyers an estimate of the real worth of a house based on public data. The accuracy of the Zestimate depends on the location and on how much data is available for a specific area: the more data available, the more accurate the Zestimate.
The purpose of this project is to build a machine learning model that predicts future home sale prices as accurately as possible.
Data Description
The Zillow Zestimate dataset comes from Kaggle and is used to make future sales predictions and to improve the log error.
There are two datasets available –
• train_2016 – Contains the target variable, i.e. the logerror
• properties_2016 – Contains all the features related to the property/home
There are around 60 attributes in the dataset on the basis of which the model can be built; the two files are joined on parcelid (a short merge sketch follows the feature list below).
Some of the important features amongst them are as follows:
• train_2016:
1. parcelid - Unique identification for every parcel
2. transactiondate - The date on which the home was sold
3. logerror - The residual between the actual and the predicted sale price of the home
(the target variable)
• properties_2016:
1. parcelid - Unique identification for every parcel (common in both the datasets)
2. bathroomcnt - Total number of bathrooms in the house
3. bedroomcnt - Total number of bedrooms in the house
4. buildingqualitytypeid - This gives the quality assessment of the building ranging from
best to worst.
5. finishedsquarefeet12 - Finished living area of the house.
6. fips - This stands for Federal Information Processing Standards code (for more detail,
see: https://en.wikipedia.org/wiki/FIPS_county_code)
7. latitude & longitude - Latitude and longitude of the home location
8. propertylandusetypeid - This gives the type of land the property is zoned for.
9. regionidcity - The city in which the property is situated.
10. regionidcounty - The county where the property is situated.
11. regionidzip - This provides the zip code for the location of the property.
12. taxamount - Property tax for each assessment year
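The two files are linked through parcelid. A minimal sketch of loading and merging them with pandas (the file names are assumptions; the Kaggle download may use slightly different names, e.g. train_2016_v2.csv):

```python
import pandas as pd

# Load the transaction/target file and the property-features file
# (file names assumed; adjust to match the downloaded Kaggle files).
train = pd.read_csv("train_2016.csv", parse_dates=["transactiondate"])
properties = pd.read_csv("properties_2016.csv", low_memory=False)

# parcelid is common to both datasets, so a left merge keeps one row per
# recorded transaction with its property attributes attached.
df = train.merge(properties, on="parcelid", how="left")
print(df.shape)
```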
Aim
To predict the sale prices of houses and improve the log error, i.e. the error due to the
difference between the actual and the predicted home values.
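For reference, in the underlying Kaggle competition the target is defined on a log scale, i.e. logerror = log(Zestimate) - log(actual sale price), so improving the model means driving this residual toward zero.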
Tech stack
➢ Language - Python
➢ Libraries - Scikit-learn, pandas, numpy, matplotlib, seaborn, scipy, xgboost, joblib
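For reference, this stack typically translates into imports along the following lines (a sketch, not the project's exact import list):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import xgboost as xgb
import joblib
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
```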
Approach
1. Importing the required libraries and reading the dataset.
▪ Merging of the two datasets
▪ Understanding the dataset
2. Exploratory Data Analysis (EDA) –
▪ Data Visualization
3. Feature Engineering (see the first sketch after this list)
▪ Duplicate value removal
▪ Missing value imputation
▪ Rescaling of incorrectly scaled data
▪ Standardization
▪ Encoding of categorical variables
▪ Generation of new features wherever required.
▪ Dropping of redundant feature columns
▪ Checking for multi-collinearity and removal of highly correlated features
▪ Checking for outliers and removing them where appropriate.
4. Model Building (see the second sketch after this list)
▪ Performing train test split
▪ Feature Scaling
▪ Dropping features if necessary
▪ Linear Regression Model
▪ Elastic Net
▪ Ridge Regression
▪ Lasso Regressor
▪ XGBoost Regressor
▪ Adaboost Regressor
▪ Gradient Boosting Regressor
▪ Decision Tree Regressor
▪ Random Forest Regressor
5. Model Validation
▪ Mean Absolute Error
6. Hyperparameter Tuning (GridSearchCV) – see the third sketch after this list
▪ For Random Forest Regressor
7. Checking for Feature Importance
8. Creating the final model and making predictions
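The three sketches below illustrate steps 3 through 8. Column names are taken from the data description above; thresholds, model settings, parameter grids, and file paths are assumptions rather than the project's exact choices. First, a feature-engineering sketch (step 3) applied to the merged DataFrame df from the earlier sketch:

```python
import numpy as np
import pandas as pd

def engineer_features(df, corr_threshold=0.95):
    """Illustrative cleaning steps; thresholds are assumptions."""
    df = df.drop_duplicates()

    # Drop columns that are almost entirely missing (redundant features).
    df = df.loc[:, df.isnull().mean() < 0.9]

    # Missing-value imputation: median for numeric columns, mode otherwise.
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        elif not df[col].mode().empty:
            df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Rescaling: in the raw Zillow files, latitude/longitude are degrees * 1e6.
    for col in ("latitude", "longitude"):
        if col in df.columns:
            df[col] = df[col] / 1e6

    # Simple outlier treatment on the target: clip extreme logerror values.
    if "logerror" in df.columns:
        low, high = df["logerror"].quantile([0.01, 0.99])
        df["logerror"] = df["logerror"].clip(low, high)

    # Encode remaining categorical (object-typed) columns with one-hot encoding.
    obj_cols = df.select_dtypes(include="object").columns
    df = pd.get_dummies(df, columns=obj_cols, drop_first=True)

    # Multicollinearity check: drop one of each pair of highly correlated features.
    corr = df.select_dtypes(include=np.number).corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return df.drop(columns=[c for c in to_drop if c != "logerror"])
```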
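Next, a sketch of the train/test split, feature scaling, baseline model fitting, and model validation with mean absolute error (steps 4 and 5):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (RandomForestRegressor, AdaBoostRegressor,
                              GradientBoostingRegressor)
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor

clean = engineer_features(df)
X = clean.drop(columns=["logerror", "parcelid", "transactiondate"], errors="ignore")
y = clean["logerror"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Standardize the features; tree-based models do not strictly need this, but it
# keeps the comparison uniform and matters for the linear/regularized models.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "Elastic Net": ElasticNet(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "XGBoost": XGBRegressor(n_estimators=100, random_state=42),
}

# Fit each candidate model and compare them on mean absolute error.
for name, model in models.items():
    model.fit(X_train_s, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test_s))
    print(f"{name}: MAE = {mae:.5f}")
```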
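Finally, hyperparameter tuning of the Random Forest Regressor with GridSearchCV, a look at feature importance, and saving the tuned model with joblib (steps 6 to 8). The parameter grid and the output file name are assumptions:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the project's actual search space may differ.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
    n_jobs=-1,
)
search.fit(X_train_s, y_train)
print("Best parameters:", search.best_params_)

# Feature importance of the best estimator, mapped back to the column names.
best_model = search.best_estimator_
importances = pd.Series(best_model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))

# Persist the tuned model so it can be reloaded later without retraining
# (the file name/path is an assumption).
joblib.dump(best_model, "output/best_model.pkl")
```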
Project Structure
2. src folder - This is the most important folder of the project. It contains the
modularized code for all the steps described above. This folder consists of:
• engine.py
• ML_Pipeline
The ML_Pipeline is a folder that contains all the functions, split into appropriately named
Python files. These functions are then called inside the engine.py file.
3. output folder – The output folder contains the best-fitted model that we trained on
this data. This model can be easily loaded and reused later (a short loading sketch
follows this list), so the user does not have to train all the models from scratch.
Note: This model is built on a subset of the data. A model for the entire dataset can be
obtained by running engine.py with the full data used for training.
4. lib folder - This is a reference folder. It contains the original IPython notebook that we
saw in the videos.
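A minimal loading sketch (the saved file name is an assumption; use whatever name engine.py writes to the output folder):

```python
import joblib

# Reload the persisted model and predict on new, already-preprocessed data.
model = joblib.load("output/best_model.pkl")
predictions = model.predict(X_test_s)
```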
Project Takeaways
1. Understanding the business problem.
2. Importing the dataset and required libraries.
3. Performing basic Exploratory Data Analysis (EDA).
4. Data cleaning and handling of missing data, where required, using appropriate methods.
5. Checking data distribution using statistical techniques.
6. Checking for outliers and deciding how they need to be treated depending on the chosen model.
7. Using python libraries such as matplotlib and seaborn for data interpretation and
advanced visualizations.
8. Splitting the dataset into train and test sets using various sampling techniques.
9. Performing Feature Engineering on sample data for better performance.
10. Training a model using Regression techniques like Linear Regression, Random Forest
Regressor, XGBoost Regressor, etc.
11. Training multiple models using different Machine Learning Algorithms suitable for the
scenario and checking for best performance.
12. Understanding the importance of feature scaling and applying it where required.
13. Performing cross-validation to check whether the model is overfitting and whether results
stay reasonably consistent (see the sketch at the end of this section).
14. Tuning hyper-parameters of models to achieve optimal performance.
15. Making predictions using the trained model.
16. Gaining confidence in the model using metrics such as MAE, MSE, RMSE.
17. Identifying which features contribute the most predictive power using feature importance.
18. Understanding how the target variable depends on the values of the input features.
19. Selecting the best model based on performance metrics and hyper-parameter optimization.
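As a companion to takeaway 13, a sketch of k-fold cross-validation: it reports the MAE across folds so one can check whether results stay consistent instead of relying on a single train/test split (the fold count and model settings are illustrative):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# X and y are the feature matrix and logerror target from the earlier sketches.
cv_mae = -cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=42),
    X, y,
    scoring="neg_mean_absolute_error",
    cv=5,
)
print("MAE per fold:", cv_mae.round(5))
print("Mean MAE: %.5f | Std: %.5f" % (cv_mae.mean(), cv_mae.std()))
```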