House Price Prediction

This document walks through predicting house prices with machine learning models. It preprocesses the data (dropping unneeded columns, checking for missing values, simple feature scaling, and deriving age and renovated features from the year columns) and then compares polynomial linear regression, ridge regression, and random forest regression.

Uploaded by

Vinay Kumar

7/8/23, 4:15 PM house price

House Price Prediction

In [ ]: #Importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium.plugins import FastMarkerCluster
from sklearn import preprocessing
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Ridge

In [ ]: # Importing the dataset

data = pd.read_csv('https://raw.githubusercontent.com/rashida048/Datasets/master
data.head()

Out[ ]:
           id             date   price  bedrooms  bathrooms  sqft_living  sqft_lot  ...
0  7129300520  20141013T000000  221900         3       1.00         1180      5650  ...
1  6414100192  20141209T000000  538000         3       2.25         2570      7242  ...
2  5631500400  20150225T000000  180000         2       1.00          770     10000  ...
3  2487200875  20141209T000000  604000         4       3.00         1960      5000  ...
4  1954400510  20150218T000000  510000         3       2.00         1680      8080  ...

5 rows × 21 columns

In [ ]: # dropping the unnecessary columns id and date
data.drop(['id','date'],axis=1,inplace=True)
data.head()

file:///E:/Data Science Course/Projects/house price.html 1/13

Out[ ]:
    price  bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  ...
0  221900         3       1.00         1180      5650     1.0           0     0  ...
1  538000         3       2.25         2570      7242     2.0           0     0  ...
2  180000         2       1.00          770     10000     1.0           0     0  ...
3  604000         4       3.00         1960      5000     1.0           0     0  ...
4  510000         3       2.00         1680      8080     1.0           0     0  ...

In [ ]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 21613 non-null int64
1 bedrooms 21613 non-null int64
2 bathrooms 21613 non-null float64
3 sqft_living 21613 non-null int64
4 sqft_lot 21613 non-null int64
5 floors 21613 non-null float64
6 waterfront 21613 non-null int64
7 view 21613 non-null int64
8 condition 21613 non-null int64
9 grade 21613 non-null int64
10 sqft_above 21613 non-null int64
11 sqft_basement 21613 non-null int64
12 yr_built 21613 non-null int64
13 yr_renovated 21613 non-null int64
14 zipcode 21613 non-null int64
15 lat 21613 non-null float64
16 long 21613 non-null float64
17 sqft_living15 21613 non-null int64
18 sqft_lot15 21613 non-null int64
dtypes: float64(4), int64(15)
memory usage: 3.1 MB

In [ ]: data.describe()

Out[ ]:
              price      bedrooms     bathrooms   sqft_living      sqft_lot     floors  ...
count  2.161300e+04  21613.000000  21613.000000  21613.000000  2.161300e+04  21613.000  ...
mean   5.400881e+05      3.370842      2.114757   2079.899736  1.510697e+04      1.494  ...
std    3.671272e+05      0.930062      0.770163    918.440897  4.142051e+04      0.539  ...
min    7.500000e+04      0.000000      0.000000    290.000000  5.200000e+02      1.000  ...
25%    3.219500e+05      3.000000      1.750000   1427.000000  5.040000e+03      1.000  ...
50%    4.500000e+05      3.000000      2.250000   1910.000000  7.618000e+03      1.500  ...
75%    6.450000e+05      4.000000      2.500000   2550.000000  1.068800e+04      2.000  ...
max    7.700000e+06     33.000000      8.000000  13540.000000  1.651359e+06      3.500  ...

In [ ]: # checking for null values/missing values

data.isnull().sum()

Out[ ]: price 0
bedrooms 0
bathrooms 0
sqft_living 0
sqft_lot 0
floors 0
waterfront 0
view 0
condition 0
grade 0
sqft_above 0
sqft_basement 0
yr_built 0
yr_renovated 0
zipcode 0
lat 0
long 0
sqft_living15 0
sqft_lot15 0
dtype: int64

In [ ]: data.nunique()

Out[ ]: price 4032

bedrooms 13
bathrooms 30
sqft_living 1038
sqft_lot 9782
floors 6
waterfront 2
view 5
condition 5
grade 12
sqft_above 946
sqft_basement 306
yr_built 116
yr_renovated 70
zipcode 70
lat 5034
long 752
sqft_living15 777
sqft_lot15 8689
dtype: int64

Data Preprocessing
In [ ]: # changing float to integer
data['bathrooms'] = data['bathrooms'].astype(int)
data['floors'] = data['floors'].astype(int)
# renaming the column yr_built to age and changing the values to age
data.rename(columns={'yr_built':'age'},inplace=True)
data['age'] = 2023 - data['age']
# renaming the column yr_renovated to renovated and changing the values to 0 (never renovated) and 1 (renovated)
data.rename(columns={'yr_renovated':'renovated'},inplace=True)
data['renovated'] = data['renovated'].apply(lambda x: 0 if x == 0 else 1)
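
The effect of these transformations is easy to check on a small frame. The sketch below uses three made-up rows (they are not taken from the dataset) and applies the same steps. One thing it makes visible is that `astype(int)` truncates rather than rounds, so 2.75 bathrooms becomes 2.

```python
import pandas as pd

# Three made-up rows for illustration only (not rows from the dataset)
toy = pd.DataFrame({
    'bathrooms': [1.00, 2.25, 2.75],
    'floors': [1.0, 1.5, 2.0],
    'yr_built': [1955, 1990, 2014],
    'yr_renovated': [0, 2005, 0],
})

# astype(int) truncates toward zero: 2.75 -> 2 and 1.5 -> 1
toy['bathrooms'] = toy['bathrooms'].astype(int)
toy['floors'] = toy['floors'].astype(int)

# yr_built -> age, relative to the reference year 2023 used in the notebook
toy = toy.rename(columns={'yr_built': 'age'})
toy['age'] = 2023 - toy['age']

# yr_renovated -> renovated flag (0 = never renovated, 1 = renovated)
toy = toy.rename(columns={'yr_renovated': 'renovated'})
toy['renovated'] = (toy['renovated'] != 0).astype(int)

print(toy)
```

If the fractional bathroom and floor counts matter for the model, leaving those two columns as floats would be the alternative.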

In [ ]: # using simple feature scaling

data['sqft_living'] = data['sqft_living']/data['sqft_living'].max()
data['sqft_living15'] = data['sqft_living15']/data['sqft_living15'].max()
data['sqft_lot'] = data['sqft_lot']/data['sqft_lot'].max()
data['sqft_above'] = data['sqft_above']/data['sqft_above'].max()
data['sqft_basement'] = data['sqft_basement']/data['sqft_basement'].max()
data['sqft_lot15'] = data['sqft_lot15']/data['sqft_lot15'].max()
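
Dividing by the column maximum maps each non-negative column into the range 0 to 1. A possible variation, sketched below, stores the maxima so that exactly the same factors can be applied later to unseen rows. The helper names `fit_max_scaler` and `apply_max_scaler` are hypothetical, and the training rows are just the first three rows of the `head()` output above.

```python
import pandas as pd

def fit_max_scaler(frame, columns):
    """Return the per-column maxima to use as scaling factors (hypothetical helper)."""
    return frame[columns].max()

def apply_max_scaler(frame, maxima):
    """Divide each listed column by its stored maximum; other columns are left untouched."""
    scaled = frame.copy()
    for column, maximum in maxima.items():
        scaled[column] = scaled[column] / maximum
    return scaled

train = pd.DataFrame({
    'sqft_living': [1180, 2570, 770],
    'sqft_lot': [5650, 7242, 10000],
    'bedrooms': [3, 3, 2],
})
new_rows = pd.DataFrame({'sqft_living': [2000], 'sqft_lot': [10000], 'bedrooms': [3]})

# Learn the factors once, then reuse them for any later data
maxima = fit_max_scaler(train, ['sqft_living', 'sqft_lot'])
train_scaled = apply_max_scaler(train, maxima)
new_scaled = apply_max_scaler(new_rows, maxima)

print(train_scaled)
print(new_scaled)
```

Keeping the maxima around also makes it possible to compute them on the training split only, which avoids using test rows when choosing the scale.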

In [ ]: data.head()

Out[ ]:
    price  bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  ...
0  221900         3          1     0.087149  0.003421       1           0     0  ...
1  538000         3          2     0.189808  0.004385       2           0     0  ...
2  180000         2          1     0.056869  0.006056       1           0     0  ...
3  604000         4          3     0.144756  0.003028       1           0     0  ...
4  510000         3          2     0.124077  0.004893       1           0     0  ...

Exploratory Data Analysis

Correlation Matrix to find the relationship between the variables

In [ ]: # using correlation statistical method to find the relation between the price an
data.corr()['price'].sort_values(ascending=False)

Out[ ]: price 1.000000

sqft_living 0.702035
grade 0.667434
sqft_above 0.605567
sqft_living15 0.585379
bathrooms 0.510072
view 0.397293
sqft_basement 0.323816
bedrooms 0.308350
lat 0.307003
waterfront 0.266369
floors 0.237211
renovated 0.126092
sqft_lot 0.089661
sqft_lot15 0.082447
condition 0.036362
long 0.021626
zipcode -0.053203
age -0.054012
Name: price, dtype: float64
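
The values above are Pearson correlation coefficients, which is what `DataFrame.corr()` computes by default. As a rough illustration, the sketch below recomputes the coefficient by hand for the five rows shown in the first `head()` output and compares it with pandas. The helper `pearson_r` is a hypothetical name, and five rows are far too few to reproduce the 0.70 figure reported for the full dataset.

```python
import numpy as np
import pandas as pd

# The five rows from the first head() output (raw, unscaled values)
sample = pd.DataFrame({
    'price': [221900, 538000, 180000, 604000, 510000],
    'sqft_living': [1180, 2570, 770, 1960, 1680],
})

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

r_manual = pearson_r(sample['sqft_living'], sample['price'])
r_pandas = sample['sqft_living'].corr(sample['price'])

print(r_manual, r_pandas)
```

Because the coefficient is invariant to rescaling a column by a positive constant, the max-scaling applied earlier does not change these correlations.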

In [ ]: plt.figure(figsize=(20,20))
sns.heatmap(data.corr(),annot=True)
plt.show()

Visualizing the correlation with price

In [ ]: data.corr()['price'].drop('price').sort_values().plot(kind='bar')

Out[ ]: <Axes: >

Visualizing the data

In [ ]: # visualizing the relation between price and sqft_living, sqft_lot, sqft_above,

fig, ax = plt.subplots(4,4,figsize=(20,20))
sns.scatterplot( x = data['sqft_living'], y = data['price'],ax=ax[0,0])
sns.scatterplot( x = data['sqft_lot'], y = data['price'],ax=ax[0,1])
sns.scatterplot( x = data['sqft_above'], y = data['price'],ax=ax[0,2])
sns.scatterplot( x = data['sqft_basement'], y = data['price'],ax=ax[0,3])
sns.scatterplot( x = data['sqft_living15'], y = data['price'],ax=ax[1,0])
sns.scatterplot( x = data['sqft_lot15'], y = data['price'],ax=ax[1,1])
sns.lineplot( x = data['age'], y = data['price'],ax=ax[1,2])
sns.boxplot( x = data['renovated'], y = data['price'],ax=ax[1,3])
sns.scatterplot( x = data['bedrooms'], y = data['price'],ax=ax[2,0])
sns.lineplot( x = data['bathrooms'], y = data['price'],ax=ax[2,1])
sns.barplot( x = data['floors'], y = data['price'],ax=ax[2,2])
sns.boxplot( x = data['waterfront'], y = data['price'],ax=ax[2,3])
sns.barplot( x = data['view'], y = data['price'],ax=ax[3,0])
sns.barplot( x = data['condition'], y = data['price'],ax=ax[3,1])
sns.lineplot( x = data['grade'], y = data['price'],ax=ax[3,2])
sns.lineplot( x = data['age'], y = data['renovated'],ax=ax[3,3])
plt.show()

Plotting the location of the houses based on longitude and latitude on the map

In [ ]: # adding a new column price_range and categorizing the price into 4 categories
data['price_range'] = pd.cut(data['price'],bins=[0,321950,450000,645000,1295648]
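
The bin list in the cell above is cut off in this export, so the remaining arguments are unknown. The sketch below shows one way such a categorization could look, using the quartile values from `describe()` as the inner edges. The open-ended last edge (`np.inf`) and the label names are assumptions made for illustration, not the notebook's actual settings.

```python
import numpy as np
import pandas as pd

# A few prices from the tables above, plus the dataset maximum from describe()
prices = pd.Series([221900, 538000, 180000, 604000, 510000, 7700000])

# Inner edges are the 25%, 50% and 75% quantiles reported by describe();
# the open-ended final edge is an assumption so that no price falls outside the bins
bins = [0, 321950, 450000, 645000, np.inf]
labels = ['low', 'medium', 'high', 'very high']  # hypothetical label names

price_range = pd.cut(prices, bins=bins, labels=labels)
print(price_range)
```

With the default `right=True`, each bin includes its upper edge, and any value above a finite last edge would become `NaN`.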

In [ ]: map = folium.Map(location=[47.5480, -121.9836],zoom_start=8)

marker_cluster = FastMarkerCluster(data[['lat', 'long']].values.tolist()).add_to
map

Out[ ]: [Interactive folium map with clustered house markers; not rendered in this export. Map by Leaflet, data © OpenStreetMap under ODbL.]

Train/Test Split
In [ ]: data.drop(['price_range'],axis=1,inplace=True)
X_train, X_test, y_train, y_test = train_test_split(data.drop('price',axis=1),da
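
The split call above is truncated in this export, so its remaining arguments are not visible. A minimal sketch of the same pattern follows, run on a small synthetic frame. The values `test_size=0.2` and `random_state=42` are assumptions chosen for the example, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Small synthetic frame standing in for the real data
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'price': rng.integers(100_000, 900_000, size=10),
    'bedrooms': rng.integers(1, 6, size=10),
    'sqft_living': rng.random(10),
})

X = toy.drop('price', axis=1)  # features
y = toy['price']               # target

# test_size and random_state are assumed values for this illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, which matters when several models are compared on the same test rows.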

Model Training

Using pipeline to combine the transformers and estimators and fit the model
In [ ]: input = [('scale',StandardScaler()),('polynomial', PolynomialFeatures(degree=2))
pipe = Pipeline(input)
pipe

Out[ ]: ▸ Pipeline

▸ StandardScaler

▸ PolynomialFeatures

▸ LinearRegression
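
The step list in the cell above is cut off, but the rendered pipeline shows three stages: scaling, polynomial expansion, and linear regression. The sketch below builds the same three stages on synthetic data. The step name `'model'` and the data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, noiseless quadratic target so a degree-2 model can fit it exactly
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = 3.0 * X[:, 0] ** 2 - 2.0 * X[:, 0] * X[:, 1] + 1.0

steps = [
    ('scale', StandardScaler()),
    ('polynomial', PolynomialFeatures(degree=2)),
    ('model', LinearRegression()),  # step name is an assumption
]
demo_pipe = Pipeline(steps)

# fit() runs the transformers in order and then fits the final estimator
demo_pipe.fit(X[:150], y[:150])
demo_score = demo_pipe.score(X[150:], y[150:])
print(demo_score)
```

Because the scaler and the polynomial expansion are fitted inside the pipeline, they only ever see the training rows passed to `fit`.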

In [ ]: #training the model

pipe.fit(X_train,y_train)
pipe.score(X_test,y_test)

Out[ ]: 0.8271896429378042

In [ ]: #testing the model

pipe_pred = pipe.predict(X_test)
r2_score(y_test,pipe_pred)

Out[ ]: 0.8271896429378042
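
Both cells print the same number because `score` on a scikit-learn regressor returns the coefficient of determination, R², which is also what `r2_score` computes. The sketch below recomputes R² by hand; the predictions are made-up values used only to exercise the formula.

```python
import numpy as np
from sklearn.metrics import r2_score

# Actual prices from the head() output and made-up predictions for illustration
y_true = np.array([221900, 538000, 180000, 604000, 510000], dtype=float)
y_pred = np.array([250000, 500000, 200000, 580000, 530000], dtype=float)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean price.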

Ridge Regression
In [ ]: Ridgemodel = Ridge(alpha = 0.001)
Ridgemodel

Out[ ]: ▾ Ridge

Ridge(alpha=0.001)

In [ ]: # training the model

Ridgemodel.fit(X_train,y_train)
Ridgemodel.score(X_test,y_test)

In [ ]: #testing the model

r_pred = Ridgemodel.predict(X_test)
r2_score(y_test,r_pred)

Out[ ]: 0.7123220593275169
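
With `alpha = 0.001` the penalty is very small, so this model behaves almost like ordinary linear regression on the raw features. One possible refinement, sketched below on synthetic data, is to standardize the features and let `RidgeCV` choose `alpha` from a list of candidates. The candidate list and the data are assumptions for illustration, not values from the notebook.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales, with a linear target plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
y = X @ np.array([5.0, 0.5, 0.05, 0.005]) + rng.normal(scale=0.5, size=120)

# Candidate penalties (assumed values); RidgeCV selects one by cross-validation
alphas = [0.001, 0.01, 0.1, 1.0, 10.0]
ridge_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
ridge_cv.fit(X, y)

# make_pipeline names each step after its lowercased class name
chosen_alpha = ridge_cv.named_steps['ridgecv'].alpha_
print(chosen_alpha, ridge_cv.score(X, y))
```

Standardizing first matters for ridge regression because the penalty acts on the coefficient sizes, which depend on the units of each feature.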

Random Forest Regression

In [ ]: from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor

Out[ ]: ▾ RandomForestRegressor

RandomForestRegressor(random_state=0)

In [ ]: # training the model

regressor.fit(X_train,y_train)
regressor.score(X_test,y_test)

Out[ ]: 0.878968081057204

In [ ]: #testing the model

yhat = regressor.predict(X_test)
r2_score(y_test,yhat)

Out[ ]: 0.878968081057204
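
A fitted random forest also exposes `feature_importances_`, which can help explain which inputs drive its predictions. The sketch below shows the idea on synthetic data with one deliberately irrelevant column. The column names echo the dataset, but the values and coefficients are made up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: price depends on sqft_living and grade, not on the noise column
rng = np.random.default_rng(0)
X = pd.DataFrame({
    'sqft_living': rng.uniform(0, 1, 300),
    'grade': rng.integers(3, 13, 300),
    'noise': rng.uniform(0, 1, 300),
})
y = 500000 * X['sqft_living'] + 30000 * X['grade'] + rng.normal(0, 5000, 300)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances; they are normalized to sum to 1
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Applied to the notebook's `regressor` and `X_train.columns`, the same two lines would rank the real features.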

Model Evaluation

Distribution plots of the models' predictions and the actual values
In [ ]: # density plots of the actual price and predicted price for all models
fig, ax = plt.subplots(1,3,figsize=(20,5))
# kernel density estimate of actual vs. predicted prices, one panel per model
sns.kdeplot(y_test,ax=ax[0])
sns.kdeplot(pipe_pred,ax=ax[0])
sns.kdeplot(y_test,ax=ax[1])
sns.kdeplot(r_pred,ax=ax[1])
sns.kdeplot(y_test,ax=ax[2])
sns.kdeplot(yhat,ax=ax[2])
# legends
ax[0].legend(['Actual Price','Predicted Price'])
ax[1].legend(['Actual Price','Predicted Price'])
ax[2].legend(['Actual Price','Predicted Price'])
#model name as title
ax[0].set_title('Linear Regression')
ax[1].set_title('Ridge Regression')
ax[2].set_title('Random Forest Regression')
plt.show()

Error Evaluation
In [ ]: #plot the graph to compare mae, mse, rmse for all models
fig, ax = plt.subplots(1,3,figsize=(20,5))
sns.barplot(x=['Linear Regression','Ridge Regression','Random Forest'],y=[mean_a
sns.barplot(x=['Linear Regression','Ridge Regression','Random Forest'],y=[mean_s
sns.barplot(x=['Linear Regression','Ridge Regression','Random Forest'],y=[np.sqr
# label for the graph
ax[0].set_ylabel('Mean Absolute Error')
ax[1].set_ylabel('Mean Squared Error')
ax[2].set_ylabel('Root Mean Squared Error')
plt.show()
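
The three bar plots compare MAE, MSE, and RMSE, but the plotting calls are cut off in this export. As a rough sketch, the same numbers can be collected into a table before plotting. The model names and predictions below are made up to keep the arithmetic easy to follow.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([221900, 538000, 180000, 604000, 510000], dtype=float)

# Made-up predictions: Model A is off by 10,000 everywhere, Model B by 20,000
offsets = np.array([1, -1, 1, -1, 1], dtype=float)
preds = {
    'Model A': y_true + 10000 * offsets,
    'Model B': y_true + 20000 * offsets,
}

rows = {}
for name, pred in preds.items():
    mse = mean_squared_error(y_true, pred)
    rows[name] = {
        'MAE': mean_absolute_error(y_true, pred),
        'MSE': mse,
        'RMSE': np.sqrt(mse),  # same units as the price, unlike MSE
    }

errors = pd.DataFrame(rows).T
print(errors)
```

MAE and RMSE are in dollars, so they are easier to interpret than MSE; RMSE weights large errors more heavily than MAE.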

Accuracy Evaluation

In [ ]: # plot accuracy of all models in the same graph

fig, ax = plt.subplots(figsize=(7,5))
sns.barplot(x=['Linear Regression','Ridge Regression','Random Forest Regression'
ax.set_title('Accuracy of all models')
plt.show()

Predicting the price of a new house

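In [ ]: #input the values
# the model was trained on max-scaled sqft_* columns, on age (2023 - yr_built) and on a
# 0/1 renovated flag, so these raw values need the same transformations before predict()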
bedrooms = 3
bathrooms = 2
sqft_living = 2000
sqft_lot = 10000
floors = 2
waterfront = 0
view = 0
condition = 3
grade = 8
sqft_above = 2000
sqft_basement = 0
yr_built = 1990
yr_renovated = 0
zipcode = 98001
lat = 47.5480
long = -121.9836
sqft_living15 = 2000
sqft_lot15 = 10000

In [ ]: #predicting the price using random forest regression

price = regressor.predict([[bedrooms,bathrooms,sqft_living,sqft_lot,floors,water
print('The price of the house is $',price[0])

The price of the house is $ 1078694.0533333335
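
The model was trained on transformed columns (square footage divided by the column maximum, `age` instead of `yr_built`, and a 0/1 `renovated` flag), so a new record should go through the same steps before it is passed to `predict`. The sketch below shows the idea with a hypothetical helper, `prepare_new_house`. Only the two maxima that appear in the `describe()` output are used here; the maxima of the other `sqft_*` columns are not shown in the document and would need to be saved before scaling.

```python
def prepare_new_house(raw, maxima, reference_year=2023):
    """Apply the notebook's preprocessing to one raw record (hypothetical helper)."""
    row = dict(raw)
    # Same max-scaling as in the preprocessing cell
    for column, maximum in maxima.items():
        row[column] = row[column] / maximum
    # yr_built -> age, yr_renovated -> 0/1 flag
    row['age'] = reference_year - row.pop('yr_built')
    row['renovated'] = 0 if row.pop('yr_renovated') == 0 else 1
    return row

# Maxima taken from the describe() output above (sqft_living and sqft_lot only)
maxima = {'sqft_living': 13540, 'sqft_lot': 1651359}

raw_house = {
    'sqft_living': 2000,
    'sqft_lot': 10000,
    'yr_built': 1990,
    'yr_renovated': 0,
}

prepared = prepare_new_house(raw_house, maxima)
print(prepared)
```

The prepared values would also need to be arranged in the same column order as the training features before being passed to the model.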

Conclusion

From the analysis, we can see that the Random Forest Regression model (R² ≈ 0.879 on the
test set) performed better than the Polynomial Regression model (R² ≈ 0.827) and the Ridge
Regression model (R² ≈ 0.712).

During the EDA process, we found that the location of the house is a very important
factor in determining the price of the house, since houses with similar area and other
features can have different prices depending on where they are located.

The location of the houses has been plotted on the map using the longitude and latitude
values, which makes the role of location in determining the price of the house clearer.

NOTE: For some reason, the map was not rendered properly when
the notebook was converted into PDF. So here is the image of the
rendered map showing the locations of the houses, color coded
according to their price range.
