House Price Prediction Project Report
Techniques
Submitted by:
HINAL SETH
ACKNOWLEDGMENT
I would like to thank Flip Robo Technologies for giving me this
golden opportunity to work on this project.
I would also like to thank my mentor, Ms. Swati Mahaseth, for her
guidance, suggestions, and patience with my project.
In addition, I would like to thank my family for their support.
Lastly, I would like to thank The Almighty for making me who I
am today.
INTRODUCTION
Target Variable:
Selling Price of the house.
• Review of Literature
A house is one of the most fundamental human needs, alongside
other key necessities such as food and water. Demand for housing
has grown rapidly over the years as people's living standards have
improved. While some people buy a home as an investment or
property, most people around the world buy a house as a shelter or
for their business.
Real estate markets have a strong effect on a nation's economy and
are a significant indicator of its scale. Homeowners buy goods such
as furniture and household appliances for their homes, and
homebuilders and contractors buy raw materials to build houses that
meet housing demand, so new housing supply creates a ripple effect
through the economy. In addition, a high level of housing supply
shows that buyers have the cash flow to make large investments and
that the construction industry is in good condition.
Housing prices are an important reflection of the economy, and
housing price ranges are of great interest to both buyers and
sellers. In this project, house prices will be predicted from
explanatory variables that cover many aspects of residential
houses. Because house prices are continuous, they will be predicted
with regression techniques including Linear Regression, Ridge,
Elastic Net, SGD Regressor, Decision Tree regression, and
Random Forest regression.
The dataset contains 1460 entries, each with 81 variables, so PCA
may also be used for dimensionality reduction.
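PCA rotates standardised features onto orthogonal directions ordered by how much variance they explain, so a few components can stand in for many correlated columns. The following is a minimal pure-Python sketch of the first principal component only, computed by power iteration on a hypothetical toy matrix. In practice a library routine such as scikit-learn's PCA class would normally be used, and only after the categorical columns have been encoded numerically.

```python
import math


def standardize(X):
    """Centre each column and scale it to unit variance (PCA is scale-sensitive)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # A constant column has zero spread; use 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]


def covariance(Z):
    """Covariance matrix of already-centred data Z (rows = samples)."""
    n, p = len(Z), len(Z[0])
    return [[sum(r[a] * r[b] for r in Z) / n for b in range(p)] for a in range(p)]


def first_component(C, n_iter=500):
    """Power iteration: unit leading eigenvector of C and its eigenvalue.

    Assumes the all-ones start vector is not orthogonal to that eigenvector."""
    p = len(C)
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[a] * sum(C[a][b] * v[b] for b in range(p)) for a in range(p))
    return v, eigval


def explained_variance_ratio(X):
    """Share of the total standardised variance captured by the first component."""
    C = covariance(standardize(X))
    _, eigval = first_component(C)
    return eigval / sum(C[i][i] for i in range(len(C)))


# Hypothetical toy data: column 2 is exactly twice column 1 (redundant),
# column 3 is only partly related, so one component captures most, not all.
X = [[1.0, 2.0, 5.0], [2.0, 4.0, 3.0], [3.0, 6.0, 4.0], [4.0, 8.0, 1.0]]
print(round(explained_variance_ratio(X), 3))  # 0.926
```

Keeping components until their cumulative explained variance passes a chosen threshold is one common way to decide how many of the 81 dimensions to retain.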
18. 'OverallQual': Rates the overall material and finish of the house
10 Very Excellent
9 Excellent
8 Very Good
7 Good
6 Above Average
5 Average
4 Below Average
3 Fair
2 Poor
1 Very Poor
19. 'OverallCond': Rates the overall condition of the house
10 Very Excellent
9 Excellent
8 Very Good
7 Good
6 Above Average
5 Average
4 Below Average
3 Fair
2 Poor
1 Very Poor
'RoofStyle': Type of roof
Flat Flat
Gable Gable
Gambrel Gambrel (Barn)
Hip Hip
Mansard Mansard
Shed Shed
'MasVnrType': Masonry veneer type
BrkCmn Brick Common
BrkFace Brick Face
CBlock Cinder Block
None None
Stone Stone
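'ExterQual': Evaluates the quality of the material on the exterior
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
'ExterCond': Evaluates the present condition of the material on the exterior
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor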
'BsmtCond': Evaluates the general condition of the basement
Ex Excellent
Gd Good
TA Typical - slight dampness allowed
Fa Fair - dampness or some cracking or settling
Po Poor - Severe cracking, settling, or wetness
NA No Basement
'BsmtExposure': Refers to walkout or garden level walls
Gd Good Exposure
Av Average Exposure (split levels or foyers typically score average or above)
Mn Minimum Exposure
No No Exposure
NA No Basement
'HeatingQC': Heating quality and condition
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
'CentralAir': Central air conditioning
N No
Y Yes
'KitchenQual': Kitchen quality
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
'GarageFinish': Interior finish of the garage
Fin Finished
RFn Rough Finished
Unf Unfinished
NA No Garage
'GarageQual': Garage quality
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
NA No Garage
'GarageCond': Garage condition
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
NA No Garage
'PavedDrive': Paved driveway
Y Paved
P Partial Pavement
N Dirt/Gravel
'PoolQC': Pool quality
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
NA No Pool
'MiscFeature': Miscellaneous feature not covered in other categories
Elev Elevator
Gar2 2nd Garage (if not described in garage section)
Othr Other
Shed Shed (over 100 SF)
TenC Tennis Court
NA None
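Many of the codes above (Ex, Gd, TA, Fa, Po) are ordered quality ratings, and for the basement, garage and pool features NA means that the feature is absent rather than unknown. A minimal sketch of one way such codes could be turned into numbers for the regression models, assuming a hypothetical 0 to 5 scale (the project may have encoded them differently):

```python
# Hypothetical ordinal scale for the Ex/Gd/TA/Fa/Po quality codes. "NA" means the
# feature is absent (no basement, no garage, no pool), so it gets the lowest value.
QUALITY_SCALE = {"NA": 0, "Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}


def encode_quality(values, scale=QUALITY_SCALE):
    """Map quality codes to integers, preserving their order.

    None stands for a missing value and is treated as "NA": with pandas'
    default read_csv settings the literal string "NA" is parsed as missing,
    so an absent basement or garage can otherwise look like missing data."""
    return [scale["NA"] if v is None else scale[v] for v in values]


print(encode_quality(["Gd", "TA", None, "Ex", "Po"]))  # [4, 3, 0, 5, 1]
```

An ordinal mapping keeps the ranking information in a single column, whereas one-hot encoding would discard the order and add one column per code.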
ii) Ridge:
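Ridge regression is a technique designed for analysing multiple
regression data that suffer from multicollinearity. It adds an L2
penalty on the coefficients to the least-squares loss, which shrinks
them and stabilises the estimates when predictors are highly
correlated.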
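In matrix form, ridge regression minimises ||y - Xw||^2 + alpha*||w||^2, whose solution is w = (X^T X + alpha*I)^(-1) X^T y; the added alpha*I keeps the matrix well conditioned when columns are nearly collinear. A minimal pure-Python sketch for two features and no intercept, on hypothetical toy data (alpha = 0 reduces it to ordinary least squares):

```python
def ridge_two_features(X, y, alpha):
    """Closed-form ridge for two features, no intercept.

    Solves (X^T X + alpha*I) w = X^T y with Cramer's rule; alpha=0 gives OLS."""
    a = sum(r[0] * r[0] for r in X) + alpha
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + alpha
    g0 = sum(r[0] * t for r, t in zip(X, y))
    g1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]


# Hypothetical, nearly collinear data: x2 is x1 plus tiny noise, y is about 2*x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 1.99, 3.02, 3.98, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
X = [list(pair) for pair in zip(x1, x2)]

ols = ridge_two_features(X, y, alpha=0.0)
ridge = ridge_two_features(X, y, alpha=1.0)
print("OLS:  ", ols)
print("Ridge:", ridge)
```

On this toy data the OLS weights come out large and of opposite sign (roughly -8 and +10), while the ridge weights are both close to 1 and together reproduce y ≈ 2x.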
iii) Elastic Net:
Sklearn provides a linear model named ElasticNet, which is
trained with both L1- and L2-norm regularisation of the
coefficients. The advantage of this combination is that it
allows for learning a sparse model in which only a few of the
weights are non-zero, as with Lasso regularisation, while still
maintaining the regularisation properties of Ridge.
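scikit-learn documents the ElasticNet objective as 1/(2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2. The sketch below minimises that objective with plain cyclic coordinate descent on hypothetical toy data; it illustrates how the L1 part produces exact zeros and is not scikit-learn's own implementation:

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything in [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, alpha=1.0, l1_ratio=0.5, n_iter=100):
    """Minimise, by cyclic coordinate descent (no intercept, X = list of rows):

        1/(2n) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_i = sum(X[i][k] * w[k] for k in range(p))
                # Add back feature j's own contribution to get the partial residual.
                rho += X[i][j] * (y[i] - pred_i + X[i][j] * w[j])
                z += X[i][j] ** 2
            # L1 part: soft-thresholding (sparsity); L2 part: extra shrinkage below.
            w[j] = soft_threshold(rho / n, alpha * l1_ratio) / (z / n + alpha * (1 - l1_ratio))
    return w


# Hypothetical data with orthogonal columns: y = 3*x1 + 0.5*x2, x3 is irrelevant.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3.5, -2.5, 2.5, -3.5]
w = elastic_net_cd(X, y, alpha=1.0, l1_ratio=0.7)
print(w)  # roughly [1.769, 0.0, 0.0]
```

Here x2 has a true coefficient of 0.5, but the L1 part sets it exactly to zero, while the coefficient of x1 is shrunk from 3 to about 1.77 by the combined penalties.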
We can see from the plot that most house prices fall between
100,000 and 250,000. The dashed lines represent the
locations of the three quartiles Q1, Q2 (the median), and Q3.
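The quartiles marked in the plot, and the share of prices in the 100,000 to 250,000 band, can also be computed directly. A minimal sketch using Python's statistics module on hypothetical prices (not the project's data); method="inclusive" is chosen on the assumption that the plot used the linear interpolation that is the default for pandas' Series.quantile:

```python
import statistics


def price_summary(prices, low=100_000, high=250_000):
    """Quartiles of sale prices plus the share of prices within [low, high].

    method="inclusive" interpolates linearly between order statistics, the same
    rule as the default of pandas' Series.quantile (assumed for the plot)."""
    q1, q2, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    share = sum(low <= p <= high for p in prices) / len(prices)
    return {"Q1": q1, "median": q2, "Q3": q3, "share_in_range": share}


# Hypothetical sale prices, not taken from the project's dataset.
prices = [90_000, 120_000, 140_000, 160_000, 180_000, 200_000, 230_000, 300_000]
print(price_summary(prices))
# {'Q1': 135000.0, 'median': 170000.0, 'Q3': 207500.0, 'share_in_range': 0.75}
```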
v) Boxplot: