Real Estate Price Prediction
Real Estate Price Prediction
Abstract - Real Estate industry is dynamic in terms of the systems using the machine learning algorithms with
maximum accuracy. Under the domain of ML and Data
prices being fluctuated regularly. It’s one of the main area to
Science the designing of the real estate price prediction
apply the machine learning concepts to predict the prices of along with the full-fledged website is done. According to the
real estate depending upon the current situations and make out census of 2011 only 80 percent of people own their houses.
maximum accuracy for the same. The research paper mainly And only people based in rural areas own maximum houses
focus on to predicting the real valued prices for the places and but people in urban sector only about 69 % own a house.
This is due to the raising prices of the properties and vague
the houses by applying the appropriate ML algorithms. The
house prices. The main aim to design and develop this model
proposed article considers some essential aspects and
is to produce price prediction system along with a user-
parameters for calculating the prices of real estate property friendly front end that will facilitate the users to choose the
Also some more geographical and statistical techniques will be desired destination and get an idea about the price rates. The
needed to predict the price of a house. The paper consist how Analysis that has been made in the paper is mainly using the
the house pricing model works after using some machine dataset from the trusted website that gives ample of sample
points for better analysis. One must be aware of the exact
learning techniques and algorithms. The use of the dataset in
price of house before concluding the deal. As the price of
the proposed system from the reputed website helps to get the house depends on many factors like Area, location,
detailed analysis of the data points. Algorithms like Linear population, size and number of bedrooms & bathrooms
regression and sklearn are used to effectively increase the given, parking space, elevator, style of construction, balcony
accuracy. During model structure nearly all data similarities space, condition of building, price per square foot etc. The
and cleaning, outlier removal and feature engineering,
proposed model aims to create an accurate result by taking
into consideration all different factors. For House price
dimensionality reduction, gridsearchcv for hyperparameter
prediction one can use various prediction models (Machine
tuning, k fold cross-validation, etc. are covered. Learning Models) like support vector regression, Support
Keywords - Linear regression model, Python, Machine vector machine (SVM), Logistic regression, k-means,
artificial neural network etc. House- pricing model is
Learning, House Price, Decision Tree, Lasso, Ridge, KNN.
beneficial for the buyers, property investors, and house
builders. This model will be informative and knowledgeable
I. INTRODUCTION for the entities related to the real estate and all the
stakeholders to evaluate the current market trends and
The proposed research paper refers to the predictions on the budget friendly properties. Studies initially concentrated on
recent trends and for the plans of economy. The main drive analysis of the attributes which influence prices of the
behind the article is prediction of the real estate prices to houses based on which model of ML is used and still this
build best of the house price prediction article brings together both predicting house price and
attributes together. For this paper, Bangalore city is taken as
an example because it is Asia's fastest-growing city. The
city's growth has already slowed its own economic growth
rate and it has gone through various changes that have
contributed to its growth over the last few decades, one of to applying PCA (Principal Component Analysis) steps to
which is the IT industry. Bangalore has an excellent social get the optimal solution from the dataset. Then they have
infrastructure, also excellent educational institutions and a applied SVM (Support Vector Machine) for the
rapidly changing physical infrastructure. These factors have competitive approach. Thus how several methods are
led to an increase in migration from other states to implemented to getthe best results out of it.
Bangalore, but the cost of living as increased, making it
M. Jain, H. Rajput, N. Garg and P. Chawla 2020 [2] is
difficult for or people to manage their households effectively
also ahouse price prediction system using some techniques.
[5]. The model building starts with the dataset from a
In this they have used the simple process of machine
reliable source that is simple to use. For a dataset was chosen
learning from data cleaning, visualization, pre-processing
for our house price prediction, which contains 13320 records
and using k-fold cross validation for the output results.
of data and 9 features for training our model. There are
Finally they have displayed the graph that shows close
various machine learning procedures that can be used to
resemblance with actual price and the predicted price
forecast future values. In any case, it is required a model that
showing decent accuracy through their working model.
can forecast future property estimations with greater
accuracy and less error. With a specific end goal of N. N. Ghosalkar and S. N. Dhage 2018 [4], Real Estate
preparing the model, a significant amount of memorable Pricevalue using Linear Regression are using simple Linear
dataset is required. Generally one wants to create a Regression technique to give the price value for the houses.
framework because there is little research on forecasting Through this paper they have tried to have best fitting line
land property in India. This can forecast the cost of a (relationship) between the factors of the real estate taken
property by taking into account the various parameters that into consideration and used various mathematical
influence the target value. In addition, the prediction techniques like MSE (Mean Squared Error), RMSE(Root
accuracy is measured by taking into account various error Mean SquaredError) etc.
metrics [5]. After reviewing various articles and research papers
about machine learning for housing price prediction the
article now focus is on understanding current trends in
II. LITERATURE SURVEY
house prices and homeownership. The proposed system
uses a machine learning model to predict prices with high
Every common man's first desire and need is for real estate
accuracy.
property. Investing in the real estate appears to be very
profitable as the property rates do not fall steeply. Investing
in real estate appears to be difficult task for investors when
III. PROPOSED SYSTEM
one has to select a new house and predict the price with
minimum difficulty for this there are several factors which
affect the price of a house and all these factors are needed to The main end or focus of our design is to prognosticate the
be taken into consideration to predict the price effectively. accurate price of the real estate parcels present in India for
Also building such models for prediction needs much thecoming forthcoming times through different
research and data analysis as many researchers are already Algorithms used in the model building are:
working on it to get the better results.
Linear Regression- It's a supervised literacy fashion
S. Rana, J. Mondal, A. Sharma and I. Kashyap 2020 [5] andresponsible for prognosticating the value of variable(Y)
have used various regression algorithms to predict the relying on variable(X) which is not dependent [4]. It's the
house prices, like XG Boost, Decision Tree Regression, relationship between the input( X) and ( Y) [5].
SVR, and Random forest. After applying all these
The formula for linear regression equation is given by:
algorithms on to thedataset a comparison for the accuracy is
y = a + bx
done at the end. From which the maximum accuracy of 99% where y is the predicted value,
given by the decision tree algorithm followed by the XG a is Y-intercept of the line,
Boost of 63%, this was purely the experimental analysis by b is Slope of the line,
testing various algorithms models. x is the input value
T. D. Phan, 2018 [1] is House Price Prediction using
machine learning algorithms: A case study of Melbourne Least Absolute Shrinkage and Selection Operator-
city, Australia. This is a through case study for analyzing Lasso is direct regression that considers loss. Loss is a point
the dataset to give some useful insights on to the housing where data values are diminished towards a central point,
industry of Melbourne city in Australia. They have used like the mean. The selection operator is an LR technique
variousregression models. Starting with the data reduction that also regularizes functionality, and LASSO stands for
least absolute shrinkage. It is similar to ridge regression, but known as child nodes or terminal nodes. Each sub node is
it differs in the values of regularization. The absolute values parted into two or more sub trees based on the values of the
ofthe sum of the regression coefficients are considered. It input attributes [8]. Decision tree regression helps to predict
evensets the coefficients to zero to eliminate all errors. As a the data using trained model in the form of a tree structure to
result,lasso regression is used to select features. The lariat generate the meaningful output and continuous affair which
procedure encourages simple, sparse models (i.e. models is nothing but non separable result/affair [9].
with smaller parameters) [6] [7].
The formula for computing the Lasso regression K-Nearest Neighbors (KNN): K-Nearest Neighbors is a
coefficient can be expressed as: non-parametric algorithm that classifies data points based on
β^lasso = argmin ( RSS+ α ∑j=1p ∣βj∣) the majority class of their nearest neighbors in the feature
space. The distance metric (e.g., Euclidean, Manhattan) is
Where: used to determine the nearest neighbors.
β^lasso represents the estimated coefficients for Lasso For a new input sample x:
regression.
1. Calculate Distance: Compute the distance between x and
RSS is the residual sum of squares, which measures the all instances in the training dataset using a distance metric
difference between the observed and predicted target (e.g., Euclidean distance, Manhattan distance).
values.
2. Find Neighbors: Select the k instances (neighbors) with
α is the regularization parameter (tuning parameter), the smallest distances to x.
controlling the strength of regularization.
3. Majority Voting: Assign the class label to x based on the
βj denotes the coefficients of the predictor variables. majority class among its k nearest neighbors. For
classification tasks, this can be achieved through majority
voting, where the class with the highest frequency among the
Ridge regression is a regularization technique in linear neighbors is assigned to x.
regression that minimizes the residual sum of squares
(RSS) between observed and predicted target values while
penalizing large coefficients. Ridge regression employs a Initially feature engineering is applied on the raw data
penalty term proportional to the squared magnitude of which includes cleaning, outlier removal to make the data
coefficients (L2 regularization). This penalty term, ready forthe model building. From the fig 1, the dataset is
controlled by a regularization parameter α helps prevent divided intotwo sets i.e. training which is 80% and testing
overfitting by shrinking the coefficients towards zero, which is 20%. To find the accuracy k-fold cross validation
particularly useful in handling multicollinearity. Overall, technique is usedwhere value of k is 5 due to which accuracy
Ridge regression provides stable estimates of coefficients of model comesout to be around 82% to 85%. The training
and helps mitigate the issues of overfitting, particularly in set is passed through machine learning algorithms to
situations with highly correlated predictors. generate trained model also the hyperparameters passed by
The formula for computing the Ridge regression the k-fold cross validation are helpful to take decision based
coefficient can be expressed as: on best score andbest parameters of the models which are
β^ridge = (XTX+αI) −1XTy considered here. After evaluating test set and trained model
obtain from a training set is passed on to the artifacts where
Where:
pickle file contain the model and the json file contain the
X is the design matrix of predictor variables. column details. The back-end is supported by the python
y is the target variable vector. flask server which take input as set of values and provide
I is the identity matrix. output as predicted values.
Technology used-
In Fig 3 below shows the scatterplot of price_per_sqft
Data Science- Data wisdom is the first stage in which we vs Total Square feet of a random place from the dataset
takethe dataset and will do the data drawing on it. We'll do Hebbal where blue dot represents 2BHK and green plus
the data drawing to make sure that it provides dependable represents 3BHK. This plot is after removing the outliers
prognostications. present in the dataset by using the function. Also in the
Machine Learning- The gutted data is fed into the above fig we can find one or two green plus which is 3BHK
machine literacy model, and we do some of the algorithms and still shows asoutlier after the function is applied. But
like direct retrogression, retrogression trees to test out our that is a minordifference where is has come due to the place
model. and its area where the house is present.
Fig. 6 Histogram
V. COMPARATIVE ANALYSIS
MSE=1/n∑|yi-y^i|
Where:
n is the number of samples.
Fig 3: Model Evaluation Metrics Comparison with graph
yi is the actual (observed) target value for the i-th sample.
y^i is the predicted target value for the i-th sample.
VI. CONCLUSION
3. R-squared (Coefficient of Determination): R-squared is
In this study, various machine learning algorithms are
a statistical measure that represents the proportion of the
used to estimate house prices. All of the methods were
variance in the dependent variable that is explained by the
described in detail, and then the dataset is taken as input,
independent variables in a regression model.
applied the various models to give out the results of the
R2 = 1 - ∑ (yi-y^i)2 prediction. The presentation of each model was then
compared based on features where it is found that linear
∑ (yi-yˉi)2 regression gives maximum accuracy of about 84 to 85%
after a proper comparison with decision tree and Lasso
regression. The correlation matrix also displays the doi; 10.1109/MLBDBI54094.2021.00059.
visualization of the largerdata into compact pattern. Thus
the model can work with decent efficiency giving the [9] R. Sawant, Y. Jangid, T. Tiwari, S. Jain and A. Gupta,
required features to the customer. "Comprehensive Analysis of Housing Price Prediction in Pune Using
Multi-Featured Random Forest Approach," 2018 Fourth International
Conference on Computing Communication Control and Automation
REFERENCES (ICCUBEA), Pune, India, 2018,
pp. 1-5, doi: 10.1109/ICCUBEA.2018.8697402.
[1] T. D. Phan, "Housing Price Prediction Using Machine Learning
Algorithms: The Case of Melbourne City, Australia," 2018 International
[10] C. R. Madhuri, G. Anuradha and M. V. Pujitha, "House
Conference on Machine Learning and Data Engineering (iCMLDE),
Price Prediction Using Regression Techniques: A Comparative Study,"
Sydney, NSW, Australia, 2018, pp. 35-42, doi:
2019 International Conference on Smart Structures and Systems (ICSSS),
17.1109/iCMLDE.2018.00017.
Chennai, India, 2019, pp. 1-5, doi: 10.1109/ICSSS.2019.8882834.
[3] Nihar Bhagat, Ankit Mohokar and Shreyash Mane. House Price
Forecasting using Data Mining. International Journal of Computer
Applications 152(2):23-26, October 2016.