Real Estate Price Prediction

Pratik Nagare, Suyash Kolte, Chaitanya Kshirsagar

Department of Computer Engineering
Pimpri Chinchwad College of Engineering
Pune 411044
pratik.nagare22@pccoepune.org, suyash.kolte21@pccoepune.org, chaitanya.kshirsagar21@pccoepune.org

Abstract - The real estate industry is dynamic, with prices fluctuating regularly. It is one of the main areas in which machine learning concepts can be applied to predict the prices of real estate, depending on the current situation, with maximum accuracy. This paper mainly focuses on predicting real-valued prices for places and houses by applying appropriate ML algorithms. The proposed article considers some essential aspects and parameters for calculating the prices of real estate property. Some additional geographical and statistical techniques are also needed to predict the price of a house. The paper describes how the house pricing model works after applying machine learning techniques and algorithms. The use of a dataset from a reputed website in the proposed system helps to obtain a detailed analysis of the data points. Algorithms such as linear regression, implemented with the sklearn library, are used to increase the accuracy. During model building, nearly all data preparation and modelling steps are covered: data cleaning, outlier removal, feature engineering, dimensionality reduction, GridSearchCV for hyperparameter tuning, k-fold cross-validation, etc.

Keywords - Linear regression model, Python, Machine Learning, House Price, Decision Tree, Lasso, Ridge, KNN.

I. INTRODUCTION

The proposed research paper refers to predictions based on recent trends and on the plans of the economy. The main drive behind the article is the prediction of real estate prices, in order to build the best house price prediction systems using machine learning algorithms with maximum accuracy. Under the domain of ML and data science, the design of the real estate price prediction model, along with a full-fledged website, is carried out. According to the census of 2011, only 80 percent of people own their houses. Most of these owners are based in rural areas, while in the urban sector only about 69% of people own a house. This is due to the rising prices of properties and the uncertainty of house prices. The main aim of designing and developing this model is to produce a price prediction system along with a user-friendly front end that helps users choose the desired location and get an idea of the price rates. The analysis in the paper mainly uses a dataset from a trusted website that gives ample sample points for better analysis. One must be aware of the exact price of a house before concluding the deal. The price of a house depends on many factors such as area, location, population, size, number of bedrooms and bathrooms, parking space, elevator, style of construction, balcony space, condition of the building, price per square foot, etc. The proposed model aims to produce an accurate result by taking all of these factors into consideration. For house price prediction one can use various machine learning models such as support vector regression, support vector machine (SVM), logistic regression, k-means, artificial neural networks, etc. A house pricing model is beneficial for buyers, property investors, and house builders. This model will be informative for the entities related to real estate and for all stakeholders, helping them evaluate current market trends and budget-friendly properties. Studies initially concentrated on analysing the attributes that influence house prices, depending on which ML model is used; this article brings together both house price prediction and the attributes. For this paper, Bangalore city is taken as an example because it is Asia's fastest-growing city. The city's growth has already slowed its own economic growth rate and it has gone through various changes that have
contributed to its growth over the last few decades, one of which is the IT industry. Bangalore has an excellent social infrastructure, excellent educational institutions, and a rapidly changing physical infrastructure. These factors have led to an increase in migration from other states to Bangalore, but the cost of living has increased, making it difficult for people to manage their households effectively [5]. The model building starts with a dataset from a reliable source that is simple to use. The dataset chosen for our house price prediction contains 13,320 records and 9 features for training our model. There are various machine learning procedures that can be used to forecast future values. What is required, however, is a model that can forecast future property values with greater accuracy and less error. To train the model, a significant amount of historical data is required. One wants to create such a framework because there is little research on forecasting property prices in India. The framework can forecast the cost of a property by taking into account the various parameters that influence the target value. In addition, the prediction accuracy is measured using various error metrics [5].

II. LITERATURE SURVEY

Every common man's first desire and need is real estate property. Investing in real estate appears to be very profitable, as property rates do not fall steeply. However, investing in real estate is a difficult task for investors when one has to select a new house and predict its price with minimum difficulty. Several factors affect the price of a house, and all of them need to be taken into consideration to predict the price effectively. Building such prediction models also requires considerable research and data analysis, and many researchers are already working on it to get better results.

V. S. Rana, J. Mondal, A. Sharma and I. Kashyap (2020) [5] used various regression algorithms to predict house prices, namely XGBoost, Decision Tree Regression, SVR, and Random Forest. After applying all these algorithms to the dataset, the accuracies are compared at the end. The maximum accuracy of 99% is given by the decision tree algorithm, followed by XGBoost with 63%. This was a purely experimental analysis testing various algorithms.

T. D. Phan (2018) [1], in "Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia", presents a thorough case study that analyzes the dataset to give useful insights into the housing industry of Melbourne, Australia. Various regression models are used. The work starts with data reduction and then applies PCA (Principal Component Analysis) to get the optimal solution from the dataset. SVM (Support Vector Machine) is then applied as a competitive approach. Several methods are thus implemented to get the best results.

M. Jain, H. Rajput, N. Garg and P. Chawla (2020) [2] also present a house price prediction system. They follow a simple machine learning process of data cleaning, visualization, and pre-processing, and use k-fold cross-validation for the output results. Finally, they display a graph that shows a close resemblance between the actual and predicted prices, indicating decent accuracy of their working model.

N. N. Ghosalkar and S. N. Dhage (2018) [4], in "Real Estate Value Prediction Using Linear Regression", use the simple linear regression technique to give the price value for houses. They try to find the best-fitting line (relationship) between the real estate factors taken into consideration, and use mathematical measures such as MSE (Mean Squared Error) and RMSE (Root Mean Squared Error).

After reviewing various articles and research papers on machine learning for housing price prediction, the article now focuses on understanding current trends in house prices and homeownership. The proposed system uses a machine learning model to predict prices with high accuracy.

III. PROPOSED SYSTEM

The main focus of our design is to predict the accurate price of real estate properties in India for the coming years through different machine learning algorithms. The algorithms used in the model building are:

Linear Regression- It is a supervised learning technique responsible for predicting the value of a dependent variable (Y) based on an independent variable (X) [4]. It models the relationship between the input (X) and the output (Y) [5].
The formula for the linear regression equation is given by:
$$ y = a + bx $$
where y is the predicted value,
a is the Y-intercept of the line,
b is the slope of the line,
x is the input value.

Least Absolute Shrinkage and Selection Operator (LASSO)- Lasso is a linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, like the mean. It is a linear regression technique that also regularizes the model. It is similar to ridge regression, but it differs in the regularization term: the sum of the absolute values of the regression coefficients is considered. It can even set some coefficients to zero, which removes the corresponding features from the model. As a result, lasso regression is used to select features. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters) [6] [7].
The formula for computing the Lasso regression coefficients can be expressed as:
$$ \hat{\beta}^{lasso} = \arg\min_{\beta} \left( RSS + \alpha \sum_{j=1}^{p} |\beta_j| \right) $$
Where:
\hat{\beta}^{lasso} represents the estimated coefficients for Lasso regression.
RSS is the residual sum of squares, which measures the difference between the observed and predicted target values.
α is the regularization parameter (tuning parameter), controlling the strength of regularization.
β_j denotes the coefficients of the predictor variables, and p is the number of predictors.

Ridge Regression- Ridge regression is a regularization technique in linear regression that minimizes the residual sum of squares (RSS) between observed and predicted target values while penalizing large coefficients. Ridge regression employs a penalty term proportional to the squared magnitude of the coefficients (L2 regularization). This penalty term, controlled by a regularization parameter α, helps prevent overfitting by shrinking the coefficients towards zero, which is particularly useful in handling multicollinearity. Overall, ridge regression provides stable estimates of the coefficients and helps mitigate overfitting, particularly in situations with highly correlated predictors.
The formula for computing the Ridge regression coefficients can be expressed as:
$$ \hat{\beta}^{ridge} = (X^T X + \alpha I)^{-1} X^T y $$
Where:
X is the design matrix of predictor variables.
y is the target variable vector.
I is the identity matrix.
α is the regularization parameter.

Decision Tree- Like linear regression, it is one of the data mining methods for analyzing multiple variables. It is a tree that consists of a root node, also called the decision node, and leaf nodes at the end which help to take the appropriate decision. A sub node is a node with outgoing edges. All other nodes, which have no outgoing edges, are known as leaf nodes or terminal nodes. Each sub node is split into two or more subtrees based on the values of the input attributes [8]. Decision tree regression uses a trained model in the form of a tree structure to predict a meaningful, continuous (non-discrete) output [9].

K-Nearest Neighbors (KNN): K-Nearest Neighbors is a non-parametric algorithm that classifies data points based on the majority class of their nearest neighbors in the feature space. A distance metric (e.g., Euclidean, Manhattan) is used to determine the nearest neighbors.
For a new input sample x:
1. Calculate Distance: Compute the distance between x and all instances in the training dataset using a distance metric (e.g., Euclidean distance, Manhattan distance).
2. Find Neighbors: Select the k instances (neighbors) with the smallest distances to x.
3. Majority Voting: Assign the class label to x based on the majority class among its k nearest neighbors. For classification tasks, this is achieved through majority voting, where the class with the highest frequency among the neighbors is assigned to x. For a regression task such as price prediction, the predicted value is instead the average of the target values of the k nearest neighbors.

Initially, feature engineering is applied to the raw data, which includes cleaning and outlier removal, to make the data ready for model building. As shown in Fig 1, the dataset is divided into two sets: training (80%) and testing (20%). To estimate the accuracy, k-fold cross-validation is used with k = 5, for which the accuracy of the model comes out to be around 82% to 85%. The training set is passed through the machine learning algorithms to generate the trained model; the hyperparameters evaluated by the k-fold cross-validation help to choose among the considered models based on the best score and best parameters. After evaluation on the test set, the trained model obtained from the training set is passed on to the artifacts, where a pickle file contains the model and a JSON file contains the column details. The back end is supported by a Python Flask server, which takes a set of values as input and provides the predicted value as output.

Fig 1 Architecture
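The split-and-validate procedure described above can be sketched as follows. This is only an illustration under stated assumptions: it swaps sklearn's GridSearchCV for a hand-written loop in NumPy, fits the closed-form ridge estimate given in this section, and runs on synthetic data rather than the paper's dataset. The helper names (ridge_fit, r2_score, cross_val_r2), the candidate α values, and the data-generating coefficients are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge estimate: (X^T X + alpha * I)^{-1} X^T y."""
    identity = np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + alpha * identity, X.T @ y)

def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def cross_val_r2(X, y, alpha, k=5):
    """Mean R^2 over k folds; each fold is held out once for validation."""
    folds = np.array_split(np.arange(len(y)), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta = ridge_fit(X[train_idx], y[train_idx], alpha)
        scores.append(r2_score(y[val_idx], X[val_idx] @ beta))
    return float(np.mean(scores))

# Synthetic stand-in for the housing data (an assumption, NOT the paper's dataset).
rng = np.random.default_rng(0)
n = 500
total_sqft = rng.uniform(500, 3000, n)
bhk = rng.integers(1, 5, n).astype(float)
bath = rng.integers(1, 4, n).astype(float)
price = 0.05 * total_sqft + 10.0 * bhk + 5.0 * bath + rng.normal(0, 5, n)

# Design matrix with a leading column of ones for the intercept.
# Simplification: the intercept is penalized together with the other coefficients.
X = np.column_stack([np.ones(n), total_sqft, bhk, bath])

# 80% training / 20% testing split after shuffling.
order = rng.permutation(n)
cut = int(0.8 * n)
train, test = order[:cut], order[cut:]

# Hand-written grid search over alpha using 5-fold cross-validation on the training set.
alphas = [0.01, 0.1, 1.0, 10.0]
cv_scores = {a: cross_val_r2(X[train], price[train], a, k=5) for a in alphas}
best_alpha = max(cv_scores, key=cv_scores.get)

# Refit on the full training set with the best alpha and score on the held-out test set.
beta = ridge_fit(X[train], price[train], best_alpha)
test_r2 = r2_score(price[test], X[test] @ beta)
print("best alpha:", best_alpha)
print("test R^2:", round(test_r2, 4))
```

The test set is touched only once, after α has been chosen by cross-validation on the training portion, which mirrors the order of steps in the architecture. Scores obtained on synthetic data say nothing about the 82% to 85% reported for the real dataset.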

Technology used-
Data Science- Data science is the first stage, in which we take the dataset and perform data cleaning on it. The data is cleaned to make sure that it provides reliable predictions.
Machine Learning- The cleaned data is fed into the machine learning model, and we apply algorithms such as linear regression and regression trees to test our model.
Front End (UI) - The front end is basically the structure or build-up of a website. It is used to accept the information needed for predicting the price. It takes the form data entered by the user and executes the function which employs the prediction model to calculate the predicted price for the house.

IV. DATA VISUALIZATION

Visualization makes complex data more accessible, understandable, and usable, as shown in Fig 2 and Fig 3. Processing, analyzing, and communicating this data presents ethical and analytical challenges for data visualization. This challenge is addressed by the field of data science and by experts known as data scientists.

Fig 2 shows the scatterplot of price_per_sqft vs total square feet for one place from the dataset, Hebbal, where a blue dot represents 2BHK and a green plus represents 3BHK. This plot includes the outliers present in the dataset.

Fig 2 Price Outliers for a place (Hebbal)

Fig 3 shows the scatterplot of price_per_sqft vs total square feet for the same place, Hebbal, where a blue dot represents 2BHK and a green plus represents 3BHK. This plot is obtained after removing the outliers present in the dataset by using the outlier removal function. In Fig 3, one or two green plus markers (3BHK) still appear as outliers after the function is applied. This is a minor difference that arises from the place and the area in which the house is located.

Fig 3 Price after outliers removed (Hebbal)

A correlation matrix is a simple tabular visual representation that gives the correlation between the different variables of a table. The matrix gives the correlation for every possible pair of variables. Whenever large datasets are considered, it is a good option for displaying a summary of the different patterns in the data. The values of the correlation matrix range between -1 and +1. A positive number shows a positive link between the variables, while a negative number shows a negative link between the variables considered. In Fig 4, five variables (features total_sqft, bath, price, bhk, and price_per_sqft) are plotted and the correlation among them is displayed. For the heatmap, the Python library seaborn (imported as sns), which is based on matplotlib, is used for data visualization.
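As a minimal sketch of how such a correlation matrix can be computed, the example below builds the five features named above from synthetic values and calls NumPy's corrcoef. The data, the generating formulas, and the variable name corr are assumptions made only for illustration; they are not taken from the paper's dataset.

```python
import numpy as np

# Synthetic stand-in values (an assumption, NOT the paper's dataset).
rng = np.random.default_rng(42)
n = 200
total_sqft = rng.uniform(600, 3000, n)
bhk = np.clip(np.round(total_sqft / 600 + rng.normal(0, 0.5, n)), 1, 5)
bath = np.clip(bhk + rng.integers(-1, 2, n), 1, None)
price = 0.06 * total_sqft + 8.0 * bhk + rng.normal(0, 10, n)
price_per_sqft = price * 100000 / total_sqft

names = ["total_sqft", "bath", "price", "bhk", "price_per_sqft"]
data = np.vstack([total_sqft, bath, price, bhk, price_per_sqft])

# np.corrcoef treats each row as one variable and returns the
# Pearson correlation matrix; every entry lies between -1 and +1.
corr = np.corrcoef(data)

for name, row in zip(names, corr):
    print(f"{name:>15}", " ".join(f"{value:+.2f}" for value in row))

# With seaborn installed, sns.heatmap(corr, annot=True,
# xticklabels=names, yticklabels=names) would draw the heatmap.
```

The diagonal is always 1 because each variable is perfectly correlated with itself, and the matrix is symmetric, so only one triangle carries distinct information.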

Fig 4 Correlation Matrix

Clustering is an unsupervised learning technique used to group similar objects or data points together based on their characteristics or features. The goal of clustering is to partition a dataset into groups, or clusters, where data points within the same cluster are more similar to each other than they are to data points in other clusters.

Fig. 5 is a scatter plot showing price per square foot for different clusters of properties. The x-axis is labeled "Total Square Feet Area" and goes from 0 to 50,000. The y-axis is labeled "Price per Square Feet" and goes from 0 to 40,000. There are five clusters labeled 0, 1, 2, 3, and 4. Without more data points it is difficult to say anything about the relationship between price and square footage.

Fig 5 Scatterplot of Clustering Properties

Fig. 6 shows a histogram visualizing the distribution of prices per square foot in the dataset.

Fig. 6 Histogram

Fig. 7 is a bar chart showing the explained variance ratio of the principal components. The x-axis is labeled "Principal components" and goes from 1 to 10. The y-axis is labeled "Explained variance ratio" and goes from 0 to 0.5. Based on the chart, the first principal component explains the most variance in the data, followed by decreasing amounts of variance explained by subsequent components. This is a typical pattern in PCA; the first few components capture most of the important information in the data.

Fig. 7 Bar chart of Variance ratio of PCA

V. COMPARATIVE ANALYSIS

In this section, we conduct a comparative analysis of five algorithms, namely K-Nearest Neighbors (KNN), Decision Tree, Lasso Regression, Linear Regression and Ridge Regression, for the task of real estate price prediction. We evaluate the performance of each algorithm using various evaluation metrics on the test data. Performance Metrics:
Fig. 1 Comparative Performance Analysis

Fig. 1 above shows the comparison between the various algorithms used to build the price prediction model. Linear Regression gives the maximum accuracy of about 84.77 percent, while KNN, Lasso, Ridge and Decision Tree give 69.08, 72.67, 84.68 and 71.9 percent respectively.

Regression Evaluation Matrix:

1. Mean Squared Error (MSE): Mean Squared Error is a commonly used metric that measures the average squared difference between the actual and predicted values in regression analysis.
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
Where:
n is the number of samples.
y_i is the actual (observed) target value for the i-th sample.
\hat{y}_i is the predicted target value for the i-th sample.

2. Mean Absolute Error (MAE): Mean Absolute Error measures the average absolute difference between the actual and predicted values in regression analysis.
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
Where:
n is the number of samples.
y_i is the actual (observed) target value for the i-th sample.
\hat{y}_i is the predicted target value for the i-th sample.

3. R-squared (Coefficient of Determination): R-squared is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model.
$$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$
Where:
n is the number of samples.
y_i is the actual (observed) target value for the i-th sample.
\hat{y}_i is the predicted target value for the i-th sample.
\bar{y} is the mean of the actual target values.

Fig. 2 shows the Regression Evaluation Matrix of each model: Mean Squared Error, Mean Absolute Error, and R-squared (Coefficient of Determination).

Fig. 2 Regression Evaluation Matrix

Based on the table, DecisionTreeRegressor has the lowest MSE (1456.61) and MAE (19.62), suggesting it might be the best performing algorithm in terms of minimizing errors. Linear Regression has the highest R-squared (0.8629), which indicates it explains the most variance in the data. However, it also has a higher MSE compared to DecisionTreeRegressor.

Fig 3: Model Evaluation Metrics Comparison with graph

VI. CONCLUSION

In this study, various machine learning algorithms are used to estimate house prices. All of the methods were described in detail; the dataset was then taken as input and the various models were applied to produce the prediction results. The performance of each model was then compared based on features, and it is found that linear regression gives the maximum accuracy of about 84 to 85% after a proper comparison with decision tree and Lasso
regression. The correlation matrix also displays a visualization of the larger data in a compact pattern. Thus the model can work with decent efficiency, providing the required features to the customer.

REFERENCES

[1] T. D. Phan, "Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia," 2018 International Conference on Machine Learning and Data Engineering (iCMLDE), Sydney, NSW, Australia, 2018, pp. 35-42, doi: 10.1109/iCMLDE.2018.00017.

[2] M. Jain, H. Rajput, N. Garg and P. Chawla, "Prediction of House Pricing using Machine Learning with Python," 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2020, pp. 570-574, doi: 10.1109/ICESC48915.2020.9155839.

[3] N. Bhagat, A. Mohokar and S. Mane, "House Price Forecasting using Data Mining," International Journal of Computer Applications 152(2):23-26, October 2016.

[4] N. N. Ghosalkar and S. N. Dhage, "Real Estate Value Prediction Using Linear Regression," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 2018, pp. 1-5, doi: 10.1109/ICCUBEA.2018.8697639.

[5] V. S. Rana, J. Mondal, A. Sharma and I. Kashyap, "House Price Prediction Using Optimal Regression Techniques," 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India, 2020, pp. 203-208, doi: 10.1109/ICACCCN51052.2020.9362864.

[6] J. Manasa, R. Gupta and N. S. Narahari, "Machine Learning based Predicting House Prices using Regression Techniques," 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 2020, pp. 624-630, doi: 10.1109/ICIMIA48430.2020.9074952.

[7] N. S. R H, P. R, R. R. R and M. K. P, "Price Prediction of House using KNN based Lasso and Ridge Model," 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India, 2022, pp. 1520-1527, doi: 10.1109/ICSCDS53736.2022.9760832.

[8] Z. Zhang, "Decision Trees for Objective House Price Prediction," 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 2021, pp. 280-283, doi: 10.1109/MLBDBI54094.2021.00059.

[9] R. Sawant, Y. Jangid, T. Tiwari, S. Jain and A. Gupta, "Comprehensive Analysis of Housing Price Prediction in Pune Using Multi-Featured Random Forest Approach," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 2018, pp. 1-5, doi: 10.1109/ICCUBEA.2018.8697402.

[10] C. R. Madhuri, G. Anuradha and M. V. Pujitha, "House Price Prediction Using Regression Techniques: A Comparative Study," 2019 International Conference on Smart Structures and Systems (ICSSS), Chennai, India, 2019, pp. 1-5, doi: 10.1109/ICSSS.2019.8882834.
