2023 International Conference on Sustainable Emerging Innovations in Engineering and Technology (ICSEIET)

Housing Price Prediction Model Using Machine Learning

979-8-3503-2919-3/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICSEIET58677.2023.10303359

Aman Chaurasia, Department of CSE, Chandigarh University
Inam Ul Haq, Department of CSE, Chandigarh University

Abstract—Housing price prediction is a challenging task because prices vary widely from one location to another. In this research paper, we propose a machine learning-based house price prediction model that can predict the prices of houses more accurately. The proposed model combines data pre-processing techniques with machine learning algorithms. The efficiency of the proposed model is evaluated using real-time house price data, and the results show significant improvement over the existing techniques.

Keywords—House Price Prediction, Machine Learning techniques.

I. INTRODUCTION

The real estate market has seen immense growth in recent years, and housing prices have become a vital topic of interest for many individuals, including buyers, sellers, and investors. Accurately predicting house prices has become increasingly crucial, as it helps buyers make informed decisions, helps sellers set reasonable prices for their properties, and helps investors identify lucrative opportunities. Traditional methods of predicting housing prices depend on expert knowledge and statistical analysis. However, with the advent of machine learning algorithms, it is now possible to make more precise and reliable predictions. Machine learning models, built with one of several algorithms, can learn patterns and relationships in large datasets and use what they learn to make accurate predictions on new data. In this research paper, we present a house price prediction model based on machine learning. We use various features, including property characteristics, neighborhood demographics, and economic indicators, to predict housing prices accurately. The model is trained on a large dataset of housing prices and features from various sources, including real estate websites, databases, and public records. Our main objective is to demonstrate the effectiveness of machine learning algorithms in predicting housing prices.

We also examine the impact of different features on the accuracy of our model and identify the most significant predictors of housing prices. We give particular attention to linear regression. Overall, our research has important implications for the real estate industry and for anyone interested in buying or selling a property. We show that machine learning algorithms can provide more accurate and reliable predictions of housing prices, which can help buyers, sellers, and investors make better decisions.

II. LITERATURE REVIEW

In [1], the authors compare three methods: XGBoost, Random Forest, and LightGBM. They also compare two techniques, hybrid regression and stacked generalization regression. According to the paper, all of the methods showed good results, but each has its advantages and disadvantages. Random Forest showed the most promising results, as it was the most precise on the given training set, which suggests that it fits the data very well. However, the method is prone to overfitting, which means that it may not generalize well to new data. Additionally, the time complexity of the Random Forest method is much higher than that of the others because the dataset has to be used many times in different subsets. This can make the method slow and computationally expensive, especially for large datasets. Overall, the paper shows that each method has its strengths and weaknesses on the given dataset. It is important to carefully evaluate each method and to choose the one that best suits the problem at hand.

The paper [2] describes a method to predict property rates in Mumbai and neighbouring districts using a linear regression model. The dataset used in the study was obtained from Kaggle and contained 17 attributes, such as location, carpet area, and security. The authors preprocessed the dataset by removing noisy data and outliers.

The authors divided the dataset into a training set and a test set. They used the training set to train a linear regression model and the test set to evaluate the performance of the model. The R-squared value of the model is calculated to be 0.8643, indicating that the model explains 86.43% of the variability in the data. The paper shows that the method can be extended to predict property prices in other cities and rural areas in India. Additionally, the authors propose adding features such as trends in a particular location and comparisons with other properties to the system, which can be developed into a live website. The paper also shows that the system can be used to predict the appreciation in the price of a property. Overall, the paper presents a method for predicting property rates using a linear regression model and suggests potential avenues for future research and development. However, it is important to note that the model has some limitations: its performance may be limited by the quality and representativeness of the dataset used, and further validation on different datasets may be required to confirm the robustness of the method.

In [3], the authors analyze previous research on the key characteristics of real estate prices and the data mining techniques used to predict them. The paper notes that homes in areas with easy access to amenities such as nearby shopping malls are likely to be more costly than those
in rural areas with limited amenities. The paper also reviews various predictive models, such as SVR, ANN, and XGBoost, that have been developed and show positive correlations with house prices.

The article [4] describes how to use generalized linear regression models to further improve the reliability of house price prediction and analysis. The paper covers the basics of data mining and examines cluster analysis algorithms before choosing generalized linear regression models as the focus of the research. It analyzes the general estimation methods for generalized linear regression models, nonparametric regression models, and partial linear models. It also verifies the validity of the proposed model through comparative experiments. The experimental results show that the house price prediction model based on the generalized regression model achieves high prediction accuracy. Overall, the paper gives an informative discussion of generalized linear regression models for house price prediction and analysis.

The study [5] applies the random forest machine learning technique to the Boston housing dataset to predict prices based on the dataset's variables. The authors compared the predicted and actual prices and found that the model's predictions were within a ±5 difference of the actual prices. This showed that the model is useful in predicting house prices.

In [6], the authors found that the decision tree provides the most promising result, with the highest accuracy of 84.64%. Lasso, a supervised regularization technique used in machine learning, gives the lowest accuracy of 60.32%. The accuracies of logistic regression and support vector regression are 72.81% and 67.81%, respectively.

In [7], the authors showed that hybrid regression performs much better than lasso, ridge, and gradient-boosted regression. The best result they obtained with hybrid regression was a score of 0.11260 on the test data, using a combination of 65% lasso and 35% gradient boosting.

III. METHODOLOGY USED

In this research, we apply a linear regression model to the dataset.

A. Linear Regression

Linear Regression [8] is a machine learning algorithm that performs a regression task. It is primarily used for identifying the relationship between variables and for forecasting. Fig. 1 represents the proposed linear regression workflow: a given independent variable (x) is used to predict the value of a dependent variable (y). Linear regression thus finds a linear relationship between x (input) and y (output). The dataset and the methodology used are explained in the subsequent sections.

B. Data set Used

The dataset housing_data.csv for this work was collected from the Kaggle repository [10]. The dataset contains the average income of people living in a region, the average age of a house in a region, the average number of rooms in a house in a region, the average number of bedrooms in a house in a region, the population of the region where the house is located, the price of the house, and the address of the house in that area.

Fig. 1. Linear Regression

C. Pre-processing Steps

To run the code, install the necessary libraries and set up the environment for the project. The libraries used in this research are scikit-learn, pandas, Seaborn, Matplotlib, and NumPy, and all the figures were produced from the data used with the linear regression technique.

The initial step is to import the required libraries. We use pandas for loading the dataset, scikit-learn (sklearn) for splitting the data and training the model, and the needed functions from sklearn for evaluating the model's performance. The next step is to load the dataset. In this case, the dataset is in a CSV file called 'housing_data.csv'. We then divide the data into X (features) and y (target variable) using the pandas drop() function. We drop the 'Price' column because that is the target variable we want to predict, and we also drop the 'Address' column because it contains text data, which is not needed for linear regression modeling.

After that, we use the train_test_split() function from scikit-learn to split the available data into training and test sets. Once the data is split, we can train the linear regression model. We create an instance of the LinearRegression class from sklearn.linear_model and fit the model to the training data using the fit() method. Finally, we evaluate the performance of the model we have built using the mean_absolute_error(), mean_squared_error(), and root_mean_squared_error() functions from sklearn.metrics. These functions take the actual values and the predicted values for the test data and return the evaluation metric scores, which we can use to assess how well the model is performing.

IV. EXPERIMENTAL RESULT ANALYSIS

The proposed model was trained on the dataset using linear regression. The R-squared value [11] (a statistical measure of how close the data are to the fitted regression line) measures how well the model fits the data, with values typically ranging from 0 to 1, where 1 indicates a perfect fit. RMSE and MAE measure how well the model predicts the target variable, with lower values indicating better performance. Examining the coefficients of the linear regression model shows the strength and direction of the relationship between the independent variables and the target variable. For example, a positive coefficient for the number of bedrooms indicates that homes with more bedrooms tend to be more expensive.
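
As a minimal sketch of the workflow described in the pre-processing steps and evaluated above, the following code splits the data, fits a LinearRegression model, and computes MAE, MSE, RMSE, and R-squared, and then lists the fitted coefficients. Because housing_data.csv is not reproduced here, the sketch generates a synthetic stand-in. The column names (Income, HouseAge, Rooms, Bedrooms, Population), the price rule, the 70/30 split, and the helper names make_synthetic_housing and fit_and_evaluate are all assumptions made for illustration and are not the code used in this work. RMSE is taken as the square root of mean_squared_error() so that the sketch does not depend on a particular scikit-learn release.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_housing(n_rows=500, seed=0):
    """Build a synthetic stand-in for housing_data.csv (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Income": rng.normal(68_000, 10_000, n_rows),
        "HouseAge": rng.normal(6, 1, n_rows),
        "Rooms": rng.normal(7, 1, n_rows),
        "Bedrooms": rng.normal(4, 1, n_rows),
        "Population": rng.normal(36_000, 10_000, n_rows),
    })
    # Assumed linear price rule plus noise, for illustration only.
    df["Price"] = (
        20 * df["Income"]
        + 160_000 * df["HouseAge"]
        + 120_000 * df["Rooms"]
        + 2_000 * df["Bedrooms"]
        + 15 * df["Population"]
        + rng.normal(0, 100_000, n_rows)
    )
    df["Address"] = "n/a"  # text column, dropped before modeling
    return df


def fit_and_evaluate(df, test_size=0.3, seed=42):
    """Split the data, fit linear regression, and score it on the test set."""
    # Drop the target and the text column to obtain the feature matrix.
    X = df.drop(columns=["Price", "Address"])
    y = df["Price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)

    mse = mean_squared_error(y_test, y_pred)
    metrics = {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_test, y_pred),
    }
    # One coefficient per feature: the sign gives the direction of the effect.
    coefficients = pd.Series(model.coef_, index=X.columns)
    return model, metrics, coefficients


df = make_synthetic_housing()
model, metrics, coefficients = fit_and_evaluate(df)
print(metrics)
print(coefficients)
```

With the real dataset, the call to make_synthetic_housing() would be replaced by pd.read_csv('housing_data.csv'), and the column names passed to drop() would have to match those in the file.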

The model used all the data from the dataset for processing and evaluation, and the results are shown in the figures below.

Fig. 2. Histogram result from dataset

sns.distplot() [12] is a function provided by the Python library Seaborn. The related displot() function can produce several different representations of a distribution, including a histogram, a kernel density estimate, or an empirical cumulative distribution function.

Fig. 3. Data result after analysis

A heatmap [13], shown in Fig. 3, is a graphical representation of 2D data where the values are represented using colors. Typically, the values are arranged in a matrix or a table-like format, where each row and column represents a particular variable or category. The color of each cell in the heatmap is determined by the value of the corresponding data point. Seaborn's heatmap function provides an easy way to create heatmaps in Python. It takes a rectangular dataset as input and plots a grid of colored squares in which the color of each square represents the value of the corresponding data point in the matrix. When annot=True is passed, the heatmap function also adds annotations to the cells, displaying the actual values of the data points. Seaborn is built on top of Matplotlib, so it is possible to create heatmaps with Matplotlib functions such as imshow() as well. However, Seaborn provides a more convenient and intuitive way to create heatmaps, especially when dealing with larger datasets.

Fig. 4. Scatter plot

plt.scatter() [14] is a function provided by the Python library Matplotlib that creates a scatter plot. A scatter plot is a graphical representation of two sets of data plotted along two axes, usually the x-axis and y-axis, and it is an effective tool for visualizing the relationship between two continuous variables. If the values along the y-axis increase or decrease as the x-axis increases, it suggests a positive or negative linear relationship, respectively, between the two variables. In other words, the two variables are correlated and tend to move in the same direction or in opposite directions. This relationship can be further quantified using statistical measures such as correlation coefficients or linear regression models.

Overall, scatter plots are a useful tool for gaining insight into the relationship between two variables. They are simple to create, easy to understand, and can reveal valuable information about the data that may not be apparent from descriptive statistics alone.

Fig. 5. Comprehensive data analysis

Fig. 5 was produced using pairplot from Seaborn, a Python data visualization library based on Matplotlib. The pairplot()
[15] function can also be used to plot a subset of the variables. It is an effective plotting method for finding where data points are concentrated.
V. RESULT AND ANALYSIS
The MAE (Mean Absolute Error) of the linear regression model on this dataset is $82,288.22, which represents the average absolute difference between the actual house prices and the prices predicted by the model. The average house price is $1,232,072, so the prediction error is about ±6.67% of the average price, which shows that linear regression could be considered for house price prediction.
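
The percentage above follows from dividing the MAE by the average price. A short check of that arithmetic, using only the two figures reported in this section (the variable names are illustrative), gives roughly 6.7%, in line with the quoted value:

```python
mae = 82_288.22         # MAE reported above, in dollars
mean_price = 1_232_072  # average house price reported above, in dollars

# Relative error: MAE expressed as a percentage of the average price.
relative_error_pct = mae / mean_price * 100
print(f"MAE is {relative_error_pct:.2f}% of the average price")
```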
VI. CONCLUSION AND FUTURE SCOPE
In conclusion, accurately predicting housing prices is crucial in the real estate industry, and machine learning algorithms like linear regression have shown promise in making such predictions. In this paper, we have discussed the background and given special attention to the linear regression model, which shows promise, although its effectiveness depends on the quality of the input data.
REFERENCES
[1] Quang Truong, Minh Nguyen, Hy Dang, Bo Mei, "Housing Price Prediction via Improved Machine Learning Techniques", Procedia Computer Science, Volume 174, 2020, Pages 433-442, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2020.06.111. (https://www.sciencedirect.com/science/article/pii/S1877050920316318)
[2] "Housing Price Prediction Using Linear Regression", International Journal of Emerging Technologies and Innovative Research (www.jetir.org | UGC and issn Approved), ISSN: 2349-5162, Vol. 8, Issue 10, pp. d9-d12, October 2021. Available at: http://www.jetir.org/papers/JETIR2110302.pdf
[3] Nor Hamizah Zulkifley, Shuzlina Abdul Rahman, Nor Hasbiah Ubaidullah, Ismail Ibrahim, "House Price Prediction using a Machine Learning Model: A Survey of Literature", International Journal of Modern Education and Computer Science (IJMECS), Vol. 12, No. 6, pp. 46-54, 2020. DOI: 10.5815/ijmecs.2020.06.04
[4] Li X. Prediction and Analysis of Housing Price Based on the Generalized Linear Regression Model. Comput Intell Neurosci. 2022 Sep 29;2022:3590224. doi: 10.1155/2022/3590224. PMID: 36211010; PMCID: PMC9536958.
[5] Adetunji, Abigail & Funmilola Alaba, Ajala & Ajala, & Oyewo, Ololade & Akande, Yetunde & Oluwadara, Gbenle & OLUWATOBI, AKANDE. (2022). House Price Prediction using Random Forest Machine Learning Technique. Procedia Computer Science. 199. 10.1016/j.procs.2022.01.100.
[6] Neelam Shinde, Kiran Gawande, "Valuation of house prices using predictive techniques", International Journal of Advances in Electronics and Computer Science, ISSN: 2393-2835, Volume-5, Issue-6, Jun.-2018, pp. 34-40.
[7] Lu, Sifei & Li, Zengxiang & Qin, Zheng & Yang, Xulei & Goh, Rick. (2017). A hybrid regression technique for house prices prediction. 319-323. 10.1109/IEEM.2017.8289904.
[8] A. Begum, N. J. Kheya, and Md. Z. Rahman, "Housing Price Prediction with Machine Learning," International Journal of Innovative Technology and Exploring Engineering, vol. 11, no. 3. Blue Eyes Intelligence
