Housing Price Prediction Model Using Machine Learning
Authorized licensed use limited to: VTU Consortium. Downloaded on April 11,2025 at 13:48:48 UTC from IEEE Xplore. Restrictions apply.
2023 International Conference on Sustainable Emerging Innovations in Engineering and Technology (ICSEIET)
in rural areas with limited amenities. We also looked at various predictive models, such as SVR, ANN, and XGBoost, that have been developed and show positive correlations with house prices.

The article [4] describes how to use generalized linear regression models to further improve the reliability of house price prediction and analysis. The paper covers the basics of data mining and examines cluster analysis algorithms, with generalized linear regression models chosen as the focus of the research. It analyzes the general estimation methods for generalized linear regression models, nonparametric regression models, and partial linear models, and it verifies the validity of the proposed model through comparative experiments. The experimental results show that the house price prediction model based on the generalized regression model achieves high prediction accuracy. Overall, the paper gives an informative discussion of generalized linear regression models for house price prediction and analysis.

The study in [5] uses the random forest machine learning technique on the Boston housing dataset to predict prices from the dataset's variables. The authors compared the predicted and actual prices and found that the model achieved a ±5 difference, which showed that the model is useful for predicting house prices.

In [6], the authors found that the decision tree gives the most promising result, with the highest accuracy of 84.64%. Lasso, a supervised regularization technique used in machine learning, gives the lowest accuracy, 60.32%. The accuracies of logistic regression and support vector regression are 72.81% and 67.81%, respectively.

In [7], the results show that hybrid regression performs much better than lasso, ridge, and gradient-boosted regression. The best hybrid regression result, 0.11260 on the test data, was obtained by combining 65% lasso with 35% gradient boosting.

III. METHODOLOGY USED
In this research, we applied a linear regression model to the dataset.

A. Linear Regression
Linear regression [8] is a machine learning algorithm that performs a regression task. It is primarily used to identify the relationship between variables and for forecasting. Fig. 1 represents the proposed linear regression workflow. As shown in Fig. 1, linear regression uses a given independent variable (x) to predict the value of a dependent variable (y); that is, it finds a linear relationship between x (input) and y (output).

Fig. 1. Linear Regression

The dataset and the methodology used are explained in the following subsections.

B. Dataset Used
The dataset for this work, housing_data.csv, was collected from the Kaggle repository [10]. For each record it contains the average income of people living in the region, the average age of a house in the region, the average number of rooms in a house in the region, the average number of bedrooms in a house in the region, the population of the region where the house is located, the price of the house, and the address of the house in that area.

C. Pre-processing Steps
To run the code, install the necessary libraries and set up the environment for the project. The libraries used in this research are Scikit-Learn, Pandas, Seaborn, Matplotlib, and NumPy, and all the data shown in the figures were obtained using the linear regression technique.

The initial step is to import the required libraries. We use pandas to load the dataset, scikit-learn (sklearn) to split the data and train the model, and the relevant sklearn functions to evaluate the model's performance. The next step is to load the dataset, which in this case is a CSV file called 'housing_data.csv'. We then divide the data into X (features) and y (target variable) using the pandas drop() function. We drop the 'Price' column because it is the target variable we want to predict, and we also drop the 'Address' column because it contains text data, which is not needed for linear regression modeling. After that, we use the train_test_split() function from Scikit-Learn to split the available data into training and test sets. Once the data is split, we train the linear regression model: we create an instance of the LinearRegression class from sklearn.linear_model and fit it to the training data using the fit() method. Finally, we evaluate the performance of the model using the mean_absolute_error(), mean_squared_error(), and root_mean_squared_error() functions from sklearn.metrics. We pass these functions the actual and predicted values for the test data, and they return evaluation metric scores that we can use to assess how well the model is performing.

IV. EXPERIMENTAL RESULT ANALYSIS
The proposed model was trained on the dataset using linear regression. The R-squared value [11] (a statistical measure of how close the data are to the fitted regression line) measures how well the model fits the data. It typically ranges from 0 to 1, where 1 indicates a perfect fit, although on test data it can be negative when the model predicts worse than the mean. RMSE and MAE measure how well the model predicts the target variable, with lower values indicating better performance. Examining the coefficients of the linear regression model shows the direction and strength of the relationship between each independent variable and the target variable (the magnitude of a coefficient depends on the units of its feature). For example, a positive coefficient for the number of bedrooms indicates that, with the other features held constant, homes with more bedrooms tend to be more expensive.
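The workflow described in the pre-processing steps of Section III-C can be sketched in Python with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' code: the helper name train_and_evaluate, the 40% test split, and the random seed are hypothetical choices, and RMSE is computed as the square root of the MSE because root_mean_squared_error() is only available in newer scikit-learn releases (1.4 and later). R-squared is computed with sklearn's r2_score().

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(df: pd.DataFrame, test_size: float = 0.4, seed: int = 101):
    """Fit a linear regression that predicts 'Price' from the other numeric columns.

    Hypothetical helper: test_size and seed are illustrative assumptions,
    not values reported in the paper.
    """
    # X: every column except the target ('Price') and the free-text 'Address'.
    X = df.drop(columns=["Price", "Address"])
    y = df["Price"]

    # Hold out part of the data as a test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Ordinary least-squares linear regression fitted on the training split only.
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Metrics compare actual test prices (first argument) with predictions.
    # RMSE = sqrt(MSE) works even on versions without root_mean_squared_error().
    mse = mean_squared_error(y_test, y_pred)
    metrics = {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_test, y_pred),
    }

    # One coefficient per feature: its sign gives the direction of the relationship.
    coefficients = pd.Series(model.coef_, index=X.columns)
    return model, metrics, coefficients
```

To apply it to the actual data, one would call train_and_evaluate(pd.read_csv("housing_data.csv")), assuming the file contains the 'Price' and 'Address' columns named in Section III-C and that all remaining columns are numeric. Returning the coefficients as a Series indexed by column name makes it easy to read off the sign of, for example, the bedrooms coefficient discussed above.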
The model used all the data in the dataset for processing and evaluation, and the results obtained are shown in the figures below.
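To make the metric definitions in Section IV concrete, the following sketch computes MAE, MSE, RMSE, and R-squared directly from their formulas with NumPy. The function name regression_metrics and the four actual/predicted values are hypothetical, chosen only so the arithmetic is easy to follow; they are not data from this study.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R-squared from their definitions (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))   # average absolute error
    mse = np.mean(residuals ** 2)      # average squared error
    rmse = np.sqrt(mse)                # same units as the target

    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": float(mae), "MSE": float(mse), "RMSE": float(rmse), "R2": float(r2)}


if __name__ == "__main__":
    # Hypothetical actual and predicted values; residuals are [1, 0, -1, 0].
    print(regression_metrics([3, 5, 7, 9], [2, 5, 8, 9]))
    # → MAE 0.5, MSE 0.5, RMSE ≈ 0.7071, R2 ≈ 0.9
```

Here the residual sum of squares is 2 and the total sum of squares around the mean of 6 is 20, so R-squared is 1 - 2/20 = 0.9, while MAE and RMSE both stay in the units of the target.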