House Price Prediction Using Machine Learning

This document discusses using machine learning algorithms to predict house prices. It describes preprocessing the data, including encoding variables, imputing missing values, and transforming features. Various regression algorithms like decision trees and lasso are used for modeling. The models are then used to forecast property prices. Data cleaning techniques are also outlined, such as dropping columns, handling null values, and creating new features from existing data. Feature engineering steps include dimensionality reduction and creating a "price per square foot" feature.


www.ijcrt.org © 2023 IJCRT | Volume 11, Issue 5 May 2023 | ISSN: 2320-2882

HOUSE PRICE PREDICTION USING MACHINE LEARNING
Bharti Vidhury, Ansh Tyagi*1, Jayant Kumar Jyoti*2, Rajat Sharma*3, Kaustubh Upadhyay*4
1. Assistant Professor, Department of Computer Science & Engineering, SRM Institute of Science & Technology, Ghaziabad
2. Department of Computer Science & Engineering, SRM Institute of Science & Technology, Modinagar, Ghaziabad
3. Department of Computer Science & Engineering, SRM Institute of Science & Technology, Modinagar, Ghaziabad
4. Department of Computer Science & Engineering, SRM Institute of Science & Technology, Modinagar, Ghaziabad
5. Department of Computer Science & Engineering, SRM Institute of Science & Technology, Modinagar, Ghaziabad

ABSTRACT -

This project demonstrates the use of machine learning algorithms in the prediction of house/villa prices. The House Price Index (HPI) is commonly used to estimate changes in housing prices. Since housing price is strongly correlated with other factors such as location, area, and population, information beyond the HPI is required to predict individual housing prices. This project comprehensively validates multiple techniques in model implementation using AWS EC2 (Amazon Elastic Compute Cloud) and provides an optimistic result for housing price prediction.

Keywords: Dataset, Classifications, Machine Learning.

1. INTRODUCTION -

This section is about machine learning problems and their solution methods. Generally, machine learning problems can be classified into classification (or binary) problems and distributive problems. Here we are dealing with a distributive problem, in which we have to process the data, categorize it, and remove the null values; only then can we train the model. For distributive problems we mostly use LRA or the Random Forest Algorithm (RFA). In forward problems, we have used linear regression to predict the number of cases in the near future with the help of real-time data collected from various sources; here the two parameters are Days and Confirmed Cases. In a random forest, we train several decision trees on the collected data, and the result is then based on the output of all the decision trees.
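The way a random forest combines its decision trees can be illustrated with a minimal sketch. It assumes scikit-learn's RandomForestRegressor and purely synthetic data (the areas, prices, and parameter values below are invented for illustration and are not taken from the paper's dataset). For regression, the forest's prediction is the average of the individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): price grows roughly linearly with area, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(500, 3000, size=(200, 1))        # square-foot areas
y = 0.05 * X[:, 0] + rng.normal(0, 5, size=200)  # prices

# Train a small forest; 20 trees is an arbitrary choice for the sketch.
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Collect each tree's prediction for two new areas and average them by hand.
X_new = np.array([[1000.0], [2000.0]])
per_tree = np.array([tree.predict(X_new) for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)

# The manual average matches what the forest itself returns.
print(averaged, forest.predict(X_new))
```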

IJCRT2305570 International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.org e523


JUPYTER NOTEBOOK

Jupyter Notebook is an interactive Python environment, bundled with Anaconda, that is used for machine learning and data science work. It is free and easy to use.

Figure 1: Jupyter Notebook interface

SYSTEM ANALYSIS

We are going to discuss the various experiments done to find the most accurate model for calculating house prices. We will discuss the problem and the system we are building to solve it. We will be dealing with various machine learning algorithms such as decision trees, linear regression, and random forest. We will analyze the model to find how the price of a house depends on parameters such as location, total area, and the number of bedrooms and bathrooms.

2. METHODOLOGY

2.1 PRE-PROCESSING

In this phase, we encode variables. As part of the clean-up, we impute missing values. Then, all attempts are made to remove disparity within the set. The dataset is then partitioned into a training set and a test set. The steps involved are:
1. Transform categorical features into numerical variables.
2. Replace non-numeric or missing data with correct values without disturbing the central tendency.
3. Standardize or normalize the data.
4. Divide the dataset into train and test sections.

The null values of the ‘balcony’ feature are imputed with the mode. The null values of the ‘bath’ feature have been imputed with the mode, i.e., ‘2 BHK’, in both sets. We observe that area values are in square meters. They are transformed into square feet, as this is practically more relevant.

2.2 MODELING

This stage uses regression algorithms such as decision tree and lasso. These algorithms are well suited to regression problems.

2.3 PRICE PREDICTION

Following the modeling results, we will forecast a property’s price and discuss the findings.

Figure 2: The proposed structure of the methodology
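The pre-processing steps of Section 2.1 can be sketched roughly as follows. This is a minimal illustration using pandas and scikit-learn, not the authors' actual notebook: the sample rows, the column names (area_sqm, total_sqft, bath, balcony, price), and the 80/20 split ratio are assumptions. It covers mode imputation, the conversion from square meters to square feet, and the train-test split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

SQM_TO_SQFT = 10.7639  # 1 square meter is about 10.7639 square feet

# Tiny made-up sample; column names are assumptions for illustration.
df = pd.DataFrame({
    "area_sqm": [93.0, 120.0, 65.0, 150.0, 80.0, 110.0],
    "bath": [2.0, None, 1.0, 3.0, 2.0, 2.0],
    "balcony": [1.0, 2.0, None, 2.0, 1.0, 1.0],
    "price": [50.0, 75.0, 30.0, 120.0, 45.0, 70.0],
})

# Impute missing values with the mode (most frequent value) of each column.
for col in ["bath", "balcony"]:
    df[col] = df[col].fillna(df[col].mode()[0])

# Convert the area from square meters to square feet.
df["total_sqft"] = df["area_sqm"] * SQM_TO_SQFT
df = df.drop(columns="area_sqm")

# Separate features from the target and split into train and test sets.
X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)
```

Imputing with the mode leaves the most frequent value of each column unchanged, which is one way of respecting the "without disturbing central tendency" requirement in step 2.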


IMPLEMENTATION AND RESULT

IMPLEMENTATION

We first build a model using sklearn and linear regression on the Bangalore home prices dataset from kaggle.com. Then we write a Python Flask server that uses the saved model to serve HTTP requests. The third component is a website built in HTML, CSS, and JavaScript that allows the user to enter the home's square-foot area, number of bedrooms, etc., and calls the Python Flask server to retrieve the predicted price.

Data cleaning for the house price prediction project includes loading the dataset into pandas, examining its features, dropping certain columns, handling null values, and creating a new column called BHK to account for inconsistencies in the size feature. The dataset has 13,000 rows, and the dependent variable is price. The setup covers installing the Anaconda distribution, which provides Jupyter Notebook, and importing basic libraries such as pandas. The data cleaning process involves dropping null values and applying a lambda function to derive the BHK column as an integer value.

Figure 3: Model using sklearn

Figure 4: Data handling

Feature engineering involves creating a new "price per square foot" feature that can help in outlier detection and removal. Dimensionality reduction techniques are used to handle high-dimensionality problems such as too many locations in the categorical feature "location", which is reduced by grouping locations under an "other" category.

Figure 5: Feature engineering

In a conversation with a business manager who is an expert in real estate, we were told that the typical area per bedroom is 300 sq ft (i.e., a 2 BHK apartment is a minimum of 600 sq ft). If, for example, a 400 sq ft apartment is listed with 2 BHK, it seems suspicious. Such outliers can be removed by setting our minimum threshold per BHK to 300 sq ft.
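The cleaning, feature engineering, and domain-knowledge outlier steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the sample rows are invented, the column names (location, size, total_sqft, price) are assumed, price is assumed to be in lakh rupees, and the cut-off of fewer than 2 listings for grouping a location under "other" is a hypothetical value chosen so the tiny sample shows the effect.

```python
import pandas as pd

MIN_SQFT_PER_BHK = 300  # threshold suggested by the domain expert in the text

# Made-up rows; column names and units are assumptions for illustration.
df = pd.DataFrame({
    "location": ["Whitefield", "Whitefield ", "Hebbal", "Hebbal", "Yelahanka", None],
    "size": ["2 BHK", "3 Bedroom", "2 BHK", "4 BHK", "2 BHK", "2 BHK"],
    "total_sqft": [1000.0, 1500.0, 1100.0, 2400.0, 400.0, 900.0],
    "price": [50.0, 90.0, 66.0, 240.0, 24.0, 40.0],  # assumed to be in lakh rupees
})

# Data cleaning: drop rows with nulls, then derive an integer BHK column
# from the inconsistent 'size' strings with a lambda.
df = df.dropna().copy()
df["bhk"] = df["size"].apply(lambda s: int(s.split(" ")[0]))

# Feature engineering: price per square foot (1 lakh = 100,000 rupees).
df["price_per_sqft"] = df["price"] * 100000 / df["total_sqft"]

# Dimensionality reduction: group rarely seen locations under "other".
df["location"] = df["location"].str.strip()
counts = df["location"].value_counts()
rare = counts[counts < 2].index  # hypothetical cut-off
df["location"] = df["location"].apply(lambda loc: "other" if loc in rare else loc)

# Outlier removal: keep only homes with at least 300 sq ft per bedroom.
cleaned = df[df["total_sqft"] / df["bhk"] >= MIN_SQFT_PER_BHK]
```

In this sample the 400 sq ft, 2 BHK row has only 200 sq ft per bedroom, so it is the one removed by the threshold.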



Outlier detection and removal is an important step in a real estate price prediction project. Outliers are data points that represent extreme variations in a dataset and can create issues. Techniques for detecting and removing outliers include using the standard deviation or domain knowledge. An example of using domain knowledge is removing data points where the square footage per bedroom is less than a typical threshold of 300.

The scatter plot shows 2 BHK properties as green stars and 3 BHK properties as blue dots; the x-axis shows the price for that area and the y-axis shows the total square-foot area.

Figure 5: Scatter plot

Figure 6: Histogram

K-fold cross-validation and grid search CV are used to find the best algorithm and parameters. One-hot encoding is used to convert the location column from categorical to numerical data. The pandas get_dummies method is used for one-hot encoding. A separate data frame is created for the dummy columns and appended to the main data frame. To avoid the dummy variable trap, one less dummy column is used. The data frame is prepared for model building by dropping the unnecessary columns and creating X and Y variables. The data set is split into a training and a test data set. A linear regression model is created and trained on the X and Y training data. The model score is evaluated to be 84%, which is considered decent. K-fold cross-validation and shuffle split are used to evaluate the model’s performance with different samples.

Now that our model is built and the artifacts are exported, the next step is to write a Python Flask server that can serve HTTP requests made from the UI and predict the house prices. This Flask server will be used as the back end for our UI application. The first step is to download PyCharm Community Edition. The project directory has three subfolders (client, server, and model) and two artifacts (a saved model and a columns JSON file). In the server folder, create a file named server.py and import the Flask module. Configure the interpreter as Anaconda in the File Settings. Define the main function to run the Flask app on a specific port. Define a simple 'hello' function that returns "Hi" using the app.route() method. Run the Flask server on a specific URL by using the 'python server.py' command in the terminal. Create a subdirectory within the server directory named 'artifacts' and copy the exported model and columns JSON file into it. Define a function named 'get_location_names' in util.py to read the columns JSON file and return a list of all the locations. Import util.py in server.py and call the get_location_names() function to return a JSON response containing all the location names. Load the saved artifacts into global variables using a function named 'load_saved_artifacts' in util.py. Define a function named 'get_estimated_price' in util.py to return the predicted price for a given location, total square feet, number of bedrooms, and number of bathrooms.
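A possible shape for the three util.py helpers named above is sketched here. Only the function names come from the text; everything else is an assumption made for illustration: that the columns JSON file has the form {"data_columns": [...]}, that the first three columns are total_sqft, bath, and bhk followed by one dummy column per location, that the model was saved with pickle, and that the file paths are passed in as arguments.

```python
import json
import pickle

import numpy as np

# Module-level state filled in by load_saved_artifacts().
_data_columns = None
_locations = None
_model = None


def load_saved_artifacts(columns_path, model_path):
    """Load the columns JSON file and the pickled model into global variables."""
    global _data_columns, _locations, _model
    with open(columns_path) as f:
        # Assumed layout: {"data_columns": ["total_sqft", "bath", "bhk", <locations>...]}
        _data_columns = json.load(f)["data_columns"]
    _locations = _data_columns[3:]  # assumption: locations follow the 3 numeric columns
    with open(model_path, "rb") as f:
        _model = pickle.load(f)


def get_location_names():
    """Return the list of known location names."""
    return _locations


def get_estimated_price(location, sqft, bhk, bath):
    """Build a one-hot feature vector and return the model's predicted price."""
    x = np.zeros(len(_data_columns))
    x[0], x[1], x[2] = sqft, bath, bhk
    if location.lower() in _data_columns:
        # Set the dummy column of the requested location to 1.
        x[_data_columns.index(location.lower())] = 1
    return round(float(_model.predict([x])[0]), 2)
```

A location that is not among the known columns leaves every dummy column at zero, which corresponds to the dummy column that was dropped during encoding.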
Import util.py in server.py and call the
get_estimated_price() function to return a JSON
response containing the predicted price. Test the
Flask server by running the 'python server.py'
command in the terminal, opening the URL in the
browser, and testing the 'get_location_names' and
'get_estimated_price' functions.
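The server.py steps described above might look roughly like the sketch below. It is not the authors' file: the route paths, the form field names, and the JSON keys are hypothetical, and the two helper functions are placeholder stand-ins defined inline so the example runs on its own (in the project they would be imported from util.py and backed by the saved model).

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


# Placeholder stand-ins for the util.py helpers (assumptions, not the real logic).
def get_location_names():
    return ["hebbal", "whitefield"]


def get_estimated_price(location, sqft, bhk, bath):
    return round(0.05 * sqft, 2)  # dummy formula, not a trained model


@app.route("/hello")
def hello():
    # Simple check that the server is up.
    return "Hi"


@app.route("/get_location_names")
def location_names():
    # Return all known locations as a JSON response.
    return jsonify({"locations": get_location_names()})


@app.route("/predict_home_price", methods=["POST"])
def predict_home_price():
    # Read the form fields sent by the UI and return the estimated price as JSON.
    form = request.form
    price = get_estimated_price(
        form["location"],
        float(form["total_sqft"]),
        int(form["bhk"]),
        int(form["bath"]),
    )
    return jsonify({"estimated_price": price})


# To serve locally, call app.run(port=5000) from a main function
# (the port number is an arbitrary choice) and start it with 'python server.py'.
```

The endpoints can be exercised without a browser through Flask's built-in test client, for example app.test_client().get("/hello").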

Figure 7: Python Flask server

DEPLOY

We deploy the machine learning model to production on an Amazon EC2 instance using Flask and Nginx. The project to be deployed is a Bangalore home price prediction website. The website and the Flask server will run on the same Amazon EC2 instance. The architecture of the application uses Nginx as the web server and Flask as the Python server. Nginx will handle two kinds of HTTP requests, one for the website and the other for the Flask server. The website files will be returned by Nginx, while the Flask server will handle the prediction request using a saved ML model.

Figure 8: Website

RESULT

We build a website using HTML, CSS, and JavaScript to serve as the front end of the house price prediction project. The website communicates with a back-end server using jQuery to retrieve data and estimate prices. We used Visual Studio Code as the code editor for the website. The HTML code contains two sections, head and body, and the body includes various UI elements such as input fields, dropdowns, and buttons. The JavaScript code communicates with the back-end server to retrieve data and dynamically populate the dropdowns. Finally, we implement a function for the "estimate price" button to provide the estimated price to the user. The Python Flask server is used as the back end for the UI application; Flask is a module that allows writing a Python service that can serve HTTP requests.

CONCLUSION

In this paper, an overview of the concept of machine learning along with its various applications is discussed. Taking the sample dataset for houses and considering its various attributes, house prices have been predicted using regression methods trained on prior data, with clustering used to inspect the quality of the solution and output. House selling prices are calculated using various algorithms. The selling price was calculated with better accuracy. This will be of great help to people. Various factors that affect home prices need to be considered and addressed.
FUTURE SCOPE

The accuracy of the system may be improved. Several more cities may be covered by the system if its scale and computational power increase. In addition, we can integrate different UI/UX methods to visualize the results in a more interactive way using Augmented Reality.