House Price Prediction with ML Techniques
2024-25
GANDHI INSTITUTE OF TECHNOLOGY AND MANAGEMENT
(GITAM) BHUBANESWAR, ODISHA
CERTIFICATE
This is to certify that the work in this Project Report entitled
“HOUSE PRICE PREDICTION USING PYTHON” by
Suchismita Sahoo (2221304049), Suraj Sahoo (2221304051), Tusar
Kanta Dhal (2221304052), and Dibyarashmi Bhanja (2231304002)
has been carried out under my supervision in partial fulfillment of
the requirements for the [Link] in Computer Science &
Engineering during the session 2024-2025 in the Department of Computer
Science & Engineering of GITAM, and this work is the original
work of the above students.
ACKNOWLEDGMENT
DECLARATION
We declare that every part of project report submitted is
genuinely our work and has not been submitted for the
award of any other degree. We acknowledge that, if any
sort of malpractice is detected in relation to this project,
we shall be held liable for it.
Submitted By:
Suchismita Sahoo
Suraj Sahoo
Tusar Kanta Dhal
Dibyarashmi Bhanja
ABSTRACT
LIST OF TABLES
Table 1 Application
LIST OF FIGURES
TABLE OF CONTENTS
Page Number
Certificate ii
Acknowledgement iii
Abstract iv
List of Tables v
List of Figures v
PHASE-I
1. Introduction
1.1 Introduction 1
1.2 Motivation 2
2. Literature Survey
2.1 Literature Survey 3
3. Proposed Work
3.1 Objective of proposed work 9
3.2 Methodology 9
3.2.1 Introduction to machine learning 9
3.2.2 How does Machine Learning work? 11
3.2.3 Need for Machine Learning 12
3.2.4 Applications of Machine learning: - 12
3.2.5 Machine Learning Classifications 15
3.2.5.1 Supervised Learning 15
3.2.5.2 Unsupervised Machine Learning 28
3.2.5.3 Reinforcement Learning 30
PHASE-II
4. Implementation
4.1 Code 31
5. Result Analysis
5.1 Visualization Insights 33
5.2 Advantages 36
5.3 Disadvantages 37
5.4 Maintenance 39
5.5 Application 41
6. Conclusion and Future Development 42
Reference 43
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION:
1.2 Motivation
We are highly interested in anything related to Machine Learning, and this
independent project gave us the opportunity to study the subject and reaffirm our
passion for it. The capacity to generate estimates and forecasts, and to give
machines the ability to learn on their own, is both powerful and nearly unlimited
in its application possibilities. Machine Learning may be applied in finance,
medicine, and virtually any other field. That is why we chose to base our project
on Machine Learning.
CHAPTER 2
LITERATURE SURVEY
Regardless, we still lack accurate, standardized approaches for estimating real
estate property values.
First, we reviewed several articles and discussions on machine learning for
housing price prediction. One article, titled "House Price Prediction", is based
on machine learning and neural networks and reports minimal error and high
accuracy. Another paper builds hedonic models on price data from Belfast and
infers that submarkets matter for residential valuation; the model is used to
identify submarkets over a larger spatial scale, with implications for the
evaluation process, the selection of comparable evidence, and the quality of the
variables that the valuations may require. Understanding current developments in
house prices and homeownership is the subject of a further study, which describes
a feedback mechanism, or social epidemic, that fosters a perception of property
as an essential market investment.
In this section, first of all, the basic economic structure affecting housing prices is
emphasized. Houses meet the shelter needs of people and are also an investment tool.
The housing market differs from other markets in that housing is both a
consumption and an investment good. Housing markets differ from other markets in
that the housing supply is very costly, the housing is permanent and continuous,
heterogeneous, fixed, causes growth in the secondary markets, and is used as a
guarantee (Iacoviello, 2000). The housing market is formed through a mechanism of
housing supply and demand. In the housing market, unlike the goods and services
market, the housing supply is inelastic. Supply and demand for housing change and
develop over time depending on the economic, social, cultural, geographical, and
demographic realities of the countries. Meeting the housing demand is associated
with housing policies and economic conditions. Housing demand arises for different
purposes such as consumption, investment, and wealth accumulation. The supply
and demand factors change according to the type of housing demand. In addition to
the input costs of the house as a product,
the determination of the price of the house is affected by many variables such as
people’s income level, marital status, industrialization of the society and agricultural
employment rate, interest rates, population growth and migration, and all variables
also affect the price. Since changes in housing prices affect both socio-economic
conditions and national economic conditions, it is an important issue that concerns
governments and individuals (Kim and Park, 2005). In
this part of the literature, some studies that estimate housing prices are cited. The
prediction of house prices from real factors is important for such studies. With the
developments in artificial intelligence methods, many problems in daily life, such
as purchasing a house, can now be solved. The competitive nature of the housing
sector supports the data mining process in this industry, processing its data and
predicting future trends. Regression is a machine learning tool that builds
predictions from available measurable information by exploiting the links between
the target parameter and many different independent parameters. The cost of a house
depends on several parameters, and machine learning is one of the most important
fields in which to model these parameters and predict prices with high accuracy.
The machine learning method is one of the most recent methods used for prediction.
It is used to interpret and analyze highly complex data structures and patterns
(Ngiam and Khor, 2019). Machine learning enables computers to learn and behave
like humans (Feggella, 2019). Machine learning means providing a valid dataset on
which predictions are based; the machine learns how important a particular event
might be to the whole system from pre-loaded data and predicts the outcome
accordingly. Various modern applications of this technique include predicting
stock prices, predicting the probability of an earthquake, and predicting company
sales, and the list has infinite possibilities (Shiller, 2007). Unlike traditional
econometric models, machine learning algorithms do not require the training data
to be normally distributed. Many statistical tests rely on the assumption of
normality. If the data are not
normally distributed, these statistical tests fail and become invalid. These
processes used to take a long time; today, however, they can be completed quickly
with the high-speed computing power of modern computers, making the technique less
costly and less time-consuming. Rafiei and Adeli (2016) used SVR to
determine whether a property developer should build a new development or stop the
construction at the beginning of a project based on the prediction of future house
prices. The study, in which data from 350 apartment houses built in Tehran (Iran)
between 1993 and 2008 were used, had 26 features such as zip code, gross floor area,
land area, estimated cost of construction, construction time, and property prices. Its
results revealed that SVR was a suitable method for making home price predictions
since the loss of prediction (error) was as low as 3.6% of the test data. Therefore, the
prediction results provide valuable input to the property developer’s decision-making
process. Cechin et al. (2000) analyzed the data of buildings for sale and rental in
Porto Alegre, Brazil, using linear regression and artificial neural network methods.
They used parameters such as the size of the house, district, geographical location,
environmental arrangement, number of rooms, building construction date and total
area of use. According to the study, they reported that the artificial neural network
method was more useful compared to linear regression. Yu and Wu (2016) used
classification and regression algorithms. According to their analysis, living area
in square meters, roof content, and neighborhood have the greatest statistical
significance in predicting the selling price of a house, and the prediction analysis
can be improved by the Principal Component Analysis (PCA) technique, because
the value of a particular property is closely associated with the infrastructure
facilities surrounding it. Koktashev et al. (2019) attempted to predict
house values in the city of Krasnoyarsk by using 1,970 housing transaction records.
The number of rooms, total area, floor, parking lot, type of repair, number of
balconies, type of bathroom, number of elevators, garbage disposal, year of
construction and accident rate of the house were discussed as the features in that
study. They applied random forest, ridge regression, and linear regression to predict
the property prices. Their study concluded
that the random forest outperformed the other two algorithms, as evaluated by the
Mean Absolute Error (MAE). Park and Bae (2015) developed a house price
prediction model with machine learning algorithms in real estate research and
compared their performance in terms of classification accuracy. Their study aimed at
helping real estate sellers or real estate agents to make rational decisions in real
estate transactions. The tests showed that the accuracy-based Repeated Incremental
Pruning to Produce Error Reduction (RIPPER) consistently outperformed other
models in house price prediction performance. Bhagat et al. (2016) studied on linear
regression algorithms for house prediction. The aim of the study was to predict the
effective price of the real estate for clients based on their budget and priorities. They
indicated that the linear regression technique of the analysis of past market trends
and price ranges could be used to determine future house prices. In their study,
Mora-Esperanza and Gallego (2004) analyzed house prices in Madrid using 12
parameters. The parameters they used were the distance to the city center, road, size
of the district, construction class, age of the building, renovation status, housing
area, terrace area, location within the district, housing design, the floor and the
presence of outbuildings. The dataset was created assuming that the sales values of
100 houses for sale in the region were the real values. The researchers, who used
ANN and linear regression analysis techniques, reported that the ANN technique
was more successful, achieving an average agreement of 95% and an accuracy of
86%. Wang and Wu (2018) used 27,649 home appraisal price records from
Arlington County, Virginia, USA in 2015 and suggested that Random Forest
outperformed linear regression in terms of accuracy. In their study on the case of
Mumbai, India, Varma et al. (2018) attempted to predict the price of the house by
using various regression techniques (Linear Regression, Forest regression, boosted
regression) and artificial neural network technique based on the features of the house
(usage area, number of rooms, number of bathrooms, parking lot, elevator,
furniture). In conclusion, they determined that the efficiency of the algorithm with
the use of artificial neural networks was higher compared to other regression
techniques. They also revealed that the system prevented the
risk of investing in the wrong house by providing the right output. Thamarai and
Malarvizhi (2020) attempted to predict the prices of houses from real-time data
after the large fluctuation in house price increases in 2018 at the Tadepalligudem
location of West Godavari District in Andhra Pradesh, India using the features of the
number of bedrooms, age of the house, transportation facilities, nearby schools, and
shopping opportunities. They applied these models in decision tree regression and
multiple linear regression techniques, which are among the machine learning
techniques. They suggested that the performance of multiple linear regression was
better than decision tree regression in predicting the house prices.
Zhao et al. [1] applied deep learning in combination with Extreme Gradient
Boosting (XGBoost) to real estate price prediction, analyzing historical
property sale records. The dataset was extracted from an online real estate
website and split into 80% for training and 20% for testing. According to Satish
et al. [2], regression deals with specifying the relationship between a dependent
variable (also called the response or outcome) and independent variables
(predictors). Their study aimed to predict future house prices with the help of
machine learning algorithms.
CHAPTER 3
PROPOSED WORK
3.1 Objective of proposed work
3.2 Methodology
3.2.1 Introduction to Machine Learning
We are surrounded by humans who can learn everything from their experiences with
their learning capability, and we have computers or machines which work on our
instructions. But can a machine also learn from experiences or past data like a
human does? Here comes the role of Machine Learning. It is a science that will
improve further in the future. The reason behind this development is the difficulty
of analyzing and processing the rapidly increasing data. Machine learning is based
on the principle of finding the best model for new data among the previous data,
thanks to this increasing data. Therefore, machine learning research will continue
in parallel with the growth of data. This research includes the history of machine
learning, the methods used in machine learning, its application fields, and the
research on this field. The aim of this study is to transmit knowledge on machine
learning, which has become very popular nowadays, and its applications to
researchers. There is no error margin in the operations carried out by computers
based on an algorithm, as the operation follows certain steps. Different from
commands written to produce an output from an input, there are situations where
computers make decisions based upon the present sample data. In those situations,
computers may make mistakes, just like people, in the decision-making process.
That is, machine learning is the process of equipping computers with the ability
to learn by using data and experience like a human brain. The main aim of machine
learning is to create models which can train themselves to improve, perceive
complex patterns, and find solutions to new problems by using previous data.
3.2.2 How does Machine Learning work?
A Machine Learning system learns from historical data, builds prediction models,
and, whenever it receives new data, predicts the output for it. The accuracy of
the predicted output depends upon the amount of data, as a huge amount of data
helps to build a better model which predicts the output more accurately. Suppose
we have a complex problem where we need to perform some predictions; instead of
writing code for it, we just feed the data to generic algorithms, and with the
help of these algorithms, the machine builds the logic as per the data and
predicts the output. Machine learning has changed our way of thinking about such
problems. The block diagram below explains the working of a Machine Learning
algorithm:
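The learn-from-data loop described above can be sketched with scikit-learn; the numbers below are invented toy data (house size in sqft versus price in lakhs), not the project's dataset.

```python
# Minimal sketch of "historical data in, model out, prediction on new data".
from sklearn.linear_model import LinearRegression

# Illustrative historical data: size in sqft -> price in lakhs (made up)
X_train = [[600], [800], [1000], [1200]]
y_train = [30, 40, 50, 60]

model = LinearRegression()
model.fit(X_train, y_train)           # build the prediction model from historical data

prediction = model.predict([[900]])   # new, unseen input
print(round(prediction[0], 1))        # -> 45.0 on this perfectly linear toy data
```

As the text notes, more (and better) historical data generally yields a better-fitted model and more accurate predictions.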
3.2.3 Need for Machine Learning
The need for machine learning is increasing day by day. The reason behind the need
for machine learning is that it is capable of doing tasks that are too complex for a
person to implement directly. As humans, we have some limitations: we cannot
access huge amounts of data manually, so we need computer systems, and here
machine learning makes things easy for us. We can train machine learning
algorithms by providing them with huge amounts of data and letting them explore
the data, construct models, and predict the required output automatically. The
performance of a machine learning algorithm depends on the
amount of data, and it can be determined by the cost function. With the help of
machine learning, we can save both time and money. The importance of machine
learning can be easily understood by its use cases. Currently, machine learning
is used in self-driving cars, cyber fraud detection, face recognition, friend
suggestion by Facebook, etc. Various top companies such as Netflix and Amazon
have built machine learning models that use vast amounts of data to analyze user
interests and recommend products accordingly.
Machine learning also powers everyday tools such as Google Maps, Google
Assistant, Alexa, etc.
Machine Learning Life Cycle:
The machine learning life cycle is a cyclic process to build an efficient machine
learning project. The main purpose of the life cycle is to find a solution to the
problem or project. The machine learning life cycle involves seven major steps,
which are given below:
• Gathering Data
• Data preparation
• Data Wrangling
• Analyze Data
• Train the model
• Test the model
• Deployment
Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of
this step is to identify and obtain all the data relevant to the problem. In this
step, we need to identify the different data sources, as data can be collected
from various sources such as files, databases, the internet, or mobile devices.
It is one of the most important steps of the life cycle: the quantity and quality
of the collected data determine the efficiency of the output. The more data there
is, the more accurate the prediction will be. This step includes the tasks below:
• Identify various data sources
• Collect data
• Integrate the data obtained from different sources
Data preparation:
After collecting the data, we need to prepare it for further steps. Data preparation is
a step where we put our data into a suitable place and prepare it to use in our
machine learning training. In this step, first, we put all data together, and then
randomize the ordering of data. This step can be further divided into two processes:
Data exploration:
It is used to understand the nature of data that we have to work with. We need to
understand the characteristics, format and quality of data.
A better understanding of data leads to an effective outcome. In this, we find
Correlations, general trends, and outliers.
Data Wrangling:
Data wrangling is the process of cleaning and converting raw data into a usable
format. It is the process of cleaning the data, selecting the variable to use, and
transforming the data in a proper format to make it more suitable for analysis in the
next step. It is one of the most important steps of the complete process.
Cleaning the data is required to address quality issues. The data we have
collected is not necessarily all useful, as some of it may be irrelevant. In
real-world applications, collected data may have various issues, including:
• Missing Values
• Duplicate data
• Invalid data
• Noise
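As a hedged illustration, the first two issues above (missing values and duplicate data) can be handled with pandas; the table below is invented for the example.

```python
# Toy wrangling sketch: drop duplicate rows, then rows with missing values.
import pandas as pd

raw = pd.DataFrame({
    'total_sqft': [1000, 1000, None, 1500],
    'price':      [50,   50,   60,   75],
})

clean = (raw
         .drop_duplicates()   # duplicate data
         .dropna())           # missing values

print(len(raw), len(clean))   # 4 rows in, 2 rows out
```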
Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step
involves:
• Selection of analytical techniques
• Building models
• Reviewing the results
The aim of this step is to build a machine learning model to analyze the data using
various analytical techniques and review the outcome. It starts with determining
the type of problem, where we select machine learning techniques such as
Classification, Regression, Cluster Analysis, Association, etc., then build the
model using the prepared data, and evaluate the model.
Deployment
The last step of machine learning life cycle is deployment, where we deploy the
model in the real-world system. If the above-prepared model is producing an accurate
result as per our requirement with acceptable speed, then we deploy the model in the
real system. But before deploying the project, we check whether the model's
performance improves with the available data. The deployment phase is similar to
making the final report for a project.
3.2.5.1 Supervised Learning
In supervised learning, machines are trained using well-labelled training data to
understand the dataset and learn about each example; once training and processing
are done, we test the model by providing sample data to check whether it predicts
the correct output. The goal of supervised learning is to map input data to output
data. Supervised learning is based on supervision, just as a student learns under
the supervision of a teacher. An example of supervised learning is spam
filtering. Supervised learning is a
process of providing input data as well as correct output data to the machine learning
model. The aim of a supervised learning algorithm is to find a mapping function to
map the input variable(x) with the output variable(y). In the real-world, supervised
learning can be used for Risk Assessment, Image classification, Fraud Detection,
spam filtering, etc. In supervised learning, models are trained using labelled dataset,
where the model learns about each type of data. Once the training process is
completed, the model is tested on the basis of test data (a held-out subset of the
dataset), and then it predicts the output. The working of supervised learning can
be easily
understood by the below example and diagram:
• If the given shape has four sides, and all the sides are equal, then it will be
labelled as a Square.
• If the given shape has three sides, then it will be labelled as a triangle.
• If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the
model is to identify the shape. The machine is already trained on all types of
shapes, and when it finds a new shape, it classifies the shape on the basis of
the number of sides and predicts the output.
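The shape example above amounts to a simple rule: the label depends on the number of (equal) sides. The tiny hand-written classifier below makes that mapping explicit; a trained supervised model would learn the same rules from labelled examples instead of having them coded by hand.

```python
# Rule-based version of the shape-labelling example above (illustrative only).
def classify_shape(num_sides, all_sides_equal=True):
    if num_sides == 3:
        return 'triangle'
    if num_sides == 4 and all_sides_equal:
        return 'square'
    if num_sides == 6 and all_sides_equal:
        return 'hexagon'
    return 'unknown'

print(classify_shape(4))  # -> square
```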
Fig. 5 Types of supervised Machine learning
Regression
Regression algorithms are used if there is a relationship between the input variable
and the output variable. It is used for the prediction of continuous variables, such as
Weather forecasting, Market Trends, etc. Below are some popular Regression
algorithms which come under supervised learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
Linear regression is one of the easiest and most popular Machine Learning
algorithms. It is a statistical method that is used for predictive analysis. Linear
regression makes predictions for continuous/real or numeric variables such as sales,
salary, age, product price, etc. The linear regression algorithm shows a linear
relationship between a dependent (y) variable and one or more independent (x)
variables, hence it is called linear regression. Since linear regression shows a
linear relationship, it finds how the value of the dependent
variable is changing according to the value of the independent variable. The linear
regression model provides a sloped straight line representing the relationship
between the variables. Consider the below image:
• Simple Linear Regression: If a single independent variable is used to
predict the value of a numerical dependent variable, then such a Linear
Regression algorithm is called Simple Linear Regression.
• Multiple Linear Regression: If more than one independent variable is used
to predict the value of a numerical dependent variable, then such a Linear
Regression algorithm is called Multiple Linear Regression.
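The distinction above can be sketched in scikit-learn: the only difference is whether the feature matrix has one column or several. The feature values (sqft, bedrooms) are illustrative, not the project's data.

```python
from sklearn.linear_model import LinearRegression

y = [40, 50, 60]  # illustrative prices

# Simple linear regression: one independent variable (sqft)
simple = LinearRegression().fit([[800], [1000], [1200]], y)

# Multiple linear regression: several independent variables (sqft, bedrooms)
multi = LinearRegression().fit([[800, 2], [1000, 2], [1200, 3]], y)

print(len(simple.coef_), len(multi.coef_))  # -> 1 2 (one vs two coefficients)
```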
Fig 8 Negative Linear Relationship
Cost function
• Different values for the weights or line coefficients (a0, a1) give different
regression lines, and the cost function is used to estimate the coefficient
values for the best-fit line.
• The cost function optimizes the regression coefficients or weights. It measures
how well a linear regression model is performing.
• We can use the cost function to find the accuracy of the mapping function,
which maps the input variable to the output variable. This mapping function is
also known as the hypothesis function.
• For Linear Regression, we use the Mean Squared Error (MSE) cost function,
which is the average of the squared errors between the predicted values and the
actual values. For the linear equation y = a1*x + a0, the MSE can be calculated
as:

MSE = (1/N) * Σ (Yi − (a1*xi + a0))²

Where:
N = total number of observations
Yi = actual value
(a1*xi + a0) = predicted value
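The MSE above can be computed directly with NumPy; the points below are a made-up example where the candidate line fits the data exactly, so the cost is zero.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # actual values Yi
a1, a0 = 2.0, 0.0               # candidate line coefficients

predicted = a1 * x + a0          # (a1*xi + a0)
mse = np.mean((y - predicted) ** 2)
print(mse)                       # -> 0.0 for this perfect fit
```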
Classification
Classification algorithms are used when the output variable is categorical, which
means there are classes such as Yes-No, Male-Female, True-False, etc. Below are
some popular classification algorithms which come under supervised learning:
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
Random Forest is a classifier that combines a number of decision trees built on
subsets of the given dataset and averages their predictions to improve accuracy
and reduce the problem of overfitting. The diagram below explains the working of
the Random Forest algorithm:
• Land Use: We can identify the areas of similar land use by this
algorithm.
• Marketing: Marketing trends can be identified using this
algorithm.
Decision Tree Classification Algorithm: -
• Decision Tree is a supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules and
each leaf node represents the outcome.
• In a Decision tree, there are two nodes, which are the Decision Node and
Leaf Node. Decision nodes are used to make any decision and have multiple
branches, whereas Leaf nodes are the output of those decisions and do not
contain any further branches.
• The decisions or the test are performed on the basis of features of the given
dataset.
• In order to build a tree, we use the CART algorithm, which stands for
Classification and Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No),
further splits the tree into subtrees.
• Below diagram explains the general structure of a decision tree:
How does the Decision Tree algorithm work?
• Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
• Step-3: Divide S into subsets that contain possible values for the best
attributes.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until a stage is reached where the
nodes cannot be classified further; such a final node is called a leaf node. In
the example, the decision node finally splits into two leaf nodes (Accepted
offers and Declined offers). Consider the below diagram:
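A minimal sketch of a CART-style decision tree with scikit-learn, loosely mirroring the accepted/declined-offer example above; the salary feature and labels are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Feature: [salary in lakhs]; label: 1 = offer accepted, 0 = declined (toy data)
X = [[3], [4], [8], [9]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)  # CART under the hood
print(tree.predict([[7]])[0])  # -> 1 (falls on the "accepted" side of the split)
```

scikit-learn's decision trees use an optimized version of the CART algorithm mentioned above.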
Logistic Regression
• Logistic Regression is a significant machine learning algorithm because it
has the ability to provide probabilities and classify new data using continuous
and discrete datasets.
• Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used for
the classification. The below image is showing the logistic function:
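A hedged sketch of the probability output described above, on invented pass/fail data (hours studied versus outcome):

```python
from sklearn.linear_model import LogisticRegression

# Toy data: hours studied -> pass (1) / fail (0)
X = [[1], [2], [3], [8], [9], [10]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[9]])[0][1]     # probability of class 1 (pass)
print(clf.predict([[9]])[0], proba > 0.5)  # class 1, with probability above 0.5
```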
3.2.5.2 Unsupervised Machine Learning:
Unsupervised learning cannot be directly applied to a regression or classification
problem because unlike supervised learning, we have the input data but no
corresponding output data. The goal of unsupervised learning is to find the
underlying structure of the dataset, group the data according to similarities,
and represent the dataset in a compressed format.
Here, we take unlabeled input data, which means it is not categorized and
corresponding outputs are not given. This unlabeled input data is fed to the
machine learning model in order to train it. The model first interprets the raw
data to find hidden patterns and then applies suitable algorithms such as k-means
clustering, hierarchical clustering, etc. Popular unsupervised learning
algorithms include:
• K-means clustering
• KNN (k-nearest neighbors)
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
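Of the algorithms listed above, k-means is perhaps the easiest to demonstrate; a minimal sketch on made-up 2-D points:

```python
from sklearn.cluster import KMeans

# Two obvious groups of unlabeled points (invented)
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Nearby points share a cluster label; distant points do not
print(km.labels_[0] == km.labels_[1], km.labels_[0] != km.labels_[3])  # -> True True
```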
Apart from these general usages, clustering is used by Amazon in its
recommendation system to provide recommendations based on users' past product
searches. Netflix also uses this technique to recommend movies and web series to
its users based on their watch history. The diagram below explains the working of
the clustering algorithm: we can see the different fruits being divided into
several groups with similar properties.
CHAPTER 4
4. IMPLEMENTATION
4.1 CODE

import numpy as np
import matplotlib.pyplot as plt

def plot_scatter_chart(df, location):
    # Compare 2 BHK and 3 BHK prices for a given location
    bhk2 = df[(df.location == location) & (df.BHK == 2)]
    bhk3 = df[(df.location == location) & (df.BHK == 3)]
    plt.rcParams['figure.figsize'] = (15, 10)
    plt.scatter(bhk2.total_sqft, bhk2.price, color='blue', label='2 BHK', s=50)
    plt.scatter(bhk3.total_sqft, bhk3.price, color='green', marker='+',
                label='3 BHK', s=50)
    plt.xlabel('Total Square Foot')
    plt.ylabel('Price')
    plt.title(location)
    plt.legend()

plot_scatter_chart(data3, "Rajaji Nagar")

def remove_bhk_outliers(df):
    # For each location, drop n-BHK flats priced below the mean
    # price-per-sqft of (n-1)-BHK flats in the same location
    exclude_indices = np.array([])
    for location, location_df in df.groupby('location'):
        bhk_stats = {}
        for BHK, BHK_df in location_df.groupby('BHK'):
            bhk_stats[BHK] = {
                'mean': np.mean(BHK_df.price_per_sqft),
                'std': np.std(BHK_df.price_per_sqft),
                'count': BHK_df.shape[0]
            }
        for BHK, BHK_df in location_df.groupby('BHK'):
            stats = bhk_stats.get(BHK - 1)
            if stats and stats['count'] > 5:
                exclude_indices = np.append(
                    exclude_indices,
                    BHK_df[BHK_df.price_per_sqft < stats['mean']].index.values)
    return df.drop(exclude_indices, axis='index')

data4 = remove_bhk_outliers(data3)
data4.shape
CHAPTER 5
5 RESULT ANALYSIS
5.1 VISUALIZATION INSIGHTS:
2BHK Preference:
The observation that most houses sold are 2BHK suggests that buyers may prefer
smaller-sized homes, possibly due to factors such as affordability, family size, or
lifestyle preferences.
Location Diversity:
With houses from 255 different locations, 'Whitefield' and 'Sarjapur Road' emerge as
popular areas. This information is valuable for understanding market demand and
can aid in targeted marketing or investment decisions.
Distribution Plots:
The distribution plots for 'bath', 'bhk', 'price', and 'total_sqft' provide insights into the
spread and variability of these features. Understanding their distributions can help in
identifying outliers, understanding central tendencies, and assessing data quality.
Train-Test Split and Model Building:
Data Splitting:
The dataset is split into training and testing sets, with 80% of the data used for
training and 20% for testing. This ensures that the model's performance is evaluated
on unseen data, providing a more accurate assessment of its generalization ability.
Model Selection:
Three regression models - Linear Regression, Lasso Regression, and Ridge
Regression - are chosen for predicting house prices. These models offer different
approaches to regression and can capture different aspects of the data's underlying
relationships.
Preprocessing:
One-hot encoding is used to handle the categorical feature 'location', while standard
scaling ensures that all features are on a similar scale, preventing any particular
feature from dominating the model training process.
Evaluation Metric:
R2 score, also known as the coefficient of determination, is employed as the
evaluation metric. It represents the proportion of the variance in the dependent
variable (house prices) that is predictable from the independent variables.
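The split / preprocess / model / evaluate workflow described above can be sketched end to end. The column names (location, total_sqft, bath, BHK, price) follow the report, but the data below is synthetic, so the R2 values it prints are illustrative only, not the report's results.

```python
# Hedged end-to-end sketch of the workflow: 80/20 split, one-hot encoding of
# 'location', standard scaling of numeric features, three regressors, R2 score.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    'location': rng.choice(['Whitefield', 'Sarjapur Road'], n),
    'total_sqft': rng.uniform(500, 2500, n),
    'bath': rng.integers(1, 4, n),
    'BHK': rng.integers(1, 5, n),
})
df['price'] = 0.05 * df.total_sqft + 5 * df.BHK + rng.normal(0, 5, n)  # synthetic

X, y = df.drop(columns='price'), df.price
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # 80% train, 20% test

pre = ColumnTransformer([
    ('loc', OneHotEncoder(handle_unknown='ignore'), ['location']),
    ('num', StandardScaler(), ['total_sqft', 'bath', 'BHK']),
])

for name, reg in [('Linear', LinearRegression()),
                  ('Lasso', Lasso()),
                  ('Ridge', Ridge())]:
    model = make_pipeline(pre, reg).fit(X_train, y_train)
    print(name, round(r2_score(y_test, model.predict(X_test)), 2))
```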
Fig. 17 Evaluation Metric
Result Analysis:
Model Performance:
Linear Regression and Ridge Regression exhibit similar performance, with R2
scores of around 0.82. This indicates that approximately 82% of the variance in
house prices is captured by these models.
Impact of Regularization:
Lasso Regression, which applies L1 regularization, slightly underperforms
compared to the other two models. The negligible difference in performance
between Ridge and Linear Regression suggests that regularization might not
significantly affect model performance in this scenario.
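The qualitative difference between the L1 and L2 penalties can be seen directly on synthetic data (this toy example is not the project's dataset):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features actually matter.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)
print(np.round(lasso.coef_, 2))  # L1 sets irrelevant coefficients to exactly 0
print(np.round(ridge.coef_, 2))  # L2 only shrinks them toward 0
```

When few features are irrelevant, as seems to be the case here, Lasso's aggressive shrinkage can cost a little accuracy, which is consistent with the results above.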
Overall, the analysis demonstrates the effectiveness of the chosen regression models in predicting house prices. The insights gleaned from data visualization aid in understanding market dynamics, while model evaluation provides valuable feedback for refining model selection and preprocessing techniques.
5.2 ADVANTAGES
• Integration: Python fits naturally into end-to-end data science projects. It can be integrated with databases, web frameworks, and cloud services, allowing for end-to-end development and deployment of predictive models.
• Machine Learning Ecosystem: Python's machine learning ecosystem is well-established and constantly evolving. It offers state-of-the-art algorithms, techniques, and methodologies for solving predictive modelling problems, including house price prediction.
• Interpretability: Python-based machine learning models are often highly
interpretable, allowing stakeholders to understand the factors driving
predictions. This transparency is crucial, especially in real estate, where
buyers, sellers, and agents seek to understand the rationale behind house
price estimates.
• Open Source: Python is open source and free to use, making it accessible to
everyone. This democratization of technology enables individuals and
organizations of all sizes to leverage machine learning for various
applications, including house price prediction.
5.3 DISADVANTAGES
While Python offers numerous advantages for house price prediction, there
are also some potential disadvantages to consider:
• GIL Limitation: Python's Global Interpreter Lock (GIL) can hinder
multithreaded performance, particularly in CPU-bound tasks. While libraries
like NumPy and Pandas can offload computation to optimized C or Fortran
code, certain operations may still be affected by the GIL, impacting parallel
processing performance.
• Dependency Management: Python's dependency management system,
particularly with respect to package versions and compatibility, can
sometimes be challenging. Dependency conflicts or version mismatches
between libraries may arise, requiring careful management and potentially
causing issues with model reproducibility.
• Debugging Complexity: Python's dynamic typing and flexible syntax, while
advantageous for development speed, can sometimes lead to more
challenging debugging processes. Errors may not be caught until runtime,
and troubleshooting issues in complex machine learning pipelines may
require significant effort.
• Limited Deployment Options: While Python excels in model development
and experimentation, deploying Python-based machine learning models into
production environments may present challenges.
• Interpretability: While Python-based machine learning models can offer
interpretability, certain advanced techniques such as deep learning may
produce less interpretable models. Understanding and explaining the
predictions of complex models may require additional effort and expertise.
• Security Risks: Python's open-source nature and extensive library ecosystem can introduce security risks, particularly when using third-party packages or dependencies. Ensuring the security of machine learning pipelines and protecting against vulnerabilities requires careful attention and proactive measures.
• Learning Curve: While Python's syntax is relatively easy to learn,
mastering the full spectrum of machine learning techniques and libraries can
be challenging. Beginners may face a steep learning curve, requiring time
and dedication to gain proficiency in data preprocessing, model selection,
and evaluation.
5.4 MAINTENANCE
• Model Versioning: Maintain versioned copies of the predictive models, along with the associated data preprocessing and feature engineering pipelines.
• Security Measures: Implement security measures to protect the integrity and
confidentiality of the data used in the prediction system. Use encryption,
access controls, and secure communication protocols to safeguard sensitive
information.
• Scalability: Monitor system performance and scalability as the volume of
data and user traffic grows. Optimize code and infrastructure to handle
increasing workloads efficiently and ensure timely responses to user queries.
• Documentation: Maintain comprehensive documentation for the prediction
system, including model specifications, data sources, preprocessing steps,
and evaluation metrics. Document any changes or updates made to
the system over time.
• User Feedback Incorporation: Gather feedback from users and stakeholders
to identify areas for improvement and address any usability issues.
Incorporate user feedback into future iterations of the prediction system to
enhance user satisfaction and adoption.
• Continual Improvement: Continuously evaluate and refine the prediction
system based on feedback, performance metrics, and evolving business
requirements. Experiment with new algorithms, techniques, or features to
improve predictive accuracy and relevance.
5.5 APPLICATION
Table 1 Application
CHAPTER 6
6 CONCLUSION
The proposed system predicts property prices in Bangalore from several characteristics. We experimented with different machine learning algorithms to find the best model; compared with all the other algorithms, the Decision Tree algorithm achieved the lowest loss and the highest R-squared. The website was created with Flask.
To see the project in action, open the HTML web page we generated and run the [Link] file in the backend. Enter the property's square footage, the number of bedrooms, the number of bathrooms, and the location, then click 'ESTIMATE PRICE'. The system then forecasts the cost of what may be someone's ideal home.
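A minimal sketch of what such a Flask backend might look like (the route name, field names, and stand-in pricing function are assumptions for illustration; the real project would load its trained model instead):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the trained model; the real project would
# load its pickled scikit-learn pipeline here instead.
def predict_price(location, total_sqft, bath, bhk):
    return round(50 + 0.05 * total_sqft + 5 * bath + 10 * bhk, 2)

@app.route("/predict", methods=["POST"])
def predict():
    d = request.get_json()
    price = predict_price(d["location"], d["total_sqft"], d["bath"], d["bhk"])
    return jsonify({"estimated_price": price})

# To serve the API locally: app.run(debug=True)
```

The HTML page would POST the form fields to this endpoint and display the returned estimate.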
The goal of the project "House Price Prediction Using Machine Learning" is to forecast house prices from the features in the provided data. After training and testing the model, our best accuracy was around 90%. To distinguish this model from other prediction systems, additional parameters such as tax and air quality could be included. With such a tool, people can purchase houses within their budget and minimize financial loss. Several algorithms were compared to determine house values, and the selling price was estimated with greater precision and accuracy, which will benefit buyers and sellers alike. The many factors that influence housing prices must be taken into account and handled carefully.
REFERENCES
[1] Model: "Bangalore House Price Prediction Model".
[2] Heroku Documentation.
[3] Repository: "Web Application", [Link]House-Price-Prediction
[4] Repository: "Web Application", [Link]Thakur/BANGALORE-HOUSE-PRICE-PREDICTION
[5] Pickle Documentation.
[6] A. Varma, A. Sarma, S. Doshi and R. Nair, "House Price Prediction Using
Machine Learning and Neural Networks," 2018 Second International Conference on
Inventive Communication and Computational Technologies (ICICCT), 2018, pp.
1936-1939, doi: 10.1109/ICICCT.2018.8473231.
[7] Furia, Palak, and Anand Khandare. "Real Estate Price Prediction Using Machine
Learning Algorithm." e-Conference on Data Science and Intelligent Computing.
2020.
[8] Musciano, Chuck, and Bill Kennedy. HTML & XHTML: The Definitive Guide. O'Reilly Media, Inc., 2002.
[9] Aggarwal, Shalabh. Flask framework cookbook. Packt Publishing Ltd, 2014.
[10] Grinberg, Miguel. Flask Web Development: Developing Web Applications with Python. O'Reilly Media, Inc., 2018.
[11] Middleton, Neil, and Richard Schneeman. Heroku: Up and Running: Effortless Application Deployment and Scaling. O'Reilly Media, Inc., 2013.
[12] Available: [Link]Price_Prediction_using_a_Machine_Learning_Model_A_Survey_of_Literature
[13] Limsombunchai, V., Christopher Gan, and Minsoo Lee. House price prediction using a hedonic price model vs an artificial neural network. American Journal of Applied Sciences. 3:193–201.
[14] Joep Steegmans and Wolter Hassink. An empirical investigation of how wealth and income affect one's financial status and ability to purchase a home. Journal of Housing Economics. 2017;36:8–24.
[15] Ankit Mohokar, Nihar Baghat, and Shreyash Mane. House Price Forecasting
Using Data Mining, International Journal of Computer Applications. 152:23–26.
[16] Luis Torgo and Joao Gama. Regression using classification algorithms. Intelligent Data Analysis. 4:275–292.
[17] Fabian Pedregosa et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 12:2825–2830.
[18] Bork, M. and Moller, V.S. House Price Forecast Ability: A Factor Analysis. Real Estate Economics. 46:582–611.
[19] Hy Dang, Minh Nguyen, Bo Mei, and Quang Truong. Improvements to home price prediction methods using machine learning. Procedia Engineering. 174:433–442.
[20] Atharva Chogle, Priyanka Khaire, Akshata Gaud, and Jinal Jain. House Price Forecasting Using Data Mining Techniques. International Journal of Advanced Research in Computer and Communication Engineering. 6:24–28.
[21] Kai-Hsuan Chu and Li Li. Prediction of real estate price variation based on economic parameters. International Conference on Applied System Innovation (ICASI), IEEE, 2017.
[22] Subhani Shaik, Uppu Ravibabu. Classification of EMG Signal Analysis based
on Curvelet Transform and Random Forest tree Method. Paper selected for Journal of
Theoretical and Applied Information Technology (JATIT). 95.