Review Paper of House Rate Prediction
Chinmay Kansal
Student, Department of Computer Science
and Engineering
Inderprastha Engineering College
Ghaziabad, India
Abstract---The House Price Prediction System is an innovative solution designed to estimate the market value of residential properties using advanced machine learning techniques. This system integrates multiple data sources, including historical property prices, location demographics, property features, and economic indicators, to generate accurate predictions.

At its core, the system employs algorithms like regression analysis, decision trees, or neural networks to analyze large datasets and capture complex relationships between variables. The incorporation of data preprocessing techniques, such as feature scaling and handling missing values, ensures the reliability and robustness of the predictions. Geographic Information System (GIS) integration provides insights into location-based trends, highlighting the impact of neighborhood characteristics on house prices.

In the past, the price of a property was decided manually. The problem is that manual estimation incurs an error of 25%, which results in a loss of money. Replacing the old approach with new technology has brought a big change. Machine Learning is today’s trending technology, and data is at its heart. AI and Machine Learning are now booming in the market, and all industries are moving towards automation. But without data we cannot train a model. Machine Learning basically involves building models from previous data and using them to predict new data. The market demand for housing increases daily because the population is rising rapidly. In rural areas there is a lack of jobs, so people migrate to cities for financial reasons. The result is an increasing demand for housing in cities. People who do not know the actual price of a particular house suffer a loss of money. In this project, house price prediction is done using different Machine Learning algorithms such as Linear Regression, Decision Tree Regression, K-Means Regression and Random Forest Regression. 80% of the data from a known dataset is used for training and the remaining 20% is used for testing. This work applies various techniques such as features, labels, reduction techniques and transformation techniques such as attribute combinations, setting missing attributes and looking for new correlations. All of this indicates that house price prediction is an emerging research area that requires knowledge of machine learning.

In this conference paper we analyse different Machine Learning algorithms in order to train a better Machine Learning model. Trends in housing costs reflect the current economic situation and directly concern both buyers and sellers. The actual cost of a house depends on many factors, including the number of bedrooms, the number of bathrooms and the location. In rural areas the cost is low compared to cities. The price of a house rises when it is near a highway, mall, supermarket, job opportunities, good educational facilities, etc. A few years ago, real estate companies tried to predict property prices manually. In such a company, a special management team was responsible for
prediction of the cost of any real estate property. The team decided the price manually by analysing previous data, but these predictions carried an error of 25%, causing losses for buyers as well as sellers. Hence, many systems have been developed for house price prediction. Sifei Lu and Rick Siow proposed an advanced house prediction system. The main objective of their system was to build a model that gives a good house price prediction based on other features.

House price prediction is a significant area of research due to its implications for real estate markets, housing policies, and economic forecasting. Predicting house prices involves analyzing various factors that influence real estate values, such as location, size, amenities, and market trends. Accurate predictions can help buyers, sellers, and policymakers make informed decisions. Recent studies increasingly focus on machine learning (ML) techniques to enhance the accuracy of house price predictions. Below are five notable research papers that contribute to this field:

1. Zulkifley, N. H., & Nasir, M. H. (2020). Machine Learning for House Price Prediction: A Review of Techniques and Applications. This paper reviews various machine learning techniques applied to house price prediction, focusing on the effectiveness and efficiency of each method.
2. Khan, A. A., & Khan, M. N. (2021). A Comparative Study of Machine Learning Algorithms for House Price Prediction. In this study, the authors compare several ML algorithms, including Linear Regression, Decision Trees, and Random Forests, assessing their prediction accuracy using various datasets.
3. Cheng, Y., & Huang, X. (2019). Factors Influencing House Prices: A Statistical Analysis. This research examines various economic and demographic factors influencing house prices and emphasizes the importance of integrating these variables into predictive models.
4. Agus, M., & Ismail, M. (2022). Leveraging Natural Language Processing for Real Estate Price Prediction. This innovative study focuses on the use of NLP techniques to analyze property descriptions, demonstrating the potential for textual data to enhance prediction accuracy.
5. Gao, Y., & Zhang, F. (2023). Deep Learning Approaches for Housing Price Prediction: A Systematic Review. This paper systematically reviews the application of deep learning models in predicting house prices, highlighting their advantages over traditional methods and discussing future research directions.

A critical aspect of house price prediction research is the comparative analysis of different algorithms. The study by Khan and Khan emphasizes the importance of evaluating models based on accuracy and error rates. By comparing various machine learning techniques, researchers aim to identify the model that provides the best performance in predicting house prices. The House Price Index (HPI) is commonly referenced in studies as a metric for estimating changes in housing prices. Research indicates that housing prices are strongly correlated with various factors, including economic indicators, demographic trends, and geographical features. Understanding these correlations is essential for developing robust predictive models.

An innovative approach discussed in the literature involves using textual descriptions of properties to enhance prediction accuracy. The work by Agus and Ismail leverages natural language processing (NLP) techniques to analyze descriptive data, providing a unique perspective on how qualitative factors can influence pricing. In conclusion, the literature on house price prediction systems reveals a dynamic field that integrates traditional statistical methods with advanced machine learning techniques. By continuously refining models and incorporating diverse data types, researchers aim to improve the accuracy and reliability of house price forecasts. This ongoing research is crucial for stakeholders in the real estate market, as it directly impacts investment decisions and policy formulation.

III. PROPOSED SYSTEM

In this proposed system, we focus on predicting house prices using machine learning algorithms such as Linear Regression, Decision Tree, K-Means, and Random Forest. In the proposed system, “House Price Prediction Using Machine Learning”, we predict the house price using multiple features. The model is trained on various features such as ZN, INDUS, CHAS, RAD, etc. Of the previous data, 80% is used for training and the remaining 20% is used for testing. Here, the raw data is stored in a ‘.csv’ file. We mainly used the following Python libraries to solve these problems. The first is ‘pandas’, which is used to load the ‘.csv’ file into a Jupyter notebook and to clean and manipulate the data. The second is ‘sklearn’ (scikit-learn), which is used for the actual analysis and contains various built-in functions that help to solve the problem. The third is ‘numpy’, which was used for the train-test split.

IV. SYSTEM DESIGN AND ARCHITECTURE

Phase I: Collection of data

We collected real estate data from different online real estate websites and repositories. The data has features such as ‘ZN’, ‘INDUS’, ‘RAD’, ‘CHAS’, ‘LSTAT’, ‘CRIM’, ‘AGE’, ‘NOX’, etc., and one label, ‘MEDV’. The collected data must be well structured and categorized. Data is the first requirement when starting to solve any machine learning problem. The dataset must also be valid; otherwise there is no point in analysing the data.

Phase II: Data pre-processing

In this phase, the data is cleaned up. There might be missing values in the dataset. There are three ways to handle missing values: 1) Get rid of the missing data points. 2) Get rid of the whole attribute. 3) Set the missing value to some value (0, mean or median).

Phase III: Training the model

In this phase, the data is broken down into two parts: training and testing. 80% of the data is used for training and the remaining 20% is used for testing. The training set includes the target variable. The model is trained using various machine learning algorithms and the results are obtained. Of these, Random Forest regression predicts better results.

Phase IV: Testing the model

Finally, the trained model is applied to the test dataset and house prices are predicted. The trained model is saved as a ‘.joblib’ file.
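The training and testing phases described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the data is synthetic, the file name `house_price_model.joblib` and the hyperparameters are hypothetical, and the 80/20 split is done with a numpy permutation, as the text suggests.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the housing features and the price label (assumption).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 4))
y = 20.0 + 10.0 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(0.0, 1.0, size=500)

# Phase III: shuffle the row indices with numpy and keep 20% for testing.
shuffled = rng.permutation(len(X))
test_size = int(len(X) * 0.2)
test_idx, train_idx = shuffled[:test_size], shuffled[test_size:]

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X[train_idx], y[train_idx])

# Phase IV: predict on the held-out rows and report the RMSE.
predictions = model.predict(X[test_idx])
rmse = float(np.sqrt(mean_squared_error(y[test_idx], predictions)))
print(f"Test RMSE: {rmse:.3f}")

# Save the trained model with joblib and load it back (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "house_price_model.joblib")
joblib.dump(model, model_path)
restored = joblib.load(model_path)
```

The restored model can then be used for prediction in the same way as the original, for example `restored.predict(X[test_idx])`.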
This adaptability is crucial for maintaining accurate predictions in
a dynamic real estate landscape. In conclusion, the research on
house price prediction is evolving rapidly, driven by
advancements in machine learning, the integration of textual data,
and the application of robust statistical methods. As the real estate
market continues to change, ongoing research will be vital in
refining predictive models and enhancing their accuracy. The
ability to forecast housing prices accurately is essential for
stakeholders in the real estate sector, providing them with the
information needed to make informed decisions and navigate the
complexities of the market effectively.
VI. METHODOLOGY
We try to derive new attributes by combining existing ones. For example, ‘TAX’ and ‘RM’ are combined into a new attribute, ‘TAXRM’. In the correlation with ‘MEDV’, MEDV = 1.00000 and TAXRM = -0.558626, which shows that ‘TAXRM’ has a fairly strong negative correlation with ‘MEDV’.
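A minimal sketch of this attribute combination is shown below. The paper does not state how ‘TAX’ and ‘RM’ are combined, so the ratio TAX / RM is an assumption here, and the data is a hypothetical synthetic sample, so the printed correlation will not match the value reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real dataset is not reproduced here.
rng = np.random.default_rng(42)
n = 200
rm = rng.uniform(4.0, 8.0, size=n)        # stand-in for 'RM'
tax = rng.uniform(200.0, 700.0, size=n)   # stand-in for 'TAX'
medv = 9.0 * rm - 0.03 * tax + rng.normal(0.0, 2.0, size=n)  # synthetic label

housing = pd.DataFrame({"RM": rm, "TAX": tax, "MEDV": medv})

# Assumption: the combined attribute is the ratio TAX / RM.
housing["TAXRM"] = housing["TAX"] / housing["RM"]

# Pearson correlation of every attribute with the label, highest first.
corr_with_medv = housing.corr()["MEDV"].sort_values(ascending=False)
print(corr_with_medv)
```

A value near -1 or +1 in this listing suggests the attribute carries useful signal for the label, which is how a combined attribute such as ‘TAXRM’ can be judged worth keeping.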
In the data above, the ‘RM’ column has only 399 data points out of 404, so some values are missing. We use the median value to fill the missing points, after which the ‘RM’ column has all 404 data points. After that, we create a pipeline for the execution, for which the pipeline module is imported from sklearn.
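The median fill and the pipeline could look roughly like the sketch below. It assumes scikit-learn's `SimpleImputer` and `Pipeline`; the `StandardScaler` step and the toy data frame are illustrative assumptions, not details taken from this work.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame: 404 rows, 5 of which have a missing 'RM' value.
rng = np.random.default_rng(0)
housing = pd.DataFrame({
    "RM": rng.uniform(4.0, 8.0, size=404),
    "TAX": rng.uniform(200.0, 700.0, size=404),
})
housing.loc[housing.index[:5], "RM"] = np.nan
print(housing["RM"].count())  # number of non-missing 'RM' values

# Pipeline: fill missing values with the column median, then scale.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),  # assumption: scaling is part of the pipeline
])
prepared = pipeline.fit_transform(housing)

# The median learned for 'RM' (first column) during fitting.
rm_median = pipeline.named_steps["imputer"].statistics_[0]
print(prepared.shape, rm_median)
```

Fitting the imputer inside the pipeline means the same median is reused when the pipeline is later applied to the test set, which avoids recomputing it on unseen data.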
1. Data Preprocessing
   o Data Cleaning: Handle missing values, remove duplicate records, and correct inconsistent formats.
   o Feature Engineering: Generate new variables like price per square foot, distance to city center, and neighborhood quality indices.
   o Normalization/Standardization: Scale numeric data to improve model convergence and comparability.
   o Categorical Encoding: Use one-hot encoding or label encoding for categorical attributes (e.g., house type, region).
2. Exploratory Data Analysis (EDA)
   o Visualization: Correlation heatmaps to study attribute relationships with price, scatter plots for distribution analysis, and bar charts for categorical breakdowns.
   o Outlier Detection: Identify anomalies in data, such as extremely high/low prices, using boxplots and z-scores.
   o Statistical Analysis: Use regression diagnostics to validate assumptions of linearity and independence.
3. Model Selection
   o Baseline Models: Start with linear regression, decision trees, and k-nearest neighbors (KNN) for initial testing.
   o Advanced Models:
      - Gradient Boosting Machines (XGBoost, LightGBM).
      - Random Forests for handling non-linear interactions.
      - Deep learning models like Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) to capture complex data patterns.
   o Ensemble Techniques: Combine multiple models (e.g., bagging or stacking) for better performance.
4. Hyperparameter Tuning
   o Use grid search or Bayesian optimization to fine-tune parameters such as learning rates, tree depth, and dropout rates for neural networks.
5. Model Evaluation
   o Metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) for accuracy.
   o Cross-Validation: Perform k-fold cross-validation to ensure model generalization.
6. Implementation of Explainable AI (XAI)
   o Use techniques like SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-Agnostic Explanations) to interpret model predictions and attribute importance.
7. Deployment and Testing
   o Deployment Frameworks: Use Flask or FastAPI to build RESTful APIs for model deployment.
   o Testing: Ensure system robustness by testing against unseen data and edge cases.

Various machine learning algorithms are used to solve this problem. Of these, Random Forest gives better accuracy than the other models.

Final RMSE = 2.9131988953

Model                Mean           Standard Deviation
Linear Regression    4.221894675    0.752030492
Decision Tree        4.189504504    0.848096620
K-Means              21.91834139    2.115566025
Random Forest        3.494650261    0.762041223

Fig 7. Models Outputs

IX. CONCLUSION

In conclusion, we have successfully developed a machine learning web solution to predict house prices based on various features. The solution involves collecting and cleaning data, building and training a linear regression model. Moreover, we have incorporated hyperparameter tuning to optimize the model's performance further. This improves the model's ability to predict house prices accurately, leading to better decision-making for both buyers and sellers in the real estate market. By implementing the model in a web-based solution, users can input data on a house, and the solution will provide an estimated price based on the model's predictions. This makes it easier for buyers and sellers to obtain a rough estimate of a property's value without the need for extensive research. Overall, this machine learning web solution for house price prediction provides a valuable tool for the real estate industry and can aid in making more informed decisions regarding property values.

X. FUTURE WORK

Further exploration of data with additional features should be conducted through comprehensive feature engineering to enhance the model's predictive capabilities. It is essential to investigate advanced ensemble methods such as stacking or blending to leverage the strengths of multiple models for improved performance. Additionally, enhancing model interpretability through techniques like feature importance analysis and SHAP values can provide insights into the factors influencing house prices. To address imbalanced data issues, sampling techniques or alternative evaluation metrics should be considered. It is crucial to develop a robust deployment strategy for the model, ensuring scalability and efficient prediction handling. Continuous monitoring mechanisms should be implemented to track model performance over time and detect potential issues promptly. The user interface of the application should be enhanced to improve user experience and usability. Lastly, a feedback loop should be incorporated to gather user feedback and iteratively improve the model.

XI. REFERENCES

[1] Lakshmi, B. N., and G. H. Raghunandhan. "A conceptual overview of data mining." 2011 National Conference on Innovations in Emerging Technology. IEEE, 2011.
[4] Arietta, Sean M., et al. "City forensics: Using visual elements
to predict non-visual city attributes." IEEE transactions on
visualization and computer graphics 20.12 (2014): 2624-2633.
[5] Yu, H., and J. Wu. "Real estate price prediction with
regression and classification CS 229 Autumn 2016 Project Final
Report 1–5." (2016).
[6] Li, Li, and Kai-Hsuan Chu. "Prediction of real estate price
variation based on economic parameters." 2017 International
Conference on Applied System Innovation (ICASI). IEEE, 2017.
[9] Pow, Nissan, Emil Janulewicz, and Liu Dave Liu. "Applied
Machine Learning Project 4 Prediction of real estate property
prices in Montréal." Course project, COMP-598, Fall/2014,
McGill University (2014).