Document 4 (1)
UNIVERSITY COLLEGE OF COMPUTING
DEPARTMENT OF CS
AI PROJECT
Title: House Price Prediction
Name                ID
1. Tedy Shimelis    Ugr/32313/14
Problem Statement
Problem: The real estate market requires accurate predictions of
house prices based on various features to assist buyers, sellers, and
investors in making informed decisions.
Goal: To develop a predictive model that can estimate house
prices using available features such as location, size, and
demographics.
Solution Approach
How the Problem is Solved:
Data preprocessing is performed to ensure the dataset is clean and
suitable for modeling.
A linear regression model is trained using the processed features.
The model is evaluated based on its predictive performance on a test
dataset.
Model Training
The script train_and_create_model.py outlines the model training
process:
Data Loading: The dataset is loaded with
pandas.read_csv('housing.csv').
Data Preprocessing:
o Missing values in numeric columns are imputed with the column
mean using
data.fillna(data.mean(numeric_only=True), inplace=True).
o Categorical features are encoded as integers using
LabelEncoder from sklearn.preprocessing.
Model Training:
o A LinearRegression model is trained using
model.fit(X_train, y_train).
o The trained model is saved using
joblib.dump(model, 'model.joblib'), and the preprocessing
objects are saved as well so they can be reused at prediction time.
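The steps above can be sketched end to end as follows. This is a minimal illustration, not the contents of train_and_create_model.py: the synthetic data stands in for housing.csv, and the column names (size_sqft, location, price), the 80/20 split, and the temporary output directory are assumptions made for the example. numeric_only=True is passed so that the mean is computed only over numeric columns.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for housing.csv; the column names are assumptions.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "size_sqft": rng.uniform(500, 3000, n),
    "location": rng.choice(["urban", "suburban", "rural"], n),
})
data["price"] = 100 * data["size_sqft"] + rng.normal(0, 5000, n)
data.loc[::25, "size_sqft"] = np.nan  # inject a few missing values

# Impute missing numeric values with the column mean.
data = data.fillna(data.mean(numeric_only=True))

# Encode the categorical column as integer codes.
encoder = LabelEncoder()
data["location"] = encoder.fit_transform(data["location"])

# Split features and target, holding out 20% of rows for evaluation.
X = data.drop(columns="price")
y = data["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the linear regression model.
model = LinearRegression()
model.fit(X_train, y_train)

# Persist the model and the encoder (a temporary directory is used here).
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.joblib")
encoder_path = os.path.join(out_dir, "encoder.joblib")
joblib.dump(model, model_path)
joblib.dump(encoder, encoder_path)
```

Saving the encoder next to the model means the prediction script can apply the same category-to-integer mapping that was used during training.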
Data Preprocessing Explained
Data preprocessing is critical for ensuring the quality of the dataset.
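Handling Missing Values: Missing data can skew results, and most
models cannot be fitted on rows that contain gaps. Mean imputation keeps
every row in the dataset and leaves each column's mean unchanged,
although it reduces the column's variance.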
Encoding Categorical Variables: Categorical data needs to be converted
into a numerical format for the model to process it effectively.
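A small worked example may make both steps concrete. The three-row table and its column names (rooms, location) are made up for illustration and are not taken from the project's dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: one missing numeric value and one text column (hypothetical).
df = pd.DataFrame({
    "rooms": [3.0, None, 5.0],
    "location": ["inland", "coastal", "inland"],
})

# Mean imputation: the gap is filled with the mean of 3 and 5, which is 4.
df["rooms"] = df["rooms"].fillna(df["rooms"].mean())

# Label encoding: classes are sorted alphabetically, so
# "coastal" becomes 0 and "inland" becomes 1.
encoder = LabelEncoder()
df["location_code"] = encoder.fit_transform(df["location"])

print(df)
```

Note that integer codes imply an ordering between categories that a linear model will treat as meaningful. One-hot encoding, for example with pandas.get_dummies, is a common alternative when the categories have no natural order.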
Model Evaluation
After training, the model's performance is evaluated using metrics
such as:
Mean Absolute Error (MAE): Measures the average absolute difference
between predicted and actual prices, in the same units as the price.
R-squared: Indicates the proportion of variance in the dependent
variable that is predictable from the independent variables.
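To show what the two metrics compute, the sketch below implements them directly with NumPy on three made-up price pairs. The helper names mae and r_squared are hypothetical; scikit-learn offers equivalent functions, mean_absolute_error and r2_score, in sklearn.metrics.

```python
import numpy as np


def mae(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up actual and predicted prices (illustrative values only).
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]

mae_value = mae(y_true, y_pred)       # (10 + 10 + 0) / 3
r2_value = r_squared(y_true, y_pred)  # 1 - 200 / 20000
print(mae_value, r2_value)
```

Here the errors are 10, 10, and 0, so the MAE is about 6.67, and the residual sum of squares (200) is 1% of the total sum of squares (20000), giving an R-squared of 0.99.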
Prediction Process
The script predict_with_model.py implements the prediction
functionality:
It loads the trained model and preprocessors using
model = joblib.load('model.joblib').
A GUI is created using Tkinter to allow users to
select a CSV file for prediction.
Predictions are made using the trained model with
predictions = model.predict(X_test), and results are
displayed in the GUI.
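The load-and-predict step, without the GUI, might look like the sketch below. It is not the contents of predict_with_model.py: to stay self-contained it first trains and saves a tiny one-feature model, and the size_sqft column, the training values, and the temporary file location are all assumptions.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in for the training script: fit a tiny model on made-up data
# where price is exactly 100 times the size, then save it.
train_X = pd.DataFrame({"size_sqft": [1000.0, 2000.0, 3000.0]})
train_y = [100000.0, 200000.0, 300000.0]
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(LinearRegression().fit(train_X, train_y), model_path)

# Prediction side: load the saved model and apply it to new rows.
# The new data must have the same feature columns as the training data.
model = joblib.load(model_path)
new_data = pd.DataFrame({"size_sqft": [1500.0]})
predictions = model.predict(new_data)
print(predictions)
```

Because the made-up training data is perfectly linear, the prediction for 1500 square feet comes out at roughly 150000.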
GUI Functionality
The GUI allows users to:
Select a CSV file containing new housing data.
View the predictions made by the model in real time.
Export the predictions to a new CSV file for further
analysis.
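One way to structure these three actions is sketched below, keeping the prediction logic in a plain function so it can be used without a window. The names predict_csv, launch_gui, and StubModel, the predicted_price column, and the widget layout are hypothetical; the project's actual GUI code may differ. The stub model exists only so the example runs without a saved model file.

```python
import io

import pandas as pd


def predict_csv(model, csv_source):
    """Read housing data from a CSV path or file object and return it
    with an added predicted_price column."""
    data = pd.read_csv(csv_source)
    result = data.copy()
    result["predicted_price"] = model.predict(data)
    return result


def launch_gui(model):
    """Open a window to select a CSV, view predictions, and export them."""
    # Imported here so predict_csv stays usable on machines without a display.
    import tkinter as tk
    from tkinter import filedialog, messagebox

    root = tk.Tk()
    root.title("House Price Prediction")
    output = tk.Text(root, height=15, width=60)
    state = {"result": None}

    def select_file():
        path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")])
        if not path:
            return  # the user cancelled the dialog
        state["result"] = predict_csv(model, path)
        output.delete("1.0", tk.END)
        output.insert(tk.END, state["result"].to_string(index=False))

    def export_file():
        if state["result"] is None:
            messagebox.showinfo("Export", "Select a CSV file first.")
            return
        path = filedialog.asksaveasfilename(defaultextension=".csv")
        if path:
            state["result"].to_csv(path, index=False)

    tk.Button(root, text="Select CSV", command=select_file).pack()
    tk.Button(root, text="Export predictions", command=export_file).pack()
    output.pack()
    root.mainloop()


class StubModel:
    """Hypothetical stand-in for the trained model: price = 100 * size."""

    def predict(self, X):
        return X["size_sqft"].to_numpy() * 100


# Headless check of the prediction logic using an in-memory CSV.
demo = predict_csv(StubModel(), io.StringIO("size_sqft\n1000\n2000\n"))
print(demo)
```

Calling launch_gui(joblib.load('model.joblib')) would open the window with the real model. Separating predict_csv from the Tkinter callbacks also makes the prediction step easy to test on its own.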
Conclusion
The analysis highlights the significance of thorough data
preprocessing and model training for accurate predictions.
The developed predictive model can effectively estimate house prices
based on various features, providing valuable insights for
stakeholders in the real estate market.