Car Price Prediction Using Neural Networks
By
Zeeshan Asghar (B210317003), Reg. No: 2021-UOK-04803
Hamza Afzal (B210317034), Reg. No: 2021-UOK-04834
Session: 2021-2025
BS HONORS
IN
ARTIFICIAL INTELLIGENCE
Supervisor
Ma’am Umaira Abbasi
With the rapid emergence of new technologies, the field of Artificial Intelligence is also
developing quickly. Many inventions are appearing in this field today, such as the move from
manually driven cars to self-driving and fully AI-based cars. All of this is the result of
advances in technology.
In this paper we aim to develop a model that handles car-related queries such as price
prediction. For this purpose we use different machine learning and neural network techniques.
We also summarize the results of our model in this paper.
3 Scope
4 Objectives
7 Model selection
8 Dataset Collection
9 Implementation
10 Results
11 Conclusion
List Of Figures
Figure 8-1 Snape of dataset
Figure 10-1 Home Page
Figure 10-2 Home Page
Figure 10-3 Selling Details
Figure 10-4 Confusion Matrix
Figure 10-5 Histogram Distribution
Figure 10-6 Categorical Distribution
Figure 10-7 2-D Distribution
Figure 10-8 2-D Categorical Distribution
Figure 10-9 Plot Graph of Loss Data
1 Introduction
Today a great deal is changing around the world. After the invention of the Internet the world
became a global village, and e-commerce made rapid progress as a result. Now almost everything
is available in e-commerce stores, from a small needle to dresses, cell phones, and laptops.
With a few clicks, an order arrives at your door. Similarly, there have been many advancements
in the automation and vehicle industries: from manual cars to automatic cars, and from cars
with voice-command navigation systems to self-driving cars. All of this is the result of
advances in technology.
Artificial Intelligence is also making rapid progress and has made human life much easier than
it was a few years ago. For example, robots perform household chores, and in the medical field
AI helps doctors diagnose diseases. Artificial Intelligence is also advancing quickly in the
automation and vehicle industries. In this paper we aim to develop a neural network model that
predicts the prices of cars, whether used or new. The model predicts car prices on the basis of
attributes such as the fuel type (petrol or diesel) and the number of kilometres driven. Our
model is trained on such attributes and then predicts the price. Price prediction can also be a
kind of game for people who are fond of cars. We use different neural network techniques to
train the model.
2 Background Knowledge
Predicting car prices involves leveraging various factors such as market demand, supply,
economic conditions, brand reputation, vehicle specifications, and historical pricing data.
Predicting the price of new or used cars also requires analyzing current market trends, such as
which kinds of cars are considered good in the market and which give good mileage and better
performance. All of these trends have to be analyzed before entering the market. A good
understanding of the economy of the market is also needed. Most important is knowledge of
interest rates and of the low-end prices of cars. Brand matters as well: which brands' cars are
being predicted and what specifications those cars have. Overall, the historical prices of cars
and the demand for them have to be analyzed.
3 Scope
We now turn to the scope of the project “Car Price Prediction”. The scope of a project refers
to the dimensions and range of the topic, that is, the extent to which the project covers it.
It also defines the parameters and boundaries of the project. The scope of our project is given
below:
This is the overall scope of our project, which is covered throughout the process of training
the model.
4 Objectives
After the scope, we turn to the objectives of our project: the goals that the project “Car
Price Prediction” using neural networks should achieve on completion. The objectives of our
project are given below:
5 Problem Statement
Before moving to the next phase, we address an important question: why did we develop this
model, and what is the nature of the problem that convinced us to develop it? These were the
first questions that came to our minds before starting this work. The problem is that, apart
from experts who have broad knowledge of cars and their parts, most people do not know such
details. They do not know what kind of engine a car uses, what its average mileage is, what
type of car it is, or whether it is automatic or manual. These are the basic specifications
that a person should know before purchasing a car. That is the motivation behind our project.
6 Literature Review
We now turn to work related to our project “Car Price Prediction”. In recent years a lot of
work has been done on different kinds of prediction projects, such as stock price prediction,
house price prediction, land price prediction, and car price prediction. These projects use
different techniques and algorithms and report different accuracies depending on the classifier
or model used. A brief literature review of these projects is given in the table below:
Table 1 Accuracy of different projects
These are the results of different prediction-related projects. Note that they use different
techniques: some use linear regression, some use logistic regression, and some use a CNN model
for classification.
7 Model selection
In this section we discuss the method of our project: how the project works, which steps we
took, and which model we use. The model used in this project is the RNN, a type of artificial
neural network. Since we implement the project with artificial neural networks, we chose the
RNN model for this purpose.
In the realm of artificial intelligence and machine learning, Recurrent Neural Networks (RNNs)
stand out as a powerful tool for modeling sequential data. Unlike traditional feedforward neural
networks, which process input data in a fixed-size vector format, RNNs are designed to handle
sequences of variable lengths. This flexibility makes them well-suited for tasks involving time
series data, natural language processing, speech recognition, and many other applications
where data is inherently sequential in nature.
At the heart of an RNN lies its recurrent nature, which enables it to maintain an internal state
or memory that captures information about previous inputs in the sequence. This memory
allows RNNs to exhibit dynamic temporal behavior, making them adept at capturing patterns
and dependencies over time. The fundamental building block of an RNN is the recurrent
neuron, which processes input data at each time step while also incorporating information from
previous time steps. This feedback loop enables RNNs to exhibit temporal dynamics and learn
from sequential patterns present in the data.
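The feedback loop described above can be sketched in a few lines of NumPy. This is only a minimal illustration of a simple (Elman-style) recurrent layer, not the model trained in this project; the function name `rnn_forward`, the weight names, and all sizes are assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple recurrent layer over one sequence.

    inputs: array of shape (time_steps, input_dim)
    returns: array of shape (time_steps, hidden_dim), the hidden state at each step
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)  # initial hidden state (the "memory")
    states = []
    for x_t in inputs:
        # The new state combines the current input with the previous state.
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, time_steps = 3, 4, 5  # illustrative sizes only

# Randomly initialised weights; a trained network would learn these values.
W_xh = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

sequence = rng.normal(size=(time_steps, input_dim))
states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(states.shape)  # (5, 4)
```

Because the same weights are reused at every time step, the loop works for a sequence of any length, which is the property the text refers to.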
One of the key advantages of RNNs is their ability to handle inputs of varying lengths, making
them well-suited for tasks such as sequence prediction, language modeling, and sentiment
analysis. Moreover, RNNs can be trained using backpropagation through time (BPTT), which
extends the traditional backpropagation algorithm to handle sequences by unfolding the
network over time.
Despite their strengths, traditional RNNs suffer from certain limitations, such as the vanishing
gradient problem, which can hinder their ability to capture long-range dependencies in
sequential data. To address these shortcomings, several advanced architectures have been
developed, including Long Short-Term Memory (LSTM) networks and Gated Recurrent Units
(GRUs). These architectures incorporate mechanisms to better preserve and update information
over long sequences, making them more effective for tasks requiring memory retention over
extended periods. In recent years, RNNs have demonstrated remarkable success across a wide
range of applications, from machine translation and speech recognition to time series
forecasting and autonomous driving. Their ability to capture complex sequential patterns and
model temporal dependencies has propelled them to the forefront of modern machine learning
research, making them an indispensable tool for tackling real-world problems in diverse
domains.
8 Dataset Collection
The dataset collected for our project “Car Price Prediction” is a dataset of Indian vehicles
and companies. We chose Indian vehicle data because the Indian automobile market is the third
largest in the world. India manufactures its own cars and also exports made-in-India cars to
other countries. Our data is noisy, so we applied several techniques to clean it and removed
duplicate and null values. The data is tabular and stored as a CSV file, which is easy to read
and to modify. The other features of the dataset are given below:
9 Implementation
This section explains how we implemented our project and which Python libraries and frameworks
we used. Developers use different frameworks to develop and train machine learning models. We
likewise used several libraries for developing and training our model, including NumPy, pandas,
seaborn, and matplotlib. Each is described below.
• Pandas
First, we use the pandas library, which is widely used in machine learning and data mining for
data analysis. We use pandas to clean and preprocess the data, removing redundant and duplicate
records for better model training and accuracy.
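A cleaning step of this kind might look like the sketch below. The rows and column names (`name`, `km_driven`, `fuel`, `selling_price`) are hypothetical stand-ins for the real CSV file, which would normally be loaded with `pd.read_csv`; the exact cleaning steps used in the project are not listed in this report.

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV; column names are assumptions.
raw = pd.DataFrame({
    "name": ["Maruti Swift", "Maruti Swift", "Hyundai i20", "Honda City"],
    "km_driven": [50000, 50000, 30000, None],
    "fuel": ["Petrol", "Petrol", "Diesel", "Petrol"],
    "selling_price": [350000, 350000, 520000, 610000],
})

cleaned = (
    raw.drop_duplicates()        # remove exact duplicate rows
       .dropna()                 # remove rows with any missing value
       .reset_index(drop=True)   # renumber the remaining rows from 0
)
print(len(raw), "->", len(cleaned))  # 4 -> 2
```

Dropping rows is the simplest option; filling missing values instead would keep more data at the cost of some extra assumptions.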
• NumPy
NumPy is a core Python library for numerical computing. It introduces the ndarray, a
powerful array object, which enables efficient storage and manipulation of large datasets. With
NumPy, you can perform various mathematical operations on arrays, including basic
arithmetic, statistical, and linear algebra functions. Its efficient implementation in C and
Fortran makes it ideal for handling numerical computations. NumPy serves as a foundation for
many other scientific computing libraries in Python, providing essential tools for tasks like data
analysis, machine learning, and scientific simulations.
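As a small illustration of these array operations, the sketch below computes summary statistics and min-max scales a numeric feature without any explicit loop. The kilometre values are invented for the example and are not taken from the project's dataset.

```python
import numpy as np

# Hypothetical kilometres-driven values; not taken from the project's dataset.
km_driven = np.array([15000.0, 42000.0, 60000.0, 120000.0])

# Vectorised statistics over the whole array, no Python loop needed.
mean_km = km_driven.mean()
std_km = km_driven.std()

# Min-max scaling maps every value into the range [0, 1].
scaled = (km_driven - km_driven.min()) / (km_driven.max() - km_driven.min())
print(mean_km)  # 59250.0
print(scaled)
```

Scaling features to a common range is a common preparation step before training a neural network, since inputs on very different scales can slow down training.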
• Matplotlib
Matplotlib is a popular Python library for creating static, interactive, and animated
visualizations. It offers a comprehensive set of plotting tools that enable users to generate a
wide range of plots and charts, including line plots, scatter plots, bar plots, histograms,
heatmaps, and more. Matplotlib's flexibility allows for customization of every aspect of a plot,
including colors, line styles, markers, labels, and annotations, ensuring that visualizations can
be tailored to specific needs and preferences.
One of Matplotlib's key strengths is its integration with other Python libraries such as NumPy
and pandas, allowing users to easily visualize data stored in arrays or DataFrames. Additionally,
Matplotlib supports a variety of output formats, including PNG, PDF, SVG, and interactive
formats for use in web applications or interactive notebooks.
Matplotlib's object-oriented interface provides fine-grained control over plots, enabling users
to create complex layouts with multiple subplots, axes, and figures. Furthermore, Matplotlib's
pyplot interface offers a convenient way to create quick plots and simple visualizations with
minimal code.
• Seaborn
Seaborn is a Python data visualization library designed to complement and enhance
Matplotlib's capabilities. It provides a higher-level interface for creating attractive and
informative statistical graphics with ease. Seaborn's strength lies in its ability to produce
visually appealing plots while emphasizing statistical relationships in the data. By leveraging
concise functions and built-in presets, Seaborn simplifies the process of generating complex
visualizations, making it accessible to users of all skill levels.
One of Seaborn's key features is its seamless integration with pandas DataFrames, allowing
users to directly input data stored in pandas objects into its plotting functions. This tight
integration streamlines the visualization process, enabling users to focus on exploring and
analyzing their data rather than managing plotting intricacies.
Seaborn comes with a variety of default aesthetics, including styles and color palettes, which
enhance the visual appeal of plots. Users can easily customize the appearance of their
visualizations using these built-in options or by creating their own custom styles and palettes.
Furthermore, Seaborn offers a wide range of plot types, including univariate and bivariate
distributions, regression plots, categorical plots, and more. These plot types enable users to
explore complex relationships within their data, providing insights into patterns and trends that
may not be immediately apparent from raw data alone.
• Scikit-learn (sklearn)
Scikit-learn is a simple and widely used library for training machine learning models. It
provides a simple toolkit for various machine learning tasks such as classification,
clustering, and regression. Scikit-learn offers a comprehensive selection of supervised and
unsupervised learning algorithms, including support vector machines, random forests,
k-nearest neighbors, gradient boosting, clustering algorithms, and many more.
• TensorFlow
These are the libraries and frameworks we used to complete our project.
10 Results
This section presents the results of our model: how many epochs were run, what accuracy the
model achieved, and how the model predicts car prices. Each is explained one by one.
• Home page
This is the front page of our application, where you can predict the price of different cars by
selecting features such as the car name, car company, fuel type, and the number of kilometres
driven. [1]
Figure 10-2 Home Page
These are a few results of our trained model, showing price predictions for different cars. You
can also select cars from various companies and predict their prices. The form is built with
HTML and CSS and with Python's Streamlit library, which is widely used for building web pages
and user interfaces.
• Visualization of data
In this section we present visualizations of the data, showing in graph form how the variables
relate to and compare with each other.
The graph above has four panels. The first panel shows the number of cars by fuel type. The
number of petrol cars is a little over 2,100 and the number of diesel cars is a little over
2,200. There are also other types, such as CNG and electric cars, but their counts are too
small to be significant.
The second panel is about seller type. It shows how many cars are sold by individuals and how
many by dealers. According to this plot, most of the cars are sold by individuals.
The third panel is about transmission type, that is, whether the car is automatic or manual.
According to the bar graph, most of the cars are manual and fewer are automatic.
The final panel shows the owner type: whether the car is with its first, second, or third
owner. According to the graph, most of the cars are first-owner cars. Some are second-hand
cars, and some are test-drive cars, which are used for driving and testing.
• Heat Map
This is the heat map (Figure 10-4) of our car dataset. In this map the target variable, selling
price, has a value of 1.0. Note that in a correlation heat map every variable has a value of
1.0 against itself, so this value alone does not show that the model's predictions are
accurate; the remaining cells in the selling-price row show how strongly each feature is
related to the price.
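As a hedged illustration of how a heat map of this kind is usually produced, the sketch below builds a correlation matrix with pandas. The column names and values are assumptions made for the example, and it is also an assumption that the figure shows pairwise correlations.

```python
import pandas as pd

# Hypothetical numeric columns; names and values are assumptions.
cars = pd.DataFrame({
    "year": [2012, 2015, 2017, 2019, 2020],
    "km_driven": [90000, 70000, 45000, 30000, 12000],
    "selling_price": [200000, 310000, 450000, 560000, 700000],
})

corr = cars.corr()  # pairwise Pearson correlation between columns
print(corr.round(2))

# Every variable correlates perfectly with itself, so the diagonal is 1.0.
diagonal_value = corr.loc["selling_price", "selling_price"]
print(diagonal_value)
```

A matrix like `corr` can be drawn as a heat map with `seaborn.heatmap(corr, annot=True)`. The informative cells are the off-diagonal ones, for example the negative relationship between `km_driven` and `selling_price` in this invented data.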
• Data Distribution
This section explains how our data is distributed across data points and what shape the data
takes, covering the histogram, categorical, and 2-D categorical distributions.
Distributions
Categorical distributions
2-d distributions
• Graph of Loss
This plot shows how the loss changes during training. The two lines show the loss on the
training data and the loss on the validation data.
Figure 10-9 Plot Graph of Loss Data
• Accuracy
Finally, we come to the results of model training and the accuracy of our model, that is, how
well the model predicts prices. We ran multiple iterations and epochs to test the training. We
also changed the number of layers and the number of units in the dense layers to see the
results for different configurations. The model's accuracy changes with the number of layers
and epochs. The final result is given below:
These are the results of our model training; the accuracy varies across different epochs and
layers.
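This report does not state how accuracy was computed. For a numeric target such as price, two common measures are the mean absolute error (MAE) and the coefficient of determination (R²). The sketch below shows both on invented values; the helper names and the price arrays are assumptions and do not represent the project's results.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted prices."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2_score(y_true, y_pred):
    """Share of the variance in the actual prices explained by the predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Invented prices for illustration only.
y_true = np.array([300000.0, 450000.0, 600000.0, 750000.0])
y_pred = np.array([320000.0, 430000.0, 610000.0, 720000.0])

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae)           # 20000.0
print(round(r2, 3))  # 0.984
```

Reporting such a measure for each combination of layers and epochs would make the comparison between configurations easier to read.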
11 Conclusion
In conclusion, it appears that using Recurrent Neural Networks (RNN), a type of Artificial
Neural Network (ANN), to anticipate car prices is a promising method. The prediction model's
efficiency and accuracy are increased when data manipulation and preparation are done using
libraries like pandas and numpy. The RNN architecture is especially well-suited for time-series
forecasting tasks such as automobile price prediction because of its capacity to capture
sequential patterns in data. We can create reliable models that provide insightful information
on the dynamic nature of the automobile market and help stakeholders make well-informed
decisions by utilizing these tools and methodologies.