Seminar Presentation

Machine learning algorithms like linear regression, random forest, gradient boosting and randomized search cross-validation were used to predict used car prices. Ensemble models like random forest and gradient boosting performed better than individual models with R2 scores of 0.9 and 0.89 respectively. Linear regression, random forest regression and gradient boosting models were evaluated using R2 score and mean squared error. Plots were generated to visualize the predictions.


USED CAR PRICE PREDICTION USING MACHINE LEARNING
1. Machine Learning
● Machine learning is a branch of artificial intelligence (AI) and
computer science that focuses on using data and algorithms to
imitate the way humans learn, gradually improving in accuracy.
● Machine learning is an important component of the growing
field of data science.
● Through the use of statistical methods, algorithms are trained
to make classifications or predictions, uncovering key insights
within data mining projects.
2. Ensemble Learning
● Ensemble learning helps improve machine learning results by
combining several models.
● This approach allows the production of better predictive
performance compared to a single model.
● The basic idea is to learn a set of classifiers (experts) and to allow
them to vote.

3. Basic Algorithms Used
3.1 Linear Regression
● Linear regression is one of the easiest and most
popular Machine Learning algorithms.
● It is a statistical method that is used for predictive
analysis.
● Linear regression makes predictions for
continuous/real or numeric variables such as sales,
salary, age, product price, etc.
● The linear regression algorithm models a linear relationship between a
dependent variable (y) and one or more independent variables (x), hence
the name linear regression.
● Because the relationship is linear, the model describes how the value of
the dependent variable changes with the value of the independent
variable(s).
● The linear regression model provides a sloped straight line representing
the relationship between the variables.

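The idea above can be sketched with scikit-learn's `LinearRegression`. This is a minimal illustration on synthetic data; the "car age" feature and the coefficients are invented for the example, not taken from the project's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: price falls linearly with car age (illustrative values).
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(100, 1))                    # car age in years
y = 20000 - 1500 * X.ravel() + rng.normal(0, 500, 100)   # price with noise

# Fit a sloped straight line y = a*x + b to the data.
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope near -1500, intercept near 20000
```

The fitted slope and intercept recover the linear relationship, which is exactly the "sloped straight line" described above.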
3.2 Random Forest
● Random Forest is a popular machine learning algorithm that
belongs to the supervised learning technique.
● It can be used for both Classification and Regression problems
in ML.
● It is based on the concept of ensemble learning, which is a
process of combining multiple classifiers to solve a complex
problem and to improve the performance of the model.

● "Random Forest is a classifier that contains a number of
decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of
that dataset."
● Instead of relying on a single decision tree, the random forest
takes the prediction from each tree and predicts the final
output based on the majority vote (for classification) or the
average (for regression) of those predictions.

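A minimal sketch of a random forest regressor in scikit-learn, again on synthetic data (the "age" and "mileage" features are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with two illustrative features: age and mileage.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 20000 - 1200 * X[:, 0] - 800 * X[:, 1] + rng.normal(0, 300, 200)

# An ensemble of 100 decision trees; predictions are averaged over the trees.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))  # R2 on the training data, close to 1
```

Each tree is trained on a bootstrap sample of the data, and averaging their predictions reduces the variance of any single tree.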
3.3 Gradient Boost
● Gradient boosting is a machine learning technique used
in regression and classification tasks, among others.
● It gives a prediction model in the form of an ensemble of weak prediction
models, which are typically decision trees.
● When a decision tree is the weak learner, the resulting algorithm is called
gradient-boosted trees; it usually outperforms random forest.
● A gradient-boosted trees model is built in a stage-wise fashion as in
other boosting methods, but it generalizes the other methods by allowing
optimization of an arbitrary differentiable loss function.
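The stage-wise construction described above can be sketched with scikit-learn's `GradientBoostingRegressor`, here on synthetic data with invented features:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (illustrative features, e.g. age and mileage).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 20000 - 1200 * X[:, 0] - 800 * X[:, 1] + rng.normal(0, 300, 200)

# 100 shallow trees added one stage at a time; each new tree fits the
# residual errors of the ensemble built so far.
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                random_state=0)
gbr.fit(X, y)
print(gbr.score(X, y))  # training R2, close to 1
```

The `learning_rate` parameter shrinks each stage's contribution, which is what makes the stage-wise additive fitting gradual rather than greedy.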
3.4 Randomized Search CV
● Random search is a technique in which random combinations of the
hyperparameters are used to find the best configuration for the built model.
● It is similar to grid search, yet it often yields comparable or better
results at a lower computational cost.
● RandomizedSearchCV implements a "fit" and a "score" method.
● It also implements "score_samples", "predict", "predict_proba",
"decision_function", "transform" and "inverse_transform" if they are
implemented in the estimator used.
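A minimal sketch of `RandomizedSearchCV` tuning a random forest. The parameter grid and synthetic data here are illustrative choices, not the project's actual search space:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data (illustrative features).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(150, 2))
y = 20000 - 1200 * X[:, 0] - 800 * X[:, 1] + rng.normal(0, 300, 150)

# Sample 4 random hyperparameter combinations and score each with 3-fold CV.
param_distributions = {"n_estimators": [25, 50], "max_depth": [3, 5, None]}
search = RandomizedSearchCV(RandomForestRegressor(random_state=0),
                            param_distributions, n_iter=4, cv=3,
                            random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Unlike grid search, only `n_iter` of the possible combinations are evaluated, which keeps the cost fixed regardless of how large the search space grows.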
4. Basic Tools Used
4.1 Numpy
● NumPy is a general-purpose array-processing
package.
● It provides a high-performance multidimensional
array object, and tools for working with these
arrays.
● It is the fundamental package for scientific
computing with Python.
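A small illustration of NumPy's array object and vectorized operations (the price values are invented for the example):

```python
import numpy as np

# A one-dimensional array of illustrative car prices.
prices = np.array([5000.0, 7500.0, 6200.0, 8100.0])

print(prices.mean())   # aggregate over the whole array: 6700.0
print(prices / 1000)   # vectorized: divides every element at once
```

Operations like `mean` and element-wise division apply to the whole array without explicit Python loops, which is where NumPy's performance comes from.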
4.2 Pandas
● pandas is a software library written for the
Python programming language for data
manipulation and analysis.

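A small illustration of pandas' central data structure, the DataFrame, with invented car records (not the project's dataset):

```python
import pandas as pd

# A tabular dataset: one row per car, one column per attribute.
cars = pd.DataFrame({
    "model": ["A", "B", "C"],
    "year":  [2015, 2018, 2012],
    "price": [5000, 9000, 3500],
})

# Typical manipulation: filter rows, then aggregate a column.
recent = cars[cars["year"] >= 2015]
print(recent["price"].mean())  # 7000.0
```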
4.3 Matplotlib
● Matplotlib is an easy-to-use and powerful visualization library in
Python.
● It is built on NumPy arrays, designed to work with the broader
SciPy stack, and supports several plot types such as line, bar,
scatter and histogram.
● Matplotlib is a low-level graph plotting library in Python that
serves as a visualization utility.
● Matplotlib is open source and can be used freely.
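A minimal sketch of the kind of plot used in this project: predicted vs. actual prices as a scatter plot, with a reference line for perfect predictions. The data values are invented for illustration.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt

actual = [5000, 7000, 6000]       # illustrative true prices
predicted = [5200, 6800, 6100]    # illustrative model outputs

fig, ax = plt.subplots()
ax.scatter(actual, predicted)                 # one point per car
ax.plot([5000, 7000], [5000, 7000])           # y = x: perfect-prediction line
ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
fig.savefig(io.BytesIO(), format="png")       # render the figure
```

Points close to the reference line indicate accurate predictions, which is how such plots are read.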
4.4 Seaborn
● Seaborn is a data visualization library built on top of
matplotlib and closely integrated with pandas data
structures in Python.
● Visualization is the central part of Seaborn which
helps in exploration and understanding of data.

4.6 Sklearn
● Scikit-learn (Sklearn) is the most useful and robust library
for machine learning in Python.
● It provides a selection of efficient tools for machine
learning and statistical modeling including classification,
regression, clustering and dimensionality reduction via a
consistent interface in Python.
● This library, which is largely written in Python, is built upon
NumPy, SciPy and Matplotlib.

4.6.1 Mean Squared Error
● The Mean Squared Error (MSE) or Mean Squared Deviation
(MSD) of an estimator measures the average of the squared errors, i.e.
the average squared difference between the estimated values and the
true values.
● It is a risk function, corresponding to the expected value of the
squared error loss.
● It is always non-negative, and values close to zero are better.
● The MSE is the second moment of the error (about the origin) and
thus incorporates both the variance of the estimator and its bias.
4.6.2 R2 Score
● The coefficient of determination, also called the R2 score, is used to
evaluate the performance of a regression model.
● It is the proportion of the variation in the output (dependent)
attribute that is predictable from the input (independent)
variable(s).
● It is used to check how well observed results are reproduced
by the model, based on the proportion of the total variation
that the model explains.

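Both metrics are available in scikit-learn; a small worked example with invented values:

```python
from sklearn.metrics import mean_squared_error, r2_score

y_true = [3.0, 5.0, 7.0]   # illustrative true values
y_pred = [2.5, 5.0, 8.0]   # illustrative predictions

# MSE: mean of squared errors = (0.25 + 0 + 1) / 3 = 0.4167
mse = mean_squared_error(y_true, y_pred)

# R2: 1 - SS_res/SS_tot = 1 - 1.25/8 = 0.84375
r2 = r2_score(y_true, y_pred)

print(mse, r2)
```

Here SS_res is the sum of squared residuals (1.25) and SS_tot is the total squared deviation of `y_true` from its mean of 5 (namely 4 + 0 + 4 = 8), matching the definitions above.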
5. Results
Plots
[Visualizations of the model predictions were shown here.]
R2 value for various models
● R2 score for Linear Regression: 0.8407655400238144
● R2 score for Random Forest: 0.9128634064889848
● R2 score for Gradient Boosting: 0.8919869294964318
● R2 score for Randomized SearchCV: 0.808154428580237
