Data Science Concepts: Overfitting and Underfitting
Aitor Larrinoa
January 2025
Contents
1 Introduction
2 What are overfitting and underfitting?
3 Example
1 Introduction
Our main goal when we train a ML model is to get good results. Thus, the better the model's
metric, the better its performance. However, is this entirely true? We have to be cautious,
because our main goal should instead be good generalization.
A model is said to generalize well when it can handle new, unseen input data effectively.
However, finding a balance between fitting the training data and performing well on new data
is not straightforward and can lead to two common problems: overfitting and underfitting.
In this post we will dive into the concepts of overfitting and underfitting, understand why
they happen, work through a practical example, and look at what strategies we can use to
avoid them.
2 What are overfitting and underfitting?
Two of the biggest problems when dealing with ML models are overfitting and underfitting.
Just like a human being, a learning machine must be able to generalize concepts. Suppose
that we see a Labrador Retriever for the first time in our lives, and someone tells us, “That
is a dog.” Later, we are shown a Poodle and asked, “Is that a dog?” We might say, “No,” as
it looks nothing like what we previously learned. Now imagine someone shows us a book with
pictures of 10 different dog breeds. When we see a breed we are unfamiliar with, we will be able
to recognize it as a dog because of the characteristics observed in the various dogs depicted in
the photos.
The goal is to ensure that the model can generalize a concept so that, when presented with
new, unfamiliar data, it can still provide a reliable result. Before diving into overfitting and
underfitting, the following concepts must be understood:
Definition 2.1. Bias is the difference between the model’s prediction and the correct value it
aims to predict.
Definition 2.2. Variance is the variability of the model’s prediction for a given data point; it
tells us how spread out our predictions are.
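To make these two definitions concrete, here is a minimal sketch (assuming NumPy and scikit-learn, and a toy quadratic ground truth chosen only for illustration) that estimates the bias and variance of a linear model at a single point by refitting it on many resampled training sets:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical ground-truth relationship, used only for this illustration
    return x ** 2

x_new = np.array([[1.5]])   # single point at which we study bias and variance
predictions = []

# Refit the same (too simple) model on many resampled training sets
for _ in range(200):
    x_train = rng.uniform(-2, 2, size=(30, 1))
    y_train = true_f(x_train).ravel() + rng.normal(0, 0.1, size=30)
    model = LinearRegression().fit(x_train, y_train)
    predictions.append(model.predict(x_new)[0])

predictions = np.array(predictions)
bias = predictions.mean() - true_f(x_new)[0, 0]   # Definition 2.1
variance = predictions.var()                      # Definition 2.2
print(f"bias ~ {bias:.3f}, variance ~ {variance:.3f}")
```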
So now, what is overfitting? And what is underfitting?
• Overfitting: The model adjusts only to the specific cases it is taught (the training set)
and is unable to recognize new input data (the test set).
• Underfitting: The model is too simple to learn even the cases it is taught, so it performs
poorly on both the training set and the test set.
In other words, underfitting occurs when the model is too simple, resulting in high bias and
an inability to capture the true patterns in the data, whereas overfitting happens when the
model is too complex, leading to high variance and poor generalization to unseen data.
Next, we show some visual examples of overfitting and underfitting for classification and
regression tasks:
Figure 1: Overfitting and underfitting
3 Example
We will create an example in order to see the relevance of overfitting and underfitting more
clearly.
Let’s suppose we have a dataset in front of us where y is a function of x and their relationship
is given by the following equation:
y = x²
If we consider a linear regression model, for example y = β₀ + x · β₁, the error will be high
because a straight line cannot capture the curvature of the relationship above. This is
underfitting, as shown in the following plot:
Clearly, the line does not fit our data points well. In fact, the metric confirms the poor
performance of the model. Thus, underfitting clearly appears.
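As a rough sketch of this underfitting scenario (assuming scikit-learn, and a small synthetic sample of the y = x² relationship with a bit of noise), a straight line leaves a large error even on the data it was trained on:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(100, 1))
y = x.ravel() ** 2 + rng.normal(0, 0.1, size=100)   # y = x^2 plus a little noise

# A straight line cannot bend to follow the parabola: high bias, underfitting
linear = LinearRegression().fit(x, y)
print("train MSE:", round(mean_squared_error(y, linear.predict(x)), 3))
print("train R^2:", round(linear.score(x, y), 3))   # low even on the training data
```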
However, if we consider a polynomial regression with a high degree, say 22, we will obtain a
model that fits the training data extremely well but will not be capable of generalizing its
predictions.
As said before, the model performs extremely well on the training data. As a result, if we now
consider a new data point, even one only slightly different from the training data, the error
will be quite high due to overfitting. Thus, our main goal when training a model should be
generalization.
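Continuing the same toy setting, here is a hedged sketch of the overfitting side (again assuming scikit-learn; the exact numbers depend on the random seed): a degree-22 polynomial has enough flexibility to chase the noise in a small training set and then fail on held-out points:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(40, 1))
y = x.ravel() ** 2 + rng.normal(0, 0.1, size=40)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.25, random_state=0)

# Degree 22 gives the model enough flexibility to memorise the training noise
overfit = make_pipeline(PolynomialFeatures(degree=22), LinearRegression()).fit(x_tr, y_tr)
print("train MSE:", round(mean_squared_error(y_tr, overfit.predict(x_tr)), 4))  # typically tiny
print("test  MSE:", round(mean_squared_error(y_te, overfit.predict(x_te)), 4))  # typically much larger
```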
As seen, overfitting and underfitting can cause serious problems when creating a machine
learning model. Thus, we need to control them. Let's go through some generic measures we
can take in order to keep underfitting and overfitting under control:
• More data. Training with few data points can cause overfitting.
• Reduce model complexity. Less is more: a very complex model can lead to
overfitting.
• Feature engineering. Poor feature engineering often leads to underfitting, which is why
this is one of the most important considerations in a data science project.
• Cross-validation. Techniques like k-fold cross-validation can help evaluate the model’s
performance on unseen data, reducing the risk of overfitting or underfitting during
training (see the sketch below).
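As a small illustration of the cross-validation point above (a sketch assuming scikit-learn and its bundled diabetes dataset), k-fold scores on held-out folds give a more honest view of generalization than the training score alone:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Each fold is trained on 4/5 of the data and scored on the remaining, unseen 1/5
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
print("fold R^2 scores:", scores.round(3))
print("mean R^2:", round(scores.mean(), 3))
```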
In fact, depending on the model we are dealing with, the path to avoiding overfitting and
underfitting is never the same. Let's dive into different types of models and see how we can
deal with these problems in each case:
Parametric models, such as linear regression and logistic regression, assume a fixed functional
form with a finite number of parameters. Here are some approaches to controlling
underfitting and overfitting in these models:
• Regularization: Techniques like ridge or lasso regression add constraints to the model
coefficients, reducing overfitting (see the sketch after this list).
• Feature selection: Choose only the most relevant features for the model. Reducing
irrelevant or highly correlated features can improve generalization.
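A minimal sketch of the regularization idea (assuming scikit-learn and, again, its diabetes dataset; the alpha values are arbitrary illustration choices):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# A larger alpha shrinks the coefficients more strongly, trading variance for bias
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=0.1)   # lasso can also drive some coefficients exactly to zero

for name, model in [("ridge", ridge), ("lasso", lasso)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(name, "mean R^2:", round(scores.mean(), 3))
```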
Decision trees, random forests, XGBoost, ... are examples of tree-based algorithms. These
types of algorithms tend to overfit. Let's see what can be done for these types of models:
• Ensemble methods: Models like random forests and gradient boosting combine multiple
trees to improve generalization. Use techniques like bagging (random forests) or
boosting to balance bias and variance.
• Feature importance: Selecting the features that contribute most to the model’s predictions
can improve generalization and help avoid overfitting (see the sketch below).
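To illustrate both points for tree-based models, here is a sketch (assuming scikit-learn and its breast cancer dataset; the depth and number of trees are arbitrary choices) that limits tree depth to curb variance and then inspects feature importances:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = load_breast_cancer()
X, y = data.data, data.target

# Averaging many shallow trees keeps the variance of the ensemble under control
forest = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=0)
print("CV accuracy:", round(cross_val_score(forest, X, y, cv=5).mean(), 3))

# Features with low importance are natural candidates to drop
forest.fit(X, y)
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```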
Neural networks are among the most complex models in machine learning and AI, and they
are prone to overfitting when the architecture is too complex or the dataset is small.
Consider the following tips:
• Early stopping: Stop training once the validation loss stops improving, preventing
the network from overfitting (see the sketch after this list).
• Data augmentation: Normally used when dealing with images. The idea is to artificially
increase the size of the training dataset by applying transformations such as rotations,
flips, or noise to the input data. Thus, we will get more data for free.
• Architecture tuning: Reduce the number of layers or neurons if the network is too large
for the dataset. This is the main approach when we have an overfitted neural
network.
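As a closing sketch of the early stopping idea (assuming TensorFlow/Keras is installed; the toy data, layer sizes, and patience value are arbitrary illustration choices):

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)).astype("float32")
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype("float32")   # toy binary target

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop once the validation loss has not improved for 5 epochs and keep the best weights
stopper = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[stopper], verbose=0)
```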