cst414-deep learning module 2
Where L_total is the total loss, L_original is the original loss, w_i are the
weights and λ is the regularization parameter that controls the penalty.
2) L1 Regularization (Lasso Regularization): It adds a penalty term to the
loss function based on the absolute values of the weights. It has a tendency
to drive some weights to exactly zero, effectively performing feature
selection.
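A minimal NumPy sketch of the two penalty terms, assuming a generic loss value and weight vector (the names and numbers here are illustrative, not from any particular library):

import numpy as np

def l2_penalty(w, lam):
    # lambda * sum of squared weights (one common form of the L2 term).
    return lam * np.sum(w ** 2)

def l1_penalty(w, lam):
    # lambda * sum of absolute weights; tends to push some weights to exactly zero.
    return lam * np.sum(np.abs(w))

w = np.array([0.5, -1.2, 0.0, 2.0])           # toy weight vector
L_original = 0.35                             # stand-in for the original loss
print(L_original + l2_penalty(w, lam=0.01))   # L_total with the L2 penalty
print(L_original + l1_penalty(w, lam=0.01))   # L_total with the L1 penalty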
3) Dropout: At each training step, a fraction of the neurons in the network is
randomly ignored (dropped out). The remaining neurons must learn to work
together and generalize better.
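A small sketch of dropout applied to one layer's activations, using the common "inverted dropout" formulation (the scaling by 1/(1-p) keeps the expected activation unchanged; the array here is just an example):

import numpy as np

def dropout(activations, p=0.5, training=True):
    # At training time, randomly zero out a fraction p of the activations.
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) > p).astype(activations.dtype)
    # Inverted dropout: rescale the surviving activations to keep the expected value.
    return activations * mask / (1.0 - p)

a = np.array([0.2, 1.5, -0.7, 0.9])
print(dropout(a, p=0.5))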
4) Early Stopping: The training process is stopped when the validation loss
starts increasing, indicating that the model is overfitting to the training data.
This helps prevent the model from memorizing the training data.
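A sketch of the stopping rule only, assuming the validation loss after each epoch has been recorded in a list; the "patience" counter (how many non-improving epochs to tolerate) is an added assumption, not something stated above:

def early_stopping_epoch(val_losses, patience=3):
    # Return the epoch at which training would stop, or None if it never triggers.
    best = float("inf")
    bad_epochs = 0
    for epoch, val in enumerate(val_losses):
        if val < best:
            best = val
            bad_epochs = 0          # validation improved, reset the counter
        else:
            bad_epochs += 1         # validation did not improve
            if bad_epochs >= patience:
                return epoch        # stop: the model has started to overfit
    return None

# Validation loss falls, then rises as the model starts to overfit.
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.8], patience=3))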
5) Data Augmentation: It is the process of artificially increasing the size of
the training set by applying random transformations (e.g., rotation, scaling,
flipping) to the training data. For tasks like image classification, these
transformations (rotations, flips, crops) are applied to the original images
to create new training samples.
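A dependency-free NumPy sketch of two simple augmentations (a random horizontal flip and a random 90° rotation); real pipelines usually rely on library transforms, and the image here is just a stand-in array:

import numpy as np

def augment(image, rng):
    # Randomly flip the image left-right with probability 0.5.
    if rng.random() < 0.5:
        image = np.fliplr(image)
    # Randomly rotate by 0, 90, 180 or 270 degrees.
    return np.rot90(image, k=rng.integers(0, 4))

rng = np.random.default_rng(0)
img = np.arange(16).reshape(4, 4)   # stand-in for a small grayscale image
print(augment(img, rng))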
Optimization in Deep Learning
●
Optimization in deep learning refers to the process of adjusting the parameters
(weights and biases) of a neural network in order to minimize a loss function.
1) Gradient Descent (GD):
• It is used to find a local minimum of a differentiable function. In this
method the weights are initialized using some initialization strategy and are
updated in each epoch according to the update equation.
• The core idea is to update the parameters in the direction opposite to the
gradient of the function, thus progressively reducing the value of the function.
The size of the step taken in each iteration is controlled by a parameter called
the learning rate (η).
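A minimal sketch of the update rule w ← w − η·∇f(w) on a toy function f(w) = (w − 3)², whose gradient is known in closed form; the function and values are illustrative only:

def gradient(w):
    # Gradient of f(w) = (w - 3)^2, so the minimum is at w = 3.
    return 2.0 * (w - 3.0)

w = 0.0            # some initialization strategy
eta = 0.1          # learning rate
for epoch in range(100):
    w = w - eta * gradient(w)   # step in the direction opposite to the gradient
print(w)           # converges towards 3.0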
2) Stochastic Gradient Descent (SGD):
●
It updates the model parameters using only one data point at a time, as
opposed to the entire dataset.
●
But this approach may lead to noisier updates because it processes one
observation at a time.
●
Mini-Batch SGD: It is a cross-over between GD and SGD. Here the entire
dataset is divided into small batches and the gradient is computed for each
batch.
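A sketch of mini-batch SGD on a toy linear-regression problem; setting batch_size = 1 gives plain SGD and batch_size = len(X) gives full-batch GD (the data and hyperparameters are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # toy inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)   # toy targets

w = np.zeros(3)
eta, batch_size = 0.1, 32                     # batch_size=1 -> SGD, 200 -> full GD
for epoch in range(50):
    idx = rng.permutation(len(X))             # shuffle the data each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)   # MSE gradient on the batch
        w -= eta * grad
print(w)                                      # approaches [1.0, -2.0, 0.5]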
3) GD with momentum:
●
Even with Mini-Batch SGD the updates are still noisy and convergence can take
a long time.
●
SGD with momentum works with the concept of the Exponentially
Weighted Moving Average (EWMA).
●
Here the most recent values are given more importance and earlier values are
given less importance.
●
Here βV_{t-1} is the momentum term; it reduces the noise while trying to
reach the global minimum.
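A sketch of the momentum update on the same toy quadratic as in the gradient-descent example, using one common formulation V_t = β·V_{t-1} + gradient and w ← w − η·V_t:

def gradient(w):
    # Gradient of f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

w, v = 0.0, 0.0
eta, beta = 0.1, 0.9
for step in range(100):
    v = beta * v + gradient(w)   # beta*v is the momentum term (EWMA-style accumulation)
    w = w - eta * v              # move along the smoothed, less noisy direction
print(w)                         # approaches 3.0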
4) Nesterov Accelerated GD
●
In gradient descent with momentum, the actual movement is large due
to the added momentum.
●
This added momentum causes different types of problems. We may cross
the actual minimum point and then have to come back to reach it.
●
That means SGD with momentum oscillates around the minimum
point.
●
To reduce this problem we can use Nesterov Accelerated GD (NAG).
●
In NAG we calculate the gradient at a look-ahead point and then use
it to update the weights.
●
In momentum-based GD, the weight update was based on
ΔW = history of velocity + gradient at the current point
●
But in NAG the only difference is that
ΔW = history of velocity + gradient at a look-ahead point
●
Based on this it decides whether to move forward or backward to
reach the minimum.
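A sketch of NAG on the same toy quadratic; compared with the momentum version, the only change is that the gradient is evaluated at the look-ahead point w − η·β·V_{t-1} (one common formulation):

def gradient(w):
    # Gradient of f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

w, v = 0.0, 0.0
eta, beta = 0.1, 0.9
for step in range(100):
    look_ahead = w - eta * beta * v       # where momentum alone would take us
    v = beta * v + gradient(look_ahead)   # gradient at the look-ahead point
    w = w - eta * v
print(w)                                  # approaches 3.0 with less overshoot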
5) Adagrad Optimizer
●
In SGD and Mini-Batch SGD the learning rate was the same for every
weight (parameter).
●
But in Adagrad Optimizer the learning rate gets modified based on
how frequently a parameter gets updated during training.
●
This method is adopted when the dataset has features of various
dimensions. For example, if it has both dense and sparse features,
applying the same learning rate to both types of features makes
optimization difficult.
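A sketch of the Adagrad update on a two-parameter toy problem: each parameter accumulates its own sum of squared gradients, so frequently updated parameters get a smaller effective learning rate than rarely updated ones (the form shown is one common variant):

import numpy as np

def gradient(w):
    # Gradient of f(w) = sum((w - target)^2) for a 2-parameter toy problem.
    return 2.0 * (w - np.array([3.0, -1.0]))

w = np.zeros(2)
G = np.zeros(2)                        # per-parameter sum of squared gradients
eta, eps = 0.5, 1e-8
for step in range(500):
    g = gradient(w)
    G += g ** 2                        # accumulate squared gradients per parameter
    w -= eta * g / (np.sqrt(G) + eps)  # each parameter gets its own effective learning rate
print(w)                               # approaches [3.0, -1.0]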
6) RMSProp
●
It is an extension of the popular Adaptive Gradient Algorithm (Adagrad).
Instead of Adagrad's running sum of squared gradients, it keeps an
exponentially weighted moving average of them, so the learning rate does
not decay too aggressively and training requires less effort to converge.
●
RMSProp is able to smoothly adjust the learning rate for each of the
parameters in the network, providing a better performance than
regular Gradient Descent.
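A sketch of the RMSProp update on the same two-parameter toy problem: the running sum of squared gradients from Adagrad is replaced by an exponentially weighted moving average, so the per-parameter learning rate adapts without shrinking towards zero (again one common formulation):

import numpy as np

def gradient(w):
    # Gradient of f(w) = sum((w - target)^2) for a 2-parameter toy problem.
    return 2.0 * (w - np.array([3.0, -1.0]))

w = np.zeros(2)
s = np.zeros(2)                        # EWMA of squared gradients
eta, beta, eps = 0.1, 0.9, 1e-8
for step in range(500):
    g = gradient(w)
    s = beta * s + (1 - beta) * g ** 2   # moving average instead of a running sum
    w -= eta * g / (np.sqrt(s) + eps)    # per-parameter adaptive learning rate
print(w)                               # ends up close to [3.0, -1.0]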