DL (2)
The Linear Activation Function (Identity Function) returns the input as the output; it is typically used in the output layer of regression networks.
1. Sigmoid Function
The Sigmoid Activation Function is characterized by its ‘S’ shape. It is mathematically defined as
A = 1 / (1 + e^(-x)).
This formula ensures a smooth and continuous output that is essential for gradient-based optimization methods.
● It allows neural networks to handle and model complex patterns.
● Its output lies between 0 and 1, which makes it well suited to binary classification.
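A minimal sketch of the definition above (assuming NumPy; the test inputs are arbitrary), showing how the sigmoid squashes any real input into the open interval (0, 1):

```python
import numpy as np

def sigmoid(x):
    # A = 1 / (1 + e^(-x)); output is squashed into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```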
2. Tanh Function
The Tanh (Hyperbolic Tangent) Activation Function is defined as
f(x) = tanh(x) = 2 / (1 + e^(-2x)) - 1.
It can also be written in terms of the sigmoid: tanh(x) = 2 × sigmoid(2x) - 1.
● Its output is zero-centered and lies in the range (-1, 1), which is why tanh is commonly used in hidden layers.
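A small check of the identity above (assuming NumPy; the sample points are arbitrary) confirms that tanh can be built from the sigmoid:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
# tanh can be written in terms of sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```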
3. ReLU Function
ReLU (Rectified Linear Unit) is defined as
A(x) = max(0, x),
which means that if the input x is positive, ReLU returns x; if the input is negative, it returns 0.
● Value Range: [0, ∞), meaning the function only outputs non-negative values.
● Only a subset of neurons is activated at any time, which makes the network sparse and more efficient.
● It requires only a simple comparison, making it cheap for computation.
Figure: ReLU Activation Function
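A minimal sketch of the definition (assuming NumPy; the inputs are arbitrary), showing that negative values are zeroed out while positive values pass through unchanged:

```python
import numpy as np

def relu(x):
    # A(x) = max(0, x): negative inputs map to 0, positive inputs pass through
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ] -> only non-negative outputs
```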
1. Softmax Function
The Softmax function is designed to handle multi-class classification
problems. It transforms raw output scores from a neural network into
probabilities. It works by squashing the output values of each class into
the range of 0 to 1, while ensuring that the sum of all probabilities equals
1.
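A minimal sketch of this behaviour (assuming NumPy; the logits are arbitrary, and subtracting the maximum is a common numerical-stability trick not mentioned above):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, then normalize so probabilities sum to 1
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])   # raw scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())            # ~[0.659 0.242 0.099], sums to 1.0
```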
2. SoftPlus Function
The Softplus function is defined mathematically as:
A(x) = log(1 + e^x).
It can be viewed as a smooth approximation of ReLU, without the hard corner at zero that ReLU has.
● Smoothness: Softplus is a smooth, continuous function, meaning it is differentiable everywhere, which benefits gradient-based optimization.
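A short sketch (assuming NumPy; the inputs are arbitrary) comparing Softplus with ReLU on the same values:

```python
import numpy as np

def softplus(x):
    # A(x) = log(1 + e^x): a smooth, everywhere-differentiable approximation of ReLU
    return np.log1p(np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
print(softplus(x))           # ~[0.127, 0.693, 2.127]
print(np.maximum(0.0, x))    # ReLU for comparison: [0., 0., 2.]
```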
8. Explain Momentum / Nesterov / Adagrad / RMSProp / Adam based Gradient Descent algorithms. (ALL GPT)
1. What is Momentum in Learning?
Momentum helps a model retain past updates to smooth the optimization process. Instead of considering only the current gradient, momentum incorporates previous updates, making optimization faster and more stable.
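A rough sketch of this idea (assuming NumPy, a toy quadratic loss, and illustrative values for the learning rate and momentum coefficient): the velocity term carries past updates forward, and the current gradient only nudges it:

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    # The velocity accumulates past updates; the current gradient only nudges it
    velocity = beta * velocity - lr * grad
    w = w + velocity
    return w, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(3):
    grad = 2.0 * w            # gradient of the toy loss ||w||^2
    w, v = momentum_step(w, grad, v)
print(w)
```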
2. What is Adagrad (Adaptive Gradient)?
Adagrad adapts the learning rate for each parameter individually, like a student who, instead of studying all subjects equally, realizes that Math needs more focus while English needs less: parameters that receive large or frequent gradients get smaller learning rates, while rarely updated parameters get larger ones.
📌 Advantages of Adagrad
✅ No need to manually tune the learning rate → Adagrad adjusts it automatically.
✅ Great for sparse data problems → Common in NLP (e.g., word embeddings) and Recommendation Systems.
✅ Works well for convex optimization problems → Suitable for simpler ML tasks.
❌ Disadvantages of Adagrad
1. 📉 Learning Rate Becomes Too Small (Vanishing Learning Rate)
○ Since Adagrad keeps accumulating squared gradients, the denominator grows over time.
○ This reduces the learning rate too much, causing optimization to slow down or even stop.
2. 🛠️ Not Suitable for Deep Learning
○ Deep networks require long-term learning stability.
○ Adagrad slows down too quickly, making it ineffective for complex models.
○ Algorithms like RMSProp & Adam fix this issue.
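A rough sketch of the Adagrad update (assuming NumPy, a toy quadratic loss, and an illustrative base learning rate); it also shows the vanishing-learning-rate issue from point 1 above, since the accumulator only ever grows:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    # Accumulate *all* past squared gradients; the denominator only ever grows
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

w = np.array([5.0])
accum = np.zeros_like(w)
for step in range(5):
    grad = 2.0 * w                                  # gradient of the toy loss w^2
    w, accum = adagrad_step(w, grad, accum)
    print(step, w, 0.1 / (np.sqrt(accum) + 1e-8))   # effective learning rate keeps shrinking
```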
1️⃣ RMSProp (Root Mean Square Propagation)
RMSProp balances Adagrad's rapidly shrinking learning rate by:
✅ Keeping track of recent effort (recent gradients, via an exponentially decaying average) rather than all past efforts.
✅ Smoothing out sudden changes in speed (the size of the update steps).
✅ Advantages of RMSProp
● Solves Adagrad’s problem of vanishing learning rate.
● Works well for non-stationary (changing) problems.
● Great for deep learning and RNNs (Recurrent Neural Networks).
❌ Disadvantages of RMSProp
● The base learning rate (and the decay factor) still has to be chosen manually.
● On its own it does not use momentum, which is part of what Adam adds on top.
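A rough sketch of the RMSProp update (assuming NumPy, a toy quadratic loss, and illustrative hyperparameter values); the exponentially decaying average keeps only "recent effort" in the denominator:

```python
import numpy as np

def rmsprop_step(w, grad, sq_avg, lr=0.01, beta=0.9, eps=1e-8):
    # Exponentially decaying average: recent gradients count, old ones fade out
    sq_avg = beta * sq_avg + (1.0 - beta) * grad ** 2
    w = w - lr * grad / (np.sqrt(sq_avg) + eps)
    return w, sq_avg

w = np.array([5.0])
sq_avg = np.zeros_like(w)
for _ in range(5):
    grad = 2.0 * w                 # gradient of the toy loss w^2
    w, sq_avg = rmsprop_step(w, grad, sq_avg)
print(w)
```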
2️⃣ Adam (Adaptive Moment Estimation)
🧠 Key Idea: Combine Momentum & RMSProp
Adam is like a mix of Momentum and RMSProp, making it one of the best optimizers for
deep learning.
● Momentum (the β1 term) helps keep the movement smooth so you don’t oversteer.
● RMSProp (the β2 term) adjusts how much you slow down or speed up based on past driving experience.
● Together, Adam ensures efficient, smooth, and adaptive learning – like an expert
driver adjusting speed & direction simultaneously.
✅ Advantages of Adam
● Fast convergence → Works well even with noisy gradients.
● Adaptive learning rate → Adjusts per parameter.
● Great for deep learning → Default optimizer in many frameworks (e.g., TensorFlow,
PyTorch).
❌ Disadvantages of Adam
● Computationally expensive due to tracking multiple moving averages.
● May not always generalize well for some problems (SGD can generalize better).
● Requires tuning of β1 and β2, though the defaults work well.
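A rough sketch of the Adam update (assuming NumPy, a toy quadratic loss, and the commonly cited default hyperparameters); m plays the momentum role, v plays the RMSProp role, and the bias-correction terms are part of the standard formulation (not discussed above):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # m: momentum-like running mean of gradients; v: RMSProp-like running mean of squared gradients
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)   # bias correction (t starts at 1)
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 4):
    grad = 2.0 * w                   # gradient of the toy loss ||w||^2
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```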
Comparison: Adagrad vs RMSProp
● Learning Rate Adjustment
○ Adagrad: Decreases over time due to continuous accumulation of squared gradients, making it slow for long-term training.
○ RMSProp: Keeps the learning rate more stable by preventing extreme reduction, making it better for non-convex problems.
● Handling of Sparse Data
○ Adagrad: Works well with sparse data since it aggressively reduces learning rates for frequently occurring features.
○ RMSProp: Also works well with sparse data but avoids drastic reduction of learning rates.
● Best Use Case
○ Adagrad: Suitable for convex problems and sparse data scenarios.
○ RMSProp: Works well for non-stationary and non-convex optimization problems.
Comparison: Momentum-based Gradient Descent vs Adam
● Handling of Sparse Data
○ Momentum: Not well-suited for sparse data, as the learning rate remains constant.
○ Adam: Performs well on sparse data due to per-parameter adaptive learning rates.
● Convergence Stability
○ Momentum: Helps escape local minima but may overshoot due to the fixed learning rate.
○ Adam: More stable and adaptive, making it a robust choice for deep learning.
● Best Use Case
○ Momentum: Useful when dealing with high variance or noisy gradients.
○ Adam: Preferred for deep learning models due to its adaptive and robust nature.
Summary
● AdaGrad is good for sparse data but suffers from diminishing learning rates.
● RMSProp improves upon AdaGrad by using an exponentially decaying average of
squared gradients, making it more suitable for non-convex problems.
● Momentum-based Gradient Descent accelerates convergence by adding
momentum but uses a fixed learning rate.
● Adam combines the benefits of momentum and adaptive learning rates, making it
one of the most popular optimizers for deep learning.