Deep Learning - IIT Ropar - Unit 7 - Week 4

The document is a summary of an assignment for a deep learning course on NPTEL. It includes 10 multiple choice questions about concepts related to gradient descent, stochastic gradient descent, momentum, adaptive learning rates, and activation functions. The questions assess understanding of key algorithms and concepts in deep learning.

Week 4 : Assignment
The due date for submitting this assignment has passed.
Due on 2024-02-21, 23:59 IST.

Assignment submitted on 2024-02-21, 18:13 IST

1) We have the following functions: x^3, ln(x), e^x, x and 4. Which of the following functions has the steepest slope at x = 1? (1 point)
x^3
ln(x)
e^x
4

Yes, the answer is correct.
Score: 1
Accepted Answers:
x^3
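
To see why, compare the derivatives at x = 1. The small Python check below is my own illustration, not part of the assignment:

    import numpy as np

    x = 1.0
    slopes = {
        "x^3":   3 * x**2,   # d/dx x^3  = 3x^2 -> 3
        "ln(x)": 1 / x,      # d/dx ln x = 1/x  -> 1
        "e^x":   np.exp(x),  # d/dx e^x  = e^x  -> e ~ 2.718
        "x":     1.0,        # d/dx x    = 1
        "4":     0.0,        # a constant has slope 0
    }
    print(max(slopes, key=slopes.get))  # x^3, since 3 > e > 1 > 0
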
2) Which of the following represents the contour plot of the function f(x, y) = x^2 − y^2? (1 point)
[The four answer options were contour-plot images and are not reproduced here.]

Yes, the answer is correct.
Score: 1
Accepted Answers:
[The accepted answer was an image: since f(x, y) = x^2 − y^2 is a saddle, its contours are hyperbolas opening along the x- and y-axes.]
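Although the option images are lost, the shape is easy to regenerate. A minimal matplotlib sketch (my own, assuming standard NumPy/matplotlib; not from the original page):

    import numpy as np
    import matplotlib.pyplot as plt

    # f(x, y) = x^2 - y^2 has a saddle at the origin, so its level
    # curves are hyperbolas opening along the x- and y-axes.
    xs = np.linspace(-2, 2, 200)
    X, Y = np.meshgrid(xs, xs)
    Z = X**2 - Y**2

    plt.contour(X, Y, Z, levels=15)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Contours of f(x, y) = x^2 - y^2")
    plt.show()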

3) Choose the correct options for the given gradient descent update rule (1 point)
ω_{t+1} = ω_t − η∇ω_t (η is the learning rate)
The weight update is tiny at a gentle loss surface
The weight update is tiny at a steep loss surface
The weight update is large at a steep loss surface
The weight update is large at a gentle loss surface

Yes, the answer is correct.
Score: 1
Accepted Answers:
The weight update is tiny at a gentle loss surface
The weight update is large at a steep loss surface
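
The update rule makes this immediate: the step is proportional to the gradient. A minimal sketch (my own illustration) of one update at a gentle versus a steep point:

    eta = 0.1  # learning rate
    for grad in (0.01, 5.0):       # gentle vs. steep loss surface
        step = eta * grad          # |w_{t+1} - w_t| = eta * |gradient|
        print(f"gradient = {grad}: update size = {step:.3f}")
    # gradient = 0.01: update size = 0.001  (tiny on a gentle surface)
    # gradient = 5.0: update size = 0.500   (large on a steep surface)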
4) Which of the following algorithms will result in more oscillations of the parameter during the training process of the neural network? (1 point)
Stochastic gradient descent
Mini-batch gradient descent
Batch gradient descent
Batch NAG

Yes, the answer is correct.
Score: 1
Accepted Answers:
Stochastic gradient descent
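
A rough intuition, as a NumPy sketch of my own (not from the course): the single-sample gradients that SGD follows scatter widely around the full-batch mean, so the parameter path jitters.

    import numpy as np

    rng = np.random.default_rng(0)
    # Pretend per-sample gradients for one weight: same mean, high variance.
    per_sample = rng.normal(loc=1.0, scale=2.0, size=1000)

    print("batch gradient:", round(per_sample.mean(), 2))  # what batch GD steps along
    print("SGD gradients: ", np.round(per_sample[:5], 2))  # noisy one-sample estimates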

5) Which of the following are among the disadvantages of Adagrad? (1 point)

It doesn't work well for sparse matrices.
It usually goes past the minima.
It gets stuck before reaching the minima.
Weight updates are very small at the initial stages of the algorithm.

Yes, the answer is correct.
Score: 1
Accepted Answers:
It gets stuck before reaching the minima.
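
The accepted answer follows from Adagrad's accumulator, which only grows. A minimal sketch (my own) of the shrinking effective learning rate:

    import numpy as np

    eta, eps = 0.1, 1e-8
    v = 0.0
    for t, grad in enumerate([1.0, 0.9, 0.8, 0.7, 0.6]):
        v += grad**2  # v_t = v_{t-1} + grad_t^2, never decays
        print(f"t={t}: effective lr = {eta / np.sqrt(v + eps):.4f}")
    # The printed rate strictly decreases, so steps can stall near (but
    # before) the minimum.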

6) Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters? (1 point)

Momentum optimization
Stochastic gradient descent
Nesterov accelerated gradient descent
Adagrad

Yes, the answer is correct.
Score: 1
Accepted Answers:
Nesterov accelerated gradient descent
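
The defining trick is the look-ahead gradient: NAG evaluates the gradient at the point momentum is about to reach, not at the current position. A minimal sketch of my own, using one common formulation:

    def nag_step(w, u, grad_fn, eta=0.1, beta=0.9):
        lookahead = w - beta * u                 # estimate of the next position
        u = beta * u + eta * grad_fn(lookahead)  # gradient at the look-ahead point
        return w - u, u

    # Example on f(w) = w^2, whose gradient is 2w.
    w, u = 5.0, 0.0
    for _ in range(3):
        w, u = nag_step(w, u, lambda w: 2 * w)
    print(round(w, 4))  # heads toward the minimum at w = 0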

7) Consider a gradient profile ∇W = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]. (1 point)
Assume v_{-1} = 0, ε = 0, β = 0.9, and the learning rate is η = 0.1. Suppose that we use the Adagrad algorithm; what is the value of η_6 = η/√(v_6 + ε)?

0.03
0.06
0.08
0.006

Yes, the answer is correct.
Score: 1
Accepted Answers:
0.06
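
A worked check (my own; gradient indices assumed to run t = 0..8, which is what makes the accepted answer come out):

    import numpy as np

    grads = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]
    eta, eps = 0.1, 0.0

    v = 0.0
    for t in range(7):      # accumulate up to and including t = 6
        v += grads[t] ** 2  # Adagrad: v_t = v_{t-1} + grad_t^2
    print(round(eta / np.sqrt(v + eps), 2))  # v_6 = 2.4701, so eta_6 ~ 0.06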

8) Which of the following can help avoid getting stuck in a poor local minimum while training a deep neural network? (1 point)

Using a smaller learning rate.
Using a smaller batch size.
Using a shallow neural network instead.
None of the above.

No, the answer is incorrect.
Score: 0
Accepted Answers:
None of the above.

9) What are the two main components of the ADAM optimizer? (1 point)

Momentum and learning rate.
Gradient magnitude and previous gradient.
Exponential weighted moving average and gradient variance.
Learning rate and a regularization term.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Exponential weighted moving average and gradient variance.
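
Concretely, Adam keeps two exponentially weighted moving averages, one of the gradient (m) and one of the squared gradient (v, a variance estimate), both bias-corrected. A minimal sketch of my own:

    import numpy as np

    def adam_step(w, m, v, grad, t, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * grad     # EWMA of gradients (first moment)
        v = b2 * v + (1 - b2) * grad**2  # EWMA of squared gradients (second moment)
        m_hat = m / (1 - b1**t)          # bias correction
        v_hat = v / (1 - b2**t)
        return w - eta * m_hat / (np.sqrt(v_hat) + eps), m, v

    w, m, v = 1.0, 0.0, 0.0
    for t in range(1, 4):  # a few steps on f(w) = w^2
        w, m, v = adam_step(w, m, v, grad=2 * w, t=t)
    print(round(w, 4))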

10) What is the role of activation functions in deep learning? (1 point)

Activation functions transform the output of a neuron into a non-linear function, allowing the network to learn complex patterns.
Activation functions make the network faster by reducing the number of iterations needed for training.
Activation functions are used to normalize the input data.
Activation functions are used to compute the loss function.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Activation functions transform the output of a neuron into a non-linear function, allowing the network to learn complex patterns.
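
A small demonstration of the accepted answer, as a NumPy sketch of my own: without a non-linearity, stacked layers collapse into a single linear map, so nothing complex can be learned.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
    x = rng.normal(size=3)

    no_activation = W2 @ (W1 @ x)  # two linear layers...
    collapsed = (W2 @ W1) @ x      # ...equal one linear layer
    print(np.allclose(no_activation, collapsed))  # True

    with_relu = W2 @ np.maximum(0, W1 @ x)   # ReLU restores expressiveness
    print(np.allclose(with_relu, collapsed))  # False (in general)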
