DEEP LEARNING WEEK 11

1. Which of the following is a limitation of traditional feedforward neural networks in handling
sequential data? (MSQ)
a) They can only process fixed-length input sequences
b) They can handle variable-length input sequences
c) They can't model temporal dependencies between sequential data
d) They are not affected by the order of input sequences
Answer: a), c), d)
Solution: Traditional feedforward neural networks are limited in their ability to handle
sequential data: they can only process fixed-length input sequences, they cannot model the
temporal dependencies between elements of a sequence, and they ignore the order of the inputs.
In contrast, recurrent neural networks (RNNs) can handle variable-length input sequences and
model these temporal dependencies.

2. Which of the following is a common architecture used for sequence learning in deep learning?
a) Convolutional Neural Networks (CNNs)
b) Autoencoders
c) Recurrent Neural Networks (RNNs)
d) Generative Adversarial Networks (GANs)
Answer: c) Recurrent Neural Networks (RNNs)
Solution: Recurrent Neural Networks (RNNs) are a common architecture used for sequence
learning in deep learning. RNNs are designed to handle sequential data by maintaining a
hidden state that captures the context of the previous inputs in the sequence. This allows
RNNs to model the temporal dependencies between sequential data.
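Illustration (not part of the original question): a minimal NumPy sketch of the hidden-state recurrence described above; the dimensions and weight names are chosen arbitrarily for the example.

```python
import numpy as np

# Minimal RNN cell: the hidden state h_t summarizes all inputs seen so far.
# Dimensions (input_dim = 3, hidden_dim = 4) and names are illustrative only.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))      # input-to-hidden weights
W = rng.normal(size=(4, 4))      # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

def rnn_forward(xs):
    """Run the RNN over a sequence of arbitrary length, returning all hidden states."""
    h = np.zeros(4)
    states = []
    for x in xs:                          # one step per element of the sequence
        h = np.tanh(U @ x + W @ h + b)    # h_t depends on x_t and h_{t-1}
        states.append(h)
    return states

sequence = [rng.normal(size=3) for _ in range(7)]   # variable-length input
print(len(rnn_forward(sequence)))                   # 7: one hidden state per time step
```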

3. What is the vanishing gradient problem in training RNNs?


a) The weights of the network converge to zero during training
b) The gradients used for weight updates become too large
c) The gradients used for weight updates become too small
d) The network becomes overfit to the training data
Answer: c) The gradients used for weight updates become too small
Solution: The vanishing gradient problem is a common issue in training RNNs where the
gradients used for weight updates become too small, making it difficult to learn long-term
dependencies in the input sequence. This can lead to poor performance and slow convergence
during training.
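Illustration (a small numerical sketch with made-up values): backpropagating through many time steps multiplies the gradient by the recurrent Jacobian repeatedly, so a spectral norm below 1 makes it decay exponentially (above 1 it explodes instead).

```python
import numpy as np

# The gradient flowing back through T steps is multiplied by the recurrent
# Jacobian at each step; the tanh derivative (<= 1) only shrinks it further.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
W = 0.9 * A / np.linalg.norm(A, 2)   # recurrent weights scaled to spectral norm 0.9

grad = np.ones(4)                    # gradient arriving at the last time step
for _ in range(50):                  # propagate 50 steps back in time
    grad = W.T @ grad

print(np.linalg.norm(grad))          # at most ~0.9**50 of the original norm: almost no signal
```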

4. Which of the following is the main disadvantage of using BPTT?


a) It is computationally expensive.
b) It is difficult to implement.
c) It requires a large amount of data.
d) It is prone to overfitting.
Answer: a) It is computationally expensive.
Solution: The main disadvantage of using BPTT is that it can be computationally
expensive, especially for long sequences. This is because the network needs to be unrolled for
each timestep in the sequence, which can result in a large number of weights and
calculations. Additionally, the use of gradient descent for weight updates can result in slow
convergence and potentially unstable learning.
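Illustration (a rough sketch assuming the simple tanh RNN from the earlier example, with the loss applied only at the final step): unrolling stores one hidden state per step and backpropagates through each, so cost grows with sequence length; truncated BPTT caps it by cutting the backward pass after a fixed number of steps.

```python
import numpy as np

def bptt_grad_W(states, W, d_loss_d_h_last, trunc=10):
    """Gradient of the loss w.r.t. the recurrent weights W of a tanh RNN,
    backpropagated through at most `trunc` time steps (truncated BPTT)."""
    grad_W = np.zeros_like(W)
    delta = d_loss_d_h_last                          # dL/dh at the final step
    T = len(states)
    for t in range(T - 1, max(T - 1 - trunc, -1), -1):
        delta = delta * (1 - states[t] ** 2)         # back through the tanh nonlinearity
        h_prev = states[t - 1] if t > 0 else np.zeros_like(states[0])
        grad_W += np.outer(delta, h_prev)            # contribution of step t
        delta = W.T @ delta                          # pass the gradient to step t - 1
    return grad_W
```

With `trunc` equal to the sequence length this is full BPTT; smaller values trade gradient accuracy for cheaper and more stable updates.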

In BPTT, what is the role of the error gradient?
a) To update the weights of the connections between the neurons.
b) To propagate information backward through time.
c) To determine the output of the network.
d) To adjust the learning rate of the network.
Answer: b) To propagate information backward through time.
Solution: In BPTT, the error gradient is used to propagate information backward through
time by computing the derivative of the error with respect to each weight in the network.
This allows the network to learn from past inputs and to use that information to make
predictions about future inputs.
5. Arrange the following sequence in the order they are performed by LSTM at time step t.
[Selectively read, Selectively write, Selectively forget]

a) Selectively read, Selectively write, Selectively forget
b) Selectively write, Selectively read, Selectively forget
c) Selectively read, Selectively forget, Selectively write
d) Selectively forget, Selectively write, Selectively read

Answer: c)
Solution: At time step t we first selectively read (via the input gate), then selectively forget
(via the forget gate) to create the cell state s_t, and then selectively write (via the output
gate) to create h_t from s_t, which is used at time step t+1.
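Illustration (a sketch of one LSTM step in the read / forget / write order described above; the gate names follow the usual convention and the parameter tuple is hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, params):
    """One LSTM time step: selectively read, selectively forget, selectively write."""
    Wi, Ui, bi, Wf, Uf, bf, Wo, Uo, bo, W, U, b = params
    s_tilde = np.tanh(W @ h_prev + U @ x_t + b)        # candidate state
    i_t = sigmoid(Wi @ h_prev + Ui @ x_t + bi)         # input gate: selectively read
    f_t = sigmoid(Wf @ h_prev + Uf @ x_t + bf)         # forget gate: selectively forget
    s_t = f_t * s_prev + i_t * s_tilde                 # new cell state s_t
    o_t = sigmoid(Wo @ h_prev + Uo @ x_t + bo)         # output gate: selectively write
    h_t = o_t * np.tanh(s_t)                           # h_t is carried to time step t + 1
    return h_t, s_t
```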
6. What are the problems in the RNN architecture? (MSQ)
a) Morphing of information stored at each time step.
b) Exploding and vanishing gradient problem.
c) Errors caused at time step t_n can't be related to faraway previous time steps.
d) All of the above

Answer: d)
Solution: Information stored in the network gets morphed at every time step due to new
input. Exploding and vanishing gradient problems are caused by the long dependency chains
in RNNs, which also make it hard to relate an error observed at time step t_n to inputs seen
far in the past.
7. What is the purpose of the forget gate in an LSTM network?
A) To decide how much of the cell state to keep from the previous time step
B) To decide how much of the current input to add to the cell state
C) To decide how much of the current cell state to output
D) To decide how much of the current input to output
Answer: A) To decide how much of the cell state to keep from the previous time step
Explanation: The forget gate in an LSTM network determines how much of the previous
cell state to forget and how much to keep for the current time step.
8. Which of the following is the formula for calculating the update gate in a GRU network?
A) z_t = σ(W_z ∗ [h_{t-1}, x_t])
B) z_t = σ(W_z ∗ h_{t-1} + U_z ∗ x_t)
C) z_t = σ(W_z ∗ h_{t-1} + U_z ∗ x_t + b_z)
D) z_t = tanh(W_z ∗ h_{t-1} + U_z ∗ x_t)
Answer: C) z_t = σ(W_z ∗ h_{t-1} + U_z ∗ x_t + b_z)
Solution: The update gate z_t combines the previous hidden state h_{t-1} and the current input
x_t through their weight matrices and a bias term, passed through a sigmoid so that each entry
lies between 0 and 1.
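Illustration (a quick numerical check of the formula, with arbitrary small dimensions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Update gate of a GRU: z_t = sigmoid(W_z h_{t-1} + U_z x_t + b_z).
# Shapes (hidden_dim = 4, input_dim = 3) are illustrative only.
rng = np.random.default_rng(2)
W_z, U_z, b_z = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
h_prev, x_t = rng.normal(size=4), rng.normal(size=3)

z_t = sigmoid(W_z @ h_prev + U_z @ x_t + b_z)
print(z_t)   # every entry lies in (0, 1): how much of the candidate state to take at step t
```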
Common data for the following questions
We are given the following RNN, along with its architecture (the figure does not show the
weights W connecting the states of the network across time steps).

[Figure: one time step of the RNN, with an input layer (x1, x2, x3), a hidden layer of four
neurons (h1^(1), h2^(1), h3^(1), h4^(1)), and an output layer (ŷ1, ŷ2, ŷ3).]

9. How many neurons are in the hidden layer at state s2 of the RNN?
a) 6
b) 2
c) 9
d) 4
Answer: d)
Solution: There is only one set of layers in an RNN; the different blocks in the picture
represent the state of the same network at different time steps. The hidden layer therefore has
4 neurons, h1^(1) through h4^(1), at every state, including s2.
10. We have trained the above RNN, and it has learned its weights and biases. If the weight
from x1 to h1^(1) at s5 is 3, what will be the value of the same weight at s6?
a) 3
b) 6
c) 4
d) 1
Answer: a)
Solution: Weights are shared across all the states (time steps) of an RNN, so the weight keeps
the same value, 3, at s6.
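Illustration (a minimal sketch of the parameter sharing referred to in the solution): the same weight matrices are applied at every time step, so the weight from x1 to h1^(1) has the same value at s5 and s6.

```python
import numpy as np

# Parameter sharing in an RNN: the same U, W, b are reused at every time step,
# so a weight has a single value no matter which unrolled state you inspect.
rng = np.random.default_rng(3)
U = rng.normal(size=(4, 3))   # input-to-hidden weights; U[0, 0] plays the role of "x1 to h1^(1)"
W = rng.normal(size=(4, 4))   # hidden-to-hidden weights
b = np.zeros(4)

h = np.zeros(4)
for x_t in [rng.normal(size=3) for _ in range(6)]:   # states s1 ... s6
    h = np.tanh(U @ x_t + W @ h + b)                 # every step uses the very same U, W, b
print(U[0, 0])   # unchanged within the forward pass; only gradient descent updates it
```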
