Week 11
2. Which of the following is a common architecture used for sequence learning in deep learning?
a) Convolutional Neural Networks (CNNs)
b) Autoencoders
c) Recurrent Neural Networks (RNNs)
d) Generative Adversarial Networks (GANs)
Answer: c) Recurrent Neural Networks (RNNs)
Solution: Recurrent Neural Networks (RNNs) are a common architecture used for sequence
learning in deep learning. RNNs are designed to handle sequential data by maintaining a
hidden state that captures the context of the previous inputs in the sequence. This allows
RNNs to model temporal dependencies between the elements of a sequence.
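A minimal sketch of this idea (all weights, dimensions, and names below are illustrative, not taken from the question):

```python
import numpy as np

# Illustrative sizes: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))   # input-to-hidden weights
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

def rnn_forward(xs):
    """Run a vanilla RNN over a sequence, carrying the hidden state."""
    h = np.zeros(4)                          # initial hidden state
    states = []
    for x in xs:                             # one step per sequence element
        h = np.tanh(W_x @ x + W_h @ h + b)   # h_t depends on x_t and h_{t-1}
        states.append(h)
    return states

hidden_states = rnn_forward(rng.normal(size=(5, 3)))  # toy sequence of length 5
```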
In BPTT, what is the role of the error gradient?
a) To update the weights of the connections between the neurons.
b) To propagate information backward through time.
c) To determine the output of the network.
d) To adjust the learning rate of the network.
Answer: b) To propagate information backward through time.
Solution: In BPTT, the error gradient is used to propagate information backward through
time by computing the derivative of the error with respect to each weight in the network.
This allows the network to learn from past inputs and to use that information to make
predictions about future inputs.
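A small worked sketch of BPTT on a scalar linear RNN (the model, inputs, and loss here are made up for illustration) shows the gradient being propagated backward through time and checked against a finite-difference estimate:

```python
import numpy as np

# Scalar linear RNN: h_t = w * h_{t-1} + x_t, with loss L = h_T.
w = 0.9
xs = np.array([1.0, -0.5, 2.0, 0.3])

# Forward pass: store the hidden states (needed by the backward pass).
hs = [0.0]
for x in xs:
    hs.append(w * hs[-1] + x)

# Backward pass (BPTT): propagate dL/dh_t backward through time
# and accumulate dL/dw from every time step.
grad_h = 1.0      # dL/dh_T for L = h_T
grad_w = 0.0
for t in range(len(xs), 0, -1):
    grad_w += grad_h * hs[t - 1]   # local term: dh_t/dw = h_{t-1}
    grad_h *= w                    # dL/dh_{t-1} = dL/dh_t * dh_t/dh_{t-1}

# Sanity check against a finite-difference estimate.
def loss(w_val):
    h = 0.0
    for x in xs:
        h = w_val * h + x
    return h

eps = 1e-6
print(grad_w, (loss(w + eps) - loss(w - eps)) / (2 * eps))  # should match
```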
5. Arrange the following sequence in the order they are performed by LSTM at time step t.
[Selectively read, Selectively write, Selectively forget]
Answer: c) Selectively read, selectively forget, selectively write
Solution: At time step t we first selectively read, then selectively forget part of the previous state $s_{t-1}$ to create the new state $s_t$. We then selectively write to create $h_t$ from $s_t$, which is passed on to time step $t+1$.
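A sketch of one LSTM step in this read/forget/write order (the weights are random placeholders; the gating equations follow the standard LSTM, with comments mapping each stage to the terminology above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
n, d = 4, 3
Wi, Wf, Wo, Wc = (rng.normal(size=(n, n + d)) for _ in range(4))

def lstm_step(x, h_prev, s_prev):
    z = np.concatenate([h_prev, x])
    # Selectively read: the input gate decides how much of the new
    # candidate (built from h_{t-1} and x_t) to read in.
    i = sigmoid(Wi @ z)
    s_tilde = np.tanh(Wc @ z)
    # Selectively forget: the forget gate decides how much of s_{t-1}
    # to keep; combining both terms creates the new state s_t.
    f = sigmoid(Wf @ z)
    s = f * s_prev + i * s_tilde
    # Selectively write: the output gate decides how much of s_t to
    # expose as h_t, which is passed to time step t+1.
    o = sigmoid(Wo @ z)
    h = o * np.tanh(s)
    return h, s

h1, s1 = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n))
```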
6. What are the problems in the RNN architecture? (MSQ)
Answer: d)
Solution: Information stored in the network gets morphed at every time step by the new input. The exploding and vanishing gradient problems are caused by the long dependency chains in an RNN.
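A quick illustration of the second problem (the matrices and step count are arbitrary, and the tanh derivative is omitted for simplicity): backpropagating through many steps multiplies the gradient by the recurrent Jacobian repeatedly, so its norm either vanishes or explodes.

```python
import numpy as np

# Propagate a gradient back through 50 time steps for a "small" and a
# "large" recurrent weight matrix.
for scale in (0.5, 1.5):
    rng = np.random.default_rng(0)
    W = scale * rng.normal(size=(4, 4)) / np.sqrt(4)
    g = np.ones(4)               # stand-in for dL/dh_T
    for t in range(50):
        g = W.T @ g              # one step of the chain rule
    print(scale, np.linalg.norm(g))   # ~0 (vanishing) or huge (exploding)
```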
7. What is the purpose of the forget gate in an LSTM network?
A) To decide how much of the cell state to keep from the previous time step
B) To decide how much of the current input to add to the cell state
C) To decide how much of the current cell state to output
D) To decide how much of the current input to output
Answer: A) To decide how much of the cell state to keep from the previous time step
Explanation: The forget gate in an LSTM network determines how much of the previous
cell state to forget and how much to keep for the current time step.
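A tiny numeric illustration (values chosen arbitrarily) of how the forget gate value f scales the previous cell state in the update $s_t = f \odot s_{t-1} + i \odot \tilde{s}_t$:

```python
import numpy as np

s_prev = np.array([2.0, -1.0])   # previous cell state
for f in (0.0, 0.5, 1.0):
    # Ignoring the new-input term: f = 0 erases the memory entirely,
    # f = 1 carries it forward unchanged.
    print(f, f * s_prev)
```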
8. Which of the following is the formula for calculating the output gate in a GRU network?
A) $z_t = \sigma(W_z [h_{t-1}, x_t])$
B) $z_t = \sigma(W_z h_{t-1} + U_z x_t)$
C) $z_t = \sigma(W_z h_{t-1} + U_z x_t + b_z)$
D) $z_t = \tanh(W_z h_{t-1} + U_z x_t)$
Answer: C) $z_t = \sigma(W_z h_{t-1} + U_z x_t + b_z)$
Solution: The gate $z_t$ (conventionally called the update gate of a GRU) is a sigmoid over a linear combination of $h_{t-1}$ and $x_t$ that includes the bias term $b_z$; the other options either drop the bias or use the wrong activation.
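A direct sketch of this formula with placeholder weights (all dimensions and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
n, d = 4, 3
W_z = rng.normal(size=(n, n))
U_z = rng.normal(size=(n, d))
b_z = np.zeros(n)

h_prev = rng.normal(size=n)
x_t = rng.normal(size=d)
z_t = sigmoid(W_z @ h_prev + U_z @ x_t + b_z)   # gate values lie in (0, 1)
```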
Common data for questions 1-3
We are given the following RNN, along with its architecture (the figure does not show the weight matrix W connecting the states of the network).
9. How many neurons are in the hidden layer at state $s_2$ of the RNN?
a) 6
b) 2
c) 9
d) 4
Answer: d)
Solution: There is only one underlying architecture in an RNN; the different blocks in the figure represent the state of the same network at different time steps, so the hidden layer has the same number of neurons at every state.
10. We have trained the above RNN and it has learned weights and biases accordingly. If the weight from $x_1$ to $h_1^{(1)}$ at $s_5$ is 3, what will be the value of the same weight at $s_6$?
a) 3
b) 6
c) 4
d) 1
Answer: a)
Solution: The weights are shared across all time steps in an RNN, so the weight at $s_6$ is the same as at $s_5$, namely 3.
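A sketch of this weight sharing (dimensions and values are illustrative): the unrolled states $s_1, s_2, \ldots$ all reuse the same weight objects, so the weight at $s_5$ and $s_6$ is literally the same parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))   # one shared input-to-hidden matrix
W_h = rng.normal(size=(4, 4))   # one shared recurrent matrix

h = np.zeros(4)
for t, x in enumerate(rng.normal(size=(6, 3))):
    # The same W_x and W_h are applied at s_1, s_2, ..., s_6; the
    # unrolled "blocks" are just this loop body at different times,
    # so the x_1 -> h_1 weight at s_5 equals the one at s_6.
    h = np.tanh(W_x @ x + W_h @ h)
```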