DL - Assignment 4 Solution
Deep Learning
Assignment- Week 4
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10    Total marks: 10 × 1 = 10
______________________________________________________________________________
QUESTION 1:
Which of the following cannot be realized with a single-layer perceptron (only input and output
layers)?
a. AND
b. OR
c. NAND
d. XOR
Correct Answer: d
Detailed Solution:
A single-layer perceptron cannot implement the XOR gate because XOR is not linearly separable: no single linear decision boundary separates its two output classes.
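As an illustration (not part of the original solution), the short Python sketch below runs the classical perceptron learning rule on all four gates; the training routine, learning rate, and epoch budget are assumptions made for the example. It reaches zero training error for AND, OR, and NAND, but never for XOR.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Perceptron learning rule with a step activation; returns True if zero error is reached."""
    w = np.zeros(X.shape[1] + 1)                      # [bias, w1, w2]
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = int(np.dot(w[1:], xi) + w[0] > 0)  # single layer: inputs -> one output unit
            update = lr * (target - pred)
            w[1:] += update * xi
            w[0] += update
            errors += int(update != 0.0)
        if errors == 0:                               # every training point classified correctly
            return True
    return False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
gates = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1], "NAND": [1, 1, 1, 0], "XOR": [0, 1, 1, 0]}
for name, y in gates.items():
    print(name, "learned:", train_perceptron(X, np.array(y)))
# AND/OR/NAND print True; XOR prints False
```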
QUESTION 2:
For a function f(θ0, θ1), if θ0 and θ1 are initialized at a local minimum, then what should be the
values of θ0 and θ1 after a single iteration of gradient descent?
Correct Answer: b
Detailed Solution:
At a local minimum the derivative (gradient) is zero, so a gradient-descent step will not change the
parameters; θ0 and θ1 remain at their initial values.
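A minimal numeric sketch of this fact, assuming an illustrative function f(θ0, θ1) = θ0² + θ1² whose local minimum sits at (0, 0):

```python
import numpy as np

def grad_f(theta):
    # Gradient of the illustrative f(θ0, θ1) = θ0² + θ1²; any smooth f behaves the same at a minimum.
    return 2 * theta

theta = np.array([0.0, 0.0])            # initialized exactly at the local minimum
lr = 0.1                                # assumed learning rate
theta_next = theta - lr * grad_f(theta)
print(theta_next)                       # [0. 0.] -- the gradient is zero, so nothing changes
```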
QUESTION 3:
Correct Answer: b
Detailed Solution:
Follow lecture 17
QUESTION 4:
Suppose a cost function J(θ) = 0.25θ² is plotted as shown in the graph below, with θ along the
horizontal axis. Referring to this graph, choose the correct option regarding the statements given
below.
Statement i: The magnitude of weight update at the green point is higher than the magnitude
of weight update at yellow point.
Statement ii: The magnitude of weight update at the green point is higher than the magnitude
of weight update at red point.
Correct Answer: a
Detailed Solution:
The weight update is directly proportional to the magnitude of the gradient of the cost
function. In our case, ∂J(θ)/∂θ = 0.5θ. So, the weight update will be larger for higher values of θ.
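To make the proportionality concrete, the sketch below evaluates the update magnitude α·|∂J/∂θ| = 0.5α|θ| at a few θ values; the learning rate and the sample points are illustrative assumptions, since the graph itself is not reproduced here.

```python
def grad_J(theta):
    # dJ/dθ for J(θ) = 0.25 θ²
    return 0.5 * theta

alpha = 0.1                                   # assumed learning rate, for illustration only
for theta in [0.5, 2.0, 4.0]:                 # points progressively farther from the minimum at θ = 0
    print(f"theta = {theta}: |update| = {abs(alpha * grad_J(theta)):.3f}")
# The update magnitude grows linearly with |θ|.
```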
QUESTION 5:
Choose the correct option. The gradient of a continuous and differentiable function:
i) is zero at a minimum
ii) is non-zero at a maximum
iii) is zero at a saddle point
iv) magnitude decreases as you get closer to the minimum
Correct Answer: b
QUESTION 6:
The input to the SoftMax activation function is [3, 1, 2]. What will be the output?
a. [0.58,0.11, 0.31]
b. [0.43,0.24, 0.33]
c. [0.60,0.10,0.30]
d. [0.67, 0.09,0.24]
Correct Answer: d
Detailed Solution:
SoftMax: σ(x_j) = e^{x_j} / Σ_{k=1}^{n} e^{x_k}, for j = 1, 2, …, n.
Therefore, σ(3) = e³ / (e³ + e¹ + e²) ≈ 0.67, and similarly for the other values.
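The computation can be checked with a few lines of Python (a sketch, not part of the official solution):

```python
import numpy as np

def softmax(x):
    # Numerically stable SoftMax: subtracting the max does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(np.round(softmax(np.array([3.0, 1.0, 2.0])), 2))   # [0.67 0.09 0.24]
```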
QUESTION 7:
If SoftMax of x_j is denoted as σ(x_j), where x_j is the j-th element of the n-dimensional vector X,
i.e., X = [x_1, …, x_j, …, x_n], then the derivative of σ(x_j) w.r.t. x_j, i.e., ∂σ(x_j)/∂x_j, is given by:
a. σ(x_j) × σ(x_j)
b. 1 − σ(x_j)
c. 0
d. σ(x_j) × (1 − σ(x_j))
Correct Answer: d
Detailed Solution:
SoftMax: σ(x_j) = e^{x_j} / Σ_{k=1}^{n} e^{x_k}, for j = 1, 2, …, n.
Differentiating with respect to x_j gives ∂σ(x_j)/∂x_j = σ(x_j)(1 − σ(x_j)).
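A quick numerical check of this result, comparing a central-difference estimate of ∂σ(x_j)/∂x_j against σ(x_j)(1 − σ(x_j)); the test vector is an arbitrary illustrative choice:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([3.0, 1.0, 2.0])     # illustrative input
j, eps = 0, 1e-6

# Numerical derivative of the j-th SoftMax output w.r.t. the j-th input (central differences).
x_plus, x_minus = x.copy(), x.copy()
x_plus[j] += eps
x_minus[j] -= eps
numeric = (softmax(x_plus)[j] - softmax(x_minus)[j]) / (2 * eps)

# Analytic expression from option (d).
s = softmax(x)[j]
analytic = s * (1 - s)

print(numeric, analytic)          # both are approximately 0.2227
```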
QUESTION 8:
Which of the following options is true?
a. In Stochastic Gradient Descent, a small batch of samples is selected randomly
instead of the whole data set for each iteration. Too large updates of weight
values lead to faster convergence.
b. In Stochastic Gradient Descent, the whole data set is processed together for
update in each iteration.
c. Stochastic Gradient Descent considers only one sample for updates and has
noisier updates.
d. Stochastic Gradient Descent is a non-iterative process
Correct Answer: c
Detailed Solution:
Stochastic Gradient Descent considers just one sample for update and thus has noisier updates.
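For illustration, here is a minimal sketch of single-sample SGD on a toy linear-regression problem; the data, learning rate, and step count are assumptions. Each step uses the gradient of the loss on one randomly drawn sample, which is why the updates are noisy.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w, lr = np.zeros(3), 0.01
for step in range(2000):
    i = rng.integers(len(X))                  # one sample per update
    grad = 2 * (X[i] @ w - y[i]) * X[i]       # gradient of the squared error on that single sample
    w -= lr * grad                            # noisy update

print(w)                                      # close to [1.0, -2.0, 0.5], up to SGD noise
```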
QUESTION 9:
What are the steps for using a gradient descent algorithm?
1. Calculate error between the actual value and the predicted value
2. Re-iterate until you find the best weights of network
3. Pass an input through the network and get values from output layer
4. Initialize random weight and bias
5. Go to each neuron that contributes to the error and change its respective values to reduce
the error
a. 1, 2, 3, 4, 5
b. 5, 4, 3, 2, 1
c. 3, 2, 1, 5, 4
d. 4, 3, 1, 5, 2
Correct Answer: d
Detailed Solution:
Initialize random weights and biases, then pass input instances through the network, calculate the
error at the output layer, and back-propagate the error through the earlier layers. Then update
the neuron weights using the learning rate and the gradient of the error. Please refer to the lectures
of week 4.
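A minimal sketch mapping the ordering 4 → 3 → 1 → 5 → 2 onto a one-layer (logistic-regression) network; the synthetic data and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # toy binary labels

w = 0.01 * rng.normal(size=2)                 # step 4: initialize random weights and bias
b = 0.0
lr = 0.1

for epoch in range(100):                      # step 2: re-iterate until the weights are good
    z = X @ w + b                             # step 3: pass inputs through the network
    p = 1.0 / (1.0 + np.exp(-z))              #         and get values from the output layer
    error = p - y                             # step 1: error between actual and predicted values
    w -= lr * (X.T @ error) / len(X)          # step 5: change each weight to reduce the error
    b -= lr * error.mean()
```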
QUESTION 10:
J(θ) = 2θ² − 2θ + 2 is a given cost function. Find the correct weight update rule for gradient
descent optimization at step t+1. Consider α = 0.01 to be the learning rate.
a. 𝜃𝑡+1 = 𝜃𝑡 − 0.01(2𝜃 − 1)
b. 𝜃𝑡+1 = 𝜃𝑡 + 0.01(2𝜃 − 1)
c. 𝜃𝑡+1 = 𝜃𝑡 − (2𝜃 − 1)
d. 𝜃𝑡+1 = 𝜃𝑡 − 0.02(2𝜃 − 1)
Correct Answer: d
Detailed Solution:
∂J(θ)/∂θ = 4θ − 2 = 2(2θ − 1)
So, the weight update will be
θ_{t+1} = θ_t − 0.01 × 2(2θ − 1) = θ_t − 0.02(2θ − 1)
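Iterating this update numerically confirms it; the starting point and iteration count below are illustrative assumptions.

```python
# Gradient descent on J(θ) = 2θ² − 2θ + 2 with α = 0.01, i.e. θ ← θ − 0.02(2θ − 1).
theta = 5.0                          # assumed starting point
for t in range(2000):
    theta -= 0.02 * (2 * theta - 1)
print(theta)                         # converges to the minimizer θ = 0.5
```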
______________________________________________________________________________
************END*******