
DEEP LEARNING WEEK 3

0. Use the following data to answer the questions below. The following diagram represents a neural network containing two hidden layers and one output layer. The input to the network is a column vector x ∈ R³. The activation function used in the hidden layers is sigmoid. The output layer doesn't contain any activation function, and the loss used is the squared error loss (ŷ − y)².

[Figure: a fully connected network with input units x1, x2, x3, hidden layer 1 units h1(1), h2(1), h3(1), hidden layer 2 units h1(2), h2(2), and a single output unit ŷ1.]

The following network doesn't contain any biases, and the weights of the network are given below:

W1 = [[1, 1, 2],
      [3, 1, 1],
      [1, 2, 3]]

W2 = [[1, 1, 2],
      [3, 1, 1]]

W3 = [2, 5]

The input to the network is x = [1, 1, 1]ᵀ and the target value is y = 10.
1. What is the total number of parameters in the following network?
a) 15
b) 7
c) 9
d) 17
Answer: d)
Solution: The elements of the weight and bias matrices are the parameters of the network. Since the network has no biases, counting the elements of the weight matrices gives the answer: 9 + 6 + 2 = 17.
2. What is the predicted output for the given input x after doing the forward pass? (Choose the option closest to your answer)
a) 7.33
b) 6.92
c) 6.31
d) 8
Answer: b)
Solution: Doing the forward pass in the network we get:
h1 = W1·x = [4, 5, 6]ᵀ
a1 = sigmoid(h1) = [0.982, 0.993, 0.997]ᵀ
h2 = W2·a1 = [3.969, 4.936]ᵀ
a2 = sigmoid(h2) = [0.981, 0.992]ᵀ
ŷ = W3·a2 = 2 × 0.981 + 5 × 0.992 = 6.922
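For readers who want to check the arithmetic, below is a minimal NumPy sketch of this forward pass. W1, W2, W3, and x are taken from the problem statement; the helper names (sigmoid, y_hat) are just illustrative.

```python
import numpy as np

# Weights and input from question 0.
W1 = np.array([[1., 1., 2.],
               [3., 1., 1.],
               [1., 2., 3.]])
W2 = np.array([[1., 1., 2.],
               [3., 1., 1.]])
W3 = np.array([[2., 5.]])
x = np.array([[1.], [1.], [1.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h1 = W1 @ x                 # pre-activation of hidden layer 1: [4, 5, 6]^T
a1 = sigmoid(h1)            # ~[0.982, 0.993, 0.997]^T
h2 = W2 @ a1                # ~[3.970, 4.937]^T
a2 = sigmoid(h2)            # ~[0.981, 0.993]^T
y_hat = (W3 @ a2).item()    # linear output layer
print(y_hat)                # ~6.93, closest to option b)
```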

3. Compute and enter the loss between the output generated by input x and the true output y. (NAT)
Answer: Range (9.38, 9.58)
Solution: Loss = (6.922 − 10)² = 9.474
4. If we call the predicted y as ŷ, then what is the gradient dL/dŷ? (L is the loss function)

a) -5.17
b) -7.52
c) -6.15
d) -7.15

Answer: c)
Solution: dL/dŷ = 2(ŷ − y) = 2 × (6.922 − 10) = −6.156 ≈ −6.15
5. What is the sum of elements of ∇w3? (Choose the closest value to your answer)

a) -12.9
b) -11.6
c) -10.07
d) -12.14

Answer: d)
Solution: With a21 and a22 the elements of a2,
[∇w31, ∇w32] = [a21 × dL/dŷ, a22 × dL/dŷ] = [0.981 × (−6.156), 0.992 × (−6.156)] = [−6.039, −6.107]. The sum of the elements of this vector gives the required answer, ≈ −12.14.
6. What is the sum of elements of ∇w2?
Answer: Range (−1.4, −1.2)
Solution: To find ∇w2, first backpropagate through the output layer: ∇a2 = W3ᵀ × dL/dŷ and ∇h2 = ∇a2 ⊙ a2 ⊙ (1 − a2). Then ∇w2 = ∇h2 · a1ᵀ, and the sum of its elements is ≈ −1.31.
7. What is the sum of elements of ∇w1?
Answer: Range (−0.08, −0.04)
Solution: Propagate the gradient back one more layer: ∇h1 = (W2ᵀ · ∇h2) ⊙ a1 ⊙ (1 − a1), then ∇w1 = ∇h1 · xᵀ, whose elements sum to ≈ −0.06.
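A short continuation of the NumPy sketch from question 2 can verify the answers to questions 4-7; the chain-rule steps mirror the solutions above, and the intermediate names (dh2, grad_W2, etc.) are only illustrative.

```python
# Continues the forward-pass sketch; loss is L = (y_hat - y)^2 with y = 10.
y = 10.0
dL_dyhat = 2 * (y_hat - y)      # ~ -6.15   (question 4)

grad_W3 = dL_dyhat * a2.T       # sum ~ -12.14   (question 5)

da2 = W3.T * dL_dyhat           # backprop through the linear output layer
dh2 = da2 * a2 * (1 - a2)       # sigmoid'(h2) = a2 * (1 - a2)
grad_W2 = dh2 @ a1.T            # sum ~ -1.31   (question 6)

da1 = W2.T @ dh2                # backprop through hidden layer 2
dh1 = da1 * a1 * (1 - a1)
grad_W1 = dh1 @ x.T             # sum ~ -0.06   (question 7)

print(grad_W3.sum(), grad_W2.sum(), grad_W1.sum())
```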
8. The probability of all the events x1, x2, x3, ..., xn in a system is equal (n > 1). What can you say about the entropy H(X) of that system? (base of log is 2)

a) H(X) ≤ 1
b) H(X) = 1
c) H(X) ≥ 1
d) We can't say anything conclusive with the provided information.

Answer: c)
Solution: Since all events are equally likely, pᵢ = 1/n, and the entropy is
H(X) = Σᵢ₌₁ⁿ −pᵢ log₂(pᵢ) = Σᵢ₌₁ⁿ −(1/n) log₂(1/n) = log₂(n), which is ≥ 1 for every n ≥ 2.
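As a quick numerical check of this result, here is a small NumPy sketch (purely illustrative) that computes the entropy of a uniform distribution for a few values of n:

```python
import numpy as np

# Entropy of a uniform distribution over n events: H(X) = log2(n) >= 1 for n >= 2.
for n in [2, 4, 10]:
    p = np.full(n, 1.0 / n)            # uniform distribution p_i = 1/n
    H = -np.sum(p * np.log2(p))
    print(n, H)                        # prints log2(n): 1.0, 2.0, ~3.32
```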

9. Let p and q be two probability distributions. Under what conditions will the cross entropy between p and q be minimized?
a) p = q
b) All the values in p are lower than corresponding values in q
c) All the values in p are lower than corresponding values in q
d) p = 0 [0 is a vector]
Answer: a)
Solution: The cross entropy H(p, q) = −Σᵢ pᵢ log(qᵢ) satisfies H(p, q) ≥ H(p), with equality if and only if p = q (Gibbs' inequality), so it is lowest when both distributions are the same.
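To illustrate, the following sketch compares H(p, p) with H(p, q) for a hypothetical pair of distributions; the specific numbers are made up for the example:

```python
import numpy as np

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log2(q_i)
    return -np.sum(p * np.log2(q))

p = np.array([0.5, 0.3, 0.2])          # arbitrary example distribution
q = np.array([0.4, 0.4, 0.2])          # a perturbed version of p

print(cross_entropy(p, p))             # H(p) ~ 1.485 bits (the minimum)
print(cross_entropy(p, q))             # ~ 1.522 bits, strictly larger
```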

10. Suppose we have a problem where data x and label y are related by y = x² + 1. Which of the following is not a good choice for the activation function in the hidden layer if the activation function at the output layer is linear?

a) Linear
b) ReLU
c) Sigmoid
d) tan⁻¹(x)

Answer: a)
Solution: If we choose the linear activation function, the output of the neural network will be a linear function of the data: each layer then applies only weights and biases, and a composition of affine maps is itself affine. Hence the network won't be able to learn the non-linear relationship y = x² + 1.
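The collapse of stacked linear layers into a single linear map can be seen directly in a short sketch (the weights are random and chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))       # "hidden layer" weights
W2 = rng.standard_normal((1, 4))       # "output layer" weights
x = rng.standard_normal((3, 1))

deep = W2 @ (W1 @ x)                   # two layers with linear activation
shallow = (W2 @ W1) @ x                # one equivalent linear layer
print(np.allclose(deep, shallow))      # True: depth adds no expressive power here
```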
