
COMP3057 Assignment 2

Q1 (40 marks):
The following shows a simple 1D convnet with a single convolution kernel; the input
is a 5-dimensional vector.

1. List all learnable parameters in the network. (Note: in the conv layer, k
represents the weights of the convolution kernel and b represents the bias (please
refer to P13 of the class slides); in the fc layer, w represents the weights and a
represents the bias.)
Hint: k, b, w, a
2. Write down the forward propagation of the network in a layer-by-layer manner.

3. Write down the backward propagation of the network in a layer-by-layer manner.
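For parts 2 and 3, it can help to check your hand-derived formulas numerically. The sketch below is only an assumed minimal instance (kernel size 2, stride 1, no padding, ReLU activation, scalar output, squared-error loss) using the parameter names k, b, w, a from part 1; the actual architecture is the one on the class slides, so adapt the shapes and activation accordingly.

```python
import numpy as np

# Illustrative architecture only: kernel size 2, stride 1, no padding, ReLU,
# scalar fc output and a squared-error loss against a target t are assumptions.
rng = np.random.default_rng(0)
x = rng.normal(size=5)    # 5-dimensional input vector
k = rng.normal(size=2)    # conv kernel weights
b = 0.1                   # conv bias
w = rng.normal(size=4)    # fc weights (length-4 feature map -> scalar)
a = 0.0                   # fc bias
t = 1.0                   # target used by the loss

# ---- forward propagation, layer by layer ----
z = np.array([k @ x[i:i + 2] + b for i in range(4)])  # conv layer: z_i = k . x[i:i+2] + b
h = np.maximum(z, 0.0)                                 # ReLU activation
y = w @ h + a                                          # fully connected layer
L = 0.5 * (y - t) ** 2                                 # squared-error loss

# ---- backward propagation, layer by layer (chain rule) ----
dL_dy = y - t                    # dL/dy
dL_dw = dL_dy * h                # fc weights
dL_da = dL_dy                    # fc bias
dL_dh = dL_dy * w                # gradient flowing back into the feature map
dL_dz = dL_dh * (z > 0)          # back through the ReLU
dL_db = dL_dz.sum()              # conv bias is shared across all positions
dL_dk = np.array([np.sum(dL_dz * x[j:j + 4]) for j in range(2)])  # conv kernel
```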

Q2 (30 marks):
For each of the following scenarios, select an appropriate gradient descent
algorithm and write down your reasoning (preferably with formulas).
1. When dealing with online data.
2. When the area around a local optimum is like a ravine, i.e., the surface curves
much more steeply in one dimension than in another.
3. When the data is sparse and the features have very different frequencies.
Please refer to “An overview of gradient descent optimization algorithms”
(https://arxiv.org/pdf/1609.04747.pdf)
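As a rough starting point (the survey above covers all of these in detail), the sketch below shows one common pairing of scenario and update rule: plain SGD for online data, momentum for ravine-shaped surfaces, and Adagrad for sparse features with very different frequencies. The gradient function and hyperparameter values are placeholders for illustration only.

```python
import numpy as np

def grad(theta):
    # placeholder gradient of the cost w.r.t. the parameters
    return 2 * theta

theta = np.array([1.0, -3.0])
eta = 0.1

# 1. Online data: plain stochastic gradient descent, one update per incoming sample
theta_sgd = theta - eta * grad(theta)

# 2. Ravine-shaped surfaces: momentum damps oscillation along the steep direction
gamma, v = 0.9, np.zeros_like(theta)
v = gamma * v + eta * grad(theta)
theta_mom = theta - v

# 3. Sparse data / uneven feature frequencies: Adagrad scales the step per parameter
eps, G = 1e-8, np.zeros_like(theta)
g = grad(theta)
G += g ** 2                                     # accumulate squared gradients per dimension
theta_ada = theta - eta / np.sqrt(G + eps) * g
```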

Q3 (30 marks):
Consider an unknown Markov Decision Process (MDP) with 3 states (A, B, C) and 2
actions (turnLeft, turnRight), where the agent makes decisions according to some
policy π. You are given a dataset consisting of samples (s, a, s', r), each
representing taking action a in state s, resulting in a transition to state s' and a
reward r. (Hint: here we consider a dynamic system p(s', r | s, a), which means the
reward in each step is also stochastic.)
s    a          s'    r
A    turnRight  B     2
C    turnLeft   B     2
B    turnRight  C    -2
A    turnRight  B     4
You may consider a discount factor of γ=1.
The update function of Q-learning is:
Q(s_t, a_t) = (1 − α)·Q(s_t, a_t) + α·(r_t + γ·max_{a'} Q(s_{t+1}, a'))    (1)

Assume all Q-values are initialized to 0 and use a learning rate of α = 1/2.
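For example, applying Eq. (1) to the first sample (A, turnRight, B, 2) with all Q-values still at 0 gives

Q_1(A, turnRight) = (1 − 1/2)·0 + 1/2·(2 + 1·max_{a'} Q_0(B, a')) = 1/2·(2 + 0) = 1

and the remaining samples are processed the same way, in table order.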
1. Run Q-learning with the data in the table and compute the values of
Q(A, turnRight) and Q(B, turnRight). (Hint: you may consider computing
Q_1(A, turnRight), Q_1(C, turnLeft), Q_1(B, turnRight), Q_2(A, turnRight) with the
update function in Eq. (1).)

2. Construct a policy π_Q that maximizes the Q-value in a given state:
π_Q(s) = argmax_a Q(s, a). What are the actions chosen by the policy in states A and
B?
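A minimal sketch for checking your hand computation, assuming the four samples are replayed once in table order with α = 1/2 and γ = 1; the dictionary-based Q-table and the helper name pi_Q are illustrative, not prescribed by the assignment.

```python
from collections import defaultdict

# (s, a, s', r) samples from the table, replayed once in order
data = [("A", "turnRight", "B",  2),
        ("C", "turnLeft",  "B",  2),
        ("B", "turnRight", "C", -2),
        ("A", "turnRight", "B",  4)]

actions = ["turnLeft", "turnRight"]
alpha, gamma = 0.5, 1.0
Q = defaultdict(float)                      # all Q-values start at 0

for s, a, s_next, r in data:
    target = r + gamma * max(Q[(s_next, ap)] for ap in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target   # Eq. (1)

print(Q[("A", "turnRight")], Q[("B", "turnRight")])

# Greedy policy pi_Q(s) = argmax_a Q(s, a)
def pi_Q(s):
    return max(actions, key=lambda a: Q[(s, a)])

print(pi_Q("A"), pi_Q("B"))
```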
