Slide 6: Script For 17 March 2020
Recap
1. Because the problems are large, one has to look for iterative methods to reach the minimum.
2. Gradient descent forms a class of algorithms where the descent direction (the direction in which to move
so that the function value decreases) is a function of the gradient at a point:
$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{d}_k$,
where $\alpha_k > 0$ is the step size and $\mathbf{d}_k$ is the descent direction, chosen such that
$\nabla f(\mathbf{x}_k)^T \mathbf{d}_k < 0$.
5. The stopping criterion is that the gradient should be zero. In your code this shows up as the norm of the
gradient becoming very small (the threshold is your choice).
6. We stopped with a small experiment that should have driven home the importance of choosing
the step size properly.
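The recap can be tried out with a minimal sketch like the one below. It assumes the steepest-descent choice d_k = -∇f(x_k) and uses f(x, y) = x^2 + y^2 from the mini experiment as the test function. The names `gradient_descent`, `tol` and `max_iter`, the starting point (3, 4), and the values 1e-8 and 1000 are illustrative assumptions, not something fixed in class.

```python
import numpy as np


def f(p):
    """f(x, y) = x^2 + y^2, the function from the mini experiment."""
    return p[0] ** 2 + p[1] ** 2


def grad_f(p):
    """Gradient of f, which is (2x, 2y)."""
    return 2.0 * p


def gradient_descent(start, step_size, tol=1e-8, max_iter=1000):
    """Fixed-step gradient descent with d_k = -grad f(x_k).

    Stops when the gradient norm drops below tol (assumed threshold)
    or after max_iter iterations (assumed safeguard).
    Returns the final point and the number of iterations performed.
    """
    x = np.asarray(start, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:  # stopping criterion from item 5
            return x, k
        d = -g                       # descent direction: grad^T d < 0
        x = x + step_size * d        # x_{k+1} = x_k + alpha_k d_k
    return x, max_iter


if __name__ == "__main__":
    for alpha in (0.1, 0.9, 1.0):
        x_final, iters = gradient_descent([3.0, 4.0], alpha)
        print(f"alpha={alpha}: iterations={iters}, f={f(x_final):.3e}")
```

Try a few more step sizes and watch the iteration count. With step size 1.0 the stopping criterion is never met for this function: the iterate only flips sign at every step.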
Stop reading and think
Note that $f(\mathbf{x}_k + \alpha_k \mathbf{d}_k)$ is a function of $\alpha_k$ only. Let us call it $g(\alpha_k)$. The “best” value of the step size is the one which
minimizes $g(\alpha_k)$.
The mini experiment we did in the last class was minimizing $f(x, y) = x^2 + y^2$ with a fixed step size. Answer
the following questions and figure out whether things get better when one uses the optimum step size. Take a
pen and paper and write down the answers!
1. Let the starting point be $\mathbf{x}_0 = (x_0, y_0)^T$. Write down the expression for the gradient at this point.
2. Write down the direction in which the first move is made ($\mathbf{d}_0$). (I have shifted the index $k$ to a subscript.)
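3. The function to be minimized is $g(\alpha) = f(\mathbf{x}_0 + \alpha \mathbf{d}_0)$. Use the chain rule to find $\frac{d}{d\alpha} g(\alpha)$. Since $f$ is
multivariate, the chain rule is
$\frac{d}{d\alpha} g(\alpha) = \nabla f(\mathbf{x}_0 + \alpha \mathbf{d}_0)^T \mathbf{d}_0$.
Find the $\alpha$ which minimizes $g(\alpha)$.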
4. Post the value of $\alpha$ that you get (in the backchannel; the link will be shared later).
7. Do you think this has something to do with the shape of the level curves?
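To check your pen-and-paper answers to questions 1 to 4, here is one way the computation can go. It is a sketch that assumes the steepest-descent choice $\mathbf{d}_0 = -\nabla f(\mathbf{x}_0)$ and a starting point other than the origin; bold $\mathbf{x}_0$ denotes the starting point $(x_0, y_0)^T$ and $f(x, y) = x^2 + y^2$.

```latex
\begin{align*}
\nabla f(\mathbf{x}_0) &= (2x_0,\; 2y_0)^T = 2\mathbf{x}_0, \\
\mathbf{d}_0 &= -\nabla f(\mathbf{x}_0) = -2\mathbf{x}_0, \\
g(\alpha) &= f(\mathbf{x}_0 + \alpha\mathbf{d}_0)
           = f\big((1 - 2\alpha)\,\mathbf{x}_0\big)
           = (1 - 2\alpha)^2 \,(x_0^2 + y_0^2), \\
\frac{d}{d\alpha} g(\alpha) &= \nabla f(\mathbf{x}_0 + \alpha\mathbf{d}_0)^T \mathbf{d}_0
           = -4\,(1 - 2\alpha)\,(x_0^2 + y_0^2) = 0
           \;\Longrightarrow\; \alpha = \tfrac{1}{2}, \\
\mathbf{x}_1 &= \mathbf{x}_0 + \tfrac{1}{2}\,\mathbf{d}_0 = \mathbf{0}.
\end{align*}
```

Under these assumptions the optimum step size does not depend on the starting point.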
Go to slides
• Go to slide 7. You can observe that the minimum is reached in one step in this case.
• Frame 8, slide 1 shows the level curves and how steepest descent proceeds for $f(x, y) = \frac{x^2}{10} + y^2$ when
the optimum step size is used. Note down your observations and post them.
• Frame 8, slide 2 shows a zoomed-in portion of the above. Do you notice anything specific about the
direction of motion at each iteration? Post your answer.
• Frame 9, slides 1, 2 and 3 show the 3D plot of the function and the progress of steepest descent (and a zoomed
version) for the function $f(x, y) = (1 - x)^2 + (y - x^2)^2$. Again, post your observations.
• Frame 10, slides 1, 2 and 3 show plots of $g(\alpha)$ for the first three iterations. The value of $\alpha$ at which the
function attains its minimum is used as the step size.
If you made a specific observation about the direction in each step, try expressing your observation as a
mathematical statement. And then prove it ☺.
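If you want to test your statement numerically before proving it, the following sketch reproduces the kind of run shown on Frame 8 for f(x, y) = x^2/10 + y^2. It assumes the steepest-descent direction d_k = -∇f(x_k) and relies on the fact that for a quadratic f(p) = 0.5 p^T A p the optimum step size has the closed form α = (g^T g) / (g^T A g), with g the gradient. The function name `steepest_descent_exact`, the starting point (10, 1) and the tolerances are illustrative assumptions.

```python
import numpy as np

# f(x, y) = x^2/10 + y^2 written as 0.5 * p^T A p
A = np.diag([0.2, 2.0])


def f(p):
    """Quadratic test function from Frame 8."""
    return 0.5 * p @ A @ p


def grad_f(p):
    """Gradient of the quadratic: A p = (x/5, 2y)."""
    return A @ p


def steepest_descent_exact(start, tol=1e-10, max_iter=500):
    """Steepest descent with the optimum step size for a quadratic.

    Returns the list of iterates and the list of directions used,
    so that consecutive directions can be compared afterwards.
    """
    x = np.asarray(start, dtype=float)
    iterates, directions = [x.copy()], []
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g
        # Minimizer of g(alpha) = f(x + alpha d) for a quadratic f
        alpha = (g @ g) / (g @ A @ g)
        x = x + alpha * d
        iterates.append(x.copy())
        directions.append(d)
    return iterates, directions


if __name__ == "__main__":
    iterates, directions = steepest_descent_exact([10.0, 1.0])
    for k in range(5):
        print(f"k={k}  x_k={iterates[k]}  d_k={directions[k]}")
```

The returned `directions` list lets you evaluate whatever quantity your statement involves for consecutive steps.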
Line search
This method of finding the optimum step size is called line search, since one basically moves along
a line to find the best location.
TODO
Write a program to implement the line search algorithm.
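One possible starting point is sketched below; treat it as a sketch under stated assumptions rather than the expected solution. It minimizes g(α) = f(x + αd) by golden-section search, which is a technique chosen here for illustration (the notes do not prescribe how to minimize g). The interval is first grown by doubling until g stops decreasing, and a halving fallback keeps the step from increasing f when g is not unimodal on that interval. The test function is the one from Frame 9; all names, the starting point (0, 0) and the tolerances are assumptions.

```python
import math
import numpy as np

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def f(p):
    """f(x, y) = (1 - x)^2 + (y - x^2)^2, the function from Frame 9."""
    x, y = p
    return (1.0 - x) ** 2 + (y - x ** 2) ** 2


def grad_f(p):
    """Gradient of f, derived by hand."""
    x, y = p
    return np.array([-2.0 * (1.0 - x) - 4.0 * x * (y - x ** 2),
                     2.0 * (y - x ** 2)])


def line_search(g, alpha_init=1.0, tol=1e-10):
    """Approximate minimizer of a scalar function g(alpha) for alpha >= 0."""
    # Bracketing: double the upper end while g keeps decreasing.
    b = alpha_init
    while g(b) < g(b / 2.0) and b < 1e6:
        b *= 2.0
    a = 0.0
    # Golden-section search on [a, b].
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc < gd:            # minimum lies in [a, d]
            b, d, gd = d, c, gc
            c = b - INV_PHI * (b - a)
            gc = g(c)
        else:                  # minimum lies in [c, b]
            a, c, gc = c, d, gd
            d = a + INV_PHI * (b - a)
            gd = g(d)
    alpha = 0.5 * (a + b)
    # Fallback: shrink the step if it would increase the function value.
    g0 = g(0.0)
    while g(alpha) > g0 and alpha > 1e-16:
        alpha *= 0.5
    return alpha


def steepest_descent(start, tol=1e-6, max_iter=5000):
    """Steepest descent where each step size comes from line_search."""
    x = np.asarray(start, dtype=float)
    for k in range(max_iter):
        grad = grad_f(x)
        if np.linalg.norm(grad) < tol:
            return x, k
        d = -grad
        alpha = line_search(lambda a: f(x + a * d))
        x = x + alpha * d
    return x, max_iter


if __name__ == "__main__":
    x_min, iters = steepest_descent([0.0, 0.0])
    print(f"x = {x_min}, iterations = {iters}, f = {f(x_min):.3e}")
```

Golden-section search only needs function values, which keeps the sketch short; a method that also uses the derivative of g from the chain rule is an equally reasonable choice.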