
Script for 17 March 2020

We stopped at slide 6 in the previous lecture.

Recap
1. Owing to the large size of the problems, one has to look for iterative methods for reaching the minimum.

2. Gradient descent forms a class of algorithms where the descent direction (the direction in which to move
so that the function value decreases) is a function of the gradient at the current point.

3. Starting with an initial location x0 , the algorithm proceeds as

xk+1 = xk + αk dk,

where αk > 0 is the step size and dk is the descent direction, chosen such that

∇f(xk)ᵀ dk < 0.

4. For steepest descent,

dk = −∇f(xk) / ∥∇f(xk)∥;

note that the normalization is optional.

5. The stopping criterion is that the gradient should be zero. In your code this is reflected as the norm of the
gradient becoming very small (the threshold is your choice).

6. We stopped with a small experiment that should have driven home the importance of choosing
the step size properly. (For reference, the basic loop is sketched in code right after this list.)
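
The following is a minimal sketch of that loop in Python, assuming NumPy is available; the function name steepest_descent and the default values of alpha, tol and max_iter are illustrative choices, not part of the course material.

    import numpy as np

    def steepest_descent(grad_f, x0, alpha=0.1, tol=1e-6, max_iter=10000):
        """Fixed-step steepest descent: x_{k+1} = x_k + alpha * d_k."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < tol:   # stopping criterion from point 5
                break
            x = x + alpha * (-g)          # d_k = -grad f(x_k), unnormalized
        return x

    # The mini experiment: f(x, y) = x² + y², so ∇f = (2x, 2y).
    print(steepest_descent(lambda x: 2 * x, [1.0, 1.0], alpha=0.1))

With a fixed alpha the behaviour depends strongly on its value, which is exactly the point of the experiment; the next section removes that dependence.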

Selection of step size


At the k-th iteration the location is xk and dk = −∇f(xk) is the direction in which the next move is made. Once
the direction dk is selected, the function value is f(xk + αk dk).

What do you observe about f(xk + αk dk)?

Stop reading and think.

Note that this is a function of αk alone. Let us call this g(αk). The “best” value of the step size is the one that
minimizes g(αk).

The mini experiment we did in the last class was minimizing f(x, y) = x² + y² for a fixed step size. Answer
the following questions and figure out whether things are better when one goes for the optimum step size. Take a
pen and paper and write down the answers! (A short numerical check is given after the list.)

1. Let the starting point be x0 = (x0, y0)ᵀ. Write down the expression for the gradient at this point.

2. Write down the direction in which the first move is made (d0). (I have shifted the index k to a subscript.)

3. The function to be minimized is g(α) = f(x0 + αd0). Use the chain rule to find (d/dα) g(α). Since f is
multivariate, the chain rule gives

(d/dα) g(α) = ∇f(x0 + αd0)ᵀ d0.
Find the α which minimizes g(α).

4. Post the value of α that you get (in the backchannel; I will share the link later).

5. Substitute this α in the expression for x1 . What do you get?

6. In how many steps is the minimum reached?

7. Do you think this has something to do with the shape of the level curves?
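
Once you have your pen-and-paper answers, you can verify them numerically. This is a minimal sketch assuming NumPy and SciPy are available; the starting point and variable names are arbitrary illustrations.

    import numpy as np
    from scipy.optimize import minimize_scalar

    f = lambda x: x[0]**2 + x[1]**2
    grad = lambda x: 2 * x

    x0 = np.array([3.0, -2.0])      # any starting point
    d0 = -grad(x0)                  # direction of the first move

    g = lambda alpha: f(x0 + alpha * d0)
    alpha = minimize_scalar(g).x    # numerically minimize g(alpha)

    print("optimal alpha:", alpha)  # compare with your answer to question 3
    print("x1:", x0 + alpha * d0)   # compare with your answer to question 5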

Go to slides
• Go to slide 7. You can observe that the minimum is reached in one step in this case.

• Frame 8, slide 1 shows the level curves and how steepest descent proceeds for f(x, y) = x²/10 + y², when the
optimum step size is used. Note down your observations and post them.

• Frame 8, slide 2 shows a zoomed-in portion of the above; do you notice anything specific about the
direction of motion at each iteration? Post your answer.

• Frame 9, slides 1, 2, and 3 show the 3D plot of the function and the progress of steepest descent (and a zoomed
version) for the function f(x, y) = (1 − x)² + (y − x²)². Again, post your observations.

• Frame 10, slides 1, 2, and 3 show a plot of g(α) for the first three iterations. The value of α at which g
attains its minimum is used as the step size.

If you made a specific observation about the direction in each step, try expressing your observation as a
mathematical statement. And then prove it ☺.
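
Only after attempting the proof, you may check your statement against the following worked equation (a sketch, using exact line search with dk = −∇f(xk) as above). Since αk minimizes g(α) = f(xk + α dk), we have g′(αk) = 0, and by the chain rule from question 3,

g′(αk) = ∇f(xk + αk dk)ᵀ dk = ∇f(xk+1)ᵀ dk = 0,

so the next direction dk+1 = −∇f(xk+1) is orthogonal to dk: consecutive moves are at right angles.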

Line search
This method of finding the optimum step size is called line search, since one basically moves along
a line to find the best location.
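
Before writing your own program (see the TODO below), here is one possible shape of the loop: a minimal sketch assuming NumPy and SciPy, where the bounded scalar minimizer and the upper bound of 1000 on α are illustrative choices rather than the prescribed method.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def steepest_descent_ls(f, grad_f, x0, tol=1e-6, max_iter=1000):
        """Steepest descent where each alpha_k comes from a line search."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < tol:
                break
            d = -g
            # Line search: minimize g(alpha) = f(x + alpha * d), alpha >= 0.
            alpha = minimize_scalar(lambda a: f(x + a * d),
                                    bounds=(0.0, 1e3), method="bounded").x
            x = x + alpha * d
        return x

    # The function from frame 8: f(x, y) = x²/10 + y².
    f = lambda x: x[0]**2 / 10 + x[1]**2
    grad = lambda x: np.array([x[0] / 5, 2 * x[1]])
    print(steepest_descent_ls(f, grad, [5.0, 1.0]))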

TODO
Write a program to implement the line search algorithm.
