
Advanced Graph Algorithms and Optimization Spring 2021

Linear Algebra, Convexity, and Gradient Descent


R. Kyng & M. Probst Problem Set 2 — Tuesday, March 2nd

The exercises for this week will not count toward your grade, but you are highly encouraged to
solve them all.

Exercise 0.

Prove that if a matrix A ∈ R^{n×n} is symmetric, then ‖A‖ = max(|λ_max(A)|, |λ_min(A)|), and give
an example of a non-symmetric matrix for which this is not true.
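
As a quick numerical sanity check (my own addition, not part of the assignment), the sketch below compares the spectral norm of a random symmetric matrix against its largest eigenvalue magnitude using numpy; the two values should agree up to floating-point error. The same scaffold can be reused with np.linalg.eigvals to test a candidate non-symmetric counterexample.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((5, 5))
    A = (B + B.T) / 2                     # symmetrize so that A = A^T

    spectral_norm = np.linalg.norm(A, 2)  # ||A|| = largest singular value
    eigs = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
    eig_bound = max(abs(eigs.max()), abs(eigs.min()))

    print(spectral_norm, eig_bound)       # should match up to rounding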

Exercise 1.

Consider a twice continuously differentiable function f : S → R, where S ⊂ R^n is a convex open
set. Prove that f is β-gradient Lipschitz if and only if for all x ∈ S we have ‖H_f(x)‖ ≤ β.

Exercise 2.

Prove that when running Gradient Descent, ‖x_i − x*‖_2 ≤ ‖x_0 − x*‖_2 for all i.

Exercise 3.

Prove the following theorem.

Theorem. Let f : R^n → R be a β-gradient Lipschitz, convex function. Let x_0 be a given starting
point, and let x* ∈ arg min_{x ∈ R^n} f(x) be a minimizer of f. The Gradient Descent algorithm given
by

    x_{i+1} = x_i − (1/β) ∇f(x_i)

ensures that the k-th iterate satisfies

    f(x_k) − f(x*) ≤ 2β ‖x_0 − x*‖_2^2 / (k + 1).

Hint: do an induction on 1/gap_i.
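
As a hedged illustration of the theorem (my addition; the quadratic test function below is an assumed example, not from the problem set), the following sketch runs the update above on f(x) = ½ x^T Q x, for which x* = 0 and β = λ_max(Q), and prints the suboptimality gap next to the bound 2β ‖x_0 − x*‖_2^2 / (k + 1):

    import numpy as np

    # Assumed test problem: f(x) = 0.5 * x^T Q x, so grad f(x) = Q x,
    # x* = 0, f(x*) = 0, and beta = lambda_max(Q).
    Q = np.diag([1.0, 10.0, 100.0])
    beta = np.linalg.eigvalsh(Q).max()

    def f(x):
        return 0.5 * x @ Q @ x

    def grad_f(x):
        return Q @ x

    x = np.ones(3)                        # x_0
    dist0_sq = x @ x                      # ||x_0 - x*||_2^2
    for k in range(1, 101):
        x = x - grad_f(x) / beta          # x_{i+1} = x_i - (1/beta) grad f(x_i)
        if k % 20 == 0:
            bound = 2 * beta * dist0_sq / (k + 1)
            print(k, f(x), bound)         # the gap stays below the bound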

Exercise 4.

1. For each of the following functions, answer these questions:

• Is the function convex?

• Is the function β-gradient Lipschitz for some β?
• If the function is β-gradient Lipschitz, give an upper bound on β; the bound should be
within a factor 4 of the true value. (A numerical sketch for building intuition follows the list.)

(a) f(x) = |x|^1.5 on x ∈ R
(b) f(x) = exp(x) on x ∈ R
(c) f(x) = exp(x) on x ∈ (−1, 1)
(d) f(x, y) = x + y on (x, y) ∈ (0, 1) × (0, 1)
(e) f(x, y) = x + y on (x, y) ∈ (1/2, 1) × (1/2, 1)
(f) f(x, y) = √(x^2 + y^2) on (x, y) ∈ R^2
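
Before proving anything, it can help to estimate β numerically (a heuristic sketch of my own, not part of the assignment): for a one-dimensional, twice differentiable f, the tightest β is sup |f''| over the domain, which a finite-difference grid scan can approximate. It can miss blow-ups between grid points, so treat it as intuition only.

    import numpy as np

    def estimate_beta_1d(f, lo, hi, n=10_000, h=1e-5):
        # Heuristic grid estimate of sup |f''| on (lo, hi) via central
        # differences; noisy near non-smooth points, intuition only.
        xs = np.linspace(lo + h, hi - h, n)
        second = (f(xs + h) - 2 * f(xs) + f(xs - h)) / h**2
        return np.abs(second).max()

    # Example: exp on (-1, 1); the estimate should be close to e.
    print(estimate_beta_1d(np.exp, -1.0, 1.0))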

Special Exercise: Strongly Convex Functions

Let f : R^n → R be a convex function. Assume f is twice continuously (Fréchet) differentiable and
that its first and second (Fréchet) derivatives are integrable (basically, don't worry that weird stuff
is happening with the derivatives).
Assume that for all x we have, for some constant µ > 0, that λ_min(H_f(x)) ≥ µ. When this holds,
we say that f is µ-strongly convex.
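For orientation (a standard example, added here): f(x) = (µ/2) ‖x‖_2^2 has H_f(x) = µI for every x, so λ_min(H_f(x)) = µ and f is µ-strongly convex.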

Part A. Prove that for all x, y ∈ R^n,

    f(y) ≥ f(x) + ∇f(x)^T (y − x) + (µ/2) ‖y − x‖_2^2.
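
A hedged numerical check of this inequality (my addition; the diagonal quadratic is an assumed example): for f(x) = ½ x^T Q x with λ_min(Q) = µ, the right-hand side should never exceed f(y) at random point pairs.

    import numpy as np

    rng = np.random.default_rng(1)
    Q = np.diag([2.0, 3.0, 5.0])          # lambda_min(Q) = 2, so mu = 2
    mu = 2.0

    def f(z):
        return 0.5 * z @ Q @ z

    def grad_f(z):
        return Q @ z

    for _ in range(5):
        x, y = rng.standard_normal(3), rng.standard_normal(3)
        rhs = f(x) + grad_f(x) @ (y - x) + 0.5 * mu * (y - x) @ (y - x)
        assert f(y) >= rhs - 1e-12        # lower bound holds at every pair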

Part B. Prove that there is a value L ∈ R such that for all x ∈ R^n, we have f(x) ≥ L. In other
words, the function is bounded below.

Part C. Prove that f is strictly convex as per Definition 3.2.8 in Chapter 3. Prove also that the
minimizer x* ∈ arg min_{x ∈ R^n} f(x) of f is unique.

Part D. Let x_0 be a given starting point and x* be the minimizer of f. Suppose we have
an algorithm DecentDescent which takes a starting point x_0 and a step count t ∈ N.
DecentDescent(x_0, t) runs for t steps and returns x̃ ∈ R^n such that

    f(x̃) − f(x*) ≤ γ ‖x_0 − x*‖_2^2 / (t + 1),

where γ > 0 is a positive number.
Assume that the cost of running DecentDescent for t steps is t. Explain how, with a total cost
of at most (8γ/µ) log(‖x_0 − x*‖_2 / δ), we can produce a point x̂ ∈ R^n such that ‖x̂ − x*‖_2 ≤ δ for
δ > 0.

Part E. Consider a function h : R^n → R which is both µ-strongly convex and β-gradient Lipschitz.
Give an algorithm that returns x′ with

    h(x′) − h(x*) ≤ ε

by computing the gradient of h at at most (32β/µ) log(2β ‖x_0 − x*‖_2^2 / ε) points.
