Lecture 2: August 28
Lecturer: Ryan Tibshirani Scribes: Shuyang Yang, Xingyu Liu, Bo Lei
A convex optimization problem has the form

min_{x ∈ D} f(x)
subject to gi(x) ≤ 0, i = 1, ..., m
           hj(x) = 0, j = 1, ..., r

where f and gi are all convex, and hj are affine. Any local minimizer of a convex optimization problem is
a global minimizer.
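A quick numerical illustration of this fact (a Python sketch, not from the lecture): on a smooth convex function, gradient descent started from very different points converges to the same, global minimizer.

```python
import numpy as np

def grad_descent(grad, x0, step=0.1, iters=500):
    """Plain gradient descent; returns the final iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# f(x) = ||x - c||^2 is convex; its unique global minimizer is c.
c = np.array([1.0, -2.0])
grad = lambda x: 2 * (x - c)

# Different starting points all reach the same (global) minimum.
for x0 in ([10.0, 10.0], [-5.0, 3.0], [0.0, 0.0]):
    print(grad_descent(grad, x0))   # each close to [1, -2]
```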
2.2.1 Definitions
Definition 2.1 Convex set: a set C ⊆ Rn is a convex set if for any x, y ∈ C, we have
tx + (1 − t)y ∈ C, for all 0 ≤ t ≤ 1
Definition 2.3 Convex hull of set C: all convex combinations of elements in C. The convex hull is always
convex.
Definition 2.4 Cone: a set C ⊆ Rn is a cone if for any x ∈ C, we have tx ∈ C for all t ≥ 0. A convex
cone is a cone that is also convex, equivalently
x1 , x2 ∈ C =⇒ t1 x1 + t2 x2 ∈ C for all t1 , t2 ≥ 0
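As a sanity check (a Python sketch, not part of the notes): the positive semidefinite matrices form a convex cone, so any combination t1 A + t2 B of PSD matrices A, B with t1, t2 ≥ 0 stays PSD.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    """Generate a random PSD matrix as M @ M.T."""
    M = rng.standard_normal((n, n))
    return M @ M.T

def is_psd(X, tol=1e-10):
    """Check PSD via the smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(X).min() >= -tol

A, B = random_psd(4), random_psd(4)
t1, t2 = 2.5, 0.3   # any nonnegative coefficients
assert is_psd(t1 * A + t2 * B)   # the combination remains in the cone
```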
A conic combination of points x1 , ..., xk is
θ1 x1 + ... + θk xk , with θi ≥ 0
and a convex cone contains all conic combinations of its elements.
Example: the convex hull of the standard basis vectors e1 , ..., en is the probability simplex:
conv{e1 , ..., en } = {w : w ≥ 0, 1T w = 1}
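The identity above (convex hull of the basis vectors equals the probability simplex) can be checked numerically (a Python sketch, not from the notes): every convex combination of e1, ..., en has nonnegative entries summing to one.

```python
import numpy as np

n = 5
E = np.eye(n)                      # rows are the standard basis vectors e_1, ..., e_n

rng = np.random.default_rng(1)
theta = rng.random(n)
theta /= theta.sum()               # convex-combination weights: theta >= 0, sum = 1

w = theta @ E                      # the convex combination sum_i theta_i e_i
assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
# Conversely, w itself recovers the weights: conv{e_1,...,e_n} is exactly the simplex.
assert np.allclose(w, theta)
```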
• Separating hyperplane theorem: if C, D are nonempty, disjoint convex sets, then there exist a ≠ 0 and
b such that
C ⊆ {x : aT x ≤ b}, D ⊆ {x : aT x ≥ b}
• Supporting hyperplane theorem: a boundary point of a convex set has a supporting hyperplane
passing through it. Formally, if C is a nonempty convex set, and x0 ∈ bd(C), then there exists a ≠ 0 such
that
C ⊆ {x : aT x ≤ aT x0 }
2.2.5.1 Operations
Example: the solution set of a linear matrix inequality,
C = {x ∈ Rk : B − Σ_{i=1}^k xi Ai ⪰ 0}, for A1 , ..., Ak , B ∈ Sn , is convex.
Approach 1: directly verify that x, y ∈ C ⇒ tx + (1 − t)y ∈ C. This follows by checking that, for any v,
vT (B − Σ_{i=1}^k (txi + (1 − t)yi )Ai ) v = t · vT (B − Σ_{i=1}^k xi Ai ) v + (1 − t) · vT (B − Σ_{i=1}^k yi Ai ) v ≥ 0
Approach 2: let f : Rk → Sn , f (x) = B − Σ_{i=1}^k xi Ai . Note that C = f⁻¹(Sn+ ), the affine preimage of a
convex set.
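Approach 1 can also be carried out numerically (a Python sketch with made-up symmetric matrices Ai, B): pick two feasible points x, y for C = {x : B − Σ xi Ai ⪰ 0} and confirm their convex combination stays feasible.

```python
import numpy as np

rng = np.random.default_rng(2)

def sym(n):
    """Random symmetric matrix."""
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

n, k = 4, 3
A = [sym(n) for _ in range(k)]
B = sym(n) + 10 * np.eye(n)        # shifted so that small x are feasible

def feasible(x, tol=1e-10):
    """x is in C iff B - sum_i x_i A_i is positive semidefinite."""
    S = B - sum(xi * Ai for xi, Ai in zip(x, A))
    return np.linalg.eigvalsh(S).min() >= -tol

x = np.array([0.1, -0.2, 0.05])
y = np.array([-0.3, 0.1, 0.2])
assert feasible(x) and feasible(y)

t = 0.7
assert feasible(t * x + (1 - t) * y)   # convex combination stays in C
```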
Let U, V be random variables over {1, ..., n}, {1, ..., m}. Let C ⊆ Rnm be a set of joint distributions for U, V ,
i.e., each p ∈ C defines joint probabilities
pij = P(U = i, V = j)
Let D ⊆ Rnm contain corresponding conditional distributions, i.e., each q ∈ D defines
qij = P(U = i|V = j)
Assume C is convex. Let’s prove that D is convex. Write
D = { q ∈ Rnm : qij = pij / Σ_{k=1}^n pkj , for some p ∈ C } = f (C)
where f is a linear-fractional function, hence D is convex.
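A small numerical illustration (a sketch with hypothetical data, not from the notes): mixing two joint distributions and then conditioning yields a valid collection of conditional distributions, i.e., a member of D.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_joint(n, m):
    """Random joint distribution: nonnegative entries summing to 1."""
    P = rng.random((n, m))
    return P / P.sum()

def conditional(P):
    """q_ij = P(U=i | V=j) = p_ij / sum_k p_kj (column-normalize)."""
    return P / P.sum(axis=0, keepdims=True)

P1, P2 = random_joint(3, 4), random_joint(3, 4)
t = 0.4
Q = conditional(t * P1 + (1 - t) * P2)   # condition the mixed joint

# Each column of Q is a conditional distribution of U given V = j.
assert np.all(Q >= 0)
assert np.allclose(Q.sum(axis=0), 1.0)
```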
2.3.1 Definitions
Definition 2.8 Convex function: f : Rn → R such that the domain dom(f ) ⊆ Rn is convex, and
f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y), for all 0 ≤ t ≤ 1
and all x, y ∈ dom(f ).
In other words, the function lies below the line segment joining f (x) and f (y).
Definition 2.9 Concave function: opposite inequality of the definition above, so that
f concave ⇔ −f convex
• Strictly Convex: f (tx + (1 − t)y) < tf (x) + (1 − t)f (y), for x ≠ y and 0 < t < 1.
In other words, f is convex and has greater curvature than a linear function.
• Strongly Convex: with parameter m > 0, f (x) − (m/2)||x||₂² is convex.
In other words, f is at least as convex as a quadratic function.
Note: strongly convex implies strictly convex, which subsequently implies convex. In equation format:
strongly convex ⇒ strictly convex ⇒ convex
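An equivalent form of strong convexity (standard, though not stated in the notes) is f(tx + (1 − t)y) ≤ t f(x) + (1 − t) f(y) − (m/2) t(1 − t)||x − y||₂². A quick numerical check (Python sketch): f(x) = ||x||₂² is strongly convex with m = 2, and here the inequality holds with equality.

```python
import numpy as np

rng = np.random.default_rng(4)

f = lambda x: np.dot(x, x)         # f(x) = ||x||_2^2, strongly convex with m = 2
m = 2.0

x, y = rng.standard_normal(3), rng.standard_normal(3)
for t in (0.0, 0.25, 0.5, 0.9, 1.0):
    lhs = f(t * x + (1 - t) * y)
    rhs = t * f(x) + (1 - t) * f(y) - (m / 2) * t * (1 - t) * np.dot(x - y, x - y)
    # equality for this f, since f(x) - (m/2)||x||^2 is identically zero
    assert lhs <= rhs + 1e-9
```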
where σ1 (X) ≥ ... ≥ σr (X) ≥ 0 are the singular values of the matrix X.
is convex
• Support function: for any set C (convex or not), its support function h_C (x) = max_{y∈C} xT y
is convex
is a convex set.
• Sublevel sets: if f is convex, then the sublevel set {x ∈ dom(f ) : f (x) ≤ t} is convex, for every t.
• First-order characterization: if f is differentiable, then f is convex if and only if dom(f ) is convex, and
f (y) ≥ f (x) + ∇f (x)T (y − x)
for all x, y ∈ dom(f ). Therefore for a differentiable convex function ∇f (x) = 0 ⇔ x minimizes f .
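The first-order condition f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) says a convex function lies above each of its tangent planes; a quick numerical check (Python sketch, not from the notes) for f(x) = ||x||₂²:

```python
import numpy as np

rng = np.random.default_rng(5)

f = lambda x: np.dot(x, x)       # convex and differentiable
grad = lambda x: 2 * x           # its gradient

for _ in range(100):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    # convexity <=> f lies above all of its tangent planes
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
```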
• Jensen’s inequality: if f is convex, and X is a random variable supported on dom(f ), then f (E[X]) ≤
E[f (X)].
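A Monte Carlo check of Jensen’s inequality (a sketch, not from the notes): for the convex function f = exp and a standard Gaussian X, f(E[X]) ≤ E[f(X)].

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal(100_000)   # samples of a random variable X

f = np.exp                          # convex
lhs = f(X.mean())                   # f(E[X]), close to e^0 = 1
rhs = f(X).mean()                   # E[f(X)], close to e^{1/2} for a standard normal
assert lhs <= rhs
```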
• Log-sum-exp function: g(x) = log( Σ_{i=1}^k e^{aiT x + bi} ) for fixed ai , bi . This is often called the soft max,
since it smoothly approximates max_{i=1,...,k} (aiT x + bi ).
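The soft-max approximation can be quantified (a sketch, not from the notes): with zi = aiᵀx + bi, we have max_i zi ≤ log Σ e^{zi} ≤ max_i zi + log k.

```python
import numpy as np

rng = np.random.default_rng(7)

k, n = 5, 3
A = rng.standard_normal((k, n))    # rows are the a_i
b = rng.standard_normal(k)
x = rng.standard_normal(n)

z = A @ x + b                      # z_i = a_i^T x + b_i
g = np.log(np.sum(np.exp(z)))      # log-sum-exp ("soft max")

# Sandwich bounds: the soft max is within log k of the true max.
assert z.max() <= g <= z.max() + np.log(k)
```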
• Pointwise maximization: if fs is convex for any s ∈ S, then f (x) = max_{s∈S} fs (x) is also convex.
Note: the set S indexes the functions fs and can be infinite.
• Partial minimization: if g(x, y) is convex in x, y, and C is convex, then f (x) = miny∈C g(x, y) is convex.
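A concrete instance (a sketch, not from the notes): with g(x, y) = x² + y² jointly convex and C = [1, 2], partial minimization gives f(x) = min_{y∈C} g(x, y) = x² + 1, which is convex.

```python
import numpy as np

def g(x, y):
    return x**2 + y**2             # jointly convex in (x, y)

ys = np.linspace(1.0, 2.0, 1001)   # C = [1, 2], a convex set (discretized grid)

def f(x):
    """Partial minimization over y in C; equals x^2 + 1 here (min at y = 1)."""
    return min(g(x, y) for y in ys)

# Check the midpoint convexity inequality for f on a few pairs.
for x1, x2 in [(-2.0, 3.0), (0.0, 1.0), (-1.5, -0.5)]:
    assert f(0.5 * (x1 + x2)) <= 0.5 * f(x1) + 0.5 * f(x2) + 1e-9
```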
• General composition: suppose f = h ◦ g, i.e., f (x) = h(g(x)), where g : Rn → Rk , h : Rk → R, f : Rn → R. Then:
(1) f is convex if h is convex and nondecreasing in each argument, g is convex
(2) f is convex if h is convex and nonincreasing in each argument, g is concave
(3) f is concave if h is concave and nondecreasing in each argument, g is concave
(4) f is concave if h is concave and nonincreasing in each argument, g is convex
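Rule (1) in action (a sketch, not from the notes): h = exp is convex and nondecreasing, g(x) = x² is convex, so f(x) = e^{x²} is convex; a midpoint check on random pairs:

```python
import numpy as np

rng = np.random.default_rng(8)

g = lambda x: x**2                 # convex
h = np.exp                         # convex and nondecreasing
f = lambda x: h(g(x))              # rule (1): f = h o g is convex

for _ in range(100):
    x, y = rng.standard_normal(2)
    assert f(0.5 * (x + y)) <= 0.5 * f(x) + 0.5 * f(y) + 1e-9
```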