
Mathematics Basics

Optimization Problems

Contents
 Mathematics and AI

 Linear Algebra

 Probability and Statistics

 Optimization Problems

• Classification of Optimization Problems


• Gradient Descent Method
• Newton's Method and Conjugate Gradient

Optimization Problems
 Optimization problem: the problem of choosing values of the parameters (decision variables) $x$ that minimize or maximize an objective function $f(x)$. It can be represented by

$$x^* = \arg\min_{x} f(x), \quad x = (x_1, x_2, \cdots, x_n)^T \in \mathbb{R}^n$$

$$\text{s.t.}\quad c_i(x) \geq 0,\ i = 1, 2, \cdots, m \quad \text{(inequality constraints)}$$

$$\qquad\;\; c_j(x) = 0,\ j = 1, 2, \cdots, p \quad \text{(equality constraints)}$$

 The constraints define a feasible region, which is assumed to be nonempty.
 Seeking a maximum of $f(x)$ is equivalent to seeking a minimum of $-f(x)$.
 If there are no constraints on the variables other than the objective function itself, the problem is called an unconstrained optimization problem; otherwise, it is called a constrained optimization problem.
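
As an illustration (an assumed example, not part of the original slides), this problem form maps directly onto SciPy's `scipy.optimize.minimize`, whose `'ineq'` constraints follow the same $c_i(x) \geq 0$ convention:

```python
# A minimal sketch, assuming SciPy is available; the objective and
# constraints are made-up examples, not from the slides.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2                            # objective f(x)
cons = [
    {'type': 'ineq', 'fun': lambda x: x[0] - 0.2},         # c_i(x) >= 0
    {'type': 'eq',   'fun': lambda x: x[0] + x[1] - 1.0},  # c_j(x) = 0
]
res = minimize(f, x0=np.array([0.0, 0.0]), method='SLSQP', constraints=cons)
print(res.x)  # optimal point x*, approximately [0.5, 0.5]
```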

Solutions to Optimization Problems
 Solutions to unconstrained optimization mainly fall into analytical methods and direct methods.
 Direct methods are usually used when the objective function has a complicated representation or cannot be written explicitly. Through numerical calculation over a series of iterations, they generate a sequence of points that searches for an optimal point.
 Analytical methods, also known as indirect methods, obtain the optimal solution from the analytical expression of the objective function of the unconstrained problem. They mainly include the gradient descent method, Newton's method, the quasi-Newton method, the conjugate direction method, and the conjugate gradient method.

Solutions to Optimization Problems
 Solutions to constrained optimization: the method of Lagrange multipliers is usually used for optimization problems subject to equality constraints, while the Karush–Kuhn–Tucker (KKT) conditions are used for problems subject to inequality constraints. These methods turn a constrained optimization problem with n variables and k constraints into an unconstrained problem with (n + k) variables; a short worked example follows below.
 In this course, we focus on the most common solutions to unconstrained optimization problems in deep learning, that is, the gradient descent method and Newton's method.
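
As a quick worked illustration (an assumed example, not from the slides), consider minimizing $f(x_1, x_2) = x_1^2 + x_2^2$ subject to the single equality constraint $x_1 + x_2 = 1$:

```latex
% Worked Lagrange-multiplier example (assumed, for illustration):
% minimize f(x1, x2) = x1^2 + x2^2  subject to  c(x) = x1 + x2 - 1 = 0.
\begin{align*}
L(x_1, x_2, \lambda) &= x_1^2 + x_2^2 - \lambda\,(x_1 + x_2 - 1)\\
\nabla L = 0 \;\Rightarrow\; & 2x_1 = \lambda,\quad 2x_2 = \lambda,\quad x_1 + x_2 = 1\\
\Rightarrow\; & x_1 = x_2 = \tfrac{1}{2},\quad \lambda = 1.
\end{align*}
```

The two-variable problem with one constraint becomes an unconstrained stationarity problem in the three variables $(x_1, x_2, \lambda)$, matching the $(n + k)$ count above.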

Extension to N Dimensions
 How big can N be?
 Problem sizes can vary from a handful of parameters to many thousands.
 We will consider examples for N = 2, so that cost-function surfaces can be visualized.

[Figure: a cost-function surface over a two-dimensional space, with the original point marked]

An Optimization Algorithm
 Start at $x_0$, $k = 0$.
1. Compute a search direction $p_k$.
2. Compute a step size $\alpha_k$ such that $f(x_k + \alpha_k p_k) < f(x_k)$.
3. Update $x_{k+1} = x_k + \alpha_k p_k$ and set $k = k + 1$.
4. Check for convergence (stopping criteria), e.g. $\nabla f(x) = 0$.
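
A minimal sketch of this generic loop in Python follows (illustrative only; the test function, the backtracking rule for $\alpha_k$, and all names are assumptions, not from the slides):

```python
# Generic descent loop: direction, step size, update, convergence check.
import numpy as np

def descend(f, grad, x0, alpha0=1.0, tol=1e-8, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        p = -grad(x)                                  # 1. search direction
        if np.linalg.norm(p) < tol:                   # 4. gradient ~ 0: stop
            break
        alpha = alpha0
        while f(x + alpha * p) >= f(x) and alpha > 1e-12:
            alpha *= 0.5                              # 2. shrink until f drops
        x = x + alpha * p                             # 3. x_{k+1} = x_k + a*p
    return x

# Assumed example: f(x) = x1^2 + 4*x2^2, minimum at the origin.
f = lambda x: x[0]**2 + 4 * x[1]**2
grad = lambda x: np.array([2 * x[0], 8 * x[1]])
print(descend(f, grad, np.array([3.0, 2.0])))  # approximately [0, 0]
```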

Gradient Descent
 Convex function: if, for any $\lambda \in (0, 1)$ and any $x_1, x_2 \in \mathbb{R}^n$,

$$f(\lambda x_1 + (1 - \lambda) x_2) \leq \lambda f(x_1) + (1 - \lambda) f(x_2),$$

then $f(x)$ is called a convex function. The minimum of a convex function occurs at a stationary point.
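
A quick numeric sanity check of this inequality (an assumed example using $f(x) = x^2$, not from the slides):

```python
# Check f(lam*x1 + (1-lam)*x2) <= lam*f(x1) + (1-lam)*f(x2) for f(x) = x^2.
import numpy as np

f = lambda x: x**2
x1, x2 = -1.0, 3.0
for lam in np.linspace(0.1, 0.9, 9):
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)
    assert lhs <= rhs + 1e-12, (lam, lhs, rhs)
print("convexity inequality holds at all sampled points")
```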

Gradient Descent
 The basic principle is to minimize the N-dimensional function by a series of 1D line minimizations:
$$x_{k+1} = x_k + \alpha_k p_k$$
 The gradient descent method chooses $p_k$ to point along the negative gradient (the direction of steepest descent):
$$p_k = -\nabla f(x_k)$$
 The step size $\alpha_k$, the learning rate, is a positive scalar chosen to minimize $f(x_k + \alpha_k p_k)$; for a quadratic objective with Hessian $H$, the exact line-search step is
$$\alpha_k = \frac{p_k^T p_k}{p_k^T H p_k}$$
 Gradient descent converges when every element of the gradient is zero or close to zero.
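
A minimal sketch of steepest descent with this exact step size on a quadratic (assumed example: $f(x) = \frac{1}{2} x^T H x - b^T x$, so $\nabla f(x) = Hx - b$; not from the slides):

```python
# Steepest descent with the exact step alpha_k = (p^T p)/(p^T H p).
import numpy as np

H = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
x = np.zeros(2)
for _ in range(100):
    p = -(H @ x - b)                     # p_k = -grad f(x_k)
    if np.linalg.norm(p) < 1e-10:
        break
    alpha = (p @ p) / (p @ H @ p)        # exact line search on a quadratic
    x = x + alpha * p
print(x, np.linalg.solve(H, b))          # both approximate the minimizer
```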

Gradient Descent
 The gradient is everywhere perpendicular to the contour lines.
 After each line minimization, the new gradient is orthogonal to the previous step direction. Therefore the iterates tend to zig-zag down the valley.

Newton's Method: 1D
 Fit a quadratic approximation to $f(x)$ using both gradient and curvature information at $x$.
 Expand $f(x)$ locally using a Taylor series:
$$f(x + \delta x) = f(x) + f'(x)\,\delta x + \frac{1}{2} f''(x)\,\delta x^2 + o(\delta x^2)$$
 Find the $\delta x$ that minimizes this local quadratic approximation:
$$\delta x = -\frac{f'(x)}{f''(x)}$$
 Update $x$:
$$x_{n+1} = x_n + \delta x = x_n - \frac{f'(x_n)}{f''(x_n)}$$
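
A minimal sketch of this 1D update (the test function and all names are assumed examples, not from the slides):

```python
# 1D Newton's method: repeatedly jump to the minimum of the local quadratic.
def newton_1d(fp, fpp, x, tol=1e-10, max_iter=50):
    """fp, fpp: first and second derivatives of f."""
    for _ in range(max_iter):
        dx = -fp(x) / fpp(x)          # minimizer of the quadratic model
        x += dx
        if abs(dx) < tol:
            break
    return x

# Assumed example: f(x) = x^4 - 3x^2, so f'(x) = 4x^3 - 6x, f''(x) = 12x^2 - 6.
print(newton_1d(lambda x: 4*x**3 - 6*x, lambda x: 12*x**2 - 6, x=2.0))
# converges to sqrt(1.5) ≈ 1.2247, a local minimum of f
```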

Newton's Method: N Dimensions
 Expand $f(x)$ locally using a Taylor series about $x_k$:
$$f(x_k + \delta x) = f(x_k) + g_k^T \delta x + \frac{1}{2}\,\delta x^T H_k\, \delta x$$
where the gradient is the vector
$$g_k = \nabla f(x_k) = \left( \frac{\partial f}{\partial x_1} \;\cdots\; \frac{\partial f}{\partial x_N} \right)^T$$
and the Hessian is the symmetric matrix
$$H_k = H(x_k) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_N \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_N^2} \end{pmatrix}$$
Newton's Method: N Dimensions
 For a minimum we require $\nabla f(x) = 0$, and so $g_k + H_k \delta x = 0$,
 with the solution $\delta x = -H_k^{-1} g_k$. This gives the iterative update
$$x_{k+1} = x_k - H_k^{-1} g_k$$
 If $f(x)$ is quadratic, the solution is found in one step.
 The method has quadratic convergence (as in the 1D case).
 The step $\delta x = -H_k^{-1} g_k$ is a downhill direction provided that $H_k$ is positive definite.
 Rather than jumping straight to the predicted minimum, it is better to perform a line minimization, which helps ensure global convergence:
$$x_{k+1} = x_k - \alpha_k H_k^{-1} g_k$$
 If $H_k = I$, this reduces to gradient descent.
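
A minimal numeric sketch of the claim that one full Newton step solves a quadratic exactly (the matrix, vector, and starting point are assumed examples, not from the slides):

```python
# One Newton step on f(x) = 0.5 x^T A x - b^T x reaches the minimum exactly.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # Hessian of the quadratic (SPD)
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b               # gradient g = A x - b

x = np.array([5.0, -7.0])                # arbitrary starting point
dx = np.linalg.solve(A, -grad(x))        # Newton step: solve H dx = -g
x = x + dx                               # full step (alpha = 1)
print(x, np.linalg.solve(A, b))          # identical: minimizer in one step
```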

Newton’s Method
 Quadratic convergence: the number of correct decimal digits roughly doubles at each iteration.
 Global convergence of Newton's method is poor if the starting point is too far from the minimum.
 In practice, it is combined with a globalization strategy that reduces the step size until a decrease in the function value is assured.

Conjugate Gradient
 Each direction $d_k$ is chosen to be conjugate to all previous directions with respect to the Hessian $H$:
$$d_i^T H d_j = 0,\ i \neq j; \qquad d_k = -\nabla f_k + \frac{\nabla f_k^T\, \nabla f_k}{\nabla f_{k-1}^T\, \nabla f_{k-1}}\, d_{k-1}$$
 Compute the step size $\alpha_k$ for $x_k$ using the Hessian $H$, set $x_{k+1} = x_k + \alpha_k d_k$, and evaluate $f_{k+1} = f(x_{k+1})$:
$$\alpha_k = \frac{g_k^T g_k}{d_k^T H d_k}$$
 If $\|\alpha_k d_k\| < \varepsilon$, output $x^* = x_{k+1}$ and $f(x^*) = f_{k+1}$, and stop.
 Otherwise, compute $g_{k+1}$ and $\beta_k = \dfrac{g_{k+1}^T\, g_{k+1}}{g_k^T\, g_k}$.
 Generate the new direction $d_{k+1} = -g_{k+1} + \beta_k d_k$.
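
A minimal sketch of these steps on a quadratic $f(x) = \frac{1}{2} x^T H x - b^T x$, where $g_k = H x_k - b$ (the matrix and vector are assumed examples, not from the slides):

```python
# Conjugate gradient (Fletcher-Reeves form) on a 2D quadratic.
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
x = np.zeros(2)

g = H @ x - b                             # initial gradient
d = -g                                    # first direction: steepest descent
for _ in range(len(b)):                   # at most N steps on a quadratic
    alpha = (g @ g) / (d @ H @ d)         # exact step size
    x = x + alpha * d
    g_new = H @ x - b
    beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves coefficient
    d = -g_new + beta * d                 # new conjugate direction
    g = g_new
print(x, np.linalg.solve(H, b))           # both give the exact minimizer
```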

Conjugate Gradient
 An N-dimensional quadratic form can be minimized in at most N conjugate descent
steps.

Summary

 This chapter mainly introduces the essential mathematics topics used in AI, including linear algebra, probability and statistics, and optimization problems. It lays a foundation for the other learning materials.

More Information

 Huawei e-Learning website:
https://support.huawei.com/learning/en/newindex.html

 Huawei support case library:
https://support.huawei.com/enterprise/en/index.html
Thank you.

Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright © 2020 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purposes only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
