Project For Automated Train by Roshan
Unconstrained Nonlinear Programming

General statement of the unconstrained nonlinear programming (NLP) problem:

$$\max_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n)$$
All candidate solutions are feasible for unconstrained problems, so the optimum always lies inside the feasible region.

Necessary Conditions:
1. Feasibility: Any $x^*$ is feasible.
2. Stationarity: If $x^*$ is a local maximum then:
$$\frac{\partial F(x^*)}{\partial x_j} = 0$$
3. Inequality Lagrange multiplier: Not applicable since there are no constraints.
4. Curvature: If $x^*$ is a local maximum then:
$$W_{kl} = Z_{ik}\,\frac{\partial^2 L(x^*, \lambda)}{\partial x_i\,\partial x_j}\,Z_{jl} = \frac{\partial^2 F(x^*)}{\partial x_k\,\partial x_l} \le 0$$
With no constraints the Lagrangian $L$ reduces to $F$ and the projection $Z$ to the identity, so the condition states that the Hessian of $F$ at $x^*$ is negative semidefinite.

In NLP problems local maxima are not necessarily global maxima, because the convexity conditions usually do not hold.

Finding a Local Maximum

Use an iterative search that moves from one candidate solution $x^k = x_i^k$ on iteration $k$ to a new candidate solution $x^{k+1} = x_i^{k+1}$ on iteration $k+1$:
Search procedure:
0. Start by selecting an initial solution $x^0$, set $k = 0$.
1. Test for convergence; if converged set $x^* = x^k$ and exit.
2. Compute a search direction vector $p^k = p_i^k$.
3. Compute a search distance $\alpha^k > 0$ along $p^k$. For some search methods $\alpha^k$ is set to 1.
4. Compute the new solution $x^{k+1} = x^k + \Delta x^k = x^k + \alpha^k p^k$ and go to Step 1.

If the search is an ascent method then $F(x^{k+1}) \ge F(x^k)$ for sufficiently small $\alpha^k$.

[Figure: one search step in the ($x_1$, $x_2$) plane, from $x^0$ to $x^1$. First determine the search direction $p^k$, then find the search distance $\alpha^k$.]

There are a number of different search alternatives for UNLP.

Steepest Ascent/Gradient Search

Derive $p^k$ by expanding $F(x^{k+1})$ in a first-order Taylor series in $\Delta x^k = \alpha^k p_i^k$:
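$$F(x^{k+1}) = F(x^k) + \frac{\partial F(x^k)}{\partial x_i}\,\alpha^k p_i^k + \ldots$$

where summation over the repeated index $i$ is implied and

$$\frac{\partial F(x^k)}{\partial x_i} = \left.\frac{\partial F(x)}{\partial x_i}\right|_{x = x^k} = \text{Gradient}$$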
The increase in $F(x)$ is largest when $p_i^k$ is aligned with the objective function gradient $\partial F(x^k)/\partial x_i$:

$$p_i^k = \frac{\partial F(x^k)}{\partial x_i}$$
If the step $\Delta x^k$ is too large the first-order expansion may not be valid. Control the step with $\alpha^k$, which is derived by maximizing $F(x^{k+1})$:

$$\max_{\alpha^k} F(x^k + \alpha^k p^k)$$
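The steepest ascent iteration with a one-dimensional maximization over the step size can be sketched as follows. This is a minimal illustration, not a reference implementation: the golden-section routine is one possible univariate search chosen here for simplicity, the bracket `[0, alpha_max]`, the central-difference gradient, the tolerances, and the test function `F_demo` are all assumptions made for the example.

```python
import math

def gradient(F, x, h=1e-6):
    """Central-difference estimate of the gradient of F at x (assumed scheme)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((F(xp) - F(xm)) / (2.0 * h))
    return g

def golden_section_max(phi, lo, hi, tol=1e-8):
    """Maximize a unimodal function phi on [lo, hi] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - r * (b - a)
    d = a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)

def steepest_ascent(F, x0, alpha_max=1.0, gtol=1e-5, max_iter=500):
    x = list(x0)
    for _ in range(max_iter):
        p = gradient(F, x)               # search direction p^k = gradient
        if math.sqrt(sum(pi * pi for pi in p)) < gtol:
            break                        # convergence test on gradient norm
        phi = lambda a: F([xi + a * pi for xi, pi in zip(x, p)])
        alpha = golden_section_max(phi, 0.0, alpha_max)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
    return x

# Hypothetical concave test function with its maximum at (1, -0.5).
def F_demo(x):
    return -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2

x_star = steepest_ascent(F_demo, [0.0, 0.0])
```

Every call to `phi` inside the line search is one evaluation of F, and each gradient estimate costs 2n more, which is where the many function evaluations per iteration come from.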
The step-size problem is solved with an iterative univariate search algorithm (see Gill et al. for alternatives). This requires multiple evaluations of $F(x)$ on each iteration. Usually the gradient $\partial F(x^k)/\partial x_j$ is derived numerically, using finite difference or adjoint methods that require multiple evaluations of $F(x)$ on each iteration.

Steepest ascent characteristics:
- Converges very slowly (or not at all)
- Univariate search for $\alpha^k$ is essential
- Many function evaluations required

Newton's Method

Derive $p^k$ by expanding $F(x^{k+1})$ in a second-order Taylor series about $x^k$ (with $\alpha^k = 1$, so $\Delta x^k = p^k$):
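$$F(x^{k+1}) = F(x^k) + \frac{\partial F(x^k)}{\partial x_i}\,p_i^k + \frac{1}{2}\,\frac{\partial^2 F(x^k)}{\partial x_i\,\partial x_j}\,p_i^k p_j^k + \ldots$$

where:

$$\frac{\partial^2 F(x^k)}{\partial x_i\,\partial x_j} = \left.\frac{\partial^2 F(x)}{\partial x_i\,\partial x_j}\right|_{x = x^k} = H_{ij}^k = \text{objective Hessian}$$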
Maximize the increase in $F(x)$ with respect to $p^k$:

$$\max_{p^k}\; \frac{\partial F(x^k)}{\partial x_i}\,p_i^k + \frac{1}{2}\,H_{ij}^k\,p_i^k p_j^k$$

Necessary conditions for this quadratic optimization problem imply that:

$$H_{ij}^k\,p_j^k = -g_i^k$$

where $g^k = g_i^k = \partial F(x^k)/\partial x_i$ = objective gradient.
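The Newton direction $\Delta x^k = p^k$ is the solution to this set of linear equations. It is an ascent direction if $H_{ij}^k < 0$ (negative definite). Modifications are needed at points where $\partial F(x^k)/\partial x_i = 0$ and/or $H_{ij}^k = 0$. Modified Newton methods replace an indefinite Hessian by a definite approximation (negative definite for the maximization problem considered here, positive definite for a minimization problem).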
Usually the gradient and Hessian need to be computed numerically, using finite difference or adjoint methods. Exact Hessian computation may be infeasible for large problems because of the number of function evaluations required.

Newton's method characteristics:
- Hessian must be negative definite
- Converges faster than steepest ascent
- Many function evaluations required
- Very expensive to compute $H_{ij}^k$ if numerical differentiation is required
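As an illustration of the Newton iteration, the sketch below solves the linear system H p = -g at each step and takes the full step (search distance 1). It assumes the gradient and Hessian are available in closed form, which sidesteps the numerical differentiation cost noted above. The test function `F_demo`, the helper `solve`, and the tolerances are hypothetical choices for this example; the function was picked because its Hessian is negative definite everywhere, so no modified Newton safeguard is included.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def newton_max(grad, hess, x0, gtol=1e-10, max_iter=50):
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < gtol:
            break
        p = solve(hess(x), [-gi for gi in g])   # Newton equation: H p = -g
        x = [xi + pi for xi, pi in zip(x, p)]   # full step, alpha = 1
    return x

# Hypothetical test problem: F(x) = x0 - exp(x0) - (x1 - x0)^2, maximum at (0, 0).
def F_demo(x):
    return x[0] - math.exp(x[0]) - (x[1] - x[0]) ** 2

def grad_demo(x):
    return [1.0 - math.exp(x[0]) + 2.0 * (x[1] - x[0]),
            -2.0 * (x[1] - x[0])]

def hess_demo(x):
    return [[-math.exp(x[0]) - 2.0, 2.0],
            [2.0, -2.0]]

x_star = newton_max(grad_demo, hess_demo, [1.0, -1.0])
```

On this problem the error roughly squares from one iteration to the next, so only a handful of iterations are needed, in contrast to the slow progress of steepest ascent.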
Quasi-Newton Methods

Avoid the computational disadvantages of Newton's method and retain good convergence properties by gradually constructing a negative definite approximation to the Hessian. The basic idea is to use an iterative algorithm to update an approximate negative definite Hessian matrix $B^k = B_{ij}^k$, using only information about gradients and previous search steps. The search direction $p^k$ is found from an equation having the same form as the Newton equation:

$$B_{ij}^k\,p_j^k = -g_i^k$$
The new solution is usually obtained by using a search distance $\alpha^k$ obtained from a univariate search:

$$x^{k+1} = x^k + \Delta x^k = x^k + \alpha^k p^k$$

The iteration is often initialized with the negative of the identity matrix, $B^0 = -I = -\delta_{ij}$. Consequently, the first step is a steepest ascent step. The approximate Hessian is obtained from:

$$B^{k+1} = B^k + U^k$$

where $U^k = U_{ij}^k$ is an update matrix.
In most quasi-Newton methods $U^k$ is chosen subject to the following requirements:
1. $B^{k+1}$ should satisfy the quasi-Newton condition for $k \to k+1$:
$$B_{ij}^{k+1}\,\Delta x_j^k = g_i^{k+1} - g_i^k$$
2. $U^k$ should be rank 1, which implies that $U_{ij}^k = u_i^k v_j^k$ for some vectors $u^k$ and $v^k$.
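There are various methods that satisfy these requirements asymptotically (for large $k$). One of the most widely used is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, whose update is the sum of two rank-1 terms:

$$U_{ij}^k = -\frac{1}{\Delta x_l^k\,B_{lm}^k\,\Delta x_m^k}\,B_{il}^k \Delta x_l^k\,B_{mj}^k \Delta x_m^k + \frac{1}{\Delta g_l^k\,\Delta x_l^k}\,\Delta g_i^k\,\Delta g_j^k$$

where $\Delta g_i^k = g_i^{k+1} - g_i^k$ is the change in the gradient over the step.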
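When we substitute $\Delta x^k = \alpha^k p^k$ and $B_{ij}^k p_j^k = -g_i^k$ this simplifies to:

$$U_{ij}^k = \frac{1}{g_l^k\,p_l^k}\,g_i^k\,g_j^k + \frac{1}{\alpha^k\,\Delta g_l^k\,p_l^k}\,\Delta g_i^k\,\Delta g_j^k$$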
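The BFGS update and the surrounding quasi-Newton loop can be sketched as below. This is an illustrative sketch under several assumptions that are not in the text: the univariate search is replaced by simple step halving until F increases, the update is skipped whenever the step and gradient change do not have a negative inner product (a safeguard that keeps B negative definite), and the quadratic test function `F_demo` with analytic gradient `grad_demo` is hypothetical.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(B, v):
    return [dot(row, v) for row in B]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def bfgs_update(B, dx, dg):
    """Return B + U, with U the BFGS update built from the step dx and gradient change dg."""
    Bdx = matvec(B, dx)
    a = dot(dx, Bdx)          # dx_l B_lm dx_m
    b = dot(dg, dx)           # dg_l dx_l
    n = len(dx)
    return [[B[i][j] - Bdx[i] * Bdx[j] / a + dg[i] * dg[j] / b
             for j in range(n)] for i in range(n)]

def quasi_newton_max(F, grad, x0, gtol=1e-8, max_iter=100):
    n = len(x0)
    x = list(x0)
    B = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # B^0 = -I
    g = grad(x)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < gtol:
            break
        p = solve(B, [-gi for gi in g])                  # B p = -g
        alpha, Fx = 1.0, F(x)
        # Assumed stand-in for the univariate search: halve alpha until F increases.
        while alpha > 1e-12 and F([xi + alpha * pi for xi, pi in zip(x, p)]) <= Fx:
            alpha *= 0.5
        if alpha <= 1e-12:
            break
        dx = [alpha * pi for pi in p]
        x = [xi + di for xi, di in zip(x, dx)]
        g_new = grad(x)
        dg = [gn - go for gn, go in zip(g_new, g)]
        if dot(dg, dx) < 0.0:                            # safeguard (assumption)
            B = bfgs_update(B, dx, dg)
        g = g_new
    return x

# Hypothetical concave quadratic with its maximum at (10/7, -6/7).
def F_demo(x):
    return -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2 - x[0] * x[1]

def grad_demo(x):
    return [-2.0 * (x[0] - 1.0) - x[1], -4.0 * (x[1] + 0.5) - x[0]]

x_star = quasi_newton_max(F_demo, grad_demo, [0.0, 0.0])
```

Only gradients and previous steps enter `bfgs_update`; no second derivatives are evaluated, and the first search direction equals the gradient because the initial matrix is the negative identity.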
The BFGS update maintains the negative definiteness of the approximate Hessian from iteration to iteration. As in steepest ascent, the univariate search and gradient evaluation together require many function evaluations on each iteration.

Quasi-Newton characteristics:
- No need to derive Hessian
- Approximate Hessian is negative definite
- Univariate search for $\alpha^k$ is desirable
- Many function evaluations required
- Good convergence properties
- Somewhat more expensive than steepest ascent but much more reliable

Gauss-Newton Methods

Unconstrained nonlinear least-squares problems have a special structure that leads to efficient iterative search algorithms. These problems seek to minimize the weighted sum of squared errors between a set of measurements $y_t$ ($t = 1, \ldots, m$) and a nonlinear model $h_t(x_j)$ that depends on a set of decision variables (or parameters) $x_j$ ($j = 1, \ldots, n$).

$$\min_{x_1, x_2, \ldots, x_n} F(x_1, x_2, \ldots, x_n)$$

where, with weights $K_t$ and the factor $\frac{1}{2}$ chosen to agree with the gradient expression used in the derivation of the search step:

$$F(x) = \frac{1}{2} \sum_{t=1}^{m} K_t\,[y_t - h_t(x)]^2$$
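Solve iteratively, starting with an initial estimate $x^0$. Obtain the search step as follows:
1. Set the gradient at the new iterate $x^{k+1}$ equal to zero (from the stationarity condition):
$$\frac{\partial F(x^{k+1})}{\partial x_j} = -\sum_{t=1}^{m} K_t\,[y_t - h_t(x^{k+1})]\,\frac{\partial h_t(x^{k+1})}{\partial x_j} = 0$$
2. Expand $h_t(x^{k+1})$ in a first-order Taylor series about $x^k$:
$$h_t(x^{k+1}) = h_t(x^k) + \frac{\partial h_t(x^k)}{\partial x_j}\,\Delta x_j^k + \ldots$$
This is valid only if the step $\Delta x_j^k$ is sufficiently small.
3. Assume $\partial h_t(x^{k+1})/\partial x_j = \partial h_t(x^k)/\partial x_j$. Then the stationarity condition reduces to:
$$B_{ij}^k\,\Delta x_j^k = -g_i^k$$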
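where:

$$g_i^k = \frac{\partial F(x^k)}{\partial x_i} = -\sum_{t=1}^{m} K_t\,[y_t - h_t(x^k)]\,\frac{\partial h_t(x^k)}{\partial x_i}$$

$$B_{ij}^k = \sum_{t=1}^{m} K_t\,\frac{\partial h_t(x^k)}{\partial x_i}\,\frac{\partial h_t(x^k)}{\partial x_j}$$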
This is the Gauss-Newton search algorithm. It has the same form as the Quasi-Newton algorithm, with a positive definite approximate Hessian $B^k$ computed from the sensitivity derivatives $\partial h_t(x^k)/\partial x_j$ and with $\Delta x^k = p^k$ ($\alpha^k = 1$). Sensitivity derivatives are usually computed numerically, from multiple evaluations of the function $h(x)$. Algorithm performance is often improved by adding a positive constant $\lambda^k$ to the diagonal of $B^k$ so that:

$$[B_{ij}^k + \lambda^k \delta_{ij}]\,\Delta x_j^k = -g_i^k$$

This gives the Levenberg-Marquardt search algorithm.
- Small $\lambda^k$: unmodified Gauss-Newton (steps may be too large)
- Large $\lambda^k$: steepest descent (steps may be too small)

It is helpful to adjust $\lambda^k$ dynamically to improve convergence. Univariate search along $\Delta x^k$ is an alternative.

Gauss-Newton characteristics:
- Specialized algorithm for least-squares (minimization) problems
- Approximate positive definite Hessian is derived from sensitivity derivatives
- Many function evaluations required
- Good convergence properties
- Most efficient option for least-squares problems (depending on effort required to evaluate sensitivity derivatives)
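A small Levenberg-Marquardt sketch for a two-parameter model is given below to make the damped Gauss-Newton step concrete. Everything specific in it is an assumption for illustration: the exponential model `h`, its analytic sensitivity derivatives `dh`, the noise-free synthetic measurements, unit weights, the factor-of-10 rule for adjusting the damping constant `lam`, and the 2-by-2 solver `solve2`.

```python
import math

def h(x, t):
    """Hypothetical model: h_t(x) = x0 * exp(x1 * t)."""
    return x[0] * math.exp(x[1] * t)

def dh(x, t):
    """Sensitivity derivatives of h with respect to x0 and x1."""
    e = math.exp(x[1] * t)
    return [e, x[0] * t * e]

def sse(x, ts, ys, K):
    """F(x) = 1/2 * sum_t K_t [y_t - h_t(x)]^2."""
    return 0.5 * sum(k * (y - h(x, t)) ** 2 for t, y, k in zip(ts, ys, K))

def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def levenberg_marquardt(x0, ts, ys, K, lam=1e-3, max_iter=100):
    x = list(x0)
    F_old = sse(x, ts, ys, K)
    for _ in range(max_iter):
        B = [[0.0, 0.0], [0.0, 0.0]]
        neg_g = [0.0, 0.0]                       # -g_i = sum K_t r_t dh_t/dx_i
        for t, y, k in zip(ts, ys, K):
            r, J = y - h(x, t), dh(x, t)
            for i in range(2):
                neg_g[i] += k * r * J[i]
                for j in range(2):
                    B[i][j] += k * J[i] * J[j]
        if math.sqrt(neg_g[0] ** 2 + neg_g[1] ** 2) < 1e-12:
            break                                # stationarity reached
        A = [[B[0][0] + lam, B[0][1]], [B[1][0], B[1][1] + lam]]
        step = solve2(A, neg_g)                  # (B + lam I) dx = -g
        x_new = [x[0] + step[0], x[1] + step[1]]
        F_new = sse(x_new, ts, ys, K)
        if F_new < F_old:                        # accept: move toward Gauss-Newton
            x, F_old, lam = x_new, F_new, lam / 10.0
        else:                                    # reject: move toward steepest descent
            lam *= 10.0
    return x

# Synthetic, noise-free measurements generated from x = (2, -0.5).
ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-0.5 * t) for t in ts]
K = [1.0] * len(ts)

x_fit = levenberg_marquardt([1.0, 0.0], ts, ys, K)
```

Setting `lam` to zero in this sketch recovers the unmodified Gauss-Newton step, while a very large `lam` shrinks the step toward a short move along the negative gradient.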