05 - Local Search
Alfredo Milani
Outline
ICS 271, Fall 2007: Professor Padhraic Smyth Slide Set 5: Local Search 2
Local search and optimization
• Local search
– Keep track of single current state
– Move only to neighboring states
– Ignore paths
• Advantages:
– Use very little memory
– Can often find reasonable solutions in large or infinite (continuous)
state spaces.
“Landscape” of search
Hill-climbing search
current ← MAKE-NODE(INITIAL-STATE[problem])
loop do
    neighbor ← a highest-valued successor of current
    if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
    current ← neighbor
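The loop above can be sketched in Python (the `neighbors` and `value` callables are illustrative names standing in for the problem's successor and objective functions):

```python
def hill_climb(initial_state, neighbors, value):
    """Steepest-ascent hill climbing: repeatedly move to the best
    neighbor, stopping at a state no neighbor improves on."""
    current = initial_state
    while True:
        candidates = neighbors(current)
        if not candidates:
            return current
        best = max(candidates, key=value)
        if value(best) <= value(current):
            return current  # local maximum (or flat plateau)
        current = best

# Illustrative use: maximize f(x) = -(x - 3)^2 over the integers,
# where each state's neighbors are x - 1 and x + 1
f = lambda x: -(x - 3) ** 2
step = lambda x: [x - 1, x + 1]
peak = hill_climb(0, step, f)  # climbs 0 → 1 → 2 → 3
```

Because the search keeps only `current`, memory use is constant in the path length, which is the advantage the slides note.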
Hill-climbing search
• Hill climbing does not look beyond the immediate neighbors of the
current state.
Hill climbing and local maxima
Hill-climbing example
• Successor function:
– move a single queen to another square in the same column.
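With the usual complete-state encoding (a tuple giving each column's queen row; this encoding is the standard one, assumed here), the heuristic and the successor function above can be sketched as:

```python
def attacking_pairs(state):
    """h(state): number of pairs of queens attacking each other,
    where state[c] is the row of the queen in column c."""
    n = len(state)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_row = state[i] == state[j]
            same_diagonal = abs(state[i] - state[j]) == j - i
            if same_row or same_diagonal:
                count += 1
    return count

def successors(state):
    """All states reachable by moving one queen to another
    square in the same column."""
    n = len(state)
    result = []
    for col in range(n):
        for row in range(n):
            if row != state[col]:
                result.append(state[:col] + (row,) + state[col + 1:])
    return result
```

Each 8-queens state thus has 8 × 7 = 56 successors, and a goal state is any state with h = 0.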
A local minimum for 8-queens
Other drawbacks
Performance of hill-climbing on 8-queens
• Starting from a random 8-queens state, hill climbing gets stuck 86% of
the time, solving only 14% of instances
• However…
– Takes only 4 steps on average when it succeeds
– And 3 on average when it gets stuck
– (for a state space with ~17 million states)
Possible solution…sideways moves
• For 8-queens
– Now allow sideways moves, with a limit of 100 consecutive such moves
– Raises the percentage of problem instances solved from 14% to 94%
– However….
• 21 steps for every successful solution
• 64 for each failure
Hill-climbing variations
• Stochastic hill-climbing
– Random selection among the uphill moves.
– The selection probability can vary with the steepness of the uphill
move.
• First-choice hill-climbing
– Implements stochastic hill climbing by generating successors
randomly until one better than the current state is found
– Useful when a state has a very large number of successors
• Random-restart hill-climbing
– Tries to avoid getting stuck in local maxima.
Hill-climbing with random restarts
• Different variations
– For each restart: run until termination vs. run for a fixed time
– Run a fixed number of restarts or run indefinitely
• Analysis
– Say each search has probability p of success
• E.g., for 8-queens, p = 0.14 with no sideways moves
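If each restart succeeds independently with probability p, the number of runs until the first success is geometric, so the expected number of restarts is 1/p. A quick check of the 8-queens figure from the slides:

```python
def expected_restarts(p):
    """Expected number of independent runs until the first success,
    for per-run success probability p (geometric distribution)."""
    return 1.0 / p

# 8-queens without sideways moves: p = 0.14
print(round(expected_restarts(0.14), 1))  # ≈ 7.1 restarts on average
```

So on average roughly 7 restarts suffice, which is why random-restart hill climbing works well here despite the low per-run success rate.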
Local beam search
Gradient Descent
1. Compute the gradient: ∂C/∂xi (x1, …, xn), for each i
2. Take a small step downhill in the direction of the gradient:
   xi → x'i = xi − λ ∂C/∂xi (x1, …, xn), for each i
3. Check if C(x1, …, x'i, …, xn) < C(x1, …, xi, …, xn)
4. Repeat.
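The steps above can be sketched for a simple two-variable cost (the step size and iteration count are illustrative choices, not from the slides):

```python
def gradient_descent(grad, x, step=0.1, iters=100):
    """Repeatedly step against the gradient: x_i <- x_i - step * dC/dx_i."""
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# C(x, y) = (x - 1)^2 + (y + 2)^2, minimized at (1, -2);
# its gradient is (2(x - 1), 2(y + 2))
grad_C = lambda v: [2 * (v[0] - 1), 2 * (v[1] + 2)]
x_min = gradient_descent(grad_C, [0.0, 0.0])  # converges toward (1, -2)
```

In practice the step size is either kept small, decayed over time, or adapted using the check in step 3 (shrinking λ when the cost fails to decrease).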
Learning as optimization
• Example:
– Training data D = {(x1, c1), …, (xn, cn)}
where xi = feature or attribute vector
and ci = class label (say binary-valued)
– We can measure the error E(w) for any setting of the weights w,
given a training data set D
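A minimal sketch of such an error measure, counting misclassified training examples for a linear classifier (the thresholded-dot-product decision rule here is an illustrative choice, not specified by the slides):

```python
def error(w, D):
    """E(w): number of (x, c) pairs in D misclassified by the linear
    rule 'predict class 1 iff w . x > 0', with binary labels c."""
    mistakes = 0
    for x, c in D:
        score = sum(wi * xi for wi, xi in zip(w, x))
        predicted = 1 if score > 0 else 0
        if predicted != c:
            mistakes += 1
    return mistakes
```

Learning is then local search over weight vectors w, moving to settings with lower E(w).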
Learning a minimum error decision boundary
[Figure: two-class training data plotted against FEATURE 1 (x-axis, 0–8)
and FEATURE 2 (y-axis), with the minimum-error decision boundary
separating the classes.]
Search using Simulated Annealing
• Simulated Annealing = hill-climbing with non-deterministic search
• Basic ideas:
– like hill-climbing identify the quality of the local improvements
– instead of picking the best move, pick one randomly
– say the change in objective function is d
– if d is positive, then move to that state
– otherwise:
• move to this state with probability e^(d/T), where T is a
temperature parameter
• thus: worse moves (very large negative d) are executed less
often
– however, there is always a chance of escaping from local maxima
– over time, make it less likely to accept locally bad moves
– (Can also make the size of the move random as well, i.e., allow
“large” steps in state space)
Physical Interpretation of Simulated Annealing
• A Physical Analogy:
• imagine letting a ball roll downhill on the function surface
– this is like hill-climbing (for minimization)
• now imagine shaking the surface, while the ball rolls, gradually
reducing the amount of shaking
– this is like simulated annealing
Simulated annealing
current ← MAKE-NODE(INITIAL-STATE[problem])
for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ∆E ← VALUE[next] − VALUE[current]
    if ∆E > 0 then current ← next
    else current ← next only with probability e^(∆E/T)
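The pseudocode above, sketched in Python (the guard `T <= 0` handles floating-point schedules; the callable names are illustrative, and the cooling schedule is left to the caller, as in the pseudocode):

```python
import math
import random

def simulated_annealing(initial, neighbors, value, schedule):
    """Maximize value(): always accept uphill moves; accept a downhill
    move of size dE with probability e^(dE / T), T given by schedule(t)."""
    current = initial
    t = 0
    while True:
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random.choice(neighbors(current))
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt
        t += 1
```

A typical schedule might be something like `lambda t: max(0.0, 1.0 - t / 1000)` (a hypothetical linear cooling), so the search behaves like a random walk early on and like hill climbing near the end.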
More Details on Simulated Annealing
– Let's say there are 3 moves available, with changes in the objective
function of d1 = −0.1, d2 = 0.5, d3 = −5. (Let T = 1).
– pick a move randomly:
• if d2 is picked, move there.
• if d1 or d3 is picked, probability of move = exp(d/T)
• move 1: prob1 = exp(−0.1) ≈ 0.9,
– i.e., about 90% of the time we will accept this move
• move 3: prob3 = exp(−5) ≈ 0.007
– i.e., less than 1% of the time we will accept this move
– T = “temperature” parameter
• high T => probability of “locally bad” move is higher
• low T => probability of “locally bad” move is lower
• typically, T is decreased as the algorithm runs longer
– i.e., there is a “temperature schedule”
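The acceptance probabilities above, and the effect of the temperature, can be checked directly:

```python
import math

def accept_prob(d, T):
    """Probability of accepting a move with objective change d at
    temperature T: e^(d / T). Uphill moves (d > 0) are always taken."""
    return 1.0 if d > 0 else math.exp(d / T)

print(round(accept_prob(-0.1, 1.0), 3))   # ≈ 0.905: mildly bad, usually accepted
print(round(accept_prob(-5.0, 1.0), 4))   # ≈ 0.0067: very bad, rarely accepted
print(round(accept_prob(-5.0, 10.0), 3))  # ≈ 0.607: high T makes the same bad move likely
```

The last line shows the schedule's role: the same d3 = −5 move is accepted over 60% of the time at T = 10, but almost never at T = 1.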
Simulated Annealing in Practice
Genetic algorithms
Genetic algorithm pseudocode
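The slide's pseudocode did not survive extraction; the following is a minimal sketch of the standard generational loop (fitness-proportionate selection, single-point crossover, per-position mutation). All parameter values are illustrative, and it assumes fitness is nonnegative with at least one positive individual, as fitness-proportionate selection requires:

```python
import random

def genetic_algorithm(population, fitness, alphabet,
                      generations=100, mutation_rate=0.05):
    """Evolve a population of equal-length tuples: select parents in
    proportion to fitness, recombine with single-point crossover, and
    mutate each position with a small probability."""
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        new_population = []
        for _ in range(len(population)):
            mom, dad = random.choices(population, weights=weights, k=2)
            cut = random.randrange(1, len(mom))        # single-point crossover
            child = mom[:cut] + dad[cut:]
            child = tuple(random.choice(alphabet)      # per-position mutation
                          if random.random() < mutation_rate else gene
                          for gene in child)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)
```

For 8-queens one could use, say, fitness(state) = 28 minus the number of attacking pairs, so that a solution scores the maximum of 28.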
Comments on genetic algorithms
• Positive points
– Random exploration can find solutions that local search can’t
• (via crossover primarily)
– Appealing connection to human evolution
• E.g., see related area of genetic programming
• Negative points
– Large number of “tunable” parameters
• Difficult to replicate performance from one problem to another
Online search
Summary
• Reading:
– All of chapter 4