L04 - SEARCH (MORE SEARCH STRATEGIES)
Outline of this Lecture
Informed Search Strategies
Best-first search
Greedy best-first search
A* search
Heuristics
Local search algorithms
Hill-climbing search
Simulated annealing search
Beam search
Local beam search
Informed Search
Relies on additional knowledge about the problem or domain
frequently expressed through heuristics (rules of thumb)
Used to distinguish more promising paths towards a goal
may be misled, depending on the quality of the heuristic
In general, performs much better than uninformed search
but frequently still exponential in time and space for realistic problems
Review: Tree search
Tree search algorithm:
function Tree-Search(problem, fringe) returns a solution, or failure
fringe ← Insert(Make-Node(Initial-State[problem]), fringe)
loop do
if fringe is empty then return failure
node ← Remove-Front(fringe)
if Goal-Test[problem] applied to State(node) succeeds then return node
fringe ← InsertAll(Expand(node, problem), fringe)
A search strategy is defined by picking the order of node expansion
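A minimal Python sketch of this loop follows, assuming a simple problem object with initial_state, is_goal(state) and expand(state) returning successor states (these names are illustrative, not a fixed API). With a FIFO fringe, as here, the strategy is breadth-first; a different ordering gives a different strategy.

from collections import deque

def tree_search(problem):
    fringe = deque([problem.initial_state])     # Insert(Make-Node(Initial-State[problem]))
    while fringe:                               # an empty fringe means failure
        state = fringe.popleft()                # Remove-Front(fringe)
        if problem.is_goal(state):              # Goal-Test[problem]
            return state
        fringe.extend(problem.expand(state))    # InsertAll(Expand(node, problem))
    return None                                 # failure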
Best-First search
Relies on an evaluation function f(n) that gives an indication of how useful it
would be to expand a node: f(n) estimates the "desirability" of each node, and
the most desirable unexpanded node is expanded first
family of search methods with various evaluation functions
usually gives an estimate of the distance to the goal
often referred to as heuristics in this context
The node with the lowest value is expanded first
the name is a little misleading: the node with the lowest value for
the evaluation function is not necessarily one that is on an optimal
path to a goal
if we really knew which node is the best, there would be no need to
search
The following is an algorithm for the Best-First Search:
function BEST-FIRST-SEARCH(problem, EVAL-FN) returns a solution
fringe := queue with nodes ordered by EVAL-FN
return TREE-SEARCH(problem, fringe)
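A minimal Python sketch of best-first search, using a priority queue (heapq) as the fringe; the problem interface (initial_state, is_goal, expand) is the same illustrative one as in the tree-search sketch above.

import heapq
import itertools

def best_first_search(problem, eval_fn):
    # The fringe is a priority queue ordered by eval_fn; the node with the
    # lowest evaluation is expanded first.
    counter = itertools.count()                 # tie-breaker so states are never compared
    start = problem.initial_state
    fringe = [(eval_fn(start), next(counter), start)]
    while fringe:
        _, _, state = heapq.heappop(fringe)
        if problem.is_goal(state):
            return state
        for succ in problem.expand(state):
            heapq.heappush(fringe, (eval_fn(succ), next(counter), succ))
    return None                                 # failure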
Special cases:
greedy best-first search
A* search
Romania with step costs in km
Greedy Best-First search
minimizes the estimated cost to a goal
expand the node that seems to be closest to a goal
utilizes a heuristic function as evaluation function
f(n) = h(n) = estimated cost from the current node to a goal
heuristic functions are problem-specific
often straight-line distance for route-finding and similar
problems
for example, hSLD(n) = straight-line distance from n to Bucharest
often better than depth-first search, although the worst-case time complexity is
the same and the space complexity is worse
The following is an algorithm for Greedy Best-First search:
function GREEDY-SEARCH(problem) returns solution
return BEST-FIRST-SEARCH(problem, h)
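Reusing the best_first_search sketch above, greedy search just plugs the heuristic in as the evaluation function; here h_sld is assumed to be a lookup table of straight-line distances to the goal (hypothetical data, as in the Romania example).

def greedy_search(problem, h_sld):
    # Greedy best-first search: f(n) = h(n), the estimated distance to a goal.
    return best_first_search(problem, eval_fn=lambda state: h_sld[state])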
Greedy best-first search example first step
Greedy best-first search example second step
Greedy best-first search example third step
Greedy best-first search example fourth step
Properties of greedy best-first search
Complete? No: can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt
Time? O(b^m), but a good heuristic can give dramatic improvement
Space? O(b^m) -- keeps all nodes in memory
Optimal? No
A* search
Idea: avoid expanding paths that are already expensive
It combines greedy and uniform-cost search to find the (estimated) cheapest path
through the current node
Evaluation function f(n) = g(n) + h(n), the estimated total cost of path through n
to goal
g(n) = cost so far to reach n (path cost up to n)
h(n) = estimated cost from n to goal
heuristics must be admissible
That is, h(n) ≤ h*(n) where h*(n) is the true cost from n. (Also require h(n)
≥ 0, so h(G) = 0 for any goal G.)
never overestimate the cost to reach the goal (hSLD(n) never overestimates the
actual road distance)
It is a very good search method, but with complexity problems
The following is an algorithm for A* search:
function A*-SEARCH(problem) returns solution
return BEST-FIRST-SEARCH(problem, g+h)
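A self-contained Python sketch of A* follows. Unlike the earlier sketches, it assumes problem.expand(state) yields (successor, step_cost) pairs so that the path cost g(n) can be accumulated; again, the interface names are illustrative, not a fixed API.

import heapq
import itertools

def a_star_search(problem, h):
    # A* is best-first search with f(n) = g(n) + h(n).
    counter = itertools.count()                      # tie-breaker for the heap
    start = problem.initial_state
    fringe = [(h(start), next(counter), 0, start)]   # entries are (f, tie, g, state)
    while fringe:
        f, _, g, state = heapq.heappop(fringe)       # lowest f = g + h first
        if problem.is_goal(state):
            return state
        for succ, cost in problem.expand(state):
            g2 = g + cost
            heapq.heappush(fringe, (g2 + h(succ), next(counter), g2, succ))
    return None                                      # failure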
A* search example first step
A* search example second step
A* search example third step
A* search example fourth step
A* search example fifth step
A* search example sixth step
Admissible heuristics
A heuristic h(n) is admissible if for every node n,
h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n.
An admissible heuristic never overestimates the cost to reach the goal, i.e., it is
optimistic
Example: hSLD(n) (never overestimates the actual road distance)
Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal
Optimality of A* (proof)
Suppose some suboptimal goal G2 has been generated and is in the fringe. Let n
be an unexpanded node in the fringe such that n is on a shortest path to an optimal
goal G.
We shall have:
f(G2) = g(G2)                  since h(G2) = 0
g(G2) > g(G)                   since G2 is suboptimal
f(G) = g(G)                    since h(G) = 0
f(G2) > f(G)                   from above

h(n) ≤ h*(n)                   since h is admissible
g(n) + h(n) ≤ g(n) + h*(n)
f(n) ≤ f(G)
Hence f(G2) > f(n), and A* will never select G2 for expansion
Consistent heuristics
A heuristic is consistent if for every node n, every successor n' of n generated by
any action a,
h(n) ≤ c(n, a, n') + h(n')
If h is consistent, we have
f(n') = g(n') + h(n')
      = g(n) + c(n, a, n') + h(n')
      ≥ g(n) + h(n)
      = f(n)
that is, f(n) is non-decreasing along any path.
Theorem: If h(n) is consistent, A* using GRAPH-SEARCH is optimal
Optimality of A* search
A* expands nodes in order of increasing f value
Gradually adds "f-contours" of nodes
Contour i has all nodes with f=fi, where fi < fi+1
A* will find the optimal solution
the first solution found is the optimal one
A* is optimally efficient
no other algorithm is guaranteed to expand fewer nodes than A*
A* is not always the best algorithm
optimality refers to the expansion of nodes
other criteria might be more relevant
A* generates and keeps all nodes in memory
improved in variations of A*
Complexity of A*
The number of nodes within the goal contour search space is still exponential
with respect to the length of the solution
better than other algorithms, but still problematic
Frequently, space complexity is more severe than time complexity
A* keeps all generated nodes in memory
Properties of A* search
The value of f never decreases along any path starting from the initial node
also known as monotonicity of the function
almost all admissible heuristics show monotonicity
those that don't can be modified through minor changes
This property can be used to draw contours
regions where the f-cost is below a certain threshold
with uniform cost search (h = 0), the contours are circular
the better the heuristics h, the narrower the contour around the optimal path
Complete? Yes (unless there are infinitely many nodes with f ≤ f(G))
Time? Exponential, O(b^d)
Space? Keeps all nodes in memory, O(b^d)
Optimal? Yes
Admissible heuristics
For example, for the 8-puzzle:
h1(n) = number of misplaced tiles
h2(n) = total Manhattan distance
(that is, the number of squares each tile is away from its desired location, summed over all tiles)
h1(S) = 8
h2(S) = 3+1+2+2+2+3+3+2 = 18
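A small Python sketch of the two heuristics, with a state represented as a length-9 tuple read row by row and 0 marking the blank (a representation chosen here only for illustration):

def h1(state, goal):
    # Number of misplaced tiles (the blank, 0, is not counted).
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    # Total Manhattan distance of the tiles from their goal squares.
    dist = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue                        # skip the blank
        goal_idx = goal.index(tile)
        dist += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return dist

# e.g. with goal = (1, 2, 3, 4, 5, 6, 7, 8, 0):
# h1((1, 2, 3, 4, 5, 6, 8, 7, 0), goal) == 2 and h2((1, 2, 3, 4, 5, 6, 8, 7, 0), goal) == 2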
Dominance
If h2(n) ≥ h1(n) for all n (both admissible) then h2 dominates h1
h2 is better for search
Typical search costs (average number of nodes expanded):
d=12 IDS = 3,644,035 nodes
A*(h1) = 227 nodes
A*(h2) = 73 nodes
d=24 IDS = too many nodes
A*(h1) = 39,135 nodes
A*(h2) = 1,641 nodes
Relaxed problems
A problem with fewer restrictions on the actions is called a relaxed problem
The cost of an optimal solution to a relaxed problem is an admissible heuristic
for the original problem
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then
h1(n) gives the shortest solution
If the rules are relaxed so that a tile can move to any adjacent square, then
h2(n) gives the shortest solution
Heuristics for Searching
for many tasks, a good heuristic is the key to finding a solution
prune the search space
move towards the goal
relaxed problems
fewer restrictions on the successor function (operators)
its exact solution may be a good heuristic for the original problem
8-Puzzle Heuristics
level of difficulty
around 20 steps for a typical solution
branching factor is about 3
exhaustive search would be 3^20 ≈ 3.5 * 10^9
9!/2 = 181,440 different reachable states
distinct arrangements of 9 squares
candidates for heuristic functions
number of tiles in the wrong position
sum of distances of the tiles from their goal position
city block or Manhattan distance
generation of heuristics
possible from formal specifications
Local Search and Optimization
for some problem classes, it is sufficient to find a solution
the path to the solution is not relevant
memory requirements can be dramatically relaxed by modifying the current
state
all previous states can be discarded
since only information about the current state is kept, such methods are
called local
Local search algorithms
In many optimization problems, the path to the goal is irrelevant; the goal state
itself is the solution
State space = set of "complete" configurations
Find configuration satisfying constraints, e.g., n-queens
In such cases, we can use local search algorithms
Keep a single "current" state, try to improve it
Example: n-queens
Put n queens on an n × n board with no two queens on the same row, column, or
diagonal
Iterative Improvement Search
for some problems, the state description provides all the information required for
a solution
path costs become irrelevant
global maximum or minimum corresponds to the optimal solution
iterative improvement algorithms start with some configuration, and try
modifications to improve the quality
8-queens: number of un-attacked queens
VLSI layout: total wire length
analogy: state space as landscape with hills and valleys
Hill-climbing search
"Like climbing Everest in thick fog with amnesia"
continually moves uphill
increasing value of the evaluation function
gradient descent search is a variation that moves downhill
very simple strategy with low space requirements
stores only the state and its evaluation, no search tree
there are some problems
local maxima
algorithm can't go higher, but is not at a satisfactory solution
plateau
area where the evaluation function is flat
ridges
search may oscillate slowly
general problem: depending on initial state, can get stuck in local maxima
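A minimal Python sketch of steepest-ascent hill climbing, assuming a problem object with random_state(), value(state) (higher is better) and neighbours(state); the interface is illustrative.

def hill_climbing(problem):
    # Keep only the current state and its value; stop as soon as no neighbour
    # is better (a local maximum or a plateau).
    current = problem.random_state()
    while True:
        neighbours = list(problem.neighbours(current))
        if not neighbours:
            return current
        best = max(neighbours, key=problem.value)
        if problem.value(best) <= problem.value(current):
            return current                  # no uphill move left
        current = best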
Hill-climbing search: 8-queens problem
h = number of pairs of queens that are attacking each other, either directly or
indirectly
h = 17 for the above state
A local minimum with h = 1
Simulated annealing search
Idea: escape local maxima by allowing some "bad" moves but gradually decrease
their frequency
similar to hill-climbing, but some down-hill movement
random move instead of the best move
depends on two parameters
ΔE, the energy difference between moves; T, the temperature
temperature is slowly lowered, making bad moves less likely
analogy to annealing
gradual cooling of a liquid until it freezes
will find the global optimum if the temperature is lowered slowly enough
applied to routing and scheduling problems
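A minimal Python sketch of simulated annealing under the same illustrative interface as the hill-climbing sketch (random_state, value, plus random_neighbour), with schedule(t) giving the temperature at step t:

import math
import random

def simulated_annealing(problem, schedule, max_steps=100_000):
    # Accept every uphill move; accept a downhill move with probability
    # exp(ΔE / T), where T = schedule(t) is slowly lowered over time.
    current = problem.random_state()
    for t in range(1, max_steps + 1):
        T = schedule(t)
        if T <= 0:
            return current                  # frozen: stop
        nxt = problem.random_neighbour(current)
        delta_e = problem.value(nxt) - problem.value(current)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt
    return current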
Properties of simulated annealing search
One can prove: If T decreases slowly enough, then simulated annealing search
will find a global optimum with probability approaching 1
Widely used in VLSI layout, airline scheduling, etc
Beam search
Beam search is a heuristic search algorithm that is an optimization of best-first search
that reduces its memory requirement. Best-first search is a graph search which orders all
partial solutions (states) according to some heuristic which attempts to predict how close
a partial solution is to a complete solution (goal state). In beam search, only a
predetermined number of best partial solutions are kept as candidates.
Beam search uses breadth-first search to build its search tree. At each level of the tree, it
generates all successors of the states at the current level, sorting them in order of
increasing heuristic values. However, it only stores a predetermined number of states at
each level (called the beam width). The smaller the beam width, the more states are
pruned. Therefore, with an infinite beam width, no states are pruned and beam search is
identical to breadth-first search. The beam width bounds the memory required to perform
the search, at the expense of risking completeness (the possibility that it will not find a
solution even when one exists) and optimality (the possibility that it will not find the best
solution). The reason for this risk
is that the goal state could potentially be pruned.
The beam width can either be fixed or variable. In a fixed beam width, a maximum
number of successor states is kept. In a variable beam width, a threshold is set around the
current best state. All states that fall outside this threshold are discarded. Thus, in places
where the best path is obvious, a minimal number of states is searched. In places where
the best path is ambiguous, many paths will be searched.
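A Python sketch of fixed-width beam search under the illustrative interface used earlier (initial_state, is_goal, expand returning successor states), with h the heuristic and beam_width the number of states kept per level:

def beam_search(problem, h, beam_width):
    # Keep only the beam_width best states (lowest h) at each level of a
    # breadth-first expansion; the goal may be pruned, so the search is
    # neither complete nor optimal.
    if problem.is_goal(problem.initial_state):
        return problem.initial_state
    beam = [problem.initial_state]
    while beam:
        successors = []
        for state in beam:
            for succ in problem.expand(state):
                if problem.is_goal(succ):
                    return succ
                successors.append(succ)
        successors.sort(key=h)              # order of increasing heuristic value
        beam = successors[:beam_width]      # prune everything outside the beam
    return None                             # failure (possibly because the goal was pruned)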
Local beam search
variation of beam search
a path-based method that looks at several paths around the current one
Keep track of k states rather than just one
information between the states can be shared
moves to the most promising areas
Start with k randomly generated states
At each iteration, all the successors of all k states are generated
If any one is a goal state, stop; else select the k best successors from the complete
list and repeat.
stochastic local beam search selects the k successor states randomly
with a probability determined by the evaluation function
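A Python sketch of local beam search, assuming the same illustrative interface as the hill-climbing sketch (random_state, value, neighbours) plus is_goal(state):

def local_beam_search(problem, k, max_iters=1000):
    # Keep k states; at each iteration generate all successors of all k states
    # and keep the k best of the combined list (higher value is better).
    states = [problem.random_state() for _ in range(k)]
    for _ in range(max_iters):
        successors = [s2 for s in states for s2 in problem.neighbours(s)]
        if not successors:
            break
        for s in successors:
            if problem.is_goal(s):
                return s
        successors.sort(key=problem.value, reverse=True)
        states = successors[:k]             # the k best successors overall
    return max(states, key=problem.value)   # best state found if no goal was reached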