Artificial Intelligence-437
-Weak AI merely emulates selected functions of the brain, while strong AI actually tries to recreate the
workings of the brain itself.
-The concepts of weak and strong AI can be explained by an example.
Assume there is a very intelligent machine that performs many tasks with apparent intelligence. If
the machine and a cat are both thrown into a pool of water and their reactions observed, the cat will
try to save its life by attempting to swim out of the pool, while the intelligent machine will sink
in the water without any effort to save itself. Hence, the cat possesses strong intelligence while the
machine does not. The machine only knew what it was taught, or in other words only knew what was
programmed into it. It never had the inherent capability of intelligence which would have helped it to
deal with this new situation.
(2) Informed search: In contrast, informed search methods use a heuristic to guide the search for the
problem at hand and are therefore much more efficient than uninformed search. Examples
include uniform cost search, greedy best first search, A* search, hill climbing, simulated
annealing, etc.
(3) Evolutionary or population-based computation: Evolution is used as a metaphor for solving very
difficult problems, whereby fitter solutions propagate to future generations while less fit ones
gradually fade away. Examples of such algorithms include the ant colony algorithm, particle
swarm algorithm, genetic algorithm, etc.
(4) Fuzzy Logic: Fuzzy logic deals with fuzzy sets and logical connectives for modelling the human-like
reasoning problems of the real world. A fuzzy set, unlike conventional sets, includes all
elements of the universal set of the domain but with varying membership values in the
interval [0,1].
(5) Machine learning: A computer or machine is said to learn from experience with respect to some class
of tasks if its performance at such tasks improves with experience. Machine learning is
classified into supervised learning, unsupervised learning and reinforcement learning.
(6) Artificial neural network: An artificial neural network mimics the way in which the brain performs a
particular task or function of interest. The neural computer adapts itself during a training
period, based on examples of similar problems even without a desired solution to each
problem. After sufficient training, the neural network is able to relate the problem data to
the solutions, inputs to outputs, and it is then able to offer viable solutions to brand new
problems.
SEARCH STRATEGIES
- There exist quite a large number of problem-solving techniques in AI that rely on a search procedure.
- In such cases, problem solving first requires representation of the problem by an appropriately organized
state space. Some important terminologies relevant to AI search problems are:
- State: A state represents the status of the solution at a given step of the problem solving procedure.
- Initial State: The Initial State of the problem defines the starting state.
- Operator: This defines the set of possible actions that may be taken at a given time to move from one
state to another.
- Neighbourhood: The set of states that can be moved to from a particular state is often called the
Neighbourhood. Another way it is sometimes represented is by applying a Successor Function, S.
By applying S(x), where x is the current state, all states reachable from the current state are
neighbour states.
- State Space: The initial state and the operators define the State Space of a problem. An operator is
repeatedly applied to the current state to cause a transition to the next state (i.e., neighbourhood)
until the goal (desired) state is reached.
- Goal State and Goal test: The state space is expanded in steps and the desired state, called “the goal
state”, is searched after each incremental expansion of the state space. It is necessary to be able to
know when a problem has been solved. A Goal Test, when applied to the current state, reveals
whether the current state is a goal state.
- Path Cost Function: For some problems, the target is not only to find a solution; the "cost" of the
solution found is of equal importance. The function that computes this cost is termed the Path Cost Function.
Time complexity: Indicates how long it takes to find a solution
Space complexity: Indicates how much memory is needed to perform a search
Search Trees and Graphs
- State space problems are often represented in form of trees or graphs that comprise the following:
(i) Node: represents the set of existing states in the search space
(ii) Arc: denotes an operator applied to an existing state to cause transition to another state;
(iii) Goal state: denotes the desired state to be identified in the nodes; and
(iv) Current state: represents the present state reached while searching for the goal state.
At each stage of the search process, the following are usually of interest
Depth: This is the number of nodes from the root node
Path-Cost: This is the total cost from the initial state to the current node
-Consider a simple search problem in physical space as illustrated in Fig. 1(a). Assume the initial state is
'A', from which there are three possible actions that lead to position 'B', 'C' or 'D'. An action in this case
(also called an operator) is simply a legal move between one place and another.
-This search space can be reduced to a tree structure as illustrated in Fig. 1(b). Each node in the tree is a
physical location and the arcs between nodes are the legal moves. The depth of the tree is the distance
from the initial position ‘A’. Some important terminologies associated with this tree are as follows:
Graphs and trees may be undirected (where arcs do not imply a direction) or they may be directed (where a
direction is implicit in the arc). Figs. 2(a) and (b) illustrate directed and undirected graphs, respectively.
Fig. 2 (a) An example of a directed graph containing seven nodes and nine arcs.
Fig. 2 (b) An example of an undirected graph containing seven nodes and nine arcs.
(Both graphs are drawn over the nodes A to G.)
The adjacency matrices for the directed and undirected graphs in Fig. 2 (a) and (b) are shown in Figs. 3(a)
and (b), respectively.
Time is often measured in terms of the number of nodes generated during the search, and space in terms of
the maximum number of nodes stored in memory.
Figs. 3 (a) and (b) The adjacency matrices of the directed and undirected graphs of Fig. 2, drawn as
7 x 7 grids over the nodes A to G, with rows labelled "From" and columns labelled "To"; a 1 in row i,
column j marks an arc from node i to node j. The matrix of the undirected graph is symmetric.
Implementing a Search
First of all, the information to be stored at each node of the tree needs to be decided upon. It is possible to
define a data structure for node components as follows:
Datatype: node
Components: STATE, PARENT_NODE, OPERATOR, DEPTH, PATH_COST
If the tree is implemented using a queue, the next node to be expanded may be taken from the front (or
from the rear) of the queue. The queue will have the common queue functions, such as making an empty
queue, testing whether the queue is empty, removing the node at the front, and a queuing function that
inserts new nodes. There will be extensive use of the queuing function in the search strategies discussed
subsequently. It is assumed that the queuing function always adds new nodes to the rear of the queue.
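As a minimal Python sketch of these ideas (the node components above, plus a general search loop driven by a queuing function; all names here are illustrative, not from any particular library):

from collections import deque

# A node bundles the components listed above:
# STATE, PARENT_NODE, OPERATOR, DEPTH and PATH_COST.
class Node:
    def __init__(self, state, parent=None, operator=None, depth=0, path_cost=0):
        self.state = state
        self.parent = parent
        self.operator = operator
        self.depth = depth
        self.path_cost = path_cost

def expand(node, successors):
    """Apply every legal operator to node's state, yielding child nodes."""
    return [Node(s, node, op, node.depth + 1, node.path_cost + cost)
            for (op, s, cost) in successors(node.state)]

def enqueue_at_rear(queue, new_nodes):
    """FIFO queuing function: always add new nodes to the rear."""
    queue.extend(new_nodes)
    return queue

def general_search(initial_state, goal_test, successors, queuing_fn):
    """Generic search loop; the choice of queuing function fixes the strategy."""
    queue = deque([Node(initial_state)])
    while queue:
        node = queue.popleft()           # take the node at the front
        if goal_test(node.state):
            return node                  # success
        queue = queuing_fn(queue, expand(node, successors))
    return None                          # failure: the queue is empty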
Depth-first Search
-The depth-first search generates nodes and compares them with the goal along the largest depth of the tree
and moves up to the parent of the last visited node, only when no further node can be generated below the
last visited node.
-After moving up to the parent, the algorithm attempts to generate a new offspring of the parent node.
-The above principle is employed recursively to each node of a tree in a depth-first search.
-One simple way to realize the recursion in the depth-first search algorithm is to employ a stack. The
procedure for the stack implementation, LIFO (Last In First Out), is presented as follows:
Begin
1. Push the starting node into the stack;
2. While the stack is not empty do
Begin
Pop the node at the top of the stack;
If it is the goal node, stop with success;
Otherwise, push its children onto the stack;
End
End
The above algorithm can be implemented using the general search function described earlier.
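A minimal Python sketch of the stack-based procedure, assuming the tree is supplied as an adjacency dictionary that lists each node's children from left to right (the dictionary encoding and function names are illustrative):

def dfs(tree, start, goal):
    """Stack-based (LIFO) depth-first search; returns the visit order."""
    stack = [start]
    visited = []
    while stack:
        node = stack.pop()               # pop the top of the stack
        visited.append(node)
        if node == goal:
            return visited               # goal found: stop with success
        # push children in reverse so the leftmost child ends up on top
        stack.extend(reversed(tree.get(node, [])))
    return None                          # goal not present in the tree

# The tree of Fig. 4(a) in Example 1 below; V9 is the goal
tree = {'V1': ['V2', 'V4', 'V3'], 'V4': ['V7', 'V6'], 'V3': ['V5', 'V10'],
        'V6': ['V8', 'V9'], 'V10': ['V11', 'V12']}
print(dfs(tree, 'V1', 'V9'))   # ['V1', 'V2', 'V4', 'V7', 'V6', 'V8', 'V9']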
Example 1
Assume that the goal node in the tree illustrated in Fig. 4(a) is V9. By employing a stack, apply DFS
algorithm to the tree with the assumption that left nodes are searched first and the search process terminates
as soon as the goal item has been found.
(a) Illustrate the content of the stack for each step of the DFS algorithm
(b) Provide a trace of the stack process by completing Table 1(a)
(c) Indicate on the tree which part of it is explored by the DFS algorithm
Fig. 4 (a) A directed tree with goal node V9. The root V1 is at depth 0; its children, from left to
right, are V2, V4 and V3 (depth 1); V4's children are V7 and V6, and V3's children are V5 and V10
(depth 2); V6's children are V8 and V9, and V10's children are V11 and V12 (depth 3).
Table 1(a) Trace of the stack process (to be completed): one row per node V1 to V12, with columns for
the insert process number and the delete process number.
Solution
(a) The status of the stack is illustrated in Fig. 4(b). Listing each snapshot from the top of the stack
downwards, the search proceeds as follows:
[V1] -> [V2, V4, V3] -> [V4, V3] -> [V7, V6, V3] -> [V6, V3] -> [V8, V9, V3] -> [V9, V3] -> search ends
(V9, the goal, is at the top of the stack)
Fig. 4(b) Status of stack when the DFS algorithm is executed on Fig. 4(a)
(c) The part of the tree explored by DFS algorithm is indicated in Fig 4(c)
Breadth-first Search
-The breadth-first search (BFS) algorithm visits the nodes of the tree along its breadth, starting from the
level with depth 0 to the maximum depth.
-It can be easily realized with a queue.
-The queue implementation, FIFO (First In First Out), is presented as follows:
Begin
1. Place the starting node in a queue;
2. REPEAT
Delete the node at the front of the queue;
If it is the goal node, stop with success;
Otherwise, add its children to the rear of the queue;
UNTIL the queue is empty
End
Breadth first search can be implemented by using a queuing function that adds expanded nodes to the rear
of a queue as presented in the following functions:
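A minimal Python sketch, under the same adjacency-dictionary assumption as the depth-first sketch, with the stack swapped for a FIFO queue:

from collections import deque

def bfs(tree, start, goal):
    """Queue-based (FIFO) breadth-first search; returns the visit order."""
    queue = deque([start])
    visited = []
    while queue:
        node = queue.popleft()           # delete the node at the front
        visited.append(node)
        if node == goal:
            return visited               # goal found: stop with success
        queue.extend(tree.get(node, [])) # add children to the rear
    return None

# The tree of Fig. 5(a) in Example 2 below; V9 is the goal
tree = {'V1': ['V2', 'V4', 'V3'], 'V4': ['V7', 'V6'], 'V3': ['V5', 'V10'],
        'V6': ['V8', 'V9'], 'V10': ['V11', 'V12']}
print(bfs(tree, 'V1', 'V9'))
# ['V1', 'V2', 'V4', 'V3', 'V7', 'V6', 'V5', 'V10', 'V8', 'V9']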
Example 2
Assume that the goal node in the tree illustrated in Fig. 5(a) is V9. By employing a queue, apply BFS
algorithm to the tree with the assumption that left nodes are searched first and the search process terminates
as soon as the goal item has been found.
(a) Illustrate the content of the queue for each step of the BFS algorithm
(b) Provide a trace of the queue process by completing Table 2(a)
(c) Indicate on the tree which part of it is explored by the BFS algorithm
Fig. 5 (a) A directed tree with goal node V9; it is the same tree as Fig. 4(a): root V1 (depth 0);
children V2, V4 and V3, from left to right (depth 1); V4's children V7 and V6 and V3's children V5 and
V10 (depth 2); V6's children V8 and V9 and V10's children V11 and V12 (depth 3).
Table 2(a) Trace of the queue process (to be completed): one row per node V1 to V12, with columns for
the insert process number and the delete process number.
Solution
(a) The status of the queue is illustrated in Fig. 5(b). Listing each snapshot from the front of the
queue to the rear, the search proceeds as follows:
[V1] -> [V2, V4, V3] -> [V4, V3] -> [V3, V7, V6] -> [V7, V6, V5, V10] -> [V6, V5, V10] ->
[V5, V10, V8, V9] -> [V10, V8, V9] -> [V8, V9, V11, V12] -> [V9, V11, V12] -> search ends
(V9, the goal, is at the front of the queue)
Fig. 5(b) Status of queue when the BFS algorithm is executed on Fig. 5(a)
(b) A trace of the queuing process is shown in Table 2(b)
Table 2(b) Trace of the queue process

Node   Insert process number   Delete process number
V1     1                       2
V2     3                       6
V3     5                       10
V4     4                       7
V5     11                      17
V6     9                       14
V7     8                       13
V8     15                      21
V9     16                      22
V10    12                      18
V11    19                      -
V12    20                      -

Fig. 5 (c) A directed tree with the part explored by the BFS algorithm marked on it
(c) The part of the tree explored by the BFS algorithm is indicated in Fig 5(c)
Depth-Limited Search
-The problem with depth-first search is that the search can go down an infinite branch and thus never
return.
-Depth-limited search avoids this problem by imposing a depth limit which effectively terminates the
search at that depth.
-The algorithm can be implemented using the general search algorithm by using operators to keep track of
the depth.
-The choice of the depth parameter can be an important factor. Choosing a parameter that is too deep is
wasteful in terms of both time and space. But choosing a depth parameter that is too shallow may result in
never reaching a goal state.
-As long as the depth parameter, l, is set “deep enough” then it is guaranteed that a solution will be found,
if it exists.
-Therefore it is complete as long as l >= d (where d is the depth of the shallowest solution). If this
condition is not met, then depth-limited search is not complete.
-The space requirement for depth-limited search is similar to that of depth-first search, i.e., O(bl).
-The time complexity is O(b^l)
Iterative Deepening Search
-Iterative deepening search combines breadth-first search and depth-first search: it performs a
depth-limited search with limit l = 0, then l = 1, and so on, until a goal is found.
-It may appear wasteful as it is expanding nodes multiple times. But the overhead is small in comparison to
the growth of an exponential search tree.
-To show this is the case, consider this.
-For a depth-limited search to depth d with branching factor b, the number of expansions is
1 + b + b^2 + ... + b^(d-2) + b^(d-1) + b^d
-If we apply some real numbers to this (say b = 10 and d = 5), we get
1 + 10 + 100 + 1,000 + 10,000 + 100,000 = 111,111 expansions.
-For an iterative deepening search the nodes at the bottom level, d, are expanded once, the nodes at d-1 are
expanded twice, those at d-2 are expanded three times, and so on back to the root.
The formula for this is
(d+1)·1 + d·b + (d-1)·b^2 + ... + 3b^(d-2) + 2b^(d-1) + 1·b^d
which for b = 10 and d = 5 gives 123,456 expansions.
-It can be seen that, compared to the overall number of expansions, the total is not substantially increased.
-In fact, when b = 10 only about 11% more nodes are expanded than for a breadth-first or depth-limited
search down to level d.
-Even when the branching factor is 2, iterative deepening search only takes about twice as long as a
complete breadth first search.
-The time complexity for iterative deepening search is O(b^d), with the space complexity being O(bd).
-For large search spaces where the depth of the solution is not known iterative deepening search is
normally the preferred search method.
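A minimal Python sketch of iterative deepening, again assuming an adjacency-dictionary tree; depth_limited is the recursive depth-limited search that it repeats with a growing limit:

def depth_limited(tree, node, goal, limit, path):
    """Recursive depth-limited search down to the given depth limit."""
    path.append(node)
    if node == goal:
        return path
    if limit > 0:
        for child in tree.get(node, []):
            result = depth_limited(tree, child, goal, limit - 1, path)
            if result is not None:
                return result
    path.pop()                           # dead end: backtrack
    return None

def iterative_deepening(tree, start, goal, max_depth=50):
    """Repeat depth-limited search with limits l = 0, 1, 2, ... until success."""
    for limit in range(max_depth + 1):
        result = depth_limited(tree, start, goal, limit, [])
        if result is not None:
            return result, limit         # solution path and its depth
    return None, None

tree = {'V1': ['V2', 'V4', 'V3'], 'V4': ['V7', 'V6'], 'V3': ['V5', 'V10'],
        'V6': ['V8', 'V9'], 'V10': ['V11', 'V12']}
print(iterative_deepening(tree, 'V1', 'V9'))   # (['V1', 'V4', 'V6', 'V9'], 3)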
There are three methods that can be used to stop generating repeated nodes.
1. Do not generate a node that is the same as the parent node. Or, to put it another way, do not return to
the state you have just come from.
2. Do not create paths with cycles in them. To do this we can check each ancestor node and refuse to
create a state that is the same as this set of nodes.
3. Do not generate any state that is the same as any state generated before. This requires that every state is
kept in memory (meaning a potential space complexity of O(b^d)).
Table 3 Summary of some blind search methods

Algorithm                          Time   Space   Optimal?   Complete?        Derivative
Depth First Search (DFS)           b^m    bm      No         No               -
Breadth First Search (BFS)         b^d    b^d     Yes        Yes              -
Depth Limited Search (DLS)         b^l    bl      No         Yes, if l >= d   DFS
Iterative Deepening Search (IDS)   b^d    bd      Yes        Yes              DLS
INFORMED SEARCH STRATEGIES
-Informed search methods rank nodes using a heuristic function
h(n) = estimated cost of the cheapest path from the state at node n to a goal state
-To implement an informed search, the nodes are ordered based on the value returned from the heuristic
function.
-We can implement such a search using the general search algorithm called the BEST-FIRST-SEARCH
function which chooses the best node as the node to be expanded first.
-The general best-first function is as follows:
BEST-FIRST-SEARCH(Problem, Eval_Fn)
where
Problem = the search problem
Eval_Fn = an evaluation function
Examples of informed search strategies include uniform cost search, greedy best first search, hill climbing
search, simulated annealing, etc.
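A minimal Python sketch of such a best-first function, using a priority queue ordered by the evaluation function; the (state, g) signature of eval_fn is an assumption made so the same function can serve the examples that follow:

import heapq
from itertools import count

def best_first_search(start, goal_test, successors, eval_fn):
    """Expand the node with the lowest eval_fn value first.
    successors(state) yields (next_state, step_cost) pairs and
    eval_fn(state, g) returns the value used to order the queue."""
    tie = count()                         # tie-breaker for equal evaluations
    frontier = [(eval_fn(start, 0), next(tie), start, 0, [start])]
    while frontier:
        _, _, state, g, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, g                # solution path and its cost
        for nxt, step in successors(state):
            g2 = g + step
            heapq.heappush(frontier,
                           (eval_fn(nxt, g2), next(tie), nxt, g2, path + [nxt]))
    return None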
Uniform Cost Search
-Uniform cost search expands the node with the cheapest path cost, g(n), first.
-Consider Fig. 6(a), a graph in which S is the initial state and G is the goal state, with step costs
S-A = 1, S-B = 5, S-C = 15, A-G = 10 and B-G = 5. Assuming it is desired to find the shortest route from
S to G, it can be seen that SBG (cost 10) is the shortest route.
-But if breadth-first search is applied to the problem, it will find the path SAG, assuming that A is the
first node to be expanded at depth level 1.
-The following illustrates how uniform cost search tackles the problem:
*We start with the initial state, S, and expand it. This leads to the tree in Fig. 6(b), with leaves
A (path cost 1), B (path cost 5) and C (path cost 15).
*The path cost of A is the cheapest, so it is expanded next, giving the tree in Fig. 6(c). We now have a
goal state, G, with path cost 1 + 10 = 11, but the algorithm does not recognize it yet as there is still a
node with a cheaper path cost: the algorithm orders the queue by path cost, so the node with cost 11 is
behind node B (cost 5) in the queue.
*Node B (being the cheapest) is now expanded, giving the tree in Fig. 6(d), in which G is reached with
path cost 5 + 5 = 10.
*A goal state (G) will now be at the front of the queue. Therefore the search will end and the path SBG
will be returned.
-In summary, uniform cost search will find the cheapest solution provided that the cost of a path never
decreases as we proceed along it; otherwise, it would result in carrying out an exhaustive search on the
entire tree.
Greedy Best-First Search
-Greedy best-first search expands the node that appears to be closest to the goal, as judged by the
heuristic h(n) alone. For route finding, a common choice is the straight-line distance (SLD) heuristic,
hSLD: the city closest in straight-line distance to the destination is tried first.
Of course, this heuristic might not always give us the best city to go to. Two possible reasons are
• The city you decide to go to may not have a route to your destination city so that you will have to re-
trace your steps. For example, if you are trying to get to Abakaliki and you are in Kumo and the
heuristic suggests Ayangba as the next city, you would have to re-trace your steps eventually as there is
no way to get to Abakaliki from Ayangba without going through Kumo.
• The route to your final destination may be a meandering road so, although the SLD is shorter, the actual
distance to travel is longer.
-But as a general strategy, it is good to adopt the SLD heuristic since it is likely to find a solution faster
than a blind search.
-Given the Zuru/Abakaliki route finding problem and the hSLD heuristic, greedy best first search would
work as follows (see the map in Fig. 7(a) and the SLDs in Table 4):
Fig. 7(a) A road map of some Nigerian towns (including Gusau, Birnin Kebbi, Zuru, Kaduna, Wawa, Jos,
Abuja, Kumo, Bauchi, Bali, Ayangba and Abakaliki), with the road distances marked on the connecting
roads.

Table 4 Straight line distances from some Nigerian towns to Abakaliki

Town        SLD to Abakaliki
Abakaliki   0
Abuja       193
Ayangba     126
Bauchi      90
Bali        199
...
*We start at Zuru. This has an SLD of 366 to Abakaliki and as it is the only node we have got, we
expand it.
*After expanding we have three new nodes (Birnin Kebbi, Kaduna and Wawa). These have SLDs of
374, 253 and 329 respectively. The SLD heuristic tells us that Kaduna should be expanded next.
*This leads to four new nodes (Gusau, Zuru, Jos and Abuja) with SLDs of 380, 366, 178 and 193.
*Next we choose Jos to expand as this has the lowest heuristic value. This step leads to Abakaliki being
generated which (with a heuristic value of 0) is a goal state.
*The partially drawn search tree based on the greedy best first algorithm is illustrated in Fig. 7(b).
-Using this heuristic led to a very efficient search. We never expanded a node that was not on the solution
path (a minimal search cost).
-But, it is noticed that it is not optimal. If we had gone from Zuru through Kaduna and Abuja to Abakaliki
we would have travelled a shorter distance.
-Therefore, it is true to say that greedy best first search often performs well although it cannot be
guaranteed to give an optimal solution.
-If you start at Kumo and were trying to get to Jos you would initially go to Bauchi as this is closer to Jos
than Bali. But you would eventually need to go back through Kumo and through Bali to get to your
destination.
-This implies that it may be necessary to check for repeated states, otherwise, one will forever move back
and forth between Kumo and Bauchi.
Example 3
Apply Greedy Best First Search algorithm to the map in Fig. 7(a) and work out a route from Benin town to
Abakaliki by using the following cost functions.
H(n) = The Straight Line Distance between any town and Abakaliki.
These distances are given in Table 4.
(i) Provide the search tree for your solution and indicate the order in which you expanded the nodes.
(ii) State the route you would take
Solution
(i) The figure next to each node represents the H(n) function, where
H(n) = The heuristic value (i.e. the straight line distance to the target town)
The Search Tree is as follows
(Search tree figure: the root is Benin with H(n) = 200; the remainder of the tree, with the order of node
expansion, follows from the SLD values.)
A* Search
-Greedy best first search can considerably cut the search time but it is neither optimal nor complete.
-By comparison uniform cost search minimises the cost of the path so far, g(n). Uniform cost search is both
optimal and complete but can be very inefficient.
-A* search combines these two search strategies (i.e., uniform cost search and greedy best-first search) to
get the advantages of both. This is done simply by adding both evaluation functions. That is
f(n) = g(n) + h(n)
-As g(n) gives the path cost from the start node to node n and h(n) gives the estimated cost of the cheapest
path from n to the goal, we have
f(n) = estimated cost of the cheapest solution that passes through n
-The good thing about this strategy is that it is optimal and complete
-The algorithm expands the lowest-cost node on the fringe, wherever that node is in the search tree; i.e.,
the nodes selected for expansion are not restricted to nodes that have just been generated.
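Using the best-first sketch given earlier, uniform cost search and A* differ only in the evaluation function. A hedged usage example on the graph of Fig. 6(a) (the dictionary encoding of the graph and the all-zero heuristic table are assumptions made for illustration):

# Graph of Fig. 6(a): S-A = 1, S-B = 5, S-C = 15, A-G = 10, B-G = 5
graph = {'S': [('A', 1), ('B', 5), ('C', 15)], 'A': [('G', 10)],
         'B': [('G', 5)], 'C': [], 'G': []}
h = {'S': 0, 'A': 0, 'B': 0, 'C': 0, 'G': 0}  # with h = 0, A* reduces to UCS

def successors(state):
    return graph[state]

# Uniform cost search: eval_fn = g(n); A* search: eval_fn = g(n) + h(n)
path, cost = best_first_search('S', lambda s: s == 'G', successors,
                               lambda s, g: g + h[s])
print(path, cost)    # ['S', 'B', 'G'] 10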
Admissible Heuristics
-An admissible heuristic is a heuristic that never overestimates the cost to reach a goal.
-It is obvious that the SLD heuristic is admissible, since the straight line is the shortest possible
distance between any two towns.
Example 4
Apply A* Search algorithm to the map in Fig. 7(a) and work out a route from Benin town to Abakaliki by
using the following cost functions.
G(n) = The cost of each move as the distance between each town (shown on map)
H(n) = The Straight Line Distance between any town and Abakaliki.
These distances are given in Table 4.
(i) Provide the search tree for your solution and indicate the order in which you expanded the nodes.
(ii) State the route you would take and the cost of that route.
(iii) The straight line distance heuristic used above is known to be an admissible heuristic. What does this
mean and why is it important?
Solution
(i) The figures next to each node represent G(n) and H(n) functions, where
G(n) = The cost of each move as the distance between each town (shown on map)
H(n) = The heuristic value (i.e. the straight line distance to a target town as shown in Table 4)
(Search tree figure: the root is Benin with G(n) + H(n) = 0 + 200 = 200; the remainder of the tree follows
from the map distances and SLD values.)
Hill Climbing
-Hill climbing searches for a point in the search space that is better than all the others. "Better" in this
context means that the evaluation is higher. We might also say that the solution is of better quality than
all the others.
-The search can be made to minimise a function as well; and in this case, it searches for a point in the
search space that has a lower evaluation.
-The basic hill climbing procedure is as follows:
1. Pick an initial state and make it the current state.
2. Evaluate all the neighbouring states of the current state.
3. Choose the neighbour with the best quality and move to that state.
4. Repeat steps 2 to 4 until all the neighbouring states are of lower quality.
5. Return the current state as the solution state.
-Note that this algorithm does not maintain a search tree. It only returns a final solution.
-Also, if two neighbours have the same evaluation and they are both the best quality, then the algorithm
will choose between them at random.
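A minimal Python sketch of this loop; the objective f and the one-step integer neighbourhood are illustrative choices, not part of the algorithm:

import random

def hill_climbing(initial, neighbours, evaluate):
    """Move to the best neighbour until no neighbour improves on the
    current state; only the final state is returned, not a search tree."""
    current = initial
    while True:
        best = max(neighbours(current), key=evaluate, default=None)
        if best is None or evaluate(best) <= evaluate(current):
            return current               # every neighbour is of lower quality
        current = best

# Illustrative objective and neighbourhood (assumptions for the demo):
f = lambda x: -x**2 + 3*x - 7            # maximised near x = 1.5
step = lambda x: [x - 1, x + 1]          # integer states: one step left/right
print(hill_climbing(random.randint(-12, 12), step, f))   # ends at x = 1 or 2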
Simulated Annealing
-Annealing is a process in metal working, where the metal is first melted at a high temperature beyond its
melting point and then allowed to cool down slowly, until it returns to the solid form.
-Thus in the physical process of annealing, the hot material gradually loses energy and finally at one point
in time reaches a state of minimum energy.
-In general, the aim of simulated annealing is to find a minimum energy for a system
-The outer loop of the simulated annealing algorithm is quite similar to hill climbing. Instead of picking the
best move, however, simulated annealing picks a random move. If the move results in a lower energy, it is
accepted with a probability of 1; otherwise, it is accepted with a probability less than 1.
-Hence, it may make some downhill moves before finding a good way to move uphill. These downhill
moves enable it to be pulled out of a local maximum.
-ΔE is the change in energy level (i.e., ΔE = Cost_nextstate - Cost_currentstate)
-If ΔE < 0, then the next state is accepted as the current state (i.e., a move to a lower energy state);
otherwise, its acceptance is based on a randomly generated probability P (which may result in a move to
a higher energy state)
-The probability, P, of moving to a higher energy state is
P = e^(-ΔE/T)     (2)
where
T is the temperature, which is periodically reduced
-The simulated annealing algorithm is as follows:
Current = MAKE_NODE(INITIAL_STATE[Problem])
temp = temp_max
Loop do
    If temp <= temp_min
        then return Current
    End if
    Next = a randomly selected successor of Current
    ΔE = VALUE[Next] - VALUE[Current]
    If ΔE < 0
        then Current = Next
    else
        Current = Next only with probability P given as exp(-ΔE/temp)
    End if
    temp = temp - Δtemp      (reduce the temperature according to the cooling schedule)
End Loop
End Function
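A minimal Python sketch of the algorithm, with a geometric cooling schedule standing in for the temperature-reduction step (an assumption; any decreasing schedule works), and an illustrative cost function:

import math
import random

def simulated_annealing(initial, random_successor, cost,
                        temp_max=100.0, temp_min=0.01, alpha=0.95):
    """Minimise cost(); an uphill move (ΔE >= 0) is accepted only with
    probability P = exp(-ΔE / temp), as in Eq. (2)."""
    current = initial
    temp = temp_max
    while temp > temp_min:
        nxt = random_successor(current)
        dE = cost(nxt) - cost(current)
        if dE < 0 or random.random() < math.exp(-dE / temp):
            current = nxt                # accept the move
        temp *= alpha                    # geometric cooling schedule
    return current

# Illustrative run: minimise cost(x) = x^2 with small random steps
print(simulated_annealing(8.0, lambda x: x + random.uniform(-1.0, 1.0),
                          lambda x: x * x))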
Evolutionary or population-based computation
-Evolution has been used as a metaphor for solving very difficult problems.
-Evolution in itself is a mechanism of incremental search, whereby more fit solutions to problems
propagate to future generations, and less fit solutions gradually fade away.
-The algorithms used here are population-based algorithms.
-Each of the algorithms operates on a population of entities, parallelizing the ability to solve a problem.
-Examples of population-based algorithms include, ant colony algorithm, particle swarm algorithm,
genetic algorithm, etc.
Ant Colony Algorithm
-The ant colony algorithm mimics the way real ants find short routes between a food source and their
nest by communicating through pheromone trails. The choice of path taken by an artificial ant is directed
by:
1. The list of nodes it has already visited, which it keeps in memory and may not revisit (see
Figs. 10(a) through (f) below);
2. The visibility of a node, which is the inverse of the distance between two nodes:
η_ij = 1/d_ij     (5)
This static information is used to direct the choice of the ants towards closer nodes, and to
avoid nodes that are too remote;
3. The quantity of pheromone deposited on the edge connecting the two nodes, called intensity of
the trail. This parameter defines the relative attraction of part of the total path and changes with
each passage of an ant. This can be viewed as a global memory of the system, which evolves
through a training process.
-At iteration t, ant k at node i chooses the next node j with probability
p_ij^k(t) = [τ_ij(t)]^α [η_ij]^β / Σ_{l ∈ J_i^k} ([τ_il(t)]^α [η_il]^β)   if j ∈ J_i^k
p_ij^k(t) = 0                                                             if j ∉ J_i^k     (6)
where J_i^k is the set of nodes ant k may still move to from node i, and α and β are two parameters
controlling the relative importance of the trail intensity, τ_ij(t), and the visibility, η_ij.
With α = 0, only the visibility of the node is taken into consideration; the nearest node is thus
selected at each step.
On the contrary, with β = 0, only the trails of pheromone become influential.
-After each iteration, each ant k leaves a certain quantity of pheromone Δτ_ij^k(t) on its entire course, the
amount of which depends on the quality of the solution found:
Δτ_ij^k(t) = Q / L^k(t)   if (i, j) ∈ T^k(t)
Δτ_ij^k(t) = 0            if (i, j) ∉ T^k(t)     (7)
where T^k(t) is the path traversed by ant k during iteration t, L^k(t) is the length of the tour and Q is a
fixed parameter.
-Bad solutions are forgotten through the process of evaporation of the trails of pheromone, which helps
avoid getting trapped in sub-optimal solutions. Hence, the update rule for the trails is given as:
τ_ij(t + 1) = (1 - ρ)·τ_ij(t) + Δτ_ij(t)     (8)
where ρ is the evaporation rate and
Δτ_ij(t) = Σ_{k=1}^{m} Δτ_ij^k(t)     (9)
and m is the number of ants. The initial quantity of pheromone on the paths is a uniform distribution
of a small quantity τ0 ≥ 0.
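A minimal Python sketch of the two core mechanisms, the transition rule of Eq. (6) and the trail update of Eqs. (7) to (9); tau and eta are nested dictionaries keyed by node, and the default parameter values are assumptions (alpha = 1, beta = 5 and Q = 20 mirror the worked example further below; rho = 0.5 is purely illustrative):

import random

def choose_next(i, allowed, tau, eta, alpha=1.0, beta=5.0):
    """Pick the next node j from `allowed` with probability proportional to
    tau[i][j]**alpha * eta[i][j]**beta (Eq. 6)."""
    weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in allowed]
    return random.choices(allowed, weights=weights)[0]

def update_trails(tau, tours, lengths, Q=20.0, rho=0.5):
    """Evaporate every trail by (1 - rho), then let each ant k deposit
    Q / L_k on every edge of its tour (Eqs. 7-9)."""
    for i in tau:
        for j in tau[i]:
            tau[i][j] *= (1.0 - rho)          # evaporation, Eq. (8)
    for tour, length in zip(tours, lengths):
        for a, b in zip(tour, tour[1:]):
            tau[a][b] += Q / length           # deposit, Eq. (7)
            tau[b][a] += Q / length           # edges treated as undirected
    return tau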
Fig. 9(a) Graph illustrating the paths to be taken by ants: nodes E and A are connected to the junction
nodes B and D by outer edges of length d = 1, and B and D are joined by two alternative routes, one
through C (d = 0.5 on each side) and one through H (d = 1 on each side).
-Assume that thirty ants are moving from E to A and another thirty ants are moving from A to E.
-At t=1 there will be thirty ants at B and thirty ants at D. At this point they have a 0.5 probability as to
which way they will turn. We assume that half go one way and half go the other way as indicated in Fig
9(a).
-At t = 2 there will be fifteen ants at D (who have travelled from B, via C) and fifteen ants at B (who have
travelled from D, via C). As illustrated in Fig. 9(b), there will be thirty ants at H (fifteen from D and
fifteen from B). The intensities on the edges will be as follows.
Fig. 9(b) Graph illustrating the paths taken by the ants at time t = 1: at each junction the thirty arriving
ants split evenly, 15 taking the branch through C and 15 the branch through H.
Fig. 9(c) Graph illustrating the paths taken by the ants at time t = 2: the pheromone trail through C is now
more intense, so of the newly introduced ants, 20 take the branch through C while 10 take the branch
through H at each junction.
-If we now introduce another 60 ants into the system (30 from each direction) more ants are likely to
follow the BCD rather than BHD as the pheromone trail is more intense on the BCD route as
illustrated in Fig 9(c).
Figs. 10(a) through (f) illustrate how ants are made to tour from node to node over five nodes A, B, C, D
and E; the bracketed list beside each ant is its memory of the nodes it has visited so far:
Fig. 10(a) The original nodes without ants.
Fig. 10(b) One ant is deployed at each of the starting nodes: Ant11 at A, Ant12 at B, Ant13 at C,
Ant14 at D and Ant15 at E.
Fig. 10(c) Each ant makes a tour to the 1st neighbouring node (e.g., Ant15 reaches A with list [E, A]
and Ant13 reaches B with list [C, B]).
Fig. 10(d) Each ant makes a tour to the 2nd neighbouring node (e.g., Ant14 reaches A with list [D, E, A]).
Fig. 10(e) Each ant makes a tour to the 3rd neighbouring node.
Fig. 10(f) Each ant makes a tour to the 4th neighbouring node.
The total distance covered by each ant at the end of the tour is computed, i.e., L^k = the sum of the edge
lengths along ant k's route.
After each ant has visited all nodes, it is returned to its starting node.
Compute the pheromone deposited by each ant on the route it followed:
Δτ_ij^k = Q / L^k
Compute the total pheromone deposited by all ants that followed an edge:
Δτ_ij(t) = Σ_{k=1}^{m} Δτ_ij^k(t)
Example
Consider four nodes A, B, C and D with edge distances AB = 8, AC = 7, AD = 5, BC = 4, BD = 10 and
CD = 6. Three ants start at nodes A, C and B respectively, with an initial pheromone intensity of 1 on
every edge, α = 1, β = 5 and Q = 20.
(i) Determine the path followed by each ant and the required minimum tour distance;
(ii) state the sequence of node visitation;
(iii) compute the pheromone deposited on each path.
Solution
Ant 1
At node A:
The probability of moving to node B is
p_AB^1(t) = [τ_AB(t)]^α [η_AB]^β / ([τ_AC(t)]^α [η_AC]^β + [τ_AD(t)]^α [η_AD]^β) = (1)^1 (1/8)^5 / ((1)^1 (1/7)^5 + (1)^1 (1/5)^5) = 0.000031/0.00038 = 0.08
The probability of moving to node C is
p_AC^1(t) = (1)^1 (1/7)^5 / ((1)^1 (1/8)^5 + (1)^1 (1/5)^5) = 0.00006/0.00035 = 0.17
The probability of moving to node D is
p_AD^1(t) = (1)^1 (1/5)^5 / ((1)^1 (1/8)^5 + (1)^1 (1/7)^5) = 0.00032/0.00009 = 3.56
Since Path AD has the highest probability, path AD is selected
At node D:
The probability of moving to node C is
p_DC^1(t) = [τ_DC(t)]^α [η_DC]^β / [τ_DB(t)]^α [η_DB]^β = (1)^1 (1/6)^5 / (1)^1 (1/10)^5 = 0.00013/0.00001 = 13
The probability of moving to node B is
p_DB^1(t) = [τ_DB(t)]^α [η_DB]^β / [τ_DC(t)]^α [η_DC]^β = (1)^1 (1/10)^5 / (1)^1 (1/6)^5 = 0.00001/0.00013 = 0.077
Since Path DC has the highest probability, path DC is selected
Ant 2
At node C:
The probability of moving to node A is
p_CA^2(t) = [τ_CA(t)]^α [η_CA]^β / ([τ_CB(t)]^α [η_CB]^β + [τ_CD(t)]^α [η_CD]^β) = (1)^1 (1/7)^5 / ((1)^1 (1/4)^5 + (1)^1 (1/6)^5) = 0.00006/0.00111 = 0.054
The probability of moving to node B is
p_CB^2(t) = (1)^1 (1/4)^5 / ((1)^1 (1/7)^5 + (1)^1 (1/6)^5) = 0.00098/0.00088 = 1.11
The probability of moving to node D is
p_CD^2(t) = (1)^1 (1/6)^5 / ((1)^1 (1/7)^5 + (1)^1 (1/4)^5) = 0.00013/0.00104 = 0.125
Since Path CB has the highest probability, path CB is selected
At node B:
The probability of moving to node A is
p_BA^2(t) = [τ_BA(t)]^α [η_BA]^β / [τ_BD(t)]^α [η_BD]^β = (1)^1 (1/8)^5 / (1)^1 (1/10)^5 = 0.000031/0.00001 = 3.1
The probability of moving to node D is
p_BD^2(t) = (1)^1 (1/10)^5 / (1)^1 (1/8)^5 = 0.00001/0.000031 = 0.32
Since Path BA has the highest probability, path BA is selected
Ant 3
At node B:
The probability of moving to node A is
p_BA^3(t) = (1)^1 (1/8)^5 / ((1)^1 (1/4)^5 + (1)^1 (1/10)^5) = 0.000031/0.00099 = 0.031
The probability of moving to node C is
p_BC^3(t) = (1)^1 (1/4)^5 / ((1)^1 (1/8)^5 + (1)^1 (1/10)^5) = 0.00098/0.000041 = 23.9
The probability of moving to node D is
p_BD^3(t) = (1)^1 (1/10)^5 / ((1)^1 (1/8)^5 + (1)^1 (1/4)^5) = 0.00001/0.001 = 0.01
Since Path BC has the highest probability, path BC is selected
At node C:
The probability of moving to node A is
p_CA^3(t) = (1)^1 (1/7)^5 / (1)^1 (1/6)^5 = 0.00006/0.00013 = 0.46
The probability of moving to node D is
p_CD^3(t) = (1)^1 (1/6)^5 / (1)^1 (1/7)^5 = 0.00013/0.00006 = 2.17
Since Path CD has the highest probability, path CD is selected
(i) The tour lengths are:
Ant 1: L1 = AD + DC + CB + BA = 5 + 6 + 4 + 8 = 23
Ant 2: L2 = CB + BA + AD + DC = 4 + 8 + 5 + 6 = 23
Ant 3: L3 = BC + CD + DA + AB = 4 + 6 + 5 + 8 = 23
Therefore, the required minimum distance is 23
(ii) Since all three distances found are the same (i.e., 23), any of the three tours can be selected as the
required sequence of node visitation. Had different distances been obtained, the shortest among them
would have been selected. Hence, the sequence is
Ant 1: ADCBA
Or
Ant 2: CBADC
Or
Ant 3: BCDAB
(iii)
Pheromone deposited by each ant:
Δτ_ij^k = Q / L^k   if (i, j) ∈ tour k
Δτ_ij^k = 0         otherwise
Ant 1:
The path followed by Ant 1 = ADCBA
Path length = L1 = 23
Δτ_ij^1 = 20/23 = 0.87
Δτ_AD^1 = Δτ_DC^1 = Δτ_CB^1 = Δτ_BA^1 = 0.87
Ant 2:
The path followed by Ant 2 = CBADC
Path length = L2 = 23
Δτ_ij^2 = 20/23 = 0.87
Δτ_CB^2 = Δτ_BA^2 = Δτ_AD^2 = Δτ_DC^2 = 0.87
Ant 3:
The path followed by Ant 3 = BCDAB
Path length = L3 = 23
Δτ_ij^3 = 20/23 = 0.87
Δτ_BC^3 = Δτ_CD^3 = Δτ_DA^3 = Δτ_AB^3 = 0.87
Total pheromone deposited on each path:
Path AD:
τ_AD^total = Δτ_AD^1 + Δτ_AD^2 + Δτ_AD^3 = 0.87 + 0.87 + 0.87 = 2.61
Path AC:
τ_AC^total = Δτ_AC^1 + Δτ_AC^2 + Δτ_AC^3 = 0 + 0 + 0 = 0
Path AB:
τ_AB^total = Δτ_AB^1 + Δτ_AB^2 + Δτ_AB^3 = 0.87 + 0.87 + 0.87 = 2.61
Path BC:
τ_BC^total = Δτ_BC^1 + Δτ_BC^2 + Δτ_BC^3 = 0.87 + 0.87 + 0.87 = 2.61
Path BD:
τ_BD^total = Δτ_BD^1 + Δτ_BD^2 + Δτ_BD^3 = 0 + 0 + 0 = 0
Path CD:
τ_CD^total = Δτ_CD^1 + Δτ_CD^2 + Δτ_CD^3 = 0.87 + 0.87 + 0.87 = 2.61
Pheromone evaporation
The trail intensity on each path is then updated using Eq. (8), τ_ij(t + 1) = (1 - ρ)·τ_ij(t) + Δτ_ij, e.g.:
Path AD: τ_AD(t + 1) = (1 - ρ)·τ_AD(t) + Δτ_AD
Path BD: τ_BD(t + 1) = (1 - ρ)·τ_BD(t) + Δτ_BD
Path CD: τ_CD(t + 1) = (1 - ρ)·τ_CD(t) + Δτ_CD
and similarly for the remaining paths.
Particle Swarm Optimization
-In the particle swarm (PS) algorithm, each single solution is a "bird" in the search space and is called a
"particle".
-Each particle has a fitness value, which is evaluated by the fitness function to be optimized, and a
velocity, which directs its flight.
-The particles fly through the problem space by following the current optimum particle.
-PS algorithm is initialized with a group of random particles (i.e., solutions) and then searches for optima
by updating generations.
-In every iteration t, each particle, i, is updated by the following two "best" values.
*The first one is the best solution (i.e., fitness) the particle has achieved so far. This value is called
Pbest_i^t.
*The second one is the overall best solution (i.e., fitness) among all the particles in the population
that has been achieved so far. This best value is a global best and is called Gbest^t.
-After finding the two best values, the particle updates its velocity and positions with Eqs. (10) and (11),
respectively as follows:
v_i,j^(t+1) = w·v_i,j^t + c1·r1_i,j^t·(Pbest_i^t - x_i,j^t) + c2·r2_i,j^t·(Gbest^t - x_i,j^t)     (10)
              (inertia)   (cognitive component)              (social component)
x_i,j^(t+1) = x_i,j^t + v_i,j^(t+1)     (11)
Page | 27
where
w is the inertia weight;
x_i,j^t and v_i,j^t are the position and velocity of particle i in dimension j at time t;
Pbest_i^t is the personal best position of particle i found from initialization through time t;
Gbest^t is the global best position among all particles found from initialization through time t;
c1 and c2 are positive acceleration constants which are used to balance the contribution of the cognitive and
social components respectively;
r1_i,j^t and r2_i,j^t are random numbers from a uniform U(0,1) distribution at time t.
-Fig 12(b) shows the new positions of all particles after the first iteration i.e., at t = 1
-Fig 12(c) shows the new positions of all particles after the second iteration i.e., at t = 2
-Fig 12(d) shows the new positions of all particles after several iterations i.e. at t >>1
Figs. 12(a) through (d) plot fitness over the (x, y) search space, showing the particle positions at
t = 0, t = 1, t = 2 and t >> 1 respectively; the particles are attracted toward, and converge on, the
global best position.
Consider the global optimum of an n-dimensional function defined by
f(x1, x2, x3, ..., xn) = f(X)
where xi is a search variable, which represents the set of free variables of the given function. The aim is to find a
value x* such that the function f(x*) is either a maximum or a minimum in the search space.
For each particle i in I do
    Initialize x_i, v_i and Pbest_i;
End For
For time t = 1 to t_max
    For each particle i in I do
        Fitness_x = f(x_i,j^t);
        If Fitness_x is better than f(Pbest_i^t)
            Pbest_i^t = x_i,j^t;
        End If
    End For
    Gbest^t = best Pbest_i^t among all particles i in I;
    For each particle i in I do
        Update the velocity v_i,j^(t+1) using Eq. (10);
        Update the position x_i,j^(t+1) using Eq. (11);
    End For
End For
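A minimal Python sketch of this loop, wired up with the parameter choices of the worked example below (the fixed r1 and r2 are an artefact of that example; a general implementation redraws them from U(0,1) at every update):

def pso_maximise(f, x0, t_max=3, w=1.0, c1=1.0, c2=1.0,
                 r1=0.25, r2=0.90, lo=-12.0, hi=12.0):
    """Maximise f using the velocity and position updates of Eqs. (10)-(11)."""
    x = list(x0)
    v = [0.0] * len(x)                  # zero initial particle velocities
    pbest = list(x)                     # personal best positions
    gbest = max(x, key=f)               # global best position
    for _ in range(t_max):
        for i in range(len(x)):
            v[i] = (w * v[i] + c1 * r1 * (pbest[i] - x[i])
                    + c2 * r2 * (gbest - x[i]))
            x[i] = min(max(x[i] + v[i], lo), hi)   # clamp to the boundary
            if f(x[i]) > f(pbest[i]):
                pbest[i] = x[i]
        gbest = max(pbest, key=f)       # update the global best
    return gbest, f(gbest)

f = lambda x: -x**2 + 3*x - 7
print(pso_maximise(f, [-8.1, -3.6, -1.7, 2.8, 5.3]))   # about (1.71, -4.79)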
Example
Using particle swarm optimization, maximize the function
f ( x ) = − x 2 + 3x − 7
s.t:
-12 ≤ x ≤ 12
Show the detailed computations for the first 3 iterations (stopping criteria) and parameters selection should
be done as follows:
-Use 5 particles with the initial positions x1 = -8.1, x2 = -3.6, x3 = -1.7, x4 = 2.8, and x5 = 5.3.
-Use zero initial particle velocity
-Use cognitive random number r1 = 0.25 and social random r2 = 0.90
-Use cognitive acceleration constant c1 = 1 and social acceleration constant c2 = 1
-Use inertia weight factor, w = 1
Solution
The fitness of each particle at its initial position is evaluated as follows:
f(x) = -x^2 + 3x - 7
f_1^0 = -(-8.1)^2 + 3(-8.1) - 7 = -96.91
f_2^0 = -(-3.6)^2 + 3(-3.6) - 7 = -30.76
f_3^0 = -(-1.7)^2 + 3(-1.7) - 7 = -14.99
f_4^0 = -(2.8)^2 + 3(2.8) - 7 = -6.44
f_5^0 = -(5.3)^2 + 3(5.3) - 7 = -19.19
First iteration (t = 0 to t = 1):
Since the initial velocities are zero and Pbest_i^0 = x_i^0 for every particle, the cognitive component
vanishes; the global best is Gbest^0 = 2.8 (particle 4, with the highest fitness -6.44). Hence
v_i^1 = w·v_i^0 + c1·r1·(Pbest_i^0 - x_i^0) + c2·r2·(Gbest^0 - x_i^0) = 0.9·(2.8 - x_i^0)
x_i^1 = x_i^0 + v_i^1
giving x_1^1 = 1.71, x_2^1 = 2.16, x_3^1 = 2.35, x_4^1 = 2.8 and x_5^1 = 3.05.
The fitness values of the new positions are:
f_1^1 = -(1.71)^2 + 3(1.71) - 7 = -4.79
f_2^1 = -(2.16)^2 + 3(2.16) - 7 = -5.19
f_3^1 = -(2.35)^2 + 3(2.35) - 7 = -5.47
f_4^1 = -(2.8)^2 + 3(2.8) - 7 = -6.44
f_5^1 = -(3.05)^2 + 3(3.05) - 7 = -7.15
Second iteration (t = 1 to t = 2):
All particles improved, so Pbest_i^1 = x_i^1, and the new global best is Gbest^1 = 1.71 (particle 1,
with fitness -4.79). Updating the velocities and positions with Eqs. (10) and (11) gives
x_1^2 = 11.52, x_2^2 = 7.52, x_3^2 = 5.82, x_4^2 = 1.82 and x_5^2 = -0.41.
Step 7: Compute the fitness values of the new positions of the particles, x_i^2:
f(x) = -x^2 + 3x - 7
f_1^2 = -(11.52)^2 + 3(11.52) - 7 = -105.15
f_2^2 = -(7.52)^2 + 3(7.52) - 7 = -40.99
f_3^2 = -(5.82)^2 + 3(5.82) - 7 = -23.41
f_4^2 = -(1.82)^2 + 3(1.82) - 7 = -4.85
f_5^2 = -(-0.41)^2 + 3(-0.41) - 7 = -8.40
Step 2: Set the iteration number as t = 2 + 1 = 3 and go to step 3
Step 3: Find the personal best for each particle:
Pbest_i^(t+1) = x_i^(t+1)   if f(x_i^(t+1)) > f(Pbest_i^t)
Pbest_i^(t+1) = Pbest_i^t   otherwise
Steps 4 to 6: Update the velocities and positions with Eqs. (10) and (11), giving
x_1^3 = 10.05, x_2^3 = 6.31, x_3^3 = 5.82, x_4^3 = 0.74 and x_5^3 = 0.28.
Step 7: Compute the fitness values of the new positions of the particles, x_i^3:
f(x) = -x^2 + 3x - 7
f_1^3 = -(10.05)^2 + 3(10.05) - 7 = -77.85
f_2^3 = -(6.31)^2 + 3(6.31) - 7 = -27.89
f_3^3 = -(5.82)^2 + 3(5.82) - 7 = -23.41
f_4^3 = -(0.74)^2 + 3(0.74) - 7 = -5.32
f_5^3 = -(0.28)^2 + 3(0.28) - 7 = -6.23
The stopping criterion has been reached, since 3 iterations have been completed.
Find the personal best for each particle:
Pbest_i^(t+1) = x_i^(t+1)   if f(x_i^(t+1)) > f(Pbest_i^t)
Pbest_i^(t+1) = Pbest_i^t   otherwise
Find the global best:
Gbest = max(Pbest_i^t)
From the computed fitness values, f_1^1 = -4.79 is the overall highest, which corresponds to x = 1.71.
Hence, x = 1.71.
Note that:
During the computation, if a position x is obtained that does not fall within the given boundary, the
closest boundary value should be selected for x.
For example, supposing x_1^3 was obtained to be 15: since 15 does not fall within the given range
-12 ≤ x ≤ 12, x_1^3 = 12 is chosen, 12 being the closest to 15 among all the allowed values of x.
Likewise, supposing x_1^3 was obtained to be -18: since -18 does not fall within the given range
-12 ≤ x ≤ 12, x_1^3 = -12 is chosen, -12 being the closest to -18 among all the allowed values of x.
Genetic Algorithm
Genetic Algorithms (GA) are based on the principles of survival of the fittest; sometimes called natural
selection.
In GA, many potential solutions to a problem are created. Each solution is evaluated to see how good it is.
The best solutions are allowed to breed with each other.
This cycle continues in the hope that better solutions will emerge and each solution is normally called a
chromosome (or an individual). Each chromosome is made up of genes, which are the individual elements
that represent the problem. The collection of chromosomes is called a population.
Firstly, suitable “parents” must be chosen. The choosing of parents is normally done after evaluating
the rating of individuals. In doing this, the fitter individuals are more likely to breed, but the weaker
members of the population also get the opportunity.
After choosing the parents, two offspring (normally) are produced from the two parents. The children
consist of genetic material taken from both parents. How the genetic material is distributed can be done in a
number of ways, which will be discussed shortly.
However, breeding is not done all the time. There is a probability associated with each breeding pair as to
whether they produce children or not. The probability of breeding is usually set to about sixty percent but
other figures are also possible.
Mutation happens with low probability and how the mutation occurs depends on the coding that is being
used. If the problem is being represented by bit strings then mutation is fairly easy to implement. It can
simply look at each bit in the chromosome and decide (with some low probability) if the bit should be
replaced with a randomly produced bit.
Genetic algorithm does not usually have any knowledge about the problem it is trying to solve. The only
part of the GA that has some domain knowledge is the evaluation function. This function is given a
chromosome and passes back an evaluation for the chromosome. It is this evaluation rating that the
breeding mechanism uses in deciding which chromosomes should breed.
The breeding mechanism has no knowledge about the problem. In its simplest form a GA is just
manipulating bit strings.
The GA Algorithm
A Genetic Algorithm can be implemented using the following outline algorithm
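The outline can be sketched in Python as follows; the operator choices (roulette-wheel selection, one-point crossover, bit-flip mutation and delete-all replacement) mirror the worked example at the end of this section, and the parameter defaults are illustrative assumptions:

import random

def genetic_algorithm(evaluate, pop_size=4, n_bits=5, p_cross=1.0,
                      p_mut=0.0, generations=2):
    """Outline GA: roulette-wheel selection, one-point crossover,
    bit-flip mutation and delete-all replacement."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        fits = [evaluate(c) for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = random.choices(pop, weights=fits, k=2)  # roulette wheel
            c1, c2 = p1[:], p2[:]
            if random.random() < p_cross:          # one-point crossover
                cut = random.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):                 # bit-flip mutation
                for k in range(n_bits):
                    if random.random() < p_mut:
                        child[k] = 1 - child[k]
            new_pop += [c1, c2]
        pop = new_pop[:pop_size]                   # delete-all replacement
    return max(pop, key=evaluate)

# Maximise f(x) = x^2 - 3x + 12 over 0 <= x <= 31 with a 5-bit encoding
decode = lambda bits: int(''.join(map(str, bits)), 2)
best = genetic_algorithm(lambda c: decode(c)**2 - 3*decode(c) + 12)
print(decode(best))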
Definition of Terms:
Allele : The possible values that can be taken by a gene are called alleles.
Chromosome : An individual within the population. You may also see the following terms, which
means the same thing; individual, solution, strings and vectors.
Gene : Genes are the basic building blocks which form chromosomes. In the examples in this
handout, we have mainly considered bit strings. Each bit is a gene.
Genotype : The genotype is the encoded representation of the chromosome. The classical representation
is bit strings but many other representations and data structures are possible.
Locus : The position of a variable within a chromosome is called its locus.
Phenotype : The phenotype is the physical expression of the chromosome. For example, a
chromosome might consist of bit strings but it could actually represent integers or real
numbers and those could represent anything.
Population : A set of chromosomes that represents a pool of solutions currently being
considered.
Chromosome Evaluation
It is very important to note that this is the only part of the GA that has any knowledge about the problem
that is to be solved. The rest of the GA modules are simply operating on (typically) bit strings with no
information about the problem.
Population Creation
Some techniques are usually employed in creating the required population. These techniques are as follows
▪ Initialisation Technique
This technique determines how the initial population is created. It is often the case that a random
initialisation is done. In the case of a binary coded chromosome this means that each bit is initialised to a
random zero or one. But there are instances where the population is initialized with some known good
solutions. This might be applicable where, for example, a good solution is known but it is desired to try and
improve on it.
▪ Deletion Technique
This technique determines how the population is deleted at each generation of the GA.
Three common deletion techniques are
Delete-All : This technique deletes all the members of the current population and
replaces them with the same number of chromosomes that have just been
created.
Steady-State : This technique deletes n old members and replaces them with n new
members. The number to delete and replace (n) at any one time is a
parameter to this deletion technique.
Another consideration for this technique is deciding which members to
delete from the current population. Should the worst individuals be
deleted? Should deletion candidates be picked at random? Should parent
chromosomes be the ones to be deleted?
Steady-State-No-Duplicates : This is the same as the steady-state technique but the algorithm checks
that no duplicate chromosomes are added to the population.
▪ Parent Selection Technique
This technique determines how parents are picked from the population for breeding. Two common
selection techniques are roulette-wheel selection and tournament selection.
Roulette-Wheel Selection : Each individual is allocated a slice of a roulette wheel proportional to its
fitness, and the wheel is spun to pick each parent, so that fitter individuals
are more likely, but not certain, to be chosen.
The danger of always using, say, only the best chromosomes is that the population quickly converges to
one of these individuals and an inferior final solution is likely.
Tournament Selection : In effect, potential parents are selected and a tournament is held to decide
which of the individuals will be the parent. There are many ways this can
be achieved and two of them are
1. Select a pair of individuals at random. Generate a random number, R,
between 0 and 1. If R < r, use the first individual as a parent; if R >= r,
then use the second individual as the parent. This is repeated to
select the second parent. The value of r is a parameter to this method.
2. Select two individuals at random. The individual with the highest
evaluation becomes the parent. Repeat to find a second parent.
▪ Fitness Technique
As mentioned above, using the evaluation to choose parents can lead to problems. For example, if one
individual has an evaluation that is higher than all the other members of the population then that
chromosome will get chosen a lot and will dominate the population. Similarly, if the population has almost
identical evaluations then they have an almost equal chance of being selected, which will lead to an almost
random search.
In order to solve this problem, each chromosome is sometimes given two values, an evaluation and a
fitness. The fitness is a normalised evaluation so that parent selection is done more fairly. Some of the
methods for calculating fitness are described below.
Fitness-Is-Evaluation : It is common to simply have the fitness of the chromosome equal to its
evaluation.
Windowing : The windowing evaluation technique takes the lowest evaluation and
assigns each chromosome a fitness equal to the amount it exceeds this
minimum.
Linear Normalization : The chromosomes are sorted by decreasing evaluation value. Then the
chromosomes are assigned a fitness value that starts with a constant value
and decreases linearly. The initial value and the decrement are parameters
to the techniques.
▪ Population Size
This parameter determines how many chromosomes should be in the population at any one time.
▪ Elitism
It is sometimes the case that a good solution previously found during the GA run gets deleted from the
population as the GA progresses. One solution is to “remember” the best solution found so far.
Alternatively a technique called elitism can be used.
This technique ensures that the best members of the population are carried forward from one generation to
the next.
It is usual to supply a parameter to the elitism function that says what percentage of the population should
be carried over from one generation to the next.
Reproduction
Chromosomes are bred through the reproduction process. Typically, two parents are selected from which
two offspring are produced. Reproduction is done mainly by crossover operators, with mutation playing
a lesser, but still important, role.
Operators
The first GA operator to be developed was one-point crossover. The other operators have been added as the
GA field has developed. There have also been operators developed for specific problems.
▪ One-Point Crossover
One-point crossover takes two parents and breeds two children. A crossover point is chosen at random
(in the example below, after the second bit); child 1 takes parent 1's bits before the point and parent 2's
bits after it, while child 2 takes the complementary bits. It works as follows:
Parent 1   1 0 | 1 1 1 0 1
Parent 2   1 1 | 0 0 1 1 0
Child 1    1 0 | 0 0 1 1 0
Child 2    1 1 | 1 1 1 0 1
Two-Point Crossover
Two-point crossover works in a similar way as one point crossover but two crossover points are selected. It
is also possible to have n-point crossover.
Uniform Crossover
For each bit position of the two children we decide, at random, which parent will contribute its bit value to
that child. This can be implemented with a random binary template: where the template has a 1, child 1
takes its bit from parent 1 and child 2 from parent 2; where it has a 0, the roles are swapped. For example:
Parent 1   1 0 1 1 1 0 1
Parent 2   1 1 0 0 1 1 0
Template   0 1 1 0 0 1 0
Child 1    1 0 1 0 1 0 0
Child 2    1 1 0 1 1 1 1
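A minimal Python sketch of template-based uniform crossover; run on the parents above, it reproduces the table's children whenever the random template comes out as 0110010:

import random

def uniform_crossover(p1, p2):
    """Where the template bit is 1, child 1 inherits from parent 1 and
    child 2 from parent 2; where it is 0, the roles are swapped."""
    template = [random.randint(0, 1) for _ in p1]
    c1 = [a if t else b for t, a, b in zip(template, p1, p2)]
    c2 = [b if t else a for t, a, b in zip(template, p1, p2)]
    return c1, c2

p1 = [1, 0, 1, 1, 1, 0, 1]
p2 = [1, 1, 0, 0, 1, 1, 0]
print(uniform_crossover(p1, p2))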
Mutation
If we use a crossover operator such as one-point crossover we may get better and better chromosomes but
the problem is, if the two parents (or worse – the entire population) have the same value in a certain
position then one-point crossover will not change that. In other words, that bit will have the same value
forever. Mutation is designed to overcome this problem and add some diversity to the population.
The most common way that mutation is implemented is to select a bit at random and randomly change it to
a zero or a one. Other mutation operators may swap parts of the gene or may develop problem specific
mutation operators.
Mutation Rate
This is a parameter to the GA algorithm. It defines how often mutation should be applied. A typical value
is 0.008. Therefore, when presented with a chromosome the mutation sweeps down the bit string and each
bit has one chance in 8000 of being mutated.
Crossover Rate
The crossover rate defines how often crossover should be applied. A typical value if 0.6. This means that
when presented with two parents there is a 60% chance that the parents will breed.
Example
Given the objective function
f(x) = x2 - 3x+ 12
s.t:
0 ≤ x≤ 31
Assuming the initial population is as provided in Table 6, apply genetic algorithm and determine the value
of x that maximizes the given expression
Hint:
Stopping criteria = stop at the end of the 2nd generation
Mutation probability = 0.0
Cross-over probability = 1.0
Use single-point cross-over with cross-over point of 3
Use Roulette wheel selection technique
Use Delete-All replacement technique
Table 6

Parent     String
Parent 1   10010
Parent 2   01111
Parent 3   01010
Parent 4   00111
Solution
First Generation

Parent     String   x    f(x) = x^2 - 3x + 12
Parent 1   10010    18   282
Parent 2   01111    15   192
Parent 3   01010    10   82
Parent 4   00111    07   40
                    Total     596
                    Average   149

Average = (Σ_{i=1}^{n} f(x_i)) / n = 596/4 = 149
For parent P1, Percentage probability = f(x_1) / Σ_{i=1}^{n} f(x_i) = 282/596 = 0.47
For parent P2, Percentage probability = f(x_2) / Σ_{i=1}^{n} f(x_i) = 192/596 = 0.32
For parent P3, Percentage probability = f(x_3) / Σ_{i=1}^{n} f(x_i) = 82/596 = 0.14
For parent P4, Percentage probability = f(x_4) / Σ_{i=1}^{n} f(x_i) = 40/596 = 0.07
For parent P1, Expected Count = f(x_1) / Avg f(x_i) = 282/149 = 1.89 ≈ 2
For parent P2, Expected Count = f(x_2) / Avg f(x_i) = 192/149 = 1.29 ≈ 1
For parent P3, Expected Count = f(x_3) / Avg f(x_i) = 82/149 = 0.55 ≈ 1
For parent P4, Expected Count = f(x_4) / Avg f(x_i) = 40/149 = 0.27 ≈ 0
The mating pool is therefore P1, P1, P2 and P3. Pairing (P1, P2) and (P1, P3) and applying single-point
crossover at point 3 gives the offspring 10011 and 01110, and 10010 and 01010, respectively, which
(with Delete-All replacement) form the second generation.

Second Generation

Parent     String   x    f(x) = x^2 - 3x + 12
Parent 1   10011    19   316
Parent 2   01110    14   166
Parent 3   10010    18   282
Parent 4   01010    10   82
                    Total     846
                    Average   211.5

Σ_{i=1}^{n} f(x_i) = 316 + 166 + 282 + 82 = 846

Average = (Σ_{i=1}^{n} f(x_i)) / n = 846/4 = 211.5
For parent P1, Percentage probability = 316/846 = 0.37
For parent P2, Percentage probability = 166/846 = 0.20
For parent P3, Percentage probability = 282/846 = 0.33
For parent P4, Percentage probability = 82/846 = 0.10
For parent P1, Expected Count = 316/211.5 = 1.5 ≈ 2
For parent P2, Expected Count = 166/211.5 = 0.78 ≈ 1
For parent P3, Expected Count = 282/211.5 = 1.3 ≈ 1
For parent P4, Expected Count = 82/211.5 = 0.40 ≈ 0
The mating pool is therefore P1, P1, P2 and P3 of the second generation (10011, 10011, 01110 and
10010); single-point crossover at point 3 gives the new strings 10010, 01111, 10010 and 10011.
At the end of the second generation, the values of x and the corresponding values of the objective
function are:
For P1 = 10010_2 = 18_10, f(x) = 282
For P2 = 01111_2 = 15_10, f(x) = 192
For P3 = 10010_2 = 18_10, f(x) = 282
For P4 = 10011_2 = 19_10, f(x) = 316
max(10010_2, 01111_2, 10010_2, 10011_2) ≡ max(f(18), f(15), f(18), f(19)) = max(282, 192, 282, 316) = 316
Therefore, the value of x that maximizes the objective function at the end of the second generation is
x = 19