
ARTIFICIAL INTELLIGENCE

-The phrase “artificial intelligence” was coined by John McCarthy in 1956


-Artificial intelligence (AI) is a field of study that encompasses computational techniques for performing
tasks that mimic human capabilities such as perception, learning, reasoning, self-correction, and problem
solving.
-AI has inherited its ideas, concepts and techniques from many disciplines including philosophy,
mathematics, psychology, linguistics, biology etc.

Types of Artificial Intelligence


Artificial intelligence can be classified into two categories, namely: (i) Strong AI, and (ii) Weak AI
(i) Strong AI: The field of strong AI concerns itself with trying to create systems that mimic human
thought processes and behaviours. A strong AI figures out its own models based on raw
input.
(ii) Weak AI: The field of weak AI concerns itself with trying to create systems that can be made to act as
if they are intelligent. A weak AI uses models given to it by programmers.

-Weak AI just emulates selected functions of the brain, while strong AI actually tries to recreate the
inner workings of the brain itself.
-The concepts of weak and strong AIs can be explained by an example.
Assuming that there is a very intelligent machine that does a lot of tasks with a lot of intelligence. If
the machine and a cat are both thrown into a pool of water and their reactions observed, the cat will
try to save its life by attempting to swim out of the pool, while the intelligent machine would die out
in the water without any effort to save itself. Hence, the cat possesses strong intelligence while the
machine does not. The machine only knew what it was taught, or in other words, only knew what was
programmed into it. It never had the inherent capability of intelligence which would have helped it to
deal with this new situation.

Testing for Intelligence (Turing Test)


-The Turing Test, proposed by Alan Turing (1950), was designed to provide a satisfactory operational
definition of intelligence.
- In the Turing test, if the machine can fool a human into thinking that it is also human, then it has
passed the intelligence test.

Artificial Intelligence Techniques


Some techniques that are commonly used by AI systems in problem solving include uninformed search,
informed search, machine learning, fuzzy logic, evolutionary algorithm, and artificial neural networks.
(1) Uninformed search: uninformed search, also called blind search or naive search, is a class of general
purpose search algorithms that operate in a brute-force way and are considered
inefficient. Examples include depth-first search, breadth-first search, depth-limited
search, iterative-deepening search, etc.

(2) Informed search: In contrast, informed search methods use a heuristic to guide the search for the
problem at hand and are therefore much more efficient than uninformed search. Examples
include uniform cost search, greedy best first search, A* search, hill climbing, simulated
annealing, etc.

(3) Evolutionary or population-based computation: Evolutionary computation mimics the fundamental
concepts of life using biological metaphors. The algorithms have some amount of
biological plausibility, and are based on evolution or the simulation of natural systems.
Examples of such algorithms include ant colony algorithm, particle swarm algorithm,
genetic algorithm, etc.

(4) Fuzzy Logic: Fuzzy logic deals with fuzzy sets and logical connectives for modelling the human-like
reasoning problems of the real world. A fuzzy set, unlike conventional sets, includes all
elements of the universal set of the domain but with varying membership values in the
interval [0,1].

(5) Machine learning: A computer or machine is said to learn from experience with respect to some class
of tasks if its performance at such tasks improves with experience. Machine learning is
classified into supervised learning, unsupervised learning and reinforcement learning.

-Supervised learning: In supervised learning, a teacher is present to identify when a result is
right or wrong.
-Unsupervised learning: In unsupervised learning, learning is achieved without a teacher. In
this case, there is no target variable, but instead, relationships in the data are exploited for
classification.
-Reinforcement learning: In reinforcement learning, an intelligent agent receives input from
the environment, takes action and receives a corresponding reward or punishment for the
action taken. The main goal is to discover which action yields the maximum reward.

(6) Artificial neural network: Artificial neural network mimics the way in which the brain performs a
particular task or function of interest. The neural computer adapts itself during a training
period, based on examples of similar problems even without a desired solution to each
problem. After sufficient training, the neural network is able to relate the problem data to
the solutions, inputs to outputs, and it is then able to offer viable solutions to brand new
problems.

SEARCH STRATEGIES
- There exist quite a large number of problem solving techniques in AI that rely on a search procedure.
- In such cases, problem solving first requires representation of the problem by an appropriately organized
state space. Some important terminologies relevant to AI search problems are:
- State: A state represents the status of the solution at a given step of the problem solving procedure.
- Initial State: The Initial State of the problem defines the starting state.
- Operator: This defines the set of possible actions that may be taken at a given time to move from one
state to another.
- Neighbourhood: The set of states that can be moved to from a particular state is often called the
Neighbourhood. Another way it is sometimes represented is by applying a Successor Function, S.
By applying S(x), where x is the current state, all states reachable from the current state are
neighbour states.
- State Space: The initial state and the operators define the State Space of a problem. An operator is
repeatedly applied to the current state to cause a transition to the next state (i.e., neighbourhood)
until the goal (desired) state is reached.
- Goal State and Goal test: The state space is expanded in steps and the desired state, called “the goal
state”, is searched after each incremental expansion of the state space. It is necessary to be able to
know when a problem has been solved. A Goal Test, when applied to the current state, reveals
whether the current state is a goal state.
- Path Cost Function: For some problems, the target might not be only to find a solution but also the
"cost" of the found solution is of equal importance. This cost is termed the Path Cost Function.

Measuring problem-solving performance


Some criteria usually used to measure the performance of a search algorithm include:
Completeness: Indicates whether an algorithm is guaranteed to find a solution when there is one
Optimality: Indicates whether an algorithm finds the best solution when there are several solutions
Time complexity: Indicates how long it takes to find a solution
Space complexity: Indicates how much memory is needed to perform a search
Search Trees and Graphs
- State space problems are often represented in form of trees or graphs that comprise the following:
(i) Node: represents the set of existing states in the search space
(ii) Arc: denotes an operator applied to an existing state to cause transition to another state;
(iii) Goal state: denotes the desired state to be identified in the nodes; and
(iv) Current state: represents the present state reached while searching for the goal state.

At each stage of the search process, the following are usually of interest
Depth: This is the number of nodes from the root node
Path-Cost: This is the total cost from the initial state to the current node

Trees and graphs


-A graph is a finite set of vertices (or nodes) that are connected by edges (or arcs). A loop (or cycle) may
exist in a graph, where an arc (or edge) may lead back to the original node.

-A specially connected graph, called a tree, does not contain cycles.

-Consider a simple search problem in physical space illustrated in Fig. 1 (a). Assuming the initial state is
‘A’ from which there are three possible actions that lead to position ‘B,’ ‘C,’ or ‘D.’ Action in this case
(also called an operator) is simply a legal move between one place and another.

-This search space can be reduced to a tree structure as illustrated in Fig. 1(b). Each node in the tree is a
physical location and the arcs between nodes are the legal moves. The depth of the tree is the distance
from the initial position ‘A’. Some important terminologies associated with this tree are as follows:

“A” is the “root node”
“A, B, C, …, J” are “nodes”
“B” is a “child” of “A”
“A” is the “parent” of “B”
“A” is an “ancestor” of “E”
“E” is a “descendant” of “A”
“E, F, G, I, J” are “leaf nodes”

[Fig. 1(a): A search problem represented as a physical space, with states A, B, C, ..., J and operators (legal moves) between them. Fig. 1(b): Representing the physical space problem in Fig. 1(a) as a tree]

Graphs and trees may be undirected (where arcs do not imply a direction) or they may be directed (where a
direction is implicit in the arc). Figs. 2(a) and (b) illustrate directed and undirected graphs, respectively.
[Fig. 2(a): An example of a directed graph containing seven nodes and nine arcs. Fig. 2(b): An example of an undirected graph containing seven nodes and nine arcs]
The adjacency matrices for the directed and undirected graphs in Fig. 2 (a) and (b) are shown in Figs. 3(a)
and (b), respectively.

Complexity in graphs is expressed in terms of three quantities, namely:


Branching factor (b): The maximum number of successors of any node;
Depth (d): The depth of the shallowest goal node;
Maximum depth (m): The maximum depth of the search tree.

Time is often measured in terms of the number of nodes generated during the search, and space in terms of
the maximum number of nodes stored in memory.

[Fig. 3(a): Adjacency matrix (From/To, with a 1 marking each arc) for the directed graph shown in Fig. 2(a). Fig. 3(b): Adjacency matrix for the undirected graph shown in Fig. 2(b); being undirected, this matrix is symmetric]

Implementing a Search
First of all, the information to be stored at each node of the tree needs to be decided upon. It is possible to
define a data structure for node components as follows:
Datatype: node
Components: STATE, PARENT_NODE, OPERATOR, DEPTH, PATH_COST
If the tree is implemented using a queue, the next node to be expanded may be taken from the front (or
from the rear) of the queue. The queue will have common queue functions; such as

Make_Queue(Elements): Create a queue with the given elements


Empty?(Queue): Returns true if the queue is empty
Remove_Front(Queue): Removes the element at the front of the queue and returns it
Queuing_FN(Elements, Queue): Inserts a set of elements into the queue. Different queuing
functions produce different search algorithms.

Based on the above, a function that performs a search is presented as follows:

Function GENERAL_SEARCH(problem, QUEUEING_FN) returns a solution or failure


nodes = MAKE_QUEUE(MAKE_NODE(INITIAL_STATE[problem]))
Loop do
If nodes is empty then return failure
node = REMOVE_FRONT(nodes)
If GOAL_TEST[problem] applied to STATE(node) succeeds then return node
nodes = QUEUEING_FN(nodes,EXPAND(node,OPERATORS[problem]))
End
End Function

There will be extensive use of the above function in subsequent search strategies that will be discussed. It
is assumed that the queuing function always adds new nodes to the rear of the queue.
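To make the node datatype and GENERAL_SEARCH above concrete, the following is a minimal Python sketch. The Node class and the problem representation (a dictionary holding an initial state, a goal test and a successor function returning (operator, state, cost) triples) are illustrative assumptions, not part of the original pseudocode.

    from collections import deque

    class Node:
        """A search tree node with the components listed above."""
        def __init__(self, state, parent=None, operator=None, depth=0, path_cost=0):
            self.state = state          # STATE
            self.parent = parent        # PARENT_NODE
            self.operator = operator    # OPERATOR that generated this node
            self.depth = depth          # DEPTH
            self.path_cost = path_cost  # PATH_COST

    def expand(node, successors):
        """Apply the successor function to a node to generate its children."""
        return [Node(s, node, op, node.depth + 1, node.path_cost + cost)
                for (op, s, cost) in successors(node.state)]

    def general_search(problem, queuing_fn):
        """GENERAL_SEARCH: returns a goal node, or None for failure."""
        nodes = deque([Node(problem['initial_state'])])
        while nodes:                                  # if nodes is empty, fail
            node = nodes.popleft()                    # REMOVE_FRONT
            if problem['goal_test'](node.state):
                return node
            queuing_fn(nodes, expand(node, problem['successors']))
        return None

    # Example: states are numbers, the goal is 4, two operators add 1 or 2
    problem = {'initial_state': 0,
               'goal_test': lambda s: s == 4,
               'successors': lambda s: [('add1', s + 1, 1), ('add2', s + 2, 1)]}
    goal = general_search(problem, lambda q, new: q.extend(new))  # rear insertion
    print(goal.state, goal.depth)   # 4 2

Different queuing functions then yield the different search algorithms discussed below; the lambda used above adds new nodes to the rear of the queue, while q.extendleft(reversed(new)) would add them to the front.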

UNINFORMED SEARCH STRATEGIES


- Uninformed search (also known as blind search) strategies do not take into account the location of the
goal during the search process but blindly search for it until it is found.
- Hence, the only thing that a blind search can do is distinguish a non-goal state from a goal state.
- Examples of uninformed search strategies include (i) depth-first search and (ii) breadth-first search

Depth-first Search
-The depth-first search generates nodes and compares them with the goal along the deepest path of the tree,
and moves up to the parent of the last visited node only when no further node can be generated below the
last visited node.
-After moving up to the parent, the algorithm attempts to generate a new offspring of the parent node.
-The above principle is applied recursively at each node of a tree in a depth-first search.
-One simple way to realize the recursion in the depth-first search algorithm is to employ a stack. The
procedure for the stack implementation, LIFO (Last In First Out), is presented as follows:

Depth-first Search Algorithm using a stack

Begin
1. Push the starting node into a stack
2. WHILE stack is not empty DO
      Pop stack to get stack_top element
      IF stack_top element = goal node
         Return success and stop
      ELSE
         Push the children of the stack_top element in any order into the stack
   END WHILE
END

The above algorithm can be implemented using the search function described earlier. It is as follows:

Function DEPTH_FIRST_SEARCH(problem) returns a solution or failure


Return GENERAL_SEARCH(problem,ENQUEUE_AT_FRONT)
End Function

The depth-first search calls the following general search function

Function GENERAL_SEARCH(problem, QUEUEING_FN) returns a solution or failure


nodes = MAKE_QUEUE(MAKE_NODE(INITIAL_STATE[problem]))
Loop do
If nodes is empty then return failure
node = REMOVE_FRONT(nodes)
If GOAL_TEST[problem] applied to STATE(node) succeeds then return node
nodes = QUEUEING_FN(nodes,EXPAND(node,OPERATORS[problem]))
End
End Function

Comments on Depth-first Search


• It has modest memory requirements. It only needs to store the path from the root to the leaf node as
  well as the unexpanded sibling nodes. For a state space with a branching factor of b and a maximum depth of
  m, depth first search requires storage of only b·m nodes.
• The time complexity for depth first search is O(b^m) in the worst case.
• If depth first search goes down an infinite branch, it will not terminate if it does not find a goal state.
  Even if it does find a solution, there may be a better solution at a higher level in the tree. Therefore,
  depth first search is neither complete nor optimal.
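A minimal Python sketch of the stack-based procedure above. The dictionary representation of the tree, mapping each node to its list of children, is an assumption for illustration; the tree is the one used in Example 1 below, whose structure is inferred from the worked solution.

    def depth_first_search(tree, start, goal):
        """Stack-based (LIFO) depth-first search; returns True if the goal is found."""
        stack = [start]                              # 1. push the starting node
        while stack:                                 # 2. WHILE stack is not empty
            top = stack.pop()                        # pop stack to get stack_top element
            if top == goal:
                return True                          # success
            # push the children in reverse order so the leftmost child is expanded first
            stack.extend(reversed(tree.get(top, [])))
        return False

    tree = {'V1': ['V2', 'V4', 'V3'], 'V4': ['V7', 'V6'], 'V3': ['V5', 'V10'],
            'V6': ['V8', 'V9'], 'V10': ['V11', 'V12']}
    print(depth_first_search(tree, 'V1', 'V9'))      # True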

Example 1
Assume that the goal node in the tree illustrated in Fig. 4(a) is V9. By employing a stack, apply DFS
algorithm to the tree with the assumption that left nodes are searched first and the search process terminates
as soon as the goal item has been found.
(a) Illustrate the content of the stack for each step of the DFS algorithm
(b) Provide a trace of the stack process by completing Table 1(a)
(c) Indicate on the tree which part of it is explored by the DFS algorithm
[Fig. 4(a): A directed tree. The root V1 (depth 0) has children V2, V4 and V3 (depth 1, left to right); V4 has children V7 and V6, and V3 has children V5 and V10 (depth 2); V6 has children V8 and V9, and V10 has children V11 and V12 (depth 3). The goal node is V9.]

Table 1(a) Trace of the stack process (to be completed)
Node    Insert process    Delete process
V1
V2
V3
V4
V5
V6
V7
V8
V9
V10
V11
V12

Solution
(a) The status of the stack is illustrated in Fig. 4(b)

Top of the stack on the left (the search begins with V1 pushed onto the stack):

[V1] -> [V2, V4, V3] -> [V4, V3] -> [V7, V6, V3] -> [V6, V3] -> [V8, V9, V3] -> [V9, V3]

(the search ends when goal node V9 is popped from the stack)

Fig. 4(b) Status of stack when DFS algorithm is executed on Fig. 4(a)

(b) A trace of the stack process is shown in Table 1(b)

Table 1(b) Trace of the stack process
Node    Insert process    Delete process
V1           1                  2
V2           5                  6
V3           3                  -
V4           4                  7
V5           -                  -
V6           8                 11
V7           9                 10
V8          13                 14
V9          12                 15
V10          -                  -
V11          -                  -
V12          -                  -

[Fig. 4(c): Part of the tree searched by DFS: V1, V2, V4, V7, V6, V8 and V9 are visited; V3 is generated but never expanded, and V5, V10, V11 and V12 are never generated]

(c) The part of the tree explored by DFS algorithm is indicated in Fig 4(c)

Breadth-first Search
-The breadth-first search (BFS) algorithm visits the nodes of the tree along its breadth, starting from the
level with depth 0 to the maximum depth.
-It can be easily realized with a queue.
-The queue implementation, FIFO (First In First Out), is presented as follows:

Breadth-first Search Algorithm using a queue


BEGIN
1. Place the starting node in a queue
2. REPEAT
      Pop queue to get the front element
      IF the front element of the queue = goal node
         Return success and stop
      ELSE
         Insert the children of the front element (if they exist) in any order at the rear end of the queue
   UNTIL the queue is empty
END

Breadth first search can be implemented by using a queuing function that adds expanded nodes to the rear
of a queue as presented in the following functions:

Function BREADTH_FIRST_SEARCH(problem) returns a solution or failure


Return GENERAL_SEARCH(problem,ENQUEUE_AT_END)
End Function

The breadth-first search calls the general function

Function GENERAL_SEARCH(problem, QUEUING_FN) returns a solution or failure


nodes = MAKE_QUEUE(MAKE_NODE(INITIAL_STATE[problem]))
Loop do
If nodes is empty then return failure
node = REMOVE_FRONT(nodes)
If GOAL_TEST[problem] applied to STATE(node) succeeds then return node
nodes = QUEUING_FN(nodes,EXPAND(node,OPERATORS[problem]))
End
End Function
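A minimal Python sketch of the queue-based procedure, under the same illustrative tree representation as the DFS sketch; the tree is the one used in Example 2 below.

    from collections import deque

    def breadth_first_search(tree, start, goal):
        """Queue-based (FIFO) breadth-first search; returns True if the goal is found."""
        queue = deque([start])                       # 1. place the starting node in a queue
        while queue:
            front = queue.popleft()                  # pop queue to get the front element
            if front == goal:
                return True                          # success
            queue.extend(tree.get(front, []))        # insert the children at the rear
        return False

    tree = {'V1': ['V2', 'V4', 'V3'], 'V4': ['V7', 'V6'], 'V3': ['V5', 'V10'],
            'V6': ['V8', 'V9'], 'V10': ['V11', 'V12']}
    print(breadth_first_search(tree, 'V1', 'V9'))    # True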

Example 2
Assume that the goal node in the tree illustrated in Fig. 5(a) is V9. By employing a queue, apply BFS
algorithm to the tree with the assumption that left nodes are searched first and the search process terminates
as soon as the goal item has been found.
(a) Illustrate the content of the queue for each step of the BFS algorithm
(b) Provide a trace of the queue process by completing Table 2(a)
(c) Indicate on the tree which part of it is explored by the BFS algorithm
[Fig. 5(a): A directed tree, identical in structure to Fig. 4(a): root V1 (depth 0) with children V2, V4 and V3; V4 has children V7 and V6, and V3 has children V5 and V10; V6 has children V8 and V9, and V10 has children V11 and V12. The goal node is V9.]

Table 2(a) Trace of the queue process (to be completed)
Node    Insert process    Delete process
V1
V2
V3
V4
V5
V6
V7
V8
V9
V10
V11
V12

Solution
(a) The status of the queue is illustrated in Fig. 5(b)
Front of the queue on the left (the search begins with V1 in the queue):

[V1] -> [V2, V4, V3] -> [V4, V3] -> [V3, V7, V6] -> [V7, V6, V5, V10] -> [V6, V5, V10] -> [V5, V10, V8, V9] -> [V10, V8, V9] -> [V8, V9, V11, V12] -> [V9, V11, V12]

(the search ends when goal node V9 is removed from the front of the queue)

Fig. 5(b) Status of queue when BFS algorithm is executed on Fig. 5(a)

(b) A trace of the queueing process is shown in Table 2(b)
Table 2(b) Trace of the queue process
Node    Insert process    Delete process
V1           1                  2
V2           3                  6
V3           5                 10
V4           4                  7
V5          11                 17
V6           9                 14
V7           8                 13
V8          15                 21
V9          16                 22
V10         12                 18
V11         19                  -
V12         20                  -

[Fig. 5(c): Part of the tree searched by BFS: every node of the tree is generated; V11 and V12 are never removed from the queue, since the search stops when goal node V9 reaches the front]

(c) The part of the tree explored by the BFS algorithm is indicated in Fig 5(c)

Comments on Breadth-first search


• If there is a solution, breadth first search is guaranteed to find it
• If there are several solutions then breadth first search will always find the shallowest goal state first
• Breadth-first search is both complete and optimal. Or rather, it is optimal provided the path cost is a
non-decreasing function of the depth of the node.
• To measure time and space complexity
  - We have to consider the branching factor, b; i.e., every state creates b new states.
  - The root of the tree has 1 state.
  - Level 1 produces b states and the next level produces b^2 states.
  - The next level produces b^3 states, etc.
  Assuming the solution is found at level d, then the size of the tree is:
      1 + b + b^2 + b^3 + ... + b^d
  (Note: as the solution could be found at any point on the d level, the search tree could be
  smaller than this)

Depth-Limited Search
-The problem with depth-first search is that the search can go down an infinite branch and thus never
return.
-Depth-limited search avoids this problem by imposing a depth limit which effectively terminates the
search at that depth.
-The algorithm can be implemented using the general search algorithm by using operators to keep track of
the depth.
-The choice of the depth parameter can be an important factor. Choosing a parameter that is too deep is
wasteful in terms of both time and space. But choosing a depth parameter that is too shallow may result in
never reaching a goal state.
-As long as the depth parameter, l, is set “deep enough” then it is guaranteed that a solution will be found,
if it exists.
-Therefore it is complete as long as l >= d (where d is the depth of the solution). If this condition is not met
then depth limited search is not complete.
-The space requirements for depth limited search are similar to those of depth first search; that is, O(b·l).
-The time complexity is O(b^l).

Iterative Deepening Search


-The problem with depth limited search is deciding on a suitable depth parameter.
-The iterative deepening search method simply tries all possible depth limits;
first 0, then 1, then 2 etc., until a solution is found.

Page | 8
-What iterative deepening search does is that it combines breadth first search and depth first search.
-It may appear wasteful as it is expanding nodes multiple times. But the overhead is small in comparison to
the growth of an exponential search tree.
-To show this is the case, consider this.
-For a depth limited search to depth d with branching factor b the number of expansions is

1 + b + b^2 + … + b^(d-2) + b^(d-1) + b^d
-If we apply some real numbers to this (say b = 10 and d = 5), we get

1 + 10 + 100 + 1,000 +10,000 + 100,000 = 111,111

-For an iterative deepening search, the nodes at the bottom level, d, are expanded once, the nodes at d-1 are
expanded twice, those at d-2 are expanded three times, and so on back to the root.
The formula for this is
(d+1)·1 + d·b + (d-1)·b^2 + … + 3·b^(d-2) + 2·b^(d-1) + 1·b^d

-If we plug in the same numbers we get

6 + 50 + 400 + 3,000 + 20,000 + 100,000 = 123,456

-It can be seen that, compared to the overall number of expansions, the total is not substantially increased.
-In fact, when b = 10, iterative deepening expands only about 11% more nodes than a breadth first or depth
limited search down to level d.
-Even when the branching factor is 2, iterative deepening search only takes about twice as long as a
complete breadth first search.
-The time complexity for iterative deepening search is O(b^d), with the space complexity being O(b·d).
-For large search spaces where the depth of the solution is not known iterative deepening search is
normally the preferred search method.
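A minimal Python sketch of depth-limited search and the iterative deepening loop built on top of it; the dictionary tree representation is the same illustrative assumption as in the earlier sketches.

    def depth_limited_search(tree, node, goal, limit):
        """Recursive DFS that is cut off once the depth limit is reached."""
        if node == goal:
            return True
        if limit == 0:
            return False                             # cut off at this depth
        return any(depth_limited_search(tree, child, goal, limit - 1)
                   for child in tree.get(node, []))

    def iterative_deepening_search(tree, start, goal, max_depth=50):
        """Try depth limits 0, 1, 2, ... until the goal is found."""
        for limit in range(max_depth + 1):
            if depth_limited_search(tree, start, goal, limit):
                return limit                         # the depth at which the goal was found
        return None

    tree = {'V1': ['V2', 'V4', 'V3'], 'V4': ['V7', 'V6'], 'V3': ['V5', 'V10'],
            'V6': ['V8', 'V9'], 'V10': ['V11', 'V12']}
    print(iterative_deepening_search(tree, 'V1', 'V9'))   # 3 (V9 lies at depth 3)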

Checking for Repeated States


-Whilst the blind search algorithms are running, it is possible (and for some problems extremely likely) that
the same state is generated more than once.
-If this can be avoided, then, the number of nodes that are created is limited and, more importantly, the
need to have to expand the repeated nodes can be stopped.

There are three methods that can be used to stop generating repeated nodes.
1. Do not generate a node that is the same as the parent node. Or, to put it another way, do not return to
the state you have just come from.
2. Do not create paths with cycles in them. To do this we can check each ancestor node and refuse to
create a state that is the same as this set of nodes.
3. Do not generate any state that is the same as any state generated before. This requires that every state is
kept in memory (meaning a potential space complexity of O(b^d)).
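A minimal sketch of the third method (remembering every state ever generated), grafted onto the breadth-first sketch above. The graph below is a made-up example containing a cycle that an unchecked search would revisit.

    from collections import deque

    def bfs_without_repeats(graph, start, goal):
        """BFS that refuses to generate any state it has generated before."""
        queue = deque([start])
        generated = {start}                          # every state generated so far
        while queue:
            front = queue.popleft()
            if front == goal:
                return True
            for child in graph.get(front, []):
                if child not in generated:           # skip repeated states
                    generated.add(child)
                    queue.append(child)
        return False

    # 'A' and 'B' point back at each other, so an unchecked BFS would loop
    graph = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['G']}
    print(bfs_without_repeats(graph, 'A', 'G'))      # True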

Summary of blind search techniques


Table 3 illustrates the summary of the four discussed blind search methods. The definitions of the symbols
used in the table are as follows:
b = Branching factor
d = Depth of solution
m = Maximum depth of the search tree
l = Search depth Limit

Table 3 Summary of some blind search methods
Algorithm                          Time    Space   Optimal?   Complete?        Derivative
Depth First Search (DFS)           b^m     b·m     No         No               -
Breadth First Search (BFS)         b^d     b^d     Yes        Yes              -
Depth Limited Search (DLS)         b^l     b·l     No         Yes, if l >= d   DFS
Iterative Deepening Search (IDS)   b^d     b·d     Yes        Yes              DLS

INFORMED SEARCH STRATEGIES


-Blind search strategies are normally very inefficient. However, by adding domain knowledge we can
improve the search process
-The idea behind informed search is that we explore the node that is most likely to be nearest to a goal
state.
-To do this, a heuristic function is used, which estimates how close a given state is to a goal state.
-There is no guarantee that the heuristic function will return a value that ensures that a particular node is
closer to a goal state than any other node.
- Some knowledge of the domain is needed in order to implement a heuristic function; i.e., the heuristic
function has to know something about the problem so that it can judge how close the current state is to the
goal state.
-Let the path cost function be denoted by g(n) and a heuristic function be denoted by h(n), i.e.,

h(n) = estimated cost of the cheapest path from the state at node n to a goal state

-To implement an informed search, the nodes are ordered based on the value returned from a heuristic
function.
-We can implement such a search using the general search algorithm called the BEST-FIRST-SEARCH
function which chooses the best node as the node to be expanded first.
-The general best first function is as follows:

Function BEST_FIRST_SEARCH(problem, Eval_Fn) returns a solution sequence


Inputs : problem
Eval_Fn
Queueing_Fn = a function that orders nodes by Eval_Fn
Return GENERAL_SEARCH(problem,Queueing_Fn)
End Function

Where
Problem = search problem
Eval_Fn = an evaluation function

Examples of informed search strategies include uniform cost search, greedy best first search, hill climbing
search, simulated annealing, etc.

Uniform Cost Search


-It was mentioned earlier that breadth first search finds the shallowest goal state and that this will be the
cheapest solution, as long as the path cost is a function of the depth of the solution.
-But, if this is not the case, then breadth first search is not guaranteed to find the best (i.e. cheapest)
solution.
-Uniform cost search remedies this.
-Uniform cost search works by always expanding the lowest cost node on the fringe (where the cost is the
path cost, g(n))
-In fact, breadth first search is a uniform cost search with g(n) = DEPTH(n).

-Consider Fig. 6(a). Assuming it is desired to find the shortest route from S to G (i.e., S is the initial
state and G is the goal state), with edge costs S-A = 1, S-B = 5, S-C = 15, A-G = 10 and B-G = 5, it can be
seen that SBG (cost 10) is the shortest route.
-But if breadth first search is applied to the problem, it will find the path SAG (cost 11), assuming that A
is the first node to be expanded at depth level 1.
-The following illustrates how uniform cost search tackles the problem:
*We start with the initial state, S, and expand it. This leads to the tree in Fig. 6(b), with path costs
A = 1, B = 5 and C = 15.
*The path cost of A is the cheapest, so it is expanded next, giving the tree in Fig. 6(c).
*We now have a goal state (G, at path cost 11), but the algorithm does not recognise it yet as there is
still a node with a cheaper path cost. The algorithm orders the queue by path cost, so the node with
cost 11 will be behind node B in the queue.
*Node B (being the cheapest) is now expanded, giving the tree in Fig. 6(d).
*A goal state (G, at path cost 10) will now be at the front of the queue. Therefore the search will end
and the path SBG will be returned.

[Fig. 6(a): The route finding problem; Figs. 6(b)-(d): The successive search trees produced by uniform cost search]

-In summary, uniform cost search will find the cheapest solution provided that the cost of the path never
decreases as we proceed along the path; otherwise, it will result in carrying out an exhaustive search of
the entire tree.
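The walkthrough can be reproduced with a minimal, self-contained Python sketch, using the edge costs as recovered above.

    import heapq

    def uniform_cost_search(graph, start, goal):
        """Always expand the node with the lowest path cost g(n) on the fringe."""
        frontier = [(0, [start])]                        # (path cost g, path)
        while frontier:
            g, path = heapq.heappop(frontier)
            if path[-1] == goal:
                return path, g
            for (next_state, cost) in graph.get(path[-1], []):
                heapq.heappush(frontier, (g + cost, path + [next_state]))
        return None

    # The graph of Fig. 6(a): S-A = 1, S-B = 5, S-C = 15, A-G = 10, B-G = 5
    graph = {'S': [('A', 1), ('B', 5), ('C', 15)], 'A': [('G', 10)], 'B': [('G', 5)]}
    print(uniform_cost_search(graph, 'S', 'G'))          # (['S', 'B', 'G'], 10)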

Greedy Best First Search G 11 G 10


-Greedy best first search seeks to minimise the estimated cost to reach a goal.
-To do this, it expands the node that is judged to be closest to the goal state.
-To make this judgement, it uses the heuristic function, say h.
-Given a heuristic function h, we can implement a greedy search as follows:

Function GREEDY_SEARCH(problem) returns a solution or failure


Return BEST_FIRST_SEARCH(problem, h)
End Function

Function BEST_FIRST_SEARCH(problem, Eval_Fn) returns a solution sequence


Inputs : problem
Eval_Fn
Queueing_Fn = a function that orders nodes by Eval_Fn
Return GENERAL_SEARCH(problem,Queueing_Fn)
End Function

Function GENERAL_SEARCH(problem, QUEUEING_FN) returns a solution or failure


nodes = MAKE_QUEUE(MAKE_NODE(INITIAL_STATE[problem]))
Loop do
If nodes is empty then return failure
node = REMOVE_FRONT(nodes)
If GOAL_TEST[problem] applied to STATE(node) succeeds then return node
nodes = QUEUEING_FN(nodes,EXPAND(node,OPERATORS[problem]))
End
End Function
Example
-Consider the Nigerian road map illustrated in Fig. 7(a). We can develop a heuristic function that helps us
to find a solution to a city-to-city route finding problem.
-One possible heuristic is based on the straight line distance between the cities under consideration.
-For example, assuming that we are trying to get to Abakaliki from Zuru, whenever we find ourselves in a
city, we look at the neighbouring cities and go to the one that has the nearest straight line distance (SLD)
to Abakaliki.
The corresponding heuristic function can be defined as:

hSLD(n) = straight line distance between n and the goal location

Of course, this heuristic might not always give us the best city to go to. Two possible reasons are
• The city you decide to go to may not have a route to your destination city so that you will have to re-
trace your steps. For example, if you are trying to get to Abakaliki and you are in Kumo and the
heuristic suggests Ayangba as the next city, you would have to re-trace your steps eventually as there is
no way to get to Abakaliki from Ayangba without going through Kumo.
• The route to your final destination may be a meandering road so, although the SLD is shorter, the actual
distance to travel is longer.

-But as a general strategy, it is good to adopt the SLD heuristic since it is likely to find a solution faster
than a blind search.
-Given the Zuru/Abakaliki route finding problem and the hSLD heuristic, greedy best first search would
work as follows (see the map in Fig. 7(a) and the SLDs in Table 4):
[Fig. 7(a): Road map of some selected Nigerian towns, with the road distance marked on each edge]

Table 4 Straight line distances from some Nigerian towns to Abakaliki
Town             SLD to Abakaliki
Abakaliki            0
Abuja              193
Ayangba            126
Bali               199
Bauchi             234
Benin              200
Bida               244
Birnin Kebbi       374
Calabar             77
Gboko               80
Gusau              380
Jos                178
Kaduna             253
Kumo               226
Loko                98
Lokoja             241
Onitsha            160
Owo                242
Wawa               329
Zuru               366

[Fig. 7(b): Partially drawn search tree for the greedy best first search algorithm: Zuru (366) expands to Birnin Kebbi (374), Kaduna (253) and Wawa (329); Kaduna expands to Gusau (380), Zuru (366), Jos (178) and Abuja (193); Jos expands to Abakaliki (0)]

*We start at Zuru. This has an SLD of 366 to Abakaliki and as it is the only node we have got, we
expand it.
*After expanding we have three new nodes (Birnin Kebbi, Kaduna and Wawa). These have SLD's of
374, 253 and 329 respectively. The SLD heuristic tells us that Kaduna should be expanded next.
*This leads to four new nodes (Gusau, Zuru, Jos and Abuja) with SLD's of 380, 366, 178 and 193.
*Next we choose Jos to expand as this has the lowest heuristic value. This step leads to Abakaliki being
generated which (with a heuristic value of 0) is a goal state.
*The partially drawn search tree based on the greedy best first algorithm is illustrated in Fig. 7(b).

-Using this heuristic led to a very efficient search. We never expanded a node that was not on the solution
path (a minimal search cost).
-But, it is noticed that it is not optimal. If we had gone from Zuru through Kaduna and Abuja to Abakaliki
we would have travelled a shorter distance.
-Therefore, it is true to say that greedy best first search often performs well although it cannot be
guaranteed to give an optimal solution.
-If you start at Kumo and were trying to get to Jos you would initially go to Bauchi as this is closer to Jos
than Bali. But you would eventually need to go back through Kumo and through Bali to get to your
destination.
-This implies that it may be necessary to check for repeated states, otherwise, one will forever move back
and forth between Kumo and Bauchi.

-Hence, greedy best first search is not optimal.


-It is also not complete as we can start down a path and never return (consider the Bauchi/Bali problem we
discussed above). In these respects greedy search is similar to depth first search. In fact, it is also similar
in the way that it explores one path at a time and then backtracks.
-Greedy best first search has time complexity of O(bm) (where m is the maximum depth of the search tree).
-The space complexity is the same as all nodes have to be kept in memory.
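A minimal sketch of greedy best first search on the Benin route of Examples 3 and 4 below. The adjacency list is only partially reconstructed from the worked solutions (an assumption), and the h values come from Table 4.

    import heapq

    def greedy_search(neighbours, h, start, goal):
        """Always expand the node with the smallest heuristic value h(n)."""
        frontier = [(h[start], [start])]
        while frontier:
            _, path = heapq.heappop(frontier)
            if path[-1] == goal:
                return path
            for next_town in neighbours.get(path[-1], []):
                heapq.heappush(frontier, (h[next_town], path + [next_town]))
        return None

    # Adjacency reconstructed (partially) from the worked examples below
    neighbours = {'Benin': ['Owo', 'Onitsha'], 'Onitsha': ['Abuja', 'Loko'],
                  'Loko': ['Abuja', 'Abakaliki']}
    h = {'Benin': 200, 'Owo': 242, 'Onitsha': 160, 'Abuja': 193,
         'Loko': 98, 'Abakaliki': 0}
    print(greedy_search(neighbours, h, 'Benin', 'Abakaliki'))
    # ['Benin', 'Onitsha', 'Loko', 'Abakaliki']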

Example 3
Apply Greedy Best First Search algorithm to the map in Fig. 7(a) and work out a route from Benin town to
Abakaliki by using the following cost functions.
H(n) = The Straight Line Distance between any town and the town Abakaliki.
These distances are given in Table 4.
(i) Provide the search tree for your solution and indicate the order in which you expanded the nodes.
(ii) State the route you would take

Solution
(i) The figure next to each node represents the H(n) function, where
H(n) = The heuristic value (i.e. the straight line distance to the target town)
The search tree is as follows (nodes are expanded in the order Benin, Onitsha, Loko):

Benin (200)
    Owo (242)
    Onitsha (160)
        Abuja (193)
        Loko (98)
            Abuja (193)
            Abakaliki (0)

(ii) The route is Benin, Onitsha, Loko, Abakaliki

A* Search
-Greedy best first search can considerably cut the search time but it is neither optimal nor complete.
-By comparison uniform cost search minimises the cost of the path so far, g(n). Uniform cost search is both
optimal and complete but can be very inefficient.
-A* search combines these two search strategies (i.e., uniform cost search and greedy best first search) to
get the advantages of both. This is done simply by combining both evaluation functions. That is

f(n) = g(n) + h(n) (1)

-As g(n) gives the path cost from the start node to node n and h(n) gives the estimated cost of the cheapest
path from n to the goal, we have

f(n) = estimated cost of the cheapest solution through n

-The good thing about this strategy is that it is optimal and complete

The A* search can be implemented as follows:

Function A*-SEARCH(problem) returns a solution or failure


Return BEST_FIRST_SEARCH(problem, g + h)
End Function

Function BEST_FIRST_SEARCH(problem, Eval_Fn) returns a solution sequence


Inputs : problem
Eval_Fn
Queueing_Fn = a function that orders nodes by Eval_Fn
Return GENERAL_SEARCH(problem,Queueing_Fn)
End Function

Function GENERAL_SEARCH(problem, QUEUING_FN) returns a solution or failure


nodes = MAKE_QUEUE(MAKE_NODE(INITIAL_STATE[problem]))
Loop do
If nodes is empty then return failure
node = REMOVE_FRONT(nodes)
If GOAL_TEST[problem] applied to STATE(node) succeeds then return node
nodes = QUEUING_FN(nodes,EXPAND(node,OPERATORS[problem]))
End
End Function

-This algorithm expands the lowest cost node on the fringe, wherever that node is in the search tree; i.e.,
the nodes selected for the next expansion are not restricted to nodes that have just been generated.

Admissible Heuristics
-An admissible heuristic is a heuristic that never overestimates the cost to reach a goal.
-It is obvious that the SLD heuristic function is admissible as we can never find a shorter distance between
any two towns.

Comment on the A* search algorithm


-A* is optimal and complete.

Example 4
Apply A* Search algorithm to the map in Fig. 7(a) and work out a route from Benin town to Abakaliki by
using the following cost functions.
G(n) = The cost of each move as the distance between each town (shown on map)
H(n) = The Straight Line Distance between any town and town Abakaliki.
These distances are given in Table 4.
(i) Provide the search tree for your solution and indicate the order in which you expanded the nodes.
(ii) State the route you would take and the cost of that route.
(iii) The straight line distance heuristic used above is known to be an admissible heuristic. What does this
mean and why is it important?

Solution
(i) The figures next to each node represent G(n) and H(n) functions, where
G(n) = The cost of each move as the distance between each town (shown on map)
H(n) = The heuristic value (i.e. the straight line distance to a target town as shown in Table 4)

The search tree is as follows

Benin (0 + 200 = 200)
    Owo (120 + 242 = 362)
    Onitsha (138 + 160 = 298)
        Abuja (284 + 193 = 477)
        Loko (276 + 98 = 374)
            Abuja (373 + 193 = 566)
            Abakaliki (377 + 0 = 377)

(ii) The route is Benin, Onitsha, Loko, Abakaliki, at a cost of 377


(iii) An admissible heuristic is one which never overestimates the cost to the goal. This is obviously the
case with the straight line distance between two towns. Having admissible heuristics is important as it
allows the A* algorithm to be proved to be optimal (i.e. always find the best solution).
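A minimal sketch of A* on the same route finding problem. The edge distances are reconstructed from the g-values in the worked solution above (an assumption), and the h values again come from Table 4.

    import heapq

    def a_star_search(graph, h, start, goal):
        """Always expand the node with the smallest f(n) = g(n) + h(n)."""
        frontier = [(h[start], 0, [start])]              # (f, g, path)
        while frontier:
            _, g, path = heapq.heappop(frontier)
            if path[-1] == goal:
                return path, g
            for (next_town, cost) in graph.get(path[-1], []):
                g2 = g + cost
                heapq.heappush(frontier, (g2 + h[next_town], g2, path + [next_town]))
        return None

    graph = {'Benin': [('Owo', 120), ('Onitsha', 138)],
             'Onitsha': [('Abuja', 146), ('Loko', 138)],
             'Loko': [('Abuja', 97), ('Abakaliki', 101)]}
    h = {'Benin': 200, 'Owo': 242, 'Onitsha': 160, 'Abuja': 193,
         'Loko': 98, 'Abakaliki': 0}
    print(a_star_search(graph, h, 'Benin', 'Abakaliki'))
    # (['Benin', 'Onitsha', 'Loko', 'Abakaliki'], 377)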

Hill Climbing
-Hill climbing searches for a point in the search space that is better than all the others. "Better" in this
context means that the evaluation is higher. We might also say that the solution is of better quality than
all the others.
-The search can be made to minimise a function as well; and in this case, it searches for a point in the
search space that has a lower evaluation.

The idea behind hill climbing is as follows:

1. Pick a random point in the search space.
2. Consider all the neighbours of the current state.
3. Choose the neighbour with the best quality and move to that state.
4. Repeat steps 2 to 4 until all the neighbouring states are of lower quality.
5. Return the current state as the solution state.

-The hill climbing algorithm is as follows:

Function HILL_CLIMBING(Problem) returns a solution state
Inputs: Problem
Local variables: Current node
                 Next node
Current node = MAKE_NODE(INITIAL_STATE[Problem])
Loop do
   Next node = a highest-valued successor of Current node
   If VALUE[Next node] < VALUE[Current node] then return Current node
   Current node = Next node
End
End Function

-Note that this algorithm does not maintain a search tree. It only returns a final solution.
-Also, if two neighbours have the same evaluation and they are both the best quality, then the algorithm
will choose between them at random.
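A minimal Python sketch of the HILL_CLIMBING function above; the one-dimensional objective and its plus/minus 1 neighbourhood are made-up illustrations.

    import random

    def hill_climbing(neighbours_fn, value_fn, start):
        """Keep moving to a highest-valued successor until every neighbour is worse."""
        current = start
        while True:
            neighbours = neighbours_fn(current)
            best_value = max(value_fn(n) for n in neighbours)
            if best_value < value_fn(current):
                return current            # all neighbouring states are of lower quality
            # ties between equally good neighbours are broken at random, as noted above
            current = random.choice([n for n in neighbours if value_fn(n) == best_value])

    # Maximise f(x) = -(x - 7)^2 over the integers, stepping by +1 or -1
    f = lambda x: -(x - 7) ** 2
    print(hill_climbing(lambda x: [x - 1, x + 1], f, 0))   # 7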

The hill climbing algorithm has some shortcomings which are:


-Local maxima (or minima): One common problem is getting trapped at a local maximum (a foothill). A
local maximum is a peak that is higher than each of its neighbouring states but lower than the
global maximum.
-Ridges: Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to
navigate.
-Plateaux: a plateau is an area of the state space landscape where the evaluation function is flat.
Once a state on a plateau is reached, all legal next states will also lie on this surface, making
further search ineffective.

Simulated Annealing
-Annealing is a thermal process in metallurgy, where the metal is first heated to a high temperature beyond
its melting point and then allowed to cool down slowly until it returns to the solid form.
-Thus in the physical process of annealing, the hot material gradually loses energy and finally at one point
in time reaches a state of minimum energy.
-In general, the aim of simulated annealing is to find a minimum energy for a system
-The outer loop of the simulated annealing algorithm is quite similar to hill climbing. Instead of picking the
best move, however, simulated annealing picks a random move. If the move results in a lower energy, it is
accepted with a probability of 1; otherwise, it is accepted with a probability less than 1.
-Hence, it may make some moves to higher energy states; these occasional uphill moves enable it to be
pulled out of a local minimum.
-ΔE is the change in energy level (i.e., ΔE = Cost_next state − Cost_current state)
-If ΔE < 0, then the next state is accepted as the current state (i.e., a transition to a lower energy state);
otherwise, its acceptance is based on a randomly generated probability P (which may result in a transition
to a higher energy state)
-The probability, P, of moving to a higher energy state is
P = e^{-\Delta E / T}     (2)
where
T is the temperature which is periodically reduced
-The simulated annealing algorithm is as follows:

Function SIMULATED_ANNEALING(Problem, Schedule) returns a solution state
Inputs : Problem, a problem
         Schedule, a mapping from time to temperature
Local Variables : Current, a node
                  Next, a node
                  temp, a “temperature” controlling the probability of accepting worse moves

Current = MAKE_NODE(INITIAL_STATE[Problem])
temp = tempmax
Loop do
   If temp = tempmin
      then return Current
   End if
   Next = a randomly selected successor of Current
   ΔE = VALUE[Next] – VALUE[Current]
   If ΔE < 0
      then Current = Next
   else
      Current = Next only with probability P given as exp(-ΔE/temp)
   End if
   temp = temp − Δtemp
End Loop
End Function

Analysis of the simulated annealing algorithm


-In the above algorithm, the reduction in temperature is referred to as cooling.
-There are two common ways of reducing the temperature (i.e., cooling); they are:
(i) temp_next = temp_current − Δtemp     (3)
and
(ii) temp_next = α × temp_current     (4)
where Δtemp is the temperature step size and α (i.e., the cooling rate) is a constant fraction, e.g., 0.9
-At the beginning of the algorithm, the temperature is usually very high.
-As the temperature becomes lower
* ΔE/temp gets bigger
* −ΔE/temp gets smaller, and
* e^(−ΔE/temp) gets smaller
-This means that, as the process continues, the probability, P, of accepting a move to a higher energy state
gets smaller and smaller, so fewer and fewer uphill moves are accepted.
-The algorithm assumes that the annealing process will continue until the temperature reaches its minimum
value after which it terminates. Some other implementations keep decreasing the temperature until some
other condition is met; e.g., no change in the best state for a certain period of time or 3 successive
temperature stages that yield no acceptance, etc.
-Another possible stopping criterion is to terminate after a certain number of iterations has been reached.
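A minimal Python sketch of the algorithm above, using the geometric cooling rule of Eq. (4); the objective, the neighbour function and the parameter values are made-up illustrations.

    import math
    import random

    def simulated_annealing(cost_fn, neighbour_fn, start,
                            temp_max=100.0, temp_min=0.01, alpha=0.9):
        """Minimise cost_fn; worse moves are accepted with probability exp(-dE/temp)."""
        current = start
        temp = temp_max
        while temp > temp_min:
            nxt = neighbour_fn(current)                    # a randomly selected successor
            delta_e = cost_fn(nxt) - cost_fn(current)      # dE = Cost(next) - Cost(current)
            if delta_e < 0 or random.random() < math.exp(-delta_e / temp):
                current = nxt                              # downhill always; uphill sometimes
            temp = alpha * temp                            # cooling: temp_next = alpha x temp_current
        return current

    # Minimise f(x) = (x - 3)^2 with random +1/-1 steps
    f = lambda x: (x - 3) ** 2
    step = lambda x: x + random.choice([-1, 1])
    print(simulated_annealing(f, step, 20))                # usually 3, or very close to it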

Evolutionary or population-based computation
-Evolution has been used as a metaphor for solving very difficult problems.
-Evolution in itself is a mechanism of incremental search, whereby more fit solutions to problems
propagate to future generations, and less fit solutions gradually fade away.
-The algorithms used here are population-based algorithms.
-Each of the algorithms operates on a population of entities, parallelizing the ability to solve a problem.
-Examples of population-based algorithms include, ant colony algorithm, particle swarm algorithm,
genetic algorithm, etc.

Ant Colony Algorithm


-The ant colony algorithms (ACA) are inspired by the behaviour of social insects
-As the name implies, ACA is an optimization algorithm using simulated ants. Ants have the ability to
build invisible trails made up of pheromones. These trails can be used to lead ants to and from food, thus
optimizing their round-trip times to the food and back to the nest.
-A colony of ants is thus able to choose the shortest path towards a source to exploit without the individuals
having a global vision of the path
- The ants that follow the shortest path arrive at the nest earlier than the others, after having visited the
source of food. Thus, the quantity of pheromone present on the shortest path is slightly more significant
than that present on the longest path.
-A trail presenting a greater concentration of pheromones is more attractive for the ants and it has a larger
probability to be followed.
- Hence the short trail will be reinforced more than the
long one, and, in the long run, will be chosen by the
great majority of the ants. This phenomenon is
illustrated in Fig. 8.
- The phenomenon can be applied to the search for the shortest distance between n nodes in a fully
connected graph. In this case, the artificial ants have a memory to store the path traversed, an initial
state and the stopping conditions. The ants move according to a probabilistic decision rule that is a
function of the local pheromone trails, the state of the ant and the constraints of the problem.
- For each ant, the path between a node i and a node j depends on 3 factors:
1. The list of the unvisited nodes (J_i^k), which defines the possible movements in each step,
   when an ant k is on node i;
2. The reciprocal of the distance between the nodes, called the visibility, η_ij, given by:

   \eta_{ij} = \dfrac{1}{d_{ij}}     (5)

   This static information is used to direct the choice of the ants towards closer nodes, and to
   avoid nodes that are too remote;
3. The quantity of pheromone deposited on the edge connecting the two nodes, called the intensity of
   the trail. This parameter defines the relative attraction of part of the total path and changes with
   each passage of an ant. It can be viewed as a global memory of the system, which evolves
   through a training process.

[Fig. 8: Illustration of the selection of the shortest branches by a colony of ants: (a) the beginning of the branch selection process, (b) the end of the branch selection process]

The rule of movement is given as follows:

The probability that ant k located at node i will move to node j is

p_{ij}^{k}(t) = \begin{cases} \dfrac{(\tau_{ij}(t))^{\alpha}\,(\eta_{ij})^{\beta}}{\sum_{l \in J_i^k} (\tau_{il}(t))^{\alpha}\,(\eta_{il})^{\beta}} & \text{if } j \in J_i^k \\ 0 & \text{if } j \notin J_i^k \end{cases}     (6)

where α and β are two parameters controlling the relative importance of the trail intensity, τ_ij(t), and the
visibility, η_ij.
With α = 0, only the visibility of the node is taken into consideration; the nearest node is thus
selected at each step.
On the contrary, with β = 0, only the trails of pheromone become influential.

-After each iteration, each ant leaves a certain quantity of pheromone Δτ_{ij}^{k}(t) on its entire course, the
amount of which depends on the quality of the solution found:

\Delta\tau_{ij}^{k}(t) = \begin{cases} \dfrac{Q}{L^{k}(t)} & \text{if } (i,j) \in T^{k}(t) \\ 0 & \text{if } (i,j) \notin T^{k}(t) \end{cases}     (7)

where T^k(t) is the path traversed by ant k during iteration t, L^k(t) is the length of the tour and Q is a
fixed parameter.
-Bad solutions are forgotten through the process of evaporation of the pheromone trails, which helps
avoid getting trapped in sub-optimal solutions. Hence, the update rule for the trails is given as:

\tau_{ij}(t+1) = (1 - \rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t)     (8)

where

\Delta\tau_{ij}(t) = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}(t)     (9)

and m is the number of ants. The initial quantity of pheromone on the edges is a uniform distribution
of a small quantity τ_0 ≥ 0.

-The ant colony algorithm is as follows:

For time t = 1, . . . , tmax
   For each ant k = 1, . . . , m
      Choose a node randomly
      For each non-visited node i
         Choose a node j, from the list J_i^k of the remaining nodes, according to Eq. (6)
      End For
      Deposit a trail Δτ_{ij}^{k}(t) on the path T^{k}(t) in accordance with Eq. (7)
   End For
   Evaporate trails according to Eq. (8)
End For

Analysis of the ant colony algorithm


-Assuming Fig. 9(a) represents the map that the ants have to traverse.
-At each time unit, t, each ant moves a distance, d, of 1.
-All ants are assumed to move at the same time. At the end of each time step the ants lay down a
pheromone trail of intensity 1 on the edge (route) they have just travelled along.
-At t=0 there is no pheromone on any edge of the graph.

[Fig. 9(a): Graph illustrating the paths to be taken by the ants: E connects to D (d = 1) and A connects to B (d = 1); B and D are joined both via C (B-C = 0.5, C-D = 0.5) and via H (B-H = 1, H-D = 1)]
-Assume that thirty ants are moving from E to A and another thirty ants are moving from A to E.
-At t=1 there will be thirty ants at B and thirty ants at D. At this point they have a 0.5 probability as to
which way they will turn. We assume that half go one way and half go the other way as indicated in Fig
9(a).
-At t = 2 there will be fifteen ants at D (who have travelled from B, via C) and fifteen ants at B (who have
travelled from D, via C). As illustrated in Fig. 9(b), there will be thirty ants at H (fifteen from D and
fifteen from B). The intensities on the edges will be as follows:

ED = 30, AB = 30, BH = 15, HD = 15, BC = 30 and CD = 30

[Fig. 9(b): The paths taken by the ants at time t = 1; Fig. 9(c): The paths taken by the ants at time t = 2, where more ants (and hence more pheromone) are on the BCD route than on the BHD route]
-If we now introduce another 60 ants into the system (30 from each direction) more ants are likely to
follow the BCD rather than BHD as the pheromone trail is more intense on the BCD route as
illustrated in Fig 9(c).

Figs. 10(a) through (f) illustrate how the ants are made to tour from node to node:

[Fig. 10(a): The original five nodes A-E without ants; Fig. 10(b): One ant is deployed at each of the starting nodes (Ant1 at A, Ant2 at B, Ant3 at C, Ant4 at D, Ant5 at E); Figs. 10(c)-(f): Each ant makes a tour to its 1st, 2nd, 3rd and 4th neighbouring node in turn, its memory recording the list of nodes visited so far]

The total distance covered by each ant at the end of its tour is then computed, e.g.,

Ant1: [A, D, C, E, B], L1 = 300
Ant2: [B, C, D, A, E], L2 = 450
Ant3: [C, B, E, D, A], L3 = 260
Ant4: [D, E, A, B, C], L4 = 280
Ant5: [E, A, B, C, D], L5 = 420

The sequence of the ACA implementation is summarised as follows:

Deploy ants on different nodes
Determine the route followed by each ant using Eq. (6):
   p_{ij}^{k}(t) = \dfrac{(\tau_{ij}(t))^{\alpha}\,(\eta_{ij})^{\beta}}{\sum_{l \in J_i^k} (\tau_{il}(t))^{\alpha}\,(\eta_{il})^{\beta}}
After each ant has visited all nodes, it is returned to its starting node
Compute the pheromone deposited by each ant on the route it followed: Δτ_{i,j}^{k} = Q / L^{k}
Compute the total pheromone deposited by all ants that followed a route: Δτ_{ij}(t) = Σ_{k=1}^{m} Δτ_{ij}^{k}(t)
   (e.g., for route AB, Δτ_{A,B}^{total} = Δτ_{A,B}^{1} + Δτ_{A,B}^{2} + Δτ_{A,B}^{3} + Δτ_{A,B}^{4} + Δτ_{A,B}^{5})
End of First Run
Evaporate and update pheromone: τ_{ij}(t+1) = (1 − ρ)τ_{ij}(t) + Δτ_{ij}(t)
Save Best Tour (Sequence and length)
All ants die
New ants are born and another run begins
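A minimal Python sketch of one run of this sequence on the graph of Example 5 below. Note that it samples the next node from the probabilities of Eq. (6) (using random.choices), whereas the worked example below deterministically follows the most probable edge.

    import random

    # The Fig. 11 graph: symmetric edge distances over nodes A, B, C, D
    dist = {('A', 'B'): 8, ('A', 'C'): 7, ('A', 'D'): 5,
            ('B', 'C'): 4, ('B', 'D'): 10, ('C', 'D'): 6}
    edge = lambda i, j: (min(i, j), max(i, j))
    nodes = ['A', 'B', 'C', 'D']
    alpha, beta, rho, Q = 1, 5, 0.5, 20
    tau = {e: 1.0 for e in dist}                       # initial pheromone on every path

    def tour(start):
        """Build one ant's round trip using the probabilities of Eq. (6)."""
        path, unvisited = [start], [n for n in nodes if n != start]
        while unvisited:
            i = path[-1]
            weights = [(tau[edge(i, j)] ** alpha) * ((1 / dist[edge(i, j)]) ** beta)
                       for j in unvisited]
            j = random.choices(unvisited, weights)[0]  # sample the next node
            path.append(j)
            unvisited.remove(j)
        path.append(start)                             # return to the starting node
        return path, sum(dist[edge(a, b)] for a, b in zip(path, path[1:]))

    tours = [tour(s) for s in ['A', 'B', 'C']]         # three ants, deployed at A, B, C
    deposit = {e: 0.0 for e in dist}
    for path, length in tours:
        for a, b in zip(path, path[1:]):
            deposit[edge(a, b)] += Q / length          # Eq. (7): each ant deposits Q/L
    for e in tau:
        tau[e] = (1 - rho) * tau[e] + deposit[e]       # Eq. (8): evaporate and update
    print(min(tours, key=lambda t: t[1]), tau)         # best tour and updated trails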
Example 5
The Ant Colony Algorithm is to be used on Fig. 11, and the associated ant colony algorithm parameters are
as follows:
influence of pheromone: α = 1
influence of visibility: β = 5
pheromone evaporation constant: ρ = 0.5
size of deposited pheromone control constant: Q = 20
initial pheromone on all paths: τ = 1
Number of generation of ants: 1
Total number of ants: 3
Nodes where ants are deployed: nodes A, B, and C
(i) Using Ant Colony Algorithm, determine the shortest round trip distance covered in a tour that begins at
a starting node, visits all other nodes and returns to the starting node.
(ii) What is the sequence in which the nodes are visited that resulted in the computed shortest round trip
distance in (i) above
(iii) Determine the size of pheromone on all the paths after evaporation
[Fig. 11: A fully connected graph with four nodes and edge distances AB = 8, AC = 7, AD = 5, BC = 4, BD = 10, CD = 6]

Solution
(i)
The probability that an ant k located at node i will move to node j is given by Eq. (6):

p_{ij}^{k}(t) = \begin{cases} \dfrac{(\tau_{ij}(t))^{\alpha}\,(\eta_{ij})^{\beta}}{\sum_{l \in J_i^k} (\tau_{il}(t))^{\alpha}\,(\eta_{il})^{\beta}} & \text{if } j \in J_i^k \\ 0 & \text{if } j \notin J_i^k \end{cases}
Ant 1 (deployed at node A)

At node A (unvisited nodes: B, C, D), with all trails τ = 1, α = 1 and β = 5:
(τ_AD)^α (η_AD)^β = (1)^1 × (1/5)^5 = 0.00032
(τ_AC)^α (η_AC)^β = (1)^1 × (1/7)^5 = 0.00006
(τ_AB)^α (η_AB)^β = (1)^1 × (1/8)^5 = 0.00003
Sum over the unvisited nodes = 0.00041

p_AD^1 = 0.00032 / 0.00041 = 0.78
p_AC^1 = 0.00006 / 0.00041 = 0.15
p_AB^1 = 0.00003 / 0.00041 = 0.07

Since path AD has the highest probability, path AD is selected.

At node D (unvisited nodes: B, C):
(τ_DC)^α (η_DC)^β = (1)^1 × (1/6)^5 = 0.00013
(τ_DB)^α (η_DB)^β = (1)^1 × (1/10)^5 = 0.00001
Sum = 0.00014

p_DC^1 = 0.00013 / 0.00014 = 0.93
p_DB^1 = 0.00001 / 0.00014 = 0.07

Since path DC has the highest probability, path DC is selected. From C the only unvisited node is B, and
from B the ant returns to its starting node A.

Therefore, the path followed by Ant 1 = ADCBA
Path length = L1 = 5 + 6 + 4 + 8 = 23

Ant 2 (deployed at node C)

At node C (unvisited nodes: A, B, D):
(τ_CB)^α (η_CB)^β = (1)^1 × (1/4)^5 = 0.00098
(τ_CD)^α (η_CD)^β = (1)^1 × (1/6)^5 = 0.00013
(τ_CA)^α (η_CA)^β = (1)^1 × (1/7)^5 = 0.00006
Sum = 0.00117

p_CB^2 = 0.00098 / 0.00117 = 0.84
p_CD^2 = 0.00013 / 0.00117 = 0.11
p_CA^2 = 0.00006 / 0.00117 = 0.05

Since path CB has the highest probability, path CB is selected.

At node B (unvisited nodes: A, D):
(τ_BA)^α (η_BA)^β = (1)^1 × (1/8)^5 = 0.00003
(τ_BD)^α (η_BD)^β = (1)^1 × (1/10)^5 = 0.00001
Sum = 0.00004

p_BA^2 = 0.00003 / 0.00004 = 0.75
p_BD^2 = 0.00001 / 0.00004 = 0.25

Since path BA has the highest probability, path BA is selected. From A the only unvisited node is D, and
from D the ant returns to its starting node C.

Therefore, the path followed by Ant 2 = CBADC
Path length = L2 = 4 + 8 + 5 + 6 = 23

Ant 3 (deployed at node B)

At node B (unvisited nodes: A, C, D):
(τ_BC)^α (η_BC)^β = (1)^1 × (1/4)^5 = 0.00098
(τ_BA)^α (η_BA)^β = (1)^1 × (1/8)^5 = 0.00003
(τ_BD)^α (η_BD)^β = (1)^1 × (1/10)^5 = 0.00001
Sum = 0.00102

p_BC^3 = 0.00098 / 0.00102 = 0.96
p_BA^3 = 0.00003 / 0.00102 = 0.03
p_BD^3 = 0.00001 / 0.00102 = 0.01

Since path BC has the highest probability, path BC is selected.

At node C (unvisited nodes: A, D):
(τ_CD)^α (η_CD)^β = (1)^1 × (1/6)^5 = 0.00013
(τ_CA)^α (η_CA)^β = (1)^1 × (1/7)^5 = 0.00006
Sum = 0.00019

p_CD^3 = 0.00013 / 0.00019 = 0.68
p_CA^3 = 0.00006 / 0.00019 = 0.32

Since path CD has the highest probability, path CD is selected. From D the only unvisited node is A, and
from A the ant returns to its starting node B.

Therefore, the path followed by Ant 3 = BCDAB
Path length = L3 = 4 + 6 + 5 + 8 = 23

The distances covered by the ants are:
Ant 1: L1 = 23
Ant 2: L2 = 23
Ant 3: L3 = 23
Therefore, the required minimum round trip distance is 23

(ii) Since all three ants produced the same distance (i.e., 23), any of their tours can be selected as the required sequence of node visitation. Had different distances been obtained, the shortest among them would have been selected. Hence, the sequence is
Ant 1: ADCBA
Or
Ant 2: CBADC
Or
Ant 3: BCDAB

(iii)
Pheromone deposited by each ant:

$$\Delta\tau_{i,j}^{k}=\begin{cases}\dfrac{Q}{L_{k}} & \text{if }(i,j)\in\text{tour}\\[4pt] 0 & \text{otherwise}\end{cases}$$

Ant 1:
The path followed by Ant 1 = ADCBA, with path length L1 = 23:
$$\Delta\tau_{i,j}^{1}=\frac{20}{23}=0.87$$
$$\Delta\tau_{AD}^{1}=\Delta\tau_{DC}^{1}=\Delta\tau_{CB}^{1}=\Delta\tau_{BA}^{1}=0.87$$

Ant 2:
The path followed by Ant 2 = CBADC, with path length L2 = 23:
$$\Delta\tau_{i,j}^{2}=\frac{20}{23}=0.87$$
$$\Delta\tau_{CB}^{2}=\Delta\tau_{BA}^{2}=\Delta\tau_{AD}^{2}=\Delta\tau_{DC}^{2}=0.87$$

Ant 3:
The path followed by Ant 3 = BCDAB, with path length L3 = 23:
$$\Delta\tau_{i,j}^{3}=\frac{20}{23}=0.87$$
$$\Delta\tau_{BC}^{3}=\Delta\tau_{CD}^{3}=\Delta\tau_{DA}^{3}=\Delta\tau_{AB}^{3}=0.87$$

Pheromone update by all ants:

$$\Delta\tau_{ij}(t)=\sum_{k=1}^{m}\Delta\tau_{ij}^{k}(t)$$

Path AD:
$$\Delta\tau_{AD}^{total}=\Delta\tau_{AD}^{1}+\Delta\tau_{AD}^{2}+\Delta\tau_{AD}^{3}=0.87+0.87+0.87=2.61$$

Path AC:
$$\Delta\tau_{AC}^{total}=\Delta\tau_{AC}^{1}+\Delta\tau_{AC}^{2}+\Delta\tau_{AC}^{3}=0+0+0=0$$

Path AB:
$$\Delta\tau_{AB}^{total}=\Delta\tau_{AB}^{1}+\Delta\tau_{AB}^{2}+\Delta\tau_{AB}^{3}=0.87+0.87+0.87=2.61$$

Path BC:
$$\Delta\tau_{BC}^{total}=\Delta\tau_{BC}^{1}+\Delta\tau_{BC}^{2}+\Delta\tau_{BC}^{3}=0.87+0.87+0.87=2.61$$

Path BD:
$$\Delta\tau_{BD}^{total}=\Delta\tau_{BD}^{1}+\Delta\tau_{BD}^{2}+\Delta\tau_{BD}^{3}=0+0+0=0$$

Path CD:
$$\Delta\tau_{CD}^{total}=\Delta\tau_{CD}^{1}+\Delta\tau_{CD}^{2}+\Delta\tau_{CD}^{3}=0.87+0.87+0.87=2.61$$

Pheromone evaporation:

$$\tau_{ij}(t+1)=(1-\rho)\,\tau_{ij}(t)+\Delta\tau_{ij}$$

Path AD: $\tau_{AD}(t+1)=(0.5\times 1)+2.61=3.11$
Path AC: $\tau_{AC}(t+1)=(0.5\times 1)+0=0.5$
Path AB: $\tau_{AB}(t+1)=(0.5\times 1)+2.61=3.11$
Path BC: $\tau_{BC}(t+1)=(0.5\times 1)+2.61=3.11$
Path BD: $\tau_{BD}(t+1)=(0.5\times 1)+0=0.5$
Path CD: $\tau_{CD}(t+1)=(0.5\times 1)+2.61=3.11$

Hence, the sizes of the pheromones on the paths after evaporation are:
$$\tau_{AD}=3.11,\quad \tau_{AC}=0.5,\quad \tau_{AB}=3.11,\quad \tau_{BC}=3.11,\quad \tau_{BD}=0.5,\quad \tau_{CD}=3.11$$
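As a quick cross-check of the arithmetic above, the transition ratios, pheromone deposits and evaporation can be scripted. The following is a minimal sketch in Python, assuming the edge lengths of this example, Q = 20, ρ = 0.5, α = 1, β = 5 and unit initial pheromone on every edge:

    # Sketch: reproduces the pheromone arithmetic of the ACO example above.
    dist = {('A','B'): 8, ('A','C'): 7, ('A','D'): 5,
            ('B','C'): 4, ('B','D'): 10, ('C','D'): 6}
    alpha, beta, rho, Q = 1, 5, 0.5, 20
    tau = {e: 1.0 for e in dist}            # unit initial pheromone

    def edge(i, j):                         # edges are undirected
        return (i, j) if (i, j) in dist else (j, i)

    def attract(i, j):                      # (tau_ij)^alpha * (1/d_ij)^beta
        return tau[edge(i, j)] ** alpha * (1.0 / dist[edge(i, j)]) ** beta

    # Relative attractiveness of D->C against the competing edge D->B,
    # computed as in the example (numerator edge over the other candidates):
    print(attract('D', 'C') / attract('D', 'B'))   # ~12.9 (the "13" above)

    # Each ant's tour has length 23, so it deposits Q/L = 20/23 = 0.87 per edge
    tours = [['A','D','C','B','A'], ['C','B','A','D','C'], ['B','C','D','A','B']]
    deposit = {e: 0.0 for e in dist}
    for tour in tours:
        L = sum(dist[edge(a, b)] for a, b in zip(tour, tour[1:]))
        for a, b in zip(tour, tour[1:]):
            deposit[edge(a, b)] += Q / L

    # Evaporation: tau(t+1) = (1 - rho) * tau(t) + total deposit
    for e in dist:
        tau[e] = (1 - rho) * tau[e] + deposit[e]
        print(e, round(tau[e], 2))   # AD, AB, BC, CD -> 3.11; AC, BD -> 0.5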
Particle Swarm Algorithm


-Particle swarm (PS) algorithm simulates the behaviour of bird flocking.
-It was proposed by James Kennedy & Russell Eberhart in 1995
-Suppose that
* a group of birds is randomly searching for food in an area, and there is only one piece of food in the area being searched;
* none of the birds knows where the food is, but each knows how far away the food is after each movement.
-Given this scenario, what is the best strategy for finding the food?
* The best strategy is to follow the bird which is nearest to the food.

-In the PS algorithm, each single solution is a "bird" in the search space and is called a "particle".
-Each particle has a fitness value, which is evaluated by the fitness function to be optimized, and a velocity, which directs the flying of the particle.
-The particles fly through the problem space by following the current optimum particle.

-PS algorithm is initialized with a group of random particles (i.e., solutions) and then searches for optima
by updating generations.

-In every iteration t, each particle i is updated by the following two "best" values:
* The first one is the best solution (i.e., fitness) the particle has achieved so far. This value is called $P_{best,i}^{t}$.
* The second one is the overall best solution (i.e., fitness) among all the particles in the population that has been achieved so far. This best value is a global best and is called $G_{best}^{t}$.

-After finding the two best values, the particle updates its velocity and position with Eqs. (10) and (11), respectively, as follows:

$$v_{i,j}^{t+1}=\underbrace{w\,v_{i,j}^{t}}_{\text{inertia}}+\underbrace{c_{1}r_{1,j}^{t}\left(P_{best,i}^{t}-x_{i,j}^{t}\right)}_{\text{cognitive component}}+\underbrace{c_{2}r_{2,j}^{t}\left(G_{best}^{t}-x_{i,j}^{t}\right)}_{\text{social component}}\qquad(10)$$

$$x_{i}^{t+1}=x_{i}^{t}+v_{i}^{t+1}\qquad(11)$$


where
$v_{i,j}^{t}$ is the velocity vector of particle i in dimension j at time t;
$w$ is the inertia weight factor;
$x_{i,j}^{t}$ is the position vector of particle i in dimension j at time t;
$P_{best,i}^{t}$ is the personal best position of particle i in dimension j found from initialization through time t;
$G_{best}^{t}$ is the global best position among all particles in dimension j found from initialization through time t;
$c_1$ and $c_2$ are positive acceleration constants which are used to balance the contributions of the cognitive and social components, respectively;
$r_{1,j}^{t}$ and $r_{2,j}^{t}$ are random numbers from the uniform distribution U(0,1) at time t.

-In Eq. (10),
* the inertia component makes the particle keep moving in the same direction and with the same velocity;
* the cognitive component makes the particle move towards its own best position found so far;
* the social component makes the particle follow the direction of its best neighbours.
-Figs. 12(a) through (d) show the position updates for more than one particle in a two dimensional search
space
-Fig. 12(a) shows the initial position of all particles. The cognitive component is zero at t = 0 and all
particles are only attracted toward the best position by the social component.

-Fig 12(b) shows the new positions of all particles after the first iteration i.e., at t = 1

-Fig 12(c) shows the new positions of all particles after the second iteration i.e., at t = 2

-Fig 12(d) shows the new positions of all particles after several iterations i.e. at t >>1

[Figs. 12(a)-(d): fitness landscapes over a two-dimensional (x, y) search space; only the captions are reproduced here.]

Fig. 12(a) Initial positions of all particles at time t = 0. All particles are only attracted toward the global best position.
Fig. 12(b) The new positions of all particles after the first iteration, i.e., at t = 1.
Fig. 12(c) The new positions of all particles after the second iteration, i.e., at t = 2.
Fig. 12(d) The new positions of all particles after several iterations, i.e., at t >> 1.
Consider the global optimum of an n-dimensional function defined by

$$f(x_{1},x_{2},x_{3},\ldots,x_{n})=f(X)$$

where $x_i$ is a search variable, representing the set of free variables of the given function. The aim is to find a value $x^{*}$ such that the function $f(x^{*})$ is either a maximum or a minimum in the search space.

The particle swarm algorithm is as follows:

For each particle i in I do
    Initialize parameters
End For
For time t = 1 to t_max do
    For each particle i in I do
        Fitness_x = f(x_i^t)
        If Fitness_x is better than f(P_best,i^t) then
            P_best,i^t = x_i^t
        End If
    End For
    G_best^t = the best P_best,i^t among all particles i in I
    For each particle i in I do
        v_i^{t+1} = w*v_i^t + c1*r1^t*(P_best,i^t - x_i^t) + c2*r2^t*(G_best^t - x_i^t)
        x_i^{t+1} = x_i^t + v_i^{t+1}
    End For
End For

In the particle swarm algorithm,
-The number of particles is usually between 10 and 50
-c1 and c2 are usually chosen such that c1 + c2 = 4
-r1 and r2 are random numbers in the interval [0, 1], usually selected using a uniform distribution
-If the velocity, v, is too low → the algorithm is too slow
-If the velocity, v, is too high → the algorithm is too unstable
-The algorithm continues to run until either
 * a predetermined maximum number of iterations is reached, or
 * a predetermined stopping error criterion is achieved
A sketch implementation that ties these pieces together is given below.
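The following is a minimal sketch in Python of the algorithm above, applied to maximizing the objective used in the worked example that follows; the population size, w, c1, c2 and iteration count are illustrative choices, not values prescribed by this handout. New positions are clamped to the search interval, as discussed in the note at the end of the example.

    import random

    def f(x):
        return -x**2 + 3*x - 7                     # objective to maximize

    def pso(f, x_min=-12.0, x_max=12.0, n=20, t_max=100, w=0.7, c1=2.0, c2=2.0):
        x = [random.uniform(x_min, x_max) for _ in range(n)]
        v = [0.0] * n                              # zero initial velocities
        p_best = x[:]                              # personal best positions
        g_best = max(x, key=f)                     # global best position
        for _ in range(t_max):
            for i in range(n):
                r1, r2 = random.random(), random.random()
                # Eq. (10): inertia + cognitive component + social component
                v[i] = (w * v[i] + c1 * r1 * (p_best[i] - x[i])
                                 + c2 * r2 * (g_best - x[i]))
                # Eq. (11), with the new position clamped to the interval
                x[i] = min(max(x[i] + v[i], x_min), x_max)
                if f(x[i]) > f(p_best[i]):
                    p_best[i] = x[i]
            g_best = max(p_best, key=f)
        return g_best

    best = pso(f)
    print(best, f(best))        # approaches x = 1.5, where f(1.5) = -4.75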

Example
Using particle swarm optimization, maximize the function
$f(x)=-x^{2}+3x-7$
s.t.:
−12 ≤ x ≤ 12
Show the detailed computations for the first 3 iterations (the stopping criterion), with the parameters selected as follows:
-Use 5 particles with the initial positions x1 = −8.1, x2 = −3.6, x3 = −1.7, x4 = 2.8, and x5 = 5.3
-Use zero initial particle velocities
-Use cognitive random number r1 = 0.25 and social random number r2 = 0.90
-Use cognitive acceleration constant c1 = 1 and social acceleration constant c2 = 1
-Use inertia weight factor w = 1

Solution

Step 1: Initialize the population

The initial population $x_i^0$ (i = 1, 2, …, 5) at time t = 0 is as follows:
$x_1^0=-8.1$, $x_2^0=-3.6$, $x_3^0=-1.7$, $x_4^0=2.8$, $x_5^0=5.3$

The fitness of each particle is evaluated as follows:
$f(x)=-x^{2}+3x-7$
$f_1^0=-(-8.1)^2+3(-8.1)-7=-96.91$
$f_2^0=-(-3.6)^2+3(-3.6)-7=-30.76$
$f_3^0=-(-1.7)^2+3(-1.7)-7=-14.99$
$f_4^0=-(2.8)^2+3(2.8)-7=-6.44$
$f_5^0=-(5.3)^2+3(5.3)-7=-19.19$

The initial velocities of the particles are:
$v_i^0=0$, i.e., $v_1^0=v_2^0=v_3^0=v_4^0=v_5^0=0$

Step 2: Set the iteration number as t = 0 + 1 = 1 and go to Step 3.

Step 3: Find the personal best for each particle:

$$P_{best,i}^{t+1}=\begin{cases}x_i^{t+1} & \text{if } f(x_i^{t+1})>f(P_{best,i}^{t})\\ P_{best,i}^{t} & \text{otherwise}\end{cases}$$

$P_{best,1}^{1}=-8.1$, $P_{best,2}^{1}=-3.6$, $P_{best,3}^{1}=-1.7$, $P_{best,4}^{1}=2.8$, $P_{best,5}^{1}=5.3$

Step 4: Find the global best among all particles:

$$G_{best}=\max\{P_{best,i}^{t}\}$$

From the computed fitness values, $f_4^0=-6.44$ is the highest.
Hence, $P_{best,4}^{1}=2.8$ and therefore $G_{best}=2.8$.

Step 5: Compute the velocities of the particles:

$$v_i^{t+1}=w\,v_i^{t}+c_1 r_1^{t}\left(P_{best,i}^{t}-x_i^{t}\right)+c_2 r_2^{t}\left(G_{best}-x_i^{t}\right)$$

$v_1^1=0+1(0.25)(-8.1-(-8.1))+1(0.9)(2.8-(-8.1))=9.81$
$v_2^1=0+1(0.25)(-3.6-(-3.6))+1(0.9)(2.8-(-3.6))=5.76$
$v_3^1=0+1(0.25)(-1.7-(-1.7))+1(0.9)(2.8-(-1.7))=4.05$
$v_4^1=0+1(0.25)(2.8-2.8)+1(0.9)(2.8-2.8)=0$
$v_5^1=0+1(0.25)(5.3-5.3)+1(0.9)(2.8-5.3)=-2.25$

Step 6: Find the new positions of the particles, $x_i^1$:

$x_1^1=-8.1+9.81=1.71$
$x_2^1=-3.6+5.76=2.16$
$x_3^1=-1.7+4.05=2.35$
$x_4^1=2.8+0=2.8$
$x_5^1=5.3-2.25=3.05$

Step 7: Compute the fitness values of the new positions of the particles, $x_i^1$:

$f(x)=-x^{2}+3x-7$
$f_1^1=-(1.71)^2+3(1.71)-7=-4.79$
$f_2^1=-(2.16)^2+3(2.16)-7=-5.19$
$f_3^1=-(2.35)^2+3(2.35)-7=-5.47$
$f_4^1=-(2.8)^2+3(2.8)-7=-6.44$
$f_5^1=-(3.05)^2+3(3.05)-7=-7.15$

Step 8: Stopping criterion

If the termination criterion is not satisfied, go to Step 2; otherwise, stop the iteration and output the results.

Step 2: Set the iteration number as t = 1 + 1 = 2 and go to Step 3.

Step 3: Find the personal best for each particle:

$$P_{best,i}^{t+1}=\begin{cases}x_i^{t+1} & \text{if } f(x_i^{t+1})>f(P_{best,i}^{t})\\ P_{best,i}^{t} & \text{otherwise}\end{cases}$$

$P_{best,1}^{2}=1.71$, $P_{best,2}^{2}=2.16$, $P_{best,3}^{2}=2.35$, $P_{best,4}^{2}=2.8$, $P_{best,5}^{2}=3.05$

Step 4: Find the global best among all particles:

$$G_{best}=\max\{P_{best,i}^{t}\}$$

From the computed fitness values, $f_1^1=-4.79$ is the highest.
Hence, $P_{best,1}^{2}=1.71$ and therefore $G_{best}=1.71$.

Step 5: Compute the velocities of the particles:

$v_1^2=9.81+1(0.25)(1.71-1.71)+1(0.9)(1.71-1.71)=9.81$
$v_2^2=5.76+1(0.25)(2.16-2.16)+1(0.9)(1.71-2.16)=5.36$
$v_3^2=4.05+1(0.25)(2.35-2.35)+1(0.9)(1.71-2.35)=3.47$
$v_4^2=0+1(0.25)(2.8-2.8)+1(0.9)(1.71-2.8)=-0.98$
$v_5^2=-2.25+1(0.25)(3.05-3.05)+1(0.9)(1.71-3.05)=-3.46$

Step 6: Find the new positions of the particles, $x_i^2$:

$x_1^2=1.71+9.81=11.52$
$x_2^2=2.16+5.36=7.52$
$x_3^2=2.35+3.47=5.82$
$x_4^2=2.8-0.98=1.82$
$x_5^2=3.05-3.46=-0.41$

Step 7: Compute the fitness values of the new positions of the particles, $x_i^2$:

$f_1^2=-(11.52)^2+3(11.52)-7=-105.15$
$f_2^2=-(7.52)^2+3(7.52)-7=-40.99$
$f_3^2=-(5.82)^2+3(5.82)-7=-23.41$
$f_4^2=-(1.82)^2+3(1.82)-7=-4.85$
$f_5^2=-(-0.41)^2+3(-0.41)-7=-8.40$

Step 8: Stopping criterion

If the termination criterion is not satisfied, go to Step 2; otherwise, stop the iteration and output the results.
Step 2: Set the iteration number as t = 2 + 1 = 3 and go to Step 3.

Step 3: Find the personal best for each particle:

$$P_{best,i}^{t+1}=\begin{cases}x_i^{t+1} & \text{if } f(x_i^{t+1})>f(P_{best,i}^{t})\\ P_{best,i}^{t} & \text{otherwise}\end{cases}$$

$P_{best,1}^{3}=1.71$, $P_{best,2}^{3}=2.16$, $P_{best,3}^{3}=2.35$, $P_{best,4}^{3}=1.82$, $P_{best,5}^{3}=3.05$

Step 4: Find the global best among all particles:

$$G_{best}=\max\{P_{best,i}^{t}\}$$

From the computed fitness values, $f_1^1=-4.79$ is still the highest.
Hence, $P_{best,1}^{3}=1.71$ and therefore $G_{best}=1.71$.

Step 5: Compute the velocities of the particles:

$v_1^3=9.81+1(0.25)(1.71-11.52)+1(0.9)(1.71-11.52)=-1.47$
$v_2^3=5.36+1(0.25)(2.16-7.52)+1(0.9)(1.71-7.52)=-1.21$
$v_3^3=3.47+1(0.25)(2.35-5.82)+1(0.9)(1.71-5.82)=-1.10$
$v_4^3=-0.98+1(0.25)(1.82-1.82)+1(0.9)(1.71-1.82)=-1.08$
$v_5^3=-3.46+1(0.25)(3.05-(-0.41))+1(0.9)(1.71-(-0.41))=-0.69$

Step 6: Find the new positions of the particles, $x_i^3$:

$x_1^3=11.52-1.47=10.05$
$x_2^3=7.52-1.21=6.31$
$x_3^3=5.82-1.10=4.72$
$x_4^3=1.82-1.08=0.74$
$x_5^3=-0.41-0.69=-1.10$

Step 7: Compute the fitness values of the new positions of the particles, $x_i^3$:

$f_1^3=-(10.05)^2+3(10.05)-7=-77.85$
$f_2^3=-(6.31)^2+3(6.31)-7=-27.89$
$f_3^3=-(4.72)^2+3(4.72)-7=-15.12$
$f_4^3=-(0.74)^2+3(0.74)-7=-5.33$
$f_5^3=-(-1.10)^2+3(-1.10)-7=-11.51$

The stopping criterion has been reached, since 3 iterations have been completed.

$$P_{best,i}^{t+1}=\begin{cases}x_i^{t+1} & \text{if } f(x_i^{t+1})>f(P_{best,i}^{t})\\ P_{best,i}^{t} & \text{otherwise}\end{cases}$$

$P_{best,1}=1.71$, $P_{best,2}=2.16$, $P_{best,3}=2.35$, $P_{best,4}=1.82$, $P_{best,5}=3.05$
(since $f(-1.10)=-11.51$ is worse than $f(3.05)=-7.15$, the personal best of particle 5 remains 3.05)

$$G_{best}=\max\{P_{best,i}^{t}\}$$

From the computed fitness values, $f_1^1=-4.79$ is the overall highest, which corresponds to x = 1.71.
Hence, x = 1.71.

Note that:
During the computation, if a value of x is obtained that does not fall within the given boundary, the closest boundary value should be selected for x, as illustrated by the snippet below.
For example, supposing $x_1^3$ was obtained to be 15: since 15 does not fall within the given range −12 ≤ x ≤ 12, then $x_1^3$ = 12 is chosen, since 12 is the closest to 15 among all the allowed values of x.
Likewise, supposing $x_1^3$ was obtained to be −18: since −18 does not fall within the given range −12 ≤ x ≤ 12, then $x_1^3$ = −12 is chosen, since −12 is the closest to −18 among all the allowed values of x.
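A minimal sketch of this boundary rule in Python (using the bounds of this example):

    def clamp(x, x_min=-12, x_max=12):
        # An out-of-range value is replaced by the closest boundary value
        return max(x_min, min(x, x_max))

    print(clamp(15))     # -> 12, as in the first case above
    print(clamp(-18))    # -> -12, as in the second case above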

Genetic Algorithm
Genetic Algorithms (GA) are based on the principle of survival of the fittest, sometimes called natural selection.
In GA, many potential solutions to a problem are created, and each solution is evaluated to see how good it is. The best solutions are allowed to breed with each other.
This cycle continues in the hope that better solutions will emerge. Each solution is normally called a chromosome (or an individual). Each chromosome is made up of genes, which are the individual elements that represent the problem. The collection of chromosomes is called a population.

Firstly, suitable "parents" must be chosen. Parents are normally chosen after the individuals have been evaluated and rated. In doing this, the fitter individuals are more likely to breed, but the weaker members of the population also have the opportunity.

After choosing the parents, two offspring (normally) are produced from the two parents. The children
consist of genetic material taken from both parents. How the genetic material is distributed can be done in a
number of ways, which will be discussed shortly.

However, breeding is not done all the time. There is a probability associated with each breeding pair as to
whether they produce children or not. The probability of breeding is usually set to about sixty percent but
other figures are also possible.

Mutation happens with low probability and how the mutation occurs depends on the coding that is being
used. If the problem is being represented by bit strings then mutation is fairly easy to implement. It can
simply look at each bit in the chromosome and decide (with some low probability) if the bit should be
replaced with a randomly produced bit.

Genetic algorithm does not usually have any knowledge about the problem it is trying to solve. The only
part of the GA that has some domain knowledge is the evaluation function. This function is given a
chromosome and passes back an evaluation for the chromosome. It is this evaluation rating that the
breeding mechanism uses in deciding which chromosomes should breed.

The breeding mechanism has no knowledge about the problem. In its simplest form a GA is just
manipulating bit strings.

The GA Algorithm

A Genetic Algorithm can be implemented using the following outline algorithm

1. Initialise a population of chromosomes


2. Evaluate each chromosome (individual) in the population
2.1. Create new chromosomes by mating chromosomes in the current population (using crossover and
mutation)
2.2. Delete members of the existing population to make way for the new members
2.3. Evaluate the new members and insert them into the population
3. Repeat stage 2 until some termination condition is reached (normally based on time or number of
populations produced)
4. Return the best chromosome as the solution.
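A compact sketch of this outline in Python is given below. The fitness function, population size, rates and chromosome length are illustrative placeholders; parents are picked uniformly at random for brevity (roulette wheel and tournament selection are described later in this section), and replacement is delete-all.

    import random

    def fitness(bits):
        x = int(bits, 2)                       # decode the bit string
        return x * x - 3 * x + 12              # illustrative objective

    def mutate(bits, p_mut):
        # Each bit may be replaced by a randomly produced bit
        return ''.join(random.choice('01') if random.random() < p_mut else b
                       for b in bits)

    def ga(pop_size=4, n_bits=5, generations=20, p_cross=0.6, p_mut=0.008):
        pop = [''.join(random.choice('01') for _ in range(n_bits))
               for _ in range(pop_size)]                    # 1. initialise
        for _ in range(generations):                        # 3. repeat stage 2
            new_pop = []
            while len(new_pop) < pop_size:
                p1, p2 = random.sample(pop, 2)              # choose parents
                if random.random() < p_cross:               # 2.1 crossover
                    pt = random.randrange(1, n_bits)
                    p1, p2 = p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]
                new_pop += [mutate(p1, p_mut), mutate(p2, p_mut)]
            pop = new_pop[:pop_size]                        # 2.2 delete-all
        return max(pop, key=fitness)                        # 4. best chromosome

    best = ga()
    print(best, int(best, 2), fitness(best))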

Definition of Terms:
Allele : The possible values that can be taken by a gene are called alleles.
Chromosome : An individual within the population. You may also see the following terms, which mean the same thing: individual, solution, string and vector.
Gene : Genes are the basic building blocks which form chromosomes. In the examples in this handout, we have mainly considered bit strings; each bit is a gene.
Genotype : The genotype is the encoded representation of the chromosome. The classical representation is bit strings, but many other representations and data structures are possible.
Locus : The position of a gene within a chromosome is called its locus.
Phenotype : The phenotype is the physical expression of the chromosome. For example, a chromosome might consist of bit strings, but it could actually represent integers or real numbers, and those could represent anything.
Population : A set of chromosomes that represent a pool of solutions that are currently being considered.

Chromosome Evaluation
It is very important to note that this is the only part of the GA that has any knowledge about the problem
that is to be solved. The rest of the GA modules are simply operating on (typically) bit strings with no
information about the problem.

Population Creation
Some techniques are usually employed in creating the required population. These techniques are as follows

▪ Initialisation Technique
This technique determines how the initial population is initialised. It is often the case that a random initialisation is done: in the case of a binary-coded chromosome, this means that each bit is initialised to a random zero or one. But there are instances where the population is initialised with some known good solutions. This might be applicable where, for example, a good solution is known but it is desired to try and improve on it.

▪ Deletion Technique
This technique determines how the population is deleted at each generation of the GA.
Three common deletion techniques are
Delete-All : This technique deletes all the members of the current population and
replaces them with the same number of chromosomes that have just been
created.
Steady-State : This technique deletes n old members and replaces them with n new
members. The number to delete and replace (n) at any one time is a
parameter to this deletion technique.
Another consideration for this technique is deciding which members to
delete from the current population. Should the worst individuals be
deleted? Should deletion candidates be picked at random? Should parent
chromosomes be the ones to be deleted?
Steady-State-No-Duplicates : This is the same as the steady-state technique but the algorithm checks
that no duplicate chromosomes are added to the population.

▪ Parent Selection Technique


The fittest individuals from the population are usually used as parents but it is sometimes beneficial to use
less fit individuals as well so that more of the search space is explored in order to increase the chances of
producing promising offspring.

The danger of always using, say, only the best chromosomes is that the population quickly converges to
one of these individuals and an inferior final solution is likely.

Two common parent selection techniques are described below.


Roulette Wheel Selection : The idea behind the roulette wheel parent selection technique is that each
individual is given a chance to become a parent in proportion to its fitness
evaluation. It is called roulette wheel selection as the chances of selecting
a parent can be seen as spinning a roulette wheel with the size of the slot
for each parent being proportional to its fitness. Obviously those with the
largest fitness (and slot sizes) have more chance of being chosen.
Roulette wheel selection can be implemented as follows
1. Sum the fitnesses of all the population members. Call this TF (total
fitness).
2. Generate a random number m, between 0 and TF.
3. Return the first population member whose fitness added to the
preceding population members is greater than or equal to m.
The problem with roulette wheel selection is that one member can
dominate all the others and get selected a high proportion of times. The
reverse is also true. If the evaluation of the members are very close then
they will have an almost equal chance of being selected. To get round
these problems various fitness normalisation techniques can be used.
These are described shortly.
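The three-step recipe above translates almost line for line into code; a sketch in Python, assuming a list of non-negative fitness values (one reason the normalisation techniques described shortly are needed):

    import random

    def roulette_select(population, fitnesses):
        total = sum(fitnesses)                 # 1. TF, the total fitness
        m = random.uniform(0, total)           # 2. random number in [0, TF]
        running = 0.0
        for individual, fit in zip(population, fitnesses):
            running += fit                     # 3. first member whose cumulative
            if running >= m:                   #    fitness reaches m
                return individual
        return population[-1]                  # guard against rounding error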

Tournament Selection : In effect, potential parents are selected and a tournament is held to decide
which of the individuals will be the parent. There are many ways this can
be achieved and two of them are
1. Select a pair of individuals at random. Generate a random number, R,
between 0 and 1. If R < r use the first individual as a parent. If the R
>= r then use the second individual as the parent. This is repeated to
select the second parent. The value of r is a parameter to this method.
2. Select two individuals at random. The individual with the highest
evaluation becomes the parent. Repeat to find a second parent.
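Both variants are only a few lines each; sketches in Python (r = 0.75 is an illustrative value for the parameter of the first method):

    import random

    def tournament_prob(population, r=0.75):
        # Method 1: pick a random pair; the first wins with probability r
        a, b = random.sample(population, 2)
        return a if random.random() < r else b

    def tournament_best(population, fitness):
        # Method 2: pick a random pair; the fitter one becomes the parent
        a, b = random.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b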

▪ Fitness Technique
As mentioned above, using the evaluation to choose parents can lead to problems. For example, if one
individual has an evaluation that is higher than all the other members of the population then that
chromosome will get chosen a lot and will dominate the population. Similarly, if the population has almost
identical evaluations then they have an almost equal chance of being selected, which will lead to an almost
random search.
In order to solve this problem, each chromosome is sometimes given two values, an evaluation and a
fitness. The fitness is a normalised evaluation so that parent selection is done more fairly. Some of the
methods for calculating fitness are described below.

Fitness-Is-Evaluation : It is common to simply have the fitness of the chromosome equal to its
evaluation.
Windowing : The windowing evaluation technique takes the lowest evaluation and
assigns each chromosome a fitness equal to the amount it exceeds this
minimum.
Linear Normalization : The chromosomes are sorted by decreasing evaluation value. Then the
chromosomes are assigned a fitness value that starts with a constant value
and decreases linearly. The initial value and the decrement are parameters
to the techniques.
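Sketches of the windowing and linear normalisation rules in Python (the start value and decrement of the linear version are its parameters; 100 and 10 are illustrative choices):

    def windowing(evaluations):
        # Fitness = amount by which each evaluation exceeds the minimum
        lowest = min(evaluations)
        return [e - lowest for e in evaluations]

    def linear_normalisation(evaluations, start=100.0, decrement=10.0):
        # Sort by decreasing evaluation; assign start, start - decrement, ...
        order = sorted(range(len(evaluations)),
                       key=lambda i: evaluations[i], reverse=True)
        fitness = [0.0] * len(evaluations)
        for rank, i in enumerate(order):
            fitness[i] = start - rank * decrement
        return fitness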

▪ Population Size
This parameter determines how many chromosomes should be in the population at any one time.

▪ Elitism
It is sometimes the case that a good solution previously found during the GA run gets deleted from the
population as the GA progresses. One solution is to “remember” the best solution found so far.
Alternatively a technique called elitism can be used.
This technique ensures that the best members of the population are carried forward from one generation to
the next.
It is usual to supply a parameter to the elitism function that says what percentage of the population should
be carried over from one generation to the next.
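A sketch of how elitism might be applied between generations in Python (the fraction carried over is the parameter mentioned above; 0.1 is an illustrative value):

    def apply_elitism(old_pop, new_pop, fitness, elite_frac=0.1):
        # Carry the best elite_frac of the old population into the next one
        n_elite = max(1, int(elite_frac * len(old_pop)))
        elite = sorted(old_pop, key=fitness, reverse=True)[:n_elite]
        rest = sorted(new_pop, key=fitness, reverse=True)[:len(old_pop) - n_elite]
        return elite + rest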

Reproduction
Chromosomes are bred through the reproduction process. Typically, two parents are selected, from which two offspring are produced. Reproduction is carried out mainly by crossover operators, with mutation playing a lesser, but still important, role.

Operators
The first GA operator to be developed was one-point crossover. The other operators have been added as the
GA field has developed. There have also been operators developed for specific problems.

▪ One-Point Crossover
One-point crossover takes two parents and breeds two children. It works as follows

Parent 1 1 0 1 1 1 0 1
Parent 2 1 1 0 0 1 1 0

Child 1 1 0 0 0 1 1 0
Child 2 1 1 1 1 1 0 1

• Two parents are selected.
• A crossover point is chosen at random (in the example above, it falls between the second and third bits).
• Child 1 is built by taking genes from parent 1 from the left of the crossover point and genes from parent 2 from the right of the crossover point.
• Child 2 is built in the same way, but it takes genes from the left of the crossover point of parent 2 and genes from the right of the crossover point of parent 1.
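In code, this is just a pair of string slices; a sketch in Python (the point is fixed at 2 here to reproduce the example above, and chosen at random otherwise):

    import random

    def one_point_crossover(parent1, parent2, point=None):
        if point is None:
            point = random.randrange(1, len(parent1))   # random crossover point
        child1 = parent1[:point] + parent2[point:]
        child2 = parent2[:point] + parent1[point:]
        return child1, child2

    print(one_point_crossover('1011101', '1100110', point=2))
    # -> ('1000110', '1111101'), the two children shown above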

Two-Point Crossover
Two-point crossover works in a similar way as one point crossover but two crossover points are selected. It
is also possible to have n-point crossover.

Uniform Crossover
For each bit position of the two children we decide, at random, which parent will contribute its bit value to
that child.
The algorithm can be implemented as follows:

Parent 1 1 0 1 1 1 0 1
Parent 2 1 1 0 0 1 1 0
Template 0 1 1 0 0 1 0

Child 1 1 0 1 0 1 0 0
Child 2 1 1 0 1 1 1 1

• Two parents are selected


• A template is created which consists of random bits
• Child 1 receives bits from parent 1 indicated by a one in the template and bits from parent 2 indicated
by a zero in the template.
• Child 2 is built in the same way but receives bits from parent 1 where there is a zero in the template and
bits from parent 2 when there is a one in the template.
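A sketch of uniform crossover in Python, using the template of the example above (a random template is generated when none is supplied):

    import random

    def uniform_crossover(parent1, parent2, template=None):
        if template is None:                       # random bit template
            template = [random.randint(0, 1) for _ in parent1]
        child1 = ''.join(a if t else b
                         for a, b, t in zip(parent1, parent2, template))
        child2 = ''.join(b if t else a
                         for a, b, t in zip(parent1, parent2, template))
        return child1, child2

    print(uniform_crossover('1011101', '1100110', [0, 1, 1, 0, 0, 1, 0]))
    # -> ('1010100', '1101111'), the two children shown above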

Mutation
If we use a crossover operator such as one-point crossover we may get better and better chromosomes but
the problem is, if the two parents (or worse – the entire population) have the same value in a certain
position then one-point crossover will not change that. In other words, that bit will have the same value
forever. Mutation is designed to overcome this problem and add some diversity to the population.

The most common way that mutation is implemented is to select a bit at random and randomly change it to
a zero or a one. Other mutation operators may swap parts of the gene or may develop problem specific
mutation operators.

Mutation Rate
This is a parameter to the GA algorithm. It defines how often mutation should be applied. A typical value is 0.008. Therefore, when presented with a chromosome, the mutation operator sweeps down the bit string and each bit has roughly one chance in 125 of being mutated.
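A sketch of this per-bit mutation sweep in Python (the default rate is the typical value quoted above):

    import random

    def mutate(chromosome, rate=0.008):
        # Sweep down the bit string; each bit is replaced, with probability
        # `rate`, by a randomly produced bit (which may equal the original)
        return ''.join(random.choice('01') if random.random() < rate else bit
                       for bit in chromosome)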

Crossover Rate
The crossover rate defines how often crossover should be applied. A typical value is 0.6. This means that
when presented with two parents there is a 60% chance that the parents will breed.

Example
Given the objective function

f(x) = x² − 3x + 12
s.t.:
0 ≤ x ≤ 31
Assuming the initial population is as provided in Table 6, apply the genetic algorithm and determine the value of x that maximizes the given expression.
Hint:
Stopping criteria = stop at the end of the 2nd generation
Mutation probability = 0.0
Cross-over probability = 1.0
Use single-point cross-over with cross-over point of 3
Use Roulette wheel selection technique
Use Delete-All replacement technique

Table 6
Parent     String
Parent 1   10010
Parent 2   01111
Parent 3   01010
Parent 4   00111

Solution

First Generation

Parent     String    x     f(x) = x² − 3x + 12
Parent 1   10010     18    282
Parent 2   01111     15    192
Parent 3   01010     10    82
Parent 4   00111     7     40
                  Total    596
                Average    149

$$\sum_{i=1}^{n} f(x_i)=282+192+82+40=596$$

$$\text{Average}=\frac{\sum_{i=1}^{n} f(x_i)}{n}=\frac{596}{4}=149$$
For parent P1, selection probability $=\dfrac{f(x_1)}{\sum_{i=1}^{n} f(x_i)}=\dfrac{282}{596}=0.47$

For parent P2, selection probability $=\dfrac{192}{596}=0.32$

For parent P3, selection probability $=\dfrac{82}{596}=0.14$

For parent P4, selection probability $=\dfrac{40}{596}=0.07$

For parent P1, expected count $=\dfrac{f(x_1)}{\text{Avg } f(x)}=\dfrac{282}{149}=1.89\approx 2$

For parent P2, expected count $=\dfrac{192}{149}=1.29\approx 1$

For parent P3, expected count $=\dfrac{82}{149}=0.55\approx 1$

For parent P4, expected count $=\dfrac{40}{149}=0.27\approx 0$

Parent       Binary string   Cross-over   Binary string   x value of   f(x) = x² − 3x + 12
chromosome   of parent       point        of offspring    offspring
P1           10010           3            10011           19           316
P2           01111           3            01110           14           166
P1           10010           3            10010           18           282
P3           01010           3            01010           10           82

Second Generation

Parent     String    x     f(x) = x² − 3x + 12
Parent 1   10011     19    316
Parent 2   01110     14    166
Parent 3   10010     18    282
Parent 4   01010     10    82
                  Total    846
                Average    211.5

$$\sum_{i=1}^{n} f(x_i)=316+166+282+82=846$$

$$\text{Average}=\frac{\sum_{i=1}^{n} f(x_i)}{n}=\frac{846}{4}=211.5$$

For parent P1, selection probability $=\dfrac{316}{846}=0.37$

For parent P2, selection probability $=\dfrac{166}{846}=0.20$

For parent P3, selection probability $=\dfrac{282}{846}=0.33$

For parent P4, selection probability $=\dfrac{82}{846}=0.10$

For parent P1, expected count $=\dfrac{316}{211.5}=1.49\approx 2$

For parent P2, expected count $=\dfrac{166}{211.5}=0.78\approx 1$

For parent P3, expected count $=\dfrac{282}{211.5}=1.33\approx 1$

For parent P4, expected count $=\dfrac{82}{211.5}=0.39\approx 0$

(The expected counts are rounded so that they sum to the population size of 4.)

Parent       Binary string   Cross-over   Binary string   x value of   f(x) = x² − 3x + 12
chromosome   of parent       point        of offspring    offspring
P1           10011           3            10010           18           282
P2           01110           3            01111           15           192
P1           10011           3            10010           18           282
P3           10010           3            10011           19           316

At the end of the second generation, the values of x and the corresponding values of the objective function are:
For P1 = 10010₂ = 18₁₀, f(x) = 282
For P2 = 01111₂ = 15₁₀, f(x) = 192
For P3 = 10010₂ = 18₁₀, f(x) = 282
For P4 = 10011₂ = 19₁₀, f(x) = 316

max(10010₂, 01111₂, 10010₂, 10011₂) ≡ max(f(18), f(15), f(18), f(19)) = max(282, 192, 282, 316) = 316

Therefore, the value of x that maximizes the objective function at the end of the second generation is
x = 19
