CSS3563 Artificial Intelligence
Topic 5 – Intelligent System Applications
University College of Technology Sarawak
Outline
• What is an intelligent system?
• Overview of Intelligent System Applications
• Artificial Neural Networks (ANN)
• Natural Language Processing (NLP)
• Genetic Algorithms (GA)
• Swarm Intelligence
What is an intelligent system?
• What is intelligence?
• Hard to define unless you list characteristics, e.g.:
• Reasoning
• Learning
• Adaptivity
• A truly intelligent system adapts itself to deal with changes in
problems (automatic learning)
• Few machines can do that at present
• Machine intelligence has a computer follow problem-solving
processes similar to those used by humans
• Intelligent systems display machine-level intelligence and reasoning,
often learning, but not necessarily self-adaptation
Intelligent systems in business
• Intelligent systems in business utilise one or more intelligence
tools, usually to aid decision making
• Provides business intelligence to
• Increase productivity
• Gain competitive advantage
• Examples of business intelligence – information on
• Customer behaviour patterns
• Market trends
• Efficiency bottlenecks
• Examples of successful intelligent systems applications in
business:
• Customer service (Customer Relations Modelling)
• Scheduling (eg Mine Operations)
• Data mining
• Financial market prediction
• Quality control
Intelligent systems in business
• HNC (now Fair Isaac) software’s credit card fraud detector Falcon
offers 30-70% improvement over existing methods (an example of a
neural network).
• MetLife insurance uses automated extraction of information from
applications in MITA (an example of language technology use)
• Personalized, Internet-based TV listings (an intelligent agent)
• Hyundai’s FASTrak-Apt, which develops apartment construction
plans (a Case-Based Reasoning project)
• The US Occupational Safety and Health Administration (OSHA)
uses "expert advisors" to help identify fire and other safety hazards
at work sites (an expert system).
• Source: http://www.newsfactor.com/perl/story/16430.html
Introduction to ANN
• Artificial neural networks (ANNs) are a family of models
inspired by biological neural networks (the central nervous
systems of animals, in particular the brain)
• ANNs are used to estimate or approximate functions that
can depend on a large number of inputs and are
generally unknown.
• Artificial neural networks are generally presented as
systems of interconnected "neurons" which exchange
messages between each other.
• The connections have numeric weights (wi) that can be
tuned based on experience, making neural nets adaptive to
inputs and capable of learning.
• https://www.youtube.com/watch?v=gcK_5x2KsLA
ANN
Characteristics of NNs
• Learning from experience: Complex, difficult to solve
problems, but with plenty of data that describe the
problem
• Generalizing from examples: Can interpolate from
previous learning and give the correct response to
unseen data (prediction)
• Rapid applications development: NNs are generic
machines and quite independent from domain knowledge
• Adaptability: Adapts to a changing environment, if it is
properly designed
• Computational efficiency: Although the training of a
neural network demands a lot of computer power, a
trained network demands almost nothing in recall mode.
• Non-linearity: A NN, made up of an interconnection of
nonlinear neurons, is itself nonlinear. The physical mechanism
responsible for generating the input signal (e.g., a
speech signal) is inherently nonlinear
Who is concerned with NNs?
• Computer scientists want to find out about the properties of
non-symbolic information processing with neural nets and about
learning systems in general.
• Statisticians use neural nets as flexible, nonlinear regression and
classification models.
• Engineers of many kinds exploit the capabilities of neural
networks in many areas, such as signal processing and automatic
control.
• Cognitive scientists view neural networks as a possible
apparatus to describe models of thinking and consciousness (High-
level brain function).
• Neuro-physiologists use neural networks to describe and explore
medium-level brain function (e.g. memory, sensory system,
motorics).
• Physicists use neural networks to model phenomena in statistical
mechanics and for a lot of other tasks.
• Biologists use Neural Networks to interpret nucleotide sequences.
• Philosophers and some other people may also be interested in
Neural Networks for various reasons
Neural Network Techniques
• Conventional: Computers have to be explicitly
programmed
• Analyze the problem to be solved.
• Write the code in a programming language (rules) – this is
what we conventionally do.
• Neural networks learn from examples (not
programmed a priori)
• No requirement of an explicit description of the problem –
learn from training data.
• No need for a programmer.
• The neural computer adapts itself during a training period,
based on examples of similar problems even without a
desired solution to each problem.
• After sufficient training the neural computer is able to relate
the problem data to the solutions, inputs to outputs, and it is
then able to offer a viable solution to a brand new problem.
• Able to generalize or to handle incomplete data.
NNs vs Computers
Digital Computers:
• Deductive Reasoning – we apply known rules (programs) to
input data to produce output.
• Computation is centralized, synchronous, and serial.
• Memory is packetted, literally stored, and location addressable.
• Not fault tolerant. One transistor goes and it no longer works.
• Exact.
• Static connectivity.
• Applicable if well-defined rules with precise input data.
Neural Networks:
• Inductive Reasoning – given input and output data (training
examples), we construct the rules.
• Computation is collective, asynchronous, and parallel.
• Memory is distributed, internalized, short term and content
addressable.
• Fault tolerant, redundancy, and sharing of responsibilities.
• Inexact.
• Dynamic connectivity.
• Applicable if rules are unknown or complicated, or if data are
noisy or partial.
Neurons in the Brain
• Although heterogeneous, at a low level the brain is
composed of neurons
• A neuron receives input from other neurons (generally
thousands) from its synapses
• Inputs are approximately summed
• When the input exceeds a threshold the neuron sends an
electrical spike (signal) that travels from the body, down the
axon, to the next neuron(s)
A Simple Artificial Neuron
• An artificial neuron is a device with many inputs and one
output.
• The neuron has two modes of operation:
• the training mode and
• the using mode
A Simple Artificial Neuron
• In the training mode, the neuron can be trained to fire
(or not), for particular input patterns.
• In the using mode, when a taught input pattern is
detected at the input, its associated output becomes the
current output.
• If the input pattern does not belong in the taught list of
input patterns, the firing rule is used to determine
whether to fire or not.
• A firing rule determines how one calculates whether a
neuron should fire for any input pattern. It relates to all
the input patterns, not only the ones on which the node
was previously trained.
Neural Network Learning Process
Artificial Neuron Model
• Neural computing requires a number of neurons, to be
connected together into a neural network.
• Neurons are arranged in layers.
• Each neuron within the network is usually a simple
processing unit which takes one or more inputs and
produces an output.
• At each neuron, every input has an associated weight
which modifies the strength of each input. The neuron
simply adds together all the inputs and calculates an
output to be passed on.
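The neuron described above can be sketched in a few lines: a weighted sum of inputs compared against a threshold. The weights and threshold below are illustrative values, not from any particular network.

```python
# A minimal artificial neuron: weighted sum of inputs passed through a
# step (threshold) activation. Weights and threshold are toy values.

def neuron(inputs, weights, threshold):
    # Weighted sum of all inputs, each modified by its weight
    total = sum(x * w for x, w in zip(inputs, weights))
    # Fire (output 1) only if the summed input exceeds the threshold
    return 1 if total > threshold else 0

# Example: two inputs, both weighted 0.6, threshold 1.0
print(neuron([1, 1], [0.6, 0.6], 1.0))  # 1.2 > 1.0 -> fires: 1
print(neuron([1, 0], [0.6, 0.6], 1.0))  # 0.6 <= 1.0 -> does not fire: 0
```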
An Artificial Neuron Model
• When a neuron receives excitatory input (stimulus) that is
sufficiently large compared with its inhibitory input
(threshold), it sends a spike of electrical activity down its
axon.
• Learning occurs by changing the effectiveness of the
synapses so that the influence of one neuron on another
changes.
• We construct these neural networks by first trying to deduce
the essential features of neurons and their
interconnections.
Artificial Neuron Model
(Figure: artificial neuron with inputs, weights, and threshold.)
https://www.youtube.com/watch?v=SGZ6BttHMPw
Network Structure
• The number of layers and number of neurons depend
on the specific task.
• In practice this issue is solved by trial and error.
• Two types of adaptive algorithms can be used:
• start from a large network and successively remove some
neurons and links until network performance degrades.
• begin with a small network and introduce new neurons until
performance is satisfactory.
Network Layers
• Input Layer - The activity of the input units represents
the raw information that is fed into the network.
• Hidden Layer - The activity of each hidden unit is
determined by the activities of the input units and the
weights on the connections between the input and the
hidden units.
• Output Layer - The behavior of the output units
depends on the activity of the hidden units and the
weights between the hidden and output units.
Network Layers
• This simple type of network is interesting because the
hidden units are free to construct their own
representations of the input.
• The weights between the input and hidden units
determine when each hidden unit is active, and so by
modifying these weights, a hidden unit can choose
what it represents.
Feedforward NNs
• The basic structure of a feedforward Neural Network
• Feed-forward NNs allow signals to travel one way
only; from input to output.
• There is no feedback (loops) i.e. the output of any
layer does not affect that same layer.
Feedforward NNs
• Feed-forward NNs tend to be straightforward
networks that associate inputs with outputs. They are
extensively used in pattern recognition.
• This type of organization is also referred to as
bottom-up or top-down.
• The learning rule modifies the weights according to
the input patterns that it is presented with. In a
sense, ANNs learn by example.
• When the desired outputs are known, we have
supervised learning (classification), or learning with a
teacher
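A single forward pass through a small feed-forward network can be sketched as below: signals travel one way only, from input layer through hidden layer to output layer. The architecture (2-2-1), weights, and sigmoid activation are illustrative choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each neuron: sigmoid of (weighted sum of its inputs + bias)
    return [sigmoid(sum(i * w for i, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

# Illustrative 2-input -> 2-hidden -> 1-output network
hidden_w = [[0.5, -0.2], [0.3, 0.8]]   # one weight row per hidden neuron
hidden_b = [0.1, -0.1]
output_w = [[1.0, -1.0]]
output_b = [0.0]

x = [1.0, 0.5]
h = layer(x, hidden_w, hidden_b)   # signals flow forward only;
y = layer(h, output_w, output_b)   # no layer's output feeds back to itself
print(y)
```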
Feedforward Network
• Single-Layer Feedforward Networks
• Multilayer Feedforward Networks
Feedforward Network (animated)
https://www.youtube.com/watch?v=b5mHER4rI-c
Feedback/Recurrent Networks
• Feedback networks can have signals traveling in both
directions by introducing loops in the network.
• Feedback networks are dynamic; their 'state' is
changing continuously until they reach an equilibrium
point.
• They remain at the equilibrium point until the input
changes and a new equilibrium needs to be found.
• Feedback architectures are also referred to as
interactive or recurrent, although the latter term is
often used to denote feedback connections in single-
layer organizations.
• https://www.youtube.com/watch?v=I9Da6f6MpLs
Feedback/Recurrent Networks
Weights
• In general, initial weights are randomly chosen, with
typical values between -1.0 and 1.0 or -0.5 and 0.5.
• There are two types of NNs:
• Fixed Networks – where the weights are fixed
• Adaptive Networks – where the weights are changed to
reduce prediction error.
The Learning Rule
• The delta rule is often utilized by the most common class
of ANNs, called back-propagation neural networks.
1. When a neural network is initially presented with a
pattern it makes a random guess as to what it might be.
2. It then sees how far its answer was from the actual one
and makes an appropriate adjustment to its connection
weights.
(Figure: the network compares its output against the desired output for a given input.)
https://en.wikipedia.org/wiki/Delta_rule
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html#Adaptive-Linear-Neurons-and-the-Delta-Rule
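The two-step adjustment described above can be sketched as a delta-rule update, w ← w + η(desired − actual)·x, applied to a threshold unit. The AND dataset, learning rate, and epoch count below are toy choices for illustration.

```python
# Delta-rule sketch: nudge each weight in proportion to the error
# (desired - actual) and the input that contributed to it.

def predict(x, w, b):
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0

def train(samples, eta=0.1, epochs=50):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, desired in samples:
            error = desired - predict(x, w, b)   # how far from the answer
            # Adjust connection weights (and bias) toward the desired output
            w = [wi + eta * error * xi for wi, xi in zip(w, x)]
            b += eta * error
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(AND)
print([predict(x, w, b) for x, _ in AND])  # -> [0, 0, 0, 1]
```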
Training: Back-prop Algorithm
• The Back-propagation algorithm searches for weight
values that minimize the total error of the network over
the set of training examples (training set).
• Backprop consists of the repeated application of the
following two passes:
• Forward pass: in this step the network is activated on one
example and the error of (each neuron of) the output layer is
computed.
• Backward pass: in this step the network error is used for
updating the weights (feedback). Starting at the output layer,
the error is propagated backwards through the network, layer by
layer. This is done by recursively computing the local gradient of
each neuron.
Back-propagation (animated)
• http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/lecture-12-learning-neural-nets-back-propagation/
Back-propagation
1. A set of examples for training the network is
assembled. Each case consists of a problem
statement (which represents the input into the
network) and the corresponding solution
(which represents the desired output from the
network).
2. The input data is entered into the network via
the input layer.
3. Each neuron in the network processes the
input data with the resultant values steadily
"percolating" through the network, layer by
layer, until a result is generated by the output
layer.
4. The actual output of the network is compared
to the expected output for that particular input.
This results in an error value. The connection
weights in the network are gradually adjusted,
working backwards from the output layer,
through the hidden layer, to the input
layer, until the correct output is produced.
Fine-tuning the weights in this way has the
effect of teaching the network how to produce
the correct output for a particular input, i.e.
the network learns.
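The forward and backward passes above can be sketched compactly for a 2-2-1 network on the XOR problem. The architecture, learning rate, and epoch count are illustrative, and convergence to a perfect XOR mapping from any given random start is not guaranteed; the point is that the total error declines as weights are adjusted backwards from the output layer.

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy 2-2-1 network trained with backpropagation on XOR.
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b_h = [0.0, 0.0]
w_o = [random.uniform(-1, 1) for _ in range(2)]
b_o = 0.0
eta = 0.5
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def forward(x):
    # Forward pass: activate hidden layer, then output layer
    h = [sigmoid(sum(xi * wi for xi, wi in zip(x, w_h[j])) + b_h[j])
         for j in range(2)]
    y = sigmoid(sum(hj * wj for hj, wj in zip(h, w_o)) + b_o)
    return h, y

def loss():
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

before = loss()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        # Backward pass: local gradient at the output neuron,
        # then propagated back to each hidden neuron
        d_o = (y - t) * y * (1 - y)
        d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
        # Update weights, working backwards from the output layer
        for j in range(2):
            w_o[j] -= eta * d_o * h[j]
        b_o -= eta * d_o
        for j in range(2):
            for i in range(2):
                w_h[j][i] -= eta * d_h[j] * x[i]
            b_h[j] -= eta * d_h[j]
after = loss()
print(before, "->", after)  # total squared error shrinks during training
```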
Training Basics
• The most basic method of training a neural network is trial
and error.
• The set of data which enables the training is called the
training set.
1. During the training of a network, the same set of data is
processed many times as the connection weights are
successively refined. If the network isn't behaving the way it
should, change the weighting of a random link by a random
amount.
2. If the accuracy of the network declines, undo the change
and make a different one.
• It takes time, but the trial and error method does produce
results.
Size of Training Data
• Rule of thumb: the number of training examples should
be at least five to ten times the number of weights of the
network.
N ≥ |W| / (1 - a)
N = number of training examples
|W| = number of weights
a = expected accuracy on test set
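The rule of thumb above can be computed directly; the figures in the example are illustrative.

```python
# Rule-of-thumb sketch: required number of training examples N,
# given the number of weights |W| and desired test-set accuracy a.

def required_examples(num_weights, accuracy):
    # N = |W| / (1 - a); e.g. 90% accuracy -> N = 10 * |W|
    return num_weights / (1.0 - accuracy)

# A network with 100 weights, targeting 90% accuracy:
print(round(required_examples(100, 0.9)))  # -> 1000
```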
Design Considerations
• What transfer function should be used?
• How many inputs does the network need?
• How many hidden layers does the network need?
• How many hidden neurons per hidden layer?
• How many outputs should the network have?
• How are the weights initialized?
• How many hidden layers and how many neurons?
• How many examples in the training set?
There is no standard methodology for
determining these values. Although some
heuristics exist, the final values are
determined by a trial and error procedure.
The Learning Process
• Every neural network possesses knowledge which is
contained in the values of the connection weights.
• Modifying the knowledge stored in the network as a
function of experience implies a learning rule for changing
the values of the weights.
• Adaptive networks are NNs that allow the weights in
their connections to change.
• The learning / training methods can be classified in two
categories:
• Supervised Learning
• Unsupervised Learning
Supervised Learning
(Learning with a Teacher)
• In this sort of learning, the teacher’s experience is used
to tell the NN which outputs are correct and which are
not.
• This does not mean that a human teacher needs to be
present at all times; only the correct classifications
gathered from the human teacher on a domain need
to be present.
• The network then learns from its error, that is, it
changes its weight to reduce its prediction error.
Supervised Learning
(Learning with a Teacher)
• In supervised training, both the inputs and the desired outputs
are provided. Supervised learning incorporates an
external teacher, so that each output unit is told what its
desired response to input signals ought to be.
• Error convergence: the minimization of error between the
desired and computed unit values. The network then
processes the inputs and compares its resulting outputs
against the desired outputs. Errors are then propagated back
through the system, causing the system to adjust the weights
which control the network. This process occurs over and over
as the weights are continually tweaked.
• The aim is to determine a set of weights which minimizes the
error. One well-known method, which is common to many
learning paradigms, is least mean square (LMS)
convergence.
Unsupervised learning
(Learning without a Teacher)
• Unsupervised learning uses no external teacher and is
based upon only local information.
• The network is provided with inputs but not with
desired outputs. The system itself must then decide
what features it will use to group the input data. The
network is then used to construct clusters of similar
patterns.
• It is also referred to as self-organization, in the sense
that it self-organizes data presented to the network and
detects their emergent collective properties.
• This is particularly useful when instances are checked
to match previous scenarios. For example, detecting
credit card fraud.
Advantages of ANN
• Powerful data-driven, self-adaptive, flexible
computational tool having the capability of capturing
nonlinear and complex underlying characteristics of
any physical process (e.g. damage detection) with a
high degree of accuracy.
• ANNs are used to treat complicated problems in
which there are too many variables to simplify into a model.
• Easily implemented in parallel architectures (i.e. on
multicore processors or systems with GPUs).
Disadvantages of ANN
• The individual relations between the input variables
and the output variables are not developed by
engineering judgment so that the model tends to be a
black box or input/output table without analytical
basis.
• Due to this, the user cannot explain how learning
from input data was performed. (You can get the result
but you may not know why)
• The sample size has to be large (for training).
• Requires a lot of trial and error, so training can be time
consuming.
What is Natural Language Processing
(NLP)?
• NLP is the branch of computer science focused on
developing systems that allow computers to communicate
with people using everyday language.
• Also called Computational Linguistics (CL), Human
Language Technology (HLT), Natural Language Engineering
(NLE)
• Concerns how computational methods can aid the
understanding of human language
• Can machines understand human language?
• It depends on what you mean by ‘understand’
• Understanding is the ultimate goal. However, one doesn’t
need to fully understand to be useful.
What is NLP?
• Started off as a branch of Artificial Intelligence.
• Borrows from Linguistics, Psycholinguistics, Cognitive
Science & Statistics.
• Analyze, understand and generate human languages
just like humans do.
• Applying computational techniques to language
domain.
• To explain linguistic theories, to use the theories to
build systems that can be of social use.
• Make computers learn our language rather than
having us learn theirs.
What is NLP?
• Computers using natural language as input and/or
output
Natural Language Processing (NLP)
• Natural Language Understanding (input)
• Taking some spoken/typed sentence and working out what it
means
• Natural Language Generation (output)
• Taking some formal representation of what you want to say
and working out a way to express it in a natural (human)
language (e.g., English)
Why Study NLP?
• Language is a hallmark of human intelligence.
• Text is the largest repository of human knowledge and
is growing quickly: emails, news articles, web pages, IM,
scientific articles, insurance claims, customer complaint
letters, transcripts of phone calls, technical documents,
government documents, patent portfolios, court
decisions, contracts, ……
• Are we reading any faster than before?
The Problems
• When people see text, they understand its meaning (by and
large)
• When computers see text, they get only character strings
(and perhaps HTML tags)
• We'd like computer agents to see meanings (like a human
reader) and be able to intelligently process text
• These desires have led to many proposals for structured,
semantically marked up formats
• But often human beings still resolutely make use of text in
human languages
• This problem isn’t likely to just go away anytime soon…
Syntax
• Words convey meaning. But when they are put together they
convey more.
• Syntax is the grammatical structure of the sentence. Just
like the syntax in programming languages.
• structures and patterns in phrases
• how phrases are formed by smaller phrases and words
• Identifying the structure is the first step towards
understanding the meaning of the sentence.
• Syntactic Analysis (Parsing) = Process of assigning a
parse tree to a sentence.
• Constituents, grammatical relations, subcategorization and
dependencies.
Syntactic Analysis (Parsing)
Syntactic Parsing
• Produce the correct syntactic parse tree for a sentence.
Word Segmentation
• Breaking a string of characters (graphemes) into a
sequence of words.
• In some written languages (e.g. Chinese) words are not
separated by spaces.
• Even in English, characters other than white-space can
be used to separate words [e.g. , ; . - : ( ) ]
• Examples from English URLs:
• jumptheshark.com → jump the shark .com
• myspace.com/pluckerswingbar →
• myspace .com pluckers wing bar
• myspace .com plucker swing bar
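One simple approach to the segmentation above is greedy longest-match against a dictionary. The word list below is a toy lexicon; note that the greedy strategy picks only one of the two ambiguous readings of "pluckerswingbar".

```python
# Greedy longest-match word segmentation sketch with a toy dictionary.

DICTIONARY = {"jump", "the", "shark", "pluckers", "plucker",
              "swing", "wing", "bar"}

def segment(text):
    words, i = [], 0
    while i < len(text):
        # Try the longest dictionary word starting at position i
        for j in range(len(text), i, -1):
            if text[i:j] in DICTIONARY:
                words.append(text[i:j])
                i = j
                break
        else:
            return None   # no dictionary word fits at this position
    return words

print(segment("jumptheshark"))     # -> ['jump', 'the', 'shark']
print(segment("pluckerswingbar"))  # -> ['pluckers', 'wing', 'bar']
```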
Part Of Speech (POS) Tagging
• Annotate each word in a sentence with a part-of-speech.
I ate the spaghetti with meatballs.
Pro V Det N Prep N
• Useful for subsequent syntactic parsing and word sense
disambiguation.
John saw the saw and decided to take it to the table.
PN V Det N Con V Part V Pro Prep Det N
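A minimal tagger can be sketched as a lexicon lookup; the lexicon and tag set below are toy assumptions. Note its limitation on the second example above: a pure lookup always gives "saw" the same tag, whereas real taggers use context to distinguish the verb from the noun.

```python
# Toy lookup tagger: map each word to one fixed part-of-speech tag.

LEXICON = {
    "i": "Pro", "ate": "V", "the": "Det", "spaghetti": "N",
    "with": "Prep", "meatballs": "N",
}

def tag(sentence):
    # Unknown words default to N (noun), a common back-off choice
    return [(w, LEXICON.get(w.lower(), "N")) for w in sentence.split()]

print(tag("I ate the spaghetti with meatballs"))
```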
Phrase Chunking
• Find all non-recursive noun phrases (NPs) and verb
phrases (VPs) in a sentence:
• [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs].
• [NP He ] [VP reckons ] [NP the current account deficit ] [VP will
narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP
September ]
Morphological Analysis
• Morphology is the field of linguistics that studies the
internal structure of words. (Wikipedia)
• A morpheme is the smallest linguistic unit that has
semantic meaning (Wikipedia)
• e.g. “carry”, “pre”, “ed”, “ly”, “s”
• Morphological analysis is the task of segmenting a
word into its morphemes:
• carried → carry + ed (past tense)
• independently → in + (depend + ent) + ly
• Googlers → (Google + er) + s (plural)
• unlockable → un + (lock + able) ?
→ (un + lock) + able ?
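Segmenting a word into morphemes can be sketched as naive suffix stripping. The suffix list and the single spelling rule below are toy assumptions; real morphological analyzers use much richer rule sets.

```python
# Naive suffix-stripping sketch for inflectional morphology.

SUFFIXES = ["ed", "ing", "ly", "s"]

def analyse(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            # One toy spelling rule: "carri" + "ed" -> stem "carry"
            if stem.endswith("i"):
                stem = stem[:-1] + "y"
            return (stem, suffix)
    return (word, None)   # no known suffix: treat word as a free morpheme

print(analyse("carried"))   # -> ('carry', 'ed')
print(analyse("googlers"))  # -> ('googler', 's')
```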
Morphology: What is a word?
• Morphology is all about the words.
• Make more words from less ☺.
• Structures and patterns in words
• Analyzes how words are formed from minimal units of
meaning, or morphemes, e.g., dogs= dog+s.
• Words are a sequence of Morphemes.
• Morpheme – smallest meaningful unit in a word. Free & Bound.
• Inflectional Morphology – Same Part of Speech
• Buses = Bus + es
• Carried = Carry + ed
• Derivational Morphology – Change PoS.
• Destruct + ion = Destruction (Noun)
• Beauty + ful = Beautiful (Adjective)
• Affixes – Prefixes, Suffixes & Infixes
• Rules govern the fusion.
• https://en.wikipedia.org/wiki/Morphology_(linguistics)
Morphology
Morphology is not as easy as it may
seem to be…
• Examples from Woods et al. 2000
• delegate → (de + leg + ate): take the legs from
• caress → (car + ess): female car
• cashier → (cashy + er): more wealthy
• lacerate → (lace + rate): speed of tatting
• ratify → (rat + ify): infest with rodents
• infantry → (infant + ry): childish behavior
Semantics
• What do you mean? What is the meaning?
• Semantics: the meaning of a word or phrase within a
sentence
• How to represent meaning?
• Semantic network? Logic? Policy?
• How to construct meaning representation?
• Is meaning compositional?
• Words – Lexical Semantics
• Sentences – Compositional Semantics
• Converting the syntactic structures to semantic format –
meaning representation.
Word Sense Disambiguation (WSD)
• Words in natural language usually have a fair number
of different possible meanings.
• Ellen has a strong interest in computational linguistics.
• Ellen pays a large amount of interest on her credit card.
• For many tasks (question answering, translation), the
proper sense of each ambiguous word in a sentence
must be determined.
• http://aclweb.org/aclwiki/index.php?title=Word_sense_disambiguation
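One classic WSD idea (a simplified Lesk algorithm) picks the sense whose dictionary gloss shares the most words with the sentence context. The two glosses for "interest" below are toy definitions, not from a real dictionary.

```python
# Simplified Lesk sketch: choose the sense whose gloss overlaps
# most with the words surrounding the ambiguous word.

SENSES = {
    "interest_1": "a feeling of curiosity or concern about something",
    "interest_2": "money paid regularly for the use of borrowed money",
}

def lesk(context):
    ctx = set(context.lower().split())
    def overlap(sense):
        return len(ctx & set(SENSES[sense].split()))
    return max(SENSES, key=overlap)

print(lesk("Ellen pays interest on money borrowed on her credit card"))
# -> interest_2 (gloss shares 'money', 'borrowed')
print(lesk("Ellen has a feeling of curiosity about many topics, a strong interest"))
# -> interest_1 (gloss shares 'a', 'feeling', 'of', 'curiosity', 'about')
```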
Word Sense Disambiguation (WSD)
Semantic Role Labeling (SRL)
• Semantic role labeling, sometimes also called shallow
semantic parsing, is a task in natural language
processing consisting of the detection of the semantic
arguments associated with the predicate or verb of
a sentence and their classification into their specific roles.
• For each clause, determine the semantic role played by each
noun phrase that is an argument to the verb.
• John [agent] drove Mary [recipient] from Austin [source] to Dallas
[destination] in his Toyota Prius [instrument].
• The hammer [agent] broke the window [theme].
• Also referred to as "case role analysis," "thematic
analysis," and "shallow semantic parsing"
• https://en.wikipedia.org/wiki/Semantic_role_labeling
Semantic Role Labeling (SRL)
Semantic Parsing
• A semantic parser maps a natural-language sentence
to a complete, detailed semantic representation (logical
form, e.g. predicate logic).
• For many applications, the desired output is
immediately executable by another program.
• Example: Mapping an English database query to Prolog
(predicate logic):
• How many cities are there in the US?
• answer(A, count(B, (city(B), loc(B, C), const(C,
countryid(USA))), A))
• Remember propositional and predicate logic
Semantic Parsing
Pragmatics
• Pragmatics: structures and patterns in discourses
• Sentence standing alone may not mean so much. It may
be ambiguous.
• What information is contained in the contextual
sentences that is not conveyed in the actual
sentence?
• https://en.wikipedia.org/wiki/Pragmatics
Pragmatics
• Discourse / Context makes utterances more complicated.
• Implicatures: what is suggested in an utterance, even though
neither expressed nor strictly implied (that is, entailed) by the
utterance.
• "Mary had a baby and got married" strongly suggests that Mary had the baby
before the wedding
• https://en.wikipedia.org/wiki/Implicature
• Speech acts: A speech act is an utterance that serves a function in
communication. We perform speech acts when we offer an apology,
greeting, request, complaint, invitation, compliment, or refusal.
• Greeting: "Hi, Eric. How are things going?"
• Request: "Could you pass me the mashed potatoes, please?"
• Complaint: "I’ve already been waiting three weeks for the computer, and I
was told it would be delivered within a week."
• Invitation: "We’re having some people over Saturday evening and wanted to
know if you’d like to join us."
• Compliment: "Hey, I really like your tie!"
• Refusal: "Oh, I’d love to see that movie with you but this Friday just isn’t
going to work."
• https://en.wikipedia.org/wiki/Speech_act
Pragmatics
• Discourse / Context makes utterances more complicated.
• Anaphora – in rhetoric, the deliberate repetition of the first part of the
sentence in order to achieve an artistic effect (in NLP, anaphora more
often means an expression, such as a pronoun, that refers back to an
earlier one)
• “Every day, every night, in every way, I am getting better and better”
• “My life is my purpose. My life is my goal. My life is my inspiration.”
• “Buying nappies for the baby, feeding the baby, playing with the baby: this is
what your life is when you have a baby.”
• “I want my money right now, right here, all right?”
• http://literarydevices.net/anaphora/
• Ellipsis – Incomplete sentences
• “What’s your name?”
• “Srini, and yours?”
• The second sentence is not complete, but what it means can be inferred from
the first one.
• https://en.wikipedia.org/wiki/Ellipsis_(linguistics)
Anaphora Resolution/Co-Reference
• Determine which phrases in a document refer to the
same underlying entity.
• John put the carrot on the plate and ate it.
• Bush started the war in Iraq. But the president needed the
consent of Congress.
• Some cases require difficult reasoning.
• Today was Jack's birthday. Penny and Janet went to the store. They
were going to get presents. Janet decided to get a kite. "Don't do that,"
said Penny. "Jack has a kite. He will make you take it back."
Natural Languages vs. Computer Languages
• Ambiguity is the primary difference between natural and
computer languages.
• Formal programming languages are designed to be
unambiguous, i.e. they can be defined by a grammar that
produces a unique parse for each sentence in the
language.
• Programming languages are also designed for efficient
(deterministic) parsing, i.e. they are deterministic context-
free languages (DCFLs).
• A sentence in a DCFL can be parsed in O(n) time, where n is the
length of the string.
• Processing natural language text involves many varied
syntactic, semantic and pragmatic tasks in addition to
other problems.
What is Genetic Algorithm?
• A computerized search and optimization
algorithm based on Darwin’s principle of natural
selection.
• Genetic Algorithms (GAs) are adaptive heuristic
search algorithms based on the evolutionary ideas of
natural selection and genetics.
• Part of Evolutionary Algorithms – they simulate
processes in natural systems necessary for evolution
What is Genetic Algorithm?
• Each generation consists of a population of character strings
that are analogous to the chromosomes that we see in our
DNA. Each individual represents a point in a search space and
a possible solution.
• The individuals in the population are then made to go through
a process of evolution.
• Provide efficient, effective techniques for optimization
• Structural Engineering Problems, Biology, Computer Science,
Image processing and Pattern Recognition, physical science,
social sciences and Neural Networks
• http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html
Search Techniques
• Calculus-based techniques
  • Direct methods (e.g. Fibonacci, Newton)
  • Indirect methods
• Guided random search techniques
  • Evolutionary algorithms
    • Evolutionary strategies
    • Genetic algorithms
      • Parallel (centralized, distributed)
      • Sequential (steady-state, generational)
  • Simulated annealing
• Enumerative techniques
  • Dynamic programming
Genetic Algorithm
• Based on the Darwinian paradigm (4 stages):
Reproduction → Competition → Selection → Survival
• Intrinsically a robust search and optimization
mechanism
Components of a GA
1. A problem definition as input,
2. Encoding principles (gene, chromosome)
3. Initialization procedure (creation)
4. Selection of parents (reproduction)
5. Genetic operators (mutation, recombination)
6. Evaluation function (environment)
7. Termination condition
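The components listed above can be sketched as a minimal GA loop. The problem here is the toy OneMax task (maximize the number of 1-bits in a bitstring); the operators, rates, population size, and fixed-generation termination are all illustrative choices.

```python
import random

random.seed(1)

# Minimal GA following the components above: bitstring encoding, random
# initialization, tournament selection, one-point crossover, bit-flip
# mutation, a fitness (evaluation) function, and a fixed generation
# count as the termination condition.
LENGTH, POP, GENS, MUT = 20, 30, 60, 0.02

def fitness(bits):                      # evaluation function
    return sum(bits)

def tournament(pop):                    # selection of parents
    return max(random.sample(pop, 3), key=fitness)

def crossover(a, b):                    # one-point recombination
    p = random.randrange(1, LENGTH)
    return a[:p] + b[p:]

def mutate(bits):                       # occasional bit flips
    return [1 - x if random.random() < MUT else x for x in bits]

# Initialization (creation) of a random population
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(POP)]

best = max(pop, key=fitness)
print(fitness(best))   # typically close to LENGTH after evolution
```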
Issues for Genetic Algorithm Practitioners
• Choosing basic implementation issues such as
• Representation
• Population size and mutation rate
• Selection, deletion policies
• Crossover and mutation operators
• Termination criterion
• Performance and scalability
• Solution is only as good as the evaluation function.
Benefits Of Genetic Algorithms
• Easy to understand
• Supports multi-objective optimisation
• Good for noisy environment
• Always produces an answer, and the answer improves over time
• Inherently parallel and easily distributed
• Easy to exploit previous or alternate solutions
• Flexible in forming building blocks for hybrid applications
• Has substantial history and range of use
How do you encode a solution?
• Obviously this depends on the problem!
• GAs often encode solutions as fixed-length “bitstrings”
• e.g. 101110, 111111, 000101
• Each bit represents some aspect of the proposed
solution to the problem
• For GAs to work, we need to be able to “test” any
string and get a “score” indicating how “good” that
solution is.
Example: Drilling for Oil
• Imagine you had to drill for oil somewhere along a
single 1km desert road
• Problem: choose the best place on the road that
produces the most oil per day
• We could represent each solution as a position on the
road
• Say, a whole number between [0..1000]
Example: Drilling for Oil
• The set of all possible solutions [0..1000] is called the
search space or state space
• In this case it’s just one number but it could be many
numbers or symbols
• Often GAs encode numbers in binary, producing a
bitstring that represents a solution
• In our example we choose 10 bits which is enough to
represent 0..1000
• i.e. 0000000000….1111111111
Convert to Binary String
        512 256 128  64  32  16   8   4   2   1
 900      1   1   1   0   0   0   0   1   0   0
 300      0   1   0   0   1   0   1   1   0   0
1023      1   1   1   1   1   1   1   1   1   1
In GAs these encoded strings are sometimes called “genotypes”
or “chromosomes” and the individual bits are sometimes called
“genes”
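The conversion table above can be checked with a short snippet; the function names here are illustrative, not part of any GA library:

```python
def encode(position, n_bits=10):
    """Encode a road position (0..1023) as an n-bit binary string."""
    return format(position, '0{}b'.format(n_bits))

def decode(bits):
    """Decode a bitstring genotype back to a road position."""
    return int(bits, 2)

print(encode(900))           # '1110000100'
print(encode(300))           # '0100101100'
print(decode('1111111111'))  # 1023
```

The `'010b'` format pads with leading zeros so every genotype has the same length, which the crossover and mutation operators later rely on.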
Example: Drilling for Oil
Solution1 = 300 (0100101100)
Solution2 = 900 (1110000100)
[Figure: the road from location 0 to 1000, with the oil output per day plotted at each location]
Search Space
• For a simple function f(x) the search space is one
dimensional.
• But by encoding several values into the chromosome
many dimensions can be searched e.g. two dimensions
f(x, y)
• The search space can be visualised as a surface or fitness
landscape in which fitness dictates height
• Each possible genotype is a point in the space
• A GA tries to move the points to better places (higher
fitness) in the space
Fitness Landscapes
Search Space
• Obviously, the nature of the search space dictates how
a GA will perform
• A completely random space would be bad for a GA
• GAs can also get stuck in local maxima if the search
space contains many of them
• Generally, spaces in which small (local) improvements
lead closer to the global optimum are well suited to GAs
Basic Genetic Algorithm
1. Initialization: Start with a large “population” of
randomly generated “attempted solutions” to a
problem
2. Repeatedly do the following:
1. Evaluate each of the attempted solutions
2. Keep a subset of these solutions (the “best” ones), eliminate
the “bad” ones
3. Use these solutions to generate a new population
3. Quit when you have a satisfactory solution (or you
run out of time)
85
Simply put…
1. Generate a set of random solutions
2. Repeat
1. Test (with a fitness function) each solution in the set (rank
them)
2. Remove some bad solutions from set
3. Duplicate some good solutions
4. Make small changes to some of them
3. Until best solution is good enough
86
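The three steps above can be sketched as a minimal GA in Python. Every name and parameter value here (population size, rates, keep-half selection) is an illustrative assumption, not something prescribed by the slides:

```python
import random

def run_ga(fitness,             # scoring (fitness) function
           chrom_length=10,     # bitstring length
           pop_size=50,
           mutation_rate=0.05,  # per-bit flip probability
           crossover_rate=0.9,
           generations=100):
    # 1. Generate a set of random solutions
    pop = [[random.randint(0, 1) for _ in range(chrom_length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # 2.1 Test (rank) each solution; 2.2 remove the bad half
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        # 2.3 Duplicate good solutions (via crossover of two parents)
        children = []
        while len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            if random.random() < crossover_rate:
                cut = random.randint(1, chrom_length - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # 2.4 Make small changes to some of them (mutation)
            child = [b ^ 1 if random.random() < mutation_rate else b
                     for b in child]
            children.append(child)
        pop = children
    # 3. Return the best solution found in the final population
    return max(pop, key=fitness)

# Usage: maximise the number of 1 bits in a 10-bit string ("OneMax")
best = run_ga(fitness=sum)
```

With `sum` as the fitness function the GA quickly drives the population toward the all-ones string.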
Implementation Details
• After an initial population is randomly generated, the
algorithm evolves through three stochastic operators:
1. Selection which equates to survival of the fittest;
Replicates the most successful solutions found in a
population at a rate proportional to their relative quality.
2. Recombination / crossover which represents mating
between individuals; Decomposes two distinct solutions
and then randomly mixes their parts to form novel/new
solutions.
3. Mutation introduces random modifications.
• https://www.youtube.com/watch?v=ejxfTy4lI6I
87
Example 1: A Simple Example
• Suppose your “organisms” are 32-bit computer words
• You want a string in which all the bits are ones
• Here’s how you can do it:
1. Create 100 randomly generated computer words
2. Repeatedly do the following:
1. Count the 1 bits in each word
2. Exit if any of the words have all 32 bits set to 1
3. Keep the ten words that have the most 1s (discard the rest)
4. From each word, generate 9 new words as follows: Pick a random bit in
the word and toggle (change) it
• Note that this procedure does not guarantee that the next
“generation” will have more 1 bits (a better solution), but
it’s likely
88
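The 32-bit procedure above maps almost line-for-line onto Python integers; the seed parameter and function name are my own additions for reproducibility:

```python
import random

def all_ones_ga(word_bits=32, pop_size=100, keep=10, seed=None):
    rng = random.Random(seed)
    # 1. Create 100 randomly generated computer words
    pop = [rng.getrandbits(word_bits) for _ in range(pop_size)]
    generations = 0
    while True:
        generations += 1
        # 2.1 Count the 1 bits in each word (sort best-first)
        pop.sort(key=lambda w: bin(w).count('1'), reverse=True)
        # 2.2 Exit if any word has all bits set to 1
        if pop[0] == (1 << word_bits) - 1:
            return pop[0], generations
        # 2.3 Keep the ten words with the most 1s
        survivors = pop[:keep]
        pop = survivors[:]
        # 2.4 From each word, generate 9 new words by toggling a random bit
        for w in survivors:
            for _ in range(9):
                pop.append(w ^ (1 << rng.randrange(word_bits)))

best, gens = all_ones_ga(seed=42)
```

Because the ten survivors are carried over unchanged, the best score never decreases, even though (as the slide notes) an individual mutation may make a word worse.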
1. Selection Operator
• Give preference to better individuals, allowing them
to pass on their genes to the next generation.
• The goodness of each individual depends on its
fitness.
• Fitness may be determined by an objective / fitness
function or by a subjective judgement.
89
2. Crossover Operator
• The main feature distinguishing GAs from other optimization techniques
• Two individuals are chosen from the population using
the selection operator
• A crossover site along the bit strings is randomly
chosen
• The values of the two strings are exchanged up to this
point
• If s1=000000 and s2=111111 and the crossover point
is 2, then s1'=110000 and s2'=001111
• The two new offspring created from this mating are put
into the next generation of the population
• Assumption: By recombining portions of good
individuals, this process is likely to create even better
individuals
90
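The swap described above is a one-liner in Python; this sketch reproduces the s1/s2 example from the slide (the function name is illustrative):

```python
def crossover(s1, s2, point):
    """Single-point crossover: exchange the two bitstrings' prefixes."""
    return s2[:point] + s1[point:], s1[:point] + s2[point:]

s1, s2 = '000000', '111111'
c1, c2 = crossover(s1, s2, 2)
print(c1, c2)  # 110000 001111
```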
Crossover
• A GA using mutation alone relies on random changes
to find a good solution.
• It has been found that introducing “sex” (crossover) into
the algorithm gives better results
• Two high scoring “parent” bit strings (chromosomes) are
selected and with some probability (crossover rate)
combined
• Producing two new offspring (bit strings)
• Each offspring may then be changed randomly
(mutation)
3. Mutation Operator
• With some low probability, a portion of the new
individuals will have some of their bits flipped /
toggled.
• Its purpose is to maintain diversity within the
population and inhibit premature convergence.
• Mutation alone induces a random walk through the
search space
• Mutation and selection (without crossover) create a
parallel, noise-tolerant, hill-climbing algorithm
92
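Bit-flip mutation can be sketched as follows; flipping each bit independently with a small probability matches the description above (the function name is an illustrative choice):

```python
import random

def mutate(bits, rate, rng=random):
    """Flip each bit of a bitstring independently with probability `rate`."""
    return ''.join(b if rng.random() >= rate
                   else ('1' if b == '0' else '0')
                   for b in bits)

# Extreme rates make the behaviour easy to see:
print(mutate('000000', 0.0))  # 000000  (nothing flips)
print(mutate('000000', 1.0))  # 111111  (everything flips)
```

At a realistic rate (say 0.01) most offspring pass through unchanged, which is exactly what preserves diversity without destroying good solutions.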
How to select parents?
• Many schemes are possible, so long as better-scoring
chromosomes are more likely to be selected
• Score is often termed the fitness
• “Roulette Wheel” selection can be used:
1. Add up the fitnesses of all chromosomes
2. Generate a random number R in that range
3. Select the first chromosome in the population whose running
total of fitnesses is at least the value R
Example 2: Population
No. Chromosome Fitness
1 1010011010 1
2 1111100001 2
3 1011001100 3
4 1010000000 1
5 0000010000 3
6 1001011111 5
7 0101010101 1
8 1011100111 2
Roulette Wheel Selection
Chromosome: 1 2 3 4 5 6 7 8
Fitness:    1 2 3 1 3 5 1 2   (total = 18)
Rnd[0..18] = 7  → Chromosome 4 → Parent1
Rnd[0..18] = 12 → Chromosome 6 → Parent2
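The three-step procedure can be sketched directly; the `FixedR` helper below is a hypothetical stand-in for the random draw so the slide's Example 2 numbers can be reproduced exactly:

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)         # 1. add up the fitnesses
    r = rng.uniform(0, total)      # 2. random number R in that range
    running = 0
    for individual, f in zip(population, fitnesses):
        running += f               # 3. first chromosome whose running
        if running >= r:           #    fitness total reaches R
            return individual
    return population[-1]

class FixedR:
    """Deterministic 'rng' used only to replay the slide's example."""
    def __init__(self, r): self.r = r
    def uniform(self, a, b): return self.r

chromosomes = [1, 2, 3, 4, 5, 6, 7, 8]
fitnesses   = [1, 2, 3, 1, 3, 5, 1, 2]   # cumulative: 1,3,6,7,10,15,16,18
parent1 = roulette_select(chromosomes, fitnesses, FixedR(7))   # chromosome 4
parent2 = roulette_select(chromosomes, fitnesses, FixedR(12))  # chromosome 6
```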
Parameters to Set
• Any GA implementation needs to decide on a number
of parameters:
• Population size (N),
• mutation rate (m),
• crossover rate (c)
• Often these have to be “tuned” based on results
obtained - no general theory to deduce good values
• Typical values might be: N = 50, m = 0.05, c = 0.9
Crossover - Recombination
Parent1: 1010000000  →  Offspring1: 1011011111
Parent2: 1001011111  →  Offspring2: 1010000000
Crossover: single point, chosen at random
With some high probability (the crossover rate) apply crossover
to the parents (typical values are 0.8 to 0.95)
Mutation
Original offspring          Mutated offspring
Offspring1: 1011011111  →  1011001111
Offspring2: 1010000000  →  1000000000
With some small probability (the mutation rate) flip each bit in
the offspring (typical values between 0.1 and 0.001)
Example 3: A More Realistic
Example
• Suppose you have a large number of (x, y) data points
• For example, (1.0, 4.1), (3.1, 9.5), (-5.2, 8.6), ...
• You would like to fit a polynomial (of up to degree 5)
through these data points
• That is, you want a formula y = ax^5 + bx^4 + cx^3 + dx^2 + ex + f
that gives you a reasonably good fit to the actual data
• Here’s the usual way to compute goodness of fit:
• Compute the sum of (actual y – predicted y)^2 for all the data points
• The lowest sum represents the best fit
• There are some standard curve fitting techniques, but
let’s assume you don’t know about them
• You can use a genetic algorithm to find a “pretty good”
solution
99
Example 3: A More Realistic
Example
• Your formula is y = ax^5 + bx^4 + cx^3 + dx^2 + ex + f
• Your “genes” are a, b, c, d, e, and f
• Your “chromosome” is the array [a, b, c, d, e, f]
• Your evaluation function for one array is:
• For every actual data point (x, y)
• Compute ý = ax^5 + bx^4 + cx^3 + dx^2 + ex + f
• Find the sum of (y – ý)^2 over all x
• The sum is your measure of “badness” (larger numbers are worse)
• Example: For [0, 0, 0, 2, 3, 5] and the data points (1, 12) and (2, 22):
• ý = 0x^5 + 0x^4 + 0x^3 + 2x^2 + 3x + 5 = 2 + 3 + 5 = 10 when x is 1
• ý = 0x^5 + 0x^4 + 0x^3 + 2x^2 + 3x + 5 = 8 + 6 + 5 = 19 when x is 2
• (12 – 10)^2 + (22 – 19)^2 = 2^2 + 3^2 = 13
• If these are the only two data points, the “badness” of [0, 0, 0, 2, 3, 5] is 13
100
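The badness computation above translates directly into code (the function name is an illustrative choice):

```python
def badness(coeffs, points):
    """Sum of squared errors for y = ax^5 + bx^4 + cx^3 + dx^2 + ex + f."""
    a, b, c, d, e, f = coeffs
    total = 0.0
    for x, y in points:
        pred = a*x**5 + b*x**4 + c*x**3 + d*x**2 + e*x + f
        total += (y - pred) ** 2
    return total

# The slide's worked example:
print(badness([0, 0, 0, 2, 3, 5], [(1, 12), (2, 22)]))  # 13.0
```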
Example 3: A More Realistic
Example
• Your algorithm might be as follows:
• Create 100 six-element arrays of random numbers
• Repeat 500 times (or any other number):
• For each of the 100 arrays, compute its badness (using all data
points)
• Keep the ten best arrays (discard the other 90)
• From each array you keep, generate nine new arrays as follows:
• Pick a random element of the six
• Pick a random floating-point number between 0.0 and 2.0
• Multiply the random element of the array by the random floating-point
number
• After all 500 trials, pick the best array as your final answer
101
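The algorithm above can be sketched as follows. The keep-10/spawn-9 structure and the multiply-by-[0.0, 2.0] mutation come from the slide; the initial coefficient range [-1, 1] and the seed parameter are my own assumptions:

```python
import random

def badness(coeffs, points):
    """Sum of squared errors; coeffs are [a, b, c, d, e, f]."""
    return sum((y - sum(c * x**p
                        for c, p in zip(coeffs, (5, 4, 3, 2, 1, 0)))) ** 2
               for x, y in points)

def fit_polynomial(points, trials=500, seed=None):
    rng = random.Random(seed)
    # Create 100 six-element arrays of random numbers
    pop = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(100)]
    for _ in range(trials):
        pop.sort(key=lambda c: badness(c, points))
        pop = pop[:10]                        # keep the ten best arrays
        for parent in pop[:10]:
            for _ in range(9):                # nine new arrays per survivor
                child = parent[:]
                i = rng.randrange(6)          # pick a random element of the six
                child[i] *= rng.uniform(0.0, 2.0)  # scale it randomly
                pop.append(child)
    # After all trials, pick the best array as the final answer
    return min(pop, key=lambda c: badness(c, points))
```

Note one limitation of the multiplicative mutation: it can never change a coefficient's sign, so the random initial range must already include negative values.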
Many Variants of GA
• Different kinds of selection (not roulette)
• Tournament
• Elitism, etc.
• Different recombination
• Multi-point crossover
• 3 way crossover etc.
• Different kinds of encoding other than bitstring
• Integer values
• Ordered set of symbols
• Different kinds of mutation
What is a Swarm?
• A loosely structured collection of interacting agents
• Agents:
• Individuals that belong to a group (but are not necessarily
identical)
• They contribute to and benefit from the group
• They can recognize, communicate, and/or interact with each
other
• A swarm is better understood if thought of as agents
exhibiting a collective behavior
Examples of Swarms in Nature
• Classic Example: Swarm of Bees
• Can be extended to other similar
systems:
• Ant colony
• Agents: ants
• Flock of birds
• Agents: birds
• Traffic
• Agents: cars
• Crowd
• Agents: humans
• Immune system
• Agents: cells and molecules
Characteristics of Swarming
• Simple rules for each individual
• 3 simple rules as in the next slides
• No central control
• Decentralized and hence robust
• Emergent
• Performs complex functions
Example: Bird Flocking
• The “Boids” model was proposed by Reynolds
• Boids = bird-oids (bird-like agents)
• Only 3 simple rules
1. Rule 1: Avoid Collision with neighboring birds
2. Rule 2: Match the velocity of neighboring birds
3. Rule 3: Stay near neighboring birds
• https://www.youtube.com/watch?v=A6nvvFkbRkY
• https://en.wikipedia.org/wiki/Flock_(birds)
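The three rules above can be sketched as one update step; the weights, neighbourhood radius, and dict-based representation are all illustrative assumptions, not part of Reynolds' original formulation:

```python
import math

def step(boids, dt=1.0, w_sep=0.05, w_align=0.05, w_coh=0.01, radius=5.0):
    """One boids update: each boid is {'pos': (x, y), 'vel': (vx, vy)}."""
    new = []
    for b in boids:
        neighbours = [o for o in boids
                      if o is not b and math.dist(b['pos'], o['pos']) < radius]
        vx, vy = b['vel']
        if neighbours:
            n = len(neighbours)
            cx = sum(o['pos'][0] for o in neighbours) / n
            cy = sum(o['pos'][1] for o in neighbours) / n
            avx = sum(o['vel'][0] for o in neighbours) / n
            avy = sum(o['vel'][1] for o in neighbours) / n
            # Rule 1: steer away from neighbours (collision avoidance)
            vx += w_sep * sum(b['pos'][0] - o['pos'][0] for o in neighbours)
            vy += w_sep * sum(b['pos'][1] - o['pos'][1] for o in neighbours)
            # Rule 2: match neighbours' average velocity
            vx += w_align * (avx - b['vel'][0])
            vy += w_align * (avy - b['vel'][1])
            # Rule 3: steer toward the local centre (flock centering)
            vx += w_coh * (cx - b['pos'][0])
            vy += w_coh * (cy - b['pos'][1])
        new.append({'pos': (b['pos'][0] + vx * dt, b['pos'][1] + vy * dt),
                    'vel': (vx, vy)})
    return new
```

Iterating `step` over a list of boids already produces flock-like motion; no central controller appears anywhere in the code, which is the whole point of the model.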
Collision Avoidance
• Rule 1: Avoid Collision with neighboring birds
Velocity Matching
• Rule 2: Match the velocity (direction & speed) of
neighboring birds
Flock Centering
• Rule 3: Stay near neighboring birds
Real World Insects Example
• Insects have a few hundred brain cells
• However, organized insects have been known for:
• Architectural wonder
• Complex communication systems
• Resistance to hazards in nature
• In the 1950’s E.O. Wilson observed:
• A single ant acts (almost) randomly – often leading to its own
demise
• A colony of ants provides food and protection for the entire
population
Bees
• Colony cooperation
• Regulate hive temperature
• Efficiency via Specialization: division of labour in the
colony
• Communication: Food sources are exploited according
to quality and distance from the hive
112
Self-Organization in Honey Bee Nest
Building
113
Simulation of Honey Bee Nest
Building
114
Ants
• Organizing highways to and from their foraging sites by
leaving pheromone trails
• Form chains from their own bodies to create a bridge to
pull and hold leaves together with silk
• Division of labour between major and minor ants
An In-depth Look at Real Ant Behaviour
Interrupt the Flow
The Path Thickens!
The New Shortest Path
Adapting to Environment Changes
Problems with Swarm Intelligent Systems
• Swarm Intelligent Systems are hard to ‘program’
since the problems are usually difficult to define.
• Solutions are emergent in the systems (not designed a priori,
you won’t know the pattern beforehand)
• Solutions result from behaviors and interactions among and
between individual agents
Possible Solutions to Create Swarm
Intelligence Systems
1. Create a catalog of the collective behaviours –
exhaustive list!
2. Model how social insects collectively perform tasks
• Use this model as a basis upon which artificial variations can
be developed
• Model parameters can be tuned within a biologically relevant
range or by adding non-biological factors to the model
Properties of Self-Organization
• Creation of structures
• Nest, foraging trails, or social organization
• Changes resulting from the existence of multiple
paths of development
• Non-coordinated & coordinated phases
• Possible coexistence of multiple stable states
• E.g. two equal food sources
• Self Organization:
• https://www.youtube.com/watch?v=3ypBqxv_tz8
• https://www.youtube.com/watch?v=BTR17I_Eb_o
4 Ingredients of Self Organization
1. Positive Feedback
2. Negative Feedback
3. Amplification of Fluctuations - randomness
4. Reliance on multiple interactions
Why is Swarm Intelligence interesting
for IT?
• Computer systems are getting more and more
complicated
• Hard to maintain centralized (coordinated) master control
• Swarm intelligence systems are:
• Robust
• Relatively simple
• Analogies in IT and social insects
• distributed system of interacting autonomous agents
• goals: performance optimization and robustness
• self-organized control and cooperation (decentralized)
• division of labour and distributed task allocation
• indirect interactions
• https://en.wikipedia.org/wiki/Swarm_intelligence
• http://www.scholarpedia.org/article/Swarm_intelligence
Two Common SI Algorithms
• Ant Colony Optimization
• https://en.wikipedia.org/wiki/Ant_colony_optimization_algorith
ms
• Particle Swarm Optimization
• http://www.swarmintelligence.org/tutorials.php
Particle Swarm Optimization
• Particle swarm optimization (PSO) imitates human
or insect social behavior.
• Individuals interact with one another while learning
from their own experience, and gradually move towards
the goal.
• It is easily implemented and has proven both very
effective and quick when applied to a diverse set of
optimization problems.
Particle Swarm Optimization
• Bird flocking is one of the best examples of PSO in nature.
• One motive of the development of PSO was to model
human social behavior.
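The "learn from own experience, move towards the goal" idea above can be sketched as a minimal one-dimensional global-best PSO; the inertia and acceleration constants are common textbook defaults, not values prescribed by the slides:

```python
import random

def pso(f, lo, hi, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=None):
    """Minimise f(x) over [lo, hi] with a global-best particle swarm."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]  # positions
    vs = [0.0] * n_particles                                # velocities
    pbest = xs[:]                                           # personal bests
    gbest = min(xs, key=f)                                  # swarm's best
    for _ in range(iters):
        for i in range(n_particles):
            # pull towards own experience (pbest) and the swarm (gbest)
            vs[i] = (w * vs[i]
                     + c1 * rng.random() * (pbest[i] - xs[i])
                     + c2 * rng.random() * (gbest - xs[i]))
            xs[i] += vs[i]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
                if f(xs[i]) < f(gbest):
                    gbest = xs[i]
    return gbest

# Usage: minimise (x - 3)^2 on [-10, 10]; the swarm converges near x = 3
best = pso(lambda x: (x - 3) ** 2, -10, 10, seed=1)
```

Each particle's update blends its momentum, its own best memory, and the swarm's best; that is the "learning from own experience while interacting" described above.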
Biological Inspiration –
Ant Colony Optimization (ACO)
• Inspired by foraging behavior of ants.
• Ants find shortest path to food source from nest.
• Ants deposit pheromone
(http://www.medicalnewstoday.com/articles/232635.php)
along the traveled path, which is used by other ants to
follow the trail.
• This kind of indirect communication via the local
environment is called stigmergy.
• Has adaptability, robustness and redundancy.
Foraging behavior of Ants
• 2 ants start with equal probability of going on either
path.
Foraging behavior of Ants
• The ant on the shorter path has a shorter to-and-fro time
from its nest to the food.
Foraging behavior of Ants
• The density of pheromone on the shorter path is higher
because of 2 passes by the ant (as compared to 1 by
the other).
Foraging behavior of Ants
• The next ant takes the shorter route.
Foraging behavior of Ants
• Over many iterations, more ants begin using the path
with higher pheromone, thereby further reinforcing it.
Foraging behavior of Ants
• After some time, the shorter path is almost exclusively
used.
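The two-path story above can be captured in a small deterministic (mean-field) sketch: the fraction of ants on each path is proportional to its pheromone, and the shorter path is reinforced more often per time step because round trips finish sooner. All parameter values are illustrative assumptions:

```python
def simulate(short_len=1.0, long_len=2.0, steps=200, evaporation=0.05):
    """Mean-field two-path pheromone dynamics: shorter path wins."""
    pher = {'short': 1.0, 'long': 1.0}   # equal probability at the start
    lengths = {'short': short_len, 'long': long_len}
    for _ in range(steps):
        total = pher['short'] + pher['long']
        for path in pher:
            share = pher[path] / total            # fraction choosing this path
            deposit = share / lengths[path]       # shorter trip -> more deposits
            pher[path] = (1 - evaporation) * pher[path] + deposit
    return pher

trails = simulate()
```

Evaporation plays the negative-feedback role listed under the "4 ingredients of self-organization": without it, an early lead on the long path could never be overturned.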
Advantages of SI
• The systems are scalable because the same control
architecture can be applied to a couple of agents or
thousands of agents
• The systems are flexible because agents can be easily
added or removed without influencing the structure
• The systems are robust because agents are simple in
design, the reliance on individual agents is small, and
failure of a single agent has little impact on the
system’s performance
• The systems are able to adapt to new situations easily