
CSS3563 Artificial Intelligence

Topic 5 – Intelligent System Applications

University College of Technology Sarawak
Outline
• What is an intelligent system?
• Overview of Intelligent System Applications
• Artificial Neural Networks (ANN)
• Natural Language Processing (NLP)
• Genetic Algorithms (GA)
• Swarm Intelligence
What is an intelligent system?

• What is intelligence?
  • Hard to define unless you list characteristics, e.g.:
    • Reasoning
    • Learning
    • Adaptivity
• A truly intelligent system adapts itself to deal with changes in problems (automatic learning)
• Few machines can do that at present
• Machine intelligence has a computer follow problem-solving processes somewhat like those in humans
• Intelligent systems display machine-level intelligence and reasoning, often learning, but not necessarily self-adaptation
Intelligent systems in business

• Intelligent systems in business utilise one or more intelligence tools, usually to aid decision making
• They provide business intelligence to
  • Increase productivity
  • Gain competitive advantage
• Examples of business intelligence – information on
  • Customer behaviour patterns
  • Market trends
  • Efficiency bottlenecks
• Examples of successful intelligent system applications in business:
  • Customer service (Customer Relations Modelling)
  • Scheduling (e.g. mine operations)
  • Data mining
  • Financial market prediction
  • Quality control
Intelligent systems in business

• HNC (now Fair Isaac) software's credit card fraud detector Falcon offers a 30-70% improvement over existing methods (an example of a neural network).
• MetLife insurance uses automated extraction of information from applications in MITA (an example of language technology use).
• Personalized, Internet-based TV listings (an intelligent agent).
• Hyundai develops apartment construction plans with FASTrak-Apt (a Case-Based Reasoning project).
• The US Occupational Safety and Health Administration (OSHA) uses "expert advisors" to help identify fire and other safety hazards at work sites (an expert system).
• Source: http://www.newsfactor.com/perl/story/16430.html
Introduction to ANN
• Artificial neural networks (ANNs) are a family of models inspired by biological neural networks (the central nervous systems of animals, in particular the brain).
• ANNs are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown.
• Artificial neural networks are generally presented as systems of interconnected "neurons" which exchange messages between each other.
• The connections have numeric weights (wi) that can be tuned based on experience, making neural nets adaptive to inputs and capable of learning.
• https://www.youtube.com/watch?v=gcK_5x2KsLA

ANN
[Diagram: an artificial neural network]
Characteristics of NNs
• Learning from experience: suited to complex, difficult-to-solve problems for which plenty of data describing the problem is available
• Generalizing from examples: can interpolate from previous learning and give the correct response to unseen data (prediction)
• Rapid applications development: NNs are generic machines and quite independent from domain knowledge
• Adaptability: adapts to a changing environment, if properly designed
• Computational efficiency: although the training of a neural network demands a lot of computer power, a trained network demands almost nothing in recall mode
• Non-linearity: a NN, made up of an interconnection of nonlinear neurons, is itself nonlinear. The physical mechanism responsible for generating the input signal (e.g., a speech signal) is inherently nonlinear
Who is concerned with NNs?
• Computer scientists want to find out about the properties of
non-symbolic information processing with neural nets and about
learning systems in general.
• Statisticians use neural nets as flexible, nonlinear regression and
classification models.
• Engineers of many kinds exploit the capabilities of neural
networks in many areas, such as signal processing and automatic
control.
• Cognitive scientists view neural networks as a possible
apparatus to describe models of thinking and consciousness (High-
level brain function).
• Neuro-physiologists use neural networks to describe and explore
medium-level brain function (e.g. memory, sensory systems, motor
control).
• Physicists use neural networks to model phenomena in statistical
mechanics and for a lot of other tasks.
• Biologists use Neural Networks to interpret nucleotide sequences.
• Philosophers and some other people may also be interested in
Neural Networks for various reasons
Neural Network Techniques
• Conventional computers have to be explicitly programmed:
  • Analyze the problem to be solved.
  • Write the code in a programming language (rules) – this is what we conventionally do.
• Neural networks learn from examples (not programmed a priori):
  • No requirement for an explicit description of the problem – they learn from training data.
  • No need for a programmer.
  • The neural computer adapts itself during a training period, based on examples of similar problems, even without a desired solution to each problem.
  • After sufficient training the neural computer is able to relate the problem data to the solutions, inputs to outputs, and it is then able to offer a viable solution to a brand new problem.
  • Able to generalize and to handle incomplete data.
NNs vs Computers

Digital Computers:
• Deductive reasoning – we apply known rules (programs) to input data to produce output.
• Computation is centralized, synchronous, and serial.
• Memory is packetted, literally stored, and location-addressable.
• Not fault tolerant: one transistor goes and it no longer works.
• Exact.
• Static connectivity.
• Applicable if well-defined rules with precise input data exist.

Neural Networks:
• Inductive reasoning – given input and output data (training examples), we construct the rules.
• Computation is collective, asynchronous, and parallel.
• Memory is distributed, internalized, short-term and content-addressable.
• Fault tolerant: redundancy and sharing of responsibilities.
• Inexact.
• Dynamic connectivity.
• Applicable if rules are unknown or complicated, or if data are noisy or partial.

Neurons in the Brain

• Although heterogeneous, at a low level the brain is composed of neurons
• A neuron receives input from other neurons (generally thousands) through its synapses
• Inputs are approximately summed
• When the input exceeds a threshold, the neuron sends an electrical spike (signal) that travels from the body, down the axon, to the next neuron(s)
A Simple Artificial Neuron

• An artificial neuron is a device with many inputs and one output.
• The neuron has two modes of operation:
  • the training mode and
  • the using mode
A Simple Artificial Neuron
• In the training mode, the neuron can be trained to fire (or not) for particular input patterns.
• In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output.
• If the input pattern does not belong to the taught list of input patterns, the firing rule is used to determine whether to fire or not.
• A firing rule determines how one calculates whether a neuron should fire for any input pattern. It relates to all the input patterns, not only the ones on which the node was trained previously.
Neural Network Learning Process
[Diagram: the neural network learning process]
Artificial Neuron Model
• Neural computing requires a number of neurons to be connected together into a neural network.
• Neurons are arranged in layers.
• Each neuron within the network is usually a simple processing unit which takes one or more inputs and produces an output.
• At each neuron, every input has an associated weight which modifies the strength of each input. The neuron simply adds together all the inputs and calculates an output to be passed on.
An Artificial Neuron Model
• When a neuron receives excitatory input (stimulus) that is sufficiently large compared with its inhibitory input (threshold), it sends a spike of electrical activity down its axon.
• Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
• We construct artificial neural networks by first trying to deduce the essential features of neurons and their interconnections.
Artificial Neuron Model
[Diagram: weighted inputs summed and passed through a threshold]
https://www.youtube.com/watch?v=SGZ6BttHMPw
Network Structure
• The number of layers and number of neurons depend
on the specific task.

• In practice this issue is solved by trial and error.


• Two types of adaptive algorithms can be used:
• start from a large network and successively remove some
neurons and links until network performance degrades.
• begin with a small network and introduce new neurons until
performance is satisfactory.
Network Layers

• Input Layer - The activity of the input units represents


the raw information that is fed into the network.
• Hidden Layer - The activity of each hidden unit is
determined by the activities of the input units and the
weights on the connections between the input and the
hidden units.
• Output Layer - The behavior of the output units
depends on the activity of the hidden units and the
weights between the hidden and output units.
Network Layers

• This simple type of network is interesting because the hidden units are free to construct their own representations of the input.
• The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.
Feedforward NNs
• The basic structure of a feedforward Neural Network

• Feed-forward NNs allow signals to travel one way only: from input to output.
• There is no feedback (loops), i.e. the output of any layer does not affect that same layer.
Feedforward NNs
• Feed-forward NNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition.
• This type of organization is also referred to as bottom-up or top-down.
• The learning rule modifies the weights according to the input patterns that the network is presented with. In a sense, ANNs learn by example.
• When the desired outputs are known, we have supervised learning (classification), or learning with a teacher.
Feedforward Network
[Diagrams: single-layer feedforward network; multilayer feedforward network]

Feedforward Network (animated)
• https://www.youtube.com/watch?v=b5mHER4rI-c
Feedback/Recurrent Networks
• Feedback networks can have signals traveling in both directions by introducing loops in the network.
• Feedback networks are dynamic; their 'state' is changing continuously until they reach an equilibrium point.
• They remain at the equilibrium point until the input changes and a new equilibrium needs to be found.
• Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organizations.
• https://www.youtube.com/watch?v=I9Da6f6MpLs
Feedback/Recurrent Networks
[Diagram: a recurrent network with feedback connections]
Weights

• In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0 or -0.5 and 0.5.
• There are two types of NNs:
  • Fixed Networks – where the weights are fixed
  • Adaptive Networks – where the weights are changed to reduce prediction error.
The Learning Rule
• The delta rule is often utilized by the most common class of ANNs, called back-propagational neural networks.
1. When a neural network is initially presented with a pattern, it makes a random guess as to what it might be.
2. It then sees how far its answer was from the actual one and makes an appropriate adjustment to its connection weights.

[Diagram: input pattern fed to the network; output compared against the desired output]

https://en.wikipedia.org/wiki/Delta_rule
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html#Adaptive-Linear-Neurons-and-the-Delta-Rule
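A minimal perceptron-style sketch of this guess-and-adjust loop (the learning rate, epoch count, and AND-function data are illustrative assumptions; the update w += lr * (target - output) * x is the delta-rule idea applied to a threshold unit):

import random

def train_delta(data, lr=0.1, epochs=50):
    """Train a single threshold neuron; data is a list of (inputs, target) pairs."""
    n = len(data[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    b = 0.0
    for _ in range(epochs):
        for x, t in data:
            # The network's current guess for this pattern
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = t - o  # how far the answer was from the actual one
            # Adjust each connection weight in proportion to its input
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Example: learn the logical AND function
w, b = train_delta([([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)])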
Training: Back-prop Algorithm
• The Back-propagation algorithm searches for weight
values that minimize the total error of the network over
the set of training examples (training set).
• Backprop consists of the repeated application of the
following two passes:
• Forward pass: in this step the network is activated on one
example and the error of (each neuron of) the output layer is
computed.
• Backward pass: in this step the network error is used for
updating the weights (feedback). Starting at the output layer,
the error is propagated backwards through the network, layer by
layer. This is done by recursively computing the local gradient of
each neuron.
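A compact sketch of these two passes on a tiny 2-2-1 network (the XOR task, layer sizes, seed, and learning rate are illustrative assumptions, not values from the slides):

import math, random

def train_xor(epochs=20000, lr=0.5):
    """Backpropagation on a 2-2-1 network learning XOR.
    May need a different seed if training lands in a local minimum."""
    random.seed(1)
    rnd = lambda: random.uniform(-1, 1)
    w1 = [[rnd(), rnd()] for _ in range(2)]; b1 = [rnd(), rnd()]  # input -> hidden
    w2 = [rnd(), rnd()]; b2 = rnd()                               # hidden -> output
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for x, t in data:
            # Forward pass: activate the network on one example
            h = [sig(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
            o = sig(w2[0] * h[0] + w2[1] * h[1] + b2)
            # Backward pass: local gradient at the output, then at each hidden neuron
            d_o = (o - t) * o * (1 - o)
            d_h = [d_o * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
            # Update weights, working backwards from the output layer
            for j in range(2):
                w2[j] -= lr * d_o * h[j]
                b1[j] -= lr * d_h[j]
                for i in range(2):
                    w1[j][i] -= lr * d_h[j] * x[i]
            b2 -= lr * d_o
    return w1, b1, w2, b2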
Back-propagation (animated)
• http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/lecture-videos/lecture-12-learning-neural-nets-back-propagation/
Back-propagation
1. A set of examples for training the network is assembled. Each case consists of a problem statement (which represents the input into the network) and the corresponding solution (which represents the desired output from the network).
2. The input data is entered into the network via the input layer.
3. Each neuron in the network processes the input data, with the resultant values steadily "percolating" through the network, layer by layer, until a result is generated by the output layer.
4. The actual output of the network is compared to the expected output for that particular input. This results in an error value. The connection weights in the network are gradually adjusted, working backwards from the output layer, through the hidden layer, to the input layer, until the correct output is produced. Fine-tuning the weights in this way has the effect of teaching the network how to produce the correct output for a particular input, i.e. the network learns.
Training Basics
• The most basic method of training a neural network is trial and error.
• The set of data which enables the training is called the training set.
1. During the training of a network, the same set of data is processed many times as the connection weights are continually refined. If the network isn't behaving the way it should, change the weighting of a random link by a random amount.
2. If the accuracy of the network declines, undo the change and make a different one.
• It takes time, but the trial and error method does produce results.
Size of Training Data
• Rule of thumb: the number of training examples should be at least five to ten times the number of weights of the network.

  N ≥ |W| / (1 − a)

  where N = number of training examples, |W| = number of weights, and a = expected accuracy on the test set.
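For example, under this estimate a network with |W| = 100 weights and a target test-set accuracy of a = 0.9 would need N ≥ 100 / (1 − 0.9) = 1000 training examples, in line with the five-to-ten-times rule of thumb.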


Design Considerations

• What transfer function should be used?
• How many inputs does the network need?
• How many hidden layers does the network need?
• How many hidden neurons per hidden layer?
• How many outputs should the network have?
• How are the weights initialized?
• How many examples in the training set?

There is no standard methodology for determining these values. Even though there are some heuristic guidelines, final values are determined by a trial and error procedure.
The Learning Process

• Every neural network possesses knowledge which is contained in the values of the connection weights.
• Modifying the knowledge stored in the network as a function of experience implies a learning rule for changing the values of the weights.
• Adaptive networks are NNs that allow the weights in their connections to change.
• The learning / training methods can be classified into two categories:
  • Supervised Learning
  • Unsupervised Learning
Supervised Learning (Learning with a Teacher)

• In this sort of learning, the teacher's experience is used to tell the NN which outputs are correct and which are not.
• This does not mean that a human teacher needs to be present at all times; only the correct classifications gathered from the human teacher on a domain need to be present.
• The network then learns from its error, that is, it changes its weights to reduce its prediction error.
Supervised Learning (Learning with a Teacher)
• In supervised training, both the inputs and the desired outputs are provided. Supervised learning incorporates an external teacher, so that each output unit is told what its desired response to input signals ought to be.
• Error convergence: the minimization of error between the desired and computed unit values. The network processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights which control the network. This process occurs over and over as the weights are continually tweaked.
• The aim is to determine a set of weights which minimizes the error. One well-known method, which is common to many learning paradigms, is least mean square (LMS) convergence.
Unsupervised learning
(Learning without a Teacher)

• Unsupervised learning uses no external teacher and is


based upon only local information.
• The network is provided with inputs but not with
desired outputs. The system itself must then decide
what features it will use to group the input data. The
network is then used to construct clusters of similar
patterns.
• It is also referred to as self-organization, in the sense
that it self-organizes data presented to the network and
detects their emergent collective properties.
• This is particularly useful when instances are checked
to match previous scenarios. For example, detecting
credit card fraud.
Advantages of ANN
• A powerful data-driven, self-adaptive, flexible computational tool capable of capturing the nonlinear and complex underlying characteristics of a physical process (e.g. damage detection) with a high degree of accuracy.
• ANNs can treat complicated problems in which there are too many variables to be simplified into a model.
• Easily implemented in parallel architectures (i.e. in multicore processors or systems with GPUs).
Disadvantages of ANN

• The individual relations between the input variables and the output variables are not developed by engineering judgment, so the model tends to be a black box or input/output table without analytical basis.
  • Due to this, the user cannot explain how learning from input data was performed. (You can get the result, but you may not know why.)
• The sample size has to be large (for training).
• Requires a lot of trial and error, so training can be time-consuming.
What is Natural Language Processing (NLP)?
• NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
• Also called Computational Linguistics (CL), Human Language Technology (HLT), or Natural Language Engineering (NLE).
• Concerns how computational methods can aid the understanding of human language.
• Can machines understand human language?
  • It depends on what you mean by 'understand'.
  • Understanding is the ultimate goal; however, a system doesn't need to understand fully in order to be useful.
What is NLP?
• Started off as a branch of Artificial Intelligence.
• Borrows from Linguistics, Psycholinguistics, Cognitive Science & Statistics.
• Analyze, understand and generate human languages just like humans do.
• Applying computational techniques to the language domain.
• To explain linguistic theories, and to use the theories to build systems that can be of social use.
• Make computers learn our language rather than making us learn theirs.
What is NLP?
• Computers using natural language as input and/or output
Natural Language Processing (NLP)

• Natural Language Understanding (input)


• Taking some spoken/typed sentence and working out what it
means

• Natural Language Generation (output)


• Taking some formal representation of what you want to say
and working out a way to express it in a natural (human)
language (e.g., English)
Why Study NLP?

• Language is a hallmark of human intelligence.
• Text is the largest repository of human knowledge and is growing quickly: emails, news articles, web pages, IM, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, ……
• Are we reading any faster than before?
The Problems
• When people see text, they understand its meaning (by and large)
• When computers see text, they get only character strings (and perhaps HTML tags)
• We'd like computer agents to see meanings (like a human reader) and be able to intelligently process text
• These desires have led to many proposals for structured, semantically marked-up formats
• But often human beings still resolutely make use of text in human languages
• This problem isn't likely to just go away anytime soon…
Syntax
• Words convey meaning. But when they are put together they convey more.
• Syntax is the grammatical structure of the sentence, just like the syntax in programming languages.
  • structures and patterns in phrases
  • how phrases are formed from smaller phrases and words
• Identifying the structure is the first step towards understanding the meaning of the sentence.
• Syntactic Analysis (Parsing) = the process of assigning a parse tree to a sentence.
• Constituents, grammatical relations, subcategorization and dependencies.
Syntactic Analysis (Parsing)
[Diagram: a parse tree for an example sentence]
Syntactic Parsing

• Produce the correct syntactic parse tree for a sentence.


Word Segmentation
• Breaking a string of characters (graphemes) into a sequence of words.
• In some written languages (e.g. Chinese) words are not separated by spaces.
• Even in English, characters other than white-space can be used to separate words [e.g. , ; . - : ( ) ]
• Examples from English URLs:
  • jumptheshark.com → jump the shark .com
  • myspace.com/pluckerswingbar
    → myspace .com pluckers wing bar
    → myspace .com plucker swing bar
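A greedy longest-match sketch of the idea against a toy vocabulary (real segmenters use statistical or neural models; the vocabulary here is a made-up assumption):

def segment(text, vocab):
    """Split text into words by always taking the longest known prefix."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):         # try the longest candidate first
            if text[i:j] in vocab or j == i + 1:  # fall back to a single character
                words.append(text[i:j])
                i = j
                break
    return words

vocab = {"jump", "the", "shark", "my", "space", "pluckers", "wing", "bar"}
print(segment("jumptheshark", vocab))   # -> ['jump', 'the', 'shark']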
Part Of Speech (POS) Tagging
• Annotate each word in a sentence with a part-of-speech tag.

  I    ate  the  spaghetti  with  meatballs.
  Pro  V    Det  N          Prep  N

• Useful for subsequent syntactic parsing and word sense disambiguation.

  John  saw  the  saw  and  decided  to    take  it   to    the  table.
  PN    V    Det  N    Con  V        Part  V     Pro  Prep  Det  N
Phrase Chunking
• Find all non-recursive noun phrases (NPs) and verb phrases (VPs) in a sentence:
  • [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs].
  • [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September]
Morphological Analysis
• Morphology is the field of linguistics that studies the internal structure of words. (Wikipedia)
• A morpheme is the smallest linguistic unit that has semantic meaning. (Wikipedia)
  • e.g. "carry", "pre", "ed", "ly", "s"
• Morphological analysis is the task of segmenting a word into its morphemes:
  • carried → carry + ed (past tense)
  • independently → in + (depend + ent) + ly
  • Googlers → (Google + er) + s (plural)
  • unlockable → un + (lock + able)?
    → (un + lock) + able?
Morphology: What is a word?
• Morphology is all about the words.
• Make more words from less ☺.
• Structures and patterns in words
• Analyzes how words are formed from minimal units of
meaning, or morphemes, e.g., dogs= dog+s.
• Words are a sequence of Morphemes.
• Morpheme – smallest meaningful unit in a word. Free & Bound.
• Inflectional Morphology – Same Part of Speech
• Buses = Bus + es
• Carried = Carry + ed
• Derivational Morphology – Change PoS.
• Destruct + ion = Destruction (Noun)
• Beauty + ful = Beautiful (Adjective)
• Affixes – Prefixes, Suffixes & Infixes
• Rules govern the fusion.
• https://en.wikipedia.org/wiki/Morphology_(linguistics)
Morphology
[Diagram: morphological structure of example words]
Morphology is not as easy as it may seem to be…
• Examples of spurious decompositions, from Woods et al. 2000:
  • delegate ≠ (de + leg + ate) "take the legs from"
  • caress ≠ (car + ess [feminine suffix]) "female car"
  • cashier ≠ (cashy + er) "more wealthy"
  • lacerate ≠ (lace + rate) "speed of tatting"
  • ratify ≠ (rat + ify) "infest with rodents"
  • infantry ≠ (infant + ry) "childish behavior"
Semantics
• What do you mean? What is the meaning?
• Semantics: the meaning of a word or phrase within a sentence
  • How to represent meaning? Semantic network? Logic? Policy?
  • How to construct a meaning representation? Is meaning compositional?
• Words – Lexical Semantics
• Sentences – Compositional Semantics
• Converting the syntactic structures to semantic format – meaning representation.
Word Sense Disambiguation (WSD)
• Words in natural language usually have a fair number of different possible meanings.
  • Ellen has a strong interest in computational linguistics.
  • Ellen pays a large amount of interest on her credit card.
• For many tasks (question answering, translation), the proper sense of each ambiguous word in a sentence must be determined.
• http://aclweb.org/aclwiki/index.php?title=Word_sense_disambiguation

Word Sense Disambiguation (WSD)
[Diagram: an ambiguous word with its different senses]
Semantic Role Labeling (SRL)
• Semantic role labeling, sometimes also called shallow semantic parsing, is a task in natural language processing consisting of the detection of the semantic arguments associated with the predicate or verb of a sentence and their classification into their specific roles.
• For each clause, determine the semantic role played by each noun phrase that is an argument to the verb:
  • [John]agent drove [Mary]recipient from [Austin]source to [Dallas]destination in [his Toyota Prius]instrument.
  • [The hammer]agent broke [the window]theme.
• Also referred to as "case role analysis," "thematic analysis," and "shallow semantic parsing".
• https://en.wikipedia.org/wiki/Semantic_role_labeling

Semantic Role Labeling (SRL)
[Diagram: semantic roles labeled on an example sentence]
Semantic Parsing
• A semantic parser maps a natural-language sentence to a complete, detailed semantic representation (logical form, e.g. predicate logic).
• For many applications, the desired output is immediately executable by another program.
• Example: mapping an English database query to Prolog (predicate logic):
  • How many cities are there in the US?
  • answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A))
• Remember propositional and predicate logic

Semantic Parsing
[Diagram: a sentence mapped to its logical form]
Pragmatics
• Pragmatics: structures and patterns in discourses
• Sentence standing alone may not mean so much. It may
be ambiguous.
• What information is contained in the contextual
sentences that is not conveyed in the actual
sentence?
• https://en.wikipedia.org/wiki/Pragmatics
Pragmatics
• Discourse / context makes utterances more complicated.
• Implicatures: what is suggested in an utterance, even though neither expressed nor strictly implied (that is, entailed) by the utterance.
  • "Mary had a baby and got married" strongly suggests that Mary had the baby before the wedding
  • https://en.wikipedia.org/wiki/Implicature
• Speech acts: a speech act is an utterance that serves a function in communication. We perform speech acts when we offer an apology, greeting, request, complaint, invitation, compliment, or refusal.
  • Greeting: "Hi, Eric. How are things going?"
  • Request: "Could you pass me the mashed potatoes, please?"
  • Complaint: "I've already been waiting three weeks for the computer, and I was told it would be delivered within a week."
  • Invitation: "We're having some people over Saturday evening and wanted to know if you'd like to join us."
  • Compliment: "Hey, I really like your tie!"
  • Refusal: "Oh, I'd love to see that movie with you but this Friday just isn't going to work."
• https://en.wikipedia.org/wiki/Speech_act
Pragmatics
• Discourse / context makes utterances more complicated.
• Anaphora – the deliberate repetition of the first part of the sentence in order to achieve an artistic effect
  • "Every day, every night, in every way, I am getting better and better"
  • "My life is my purpose. My life is my goal. My life is my inspiration."
  • "Buying nappies for the baby, feeding the baby, playing with the baby: this is what your life is when you have a baby."
  • "I want my money right now, right here, all right?"
  • http://literarydevices.net/anaphora/
• Ellipsis – incomplete sentences
  • "What's your name?"
  • "Srini, and yours?"
  • The second sentence is not complete, but what it means can be inferred from the first one.
  • https://en.wikipedia.org/wiki/Ellipsis_(linguistics)
Anaphora Resolution / Co-Reference

• Determine which phrases in a document refer to the same underlying entity.
  • John put the carrot on the plate and ate it.
  • Bush started the war in Iraq. But the president needed the consent of Congress.
• Some cases require difficult reasoning.
  • Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back."
Natural Languages vs. Computer Languages
• Ambiguity is the primary difference between natural and computer languages.
• Formal programming languages are designed to be unambiguous, i.e. they can be defined by a grammar that produces a unique parse for each sentence in the language.
• Programming languages are also designed for efficient (deterministic) parsing, i.e. they are deterministic context-free languages (DCFLs).
  • A sentence in a DCFL can be parsed in O(n) time, where n is the length of the string.
• Processing natural language text involves many and various syntactic, semantic and pragmatic tasks in addition to other problems.
What is a Genetic Algorithm?
• A computerized search and optimization algorithm based on Darwin's principle of natural selection.
• Genetic Algorithms (GAs) are adaptive heuristic search algorithms based on the evolutionary ideas of natural selection and genetics.
• Part of the family of Evolutionary Algorithms – they simulate processes in natural systems necessary for evolution.
What is a Genetic Algorithm?
• Each generation consists of a population of character strings that are analogous to the chromosomes that we see in our DNA. Each individual represents a point in a search space and a possible solution.
• The individuals in the population are then made to go through a process of evolution.
• GAs provide efficient, effective techniques for optimization.
• Applications: structural engineering problems, biology, computer science, image processing and pattern recognition, physical science, social sciences and neural networks.
• http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html
Search Techniques

Search techniques
• Calculus-based techniques
  • Direct methods (Fibonacci, Newton)
  • Indirect methods
• Guided random search techniques
  • Evolutionary algorithms
    • Evolutionary strategies
    • Genetic algorithms
      • Parallel (centralized, distributed)
      • Sequential (steady-state, generational)
  • Simulated annealing
• Enumerative techniques
  • Dynamic programming
Genetic Algorithm

• Based on the Darwinian paradigm (4 stages):

  Reproduction → Competition → Survival → Selection

• Intrinsically a robust search and optimization mechanism
Components of a GA

1. A problem definition as input
2. Encoding principles (gene, chromosome)
3. Initialization procedure (creation)
4. Selection of parents (reproduction)
5. Genetic operators (mutation, recombination)
6. Evaluation function (environment)
7. Termination condition
Issues for Genetic Algorithm Practitioners
• Choosing basic implementation issues such as
  • Representation
  • Population size and mutation rate
  • Selection and deletion policies
  • Crossover and mutation operators
• Termination criterion
• Performance and scalability
• The solution is only as good as the evaluation function.
Benefits of Genetic Algorithms
• Easy to understand
• Supports multi-objective optimisation
• Good for noisy environments
• We always get an answer, and the answer gets better with time
• Inherently parallel and easily distributed
• Easy to exploit previous or alternate solutions
• Flexible in forming building blocks for hybrid applications
• Has a substantial history and range of use
How do you encode a solution?

• Obviously this depends on the problem!
• GAs often encode solutions as fixed-length "bitstrings"
  • e.g. 101110, 111111, 000101
• Each bit represents some aspect of the proposed solution to the problem
• For GAs to work, we need to be able to "test" any string and get a "score" indicating how "good" that solution is.
Example: Drilling for Oil

• Imagine you had to drill for oil somewhere along a


single 1km desert road
• Problem: choose the best place on the road that
produces the most oil per day
• We could represent each solution as a position on the
road
• Say, a whole number between [0..1000]
Example: Drilling for Oil

• The set of all possible solutions [0..1000] is called the search space or state space
• In this case it's just one number, but it could be many numbers or symbols
• Often GAs code numbers in binary, producing a bitstring representing a solution
• In our example we choose 10 bits, which is enough to represent 0..1000
  • i.e. 0000000000….1111111111
Convert to Binary String

Value   512 256 128  64  32  16   8   4   2   1
 900      1   1   1   0   0   0   0   1   0   0
 300      0   1   0   0   1   0   1   1   0   0
1023      1   1   1   1   1   1   1   1   1   1

• In GAs these encoded strings are sometimes called "genotypes" or "chromosomes" and the individual bits are sometimes called "genes".
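A two-line sketch of this encoding in Python (10 bits, as chosen for the 0..1000 oil example):

def encode(n, bits=10):
    """Encode an integer as a fixed-length bitstring chromosome."""
    return format(n, "0{}b".format(bits))

def decode(chromosome):
    """Decode a bitstring chromosome back into an integer."""
    return int(chromosome, 2)

print(encode(900))           # -> '1110000100'
print(decode("0100101100"))  # -> 300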
Example: Drilling for Oil
[Diagram: a road from position 0 to 1000 with oil yield varying by location; Solution 1 = 300 (0100101100), Solution 2 = 900 (1110000100)]
Search Space

• For a simple function f(x) the search space is one-dimensional.
• But by encoding several values into the chromosome, many dimensions can be searched, e.g. two dimensions f(x, y).
• The search space can be visualised as a surface or fitness landscape in which fitness dictates height.
• Each possible genotype is a point in the space.
• A GA tries to move the points to better places (higher fitness) in the space.
Fitness Landscapes
Search Space

• Obviously, the nature of the search space dictates how a GA will perform.
• A completely random space would be bad for a GA.
• GAs can also get stuck in local maxima if search spaces contain lots of them.
• Generally, spaces in which small (local) improvements get closer to the global optimum are good for a GA.
Basic Genetic Algorithm

1. Initialization: start with a large "population" of randomly generated "attempted solutions" to a problem
2. Repeatedly do the following:
  1. Evaluate each of the attempted solutions
  2. Keep a subset of these solutions (the "best" ones), eliminate the "bad" ones
  3. Use these solutions to generate a new population
3. Quit when you have a satisfactory solution (or you run out of time)
Simply put…
1. Generate a set of random solutions
2. Repeat:
  1. Test (with a fitness function) each solution in the set (rank them)
  2. Remove some bad solutions from the set
  3. Duplicate some good solutions
  4. Make small changes to some of them
3. Until the best solution is good enough
Implementation Details
• After an initial population is randomly generated, the algorithm evolves the population through three stochastic operators:
1. Selection, which equates to survival of the fittest; replicates the most successful solutions found in a population at a rate proportional to their relative quality.
2. Recombination / crossover, which represents mating between individuals; decomposes two distinct solutions and then randomly mixes their parts to form novel solutions.
3. Mutation, which introduces random modifications.
• https://www.youtube.com/watch?v=ejxfTy4lI6I
Example 1: A Simple Example
• Suppose your "organisms" are 32-bit computer words
• You want a string in which all the bits are ones
• Here's how you can do it (see the sketch after this list):
  1. Create 100 randomly generated computer words
  2. Repeatedly do the following:
    1. Count the 1 bits in each word
    2. Exit if any of the words have all 32 bits set to 1
    3. Keep the ten words that have the most 1s (discard the rest)
    4. From each word, generate 9 new words as follows: pick a random bit in the word and toggle (change) it
• Note that this procedure does not guarantee that the next "generation" will have more 1 bits (a better solution), but it's likely
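A direct Python sketch of that procedure (100 words, keep the best 10, nine single-bit mutants each, as described above):

import random

def max_ones(word_bits=32, pop_size=100, keep=10):
    """Evolve random bitstrings until one word has all its bits set to 1."""
    pop = [[random.randint(0, 1) for _ in range(word_bits)] for _ in range(pop_size)]
    generation = 0
    while True:
        pop.sort(key=sum, reverse=True)      # fitness = count of 1 bits
        if sum(pop[0]) == word_bits:         # exit if any word is all ones
            return pop[0], generation
        pop = pop[:keep]                     # keep the ten best words
        for word in list(pop):
            for _ in range(9):               # nine new words per survivor
                child = list(word)
                child[random.randrange(word_bits)] ^= 1   # toggle a random bit
                pop.append(child)
        generation += 1

best, gens = max_ones()
print("solved in", gens, "generations")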
1. Selection Operator
• Give preference to better individuals, allowing them to pass on their genes to the next generation.
• The goodness of each individual depends on its fitness.
• Fitness may be determined by an objective / fitness function or by a subjective judgement.
2. Crossover Operator
• The main feature distinguishing GAs from other optimization techniques
• Two individuals are chosen from the population using the selection operator
• A crossover site along the bit strings is randomly chosen
• The values of the two strings are exchanged up to this point
• If s1=000000 and s2=111111 and the crossover point is 2, then s1'=110000 and s2'=001111
• The two new offspring created from this mating are put into the next generation of the population
• Assumption: by recombining portions of good individuals, this process is likely to create even better individuals
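A sketch of single-point crossover matching the s1/s2 example above (the bits up to the crossover site are exchanged, as described):

import random

def crossover(s1, s2, point=None):
    """Exchange the prefixes of two equal-length bitstrings at a crossover site."""
    if point is None:
        point = random.randrange(1, len(s1))   # crossover site chosen at random
    return s2[:point] + s1[point:], s1[:point] + s2[point:]

print(crossover("000000", "111111", point=2))  # -> ('110000', '001111')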
Crossover
• A basic GA relies on random mutation (mixing) alone to find a good solution.
• It has been found that by introducing "sex" into the algorithm better results are obtained.
• Two high-scoring "parent" bit strings (chromosomes) are selected and, with some probability (the crossover rate), combined, producing two new offspring (bit strings).
• Each offspring may then be changed randomly (mutation).
3. Mutation Operator
• With some low probability, a portion of the new individuals will have some of their bits flipped / toggled.
• Its purpose is to maintain diversity within the population and inhibit premature convergence.
• Mutation alone induces a random walk through the search space.
• Mutation and selection (without crossover) create a parallel, noise-tolerant, hill-climbing algorithm.
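A sketch of the bit-flip operator (the 1% default rate is an illustrative low probability):

import random

def mutate(chromosome, rate=0.01):
    """Flip each bit of a bitstring independently with probability `rate`."""
    out = []
    for b in chromosome:
        if random.random() < rate:
            out.append("0" if b == "1" else "1")   # flip this bit
        else:
            out.append(b)                          # leave it unchanged
    return "".join(out)

print(mutate("1011001100", rate=0.1))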
How to select parents?

• Many schemes are possible, so long as better-scoring chromosomes are more likely to be selected
• The score is often termed the fitness
• "Roulette Wheel" selection can be used:
  1. Add up the fitnesses of all chromosomes
  2. Generate a random number R in that range
  3. Select the first chromosome in the population that – when all previous fitnesses are added – gives you at least the value R
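Those three steps in Python (a minimal sketch; a zero total fitness is not handled):

import random

def roulette_select(population, fitnesses):
    """Pick a chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)              # 1. add up the fitnesses
    r = random.uniform(0, total)        # 2. random number R in that range
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit                  # 3. first chromosome whose cumulative
        if running >= r:                #    fitness reaches at least R
            return chromosome
    return population[-1]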
Example 2: Population

No.  Chromosome   Fitness
1    1010011010   1
2    1111100001   2
3    1011001100   3
4    1010000000   1
5    0000010000   3
6    1001011111   5
7    0101010101   1
8    1011100111   2
Roulette Wheel Selection
[Diagram: a wheel with 8 segments sized by fitness (1, 2, 3, 1, 3, 5, 1, 2; total 18). Rnd[0..18] = 7 selects chromosome 4 as Parent 1; Rnd[0..18] = 12 selects chromosome 6 as Parent 2.]
Parameters to Set

• Any GA implementation needs to decide on a number of parameters:
  • Population size (N)
  • Mutation rate (m)
  • Crossover rate (c)
• Often these have to be "tuned" based on results obtained – there is no general theory to deduce good values
• Typical values might be: N = 50, m = 0.05, c = 0.9
Crossover – Recombination

Parent 1: 1010000000  →  Offspring 1: 1011011111
Parent 2: 1001011111  →  Offspring 2: 1010000000
(single-point crossover at a random site)

• With some high probability (the crossover rate) apply crossover to the parents (typical values are 0.8 to 0.95).

Mutation

Original offspring 1: 1011011111  →  mutated offspring 1: 1011001111
Original offspring 2: 1010000000  →  mutated offspring 2: 1000000000

• With some small probability (the mutation rate) flip each bit in the offspring (typical values between 0.001 and 0.1).
Example 3: A More Realistic Example
• Suppose you have a large number of (x, y) data points
  • For example, (1.0, 4.1), (3.1, 9.5), (-5.2, 8.6), ...
• You would like to fit a polynomial (of up to degree 5) through these data points
  • That is, you want a formula y = ax^5 + bx^4 + cx^3 + dx^2 + ex + f that gives you a reasonably good fit to the actual data
• Here's the usual way to compute goodness of fit:
  • Compute the sum of (actual y – predicted y)^2 for all the data points
  • The lowest sum represents the best fit
• There are some standard curve-fitting techniques, but let's assume you don't know about them
• You can use a genetic algorithm to find a "pretty good" solution
Example 3: A More Realistic Example
• Your formula is y = ax^5 + bx^4 + cx^3 + dx^2 + ex + f
• Your "genes" are a, b, c, d, e, and f
• Your "chromosome" is the array [a, b, c, d, e, f]
• Your evaluation function for one array is:
  • For every actual data point (x, y), compute ŷ = ax^5 + bx^4 + cx^3 + dx^2 + ex + f
  • Find the sum of (y – ŷ)^2 over all x
  • The sum is your measure of "badness" (larger numbers are worse)
• Example: For [0, 0, 0, 2, 3, 5] and the data points (1, 12) and (2, 22):
  • ŷ = 0x^5 + 0x^4 + 0x^3 + 2x^2 + 3x + 5 is 2 + 3 + 5 = 10 when x is 1
  • ŷ = 0x^5 + 0x^4 + 0x^3 + 2x^2 + 3x + 5 is 8 + 6 + 5 = 19 when x is 2
  • (12 – 10)^2 + (22 – 19)^2 = 2^2 + 3^2 = 13
  • If these are the only two data points, the "badness" of [0, 0, 0, 2, 3, 5] is 13
Example 3: A More Realistic Example
• Your algorithm might be as follows (a sketch follows this list):
  • Create 100 six-element arrays of random numbers
  • Repeat 500 times (or any other number):
    • For each of the 100 arrays, compute its badness (using all data points)
    • Keep the ten best arrays (discard the other 90)
    • From each array you keep, generate nine new arrays as follows:
      • Pick a random element of the six
      • Pick a random floating-point number between 0.0 and 2.0
      • Multiply the random element of the array by the random floating-point number
  • After all 500 trials, pick the best array as your final answer
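That algorithm in Python (the initial coefficient range and the two sample data points are illustrative assumptions):

import random

def badness(coeffs, data):
    """Sum of squared errors of y = a*x^5 + b*x^4 + ... + f over the data."""
    powers = range(5, -1, -1)
    return sum((y - sum(c * x ** p for c, p in zip(coeffs, powers))) ** 2
               for x, y in data)

def fit_poly(data, pop_size=100, keep=10, trials=500):
    """Evolve coefficient arrays, mutating one random element per child."""
    pop = [[random.uniform(-10, 10) for _ in range(6)] for _ in range(pop_size)]
    for _ in range(trials):
        pop.sort(key=lambda c: badness(c, data))   # lower badness = better fit
        pop = pop[:keep]                           # keep the ten best arrays
        for parent in list(pop):
            for _ in range(9):                     # nine new arrays per survivor
                child = list(parent)
                i = random.randrange(6)            # pick a random element...
                child[i] *= random.uniform(0.0, 2.0)  # ...and rescale it
                pop.append(child)
    return min(pop, key=lambda c: badness(c, data))

data = [(1.0, 12.0), (2.0, 22.0)]   # toy data points from the example above
print(fit_poly(data))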
Many Variants of GA

• Different kinds of selection (not roulette)


• Tournament
• Elitism, etc.

• Different recombination
• Multi-point crossover
• 3 way crossover etc.

• Different kinds of encoding other than bitstring


• Integer values
• Ordered set of symbols

• Different kinds of mutation


What is a Swarm?
• A loosely structured collection of interacting agents

• Agents:
• Individuals that belong to a group (but are not necessarily
identical)
• They contribute to and benefit from the group
• They can recognize, communicate, and/or interact with each
other

• A swarm is better understood if thought of as agents


exhibiting a collective behavior
Examples of Swarms in Nature
• Classic Example: Swarm of Bees

• Can be extended to other similar


systems:
• Ant colony
• Agents: ants
• Flock of birds
• Agents: birds
• Traffic
• Agents: cars
• Crowd
• Agents: humans
• Immune system
• Agents: cells and molecules
Characteristics of Swarming

• Simple rules for each individual


• 3 simple rules as in the next slides

• No central control
• Decentralized and hence robust

• Emergent
• Performs complex functions
Example: Bird Flocking

• “Boids” model was proposed by Reynolds


• Boids = Bird-oids (bird like)

• Only 3 simple rules


1. Rule 1: Avoid Collision with neighboring birds
2. Rule 2: Match the velocity of neighboring birds
3. Rule 3: Stay near neighboring birds

• https://www.youtube.com/watch?v=A6nvvFkbRkY
• https://en.wikipedia.org/wiki/Flock_(birds)
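A minimal 2-D sketch of the three rules (the rule weights and the neighbor radius are illustrative assumptions, not Reynolds' exact parameters; assumes at least two boids):

def step(boids, dt=0.1):
    """boids: list of [x, y, vx, vy] lists; one update step, in place."""
    new_vels = []
    for b in boids:
        others = [o for o in boids if o is not b]
        n = len(others)
        # Rule 3: stay near neighboring birds (steer toward their centre)
        centre = [sum(o[i] for o in others) / n for i in (0, 1)]
        coh = [(centre[i] - b[i]) * 0.01 for i in (0, 1)]
        # Rule 2: match the velocity of neighboring birds
        avg_v = [sum(o[i] for o in others) / n for i in (2, 3)]
        ali = [(avg_v[i] - b[i + 2]) * 0.05 for i in (0, 1)]
        # Rule 1: avoid collision with birds that are too close
        sep = [0.0, 0.0]
        for o in others:
            dx, dy = b[0] - o[0], b[1] - o[1]
            if dx * dx + dy * dy < 1.0:
                sep[0] += dx * 0.1
                sep[1] += dy * 0.1
        new_vels.append([b[2] + coh[0] + ali[0] + sep[0],
                         b[3] + coh[1] + ali[1] + sep[1]])
    for b, v in zip(boids, new_vels):
        b[2], b[3] = v
        b[0] += b[2] * dt
        b[1] += b[3] * dt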
Collision Avoidance

• Rule 1: Avoid Collision with neighboring birds


Velocity Matching

• Rule 2: Match the velocity (direction & speed) of


neighboring birds
Flock Centering

• Rule 3: Stay near neighboring birds


Real World Insects Example
• Insects have a few hundred brain cells

• However, organized insects have been known for:


• Architectural wonder
• Complex communication systems
• Resistance to hazards in nature

• In the 1950’s E.O. Wilson observed:


• A single ant acts (almost) randomly – often leading to its own
demise
• A colony of ants provides food and protection for the entire
population
Bees

• Colony cooperation
• Regulate hive temperature
• Efficiency via specialization: division of labour in the colony
• Communication: food sources are exploited according to quality and distance from the hive
Self-Organization in Honey Bee Nest Building
[Figure]

Simulation of Honey Bee Nest Building
[Series of simulation snapshots]
Ants

• Organizing highways to and from their foraging sites by leaving pheromone trails
• Form chains from their own bodies to create a bridge to pull and hold leaves together with silk
• Division of labour between major and minor ants
An In-depth Look at Real Ant Behaviour
[Slide sequence: Interrupt the Flow; The Path Thickens!; The New Shortest Path; Adapting to Environment Changes]
Problems with Swarm Intelligent Systems

• Swarm intelligent systems are hard to 'program' since the problems are usually difficult to define.
• Solutions are emergent in the systems (not designed a priori; you won't know the pattern beforehand).
• Solutions result from behaviors and interactions among and between individual agents.
Possible Solutions to Create Swarm Intelligence Systems

1. Create a catalog of the collective behaviours – an exhaustive list!
2. Model how social insects collectively perform tasks
  • Use this model as a basis upon which artificial variations can be developed
  • Model parameters can be tuned within a biologically relevant range or by adding non-biological factors to the model
Properties of Self-Organization

• Creation of structures
• Nest, foraging trails, or social organization

• Changes resulting from the existence of multiple


paths of development
• Non-coordinated & coordinated phases

• Possible coexistence of multiple stable states


• E.g. two equal food sources

• Self Organization:
• https://www.youtube.com/watch?v=3ypBqxv_tz8
• https://www.youtube.com/watch?v=BTR17I_Eb_o
4 Ingredients of Self Organization

1. Positive Feedback

2. Negative Feedback

3. Amplification of Fluctuations - randomness

4. Reliance on multiple interactions


Why is Swarm Intelligence interesting
for IT?
• Computer Systems are getting more and more
complicated
• Hard to have a master (centralized) control (coordinated)

• Swarm intelligence systems are:


• Robust
• Relatively simple

• Analogies in IT and social insects


• distributed system of interacting autonomous agents
• goals: performance optimization and robustness
• self-organized control and cooperation (decentralized)
• division of labour and distributed task allocation
• indirect interactions
• https://en.wikipedia.org/wiki/Swarm_intelligence
• http://www.scholarpedia.org/article/Swarm_intelligence
Two Common SI Algorithms
• Ant Colony Optimization
• https://en.wikipedia.org/wiki/Ant_colony_optimization_algorith
ms

• Particle Swarm Optimization


• http://www.swarmintelligence.org/tutorials.php
Particle Swarm Optimization

• Particle swarm optimization (PSO) imitates human or insect social behavior.
• Individuals interact with one another while learning from their own experience, and gradually move towards the goal.
• It is easily implemented and has proven both very effective and quick when applied to a diverse set of optimization problems.
Particle Swarm Optimization

• Bird flocking is one of the best examples of PSO in nature.
• One motive for the development of PSO was to model human social behavior.
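A minimal PSO sketch using the standard velocity/position update (the inertia weight w and learning factors c1, c2 below are common defaults, not values from the slides):

import random

def pso(f, dim=2, n=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Minimize f over `dim` dimensions with a swarm of n particles."""
    pos = [[random.uniform(-10, 10) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]          # each particle's own best position
    gbest = min(pbest, key=f)[:]         # the swarm's best position so far
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Learn from own experience (pbest) and from the group (gbest)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]   # gradually move towards the goal
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest

sphere = lambda p: sum(x * x for x in p)   # a simple test function
print(pso(sphere))                          # -> a point near [0, 0]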
Biological Inspiration – Ant Colony Optimization (ACO)

• Inspired by the foraging behavior of ants.
• Ants find the shortest path to a food source from the nest.
• Ants deposit pheromone (http://www.medicalnewstoday.com/articles/232635.php) along the traveled path, which is used by other ants to follow the trail.
• This kind of indirect communication via the local environment is called stigmergy.
• Has adaptability, robustness and redundancy.
Foraging behavior of Ants

• 2 ants start with equal probability of going on either


path.
Foraging behavior of Ants

• The ant on the shorter path has a shorter to-and-fro time from its nest to the food.
Foraging behavior of Ants

• The density of pheromone on the shorter path is higher


because of 2 passes by the ant (as compared to 1 by
the other).
Foraging behavior of Ants

• The next ant takes the shorter route.


Foraging behavior of Ants

• Over many iterations, more ants begin using the path


with higher pheromone, thereby further reinforcing it.
Foraging behavior of Ants

• After some time, the shorter path is almost exclusively


used.
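A toy two-path simulation of this pheromone feedback loop (the path lengths, deposit rule, and evaporation rate are made-up illustrative values):

import random

def forage(lengths=(1.0, 2.0), ants=200, evaporation=0.05):
    """The short path gets more pheromone per trip and ends up dominating."""
    pher = [1.0, 1.0]                    # initial pheromone on each path
    for _ in range(ants):
        # Each ant picks a path with probability proportional to its pheromone
        p0 = pher[0] / (pher[0] + pher[1])
        path = 0 if random.random() < p0 else 1
        # Shorter to-and-fro time -> more deposits per trip (deposit 1/length)
        pher[path] += 1.0 / lengths[path]
        pher = [p * (1 - evaporation) for p in pher]   # pheromone evaporates
    return pher

print(forage())   # pheromone on the shorter path is far higher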
Advantages of SI
• The systems are scalable because the same control architecture can be applied to a couple of agents or to thousands of agents
• The systems are flexible because agents can be easily added or removed without influencing the structure
• The systems are robust because agents are simple in design, the reliance on individual agents is small, and failure of a single agent has little impact on the system's performance
• The systems are able to adapt to new situations easily