Soft Computing
Introduction to Neuro, Fuzzy and Soft Computing, Fuzzy Sets : Basic Definition and Terminology,
Set-theoretic Operations, Member Function Formulation and Parameterization, Fuzzy Rules and
Fuzzy Reasoning, Extension Principle and Fuzzy Relations, Fuzzy If-Then Rules, Fuzzy Reasoning,
Fuzzy Inference Systems, Mamdani Fuzzy Models, Sugeno Fuzzy Models, Tsukamoto Fuzzy Models,
Input Space Partitioning and Fuzzy Modeling.
LECTURE-1
INTRODUCTION:
What is intelligence?
Real intelligence is what determines the normal thought process of a human.
Artificial intelligence is a property of machines which gives them the ability to mimic the human
thought process. Intelligent machines are developed based on the intelligence of a
subject, of a designer, of a human being. Now two questions arise: can we construct a
control system that hypothesizes its own control law? We encounter a plant and, looking at
the plant behaviour, we sometimes have to switch from one control system to another
depending on where the plant is operating. The plant may be operating in a linear zone or a non-
linear zone; an operator can probably take a very good, intelligent decision about it, but can a
machine do it? Can a machine actually hypothesize a control law by looking at the model? Can
we design a method that can estimate any signal embedded in noise without assuming any
signal or noise behaviour?
That is the first part; before we model a system, we need to observe, that is, we collect certain
data from the system. How do we actually do this? At the lowest level, we have to sense
the environment; for example, if I want to do temperature control I must have a temperature sensor.
This data is polluted or corrupted by noise. How do we separate the actual data from the
corrupted data? This is the second question. The first question is whether a control system can
hypothesize its own control law. These are very important questions that we should
think about. Similarly, representing knowledge in a world model, the way we manipulate the
objects in this world, and, beyond that, the capacity to perceive and understand, involve a very
high level of intelligence that we still do not understand.
What is AI ?
Artificial Intelligence is concerned with the design of intelligence in an artificial device.
The term was coined by McCarthy in 1956.
There are two ideas in the definition.
1. Intelligence
2. artificial device
What is intelligence?
–Is it that which characterizes humans? Or is there an absolute standard of judgement?
–Accordingly there are two possibilities:
– A system with intelligence is expected to behave as intelligently as a human
– A system with intelligence is expected to behave in the best possible manner
Given this scenario different interpretations have been used by different researchers as defining
the scope and view of Artificial Intelligence.
1. One view is that artificial intelligence is about designing systems that are as intelligent as
humans. This view involves trying to understand human thought and an effort to build
machines that emulate the human thought process. This view is the cognitive science approach
to AI.
2. The second approach is best embodied by the concept of the Turing Test. Turing held that in
the future computers could be programmed to acquire abilities rivaling human intelligence. As part
of his argument Turing put forward the idea of an 'imitation game', in which a human being
and a computer would be interrogated under conditions where the interrogator would not
know which was which, the communication being entirely by textual messages. Turing argued
that if the interrogator could not distinguish them by questioning, then it would be
unreasonable not to call the computer intelligent. Turing's 'imitation game' is now usually
called 'the Turing test' for intelligence.
3. Logic and laws of thought deals with studies of ideal or rational thought process and inference.
The emphasis in this case is on the inferencing mechanism, and its properties. That is how the
system arrives at a conclusion, or the reasoning behind its selection of actions is very
important in this point of view. The soundness and completeness of the inference mechanisms
are important here.
4. The fourth view of AI is that it is the study of rational agents. This view deals with building
machines that act rationally. The focus is on how the system acts and performs, and not so
much on the reasoning process. A rational agent is one that acts rationally, that is, in the best
possible manner.
Typical AI problems
While studying the typical range of tasks that we might expect an “intelligent entity” to perform,
we need to consider both “common-place” tasks as well as expert tasks.
Examples of common-place tasks include
– Recognizing people, objects.
– Communicating (through natural language).
– Navigating around obstacles on the streets
These tasks are done matter-of-factly and routinely by people and some other animals.
Expert tasks include:
• Medical diagnosis.
• Mathematical problem solving
• Playing games like chess
These tasks cannot be done by all people, and can only be performed by skilled specialists.
Now, which of these tasks are easy and which ones are hard? Clearly tasks of the first type are
easy for humans to perform, and almost all are able to master them. However, when we look at
what computer systems have been able to achieve to date, we see that their achievements include
performing sophisticated tasks like medical diagnosis, performing symbolic integration, proving
theorems and playing chess.
On the other hand it has proved to be very hard to make computer systems perform many routine
tasks that all humans and a lot of animals can do. Examples of such tasks include navigating our
way without running into things, catching prey and avoiding predators. Humans and animals are
also capable of interpreting complex sensory information. We are able to recognize objects and
people from the visual image that we receive. We are also able to perform complex social
functions.
Intelligent behaviour
This discussion brings us back to the question of what constitutes intelligent behaviour. Some of
these tasks and applications are:
1. Perception involving image recognition and computer vision
2. Reasoning
3. Learning
4. Understanding language involving natural language processing, speech processing
5. Solving problems
6. Robotics
Practical applications of AI
AI components are embedded in numerous devices e.g. in copy machines for automatic
correction of operation for copy quality improvement. AI systems are in everyday use for
identifying credit card fraud, for advising doctors, for recognizing speech and in helping complex
planning tasks. Then there are intelligent tutoring systems that provide students with personalized
attention.
Thus AI has increased understanding of the nature of intelligence and found many applications. It
has helped in the understanding of human reasoning, and of the nature of intelligence. It has also
helped us understand the complexity of modeling human reasoning.
Approaches to AI
Strong AI aims to build machines that can truly reason and solve problems. These machines
should be self aware and their overall intellectual ability needs to be indistinguishable from that
of a human being. Excessive optimism in the 1950s and 1960s concerning strong AI has given
way to an appreciation of the extreme difficulty of the problem. Strong AI maintains that suitably
programmed machines are capable of cognitive mental states.
Weak AI: deals with the creation of some form of computer-based artificial intelligence that
cannot truly reason and solve problems, but can act as if it were intelligent. Weak AI holds that
suitably programmed machines can simulate human cognition.
Applied AI: aims to produce commercially viable "smart" systems such as, for example, a
security system that is able to recognise the faces of people who are permitted to enter a particular
building. Applied AI has already enjoyed considerable success.
Cognitive AI: computers are used to test theories about how the human mind works--for example,
theories about how we recognise faces and other objects, or about how we solve abstract
problems.
Limits of AI Today
Today's successful AI systems operate in well-defined domains and employ narrow, specialized
knowledge. Common sense knowledge is needed to function in complex, open-ended worlds.
Such a system also needs to understand unconstrained natural language. However these
capabilities are not yet fully present in today's intelligent systems.
What can AI systems do
Today's AI systems have been able to achieve limited success in some of these tasks.
• In Computer vision, the systems are capable of face recognition
• In Robotics, we have been able to make vehicles that are mostly autonomous.
• In Natural language processing, we have systems that are capable of simple machine
translation.
• Today's Expert systems can carry out medical diagnosis in a narrow domain
• Speech understanding systems are capable of recognizing several thousand words of continuous
speech
• Planning and scheduling systems have been employed in scheduling experiments with
the Hubble Telescope.
• Learning systems are capable of text categorization into about 1000 topics
• In Games, AI systems can play at the Grand Master level in chess (beating the world champion),
checkers, etc.
What can AI systems NOT do yet?
• Understand natural language robustly (e.g., read and understand articles in a newspaper)
• Surf the web
• Interpret an arbitrary visual scene
• Learn a natural language
• Construct plans in dynamic real-time domains
• Exhibit true autonomy and intelligence
Applications:
We will now look at a few famous AI systems that have been developed over the years.
1. ALVINN:
Autonomous Land Vehicle In a Neural Network
In 1989, Dean Pomerleau at CMU created ALVINN. This is a system which learns to control
vehicles by watching a person drive. It contains a neural network whose input is a 30x32 unit
two dimensional camera image. The output layer is a representation of the direction the
vehicle should travel.
The system drove a car from the East Coast of USA to the west coast, a total of about 2850
miles. Out of this about 50 miles were driven by a human, and the rest solely by the system.
2. Deep Blue
In 1997, the Deep Blue chess program created by IBM beat the then world chess
champion, Garry Kasparov.
3. Machine translation
A system capable of translations between people speaking different languages will be a
remarkable achievement of enormous economic and cultural benefit. Machine translation is
one of the important fields of endeavour in AI. While some translating systems have been
developed, there is a lot of scope for improvement in translation quality.
4. Autonomous agents
In space exploration, robotic space probes autonomously monitor their surroundings, make
decisions and act to achieve their goals.
NASA's Mars rovers successfully completed their primary three-month missions in April
2004. The Spirit rover had been exploring a range of Martian hills that took two months to
reach, finding curiously eroded rocks that may be new pieces in the puzzle of the region's
past. Spirit's twin, Opportunity, had been examining exposed rock layers inside a crater.
5. Internet agents
The explosive growth of the internet has also led to growing interest in internet agents to
monitor users' tasks, seek needed information, and learn which information is most useful.
What is soft computing?
An approach to computing which parallels the remarkable ability of the human mind to
reason and learn in an environment of uncertainty and imprecision.
It is characterized by the use of inexact solutions to computationally hard tasks such as the
solution of nonparametric complex problems for which an exact solution cannot be derived in
polynomial time.
Why soft computing approach?
Mathematical model & analysis can be done for relatively simple systems. More complex
systems arising in biology, medicine and management systems remain intractable to
conventional mathematical and analytical methods. Soft computing deals with imprecision,
uncertainty, partial truth and approximation to achieve tractability, robustness and low
solution cost. It extends its application to various disciplines of engineering and science. Typically
human can:
1. Take decisions
2. Inference from previous situations experienced
3. Expertise in an area
4. Adapt to changing environment
5. Learn to do better
6. Social behaviour of collective intelligence
Intelligent control strategies have emerged from the above-mentioned characteristics of
humans/animals. The first two characteristics have given rise to fuzzy logic; the 2nd, 3rd and 4th
have led to neural networks; and the 4th, 5th and 6th have been used in evolutionary algorithms.
Characteristics of Neuro-Fuzzy & Soft Computing:
1. Human Expertise
2. Biologically inspired computing models
3. New Optimization Techniques
4. Numerical Computation
5. New Application domains
6. Model-free learning
7. Intensive computation
8. Fault tolerance
9. Goal driven characteristics
10. Real world applications
Intelligent Control Strategies (Components of Soft Computing): The popular soft computing
components in designing intelligent control theory are:
1. Fuzzy Logic
2. Neural Networks
3. Evolutionary Algorithms
Fuzzy logic:
Most of the time, people are fascinated by the fuzzy logic controller. At some point of time in
Japan, scientists designed fuzzy logic controllers even for household appliances like a
room heater or a washing machine. Its popularity is such that it has been applied to various
engineering products.
Fuzzy number or fuzzy variable:
We are discussing the concept of a fuzzy number. Let us take three statements: zero, almost
zero, near zero. Zero is exactly zero, with truth value assigned 1. If it is almost 0, then I can
think that between minus 1 and 1 the values around 0 are treated as 0, because this is almost 0. I
am not very precise, but that is the way I use my day-to-day language in interpreting the real world.
When I say near 0, the bandwidth of the membership, which actually represents the
truth value, increases further; you can see that the bandwidth around 0 is larger. This is the concept
of a fuzzy number. Without talking about membership yet, the notion is that I allow some
small bandwidth when I say almost 0, and when I say near 0 my bandwidth increases still further.
In the case of minus 2 to 2, when I encounter any data between minus 2 and 2, I will still consider
it to be near 0. As I go away from 0 towards minus 2, the confidence level of how near the value
is to 0 reduces; if it is very near to 0, I am very certain, and as I progressively go away
from 0, the level of confidence also goes down, but there is still a tolerance limit. So with
zero I am precise, I become imprecise with almost zero, and I become still more imprecise in
the third case.
When we say fuzzy logic, that is the variables that we encounter in physical devices, fuzzy
numbers are used to describe these variables and using this methodology when a controller is
designed, it is a fuzzy logic controller.
Neural networks :
Neural networks are basically inspired by various ways of observing biological organisms.
Most of the time, the motivation comes from the human way of learning. It is a learning theory.
It is an artificial network that learns from examples and, because of its distributed nature and
structure, it offers fault tolerance and parallel processing of data.
The basic elements of artificial Neural Network are: input nodes, weights, activation function
and output node. Inputs are associated with synaptic weights. They are all summed and
passed through an activation function giving output y. In a way, output is summation of the
signal multiplied with synaptic weight over many input channels.
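As a minimal sketch of this computation (the weights and inputs below are illustrative, not taken from the lecture), a single artificial neuron can be written in Python as:

import math

def neuron_output(inputs, weights, bias=0.0):
    # Weighted sum of the input signals (inputs times synaptic weights).
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Pass the sum through a sigmoid activation function to get the output y.
    return 1.0 / (1.0 + math.exp(-s))

# Example: three input channels with illustrative synaptic weights.
print(neuron_output([0.5, 1.0, -0.2], [0.8, -0.4, 0.3]))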
If we look at a computer and a brain, let us compare the central processing unit of a high speed
computer available in the market today with the brain. There are approximately 10^14 synapses
in the human brain, whereas typically there are of the order of 10^8 transistors inside a CPU. The
element size is almost comparable, both of the order of 10^-6 m, and the energy use is also
comparable, about 30 watts; that is, the energy dissipated in a brain is almost the same as in a
computer. But consider the processing speed: the brain runs at only about 100 hertz, very slow,
whereas computers nowadays run at gigahertz rates.
When you compare these, you realize that although the computer is very fast, it is very slow at
intelligent tasks like pattern recognition, language understanding, etc. These are activities which
humans do much better, even at such a slow speed of 100 Hz. Another big difference between
the two structures, the brain and the central processing unit, is that the brain learns; we learn.
Certain mappings found in the biological brain, studied in neuroscience, are not there in a central
processing unit, and we do not know whether self-awareness takes place in the brain or
somewhere else, but we know that in a computer there is no self-awareness.
Neural networks are analogous to adaptive control concepts that we have in control theory
and one of the most important aspects of intelligent control is to learn the control parameters,
to learn the system model. Some of the learning methodologies we will be learning here is the
error-back propagation algorithm, real-time learning algorithm for recurrent network,
Kohonen's self-organizing feature map and the Hopfield network.
Features of Artificial Neural Network (ANN) models:
1. Parallel Distributed information processing
2. High degree of connectivity between basic units
3. Connections are modifiable based on experience
4. Learning is a continuous unsupervised process
5. Learns based on local information
6. Performance degrades with fewer units
All the methods discussed so far make strong assumptions about the space around; that
is, whether we use a neural network, fuzzy logic, or any other method adopted in the intelligent
control framework, they all make very strong assumptions and normally cannot work in a
generalized condition. The question is: can they hypothesize a theory? When I design all these
controllers, I always take the data; the engineer takes the data. He builds these models that are
updated; they update their own weights based on the feedback from the plant. But the structure
of the controller, the model by which we represent the physical plant, all these are chosen by the
engineer, and the structure of the intelligent controller is also decided by the engineer. We do not
have a machine that can hypothesize everything: the model it should select, the controller it
should select, simply by looking at data. As it encounters a specific kind of data from a plant, can
it come up with a specific controller architecture and a specific type of system model? That is
the question we are asking now.
You will see that in the entire course we will be discussing various tools. They deal only with
behaviour: these tools are actually developed by mimicking human behaviour, but not the human
way of working. An intelligent machine is one which learns, thinks and behaves in line with the
thought process. That is what we would like, but at the moment we are very far from this target
of achieving real intelligence.
We perceive the environment in a very unique way, in a coherent manner. This is called unity
of perception and intelligence has also something to do with this unity of perception,
awareness and certain things are not very clear to us until now. So an intelligent machine is
one which learns, thinks & behaves in line with thought process.
Evolutionary algorithms:
These are mostly derivative-free optimization algorithms that perform random search in a
systematic manner to optimize the solution to a hard problem. In this course the Genetic
Algorithm, being the first such algorithm, developed in the 1970s, will be discussed in detail. The
other algorithms are swarm based and mimic the behaviour of organisms or other systematic
processes.
LECTURE-2
Fuzzy Sets Basic Concepts
Characteristic Function (Membership Function)
Notation
Semantics and Interpretations
Related crisp sets
Support, Bandwidth, Core, α-level cut
Features, Properties, and More Definitions
Convexity, Normality
Cardinality, Measure of Fuzziness
MF parametric formulation
Fuzzy Set-theoretic Operations
Intersection, Union, Complementation
T-norms and T-conorms
Numerical Examples
Fuzzy Rules and Fuzzy Reasoning
Extension Principle and Fuzzy Relations
Fuzzy If-Then Rules
Fuzzy Reasoning
Fuzzy Inference Systems
Mamdani Fuzzy Models
Sugeno Fuzzy Models
Tsukamoto Fuzzy Models
Input Space Partitioning
Fuzzy Modeling.
The father of fuzzy logic is Lotfi Zadeh, who proposed it in 1965. Fuzzy logic can
manipulate data which are imprecise.
Fuzzy Number:
A fuzzy number is a fuzzy subset of the universe of discourse of numbers that satisfies the
conditions of normality and convexity. It is the basic type of fuzzy set.
Why is fuzzy used? Why will we be learning about fuzzy logic? The word fuzzy means that, in a
general sense, when we talk about the real world, our expression of it, the way we quantify it,
the way we describe it, is not very precise.
When I ask what your height is, nobody would say or nobody would expect you to know a
precise answer. If I ask a precise question, probably, you will give me your height as 5 feet 8
inches. But normally, when I see people, I would say this person is tall according to my own
estimate, my own belief and my own experience; or if I ask, what the temperature is today,
the normal answer people would give is, today it is very hot or hot or cool. Our expression
about the world around us is usually not precise; not being precise is exactly what fuzzy means.
Fuzzy logic is logic which is not very precise. Since we deal with our world in this
imprecise way, the computation that involves the logic of impreciseness is naturally more
powerful for such problems than computation carried out in a precise manner; precision-logic-based
computation is not always inferior, but in many day-to-day technological applications it
turns out to be so.
Fuzzy logic has become very popular; in particular, the Japanese sold fuzzy logic controllers
and fuzzy logic chips in all kinds of household appliances in the early 90s. Whether it is a
washing machine or an automated ticket machine, the Japanese made use of fuzzy logic in the
usual household appliances, and hence its popularity grew.
Fuzzy means going from precision to imprecision. Here, when I say 10 with an arrow at 10, I
mean exactly 10, that is 10.00000, very precise. When I say almost 10, I do not mean only 10,
but rather the neighbourhood of 10: I can tolerate a band from 9 to 11, whereas as I move
towards 9 or 11, I am going away from 10, from the notion of 10. That is what almost 10 means:
around 10, within a small bandwidth that I still allow for 10.
This concept of being imprecise is fuzzy: dealing with the day-to-day data that we collect or
encounter and representing it in an imprecise manner, like almost 0, near 0, hot, cold, or, if I am
referring to height, tall, short, medium. The terminology that we normally exchange among
ourselves in our communication actually deals with imprecise rather than precise data. Naturally,
since our communications are imprecise, the computation resulting out of such communication,
out of a language which is imprecise, must be associated with some logic.
Fig. Sets: classical & fuzzy boundary
Set: A collection of objects having one or more common characteristics, for example the set of
natural numbers or the set of real numbers; the objects are called members or elements. An
object belonging to a set is represented as x ∈ A, where A is a set.
Universe of Discourse:
Defined as “a collection of objects all having the same characteristics”.
Notation: U or X, and elements in the universe of discourse are: u or x
Now, we will be talking about fuzzy sets. When I talked about classical sets, we had the classical
sets of numbers that we know, like the set of natural numbers and the set of real numbers. What
is the difference between a fuzzy set and a classical (crisp) set? The difference is that members
belong to a specific set A or B or X or Y as we define them, but the degree of belonging to the
set may be imprecise. If I take the universal set of natural numbers, all the natural numbers fall
in this set. If I take a subset of the natural numbers, like in the earlier case where we put 1 to 11
in one set, and I ask whether 12 belongs to set A, the answer is no; does 13 belong to set A? The
answer is no, because in my set only 1 to 11 are placed. This is called a classical set and the
belongingness here is one: they all fully belong to this set.
But in a fuzzy set, I can have all the numbers in this set, each with a membership grade
associated with it. When the membership grade is 0, the element does not belong to the set,
whereas a membership grade between 0 and 1 says how much this particular object belongs to
the set.
The nomenclature / notation of a fuzzy set - how do we represent a fuzzy set? Let the elements
of X be x1, x2, ..., xn; then the fuzzy set A is denoted by any of the following nomenclatures,
which are mainly of 2 types:
1. Numeric
2. Functional
Mostly, we will be using the ordered-pair form, or the first of the two fraction forms below:
A = {(x1, µA(x1)), (x2, µA(x2)), ..., (xn, µA(xn))}
A = x1/µA(x1) + x2/µA(x2) + ... + xn/µA(xn)
A = µA(x1)/x1 + µA(x2)/x2 + ... + µA(xn)/xn
In the first form, each member x1 appears as an ordered pair with its fuzzy membership µA(x1),
and so forth for x2 up to xn. In the second form, the member x1 is written over its membership;
in the third form the membership is put first and the member appears under it. In every case the
'/' and '+' are only notational separators, not division or addition.
Every member x of a fuzzy set A is assigned a fuzzy index: the membership grade µA(x) in the
interval [0, 1], often called the grade of membership of x in A. In a classical set, this membership
grade is either 0 or 1; x either belongs to set A or it does not. But in a fuzzy set the answer is not
that precise; the answer is that it is possible. An element may belong to A with a fuzzy
membership 0.9 or with a fuzzy membership 0.1; when I say 0.9, it more likely belongs to set A,
and when I say 0.1, it less likely belongs to set A. A fuzzy set is a set of ordered pairs,
A = {(x, µA(x)) | x ∈ X}, where x is a member of the universal set X and µA(x) is the grade of
membership of the object x in A. As we said, this membership µA(x) lies between 0 and 1; the
closer it is to 1, the more likely x belongs to A. If the membership grade is 1, x certainly belongs
to A.
For example, consider the set of all tall people. If I define tall classically, I would say above 6
feet is tall and below 6 feet is not tall; that is, 5 feet 9 inches is not tall and 6 feet 1 inch is tall.
That looks very weird; it does not look nice to say that a person who is 6 feet 1 inch is tall and
one who is 5 feet 9 inches is not. The ambiguity we face in defining such a thing with a classical
set can be easily resolved with a fuzzy set. In a fuzzy set, we can easily say that both 6 feet 1
inch and 5 feet 9 inches are tall, but label the difference: they are both tall, each with an
associated membership grade. This is what a fuzzy set is.
Membership function - a membership function µA(x) is characterized by the mapping
µA: X → [0, 1] that maps every member of X to a number between 0 and 1, where x is a real
number describing an object or its attribute, X is the universe of discourse and A is a subset of X.
Convexity:
A fuzzy set A is convex if, for any x1, x2 in X and any λ in [0, 1],
µA(λx1 + (1−λ)x2) ≥ min(µA(x1), µA(x2)).
Symmetry:
A fuzzy set is symmetric if its MF is symmetric about a certain point x=c such that,
µA(c+x)= µA(c-x) for all x in X
Comparison of the classical approach and fuzzy approach:
Let us say, consider a universal set T which stands for temperature. Temperature I can say
cold, normal and hot. Naturally, these are subsets of the universal set T; the cold temperature,
normal temperature and hot temperature they are all subsets of T.
In the classical approach, one way to define the set cold is:
cold = {T | T belongs to the universal set T and 5 °C ≤ T ≤ 15 °C}.
Similarly, the member temperature belongs to normal if it is between 15 °C and 25 °C, and it
belongs to hot when it is between 25 °C and 35 °C. As I said earlier, one
should notice that 14.9 degree centigrade is cold according to this definition while 15.1
degree centigrade is normal implying the classical sets have rigid boundaries and because of
this rigidity, the expression of the world or the expression of data becomes very difficult. For
me, I feel or any one of us will feel very uneasy to say that 14.9 degrees centigrade is cold
and 15.1 degree centigrade is normal or for that matter, 24.9 degrees centigrade is normal and
25 degree or 25.1 degree centigrade is hot. That is a little weird or that is bizarre to have such
an approach to categorize things into various sets.
In a fuzzy set, it is very easy to represent them here. If the temperature is around 10 degree
centigrade, it is cold; temperature is around 20 degrees centigrade, it is normal and when
temperature is around 30 degree centigrade it is hot. In that sense, they do not have a rigid
boundary. If you say here, 25 degree centigrade, the 25 degree centigrade can be called
simultaneously hot as well as normal, with a fuzzy membership grade 0.5. 25 degrees
centigrade belongs to both normal as well as hot, but when I say 28 degree centigrade, this is
more likely a temperature in the category of hot, whereas the 22 degree centigrade is a
temperature that is more likely belonging to the set normal. This is a much nicer way to
represent a set. This is how the imprecise data can be categorized in a much nicer way using
fuzzy logic. This is the contrasting feature, why the fuzzy logic was introduced in the first
place.
Fuzzy sets have soft boundaries. I can say cold ranges from almost 0 °C to 20 °C: 10 °C has a
membership grade of 1, and as I move away from 10 °C in either direction, the membership
grade reduces from 1 towards 0. As the temperature rises and my membership grade in cold
reduces, I simultaneously enter a different set, normal. You can easily see that temperatures of
12, 13, 14, 15 all belong to both categories, cold as well as normal, but each member is
associated with a membership grade; this is very important.
In a classical set, there are members in a set. Here, there are members in a set associated with
a fuzzy index or membership function.
LECTURE-3
Bell MF: bell(x; a, b, c) = 1 / (1 + |(x − c)/a|^(2b)), where c is the centre, a is adjusted to vary
the width of the MF, and b controls the slope at the crossover points.
The bell membership function is also termed the Cauchy MF.
Left-Right MF:
Sigmoidal MF: sig(x; a, c) = 1 / (1 + exp(−a(x − c)))
It can be open left or open right depending on the sign of a.
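As a small illustration of these parameterized membership functions (a sketch using the generalized bell and sigmoidal forms described above; the parameter values are arbitrary):

import math

def bell_mf(x, a, b, c):
    # Generalized bell (Cauchy) MF: 1 / (1 + |(x - c)/a|^(2b))
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def sigmoid_mf(x, a, c):
    # Sigmoidal MF: open right for a > 0, open left for a < 0.
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

print(bell_mf(12.0, a=4.0, b=2.0, c=10.0))   # high membership near the centre c = 10
print(sigmoid_mf(12.0, a=1.5, c=10.0))       # open-right sigmoid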
In a classical set, if I say x is 1, then 1 minus x is 0; the manipulation concerns the member x
itself. In a fuzzy set, any kind of manipulation does not involve x directly; rather it involves the
membership µ(x).
Containment or subset:
Three common operations: intersection, which we take as the minimum function; union, which
we take as the maximum function; and fuzzy complementation.
Take a candidate element: when it comes to A union B, take its two membership values in A and
B, find the maximum, here 0.1, and assign that as the membership in the union, so µ(A ∪ B) is
0.1 for this element. This is the meaning, and it is a very important operation. When we have two
different fuzzy sets, the operations themselves are the classical ones; the manipulation is among
the membership functions. Otherwise the notion of the classical set operation remains intact,
except that the associated fuzzy membership gets changed.
Complement (negation):
Now consider fuzzy complementation. What is the complement? In the figure, a particular
triangular membership function is the fuzzy set (red); its complement is just the inverse (blue),
given by µA'(x) = 1 − µA(x).
What is seen is that the members remain intact in the set A, whereas the associated membership
values get changed.
The other operations that we know for classical sets, like De Morgan's laws and the set
difference, can also be used for fuzzy sets.
Properties/ identities of fuzzy sets:
Identity: A ∪ ∅ = A, A ∩ X = A, A ∩ ∅ = ∅, and A ∪ X = X; here, X represents the universal
set and ∅ the null set.
The next step in establishing a complete system of fuzzy logic is to define the operations of
EMPTY, EQUAL, COMPLEMENT (NOT), CONTAINMENT, UNION (OR), and
INTERSECTION (AND). Before we can do this rigorously, we must state some formal
definitions:
Definition 1: Let X be some set of objects, with elements noted as x. Thus,
X = {x}.
Definition 2: A fuzzy set A in X is characterized by a membership function
µA(x) which maps each point in X onto the real interval [0.0, 1.0]. As
µA(x) approaches 1.0, the "grade of membership" of x in A increases.
Definition 3: A is EMPTY iff for all x, µA(x) = 0.0.
Definition 4: A = B iff for all x: µA(x) = µB(x) [or, µA = µB].
Definition 5: µA' = 1 - µA.
Definition 6: A is CONTAINED in B iff µA <= µB.
Definition 7: C = A UNION B, where: µC(x) = MAX(µA(x), µB(x)).
Definition 8: C = A INTERSECTION B where: µC(x) = MIN(µA(x), µB(x)).
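A minimal sketch of Definitions 3-8 for discrete fuzzy sets, stored here as Python dictionaries mapping each element to its membership grade (the sets A and B below are illustrative):

A = {'x1': 0.2, 'x2': 0.7, 'x3': 1.0}
B = {'x1': 0.5, 'x2': 0.3, 'x3': 1.0}

def complement(A):
    # Definition 5: muA'(x) = 1 - muA(x)
    return {x: 1.0 - m for x, m in A.items()}

def union(A, B):
    # Definition 7: muC(x) = MAX(muA(x), muB(x))
    return {x: max(A[x], B[x]) for x in A}

def intersection(A, B):
    # Definition 8: muC(x) = MIN(muA(x), muB(x))
    return {x: min(A[x], B[x]) for x in A}

def contained_in(A, B):
    # Definition 6: A is contained in B iff muA(x) <= muB(x) for every x
    return all(A[x] <= B[x] for x in A)

print(union(A, B), intersection(A, B), complement(A), contained_in(A, B))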
Difference probability & fuzzy operations:
It is important to note the last two operations, UNION (OR) and INTERSECTION (AND),
which represent the clearest point of departure from a probabilistic theory for sets to fuzzy
sets. Operationally, the differences are as follows:
For independent events, the probabilistic operation for AND is multiplication, which (it can
be argued) is counterintuitive for fuzzy systems. For example, let us presume that x = Bob, S
is the fuzzy set of smart people, and T is the fuzzy set of tall people. Then, if µS(x) = 0.90
and µT(x) = 0.90, the probabilistic result would be:
µS(x) * µT(x) = 0.81
whereas the fuzzy result would be:
MIN(µS(x), µT(x)) = 0.90
The probabilistic calculation yields a result that is lower than either of the two initial values,
which when viewed as "the chance of knowing" makes good sense. However, in fuzzy terms
the two membership functions would read something like "Bob is very smart" and "Bob is
very tall." If we presume for the sake of argument that "very" is a stronger term than "quite,"
and that we would correlate "quite" with the value 0.81, then the semantic difference
becomes obvious. The probabilistic calculation would yield the statement If Bob is very
smart, and Bob is very tall, then Bob is a quite tall, smart person. The fuzzy calculation,
however, would yield If Bob is very smart, and Bob is very tall, then Bob is a very tall, smart
person.
Another problem arises as we incorporate more factors into our equations (such as the fuzzy
set of heavy people, etc.). We find that the ultimate result of a series of AND's approaches
0.0, even if all factors are initially high. Fuzzy theorists argue that this is wrong: that five
factors of the value 0.90 (let us say, "very") AND'ed together, should yield a value of 0.90
(again, "very"), not 0.59 (perhaps equivalent to "somewhat").
Similarly, the probabilistic version of A OR B is (A + B - A*B), which approaches 1.0 as
additional factors are considered. Fuzzy theorists argue that a string of low membership grades
should not produce a high membership grade; instead, the limit of the resulting membership
grade should be the strongest membership value in the collection.
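The numerical contrast described above can be checked with a short sketch; the five 0.90 grades are the example values from the text:

from functools import reduce

grades = [0.90] * 5   # five factors, each "very" (0.90)

prob_and = reduce(lambda a, b: a * b, grades)          # probabilistic AND: product
fuzzy_and = min(grades)                                # fuzzy AND: minimum
prob_or = reduce(lambda a, b: a + b - a * b, grades)   # probabilistic OR: a + b - a*b
fuzzy_or = max(grades)                                 # fuzzy OR: maximum

print(prob_and)    # ~0.59: drifts towards 0 as more factors are AND'ed
print(fuzzy_and)   # 0.90: stays at the weakest grade
print(prob_or)     # ~0.99999: drifts towards 1 as more factors are OR'ed
print(fuzzy_or)    # 0.90: stays at the strongest grade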
The skeptical observer will note that the assignment of values to linguistic meanings (such as
0.90 to "very") and vice versa, is a most imprecise operation. Fuzzy systems, it should be
noted, lay no claim to establishing a formal procedure for assignments at this level; in fact,
the only argument for a particular assignment is its intuitive strength. What fuzzy logic does
propose is to establish a formal method of operating on these values, once the primitives have
been established.
Hedges:
Another important feature of fuzzy systems is the ability to define "hedges," or modifiers of
fuzzy values. These operations are provided in an effort to maintain close ties to natural
language, and to allow for the generation of fuzzy statements through mathematical
calculations. As such, the initial definition of hedges and operations upon them will be quite a
subjective process and may vary from one project to another. Nonetheless, the system
ultimately derived operates with the same formality as classic logic. The simplest example is
the hedge "very", with which one transforms the statement "Jane is old" to "Jane is very old."
The hedge "very" is usually defined as follows:
µ"very"A(x) = µA(x)^2
Thus, if µOLD(Jane) = 0.8, then µVERYOLD(Jane) = 0.64.
Other common hedges are "more or less" [typically SQRT(µA(x))], "somewhat," "rather,"
"sort of," and so on. Again, their definition is entirely subjective, but their operation is
consistent: they serve to transform membership/truth values in a systematic manner according
to standard mathematical functions.
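A small sketch of these hedge operators applied to the Jane example above:

def very(mu):
    return mu ** 2       # "very": square the membership value

def more_or_less(mu):
    return mu ** 0.5     # "more or less": square root of the membership value

mu_old_jane = 0.8
print(very(mu_old_jane))           # 0.64  -> "Jane is very old"
print(more_or_less(mu_old_jane))   # ~0.89 -> "Jane is more or less old"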
LECTURE-5
This is the idea behind the extension principle: a point-to-point mapping from a set A to B
through a function f is possible. If the mapping is many-to-one, i.e. two (or more) x in A map to
the same f(x), then the membership value of f(x) in set B is calculated as the maximum of the
membership values of those x, that is, µB(y) = max over {x : f(x) = y} of µA(x).
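A minimal sketch of this max rule for a discrete fuzzy set A and a many-to-one crisp function f (the set and the squaring function below are illustrative):

def extend(A, f):
    # Extension principle: muB(y) = max{ muA(x) : f(x) = y }
    B = {}
    for x, mu in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), mu)
    return B

A = {-2: 0.1, -1: 0.4, 0: 1.0, 1: 0.6, 2: 0.2}
print(extend(A, lambda x: x * x))   # x and -x collide on x*x, so the larger grade is kept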
Fuzzy Relation:
CRISP MAPPINGS:
B=A◦R
Applying the Relation
Fuzzy Mappings:
The example: let X = {1, 2, 3}, so X has three members, and Y = {1, 2} has two members. The
membership function associated with each ordered pair is µR(x, y) = e^(−(x−y)²). This kind of
membership function is used to express how close the members of Y are to the members of X:
if I relate 1 to 1, then (1 − 1) is 0, the membership is 1, and 1 and 1 are very close to each other,
whereas 2 and 1 are a little farther apart and 3 and 1 are farther still. This is the kind of
relationship we are looking for between these two sets.
Let us derive the fuzzy relation. Given this membership function, the fuzzy relation is of course
over all the ordered pairs (1,1), (1,2), (2,1), (2,2), (3,1), (3,2), each with its associated
membership value. You just compute e^(−(x−y)²) for each pair: e^(−(1−1)²), e^(−(1−2)²),
e^(−(2−1)²), e^(−(2−2)²), e^(−(3−1)²), e^(−(3−2)²), which gives approximately 1, 0.37, 0.37, 1,
0.02 and 0.37. This is one way to find a relation.
Normally it is easier to express the relation as a matrix instead of this continuum fashion where
each ordered pair is listed with its membership value. How do we do that? With x = 1, 2, 3
indexing the rows and y = 1, 2 indexing the columns, the relation matrix is
R = [ 1    0.37 ]
    [ 0.37 1    ]
    [ 0.02 0.37 ]
The membership function describes the closeness between the sets X and Y. It is obvious that a
higher value implies a stronger relation. Where is the relation strongest? Between 1 and 1, and
between 2 and 2: in each case the two elements are numerically the closest to each other, and
the corresponding membership value is 1. Higher values imply stronger relations.
This is the formal definition of a fuzzy relation: it is a fuzzy set defined on the Cartesian product
of crisp sets X1, X2, ..., Xn. A fuzzy relation R is defined as
R = {((x1, ..., xn), µR(x1, ..., xn)) | (x1, ..., xn) ∈ X1 × ... × Xn},
where the associated fuzzy membership µR(x1, ..., xn) is a number between 0 and 1.
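A sketch that builds such a relation matrix from a closeness membership function defined over the Cartesian product X × Y (here the e^(−(x−y)²) function from the example above):

import math

def fuzzy_relation(X, Y, mu):
    # R[i][j] = mu(X[i], Y[j]): the relation matrix over the Cartesian product X x Y.
    return [[mu(x, y) for y in Y] for x in X]

closeness = lambda x, y: math.exp(-(x - y) ** 2)
R = fuzzy_relation([1, 2, 3], [1, 2], closeness)
for row in R:
    print([round(v, 2) for v in row])   # rows for x = 1, 2, 3; columns for y = 1, 2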
LECTURE-6
Max-product composition:
Let R1 relate X = {x1, x2, x3} to Y = {y1, y2} and R2 relate Y to Z = {z1, z2}; the composition
R3 = R1 ∘ R2 relates X to Z. For x1, take its row of R1, say (0.1, 0.2), and to find the
membership relating x1 and z1, take the column of R2 for z1, say (0.9, 0.7). Multiply
element-wise: 0.1 × 0.9 = 0.09 and 0.2 × 0.7 = 0.14, and take the maximum, which is 0.14.
Take another example: the relationship between x2 and z2. For x2 the row is (0.4, 0.5) and for
z2 the column is (0.8, 0.6). Multiplying, 0.4 × 0.8 = 0.32 and 0.5 × 0.6 = 0.30; the maximum is
0.32. Filling in all the entries this way gives the composed relation R3, and if you go back and
compare it with the R3 obtained earlier by max-min composition, you will see that it is different.
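Both composition operators can be written as one generic routine on relation matrices stored as nested lists; the rows and columns quoted in the example are reused below, while the remaining entries are illustrative:

def compose(R1, R2, combine=min, aggregate=max):
    # Entry (i, k) aggregates combine(R1[i][j], R2[j][k]) over all intermediate j.
    return [[aggregate(combine(R1[i][j], R2[j][k]) for j in range(len(R2)))
             for k in range(len(R2[0]))]
            for i in range(len(R1))]

R1 = [[0.1, 0.2], [0.4, 0.5], [0.6, 0.7]]   # relates x1, x2, x3 to y1, y2
R2 = [[0.9, 0.8], [0.7, 0.6]]               # relates y1, y2 to z1, z2

print(compose(R1, R2))                                 # max-min composition
print(compose(R1, R2, combine=lambda a, b: a * b))     # max-product composition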
LECTURE-7
If x is A then y is B
"x is A" is the antecedent or premise; "y is B" is the consequence or conclusion.
The whole statement is the rule.
FUZZY MODELLING:
Fuzzy Inferencing
The process of fuzzy reasoning is incorporated into what is called a Fuzzy Inferencing
System. It is comprised of three steps that process the system inputs to the appropriate system
outputs. These steps are 1) Fuzzification, 2) Rule Evaluation, and 3) Defuzzification. The
system is illustrated in the following figure.
Each step of fuzzy inferencing is described in the following sections.
Fuzzification
Fuzzification is the first step in the fuzzy inferencing process. This involves a domain
transformation where crisp inputs are transformed into fuzzy inputs. Crisp inputs are exact
inputs measured by sensors and passed into the control system for processing, such as
temperature, pressure, rpm, etc. Each crisp input that is to be processed by the FIU has its
own group of membership functions or sets to which they are transformed. This group of
membership functions exists within a universe of discourse that holds all relevant values that
the crisp input can possess. The following shows the structure of membership functions
within a universe of discourse for a crisp input.
where:
degree of membership: degree to which a crisp value is compatible to a membership
function, value from 0 to 1, also known as truth value or fuzzy input.
membership function, MF: defines a fuzzy set by mapping crisp values from its domain to
the sets associated degree of membership.
crisp inputs: distinct or exact inputs to a certain system variable, usually measured
parameters external from the control system, e.g. 6 Volts.
label: descriptive name used to identify a membership function.
scope: or domain, the width of the membership function, the range of concepts, usually
numbers, over which a membership function is mapped.
universe of discourse: range of all possible values, or concepts, applicable to a system
variable.
When designing the number of membership functions for an input variable, labels must
initially be determined for the membership functions. The number of labels correspond to the
number of regions that the universe should be divided, such that each label describes a region
of behavior. A scope must be assigned to each membership function that numerically
identifies the range of input values that correspond to a label.
The shape of the membership function should be representative of the variable. However this
shape is also restricted by the computing resources available. Complicated shapes require
more complex descriptive equations or large lookup tables. The next figure shows examples
of possible shapes for membership functions.
When considering the number of membership functions to exist within the universe of
discourse, one must consider that:
i) too few membership functions for a given application will cause the response of the system
to be too slow and fail to provide sufficient output control in time to recover from a small
input change. This may also cause oscillation in the system.
ii) too many membership functions may cause rapid firing of different rule consequents for
small changes in input, resulting in large output changes, which may cause instability in the
system.
These membership functions should also be overlapped. No overlap reduces a system based
on Boolean logic. Every input point on the universe of discourse should belong to the scope
of at least one but no more than two membership functions. No two membership functions
should have the same point of maximum truth, (1). When two membership functions overlap,
the sum of truths or grades for any point within the overlap should be less than or equal to 1.
Overlap should not cross the point of maximal truth of either membership function.
The fuzzification process maps each crisp input on the universe of discourse, and its
intersection with each membership function is transposed onto the μ axis as illustrated in the
previous figure. These μ values are the degrees of truth for each crisp input and are associated
with each label as fuzzy inputs. These fuzzy inputs are then passed on to the next step, Rule
Evaluation.
Fuzzy If-Then Rules:
We briefly comment on the so-called fuzzy IF-THEN rules introduced by Zadeh. They may be
understood as partial imprecise knowledge of some crisp function and have (in the simplest
case) the form IF x is Ai THEN y is Bi. They should not be immediately understood as
implications; think of a table relating values of a (dependent) variable y to values of an
(independent) variable x:
OR represents the union or maximum between the two sets, expressed as µ(A OR B)(x) = MAX(µA(x), µB(x)).
The process for determining the result or rule strength of the rule may be done by taking the
minimum fuzzy input of antecedent 1 AND antecedent 2, min. inferencing. This minimum
result is equal to the consequent rule strength. If there are any consequents that are the same
then the maximum rule strength between similar consequents is taken, referred to as
maximum or max. inferencing, hence min./max. inferencing. This infers that the rule that is
most true is taken. These rule strength values are referred to as fuzzy outputs.
Defuzzification
Defuzzification involves the process of transposing the fuzzy outputs to crisp outputs. There
are a variety of methods to achieve this, however this discussion is limited to the process used
in this thesis design.
A method of averaging is utilized here, known as the Center of Gravity (COG) method; it is a
method of calculating the centroids of sets. The output membership functions to
which the fuzzy outputs are transposed are restricted to being singletons. This is done to limit
the degree of calculation intensity in the microcontroller. The fuzzy outputs are transposed to
their membership functions similarly as in fuzzification. With COG the singleton values of
outputs are calculated using a weighted average, illustrated in the next figure. The crisp
output is the result and is passed out of the fuzzy inferencing system for processing
elsewhere.
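A sketch of this weighted-average (centre of gravity over singleton outputs) step; the singleton positions and rule strengths below are illustrative:

def cog_singletons(singletons, strengths):
    # Crisp output = sum(strength_i * singleton_i) / sum(strength_i)
    num = sum(w * s for w, s in zip(strengths, singletons))
    den = sum(strengths)
    return num / den if den else 0.0

# e.g. output singletons at 1000, 2000, 3000 rpm with fuzzy output strengths 0.2, 0.7, 0.4
print(cog_singletons([1000, 2000, 3000], [0.2, 0.7, 0.4]))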
A fuzzy implication relation is another category, which we will call the Zadeh implication. Here
"p implies q" may mean either that p and q are both true, or that p is false. That is, just as in the
local Mamdani rule we take p and q to be simultaneously true, here p implies q means either p
and q are simultaneously true (the Mamdani local rule) or p is false, in which case p implies q
carries no further meaning. This adds an extra term, so the implication is (p and q) or (not p).
Thus the relational matrix can be computed as follows. What is p and q? It is min(µA(x), µB(y)).
What is not p? It is 1 − µA(x). The whole expression is the maximum of these two, so the
elements of the relational matrix are computed as
µR(x, y) = max( min(µA(x), µB(y)), 1 − µA(x) ).
Given a set of rules, we have just learnt various schemes by which we can construct a relational
matrix between the antecedent and the consequent. The next step is to utilize this relational
matrix for inference. This method is commonly known as the compositional rule of inference:
associated with each rule we have a relational matrix, so given a rule means given a relational
matrix, and given a new antecedent we compute a new consequent.
Fig. Compositional rules
This is derived using fuzzy compositional rules. The following are the different rules for
fuzzy composition operation, that is, B equal to A composition R. R is the relational matrix
associated with a specific rule, A is a new antecedent that is known, R is known, B is the new
consequent for the new antecedent A. I have to find out what is B for this new A, given R.
That is computed by A composition R and we have already discussed in the relation class that
there are various methods and max-min is very popular.
First, we compute min and then max. Similarly, max-product: instead of min, we take the
product and compute what is the maximum value. Similarly, min-max: instead of max-min, it
is min-max. First, max and then min. Next, max-max and min-min. One can employ these
looking at the behavior of a specific data.
There are other mechanisms also, as we discussed. For the same example, if you use max-min,
you get one B′; for max-product, you get another B′; for min-max, you get yet another; min-max
and max-min turn out to be the same for this example. Then, for max-max, you see that all the
fuzzy memberships take the maximum values, and for min-min they take the minimum values.
Approximate reasoning:
In any such logical system it is very difficult to obtain an exact result. That is why, from the
engineering perspective, we are more liberal; we do not want to be so precise. As long as our
system works, as long as our control system works, we are happy.
Rule 1: If the temperature is hot, then the fan should run fast. If the temperature is moderately
hot, then the fan should run moderately fast. In this example, the temperature is given in degrees
Fahrenheit and the speed is expressed in rpm. The fuzzy set for hot is
H = {0.4/70 °F, 0.6/80 °F, 0.8/90 °F, 0.9/100 °F}.
Similarly, the fuzzy set F, for which the fan should run fast, is
F = {0.3/1000 rpm, 0.5/2000 rpm, 0.7/3000 rpm, 0.9/4000 rpm}.
Given H′, which is moderately hot: moderately hot is a little less hot than hot, so the same
temperatures appear but their corresponding membership values are reduced. You can easily see
that for 70, instead of 0.4 it is now 0.2; for 80, instead of 0.6 it is 0.4; for 90, instead of 0.8 it is
0.6; and for 100, instead of 0.9 it is 0.8. So H′ = {0.2/70, 0.4/80, 0.6/90, 0.8/100}. Now, the
question is to find F′.
I hope the question is clear. We are given rule 1; we have defined the fuzzy sets hot and fast by
these two statements, and for moderately hot we also know the fuzzy set. What we do not know
is the fuzzy set corresponding to the consequent, moderately fast. Find F′: if H, then F; given H′,
find F′. First, what do we do?
Corresponding to rule 1, we first find R. We know the membership values for H are 0.4, 0.6, 0.8
and 0.9 (for 70 °F, 80 °F, 90 °F and 100 °F) and for fast they are 0.3, 0.5, 0.7 and 0.9 (for 1000,
2000, 3000 and 4000 rpm). Each entry of R is the minimum of the corresponding pair: between
0.4 (70 °F) and 0.3 (1000 rpm) the entry is min(0.4, 0.3) = 0.3; between 0.4 and 0.5 it is 0.4;
between 0.4 and 0.7 it is 0.4; and between 0.4 and 0.9 it is 0.4.
Similarly, for the next row (0.6): the minimum with 0.3 is 0.3, with 0.5 it is 0.5, with 0.7 it is 0.6,
and with 0.9 it is 0.6. Filling in the remaining rows the same way gives
R = [ 0.3 0.4 0.4 0.4 ]
    [ 0.3 0.5 0.6 0.6 ]
    [ 0.3 0.5 0.7 0.8 ]
    [ 0.3 0.5 0.7 0.9 ]
This is the relational matrix associated with rule 1: if H, then F. Now, to find F′ given H′, we use
the fuzzy compositional rule of inference, F′ = H′ ∘ R.
H′ (moderately hot) is composed with R. I am repeating so that you understand how to compute
it. Place the row vector H′ = (0.2, 0.4, 0.6, 0.8) against the first column of R, take the
element-wise minimum, which gives (0.2, 0.3, 0.3, 0.3), and then take the maximum of these,
which is 0.3.
Similarly, take H′ against the second column: the minima are (0.2, 0.4, 0.5, 0.5) and the
maximum is 0.5. For the third column you will find it is 0.7, and for the fourth column the
minima are (0.2, 0.4, 0.6, 0.8), whose maximum is 0.8. So F′ = (0.3, 0.5, 0.7, 0.8). That is how
we infer, or do approximate reasoning, with a rule base. This is a very simple case.
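The whole fan-speed example can be reproduced with a short sketch: build the max-min relational matrix from H and F, then compose it with H′:

H = [0.4, 0.6, 0.8, 0.9]    # "hot" over 70, 80, 90, 100 deg F
F = [0.3, 0.5, 0.7, 0.9]    # "fast" over 1000, 2000, 3000, 4000 rpm
Hp = [0.2, 0.4, 0.6, 0.8]   # "moderately hot"

# Relational matrix of the rule "if H then F": R[i][j] = min(H[i], F[j])
R = [[min(h, f) for f in F] for h in H]

# Compositional rule of inference (max-min): F'[j] = max_i min(H'[i], R[i][j])
Fp = [max(min(Hp[i], R[i][j]) for i in range(len(Hp))) for j in range(len(F))]
print(Fp)   # [0.3, 0.5, 0.7, 0.8], as derived above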
Multiple rule:
There are two rules now. Rule 1 is if height is tall, then speed is high. Rule 2: if height is
medium, then speed is moderate. This is describing a rule for a person as to how fast he can
walk. Normally, those who are tall can walk very fast and those who are short, naturally their
speed will be less. This is one fuzzy rule that expresses the speed of a person while walking.
If height is tall, then speed is high and if height is medium, then speed is moderate. For this,
the fuzzy memberships are defined as tall, high, medium, and moderate.
Tall is {0.5/5 ft, 0.8/6 ft, 1.0/7 ft}. For speed is high, at 5, 7 and 9 metres per second the
corresponding membership values are 0.4, 0.8 and 0.9. For H2, which is medium height, the
fuzzy set is {0.6/5 ft, 0.7/6 ft, 0.6/7 ft}, and moderate speed is
{0.6/5 m/s, 0.8/7 m/s, 0.7/9 m/s}. Given these fuzzy sets, the question now is: given H′, which is
above-average height, with fuzzy set {0.5/5 ft, 0.9/6 ft, 0.8/7 ft}, find S′, the speed above
normal. I hope the question is very clear to you.
This is the solution of this example. We have two rules, so naturally we will have two relational
matrices: R1 for rule 1 and R2 for rule 2. I will not go into the details of how we compute them;
you simply take the antecedent and consequent fuzzy sets and, for each entry, take the minimum
of the two membership values. The rows correspond to heights 5, 6 and 7 feet and the columns
to speeds 5, 7 and 9 metres per second.
Checking each pair of membership values and taking the minimum, R1 comes out row by row
as (0.4, 0.5, 0.5), (0.4, 0.8, 0.8), (0.4, 0.8, 0.9); you can verify this. Similarly, R2 can be found
by taking the minimum membership entry between its two fuzzy sets, that is,
if I say this is H1 and S1 and this is H2 and S2. Look at these two fuzzy sets, find out what
the minimum entries are for each relation and then, how do we compute S dash above
normal? We have now two relational matrices. It is very simple. We do two composition
operations: H dash composition with R1 (this one) and again, H dash composition R2 and
then, we take the maximum of that, maximum of these two.
You can easily see that the maximum of H dash composition R1, H dash composition R2.
You can easily see that because H dash is common, this particular expression is the same as
H dash composition max of R1 and R2. This is R1 and R2. We look at all those entries
wherever it is the maximum: for 0.4 and 0.6, the maximum is 0.6; for 0.5 and 0.6, the
maximum is 0.6; for 0.5 and 0.6, the maximum is 0.6. You see the last element here 0.9 here
and 0.6, so this is 0.9. Like that, for all entries of R1 and R2, whatever the maximum values,
you put these values here (that is called maximum R1 and R2) and take a composition with H
dash. So H dash composition max of R1 and R2. H dash is already given as 0.5, 0.9, and 0.8.
If you do this composition, you get 0.6, 0.8, and 0.8. I hope this clears your concept of how
we compute or we do approximate reasoning in a rule base. Similarly, if there are multiple
rules, we have no problem and we can go ahead with the same principle.
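To make the arithmetic above concrete, the following is a minimal NumPy sketch that reproduces the numbers of this example. The array and function names (H1, S1, relation, compose, and so on) are just labels chosen here for illustration; they are not notation from the notes.

import numpy as np

# Membership vectors over heights (5, 6, 7 ft) and speeds (5, 7, 9 m/s)
H1 = np.array([0.5, 0.8, 1.0])      # tall
S1 = np.array([0.4, 0.8, 0.9])      # high
H2 = np.array([0.6, 0.7, 0.6])      # medium
S2 = np.array([0.6, 0.8, 0.7])      # moderate
H_dash = np.array([0.5, 0.9, 0.8])  # above average

def relation(antecedent, consequent):
    """Fuzzy relation R(i, j) = min(antecedent_i, consequent_j)."""
    return np.minimum.outer(antecedent, consequent)

def compose(fuzzy_set, R):
    """Max-min composition: out_j = max_i min(set_i, R_ij)."""
    return np.max(np.minimum(fuzzy_set[:, None], R), axis=0)

R1 = relation(H1, S1)
R2 = relation(H2, S2)
S_dash = compose(H_dash, np.maximum(R1, R2))
print(S_dash)   # [0.6 0.8 0.8], the speed "above normal"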
The last section is multiple rules with continuous fuzzy sets. We talked about discrete fuzzy sets, but if the fuzzy sets are continuous, how do we deal with that? Normally, a continuous fuzzy system with two non-interactive inputs x1 and x2 (the antecedents) and a single output y (the consequent) is described by a collection of r linguistic IF-THEN rules of the form: if x1 is A1k and x2 is A2k, then yk is Bk, for k = 1, 2, ..., r. This is the k-th rule; similarly, we can have rule 1, rule 2, rule 3, up to rule r. Here A1k and A2k are the fuzzy sets representing the k-th antecedent pair and Bk is the fuzzy set representing the k-th consequent. In the following presentation, we take a two-input, two-rule system just to illustrate how we infer from a rule base where the fuzzy sets are continuous. The inputs to the system are crisp values and we use the max-min inference method.
We have two rules here represented graphically. You can see there are two variables x1 and
x2. There are two fuzzy variables and for each rule, we have a consequent y. The first rule
says that if x1 is A1 1 and x2 is A2 1, then y is B1.
Similarly, if x1 is A1 2, x2 is A2 2, then y is B2. Now, how do we infer? Given a crisp input,
a new input is given, crisp input in the domain of x1 and another crisp input in the domain of
x2. There can be a system whose two variables can be temperature as well as pressure. You
can easily think x1 to be the temperature and x2 to be the pressure. For example, for a
particular given system, you found out the temperature to be 50 degrees centigrade and
pressure to be some value. Given these two quantities, crisp quantities, how do we infer what
should be y?
The crisp input is given, say the temperature. We find the corresponding membership values: for this crisp input, rule 1 gives the membership value µA1 1 and rule 2 gives µA1 2. For the second fuzzy variable, given its crisp input, rule 1 gives µA2 1 and rule 2 gives µA2 2. Once we have these membership values, we see, for rule 1, which of µA1 1 and µA2 1 is the minimum; here the minimum is µA2 1, and we clip (shade) the consequent of rule 1 at that level. For the second rule, the minimum of µA1 2 and µA2 2 is µA1 2; we take that minimum and shade the consequent of rule 2 at that level. Then we graphically combine the two shaded consequents by taking the maximum: first min, then max. Overlapping the two clipped figures and taking the resultant shaded area gives the aggregated output.
Once we have this resultant shaded area, the next step is to find the crisp value of y. There are many methods, but in this course we focus on only one, the center of gravity (COG) method. Taking the aggregated figure and finding its center of gravity gives the value y star; the resulting crisp output is denoted y star in the figure. What we have learnt is this: given crisp input 1 and crisp input 2 and two fuzzy rules, how do we infer a crisp output? Our data is crisp but the rules are fuzzy, so the computation is fuzzy. We take the data to the fuzzy rule base and fuzzify it through the fuzzification process. For each rule we find the shaded area of the consequent using the min principle; using the max principle we find the resultant area; and y star is the center of gravity of that area.
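The graphical procedure can also be written down directly. The sketch below is only an illustration, not the exact example in the figures: the triangular membership functions and the crisp inputs x1_star and x2_star are assumed values chosen to show the steps (clip each rule's consequent at its firing strength, aggregate by max, then take the centre of gravity).

import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

y = np.linspace(0.0, 10.0, 1001)          # universe of the output y
x1_star, x2_star = 3.0, 6.0               # assumed crisp inputs

# Rule 1: if x1 is A11 and x2 is A21 then y is B1
w1 = min(tri(x1_star, 0, 2, 5), tri(x2_star, 2, 5, 8))
B1 = tri(y, 0, 3, 6)
# Rule 2: if x1 is A12 and x2 is A22 then y is B2
w2 = min(tri(x1_star, 2, 5, 8), tri(x2_star, 5, 8, 11))
B2 = tri(y, 4, 7, 10)

# Min (implication) for each rule, then max (aggregation) over the rules
aggregated = np.maximum(np.minimum(w1, B1), np.minimum(w2, B2))

# Centre of gravity defuzzification
y_star = np.sum(y * aggregated) / np.sum(aggregated)
print(round(y_star, 3))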
LECTURE-8
Categories:
1. Mamdani type and
2. Takagi–Sugeno type (T-S, or TSK after Takagi, Sugeno, and Kang).
Mamdani type fuzzy systems:
These employ fuzzy sets in the consequent part of the rules. This is a Mamdani type fuzzy
logic controller. What they do is that the consequent part itself takes the control action; the
incremental control action is described in the consequent part of each rule.
Fuzzifier: In a fuzzy logic controller, the computation is through linguistic values, not through exact numerical computation, so the fuzzifier converts crisp data into fuzzy data. In the case of temperature, I can say it is hot, medium hot, cold, medium cold, very hot, or normal; these are linguistic values. That means, given a crisp value of temperature, say 40 degrees, the fuzzifier converts it into the various linguistic values, each associated with a specific membership function. That is the fuzzifier.
Once the data has been fuzzified, it goes to the rule base and an inference mechanism is applied. The inference takes place in fuzzy terms, not in classical (crisp) terms, and after the fuzzy inference about the decision or the control action, we place a defuzzifier. The defuzzifier converts the fuzzy control action into a crisp control action.
In general, what we can say is the principal design parameters of a fuzzy logic controller are
the following: fuzzification strategies and interpretation of a fuzzification operator. How do
we fuzzify a crisp data? In the database, the discretization or normalization of universe of
discourse is done, because we must know the range of data one will encounter in an actual
plant. Accordingly, the normalization must be done so that we are taking into account all
possible values of data that one may encounter in a physical plant.
Fig. Parameters to be designed in FLC
If I know the dynamic range of the input to the controller and of the input to the plant (the input to the plant is actually the output of the controller), then over that dynamic range I must do a fuzzy partition of the input and output space, and this fuzzification should be complete. Suppose I draw a universe of discourse for a specific variable x1, with µ on the vertical axis and x1 on the horizontal axis. If the fuzzy sets are defined in such a way that some part of the axis is left uncovered, then that part of the data is not associated with any fuzzy membership.
In that case the fuzzification process is not complete. Completeness means that the entire universe of discourse in a specific domain must be covered. There are various kinds of control systems: process control, robot control, aircraft control, and so on. Every control system is associated with some input data and some output data, and all possible input data and all possible output data should be associated with a specific linguistic value as well as a membership function.
Rule base:
Once fuzzification is done, how do we create a rule base? As I said, typically the two most important variables in the rule base are the error and the change in error, and we also showed why this is so. Rule base design involves the choice of process state input variables and control variables. If I am implementing a fuzzy state feedback controller, then the control u would be minus K x, where x is the state vector of the system. If I am implementing a fuzzy PID-type controller, then the control is u(k) = u(k-1) + K delta u(k), where delta u(k) is a function of the error and the change in error; in a state feedback controller, by contrast, there is a command signal r and the control action depends on the states x1, x2, ..., xn.
Source and derivation of fuzzy control rules.
How do I derive these rules? What is the basis? Types of fuzzy control rules. A type of fuzzy
control rule means whether it is a PID controller, fuzzy PID controller or it is a fuzzy state
feedback controller. Similarly, completeness of fuzzy control rules means given any crisp
data in the domain of input space as well as output space, do I have in my rule base a specific
rule associated with this data? If I do not have any rule for this data, then the FLC will fail.
That is the meaning of completeness of fuzzy control rules.
Fuzzy inference mechanism:
We have already talked about what is fuzzy inference mechanism. Given multiple rules, how
do we infer the consequent part? Defuzzification strategies and the interpretation of the defuzzification operator: once the fuzzy inference is done, how do I get a crisp value or a crisp control action from it? This is called defuzzification.
This is how we fuzzify crisp data, that is, how the crisp inputs are converted to fuzzy sets using triangular membership functions; strictly, the variables here are not x1 and x2 but e and delta e. The membership functions need not always be triangular, but in the control literature most of them are triangular functions.
Fig. Defuzzification
LECTURE-9
Example #2 : Two-input
• A two-input TSK fuzzy model with 4 rules can be expressed as follows (a numerical evaluation is sketched after the list):
– If X is small and Y is small then Z = -X +Y +1.
– If X is small and Y is large then Z = -Y +3.
– If X is large and Y is small then Z = -X+3.
– If X is large and Y is large then Z = X+Y+2.
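One way to see how such a TSK model produces a crisp output is to take the weighted average of the four rule consequents, with the firing strengths as weights. The sketch below is illustrative only: the membership functions assumed here for "small" and "large" on X and Y are my own choices (the original example defines them graphically), while the four consequent functions are exactly those listed above.

import numpy as np

def small(v):   # assumed membership for "small" on [-5, 5]
    return np.clip((5.0 - v) / 10.0, 0.0, 1.0)

def large(v):   # assumed membership for "large" on [-5, 5]
    return np.clip((v + 5.0) / 10.0, 0.0, 1.0)

def tsk_output(x, y):
    # Firing strength of each rule (AND realised by min) and its consequent value
    rules = [
        (min(small(x), small(y)), -x + y + 1),
        (min(small(x), large(y)), -y + 3),
        (min(large(x), small(y)), -x + 3),
        (min(large(x), large(y)),  x + y + 2),
    ]
    num = sum(w * z for w, z in rules)
    den = sum(w for w, _ in rules)
    return num / den    # weighted average of the rule outputs

print(tsk_output(1.0, 2.0))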
Zero-order TSK Fuzzy Model
• When f is constant, we have a zero-order TSK fuzzy model (a special case of the Mamdani fuzzy inference system in which each rule's consequent is specified by a fuzzy singleton, or a pre-defuzzified consequent).
• Minimum computation time.
• The overall output, via either weighted average or weighted sum, is always crisp.
• Without the time-consuming defuzzification operation, the TSK (Sugeno) fuzzy model is by far the most popular candidate for sample-data-based fuzzy modeling.
For any physical plant, a general Takagi–Sugeno model of N rules is given by Rule i, the i-th rule: if x1(k) is a specific fuzzy set M1i and x2(k) is another specific fuzzy set M2i, and so on until xn(k) is the fuzzy set Mni, then the system dynamics is locally described as x(k+1) = Ai x(k) + Bi u(k), for i = 1, 2, ..., N, because there are N rules.
Advantages over Mamdani model:
1. Less computation
2. Less time consuming
3. Simple
4. Mostly used for sample data based fuzzy modelling
Tsukamoto Fuzzy Models:
• The consequent of each fuzzy if-then rule is represented by a fuzzy set with a monotonic MF.
– As a result, the inferred output of each rule is defined as a crisp value induced by the rule's firing strength.
• The overall output is taken as the weighted average of each rule's output.
Fig. (a) Grid partition (b) Tree partition (c) Scatter partition
If a certain transformation of the input is done, more flexible boundaries and partitions can be obtained.
LECTURE-10
What is a neuron? A neuron is the basic processing unit in the neural network sitting in our brain. It consists of
1. Nucleus (cell body)
2. Axon - output node
3. Dendrites - input node
4. Synaptic junction
The dynamics of this synaptic junction is complex. The signal produced by the action of a neuron passes through the synaptic junction, and the actuated output is carried over to the dendrites of another neuron; the chemical messengers involved are the neurotransmitters. We know from experience that these synaptic junctions are reinforced or weakened, in the sense that the output of a synaptic junction may excite the receiving neuron or inhibit it. This reinforcement of the synaptic weight is a concept that has been carried over to the artificial neural model.
The objective is to create artificial machines, and these artificial neural networks are motivated by certain features observed in the human brain, like, as we said earlier, parallel distributed information processing.
1. A traditional computer has a very fast clock speed, in the GHz range; however, when it comes to certain processing tasks like pattern recognition and language understanding, the brain is much faster.
2. Intelligence and self-awareness are absent in an artificial machine.
Fig. An artificial neuron
An Artificial Neuron:
The basic computational unit in an artificial neural network is the neuron; obviously, it has to be an artificial neuron.
In a simple neuron, let the input signals be x1, x2, ..., xn with weights w1, w2, ..., wn. The weighted sum of the inputs is passed through an activation function f to give the output y. What you are seeing is actually a nonlinear map from the input vector x to the output y. A single neuron has multiple inputs but a single output, and this output y and the inputs bear a nonlinear relationship through f. Neural networks can be built using this single neuron as the building block.
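As a small illustration of the nonlinear map y = f(sum of wi xi) described above, here is a minimal sketch of a single artificial neuron; the weights, bias and logistic activation below are assumed example values, not taken from the notes.

import numpy as np

def neuron(x, w, b):
    """Single artificial neuron: weighted sum followed by a logistic activation."""
    net = np.dot(w, x) + b             # net input = sum of w_i * x_i plus a bias
    return 1.0 / (1.0 + np.exp(-net))  # activation function f

x = np.array([0.5, -1.2, 3.0])         # multiple inputs
w = np.array([0.4, 0.1, -0.6])         # one weight per input
print(neuron(x, w, b=0.2))             # single output y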
Analogy to brain:
An Artificial Neural Network (ANN) is a system which performs information processing. An ANN resembles, or can be considered a generalization of, a mathematical model of the human brain, assuming that
1. Information processing occurs at many simple elements called neurons.
2. Signals are passed between neurons over connection links.
3. Each connection link has an associated weight, which in a typical neural net
multiplies the signal transmitted.
ANN is built with basic units called neurons which greatly resemble the neurons of human
brain. A neural net consists of a large number of simple processing elements called neurons.
Each neuron applies an activation function to its net input to determine its output signal.
Every neuron is connected to other neurons by means of directed communication links, each
with an associated weight. Each neuron has an internal state called its activation level, which
is a function of the inputs it has received. As and when the neuron receives signals, they get added up, and when the cumulative signal reaches the activation level the neuron sends an output; till then it keeps receiving input. So the activation level can be thought of as a threshold value.
In general, a neural network is characterized by
1. Pattern of connections between the neurons called its architecture
2. Method of determining the weights on the connections called its training or learning
algorithm
3. Its internal state called its Activation function.
The arrangement of neurons into layers and the connection patterns within and between
layers is called the net architecture. A neural net in which the signals flow from the input units to the output units in a forward direction is called a feed-forward net.
An interconnected competitive net in which there are closed-loop signal paths from a unit back to itself is called a recurrent network. In addition to architecture, the method of setting the values of
the weights called training is an important characteristic of neural nets. Based on the training
methodology used neural nets can be distinguished into supervised or unsupervised neural
nets. For a neural net with supervised training, the training is accomplished by presenting a
sequence of training vectors or patterns each with an associated target output vector. The
weights are then adjusted according to a learning algorithm. For neural nets with
unsupervised training, a sequence of input vectors is provided, but no target vectors are
specified. The net modifies the weights so that the most similar input vectors are assigned to
the same output unit. The neural net will produce a representative vector for each cluster
formed. Unsupervised learning is also used for other tasks, in addition to clustering.
LECTURE-2
Activation functions:
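The activation-function figures are not reproduced in these notes, so the sketch below simply gives, for reference, Python definitions of activation functions of the kind referred to later in this section (binary step, bipolar step, logistic sigmoid and tanh); the exact set used in the original figure is assumed.

import numpy as np

def binary_step(h):            # hard-limiting: 1 if net input >= 0, else 0
    return np.where(h >= 0, 1.0, 0.0)

def bipolar_step(h):           # hard-limiting with outputs +1 / -1
    return np.where(h >= 0, 1.0, -1.0)

def logistic(h, lam=1.0):      # smooth sigmoid in (0, 1); lam is the gain
    return 1.0 / (1.0 + np.exp(-lam * h))

def tanh_fn(h):                # smooth sigmoid in (-1, 1)
    return np.tanh(h)

h = np.linspace(-3, 3, 7)
print(binary_step(h), logistic(h), sep="\n")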
Architecture:
There are a wide variety of networks depending on the nature of information processing
carried out at individual nodes, the topology of the links, and the algorithm for adaptation of
link weights. Some of the popular among them include:
Perceptron: Definition: It is a step function based on a linear combination of real-valued inputs. If the combination is above a threshold it outputs a 1; otherwise it outputs a -1. It consists of a single neuron with multiple inputs and a single output. It has restricted information processing capability. The information processing is done through a transfer function which is either linear or non-linear.
Fig. A perceptron
A perceptron can learn only examples that are called “linearly separable”. These are
examples that can be perfectly separated by a hyperplane.
Perceptrons can learn many boolean functions: AND, OR, NAND, NOR, but not XOR
However, every boolean function can be represented with a perceptron network that has two
levels of depth or more.
The weights of a perceptron implementing the AND function are shown below.
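The original figure with the weight values is not reproduced here; one standard choice, assumed for illustration, is w1 = w2 = 1 with a threshold of 1.5, as in the sketch below.

def perceptron(x1, x2, w1=1.0, w2=1.0, threshold=1.5):
    """Outputs +1 if the weighted sum exceeds the threshold, otherwise -1."""
    return 1 if w1 * x1 + w2 * x2 > threshold else -1

# Truth table of the AND function realised by this choice of weights
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(x1, x2))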
Recurrent Neural Networks: The RNN topology involves backward links from the output to the input and hidden layers. The notion of time is encoded in the RNN information processing scheme. They are thus used in applications like speech processing where the inputs are time-sequence data.
Fig. Multilayer feed back network (Recurrent Neural Network)
Self-Organizing Maps: SOMs, or Kohonen networks, have a grid topology with unequal grid weights. The topology of the grid provides a low-dimensional visualization of the data distribution. They are thus used in applications which typically involve organization and human browsing of a large volume of data. Learning is performed using a winner-take-all strategy in an unsupervised mode. It is described in detail later.
Single layer Network:
A neural net with only input layer and output layer is called single layer neural network. A
neural network with input layer, one or more hidden layers and an output layer is called a
multilayer neural network. A single layer network has limited capabilities when compared to
the multilayer neural networks.
LECTURE-3
LECTURE-4
LECTURE-5
LECTURE-6
Architectures of MLP:
If there is no nonlinearity then an MLP can be reduced to a linear neuron.
1. Universal Approximator:
For the above theorem to be valid, the sigmoid function g(.) has to satisfy some conditions. It
must be: 1) non-constant, 2) bounded, 3) monotone-increasing and 4) continuous.
All the four transfer functions described in the section on Perceptrons satisfy conditions #1,2
and 3. But the hardlimiting nonlinearities are not continuous. Therefore, the logistic function
or the tanh function are suitable for use as sigmoids in MLPs.
2. In general, more layers/nodes mean greater network complexity.
Although 3 hidden layers with full connectivity are enough to learn any function, often more hidden layers and/or special architectures are used.
Sequential mode:
Updating the network weights after every presentation of a data point is the sequential mode of update.
- Lesser memory requirement.
- The random order of presentation of input patterns acts as a noise source, giving a lesser chance of getting stuck in local minima.
Rate of learning:
We have already seen the tradeoffs involved in choice of a learning rate.
A small learning rate η approximates the original continuous-domain equations more closely but slows down learning.
A large learning rate η gives a poorer approximation of the original equations; the error may not decrease monotonically and may even oscillate, but learning is faster.
A good rule of thumb for choosing η is
η = 1/m
where m is the number of inputs to a neuron. This rule assumes that there are different η's for different neurons.
3. Important tip relating learning rate and error surface:
Premature Saturation:
All the weight modification activity happens only when |h| is within certain limits: g′(h) ≈ 0, and hence Δw ≈ 0, for large |h|, so the NN gets stuck in a shallow local minimum.
Solutions:
1) - Keep a copy of the weights
- Retract to the pre-saturation state
- Perturb the weights, decrease η and proceed
2) - Reduce the sigmoid gain (lambda) initially
- Increase lambda gradually as the error is minimized
Applications of MLP
Three applications of MLPs that simulate aspects of sensory, motor or cognitive functions are
described.
1. Nettalk
2. Past tense learning
3. Autonomous Land Vehicle in a Neural Network (ALVINN)
LECTURE-6
A multilayer feed-forward network has more hidden layers, and again, when I say feed-forward network, connections are allowed only from any layer to its succeeding layers; connections from a layer back to a preceding layer are not allowed. In the example there are four layers: the inputs, a first hidden layer, a second hidden layer, a third hidden layer, and the output layer. When we state the number of layers, we do not count the input layer as one of the layers; so in a two-layered network there is only one hidden layer and the next layer is the output layer.
In this configuration, connections may run from a preceding layer to any later layer, but feedback connections, from a layer back towards a preceding layer, are not allowed. That is why this is known as a feed-forward network.
Today, we will derive the learning rule for a two-layered feed-forward neural network with a sigmoid activation function. Here there is only one hidden layer and one output layer; the output layer is always only one.
We will follow a certain convention while deriving the back propagation learning algorithm for this network. The principle is simple: given training data, we pass the input through the network, compute the error at the output, and use the gradient descent rule; the back-propagated error is used to modify the weights between the output layer and the hidden layer, and another form of back-propagated error is used to modify the weights between the input layer and the hidden layer. This is the convention that we will use.
Fig. The Gradient descent rule
After choosing the weights of the network randomly, the backpropagation algorithm is used
to compute the necessary corrections. The algorithm can be decomposed in the following four
steps:
i) Feed-forward computation
ii) Backpropagation to the output layer
iii) Backpropagation to the hidden layer
iv) Weight updates
The algorithm is stopped when the value of the error function has become sufficiently small.
In the case of p > 1 input-output patterns, an extended network is used to compute the error function for each of them separately. The weight corrections are computed for each pattern, and so for a weight such as w(1)ij we obtain one correction per pattern; these corrections are then combined to update the weight.
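As a compact illustration of the four steps above for a two-layered network with sigmoid activations, here is a minimal sketch of backpropagation in batch mode; the network sizes, learning rate and data are assumed toy values and the variable names do not follow the original figure.

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 3))                  # 4 training patterns, 3 inputs
T = rng.random((4, 1))                  # target outputs
W1 = rng.standard_normal((3, 5)) * 0.5  # input -> hidden weights
W2 = rng.standard_normal((5, 1)) * 0.5  # hidden -> output weights
eta = 0.5

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

for _ in range(1000):
    # i) feed-forward computation
    H = sigmoid(X @ W1)                 # hidden layer outputs
    Y = sigmoid(H @ W2)                 # network outputs
    # ii) backpropagation to the output layer
    delta_out = (Y - T) * Y * (1 - Y)   # error term at the output
    # iii) backpropagation to the hidden layer
    delta_hid = (delta_out @ W2.T) * H * (1 - H)
    # iv) weight updates (corrections summed over all patterns)
    W2 -= eta * H.T @ delta_out
    W1 -= eta * X.T @ delta_hid

print(np.mean((sigmoid(sigmoid(X @ W1) @ W2) - T) ** 2))  # final squared error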
LECTURE-7
• The µj is called the center of the jth hidden (RBF) node and σj is called the width.
• We can have a different σj for each hidden node.
We next consider learning the parameters of an RBF network from training samples.
• Let {(Xi, di), i = 1, ..., N} be the training set.
• Suppose we are using the Gaussian RBF.
• Then we need to learn the centers (µj) and widths (σj) of the hidden nodes and the weights into the output node (wj).
Like earlier, we can find parameters to minimize the empirical risk under the squared error loss function.
• This is the same as minimizing the sum of squares of errors. Let J denote this sum of squared errors; J is a function of wj, µj, σj, for j = 1, ..., p.
We can find the weights/parameters of the network to minimize J.
• To minimize J, we can use the standard iterative algorithm of gradient descent.
• This needs computation of gradient which can be done directly from the expression for J.
• For this network structure there are no special methods to evaluate all the needed partial
derivatives. Such a gradient descent algorithm is certainly one method of learning an RBF
network from given training data.
• This is a general-purpose method for learning an RBF network.
• Like in the earlier case, we have to fix p, the number of hidden nodes.
• Such procedure would have the usual problems of converging to a local minimum of the
error function.
• There are also other methods of learning an RBF network.
• If we have the basis functions φj, then it is exactly the same as a linear model and we can use the standard linear least squares method to learn the wj.
• To fix φj, we need to essentially fix µj (and maybe σj).
• So, if we can somehow fix the centers and widths of the RBF nodes, then we can learn the wj very easily, as sketched below.
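A minimal sketch of that idea follows: once the centres µj and widths σj are fixed, the output weights wj come from standard linear least squares. The data, the choice of centres as randomly selected training examples, and the common width used here are all assumptions made only for illustration.

import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(50, 1))                    # N = 50 training inputs
d = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)     # targets

p = 8                                                    # number of hidden (RBF) nodes
centers = X[rng.choice(len(X), p, replace=False)]        # centres taken from the training set
sigma = 1.0                                              # common width (assumed)

def design_matrix(X):
    """Phi[i, j] = Gaussian RBF of example i evaluated at centre j."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

Phi = design_matrix(X)
w, *_ = np.linalg.lstsq(Phi, d, rcond=None)              # least-squares output weights
print(np.mean((Phi @ w - d) ** 2))                       # training error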
As we have discussed earlier, these RBF networks use 'local' representations.
• What this means is that the µj should be 'representative' points of the feature space and they should 'cover' the feature space.
• Essentially, the proof that these networks can represent any continuous function is based on having such centers for the RBF nodes.
• We can use such ideas to formulate methods for fixing the centers of the RBF nodes.
One simple method of choosing the centers µj is to randomly choose p of the training examples.
• We know that with N hidden nodes and centers same as training examples, we get perfect
interpolation.
• Hence we can take some of the training examples as centers.
• There can be some variations on this theme.
• However, such a method does not, in general, ensure that we have representative points in
the feature space as centers.
When we have p hidden nodes, we need p 'centers'.
• Hence we are looking for p 'representative' points in the feature space.
• The only information we have are the N training examples.
• Hence the problem is: given N points Xi, i = 1, ..., N, in R^m, find p 'representative' points in R^m.
• This is the 'clustering problem': a problem of forming the data into p clusters.
• We can take the 'cluster centers' to be the representative points.
• The kind of clusters we get depends on how we want to formalize the notion of the p points being representative of the N data points.
• We now look at one notion of clustering that is popular.
Let µ1, ..., µp represent the p cluster centers.
• Now we need an objective function that specifies how representative these are of the data Xi, i = 1, ..., N.
Now we can define a cost function as the sum of squared distances of each data point from its nearest cluster center, J(µ1, ..., µp) = Σi minj ||Xi − µj||², which is the familiar K-means clustering criterion.
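A minimal sketch of this clustering step, using the standard K-means iterations to pick p representative centres from the N data points, is given below; the initialisation and the data are assumed toy values.

import numpy as np

def kmeans(X, p, iters=100, seed=0):
    """Return p cluster centres that (locally) minimise the sum of squared distances."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), p, replace=False)]    # initial centres
    for _ in range(iters):
        # assign each point to its nearest centre
        d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(d2, axis=1)
        # move each centre to the mean of the points assigned to it
        for j in range(p):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

X = np.random.default_rng(2).random((100, 2))   # N = 100 points in R^2
print(kmeans(X, p=4))                            # 4 representative centres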
LECTURE-9
The essence of the modification proposed in the SOM model, is a mechanism that ensures
that the weight vectors remain spatially ordered, while they also move towards the data points
that activate them maximally.
Unlike a competitive learning network, which consists of a single row of output neurons, a SOM consists of an m-dimensional grid of neurons. Usually two-dimensional SOMs are studied, since SOMs were originally inspired by the two-dimensional maps in the brain. The topology of the grid is usually rectangular, though sometimes hexagonal topologies (Fig.) are also considered.
Figure: Rectangular and hexagonal topologies of Kohonen's network
As in the case of competitive learning, the weight vector of the winner is moved towards the input x. But in addition, neurons close to the winner in the SOM are also moved towards the input x, with a smaller learning rate. Neurons that are nearby in the SOM are defined by a neighborhood N.
Fig. For the neuron in white (center) the neurons in red represent the neighborhood if we
consider the neighborhood radius to be 1
Neighborhood size is large in the early stages, and is decreased gradually as training
progresses.
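A minimal sketch of SOM training on a rectangular grid is given below; the grid size, learning rate, neighbourhood radius and data are assumed toy values. The winner is found by distance, and the winner and its grid neighbours are moved towards the input, with the learning rate and neighbourhood shrinking over time.

import numpy as np

rng = np.random.default_rng(3)
grid_h, grid_w, dim = 5, 5, 2
W = rng.random((grid_h, grid_w, dim))       # weight vector of each grid neuron
data = rng.random((500, dim))

# grid coordinates of every neuron, used by the neighbourhood function
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

for t, x in enumerate(data):
    eta = 0.5 * (1 - t / len(data))                  # decreasing learning rate
    radius = 2.0 * (1 - t / len(data)) + 0.5         # shrinking neighbourhood
    # winner: the neuron whose weight vector is closest to x
    d2 = np.sum((W - x) ** 2, axis=-1)
    wi, wj = np.unravel_index(np.argmin(d2), d2.shape)
    # neighbourhood factor: 1 at the winner, smaller for grid neighbours
    grid_dist2 = np.sum((coords - np.array([wi, wj])) ** 2, axis=-1)
    h = np.exp(-grid_dist2 / (2 * radius ** 2))
    # move the weight vectors towards the input, scaled by the neighbourhood factor
    W += eta * h[..., None] * (x - W)

print(W[0, 0])   # weight vector of the top-left neuron after training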
Learning Vector Quantization (LVQ):
Vector quantization is nothing but clustering: given a set of vectors {x}, find a set of representative vectors {wm; 1 ≤ m ≤ M} such that each x is quantized into a particular wm. The {wm} are located at the means (centroids) of the density distribution of the clusters. LVQ is a supervised pattern classifier in which the actual class membership information is used to fine-tune these representative vectors.
Applications:
• Speech Recognition
• Robot Arm control
• Industrial process control
• automated synthesis of digital systems
• channel equalization for telecommunication
• image compression
• radar classification of sea-ice
• optimization problems
• sentence understanding
• classification of insect courtship songs
LECTURE-10
Fig. A simple network topology for Hebbian Learning, where Wij resides between two neurons
Here η is the learning rate, f(.) is the neuron function, and x is the input to the jth neuron. Since the weights are adjusted according to the correlation between input and output, this is a type of correlational learning rule.
A sequence of learning patterns indexed by p is presented to the network. The initial weights are taken as zero, so the updated weight after the entire data set is:
Frequent input patterns have more impact on the weights and give the largest output at the end.
The weights are adjusted so as to maximize the output (the objective function). This rule causes unconstrained growth of the weights, so the Hebbian rule was modified by Oja through normalization.
Modified Hebbian Learning:
For a small learning rate, expanding in a Taylor series, the weight update rule becomes Δwi = η y (xi − y wi). Here a weight decay proportional to the squared output is added, which maintains the weight vector at unit length automatically.
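A minimal sketch of the plain Hebbian rule and Oja's normalised version for a single linear neuron is shown below; the data and learning rate are assumed, and the updates used are Δwi = η·y·xi for Hebb and Δwi = η·y·(xi − y·wi) for Oja. The Oja weight vector settles near unit length, while the plain Hebbian weights keep growing.

import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=2000)  # input patterns
eta = 0.01

w_hebb = rng.random(2) * 0.1
w_oja = w_hebb.copy()

for x in X:
    y_h = w_hebb @ x
    w_hebb += eta * y_h * x                  # plain Hebbian rule: unbounded growth
    y_o = w_oja @ x
    w_oja += eta * y_o * (x - y_o * w_oja)   # Oja's rule: decay term -eta*y^2*w

print(np.linalg.norm(w_hebb))   # grows very large
print(np.linalg.norm(w_oja))    # close to 1 (unit length)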
LECTURE-11
• Notice that Oki is used as both node output and node function. Assume that a training data
set has P entries.
• The error measure for the pth entry can be defined as the sum of the squared error
For the internal node at (k, i), the error rate can be derived by the chain rule:
where 1 ≤ k ≤ L − 1
• The error rate of an internal node is a linear combination of the error rates of the nodes in
the next layer.
If the parameters are to be updated after each input-output pair (on-line training) then the
update formula is:
With the batch learning (off-line learning) the update formula is based on the derivative of
the overall error with respect to α:
Applications of ANFIS:
1. Printed Character recognition
2. Inverse Kinematics
3. Nonlinear System identification
4. Channel Equalization
5. Feedback control system
6. Adaptive noise cancellation
The principle of Darwinian evolution theory, i.e., survival of the fittest, is realized through a fitness function derived from the objective function. Every individual in a population tries, in its own (random) way, to be the best according to the fitness function.
Basic Concepts:
Optimization means making the objective function maximum or minimum. In evolutionary computing, where the individuals/elements represent possible solutions, we seek an element whose fitness is the maximum or minimum among all the others, depending on whether it is a maximization or a minimization problem.
Optimization can be classified as:
1. Deterministic-Uses derivative or gradient to reach final solution
2. Stochastic- Derivative-free optimization, a type of random search, suitable for non-linearity, discontinuity, escape from local optima, and non-convex regions
Components of Genetic Algorithm:
The individuals carry genes which encode a trait or a parameter. The design space is to be converted to the genetic space. GA is parallel processing by a population, used when the single-point approach of traditional methods cannot find a possible solution within the required time frame.
Important common aspects of evolutionary/swarm optimization algorithms: It is an iterative process where the best solution is searched for by a population in the search space by evaluating a fitness function.
1. Search space-Space for all feasible solutions is called search space.
2. Solution- It is the point with maximum or minimum value of fitness function.
3. Fitness function- A function derived from objective function
4. Population size- A number of points in a search space used in parallel for computing is
called population, generally ranging from 30 to 200.
5. Constraints- Lower and upper bounds
6. Stopping criteria- It can be the number of iterations, a minimum value of error in fitness, or a minimum improvement over the previous iteration
LECTURE-2
Binary coding:
If each design variable is given a string of length 'l', and there are n such variables, then the design vector will have a total string length of 'nl'. For example, let there be 3 design variables and let the string length be 4 for each variable (the length is not necessarily fixed for all problems; it depends on the accuracy required in representing the variable). The variables are x1=4, x2=7 and x3=1. Then the chromosome length is 12, where the 4-bit binary representations x1=0100, x2=0111 and x3=0001 are the genes. So each string/chromosome represents a different solution.
An individual consists of a genotype and a fitness value. Fitness represents the quality of the solution (it is computed by the fitness function). It forms the basis for selecting the individuals and thereby facilitates improvements.
Decoding:
If xiL and xiU are the lower and upper bounds of variable xi (corresponding to the all-zeros and all-ones strings) and ni is the bit length of the coding, then the decoded value of xi is
xi = xiL + [(xiU − xiL) / (2^ni − 1)] × DV(si)
where DV(si) is the decimal value of the binary substring si.
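A minimal sketch of this encoding and decoding is given below; the bounds and bit length are assumed example values. With the variables simply taken as integers in [0, 15] it reproduces the 12-bit chromosome 0100 0111 0001 for x1 = 4, x2 = 7, x3 = 1 from the coding example above.

def encode(x, lo, hi, n_bits):
    """Map a real value in [lo, hi] to an n-bit binary string."""
    dv = round((x - lo) / (hi - lo) * (2 ** n_bits - 1))
    return format(dv, f"0{n_bits}b")

def decode(bits, lo, hi):
    """Decoded value x = lo + (hi - lo) / (2^n - 1) * DV(bits)."""
    n_bits = len(bits)
    return lo + (hi - lo) / (2 ** n_bits - 1) * int(bits, 2)

# Three design variables, each coded with 4 bits on the (assumed) range [0, 15]
values = [4, 7, 1]
chromosome = "".join(encode(v, 0, 15, 4) for v in values)
print(chromosome)                                                    # 010001110001
print([decode(chromosome[i:i+4], 0, 15) for i in range(0, 12, 4)])   # [4.0, 7.0, 1.0]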
LECTURE-4
PARENT SELECTION:
After fitness function evaluation, individuals are distinguished based on their quality.
According to Darwin's evolution theory the best ones should survive and create new offspring
for the next generation. There are many methods to select the best chromosomes.
1. Roulette wheel selection
2. Boltzmann selection
3. Tournament selection
4. Rank selection
5. Steady state selection
The first one is briefly described.
Roulette Wheel Selection: Parents are selected according to their fitness, i.e., each individual is selected with a probability proportional to its fitness value. In other words, depending on its percentage contribution to the total population fitness, a string is selected for mating to form the next generation. This way, weak solutions are eliminated and strong solutions survive to form the next generation. For example, consider a population containing four strings, shown in Table 1. Each string is formed by concatenating four substrings which represent the variables a, b, c and d. The length of each string is taken as four bits. The first column represents the possible solution in binary form. The second column gives the fitness value of the decoded string. The third column gives the percentage contribution of each string to the total fitness of the population. Then, by the Roulette Wheel method, the probability of candidate 1 being selected as a parent of the next generation is 28.09%. Similarly, the probabilities that candidates 2, 3 and 4 will be chosen for the next generation are 19.59%, 12.89% and 39.43% respectively. These probabilities are represented on a pie chart, and then four numbers are randomly generated between 1 and 100. The numbers generated might then fall in the region of candidate 2 once, in the region of candidate 4 twice, in the region of candidate 1 once, and in the region of candidate 3 not at all. Thus, the strings are chosen to form the parents of the next generation.
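A minimal sketch of roulette-wheel selection using the fitness percentages quoted above (28.09, 19.59, 12.89 and 39.43 for candidates 1 to 4) follows; each random draw falls into the slice of the wheel owned by one candidate, and duplicates are allowed.

import random

random.seed(0)
fitness = {"candidate 1": 28.09, "candidate 2": 19.59,
           "candidate 3": 12.89, "candidate 4": 39.43}

def roulette_select(fitness):
    """Pick one key with probability proportional to its fitness share."""
    total = sum(fitness.values())
    r = random.uniform(0, total)          # spin the wheel
    running = 0.0
    for name, f in fitness.items():
        running += f
        if r <= running:
            return name
    return name                           # guard against floating-point edge cases

parents = [roulette_select(fitness) for _ in range(4)]
print(parents)    # four parents chosen for mating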
LECTURE-5
CROSSOVER:
It is a recombination operator. Selection alone cannot introduce any new individuals into the
population, i.e., it cannot find new points in the search space. These are generated by
genetically-inspired operators, of which the most well known are crossover and mutation.
Types-
1. One-point
2. Two-point
3. Uniform
4. Arithmetic
5. Heuristic
6. Matrix
In one-point crossover, a selected pair of strings is cut at some random position and their tail segments are swapped to form a new pair of strings.
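A minimal sketch of one-point crossover on a pair of bit strings follows; the parent strings and the randomly chosen cut point are illustrative only.

import random

def one_point_crossover(parent1, parent2, rng=random):
    """Cut both strings at the same random position and swap the tails."""
    assert len(parent1) == len(parent2)
    point = rng.randint(1, len(parent1) - 1)   # cut strictly inside the string
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2, point

random.seed(1)
c1, c2, cut = one_point_crossover("010001110001", "101110001110")
print(cut, c1, c2)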
LECTURE-6
Shortcomings of GA:
1. Minimal deception problem- Some objective functions may be very difficult to optimize by GA; the accuracy with which the solution is represented depends on the coding.
2. GA drift (bias)- Loss of population diversity may lead to a suboptimal solution, especially with a smaller population size.
3. Real-time and online issues- GA does not guarantee a response time, which is vital in real-time applications; it works satisfactorily offline.
4. Computationally expensive and time consuming.
5. Issues in the representation of the problem.
6. Proper writing of the fitness function.
7. Proper choice of population size, crossover rate and mutation rate.
8. Premature convergence.
9. No single mathematically perfect solution, since problems of biological adaptation do not have one either.
LECTURE-8 to 10
References:
1. Chapter 7: J.S.R. Jang, C.T. Sun and E. Mizutani, "Neuro-Fuzzy and Soft Computing", PHI / Pearson Education, 2004.
2. Chapters 8 & 9: S. Rajasekaran and G.A. Vijayalakshmi Pai, "Neural Networks, Fuzzy Logic, and Genetic Algorithms: Synthesis and Applications", PHI.
3. Internet sources
MODULE-IV (10 HOURS)
Evolutionary Computing, Simulated Annealing, Random Search, Downhill Simplex Search, Swarm
optimization
LECTURE-1
Analogy-based algorithms
For any natural phenomenon you can think of, there will be at least one AI research
group that will have a combinatorial optimization algorithm “based” on “analogies”
and “similarities” with the phenomenon. Here’s the beginning of the list…
• Metal cooling annealing
• Evolution / Co-evolution / Sexual Reproduction
• Thermodynamics
• Societal Markets
• Management Hierarchies
• Ant/Insect Colonies
• Immune System
• Animal Behavior Conditioning
• Neuron / Brain Models
• Particle Physics
LECTURE-2
Advantages of Simulated Annealing
1. Simulated annealing is sometimes empirically much better at avoiding local minima than hill-climbing. It is a successful, frequently used algorithm. The basic hill-climbing algorithm is prone to getting caught in local optima, because a hill climber will only accept neighbour solutions that are better than the current solution; when it cannot find any better neighbour, it stops.
2. There is a theoretical guarantee: with an infinitely slow cooling rate, simulated annealing finds the global optimum (although beyond this there is not much opportunity to say anything formal about it). A small numerical sketch of the acceptance rule follows.
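The sketch below runs simulated annealing on a one-dimensional toy function; the objective, neighbourhood move, cooling schedule and parameters are all assumed. It shows how a worse neighbour is sometimes accepted with probability exp(-dE/T), which is what lets the method escape the local minima that trap a plain hill climber.

import math
import random

random.seed(0)

def energy(x):
    """Toy objective with several local minima."""
    return x * x + 10 * math.sin(3 * x)

x = 4.0                 # initial solution
T = 5.0                 # initial temperature
alpha = 0.995           # cooling rate

while T > 1e-3:
    x_new = x + random.uniform(-0.5, 0.5)        # random neighbour
    dE = energy(x_new) - energy(x)
    # accept better moves always, worse moves with probability exp(-dE / T)
    if dE < 0 or random.random() < math.exp(-dE / T):
        x = x_new
    T *= alpha                                   # cool down slowly

print(x, energy(x))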
LECTURE-3
LECTURE-4
Swarm Optimization
Swarm intelligence (SI) is the collective behavior of decentralized, self-
organized systems, natural or artificial. SI systems consist typically of a population of
simple agents or boids interacting locally with one another and with their
environment. The inspiration often comes from nature, especially biological systems.
The agents follow very simple rules, and although there is no centralized control
structure dictating how individual agents should behave, local, and to a certain
degree random, interactions between such agents lead to the emergence of
"intelligent" global behavior, unknown to the individual agents. Examples in natural
systems of SI include ant colonies, bird flocking, animal herding, bacterial growth,
fish schooling and Microbial intelligence. It is infact a multi-agent system that has
self-organized behaviour that shows some intelligent behaviour.
Two principles in swarm intelligence:
Self-organization is based on:
• activity amplification by positive feedback
• activity balancing by negative feedback
• amplification of random fluctuations
• multiple interactions
Stigmergy (stimulation by work) is based on:
• work as a behavioural response to the environmental state
• an environment that serves as a work state memory
• work that does not depend on specific agents
LECTURE-5
Ant Algorithm:
Algorithms inspired by the behavior of real ants
Examples:
Foraging
Corpse clustering
Division of labor
While walking, ants deposit a substance called pheromone on the ground. They choose, with higher probability, paths that are marked by stronger pheromone concentrations. This cooperative interaction leads to the emergence of short(est) paths.
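A minimal sketch of this mechanism follows; the path lengths, evaporation rate and deposit amounts are assumed toy numbers. Ants choose a path with probability proportional to its pheromone level, the shorter path is reinforced more per trip, and evaporation lets the colony converge on it.

import random

random.seed(0)
paths = {"short": 5.0, "long": 10.0}      # path lengths (assumed)
pheromone = {"short": 1.0, "long": 1.0}   # equal pheromone to start
rho, n_ants = 0.05, 20                    # evaporation rate, ants per iteration

def choose(pheromone):
    """Pick a path with probability proportional to its pheromone level."""
    total = sum(pheromone.values())
    r = random.uniform(0, total)
    running = 0.0
    for path, tau in pheromone.items():
        running += tau
        if r <= running:
            return path
    return path

for _ in range(100):
    deposits = {p: 0.0 for p in paths}
    for _ in range(n_ants):
        p = choose(pheromone)
        deposits[p] += 1.0 / paths[p]     # the shorter path receives more pheromone
    for p in pheromone:
        pheromone[p] = (1 - rho) * pheromone[p] + deposits[p]

print(pheromone)   # the short path ends up with far more pheromone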
LECTURE-6
Bees Algorithm:
• The queen moves randomly over the combs; eggs are more likely to be laid in the neighbourhood of brood.
• Honey and pollen are deposited randomly in empty cells.
• Four times more honey is brought to the hive than pollen.
• Removal ratios are 0.95 for honey and 0.6 for pollen.
• Removal of honey and pollen is proportional to the number of surrounding cells containing brood.
The above are a few examples of SI; there are numerous others.
LECTURE-7-10
Swarm Optimization Applications:
1. Combinatorial optimization
2. Mixed integer-continuous optimization
3. Networks: AntNet
4. Data clustering and exploratory data analysis
5. Coordinated motion
6. Self-assembling
References:
1. Yamille del Valle, Ganesh Kumar Venayagamoorthy, Salman Mohagheghi, and Ronald G. Harley, "Particle Swarm Optimization: Basic Concepts, Variants and Applications in Power Systems", IEEE Transactions on Evolutionary Computation, Vol. 12, No. 2, April 2008, pp. 171-195.
2. Krause, J., Ruxton, G. D., & Krause, S. (2010). Swarm intelligence in animals and humans. Trends in Ecology & Evolution, 25(1), 28-34.
3. Internet sources.
4. Chapter 7: J.S.R. Jang, C.T. Sun and E. Mizutani, "Neuro-Fuzzy and Soft Computing", PHI / Pearson Education, 2004.