Unit - 5 Machine Learning
• Confidence: Confidence indicates how often the rule has been found to be true, i.e., how often the items X and Y occur together in the dataset when the occurrence of X is already given. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Association Rules Learning
How does Association Rule Learning work?
• Lift: It is the strength of a rule, which can be defined by the formula below:
Lift(X → Y) = Confidence(X → Y) / Support(Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
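As an illustration, the following is a minimal Python sketch (the transactions and item names are invented for the example) that computes support, confidence, and lift for a candidate rule X → Y:

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "eggs"},
]

def support(itemset):
    # Fraction of transactions containing every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

X, Y = {"milk"}, {"bread"}
confidence = support(X | Y) / support(X)   # Confidence(X -> Y)
lift = confidence / support(Y)             # Lift(X -> Y)
print(f"support={support(X | Y):.2f}, confidence={confidence:.2f}, lift={lift:.2f}")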
Biological Neural Network → Artificial Neural Network
Dendrites → Inputs
Cell nucleus → Nodes
Synapse → Weights
Axon → Output
Artificial neural networks, a branch of science that entered the world in the mid‐20th century, are developing exponentially. In the present time, we have investigated the advantages of artificial neural networks and the issues encountered in the course of their use. It should not be overlooked that the disadvantages of ANNs, a flourishing branch of science, are being eliminated one by one, while their advantages are increasing day by day. This means that artificial neural networks will progressively become an indispensable part of our lives.
Architecture of Artificial Neural Network
What is an activation function and why to use them?
• Definition of activation function: An activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it. The purpose of the activation function is to introduce non‐linearity into the output of a neuron.
• Explanation:
We know that a neural network has neurons that work in correspondence with their weights, biases, and respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as back‐propagation. Activation functions make back‐propagation possible, since the gradients are supplied along with the error to update the weights and biases.
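To make the role of the activation function in back‐propagation concrete, here is a minimal sketch of a single neuron with a sigmoid activation and squared‐error loss (the input, target, and learning rate are invented values); the sigmoid's gradient y·(1 − y) is what lets the error flow back to the weight and bias:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 0.5, 1.0          # single input and desired output (illustrative)
w, b, lr = 0.8, 0.1, 0.1      # weight, bias, learning rate

for step in range(3):
    y = sigmoid(w * x + b)            # forward pass
    error = y - target                # derivative of 0.5 * (y - target)^2 w.r.t. y
    dz = error * y * (1.0 - y)        # chain rule through the sigmoid's gradient
    w -= lr * dz * x                  # back-propagated weight update
    b -= lr * dz                      # back-propagated bias update
    print(f"step {step}: output={y:.4f}")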
Advantages of Artificial Neural Network (ANN)
• Parallel processing capability: Due to their structure, artificial neural networks can perform more than one task simultaneously.
• Storing data on the entire network: Unlike traditional programming, where data is stored in a database, in an ANN the information is stored across the whole network. The disappearance of a few pieces of data in one place does not prevent the network from working.
• Capability to work with incomplete knowledge: After training, an ANN may produce output even with incomplete data. The loss of performance here depends on the significance of the missing data.
• Having a memory distribution: For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by showing these examples to the network. The success of the network is directly proportional to the chosen instances; if the event cannot be shown to the network in all its aspects, it can produce false output.
• Having fault tolerance: Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault‐tolerant.
Disadvantages of Artificial Neural Network (ANN)
• Assurance of proper network structure: There is no particular guideline for
determining the structure of artificial neural networks. The appropriate
network structure is accomplished through experience, trial, and error.
• Unrecognized behavior of the network: This is the most significant issue with ANNs. When an ANN produces a solution, it does not give any insight into why or how it arrived at it, which decreases trust in the network.
• Hardware dependence: Artificial neural networks require processors with parallel processing power, in accordance with their structure. The realization of the network is therefore equipment‐dependent.
• Difficulty of showing the problem to the network: ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The representation chosen here directly impacts the performance of the network, and it relies on the user's abilities.
• The duration of the network is unknown: The network is trained until the error is reduced to a certain value, but this value does not guarantee that we obtain optimum results.
How do Artificial Neural Networks work?
An Artificial Neural Network can best be represented as a weighted directed graph, where the artificial neurons form the nodes. The associations between neuron outputs and neuron inputs can be viewed as directed edges with weights. The Artificial Neural Network receives the input signal from the external source in the form of a pattern or image, represented as a vector. These inputs are then denoted by the notation x(n) for each of the n inputs.
How do Artificial Neural Networks work?
• Afterward, each of the inputs is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural network to solve a specific problem).
• In general terms, these weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.
• If the weighted sum is equal to zero, a bias is added to make the output non‐zero, or to scale up the system's response. The bias can be viewed as an additional input equal to 1 with its own weight.
• Here the total of the weighted inputs can be in the range of 0 to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of the weighted inputs is passed through the activation function (a numeric sketch of this computation follows this list).
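A minimal numeric sketch of the summation step just described (the input values, weights, bias, and threshold are invented for illustration):

inputs  = [0.2, 0.7, 0.1]   # x(1), x(2), x(3)
weights = [0.4, 0.3, 0.9]   # strengths of the interconnections
bias    = 1.0               # extra input fixed at 1 with its own weight

# All the weighted inputs are summed inside the computing unit.
weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias

# The total is passed through an activation function to keep the
# response within the desired limits (a simple threshold here).
threshold = 1.0
output = 1 if weighted_sum > threshold else 0
print(weighted_sum, output)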
How do Artificial Neural Networks work?
• The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, but they are primarily either linear or non‐linear sets of functions. Some of the commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look at each of them in detail:
• Binary: In a binary activation function, the output is either a 1 or a 0. To accomplish this, a threshold value is set up. If the net weighted input of the neuron is greater than the threshold, the final output of the activation function is returned as 1, or else the output is returned as 0.
• Sigmoidal Hyperbolic: The sigmoidal hyperbolic function is generally seen as an "S"‐shaped curve. Here the tan hyperbolic function is used to approximate the output from the actual net input. The function is defined as:
F(x) = 1 / (1 + exp(−λx))
where λ is considered the steepness parameter.
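A short sketch of the two activation functions just described; the threshold and the steepness value λ below are arbitrary choices made for the example:

import math

def binary_activation(net_input, threshold=0.5):
    # Returns 1 if the net weighted input exceeds the threshold, else 0.
    return 1 if net_input > threshold else 0

def sigmoidal(net_input, steepness=1.0):
    # F(x) = 1 / (1 + exp(-steepness * x)), the "S"-shaped curve.
    return 1.0 / (1.0 + math.exp(-steepness * net_input))

for x in (-2.0, 0.0, 2.0):
    print(x, binary_activation(x), round(sigmoidal(x, steepness=2.0), 3))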
Types of Artificial Neural Network
There are various types of Artificial Neural Networks (ANN), which perform tasks in a way modeled on how the human brain's neurons and networks function. The majority of artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their intended tasks, for example, segmentation or classification.
• Feedback ANN: In this type of ANN, the output returns into the network to achieve the best‐evolved results internally. As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization issues. Internal system error corrections utilize feedback ANNs.
• Feed‐Forward ANN: A feed‐forward network is a basic neural network comprising an input layer, an output layer, and at least one hidden layer of neurons. By assessing its output in relation to its input, the strength of the network can be observed based on the group behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.
Genetic Algorithms ‐ Introduction
• Genetic Algorithm (GA) is a search‐based optimization technique based
on the principles of Genetics and Natural Selection. It is frequently used
to find optimal or near‐optimal solutions to difficult problems which
otherwise would take a lifetime to solve. It is frequently used to solve
optimization problems, in research, and in machine learning.
Introduction to Optimization
• Optimization is the process of making something better. In any process,
we have a set of inputs and a set of outputs as shown in the following
figure.
Genetic Algorithms ‐ Introduction
• Optimization refers to finding the values of inputs in such a way that we
get the “best” output values. The definition of “best” varies from
problem to problem, but in mathematical terms, it refers to maximizing
or minimizing one or more objective functions, by varying the input
parameters.
• The set of all possible solutions or values which the inputs can take make
up the search space. In this search space, lies a point or a set of points
which gives the optimal solution. The aim of optimization is to find that
point or set of points in the search space.
Genetic Algorithms ‐ Introduction
What are Genetic Algorithms?
• Nature has always been a great source of inspiration to all mankind.
Genetic Algorithms (GAs) are search based algorithms based on the
concepts of natural selection and genetics. GAs are a subset of a much
larger branch of computation known as Evolutionary Computation.
• GAs were developed by John Holland and his students and colleagues at the University of Michigan, most notably David E. Goldberg, and have since been tried on various optimization problems with a high degree of success.
• In GAs, we have a pool or a population of possible solutions to the given
problem. These solutions then undergo recombination and mutation (like
in natural genetics), producing new children, and the process is repeated
over various generations.
Genetic Algorithms ‐ Introduction
• Each individual (or candidate solution) is assigned a fitness value (based
on its objective function value) and the fitter individuals are given a
higher chance to mate and yield more “fitter” individuals. This is in line
with the Darwinian Theory of “Survival of the Fittest”.
• In this way we keep “evolving” better individuals or solutions over
generations, till we reach a stopping criterion.
• Genetic Algorithms are sufficiently randomized in nature, but they
perform much better than random local search (in which we just try
various random solutions, keeping track of the best so far), as they exploit
historical information as well.
Genetic Algorithms ‐ Introduction
Advantages of GAs
• Does not require any derivative information (which may not be available
for many real‐world problems).
• Is faster and more efficient as compared to the traditional methods.
• Has very good parallel capabilities.
• Optimizes both continuous and discrete functions and also multi‐
objective problems.
• Provides a list of “good” solutions and not just a single solution.
• Always gets an answer to the problem, which gets better over time.
• Useful when the search space is very large and there are a large number
of parameters involved.
Genetic Algorithms ‐ Introduction
Limitations of GAs
• GAs are not suited for all problems, especially problems which are simple
and for which derivative information is available.
• Fitness value is calculated repeatedly which might be computationally
expensive for some problems.
• Being stochastic, there are no guarantees on the optimality or the quality
of the solution.
• If not implemented properly, the GA may not converge to the optimal
solution.
Genetic Algorithms ‐ Fundamentals
Basic Terminology
• Population − It is a subset of all the possible (encoded) solutions to the
given problem. The population for a GA is analogous to the population for
human beings except that instead of human beings, we have Candidate
Solutions representing human beings.
• Chromosomes − A chromosome is one such solution to the given
problem.
• Gene − A gene is one element position of a chromosome.
• Allele − It is the value a gene takes for a particular chromosome.
Genetic Algorithms ‐ Fundamentals
• Genotype − Genotype is the population in the computation space. In the computation
space, the solutions are represented in a way which can be easily understood and
manipulated using a computing system.
• Phenotype − Phenotype is the population in the actual real‐world solution space, in which solutions are represented as they appear in real‐world situations.
Genetic Algorithms ‐ Fundamentals
• Decoding and Encoding − For simple problems, the phenotype and
genotype spaces are the same. However, in most of the cases, the
phenotype and genotype spaces are different. Decoding is a process of
transforming a solution from the genotype to the phenotype space, while
encoding is a process of transforming from the phenotype to genotype
space. Decoding should be fast as it is carried out repeatedly in a GA
during the fitness value calculation.
• For example, consider the 0/1 Knapsack Problem. The Phenotype space
consists of solutions which just contain the item numbers of the items to
be picked.
• However, in the genotype space it can be represented as a binary string of length n (where n is the number of items). A 1 at position x represents that the xth item is picked, while a 0 represents that it is not. This is a case where the genotype and phenotype spaces are different.
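A small sketch of encoding and decoding for the knapsack example above (the item list and chromosome are invented for illustration):

items = ["tent", "stove", "rope", "torch", "map"]   # hypothetical items 1..n
chromosome = [1, 0, 1, 1, 0]                         # genotype: binary string of length n

def decode(chromosome):
    # Genotype -> phenotype: the item numbers of the picked items.
    return [i + 1 for i, gene in enumerate(chromosome) if gene == 1]

def encode(picked, n):
    # Phenotype -> genotype: binary string of length n.
    return [1 if i + 1 in picked else 0 for i in range(n)]

print(decode(chromosome))              # [1, 3, 4]
print(encode([1, 3, 4], len(items)))   # [1, 0, 1, 1, 0]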
Genetic Algorithms ‐ Fundamentals
• Fitness Function − A fitness function simply defined is a function which takes the
solution as input and produces the suitability of the solution as the output. In some
cases, the fitness function and the objective function may be the same, while in
others it might be different based on the problem.
• Genetic Operators − These alter the genetic composition of the offspring. These
include crossover, mutation, selection, etc.
Basic Structure ‐ Genetic Algorithms
• We start with an initial population (which may be generated at random or seeded by other heuristics) and select parents from this population for mating. Crossover and mutation operators are applied to the parents to generate new off‐springs. Finally, these off‐springs replace the existing individuals in the population, and the process repeats. In this way genetic algorithms actually try to mimic human evolution to some extent.
• A generalized pseudo‐code for a GA is explained in the following program −
GA()
   initialize population; find fitness of population
   while (termination criteria is not reached) do
      parent selection; crossover with probability pc; mutation with probability pm
      fitness calculation; survivor selection
   return best individual
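The same loop, sketched in Python for a toy OneMax problem (maximize the number of 1s in a bit string); the population size, rates, operators, and the problem itself are arbitrary choices made only to show the structure:

import random

GENOME_LEN, POP_SIZE, PC, PM, GENERATIONS = 20, 30, 0.9, 0.02, 50

def fitness(ind):
    return sum(ind)                      # OneMax: count the 1s

def select_parent(pop, k=3):
    return max(random.sample(pop, k), key=fitness)   # k-way tournament

def crossover(p1, p2):
    if random.random() < PC:             # one-point crossover with probability pc
        point = random.randint(1, GENOME_LEN - 1)
        return p1[:point] + p2[point:]
    return p1[:]

def mutate(ind):
    return [1 - g if random.random() < PM else g for g in ind]   # bit-flip with probability pm

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population = [mutate(crossover(select_parent(population), select_parent(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best), best)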
Population Models
• Steady State: In steady state GA, we generate one or two off‐springs in each
iteration and they replace one or two individuals from the population. A steady
state GA is also known as Incremental GA.
• Generational: In a generational model, we generate ‘n’ off‐springs, where n is
the population size, and the entire population is replaced by the new one at
the end of the iteration.
Genetic Algorithms ‐ Fitness Function
• The fitness function simply defined is a function which takes a candidate solution to the problem as input and produces as output how "fit" or how "good" the solution is with respect to the problem in consideration.
• Calculation of fitness value is done repeatedly in a GA and therefore it
should be sufficiently fast. A slow computation of the fitness value can
adversely affect a GA and make it exceptionally slow.
• In most cases the fitness function and the objective function are the
same as the objective is to either maximize or minimize the given
objective function. However, for more complex problems with multiple
objectives and constraints, an Algorithm Designer might choose to have a
different fitness function.
Genetic Algorithms ‐ Fitness Function
• A fitness function should possess the following characteristics −
• The fitness function should be sufficiently fast to compute.
• It must quantitatively measure how fit a given solution is or how fit individuals can
be produced from the given solution.
• In some cases, calculating the fitness function directly might not be
possible due to the inherent complexities of the problem at hand. In such
cases, we do fitness approximation to suit our needs.
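As an example of a fitness function that differs from the raw objective, here is a sketch for the 0/1 knapsack mentioned earlier: the objective is the total value of the picked items, while the fitness penalizes solutions that exceed the capacity (the values, weights, and penalty factor are invented for illustration):

values   = [60, 100, 120, 40]   # hypothetical item values
weights  = [10, 20, 30, 15]     # hypothetical item weights
capacity = 50

def fitness(chromosome):
    # Total value of picked items, heavily penalized if over capacity.
    value  = sum(v for v, g in zip(values, chromosome) if g)
    weight = sum(w for w, g in zip(weights, chromosome) if g)
    if weight > capacity:
        return value - 10 * (weight - capacity)
    return value

print(fitness([1, 1, 1, 0]))   # over capacity, so penalized
print(fitness([0, 1, 1, 0]))   # feasible: total value 220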
Genetic Algorithms ‐ Parent Selection
• Parent Selection is the process of selecting parents which mate and recombine to create off‐springs for the next generation. Parent selection is very crucial to the convergence rate of the GA, as good parents drive the population towards better and fitter solutions.
• Maintaining good diversity in the population is extremely crucial for the success of a GA. However, if one extremely fit solution takes over the entire population in a few generations, this diversity is lost.
• This taking up of the entire population by one extremely fit solution is known as premature convergence and is an undesirable condition in a GA.
Genetic Algorithms ‐ Parent Selection
Fitness Proportionate Selection:
• Fitness Proportionate Selection is one of the most popular ways of parent
selection. In this every individual can become a parent with a probability
which is proportional to its fitness. Therefore, fitter individuals have a
higher chance of mating and propagating their features to the next
generation. Therefore, such a selection strategy applies a selection
pressure to the more fit individuals in the population, evolving better
individuals over time.
• Consider a circular wheel. The wheel is divided into n pies, where n is the
number of individuals in the population. Each individual gets a portion of
the circle which is proportional to its fitness value.
Genetic Algorithms ‐ Parent Selection
Two implementations of fitness proportionate selection are possible −
1. Roulette Wheel Selection: In this the circular wheel is divided as
described before. A fixed point is chosen on the wheel circumference and
the wheel is rotated. The region of the wheel which comes in front of the
fixed point is chosen as the parent. For the second parent, the same
process is repeated.
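A minimal sketch of roulette wheel selection, assuming non‐negative fitness values (the population and fitness numbers are made up):

import random

def roulette_wheel_select(population, fitnesses):
    # Pick one parent with probability proportional to its fitness.
    total = sum(fitnesses)
    spin = random.uniform(0, total)       # the fixed point on the wheel circumference
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit                    # each individual owns a slice of the wheel
        if running >= spin:
            return individual
    return population[-1]                 # guard against floating-point round-off

population = ["A", "B", "C", "D"]
fitnesses  = [1.0, 3.0, 5.0, 1.0]         # "C" gets the largest slice
parents = [roulette_wheel_select(population, fitnesses) for _ in range(2)]
print(parents)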
Genetic Algorithms ‐ Parent Selection
2. Stochastic Universal Sampling (SUS): Stochastic Universal Sampling is
quite similar to Roulette wheel selection, however instead of having just
one fixed point, we have multiple fixed points. Therefore, all the parents
are chosen in just one spin of the wheel. Also, such a setup encourages the
highly fit individuals to be chosen at least once.
Genetic Algorithms ‐ Parent Selection
Tournament Selection:
• In K‐Way tournament selection, we select K individuals from the
population at random and select the best out of these to become a
parent. The same process is repeated for selecting the next parent.
Tournament Selection is also extremely popular in literature as it can
even work with negative fitness values.
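A sketch of K‐Way tournament selection; since it only compares fitness values, negative fitnesses pose no problem (the population and fitness function are invented for the example):

import random

def tournament_select(population, fitness, k=3):
    # Pick k individuals at random and return the fittest of them.
    return max(random.sample(population, k), key=fitness)

population = [-7, -3, 0, 4, 9, 12]
closeness_to_five = lambda x: -abs(x - 5)       # fitness values can be negative
parent1 = tournament_select(population, closeness_to_five)
parent2 = tournament_select(population, closeness_to_five)
print(parent1, parent2)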
Genetic Algorithms ‐ Parent Selection
Rank Selection:
• Rank Selection also works with negative fitness values and is mostly used
when the individuals in the population have very close fitness values (this
happens usually at the end of the run). This leads to each individual
having an almost equal share of the pie (like in case of fitness
proportionate selection) and hence each individual no matter how fit
relative to each other has an approximately same probability of getting
selected as a parent.
Genetic Algorithms ‐ Parent Selection
• This in turn leads to a loss in the selection pressure towards fitter individuals, making the GA make poor parent selections in such situations.
Random Selection
• In this strategy we randomly select parents from the existing population.
There is no selection pressure towards fitter individuals and therefore this
strategy is usually avoided.
Genetic Algorithms ‐ Crossover
Introduction to Crossover
• The crossover operator is analogous to reproduction and biological
crossover. In this more than one parent is selected and one or more off‐
springs are produced using the genetic material of the parents. Crossover
is usually applied in a GA with a high probability – pc .
Crossover Operators
• It is to be noted that these crossover operators are very generic and the
GA Designer might choose to implement a problem‐specific crossover
operator as well.
Genetic Algorithms ‐ Crossover
Crossover Operators
1. One Point Crossover: In this one‐point crossover, a random crossover
point is selected and the tails of its two parents are swapped to get new
off‐springs.
There exist a lot of other crossovers like Partially Mapped Crossover (PMX),
Order based crossover (OX2), Shuffle Crossover, Ring Crossover, etc.
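A minimal sketch of the one‐point crossover described above, applied to binary‐encoded parents (the parent strings are made up):

import random

def one_point_crossover(parent1, parent2):
    # Pick a random crossover point and swap the tails of the two parents.
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

p1 = [1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0]
print(one_point_crossover(p1, p2))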
Genetic Algorithms ‐ Mutation
• In simple terms, mutation may be defined as a small random tweak in the
chromosome, to get a new solution. It is used to maintain and introduce
diversity in the genetic population and is usually applied with a low
probability – pm. If the probability is very high, the GA gets reduced to a
random search.
• Mutation is the part of the GA which is related to the “exploration” of the
search space. It has been observed that mutation is essential to the
convergence of the GA while crossover is not.
Mutation Operators
• Like the crossover operators, this is not an exhaustive list and the GA
designer might find a combination of these approaches or a problem‐
specific mutation operator more useful.
Genetic Algorithms ‐ Mutation
Mutation Operators
1. Bit Flip Mutation: In this bit flip mutation, we select one or more
random bits and flip them. This is used for binary encoded GAs.
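A sketch of bit flip mutation on a binary‐encoded chromosome, flipping each gene independently with a small probability pm (the chromosome and rate are illustrative):

import random

def bit_flip_mutation(chromosome, pm=0.1):
    # Flip each bit independently with probability pm.
    return [1 - gene if random.random() < pm else gene for gene in chromosome]

print(bit_flip_mutation([0, 1, 1, 0, 1, 0, 0, 1], pm=0.2))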