Genetic algorithm
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation
Initial Population
The process begins with a set of individuals called a population. Each
individual is a candidate solution to the problem you want to solve.
Fitness Function
The fitness function determines how fit an individual is
(the ability of an individual to compete with other
individuals). It gives a fitness score to each individual.
The probability that an individual will be selected for
reproduction is based on its…
A genetic algorithm runs an evolutionary, generational cycle to generate
high-quality solutions. It applies operations that either improve or
replace the population in order to produce fitter solutions.
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
A genetic algorithm starts by generating a set of individuals, called the
population. Each individual is a candidate solution to the given problem.
An individual is characterized by a set of parameters called genes. The
genes are joined into a string to form a chromosome, which encodes a
solution to the problem. One of the most popular initialization
techniques is to use random binary strings.
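As a minimal sketch of random binary-string initialization, the Python below builds a population of chromosomes, each represented as a list of 0/1 genes. The function name init_population and the list-of-ints encoding are illustrative assumptions, not taken from the text.

```python
import random

def init_population(pop_size, chrom_length, rng=random):
    """Return pop_size chromosomes, each a list of chrom_length random genes (0 or 1)."""
    return [[rng.randint(0, 1) for _ in range(chrom_length)]
            for _ in range(pop_size)]

# Usage: a seeded generator makes the population reproducible.
population = init_population(pop_size=6, chrom_length=8, rng=random.Random(42))
for chromosome in population:
    print("".join(map(str, chromosome)))  # one 8-character binary string per individual
```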
2. Fitness Assignment
The fitness function determines how fit an individual is, that is, the
ability of an individual to compete with other individuals. In every
iteration, each individual is evaluated with the fitness function, which
gives it a fitness score. This score determines the probability of being
selected for reproduction: the higher the fitness score, the greater the
chance of being selected.
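The fitness function depends entirely on the problem. The sketch below assumes the toy OneMax problem (fitness is the number of 1-genes in the chromosome), a common teaching example chosen here for illustration; fitness and score_population are hypothetical names.

```python
def fitness(chromosome):
    """OneMax fitness (assumed toy problem): the number of 1-genes."""
    return sum(chromosome)

def score_population(population):
    """Return (fitness, chromosome) pairs, fittest first."""
    return sorted(((fitness(c), c) for c in population),
                  key=lambda pair: pair[0], reverse=True)

population = [[1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
for score, chromosome in score_population(population):
    print(score, chromosome)
# 4 [1, 1, 1, 1]
# 3 [1, 0, 1, 1]
# 1 [0, 0, 0, 1]
```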
3. Selection
The selection phase chooses the individuals that will reproduce to create
offspring. The selected individuals are then arranged in pairs of parents,
and each pair passes its genes on to the next generation.
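The text does not fix a selection scheme. One common choice, shown here as an assumption, is fitness-proportionate (roulette-wheel) selection, where an individual's chance of being picked equals its share of the total fitness. Fitness values are assumed to be non-negative, and the helper names are illustrative.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)  # no information: fall back to a uniform pick
    pick = rng.random() * total        # a point on the "wheel" in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]              # guard against floating-point round-off

def select_parent_pairs(population, fitnesses, n_pairs, rng=random):
    """Arrange selected individuals into n_pairs (parent_a, parent_b) pairs."""
    return [(roulette_select(population, fitnesses, rng),
             roulette_select(population, fitnesses, rng))
            for _ in range(n_pairs)]

population = ["0110", "1111", "0001"]
fitnesses = [2, 4, 1]  # here: number of 1-genes
print(select_parent_pairs(population, fitnesses, n_pairs=2, rng=random.Random(7)))
```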
4. Reproduction
After selection, children are created in the reproduction step. Here the
genetic algorithm applies two variation operators to the parent
population: crossover, which combines the genes of two parents, and
mutation, which randomly alters genes in the offspring.
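A sketch of the two variation operators for binary chromosomes, assuming single-point crossover and bit-flip mutation (two common variants among several). The function names and the default mutation rate are illustrative choices.

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length parents (length >= 2) after a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the chromosome
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def bit_flip_mutation(chromosome, rate=0.01, rng=random):
    """Flip each binary gene independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

rng = random.Random(1)
child_a, child_b = single_point_crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], rng)
print(child_a, child_b)  # tails swapped at the same cut point
print(bit_flip_mutation(child_a, rate=0.1, rng=rng))
```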
5. Termination
After the reproduction phase, a stopping criterion decides whether the
algorithm terminates. The algorithm stops once a solution reaches a
threshold fitness, and the best solution in the final population is
reported as the result.
General Workflow of a Simple Genetic Algorithm
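The whole cycle can be sketched as a single loop. This sketch assumes the OneMax toy problem, random binary initialization, size-3 tournament selection (swapped in here for simplicity; roulette-wheel selection would also work), single-point crossover, bit-flip mutation, and a stopping rule of either a perfect score or a generation cap. All names and parameter values are hypothetical.

```python
import random

def run_ga(chrom_length=20, pop_size=30, mutation_rate=0.02,
           max_generations=200, seed=0):
    """Return (best chromosome, generations used) for OneMax (assumed toy problem)."""
    rng = random.Random(seed)
    fitness = sum                # OneMax: fitness is the count of 1-genes
    threshold = chrom_length     # assumed threshold: a chromosome of all 1s

    # 1. Initialization: random binary strings
    population = [[rng.randint(0, 1) for _ in range(chrom_length)]
                  for _ in range(pop_size)]

    for generation in range(max_generations):
        # 2. Fitness assignment: evaluate every individual, keep the best
        best = max(population, key=fitness)
        # 5. Termination: stop once the threshold fitness is reached
        if fitness(best) >= threshold:
            return best, generation

        next_population = []
        while len(next_population) < pop_size:
            # 3. Selection: size-3 tournaments, the fitter contestant wins
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # 4. Reproduction: single-point crossover, then bit-flip mutation
            point = rng.randint(1, chrom_length - 1)
            for child in (parent_a[:point] + parent_b[point:],
                          parent_b[:point] + parent_a[point:]):
                next_population.append(
                    [1 - g if rng.random() < mutation_rate else g for g in child])
        population = next_population[:pop_size]

    # Generation cap reached: report the best individual found
    return max(population, key=fitness), max_generations

best, generations = run_ga()
print(sum(best), "ones after", generations, "generations")
```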
Limitations of Genetic Algorithms
o Genetic algorithms are not efficient algorithms for solving simple
problems.
o They do not guarantee the quality of the final solution to a problem.
o Repeatedly calculating fitness values can be computationally expensive.
1. Data Cleaning:
Data can contain many irrelevant and missing parts. Data cleaning handles
these parts; it involves dealing with missing data, noisy data, etc.
1. Binning Method:
This method smooths sorted data. The sorted data is divided into
segments (bins) of equal size, and each segment is handled separately.
All values in a segment can be replaced by the segment mean, or by the
closest segment boundary value.
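A sketch of equal-frequency binning with smoothing by bin means and by bin boundaries. The price list is made-up sample data, and the function names are illustrative; ties in the boundary method go to the lower boundary (an arbitrary choice).

```python
def equal_frequency_bins(values, bin_size):
    """Sort the data and split it into consecutive bins of bin_size values."""
    data = sorted(values)
    return [data[i:i + bin_size] for i in range(0, len(data), bin_size)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer bin boundary (min or max)."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```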
2. Regression:
Here data can be made smooth by fitting it to a regression
function. The regression used may be linear (having one independent
variable) or multiple (having multiple independent variables).
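For the linear case, a least-squares line can be fitted and each noisy value replaced by its fitted value. This is a minimal sketch with one independent variable (multiple regression is not shown); the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    return a, b

def smooth_by_regression(xs, ys):
    """Replace each observed y by the value predicted by the fitted line."""
    a, b = fit_line(xs, ys)
    return [a + b * x for x in xs]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # noisy, roughly y = 2x
print([round(y, 2) for y in smooth_by_regression(xs, ys)])
# [2.04, 4.03, 6.02, 8.01, 10.0]
```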
3. Clustering:
This approach groups similar data values into clusters. Outliers tend to
fall outside the clusters, although some may go undetected.
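One simple way to illustrate this, assuming 1-D numeric data, is to sort the values and start a new cluster whenever the gap to the previous value exceeds a threshold (single-linkage clustering with a distance cutoff). Values left in very small clusters are flagged as outliers. The threshold max_gap, the minimum cluster size, and the readings are assumptions.

```python
def cluster_1d(values, max_gap):
    """Group sorted values, starting a new cluster when the gap exceeds max_gap."""
    data = sorted(values)
    if not data:
        return []
    clusters = [[data[0]]]
    for v in data[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])       # too far from the previous value: new cluster
        else:
            clusters[-1].append(v)
    return clusters

def outliers(values, max_gap, min_cluster_size=2):
    """Values in clusters smaller than min_cluster_size are treated as outliers."""
    return [v for c in cluster_1d(values, max_gap)
            if len(c) < min_cluster_size for v in c]

readings = [10, 11, 12, 13, 50, 51, 52, 95]
print(cluster_1d(readings, max_gap=5))  # [[10, 11, 12, 13], [50, 51, 52], [95]]
print(outliers(readings, max_gap=5))    # [95]
```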
2. Data Transformation:
This step transforms the data into forms suitable for the mining
process. It involves the following:
1. Normalization:
Normalization scales the data values to a specified range (-1.0 to 1.0
or 0.0 to 1.0).
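Min-max normalization is one common way to do this (z-score normalization is another). The sketch maps each value v to new_min + (v - min) * (new_max - new_min) / (max - min), so the smallest value becomes new_min and the largest becomes new_max. The function name and sample values are illustrative.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant attribute: map everything to new_min
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

data = [10, 20, 30, 50]
print(min_max_normalize(data))              # [0.0, 0.25, 0.5, 1.0]
print(min_max_normalize(data, -1.0, 1.0))   # [-1.0, -0.5, 0.0, 1.0]
```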
2. Attribute Construction:
In this strategy, new attributes are constructed from the given set of
attributes to help the mining process.
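A minimal sketch of constructing a new attribute, assuming records stored as Python dicts: a hypothetical area attribute is derived from existing width and height attributes, leaving the original records unchanged.

```python
def add_area(records):
    """Return copies of the records with a constructed 'area' attribute."""
    return [{**r, "area": r["width"] * r["height"]} for r in records]

rooms = [{"width": 3, "height": 4}, {"width": 5, "height": 2}]
print(add_area(rooms))
# [{'width': 3, 'height': 4, 'area': 12}, {'width': 5, 'height': 2, 'area': 10}]
```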
3. Discretization:
This is done to replace the raw values of a numeric attribute with interval
labels or conceptual labels.
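A sketch of discretization using sorted cut points and Python's bisect module. The cut points, the interval labels, and the concept labels below are made-up examples.

```python
from bisect import bisect_right

def discretize(value, cut_points, labels):
    """Map value to a label. cut_points are sorted interval boundaries and
    labels has one more entry than cut_points; a value equal to a cut point
    falls into the higher interval."""
    return labels[bisect_right(cut_points, value)]

cuts = [13, 20, 65]
ages = [7, 19, 35, 68]
print([discretize(a, cuts, ["0-12", "13-19", "20-64", "65+"]) for a in ages])
# ['0-12', '13-19', '20-64', '65+']
print([discretize(a, cuts, ["child", "teen", "adult", "senior"]) for a in ages])
# ['child', 'teen', 'adult', 'senior']
```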
3. Data Reduction:
Data reduction is a crucial step in the data mining process that involves
reducing the size of the dataset while preserving the important information.
This is done to improve the efficiency of data analysis and to help avoid
overfitting the model. Common data reduction techniques include:
Feature Selection: This involves selecting a subset of relevant features from
the dataset. Feature selection is often performed to remove irrelevant or
redundant features from the dataset. It can be done using various techniques
such as correlation analysis and mutual information.
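As a sketch of correlation-based feature selection, the code keeps only features whose absolute Pearson correlation with the target reaches a threshold. The threshold of 0.5, the feature names, and the data are assumptions; this simple filter checks relevance to the target but not redundancy between features.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, target, threshold=0.5):
    """Keep features whose |correlation| with the target is at least threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

columns = {
    "size_m2": [50, 60, 80, 100, 120],
    "rooms":   [2, 2, 3, 4, 4],
    "door_id": [7, 3, 9, 1, 5],  # arbitrary identifier, unrelated to price
}
price = [150, 180, 240, 300, 360]
print(select_features(columns, price))  # ['size_m2', 'rooms']
```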
Feature Extraction: This involves transforming the data into a lower-
dimensional space while preserving the important information. Feature
extraction is often used when the original features are high-dimensional and
complex. It can be done using techniques such as PCA, linear discriminant
analysis (LDA), and non-negative matrix factorization (NMF).
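PCA projects the data onto the directions of greatest variance. The plain-Python sketch below estimates only the first principal component, by power iteration on the sample covariance matrix, and projects each row onto it (a 1-D reduction). It assumes at least two rows, a strictly largest leading eigenvalue, and a starting vector that is not orthogonal to it; LDA and NMF are not shown.

```python
from math import sqrt

def first_principal_component(rows, iterations=100):
    """Return (column means, unit vector) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # assumed starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along v: nothing to extract
        v = [x / norm for x in w]
    return means, v

def project(rows, means, component):
    """Reduce each row to one coordinate along the given component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r)))
            for r in rows]

rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
means, pc1 = first_principal_component(rows)
print(pc1)                        # should point roughly along the diagonal (1, 1)
print(project(rows, means, pc1))  # one number per row instead of two
```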
Sampling: This involves selecting a subset of data points from the dataset.
Sampling is often used to reduce the size of the dataset while preserving the
important information. It can be done using techniques such as random
sampling, stratified sampling, and systematic sampling.
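Sketches of the three sampling techniques named above, using Python's random module. The function names, the key parameter that defines strata, the assumption that fraction lies in (0, 1], and the synthetic records are all illustrative.

```python
import random
from collections import defaultdict

def simple_random_sample(data, n, rng=random):
    """Draw n records uniformly at random without replacement."""
    return rng.sample(data, n)

def systematic_sample(data, step, start=0):
    """Take every step-th record, beginning at index start."""
    return data[start::step]

def stratified_sample(data, key, fraction, rng=random):
    """Sample the same fraction from each stratum (records sharing key(record))."""
    strata = defaultdict(list)
    for record in data:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one record per stratum
        sample.extend(rng.sample(members, k))
    return sample

records = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
rng = random.Random(0)
print(len(simple_random_sample(records, 10, rng)))   # 10
print(systematic_sample(list(range(20)), step=5))    # [0, 5, 10, 15]
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.1, rng=rng)
print(sum(r[0] == "A" for r in sample), sum(r[0] == "B" for r in sample))  # 8 2
```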
Clustering: This involves grouping similar data points together into clusters.
Clustering is often used to reduce the size of the dataset by replacing similar
data points with a representative centroid. It can be done using techniques
such as k-means, hierarchical clustering, and density-based clustering.
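A sketch of cluster-based reduction: plain k-means (Lloyd's algorithm) groups the points, and each non-empty cluster is then stored as its centroid plus a count of the points it replaces. The choice of k, the iteration count, and the sample points are assumptions.

```python
import random

def kmeans(points, k, iterations=20, rng=random):
    """Plain k-means on tuples of numbers; returns (centroids, clusters)."""
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign p to the centroid with the smallest squared distance
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def reduce_to_centroids(points, k, rng=random):
    """Represent the dataset as (centroid, number of points replaced) pairs."""
    centroids, clusters = kmeans(points, k, rng=rng)
    return [(c, len(members)) for c, members in zip(centroids, clusters) if members]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(reduce_to_centroids(points, 2, rng=random.Random(0)))  # 6 points -> 2 summaries
```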
Compression: This involves compressing the dataset while preserving the
important information. Compression is often used to reduce the size of the
dataset for storage and transmission purposes. It can be done using
techniques such as wavelet compression, JPEG compression, and gzip
compression.
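Of the techniques listed, gzip is the easiest to demonstrate with Python's standard library. The sketch compresses a synthetic, repetitive JSON dataset losslessly and checks the round trip; wavelet and JPEG compression are lossy and are not shown. The records are made up.

```python
import gzip
import json

records = [{"id": i, "status": "active", "country": "DE"} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")
packed = gzip.compress(raw)
print(len(raw), "->", len(packed), "bytes")

# Lossless: decompressing restores exactly the original records.
assert json.loads(gzip.decompress(packed)) == records
```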