Genetic algorithm

A genetic algorithm is a search heuristic inspired by natural evolution, utilizing processes like selection, crossover, and mutation to evolve solutions over generations. It operates through five phases: initialization, fitness assignment, selection, reproduction, and termination, aiming to optimize complex problems. While genetic algorithms can effectively generate high-quality solutions, they may not be efficient for simpler problems and do not guarantee optimal results.

Genetic Algorithm

A genetic algorithm is a search heuristic inspired by Charles Darwin's theory of natural evolution. It reflects the process of natural selection, in which the fittest individuals are selected for reproduction to produce the offspring of the next generation.

Notion of Natural Selection

The process of natural selection starts with the selection of the fittest individuals from a population. They produce offspring that inherit the characteristics of the parents and are added to the next generation. If the parents have better fitness, their offspring tend to be fitter than the parents and have a better chance of surviving. This process keeps repeating, and eventually a generation of the fittest individuals is found.

This notion can be applied to a search problem: we consider a set of solutions to a problem and select the best ones among them.

Five phases are considered in a genetic algorithm:

1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation

Initial Population
The process begins with a set of individuals which is called
a Population. Each individual is a solution to the problem
you want to solve.

An individual is characterized by a set of parameters (variables) known as Genes. Genes are joined into a string to form a Chromosome (solution).

In a genetic algorithm, the set of genes of an individual is represented as a string over some alphabet. Usually, binary values are used (a string of 1s and 0s). We say that we encode the genes in a chromosome.
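To make the encoding concrete, here is a minimal Python sketch. It assumes that each chromosome encodes one non-negative integer as a fixed-length binary string. The helper names `encode` and `decode` are hypothetical and do not come from any library.

```python
def encode(value: int, n_genes: int) -> list[int]:
    """Encode a non-negative integer as a chromosome of n_genes bits (most significant bit first)."""
    if not 0 <= value < 2 ** n_genes:
        raise ValueError("value does not fit in n_genes bits")
    return [(value >> i) & 1 for i in reversed(range(n_genes))]


def decode(chromosome: list[int]) -> int:
    """Decode a binary chromosome back into the integer it represents."""
    value = 0
    for gene in chromosome:
        value = (value << 1) | gene
    return value


chromosome = encode(13, 5)
print(chromosome)          # [0, 1, 1, 0, 1]
print(decode(chromosome))  # 13
```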
Figure: Population, Chromosomes and Genes

Fitness Function
The fitness function determines how fit an individual is
(the ability of an individual to compete with other
individuals). It gives a fitness score to each individual.
The probability that an individual will be selected for
reproduction is based on its…
The genetic algorithm works on an evolutionary generational cycle to generate high-quality solutions. It uses operations that either enhance or replace the population to produce fitter solutions.

It involves five phases for solving complex optimization problems:

o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, called the population. Each individual is a candidate solution to the given problem. An individual is characterized by a set of parameters called Genes. Genes are combined into a string to form a chromosome, which represents a solution to the problem. One of the most popular initialization techniques is to use random binary strings.
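Random binary-string initialization can be sketched in a few lines of Python. `init_population` and its parameters are hypothetical names chosen for illustration. The seeded `random.Random` instance is there only to make runs reproducible.

```python
import random


def init_population(pop_size: int, n_genes: int, rng: random.Random) -> list[list[int]]:
    """Create pop_size individuals, each a random binary string of n_genes genes."""
    return [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]


rng = random.Random(42)  # seeded so the run is reproducible
population = init_population(pop_size=4, n_genes=8, rng=rng)
for individual in population:
    print("".join(map(str, individual)))
```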

2. Fitness Assignment
The fitness function determines how fit an individual is, that is, its ability to compete with other individuals. In every iteration, individuals are evaluated using the fitness function, which assigns a fitness score to each individual. This score determines the probability of being selected for reproduction: the higher the fitness score, the greater the chance of being selected.
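The text does not fix a particular fitness function. The sketch below therefore assumes the classic OneMax toy problem, where fitness is the number of 1-bits. It then turns the scores into fitness-proportionate selection probabilities. Both function names are hypothetical.

```python
def onemax_fitness(individual: list[int]) -> int:
    """Assumed example fitness: the number of 1-bits (the OneMax problem)."""
    return sum(individual)


def selection_probabilities(population: list[list[int]]) -> list[float]:
    """Fitness-proportionate probabilities: a higher score gives a higher chance of selection."""
    scores = [onemax_fitness(ind) for ind in population]
    total = sum(scores)
    if total == 0:
        # Every individual scored zero, so give them equal chances.
        return [1 / len(population)] * len(population)
    return [s / total for s in scores]


population = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print([onemax_fitness(ind) for ind in population])  # [3, 1, 0]
print(selection_probabilities(population))          # [0.75, 0.25, 0.0]
```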

3. Selection
The selection phase chooses individuals for the reproduction of offspring. The selected individuals are then arranged in pairs for reproduction, and they pass their genes on to the next generation.

Three common selection methods are:

o Roulette wheel selection
o Tournament selection
o Rank-based selection
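The three selection methods can be sketched as follows. Each is one common formulation, not the only one. Roulette wheel and rank-based selection here use `random.Random.choices` with weights, so at least one fitness must be positive. The tournament size `k` is an assumed parameter.

```python
import random


def roulette_wheel(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness.
    Assumes non-negative fitnesses with at least one positive value."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def tournament(population, fitnesses, rng, k=3):
    """Sample k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]


def rank_based(population, fitnesses, rng):
    """Select with weights equal to rank (worst = 1, best = N) instead of raw fitness."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranks = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return rng.choices(population, weights=ranks, k=1)[0]


rng = random.Random(0)
population = ["A", "B", "C", "D"]
fitnesses = [1.0, 4.0, 2.0, 8.0]
print(roulette_wheel(population, fitnesses, rng))
print(tournament(population, fitnesses, rng, k=4))  # D (all four compete, so the fittest wins)
print(rank_based(population, fitnesses, rng))
```

Rank-based selection damps the advantage of a very fit individual. Here D's share of the roulette wheel is 8/15, but under ranking it is 4/10.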

4. Reproduction
After the selection process, the creation of a child occurs in the
reproduction step. In this step, the genetic algorithm uses two variation
operators that are applied to the parent population. The two operators
involved in the reproduction phase are given below:

o Crossover: Crossover plays the most significant role in the reproduction phase of the genetic algorithm. In this process, a crossover point is selected at random within the genes. The crossover operator then swaps genetic information between two parents from the current generation to produce new individuals representing the offspring.

The offspring are created by exchanging the genes of the parents among themselves until the crossover point is reached. These newly generated offspring are added to the population. This process is also called recombination. Types of crossover:
o One-point crossover
o Two-point crossover
o Uniform crossover
o Inheritable Algorithms crossover
o Mutation: The mutation operator introduces random changes into the genes of the offspring (new child) to maintain diversity in the population. It can be done by flipping some bits in the chromosome. Mutation helps prevent premature convergence and enhances diversification.
Types of mutation:
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
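The crossover and mutation operators listed above can be sketched as follows, assuming chromosomes are equal-length Python lists. Uniform crossover, in which each gene is swapped independently, is included as a standard operator. All function names and default rates are illustrative assumptions.

```python
import random

# --- Crossover operators (parents are equal-length lists of genes) ---


def one_point_crossover(p1, p2, rng):
    """Swap the parents' tails after one random crossover point."""
    point = rng.randint(1, len(p1) - 1)  # cut strictly inside the chromosome
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def two_point_crossover(p1, p2, rng):
    """Swap the segment between two random crossover points (needs at least 3 genes)."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def uniform_crossover(p1, p2, rng, swap_prob=0.5):
    """Decide gene by gene whether the two children swap values."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(p1)):
        if rng.random() < swap_prob:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

# --- Mutation operators ---


def flip_bit_mutation(chromosome, rng, rate=0.01):
    """Binary genes: flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def gaussian_mutation(chromosome, rng, rate=0.1, sigma=0.1):
    """Real-valued genes: add N(0, sigma^2) noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in chromosome]


def swap_mutation(chromosome, rng):
    """Exchange the genes at two random positions (keeps permutations valid)."""
    mutated = list(chromosome)
    i, j = rng.sample(range(len(mutated)), 2)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


rng = random.Random(7)
child1, child2 = one_point_crossover([0] * 8, [1] * 8, rng)
print(child1, child2)
print(flip_bit_mutation(child1, rng, rate=0.1))
print(swap_mutation([1, 2, 3, 4, 5], rng))
```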

5. Termination
After the reproduction phase, a stopping criterion is checked to decide whether the algorithm should terminate. The algorithm terminates once a solution reaching the threshold fitness is found (another common criterion is a maximum number of generations). The best solution in the final population is returned as the final solution.
Figure: General Workflow of a Simple Genetic Algorithm
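The phases can be tied together in one generational loop. The sketch below is one possible arrangement, not a canonical implementation. It assumes:

- the OneMax problem (maximize the number of 1-bits);
- tournament selection of size 3;
- one-point crossover;
- flip-bit mutation;
- elitism, meaning the best individual is copied unchanged into the next generation.

`run_ga`, `tournament` and their parameters are hypothetical names, and the default rates are illustrative rather than recommended values.

```python
import random


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def run_ga(n_genes=20, pop_size=30, max_generations=200,
           crossover_rate=0.9, mutation_rate=0.02, seed=1):
    """Minimal generational GA for the assumed OneMax problem."""
    rng = random.Random(seed)
    fitness = sum      # OneMax fitness: number of 1-bits
    target = n_genes   # threshold fitness: every gene is 1

    # 1. Initialization: random binary strings
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    for generation in range(max_generations):
        # 2. Fitness assignment
        scores = [fitness(ind) for ind in population]
        best_score = max(scores)
        best = population[scores.index(best_score)]

        # 5. Termination: stop as soon as the threshold fitness is reached
        if best_score >= target:
            return best, generation

        # 3 + 4. Selection and reproduction, keeping the best individual (elitism)
        next_population = [best]
        while len(next_population) < pop_size:
            parent1 = tournament(population, scores, rng)
            parent2 = tournament(population, scores, rng)
            if rng.random() < crossover_rate:
                point = rng.randint(1, n_genes - 1)  # one-point crossover
                child = parent1[:point] + parent2[point:]
            else:
                child = list(parent1)
            # Flip-bit mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population

    # Generation limit reached: return the best individual of the last population
    return max(population, key=fitness), max_generations


best, generations = run_ga()
print(f"best fitness {sum(best)} after {generations} generations")
```

Elitism is a design choice here, not something the text prescribes. It guarantees that the best fitness never decreases from one generation to the next.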

Advantages of Genetic Algorithm

o Genetic algorithms parallelize well, because the fitness of each individual can be evaluated independently.
o They can optimize many kinds of problems, such as discrete functions, multi-objective problems, and continuous functions.
o They provide a solution that improves over time.
o A genetic algorithm does not need derivative information.

Limitations of Genetic Algorithms
o Genetic algorithms are not efficient for solving simple problems.
o They do not guarantee the quality of the final solution to a problem.
o Repeatedly calculating fitness values can be computationally expensive.

Difference between Genetic Algorithms and Traditional Algorithms
o A search space is the set of all possible solutions to the problem. A traditional algorithm maintains only one solution at a time, whereas a genetic algorithm works with a population of several solutions in the search space.
o Traditional algorithms need more information in order to perform a search, whereas genetic algorithms need only an objective function to calculate the fitness of an individual.
o Traditional algorithms typically do not work in parallel, whereas genetic algorithms can (the fitness calculations for individuals are independent of each other).
o One big difference is that, instead of operating directly on candidate solutions, genetic algorithms operate on their representations (or encodings), often referred to as chromosomes.
o Traditional algorithms generate only one result in the end, whereas genetic algorithms can generate multiple good results from different generations.
o Traditional algorithms are not necessarily more likely to find the optimal result. Genetic algorithms do not guarantee a globally optimal result either, but they have a good chance of finding it because they use genetic operators such as crossover and mutation.
o Traditional algorithms are deterministic in nature, whereas genetic algorithms are probabilistic and stochastic.
Data preprocessing is an important step in the data mining process. It refers
to the cleaning, transforming, and integrating of data in order to make it
ready for analysis. The goal of data preprocessing is to improve the quality of
the data and to make it more suitable for the specific data mining task.

Some common steps in data preprocessing include:
Data Cleaning: This involves identifying and correcting errors or
inconsistencies in the data, such as missing values, outliers, and duplicates.
Various techniques can be used for data cleaning, such as imputation,
removal, and transformation.
Data Integration: This involves combining data from multiple sources to
create a unified dataset. Data integration can be challenging as it requires
handling data with different formats, structures, and semantics. Techniques
such as record linkage and data fusion can be used for data integration.
Data Transformation: This involves converting the data into a suitable
format for analysis. Common techniques used in data transformation include
normalization, standardization, and discretization. Normalization is used to
scale the data to a common range, while standardization is used to transform
the data to have zero mean and unit variance. Discretization is used to
convert continuous data into discrete categories.
Data Reduction: This involves reducing the size of the dataset while
preserving the important information. Data reduction can be achieved
through techniques such as feature selection and feature extraction. Feature
selection involves selecting a subset of relevant features from the dataset,
while feature extraction involves transforming the data into a lower-
dimensional space while preserving the important information.
Data Discretization: This involves dividing continuous data into discrete
categories or intervals. Discretization is often used in data mining and
machine learning algorithms that require categorical data. Discretization can
be achieved through techniques such as equal width binning, equal
frequency binning, and clustering.
Data Normalization: This involves scaling the data to a common range,
such as between 0 and 1 or -1 and 1. Normalization is often used to handle
data with different units and scales. Common normalization techniques
include min-max normalization, z-score normalization, and decimal scaling.
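The normalization and discretization techniques above can be sketched in plain Python. The sample values are made up for illustration. `min_max` and `z_score` assume the input values are not all identical; otherwise they would divide by zero. Ties in `equal_frequency_bins` may land in different bins.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalization (standardization): zero mean, unit variance."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j for the smallest j that makes every |v| / 10**j < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def equal_width_bins(values, n_bins):
    """Discretize into n_bins intervals of equal width; returns a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # min(...) keeps the maximum value in the last bin instead of a bin of its own
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Discretize so that each bin holds (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


incomes = [200, 300, 400, 600, 1000]
print(min_max(incomes))          # [0.0, 0.125, 0.25, 0.5, 1.0]
print(decimal_scaling(incomes))  # [0.02, 0.03, 0.04, 0.06, 0.1]

ages = [13, 15, 16, 19, 20, 21, 25, 30, 35, 40, 45, 70]
print(equal_width_bins(ages, 3))      # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
print(equal_frequency_bins(ages, 3))  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

With these skewed ages, equal-width binning puts eight of the twelve values into the first bin. Equal-frequency binning spreads them four per bin.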
Data preprocessing plays a crucial role in ensuring the quality of data and
the accuracy of the analysis results. The specific steps involved in data
preprocessing may vary depending on the nature of the data and the
analysis goals.
By performing these steps, the data mining process becomes more efficient
and the results become more accurate.
Preprocessing in Data Mining:
Data preprocessing is a data mining technique used to transform raw data into a useful and efficient format.

Steps Involved in Data Preprocessing:

1. Data Cleaning:
Data can have many irrelevant and missing parts; data cleaning is done to handle them. It involves handling missing data, noisy data, etc.

(a). Missing Data:
This situation arises when some values are missing from the data. It can be handled in various ways. Some of them are:

1. Ignore the tuples:
This approach is suitable only when the dataset is quite large and multiple values are missing within a tuple.

2. Fill the Missing values:
There are various ways to do this. You can fill the missing values manually, with the attribute mean, or with the most probable value.
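Both strategies can be sketched in plain Python. The sketch assumes records are dictionaries and that `None` marks a missing value. The sample records and function names are hypothetical.

```python
import statistics

# Hypothetical sample records; None marks a missing value.
records = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 35, "income": None},
    {"age": 40, "income": 61000},
]


def drop_incomplete(rows):
    """'Ignore the tuple': keep only rows with no missing (None) attribute values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_with_mean(rows, attribute):
    """Replace missing values of one attribute with the mean of its observed values."""
    observed = [r[attribute] for r in rows if r[attribute] is not None]
    mean = statistics.mean(observed)
    return [{**r, attribute: mean if r[attribute] is None else r[attribute]} for r in rows]


print(len(drop_incomplete(records)))                    # 2
print(fill_with_mean(records, "income")[2]["income"])   # 51000
```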

(b). Noisy Data:
Noisy data is meaningless data that cannot be interpreted by machines. It can be caused by faulty data collection, data entry errors, etc. It can be handled in the following ways:

1. Binning Method:
This method works on sorted data in order to smooth it. The data is divided into segments (bins) of equal size, and each segment is handled separately. One can replace all the values in a segment by the segment's mean, or replace each value by the nearest boundary value of its segment.

2. Regression:
Here data can be smoothed by fitting it to a regression function. The regression used may be linear (having one independent variable) or multiple (having multiple independent variables).

3. Clustering:
This approach groups similar data into clusters. Outliers can then be detected as values that fall outside the clusters.
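Binning and regression smoothing can be sketched as follows. The binning functions assume equal-frequency bins of a fixed size over already-sorted data. The regression sketch uses ordinary least squares with one independent variable. The sample values and function names are illustrative assumptions.

```python
def equal_size_bins(sorted_values, bin_size):
    """Partition already-sorted data into consecutive bins of bin_size values."""
    return [sorted_values[i:i + bin_size] for i in range(0, len(sorted_values), bin_size)]


def smooth_by_means(bins):
    """Binning: replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Binning: replace every value by the closer of its bin's min and max (ties go to min)."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]


def smooth_by_regression(xs, ys):
    """Regression: fit y = a + b*x by least squares and replace each y by its fitted value."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return [a + b * x for x in xs]


prices = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
bins = equal_size_bins(prices, 3)
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
print(smooth_by_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```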
2. Data Transformation:
This step transforms the data into forms appropriate for the mining process. It involves the following:
1. Normalization:
It is done in order to scale the data values into a specified range (-1.0 to 1.0 or 0.0 to 1.0).

2. Attribute Construction:
In this strategy, new attributes are constructed from the given set of attributes to help the mining process.

3. Discretization:
This is done to replace the raw values of a numeric attribute by interval levels or conceptual levels.

4. Concept Hierarchy Generation:
Here attributes are converted from a lower level to a higher level in the hierarchy. For example, the attribute "city" can be converted to "country".

3. Data Reduction:
Data reduction is a crucial step in the data mining process that involves
reducing the size of the dataset while preserving the important information.
This is done to improve the efficiency of data analysis and to avoid overfitting
of the model. Some common steps involved in data reduction are:
Feature Selection: This involves selecting a subset of relevant features from the dataset. Feature selection is often performed to remove irrelevant or redundant features from the dataset. It can be done using various techniques such as correlation analysis and mutual information.
Feature Extraction: This involves transforming the data into a lower-
dimensional space while preserving the important information. Feature
extraction is often used when the original features are high-dimensional and
complex. It can be done using techniques such as PCA, linear discriminant
analysis (LDA), and non-negative matrix factorization (NMF).
Sampling: This involves selecting a subset of data points from the dataset.
Sampling is often used to reduce the size of the dataset while preserving the
important information. It can be done using techniques such as random
sampling, stratified sampling, and systematic sampling.
Clustering: This involves grouping similar data points together into clusters.
Clustering is often used to reduce the size of the dataset by replacing similar
data points with a representative centroid. It can be done using techniques
such as k-means, hierarchical clustering, and density-based clustering.
Compression: This involves compressing the dataset while preserving the
important information. Compression is often used to reduce the size of the
dataset for storage and transmission purposes. It can be done using
techniques such as wavelet compression, JPEG compression, and gzip
compression.
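The three sampling techniques mentioned under data reduction can be sketched with the standard `random` module. The function names and the customer records are hypothetical. The stratified version assumes `fraction` is between 0 and 1 and keeps at least one row per stratum.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, k, rng):
    """Draw k rows uniformly at random, without replacement."""
    return rng.sample(rows, k)


def systematic_sample(rows, k, rng):
    """Take every step-th row after a random start, where step = len(rows) // k."""
    step = len(rows) // k
    start = rng.randrange(step)
    return rows[start::step][:k]


def stratified_sample(rows, key, fraction, rng):
    """Sample the same fraction from each stratum (rows sharing the same key)."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        n = max(1, round(len(group) * fraction))  # keep at least one row per stratum
        sample.extend(rng.sample(group, n))
    return sample


rng = random.Random(3)
customers = [{"id": i, "segment": "A" if i < 80 else "B"} for i in range(100)]
print(len(simple_random_sample(customers, 10, rng)))  # 10
print([c["id"] for c in systematic_sample(customers, 10, rng)])
strat = stratified_sample(customers, key=lambda c: c["segment"], fraction=0.1, rng=rng)
print(sum(c["segment"] == "A" for c in strat), sum(c["segment"] == "B" for c in strat))  # 8 2
```

Stratified sampling preserves the 80/20 split between segments in the sample. A simple random sample of the same size may not.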
