Basic Concepts of Data Mining, Clustering and Genetic Algorithms
Classify the entire training set. For each pattern X_i in the training set, find the nearest cluster center C and classify X_i as a member of C.
Loop until the change in the cluster means is less than the amount specified by the user.
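The K-means loop above can be sketched in plain Python. This is a minimal sketch under stated assumptions: Euclidean distance, initial centers supplied by the caller, and a tolerance `tol` standing in for "the amount specified by the user". The step that recomputes each cluster mean is not written out in this excerpt; the standard K-means mean update is assumed. The names `kmeans` and `nearest_center` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_center(x: Point, centers: Sequence[Point]) -> int:
    """Index of the cluster center closest to pattern x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def kmeans(points: Sequence[Point], centers: List[Point],
           tol: float = 1e-6, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Basic K-means starting from the given initial centers."""
    for _ in range(max_iter):
        # Classify the entire training set: each X_i joins its nearest center.
        labels = [nearest_center(x, centers) for x in points]
        # Recompute every cluster mean; an empty cluster keeps its old center.
        new_centers: List[Point] = []
        for j, old in enumerate(centers):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)
        # Loop until no cluster mean moves by `tol` or more.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Final assignment against the final centers.
    labels = [nearest_center(x, centers) for x in points]
    return labels, centers


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(data, [(0.0, 0.0), (10.0, 10.0)])
# labels -> [0, 0, 1, 1]; centers -> [(0.0, 0.5), (10.0, 10.5)]
```

Note that the number of clusters is fixed by `len(centers)`, and the result depends on which initial centers are passed in.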
The drawbacks of K-means clustering
The final clusters are only a local optimum, not a global one, and completely different final clusters can arise from different randomly chosen initial cluster centers (Figure 1).
The number of clusters must be known in advance.
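Both drawbacks can be seen in a small experiment. The sketch below is a hypothetical 1-D example with made-up data: K is fixed in advance by the number of initial centers, and two different initializations of the same data converge to very different final clusters. The within-cluster sum of squared errors (`sse`) is used here only as an illustrative quality measure.

```python
from typing import List, Tuple


def kmeans_1d(xs: List[float], centers: List[float],
              tol: float = 1e-9, max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Plain 1-D K-means; the number of clusters is fixed by len(centers)."""
    def assign(cs: List[float]) -> List[int]:
        # Each value joins its nearest center (ties go to the lower index).
        return [min(range(len(cs)), key=lambda j: abs(x - cs[j])) for x in xs]

    for _ in range(max_iter):
        labels = assign(centers)
        new = []
        for j, old in enumerate(centers):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new.append(sum(members) / len(members) if members else old)
        moved = max(abs(a - b) for a, b in zip(centers, new))
        centers = new
        if moved < tol:
            break
    return assign(centers), centers


def sse(xs: List[float], labels: List[int], centers: List[float]) -> float:
    """Within-cluster sum of squared errors (lower is better)."""
    return sum((x - centers[lab]) ** 2 for x, lab in zip(xs, labels))


data = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]   # three obvious groups
for init in ([0.0, 10.0, 20.0], [0.0, 1.0, 10.0]):
    labels, centers = kmeans_1d(data, init)
    print(init, labels, sse(data, labels, centers))
# [0.0, 10.0, 20.0] [0, 0, 1, 1, 2, 2] 1.5
# [0.0, 1.0, 10.0] [0, 1, 2, 2, 2, 2] 101.0
```

The second run is stuck in a local optimum: two centers share the first group, so the last two groups are forced into one cluster, and K-means has no move that escapes this.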
Drawbacks of K-means clustering (cont.)
Figure 1
Clustering with Genetic Algorithms
Introduction to Genetic Algorithms
Elements of GAs
Genetic Representation
Genetic Operators
Introduction to GAs
Inspired by biological evolution.
Many operators mimic processes of biological evolution, including:
Natural selection
Crossover
Mutation
Elements of GAs
Individual (chromosome):
a feasible solution to an optimization problem
Population:
a set of individuals
maintained in each generation
Elements of GAs
Genetic operators (crossover, mutation)
A fitness function: it takes a single chromosome as input and returns a measure of the goodness of the solution that the chromosome represents.
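The elements above map onto a few small types. This is a minimal sketch assuming bit-string chromosomes; `count_ones` is a hypothetical toy fitness function (the classic "OneMax" problem), used only to show the shape of a fitness function: one chromosome in, one goodness score out.

```python
import random
from typing import Callable, List

Chromosome = List[int]                  # an individual: one candidate solution
Population = List[Chromosome]           # the set of individuals kept each generation
Fitness = Callable[[Chromosome], float]


def random_population(size: int, length: int, rng: random.Random) -> Population:
    """Create `size` random bit-string chromosomes of the given length."""
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]


def count_ones(chrom: Chromosome) -> float:
    """Toy fitness (hypothetical): more 1 bits means a better solution."""
    return float(sum(chrom))


rng = random.Random(0)
pop = random_population(size=4, length=5, rng=rng)
scores = [count_ones(c) for c in pop]   # one goodness value per individual
best = max(pop, key=count_ones)
```

A real application replaces `count_ones` with a problem-specific measure; the rest of the GA only ever calls it through the `Fitness` signature.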
Genetic Representation
The most important starting point in developing a genetic algorithm
Each gene has its own specific meaning
Based on this representation, we can define
fitness evaluation function,
crossover operator,
mutation operator.
Genetic Representation (Cont.)
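Example 1
Rule: If Outlook is Overcast or Rain and Wind is Strong, then PlayTennis = Yes.
Encoded as a chromosome with one gene per attribute value (allele value 1 = the value is part of the rule, 0 = it is not):
Outlook:    Sunny 0, Overcast 1, Rain 1
Wind:       Strong 1, Normal 0
PlayTennis: Yes 1, No 0
A chromosome: 0 1 1 1 0 1 0 (each position is a gene; each 0 or 1 is an allele value)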
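To make the encoding of Example 1 concrete, the sketch below decodes a 7-gene chromosome back into its rule. The gene order (Sunny, Overcast, Rain; Strong, Normal; Yes, No) is an assumption, chosen because it is the order under which the chromosome 0 1 1 1 0 1 0 matches the stated rule. `LAYOUT` and `decode_rule` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Assumed gene layout for Example 1: one gene (bit) per attribute value.
LAYOUT: List[Tuple[str, List[str]]] = [
    ("Outlook", ["Sunny", "Overcast", "Rain"]),
    ("Wind", ["Strong", "Normal"]),
    ("PlayTennis", ["Yes", "No"]),
]


def decode_rule(chrom: List[int]) -> str:
    """Translate a 7-gene chromosome into the if-then rule it encodes."""
    if len(chrom) != sum(len(values) for _, values in LAYOUT):
        raise ValueError("chromosome length does not match the gene layout")
    chosen: Dict[str, str] = {}
    pos = 0
    for attr, values in LAYOUT:
        genes = chrom[pos:pos + len(values)]
        # Allele value 1 means "this attribute value is part of the rule".
        chosen[attr] = " or ".join(v for v, bit in zip(values, genes) if bit == 1)
        pos += len(values)
    return (f"If Outlook is {chosen['Outlook']} and Wind is {chosen['Wind']}, "
            f"then PlayTennis = {chosen['PlayTennis']}")


print(decode_rule([0, 1, 1, 1, 0, 1, 0]))
# If Outlook is Overcast or Rain and Wind is Strong, then PlayTennis = Yes
```

Because every gene has a fixed meaning, the same layout also tells the crossover and mutation operators which bits belong to which attribute.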
Genetic Representation (Cont.)
Example 2 (in a clustering problem)
Each chromosome represents a set of clusters: each gene represents an object, and each allele value represents a cluster. Genes with the same allele value are in the same cluster.
Object: A B C D E F G
Allele: 1 2 1 4 3 5 5
(A and C share cluster 1; F and G share cluster 5.)
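Reading the clusters out of such a chromosome is a simple grouping step. A minimal sketch; `clusters_of` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def clusters_of(chrom: Sequence[int], objects: Sequence[str]) -> Dict[int, List[str]]:
    """Group objects by allele value: equal alleles mean the same cluster."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for obj, allele in zip(objects, chrom):
        groups[allele].append(obj)
    return dict(groups)


print(clusters_of([1, 2, 1, 4, 3, 5, 5], "ABCDEFG"))
# {1: ['A', 'C'], 2: ['B'], 4: ['D'], 3: ['E'], 5: ['F', 'G']}
```

The allele values act only as labels: renaming them (for example, swapping every 1 with every 2) describes the same clustering.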
Crossover
Exchanges features of two individuals to produce two offspring (children)
Selected mates may have good properties that help them survive into the next generations
So we can expect that exchanging their features may produce other good individuals
Crossover (cont.)
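Single-point crossover (crossover point after bit 5)
Parent 1: 1 1 0 1 1 | 0 0 1 0 0 0
Parent 2: 0 0 0 0 1 | 0 1 0 1 0 1
Child 1:  1 1 0 1 1 | 0 1 0 1 0 1
Child 2:  0 0 0 0 1 | 0 0 1 0 0 0
Two-point crossover (bits 4 to 7 exchanged)
Parent 1: 1 1 0 | 1 1 0 0 | 1 0 0 0
Parent 2: 0 0 0 | 0 1 0 1 | 0 1 0 1
Child 1:  1 1 0 | 0 1 0 1 | 1 0 0 0
Child 2:  0 0 0 | 1 1 0 0 | 0 1 0 1
Uniform crossover (crossover template: where the template bit is 1, child 1 copies parent 1 and child 2 copies parent 2; where it is 0, the roles are swapped)
Template: 1 0 1 0 1 0 1 0 0 1 1
Parent 1: 1 1 0 1 1 0 0 1 0 0 0
Parent 2: 0 0 0 0 1 0 1 0 1 0 1
Child 1:  1 0 0 0 1 0 0 0 1 0 0
Child 2:  0 1 0 1 1 0 1 1 0 0 1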
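The three crossover variants can be written as list slicing and a per-bit choice. This sketch reproduces the bit strings of the crossover figure; the crossover points and the template are passed in explicitly here, whereas in a running GA they would be drawn at random (for example with `random.randrange`). Function names are illustrative.

```python
from typing import List, Tuple

Bits = List[int]


def single_point(p1: Bits, p2: Bits, point: int) -> Tuple[Bits, Bits]:
    """Swap everything after position `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def two_point(p1: Bits, p2: Bits, start: int, end: int) -> Tuple[Bits, Bits]:
    """Swap the segment between the two crossover points (slice start:end)."""
    c1 = p1[:start] + p2[start:end] + p1[end:]
    c2 = p2[:start] + p1[start:end] + p2[end:]
    return c1, c2


def uniform(p1: Bits, p2: Bits, template: Bits) -> Tuple[Bits, Bits]:
    """Template bit 1: child 1 copies parent 1; template bit 0: it copies parent 2."""
    c1 = [a if t else b for a, b, t in zip(p1, p2, template)]
    c2 = [b if t else a for a, b, t in zip(p1, p2, template)]
    return c1, c2


p1 = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
p2 = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1]
print(single_point(p1, p2, 5)[0])   # [1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(two_point(p1, p2, 3, 7)[0])   # [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
print(uniform(p1, p2, [1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1])[0])
# [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Positions are 0-based in the code, so "bits 4 to 7" in the figure corresponds to the slice `3:7`.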
Mutation
Usually changes a single bit in a bit string
This operator should be applied with very low probability.
Before: 0 1 0 1 1
After:  0 1 1 1 1 (mutation point, chosen at random: bit 3)
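Mutation as described above, flipping a single randomly chosen bit with low probability, can be sketched as follows. The rate value is an assumption left to the caller; a common alternative, not shown here, flips each bit independently with a small probability.

```python
import random
from typing import List


def flip_bit(chrom: List[int], point: int) -> List[int]:
    """Return a copy of `chrom` with the bit at `point` (0-based) inverted."""
    child = chrom.copy()
    child[point] = 1 - child[point]
    return child


def mutate(chrom: List[int], rate: float, rng: random.Random) -> List[int]:
    """With (low) probability `rate`, flip one randomly chosen bit."""
    if rng.random() < rate:
        return flip_bit(chrom, rng.randrange(len(chrom)))
    return chrom.copy()


print(flip_bit([0, 1, 0, 1, 1], 2))   # [0, 1, 1, 1, 1]  (the third bit flipped)
```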
Typical Procedures
Crossover mates are probabilistically
selected based on their fitness value.
[Figure: the old generation becomes the new generation by probabilistically selecting individuals, applying crossover at a randomly selected crossover point, and mutating at a random mutation point. In the example, parents 1 1 0 1 1 and 0 1 0 0 1 produce children 1 1 0 0 1 and 0 1 0 1 1, and mutation then turns 0 1 0 1 1 into 0 1 1 1 1.]
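One generation of this typical procedure can be sketched in Python. Assumptions, labeled as such: fitness-proportional ("roulette-wheel") selection is used as the probabilistic selection rule, crossover is single-point, mutation flips one random bit with probability `p_mut`, and `ones` is a hypothetical toy fitness function. Fitness values must be non-negative and not all zero for `random.Random.choices` to accept them as weights.

```python
import random
from typing import Callable, List

Bits = List[int]
FitnessFn = Callable[[Bits], float]


def roulette_select(pop: List[Bits], fitness: FitnessFn, rng: random.Random) -> Bits:
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness(c) for c in pop]
    return rng.choices(pop, weights=weights, k=1)[0]


def next_generation(pop: List[Bits], fitness: FitnessFn, rng: random.Random,
                    p_mut: float = 0.01) -> List[Bits]:
    """Old generation -> selection -> single-point crossover -> mutation -> new generation."""
    new_pop: List[Bits] = []
    while len(new_pop) < len(pop):
        a = roulette_select(pop, fitness, rng)
        b = roulette_select(pop, fitness, rng)
        point = rng.randrange(1, len(a))           # crossover point, randomly selected
        for child in (a[:point] + b[point:], b[:point] + a[point:]):
            if rng.random() < p_mut:               # rare mutation of one random bit
                i = rng.randrange(len(child))
                child[i] = 1 - child[i]
            new_pop.append(child)
    return new_pop[:len(pop)]                      # keep the population size fixed


def ones(c: Bits) -> float:
    """Toy fitness (hypothetical): number of 1 bits."""
    return float(sum(c))


old_generation = [[0, 1, 0, 0, 1], [1, 1, 0, 1, 0], [0, 0, 1, 1, 1], [0, 1, 0, 1, 1]]
new_generation = next_generation(old_generation, ones, random.Random(42))
```

Slicing builds new lists, so mutating a child never alters the parents that remain in the old generation.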
How to apply a GA to a clustering problem
Preparing the chromosomes
Defining genetic operators
Fusion: takes two distinct allele values and combines them into a single allele value, merging two clusters into one.
Fission: takes a single allele value and gives some of its genes a different random allele value, breaking a cluster apart.
Defining fitness functions
[Figure: example chromosomes 1 2 3 3 5, 3 2 3 3 5, 1 3 3 3 5 and 1 3 4 4 5; for instance, fusing clusters 2 and 3 turns 1 2 3 3 5 into 1 3 3 3 5.]
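Fusion and fission act on the label-based chromosomes of Example 2. This is a sketch under stated assumptions: fusion merges two randomly chosen distinct clusters, and fission moves a random non-empty, proper subset of one cluster's genes to a fresh, unused allele value (`max(chrom) + 1`). The slide only says "a different random allele value"; reusing an existing label would instead move those objects into another cluster. `merge`, `fusion` and `fission` are hypothetical names.

```python
import random
from typing import List


def merge(chrom: List[int], keep: int, absorb: int) -> List[int]:
    """Relabel every gene with allele `absorb` as `keep` (two clusters become one)."""
    return [keep if g == absorb else g for g in chrom]


def fusion(chrom: List[int], rng: random.Random) -> List[int]:
    """Fusion: pick two distinct allele values at random and merge their clusters."""
    alleles = sorted(set(chrom))
    if len(alleles) < 2:
        return chrom.copy()                    # nothing to merge
    keep, absorb = rng.sample(alleles, 2)
    return merge(chrom, keep, absorb)


def fission(chrom: List[int], rng: random.Random) -> List[int]:
    """Fission: move some genes of one cluster to a new allele value."""
    allele = rng.choice(sorted(set(chrom)))
    members = [i for i, g in enumerate(chrom) if g == allele]
    if len(members) < 2:
        return chrom.copy()                    # a one-object cluster cannot split
    new_label = max(chrom) + 1                 # assumption: a fresh, unused label
    moved = rng.sample(members, rng.randrange(1, len(members)))
    child = chrom.copy()
    for i in moved:
        child[i] = new_label
    return child


print(merge([1, 2, 3, 3, 5], keep=3, absorb=2))   # [1, 3, 3, 3, 5]
```

Fusion always lowers the number of clusters by one and a successful fission raises it by one, which is how the GA can search over the number of clusters instead of fixing it in advance as K-means does.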
Example (cont.)
[Figure: the old generation is turned into a new generation by crossover, mutation, fusion and fission. Chromosomes shown: 1 2 3 3 5, 1 2 4 3 5, 2 1 3 3 5, 2 2 4 3 5, 1 1 1 3 5, 2 2 3 2 5, 1 2 5 3 5, 2 4 3 3 4, 2 2 4 3 5, 2 1 2 3 5.]
Select the chromosomes according to the fitness function.
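The slides do not specify the fitness function, so the following is only one hypothetical choice, sketched in Python. Each object gets a made-up 1-D value; fitness rewards a small within-cluster sum of squared errors (SSE) and charges `penalty` per cluster, because plain SSE alone would always favor splitting every object into its own cluster. Selection is fitness-proportional, which is likewise an assumption.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def within_sse(chrom: Sequence[int], values: Sequence[float]) -> float:
    """Sum of squared distances from each object's value to its cluster mean."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for allele, v in zip(chrom, values):
        groups[allele].append(v)
    total = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


def fitness(chrom: Sequence[int], values: Sequence[float], penalty: float = 1.0) -> float:
    """Hypothetical fitness, always positive: small SSE and few clusters score high."""
    k = len(set(chrom))
    return 1.0 / (1.0 + within_sse(chrom, values) + penalty * k)


def select(pop: List[List[int]], values: Sequence[float], n: int,
           rng: random.Random) -> List[List[int]]:
    """Fitness-proportional (roulette-wheel) choice of n chromosomes."""
    weights = [fitness(c, values) for c in pop]
    return rng.choices(pop, weights=weights, k=n)


values = [1.0, 1.2, 5.0, 5.1, 9.0]   # hypothetical 1-D data for the five objects
old_generation = [[1, 2, 3, 3, 5], [1, 2, 4, 3, 5], [2, 2, 4, 3, 5], [1, 1, 1, 3, 5]]
for c in old_generation:
    print(c, round(fitness(c, values), 3))
survivors = select(old_generation, values, n=4, rng=random.Random(0))
```

The `penalty` weight controls the trade-off between compact clusters and few clusters; its value here is arbitrary and would need tuning for real data.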
Finally