Machine Learning
Agglomerative and Divisive Clustering
Agglomerative Clustering
Agglomerative clustering is the "bottom-up" approach: each data point starts as its own cluster, and the most similar pairs of clusters are merged iteratively until the desired number of clusters remains.
Divisive Clustering
Divisive clustering is the technique that starts with all data points in a single cluster and recursively splits the clusters into smaller sub-clusters based on their dissimilarity. It is also known as "top-down" clustering. Unlike agglomerative clustering, which starts with each data point as its own cluster and iteratively merges the most similar pairs of clusters, divisive clustering is a "divide and conquer" approach that breaks a large cluster into smaller sub-clusters.
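As a concrete illustration of the agglomerative (bottom-up) side, here is a minimal Python sketch using scikit-learn's AgglomerativeClustering; scikit-learn does not ship a divisive variant, and the small two-dimensional dataset is made up purely for illustration.

# Minimal sketch: agglomerative (bottom-up) hierarchical clustering.
# Note: this only illustrates the agglomerative variant described above;
# the data below is synthetic.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 8.0],
              [5.2, 7.9], [9.0, 1.0], [9.1, 0.8]])

# Merge the most similar pairs of clusters until 3 clusters remain.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 1 1 2 2] (cluster ids may differ)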
K-Means Algorithm
K-Means allows us to cluster the data into different groups and provides a convenient way to discover the categories of groups in an unlabelled dataset on its own, without the need for any training. The algorithm takes the unlabelled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm.
The k-means algorithm mainly performs two tasks:
Determines the best positions for the K center points or centroids by an iterative process.
Assigns each data point to its closest k-center. The data points that are near a particular k-center form a cluster.
Hence each cluster has data points with some commonalities and is kept away from the other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
Working of K-Means Algorithm
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those in the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid.
Step-6: If any reassignment occurs, go back to Step-4; otherwise, the clusters are final.
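The steps above can be sketched in a few lines of Python with scikit-learn's KMeans; the two-feature data standing in for M1 and M2 is synthetic, and K=2 is chosen only to mirror the walkthrough that follows.

# Minimal sketch of the K-means steps above using scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
              [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])

# n_clusters is the predetermined K; the algorithm picks initial centroids,
# assigns points to the closest centroid, recomputes centroids, and repeats.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index of each data point
print(kmeans.cluster_centers_)  # final centroids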
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables
is given below:
Let's take the number of clusters K=2, i.e., we will try to group the dataset into two different clusters.
We need to choose some random k points or centroids to form the clusters. These points can be either points from the dataset or any other points. So, here we are selecting the below two points as k points, which are not part of our dataset. Consider the below image:
Now we will assign each data point of the scatter plot to its closest K-point or centroid. We will compute this by calculating the distance (for example, the Euclidean distance) between each point and the two centroids. Then we will draw a median line (the perpendicular bisector) between the two centroids. Consider the below image:
From the above image, it is clear that the points on the left side of the line are nearer to the K1 or blue centroid, and the points to the right of the line are closer to the yellow centroid. Let's color them blue and yellow for clear visualization.
As we need to find the closest cluster, we will repeat the process by choosing new centroids. To choose the new centroids, we will compute the center of gravity (mean) of the points in each cluster and place the new centroids there, as shown below:
Next, we will reassign each data point to the new centroid. For this, we will repeat the same process of finding a median line. The median will look like the below image:
From the above image, we can see that one yellow point is on the left side of the line, and two blue points are to the right of the line. So, these three points will be assigned to the new centroids.
As reassignment has taken place, we will again go to Step-4, which is finding new centroids or K-points.
We will repeat the process by finding the center of gravity of the points in each cluster, so the new centroids will be as shown in the below image:
As we have got the new centroids, we will again draw the median line and reassign the data points. So, the image will be:
It only identifies spherical-shaped clusters, i.e., it cannot identify clusters that are non-spherical or of varying sizes and densities.
It suffers from local minima and has problems when the data contains outliers.
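To see this limitation concretely, the following sketch runs K-means on scikit-learn's make_moons data (two interleaving, non-spherical clusters); the parameter values are arbitrary, and a low Adjusted Rand Index indicates that the true moons are not recovered.

# Illustrative sketch of the limitation above: K-means assumes roughly
# spherical clusters, so it fails to separate two interleaving half-moons.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
y_pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A low score means the recovered clusters do not match the true moons.
print("Adjusted Rand Index:", adjusted_rand_score(y_true, y_pred))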
Bisecting K-Means
When K is large, bisecting k-means is more effective. Every data point in the data set and all k centroids are used for computation in the standard K-means method. In contrast, each bisecting step of bisecting k-means uses only the data points of one cluster and two centroids. As a result, computation time is shortened.
While k-means is known to yield clusters of varied sizes, bisecting k-means results in
clusters of comparable sizes.
Initialize the list of clusters to contain the cluster consisting of all points.
repeat
    Remove a cluster from the list of clusters.
    for i = 1 to number of trials do
        Bisect the selected cluster using basic K-means.
    end for
    Select the two clusters from the bisection with the lowest total SSE and add them to the list of clusters.
until the list of clusters contains K clusters.
Firstly, let us assume the number of clusters required at the final stage is 'K' = 3 (any value can be assumed if not mentioned).
Step 01: All the data points are placed into a single cluster, 'GFG'.
Step 02: Apply K-Means (K=2). The cluster 'GFG' is split into two clusters, 'GFG1' and 'GFG2'. The required number of clusters hasn't been obtained yet, so 'GFG1' is split further into two (since it has the higher SSE; the formula for SSE is explained below).
In the above diagram, as we split the cluster 'GFG' into 'GFG1' and 'GFG2', we calculate the SSE of the two clusters separately. The SSE (sum of squared errors) of a cluster is the sum of the squared distances between each point in the cluster and the cluster's centroid, i.e., SSE = Σ ||x − c||² over all points x in the cluster with centroid c. The cluster with the higher SSE will be split further; the cluster with the lower SSE contains comparatively less error and hence won't be split further.
Here, if the calculation shows that the cluster 'GFG1' has the higher SSE, we split it into two sub-clusters, (GFG1)′ and (GFG1)″. The number of clusters required at the final stage was given as 3, and we have now obtained 3 clusters.
If the required number of clusters is not obtained, we should continue splitting until they
are produced.
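Assuming scikit-learn version 1.1 or later, which provides sklearn.cluster.BisectingKMeans, the procedure above can be sketched as follows; the toy data is invented for illustration.

# Minimal sketch of bisecting K-means (requires scikit-learn >= 1.1).
import numpy as np
from sklearn.cluster import BisectingKMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0],
              [20, 20], [21, 19], [19, 21]])

# Repeatedly bisects the cluster with the largest SSE until K clusters remain.
model = BisectingKMeans(n_clusters=3, random_state=0)
print(model.fit_predict(X))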
K-Medoids clustering
K-Medoids and K-Means are two types of clustering mechanisms in partition clustering. Clustering is the process of breaking down an abstract group of data points/objects into classes of similar objects, such that all the objects in one cluster have similar traits. In partitioning, a group of n objects is broken down into k clusters based on their similarities. Two statisticians, Leonard Kaufman and Peter J. Rousseeuw, came up with this method.
K-Medoids:
Medoid: A Medoid is a point in the cluster from which the sum of distances to other data
points is minimal.
(or)
A Medoid is a point in the cluster whose dissimilarity to all the other points in the cluster is minimal.
PAM is the most powerful of the three algorithms but has the disadvantage of high time complexity. The K-Medoids clustering below is performed using PAM. In the further parts, we'll see what CLARA and CLARANS are.
Algorithm:
Step-1: Select k random points out of the n data points as the initial medoids.
Step-2: Assign each remaining data point to its closest medoid, and compute the total cost (the sum of dissimilarities of the points to their nearest medoid).
Step-3: For each medoid m and each non-medoid point o, compute the total cost that would result from swapping m with o.
Step-4: If the best swap reduces the total cost, perform it and go back to Step-2; otherwise, stop. The final medoids define the clusters.
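The following is a compact, illustrative PAM-style sketch in NumPy rather than an optimized implementation; the function name pam, the use of Manhattan distance, and the sample data are all choices made here for illustration.

# Illustrative PAM-style K-Medoids sketch (not optimized).
import numpy as np

def pam(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Manhattan distances between all points.
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    medoids = rng.choice(n, size=k, replace=False)
    cost = D[:, medoids].min(axis=1).sum()

    for _ in range(max_iter):
        best_cost, best_medoids = cost, medoids
        # Try swapping each current medoid with each non-medoid point.
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                cand = medoids.copy()
                cand[i] = h
                c = D[:, cand].min(axis=1).sum()
                if c < best_cost:
                    best_cost, best_medoids = c, cand
        if best_cost >= cost:  # no improving swap left: converged
            break
        cost, medoids = best_cost, best_medoids

    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels, cost

X = np.array([[2.0, 6.0], [3.0, 4.0], [3.0, 8.0], [4.0, 7.0], [6.0, 2.0],
              [6.0, 4.0], [7.0, 3.0], [7.0, 4.0], [8.0, 5.0], [7.0, 6.0]])
print(pam(X, k=2))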
CLARA:
It is an extension to PAM to support medoid clustering for large data sets. This algorithm selects data samples from the data set, applies PAM to each sample, and outputs the best clustering out of these samples. This is more efficient than PAM for large data sets. We should ensure that the selected samples aren't biased, as they affect the clustering of the whole data.
CLARANS:
This algorithm selects a sample of neighbors to examine instead of selecting samples from the data set. In every step, it examines the neighbors of the current node. The time complexity of this algorithm is O(n²), and it is considered the most efficient of the medoid algorithms.
Advantages of using K-Medoids:
It is more robust to noise and outliers than K-Means, because it uses actual data points (medoids) rather than means as cluster centers.
Disadvantages:
It is not suitable for very large data sets, since the swap step of PAM is computationally expensive, and the results can depend on the initial choice of medoids.
Association Rule Learning
Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that it can be more profitable. It tries to find interesting relations or associations among the variables of a dataset. It is based on different rules to discover the interesting relations between variables in the database.
Association rule learning is one of the very important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc. Market basket analysis is a technique used by various big retailers to discover the associations between items. We can understand it by taking the example of a supermarket, where all products that are frequently purchased together are placed together.
For example, if a customer buys bread, he is likely to also buy butter, eggs, or milk, so these products are stored on the same shelf or mostly nearby. Consider the below diagram:
Working of Association Rule Learning
Association rule learning works on the concept of an If-Then statement, such as if A then B.
Here the If element is called the antecedent, and the Then element is called the consequent.
These types of relationships, where we can find some association or relation between two items, are known as single cardinality. Association rule learning is all about creating rules, and as the number of items increases, the cardinality also increases accordingly. So, to measure the associations between thousands of data items, there are several metrics. These metrics are given below:
Support
Confidence
Lift
Support
Support is the frequency of A, or how frequently an item appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X. For an itemset X and a total of T transactions, it can be written as:
Support(X) = Frequency(X) / T
Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often the items X and Y occur together in the dataset given that X already occurs. It is the ratio of the number of transactions that contain X and Y to the number of transactions that contain X:
Confidence(X → Y) = Support(X ∧ Y) / Support(X)
Lift
Lift measures the strength of a rule: it is the ratio of the observed support of X and Y occurring together to the support expected if X and Y were independent:
Lift(X → Y) = Support(X ∧ Y) / (Support(X) × Support(Y))
Lift = 1: The occurrences of the antecedent and the consequent are independent of each other.
Lift > 1: It determines the degree to which the two itemsets are dependent on each other.
Lift < 1: It tells us that one item is a substitute for the other item, which means one item has a negative effect on the other.
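A small Python sketch of these three metrics, computed over a handful of made-up transactions (the item names and transactions are invented for illustration):

# Support, confidence and lift computed directly from toy transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"butter", "milk"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    # Fraction of transactions that contain the whole itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    return confidence(antecedent, consequent) / support(consequent)

A, B = {"bread"}, {"butter"}
print("support   :", support(A | B))
print("confidence:", confidence(A, B))
print("lift      :", lift(A, B))  # >1 dependent, <1 substitutes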
Apriori Algorithm
The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work on databases that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects are connected. The algorithm uses a breadth-first search and a Hash Tree to calculate the itemset associations efficiently. It is an iterative process for finding the frequent itemsets in a large dataset.
This algorithm was given by R. Agrawal and Srikant in the year 1994. It is mainly used for market basket analysis and helps to find products that can be bought together. It can also be used in the healthcare field to find drug reactions for patients.
Frequent Itemset
Frequent itemsets are those itemsets whose support is greater than the threshold value, or user-specified minimum support. It means that if {A, B} is a frequent itemset, then A and B individually should also be frequent itemsets.
Suppose there are two transactions: A = {1, 2, 3, 4, 5} and B = {2, 3, 7}. In these two transactions, 2 and 3 are the frequent itemsets.
Note: To better understand the Apriori algorithm and related terms such as support and confidence, it is recommended to first understand association rule learning.
Steps for Apriori Algorithm
Step-1: Determine the support of the itemsets in the transactional database, and select the minimum support and confidence.
Step-2: Take all the itemsets in the transactions that have a support value higher than the minimum (selected) support value.
Step-3: Find all the rules from these subsets that have a confidence value higher than the threshold (minimum) confidence.
We will understand the apriori algorithm using an example and mathematical calculation:
Example: Suppose we have the following dataset that has various transactions, and from
this dataset, we need to find the frequent itemsets and generate the association rules using
the Apriori algorithm:
Solution:
In the first step, we will create a table that contains support count (The frequency of each
itemset individually in the dataset) of each itemset in the given dataset. This table is called
the Candidate set or C1.
Now, we will take out all the itemsets that have a support count greater than or equal to the Minimum Support (2). This gives us the table for the frequent itemset L1.
Since all the itemsets except E have a support count greater than or equal to the minimum support, the E itemset will be removed.
Step-2: Candidate Generation C2, and L2:
In this step, we will generate C2 with the help of L1. In C2, we will create the pair of the
itemsets of L1 in the form of subsets.
After creating the subsets, we will again find the support count from the main transaction
table of datasets, i.e., how many times these pairs have occurred together in the given
dataset. So, we will get the below table for C2:
Again, we need to compare the C2 Support count with the minimum support count, and
after comparing, the itemset with less support count will be eliminated from the table C2.
It will give us the below table for L2
For C3, we will repeat the same two processes, but now we will form the C3 table with
subsets of three itemsets together, and will calculate the support count from the dataset.
It will give the below table:
Now we will create the L3 table. As we can see from the above C3 table, there is only one
combination of itemset that has support count equal to the minimum support count. So,
the L3 will have only one combination, i.e., {A, B, C}.
To generate the association rules, we will first create a new table with the possible rules from the obtained combination {A, B, C}. For all the rules, we will calculate the confidence using the formula Confidence = sup(A ∧ B) / sup(A). After calculating the confidence value for all rules, we will exclude the rules that have a confidence lower than the minimum threshold (50%).
As the given threshold or minimum confidence is 50%, the first three rules, A ∧ B → C, B ∧ C → A, and A ∧ C → B, can be considered strong association rules for the given problem.
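As a hedged sketch of the same workflow in code, the third-party mlxtend library (assuming it and pandas are installed) provides apriori and association_rules helpers; the transactions below are invented and are not the transaction table used in the walkthrough above.

# Apriori with mlxtend: frequent itemsets, then rules above min confidence.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["A", "B", "C"], ["A", "B"], ["A", "C"],
                ["B", "C"], ["A", "B", "C", "D"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

frequent = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])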
FP-Growth
The FP-Growth Algorithm was proposed by Han et al. It is an efficient and scalable method for mining the complete set of frequent patterns by pattern fragment growth, using an extended prefix-tree structure, called the frequent-pattern tree (FP-tree), for storing compressed and crucial information about frequent patterns. In his study, Han showed that his method outperforms other popular methods for mining frequent patterns, e.g., the Apriori algorithm and Tree Projection. In later works it was shown that FP-Growth also performs better than other methods, including Eclat and Relim. The popularity and efficiency of the FP-Growth algorithm have led to many studies that propose variations to improve its performance.
FP Growth Algorithm
The FP-Growth algorithm is an alternative way to find frequent itemsets without using candidate generation, thus improving performance. To do so, it uses a divide-and-conquer strategy. The core of this method is the usage of a special data structure named the frequent-pattern tree (FP-tree), which retains the itemset association information.
FP-Tree
The frequent-pattern tree (FP-tree) is a compact data structure that stores quantitative
information about frequent patterns in a database. Each transaction is read and then
mapped onto a path in the FP-tree. This is done until all transactions have been read.
Different transactions with common subsets allow the tree to remain compact because
their paths overlap.
A frequent Pattern Tree is made with the initial item sets of the database. The purpose of
the FP tree is to mine the most frequent pattern. Each node of the FP tree represents an
item of the item set.
The root node represents null, while the lower nodes represent the item sets. The
associations of the nodes with the lower nodes, that is, the item sets with the other item
sets, are maintained while forming the tree.
The tree has one root labelled as "null", a set of item-prefix subtrees as its children, and a frequent-item-header table.
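Assuming the same third-party mlxtend library is installed, FP-Growth can be sketched as follows; it builds and mines the FP-tree internally, and the transactions are again invented for illustration.

# FP-Growth with mlxtend: mines frequent itemsets without candidate generation.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [["milk", "bread", "butter"], ["bread", "butter"],
                ["milk", "bread"], ["bread", "eggs"], ["milk", "butter"]]

te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

print(fpgrowth(df, min_support=0.4, use_colnames=True))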
Dimensionality Reduction
The number of input features, variables, or columns present in a given dataset is known as its dimensionality, and the process of reducing these features is called dimensionality reduction.
In many cases a dataset contains a huge number of input features, which makes the predictive modelling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.
Dimensionality reduction technique can be defined as, "It is a way of converting the higher
dimensions dataset into lesser dimensions dataset ensuring that it provides similar
information." These techniques are widely used in machine learning for obtaining a better
fit predictive model while solving the classification and regression problems.
It is commonly used in the fields that deal with high-dimensional data, such as speech
recognition, signal processing, bioinformatics, etc. It can also be used for data
visualization, noise reduction, cluster analysis, etc.
Principal Component Analysis (PCA)
PCA works by considering the variance of each attribute, because an attribute with high variance shows a good split between the classes, and hence it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels.
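A minimal PCA sketch with scikit-learn follows; the three-feature data is synthetic, with one column deliberately correlated with another, and standardization is included because PCA is sensitive to feature scale.

# Reduce 3 synthetic features to the 2 directions of highest variance.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=100)  # correlated third column

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # share of variance kept per component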
Singular Value Decomposition (SVD)
SVD, or Singular Value Decomposition, is one of several techniques that can be used to
reduce the dimensionality, i.e., the number of columns, of a data set. Why would we want
to reduce the number of dimensions? In predictive analytics, more columns normally
means more time required to build models and score data. If some columns have no
predictive value, this means wasted time, or worse, those columns contribute noise to the
model and reduce model quality or predictive accuracy.
Dimensionality reduction can be achieved by simply dropping columns, for example, those
that may show up as collinear with others or identified as not being particularly predictive
of the target as determined by an attribute importance ranking technique. But it can also
be achieved by deriving new columns based on linear combinations of the original
columns. In both cases, the resulting transformed data set can be provided to machine
learning algorithms to yield faster model build times, faster scoring times, and more
accurate models.
While SVD can be used for dimensionality reduction, it is often used in digital signal
processing for noise reduction, image compression, and other areas.
SVD is an algorithm that factors an m x n matrix, M, of real or complex values into three
component matrices, where the factorization has the form USV*. U is an m x p matrix. S is
a p x p diagonal matrix. V is an n x p matrix, with V* being the transpose of V, a p x
n matrix, or the conjugate transpose if M contains complex values. The value p is called
the rank. The diagonal entries of S are referred to as the singular values of M. The columns
of U are typically called the left-singular vectors of M, and the columns of V are called the
right-singular vectors of M.
One of the features of SVD is that given the decomposition of M into U, S, and V, one can
reconstruct the original matrix M, or an approximation of it. The singular values in the
diagonal matrix S can be used to understand the amount of variance explained by each of
the singular vectors.
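The decomposition and a low-rank reconstruction can be sketched with NumPy's built-in SVD routine; the matrix below is random and the chosen rank k = 2 is arbitrary.

# Truncated SVD reconstruction: factor M into U, S, V* and rebuild a
# low-rank approximation from the largest singular values.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(M, full_matrices=False)  # M = U @ diag(s) @ Vt

k = 2  # keep the k largest singular values
M_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Fraction of total variance captured by the first k singular values,
# and the reconstruction error of the rank-k approximation.
print((s[:k] ** 2).sum() / (s ** 2).sum())
print(np.linalg.norm(M - M_approx))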