IAI&ML UNIT-IV
UNIT– IV: Basic Methods in Supervised Learning: Distance-based methods, Nearest-Neighbors,
Decision Trees, Support Vector Machines, Nonlinearity and Kernel Methods. Unsupervised Learning:
Clustering, K-means, Dimensionality Reduction, PCA and kernel.
Distance-based Methods
Most clustering approaches use distance measures to assess the similarities or differences between a pair of objects. The most popular distance measures used are:
1. Euclidean Distance
2. Manhattan Distance
3. Jaccard Index
4. Minkowski distance
5. Cosine Index
1. Euclidean Distance:
Euclidean distance is considered the traditional metric for geometric problems. It can be simply explained as the ordinary straight-line distance between two points and is one of the most commonly used distance measures in cluster analysis; K-means, for example, uses it. Mathematically, it is the square root of the sum of the squared differences between the coordinates of the two objects: for points (x1, y1) and (x2, y2),
d = sqrt((x1 − x2)² + (y1 − y2)²)
Figure – Euclidean Distance
2. Manhattan Distance:
This determines the absolute difference between the pairs of coordinates.
Suppose we have two points P and Q. To determine the distance between these points, we simply add up the absolute differences of their coordinates along the X-axis and the Y-axis. In a plane with P at coordinates (x1, y1) and Q at (x2, y2):
Manhattan distance between P and Q = |x1 – x2| + |y1 – y2|
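To make this concrete, a minimal Python sketch (assuming NumPy is available; the points P and Q are hypothetical 2-D points) that computes both the Euclidean and the Manhattan distance:

import numpy as np

P = np.array([1.0, 2.0])   # hypothetical point P at (x1, y1)
Q = np.array([4.0, 6.0])   # hypothetical point Q at (x2, y2)

# Euclidean distance: square root of the sum of squared coordinate differences
euclidean = np.sqrt(np.sum((P - Q) ** 2))   # 5.0 for these points

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.sum(np.abs(P - Q))           # |1-4| + |2-6| = 7

print(euclidean, manhattan)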
3. Jaccard Index:
The Jaccard index measures the similarity of two sets of data items as the size of their intersection divided by the size of their union (the Jaccard distance is 1 minus this value):
J(A, B) = |A ∩ B| / |A ∪ B|
Example: Note that we need to transform the data into binary form before applying the Jaccard index. Let's consider that Store 1 and Store 2 sell the items below, where each item is considered as an element.
We can observe that bread, jam, coke and cake are sold by both stores, hence each of these items is assigned a 1 for both stores.
The Jaccard index value ranges from 0 to 1; the higher the Jaccard index, the higher the similarity.
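As a hedged illustration of the store example, a short Python sketch (the item lists are hypothetical, chosen only so that bread, jam, coke and cake are common to both stores):

store1 = {"bread", "jam", "coke", "cake", "milk"}      # items sold by Store 1 (hypothetical)
store2 = {"bread", "jam", "coke", "cake", "butter"}    # items sold by Store 2 (hypothetical)

# Jaccard index = size of intersection / size of union
jaccard_index = len(store1 & store2) / len(store1 | store2)   # 4 / 6 ≈ 0.67
jaccard_distance = 1 - jaccard_index                          # dissimilarity of the two stores

print(jaccard_index, jaccard_distance)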
4. Minkowski distance:
It is the generalized form of the Euclidean and Manhattan distance measures. In an N-dimensional space, a point is represented as (x1, x2, ..., xN).
Consider two points P1 and P2:
P1: (X1, X2, ..., XN)
P2: (Y1, Y2, ..., YN)
The Minkowski distance of order p between them is
D(P1, P2) = (Σ |Xi − Yi|^p)^(1/p), with the sum taken over i = 1 to N,
where p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.
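A small pure-Python sketch of the generalized formula (the points and the order p are hypothetical):

def minkowski(p1, p2, p=2):
    # (sum over i of |Xi - Yi|^p) ** (1/p); p=1 gives Manhattan, p=2 gives Euclidean
    return sum(abs(x - y) ** p for x, y in zip(p1, p2)) ** (1.0 / p)

P1 = (1, 2, 3)
P2 = (4, 6, 8)
print(minkowski(P1, P2, p=1))   # Manhattan: 3 + 4 + 5 = 12
print(minkowski(P1, P2, p=2))   # Euclidean: sqrt(9 + 16 + 25) ≈ 7.07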
5. Cosine Index:
The cosine measure for clustering determines the cosine of the angle between two vectors, given by the following formula:
cos(θ) = (A · B) / (||A|| ||B||)
Here θ gives the angle between the two vectors, and A, B are n-dimensional vectors.
Cosine similarity values range between -1 and 1; the lower the cosine similarity, the lower the similarity between the two observations.
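A minimal sketch of the cosine similarity between two hypothetical customer vectors (assuming NumPy; the purchase counts are made up for illustration):

import numpy as np

A = np.array([3, 2, 0, 5])   # hypothetical purchase counts for Customer 1
B = np.array([1, 0, 0, 4])   # hypothetical purchase counts for Customer 2

# cos(theta) = (A . B) / (||A|| * ||B||)
cosine_similarity = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(cosine_similarity)     # a value close to 1 means the two customers are very similar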
Decision Tree Classification Algorithm
o Decision Tree is a Supervised learning technique that can be used for both classification and Regression
problems, but mostly it is preferred for solving Classification problems. It is a tree-structured classifier,
where internal nodes represent the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision based on given
conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further
branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree
algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o Below diagram explains the general structure of a decision tree:
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?
There are various algorithms in machine learning, so choosing the best algorithm for the given dataset and problem is the main point to remember while creating a machine learning model. Below are the two reasons for using the Decision tree:
o Decision Trees usually mimic human thinking ability while making a decision, so it is easy to understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like structure.
Decision Tree Terminologies
Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be segregated further after reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the
given conditions.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and other nodes are called the
child nodes.
How does the Decision Tree algorithm work?
In a decision tree, for predicting the class of the given dataset, the algorithm starts from the root node of the tree. The algorithm compares the value of the root attribute with the corresponding attribute of the record (real dataset) and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain possible values for the best attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final node is called a leaf node.
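As a hedged illustration of these steps, a minimal scikit-learn sketch (assuming scikit-learn is installed; the toy dataset and feature names are hypothetical) that builds a CART-style tree using information gain:

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical records: [salary in lakhs, distance to office in km]; label 1 = accept offer, 0 = decline
X = [[10, 5], [12, 30], [4, 5], [15, 10], [5, 25], [11, 8]]
y = [1, 0, 0, 1, 0, 1]

# criterion="entropy" uses information gain; criterion="gini" would use the Gini index
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Inspect the learned decision rules and classify a new candidate's offer
print(export_text(tree, feature_names=["salary", "distance"]))
print(tree.predict([[13, 12]]))   # predicted class for a new record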
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
Attribute Selection Measures
While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems, there is a technique called the Attribute Selection Measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:
Information Gain
Gini Index
1. Information Gain:
o Information gain is the measurement of changes in entropy after the segmentation of a dataset based on an
attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision tree.
o A decision tree algorithm always tries to maximize the value of information gain, and a node/attribute
having the highest information gain is split first. It can be calculated using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in the data. Entropy can be calculated as:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
where S is the total number of samples, P(yes) is the probability of yes and P(no) is the probability of no.
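A small sketch, under the two-class assumption above, that computes the entropy of a node and the information gain of a hypothetical binary split (the sample counts are made up):

from math import log2

def entropy(p_yes):
    # Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no); a pure node has entropy 0
    p_no = 1 - p_yes
    return sum(-p * log2(p) for p in (p_yes, p_no) if p > 0)

# Hypothetical parent node: 9 "yes" and 5 "no" samples
parent = entropy(9 / 14)

# Hypothetical split into two child nodes of sizes 8 and 6
left, right = entropy(6 / 8), entropy(3 / 6)
weighted_avg = (8 / 14) * left + (6 / 14) * right

information_gain = parent - weighted_avg
print(round(information_gain, 3))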
2. Gini Index:
o Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create binary splits.
o Gini index can be calculated using the below formula:
Gini Index = 1 − Σj (Pj)²
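A matching sketch for the Gini index of a node using the formula above (the class counts are hypothetical):

def gini(counts):
    # Gini Index = 1 - sum over j of (Pj)^2, where Pj is the proportion of class j in the node
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([9, 5]))    # mixed node -> about 0.459
print(gini([14, 0]))   # pure node  -> 0.0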
Pruning: Getting an Optimal Decision Tree
Pruning is a process of deleting the unnecessary nodes from a tree in order to get the optimal decision tree. A too-large tree increases the risk of overfitting, and a small tree may not capture all the important features of the dataset. Therefore, a technique that decreases the size of the learning tree without reducing accuracy is known as pruning. There are mainly two types of tree pruning techniques used:
o Cost Complexity Pruning
o Reduced Error Pruning
Disadvantages of the Decision Tree:
o It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
o For more class labels, the computational complexity of the decision tree may increase.
Support Vector Machine (SVM) Algorithm
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we have used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn about the different features of cats and dogs, and then we test it with this strange creature. As the support vector machine creates a decision boundary between these two classes (cat and dog) and chooses extreme cases (support vectors), it will see the extreme cases of cat and dog. On the basis of the support vectors, it will classify it as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional
space, but we need to find out the best decision boundary that helps to classify the data points. This best
boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features (as shown in the image), then the hyperplane will be a straight line, and if there are 3 features, then the hyperplane will be a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, which means the maximum distance between the hyperplane and the nearest data points of either class.
Support Vectors: The data points or vectors that are closest to the hyperplane and affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that
has two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue. Consider the below image:
Since it is a 2-D space, we can easily separate these two classes just by using a straight line. But there can be multiple lines that can separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin.
And the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called
the optimal hyperplane.
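A minimal hedged sketch of a linear SVM on such two-feature data, using scikit-learn's SVC (the toy points are hypothetical):

import numpy as np
from sklearn.svm import SVC

# Hypothetical (x1, x2) points with two tags: 0 = blue, 1 = green
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel searches for the maximum-margin straight line (hyperplane)
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)           # the extreme points that define the margin
print(clf.predict([[3, 2], [7, 6]]))  # classify two new points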
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we
cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used two
dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way. Consider the below image:
Since we are in 3-D space, it looks like a plane parallel to the x-axis. If we convert it into 2-D space with z = 1, it will become:
Here we can see a hyperplane separating the green dots from the blue ones. A hyperplane has one dimension less than the ambient space. For example, in the above figure we have two dimensions representing the ambient space, but the line that divides or classifies the space has one dimension less than the ambient space and is called a hyperplane.
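A hedged sketch of this idea: hypothetical ring-shaped data is made separable either by manually adding the third feature z = x² + y² and using a linear SVM, or by letting a non-linear (RBF) kernel do the lifting implicitly:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical 2-D data: class 1 lies inside a circle of radius 1, class 0 outside it
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)

# Option 1: explicit third dimension z = x^2 + y^2, then a linear SVM in 3-D
X3 = np.column_stack([X, X[:, 0] ** 2 + X[:, 1] ** 2])
linear_3d = SVC(kernel="linear").fit(X3, y)

# Option 2: keep the 2-D data and use a non-linear (RBF/Gaussian) kernel directly
rbf_2d = SVC(kernel="rbf", gamma=1.0).fit(X, y)

print(linear_3d.score(X3, y), rbf_2d.score(X, y))   # both should fit this data well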
Nonlinearity and Kernel Methods
It is very difficult to solve such a classification problem with a linear classifier, since no single straight line can separate the red and the green dots when the points are distributed like this. Here comes the use of the kernel function, which takes the points to a higher dimension, solves the problem there, and returns the output. Think of it this way: we can see that the green dots are enclosed within some perimeter while the red ones lie outside it; likewise, there could be other scenarios where the green dots are distributed within a trapezoid-shaped area.
So what we do is convert the two-dimensional plane, which was first classified by a one-dimensional hyperplane ("or a straight line"), into a three-dimensional space, and here our classifier, i.e., the hyperplane, will not be a straight line but a two-dimensional plane which will cut the area.
In order to get a mathematical understanding of kernels, let us look at Lili Jiang's equation of a kernel, which is:
K(x, y) = <f(x), f(y)>
where K is the kernel function, x and y are n-dimensional inputs, f is a map from n-dimensional space to m-dimensional space, and <a, b> denotes the dot product.
Example:
Let us say that we have two points, x = (2, 3, 4) and y = (3, 4, 5), and f maps a point (a, b, c) to the nine pairwise products (a·a, a·b, a·c, b·a, b·b, b·c, c·a, c·b, c·c).
so,
f(2, 3, 4) = (4, 6, 8, 6, 9, 12, 8, 12, 16) and
f(3, 4, 5) = (9, 12, 15, 12, 16, 20, 15, 20, 25)
so the dot product,
f(x) · f(y) = f(2, 3, 4) · f(3, 4, 5)
= (36 + 72 + 120 + 72 + 144 + 240 + 120 + 240 + 400)
= 1444
And,
K(x, y) = (2×3 + 3×4 + 4×5)² = (6 + 12 + 20)² = 38 × 38 = 1444.
As we find out, f(x) · f(y) and K(x, y) give us the same result, but the former method required a lot of calculation (because of projecting 3 dimensions into 9 dimensions), while using the kernel it was much easier.
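The same arithmetic can be checked with a few lines of NumPy (f is the explicit 9-dimensional map described above, and K is the squared dot product):

import numpy as np

def f(v):
    # Explicit map to 9 dimensions: all pairwise products v_i * v_j
    return np.outer(v, v).ravel()

def K(a, b):
    # Kernel: squared dot product, computed directly in 3 dimensions
    return np.dot(a, b) ** 2

x = np.array([2, 3, 4])
y = np.array([3, 4, 5])

print(np.dot(f(x), f(y)))   # 1444, via the 9-dimensional projection
print(K(x, y))              # 1444, with far less work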
Major Kernel Functions
1. Linear Kernel
Let us say that we have two vectors named x1 and x2; then the linear kernel is defined by the dot product of these two vectors:
K(x1, x2) = x1 · x2
2. Polynomial Kernel
A polynomial kernel is defined by the following equation:
K(x1, x2) = (x1 · x2 + 1)^d
where d is the degree of the polynomial.
3. Gaussian Kernel
This kernel is an example of a radial basis function (RBF) kernel. Below is the equation for it:
K(x1, x2) = exp(−||x1 − x2||² / (2σ²))
The given sigma plays a very important role in the performance of the Gaussian kernel and should neither be overestimated nor underestimated; it should be carefully tuned according to the problem.
4. Exponential Kernel
This is in close relation with the previous kernel, i.e., the Gaussian kernel, with the only difference being that the square of the norm is removed:
K(x1, x2) = exp(−||x1 − x2|| / (2σ²))
5. Laplacian Kernel
This type of kernel is less prone to variations and is essentially equivalent to the previously discussed exponential kernel; the equation of the Laplacian kernel is given as:
K(x1, x2) = exp(−||x1 − x2|| / σ)
This kernel is very widely used and popular with support vector machines.
There are many more types of kernel functions; we have discussed the most commonly used ones. The type of problem purely decides which kernel function should be used.
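For reference, a small sketch defining these kernels as Python functions (sigma and the degree d are hypothetical tuning parameters, to be adjusted per problem):

import numpy as np

def linear_kernel(x1, x2):
    return np.dot(x1, x2)

def polynomial_kernel(x1, x2, d=2):
    return (np.dot(x1, x2) + 1) ** d

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))

def exponential_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2) / (2 * sigma ** 2))

def laplacian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2) / sigma)

a, b = np.array([2.0, 3.0, 4.0]), np.array([3.0, 4.0, 5.0])
print(linear_kernel(a, b), polynomial_kernel(a, b), gaussian_kernel(a, b))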
Unsupervised Learning:
Clustering
Clustering or cluster analysis is a machine learning technique, which groups the unlabelled dataset. It can be
defined as "A way of grouping the data points into different clusters, consisting of similar data
points. The objects with the possible similarities remain in a group that has less or no similarities with
another group."
It does it by finding some similar patterns in the unlabelled dataset such as shape, size, color, behavior, etc.,
and divides them as per the presence and absence of those similar patterns.
It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it deals with
the unlabeled dataset.
After applying this clustering technique, each cluster or group is provided with a cluster-ID. An ML system can use this ID to simplify the processing of large and complex datasets.
Note: Clustering is somewhere similar to the classification algorithm, but the difference is the type of
dataset that we are using. In classification, we work with the labeled data set, whereas in clustering,
we work with the unlabelled dataset.
Example: Let's understand the clustering technique with the real-world example of a shopping mall. When we visit any shopping mall, we can observe that things with similar usage are grouped together: the t-shirts are grouped in one section and the trousers in another, and similarly, in the vegetable and fruit section, apples, bananas, mangoes, etc., are grouped separately so that we can easily find what we are looking for. The clustering technique works in the same way. Another example of clustering is grouping documents according to their topic.
The clustering technique can be widely used in various tasks. Some most common uses of this technique
are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, it is used by Amazon in its recommendation system to provide recommendations based on the past search of products. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.
The below diagram explains the working of the clustering algorithm. We can see the different fruits are
divided into several groups with similar properties.
Types of Clustering Methods
The clustering methods are broadly divided into Hard clustering (a data point belongs to only one group) and Soft clustering (a data point can also belong to another group). But various other approaches to clustering also exist. Below are the main clustering methods used in machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
1. Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the centroid-
based method. The most common example of partitioning clustering is the K-Means Clustering
algorithm.
In this type, the dataset is divided into a set of k groups, where K is used to define the number of pre-defined
groups. The cluster center is created in such a way that the distance between the data points of one cluster is
minimum as compared to another cluster centroid.
2. Density-Based Clustering
The density-based clustering method connects the highly-dense areas into clusters, and the arbitrarily shaped
distributions are formed as long as the dense region can be connected. This algorithm does it by identifying
different clusters in the dataset and connects the areas of high densities into clusters. The dense areas in data
space are divided from each other by sparser areas.
These algorithms can face difficulty in clustering the data points if the dataset has varying densities and high
dimensions.
3. Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the probability of how a dataset belongs to a particular distribution, assuming some distribution, most commonly the Gaussian distribution. The example of this type is the Expectation-Maximization Clustering algorithm that uses Gaussian Mixture Models (GMM).
4. Hierarchical Clustering
Hierarchical clustering can be used as an alternative for the partitioned clustering as there is no requirement
of pre-specifying the number of clusters to be created. In this technique, the dataset is divided into clusters
to create a tree-like structure, which is also called a dendrogram. The observations or any number of
clusters can be selected by cutting the tree at the correct level. The most common example of this method is
the Agglomerative Hierarchical algorithm.
5. Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more than one group or cluster. Each data point has a set of membership coefficients, which depend on its degree of membership in each cluster. The Fuzzy C-means algorithm is the example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.
Clustering Algorithms
The clustering algorithms can be divided based on the models explained above. There are different types of clustering algorithms published, but only a few are commonly used. The choice of clustering algorithm depends on the kind of data we are using: some algorithms need to guess the number of clusters in the given dataset, whereas others need to find the minimum distance between the observations of the dataset.
Here we are discussing mainly popular Clustering algorithms that are widely used in machine learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It classifies the dataset
by dividing the samples into different clusters of equal variances. The number of clusters must be specified in this
algorithm. It is fast with fewer computations required, with the linear complexity of O(n).
2. Mean-shift algorithm: Mean-shift algorithm tries to find the dense areas in the smooth density of data points. It is an
example of a centroid-based model, that works on updating the candidates for centroid to be the center of the points
within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications with Noise. It is an example
of a density-based model similar to the mean-shift, but with some remarkable advantages. In this algorithm, the areas
of high density are separated by the areas of low density. Because of this, the clusters can be found in any arbitrary
shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an alternative to the k-means algorithm or for those cases where k-means may fail. In GMM, it is assumed that the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical algorithm performs the bottom-up
hierarchical clustering. In this, each data point is treated as a single cluster at the outset and then successively merged.
The cluster hierarchy can be represented as a tree-structure.
6. Affinity Propagation: It is different from other clustering algorithms as it does not require specifying the number of clusters. In this algorithm, each pair of data points exchanges messages until convergence. It has O(N²T) time complexity, which is the main drawback of this algorithm.
Applications of Clustering
Below are some commonly known applications of clustering technique in Machine Learning:
o In Identification of Cancer Cells: The clustering algorithms are widely used for the identification of cancerous cells.
It divides the cancerous and non-cancerous data sets into different groups.
o In Search Engines: Search engines also work on the clustering technique. The search result appears based on the
closest object to the search query. It does it by grouping similar data objects in one group that is far from the other
dissimilar objects. The accurate result of a query depends on the quality of the clustering algorithm used.
o Customer Segmentation: It is used in market research to segment the customers based on their choice and
preferences.
o In Biology: It is used in the biology stream to classify different species of plants and animals using the image
recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This can be very useful for determining the purpose for which a particular area of land is most suitable.
K-means Algorithm
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into
different clusters. Here K defines the number of pre-defined clusters that need to be created in the process,
as if K=2, there will be two clusters, and for K=3, there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group with similar properties.
It allows us to cluster the data into different groups and provides a convenient way to discover the categories of groups in the unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm.
The k-means algorithm mainly performs two tasks:
o Determines the best value for the K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. The data points which are near to a particular k-center create a cluster.
Hence each cluster has datapoints with some commonalities, and it is away from other clusters.
The below diagram explains the working of the K-means Clustering Algorithm. The algorithm works in the following steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those of the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, go to Step-4; otherwise, go to FINISH.
Step-7: The model is ready.
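A minimal hedged sketch of these steps using scikit-learn's KMeans (assuming scikit-learn; the two-variable data M1, M2 is hypothetical):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical dataset with two variables M1 and M2
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.5, 7.5], [9.0, 8.2]])

# K is predetermined; the assignment and centroid-update steps are iterated internally
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster-ID assigned to each data point
print(kmeans.cluster_centers_)  # final centroids of the two clusters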
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables is given below:
Let's take number k of clusters, i.e., K=2, to identify the dataset and to put them into different clusters. It
means here we will try to group these datasets into two different clusters.
We need to choose some random k points or centroids to form the clusters. These points can be either points from the dataset or any other points. Here we are selecting the below two points as k points, which are not part of our dataset. Consider the below image:
Now we will assign each data point of the scatter plot to its closest K-point or centroid. We will compute
it by applying some mathematics that we have studied to calculate the distance between two points. So, we
will draw a median between both the centroids. Consider the below image:
From the above image, it is clear that the points on the left side of the line are near the K1 or blue centroid, and the points to the right of the line are close to the yellow centroid. Let's color them blue and yellow for clear visualization.
As we need to find the closest cluster, we will repeat the process by choosing new centroids. To choose the new centroids, we will compute the center of gravity of the points in each cluster, and will find the new centroids as below:
Next, we will reassign each data point to the new centroids. For this, we will repeat the same process of finding a median line. The median will be like the below image:
From the above image, we can see that one yellow point is on the left side of the line, and two blue points are to the right of the line. So, these three points will be assigned to new centroids.
As reassignment has taken place, we will again go to Step-4, which is finding new centroids or K-points. We will repeat the process by finding the center of gravity of the clusters, so the new centroids will be as shown in the below image:
o As we have the new centroids, we will again draw the median line and reassign the data points. So, the image will be:
o We can see in the above image that there are no dissimilar data points on either side of the line, which means our model is formed. Consider the below image:
As our model is ready, so we can now remove the assumed centroids, and the two final clusters will be as
shown in the below image:
How to choose the value of "K number of clusters" in K-means Clustering?
The performance of the K-means clustering algorithm depends upon highly efficient clusters that it forms.
But choosing the optimal number of clusters is a big task. There are some different ways to find the optimal
number of clusters, but here we are discussing the most appropriate method to find the number of clusters or
value of K. The method is given below:
Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters. This method uses
the concept of WCSS value. WCSS stands for Within Cluster Sum of Squares, which defines the total
variations within a cluster. The formula to calculate the value of WCSS (for 3 clusters) is given below:
WCSS = ∑Pi in Cluster1 distance(Pi, C1)² + ∑Pi in Cluster2 distance(Pi, C2)² + ∑Pi in Cluster3 distance(Pi, C3)²
∑Pi in Cluster1 distance(Pi, C1)² is the sum of the squares of the distances between each data point and its centroid within cluster 1, and the same applies for the other two terms.
To measure the distance between data points and centroid, we can use any method such as Euclidean
distance or Manhattan distance.
To find the optimal value of clusters, the elbow method follows the below steps:
o It executes the K-means clustering on a given dataset for different K values (ranges from 1-10).
o For each value of K, calculates the WCSS value.
o Plots a curve between calculated WCSS values and the number of clusters K.
o The sharp point of bend, at which the plot looks like an arm, is considered the best value of K.
Since the graph shows the sharp bend, which looks like an elbow, hence it is known as the elbow method.
The graph for the elbow method looks like the below image:
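Since the referenced graph is not reproduced here, a hedged sketch that computes the WCSS (scikit-learn exposes it as inertia_) for K = 1 to 10 and plots the elbow curve (assuming scikit-learn and matplotlib; the blob data is synthetic):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with a few natural groups
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)    # within-cluster sum of squares for this K

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.show()   # the sharp bend (elbow) in this curve suggests the best K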
Dimensionality Reduction
The number of input features, variables, or columns present in a given dataset is known as dimensionality,
and the process to reduce these features is called dimensionality reduction.
In many cases, a dataset contains a huge number of input features, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.
Dimensionality reduction technique can be defined as, "It is a way of converting the higher dimensions
dataset into lesser dimensions dataset ensuring that it provides similar information." These techniques
are widely used in machine learning for obtaining a better fit predictive model while solving the
classification and regression problems.
It is commonly used in the fields that deal with high-dimensional data, such as speech recognition, signal
processing, bioinformatics, etc. It can also be used for data visualization, noise reduction, cluster
analysis, etc.
Benefits of applying Dimensionality Reduction
Some benefits of applying dimensionality reduction technique to the given dataset are given below:
o By reducing the dimensions of the features, the space required to store the dataset also gets reduced.
o Less computation/training time is required for reduced dimensions of features.
o Reduced dimensions of features of the dataset help in visualizing the data quickly.
o It removes the redundant features (if present) by taking care of multicollinearity.
There are also some disadvantages of applying dimensionality reduction, which are given below:
o Some information may be lost when the dimensions are reduced.
o In the PCA technique, the number of principal components to keep is sometimes not known in advance.
Feature Selection
Feature selection is the process of selecting the subset of the relevant features and leaving out the
irrelevant features present in a dataset to build a model of high accuracy. In other words, it is a way of
selecting the optimal features from the input dataset.
1. Filters Methods
In this method, the dataset is filtered, and a subset that contains only the relevant features is taken. Some
common techniques of filters method are:
o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
2. Wrappers Methods
The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation. In this method, some features are fed to the ML model and its performance is evaluated. The performance decides whether to add or remove those features to increase the accuracy of the model. This method is more accurate than the filter method but more complex to work with. Some common techniques of wrapper methods are:
o Forward Selection
o Backward Selection
o Bi-directional Elimination
3. Embedded Methods: Embedded methods check the different training iterations of the machine
learning model and evaluate the importance of each feature. Some common techniques of Embedded
methods are:
o LASSO
o Elastic Net
o Ridge Regression, etc.
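As a hedged illustration of an embedded method, LASSO shrinks the coefficients of unimportant features toward zero, so the non-zero coefficients indicate the selected features (assuming scikit-learn; the regression data is synthetic):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# Synthetic regression data in which only 3 of the 10 features are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)

# Features whose coefficients were not shrunk to zero are kept
selected = np.flatnonzero(lasso.coef_)
print(selected)       # indices of the selected features
print(lasso.coef_)    # near-zero or zero coefficients mark the dropped features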
Feature Extraction:
Feature extraction is the process of transforming the space containing many dimensions into space with
fewer dimensions. This approach is useful when we want to keep the whole information but use fewer
resources while processing the information.
PCA and kernel
Principal Component Analysis is an unsupervised learning algorithm that is used for the dimensionality
reduction in machine learning. It is a statistical process that converts the observations of correlated
features into a set of linearly uncorrelated features with the help of orthogonal transformation. These
new transformed features are called the Principal Components. It is one of the popular tools that is
used for exploratory data analysis and predictive modeling. It is a technique to draw strong patterns from
the given dataset by reducing the variances.
PCA generally tries to find the lower-dimensional surface onto which to project the high-dimensional data.
PCA works by considering the variance of each attribute, because a high variance shows a good split between the classes, and thereby it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.
o Dimensionality: It is the number of features or variables present in the given dataset. More easily, it is the
number of columns present in the dataset.
o Correlation: It signifies that how strongly two variables are related to each other. Such as if one changes, the
other variable also gets changed. The correlation value ranges from -1 to +1. Here, -1 occurs if variables are
inversely proportional to each other, and +1 indicates that variables are directly proportional to each other.
o Orthogonal: It defines that variables are not correlated to each other, and hence the correlation between the pair
of variables is zero.
o Eigenvectors: If there is a square matrix M and a non-zero vector v, then v is an eigenvector of M if Mv is a scalar multiple of v.
o Covariance Matrix: A matrix containing the covariance between the pair of variables is called the Covariance
Matrix.
Some properties of the principal components are:
o The principal component must be the linear combination of the original features.
o These components are orthogonal, i.e., the correlation between a pair of variables is zero.
o The importance of each component decreases when going from 1 to n: the 1st principal component has the most importance, and the nth has the least importance.
Steps for the PCA Algorithm
Firstly, we need to take the input dataset and divide it into two subparts X and Y, where X is the training set and Y is the validation set.
Now we will represent our dataset into a structure. Such as we will represent the two-dimensional matrix of
independent variable X. Here each row corresponds to the data items, and the column corresponds to the Features.
The number of columns is the dimensions of the dataset.
In this step, we will standardize our dataset. Such as in a particular column, the features with high variance are
more important compared to the features with lower variance. If the importance of features is independent of the
variance of the feature, then we will divide each data item in a column with the standard deviation of the column.
Here we will name the matrix as Z.
To calculate the covariance of Z, we will take the matrix Z, and will transpose it. After transpose, we will
multiply it by Z. The output matrix will be the Covariance matrix of Z.
Now we need to calculate the eigenvalues and eigenvectors for the resultant covariance matrix Z. Eigenvectors of the covariance matrix are the directions of the axes with high information, and the coefficients of these eigenvectors are defined as the eigenvalues.
In this step, we will take all the eigenvalues and sort them in decreasing order, i.e., from largest to smallest, and simultaneously sort the eigenvectors accordingly into a matrix P. The resultant sorted matrix of eigenvectors will be named P*.
Here we will calculate the new features. To do this, we will multiply the P* matrix to the Z. In the resultant matrix
Z*, each observation is the linear combination of original features. Each column of the Z* matrix is independent
of each other.
Now that the new feature set has been obtained, we decide what to keep and what to remove: we will only keep the relevant or important features in the new dataset, and the unimportant features will be removed.
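A hedged sketch of these steps with plain NumPy (the data matrix X is hypothetical, and k is the number of principal components kept):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # hypothetical dataset: 100 rows, 5 features

# Standardize each column to get Z, then form the covariance matrix of Z
Z = (X - X.mean(axis=0)) / X.std(axis=0)
cov = np.cov(Z, rowvar=False)

# Eigenvalues/eigenvectors of the covariance matrix, sorted by decreasing eigenvalue
eig_vals, eig_vecs = np.linalg.eigh(cov)
order = np.argsort(eig_vals)[::-1]
P_star = eig_vecs[:, order]

# Project the standardized data onto the sorted eigenvectors and keep the top k components
k = 2
Z_star = Z @ P_star[:, :k]
print(Z_star.shape)                    # (100, 2): the reduced feature set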