
UNIT-5

DISTANCE MEASURES:
Minkowski distance is primarily used in machine learning and data science, particularly in algorithms
that require measuring the similarity or dissimilarity between data points, such as K-Nearest Neighbors
(KNN), clustering algorithms (e.g., K-Means), and other classification tasks. For two points a and b it is
defined as d(a, b) = (Σ |a_i − b_i|^p)^(1/p); with p = 1 it reduces to the Manhattan distance and with
p = 2 to the Euclidean distance.
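
As an illustration (not part of the original notes), here is a minimal Python sketch of the Minkowski distance; the points and the values of p are made up for the example.

# Minkowski distance: d(a, b) = (sum_i |a_i - b_i|^p)^(1/p)
# p = 1 gives the Manhattan distance, p = 2 gives the Euclidean distance.
def minkowski_distance(a, b, p=2):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

point_a = [1.0, 2.0]
point_b = [4.0, 6.0]
print(minkowski_distance(point_a, point_b, p=1))  # 7.0 (Manhattan)
print(minkowski_distance(point_a, point_b, p=2))  # 5.0 (Euclidean)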
K-Nearest Neighbor(KNN) Algorithm for Machine
Learning
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases and
puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a well-suited
category by using the K-NN algorithm.
o K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
o It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead it stores the dataset, and at the time of classification it performs an
action on the dataset.
o At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it
classifies that data into the category that is most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification, we can use the KNN
algorithm, as it works on a similarity measure. Our KNN model will find the features of the new
image that are similar to the cat and dog images and, based on the most similar features, will
put it in either the cat or the dog category.

Why do we need a K-NN Algorithm?


Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1.
To which of these categories does this data point belong? To solve this type of problem, we need a
K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular
data point. Consider the below diagram:
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of the neighbors


o Step-2: Calculate the Euclidean distance between the new data point and the existing data points.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready.

o Firstly, we will choose the number of neighbors, so we will choose k=5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is
the distance between two points, which we have already studied in geometry. For two points
A(x1, y1) and B(x2, y2), it can be calculated as d(A, B) = sqrt((x2 − x1)^2 + (y2 − y1)^2).
o By calculating the Euclidean distance, we get the nearest neighbors: three nearest neighbors in
category A and two nearest neighbors in category B. Consider the below image:

o As we can see the 3 nearest neighbors are from category A, hence this new
data point must belong to category A.
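
As an illustration (not part of the original notes), here is a minimal scikit-learn sketch of the same idea; the two-dimensional points and their category labels are made up.

from sklearn.neighbors import KNeighborsClassifier

# Made-up training data: points labelled category A or category B.
X_train = [[1, 1], [1, 2], [2, 1], [2, 2], [6, 5], [7, 7], [8, 6], [7, 5]]
y_train = ["A", "A", "A", "A", "B", "B", "B", "B"]

# K = 5 neighbours, Euclidean distance, as in the walkthrough above.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)

# The new point is assigned to the category that wins the majority among its 5 nearest neighbours.
print(knn.predict([[3, 2]]))  # ['A']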
How to select the value of K in the K-NN Algorithm?
o There is no particular way to determine the best value for "K", so we need to
try some values to find the best out of them. The most preferred value for K
is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and lead to the effects of outliers in
the model.
o Large values for K smooth out noise, but a value that is too large may include points from other
categories and blur the class boundaries. A common approach is to compare the accuracy of several
K values, as in the sketch below.
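
As an illustration (not part of the original notes), here is a minimal Python sketch of that search, assuming scikit-learn is available; the Iris dataset is used only as a stand-in for your own data.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset; replace X, y with your own feature matrix and labels.
X, y = load_iris(return_X_y=True)

# Try K = 1..15 and keep the value with the best 5-fold cross-validated accuracy.
scores = {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))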

Advantages of KNN Algorithm:


o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:


o The value of K always needs to be determined, which may be complex at times.
o The computation cost is high because the distance to all the training samples must be calculated
for each new data point.
DISTANCE-BASED CLUSTERING:
Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group
unlabeled datasets into clusters; it is also known as hierarchical cluster analysis (HCA).

In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this
tree-shaped structure is known as the dendrogram.

Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two
differ in how they work: in hierarchical clustering there is no requirement to predetermine the number
of clusters, as we did in the K-Means algorithm.

The hierarchical clustering technique has two approaches:


Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm
starts with taking all data points as single clusters and merging them until one
cluster is left.

Divisive: The divisive algorithm is the reverse of the agglomerative algorithm, as it is a top-down
approach.

Why hierarchical clustering?

As we already have other clustering algorithms such as K-Means Clustering, why do we need hierarchical
clustering? As we have seen, K-means clustering has some challenges: it requires a predetermined number
of clusters, and it always tries to create clusters of the same size. To solve these two challenges, we
can opt for the hierarchical clustering algorithm, because in this algorithm we don't need prior
knowledge of the number of clusters.

Agglomerative Hierarchical clustering

The agglomerative hierarchical clustering algorithm is a popular example of HCA. To group the data
points into clusters, it follows the bottom-up approach. This means the algorithm considers each data
point as a single cluster at the beginning, and then starts combining the closest pairs of clusters.
It does this until all the clusters are merged into a single cluster that contains all the data points.

This hierarchy of clusters is represented in the form of the dendrogram.

How does Agglomerative Hierarchical Clustering work?

The working of the AHC algorithm can be explained using the below steps:
Step-1: Treat each data point as a single cluster. Let's say there are N data points,
so the number of clusters will also be N.

Step-2: Take two closest data points or clusters and merge them to form one
cluster. So, there will now be N-1 clusters.

Step-3: Again, take the two closest clusters and merge them together to form one
cluster. There will be N-2 clusters.

Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following
clusters. Consider the below images:
Step-5: Once all the clusters are combined into one big cluster, develop the
dendrogram to divide the clusters as per the problem.
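
As an illustration (not part of the original notes), here is a minimal Python sketch that builds the dendrogram described in Step-5 using SciPy; the six two-dimensional points are made up.

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Made-up 2-D data points.
X = [[1, 1], [1, 2], [2, 2], [8, 8], [8, 9], [9, 9]]

# linkage() records the merge history: which clusters were joined and at what distance.
Z = linkage(X, method="ward")

# dendrogram() draws the tree; cutting it at a chosen height divides the data into clusters.
dendrogram(Z)
plt.title("Agglomerative clustering dendrogram")
plt.show()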

Measure for the distance between two clusters

As we have seen, the closest distance between the two clusters is crucial for the
hierarchical clustering. There are various ways to calculate the distance between
two clusters, and these ways decide the rule for clustering. These measures are
called Linkage methods. Some of the popular linkage methods are given below:

Single Linkage: It is the Shortest Distance between the closest points of the
clusters. Consider the below image:

Complete Linkage: It is the farthest distance between the two points of two different
clusters. It is one of the popular linkage methods as it forms tighter clusters than
single-linkage.

Average Linkage: It is the linkage method in which the distance between every pair of points, one from
each cluster, is added up and then divided by the total number of such pairs to calculate the average
distance between the two clusters. It is also one of the most popular linkage methods.

Centroid Linkage: It is the linkage method in which the distance between the
centroid of the clusters is calculated. Consider the below image:

From the above-given approaches, we can apply any of them according to the type
of problem or business requirement.
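
As an illustration (not part of the original notes), here is a minimal scikit-learn sketch comparing linkage methods; note that AgglomerativeClustering supports single, complete, average and ward linkage, while centroid linkage is available in scipy.cluster.hierarchy. The data points are made up.

from sklearn.cluster import AgglomerativeClustering

# Made-up 2-D data points forming two obvious groups.
X = [[1, 1], [1, 2], [2, 2], [8, 8], [8, 9], [9, 9]]

for method in ("single", "complete", "average", "ward"):
    model = AgglomerativeClustering(n_clusters=2, linkage=method)
    labels = model.fit_predict(X)
    print(method, labels)  # e.g. [0 0 0 1 1 1] (cluster numbering may be swapped)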
EXAMPLE:
From the above calculation:
P(Yes | Sunny, Hot) < P(No | Sunny, Hot)
27% < 73%
Hence we can conclude 'not to play', as the probability of No is 73%.

Laplace Smoothing

Step 1: Count Occurrences in the Dataset

From the dataset, the occurrences of "Overcast" under Play Tennis are:

Outlook     Play Tennis     Count
Overcast    Yes             4
Overcast    No              0

Total occurrences of "Overcast" = 4 (Yes) + 0 (No) = 4

Step 2: Calculate Probability Without Smoothing

Without smoothing, the likelihood of Outlook being "Overcast" for each class is:

P(Overcast | Yes) = 4/9

P(Overcast | No) = 0/5 = 0
This is problematic because a zero probability would completely eliminate this case from any
classification calculations.

Step 3: Apply Laplace Smoothing

Using Laplace Smoothing (α = 1), we add α to every count. The formula is:

P(w | class) = (count(w, class) + α) / (N + α·K)

where:

α = smoothing parameter (typically 1, but can be any positive number)

K = total number of classes

N = total number of training examples in the class

Since we have two possible classes (Yes and No), we add 2 to the denominator:

P(Overcast | Yes) = (4 + 1) / (9 + 2) = 5/11 ≈ 0.45 = 45%

P(Overcast | No) = (0 + 1) / (5 + 2) = 1/7 ≈ 0.14 = 14%
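
A minimal Python sketch (not from the original notes) that reproduces the arithmetic above, using the same α, K and N as in the formula:

# Laplace smoothing: (count + alpha) / (N + alpha * K)
def laplace_smoothed(count, class_total, alpha=1, num_classes=2):
    return (count + alpha) / (class_total + alpha * num_classes)

# "Overcast" appears 4 times among the 9 "Yes" rows and 0 times among the 5 "No" rows.
print(round(laplace_smoothed(4, 9), 2))  # 0.45 -> 45%
print(round(laplace_smoothed(0, 5), 2))  # 0.14 -> 14%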

Why is Laplace Smoothing Important?

 Without smoothing, P(Overcast | No) = 0 means the whole Naive Bayes product for "No" becomes zero,
so we would never predict "No" when Outlook is Overcast.
 With smoothing, we allow a small probability (14%) for "No", preventing a total elimination of this
case in classification.
 This helps avoid overfitting to a small data set and ensures better generalization.

ENSEMBLE METHODS:
To improve the accuracy of a model, ensemble learning methods were developed. Ensemble learning is a
machine learning concept in which several models are trained using machine learning algorithms. It
combines low-performing classifiers (also called weak learners or base learners) and aggregates the
individual model predictions into the final prediction.
On the basis of the type of base learners, ensemble methods can be categorized as homogeneous or
heterogeneous. If the base learners are of the same type, it is a homogeneous ensemble method; if the
base learners are of different types, it is a heterogeneous ensemble method.
Bagging

Consider a scenario where you are looking at users' ratings for a product. Instead of relying on a
single user's good/bad rating, we consider the average rating given to the product. With the average
rating, we can be considerably more confident about the quality of the product. Bagging makes use of
this principle: instead of depending on one model, it runs the data through multiple models in parallel
and averages their outputs as the model's final output.

What is Bagging? How it works?

 Bagging is an acronym for Bootstrapped Aggregation. Bootstrapping means random selection of
records with replacement from the training dataset. 'Random selection with replacement' can
be explained as follows:

a. Consider that there are 8 samples in the training dataset. Out of these 8 samples, every weak
learner gets 5 samples as training data for the model. These 5 samples need not be unique, or
non-repetitive.

b. The model (weak learner) is allowed to get a sample multiple times. For example, as shown in
the figure, Rec5 is selected 2 times by the model. Therefore, weak learner1 gets Rec2, Rec5,
Rec8, Rec5, Rec4 as training data.

c. All the samples are available for selection to next weak learners. Thus all 8 samples will be
available for next weak learner and any sample can be selected multiple times by next weak
learners.

 Bagging is a parallel method, which means several weak learners learn the data pattern
independently and simultaneously. This can be best shown in the below diagram:
1. The output of each weak learner is averaged to generate the final output of the model.

2. Since the weak learners' outputs are averaged, this mechanism helps to reduce variance or
variability in the predictions. However, it does not help to reduce the bias of the model.

3. Since the final prediction is an average of the outputs of the weak learners, each weak
learner has equal weight in the final output.
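
As an illustration (not part of the original notes), here is a minimal scikit-learn sketch of bagging with decision trees as the weak learners; the parameter values are arbitrary, the Iris dataset is only a stand-in, and the estimator argument is named base_estimator in scikit-learn versions before 1.2.

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the weak/base learner
    n_estimators=10,                     # number of weak learners trained in parallel
    max_samples=0.6,                     # fraction of records drawn for each bootstrap sample
    bootstrap=True,                      # sample with replacement
    random_state=0,
)
bagging.fit(X, y)
print(bagging.predict(X[:3]))            # aggregated (majority-vote) predictions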

Random Forest Algorithm


Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees
on various subsets of the given dataset and takes the average to improve the predictive
accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the
prediction from each tree and, based on the majority vote of the predictions, predicts the final
output.

A greater number of trees in the forest generally leads to higher accuracy and helps prevent the
problem of overfitting.
The below diagram explains the working of the Random Forest algorithm:
How does Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining N decision
trees, and the second is to make predictions using the trees created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected data points (Subsets).

Step-3: Choose the number N for decision trees that you want to build.

Step-4: Repeat Step 1 & 2.

Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to
the category that wins the majority votes.
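
As an illustration (not part of the original notes), here is a minimal scikit-learn sketch of the steps above; the parameter values are arbitrary and the Iris dataset is only a stand-in.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset

forest = RandomForestClassifier(
    n_estimators=100,     # N, the number of decision trees in the forest
    max_features="sqrt",  # each split considers a random subset of features
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:3]))       # final class = majority vote over the 100 trees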
Applications of Random Forest

There are mainly four sectors where Random Forest is mostly used:

1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.

2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be
identified.

3. Land Use: We can identify the areas of similar land use by this algorithm.

4. Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest

o Random Forest is capable of performing both Classification and Regression tasks.

o It is capable of handling large datasets with high dimensionality.

o It enhances the accuracy of the model and prevents the overfitting issue.

Disadvantages of Random Forest

o Although Random Forest can be used for both classification and regression tasks, it is not as
well suited for regression tasks.

What is Boosting?
Definition: The term 'Boosting' refers to a family of algorithms which convert weak learners into
strong learners.

Let’s understand this definition in detail by solving a problem of spam email identification:

How would you classify an email as SPAM or not? Like everyone else, our initial approach would be to
identify 'spam' and 'not spam' emails using the following criteria. If:

1. Email has only one image file (promotional image), It’s a SPAM

2. Email has only link(s), It’s a SPAM

3. Email body consists of sentences like "You won a prize money of $ xxxxxx", it's a SPAM

4. Email from our official domain “[email protected]” , Not a SPAM

5. Email from known source, Not a SPAM

Above, we’ve defined multiple rules to classify an email into ‘spam’ or ‘not spam’. But, do you think
these rules individually are strong enough to successfully classify an email? No.

Individually, these rules are not powerful enough to classify an email into 'spam' or 'not
spam'. Therefore, these rules are called weak learners.

To convert weak learners into a strong learner, we'll combine the prediction of each weak learner using
methods like:
• Using the average / weighted average
• Considering the prediction that has the higher vote

For example: above, we have defined 5 weak learners. Out of these 5, 3 vote 'SPAM' and 2 vote
'Not a SPAM'. In this case, by default, we'll consider the email as SPAM because we have a
higher vote (3) for 'SPAM'.

How Boosting Algorithms works?


Now we know that boosting combines weak learners to form a strong rule. An immediate question
which should pop into your mind is, 'How does boosting identify weak rules?'

To find weak rules, we apply base learning (ML) algorithms with a different distribution each time.
Each time a base learning algorithm is applied, it generates a new weak prediction rule. This is an
iterative process. After many iterations, the boosting algorithm combines these weak rules into a
single strong prediction rule.

Here's another question which might haunt you: 'How do we choose a different distribution for each
round?'

For choosing the right distribution, here are the following steps:

Step 1: The base learner takes all the distributions and assigns equal weight or attention to each
observation.

Step 2: If there is any prediction error caused by the first base learning algorithm, then we pay
higher attention to the observations having prediction errors. Then, we apply the next base learning
algorithm.

Step 3: Iterate Step 2 till the limit of the base learning algorithm is reached or higher accuracy is achieved.

Finally, it combines the outputs from the weak learners and creates a strong learner, which eventually
improves the prediction power of the model. Boosting pays more attention to examples which are
misclassified or have higher errors due to preceding weak rules.

Types of Boosting Algorithms


The underlying engine used for boosting algorithms can be anything. It can be a decision stump, a
margin-maximizing classification algorithm, etc. There are many boosting algorithms which use
different types of engines, such as:

1. AdaBoost (Adaptive Boosting)

2. Gradient Tree Boosting
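
As an illustration (not part of the original notes), here is a minimal scikit-learn sketch of AdaBoost with decision stumps as the weak learners; the parameter values are arbitrary, the breast-cancer dataset is only a stand-in, and the estimator argument is named base_estimator in scikit-learn versions before 1.2. Gradient Tree Boosting is available in the same module as GradientBoostingClassifier.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a decision stump as the weak learner
    n_estimators=50,                                # number of boosting rounds
    learning_rate=1.0,                              # weight applied to each weak learner
    random_state=0,
)
ada.fit(X, y)                 # misclassified examples get higher weight before each new round
print(round(ada.score(X, y), 3))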
