Clustering PPT 1233

Uploaded by

gugulothdevendarnaik

Clustering in Machine Learning
By
BODA SANTOSH NAIK (EC21B020)
BANOTH ROHITH (EC21B015)
DESAVATH SIVA NAIK (EC21B024)

Introduction to Clustering
Clustering is an unsupervised learning technique used to group similar data points.

It helps in discovering inherent patterns within datasets without prior labels.

Clustering is widely used in applications such as image segmentation and customer segmentation.

Importance of Clustering

Clustering simplifies complex datasets by summarizing many individual data points as a small number of groups.

It facilitates better data analysis by grouping similar items together.

Clustering can improve decision-making processes in business and research.

Types of Clustering

Clustering can be categorized into several types, including centroid-based, density-based, and hierarchical clustering.

Each type has its own methodology and use cases suited for different data distributions.

Understanding the types of clustering is crucial for selecting the appropriate algorithm.
Centroid-Based Clustering
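K-Means is a widely used centroid-based clustering algorithm.

It partitions the data into K clusters by minimizing the within-cluster variance, that is, the sum of squared distances from each point to its cluster centroid.

The algorithm alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points, until the assignments stop changing.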
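
The iterative centroid update can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the toy data, and the choice of the first K points as initial centroids are all assumptions made for the example (real libraries typically use smarter initialization).

```python
import math


def kmeans(points, k, max_iter=100):
    """Minimal K-Means sketch on a list of equal-length tuples."""
    # Assumption: the first k points are used as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.2, 0.8), (8.5, 9.0)]
labels, centroids = kmeans(data, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

On this toy data the three points near (1, 1) end up in one cluster and the three points near (8.5, 9) in the other.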
Density-Based Clustering

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) identifies clusters based on the density of data points.

DBSCAN requires two parameters: epsilon (the neighborhood radius) and minPts (the minimum number of points within that radius needed to form a dense region).

DBSCAN distinguishes three types of data points:

Core point: has at least minPts points within its epsilon-neighborhood.

Boundary point: not a core point itself, but lies within epsilon of a core point.

Noise point: neither a core point nor a boundary point.
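
The three point types can be made concrete with a short sketch. It only labels points as core, boundary, or noise and does not build the clusters themselves. The function name `classify_points` and the sample coordinates are hypothetical, and the sketch assumes the common convention that a point counts as a member of its own neighborhood.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'boundary', or 'noise'."""
    # Assumption: a point's eps-neighborhood includes the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}
    kinds = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nbrs):
            kinds.append("boundary")  # close to a core point, but not dense itself
        else:
            kinds.append("noise")
    return kinds


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (10.0, 10.0)]
kinds = classify_points(pts, eps=1.0, min_pts=3)
print(kinds)  # ['core', 'core', 'core', 'core', 'boundary', 'noise']
```

Here the four corners of the unit square are core points, (2, 1) is a boundary point because its only neighbor is the core point (1, 1), and (10, 10) is noise.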

Hierarchical Clustering

Hierarchical clustering creates a tree-like structure to represent data relationships.

It can be agglomerative (bottom-up) or divisive (top-down) in its approach.

Dendrograms are commonly used to visualize the results of hierarchical clustering.
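
The agglomerative (bottom-up) approach can be sketched as repeatedly merging the two closest clusters and recording each merge; that merge history is what a dendrogram draws. The example assumes single linkage (distance between the two closest members), which is only one of several possible linkage rules, and the function name `single_linkage` and the data are hypothetical.

```python
import math


def single_linkage(points):
    """Agglomerative clustering sketch; returns the merge history."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return merges


pts = [(0.0,), (1.0,), (5.0,), (7.0,)]
merges = single_linkage(pts)
for left, right, dist in merges:
    print(left, right, dist)
```

For these four one-dimensional points, points 0 and 1 merge first (distance 1), then points 2 and 3 (distance 2), and finally the two pairs merge at distance 4.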

Evaluation Metrics

Evaluation metrics are crucial tools in machine learning for assessing the performance of a model. They provide quantitative measures of how well a model is making predictions. Here are some commonly used evaluation metrics.

Classification metrics:
Accuracy: the fraction of all predictions that the model gets right.

Precision: the quality of the model's positive predictions, that is, the fraction of predicted positives that are actually positive.

Recall: how well the model identifies positive instances, that is, the fraction of actual positives that the model finds.

All three compare predictions against ground-truth labels, so for clustering they apply only when reference labels are available.
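
As a small worked example, the three metrics can be computed directly from true and predicted labels. The function name `classification_metrics` and the label vectors are made up for illustration, and returning 0.0 when a denominator is zero is an assumption of this sketch, not a universal convention.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Assumption: report 0.0 when there are no predicted or actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

With 2 true positives, 1 false positive, and 2 false negatives, accuracy is 7/10, precision is 2/3, and recall is 2/4.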

Challenges in Clustering

Many clustering algorithms, K-Means in particular, are sensitive to outliers, which can distort the results significantly.

The choice of the number of clusters (K) in algorithms like K-Means can be subjective.

High-dimensional data often leads to the “curse of dimensionality,” where distances between points become less informative, complicating clustering.
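
The effect of outliers on a centroid-based method is easy to see numerically. The sketch below uses made-up coordinates and a hypothetical helper `centroid` to show how a single extreme point drags a cluster mean away from the bulk of the data.

```python
def centroid(points):
    """Mean of a list of equal-length tuples, computed per dimension."""
    n = len(points)
    return tuple(sum(dim) / n for dim in zip(*points))


cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
clean = centroid(cluster)
with_outlier = centroid(cluster + [(50.0, 50.0)])

print(clean)         # (1.5, 1.5)
print(with_outlier)  # roughly (11.2, 11.2)
```

One outlier among five points moves the centroid from the middle of the square to a location where no actual data point lies.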
Practical Applications

Clustering is used in customer segmentation to tailor marketing strategies effectively.

It plays a critical role in image and video processing for object recognition.

In bioinformatics, clustering helps in gene expression analysis and protein classification.

Tools and Libraries

Popular libraries for clustering in Python include scikit-learn, SciPy, and HDBSCAN.

R also offers robust clustering packages such as 'cluster' and 'factoextra'.

These tools provide easy-to-use implementations of various clustering algorithms.

Pandas is useful for data manipulation and preprocessing before clustering.

NumPy is useful for numerical operations and is often used for implementing clustering algorithms from scratch.
Case Study: Customer Segmentation
A retail company used K-Means clustering to segment its customer base into distinct groups.

This segmentation enabled targeted marketing campaigns and improved customer engagement.

The results showed a significant increase in sales and customer satisfaction.

Case Study: Image Segmentation

Researchers applied DBSCAN to segment complex images in a computer vision project.

The algorithm effectively identified regions of interest while ignoring background noise.

This segmentation improved the accuracy of subsequent image classification tasks.

The segmentation approach was applied to real-world data, such as satellite images and medical scans, where DBSCAN successfully identified key regions like urban areas or tumor boundaries, further validating its effectiveness.

Future Directions

The integration of clustering with deep learning techniques is an emerging trend.

Research is focusing on developing algorithms that can handle dynamic and streaming data.

Further advancements in clustering will enhance its applicability across various domains.

Best Practices

Always preprocess your data to remove noise and handle missing values before clustering.

Experiment with multiple algorithms and parameters to find the most suitable method for your data.

Visualize the clusters formed to gain insights and validate the clustering results.
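
One possible preprocessing step is sketched below: filling missing values and putting a feature on a standard scale so that no single feature dominates the distance calculation. Mean imputation and z-score scaling are assumptions chosen for the example, not the only reasonable options, and the function name `impute_and_standardize` and the sample values are hypothetical.

```python
import statistics


def impute_and_standardize(column):
    """Replace None with the column mean, then rescale to zero mean, unit variance."""
    observed = [x for x in column if x is not None]
    mean = statistics.fmean(observed)
    # Assumption: missing values are filled with the mean of the observed values.
    filled = [mean if x is None else x for x in column]
    std = statistics.pstdev(filled)
    if std == 0:
        return [0.0 for _ in filled]  # constant column: nothing to scale
    return [(x - mean) / std for x in filled]


incomes = [30.0, None, 50.0, 70.0]
scaled = impute_and_standardize(incomes)
print(scaled)
```

The missing entry is filled with the observed mean (50.0) and therefore maps to 0.0 after scaling, while the smallest and largest values end up symmetric around zero.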

Conclusion
Clustering is a powerful tool for data analysis that uncovers hidden structures in data.

Understanding different clustering algorithms and their applications is essential for practitioners.

As data continues to grow, the importance and relevance of clustering in machine learning will only increase.
