Unit V Machine Learning

Clustering is an unsupervised machine learning technique that groups similar data points into clusters, widely used in areas such as customer segmentation and anomaly detection. It can be categorized into types like hard vs. soft clustering, hierarchical vs. partition-based, and density-based vs. model-based clustering, with various algorithms like K-Means, DBSCAN, and Gaussian Mixture Models. Evaluation metrics for clustering include the Silhouette Score, Davies-Bouldin Index, and Dunn Index, with applications in customer segmentation, image segmentation, and medical diagnosis.

UNIT V

1. Introduction to Clustering

Clustering is an unsupervised machine learning technique used to group similar data points
into clusters. The goal is to ensure that:

 Data points within a cluster are similar to each other.

 Data points in different clusters are dissimilar.

Clustering is widely used in:

 Customer segmentation

 Pattern recognition

 Image segmentation

 Anomaly detection

 Genomics and Bioinformatics

2. Types of Clustering

Clustering can be categorized into different types based on how clusters are formed.

a) Hard Clustering vs. Soft Clustering

 Hard Clustering: Each data point belongs to only one cluster.

o Example: K-Means Clustering.

 Soft Clustering (Fuzzy Clustering): A data point can belong to multiple clusters with
probabilities.

o Example: Fuzzy C-Means Clustering.

b) Hierarchical vs. Partition-Based Clustering

1. Hierarchical Clustering:

o Creates a tree-like structure of clusters.

o Can be agglomerative (bottom-up) or divisive (top-down).

o Example: Agglomerative Clustering, Divisive Clustering.

2. Partition-Based Clustering:

o Divides the dataset into k predefined clusters.


o Example: K-Means Clustering.

c) Density-Based vs. Model-Based Clustering

1. Density-Based Clustering:

o Clusters are formed based on dense regions in the data.

o Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

2. Model-Based Clustering:

o Assumes data is generated from a mixture of statistical distributions.

o Example: Gaussian Mixture Models (GMM).

3. Clustering Algorithms

a) K-Means Clustering

 One of the most popular clustering algorithms.

 Steps (see the code sketch at the end of this subsection):

1. Choose K (number of clusters).

2. Select K random points as centroids.

3. Assign each point to the nearest centroid.

4. Recompute each centroid as the mean of the points currently assigned to it.

5. Repeat steps 3 and 4 until the cluster assignments no longer change (convergence).

 Pros:

o Simple and efficient for large datasets.

o Works well when clusters are spherical.

 Cons:

o Requires choosing K in advance.

o Sensitive to outliers.
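A minimal sketch of these steps using scikit-learn's KMeans; the data here is a synthetic toy set from make_blobs, purely for illustration:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with 3 underlying groups (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# K must be fixed in advance (a limitation noted above)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # runs steps 2-5 internally until convergence

print(kmeans.cluster_centers_)  # final centroids
print(kmeans.inertia_)          # SSE, reused later by the Elbow Method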

b) Hierarchical Clustering

 Forms a tree structure of clusters (dendrogram).

 Two main types:


o Agglomerative (Bottom-Up): Merges small clusters into bigger ones.

o Divisive (Top-Down): Splits a large cluster into smaller ones.

 Pros:

o No need to specify K.

o Works well for small datasets.

 Cons:

o Computationally expensive for large datasets.
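A short agglomerative (bottom-up) sketch with scikit-learn; linkage="ward" merges, at every step, the pair of clusters whose merge least increases within-cluster variance:

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Bottom-up merging; n_clusters=3 cuts the resulting tree into 3 clusters
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)

To draw the dendrogram itself, scipy.cluster.hierarchy (its linkage and dendrogram functions) is the usual choice.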

c) DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

 Groups points based on density.

 Can detect arbitrary-shaped clusters.

 Steps (see the code sketch at the end of this subsection):

1. Select a point.

2. Find all nearby points within a given radius (ε).

3. Expand the cluster if there are enough points (minPts threshold).

4. Label points that are not reachable from any dense region as noise.

 Pros:

o Can detect outliers.

o Works well with non-spherical clusters.

 Cons:

o Requires choosing ε (radius) carefully.

o Not effective for varying-density clusters.
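A brief DBSCAN sketch on two interleaving half-moons, a non-spherical shape where K-Means typically fails; eps corresponds to the radius ε and min_samples to the minPts threshold in the steps above:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

db = DBSCAN(eps=0.3, min_samples=5)
labels = db.fit_predict(X)

# DBSCAN labels outliers as -1 rather than forcing them into a cluster
print((labels == -1).sum(), "noise points")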

d) Gaussian Mixture Model (GMM)

 Uses probabilistic models to form clusters.

 Each cluster follows a Gaussian distribution.

 Works well for overlapping clusters.

 Pros:
o Can model elliptical (non-spherical) clusters through each component's covariance structure.

o Provides probability of belonging to a cluster.

 Cons:

o Requires choosing the number of clusters (K).
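A minimal GMM sketch with scikit-learn; unlike K-Means, predict_proba returns each point's probability of belonging to every cluster:

from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

# Each component is a Gaussian; covariance_type="full" allows elliptical clusters
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=7)
gmm.fit(X)

hard_labels = gmm.predict(X)       # most likely cluster per point
soft_probs = gmm.predict_proba(X)  # probability of each cluster per point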

e) Fuzzy C-Means (Soft Clustering)

 Each data point has a degree of membership in multiple clusters.

 Instead of assigning a point to only one cluster, it belongs to every cluster with a
different degree of membership (the memberships for each point sum to 1).

 Pros:

o More flexible than hard clustering.

 Cons:

o More computationally expensive.
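scikit-learn has no built-in Fuzzy C-Means, so below is a minimal NumPy sketch of its two alternating updates (centroids, then memberships); m is the fuzziness exponent, and the function name fuzzy_c_means is just an illustrative choice:

import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial membership matrix U; each row sums to 1
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centroids: membership-weighted means of the points
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance of every point to every centroid
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)  # guard against division by zero
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

Calling centers, U = fuzzy_c_means(X, c=3) returns, in each row of U, that point's degree of membership in every cluster.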

f) Rough Clustering & Rough K-Means

 Allows uncertain data points to belong to multiple clusters.

 Rough K-Means extends K-Means using rough set theory: each cluster has a lower
approximation (certain members) and an upper approximation (possible members), so
boundary points can belong to more than one cluster.

4. Evaluation Metrics for Clustering

Since clustering is unsupervised, there are no ground-truth labels to measure accuracy against. Instead, internal measures are used (a code sketch follows this list):

1. Silhouette Score:

o Measures how similar a point is to its cluster vs. other clusters.

o Range: -1 (bad) to +1 (good).

2. Davies-Bouldin Index:

o Average ratio of within-cluster scatter to between-cluster separation; lower values indicate better clustering.

3. Dunn Index:

o Higher values mean better separation between clusters.

4. Elbow Method (For K-Means):

o Finds a suitable K by plotting inertia (the within-cluster sum of squared errors, SSE) against K and looking for the "elbow" where the curve flattens.
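A short sketch computing these measures with scikit-learn (the Dunn Index has no built-in scikit-learn function, so it is omitted here), including the Elbow Method loop:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=1)
labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

print("Silhouette:", silhouette_score(X, labels))          # higher is better
print("Davies-Bouldin:", davies_bouldin_score(X, labels))  # lower is better

# Elbow Method: compute inertia (SSE) for each K and look for the bend
inertias = [KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_
            for k in range(1, 10)]
print(inertias)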


5. Applications of Clustering

1. Customer Segmentation:

o Grouping customers based on purchasing behavior.

2. Anomaly Detection:

o Detecting fraud, cyber threats, and network intrusions.

3. Image Segmentation:

o Identifying objects in images.

4. Document Clustering:

o Grouping similar news articles, research papers, or emails.

5. Medical Diagnosis:

o Identifying different disease patterns.

Metrics

1. Silhouette Score: Measures how well-separated clusters are.

o Range: -1 to 1 (Higher is better)

o Good Clustering: > 0.5

2. Davies-Bouldin Index: Measures the ratio of within-cluster scatter to between-cluster separation.

o Lower values are better

3. Calinski-Harabasz Index: Measures compactness and separation.

o Higher values indicate better clustering
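The Calinski-Harabasz Index is also available in scikit-learn; a self-contained sketch on the same kind of toy data as above:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=2)
labels = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(X)

# Ratio of between-cluster to within-cluster dispersion; higher is better
print(calinski_harabasz_score(X, labels))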
