Unsupervised Learning
401892 LECTURE 9
Outline
• Last Time: Supervised Learning
• Today: Unsupervised Learning
• Introduction to Unsupervised Learning
• YouTube video (link)
• Key Concepts in Unsupervised Learning
• Types of Unsupervised Learning
• Building an Unsupervised Learning Model
• Common Algorithms in Unsupervised Learning
• Practical Application of Unsupervised Learning
• Challenges in Unsupervised Learning
• The Future of Unsupervised Learning
Introduction to Unsupervised Learning
Key Concepts in Unsupervised Learning
• Data Without Labels:
  • No predefined outputs or correct answers.
  • The machine learns from patterns and structures in the input data.
  • Contrast with supervised learning, which uses labeled data.
• Features and Observations:
  • Features: Attributes or variables describing the data (e.g., age, purchase amount).
  • Observations: Individual data points or examples.
  • Algorithms analyze features across observations to find patterns.
• Exploratory Data Analysis
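The features/observations distinction maps directly onto the rows and columns of a data matrix. A minimal sketch with NumPy, using made-up customer data (age and purchase amount as the hypothetical features):

```python
import numpy as np

# Each row is an observation (one customer); each column is a feature.
# Values are made up: columns are (age, purchase amount).
X = np.array([[25.0, 40.0],
              [32.0, 15.5],
              [47.0, 88.2]])

n_observations, n_features = X.shape  # (3, 2)
```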
Types of Unsupervised Learning
2. Association
• Association involves finding relationships or associations between variables in large datasets.
• It discovers rules that describe how variables relate to each other.
• Examples:
  • Market Basket Analysis
• Association Rules:
  • Expressed in the form "If X, then Y," indicating a relationship between items or features.
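The support/confidence arithmetic behind "If X, then Y" rules can be sketched in a few lines of plain Python. The transactions below are made-up market-basket data, not from any real store:

```python
# Hypothetical market-basket transactions (made-up data for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

# Rule: "If bread, then milk"
rule_support = support({"bread", "milk"})   # 0.6 (3 of 5 baskets)
rule_conf = confidence({"bread"}, {"milk"}) # 0.75 (3 of 4 bread baskets)
```

Real market-basket analysis would mine all high-support rules automatically (e.g., with the Apriori algorithm), but the two quantities computed here are the same ones those algorithms rank rules by.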
Types of Unsupervised Learning
3. Dimensionality Reduction
• Reducing the number of variables or features in the data while retaining important information.
• Simplifies data analysis and visualization.
• Examples:
  • Simplifying data for visualization.
  • Noise reduction.
• Technique:
  • Principal Component Analysis (PCA)
    • A common method for dimensionality reduction.
    • Transforms data into new coordinates to maximize variance.
Common Algorithms in Unsupervised Learning
2. Hierarchical Clustering
• Concept:
• Builds a tree (hierarchy) of clusters.
• Approaches:
• Agglomerative: Starts with each data point as its own cluster and merges them step by step.
• Divisive: Starts with one cluster containing all data points and divides them recursively.
• Dendrograms:
• A visual representation showing how clusters are merged or split.
• Helps in deciding the number of clusters by cutting the dendrogram at a certain level.
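The agglomerative merge tree and the "cut the dendrogram" step can be sketched with SciPy's `scipy.cluster.hierarchy` module; the five 2-D points are made-up data chosen so the cluster structure is obvious:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up 2-D points: two tight pairs plus one far-away outlier.
X = np.array([[0.0, 0.0], [0.1, 0.2],
              [5.0, 5.0], [5.1, 4.9],
              [10.0, 0.0]])

# Agglomerative clustering: linkage() starts from singleton clusters and
# merges the closest pair step by step (Ward criterion), recording the
# merge tree that a dendrogram would visualize.
Z = linkage(X, method="ward")

# "Cutting the dendrogram" at a level that yields exactly 3 flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
```

Calling `scipy.cluster.hierarchy.dendrogram(Z)` on the same `Z` would plot the merge tree itself, which is how the cut level is usually chosen by eye.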
Common Algorithms in Unsupervised Learning
5. Principal Component Analysis (PCA)
• Concept:
• A dimensionality reduction technique that transforms data into new coordinates (principal components).
• Captures the most variance in the data with the least number of components.
• Algorithm Steps:
1. Standardize the data.
2. Compute the covariance matrix.
3. Calculate eigenvalues and eigenvectors of the covariance matrix.
   • Eigenvalues: Represent the magnitude of the variance along each principal component.
   • Eigenvectors: The directions of the principal components, i.e., the axes that capture the maximum variance in the data.
4. Select principal components (eigenvectors with the highest eigenvalues).
5. Transform the data into the new coordinate system.
• Applications:
• Data Visualization: Reducing dimensions for plotting.
• Noise Reduction: Eliminating less significant components.
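The five algorithm steps translate almost line for line into NumPy. This is a sketch on synthetic data: one feature is deliberately made redundant (a noisy copy of another) so that two components capture nearly all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)  # redundant feature

# 1. Standardize the data.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the covariance matrix.
cov = np.cov(Xs, rowvar=False)

# 3. Eigenvalues and eigenvectors (eigh: covariance matrices are symmetric).
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Select the principal components with the highest eigenvalues.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]

# 5. Transform the data into the new coordinate system.
X_reduced = Xs @ components

# Fraction of total variance retained by the two kept components.
explained = eigvals[order[:2]].sum() / eigvals.sum()
```

In practice a library implementation such as scikit-learn's `PCA` would be used, but it performs these same steps internally.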
Class Discussion
Scenario: Suppose you have sales data from a retail store.
Question: Which unsupervised learning algorithms might you use to gain insights, and why?
Challenges in Unsupervised Learning
• Determining the Number of Clusters
  o Choosing the right K in K-Means
  o Methods: Elbow Method, Silhouette Score
• Evaluating Model Performance
  o Lack of ground truth labels
  o Internal metrics: Silhouette Coefficient, Davies-Bouldin Index
  o Visual inspection
• Handling High-Dimensional Data
  o Curse of dimensionality
  o Need for dimensionality reduction (e.g., PCA)
• Interpretability
  o Difficulty interpreting clusters
  o Use visualizations and domain expertise
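Choosing K with the Silhouette Score can be sketched with scikit-learn. The data below is synthetic: three well-separated Gaussian blobs, so the silhouette should peak at K = 3:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Synthetic data: three well-separated Gaussian blobs.
centers = [(0, 0), (5, 5), (10, 0)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers])

# No ground-truth labels, so score each candidate K with an internal metric.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)  # 3 for this data
```

The Elbow Method works the same way but plots `KMeans.inertia_` against K and looks for the bend instead of a maximum.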
References
• Book: Machine Learning for Absolute Beginners (3rd ed.) by Oliver Theobald
• Book: Introduction to Machine Learning (4th ed.) by Ethem Alpaydin
• Video: https://www.youtube.com/watch?v=JnnaDNNb380&t=7s
• IBM: https://www.ibm.com/topics/unsupervised-learning
• Google: https://cloud.google.com/discover/what-is-unsupervised-learning
Next Class
Introduction to Reinforcement Learning