Unsupervised Learning

The document provides an overview of unsupervised learning in artificial intelligence, detailing its importance, key concepts, types (clustering, association, and dimensionality reduction), and common algorithms like K-Means and PCA. It discusses practical applications such as customer segmentation and anomaly detection, as well as challenges faced in model evaluation and interpretability. The future of unsupervised learning is also addressed, highlighting advancements in algorithms and technology, along with ethical considerations.

Uploaded by

Gaming with Joel
Copyright
© All Rights Reserved
Unsupervised Learning

Introduction to Artificial Intelligence


Hamdi Abdurhman, PhD

401892 LECTURE 9 11

Outline
Last Time:
• Supervised Learning
Today: Unsupervised Learning
• Introduction to Unsupervised Learning
• YouTube video (link)
• Key Concepts in Unsupervised Learning
• Types of Unsupervised Learning
• Building an Unsupervised Learning Model
• Common Algorithms in Unsupervised Learning
• Practical Application of Unsupervised Learning
• Challenges in Unsupervised Learning
• The Future of Unsupervised Learning

Introduction to Unsupervised Learning

What is Unsupervised Learning?


• Learning from Unlabeled Data:
• Unsupervised learning involves training algorithms on data that does not have labeled
responses.
• The goal is to discover hidden patterns, groupings, or structures in the data.
• It's like exploring without a guide, letting the data reveal its own organization.
• Discovering Hidden Structures:
• The machine tries to make sense of data by finding similarities and differences.
• It identifies clusters or associations without being told what to look for.
• Comparison to Supervised Learning:
• In supervised learning, models learn from labeled data (inputs with known outputs).
• Unsupervised learning has no labels; it's about pattern discovery.


Introduction to Unsupervised Learning

Why is Unsupervised Learning Important?


• Understanding Data Without Predefined Labels:
• Many real-world datasets are unlabeled and large; labeling them can be time-
consuming or impractical.
• Unsupervised learning helps make sense of vast amounts of data.
• Applications in Exploratory Data Analysis:
• Identifying natural groupings in data.
• Uncovering underlying structures that were not previously known.


Introduction to Unsupervised Learning

• Real-World Examples:
• Customer Segmentation:
• Grouping customers for targeted marketing.
• Topic Clustering:
• Organizing articles or documents by topics.
• Unsupervised Learning in Everyday Life:
• Recommendation Systems:
• Suggesting products or content you might like.
• Anomaly Detection:
• Identifying fraud or errors in data.

Key Concepts in Unsupervised Learning

• Data Without Labels:
• No predefined outputs or correct answers.
• Machine learns from patterns and structures in input data.
• Contrast with supervised learning (uses labeled data).
• Features and Observations:
• Features: Attributes or variables describing data (e.g., age, purchase amount).
• Observations: Individual data points or examples.
• Algorithms analyze features across observations to find patterns.
• Exploratory Data Analysis:
• Summarizing and visualizing data to understand characteristics.
• Identifying patterns, anomalies, relationships.
• Techniques include statistical summaries and charts.


Types of Unsupervised Learning


1. Clustering
• Clustering is the process of grouping similar data points together based on
their features.
• The goal is to discover inherent groupings in data without predefined
categories.
• Examples:
• Customer segmentation.
• Grouping similar images.


Types of Unsupervised
Learning
2. Association
• Association involves finding relationships or
associations between variables in large
datasets.
• It discovers rules that describe how variables
relate to each other.
• Examples:
• Market Basket Analysis
• Association Rules:
• Expressed in the form "If X, then Y," indicating a
relationship between items or features.

Types of Unsupervised
Learning
3. Dimensionality Reduction
• Reducing the number of variables or features in the
data while retaining important information.
• Simplifies data analysis and visualization.
• Examples:
• Simplifying data for visualization.
• Noise reduction.
• Technique:
• Principal Component Analysis (PCA)
• A common method for dimensionality
reduction.
• Transforms data into new coordinates whose
leading axes capture the most variance.


Building an Unsupervised Learning Model


• Data Collection and Preparation:
• Gathering unlabeled data.
• Cleaning and preprocessing steps (handling missing values, normalization, removing noise).
• Choosing a Method:
• Clustering: K-Means, Hierarchical Clustering, DBSCAN.
• Association: Apriori algorithm.
• Dimensionality Reduction: Principal Component Analysis (PCA).
• Implementing the Model:
• Applying chosen algorithms.
• Interpreting results using visualization techniques.
• Evaluating the Model:
• Subjective evaluation based on domain knowledge.
• Common metrics: Silhouette Score, Inertia, Dunn Index.
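As an illustration of the preparation step, the sketch below handles missing values and normalizes one feature. It is a minimal example under assumptions: the helper names `impute_mean` and `min_max_scale`, the choice of mean imputation and min-max scaling, and the sample `ages` column are all hypothetical and not taken from the slides.

```python
def impute_mean(column):
    """Replace missing entries (None) with the mean of the known values."""
    known = [x for x in column if x is not None]
    mean = sum(known) / len(known)
    return [mean if x is None else x for x in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical feature column with one missing value.
ages = [20, None, 40, 60]
filled = impute_mean(ages)      # missing value replaced by the mean of 20, 40, 60
scaled = min_max_scale(filled)  # every value now lies between 0 and 1
print(filled, scaled)
```

Scaling matters for distance-based methods such as K-Means, because a feature with a large numeric range would otherwise dominate the distance calculation.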


Common Algorithms in Unsupervised Learning
1. K-Means Clustering:
• Concept:
• K-Means is a clustering algorithm that partitions data into K distinct clusters based on
feature similarity.
• Algorithm Steps:
1. Choose the number of clusters (K).
2. Initialize cluster centers (centroids) randomly.
3. Assign each data point to the nearest centroid.
4. Recalculate centroids as the mean of assigned points.
5. Repeat steps 3 and 4 until centroids no longer change significantly.
• Applications:
• Customer Segmentation: Grouping customers based on behavior.
• Image Compression: Reducing colors in an image by clustering pixel values.
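The five steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names, the fixed random seed, the tolerance `tol`, and the toy `points` are assumptions made for the example, and initial centroids are drawn from the data points themselves.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Step 2: random initial centroids
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recompute centroids (an empty cluster keeps its old centroid).
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Step 5: stop once no centroid moves by more than the tolerance.
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i]))
              for p in points]
    return centroids, labels


# Toy data: two visually separate groups (Step 1: we choose K = 2).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping is meaningful, not the label values.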

Common Algorithms in Unsupervised
Learning
2. Hierarchical Clustering
• Concept:
• Builds a tree (hierarchy) of clusters.
• Approaches:
• Agglomerative: Starts with each data point as its own cluster and merges them step by
step.
• Divisive: Starts with one cluster containing all data points and divides them recursively.
• Dendrograms:
• A visual representation showing how clusters are merged or split.
• Helps in deciding the number of clusters by cutting the dendrogram at a certain level.
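A minimal sketch of the agglomerative approach follows. It assumes single linkage (distance between the closest pair of points) as the merge criterion, which the slides do not specify; the function names and sample points are hypothetical. The recorded merge distances are the heights at which a dendrogram would join the clusters.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def single_linkage(c1, c2):
    """Cluster distance = distance between the closest pair of members."""
    return min(euclidean(p, q) for p in c1 for q in c2)


def agglomerative(points, n_clusters):
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []                       # (cluster_a, cluster_b, distance) per merge
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)
```

Stopping at `n_clusters` plays the role of cutting the dendrogram at a chosen level; running down to one cluster would record the full merge history.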


Common Algorithms in Unsupervised Learning
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• Concept:
• Clusters data based on density (regions with a high concentration of points).
• Identifies core points, border (density-reachable) points, and noise (outliers).
• Algorithm Steps:
1. Select parameters:
• Epsilon (ε): Maximum distance between two points to be considered neighbors.
• MinPts: Minimum number of neighboring points required to form a dense region.
2. Classify points as core, border, or noise.
3. Form clusters from core points and their neighbors.
• Applications:
• Anomaly Detection: Identifying outliers in data.
• Clustering Spatial Data: Useful in geospatial analysis.
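The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: a point counts as its own neighbor when comparing against MinPts (a common convention, not stated in the slides), neighbors are found by brute force, and the names `region_query`, `dbscan`, and the sample data are hypothetical.

```python
NOISE = -1


def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if euclidean(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE      # may later be relabeled as a border point
            continue
        labels[i] = cluster_id     # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels


points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
          (9.0, 1.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # the isolated point (9.0, 1.0) is labeled -1 (noise)
```

Unlike K-Means, the number of clusters is not chosen in advance; it follows from ε and MinPts.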


Common Algorithms in Unsupervised Learning
4. Apriori Algorithm (Association Rules)
• Concept:
• Used for mining frequent itemsets and generating association rules in transactional
databases.
• Association Rules Format:
• If itemset X then itemset Y.
• Algorithm Steps:
• Identify frequent individual items (with support greater than a minimum threshold).
• Extend frequent itemsets by combining them to find larger frequent itemsets.
• Generate association rules from the frequent itemsets.
• Applications:
• Market Basket Analysis: Understanding purchasing patterns.
• Cross-Selling Strategies: Recommending complementary products.
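A compact sketch of these steps is shown below. It is illustrative only: the function names, the toy transactions, and the thresholds are assumptions, and an itemset is treated as frequent when its support is greater than or equal to the threshold. Support is the fraction of transactions containing an itemset; the confidence of "if X then Y" is support(X and Y) divided by support(X).

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    items = sorted({item for t in transactions for item in t})
    # Step 1: frequent individual items.
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s, transactions)
        k += 1
        # Step 2: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Keep a candidate only if all its (k-1)-subsets are frequent
        # and its own support meets the threshold.
        current = [c for c in candidates
                   if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                   and support(c, transactions) >= min_support]
    return frequent


def association_rules(frequent, min_confidence):
    # Step 3: split each frequent itemset into "if X then Y" rules.
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = apriori(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.7)
for x, y, conf in rules:
    print(f"If {sorted(x)} then {sorted(y)} (confidence {conf:.2f})")
```

In this toy data every basket with butter also contains bread, so the rule "if butter then bread" has confidence 1.0, while the reverse rule is much weaker.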

Common Algorithms in Unsupervised
Learning
5. Principal Component Analysis (PCA)
• Concept:
• A dimensionality reduction technique that transforms data into new coordinates (principal components).
• Captures as much of the variance in the data as possible with the fewest components.
• Algorithm Steps:
1. Standardize the data.
2. Compute the covariance matrix.
3. Calculate eigenvalues and eigenvectors of the covariance matrix.
• Eigenvalues: These represent the magnitude of the variance along the principal components.
• Eigenvectors: These are the directions of the principal components, which are the axes that capture the maximum variance in the data.
4. Select principal components (eigenvectors with the highest eigenvalues).
5. Transform the data into the new coordinate system.
• Applications:
• Data Visualization: Reducing dimensions for plotting.
• Noise Reduction: Eliminating less significant components.
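The steps above can be traced in a short standard-library sketch. It makes several assumptions not in the slides: only the first principal component is extracted, the eigenvector is found with power iteration rather than a full eigendecomposition, the sample variance (division by n - 1) is used, and the function names and `data` values are hypothetical.

```python
def standardize(rows):
    """Step 1: rescale each feature to zero mean and unit sample variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [(sum((x - m) ** 2 for x in col) / (n - 1)) ** 0.5
            for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def covariance_matrix(centered):
    """Step 2: covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iters=500):
    """Step 3 (largest eigenvalue only), approximated by power iteration."""
    d = len(matrix)
    # Uneven start vector; assumed not orthogonal to the top eigenvector.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v


# Hypothetical two-feature data with a strong positive correlation.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

z = standardize(data)
cov = covariance_matrix(z)
eigenvalue, component = top_eigenpair(cov)  # Step 4: keep the top component
# Step 5: project every observation onto the first principal component.
scores = [sum(x * c for x, c in zip(row, component)) for row in z]
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(component, round(explained, 3))
```

Here `explained` is the share of total variance captured by the first component; a value near 1 suggests the two features can be summarized by one with little loss.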


Class Discussion

Scenario: Suppose you have sales data from a retail store.
Question: Which unsupervised learning algorithms might you use to gain insights, and why?


Practical Application of Unsupervised Learning
• Customer Segmentation
• Group customers based on similarities.
• Enables targeted marketing and personalization.
• Anomaly Detection
• Identify unusual data points.
• Applications in fraud detection and network security.
• Recommendation Systems
• Suggest products or content based on behavior.
• Techniques: Collaborative filtering, association rules.
• Image and Text Clustering
• Organize images or documents into meaningful groups.
• Simplifies retrieval and management.
• Genomics and Bioinformatics
• Analyze biological data for patterns.
• Applications in gene expression and disease classification.

Challenges in Unsupervised Learning
• Determining the Number of Clusters
o Choosing the right K in K-Means
o Methods: Elbow Method, Silhouette Score
• Evaluating Model Performance
o Lack of ground truth labels
o Internal metrics: Silhouette Coefficient, Davies-Bouldin Index
o Visual inspection
• Handling High-Dimensional Data
o Curse of dimensionality
o Need for dimensionality reduction (e.g., PCA)
• Interpretability
o Difficulty interpreting clusters
o Use visualizations and domain expertise
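To make the silhouette metric concrete, here is a minimal sketch of how it can be computed without ground truth labels. For each point, a is the mean distance to the other points in its own cluster and b is the lowest mean distance to the points of any other cluster; the point's score is (b - a) / max(a, b), and the overall score is the average. The function names, the convention of scoring single-point clusters as 0, and the sample data are assumptions for illustration; at least two clusters are required.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for single-point clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself contributes 0 to the sum).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # matches the visible groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # mixes the two groups
print(round(good, 3), round(bad, 3))
```

Scores near 1 indicate compact, well-separated clusters, so comparing the score across several values of K is one way to choose the number of clusters.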


The Future of Unsupervised Learning


• Advancements in Algorithms:
• Deep clustering combining neural networks and clustering.
• Self-organizing maps for dimensionality reduction.
• Spectral clustering for complex data structures.
• Integration with Other Learning Methods:
• Semi-supervised learning mixing labeled and unlabeled data.
• Unsupervised pre-training in deep learning.
• Applications in Emerging Fields:
• Bioinformatics and genomics analysis.
• Social media analysis for trends and insights.
• Advancements in Technology:
• High-performance computing with GPUs and TPUs.
• Automation through AutoML.
• Integration with IoT for real-time analysis.
• Ethical Considerations:
• Privacy preservation and data anonymization.
• Fairness, transparency, and accountability.
• Compliance with regulations.


References
• Book: Machine Learning for Absolute Beginners (3rd ed.) by Oliver Theobald
• Book: Introduction to Machine Learning (4th ed.) by Ethem Alpaydin
• Video: https://www.youtube.com/watch?v=JnnaDNNb380&t=7s
• IBM: https://www.ibm.com/topics/unsupervised-learning
• Google: https://cloud.google.com/discover/what-is-unsupervised-learning

Next Class
Introduction to Reinforcement Learning
