Aiml 8

Uploaded by

kushnayade
Copyright: © All Rights Reserved

Name Kush Nhayade

UID no. 2022300068

Experiment No. 8

Unsupervised Learning Algorithms

Problem Statement:

To perform clustering using K-means on the given dataset.

Sample dataset from Kaggle (Iris dataset):

Dataset Link: Kaggle

Notebook Link: Colab


Theory:

Unsupervised learning is a machine learning approach that works with unlabeled data, meaning the data
lacks predefined categories or labels. The aim of unsupervised learning is to uncover patterns and
relationships within the data independently, without external guidance.

In this approach, the model's task is to group unclassified data by identifying similarities, differences, and
patterns without prior training on labeled data.

Key aspects of unsupervised learning include:

• Identifying patterns and relationships within unlabeled data.
• Using clustering algorithms to group similar data points based on their inherent properties.
• Extracting essential features to highlight significant information, allowing the model to make
meaningful distinctions.
• Assigning labels to clusters based on the identified patterns and features.

Unsupervised learning generally falls into two main categories: Clustering and Association.

Clustering is a common unsupervised learning technique that organizes similar data points into groups.
Centroid-based clustering algorithms achieve this by iteratively moving each cluster's center toward the
data points assigned to it, so that points within the same group are more alike than points in other
groups. Clustering is essentially a way of organizing items based on their similarities and differences.

K-Means Clustering is a popular clustering algorithm that divides data points into a specified
number of clusters, K. The algorithm follows these steps:

1. Randomly initialize K points as the initial cluster centroids.
2. Assign each data point to its nearest centroid.
3. Recalculate each centroid as the average position of all points assigned to it.
4. Repeat the assignment and recalculation until the assignments stop changing (or the centroids
move by less than a small tolerance), yielding distinct groups.
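
In symbols, the assignment and update rules described above can be written as follows. The notation is introduced here for illustration and does not come from the original notebook: x_i is a data point, \mu_j is the centroid of cluster j, C_j is the set of points currently assigned to cluster j, and c_i is the cluster index given to x_i.

```latex
% Assignment: each point goes to its nearest centroid
c_i = \arg\min_{j \in \{1,\dots,K\}} \lVert x_i - \mu_j \rVert^2

% Update: each centroid becomes the mean of its assigned points
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i

% Quantity that both rules never increase (within-cluster sum of squares)
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```

Because neither rule can increase J, repeating them settles on a local minimum, which is why the result can depend on the random initial centroids.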

Code:
Selecting only the first two features for clustering and visualization:
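
The original code for this step is not available, so the following is only a minimal sketch of the idea. It assumes the data is held as a list of rows in the usual Iris column order (sepal length, sepal width, petal length, petal width); the six rows are hard-coded for illustration, and the name `first_two_features` is hypothetical. Plain Python is used to keep the sketch dependency-free, whereas the notebook most likely worked on a pandas dataframe.

```python
# A few rows in the Iris column layout, hard-coded for illustration only.
# Columns: sepal length, sepal width, petal length, petal width (cm).
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]


def first_two_features(rows):
    """Keep only the first two columns of every row as an (x, y) tuple."""
    return [(row[0], row[1]) for row in rows]


points = first_two_features(rows)
```

Restricting the data to two features means every point and every centroid can be drawn directly on a 2-D scatter plot.
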
Step 1: Define the number of clusters (k) and initialize random centroids for each cluster.
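
One common way to do this, sketched below under assumptions, is to pick k of the data points themselves as starting centroids. The function name `init_centroids`, the `seed` parameter, and the sample points are hypothetical; the notebook may instead have drawn random coordinates within the range of the data.

```python
import random


def init_centroids(points, k, seed=None):
    """Pick k different rows of `points` at random as the initial centroids."""
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    rng = random.Random(seed)  # seeded generator so runs can be reproduced
    return rng.sample(points, k)


# Illustrative two-feature points (hypothetical values).
points = [(5.1, 3.5), (4.9, 3.0), (7.0, 3.2), (6.4, 3.2), (6.3, 3.3), (5.8, 2.7)]
k = 3
centroids = init_centroids(points, k, seed=42)
```

Fixing the seed makes the run repeatable, which helps when comparing plots across runs.
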

Step 2: Define the Euclidean distance function:
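
The Euclidean distance between two points p and q is the square root of the sum of squared coordinate differences, d(p, q) = sqrt(sum_i (p_i - q_i)^2). A minimal sketch follows; the function name `euclidean_distance` is an assumption, not taken from the notebook.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d = euclidean_distance((0.0, 0.0), (3.0, 4.0))  # a 3-4-5 right triangle
```
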


Step 3: Assign each point to the nearest cluster center:
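
A minimal sketch of the assignment step, with hypothetical names and sample values. It uses `math.dist` from the standard library (Python 3.8+), which computes the same Euclidean distance as a hand-written helper would.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))  # ties go to the lower index
    return labels


# Hypothetical points and centroids, chosen so the grouping is obvious.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = assign_clusters(points, centroids)
```
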

Step 4: Update cluster centers to the mean of assigned points:
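
A minimal sketch of the update step. The name `update_centroids` and the choice to leave an empty cluster's center where it was are assumptions made for illustration; the source does not say how empty clusters were handled.

```python
def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centroids.append(old)  # assumption: an empty cluster keeps its center
            continue
        # zip(*members) groups the coordinates by dimension
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids


# Hypothetical points with an assignment into two clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
labels = [0, 0, 1, 1]
centroids = update_centroids(points, labels, [(0.0, 0.0), (10.0, 10.0)])
```
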

This completes one round of assignment and update.


Step 5: Predict the cluster for each data point in the dataframe:
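
A minimal sketch of storing the predicted cluster alongside each record. A list of dictionaries stands in for the dataframe here; the column names, the centroid values, and the helper `predict_cluster` are all hypothetical.

```python
import math


def predict_cluster(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


# Stand-in for dataframe rows; column names are assumptions.
records = [
    {"sepal_length": 5.1, "sepal_width": 3.5},
    {"sepal_length": 7.0, "sepal_width": 3.2},
]
centroids = [(5.0, 3.4), (6.8, 3.1)]  # hypothetical cluster centers

for record in records:
    point = (record["sepal_length"], record["sepal_width"])
    record["cluster"] = predict_cluster(point, centroids)  # new "cluster" column
```
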

Step 6: Run multiple iterations of step 3 and step 4 (assign and update) over all the data
points in the dataset.

it=5 specifies the number of times steps 3 and 4 are repeated. A typical K-means
implementation keeps iterating until the cluster centers stop changing significantly
(i.e., until convergence). Here, I have limited it to 5 iterations for simplicity.
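
The loop described above can be sketched as follows. This is an illustrative, dependency-free version and not the notebook's code: the function name `kmeans` and the parameters `max_iter`, `tol`, and `seed` are assumptions. It stops early once no centroid moves by more than `tol`, and otherwise gives up after `max_iter` rounds.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, max_iter=5, tol=1e-6, seed=None):
    """Alternate assignment and update until convergence or max_iter rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points as starting centers
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]  # assign
        new_centroids = []
        for j in range(k):  # update
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep center
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # centers stopped moving
            break
    # Final prediction so the labels match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

The default `max_iter=5` mirrors the fixed iteration count used in this experiment; raising it and relying on `tol` gives the convergence-based stopping rule instead. The final assignment after the loop matters when the iteration limit is reached, because the last update may have moved the centers.
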

Predicting clusters for final plot:


Step 7: Plot the final clusters with updated cluster centers:

Conclusion:

In this experiment on unsupervised learning algorithms, I gained a deeper understanding of
their mechanics, particularly how the K-means clustering algorithm operates. This knowledge
was applied to successfully cluster the widely recognized Iris dataset.