
UNIT-3 NOTES

Regression, Classification and Unsupervised Learning: Linear Regression, Non-linear Regression, Model evaluation methods, K-Nearest Neighbor, Decision Trees, Logistic Regression, Support Vector Machines, Unsupervised Learning, K-Means Clustering, Hierarchical Clustering, Density-Based Clustering, Content-based recommender systems, Collaborative Filtering

Linear Regression
Linear regression analysis is used to predict the value of a variable based on the value of another
variable. The variable you want to predict is called the dependent variable. The variable you are
using to predict the other variable's value is called the independent variable. Linear regression
makes predictions for continuous/real or numeric variables such as sales, salary, age, product
price, etc.

This form of analysis estimates the coefficients of the linear equation, involving one or more
independent variables that best predict the value of the dependent variable. Linear regression fits
a straight line or surface that minimizes the discrepancies between predicted and actual output
values.
When there is only one independent feature, it is known as Simple Linear Regression, and when there is more than one feature, it is known as Multiple Linear Regression.
Similarly, when there is only one dependent variable, it is considered Univariate Linear Regression, while when there is more than one dependent variable, it is known as Multivariate
Regression.
The equation of linear regression is Y = mx + b, where:
m is the slope of the line,
x is the independent variable,
b is the intercept,
Y is the dependent variable.
Let us now predict the price of a pizza.
Project: Predicting Pizza Prices
1. Data Collection
2. Calculations
3. Predictions
4. Visualizations
Table for predicting pizza prices (Mean(X) = 10, Mean(Y) = 13):

Diameter in Inches (X)   Price in Dollars (Y)   Deviation (X)   Deviation (Y)   Product of Deviations   Squared Deviation for X
8 (small)                10                     -2              -3              6                       4
10 (medium)              13                      0               0              0                       0
12 (large)               16                      2               3              6                       4

Sum of products of deviations = 12; sum of squared deviations for X = 8.

The slope of the line is:
m = sum of products of deviations / sum of squared deviations for X = 12 / 8 = 1.5
The formula for calculating the value of b is:
b = Mean(Y) - (m * Mean(X)) = 13 - (1.5 * 10) = -2
So the fitted line is Y = mx + b = 1.5x - 2, which can be used to predict the price of a pizza from its diameter.
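A minimal Python sketch of the same calculation (using only the three data points from the table above; the 14-inch prediction at the end is just an illustration):

# Simple linear regression from scratch: predict pizza price from diameter.
diameters = [8, 10, 12]   # X, in inches
prices = [10, 13, 16]     # Y, in dollars

mean_x = sum(diameters) / len(diameters)   # 10
mean_y = sum(prices) / len(prices)         # 13

# Sum of products of deviations and sum of squared deviations for X
sum_prod = sum((x - mean_x) * (y - mean_y) for x, y in zip(diameters, prices))   # 12
sum_sq_x = sum((x - mean_x) ** 2 for x in diameters)                             # 8

m = sum_prod / sum_sq_x    # slope = 1.5
b = mean_y - m * mean_x    # intercept = -2.0

print(m, b)                # 1.5 -2.0
print(m * 14 + b)          # predicted price of a 14-inch pizza: 19.0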

Types of Linear Regression


There are two main types of linear regression:
1.Simple Linear Regression
This is the simplest form of linear regression, and it involves only one independent variable and
one dependent variable. The equation for simple linear regression is:
y= β0+β1X
where:
• Y is the dependent variable
• X is the independent variable
• β0 is the intercept
• β1 is the slope
2.Multiple Linear Regression
This involves more than one independent variable and one dependent variable. The equation for
multiple linear regression is:
y = β0 + β1X1 + β2X2 + … + βnXn
where:
• Y is the dependent variable
• X1, X2, …, Xn are the independent variables
• β0 is the intercept
• β1, β2, …, βn are the slopes
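As a hedged illustration of fitting such a model, scikit-learn can estimate the coefficients β0…βn directly (the feature values below are made up for the sketch):

# Multiple linear regression with scikit-learn (illustrative data, not from the notes).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[8, 1], [10, 3], [12, 2], [14, 5]])   # two independent variables X1, X2
y = np.array([10, 14, 15, 20])                      # dependent variable Y

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)    # estimated β0 and [β1, β2]
print(model.predict([[11, 2]]))         # prediction for a new example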

Benefits of linear regression


Linear regression is a popular statistical tool used in data science, thanks to the several benefits it
offers, such as:
1. Easy implementation
The linear regression model is computationally simple to implement as it does not demand a lot
of engineering overheads, neither before the model launch nor during its maintenance.
2. Interpretability
Unlike deep learning models such as neural networks, linear regression is relatively
straightforward to interpret. As a result, this algorithm stands ahead of black-box models, which fall short in
explaining how each input variable causes the output variable to change.
3. Scalability
Linear regression is not computationally heavy and, therefore, fits well in cases where scaling is
essential. For example, the model can scale well regarding increased data volume (big data).
4. Optimal for online settings
The ease of computation of these algorithms allows them to be used in online settings. The
model can be trained and retrained with each new example to generate predictions in real-time,
unlike the neural networks or support vector machines that are computationally heavy and
require plenty of computing resources and substantial waiting time to retrain on a new dataset.
All these factors make such compute-intensive models expensive and unsuitable for real-time
applications.
Non-linear Regression
It is a way to model the relationship between variables when that relationship is not a straight line.

Linear: If the relationship between the independent and dependent variables is linear, then we
can use a straight line to fit the given data.

If the relationship between the independent and dependent variables is not linear, then linear
regression cannot be used as it will result in large errors.

Examples:
1. Height and Age
Child Growth: The relationship between age and height in children is typically non-linear, as
growth rates vary with age, often modeled using a quadratic function.

2. Music Listening Habits


Time Spent Listening vs. Number of Songs: The number of songs a person listens to might
increase rapidly with more listening time, but at some point, they may experience diminishing
returns as they exhaust their playlist.
3. Discounts and Sales
Price Reduction vs. Sales Volume: The effect of discounting a product on sales volume can be
non-linear; small discounts might significantly increase sales, but deeper discounts may lead to
smaller percentage increases as the product saturates the market.

4. Plant Growth
Watering vs. Plant Height: The relationship between the amount of water given to a plant and
its growth can be non-linear, with optimal watering leading to significant height increases, while
too much or too little water can stunt growth.

Characteristics of Non-Linear Models


1. Flexibility: They can fit a wide range of data shapes and complexities.
2. Higher Complexity: Non-linear models often have more parameters, which can lead to
overfitting if not managed carefully.
3. Interpretability: Non-linear models can be harder to interpret than linear models,
especially in cases like neural networks.
4. Computationally Intensive: Fitting non-linear models often requires more computational
resources and time.

Common Types of Non-Linear Models


1. Polynomial Regression
2. Sigmoid (Logistic) Regression
3. Decision Tree

Polynomial Regression:

It can handle non-linear relationships among variables by using nth degree of a polynomial.
Polynomial regression can be directly used to deal with different levels of curvilinearity.

For example, the second-degree polynomial (called a quadratic transformation) is given as:
y = a0 + a1x + a2x², and the third-degree polynomial, called a cubic transformation, is given as:
y = a0 + a1x + a2x² + a3x³. Generally, polynomials of degree at most 4 are used, as higher-order
polynomials take on strange shapes and make the curve overly flexible. This leads to
overfitting and is therefore avoided.
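A small sketch of degree-2 polynomial regression with NumPy (the data points below are made up to follow a roughly quadratic trend):

# Quadratic polynomial regression with NumPy (illustrative data).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.8, 9.3, 16.2, 24.9, 36.1])   # roughly y = x**2 plus noise

coeffs = np.polyfit(x, y, deg=2)    # [a2, a1, a0] for y = a2*x**2 + a1*x + a0
poly = np.poly1d(coeffs)

print(coeffs)
print(poly(7.0))                    # prediction for x = 7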

Overfitting: when the model is too complex and performs extremely well on the training data but poorly
on new, unseen data.
Underfitting: when the model is too simple and performs poorly on both the training data and new,
unseen data.

Applications of Non-linear Regression


Non-linear regression has a wide range of applications across various fields. Here are some
notable examples:

• Identifying complex relationships


Machine learning can be used to identify complex relationships between variables using
nonlinear regression algorithms like Random Forest and Deep Neural Networks.

• Analyzing travel behavior


Machine learning techniques can be used to analyze the nonlinear interactions between travel
behavior and the built environment.

• Modeling real world phenomena


Nonlinear regression can be used to model complex relationships in real world phenomena like
economics.

• Feature selection
Nonlinear regression can be used for feature selection in prediction modeling.
• Building interpretability
Nonlinear regression can help build interpretability into prediction models.

Model evaluation methods


It is a crucial aspect of machine learning as it provides an estimate of how well a trained
model is expected to perform on unseen data. Generalization error refers to the
difference between the model's performance on the training data and its performance
on new, unseen data. Here are a few common techniques for estimating generalization
errors:

Holdout Method: In the holdout method, the available dataset is split into two disjoint
subsets: a training set and a test set. The model is trained on the training set and then
evaluated on the test set to estimate its generalization error. The test set should be
representative of the unseen data the model will encounter in real-world scenarios. The
generalization error is typically measured using metrics such as accuracy, mean squared
error, or area under the curve.
Cross-Validation: Cross-validation is a resampling technique that helps estimate the
generalization error by iteratively splitting the dataset into training and validation
subsets. One common approach is k-fold cross-validation, where the dataset is divided
into k equally sized folds. The model is trained k times, each time using k-1 folds as the
training set and the remaining fold as the validation set. The average performance
across the k iterations provides an estimate of the generalization error. This method
helps reduce the variance in the estimated error compared to the holdout method,
especially when the dataset is limited.
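A minimal k-fold cross-validation sketch with scikit-learn (the synthetic data and k = 5 are illustrative choices):

# 5-fold cross-validation with scikit-learn (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # one accuracy score per fold
print(scores)
print(scores.mean())    # the average is the estimate of the generalization performance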

Leave-One-Out Cross-Validation (LOOCV): LOOCV is a special case of cross-validation


where each data point is held out as the validation set, and the model is trained on the
remaining data points. This process is repeated for each data point, and the average
performance is computed. LOOCV provides a nearly unbiased estimate of the
generalization error but can be computationally expensive for large datasets.
Bootstrapping: Bootstrapping is a resampling technique where multiple datasets of the
same size as the original dataset are created by sampling with replacement. Each
bootstrap dataset is used to train a model, and the performance on the original dataset
is measured. The variability in performance across the bootstrap models provides an
estimate of the generalization error.
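A hedged sketch of the bootstrap idea using scikit-learn's resample helper (synthetic data; 50 rounds is an arbitrary choice):

# Bootstrapping: train on resampled datasets, evaluate on the original data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

scores = []
for i in range(50):
    Xb, yb = resample(X, y, replace=True, random_state=i)    # bootstrap sample
    model = LogisticRegression(max_iter=1000).fit(Xb, yb)
    scores.append(model.score(X, y))                         # accuracy on the original data

print(np.mean(scores), np.std(scores))   # the spread estimates the generalization error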

K-Nearest Neighbor

o K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.

o K-NN algorithm assumes the similarity between the new case/data and available cases and puts
the new case into the category that is most similar to the available categories.

o K-NN algorithm stores all the available data and classifies a new data point based on the
similarity. This means that when new data appears, it can be easily classified into a well-suited
category by using the K-NN algorithm.

o K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for
the Classification problems.

o K-NN is a non-parametric algorithm, which means it does not make any assumption on
underlying data.

o It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and, at the time of classification, performs an
action on it.

o KNN algorithm at the training phase just stores the dataset and when it gets new data, then it
classifies that data into a category that is much similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we
want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm,
as it works on a similarity measure. Our KNN model will find the features of the new
image that are similar to the cat and dog images and, based on the most similar features, will put it in
either the cat or the dog category.

Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new data point x1.
To decide which of these categories the new point belongs to, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.
Consider the below diagram:

How does K-NN work?

The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of the neighbors

o Step-2: Calculate the Euclidean distance between the new data point and the other data points.

o Step-3: Take the K nearest neighbors as per the calculated Euclidean distances.

o Step-4: Among these k neighbors, count the number of the data points in each category.

o Step-5: Assign the new data point to the category for which the number of neighbors is
maximum.

o Step-6: Our model is ready.


Example of KNN:

Movie                 IMDB Rating   Duration (minutes)   Genre

Mission Impossible    8.0           160                  Action

Gadar 2               6.2           170                  Action

Rocky & Rani          7.2           168                  Comedy

OMG 2                 8.2           155                  Comedy

Predict the genre of “barbie” movie with IMDB rating 7.4 & duration 114 minutes

• Step-1: Calculate the Euclidean distance between Barbie (7.4, 114) and each movie using rating and duration: Mission Impossible ≈ 46, Gadar 2 ≈ 56, Rocky & Rani ≈ 54, OMG 2 ≈ 41.

• Step-2: Select K Nearest Neighbor:

If K = 1, the lowest distance is 41 (OMG 2, a comedy), which means Barbie is predicted to be a comedy movie.

If K = 3, choose 41 (comedy), 46 (action) and 54 (comedy).

• Step-3: Majority Voting:

Two of the three nearest neighbours are comedies, so Barbie is predicted to be a comedy movie.
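The same example expressed as a short scikit-learn sketch (K = 3; the features come from the table above):

# K-NN classification of the Barbie example with scikit-learn (K = 3).
from sklearn.neighbors import KNeighborsClassifier

X = [[8.0, 160], [6.2, 170], [7.2, 168], [8.2, 155]]   # [IMDB rating, duration]
y = ["Action", "Action", "Comedy", "Comedy"]

knn = KNeighborsClassifier(n_neighbors=3)   # uses Euclidean distance by default
knn.fit(X, y)

print(knn.predict([[7.4, 114]]))            # -> ['Comedy']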

Advantages of KNN Algorithm:

o It is simple to implement.

o It is robust to the noisy training data

o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:

o The value of K always needs to be determined, which may sometimes be complex.

o The computation cost is high because of calculating the distance between the data points for
all the training samples.
Logistic Regression
Logistic regression is a statistical method used to predict the probability of a certain event
occurring. It’s especially useful when you’re dealing with binary outcomes (i.e., yes/no, 0/1,
true/false).

Key Concepts:
• Input: You provide some input data (called features). For example, these could be things
like age, income, or other factors.
• Output: The output is a probability value between 0 and 1, representing the likelihood of
a specific outcome. In binary logistic regression, this is typically either "success" (1) or
"failure" (0).
• Logistic Function (Sigmoid Curve): Instead of making predictions directly, logistic
regression uses a special mathematical function called the logistic (or sigmoid) function,
σ(z) = 1 / (1 + e^(−z)), to squash the result into the range between 0 and 1.

Problem:
You want to predict whether a student will pass or fail a test based on the number of
hours they studied.
Data:
You have the following data for several students:
Data set:

Hours Studied   Passed (1 = Yes, 0 = No)

1               0

2               0

3               0

4               1

5               1

6               1

Graph:

Here is a graph showing how the probability of passing a test increases as the number of hours
studied goes up. The red dashed line represents the 0.5 threshold, which means if the probability
is greater than 0.5, the model would predict the student will pass. If it's below 0.5, the prediction
would be that the student will fail.
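A minimal scikit-learn sketch of this example, using the hours-studied data from the table above (the 3.5-hour query is just an illustration):

# Logistic regression on the hours-studied example.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])   # hours studied
y = np.array([0, 0, 0, 1, 1, 1])               # 1 = passed, 0 = failed

model = LogisticRegression().fit(X, y)

print(model.predict_proba([[3.5]]))   # probabilities of [fail, pass] for 3.5 hours of study
print(model.predict([[5]]))           # class prediction using the 0.5 threshold -> [1]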
Support Vector Machines
The objective of the support vector machine (SVM) algorithm is to maximize the margin
which is defined as the distance between the separating hyperplane (or decision
boundary) and the training samples that are closest to this hyperplane, the so-called
support vectors. The margin is calculated as the perpendicular distance from the line to
only the closest points, as shown in Figure 4-3. Hence, SVM calculates a
maximum-margin boundary that leads to a homogeneous partition of all data points.

Figure 4-3. Support vector machine


• Support vector machines (SVMs) are supervised learning models with associated
learning algorithms that analyze data for classification and regression analysis.
• Given a set of training examples, each marked as belonging to one of two
categories, an SVM training algorithm builds a model that assigns new examples to
one category or the other, making it a non-probabilistic binary linear classifier
• SVM maps training examples to points in space so as to maximise the width of the
gap between the two categories.
• New examples are then mapped into that same space and predicted to belong to a
category based on which side of the gap they fall.
• In addition to performing linear classification, SVMs can efficiently perform a
non-linear classification using what is called the kernel trick, implicitly mapping their
inputs into high-dimensional feature spaces.
Terminology of SVM
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in
n-dimensional space, but we need to find out the best decision boundary that helps to
classify the data points. This best boundary is known as the hyperplane of SVM.
The dimension of the hyperplane depends on the number of features in the dataset: if there are
2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a
two-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e., the maximum
distance between the hyperplane and the nearest data points of each class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect the
position of the hyperplane are termed support vectors. Since these vectors support
the hyperplane, they are called support vectors.
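A small scikit-learn sketch of a linear SVM on synthetic, roughly separable data (the kernel and C value are illustrative choices):

# Linear SVM with scikit-learn (synthetic two-class data).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)    # kernel="rbf" would give a non-linear boundary (kernel trick)
clf.fit(X, y)

print(clf.support_vectors_)          # the points that define the maximum-margin hyperplane
print(clf.predict([[0.0, 2.0]]))     # class of a new example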

Unsupervised Learning
Unsupervised learning is a machine learning technique in which models are not supervised using a training
dataset. Instead, the model itself finds the hidden patterns and insights in the given data. It can be
compared to the learning which takes place in the human brain while learning new things. It can be defined
as: unsupervised learning is a type of machine learning in which models are trained using an unlabeled
dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification problem because unlike
supervised learning, we have the input data but no corresponding output data. The goal of unsupervised
learning is to find the underlying structure of dataset, group that data according to similarities, and
represent that dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of
different types of cats and dogs. The algorithm is never trained upon the given dataset, which means it
does not have any idea about the features of the dataset. The task of the unsupervised learning
algorithm is to identify the image features on their own. Unsupervised learning algorithm will perform
this task by clustering the image dataset into the groups according to similarities between images.

Working of Unsupervised Learning


Working of unsupervised learning can be understood by the below diagram:

Here, we have taken unlabeled input data, which means it is not categorized and corresponding
outputs are also not given. Now, this unlabeled input data is fed to the machine learning model in order
to train it. Firstly, it will interpret the raw data to find the hidden patterns in the data and then will
apply suitable algorithms such as k-means clustering, Decision tree, etc.

Once it applies the suitable algorithm, the algorithm divides the data objects into groups according to
the similarities and difference between the objects.

Types of Unsupervised Learning Algorithm:

The unsupervised learning algorithm can be further categorized into two types of problems:

o Clustering: Clustering algorithms group similar data points together based on their
characteristics or proximity in the feature space. K-means clustering, hierarchical clustering, and
DBSCAN are popular clustering algorithms. Clustering can be useful for customer segmentation,
image segmentation, document clustering, and more.
o Association: An association rule is an unsupervised learning method that is used for finding
relationships between variables in a large database. It determines the sets of items that
occur together in the dataset. Association rules make marketing strategies more effective; for
example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A
typical example of association rules is Market Basket Analysis.

Unsupervised Learning algorithms:


Below is the list of some popular unsupervised learning algorithms:

o K-means clustering

o KNN (k-nearest neighbors)

o Hierarchical clustering

o Anomaly detection

o Neural Networks

o Principal Component Analysis

o Independent Component Analysis

o Apriori algorithm

o Singular value decomposition

Advantages of Unsupervised Learning

o Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labeled input data.

o Unsupervised learning is preferable as it is easier to obtain unlabeled data than labeled data.

Disadvantages of Unsupervised Learning

o Unsupervised learning is intrinsically more difficult than supervised learning as it does not have
corresponding output.

o The result of the unsupervised learning algorithm might be less accurate as input data is not
labeled, and algorithms do not know the exact output in advance.

Unsupervised machine learning is a branch of machine learning where the algorithm learns from
unlabeled data to discover patterns, relationships, or structures within the dataset. Unlike supervised
learning, which requires labeled examples for training, unsupervised learning operates on raw,
unclassified data.
The primary objective of unsupervised learning is to gain insights into the underlying structure of the
data and extract meaningful information without prior knowledge or guidance. It can be used for tasks
such as data exploration, clustering and dimensionality reduction.

Let's explore some common techniques used in unsupervised machine learning:

1. Clustering: Clustering algorithms group similar data points together based on their
characteristics or proximity in the feature space. K-means clustering, hierarchical clustering, and
DBSCAN are popular clustering algorithms. Clustering can be useful for customer segmentation,
image segmentation, document clustering, and more.

2. Dimensionality Reduction: Dimensionality reduction methods aim to reduce the number of


features or variables in a dataset while preserving its essential information. Techniques like
Principal Component Analysis (PCA) and t-SNE (t-Distributed Stochastic Neighbor Embedding)
are commonly used for dimensionality reduction. These methods help visualize
high-dimensional data, remove noise, and improve computational efficiency.

Unsupervised learning techniques are valuable when dealing with large, unlabeled datasets, as they can
uncover hidden patterns, extract useful representations, and provide a foundation for subsequent
analysis or decision-making. However, evaluation and interpretation of unsupervised models can be
subjective and challenging due to the absence of explicit ground truth labels.

K-Means Clustering
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into
different clusters. Here K defines the number of pre-defined clusters that need to be created in the
process, as if K=2, there will be two clusters, and for K=3, there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that
each data point belongs to only one group of points with similar properties.

It allows us to cluster the data into different groups and is a convenient way to discover the categories of
groups in an unlabeled dataset on its own, without the need for any training.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the
algorithm is to minimize the sum of distances between each data point and the centroid of its cluster.

The algorithm takes the unlabeled dataset as input, divides it into k clusters, and
repeats the process until it finds the best clusters. The value of k should be predetermined in
this algorithm.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best value for K center points or centroids by an iterative process.

o Assigns each data point to its closest k-center. The data points that are nearest to a particular
k-center form a cluster.
Hence each cluster contains data points with some commonalities and is separated from the other clusters.

The below diagram explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the below steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points or centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.

Step-4: Calculate the variance and place a new centroid for each cluster.

Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of its
cluster.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Step-7: The model is ready.

Let's understand the above steps by considering the visual plots:

Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables is given
below:

o Let's take number k of clusters, i.e., K=2, to identify the dataset and to put them into different
clusters. It means here we will try to group these datasets into two different clusters.
o We need to choose some random k points or centroid to form the cluster. These points can be
either the points from the dataset or any other point. So, here we are selecting the below two
points as k points, which are not the part of our dataset. Consider the below image:

o Now we will assign each data point of the scatter plot to its closest K-point or centroid. We will
compute it by applying some mathematics that we have studied to calculate the distance
between two points. So, we will draw a median between both the centroids. Consider the below
image:

From the above image, it is clear that points on the left side of the line are near the K1 or blue centroid, and
points to the right of the line are close to the yellow centroid. Let's color them blue and yellow for
clear visualization.

o As we need to find the closest clusters, we will repeat the process by choosing new centroids.
To choose the new centroids, we will compute the center of gravity (mean) of the points in each cluster, and will
obtain the new centroids as below:

o Next, we will reassign each datapoint to the new centroid. For this, we will repeat the same
process of finding a median line. The median will be like below image:

From the above image, we can see that one yellow point is on the left side of the line and two blue points
are to the right of the line, so these three points will be assigned to new centroids.

As reassignment has taken place, we will again go to step-4, which is finding new centroids or
K-points.
o We will repeat the process by finding the center of gravity of the points in each cluster, so the new centroids will
be as shown in the below image:

o As we have got the new centroids, we will again draw the median line and reassign the data points. The
resulting image will be:

o We can see in the above image that there are no data points on the wrong side of the line,
which means our model is formed. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and the two final clusters will be
as shown in the below image:
Numerical example of k means clustering

Sr. No. Age Amount

C1 20 500

C2 40 1000

C3 30 800

C4 18 300

C5 28 1200

C6 35 1400

C7 45 1800
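A short scikit-learn sketch using the seven customers above (K = 2 is an illustrative choice; the resulting labels depend on the initial centroids):

# K-Means on the Age / Amount data from the table above (K = 2).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[20, 500], [40, 1000], [30, 800], [18, 300],
              [28, 1200], [35, 1400], [45, 1800]])   # rows correspond to C1..C7

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)            # cluster assigned to each of C1..C7
print(kmeans.cluster_centers_)   # final centroids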

How to choose the value of "K number of clusters" in K-means Clustering?

The performance of the K-means clustering algorithm depends upon highly efficient clusters that it
forms. But choosing the optimal number of clusters is a big task. There are some different ways to find
the optimal number of clusters, but here we are discussing the most appropriate method to find the
number of clusters or value of K. The method is given below:

Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters. This method
uses the concept of WCSS value. WCSS stands for Within Cluster Sum of Squares, which defines the total
variations within a cluster. The formula to calculate the value of WCSS (for 3 clusters) is given below:

WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²

In the above formula for WCSS,

Σ(Pi in Cluster1) distance(Pi, C1)² is the sum of the squared distances between each data point and its
centroid within cluster 1, and the same holds for the other two terms.

To measure the distance between data points and centroid, we can use any method such as Euclidean
distance or Manhattan distance.

To find the optimal value of clusters, the elbow method follows the below steps:

o It executes the K-means clustering on a given dataset for different K values (ranges from 1-10).

o For each value of K, calculates the WCSS value.

o Plots a curve between calculated WCSS values and the number of clusters K.

o The sharp point of bend, where the plot looks like an arm (the "elbow"), is considered
the best value of K.
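A minimal sketch of these steps in code (WCSS is exposed as inertia_ in scikit-learn; with only seven points the loop stops at K = 7 instead of 10):

# Elbow method: compute and plot WCSS for a range of K values.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.array([[20, 500], [40, 1000], [30, 800], [18, 300],
              [28, 1200], [35, 1400], [45, 1800]])

wcss = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)             # within-cluster sum of squares for this K

plt.plot(range(1, 8), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.show()                               # the sharp bend (elbow) suggests the best K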

Hierarchical Clustering
Hierarchical clustering is a connectivity-based clustering model that groups the data points
together that are close to each other based on the measure of similarity or distance. The
assumption is that data points that are close to each other are more similar or related than data
points that are farther apart.
Types of Hierarchical Clustering
Basically, there are two types of hierarchical Clustering:
1. Agglomerative Clustering (Dendrogram Clustering)
2. Divisive clustering
Hierarchical Agglomerative Clustering
The agglomerative hierarchical clustering algorithm is a popular example of HCA (hierarchical
clustering analysis). To group the data into clusters, it follows a bottom-up approach: the
algorithm treats each data point as a single cluster at the beginning and then starts
combining the closest pairs of clusters. It does this until all clusters are merged into
a single cluster that contains all the data points.
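A hedged sketch of agglomerative clustering and its dendrogram with SciPy (reusing the small customer dataset from the K-means example; Ward linkage is an illustrative choice):

# Agglomerative (bottom-up) hierarchical clustering and dendrogram with SciPy.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[20, 500], [40, 1000], [30, 800], [18, 300],
              [28, 1200], [35, 1400], [45, 1800]])

Z = linkage(X, method="ward")    # repeatedly merges the two closest clusters
dendrogram(Z, labels=["C1", "C2", "C3", "C4", "C5", "C6", "C7"])
plt.show()                       # the tree shows the order in which clusters were merged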

Applications of Hierarchical Clustering

Hierarchical clustering has a wide range of applications across various
domains. Some common applications include:
1. Customer Segmentation: Hierarchical clustering can be used to segment customers
based on their purchasing behavior, preferences, or demographic data. This segmentation
can help businesses tailor their marketing strategies and provide personalized
recommendations to different customer groups.
2. Image Segmentation: Hierarchical clustering can be applied to segment images into
regions or objects based on their visual similarities. This is useful in image processing,
computer vision, and object recognition tasks.
3. Document Clustering: Hierarchical clustering can group similar documents together
based on their content, allowing for topic extraction, information retrieval, and document
organization.
4. Gene Expression Analysis: Hierarchical clustering can be used to analyze gene
expression data and identify patterns or clusters of genes with similar expression profiles.
This helps in understanding gene functions, and genetic relationships, and identifying
potential biomarkers.
5. Anomaly Detection: By clustering normal or expected data points, hierarchical
clustering can help identify anomalies or outliers that do not fit into any cluster. This is
useful in fraud detection, network intrusion detection, and outlier detection in various
domains.
6. Recommender Systems: Hierarchical clustering can assist in building recommender
systems by grouping similar users or items based on their preferences or behavior. This
enables personalized recommendations for users and helps in improving user experience
and engagement.
7. Social Network Analysis: Hierarchical clustering can be applied to analyze social
networks and identify communities or groups of individuals with similar social
connections or interests. This aids in understanding social structures, influence
propagation, and targeted marketing.
8. Ecology and Species Classification: Hierarchical clustering can help classify and group
species based on their ecological characteristics, genetic traits, or geographic distribution.
This supports biodiversity studies, conservation efforts, and ecosystem analysis.
9. Market Segmentation: Hierarchical clustering can be utilized to segment markets based
on consumer behavior, demographics, or geographical factors. This assists businesses in
identifying target markets, designing marketing campaigns, and optimizing product
offerings.
10. Text Analysis and Natural Language Processing: Hierarchical clustering can be used
to cluster documents, text snippets, or words based on their semantic similarities. This
aids in text summarization, topic modeling, sentiment analysis, and information retrieval.

Divisive Clustering:
• Divisive clustering takes the opposite approach compared to agglomerative clustering.

• It starts with all data points belonging to a single cluster.


• At each step, it divides the cluster into two smaller clusters based on some criteria, such
as maximizing the inter-cluster dissimilarity or minimizing the intra-cluster dissimilarity.
• This process continues recursively, splitting clusters into smaller clusters until each data
point is in its own cluster or until a stopping criterion is met.

The result of hierarchical clustering is represented as a dendrogram, which is a tree-like structure


that illustrates the sequence of cluster merging or splitting. The dendrogram provides a visual
representation of the hierarchy of clusters and allows us to interpret the relationships and
similarities between different clusters.
Hierarchical clustering has several advantages. It does not require the number of clusters to be
specified in advance, making it a flexible approach. It also captures the inherent structure of the
data by forming nested clusters. Additionally, hierarchical clustering can be useful for
exploratory data analysis and visualization.

Divisive clustering is a hierarchical clustering method that's used in machine learning to identify
large clusters. It's used in the following ways:
• Alternative to k-means: Divisive clustering can be used as an alternative to k-means
clustering.
• Faster than agglomerative clustering: Divisive clustering can be faster than agglomerative
clustering because it only takes O(N) time if the number of levels is constant.
• Splits based on all the data: Divisive clustering makes splitting decisions based on all
the data points, while bottom-up methods make myopic merge decisions.
Advantages of Hierarchical clustering
o It is simple to implement and gives the best output in some cases.
o It is easy and results in a hierarchy, a structure that contains more information.
o It does not need us to pre-specify the number of clusters.
Disadvantages of hierarchical clustering
o It breaks large clusters.
o It is difficult to handle clusters of different sizes and convex shapes.
o It is sensitive to noise and outliers.
o A merge or split, once performed, can never be undone or changed later.

Density-Based Clustering

Density-Based Clustering refers to unsupervised learning methods that identify distinctive


groups/clusters in the data, based on the idea that a cluster in data space is a contiguous region of
high point density, separated from other such clusters by contiguous regions of low point
density.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a base algorithm for
density-based clustering. It can discover clusters of different shapes and sizes from a large
amount of data, which is containing noise and outliers.
The DBSCAN algorithm uses two parameters:
• minPts: the minimum number of points required inside the circle (neighbourhood) for it to be considered dense, e.g. 3.
• eps (Epsilon): the radius of the circle formed with a data object as its centre.

There are three types of points after DBSCAN clustering is complete:

• Core — a point whose neighbourhood contains at least minPts points.
• Border — a point that is not a core point but is a neighbour of a core point.
• Noise — a point that is neither a core point nor a border point.
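A minimal scikit-learn sketch of DBSCAN (the dataset, eps and min_samples values are illustrative):

# DBSCAN with scikit-learn (synthetic data; min_samples plays the role of minPts).
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=3).fit(X)

print(db.labels_)    # cluster index for each point; noise points are labelled -1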

Content-based recommender systems

• It is a machine learning technique that uses similarities in features to make decisions.

• The algorithm is designed to advertise or recommend things to a user based on knowledge
accumulated about that user.
• It compares the user's interests to product features.
• The items whose features overlap most with the user's interests are what is recommended.
• The algorithm can keep track of the products the user has chosen before and add their
features to the user's profile.
Example: protein powder recommendations on Amazon

User1 watched movies M1 and M2, giving M1 a rating of 5 and M2 a rating of 4. Both are
romantic movies, or they are by the same director. Then movie M3 was released, which user1
hasn't seen yet. Based on their interests, we will recommend this movie to user1 because it is also
a romantic movie and it is also directed by the same director.
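A hedged sketch of the idea: represent the user profile and each movie as feature vectors and recommend the unseen movie whose features overlap most with the profile (the feature encoding and cosine similarity are illustrative choices, not from the notes):

# Content-based recommendation: compare a user profile to item feature vectors.
import numpy as np

# Features per movie: [romantic, same_director, action] (illustrative encoding).
movies = {"M1": np.array([1, 1, 0]),
          "M2": np.array([1, 1, 0]),
          "M3": np.array([1, 1, 0]),    # new, unseen movie
          "M4": np.array([0, 0, 1])}    # extra movie added for contrast

# Build user1's profile from the movies they rated (M1: 5 stars, M2: 4 stars).
profile = 5 * movies["M1"] + 4 * movies["M2"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

for name in ["M3", "M4"]:    # score only the movies the user has not seen
    print(name, cosine(profile, movies[name]))   # M3 scores highest, so M3 is recommended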

Collaborative Filtering:

This technique is frequently used in recommender systems to identify similarities between
users and items.
If users A and B both like product X, and user B also likes product Y, then product Y could be
recommended to user A by the system.
The model keeps track of which products users like and of their characteristics.

User 1 User 2 User 3 User 4


Product 1 1 1
Product 2 1
Product 3 1 1
Product 4 1

Based on the common interests of user3 and user4, a product was recommended to user3.
According to the interests of user3 and user2, a product was recommended to user2.
Another Example:

User1 watched movie M1 and gave it 5 stars. User2 also watched movie M1 and gave it 5 stars.
Then, user1 watched movie M2 and gave it 4 stars, while user2 also watched movie M2 and gave
it 5 stars. User1 then watched movie M3 and gave it 4 stars, but user2 hasn't seen movie M3. Based
on their common interests up to movie M2, we will recommend movie M3 to user2, because
there is a high likelihood that user2 will also like movie M3.
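A small user-based collaborative filtering sketch of this idea (the ratings matrix and the similarity-weighted prediction are an illustrative simplification, assuming the similar user has rated M3):

# User-based collaborative filtering on a tiny ratings matrix (0 = not rated).
import numpy as np

# Rows: user1, user2; columns: M1, M2, M3.
ratings = np.array([[5.0, 4.0, 4.0],    # user1 has rated all three movies
                    [5.0, 5.0, 0.0]])   # user2 has not seen M3

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Similarity between the two users on the movies both have rated (M1 and M2).
sim = cosine(ratings[0, :2], ratings[1, :2])
print(sim)                      # close to 1.0 -> the users have common taste

# Predict user2's interest in M3 from user1's rating, weighted by the similarity.
predicted = sim * ratings[0, 2]
print(predicted)                # a high value -> recommend M3 to user2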
