
Unit-4 Learning VI Sem BCA (Artificial Intelligence)

Learning
Machine learning is the branch of artificial intelligence that focuses on developing
models and algorithms that let computers learn from data and improve with experience,
without being explicitly programmed for every task. In simple words, ML teaches systems to
learn from data, much as humans learn from experience.

In artificial intelligence (AI), there are several types of learning methodologies that systems use
to acquire knowledge and improve their performance over time. The main types of learning in
AI include:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

Supervised Learning
Definition: Supervised learning is a type of machine learning where the model is trained using
labeled data. This means that for each input data point, there is a corresponding output label.
How It Works:
1. Training Phase: The algorithm learns from a training dataset, which includes input-
output pairs. The model makes predictions on the inputs and adjusts itself based on the
errors it makes.
2. Testing Phase: The trained model is tested on a separate dataset (test data) to evaluate
its performance.
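A minimal sketch of these two phases using scikit-learn; the dataset and classifier here are
illustrative choices, not the only possibilities:

# Supervised learning sketch: train on labeled pairs, then test on held-out data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)              # inputs and their labels

# Training phase: fit the model on labeled input-output pairs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Testing phase: evaluate performance on data the model has not seen
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))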
Examples:
 Image Classification: Consider a dataset of images where each image is labeled as
either a 'cat' or 'dog'. The model is trained to classify new images correctly.
 Spam Detection: Email messages are labeled as 'spam' or 'not spam'. The model learns
to classify incoming emails accordingly.
 Regression: Predicting the price of a house based on features like size, location, and
number of rooms. Here, the output is a continuous value.

Real-World Application:
 Healthcare: Predicting diseases from medical imaging.
 Finance: Stock price prediction and credit scoring.

Advantages:
1. High Accuracy: Models can achieve high accuracy with sufficient labeled data.
2. Interpretability: The relationship between input and output is well-defined, making it
easier to understand and interpret model predictions.
3. Wide Range of Applications: Suitable for various tasks, such as classification and
regression, across different domains.
Disadvantages:
1. Data Labeling: Requires a large amount of labeled data, which can be time-consuming
and expensive to obtain.
2. Overfitting: Models can overfit to the training data, leading to poor generalization on
new data.
3. Limited to Known Tasks: Can only be used for tasks where labeled data is available.

Unsupervised Learning
Definition: Unsupervised learning involves training a model on data without labeled
responses. The goal is to uncover hidden patterns or intrinsic structures in the input data.
How It Works:
1. Data Analysis: The algorithm tries to learn the underlying structure or distribution in
the data.
2. Clustering: Grouping similar data points together.
3. Dimensionality Reduction: Reducing the number of random variables under
consideration.
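As a minimal sketch, both ideas take only a few lines with scikit-learn; the random data
here is purely illustrative:

# Unsupervised learning sketch: clustering and dimensionality reduction
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))        # unlabeled data with 5 features

# Clustering: group similar data points into 3 clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: keep 2 components that retain most variance
X_2d = PCA(n_components=2).fit_transform(X)
print(labels[:10], X_2d.shape)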
Examples:
 Clustering: Grouping customers based on purchasing behavior without knowing in
advance what the groups might be. Algorithms like k-means or hierarchical clustering
are used.
 Anomaly Detection: Identifying unusual data points, such as fraudulent transactions,
that do not conform to the rest of the dataset.
 Principal Component Analysis (PCA): Reducing the dimensionality of a dataset while
retaining most of the variance. For example, compressing high-dimensional image data.
Real-World Application:
 Marketing: Customer segmentation for targeted marketing campaigns.
 Biology: Classifying species based on genetic information.

Advantages:
1. No Need for Labeled Data: Can work with unlabeled data, which is often more readily
available.
2. Discover Hidden Patterns: Useful for exploring data and finding hidden structures,
such as clustering and associations.
3. Data Preprocessing: Effective for tasks like dimensionality reduction and anomaly
detection.
Disadvantages:
1. Uncertainty in Results: The results can be difficult to interpret and validate since there
are no labels.
2. Less Control: Less control over the output compared to supervised learning as it is more
exploratory.
3. Complexity: Some algorithms can be computationally intensive and require significant
computational resources.

Semi-Supervised Learning
Definition: Semi-supervised learning is a hybrid approach that combines a small amount of
labeled data with a large amount of unlabeled data during training. This method can
significantly improve learning accuracy when acquiring labeled data is costly.
How It Works:
1. Initial Training: The model is first trained on the small labeled dataset.
2. Expansion: The model then uses this initial knowledge to label the large unlabeled
dataset.
3. Refinement: The model is retrained on the newly labeled dataset.
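One way to realise these three steps is scikit-learn's SelfTrainingClassifier; in this
minimal sketch the dataset and base classifier are illustrative, and unlabeled points are
marked with -1 as the library expects:

# Semi-supervised self-training sketch
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
y_partial = y.copy()
y_partial[100:] = -1      # pretend only the first 100 labels are known

# Initial training on the labeled subset, then expansion: confident
# predictions on unlabeled points become labels, and the model is refit
base = SVC(probability=True, gamma=0.001)
model = SelfTrainingClassifier(base).fit(X, y_partial)
print("Points labeled during training:", int((model.transduction_ != -1).sum()))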
Examples:
 Text Classification: Using a small set of labeled documents to classify a much larger set
of unlabeled documents.
 Image Recognition: Training a model on a few labeled images and a large number of
unlabeled images to improve classification accuracy.

Real-World Application:
 Web Content Classification: Classifying vast amounts of web pages with minimal
labeled data.
 Medical Imaging: Labeling a small number of medical images and using a large set of
unlabeled images to train more accurate diagnostic models.

[Figure: Semi-supervised learning]

Advantages:
1. Cost-Effective: Reduces the need for large labeled datasets by leveraging a small
amount of labeled data with a large amount of unlabeled data.
2. Improved Accuracy: Can improve model performance compared to unsupervised
learning alone by using labeled data.
3. Scalable: Can scale to large datasets where full labeling is impractical.
Disadvantages:
1. Complex Implementation: Combining labeled and unlabeled data effectively can be
complex and may require advanced techniques.
2. Quality of Unlabeled Data: The success depends on the quality and representativeness
of the unlabeled data.
3. Data Integration: Integrating labeled and unlabeled data can be challenging and might
require careful preprocessing.

Reinforcement Learning
Definition: Reinforcement learning (RL) is a type of machine learning where an agent learns to
make decisions by performing actions in an environment to achieve maximum cumulative
reward. Unlike supervised learning, RL does not require labeled input/output pairs and instead
learns from the consequences of actions.
How It Works:
1. Agent and Environment: The agent interacts with the environment by taking actions.
2. Reward and Feedback: The environment provides feedback in the form of rewards or
punishments.
3. Policy and Value Function: The agent uses a policy to decide actions and a value
function to evaluate the goodness of states or state-action pairs.
4. Learning and Exploration: The agent continuously explores and exploits to maximize
the cumulative reward over time.
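To make the loop concrete, here is a minimal tabular Q-learning sketch on an invented
5-state corridor; the environment, reward, and hyperparameters are all illustrative:

# Q-learning sketch: the agent starts at state 0 and is rewarded at state 4
import numpy as np

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != 4:
        # Exploration vs. exploitation: epsilon-greedy action choice
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0    # reward only at the goal
        # Update the value estimate toward reward + discounted future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))                    # learned policy, mostly "right" (1)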
Examples:
 Game Playing: Training a model to play games like Chess, Go, or video games where the
agent learns strategies to win. For instance, DeepMind’s AlphaGo used reinforcement
learning to become the best Go player in the world.
 Robotics: Teaching a robot to navigate and manipulate objects in its environment to
complete tasks like assembling products.
 Self-Driving Cars: The car learns to make driving decisions (e.g., when to stop, turn, or
accelerate) to reach a destination safely and efficiently.
Real-World Application:
 Finance: Algorithmic trading where the agent learns to make trading decisions to
maximize profit.
 Healthcare: Personalized treatment plans where the agent learns to recommend
treatment sequences for patients to maximize health outcomes.

[Figure: Reinforcement learning agent-environment loop]


Advantages:
1. Adaptability: Can adapt to changing environments and learn optimal policies through
trial and error.
2. Sequential Decision Making: Suitable for tasks that involve a sequence of decisions,
such as game playing and robotics.
3. No Need for Supervised Labels: Learns from the consequences of actions rather than
needing labeled data.
Disadvantages:
1. Computationally Expensive: Training can be computationally intensive and time-
consuming, especially for complex environments.
2. Exploration vs. Exploitation: Balancing exploration (trying new actions) and
exploitation (using known actions) can be challenging.
3. Reward Design: Designing an appropriate reward signal can be difficult and
significantly impacts learning efficiency and effectiveness.

Artificial Neural Network (ANN)


Artificial Neural Networks (ANNs) are a fundamental component of artificial intelligence (AI)
and machine learning, inspired by the structure and function of the human brain. They are
designed to recognize patterns and learn from data through a process that mimics the way
human neurons operate. This detailed explanation will cover the architecture, functioning,
types, training process, applications, and advantages and disadvantages of ANNs.

Architecture of Artificial Neural Networks

[Figure: Neural network architecture]

An ANN consists of layers of nodes (neurons), where each node represents a computational
unit. The basic architecture includes:
1. Input Layer: This layer receives the input data. Each neuron in the input layer
represents one feature of the data.
2. Hidden Layers: These layers perform the computations and feature extraction. There
can be one or more hidden layers, and each neuron in a hidden layer is connected to
every neuron in the previous and subsequent layers. The more hidden layers an ANN
has, the deeper it is, and such networks are called Deep Neural Networks (DNNs).
3. Output Layer: This layer produces the final output. The number of neurons in the
output layer corresponds to the number of possible output classes or the nature of the
prediction.
Functioning of Artificial Neural Networks
Neurons and Activation Functions:
 Neurons: Each neuron receives inputs, processes them using weights, biases, and an
activation function, and passes the result to the next layer.
 Weights and Biases: Weights determine the importance of each input, while biases
allow shifting the activation function.
 Activation Functions: These functions introduce non-linearity into the network,
enabling it to learn complex patterns. Common activation functions include Sigmoid,
Tanh, ReLU (Rectified Linear Unit), and Softmax.

Forward Propagation: During forward propagation, input data passes through the network
layer by layer. Each neuron's output is calculated using the formula:
Output = Activation Function(∑(Input × Weight) + Bias)
Loss Function: The loss function measures the difference between the predicted output and
the actual output. Common loss functions include Mean Squared Error (MSE) for regression
tasks and Cross-Entropy Loss for classification tasks.
Backward Propagation: Backpropagation is the process of updating the weights and biases to
minimize the loss function. It involves calculating the gradient of the loss function with respect
to each weight using the chain rule of calculus and adjusting the weights in the opposite
direction of the gradient.
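To make these formulas concrete, here is a tiny numpy sketch of a single sigmoid neuron:
one forward pass, an MSE loss, and one backpropagation (gradient) step. All numbers are
illustrative:

# One neuron: forward propagation, loss, and a backpropagation step
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.8])        # inputs
w = np.array([0.1, -0.3])       # weights
b = 0.05                        # bias
target = 1.0

# Forward propagation: Output = Activation(sum(Input * Weight) + Bias)
out = sigmoid(np.dot(x, w) + b)

# Loss: (half) squared error between prediction and target
loss = 0.5 * (out - target) ** 2

# Backpropagation: the chain rule gives the gradients, then step against them
delta = (out - target) * out * (1 - out)   # dLoss/dz via the sigmoid derivative
w -= 0.5 * delta * x                       # learning rate 0.5
b -= 0.5 * delta
print(loss, w, b)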

Types of Artificial Neural Networks


1. Feedforward Neural Networks (FNNs): The simplest type, where connections between
the nodes do not form cycles. Information moves in one direction—from input to output.
2. Convolutional Neural Networks (CNNs): Primarily used for image processing. CNNs
use convolutional layers that apply filters to the input to extract spatial features.
3. Recurrent Neural Networks (RNNs): Suitable for sequential data like time series or
text. RNNs have connections that form directed cycles, enabling them to maintain a
memory of previous inputs.
4. Long Short-Term Memory Networks (LSTMs): A type of RNN designed to handle long-
term dependencies. LSTMs use gates to control the flow of information and mitigate the
vanishing gradient problem.

5. Generative Adversarial Networks (GANs): Consist of two networks—a generator and
a discriminator—that compete against each other. GANs are used for generating realistic
synthetic data.

Training Process
1. Data Preparation: Gather and preprocess the data, which includes normalization,
scaling, and splitting into training, validation, and test sets.
2. Initialization: Initialize the weights and biases, usually with small random values.
3. Forward Pass: Compute the output of the network by passing the input data through all
layers.
4. Loss Calculation: Calculate the loss using the loss function.
5. Backward Pass (Backpropagation): Compute the gradient of the loss with respect to
each weight and bias, and update them using optimization algorithms like Gradient
Descent or its variants (e.g., Adam, RMSprop).
6. Iteration: Repeat the forward and backward passes for multiple epochs until the model
converges (i.e., the loss function stops decreasing significantly).
Applications of Artificial Neural Networks
1. Image and Video Recognition: ANNs, especially CNNs, are extensively used for image
classification, object detection, and facial recognition.
2. Natural Language Processing (NLP): RNNs and their variants (LSTMs, GRUs) are used
for tasks like language translation, sentiment analysis, and text generation.
3. Speech Recognition: ANNs are used to convert spoken language into text.
4. Healthcare: ANNs help in diagnosing diseases, analyzing medical images, and predicting
patient outcomes.
5. Finance: Used for stock price prediction, fraud detection, and credit scoring.
6. Autonomous Systems: Self-driving cars use ANNs for perception, decision-making, and
control.
7. Game Playing: ANNs, combined with reinforcement learning, have been used to develop
agents that play complex games like Go and Chess.

Advantages of Artificial Neural Networks


1. Capability to Learn Complex Patterns: ANNs can model and learn intricate patterns in
data, making them suitable for a wide range of tasks.
2. Scalability: ANNs can handle large datasets and scale well with more data and
computational resources.
3. Versatility: Applicable to various domains, including vision, speech, and natural
language processing.
4. Non-Linearity: Activation functions introduce non-linearity, enabling ANNs to learn
non-linear relationships in data.
Disadvantages of Artificial Neural Networks
1. Computationally Intensive: Training large neural networks requires significant
computational power and memory.
2. Black Box Nature: ANNs are often criticized for being opaque; it can be challenging to
interpret and understand how they make decisions.
3. Overfitting: With complex models, there's a risk of overfitting, where the network
performs well on training data but poorly on unseen data.
4. Data Hungry: ANNs require large amounts of labeled data for training, which may not
always be available.
5. Training Time: Training deep networks can be time-consuming, especially with large
datasets and complex architectures.

Support Vector Machine


Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression. Although it can handle regression problems as well, it is best
suited for classification. The main objective of the SVM algorithm is to find the optimal
hyperplane in an N-dimensional space that can separate the data points of different classes
in the feature space. The hyperplane is chosen so that the margin between the closest points
of different classes is as large as possible. The dimension of the hyperplane depends upon
the number of features. If the number of input features is two, then the hyperplane is just a
line. If the number
of input features is three, then the hyperplane becomes a 2-D plane. It becomes difficult to
imagine when the number of features exceeds three.
Let’s consider two independent variables x1, x2, and one dependent variable which is either a
blue circle or a red circle.

[Figure: Linearly separable data points]

From the figure above it is clear that there are multiple lines (our hyperplane here is a line
because we are considering only two input features, x1 and x2) that segregate our data points,
i.e. classify the red and blue circles. So how do we choose the best line, or in general the best
hyperplane, that segregates our data points?
How does SVM work?
One reasonable choice as the best hyperplane is the one that represents the largest separation
or margin between the two classes.

[Figure: Multiple hyperplanes separating the two classes]

So we choose the hyperplane whose distance to the nearest data point on each side is
maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane or hard
margin. So from the figure above, we choose L2. Now let's consider a scenario like the one shown below.

[Figure: Selecting a hyperplane for data with an outlier]

Here we have one blue ball within the boundary of the red balls. So how does SVM classify
the data? The blue ball among the red ones is an outlier of the blue class. The SVM algorithm
has the ability to ignore the outlier and find the best hyperplane that maximizes the margin,
so SVM is robust to outliers.

[Figure: The most optimized hyperplane]

For this type of data, SVM finds the maximum margin as with the previous datasets, and in
addition adds a penalty each time a point crosses the margin. The margins in such cases are
called soft margins. When there is a soft margin, the SVM tries to minimize
(1/margin) + λ(∑penalty). Hinge loss is a commonly used penalty: if there are no violations
there is no hinge loss, and if there are violations the hinge loss is proportional to the
distance of the violation.
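A short sketch of that penalty, assuming labels of ±1 and decision values w·x + b produced
by the classifier:

# Hinge loss: zero for points on the correct side with margin >= 1,
# otherwise proportional to the distance of the violation
import numpy as np

def hinge_loss(y, decision_value):
    # y in {-1, +1}; decision_value = w.x + b
    return np.maximum(0.0, 1.0 - y * decision_value)

y = np.array([+1, +1, -1, -1])
f = np.array([2.3, 0.4, -1.7, 0.2])    # illustrative decision values
print(hinge_loss(y, f))                # [0.  0.6 0.  1.2]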

Till now, we were talking about linearly separable data (the groups of blue balls and red
balls are separable by a straight line). What should we do if the data are not linearly separable?

[Figure: Original 1D dataset for classification]

Say our data is as shown in the figure above. SVM solves this by creating a new variable
using a kernel. Take a point xi on the line and create a new variable yi as a function of its
distance from the origin O. If we plot this, we get something like the figure shown below.

[Figure: Mapping 1D data to 2D so the two classes become separable]

In this case, the new variable y is created as a function of distance from the origin. A non-linear
function that creates a new variable is referred to as a kernel.
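A small sketch of the idea: on an invented 1D dataset, the derived variable y = x**2 (one
possible distance-based feature) makes the two classes separable by a horizontal line:

# Mapping non-separable 1D data into 2D with a new kernel-style variable
import numpy as np

x = np.array([-4, -3, -2, 2, 3, 4, -1, 0, 1])   # class B lies between class A
labels = np.array(["A"] * 6 + ["B"] * 3)

y = x ** 2        # new variable derived from the distance to the origin

# In (x, y) space a horizontal line such as y = 2 now separates the classes
for xi, yi, c in zip(x, y, labels):
    print(f"x={xi:+d}  y={yi:2d}  class={c}")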

Support Vector Machine Terminology


1. Hyperplane: Hyperplane is the decision boundary that is used to separate the data
points of different classes in a feature space. In the case of linear classifications, it will be
a linear equation i.e. wx+b = 0.
2. Support Vectors: Support vectors are the closest data points to the hyperplane, which
play a critical role in deciding the hyperplane and the margin.
3. Margin: Margin is the distance between the support vector and hyperplane. The main
objective of the support vector machine algorithm is to maximize the margin. A wider
margin indicates better classification performance.
4. Kernel: A kernel is a mathematical function used in SVM to map the original input data
points into high-dimensional feature spaces, so that the hyperplane can be found even if the
data points are not linearly separable in the original input space. Some of the common kernel
functions are linear, polynomial, radial basis function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane is a
hyperplane that properly separates the data points of different categories without any
misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits
a soft margin technique. Each data point has a slack variable introduced by the soft-
margin SVM formulation, which softens the strict margin requirement and permits
certain misclassifications or violations. It discovers a compromise between increasing
the margin and reducing violations.
7. C: The regularisation parameter C balances margin maximisation against misclassification
penalties. It decides the penalty for crossing the margin or misclassifying data points. A
greater value of C imposes a stricter penalty, which results in a smaller margin and perhaps
fewer misclassifications.
8. Hinge Loss: A typical loss function in SVMs is hinge loss. It punishes incorrect
classifications or margin violations. The objective function in SVM is frequently formed
by combining it with the regularisation term.
9. Dual Problem: SVM can be solved through the dual of the optimisation problem, which
involves finding the Lagrange multipliers associated with the support vectors. The dual
formulation enables the use of kernel tricks and more efficient computation.

Types of Support Vector Machine


Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided
into two main parts:

 Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of
different classes. When the data can be precisely linearly separated, linear SVMs are very
suitable. This means that a single straight line (in 2D) or a hyperplane (in higher
dimensions) can entirely divide the data points into their respective classes. A
hyperplane that maximizes the margin between the classes is the decision boundary.
 Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be
separated into two classes by a straight line (in the case of 2D). By using kernel
functions, nonlinear SVMs can handle nonlinearly separable data. The original input data
is transformed by these kernel functions into a higher-dimensional feature space, where
the data points can be linearly separated. A linear SVM is used to locate a nonlinear
decision boundary in this modified space.

Advantages of SVM
 Effective in high-dimensional cases.
 Its memory use is efficient, as it uses a subset of training points in the decision
function, called support vectors.
 Different kernel functions can be specified for the decision function, and it is possible to
specify custom kernels.
SVM implementation in Python
Task: predict whether a cancer is benign or malignant. Historical data about patients
diagnosed with cancer, described by independent attributes, enables doctors (and the model)
to differentiate malignant cases from benign ones.
Steps
 Load the breast cancer dataset from sklearn.datasets.
 Separate the input features and the target variable.
 Build and train the SVM classifier using the RBF kernel.
 Plot the decision boundary.
 Plot the scatter plot of the input features.

# Load the important packages
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC

# Load the dataset; keep only the first two features so the
# decision boundary can be visualised in 2D
cancer = load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target

# Build the model
svm = SVC(kernel="rbf", gamma=0.5, C=1.0)
# Train the model
svm.fit(X, y)

# Plot the decision boundary
DecisionBoundaryDisplay.from_estimator(
    svm,
    X,
    response_method="predict",
    cmap=plt.cm.Spectral,
    alpha=0.8,
    xlabel=cancer.feature_names[0],
    ylabel=cancer.feature_names[1],
)

# Scatter plot of the input features, coloured by class
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()

Output:

[Figure: Breast cancer classification with an SVM (RBF kernel)]

Clustering in Machine Learning


In the real world, not all the data we work with has a target variable. Such data
cannot be analyzed using supervised learning algorithms; we need the help of unsupervised
algorithms. One of the most popular types of analysis under unsupervised learning is cluster
analysis. When the goal is to group similar data points in a dataset, we use cluster
analysis.

The task of grouping data points based on their similarity with each other is called
Clustering or Cluster Analysis. This method is defined under the branch of Unsupervised
Learning, which aims at gaining insights from unlabelled data points, that is, unlike supervised
learning we don’t have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous
dataset. It evaluates similarity based on a metric like Euclidean distance, cosine similarity,
Manhattan distance, etc., and then groups the points with the highest similarity scores together.
For example, in the graph given below, we can clearly see that there are 3 circular clusters
forming on the basis of distance.

[Figure: Three circular clusters formed on the basis of distance]
Now it is not necessary that the clusters formed must be circular in shape. The shape of
clusters can be arbitrary. There are many algorithms that work well with detecting arbitrarily
shaped clusters.
For example, in the graph given below, we can see that the clusters formed are not circular
in shape.

[Figure: Arbitrarily shaped, non-circular clusters]

Types of Clustering
Broadly speaking, there are 2 types of clustering that can be performed to group similar data
points:
 Hard Clustering: In this type of clustering, each data point either belongs to a cluster
completely or not at all. For example, let's say there are 4 data points and we have to cluster
them into 2 clusters. So each data point will either belong to cluster 1 or cluster 2.

Data Point    Cluster
A             C1
B             C2
C             C2
D             C1

 Soft Clustering: In this type of clustering, instead of assigning each data point to a
single cluster, a probability or likelihood of that point belonging to each cluster is
evaluated. For example, let's say there are 4 data points and we have to cluster them into 2
clusters. So we evaluate the probability of each data point belonging to both clusters. This
probability is calculated for all data points.

Data Point    Probability of C1    Probability of C2
A             0.91                 0.09
B             0.3                  0.7
C             0.17                 0.83
D             1                    0

Uses of Clustering
Now before we begin with types of clustering algorithms, we will go through the use cases of
Clustering algorithms. Clustering algorithms are majorly used for:
 Market Segmentation – Businesses use clustering to group their customers and use
targeted advertisements to attract a larger audience.
 Market Basket Analysis – Shop owners analyze their sales to figure out which items are
frequently bought together by customers. For example, according to a study in the USA,
diapers and beer were often bought together by fathers.
 Social Network Analysis – Social media sites use your data to understand your browsing
behavior and provide you with targeted friend recommendations or content
recommendations.
 Medical Imaging – Doctors use Clustering to find out diseased areas in diagnostic images
like X-rays.
 Anomaly Detection – Clustering can be used to find outliers in a real-time data stream or
to flag fraudulent transactions.
 Simplify working with large datasets – Each cluster is given a cluster ID after clustering
is complete. A data point's whole feature set can then be reduced to its cluster ID.
Clustering is effective when it can represent a complicated case with a straightforward
cluster ID, so clustering can make complex datasets simpler.

These are some of the major and most common use cases of clustering; there are many more.
Moving forward, we will discuss clustering algorithms that will help you perform the above
tasks.
Types of Clustering Algorithms
At the surface level, clustering helps in the analysis of unstructured data. Graphing, the shortest
distance, and the density of the data points are a few of the elements that influence cluster
formation. Clustering is the process of determining how related the objects are based on a
metric called the similarity measure. Similarity metrics are easier to locate in smaller sets of
features. It gets harder to create similarity measures as the number of features increases.
Depending on the type of clustering algorithm being utilized in data mining, several techniques
are employed to group the data from the datasets. In this part, the clustering techniques are
described. Various types of clustering algorithms are:
1. Centroid-based Clustering (Partitioning methods)
2. Density-based Clustering (Model-based methods)
3. Connectivity-based Clustering (Hierarchical clustering)
4. Distribution-based Clustering

Centroid-based Clustering (Partitioning methods)


Partitioning methods are the simplest clustering algorithms. They group data points
on the basis of their closeness. Generally, the similarity measure chosen for these algorithms
is Euclidean distance, Manhattan distance, or Minkowski distance. The dataset is separated
into a predetermined number of clusters, and each cluster is represented by a vector of
values (a centroid). Each input data point is assigned to the cluster whose representative
vector it is closest to.
The primary drawback for these algorithms is the requirement that we establish the
number of clusters, “k,” either intuitively or scientifically (using the Elbow Method) before any
clustering machine learning system starts allocating the data points. Despite this, it is still the
most popular type of clustering. K-means and K-medoids clustering are some examples of this
type of clustering.
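A minimal k-means sketch with scikit-learn on synthetic data; note that k must be supplied
up front, as discussed above:

# Centroid-based clustering with k-means
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# k is fixed beforehand (chosen intuitively or via the Elbow Method)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)     # one representative vector per cluster
print(kmeans.labels_[:10])         # cluster assignment for each point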

Density-based Clustering (Model-based methods)


Density-based clustering, a model-based method, finds groups based on the density of
data points. Contrary to centroid-based clustering, which requires that the number of clusters
be predefined and is sensitive to initialization, density-based clustering determines the number
of clusters automatically and is less susceptible to beginning positions. They are great at
handling clusters of different sizes and forms, making them ideally suited for datasets with
irregularly shaped or overlapping clusters. These methods manage both dense and sparse data
regions by focusing on local density and can distinguish clusters with a variety of
morphologies.
In contrast, centroid-based grouping, like k-means, has trouble finding arbitrary shaped
clusters. Due to its preset number of cluster requirements and extreme sensitivity to the initial
positioning of centroids, the outcomes can vary. Furthermore, the tendency of centroid-based
approaches to produce spherical or convex clusters restricts their capacity to handle
complicated or irregularly shaped clusters. In conclusion, density-based clustering overcomes
the drawbacks of centroid-based techniques by autonomously choosing cluster sizes, being
resilient to initialization, and successfully capturing clusters of various sizes and forms. The
most popular density-based clustering algorithm is DBSCAN.
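A minimal DBSCAN sketch on synthetic half-moon data; the number of clusters is discovered
automatically, points in sparse regions are labeled -1 (noise), and the eps and min_samples
values are illustrative:

# Density-based clustering with DBSCAN
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# Two interleaving half-moons: arbitrarily shaped, hard for k-means
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("Clusters found:", n_clusters)
print("Noise points:", int((db.labels_ == -1).sum()))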

Connectivity-based Clustering (Hierarchical clustering)


A method for assembling related data points into hierarchical clusters is called
hierarchical clustering. Each data point is initially taken into account as a separate cluster,
which is subsequently combined with the clusters that are the most similar to form one large
cluster that contains all of the data points.
Think about how you may arrange a collection of items based on how similar they are.
Each object begins as its own cluster at the base of the tree when using hierarchical clustering,
which creates a dendrogram, a tree-like structure. The closest pairings of clusters are then
combined into larger clusters after the algorithm examines how similar the objects are to one
another. When every object is in one cluster at the top of the tree, the merging process has
finished. Exploring various granularity levels is one of the fun things about hierarchical
clustering. To obtain a given number of clusters, you can select to cut the dendrogram at a
particular height. The more similar two objects are within a cluster, the closer they are. It’s
comparable to classifying items according to their family trees, where the nearest relatives are
clustered together and the wider branches signify more general connections. There are 2
approaches for Hierarchical clustering:
 Divisive Clustering: It follows a top-down approach; here we consider all data points to
be part of one big cluster, and then this cluster is divided into smaller groups.
 Agglomerative Clustering: It follows a bottom-up approach, here we consider all data
points to be part of individual clusters and then these clusters are clubbed together to
make one big cluster with all data points.
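A minimal agglomerative sketch with scikit-learn on synthetic data; requesting n_clusters
plays the role of cutting the dendrogram at a chosen height:

# Connectivity-based (hierarchical) clustering, agglomerative variant
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=150, centers=4, random_state=7)

# Each point starts as its own cluster; the closest pairs are merged
# until only the requested number of clusters remains
agg = AgglomerativeClustering(n_clusters=4, linkage="ward").fit(X)
print(agg.labels_[:20])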

Distribution-based Clustering
In distribution-based clustering, data points are grouped according to their likelihood
of belonging to the same probability distribution (such as a Gaussian or binomial
distribution) within the data. The data elements are grouped using a probability-based
distribution drawn from statistical distributions; data objects with a higher likelihood of
being in a cluster are included in it. Every cluster has a central point, and the further a
data point is from that central point, the less likely it is to be included in the cluster.
A notable drawback of density and boundary-based approaches is the need to specify
the clusters a priori for some algorithms, and primarily the definition of the cluster form for the
bulk of algorithms. There must be at least one tuning or hyper-parameter selected, and while
doing so should be simple, getting it wrong could have unanticipated repercussions.
Distribution-based clustering has a definite advantage over proximity and centroid-based
clustering approaches in terms of flexibility, accuracy, and cluster structure. The key issue is
that, in order to avoid overfitting, many clustering methods only work well with simulated or
synthetic data, or when the bulk of the data points genuinely belong to a preset distribution.
The most popular distribution-based clustering algorithm is Gaussian Mixture Model.
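A minimal Gaussian Mixture Model sketch on synthetic data; predict_proba exposes the
probability-based (soft) memberships described above:

# Distribution-based clustering with a Gaussian Mixture Model
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=1)

gmm = GaussianMixture(n_components=3, random_state=1).fit(X)
print(gmm.predict(X)[:10])         # hard cluster assignments
print(gmm.predict_proba(X)[:3])    # probability of belonging to each cluster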

Applications of Clustering in different fields:


1. Marketing: It can be used to characterize & discover customer segments for marketing
purposes.
2. Biology: It can be used for classification among different species of plants and animals.
3. Libraries: It is used in clustering different books on the basis of topics and information.
4. Insurance: It is used to acknowledge the customers, their policies and identifying the
frauds.
5. City Planning: It is used to make groups of houses and to study their values based on
their geographical locations and other factors present.
6. Earthquake studies: By learning the earthquake-affected areas we can determine the
dangerous zones.
7. Image Processing: Clustering can be used to group similar images together, classify
images based on content, and identify patterns in image data.
8. Genetics: Clustering is used to group genes that have similar expression patterns and
identify gene networks that work together in biological processes.
9. Finance: Clustering is used to identify market segments based on customer behavior,
identify patterns in stock market data, and analyze risk in investment portfolios.
10. Customer Service: Clustering is used to group customer inquiries and complaints into
categories, identify common issues, and develop targeted solutions.
11. Manufacturing: Clustering is used to group similar products together, optimize
production processes, and identify defects in manufacturing processes.
12. Medical diagnosis: Clustering is used to group patients with similar symptoms or
diseases, which helps in making accurate diagnoses and identifying effective treatments.
13. Fraud detection: Clustering is used to identify suspicious patterns or anomalies in
financial transactions, which can help in detecting fraud or other financial crimes.
14. Traffic analysis: Clustering is used to group similar patterns of traffic data, such as peak
hours, routes, and speeds, which can help in improving transportation planning and
infrastructure.
15. Social network analysis: Clustering is used to identify communities or groups within
social networks, which can help in understanding social behavior, influence, and trends.
16. Cybersecurity: Clustering is used to group similar patterns of network traffic or system
behavior, which can help in detecting and preventing cyberattacks.
17. Climate analysis: Clustering is used to group similar patterns of climate data, such as
temperature, precipitation, and wind, which can help in understanding climate change
and its impact on the environment.
18. Sports analysis: Clustering is used to group similar patterns of player or team
performance data, which can help in analyzing player or team strengths and weaknesses
and making strategic decisions.
19. Crime analysis: Clustering is used to group similar patterns of crime data, such as
location, time, and type, which can help in identifying crime hotspots, predicting future
crime trends, and improving crime prevention strategies.

Association in Machine Learning


In machine learning, association refers to discovering interesting relationships, patterns,
or correlations between variables in large datasets. This is often accomplished through
association rule learning, a technique used in data mining.

Association rule learning is a powerful tool in machine learning for uncovering hidden
patterns and relationships within large datasets. By understanding key metrics like support,
confidence, and lift, and applying efficient algorithms like Apriori, FP-Growth, and Eclat,
practitioners can derive valuable insights to inform decision-making across various domains
such as retail, finance, healthcare, and more.

Key Concepts in Association Rule Learning


1. Association Rules
 Definition: An association rule is an implication of the form {X} → {Y}, where X and Y
are sets of items. This means that if a transaction contains X, it is likely to contain Y as
well.
 Example: In a supermarket dataset, an association rule might be {bread} → {butter},
suggesting that customers who buy bread also tend to buy butter.

2. Support
 Definition: Support measures the frequency of an itemset in the dataset. It indicates
how often a rule applies to the dataset.
 Formula: Support(X) = (Number of transactions containing X) / (Total number of transactions)
 Example: If bread appears in 200 out of 1000 transactions, the support for bread is
200/1000 = 0.2, i.e. 20%.

3. Confidence
 Definition: Confidence measures the likelihood of Y given X. It quantifies the reliability
of the rule.
 Formula: Confidence(X → Y) = Support(X ∪ Y) / Support(X)
 Example: If 150 out of 200 transactions containing bread also contain butter, the
confidence of {bread} → {butter} is 150/200 = 0.75, i.e. 75%.

4. Lift
 Definition: Lift measures how much more likely Y is to occur with X compared to its
occurrence alone. It indicates the strength of a rule over random chance.
 Formula: Lift(X → Y) = Confidence(X → Y) / Support(Y)
 Example: If the support of butter is 0.1, the lift for {bread} → {butter} with a
confidence of 0.75 is 0.75 / 0.1 = 7.5, meaning customers who buy bread are 7.5 times more
likely to buy butter than a randomly chosen customer.
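All three metrics can be computed directly from transaction counts; a short sketch using
the illustrative bread-and-butter numbers above:

# Support, confidence, and lift for {bread} -> {butter}
total_transactions = 1000
bread_count = 200             # transactions containing bread
both_count = 150              # transactions containing bread and butter
support_butter = 0.1          # taken as given in the lift example above

support_bread = bread_count / total_transactions   # 0.2
confidence = both_count / bread_count              # 0.75
lift = confidence / support_butter                 # 0.75 / 0.1 = 7.5
print(support_bread, confidence, lift)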

Techniques and Algorithms


Apriori Algorithm
 Overview: The Apriori algorithm is a classic algorithm for mining frequent itemsets and
deriving association rules. It uses a bottom-up approach, where frequent subsets are
extended one item at a time (known as candidate generation).
 Steps:
1. Generate Candidate Itemsets: Generate all possible itemsets of a given size.
2. Count Support: Count the support of each candidate itemset.
3. Prune: Remove itemsets that do not meet the minimum support threshold.
4. Repeat: Increase the size of itemsets and repeat the process until no more
itemsets meet the support threshold.
 Challenges: The main challenge is the computational cost, especially when the dataset is
large and has many items.
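As a sketch, the steps above are implemented end to end by the third-party mlxtend
library; the four transactions below are invented for illustration:

# Apriori with the (third-party) mlxtend library
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["bread", "butter", "milk"],
                ["bread", "butter"],
                ["bread", "milk"],
                ["butter", "jam"]]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Mine frequent itemsets, then derive rules above a confidence threshold
frequent = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])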

FP-Growth Algorithm
 Overview: The FP-Growth (Frequent Pattern Growth) algorithm is an improvement
over the Apriori algorithm. It uses a divide-and-conquer approach and avoids candidate
generation.
 Steps:
1. Construct FP-Tree: Compress the dataset into a frequent pattern tree (FP-tree),
retaining the itemset associations.
2. Mine FP-Tree: Extract frequent itemsets directly from the FP-tree.
 Advantages: More efficient than Apriori for large datasets as it reduces the number of
scans and the size of intermediate datasets.

Eclat Algorithm
 Overview: The Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal)
algorithm uses a depth-first search strategy to find frequent itemsets.
 Steps:
1. Vertical Data Format: Represent data using transaction ID lists (tid-lists) for
each item.
2. Intersection: Use the intersection of tid-lists to count the support of itemsets.
3. Recursive Exploration: Recursively explore itemset extensions.
 Advantages: Efficient in certain cases where a vertical data format is more suitable.
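Eclat's core operation can be sketched with Python sets as tid-lists; the items and
transaction IDs here are invented:

# Eclat-style support counting via tid-list intersection
# Vertical format: each item maps to the set of transaction IDs containing it
tidlists = {
    "bread":  {1, 2, 3},
    "butter": {1, 2, 4},
    "milk":   {1, 3},
}

# Support of an itemset = size of the intersection of its tid-lists
support = len(tidlists["bread"] & tidlists["butter"])
print("support({bread, butter}) =", support, "transactions")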

Applications of Association Rule Learning


1. Market Basket Analysis
 Purpose: To identify sets of products that frequently co-occur in transactions.
 Example: Supermarkets use market basket analysis to discover that customers who buy
diapers often also buy beer.
 Benefits: Helps in product placement, inventory management, and promotional
bundling.
2. Recommendation Systems
 Purpose: To suggest items to users based on their past behavior or the behavior of
similar users.
 Example: E-commerce sites recommend products based on what other customers with
similar purchase histories have bought.
 Benefits: Enhances user experience and increases sales.
3. Fraud Detection
 Purpose: To identify patterns indicative of fraudulent behavior.
 Example: Credit card companies use association rules to detect unusual transaction
patterns that may indicate fraud.
 Benefits: Helps in early detection and prevention of fraudulent activities.
4. Healthcare
 Purpose: To discover associations between medical conditions, symptoms, and
treatments.
 Example: Analysis of patient records to find common co-occurrences of symptoms and
diagnoses.
 Benefits: Aids in diagnosis, treatment planning, and understanding disease progression.

5. Web Usage Mining


 Purpose: To analyze user navigation patterns on websites.
 Example: Identifying common paths users take on an e-commerce site to optimize the
site’s structure and content placement.
 Benefits: Improves website usability and user engagement.

Advantages and Disadvantages of Unsupervised Learning


Advantages
1. No Need for Labeled Data:
o Saves time and effort since no labeling is required.
o Useful when labeled data is scarce or unavailable.
2. Discovering Hidden Patterns:
o Finds patterns and relationships in the data that may not be immediately
apparent.
o Can reveal new insights and trends.
3. Flexibility:
o Applicable to a variety of tasks like clustering, dimensionality reduction, and
anomaly detection.
o Adaptable to many different types of problems.
4. Data Preprocessing:
o Helps organize and simplify data, making it easier to use in other machine
learning tasks.
o Improves the efficiency and performance of subsequent algorithms.
5. Handling High-Dimensional Data:
o Manages and analyzes data with many features more efficiently.
o Simplifies complex data.
Disadvantages
1. Interpretability:
o Results can be difficult to understand and explain.
o Challenging to derive actionable insights.
2. No Clear Evaluation Metrics:
o Lacks straightforward ways to measure model performance.
o Tough to assess accuracy and effectiveness.
3. Dependency on Data Quality:
o Highly sensitive to the quality of input data.
o Noisy or irrelevant features can degrade performance.
4. Complexity and Computation:
o Some algorithms require significant computing power and time.
o Can be slow and resource-intensive.
5. Risk of Overfitting:
o Model might learn noise in the data instead of actual patterns.
o Poor generalization to new, unseen data.
6. Lack of Direct Control:
o Less control over the output compared to supervised learning.
o Difficult to steer the algorithm towards specific insights or behaviors.

Hill Climbing Algorithm


The Hill Climbing algorithm is a simple optimization algorithm used in artificial
intelligence and other fields to find the best solution by iteratively making small changes to an
initial solution. It is a local search algorithm that continuously moves in the direction of
increasing value (for maximization problems) or decreasing value (for minimization problems)
to find the local optimum.
Key Concepts
1. State Space: The set of all possible states or solutions.
2. Objective Function: A function that evaluates the quality of a given state.
3. Neighbor: A state that is reachable from the current state by a small change.
4. Current State: The state being evaluated.
5. Successor: A neighbor with a better objective function value.

Algorithm Steps
1. Start with an initial state: Choose a random initial state from the state space.
2. Evaluate the current state: Calculate the value of the objective function for the current
state.
3. Generate neighbors: Create neighboring states by making small changes to the current
state.
4. Select the best neighbor: Choose the neighbor with the best objective function value.
5. Move to the neighbor: If the best neighbor is better than the current state, move to the
neighbor and repeat the process. If not, the algorithm terminates as a local optimum has
been reached.
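A minimal sketch of these steps in Python, maximising an invented one-dimensional
objective f(x) = -(x - 3)**2, whose single maximum is at x = 3:

# Simple hill climbing on a 1D objective
import random

def objective(x):
    return -(x - 3) ** 2

def hill_climb(start, step=0.1, max_iters=1000):
    current = start
    for _ in range(max_iters):
        # Generate neighbors by small changes to the current state
        neighbors = [current + step, current - step]
        best = max(neighbors, key=objective)
        # Move only if the best neighbor improves the objective;
        # otherwise a local optimum has been reached
        if objective(best) <= objective(current):
            break
        current = best
    return current

print(hill_climb(start=random.uniform(-10, 10)))   # converges near 3.0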

Types of Hill Climbing Algorithm


1. Simple Hill Climbing
 Move to the first neighbor that improves the objective function.
 Stops when no improvement is found.
2. Steepest-Ascent Hill Climbing
 Evaluate all neighbors and move to the best one.
 More computationally expensive but potentially more effective.
3. Stochastic Hill Climbing
 Randomly selects a neighbor and moves to it if it improves the objective function.
 Adds randomness to avoid local optima.
4. Random-Restart Hill Climbing
 Runs the hill climbing algorithm multiple times with different random initial states.
 Helps to avoid getting stuck in local optima by restarting the search.
5. Simulated Annealing
 Introduces a probabilistic element to occasionally accept worse solutions.
 Helps to escape local optima by allowing worse moves with decreasing probability over
time.

Advantages
 Simple to implement and understand.

 Requires minimal computational resources.

 Effective for problems with well-behaved objective functions.

Disadvantages
 Prone to getting stuck in local optima.

 Performance depends on the initial state.

 Not suitable for highly complex landscapes with many local optima.

Applications
 Path finding (e.g., in robotics or game AI).

 Scheduling problems.

 Optimization problems in various fields like finance, engineering, and operations research.
