Unit-4: Learning in AI
Learning
Machine learning is the branch of Artificial Intelligence that focuses on developing
models and algorithms that let computers learn from data and improve from previous
experience without being explicitly programmed for every task. In simple words, ML teaches
systems to think and understand like humans by learning from data.
In artificial intelligence (AI), there are several types of learning methodologies that systems use
to acquire knowledge and improve their performance over time. The main types of learning in
AI include:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Supervised Learning
Definition: Supervised learning is a type of machine learning where the model is trained using
labeled data. This means that for each input data point, there is a corresponding output label.
How It Works:
1. Training Phase: The algorithm learns from a training dataset, which includes input-
output pairs. The model makes predictions on the inputs and adjusts itself based on the
errors it makes.
2. Testing Phase: The trained model is tested on a separate dataset (test data) to evaluate
its performance.
Examples:
Image Classification: Consider a dataset of images where each image is labeled as
either a 'cat' or 'dog'. The model is trained to classify new images correctly.
Spam Detection: Email messages are labeled as 'spam' or 'not spam'. The model learns
to classify incoming emails accordingly.
Regression: Predicting the price of a house based on features like size, location, and
number of rooms. Here, the output is a continuous value (a minimal sketch of this workflow follows below).
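To make the workflow concrete, here is a minimal Python sketch using scikit-learn; the house features and prices are hypothetical toy values, not data from these notes:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Each row: [size in sq. ft, number of rooms]; target: price (toy values)
X = [[1400, 3], [1600, 3], [1700, 4], [1875, 4], [1100, 2], [1550, 3]]
y = [245000, 312000, 279000, 308000, 199000, 219000]

# Training phase: learn from labeled input-output pairs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Testing phase: evaluate on data the model has not seen
print(model.predict(X_test))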
Real-World Application:
Healthcare: Predicting diseases from medical imaging.
Finance: Stock price prediction and credit scoring.
Advantages:
1. High Accuracy: Models can achieve high accuracy with sufficient labeled data.
2. Interpretability: The relationship between input and output is well-defined, making it
easier to understand and interpret model predictions.
3. Wide Range of Applications: Suitable for various tasks, such as classification and
regression, across different domains.
Disadvantages:
1. Data Labeling: Requires a large amount of labeled data, which can be time-consuming
and expensive to obtain.
2. Overfitting: Models can overfit to the training data, leading to poor generalization on
new data.
3. Limited to Known Tasks: Can only be used for tasks where labeled data is available.
Unsupervised Learning
Definition: Unsupervised learning involves training a model on data without labeled
responses. The goal is to uncover hidden patterns or intrinsic structures in the input data.
How It Works:
1. Data Analysis: The algorithm tries to learn the underlying structure or distribution in
the data.
2. Clustering: Grouping similar data points together.
3. Dimensionality Reduction: Reducing the number of random variables under
consideration.
Examples:
Clustering: Grouping customers based on purchasing behavior without knowing in
advance what the groups might be. Algorithms like k-means or hierarchical clustering
are used.
Anomaly Detection: Identifying unusual data points, such as fraudulent transactions,
that do not conform to the rest of the dataset.
Principal Component Analysis (PCA): Reducing the dimensionality of a dataset while
retaining most of the variance, for example, compressing high-dimensional image data (see the sketch below).
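A minimal sketch of clustering and dimensionality reduction with scikit-learn, on synthetic unlabeled points (the data and parameters are illustrative assumptions):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Unlabeled data: two loose groups of 2-D points (synthetic)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# Clustering: group similar points without using any labels
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: project onto the direction of greatest variance
X_1d = PCA(n_components=1).fit_transform(X)
print(labels[:5], X_1d[:5].ravel())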
Real-World Application:
Marketing: Customer segmentation for targeted marketing campaigns.
Biology: Classifying species based on genetic information.
Advantages:
1. No Need for Labeled Data: Can work with unlabeled data, which is often more readily
available.
2. Discover Hidden Patterns: Useful for exploring data and finding hidden structures,
such as clustering and associations.
3. Data Preprocessing: Effective for tasks like dimensionality reduction and anomaly
detection.
Disadvantages:
1. Uncertainty in Results: The results can be difficult to interpret and validate since there
are no labels.
2. Less Control: Less control over the output compared to supervised learning as it is more
exploratory.
3. Complexity: Some algorithms can be computationally intensive and require significant
computational resources.
Semi-Supervised Learning
Definition: Semi-supervised learning is a hybrid approach that combines a small amount of
labeled data with a large amount of unlabeled data during training. This method can
significantly improve learning accuracy when acquiring labeled data is costly.
How It Works:
1. Initial Training: The model is first trained on the small labeled dataset.
2. Expansion: The model then uses this initial knowledge to label the large unlabeled
dataset.
3. Refinement: The model is retrained on the newly labeled dataset (a sketch of this self-training loop follows below).
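A sketch of this loop using scikit-learn's SelfTrainingClassifier wrapper; the Iris dataset and the 80% label-hiding rate are illustrative choices:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.8] = -1   # -1 marks an unlabeled sample

base = SVC(probability=True, gamma="auto")  # base learner must expose probabilities
model = SelfTrainingClassifier(base).fit(X, y_partial)
print(model.score(X, y))                    # accuracy against the held-back labels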
Examples:
Text Classification: Using a small set of labeled documents to classify a much larger set
of unlabeled documents.
Image Recognition: Training a model on a few labeled images and a large number of
unlabeled images to improve classification accuracy.
Real-World Application:
Web Content Classification: Classifying vast amounts of web pages with minimal
labeled data.
Medical Imaging: Labeling a small number of medical images and using a large set of
unlabeled images to train more accurate diagnostic models.
Advantages:
1. Cost-Effective: Reduces the need for large labeled datasets by leveraging a small
amount of labeled data with a large amount of unlabeled data.
2. Improved Accuracy: Can improve model performance compared to unsupervised
learning alone by using labeled data.
3. Scalable: Can scale to large datasets where full labeling is impractical.
Disadvantages:
1. Complex Implementation: Combining labeled and unlabeled data effectively can be
complex and may require advanced techniques.
2. Quality of Unlabeled Data: The success depends on the quality and representativeness
of the unlabeled data.
3. Data Integration: Integrating labeled and unlabeled data can be challenging and might
require careful preprocessing.
Reinforcement Learning
Definition: Reinforcement learning (RL) is a type of machine learning where an agent learns to
make decisions by performing actions in an environment to achieve maximum cumulative
reward. Unlike supervised learning, RL does not require labeled input/output pairs and instead
learns from the consequences of actions.
How It Works:
1. Agent and Environment: The agent interacts with the environment by taking actions.
2. Reward and Feedback: The environment provides feedback in the form of rewards or
punishments.
3. Policy and Value Function: The agent uses a policy to decide actions and a value
function to evaluate the goodness of states or state-action pairs.
4. Learning and Exploration: The agent continuously explores and exploits to maximize
the cumulative reward over time (see the Q-learning sketch below).
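A minimal tabular Q-learning sketch on a hypothetical 5-state corridor environment; all parameters are illustrative assumptions:

import numpy as np

n_states, n_actions = 5, 2              # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))     # value table, initialized to zero
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0                                # start of the corridor
    while s != 4:                        # episode ends at the goal state
        # epsilon-greedy: explore sometimes, otherwise exploit the best action
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0  # reward feedback from the environment
        # update the value estimate toward reward + discounted future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)  # "right" (column 1) should dominate in every state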
Examples:
Game Playing: Training a model to play games like Chess, Go, or video games, where the
agent learns strategies to win. For instance, DeepMind's AlphaGo used reinforcement
learning to defeat world-champion Go players.
Robotics: Teaching a robot to navigate and manipulate objects in its environment to
complete tasks like assembling products.
Self-Driving Cars: The car learns to make driving decisions (e.g., when to stop, turn, or
accelerate) to reach a destination safely and efficiently.
Real-World Application:
Finance: Algorithmic trading where the agent learns to make trading decisions to
maximize profit.
Healthcare: Personalized treatment plans where the agent learns to recommend
treatment sequences for patients to maximize health outcomes.
Artificial Neural Networks (ANNs)
An Artificial Neural Network is a computational model inspired by the way human neurons operate. This detailed explanation covers the architecture, functioning, types, training process, applications, and advantages and disadvantages of ANNs.
Architecture of Artificial Neural Networks
An ANN consists of layers of nodes (neurons), where each node represents a computational
unit. The basic architecture includes:
1. Input Layer: This layer receives the input data. Each neuron in the input layer
represents one feature of the data.
2. Hidden Layers: These layers perform the computations and feature extraction. There
can be one or more hidden layers, and each neuron in a hidden layer is connected to
every neuron in the previous and subsequent layers. The more hidden layers an ANN
has, the deeper it is, and such networks are called Deep Neural Networks (DNNs).
3. Output Layer: This layer produces the final output. The number of neurons in the
output layer corresponds to the number of possible output classes or the nature of the
prediction.
Functioning of Artificial Neural Networks
Neurons and Activation Functions:
Neurons: Each neuron receives inputs, processes them using weights, biases, and an
activation function, and passes the result to the next layer.
Weights and Biases: Weights determine the importance of each input, while biases
allow shifting the activation function.
Activation Functions: These functions introduce non-linearity into the network,
enabling it to learn complex patterns. Common activation functions include Sigmoid,
Tanh, ReLU (Rectified Linear Unit), and Softmax.
Forward Propagation: During forward propagation, input data passes through the network
layer by layer. Each neuron's output is calculated using the formula:
Output = Activation Function(∑ (Input × Weight) + Bias)
Loss Function: The loss function measures the difference between the predicted output and
the actual output. Common loss functions include Mean Squared Error (MSE) for regression
tasks and Cross-Entropy Loss for classification tasks.
Backward Propagation: Backpropagation is the process of updating the weights and biases to
minimize the loss function. It involves calculating the gradient of the loss function with respect
to each weight using the chain rule of calculus and adjusting the weights in the opposite
direction of the gradient.
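To make forward and backward propagation concrete, here is a minimal NumPy sketch: a hypothetical one-hidden-layer network trained on the XOR problem (architecture, learning rate, and epoch count are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Initialization: small random weights, zero biases
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for epoch in range(5000):
    # Forward propagation: output = activation(sum(input * weight) + bias)
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward propagation: chain rule applied layer by layer (MSE loss)
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent: step opposite to the gradient
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # should move toward [0, 1, 1, 0]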
Training Process
1. Data Preparation: Gather and preprocess the data, which includes normalization,
scaling, and splitting into training, validation, and test sets.
2. Initialization: Initialize the weights and biases, usually with small random values.
3. Forward Pass: Compute the output of the network by passing the input data through all
layers.
4. Loss Calculation: Calculate the loss using the loss function.
5. Backward Pass (Backpropagation): Compute the gradient of the loss with respect to
each weight and bias, and update them using optimization algorithms like Gradient
Descent or its variants (e.g., Adam, RMSprop).
6. Iteration: Repeat the forward and backward passes for multiple epochs until the model
converges (i.e., the loss function stops decreasing significantly); the sketch below shows these steps bundled into a library call.
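These steps, bundled into a single library call, might look like the following sketch with scikit-learn's MLPClassifier (which uses the Adam optimizer by default; the layer sizes are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)          # data preparation: scaling
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(scaler.transform(X_train), y_train)     # iterated forward/backward passes
print(clf.score(scaler.transform(X_test), y_test))  # held-out accuracy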
Applications of Artificial Neural Networks
1. Image and Video Recognition: ANNs, especially CNNs, are extensively used for image
classification, object detection, and facial recognition.
2. Natural Language Processing (NLP): RNNs and their variants (LSTMs, GRUs) are used
for tasks like language translation, sentiment analysis, and text generation.
3. Speech Recognition: ANNs are used to convert spoken language into text.
4. Healthcare: ANNs help in diagnosing diseases, analyzing medical images, and predicting
patient outcomes.
5. Finance: Used for stock price prediction, fraud detection, and credit scoring.
6. Autonomous Systems: Self-driving cars use ANNs for perception, decision-making, and
control.
7. Game Playing: ANNs, combined with reinforcement learning, have been used to develop
agents that play complex games like Go and Chess.
Support Vector Machine (SVM)
A Support Vector Machine is a supervised learning algorithm that separates classes with the best possible decision boundary, called a hyperplane. The hyperplane's dimension depends on the number of input features: with two features it is simply a line, and if the number of input features is three, the hyperplane becomes a 2-D plane. It becomes difficult to imagine when the number of features exceeds three.
Let’s consider two independent variables x1, x2, and one dependent variable which is either a
blue circle or a red circle.
From the figure above it’s very clear that there are multiple lines (our hyperplane here is a line
because we are considering only two input features x1, x2) that segregate our data points or do a
classification between red and blue circles. So how do we choose the best line or in general the
best hyperplane that segregates our data points?
How does SVM work?
One reasonable choice as the best hyperplane is the one that represents the largest separation
or margin between the two classes.
So we choose the hyperplane whose distance from it to the nearest data point on each side is
maximized. If such a hyperplane exists it is known as the maximum-margin hyperplane/hard
margin. So from the above figure, we choose L2. Now let's consider a scenario like the one shown below.
Here we have one blue ball within the boundary of the red balls. So how does SVM classify the data? The blue ball among the red ones is an outlier of the blue class. The SVM algorithm has the characteristic of ignoring outliers and finding the best hyperplane that maximizes the margin: SVM is robust to outliers.
For data like this, SVM finds the maximum margin as it did for the previous datasets, but additionally adds a penalty each time a point crosses the margin. Margins in such cases are called soft margins. When the dataset requires a soft margin, the SVM tries to minimize (1/margin + ∑ penalty). Hinge loss is a commonly used penalty: if there is no violation, there is no hinge loss; if there is a violation, the hinge loss is proportional to the distance of the violation (see the sketch below).
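A minimal sketch of the hinge-loss rule described above; the scores are hypothetical signed distances from the decision boundary:

def hinge_loss(y_true, score):
    # y_true in {-1, +1}; score is the signed distance from the boundary
    return max(0.0, 1.0 - y_true * score)

print(hinge_loss(+1, 2.5))   # 0.0 -> correct side, outside the margin: no loss
print(hinge_loss(+1, 0.4))   # 0.6 -> inside the margin: small penalty
print(hinge_loss(+1, -1.0))  # 2.0 -> misclassified: larger penalty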
Till now, we were talking about linearly separable data (the groups of blue balls and red balls are separable by a straight line). What do we do if the data are not linearly separable?
Say our data is as shown in the figure above. SVM solves this by creating a new variable using a kernel. We call a point xi on the line and create a new variable yi as a function of the distance from the origin O. If we plot this, we get something like what is shown below.
In this case, the new variable y is created as a function of distance from the origin. A non-linear
function that creates a new variable is referred to as a kernel.
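As a small illustration, one widely used kernel is the radial basis function (RBF), sketched here with an assumed gamma parameter:

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Similarity decays with squared distance; gamma controls the width
    return np.exp(-gamma * np.sum((x - z) ** 2))

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(rbf_kernel(x1, x1))  # 1.0 for identical points
print(rbf_kernel(x1, x2))  # ~0.135, decays as points move apart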
Support Vector Machine Terminology
1. Hyperplane: The decision boundary that separates the data points of different classes in the feature space.
2. Support Vectors: The data points closest to the hyperplane, which determine its position and orientation.
3. Margin: The distance between the support vectors and the hyperplane. The main objective of the support vector machine algorithm is to maximize the margin; a wider margin indicates better classification performance.
4. Kernel: A kernel is a mathematical function used in SVM to map the original input data points into high-dimensional feature spaces, so that the hyperplane can be found even if the data points are not linearly separable in the original input space. Common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane is a
hyperplane that properly separates the data points of different categories without any
misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft-margin technique. The soft-margin SVM formulation introduces a slack variable for each data point, which relaxes the strict margin requirement and permits some misclassifications or violations. It finds a compromise between widening the margin and reducing violations.
7. C: The regularisation parameter C balances margin maximisation against misclassification penalties. It decides the penalty for crossing the margin or misclassifying data points: a larger value of C imposes a stricter penalty, resulting in a smaller margin and possibly fewer misclassifications.
8. Hinge Loss: Hinge loss is the typical loss function in SVMs. It penalizes misclassifications and margin violations, and is frequently combined with the regularisation term to form the SVM objective function.
9. Dual Problem: SVM can be solved via the dual of its optimisation problem, which involves finding the Lagrange multipliers associated with the support vectors. The dual formulation enables the kernel trick and more efficient computation.
Types of Support Vector Machines
Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of
different classes. When the data can be precisely linearly separated, linear SVMs are very
suitable. This means that a single straight line (in 2D) or a hyperplane (in higher
dimensions) can entirely divide the data points into their respective classes. A
hyperplane that maximizes the margin between the classes is the decision boundary.
Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be
separated into two classes by a straight line (in the case of 2D). By using kernel
functions, nonlinear SVMs can handle nonlinearly separable data. The original input data
is transformed by these kernel functions into a higher-dimensional feature space, where
the data points can be linearly separated. A linear SVM is used to locate a nonlinear
decision boundary in this modified space.
Advantages of SVM
Effective in high-dimensional cases.
Its memory is efficient as it uses a subset of training points in the decision function
called support vectors.
Different kernel functions can be specified for the decision function, and it is possible to
specify custom kernels.
SVM Implementation in Python
Task: predict whether a cancer is benign or malignant. Historical data about patients diagnosed with cancer enables doctors to differentiate malignant cases from benign ones, given the independent attributes.
Steps
Load the breast cancer dataset from sklearn.datasets.
Separate the input features and the target variable.
Build and train the SVM classifier using an RBF kernel.
Plot a scatter plot of the input features.
Plot the decision boundary.
# Imports and data loading added so the snippet runs as-is
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt

cancer = load_breast_cancer()
X, y = cancer.data[:, :2], cancer.target  # first two features, for a 2-D plot

# Scatter plot of the two input features, coloured by class
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()
Output: a scatter plot of the two selected features, coloured by class (benign vs. malignant).
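A sketch of the remaining steps: train an RBF-kernel SVC on the two plotted features and draw its decision boundary (DecisionBoundaryDisplay requires scikit-learn 1.1 or newer; the C and gamma values are illustrative):

from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay

clf = SVC(kernel="rbf", gamma="auto", C=1.0).fit(X, y)  # train on the two features

disp = DecisionBoundaryDisplay.from_estimator(
    clf, X, response_method="predict", alpha=0.4)
disp.ax_.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()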
Clustering
The task of grouping data points based on their similarity with each other is called Clustering or Cluster Analysis. This method falls under the branch of Unsupervised Learning, which aims at gaining insights from unlabelled data points; that is, unlike supervised learning, we don't have a target variable.
Clustering aims at forming groups of homogeneous data points from a heterogeneous dataset. It evaluates similarity using a metric like Euclidean distance, cosine similarity, or Manhattan distance, and then groups the points with the highest similarity together.
For example, in the graph given below, we can clearly see that there are 3 circular clusters
forming on the basis of distance.
Now, it is not necessary that the clusters formed are circular in shape; the shape of clusters
can be arbitrary. There are many algorithms that work well at detecting arbitrary-shaped
clusters.
For example, in the graph given below, we can see that the clusters formed are not circular in
shape.
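A minimal clustering sketch with scikit-learn (synthetic blob data; the number of clusters is an assumption):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, s=20, edgecolors="k")
plt.show()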
Types of Clustering
Broadly speaking, there are 2 types of clustering that can be performed to group similar data
points:
Hard Clustering: In this type of clustering, each data point either belongs to a cluster
completely or not at all. For example, say there are 4 data points and we have to cluster
them into 2 clusters. Each data point will then belong to either cluster 1 or cluster 2, as in the table below.
Data Point   Cluster
A            C1
B            C2
C            C2
D            C1
Soft Clustering: In this type of clustering, instead of assigning each data point to a single
cluster, a probability or likelihood of the point belonging to each cluster is evaluated.
For example, say there are 4 data points and we have to cluster them into 2 clusters.
We then evaluate, for every data point, the probability of it belonging to each of the two
clusters, as in the table below.
Data Point   P(Cluster 1)   P(Cluster 2)
A            0.91           0.09
B            0.30           0.70
C            0.17           0.83
D            1.00           0.00
Uses of Clustering
Now before we begin with types of clustering algorithms, we will go through the use cases of
Clustering algorithms. Clustering algorithms are majorly used for:
Market Segmentation – Businesses use clustering to group their customers and use
targeted advertisements to attract more audience.
Market Basket Analysis – Shop owners analyze their sales to figure out which items are
frequently bought together by customers. For example, according to a study in the USA,
diapers and beer were often bought together by fathers.
Social Network Analysis – Social media sites use your data to understand your browsing
behavior and provide you with targeted friend recommendations or content
recommendations.
Medical Imaging – Doctors use Clustering to find out diseased areas in diagnostic images
like X-rays.
Anomaly Detection – Clustering can be used to find outliers in a real-time data stream or
to flag fraudulent transactions.
Simplify working with large datasets – After clustering, each cluster is given a cluster ID,
and an entire feature set can be reduced to its cluster ID. Representing a complicated
example by a simple cluster ID makes complex datasets easier to work with.
There are many more use cases for clustering, but these are some of the major and most
common ones. Moving forward, we will discuss the clustering algorithms that help you
perform the above tasks.
Types of Clustering Algorithms
At the surface level, clustering helps in the analysis of unstructured data. Graphing, the shortest
distance, and the density of the data points are a few of the elements that influence cluster
formation. Clustering is the process of determining how related the objects are based on a
metric called the similarity measure. Similarity metrics are easier to locate in smaller sets of
features. It gets harder to create similarity measures as the number of features increases.
Depending on the type of clustering algorithm being utilized in data mining, several techniques
are employed to group the data from the datasets. In this part, the clustering techniques are
described. Various types of clustering algorithms are:
1. Centroid-based Clustering (Partitioning methods)
2. Density-based Clustering
3. Connectivity-based Clustering (Hierarchical clustering)
4. Distribution-based Clustering
Connectivity-based Clustering (Hierarchical Clustering)
Hierarchical clustering builds a tree of clusters (a dendrogram), and clusters are obtained by cutting the dendrogram at a particular height. The more similar two objects are within a cluster, the closer they are. It is comparable to classifying items according to their family trees, where the nearest relatives are clustered together and the wider branches signify more general connections. There are 2 approaches to hierarchical clustering:
Divisive Clustering: A top-down approach: all data points are initially considered part of
one big cluster, which is then divided into smaller groups.
Agglomerative Clustering: A bottom-up approach: each data point starts as its own
cluster, and clusters are merged together until one big cluster contains all data points
(see the sketch below).
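A minimal sketch of the bottom-up approach with scikit-learn's AgglomerativeClustering (synthetic data; the linkage choice is illustrative):

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
# Each point starts in its own cluster; the closest clusters merge until 3 remain
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(labels[:10])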
Distribution-based Clustering
In distribution-based clustering, data points are grouped according to how likely they are to belong to the same probability distribution (such as a Gaussian or binomial distribution) within the data. The data elements are grouped using a probability-based, statistical distribution: data objects with a higher likelihood of belonging to a distribution are included in its cluster. Every cluster has a central point, and the further a data point is from that central point, the less likely it is to be included in the cluster.
A notable drawback of density and boundary-based approaches is the need to specify
the clusters a priori for some algorithms, and primarily the definition of the cluster form for the
bulk of algorithms. There must be at least one tuning or hyper-parameter selected, and while
doing so should be simple, getting it wrong could have unanticipated repercussions.
Distribution-based clustering has a definite advantage over proximity- and centroid-based clustering approaches in terms of flexibility, accuracy, and cluster structure. The key issue is that, in order to avoid overfitting, many such clustering methods only work well with simulated or manufactured data, or when the bulk of the data points certainly belong to a preset distribution.
The most popular distribution-based clustering algorithm is the Gaussian Mixture Model, sketched below.
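A minimal Gaussian Mixture Model sketch with scikit-learn, showing soft cluster memberships (synthetic data, illustrative parameters):

from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=200, centers=2, random_state=0)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X[:3]).round(2))  # soft memberships; each row sums to 1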
Applications of Clustering
16. Cybersecurity: Clustering is used to group similar patterns of network traffic or system
behavior, which can help in detecting and preventing cyberattacks.
17. Climate analysis: Clustering is used to group similar patterns of climate data, such as
temperature, precipitation, and wind, which can help in understanding climate change
and its impact on the environment.
18. Sports analysis: Clustering is used to group similar patterns of player or team
performance data, which can help in analyzing player or team strengths and weaknesses
and making strategic decisions.
19. Crime analysis: Clustering is used to group similar patterns of crime data, such as
location, time, and type, which can help in identifying crime hotspots, predicting future
crime trends, and improving crime prevention strategies.
Association Rule Learning
Association rule learning is a powerful tool in machine learning for uncovering hidden patterns and relationships within large datasets. By understanding key metrics like support, confidence, and lift, and applying efficient algorithms like Apriori, FP-Growth, and Eclat, practitioners can derive valuable insights to inform decision-making across domains such as retail, finance, healthcare, and more.
Key Metrics in Association Rule Learning
2. Support
Definition: Support measures the frequency of an itemset in the dataset. It indicates
how often a rule applies to the dataset.
Formula: Support(X) = (Number of transactions containing X) / (Total number of transactions)
Example: If bread appears in 200 out of 1000 transactions, the support for bread is
200/1000 = 0.2, or 20%.
3. Confidence
Definition: Confidence measures the likelihood of Y given X. It quantifies the reliability
of the rule.
Formula: Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Example: If 150 out of 200 transactions containing bread also contain butter, the
confidence of {bread} → {butter} is 150/200 = 0.75, or 75%.
4. Lift
Definition: Lift measures how much more likely Y is to occur with X compared to its
occurrence alone. It indicates the strength of a rule over random chance.
Formula: Lift(X → Y) = Confidence(X → Y) / Support(Y)
Example: If the support of butter is 0.1, the lift for {bread} → {butter} with a
confidence of 0.75 is 0.75 / 0.1 = 7.5, indicating a strong association (a small Python
example follows below).
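The three metrics can be computed directly from raw transactions; the tiny transaction list below is hypothetical:

transactions = [
    {"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "milk"},
    {"milk"}, {"butter"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / n

sup_bread = support({"bread"})                          # 0.6
confidence = support({"bread", "butter"}) / sup_bread   # P(butter | bread) ~ 0.67
lift = confidence / support({"butter"})                 # ~1.11: slightly above chance
print(sup_bread, confidence, lift)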
FP-Growth Algorithm
Overview: The FP-Growth (Frequent Pattern Growth) algorithm is an improvement
over the Apriori algorithm. It uses a divide-and-conquer approach and avoids candidate
generation.
Steps:
1. Construct FP-Tree: Compress the dataset into a frequent pattern tree (FP-tree),
retaining the itemset associations.
2. Mine FP-Tree: Extract frequent itemsets directly from the FP-tree.
Advantages: More efficient than Apriori for large datasets as it reduces the number of
scans and the size of intermediate datasets.
Eclat Algorithm
Overview: The Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal)
algorithm uses a depth-first search strategy to find frequent itemsets.
Steps:
1. Vertical Data Format: Represent data using transaction ID lists (tid-lists) for
each item.
2. Intersection: Use the intersection of tid-lists to count the support of itemsets.
3. Recursive Exploration: Recursively explore itemset extensions.
Advantages: Efficient in cases where a vertical data format is more suitable (a tid-list sketch follows below).
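A toy sketch of Eclat's vertical representation (the transaction IDs are hypothetical):

tid_lists = {                        # item -> IDs of transactions containing it
    "bread":  {1, 2, 3, 5},
    "butter": {1, 3, 5},
    "milk":   {2, 3, 4},
}
n_transactions = 5

# Support of {bread, butter} = size of the intersection of their tid-lists
both = tid_lists["bread"] & tid_lists["butter"]    # {1, 3, 5}
print(len(both) / n_transactions)                  # 0.6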
Hill Climbing
Hill climbing is a local search algorithm that starts from an initial state and repeatedly moves to a neighbouring state with a better objective-function value, stopping when no neighbour improves on the current state.
Algorithm Steps
1. Start with an initial state: Choose a random initial state from the state space.
2. Evaluate the current state: Calculate the value of the objective function for the current
state.
3. Generate neighbors: Create neighboring states by making small changes to the current
state.
4. Select the best neighbor: Choose the neighbor with the best objective function value.
5. Move to the neighbor: If the best neighbor is better than the current state, move to it
and repeat the process. If not, the algorithm terminates, as a local optimum has been
reached (see the sketch below).
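A minimal hill-climbing sketch that maximizes a simple one-dimensional objective (the function and step size are illustrative assumptions):

def f(x):
    return -(x - 3) ** 2   # objective: single peak at x = 3

x, step = 0.0, 0.1                        # 1. initial state
while True:
    neighbors = [x - step, x + step]      # 3. generate neighbors
    best = max(neighbors, key=f)          # 4. select the best neighbor
    if f(best) <= f(x):                   # 5. no improvement: local optimum
        break
    x = best
print(x, f(x))  # converges to x ~ 3.0 (here also the global optimum)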
Advantages
Simple to implement and understand.
Disadvantages
Prone to getting stuck in local optima.
Not suitable for highly complex landscapes with many local optima.
Applications
Path finding (e.g., in robotics or game AI).
Scheduling problems.
Optimization problems in operations research.