Module 2

A neural network is a computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons) that learn from data to recognize patterns and solve problems. Key components include input, hidden, and output layers, with various types of networks like feedforward, convolutional, and recurrent networks tailored for specific tasks. Applications range from social media recommendations to personal assistants, with foundational models like the perceptron serving as building blocks for more complex neural networks.

1. What is a Neural Network?

A neural network is a computational model inspired by the human brain, designed to recognize patterns and solve problems through learning from data. It is a key element in artificial intelligence (AI) and machine learning (ML).

Key Characteristics:
• Consists of layers of nodes (neurons) connected by edges (weights).
• Learns by adjusting weights through a process called training.
• Used for tasks like image recognition, natural language processing,
and predictive modeling.
Artificial neural networks (ANNs) are a fundamental concept in deep
learning within artificial intelligence. They are crucial in handling complex
application scenarios that traditional machine-learning algorithms may
struggle with. Here’s an overview of how neural networks operate and
their components:
• Inspired by Biology
ANNs are inspired by biological neurons in the human brain. Just as
neurons activate under specific conditions to trigger actions in the
body, artificial neurons in ANNs activate based on input data.
• Structure of ANNs
ANNs consist of layers of interconnected artificial neurons. These
neurons are organized into layers, each performing specific
computations using activation functions to decide which signals to pass
onto the next layer.
2. The Structure of a Neural Network
Neural networks are composed of three main layers:
a. Input Layer:
• Receives raw data (e.g., numerical values, pixel intensities).
• Passes the data to the next layer for processing.
b. Hidden Layers:
• Perform computations on the input data.
• Apply activation functions to introduce non-linearity.
• There can be multiple hidden layers in a deep neural network.
c. Output Layer:
• Produces the final output (e.g., a classification, regression value, or probability).
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from biological neurons found in animal brains, so they share many similarities in both structure and function.
3. How Neural Networks Work
a. Input Data: The model receives data in numerical form.
Example: An image is represented as a matrix of pixel values.
b. Forward Propagation: The input data flows through the network:
• Each neuron processes inputs by applying weights and biases.
• Outputs are passed through an activation function (e.g., ReLU, sigmoid).
c. Prediction: The network generates an output (e.g., a label, score, or probability).
d. Training:
• The network learns by comparing its predictions with actual outputs (labels) using a loss function.
• Backpropagation adjusts weights to minimize error (see the code sketch below).
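To make these steps concrete, here is a minimal NumPy sketch of forward propagation through one hidden layer followed by a simple loss computation. The input values, random weights, sigmoid activation, and squared-error loss are illustrative assumptions, not a trained model.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Input data: one sample with 3 features (illustrative values)
x = np.array([0.5, -1.2, 3.0])

# Hidden layer: 4 neurons, each with one weight per input plus a bias
W1 = np.random.randn(4, 3) * 0.1
b1 = np.zeros(4)
hidden = sigmoid(W1 @ x + b1)          # forward propagation through the hidden layer

# Output layer: 1 neuron producing a probability-like score
W2 = np.random.randn(1, 4) * 0.1
b2 = np.zeros(1)
prediction = sigmoid(W2 @ hidden + b2)

# Compare the prediction with the actual label using a squared-error loss;
# training would adjust W1, b1, W2, b2 to reduce this value (backpropagation).
label = 1.0
loss = (prediction[0] - label) ** 2
print(f"prediction={prediction[0]:.3f}, loss={loss:.3f}")
```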
4. Key Terms and Concepts
• Neuron (Node) - The basic unit in a neural network that processes input and
generates output.
• Weights - Numerical values representing the importance of connections between
neurons.
• Bias - A constant added to the input to help the network fit the data better.
• Activation Function - Introduces non-linearity into the model to capture complex
patterns.
• Examples: ReLU, sigmoid, tanh.
• Loss Function -Measures the difference between predicted and actual values.
• Examples: Mean Squared Error (MSE), Cross-Entropy Loss.
• Learning Rate - Determines how much the weights are adjusted during training.
5. Types of Neural Networks
• Feedforward Neural Networks (FNN):
• Data flows in one direction from input to output.
• Used for simple classification and regression tasks.
• Convolutional Neural Networks (CNN):
• Specialized for image data.
• Uses convolution layers to detect spatial patterns like edges or
shapes.
• Recurrent Neural Networks (RNN):
• Processes sequential data (e.g., text, time series).
• Has memory mechanisms to retain context over time.
• Deep Neural Networks (DNN):
• Contains multiple hidden layers.
• Used for complex problems requiring high accuracy.
Applications of Artificial Neural Networks

Social Media
• Artificial Neural Networks are used heavily in Social Media. For example, let’s
take the ‘People you may know’ feature on Facebook that suggests people that
you might know in real life so that you can send them friend requests.
• Well, this magical effect is achieved by using Artificial Neural Networks that
analyze your profile, your interests, your current friends, and also their friends
and various other factors to calculate the people you might potentially know.
• Another common application of machine learning in social media is facial recognition.
• This is done by finding around 100 reference points on the person’s face and
then matching them with those already available in the database using
convolutional neural networks.
Marketing and Sales
• When you log onto e-commerce sites like Amazon and Flipkart, they will recommend products for you to buy based on your previous browsing history.
• Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history.
• This is true across all new-age marketing segments like book sites, movie services, hospitality sites, etc., and it is done by implementing personalized marketing.
• This uses Artificial Neural Networks to identify customers' likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
Personal Assistants
• I am sure you have all heard of Siri, Alexa, Cortana, etc.
• These personal assistants are an example of speech recognition: they use Natural Language Processing to interact with users and formulate responses accordingly.
• Natural Language Processing uses artificial neural networks to handle many of the tasks of these personal assistants, such as managing language syntax, semantics, correct speech, and the ongoing conversation.
The following is a basic example of implementing a simple neural network using Python and TensorFlow in Google Colab. The neural network classifies handwritten digits from the MNIST dataset:

https://colab.research.google.com/drive/1T1ACgTU0Jx7JElK0S5NyqLwFAtSZXgxB?usp=sharing
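The notebook itself is not reproduced here, but a minimal Keras sketch along the same lines might look as follows. The layer sizes, activation choices, and training settings are illustrative assumptions and are not necessarily those used in the linked Colab notebook.

```python
import tensorflow as tf

# Load the MNIST handwritten-digit dataset (28x28 grayscale images, labels 0-9)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

# A simple feedforward network: input, one hidden layer, output layer
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # input layer: 784 pixel values
    tf.keras.layers.Dense(128, activation="relu"),     # hidden layer (size is an illustrative choice)
    tf.keras.layers.Dense(10, activation="softmax"),   # output layer: one probability per digit
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer labels, so sparse cross-entropy
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```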
What is Perceptron | The Simplest Artificial Neural Network
The Perceptron is one of the simplest artificial neural network architectures, introduced by Frank Rosenblatt in 1957. It is primarily used for binary classification.

At that time, traditional methods like statistical machine learning and conventional programming were commonly used for predictions.

Despite being one of the simplest forms of artificial neural networks, the Perceptron model proved to be highly effective in solving specific classification problems, laying the groundwork for advancements in AI and machine learning.

This lesson aims to provide the fundamentals of the perceptron model: its architecture, working principles, and applications.
What is Perceptron?

A Perceptron is a type of neural network that performs binary classification, mapping input features to an output decision and classifying data into one of two categories, such as 0 or 1.

A Perceptron consists of a single layer of input nodes that are fully connected to a layer of output nodes. It is particularly good at learning linearly separable patterns.

It utilizes a variation of artificial neurons called Threshold Logic Units (TLUs), which were first introduced by Warren McCulloch and Walter Pitts in the 1940s.

This foundational model has played a crucial role in the development of more advanced neural networks and machine learning algorithms.
Types of Perceptron

Single-Layer Perceptron: this type of perceptron is limited to learning linearly separable patterns.
It is effective for tasks where the data can be divided into distinct categories by a straight line.
While powerful in its simplicity, it struggles with more complex problems where the relationship between inputs and outputs is non-linear.

Multi-Layer Perceptron: multi-layer perceptrons possess enhanced processing capabilities, as they consist of two or more layers, making them adept at handling more complex patterns and relationships within the data.
Basic Components of Perceptron

A Perceptron is composed of key components that work together to process information and make predictions.
• Input Features
• The perceptron takes multiple input features, each representing a
characteristic of the input data.
• Weights
• Each input feature is assigned a weight that determines its
influence on the output. These weights are adjusted during
training to find the optimal values.
• Summation Function
• The perceptron calculates the weighted sum of its inputs,
combining them with their respective weights.
• Activation Function
• The weighted sum is passed through the Heaviside step
function, comparing it to a threshold to produce a binary output
(0 or 1).
• Output
• The final output is determined by the activation function, often
used for binary classification tasks.
• Bias
• The bias term helps the perceptron make adjustments
independent of the input, improving its flexibility in learning.
• Learning Algorithm
• The perceptron adjusts its weights and bias using a learning
algorithm, such as the Perceptron Learning Rule, to minimize
prediction errors.

These components enable the perceptron to learn from data and make predictions. While a single perceptron can handle simple binary classification, complex tasks require multiple perceptrons organized into multiple layers.
How does Perceptron work?

A weight is assigned to each input node of a perceptron, indicating the importance of that input in determining the output.
The Perceptron's output is calculated as a weighted sum of the inputs, which is then passed through an activation function to decide whether the Perceptron will fire.
The weighted sum is computed as:

z = (w1 * x1) + (w2 * x2) + ... + (wn * xn)

The step function compares this weighted sum to a threshold. If the input is larger than the threshold value, the output is 1; otherwise, it is 0.
The most common activation function used in Perceptrons is the Heaviside step function:

h(z) = 0 if z < 0, 1 if z ≥ 0 (with the threshold absorbed into the bias term)
A perceptron consists of a single layer of Threshold Logic Units (TLU), with each TLU
fully connected to all input nodes.
In a fully connected layer, also known as a dense layer, all neurons in
one layer are connected to every neuron in the previous layer.
The output of the fully connected layer is computed as:

output = h(X * W + b)

where X is the input, W is the weight matrix (one weight for each input to each neuron), b is the bias, and h is the step function.

During training, the Perceptron's weights are adjusted to minimize the difference between the predicted output and the actual output.
This is achieved using supervised learning algorithms like the delta rule or the Perceptron learning rule.
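As a concrete illustration of the Perceptron learning rule described above, here is a minimal NumPy sketch that trains a single perceptron on the AND function. The dataset, learning rate, and number of epochs are illustrative assumptions.

```python
import numpy as np

# Training data for the AND function: inputs X and binary targets y
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)          # one weight per input feature
b = 0.0                  # bias term
learning_rate = 0.1

def step(z):
    # Heaviside step activation: fire (1) if the weighted sum is non-negative
    return 1 if z >= 0 else 0

# Perceptron learning rule: adjust weights and bias in the direction of the error
for epoch in range(10):
    for xi, target in zip(X, y):
        prediction = step(np.dot(w, xi) + b)
        error = target - prediction
        w += learning_rate * error * xi
        b += learning_rate * error

print("weights:", w, "bias:", b)
print("predictions:", [step(np.dot(w, xi) + b) for xi in X])
```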
1. Introduction

• Activation functions are mathematical functions applied to the output of neurons in a neural network.
• They introduce non-linearity, enabling the network to learn
complex patterns and make accurate predictions.
• Without activation functions, a neural network would behave
like a simple linear regression model, limiting its ability to solve
complex problems.
2. Importance of Activation Functions

• Introduce non-linearity, allowing neural networks to approximate complex functions.
• Help control the range of neuron outputs, making training more
stable.
• Enable networks to learn hierarchical patterns and features.
3. Understanding Non-Linearity

• In simple terms, non-linearity means that the relationship between inputs and outputs is not a straight line.
• If a neural network were purely linear, increasing an input would always lead
to a proportional increase in the output.
• However, real-world data is often complex, and relationships between inputs
and outputs can be curved or more intricate.
• By introducing non-linearity, activation functions allow the network to learn
and model complex behaviors, such as recognizing images, understanding
language, or making sophisticated predictions.
• Without non-linearity, no matter how many layers a neural network has, it
would still behave like a simple equation and fail to capture complex patterns.
4. Common Activation Functions

Sigmoid Function
• Formula: f(x) = 1 / (1 + exp(-x))
• Range: (0,1)
• Pros:
• Smooth and differentiable.
• Converts inputs into probabilities (useful in binary classification).
• Cons:
• Prone to vanishing gradient problem (gradients become very small
for extreme values of x).
• Not centered around zero, leading to inefficient weight updates.
Hyperbolic Tangent (Tanh) Function

• Formula: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))


• Range: (-1,1)
• Pros:
• Centered around zero, leading to better gradient updates than
Sigmoid.
• Useful in hidden layers to transform data.
• Cons:
• Also suffers from the vanishing gradient problem for extreme
values.
Rectified Linear Unit (ReLU) Function

• Formula: f(x) = max(0, x)


• Range: [0, +∞)
• Pros:
• Avoids vanishing gradients for positive inputs.
• Computationally efficient.
• Allows sparse activations, making the network efficient.
• Cons:
• Dead neurons issue: If x is negative, the gradient becomes zero, meaning
the neuron may stop learning.
• Can cause exploding activations for large values.
Leaky ReLU Function

• Formula: f(x) = max(αx, x), where α is a small constant (e.g., 0.01)


• Range: (-∞, +∞)
• Pros:
• Addresses the dead neuron problem in ReLU by allowing small negative
values.
• Retains advantages of ReLU while improving gradient flow.
• Cons:
• The choice of α is arbitrary and requires tuning.
Exponential Linear Unit (ELU) Function

• Formula:
• f(x) = x if x > 0
• f(x) = α * (exp(x) - 1) if x ≤ 0
• Range: (-∞, +∞)
• Pros:
• Avoids dead neuron problem.
• Has negative values, which improves gradient flow.
• Cons:
• More computationally expensive than ReLU.
• Requires tuning the parameter α.
Softmax Function

• Used in the output layer for multi-class classification.


• Formula: f(x_i) = exp(x_i) / sum(exp(x_j))
• Range: (0,1), outputs sum to 1 (probabilities of each class).
• Pros:
• Converts logits into probabilities, making it interpretable.
• Cons:
• Can amplify differences, leading to confident but possibly incorrect
predictions.
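The functions above translate directly into code. Here is a brief NumPy sketch of the formulas discussed; the α defaults are the illustrative values mentioned earlier.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))              # range (0, 1)

def tanh(x):
    return np.tanh(x)                             # range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)                     # range [0, +inf)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)          # small slope for negative inputs

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def softmax(x):
    e = np.exp(x - np.max(x))                     # subtract the max for numerical stability
    return e / e.sum()                            # outputs are probabilities that sum to 1

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), softmax(z))
```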
Sidenote: What is the e?

The "e" in the formulas represents Euler's number, which is a mathematical constant
approximately equal to: e≈2.7183

It is the base of the natural logarithm and is widely used in exponential functions, especially in
activation functions for neural networks.

Why is "e" used in activation functions?

1. Growth and Decay - Functions involving e^x naturally model exponential growth or decay, making them useful for probability functions and gradients in machine learning.
2. Differentiation Simplicity - The derivative of e^x is simply e^x, which makes backpropagation calculations in neural networks efficient.
3. Probability Applications - In functions like sigmoid and softmax, e^x ensures positive outputs and smooth transformations between values.
Intended Learning Outcome

At the end of the lesson, you are expected to be able to:


- describe the structure of a Multilayer Perceptron, including
input, hidden, and output layers
- implement a basic MLP model using Python and libraries
like scikit-learn
- evaluate the performance of MLP models using appropriate
metrics (e.g., accuracy, loss functions) and identify factors
that affect model performance, such as activation functions
and hyperparameters
LET’S REVIEW!!!

• A neural network consists of interconnected nodes, called neurons, organized into layers.
• Each neuron receives input signals, performs a computation
on them using an activation function, and produces an
output signal that may be passed to other neurons in the
network.
• An activation function determines the output of a neuron
given its input.
• These functions introduce nonlinearity into the network,
enabling it to learn complex patterns in data.
LET’S REVIEW!!!

• The network is typically organized into layers, starting with the input layer, where data is introduced, followed by hidden layers, where computations are performed, and finally the output layer, where predictions or decisions are made.
• Neurons in adjacent layers are connected by weighted
connections, which transmit signals from one layer to
the next.
• The strength of these connections, represented by
weights, determines how much influence one neuron's
output has on another neuron's input.
• During the training process, the network learns to adjust its weights based on examples provided in a training dataset.
LET’S REVIEW!!!

• Neural networks are trained using techniques called feedforward propagation and backpropagation.
• During feedforward propagation, input data is passed through
the network layer by layer, with each layer performing a
computation based on the inputs it receives and passing the
result to the next layer.
• Backpropagation is an algorithm used to train neural networks
by iteratively adjusting the network's weights and biases in
order to minimize the loss function.
• A loss function (also known as a cost function or objective
function) is a measure of how well the model's predictions
match the true target values in the training data.
• The loss function quantifies the difference between the
predicted output of the model and the actual output, providing
a signal that guides the optimization process during training.
• The goal of training a neural network is to minimize
this loss function by adjusting the weights and biases.
• The adjustments are guided by an optimization
algorithm, such as gradient descent.
There are several types of ANN, each designed for specific tasks and architectural
requirements. Let's briefly discuss some of the most common types before diving
deeper into MLPs next.

Feedforward Neural Networks (FNN)


• These are the simplest form of ANNs, where information flows in one direction, from
input to output.
• There are no cycles or loops in the network architecture.
• Multilayer perceptrons (MLP) are a type of feedforward neural network.

Recurrent Neural Networks (RNN)


• In RNNs, connections between nodes form directed cycles, allowing information to
persist over time.
• This makes them suitable for tasks involving sequential data, such as time series
prediction, natural language processing, and speech recognition.
Introduction to Multilayer Perceptron (MLP)

• A Multilayer Perceptron (MLP) is a class of feedforward artificial neural networks (ANNs).
• It consists of at least three layers of nodes: an input layer,
one or more hidden layers, and an output layer.
• Each node (neuron) in one layer connects with a certain
weight to every node in the following layer.
Input layer
• The input layer consists of nodes or neurons that receive the initial input data.
Each neuron represents a feature or dimension of the input data.
• The number of neurons in the input layer is determined by the dimensionality of
the input data.

Hidden layer
• Between the input and output layers, there can be one or more layers of
neurons.
• Each neuron in a hidden layer receives inputs from all neurons in the previous
layer (either the input layer or another hidden layer) and produces an output that
is passed to the next layer.
• The number of hidden layers and the number of neurons in each hidden layer are
hyperparameters that need to be determined during the model design phase.
Output layer
• This layer consists of neurons that produce the final output of the network.
• The number of neurons in the output layer depends on the nature of the task.
• In binary classification, there may be either one or two neurons, depending on the activation function, with the output representing the probability of belonging to one class; in multi-class classification tasks, the output layer typically has multiple neurons, one per class.

Weights
• Neurons in adjacent layers are fully connected to each other.
• Each connection has an associated weight, which determines the strength of the
connection.
• These weights are learned during the training process.
Bias neurons
• In addition to the input and hidden neurons, each layer except the output layer usually includes a bias neuron that provides a constant input to the neurons in the next layer.
• Bias neurons have their own weight associated with each connection, which is also
learned during training.
• The bias neuron effectively shifts the activation function of the neurons in the
subsequent layer, allowing the network to learn an offset or bias in the decision
boundary.
• By adjusting the weights connected to the bias neuron, the MLP can learn to
control the threshold for activation and better fit the training data.

Note: It is important to note that in the context of MLPs, bias can refer to two related
but distinct concepts: bias as a general term in machine learning and the bias neuron
(defined above). In general machine learning, bias refers to the error introduced by
approximating a real-world problem with a simplified model. Bias measures how well
the model can capture the underlying patterns in the data.
In a multilayer perceptron, neurons process information in a step-by-step
manner, performing computations that involve weighted sums and
nonlinear transformations.

Input layer

• The input layer of an MLP receives input data, which could be features
extracted from the input samples in a dataset. Each neuron in the input
layer represents one feature.
• Neurons in the input layer do not perform any computations; they
simply pass the input values to the neurons in the first hidden layer.
Hidden layers
• The hidden layers of an MLP consist of interconnected neurons that perform computations on the
input data.
• Each neuron in a hidden layer receives input from all neurons in the previous
layer.
• The inputs are multiplied by corresponding weights, denoted as w.
• The weights determine how much influence the input from one neuron has on the
output of another.
• In addition to weights, each neuron in the hidden layer has an associated bias,
denoted as b.
• The bias provides an additional input to the neuron, allowing it to adjust its output
threshold. Like weights, biases are learned during training.
• For each neuron in a hidden layer or the output layer, the weighted sum of its inputs is computed.
• This involves multiplying each input by its corresponding weight, summing up these products, and adding the bias:

z = (w1 * x1) + (w2 * x2) + ... + (wn * xn) + b

• The weighted sum z is then passed through an activation function, denoted as f, giving the neuron's output f(z).
• The activation function introduces nonlinearity into the network, allowing it to learn
and represent complex relationships in the data.
• The activation function determines the output range of the neuron and its behavior
in response to different input values.
• The choice of activation function depends on the nature of the task and the desired
properties of the network.
Output layer
• The output layer of an MLP produces the final predictions or
outputs of the network.
• The number of neurons in the output layer depends on the task
being performed (e.g., binary classification, multi-class
classification, regression).
• Each neuron in the output layer receives input from the neurons
in the last hidden layer and applies an activation function.
• This activation function is usually different from those used in
the hidden layers and produces the final output value or
prediction.
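As a concrete illustration of this input-hidden-output structure (and of the scikit-learn route mentioned in the intended learning outcomes), here is a minimal sketch using MLPClassifier on scikit-learn's built-in digits dataset. The hidden-layer sizes and other hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Small built-in dataset: 8x8 digit images flattened into 64 input features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# MLP with two hidden layers; input and output layer sizes are inferred from the data
mlp = MLPClassifier(hidden_layer_sizes=(64, 32),
                    activation="relu",
                    max_iter=500,
                    random_state=42)
mlp.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, mlp.predict(X_test)))
```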
Intended Learning Outcomes (ILOs) for
Loss Functions in Neural Networks

At the end of the lesson, students are expected to be able to:


• Explain the purpose of a loss function in training neural networks.
• Identify and classify loss functions based on the type of problem
(regression vs. classification).
• Select an appropriate loss function for a given machine learning task.
• Implement loss functions using Python libraries such as
TensorFlow/Keras.
Introduction to Loss Functions
• In neural networks, a loss function measures how well the model's
predictions match the actual target values.
• It serves as the foundation for training the network, guiding the
optimization process to improve accuracy.
• The loss function, also referred to as the error function, is a crucial
component in machine learning that quantifies the difference
between the predicted outputs of a machine learning algorithm and
the actual target values.
Why Do We Need a Loss Function?
• It quantifies the difference between the predicted output and the
actual output.
• It provides feedback to update model parameters (weights and
biases).
• It helps optimize the neural network through gradient descent or
other optimization techniques.
Why Do We Need a Loss Function?
• The resulting value, the loss, reflects the accuracy of
the model's predictions.
• During training, a learning algorithm such as the
backpropagation algorithm uses the gradient of the
loss function with respect to the model's parameters to
adjust these parameters and minimize the loss,
effectively improving the model's performance on the
dataset.
Gradient Descent
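As an illustration of the idea, here is a minimal sketch of gradient descent fitting a single weight w to minimize a mean squared error loss. The toy data and learning rate are illustrative assumptions.

```python
import numpy as np

# Toy data generated from y = 3x, so the optimal weight is 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0                 # initial guess for the weight
learning_rate = 0.05

for step in range(50):
    y_pred = w * x
    loss = np.mean((y_pred - y) ** 2)            # MSE loss
    gradient = np.mean(2 * (y_pred - y) * x)     # derivative of the loss with respect to w
    w -= learning_rate * gradient                # move against the gradient to reduce the loss
    if step % 10 == 0:
        print(f"step {step}: w={w:.3f}, loss={loss:.4f}")
```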
Types of Loss Functions
Loss functions are broadly categorized into regression loss functions
and classification loss functions based on the type of problem being
solved.
• Loss Functions for Regression
• Mean Squared Error (MSE)
• Mean Absolute Error (MAE)
• Loss Functions for Classification
• Binary Cross-Entropy (Log Loss) (for Binary Classification)
• Categorical Cross-Entropy (for Multi-Class Classification)
• Sparse Categorical Cross-Entropy
Mean Squared Error (MSE)

• Formula: MSE = (1/n) * sum((actual_i - predicted_i)^2)
• Measures the average squared difference between actual and predicted values.
• Penalizes larger errors more than smaller ones.

Actual   Predicted   Difference   Squared Difference
85       86          -1           1
78       84          -6           36
89       92          -3           9
78       76           2           4
82       74           8           64

Sum of squared differences = 114; MSE = 114 / 5 = 22.8
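The table can be verified with a short NumPy check (values taken from the table above):

```python
import numpy as np

actual = np.array([85, 78, 89, 78, 82])
predicted = np.array([86, 84, 92, 76, 74])

squared_errors = (actual - predicted) ** 2       # [1, 36, 9, 4, 64]
mse = squared_errors.mean()                      # (1 + 36 + 9 + 4 + 64) / 5
print(mse)                                       # 22.8
```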
Binary Cross-Entropy (Log Loss) (for Binary
Classification)

• Used when predicting probabilities for two classes (0 or 1).


• Encourages correct probability estimation.

User ID   Predicted Class   Predicted Probability of 'Class 1'   Actual Class   Probability of Actual Class   Log(Probability of Actual Class)
sd459     1                 0.80                                  1              0.80                          -0.22
sd325     1                 0.65                                  1              0.65                          -0.43
ef345     1                 0.78                                  1              0.78                          -0.25
bw678     1                 0.91                                  1              0.91                          -0.09
df837     0                 0.65                                  0              0.35                          -1.05
lk948     1                 0.87                                  1              0.87                          -0.14
os274     0                 0.22                                  0              0.78                          -0.25
ye923     0                 0.33                                  0              0.67                          -0.4

The binary cross-entropy loss is the negative mean of the log values: -(-0.22 - 0.43 - 0.25 - 0.09 - 1.05 - 0.14 - 0.25 - 0.4) / 8 ≈ 0.35.
Categorical Cross-Entropy (for Multi-Class
Classification)
• Categorical Cross-Entropy (CCE), also known as softmax loss or log loss, is one of the most
commonly used loss functions in machine learning, particularly for classification problems.
• It measures the difference between the predicted probability distribution and the actual (true)
distribution of classes.
Calculating Categorical Cross-Entropy
• Let's break down the categorical cross-entropy calculation with a mathematical
example using the following true labels and predicted probabilities.
• We have 3 samples, each belonging to one of 3 classes (Class A, Class B, or Class
C). The true labels are one-hot encoded.
True Labels (y_true):
Example 1: Class B → [0, 1, 0]
Example 2: Class A → [1, 0, 0]
Example 3: Class C → [0, 0, 1]

Predicted Probabilities (y_pred):


Example 1: [0.1, 0.8, 0.1]
Example 2: [0.7, 0.2, 0.1]
Example 3: [0.2, 0.3, 0.5]
Calculating Categorical Cross-Entropy
• Final Losses (each loss is the negative log of the predicted probability for the true class):
• For Example 1, the loss is: -ln(0.8) = 0.22314355
• For Example 2, the loss is: -ln(0.7) = 0.35667494
• For Example 3, the loss is: -ln(0.5) = 0.69314718
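These values can be reproduced with a short NumPy check using the labels and predictions listed above:

```python
import numpy as np

y_true = np.array([[0, 1, 0],     # Example 1: Class B
                   [1, 0, 0],     # Example 2: Class A
                   [0, 0, 1]])    # Example 3: Class C

y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.7, 0.2, 0.1],
                   [0.2, 0.3, 0.5]])

# Per-sample categorical cross-entropy: -sum(y_true * log(y_pred)) over the classes
losses = -np.sum(y_true * np.log(y_pred), axis=1)
print(losses)   # [0.22314355  0.35667494  0.69314718]
```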

• How Categorical Cross-Entropy Works


• Prediction of Probabilities - The model outputs probabilities for each class.
These probabilities are the likelihood of a data point belonging to each class.
Typically, this is done using a softmax function, which converts raw scores into
probabilities.
• Comparison with True Class - Categorical cross-entropy compares the
predicted probabilities with the actual class labels (one-hot encoded).
• Calculation of Loss - The logarithm of the predicted probability for the correct
class is taken, and the loss function penalizes the model based on how far the
prediction was from the actual class.
Sparse Categorical Cross-Entropy
• Similar to categorical cross-entropy but used when target labels are integer-
encoded instead of one-hot encoded.
• Instead, the labels are represented as integers corresponding to the class
indices. The true labels are integers, where each integer represents the class
index.
• Example:
• If the correct label is "Cat," it would be represented as the integer 1 (since
"Cat" is the second class, starting from 0).
• Suppose the model predicts probabilities like [0.2, 0.7, 0.1]. The loss is
calculated for the correct class (Cat) using the formula: -log(0.7)
• Sparse categorical cross-entropy handles these integer labels directly, which is equivalent to converting them to a one-hot encoded format before calculating the loss. This approach can save memory and computational resources, especially when dealing with datasets containing a large number of classes.
How Does Loss Function Affect Model
Training?
• The loss function provides a numerical value that the optimizer (e.g.,
Gradient Descent, Adam, RMSprop) minimizes.
• The network updates weights using backpropagation, which
calculates the gradient of the loss function with respect to each
parameter.
MSE
MAE
Binary Cross-Entropy
Categorical Cross-Entropy
Sparse Categorical Cross-Entropy
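All of the loss functions listed above are available directly in TensorFlow/Keras, as referenced in the intended learning outcomes. Here is a brief sketch with illustrative sample values:

```python
import tensorflow as tf

# Regression losses: MSE and MAE on a few illustrative values
y_true_reg = [85.0, 78.0, 89.0]
y_pred_reg = [86.0, 84.0, 92.0]
print("MSE:", tf.keras.losses.MeanSquaredError()(y_true_reg, y_pred_reg).numpy())
print("MAE:", tf.keras.losses.MeanAbsoluteError()(y_true_reg, y_pred_reg).numpy())

# Binary classification: targets are 0/1, predictions are probabilities
y_true_bin = [1.0, 0.0, 1.0]
y_pred_bin = [0.8, 0.35, 0.9]
print("Binary cross-entropy:",
      tf.keras.losses.BinaryCrossentropy()(y_true_bin, y_pred_bin).numpy())

# Multi-class classification with one-hot labels
y_true_cat = [[0, 1, 0], [1, 0, 0]]
y_pred_cat = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
print("Categorical cross-entropy:",
      tf.keras.losses.CategoricalCrossentropy()(y_true_cat, y_pred_cat).numpy())

# Same samples with integer labels instead of one-hot vectors
y_true_sparse = [1, 0]
print("Sparse categorical cross-entropy:",
      tf.keras.losses.SparseCategoricalCrossentropy()(y_true_sparse, y_pred_cat).numpy())
```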
