Module 2

A neural network is a computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons) that learn from data to recognize patterns and solve problems. Key components include input, hidden, and output layers, with various types of networks like feedforward, convolutional, and recurrent networks tailored for specific tasks. Applications range from social media recommendations to personal assistants, with foundational models like the perceptron serving as building blocks for more complex neural networks.

1. What is a Neural Network?

A neural network is a computational model inspired by the human brain, designed to recognize patterns and solve problems through learning from data. It is a key element in artificial intelligence (AI) and machine learning (ML).

Key Characteristics:
• Consists of layers of nodes (neurons) connected by edges (weights).
• Learns by adjusting weights through a process called training.
• Used for tasks like image recognition, natural language processing,
and predictive modeling.
Artificial neural networks (ANNs) are a fundamental concept in deep
learning within artificial intelligence. They are crucial in handling complex
application scenarios that traditional machine-learning algorithms may
struggle with. Here’s an overview of how neural networks operate and
their components:
• Inspired by Biology
ANNs are inspired by biological neurons in the human brain. Just as
neurons activate under specific conditions to trigger actions in the
body, artificial neurons in ANNs activate based on input data.
• Structure of ANNs
ANNs consist of layers of interconnected artificial neurons. These
neurons are organized into layers, each performing specific
computations using activation functions to decide which signals to pass
onto the next layer.
2. The Structure of a Neural Network
Neural networks are composed of three main layers:
a. Input Layer:
• Receives raw data (e.g., numerical values, pixel intensities).
• Passes the data to the next layer for processing.
b. Hidden Layers:
• Perform computations on the input data.
• Apply activation functions to introduce non-linearity.
• There can be multiple hidden layers in a deep neural network.
c. Output Layer:
• Produces the final output (e.g., a classification, regression value, or probability).
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from biological neurons found in animal brains, so they share many similarities in both structure and function.
3. How Neural Networks Work
a. Input Data: The model receives data in numerical form.
Example: An image is represented as a matrix of pixel values.
b. Forward Propagation: The input data flows through the network:
• Each neuron processes inputs by applying weights and biases.
• Outputs are passed through an activation function (e.g., ReLU, sigmoid).
c. Prediction: The network generates an output (e.g., a label, score, or probability).
d. Training:
• The network learns by comparing its predictions with actual outputs (labels) using a loss function.
• Backpropagation adjusts weights to minimize error (see the code sketch below).
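To make these steps concrete, here is a minimal NumPy sketch of forward propagation through one hidden layer followed by a simple loss computation. The input values, random weights, sigmoid activation, and squared-error loss are illustrative assumptions, not a trained model.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Input data: one sample with 3 features (illustrative values)
x = np.array([0.5, -1.2, 3.0])

# Hidden layer: 4 neurons, each with one weight per input plus a bias
W1 = np.random.randn(4, 3) * 0.1
b1 = np.zeros(4)
hidden = sigmoid(W1 @ x + b1)          # forward propagation through the hidden layer

# Output layer: 1 neuron producing a probability-like score
W2 = np.random.randn(1, 4) * 0.1
b2 = np.zeros(1)
prediction = sigmoid(W2 @ hidden + b2)

# Compare the prediction with the actual label using a squared-error loss;
# training would adjust W1, b1, W2, b2 to reduce this value (backpropagation).
label = 1.0
loss = (prediction[0] - label) ** 2
print(f"prediction={prediction[0]:.3f}, loss={loss:.3f}")
```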
4. Key Terms and Concepts
• Neuron (Node) - The basic unit in a neural network that processes input and
generates output.
• Weights - Numerical values representing the importance of connections between
neurons.
• Bias - A constant added to the input to help the network fit the data better.
• Activation Function - Introduces non-linearity into the model to capture complex
patterns.
• Examples: ReLU, sigmoid, tanh.
• Loss Function -Measures the difference between predicted and actual values.
• Examples: Mean Squared Error (MSE), Cross-Entropy Loss.
• Learning Rate - Determines how much the weights are adjusted during training.
5. Types of Neural Networks
• Feedforward Neural Networks (FNN):
• Data flows in one direction from input to output.
• Used for simple classification and regression tasks.
• Convolutional Neural Networks (CNN):
• Specialized for image data.
• Uses convolution layers to detect spatial patterns like edges or
shapes.
• Recurrent Neural Networks (RNN):
• Processes sequential data (e.g., text, time series).
• Has memory mechanisms to retain context over time.
• Deep Neural Networks (DNN):
• Contains multiple hidden layers.
• Used for complex problems requiring high accuracy.
Applications of Artificial Neural Networks

Social Media
• Artificial Neural Networks are used heavily in Social Media. For example, let’s
take the ‘People you may know’ feature on Facebook that suggests people that
you might know in real life so that you can send them friend requests.
• Well, this magical effect is achieved by using Artificial Neural Networks that
analyze your profile, your interests, your current friends, and also their friends
and various other factors to calculate the people you might potentially know.
• Another common application of machine learning in social media is facial recognition.
• This is done by finding around 100 reference points on the person’s face and
then matching them with those already available in the database using
convolutional neural networks.
Marketing and Sales
• When you log onto e-commerce sites like Amazon and Flipkart, they will recommend products for you to buy based on your previous browsing history.
• Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history.
• This is true across all new-age marketing segments like book sites, movie services, hospitality sites, etc., and it is done by implementing personalized marketing.
• This uses Artificial Neural Networks to identify customers' likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
Personal Assistants
• I am sure you have all heard of Siri, Alexa, Cortana, etc.
• These personal assistants are an example of speech recognition: they use Natural Language Processing to interact with users and formulate responses accordingly.
• Natural Language Processing uses artificial neural networks to handle many of the tasks of these personal assistants, such as managing language syntax, semantics, correct speech, and the ongoing conversation.
The following is a basic example of implementing a simple neural network using Python and TensorFlow in Google Colab. The neural network classifies handwritten digits from the MNIST dataset:

https://colab.research.google.com/drive/1T1ACgTU0Jx7JElK0S5NyqLwFAtSZXgxB?usp=sharing
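The notebook itself is not reproduced here, but a minimal Keras sketch along the same lines might look as follows. The layer sizes, activation choices, and training settings are illustrative assumptions and are not necessarily those used in the linked Colab notebook.

```python
import tensorflow as tf

# Load the MNIST handwritten-digit dataset (28x28 grayscale images, labels 0-9)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

# A simple feedforward network: input, one hidden layer, output layer
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # input layer: 784 pixel values
    tf.keras.layers.Dense(128, activation="relu"),     # hidden layer (size is an illustrative choice)
    tf.keras.layers.Dense(10, activation="softmax"),   # output layer: one probability per digit
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer labels, so sparse cross-entropy
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```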
What is Perceptron | The Simplest Artificial Neural Network
The Perceptron is one of the simplest artificial neural network architectures, introduced by Frank Rosenblatt in 1957. It is primarily used for binary classification.

At that time, traditional methods like statistical machine learning and conventional programming were commonly used for predictions.

Despite being one of the simplest forms of artificial neural networks, the Perceptron model proved to be highly effective in solving specific classification problems, laying the groundwork for advancements in AI and machine learning.

This lesson aims to provide the fundamentals of the perceptron model: its architecture, working principles, and applications.
What is Perceptron?

A Perceptron is a type of neural network that performs binary classification, mapping input features to an output decision and classifying data into one of two categories, such as 0 or 1.

A Perceptron consists of a single layer of input nodes that are fully connected to a layer of output nodes. It is particularly good at learning linearly separable patterns.

It utilizes a variation of artificial neurons called Threshold Logic Units (TLUs), which were first introduced by Warren McCulloch and Walter Pitts in the 1940s.

This foundational model has played a crucial role in the development of more advanced neural networks and machine learning algorithms.
Types of Perceptron

Single-Layer Perceptron: this type of perceptron is limited to learning linearly separable patterns.
It is effective for tasks where the data can be divided into distinct categories by a straight line.
While powerful in its simplicity, it struggles with more complex problems where the relationship between inputs and outputs is non-linear.

Multi-Layer Perceptron: multi-layer perceptrons possess enhanced processing capabilities, as they consist of two or more layers, making them adept at handling more complex patterns and relationships within the data.
Basic Components of Perceptron

A Perceptron is composed of key components that work together to process information and make predictions.
• Input Features
• The perceptron takes multiple input features, each representing a
characteristic of the input data.
• Weights
• Each input feature is assigned a weight that determines its
influence on the output. These weights are adjusted during
training to find the optimal values.
• Summation Function
• The perceptron calculates the weighted sum of its inputs,
combining them with their respective weights.
• Activation Function
• The weighted sum is passed through the Heaviside step
function, comparing it to a threshold to produce a binary output
(0 or 1).
• Output
• The final output is determined by the activation function, often
used for binary classification tasks.
• Bias
• The bias term helps the perceptron make adjustments
independent of the input, improving its flexibility in learning.
• Learning Algorithm
• The perceptron adjusts its weights and bias using a learning
algorithm, such as the Perceptron Learning Rule, to minimize
prediction errors.

These components enable the perceptron to learn from data and make predictions. While a single perceptron can handle simple binary classification, complex tasks require multiple perceptrons organized into multiple layers.
How does Perceptron work?

A weight is assigned to each input node of a perceptron, indicating the importance of that input in determining the output.
The Perceptron's output is calculated as a weighted sum of the inputs, which is then passed through an activation function to decide whether the Perceptron will fire.
The weighted sum is computed as:

z = (w1 * x1) + (w2 * x2) + ... + (wn * xn)

The step function compares this weighted sum to a threshold. If the input is larger than the threshold value, the output is 1; otherwise, it is 0.
The most common activation function used in Perceptrons is the Heaviside step function:

h(z) = 0 if z < 0, 1 if z ≥ 0 (with the threshold absorbed into the bias term)
A perceptron consists of a single layer of Threshold Logic Units (TLU), with each TLU
fully connected to all input nodes.
In a fully connected layer, also known as a dense layer, all neurons in
one layer are connected to every neuron in the previous layer.
The output of the fully connected layer is computed as:

output = h(X * W + b)

where X is the input, W is the weight matrix (one weight for each input to each neuron), b is the bias, and h is the step function.

During training, the Perceptron's weights are adjusted to minimize the difference between the predicted output and the actual output.
This is achieved using supervised learning algorithms like the delta rule or the Perceptron learning rule.
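As a concrete illustration of the Perceptron learning rule described above, here is a minimal NumPy sketch that trains a single perceptron on the AND function. The dataset, learning rate, and number of epochs are illustrative assumptions.

```python
import numpy as np

# Training data for the AND function: inputs X and binary targets y
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)          # one weight per input feature
b = 0.0                  # bias term
learning_rate = 0.1

def step(z):
    # Heaviside step activation: fire (1) if the weighted sum is non-negative
    return 1 if z >= 0 else 0

# Perceptron learning rule: adjust weights and bias in the direction of the error
for epoch in range(10):
    for xi, target in zip(X, y):
        prediction = step(np.dot(w, xi) + b)
        error = target - prediction
        w += learning_rate * error * xi
        b += learning_rate * error

print("weights:", w, "bias:", b)
print("predictions:", [step(np.dot(w, xi) + b) for xi in X])
```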
1. Introduction

• Activation functions are mathematical functions applied to the output of neurons in a neural network.
• They introduce non-linearity, enabling the network to learn
complex patterns and make accurate predictions.
• Without activation functions, a neural network would behave
like a simple linear regression model, limiting its ability to solve
complex problems.
2. Importance of Activation Functions

• Introduce non-linearity, allowing neural networks to approximate complex functions.
• Help control the range of neuron outputs, making training more
stable.
• Enable networks to learn hierarchical patterns and features.
3. Understanding Non-Linearity

• In simple terms, non-linearity means that the relationship between inputs and outputs is not a straight line.
• If a neural network were purely linear, increasing an input would always lead
to a proportional increase in the output.
• However, real-world data is often complex, and relationships between inputs
and outputs can be curved or more intricate.
• By introducing non-linearity, activation functions allow the network to learn
and model complex behaviors, such as recognizing images, understanding
language, or making sophisticated predictions.
• Without non-linearity, no matter how many layers a neural network has, it
would still behave like a simple equation and fail to capture complex patterns.
4. Common Activation Functions

Sigmoid Function
• Formula: f(x) = 1 / (1 + exp(-x))
• Range: (0,1)
• Pros:
• Smooth and differentiable.
• Converts inputs into probabilities (useful in binary classification).
• Cons:
• Prone to vanishing gradient problem (gradients become very small
for extreme values of x).
• Not centered around zero, leading to inefficient weight updates.
Hyperbolic Tangent (Tanh) Function

• Formula: f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))


• Range: (-1,1)
• Pros:
• Centered around zero, leading to better gradient updates than
Sigmoid.
• Useful in hidden layers to transform data.
• Cons:
• Also suffers from the vanishing gradient problem for extreme
values.
Rectified Linear Unit (ReLU) Function

• Formula: f(x) = max(0, x)


• Range: [0, +∞)
• Pros:
• Avoids vanishing gradients for positive inputs.
• Computationally efficient.
• Allows sparse activations, making the network efficient.
• Cons:
• Dead neurons issue: If x is negative, the gradient becomes zero, meaning
the neuron may stop learning.
• Can cause exploding activations for large values.
Leaky ReLU Function

• Formula: f(x) = max(αx, x), where α is a small constant (e.g., 0.01)


• Range: (-∞, +∞)
• Pros:
• Addresses the dead neuron problem in ReLU by allowing small negative
values.
• Retains advantages of ReLU while improving gradient flow.
• Cons:
• The choice of α is arbitrary and requires tuning.
Exponential Linear Unit (ELU) Function

• Formula:
• f(x) = x if x > 0
• f(x) = α * (exp(x) - 1) if x ≤ 0
• Range: (-∞, +∞)
• Pros:
• Avoids dead neuron problem.
• Has negative values, which improves gradient flow.
• Cons:
• More computationally expensive than ReLU.
• Requires tuning the parameter α.
Softmax Function

• Used in the output layer for multi-class classification.


• Formula: f(x_i) = exp(x_i) / sum(exp(x_j))
• Range: (0,1), outputs sum to 1 (probabilities of each class).
• Pros:
• Converts logits into probabilities, making it interpretable.
• Cons:
• Can amplify differences, leading to confident but possibly incorrect
predictions.
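The functions above translate directly into code. Here is a brief NumPy sketch of the formulas discussed; the α defaults are the illustrative values mentioned earlier.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))              # range (0, 1)

def tanh(x):
    return np.tanh(x)                             # range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)                     # range [0, +inf)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)          # small slope for negative inputs

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def softmax(x):
    e = np.exp(x - np.max(x))                     # subtract the max for numerical stability
    return e / e.sum()                            # outputs are probabilities that sum to 1

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), softmax(z))
```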
Sidenote: What is the e?

The "e" in the formulas represents Euler's number, which is a mathematical constant
approximately equal to: e≈2.7183

It is the base of the natural logarithm and is widely used in exponential functions, especially in
activation functions for neural networks.

Why is "e" used in activation functions?

1. Growth and Decay - Functions involving e^x naturally model exponential growth or decay, making them useful for probability functions and gradients in machine learning.
2. Differentiation Simplicity - The derivative of e^x is simply e^x, which makes backpropagation calculations in neural networks efficient.
3. Probability Applications - In functions like sigmoid and softmax, e^x ensures positive outputs and smooth transformations between values.
Intended Learning Outcome

At the end of the lesson, you are expected to be able to:


- describe the structure of a Multilayer Perceptron, including
input, hidden, and output layers
- implement a basic MLP model using Python and libraries
like scikit-learn
- evaluate the performance of MLP models using appropriate
metrics (e.g., accuracy, loss functions) and identify factors
that affect model performance, such as activation functions
and hyperparameters
LET’S REVIEW!!!

• A neural network consists of interconnected nodes, called neurons, organized into layers.
• Each neuron receives input signals, performs a computation
on them using an activation function, and produces an
output signal that may be passed to other neurons in the
network.
• An activation function determines the output of a neuron
given its input.
• These functions introduce nonlinearity into the network,
enabling it to learn complex patterns in data.
LET’S REVIEW!!!

• The network is typically organized into layers, starting with the input layer, where data is introduced, followed by hidden layers, where computations are performed, and finally the output layer, where predictions or decisions are made.
• Neurons in adjacent layers are connected by weighted
connections, which transmit signals from one layer to
the next.
• The strength of these connections, represented by
weights, determines how much influence one neuron's
output has on another neuron's input.
• During the training process, the network learns to adjust its weights based on examples provided in a training dataset.
LET’S REVIEW!!!

• Neural networks are trained using techniques called feedforward propagation and backpropagation.
• During feedforward propagation, input data is passed through
the network layer by layer, with each layer performing a
computation based on the inputs it receives and passing the
result to the next layer.
• Backpropagation is an algorithm used to train neural networks
by iteratively adjusting the network's weights and biases in
order to minimize the loss function.
• A loss function (also known as a cost function or objective
function) is a measure of how well the model's predictions
match the true target values in the training data.
• The loss function quantifies the difference between the
predicted output of the model and the actual output, providing
a signal that guides the optimization process during training.
• The goal of training a neural network is to minimize
this loss function by adjusting the weights and biases.
• The adjustments are guided by an optimization
algorithm, such as gradient descent.
There are several types of ANN, each designed for specific tasks and architectural
requirements. Let's briefly discuss some of the most common types before diving
deeper into MLPs next.

Feedforward Neural Networks (FNN)


• These are the simplest form of ANNs, where information flows in one direction, from
input to output.
• There are no cycles or loops in the network architecture.
• Multilayer perceptrons (MLP) are a type of feedforward neural network.

Recurrent Neural Networks (RNN)


• In RNNs, connections between nodes form directed cycles, allowing information to
persist over time.
• This makes them suitable for tasks involving sequential data, such as time series
prediction, natural language processing, and speech recognition.
Introduction to Multilayer Perceptron (MLP)

• A Multilayer Perceptron (MLP) is a class of feedforward artificial neural networks (ANNs).
• It consists of at least three layers of nodes: an input layer,
one or more hidden layers, and an output layer.
• Each node (neuron) in one layer connects with a certain
weight to every node in the following layer.
Input layer
• The input layer consists of nodes or neurons that receive the initial input data.
Each neuron represents a feature or dimension of the input data.
• The number of neurons in the input layer is determined by the dimensionality of
the input data.

Hidden layer
• Between the input and output layers, there can be one or more layers of
neurons.
• Each neuron in a hidden layer receives inputs from all neurons in the previous
layer (either the input layer or another hidden layer) and produces an output that
is passed to the next layer.
• The number of hidden layers and the number of neurons in each hidden layer are
hyperparameters that need to be determined during the model design phase.
Output layer
• This layer consists of neurons that produce the final output of the network.
• The number of neurons in the output layer depends on the nature of the task.
• In binary classification, there may be either one or two neurons, depending on the activation function, with the output representing the probability of belonging to one class; in multi-class classification tasks, the output layer typically has multiple neurons, one per class.

Weights
• Neurons in adjacent layers are fully connected to each other.
• Each connection has an associated weight, which determines the strength of the
connection.
• These weights are learned during the training process.
Bias neurons
• In addition to the input and hidden neurons, each layer except the output layer usually includes a bias neuron that provides a constant input to the neurons in the next layer.
• Bias neurons have their own weight associated with each connection, which is also
learned during training.
• The bias neuron effectively shifts the activation function of the neurons in the
subsequent layer, allowing the network to learn an offset or bias in the decision
boundary.
• By adjusting the weights connected to the bias neuron, the MLP can learn to
control the threshold for activation and better fit the training data.

Note: It is important to note that in the context of MLPs, bias can refer to two related
but distinct concepts: bias as a general term in machine learning and the bias neuron
(defined above). In general machine learning, bias refers to the error introduced by
approximating a real-world problem with a simplified model. Bias measures how well
the model can capture the underlying patterns in the data.
In a multilayer perceptron, neurons process information in a step-by-step
manner, performing computations that involve weighted sums and
nonlinear transformations.

Input layer

• The input layer of an MLP receives input data, which could be features
extracted from the input samples in a dataset. Each neuron in the input
layer represents one feature.
• Neurons in the input layer do not perform any computations; they
simply pass the input values to the neurons in the first hidden layer.
Hidden layers
• The hidden layers of an MLP consist of interconnected neurons that perform computations on the
input data.
• Each neuron in a hidden layer receives input from all neurons in the previous
layer.
• The inputs are multiplied by corresponding weights, denoted as w.
• The weights determine how much influence the input from one neuron has on the
output of another.
• In addition to weights, each neuron in the hidden layer has an associated bias,
denoted as b.
• The bias provides an additional input to the neuron, allowing it to adjust its output
threshold. Like weights, biases are learned during training.
• For each neuron in a hidden layer or the output layer, the weighted sum of its inputs is computed.
• This involves multiplying each input by its corresponding weight, summing up these products, and adding the bias:

z = (w1 * x1) + (w2 * x2) + ... + (wn * xn) + b

• The weighted sum z is then passed through an activation function, denoted as f, giving the neuron's output f(z).
• The activation function introduces nonlinearity into the network, allowing it to learn
and represent complex relationships in the data.
• The activation function determines the output range of the neuron and its behavior
in response to different input values.
• The choice of activation function depends on the nature of the task and the desired
properties of the network.
Output layer
• The output layer of an MLP produces the final predictions or
outputs of the network.
• The number of neurons in the output layer depends on the task
being performed (e.g., binary classification, multi-class
classification, regression).
• Each neuron in the output layer receives input from the neurons
in the last hidden layer and applies an activation function.
• This activation function is usually different from those used in
the hidden layers and produces the final output value or
prediction.
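As a concrete illustration of this input-hidden-output structure (and of the scikit-learn route mentioned in the intended learning outcomes), here is a minimal sketch using MLPClassifier on scikit-learn's built-in digits dataset. The hidden-layer sizes and other hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Small built-in dataset: 8x8 digit images flattened into 64 input features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# MLP with two hidden layers; input and output layer sizes are inferred from the data
mlp = MLPClassifier(hidden_layer_sizes=(64, 32),
                    activation="relu",
                    max_iter=500,
                    random_state=42)
mlp.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, mlp.predict(X_test)))
```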
Intended Learning Outcomes (ILOs) for
Loss Functions in Neural Networks

At the end of the lesson, students are expected to be able to:


• Explain the purpose of a loss function in training neural networks.
• Identify and classify loss functions based on the type of problem
(regression vs. classification).
• Select an appropriate loss function for a given machine learning task.
• Implement loss functions using Python libraries such as
TensorFlow/Keras.
Introduction to Loss Functions
• In neural networks, a loss function measures how well the model's
predictions match the actual target values.
• It serves as the foundation for training the network, guiding the
optimization process to improve accuracy.
• The loss function, also referred to as the error function, is a crucial
component in machine learning that quantifies the difference
between the predicted outputs of a machine learning algorithm and
the actual target values.
Why Do We Need a Loss Function?
• It quantifies the difference between the predicted output and the
actual output.
• It provides feedback to update model parameters (weights and
biases).
• It helps optimize the neural network through gradient descent or
other optimization techniques.
Why Do We Need a Loss Function?
• The resulting value, the loss, reflects the accuracy of
the model's predictions.
• During training, a learning algorithm such as the
backpropagation algorithm uses the gradient of the
loss function with respect to the model's parameters to
adjust these parameters and minimize the loss,
effectively improving the model's performance on the
dataset.
Gradient Descent
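As an illustration of the idea, here is a minimal sketch of gradient descent fitting a single weight w to minimize a mean squared error loss. The toy data and learning rate are illustrative assumptions.

```python
import numpy as np

# Toy data generated from y = 3x, so the optimal weight is 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0                 # initial guess for the weight
learning_rate = 0.05

for step in range(50):
    y_pred = w * x
    loss = np.mean((y_pred - y) ** 2)            # MSE loss
    gradient = np.mean(2 * (y_pred - y) * x)     # derivative of the loss with respect to w
    w -= learning_rate * gradient                # move against the gradient to reduce the loss
    if step % 10 == 0:
        print(f"step {step}: w={w:.3f}, loss={loss:.4f}")
```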
Types of Loss Functions
Loss functions are broadly categorized into regression loss functions
and classification loss functions based on the type of problem being
solved.
• Loss Functions for Regression
• Mean Squared Error (MSE)
• Mean Absolute Error (MAE)
• Loss Functions for Classification
• Binary Cross-Entropy (Log Loss) (for Binary Classification)
• Categorical Cross-Entropy (for Multi-Class Classification)
• Sparse Categorical Cross-Entropy
Mean Squared Error (MSE)

• Formula: MSE = (1/n) * sum((actual_i - predicted_i)^2)
• Measures the average squared difference between actual and predicted values.
• Penalizes larger errors more than smaller ones.

Actual   Predicted   Difference   Squared Difference
85       86          -1           1
78       84          -6           36
89       92          -3           9
78       76           2           4
82       74           8           64

Sum of squared differences = 114; MSE = 114 / 5 = 22.8
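The table can be verified with a short NumPy check (values taken from the table above):

```python
import numpy as np

actual = np.array([85, 78, 89, 78, 82])
predicted = np.array([86, 84, 92, 76, 74])

squared_errors = (actual - predicted) ** 2       # [1, 36, 9, 4, 64]
mse = squared_errors.mean()                      # (1 + 36 + 9 + 4 + 64) / 5
print(mse)                                       # 22.8
```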
Binary Cross-Entropy (Log Loss) (for Binary
Classification)

• Used when predicting probabilities for two classes (0 or 1).


• Encourages correct probability estimation.

User ID   Predicted Class   Predicted Probability of 'Class 1'   Actual Class   Probability of Actual Class   Log(Probability of Actual Class)
sd459     1                 0.80                                  1              0.80                          -0.22
sd325     1                 0.65                                  1              0.65                          -0.43
ef345     1                 0.78                                  1              0.78                          -0.25
bw678     1                 0.91                                  1              0.91                          -0.09
df837     0                 0.65                                  0              0.35                          -1.05
lk948     1                 0.87                                  1              0.87                          -0.14
os274     0                 0.22                                  0              0.78                          -0.25
ye923     0                 0.33                                  0              0.67                          -0.4

The binary cross-entropy loss is the negative mean of the log values: -(-0.22 - 0.43 - 0.25 - 0.09 - 1.05 - 0.14 - 0.25 - 0.4) / 8 ≈ 0.35.
Categorical Cross-Entropy (for Multi-Class
Classification)
• Categorical Cross-Entropy (CCE), also known as softmax loss or log loss, is one of the most
commonly used loss functions in machine learning, particularly for classification problems.
• It measures the difference between the predicted probability distribution and the actual (true)
distribution of classes.
Calculating Categorical Cross-Entropy
• Let's break down the categorical cross-entropy calculation with a mathematical
example using the following true labels and predicted probabilities.
• We have 3 samples, each belonging to one of 3 classes (Class A, Class B, or Class
C). The true labels are one-hot encoded.
True Labels (y_true):
Example 1: Class B → [0, 1, 0]
Example 2: Class A → [1, 0, 0]
Example 3: Class C → [0, 0, 1]

Predicted Probabilities (y_pred):


Example 1: [0.1, 0.8, 0.1]
Example 2: [0.7, 0.2, 0.1]
Example 3: [0.2, 0.3, 0.5]
Calculating Categorical Cross-Entropy
• Final Losses (each loss is the negative log of the predicted probability for the true class):
• For Example 1, the loss is: -ln(0.8) = 0.22314355
• For Example 2, the loss is: -ln(0.7) = 0.35667494
• For Example 3, the loss is: -ln(0.5) = 0.69314718
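These values can be reproduced with a short NumPy check using the labels and predictions listed above:

```python
import numpy as np

y_true = np.array([[0, 1, 0],     # Example 1: Class B
                   [1, 0, 0],     # Example 2: Class A
                   [0, 0, 1]])    # Example 3: Class C

y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.7, 0.2, 0.1],
                   [0.2, 0.3, 0.5]])

# Per-sample categorical cross-entropy: -sum(y_true * log(y_pred)) over the classes
losses = -np.sum(y_true * np.log(y_pred), axis=1)
print(losses)   # [0.22314355  0.35667494  0.69314718]
```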

• How Categorical Cross-Entropy Works


• Prediction of Probabilities - The model outputs probabilities for each class.
These probabilities are the likelihood of a data point belonging to each class.
Typically, this is done using a softmax function, which converts raw scores into
probabilities.
• Comparison with True Class - Categorical cross-entropy compares the
predicted probabilities with the actual class labels (one-hot encoded).
• Calculation of Loss - The logarithm of the predicted probability for the correct
class is taken, and the loss function penalizes the model based on how far the
prediction was from the actual class.
Sparse Categorical Cross-Entropy
• Similar to categorical cross-entropy but used when target labels are integer-
encoded instead of one-hot encoded.
• Instead, the labels are represented as integers corresponding to the class
indices. The true labels are integers, where each integer represents the class
index.
• Example:
• If the correct label is "Cat," it would be represented as the integer 1 (since
"Cat" is the second class, starting from 0).
• Suppose the model predicts probabilities like [0.2, 0.7, 0.1]. The loss is
calculated for the correct class (Cat) using the formula: -log(0.7)
• Sparse categorical cross-entropy handles these integer labels directly, which is equivalent to converting them to a one-hot encoded format before calculating the loss. This approach can save memory and computational resources, especially when dealing with datasets containing a large number of classes.
How Does Loss Function Affect Model
Training?
• The loss function provides a numerical value that the optimizer (e.g.,
Gradient Descent, Adam, RMSprop) minimizes.
• The network updates weights using backpropagation, which
calculates the gradient of the loss function with respect to each
parameter.
MSE
MAE
Binary Cross-Entropy
Categorical Cross-Entropy
Sparse Categorical Cross-Entropy
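All of the loss functions listed above are available directly in TensorFlow/Keras, as referenced in the intended learning outcomes. Here is a brief sketch with illustrative sample values:

```python
import tensorflow as tf

# Regression losses: MSE and MAE on a few illustrative values
y_true_reg = [85.0, 78.0, 89.0]
y_pred_reg = [86.0, 84.0, 92.0]
print("MSE:", tf.keras.losses.MeanSquaredError()(y_true_reg, y_pred_reg).numpy())
print("MAE:", tf.keras.losses.MeanAbsoluteError()(y_true_reg, y_pred_reg).numpy())

# Binary classification: targets are 0/1, predictions are probabilities
y_true_bin = [1.0, 0.0, 1.0]
y_pred_bin = [0.8, 0.35, 0.9]
print("Binary cross-entropy:",
      tf.keras.losses.BinaryCrossentropy()(y_true_bin, y_pred_bin).numpy())

# Multi-class classification with one-hot labels
y_true_cat = [[0, 1, 0], [1, 0, 0]]
y_pred_cat = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
print("Categorical cross-entropy:",
      tf.keras.losses.CategoricalCrossentropy()(y_true_cat, y_pred_cat).numpy())

# Same samples with integer labels instead of one-hot vectors
y_true_sparse = [1, 0]
print("Sparse categorical cross-entropy:",
      tf.keras.losses.SparseCategoricalCrossentropy()(y_true_sparse, y_pred_cat).numpy())
```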
