Neural networks
Neural networks are machine learning models that mimic the complex functions of the human
brain. These models consist of interconnected nodes or neurons that process data, learn
patterns, and enable tasks such as pattern recognition and decision-making. These networks are
built from several key components:
1. Neurons: The basic units that receive inputs. Each neuron is governed by a threshold
and an activation function.
2. Connections: Links between neurons that carry information, regulated by weights and
biases.
3. Weights and Biases: These parameters determine the strength and influence of
connections.
4. Propagation Functions: Mechanisms that help process and transfer data across layers
of neurons.
5. Learning Rule: The method that adjusts weights and biases over time to improve
accuracy.
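To make these components concrete, here is a minimal sketch of a single artificial neuron in Python with NumPy. The function names and the input values are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, passed through the activation.
    z = np.dot(weights, inputs) + bias
    return sigmoid(z)

# Example: a neuron with three inputs (values chosen arbitrarily).
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
b = 0.2
print(neuron(x, w, b))  # a single activation value in (0, 1)
```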
Learning in neural networks follows a structured, three-stage process:
Input Computation: Data is fed into the network.
Output Generation: Based on the current parameters, the network generates an output.
Iterative Refinement: The network refines its output by adjusting weights and biases,
gradually improving its performance on diverse tasks.
In an adaptive learning environment:
The neural network is exposed to a simulated scenario or dataset.
Parameters such as weights and biases are updated in response to new data or
conditions.
With each adjustment, the network’s response evolves, allowing it to adapt effectively
to different tasks or environments.
Layers in Neural Network Architecture
1. Input Layer: This is where the network receives its input data. Each input neuron in
the layer corresponds to a feature in the input data.
2. Hidden Layers: These layers perform most of the computational heavy lifting. A
neural network can have one or multiple hidden layers. Each layer consists of units
(neurons) that transform the inputs into something that the output layer can use.
3. Output Layer: The final layer produces the output of the model. The format of these
outputs varies depending on the specific task (e.g., classification, regression).
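As a small illustration (not tied to any specific framework, with layer sizes assumed purely for the example), each pair of adjacent layers can be represented by a weight matrix and a bias vector whose shapes link one layer to the next:

```python
import numpy as np

n_inputs, n_hidden, n_outputs = 4, 8, 3  # assumed layer sizes

# One weight matrix and one bias vector per connection between layers.
W1 = np.random.randn(n_hidden, n_inputs)   # input layer  -> hidden layer
b1 = np.zeros(n_hidden)
W2 = np.random.randn(n_outputs, n_hidden)  # hidden layer -> output layer
b2 = np.zeros(n_outputs)
```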
Working of Neural Networks
Forward Propagation
When data is input into the network, it passes through the network in the forward direction,
from the input layer through the hidden layers to the output layer. This process is known as
forward propagation.
1. Linear Transformation: Each neuron in a layer receives inputs, which are multiplied
by the weights associated with the connections. These products are summed together,
and a bias is added to the sum. This can be represented mathematically as:
z = w₁x₁ + w₂x₂ + … + wₙxₙ + b
where wᵢ represents the weights, xᵢ represents the inputs, and b is the
bias.
2. Activation: The result of the linear transformation (denoted as z) is then passed through
an activation function. The activation function is crucial because it introduces non-
linearity into the system, enabling the network to learn more complex patterns. Popular
activation functions include ReLU, sigmoid, and tanh.
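A minimal sketch of forward propagation through one hidden layer, assuming NumPy; the weights, biases, and input values below are toy numbers chosen only for illustration:

```python
import numpy as np

def relu(z):
    # ReLU activation: zero for negative inputs, identity otherwise.
    return np.maximum(0.0, z)

def forward(x, W1, b1, W2, b2):
    # Linear transformation followed by activation, layer by layer.
    z1 = W1 @ x + b1   # hidden-layer pre-activation
    a1 = relu(z1)      # hidden-layer activation
    z2 = W2 @ a1 + b2  # output-layer pre-activation
    return z2          # raw output (e.g., logits or a regression value)

x = np.array([1.0, 2.0])
W1 = np.array([[0.1, -0.2], [0.4, 0.3]]); b1 = np.zeros(2)
W2 = np.array([[0.5, -0.5]]);             b2 = np.zeros(1)
print(forward(x, W1, b1, W2, b2))
```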
Backpropagation
After forward propagation, the network evaluates its performance using a loss function, which
measures the difference between the actual output and the predicted output. The goal of training
is to minimize this loss. This is where backpropagation comes into play:
1. Loss Calculation: The network calculates the loss, which provides a measure of error
in the predictions. The loss function could vary; common choices are mean squared
error for regression tasks or cross-entropy loss for classification.
2. Gradient Calculation: The network computes the gradients of the loss function with
respect to each weight and bias in the network. This involves applying the chain rule of
calculus to find out how much each part of the output error can be attributed to each
weight and bias.
3. Weight Update: Once the gradients are calculated, the weights and biases are updated
using an optimization algorithm like stochastic gradient descent (SGD). The weights
are adjusted in the opposite direction of the gradient to minimize the loss. The size of
the step taken in each update is determined by the learning rate.
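As a minimal worked example of steps 1-3, here is a single hand-computed gradient step for one weight of a linear model with a squared-error loss. The model and the numbers are assumptions made purely for illustration:

```python
# Model: y_hat = w * x, loss L = (y_hat - y)^2
x, y = 2.0, 10.0          # one training example (illustrative values)
w = 3.0                   # current weight
lr = 0.1                  # learning rate

y_hat = w * x             # forward pass: 6.0
loss = (y_hat - y) ** 2   # loss: 16.0

# Chain rule: dL/dw = dL/dy_hat * dy_hat/dw = 2*(y_hat - y) * x
grad = 2 * (y_hat - y) * x  # -16.0

w = w - lr * grad           # step against the gradient: 3.0 + 1.6 = 4.6
print(w)                    # the updated weight gives a lower loss (0.64)
```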
Iteration
This process of forward propagation, loss calculation, backpropagation, and weight updates is
repeated for many iterations over the dataset. Over time, this iterative process reduces the loss,
and the network’s predictions become more accurate.
Through these steps, neural networks can adapt their parameters to better approximate the
relationships in the data, thereby improving their performance on tasks such as classification,
regression, or any other predictive modeling.
3.1.6 Multilayer Perceptron
A Multi-Layer Perceptron (MLP) consists of fully connected dense layers that transform
input data from one dimension to another. It is called “multi-layer” because it contains an input
layer, one or more hidden layers, and an output layer. The purpose of an MLP is to model
complex relationships between inputs and outputs, making it a powerful tool for various
machine learning tasks.
The key components of Multi-Layer Perceptron include:
Input Layer: Each neuron (or node) in this layer corresponds to an input feature. For
instance, if you have three input features, the input layer will have three neurons.
Hidden Layers: An MLP can have any number of hidden layers, with each layer
containing any number of nodes. These layers process the information received from
the input layer.
Output Layer: The output layer generates the final prediction or result. If there are
multiple outputs, the output layer will have a corresponding number of neurons.
Working of Multi-Layer Perceptron
Let’s delve into the working of the multi-layer perceptron, covering the key mechanisms:
forward propagation, the loss function, backpropagation, and optimization.
Step 1: Forward Propagation
In forward propagation, the data flows from the input layer to the output layer, passing
through any hidden layers. Each neuron in the hidden layers processes the input as follows:
1. Weighted Sum: The neuron computes the weighted sum of the inputs:
z = ∑ wᵢxᵢ + b
where xᵢ is the input feature, wᵢ is the corresponding weight, and b is the bias term.
2. Activation Function: The weighted sum z is passed through an activation function to
introduce non-linearity. Common activation functions include:
Sigmoid
ReLU (Rectified Linear Unit)
Tanh (Hyperbolic Tangent)
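The three activation functions named above can each be written in a line or two of NumPy. This is a sketch for illustration, not a library implementation:

```python
import numpy as np

def sigmoid(z):
    # Maps z to (0, 1); historically popular, saturates for large |z|.
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Maps negative values to 0, keeps positive values; cheap and widely used.
    return np.maximum(0.0, z)

def tanh(z):
    # Maps z to (-1, 1); a zero-centered alternative to the sigmoid.
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), relu(z), tanh(z))
```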
Step 2: Loss Function
Once the network generates an output, the next step is to calculate the loss using a loss function.
In supervised learning, this compares the predicted output to the actual label.
For a classification problem, the commonly used binary cross-entropy loss function is:
L = −(1/N) ∑ᵢ₌₁ᴺ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]
where:
yᵢ is the actual label.
ŷᵢ is the predicted label.
N is the number of samples.
For regression problems, the mean squared error (MSE) is often used:
MSE = (1/N) ∑ᵢ₌₁ᴺ (yᵢ − ŷᵢ)²
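Both losses translate directly into NumPy. This sketch assumes the predictions ŷ are probabilities in (0, 1) for the cross-entropy case, and clips them to avoid log(0):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Average negative log-likelihood for binary labels.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

def mean_squared_error(y_true, y_pred):
    # Average squared difference between targets and predictions.
    return np.mean((y_true - y_pred) ** 2)

y = np.array([1.0, 0.0, 1.0])   # actual labels (illustrative)
p = np.array([0.9, 0.2, 0.7])   # predicted probabilities (illustrative)
print(binary_cross_entropy(y, p), mean_squared_error(y, p))
```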
Step 3: Backpropagation
The goal of training an MLP is to minimize the loss function by adjusting the network’s weights
and biases. This is achieved through backpropagation:
1. Gradient Calculation: The gradients of the loss function with respect to each weight and
bias are calculated using the chain rule of calculus.
2. Error Propagation: The error is propagated back through the network, layer by layer.
3. Gradient Descent: The network updates the weights and biases by moving in the
opposite direction of the gradient to reduce the loss:
w = w − η·(∂L/∂w)
where w is the weight, η is the learning rate, and ∂L/∂w is the gradient of the loss function with respect
to the weight.
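The three steps above can be written out by hand for a one-hidden-layer MLP with sigmoid activations and a squared-error loss. The following is a sketch under exactly those assumptions, not a general-purpose implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, lr=0.1):
    # Forward pass.
    z1 = W1 @ x + b1; a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)       # prediction

    # Backward pass (chain rule), for the loss L = 0.5 * (a2 - y)^2.
    delta2 = (a2 - y) * a2 * (1 - a2)         # dL/dz2 at the output layer
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)  # error propagated back to layer 1

    # Gradient-descent updates: step against the gradients.
    W2 -= lr * np.outer(delta2, a1); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1
    return W1, b1, W2, b2
```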
Gradient Descent (GD)
Concept:
In Gradient Descent, we calculate the gradient of the loss function with respect to the model
parameters (weights and biases) for the entire dataset and update the parameters in the direction
that reduces the loss. It does this by computing the mean gradient over all training examples.
Steps in Gradient Descent:
1. Initialization: Randomly initialize the model parameters (weights and biases).
2. Compute Gradient: Compute the gradient (partial derivatives) of the loss function
with respect to all parameters using the entire dataset.
3. Update Parameters: Update the parameters by subtracting the product of the learning
rate and the computed gradients.
4. Repeat: Repeat the process for a set number of iterations or until convergence.
The update rule is given by:
θ = θ − η·∇J(θ)
Where:
θ is the vector of parameters (weights and biases).
η is the learning rate (step size).
∇J(θ) is the gradient of the loss function with respect to the parameters.
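A minimal sketch of this rule for one-parameter linear regression with an MSE loss; the dataset and learning rate are assumed purely for illustration. Note that the gradient is averaged over the entire dataset before each update:

```python
import numpy as np

# Toy dataset following y = 2x (assumed for illustration).
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X
theta, eta = 0.0, 0.05  # single parameter and learning rate

for _ in range(100):
    y_pred = theta * X
    # Mean MSE gradient over ALL examples: dJ/dtheta = 2*mean((pred - y)*x)
    grad = 2.0 * np.mean((y_pred - y) * X)
    theta = theta - eta * grad  # update rule: theta = theta - eta * grad
print(theta)  # approaches 2.0
```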
Pros:
Accurate Update: Since we use the gradient from the whole dataset, the update is more
accurate.
Convergence: Gradient Descent tends to converge steadily toward the minimum of the
loss function (provided the learning rate is set appropriately).
Cons:
Computationally Expensive: For large datasets, computing the gradient over the entire
dataset at each step can be very slow and requires a lot of memory.
Slow Convergence: Gradient Descent may take many iterations to converge, especially
if the dataset is large.
Stochastic Gradient Descent (SGD)
Concept:
In Stochastic Gradient Descent (SGD), instead of computing the gradient over the
entire dataset, we update the parameters after computing the gradient for each
individual training example. This makes the updates faster because we are not waiting
for the entire dataset to be processed.
SGD updates the parameters in a much more noisy way, as each training example
provides a noisy approximation of the true gradient.
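Under the same toy linear-regression setup as above, SGD replaces the full-dataset mean with the gradient of one randomly chosen example per update. A sketch with assumed values:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X
theta, eta = 0.0, 0.02
rng = np.random.default_rng(0)

for _ in range(200):
    i = rng.integers(len(X))                   # pick ONE example at random
    grad = 2.0 * (theta * X[i] - y[i]) * X[i]  # noisy single-sample gradient
    theta = theta - eta * grad
print(theta)  # hovers near 2.0, with noise from the random sampling
```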
Pros:
Faster Updates: Since it processes one training example at a time, SGD can make
updates much faster, especially for large datasets.
Can Escape Local Minima: The noisy updates in SGD (due to the random selection
of individual samples) can help the algorithm escape local minima or saddle points,
potentially leading to better solutions.
Less Memory Usage: As we only need to compute the gradient for one sample at a
time, SGD requires much less memory than standard Gradient Descent.
Cons:
Noisy Updates: Because we update based on a single sample, the gradient can be
quite noisy. This can cause the algorithm to oscillate or diverge if not managed
properly (e.g., with learning rate decay).
Convergence Issues: Although it can be faster in terms of computation, the noisy
updates can cause slow convergence, and it may not converge to the exact minimum
of the loss function. It may keep oscillating around the optimal point.
Sensitive to Learning Rate: The learning rate needs to be chosen carefully. A large
learning rate can lead to divergence, and a small one can slow down convergence.
Mini-Batch Gradient Descent
To address the weaknesses of both Gradient Descent and Stochastic Gradient Descent,
there's a middle-ground approach known as Mini-Batch Gradient Descent.
Mini-Batch Gradient Descent splits the dataset into small batches and performs
updates based on each batch. This combines the advantages of both methods:
o It provides faster convergence compared to full-batch GD.
o It smooths out the noisy updates of SGD, providing more stable convergence.
Mini-Batch GD Update Rule:
θ = θ − η·(1/m) ∑ᵢ₌₁ᵐ ∇J(θ; xᵢ, yᵢ)
Where:
m is the batch size.
The summation is over the batch of m samples.
Mini-Batch Gradient Descent is widely used in practice for training large models, especially
in deep learning, because it balances the speed of SGD and the stable convergence of GD.
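The same toy problem with mini-batches: the data is shuffled each epoch and split into batches of m samples, with one parameter update per batch. The dataset and hyperparameters below are illustrative assumptions:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = 2.0 * X
theta, eta, m = 0.0, 0.01, 4  # m is the batch size
rng = np.random.default_rng(0)

for _ in range(100):               # epochs
    idx = rng.permutation(len(X))  # shuffle the data each epoch
    for start in range(0, len(X), m):
        batch = idx[start:start + m]
        xb, yb = X[batch], y[batch]
        # Gradient averaged over the batch of m samples.
        grad = 2.0 * np.mean((theta * xb - yb) * xb)
        theta = theta - eta * grad
print(theta)  # close to 2.0, with less noise than pure SGD
```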
Step 4: Optimization
MLPs rely on optimization algorithms to iteratively refine the weights and biases during
training. Popular optimization methods include:
Stochastic Gradient Descent (SGD): Updates the weights based on a single sample or a
small batch of data:
w = w − η·∇L(w; xᵢ, yᵢ)
Adam Optimizer: An extension of SGD that incorporates momentum and adaptive
learning rates for more efficient training:
mₜ = β₁·mₜ₋₁ + (1 − β₁)·gₜ
vₜ = β₂·vₜ₋₁ + (1 − β₂)·gₜ²
m̂ₜ = mₜ / (1 − β₁ᵗ),  v̂ₜ = vₜ / (1 − β₂ᵗ)
θₜ = θₜ₋₁ − η·m̂ₜ / (√v̂ₜ + ε)
Here, gₜ represents the gradient at time t, m̂ₜ and v̂ₜ are bias-corrected moment estimates,
β₁ and β₂ are decay rates, and ε is a small constant for numerical stability.
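A minimal sketch of the Adam update written out from the equations above, applied to a single scalar parameter. The objective and the learning rate are assumptions for this toy run (η = 0.1 is larger than the common 0.001 default so the example converges in few steps):

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize the toy objective (theta - 5)^2.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):         # t starts at 1 for the bias correction
    grad = 2.0 * (theta - 5.0)  # analytic gradient of the objective
    theta, m, v = adam_step(theta, m, v, t)
print(theta)  # approaches 5.0
```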
Advantages of Multi-Layer Perceptron
Versatility: MLPs can be applied to a variety of problems, both classification and
regression.
Non-linearity: MLPs can model complex, non-linear relationships in data.
Parallel Computation: With the help of GPUs, MLPs can be trained quickly by taking
advantage of parallel computing.
Disadvantages of Multi-Layer Perceptron
Computationally Expensive: MLPs can be slow to train, especially on large datasets
with many layers.
Prone to Overfitting: Without proper regularization techniques, MLPs can overfit the
training data, leading to poor generalization.
Sensitivity to Data Scaling: MLPs require properly normalized or scaled data for
optimal performance.