
Introduction to Artificial Neural Networks (ANN)

An Artificial Neural Network (ANN) is a computational model inspired by the way biological neural
networks in the human brain process information. It is used in machine learning to solve complex
problems, including pattern recognition, decision-making, and data classification.

Components of an ANN:

1. Neurons (Nodes): The basic processing units that mimic biological neurons. Each neuron receives inputs,
applies weights, adds biases, and processes them through an activation function.
2. Weights and Biases:
a. Weights determine the importance of each input.
b. Bias shifts the activation function to improve the model's learning capability.
3. Layers:
a. Input Layer: Receives data inputs.
b. Hidden Layers: Perform computations. These layers enable the network to capture complex patterns.
c. Output Layer: Produces the final prediction or decision.
4. Activation Functions: Introduce non-linearity to help the network learn complex relationships. Examples
include ReLU, sigmoid, and tanh.
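
To make these components concrete, here is a minimal sketch in Python with NumPy (the values and the choice of sigmoid are illustrative assumptions, not a prescribed implementation) of a single neuron applying weights, a bias, and an activation function:

import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias
    z = np.dot(weights, inputs) + bias
    # Sigmoid activation introduces non-linearity
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.3, 0.2])   # inputs
w = np.array([0.4, 0.7, 0.2])   # weights
y = neuron(x, w, bias=0.1)      # output in (0, 1)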

Working of an ANN:

1. Forward Propagation:
a. Data flows from the input layer, through the hidden layers, to the output layer.
b. The network computes outputs using the activation functions at each layer.
2. Loss Function:
a. Measures the difference between the predicted output and the actual output.
3. Backward Propagation:
a. The network adjusts weights and biases using gradient descent to minimize the loss function.
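
A hedged sketch of one pass through this loop for a single linear neuron (the values are illustrative; real networks repeat this over many examples and layers):

import numpy as np

x = np.array([1.0, 2.0])   # input
w = np.array([0.1, -0.2])  # weights
b, t, lr = 0.0, 1.0, 0.1   # bias, target, learning rate

# 1. Forward propagation
y = np.dot(w, x) + b

# 2. Loss function: squared error between prediction and target
loss = (y - t) ** 2

# 3. Backward propagation: gradient descent on w and b via the chain rule
grad_y = 2 * (y - t)
w = w - lr * grad_y * x
b = b - lr * grad_y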

Introduction to Deep Learning Networks

A Deep Learning Network is a type of ANN with multiple hidden layers, allowing it to learn more abstract and complex
representations. Deep learning has revolutionized fields like image recognition, natural language processing (NLP), and
autonomous systems.
Key Concepts in Deep Learning:

1. Deep Neural Networks (DNNs):
a. DNNs consist of many hidden layers.
b. These networks automatically learn features from raw data, reducing the need for manual feature
engineering.
2. Convolutional Neural Networks (CNNs):
a. Specialized for image and spatial data.
b. Use convolutional layers to capture spatial hierarchies.
3. Recurrent Neural Networks (RNNs):
a. Designed for sequential data like time series and text.
b. Use feedback loops and memory cells (e.g., LSTM, GRU) for capturing temporal patterns.
4. Generative Adversarial Networks (GANs):
a. Comprise two networks: a generator and a discriminator.
b. Generate realistic data, such as synthetic images or videos.
5. Transformer Models:
a. Built for NLP tasks.
b. Use attention mechanisms to capture dependencies in data (e.g., GPT, BERT).

How Deep Learning Works:

1. Feature Extraction:
a. Lower layers capture basic features like edges or simple patterns.
2. Training: Uses large datasets and powerful computing resources (e.g., GPUs, TPUs).
a. Requires advanced optimizers like Adam or RMSProp.
3. Applications:
a. Self-driving cars (image and sensor data analysis).
b. Speech recognition and synthesis.
c. Healthcare (diagnostic predictions from medical images).

Feature          | ANN                            | Deep Learning
-----------------|--------------------------------|------------------------------------------
Number of Layers | Few hidden layers              | Many hidden layers
Data Processing  | Requires feature engineering   | Learns features automatically
Applications     | Simple problems                | Complex tasks (e.g., NLP, vision)
Computation      | Less computationally intensive | Requires significant computational power
The ANN (Artificial Neural Network) is based on the BNN (Biological Neural Network), as its primary goal is to imitate the human brain and its functions. Just as the brain has neurons interlinked with each other, the ANN has neurons, known as nodes, linked to each other across the various layers of the network.

The ANN learns through learning algorithms that are categorized as supervised or unsupervised.

• In supervised learning algorithms, the target values are labeled. The goal is to reduce the error between the desired output (target) and the actual output during optimization. Here, a supervisor is present.
• In unsupervised learning algorithms, the target values are not labeled; the network learns by itself, identifying patterns through repeated trials and experiments.

ANN Terminology:

• Weights: Each neuron is linked to other neurons through connection links, and each link carries a weight. The weight holds information about the input signal; the output depends on the weights and the input signal. The weights can be presented in matrix form, known as the connection matrix.
• If there are "n" nodes, each having "m" weights, the connection matrix is represented as:

W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}

• Bias: A constant added to the product of the inputs and weights to compute the net input. It shifts the result toward the positive or negative side: a positive bias increases the net input, while a negative bias decreases it.

Here, {1, x_1, …, x_n} are the inputs, and the output neuron Y is computed by the function g(x), which sums all the inputs and adds the bias to them:

g(x) = \sum_{i=1}^{n} x_i + b = x_1 + \cdots + x_n + b

The role of the activation function is to provide the output depending on the result of the summation:

Y = 1 if g(x) >= 0
Y = 0 otherwise

• Threshold: A constant value that is compared with the net input to determine the output. The activation function is defined relative to the threshold value. For example (a short sketch follows this list):
Y = 1 if net input >= threshold
Y = 0 otherwise

• Learning Rate: The learning rate, denoted α, ranges from 0 to 1. It scales the weight adjustments during the learning of the ANN.
• Target value: Target values are the correct values of the output variable, also known simply as targets.
• Error: The inaccuracy of the predicted output values compared with the target values.
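
To tie the summation, bias, and threshold together, here is a minimal sketch in Python (the input values, bias, and threshold are illustrative assumptions):

def threshold_neuron(inputs, bias=0.0, threshold=0.0):
    # g(x) = x1 + ... + xn + b, compared against the threshold
    g = sum(inputs) + bias
    return 1 if g >= threshold else 0

threshold_neuron([0.4, -0.1, 0.2], bias=0.1)  # returns 1, since 0.6 >= 0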
Supervised Learning Algorithms:

• Delta Learning: Introduced by Bernard Widrow and Marcian Hoff, and also known as the Least Mean Square method. It reduces the error over the entire learning and training process. To minimize error it follows the gradient descent method, which requires a continuous activation function.
• Outstar Learning: First proposed by Grossberg in 1976. It uses the concept of a neural network arranged in layers: the weights connected to a particular node should be equal to the desired outputs of the neurons connected through those weights.
Unsupervised Learning Algorithms:

• Hebbian Learning: Proposed by Hebb in 1949 to update the weights of nodes in a network. The change in weight is based on the input, the output, and the learning rate; the transpose of the output is needed for the weight adjustment.
• Competitive Learning: A winner-takes-all strategy. When an input pattern is sent to the network, all the neurons in the layer compete with each other to represent it; the winner's output is set to 1 and all the others to 0, and only the winning neuron's weights are adjusted.
Perceptron is a type of neural network that performs binary classification that maps input features to
an output decision, usually classifying data into one of two categories, such as 0 or 1.

Perceptron consists of a single layer of input nodes that are fully connected to a layer of output
nodes. It is particularly good at learning linearly separable patterns. It utilizes a variation of artificial
neurons called Threshold Logic Units (TLUs), first introduced by Warren McCulloch and Walter
Pitts in the 1940s. This foundational model has played a crucial role in the development of more
advanced neural networks and machine learning algorithms.
Types of Perceptron

1. Single-Layer Perceptron: limited to learning linearly separable patterns. It is effective for tasks where the data can be divided into distinct categories by a straight line. While powerful in its simplicity, it struggles with more complex problems where the relationship between inputs and outputs is non-linear.
2. Multi-Layer Perceptron: possesses enhanced processing capabilities, consisting of two or more layers, and is adept at handling more complex patterns and relationships within the data.
Basic Components of Perceptron
A Perceptron is composed of key components that work together to process information and make
predictions.

• Input Features: The perceptron takes multiple input features, each representing a characteristic
of the input data.
• Weights: Each input feature is assigned a weight that determines its influence on the output.
These weights are adjusted during training to find the optimal values.
• Summation Function: The perceptron calculates the weighted sum of its inputs, combining
them with their respective weights.
• Activation Function: The weighted sum is passed through the Heaviside step function,
comparing it to a threshold to produce a binary output (0 or 1).
• Output: The final output is determined by the activation function, often used for binary
classification tasks.
• Bias: The bias term helps the perceptron make adjustments independent of the input,
improving its flexibility in learning.
• Learning Algorithm: The perceptron adjusts its weights and bias using a learning algorithm,
such as the Perceptron Learning Rule, to minimize prediction errors.
These components enable the perceptron to learn from data and make predictions. While a single
perceptron can handle simple binary classification, complex tasks require multiple perceptrons
organized into layers, forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the importance of that input in
determining the output. The Perceptron’s output is calculated as a weighted sum of the inputs, which
is then passed through an activation function to decide whether the Perceptron will fire.

The weighted sum is computed as:


z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = X^T W

The step function compares this weighted sum to a threshold. If the input is larger than the threshold value, the output is 1; otherwise, it is 0. The most common activation function used in perceptrons is the Heaviside step function:

h(z) = \begin{cases} 0 & \text{if } z < \text{threshold} \\ 1 & \text{if } z \ge \text{threshold} \end{cases}

A perceptron consists of a single layer of Threshold Logic Units (TLUs), with each TLU fully connected to all input nodes. In a fully connected layer, also known as a dense layer, all neurons in one layer are connected to every neuron in the previous layer.

The output of the fully connected layer is computed as:


f_{W,b}(X) = h(XW + b)

where X is the input matrix, W is the weight matrix containing one weight per input neuron, b is the bias vector, and h is the step function.

During training, the Perceptron’s weights are adjusted to minimize the difference between the
predicted output and the actual output. This is achieved using supervised learning algorithms like the
delta rule or the Perceptron learning rule.

The weight update formula is:


w_{i,j} = w_{i,j} + \eta (y_j - \hat{y}_j) x_i

Where:

• w_{i,j} is the weight between the i-th input and the j-th output neuron,
• x_i is the i-th input value,
• y_j is the actual value, and \hat{y}_j is the predicted value,
• \eta is the learning rate, controlling how much the weights are adjusted.
This process enables the perceptron to learn from data and improve its prediction accuracy over time.

Example: Perceptron in Action

Let’s take a simple example of classifying whether a given fruit is an apple or not based on two
inputs: its weight (in grams) and its color (on a scale of 0 to 1, where 1 means red). The perceptron
receives these inputs, multiplies them by their weights, adds a bias, and applies the activation
function to decide whether the fruit is an apple or not.

• Input 1 (Weight): 150 grams
• Input 2 (Color): 0.9 (since the fruit is mostly red)
• Weights: [0.5, 1.0]
• Bias: 1.5
The perceptron’s weighted sum would be:
(150 × 0.5) + (0.9 × 1.0) + 1.5 = 77.4

Let’s assume the activation function uses a threshold of 75. Since 77.4 > 75, the perceptron classifies the fruit as an apple (output = 1).
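
The same computation as a short sketch (using the illustrative weights, bias, and threshold from above):

inputs = [150, 0.9]     # weight in grams, color score
weights = [0.5, 1.0]
bias = 1.5
threshold = 75

weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias  # 77.4
prediction = 1 if weighted_sum > threshold else 0                  # 1 -> apple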

Building and Training Single Layer Perceptron Model


For building the perceptron model, we are going to implement the following steps.

Step 1: Initialize the weight and learning rate

We consider the weight values for the number of inputs + 1 (with the additional +1 accounting for
the bias term). This ensures that both the inputs and bias are included during training.

Step 2: Define the Linear Layer

The first step is to calculate the weighted sum of the inputs. This is done using the formula: Z = XW +
b, where X represents the inputs, W the weights, and b the bias.
def linear(self, inputs):
    # Weighted sum XW + b: weights[0] holds the bias, weights[1:] the input weights
    Z = inputs @ self.weights[1:].T + self.weights[0]
    return Z

Step 3: Define the Activation Function


The Heaviside Step function is used as the activation function, which compares the weighted sum to
a threshold. If the sum is greater than or equal to 0, it outputs 1; otherwise, it outputs 0.
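
A minimal version of this step, in the same style as the linear layer above (assuming NumPy is imported as np):

def activation(self, z):
    # Heaviside step: 1 where z >= 0, otherwise 0
    return np.heaviside(z, 1)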

Step 4: Define the Prediction

Use the linear function followed by the activation function to generate predictions based on the input
features.

Step 5: Define the Loss Function

The loss function calculates the error between the predicted output and the actual output. In the Perceptron, the loss is the difference between the target value and the predicted value.

Step 6: Define Training

In this step, weights and bias are updated according to the error calculated from the loss function.
The Perceptron learning rule is applied to adjust the weights to minimize the error.

Step 7: Fit the Model

The fitting process involves training the model over multiple iterations (epochs) to adjust the weights
and bias. This allows the Perceptron to learn from the data and improve its prediction accuracy over
time.
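
Putting the seven steps together, here is one possible end-to-end sketch (the class layout, method names, and the toy AND dataset are illustrative assumptions, not a fixed API):

import numpy as np

class Perceptron:
    def __init__(self, num_inputs, learning_rate=0.1):
        # Step 1: one weight per input plus one extra slot for the bias (index 0)
        self.weights = np.zeros(num_inputs + 1)
        self.learning_rate = learning_rate

    def linear(self, inputs):
        # Step 2: weighted sum Z = XW + b
        return inputs @ self.weights[1:].T + self.weights[0]

    def activation(self, z):
        # Step 3: Heaviside step function
        return np.heaviside(z, 1)

    def predict(self, inputs):
        # Step 4: linear layer followed by the activation
        return self.activation(self.linear(inputs))

    def loss(self, prediction, target):
        # Step 5: difference between the target and the predicted value
        return target - prediction

    def train(self, inputs, target):
        # Step 6: Perceptron learning rule, w += lr * error * x
        error = self.loss(self.predict(inputs), target)
        self.weights[1:] += self.learning_rate * error * inputs
        self.weights[0] += self.learning_rate * error

    def fit(self, X, y, epochs=10):
        # Step 7: iterate over the data for several epochs
        for _ in range(epochs):
            for inputs, target in zip(X, y):
                self.train(inputs, target)

# Usage on a linearly separable toy problem (logical AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
model = Perceptron(num_inputs=2)
model.fit(X, y, epochs=10)
model.predict(X)  # converges to [0, 0, 0, 1]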

What is Backpropagation?
Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial neural
networks, particularly feed-forward networks. It works iteratively, minimizing the cost function by
adjusting weights and biases.

In each epoch, the model adapts these parameters, reducing loss by following the error gradient.
Backpropagation often utilizes optimization algorithms like gradient descent or stochastic gradient
descent. The algorithm computes the gradient using the chain rule from calculus, allowing it to
effectively navigate complex layers in the neural network to minimize the cost function.
Why is Backpropagation Important?
Backpropagation plays a critical role in how neural networks improve over time. Here's why:

1. Efficient Weight Update: It computes the gradient of the loss function with respect to each
weight using the chain rule, making it possible to update weights efficiently.
2. Scalability: The backpropagation algorithm scales well to networks with multiple layers and
complex architectures, making deep learning feasible.
3. Automated Learning: With backpropagation, the learning process becomes automated, and
the model can adjust itself to optimize its performance.
Working of Backpropagation Algorithm
The Backpropagation algorithm involves two main steps: the Forward Pass and the Backward Pass.

How Does the Forward Pass Work?

In the forward pass, the input data is fed into the input layer. These inputs, combined with their
respective weights, are passed to hidden layers.

For example, in a network with two hidden layers (h1 and h2 as shown in Fig. (a)), the output from
h1 serves as the input to h2. Before applying an activation function, a bias is added to the weighted
inputs.
Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which returns the
input if it’s positive and zero otherwise. This adds non-linearity, allowing the model to learn complex
relationships in the data. Finally, the outputs from the last hidden layer are passed to the output
layer, where an activation function, such as softmax, converts the weighted outputs into probabilities
for classification.

[Figure: the forward pass using weights and biases]

How Does the Backward Pass Work?

In the backward pass, the error (the difference between the predicted and actual output) is
propagated back through the network to adjust the weights and biases. One common method for
error calculation is the Mean Squared Error (MSE), given by:

\text{MSE} = (\text{Predicted Output} - \text{Actual Output})^2

Once the error is calculated, the network adjusts weights using gradients, which are computed with
the chain rule. These gradients indicate how much each weight and bias should be adjusted to
minimize the error in the next iteration. The backward pass continues layer by layer, ensuring that the
network learns and improves its performance. The activation function, through its derivative, plays a
crucial role in computing these gradients during backpropagation.
(Example: work through the numerical backpropagation example on GeeksforGeeks.)
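
As a hedged, self-contained sketch of both passes (the network sizes, sigmoid activations, XOR data, MSE loss, and learning rate are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)    # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # output layer
lr = 0.5

for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: MSE gradients via the chain rule
    d_out = (out - y) * out * (1 - out)   # includes the sigmoid derivative
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent updates, layer by layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)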
Types Of Learning Rules in ANN
A learning rule enhances an Artificial Neural Network's performance by being applied over the network: it updates the weights and bias levels of the network when certain conditions are met during the training process. It is a crucial part of the development of a neural network.

1. Hebbian Learning Rule


Donald Hebb developed it in 1949 as an unsupervised learning algorithm for neural networks. We can use it to update the weights of the nodes of a network. The following phenomena occur:
• If two neighbor neurons are operating in the same phase at the same period of time, then the
weight between these neurons should increase.
• For neurons operating in the opposite phase, the weight between them should decrease.
• If there is no signal correlation, the weight does not change. The sign of the weight between two nodes depends on the sign of the inputs between those nodes:
• When inputs of both the nodes are either positive or negative, it results in a strong positive
weight.
• If the input of one node is positive and negative for the other, a strong negative weight is
present.
Mathematical Formulation:

\delta w = \alpha x_i y

where \delta w is the change in weight, \alpha the learning rate, x_i the input vector, and y the output.
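
A minimal sketch of one Hebbian update (all values are illustrative):

import numpy as np

def hebbian_update(w, x, y, alpha=0.1):
    # delta_w = alpha * x * y: correlated input and output strengthen the weight
    return w + alpha * x * y

w = np.zeros(3)
x = np.array([1.0, -1.0, 0.5])
w = hebbian_update(w, x, y=1.0)  # weights move toward the input pattern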
2. Perceptron Learning Rule

It was introduced by Rosenblatt. It is an error-correcting rule for a single-layer feedforward network. It is supervised in nature: it calculates the error between the desired and actual output, and the weights are adjusted only when an error is present.

Computed as follows:

Assume (x_1, x_2, \ldots, x_n) is the set of input vectors and (w_1, w_2, \ldots, w_n) the set of weights, where:

y = actual output
w_o = initial weight
w_{new} = new weight
\delta w = change in weight
\alpha = learning rate

actual output: y = \sum_i w_i x_i
learning signal: e_j = t_i - y (the difference between the desired and actual output)
\delta w = \alpha x_i e_j
w_{new} = w_o + \delta w

Now, the output can be calculated on the basis of the input and the activation function applied over the net input, expressed as:

y = 1, if net input \ge \theta
y = 0, if net input < \theta


3. Delta Learning Rule

It was developed by Bernard Widrow and Marcian Hoff. It depends on supervised learning and requires a continuous activation function. It is also known as the Least Mean Square method, and it minimizes error over all the training patterns.

It is based on a gradient descent approach, applied over the whole training process. It states that the modification in the weight of a node is equal to the product of the error and the input, where the error is the difference between the desired and actual output.

Computed as follows:

Assume (x_1, x_2, \ldots, x_n) is the set of input vectors and (w_1, w_2, \ldots, w_n) the set of weights, where:

y = actual output
w_o = initial weight
w_{new} = new weight
\delta w = change in weight

error: t_i - y
learning signal: e_j = (t_i - y)\, y'
y = f(\text{net input}) = f\!\left(\sum_i w_i x_i\right)
\delta w = \alpha x_i e_j = \alpha x_i (t_i - y)\, y'
w_{new} = w_o + \delta w

The weights are updated only if there is a difference between the target and actual output (i.e., an error):

Case I: when t = y, there is no change in weight.

Case II: otherwise, w_{new} = w_o + \delta w
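
A sketch of one delta-rule update for a sigmoid unit (the sigmoid choice and the values are assumptions; any continuous, differentiable activation works):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delta_update(w, x, t, alpha=0.1):
    y = sigmoid(np.dot(w, x))   # y = f(net input)
    y_prime = y * (1 - y)       # y' for the sigmoid
    # delta_w = alpha * x * (t - y) * y'; zero when t == y (case I)
    return w + alpha * x * (t - y) * y_prime

w = np.array([0.2, -0.1])
x = np.array([1.0, 0.5])
w = delta_update(w, x, t=1.0)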

4. Correlation Learning Rule

The correlation learning rule follows the same principle as the Hebbian learning rule: if two neighboring neurons operate in the same phase at the same time, the weight between them should become more positive, and for neurons operating in opposite phases the weight between them should become more negative. Unlike the Hebbian rule, however, the correlation rule is supervised in nature: the targeted response is used to calculate the change in weight.

In mathematical form:

\delta w = \alpha x_i t_j

where \delta w is the change in weight, \alpha the learning rate, x_i the input vector, and t_j the target value.

5. Out Star Learning Rule

It was introduced by Grossberg and is a supervised training procedure.

The Out Star learning rule is applied when the nodes in a network are arranged in a layer. Here, the weights linked to a particular node should be equal to the targeted outputs of the nodes connected through those weights. The weight change is thus calculated as:

\delta w = \alpha (t - y)

where \alpha is the learning rate, y the actual output, and t the desired output for the layer's nodes.

6. Competitive Learning Rule

It is also known as the Winner-Takes-All rule and is unsupervised in nature. All the output nodes compete with each other to represent the input pattern; the winner is the node with the largest net input, which is given the output 1 while the rest are given 0.

There is a set of neurons with arbitrarily distributed weights, and the activation function is applied to a subset of them. Only one neuron is active at a time: only the winner's weights are updated, while the rest remain unchanged.
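
A sketch of one winner-takes-all step (the layout, values, and the move-toward-input update are illustrative assumptions):

import numpy as np

def competitive_update(W, x, alpha=0.1):
    # Each row of W holds one output neuron's weights; the winner is the
    # neuron with the largest net input w . x
    winner = np.argmax(W @ x)
    outputs = np.zeros(W.shape[0])
    outputs[winner] = 1                      # winner gets 1, the rest 0
    W[winner] += alpha * (x - W[winner])     # only the winner's weights move
    return outputs, W

W = np.random.default_rng(0).random((3, 4))  # 3 output neurons, 4 inputs
x = np.array([1.0, 0.0, 0.5, 0.2])
outputs, W = competitive_update(W, x)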
Activation Functions

Activation functions introduce non-linearity into a neural network, allowing it to learn complex
patterns. They are applied to the output of neurons to determine whether they should "fire" or
contribute to the final output. Here's a detailed breakdown:

Types of Activation Functions

1. Linear Activation Function


a. Equation: f(x) = x
b. Use Case: Rarely used in hidden layers because it lacks non-linearity. However, it can be
used in the output layer for regression tasks.
c. Limitation: Cannot capture complex patterns due to lack of non-linearity.
2. Sigmoid Function
a. Equation: f(x) = \frac{1}{1 + e^{-x}}
b. Range: (0, 1)
c. Use Case: Used for binary classification (logistic regression).
d. Pros: Smooth gradient, interpretable output as probabilities.
e. Cons: Prone to vanishing gradient problem for large inputs.
3. Tanh (Hyperbolic Tangent)
a. Equation: f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
b. Range: (-1, 1)
c. Use Case: Centers data around 0, improving convergence in some models.
d. Pros: Better than sigmoid for hidden layers.
e. Cons: Still suffers from vanishing gradients.
4. ReLU (Rectified Linear Unit)
a. Equation: f(x) = \max(0, x)
b. Range: [0, ∞)
c. Use Case: Most common in hidden layers.
d. Pros: Efficient computation, mitigates vanishing gradient issue.
e. Cons: Can lead to "dead neurons" (outputs stuck at 0).
5. Leaky ReLU
a. Equation: f(x) = \begin{cases} x & x > 0 \\ \alpha x & x \le 0 \end{cases}
b. Range: (-∞, ∞)
c. Use Case: Addresses the dead neuron problem.
d. Pros: Allows small gradients for negative inputs.
e. Cons: The choice of \alpha is often arbitrary.
6. Softmax
a. Equation: f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
b. Range: (0, 1)
c. Use Case: Multi-class classification.
d. Pros: Outputs probabilities that sum to 1.
e. Cons: Computationally expensive for large output spaces.
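
For reference, the activations above sketched in NumPy (the leaky-ReLU slope of 0.01 is an illustrative assumption):

import numpy as np

def linear(x):   return x
def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))
def tanh(x):     return np.tanh(x)
def relu(x):     return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

softmax(np.array([1.0, 2.0, 3.0]))  # probabilities that sum to 1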

Loss Functions

Loss functions quantify the difference between the predicted output and the actual target, guiding
the optimization process.

Types of Loss Functions

1. Mean Squared Error (MSE)
a. Equation: \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
b. Use Case: Regression problems.
c. Pros: Smooth and differentiable; penalizes larger errors more heavily.
d. Cons: Sensitive to outliers.
2. Mean Absolute Error (MAE)
a. Equation: \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
b. Use Case: Regression problems.
c. Pros: Robust to outliers.
d. Cons: May converge more slowly than MSE.
3. Binary Cross-Entropy (BCE)
a. Equation: \text{BCE} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]
b. Use Case: Binary classification.
c. Pros: Effective for probabilistic outputs.
d. Cons: Requires outputs to be in the range (0, 1).
4. Categorical Cross-Entropy
a. Equation: \text{CCE} = -\sum_{c} y_c \log \hat{y}_c (summed over classes, averaged over samples)
b. Use Case: Multi-class classification.
c. Pros: Works well with softmax outputs.
d. Cons: Sensitive to confidently incorrect predictions.
5. Huber Loss
a. Equation: L_\delta(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & |y - \hat{y}| \le \delta \\ \delta \left( |y - \hat{y}| - \frac{1}{2}\delta \right) & \text{otherwise} \end{cases}
b. Use Case: Regression with outliers.
c. Pros: Combines the benefits of MAE and MSE.
d. Cons: Requires tuning of \delta.
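
The losses above, sketched in NumPy over a batch of predictions (the clipping constants guarding log(0) are assumptions for numerical safety):

import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def bce(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)  # keep log() finite
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def categorical_ce(y, y_hat, eps=1e-12):
    # y is one-hot encoded; sum over classes, average over samples
    return -np.mean(np.sum(y * np.log(np.clip(y_hat, eps, 1.0)), axis=1))

def huber(y, y_hat, delta=1.0):
    err = np.abs(y - y_hat)
    return np.mean(np.where(err <= delta,
                            0.5 * err ** 2,
                            delta * (err - 0.5 * delta)))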

Function Approximation Perspective

In the context of function approximation, activation and loss functions play complementary roles:

• Activation Functions:
o Enable networks to approximate non-linear functions. Without them, neural networks
would be limited to linear transformations.
o Different activation functions introduce flexibility in modeling varying complexity levels
of the target function.
• Loss Functions:
o Guide the optimization to minimize the discrepancy between the network’s
approximation and the target function.
o The choice of loss function influences how errors are penalized, affecting the learning of
the function.
Together, they enable neural networks to approximate highly complex mappings from inputs to
outputs, learning representations that can generalize well to unseen data.
