Ch. 1: Artificial Neural Network
An Artificial Neural Network (ANN) is a computational model inspired by the way biological neural
networks in the human brain process information. It is used in machine learning to solve complex
problems, including pattern recognition, decision-making, and data classification.
Components of an ANN:
1. Neurons (Nodes): The basic processing units that mimic biological neurons. Each neuron receives inputs,
applies weights, adds biases, and processes them through an activation function.
2. Weights and Biases:
a. Weights determine the importance of each input.
b. Bias shifts the activation function to improve the model's learning capability.
3. Layers:
a. Input Layer: Receives data inputs.
b. Hidden Layers: Perform computations. These layers enable the network to capture complex patterns.
c. Output Layer: Produces the final prediction or decision.
4. Activation Functions: Introduce non-linearity to help the network learn complex relationships. Examples
include ReLU, sigmoid, and tanh.
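As a minimal sketch of the components listed above, the following plain-Python snippet computes the output of a single neuron. The function names (`relu`, `sigmoid`, `neuron_output`) and the sample numbers are illustrative assumptions, not part of the original notes.

```python
import math

def relu(z):
    """ReLU: returns z if it is positive, otherwise 0."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical values chosen only for illustration.
inputs = [1.0, 2.0]
weights = [0.5, -1.0]
bias = 0.25

print(neuron_output(inputs, weights, bias, relu))       # weighted sum is -1.25, so ReLU gives 0.0
print(neuron_output(inputs, weights, bias, math.tanh))  # tanh keeps the sign of the weighted sum
print(neuron_output(inputs, weights, bias, sigmoid))    # a value between 0 and 1
```

Swapping the `activation` argument is enough to compare how ReLU, tanh, and sigmoid treat the same weighted sum.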
Working of an ANN:
1. Forward Propagation:
a. Data flows from the input layer, through the hidden layers, to the output layer.
b. The network computes outputs using the activation functions at each layer.
2. Loss Function:
a. Measures the difference between the predicted output and the actual output.
3. Backward Propagation:
a. The network adjusts weights and biases using gradient descent to minimize the loss function.
A Deep Learning Network is a type of ANN with multiple hidden layers, allowing it to learn more abstract and complex
representations. Deep learning has revolutionized fields like image recognition, natural language processing (NLP), and
autonomous systems.
Key Concepts in Deep Learning:
1. Feature Extraction:
a. Lower layers capture basic features like edges or simple patterns.
2. Training: Uses large datasets and powerful computing resources (e.g., GPUs, TPUs).
a. Often relies on adaptive optimizers such as Adam or RMSProp.
3. Applications:
a. Self-driving cars (image and sensor data analysis).
b. Speech recognition and synthesis.
c. Healthcare (diagnostic predictions from medical images).
The ANN learns through various learning algorithms that are described as supervised or unsupervised
learning.
• In supervised learning algorithms, the target values are labeled. The goal is to reduce the
error between the desired output (target) and the actual output. Here, a supervisor (the
labeled targets) is present.
• In unsupervised learning algorithms, the target values are not labeled, and the network learns
by itself by identifying patterns in the data over repeated passes.
ANN Terminology:
• Weights: Each neuron is linked to other neurons through connection links that carry weights.
A weight holds information about the input signal. The output depends solely on the
weights and the input signal. The weights can be presented in matrix form, known as the
connection matrix.
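• If there are “n” nodes with each node having “m” weights, then the connection matrix W is represented as:
W = | w_11 w_12 ... w_1m |
    | w_21 w_22 ... w_2m |
    | ...                |
    | w_n1 w_n2 ... w_nm |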
• Bias: Bias is a constant that is added to the product of inputs and weights to calculate the
net input. It is used to shift the result to the positive or negative side. The net input is
increased by a positive bias and decreased by a negative bias.
Here, {1, x_1, ..., x_n} are the inputs, and the output (Y) of the neuron is computed by the function g(x), which
sums up all the inputs and adds the bias to it:
g(x) = ∑ x_i + b, where i = 1 to n
     = x_1 + ... + x_n + b
The role of the activation function is to produce the output depending on the result of the summation
function:
Y = 1 if g(x) >= 0
Y = 0 otherwise
• Threshold: A threshold value is a constant value that is compared to the net input to get the
output. The activation function is defined based on the threshold value to calculate the output.
For Example:
Y=1 if net input>=threshold
Y=0 else
• Learning Rate: The learning rate is denoted α. It ranges from 0 to 1. It controls how much the
weights are adjusted at each step during the learning of the ANN.
• Target value: Target values are the correct values of the output variable and are also known simply as
targets.
• Error: The inaccuracy of the predicted output values compared to the target values.
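The terminology above can be tied together in a short sketch. It follows the summation function g(x) and the threshold comparison as written in these notes; the function names and the sample values are assumptions made for illustration.

```python
def net_input(inputs, bias):
    """g(x): the sum of all inputs plus the bias."""
    return sum(inputs) + bias

def step(net, threshold=0.0):
    """Output 1 when the net input reaches the threshold, otherwise 0."""
    return 1 if net >= threshold else 0

# Hypothetical inputs; only the bias and threshold change between the calls.
x = [0.2, 0.4, 0.1]
print(step(net_input(x, bias=-0.5)))                # 0.7 - 0.5 is positive, so 1
print(step(net_input(x, bias=-1.0)))                # 0.7 - 1.0 is negative, so 0
print(step(net_input(x, bias=0.0), threshold=1.0))  # 0.7 is below the threshold 1.0, so 0

# Error: difference between the target value and the predicted output.
target = 1
error = target - step(net_input(x, bias=-1.0))
print(error)  # 1
```

A positive bias pushes the net input up and a negative bias pulls it down, which is why the first two calls disagree on the same inputs.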
Supervised Learning Algorithms:
• Delta Learning: It was introduced by Bernard Widrow and Marcian Hoff and is also known as the
Least Mean Square (LMS) method. It reduces the error over the entire learning and training process. To
minimize the error, it follows the gradient descent method, which requires a continuous
(differentiable) activation function.
• Outstar Learning: It was first proposed by Grossberg in 1976. It uses the idea that the
network is arranged in layers, and the weights fanning out from a particular node should
equal the desired outputs of the neurons connected through those weights.
Unsupervised Learning Algorithms:
• Hebbian Learning: It was proposed by Hebb in 1949 to improve the weights of nodes in a
network. The change in weight is based on the input, the output, and the learning rate. The transpose of the
output is needed for the weight adjustment.
• Competitive Learning: It is a winner-takes-all strategy. When an input pattern is sent to
the network, all the neurons in the layer compete with each other to represent the input pattern.
The winner gets an output of 1 and all the others get 0, and only the winning neuron has its weights
adjusted.
A perceptron is a type of neural network that performs binary classification: it maps input features to
an output decision, usually classifying data into one of two categories, such as 0 or 1.
A perceptron consists of a single layer of input nodes that are fully connected to a layer of output
nodes. It is particularly good at learning linearly separable patterns. It uses a variation of artificial
neurons called Threshold Logic Units (TLU), which were first introduced by Warren McCulloch and Walter
Pitts in the 1940s. This foundational model has played a crucial role in the development of more
advanced neural networks and machine learning algorithms.
Basic Components of a Perceptron
• Input Features: The perceptron takes multiple input features, each representing a characteristic
of the input data.
• Weights: Each input feature is assigned a weight that determines its influence on the output.
These weights are adjusted during training to find the optimal values.
• Summation Function: The perceptron calculates the weighted sum of its inputs, combining
them with their respective weights.
• Activation Function: The weighted sum is passed through the Heaviside step function,
comparing it to a threshold to produce a binary output (0 or 1).
• Output: The final output is determined by the activation function, often used for binary
classification tasks.
• Bias: The bias term helps the perceptron make adjustments independent of the input,
improving its flexibility in learning.
• Learning Algorithm: The perceptron adjusts its weights and bias using a learning algorithm,
such as the Perceptron Learning Rule, to minimize prediction errors.
These components enable the perceptron to learn from data and make predictions. While a single
perceptron can handle simple binary classification, complex tasks require multiple perceptrons
organized into layers, forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the importance of that input in
determining the output. The Perceptron’s output is calculated as a weighted sum of the inputs, which
is then passed through an activation function to decide whether the Perceptron will fire.
The step function compares this weighted sum to a threshold. If the weighted sum is greater than or equal
to the threshold value, the output is 1; otherwise, it’s 0. The most common activation function used in
perceptrons is the Heaviside step function:
h(z) = 0 if z < threshold
h(z) = 1 if z >= threshold
A perceptron consists of a single layer of Threshold Logic Units (TLU), with each TLU fully connected
to all input nodes. In a fully connected layer, also known as a dense layer, every neuron in the layer is
connected to every neuron in the previous layer. The output of the fully connected layer is computed as:
f(X) = h(XW + b)
where X is the input, W is the weight for each input neuron, b is the bias, and h is the step
function.
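A fully connected layer of threshold units, which applies the step function h to the weighted sum XW + b, can be sketched as follows. The layout of `W` (one row per input, one column per output unit), the threshold of 0, and all numbers are assumptions for illustration.

```python
def heaviside(z, threshold=0.0):
    """Step function h: 1 when z reaches the threshold, otherwise 0."""
    return 1 if z >= threshold else 0

def tlu_layer(x, W, b):
    """Compute h(xW + b) for one input vector x.

    W[i][j] is the weight from input i to output unit j; b[j] is unit j's bias.
    """
    outputs = []
    for j in range(len(b)):
        z = sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        outputs.append(heaviside(z))
    return outputs

# Hypothetical layer: 2 inputs fully connected to 3 threshold units.
W = [[1.0, -1.0, 0.5],
     [1.0,  1.0, 0.5]]
b = [-1.5, 0.0, -0.75]
print(tlu_layer([1, 1], W, b))  # [1, 1, 1]
print(tlu_layer([1, 0], W, b))  # [0, 0, 0]
```

Every output unit sees every input, which is what "fully connected" means here.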
During training, the Perceptron’s weights are adjusted to minimize the difference between the
predicted output and the actual output. This is achieved using supervised learning algorithms like the
delta rule or the Perceptron learning rule.
The weights are updated as:
w_{i,j} = w_{i,j} + η (y_j - ŷ_j) x_i
where:
• w_{i,j} is the weight between the i-th input and the j-th output neuron,
• x_i is the i-th input value,
• y_j is the actual value, and ŷ_j is the predicted value,
• η is the learning rate, controlling how much the weights are adjusted.
This process enables the perceptron to learn from data and improve its prediction accuracy over time.
Let’s take a simple example of classifying whether a given fruit is an apple or not based on two
inputs: its weight (in grams) and its color (on a scale of 0 to 1, where 1 means red). The perceptron
receives these inputs, multiplies them by their weights, adds a bias, and applies the activation
function to decide whether the fruit is an apple or not.
Let’s assume the weighted sum of the inputs plus the bias comes to 76.4 and the activation function uses
a threshold of 75. Since 76.4 > 75, the perceptron classifies the fruit as an apple (output = 1).
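The apple decision can be reproduced in a few lines. The notes do not list the input values, weights, or bias, so every number below except the threshold of 75 is an assumption, picked only so that the weighted sum matches the 76.4 used in the text.

```python
def classify_apple(weight_grams, color, weights, bias, threshold):
    """Return (weighted_sum, label); label is 1 for apple and 0 otherwise."""
    weighted_sum = weight_grams * weights[0] + color * weights[1] + bias
    label = 1 if weighted_sum > threshold else 0
    return weighted_sum, label

# Assumed values: a 150 g fruit with color 0.9, weights (0.5, 1.0), bias 0.5.
total, label = classify_apple(150, 0.9, weights=(0.5, 1.0), bias=0.5, threshold=75)
print(round(total, 1), label)  # 76.4 1
```

With these assumed weights the fruit's mass dominates the sum; a different scaling of the inputs would change how much the color matters.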
The weight vector holds (number of inputs + 1) values, with the additional +1 accounting for
the bias term. This ensures that both the input weights and the bias are included during training.
The first step is to calculate the weighted sum of the inputs. This is done using the formula: Z = XW +
b, where X represents the inputs, W the weights, and b the bias.
def linear(self, inputs):
    # weights[0] holds the bias; weights[1:] holds the input weights
    Z = inputs @ self.weights[1:].T + self.weights[0]  # Weighted sum: XW + b
    return Z
Use the linear function followed by the activation function to generate predictions based on the input
features.
Step 5: Define the Loss Function
The loss function calculates the error between the predicted
output and the actual output. In the Perceptron, the loss is the difference between the target value
and the predicted value.
In this step, weights and bias are updated according to the error calculated from the loss function.
The Perceptron learning rule is applied to adjust the weights to minimize the error.
The fitting process involves training the model over multiple iterations (epochs) to adjust the weights
and bias. This allows the Perceptron to learn from the data and improve its prediction accuracy over
time.
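The steps described above (a weight vector of size inputs + 1, the linear function, prediction, loss, weight update, and fitting over epochs) can be pulled together in one small sketch. It uses plain Python lists rather than the tensor operations in the `linear` snippet. The class layout, the method names other than `linear`, the AND-gate data, and the learning rate of 1.0 are assumptions made for illustration.

```python
class Perceptron:
    """Single-output perceptron; weights[0] is the bias, weights[1:] the input weights."""

    def __init__(self, num_inputs, learning_rate=1.0):
        self.weights = [0.0] * (num_inputs + 1)  # number of inputs + 1 for the bias
        self.learning_rate = learning_rate

    def linear(self, inputs):
        # Weighted sum Z = XW + b
        return sum(x * w for x, w in zip(inputs, self.weights[1:])) + self.weights[0]

    def predict(self, inputs):
        # Heaviside step activation with a threshold of 0
        return 1 if self.linear(inputs) >= 0 else 0

    def loss(self, prediction, target):
        # Difference between the target value and the predicted value
        return target - prediction

    def train(self, inputs, target):
        # Perceptron learning rule: w <- w + lr * error * x, bias <- bias + lr * error
        error = self.loss(self.predict(inputs), target)
        self.weights[0] += self.learning_rate * error
        for i, x in enumerate(inputs):
            self.weights[i + 1] += self.learning_rate * error * x
        return error

    def fit(self, X, y, num_epochs):
        # Repeat the update over the whole dataset for several epochs
        for _ in range(num_epochs):
            for inputs, target in zip(X, y):
                self.train(inputs, target)

# Logical AND is linearly separable, so a single perceptron can learn it.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
model = Perceptron(num_inputs=2)
model.fit(X, y, num_epochs=20)
print([model.predict(x) for x in X])  # [0, 0, 0, 1]
```

A learning rate of 1.0 keeps the arithmetic exact for this toy dataset. On a dataset that is not linearly separable (XOR, for example), the same loop would never settle.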
What is Backpropagation?
Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial neural
networks, particularly feed-forward networks. It works iteratively, minimizing the cost function by
adjusting weights and biases.
In each epoch, the model adapts these parameters, reducing loss by following the error gradient.
Backpropagation is typically paired with optimization algorithms like gradient descent or stochastic gradient
descent, which use the gradients it computes to update the parameters. The algorithm computes the gradient
using the chain rule from calculus, allowing it to work backward through the layers of the neural network
to minimize the cost function.
Why is Backpropagation Important?
Backpropagation plays a critical role in how neural networks improve over time. Here's why:
1. Efficient Weight Update: It computes the gradient of the loss function with respect to each
weight using the chain rule, making it possible to update weights efficiently.
2. Scalability: The backpropagation algorithm scales well to networks with multiple layers and
complex architectures, making deep learning feasible.
3. Automated Learning: With backpropagation, the learning process becomes automated, and
the model can adjust itself to optimize its performance.
Working of Backpropagation Algorithm
The Backpropagation algorithm involves two main steps: the Forward Pass and the Backward Pass.
In the forward pass, the input data is fed into the input layer. These inputs, combined with their
respective weights, are passed to hidden layers.
For example, in a network with two hidden layers (h1 and h2 as shown in Fig. (a)), the output from
h1 serves as the input to h2. Before applying an activation function, a bias is added to the weighted
inputs.
Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which returns the
input if it’s positive and zero otherwise. This adds non-linearity, allowing the model to learn complex
relationships in the data. Finally, the outputs from the last hidden layer are passed to the output
layer, where an activation function, such as softmax, converts the weighted outputs into probabilities
for classification.
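The forward pass described above can be sketched in plain Python: each hidden layer adds a bias to the weighted inputs and applies ReLU, and the output layer applies softmax. The network shape (two inputs, hidden layers h1 and h2 with two units each, two outputs) and all parameter values are made-up assumptions.

```python
import math

def relu(z):
    """Returns the input if it is positive and zero otherwise."""
    return max(0.0, z)

def dense(inputs, weights, biases):
    """Weighted sums for one layer; weights[j] holds the weights of unit j."""
    return [sum(x * w for x, w in zip(inputs, unit_w)) + b
            for unit_w, b in zip(weights, biases)]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """layers is a list of (weights, biases); ReLU on hidden layers, softmax on the last."""
    *hidden, output = layers
    a = x
    for weights, biases in hidden:
        a = [relu(z) for z in dense(a, weights, biases)]  # bias is added before the activation
    return softmax(dense(a, *output))

# Hypothetical parameters for h1, h2, and the output layer.
h1 = ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1])
h2 = ([[0.3, 0.3], [-0.5, 0.2]], [0.05, 0.0])
out = ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
probs = forward([1.0, 2.0], [h1, h2, out])
print(probs, sum(probs))  # two class probabilities that sum to about 1
```

The output of h1 becomes the input of h2, mirroring the description in the text.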
In the backward pass, the error (the difference between the predicted and actual output) is
propagated back through the network to adjust the weights and biases. One common method for
error calculation is the Mean Squared Error (MSE), given by:
MSE = (1/n) ∑ (predicted output_i - actual output_i)^2, for i = 1 to n
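One way to compute the Mean Squared Error, assuming the usual definition (the average of the squared differences between predictions and targets), is sketched below. The function name and sample values are illustrative.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and targets."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predictions and targets: one match and two misses of size 1.
print(mean_squared_error([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))  # (0 + 1 + 1) / 3
```

Squaring makes every error positive and penalizes large misses more heavily than small ones.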
Once the error is calculated, the network adjusts weights using gradients, which are computed with
the chain rule. These gradients indicate how much each weight and bias should be adjusted to
minimize the error in the next iteration. The backward pass continues layer by layer, ensuring that the
network learns and improves its performance. The activation function, through its derivative, plays a
crucial role in computing these gradients during backpropagation.
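To make the backward pass concrete, here is a minimal sketch for a tiny network with one input, one sigmoid hidden unit, and one linear output, trained on a squared error. The network shape, starting parameters, learning rate, and training sample are all assumptions chosen to keep the chain rule easy to follow.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(params, x, target, lr):
    """One forward and backward pass; returns updated parameters and the loss before the update."""
    w1, b1, w2, b2 = params
    # Forward pass
    z1 = w1 * x + b1
    h = sigmoid(z1)
    y_hat = w2 * h + b2
    loss = (y_hat - target) ** 2
    # Backward pass: apply the chain rule from the output back to the input
    d_yhat = 2.0 * (y_hat - target)  # dL/dy_hat
    d_w2 = d_yhat * h                # dL/dw2
    d_b2 = d_yhat                    # dL/db2
    d_h = d_yhat * w2                # dL/dh
    d_z1 = d_h * h * (1.0 - h)       # sigmoid derivative is h * (1 - h)
    d_w1 = d_z1 * x                  # dL/dw1
    d_b1 = d_z1                      # dL/db1
    # Gradient descent update
    new_params = (w1 - lr * d_w1, b1 - lr * d_b1, w2 - lr * d_w2, b2 - lr * d_b2)
    return new_params, loss

params = (0.5, 0.0, -0.5, 0.0)  # hypothetical starting weights and biases
losses = []
for _ in range(50):
    params, loss = train_step(params, x=1.0, target=1.0, lr=0.1)
    losses.append(loss)
print(losses[0], losses[-1])  # the loss shrinks as training proceeds
```

The line computing `d_z1` is where the derivative of the activation function enters the gradient, as noted in the text.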
Example: see the worked example on GeeksforGeeks.
Types Of Learning Rules in ANN
A learning rule enhances the Artificial Neural Network’s performance when it is applied over the
network. A learning rule updates the weights and bias levels of a network when certain
conditions are met in the training process. It is a crucial part of the development of the neural
network.
1. Hebbian Learning Rule
• If two neighboring neurons are operating in the same phase at the same time, then the
weight between these neurons should increase.
• For neurons operating in the opposite phase, the weight between them should decrease.
• If there is no signal correlation, the weight does not change. The sign of the weight between
two nodes depends on the signs of the inputs to those nodes:
• When the inputs of both nodes are either positive or negative, the result is a strong positive
weight.
• If the input of one node is positive and that of the other is negative, the result is a strong
negative weight.
Mathematical Formulation:
δw = α x_i y
where δw is the change in weight, α is the learning rate, x_i is the input vector, and y is the output.
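The Hebbian update can be sketched directly from the formula above. The function name, the bipolar (+1 / -1) signals, and the learning rate are assumptions for illustration.

```python
def hebbian_update(weights, inputs, output, alpha):
    """Apply the change α * x_i * y to every weight."""
    return [w + alpha * x * output for w, x in zip(weights, inputs)]

# Hypothetical bipolar signals: the first input agrees with the output, the second opposes it.
w = hebbian_update([0.0, 0.0], inputs=[1, -1], output=1, alpha=0.5)
print(w)  # [0.5, -0.5]
```

The weight for the input that shares the output's sign grows, and the weight for the opposing input shrinks, matching the bullet points above.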
2. Perceptron Learning Rule
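Computed as follows:
y = actual output
w_o = initial weight
w_new = new weight
δw = change in weight
α = learning rate
e_j = learning signal (the error between the target and the actual output, t_i - y)
actual output (y) = w_i x_i
δw = α x_i e_j
w_new = w_o + δw
Now, the output can be calculated on the basis of the input and the activation function applied over
the net input, and can be expressed as:
y = 1 if net input >= threshold
y = 0 otherwise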
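A single step of the perceptron learning rule can be sketched as below. It assumes the learning signal e_j is the error (target minus actual output) and that the activation is a step function with a threshold of 0; the function names and numbers are illustrative.

```python
def step(net, threshold=0.0):
    return 1 if net >= threshold else 0

def perceptron_update(weights, inputs, target, alpha):
    """Return the new weights after one step of w_new = w_o + α * x_i * e_j."""
    y = step(sum(w * x for w, x in zip(weights, inputs)))  # actual output
    e = target - y                                          # learning signal (assumed t - y)
    return [w + alpha * x * e for w, x in zip(weights, inputs)]

w_old = [0.2, -0.4]  # hypothetical initial weights
w_new = perceptron_update(w_old, inputs=[1, 1], target=1, alpha=0.5)
print(w_new)  # the output was 0 but the target is 1, so both weights move up by about 0.5

# When the output already matches the target, the weights stay as they are.
print(perceptron_update(w_old, inputs=[1, 1], target=0, alpha=0.5))
```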
3. Delta Learning Rule
It was developed by Bernard Widrow and Marcian Hoff. It depends on supervised learning and
has a continuous activation function. It is also known as the Least Mean Square method, and it
minimizes the error over all the training patterns.
It is based on an iterative gradient descent approach. It states that the modification in
the weight of a node is equal to the product of the error and the input, where the error is the
difference between the desired and actual output.
Computed as follows:
y = actual output
w_o = initial weight
w_new = new weight
δw = change in weight
Error = t_i - y
Learning signal (e_j) = (t_i - y) y', where y' is the derivative of the activation function
δw = α x_i e_j = α x_i (t_i - y) y'
w_new = w_o + δw
The weights are updated only if there is a difference between the target and the actual
output (i.e., an error):
w_new = w_o + δw
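The delta rule can be sketched with a sigmoid as the continuous activation function, so that y' = y (1 - y). The choice of sigmoid, the function names, and the training values are assumptions; the notes only require the activation to be continuous.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def delta_update(weights, inputs, target, alpha):
    """One delta-rule step: the weight change is α * x_i * (t - y) * y'."""
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))  # actual output
    y_prime = y * (1.0 - y)                                   # derivative of the sigmoid
    e = (target - y) * y_prime                                # learning signal
    new_weights = [w + alpha * x * e for w, x in zip(weights, inputs)]
    return new_weights, target - y

w = [0.0, 0.0]  # hypothetical initial weights
errors = []
for _ in range(200):
    w, err = delta_update(w, inputs=[1.0, 0.5], target=0.9, alpha=1.0)
    errors.append(abs(err))
print(errors[0], errors[-1])  # the error shrinks over the iterations
```

Because the update is proportional to the error, the steps become smaller as the output approaches the target.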
4. Correlation Learning Rule
The correlation learning rule follows the same principle as the Hebbian learning rule, i.e., if
two neighboring neurons are operating in the same phase at the same time, then the weight
between these neurons should be more positive. For neurons operating in the opposite phase, the
weight between them should be more negative. Unlike the Hebbian rule, however, the correlation rule is
supervised in nature: the targeted response is used to calculate the change in weight.
In mathematical form:
δw = α x_i t_j
where δw = change in weight, α = learning rate, x_i = input vector, and t_j = target value.
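The correlation rule has the same shape as the Hebbian update but uses the target in place of the actual output, which a short sketch makes visible. The function name and values are illustrative assumptions.

```python
def correlation_update(weights, inputs, target, alpha):
    """Apply the change α * x_i * t_j, using the target rather than the actual output."""
    return [w + alpha * x * target for w, x in zip(weights, inputs)]

# Hypothetical bipolar inputs with a target of -1.
w = correlation_update([0.0, 0.0], inputs=[1, -1], target=-1, alpha=0.5)
print(w)  # [-0.5, 0.5]
```

No forward pass is needed here: the update depends only on the input and the supervised target.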
5. Out Star Learning Rule
δw = α (t - y)
where α = learning rate, y = actual output, and t = desired output for n layer nodes.
6. Competitive Learning Rule
It is also known as the Winner-takes-All rule and is unsupervised in nature. All
the output nodes compete with each other to represent the input pattern. The
winner is the node with the largest output, and it is given the
output 1 while the rest are given 0.
There is a set of neurons with arbitrarily distributed weights, and the activation function is applied
to a subset of neurons. Only one neuron is active at a time. Only the winner's weights are updated; the
rest remain unchanged.
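A winner-takes-all step can be sketched as follows. The notes do not give the weight-update formula for the winner, so the update used here (moving the winner's weights toward the input by a fraction α) is an assumption, as are the function name and the numbers.

```python
def competitive_step(weight_rows, x, alpha):
    """Winner-takes-all: only the unit with the largest net input is updated."""
    nets = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    winner = nets.index(max(nets))
    outputs = [1 if j == winner else 0 for j in range(len(weight_rows))]
    # Assumed update (not given in the notes): move the winner's weights toward x.
    weight_rows[winner] = [w + alpha * (xi - w) for w, xi in zip(weight_rows[winner], x)]
    return outputs

weights = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical starting weights for two units
print(competitive_step(weights, x=[1.0, 0.0], alpha=0.5))  # [1, 0]
print(weights)  # only the first row has moved toward the input
```

Repeating this over many inputs lets each unit specialize in the cluster of patterns it keeps winning.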
Activation Functions
Activation functions introduce non-linearity into a neural network, allowing it to learn complex
patterns. They are applied to the output of neurons to determine whether they should "fire" or
contribute to the final output. Here's a detailed breakdown:
5. Leaky ReLU
a. Equation: f(x) = { x, x > 0
αx, x ≤ 0 }
b. Range: (-∞, ∞)
c. Use Case: Addresses the dead neuron problem.
d. Pros: Allows small gradients for negative inputs.
e. Cons: Choice of α is often arbitrary.
6. Softmax
a. Equation: f(x_i) = e^(x_i) / ∑ e^(x_j), summed over all outputs j
b. Range: (0, 1)
c. Use Case: Multi-class classification.
d. Pros: Outputs probabilities that sum to 1.
e. Cons: Computationally expensive for large output spaces.
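The two activation functions described above can be sketched in plain Python. The default α of 0.01 is a common but arbitrary choice and is an assumption here, as are the function names and sample scores.

```python
import math

def leaky_relu(x, alpha=0.01):
    """f(x) = x for x > 0, and alpha * x otherwise."""
    return x if x > 0 else alpha * x

def softmax(scores):
    """Exponentiate and normalize so that the outputs sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # shifting by the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

print(leaky_relu(3.0), leaky_relu(-2.0))  # 3.0 -0.02
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # three probabilities that sum to about 1
```

Unlike plain ReLU, the negative input still produces a small non-zero output, which is what keeps a gradient flowing for negative inputs.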
Loss Functions
Loss functions quantify the difference between the predicted output and the actual target, guiding
the optimization process.
In the context of function approximation, activation and loss functions play complementary roles:
• Activation Functions:
o Enable networks to approximate non-linear functions. Without them, neural networks
would be limited to linear transformations.
o Different activation functions introduce flexibility in modeling varying complexity levels
of the target function.
• Loss Functions:
o Guide the optimization to minimize the discrepancy between the network’s
approximation and the target function.
o The choice of loss function influences how errors are penalized, affecting the learning of
the function.
Together, they enable neural networks to approximate highly complex mappings from inputs to
outputs, learning representations that can generalize well to unseen data.