Module 2
Key Characteristics:
• Consists of layers of nodes (neurons) connected by edges (weights).
• Learns by adjusting weights through a process called training.
• Used for tasks like image recognition, natural language processing,
and predictive modeling.
Artificial neural networks (ANNs) are a fundamental concept in deep
learning within artificial intelligence. They are crucial in handling complex
application scenarios that traditional machine-learning algorithms may
struggle with. Here’s an overview of how neural networks operate and
their components:
• Inspired by Biology
ANNs are inspired by biological neurons in the human brain. Just as
neurons activate under specific conditions to trigger actions in the
body, artificial neurons in ANNs activate based on input data.
• Structure of ANNs
ANNs consist of layers of interconnected artificial neurons. These
neurons are organized into layers, each performing specific
computations using activation functions to decide which signals to pass on to the next layer.
2. The Structure of a Neural Network
Neural networks are composed of three main layers (a minimal code sketch follows this list):
a. Input Layer:
• Receives raw data (e.g., numerical values, pixel intensities).
• Passes the data to the next layer for processing.
b. Hidden Layers:
• Perform computations on the input data.
• Apply activation functions to introduce non-linearity.
• There can be multiple hidden layers in a deep neural network.
c. Output Layer:
• Produces the final output (e.g., a classification, regression value, or probability).
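To make this structure concrete, here is a minimal sketch of the three layers using TensorFlow/Keras. The layer sizes (4 inputs, 8 hidden neurons, 3 outputs) are illustrative assumptions, not values tied to a particular dataset:

import tensorflow as tf

# A minimal sketch of the three-layer structure; the sizes are assumptions
# chosen only for illustration.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                      # input layer: receives raw data
    tf.keras.layers.Dense(8, activation="relu"),     # hidden layer: non-linear computation
    tf.keras.layers.Dense(3, activation="softmax"),  # output layer: class probabilities
])
model.summary()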
Artificial neurons vs. biological neurons
The concept of artificial neural networks comes from the biological neurons found in animal brains, so the two share many similarities in both structure and function.
3. How Neural Networks Work
a. Input Data: The model receives data in numerical form.
Example: An image is represented as a matrix of pixel values.
b. Forward Propagation: The input data flows through the network (see the sketch after this list):
• Each neuron processes its inputs by applying weights and biases.
• Outputs are passed through an activation function (e.g., ReLU, sigmoid).
c. Prediction: The network generates an output (e.g., a label, score, or probability).
d. Training:
• The network learns by comparing its predictions with the actual outputs (labels) using a loss function.
• Backpropagation adjusts the weights to minimize the error.
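As a concrete illustration of forward propagation, the following NumPy sketch computes the output of a single neuron; the input values, weights, and bias are assumed for illustration only:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values; a real network learns w and b during training.
x = np.array([0.5, 0.8, 0.2])   # input features
w = np.array([0.4, -0.6, 0.9])  # weights, one per input
b = 0.1                         # bias

z = np.dot(w, x) + b  # weighted sum of inputs plus bias
a = sigmoid(z)        # activation output passed on to the next layer
print(a)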
4. Key Terms and Concepts
• Neuron (Node) - The basic unit in a neural network that processes input and
generates output.
• Weights - Numerical values representing the importance of connections between
neurons.
• Bias - A constant added to the input to help the network fit the data better.
• Activation Function - Introduces non-linearity into the model to capture complex
patterns.
• Examples: ReLU, sigmoid, tanh.
• Loss Function - Measures the difference between predicted and actual values.
• Examples: Mean Squared Error (MSE), Cross-Entropy Loss.
• Learning Rate - Determines how much the weights are adjusted during training (illustrated in the sketch below).
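The following minimal sketch shows how a loss function and the learning rate interact during training; the predictions and the gradient value are assumptions made up for illustration:

import numpy as np

y_true = np.array([1.0, 0.0, 1.0])  # actual labels
y_pred = np.array([0.8, 0.3, 0.6])  # illustrative model predictions

loss = np.mean((y_true - y_pred) ** 2)  # Mean Squared Error (MSE)

w = 0.5              # a single weight, for illustration
grad = -0.2          # assumed gradient of the loss with respect to w
learning_rate = 0.01
w -= learning_rate * grad  # gradient-descent update: step against the gradient
print(loss, w)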
5. Types of Neural Networks
• Feedforward Neural Networks (FNN):
• Data flows in one direction from input to output.
• Used for simple classification and regression tasks.
• Convolutional Neural Networks (CNN):
• Specialized for image data.
• Uses convolution layers to detect spatial patterns like edges or
shapes.
• Recurrent Neural Networks (RNN):
• Processes sequential data (e.g., text, time series).
• Has memory mechanisms to retain context over time.
• Deep Neural Networks (DNN):
• Contains multiple hidden layers.
• Used for complex problems requiring high accuracy.
Applications of Artificial Neural Networks
Social Media
• Artificial neural networks are used heavily in social media. For example, consider the ‘People you may know’ feature on Facebook, which suggests people you might know in real life so that you can send them friend requests.
• This effect is achieved by using artificial neural networks that analyze your profile, your interests, your current friends, their friends, and various other factors to work out the people you might know.
• Another common application of machine learning in social media is facial recognition.
• This is done by finding around 100 reference points on the person’s face and then matching them with those already available in the database using convolutional neural networks.
Marketing and Sales
• When you log onto e-commerce sites like Amazon and Flipkart, they recommend products to buy based on your previous browsing history.
• Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history.
• This is true across all new-age marketing segments like book sites, movie services, hospitality sites, etc., and it is done by implementing personalized marketing.
• This uses artificial neural networks to identify a customer’s likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
Personal Assistants
• I am sure you have all heard of Siri, Alexa, Cortana, etc.
• These personal assistants are an example of speech recognition: they use Natural Language Processing (NLP) to interact with users and formulate responses accordingly.
• NLP relies on artificial neural networks to handle many of a personal assistant’s tasks, such as managing language syntax and semantics, producing correct speech, and tracking the ongoing conversation.
Below is a link to a basic example of implementing a simple neural network using Python and TensorFlow in Google Colab. The network classifies handwritten digits from the MNIST dataset.
https://colab.research.google.com/drive/1T1ACgTU0Jx7JElK0S5NyqLwFAtSZXgxB?usp=sharing
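For reference, a minimal version of such a network might look like the sketch below (this is an independent sketch, not necessarily the exact contents of the linked notebook):

import tensorflow as tf

# Load and normalize the MNIST handwritten-digit dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A simple feedforward network: flatten the 28x28 image, one hidden layer,
# and a 10-neuron softmax output (one neuron per digit class)
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)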
What is a Perceptron | The Simplest Artificial Neural Network
The Perceptron is one of the simplest artificial neural
network architectures, introduced by Frank Rosenblatt in 1957. It is
primarily used for binary classification.
A perceptron computes a weighted sum of its inputs and passes it through a step function. The step function compares this weighted sum to a threshold: if the sum is larger than the threshold value, the output is 1; otherwise, it is 0.
The most common activation function used in perceptrons is the Heaviside step function:
heaviside(z) = 0 if z < 0; 1 if z ≥ 0
A perceptron consists of a single layer of Threshold Logic Units (TLUs), with each TLU fully connected to all input nodes.
In a fully connected layer, also known as a dense layer, all neurons in
one layer are connected to every neuron in the previous layer.
The output of the fully connected layer is computed as:
h(X) = φ(XW + b)
where X is the matrix of inputs, W the weight matrix, b the bias vector, and φ the activation function (the Heaviside step function in a classic perceptron).
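A minimal NumPy sketch of a single TLU with a Heaviside step activation; the weights, bias, and inputs are illustrative assumptions:

import numpy as np

def heaviside(z):
    # Step function: 1 if the weighted sum reaches the threshold (0), else 0
    return np.where(z >= 0, 1, 0)

# Two input samples with two features each (illustrative values)
X = np.array([[0.0, 1.0],
              [1.0, 1.0]])
W = np.array([[0.6], [0.6]])  # one TLU, fully connected to both inputs
b = np.array([-1.0])          # bias shifts the threshold

output = heaviside(X @ W + b)  # h(X) = step(XW + b)
print(output)  # [[0], [1]]: only the second sample clears the threshold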
Sigmoid Function
• Formula: f(x) = 1 / (1 + exp(-x))
• Range: (0,1)
• Pros:
• Smooth and differentiable.
• Converts inputs into probabilities (useful in binary classification).
• Cons:
• Prone to vanishing gradient problem (gradients become very small
for extreme values of x).
• Not centered around zero, leading to inefficient weight updates.
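A short NumPy sketch of the sigmoid and its derivative, showing why extreme inputs lead to vanishing gradients:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x))
# At x = -10 or x = 10 the gradient is close to 0, which is
# the vanishing gradient problem described above.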
Exponential Linear Unit (ELU) Function
• Formula:
• f(x) = x if x > 0
• f(x) = α * (exp(x) - 1) if x ≤ 0
• Range: (-α, +∞)
• Pros:
• Avoids the dead neuron problem.
• Has negative values, which improves gradient flow.
• Cons:
• More computationally expensive than ReLU.
• Requires tuning the parameter α.
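A minimal NumPy sketch of the ELU formula above, using the common default α = 1.0:

import numpy as np

def elu(x, alpha=1.0):
    # x for positive inputs; alpha * (exp(x) - 1) for non-positive inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-3.0, -1.0, 0.0, 2.0])))
# Negative inputs yield small negative outputs approaching -alpha,
# so neurons keep a nonzero gradient instead of "dying" as with ReLU.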
Softmax Function
The softmax function converts a vector of raw scores z into a probability distribution:
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
The "e" in the formulas represents Euler's number, a mathematical constant approximately equal to e ≈ 2.7183. It is the base of the natural logarithm and is widely used in exponential functions, especially in activation functions for neural networks.
1. Growth and Decay - Functions involving e^x naturally model exponential growth or decay, making them useful for probability functions and gradients in machine learning.
2. Differentiation Simplicity - The derivative of e^x is simply e^x, which makes backpropagation calculations in neural networks efficient.
3. Probability Applications - In functions like sigmoid and softmax, e^x ensures positive outputs and smooth transformations between values.
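A short NumPy sketch of the softmax formula above; subtracting the maximum score first is a standard numerical-stability trick and does not change the result:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by max(z) to avoid overflow in exp
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative raw scores (logits)
probs = softmax(scores)
print(probs, probs.sum())  # positive values that sum to 1.0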
Hidden layer
• Between the input and output layers, there can be one or more layers of
neurons.
• Each neuron in a hidden layer receives inputs from all neurons in the previous
layer (either the input layer or another hidden layer) and produces an output that
is passed to the next layer.
• The number of hidden layers and the number of neurons in each hidden layer are
hyperparameters that need to be determined during the model design phase.
Output layer
• This layer consists of neurons that produce the final output of the network.
• The number of neurons in the output layer depends on the nature of the task.
• In binary classification, there may be either one or two neurons, depending on the activation function, with the output representing the probability of belonging to one class; in multi-class classification tasks, there are multiple neurons in the output layer, typically one per class.
Weights
• Neurons in adjacent layers are fully connected to each other.
• Each connection has an associated weight, which determines the strength of the
connection.
• These weights are learned during the training process.
Bias neurons
• In addition to the input and hidden neurons, each layer (except the input layer)
usually includes a bias neuron that provides a constant input to the neurons in the
next layer.
• Bias neurons have their own weight associated with each connection, which is also
learned during training.
• The bias neuron effectively shifts the activation function of the neurons in the
subsequent layer, allowing the network to learn an offset or bias in the decision
boundary.
• By adjusting the weights connected to the bias neuron, the MLP can learn to
control the threshold for activation and better fit the training data.
Note: It is important to note that in the context of MLPs, bias can refer to two related
but distinct concepts: bias as a general term in machine learning and the bias neuron
(defined above). In general machine learning, bias refers to the error introduced by
approximating a real-world problem with a simplified model; a model with high bias is too simple to capture the underlying patterns in the data.
In a multilayer perceptron, neurons process information in a step-by-step
manner, performing computations that involve weighted sums and
nonlinear transformations.
Input layer
• The input layer of an MLP receives input data, which could be features
extracted from the input samples in a dataset. Each neuron in the input
layer represents one feature.
• Neurons in the input layer do not perform any computations; they
simply pass the input values to the neurons in the first hidden layer.
Hidden layers
• The hidden layers of an MLP consist of interconnected neurons that perform computations on the
input data.
• Each neuron in a hidden layer receives input from all neurons in the previous
layer.
• The inputs are multiplied by corresponding weights, denoted as w.
• The weights determine how much influence the input from one neuron has on the
output of another.
• In addition to weights, each neuron in the hidden layer has an associated bias,
denoted as b.
• The bias provides an additional input to the neuron, allowing it to adjust its output
threshold. Like weights, biases are learned during training.
• For each neuron in a hidden layer or the output layer, the weighted sum of its inputs is computed. This involves multiplying each input by its corresponding weight, summing up these products, and adding the bias:
z = Σ_i (w_i * x_i) + b
• The weighted sum is then passed through an activation function, denoted as f.
• The activation function introduces nonlinearity into the network, allowing it to learn
and represent complex relationships in the data.
• The activation function determines the output range of the neuron and its behavior
in response to different input values.
• The choice of activation function depends on the nature of the task and the desired
properties of the network.
Output layer
• The output layer of an MLP produces the final predictions or
outputs of the network.
• The number of neurons in the output layer depends on the task
being performed (e.g., binary classification, multi-class
classification, regression).
• Each neuron in the output layer receives input from the neurons
in the last hidden layer and applies an activation function.
• This activation function is usually different from those used in
the hidden layers and produces the final output value or
prediction.
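Putting the pieces together, here is a minimal NumPy sketch of one forward pass through an MLP with a single hidden layer; the layer sizes, random weights, and choice of ReLU and softmax are illustrative assumptions:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random(4)  # input layer: 4 features, passed along unchanged

W1 = rng.random((4, 5)); b1 = np.zeros(5)  # hidden layer weights and biases
W2 = rng.random((5, 3)); b2 = np.zeros(3)  # output layer weights and biases

h = relu(x @ W1 + b1)     # hidden layer: weighted sum plus bias, then activation
y = softmax(h @ W2 + b2)  # output layer: a different activation for the final output
print(y)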