Notes DL-1
Lectures
Lecture # 3
Neural Networks
Neural networks are computational models that mimic the behavior of the human brain, capable of learning from data and making decisions.
• Neurons are the basic processing units of neural networks, interconnected to form layers that process information.
Activation Function
Definition: Activation functions are mathematical functions applied to the weighted sum of inputs at
each neuron to determine its output.
Purpose: Activation functions introduce non-linearity into the network, enabling it to learn complex
patterns and relationships in the data.
1. Sigmoid Function: S-shaped curve, squashes the input values between 0 and 1. Commonly used
in the output layer for binary classification tasks.
2. ReLU (Rectified Linear Unit): Piecewise linear function, outputs the input if it is positive, and zero
otherwise. Faster convergence and alleviates the vanishing gradient problem.
3. Tanh Function: Similar to the sigmoid function, but output values range from -1 to 1. Used in hidden layers to introduce non-linearity.
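A minimal NumPy sketch of these three activations (the implementations below are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into (0, 1); used for binary-classification outputs.
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive inputs through unchanged, zeros out the rest.
    return np.maximum(0.0, z)

def tanh(z):
    # Like sigmoid but centered at zero, with outputs in (-1, 1).
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), relu(z), tanh(z))
```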
Perceptrons
Definition: Perceptrons are single-layer neural networks that can make binary decisions based on a
linear combination of input features.
Structure: A perceptron consists of:
• Input layer: Receives input features.
• Weights: Each input feature is associated with a weight that determines its importance.
• Activation function: Decides whether the perceptron should fire (output a 1) or not (output a 0) based on the weighted sum of inputs.
Perceptrons can only solve linearly separable problems where a single straight line can be drawn to
separate the classes.
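As an illustration, a minimal perceptron sketch trained on the logical AND problem, which is linearly separable; the learning rate and epoch count are arbitrary choices:

```python
import numpy as np

# Toy linearly separable problem: logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # one weight per input feature
b = 0.0           # bias term
lr = 0.1          # learning rate (illustrative choice)

def predict(x):
    # Step activation: fire (1) if the weighted sum exceeds 0, else 0.
    return 1 if np.dot(w, x) + b > 0 else 0

# Classic perceptron learning rule: nudge weights toward misclassified examples.
for epoch in range(20):
    for xi, yi in zip(X, y):
        error = yi - predict(xi)
        w += lr * error * xi
        b += lr * error

print([predict(xi) for xi in X])  # expected: [0, 0, 0, 1]
```

Because AND is linearly separable, this update rule converges; on XOR it would not, which is exactly the limitation noted above.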
Comparison with Complex Architectures:
• Perceptrons have limited capabilities compared to more complex neural network architectures such as multi-layer perceptrons or convolutional neural networks.
• Complex architectures can learn non-linear relationships in data and solve more complex classification and regression tasks.
Single-Layer Networks
Definition: Single-layer networks consist of one layer of neurons directly connected to the input data.
Capabilities: Single-layer networks are suitable for simple classification tasks where the classes are linearly separable.
Limitations:
1. Inability to Represent Non-linear Relationships: Single-layer networks cannot learn non-linear relationships in data, limiting their applicability to linearly separable problems.
2. Lack of Hidden Layers: Without hidden layers, single-layer networks cannot capture complex patterns or hierarchies in the data.
Applications:
• Single-layer networks are used in basic classification tasks, such as binary classification problems with linear decision boundaries.
Multi-Layer Feedforward Networks
Definition: Multi-layer feedforward networks consist of multiple layers of neurons, including input, hidden, and output layers.
• Architecture: Hidden layers enable the network to learn complex patterns and relationships in the data by introducing non-linearity through activation functions.
Key Components:
1. Input Layer: Receives input data features.
2. Hidden Layers: Intermediate layers between the input and output layers. Each hidden layer learns progressively more abstract representations of the input data.
3. Output Layer: Produces the final output based on the learned representations.
Advantages:
• Capturing Complex Patterns: Multi-layer networks can capture non-linear relationships in data and solve complex classification and regression tasks.
• Hierarchical Representation: Hidden layers enable the network to learn hierarchical representations of the input data, leading to better generalization.
Feedforward Process
• Forward Pass: During the feedforward process, input data is propagated forward through the network, layer by layer.
• Weighted Sum: At each neuron, the inputs are multiplied by corresponding weights, and the weighted sum is computed.
• Activation: The weighted sum is passed through an activation function to introduce non-linearity and produce the output of the neuron.
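A minimal sketch of this forward pass through a small network, assuming illustrative layer sizes and random weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes are illustrative: 3 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Weighted sum, then activation, at each layer in turn.
    h = np.tanh(W1 @ x + b1)        # hidden layer
    return sigmoid(W2 @ h + b2)     # output layer

print(forward(np.array([0.5, -1.0, 2.0])))
```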
Learning Algorithms
Gradient Descent:
• Definition: Optimization algorithm that minimizes the loss function by iteratively adjusting the network parameters in the direction of steepest descent.
• Behavior: Computes the gradient of the loss function with respect to the network parameters and updates the parameters accordingly.
• Suitability: Widely used for training neural networks due to its simplicity and effectiveness.
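A minimal sketch of gradient descent on a one-parameter quadratic loss, assuming an illustrative learning rate:

```python
# Gradient descent on a simple quadratic loss L(w) = (w - 3)^2.
# The gradient is dL/dw = 2 * (w - 3); the step size is an illustrative choice.
w = 0.0
lr = 0.1
for step in range(50):
    grad = 2 * (w - 3)
    w -= lr * grad   # move in the direction of steepest descent
print(w)  # approaches the minimizer w = 3
```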
Backpropagation:
• Definition: Algorithm for efficiently computing gradients of the loss function with respect to each parameter in the network.
• Behavior: Propagates the error backwards through the network, allowing for efficient computation of gradients using the chain rule.
• Suitability: Essential for training multi-layer neural networks by efficiently propagating errors and updating weights.
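A hedged sketch of backpropagation through a tiny one-hidden-layer network with a squared-error loss; the sizes, initialization, and learning rate are all illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative network: 2 inputs -> 2 hidden units -> 1 output, MSE loss.
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(1, 2))
x, y = np.array([1.0, 0.5]), np.array([1.0])

# Forward pass, keeping intermediate values for the backward pass.
z1 = W1 @ x
h = sigmoid(z1)
y_hat = W2 @ h
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: apply the chain rule layer by layer, output to input.
d_yhat = y_hat - y                 # dL/dy_hat
dW2 = np.outer(d_yhat, h)          # dL/dW2
d_h = W2.T @ d_yhat                # propagate error to the hidden layer
d_z1 = d_h * h * (1 - h)           # through the sigmoid derivative
dW1 = np.outer(d_z1, x)            # dL/dW1

# One gradient-descent step on each weight matrix.
lr = 0.1
W2 -= lr * dW2
W1 -= lr * dW1
```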
Key Points:
• Gradient descent and backpropagation are fundamental algorithms for training neural networks.
• Gradient descent optimizes the network parameters to minimize the loss function, while backpropagation efficiently computes gradients for parameter updates.
• These algorithms enable neural networks to learn from data and improve their performance over time.
Training Process
Training Set:
• Dataset used to train the neural network by providing input-output pairs for learning.
Epoch:
• One complete pass through the entire training set.
• Multiple epochs may be required for the network to converge to an optimal solution.
Batch Training:
• Training the network using mini-batches of data samples rather than the entire dataset.
• Enables more efficient computation of gradients and parameter updates.
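A minimal sketch of how mini-batches are typically drawn during training; the dataset, batch size, and the stand-in update step are illustrative:

```python
import numpy as np

# Illustrative dataset: 100 samples, 5 features each.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.integers(0, 2, size=100)

batch_size = 16
num_epochs = 3

for epoch in range(num_epochs):
    # Shuffle once per epoch so batches differ between passes.
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        X_batch, y_batch = X[idx], y[idx]
        # Stand-in for computing gradients on this mini-batch and updating parameters.
        batch_mean = X_batch.mean(axis=0)
```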
Key Points:
• The training process involves iteratively adjusting the network parameters using gradient-based optimization algorithms such as backpropagation.
• Multiple epochs of training are typically required to optimize the network parameters and minimize prediction error.
• Batch training improves the efficiency of gradient computation and parameter updates by using mini-batches of data samples.
Applications
• Pattern Recognition: Handwritten digit recognition, facial recognition, object detection, and image classification are common applications of neural networks in pattern recognition tasks.
• Speech Recognition: Transcription of spoken language into text, voice-controlled assistants, and speaker identification utilize neural networks for accurate speech recognition.
• Autonomous Vehicles: Neural networks play a crucial role in autonomous vehicles for tasks such as object detection, lane detection, and decision-making, enabling safer and more efficient transportation systems.
Challenges
• Data Limitations: Insufficient or low-quality data can hinder the performance and generalization ability of neural networks, leading to suboptimal results.
• Overfitting: Neural networks may memorize noise in the training data instead of learning meaningful patterns, resulting in poor performance on unseen data.
Types of ML Algorithms
● Supervised Learning
○ Trained with labeled data; includes regression and classification problems.
● Unsupervised Learning
○ Trained with unlabeled data; includes clustering and association rule learning problems.
● Reinforcement Learning
○ No training data; stochastic Markov decision process; used in robotics and self-driving cars.
● Limitations of traditional machine learning algorithms
○ Not good at handling high-dimensional data.
○ Difficult to do feature extraction and object recognition.
Convolutional Neural Networks
A convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that explicitly assume the inputs are images, which allows certain properties to be encoded into the architecture.
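A naive NumPy sketch of the core convolution operation a CNN layer performs (single channel, "valid" padding, no strides); real layers learn many such kernels across multiple channels:

```python
import numpy as np

def conv2d(image, kernel):
    # Naive "valid" 2D cross-correlation: slide the kernel over the image
    # and take a dot product at each position.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # crude vertical-edge detector
print(conv2d(image, edge_kernel))
```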
What is Machine Learning?
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E
ML is used when: • Human expertise does not exist (navigating on Mars) • Humans can’t explain their
expertise (speech recognition) • Models must be customized (personalized medicine) • Models are
based on huge amounts of data (genomics)
Types of Learning
• We generally assume that the training and test examples are independently drawn from the same overall distribution of data. We call this "i.i.d.", which stands for "independent and identically distributed".
• If examples are not independent, collective classification is required.
ML in a Nutshell
• Symbolic functions – Decision trees – Rules in propositional logic – Rules in first-order predicate logic •
Instance-based functions – Nearest-neighbor – Case-based • Probabilistic Graphical Models – Naïve
Bayes – Bayesian networks – Hidden-Markov Models (HMMs) – Probabilistic Context Free Grammars
(PCFGs) – Markov networks
Model Evaluation
1. Measuring Model Performance: Evaluating model performance is essential for understanding its strengths, weaknesses, and overall effectiveness.
2. Comparing to Ground Truth: Model predictions are compared to known, correct labels or values to determine accuracy.
3. Identifying Improvement Areas: Careful analysis of evaluation metrics can reveal opportunities to refine and optimize the model.
Accuracy
Definition: Accuracy is the proportion of correct predictions made by the model out of all predictions.
Calculation: Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
Interpretation: Accuracy provides a general overview of model performance but doesn't reveal the full picture.
Limitations: Accuracy can be misleading in imbalanced datasets or when certain errors are more important than others.
Precision
Definition: Precision measures the proportion of true positives among all the positive predictions made by the model.
Calculation: Precision = True Positives / (True Positives + False Positives)
Interpretation: Precision is useful for evaluating the model's ability to avoid false positive errors.
Recall
Definition: Recall measures the proportion of actual positive instances that the model correctly identified.
Calculation: Recall = True Positives / (True Positives + False Negatives)
Interpretation: Recall is useful for evaluating the model's ability to avoid false negative errors.
F1 Score
Definition: The F1 score is the harmonic mean of precision and recall, providing a balanced measure of model performance.
Calculation: F1 = 2 × (Precision × Recall) / (Precision + Recall)
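A small sketch computing these four metrics directly from confusion-matrix counts; the example counts are made up for illustration:

```python
def classification_metrics(tp, tn, fp, fn):
    # Direct translation of the definitions above from confusion-matrix counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # how many positive predictions were right
    recall = tp / (tp + fn)      # how many actual positives were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Illustrative counts: 40 true positives, 50 true negatives, 5 FP, 5 FN.
print(classification_metrics(tp=40, tn=50, fp=5, fn=5))
```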
Overfitting and Generalization
• What is Overfitting?
• Causes of Overfitting
• Dangers of Overfitting
• Bias-Variance Tradeoff
• Regularization Methods
• Ensemble Methods
Definition: Generalization refers to a model's ability to perform well on new, unseen data, not just the
data it was trained on.
Importance: Ensuring good generalization is crucial for the real-world applicability and deployment of
machine learning models.
Strategies: Techniques like regularization, cross-validation, and ensemble methods can help improve model generalization, as the sketch below illustrates.
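A minimal sketch of one such strategy, L2 regularization folded into a gradient-descent update; the loss, regularization strength, and learning rate are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
lr, lam = 0.1, 0.01  # learning rate and regularization strength (illustrative)

def grad_loss(w):
    # Stand-in gradient of some data loss; here a quadratic pulling w toward 1.
    return 2 * (w - 1.0)

for step in range(100):
    # The extra 2 * lam * w term penalizes large weights, discouraging overfitting.
    w -= lr * (grad_loss(w) + 2 * lam * w)
print(w)  # shrunk slightly toward zero relative to the unregularized optimum
```

Cross-validation and ensemble methods operate at the level of the training procedure rather than inside the update rule, so they are not shown here.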