3rd Unit DL Final Class Notes (1)

Deep learning notes Jntuh

Uploaded by

niteeshs7e
Copyright
© All Rights Reserved
Subject: Introduction to Deep Learning

(UNIT-3 Class Notes)


R18 B.Tech. CSE (AIML) III & IV Year JNTU Hyderabad

NEURAL NETWORKS AND DEEP LEARNING

B.Tech. IV Year I Sem.          L  T  P  C
                                3  0  0  3
Course Objectives:
• To introduce the foundations of Artificial Neural Networks
• To acquire knowledge of Deep Learning concepts
• To learn various types of Artificial Neural Networks
• To gain knowledge to apply optimization strategies

Course Outcomes:
• Ability to understand the concepts of Neural Networks
• Ability to select the Learning Networks in modeling real world systems
• Ability to use an efficient algorithm for Deep Models
• Ability to apply optimization strategies for large scale applications

UNIT-I
Artificial Neural Networks Introduction, Basic models of ANN, important terminologies, Supervised
Learning Networks, Perceptron Networks, Adaptive Linear Neuron, Back-propagation Network.
Associative Memory Networks. Training Algorithms for pattern association, BAM and Hopfield
Networks.

UNIT-II
Unsupervised Learning Network- Introduction, Fixed Weight Competitive Nets, Maxnet, Hamming
Network, Kohonen Self-Organizing Feature Maps, Learning Vector Quantization, Counter Propagation
Networks, Adaptive Resonance Theory Networks. Special Networks-Introduction to various networks.

UNIT - III
Introduction to Deep Learning, Historical Trends in Deep learning, Deep Feed - forward networks,
Gradient-Based learning, Hidden Units, Architecture Design, Back-Propagation and Other
Differentiation Algorithms

UNIT - IV
Regularization for Deep Learning: Parameter Norm Penalties, Norm Penalties as Constrained
Optimization, Regularization and Under-Constrained Problems, Dataset Augmentation, Noise
Robustness, Semi-Supervised Learning, Multi-Task Learning, Early Stopping, Parameter Tying and
Parameter Sharing, Sparse Representations, Bagging and Other Ensemble Methods, Dropout,
Adversarial Training, Tangent Distance, Tangent Prop and Manifold Tangent Classifier

UNIT - V
Optimization for Training Deep Models: Challenges in Neural Network Optimization, Basic Algorithms,
Parameter Initialization Strategies, Algorithms with Adaptive Learning Rates, Approximate Second-
Order Methods, Optimization Strategies and Meta-Algorithms
Applications: Large-Scale Deep Learning, Computer Vision, Speech Recognition, Natural Language
Processing

TEXT BOOKS:
1. Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press.
2. Neural Networks and Learning Machines, Simon Haykin, 3rd Edition, Pearson Prentice Hall.
III-UNIT

Introduction to Deep Learning

1.1 Introduction to Deep Learning [1st Topic]

1. Deep learning is a subset of machine learning that focuses on training artificial
neural networks to learn and make predictions.
2. It is inspired by the structure and function of the human brain, where artificial neural
networks attempt to mimic the behavior of neurons.
3. Neural networks are composed of interconnected layers of nodes called neurons,
which process and transmit information.
4. Deep learning models are called "deep" because they typically have multiple hidden
layers between the input and output layers.
5. Training a deep learning model involves feeding it a large amount of labeled data
and adjusting the weights and biases of the neurons to minimize the difference
between predicted and actual outputs.
6. Deep learning has gained popularity because it can automatically learn features from
raw data, reducing the need for manual feature engineering.
7. It has been successful in various applications, such as computer vision, natural
language processing, speech recognition, and recommendation systems.
8. Some popular deep learning architectures include Convolutional Neural Networks
(CNNs) for image processing, Recurrent Neural Networks (RNNs) for sequence
data, and Generative Adversarial Networks (GANs) for generating new content.
9. Deep learning has achieved remarkable results in tasks like image classification,
object detection, machine translation, and even beating human players in complex
games like Go and Chess.
10. However, deep learning models require large amounts of data for training and can
be computationally intensive, often requiring specialized hardware like GPUs or
TPUs.
11. Despite its successes, deep learning still faces challenges, such as interpretability
(understanding how and why the model makes predictions) and robustness to
adversarial attacks (where small perturbations can fool the model).
Researchers and engineers continue to explore and enhance deep learning
techniques to overcome these challenges and unlock more applications.
1.2 What is Deep Learning?
1. Deep learning is a subfield of machine learning that focuses on training artificial
neural networks to learn and make predictions.
2. It is based on the concept of artificial neural networks, which are inspired by the
structure and function of the human brain.
3. Deep learning models are called "deep" because they typically have multiple
layers of interconnected nodes, known as neurons.
4. These neurons process and transmit information, allowing the network to capture
complex patterns and relationships in the data.
5. Deep learning models are trained by feeding them large amounts of labeled data
and adjusting the weights and biases of the neurons to minimize prediction
errors.
6. One of the key advantages of deep learning is its ability to automatically learn
features from raw data, reducing the need for manual feature engineering.
7. Deep learning has been successful in various applications, including computer
vision, natural language processing, speech recognition, and recommendation
systems.
8. Some popular deep learning architectures include Convolutional Neural
Networks (CNNs) for image processing, Recurrent Neural Networks (RNNs) for
sequence data, and Generative Adversarial Networks (GANs) for generating new
content.
9. Deep learning has achieved impressive results, matching or surpassing human
performance on specific benchmarks in tasks like image classification and game
playing.
10. However, deep learning models require a significant amount of data for training
and can be computationally intensive, often requiring specialized hardware.
11. There are still challenges in deep learning, such as interpretability
(understanding how and why the model makes predictions) and robustness to
adversarial attacks.
Researchers and practitioners are actively working on improving deep learning
techniques to address these challenges and explore new applications.
1.3 Advantages of Deep Learning:
1. Automatic feature extraction: Deep learning models can automatically learn
relevant features from raw data, reducing the need for manual feature engineering.
2. High accuracy: Deep learning models have achieved impressive results in various
tasks, and on some benchmarks in areas like image recognition and natural language
processing they match or surpass human performance.
3. Handling complex data: Deep learning models can effectively handle large and
complex datasets with high-dimensional inputs, such as images, audio, and text.
4. Scalability: Deep learning models can scale well with large amounts of data,
allowing for improved performance as more data becomes available.
5. Versatility: Deep learning can be applied to a wide range of tasks, including image
and speech recognition, language translation, recommendation systems, and more.

1.4 Disadvantages of Deep Learning:

1. Data requirements: Deep learning models typically require large amounts of
labeled data for training, which can be challenging to obtain in certain domains.
2. Computational requirements: Training deep learning models can be
computationally intensive and may require specialized hardware like GPUs or
TPUs.
3. Lack of Interpretability: Deep learning models can be considered black boxes,
making it difficult to understand and interpret how and why they make certain
predictions.
4. Overfitting: Deep learning models are prone to overfitting, meaning they can
perform well on the training data but struggle to generalize to new, unseen data.
5. Vulnerability to adversarial attacks: Deep learning models can be susceptible to
small, intentional perturbations in the input data that can cause the model to make
incorrect predictions.
It's important to note that while deep learning has many advantages, it may not
always be the best approach for every problem. It's crucial to consider the specific
requirements and limitations of the problem at hand before deciding to use deep
learning.
1.5 Applications of Deep Learning:

a. Computer vision: Deep learning has revolutionized computer vision tasks such as
image classification, object detection, and image segmentation. It enables machines
to understand and interpret visual data, powering technologies like autonomous
vehicles, facial recognition, and augmented reality.

b. Natural language processing: Deep learning is extensively used in natural
language processing tasks, including sentiment analysis, language translation,
chatbots, and speech recognition. It enables machines to understand, generate, and
interact with human language.

c. Recommender systems: Deep learning plays a significant role in recommendation
systems, helping platforms personalize content and make tailored suggestions to
users. It powers recommendation algorithms used by popular platforms like Netflix,
Amazon, and Spotify.

d. Healthcare: Deep learning has shown promise in medical imaging analysis, disease
diagnosis, and drug discovery. It aids in identifying patterns in medical images,
predicting disease outcomes, and developing new treatments.

e. Finance: Deep learning is used in finance for tasks like fraud detection, algorithmic
trading, and credit scoring. It helps identify fraudulent transactions, analyze market
trends, and make data-driven investment decisions.

1.6 Challenges of Deep Learning:

a. Data availability: Deep learning models require large amounts of labeled data for
training, which can be challenging to obtain in certain domains. Limited or biased
data can affect model performance and generalization.

b. Computational requirements: Training deep learning models can be
computationally intensive and time-consuming, often requiring powerful hardware
resources like GPUs or TPUs. This can limit the accessibility and scalability of deep
learning approaches.

c. Interpretability: Deep learning models are often considered black boxes, making
it difficult to understand the rationale behind their predictions. The lack of
interpretability raises concerns in critical applications like healthcare and finance.

d. Overfitting: Deep learning models are prone to overfitting, where they memorize
the training data instead of learning generalizable patterns. Overfitting can lead to
poor performance on unseen data.

e. Adversarial attacks: Deep learning models can be vulnerable to adversarial
attacks, where small, intentional perturbations in the input data can cause the model
to make incorrect predictions. Ensuring robustness against such attacks is a crucial
challenge.

Addressing these challenges requires ongoing research and development in the field of
deep learning to improve data collection, model interpretability, regularization
techniques, and security measures.
2.1 Historical Trends in Deep Learning [2nd Topic]
1. Deep learning has a history that spans a long time and has been known by different
names, reflecting different perspectives and trends in the field.
2. The usefulness of deep learning has increased as the availability of training data has
grown. More data allows deep learning models to learn more effectively and make
better predictions.
3. Deep learning models have become larger over time due to advancements in
computer infrastructure. This includes improvements in both hardware (such as
GPUs) and software (such as optimized algorithms and frameworks) specifically
designed for deep learning.
4. As deep learning models have evolved, they have been able to tackle increasingly
complex applications with higher accuracy. This means that deep learning has been
successful in solving more challenging tasks and producing more reliable results.
2.1.1 The Many Names and Changing Fortunes of Neural Networks
1. Deep learning has a long history dating back to the 1940s, but it has recently gained
popularity and is often referred to as a new technology.
2. Deep learning has gone through various name changes over time, reflecting different
researchers and perspectives in the field.
3. There have been three waves of development in deep learning: cybernetics in the
1940s-1960s, connectionism in the 1980s-1990s, and the current resurgence known
as deep learning since 2006.
4. Deep learning models are sometimes called artificial neural networks (ANNs)
because they are inspired by the functioning of the biological brain.
5. While neural networks have been used to understand brain function, they are not
necessarily realistic models of how the brain works.
6. Deep learning is motivated by the idea of reverse engineering the brain's
computational principles to build intelligent systems and understand human
intelligence.
7. Deep learning also focuses on learning multiple levels of composition, which can be
applied in machine learning frameworks that are not necessarily based on neural
inspiration.

The figure shows two of the three historical waves of artificial neural network research,
measured by how often the phrases "cybernetics" and "connectionism" or "neural networks"
appear in Google Books. The first wave, cybernetics (1940s-1960s), focused on theories of
biological learning and on the first models, such as the perceptron, that allowed a single
neuron to be trained. The second wave, connectionism (1980-1995), used back-propagation to
train neural networks with one or two hidden layers. The third wave, deep learning, began
around 2006 and only started to appear in book form around 2016, so it is not shown. Books
on each wave generally appeared much later than the research itself.

Early Neural Networks (Cybernetics)

MCCULLOCH-PITTS NEURON:

• Formula: f(x, w) = x1w1 + ... + xnwn
• Function: Recognizes two categories of inputs by testing whether f(x, w) is
positive or negative
• Limitations: The weights had to be set by hand rather than learned, and the model is
linear, so it cannot represent non-linear functions

Perceptron and ADALINE

PERCEPTRON:

• Formula: f(x, w) = x1w1 + ... + xnwn + b
• Function: Learns the weights that define two categories from labeled examples of
each category
• Limitations: Linear model, so it cannot learn non-linear relationships such as XOR

ADALINE:

• Formula: f(x, w) = x1w1 + ... + xnwn
• Function: Returns the value of f(x, w) itself, so it learns weights to predict
real-valued numbers
• Limitations: Linear model, so it cannot learn non-linear relationships
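
The perceptron's behaviour described above can be illustrated with a small sketch. This is not code from the notes: the function names (`predict`, `train_perceptron`), the learning rate, and the number of epochs are assumptions chosen for illustration, and the update rule is the classic perceptron rule. It trains on the AND function, which is linearly separable, and on XOR, which is not.

```python
import numpy as np

def predict(X, w, b):
    # The category is decided by the sign of f(x, w) = x1*w1 + ... + xn*wn + b.
    return np.where(X @ w + b > 0, 1, 0)

def train_perceptron(X, y, epochs=20, lr=1.0):
    # Hypothetical helper: classic perceptron rule, starting from zero weights.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - predict(xi, w, b)  # 0 if correct, +1 or -1 if wrong
            w += lr * error * xi
            b += lr * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])  # linearly separable
y_xor = np.array([0, 1, 1, 0])  # not linearly separable

w_and, b_and = train_perceptron(X, y_and)
w_xor, b_xor = train_perceptron(X, y_xor)

acc_and = np.mean(predict(X, w_and, b_and) == y_and)
acc_xor = np.mean(predict(X, w_xor, b_xor) == y_xor)
print("AND accuracy:", acc_and)
print("XOR accuracy:", acc_xor)
```

The perceptron classifies all four AND examples correctly, but no choice of w and b can classify all four XOR examples, which is the limitation of linear models noted above.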

Linear Models

• Training Algorithm: Stochastic gradient descent (the ADALINE weight-update rule was
a special case of it)
• Applications: Still widely used in machine learning
• Limitations: Cannot learn non-linear relationships; the best-known example is the
XOR function
• Impact: Critics who pointed out these limitations triggered a backlash against
biologically inspired learning
NEUROSCIENCE AND DEEP LEARNING

• Neuroscience: Still a source of inspiration, but not the predominant guide
• Reason: Lack of information about the brain
• Future: A deep understanding of the brain's algorithms would require monitoring the
activity of at least thousands of interconnected neurons simultaneously

• Deep learning was inspired by neuroscience but is not a direct simulation of the
brain.
• Early neural networks were simple linear models and could only learn to recognize
two categories of inputs.
• The perceptron was the first model that could learn the weights defining the two
categories from examples of inputs from each category.
• Linear models have limitations and cannot learn certain functions, such as the XOR
function.
• Neuroscience is still an important source of inspiration for deep learning, but it is
not the predominant guide for the field.
• We do not have enough information about the brain to use it as a complete guide for
deep learning research.
• Deep learning researchers are more likely to cite the brain as an influence than
researchers working in other machine learning fields.
• Deep learning and computational neuroscience are two separate fields of study, even
though researchers often move between them.
• Deep learning is focused on building AI systems, while computational neuroscience
is focused on building accurate models of the brain.
• Connectionism is a movement in cognitive science that studies models of cognition
based on neural implementations.
• Distributed representation is a key concept in connectionism that states that each
input should be represented by many features and each feature should be involved
in the representation of many possible inputs.
• Back-propagation: The algorithm for computing gradients that is used to train neural
networks with hidden layers; it became popular during the connectionist wave.
• LSTM: A type of recurrent neural network that is well-suited for modeling long
sequences.
• Decline of neural networks: In the mid-1990s, neural networks lost popularity due to
unrealistic expectations and advances in other machine learning fields, such as
kernel machines and graphical models.
• CIFAR NCAP research initiative: The Neural Computation and Adaptive Perception
program of the Canadian Institute for Advanced Research, which helped to keep
neural networks research alive during the decline.
• Deep networks: Were once thought to be very difficult to train, but this is no longer
the case.
• Geoffrey Hinton: Showed in 2006 that a deep belief network could be trained
efficiently using a strategy called greedy layer-wise pre-training.
• Deep belief networks: A type of neural network that can be efficiently trained
using greedy layer-wise pre-training.
• Deep learning: A term used to emphasize the ability to train deeper neural
networks.
• Third wave of neural networks research: Began in 2006 and is still ongoing.
• Focus of deep learning research: Has shifted from unsupervised learning on small
datasets to supervised learning on large labeled datasets.
2.1.2 Important Points and Conclusions about Deep Feedforward Networks
1. Deep feedforward networks, also known as multilayer perceptrons (MLPs), are a
type of artificial neural network that approximate a function f* by defining a
mapping y = f(x; θ) and learning the parameters θ. These networks are called
feedforward because information flows in a single direction from the input x to the
output y, without feedback connections. When extended with feedback connections,
they become recurrent neural networks.
2. Feedforward networks consist of multiple layers of functions, where each layer is
connected to the next in a chain. The first layer is called the input layer, and the final
layer is called the output layer. The layers in between are called hidden layers
because their behavior is not directly specified by the training data.
3. During training, the network is presented with labeled examples (x, y) to learn the
desired output y for each input x. The learning algorithm determines how to use the
hidden layers to best approximate f*.
4. The width of the network is determined by the dimensionality of the hidden layers,
and the depth is determined by the number of layers. The choice of functions used
to compute the hidden layer values is inspired by neuroscience, but the goal of these
networks is not to perfectly model the brain.
5. Linear models, such as logistic regression and linear regression, can only represent
linear functions of their input. To overcome this limitation, we can apply the linear
model not to x itself but to a transformed input φ(x), where φ is a nonlinear
transformation that provides a set of features describing x. Kernel machines obtain
such a mapping implicitly through the kernel trick.
6. In deep learning, we learn φ itself. The model is y = f(x; θ, w) = φ(x; θ)ᵀw, where
the parameters θ are used to learn φ and the parameters w map φ(x) to the desired
output. This approach captures the benefits of both highly generic feature mappings
and manually engineered feature mappings, while avoiding the main limitations of
either.
7. To train a feedforward network, we choose an optimizer, cost function, and output
units, which are similar to those used for linear models. We also choose the
activation functions used to compute the hidden layer values and design the
architecture of the network, including the number of layers, connections between
layers, and number of units in each layer.
8. Computing the gradients of complicated functions in deep neural networks requires
the back-propagation algorithm and its modern generalizations, which can
efficiently compute these gradients.
9. Deep feedforward networks are a type of artificial neural network that approximate
a function by defining a mapping y = f(x; θ) and learning the parameters θ. They
consist of multiple layers of functions, where each layer is connected to the next in
a chain, and can capture the benefits of both highly generic and manually engineered
feature mappings.
10. Training and optimization techniques, such as choosing an optimizer, cost function,
output units, activation functions, and designing the network architecture, are
required to effectively train these networks. The back-propagation algorithm is used
to efficiently compute the gradients required for learning.
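
The mapping y = f(x; θ) as a chain of layers can be sketched as follows. This is only an illustration under stated assumptions: the layer sizes, the ReLU hidden activation, the random initialization, and the helper names `init_params` and `forward` are not from the notes. The number of (W, b) pairs corresponds to the depth of the network and the hidden layer sizes to its width.

```python
import numpy as np

def relu(z):
    # Nonlinear activation applied to each hidden layer.
    return np.maximum(0.0, z)

def init_params(layer_sizes, seed=0):
    # theta: one (weight matrix, bias vector) pair per layer (assumed small random init).
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(x, params):
    # Information flows in one direction: input -> hidden layers -> output.
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)        # hidden layers: linear map, then nonlinearity
    W_out, b_out = params[-1]
    return h @ W_out + b_out       # linear output unit

layer_sizes = [3, 5, 5, 1]         # 3 inputs, two hidden layers of width 5, 1 output
params = init_params(layer_sizes)
x_batch = np.ones((4, 3))          # batch of 4 examples
y_hat = forward(x_batch, params)
print(y_hat.shape)
```

Training would then adjust every W and b in `params` so that `forward` approximates the target function f*; the gradients needed for that are what back-propagation computes.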

2.1.3 A Feedforward Neural Network’s Layers

The following are the components of a feedforward neural network:

Input layer

It contains the neurons that receive the input and pass it on to the next layer. The
number of neurons in the input layer equals the number of input variables (features) in
the dataset.

Hidden layer

This is an intermediate layer that sits between the input and output layers. Its neurons
apply transformations to their inputs and pass the results on to the next layer. A network
can have one or more hidden layers.

Output layer

It is the last layer, and its form depends on the task the model is built for. It produces
the prediction, which during training is compared with the known desired outcome.

Neuron weights

Weights describe the strength of the connection between two neurons. A weight can be
any real number, positive or negative; it is not restricted to the range 0 to 1.

Cost Function in Feedforward Neural Network

The cost function is an important part of a feedforward neural network. A small change
to the weights and biases usually does not change the number of correctly classified data
points, so classification accuracy alone gives little guidance on how to improve the
network. A smooth cost function is used instead, because it shows how small adjustments
to the weights and biases improve performance.

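The mean square error cost function is defined as follows:

\[ C(w, b) = \frac{1}{2n} \sum_{x} \lVert y(x) - a \rVert^{2} \]

Where,

w = weights collected in the network

b = biases

n = number of training inputs

a = vector of network outputs when x is the input

x = input

y(x) = desired output for input x

‖v‖ = usual length of vector v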
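
As a minimal sketch of the mean square error cost with the symbols listed above, the following assumes the common quadratic form, one half of the average squared length of the difference between the desired outputs and the network outputs a. The function name `quadratic_cost` and the sample numbers are illustrative assumptions.

```python
import numpy as np

def quadratic_cost(targets, outputs):
    # targets: desired outputs, one row per training input x
    # outputs: network outputs a, one row per training input x
    n = targets.shape[0]                                     # number of training inputs
    squared_lengths = np.linalg.norm(targets - outputs, axis=1) ** 2
    return squared_lengths.sum() / (2 * n)

# Illustrative values: XOR targets and a model that always outputs 0.5.
targets = np.array([[0.0], [1.0], [1.0], [0.0]])
outputs = np.full((4, 1), 0.5)
cost = quadratic_cost(targets, outputs)
print(cost)
```

The cost is zero only when every output equals its target, and it changes smoothly as the outputs move, which is what makes it usable for gradient-based learning.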

Loss Function in Feedforward Neural Network


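A neural network's loss function measures how far the predictions are from the targets,
and therefore whether the learning process needs to be adjusted.

For classification, the output layer has as many neurons as there are classes, and the
cross-entropy loss measures the difference between the predicted and actual probability
distributions.

The cross-entropy loss for binary classification is as follows:

\[ L = -\bigl( y \log p + (1 - y) \log (1 - p) \bigr) \]

where y ∈ {0, 1} is the true label and p is the predicted probability of class 1.

The cross-entropy loss associated with multi-class categorization is as follows:

\[ L = -\sum_{c=1}^{M} y_{c} \log p_{c} \]

where M is the number of classes, y_c is 1 if c is the true class and 0 otherwise, and
p_c is the predicted probability of class c.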
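
The two cross-entropy losses can be sketched in NumPy as follows, assuming the standard definitions (negative log-likelihood of the true label, averaged over examples). The function names, the clipping constant `eps`, and the use of softmax to produce class probabilities are assumptions for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # y_true in {0, 1}; p_pred is the predicted probability of class 1.
    p = np.clip(p_pred, eps, 1.0 - eps)          # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def softmax(logits):
    # Turns one row of scores per example into a probability distribution.
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    # y_onehot: one-hot true classes; probs: predicted class probabilities.
    return -np.mean(np.sum(y_onehot * np.log(np.clip(probs, eps, 1.0)), axis=1))

# Illustrative values: uninformative predictions give log(2) and log(3).
bce = binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
probs = softmax(np.zeros((2, 3)))                # uniform over 3 classes
cce = categorical_cross_entropy(np.array([[1, 0, 0], [0, 0, 1]]), probs)
print(bce, cce)
```

Both losses approach zero as the predicted probability of the true class approaches 1, and grow without bound as it approaches 0.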

Figure-Deep Feed Forward Architectural Diagram


2.2 Feed Forward Networks
1. Linear models are not able to solve the XOR problem because they cannot represent non-
linear relationships between inputs. A feedforward network with a hidden layer can be used
to solve this problem. The hidden layer learns a new feature space in which the XOR
function can be represented by a linear model. The function that computes the hidden
layer outputs should be non-linear.
2. Fitting a linear model to the XOR data by minimizing mean squared error gives w = 0
and b = 1/2, so the model outputs a constant value of 0.5 everywhere. This happens
because a linear model cannot represent the XOR function. To solve this, we can use a
feedforward network with a hidden layer containing two hidden units and a nonlinear
activation function.
3. The hidden units' values h are computed by a function f^(1)(x; W, c) with learned
parameters W and c, and the output is computed by a linear regression applied to the
hidden units' values, y = f^(2)(h; w, b). This network allows us to learn a different
feature space where a linear model can represent the solution.
4. If f^(1) were linear, the network would still be a linear function of its input, so we
need a nonlinear function to describe the features. Most neural networks use a nonlinear
activation function after a linear transformation with learned weights and biases.
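
The failure of the linear model on XOR can be checked numerically. The sketch below fits y = xᵀw + b to the four XOR points by least squares; the use of `np.linalg.lstsq` and the variable names are choices made for this illustration.

```python
import numpy as np

# The four XOR inputs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the bias b is fitted together with w.
A = np.hstack([X, np.ones((4, 1))])

# Least-squares solution minimizes the mean squared error of the linear model.
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = theta[:2], theta[2]
predictions = A @ theta

print("w =", w)
print("b =", b)
print("predictions =", predictions)
```

The best linear fit has weights of (numerically) zero and a bias of 0.5, so it predicts 0.5 for every input regardless of x1 and x2.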

Figure-1: Solving the XOR problem by learning a representation


The XOR problem cannot be solved by a linear model applied directly to the inputs, because
the effect of one input on the output depends on the value of the other. When x1 = 0, the
output must increase as x2 increases; when x1 = 1, the output must decrease as x2
increases. A linear model applies one fixed coefficient w2 to x2, so it cannot use the
value of x1 to change that coefficient. By learning a nonlinear representation of the
inputs through a neural network, however, the problem can be solved with a linear model in
the new feature space.
This is because the neural network collapses certain input points into the same point in
the new space (here the inputs [0, 1] and [1, 0] are both mapped to h = [1, 0]), allowing
the linear model to describe the function using the new features. This technique can
increase the model's capacity to fit the training data and also improve its ability to
generalize to new inputs.

Figure-2: An example of a feedforward network, drawn in two different styles


The feedforward network shown here has a single hidden layer with 2 units, used to solve the
XOR problem. In the left diagram, each unit is represented as a separate node, making it clear
and unambiguous but taking up a lot of space for larger networks. In the right diagram, a node
represents an entire layer's activation vector, making it more compact. The edges are labeled
with parameter names, such as W for the mapping from input to hidden layer and w for the
mapping from hidden to output layer, without including intercept parameters. This style of
diagram is more concise but may require additional context to fully understand.
Figure-3 The rectified linear activation function.
The rectified linear activation function (ReLU) is a popular choice for deep learning
models because it's simple, efficient, and effective. It's like a gate that turns on when
the input is positive and stays off when it's negative. This helps the model learn complex
patterns without getting too complicated.
• The rectified linear activation function, g(z) = max{0, z}, is a commonly used
function in neural networks. It is applied to the output of a linear transformation
and yields a nonlinear transformation that remains almost linear.
• This function is recommended for most feedforward neural networks because it
preserves many of the properties that make linear models easy to optimize with
gradient-based methods and that help them generalize well.
• It is a piecewise linear function with two linear pieces. Just as a Turing machine's
memory only needs to store 0 or 1 states, complex function approximators can be
built from this simple, nearly linear component.
• In the XOR example, the solution was specified by hand and then shown to give the
correct answer for all the examples in the batch. In a real situation with many
parameters and examples, we can't just guess the solution like we did here.
• Instead, we use a gradient-based optimization algorithm to find parameters that
result in low error. The algorithm can converge to a point with very little error, but
the solution may not be as simple as the one we presented for the XOR problem.
• The solution found by the algorithm depends on its initial values, and in practice, it
may not be as clean and easy to understand as the one we showed.
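
A gradient-based search for XOR parameters might look like the following sketch. It is a plain full-batch gradient descent on mean squared error with hand-written gradients; the hidden width of 4, the learning rate, the number of steps, the random seed, and the function name `train_xor` are all assumptions for illustration, not values from the notes. Depending on the initial values, the result can differ from run to run.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def train_xor(seed=0, hidden=4, lr=0.05, steps=5000):
    # Hypothetical helper: trains a one-hidden-layer ReLU network on XOR.
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)

    # Random initial values; the solution found depends on them.
    W = rng.normal(0.0, 0.5, size=(2, hidden))
    c = np.zeros(hidden)
    w = rng.normal(0.0, 0.5, size=hidden)
    b = 0.0

    losses = []
    for _ in range(steps):
        # Forward pass.
        z = X @ W + c
        h = relu(z)
        y_hat = h @ w + b
        err = y_hat - y
        losses.append(np.mean(err ** 2))

        # Backward pass: gradients of the mean squared error.
        grad_y = 2.0 * err / len(y)
        grad_w = h.T @ grad_y
        grad_b = grad_y.sum()
        grad_z = np.outer(grad_y, w) * (z > 0)   # ReLU passes gradient where z > 0
        grad_W = X.T @ grad_z
        grad_c = grad_z.sum(axis=0)

        # Gradient descent step.
        W -= lr * grad_W
        c -= lr * grad_c
        w -= lr * grad_w
        b -= lr * grad_b

    return losses, (W, c, w, b)

losses, params = train_xor()
print("initial loss:", losses[0])
print("final loss:", losses[-1])
```

Printing the learned `W`, `c`, `w`, and `b` typically shows values that are less tidy than a hand-picked solution, even when the error is low.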
CHAPTER 6. DEEP FEEDFORWARD NETWORKS

\[ w = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \tag{6.6} \]

and b = 0.

We can now walk through the way that the model processes a batch of inputs.
Let X be the design matrix containing all four points in the binary input space,
with one example per row:

\[ X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}. \tag{6.7} \]

The first step in the neural network is to multiply the input matrix by the first
layer’s weight matrix:

\[ XW = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix}. \tag{6.8} \]

Next, we add the bias vector c, to obtain

\[ \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}. \tag{6.9} \]

In this space, all of the examples lie along a line with slope 1. As we move along
this line, the output needs to begin at 0, then rise to 1, then drop back down to 0.
A linear model cannot implement such a function. To finish computing the value
of h for each example, we apply the rectified linear transformation:

\[ \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}. \tag{6.10} \]

This transformation has changed the relationship between the examples. They no
longer lie on a single line. As shown in figure 6.1, they now lie in a space where a
linear model can solve the problem.
We finish by multiplying by the weight vector w:

\[ \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}. \tag{6.11} \]
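
The walk-through above can be reproduced in a few lines of NumPy. The values of w and b are the ones given in the text; W and c do not appear in this excerpt, so the values used here are inferred from the intermediate results of the walk-through (the product XW and the matrix obtained after adding c) and should be read as an assumption.

```python
import numpy as np

# Design matrix: all four points of the binary input space, one per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# First-layer parameters (inferred from the intermediate matrices, see lead-in).
W = np.array([[1, 1], [1, 1]])
c = np.array([0, -1])

# Output-layer parameters as given in the text.
w = np.array([1, -2])
b = 0

XW = X @ W                 # multiply inputs by the first layer's weights
pre = XW + c               # add the bias vector c
h = np.maximum(0, pre)     # rectified linear transformation
y = h @ w + b              # final linear layer

print(XW)
print(pre)
print(h)
print(y)                   # [0 1 1 0], the XOR of each input row
```

Each printed array matches one step of the walk-through, and the final vector is the correct XOR output for every example in the batch.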