3rd Unit DL Final Class Notes
Course Outcomes:
Ability to understand the concepts of Neural Networks
Ability to select the Learning Networks in modeling real world systems
Ability to use an efficient algorithm for Deep Models
Ability to apply optimization strategies for large scale applications
UNIT-I
Artificial Neural Networks Introduction, Basic models of ANN, important terminologies, Supervised
Learning Networks, Perceptron Networks, Adaptive Linear Neuron, Back-propagation Network.
Associative Memory Networks. Training Algorithms for pattern association, BAM and Hopfield
Networks.
UNIT-II
Unsupervised Learning Network- Introduction, Fixed Weight Competitive Nets, Maxnet, Hamming
Network, Kohonen Self-Organizing Feature Maps, Learning Vector Quantization, Counter Propagation
Networks, Adaptive Resonance Theory Networks. Special Networks-Introduction to various networks.
UNIT - III
Introduction to Deep Learning, Historical Trends in Deep learning, Deep Feed - forward networks,
Gradient-Based learning, Hidden Units, Architecture Design, Back-Propagation and Other
Differentiation Algorithms
UNIT - IV
Regularization for Deep Learning: Parameter norm Penalties, Norm Penalties as Constrained
Optimization, Regularization and Under-Constrained Problems, Dataset Augmentation, Noise
Robustness, Semi-Supervised Learning, Multi-task Learning, Early Stopping, Parameter Tying and
Parameter Sharing, Sparse Representations, Bagging and Other Ensemble Methods, Dropout,
Adversarial Training, Tangent Distance, Tangent Prop, and Manifold Tangent Classifier
UNIT - V
Optimization for Training Deep Models: Challenges in Neural Network Optimization, Basic Algorithms,
Parameter Initialization Strategies, Algorithms with Adaptive Learning Rates, Approximate Second-
Order Methods, Optimization Strategies and Meta-Algorithms
Applications: Large-Scale Deep Learning, Computer Vision, Speech Recognition, Natural Language
Processing
TEXT BOOKS:
1. Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, MIT Press.
2. Neural Networks and Learning Machines, Simon Haykin, 3rd Edition, Pearson Prentice Hall.
UNIT - III
Applications of deep learning include:
a. Computer vision: Deep learning has revolutionized computer vision tasks such as
image classification, object detection, and image segmentation. It enables machines
to understand and interpret visual data, powering technologies like autonomous
vehicles, facial recognition, and augmented reality.
b. Healthcare: Deep learning has shown promise in medical imaging analysis, disease
diagnosis, and drug discovery. It aids in identifying patterns in medical images,
predicting disease outcomes, and developing new treatments.
c. Finance: Deep learning is used in finance for tasks like fraud detection, algorithmic
trading, and credit scoring. It helps identify fraudulent transactions, analyze market
trends, and make data-driven investment decisions.
Key challenges of deep learning include:
a. Data availability: Deep learning models require large amounts of labeled data for
training, which can be challenging to obtain in certain domains. Limited or biased
data can affect model performance and generalization.
b. Interpretability: Deep learning models are often considered black boxes, making
it difficult to understand the rationale behind their predictions. The lack of
interpretability raises concerns in critical applications like healthcare and finance.
c. Overfitting: Deep learning models are prone to overfitting, where they memorize
the training data instead of learning generalizable patterns. Overfitting can lead to
poor performance on unseen data.
Addressing these challenges requires ongoing research and development in the field of
deep learning to improve data collection, model interpretability, regularization
techniques, and security measures.
2.1 Historical Trends in Deep Learning [2nd Topic]
1. Deep learning has a history that spans a long time and has been known by different
names, reflecting different perspectives and trends in the field.
2. The usefulness of deep learning has increased as the availability of training data has
grown. More data allows deep learning models to learn more effectively and make
better predictions.
3. Deep learning models have become larger over time due to advancements in
computer infrastructure. This includes improvements in both hardware (such as
GPUs) and software (such as optimized algorithms and frameworks) specifically
designed for deep learning.
4. As deep learning models have evolved, they have been able to tackle increasingly
complex applications with higher accuracy. This means that deep learning has been
successful in solving more challenging tasks and producing more reliable results.
2.1.1 The Many Names and Changing Fortunes of Neural Networks
1. Deep learning has a long history dating back to the 1940s, but it has recently gained
popularity and is often referred to as a new technology.
2. Deep learning has gone through various name changes over time, reflecting different
researchers and perspectives in the field.
3. There have been three waves of development in deep learning: cybernetics in the
1940s-1960s, connectionism in the 1980s-1990s, and the current resurgence known
as deep learning since 2006.
4. Deep learning models are sometimes called artificial neural networks (ANNs)
because they are inspired by the functioning of the biological brain.
5. While neural networks have been used to understand brain function, they are not
necessarily realistic models of how the brain works.
6. Deep learning is motivated by the idea of reverse engineering the brain's
computational principles to build intelligent systems and understand human
intelligence.
7. Deep learning also focuses on learning multiple levels of composition, which can be
applied in machine learning frameworks that are not necessarily based on neural
inspiration.
The figure represents two historical waves of artificial neural network research based
on Google Books. The first wave, cybernetics (1940s-1960s), focused on theories of
biological learning and the development of the perceptron, a model that could train a
single neuron. The second wave, connectionism (1980-1995), introduced back-
propagation to train neural networks with one or two hidden layers. The current third
wave, deep learning, began around 2006 and only began to appear in book form around
2016. Books describing each wave tend to appear well after the corresponding research
takes place.
MCCULLOCH-PITTS NEURON:
The McCulloch-Pitts neuron (1943) is a simple binary threshold unit: it outputs 1 if the
weighted sum of its binary inputs reaches a fixed threshold and 0 otherwise, with weights
that are set by hand rather than learned.
PERCEPTRON:
The perceptron (Rosenblatt, 1958) is a single linear threshold unit whose weights and bias
are learned from labeled examples using the perceptron learning rule.
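A minimal illustrative sketch (not part of the original notes) of these two models is given
below; the AND-gate data, learning rate, and epoch count are arbitrary choices for
demonstration.

```python
# Sketch of a McCulloch-Pitts unit and a perceptron trained with the classic
# perceptron learning rule on the linearly separable AND function.
import numpy as np

def mp_neuron(x, w, threshold):
    """McCulloch-Pitts unit: fixed weights, hard threshold, binary output."""
    return int(np.dot(w, x) >= threshold)

def train_perceptron(X, y, lr=1.0, epochs=10):
    """Perceptron rule: w <- w + lr * (target - prediction) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, y):
            pred = int(np.dot(w, xi) + b >= 0)
            w += lr * (ti - pred) * xi
            b += lr * (ti - pred)
    return w, b

# AND gate with hand-set weights and threshold (McCulloch-Pitts style).
print(mp_neuron(np.array([1, 1]), w=np.array([1, 1]), threshold=2))  # -> 1

# AND gate is linearly separable, so the perceptron converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([int(np.dot(w, xi) + b >= 0) for xi in X])  # -> [0, 0, 0, 1]
```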
Linear Models
Deep learning was inspired by neuroscience but is not a direct simulation of the
brain.
Early neural networks were simple linear models that could only separate two
categories of inputs, and their weights had to be set by hand.
The perceptron was the first model that could learn the weights defining the
categories from labeled examples.
Linear models have limitations and cannot learn certain functions, such as the XOR
function (a short argument for why appears after this list).
Neuroscience is still an important source of inspiration for deep learning, but it is
not the predominant guide for the field.
We do not have enough information about the brain to use it as a complete guide for
deep learning research.
Deep learning researchers are more likely to cite the brain as an influence than
researchers working in other machine learning fields.
Deep learning and computational neuroscience are two separate fields of study that
are both concerned with understanding the brain.
Deep learning is focused on building AI systems, while computational neuroscience
is focused on building accurate models of the brain.
Connectionism is a movement in cognitive science that studies models of cognition
based on neural implementations.
Distributed representation is a key concept in connectionism that states that each
input should be represented by many features and each feature should be involved
in the representation of many possible inputs.
Back-propagation: A popular algorithm for training deep neural networks.
LSTM: A type of neural network that is well-suited for modeling sequences.
Decline of neural networks: In the 1990s, neural networks lost popularity due to
unrealistic expectations and advances in other machine learning fields.
CIFAR NCAP research initiative: A program that helped keep neural network
research alive during this period of decline.
Deep Networks: Were once thought to be very difficult to train, but this is no longer
the case.
Geoffrey Hinton: Developed a new technique for training deep neural
networks called greedy layer-wise pre-training.
Deep belief networks: A type of neural network that can be efficiently trained
using greedy layer-wise pre-training.
Deep learning: A term used to emphasize the ability to train deeper neural
networks.
Third wave of neural networks research: Began in 2006 and is still ongoing.
Focus of deep learning research: Has shifted from unsupervised learning to
supervised learning.
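As noted above, no linear model can represent XOR. The short argument below is added
here for clarity and is not part of the original notes.

```latex
% A linear model f(x; w, b) = w_1 x_1 + w_2 x_2 + b would have to satisfy
\begin{align*}
f(0,0) &= b = 0,\\
f(0,1) &= w_2 + b = 1 \quad\Rightarrow\quad w_2 = 1,\\
f(1,0) &= w_1 + b = 1 \quad\Rightarrow\quad w_1 = 1,\\
f(1,1) &= w_1 + w_2 + b = 2 \neq 0,
\end{align*}
% which contradicts the XOR requirement f(1,1) = 0, so no choice of w and b
% fits all four points. A hidden layer with a nonlinearity (worked through in
% the XOR example later in these notes) resolves this.
```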
2.1.2 Important Points and Conclusions about Deep Feedforward Networks
1. Deep feedforward networks, also known as multilayer perceptrons (MLPs), are a
type of artificial neural network that approximate a function f* by defining a
mapping y = f(x; θ) and learning the parameters θ. These networks are called
feedforward because information flows in a single direction from the input x to the
output y, without feedback connections. When extended with feedback connections,
they become recurrent neural networks.
2. Feedforward networks consist of multiple layers of functions, where each layer is
connected to the next in a chain. The first layer is called the input layer, and the final
layer is called the output layer. The layers in between are called hidden layers
because their behavior is not directly specified by the training data. (A minimal code
sketch of such a chain of layers appears after this list.)
3. During training, the network is presented with labeled examples (x, y) to learn the
desired output y for each input x. The learning algorithm determines how to use the
hidden layers to best approximate f*.
4. The width of the network is determined by the dimensionality of the hidden layers,
and the depth is determined by the number of layers. The choice of functions used
to compute the hidden layer values is inspired by neuroscience, but the goal of these
networks is not to perfectly model the brain.
5. To overcome the limitations of linear models, such as logistic regression and linear
regression, which can only represent linear functions, we can apply a nonlinear
transformation φ(x) to the input x to obtain a set of features describing x. This is
analogous to the implicit feature mapping used by kernel machines.
6. In deep learning, we learn the feature mapping φ(x; θ) itself and map its output to the
desired target using parameters w, i.e., y = f(x; θ, w) = φ(x; θ)ᵀw. This approach allows
us to capture the benefits of both highly generic
feature mappings and manually engineered feature mappings, while avoiding the
limitations of either.
7. To train a feedforward network, we choose an optimizer, cost function, and output
units, which are similar to those used for linear models. We also choose the
activation functions used to compute the hidden layer values and design the
architecture of the network, including the number of layers, connections between
layers, and number of units in each layer.
8. Computing the gradients of complicated functions in deep neural networks requires
the back-propagation algorithm and its modern generalizations, which can
efficiently compute these gradients.
9. Deep feedforward networks are a type of artificial neural network that approximate
a function by defining a mapping y = f(x; θ) and learning the parameters θ. They
consist of multiple layers of functions, where each layer is connected to the next in
a chain, and can capture the benefits of both highly generic and manually engineered
feature mappings.
10. Training and optimization techniques, such as choosing an optimizer, cost function,
output units, activation functions, and designing the network architecture, are
required to effectively train these networks. The back-propagation algorithm is used
to efficiently compute the gradients required for learning.
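As referenced in point 2, the following is a minimal sketch (illustrative, not taken from
the textbook) of a feedforward network built as a chain of layers; the layer sizes, the
ReLU activation, and the random weights are arbitrary choices for demonstration.

```python
# An MLP y = f(x; theta) built as a chain of layers, each an affine transform
# followed by a nonlinearity.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def dense(x, W, b, activation=None):
    """One layer: affine transform xW + b, optionally followed by an activation."""
    z = x @ W + b
    return activation(z) if activation else z

# Architecture design choices: depth = 2 layers, hidden width = 4.
n_in, n_hidden, n_out = 3, 4, 2
W1, b1 = rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_out)), np.zeros(n_out)

def forward(x):
    h = dense(x, W1, b1, relu)   # hidden layer: learned features phi(x; theta)
    return dense(h, W2, b2)      # output layer: linear map on the learned features

x = rng.normal(size=(5, n_in))   # a batch of 5 input examples
print(forward(x).shape)          # -> (5, 2): one output vector per example
```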
Input layer
It contains the neurons that receive the input data, which is then passed on to the next
layer. The number of neurons in the input layer equals the number of input variables
(features) in the dataset.
Hidden layer
These are the intermediate layers between the input and output layers. Their neurons
apply transformations to the incoming values and pass the results on toward the
output layer.
Output layer
This is the final layer, and its form depends on how the model is constructed. The
output layer produces the predicted value, which is compared with the desired
outcome during training.
Neuron weights
Weights describe the strength of the connection between neurons. They are real-valued
parameters learned during training and can be positive or negative; they are not
restricted to the range 0 to 1.
A single layer computes a = f(Wx + b),
where
W = weight matrix
b = biases
a = output vector
x = input vector
and f is the activation function. For classification, the output layer has as many neurons
as there are classes, and the cost function measures the difference between the predicted
and actual probability distributions.
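A small sketch (illustrative, not from the notes) of the layer formula a = f(Wx + b) above,
with a softmax output layer whose size equals the number of classes and a cross-entropy
cost measuring the gap between the predicted and actual class distributions; the shapes
and random values are made up for demonstration.

```python
import numpy as np

def layer(x, W, b, f):
    """Compute a = f(Wx + b) for a single input vector x."""
    return f(W @ x + b)

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(p_pred, y_true):
    """Gap between the predicted distribution and the true class label."""
    return -np.log(p_pred[y_true])

n_in, n_classes = 4, 3               # 3 classes -> 3 output neurons
rng = np.random.default_rng(1)
W, b = rng.normal(size=(n_classes, n_in)), np.zeros(n_classes)

x = rng.normal(size=n_in)
p = layer(x, W, b, softmax)          # predicted class probabilities
print(p, cross_entropy(p, y_true=2))
```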
In the XOR example, the model's output weight vector and output bias are

    w = \begin{bmatrix} 1 \\ -2 \end{bmatrix},    (6.6)

and b = 0.
We can now walk through the way that the model processes a batch of inputs.
Let X be the design matrix containing all four points in the binary input space,
with one example per row:
    X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}.    (6.7)
The first step in the neural network is to multiply the input matrix by the first
layer’s weight matrix:
    XW = X \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
       = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix}.    (6.8)
Next, we add the bias vector c = \begin{bmatrix} 0 \\ -1 \end{bmatrix} (broadcast to each row) to obtain

    \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}.    (6.9)
In this space, all of the examples lie along a line with slope 1. As we move along
this line, the output needs to begin at 0, then rise to 1, then drop back down to 0.
A linear model cannot implement such a function. To finish computing the value
of h for each example, we apply the rectified linear transformation:
    \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}.    (6.10)
This transformation has changed the relationship between the examples. They no
longer lie on a single line. As shown in figure 6.1, they now lie in a space where a
linear model can solve the problem.
We finish by multiplying by the weight vector w:
    \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}.    (6.11)
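This final vector matches the XOR targets for the four inputs. The NumPy sketch below
(added for clarity, not part of the original notes) reproduces the forward pass using the
parameter values from the worked example above.

```python
# Forward pass of the XOR network worked through in equations (6.6)-(6.11).
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # design matrix, eq. (6.7)
W = np.array([[1, 1], [1, 1]])                   # first-layer weights
c = np.array([0, -1])                            # first-layer biases
w = np.array([1, -2])                            # output weights, eq. (6.6)
b = 0                                            # output bias

H = np.maximum(0, X @ W + c)   # eqs. (6.8)-(6.10): affine transform, then ReLU
y = H @ w + b                  # eq. (6.11): final linear output layer
print(y)                       # -> [0 1 1 0], the XOR of each row of X
```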