
Unit - V

Advanced Deep Learning Topics

Dr. S Ruban
Dept of Software Technology
Introduction to Deep Learning

What is Deep Learning?

• A subfield of machine learning that utilizes artificial neural networks with multiple layers (deep architectures) to learn complex representations from data.
• Deep learning algorithms excel at tasks like image recognition, natural language processing, and speech recognition.
Neural Networks: Basic Architecture and Working

• Perceptron: The fundamental building block of neural networks. A simple model that takes inputs, applies weights, and produces an output.
• Multilayer Perceptron (MLP): A feedforward neural network with multiple layers of interconnected neurons, enabling the learning of complex non-linear relationships.

• Working:
1. Input data is fed into the input layer.
2. Each neuron in a layer performs a weighted sum of its inputs and applies an activation function (e.g., sigmoid, ReLU).
3. The output of one layer becomes the input to the next layer.
4. The network learns by adjusting the weights of connections between neurons through backpropagation.
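The working steps above can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch example (the toy data, layer sizes, and training settings are placeholders, not from the slides): each hidden neuron computes a weighted sum, applies an activation, and the weights are adjusted by backpropagation.

```python
# Minimal MLP sketch in PyTorch (illustrative; data and sizes are placeholders).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy binary-classification data: 100 samples, 2 features, XOR-like labels.
X = torch.randn(100, 2)
y = (X[:, 0] * X[:, 1] > 0).float().unsqueeze(1)

# Multilayer perceptron: input -> hidden (ReLU) -> output (sigmoid).
model = nn.Sequential(
    nn.Linear(2, 16),   # each hidden neuron: weighted sum of the inputs
    nn.ReLU(),          # non-linear activation
    nn.Linear(16, 1),
    nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    y_hat = model(X)            # forward pass: layer by layer
    loss = loss_fn(y_hat, y)
    optimizer.zero_grad()
    loss.backward()             # backpropagation: gradients of the loss w.r.t. weights
    optimizer.step()            # adjust the weights

print("final training loss:", loss.item())
```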
Representation Learning

Overview of Representation Learning

Definition: Representation learning involves discovering efficient and compact representations of data for tasks like classification, clustering, and generation.
Why It Matters: Reduces reliance on manual feature engineering. Enables deep models to learn hierarchical features.
Applications: Natural Language Processing (NLP), Computer Vision, Speech Recognition.
Representation Learning

What it does: Extracts meaningful patterns from raw data to create representations that are easier for machines to understand and process.
Why it's important: Improves the performance of learning algorithms, especially when dealing with high-dimensional data.
How it works: Replaces manual feature engineering, allowing machines to learn the features and use them to perform a specific task.
Types: Supervised representation learning and unsupervised representation learning.
Applications: Image classification, retrieval, and transfer learning.

Key Concepts of Representation Learning

• Feature Learning: Automatically extracting relevant features from raw data.
• Representation: A transformed version of the input data that captures meaningful information.
• Unsupervised Learning: Learning representations from unlabeled data.
• Goal: To learn meaningful representations from data that capture essential information and facilitate downstream tasks.
Greedy Layer-wise Unsupervised Pretraining

Definition: An unsupervised approach to train each layer of a neural network incrementally, using autoencoders or Restricted Boltzmann Machines (RBMs) for layer-wise training.
Steps: Train the first layer unsupervised. Freeze the first layer and train the second layer. Repeat for deeper layers. Fine-tune the entire network with supervised learning.
Benefits: Overcomes vanishing gradient issues. Improves generalization.

Greedy layer-wise pretraining is so called because it greedily optimizes one layer at a time. After unsupervised training, there is usually a fine-tuning stage, in which a joint supervised training algorithm is applied to all the layers.

Approach:

1. Train each layer of a deep network in a greedy, unsupervised manner.

2. Start with a shallow network and gradually add layers.

3. Pre-train each layer using an unsupervised learning method (e.g., Restricted Boltzmann Machines (RBMs)).

4. Fine-tune the entire network with supervised learning.

Benefits:

- Improves training stability.

- Helps to avoid local minima.

- Can lead to better generalization performance.
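As a rough illustration of the approach above, here is a sketch that uses small autoencoders for the unsupervised layer-wise stage (one of the two options the slides mention; the data, layer sizes, and training lengths are placeholder assumptions): each layer is trained to reconstruct its input, frozen, and the next layer is trained on its codes, before a final supervised fine-tuning pass over the whole stack.

```python
# Greedy layer-wise pretraining sketch with autoencoders (illustrative).
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 64)                      # unlabeled data (placeholder)
y = torch.randint(0, 2, (256,))               # labels for fine-tuning (placeholder)

sizes = [64, 32, 16]
encoders = []
inputs = X
for i in range(len(sizes) - 1):
    enc = nn.Sequential(nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU())
    dec = nn.Linear(sizes[i + 1], sizes[i])
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(100):                      # unsupervised: reconstruct this layer's input
        recon = dec(enc(inputs))
        loss = nn.functional.mse_loss(recon, inputs)
        opt.zero_grad(); loss.backward(); opt.step()
    encoders.append(enc)
    inputs = enc(inputs).detach()             # freeze: the next layer trains on these codes

# Stack the pretrained encoders, add a classifier head, fine-tune end to end.
model = nn.Sequential(*encoders, nn.Linear(sizes[-1], 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    loss = nn.functional.cross_entropy(model(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
```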


Transfer Learning

• Transfer learning reuses knowledge gained from one task to improve performance on a related task. For example, a model that can recognize dogs can be adapted to recognize cats by learning the differences between the two.

How it works
• In transfer learning, a pre-trained model is fine-tuned for a new, related task. The model's layers may be frozen or modified depending on the target dataset. For example, if the target dataset is small and distinct, most of the top layers may be removed and new layers added.
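A minimal fine-tuning sketch follows, assuming PyTorch/torchvision with an ImageNet-pretrained ResNet-18 (the backbone choice, the 10-class head, and the optimizer settings are illustrative assumptions): the pretrained layers are frozen and only a new classification head is trained on the target task.

```python
# Transfer learning sketch: reuse an ImageNet-pretrained backbone (illustrative).
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet (torchvision >= 0.13 weights API assumed).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained layers so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the new, related task (e.g. 10 classes).
model.fc = nn.Linear(model.fc.in_features, 10)

# Optimize only the new head's parameters during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

If the target dataset is larger or more similar to the source data, some of the frozen layers can instead be unfrozen and trained with a small learning rate.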
Domain Adaptation

• Domain adaptation is a machine learning technique that adjusts a model trained on one domain to work on a different domain. It is used when there is a lack of labelled data in the target domain but ample data in the source domain.
• Adapting a model trained on a source domain to perform well on a target domain with different data distributions.
• Example: Adapting a model trained on English text to perform well on Spanish text.
Domain Adaptation Techniques

• Fine-tuning: Adjusting the weights of a pre-trained model on the target task.
• Domain adversarial training: Training a model to be invariant to domain shifts.
• Feature extraction: Using features extracted from a pre-trained model as input to a new model.
Transfer Learning and Domain Adaptation

Transfer Learning: Leverages knowledge from a pre-trained model for a related task. Example: Using ImageNet pre-trained models for custom image datasets.
Domain Adaptation: Adapts a model trained on one domain to perform well on a different but related domain. Example: Adapting a spam detection model trained on English emails to French emails.
Benefits: Transfer learning saves time and computing power and helps maintain model accuracy. It is faster and easier to update and retrain a network using transfer learning than to train a network from scratch.
Semi-Supervised Learning

• Semi-supervised learning is a type of machine learning that falls in between supervised and unsupervised learning. It is a method that uses a small amount of labeled data and a large amount of unlabeled data to train a model.
• The goal of semi-supervised learning is to learn a function that can accurately predict the output variable based on the input variables, similar to supervised learning. However, unlike supervised learning, the algorithm is trained on a dataset that contains both labeled and unlabeled data.
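As an illustration, here is a small self-training sketch with scikit-learn (the synthetic dataset, the 5% labeled fraction, and the confidence threshold are assumptions): a base classifier is fitted on the few labeled points and then retrained on its own confident pseudo-labels for the unlabeled points.

```python
# Semi-supervised sketch: self-training with mostly unlabeled data (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Pretend only ~5% of the labels are known; the rest are marked unlabeled (-1).
rng = np.random.default_rng(0)
y_partial = y.copy()
unlabeled = rng.random(len(y)) > 0.05
y_partial[unlabeled] = -1

# The base classifier is iteratively retrained on its own confident pseudo-labels.
clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
clf.fit(X, y_partial)
print("accuracy against all true labels:", clf.score(X, y))
```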
Semi-Supervised Disentangling of Causal Factors

Definition:
• Learning representations that disentangle independent factors of variation in data.
• Example: Separating lighting and pose factors in face images.

Techniques:
• Variational Autoencoders (VAEs).
• Adversarial training.

Applications:
• Fairness in AI, Generative modeling.

Distributed Representation

Definition:
• Represents data (e.g., words, images) using dense vectors instead of one-hot encoding.
• Example: Word embeddings like Word2Vec, GloVe.

Benefits:
• Captures semantic relationships.
• Reduces memory usage.

Applications:
• Text summarization, Sentiment analysis, Machine translation.
Distributed Representation

Distributed representations are a fundamental concept in the field of machine learning and natural language processing (NLP). They refer to a way of representing data, typically words or phrases, as continuous vectors in a high-dimensional space.

In distributed representations, also known as embeddings, the idea is that the "meaning" or "semantic content" of a data point is distributed across multiple dimensions. For example, in NLP, words with similar meanings are mapped to points in the vector space that are close to each other.
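A small sketch of distributed word representations, assuming gensim's Word2Vec (the toy corpus, vector size, and training settings are placeholders): each word is mapped to a dense vector, and semantically related words end up close to each other in the vector space.

```python
# Word-embedding sketch with gensim's Word2Vec (illustrative toy corpus).
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "cat"],
    ["the", "cat", "chases", "the", "mouse"],
]

# Each word becomes a dense 50-dimensional vector; "meaning" is spread across dimensions.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=200, seed=0)

print(model.wv["king"].shape)                  # (50,) dense vector, not one-hot
print(model.wv.similarity("king", "queen"))    # cosine similarity between embeddings
print(model.wv.most_similar("cat", topn=2))    # nearest neighbours in the vector space
```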
Applications of Distributed Representations

• Word Similarity: Measuring the semantic similarity between words.
• Text Classification: Categorizing documents into predefined classes.
• Machine Translation: Translating text from one language to another.
• Information Retrieval: Finding relevant documents in response to a query.
• Sentiment Analysis: Determining the sentiment expressed in a piece of text.
Structured Probabilistic Models for Deep Learning

What is a structured probabilistic model?

• A structured probabilistic model is a way of describing a probability distribution with graphs.
• The graph describes which random variables in the probability distribution interact with each other directly.
• Because a graph defines the structure of the model, such models are also referred to as graphical models.
Using Graphs to Describe Model Structure

Definition:
• Graphical models represent dependencies among variables using nodes (variables) and edges (dependencies).

Types:
• Bayesian Networks (Directed Acyclic Graphs).
• Markov Random Fields (Undirected Graphs).

Applications:
• Probabilistic inference, Speech recognition, Gene network analysis.
Sampling from Graphical Models

Why Sampling?
• Allows inference when exact computations are intractable.

Techniques:
• Importance Sampling.
• Rejection Sampling.
• Gibbs Sampling (explained in later slides).

Challenges:
• Computational complexity, convergence issues.
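For illustration, here is a sketch of ancestral sampling from a tiny directed graphical model (the three-variable Cloudy/Rain/WetGrass network and its probabilities are made-up assumptions): samples drawn from the graph are used to approximate a conditional probability that would otherwise require exact inference.

```python
# Ancestral sampling from a toy directed graphical model (probabilities are made up).
import numpy as np

rng = np.random.default_rng(0)

def sample_once():
    cloudy = rng.random() < 0.5                       # P(Cloudy)
    rain = rng.random() < (0.8 if cloudy else 0.1)    # P(Rain | Cloudy)
    wet = rng.random() < (0.9 if rain else 0.2)       # P(WetGrass | Rain)
    return cloudy, rain, wet

samples = [sample_once() for _ in range(10_000)]

# Monte Carlo estimate of P(Rain | WetGrass) from the drawn samples.
wet_samples = [s for s in samples if s[2]]
p_rain_given_wet = sum(s[1] for s in wet_samples) / len(wet_samples)
print("estimated P(Rain | WetGrass) =", round(p_rain_given_wet, 3))
```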
Monte Carlo Methods

Markov Chain Monte Carlo Methods (MCMC)

Definition: A class of algorithms for sampling from probability distributions based on constructing a Markov chain.

Steps:
• Define the transition probabilities.
• Run the chain to generate samples.
• Use samples from the chain to estimate expectations.

Applications: Bayesian inference, Computational physics, Deep learning.
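A minimal sketch of these steps, using a random-walk Metropolis-Hastings sampler in NumPy (the target density and proposal scale are illustrative assumptions): the transition rule is defined, the chain is run, and an expectation is estimated from the collected samples.

```python
# Metropolis-Hastings sketch: sample from an unnormalized density (illustrative).
import numpy as np

rng = np.random.default_rng(0)

def unnormalized_p(x):
    # Target: mixture of two Gaussians, known only up to a normalizing constant.
    return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

x = 0.0
samples = []
for _ in range(50_000):
    proposal = x + rng.normal(scale=1.0)          # transition: random-walk proposal
    accept_prob = min(1.0, unnormalized_p(proposal) / unnormalized_p(x))
    if rng.random() < accept_prob:                # accept/reject keeps the chain's
        x = proposal                              # stationary distribution equal to p
    samples.append(x)

samples = np.array(samples[5_000:])               # discard burn-in
print("estimated E[x] =", samples.mean())         # expectation estimated from samples
```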


Markov Chain Monte Carlo (MCMC)

• Markov chain Monte Carlo (MCMC) methods are a class of algorithms that are used to draw samples from a probability distribution. They are a popular choice for Bayesian analysis and are used to fit models and sample from the joint posterior distribution of model parameters.
MCMC Methods

• Markov Chain Monte Carlo (MCMC): A class of algorithms for sampling from complex probability distributions.
• Gibbs Sampling: An MCMC algorithm that iteratively samples each variable conditioned on the current values of all other variables.
Gibbs Sampling

• Definition: A specific MCMC algorithm where each variable is sampled conditionally on the others.
• Steps:
1. Initialize all variables.
2. Iteratively sample one variable at a time.
3. Repeat until convergence.
• Advantages: Simpler to implement for conditional distributions.
• Limitations: Slow convergence for high-dimensional data.
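A minimal Gibbs sampling sketch in NumPy, assuming a bivariate Gaussian with correlation 0.8 as the target distribution (an illustrative choice whose conditionals are known in closed form): each variable is repeatedly resampled from its conditional given the current value of the other.

```python
# Gibbs sampling sketch for a bivariate Gaussian with correlation rho (illustrative).
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
x, y = 0.0, 0.0          # 1. initialize all variables
samples = []

for _ in range(20_000):
    # 2. sample each variable conditioned on the current value of the other
    x = rng.normal(loc=rho * y, scale=np.sqrt(1 - rho ** 2))   # p(x | y)
    y = rng.normal(loc=rho * x, scale=np.sqrt(1 - rho ** 2))   # p(y | x)
    samples.append((x, y))

samples = np.array(samples[2_000:])   # 3. discard burn-in, keep later draws
print("estimated correlation:", np.corrcoef(samples.T)[0, 1])  # close to 0.8
```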
Deep Generative Models

• Deep generative models are a type of generative model in deep learning that can learn the joint probability of multiple variables and calculate conditional posterior probabilities.
• They can model the distribution of each class and help with data scarcity and imbalance issues.
Examples of Deep Generative Models

Generative Adversarial Networks (GANs)
• A deep learning architecture that trains two neural networks to compete against each other to generate new data from a training dataset. GANs are good for creating images, but not as sophisticated as diffusion models.

Variational Autoencoders (VAEs)
• A generative model based on neural network autoencoders, which are made up of two neural networks: encoders and decoders. VAEs are considered to be an efficient and practical method for developing generative models.

Autoregressive Models
• Commonly used for text generation, language modeling, and forecasting. They are best for modeling sequential data, such as text, audio, and time series prediction.
Overview of Deep Generative Models
Boltzmann Machines

Definition: A stochastic neural network that models the distribution of input data.
Variants: Restricted Boltzmann Machines (RBMs), Deep Boltzmann Machines (DBMs).
Applications: Collaborative filtering, Feature learning.
Topics in Deep Generative Models

• Boltzmann Machines
• Deep Belief Networks
• Directed Generative Nets
• Generative Stochastic Networks
Boltzmann Machine

• A Boltzmann machine is a neural network that uses stochastic decisions to learn internal representations of its input. It is a type of unsupervised learning model used in cognitive science and statistical physics.

How it works
1. A Boltzmann machine is a network of symmetrically connected neurons that make stochastic decisions about whether to be on or off.
2. The model is responsible for finding the relationships between the input features; it does not produce any output.
Types of Boltzmann Machines

• Restricted Boltzmann Machines (RBMs): These machines implement restrictions on the connections between neurons to overcome the slow learning of standard Boltzmann machines.
• Deep Boltzmann Machines (DBMs): These machines have undirected connections between layers, allowing them to extract more complex features and perform more complex tasks.
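As an illustration of how an RBM can be trained, here is a NumPy sketch using one-step contrastive divergence (CD-1); the toy binary patterns, layer sizes, and learning rate are assumptions, and CD-1 is only one common training recipe, not the method described in the slides.

```python
# Restricted Boltzmann Machine sketch trained with 1-step contrastive divergence.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)          # visible biases
b_h = np.zeros(n_hidden)           # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data: two repeated patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 50, dtype=float)

for _ in range(500):
    v0 = data
    # Positive phase: stochastic hidden activations given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct visibles, then recompute hidden probabilities.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # CD-1 update: difference between data-driven and model-driven correlations.
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(data)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Inspect the learned representation and reconstruction of the first pattern.
p_h = sigmoid(data[:1] @ W + b_h)
print("hidden probabilities:", p_h.round(2))
print("reconstruction:", sigmoid(p_h @ W.T + b_v).round(2))
```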
Deep Belief Networks (DBNs)

Definition: Composed of stacked RBMs, trained layer by layer.
Advantages: Efficient pretraining.
Applications: Handwriting recognition, Speech synthesis.
What is a Deep Belief Network?

• Deep Belief Networks (DBNs) are used to address issues that arise when training classic neural networks in deep, layered architectures.
• For example: slow learning, becoming stuck in local minima owing to poor parameter selection, and requiring very large training datasets.
Directed Generative Nets (Conditional Generative Adversarial Networks (cGANs))

Definition: Generative models that directly define the likelihood function. Example: PixelRNN, PixelCNN.
Advantages: High-quality samples. Direct modeling of dependencies.
Directed Generative Nets (Conditional Generative Adversarial Networks (cGANs))

• Comprise two neural networks:
Generator: Generates synthetic data samples.
Discriminator: Distinguishes between real data and generated data.
• Training: The generator and discriminator are trained in a competitive game.
• Applications: Generating realistic images, videos, and other forms of data.
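A minimal sketch of this competitive training game, written as a plain (unconditional) GAN in PyTorch for brevity; a conditional GAN would additionally feed a class label to both networks. The 1-D toy data, network sizes, and training settings are illustrative assumptions.

```python
# Minimal GAN training loop in PyTorch on 1-D toy data (illustrative).
import torch
import torch.nn as nn

torch.manual_seed(0)
real_data = lambda n: torch.randn(n, 1) * 0.5 + 3.0   # "real" samples: N(3, 0.5)

G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))                 # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Discriminator step: distinguish real samples from generated ones.
    x_real = real_data(64)
    x_fake = G(torch.randn(64, 4)).detach()
    loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to fool the discriminator into labeling fakes as real.
    x_fake = G(torch.randn(64, 4))
    loss_g = bce(D(x_fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print("generated sample mean:", G(torch.randn(1000, 4)).mean().item())  # close to 3
```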
References

• https://towardsdatascience.com/monte-carlo-markov-chain-mcmc-explained-94e3a6c8de11
• https://en.wikipedia.org/wiki/Domain_adaptation
• https://www.analyticsvidhya.com/blog/2022/03/an-overview-of-deep-belief-network-dbn-in-deep-learning/
• https://towardsdatascience.com/gibbs-sampling-8e4844560ae5
• https://www.sciencedirect.com/topics/computer-science/boltzmann-machine
• https://cedar.buffalo.edu/~srihari/CSE676/20.10.1-DirectedGenNets.pdf
Thank You
