Unit 4 (Adl)
Generative Adversarial Networks (GANs)
1. Generator Network:
○ Purpose: To create synthetic data that is as realistic as
possible.
○ Input: Random noise (often from a normal distribution).
○ Output: Synthetic data (e.g., images, text).
2. Discriminator Network:
○ Purpose: To distinguish between real data (from the
training set) and fake data (produced by the generator).
○ Input: Data (either real or generated).
○ Output: Probability (a score between 0 and 1) indicating
whether the input data is real or fake.
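As an illustration of these two components, the sketch below defines both networks as small multilayer perceptrons in PyTorch. The layer sizes, the 100-dimensional noise vector, and the 784-dimensional output (a flattened 28x28 image) are illustrative assumptions, not part of the notes.
```python
import torch.nn as nn

class Generator(nn.Module):
    """Maps random noise z to a synthetic sample (here a flattened 28x28 image)."""
    def __init__(self, noise_dim=100, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Tanh(),   # outputs scaled to [-1, 1]
        )
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps a (real or fake) sample to a probability of being real."""
    def __init__(self, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),       # score between 0 and 1
        )
    def forward(self, x):
        return self.net(x)
```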
Training Process
1. Discriminator Training:
○ The discriminator is trained on real data (labeled as real)
and on fake data generated by the generator (labeled as
fake).
○ The discriminator aims to maximize the probability of
correctly classifying real and fake data.
2. Generator Training:
○ The generator generates a batch of fake data and passes it
to the discriminator.
○ The generator is then trained to minimize the
discriminator’s ability to correctly classify this fake data as
fake.
○ Essentially, the generator aims to maximize the probability
that the discriminator classifies the fake data as real.
Loss Functions
● Discriminator Loss:
\mathcal{L}_D = -\left( \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \right)
where D(x) is the discriminator’s estimate of the probability that real data x is real, and G(z) is the generator’s output from noise z.
● Generator Loss:
\mathcal{L}_G = -\mathbb{E}_{z \sim p_z}[\log D(G(z))]
where D(G(z)) is the discriminator’s estimate of the probability that the generated data G(z) is real.
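A single training step that follows both objectives might look like the PyTorch sketch below, assuming Generator and Discriminator modules like the ones sketched earlier; the batch size, noise dimension, and optimizers are illustrative choices. Binary cross-entropy with target 1 for real data and 0 for fake data reproduces the discriminator loss, and using target 1 on fake data in the generator step gives the non-saturating form of the generator loss.
```python
import torch
import torch.nn as nn

def gan_training_step(G, D, real_batch, opt_G, opt_D, noise_dim=100):
    bce = nn.BCELoss()
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    z = torch.randn(batch_size, noise_dim)
    fake_batch = G(z).detach()            # freeze G while updating D
    loss_D = bce(D(real_batch), real_labels) + bce(D(fake_batch), fake_labels)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: maximize log D(G(z)) (non-saturating loss)
    z = torch.randn(batch_size, noise_dim)
    loss_G = bce(D(G(z)), real_labels)    # push D to classify fakes as real
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```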
Applications of GANs
Types of Autoencoders
1. Vanilla Autoencoders
3. Denoising Autoencoders
4. Convolutional Autoencoders
5. Sparse Autoencoders
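For context, a vanilla autoencoder is simply an encoder-decoder pair trained to reconstruct its own input; the minimal PyTorch sketch below assumes flattened 28x28 inputs and a 32-dimensional bottleneck, both illustrative choices.
```python
import torch.nn as nn

class VanillaAutoencoder(nn.Module):
    """Compresses the input to a low-dimensional code and reconstructs it."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim), nn.Sigmoid())
    def forward(self, x):
        return self.decoder(self.encoder(x))

# Trained by minimizing reconstruction error, e.g. nn.MSELoss()(model(x), x).
```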
Applications of NLP:
Challenges in NLP:
1. Action Recognition
● Dataset: Use datasets such as UCF-101, Kinetics, or HMDB-51, which contain labeled video clips of different human actions.
● Model: Implement a 3D Convolutional Neural Network (3D CNN) or a Two-Stream Network (which combines RGB frames and optical flow) for action recognition; a minimal 3D CNN sketch follows this list.
● Training: Train the model using the labeled dataset, employing
techniques such as data augmentation (e.g., random cropping,
horizontal flipping) to improve generalization.
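The 3D CNN mentioned in the Model bullet can be sketched as follows in PyTorch; the clip length, 112x112 resolution, and 101 action classes (matching UCF-101) are illustrative assumptions.
```python
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    """3D CNN over short RGB clips: (batch, 3, frames, height, width) -> action logits."""
    def __init__(self, num_classes=101):      # e.g. 101 classes for UCF-101
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                   # halves both time and space
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),           # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clips):
        x = self.features(clips).flatten(1)
        return self.classifier(x)

# Example input: a batch of 4 clips, 16 frames each, at 112x112 resolution.
logits = Simple3DCNN()(torch.randn(4, 3, 16, 112, 112))
```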
Challenges:
● Temporal Dynamics: Capturing the temporal aspect of actions is challenging, requiring models to process sequences of frames effectively.
● Computational Complexity: Training models on video data is
computationally expensive and requires significant resources.
Performance Metrics:
2. Shape Recognition
● Dataset: Use synthetic datasets or real-world datasets containing labeled images of various shapes (e.g., circles, squares, triangles).
● Model: Implement a Convolutional Neural Network (CNN) architecture such as VGGNet or ResNet to classify shapes; a small CNN sketch follows this list.
● Training: Use standard backpropagation and optimization
techniques to train the CNN on labeled shape images.
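A small CNN classifier of the kind described above might look like the following PyTorch sketch; the 64x64 grayscale inputs and three shape classes are assumptions for illustration, and a deeper backbone such as ResNet could be substituted.
```python
import torch
import torch.nn as nn

class ShapeCNN(nn.Module):
    """Small CNN that classifies grayscale shape images (circle, square, triangle)."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 inputs

    def forward(self, images):
        return self.classifier(self.features(images).flatten(1))

# Training uses standard backpropagation with cross-entropy:
# loss = nn.CrossEntropyLoss()(ShapeCNN()(batch), labels)
```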
Challenges:
4. Emotion Recognition
● Dataset: Use datasets such as FER-2013, CK+, or AffectNet, which contain labeled facial expressions.
● Model: Implement a CNN or a hybrid model combining CNNs with
RNNs (e.g., LSTMs) to capture temporal dynamics in video
sequences.
● Training: Employ data augmentation techniques and pre-trained models to improve emotion recognition performance; a fine-tuning sketch follows this list.
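One way to apply the pre-trained-model and augmentation advice above is sketched below: fine-tuning a torchvision ResNet-18 for 7 emotion classes (the class count matches FER-2013; the transforms and other choices are illustrative assumptions).
```python
import torch.nn as nn
from torchvision import models, transforms

# Data augmentation for face crops (illustrative choices).
train_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # FER-2013 faces are grayscale
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Start from an ImageNet-pretrained backbone and replace the final layer
# with a 7-way emotion classifier.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 7)

# Fine-tune with cross-entropy; an LSTM head over per-frame features could
# be added to capture temporal dynamics in video sequences.
criterion = nn.CrossEntropyLoss()
```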
Challenges:
● Subtle Expressions: Distinguishing subtle differences in facial expressions can be difficult.
● Variability in Expressions: Lighting, occlusions (e.g., glasses, hands), and individual differences in how people express emotions all introduce variability.
Performance Metrics: