AAI QB1 Answers

The document provides a comprehensive overview of various advanced AI concepts, including Generative Models, GANs, and Autoencoders. It explains the differences between Generative and Discriminative Models, details the architecture and training process of Vanilla GANs, and discusses challenges in GAN training. Additionally, it covers different types of Autoencoders, Hidden Markov Models, Bayesian Networks, Transfer Learning, and Gaussian Mixture Models, along with their applications and key properties.

AAI Question Bank 1 — Model Answers

Advanced AI (AAI) | 10 Marks Each

Q1. Explain Generative Models and differentiate between Generative and Discriminative Models.
Generative Models:

Generative models learn the joint probability distribution P(X, Y) of the input data X and the labels Y.
They model how the data is generated — essentially learning the underlying distribution of the data.
Given this joint distribution, they can generate new data samples that resemble the training data.
Examples include Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), Naive Bayes,
Variational Autoencoders (VAE), and Generative Adversarial Networks (GAN).

Discriminative Models:

Discriminative models learn the conditional probability P(Y|X) — the probability of a label given the
input. They focus on the decision boundary between classes rather than modeling the data
distribution. Examples include Logistic Regression, Support Vector Machines (SVM), Neural
Networks, and Conditional Random Fields (CRF).

Key Differences:

Aspect | Generative Models | Discriminative Models
What it learns | Joint distribution P(X, Y) | Conditional distribution P(Y|X)
Goal | Model the data distribution | Learn the decision boundary
Can generate data? | Yes | No
Training data needed | Usually more | Works with less
Examples | GAN, VAE, HMM, GMM | SVM, Logistic Regression, CNN
Use case | Image generation, data augmentation | Classification, regression
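As a concrete illustration, the sketch below (toy 1-D data, numpy only; all numbers are illustrative) fits a generative classifier by modeling class priors and class-conditional Gaussians. Because it captures P(X, Y), it can both classify by Bayes' rule and sample new data, which a purely discriminative model cannot.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(-2.0, 1.0, 200)   # class 0 samples
x1 = rng.normal(+2.0, 1.0, 200)   # class 1 samples

# Generative: model P(X, Y) via class priors and class-conditional Gaussians.
mu0, mu1 = x0.mean(), x1.mean()
var0, var1 = x0.var(), x1.var()
prior0 = prior1 = 0.5

def gaussian(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def classify_generative(x):
    # Bayes' rule: pick argmax_y P(x | y) P(y).
    return int(gaussian(x, mu1, var1) * prior1 > gaussian(x, mu0, var0) * prior0)

def sample_generative(n):
    # Because the model captures P(X, Y), it can generate new data.
    ys = rng.random(n) < prior1
    return np.where(ys, rng.normal(mu1, np.sqrt(var1), n),
                        rng.normal(mu0, np.sqrt(var0), n))

print(classify_generative(1.8), sample_generative(5))
```

A discriminative model would instead learn only the boundary (here, roughly x = 0) and would have no analogue of `sample_generative`.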

Q2. Explain the architecture of a Vanilla GAN, Minimax Loss Function, and Training Process.
Architecture:

A Vanilla GAN (Generative Adversarial Network), introduced by Ian Goodfellow et al. (2014), consists
of two neural networks trained simultaneously in adversarial fashion:

• Generator (G): Takes a random noise vector z (sampled from a latent space, typically Gaussian
or uniform) as input and generates fake data samples G(z). Its goal is to fool the discriminator.
• Discriminator (D): Takes either real data (from training set) or fake data (from G) and outputs a
probability D(x) indicating whether the input is real (1) or fake (0). It is a binary classifier.
The two networks compete: G tries to fool D into accepting fakes as real, while D tries to correctly classify real vs. fake. Over time, G learns to produce increasingly realistic data.

Minimax Loss Function:

The GAN objective is a minimax game expressed as:


min_G max_D V(D, G) = E_{x~P_data}[log D(x)] + E_{z~P_z}[log(1 - D(G(z)))]

Where: x ~ P_data (real data), z ~ P_z (noise). D maximizes: correctly classifying real (log D(x)) and
fake (log(1-D(G(z)))). G minimizes: log(1-D(G(z))), i.e., tries to make D output 1 for fake data. In
practice, G is trained to maximize log D(G(z)) instead (non-saturating loss) for better gradients.
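The value function can be evaluated directly for a batch of discriminator outputs. The probabilities below are hypothetical placeholders, just to show how the terms combine:

```python
import numpy as np

# Hypothetical discriminator outputs for one minibatch (all values illustrative).
d_real = np.array([0.9, 0.8, 0.95])   # D(x) on real samples
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)) on generated samples

# Batch estimate of the minimax objective V(D, G).
v = np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

# Generator losses: the saturating form from the objective,
# and the non-saturating form commonly used in practice.
g_saturating = np.mean(np.log(1 - d_fake))      # G minimizes this
g_non_saturating = -np.mean(np.log(d_fake))     # minimized instead in practice
print(v, g_saturating, g_non_saturating)
```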

Training Process:

1. Sample a minibatch of real data x from the training set.
2. Sample a minibatch of noise vectors z from the latent space.
3. Generate fake samples G(z) using the generator.
4. Update Discriminator: maximize log D(x) + log(1-D(G(z))) — keep G fixed.
5. Sample fresh noise z and update Generator: minimize log(1-D(G(z))) or maximize log D(G(z))
— keep D fixed.
6. Repeat for many iterations until G produces realistic samples and D cannot distinguish real from
fake (ideally D(x)=0.5 for all inputs).

Q3. Discuss Challenges in Training GANs.


1. Mode Collapse: The generator learns to produce only a limited variety of outputs (sometimes a
single mode) that fool the discriminator, ignoring the full diversity of the real data distribution. For
example, a GAN trained on digits might only generate '3's.

2. Training Instability: GAN training is notoriously unstable. The minimax game can oscillate rather
than converge, as G and D interact dynamically. Small changes in one network destabilize the other.

3. Vanishing Gradients: If D becomes too strong early in training, D(G(z)) approaches 0, causing
log(1-D(G(z))) to saturate, resulting in near-zero gradients for G. G cannot learn effectively.

4. Non-convergence: Standard GAN training lacks a guaranteed convergence criterion. The Nash
equilibrium is theoretically the solution but is difficult to reach in practice.
5. Hyperparameter Sensitivity: GANs are highly sensitive to learning rates, batch sizes, network
architectures, and the balance between G and D training steps.

6. Evaluation Difficulty: Measuring GAN quality is hard. Common metrics include Frechet Inception
Distance (FID) and Inception Score (IS), but no single metric captures all aspects of quality.

7. Failure to Learn Rare Modes: Rare but valid data distributions may never be captured by the
generator.

Mitigation Strategies: Techniques like Wasserstein GAN (WGAN), spectral normalization, mini-batch discrimination, label smoothing, feature matching, and progressive growing help address these challenges.
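The vanishing-gradient problem (challenge 3) can be checked numerically. For D = sigmoid(a), the gradient of the saturating generator loss log(1 - D) with respect to the pre-sigmoid score a is -D, while the non-saturating loss log D gives 1 - D; a minimal sketch:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Pre-sigmoid discriminator score on a fake sample; a strongly negative score
# means D is confident the sample is fake, i.e. D(G(z)) is near 0.
a = -8.0
d = sigmoid(a)

grad_saturating = -d            # d/da log(1 - sigmoid(a)): vanishes as d -> 0
grad_non_saturating = 1.0 - d   # d/da log(sigmoid(a)): stays near 1 here
print(grad_saturating, grad_non_saturating)
```

When the discriminator is too strong, the saturating gradient is essentially zero and G stops learning, which is exactly why the non-saturating loss is preferred.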

Q4. Explain DCGAN (Deep Convolutional GAN).


DCGAN, proposed by Radford et al. (2015), extends the vanilla GAN by incorporating convolutional neural network architectures into both the generator and discriminator, making training more stable and producing much higher-quality results.

Key Architectural Guidelines of DCGAN:


• Generator: Uses transposed convolutions (fractionally strided convolutions) to upsample the
noise vector into a full image. Batch Normalization is applied after each layer (except output).
ReLU activation is used in all layers except the output, which uses Tanh.
• Discriminator: Uses strided convolutions (instead of pooling layers) to downsample the image.
Batch Normalization is applied after each layer (except input). LeakyReLU activation is used
throughout. Output uses Sigmoid.
• No Fully Connected Hidden Layers: Dense hidden layers are removed; the generator's input and the discriminator's output connect directly to the convolutional features.
• Latent Space: The generator input z is a random vector (typically 100-dimensional), which is
reshaped and upsampled through multiple transposed conv layers.
Advantages of DCGAN over Vanilla GAN:

Convolutional layers capture spatial hierarchies in images. Strided convolutions learn to downsample/upsample. Batch normalization stabilizes training. Resulting images are much sharper and more realistic. The latent space is smooth and interpolatable: walking through latent space gives semantically meaningful transitions.

Applications: Image generation, image-to-image translation, data augmentation, style transfer, and
learning image representations for downstream tasks.
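The upsampling path of a DCGAN-style generator can be checked with the standard transposed-convolution size formula H_out = (H_in - 1)·stride - 2·padding + kernel. The values below (kernel 4, stride 2, padding 1, four layers from a 4×4 map) are a common implementation choice, assumed here for illustration rather than taken from the paper:

```python
def tconv_out(h_in, kernel=4, stride=2, padding=1):
    # Output size of a transposed (fractionally strided) convolution.
    return (h_in - 1) * stride - 2 * padding + kernel

h = 4  # z is first projected and reshaped to a 4x4 feature map
sizes = [h]
for _ in range(4):
    h = tconv_out(h)
    sizes.append(h)
print(sizes)  # 4 -> 8 -> 16 -> 32 -> 64
```

With these settings each layer exactly doubles the spatial size, which is why four layers suffice to reach a 64×64 image.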

Q5. Differentiate between GAN and Variational Autoencoder (VAE).


Both GANs and VAEs are generative models that learn to generate new data, but they differ
fundamentally in approach:

• Framework: GAN — Adversarial (minimax game); VAE — Variational inference (ELBO)
• Components: GAN — Generator + Discriminator; VAE — Encoder + Decoder
• Training Loss: GAN — Adversarial loss; VAE — Reconstruction loss + KL divergence
• Latent Space: GAN — No explicit structure enforced; VAE — Continuous, structured (Gaussian
prior)
• Image Quality: GAN — Sharper, more realistic images; VAE — Slightly blurry images
• Training Stability: GAN — Unstable, prone to mode collapse; VAE — More stable training
• Diversity: GAN — May suffer mode collapse; VAE — Better diversity due to KL term
• Inference: GAN — No encoder; cannot encode real images; VAE — Has encoder; can encode
and reconstruct
• Interpretability: GAN — Harder to interpret latent space; VAE — Smoother, interpretable latent
space
• Applications: GAN — Image synthesis, super-resolution; VAE — Anomaly detection,
interpolation
Summary: GANs produce sharper images and are better at pure generation tasks. VAEs offer a
structured latent space, better for interpolation and tasks requiring encoding of existing data.

Q6. Explain Different Types of Autoencoders.


An Autoencoder is an unsupervised neural network that learns to encode input data into a
compressed latent representation and then decode it back. It consists of an Encoder (compresses
input to latent code) and a Decoder (reconstructs input from latent code). The model is trained to
minimize reconstruction error.

1. Vanilla (Basic) Autoencoder: Standard encoder-decoder with fully connected layers. Learns a
compressed representation but can overfit and simply memorize input. Used for dimensionality
reduction.
2. Sparse Autoencoder: Adds a sparsity constraint to the latent layer (most neurons should be
inactive). Encourages the model to learn meaningful features. Uses L1 regularization or
KL-divergence penalty on activations.
3. Denoising Autoencoder (DAE): Trains the model to reconstruct the original input from a
corrupted version (with added noise or dropout). Forces the encoder to learn robust, essential
features. Widely used in image denoising and pre-training.
4. Variational Autoencoder (VAE): The encoder outputs parameters (mean and variance) of a
probability distribution rather than a fixed vector. The latent space is continuous and follows a
Gaussian prior. Trained with reconstruction loss + KL divergence. Enables generation of new data
by sampling from the latent distribution.
5. Convolutional Autoencoder: Uses convolutional layers in encoder and transposed
convolutions in decoder. Well-suited for image data. Captures spatial features effectively.
6. Contractive Autoencoder: Adds a penalty term (Frobenius norm of the Jacobian of encoder
activations) to make the learned representation robust to small input perturbations. Useful for
learning smooth manifolds.
7. Undercomplete Autoencoder: The latent dimension is smaller than input. Forced to learn the
most salient features for reconstruction. Standard approach for feature extraction.
Applications of Autoencoders: Dimensionality reduction, anomaly detection, image denoising,
generative modeling, data compression, and feature learning for downstream tasks.
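A minimal undercomplete autoencoder can be sketched with two linear layers and plain gradient descent; all dimensions, the learning rate, and the data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # 200 samples, 8 features
W1 = rng.normal(scale=0.1, size=(8, 3))  # encoder: 8 -> 3 (bottleneck)
W2 = rng.normal(scale=0.1, size=(3, 8))  # decoder: 3 -> 8

def loss(X, W1, W2):
    return np.mean((X @ W1 @ W2 - X) ** 2)

initial = loss(X, W1, W2)
lr = 0.05
for _ in range(1000):
    H = X @ W1                   # latent codes
    R = H @ W2                   # reconstructions
    G = 2 * (R - X) / X.size     # dLoss/dR
    gW2 = H.T @ G                # gradients via the chain rule
    gW1 = X.T @ (G @ W2.T)
    W1 -= lr * gW1
    W2 -= lr * gW2

final = loss(X, W1, W2)
print(initial, final)  # reconstruction error decreases
```

Because the bottleneck (3) is smaller than the input (8), the network is forced to keep only the most salient directions, which is the undercomplete idea in miniature.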

Q7. Explain Hidden Markov Model (HMM) with Example.


A Hidden Markov Model (HMM) is a statistical model that represents systems with unobserved
(hidden) states. It assumes the system transitions through hidden states over time, and each state
emits observable outputs according to a probability distribution.

Key Components of HMM:

• States (S): A finite set of hidden states {s1, s2, ..., sN}.
• Observations (O): A sequence of observable symbols emitted by the states.
• Transition Probability Matrix (A): A[i][j] = P(state j at t+1 | state i at t). Probability of moving
from one state to another.
• Emission Probability Matrix (B): B[i][k] = P(observation k | state i). Probability of emitting
observation k from state i.
• Initial State Distribution (π): π[i] = P(starting in state i).
An HMM is defined by the tuple λ = (A, B, π).

Key Assumptions:

• Markov Property: The current state depends only on the previous state (first-order Markov).
• Output Independence: The current observation depends only on the current state.
Example — Weather Model:

Suppose we cannot observe the weather directly but can observe whether a person carries an
umbrella.

• Hidden States: {Sunny, Rainy}


• Observations: {Umbrella, No Umbrella}
• Transition Matrix A: P(Sunny|Sunny)=0.7, P(Rainy|Sunny)=0.3, P(Sunny|Rainy)=0.4,
P(Rainy|Rainy)=0.6
• Emission Matrix B: P(Umbrella|Sunny)=0.1, P(Umbrella|Rainy)=0.8
• Initial: P(Sunny)=0.6, P(Rainy)=0.4
Given observed sequence [Umbrella, Umbrella, No Umbrella], we can use the Viterbi algorithm to find
the most likely hidden state sequence (e.g., [Rainy, Rainy, Sunny]).
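The Viterbi decoding above can be reproduced with a short script using exactly the numbers from this example:

```python
import numpy as np

states = ["Sunny", "Rainy"]
obs_index = {"Umbrella": 0, "No Umbrella": 1}
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],    # transitions from Sunny
              [0.4, 0.6]])   # transitions from Rainy
B = np.array([[0.1, 0.9],    # emissions from Sunny
              [0.8, 0.2]])   # emissions from Rainy

def viterbi(observations):
    o = [obs_index[x] for x in observations]
    delta = pi * B[:, o[0]]           # best path probability ending in each state
    back = []                         # backpointers
    for t in range(1, len(o)):
        scores = delta[:, None] * A   # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * B[:, o[t]]
    path = [int(delta.argmax())]
    for ptr in reversed(back):        # trace the best path backwards
        path.append(int(ptr[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi(["Umbrella", "Umbrella", "No Umbrella"]))
```

Running this confirms the decoded sequence [Rainy, Rainy, Sunny] claimed above.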

Three Fundamental HMM Problems:

1. Evaluation: P(O|λ) — solved by Forward Algorithm.
2. Decoding: Best hidden state sequence — solved by Viterbi Algorithm.
3. Learning: Estimate λ from observations — solved by Baum-Welch (EM) Algorithm.
Applications: Speech recognition, natural language processing, gesture recognition, bioinformatics
(gene prediction), and financial modeling.

Q8. Explain Bayesian Network with Example.


A Bayesian Network (also called Belief Network or Bayes Net) is a probabilistic graphical model that
represents a set of random variables and their conditional dependencies using a Directed Acyclic
Graph (DAG). Each node represents a random variable, and each directed edge represents a
conditional dependency. Each node has a Conditional Probability Table (CPT) specifying P(node |
parents).

Key Properties:
• DAG Structure: Nodes are variables; edges encode direct probabilistic influence.
• Conditional Independence: Each variable is conditionally independent of its non-descendants
given its parents.
• Joint Distribution: P(X1,...,Xn) = ∏ P(Xi | Parents(Xi))
Example — Disease Diagnosis:

Consider diagnosing lung cancer from symptoms:

Variables: Smoking (S), Pollution (P), Cancer (C), Cough (Co), Fatigue (F)
DAG edges: S → C, P → C, C → Co, C → F
CPTs: P(S=Yes)=0.3, P(P=High)=0.4; P(C=Yes|S=Yes, P=High)=0.05, etc.
Given evidence (e.g., patient coughs), we use Bayes' theorem to compute posterior probability of
cancer: P(C|Co=Yes) ∝ P(Co=Yes|C) × P(C). Inference can be exact (variable elimination, junction
tree) or approximate (MCMC sampling).
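Inference by enumeration can be sketched for P(Cancer | Cough = yes). Only P(S), P(P), and P(C | S=Yes, P=High) come from the example above; the remaining CPT entries are hypothetical fillers for illustration:

```python
from itertools import product

# CPTs; entries marked "hypothetical" are illustrative fillers only.
p_s = {True: 0.3, False: 0.7}          # P(Smoking)
p_p = {True: 0.4, False: 0.6}          # P(Pollution = High)
p_c = {(True, True): 0.05,             # P(C=Yes | S=Yes, P=High), from the text
       (True, False): 0.02,            # hypothetical
       (False, True): 0.01,            # hypothetical
       (False, False): 0.001}          # hypothetical
p_co = {True: 0.8, False: 0.1}         # P(Cough=Yes | C), hypothetical

def joint(c):
    # P(C = c, Cough = yes): sum the factored joint over S and P.
    total = 0.0
    for s, p in product([True, False], repeat=2):
        pc = p_c[(s, p)] if c else 1 - p_c[(s, p)]
        total += p_s[s] * p_p[p] * pc * p_co[c]
    return total

posterior = joint(True) / (joint(True) + joint(False))
print(posterior)  # P(Cancer = yes | Cough = yes)
```

This is exact variable-elimination-style inference on a tiny network; real networks use the same factorization P(X1,...,Xn) = ∏ P(Xi | Parents(Xi)), just with smarter summation orders.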

Advantages: Handles uncertainty, incorporates prior knowledge, supports reasoning under incomplete information, interpretable model structure.
Applications: Medical diagnosis, spam filtering, fault detection, decision support systems, NLP, and
causal reasoning.

Q9. Explain Transfer Learning and Its Types.


Transfer Learning is a machine learning technique where a model trained on one task (source
domain) is reused as the starting point for a model on a different but related task (target domain).
Instead of training from scratch, knowledge (weights, features) learned from large datasets is
transferred to new tasks with limited data.

Why Transfer Learning? Training deep neural networks from scratch requires massive datasets and
compute. Transfer learning reduces data requirements, training time, and often achieves better
performance on target tasks.

Types of Transfer Learning:

1. Inductive Transfer Learning: The source and target tasks differ but may share the same or
different domains. Labeled data is available in the target domain. The source model provides
inductive bias. Example: Using a model trained on ImageNet to classify medical images.
2. Transductive Transfer Learning: Source and target tasks are the same but domains differ.
Labeled data is available only in source domain. Includes Domain Adaptation as a special case.
Example: Sentiment model trained on movie reviews adapted for product reviews.
3. Unsupervised Transfer Learning: Neither source nor target domain has labels. Focuses on
learning good representations. Example: Pre-training autoencoders or GANs on unlabeled data
and fine-tuning for downstream tasks.
Strategies for Applying Transfer Learning:

• Feature Extraction: Freeze the pre-trained model weights and use it only to extract features.
Add new classifier layers on top and train only those. Useful when target dataset is small.
• Fine-Tuning: Unfreeze some or all layers of the pre-trained model and retrain on the target task
with a small learning rate. Adapts the model features to the new domain. Used when target
dataset is moderate to large.
• Domain Adaptation: Adapts model from one domain (e.g., synthetic images) to another (real
images) by minimizing domain shift, often using adversarial methods.
• Multi-task Learning: Trains on multiple related tasks simultaneously, sharing representations to
improve generalization on all tasks.
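The feature-extraction strategy can be sketched end to end: a frozen "pre-trained" encoder (faked here as a fixed random projection, purely for illustration) extracts features, and only a new classifier head is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target-domain data: two well-separated 2-D classes.
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Stand-in for frozen pre-trained layers: a fixed projection, never updated.
W_frozen = rng.normal(size=(2, 4))
feats = np.tanh(X @ W_frozen)     # extracted features

# New head: the only trainable parameters (logistic regression).
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(feats @ w + b)))
    g = p - y                     # gradient of cross-entropy w.r.t. logits
    w -= 0.1 * feats.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = ((feats @ w + b > 0) == (y == 1)).mean()
print(acc)
```

Fine-tuning differs only in that `W_frozen` would also receive (small) gradient updates instead of staying fixed.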
Popular Pre-trained Models Used for Transfer Learning: VGG, ResNet, Inception (Computer
Vision); BERT, GPT, T5 (NLP).

Applications: Image classification, object detection, NLP tasks, medical imaging, autonomous
driving, and speech recognition.

Q10. Explain Gaussian Mixture Models (GMM).


A Gaussian Mixture Model (GMM) is a probabilistic generative model that assumes the data is
generated from a mixture of K Gaussian (normal) distributions, each with its own mean, covariance,
and mixing coefficient. It is used for density estimation, clustering, and generative modeling.

Mathematical Formulation:
The probability density of GMM is: P(x) = Σ(k=1 to K) π_k × N(x | µ_k, Σ_k)
Where:

• π_k = mixing coefficient (weight) for component k; Σπ_k = 1, π_k > 0
• µ_k = mean vector of the k-th Gaussian
• Σ_k = covariance matrix of the k-th Gaussian
• N(x|µ,Σ) = multivariate Gaussian density
Parameter Estimation — EM Algorithm:

GMM parameters (π_k, µ_k, Σ_k) are estimated using the Expectation-Maximization (EM) algorithm:

• E-step (Expectation): Compute the responsibility r_ik — posterior probability that component k
generated data point x_i: r_ik = π_k N(x_i|µ_k,Σ_k) / Σ_j π_j N(x_i|µ_j,Σ_j)
• M-step (Maximization): Update parameters using the responsibilities: µ_k = Σ_i r_ik x_i / N_k; Σ_k = Σ_i r_ik (x_i - µ_k)(x_i - µ_k)^T / N_k; π_k = N_k / N, where N_k = Σ_i r_ik.
• Repeat E and M steps until convergence (log-likelihood stops increasing).
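The E and M steps above translate directly into code. A two-component 1-D sketch (data and initial values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(10, 1, 200)])

pi = np.array([0.5, 0.5])   # mixing coefficients pi_k
mu = np.array([1.0, 9.0])   # initial means
var = np.array([1.0, 1.0])  # initial variances

def pdf(x, mu, var):
    # Column k holds N(x_i | mu_k, var_k).
    return np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: responsibilities r[i, k].
    r = pi * pdf(x, mu, var)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities.
    n_k = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
    pi = n_k / len(x)

print(np.sort(mu))  # means should land near 0 and 10
```

The soft assignments `r` are what distinguish this from K-Means: each point contributes fractionally to every component instead of being hard-assigned to one.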
GMM vs. K-Means Clustering:

GMM is a soft clustering method — each point has a probability of belonging to each cluster.
K-Means performs hard assignment. GMM can model elliptical clusters (via covariance matrices);
K-Means assumes spherical clusters. GMM is more flexible and expressive.

Selecting K: The number of components K is chosen using model selection criteria such as BIC
(Bayesian Information Criterion) or AIC (Akaike Information Criterion).

Applications of GMM:

• Density estimation and anomaly detection
• Speaker identification in speech processing
• Image segmentation and background modeling
• Data augmentation by sampling from the learned distribution
• As a prior in variational autoencoders and probabilistic models

End of AAI QB1 Model Answers | All questions carry 10 marks each.
