
Generative models

Generative modelling

● Purpose: It models how data is generated by learning the joint probability distribution P(x, y), or just P(x) for unlabeled data. It aims to understand the underlying data structure and generate new, realistic samples.
●​ How it Works: It learns both the features and their correlations, allowing it to generate new
data points similar to the training set. It tries to answer, "How likely is this input?"
●​ Examples:
○​ GANs (Generative Adversarial Networks): Generate realistic images, videos, or
sounds by pitting two neural networks (Generator and Discriminator) against each
other.
○​ VAEs (Variational Autoencoders): Learn efficient data representations to generate
new data samples with controlled variation.
●​ Applications: Image synthesis, text generation, data augmentation, and anomaly detection.
●​ Advantages: Can generate new, unseen data samples, useful for creative tasks and data
augmentation.
●​ Disadvantages: Often harder to train and prone to issues like mode collapse (repetitive
outputs).
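
As a toy illustration of what "learning P(x) and sampling from it" means, the sketch below (with made-up data and variable names) fits a single Gaussian to a dataset and then draws new points from the learned distribution. Real generative models such as GANs and VAEs learn far more flexible distributions, but the idea is the same.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy "training set": 500 two-dimensional points (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(loc=[2.0, -1.0], scale=[1.0, 0.5], size=(500, 2))

# "Learn" P(x) by estimating the parameters of a single Gaussian.
mu = X.mean(axis=0)              # estimated mean
cov = np.cov(X, rowvar=False)    # estimated covariance

# Generate new, unseen samples from the learned distribution.
new_samples = rng.multivariate_normal(mu, cov, size=10)
print(new_samples)

# Answer "How likely is this input?" with the learned density.
print(multivariate_normal(mean=mu, cov=cov).pdf([2.0, -1.0]))
```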

Discriminative modelling

● Purpose: It models the decision boundary between classes by learning the conditional probability P(y|x), focusing on distinguishing between different categories.
●​ How it Works: It learns to map inputs to their corresponding labels, directly optimizing for
classification accuracy. It tries to answer, "Which class does this input belong to?"
●​ Examples:
○​ Logistic Regression: Classifies data into two categories by estimating probabilities.
○​ SVM (Support Vector Machine): Finds the optimal hyperplane that separates
different classes with the maximum margin.
○​ Neural Networks: Learn complex decision boundaries for tasks like image
classification and speech recognition.
●​ Applications: Image classification, spam detection, speech recognition, and fraud detection.
●​ Advantages: Usually simpler to train and more accurate for classification tasks.
● Disadvantages: Cannot generate new data samples and may require large labeled datasets.
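
For contrast, here is a minimal discriminative sketch (scikit-learn assumed available, data made up for illustration): it learns only P(y|x), i.e. a decision boundary, and cannot be used to generate new inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up two-class data: class 0 centred at (0, 0), class 1 at (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(3, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Learn the conditional distribution P(y | x) directly.
clf = LogisticRegression().fit(X, y)

# "Which class does this input belong to?"
print(clf.predict([[2.5, 2.0]]))         # predicted label
print(clf.predict_proba([[2.5, 2.0]]))   # P(y=0 | x), P(y=1 | x)
```
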
Generative VS discriminative modelling

Significance of generative models


●​ Data Generation: Generative models create new, realistic data samples that
resemble the original dataset.
●​ Data Augmentation: They generate variations of existing data to improve model
robustness and accuracy.
●​ Simulation and Training: Help simulate real-world scenarios for training AI agents in
robotics, games, and more.
●​ Natural Language Generation: Power language models like GPT to generate fluent,
context-aware text.
●​ Representation Learning: Capture meaningful features from data for tasks like
classification or clustering.
●​ Anomaly Detection: Identify unusual patterns by modeling what normal data should
look like.
●​ Personalization and Recommendations: Create customized content or
suggestions tailored to individual users.

Challenges of generative models


●​ Mode Collapse: In models like GANs, the generator may produce limited variations,
ignoring parts of the data distribution.
●​ Training Instability: Generative models, especially GANs, can be difficult to train
and may not converge reliably.
●​ High Computational Cost: They often require large datasets and significant
computational resources to train effectively.
●​ Evaluation Difficulty: It’s hard to objectively measure the quality and diversity of
generated outputs.
●​ Overfitting: Generative models may memorize training data instead of learning to
generalize.
●​ Security Risks: Generated data can be misused for deepfakes or identity fraud.
●​ Data Privacy Concerns: If not trained carefully, models might leak sensitive
information from the training set.
●​ Poor Interpretability: It’s often unclear what exactly the model has learned or why it
generates specific outputs.

GAN VS VAE

Probabilistic models

GMM
A Gaussian Mixture Model (GMM) is a probabilistic model that assumes all the data points
are generated from a mixture of several Gaussian (normal) distributions with unknown
parameters.​
Each component in the mixture represents a cluster, and the model assigns probabilities to
each point for belonging to a particular cluster.

Working of GMM
GMM works by modeling the data as a weighted sum of multiple Gaussian distributions,
each having its own mean and covariance.

🔁 Steps:
1. Initialization:
○ Choose the number of components (clusters), say K.
○ Randomly initialize the means μ_k, covariances Σ_k, and mixing coefficients π_k.

2. E-Step (Expectation):
○ For each data point, calculate the probability of it belonging to each Gaussian using the current parameters (soft clustering).

3. M-Step (Maximization):
○ Update the parameters (means, covariances, and mixing coefficients) to maximize the likelihood of the data under the current assignments.

4. Repeat E and M steps until convergence (i.e., the parameters stop changing significantly).

This is done using the Expectation-Maximization (EM) algorithm.
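
As a brief sketch of this workflow, scikit-learn's GaussianMixture runs the initialization, E-step, and M-step internally; the data and settings below are purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative data drawn from two different Gaussian clusters.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(5.0, 1.5, size=(200, 2))])

# K = 2 components; EM (initialization, E-step, M-step, repeat) runs inside fit().
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

print(gmm.means_)                # learned means mu_k
print(gmm.weights_)              # learned mixing coefficients pi_k
print(gmm.predict_proba(X[:3]))  # soft assignments from the E-step
print(gmm.score_samples(X[:3]))  # log-likelihoods (low values can flag anomalies)
```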

Advantages
● Soft Clustering: Unlike K-Means, GMM assigns probabilities rather than hard labels, which is better for overlapping clusters.
● Flexible Shapes: Can model elliptical (non-spherical) clusters due to covariance matrices.
● Probabilistic Framework: Can handle uncertainty and provides likelihoods.
● Works with EM Algorithm: Efficient estimation of parameters.

Limitations
● Number of Components Must Be Predefined: You need to specify the number of Gaussians in advance.
● Sensitive to Initialization: Bad initialization may lead to poor convergence.
● Prone to Overfitting: Especially with many components or small datasets.
● Assumes Gaussian Distributions: Not ideal if the actual data distribution is non-Gaussian.

Applications
●​ Image Segmentation: Separating objects based on pixel intensities.​

●​ Anomaly Detection: Points with low probability are flagged as anomalies.​

●​ Speech Recognition: Modeling acoustic features using GMMs.​

●​ Clustering: An alternative to K-Means when data has different variances.​

●​ Finance: Modeling returns of assets which often follow a mixture distribution.

HMM

A Hidden Markov Model (HMM) is a statistical model that describes systems that are influenced by
hidden (unobservable) states but produce observable outputs. It models the probabilistic
relationship between a sequence of hidden states and corresponding observations, helping to make
predictions or understand patterns when the underlying system is not directly visible. It involves:

●​ Hidden States: Unobservable internal states (e.g., weather conditions like Sunny or Rainy).
●​ Observations: Visible outputs influenced by hidden states (e.g., Dry or Wet ground).

Key Components:

1. Initial State Probabilities: Probabilities of starting in each hidden state.
2. Transition Probabilities: Probabilities of moving from one hidden state to another.
3. Emission Probabilities: Probabilities of observing a specific output given a hidden state.

Example: Weather Prediction

Imagine you want to predict the weather (Sunny or Rainy) based on observed conditions (Dry or
Wet). In this case:

●​ Hidden States: The actual weather (Sunny, Rainy) which you cannot observe directly.
●​ Observations: The ground condition (Dry, Wet) that you can see.

Using HMM:

●​ Start Probabilities: Initial chances of the weather being Sunny (60%) or Rainy (40%).
●​ Transition Probabilities: Chances of moving from one weather state to another. For example,
if it's Sunny today, there's a 70% chance it will be Sunny tomorrow and a 30% chance of Rain.
●​ Emission Probabilities: Chances of observing Dry or Wet conditions given the hidden state.
For example, if it's Sunny, there's a 90% chance the ground is Dry.

With this setup, HMM can predict the most likely sequence of weather conditions given a sequence
of observed ground states.
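
Below is a minimal sketch of that prediction using the Viterbi algorithm written out by hand. The numbers given in the example above are used where available; the remaining transition and emission probabilities (the Rainy rows) are assumed purely for illustration.

```python
import numpy as np

states = ["Sunny", "Rainy"]
observations = ["Dry", "Wet"]

# Probabilities from the example; the Rainy-row values are assumed for illustration.
start_p = np.array([0.6, 0.4])            # P(Sunny), P(Rainy)
trans_p = np.array([[0.7, 0.3],           # from Sunny
                    [0.4, 0.6]])          # from Rainy (assumed)
emit_p = np.array([[0.9, 0.1],            # Sunny -> Dry / Wet
                   [0.2, 0.8]])           # Rainy -> Dry / Wet (assumed)

def viterbi(obs_seq):
    """Most likely hidden-state sequence for a list of observation indices."""
    T, N = len(obs_seq), len(states)
    delta = np.zeros((T, N))               # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)      # back-pointers
    delta[0] = start_p * emit_p[:, obs_seq[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * trans_p[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores.max() * emit_p[j, obs_seq[t]]
    # Trace the back-pointers from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.insert(0, psi[t, path[0]])
    return [states[i] for i in path]

obs = [observations.index(o) for o in ["Dry", "Dry", "Wet"]]
print(viterbi(obs))   # -> ['Sunny', 'Sunny', 'Rainy'] with these numbers
```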

Uses of Hidden Markov Models

● Weather Forecasting: Predicting weather patterns based on observed conditions.
● Speech Recognition: Mapping audio signals to words.
● Bioinformatics: DNA sequence analysis and gene prediction.
● Finance: Modeling stock market trends.
● Natural Language Processing: Part-of-speech tagging and language translation.

Advantages

● Versatility: Applicable to a wide range of sequential data problems.
● Probabilistic Framework: Handles uncertainty and noise in observations.
● Efficient Algorithms: Algorithms like Viterbi and Baum-Welch efficiently find hidden states and train the model.

Disadvantages

●​ Assumption of Markov Property: Assumes that the current state only depends on the
previous state, which may not hold in complex scenarios.
●​ Parameter Estimation: Requires careful estimation of transition and emission probabilities.
●​ Hidden State Limitations: Number of hidden states must be predefined, which might
oversimplify complex systems.

Problems: see "Hidden Markov Model Clearly Explained! Part - 5" for worked examples.

MRF

A Markov Random Field (MRF), also known as an Undirected Graphical Model or Markov Network,
is a probabilistic model that uses an undirected graph to represent the dependencies between
random variables. In an MRF:

● Nodes represent random variables.
● Edges represent direct interactions between variables.

Key Features

●​ Undirected Edges: Unlike Bayesian networks, MRFs use undirected edges, indicating that the
relationship between connected nodes is mutual without a directional cause-and-effect flow.
●​ No Conditional Probability Distribution: Edges in an MRF show potential interactions but are
not associated with conditional probabilities.
●​ Local Interactions: Two nodes interact directly only if they are connected by an edge.

How It Works

1.​ Graph Structure: Nodes are variables (e.g., pixels), and edges show dependencies.
2.​ Potential Functions: These define how connected variables influence each other.
3.​ Joint Probability: Calculated using these functions to find the likelihood of a certain
configuration.
4.​ Inference: Used to predict unknown variables (e.g., labeling image regions).
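
As a tiny illustrative sketch of steps 2 and 3: two binary variables joined by one edge, a pairwise potential that favours agreement (the potential values are assumed, not taken from any standard model), and the joint probability obtained by normalising the product of potentials.

```python
import itertools

# Two binary variables x1, x2 joined by a single undirected edge.
# Pairwise potential: higher value when the neighbours agree (assumed numbers).
def potential(x1, x2):
    return 2.0 if x1 == x2 else 1.0

# Partition function Z: sum of the potential over every configuration.
configs = list(itertools.product([0, 1], repeat=2))
Z = sum(potential(a, b) for a, b in configs)

# Joint probability of each configuration = potential / Z.
for a, b in configs:
    print(f"P(x1={a}, x2={b}) = {potential(a, b) / Z:.3f}")
# Agreeing configurations (0,0) and (1,1) come out more likely: 0.333 each vs 0.167.
```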

Applications

● Image Segmentation: Classifying each pixel into objects or regions.
● Denoising: Cleaning noisy images while preserving edges.
● NLP Tasks: Part-of-speech tagging and named entity recognition.
● Spatial Analysis: Modeling geographical data dependencies.

Advantages

● Captures complex dependencies without assuming direction.
● Efficient calculations due to localized interactions.

Disadvantages

● Inference is computationally expensive.
● Needs large datasets for accurate parameter estimation.
● Accuracy depends on the chosen graph structure.

Bayesian network

A Bayesian Network (also known as a Belief Network) is a probabilistic graphical model that
represents a set of variables and their conditional dependencies using a directed acyclic graph
(DAG). It is based on Bayes' theorem and is used to model uncertainty in complex systems.

Components of a Bayesian Network:

1.​ Nodes: Represent random variables, which can be observable quantities, latent variables, or
unknown parameters.
2. Edges: Directed edges between nodes represent conditional dependencies. If there is an edge from node A to node B, then A directly influences B.
3. Conditional Probability Tables (CPTs): Each node has a CPT that quantifies the effect of the parent nodes on that node.

Example Scenario:

●​ B: Burglary occurred
●​ E: Earthquake occurred
●​ A: Alarm went off
●​ J: John called to report the alarm
●​ M: Mary called to report the alarm

The dependencies are:

● A burglary or an earthquake can trigger the alarm.
● If the alarm goes off, there's a chance that John and Mary will call.
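
A minimal sketch of this network in plain Python is shown below. It encodes the chain-rule factorisation P(B, E, A, J, M) = P(B) P(E) P(A|B, E) P(J|A) P(M|A); the CPT numbers are illustrative values chosen for this example, not figures from the notes.

```python
from itertools import product

# CPTs (illustrative values).
P_B = {True: 0.001, False: 0.999}                    # P(Burglary)
P_E = {True: 0.002, False: 0.998}                    # P(Earthquake)
P_A = {  # P(Alarm | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}                      # P(John calls | Alarm)
P_M = {True: 0.70, False: 0.01}                      # P(Mary calls | Alarm)

def joint(b, e, a, j, m):
    """Joint probability of one full assignment, via the chain rule."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Example query by enumeration: P(Burglary | John calls, Mary calls).
num = sum(joint(True, e, a, True, True) for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True) for b, e, a in product([True, False], repeat=3))
print(num / den)   # roughly 0.28 with these illustrative numbers
```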

Applications:

●​ Security Systems: To determine the likelihood of a break-in based on multiple sensor alerts.
●​ Medical Diagnosis: Inferring diseases from symptoms and test results.
●​ Fault Detection: In engineering systems based on observed failures.

Advantages:
●​ Compact Representation: Efficiently represents joint distributions.
●​ Causal Relationships: Clearly shows dependencies and causal structures.
●​ Flexible Inference: Can calculate probabilities given any evidence.

Challenges:

● Complex Inference: Exact calculations can be computationally expensive.
● Dependency Knowledge: Requires knowledge of conditional dependencies.
● Parameter Estimation: CPTs need accurate probabilities, which might not always be available.

EM algorithm
The Expectation-Maximization (EM) algorithm is an iterative method used to find
maximum likelihood estimates of parameters in probabilistic models when the data has
missing or hidden (latent) variables.

Where It's Used


●​ Gaussian Mixture Models (GMM)​

●​ Hidden Markov Models (HMM)​

●​ Missing data problems​

●​ Clustering with soft assignments

How EM Works
It alternates between two steps:

1. Expectation Step (E-step)

Estimate the expected value of the latent variables (like cluster assignments) given the
current parameters of the model.

Example in GMM: Compute the probability that each data point belongs to each
Gaussian (soft assignment).

2. Maximization Step (M-step)

Update the model parameters (like mean, variance, mixing coefficients) to maximize the
expected log-likelihood found in the E-step.
Example in GMM: Update the means, covariances, and weights of each
Gaussian using the soft assignments.

Repeat until convergence (i.e., changes in parameters or log-likelihood are minimal).
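
The sketch below spells out the two steps for a one-dimensional, two-component Gaussian mixture; the toy data, initial guesses, and fixed iteration count are all assumptions made for brevity.

```python
import numpy as np
from scipy.stats import norm

# Toy 1-D data drawn from two Gaussians (illustrative only).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])

# Initialization: guesses for means, variances, and mixing weights.
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: responsibility of each component for each point (soft assignment).
    dens = pi * norm.pdf(x[:, None], loc=mu, scale=np.sqrt(var))
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters to maximize the expected log-likelihood.
    Nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / Nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(x)

print(mu, np.sqrt(var), pi)   # means should approach roughly 0 and 5
```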

Advantages
●​ Handles missing or hidden data effectively.​

●​ Can work with soft (probabilistic) assignments.​

●​ Guaranteed to converge to a local optimum.

Limitations
●​ May converge to a local maximum, not necessarily the global one.​

●​ Sensitive to initialization.​

●​ Can be computationally expensive for large datasets.

Applications
●​ Clustering (e.g., Gaussian Mixture Models)​

●​ Natural Language Processing (e.g., topic models)​

●​ Image restoration​

●​ Bioinformatics​

●​ Anomaly detection
