
Data Compression (IT603N) | ColleGPT
Prepared and Edited by: Divya Kaurani | Designed by: Kussh Prajapati
www.collegpt.com | [email protected]
Unit - 2 : Mathematical Preliminaries
for Lossless Compression Models
Modeling + Coding :

● Modeling : It involves creating a representation or description of the structure,
patterns, and characteristics of the data. The model serves as a blueprint for
encoding and decoding the data.
Eg: In a physical model, the modeling process might involve identifying patterns
such as pixel correlation in an image.
● Coding : It refers to the process of representing the data using a specific set of
symbols or codes based on the model. Coding takes the output of the model
and transforms it into a compact form that can be easily transmitted or stored.
Eg: In Huffman Coding, a commonly used coding technique, the more probable
symbols are assigned shorter binary codes, while less probable symbols receive
longer codes.

PYQ: Example of modeling and coding in lossless compression.

● Input Data: "mississippi"


● Modeling:
Analyze the input data to identify patterns and redundancies.
Count the frequency of each symbol (e.g., characters in the input).
Create a model that assigns shorter codes to more frequent symbols and longer
codes to less frequent symbols.
Model:
'i' occurs 4 times
's' occurs 4 times
'p' occurs 2 times
'm' occurs 1 time
● Coding:
Replace each symbol in the input data with its corresponding code based on the
model.
Encode the input data using the generated codes.
Encoded Data:
'i' -> 0
's' -> 10
'p' -> 110
'm' -> 111
● Compressed Data: 111010100101001101100 (21 bits, versus 88 bits for the
11-character input stored as 8-bit ASCII)
During decompression, the original data can be reconstructed by reversing the
coding process using the same model. The compressed data is decoded to
retrieve the original input data without any loss of information.
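To make the two steps concrete, here is a minimal Python sketch of the example above. The helper names (build_model, encode, decode) are illustrative only, not from any standard library; the code table is the one given in the example and, because it is a prefix code, the bit string can be decoded greedily.

from collections import Counter

def build_model(data):
    # Modeling step: count how often each symbol occurs.
    return Counter(data)

def encode(data, codebook):
    # Coding step: replace every symbol with its codeword.
    return "".join(codebook[sym] for sym in data)

def decode(bits, codebook):
    # Reverse the coding step with the same model; a prefix code lets us
    # emit a symbol as soon as the buffer matches a codeword.
    inverse = {code: sym for sym, code in codebook.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            decoded.append(inverse[buffer])
            buffer = ""
    return "".join(decoded)

data = "mississippi"
print(build_model(data))          # frequencies: i=4, s=4, p=2, m=1
codebook = {'i': '0', 's': '10', 'p': '110', 'm': '111'}
bits = encode(data, codebook)
print(bits)                       # 111010100101001101100
assert decode(bits, codebook) == data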

Types of Models :
1. Physical Model :
These models exploit knowledge about the physical process that generated
the data. For example, in speech compression the physics of speech production
can be exploited, knowledge of household usage patterns can help when
compressing residential electricity-meter readings, and in image compression a
physical model might account for the spatial correlations between neighboring
pixels. By understanding how the data is created, the model can predict and
potentially remove redundant information.
2. Probability Models :
These models analyze the statistical properties of the data, like the frequency of
occurrence of symbols or patterns. This analysis helps identify symbols or
sequences that appear more frequently and allows for efficient representation
using shorter codes.

Basic Probability Model: This assumes each symbol in the data stream is
independent and has an equal probability of occurring. It's a simple approach,
but may not be very effective for real-world data with inherent biases.

Variable-Length Coding: Techniques like Huffman coding and arithmetic coding
build upon probability models. They assign shorter codes to symbols with
higher probabilities and longer codes to less frequent symbols, achieving
compression.
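As a rough illustration of how such a code can be built from symbol frequencies, here is a small Huffman-style sketch in Python. The huffman_code helper is written just for these notes; the exact codewords depend on how ties are broken, but more frequent symbols always receive codewords that are at least as short.

import heapq

def huffman_code(freqs):
    # Each heap entry is (total frequency, tie-breaker, {symbol: partial code}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_code({'i': 4, 's': 4, 'p': 2, 'm': 1}))
# One possible result: {'s': '0', 'm': '100', 'p': '101', 'i': '11'}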
PYQ : Explain Markov Model in detail

3. Markov Model :
A Markov model, named after the mathematician Andrey Markov, is a stochastic
model used to model sequences of random variables where the probability of
each variable's value depends only on the state of the preceding variable. In
other words, it's a memoryless process where the future state depends only
on the current state and not on the sequence of events that preceded it.
Markov models are widely used in various fields such as natural language
processing, speech recognition, bioinformatics, finance, and more.

Components of Markov Model :


1. State Space: A finite set of states that the system can be in. Each state
represents a possible situation or condition.
2. Transition Probabilities: The probabilities of transitioning from one state to
another. These probabilities are typically represented by a transition matrix,
where each entry Pij represents the probability of transitioning from state i to j.

Types of Markov Model :


1. First-order Markov Model: In this model, the probability of transitioning to
the next state depends only on the current state. Mathematically,
P(X_{n+1} | X_1, X_2, ..., X_n) = P(X_{n+1} | X_n).
2. Higher-order Markov Model (order k): In this model, the probability of
transitioning to the next state depends on the previous k states. Mathematically,
P(X_{n+1} | X_1, X_2, ..., X_n) = P(X_{n+1} | X_n, X_{n-1}, ..., X_{n-k+1}).

Limitations of Markov Model :


1. Memorylessness: Markov models assume that future states depend only on
the current state, ignoring any long-term dependencies in the data. This may not
accurately capture complex relationships in certain datasets.
2. Fixed Order of Dependence: Higher-order Markov models have fixed
dependencies on a predetermined number of previous states. If the order of
dependence is not chosen carefully, it may not capture the true underlying
dynamics of the system.
Benefits of Markov Model :
1. Simplicity: Markov models are relatively simple to implement and
understand, making them useful for modeling systems with limited complexity.
2. Efficiency: Markov models can be computationally efficient, especially when
dealing with large datasets, due to their memoryless property.

Example:
Imagine a simple weather model with states Sunny (S), Rainy (R), and Cloudy
(C). The transition matrix below shows the probabilities of transitioning from
one state to another:

From \ To      Sunny (S)   Rainy (R)   Cloudy (C)
Sunny (S)         0.7         0.2         0.1
Rainy (R)         0.3         0.6         0.1
Cloudy (C)        0.4         0.3         0.3

This matrix indicates, for example, that if today is sunny, there is a 70% chance
of it being sunny tomorrow, a 20% chance of rain, and a 10% chance of it being
cloudy.
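A small Python sketch of this weather chain (the dictionary layout and function names are only illustrative) shows how the transition matrix is used, both to sample tomorrow's weather from today's state and to compute the probability of a short sequence of states:

import random

transitions = {
    'S': {'S': 0.7, 'R': 0.2, 'C': 0.1},
    'R': {'S': 0.3, 'R': 0.6, 'C': 0.1},
    'C': {'S': 0.4, 'R': 0.3, 'C': 0.3},
}

def next_state(current):
    # Sample the next state from the matrix row for the current state.
    states, probs = zip(*transitions[current].items())
    return random.choices(states, weights=probs)[0]

def sequence_probability(states):
    # First-order Markov assumption: P(x2,...,xn | x1) is a product
    # of one-step transition probabilities.
    p = 1.0
    for today, tomorrow in zip(states, states[1:]):
        p *= transitions[today][tomorrow]
    return p

print(next_state('S'))                 # 'S' with probability 0.7, 'R' 0.2, 'C' 0.1
print(sequence_probability("SSRC"))    # 0.7 * 0.2 * 0.1 ≈ 0.014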

4. Composite Source Model:


Real-world data often contains multiple types of information, like text, images,
and audio within a single file. A composite source model treats such data as a
combination of different sources with their own statistical properties. By
applying appropriate compression techniques to each component based on its
characteristics, composite models achieve effective compression for diverse data
formats.

First Order Entropy :


First-order entropy, also known as Shannon entropy, is a measure of the average
amount of information associated with each symbol in a data stream. For symbol
probabilities P(x_i) it is H = -Σ P(x_i) log2 P(x_i) bits per symbol. In data
compression, it is used to quantify the average number of bits needed to
represent each symbol.
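A minimal Python sketch of this calculation, assuming the symbol probabilities are estimated from the counts in the data itself:

import math
from collections import Counter

def first_order_entropy(data):
    # Estimate P(x) from symbol counts, then apply H = -sum p * log2(p).
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(first_order_entropy("mississippi"))   # ~1.82 bits per symbol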
Prefix Code :

In data compression, a prefix code is a set of codewords (encoded representations of
symbols) with a critical property: no codeword is a prefix (beginning) of any other
codeword. This characteristic ensures unambiguous decoding of the compressed data.

Imagine a dictionary:

Each word (symbol) has a unique abbreviation (codeword).

In a prefix code, no abbreviation can be the beginning of another word.

If any codeword is the prefix of any other codeword then it is NOT a prefix code.
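A small helper (an assumed sketch, not part of the notes) makes this check mechanical by testing every pair of codewords:

def is_prefix_code(codewords):
    # No codeword may be the beginning of a different codeword.
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_code(["0", "10", "110", "111"]))   # True: the code used earlier
print(is_prefix_code(["0", "01", "10"]))           # False: "0" is a prefix of "01"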

Uniquely Decodable Code :

In data compression, a uniquely decodable code is a set of codewords (encoded
representations of symbols) with a crucial property: every encoded bit string
corresponds to exactly one sequence of original symbols, so the decoder can always
recover the symbols that were sent. This ensures that the encoded data can be
accurately reconstructed from its compressed form.

Imagine a dictionary where each word (symbol) has a unique abbreviation (codeword).
A uniquely decodable code ensures that a string of abbreviations can never be read
back as a different string of abbreviations. Here's why it's important:

Unambiguous Decoding: If a code isn't uniquely decodable, you might encounter
situations like:

Overlapping Prefixes: One codeword might be a prefix (beginning) of another. For
example, consider codewords "0" and "01". When you receive "0", you can't
immediately tell whether it is the complete codeword "0" or the beginning of "01";
the decoder has to look ahead at the following bits before it can decide.

Dangling Suffixes: Ambiguity can also arise from the way codewords combine.
Consider the codewords "0", "01" and "10". The received string "010" can be parsed
either as "0" followed by "10" or as "01" followed by "0", so the decoder cannot
determine which sequence of symbols was actually sent.

When such ambiguities cannot be resolved, even by looking further ahead, it becomes
impossible to reconstruct the original data accurately; the brute-force check
sketched below makes this concrete.
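The following brute-force sketch (illustrative only, with hypothetical symbol names) lists every way a given bit string can be split into codewords; if some string has more than one parse, the code is not uniquely decodable.

def all_parses(bits, codebook, prefix=()):
    # Yield every sequence of symbols whose codewords concatenate to `bits`.
    if not bits:
        yield prefix
    for sym, code in codebook.items():
        if bits.startswith(code):
            yield from all_parses(bits[len(code):], codebook, prefix + (sym,))

ambiguous = {'a': '0', 'b': '01', 'c': '10'}
print(list(all_parses("010", ambiguous)))
# [('a', 'c'), ('b', 'a')] -- two parses, so this code is not uniquely decodable.

prefix_free = {'i': '0', 's': '10', 'p': '110', 'm': '111'}
print(list(all_parses("0100", prefix_free)))
# [('i', 's', 'i')] -- exactly one parse.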


COVERED ONLY BECAUSE IT'S IN THE SYLLABUS:

Algorithmic Information Theory (AIT): It's about finding the shortest way to describe
something: the complexity of a piece of data is the length of the shortest program
that can reproduce it. In data compression, it means trying to represent data using as
little space as possible while still keeping all the important information.

Minimum Description Length (MDL) Principle: This principle says that the best
description of the data is the one that minimizes the length of the description of the
model plus the length of the data encoded with the help of that model. In data
compression, it means choosing the scheme for which the compressed data together
with the description of the compression method is as short as possible.

When applied to data compression:

AIT helps us understand how complex the data is and how much we can compress it
theoretically.

The MDL principle guides us in finding the most efficient compression method by
considering both the data and the description of how it's compressed.

So, these concepts help in creating compression methods that make data smaller
while still being able to recreate the original data accurately.
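As a very rough, illustrative back-of-the-envelope calculation of the MDL idea (the bit counts below are assumptions based on the "mississippi" example earlier, not the output of a real coder): the total cost of a compressed representation is the cost of describing the model plus the cost of the data encoded with that model, and the best model is the one that minimizes this total.

def description_length(model_bits, data_bits):
    # Two-part cost: bits for the model + bits for the encoded data.
    return model_bits + data_bits

raw_bits = 11 * 8                 # "mississippi" stored as plain 8-bit characters
mdl_bits = description_length(
    model_bits=4 * (8 + 3),       # rough cost of a 4-entry code table (assumed)
    data_bits=21,                 # encoded length from the earlier example
)
print(raw_bits, mdl_bits)         # 88 vs 65: here the model pays for itself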
All the Best
"Enjoyed these notes? Feel free to share them with

your friends and provide valuable feedback in your

review. If you come across any inaccuracies, don't

hesitate to reach out to the author for clarification.

Your input helps us improve!"

Visit: www.collegpt.com

