
Data Compression (IT603N) | ColleGPT
Prepared and Edited by: Divya Kaurani | Designed by: Kussh Prajapati
www.collegpt.com | [email protected]
Unit - 2 : Mathematical Preliminaries
for Lossless Compression Models
Modeling + Coding :

● Modeling : It involves creating a representation or description of the structure,
patterns, and characteristics of the data. The model serves as a blueprint for
encoding and decoding the data.
Eg: In a physical model, the modeling process might involve identifying patterns
such as pixel correlation in an image.
● Coding : It refers to the process of representing the data using a specific set of
symbols or codes based on the model. Coding takes the output of the model
and transforms it into a compact form that can be easily transmitted or stored.
Eg: In Huffman Coding, a commonly used coding technique, the more probable
symbols are assigned shorter binary codes, while less probable symbols receive
longer codes.

PYQ: Example of modeling and coding in lossless compression.

● Input Data: "mississippi"


● Modeling:
Analyze the input data to identify patterns and redundancies.
Count the frequency of each symbol (e.g., characters in the input).
Create a model that assigns shorter codes to more frequent symbols and longer
codes to less frequent symbols.
Model:
'i' occurs 4 times
's' occurs 4 times
'p' occurs 2 times
'm' occurs 1 time
● Coding:
Replace each symbol in the input data with its corresponding code based on the
model.
Encode the input data using the generated codes.
Encoded Data:
'i' -> 0
's' -> 10
'p' -> 110
'm' -> 111
● Compressed Data: 111010100101001101100 (21 bits, versus 88 bits for the
11-character input stored as 8-bit ASCII)
During decompression, the original data can be reconstructed by reversing the
coding process using the same model. The compressed data is decoded to
retrieve the original input data without any loss of information.
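To make the two steps concrete, here is a minimal Python sketch of the example above. The helper names (build_model, encode, decode) are illustrative only, not from any standard library; the code table is the one given in the example and, because it is a prefix code, the bit string can be decoded greedily.

from collections import Counter

def build_model(data):
    # Modeling step: count how often each symbol occurs.
    return Counter(data)

def encode(data, codebook):
    # Coding step: replace every symbol with its codeword.
    return "".join(codebook[sym] for sym in data)

def decode(bits, codebook):
    # Reverse the coding step with the same model; a prefix code lets us
    # emit a symbol as soon as the buffer matches a codeword.
    inverse = {code: sym for sym, code in codebook.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            decoded.append(inverse[buffer])
            buffer = ""
    return "".join(decoded)

data = "mississippi"
print(build_model(data))          # frequencies: i=4, s=4, p=2, m=1
codebook = {'i': '0', 's': '10', 'p': '110', 'm': '111'}
bits = encode(data, codebook)
print(bits)                       # 111010100101001101100
assert decode(bits, codebook) == data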

Types of Models :
1. Physical Model :
These models exploit knowledge about the physical process that generated
the data. For example, in speech compression the physics of speech production
can be exploited, knowledge of household usage patterns can help when
compressing residential electricity-meter readings, and in image compression a
physical model might account for the spatial correlations between neighboring
pixels. By understanding how the data is created, the model can predict and
potentially remove redundant information.
2. Probability Models :
These models analyze the statistical properties of the data, like the frequency of
occurrence of symbols or patterns. This analysis helps identify symbols or
sequences that appear more frequently and allows for efficient representation
using shorter codes.

Basic Probability Model: This assumes each symbol in the data stream is
independent and has an equal probability of occurring. It's a simple approach,
but may not be very effective for real-world data with inherent biases.

Variable-Length Coding: Techniques like Huffman coding and arithmetic coding
build upon probability models. They assign shorter codes to symbols with
higher probabilities and longer codes to less frequent symbols, achieving
compression.
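As a rough illustration of how such a code can be built from symbol frequencies, here is a small Huffman-style sketch in Python. The huffman_code helper is written just for these notes; the exact codewords depend on how ties are broken, but more frequent symbols always receive codewords that are at least as short.

import heapq

def huffman_code(freqs):
    # Each heap entry is (total frequency, tie-breaker, {symbol: partial code}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_code({'i': 4, 's': 4, 'p': 2, 'm': 1}))
# One possible result: {'s': '0', 'm': '100', 'p': '101', 'i': '11'}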
PYQ : Explain Markov Model in detail

3. Markov Model :
A Markov model, named after the mathematician Andrey Markov, is a stochastic
model used to model sequences of random variables where the probability of
each variable's value depends only on the state of the preceding variable. In
other words, it's a memoryless process where the future state depends only
on the current state and not on the sequence of events that preceded it.
Markov models are widely used in various fields such as natural language
processing, speech recognition, bioinformatics, finance, and more.

Components of Markov Model :


1. State Space: A finite set of states that the system can be in. Each state
represents a possible situation or condition.
2. Transition Probabilities: The probabilities of transitioning from one state to
another. These probabilities are typically represented by a transition matrix,
where each entry Pij represents the probability of transitioning from state i to j.

Types of Markov Model :


1. First-order Markov Model: In this model, the probability of transitioning to
the next state depends only on the current state. Mathematically,
P(X_{n+1} | X_1, X_2, ..., X_n) = P(X_{n+1} | X_n).
2. Higher-order Markov Model (order k): In this model, the probability of
transitioning to the next state depends on the previous k states. Mathematically,
P(X_{n+1} | X_1, X_2, ..., X_n) = P(X_{n+1} | X_n, X_{n-1}, ..., X_{n-k+1}).

Limitations of Markov Model :


1. Memorylessness: Markov models assume that future states depend only on
the current state, ignoring any long-term dependencies in the data. This may not
accurately capture complex relationships in certain datasets.
2. Fixed Order of Dependence: Higher-order Markov models have fixed
dependencies on a predetermined number of previous states. If the order of
dependence is not chosen carefully, it may not capture the true underlying
dynamics of the system.
Benefits of Markov Model :
1. Simplicity: Markov models are relatively simple to implement and
understand, making them useful for modeling systems with limited complexity.
2. Efficiency: Markov models can be computationally efficient, especially when
dealing with large datasets, due to their memoryless property.

Example:
Imagine a simple weather model with states Sunny (S), Rainy (R), and Cloudy
(C). The transition matrix below shows the probabilities of transitioning from
one state to another:

From \ To      Sunny (S)   Rainy (R)   Cloudy (C)
Sunny (S)         0.7         0.2         0.1
Rainy (R)         0.3         0.6         0.1
Cloudy (C)        0.4         0.3         0.3

This matrix indicates, for example, that if today is sunny, there is a 70% chance
of it being sunny tomorrow, a 20% chance of rain, and a 10% chance of it being
cloudy.
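A small Python sketch of this weather chain (the dictionary layout and function names are only illustrative) shows how the transition matrix is used, both to sample tomorrow's weather from today's state and to compute the probability of a short sequence of states:

import random

transitions = {
    'S': {'S': 0.7, 'R': 0.2, 'C': 0.1},
    'R': {'S': 0.3, 'R': 0.6, 'C': 0.1},
    'C': {'S': 0.4, 'R': 0.3, 'C': 0.3},
}

def next_state(current):
    # Sample the next state from the matrix row for the current state.
    states, probs = zip(*transitions[current].items())
    return random.choices(states, weights=probs)[0]

def sequence_probability(states):
    # First-order Markov assumption: P(x2,...,xn | x1) is a product
    # of one-step transition probabilities.
    p = 1.0
    for today, tomorrow in zip(states, states[1:]):
        p *= transitions[today][tomorrow]
    return p

print(next_state('S'))                 # 'S' with probability 0.7, 'R' 0.2, 'C' 0.1
print(sequence_probability("SSRC"))    # 0.7 * 0.2 * 0.1 ≈ 0.014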

4. Composite Source Model:


Real-world data often contains multiple types of information, like text, images,
and audio within a single file. A composite source model treats such data as a
combination of different sources with their own statistical properties. By
applying appropriate compression techniques to each component based on its
characteristics, composite models achieve effective compression for diverse data
formats.

First Order Entropy :


First-order entropy, also known as Shannon entropy, is a measure of the average
amount of information associated with each symbol in a data stream. For symbol
probabilities P(x_i) it is H = -Σ P(x_i) log2 P(x_i) bits per symbol. In data
compression, it is used to quantify the average number of bits needed to
represent each symbol.
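A minimal Python sketch of this calculation, assuming the symbol probabilities are estimated from the counts in the data itself:

import math
from collections import Counter

def first_order_entropy(data):
    # Estimate P(x) from symbol counts, then apply H = -sum p * log2(p).
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(first_order_entropy("mississippi"))   # ~1.82 bits per symbol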
Prefix Code :

In data compression, a prefix code is a set of codewords (encoded representations of
symbols) with a critical property: no codeword is a prefix (beginning) of any other
codeword. This characteristic ensures unambiguous decoding of the compressed data.

Imagine a dictionary:

Each word (symbol) has a unique abbreviation (codeword).

In a prefix code, no abbreviation can be the beginning of another word.

If any codeword is the prefix of any other codeword then it is NOT a prefix code.
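A small helper (an assumed sketch, not part of the notes) makes this check mechanical by testing every pair of codewords:

def is_prefix_code(codewords):
    # No codeword may be the beginning of a different codeword.
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_code(["0", "10", "110", "111"]))   # True: the code used earlier
print(is_prefix_code(["0", "01", "10"]))           # False: "0" is a prefix of "01"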

Uniquely Decodable Code :

In data compression, a uniquely decodable code is a set of codewords (encoded
representations of symbols) with a crucial property: every encoded bit string
corresponds to exactly one sequence of original symbols, so the decoder can always
recover the symbols that were sent. This ensures that the encoded data can be
accurately reconstructed from its compressed form.

Imagine a dictionary where each word (symbol) has a unique abbreviation (codeword).
A uniquely decodable code ensures that a string of abbreviations can never be read
back as a different string of abbreviations. Here's why it's important:

Unambiguous Decoding: If a code isn't uniquely decodable, you might encounter
situations like:

Overlapping Prefixes: One codeword might be a prefix (beginning) of another. For
example, consider codewords "0" and "01". When you receive "0", you can't
immediately tell whether it is the complete codeword "0" or the beginning of "01";
the decoder has to look ahead at the following bits before it can decide.

Dangling Suffixes: Ambiguity can also arise from the way codewords combine.
Consider the codewords "0", "01" and "10". The received string "010" can be parsed
either as "0" followed by "10" or as "01" followed by "0", so the decoder cannot
determine which sequence of symbols was actually sent.

When such ambiguities cannot be resolved, even by looking further ahead, it becomes
impossible to reconstruct the original data accurately; the brute-force check
sketched below makes this concrete.
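The following brute-force sketch (illustrative only, with hypothetical symbol names) lists every way a given bit string can be split into codewords; if some string has more than one parse, the code is not uniquely decodable.

def all_parses(bits, codebook, prefix=()):
    # Yield every sequence of symbols whose codewords concatenate to `bits`.
    if not bits:
        yield prefix
    for sym, code in codebook.items():
        if bits.startswith(code):
            yield from all_parses(bits[len(code):], codebook, prefix + (sym,))

ambiguous = {'a': '0', 'b': '01', 'c': '10'}
print(list(all_parses("010", ambiguous)))
# [('a', 'c'), ('b', 'a')] -- two parses, so this code is not uniquely decodable.

prefix_free = {'i': '0', 's': '10', 'p': '110', 'm': '111'}
print(list(all_parses("0100", prefix_free)))
# [('i', 's', 'i')] -- exactly one parse.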


COVERED ONLY BECAUSE IT'S IN THE SYLLABUS:

Algorithmic Information Theory (AIT): It's about finding the shortest way to describe
something: the complexity of a piece of data is the length of the shortest program
that can reproduce it. In data compression, it means trying to represent data using as
little space as possible while still keeping all the important information.

Minimum Description Length (MDL) Principle: This principle says that the best
description of the data is the one that minimizes the length of the description of the
model plus the length of the data encoded with the help of that model. In data
compression, it means choosing the scheme for which the compressed data together
with the description of the compression method is as short as possible.

When applied to data compression:

AIT helps us understand how complex the data is and how much we can compress it
theoretically.

The MDL principle guides us in finding the most efficient compression method by
considering both the data and the description of how it's compressed.

So, these concepts help in creating compression methods that make data smaller
while still being able to recreate the original data accurately.
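As a very rough, illustrative back-of-the-envelope calculation of the MDL idea (the bit counts below are assumptions based on the "mississippi" example earlier, not the output of a real coder): the total cost of a compressed representation is the cost of describing the model plus the cost of the data encoded with that model, and the best model is the one that minimizes this total.

def description_length(model_bits, data_bits):
    # Two-part cost: bits for the model + bits for the encoded data.
    return model_bits + data_bits

raw_bits = 11 * 8                 # "mississippi" stored as plain 8-bit characters
mdl_bits = description_length(
    model_bits=4 * (8 + 3),       # rough cost of a 4-entry code table (assumed)
    data_bits=21,                 # encoded length from the earlier example
)
print(raw_bits, mdl_bits)         # 88 vs 65: here the model pays for itself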
All the Best
"Enjoyed these notes? Feel free to share them with

your friends and provide valuable feedback in your

review. If you come across any inaccuracies, don't

hesitate to reach out to the author for clarification.

Your input helps us improve!"

Visit: www.collegpt.com

