Imc14 05 Dictionary Codes

This document discusses dictionary-based coding techniques, in which commonly occurring patterns in the data are encoded compactly as indexes into a dictionary, while uncommon patterns fall back to a default encoding, the goal being a smaller average number of bits per symbol. It covers static and adaptive dictionaries, the LZ77 and LZ78 algorithms, which build the dictionary dynamically from previously encoded data, and variations such as LZW. It also notes the weaknesses of these methods, such as recurring patterns that fall outside the search window and dictionary entries that are still incomplete during decoding.


Dictionary-based Coding Techniques

National Chiao Tung University


Chun-Jen Tsai
10/16/2014
Rationale

In the previous two chapters, we looked at coding
techniques that assume the source generates a
sequence of independent symbols.
Most data sources are correlated, so the coding step is
generally preceded by a de-correlation step (e.g., prediction
based on a model).
Alternatively, we can build a list of commonly
occurring patterns and encode these patterns by
transmitting their index in the list
→ dictionary techniques

2/31
Static vs. Adaptive Dictionary

The dictionary holds a list of strings of symbols and it


may be static or dynamic (adaptive)
Static dictionary – permanent, sometimes allowing
the addition of strings but no deletions
Dynamic dictionary – holding strings previously found
in the input stream, allowing for additions and
deletions of strings as new input symbols are being
read

3/31
Basic Idea of Dictionary Coding

Given an input source, we want to


Identify frequent symbol patterns
Encode those more efficiently
Use a default (less efficient) encoding for the rest
Hopefully, the average bits per symbol gets smaller
In general, dictionary-based techniques work well
for highly correlated data (e.g. text), but are less efficient
for data with low correlation (e.g. i.i.d. sources)

4/31
Motivating Example

Consider an ‘English’ source with 26 letters & six


punctuation marks
Single-symbol fixed-length code: 5 bits per symbol (32 symbols)
Four-symbol patterns with a fixed-length code: 20 bits per pattern
(32^4 = 1,048,576 possible patterns)
If we assume an uneven distribution of the symbols
Pick a dictionary which contains the 256 most frequent four-symbol
patterns (total probability p) and encode them with 8 bits
Encode the rest with 20 bits
Use a 1-bit prefix to distinguish the two cases
then, the average rate is 9p + 21(1 – p) = 21 – 12p bits per pattern.
This beats 20 bits whenever 21 – 12p < 20, i.e. p > 1/12 ≈ 0.084.

5/31
Static Dictionary

Using a static dictionary is less complex, but the
probability p of a hit depends heavily on the
application
For student records in a university, a static dictionary is probably fine
The key to success is that the most common
patterns are a small subset of all possible messages
Out of over 100,000 English words, fewer than 2,000
words are used in most writing

6/31
Digram Coding
The dictionary is composed of
All letters from the alphabet
As many digrams (pairs of letters) as possible

For example, if we want to encode pure ASCII text


documents, we can design a dictionary of size 256
entries, and
Source alphabet: 95 printable ASCII symbols
Digrams: 161 most common pairs

7/31
Simple Digram Coding Example

The source alphabet A = {a, b, c, d, r}


Dictionary (3-bit codes for the five letters and three common digrams):

Code  Entry    Code  Entry
000   a        100   r
001   b        101   ab
010   c        110   ac
011   d        111   ad

Try to code the sequence abracadabra; the output is
101100110111101100000 (= 101 100 110 111 101 100 000,
i.e. ab, r, ac, ad, ab, r, a: 21 bits instead of 33 bits with
a 3-bit single-letter code).
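
A minimal Python sketch of this greedy digram coder (the code table below simply transcribes the dictionary above):

# 3-bit codes for single letters and the chosen digrams
codes = {"a": "000", "b": "001", "c": "010", "d": "011", "r": "100",
         "ab": "101", "ac": "110", "ad": "111"}

def digram_encode(text):
    # Greedy rule: take two symbols if the pair is in the dictionary,
    # otherwise fall back to the single-symbol code.
    out, i = [], 0
    while i < len(text):
        if text[i:i + 2] in codes:
            out.append(codes[text[i:i + 2]])
            i += 2
        else:
            out.append(codes[text[i]])
            i += 1
    return "".join(out)

print(digram_encode("abracadabra"))   # -> 101100110111101100000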

8/31
Problem: Which Digrams to Use?
The most frequent digrams differ by source type. Source 1: LaTeX documents; Source 2: C programs

9/31
Adaptive Dictionary Technique

Original ideas published by Jacob Ziv and Abraham


Lempel in 1977 (LZ77/LZ1) and 1978 (LZ78/LZ2)
The best-known dictionary-based technique,
LZW, is a modification of LZ78 published by
Terry Welch in 1984

10/31
LZ77 (1/2)

General approach
Dictionary is a portion of the previously encoded sequence
Use a sliding window for compression
Mechanism
Find the longest match in the search buffer for the string at the
beginning of the look-ahead buffer, and encode it
Rationale
If patterns tend to repeat locally, we should be able to get
more efficient representation

11/31
LZ77 (2/2)
Sliding window is composed of a search buffer and a look-
ahead buffer (note: window size W = S + LA)
Match pointer Search pointer

a _ _ a b r a _ a d a b r a r r a r r a _

Search buffer Look-ahead buffer


(size S = 8) (size LA = 7)

Offset = search pointer – match pointer (o = 7)
Length of match = number of consecutive letters matched (l = 4)
Codeword c = C(r), where C(x) is the codeword for symbol x
Encoding triple: <o, l, c> = <7, 4, C(r)>
If FLC is used and the alphabet size is |A|, <o, l, c> can be
encoded with ⌈log2 S⌉ + ⌈log2 W⌉ + ⌈log2 |A|⌉ bits (the match may
extend into the look-ahead buffer, so the length is coded relative
to W rather than S).

12/31
Possible Cases for Triples

Three different cases may be encountered during
the coding process:
No match in the window for the next character to be encoded
There is a match
The matched string extends into the look-ahead buffer
For each of these cases, we have a triple to signal
the case to the decoder

13/31
LZ77 Encoding Example
Sequence |cadabrar|rarrad|
cabracadabrarrarrad |cadabrar|rarrad|
W = 13, S = 7 |cadabrar|rarrad|
|cabraca|dabrar|rarrad send <3, 3, C(r)>
no match for d Could we do better?
send <0, 0, C(d)> send <3, 5, C(d)> instead
|abracad|abrarr|arrad
|abracad|abrarr|arrad
|abracad|abrarr|arrad
|abracad|abrarr|arrad
send <7, 4, C(r)>
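
A rough Python sketch of this longest-match search (the buffer sizes S = 7, LA = 6 and the start position mirror the example above; a real coder would emit the codeword C(x) rather than the raw symbol). Note that searching for the longest match directly produces the improved triple <3, 5, C(d)>:

def lz77_encode(seq, S, LA, start=0):
    # Emit <offset, length, next-symbol> triples; S is the search-buffer size,
    # LA the look-ahead size, and everything before 'start' is assumed already sent.
    i, triples = start, []
    while i < len(seq):
        best_off, best_len = 0, 0
        max_len = min(LA, len(seq) - i - 1)   # keep one symbol for the 'c' field
        for off in range(1, min(S, i) + 1):
            length = 0
            # a match may run past the search buffer into the look-ahead buffer
            while length < max_len and seq[i - off + length] == seq[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        triples.append((best_off, best_len, seq[i + best_len]))
        i += best_len + 1
    return triples

# Encode the rest of "cabracadabrarrarrad", given that "cabraca" was sent earlier:
print(lz77_encode("cabracadabrarrarrad", S=7, LA=6, start=7))
# -> [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]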

14/31
LZ77 Decoding Example

Current input: <0, 0, C(d)> <7, 4, C(r)> <3, 5, C(d)>


Current output: cabraca
Decode: <0, 0, C(d)>
Decode C(d): c|abracad|
Decode: <7, 4, C(r)>
Start with the first ‘a’, copy four letters: cabra|cadabra|
Decode C(r): cabrac|adabrar|
Decode: <3, 5, C(d)>
Start with the first ‘r’, copy three letters: cabracada|brarrar|
Copy two more letters: cabracadabr|arrarra|
Decode C(d): cabracadabrarrarrad
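
The copy-back step, including the case where the copy overlaps the data just written, can be sketched as follows (the prefix argument stands for the output that already exists):

def lz77_decode(triples, prefix=""):
    out = list(prefix)
    for offset, length, symbol in triples:
        start = len(out) - offset
        for k in range(length):      # the copy may overlap the region being written
            out.append(out[start + k])
        out.append(symbol)
    return "".join(out)

print(lz77_decode([(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')], prefix="cabraca"))
# -> cabracadabrarrarrad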

15/31
LZ77 Variants

For LZ77, we have


Adaptive scheme, no prior knowledge
Asymptotically approaches the source statistics
Assumes that recurring patterns occur close to each other
Possible improvements
Variable-bit encoding: PKZip, zip, gzip, …, etc., uses a
variable-length coder to encode <o, l, c>.
Variable buffer size: a larger buffer captures more recurrences but requires faster searching
Elimination of <0, 0, C(x)>
LZSS sends a flag bit to signal whether the next “token” is an
<o, l> pair or the codeword of a symbol

16/31
Problems with LZ77

If a recurring pattern repeats with a period larger
than the search buffer, the performance is bad: the earlier
occurrence has already slid out of the window by the time
the pattern recurs
Example: a sequence that repeats a block of distinct symbols
with a period longer than the search buffer never finds a match

17/31
LZ78

LZ78 improvements over LZ77


No search buffer – explicit dictionary instead
Encoder/decoder must build dictionary in sync
Encoding: <i, c>
i = index in the dictionary, i = 0 for symbols not in the dictionary
c = code of the following character
Example: encode the following contents (␢ denotes the blank symbol)
wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo
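
A small Python sketch of this <i, c> encoding (the function name and the '␢' character are only illustrative; the next slide tabulates the first few steps):

def lz78_encode(data):
    # Emit <index, next-symbol> pairs; index 0 means the phrase is not yet in the dictionary.
    dictionary = {}          # phrase -> index; the dictionary starts empty
    pairs, p = [], ""
    for a in data:
        if p + a in dictionary:
            p = p + a                            # keep extending a known phrase
        else:
            pairs.append((dictionary.get(p, 0), a))
            dictionary[p + a] = len(dictionary) + 1
            p = ""
    if p:                                        # input ended inside a known phrase
        pairs.append((dictionary[p], ""))
    return pairs

print(lz78_encode("wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo")[:6])
# -> [(0, 'w'), (0, 'a'), (0, 'b'), (3, 'a'), (0, '␢'), (1, 'a')]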

18/31
LZ78 Example

Input: wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo
Dictionaries:

initial dictionary: empty

dictionary after encoding w, a, b:

Encoder output   Index   Entry
<0, C(w)>        1       w
<0, C(a)>        2       a
<0, C(b)>        3       b

final dictionary: built up as encoding continues
(<3, C(a)> adds 4 ba, <0, C(␢)> adds 5 ␢, <1, C(a)> adds 6 wa, …)

19/31
Remarks on LZ78

Observation
If we keep on encoding, the dictionary will keep on growing
Possible solutions
Stop growing the dictionary
Effectively switch to a static dictionary
Prune it
Based on usage statistics
Reset it
Start all over again
The best solution depends on the knowledge of the
source

20/31
LZ78 Variants: LZW
Invented by Terry Welch in 1984
Idea
Instead of <i, c>, encode i only
Algorithm
Initial dictionary contains all alphabet letters, p = null
while (!done)
    read next symbol into a
    if (p*a) is in the dictionary    // Note: '*' stands for concatenation
        p = p*a
    else
        send out index of p
        add p*a to the dictionary
        p = a
end
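
A direct Python transcription of this pseudocode (with an explicit flush of the final p, which the pseudocode leaves implicit); the alphabet order below is chosen to match the dictionary on the next slide, and '␢' again stands for the blank symbol:

def lzw_encode(data, alphabet):
    # Dictionary indices start at 1 to match the example's numbering.
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_index = len(dictionary) + 1
    p, output = "", []
    for a in data:                        # read next symbol into a
        if p + a in dictionary:           # p*a already known: extend the phrase
            p = p + a
        else:
            output.append(dictionary[p])      # send out index of p
            dictionary[p + a] = next_index    # add p*a to the dictionary
            next_index += 1
            p = a
    if p:
        output.append(dictionary[p])          # flush the last phrase
    return output

text = "wabba␢" * 4 + "woo␢woo␢woo"
print(lzw_encode(text, "␢abow"))
# -> [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]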

21/31
Example: LZW Encoding

Input: wabba␢wabba␢wabba␢wabba␢woo␢woo␢woo
Dictionaries:
initial dictionary (source alphabet):

Index   Entry
1       ␢
2       a
3       b
4       o
5       w

final dictionary: grows to 25 entries as encoding proceeds
(6 wa, 7 ab, 8 bb, 9 ba, 10 a␢, 11 ␢w, 12 wab, …)
Output: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
22/31
Problems with LZW Decoding

Decoding of LZW is simple, in general


Output symbols from the dictionary as indexed by the inputs
Construct the dictionary on-the-fly as the encoder does

However, if the message contains a pattern cScS …,
where c is a character and S is a string, the decoder may
be asked to use a dictionary entry that is still only
partially constructed
Solution: the entry under construction is held in p, so we
should allow reading the partial data out of p during decoding;
its missing last character must equal the first character of p
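
A minimal Python sketch of a decoder that handles this case: when a code refers to the entry still under construction, that entry must be the previous string plus its own first character:

def lzw_decode(codes, alphabet):
    dictionary = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_index = len(dictionary) + 1
    prev = dictionary[codes[0]]              # the first code is always a single symbol
    output = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                                # code == next_index: entry not finished yet
            entry = prev + prev[0]
        output.append(entry)
        dictionary[next_index] = prev + entry[0]   # complete the pending entry
        next_index += 1
        prev = entry
    return "".join(output)

# The special case of the next slide: alphabet {a, b}, input abababab
# (the complete encoder output for abababab works out to 1 2 3 5 2)
print(lzw_decode([1, 2, 3, 5, 2], "ab"))     # -> abababab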

23/31
Example: Special Case in Decoding

Alphabet A = {a, b}, input is abababab, encoder output
is 1 2 3 5 …
Decoding dictionaries:

initial dictionary: 1 a, 2 b
intermediate dictionary (after decoding 1 2 3): adds 3 ab and 4 ba;
entry 5 (= ab?) is still under construction

When we reach the decoding of 5, we only have p = ab?: we do not
have the complete output! (Using the rule above, the missing character
is the first character of p, so entry 5 = aba.)

24/31
Application: Compress

An early implementation of LZW


Adaptive dictionary, starts with 2^9 = 512 entries
User can configure the maximum codeword length b_max = 9~16 bits
The dictionary doubles in size each time it fills, up to 2^b_max entries
When the dictionary reaches 2^b_max entries, it becomes a static
dictionary encoder
If the compression ratio falls below a threshold, the dictionary
is reset

25/31
Application: GIF Images

LZW scheme, similar to compress:


Clear code is used to reset the encoder/decoder; for
b bits/pixel images, 2^b is used as the clear code
Dictionary size is initially 2^(b+1)
Dictionary size can grow up to 4096 entries
Format:
Codewords are stored in blocks of 8-bit characters
Each block begins with a header giving a byte count of up to 255,
and ends with a block terminator symbol (8 zero bits)
The last block has an end-of-information code, 2^b + 1, before
the block terminator

26/31
GIF Performance

GIF vs. arithmetic coding

27/31
Application: PNG Images

Based on LZ77, patent-free alternative to GIF


Designed specifically for lossless image compression
Modes: true color, grayscale, 8-bit palette
Two autonomous compression components
Deflate (RFC 1951) — LZ77-style dictionary compression
algorithm plus Huffman coding
Filtering — lossless transformations of byte-level image data

28/31
PNG – Deflate

Deflate = LZ77 + Huffman


Three types of data blocks
Uncompressed, LZ77 + fixed Huffman, LZ77 + adaptive
Huffman
Match lengths are between 3 and 258 bytes
Strings of at least 3 bytes inside the sliding window are searched
for a match; if no match is found, the first byte is encoded as a
literal and the window slides by one byte
At each step, LZ77 outputs either a codeword for a literal or
a paired value of <match_length, offset>
Match length is encoded by an index code (257~285) plus a
selector code (0~5 extra bits)
Offset (1~32768) is encoded using a Huffman code
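
Python's built-in zlib module produces this Deflate bitstream, so a quick round-trip can show the LZ77 + Huffman stage on its own (a sketch only; PNG's filtering step is not included):

import zlib

data = b"abracadabra " * 100
# wbits = -15 selects a raw Deflate stream with the maximum 32 KB window
comp = zlib.compressobj(9, zlib.DEFLATED, -15)
stream = comp.compress(data) + comp.flush()
restored = zlib.decompress(stream, -15)
print(len(data), "->", len(stream), "bytes, lossless:", restored == data)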
29/31
PNG – Filtering

Filters are applied on a scanline-by-scanline basis


All algorithms applied to bytes (not pixels)
Filter types:
None: unmodified value
Sub: difference from previous byte value (mod 256)
Up: difference from the byte value above
Average: subtract average of the left and the above bytes
Paeth:
Compute initial estimate by left + above – upper_left
The value of left, above, or upper_left that is closest to the
initial estimate is used as the estimate
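
A small Python sketch of the Paeth predictor described above (ties are resolved in the order left, above, upper_left, as in the PNG specification):

def paeth_predictor(left, above, upper_left):
    # Initial estimate, then pick whichever neighbour is closest to it.
    p = left + above - upper_left
    pa, pb, pc = abs(p - left), abs(p - above), abs(p - upper_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return upper_left

# The filtered byte is then (raw_byte - paeth_predictor(left, above, upper_left)) mod 256
print(paeth_predictor(120, 125, 118))   # -> 125 (the estimate 127 is closest to 'above')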

30/31
PNG: Performance

PNG vs. GIF vs. arithmetic coding

31/31
