A Software Implementation of the Shannon-Fano Coding Algorithm

Student authors: Đorđe K. Manoilov1 and Daniel S. Dimitrov1
Mentors: Radomir Stanković2 and Dušan Gajić2

1 Đorđe Manoilov and Daniel Dimitrov are with the Faculty of Electronic Engineering, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mails: [email protected], [email protected].
2 Radomir Stanković and Dušan Gajić are with the University of Niš, Faculty of Electronic Engineering, Aleksandra Medvedeva 14, 18000 Niš, Serbia, E-mails: [email protected], [email protected].
Abstract – The Shannon-Fano coding technique is one of the earliest algorithms that produce code words with low redundancy, and it serves as a basis for several more recent methods. In this paper, we present a C# implementation of the Shannon-Fano encoding method for data compression. We conducted various experiments with different inputs provided to the application and recorded the compression ratios and algorithm running times. The presented solution features a graphical user interface and has solid real-world performance, but it was developed primarily as an educational tool that can help students to better understand this encoding technique.

Keywords – Shannon-Fano encoding, C# programming solution, text compression.
I. INTRODUCTION

Data compression is a mathematical method, an algorithm used to decrease the number of bits in a file that are necessary for storing, sending, or transferring electronic information. In other words, compression decreases the size of a file or of a group of files, so the space needed for storing the information becomes smaller.

Some compression methods lose data, but here we discuss only compression that occurs without loss. The benefit is that the compressed data decompress into exactly the same form (the data are recovered into their initial state); the drawback is that an error of even a single bit can be fatal. Lossless compression can be realized with different algorithms, such as RLE (Run-Length Encoding), algorithms that remove runs of zeros, the Shannon-Fano algorithm, and the Huffman algorithm [1].

We discuss Shannon-Fano compression, which is based on an algorithm that uses prefix coding [1]. In this paper, we present its implementation and include test results for different textual files [7]. In Section II we describe the theoretical basis of Shannon-Fano coding. Next, in Section III we present a software solution for data compression using the Shannon-Fano algorithm, realized in the C# programming language. This application is developed mainly for educational purposes. In Section IV we give experimental results for the data compression ratio, running time, and number of different characters. We close the paper with some conclusions in the final section.
II. SHANNON-FANO ALGORITHM

A. Theoretical basis and the algorithm

Shannon-Fano coding was developed by Claude Elwood Shannon and Robert Fano [1]. It is a technique which uses prefix encoding and is based on a set of symbols and their probabilities.

A prefix code is a code system characterized by the prefix property: no valid code word in the system is a prefix (start) of any other valid code word in the set. Using a prefix code, a message can be transmitted as a sequence of concatenated code words, without any extra markers to frame the words in the message. The recipient decodes the message by repeatedly searching for prefixes that form valid code words; a decoding sketch illustrating this is given below. Such decoding is not possible with codes that lack the prefix property.

Shannon-Fano coding starts with the set of symbols, with elements arranged in order from most probable to least probable. The set is then divided into two sets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned: symbols in the first set receive "0" and symbols in the second set receive "1". Shannon-Fano coding thus builds a binary tree structure. As long as any set with more than one member remains, the same process is repeated on it. When a set has been reduced to one symbol, that symbol's code is complete and will not form the prefix of any other symbol's code.

The algorithm produces codes of variable and fairly efficient length. When the two smaller sets produced by a partitioning have exactly equal probabilities, the one bit of information used to distinguish them is used most efficiently. However, the Shannon-Fano algorithm does not always produce codes of optimal length: for the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15}, for example, Shannon-Fano coding does not give the optimal-length code.
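The following sketch shows how the prefix property makes such decoding possible. It is a minimal illustration in the C# used throughout the paper, not the application's own decoder; the dictionary symbolFor, which maps code words back to symbols, is an assumed helper.

    using System.Collections.Generic;
    using System.Text;

    // Scans the bit string and emits a symbol as soon as the accumulated
    // bits match a valid code word. No markers are needed because no code
    // word is a prefix of another one.
    static string Decode(string bits, Dictionary<string, char> symbolFor)
    {
        var output = new StringBuilder();
        string current = "";
        foreach (char b in bits)
        {
            current += b;
            if (symbolFor.ContainsKey(current))
            {
                output.Append(symbolFor[current]);  // complete code word found
                current = "";
            }
        }
        return output.ToString();
    }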
The Shannon-Fano compression uses a binary tree as its data structure, where the encoded symbols are placed in the leaves of the tree. The tree is constructed in a specific way in order to define an effective code table. The actual algorithm is simple (a C# sketch of these steps follows the list):

1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts, so that each symbol's relative frequency of occurrence is known.
2. Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency count of the left half being as close as possible to the total of the right half.
4. Assign the binary digit 0 to the left half of the list and the digit 1 to the right half. This means that the codes for the symbols in the first half will all start with 0, and the codes in the second half will all start with 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
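The sketch below implements the five steps under our own naming (the class ShannonFano and its methods are illustrative and are not taken from the application described in Section III). Frequencies are counted and sorted first; the recursive Assign then splits each range as evenly as possible and appends one code bit per split.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class ShannonFano
    {
        // Steps 3-5: split the range [lo, hi] of the frequency-sorted list so
        // that the two halves' totals are as close as possible, append 0/1 to
        // the codes, and recurse until every range holds a single symbol.
        static void Assign(List<KeyValuePair<char, int>> symbols,
                           Dictionary<char, string> codes, int lo, int hi)
        {
            if (lo >= hi) return;  // one symbol left: its code is complete

            int total = 0;
            for (int i = lo; i <= hi; i++) total += symbols[i].Value;

            int split = lo, running = 0, bestDiff = int.MaxValue;
            for (int i = lo; i < hi; i++)  // find the most even split point
            {
                running += symbols[i].Value;
                int diff = Math.Abs(total - 2 * running);
                if (diff < bestDiff) { bestDiff = diff; split = i; }
            }

            for (int i = lo; i <= split; i++) codes[symbols[i].Key] += "0";
            for (int i = split + 1; i <= hi; i++) codes[symbols[i].Key] += "1";

            Assign(symbols, codes, lo, split);
            Assign(symbols, codes, split + 1, hi);
        }

        // Steps 1-2: count the frequency of each character, sort in
        // descending order, then build the code table recursively.
        public static Dictionary<char, string> BuildCodeTable(string text)
        {
            var freq = new Dictionary<char, int>();
            foreach (char c in text)
                freq[c] = freq.ContainsKey(c) ? freq[c] + 1 : 1;

            var sorted = freq.OrderByDescending(p => p.Value).ToList();

            var codes = new Dictionary<char, string>();
            foreach (var p in sorted) codes[p.Key] = "";

            Assign(sorted, codes, 0, sorted.Count - 1);
            return codes;
        }
    }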
B. The field of use

Shannon-Fano coding is used in the IMPLODE compression method [2], which is part of the ZIP file format. The Huffman algorithm [1], an improved version of the Shannon-Fano algorithm, is used to compress music files in the MP3 format and for JPEG picture compression [8].

III. ARCHITECTURE OF THE APPLICATION AND THE PROGRAMMING IMPLEMENTATION

The application is developed in Visual C# .NET 3.5 and can only be used on the Microsoft Windows operating system.

The application consists of four forms (Fig. 2). The "Main form" is used for selecting a file for coding or for manual input of the text to be coded; it also displays the symbols and their respective codes (Fig. 1). It is possible to save the coded text to a desired location on disk or on another medium. The "Manual form" offers a brief user manual. The "Statistics form" (Fig. 3) shows the degree of compression for the selected text. The "Information form" contains information about the authors of the application.

The text to be compressed is placed into a string variable. The application contains a function that separates the different nodes and calculates their probabilities of occurrence. The probability of occurrence of a symbol is calculated as the ratio of the number of occurrences of that symbol to the total number of symbols in the file. For the purposes of the algorithm, these symbols must be arranged in ascending or descending order. After sorting, the symbols are encoded by calling the Shannon-Fano algorithm implementation: every symbol in the text is replaced with its code via the library function String.Replace, and the result is put into a new string. A sketch of this encoding step is given below.
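As a minimal sketch of this step (Encode is our illustrative name, not the application's actual function; it uses a StringBuilder rather than repeated string concatenation, which also avoids the memory problem discussed in Section IV):

    using System.Collections.Generic;
    using System.Text;

    // Replaces every symbol of the input text with its code word and
    // returns the concatenated string of '0' and '1' characters.
    static string Encode(string text, Dictionary<char, string> codes)
    {
        var bits = new StringBuilder();
        foreach (char c in text)
            bits.Append(codes[c]);
        return bits.ToString();
    }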
C# does not support working directly at the level of bits. Therefore, before writing to a binary file, each sequence of 8 code characters is stored in a buffer the size of one byte. A 0 is entered into the buffer by shifting its contents to the left (shift-left); a 1 is entered using a shift-left followed by a logical OR with 0x01. This works directly only if the length of the coded text is divisible by 8, so it is necessary to pad the buffer with additional 0 bits for the last entry in the file. The padding increases the encoded file by up to 7 bits, but it allows bit-level operations to be simulated in C#, as the following sketch shows.
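The sketch below illustrates this buffering scheme (WriteBits is our illustrative name; the shift and OR operations are the ones described above):

    using System.IO;

    // Packs a string of '0'/'1' characters into bytes and writes them to a
    // binary file; the final, partial byte is padded with 0 bits.
    static void WriteBits(string bits, string path)
    {
        using (var writer = new BinaryWriter(File.Open(path, FileMode.Create)))
        {
            int buffer = 0, count = 0;
            foreach (char bit in bits)
            {
                buffer <<= 1;                    // shift-left makes room for the new bit
                if (bit == '1') buffer |= 0x01;  // OR sets the bit when it is a 1
                if (++count == 8)
                {
                    writer.Write((byte)buffer);  // a full byte goes to the file
                    buffer = 0;
                    count = 0;
                }
            }
            if (count > 0)                       // pad the last byte with up to 7 zero bits
                writer.Write((byte)(buffer << (8 - count)));
        }
    }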
IV. EXPERIMENTAL RESULTS

The application was tested on various input files in order to measure the compression time and the compression percentage; all input files are textual. All of the experiments were performed on a laptop PC with an Intel Core 2 Duo T5450 processor and 3 GB of RAM, running the Windows XP Service Pack 2 operating system. The duration of compression depends on the computer's hardware and on the current utilization of its resources. The test results for plain text are shown in Table I, and the test results for source code are shown in Table II. Tables I and II show that the compression speed and the compression ratio depend on the number of different characters and on the file size. For a small number of different characters, encoding is fast regardless of the size of the file, because each character is encoded with a small number of bits and the string operations complete quickly. For very large files (about 60 MB) the application reports a "Memory error"; the problem is caused by the use of immutable strings and can be solved by using a StringBuilder. The coding time for a normal text file, such as source code or a book, is at most a few seconds. The compression ratio is about 50%, but it goes up to 80% when a file contains many repetitions of the same characters. From the presented results we can also conclude that the compression ratio for source code is lower than for plain text.
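The compression ratio cr reported in Tables I and II is the percentage reduction in file size; the one-line helper below (our own, for illustration) reproduces the tabulated values, e.g. 39 B -> 12 B gives (1 - 12/39) * 100 ≈ 69.2 %.

    // Percentage reduction in size, matching the cr column of Tables I and II.
    static double CompressionRatio(long originalBytes, long compressedBytes)
    {
        return (1.0 - (double)compressedBytes / originalBytes) * 100.0;
    }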
TABLE I
DIFFERENT TEXT FILES: n – NUMBER OF DIFFERENT CHARACTERS, t1 – CODING TIME, t2 – RECORDING TIME, fs – FILE SIZE, cr – COMPRESSION RATIO

 n    t1        t2          fs                     cr
 4    0 ms      15.625 ms   10 B -> 3 B            70 %
 10   0 ms      0 ms        10 B -> 5 B            50 %
 15   0 ms      15.625 ms   21 B -> 11 B           47.6 %
 5    0 ms      15.625 ms   39 B -> 12 B           69.2 %
 47   0 ms      15.625 ms   1.88 KB -> 1006 B      47.7 %
 116  656.2 ms  703.12 ms   100.9 KB -> 49.8 KB    50.6 %
 116  31.9 s    29 s        4847 KB -> 2394 KB     50.6 %
 3    10.01 s   3.625 s     23523 KB -> 4324 KB    81.8 %
 3    27 s      Mem. error  61.2 MB -> ?           ?
 79   5.112 s   3.718 s     889 KB -> 501,B        43.26 %
TABLE II
SOURCE CODE FILES: n – NUMBER OF DIFFERENT CHARACTERS, t1 – CODING TIME, t2 – RECORDING TIME, fs – FILE SIZE, cr – COMPRESSION RATIO

 n    t1      t2      fs                    cr
 92   807 ms  620 ms  126 KB -> 55.1 KB     56.2 %
 95   186 ms  144 ms  29.2 KB -> 14.2 KB    51.13 %
 71   44 ms   36 ms   8.56 KB -> 4.55 KB    46.88 %
 90   187 ms  118 ms  29.7 KB -> 12.8 KB    56.99 %
 90   153 ms  116 ms  21.3 KB -> 13.2 KB    38.14 %
 71   19 ms   17 ms   3.1 KB -> 1.9 KB      38.71 %
 58   5 ms    7 ms    1 KB -> 647 B         41.5 %
 65   28 ms   22 ms   5.4 KB -> 3.15 KB     41.9 %
 63   13 ms   12 ms   2.3 KB -> 1.35 KB     41.61 %

V. CONCLUSION

Through the experiments performed with our implementation of the Shannon-Fano algorithm we reached the following conclusions:
- The most common characters have the shortest code words, and vice versa.
- For the same number of different characters, the algorithm achieves the same compression ratio.
- For two files of the same size but with a different number of unique characters, the file with the smaller number of different characters has the higher compression ratio.
- The time required for encoding and recording increases with the size of the input file.

The application we have developed cannot compete with existing commercial data compression applications. It was developed primarily as an educational tool that can help students better understand this encoding technique, which serves as the basis of more recent compression methods.

REFERENCES

[1] D. Salomon, Data Compression: The Complete Reference, 3rd Edition, Springer, 2004, ISBN 0-387-40697-2.
[2] http://en.wikipedia.org/wiki/Shannon%E2%80%93Fano_coding, website last visited on 14/04/2011.
[3] http://www.ustudy.in/node/6409, website last visited on 15/12/2010.
[4] http://www.binaryessence.com/dct/en000041.htm, website last visited on 14/04/2011.
[5] http://cppgm.blogspot.com/2008/01/shano-fano-code.html, website last visited on 14/04/2011.
[6] http://www.dotnetspark.com/Forum/169-how-to-open-one-chm-help-file-c-sharp-windows.aspx, website last visited on 14/04/2011.
[7] http://www.onlinehowto.net/Why-compress-/2, website last visited on 14/04/2011.
[8] http://en.wikipedia.org/wiki/Huffman_coding, website last visited on 14/04/2011.