

Huffman coding
In computer science and information theory, a Huffman code is
a particular type of optimal prefix code that is commonly used for
lossless data compression. The process of finding or using such a
code proceeds by means of Huffman coding, an algorithm
developed by David A. Huffman while he was a Sc.D. student at
MIT, and published in the 1952 paper "A Method for the
Construction of Minimum-Redundancy Codes".[1]

The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a
source symbol (such as a character in a file). The algorithm derives this table from the estimated
probability or frequency of occurrence (weight) for each possible value of the source symbol. As in
other entropy encoding methods, more common symbols are generally represented using fewer bits
than less common symbols. Huffman's method can be efficiently implemented, finding a code in time
linear to the number of input weights if these weights are sorted.[2] However, although optimal among
methods encoding symbols separately, Huffman coding is not always optimal among all compression
methods - it is replaced with arithmetic coding[3] or asymmetric numeral systems[4] if better
compression ratio is required.

[Figure: Huffman tree generated from the exact frequencies of the text "this is an example of a
huffman tree". The frequencies and codes of each character are below. Encoding the sentence with
this code requires 135 (or 147) bits, as opposed to 288 (or 180) bits if 36 characters of 8 (or 5) bits
were used. (This assumes that the code tree structure is known to the decoder and thus does not
need to be counted as part of the transmitted information.)]

Char   Freq  Code
space  7     111
a      4     010
e      4     000
f      3     1101
h      2     1010
i      2     1000
m      2     0111
n      2     0010
s      2     1011
t      2     0110
l      1     11001
o      1     00110
p      1     10011
r      1     11000
u      1     00111
x      1     10010

Contents

History
Terminology
Problem definition
Informal description
Formalized description
Example
Basic technique
Compression
Decompression
Main properties
Optimality
Variations
n-ary Huffman coding
Adaptive Huffman coding
Huffman template algorithm
Length-limited Huffman coding/minimum variance Huffman coding
Huffman coding with unequal letter costs
Optimal alphabetic binary trees (Hu–Tucker coding)
The canonical Huffman code
Applications
References
External links

History

In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a
term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of
finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient,
was about to give up and start studying for the final when he hit upon the idea of using a
frequency-sorted binary tree and quickly proved this method the most efficient.[5]

In doing so, Huffman outdid Fano, who had worked with information theory inventor Claude Shannon
to develop a similar code. Building the tree from the bottom up guaranteed optimality, unlike the
top-down approach of Shannon–Fano coding.

Terminology
Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a
prefix code (sometimes called a "prefix-free code"; that is, the bit string representing some particular
symbol is never a prefix of the bit string representing any other symbol). Huffman coding is such a
widespread method for creating prefix codes that the term "Huffman code" is widely used as a
synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.
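
The prefix property is easy to check mechanically. The sketch below (ours, not from the article) tests
whether any codeword begins another; after sorting, a prefix sorts immediately before its extension:

    def is_prefix_free(codewords):
        """Return True if no codeword is a prefix of another (the prefix property)."""
        codes = sorted(codewords)  # a prefix, if present, sorts directly before its extension
        return all(not codes[i + 1].startswith(codes[i]) for i in range(len(codes) - 1))

    print(is_prefix_free(["010", "011", "11", "00", "10"]))  # True: the example code below
    print(is_prefix_free(["0", "01"]))                       # False: "0" prefixes "01"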

Problem definition

Informal description
Given
A set of symbols and their weights (usually proportional to probabilities).
Find
A prefix-free binary code (a set of codewords) with minimum expected codeword length
(equivalently, a tree with minimum weighted path length from the root).

Formalized description


Input.

Alphabet A = (a1, a2, ..., an), which is the symbol alphabet of size n.

Tuple W = (w1, w2, ..., wn), which is the tuple of the (positive) symbol weights (usually
proportional to probabilities), i.e. wi = weight(ai) for 1 ≤ i ≤ n.

Output.

Code C(W) = (c1, c2, ..., cn), which is the tuple of (binary) codewords, where ci is the codeword
for ai, 1 ≤ i ≤ n.

Goal.

Let L(C(W)) = w1·length(c1) + ... + wn·length(cn) be the weighted path length of code C.
Condition: L(C(W)) ≤ L(T(W)) for any code T(W).

[Figure: Constructing a Huffman tree]


Example

We give an example of the result of Huffman coding for a code with five characters and given weights.
We will not verify that it minimizes L over all codes, but we will compute L and compare it to the
Shannon entropy H of the given set of weights; the result is nearly optimal.

Symbol (ai)                                    a      b      c      d      e      Sum

Input (A, W)
Weights (wi)                                   0.10   0.15   0.30   0.16   0.29   = 1

Output C
Codewords (ci)                                 010    011    11     00     10
Codeword length (li), in bits                  3      3      2      2      2
Contribution to weighted path length (li wi)   0.30   0.45   0.60   0.32   0.58   L(C) = 2.25

Optimality
Probability budget (2^−li)                     1/8    1/8    1/4    1/4    1/4    = 1.00
Information content (−log2 wi), in bits, ≈     3.32   2.74   1.74   2.64   1.79
Contribution to entropy (−wi log2 wi)          0.332  0.411  0.521  0.423  0.518  H(A) = 2.205

For any code that is biunique, meaning that the code is uniquely decodeable, the sum of the
probability budgets across all symbols is always less than or equal to one. In this example, the sum is
strictly equal to one; as a result, the code is termed a complete code. If this is not the case, one can
always derive an equivalent code by adding extra symbols (with associated null probabilities), to make
the code complete while keeping it biunique.

As defined by Shannon (1948), the information content h (in bits) of each symbol ai with non-null
probability wi is

h(ai) = −log2 wi.

The entropy H (in bits) is the weighted sum, across all symbols ai with non-zero probability wi, of the
information content of each symbol:

H(A) = −w1 log2 w1 − w2 log2 w2 − ... − wn log2 wn.

(Note: A symbol with zero probability has zero contribution to the entropy, since
lim(w→0+) w log2 w = 0. So for simplicity, symbols with zero probability can be left out of the
formula above.)

As a consequence of Shannon's source coding theorem, the entropy is a measure of the smallest
codeword length that is theoretically possible for the given alphabet with associated weights. In this
example, the weighted average codeword length is 2.25 bits per symbol, only slightly larger than the
calculated entropy of 2.205 bits per symbol. So not only is this code optimal in the sense that no other
feasible code performs better, but it is very close to the theoretical limit established by Shannon.

In general, a Huffman code need not be unique. Thus the set of Huffman codes for a given probability
distribution is a non-empty subset of the codes minimizing L(C) for that probability distribution.
(However, for each minimizing codeword length assignment, there exists at least one Huffman code
with those lengths.)
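
These figures are easy to reproduce. The following sketch (ours, not part of the article) recomputes
L(C), the Kraft sum, and H(A) for the example weights:

    import math

    # Weights and codewords from the five-symbol example above.
    weights   = {"a": 0.10, "b": 0.15, "c": 0.30, "d": 0.16, "e": 0.29}
    codewords = {"a": "010", "b": "011", "c": "11", "d": "00", "e": "10"}

    # Weighted path length L(C): sum of wi * length(ci).
    L = sum(w * len(codewords[s]) for s, w in weights.items())

    # Kraft sum: for a complete prefix code this equals exactly 1.
    kraft = sum(2 ** -len(c) for c in codewords.values())

    # Shannon entropy H(A): -sum of wi * log2(wi).
    H = -sum(w * math.log2(w) for w in weights.values())

    print(f"L(C) = {L:.2f} bits/symbol")   # 2.25
    print(f"Kraft sum = {kraft}")          # 1.0
    print(f"H(A) = {H:.3f} bits/symbol")   # 2.205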

Basic technique

Compression

The technique works by creating a binary tree of nodes. These can be stored in a regular array, the size
of which depends on the number of symbols, n. A node can be either a leaf node or an internal node.
Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of
appearance) of the symbol and optionally, a link to a parent node which makes it easy to read the
code (in reverse) starting from a leaf node. Internal nodes contain a weight, links to two child
nodes and an optional link to a parent node. As a common convention, bit '0' represents following
the left child and bit '1' represents following the right child. A finished tree has up to n leaf nodes and
n − 1 internal nodes. A Huffman tree that omits unused symbols produces the optimal code
lengths.

The process begins with the leaf nodes containing the probabilities of the symbols they represent.
Then, the process takes the two nodes with smallest probability, and creates a new internal node
having these two nodes as children. The weight of the new node is set to the sum of the weights of the
children. We then apply the process again, on the new internal node and on the remaining nodes (i.e.,
we exclude the two leaf nodes), repeating until only one node remains, which is the root
of the Huffman tree.

The simplest construction algorithm uses a priority queue where the node with lowest probability is
given highest priority:

1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue:
   1. Remove the two nodes of highest priority (lowest probability) from the queue.
   2. Create a new internal node with these two nodes as children and with probability equal to the
      sum of the two nodes' probabilities.
   3. Add the new node to the queue.
3. The remaining node is the root node and the tree is complete.

Since efficient priority queue data structures require O(log n) time per insertion, and a tree with n
leaves has 2n−1 nodes, this algorithm operates in O(n log n) time, where n is the number of symbols.

[Figure: Visualisation of the use of Huffman coding to encode the message
"A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED". In steps 2 to 6, the letters are
sorted by increasing frequency, and the least frequent two at each step are combined and reinserted
into the list, and a partial tree is constructed. The final tree in step 6 is traversed to generate the
dictionary in step 7. Step 8 uses it to encode the message.]

If the symbols are sorted by probability, there is a linear-time (O(n)) method to create a Huffman tree
using two queues, the first one containing the initial weights (along with pointers to the associated
leaves), and combined weights (along with pointers to the trees) being put in the back of the second
queue. This assures that the lowest weight is always kept at the front of one of the two queues:

1. Start with as many leaves as there are symbols.
2. Enqueue all leaf nodes into the first queue (by probability in increasing order so that the least
   likely item is in the head of the queue).
3. While there is more than one node in the queues:
   1. Dequeue the two nodes with the lowest weight by examining the fronts of both queues.
   2. Create a new internal node, with the two just-removed nodes as children (either node can be
      either child) and the sum of their weights as the new weight.
   3. Enqueue the new node into the rear of the second queue.
4. The remaining node is the root node; the tree has now been generated.
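
Assuming presorted input, the two-queue method might be sketched as follows (illustrative code
reusing the hypothetical Node class from the previous sketch, with collections.deque as the queue
type):

    from collections import deque

    def build_tree_sorted(sorted_freqs):
        """Linear-time Huffman tree from (symbol, weight) pairs sorted by ascending weight."""
        leaves = deque(Node(w, symbol=s) for s, w in sorted_freqs)
        internal = deque()  # merged nodes arrive here in nondecreasing weight order

        def pop_lightest():
            # Take from whichever queue holds the smaller weight at its front.
            if not internal or (leaves and leaves[0].weight <= internal[0].weight):
                return leaves.popleft()
            return internal.popleft()

        while len(leaves) + len(internal) > 1:
            a, b = pop_lightest(), pop_lightest()
            internal.append(Node(a.weight + b.weight, left=a, right=b))
        return (leaves or internal)[0]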

Once the Huffman tree has been generated, it is traversed to generate a dictionary which maps the
symbols to binary codes as follows:

1. Start with current node set to the root.
2. If node is not a leaf node, label the edge to the left child as 0 and the edge to the right child as 1.
   Repeat the process at both the left child and the right child.

The final encoding of any symbol is then read by a concatenation of the labels on the edges along the
path from the root node to the symbol.
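
Continuing the earlier sketch, this labeling traversal can be written recursively (again with
illustrative, non-authoritative names):

    def make_code_table(root, prefix="", table=None):
        """Walk the tree; label left edges '0' and right edges '1' to form each codeword."""
        if table is None:
            table = {}
        if root.symbol is not None:             # leaf: the accumulated labels are its code
            table[root.symbol] = prefix or "0"  # a lone-symbol alphabet still needs one bit
        else:                                   # internal node: recurse down both edges
            make_code_table(root.left, prefix + "0", table)
            make_code_table(root.right, prefix + "1", table)
        return table

    freqs = {"a": 0.10, "b": 0.15, "c": 0.30, "d": 0.16, "e": 0.29}
    codes = make_code_table(build_tree(freqs))
    encoded = "".join(codes[s] for s in "abc")  # concatenate codewords to encode a message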

In many cases, time complexity is not very important in the choice of algorithm here, since n here is
the number of symbols in the alphabet, which is typically a very small number (compared to the
length of the message to be encoded); whereas complexity analysis concerns the behavior when n
grows to be very large.

It is generally beneficial to minimize the variance of codeword length. For example, a communication
buffer receiving Huffman-encoded data may need to be larger to deal with especially long symbols if
the tree is especially unbalanced. To minimize variance, simply break ties between queues by choosing
the item in the first queue. This modification will retain the mathematical optimality of the Huffman
coding while both minimizing variance and minimizing the length of the longest character code.

[Figure: A source generates 4 different symbols {a1, a2, a3, a4} with probability
{0.4, 0.35, 0.2, 0.05}. A binary tree is generated from left to right taking the two least probable
symbols and putting them together to form another equivalent symbol having a probability that
equals the sum of the two symbols. The process is repeated until there is just one symbol. The tree
can then be read backwards, from right to left, assigning different bits to different branches. The final
Huffman code is:

Symbol  Code
a1      0
a2      10
a3      110
a4      111

The standard way to represent a signal made of 4 symbols is by using 2 bits/symbol, but the entropy
of the source is 1.74 bits/symbol. If this Huffman code is used to represent the signal, then the
average length is lowered to 1.85 bits/symbol; it is still far from the theoretical limit because the
probabilities of the symbols are different from negative powers of two.]

Decompression

Generally speaking, the process of decompression is simply a matter of translating the stream of
prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each
bit is read from the input stream (reaching a leaf node necessarily terminates the search for that
particular byte value). Before this can take place, however, the Huffman tree must be somehow
reconstructed. In the simplest case, where character frequencies are fairly predictable, the tree can be
preconstructed (and even statistically adjusted on each compression cycle) and thus reused every
time, at the expense of at least some measure of compression efficiency. Otherwise, the information
to reconstruct the tree must be sent a priori. A naive approach might be to prepend the frequency
count of each character to the compression stream. Unfortunately, the overhead in such a case could
amount to several kilobytes, so this method has little practical use. If the data is compressed using
canonical encoding, the compression model can be precisely reconstructed with just B·2^B bits of
information (where B is the number of bits per symbol). Another method is to simply prepend the
Huffman tree, bit by bit, to the output stream. For example, assuming that the value of 0 represents a
parent node and 1 a leaf node, whenever the latter is encountered the tree building routine simply
reads the next 8 bits to determine the character value of that particular leaf. The process continues
recursively until the last leaf node is reached; at that point, the Huffman tree will thus be faithfully
reconstructed. The overhead using such a method ranges from roughly 2 to 320 bytes (assuming an
8-bit alphabet). Many other techniques are possible as well. In any case, since the compressed data
can include unused "trailing bits" the decompressor must be able to determine when to stop
producing output. This can be accomplished by either transmitting the length of the decompressed
data along with the compression model or by defining a special code symbol to signify the end of
input (the latter method can adversely affect code length optimality, however).
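
A decoder along these lines is short to sketch. The code below (ours, reusing the hypothetical
build_tree and make_code_table from the Compression section) walks the tree one bit at a time; a
real format would additionally transmit the tree or code lengths in a header, as discussed above:

    def decode(bits, root):
        """Translate a string of '0'/'1' bits back into symbols by walking the tree."""
        out, node = [], root
        for bit in bits:
            node = node.left if bit == "0" else node.right  # follow one edge per bit
            if node.symbol is not None:  # reaching a leaf completes one symbol
                out.append(node.symbol)
                node = root              # restart at the root for the next codeword
        return "".join(out)

    root = build_tree({"a": 0.10, "b": 0.15, "c": 0.30, "d": 0.16, "e": 0.29})
    codes = make_code_table(root)
    assert decode("".join(codes[s] for s in "badcab"), root) == "badcab"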

Main properties

The probabilities used can be generic ones for the application domain that are based on average
experience, or they can be the actual frequencies found in the text being compressed. The latter case
requires that a frequency table be stored with the compressed text. See the Decompression section
above for more information about the various techniques employed for this purpose.

Optimality
See also: Arithmetic coding § Huffman coding

Huffman's original algorithm is optimal for a symbol-by-symbol coding with a known input
probability distribution, i.e., separately encoding unrelated symbols in such a data stream. However,
it is not optimal when the symbol-by-symbol restriction is dropped, or when the probability mass
functions are unknown. Also, if symbols are not independent and identically distributed, a single code
may be insufficient for optimality. Other methods such as arithmetic coding often have better
compression capability.

Although both aforementioned methods can combine an arbitrary number of symbols for more
efficient coding and generally adapt to the actual input statistics, arithmetic coding does so without
significantly increasing its computational or algorithmic complexities (though the simplest version is
slower and more complex than Huffman coding). Such flexibility is especially useful when input
probabilities are not precisely known or vary significantly within the stream. However, Huffman
coding is usually faster and arithmetic coding was historically a subject of some concern over patent
issues. Thus many technologies have historically avoided arithmetic coding in favor of Huffman and
other prefix coding techniques. As of mid-2010, the most commonly used techniques for this
alternative to Huffman coding have passed into the public domain as the early patents have expired.

For a set of symbols with a uniform probability distribution and a number of members which is a
power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. This
reflects the fact that compression is not possible with such an input, no matter what the compression
method, i.e., doing nothing to the data is the optimal thing to do.

Huffman coding is optimal among all methods in any case where each input symbol is a known
independent and identically distributed random variable having a probability that is dyadic. Prefix
codes, and thus Huffman coding in particular, tend to have inefficiency on small alphabets, where
probabilities often fall between these optimal (dyadic) points. The worst case for Huffman coding
can happen when the probability of the most likely symbol far exceeds 2^−1 = 0.5, making the upper
limit of inefficiency unbounded.

There are two related approaches for getting around this particular inefficiency while still using
Huffman coding. Combining a fixed number of symbols together ("blocking") often increases (and
never decreases) compression. As the size of the block approaches infinity, Huffman coding
theoretically approaches the entropy limit, i.e., optimal compression.[6] However, blocking arbitrarily
large groups of symbols is impractical, as the complexity of a Huffman code is linear in the number of
possibilities to be encoded, a number that is exponential in the size of a block. This limits the amount
of blocking that is done in practice.

A practical alternative, in widespread use, is run-length encoding. This technique adds one step in
advance of entropy coding, specifically counting (runs) of repeated symbols, which are then encoded.
For the simple case of Bernoulli processes, Golomb coding is optimal among prefix codes for coding
run length, a fact proved via the techniques of Huffman coding.[7] A similar approach is taken by fax
machines using modified Huffman coding. However, run-length coding is not as adaptable to as many
input types as other compression technologies.

Variations
Many variations of Huffman coding exist,[8] some of which use a Huffman-like algorithm, and others
of which find optimal prefix codes (while, for example, putting different restrictions on the output).
Note that, in the latter case, the method need not be Huffman-like, and, indeed, need not even be
polynomial time.

n-ary Huffman coding

The n-ary Huffman algorithm uses the {0, 1, ..., n − 1} alphabet to encode messages and build an n-
ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies
as for binary (n equals 2) codes, except that the n least probable symbols are taken together, instead
of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly
form an n-ary tree for Huffman coding. In these cases, additional 0-probability place holders must be
added. This is because the tree must form an n to 1 contractor; for binary coding, this is a 2 to 1
contractor, and any sized set can form such a contractor. If the number of source words is congruent
to 1 modulo n-1, then the set of source words will form a proper Huffman tree.
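
The number of placeholders follows from that congruence; a small sketch (ours) of the arithmetic:

    def num_placeholders(num_symbols, n):
        """Zero-probability placeholders needed so an n-ary Huffman tree is proper."""
        if num_symbols <= 1:
            return 0
        # Smallest k with (num_symbols + k) congruent to 1 modulo n-1.
        return (1 - num_symbols) % (n - 1)

    print(num_placeholders(6, 3))  # 1: seven source words satisfy 7 ≡ 1 (mod 2)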

Adaptive Huffman coding

A variation called adaptive Huffman coding involves calculating the probabilities dynamically
based on recent actual frequencies in the sequence of source symbols, and changing the coding tree
structure to match the updated probability estimates. It is used rarely in practice, since the cost of
updating the tree makes it slower than optimized adaptive arithmetic coding, which is more flexible
and has better compression.

Huffman template algorithm

Most often, the weights used in implementations of Huffman coding represent numeric probabilities,
but the algorithm given above does not require this; it requires only that the weights form a totally
ordered commutative monoid, meaning a way to order weights and to add them. The Huffman
template algorithm enables one to use any kind of weights (costs, frequencies, pairs of weights,
non-numerical weights) and one of many combining methods (not just addition). Such algorithms can
solve other minimization problems, such as minimizing max_i [wi + length(ci)], a problem first
applied to circuit design.

Length-limited Huffman coding/minimum variance Huffman coding

Length-limited Huffman coding is a variant where the goal is still to achieve a minimum
weighted path length, but there is an additional restriction that the length of each codeword must be
less than a given constant. The package-merge algorithm solves this problem with a simple greedy
approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is
the maximum length of a codeword. No algorithm is known to solve this problem in O(n) or
O(n log n) time, unlike the presorted and unsorted conventional Huffman problems, respectively.

Huffman coding with unequal letter costs

In the standard Huffman coding problem, it is assumed that each symbol in the set that the code
words are constructed from has an equal cost to transmit: a code word whose length is N digits will
always have a cost of N, no matter how many of those digits are 0s, how many are 1s, etc. When
working under this assumption, minimizing the total cost of the message and minimizing the total
number of digits are the same thing.

Huffman coding with unequal letter costs is the generalization without this assumption: the letters of
the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission
medium. An example is the encoding alphabet of Morse code, where a 'dash' takes longer to send than
a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the
weighted average codeword length, but it is no longer sufficient just to minimize the number of
symbols used by the message. No algorithm is known to solve this in the same manner or with the
same efficiency as conventional Huffman coding, though it has been solved by Karp
(http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1057615&newsearch=true&queryText=Minimum-redundancy%20coding%20for%20the%20discrete%20noiseless%20channel)
whose solution has been refined for the case of integer costs by Golin
(http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=705558&queryText=dynamic%20programming%20golin%20constructing%20optimal%20prefix-free&newsearch=true).

Optimal alphabetic binary trees (Hu–Tucker coding)

In the standard Huffman coding problem, it is assumed that any codeword can correspond to any
input symbol. In the alphabetic version, the alphabetic order of inputs and outputs must be identical.
Thus, for example, A = {a, b, c} could not be assigned code H(A, C) = {00, 1, 01}, but instead
should be assigned either H(A, C) = {00, 01, 1} or H(A, C) = {0, 10, 11}. This is also known as
the Hu–Tucker problem, after T. C. Hu and Alan Tucker, the authors of the paper presenting the
first O(n log n)-time solution to this optimal binary alphabetic problem,[9] which has some
similarities to Huffman algorithm, but is not a variation of this algorithm. A later method, the Garsia–
Wachs algorithm of Adriano Garsia and Michelle L. Wachs (1977), uses simpler logic to perform the
same comparisons in the same total time bound. These optimal alphabetic binary trees are often used
as binary search trees.[10]

The canonical Huffman code

If weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman
code has the same lengths as the optimal alphabetic code, which can be found from calculating these
lengths, rendering Hu–Tucker coding unnecessary. The code resulting from numerically (re-)ordered
input is sometimes called the canonical Huffman code and is often the code used in practice, due to
ease of encoding/decoding. The technique for finding this code is sometimes called Huffman–
Shannon–Fano coding, since it is optimal like Huffman coding, but alphabetic in weight
probability, like Shannon–Fano coding. The Huffman–Shannon–Fano code corresponding to the
example is {000, 001, 01, 10, 11}, which, having the same codeword lengths as the original solution,
is also optimal. But in canonical Huffman code, the result is {110, 111, 00, 01, 10}.
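
Deriving a canonical code from the code lengths alone is a short exercise. The sketch below (ours,
assuming the common convention of assigning codes by increasing length and breaking ties
alphabetically) reproduces the canonical result quoted above:

    def canonical_code(lengths):
        """Assign canonical codewords given {symbol: code length}, shortest codes first."""
        code = 0
        prev_len = 0
        table = {}
        # Sort by (length, symbol): ties of equal length are broken alphabetically.
        for symbol, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
            code <<= (length - prev_len)  # left-shift when moving to longer codes
            table[symbol] = format(code, f"0{length}b")
            code += 1
            prev_len = length
        return table

    print(canonical_code({"a": 3, "b": 3, "c": 2, "d": 2, "e": 2}))
    # {'c': '00', 'd': '01', 'e': '10', 'a': '110', 'b': '111'}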

Applications
Arithmetic coding and Huffman coding produce equivalent results — achieving entropy — when every
symbol has a probability of the form 1/2^k. In other circumstances, arithmetic coding can offer better
compression than Huffman coding because — intuitively — its "code words" can have effectively non-
integer bit lengths, whereas code words in prefix codes such as Huffman codes can only have an
integer number of bits. Therefore, a code word of length k only optimally matches a symbol of
probability 1/2^k and other probabilities are not represented optimally; whereas the code word length
in arithmetic coding can be made to exactly match the true probability of the symbol. This difference
is especially striking for small alphabet sizes.

Prefix codes nevertheless remain in wide use because of their simplicity, high speed, and lack of
patent coverage. They are often used as a "back-end" to other compression methods. Deflate (PKZIP's
algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization
followed by the use of prefix codes; these are often called "Huffman codes" even though most
applications use pre-defined variable-length codes rather than codes designed using Huffman's
algorithm.

References
1. Huffman, D. (1952). "A Method for the Construction of Minimum-Redundancy Codes"
   (http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf)
   (PDF). Proceedings of the IRE. 40 (9): 1098–1101. doi:10.1109/JRPROC.1952.273898.
2. Van Leeuwen, Jan (1976). "On the construction of Huffman trees"
   (http://www.staff.science.uu.nl/~leeuw112/huffman.pdf) (PDF). ICALP: 382–410. Retrieved
   2014-02-20.
3. Ze-Nian Li; Mark S. Drew; Jiangchuan Liu (2014-04-09). Fundamentals of Multimedia
   (https://books.google.com/books?id=R6vBBAAAQBAJ). Springer Science & Business Media.
   ISBN 978-3-319-05290-8.
4. J. Duda, K. Tahboub, N. J. Gadil, E. J. Delp, The use of asymmetric numeral systems as an
   accurate replacement for Huffman coding
   (http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7170048), Picture Coding Symposium,
   2015.
5. Huffman, Ken (1991). "Profile: David A. Huffman: Encoding the "Neatness" of Ones and Zeroes"
   (http://www.huffmancoding.com/my-uncle/scientific-american). Scientific American: 54–58.
6. Gribov, Alexander (2017-04-10). "Optimal Compression of a Polyline with Segments and Arcs".
   arXiv:1604.07476 (https://arxiv.org/abs/1604.07476) [cs.CG].
7. Gallager, R.G.; van Voorhis, D.C. (1975). "Optimal source codes for geometrically distributed
   integer alphabets". IEEE Transactions on Information Theory. 21 (2): 228–230.
   doi:10.1109/TIT.1975.1055357.
8. Abrahams, J. (1997-06-11). Written at Arlington, VA, USA. Division of Mathematics, Computer &
   Information Sciences, Office of Naval Research (ONR). "Code and Parse Trees for Lossless
   Source Encoding". Compression and Complexity of Sequences 1997 Proceedings. Salerno:
   IEEE: 145–171. CiteSeerX 10.1.1.589.4726. doi:10.1109/SEQUEN.1997.666911.
   ISBN 0-8186-8132-2. S2CID 124587565.
9. Hu, T. C.; Tucker, A. C. (1971). "Optimal Computer Search Trees and Variable-Length
   Alphabetical Codes". SIAM Journal on Applied Mathematics. 21 (4): 514. doi:10.1137/0121057.
   JSTOR 2099603.
10. Knuth, Donald E. (1998), "Algorithm G (Garsia–Wachs algorithm for optimum binary trees)", The
    Art of Computer Programming, Vol. 3: Sorting and Searching (2nd ed.), Addison–Wesley,
    pp. 451–453. See also History and bibliography, pp. 453–454.


External links
Huffman coding in various languages on Rosetta Code (http://rosettacode.org/wiki/Huffman_coding)
Huffman codes (python implementation) (https://gist.github.com/jasonrdsouza/1c9c895f43497d15eb2e)
A visualization of Huffman coding (https://demo.tinyray.com/huffman)
