
INFORMATION AND CONTROL 15, 70-94 (1969)

Graph Theoretic Prefix Codes and Their Synchronizing Properties*

L. S. BOBROW**
Department of Electrical Engineering, University of Massachusetts,
Amherst, Massachusetts 01002

AND

S. L. HAKIMI
Department of Electrical Engineering, Northwestern University, Evanston, Illinois 60201

This paper presents a study of a class of D-nary exhaustive prefix codes which are generated by graphs. In addition to its use as the representation of a code, the structure of a graph is utilized for decoding purposes. The concept of the inclusion property is introduced, and an algorithm for the construction of an inclusion code equivalent to any specified exhaustive prefix code is given. It is shown that the inclusion property is a sufficient condition for a code to have an equivalent graph theoretic realization. Such a graph, which will always have fewer branches than the tree representation of the code, can be found easily. It is demonstrated that exhaustive variable-length inclusion codes are neither anagrammatic nor uniformly composed. Therefore, a necessary and sufficient condition for such codes to be self-synchronizable is g.c.d. (lengths) = 1.

I. INTRODUCTION
In recent years a number of authors, such as Kasami (1961), Huffman (1964), Frazer (1964), Hakimi and Frank (1965), Bredeson and Hakimi (1967), and Hakimi and Bredeson (1968), have applied graph theory to the subject of error-correcting algebraic (fixed-length) codes. The purpose of this paper is to apply graph theory to the area of variable-length codes.

* This research was sponsored by the Air Force Office of Scientific Research,
Office of Aerospace Research, United States Air Force, under grant no.
AF-AFOSR-98-67.
** Formerly at Northwestern University.

A code is a non-empty, finite set $X$ consisting of elements $x_1, x_2, \ldots, x_n$ called code words. Each code word is a finite sequence of symbols called letters, where a letter is an element of a non-empty, finite set $A$ called the alphabet of $X$. For a D-nary code, $A = \{0, 1, \ldots, D-1\}$. The length of a code word $x_i$, denoted by $L(x_i)$ or $L_i$, is the number of letters in the sequence $x_i$. A code in which all of the code words have the same length is said to be a fixed-length (block) code. All other codes are called variable-length codes.
Any concatenation (without spacing) of code words is called a message. A code is said to be uniquely decipherable if for every finite message, there is one and only one concatenation of code words which forms that message. The first [last] $k$ symbols, $k = 1, 2, \ldots, l$, of a code word of length $l$ are said to be a prefix [suffix] of the code word. A prefix [suffix] of a code word of length $l$ is proper if $k < l$. A code has the prefix [suffix] property if no code word is a prefix [suffix] of any other code word. If a code has the prefix [suffix] property, it is called a prefix [anagrammatic] code.¹ A code is exhaustive if any sequence of letters is either a message or the prefix of a message.
If a code $X = \{x_1, x_2, \ldots, x_n\}$ has corresponding probabilities of transmission $p(x_1) = p_1, p(x_2) = p_2, \ldots, p(x_n) = p_n$, such that $\sum_{i=1}^{n} p_i = 1$, then the average length $L$ (of a code word) of the code is given by $L = \sum_{i=1}^{n} p_i L_i$. If a code has minimum average length for a given discrete probability distribution, it is said to be a minimum redundancy (optimum) code.
If $X = \{x_1, x_2, \ldots, x_n\}$ is a D-nary code, the characteristic sum $\sigma$ of $X$ is defined as

$$\sigma = \sum_{i=1}^{n} D^{-L_i}.$$

An alternate form (Ash, 1965) for expressing $\sigma$ is

$$\sigma = \sum_{j=1}^{r} w_{l_j} D^{-l_j},$$

where $w_{l_j}$ is the number of code words of length $l_j$.
Given a set of probabilities $P = \{p_1, p_2, \ldots, p_n \mid \sum_{i=1}^{n} p_i = 1\}$, Huffman (1952) presented a method for the construction of an optimum prefix code. Gilbert and Moore (1959) have shown that, for a given discrete probability distribution, a Huffman code has an average length which is less than or equal to that of any uniquely decipherable code for the same distribution. Furthermore, they proved that every binary Huffman code is exhaustive.

¹ Any prefix code is uniquely decipherable.

It is known that a uniquely decipherable D-nary code with $W = \sum_{j=1}^{r} w_{l_j}$ code words exists if and only if $\sigma \le 1$ (McMillan, 1956 and Kraft, 1949). Also, any uniquely decipherable exhaustive code has the prefix property (Gilbert and Moore, 1959), and such a code exists if and only if $\sigma = 1$ (Gilbert and Moore, 1959 and Fano, 1961).
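
The characteristic sum and the prefix property are easy to check mechanically. The following is a minimal sketch, not part of the original paper; the function names and the two small binary codes are assumptions chosen for illustration. Code words are represented as digit strings, and exact fractions avoid rounding error.

```python
from fractions import Fraction

def characteristic_sum(code, D):
    """Return sigma, the sum of D**(-L_i) over all code words, as an exact fraction."""
    return sum(Fraction(1, D ** len(word)) for word in code)

def has_prefix_property(code):
    """Return True if no code word is a prefix of a different code word."""
    return not any(a != b and b.startswith(a) for a in code for b in code)

exhaustive = ["0", "10", "11"]   # hypothetical code with sigma = 1
incomplete = ["0", "10"]         # hypothetical code with sigma = 3/4

print(characteristic_sum(exhaustive, 2), characteristic_sum(incomplete, 2))
print(has_prefix_property(exhaustive), has_prefix_property(["0", "01"]))
```

Both sample codes have the prefix property, but only the first reaches $\sigma = 1$, which is the condition quoted above for an exhaustive uniquely decipherable code.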
GRAPHICAL REPRESENTATION OF PREFIX CODES
Any D-nary prefix code $X = \{x_1, x_2, \ldots, x_n\}$ can be represented graphically by a coding tree (Fano, 1961). For example, the binary prefix code X = {10, 11, 010, 011, 0000, 0001, 0010, 0011} corresponds to the tree shown in Fig. 1. Each vertex of ith order produces j vertices of (i + 1)st order, where $j \in \{0, 1, 2, \ldots, D-1, D\}$. A vertex which produces no new vertices of higher order (j = 0) is called a terminal vertex. A complete tree is a tree in which every vertex of ith order either produces D vertices of (i + 1)st order or is a terminal vertex. Clearly, the tree of Fig. 1 is complete. It can be shown (Fano, 1961) that a prefix code is exhaustive if and only if its corresponding coding tree is complete. An obvious consequence of this fact is that the number of code words of maximum length is a multiple of D.

[FIG. 1. Coding tree; vertices of 0 order through 4th order.]

[FIG. 2. Decoder network; input, output, and contacts y1 through y4.]

It can be demonstrated (Bobrow, 1968)² that if W is the number of terminal vertices of a complete tree (i.e., W is the number of code words of the corresponding code), then the number of branches in the tree is exactly $[D/(D-1)](W-1)$.
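
As a quick check of the branch count, the sketch below (an illustration only; the helper names are assumptions) counts the branches of the coding tree for the binary code of Fig. 1 and compares the result with $[D/(D-1)](W-1)$. Each branch of the tree corresponds to one distinct non-empty prefix of a code word.

```python
def tree_branches(code):
    """Count branches of the coding tree: one per distinct non-empty prefix."""
    return len({word[:k] for word in code for k in range(1, len(word) + 1)})

def is_complete(code, D):
    """True if every non-terminal vertex has exactly D successors (prefix code assumed, D <= 10)."""
    prefixes = {word[:k] for word in code for k in range(len(word) + 1)}
    internal = prefixes - set(code)
    digits = [str(d) for d in range(D)]
    return all(v + d in prefixes for v in internal for d in digits)

X = ["10", "11", "010", "011", "0000", "0001", "0010", "0011"]
W, D = len(X), 2
print(tree_branches(X), D * (W - 1) // (D - 1), is_complete(X, D))
```

For this code W = 8, so both the direct count and the formula give 14 branches.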

DECODING PREFIX CODES

The most direct general method for decoding any D-nary prefix code is to construct a switching network based upon the tree corresponding to the code. Such a network is formed by replacing each branch, situated between an ith order vertex and an (i + 1)st order vertex, which is labeled $d \in \{0, 1, \ldots, D-1\}$, by a normally open contact that closes when the (i + 1)st digit of a sequence is d. The network input is the 0 order vertex, and the output is the common connection of all the terminal vertices.
The decoder for the code corresponding to Fig. 1 is given in Fig. 2. All of the contacts in Fig. 2 are originally open. If the first bit of a message is 1 [0], then contact $y_1$ [$\bar{y}_1$] is closed. If the second bit is 1 [0], then all contacts $y_2$ [$\bar{y}_2$] are closed; etc. The formation of a path of transmission between input and output corresponds to the detection of a code word. After the reception of a code word, all contacts are reset, and the process is repeated.³

² As a matter of convenience, for the remainder of this paper the symbol # will indicate reference to Bobrow (1968).

As previously indicated, any exhaustive D-nary prefix code having W code words may be decoded by a switching network with $[D/(D-1)](W-1)$ contacts.
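
A software analogue of this decoder can be sketched in a few lines. This is an illustration under stated assumptions, not the switching network itself: the function name `decode` is hypothetical, the accumulated digits play the role of the closed contacts, and recognising a code word plays the role of a completed transmission path.

```python
def decode(message, code):
    """Split a message into code words, emitting a word as soon as one is recognised."""
    words, current = [], ""
    for digit in message:
        current += digit          # analogous to closing the contacts for this digit
        if current in code:       # a path between input and output has formed
            words.append(current)
            current = ""          # reset all contacts
    if current:
        raise ValueError("message ends inside a code word: " + current)
    return words

X = {"10", "11", "010", "011", "0000", "0001", "0010", "0011"}
print(decode("100100011", X))
```

Because the code has the prefix property, the first recognised word is always the correct one, so no look-ahead is needed.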

II. GRAPH THEORETIC PREFIX CODES


A graph G is a finite collection of two types of entities: a set of M vertices (nodes, points) $v_1, v_2, \ldots, v_M$ and a set of N branches (edges, lines) $b_1, b_2, \ldots, b_N$. Each branch is connected between a pair of vertices and is said to be incident to each of the vertices. The degree of a vertex is the number of branches incident to that vertex. A subgraph g of G is a subset of the branches of G which have the same incidence relationships as in G. The set of all branches in G which are not in g is denoted by $\bar{g} = G - g$. The number of branches in a subgraph g is represented by $|g|$. The subgraph of G which contains no branches is denoted by $\emptyset$. An edge sequence is an alternating sequence of vertices and branches, beginning and ending with a vertex, in which each branch is incident to the vertex preceding it and the vertex following it. A path is a subgraph consisting of the branches of an edge sequence in which all of the vertices are distinct. A circuit is a subgraph consisting of the branches of an edge sequence in which all of the vertices except the first and the last are distinct. A graph G is connected if there is a path between every pair of vertices of G. A tree t of a graph G is a maximal subgraph of G which does not contain a circuit. The rank of G, denoted by $R(G)$, is the number of branches in t, i.e., $R(G) = |t|$. If G is connected, $R(G) = M - 1$. A cut-set c of a graph G is a minimal subgraph of G for which $R(G - c) = R(G) - 1$. Let $p_{\alpha\beta}$ be any path between a pair of vertices $(v_\alpha, v_\beta)$ of G. A basic cut-set (with respect to $v_\alpha$ and $v_\beta$) is a cut-set c such that $|p_{\alpha\beta} \cap c|$ is odd. A weighted graph is a graph which has a real nonnegative number associated with each branch. The number associated with a branch is called the weight of that branch. Examples of most of these definitions have been given by Seshu and Reed (1961).

³ An analogous logic network also may be used for decoding purposes. For the given example, each branch is replaced by an AND gate. In addition, a NOT gate is required at each level for complementation. Finally, an 8-input OR gate is connected at the output.

[FIG. 3]

CODES BASED ON PATHS

Let G be a weighted graph with N branches labeled $b_1, b_2, \ldots, b_N$ and let $(v_\alpha, v_\beta)$ be a pair of vertices of G. Suppose that the weight of each branch is an element of the set $\{1, 2, \ldots, D-1\}$, for some integer $D \ge 2$. Partition G into subgraphs $g_1, g_2, \ldots, g_q$, where $g_i \ne \emptyset$ for $i = 1, 2, \ldots, q$, such that $\bigcup_{i=1}^{q} g_i = G$ and $g_i \cap g_j = \emptyset$ for $i \ne j$. Let $P_i^d$ be the subgraph consisting of all of the branches in $g_i$ having weight d. Clearly, $P_i^0 = \emptyset$. Let a D-nary sequence $x = d_1 d_2 \cdots d_l$ be a code word if the subgraph $p = P_1^{d_1} \cup P_2^{d_2} \cup \cdots \cup P_l^{d_l}$ contains a path between $v_\alpha$ and $v_\beta$, and $p - P_l^{d_l}$ does not. Call the set of all distinct code words X. We may now state the following.
THEOREM 1. The set X is a D-nary prefix code.
Proof. Let $x_1, x_2 \in X$, where $x_1 \ne x_2$, and let $x_1 = d_1^1 d_2^1 \cdots d_h^1$, $x_2 = d_1^2 d_2^2 \cdots d_k^2$, with corresponding subgraphs $p_1$ and $p_2$. Suppose that $x_1$ is a prefix of $x_2$. Thus, $L(x_1) < L(x_2)$. However, this implies that $p_2 - P_k^{d_k^2}$ contains a path between $v_\alpha$ and $v_\beta$, a contradiction. Hence, no code word is a prefix of any other code word.
Q.E.D.
The resulting prefix code X will be referred to as the D-nary path code generated by G with respect to $v_\alpha$ and $v_\beta$ (for the given partition).
Example 1. Suppose that D = 3, and G is the weighted graph shown in Fig. 3. A branch labeled $b_i^{(d)}$ signifies that branch $b_i$ has weight d. The arrows indicate vertices $v_\alpha$ and $v_\beta$. Choose the partition: $g_1 = \{b_1, b_2, b_3\}$, $g_2 = \{b_4, b_5\}$, $g_3 = \{b_6\}$, $g_4 = \{b_7\}$, $g_5 = \{b_8\}$, $g_6 = \{b_9\}$. Thus,

$P_1^0 = \emptyset$,  $P_1^1 = \{b_1, b_2\}$,  $P_1^2 = \{b_3\}$
$P_2^0 = \emptyset$,  $P_2^1 = \{b_4\}$,  $P_2^2 = \{b_5\}$
$P_3^0 = \emptyset$,  $P_3^1 = \{b_6\}$,  $P_3^2 = \emptyset$
$P_4^0 = \emptyset$,  $P_4^1 = \{b_7\}$,  $P_4^2 = \emptyset$
$P_5^0 = \emptyset$,  $P_5^1 = \{b_8\}$,  $P_5^2 = \emptyset$
$P_6^0 = \emptyset$,  $P_6^1 = \{b_9\}$,  $P_6^2 = \emptyset$

The code words of the ternary path code X generated by G are:

01    221    1011    021111
11           1211
21

Although the coding tree representation of X contains 18 branches, X can be represented by a graph G which consists of 9 branches.
In a manner similar to the one described above, prefix codes based on
either circuits or cut-sets may be formulated. This paper, however, will
not be concerned with such codes.
Suppose that a weighted graph⁴ is partitioned into subgraphs $g_1, g_2, \ldots, g_q$ such that there exists a subgraph $h \subset g_q$ and no branch in h is contained in any path between $v_\alpha$ and $v_\beta$. Clearly, the subgraph $G - h$ generates the same path code as G, if $G - h$ is partitioned into the subgraphs $g_1, g_2, \ldots, g_q - h$. In this case, h is said to be a non-essential subgraph. Therefore, without loss of generality, only partitioned graphs that do not contain non-essential subgraphs will be considered. Inspection reveals that the graph of Example 1 does not have non-essential subgraphs.
Since a path code X has the prefix property, it may be decoded by a switching network determined from its coding tree. However, any path code X may also be decoded by a switching network based upon the structure of its generating graph G. This is accomplished simply by replacing each branch, which has weight d, in subgraph $g_i$ by a normally open contact that closes when the ith digit of a sequence is d. Since no proper prefix of a code word in X corresponds to a subgraph which contains a path in G, the formation of a path of transmission in the switching network indicates the reception of a code word.
If G contains a branch that is not in any path between $v_\alpha$ and $v_\beta$, clearly, that branch is not necessary for decoding purposes. Thus, a path decoder requires at most $|G|$ contacts.

⁴ For the remainder of the paper, it will be assumed that the weights of all of the branches of a graph G are elements of the set $\{1, 2, \ldots, D-1\}$.

BASIC CUT-SET CODES


Again partition a weighted graph G into nonempty subgraphs $g_1, g_2, \ldots, g_q$ such that $\bigcup_{i=1}^{q} g_i = G$ and $g_i \cap g_j = \emptyset$ for $i \ne j$. Let $C_i^d$ be the subgraph consisting of all of the branches in $g_i$ not having weight d. Clearly, $C_i^0 = g_i$. Let a D-nary sequence $y = d_1 d_2 \cdots d_l$ be a code word if the subgraph $c = C_1^{d_1} \cup C_2^{d_2} \cup \cdots \cup C_l^{d_l}$ contains a basic cut-set with respect to $v_\alpha$ and $v_\beta$, and $c - C_l^{d_l}$ does not. Call the set of all distinct code words Y. The proof of the following theorem is similar to that of Theorem 1, and thus, will not be given.
THEOREM 2. The set Y is a D-nary prefix code.
The prefix code Y will be referred to as the D-nary basic cut-set code generated by G with respect to $v_\alpha$ and $v_\beta$ (for the given partition).
Example 2. For the graph G and the partition of Example 1, we have

$C_1^0 = g_1 = \{b_1, b_2, b_3\}$,  $C_1^1 = \{b_3\}$,  $C_1^2 = \{b_1, b_2\}$
$C_2^0 = g_2 = \{b_4, b_5\}$,  $C_2^1 = \{b_5\}$,  $C_2^2 = \{b_4\}$
$C_3^0 = g_3 = \{b_6\}$,  $C_3^1 = \emptyset$,  $C_3^2 = \{b_6\}$
$C_4^0 = g_4 = \{b_7\}$,  $C_4^1 = \emptyset$,  $C_4^2 = \{b_7\}$
$C_5^0 = g_5 = \{b_8\}$,  $C_5^1 = \emptyset$,  $C_5^2 = \{b_8\}$
$C_6^0 = g_6 = \{b_9\}$,  $C_6^1 = \emptyset$,  $C_6^2 = \{b_9\}$

The code words of the ternary basic cut-set code Y generated by G are:

00    022    122    0210    1010    02110    021110
20    020    120    0212    1012    02112    021112
      102    222    1210
      100    220    1212

Although the tree for Y has 32 branches, $|G| = 9$.
The comments about non-essential subgraphs discussed in reference to path codes hold for the case of basic cut-set codes.
To decode a basic cut-set code Y by a switching network determined by the generating graph G, replace each branch, which has weight d, in subgraph $g_i$ by a normally closed contact that opens when the ith digit of a sequence is an element of $\{0, 1, \ldots, \hat{d}, \ldots, D-1\}$,⁵ i.e., is not d. Since no proper prefix of a code word in Y corresponds to a subgraph which contains a basic cut-set of G, the elimination of all paths of transmission in the switching network indicates the reception of a code word.
As in the case of a path code, a basic cut-set decoder requires at most $|G|$ contacts.

PATH AND BASIC CUT-SET CODES


Now, we will combine a path code with a basic cut-set code to obtain
the following.
THEOREM 3. Suppose a weighted graph G is partitioned into non-empty subgraphs $g_1, g_2, \ldots, g_q$, and suppose that there exists at least one path between a pair of vertices $(v_\alpha, v_\beta)$ of G. If X is the path code generated by G, and Y is the basic cut-set code generated by G, then the set $Z = X \cup Y$ is a D-nary exhaustive prefix code.
Proof. To demonstrate that Z is a prefix code, it suffices to show that if $x \in X$, $y \in Y$, then neither x is a prefix of y, nor y is a prefix of x. Clearly, if g contains a path, then $\bar{g}$ does not contain a basic cut-set, and vice versa. Let p and c be the subgraphs of G corresponding to x and y, respectively. Since p is a subgraph of G which contains a path between $v_\alpha$ and $v_\beta$, and c is a subgraph which contains a basic cut-set, no prefix of x corresponds to a subgraph which contains a basic cut-set. Hence, y is not a prefix of x. Similarly, no prefix of y corresponds to a subgraph which contains a path, and x is not a prefix of y.
To show that Z is an exhaustive code, let $a_1$ be any D-nary sequence such that $a_1$ is not prefixed by any code word, and $a_1$ is neither a message nor the prefix of a message. If $L(a_1) < q$, consider the sequence $a_2 = a_1 0 0 \cdots 0$, where $L(a_2) = q$. Since no prefix of $a_1$ corresponds to a subgraph of G which contains a path, $a_2$ does not correspond to a subgraph which contains a path. But for any graph G with at least one path between $v_\alpha$ and $v_\beta$, if g does not contain a path between $v_\alpha$ and $v_\beta$, $\bar{g}$ must contain a basic cut-set. Thus, a prefix of $a_2$ is a code word. Hence, $a_1$ is a prefix of a message, a contradiction. If $L(a_1) \ge q$, let $a_3$ be the first q digits of $a_1$. Clearly, no prefix of $a_3$ is a code word. Therefore, $a_3$ does not correspond to a subgraph which contains a path. Thus, it must correspond to a subgraph which contains a basic cut-set. Consequently, a prefix of $a_3$ is a code word, a contradiction. Hence, $a_1$ is either a message or the prefix of a message, and Z is exhaustive.
Q.E.D.

⁵ The symbol ^ placed over a digit denotes that the digit is deleted from the set.

The D-nary prefix code $Z = X \cup Y$ will be referred to as the path and basic cut-set (P.B.C.S.) code generated by G with respect to $v_\alpha$ and $v_\beta$.
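
The definitions of the path code and the basic cut-set code can be turned into a short enumeration. The sketch below is an illustration only, under these assumptions: the function names are hypothetical; "contains a basic cut-set" is tested through the fact used in the proof of Theorem 3, namely that a subgraph contains a basic cut-set exactly when its complement contains no path between the two vertices; and the four-branch ternary graph (one branch of weight 1 directly between the terminals, in parallel with a chain of three branches of weights 1, 2, 1) is a made-up example, since the paper's figures are not reproduced here.

```python
def connected(branches, src, dst):
    """Return True if the undirected branches contain a path from src to dst."""
    adj = {}
    for u, v in branches:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def pbcs_code(groups, D, src, dst):
    """groups[i] lists (u, v, weight) for the branches of subgraph g_(i+1); D <= 10 assumed."""
    q = len(groups)
    path_words, cut_words = [], []

    def extend(prefix, closed):
        n = len(prefix)
        # Branches of subgraphs not yet addressed all remain in the complement of c.
        untouched = [(u, v) for g in groups[n:] for (u, v, _) in g]
        if n and connected(closed, src, dst):
            path_words.append(prefix)               # p contains a path
        elif n and not connected(closed + untouched, src, dst):
            cut_words.append(prefix)                # c contains a basic cut-set
        elif n < q:
            for d in range(D):
                chosen = [(u, v) for (u, v, w) in groups[n] if w == d]
                extend(prefix + str(d), closed + chosen)

    extend("", [])
    return sorted(path_words), sorted(cut_words)

# Hypothetical graph: g1 = {b1}, g2 = {b2, b3}, g3 = {b4}; terminals "a" and "b".
groups = [[("a", "u", 1)], [("a", "b", 1), ("u", "w", 2)], [("w", "b", 1)]]
X, Y = pbcs_code(groups, 3, "a", "b")
print(X)
print(Y)
```

For this graph the enumeration yields four path words and seven cut-set words, eleven in all, and their characteristic sum is 8/9 + 3/27 = 1, as Theorem 3 requires.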
Example 3. Combining Examples 1 and 2 results in a ternary P.B.C.S. code $Z = X \cup Y$, which can be represented by a tree with 39 branches or a graph with 9 branches. Incidentally, Z is an optimum ternary prefix code for the English alphabet (plus a "space") with reference to the probability distribution given by Dewey (1923).
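
The figures quoted in Example 3 can be checked directly from the code words listed in Examples 1 and 2. The following sketch is illustrative; the helper names are assumptions, and the word lists are transcribed from those two examples.

```python
from fractions import Fraction

X = "01 11 21 221 1011 1211 021111".split()
Y = ("00 20 022 020 122 120 102 100 222 220 0210 0212 1010 1012 "
     "1210 1212 02110 02112 021110 021112").split()
Z = X + Y

def characteristic_sum(code, D):
    """Exact sum of D**(-length) over the code words."""
    return sum(Fraction(1, D ** len(word)) for word in code)

def tree_branches(code):
    """Branches of the coding tree: one per distinct non-empty prefix."""
    return len({word[:k] for word in code for k in range(1, len(word) + 1)})

W, D = len(Z), 3
print(W, characteristic_sum(Z, D), tree_branches(Z), D * (W - 1) // (D - 1))
```

The 27 words (26 letters plus a space) have characteristic sum 1, and the tree count agrees with $[D/(D-1)](W-1) = 39$.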
In addition to the coding tree method, a P.B.C.S. code can be decoded by the simultaneous use of the path and basic cut-set decoders. Thus, a P.B.C.S. decoder will require at most $2|G|$ contacts.

CODES WITH TWO WORD LENGTHS

For a given code, let $w_l$ denote the number of code words of length l. A code X consisting of $w_1 + w_2 + \cdots + w_Q$ code words is said to be equivalent to a code X' consisting of $w_1' + w_2' + \cdots + w_R'$ code words if $w_l = w_l'$ for $l = 1, 2, \ldots, m$, where $m = \max\{Q, R\}$.
Suppose that X is an exhaustive D-nary prefix code. We would like to be able to determine a graph G that can be partitioned such that the resulting P.B.C.S. code Z is equivalent to X. Although we shall deal with this general problem in Section III, for the case when X has two word lengths $l_1$ and $l_2$, G can be found quite readily. For the sake of simplicity, we now develop such a procedure for binary codes. The subsequent algorithm can be extended to cover the case of non-binary codes.#
Let X be a binary exhaustive prefix code with word lengths $l_1$ and $l_2$. Since X is exhaustive,

$$\sigma = w_{l_1} 2^{-l_1} + w_{l_2} 2^{-l_2} = 1.$$

First consider an exhaustive binary code X' having word lengths $l_1$ and $l_1 + 1$, where

$$\sigma' = w_{l_1} 2^{-l_1} + w_{l_1+1} 2^{-(l_1+1)} = 1.$$

Since X' is exhaustive, it has exactly $w_{l_1+1}/2$ code words of length $l_1 + 1$ which end with the digit d, for all $d \in \{0, 1\}$. Algorithm 1 will describe the construction of a graph G' and the selection of a partition such that the resulting P.B.C.S. code Z' will have code words only of length $l_1$ and $l_1 + 1$, and such that there will be exactly $w_{l_1+1}/2$ words of length $l_1 + 1$ based on paths, hence ending with the digit 1. It then follows that Z' is equivalent to X'. The extension of G' to G (hence, Z' to Z) will be discussed immediately following the algorithm.

[FIG. 4. Parts (A) and (B).]
Algorithm 1. Given $\sigma' = w_{l_1} 2^{-l_1} + w_{l_1+1} 2^{-(l_1+1)} = 1$.
Step 1. Write the integer $w_{l_1+1}/2$ in its binary representation, i.e.,

$$\frac{w_{l_1+1}}{2} = 2^{i_\gamma} + 2^{i_{\gamma-1}} + \cdots + 2^{i_2} + 2^{i_1},$$

where $0 \le i_1 < i_2 < \cdots < i_\gamma$. Since $w_{l_1+1}/2 \ge 2^{i_\gamma}$, then $w_{l_1+1} \ge 2^{i_\gamma+1}$. But $w_{l_1+1} 2^{-(l_1+1)} < 1$ because $w_{l_1} \ne 0$. Thus, $2^{l_1+1} > w_{l_1+1} \ge 2^{i_\gamma+1}$. Hence, $l_1 + 1 > i_\gamma + 1 \Rightarrow l_1 > i_\gamma$.
Step 2. If $l_1 = i_\gamma + 1$, then Fig. 4(A) represents the initial phase in the construction of G'. If $l_1 = i_\gamma + \rho$ ($\rho > 1$), then refer to Fig. 4(B).⁶
Step 3. Construct subgraph $g_{\gamma+1}$ as shown in Fig. 5. If $l_1 = i_\gamma + \rho$ ($\rho > 2$), we define $i_{\gamma+1} = l_1 - 1$. If $l_1 = i_\gamma + 1$ or $i_\gamma + 2$, then the parallel branches $b_{i_\gamma+2}, b_{i_\gamma+3}, \ldots, b_{i_{\gamma+1}}$ in Fig. 5 are omitted.
Step $\zeta$ ($\zeta = 4, 5, \ldots, \gamma + 1$). Replace "$\gamma$" by "$\gamma + 3 - \zeta$" in Fig. 5. The resulting figure depicts the construction of subgraph $g_{\gamma+4-\zeta}$. If $i_{\gamma+4-\zeta} = i_{\gamma+3-\zeta} + 1$, then the parallel branches $b_{i_{\gamma+3-\zeta}+2}, b_{i_{\gamma+3-\zeta}+3}, \ldots, b_{i_{\gamma+4-\zeta}}$ are omitted.
Step $\gamma + 2$. Construct the final subgraph $g_2$ as shown in Fig. 6. If $i_1 = 0$, the branches incident to a vertex of degree one are omitted.
It is not difficult to see that if $g_i = \{b_i\}$ for $i = 1, 2, \ldots, l_1 + 1$, the graph G', so constructed, generates a P.B.C.S. code with word lengths $l_1$ and $l_1 + 1$ only. The construction was performed such that exactly $w_{l_1+1}/2$ code words of length $l_1 + 1$ end with the digit 1. Thus, the code also has $w_{l_1+1}/2$ code words of length $l_1 + 1$ which end in the digit 0. This implies that the code has $w_{l_1}$ words of length $l_1$. Hence, G' generates Z'.

⁶ Every branch in G' has weight 1.

[FIG. 5]

[FIG. 6]

[FIG. 7]

Call G the graph obtained from G' by relabeling $b_{l_1+1}$ by $b_{l_2}$ and adding branches $b_{l_1+1}, b_{l_1+2}, \ldots, b_{l_2-1}$ such that these branches are not contained in any path between $v_\alpha$ and $v_\beta$. Since G generates a code having word lengths $l_1$ and $l_2$ only, and the number of words of length $l_1$ is $w_{l_1}$, G must generate Z.
Example 4. Consider a binary exhaustive prefix code X with $w_{14}$ = 14,954 and $w_{17}$ = 11,440. First, note that the corresponding code X' has $w_{14}$ = 14,954 and $w_{15}$ = 2860. Since $w_{15}/2 = 1430 = 2^{10} + 2^{8} + 2^{7} + 2^{4} + 2^{2} + 2^{1}$, a graph which generates a P.B.C.S. code Z equivalent to X is given by Fig. 7. The partition for this graph is the trivial partition $g_i = \{b_i\}$, for $i = 1, 2, \ldots, 17$. It should be noted that for any binary exhaustive prefix code with two word lengths, Algorithm 1 will always result in G having the trivial partition.
Although any code equivalent to X, given in Example 4, can be represented by a tree with $2(W - 1)$ = 52,786 branches, Z is equivalent to X, and Z can be represented by a graph with 17 branches.
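
The arithmetic behind Example 4 can be reproduced with exact fractions. The sketch below is an illustration only; the function name and return layout are assumptions. From $l_1$, $w_{l_1}$ and $w_{l_2}$ it recovers the longer word length, the number of words of length $l_1 + 1$ in X', and the exponents in the binary representation used in Step 1 of Algorithm 1.

```python
from fractions import Fraction

def two_length_parameters(l1, w_l1, w_l2):
    """Return (l2, w_(l1+1) for X', exponents i_1 < ... < i_gamma) for a two-length binary code."""
    deficit = 1 - Fraction(w_l1, 2 ** l1)      # weight left over for the longer words
    w_next = deficit * 2 ** (l1 + 1)           # words of length l1 + 1 in X'
    ratio = Fraction(w_l2) / w_next            # should equal 2 ** (l2 - l1 - 1)
    n = ratio.numerator
    if w_next.denominator != 1 or ratio.denominator != 1 or n & (n - 1):
        raise ValueError("not an exhaustive binary code with two word lengths")
    l2 = l1 + 1 + n.bit_length() - 1
    half = int(w_next) // 2
    exponents = [i for i in range(half.bit_length()) if half >> i & 1]
    return l2, int(w_next), exponents

print(two_length_parameters(14, 14954, 11440))
print(2 * (14954 + 11440 - 1))
```

For the numbers of Example 4 this gives a longer length of 17, 2860 words of length 15 in X', and exponents 1, 2, 4, 7, 8, 10, in agreement with the expansion of 1430 above; the tree count $2(W - 1)$ is 52,786.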
In applying Algorithm 1, it is not difficult to show that for the resulting graph G, $|G| \le (W - 1) = (w_{l_1} + w_{l_2} - 1)$. In general, for the D-nary case, it can be shown# that $|G| \le \frac{1}{2}[(D/(D-1))(W-1)]$.
In the next section, we will describe how to determine a graph which generates a P.B.C.S. code equivalent to any exhaustive D-nary prefix code.

III. INCLUSION CODES


We now define a class of codes such that any exhaustive code (with distinct code words) in this class is realizable graph theoretically.
First, we introduce the concept of a code word x' which includes another code word x. Let X be a D-nary code, and let $x \in X$.
Case A. The last digit of x is not 0. A code word $x' \in X$ is said to include x if $L(x') > L(x)$ and when the ith ($i = 1, 2, \ldots, L(x)$) digit of x is $d \ne 0$, then the ith digit of x' is d.
Case B. The last digit of x is 0. A code word $x' \in X$ is said to include x if $L(x') > L(x)$; when the ith digit of x is 0, then the ith digit of x' is 0; and when the ith digit of x is $d \ne 0$, then the ith digit of x' is 0 or d.
If no code word includes any other code word, then X is said to have the inclusion property, and X is called an inclusion code. Clearly, if an inclusion code X has distinct code words, then X is a prefix code. However, the converse is not true in general.
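
Cases A and B translate directly into a predicate on digit strings. The sketch below is an illustration; the function names are assumptions, and the two binary codes are small hypothetical examples. The second of them is an exhaustive prefix code without the inclusion property, which illustrates the remark that the converse fails.

```python
def includes(xp, x):
    """Return True if code word xp includes code word x (Cases A and B)."""
    if len(xp) <= len(x):
        return False
    ends_in_zero = x[-1] == "0"
    for a, b in zip(x, xp):                  # compare the first L(x) digits
        if a == "0":
            if ends_in_zero and b != "0":    # Case B: a 0 of x must stay 0
                return False
        elif ends_in_zero:
            if b not in ("0", a):            # Case B: a digit d may become 0 or stay d
                return False
        elif b != a:                         # Case A: a nonzero digit must be kept
            return False
    return True

def has_inclusion_property(code):
    """True if no code word includes any other code word."""
    return not any(includes(xp, x) for xp in code for x in code)

print(has_inclusion_property(["0", "10", "11"]))
print(has_inclusion_property(["00", "01", "10", "110", "111"]))
print([w for w in ["120", "121", "122"] if includes(w, "02")])
```

In the second code, 110 and 111 both include 01 under Case A. The last line applies the same predicate to the ternary words 120, 121, 122 and 02, which reappear in the discussion of Fig. 12 later in this section.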

CONSTRUCTION OF INCLUSION CODES

Let T be any complete tree having rth order vertices and no vertices of order $r_1 > r$. At each vertex of order $r_2 < r$, label the branches $0, 1, \ldots, D-1$ as shown in Fig. 8.
Let the number of rth order vertices be $\gamma$. Label the $\gamma/D$ terminal rth order vertices incident to a branch labeled d ($d \in \{1, 2, \ldots, D-1\}$), upward as shown in Fig. 8. In addition, label the $\gamma/D$ terminal rth order vertices incident to a branch labeled 0, downward. Call the D-nary sequence corresponding to the path from $v_0$ to $v_i^d$, $x_i^d$; for $i = 1, 2, \ldots, \gamma/D$ and $d \in \{0, 1, 2, \ldots, D-1\}$. Suppose that a is a non-empty, finite D-nary sequence. We then have the following lemmas.
LEMMA 1. If $f > e$, for $e, f = 1, 2, \ldots, \gamma/D$, then $x_f^d a$ does not include $x_e^d$ ($d \in \{1, 2, \ldots, D-1\}$).
Proof. Let v be the highest order vertex in common with the paths from $v_0$ to $v_f^d$ and from $v_0$ to $v_e^d$. By construction, one of the two following cases must occur: (A) The (v + 1)st digit of $x_f^d$ is 0, and the (v + 1)st digit of $x_e^d$ is nonzero; (B) The (v + 1)st digit of $x_f^d$ is $j \in \{1, 2, \ldots, D-2\}$, and the (v + 1)st digit of $x_e^d$ is $k \in \{j+1, j+2, \ldots, D-1\}$. In both cases, by definition, $x_f^d a$ does not include $x_e^d$.
Q.E.D.
LEMMA 2. If $f > e$, for $e, f = 1, 2, \ldots, \gamma/D$, then $x_f^0 a$ does not include $x_e^0$.
Proof. Choose v as in the proof of Lemma 1. Again, by construction, one of the two following cases must occur: (A) The (v + 1)st digit of $x_e^0$ is 0, and the (v + 1)st digit of $x_f^0$ is nonzero; (B) The (v + 1)st digit of $x_e^0$ is $j \in \{1, 2, \ldots, D-2\}$ and the (v + 1)st digit of $x_f^0$ is $k \in \{j+1, j+2, \ldots, D-1\}$. Thus, $x_f^0 a$ does not include $x_e^0$. Q.E.D.
It should also be noted that if $d_1 \in \{0, 1, \ldots, D-1\}$ and $d_2 \in \{0, 1, \ldots, \hat{d}_1, \ldots, D-1\}$, then $x_f^{d_1} a$ does not include $x_e^{d_2}$, for all $e, f = 1, 2, \ldots, \gamma/D$.

[FIG. 8]

Let X be any D-nary exhaustive prefix code having $\sum_{i=1}^{r} w_{l_i}$ code words, where $0 < l_1 < l_2 < \cdots < l_r$. Now, we present a method for constructing a D-nary exhaustive prefix code X', which is equivalent to X, such that X' has the inclusion property.
Algorithm 2. Given $\sum_{i=1}^{r} w_{l_i} D^{-l_i} = 1$, where $0 < l_1 < l_2 < \cdots < l_r$.
Step 1. Construct a complete coding tree $T_1$ with $D^{l_1}$ vertices of $l_1$st order. Label the $D^{l_1} = \gamma_1$ vertices of $l_1$st order as if in Fig. 8, $l_1 = r$ and $\gamma_1 = \gamma$. Choose as terminal vertices the first $w_{l_1}$ vertices in the sequence

$$v_1^1, v_2^1, \ldots, v_{\gamma_1/D}^1,\; v_1^2, v_2^2, \ldots, v_{\gamma_1/D}^2,\; \ldots,\; v_1^{D-1}, v_2^{D-1}, \ldots, v_{\gamma_1/D}^{D-1},\; v_1^0, v_2^0, \ldots, v_{\gamma_1/D}^0.$$

Step $\zeta$ ($\zeta = 2, 3, \ldots, r$). To each nonterminal $l_{\zeta-1}$st order vertex, connect a complete tree with $D^{l_\zeta - l_{\zeta-1}}$ vertices of $(l_\zeta - l_{\zeta-1})$st order. The overall tree $T_\zeta$ has $(\gamma_{\zeta-1} - w_{l_{\zeta-1}}) D^{l_\zeta - l_{\zeta-1}} = \gamma_\zeta$ vertices of $l_\zeta$st order. Label these $\gamma_\zeta$ vertices as if in Fig. 8, $l_\zeta = r$ and $\gamma_\zeta = \gamma$. Choose as terminal vertices the first $w_{l_\zeta}$ vertices in the sequence

$$v_1^1, v_2^1, \ldots, v_{\gamma_\zeta/D}^1,\; v_1^2, v_2^2, \ldots, v_{\gamma_\zeta/D}^2,\; \ldots,\; v_1^{D-1}, v_2^{D-1}, \ldots, v_{\gamma_\zeta/D}^{D-1},\; v_1^0, v_2^0, \ldots, v_{\gamma_\zeta/D}^0.$$

The tree $T_r$ obtained by Algorithm 2 represents X', and obviously, X' is equivalent to X. Since the terminal vertices of the coding tree were chosen with reference to Lemmas 1 and 2, the D-nary exhaustive prefix code X' has the inclusion property.
Example 5. An optimum binary prefix code X for the English alphabet (plus a "space") is described by $w_3 = 2$, $w_4 = 8$, $w_5 = 4$, $w_6 = 7$, $w_7 = 1$, $w_8 = 1$, and $w_{10} = 4$. Applying Algorithm 2, the coding tree $T_7$ for X' is shown in Fig. 9.
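
A binary version of Algorithm 2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the ordering of the vertices is inferred from the proofs of Lemmas 1 and 2 (words ending in 1 are taken in decreasing lexicographic order, then words ending in 0 in increasing lexicographic order), because the labeling of Fig. 8 is not reproduced here. The word counts are those of Example 5.

```python
from itertools import product

def binary_inclusion_code(counts):
    """counts maps word length -> number of words; the characteristic sum must equal 1."""
    frontier, code, prev = [""], [], 0
    for length in sorted(counts):
        # Attach a complete subtree to every nonterminal vertex.
        frontier = [v + "".join(t) for v in frontier
                    for t in product("01", repeat=length - prev)]
        ones = sorted((v for v in frontier if v[-1] == "1"), reverse=True)
        zeros = sorted(v for v in frontier if v[-1] == "0")
        ordered = ones + zeros               # assumed reading of the Fig. 8 sequence
        if counts[length] > len(ordered):
            raise ValueError("too many words of length %d" % length)
        code += ordered[:counts[length]]     # the first w_l vertices become terminal
        frontier = ordered[counts[length]:]
        prev = length
    if frontier:
        raise ValueError("word counts do not describe an exhaustive code")
    return code

def includes(xp, x):
    """Binary inclusion test: digits of x equal to its last digit must be kept in xp."""
    keep = x[-1]
    return len(xp) > len(x) and all(b == keep for a, b in zip(x, xp) if a == keep)

counts = {3: 2, 4: 8, 5: 4, 6: 7, 7: 1, 8: 1, 10: 4}
code = binary_inclusion_code(counts)
print(len(code), code[:2])
print(any(includes(a, b) for a in code for b in code))
```

The result has 27 words with the prescribed lengths, and no word includes another, so under the assumed ordering the construction yields an inclusion code equivalent to X.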
The following theorem is one reason for our discussion of inclusion codes.
THEOREM 4. For any D-nary exhaustive prefix code X' with the inclusion property, there exists a graph G which can be partitioned such that the resulting P.B.C.S. code Z = X'.
[FIG. 9]

Because of its excessive length, the proof of this theorem will be omitted. For Theorem 4, one such graph G and its corresponding partition is determined as follows: Let T be the coding tree for X'. To form G from T, call the 0 order vertex of T, $v_\alpha$. Relabel each branch in T which is labeled d, for $d \in \{1, 2, \ldots, D-1\}$, and which is incident to vertices of order $i - 1$ and i, as $(i)^d$. From T, remove every branch labeled 0 which is incident to a terminal vertex. Coalesce all the remaining terminal vertices, and label the resulting vertex $v_\beta$. Short every other branch in T labeled 0. In the resulting graph G, a branch labeled $(i)^d$ indicates that the branch has weight d and is in subgraph $g_i$. This graph G generates a P.B.C.S. code Z identical to X'.
It should not be difficult to see that any complete tree with W terminal vertices contains exactly $[1/(D-1)](W-1)$ branches labeled d, for all $d \in \{0, 1, 2, \ldots, D-1\}$. Thus, the graph G described above consists of $(W - 1)$ branches. However, by means of the following example, we shall see that in many cases, a number of these branches can be eliminated such that the resulting P.B.C.S. code remains the same. Thus, $W - 1$ is an upper bound on the number of branches required for a graph.
Example 6. Let X' be the binary inclusion code given in Example 5. By the previous discussion, G is found to be the graph in Fig. 10. Since D = 2, all branches must have weight 1. Thus, in Fig. 10, a branch labeled i indicates that the branch is in subgraph $g_i$. Note that G contains a branch in $g_4$ which, by itself, forms a path between $v_\alpha$ and $v_\beta$. So, suppose that $x' \in X'$, where $x' = d_1 d_2 d_3 d_4$ and $d_4 = 1$. If all branches in $g_4$, except the self-path, are eliminated, then x' is still a code word of the P.B.C.S. code generated by the resulting graph. Clearly, the removal of these branches does not affect code words of other lengths. Hence, the code remains unaltered. In an analogous manner, branches in $g_5$, $g_6$, and $g_{10}$ may be removed. Furthermore, if this process of elimination yields a branch in $g_i$ which is incident to a vertex of degree one, this branch may be removed provided that $|g_i| > 1$. This procedure for reducing the number of branches in a graph, without affecting the code, can be formalized and extended to the D-nary case.# Thus, we see that the graph of Fig. 10 can be reduced to the 13 branch graph shown in Fig. 11. The tree for X' contains 52 branches.

[FIG. 10]

[FIG. 11]

[FIG. 12]

Theorem 4 states that any exhaustive D-nary prefix code with the inclusion property is a P.B.C.S. code. To see that the converse is not true in general, consider the graph shown in Fig. 12. If $g_1 = \{b_1\}$, $g_2 = \{b_2, b_3\}$, and $g_3 = \{b_4\}$, then the resulting ternary P.B.C.S. code Z is composed of the following code words.
00 11 120
01 20 121
02 21 122
10 22

Clearly, all three code words of length 3 include the code word 02. Thus,
Z does not have the inclusion property.
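The failure of the inclusion property for $Z$ can be verified mechanically. The sketch below rests on an assumption: the formal definition of "includes" appears earlier in the paper, and the reading used here (a longer word $z_2$ includes $z_1$ when $z_2$ carries the last digit of $z_1$ in every position where $z_1$ carries it) is inferred from how the term is used in this section. The function names are hypothetical.

```python
def includes(z2, z1):
    """True if z2 includes z1 under the assumed reading of the definition.

    z2 must be strictly longer than z1 and must show the last digit of z1
    in every position where z1 itself shows that digit.
    """
    if len(z2) <= len(z1):
        return False
    last = z1[-1]
    return all(z2[i] == last for i, digit in enumerate(z1) if digit == last)


def has_inclusion_property(code):
    """True if no code word includes another code word."""
    return not any(includes(z2, z1) for z1 in code for z2 in code)


# The ternary P.B.C.S. code Z listed above.
Z = {"00", "01", "02", "10", "11", "20", "21", "22", "120", "121", "122"}

# Every (longer word, included word) pair that violates the property.
violations = sorted((z2, z1) for z1 in Z for z2 in Z if includes(z2, z1))
print(violations)
print(has_inclusion_property(Z))
```

Under this reading the only violations are the three code words of length 3, each of which includes 02, which matches the observation made in the text.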
Now, we would like to show that when $D = 2$, the converse of Theorem
4 is valid.
THEOREM 5. Let $Z$ be a binary P.B.C.S. code generated by some graph
$G$. Then $Z$ has the inclusion property.
Proof. Let $z_1, z_2 \in Z$, where $z_1 = d_1 d_2 \cdots d_l$ and
$z_2 = d_1' d_2' \cdots d_{l'}'$. Assume that $z_2$ includes $z_1$. Then,
$l' > l$. Suppose that $d_l = 1$ [0]. Since $z_2$ includes $z_1$, if
$d_i = 1$ [0], then $d_i' = 1$ [0] (for $1 \le i \le l$). But, the 1's
[0's] of $z_1$ correspond to a subgraph of $G$ which contains a path
[basic cut-set]. Thus, the 1's [0's] of $d_1' d_2' \cdots d_{l'-1}'$
correspond to a subgraph of $G$ which contains a path [basic cut-set].
Therefore, $z_2 \notin Z$, a contradiction. Hence $Z$ has the inclusion
property. Q.E.D.
Theorems 4 and 5 can be combined to yield the following obvious
theorem.
THEOREM 6. An exhaustive binary prefix code $X$ is a P.B.C.S. code if
and only if $X$ has the inclusion property.

IV. SYNCHRONIZATION
In discussing decoding, it was assumed that messages were received
without error (noiseless channel case) and that the decoder was in its
initial state when the first digit of a message was received. Under these
circumstances, any message is synchronized with the decoder. Whether
it is due to noise or network malfunction, the loss of synchronization
could result in a significant loss of information.
In order to investigate the synchronizing properties of inclusion codes,
the following definitions are required. Suppose $X$ is an exhaustive
D-nary prefix code. Let $s$ be a suffix of some $x \in X$. If there exist
messages $m_1$ and $m_2$ such that $s\,m_1 = m_2$, then $m_1$ is said to
be a synchronizing message for $s$. If every suffix (of every $x \in X$)
has a synchronizing message, then $X$ is self-synchronizable.
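For a small code these definitions can be applied by direct search. The following is a minimal sketch, assuming the code is an exhaustive prefix code given as a set of digit strings (so that the decoder has finitely many states); it is a brute-force breadth-first search written for illustration, not Levenshtein's test, and all function names are hypothetical.

```python
from collections import deque


def decoder_state(code, digits, state=""):
    """Feed digits to a prefix decoder and return the unfinished partial word."""
    for digit in digits:
        state += digit
        if state in code:
            state = ""  # a complete code word was recognised
    return state


def synchronizing_message(code, s):
    """Search for a message m1 such that s followed by m1 is again a message.

    Returns m1 as a digit string (empty if s is already a message),
    or None if no such message exists.
    """
    start = decoder_state(code, s)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        state, message = queue.popleft()
        if state == "":
            return message
        for word in sorted(code):  # extend the candidate by one code word
            nxt = decoder_state(code, word, state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, message + word))
    return None


def is_self_synchronizable(code):
    """True if every suffix of every code word has a synchronizing message."""
    suffixes = {word[i:] for word in code for i in range(len(word))}
    return all(synchronizing_message(code, s) is not None for s in suffixes)


print(synchronizing_message({"0", "10", "11"}, "1"))
print(is_self_synchronizable({"0", "10", "11"}))
print(is_self_synchronizable({"00", "01", "10", "11"}))
```

In the first code the suffix 1 is resynchronized by the single code word 0, since 10 is itself a code word. The fixed-length code in the last line never recovers from a one-digit slip, so the search exhausts the decoder states and reports failure.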
A number of authors, such as Gilbert and Moore (1959), Schützenberger
(1956, 1964, 1967), Stanfel (1966), Schwartz (1964), and Levenshtein
(1962), have investigated the self-synchronization properties of
variable-length exhaustive prefix codes. Gilbert and Moore (1959)
showed that if an exhaustive prefix code $X$ has word lengths
$l_1, l_2, \ldots, l_r$, then a necessary condition for $X$ to be
self-synchronizable is that the greatest common divisor of the lengths be
equal to one, i.e., g.c.d. $(l_1, l_2, \ldots, l_r) = 1$.
Suppose $X$ is a D-nary prefix code. Let $X^k$ denote the code which
consists of all possible $k$-tuples ($k \ge 2$) of the form
$x_{i_1} x_{i_2} \cdots x_{i_k}$, where $x_{i_j} \in X$ for
$j = 1, 2, \ldots, k$. A D-nary prefix code $X$ is said to be uniformly
composed if there exists a prefix code $Y$ and an integer $k \ge 2$ such
that $Y^k = X$. Schützenberger (1956) stated that if g.c.d.
$(l_1, l_2, \ldots, l_r) = 1$ and $X$ is not self-synchronizable, then $X$
is either anagrammatic or uniformly composed.
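The construction of $X^k$ is easy to carry out for a small code. The following sketch assumes a code is a set of digit strings; the name `code_power` and the binary code $Y$ are illustrative only.

```python
from itertools import product


def code_power(code, k):
    """Return the set of all concatenations of k code words of `code`."""
    return {"".join(words) for words in product(sorted(code), repeat=k)}


Y = {"0", "10", "11"}   # illustrative exhaustive binary prefix code
Y2 = code_power(Y, 2)   # uniformly composed by construction

print(sorted(Y2, key=lambda w: (len(w), w)))
print(sorted({len(w) for w in Y2}))
```

Because $Y$ is a prefix code, distinct pairs of code words give distinct concatenations, so $Y^2$ has $3^2 = 9$ code words.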
Levenshtein (1962) devised a test for determining if a given exhaustive
prefix code is self-synchronizable. For the case when such a code
having g.c.d. $(l_1, l_2, \ldots, l_r) = 1$ fails the
self-synchronization test, Stanfel (1966) developed a procedure for
obtaining an equivalent self-synchronizable code. Unfortunately, for
other than relatively small codes, the amount of work required by
Levenshtein's test may be quite prohibitive.
We shall see that inclusion codes need not be tested for the
self-synchronization property. The first step in this direction is the
following.
LEMMA 3. A D-nary variable-length exhaustive prefix code $X$ with the
inclusion property is not anagrammatic.
Proof. Assume that $X$ is anagrammatic. Let $l_r$ be the smallest number
such that there exists a code word $x = d\,d \cdots d$, where
$L(x) = l_r$ and $d \in \{0, 1, \ldots, D-1\}$. We will show that every
D-nary sequence of length $l_r$ is a code word.
If $d = 0$, no proper prefix of the sequence $x_1 = a\,0\,0 \cdots 0$,
where $L(x_1) = l_r$, is a code word for any
$a \in \{0, 1, \ldots, D-1\}$. For otherwise, $X$ does not have the
inclusion property. In addition, no sequence $a\,0 \cdots 0$ of length
$> l_r$ is a code word; otherwise $X$ does not have the suffix property.
Thus, $x_1 \in X$. Similarly, the sequences $x_2 = a\,a\,0 \cdots 0$,
$x_3 = a\,a\,a\,0 \cdots 0$, ..., $x_{l_r-1} = a\,a \cdots a\,0$, where
$L(x_2) = L(x_3) = \cdots = L(x_{l_r-1}) = l_r$, are code words. If
$\mathbf{a} = a\,a \cdots a$, where $L(\mathbf{a}) = l_r$, is not a code
word, there exists a code word $\mathbf{a}\,0\,0 \cdots 0$. This implies
that $X$ does not have the suffix property. Thus, $\mathbf{a} \in X$ for
every $a \in \{0, 1, \ldots, D-1\}$. Now, no proper prefix of the sequence
$\mathbf{a}_1 = 0\,a\,a \cdots a$, where $L(\mathbf{a}_1) = l_r$, is a
code word; because if not, $X$ does not have the inclusion property. Also,
no sequence $0\,a\,a \cdots a$ of length $> l_r$ is a code word;
otherwise, $X$ does not have the suffix property. Thus,
$\mathbf{a}_1 \in X$. Similarly, the sequences
$\mathbf{a}_2 = 0\,0\,a \cdots a$, $\mathbf{a}_3 = 0\,0\,0\,a \cdots a$,
..., $\mathbf{a}_{l_r-1} = 0\,0 \cdots 0\,a$, where
$L(\mathbf{a}_2) = L(\mathbf{a}_3) = \cdots = L(\mathbf{a}_{l_r-1}) = l_r$,
are code words, for all $a \in \{0, 1, 2, \ldots, D-1\}$.
If $d \ne 0$, no proper prefix of $x_1' = 0\,d\,d \cdots d$, where
$L(x_1') = l_r$, is a code word. For otherwise, $X$ does not have the
inclusion property. In addition, no sequence $0\,d\,d \cdots d$ of length
$> l_r$ is a code word; otherwise, $X$ does not have the suffix property.
Thus, $x_1' \in X$. Similarly, the sequences $x_2' = 0\,0\,d \cdots d$,
$x_3' = 0\,0\,0\,d \cdots d$, ..., $x_{l_r-1}' = 0\,0 \cdots 0\,d$, where
$L(x_2') = L(x_3') = \cdots = L(x_{l_r-1}') = l_r$, are code words. If
the sequence $\mathbf{0} = 0\,0 \cdots 0$, where $L(\mathbf{0}) = l_r$, is
not a code word, then there exists a code word
$\mathbf{0}\,d\,d \cdots d$. This implies that $X$ does not have the
suffix property. Thus, $\mathbf{0} \in X$. Hence, as in the case for
$d = 0$, $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{l_r-1} \in X$,
for all $a \in \{0, 1, \ldots, D-1\}$.
In particular, since every sequence $0\,0 \cdots 0\,d_1$ of length $l_r$
is a code word, for all $d_1 \in \{0, 1, \ldots, D-1\}$; if there exists
$d_2 \in \{0, 1, \ldots, D-1\}$ such that the sequence
$0\,0 \cdots 0\,d_2$ of length $l_r - 1$ is a code word, then $X$ does not
have the suffix property. Thus, every sequence $0\,0 \cdots 0\,d_2\,d_1$
of length $l_r$ is a code word, for all
$d_1, d_2 \in \{0, 1, \ldots, D-1\}$; otherwise, $X$ does not have the
inclusion property. Therefore, by induction, every sequence of length
$l_r$ is a code word. This implies that $X$ is not a variable-length code,
a contradiction. Hence, $X$ is not anagrammatic. Q.E.D.
Since binary variable-length P.B.C.S. codes have the inclusion
property, they are not anagrammatic. However, a nonbinary
variable-length P.B.C.S. code may be anagrammatic.#
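Whether a small code is anagrammatic can be checked directly. The sketch below assumes that "anagrammatic" means having both the prefix property and the suffix property, which is how the proof of Lemma 3 uses the term; the function names are hypothetical.

```python
def has_prefix_property(code):
    """True if no code word is a proper prefix of another code word."""
    return not any(a != b and b.startswith(a) for a in code for b in code)


def has_suffix_property(code):
    """True if no code word is a proper suffix of another code word."""
    return not any(a != b and b.endswith(a) for a in code for b in code)


def is_anagrammatic(code):
    """Assumed definition: both the prefix and the suffix property hold."""
    return has_prefix_property(code) and has_suffix_property(code)


print(is_anagrammatic({"0", "10", "11"}))         # 0 is a suffix of 10
print(is_anagrammatic({"00", "01", "10", "11"}))  # fixed-length code
```

The first code is variable-length and fails the suffix property; the second is anagrammatic but has a single word length, which is consistent with the restriction to variable-length codes in Lemma 3.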
We now deal with the question of uniform composition.
LEMMA 4. A D-nary variable-length exhaustive prefix code X with the in-
clusion property is not uniformly composed.
Proof. Assume that $X$ has W = ~-~=i w b. code words and is uniformly
composed. Therefore, there exists an exhaustive variable-length prefix
code $Y$ such that $Y^k = X$, for some integer $k \ge 2$.
Clearly, a minimum length code word of $Y$ has length $l_1/k$. Suppose
that the code word of $Y$ which consists of all 0's is denoted by
$\mathbf{0}$. Consider the following two cases.
Case A. $L(\mathbf{0}) > l_1/k$.
There must be some $y \in Y$, where $L(y) = l_1/k$, and
$y = d_1 d_2 \cdots d_{L(y)}$ does not consist entirely of 0's. Hence, in
$Y^k = X$ there are code words
$x_1 = y\,\mathbf{0}\,\mathbf{0} \cdots \mathbf{0}$ and
$x_2 = \mathbf{0}\,\mathbf{0} \cdots \mathbf{0}$, where
$L(x_1) = L(y) + (k-1)L(\mathbf{0}) < L(x_2) = kL(\mathbf{0})$. Thus,
$x_2$ includes $x_1$, a contradiction.
Case B. $L(\mathbf{0}) = l_1/k$.
Suppose $\mathbf{d} = d\,d \cdots d \in Y$, where $L(\mathbf{d}) > l_1/k$
for some $d \in \{1, 2, \ldots, D-1\}$. Hence, in $Y^k = X$ there are code
words $x_1 = \mathbf{0}\,\mathbf{d}\,\mathbf{d} \cdots \mathbf{d}$ and
$x_2 = \mathbf{d}\,\mathbf{d} \cdots \mathbf{d}$, where
$L(x_1) = L(\mathbf{0}) + (k-1)L(\mathbf{d}) < L(x_2) = kL(\mathbf{d})$.
Thus, $x_2$ includes $x_1$, a contradiction.
Now suppose $\mathbf{d} \in Y$, where $L(\mathbf{d}) = l_1/k$ for all
$d \in \{1, 2, \ldots, D-1\}$. Since $Y$ is exhaustive, there exists some
$y' \in Y$ such that $L(y') > l_1/k$ and
$y' = d_1' d_2' \cdots d_{L(\mathbf{0})}'\,d\,d \cdots d$, for some
$d \in \{1, 2, \ldots, D-1\}$. Then, $X$ has code words
$x_1 = \mathbf{0}\,\mathbf{d}\,\mathbf{d} \cdots \mathbf{d}$ and
$x_2 = y'\,\mathbf{d}\,\mathbf{d} \cdots \mathbf{d}$, where
$L(x_1) = L(\mathbf{0}) + (k-1)L(\mathbf{d})$, which is less than
$L(x_2) = L(y') + (k-1)L(\mathbf{d})$. Thus, $x_2$ includes $x_1$, a
contradiction.
Hence, $X$ is not uniformly composed. Q.E.D.
Again, since binary variable-length P.B.C.S. codes have the inclusion
property, they are not uniformly composed. But, a nonbinary P.B.C.S.
code may be.#
Combining Lemmas 3 and 4, the following theorem becomes evident.
THEOREM 7. Let $X$ be a D-nary exhaustive variable-length prefix code
having lengths $l_1, l_2, \ldots, l_r$. Suppose that $X$ has the inclusion
property. Then $X$ is self-synchronizable if and only if g.c.d.
$(l_1, l_2, \ldots, l_r) = 1$.
Finally, we state an obvious corollary to Theorem 7.
COROLLARY. A binary P.B.C.S. code $Z$ having word lengths
$l_1, l_2, \ldots, l_r$ is self-synchronizable if and only if g.c.d.
$(l_1, l_2, \ldots, l_r) = 1$.
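The g.c.d. condition is simple to evaluate. The following is a minimal sketch with hypothetical function names; it only computes the g.c.d. of the word lengths and assumes, without checking, that the code supplied satisfies the hypotheses of Theorem 7 (exhaustive, variable-length, with the inclusion property).

```python
from functools import reduce
from math import gcd


def length_gcd(code):
    """Greatest common divisor of the distinct word lengths of `code`."""
    return reduce(gcd, {len(word) for word in code})


def self_synchronizable_by_theorem_7(code):
    """Apply the g.c.d. test alone; the hypotheses of Theorem 7 are assumed."""
    return length_gcd(code) == 1


print(length_gcd({"0", "10", "11"}))                        # lengths 1 and 2
print(self_synchronizable_by_theorem_7({"0", "10", "11"}))
print(length_gcd({"00", "01", "10", "1100", "1101", "1110", "1111"}))
```

The last code has word lengths 2 and 4, so its g.c.d. is 2 and it already fails the necessary condition of Gilbert and Moore, whether or not it has the inclusion property.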
We now make a remark about an exhaustive D-nary prefix code having
word lengths $l_1$ and $l_2$ only. It is known that such a code is not
anagrammatic (Stanfel, 1966). In addition, the reader should not find it
difficult to verify that no exhaustive prefix code with two word lengths
can be uniformly composed. Thus, a necessary and sufficient condition for
self-synchronizability is that $l_1$ and $l_2$ be relatively prime. Hence,
the problem of determining if a P.B.C.S. code with two word lengths has
the inclusion property becomes academic.
V. CONCLUSION
This paper has demonstrated how to generate D-nary prefix codes
from weighted graphs. In an analogous manner, other schemes based on
duality, for generating prefix codes from graphs, can be defined.# Such a
coding procedure yields further flexibility in the decoding of graph
theoretic codes.
We have shown that for any D-nary exhaustive prefix code with
W = ~ = 1 wb. code words, there exists a graph $G$, with at most $W - 1$
branches, that generates an equivalent P.B.C.S. code. Unfortunately,
no procedure for obtaining a graph with the minimum number of
branches is known, except in the case of binary codes with two word
lengths. Clearly, the lower bound on the number of branches is $l_r$. This
bound corresponds to the case when the graph is trivially partitioned.
However, a graph that can be partitioned trivially may not always exist.#
By an extension of Algorithm 1, some sufficient conditions for realizing a
trivially partitioned graph which generates a P.B.C.S. code equivalent
to a binary exhaustive prefix code have been found.# For the D-nary
case, it would be desirable to have necessary and/or sufficient conditions
for an equivalent code to be realizable by a graph with $l_r$ branches.

RECEIVED: July 15, 1968


REFERENCES
ASH, R. (1965). "Information Theory." John Wiley & Sons, New York.
BOBROW, L. S. (1968). Graph theoretic variable-length codes. Ph.D. Dissertation,
Northwestern Univ., Evanston, Illinois.
BREDESON, J. G. AND HAKIMI, S. L. (1967). Decoding of graph theoretic codes.
IEEE Trans. Inform. Theory IT-13, 348-349.
DEWEY, G. (1923). "Relative Frequency of English Speech Sounds." Cambridge
Univ. Press, Cambridge, England.
FANO, R. M. (1961). "Transmission of Information." MIT Press, Cambridge,
Massachusetts.
FRAZER, W. D. (1964). A graph theoretic approach to linear codes. Proc. of 2nd
Allerton Conf. on Circuits and Systems, Univ. of Illinois, Urbana, Illinois.
GILBERT, E. N. AND MOORE, E. F. (1959). Variable-length binary encodings. Bell
System Tech. J. 38, 933-967.
HAKIMI, S. L. AND BREDESON, J. G. (1968). Graph theoretic error-correcting codes.
IEEE Trans. Inform. Theory IT-14, 584-591.
HAKIMI, S. L. AND FRANK, H. (1965). Cut-set matrices and linear codes. IEEE
Trans. Inform. Theory IT-11, 457-458.
HUFFMAN, D. A. (1952). A method for the construction of minimum-redundancy
codes. Proc. IRE 40, 1098-1101.
HUFFMAN, D. A. (1964). A graph theoretic formulation of binary group codes.
Summaries of papers presented at ICMCI, part 3, pp. 29-30.
KASAMI, T. (1961). A topological approach to construction of group codes. J.
Inst. Elec. Communication Engrs. (Japan) 44, 1316-1321.
KRAFT, L. G. (1949). A device for quantizing, grouping, and coding amplitude
modulated pulses. M.S. Thesis, MIT, Cambridge, Mass.
LEVENSHTEIN, V. I. (1962). Certain properties of code systems. Soviet Physics,
Doklady 6, 858-860.
McMILLAN, B. (1956). Two inequalities implied by unique decipherability. IRE
Trans. Inform. Theory IT-2, 115-116.
SCHÜTZENBERGER, M. P. (1956). On the application of semigroup methods to some
problems in coding. IRE Trans. Inform. Theory IT-2, 47-60.
SCHÜTZENBERGER, M. P. (1964). On the synchronizing properties of certain prefix
codes. Inform. Control 7, 23-36.
SCHÜTZENBERGER, M. P. (1967). On synchronizing prefix codes. Inform. Control
11, 396-401.
SCHWARTZ, E. S. (1964). Self-synchronization of prefix codes. Summaries of papers
presented at ICMCI, part 3, pp. 25-26.
SESHU, S. AND REED, M. B. (1961). "Linear Graphs and Electrical Networks."
Addison-Wesley, Reading, Massachusetts.
STANFEL, L. E. (1966). Synchronizing properties of variable-length encodings.
Ph.D. Dissertation, Northwestern Univ., Evanston, Illinois.
