
Info Theory Solutions

Solution manual for Information Theory and Reliable Communication by Gallager

INFORMATION THEORY AND RELIABLE COMMUNICATION
SOLUTIONS TO PROBLEMS - CHAPTER 2

2.1
a1: For disjoint events, P(E0) = P(E1) + P(E2) + P(E3), so P(E0) = 3/4.
a2: For statistically independent events, 1 - P(E0) is the probability that none of the events occur, which is the product of the probabilities that each one does not occur. Thus 1 - P(E0) = (3/4)^3 and P(E0) = 37/64.
a3: If E1 = E2 = E3, then P(E0) = 1/4.
b1: From the Venn diagram, P(E0) is clearly maximized when the events are disjoint, so max P(E0) = 3/4.
b2: The intersection of each pair of sets has probability 1/16. P(E0) is maximized if all these pairwise intersections are identical, in which case P(E0) = 3/4 - 2/16 = 5/8.

2.2
Let L be the event that the loaded die is picked and H the event that the honest die is picked. Let A_i be the event that i is turned up on the first roll and B_i the event that i is turned up on the second roll. We are given P(A_1|L) = 2/3, P(A_i|L) = 1/15 (2 <= i <= 6), and P(A_i|H) = 1/6 (1 <= i <= 6). Then

  P(L|A_1) = P(A_1|L)P(L) / [P(A_1|L)P(L) + P(A_1|H)P(H)].

This is the probability that the loaded die was picked, conditional on a one on the first roll. For two rolls, we make the assumption, from the physical mechanism involved in rolling a die, that the outcomes on successive rolls of a given die are independent. Thus P(A_1 B_1|L) = (2/3)^2 and P(A_1 B_1|H) = (1/6)^2, and P(L|A_1 B_1) follows as before.

2.3
a: E[x + y] = Σ_{x,y} (x + y) P_XY(x,y) = Σ_x x P_X(x) + Σ_y y P_Y(y) = E[x] + E[y]. Note that statistical independence is not necessary here and that the argument extends to non-discrete variables if the expectation exists.
b: E[xy] = Σ_{x,y} xy P_X(x) P_Y(y) = [Σ_x x P_X(x)] [Σ_y y P_Y(y)] = E[x] E[y]. Note that statistical independence was used on the first line. Let x and y take on only the values ±1 and 0. An example of uncorrelated but dependent variables is P_XY(1,0) = P_XY(0,1) = P_XY(-1,0) = P_XY(0,-1) = 1/4. An example of correlated and dependent variables is P_XY(1,1) = P_XY(-1,-1) = 1/2.
c: Using a,
  (x + y - E[x] - E[y])^2 = (x - E[x])^2 + 2(x - E[x])(y - E[y]) + (y - E[y])^2.
The middle term has expectation 2[E(xy) - E[x]E[y]], which is zero for uncorrelated variables, leaving σ²_{x+y} = σ²_x + σ²_y.

2.4
a: E[x] = Σ_x x P_X(x) ≥ Σ_{x ≥ δ} x P_X(x), since the omitted terms in the latter sum are non-negative. Lower bounding x by δ in the latter sum, E[x] ≥ δ Pr(x ≥ δ), so Pr(x ≥ δ) ≤ E[x]/δ. For given δ > 0, the inequality is satisfied with equality if x is a binary variable taking on only the values 0 and δ.
b: Use (y - E[y])^2 for x in part a, with δ = ε², obtaining Pr[(y - E[y])^2 ≥ ε²] ≤ σ_y²/ε². But (y - E[y])^2 ≥ ε² is equivalent to |y - E[y]| ≥ ε.
c: Using induction on 2.3c, the variance of a sum of independent variables is the sum of the individual variances. Thus Σ_n z_n has variance Nσ², so y_N = (1/N) Σ_n z_n has variance σ²/N and mean E[z]. Part b then gives Pr(|y_N - E[z]| ≥ ε) ≤ σ²/(Nε²).
d: Here y_N is 1/N times the number of occurrences of E in the N experiments; in other words y_N is the relative frequency of occurrence of E over the sample. Thus (i) states that the event that the relative frequency of E differs from the probability of E by more than some small number ε has a probability that approaches zero with increasing N. Let p be the probability of event E. Then E[y_N] = p and σ_z² = p(1-p). Finally, since the event E can occur i times in N trials in (N choose i) different ways, each of probability p^i (1-p)^(N-i), and since |y_N - p| ≥ ε is equivalent to |i - pN| ≥ εN,

  Pr(|y_N - p| ≥ ε) = Σ_{i: |i - pN| ≥ εN} (N choose i) p^i (1-p)^(N-i) ≤ p(1-p)/(Nε²).
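The bound at the end of 2.4 is easy to spot-check numerically. The following short Python sketch is illustrative only; N = 100, p = 0.3 and ε = 0.1 are arbitrary example values, not taken from the problem. It compares the exact binomial tail with the bound p(1-p)/(Nε²):

from math import comb
N, p, eps = 100, 0.3, 0.1
exact = sum(comb(N, i) * p**i * (1 - p)**(N - i)
            for i in range(N + 1) if abs(i / N - p) >= eps)
print(exact, p * (1 - p) / (N * eps**2))   # the exact tail is well below the bound of 0.21

As expected, the exact probability is far smaller than the bound, but the bound already decays as 1/N.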
2.5
a: For any particular received sequence y,

  P(a_1|y) = P(y|a_1)P(a_1) / [P(y|a_1)P(a_1) + P(y|a_2)P(a_2)].

Tabulating this for the eight received triples (a_1 sent as 000, a_2 as 111, crossover probability ε < 1/2):

  y     P(a_1|y)                     y     P(a_1|y)
  000   (1-ε)^3/[(1-ε)^3 + ε^3]      111   ε^3/[(1-ε)^3 + ε^3]
  001   1-ε                          110   ε
  010   1-ε                          101   ε
  100   1-ε                          011   ε

b: Letting e be the event of an incorrect decision, we have for any particular decision rule P(e) = Σ_y P(e|y)P(y). Since P(y) is independent of the decision rule, P(e) is minimized by choosing for each y the source letter that minimizes P(e|y), i.e. the source letter a_k that maximizes P(a_k|y). Since P(a_2|y) = 1 - P(a_1|y), and since the entries on the left of the above table exceed 1/2 and those on the right are less than 1/2, the given rule minimizes P(e).
c: An incorrect decision occurs if 2 or 3 of the three received digits are incorrect. Thus P(e) = 3ε²(1-ε) + ε³.
d: With 2n+1 transmitted digits, the probability of incorrect decision is minimized by picking a_1 if n or fewer of the received digits disagree with its code word and picking a_2 otherwise. Since the probability that any given digit is received incorrectly is less than 1/2, and an incorrect decision occurs only if over half the digits are incorrectly received, the law of large numbers (see problem 2.4) asserts that P(e) → 0 as n → ∞.

2.6
Let Y be the ensemble of the events p (prompt) and t (tardy). Then

  I_{X;Y}(blonde; p) = log [ P_{Y|X}(p|blonde) / P_Y(p) ] = 1 bit,

where P_Y(p) is obtained by averaging P_{Y|X}(p|·) over the hair colors. Similarly, I_{X;Y}(redhead; p) = -∞, since the girl cannot be a redhead. Finally,

  I(brunette; prompt all 3 times) = log [ P(prompt all 3 times | brunette) / P(a member of X prompt all 3 times) ] = log 5/2.

2.7
a: I_{X_1;Y_1}(0;0) = log [ Pr(1st received digit = 0 | x_1 = 0) / Pr(1st received digit = 0) ] = log [2(1-ε)].
We have used here a result which is obvious from symmetry and also simple to derive: if the input to a binary symmetric channel is 0 or 1 with equal probability, then the output is also 0 or 1 with equal probability.
b: Observe that the first three input digits to the channel are statistically independent, equiprobable binary digits. Since the channel is memoryless, it follows easily that the first 3 outputs are also statistically independent and equiprobable. Hence

  I_{X_1;Y_1Y_2Y_3}(0; 0,0,0) = I_{X_1;Y_1}(0;0) = log [2(1-ε)].

Finally, averaging over all the code words, we find that the probability of receiving y = (0,0,0,0) is proportional to (1-ε)^4 + 6(1-ε)²ε² + ε^4, while given x_1 = 0 it is proportional to 2[(1-ε)^4 + 3(1-ε)²ε²], so that

  I_{X_1;Y}(0; 0,0,0,0) = log { 2[(1-ε)^4 + 3(1-ε)²ε²] / [(1-ε)^4 + 6(1-ε)²ε² + ε^4] }.

2.8
The first N-1 binary digits are statistically independent of each other, and the Nth is determined by the first N-1, so that the information provided about x_N is 0 for n < N and 1 bit for n = N. Note that this argument is independent of the ordering of the sequence. In other words, no set of N-2 of the digits provides any information about x_N, although given any set of N-2 digits, the remaining digit resolves all uncertainty about x_N.

2.9
  H(X) = 3/2 bits      I(X;Y) = 1/2 bit
  H(Y) = 1 bit         I(X;Z) = 1 bit
  H(Z) = 1 bit         I(X;Y|Z) = 1/2 bit
  H(YZ) = 2 bits       I(X;YZ) = 3/2 bits
The quantity I(X;Y|Z) is interpreted as the average additional information provided by Y about X after Z is known. In this example the conditioning on Z is irrelevant.

2.10
By straightforward calculation and simplification of the resulting expression, we get

  I(X;Y) = (1-ε) [ p log(1/p) + (1-p) log(1/(1-p)) ].

Either optimizing directly over p or using Theorem 2.3.1, we find that p = 1/2 maximizes I(X;Y). For p = 1/2, I(X;Y) = (1-ε) bits, and

  I_{X;Y}(0;0) = I_{X;Y}(1;1) = 1 bit;  I_{X;Y}(x; erasure) = 0.

2.11
b: With the given stratagem, the transmission of a source digit is completed each time the channel does not erase. Thus the average number of source digits transmitted per channel use is (1-ε). From the law of large numbers, for a large number of channel uses N, there is a high probability that the number of source digits transmitted is close to (1-ε)N.
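As a numerical aside (not part of the original solution), the error probability found in 2.5c can be checked by enumerating the eight possible received triples for a majority decoder; ε = 0.1 is an arbitrary example value:

from itertools import product
eps = 0.1
p_err = 0.0
for y in product([0, 1], repeat=3):          # received triples when 000 is sent
    flips = sum(y)                           # number of channel errors
    prob = eps**flips * (1 - eps)**(3 - flips)
    if flips >= 2:                           # majority decoding fails
        p_err += prob
print(p_err, 3 * eps**2 * (1 - eps) + eps**3)   # both print 0.028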
2.12
a: I(X;Y) = 1 + (3/4) log(3/4) + (1/4) log(1/4) = .189 bits.
b: The capital at the end of the nth bet, C_n, is twice the amount bet on the winning color:

  C_n = C_{n-1} [2(1-q)]^{z_n} [2q]^{1-z_n},

where z_n = 1 if the predicted color occurs on the nth bet and z_n = 0 otherwise. Applying this formula for each n,

  C_N = C_0 [2(1-q)]^{Σ z_n} [2q]^{N - Σ z_n},
  E_N = (1/N) log(C_N/C_0) = (1/N) Σ_n { z_n log[2(1-q)] + (1-z_n) log[2q] }.

Since z_n = 1 with probability 3/4 and 0 with probability 1/4, and since the expectation of a product of independent variables is equal to the product of the expectations,

  E[C_N] = C_0 [ (3/2)(1-q) + q/2 ]^N,
  E[E_N] = (3/4) log[2(1-q)] + (1/4) log[2q].

By inspection, E[C_N] is maximized by q = 0, giving max E[C_N] = C_0 (3/2)^N. Differentiating E[E_N] with respect to q, we see that a unique maximum occurs at q = 1/4. Thus

  max_q E[E_N] = 1 + (3/4) log(3/4) + (1/4) log(1/4) = I(X;Y).

c: Observe that for any given q, E_N is the sample mean of a set of N identically distributed independent random variables, z_n log[2(1-q)] + (1-z_n) log[2q]. Thus, for any ε > 0, the law of large numbers states that lim_{N→∞} Pr(|E_N - E[E_N]| > ε) = 0. In terms of C_N, this result states that

  lim_{N→∞} Pr{ C_0 2^{N(E[E_N] - ε)} ≤ C_N ≤ C_0 2^{N(E[E_N] + ε)} } = 1.

In other words, E[E_N] specifies both C_N and E_N for large N to within close limits with high probability. A player who uses q = 1/4 (maximizing E[E_N]) will with overwhelming probability have a larger capital after a sufficiently large number of trials than a player who uses some other value of q. Additional insight may be gained into this peculiar situation by considering what happens with q = 0 (maximizing E[C_N]). In this case, the entire capital is bet each time on the predicted color, and any occurrence of the other color reduces the capital to 0. Thus after N trials the capital is C_0 2^N with probability (3/4)^N. As N increases, the probability of winning vanishes exponentially, but the winnings are so large when they occur that the expected value of C_N is large. This is an extreme example of a situation where the mathematical term "expectation" has no connotation of the usual English meaning of the word.

2.13
We want to maximize H(X) subject to the two constraints Σ_n n P(n) = A and Σ_n P(n) = 1. We will ignore the additional constraint P(n) ≥ 0 for all n ≥ 0 and hope that the solution using the other constraints satisfies this latter inequality constraint. Using Lagrange multipliers, we solve

  ∂/∂P(n) [ -Σ_{n=0}^∞ P(n) log P(n) - λ Σ_{n=0}^∞ n P(n) - γ Σ_{n=0}^∞ P(n) ]
    = -log P(n) - log e - λn - γ = 0,

so that P(n) = B x^n, where B and x must be chosen to satisfy the constraints:

  Σ_n P(n) = B/(1-x) = 1,   Σ_n n P(n) = Bx/(1-x)² = A.

Thus x = A/(1+A), B = 1/(1+A), and

  P(n) = [1/(1+A)] [A/(1+A)]^n.

The easiest way to verify that this is indeed the maximum (rather than just a stationary point) is by the convexity approach in section 4.4.

2.14
Always predicting no rain provides no information about the weather. One should not infer, however, that a weatherman should ideally be concerned with maximizing the average mutual information in his reports.

2.15
Since P_X(a_M) = α,

  H(X) = -α log α - Σ_{k=1}^{M-1} P_X(a_k) log P_X(a_k)
       = -α log α - (1-α) log(1-α) + (1-α) H(Y),

where Y is the ensemble with probabilities P_X(a_k)/(1-α), 1 ≤ k ≤ M-1. From theorem 2.3.1, H(Y) ≤ log(M-1), so

  H(X) ≤ -α log α - (1-α) log(1-α) + (1-α) log(M-1).
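The contrast in 2.12 between maximizing E[E_N] (q = 1/4) and maximizing the expected capital (q = 0) shows up clearly in simulation. The sketch below is illustrative only; it assumes the 3/4 - 1/4 model of 2.12 and uses arbitrary values N = 100 and 2000 trials:

import random

def fraction_ahead(q, N=100, trials=2000):
    # Fraction of trials in which the capital after N bets exceeds its starting value,
    # when a fraction 1-q is bet on the predicted color (correct with probability 3/4).
    ahead = 0
    for _ in range(trials):
        c = 1.0
        for _ in range(N):
            c *= 2 * (1 - q) if random.random() < 0.75 else 2 * q
        ahead += c > 1.0
    return ahead / trials

print(fraction_ahead(0.25))   # close to 1: the capital almost surely grows at rate E[E_N] > 0
print(fraction_ahead(0.0))    # close to 0: ruin unless every one of the N predictions wins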
2.16
Let X be an ensemble with probabilities P(a_k), 1 ≤ k ≤ K. Without loss of generality, assume that P(a_1) > P(a_2), and let Y be an ensemble with probabilities P(a_1) - ε, P(a_2) + ε, and P(a_k) for k ≥ 3, where 0 < ε < [P(a_1) - P(a_2)]/2. We must show that H(Y) > H(X).

  H(X) - H(Y) = [P(a_1)-ε] log[P(a_1)-ε] - P(a_1) log P(a_1) + [P(a_2)+ε] log[P(a_2)+ε] - P(a_2) log P(a_2)
              = P(a_1) log { [P(a_1)-ε]/P(a_1) } + P(a_2) log { [P(a_2)+ε]/P(a_2) } - ε log { [P(a_1)-ε]/[P(a_2)+ε] }.

Using the inequality log x ≤ (x-1) log e,

  H(X) - H(Y) ≤ (log e) [ P(a_1)-ε - P(a_1) + P(a_2)+ε - P(a_2) ] - ε log { [P(a_1)-ε]/[P(a_2)+ε] }
              = -ε log { [P(a_1)-ε]/[P(a_2)+ε] } < 0.

  P_XY(a_1,b_1) = 1/2,  P_XY(a_1,b_2) = 1/4,  P_XY(a_2,b_2) = 1/4.
The point of this problem is to show that it is possible to observe letters in a Y ensemble which increase the uncertainty about the X ensemble. It is easy to show, however, by the same proof as in Theorem 2.3.5, that

  Σ_k P(a_k|b_j) log [ P(a_k|b_j)/P(a_k) ]

is always positive. For a further discussion of these partial averages, see Blachman, Trans. I.T., January 1968, pp. 27-31.

2.17
a: It is equivalent to show that Σ_k P(a_k) log [Q(a_k)/P(a_k)] ≤ 0, and this follows immediately from the inequality log x ≤ (log e)(x-1).
b: Using the same inequality again,

  Σ_k P(a_k) log [P(a_k)/Q(a_k)] ≤ (log e) [ Σ_k P(a_k)²/Q(a_k) - 1 ].

2.18
For the cascaded channels of fig. 2.3.2, we saw in (2.3.15) that I(X;Y|Z) = 0. This implies that X and Y are conditionally independent given Z, which means that for each yz pair of non-zero probability,

  p(x|yz) = p(x|z).    (i)

From (2.3.17), we see that I(X;Z) ≤ I(X;Y) for cascaded channels, with equality iff I(X;Z|Y) = 0, which applies iff, for each yz pair of non-zero probability,

  p(x|yz) = p(x|y).    (ii)

Combining (i) and (ii), we see that I(X;Z) = I(X;Y) iff for each yz pair of non-zero probability

  p(x|z) = p(x|y)  for all x in X.    (iii)

a: Now suppose that P_{Y|Z}(b_j|c_i) > 0 and P_{Y|Z}(b_j|c_l) > 0. Since conditional probabilities are only defined if the conditioning events have non-zero probabilities, the yz pairs b_j c_i and b_j c_l both have non-zero probabilities, and if I(X;Z) = I(X;Y), then from (iii),

  P_{X|Z}(x|c_i) = P_{X|Y}(x|b_j) = P_{X|Z}(x|c_l)  for all x,

and c_i and c_l are equivalent. Conversely, assume that P_{YZ}(b_j,c_i) > 0 and P_{YZ}(b_j,c_l) > 0 implies that c_i and c_l are equivalent. Then from (i), p(x|yz) is independent of z for those yz pairs of non-zero probability, and thus (ii) is valid, which implies I(X;Z) = I(X;Y).
b: If c_i and c_l are equivalent for a given input distribution with P_X(a_k) > 0, then for all k

  P_{X|Z}(a_k|c_i) = P_{Z|X}(c_i|a_k) P_X(a_k) / P_Z(c_i) = P_{Z|X}(c_l|a_k) P_X(a_k) / P_Z(c_l) = P_{X|Z}(a_k|c_l),

so that, with α = P_Z(c_i)/P_Z(c_l),

  P_{Z|X}(c_i|a_k) = α P_{Z|X}(c_l|a_k)  for all k.    (iv)

If we let Q_X(a_k) > 0 be another input distribution and Q_Z the induced output distribution, then from (iv), Q_Z(c_i) = α Q_Z(c_l). Then

  Q_{X|Z}(a_k|c_i) = P_{Z|X}(c_i|a_k) Q_X(a_k)/Q_Z(c_i) = P_{Z|X}(c_l|a_k) Q_X(a_k)/Q_Z(c_l) = Q_{X|Z}(a_k|c_l),

and c_i and c_l are still equivalent in the new distribution. It then follows from the result in (a) that if I(X;Z) = I(X;Y) for the original distribution, the same result is true for the new distribution.

2.19
a: I(XY;Z) = I(X;Z) + I(Y;Z|X). Since I(Y;Z|X) ≥ 0, I(XY;Z) ≥ I(X;Z). From theorem 2.3.3, equality holds iff, conditional on each x, Y and Z are statistically independent.
b: H(XY|Z) = H(X|Z) + H(Y|XZ). Thus H(XY|Z) ≥ H(X|Z), with equality iff Y is uniquely determined by X and Z.
c: I(XY;Z) = I(X;Z) + I(Z;Y|X) = I(Z;Y) + I(X;Z|Y). Rearranging, we see that the given inequality is always satisfied with equality.
d: H(XYZ) - H(XY) = H(Z|XY) and H(XZ) - H(X) = H(Z|X). From 2.3.13, H(Z|X) ≥ H(Z|XY), with equality iff, conditional on each x, Y and Z are statistically independent. Thus the given inequality is valid, with the above conditions for equality.
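A two-line numerical check of 2.16 (illustrative only; the distribution (0.5, 0.2, 0.3) and ε = 0.1 are arbitrary choices satisfying 0 < ε < [P(a_1) - P(a_2)]/2):

from math import log2

def H(p):
    # entropy in bits of a probability vector
    return -sum(x * log2(x) for x in p if x > 0)

p = [0.5, 0.2, 0.3]
eps = 0.1
q = [p[0] - eps, p[1] + eps, p[2]]
print(H(p), H(q))    # H(q) > H(p), as the solution shows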
2.20
  P_XYZ(0,0,0) = P_XYZ(0,1,1) = P_XYZ(1,0,1) = P_XYZ(1,1,0) = 1/4.
This assignment yields I(X;Y) = 0 and I(X;Y|Z) = 1 bit.
  P_XYZ(0,0,0) = P_XYZ(1,1,1) = 1/2.
This assignment yields I(X;Y) = 1 bit and I(X;Y|Z) = 0.

2.21
var[I(x;y)] = 0 iff I(x;y) is the same for all xy pairs of positive probability, i.e. iff

  log [ P_XY(x,y) / (P_X(x)P_Y(y)) ] = log α,  i.e.  P_XY(x,y) = α P_X(x)P_Y(y),

for all xy with P_XY(x,y) > 0. Averaging I(x;y), we get I(X;Y) = log α. For α = 1, X and Y are statistically independent and I(X;Y) = 0.
c: For the first channel, P_X(a_1) = 1/2 while P_X(a_2) and P_X(a_3) are arbitrary, and I(X;Y) = 1 bit. For the second channel, P_X(a_k) = 1/3 for 1 ≤ k ≤ 3, and I(X;Y) = log 3/2.

2.22
See any elementary text book dealing with continuous valued random variables.

2.23
  p_{y|x}(y | √S) = (1/√(2πσ²)) exp[ -(y - √S)²/(2σ²) ],
  Pr(y > 0 | x = -√S) = ∫_{√(S/σ²)}^∞ (1/√(2π)) e^{-u²/2} du,
where we have made the substitution u = (y + √S)/σ. By the same argument, Pr(y < 0 | x = +√S) has the same value. Thus

  Pr[sign y ≠ sign x] = ∫_{√(S/σ²)}^∞ (1/√(2π)) e^{-u²/2} du ≈ √(σ²/(2πS)) exp(-S/(2σ²))  for σ²/S small.

Feller (section VIII.1) shows that for large y, ∫_y^∞ (1/√(2π)) e^{-u²/2} du ≈ (1/(y√(2π))) exp(-y²/2); with y = √(S/σ²) this gives the desired result.

2.24
a: -I(X;Y) = ∫∫ p_XY(x,y) log [ p_X(x)p_Y(y)/p_XY(x,y) ] dx dy ≤ (log e) ∫∫ p_XY(x,y) [ p_X(x)p_Y(y)/p_XY(x,y) - 1 ] dx dy = 0.
b: Let p(x) = 1/A for 0 ≤ x ≤ A and 0 elsewhere. Then H(X) = log A, and this is negative for A < 1.

2.25
Observe that from the symmetry, p_Y(y) = 1/(2π) for 0 ≤ y < 2π. Now p_{Y|X}(y|x) = p_Z(y - x), where the subtraction is modulo 2π, and thus

  I(X;Y) = ∫∫ p_XY(x,y) log [ p_{Y|X}(y|x)/p_Y(y) ] dx dy = log 2π - H(Z).

If p_Y(y|z) = p_Y(y) for each z, then sup_P I(X;Y) ≥ I(X;Y); thus if I(X;Y) > I(X;Y|Z), then sup I(X_n;Y_n|Z_n) > I(X;Y|Z).

SOLUTIONS TO PROBLEMS - CHAPTER 3

3.1
a: There are 1 + 100 + (100 choose 2) + (100 choose 3) = 166,751 sequences with 3 or fewer ones, and a code word must be provided for each. Since 2^17 < 166,751 ≤ 2^18, the binary code words must be of length 18.
b: Pr[sequence contains 4 or more 1's] = Σ_{i=4}^{100} (100 choose i)(.005)^i(.995)^{100-i} = .0018.
c: Using the form of the Chebyshev inequality in problem 2.4a, and letting i be the number of ones in the sequence, Pr[i ≥ 4] ≤ E[i]/4 = 0.5/4 = 1/8. Using the more usual form in problem 2.4b, Pr[i ≥ 4] ≤ Pr[|i - 0.5| ≥ 3.5] ≤ 100(.005)(.995)/(3.5)² = .04.
We observe that both forms of the bound are rather weak.

3.2
From the Chebyshev inequality (see problem 2.4c),

  Pr[ |(1/L) Σ_l I(u_l) - H(U)| ≥ δ ] ≤ var[I(u)]/(Lδ²),

where u is a single source letter and var[I(u)] = (3/4)[log(4/3)]² + (1/4)[log 4]² - [H(U)]² = .472. Note that the results will be different if a different logarithm base is used.
a: L ≈ 1884.
b: The same bound gives a far larger L; as might be expected, the Chebyshev inequality is very loose here.
c: From (3.1.13) and (3.1.14), the number of typical sequences is bounded between 2^{L[H(U)-δ]}(1 - ...) and 2^{L[H(U)+δ]}, evaluated with the values of L and δ from parts a and b.

3.3
a: Code I satisfies the prefix condition and code II does not.
b: Both codes are uniquely decodable (the occurrence of a 1 in code II always specifies the beginning of a new code word).
c: For code I, I(a_1;1) = -log 0.4; for code II, I(a_1;1) = 0.
d: For code I, I(U;X_1) = -0.4 log 0.4 - 0.6 log 0.6 = .971 bits. For code II, I(U;X_1) = 0. The initial letter in each code word of code II provides information about the previous message.
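The counts quoted in 3.1 are easy to reproduce (a verification sketch only):

from math import comb, ceil, log2

n_seq = sum(comb(100, i) for i in range(4))      # sequences with 3 or fewer ones
print(n_seq, ceil(log2(n_seq)))                  # 166751 sequences, so 18-bit code words
p_tail = sum(comb(100, i) * 0.005**i * 0.995**(100 - i) for i in range(4, 101))
print(round(p_tail, 4))                          # about .0018, as in part b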
a: First take each pair of code words for which one is a prefix of the other, and for each such pair list the dangling suffix that remains when the prefix word is removed from the initial part of the longer word. For example, the dangling suffix for the pair 01, 01110 is 110. Next consider the code words and dangling suffixes together, and for each pair consisting of one dangling suffix and one code word, of which one is a prefix of the other, list a new dangling suffix. There is, of course, no need to list suffixes which have already appeared in the list. Continue to do the same thing with each new dangling suffix added to the list until either no new dangling suffixes can be formed (in which case the code is uniquely decodable) or one of the dangling suffixes is a code word (in which case the code is not uniquely decodable).
b: Each dangling suffix formed in the first step of the procedure must be the suffix of a code word. Each dangling suffix subsequently formed is either the suffix of a code word or the suffix of some previously formed dangling suffix. But using induction, each dangling suffix is the suffix of a code word (i.e. if all dangling suffixes found up to a given point are suffixes of code words, the next dangling suffix must be likewise). Excluding the trivial case of repeated or zero-length code words, we see that for a code word of length m_i there are at most m_i - 1 suffixes of that word which could appear as dangling suffixes. Thus the total number of dangling suffixes that can appear is upper bounded by Σ_i (m_i - 1).
c: All the codes listed except {0, 01, 10} are uniquely decodable.
d: For the code {0, 01, 11}, the sequence 01111111... can be resolved into a_1 a_3 a_3 ... or a_2 a_3 a_3 .... For the code {110, 11, 100, 00, 10}, the sequence 11000000... can be resolved into a_1 a_4 a_4 ... or a_2 a_4 a_4 .... Such sequences cannot be constructed for the other uniquely decodable codes.

a: {00, 01, 100, 101, 1100, 1101, 1110, 1111}.
b: With Q_j = Σ_{i<j} P(a_i), for any k > j we observe that

  Q_k - Q_j = Σ_{i=j}^{k-1} P(a_i) ≥ P(a_j) ≥ 2^{-n_j}.

Thus the code words for a_j and a_k must differ somewhere in the first n_j positions, and since both have length at least n_j, the prefix condition is satisfied. Next, for each i, take n_i with -log P(a_i) + 1 ≤ n_i < -log P(a_i) + 2, so that 2^{-n_i} ≤ P(a_i)/2, and let Q_i = Σ_{m<i} P(a_m) + (1/2)P(a_i). For any k > j we then have

  Q_k - Q_j = (1/2)P(a_j) + Σ_{i=j+1}^{k-1} P(a_i) + (1/2)P(a_k) ≥ max[ 2^{-n_j}, 2^{-n_k} ].

Thus the code words must differ within the first n_j or n_k digits, whichever is smaller. It follows that the prefix condition is satisfied and the code is alphabetic. Since n_i < -log P(a_i) + 2, we can average over i to get n̄ < H(U) + 2, and for δ sufficiently small, n̄ > H(U) + 1 - ε.

The binary and ternary codes for the first source are {00, 10, 010, 011, 110, 111} and {0, 10, 11, 12, 20, 21}. The average lengths are 2.5 and 1.7. The codes are not unique, but the average lengths are. The codes for the second source are {00, 10, 010, 110, 111, 0110, 0111} and {0, 1, 20, 21, 220, 221, 222}. The average lengths are 2.55 and 1.65.

a: N_L is the sum of the lengths of the L code words corresponding to the L source letters. Thus N_L is the sum of L independent identically distributed random variables, each with mean n̄ = 2.5. Thus from the law of large numbers the required limit is 2.5, and in more precise terms, for any ε > 0,

  lim_{L→∞} Pr[ |N_L/L - 2.5| > ε ] = 0.

b: From theorem 3.3.2, lim_{K→∞} n̄(K)/K = H(U); also, by the same argument as in (a), with N_L^(K) the total length of the code words for L successive blocks of K source letters,

  lim_{L→∞} Pr[ |N_L^(K)/(LK) - n̄(K)/K| > ε ] = 0.

Thus

  lim_{K→∞} lim_{L→∞} Pr[ |N_L^(K)/(LK) - H(U)| > ε ] = 0.
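The dangling-suffix procedure described in part a) above is mechanical enough to code directly. The sketch below is an illustration, not part of the original solution; it implements the procedure and agrees with part c) on two of the listed codes:

def uniquely_decodable(code):
    # Grow the set of dangling suffixes; the code is not uniquely decodable
    # iff some dangling suffix is itself a code word.
    suffixes = set()
    new = {b[len(a):] for a in code for b in code if a != b and b.startswith(a)}
    while new - suffixes:
        suffixes |= new
        if suffixes & set(code):
            return False
        new = set()
        for s in suffixes:
            for w in code:
                if w.startswith(s) and w != s:
                    new.add(w[len(s):])
                if s.startswith(w) and w != s:
                    new.add(s[len(w):])
    return True

print(uniquely_decodable(["0", "01", "10"]))   # False
print(uniquely_decodable(["0", "01", "11"]))   # True, though decoding may need unbounded delay (part d)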
The two codes are {00, 01, 02, 10, 11, 12} and {00, 01, 02, 10, 11, 120, 121}. The general rule is to start out by grouping the two least probable messages together if the source alphabet size is odd and the three least likely messages together if the size is even. The Huffman procedure is then used (with D = 3) until the reduced ensemble contains only two messages, at which point the Huffman procedure with D = 2 is used. The same argument establishes this rule as establishes the ordinary Huffman procedure; the only difference is that a complete tree with the lowest branch point binary and subsequent branch points ternary has an even number of terminal nodes.

Observe that if each code word in a prefix condition code is inverted (i.e. changing x = x_1 x_2 x_3 x_4 into x_4 x_3 x_2 x_1), then the resulting code satisfies the suffix condition. A suffix condition code must be uniquely decodable, for if it was not, two sequences of code words with identical code letters could be inverted, obtaining a non-uniquely decodable sequence for the corresponding prefix condition code. A minimum average length suffix condition code can be generated by first finding the Huffman code and then inverting the code words, an operation which does not change the average length. Two such codes are {00, 01, 02, 10, 11, 12, 20, 21} and {0, 20, 21, 10, 11, 12, 220, 221}. Both have average length 2, but the variance of the length for the first is 0 and for the second is 0.4. The obvious advantage of the first code is that it generates no waiting-line problems.

Observe from (3.3.5) and (3.3.6) that H(U) = n̄ log 3 iff the inequality

  log [ 3^{-n_k}/P(a_k) ] ≤ (log e) [ 3^{-n_k}/P(a_k) - 1 ]

is satisfied with equality for all k. This in turn occurs only if 3^{-n_k} = P(a_k). If P(a_k) = 3^{-n_k} for each k, then the Kraft inequality is satisfied with equality. On the other hand, if the number of messages is even, the code tree is not complete and another code word could be added without violating the Kraft inequality. Thus the number of messages must be odd.

3.15
a: There must be at least two code words of the longest length. If the shortest word had a length shorter than this longest length minus 1, then we could decrease the length of the two longest words by 1 and increase the length of the shortest word by 1 without violating the Kraft inequality. This would yield a code of shorter average length than the original code, and this is a contradiction of the assumption that the original code is a Huffman code. Thus the longest and shortest lengths differ by at most 1, and from the Kraft inequality the lengths must be j and j+1.
b: Let L be the number of words of length j. From the Kraft inequality, which must be satisfied with equality for a binary Huffman code, we have

  L 2^{-j} + (M-L) 2^{-(j+1)} = 1,  so that  L = 2^{j+1} - M.

3.16
Let a_{M-1} and a_M be combined in the ensemble U to form the reduced ensemble U'. Then

  H(U) - H(U') = -P(a_{M-1}) log P(a_{M-1}) - P(a_M) log P(a_M) + [P(a_{M-1}) + P(a_M)] log [P(a_{M-1}) + P(a_M)]
               = [P(a_{M-1}) + P(a_M)] [ λ log(1/λ) + (1-λ) log(1/(1-λ)) ],   where λ = P(a_{M-1}) / [P(a_{M-1}) + P(a_M)].

Assuming logarithms to the base 2, the binary entropy above is bounded by 1, so that

  H(U) - H(U') ≤ P(a_{M-1}) + P(a_M).

Since an optimum code for U can be formed from an optimum code for U' by adding a terminal 0 and 1 to the last code word,

  n̄ - n̄' = P(a_{M-1}) + P(a_M).

Combining these equations, n̄ - H(U) ≥ n̄' - H(U').
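Average code word lengths like those discussed in 3.15 and 3.16 can be checked with a compact binary Huffman construction. The following sketch is illustrative only (the four-letter distribution is an arbitrary example); it returns the code word lengths, which are all that is needed to compute n̄:

import heapq

def huffman_lengths(probs):
    # Binary Huffman construction; returns the length assigned to each letter.
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, g1 = heapq.heappop(heap)
        p2, g2 = heapq.heappop(heap)
        for i in g1 + g2:          # every letter in the merged group moves one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, g1 + g2))
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]
L = huffman_lengths(probs)
print(L, sum(p * l for p, l in zip(probs, L)))   # lengths [1, 2, 3, 3], average length 1.9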
3.17
a: A prefix code can be used for the final stage of encoding, and if it is, each stage is clearly uniquely decodable and thus the overall code is uniquely decodable.
b: The indicated source sequences have probabilities 0.1, (0.9)(0.1), (0.9)²(0.1), ..., (0.9)⁷(0.1), (0.9)⁸. Thus

  n̄_1 = Σ_{i=1}^{8} i (0.1)(0.9)^{i-1} + 8(0.9)^8 = 5.6953.

c: n̄_2 = 1·(0.9)^8 + 4[1 - (0.9)^8] = 2.7086.
d: Let N(i) be the number of source digits giving rise to the first i intermediate digits. For any ε > 0,

  lim_{i→∞} Pr[ |N(i)/i - n̄_1| > ε ] = 0.

Similarly, let L(i) be the number of final encoded digits corresponding to the first i intermediate digits; then

  lim_{i→∞} Pr[ |L(i)/i - n̄_2| > ε ] = 0.

From this we see that for any ε > 0,

  lim_{i→∞} Pr[ |L(i)/N(i) - n̄_2/n̄_1| > ε ] = 0;   n̄_2/n̄_1 = .4756.

3.18
The average length for the Huffman code encoding 4 digits at a time is 1.9702. Observe from this that the Huffman code is the optimal solution to a mathematical problem with a given message set, but the choice of message set can be more important than the choice of code words for a given message set.

a: One optimal code is {1, 01, 0000, 001, 0001}, and the average time per source letter is 4.15.
b: If the probability associated with one node is greater than that associated with a shorter node, then the two nodes can be reversed (reversing also the parts of the tree stemming from them), and such a reversal must decrease the average length.

There are 2M+1 possibilities with M coins, corresponding to each coin being heavy or light or to all being right. There are 3 possible outcomes from each weighing and 3^n sequences of outcomes from n weighings. It follows that 2M+1 ≤ 3^n, i.e. M ≤ (3^n - 1)/2.
a: For any given n ≥ 1, take M = (3^n - 1)/2. Put (3^{n-1} + 1)/2 pennies on one side of the scale and (3^{n-1} - 1)/2 pennies plus the standard penny on the other side. If the scale balances, all but the (3^{n-1} - 1)/2 remaining pennies have the correct weight, and we have the original problem with n reduced by 1. If the scale is unbalanced, we have (3^{n-1} + 1)/2 potentially heavy coins and (3^{n-1} - 1)/2 potentially light coins (or perhaps vice versa). We then put (3^{n-2} + 1)/2 each of the potentially heavy and potentially light coins on one side and (3^{n-2} - 1)/2 of each, plus standard coins, on the other side. It is easy to verify that after the weighing, for each of the three possible outcomes, there remain (3^{n-2} + 1)/2 potentially heavy coins and (3^{n-2} - 1)/2 potentially light coins (or vice versa), and thus the same strategy, with n reduced by 1, will work on the next weighing.
b: The secret of success in a) was to have each weighing split the alternatives into three equal sets. With M = (3^n - 1)/2 coins and without a standard coin, the first weighing must put an equal number of coins on each pan, so an unbalanced outcome leaves an even number of alternatives, and the alternatives cannot be split into three equal sets of (odd) size 3^{n-1}. We show now how the weighing can be accomplished with M = (3^n - 3)/2 coins. First put (3^{n-1} - 1)/2 coins on each side of the scale. If the scales balance, we have (3^{n-1} - 1)/2 remaining coins to worry about, and standard coins are now available, so the strategy of part a) finishes in the remaining weighings. If the scales do not balance, we continue with the strategy in part a).

  H_N(U) = (1/N)[ H(U_1) + H(U_2|U_1) + ... + H(U_N|U_1 ... U_{N-1}) ]    (1)
         ≥ H(U_N|U_1 ... U_{N-1}),    (2)
where we have observed that the term H(U_N|U_1 ... U_{N-1}) lower bounds each of the other terms in (1).

3.22
For all n, successive source letters in positions 2n and 2n+1 are statistically dependent, but the pairs form a Markov chain as below. (Diagram: four states, every transition probability 1/2.) The steady-state probabilities are 1/4 for each state, the time-average code word length is 9/4, and the number of code letters per source letter tends to 9/8.

a: q(1) = q(3) = 2/7; q(2) = 3/7. P(a_1) = 3/7; P(a_2) = P(a_3) = 2/7.
b: H(U|s=1) = 1 1/2 bits; H(U|s=2) = 1 bit; H(U|s=3) = 0 bits.
c: H(U) = 6/7 bit.
d: For state 1, a_1 → 0, a_2 → 10, a_3 → 11. For state 2, a_1 → 0, a_2 → 1; for state 3, no code letter is provided. Given the initial state, the end of the first code word is recognizable; this specifies the source letter, and thus the next state, and so forth.
e: n̄ = 6/7. We have n̄ = H(U) if each code, conditional on the current state, has an average length equal to the entropy given the current state. Thus n̄ = H(U) if and only if all letter probabilities are of the form 2^{-i} for some non-negative integer i.
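The run-length averages quoted in 3.17 can be verified directly (a check only, using the probabilities listed in part b):

n1 = sum(i * 0.1 * 0.9**(i - 1) for i in range(1, 9)) + 8 * 0.9**8   # source digits per intermediate digit
n2 = 1 * 0.9**8 + 4 * (1 - 0.9**8)                                   # code digits per intermediate digit
print(round(n1, 4), round(n2, 4), round(n2 / n1, 4))                 # 5.6953, 2.7086, 0.4756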
The state sequence is a statistically independent sequence of 1's and 2's, and the source letters are just relabelings of the states.

  H(U_{l+1}|U_l ... U_1, S_1) ≥ H(U_{l+1}|U_l ... U_1, S_1, S_2) = H(U_{l+1}|U_l ... U_2, S_2) = H(U_l|U_{l-1} ... U_1, S_1),

where the middle step results from U_1 and S_1 being statistically independent of U_{l+1} conditional on S_2, U_2, ..., U_l (see 2.3.13). Thus H(U_l|U_{l-1} ... U_1, S_1) is nondecreasing with l, and from theorem 3.5.1, H(U_l|U_{l-1} ... U_1) is nonincreasing with l. Also, from (2.3.13), H(U_l|U_{l-1} ... U_1) ≥ H(U_l|U_{l-1} ... U_1, S_1), so both sequences converge and lim_{l→∞} H(U_l|U_{l-1} ... U_1) exists. The rate of convergence can be bounded by noting that

  l [ H(U_l|U_{l-1} ... U_1) - H(U_l|U_{l-1} ... U_1, S_1) ] ≤ Σ_{j=1}^{l} [ H(U_j|U_{j-1} ... U_1) - H(U_j|U_{j-1} ... U_1, S_1) ]
    = H(U_1 ... U_l) - H(U_1 ... U_l|S_1) = I(S_1; U_1 ... U_l) ≤ H(S_1) ≤ log J,

where J is the number of states. Combining these equations with (3.5.3),

  H(U_l|U_{l-1} ... U_1) - H(U_l|U_{l-1} ... U_1, S_1) ≤ (log J)/l.

SOLUTIONS TO PROBLEMS - CHAPTER 4

4.2
Let Q_N(x) be a joint probability assignment on the N channel inputs. For any such assignment,

  I(X^N;Y^N) = H(Y^N) - H(Y^N|X^N).

Note that H(Y^N|X^N) is the expected value of -log Pr(y|x) = Σ_{n=1}^{N} -log P(y_n|x_n), and thus H(Y^N|X^N) = Σ_{n=1}^{N} H(Y_n|X_n). Also, from 2.3.10,

  H(Y^N) ≤ Σ_{n=1}^{N} H(Y_n).    (1)

Combining these equations,

  I(X^N;Y^N) ≤ Σ_{n=1}^{N} I(X_n;Y_n) ≤ Σ_{n=1}^{N} C_n.    (2)

Equality holds in (1) if the inputs are statistically independent, and in (2) if also the individual input probabilities are chosen to achieve capacity on the individual channels. Thus the capacity of the parallel combination is Σ_n C_n.

(Diagram: two example channels, one for which H(X) > H(Z) and one for which H(X) < H(Z).)

4.3
The only non-obvious step is the last one. We have

  H(U|V,E) = Σ_{u,v} Pr(u,v|e) log [1/Pr(u|v,e)] = Σ_v Pr(v|e) Σ_u Pr(u|v,e) log [1/Pr(u|v,e)].

For each choice of v, the latter sum is an entropy over an alphabet of the M-1 choices of u unequal to v, and thus the latter sum is at most log(M-1) for each v. Summing over v, H(U|V,E) ≤ log(M-1).

Applying (4.3.22) to the source and channel in this problem, we have

  H(P(e)) ≥ 1 - C = H(ε).

This implies that ε ≤ P(e) ≤ 1 - ε. If there were some strategy that would yield an error probability greater than 1 - ε, then all decisions in that strategy could be reversed, achieving an error probability less than ε. In everyday student language, it is as difficult to get all the answers wrong in a true-false test as it is to get all the answers right. With no coding, P(e) = ε; one can do no better than to simply transmit the digits without change.
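The bound used in the preceding problem combines the step derived in 4.3 with (4.3.22). As an illustrative sketch (example values only; p_e = 0.1 and M = 4 are not from the problem), the right-hand side H(P(e)) + P(e) log(M-1) can be evaluated as follows:

from math import log2

def fano_rhs(p_e, M):
    # H(P(e)) + P(e)*log2(M-1); any decision scheme must satisfy H(U|V) <= fano_rhs(P(e), M).
    h = 0.0 if p_e in (0.0, 1.0) else -p_e * log2(p_e) - (1 - p_e) * log2(1 - p_e)
    return h + p_e * log2(M - 1)

print(fano_rhs(0.1, 4))   # about 0.63 bits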

b: P(j|k) = ε/3 for j ≠ k and P(k|k) = 1 - ε, where ε ≤ 3/4 satisfies ε log 3 + H(ε) = 2 - C.

Applying the same bound to the next problem, P(e) log(M-1) + H(P(e)) must be at least the source uncertainty minus the channel capacity; solving this inequality for M gives the required lower bound on the number of messages.

The first two steps in the derivation are straightforward. For a DMC, however, the channel output at each instant depends only on the input at that instant, yielding

  P(y_n | x_1, ..., x_N, y_1, ..., y_{n-1}) = P(y_n | x_n).
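For reference, a four-input channel with P(j|k) = 1 - ε for j = k and ε/3 for j ≠ k has capacity C = 2 - H(ε) - ε log 3, which is the same relation as ε log 3 + H(ε) = 2 - C above. A small sketch (illustrative only) evaluates it:

from math import log2

def quaternary_symmetric_capacity(eps):
    # C = 2 - H(eps) - eps*log2(3); eps = 0 gives 2 bits, eps = 3/4 gives 0.
    h = 0.0 if eps in (0.0, 1.0) else -eps * log2(eps) - (1 - eps) * log2(1 - eps)
    return 2.0 - h - eps * log2(3)

print(quaternary_symmetric_capacity(0.1))    # about 1.37 bits per channel use
print(quaternary_symmetric_capacity(0.75))   # essentially 0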
