det (AB) = det A · det B, (B.12)
det A⁻¹ = [det A]⁻¹, (B.13)
det (cA) = c^N det A, for any number c. (B.14)
A matrix is nonsingular if and only if its determinant is nonzero.
(c) The eigenvalues
Given an N × N square matrix A and a column vector u with N entries, consider the set of N linear equations
A u = λ u, (B.15)
where λ is a constant and the entries of u are the unknowns. There are only N values of λ (not necessarily distinct) such that (B.15) has a nonzero solution. These numbers are called the eigenvalues of A, and the corresponding vectors u the eigenvectors associated with them. Note that if u is an eigenvector associated with the eigenvalue λ then, for any complex number c, cu is also an eigenvector.
The polynomial a(λ) ≜ det (λI − A) in the indeterminate λ is called the characteristic polynomial of A. The equation
det (λI − A) = 0 (B.16)
is the characteristic equation of A, and its roots are the eigenvalues of A. The Cayley-Hamilton theorem states that every square N × N matrix A satisfies its characteristic equation. That is, if the characteristic polynomial of A is a(λ) = λ^N + a₁λ^(N−1) + ··· + a_N, then
a(A) ≜ A^N + a₁A^(N−1) + ··· + a_N I = 0, (B.17)
where 0 is the null matrix (i.e., the matrix all of whose elements are zero). The monic polynomial μ(λ) of lowest degree such that μ(A) = 0 is called the minimal polynomial of A. If f(x) is a polynomial in the indeterminate x, and u is an eigenvector of A associated with the eigenvalue λ, then
f(A) u = f(λ) u. (B.18)
That is, f(λ) is an eigenvalue of f(A) and u is the corresponding eigenvector. The eigenvalues λ₁, ..., λ_N of the N × N matrix A have the properties
det (A) = Π_{i=1}^{N} λ_i (B.19)
and
tr (A) = Σ_{i=1}^{N} λ_i. (B.20)
From (B.19), it is immediately seen that A is nonsingular if and only if none of its eigenvalues is zero.
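Properties (B.19) and (B.20) are easy to check numerically. The following sketch (an editorial addition, not from the original text) verifies them on a 2 × 2 symmetric matrix, whose eigenvalues follow from applying the quadratic formula to the characteristic equation (B.16):

```python
import math

# A 2x2 symmetric matrix; its eigenvalues are the roots of
# lambda^2 - tr(A)*lambda + det(A) = 0, the characteristic equation (B.16).
A = [[2.0, 1.0],
     [1.0, 2.0]]

trace = A[0][0] + A[1][1]                    # tr(A) = 4
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]      # det(A) = 3

# Quadratic formula applied to the characteristic polynomial.
disc = math.sqrt(trace**2 - 4.0*det)
eigs = [(trace + disc)/2.0, (trace - disc)/2.0]   # eigenvalues 3 and 1

# (B.19): det(A) equals the product of the eigenvalues.
assert abs(eigs[0]*eigs[1] - det) < 1e-12
# (B.20): tr(A) equals the sum of the eigenvalues.
assert abs(eigs[0] + eigs[1] - trace) < 1e-12
```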
584 Some Facts from Matrix Theory — Appendix B

(d) The spectral norm and the spectral radius
Given an N × N matrix A, its spectral norm ‖A‖ is the nonnegative number
‖A‖ ≜ sup_{x ≠ 0} ‖Ax‖ / ‖x‖, (B.21)
where x is an N-component column vector, and ‖u‖ denotes the Euclidean norm of the vector u:
‖u‖ ≜ √(Σ_{i=1}^{N} |u_i|²) = √(u†u). (B.22)
We have
‖AB‖ ≤ ‖A‖ · ‖B‖ (B.23)
and
‖Ax‖ ≤ ‖A‖ · ‖x‖ (B.24)
for any matrix B and vector x. If λ_i, i = 1, ..., N, denote the eigenvalues of A, the radius ρ(A) of the smallest disk centered at the origin of the complex plane that includes all these eigenvalues is called the spectral radius of A:
ρ(A) = max_i |λ_i|. (B.25)
In general, for an arbitrary complex N × N matrix A, we have
ρ(A) ≤ ‖A‖ (B.26)
and
‖A‖ = √(ρ(A†A)). (B.27)
If A = A†, then
ρ(A) = ‖A‖. (B.28)
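As an illustrative sketch (an editorial addition; the 2 × 2 matrix and the brute-force search over unit vectors are ours), the supremum in (B.21) can be estimated by sampling directions on the unit circle, and compared with (B.27) and (B.28) for a Hermitian matrix whose eigenvalues are known:

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]          # Hermitian (real symmetric), eigenvalues 3 and 1

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def norm(v):
    return math.sqrt(v[0]**2 + v[1]**2)

# Spectral radius (B.25): the largest eigenvalue magnitude is 3.
rho = 3.0

# Estimate ||A|| = sup ||Ax||/||x|| of (B.21) by sampling unit vectors.
est = max(norm(matvec(A, [math.cos(t*1e-3), math.sin(t*1e-3)]))
          for t in range(6284))

assert abs(est - rho) < 1e-4      # (B.28): rho(A) = ||A|| since A = A'
assert abs(est**2 - 9.0) < 1e-3   # (B.27): ||A||^2 = rho(A'A); here A'A = A^2
```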
(e) Quadratic forms
Given an N × N square matrix A and a column vector x with N entries, we call a quadratic form the quantity
x†Ax = Σ_{i=1}^{N} Σ_{j=1}^{N} x_i* a_{ij} x_j. (B.29)
B.3 SOME CLASSES OF MATRICES
Let A be an N × N square matrix.
(a) A is called symmetric if A′ = A.
(b) A is called Hermitian if A† = A.
(c) A is called orthogonal if A⁻¹ = A′.
(d) A is called unitary if A⁻¹ = A†.
(e) A is called diagonal if its entries a_{ij} are zero unless i = j. A useful notation for a diagonal matrix is
A = diag(a₁₁, a₂₂, ..., a_{NN}).
(f) A is called scalar if A = cI for some constant number c; that is, A is diagonal with equal entries on the main diagonal.
(g) A is called a Toeplitz matrix if its entries a_{ij} satisfy the condition
a_{ij} = a_{j−i}. (B.30)
That is, its elements on the same diagonal are equal.
(h) A is called circulant if its rows are all the cyclic shifts of the first one:
a_{ij} = a_{(j−i) mod N}. (B.31)
(i) A is called positive (nonnegative) definite if all its eigenvalues are positive (nonnegative). Equivalently, A is positive (nonnegative) definite if and only if for any nonzero column vector x the quadratic form x†Ax is positive (nonnegative).
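Definitions (B.30) and (B.31) translate directly into index arithmetic. The following sketch (an editorial addition; the numerical entries are hypothetical) builds a Toeplitz and a circulant matrix and checks the defining properties:

```python
N = 4
a = [10, 20, 30, 40]      # hypothetical first row a_0, ..., a_{N-1}

# Toeplitz matrix, (B.30): entry (i, j) depends only on j - i,
# i.e., elements on the same diagonal are equal.
seq = {k: k for k in range(-(N - 1), N)}   # hypothetical sequence a_k = k
T = [[seq[j - i] for j in range(N)] for i in range(N)]
assert all(T[i][j] == T[i + 1][j + 1]
           for i in range(N - 1) for j in range(N - 1))

# Circulant matrix, (B.31): row i is the cyclic shift of the first row.
C = [[a[(j - i) % N] for j in range(N)] for i in range(N)]
assert C[1] == [40, 10, 20, 30]   # second row = first row shifted by one
```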
Example B.1
Let A be Hermitian. Then the quadratic form f ≜ x†Ax is real. In fact, since f is a scalar,
f* = (x†Ax)† = x†A†x. (B.32)
Since A† = A, this is equal to x†Ax = f, which shows that f is real. □
Example B.2
Consider the random column vector x = [x₁, x₂, ..., x_N]′, and its correlation matrix
R ≜ E[x x†]. (B.33)
It is easily seen that R is Hermitian. Also, R is nonnegative definite; in fact, for any nonzero deterministic column vector a,
a†Ra = a†E[x x†]a = E[a†x x†a] = E[|a†x|²] ≥ 0, (B.34)
with equality only if a†x = 0 almost surely, that is, only if the components of x are linearly dependent.
If x₁, ..., x_N are samples taken from a wide-sense stationary discrete-time random process, and we define
r_{i−j} ≜ E[x_i x_j*], (B.35)
it is seen that the entry of R in the ith row and the jth column is precisely r_{i−j}. This shows in particular that R is a Toeplitz matrix.
If G(f) denotes the discrete Fourier transform of the autocorrelation sequence (r_k), that is, G(f) is the power spectrum of the random process (x_k) [see (2.86)], the following can be shown.
(a) The eigenvalues λ₁, ..., λ_N of R are samples (not necessarily equidistant) of the function G(f).
(b) For any continuous function ψ(·), we have the Toeplitz distribution theorem (Grenander and Szegő, 1958):
lim_{N→∞} (1/N) Σ_{i=1}^{N} ψ(λ_i) = ∫_{−1/2}^{1/2} ψ(G(f)) df. (B.36)
Example B.3
Let C be a circulant N × N matrix of the form

C = [ c₀       c₁       c₂   ···  c_{N−1} ]
    [ c_{N−1}  c₀       c₁   ···  c_{N−2} ]   (B.37)
    [   ⋮                            ⋮    ]
    [ c₁       c₂       c₃   ···  c₀      ]

Let also w ≜ e^{j2π/N}, so that w^N = 1. Then the eigenvector associated with the eigenvalue λ_i is
u_i = [1, w^i, w^{2i}, ..., w^{(N−1)i}]′ (B.38)
for i = 0, 1, ..., N − 1. The eigenvalues of C are
λ_i = Σ_{k=0}^{N−1} c_k w^{ik}, i = 0, 1, ..., N − 1, (B.39)
and λ_i can be interpreted as the value of the Fourier transform of the sequence c₀, ..., c_{N−1} taken at frequency i/N. □
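The claim of Example B.3 can be verified directly: (B.39) is the DFT of the first row, and each vector (B.38) satisfies C u = λ_i u. A minimal sketch (an editorial addition; the first-row values are hypothetical):

```python
import cmath

c = [1.0, 2.0, 0.0, 5.0]                 # hypothetical first row of C
N = len(c)
C = [[c[(j - i) % N] for j in range(N)] for i in range(N)]   # (B.37)

w = cmath.exp(2j * cmath.pi / N)         # w = e^{j 2 pi / N}, w**N = 1

for i in range(N):
    lam = sum(c[k] * w**(i * k) for k in range(N))    # (B.39): DFT of c
    u = [w**(i * m) for m in range(N)]                # (B.38): eigenvector
    Cu = [sum(C[m][j] * u[j] for j in range(N)) for m in range(N)]
    # Check C u = lambda_i u componentwise.
    assert all(abs(Cu[m] - lam * u[m]) < 1e-9 for m in range(N))
```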
Example B.4
If U is a unitary N × N matrix, and A is an arbitrary complex N × N matrix, pre- or postmultiplication of A by U does not alter its spectral norm; that is,
‖AU‖ = ‖UA‖ = ‖A‖. (B.40) □
B.4 CONVERGENCE OF MATRIX SEQUENCES
Consider the sequence (Aⁿ)_{n=1}^{∞} of powers of the square matrix A. As n → ∞, for Aⁿ to tend to the null matrix 0 it is necessary and sufficient that the spectral radius of A be less than 1. Also, as the spectral radius of A does not exceed its spectral norm, for Aⁿ → 0 it is sufficient that ‖A‖ < 1.
Consider now the matrix series
I + A + A² + ··· + Aⁿ + ···. (B.41)
For this series to converge, it is necessary and sufficient that Aⁿ → 0 as n → ∞. If this holds, the sum of the series equals (I − A)⁻¹.
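The identity I + A + A² + ··· = (I − A)⁻¹ is simple to check numerically. A sketch (an editorial addition; the 2 × 2 matrix, whose spectral radius is about 0.57, is hypothetical) sums the series and compares it with the directly computed inverse:

```python
# 2x2 example with spectral radius < 1 (eigenvalues approx. 0.57 and 0.23).
A = [[0.5, 0.2],
     [0.1, 0.3]]
I = [[1.0, 0.0], [0.0, 1.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

# Partial sums of the series (B.41).
S, P = I, I
for _ in range(200):
    P = matmul(P, A)     # P = A^n
    S = matadd(S, P)     # S = I + A + ... + A^n

# Direct 2x2 inverse of (I - A).
M = [[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]]
d = M[0][0]*M[1][1] - M[0][1]*M[1][0]
Minv = [[ M[1][1]/d, -M[0][1]/d],
        [-M[1][0]/d,  M[0][0]/d]]

assert all(abs(S[i][j] - Minv[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```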
B.5 THE GRADIENT VECTOR
Let f(x) = f(x₁, ..., x_N) be a differentiable real function of N real arguments. Its gradient vector, denoted by ∇f, is the column vector whose N entries are the derivatives ∂f/∂x_i, i = 1, ..., N. If x₁, ..., x_N are complex, that is,
x_i = u_i + j v_i, (B.42)
the gradient of f(x) is the vector whose components are
∂f/∂u_i + j ∂f/∂v_i, i = 1, ..., N.
Example B.5
If a denotes a complex column vector, and f(x) ≜ ℜ[a†x], we have
∇f(x) = a. (B.43) □

Example B.6
If A is a Hermitian N × N matrix, and f(x) ≜ x†Ax, we have
∇f(x) = 2Ax. (B.44) □
B.6 THE DIAGONAL DECOMPOSITION
Let A be a Hermitian N × N matrix with eigenvalues λ₁, ..., λ_N. Then A can be given the following representation:
A = U Λ U†, (B.45)
where Λ ≜ diag(λ₁, ..., λ_N), and U is a unitary matrix, so that U⁻¹ = U†. From (B.45) it follows that
A U = U Λ, (B.46)
which shows that the ith column of U is the eigenvector of A corresponding to the eigenvalue λ_i. For any column vector x, the following can be derived from (B.45):
x†Ax = Σ_{i=1}^{N} λ_i |y_i|², (B.47)
where y₁, ..., y_N are the components of the vector y ≜ U†x.
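Identity (B.47) can be checked on a small Hermitian matrix whose eigenvectors are known in closed form. A sketch (an editorial addition; the matrix and the test vector are hypothetical):

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]   # Hermitian, eigenvalues 3 and 1
s = 1.0 / math.sqrt(2.0)
U = [[s,  s],
     [s, -s]]                  # unitary: columns are the eigenvectors of A
lam = [3.0, 1.0]

x = [0.7, -1.9]                # arbitrary test vector

# y = U' x (U is real here, so the conjugate transpose is the transpose).
y = [U[0][0]*x[0] + U[1][0]*x[1], U[0][1]*x[0] + U[1][1]*x[1]]

# Left side of (B.47): the quadratic form x' A x.
q = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
# Right side of (B.47): sum over i of lambda_i |y_i|^2.
q2 = lam[0]*y[0]**2 + lam[1]*y[1]**2
assert abs(q - q2) < 1e-12
```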
BIBLIOGRAPHICAL NOTES
There are many excellent books on matrix theory, and some of them are certainly well known to the reader. The books by Bellman (1968) and Gantmacher (1959) are encyclopedic treatments in which details can be found about any topic one may wish to study in more depth. A modern treatment of matrix theory, with emphasis on numerical computations, is provided by Golub and Van Loan (1983). Faddeev and Faddeeva (1963) and Varga (1962) include treatments of matrix norms and matrix convergence. The most complete reference about Toeplitz matrices is the book by Grenander and Szegő (1958). For a tutorial introductory treatment of Toeplitz matrices and a simple proof of the distribution theorem (B.36), the reader is referred to Gray (1971, 1972). In Athans (1968) one can find a number of formulas about gradient vectors.
APPENDIX C
Variational Techniques
and
Constrained Optimization
In this appendix, we briefly list some of the optimization theory results used in the book. Our treatment is far from rigorous, because our aim is to describe a technique for constrained optimization rather than to provide a comprehensive development of the underlying theory. The reader interested in more details is referred to (Luenberger, 1969, pp. 171-190), from which our treatment is derived, or, alternatively, to (Gelfand and Fomin, 1963).
Let R be a function space (technically, it must be a normed linear space). Assume that a rule is provided assigning to each function f ∈ R a complex number φ{f}. Then φ is called a functional on R.
Example C.1
Let f(x) be a continuous function defined on the interval (a, b). We write f ∈ C(a, b). Then
φ₁{f} ≜ f(x₀), a < x₀ < b,
φ₂{f} ≜ ∫_a^b f(x) w(x) dx, w ∈ C(a, b),
and
φ₃{f} ≜ ∫_a^b f²(x) dx
are functionals on the space C(a, b). □
If φ is a functional on R, and f, h ∈ R, the functional
δφ{f; h} ≜ (d/dα) φ{f + αh} |_{α=0} (C.1)
is called the Fréchet differential of φ. The concept of the Fréchet differential provides a technique for finding the maxima and minima of a functional. We have the following result:
A necessary condition for φ{f} to achieve a maximum or minimum value for f = f₀ is that δφ{f₀; h} = 0 for all h ∈ R.
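Definition (C.1) can be illustrated numerically. For the functional φ{f} = ∫₀¹ f²(x) dx (of the type φ₃ in Example C.1), the Fréchet differential is δφ{f; h} = 2 ∫₀¹ f(x)h(x) dx. The sketch below (an editorial addition; the choices of f and h are hypothetical) compares this analytic expression with the difference quotient of (C.1) computed on a grid:

```python
# Midpoint-rule grid on [0, 1].
n = 20000
xs = [(i + 0.5) / n for i in range(n)]

def integrate(vals):
    return sum(vals) / n

f = [x * x for x in xs]          # hypothetical f(x) = x^2
h = [1.0 + x for x in xs]        # hypothetical direction h(x) = 1 + x

def phi(g):                      # phi{g} = integral of g(x)^2 over [0, 1]
    return integrate([v * v for v in g])

# Analytic Frechet differential: (d/d alpha) of integral (f + alpha h)^2
# at alpha = 0 equals 2 * integral f h.
exact = 2.0 * integrate([fv * hv for fv, hv in zip(f, h)])

# Finite-difference approximation of (C.1).
alpha = 1e-6
numeric = (phi([fv + alpha * hv for fv, hv in zip(f, h)]) - phi(f)) / alpha
assert abs(numeric - exact) < 1e-5
```

Here `exact` equals 2 ∫₀¹ x²(1 + x) dx = 7/6, and the difference quotient approaches it as α → 0.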
In many optimization problems the optimal function is required to satisfy certain constraints. We consider in particular the situation in which a functional φ on R must be optimized under n constraints given in the implicit form ψ₁{f} = C₁, ψ₂{f} = C₂, ..., ψ_n{f} = C_n, where ψ₁, ..., ψ_n are functionals on R, and C₁, ..., C_n are constants. We have the following result:
If f₀ ∈ R gives a maximum or a minimum of φ subject to the constraints ψ_i{f} = C_i, i = 1, ..., n, then there exist n constants (the Lagrange multipliers) λ₁, ..., λ_n such that the Fréchet differential of φ + λ₁ψ₁ + ··· + λ_nψ_n vanishes at f₀ for all h ∈ R.
Example C.2
We want to find the function f(x) maximizing the entropy functional
φ{f} ≜ −∫_{−∞}^{∞} f(x) ln f(x) dx
(f(x) ≥ 0 for −∞ < x < ∞) subject to the constraint
ψ{f} ≜ ∫_{−∞}^{∞} x² f(x) dx = 1.
The relevant Fréchet differential is
δ(φ + λψ){f₀; h} = (d/dα) [φ{f₀ + αh} + λψ{f₀ + αh}] |_{α=0}
= ∫_{−∞}^{∞} {−ln f₀(x) − 1 + λx²} h(x) dx,
and this is zero for any h(x) provided that
f₀(x) = exp(λx² − 1),
with the multiplier λ < 0 chosen to satisfy the constraint.
APPENDIX D
Transfer Functions
of State Diagrams
A state diagram of a convolutional code such as that of Fig. 9.28 is a directed graph. To define a directed graph, we give a set V = {v₁, v₂, ...} of vertices and a subset E of ordered pairs of vertices from V, called edges. A graph can be represented by drawing a set of points corresponding to the vertices and a set of arrows, one for each edge of E, connecting two vertices. A path in a graph is a sequence of edges, and can be denoted by giving the string of subsequent vertices included in the path. In the study of convolutional codes, we are interested in the enumeration of paths with similar properties.
A simple directed graph is shown in Fig. D.1. There are three vertices, v₁, v₂, and v₃, and four edges (v₁, v₃), (v₁, v₂), (v₂, v₃), and (v₂, v₂). In this graph there are infinitely many paths between v₁ and v₃ because of the loop at the vertex v₂. One path of length 4 is, for instance, v₁v₂v₂v₂v₃. Each edge of a graph can be assigned a label. An important quantity, called the transmission between two vertices, is defined as the sum of the labels of all paths of any length connecting the two vertices. The label of a path is defined as the product of the labels of the edges of the path. For example, the label of the path v₁v₂v₂v₃ is L₁₂L₂₂L₂₃.

Figure D.1 Example of a directed graph.

The transmission between v₁ and v₃ in Fig. D.1 is then given by
T(v₁, v₃) = L₁₃ + L₁₂L₂₃ + L₁₂L₂₂L₂₃ + L₁₂L₂₂²L₂₃ + ···
= L₁₃ + L₁₂L₂₃ (1 + L₂₂ + L₂₂² + ···)
= L₁₃ + L₁₂L₂₃ / (1 − L₂₂) ≜ L′₁₃. (D.1)
Notice that, in deriving (D.1), we have assumed that the labels are real numbers less than 1. Therefore, Fig. D.1 can be replaced by the scheme of Fig. D.2, with the label L′₁₃ given by (D.1).

Figure D.2 Reduced graph to compute the transmission T(v₁, v₃).
Given any graph, it is thus possible to compute the transmission between a pair of vertices by removing one by one the intermediate vertices of the graph and redefining the new labels. As an example, the graph of Fig. D.3 replicates that of Fig. 9.32, with labels A, B, C, ..., G. Using the result (D.1), the graph of Fig. D.3 can be replaced with that of Fig. D.4, in which the vertex d has been removed. By removing also the vertex b we get the graph of Fig. D.5, and finally the transmission between a and e:
T(a, e) = [ACG(1 − E) + ABFG] / [1 − E − CD + CDE − BDF]. (D.2)
This result, with a suitable substitution of labels, coincides with (9.104).
Figure D.3 Directed graph corresponding to the convolutional code state diagram of Fig. 9.32.

Figure D.4 First step to reduce the graph of Fig. D.3.

Figure D.5 Second step to reduce the graph of Fig. D.3.
There is another technique for evaluating T(a, e) that can be very useful in computations, because it is based on a matrix description of the problem. Let us denote by x_i the value of the accumulated path labels from the initial state a to the state i, as influenced by all other states. The state equations for the graph of Fig. D.3 are then
x_b = A + D x_c,
x_c = C x_b + F x_d,
x_d = B x_b + E x_d, (D.3)
x_e = G x_c.
and verify that x, is given again by (D.2)
‘The set of equations (D.3) can be given a more general and formal expression
Define the two vectors
X'S (py Kes Xap He)
x) £ (A, 0, 6, 0)
and the state transition matrix T
abc 4
0 D Oja
rea [eo 0 |b
BE 0 0 Je
GG ola
Using (D.4) and (D.5), the system (D.3) can be rewritten in matrix form as
Tx+%.
‘The formal solution to this equation can be written as
x= U-T)! x,
or as the matrix power series
X= (14TH TP + TD +> + xo.
(D.4)
(D.5)
(0.6)
7)
(D.8)
Notice that this power-series solution is very satisfying when the state diagram is viewed as describing a walk through a trellis. Each successive multiplication by T corresponds to a change in the state vector caused by going one level deeper into the trellis. Notice also that multiplication by T is very simple, since most of its entries are zero. When the number of states is small, the matrix inversion (D.7) is also useful to get the result directly in the form (D.2).
APPENDIX E
Approximate Computation
of Averages
The aim of this appendix is to describe some techniques for the evaluation of bounds or numerical approximations to the average E[g(ξ)], where g(·) is an explicitly known deterministic function, and ξ is some random variable whose probability density function is not known explicitly, or is highly complex and hence very difficult to compute exactly. It is assumed that a certain amount of knowledge is available about ξ, expressed by a finite and usually small set of its moments. Also, we shall assume that the range of ξ lies in the interval [a, b], where both a and b are finite unless otherwise stated. The techniques described hereafter are not intended to exhaust the set of possible methods for solving this problem. However, they are general enough to handle a large class of situations, and computationally efficient in terms of speed and accuracy. Also, instead of providing a single technique, we describe several, as we advocate that the specific problem at hand should determine the technique best suited to it from the viewpoint of computational effort required, accuracy, and applicability.
E.1 SERIES EXPANSION TECHNIQUE
In this section, we shall assume that the function g(x) is analytic at the point x = x₀. Hence, we can represent it in a neighborhood of x₀ using the Taylor series expansion:
g(x) = g(x₀) + (x − x₀) g′(x₀) + (1/2!)(x − x₀)² g″(x₀) + ···. (E.1)
If the radius of convergence of (E.1) is large enough to include the range of the random variable ξ, and we define
c_k ≜ E[(ξ − x₀)^k], (E.2)
then, averaging termwise the Taylor series expansion of g(ξ), we get from (E.2)
E[g(ξ)] = g(x₀) + c₁ g′(x₀) + (1/2!) c₂ g″(x₀) + ···. (E.3)
It can be seen from (E.3) that E[g(ξ)] can be evaluated on the basis of the knowledge of the sequence of moments (c_k)_{k≥1}, provided that the series converges. In particular, an approximate value of E[g(ξ)] based on a finite number of moments can be obtained by truncating (E.3):
E[g(ξ)] ≈ E_n[g(ξ)] ≜ Σ_{k=0}^{n} c_k g^{(k)}(x₀)/k!. (E.4)
The error of this approximation is
E[g(ξ)] − E_n[g(ξ)] = Σ_{k=n+1}^{∞} c_k g^{(k)}(x₀)/k!
= E[(ξ − x₀)^{n+1} g^{(n+1)}(x₀ + θ(ξ − x₀))/(n + 1)!], (E.5)
where 0 < θ < 1. Depending on the specific application characteristics, either the second or the third member of (E.5) can be used to obtain a bound on the truncation error. In any case, bounds on the values of the derivatives of g(·) and of the moments (E.2) must be available.
As an example of application of this technique, let us consider the function
g(x) = erfc((h + x)/(√2 σ)), (E.6)
where h and σ are known parameters. This function can be expanded in a Taylor series in the neighborhood of the origin by observing that (Abramowitz and Stegun, 1972, p. 298)
(d^k/dx^k) erfc(x) = (−1)^k (2/√π) H_{k−1}(x) e^{−x²}, k = 1, 2, ..., (E.7)
where H_{k−1}(·) is the Hermite polynomial of degree (k − 1) (Abramowitz and Stegun, 1972, pp. 773-787). Thus,
g^{(k)}(0) = (−1)^k (2/√π) (√2 σ)^{−k} H_{k−1}(h/(√2 σ)) exp(−h²/(2σ²)), (E.8)
and we finally get
E[g(ξ)] = erfc(h/(√2 σ)) + (2/√π) exp(−h²/(2σ²)) Σ_{k=1}^{∞} [(−1)^k/(k!(√2 σ)^k)] H_{k−1}(h/(√2 σ)) μ_k, (E.9)
where
μ_k ≜ E[ξ^k], k = 1, 2, ..., (E.10)
are the central moments of the random variable ξ.
The proof that the series (E.9) is convergent, as well as an upper bound on the truncation error, can be obtained by using the following bound on the values of the Hermite polynomials (Abramowitz and Stegun, 1972, p. 787):
|H_m(x)| ≤ β 2^{m/2} √(m!) e^{x²/2}, β ≈ 1.086435, m = 1, 2, ..., (E.11)
and the following bound on the central moments:
|μ_k| ≤ E[|ξ|^k] ≤ χ^k, k ≥ 0, (E.12)
where χ denotes the maximum value taken by |ξ|. The bound (E.12) can be easily derived under the assumption that ξ is bounded. Using (E.11) and (E.12), an explicit upper bound on the truncation error |E[g(ξ)] − E_n[g(ξ)]| can be obtained (Prabhu, 1971).
An alternative is to expand g(·) in a series of polynomials P_n(·) orthonormal with respect to ξ, which yields
E[g(ξ)] = Σ_{n=0}^{∞} a_n E[P_n(ξ)], (E.22)
which can in turn be approximated by a finite sum. The computation of (E.22) requires the knowledge of the "generalized moments" E[P_n(ξ)], which can be obtained, for example, as finite linear combinations of the central moments (E.10).
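The mechanics of the truncated series (E.4) are easy to demonstrate on a case where everything is known in closed form. The sketch below (an editorial addition; the choice g(x) = eˣ and ξ uniform on [−1/2, 1/2] is hypothetical, chosen so that all derivatives and moments are elementary) compares the eight-term truncation with the exact average:

```python
import math

# Hypothetical test case for the series expansion technique:
# g(x) = e^x, so g^{(k)}(0) = 1 for all k, and xi uniform on [-a, a].
a = 0.5
# Central moments about x0 = 0: c_k = a^k/(k+1) for even k, 0 for odd k.
c = [a**k / (k + 1) if k % 2 == 0 else 0.0 for k in range(9)]

# Truncated series (E.4) with n = 8.
approx = sum(c[k] / math.factorial(k) for k in range(9))

# Exact average: E[e^xi] = (e^a - e^{-a})/(2a) = sinh(a)/a.
exact = math.sinh(a) / a
assert abs(approx - exact) < 1e-10
```

The first neglected term, c₁₀/10!, is of order 10⁻¹¹, consistent with the error expression (E.5).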
E.2 QUADRATURE APPROXIMATIONS
In this section we shall describe an approximation technique for E[g(ξ)] based on the observation that this average can be formally expressed as an integral:
E[g(ξ)] = ∫_a^b g(x) f_ξ(x) dx, (E.23)
where f_ξ(·) denotes the probability density function of the random variable ξ. Having ascertained that the problem of evaluating E[g(ξ)] is indeed equivalent to the computation of an integral, we can resort to numerical techniques developed to compute approximate values of integrals of the form (E.23). The most widely investigated techniques for approximating a definite integral lead to the formula
∫_a^b g(x) f_ξ(x) dx ≈ Σ_{i=1}^{N} w_i g(x_i), (E.24)
a linear combination of values of the function g(·). The x_i, i = 1, 2, ..., N, are called the abscissas (or points, or nodes) of the formula, and the w_i, i = 1, 2, ..., N, are called its weights (or coefficients). The set of abscissas and weights is usually referred to as a quadrature rule. A systematic introduction to the theory of quadrature rules of the form (E.24) is given in Krylov (1962).
The quadrature rule is chosen so as to render (E.24) as accurate as possible. A first difficulty with this theory arises when one wants to define how to measure the accuracy of a quadrature rule. Since we want the abscissas and weights to be independent of g(·), and hence the same for all possible such functions, the definition of what is meant by "accuracy" must be made independent of the particular choice of g(·). The classical approach here is to select a number of probe functions and constrain the quadrature rule to be exact for these functions. Choosing the probe functions to be polynomials, it is said that the quadrature rule (E.24) has degree of precision ν if it is exact whenever g(·) is a polynomial of degree ≤ ν (or, equivalently, whenever g(x) = 1, x, ..., x^ν) and it is not exact for g(x) = x^{ν+1}.
Once a criterion of goodness for quadrature rules has been defined, the next step is to investigate which are the best quadrature rules and how they can be computed. The answer is provided by the following result from numerical analysis, slightly reformulated to fit our framework (see Krylov, 1962, for more details and a proof):
Given a random variable ξ with range [a, b] and all of whose moments exist, it is possible to define a sequence of polynomials P₀(x), P₁(x), ..., with deg P_i(x) = i, that are orthonormal with respect to ξ; that is,
E[P_n(ξ)P_m(ξ)] = δ_{nm}, m, n = 0, 1, 2, .... (E.25)
The abscissas of the Gauss quadrature rule associated with ξ are the zeros of P_N(x); this rule has degree of precision 2N − 1, and as N → ∞ the RHS of (E.24) converges to the value of the LHS "for almost any conceivable function" g(·) "which one meets in practice" (Stroud and Secrest, 1966, p. 13). However, this is not quite true in practice, essentially because the moments of ξ needed in the computation are not known with infinite accuracy. Computational experience shows that the Cholesky decomposition of the moment matrix M is the crucial step of the algorithm for the computation of Gauss quadrature rules, since M gets increasingly ill-conditioned as N increases. Round-off errors may cause the computed M to be no longer positive definite, so that its Cholesky decomposition cannot be performed, as it would imply taking the square root of negative numbers (Luvison and Pirani, 1979). In practice, values of N greater than 7 or 8 can rarely be achieved; the accuracy thus obtained is, however, satisfactory in most situations.
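The notion of degree of precision can be made concrete with the best-known Gauss rule. For ξ uniform on [−1, 1] (density 1/2), the N = 2 Gauss rule has abscissas ±1/√3 and weights 1/2, 1/2, and degree of precision 2N − 1 = 3. The sketch below (an editorial addition) checks exactness on the monomial probe functions and shows the failure at degree 4:

```python
import math

# Gauss rule for xi uniform on [-1, 1] (density 1/2), N = 2:
# abscissas +/- 1/sqrt(3), weights 1/2 each; degree of precision 3.
nodes = [-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]
weights = [0.5, 0.5]

def rule(g):                       # the RHS of (E.24)
    return sum(w * g(x) for w, x in zip(weights, nodes))

def exact_moment(k):               # E[xi^k] = (1/2) * integral of x^k on [-1, 1]
    return 0.0 if k % 2 else 1.0 / (k + 1)

# Exact for the probe functions g(x) = 1, x, x^2, x^3 ...
for k in range(4):
    assert abs(rule(lambda x: x**k) - exact_moment(k)) < 1e-12

# ... but not for g(x) = x^4: the rule gives 1/9, the true moment is 1/5.
assert abs(rule(lambda x: x**4) - 1.0/9.0) < 1e-12
assert abs(exact_moment(4) - 1.0/5.0) < 1e-12
```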
E.3 MOMENT BOUNDS
We have seen in the preceding section that the quadrature-rule approach allows E[g(ξ)] to be approximated in the form of a linear combination of values of g(·). This is equivalent to substituting, for the actual probability density function f_ξ(x), a discrete density of the form
f̂_ξ(x) = Σ_{i=1}^{N} w_i δ(x − x_i), (E.33)
where {w_i, x_i}_{i=1}^{N} are chosen so as to match the first 2N − 1 moments of ξ according to (E.31).
A more refined approach can be taken by looking for upper and lower bounds to E[g(ξ)], still based on the moments of ξ. In particular, we can set the goal of finding bounds to E[g(ξ)] that are in some sense optimum (i.e., they cannot be further tightened with the available information on ξ). The problem can be formulated as follows: given a random variable ξ with range in the finite interval [a, b], whose first M moments μ₁, μ₂, ..., μ_M are known, we want to find the sharpest upper and lower bounds to the integral
E[g(ξ)] = ∫_a^b g(x) f_ξ(x) dx, (E.34)
where g(·) is a known function and f_ξ(·) is the (unknown) probability density function of ξ. To solve this problem, we look at the set of all possible f_ξ(·) whose range is [a, b] and whose first M moments are μ₁, ..., μ_M. Then we compute the maximum and minimum value of (E.34) as f_ξ(·) runs through that set. The bounds obtained are optimum, because it is certain that a pair of random variables exists, say ξ′ and ξ″, with range in [a, b], meeting the lower and the upper bound, respectively, with the equality sign.
This extremal problem can be solved by using a set of results due essentially to the Russian mathematician M. G. Krein (see Krein and Nudel'man, 1977). These results can be summarized as follows.
everywhere in [a, 5] nonnegative, then the optimum bounds to E{(&)] are in the form
w x
2 wished Elet@ < 2, wee. (E35)
This is equivalent to saying that the two “‘extremal’’ probability density functions are
discrete, which allows the upper and lower bounds to be written in the form of quadrature
tules. If g%*%(-) is nonpositive instead of being nonnegative, it suffices to consider
=g(+) instead of g)
(b) If M is odd, then N’ = (M + 1)/2 and N” = (M + 3)/2. Also, fw), x//iy isa
Gauss quadrature rule, and {w;, x;}%”, is the quadrature rule having the maximum degree
of precision (i.e., 2N” + 1) under the constraints x; = a, xr = b. If M is even, then
Sec. £3 Moment Bounds 603(M + 2)/2. Also, {w;, xj}¥, (respectively, {w7, xj}J%,) is the quadra-
ture rule having the maximum achievable degree of precision (i.c., 2N’) under the
constraint x, = a (respectively, xj = b).
A technical condition involved in the derivation of these results requires that the Gram matrix of the moments μ₁, ..., μ_M be positive definite. For our purposes, a simple sufficient condition is that the cumulative distribution function of ξ have more than M + 1 points of increase. If ξ is a continuous random variable, this condition is immediately satisfied. If ξ is a discrete random variable, it means that ξ must take on more than M + 1 values. As M + 1 is generally a small number, the latter requirement is always satisfied in practice; otherwise, the value of E[g(ξ)] can be evaluated explicitly, with no need to bound it.
Computation of moment bounds
Once the moments μ₁, ..., μ_M have been computed, in order to use Krein's results explicitly, the quadrature rules {w_i′, x_i′}_{i=1}^{N′} and {w_i″, x_i″}_{i=1}^{N″} must be evaluated. From the preceding discussion it will not be surprising that the algorithms for computing the moment bounds (E.35) bear a close resemblance to those developed for computing Gauss quadrature rules. Indeed, the task is still to find the abscissas and weights of a quadrature rule achieving the maximum degree of precision, possibly under constraints on the location of one or two abscissas. Several algorithms are available for this computation (Yao and Biglieri, 1980; Omura and Simon, 1980). Here we shall briefly describe one of them, based on the assumption that Golub and Welsch's algorithm
TABLE E.1 Computation of abscissas and weights for moment bounds
604 Approximate Computation of Averages Appendix Edescribed in Section E.3 has been implemented. In particular, we shall show how the
known moments of & must be modified to use them as inputs to that algorithm and
how the weights and abscissas obtained at its output must be modified. These computations
are summarized in Fig. E.1 and Table E.1. By using the modified moments v; of the
first column of Table E. | as an input to Golub and Welsch’s (or an equivalent) algorithm,
the abscissas and weights of the second column are obtained. Abscissas and weights
for moment bounds, to be used in (E.35), are then obtained by performing the operations
shown in the third and fourth columns, respectively.
Figure E.1 Summary of the computations to be performed for the evaluation of moment bounds.
These computations will yield tighter bounds as the number M of available moments increases. However, as was discussed in the context of Gauss quadrature rules, M cannot be increased without bound, because of computational instabilities. In practice, it is rarely possible to increase M beyond 15 or so, but this gives sufficient accuracy in most applications.
Example E.2 (Yao and Biglieri, 1980)
As an example of application of the theory presented in this section, consider the function g(x) given in (E.6). Using the expression (E.7) for the kth derivative of the error function,
we get
g^{(k)}(x) = [(−1)^k (2/√π) (√2 σ)^{−k} exp(−(h + x)²/(2σ²))] H_{k−1}((h + x)/(√2 σ)). (E.36)
For a fixed integer k, the sign of the bracketed term in (E.36) does not depend on the value of x. On the contrary, H_{k−1}((h + x)/(√2 σ)), a polynomial having only simple zeros, changes sign whenever its argument crosses a zero. Hence, g^{(M+3)}(·) is continuous and of the same sign in [a, b] if and only if [a, b] does not contain a root of the equation
H_{M+2}((h + x)/(√2 σ)) = 0
as an interior point. Furthermore, a simple sufficient condition for g^{(M+3)}(·) to be of the same sign in [a, b] is that the largest root of the preceding equation be smaller than a; that is,
h > √2 σ z_{M+2} − a,
where z_{M+2} is the largest zero of H_{M+2}(x). Table E.2 shows the values of the largest zeros of the Hermite polynomials of degrees 3 to 20.
TABLE E.2 The largest zero z_k of the Hermite polynomial H_k(x)

k     z_k
3     1.22474
4     1.65068
5     2.02018
6     2.35060
7     2.65196
8     2.93064
9     3.19099
10    3.43616
11    3.66847
12    3.88972
13    4.10133
14    4.30445
15    4.49999
16    4.68873
17    4.87134
18    5.04836
19    5.22027
20    5.38748
As a simple numerical illustration, let a = −1, b = 1, and M = 3. Carrying out the operations of Table E.1 on the moments μ₁, μ₂, μ₃ yields the lower-bound rule {w_i′, x_i′} and the upper-bound rule {w_i″, x_i″}, and hence, through (E.35), explicit moment bounds for g(·) as given in (E.6). Note that the interval [a, b] must be small enough to make this technique applicable. □
E.4 APPROXIMATING THE AVERAGE E[g(ξ, η)]
Before ending this appendix, we shall briefly discuss the problem of evaluating approximations to the average E[g(ξ, η)], where g(·, ·) is a known deterministic function, and ξ, η are two correlated random variables with range in a region R of the plane. Exact computation of this average requires knowledge of the joint probability density function f_{ξη}(x, y) of the pair of random variables ξ, η. This may not be available, or the evaluation of the double integral
E[g(ξ, η)] = ∫∫_R g(x, y) f_{ξη}(x, y) dx dy (E.39)
may be unfeasible. In practice, it is often far easier to compute a small number of joint moments
μ_{lm} ≜ E[ξ^l η^m], l, m = 0, 1, ..., (E.40)
and to use this information to approximate E[g(ξ, η)].
The first technique that can be used for this purpose is based on the expansion of g(ξ, η) in a Taylor series. The terms of this series involve products ξ^l η^m (l, m = 0, 1, ...), so that truncating the series and averaging it termwise will provide the desired approximation.
Another possible technique uses cubature rules, a two-dimensional generalization of the quadrature rules discussed in Section E.2. With this approach, the approximation of E[g(ξ, η)] takes the form
E[g(ξ, η)] ≈ Σ_{i=1}^{N} w_i g(x_i, y_i). (E.41)
As a generalization of the one-dimensional case, we say that the cubature rule {w_i, x_i, y_i}_{i=1}^{N} has degree of precision ν if (E.41) holds with the equality sign whenever g(x, y) is a polynomial in x and y of degree ≤ ν, but not for all polynomials of degree ν + 1. Unfortunately, the construction of cubature rules with maximum degree of precision is, in general, an unsolved problem, and solutions are available only in some special cases. For example, in Mysovskih (1968) a cubature rule with degree of precision 4 and N = 6 is derived. This rule is valid when the region R and the function f_{ξη}(·, ·) are symmetric relative to both coordinate axes; thus, we must have
f_{ξη}(x, y) = f_{ξη}(−x, y) = f_{ξη}(x, −y), (x, y) ∈ R. (E.42)
With these assumptions, μ_{ik} = 0 if at least one of the numbers i and k is odd. The moments needed for the computation of the cubature rule are then μ₂₀, μ₀₂, μ₄₀, μ₀₄, and μ₂₂. Under the same symmetry assumptions, a cubature rule with N = 19 and degree of precision 9 can be obtained by using the moments μ₂₀, μ₀₂, μ₄₀, μ₀₄, μ₂₂, μ₄₂, μ₂₄, μ₆₀, μ₀₆, μ₆₂, μ₂₆, μ₈₀, μ₀₈, and μ₄₄ (Piessens and Haegemans, 1975).
If a higher degree of precision is sought, or if the symmetry requirements are not satisfied, one can resort to "good" cubature rules that can be computed through the joint moments (E.40). Formulas of the type
E[g(ξ, η)] ≈ Σ_{i=1}^{N_x} Σ_{j=1}^{N_y} w_{ij} g(x_i, y_j) (E.43)
with degree of precision ν = min(N_x − 1, N_y − 1) can be found by using the moments μ_{l0}, l = 1, ..., 2N_x, μ_{0k}, k = 1, ..., 2N_y, and μ_{lk}, l = 1, ..., N_x − 1, k = 1, ..., N_y − 1. Equivalent algorithms for the computation of the weights and abscissas in (E.43) were derived in Luvison and Navino (1976) and Omura and Simon (1980).
We conclude by commenting briefly on the important special case in which the
two random variables ξ and η are independent. In this situation, by using moments of
ξ one can construct the Gauss quadrature rule {w_i, x_i}_{i=1}^{N_x}, and by using moments of
η one can similarly obtain the Gauss quadrature rule {u_j, y_j}_{j=1}^{N_y}. Then it is a simple
matter to show that the following cubature rule can be obtained:

$E[g(\xi, \eta)] \approx \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} w_i u_j\, g(x_i, y_j),$   (E.44)

and this has degree of precision ν = min(2N_x − 1, 2N_y − 1).
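As an illustration of a product rule of the form (E.44), the following Python sketch (not part of the original text) builds the two-dimensional rule from one-dimensional Gauss rules, assuming ξ and η are independent standard Gaussian variables; the function names are illustrative only.

```python
import numpy as np

def gauss_hermite_normal(n):
    # n-point Gauss rule for a standard Gaussian density, obtained by
    # rescaling the classical Gauss-Hermite rule (weight exp(-t^2)).
    t, w = np.polynomial.hermite.hermgauss(n)
    return np.sqrt(2.0) * t, w / np.sqrt(np.pi)

def expect_product_rule(g, nx, ny):
    # Product cubature rule of the form (E.44):
    #   E[g(xi, eta)] ~ sum_i sum_j w_i u_j g(x_i, y_j)
    x, w = gauss_hermite_normal(nx)
    y, u = gauss_hermite_normal(ny)
    return sum(wi * uj * g(xi, yj)
               for xi, wi in zip(x, w) for yj, uj in zip(y, u))

# With N_x = N_y = 2 the rule is exact per variable up to degree 3;
# for independent standard Gaussians, E[xi^2 eta^2] = 1.
approx = expect_product_rule(lambda x, y: x * x * y * y, 2, 2)
```

For polynomials of per-variable degree at most 2N − 1 the result is exact, in agreement with the degree of precision quoted above.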
APPENDIX F
The Viterbi Algorithm
Consider K real scalar functions λ_0(τ_0), λ_1(τ_1), . . . , λ_{K−1}(τ_{K−1}), whose arguments
τ_0, . . . , τ_{K−1} can take on a finite number of values, and their sum

$\lambda(\tau_0, \tau_1, \ldots, \tau_{K-1}) \triangleq \sum_{k=0}^{K-1} \lambda_k(\tau_k).$   (F.1)

In this appendix we consider the problem of computing the minimum of λ(·), that is,
the quantity

$\mu \triangleq \min_{\tau_0, \ldots, \tau_{K-1}} \lambda(\tau_0, \ldots, \tau_{K-1}).$   (F.2)
Of course, this problem is trivial when the τ_k's are in a sense "independent" (i.e., the
set of values that each of them can assume does not depend on the value taken by the
other variables). In this case, the solution of the problem is obvious:

$\mu = \sum_{k=0}^{K-1} \min_{\tau_k} \lambda_k(\tau_k),$   (F.3)

and the minimization of a function of K variables is reduced to the minimization of K
functions of one variable. In mathematical parlance, this situation corresponds to the
case in which the range of the variables τ_0, τ_1, . . . , τ_{K−1} is the direct product of the
ranges of τ_k, 0 ≤ k ≤ K − 1.

Instead, let us consider a more general situation in which the range of τ_0, . . . ,
τ_{K−1} is something different from this direct product. In this case, the value taken on by
any of the τ_k affects the range of the remaining variables, and (F.3) cannot be applied.

Example F.1
A tourist wants to travel by car from Los Angeles to Denver in five days. He does not
want to drive for more than 350 miles in a day and wants to spend every night in a motel
of his favorite chain (say, Motel 60). With these constraints, the routes among which he
can choose are summarized in Fig. F.1. The best choice will be the one minimizing the
total mileage.

Clearly, the total mileage of a given route is the sum of five terms, where each
represents the distance driven in a day. Hence, this shortest-route problem has the form of
(F.2), where τ_i is the route followed on the ith day, and λ_i(τ_i) is its length. Also, (F.3)
cannot hold, as, for example, the choice to travel from Los Angeles to Blythe on the first
day will minimize λ_0(τ_0), but will result in a λ_1(τ_1) value of 228, which is not its minimum
value. □
When the "trivial" solution (F.3) is not valid, in principle one can solve (F.2)
by computing all the possible values of λ(τ_0, . . . , τ_{K−1}) (which are finite in
number) and choosing the smallest. In certain cases (for instance, in Example F.1) this
can be done; but here we assume that this task is computationally impractical because
of the large number of values to be enumerated. Hence we shall look for an algorithm
that allows us to avoid the brute-force approach. To this end, we shall first describe a
sequential algorithm to compute μ in (F.2) and then investigate under which assumptions
such an algorithm can be simplified, and to what extent.
A sequential minimization algorithm
When the variables τ_0, . . . , τ_{K−1} are not "independent" (in the sense previously
stated), if we attempt to minimize λ(·) with respect to τ_0 alone, our result will depend
on the values of τ_1, . . . , τ_{K−1}. This is equivalent to stating that the minimum found
[Figure F.1 (route map, Days 1 through 5): city names and daily mileages omitted.]
Figure F.1 Which is the shortest route from Los Angeles to Denver?
will be a function of τ_1, . . . , τ_{K−1}, say μ_1(τ_1, . . . , τ_{K−1}). Observe also that the
minimization of λ(·) with respect to τ_0 will involve only the function λ_0(τ_0) and not
the remaining terms of the summation in (F.1). The resulting function of τ_1, . . . , τ_{K−1}
can be further minimized with respect to τ_1. This operation will only involve
μ_1(τ_1, . . . , τ_{K−1}) + λ_1(τ_1). The result is now a function of τ_2, . . . , τ_{K−1}, say
μ_2(τ_2, . . . , τ_{K−1}). Repeating this procedure a sufficient number of times, we finish
with a function of τ_{K−1} alone, which can be finally minimized to provide us with the
desired value μ.
This recursive procedure can be formalized in the following manner. Denoting
by {τ_k | τ_{k+1}, . . . , τ_{K−1}}, k = 0, . . . , K − 2, the set of values that τ_k is allowed to
take on once the values of τ_{k+1}, . . . , τ_{K−1} have been chosen, the procedure involves
the following steps:

$\mu_1(\tau_1, \ldots, \tau_{K-1}) = \min_{\tau_0 \in \{\tau_0 \mid \tau_1, \ldots, \tau_{K-1}\}} \lambda_0(\tau_0),$
$\mu_{k+1}(\tau_{k+1}, \ldots, \tau_{K-1}) = \min_{\tau_k \in \{\tau_k \mid \tau_{k+1}, \ldots, \tau_{K-1}\}} \left[\mu_k(\tau_k, \ldots, \tau_{K-1}) + \lambda_k(\tau_k)\right], \quad k = 1, \ldots, K-2,$   (F.4)
$\mu = \min_{\tau_{K-1}} \left[\mu_{K-1}(\tau_{K-1}) + \lambda_{K-1}(\tau_{K-1})\right].$
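The steps above can be sketched directly in Python (a sketch added in this edition, not part of the original text; representing the allowed sets by a function `allowed(k, future)` is an assumption). Note that μ_k must be tabulated over all remaining variables, so the cost is exponential in K in general, which motivates the simplifications discussed next.

```python
from itertools import product

def sequential_min(lam, allowed, ranges):
    # lam[k] is the function lambda_k; ranges[k] is the unconstrained range
    # of tau_k; allowed(k, future) yields the set {tau_k | tau_{k+1}, ...},
    # i.e., the values tau_k may take once the later variables equal `future`.
    K = len(lam)
    # mu_1 tabulated over all tuples (tau_1, ..., tau_{K-1})
    mu = {f: min(lam[0](t) for t in allowed(0, f))
          for f in product(*ranges[1:])}
    # mu_{k+1}(tau_{k+1}, ..., tau_{K-1}) = min over tau_k of [mu_k + lambda_k]
    for k in range(1, K - 1):
        mu = {f: min(mu[(t,) + f] + lam[k](t) for t in allowed(k, f))
              for f in product(*ranges[k + 1:])}
    # final step: mu = min over tau_{K-1} of [mu_{K-1} + lambda_{K-1}]
    return min(mu[(t,)] + lam[K - 1](t) for t in ranges[K - 1])

# Unconstrained allowed-sets reduce (F.4) to (F.3): minimum is 1 + 0 + 2 = 3.
mu = sequential_min([lambda t: [3, 1][t], lambda t: [0, 5][t], lambda t: [2, 2][t]],
                    lambda k, future: (0, 1), [(0, 1)] * 3)
```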
The Viterbi algorithm
Simplifications of the basic algorithm of (F.4) can be derived from the specific
structure of the sets {τ_k | τ_{k+1}, . . . , τ_{K−1}}. The simplest possible situation arises when,
for all k = 0, . . . , K − 2, we have

$\{\tau_k \mid \tau_{k+1}, \ldots, \tau_{K-1}\} = \{\tau_k\},$   (F.5)

that is, the values that τ_k can take on are not influenced by those of τ_{k+1}, . . . , τ_{K−1}.
This is the case where we deal with "independent" variables, and we see that (F.4)
reduces to (F.3).

A more interesting situation arises when we consider the second simplest case,
in which, for all k = 0, . . . , K − 2,

$\{\tau_k \mid \tau_{k+1}, \ldots, \tau_{K-1}\} = \{\tau_k \mid \tau_{k+1}\},$   (F.6)

that is, the values that each τ_k is allowed to take on depend only on τ_{k+1}.
In this case, (F.4) simplifies to

$\mu_1(\tau_1) = \min_{\tau_0 \in \{\tau_0 \mid \tau_1\}} \lambda_0(\tau_0),$
$\mu_{k+1}(\tau_{k+1}) = \min_{\tau_k \in \{\tau_k \mid \tau_{k+1}\}} \left[\mu_k(\tau_k) + \lambda_k(\tau_k)\right], \quad k = 1, \ldots, K-2,$   (F.7)
$\mu = \min_{\tau_{K-1}} \left[\mu_{K-1}(\tau_{K-1}) + \lambda_{K-1}(\tau_{K-1})\right],$

which is the celebrated Viterbi algorithm.
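Under assumption (F.6), the recursion needs to keep only one value μ_k per state, together with the surviving path into that state if the minimizing sequence itself is wanted. A minimal Python sketch (added in this edition; the dictionary encoding of the trellis and the virtual initial state are assumptions):

```python
def viterbi_min(trellis):
    # trellis[k] is a dict {(s_prev, s_next): length} of stage-k branches;
    # stage 0 starts from the virtual initial state None.
    # Implements (F.7): mu_{k+1}(s') = min over s of [mu_k(s) + lambda_k],
    # while also storing the surviving path into each state.
    mu = {None: 0.0}
    survivor = {None: []}
    for lam in trellis:
        new_mu, new_survivor = {}, {}
        for (s, t), length in lam.items():
            if s not in mu:
                continue  # branch not reachable from the initial state
            cand = mu[s] + length
            if t not in new_mu or cand < new_mu[t]:
                new_mu[t] = cand
                new_survivor[t] = survivor[s] + [t]
        mu, survivor = new_mu, new_survivor
    best = min(mu, key=mu.get)  # final minimization over the last state
    return mu[best], survivor[best]

# A three-stage toy trellis with two states 'a' and 'b':
stages = [
    {(None, 'a'): 1.0, (None, 'b'): 4.0},
    {('a', 'a'): 2.0, ('a', 'b'): 6.0, ('b', 'a'): 1.0, ('b', 'b'): 1.0},
    {('a', 'end'): 3.0, ('b', 'end'): 0.0},
]
mu, path = viterbi_min(stages)  # mu = 5.0, path = ['b', 'b', 'end']
```

Only the survivors are extended at each step, which is exactly the path-merging behavior illustrated in Example F.3 below.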
The Viterbi algorithm can be given an interesting formulation once the minimization
problem to be solved has been formulated as the task of finding the shortest route
through a graph. To do this, let us describe the general model that in its various forms
leads to application of the Viterbi algorithm in this book. Consider an information
source generating a finite sequence ξ_0, ξ_1, . . . , ξ_{K−1} of independent symbols that
can take on a finite number M of values. These symbols are fed sequentially to a
system whose lth output, say x_l, depends in a deterministic, time-invariant way on the
present input and the L previous inputs:

$x_l = g(\xi_l, \xi_{l-1}, \ldots, \xi_{l-L}).$   (F.8)
The sequence (x_l) can be thought of as generated by a shift register, as shown in
Fig. F.2. With this model, it is usual to define the state of the shift register at the
emission of symbol ξ_l as the vector of the symbols contained in its cells, that is,

$\sigma_l \triangleq (\xi_{l-1}, \ldots, \xi_{l-L}),$   (F.9)

so we can say that the output x_l depends on the present input ξ_l and on the state σ_l of
the shift register:

$x_l = g(\xi_l, \sigma_l).$   (F.10)

When the source emits the symbol ξ_l, the shift register is brought to the state σ_{l+1} =
(ξ_l, ξ_{l−1}, . . . , ξ_{l−L+1}). Now we can define the transition between these two states
as

$\tau_l \triangleq (\sigma_l, \sigma_{l+1}).$   (F.11)
Determination of the range of the index l in (F.8) to (F.11) requires some attention,
because for l < L and for l > K − 1 the function g(·) in (F.8) has a number of arguments
[Figures F.2 and F.3 appear here.]
Figure F.2 Generation of a shift-register state sequence.
Figure F.3(a) Example of a shift-register state sequence.
Figure F.3(b) The associated trellis diagram.
that are not defined. A possible choice is to assume ξ_l = 0 for l < 0 and l > K − 1,
and to define the function g(·) accordingly. In this case, l can be assumed to range
from 0 to K + L − 1. Otherwise, we may want g(·) to be defined only when its
L + 1 arguments belong to the range of the symbols ξ_l. In this situation, we must
assume l in (F.8) to range from L to K − 1, which implies in particular that each state
can take on M^L values and each transition M^{L+1} values.
Consider now a graphical representation of this process, in the form of a trellis.
This is a treelike graph with remerging branches, where the nodes on the same vertical
line represent distinct states for a given l (this index is commonly called the time), and
the branches represent transitions from one state to the next. An example should clarify
this point.
Example F.2
Consider a shift-register state sequence with L = 2, M = 2, ξ_l ∈ {−1, 1}, and K = 5.
For 2 ≤ l ≤ 5, the states σ_l can take on four values: (±1, ±1). For l < 2 and l > 5, we
should also consider states including one or two zeros, and assume that g(·) is also defined
when one or more of its arguments are zero. In this situation, Fig. F.3(b) depicts the trellis
diagram for 2 ≤ l ≤ 5. Notice that the actual form of g(·) is irrelevant with respect to the
structure of the trellis diagram. □
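The states and transitions of the shift-register model (F.9) to (F.11) are easy to enumerate mechanically. The following Python sketch (added in this edition; the tuple encoding of states and the name `build_stage` are assumptions) reproduces the counts of Example F.2 for M = 2, L = 2:

```python
from itertools import product

def build_stage(symbols, L, g):
    # Branches of one trellis stage for the shift-register model:
    # state sigma = (xi_{l-1}, ..., xi_{l-L}); emitting xi moves it to
    # (xi, xi_{l-1}, ..., xi_{l-L+1}), and the branch carries x_l = g(xi, sigma).
    stage = {}
    for sigma in product(symbols, repeat=L):
        for xi in symbols:
            sigma_next = (xi,) + sigma[:-1]
            stage[(sigma, sigma_next)] = g(xi, sigma)
    return stage

# Example F.2 sizes: M = 2 symbols, L = 2 cells.  The form of g is
# irrelevant to the trellis structure, so an arbitrary one is used.
stage = build_stage((-1, 1), 2, lambda xi, sigma: xi * sigma[0] + sigma[1])
n_states = len({s for (s, _) in stage})  # M**L = 4 states
n_branches = len(stage)                  # M**(L+1) = 8 transitions
```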
Let us now return to the original minimization problem, to be formulated by way
of the trellis diagram just defined. It is sufficient to assume that the function to
be minimized, the one defined in (F.1), has as its arguments the transitions (F.11)
between states, so that the values λ_l(τ_l) can be associated with the branches of the
trellis (these are usually called the lengths, or the metrics, of the branches). Stated this
way, it is easy to see that the set of variables minimizing λ(·) corresponds to the
minimum-length path through the trellis. Also, it is relatively simple to show that (F.6)
holds in this situation; hence, the Viterbi algorithm can be applied. It suffices to observe
that the sequence τ_{l+1}, . . . , τ_{K−1} corresponds to a path in the trellis from σ_{l+1} to
σ_K, and that the set of transitions τ_l compatible with such a path is determined only
by σ_{l+1}.
Example F.3
Let us illustrate the application of the Viterbi algorithm to a minimization problem formulated
with a trellis diagram. Figure F.4 shows a trellis whose branches are labeled according to
their respective lengths. Figure F.5 shows the five steps to be performed to determine the
shortest path through this trellis, as briefly described in the following.

With reference to (F.7), the first step in the algorithm is to choose, for each τ_0 (or,
equivalently, for each state σ_1), the branch leading to σ_1 and having the minimum length
(this is trivial, because there is only one such branch in our example). For each value of
σ_1, store this shortest path and its length, which corresponds to the value of μ_1 in (F.7).

Extend now the paths just selected by one branch. For each state σ_2, select the branch
leading to it such that the sum of its length λ_1 plus the value of μ_1 just stored is at a
minimum. Store these unique paths, together with their total lengths μ_2.

Extend again these paths by one branch. For each state σ_3, select the branch leading
to it such that λ_2 + μ_2 is at a minimum, and store these minimum paths together with
their lengths. Similar steps should be performed for l = 4 and l = 5; when l = 5 the
algorithm is terminated. Then we are left with a single path, the shortest one, together
with its length μ. These steps are illustrated in Fig. F.5. □
An interesting fact can be observed from this example. At step 4 (see Fig. F.5d)
we see that all the paths selected so far have a common part. In fact, there is a merge,
in that all these paths pass through a single node (the uppermost one for l = 3). Clearly,
whatever happens from now on will not change anything before this merge. Hence we
can deduce that the optimum path will certainly include the first three branches of the
path depicted in Fig. F.5d.
Complexity of the Viterbi algorithm
Finally, let us briefly discuss the computational complexity of the Viterbi algorithm.
Let N_s denote the maximum number of states in the trellis diagram for any l, and N_b
the maximum number of branches leading from the nodes corresponding to time index
l to the nodes corresponding to time index l + 1, for any l (e.g., N_s = 4 and N_b = 8 in
the trellis diagram of Fig. F.4). As far as memory is concerned, the algorithm requires
no more than N_s storage locations, each capable of storing a path and its length.
The computations to be performed in each unit of time are no more than N_b
additions and N_b comparisons.
As for a shift-register sequence, N_s = M^L and N_b = M^{L+1}, so the complexity of the
Viterbi algorithm increases exponentially with the length L of the shift register. Notice
that the amount of computation required for the minimization of the function λ(·) de-
fined in (F.1) grows only linearly with K, whereas the exhaustive enumeration of all the
values of λ(·) would require a number of computations growing exponentially with K.

[Figure F.4 and Figure F.5, panels (a) through (e), appear here.]
Figure F.4 Trellis labeled with branch lengths.
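The linear-in-K versus exponential-in-K comparison can be made concrete by counting branch visits (a back-of-the-envelope sketch added in this edition; the counts are the upper bounds quoted above, not exact operation figures):

```python
def viterbi_work(M, L, K):
    # At each of the K stages the algorithm examines at most
    # N_b = M**(L + 1) branches (one addition and one comparison each).
    return K * M ** (L + 1)

def brute_force_work(M, K):
    # Exhaustive enumeration evaluates lambda(.) for all M**K sequences.
    return M ** K

# For M = 2, L = 2, K = 100: about 800 branch visits for the Viterbi
# algorithm, versus 2**100 sequences for exhaustive enumeration.
```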
BIBLIOGRAPHICAL NOTES
The Viterbi algorithm was proposed by Viterbi (1967) as a method for decoding convolu-
tional codes (see Chapter 9). Since then, it has been applied to a number of minimization
problems arising in demodulation of digital signals generated by a modulator with memory
(see Chapter 4) or in sequence estimation for channels with intersymbol interference
(Chapters 7 and 10). A survey of applications of the Viterbi algorithm, as well as a
number of details regarding its implementation, can be found in Forney (1973). The
connections between the Viterbi algorithm and dynamic programming techniques were first
recognized by Omura (1969).
References
Aaron, M. R., and D. W. Tufts (1966), "Intersymbol interference and error probability," IEEE Transactions on Information Theory, vol. IT-12, pp. 26-34.
Abend, K., and B. D. Fritchman (1970), "Statistical detection for communication channels with intersymbol interference," IEEE Proceedings, vol. 58, pp. 779-785.
Abramowitz, M., and I. A. Stegun (eds.) (1972), Handbook of Mathematical Functions. New York: Dover Publications.
Ajmone Marsan, M., and E. Biglieri (1977), "Power spectra of complex PSK for satellite communications," Alta Frequenza, vol. 46, pp. 263-270.
Ajmone Marsan, M., S. Benedetto, E. Biglieri, and R. Daffara (1977), Performance Analysis of a Nonlinear Satellite Channel Using Volterra Series, Final Report to ESTEC Contract 2328/74 HP/Rider 1, Dipartimento di Elettronica, Politecnico di Torino (Italy).
Amoroso, F. (1976), "Pulse and spectrum manipulation in the MSK format," IEEE Transactions on Communications, vol. COM-24, pp. 381-384.
Amoroso, F. (1980), "The bandwidth of digital data signals," IEEE Communications Magazine, vol. 18, pp. 13-24, November.
Anderson, J. B., and J. R. Lesh (1981), "Guest editors' prologue," IEEE Transactions on Communications, vol. COM-29, pp. 185-186.
Anderson, J. B., and D. P. Taylor (1978), "A bandwidth efficient class of signal-space codes," IEEE Transactions on Information Theory, vol. IT-24, pp. 103-112.
Anderson, R. R., and G. J. Foschini (1975), "The minimum distance for MLSE digital data systems of limited complexity," IEEE Transactions on Information Theory, vol. IT-21, pp. 544-551.
Andrisano, O. (1982), "On the behaviour of 4-PSK radio relay systems during multipath propagation," Alta Frequenza, vol. 51, pp. 112-126.
Arens, R. (1957), "Complex processes for envelopes of normal noise," IRE Transactions on Information Theory, vol. IT-3, pp. 203-207.
Arsac, J. (1966), Fourier Transforms and the Theory of Distributions, translated by Allen Nussbaum and Gretcher C. Heim. Englewood Cliffs, N.J.: Prentice-Hall.
Arthurs, E., and H. Dym (1962), "On the optimum detection of digital signals in the presence of white Gaussian noise. A geometric interpretation and a study of three basic data transmission systems," IRE Transactions on Communications Systems, vol. CS-10, pp. 336-372.
Ash, R. B. (1967), Information Theory. New York: Wiley-Interscience.
Athans, M. (1968), "The matrix minimum principle," Information and Control, vol. 11, pp. 592-606.
Aulin, T., and C.-E. W. Sundberg (1981), "Continuous phase modulation—Part I: Full response signaling," IEEE Transactions on Communications, vol. COM-29, pp. 196-209.
Aulin, T., B. Persson, N. Rydbeck, and C.-E. W. Sundberg (1982), Spectrally Efficient Constant-Amplitude Digital Modulation Schemes for Communication Satellite Applications, ESA Contract Report, May 1982.
Aulin, T., G. Lindell, and C.-E. W. Sundberg (1981), "Selecting smoothing pulses for partial response digital FM," IEE Proceedings, vol. 128, pt. F, pp. 237-244.
Aulin, T., N. Rydbeck, and C.-E. W. Sundberg (1981), "Continuous phase modulation—Part II: Partial response signaling," IEEE Transactions on Communications, vol. COM-29, pp. 210-225.
Austin, M. (1967), Decision-Feedback Equalization for Digital Communication over Dispersive Channels, MIT Res. Lab. Electron. Tech. Rep. 461, August 1967.
Babler, G. M. (1973), "Selectively faded nondiversity and space diversity narrowband microwave radio channels," Bell System Technical Journal, vol. 52, pp. 239-261.
Barker, H. A., and S. Ambati (1972), "Nonlinear sampled-data system analysis by multidimensional z-transforms," IEE Proceedings, vol. 119, pp. 1407-1413.
Barnett, W. T. (1972), "Multipath propagation at 4, 6 and 11 GHz," Bell System Technical Journal, vol. 51, pp. 321-361.
Beare, C. T. (1978), "The choice of the desired impulse response in combined linear-Viterbi algorithm equalizers," IEEE Transactions on Communications, vol. COM-26, pp. 1301-1307.
Bedrosian, E. (1962), "The analytic signal representation of modulated waveforms," IRE Proceedings, vol. 50, pp. 2071-2076.
Bedrosian, E., and S. O. Rice (1971), "The output properties of Volterra systems driven by harmonic and Gaussian inputs," IEEE Proceedings, vol. 59, pp. 1688-1707.
Belfiore, C. A., and J. H. Park, Jr. (1979), "Decision feedback equalization," IEEE Proceedings, vol. 67, pp. 1143-1156.
Bellini, S., and G. Tartara (1985), "Efficient discriminator detection of partial response continuous phase modulation," IEEE Transactions on Communications, vol. COM-33, pp. 883-886.
Bellman, R. E. (1968), Matrix Analysis, 2nd ed. New York: McGraw-Hill.
Benedetto, S., and E. Biglieri (1974), "On linear receivers for digital transmission systems," IEEE Transactions on Communications, vol. COM-22, pp. 1205-1215.
Benedetto, S., and E. Biglieri (1983), "Nonlinear equalization of digital satellite channels," IEEE Journal on Selected Areas in Communications, vol. SAC-1, pp. 57-62.
Benedetto, S., E. Biglieri, and J. K. Omura (1981), "Optimum receivers for nonlinear satellite channels," 5th International Conference on Digital Satellite Communications, Genova, Italy, March 1981.
Benedetto, S., E. Biglieri, and R. Daffara (1976), "Performance of multilevel baseband digital systems in a nonlinear environment," IEEE Transactions on Communications, vol. COM-24, pp. 1166-1175.
Benedetto, S., E. Biglieri, and R. Daffara (1979), "Modeling and performance evaluation of nonlinear satellite links—A Volterra series approach," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-15, pp. 494-507.
Benedetto, S., E. Biglieri, and V. Castellani (1973), "Combined effects of intersymbol, interchannel, and co-channel interferences in M-ary CPSK systems," IEEE Transactions on Communications, vol. COM-21, pp. 997-1008.
Benedetto, S., G. De Vincentiis, and A. Luvison (1973), "Error probability in the presence of intersymbol interference and additive noise for multilevel digital signals," IEEE Transactions on Communications, vol. COM-21, pp. 181-188.
Benedetto, S., M. Ajmone Marsan, G. Albertengo, and E. Giachin (1987), "Combined coding and modulation: Theory and applications," IEEE Transactions on Information Theory, to be published.
Bennett, W. R., and S. O. Rice (1963), "Spectral density and autocorrelation functions associated with binary FSK," Bell System Technical Journal, vol. 42, pp. 2355-2385.
Benveniste, A., and M. Goursat (1984), "Blind equalizers," IEEE Transactions on Communications, vol. COM-32, pp. 871-883.
Berger, T. (1971), Rate Distortion Theory. Englewood Cliffs, N.J.: Prentice-Hall.
Berger, T., and D. W. Tufts (1967), "Optimum pulse amplitude modulation. Part I: Transmitter-receiver design and bounds from information theory," IEEE Transactions on Information Theory, vol. IT-13, pp. 196-208.
Berlekamp, E. R. (1968), Algebraic Coding Theory. New York: McGraw-Hill.
Berlekamp, E. R., ed. (1974), Key Papers in the Development of Coding Theory. New York: IEEE Press.
Biglieri, E. (1984), "High-level modulation and coding for nonlinear satellite channels," IEEE Transactions on Communications, vol. COM-32, pp. 616-626.
Biglieri, E., A. Gersho, R. D. Gitlin, and T. L. Lim (1984), "Adaptive cancellation of nonlinear intersymbol interference for voiceband data transmission," IEEE Journal on Selected Areas in Communications, vol. SAC-2, pp. 765-777.
Biglieri, E., M. Elia, and L. Lo Presti (1984), "Optimal linear receiving filter for digital transmission over nonlinear channels," GLOBECOM 1984, Atlanta, Ga., November 1984.
Blachman, N. M. (1971), "Detectors, bandpass nonlinearities and their optimization: Inversion of the Chebyshev transform," IEEE Transactions on Information Theory, vol. IT-17, pp. 398-404.
Blachman, N. M. (1982), Noise and Its Effects on Communication, 2nd ed. Malabar, Fla.: R. E. Krieger Publishing Co.
Blahut, R. E. (1983), Theory and Practice of Error Control Codes. Reading, Mass.: Addison-Wesley.
Blanchard, A. (1976), Phase-Locked Loops. New York: Wiley.
Blanc-Lapierre, A., and R. Fortet (1968), Theory of Random Functions, vol. 2. New York: Gordon and Breach.
Boutin, N., S. Morissette, and C. Portier (1982), "Extension of Mueller's theory on optimum pulse shaping for data transmission," IEE Proceedings, vol. 129, pt. F, pp. 255-260.
Bracewell, R. N. (1978), The Fourier Transform and Its Applications, 2nd ed. New York: McGraw-Hill.
Brennan, L. E., and I. S. Reed (1965), "A recursive method of computing the Q function," IEEE Transactions on Information Theory, vol. IT-11, pp. 312-313.
Butman, S. A., and J. R. Lesh (1977), "The effects of bandpass limiters on n-phase tracking systems," IEEE Transactions on Communications, vol. COM-25, pp. 569-576.
Cambanis, S., and B. Liu (1970), "On harmonizable stochastic processes," Information and Control, vol. 17, pp. 183-202.
Cambanis, S., and E. Masry (1971), "On the representation of weakly continuous stochastic processes," Information Sciences, vol. 3, pp. 277-290.
Campbell, L. L. (1969), "Series expansions for random processes," in Proceedings of the International Symposium on Probability and Information Theory, Lecture Notes in Mathematics, no. 89, pp. 77-95. New York: Springer-Verlag.
Campopiano, C. N., and B. G. Glazer (1962), "A coherent digital amplitude and phase modulation scheme," IRE Transactions on Communication Systems, vol. CS-10, pp. 90-95.
Cariolaro, G. L., and G. P. Tronca (1974), "Spectra of block coded digital signals," IEEE Transactions on Communications, vol. COM-22, pp. 1555-1564.
Cariolaro, G. L., and S. Pupolin (1975), "Moments of correlated digital signals for error probability evaluation," IEEE Transactions on Information Theory, vol. IT-21, pp. 558-568.
Cariolaro, G. L., G. L. Pierobon, and G. P. Tronca (1983), "Analysis of codes and spectra calculations," International Journal of Electronics, vol. 55, pp. 35-79.
Castellani, V., L. Lo Presti, and M. Pent (1974a), "Performance of multilevel DCPSK systems in the presence of both interchannel and intersymbol interference," Electronics Letters, vol. 10, no. 7, pp. 111-112.
Castellani, V., L. Lo Presti, and M. Pent (1974b), "Multilevel DCPSK over real channels," IEEE International Conference on Communications (ICC '74), Minneapolis, Minn., June 1974.
Chalk, J. H. H. (1950), "The optimum pulse shape for pulse communication," IEE Proceedings, vol. 97, pt. III, pp. 88-92.
Chang, R. W. (1971), "A new equalizer structure for fast start-up digital communication," Bell System Technical Journal, vol. 50, pp. 1969-2001.
Chang, R. W., and J. C. Hancock (1966), "On receiver structures for channels having memory," IEEE Transactions on Information Theory, vol. IT-12, pp. 463-468.
Clark, G. C., and J. B. Cain (1981), Error-Correction Coding for Digital Communications. New York: Plenum Press.
Corazza, G., and G. Immovilli (1979), "On the effect of a type of modulation pulse shaping in FSK transmission systems with limiter-discriminator detection," Alta Frequenza, vol. 48, pp. 449-457.
Corazza, G., G. Crippa, and G. Immovilli (1981), "Performance analysis of quaternary CPFSK systems with modulation pulse shaping and limiter-discriminator detection," Alta Frequenza, vol. 50, pp. 77-88.
Croisier, A. (1970), "Introduction to pseudoternary transmission codes," IBM Journal of Research and Development, vol. 14, pp. 354-367.
D'Andrea, N. A., and F. Russo (1983), "First-order DPLL's: A survey of a peculiar methodology and some new applications," Alta Frequenza, vol. 52, pp. 495-505.
Daut, D. G., J. W. Modestino, and L. D. Wismer (1982), "New short constraint length convolutional code construction for selected rational rates," IEEE Transactions on Information Theory, vol. IT-28, pp. 794-800.
Davenport, W. B., Jr. (1970), Probability and Random Processes: An Introduction for Applied Scientists and Engineers. New York: McGraw-Hill.
De Buda, R. (1972), "Coherent demodulation of frequency-shift keying with low deviation ratio," IEEE Transactions on Communications, vol. COM-20, pp. 429-435.
Decina, M., and A. Roveri (1987), "Integrated Services Digital Network: Architectures and protocols," Chapter 2 of K. Feher, Advanced Digital Communications. Englewood Cliffs, N.J.: Prentice-Hall.
De Jager, F., and C. B. Dekker (1978), "Tamed frequency modulation—a novel method
to achieve spectrum economy in digital transmission," IEEE Transactions on Communications, vol. COM-26, pp. 534-542.
Devieux, C., and R. Pickholtz (1969), "Adaptive equalization with a second-order algorithm," Proceedings of the Symposium on Computer Processing in Communications, Polytechnic Institute of Brooklyn, pp. 665-681, April 8-10, 1969.
Di Toro, M. J. (1968), "Communication in time-frequency spread media using adaptive equalization," IEEE Proceedings, vol. 56, pp. 1653-1679.
Divsalar, D. (1978), Performance of Mismatched Receivers on Bandlimited Channels, Ph.D. dissertation, University of California, Los Angeles.
Dugundji, J. (1958), "Envelopes and pre-envelopes of real wave-forms," IRE Transactions on Information Theory, vol. IT-4, pp. 53-57.
Duttweiler, D. L. (1982), "Adaptive filter performance with nonlinearities in the correlation multiplier," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-30, pp. 578-586.
Dym, H., and H. P. McKean (1972), Fourier Series and Integrals. New York: Academic Press.
Elia, M. (1983), "Symbol error rate of binary block codes," Transactions of the 9th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pp. 223-227, Prague, June 1982.
Elnoubi, S., and S. C. Gupta (1981), "Error rate performance of noncoherent detection of