
Advanced Probability and Statistics:
Applications to Physics and
Engineering

Harish Parthasarathy
Professor
Electronics & Communication Engineering
Netaji Subhas Institute of Technology (NSIT)
New Delhi, Delhi-110078
First published 2023
by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
and by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
© 2023 Harish Parthasarathy and Manakin Press
CRC Press is an imprint of Informa UK Limited
The right of Harish Parthasarathy to be identified as author of this work has been asserted in
accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any
form or by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying and recording, or in any information storage or retrieval system,
without permission in writing from the publishers.
For permission to photocopy or use material electronically from this work, access www.
copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact
[email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks,
and are used only for identification and explanation without intent to infringe.
Print edition not for sale in South Asia (India, Sri Lanka, Nepal, Bangladesh, Pakistan or
Bhutan).
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record has been requested
ISBN: 9781032384375 (hbk)
ISBN: 9781032384382 (pbk)
ISBN: 9781003345060 (ebk)
DOI: 10.1201/9781003345060
Typeset in Arial, MinionPro, Symbol, CalisMTBol, TimesNewRoman, RupeeForadian,
Wingdings, ZapDingbats, Euclid, MT-Extra
by Manakin Press, Delhi
Preface
This book is primarily a book on advanced probability and statistics that
could be useful for undergraduate and postgraduate students of physics, engi-
neering and applied mathematics who desire to learn about the applications of
classical and quantum probability to problems of classical physics, signal pro-
cessing and quantum physics and quantum field theory. The prerequisites for
reading this book are basic measure theoretic probability, linear algebra, differ-
ential equations, stochastic differential equations, group representation theory
and quantum mechanics. The book deals with classical and quantum probabili-
ties including a decent discussion of Brownian motion, Poisson process and their
quantum non-commutative analogues. The basic results of measure theoretic
integration which are important in constructing the expectation of random vari-
ables are discussed. The Kolmogorov consistency theorem for the existence of
stochastic processes having given consistent finite dimensional probability dis-
tributions is also outlined. For doing quantum probability in Boson Fock space,
we require the construction of the tensor product between Hilbert spaces. This
construction based on the GNS principle, Schur’s theorem of positive definite
matrices and Kolmogorov’s consistency theorem has been outlined. The laws of
large numbers for sums of independent random variables are introduced here and
we state the fundamental inequalities and properties of Martingales originally
due to J.L.Doob culminating finally in the proof of the Martingale convergence
theorem based on the downcrossing/upcrossing inequalities. Doob’s Martingale
inequality can be used to give an easy proof of the strong law of large numbers
and we mention it here. Doob’s optional stopping theorem for submartingales is
proved and applied to calculating the distribution of the hitting times of Brow-
nian motion. We give another proof of this distribution based on the reflection
principle of Désiré André which once again rests on the strong Markov property
of Brownian motion.

Applications of the theory of stochastic processes, especially higher order
statistics and spectra, to the characterization of diseased-brain EEG and speech
signals have also been discussed here. Several different models of diseased
brain data have been proposed here as an application of higher order spectra.
These include (a) modeling the brain signals as the output of Volterra filters to
Gaussian signals and estimating the Volterra coefficients from the higher order
spectra of the output measured EEG, speech and MRI data, and (b) construct-
ing dynamical models for the speech, EEG and MRI signal field data as solutions
to partial differential equations in space-time with Gaussian noise input and then
estimating the parameters of these pde’s from noisy sparse measurements using
the extended Kalman filter. We propose this method as a training process for
determining the diseased brain parameters for each disease and then given fresh
data, we estimate the same parameters using the EKF and match it to those
obtained during the training process.
Here, we assume that the reader is familiar with non-linear filtering theory
based on the Kushner-Kallianpur approach. Of course, the derivation of this
filter requires knowledge of conditional expectations and Ito's formula for Brownian
motion, and of course also basic facts about Markov processes. We also
in this section discuss group theoretic signal and image processing and apply
these methods to EEG data analysis, showing that when the spatio-temporal
dynamics of the EEG field data is invariant under a Lie group of transformations
acting on the spatial variables, and when the noise has G-invariant statistics,
the dynamics can be simplified by taking group theoretic Fourier transforms
based on the irreducible representations of the group. With a simplified
dynamics, the parameters of the dynamical system
are more easily estimated. We also discuss statistical aspects of 3-dimensional
robot links with random torque using the techniques of Lie-group, Lie algebra
theory applied to the 3-dimensional rotation group. The theory of quantum
neural networks has also been outlined here. It involves estimating dynami-
cally the probability density of a random vector using the modulus square of a
wave function following Schrodinger’s equation and driven by a potential that
represents the difference between the true pdf and the Schrodinger pdf. This
is a natural approach since the Schrodinger equation by virtue of its unitary
evolution has the property of naturally generating a whole family of probability
densities indexed by time, no matter what the driving potential is as long as it is
real, so that the Hamiltonian is a self-adjoint operator. In this section, we also
introduce the reader to how probabilities of events are computed in quantum
scattering theory based on the wave operators associated with two Hamiltoni-
ans. The ultimate aim of scattering theory is to get to know the asymptotic
probability distribution of scattered particles when the particles are free projec-
tiles which get to interact with a scattering centre. One important probabilistic
feature of scattering theory is to determine the relative average time spent by
the scattered particle within a Borel set in position space and we discuss this
problem here. We also discuss in this section some work carried out by the
author’s doctoral student along with the author on quantum image processing.
This involves encoding classical image fields into quantum states and designing
quantum unitary processors that will denoise this state and then decoding the
processed state back into a classical image field. The other parts of the work
carried out by the author's doctoral student along with the author deals with
estimating time varying classical parameters of a quantum electromagnetic field
(ie a quantum image field) by allowing the quantum field to interact with an
electron of an atom and measuring the transition probabilities of the electron
between two of its stationary states as a function of time. The EKF has been
applied to this problem to obtain real time estimates of the classical parameters
that are governed by a classical stochastic differential equation. Applications of
V.P.Belavkin’s theory of quantum filtering with non-demolition measurements
taken on the Hudson-Parthasarathy noisy Schrodinger equation are discussed.
These include problems like obtaining real time estimates of the cavity electro-
magnetic field, the cavity gravitational field etc when the bath surrounding the
cavity has a noisy electromagnetic field. This problem can be viewed as an exer-
cise in non-commutative probability theory. We also present some work dealing
with linearization of the Einstein-Maxwell field equations since this work fits
into the framework of linear partial differential equations driven by stochastic
fields and hence is a problem in advanced stochastic field theory. We also discuss
some applications of advanced probability to general electromagnetics and ele-
mentary quantum mechanics like what is the statistics of the far field radiation
pattern produced by a random current source and how when this random elec-
tromagnetic field is incident upon an atom modeled by the Schrodinger or Dirac
equation, the stochastically averaged transition probability can be computed in
terms of the classical current correlations. Many other problems in statistical
signal processing like prediction and filtering of stationary time series, Kalman,
Extended Kalman and Unscented Kalman filters, the MUSIC and ESPRIT al-
gorithms for estimating the directions of random signal emitting sources, the
recursive least squares lattice algorithm for order and time recursive prediction
and filtering are discussed. We have also included some material on superstring
theory since it is closely connected with the theory of operators in Boson and
Fermion Fock spaces which is now an integral component of non-commutative
probability theory. Some aspects of supersymmetry have also been discussed
in this book with the hope that supersymmetric quantum systems can be used
to design quantum gates of very large size. Various aspects and applications
of large deviation theory have also been included in this book as it forms an
integral part of modern probability theory which is used to calculate the prob-
ability of rare events like the probability of a stochastic dynamical system with
weak noise exiting the stability zone. The computation of such probabilities
enables us to design controllers that will minimize this deviation probability.
The chapter on applied differential equations focuses on problems in robotics
and other engineering or physics problems wherein stochastic processes and fields
inevitably enter into the description of the dynamical system. The chapter on
circuit theory and device physics has also been included since it tells us how to
obtain the governing equations for diodes and transistors from the band struc-
ture of semiconductors. When a circuit is built using such elements and thermal
noise is present in the resistances, then the noise gets distorted and even am-
plified by the nonlinearity of the device and the mathematical description of
such circuits can be calculated by perturbatively solving associated nonlinear
stochastic differential equations. Quantum scattering theory has also been in-
cluded since it tells us how quantum effects make the probability distribution of
scattered particles different from that obtained using classical scattering theory.
Thus, quantum scattering theory is an integral part of quantum probability.
Many discussions on the Boltzmann kinetic transport equation in a plasma are
included in this book since the Boltzmann distribution function at each time
t can be viewed as an evolving probability density function of a particle in
phase space. In fact, the Boltzmann equation is so fundamental that it can be
used to derive not only more precise forms of the fluid dynamical equations but
also describe the motion of conducting fluids in the presence of electromagnetic
fields. Any book on applications of advanced probability theory must therefore
necessarily include a discussion of the Boltzmann equation. It can be used to
derive the Fokker-Planck equation for diffusion processes after making approx-
imations and, by including the nonlinear collision terms, it can also be used to
prove the H-theorem, ie, the second law of thermodynamics. The section on the
Atiyah-Singer index theorem has been included because it forms an integral part
of calculating anomalies in quantum field theory which is in turn a branch of
non-commutative probability theory.

At this juncture, it must be mentioned that this book in the course of dis-
cussing applied probability and statistics, also surveys some of the research
work carried out by eminent scientists in the field of pure and applied prob-
ability, quantum probability, quantum scattering theory, group representation
theory and general relativity. In some cases, we also indicate the train of thought
processes by which these eminent scientists arrived at their fundamental con-
tributions. To start with, we review the axiomatic foundations of probability
theory due to A.N.Kolmogorov and how the Indian school of probabilists and
statisticians used this theory effectively to study a host of applied probability
and statistics problems like parameter estimation, convergence of a sequence
of probability distributions, martingale characterization of diffusions enabling
one to extend the scope of the Ito stochastic differential equations to situations
when the drift and diffusion coefficients do not satisfy Lipschitz conditions, and
the generalization of the large deviation principle and its application to problems
involving random environments, interacting particle systems etc. We then discuss the work
of R.L.Hudson along with K.R.Parthasarathy on developing a coherent theory
of quantum noise and apply it to study in a rigorous mathematical way the
Schrodinger equation with quantum noise. This gives us a better understand-
ing of open quantum systems, ie systems in which the system gets coupled to
a bath with the joint system-bath universe following a unitary evolution and
after carrying out a partial trace over the environment, how one ends up with
the standard Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) equation for the
system state alone–this is a non-unitary evolution. The name of George Su-
darshan stands out here as not only one of the creators of open quantum sys-
tem theory but also as the physicist involved in developing the non-orthogonal
resolution of the identity operator in Boson Fock space which enables one to
effectively solve the GKSL equation. We then discuss the work of K.B.Sinha
along with W.O.Amrein in quantum scattering theory especially in the devel-
opment of the time delay principle which computes the average time spent by
the scattered particle in a scattering state relative to the time spent by it in the
free state. We discuss the substantial contributions of the Indian school of general
relativists like the Nobel laureate Subrahmanyan Chandrasekhar on developing
perturbative tools for solving the Einstein-Maxwell equations in a fluid (post-
Newtonian hydrodynamics) and also the work of Abhay Ashtekar and Ashoke
Sen on canonical quantization of the gravitational field and superstring the-
ory. We discuss the train of thought that led the famous Indian probabilist
S.R.S.Varadhan to develop along with Daniel W.Stroock the martingale charac-
terization of diffusions and along with M.D.Donsker to develop the variational
formulation of the large deviation principle which plays a fundamental role in
assessing the role of weak noise on a system to cause it to exit a stability zone
by computing the probability of this rare event. We then discuss the work of
the legendary Indian mathematician Harish-Chandra on group representation
theory especially his creation of the discrete series of representations for groups
having both non-compact and compact Cartan subgroups to finally obtain the
Plancherel formula for such semisimple Lie groups. We discuss the impact of
Harish-Chandra’s work on modern group theoretical image processing, for ex-
ample in estimating the element of the Lorentz group that transforms a given
image field into a moving and rotating image field. We then discuss the con-
tributions of the famous Indian probabilist Gopinath Kallianpur to developing
non-linear filtering theory in its modern form along with the work of some other
probabilists like Kushner and Striebel. Kallianpur’s theory of nonlinear filter-
ing is the most general one as we know today since it is applicable to situations
when the process and measurement noises are correlated. Kallianpur’s mar-
tingale approach to this problem has in fact directly led to the development
of the quantum filter of V.P.Belavkin as a non-commutative generalization of
the classical version. Nonlinear filtering theory has been applied in its linearized
approximate form, the Extended Kalman filter, to problems in robotics and EEG-
MRI analysis of medical data. It is applicable in fact to all problems where the
system dynamics is described by a noisy differential or difference equation and
one desires to estimate both the state and parameters of this dynamical system
on a real time basis using partial noisy measurement data. We conclude this
work with brief discussions of some of the contributions of the Indian school
of robotics and quantum signal processing to image denoising, robot control
via teleoperation and to artificial intelligence/machine learning algorithms for
estimating the nature of brain diseases from slurred speech data. This review
also includes the work of K.R.Parthasarathy on the noiseless and noisy Shan-
non coding theorems in information theory especially to problems involving the
transmission of information in the form of stationary ergodic processes through
finite memory channels and the computation of the Shannon capacity for such
problems. It also includes the pedagogical work of K.R.Parthasarathy in sim-
plifying the proof of Andreas Winter and A.S.Holevo on computing the capacity
of iid classical-quantum channels wherein classical alphabets are encoded into
quantum states and decoding positive operator valued measures are used in the
decoding process. We also include here the work of K.R.Parthasarathy on realiz-
ing via a quantum circuit, the recovery operators in the Knill-Laflamme theorem
for recovering the input quantum state after it has been transmitted through a
noisy quantum channel described in the form of Choi-Kraus-Stinespring opera-
tors. Some generalizations of the single qubit error detection algorithm of Peter
Shor based on group theory due to K.R.Parthasarathy are also discussed here.
This book also contains some of the recent work of the Indian school of robotics
involving modeling the motion of rigid 3-D links in a robot using Lie group-Lie
algebra theory and differential equations on such Lie groups. After setting up
these kinematic differential equations in the Lie algebra domain of SO(3)⊗n , we
include weak noise terms coming from the torque and develop a large deviation
principle for computing the approximate probability of exit of the robot from
the stability zone. We include feedback terms to this robot system and optimize
this feedback controller so that the probability of stability zone exit computed
using the large deviation rate function is as small as possible.

Table of Contents

1. Classical and Quantum Probability 1–38

2. Quantum Scattering Theory 39–56

3. Linear Algebra and Operator Theory 57–98

4. Group Theory in Statistical Signal Processing and Control 99–134

5. Statistical Aspects of EEG Signal Analysis and


Image Processing in the Medical Sciences 135–168

6. Electromagnetism and Quantum Field Theory 169–208

7. Some Aspects of Superstring Theory 209–244

8. Superconductivity 245–248

9. Some Aspects of Large Deviation Theory 249–272

10. Contributions of Some Indian Scientists 273–296

11. Applied Differential Equations

12. Quantum Signal Processing 333–394

13. Aspects of Circuit Theory and Device Physics 395–398

14. Index on the Contents and Notes 399–414


Chapter 1

Classical and Quantum


Probability

[1] Classical probability spaces:


This is a triple (Ω, F, P) where Ω is the sample space of all elementary out-
comes of the experiment and F is the σ-field of events. The elements of F are
subsets of Ω and the class F is closed under countable unions and comple-
mentation, and therefore, by De Morgan's rules, it is also closed under countable
intersections. P : F → [0, 1] is a countably additive map satisfying P(Ω) = 1
and therefore P(∅) = 0. A perhaps surprising fact is that non-empty events can
have zero probability, ie, events that can actually occur may still carry zero
probability. An example is the uniform probability distribution over [0, 1]:
any event here that has non-zero probability is an uncountable union of single
point events, and each single point event has zero probability.

[2] Quantum probability spaces: This is a triple (H, P, ρ) where H is a
Hilbert space, P is a lattice of orthogonal projections in H and ρ is a state, ie,
a positive semi-definite operator in H having unit trace.
If (Ω, F, P) is a classical probability space, then we can define H = L²(Ω, F, μ),
P as consisting of orthogonal projections having the form χ_F, F ∈ F, ie, mul-
tiplication by the indicator of F, and ρ to be such that

⟨f, ρg⟩ = ∫_Ω f̄(ω)g(ω) dP(ω), f, g ∈ H

Here μ is a measure on (Ω, F) such that P is absolutely continuous w.r.t. μ. In
order that ρ have unit trace, we require that if e_n, n = 1, 2, ... is an onb in H,
then

Σ_n ∫_Ω |e_n(ω)|² dP(ω) = 1

We note that

Tr(ρ·χ_F) = Σ_n ∫_F |e_n(ω)|² dP(ω)



In particular, if we choose the e_n's such that Σ_n |e_n(ω)|² = 1 for P-a.e. ω, then
we get

Tr(ρ·χ_F) = P(F), F ∈ F

and (H, {χ_F : F ∈ F}, ρ) is then a quantum probability space reproducing the
classical probabilities.

[3] Random variables in classical and quantum probability. In classical prob-


ability, we are given a probability space (Ω, F, P ) and a random variable is then
simply a measurable mapping X from the sample space to the real line or more
generally from the sample space to another measurable space (G, B). The prob-
ability distribution of the random variable is then P X −1 . This probability
distribution is then a probability measure on (G, B). If the range space of the
random variable is (Rn , B(Rn )), then by using the countable additivity prop-
erty of the probability measure and elementary properties of the inverse image
mapping of sets, it is easily seen that the probability distribution function of X
defined as

FX(x1, ..., xn) = P X⁻¹((−∞, x1] × ... × (−∞, xn])

is right continuous in each variable, non-decreasing in each variable, and
converges as xn → ∞ to FY(x1, ..., x_{n−1}) and to zero as xn → −∞, where Y is
the r.v. obtained by deleting the last component of X. In quantum probability,
we are given a quantum probability space (H, P (H), ρ) and a random variable,
also called an observable, is a self-adjoint operator X in H. If the spectral
representation of X is known, X can be regarded as a real valued random
variable in the state ρ with probability distribution FX(x) = Tr(ρ EX((−∞, x])),
where EX denotes the spectral measure of X.
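The distribution of an observable in a state can be computed directly from the spectral representation. The following numpy sketch is an illustrative toy example added here (the particular X and ρ are my own assumptions, not from the text): it diagonalizes a self-adjoint matrix and assigns probability Tr(ρ P_k) to each eigenvalue.

```python
import numpy as np

# Sketch (assumed toy example): the distribution of an observable X in a
# state rho, obtained from the spectral representation of X.
X = np.array([[1.0, 1.0], [1.0, -1.0]])        # a self-adjoint operator on C^2
rho = np.array([[0.75, 0.0], [0.0, 0.25]])     # a density matrix: positive, unit trace

lam, U = np.linalg.eigh(X)                     # X = U diag(lam) U*
probs = []
for k in range(len(lam)):
    Pk = np.outer(U[:, k], U[:, k].conj())     # spectral projection onto the k-th eigenspace
    probs.append(np.trace(rho @ Pk).real)      # P(X = lam_k) = Tr(rho P_k)

print(lam, probs)                              # the probabilities sum to 1
print(sum(l * p for l, p in zip(lam, probs)))  # equals Tr(rho X), the mean of X in rho
```

For a degenerate eigenvalue one would sum the rank-one projections over the corresponding eigenvectors to obtain the spectral projection of that eigenspace.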
[4] Expectation, variance, moments of r.v’s in classical and quantum proba-
bility.
If X is an observable with spectral measure EX (.) and Y is another observ-
able with spectral measure EY (.), then if X and Y commute, so do their spectral
measures and then the joint probability distribution of (X,Y) in any state ρ can
be defined as
F(dλ, dμ) = Tr(ρ·EX(dλ)EY(dμ))
It is easily seen that F is a probability measure, ie, non-negative with
double integral equal to unity. If however X and Y do not commute then their spectral
measures will not commute and F as defined above is in general not even real
valued. This fact can be attributed to the Heisenberg uncertainty principle
which states that two non-commuting observables cannot be simultaneously
measured in any state. In fact, we can easily prove using the Cauchy-Schwarz
inequality that for any two observables X,Y and a state ρ, we have that

Tr(ρX²)·Tr(ρY²) ≥ |Tr(ρXY)|² ≥ (Im(Tr(ρXY)))² = (1/4)|Tr(ρ[X, Y])|²


which means that if the observables do not commute, then the product of their
variances in a given state will in general be bounded below by a positive quantity,
so that if we try to choose a state in which X is sharply localised, then in the
same state Y will have a large variance: the two can never be simultaneously
measured with arbitrary precision.
Another way to state this is that if we define the joint characteristic function of
X and Y in the state ρ as
ψ(t, s) = Tr(ρ·exp(i(tX + sY))), t, s ∈ R

then ψ will in general not be positive definite and hence its inverse bivariate
Fourier transform, by Bochner's theorem, will not generally be a probability
distribution. If however X, Y commute, then ψ will always be positive definite:

Σ_{k,m} c̄_k c_m ψ(t_k − t_m, s_k − s_m)
= Tr(ρ·(Σ_k c_k exp(i(t_k X + s_k Y)))(Σ_m c_m exp(i(t_m X + s_m Y)))*) ≥ 0
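The trace inequality above can be checked numerically. The sketch below is an assumed toy example (Pauli matrices and a particular ρ chosen by me, not taken from the book) verifying Tr(ρX²)·Tr(ρY²) ≥ (1/4)|Tr(ρ[X, Y])|²:

```python
import numpy as np

# Numerical check (illustrative, not from the book) of
#   Tr(rho X^2) Tr(rho Y^2) >= (1/4) |Tr(rho [X, Y])|^2
# for the non-commuting Pauli observables sigma_x, sigma_y.
X = np.array([[0, 1], [1, 0]], dtype=complex)   # sigma_x
Y = np.array([[0, -1j], [1j, 0]])               # sigma_y
rho = np.array([[0.8, 0.1], [0.1, 0.2]])        # a density matrix: positive, unit trace

lhs = np.trace(rho @ X @ X).real * np.trace(rho @ Y @ Y).real
comm = X @ Y - Y @ X                            # the commutator [X, Y]
rhs = 0.25 * abs(np.trace(rho @ comm)) ** 2
print(lhs, rhs)                                 # lhs >= rhs
```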
[5] Proofs of the main theorems in classical integration theory.
[a] Monotone convergence, [b] Fatou’s lemma, [c] Dominated convergence,
[d]Fubini’s theorem.
If (Ω, F, μ) is a σ-finite measure space and fn is an increasing sequence of
measurable functions bounded below by an integrable function, then the
monotone convergence theorem states that

lim_n ∫ fn dμ = ∫ lim_n fn dμ

If fn is any sequence of measurable functions bounded below by an integrable
function, then by using the increasing sequence inf_{k≥n} fk in the monotone con-
vergence theorem, we can establish Fatou's lemma:

liminf_n ∫ fn dμ ≥ ∫ liminf_n fn dμ

If fn is a sequence of integrable functions bounded in magnitude by an inte-
grable function g and if lim fn = f exists a.e. μ, then the dominated convergence
theorem states that

∫ f dμ = lim_n ∫ fn dμ

This is proved by applying Fatou's lemma to the non-negative functions g − fn
and g + fn.
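A minimal numerical illustration of the monotone convergence theorem (my own toy example, not from the book): take f(x) = 1/√x on (0, 1], which is integrable with integral 2, and the truncations f_n = min(f, n), which increase pointwise to f; analytically ∫ f_n dμ = 2 − 1/n ↑ 2.

```python
import numpy as np

# Monotone convergence, numerically: f(x) = 1/sqrt(x) on (0,1), truncated at
# level n.  The midpoint-rule integrals of f_n should increase towards 2.
x = (np.arange(1_000_000) + 0.5) / 1_000_000   # midpoint grid on (0, 1)
f = 1.0 / np.sqrt(x)

integrals = []
for n in [1, 2, 5, 10, 50]:
    fn = np.minimum(f, n)                      # the truncation f_n = min(f, n)
    integrals.append(fn.mean())                # midpoint-rule integral over (0, 1)

print(integrals)                               # increasing, approaching 2 (exactly 2 - 1/n)
```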
[6] Tensor products of Hilbert spaces.
Let Hk , k = 1, 2, ..., n be Hilbert spaces with inner products < ., . >k , k =
1, 2, ..., n respectively. Consider the set S = H1 × ... × Hn and define a kernel

K :S×S →C

by
K((x1, ..., xn), (y1, ..., yn)) = Π_{k=1}^n ⟨x_k, y_k⟩_k

Then, by Schur’s theorem, K is positive definite and hence by an elementary


application of Kolmogorov’s consistency theorem for stochastic processes, there
exists a probability space (Ω, F, P ) and a Gaussian stochastic process {ξ(x) :
x ∈ S} in this probability space such that K is the correlation of this process,
ie,
< ξ(x), ξ(y) >= E(ξ(x)ξ(y)) = K(x, y), x, y ∈ S
Thus, by the GNS principle, ξ can be extended to an isometry. We define
x1 ⊗ ... ⊗ xn = ξ(x1, ..., xn), (x1, ..., xn) ∈ S

and call it the tensor product of the vectors xk ∈ Hk, k = 1, 2, ..., n. It is imme-
diately verified by using properties of the inner product that the tensor product
is linear in each of its arguments. Specifically, a simple computation shows that
‖ξ(x1, ..., a·xi + b·yi, ..., xn) − a·ξ(x1, ..., xi, ..., xn) − b·ξ(x1, ..., yi, ..., xn)‖² = 0
for all complex numbers a, b and xk ∈ Hk , k = 1, 2, ..., n, yi ∈ Hi . The closure
of the span of ξ(x), x ∈ S is a closed subspace of the Hilbert space L2 (Ω, F, P )
and is called the tensor product of the Hilbert spaces Hk , k = 1, 2, ..., n and is
denoted by H = H1 ⊗ ... ⊗ Hn . If T1 , ..., Tn are bounded linear operators in
H1 , ..., Hn respectively, then we can define a linear operator T in H1 ⊗ ... ⊗ Hn
by
T (x1 ⊗ ... ⊗ xn ) = T1 x1 ⊗ ... ⊗ Tn xn
and extending the domain of T to the whole of H by linearity and continuity.
It is then easily verifiable that the map (T1 , ..., Tn ) → T1 ⊗ ... ⊗ Tn is multilinear
and satisfies
(S1 ⊗ ... ⊗ Sn ).(T1 ⊗ ... ⊗ Tn ) = (S1 T1 ) ⊗ ... ⊗ (Sn Tn )
K.R.Parthasarathy along with K.Schmidt introduced for the first time this con-
struction of the tensor product using positive definite kernels, the GNS principle
and Kolmogorov’s consistency theorem.
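Two defining identities of this construction, the product formula for the kernel K and the multiplicativity (S1 ⊗ ... ⊗ Sn)(T1 ⊗ ... ⊗ Tn) = (S1T1) ⊗ ... ⊗ (SnTn), can be checked on finite-dimensional spaces, where the tensor product is realised by the Kronecker product. The following sketch is illustrative; the dimensions and random matrices are arbitrary assumptions of mine.

```python
import numpy as np

# Tensor-product identities on finite-dimensional Hilbert spaces, via np.kron.
rng = np.random.default_rng(0)
S1, T1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
S2, T2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

# (S1 (x) S2)(T1 (x) T2) = (S1 T1) (x) (S2 T2)
lhs = np.kron(S1, S2) @ np.kron(T1, T2)
rhs = np.kron(S1 @ T1, S2 @ T2)
print(np.allclose(lhs, rhs))                    # True

# <x1 (x) x2, y1 (x) y2> = <x1, y1><x2, y2> -- the kernel K of the construction
x1, y1 = rng.normal(size=2), rng.normal(size=2)
x2, y2 = rng.normal(size=3), rng.normal(size=3)
print(np.isclose(np.dot(np.kron(x1, x2), np.kron(y1, y2)),
                 np.dot(x1, y1) * np.dot(x2, y2)))   # True
```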

[7] Product probability measures in classical and quantum probability


Let (Ωk , Fk , Pk ), k = 1, 2, ..., n be probability spaces. Define a finitely addi-
tive set function P on the algebra/field of all finite disjoint unions of rectangles
of the form E1 × ... × En with Ek ∈ Fk , k = 1, 2, ..., n so that
P (E1 × ... × En ) = P1 (E1 )...Pn (En )
We then show that P is also countably additive on this field and hence by the
Caratheodory extension theorem, P can be extended to a probability measure
on the σ-field
F = σ(F1 × ... × Fn )
P is called the product probability measure of P1 , ..., Pn and if we write Ω =
Ω1 × ... × Ωn , then (Ω, F, P ) is a probability space called the product of the
probability spaces (Ωk , Fk , Pk ), k = 1, 2, ..., n. We write
(Ω, F, P) = ×_{k=1}^n (Ωk, Fk, Pk)

Physically, this product probability space corresponds to an experiment in which


each of the outcomes are obtained by independent trials of the component ex-
periments. If X : Ω → R is a random variable on this product probability space,
then

E_P(X) = ∫ X dP = ∫ X(ω1, ..., ωn) dP1(ω1)...dPn(ωn)

In particular, if Xk is a random variable in (Ωk, Fk, Pk) for each k = 1, 2, ..., n
and if we define

X(ω1, ..., ωn) = X1(ω1)...Xn(ωn)

then X is a random variable in (Ω, F, P) whose expectation is given by

E_P(X) = Π_{k=1}^n ∫ Xk dPk = Π_{k=1}^n E(Xk)
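A finite toy example of the last identity (my own illustration, not from the book), with the expectation under the product measure written out as a double sum:

```python
import itertools

# E_P(X1(w1) X2(w2)) = E_{P1}(X1) E_{P2}(X2) on finite sample spaces.
P1 = {0: 0.3, 1: 0.7}              # probability measure on Omega_1 = {0, 1}
P2 = {0: 0.5, 1: 0.25, 2: 0.25}    # probability measure on Omega_2 = {0, 1, 2}
X1 = {0: 2.0, 1: -1.0}             # a random variable on Omega_1
X2 = {0: 1.0, 1: 4.0, 2: 0.0}      # a random variable on Omega_2

# expectation under the product measure P = P1 x P2
lhs = sum(X1[w1] * X2[w2] * P1[w1] * P2[w2]
          for w1, w2 in itertools.product(P1, P2))
# product of the marginal expectations
rhs = sum(X1[w] * P1[w] for w in P1) * sum(X2[w] * P2[w] for w in P2)
print(lhs, rhs)                    # equal
```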

[8] The weak law of large numbers and the central limit theorem for sums of
independent random variables.
If X(n), n = 1, 2, ... is a sequence of independent r.v's with corresponding
means μn and corresponding finite variances σn², and if

(μ1 + ... + μn)/n → μ, (σ1² + ... + σn²)/n → σ²

then the sequence

Sn/n = (X(1) + ... + X(n))/n

converges in the mean square sense, and hence also in probability, and hence also
in distribution, to the constant r.v. μ. This is proved easily using the Chebyshev
inequality. Indeed, independence of the X(n)'s implies that

Var(Sn) = Σ_{k=1}^n σk²

and hence

P(|Sn/n − (μ1 + ... + μn)/n| > ε) ≤ Var(Sn/n)/ε² = (σ1² + ... + σn²)/(n²ε²) → 0

and the result follows on using

(μ1 + ... + μn )/n → μ

ie, we get

P(|Sn/n − μ| > ε) → 0
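The weak law can be watched numerically. The sketch below is an assumed simulation (my own choice of Exponential(1) variables, so μ = 1 and σ² = 1, with the Chebyshev bound 1/(nε²)): it estimates P(|Sn/n − μ| > ε) by Monte Carlo for growing n.

```python
import numpy as np

# Weak law of large numbers by simulation: X(k) iid Exponential(1), mu = 1.
# The exceedance frequency P(|S_n/n - 1| > eps) should shrink as n grows.
rng = np.random.default_rng(1)
eps = 0.05
freqs = []
for n in [100, 1000, 10000]:
    samples = rng.exponential(1.0, size=(400, n))   # 400 independent runs of length n
    Sn_over_n = samples.mean(axis=1)                # S_n/n for each run
    freqs.append(np.mean(np.abs(Sn_over_n - 1.0) > eps))
print(freqs)                                        # decreasing towards 0
```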

[9] Sums of independent random variables and the strong law of large num-
bers.

[10] The weak and strong laws of large numbers and the Levy-Khintchine
theorem representing the characteristic function of an infinitely divisible
probability distribution in its most general form, and its subsequent generaliza-
tion to infinitely divisible distributions for Hilbert space valued random variables
by S.R.S.Varadhan.

[11] Weak convergence of probability measures and Prohorov’s tightness the-


orem.
[12] An introduction to Brownian motion, Poisson process and other stochastic
processes.
Brownian motion is a zero mean Gaussian stochastic process having almost surely
continuous sample paths B(t), t ≥ 0 and independent increments such that
Var(B(t) − B(s)) = t − s, t ≥ s ≥ 0. Such a process can easily be constructed
using the so called "tent functions" as was first done by Paul Levy. It can also
be constructed using the so-called Karhunen-Loeve eigenfunction expansion.
Noting that E(B(t)B(s)) = min(t, s) = K(t, s), we use the spectral theorem for
compact operators to construct an onb {φ_n} for L^2[0, 1] such that

∫_0^1 K(t, s)φ_n(s)ds = λ_n φ_n(t), t ∈ [0, 1]

and then prove that if X_n is an iid sequence of N(0, 1) r.v's, then the sequence of
continuous processes

B_n(t) = Σ_{k=1}^n √λ_k X_k φ_k(t), n = 1, 2, ...

converges uniformly almost surely over [0, 1] and hence the limit is a Gaussian
process with almost surely continuous sample paths. This limiting process has
all the properties of Brownian motion. Some of the fundamental properties of
BM proved using the Borel-Cantelli lemmas are (a)

P(limsup_{h→0} sup_{t∈[0,1−h]} |B(t + h) − B(t)|/√(C.h.log(1/h)) ≤ 1) = 1

for any C > 2, and for any C < 2,

P(limsup_{h→0} sup_{t∈[0,1−h]} |B(t + h) − B(t)|/√(C.h.log(1/h)) > 1) = 1

This result is known as Levy's modulus of continuity of Brownian motion. (b)
The law of the iterated logarithm:

P(limsup_{t→∞} B(t)/√(2t.loglog(t)) = 1) = 1,

P(liminf_{t→∞} B(t)/√(2t.loglog(t)) = −1) = 1
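The Karhunen-Loeve construction can be sketched numerically. An assumption of this illustration (standard, but not spelled out in the text): the eigenpairs of K(t, s) = min(t, s) on [0, 1] are λ_k = ((k − 1/2)π)^{−2} with φ_k(t) = √2 sin((k − 1/2)πt). The truncated series should then have Var(B_n(1)) close to 1.

```python
import random, math

random.seed(2)

# Truncated KL expansion: B_n(t) = sum_k sqrt(lambda_k) X_k phi_k(t),
# lambda_k = ((k-1/2)*pi)^(-2), phi_k(t) = sqrt(2)*sin((k-1/2)*pi*t).
def bm_at(t, n_terms):
    s = 0.0
    for k in range(1, n_terms + 1):
        w = (k - 0.5) * math.pi
        s += random.gauss(0.0, 1.0) * math.sqrt(2.0) * math.sin(w * t) / w
    return s

n_paths, n_terms = 4000, 50
samples = [bm_at(1.0, n_terms) for _ in range(n_paths)]
var_hat = sum(v * v for v in samples) / n_paths
print(var_hat)  # should be close to Var(B(1)) = 1
```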

[13] Kolmogorov’s consistency theorem for stochastic processes.


This is perhaps the most fundamental theorem in the whole of probability for
it tells us of the existence of a stochastic process given only its finite dimensional
marginal distributions. The proof of this theorem is based on the regularity of
probability measures on R^n, ie, Borel sets can be approximated by compact
sets to any given degree of accuracy with respect to any probability distribution
on R^n. The proof also uses a fundamental result in topology on compact sets,
namely, given a nested sequence of non-empty compact sets, the intersection
of all these sets is non-empty. The consistency theorem states that given
a consistent sequence of probability distributions F_n on R^n, n = 1, 2, ...,
there exists a probability space (Ω, F, P) and an infinite sequence of real valued
random variables X_n, n = 1, 2, ... on this probability space such that

F_n = P ∘ (X_1, ..., X_n)^{−1}, n = 1, 2, ...

The complete proof of this theorem is outlined in [15b].

[14a] Wiener's construction of Brownian motion using sine functions. This
has already been discussed with the proof based on the KL spectral expansion.
[14b] Definition of the Ito stochastic integral for Brownian motion and continuous martingales.

[14c] Proof of the almost sure existence and uniqueness of solutions to
stochastic differential equations driven by Brownian motion. The proof is based
on Doob's L^2-inequality for martingales. Assume that the drift and diffusion
coefficients are Lipschitz continuous:

|μ(t, x) − μ(t, y)| ≤ K|x − y|, |σ(t, x) − σ(t, y)| ≤ K|x − y|

Then the Picard iterates satisfy

x_{n+1}(t) − x_n(t) = ∫_0^t (μ(s, x_n(s)) − μ(s, x_{n−1}(s)))ds + ∫_0^t (σ(s, x_n(s)) − σ(s, x_{n−1}(s)))dB(s)

Thus, writing

Δ_{n+1}(t) = max_{0≤s≤t} |x_{n+1}(s) − x_n(s)|^2

we get, using the Cauchy-Schwarz inequality on the drift term and Doob's
inequality on the martingale term,

E(Δ_{n+1}(t)) ≤ 2K^2 t ∫_0^t E(Δ_n(s))ds + 8K^2 ∫_0^t E(Δ_n(s))ds
= K^2(2t + 8) ∫_0^t E(Δ_n(s))ds ≤ (2T + 8)K^2 ∫_0^t E(Δ_n(s))ds, 0 ≤ t ≤ T

from which we get by iteration that

E(Δ_{n+1}(t)) ≤ C t^n/n!, 0 ≤ t ≤ T

where C depends on T. This gives us

E[max_{0≤t≤T} |x_{n+r}(t) − x_n(t)|] ≤ Σ_{j=n}^{n+r−1} E[max_{0≤t≤T} |x_{j+1}(t) − x_j(t)|]
≤ Σ_{j=n}^{n+r−1} (E(Δ_j(T)))^{1/2} ≤ Σ_{j=n}^{n+r−1} √(C T^j/j!)

which converges to zero as n, r → ∞ because

Σ_n (T^n/n!)^{1/2} < ∞

for any finite positive T. This proves the almost sure uniform convergence of the
Picard iterates, and hence the existence of a solution; uniqueness follows from the
same estimate applied to the difference of two solutions.
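The Picard iteration underlying this proof can be watched converging numerically. A minimal sketch under stated assumptions: the coefficients μ(t, x) = −x and σ(t, x) = 0.1 cos(x) are my illustrative Lipschitz choices, the integrals are discretized on a grid, and one Brownian path is held fixed across iterations.

```python
import random, math

random.seed(3)

# Fix one discretized Brownian path, then run the Picard iteration
# x_{n+1}(t) = x0 + int_0^t mu(x_n(s)) ds + int_0^t sigma(x_n(s)) dB(s).
N, T, x0 = 1000, 1.0, 1.0
dt = T / N
dB = [random.gauss(0.0, math.sqrt(dt)) for _ in range(N)]

def mu(x):
    return -x

def sigma(x):
    return 0.1 * math.cos(x)

def picard_step(x):
    y, acc = [x0], x0
    for i in range(N):
        acc += mu(x[i]) * dt + sigma(x[i]) * dB[i]
        y.append(acc)
    return y

x = [x0] * (N + 1)  # zeroth iterate: the constant path
deltas = []
for _ in range(12):
    y = picard_step(x)
    deltas.append(max(abs(a - b) for a, b in zip(x, y)))
    x = y
print(deltas)  # successive sup-norm differences decay roughly like C^n/n!
```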

[15a] Paul Levy's construction of Brownian motion using the Haar basis.
Let

D(n) = {k/2^n : 0 ≤ k ≤ 2^n}

I(n) = D(n) − D(n − 1) = {k/2^n : 0 ≤ k ≤ 2^n, k odd}

Let {ξ(n, k) : k ∈ I(n), n ≥ 0} be iid N(0, 1) random variables. Define the Haar
wavelet functions H_{n,k}(t), t ∈ [0, 1] by H_{n,k}(t) = 2^{(n−1)/2} for (k − 1)/2^n < t < k/2^n
and H_{n,k}(t) = −2^{(n−1)/2} for k/2^n < t < (k + 1)/2^n. Clearly {H_{n,k} : n ≥ 0, k ∈
I(n)} is an onb for L^2[0, 1]. Define the Schauder functions

S_{n,k}(t) = ∫_0^t H_{n,k}(s)ds

Then,

|S_{n,k}(t)| ≤ 2^{−(n+1)/2}

We evaluate

Σ_{n,k} H_{n,k}(t)H_{n,k}(s) = δ(t − s)

and hence on integration, we get

Σ_{n,k} S_{n,k}(t)S_{n,k}(s) = min(t, s)

Define the processes

B_n(t) = Σ_{m=0}^n Σ_{k∈I(m)} ξ(m, k)S_{m,k}(t), n ≥ 0

Then clearly B_n(t) is a continuous stochastic process on [0, 1]. Further,

P(max_{k∈I(n)} |ξ(n, k)| > n) ≤ 2^n P(|ξ(1, 1)| > n)

P(|ξ(1, 1)| > n) = C ∫_{|x|>n} exp(−x^2/2)dx ≤ 2C ∫_n^∞ (x/n).exp(−x^2/2)dx ≤ (2C/n)exp(−n^2/2)

and since

Σ_{n≥1} (2^n/n).exp(−n^2/2) < ∞

it follows that if we define the r.v's

b(n) = max(|ξ(n, k)| : k ∈ I(n))

then, by the Borel-Cantelli lemma,

P(b(n) > n i.o.) = 0

Hence, for a.e. ω, there exists an integer N(ω) such that n > N(ω) implies
b(n) ≤ n. Then

Σ_{n>N(ω)} Σ_{k∈I(n)} |ξ(n, k)|S_{n,k}(t) ≤ Σ_{n≥1} n.2^{−(n+1)/2} < ∞

Note that

Σ_{k∈I(n)} S_{n,k}(t)

has a minimum value of zero and a maximum value of 2^{−(n+1)/2} over the interval
[0, 1]. Its graph consists of nonoverlapping triangles of height 2^{−(n+1)/2} and base
widths 2^{−(n−1)}.
The above argument implies that for a.e. ω and for all n > N(ω),

sup_{t∈[0,1]} |B_{n+r}(t, ω) − B_n(t, ω)| ≤ Σ_{m>n} m.2^{−(m+1)/2}

which converges to zero as n, r → ∞. This means that for a.e. ω, the processes
B_n(., ω), n ≥ 1 converge uniformly over the interval [0, 1] and since each of
these processes is continuous, the limiting process B(., ω) is also continuous
with autocorrelation min(t, s). In other words, we have explicitly constructed a mean
zero Gaussian process over the time interval [0, 1] that is a.e. continuous and
has autocorrelation min(t, s), ie, the limiting process is Brownian
motion over the time interval [0, 1].
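The Parseval identity Σ_{n,k} S_{n,k}(t)S_{n,k}(s) = min(t, s) used in this construction can be verified numerically. A sketch (one assumption made explicit: the constant Haar function at level 0 contributes the term t·s to the sum):

```python
# Schauder tent function S_{n,k}: the integral of the Haar function H_{n,k};
# a tent of height 2^{-(n+1)/2} centred at k/2^n with support
# [(k-1)/2^n, (k+1)/2^n].
def tent(n, k, t):
    left, centre, right = (k - 1) / 2 ** n, k / 2 ** n, (k + 1) / 2 ** n
    if t <= left or t >= right:
        return 0.0
    slope = 2.0 ** ((n - 1) / 2.0)
    return slope * (t - left) if t <= centre else slope * (right - t)

def kernel(t, s, levels):
    # truncated Parseval sum over the Haar levels; level 0 (the constant
    # function) contributes t*s, odd k index the level-n tents
    total = t * s
    for n in range(1, levels + 1):
        for k in range(1, 2 ** n, 2):
            total += tent(n, k, t) * tent(n, k, s)
    return total

print(kernel(0.3, 0.7, 12), kernel(0.3, 0.3, 12))  # both ≈ min(...) = 0.3
```

For t ≠ s the high-level tents eventually have disjoint supports, so the truncated sum is exact once the level spacing falls below |t − s|; on the diagonal the truncation error is of order 2^{−levels}.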

[15b] Proof of the Kolmogorov existence theorem for stochastic processes.


Let F_n(x_1, ..., x_n, t_1, ..., t_n) be a consistent family of probability distributions
on R^n respectively for n = 1, 2, ... with t_1, ..., t_n ∈ R_+. For any B ∈ B(R^n), we
define

P_{t_1...t_n}(B) = ∫_B dF_n(x_1, ..., x_n, t_1, ..., t_n)

Then, for each set of distinct t_1, ..., t_n ∈ R_+, P_{t_1...t_n} is a probability measure on
(R^n, B(R^n)) and these probability measures form a consistent family. Our aim
is to show the existence of a probability measure P on (R^{[0,∞)}, B(R^{[0,∞)})) such
that

P({ω ∈ R^{[0,∞)} : (ω(t_1), ..., ω(t_n)) ∈ B}) = P_{t_1,...,t_n}(B), B ∈ B(R^n)

We first define a finitely additive set function P on the field/algebra C generated
by the finite dimensional cylinder sets, ie, sets of the form

D = {ω : (ω(t_1), ..., ω(t_n)) ∈ B}

by

P(D) = P_{t_1...t_n}(B)

Then by the Caratheodory theorem, it is sufficient to prove that P is countably
additive on C. To prove this it is in turn sufficient to show that if D_n ∈ C and
D_n ↓ φ, then P(D_n) ↓ 0. Suppose instead that P(D_n) ↓ δ > 0; we must arrive
at a contradiction. Since D_{n+1} ⊂ D_n, it follows that if we write

D_n = {ω : (ω(t_1), ..., ω(t_n)) ∈ B_n}

then D_{n+1} is of the form

D_{n+1} = {ω : (ω(t_1), ..., ω(t_n), ω(t_{n+1}), ..., ω(t_m)) ∈ B_m}

where

B_m ⊂ B_n × R^{m−n}

Thus, we can pad sets D_{n,1}, ..., D_{n,m−n−1} in between the sets D_n and D_{n+1} so
that

D_{n,k} = {ω : (ω(t_1), ..., ω(t_m), ω(t_{m+1}), ..., ω(t_{m+k})) ∈ B_m × R^k}, k = 1, 2, ..., m − n − 1

and hence assume without loss of generality that D_n ↓ φ and

D_n = {ω : (ω(t_1), ..., ω(t_n)) ∈ B_n}, n = 1, 2, ...

such that

B_n ∈ B(R^n), B_{n+1} ⊂ B_n × R

Now by the regularity of probability measures on R^n, we can choose for each n
a non-empty compact set K_n ⊂ B_n such that

P_{t_1...t_n}(B_n − K_n) < δ/2^n, n = 1, 2, ...

Now define

K̃_n = (K_1 × R^{n−1}) ∩ (K_2 × R^{n−2}) ∩ ... ∩ (K_{n−1} × R) ∩ K_n, n ≥ 1

Then K̃_n is also a compact subset of R^n and further, defining

E_n = {ω : (ω(t_1), ..., ω(t_n)) ∈ K̃_n} ⊂ D_n

we find that

E_n = ∩_{m=1}^n {ω : (ω(t_1), ..., ω(t_m)) ∈ K_m} = ∩_{m=1}^n F_m

where

F_m = {ω : (ω(t_1), ..., ω(t_m)) ∈ K_m}

Then E_n ↓ and

P(D_n − E_n) = P(∪_{m=1}^n (D_n − F_m)) ≤ P(∪_{m=1}^n (D_m − F_m))
≤ Σ_{m=1}^n P(D_m − F_m) = Σ_{m=1}^n P_{t_1...t_m}(B_m − K_m) ≤ Σ_{m=1}^n δ/2^m < δ

from which we deduce that

P(E_n) ≥ P(D_n) − P(D_n − E_n) ≥ δ − Σ_{m=1}^n δ/2^m = δ/2^n > 0

Thus E_n is a non-increasing sequence of non-empty sets. In particular, it
follows that K̃_n is non-empty for each n. Thus, for each n, we can choose
(x(n, 1), ..., x(n, n)) ∈ K̃_n. Since K̃_n ⊂ K_1 × R^{n−1}, it follows that x(n, 1) ∈
K_1. Since K_1 is compact, {x(n, 1)} has a convergent subsequence {x(n_1, 1)}. Since
(x(n_1, 1), x(n_1, 2)) ∈ K̃_2 ⊂ K_2 and K_2 is compact, we can extract a further
subsequence {n_2} ⊂ {n_1} along which (x(n_2, 1), x(n_2, 2)) converges. Continuing
in this way, we get for each k = 1, 2, ... a subsequence {n_k} of {n_{k−1}} along which
(x(n_k, 1), ..., x(n_k, k)) ∈ K̃_k ⊂ K_k converges. Let lim_{n_k} x(n_k, j) = x(j),
j = 1, 2, .... Then it follows easily that (x(1), ..., x(k)) ∈ K_k for each k and hence

F_m = {ω : (ω(t_1), ..., ω(t_m)) ∈ K_m}

contains every ω with (ω(t_1), ..., ω(t_m)) = (x(1), ..., x(m)), for each m. Thus,
∩_{m≥1} F_m contains any ω with ω(t_m) = x(m) for all m, and in particular this set is non-
empty. But F_m ⊂ D_m by our construction and hence ∩_{m≥1} D_m is non-empty,
which is a contradiction. This completes the proof of Kolmogorov's existence
theorem.

[15c] The Kolmogorov-Centsov theorem for the existence of a continuous
modification of a stochastic process.
Remark: This theorem gives us an alternate way to construct Brownian
motion, or more precisely to prove the existence of the Brownian motion process,
ie, a continuous Gaussian stochastic process with zero mean and autocorrelation
min(t, s). Let X(t) be a stochastic process such that

E[|X(t) − X(s)|^a] ≤ C|t − s|^{1+b}

for all t, s ∈ [0, 1] with C, a, b positive constants. Then X(.) has a continuous
modification, ie, there exists a continuous stochastic process Y(t) defined on the
same probability space such that for any distinct t_1, ..., t_n ∈ [0, 1], n = 1, 2, ...,
we have

P(X(t_j) = Y(t_j), j = 1, 2, ..., n) = 1

In particular, all the finite dimensional distributions of X(.) coincide with the
corresponding distributions of Y. The idea behind the use of this theorem
to construct Brownian motion is to first construct, using infinite sequences of
iid Gaussian random variables, a zero mean Gaussian process X(t) having the
same autocorrelation function as that of Brownian motion, then prove that X(.)
satisfies the conditions of this theorem and hence use the theorem to deduce the
existence of a continuous modification of the X(.) process, ie, we get a process
Y with continuous trajectories having the same finite dimensional distributions
as that of Brownian motion and hence conclude that Y is a Brownian motion
process.
Proof of the theorem: Let

D(n) = {k/2^n : 0 ≤ k ≤ 2^n}

and then by the conditions of the theorem, we have

E[|X((k + 1)/2^n) − X(k/2^n)|^a] ≤ C.(1/2^n)^{1+b}

Thus, by the union bound and Chebyshev's inequality, for any γ > 0,

P(max_{0≤k≤2^n−1} |X((k + 1)/2^n) − X(k/2^n)| > 1/2^{nγ})
≤ Σ_{k=0}^{2^n−1} P(|X((k + 1)/2^n) − X(k/2^n)| > 2^{−nγ})
≤ 2^n.2^{naγ}.C.2^{−n(1+b)} = C.2^{−n(b−aγ)}

Assume that

0 < γ < b/a

Then, by the above inequality, since

Σ_{n≥0} 2^{−n(b−aγ)} < ∞

it follows that

P(max_{0≤k≤2^n−1} |X((k + 1)/2^n) − X(k/2^n)| > 1/2^{nγ} i.o.) = 0



[16] Annihilation, creation and conservation fields in quantum probability
theory.
Given a Hilbert space H, we can construct the Boson Fock space Γ_s(H)
consisting of the direct sum of all symmetric tensor powers of H and a Fermion
Fock space Γ_a(H) consisting of the direct sum of all antisymmetric tensor powers
of H. We then construct an exponential vector e(u) in Γ_s(H) for each u ∈ H.
The exponential vectors span a dense linear manifold of the Boson Fock space
and we can then construct an annihilation operator a(u) satisfying

a(u)e(v) = <u, v> e(v), u, v ∈ H

We can then show that

[a(u), a(v)*] = <u, v>

a(u)* is a creation operator. Given a linear operator H in H, we can also
construct a conservation operator λ(H) in the Boson Fock space such that

<e(u)|λ(H)|e(v)> = <u|H|v><e(u)|e(v)>

λ(H) can be constructed as a quadratic combination of the creation and an-
nihilation operators when H is a Hermitian operator. From the annihilation
and creation operator fields, we can, by means of a spectral measure E(.) in
H, construct annihilation and creation "processes" a(E[0, t]u) and a(E[0, t]u)*
and a conservation "process" λ(E[0, t]H), provided that H commutes with the
spectral measure, and then using these "quantum stochastic processes", build
a non-commutative theory of stochastic processes such that in different states,
these processes exhibit different kinds of statistics. Specifically, we can lin-
early combine the creation and annihilation processes such that the resulting
process has all the properties of a classical commutative Brownian motion in an
exponential/coherent state. Likewise, we can show that the conservation pro-
cess satisfies all properties of the classical Poisson process in a coherent state.
Thus, the fundamental stochastic processes in classical probability can be real-
ized using operator theory in Boson Fock space as special cases. The position
and momentum operators for a quantum harmonic oscillator are expressible as
linear combinations of creation and annihilation operators and, extending this
idea, noisy position and momentum processes like Brownian motion and Pois-
son processes are expressible in terms of creation, annihilation and conservation
processes in quantum probability.
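A finite-dimensional sketch of the oscillator realization mentioned above: truncating the number-state basis to D levels (the truncation and the pure-Python linear algebra are my illustrative choices), a|n> = √n |n−1>, and the commutation relation [a, a*] = I holds exactly except in the top corner, where the truncation intrudes.

```python
import math

D = 8  # truncation dimension of the number-state basis |0>, ..., |D-1>

# annihilation operator: a|n> = sqrt(n)|n-1>, i.e. a[i][j] = sqrt(j) if j = i+1
a = [[math.sqrt(j) if j == i + 1 else 0.0 for j in range(D)] for i in range(D)]
adag = [[a[j][i] for j in range(D)] for i in range(D)]  # adjoint (real entries)

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(D)) for j in range(D)]
            for i in range(D)]

AAd, AdA = matmul(a, adag), matmul(adag, a)
comm = [[AAd[i][j] - AdA[i][j] for j in range(D)] for i in range(D)]
print([comm[i][i] for i in range(D)])  # [1.0, ..., 1.0, -(D-1)]
```

The −(D − 1) in the last diagonal entry is the truncation artifact; in the full Fock space the commutator is exactly the identity.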

[17] Annihilation, creation and conservation processes in quantum proba-
bility. This is a study project. Learn about how one defines the creation,
annihilation and conservation operator fields in a Boson Fock space, both using
the Weyl operator and by using an infinite sequence of one dimensional quantum
harmonic oscillators. In the arguments of these operator fields, replace the vectors
and matrices by vectors and matrices multiplied by the time domain spec-
tral projection, and then derive the quantum Ito formula using the commutation
relations between the creation and annihilation operator fields. Explain using
basic physical principles why this derivation shows that the Ito formula can be
alternatively viewed as a manifestation of the Heisenberg uncertainty principle
between position and momentum.

[18] Ito's formula for Brownian motion and Poisson processes and their quan-
tum generalizations. Take a Brownian motion process B(t) and verify the Levy
oscillation property

lim_{N→∞} Σ_{n=0}^{N−1} (B((n + 1)t/N) − B(nt/N))^2 = t

in the mean square sense by computing the mean and variance of the lhs. Ex-
plain intuitively how this relationship can be cast in the form

(dB(t))^2 = dt

Take a Poisson process N(t) and prove that

E[((N(t + h) − N(t))^2 − (N(t + h) − N(t)))^2] = O(h^2), h → 0

Explain intuitively how this relationship can be cast in the form

(dN(t))^2 = dN(t)

Hint: Show that if h = T/M, then

lim_{M→∞} E[Σ_{k=0}^{M−1} (N((k + 1)h) − N(kh))^2 − Σ_{k=0}^{M−1} (N((k + 1)h) − N(kh))]^2 = 0

To prove this, make use of the independent increment property of the Poisson
process and calculate E[N(h)^2] and E[N(h)^4] using

P(N(h) = n) = exp(−λh)(λh)^n/n!, n = 0, 1, 2, ...

By taking limits, prove in the mean square sense that if f(t) is a continuous
function, then

E(∫_0^T f(t)dB(t))^2 = ∫_0^T f^2(t)dt

where the stochastic integral on the lhs is interpreted as the limit of

Σ_{n=0}^{N−1} f(nT/N)(B((n + 1)T/N) − B(nT/N))

Show that

df(B(t)) = f'(B(t))dB(t) + (1/2)f''(B(t))dt

using mean square logic, ie,

f(B(t)) − f(B(0)) = lim_{N→∞} Σ_{n=0}^{N−1} f'(B(nt/N))(B((n + 1)t/N) − B(nt/N)) + (1/2)∫_0^t f''(B(s))ds

Likewise, show that

df(N(t)) = (f(N(t) + 1) − f(N(t)))dN(t)

by using the fact that dN(t) = 0, 1.
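Both Ito rules above can be checked by direct simulation. A minimal sketch (the grid sizes and the intensity λ = 3 are illustrative choices):

```python
import random, math

random.seed(5)

# (dB)^2 = dt: the quadratic variation of Brownian motion over [0, 1]
t, N = 1.0, 100000
qv = sum(random.gauss(0.0, math.sqrt(t / N)) ** 2 for _ in range(N))
print(qv)  # close to t = 1

# (dN)^2 = dN: on a fine grid a Poisson increment is 0 or 1 (w.h.p.),
# so squared increments and increments have the same sum
lam, M = 3.0, 100000
h = t / M
incs = [1 if random.random() < lam * h else 0 for _ in range(M)]
diff = sum(x * x for x in incs) - sum(incs)
print(diff)  # exactly 0 for 0/1 increments
```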
[19] Classical and quantum scattering theory with comparisons. Computing
the wave operators in quantum scattering. Computing the scattering matrix in
terms of the potential for Schrodinger Hamiltonians.
Study project: When the scattering centre generates a random potential
V(Q) with known mean and autocorrelation EV(Q) and E(V(Q) ⊗ V(Q′)),
what would be the average value of the scattering matrix and what would
be its mean square fluctuation? In the classical case, if the scattering centre
generates a random radial potential V(r), what would be the mean square
value of the scattering cross section at a given angle, and what would be the
statistical correlation between the scattering cross sections at different angles in
terms of EV(r) and E(V(r)V(r′))?
[20] Scattering matrix for quantum fields. Examples taken from electron-
positron-photon interactions. Approximate computation of the scattering
matrix using the Dyson series. Evaluating the various terms in the Dyson series
using Feynman diagrams.
[21] The Borel-Cantelli lemmas
[1] Let E_n, n ≥ 1 be a family of events such that

Σ_n P(E_n) < ∞

Then

P(E_n i.o.) = 0

Proof:

{E_n i.o.} = ∩_n ∪_{k≥n} E_k

Thus,

P(E_n i.o.) = lim_{n→∞} P(∪_{k≥n} E_k) ≤ Σ_{k≥n} P(E_k) → 0

[2] Let E_n, n ≥ 1 be a family of mutually independent events such that

Σ_n P(E_n) = ∞

Then

P(E_n i.o.) = 1

In fact,

1 − P(E_n i.o.) = P(∪_n ∩_{k≥n} E_k^c) = lim_n Π_{k≥n} P(E_k^c) = lim_n Π_{k≥n} (1 − P(E_k)) = 0

since

Π_{k≥n} (1 − P(E_k)) ≤ Π_{k≥n} exp(−P(E_k)) = exp(−Σ_{k≥n} P(E_k)) = 0
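Both lemmas can be illustrated by simulation. In the sketch below (the probabilities 1/n^2 and 1/n are my illustrative choices), the summable case has an early last occurrence, while the divergent independent case keeps producing events all the way to the horizon.

```python
import random

random.seed(6)

trials, horizon = 200, 5000

def last_occurrence(prob):
    # simulate independent events E_n, n = 1..horizon, with P(E_n) = prob(n),
    # and record the largest n for which E_n occurred
    last = 0
    for n in range(1, horizon + 1):
        if random.random() < prob(n):
            last = n
    return last

summable = [last_occurrence(lambda n: 1.0 / n ** 2) for _ in range(trials)]
divergent = [last_occurrence(lambda n: 1.0 / n) for _ in range(trials)]
print(sum(summable) / trials, sum(divergent) / trials)
```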

[22] Some remarks on classical and quantum entropy
[1] Suppose f(x) is a probability density on R^n such that its entropy H(f) =
−∫ f(x).log(f(x))dx is a maximum subject to K constraints:

∫ g_k(x)f(x)dx = μ_k, k = 1, 2, ..., K

Then show that f(x) is given by

f(x) = C.exp(Σ_{k=1}^K c(k)g_k(x))

where the c(k)'s are given by the equations

∫ exp(Σ_k c(k)g_k(x))g_m(x)dx = μ_m/C, m = 1, 2, ..., K

and C is a normalizing constant, ie,

C^{−1} = ∫ exp(Σ_k c(k)g_k(x))dx
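For a single constraint g(x) = x on a bounded interval, the Lagrange-multiplier equation above can be solved numerically by bisection. A sketch (the interval [0, 10] and the target mean μ = 2 are illustrative choices; integrals are approximated by trapezoid quadrature):

```python
import math

# Max-entropy density on [0, L] with mean constraint int x f(x) dx = mu:
# the maximizer has the form f(x) = C exp(c x); solve the moment equation
# for c by bisection.
mu, L = 2.0, 10.0

def moments(c, steps=4000):
    h = L / steps
    z = m = 0.0
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0
        e = math.exp(c * x)
        z += w * h * e          # int exp(c x) dx
        m += w * h * x * e      # int x exp(c x) dx
    return z, m

lo, hi = -5.0, 5.0
for _ in range(50):
    c = 0.5 * (lo + hi)
    z, m = moments(c)
    if m / z < mu:              # the tilted mean m/z is increasing in c
        lo = c
    else:
        hi = c

z, m = moments(c)
C = 1.0 / z  # normalizing constant, so C*exp(c*x) integrates to 1
print(c, m / z)  # the fitted mean matches mu
```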

[2] Suppose ρ is a mixed quantum state such that its Von-Neumann entropy
H(ρ) = −Tr(ρ.log(ρ)) is a maximum subject to constraints

Tr(ρH_k) = μ_k, k = 1, 2, ..., K

where H_k, k = 1, 2, ..., K are Hermitian matrices. Then show that ρ is given
by

ρ = C.exp(−Σ_{k=1}^K β(k)H_k)

where the β(k)'s are determined by

C^{−1}μ_m = Tr(exp(−Σ_k β(k)H_k)H_m), m = 1, 2, ..., K

Hint: Writing ρ = exp(Z), we have

δρ = ρ.((1 − exp(−ad(Z)))/ad(Z))(δZ)

So

δ(ρ.log(ρ)) = δ(Z.exp(Z)) = δZ.exp(Z) + Z.exp(Z)((1 − exp(−ad(Z)))/ad(Z))(δZ)

We may assume that Z is a Hermitian matrix. Then,

δ(Tr(ρ.log(ρ))) = Tr(δZ.(ρ + g(ad(Z))*(Z.ρ)))

where

g(x) = (1 − exp(−x))/x

Simplifying the above expression, we get

δ(Tr(ρ.log(ρ))) = Tr(δZ.ρ(1 + Z))

[3] Let μ and ν be two probability measures on the same measurable space.
Define

H = sup_f (∫ f dν − log(∫ exp(f)dμ))

Show that the supremum is attained when

f = log(dν/dμ) + C

provided that ν << μ, and hence deduce that

H = H(ν|μ) = ∫ dν.log(dν/dμ)

[4] Let ρ, σ be two quantum states in a given Hilbert space and let X vary
over all observables (Hermitian matrices) in the same Hilbert space. Compute

H = sup_X (Tr(ρ.X) − log(Tr(σ.exp(X))))

Hint: Consider the spectral decomposition of X:

X = Σ_k |e_k> p(k) <e_k|

Then,

Tr(ρ.X) − log(Tr(σ.exp(X))) = Σ_k p(k)<e_k|ρ|e_k> − log(Σ_k exp(p(k))<e_k|σ|e_k>)

[23] Problems on Brownian motion
[1] Let B(t) be Brownian motion and T the first time at which it hits zero.
Then for 0 < x < y, compute

p dy = P_x(B(t) ∈ dy, T > t)

Hint: Let τ be the first time at which the process hits either zero or y. Then τ
is a finite stop-time and hence by Doob's optional stopping theorem,

x = E_x B(τ) = y.P_x(B hits y before 0)

and hence

P_x(B hits y before 0) = x/y

Now let s(t) = min(B(u) : u ≤ t). Then

p dy = P_x(B(t) ∈ dy, s(t) > 0)

We compute instead, using the reflection principle,

P_x(B(t) ∈ dy, s(t) ≤ 0) = P_x(B(t) ∈ [−y, −y + dy]) = (2πt)^{−1/2} exp(−(x + y)^2/2t)dy

and hence

P_x(B(t) ∈ dy, T > t) = (2πt)^{−1/2}(exp(−(y − x)^2/2t) − exp(−(y + x)^2/2t))dy, y > 0

Problem: By integrating over y > 0, compute P(T > t).
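Carrying out that integration over y > 0 gives P(T > t) = Φ(x/√t) − Φ(−x/√t), with Φ the standard normal cdf. The sketch below checks this by Monte Carlo (the grid size is an illustrative choice; discrete monitoring of the barrier biases the survival fraction slightly upward):

```python
import random, math

random.seed(7)

# Simulate Brownian paths from x0 > 0 on a grid, kill them at 0, and
# compare the survival fraction with Phi(x0/sqrt(t)) - Phi(-x0/sqrt(t)).
x0, t, steps, paths = 1.0, 1.0, 400, 5000
dt = t / steps
alive = 0
for _ in range(paths):
    b = x0
    hit = False
    for _ in range(steps):
        b += random.gauss(0.0, math.sqrt(dt))
        if b <= 0.0:
            hit = True
            break
    alive += not hit

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

exact = Phi(x0 / math.sqrt(t)) - Phi(-x0 / math.sqrt(t))
print(alive / paths, exact)  # ≈ 0.683, up to grid bias and MC error
```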


[2] Repeat Problem [1] for d-dimensional Brownian motion where d ≥ 2 and
B(t) is replaced by |B(t)|.
Hint: |B(t)|^2 − dt is a martingale and hence for 0 < a < |x| < b, if τ denotes
the first time at which |B(t)| hits either a or b starting from x, we have by Doob's
optional stopping theorem,

|x|^2 = E_x(|B(τ)|^2 − dτ) = b^2 P_x(|B(τ)| = b) + a^2 P_x(|B(τ)| = a) − dE_x[τ]

Let X(t) be a d-dimensional diffusion process with drift μ(x) and diffusion
coefficient σ(x). Suppose f(x) satisfies

Lf(x) = 0, L = μ(x)^T ∇_x(.) + (1/2)Tr(σ(x)σ(x)^T ∇_x ∇_x^T(.))

Then f(X(t)) is a martingale, and hence if τ denotes the first time at which X(t)
hits either a or b starting from x, then by Doob's optional
stopping theorem,

f(x) = E_x f(X(τ)) = f(a).P(X(τ) = a) + f(b).P(X(τ) = b),

P(X(τ) = a) + P(X(τ) = b) = 1

Thus, the probability that X(t) hits a before it hits b starting at
x is given by

P(X(τ) = a) = (f(x) − f(b))/(f(a) − f(b))

and the probability that X(t) hits b before it hits a is given by

P(X(τ) = b) = (f(a) − f(x))/(f(a) − f(b))

[24] Levy's modulus for Brownian motion
Let c > 0 and put

A_{n,k} = {|B((k + 1)/2^n) − B(k/2^n)| > √(c.2^{−n}.n.log(2))}

Then, since for a standard normal random variable Z we have

P(|Z| > a) ≤ 2(a√(2π))^{−1}.exp(−a^2/2), a > 0

we get, taking

Z = 2^{n/2}(B((k + 1)/2^n) − B(k/2^n)), a = √(c.log(2)).√n

that

P(A_{n,k}) ≤ C.n^{−1/2}.exp(−cn.log(2)/2) = C.n^{−1/2}.2^{−nc/2}

Thus,

P(∪_{k=0}^{2^n−1} A_{n,k}) ≤ Σ_{k=0}^{2^n−1} P(A_{n,k}) ≤ C.n^{−1/2}.2^{n(1−c/2)}

So for c > 2,

lim_{n→∞} P(∪_{k=0}^{2^n−1} A_{n,k}) = 0

Taking t = k/2^n and h = 1/2^n, we have that

√(c.h.log(1/h)) = √(c.2^{−n}.n.log(2))

and we deduce from the above equation and the continuity of the Brownian
paths that

P(limsup_{h→0} sup_{0<t<1−h} |B(t + h) − B(t)|/√(c.h.log(1/h)) > 1) = 0

for all c > 2. Note that this is equivalent to the statement that

lim_{δ→0} P(sup_{0<h<δ} sup_{0<t<1−h} |B(t + h) − B(t)|/√(c.h.log(1/h)) > 1) = 0 ∀c > 2

To go the other way round, we observe that since the increments over the dyadic
intervals are independent, and P(A_{n,k}) ≥ C_1.n^{−1}.exp(−cn.log(2)/2) by the
remark below,

P(∩_{k=0}^{2^n−1} A_{n,k}^c) ≤ Π_{k=0}^{2^n−1} (1 − C_1.n^{−1}.exp(−cn.log(2)/2))
= (1 − C_1.n^{−1}.exp(−cn.log(2)/2))^{exp(n.log(2))}
≤ exp(−C_1.n^{−1}.exp(n.log(2)).exp(−c.n.log(2)/2))
= exp(−C_1.n^{−1}.exp(n.log(2)(1 − c/2)))

which is obviously summable for c < 2. Thus, for c < 2, we have that with
probability one there is a finite positive integer N(ω) such that for all n > N(ω),
the event ∪_{k=0}^{2^n−1} A_{n,k} occurs. This proves that

P(limsup_{h→0} sup_{0<t<1−h} |B(t + h) − B(t)|/√(c.h.log(1/h)) > 1) = 1 ∀c < 2

Remark: Let Z be a standard normal r.v. Then we have, for a ≥ 1, the lower bound

P(Z > a) = (2π)^{−1/2} ∫_a^∞ exp(−x^2/2)dx
≥ (C_0/(1 + a^{−2})) ∫_a^∞ (1 + x^{−2})exp(−x^2/2)dx
= (C_0/(1 + a^{−2})).a^{−1}.exp(−a^2/2) ≥ (C_0/2a).exp(−a^2/2)

where

C_0 = (2π)^{−1/2}

on using the identity ∫_a^∞ (1 + x^{−2})exp(−x^2/2)dx = a^{−1}.exp(−a^2/2) and the
bound (1 + x^{−2})^{−1} ≥ (1 + a^{−2})^{−1} for x ≥ a. Applied with a = √(cn.log(2)),
this gives the lower bound P(A_{n,k}) ≥ C_1.n^{−1}.exp(−cn.log(2)/2) used above.
[25] The law of the iterated logarithm for Brownian motion
Define

ψ(t) = √(2t.loglog(t)), t > e

Let q > 1. Then

ψ(q^n) = √(2q^n.(log(n) + loglog(q)))

We have, using the Gaussian tail lower bound of the previous section,

P(B(q^n) − B(q^{n−1}) > ψ(q^n − q^{n−1}))
≥ C.(loglog(q^n − q^{n−1}))^{−1/2}.exp(−ψ(q^n − q^{n−1})^2/2(q^n − q^{n−1}))
= C.(loglog(q^n − q^{n−1}))^{−1/2}.exp(−log((n − 1)log(q) + log(q − 1)))
≥ C_1/((n − 1).√(log(n)))

for all sufficiently large n. This is not summable. Hence, by the second Borel-
Cantelli lemma (the increments being independent), with probability one the
events {B(q^n) − B(q^{n−1}) > ψ(q^n − q^{n−1})}, n = 1, 2, ... occur infinitely often.
On the other hand, for any a > 1,

P(B(q^n) − B(q^{n−1}) > aψ(q^n − q^{n−1})) ≤ C_2.exp(−a^2.log((n − 1)log(q) + log(q − 1)))
≤ C_3.(n − 1)^{−a^2}

for sufficiently large n. This is summable. Hence, with probability one, there
exists an integer N(ω) such that for all n > N(ω), all the events {B(q^n) −
B(q^{n−1}) ≤ aψ(q^n − q^{n−1})} occur.
Now consider, for a > 1,

P(max_{0<t<q^n} B(t) > aψ(q^n)) = 2.P(B(q^n) > aψ(q^n))
≤ C_1 q^{n/2}ψ(q^n)^{−1}.exp(−a^2ψ(q^n)^2/2q^n)
≤ C_2 (log(n))^{−1/2}.exp(−a^2.log(n)) = C_2/(n^{a^2}.√(log(n)))

which is summable. Thus, with probability one, there is a finite integer N(ω)
such that for all n > N(ω), the event {max_{0<t<q^n} B(t) ≤ aψ(q^n)} occurs. In
particular, {max_{q^{n−1}<t<q^n} B(t) ≤ a.ψ(q^n)} holds for all but finitely many n
with probability one. It is easily seen from this that limsup_{t→∞} B(t)/ψ(t) ≤ a
with probability one, and since this is true for all a > 1, it follows that with
probability one, limsup_{t→∞} B(t)/ψ(t) ≤ 1.

[26] (For Rohit Singh) Estimating the image intensity field when the noise is
Poisson plus Gaussian, with the mean of the Poisson field being the true intensity
field.
The image field has the model

u(x, y) = N(x, y) + W(x, y)

where N(x, y) is Poisson with unknown mean u_0(x, y) and W(x, y) is N(0, σ^2).
Further, we assume that N(x, y), W(x, y), x, y = 1, 2, ..., M are all independent
r.v's. {u_0(x, y)} is the denoised image field and is to be estimated from mea-
surements of {u(x, y)}.
Remark: We write

u(x, y) = u_0(x, y) + (N(x, y) − u_0(x, y)) + W(x, y)

and interpret u_0(x, y) as the signal/denoised image field and N(x, y) − u_0(x, y) +
W(x, y) as the noise. The pdf of the noisy image field is

p(u|u_0) = Π_{x,y=1}^M Σ_{n≥0} φ((u(x, y) − n)/σ).exp(−u_0(x, y))u_0(x, y)^n/n!

and hence the log-likelihood function to be maximized, after taking into account
a regularization term that penalizes the energy in the image field gradient,
ie, suppresses prominent edges, is given by

L(u|u_0) = log(p(u|u_0)) − E(u_0)
= Σ_{x,y=1}^M log(Σ_{n≥0} φ((u(x, y) − n)/σ).exp(−u_0(x, y))u_0(x, y)^n/n!) − c.Σ_{x,y=1}^M |∇u_0(x, y)|

where

|∇u_0(x, y)|^2 = (u_0(x + 1, y) − u_0(x, y))^2 + (u_0(x, y + 1) − u_0(x, y))^2

Setting the gradient of this w.r.t. u_0(x, y) to zero gives us the optimality equation

[Σ_n φ((u(x, y) − n)/σ).exp(−u_0(x, y))u_0(x, y)^n/n!]^{−1}
× [Σ_n φ((u(x, y) − n)/σ)exp(−u_0(x, y))(u_0(x, y)^{n−1}/(n − 1)! − u_0(x, y)^n/n!)]
− c.div(∇u_0(x, y)/|∇u_0(x, y)|) = 0

where

∇u_0(x, y) = [u_0(x + 1, y) − u_0(x, y), u_0(x, y + 1) − u_0(x, y)]^T

and

div([f_1(x, y), f_2(x, y)]^T) = f_1(x + 1, y) − f_1(x, y) + f_2(x, y + 1) − f_2(x, y)

The solution to this problem cannot be obtained in closed form. However, we
can use a non-linear diffusion-like equation to converge to the optimum u_0. That
algorithm is based on the difference equation

u_0(t + 1, x, y) = u_0(t, x, y)
+ μ.[[Σ_n φ((u(x, y) − n)/σ).exp(−u_0(t, x, y))u_0(t, x, y)^n/n!]^{−1}
× [Σ_n φ((u(x, y) − n)/σ)exp(−u_0(t, x, y))(u_0(t, x, y)^{n−1}/(n − 1)! − u_0(t, x, y)^n/n!)]
− c.div(∇u_0(t, x, y)/|∇u_0(t, x, y)|)]

Note that

φ(x) = (√(2π))^{−1}.exp(−x^2/2)

Simulating this algorithm: First we explain how to simulate a Poisson ran-
dom variable with given mean λ. We use the fact that a binomial random vari-
able with parameters n, p = λ/n converges in distribution to a Poisson random
variable with mean λ as n → ∞. Further, a binomial random variable with parameters
n, p can be expressed as the sum of n independent Bernoulli random variables,
each having success parameter p. Thus, we choose a large n, say n = 100, and
λ ∈ (0, 1), so that p = λ/n, and write a MATLAB program as:

  sum = 0;
  for k = 1:n
    U = rand;
    if U < p, X = 1; else X = 0; end
    sum = sum + X;
  end
  P = sum;

Then P will approximately be Poisson with mean λ.
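The same recipe translates directly to Python (a sketch of the MATLAB loop above, with n = 100 as in the text); for a Poisson random variable the sample mean and sample variance should both be close to λ:

```python
import random

random.seed(8)

def poisson_approx(lam, n=100):
    # Binomial(n, lam/n) -> Poisson(lam): sum of n Bernoulli(lam/n) trials
    p = lam / n
    return sum(1 for _ in range(n) if random.random() < p)

lam = 0.7
draws = [poisson_approx(lam) for _ in range(50000)]
m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / len(draws)
print(m, v)  # both ≈ lam, as for a Poisson random variable
```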

Now define an M × M image field u_0(x, y) (taking absolute values so that the
Poisson means are positive):

  for x = 1:M
    for y = 1:M
      u0(x, y) = 10*abs(randn);
    end
  end

For each x, y = 1, 2, ..., M, simulate u(x, y) as a Poisson r.v. with mean u_0(x, y),
followed by addition of WGN:

  for x = 1:M
    for y = 1:M
      p = u0(x, y)/n; sum = 0;
      for k = 1:n
        U = rand;
        if U < p, X = 1; else X = 0; end
        sum = sum + X;
      end
      u(x, y) = sum + sigma*randn;
    end
  end

Take sigma = 0.1.
Now apply the image restoration algorithm: Encode the smoothened image
field u_0(t, x, y) as an M^2 × K matrix U_0 with U_0(M(x − 1) + y, t) = u_0(t, x, y),
where x, y = 1, 2, ..., M, t = 1, 2, ..., K. The iteration now reads: for t = 1:K,
for x = 1:M, for y = 1:M,

u_0(M(x − 1) + y, t + 1) = u_0(M(x − 1) + y, t)
+ μ.[S_1^{−1}.S_2 − c.div(∇u_0(M(x − 1) + y, t)/|∇u_0(M(x − 1) + y, t)|)]

where we have to evaluate the two sums

S_1 = Σ_n φ((u(x, y) − n)/σ).exp(−u_0(M(x − 1) + y, t))u_0(M(x − 1) + y, t)^n/n!

and

S_2 = Σ_n φ((u(x, y) − n)/σ)exp(−u_0(M(x − 1) + y, t))(u_0(M(x − 1) + y, t)^{n−1}/(n − 1)! − u_0(M(x − 1) + y, t)^n/n!)

We approximate these infinite series by sums up to a large number of terms,
say N. The program for evaluating S_1 and S_2, to be written inside the for
loops over t, x, y, is as follows:

  sum1 = 0; sum2 = 0;
  for n = 1:N
    g = (sigma*sqrt(2*pi))^(-1)*exp(-(u(x,y) - n)^2/(2*sigma^2));
    e = exp(-u0(M*(x-1) + y, t));
    sum1 = sum1 + g*e*u0(M*(x-1) + y, t)^n/factorial(n);
    sum2 = sum2 + g*e*(u0(M*(x-1) + y, t)^(n-1)/factorial(n-1) ...
           - u0(M*(x-1) + y, t)^n/factorial(n));
  end
  S1 = sum1; S2 = sum2;

Application of large deviation theory: Suppose the denoised image field
u_0(x, y) = u_0 is a constant, ie, it does not vary with x, y. Then u(x, y), x, y =
1, 2, ..., M are iid random variables with common density

p(u|u_0) = σ^{−1} Σ_{n≥0} φ((u − n)/σ).exp(−u_0)u_0^n/n!

The joint pdf of all the pixel intensities in the image is then

P(u|u_0) = Π_{x,y=1}^M p(u(x, y)|u_0)

and in this case, we do not introduce any regularization. Thus, the problem
amounts to constructing the mle of u_0 given the matrix of measurements U =
((u(x, y))). We therefore consider the problem of estimating the parameter θ on
which the pdf of X_1 depends, given an iid sequence X_1, ..., X_n, and ask how this
estimator behaves as n → ∞. Let

L(X_1|θ) = log p(X_1|θ)

Then the mle of θ based on the measurements X_1, ..., X_n is given by

θ̂[n] = argmax_θ Σ_{k=1}^n L(X_k|θ)

We write

θ = θ_0 + δθ

where θ_0 is the true value of θ, and then note that

L(X_k|θ_0 + δθ) = L(X_k|θ_0) + L'(X_k|θ_0)δθ + (1/2)L''(X_k|θ_0)(δθ)^2

so that the mle of θ is given approximately by

θ̂[n] = θ_0 + δθ̂[n]

where

δθ̂[n] = argmax_{δθ} Σ_{k=1}^n [L'(X_k|θ_0)δθ + (1/2)L''(X_k|θ_0)(δθ)^2]

Thus,

δθ̂[n] = −[Σ_{k=1}^n L''(X_k|θ_0)]^{−1}.[Σ_{k=1}^n L'(X_k|θ_0)]

By the law of large numbers, this converges as n → ∞ almost surely to

−E[L'(X_1|θ_0)]/E[L''(X_1|θ_0)] = 0

since

E[L'(X_1|θ_0)] = ∫ p'(X_1|θ_0)dX_1 = ∂_θ ∫ p(X_1|θ)dX_1 |_{θ=θ_0} = 0

The LDP tells us at what rate δθ̂[n] converges to zero. By the contraction
principle of LDP, if I(x, y) is the rate function of n^{−1}.Σ_{k=1}^n (L'(X_k|θ_0), L''(X_k|θ_0)),
then the rate function of δθ̂[n] is given by

I_0(θ) = inf_{(x,y)} {I(x, y) : −x/y = θ} = inf_y {I(−θy, y)}
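The 1/√n decay of δθ̂[n] implicit in this analysis can be seen in a toy model. A sketch (the N(θ_0, 1) family is my illustrative choice; for it, L'(x|θ_0) = x − θ_0 and L''(x|θ_0) = −1, so the one-step correction is just the sample-mean deviation):

```python
import random, math

random.seed(9)

theta0 = 2.0

def delta_hat(n):
    # delta = -[sum L'']^{-1} [sum L'] for the N(theta, 1) model:
    # sum L'' = -n and sum L' = sum (X_k - theta0)
    xs = [random.gauss(theta0, 1.0) for _ in range(n)]
    return sum(x - theta0 for x in xs) / n

def rms(n, reps=200):
    return math.sqrt(sum(delta_hat(n) ** 2 for _ in range(reps)) / reps)

r_small, r_large = rms(100), rms(10000)
print(r_small, r_large, r_small / r_large)  # ratio ≈ sqrt(100) = 10
```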


Advanced Probability and Statistics: Applications to Physics and Engineering 25

[27] Supersymmetry in quantum stochastic calculus


Let E_ab be the N × N matrix with a one in the (a, b)th position and zeros
at the other positions. Put σ_ab = σ(E_ab). In other words, if 1 ≤ a, b ≤ r or
r + 1 ≤ a, b ≤ N, then σ_ab = 0 and σ_ab = 1 otherwise. Define

H = diag(0_r, I_{N−r}) = [[0_r, 0_{r×(N−r)}], [0_{(N−r)×r}, I_{N−r}]]

Define

K = (−1)^H = diag(I_r, −I_{N−r}) = [[I_r, 0_{r×(N−r)}], [0_{(N−r)×r}, −I_{N−r}]]
Then consider the Boson Fock space

Γs (L2 (R+ ) ⊗ CN )

and for u ∈ L2 (R+ ) ⊗ CN , let ua (t) denote its ath component, 1 ≤ a ≤ N .


Let Λ_ab(t), 0 ≤ a, b ≤ N denote the usual noise processes of the Hudson-
Parthasarathy quantum stochastic calculus. Thus, we have the quantum Ito's
formula

dΛ_ab(t).dΛ_cd(t) = ε_ad dΛ_cb(t)

where 0 ≤ a, b, c, d ≤ N and ε_ab is one if 1 ≤ a = b ≤ N and zero otherwise. For
u ∈ L^2(R_+) ⊗ C^N, define u_0(t) = 1. Define

G(t) = Γ((−1)^{Hχ_[0,t]}) = Γ(exp(iπHχ_[0,t])) = exp(iπλ(Hχ_[0,t])) = exp(iπΛ_t(H))


It is clear that G(t) is both unitary and Hermitian as an operator in the Boson
Fock space Γs (L2 (R+ ) ⊗ CN ) and its action on the exponential vector |e(u) >
is given by
G(t)|e(u) > = |e((−1)^{Hχ_[0,t]} u) > = |e(Kuχ_[0,t] + uχ_(t,∞)) >

Note that
K 2 = IN , K ∗ = K
We shall now prove that for s < t,
G(t)dΛ_ab(s) = (−1)^{σ_ab} dΛ_ab(s).G(t)
Indeed, we have for s < t,

< e(u)|G(t)dΛab (s)|e(v) >=< G(t)e(u)|dΛab (s)|e(v) >=

< e(Kuχ[0,t] +u.χ(t,∞) )|dΛab (s)|e(v) >


= va (s)(K ū(s))b ds. < e(Kuχ[0,t] +u.χ(t,∞) )|e(v) >
= va (s)(K ū(s))b ds < e(u)|G(t)|e(v) >

On the other hand,

< e(u)|dΛab (s).G(t)|e(v) >=< e(u)|dΛab (s)|e(Kvχ[0,t] + vχ(t,∞) ) >

= ūb (s).(Kv(s))a ds. < e(u)|e(Kvχ[0,t] + vχ(t,∞) ) >


= ūb (s)(Kv(s))a ds. < e(u)|G(t)|e(v) >
Next, we observe that

v_a(s)(Kū(s))_b = K_bb v_a(s)ū_b(s),

ū_b(s)(Kv(s))_a = K_aa v_a(s)ū_b(s)

and further,

K_bb = (−1)^{σ_ab} K_aa

since (−1)^{σ_ab} equals 1 if either 1 ≤ a, b ≤ r or r + 1 ≤ a, b ≤ N and it equals
−1 otherwise. Note that we are here assuming that K_00 = 1, K_0a = 1, 1 ≤ a ≤
r, K_0a = −1, r + 1 ≤ a ≤ N and this extended K is symmetric.

This proves the claim. Now define the process ξab (t) by

dξ_ab(t) = G(t)^{σ_ab} dΛ_ab(t)

We have for s < t, using the above identity,

dξ_ab(t).dξ_cd(s) = G(t)^{σ_ab} dΛ_ab(t).G(s)^{σ_cd} dΛ_cd(s)

= G(t)^{σ_ab}.G(s)^{σ_cd}.dΛ_ab(t).dΛ_cd(s)

= G(s)^{σ_cd} G(t)^{σ_ab} dΛ_cd(s).dΛ_ab(t)

= G(s)^{σ_cd} (−1)^{σ_ab σ_cd} dΛ_cd(s) G(t)^{σ_ab}.dΛ_ab(t)

= (−1)^{σ_ab σ_cd}.dξ_cd(s).dξ_ab(t)
In terms of super-commutation relations, we can express this identity as

[dξ_ab(t), dξ_cd(s)]_S = 0, s < t − − − (1)

where for two operators A, B, their super-commutator is defined as

[A, B]_S = AB − (−1)^{σ(A)σ(B)} BA

The grading of the ξ_ab(t) process is σ_ab = σ(E_ab). From (1), it follows on integration
that

[dξ_ab(t), ξ_cd(s)]_S = 0, s < t
Note that we have used the easily proved identity

G(t)G(s) = G(s)G(t)∀s, t

Now consider

dξ_ab(t).dξ_cd(t) = G(t)^{σ_ab}.dΛ_ab(t).G(t)^{σ_cd}.dΛ_cd(t)

= G(t)^{σ_ab + σ_cd} dΛ_ab(t).dΛ_cd(t)

= G(t)^{σ_ab + σ_cd} ε_ad.dΛ_cb(t)

= G(t)^{σ_ab + σ_ca} ε_ad dΛ_cb(t)

(since ε_ad vanishes unless a = d, so that σ_cd = σ_ca on its support)

= G(t)^{σ_cb} ε_ad dΛ_cb(t)

(using G(t)^2 = 1 and σ_ab + σ_ca = σ_cb mod 2)

= ε_ad dξ_cb(t)

Interchanging the ordered pairs (ab) and (cd) gives us

dξ_cd(t).dξ_ab(t) = ε_cb dξ_ad(t)

Thus we get

[dξ_ab(t), dξ_cd(t)]_S = ε_ad dξ_cb(t) − (−1)^{σ_ab σ_cd} ε_cb dξ_ad(t)
Now let A, B be (N + 1) × (N + 1) matrices. We define

ξ_A(t) = A_ab ξ_ab(t)

where summation over the repeated indices a, b is implied. Then, we have from
the above,

[dξ_A(t), dξ_B(t)]_S = A_ab B_cd ε_ad dξ_cb(t) − (−1)^{σ_ab σ_cd} A_ab B_cd ε_cb dξ_ad(t)

= (B.ε.A)_cb dξ_cb(t) − ((−1)^{σ(A).σ(B)} A.ε.B)_ad dξ_ad(t)

where the notation (−1)^{σ(A).σ(B)} A.ε.B means

((−1)^{σ(A).σ(B)} A.ε.B)_ad = (−1)^{σ_ab} A_ab ε_bc (−1)^{σ_cd} B_cd

with summation over the repeated indices b, c being implied.

[28] Quantum mechanics and semi-martingales. Let X(t) be an RCLL semi-
martingale so that the Doleans-Dade-Meyer-Ito formula holds:

f(X(t + dt)) = f(dX_c(t) + ΔX(t) + X(t−))

= f(X(t−)) + f'(X(t−))dX_c(t) + f''(X(t−))d[X_c, X_c](t)/2 + Δf(X(t))

where

ΔX(t) = X(t) − X(t−), Δf(X(t)) = f(X(t)) − f(X(t−))

or equivalently,

df(X(t)) = f'(X(t−))dX_c(t) + f''(X(t−))d[X_c, X_c](t)/2 + Δf(X(t))



= f'(X(t−))(dX(t) − ΔX(t)) + f''(X(t−))d[X_c, X_c](t)/2 + Δf(X(t))

Determine the corresponding Doleans-Dade-Meyer-Ito correction term dP(t)
that is to be added to the quantum evolution equation

dU(t) = (−iHdt + dP(t) + V(t)df(X(t)))U(t)

in order to make U(t) unitary for every t.

[29] Relative entropy in quantum mechanics

[1]
D(ρ|σ) = Tr(ρ.(log(ρ) − log(σ)))

W_p = Σ_{x∈A} p(x)W_x

p × W = diag[p(x)W_x, x ∈ A]

p ⊗ W_p = diag[p(x)W_p, x ∈ A]

D(p × W | p ⊗ W_p) = Tr[(p × W)[ln(p × W) − ln(p ⊗ W_p)]]

= Σ_x p(x)Tr(W_x.ln(p(x)W_x)) − Σ_x p(x)Tr(W_x(ln(p(x)) + ln(W_p)))

= Σ_x p(x)Tr(W_x(ln(p(x)) + ln(W_x))) − Σ_x p(x)(ln(p(x)) + Tr(W_x.ln(W_p)))

= Σ_x p(x)(ln(p(x)) + Tr(W_x.ln(W_x))) − Σ_x p(x)(ln(p(x)) + Tr(W_x.ln(W_p)))

= − Σ_x p(x)H(W_x) + H(W_p) = I(p, W)
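The block-diagonal structure of p × W and p ⊗ W_p reduces the relative entropy to Σ_x p(x)D(W_x|W_p), and the identity above says this equals the Holevo quantity H(W_p) − Σ_x p(x)H(W_x). A small numerical check, with an ensemble {p(x), W_x} of our own choosing:

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy H(rho) = -Tr(rho ln rho) via eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return -np.sum(ev * np.log(ev))

def rel_entropy(rho, sigma):
    """D(rho|sigma) = Tr(rho(ln rho - ln sigma)) for strictly positive states."""
    def logm(a):
        ev, u = np.linalg.eigh(a)
        return (u * np.log(ev)) @ u.conj().T
    return np.real(np.trace(rho @ (logm(rho) - logm(sigma))))

p = np.array([0.3, 0.7])
W = [np.array([[0.8, 0.2], [0.2, 0.2]]),        # W_0
     np.array([[0.5, -0.1], [-0.1, 0.5]])]      # W_1
Wp = sum(px * Wx for px, Wx in zip(p, W))

# D(p x W | p (x) W_p), computed block-by-block (the ln p(x) terms cancel)
D = sum(px * rel_entropy(Wx, Wp) for px, Wx in zip(p, W))
holevo = entropy(Wp) - sum(px * entropy(Wx) for px, Wx in zip(p, W))
print(D, holevo)   # the two coincide
```

Here the example states and weights are arbitrary; any ensemble of strictly positive density matrices gives the same agreement.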

[30] Application of the Euler-Maclaurin summation formula to computing
the maximum likelihood estimator of an image field subject to Poisson and
Gaussian noise.

Let u0(x, y) be the original image field where 1 ≤ x, y ≤ N. Let P(λ) denote
a Poisson random variable with mean λ. Then, the noisy image field has the
form

u(x, y) = ζ.P(u0(x, y)/ζ) + w(x, y)

where the Poisson r.v's P(u0(x, y)/ζ), 1 ≤ x, y ≤ N are independent r.v's and
w(x, y) are iid N(0, σ^2). We note that

E(u(x, y)) = u0(x, y), Var(u(x, y)) = ζ.u0(x, y) + σ^2

so that as ζ, σ 2 → 0, we have that

u(x, y) → u0 (x, y)
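The stated moments of the scaled-Poisson-plus-Gaussian pixel model can be checked by simulation; the parameter values below are our own illustration.

```python
import numpy as np

# Monte Carlo check of E(u) = u0 and Var(u) = zeta*u0 + sigma^2
# for u = zeta*P(u0/zeta) + w, w ~ N(0, sigma^2).
rng = np.random.default_rng(1)
u0, zeta, sigma = 5.0, 0.5, 0.2
n = 200_000
u = zeta * rng.poisson(u0 / zeta, size=n) + rng.normal(0.0, sigma, size=n)
print(u.mean())   # approximately u0 = 5.0
print(u.var())    # approximately zeta*u0 + sigma^2 = 2.54
```

Letting ζ, σ² → 0 in the simulation makes the variance collapse, mirroring the limit u(x, y) → u0(x, y) noted above.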

The likelihood function of the image field u is then clearly given by

p(u|u0) = C Π_{x,y=1}^N Σ_{n≥0} [exp(−(u(x, y) − nζ)^2/2σ^2)(u0(x, y)/ζ)^n exp(−u0(x, y)/ζ)/n!]

and the log likelihood function is therefore

ln(p(u|u0)) = ln(C) − Σ_{x,y=1}^N (u0(x, y)/ζ) + Σ_{x,y=1}^N ln(Σ_{n≥0} [exp(−(u(x, y) − nζ)^2/2σ^2)(u0(x, y)/ζ)^n/n!])


Using the first order approximation to the Euler-Maclaurin summation formula,

Σ_{n=0}^∞ f(n) ≈ ∫_0^∞ f(x)dx + f(0)/2 − (B2/2)f'(0)

(since f(∞) = 0) we get

Σ_{n≥0} [exp(−(u(x, y) − nζ)^2/2σ^2)(u0(x, y)/ζ)^n/n!]

≈ ∫_0^∞ exp(−(u(x, y) − xζ)^2/2σ^2) exp(x.ln(u0(x, y)/ζ))dx/Γ(x + 1)

+ (1/2)exp(−u(x, y)^2/2σ^2)

− (B2/2)exp(−u(x, y)^2/2σ^2)[ln(u0(x, y)/ζ) − (ζu(x, y)/σ^2)]


Thus, the approximate likelihood function to be maximized w.r.t u0(x, y) is
given by

E(u0(x, y)|u(x, y)) =

exp(−u0(x, y)/ζ)[∫_0^∞ exp(−(u(x, y) − zζ)^2/2σ^2) exp(z.ln(u0(x, y)/ζ))dz/Γ(z + 1)

+ (1/2)exp(−u(x, y)^2/2σ^2) − (B2/2)exp(−u(x, y)^2/2σ^2)[ln(u0(x, y)/ζ) − (ζu(x, y)/σ^2)]]
For large values of z, the Γ(z + 1) factor appearing in the denominator becomes
very large and hence the integral above can be well approximated by

A ∫_0^T exp(−(u(x, y) − zζ)^2/2σ^2) exp(z.ln(u0(x, y)/ζ))dz

where T is finite and A represents some sort of an average of 1/Γ(z + 1) over
[0, T]. Further, for z >> T, the integrand above goes rapidly to zero as
exp(−z^2 ζ^2/2σ^2). Hence the above integral can further be well approximated
by

A ∫_0^∞ exp(−(u(x, y) − zζ)^2/2σ^2) exp(z.ln(u0(x, y)/ζ))dz

Further, for small ζ, we have that u0(x, y)/ζ > 1 and hence ln(u0(x, y)/ζ) > 0.
Then for negative values of z, exp(−(u(x, y) − zζ)^2/2σ^2).exp(z.ln(u0(x, y)/ζ))
becomes very small and hence the above integral can be extended to the range

(−∞, ∞) without causing too much error. This means that finally, the function
to be maximized is

E(u0(x, y)|u(x, y)) ≈

A.exp(−u0(x, y)/ζ). ∫_{−∞}^∞ exp(−(u(x, y) − zζ)^2/2σ^2) exp(z.ln(u0(x, y)/ζ))dz

+ (1/2)exp(−u0(x, y)/ζ).exp(−u(x, y)^2/2σ^2)[1 − B2.(ln(u0(x, y)/ζ) − (ζu(x, y)/σ^2))]

Using the standard Gaussian integral

∫_{−∞}^∞ exp(−z^2/2σ^2).exp(az)dz = σ√(2π).exp(σ^2 a^2/2)
−∞
we easily evaluate the above integral to get

∫_{−∞}^∞ exp(−(u − zζ)^2/2σ^2).exp(z.ln(u0/ζ))dz

= exp(−u^2/2σ^2) ∫_{−∞}^∞ exp(−z^2 ζ^2/2σ^2).exp(z.(uζ/σ^2 + ln(u0/ζ)))dz

= (σ√(2π)/ζ).exp(−u^2/2σ^2).exp((σ^2/2ζ^2)(uζ/σ^2 + ln(u0/ζ))^2)

and hence

E(u0(x, y)|u(x, y)) ≈

exp(−u0/ζ).exp(−u^2/2σ^2).[A.(σ√(2π)/ζ).exp((σ^2/2ζ^2)(uζ/σ^2 + ln(u0/ζ))^2)

+ (1/2)[1 − B2.(ln(u0(x, y)/ζ) − (ζu(x, y)/σ^2))]]


Setting the partial derivative of this likelihood function w.r.t u0(x, y) at
û0(x, y) to zero gives us the optimal equation

ζ[(Aσ√(2π)/ζ).exp((σ^2/2ζ^2)(uζ/σ^2 + ln(û0/ζ))^2).(u/(ζû0) + (σ^2/(û0 ζ^2))ln(û0/ζ)) − B2/2û0]

= (Aσ√(2π)/ζ).exp((σ^2/2ζ^2)(uζ/σ^2 + ln(û0/ζ))^2) + (1/2)(1 − B2(ln(û0/ζ) − ζu/σ^2))

In the special case when we neglect the first order corrections in the Euler-
Maclaurin formula, this equation simplifies to

u = û0 + σ^2.ln(û0/ζ)

which can be solved perturbatively treating σ^2 as a small parameter to get

û0 ≈ u − σ^2.ln(u/ζ)
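The perturbative solution can be compared with a direct numerical root of the simplified equation u = û0 + σ².ln(û0/ζ); the parameter values below are our own illustration.

```python
import numpy as np

# Compare the perturbative mle u0 ≈ u - sigma^2*ln(u/zeta) with a bisection
# root of the zeroth-order optimality equation u = u0 + sigma^2*ln(u0/zeta).
u, sigma2, zeta = 10.0, 0.05, 0.5

def g(u0):
    # residual of u = u0 + sigma^2*ln(u0/zeta)
    return u0 + sigma2 * np.log(u0 / zeta) - u

lo, hi = 1e-6, u          # g(lo) < 0 < g(hi) brackets the root
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
u0_exact = 0.5 * (lo + hi)

u0_perturb = u - sigma2 * np.log(u / zeta)
print(u0_exact, u0_perturb)   # agreement to higher order in sigma^2
```

The discrepancy between the two values is of order σ⁴, consistent with treating σ² as the small parameter.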
Derivation of the Euler-Maclaurin summation formula

∫_0^n f'(x)(x − [x] − 1/2)dx = Σ_{k=0}^{n−1} ∫_k^{k+1} f'(x)(x − k − 1/2)dx

= Σ_{k=0}^{n−1} [(x − k − 1/2)f(x)|_k^{k+1} − ∫_k^{k+1} f(x)dx]

= Σ_{k=0}^{n−1} (f(k + 1) + f(k))/2 − ∫_0^n f(x)dx

= Σ_{k=0}^n f(k) − (f(0) + f(n))/2 − ∫_0^n f(x)dx

or equivalently,

Σ_{k=0}^n f(k) = ∫_0^n f(x)dx + (f(0) + f(n))/2 + ∫_0^n f'(x)(x − [x] − 1/2)dx

Define

P1(x) = x − [x] − 1/2

Then,

∫_0^n f'(x)P1(x)dx = f'(x)P2(x)|_0^n − ∫_0^n f''(x)P2(x)dx

= f'(n)P2(n) − ∫_0^n f''(x)P2(x)dx

where

P2(x) = ∫_0^x P1(t)dt = ∫_0^x (t − [t] − 1/2)dt

In general, defining the sequence of functions Pn(x), n = 1, 2, ... recursively by

P_{n+1}(x) = ∫_0^x Pn(t)dt

we get that

Σ_{k=0}^n f(k) = ∫_0^n f(x)dx + (f(0) + f(n))/2 + P2(n)f'(n) − P3(n)f''(n) + ... + (−1)^N f^{(N−1)}(n)PN(n)

+ (−1)^{N+1} ∫_0^n f^{(N)}(x)PN(x)dx

We now briefly calculate Pn(x) for the first few values of n.

P2(x) = ∫_0^x (t − [t] − 1/2)dt = Σ_{k=0}^{[x]−1} ∫_k^{k+1} (t − k − 1/2)dt + ∫_{[x]}^x (t − [t] − 1/2)dt

= 0 + ∫_0^{{x}} (u − 1/2)du = (1/2)((({x} − 1/2)^2 − 1/4)) = (1/2)({x}^2 − {x})

which is a polynomial in {x} = x − [x]. Note that all the Pn(x)'s are polynomials
in {x} and so we can write

Pn(x) = Bn({x}), n = 1, 2, ...

where the Bn(.)'s are called the Bernoulli polynomials.
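The first-order truncation Σ_{n≥0} f(n) ≈ ∫_0^∞ f + f(0)/2 − (B2/2)f'(0) used in the estimation above is easy to check on a rapidly decaying test function of our own choosing:

```python
import math

# Sanity check of the first-order Euler-Maclaurin approximation with
# B2 = 1/6 on f(x) = exp(-a x), whose sum is a geometric series.
a = 0.3
B2 = 1.0 / 6.0

exact = 1.0 / (1.0 - math.exp(-a))           # sum_{n>=0} e^{-a n}
integral = 1.0 / a                            # integral of e^{-a x} over [0, inf)
approx = integral + 0.5 - (B2 / 2.0) * (-a)   # f(0) = 1, f'(0) = -a
print(exact, approx)
```

For a = 0.3 the two values agree to about 4·10⁻⁵, consistent with the neglected higher Bernoulli terms being O(a³).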


Application to other models in image processing: Consider the Poisson mixture
model

u = ζ1 P1(u0/ζ1) + ... + ζp Pp(u0/ζp) + w

where P1(a1), ..., Pp(ap) are independent Poisson r.v's with means a1, ..., ap respectively
with ak = u0/ζk and w is N(0, σ^2) and independent of P1(a1), ..., Pp(ap).
Then, the likelihood function of u given u0 is given by

P(u|u0) = (σ√(2π))^{-1} Σ_{n1,...,np=0}^∞ exp(−(u − n1ζ1 − ... − npζp)^2/2σ^2)

× exp(−u0(1/ζ1 + ... + 1/ζp))(u0/ζ1)^{n1}...(u0/ζp)^{np} × (n1!)^{-1}...(np!)^{-1}

In order to evaluate this, we require a multidimensional version of the Poisson
summation formula.
[31] Fundamental characteristics of classical and quantum probability

A. Classical probability versus quantum probability

[1] p(x) is a probability distribution on a finite set A. E is a subset of A and
its probability of occurrence is

p(E) = Σ_{x∈E} p(x)

If f(x) is a random variable, its expected value is

Ep(f) = Σ_{x∈A} p(x)f(x)
There are two definite ways of describing this in quantum probability. First we
assume that the system is in a pure state

|ψ> = [√p1 exp(iθ1), ..., √pn exp(iθn)]^T

and the observable under consideration is

X = diag[f1, f2, ..., fn]

where pj = p(xj), fj = f(xj) with A = {x1, ..., xn}. Then

Ep(f) = <ψ|X|ψ>



Second, consider the mixed state

ρ = diag[p1, ..., pn]

Then

Ep(f) = Tr(ρ.X)

It is clear that for non-diagonal ρ and non-diagonal X, the expected value of
X defined by Tr(ρX) in quantum probability is a generalization of the classical
scenario.
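The agreement of the two quantum expressions with the classical expectation is immediate to verify numerically; the probabilities, phases and observable values below are our own example.

```python
import numpy as np

# Both <psi|X|psi> (pure state) and Tr(rho X) (mixed state) reproduce the
# classical expectation sum_j p_j f_j when X is diagonal.
p = np.array([0.2, 0.5, 0.3])
theta = np.array([0.4, 1.1, -0.7])       # arbitrary phases, irrelevant here
f = np.array([1.0, -2.0, 4.0])

psi = np.sqrt(p) * np.exp(1j * theta)     # pure state |psi>
X = np.diag(f)                            # diagonal observable
rho = np.diag(p)                          # mixed state

e_classical = np.sum(p * f)
e_pure = np.real(np.vdot(psi, X @ psi))   # <psi|X|psi>
e_mixed = np.real(np.trace(rho @ X))      # Tr(rho X)
print(e_classical, e_pure, e_mixed)       # all three coincide
```

Note that the phases θj drop out of <ψ|X|ψ> precisely because X is diagonal; for non-diagonal X the quantum expectations genuinely extend the classical one.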

[2] If X, Y are two random variables in classical probability and both of them
assume discrete values, then X + Y will also assume only discrete values. This
is not so in quantum probability. We can have two observables (ie, self-adjoint
operators) A, B in the same Hilbert space assuming discrete values, ie, having
discrete spectrum, but A + B can have a continuous spectrum. For example,
the Hydrogen atom Hamiltonian H1 = p^2/2m − e^2/r has both discrete and
continuous spectrum and the 3-D harmonic oscillator H2 = p^2/2m + Kr^2/2 has
discrete spectrum, but their difference

H = H2 − H1 = Kr^2/2 + e^2/r

is a multiplication operator and has only continuous spectrum.

[3] In classical probability theory, we have Bell’s inequality, ie, if X, Y, Z are


three r.v.’s assuming values only ±1, then we have

X(Y − Z) ≤ 1 − Y Z, X(Z − Y ) ≤ 1 − Y Z

and taking expectations gives us

|E(XY ) − E(XZ)| ≤ 1 − E(Y Z)

This is called Bell's inequality. This does not hold in quantum probability. For
example, consider the three observables

X = (a, σ), Y = (b, σ), Z = (c, σ)

where a, b, c are unit vectors. Then X, Y, Z all have only ±1 as their eigenvalues,
but

XY = (a, b) + i(σ, a × b),

YZ = (b, c) + i(σ, b × c),

ZX = (c, a) + i(σ, c × a)

Thus, if ρ is any state in which all the three Pauli matrices have zero mean,
then in this state

Tr(ρ(XY + YX)/2) = (a, b),

Tr(ρ(YZ + ZY)/2) = (b, c),

Tr(ρ(ZX + XZ)/2) = (c, a)


and we can certainly choose the unit vectors a, b, c such that

|(a, b) − (a, c)| > 1 − (b, c)

Exercise: Give an example of such unit vectors by considering the geometry


on the unit sphere and noting that the angle θ between two unit vectors u, v
satisfies cos(θ) = (u, v).
hint: Take a as the north pole of the sphere and c as the south pole. Then
choose b so that it makes an acute angle with c and an obtuse angle with a.
We then obtain the desired inequality.
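A strict violation is easy to exhibit numerically. The configuration below is a coplanar choice of our own (a, b, c spaced 60° apart along a great circle), which gives a clean strict inequality:

```python
import numpy as np

# Pauli matrices, to confirm that (n, sigma) has eigenvalues +-1.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def obs(n):
    """The observable (n, sigma) for a unit vector n."""
    return n[0]*sx + n[1]*sy + n[2]*sz

a = np.array([0.0, 0.0, 1.0])
b = np.array([np.sin(np.pi/3), 0.0, np.cos(np.pi/3)])       # 60 deg from a
c = np.array([np.sin(2*np.pi/3), 0.0, np.cos(2*np.pi/3)])   # 120 deg from a

print(np.linalg.eigvalsh(obs(b)))     # [-1, 1]
lhs = abs(a @ b - a @ c)              # |(a,b) - (a,c)| = |cos60 - cos120| = 1
rhs = 1.0 - b @ c                     # 1 - cos60 = 1/2
print(lhs, rhs, lhs > rhs)            # Bell's inequality is violated
```

With these angles the left side equals 1 while the right side equals 1/2, so the classical bound fails by a factor of two.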

[4] Let X be an observable having spectral measure EX(.). Then f(X) is an
observable having spectral measure EX(f^{-1}(.)). Indeed, we can write

X = ∫ x.dEX(x) = ∫ x EX(dx),

f(X) = ∫ f(x)EX(dx) = ∫ y EX(f^{-1}(dy))

Now, if U is another observable having spectral measure EU(.), then g(U) has
the spectral measure EU(g^{-1}(.)). If X and U commute, then their spectral
measures also commute and we can talk of the joint probability distribution of
X, U in any state ρ as

Pρ (X ∈ dx, U ∈ du) = T r(ρ.EX (dx).EU (du))

or equivalently, as

Pρ (X ∈ A, U ∈ B) = T r(ρ.EX (A).EU (B))

Note that this is a well defined probability distribution since EX (.) commutes
with EU (.). To see that this is non-negative, we write it as

T r(ρ.EX (A).EU (B)) = T r(EU (B).EX (A)ρ.EX (A).EU (B))

This is possible only because [EX (A), EU (B)] = 0 for any two Borel sets A, B.
If they do not commute, this does not define a probability distribution, in fact,
T r(ρ.EX (A).EU (B)) may even be a complex number with non-vanishing imag-
inary part.

[5] In classical probability, we have the distributive law: if A, B, C are three
events, then

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

This fails in quantum probability. An event in quantum probability is simply
an orthogonal projection operator P or equivalently, a subspace of the Hilbert
space defined as the range of this orthogonal projection. If P, Q are two events,
then the event P ∩ Q is defined as the orthogonal projection onto R(P) ∩ R(Q)
and likewise the event P ∪ Q is the orthogonal projection onto the closure of
R(P) + R(Q). To see how the distributive law can fail in quantum probability,
let the Hilbert space be R^2, P the orthogonal projection onto the line y =
m1x and Q the orthogonal projection onto y = m2x where m2 ≠ ±m1. Then
R(P) + R(Q) = R^2 and hence P ∪ Q = I. Thus, if S is the orthogonal projection
onto the line y = m3x, we get that S ∩ (P ∪ Q) = S. On the other hand, S ∩ P =
0, S ∩ Q = 0 if m3 ≠ ±m1, ±m2 and hence in this case, (S ∩ P) ∪ (S ∩ Q) = 0.

[6] Gleason's theorem on the existence of quantum probability spaces: Let
H be a Hilbert space and P(H) the lattice of orthogonal projections in H. One
of the postulates of quantum probability is that P(H) is to be interpreted as
the collection of events for the sample space H. Let μ : P(H) → [0, 1] be
a probability measure on this lattice of events, ie, μ(0) = 0, μ(1) = 1 and if
P1, P2, ... is a sequence in P(H) such that PkPm = 0, k ≠ m, then

μ(Σ_k Pk) = Σ_k μ(Pk)

Then Gleason proved that there exists a unique positive semidefinite operator ρ
in H having unit trace such that

μ(P) = Tr(ρ.P), P ∈ P(H)

Thus, the quantum analogue of a classical probability space (Ω, F, P) is a quantum
probability space (H, P(H), ρ). In order to see how classical probability
spaces are special cases of quantum probability spaces, we start with a classical
probability space (Ω, F, P) and define the Hilbert space H = L^2(Ω, F, P).
For each F ∈ F, define PF = χF (ie, multiplication by the indicator of F).
Then, PF, F ∈ F is a commuting family in P(H). We define ρ to be a positive
semidefinite operator in H so that

Tr(ρ.f) = ∫_Ω f(ω)P(dω)

for any f ∈ H = L^2(Ω, F, P). This is the same as requiring that

Tr(ρ.PF) = P(F) = ∫_F dP(ω)

Note that we can choose ρ to be positive semidefinite, since the above definition
implies that

<f, ρf> = Tr(ρ.|f|^2) = ∫ |f(ω)|^2 P(dω) ≥ 0

Formally, we can select such a ρ so that it is diagonal w.r.t a given onb
{ek : k = 1, 2, ...} for H.

[32] A problem in statistical image processing

Let X, Y be two random vectors defined on the same probability space such
that

E(Y|X) = X

For example, let

X = [x1, ..., xn]^T

have a pdf pX(x) and let Y = [y1, ..., yn]^T be such that

yk = ζk Pk(xk/ζk) + wk, k = 1, 2, ..., n

where, given xk, Pk(xk/ζk) is a Poisson r.v. with mean xk/ζk and W = ((wk))
is independent of X with zero mean and pdf pW(w). Then

E(Y|X) = X

and

pY(y|x) = Σ_{m1,...,mn=0}^∞ pW(y1 − m1ζ1, ..., yn − mnζn).Π_{k=1}^n [exp(−xk/ζk)(xk/ζk)^{mk}(mk!)^{-1}]

Note that the MAP estimator of X based on the noisy data Y is given by

X̂ = argmax_X p(X|Y) = argmax_X p(Y|X)pX(X)

We now find the best linear predictor (BLP) of X given Y. It is given by

X̂ = AY + b

where the matrix A and vector b are chosen so that

E[||X − X̂||^2]

is a minimum. Optimizing, we get

E[(X − AY − b)Y^T] = 0,

E[X − AY − b] = 0

or equivalently,

A RYY + b μY^T = RXY,

μX − A μY − b = 0

Now,

RXY = E(XY^T) = E[E(XY^T|X)]

= E[X E(Y^T|X)] = E(XX^T) = RXX,

RYY = E[YY^T]

In the special case of the Poisson model above, when W = 0, so that conditioned
on X, Y is independent Poissonian with mean X = ((xk)), we have that

E(yi yj|X) = xi xj, i ≠ j,

E(yi^2|X) = ζi xi + xi^2

and hence

RYY(i, j) = RXX(i, j) + δij ζi μX(i)

or equivalently,

RYY = RXX + DXX

where

DXX = diag[ζi μX(i) : 1 ≤ i ≤ n]
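The correlation identity RYY = RXX + DXX can be verified by Monte Carlo; the lognormal prior for X below is our own toy choice (any positive X works).

```python
import numpy as np

# Verify R_YY = R_XX + diag(zeta_i * mu_X(i)) for y_i = zeta_i * Poisson(x_i/zeta_i).
rng = np.random.default_rng(3)
n_samples = 400_000
zeta = np.array([0.5, 1.0])

# positive, correlated two-component "image" vector X
X = np.exp(rng.multivariate_normal([0.0, 0.2], [[0.1, 0.05], [0.05, 0.1]],
                                   size=n_samples))
Y = zeta * rng.poisson(X / zeta)

R_XX = (X.T @ X) / n_samples
R_YY = (Y.T @ Y) / n_samples
D_XX = np.diag(zeta * X.mean(axis=0))
print(np.round(R_YY, 3))
print(np.round(R_XX + D_XX, 3))
```

The off-diagonal entries match because the components of Y are conditionally independent given X, while the diagonal picks up the extra Poisson variance ζi·μX(i).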
Chapter 2

Quantum Scattering Theory

After a fairly rigorous mathematical formulation of quantum mechanics based on
transformation theory by Dirac, in which he cast Heisenberg's matrix mechanics
and Schrodinger's wave mechanics in a common operator theoretic setting and
successfully explained their equivalence, it was John Von Neumann in his monumental
treatise "Mathematical foundations of quantum mechanics" who proved the
spectral theorem for all self adjoint operators in an infinite dimensional Hilbert
space and thereby cast the masterly work of Dirac, "The principles of quantum
mechanics", in a way that even pure mathematicians were able to understand.
Dirac in his book had also indicated the work of Jordan and Heisenberg on
quantum scattering theory but without the modern method of wave operators.
Lippmann and Schwinger gave a more rigorous discussion of scattering theory
via the introduction of free particle states and scattered states. Actually, free
particle states are not square integrable and hence cannot strictly be termed as
states. However, by the use of wave operators, the equations of Lippmann and
Schwinger could be made rigorous using the setting of the continuous spectrum
of a self-adjoint operator introduced by Von Neumann. Crucial to this development
is the notion of the domain of definition of an unbounded operator.
This leads to the fact that the wave operators that appear in scattering theory
are not defined on all vectors but usually on those vectors that belong to the
continuous spectrum of the free particle Hamiltonian. Today scattering theory
is a well developed subject and the interested reader can get a good grasp of
this subject from the books

[1] Tosio Kato, "Perturbation theory for linear operators", Springer.
[2] Werner O. Amrein, "Hilbert space methods in quantum mechanics", CRC
Press, 2012.

It should be noted that the first problem solved in scattering theory was by
Ernest Rutherford when he bombarded a gold foil with α-particles and from
the scattering pattern w.r.t the angles of deflection, he was able to predict
that the gold foil consisted of atoms with a positively charged nucleus. For
this, he made use of classical scattering theory which is based on calculating


the angle of deflection of a particle from its initial line of approach towards a
scattering centre that produces a repulsive inverse square law of force in terms
of the initial velocity of the particle and the distance of its line of approach
from the scattering centre. From this formula, it is easy to derive a relationship
between the deflection angle and the scattering cross section, ie, the number
of particles scattered per unit solid angle divided by the incident flux. This is
called classical Coulomb scattering. One of the striking facts in quantum theory
is that the wave operators and hence the scattering cross section are not defined
in Coulomb scattering and one has to modify the definition of the wave operator
by multiplying it by a time varying unitary operator so that its asymptotic limit
exists. These facts can be found in W.O.Amrein’s book.
[2] Quantum scattering theory: Scattering cross sections

Derivation of the wave operators, average time spent by the projectile in a
set and inside a cone in terms of the scattering matrix.

Let H = H0 + V,

U0(t) = exp(−itH0), U(t) = exp(−itH),

Ω+ = s-lim_{t→∞} U(−t)U0(t), Ω− = s-lim_{t→−∞} U(−t)U0(t)

S = Ω*+ Ω−, R = S − 1

Consider

P(f, C) = lim_{t→∞} ||F(C)U(t)Ω−f||^2, F(C) = χC(Q)

|f> is the in-state of the free particle and after encountering the scatterer, it
goes into the in-scattered state Ω−|f> which evolves with time as U(t). Thus,
||F(C)U(t)Ω−f||^2 equals the probability that at time t, the scattered particle's
position will fall within the set C and hence P(f, C) represents the probability
that after a very long time, the particle's position eventually falls within the set
C. It is easy to see that if C is a cone with apex at the origin and opening to
r = +∞ subtending a solid angle of Ω0, then tC = C ∀t > 0. Now, we have

U(t)Ω− = Ω−U0(t)

and hence,

P(f, C) = lim_{t→∞} ||F(C)Ω−U0(t)f||^2

We have

| ||F(C)U(t)Ω−f||^2 − ||F(C)U0(t)Sf||^2 |

= (||F(C)U(t)Ω−f|| + ||F(C)U0(t)Sf||).| ||F(C)U(t)Ω−f|| − ||F(C)U0(t)Sf|| |

≤ 2||f||.||F(C)(U(t)Ω−f − U0(t)Sf)||

≤ 2||f||.||U(t)Ω−f − U0(t)Sf||
Now,

||(U(t)Ω− − U0(t)S)f||

= ||Ω−f − U(−t)U0(t)Sf|| → ||Ω−f − Ω+Sf||

= ||Ω−f − Ω+Ω*+Ω−f||

We observe that Ω+ is an isometry and hence Ω+Ω*+ is the projection onto
R(Ω+). In particular, if the condition

R(Ω−) ⊂ R(Ω+)

is satisfied, then

Ω+Ω*+Ω− = Ω−
holds good and then we get

P(f, C) = lim_{t→∞} ||F(C)U0(t)Sf||^2

= lim_{t→∞} ||U0(−t)F(C)U0(t)Sf||^2

Now,

U0(−t)F(C)U0(t) = exp(iP^2 t)χC(Q).exp(−iP^2 t)

= χC(exp(iP^2 t).Q.exp(−iP^2 t))

Now,

exp(iP^2 t).Q.exp(−iP^2 t) = exp(it.ad(P^2))(Q) = Q + 2tP

while on the other hand,

exp(iaQ^2)P.exp(−iaQ^2) = exp(ia.ad(Q^2))(P) = P − 2aQ = −2a(Q − P/2a)

and hence with a = −1/(4t), we get

exp(−iQ^2/4t)P.exp(iQ^2/4t) = (1/2t).exp(iP^2 t).Q.exp(−iP^2 t)

and hence

exp(−iQ^2/4t)χC(2tP).exp(iQ^2/4t) = exp(iP^2 t).χC(Q).exp(−iP^2 t)

so that we get

||F(C)U0(t)Sf|| = ||χC(2t.P)Zt(Q)Sf||

where

Zt(Q) = exp(iQ^2/4t)

is a multiplicative unitary operator. Taking C as the cone described above, we
have that χC(2t.P) = χC(P), ∀t > 0 and hence

P(f, C) = lim_{t→∞} ||χC(P).Zt(Q)Sf||^2

= ||χC(P)Sf||^2

since

Zt(Q) → 1, t → ∞

By using the fact that Fourier transformation F preserves the norm, we get

P (f, C) = |FSf (k)|2 dn k
C
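In one dimension, with a trivial scatterer (S = I) and C = (0, ∞), the formula says that the probability of eventually finding the particle on the positive half-line equals the probability that its momentum is positive. A numerical sketch with a Gaussian wave packet of our own choosing:

```python
import numpy as np

# P(f, C) = ∫_C |(F S f)(k)|^2 dk for S = I, C = (0, inf), in 1-D.
n, L = 4096, 80.0
x = np.linspace(-L/2, L/2, n, endpoint=False)
dx = L / n
k0, s = 1.5, 1.0                                 # mean momentum, spatial width
f = (np.pi * s**2) ** -0.25 * np.exp(-x**2/(2*s**2) + 1j*k0*x)

k = 2*np.pi*np.fft.fftfreq(n, d=dx)              # momentum grid
fhat = np.fft.fft(f) * dx / np.sqrt(2*np.pi)     # continuum Fourier transform (|.| only)
dk = 2*np.pi / L
prob_pos_momentum = np.sum(np.abs(fhat[k > 0])**2) * dk
print(prob_pos_momentum)
```

For this packet the momentum distribution is Gaussian with mean k0 and standard deviation 1/(s√2), so the computed probability should be close to Φ(k0·s·√2) ≈ 0.98; the linear phase introduced by the grid offset does not affect |f̂|².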

[3] Problems in scattering theory

A. Coulomb scattering.

H0 = P^2, H = H0 + V(Q), Xt = ∫_0^t V(2Ps)ds, U(t) = exp(−itH), U0(t) = exp(−itH0)

d/dt (U(t)*U0(t)exp(−iXt)f)

= iU(t)*(V(Q) − V(2Pt))U0(t).exp(−iXt)f

||d/dt (U(t)*U0(t)exp(−iXt)f)|| =

||(V(Q) − V(2Pt))U0(t)exp(−iXt)f|| =

||(U0(−t)V(Q)U0(t) − V(2Pt))exp(−iXt)f||

U0(−t)QU0(t) = exp(it.ad(P^2))(Q) = Q + 2tP

U0(−t)V(Q).U0(t) = V(Q + 2Pt)

So

||d/dt (U(t)*U0(t)exp(−iXt)f)||

= ||(V(Q + 2Pt) − V(2Pt))exp(−iXt)f||

Alternately,

V(Q + 2Pt) = Zt(Q).V(2Pt)Zt(Q)*

where

Zt = Zt(Q) = exp(−iQ^2/4t)

since

exp(−i.ad(Q^2)/4t)(P) = P + Q/2t
Thus,

||d/dt (U(t)*U0(t)exp(−iXt)f)|| =

||∫_0^1 d/dx (exp(−ix.ad(Q^2)/4t)(V(2Pt))).exp(−iXt)f.dx||

≤ ∫_0^1 ||d/dx (exp(−ix.ad(Q^2)/4t)(V(2Pt))).exp(−iXt)f||.dx

= (4t)^{-1} ∫_0^1 ||[Q^2, V(2Pt)].Z(t/x)*.exp(−iXt)f||.dx
0

Now,

[Q^2, V(2Pt)] = Q.[Q, V(2Pt)] + [Q, V(2Pt)].Q

= 2it(Q.V'(2Pt) + V'(2Pt).Q) = 2it([Q, V'(2Pt)] + 2V'(2Pt).Q)

= −(2t)^2 V''(2Pt) + 4itV'(2Pt).Q = (2t)^2 V''(2Pt) + 4itQ.V'(2Pt)

Now, with gt = exp(−iXt)f, we have

||V''(2Pt)Z(t/x)*.gt||

= ||Z(t/x)V''(2Pt).Z(t/x)*.gt||

= ||V''(xQ)U0(t/x)gt||

and

||V'(2Pt).Q.Z(t/x)*.gt||

= ||V'(xQ).U0(t/x)Q.gt||

Now,

Q^2.gt = Q^2.exp(−iXt)f = ([Q^2, exp(−iXt)] + exp(−iXt)Q^2)f

and

[Q^2, exp(−iXt)] = Q.[Q, exp(−iXt)] + [Q, exp(−iXt)].Q

= Q.X't(P).exp(−iXt) + X't(P).exp(−iXt).Q

and

X't(P) = ∫_0^t (2s)V'(2sP)ds

For

V(Q) = K/|Q|

we get

Xt(P) = (K/|2P|)ln(t) = V(2P).ln(t)

[Q, exp(−iXt)] = X't(P)exp(−iXt)

or equivalently,

exp(iXt)Q.exp(−iXt) = Q + X't(P)

Then,

exp(iXt)F(Q).exp(−iXt) = F(Q + X't(P))

so that

[F(Q), exp(−iXt)] = exp(−iXt)(F(Q + X't(P)) − F(Q))

= exp(−iXt)(F(Q + 2V'(2P).ln(t)) − F(Q))

[4] Hausdorff-Young inequality as a tool for verifying the existence of
the wave operator on a class of functions.

Let p ≥ 2. Let F denote the Fourier transform on L^1(R^n). Define 1 ≤ q ≤ 2
by

1/p + 1/q = 1

For x ∈ [0, 1], define

f(x) = ||Ff||_{2/(1−x)} / ||f||_{2/(1+x)}

Then by the Hadamard three line theorem,

f(x) ≤ f(0)^{1−x}.f(1)^x, x ∈ [0, 1]

Now by Parseval's theorem,

f(0) = 1, f(1) = ||Ff||_∞ / ||f||_1 ≤ (2π)^{−n/2}

and hence we get

||Ff||_{2/(1−x)} ≤ K(x).||f||_{2/(1+x)}, x ∈ [0, 1]

where

K(x) = (2π)^{−nx/2}

Thus we obtain by taking 2/(1 − x) = p, 2/(1 + x) = q that

||Ff||_p ≤ K(1 − 2/p).||f||_q

which is the famous Hausdorff-Young inequality.
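A discrete analogue makes the interpolation concrete: for the unitary DFT on C^m, one has ||Ff||_2 = ||f||_2 and ||Ff||_∞ ≤ m^{-1/2}||f||_1, and Riesz-Thorin interpolation gives ||Ff||_p ≤ m^{1/p − 1/2}||f||_q with 1/p + 1/q = 1, p ≥ 2. The check below (our own construction, with arbitrary p and random vectors) tests this finite-dimensional version:

```python
import numpy as np

# Test the discrete Hausdorff-Young bound ||Ff||_p <= m^{1/p - 1/2} ||f||_q
# on random complex vectors, F being the unitary DFT matrix.
rng = np.random.default_rng(7)
m, p = 64, 4.0
q = p / (p - 1.0)
F = np.fft.fft(np.eye(m)) / np.sqrt(m)

ok = True
for _ in range(1000):
    f = rng.normal(size=m) + 1j * rng.normal(size=m)
    lhs = np.linalg.norm(F @ f, ord=p)
    rhs = m ** (1.0/p - 0.5) * np.linalg.norm(f, ord=q)
    ok = ok and (lhs <= rhs * (1 + 1e-12))
print(ok)
```

Since p ≥ 2 forces the exponent 1/p − 1/2 ≤ 0, the constant shrinks with m, mirroring the (2π)^{−nx/2} constant in the continuum bound.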

An application:

Let p, q be such that

1 = 1/p + 1/q, p ≥ 2

Let q', q'' be defined by

1/2 = 1/p + 1/q', 1 = 1/q' + 1/q''

Then, q' ≥ 2,

1/2 = 1/q'' − 1/p

ie,

1/q'' = 1/2 + 1/p

and we have by Holder's and Hausdorff-Young inequalities,

||φ(Q)ψ(P)f||_2 ≤ ||φ(Q)||_p ||ψ(P)f||_{q'}

≤ K(q').||φ(Q)||_p.||F ψ(P)f||_{q''}

= K(q').||φ||_p.||ψ(Q)Ff||_{q''}

≤ K(q').||φ||_p.||ψ||_p.||Ff||_2

= K(q').||φ||_p.||ψ||_p.||f||_2

and therefore, since K(q') ≤ 1,

||φ(Q).ψ(P)|| ≤ ||φ||_p.||ψ||_p

[5] Other problems in scattering theory

[1] Let p ≥ 2 and let φ, ψ ∈ L^p(R^n). Then define

φN(Q) = φ(Q)χN(Q), ψN(Q) = ψ(Q)χN(Q)

where χN is the indicator of the closed ball in R^n with centre at the origin and
radius N. It is readily seen that

||φ − φN||_p → 0, N → ∞,

||ψ − ψN||_p → 0, N → ∞

Further,

||φ(P)ψ(Q) − φN(P)ψN(Q)||

= ||(φ(P) − φN(P))ψ(Q) + φN(P)(ψ(Q) − ψN(Q))||

≤ ||(φ(P) − φN(P))ψ(Q)|| + ||φN(P)(ψ(Q) − ψN(Q))||

≤ K||φ − φN||_p.||ψ||_p + K||φN||_p.||ψ − ψN||_p → 0, N → ∞

In other words, φN(P)ψN(Q) converges in the operator norm to φ(P)ψ(Q).
Thus, if we can prove that φN(P)ψN(Q) is compact for each N > 0, then
we would establish that φ(P)ψ(Q) is a compact operator. Equivalently, by
the unitary equivalence of Q, P via the Fourier transform, we would get that
φ(Q)ψ(P) is compact. Now, for any f ∈ L^2(R^n), we have

(F φN(P)ψN(Q)f)(k) = φN(k) ∫ ψ̃N(k − k')f̃(k')d^n k'

and hence, in the Fourier domain, F φN(P)ψN(Q)F^{-1} has the kernel φN(k)ψ̃N(k − k').
Thus its Hilbert-Schmidt norm is given by

||φN(P)ψN(Q)||^2_HS = ∫ |φN(k)ψ̃N(k − k')|^2 d^n k d^n k'

= (∫ |φN(k)|^2 d^n k).(∫ |ψ̃N(k)|^2 d^n k)

= (∫ |φN(k)|^2 d^n k).(∫ |ψN(k)|^2 d^n k) < ∞

Hence for each finite positive N, φN(P)ψN(Q) is Hilbert-Schmidt and hence
compact. This proves that φ(P)ψ(Q) and φ(Q)ψ(P) are compact operators
whenever φ, ψ ∈ L^p(R^n) for some p ≥ 2.
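The factorized Hilbert-Schmidt norm has an exact finite-dimensional counterpart: for the unitary DFT F on C^m, the operator φ(P)ψ(Q) = F* diag(φ) F diag(ψ) satisfies ||φ(P)ψ(Q)||²_HS = (1/m) Σ|φ|² Σ|ψ|², since every entry of F has modulus m^{-1/2}. A quick check with random symbols of our own choosing:

```python
import numpy as np

# Verify ||F* diag(phi) F diag(psi)||_HS^2 = (1/m) sum|phi|^2 sum|psi|^2.
rng = np.random.default_rng(5)
m = 64
F = np.fft.fft(np.eye(m)) / np.sqrt(m)
phi = rng.normal(size=m) + 1j * rng.normal(size=m)
psi = rng.normal(size=m) + 1j * rng.normal(size=m)

op = F.conj().T @ np.diag(phi) @ F @ np.diag(psi)
hs2 = np.sum(np.abs(op)**2)                          # squared HS norm
predicted = np.sum(np.abs(phi)**2) * np.sum(np.abs(psi)**2) / m
print(hs2, predicted)
```

The left unitary factor F* does not change the Hilbert-Schmidt norm, and |F_{kj}|² = 1/m produces the product formula, just as the convolution kernel φN(k)ψ̃N(k − k') does in the continuum.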

[6] Coulomb scattering

V(Q) = C/|Q|, Xt(P) = ∫_1^t V(2Ps)ds = (C/2|P|)ln(t)

d/dt(U(t)*U0(t)exp(−iXt)f) = iU(t)*(V(Q) − V(2Pt))U0(t).exp(−iXt)f

||d/dt(U(t)*U0(t)exp(−iXt)f)|| =

||∫_0^1 d/dx(exp(−ixQ^2/4t)V(2Pt).exp(ixQ^2/4t)).exp(−iXt)f.dx||

≤ (4t)^{-1} ∫_0^1 ||[Q^2, V(2Pt)].exp(ixQ^2/4t).exp(−iXt)f||.dx

[Q^2, V(2Pt)] = 2it(Q.V'(2Pt) + V'(2Pt).Q)

exp(−ixQ^2/4t)(Q.V'(2Pt) + V'(2Pt).Q).exp(ixQ^2/4t)

= Q.exp(−ixQ^2/4t)V'(2Pt).exp(ixQ^2/4t) + exp(−ixQ^2/4t)V'(2Pt).exp(ixQ^2/4t).Q

= Q.U0(−t/x)V'(Q)U0(t/x) + U0(−t/x)V'(Q)U0(t/x).Q

Q.U0(−t)V'(Q)U0(t) = ([Q, U0(−t)] + U0(−t)Q).V'(Q)U0(t)

= U0(−t)(−2tP + Q).V'(Q)U0(t)

Q.V'(2Pt) + V'(2Pt).Q = [Q, V'(2Pt)] + 2V'(2Pt).Q

= 2itV''(2Pt) + 2V'(2Pt).Q

exp(−ixQ^2/4t)(Q.V'(2Pt) + V'(2Pt).Q).exp(ixQ^2/4t)

= 2itU0(−t/x)V''(Q)U0(t/x) + 2U0(−t/x)V'(Q)U0(t/x).Q

||[Q^2, V(2Pt)].exp(ixQ^2/4t).exp(−iXt)f||

≤ 2t||V''(Q)U0(t/x)exp(−iXt)f|| + 2||V'(Q)U0(t/x).Q exp(−iXt)f||

We write

<Q> = (1 + Q^2)^{1/2}

Then for large |Q|,

|V'(Q)| ≤ C1.<Q>^{−2}, |V''(Q)| ≤ C2.<Q>^{−3}

so that

||V''(Q)U0(t/x)exp(−iXt)f|| ≤ C2||<Q>^{−3}U0(t/x)exp(−iXt)f||

||V'(Q)U0(t/x).Q exp(−iXt)f|| ≤ C1 Σ_j ||<Q>^{−2}U0(t/x)Qj exp(−iXt)f||

We now consider

P.QU0(s)exp(−iXt) = P.([Q, U0(s)] + U0(s)Q)exp(−iXt)

= P.(2sPU0(s) + U0(s)Q)exp(−iXt)

= 2sU0(s)H0.exp(−iXt) + U0(s)P.Q.exp(−iXt)

Also,

P.Q.exp(−iXt) = P.([Q, exp(−iXt)] + exp(−iXt)Q)

= P.(−CP.ln(t)/|P|^3)exp(−iXt) + exp(−iXt)P.Q

= (−C.ln(t)H0^{−1/2})exp(−iXt) + exp(−iXt)P.Q

Thus,

P.QU0(s)exp(−iXt)g = U0(s)exp(−iXt)2sH0 g − C.ln(t)U0(s)exp(−iXt)H0^{−1/2}g

+ U0(s)exp(−iXt)P.Qg

Writing

f = 2sH0 g

gives us

U0(s)exp(−iXt)f

= P.QU0(s)exp(−iXt)(2sH0)^{−1}f + C.ln(t)(2s)^{−1}U0(s)exp(−iXt)H0^{−3/2}f

Also

P.Q = Q.P − [Q, P] = Q.P − in

so

U0(s)exp(−iXt)f

= Q.U0(s)exp(−iXt)(2sH0)^{−1}Pf − in(2s)^{−1}U0(s)exp(−iXt)H0^{−1}f

+ C.ln(t)(2s)^{−1}U0(s)exp(−iXt)H0^{−3/2}f

Also note that

||<Q>^{−2}U0(s)Qj exp(−iXt)f||

= ||<Q>^{−2}([U0(s), Qj] + Qj U0(s))exp(−iXt)f||

= ||<Q>^{−2}(−2sPj + Qj)U0(s)exp(−iXt)f||

Suppose we make the inductive hypothesis that for large t,

||<Q>^{−k}U0(t/x)exp(−iXt)f|| ≤ fk(t)ck(f)

Then we easily deduce from the above inequalities that we can take any fk+1, ck+1
satisfying

fk+1(t)ck+1(f) ≥ t^{−1}.fk(t)ck((2H0)^{−1}Pj f) + fk(t)ck(n(2H0)^{−1}f) + C.(ln(t)/t).fk(t)ck(2^{−1}H0^{−3/2}f)
Remark: We have made use of

<Q>^{−k−1}U0(s)exp(−iXt)f =

<Q>^{−k−1}Q.U0(s)exp(−iXt)(2sH0)^{−1}Pf − in(2s)^{−1}<Q>^{−k−1}U0(s)exp(−iXt)H0^{−1}f

+ C.ln(t)(2s)^{−1}<Q>^{−k−1}U0(s)exp(−iXt)H0^{−3/2}f

with

s = t/x, x ∈ (0, 1)

Note that we have also used

||<Q>^{−1}|| = 1

and therefore,

||<Q>^{−k−1}φ|| ≤ ||<Q>^{−k}φ||

[7] Some remarks in operator theory related to quantum scatter-


ing. Explicit formulas for the scattering cross section for radially symmetric
potentials in terms of spherical harmonic expansions.
[1] V(r) is a radial potential. The incident wave function has, in the asymptotic limit r → ∞, the expansion

C.exp(−ikz) = C.exp(−ikr.cos(θ)) = C Σ_{l≥0} r^{−1}(al(k)exp(ikr) + bl(k).exp(−ikr))Pl(cos(θ))

The total wave function after scattering is in the asymptotic limit r → ∞ given by

C.exp(−ikr.cos(θ)) + f(θ).exp(ikr)/r
= r^{−1}.exp(ikr)(f(θ) + C Σl al(k).Pl(cos(θ))) + r^{−1}.exp(−ikr) Σl C bl(k).Pl(cos(θ))

On the other hand, by solving the Schrodinger equation

(−1/2mr)(rRl(r))″ + l(l + 1)Rl(r)/2mr^2 + V(r)Rl(r) = ERl(r)

we get for r → ∞ (assuming V(r) → 0, r → ∞) that in this limit,

(−1/2mr)(rRl(r))″ + l(l + 1)Rl(r)/2mr^2 ≈ ERl(r)

so that the full solution

Σl cl(k)Rl(r).Pl(cos(θ))

has the asymptotic form

Σl (cl(k)r^{−1}exp(ikr) + dl(k)r^{−1}exp(−ikr)).Pl(cos(θ))

and we get on matching these two expressions for the asymptotic wave function,

f(θ) + C Σl al(k).Pl(cos(θ)) = Σl cl(k)Pl(cos(θ))

C. Σl bl(k).Pl(cos(θ)) = Σl dl(k).Pl(cos(θ))

This second equation implies that

C.bl(k) = dl(k), l ≥ 0

and the first gives

f(θ) = Σl (cl(k) − C.al(k)).Pl(cos(θ))

The total scattering cross section is then given by

σtot(k) = ∫_0^π |f(θ)|^2 2π.sin(θ)dθ = Σl (4π/(2l + 1))|cl(k) − C.al(k)|^2

since ∫_0^π Pl(cos(θ))^2 sin(θ)dθ = 2/(2l + 1).
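The partial-wave reduction of the cross-section integral can be spot-checked numerically. In the Python sketch below the amplitudes g_l are arbitrary stand-ins for cl(k) − C.al(k) (our assumption, purely for illustration); Legendre orthogonality ∫ Pl(x)^2 dx = 2/(2l+1) collapses the angular integral into a weighted sum over l:

```python
import numpy as np
from numpy.polynomial.legendre import legval, leggauss

# Hypothetical partial-wave amplitudes g_l standing in for c_l(k) - C*a_l(k)
rng = np.random.default_rng(0)
g = rng.standard_normal(5) + 1j * rng.standard_normal(5)

# f(theta) = sum_l g_l P_l(cos theta); integrate |f|^2 2 pi sin(theta) dtheta
# over [0, pi] via Gauss-Legendre quadrature in x = cos(theta).
x, w = leggauss(16)                       # exact for polynomials of degree < 32
fvals = legval(x, g)                      # sum_l g_l P_l(x)
sigma_quad = 2 * np.pi * np.sum(w * np.abs(fvals) ** 2)

# Orthogonality int_{-1}^{1} P_l(x)^2 dx = 2/(2l+1) gives the closed form
l = np.arange(len(g))
sigma_series = np.sum(4 * np.pi * np.abs(g) ** 2 / (2 * l + 1))

assert np.isclose(sigma_quad, sigma_series)
```

The quadrature is exact here because |f|^2 is a polynomial of degree 8 in cos(θ).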

Some properties of compact operators.


[2] Let H be a separable Hilbert space. If Tn , n ≥ 1 is a sequence of compact
operators such that
‖Tn − T‖ → 0
then T is compact.
Proof: Let xn be a bounded sequence. Then T1 xn has a convergent sub-
sequence say T1 (x(1, n1 (k))) → y1 . Here, x(1, n1 (k)) is a subsequence of xn .
This follows from the compactness of T1 . Likewise, x(1, n1 (k)) has a subse-
quence x(2, n2 (k)) such that T2 (x(2, n2 (k)) is convergent: T2 (x(2, n2 (k)) → y2 .
Proceeding this way, we get that for each r = 1, 2, ..., x(r−1, nr−1 (k)) has a sub-
sequence x(r, nr (k)) such that Tr (x(r, nr (k)) is convergent, say Tr (x(r, nr (k)) →
yr . Now consider the ”diagonal subsequence” z(k) = x(k, nk (k)), k = 1, 2, ... of
xn . This sequence is a subsequence of each of the sequences {x(r, nr (k))}, r =
1, 2, ... and hence
Tm zk → ym, k → ∞
We have
‖ym − yn‖ = limk ‖(Tm − Tn)zk‖ ≤ R.‖Tm − Tn‖ → 0, m, n → ∞
where
R = supn ‖xn‖
Thus, {yn } is a Cauchy sequence and hence convergent, say

yn → y

Finally, we have

‖T zk − y‖ = ‖(T − Tm)zk + Tm zk − ym + ym − y‖
≤ ‖T − Tm‖.‖zk‖ + ‖Tm zk − ym‖ + ‖ym − y‖
We then get
limsupk ‖T zk − y‖ ≤ R.‖T − Tm‖ + ‖ym − y‖
Letting m → ∞ then gives us

limk ‖T zk − y‖ = 0

ie, the subsequence {zk } of {xn } has the property that T zk → y proving that
T is compact.

[3] Hilbert Schmidt operators are compact. Let H be a separable Hilbert


space with onb |en >. An operator T in H is said to be Hilbert-Schmidt if

Tr(T∗T) = Σn ‖Ten‖^2 < ∞

Note that the quantity Σn ‖Ten‖^2 does not depend upon the onb {en} chosen
(prove this). In this case, we define the Hilbert-Schmidt norm of T by

 T HS = T r(T ∗ T )

It is easy to verify that this is a norm on the Banach space of Hilbert-Schmidt


operators. Further,

‖T‖ = sup_{‖x‖≤1} ‖Tx‖ ≤ ‖T‖HS

This follows by noting that any x ∈ H can be expanded as



x = Σn |en >< en, x >

and so
‖Tx‖^2 ≤ (Σn ‖Ten‖.|< en, x >|)^2 ≤ (Σn ‖Ten‖^2).(Σn |< en, x >|^2) = ‖T‖^2_HS ‖x‖^2

Now if T is HS, we define the operators

TN = Σ_{n,m=1}^{N} |en >< en|T|em >< em|

Then TN is a finite rank operator and further

‖T − TN‖^2 ≤ ‖T − TN‖^2_HS = Σ_{n>N or m>N} |< en|T|em >|^2 → 0, N → ∞

since
Σ_{n,m≥1} |< en|T|em >|^2 = Σn < en|T∗T|en > = Σn ‖Ten‖^2 = ‖T‖^2_HS < ∞

Thus, any Hilbert-Schmidt operator is the limit in operator norm of a sequence of finite rank operators and hence it is also compact. Note that finite rank operators are compact (why?).
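The finite-rank truncation argument above can also be watched numerically. The Python sketch below (matrix size and entry decay are our own choices, standing in for a genuine Hilbert-Schmidt operator) checks ‖T‖ ≤ ‖T‖HS and the operator-norm convergence of the truncations TN:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# A Hilbert-Schmidt-like operator on C^n: matrix elements decay fast enough
# that the HS norm is dominated by a small leading block.
i, j = np.meshgrid(np.arange(1, n + 1), np.arange(1, n + 1), indexing="ij")
T = rng.standard_normal((n, n)) / (i ** 2 + j ** 2)

hs = lambda A: np.linalg.norm(A, "fro")      # Hilbert-Schmidt (Frobenius) norm
op = lambda A: np.linalg.norm(A, 2)          # operator norm

assert op(T) <= hs(T) + 1e-12                # ||T|| <= ||T||_HS

# T_N keeps the top-left N x N block of matrix elements; the HS norm of the
# tail controls the operator-norm error, so T_N -> T in operator norm.
errs = []
for N in (5, 20, 80):
    TN = np.zeros_like(T)
    TN[:N, :N] = T[:N, :N]
    assert op(T - TN) <= hs(T - TN) + 1e-12
    errs.append(op(T - TN))
assert errs[0] > errs[1] > errs[2]           # monotone improvement
```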

[8] Miscellaneous problems in group representation and scattering theory
[1] Let H be a self-adjoint operator and let V be another self-adjoint oper-
ator. Let W = W (H + V, H) be the associated wave operator, ie

W (H + V, H) = slimt→∞ exp(it(H + V )).exp(−itH)

on an appropriate domain. Show that


 ∞
W − I = i ∫_0^∞ exp(it(H + V))V.exp(−itH)dt

Show that W ∗ W = I on D(W ), ie, W is an isometry. Show that W W ∗ is the


orthogonal projection onto R(W ). Show that

W ∗ exp(it(H + V )) = exp(itH)W ∗

on D(W ∗ ) and hence deduce that


 ∞
I − W∗ = i ∫_0^∞ exp(itH).W∗ V.exp(−itH)dt

Hence show that W satisfies the linear equation

W = I + Γ(V W ) − − − (1)

where
Γ(X) = i ∫_0^∞ exp(itH)V X.exp(−itH)dt

Now solve (1) by iteration after introducing a perturbation parameter δ to write


it as
W = I + δ.Γ(V W )
Under what conditions on V does this perturbation series converge ? For this,
first you may assume that V has finite rank, ie,


V = Σ_{k=1}^{p} c(k)|ek >< ek|, c(k) ∈ R, < ek|em > = δ(k, m)

Note that

Γ(XV) = i Σ_{k=1}^{p} c(k) ∫_0^∞ exp(itH)|ek >< ek|X.exp(−itH)dt

and hence

< u|Γ(XV)|v > = i Σ_{k=1}^{p} c(k) ∫_0^∞ < u|exp(itH)|ek >< ek|X.exp(−itH)|v > dt

Show that W transforms Hac (H) into Hac (H + V ). To do this part, we note
that
W.exp(itH) = exp(it(H + V ))W, t ∈ R
and hence for any Borel set B ⊂ R, we have that

W EH (B) = EH+V (B)W

from which we get that

‖EH(B)|u >‖^2 = ‖EH+V(B)W|u >‖^2

where EH (.) is the spectral measure of H while EH+V (.) is the spectral measure
of H + V. Therefore, in particular, the measure B → ‖EH(B)|u >‖^2 is absolutely continuous iff the measure B → ‖EH+V(B)W|u >‖^2 is absolutely continuous which is equivalent to saying that |u > ∈ Hac(H) iff W|u > ∈ Hac(H + V).
Note that
exp(itH) = ∫ exp(itx)EH(dx)
implies that if f is in L1 and f̂ is the Fourier transform of f, then
f(H) = ∫ f̂(x)EH(dx)

Also, the resolvent of H at z ∈ C, Im(z) > 0 is given by


 ∞
R(z, H) = (H − z)^{−1} = i ∫_0^∞ exp(−itH)exp(itz)dt = ∫ (x − z)^{−1}EH(dx)

Also,
∫_0^∞ θ(t)exp(−itx)dt = 1/ix + π.δ(x)
This formula implies that
πδ(H − y) − iR(y, H) = ∫_0^∞ θ(t)exp(−it(H − y))dt

We have
∫_{a+0}^{b} (R(H, x + iε) − R(H, x − iε))dx
= ∫_R dE(y) ∫_{a+0}^{b} ((y − x − iε)^{−1} − (y − x + iε)^{−1})dx
= ∫_R dE(y) ∫_{a+0}^{b} 2iε.((y − x)^2 + ε^2)^{−1} dx
which converges as ε → 0+ to
2πi ∫_R dE(y) ∫_{a+0}^{b} δ(y − x)dx = 2πi ∫_{a+0}^{b} dE(x) = 2πiE((a, b])
Thus we have a formula that expresses directly the spectral measure in terms
of the resolvent for a self-adjoint operator.
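This resolvent formula can be checked in finite dimensions. The Python sketch below (a 4×4 Hermitian matrix with a spectrum of our own choosing stands in for the self-adjoint operator; the small ε and grid are illustrative) recovers the spectral projection E((a, b]) from the resolvent:

```python
import numpy as np

# 2*pi*i*E((a,b]) = lim_{eps->0+} int_a^b (R(H, x+i eps) - R(H, x-i eps)) dx,
# with R(H, z) = (H - z)^{-1}, checked on a toy Hermitian matrix.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
w = np.array([-2.0, -1.0, 1.0, 2.0])          # chosen spectrum
H = Q @ np.diag(w) @ Q.T

a, b, eps = 0.0, 3.0, 1e-3                    # (a, b] contains {1, 2}
xs = np.linspace(a, b, 15001)
h = xs[1] - xs[0]
I4 = np.eye(4)

acc = np.zeros((4, 4), dtype=complex)
for x in xs:
    acc += (np.linalg.inv(H - (x + 1j * eps) * I4)
            - np.linalg.inv(H - (x - 1j * eps) * I4))
E_num = acc * h / (2j * np.pi)                # Riemann sum of the integral

sel = (w > a) & (w <= b)
E_exact = Q[:, sel] @ Q[:, sel].T             # exact spectral projection
assert np.max(np.abs(E_num - E_exact)) < 1e-2
```

The residual error comes from the finite ε (Lorentzian tails leaking past the endpoints) and the Riemann discretization, both of which shrink as ε → 0 with the grid refined accordingly.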

[2] We wish to show that if the wave operator

Ω+ = slimt→∞ exp(itH).exp(−itH0 )

exists on D, then for any f ∈ D, we have

Ω+|f > = limε→0+ ε ∫_0^∞ exp(−εt)exp(itH).exp(−itH0)|f > dt

To prove this we observe that for any T > 0,

‖Ω+ f − ε ∫_0^∞ exp(−εt)exp(itH).exp(−itH0)|f > dt‖
= ‖ε ∫_0^∞ exp(−εt)(Ω+ − exp(itH).exp(−itH0))f dt‖
≤ ε ∫_0^T ‖(Ω+ − exp(itH).exp(−itH0))f‖.exp(−εt)dt + ∫_T^∞ ‖(Ω+ − exp(itH).exp(−itH0))f‖.ε.exp(−εt)dt
Now by hypothesis, for any δ > 0, there exists a finite T = T(δ) such that

‖(Ω+ − exp(itH).exp(−itH0))f‖ < δ ∀ t > T

Hence we get from the above inequality,

‖Ω+ f − ε ∫_0^∞ exp(−εt)exp(itH).exp(−itH0)|f > dt‖ ≤ 2εT.‖f‖ + δ‖f‖.exp(−εT)

Letting first ε → 0 and then δ → 0 yields the desired result.

[3] Every symmetric operator is closable since for such an operator A, we have that A ⊂ A∗ and A∗ is closed. We wish to explore two issues. One, what are all the closed symmetric extensions of A and two, when does A have a self-adjoint extension. To this end, define the deficiency indices ν± by

ν− = dim N(A∗ + i), ν+ = dim N(A∗ − i)

We note that
ν− = dimR(A − i)⊥ , ν+ = dimR(A + i)⊥
Note that (A ± i)−1 exist and are bounded. This follows for example from

‖(A + i)f‖^2 = ‖Af‖^2 + ‖f‖^2, f ∈ D(A)

so that N (A + i) = {0} and

‖(A + i)^{−1}f‖^2 = ‖f‖^2 − ‖A(A + i)^{−1}f‖^2 ≤ ‖f‖^2, f ∈ R(A + i)

A + i thus maps D(A) = D(A + i) = R((A + i)^{−1}) onto R(A + i). The bounded operator U = (A + i)(A − i)^{−1} maps R(A − i) onto R(A + i). The former
subspace has a co-dimension of ν− while the latter subspace has a co-dimension
of ν+ . It is easy to see that U is an isometry. It is in fact easy to see that U
is a unitary operator between the space R(A − i) and R(A + i). Now let V be
a unitary operator between a subspace M1 of R(A − i)⊥ and a subspace M2
of R(A + i)⊥ . Then W = U ⊕ V is a unitary operator between the subspace
R(A − i) ⊕ M1 and the subspace R(A + i) ⊕ M2 . We can recover A from its
Cayley transform U via the equation

A = i(U − 1)−1 (U + 1)

It should be noted that the domain of U is R(A − i) and for f = (A − i)g ∈


R(A − i) , we have
U (A − i)g = U f = (A + i)g

so that
(U − 1)Ag = i(U + 1)g, g ∈ D(A)
We have
(1 + U )(A − i)g = (A + i)g + (A − i)g = 2Ag, g ∈ D(A)

and
(1 − U )(A − i)g = (A − i)g − (A + i)g = −2ig, g ∈ D(A)
These give
Ag = −i(1 + U )(1 − U )−1 g, g ∈ D(A)
Thus,
D(A) = D((1 − U)^{−1}) = R(1 − U)
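In finite dimensions, where every self-adjoint operator is bounded, the Cayley transform and its inversion can be verified directly. A Python sketch (matrix size and the random Hermitian test matrix are our own choices):

```python
import numpy as np

# For a bounded self-adjoint A, the Cayley transform U = (A+i)(A-i)^{-1}
# is unitary, and A is recovered as A = -i(1+U)(1-U)^{-1}.
rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (B + B.conj().T) / 2                     # Hermitian test matrix
I5 = np.eye(5)

U = (A + 1j * I5) @ np.linalg.inv(A - 1j * I5)

# U is unitary ...
assert np.allclose(U.conj().T @ U, I5)

# ... and 1 is not an eigenvalue of U (the eigenvalues are (l+i)/(l-i)
# with l real), so 1 - U is invertible and A can be recovered.
A_back = -1j * (I5 + U) @ np.linalg.inv(I5 - U)
assert np.allclose(A_back, A)
```

For an unbounded symmetric A the same algebra works on the subspaces R(A ∓ i), which is exactly the situation the text describes.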
Let A be closed symmetric. We then wish to determine the structure of D(A∗) and A∗.
We claim that
D(A∗ ) = D(A) + R(A − i)⊥ + R(A + i)⊥
with f ∈ D(A) implying A∗ f = Af , f ∈ R(A − i)⊥ implying A∗ f = −if and
f ∈ R(A + i)⊥ implying A∗ f = if . To see that A∗ is well defined, suppose
f ∈ D(A), g ∈ R(A − i)⊥ = N (A∗ + i), h ∈ R(A + i)⊥ = N (A∗ − i) are such
that
f +g+h=0
Then, g + h ∈ D(A) and we have
A∗(f + g + h) = Af − ig + ih = −A(g + h) − ig + ih = −(A + i)g − (A − i)h = −(A∗ + i)g − (A∗ − i)h = 0
and hence A∗ is consistently defined. We have further, for f ∈ D(A), g ∈
N (A∗ + i), h ∈ N (A∗ − i) and u ∈ D(A) that
< f + g + h, Au >=< A∗ (f + g + h), u >=< Af − ig + ih, u >
Suppose now that f ∈ D(A∗ ). We wish to show that f can be expressed as the
sum of three vectors in D(A), N (A∗ + i) and N (A∗ − i) respectively. We first
project (A∗ + i)f onto R(A + i) and hence we can write

(A∗ + i)f = (A + i)f1 + f2 , f1 ∈ D(A), f2 ∈ R(A + i)⊥ = N (A∗ − i)

Then,
(A∗ + i)(f − f1 ) = f2
since D(A) ⊂ D(A∗ ) by the assumed symmetry of A. Now,

(A∗ − i)f2 = 0

which implies that


(A∗ + i)f2 = 2if2

and hence
(A∗ + i)(f − f1 ) + (A∗ + i)if2 /2 = 0
or equivalently,
(A∗ + i)(f − f1 + if2 /2) = 0
and therefore,
f − f1 + if2 /2 ∈ N (A∗ + i)
Chapter 3

Linear Algebra and Operator Theory

Study topics in applied linear algebra.

Some remarkable inequalities in linear algebra and matrix theory have been
obtained by the famous Indian mathematician Rajendra Bhatia. These in-
equalities are based on generalizations of the minimax variational principle for
computing the eigenvalues of a matrix. The main idea behind many of these
inequalities is that one applies the minimax principle to symmetric and anti-
symmetric tensor products of matrices.
[1] We shall survey briefly some important concepts in linear algebra espe-
cially the important matrix decomposition theorems.

Basics: Finite and infinite dimensional vector spaces, notion of a Hamel


basis. The topology on finite and infinite dimensional vector spaces induced by
a norm. Completeness of infinite dimensional vector spaces in terms of Cauchy
sequences, ie, the notion of a Banach space. Norm induced by an inner product on a vector space, completeness under the norm induced by the inner product, ie
Hilbert spaces as complete inner product spaces. Dual of a vector space in finite
and infinite dimensions, isomorphism of vector spaces, isomorphism of normed
vector spaces and isomorphism of Hilbert spaces.
Commonly occurring examples of normed linear spaces:
[1] F^n where F is a field such as R, C or a finite field, with the Lp-norm with p ≥ 1:

‖x‖ = (Σi |xi|^p)^{1/p}

proof of the triangle inequality using Holder’s inequality.
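Both inequalities are easy to spot-check numerically. The Python sketch below (test vectors and exponents are arbitrary illustrations) verifies Hölder's inequality and the resulting triangle (Minkowski) inequality for the l^p norm; p = 2 is the Cauchy-Schwarz case:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(50)
y = rng.standard_normal(50)

def lp(v, p):
    """l^p norm of a vector."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

for p in (1.5, 2.0, 3.0, 4.0):
    q = p / (p - 1.0)                                              # conjugate exponent, 1/p + 1/q = 1
    assert np.sum(np.abs(x * y)) <= lp(x, p) * lp(y, q) + 1e-12    # Hölder
    assert lp(x + y, p) <= lp(x, p) + lp(y, p) + 1e-12             # Minkowski (triangle)
```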


[2] If Y is a normed vector space and X is a set, then the set of all bounded maps f : X → Y with

‖f‖ = supx∈X ‖f(x)‖

is a normed vector space. Show that it satisfies the triangle inequality.


[3] If (X, μ) is a σ-finite measure space and Y is a Banach space, then the set of all measurable maps f : X → Y with ∫_X ‖f(x)‖^p dμ(x) < ∞ is a normed vector space and in fact a Banach space with norm

‖f‖p = (∫ ‖f(x)‖^p dμ(x))^{1/p}

Prove that this is indeed a norm and establish completeness under this norm.
[4] If X is a normed linear space and Y is a Banach space, then the set of
all bounded continuous maps f : X → Y with the norm

‖f‖ = supx∈X ‖f(x)‖

is a Banach space. By continuity of f , we mean that if xn is a sequence in X


and x ∈ X is such that ‖xn − x‖ → 0, then ‖f(xn) − f(x)‖ → 0.
[5] Dual of a vector space in finite and infinite dimensions. Dual of the Lp
and lp normed linear spaces are isomorphic to the Lq and lq spaces respectively
where 1/p + 1/q = 1. The proof is based on Holder’s inequality and so before
discussing duals, we must prove the Holder inequality and state that for p = 2,
it is the same as the Cauchy-Schwarz inequality for Hilbert spaces. We must
state that normed linear spaces and Banach spaces give length and distance
measure but do not contain geometry like the angle between two vectors, notion
of orthogonality of vectors etc., while the Hilbert spaces contain geometry. The
dual of a Hilbert space is isomorphic to itself for any bounded linear functional
on a Hilbert space can be represented uniquely by an inner product with a
vector. This is the content of the Riesz representation theorem.
[6] Orthonormal bases in separable and non-separable Hilbert spaces. The
Bessel inequality and the Parseval equality. Interpretation of the Parseval equal-
ity in terms of power in the Fourier series components by considering the spe-
cial example of the Fourier series as an orthonormal basis for the Hilbert space
L2 [0, T ].
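The power interpretation of the Parseval equality has a discrete analogue that is easy to demonstrate: for the DFT, an orthogonal expansion of C^N, signal energy equals the summed power of the Fourier components. A Python sketch (the test signal is arbitrary):

```python
import numpy as np

# Parseval for the DFT: sum_n |x[n]|^2 = (1/N) sum_k |X[k]|^2,
# ie, time-domain energy equals total power in the Fourier components.
rng = np.random.default_rng(5)
x = rng.standard_normal(256) + 1j * rng.standard_normal(256)
X = np.fft.fft(x)

energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)
assert np.isclose(energy_time, energy_freq)
```

The 1/N factor reflects the normalization convention of numpy's unnormalized forward FFT; with an orthonormal basis it would disappear.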
[7] Weak and strong convergence of vectors in finite and infinite dimensional
normed vector spaces. Integration and differentiation of functions of a real
variable with values in a Banach space.
[8] Linear operators in a normed vector space. The finite superposition of
operators, multiplication of operators and hence the algebra of operators in a
vector space. Linear operators between two Banach spaces. The operator norm
induced by norms in the two Banach spaces. Properties of the operator norm
and proof that the vector space of bounded operators (ie, operators with finite
operator norm) between two Banach spaces forms a Banach space and if both
the input and output Banach spaces are the same we get the result that the
space of bounded linear operators in a Banach space is a Banach algebra.

[9] Tensor products of Hilbert spaces and properties of the tensor product.
The Boson and Fermion Fock spaces with application to describing the nature
of the state of bosons and Fermions. Gelfand-Naimark-Segal (GNS) principle
for construction of the tensor product between Hilbert spaces based on Schur’s

positivity theorem for matrices and the Kolmogorov consistency theorem for
stochastic processes. (This proof is due to Professor K.R.Parthasarathy).

[10] The principle of uniform boundedness. If X, Y are Banach spaces and


Tn : X → Y, n = 1, 2, ... is a sequence of bounded linear operators such that for each x ∈ X, {‖Tn x‖} is a bounded sequence of positive real numbers, then {‖Tn‖} is also a bounded sequence of positive real numbers. Proof of this fact is based on Baire's category theorem.

[11] The Hahn-Banach theorem on extension of linear functionals from a


subspace of a Banach space to the entire space without increasing its norm.

[12] The closed unit ball in a finite dimensional normed vector space is compact while that in an infinite dimensional normed vector space is non-compact. Proof of this fact.

[13] Proof of the fact that all norms on a finite dimensional vector space
are equivalent, ie generate the same topology which is not the case for infinite
dimensional vector spaces.

[14] Vector space, Banach space and Hilbert space isomorphisms with exam-
ples.
[15] Properties of linear operators in a vector space: Rank-nullity theorem,
range, nullspace of infinite dimensional operators, examples of unbounded oper-
ators like position, momentum, creation, annihilation, conservation, angular mo-
mentum energy operators in quantum mechanics, the domain of an unbounded
operator.
[16] Adjoint of an operator, uniqueness of the adjoint when the operator is
densely defined, closed operators, closable operators and closure of an operator
in the unbounded case.
[17] Proof of the fact that if an operator defined in a Banach space or even
a normed linear space X has a dense domain and is bounded, then it can be
uniquely extended to a bounded operator on the whole of X .
[18] The open mapping/closed graph theorem: If a closed operator (ie, an
operator with a closed graph) in a Banach space X has domain the whole of X,
then it is bounded. If the operator is closed and has an inverse, then its inverse
is also closed and hence by the above theorem, the operator has a bounded
inverse, ie, the operator maps open sets onto open sets.
[19] The spectral theorem for normal operators in a finite dimensional Hilbert
space with proof.
[20] Statement of the spectral theorem for compact normal operators in an
infinite dimensional Hilbert space.
[21] Statement of the spectral theorem for bounded and unbounded self-
adjoint operators in an infinite dimensional Hilbert space.
[22] A short survey of spectral measures on a measurable space and spectral
integration with applications to the description of quantum mechanical observ-
ables like position, momentum, energy and angular momentum. Description of

the domain of certain unbounded operators occurring in quantum mechanics


like position and momentum.
[23] The polar and singular value decompositions of an operator in a Hilbert
space. Application of the polar decomposition to the construction of the spectral
measure associated with a self-adjoint operator in a Hilbert space.
[24] Construction of the square root of a bounded positive operator in a
Hilbert space by an iterative algorithm.
[25] The spectrum of an operator in a Banach space.
[26] [a] Applications of the singular value decomposition in solving least
squares problems in statistics.
[b] Gram-Schmidt orthonormalization and the QR decomposition of matri-
ces. Application to solving linear equations.
[27] The principal component analysis of data based on singular value de-
composition.
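The PCA recipe of item [27] can be sketched in a few lines of Python; the synthetic data set, variable names and thresholds below are our own illustration, not the text's:

```python
import numpy as np

# PCA via SVD: center the data, take the SVD, and keep the leading right
# singular vectors as the principal directions.
rng = np.random.default_rng(6)
# 300 samples in R^5 concentrated near a 2-dimensional subspace plus noise
latent = rng.standard_normal((300, 2))
mix = rng.standard_normal((2, 5))
X = latent @ mix + 0.01 * rng.standard_normal((300, 5))

Xc = X - X.mean(axis=0)                      # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# squared singular values are proportional to variance along each direction
var_explained = s ** 2 / np.sum(s ** 2)
assert var_explained[:2].sum() > 0.99        # two components capture the data

scores = Xc @ Vt[:2].T                       # data in principal coordinates
assert scores.shape == (300, 2)
```

Working directly from the SVD of the centered data matrix avoids forming the covariance matrix explicitly, which is better conditioned numerically.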
[28] Eigensubspace based algorithms for high resolution direction of arrival
(DOA) estimation. Construction of the array signal model, estimation of the
array signal covariance matrix and its shifted version, using the signal sub-
space eigenvectors to obtain the MUSIC pseudo-spectrum whose peaks give the
DOA’s. The ESPRIT algorithm uses the generalized eigenvalues of the denoised
array signal covariance matrix and its shifted version to obtain the DOA’s. The
source correlation matrix can also be estimated using the ESPRIT algorithm
by exploiting properties of the generalized eigenvectors. Denoising of the array
signal covariance matrix is achieved by determining the noise eigenvalue as the
minimum eigenvalue of the noisy array signal covariance matrix. SVD versions
of the MUSIC and ESPRIT algorithms can also be obtained by directly oper-
ating on the data matrix consisting of several time samples of the array signal
vector.
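The MUSIC portion of item [28] can be sketched as follows in Python. For simplicity the exact array covariance (signal part plus σ²I) is used in place of an estimate from time samples, and the array geometry, source angles and powers are all our own assumptions:

```python
import numpy as np

# MUSIC for a uniform linear array with half-wavelength spacing.
M, d = 8, 2                                  # sensors, sources
true_doas = np.deg2rad(np.array([-20.0, 35.0]))

def steering(theta):
    """a(theta)_m = exp(j*pi*m*sin(theta)), m = 0..M-1."""
    return np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(theta)))

A = steering(true_doas)                      # M x d steering matrix
P = np.diag([1.0, 0.5])                      # source powers
R = A @ P @ A.conj().T + 0.01 * np.eye(M)    # array covariance (exact)

w, V = np.linalg.eigh(R)                     # eigenvalues in ascending order
En = V[:, : M - d]                           # noise-subspace eigenvectors

grid = np.deg2rad(np.linspace(-90, 90, 1801))
G = steering(grid)
# MUSIC pseudo-spectrum: large where a(theta) is orthogonal to the noise subspace
pseudo = 1.0 / np.sum(np.abs(En.conj().T @ G) ** 2, axis=0)

# the two largest local maxima give the DOA estimates
peaks = [i for i in range(1, len(grid) - 1)
         if pseudo[i] > pseudo[i - 1] and pseudo[i] > pseudo[i + 1]]
peaks = sorted(peaks, key=lambda i: pseudo[i], reverse=True)[:2]
est = np.sort(np.rad2deg(grid[peaks]))
assert np.max(np.abs(est - np.array([-20.0, 35.0]))) < 0.5
```

With a sample covariance from finite snapshots, the noise eigenvalues are only approximately equal and denoising (as described above) precedes the subspace split.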
[29] The various kinds of spectra of a self-adjoint operator in an infinite
dimensional Hilbert space: Point spectrum, absolutely continuous spectrum
and singular continuous spectrum. Compare with the finite dimensional case
where only point spectrum is present. If H is a Hilbert space and E(.) is a
spectral measure on the real line with values in the projection lattice of H, then
for f ∈ H, we say that f ∈ Hp if the map x →< f, E((−∞, x])f > consists of pure jumps at a discrete set of points. We say that f ∈ Hac iff the map x →< f, E((−∞, x])f > is absolutely continuous w.r.t the Lebesgue measure while we say that f ∈ Hsc iff the map x →< f, E((−∞, x])f > is continuous
and singular w.r.t the Lebesgue measure. We have the orthogonal direct sum
decomposition:
H = Hp ⊕ Hac ⊕ Hsc

[30] The matrix inversion lemma and its application to the recursive least
squares algorithm for real time estimation of parameters in LIP systems.
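The matrix inversion lemma of item [30], in its rank-one (Sherman-Morrison) form, is what lets RLS update an inverse in O(n²) per sample instead of O(n³). A Python sketch with arbitrary well-conditioned test data:

```python
import numpy as np

# Sherman-Morrison: (A + u v^T)^{-1}
#   = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
rng = np.random.default_rng(7)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # keep it well-conditioned
u = rng.standard_normal(n)
v = rng.standard_normal(n)

Ainv = np.linalg.inv(A)
num = np.outer(Ainv @ u, v @ Ainv)                # rank-one correction term
updated = Ainv - num / (1.0 + v @ Ainv @ u)

assert np.allclose(updated, np.linalg.inv(A + np.outer(u, v)))
```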
[31] Inverting a matrix when one row and one column is appended to it and its application to the recursive least squares lattice algorithm for forward and backward prediction of a time series in a time- and order-recursive manner.

[32] Fast inversion of a Toeplitz centro-symmetric matrix with application


to fast order recursive computation of the predictor of a stationary process.
[33] Fast inversion of a block Toeplitz-centro-symmetric matrix with appli-
cation to fast order recursive computation of the predictor of a vector valued
stationary process.
[34] Symmetric, self-adjoint and essentially self-adjoint operators in a Hilbert
space.
[35] The maximal and minimal operators associated with the Schrodinger
operator. Proofs of self-adjointness of the maximal operator.
[36] Kato’s theory of relative boundedness of an unbounded operator w.r.t
another unbounded operator with applications to stability theorems for pertur-
bation of self-adjoint operators.
[37] The general theory of the resolvent and the spectrum of an operator.
The Neumann series and functions of operators in terms of complex contour
integrals. The resolvent equation.
[38] The semisimple and nilpotent components of an operator.
[39] The primary decomposition theorem and the Jordan canonical form of a non-diagonalizable operator with applications to computing the state transition
matrix in control theory and the evolution operator in dissipative quantum
systems.
[40] (a) Computing functions of an operator using the Jordan canonical
form. (b) An algorithm for computing the generalized eigenvectors of the Jordan
canonical form by iterative processes.
[41] Application of linear algebra to robotics.
[a] The fundamental d-link robot differential equation in the presence of
external torques, disturbances and stochastic noise.
[b] The robot differential equation in the case when the end effector is con-
nected to a spring-mass-damping system with an external environmental force
being applied to the end-effector mass.
[c] Trajectory tracking and parameter estimation of a robot by means of
position-velocity feedback torque. Construction of the feedback pd controller
and the adaptive parameter estimator based on Lyapunov energy theory.
[d] Disturbance estimation in robotics. Proof of the asymptotic boundedness of the disturbance estimation error using Lyapunov energy function theory.
[f] Linearization of the robot differential equations around the desired tra-
jectory and statistical analysis of the trajectory perturbation.
[g] Introduction to quantum robotics. Quantization of the Hamiltonian of a
robot derived from its Lagrangian.
[42] Introduction to Lie groups, Lie algebras and their representations with
applications to the differential equations of mechanics, quantum mechanics and
image processing.
[a] The general theory of differentiable manifolds. Manifold, chart, atlas,
functions between manifolds, coordinates, tangent manifold, tangent bundle,
vector bundle, cotangent manifold, vector fields, flow of a vector field, Lie
bracket between two vector fields and its physical interpretation in terms of
flows.

[b] Definitions and examples of Lie groups and Lie algebras.


[c] Application of Lie groups and Lie algebras to differential and partial
differential equations of classical and quantum mechanics.
Noether’s theorem for particles and fields: If a Lagrangian or a Hamiltonian
is invariant under a Lie group of transformations, then a conservation law, ie,
a first integral of the motion can be readily obtained in terms of the generators
of the group.

[2] Wavelets

A multiresolution analysis of L2(R) is a family of subspaces Vj, j ∈ Z, such that φ(x − n), n ∈ Z is a basis for V0, f(x) ∈ Vj iff f(2x) ∈ Vj+1, Vj ⊂ Vj+1, Cl(∪_{j∈Z} Vj) = L2(R) and ∩_{j∈Z} Vj = {0}. Clearly since φ(x − k) ∈ V0, it follows that φ(2x − k) ∈ V1 and since V0 ⊂ V1, we get that

φ(x) = Σk u[k]φ(2x − k)

Note that since φ(x − k), k ∈ Z is a basis for V0, it follows from the above assumption that φ(2x − k), k ∈ Z is a basis for V1. u[k] is called the scaling sequence. Let

ψ(x) = Σk v[k]φ(2x − k)

Then,
ψ̂(ω) = (1/2) Σk v[k]exp(−jkω/2)φ̂(ω/2) = v̂(ω/2)φ̂(ω/2)
and likewise,
φ̂(ω) = û(ω/2)φ̂(ω/2)
In these equations,
φ̂(ω) = ∫_R φ(x).exp(−jωx)dx, ψ̂(ω) = ∫_R ψ(x).exp(−jωx)dx,
û(ω) = Σn u[n]exp(−jωn), v̂(ω) = Σn v[n]exp(−jωn)

Thus,
< ψ(x), φ(x − k) > = < ψ̂(ω), exp(−jkω)φ̂(ω) > = ∫ exp(−jkω)v̂(ω/2)∗ û(ω/2)|φ̂(ω/2)|^2 dω

If we take
v[k] = (−1)^k u[−k − 1]
then,
v̂(ω) = Σk (−1)^k u[k]exp(jω(k + 1)) = exp(jω)û(ω + π)∗
and hence
û(ω)v̂(ω)∗ = exp(−jω)û(ω)û(ω + π)
Thus,
û(ω + π).v̂(ω + π)∗ = −exp(−jω)û(ω)û(ω + π) = −û(ω)v̂(ω)∗
Thus,
û(ω)v̂(ω)∗ + û(ω + π)v̂(ω + π)∗ = 0
We then find that S[k] = < ψ(x), φ(x − k) > is given by

S[k] = ∫ ψ(x)φ(x − k)dx = ∫ ψ̂(ω)∗ φ̂(ω).exp(−jωk)dω
= ∫ v̂(ω/2)∗ û(ω/2)|φ̂(ω/2)|^2.exp(−jωk)dω
= ∫ exp(−jω(k − 1/2))û(ω/2)û(ω/2 + π)|φ̂(ω/2)|^2 dω
= ∫ exp(−jω(2k − 1))û(ω).û(ω + π)|φ̂(ω)|^2 dω

Replacing ω by ω + nπ and noting that û(ω) has period 2π, we get

S[k] = ∫ (−1)^n exp(−jω(2k − 1))û(ω)û(ω + π)|φ̂(ω + nπ)|^2 dω

= (2N + 1)^{−1} Σ_{n=−N}^{N} (−1)^n ∫ exp(−jω(2k − 1))û(ω)û(ω + π)|φ̂(ω + nπ)|^2 dω

= −(2N + 1)^{−1} Σ_{n=−N}^{N} (−1)^n ∫ exp(−jω(2k − 1))û(ω)û(ω + π).|φ̂(ω + (n + 1)π)|^2 dω

Now define
χN(ω) = Σ_{n=−N}^{N} (−1)^n |φ̂(ω + nπ)|^2
Then, we get
S[k] = (2N + 1)^{−1} ∫ exp(−jω(2k − 1))û(ω)û(ω + π)χN(ω)dω

Also
limN→∞ (2N + 1)^{−1} χN(ω + π) = −limN→∞ (2N + 1)^{−1} χN(ω)
Thus, denoting
χ(ω) = limN→∞ (2N + 1)^{−1} χN(ω),
we get
χ(ω + π) = −χ(ω)
and
S[k] = ∫ exp(−jω(2k − 1))û(ω)û(ω + π)χ(ω)dω

Now observe that

χN(ω) = Σ_{n=−[N/2]}^{[(N−1)/2]} (|φ̂(ω + 2nπ)|^2 − |φ̂(ω + (2n + 1)π)|^2)

so if we assume that
limω→∞ |φ̂(ω)|^2
exists, then

limn→∞ (|φ̂(ω + 2nπ)|^2 − |φ̂(ω + (2n + 1)π)|^2) = 0

and hence by the Cesaro sum theorem,

χ(ω) = limN→∞ (2N + 1)^{−1} χN(ω) = 0

and this proves that ψ(x) is orthogonal to φ(x−k) for every integer k. It follows
easily from this that the subspace W1 = Clspan{ψ(x − k) : k ∈ Z} is orthogonal
to the subspace V0 = Clspan{φ(x − k) : k ∈ Z}

Now we show that we have the orthogonal direct sum

V1 = V0 ⊕ W1

where
W1 = Clspan{ψ(x − k) : k ∈ Z}
Orthogonality of this sum has been proved. It remains only to show that any
f ∈ V1 can be expressed as a finite/infinite linear combination of elements from
V0 and W1 . Since φ(2x − k), k ∈ Z is a basis for V1 , it suffices to show that
 
φ(2x − k) = Σm a[m]φ(x − m) + Σm b[m]ψ(x − m)

for some sequences a[m], b[m]. Taking Fourier transforms, we get that it suffices
to show that

exp(−jωk/2)φ̂(ω/2) = A(ω)φ̂(ω) + B(ω)ψ̂(ω)
= û(ω/2)A(ω)φ̂(ω/2) + û(ω/2 + π)∗ B(ω)exp(jω/2)φ̂(ω/2)

or equivalently,

exp(−jωk/2) = û(ω/2)A(ω) + exp(jω/2)û(ω/2 + π)∗ B(ω) − − − (a)

for some functions A, B having period 2π. Thus, it amounts to proving that there exist 2π-periodic functions A, B such that if F(ω) denotes the rhs of (a), then F(ω + 2π) = (−1)^k F(ω), ie,

û(ω/2 + π)A(ω) + exp(jω/2)û(ω/2)∗ B(ω) = (−1)^k (û(ω/2)A(ω) − exp(jω/2)û(ω/2 + π)∗ B(ω))


or

A(ω)(û(ω/2 + π) − (−1)^k û(ω/2)) + B(ω)exp(jω/2)(û(ω/2) + (−1)^k û(ω/2 + π))∗ = 0

To prove the existence of 2π-periodic functions A, B therefore amounts to showing that the ratio

G(ω) = exp(−jω/2)(û(ω/2 + π) − (−1)^k û(ω/2))/(û(ω/2) + (−1)^k û(ω/2 + π))∗

has period 2π. This follows immediately from the 2π-periodicity of û, v̂. Thus, having proved that
V1 = V0 ⊕ W1
is an orthogonal direct sum, it follows by applying the unitary scaling operator S : f(x) → √2 f(2x) to this direct sum n times that

Vn+1 = Vn ⊕ Wn+1, n ∈ Z

where
Wn+1 = span{2n/2 ψ(2n x − k) : k ∈ Z}
and hence we get the direct sum decomposition

L2(R) = ⊕_{n∈Z} Wn

as an orthogonal direct sum where in deriving this we use the MRA properties

L2(R) = Cl(∪n Vn), {0} = ∩n Vn, Vn ⊂ Vn+1, S(Vn) = Vn+1

This proves that any f ∈ L2(R) can be expanded as

f(x) = Σ_{n,k∈Z} c(n, k)ψn,k(x), ψn,k(x) = 2^{n/2}ψ(2^n x − k)

with the additional property that ψn,k ⊥ ψm,l, n ≠ m. However to get an onb wavelet basis for L2(R) we require in addition that ψn,k ⊥ ψn,l, k ≠ l. We shall prove this under the assumption that φ(x − k), k ∈ Z is an onb (orthonormal basis) for V0, not merely a basis. Thus, we have the relations

< φ(x), φ(x − k) > = ∫ φ(x)φ(x − k)dx = δ[k]

Then
φ(x) = Σm u[m]φ(2x − m)
implies

δ[k] = < φ(x), φ(x − k) > = Σ_{m,n} u[m]u[n] < φ(2x − m), φ(2x − 2k − n) >
= (1/2) Σ_{m,n} u[m]u[n] < φ(x − m), φ(x − 2k − n) > = (1/2) Σ_{m,n} u[m]u[n]δ[2k + n − m]
= (1/2) Σn u[n]u[n + 2k]

After appropriate scaling, we may thus assume that

Σn u[n]u[n − 2k] = < u, R2k u > = δ[k], k ∈ Z − − − (b)

where
Rm u[n] = u[n − m]
Taking the DTFT of (b) then gives

|û(ω/2)|2 + |û(ω/2 + π)|2 = 2 − − − (c)

(b) and (c) are equivalent. Conversely, if u satisfies (b) or (c) and φ(x) satisfies
the functional equation

φ(x) = Σk u[k]φ(2x − k) − − − (d)

then φ(x − k), k ∈ Z will be an orthogonal set with the same norm and hence after appropriate normalization, these will form an onb for V0 = Clspan{φ(x − k) : k ∈ Z} and by the above procedure, we can then obtain an orthonormal wavelet basis ψn,k(x) = 2^{n/2}ψ(2^n x − k), n, k ∈ Z for L2(R). To solve (d) for φ(x) in terms of the scaling sequence u[k], we take the Fourier transform on both sides to get

φ̂(ω) = û(ω/2)φ̂(ω/2)

which gives on iteration,

φ̂(ω) = φ̂(0) Π_{n=1}^{∞} û(ω/2^n)

Now, suppose u[k] satisfies (c) (or equivalently (b)) and we define

v[k] = (−1)k u[−k − 1]

Then we have as seen earlier that

v̂(ω) = exp(jω)û(ω + π)∗ − − − (e)

and since (c) holds, it also follows that

|v̂(ω/2)|2 + |v̂(ω/2 + π)|2 = 2 − − − (f )

ie, {R2k v : k ∈ Z} is also an orthonormal set just like {R2k u : k ∈ Z}. We now show that the subspaces of l2(Z) spanned by these two orthonormal sets respectively are mutually orthogonal. To see this, it suffices to show that

< v, R2k u > = 0

or equivalently,

Σn v[n]u[n + 2k] = 0

Taking the Fourier transform of this, shows that we require only to show that

v̂(ω/2)∗ û(ω/2) + v̂(ω/2 + π)∗ û(ω/2 + π) = 0 − − − (g)

But this follows immediately from (e), the 2π periodicity of û and the fact that
exp(jπ) = −1.
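Conditions (c), (f) and (g) can be checked numerically for a concrete scaling filter. The Python sketch below uses the standard Daubechies-4 taps (our choice of example, not the text's), normalized so that Σ u[n]² = 1 as assumed after the rescaling in (b):

```python
import numpy as np

# Daubechies-4 scaling filter, normalized so sum u[n]^2 = 1
s3 = np.sqrt(3.0)
u = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
assert np.isclose(np.sum(u * u), 1.0)

def uhat(w):
    """DTFT u-hat(w) = sum_n u[n] exp(-jwn)."""
    return np.sum(u[:, None] * np.exp(-1j * np.outer(np.arange(len(u)), w)), axis=0)

def vhat(w):
    """v-hat(w) = exp(jw) u-hat(w+pi)^*, the relation (e)."""
    return np.exp(1j * w) * np.conj(uhat(w + np.pi))

w = np.linspace(0.0, 2.0 * np.pi, 400)

# (c) and (f): both filters satisfy the quadrature-mirror condition
qmf_u = np.abs(uhat(w)) ** 2 + np.abs(uhat(w + np.pi)) ** 2
qmf_v = np.abs(vhat(w)) ** 2 + np.abs(vhat(w + np.pi)) ** 2
assert np.max(np.abs(qmf_u - 2.0)) < 1e-12
assert np.max(np.abs(qmf_v - 2.0)) < 1e-12

# (g): the cross term vanishes, so {R_2k v} is orthogonal to {R_2k u}
cross = np.conj(vhat(w)) * uhat(w) + np.conj(vhat(w + np.pi)) * uhat(w + np.pi)
assert np.max(np.abs(cross)) < 1e-12
```

As the derivation above shows, (g) holds identically once v̂ is defined through (e); the D4 taps additionally make (c) hold, which is the nontrivial design condition.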

Applications of wavelets to signal processing and control problems.


Example: We have a mechanical system describing the motion of a particle
in a one dimensional potential field dependent upon an unknown parameter
vector θ that we wish to estimate from measurements of the trajectory of the
particle. We know roughly before hand that the trajectory of the particle over
the nth time slot [nT, (n + 1)T ) has maximum frequency ωmax (n) and minimum
frequency ωmin (n). Thus, instead of recording the entire trajectory over the
time interval [0, N T ), we record only the dominant wavelet coefficients over
each time slot and then attempt to estimate the parameter θ based on this
data. The advantage is estimation with compressed data storage. Let ψ(x) be
the mother wavelet, so that ψn,k (x) = 2n/2 ψ(2n x − k), n, k ∈ Z is an onb for
L2 (R). The maximum frequency of ψnk over a slot [a, b) is given by

ωmax(n, k) = maxx∈[a,b) |ψ′nk(x)|/|ψnk(x)| = 2^n.max{|ψ′(y)|/|ψ(y)| : y = 2^n x − k, x ∈ [a, b)}


So roughly speaking, ωmax(n, k) = 2^n M. Thus, if (c, d) is the region of support
of ψ(x), then over the time interval (a, b), a good representation of the signal

having maximum frequency ωm is obtained by retaining only those wavelet


coefficients for which the scale-translation index pairs (n, k) are such that

M·2^n ≤ ω_m ,

((c + k)/2^n , (d + k)/2^n ) ⊂ (a, b)

ie
n ≤ log_2(ω_m /M), M = sup_x |ψ′(x)|/|ψ(x)|,
2^n a − c ≤ k ≤ 2^n b − d
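A minimal sketch of this selection rule for the retained scale-translation pairs; the function name and the numerical values in the example (support (c, d) = (0, 1), slot (a, b) = (0, 8), ω_m = 16, M = 1) are hypothetical illustrations:

```python
import math

def retained_indices(a, b, omega_m, M, c, d, n_min=0):
    """Pairs (n, k) with M*2**n <= omega_m and
    ((c + k)/2**n, (d + k)/2**n) inside (a, b); n_min is an ad hoc
    lower cut-off on the scale for this illustration."""
    n_max = math.floor(math.log2(omega_m / M))
    pairs = []
    for n in range(n_min, n_max + 1):
        k_lo = math.ceil(2 ** n * a - c)
        k_hi = math.floor(2 ** n * b - d)
        pairs.extend((n, k) for k in range(k_lo, k_hi + 1))
    return pairs

# hypothetical numbers: support (c, d) = (0, 1), slot (a, b) = (0, 8),
# maximum signal frequency omega_m = 16, and M = 1
pairs = retained_indices(0.0, 8.0, 16.0, 1.0, 0.0, 1.0)
assert all(2 ** n <= 16 for n, _ in pairs)                      # M*2^n <= omega_m
assert all(0 <= k and (1 + k) / 2 ** n <= 8 for n, k in pairs)  # support fits slot
```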

[3] Some prerequisites for making a transition from linear algebra to infinite
dimensional functional analysis
[1]
[a] General Topology, notion of a topological space, continuity of a func-
tion between two topological spaces, convergence of a sequence in a topological
space. Compactness of a topological space. The product topology, Tychonoff’s
compactness theorem and the axiom of choice.
[b] metric spaces, Cauchy sequences and convergence in metric spaces, the
topology on a metric space induced by a metric, open balls and closed balls in
a metric space. Convergence and continuity of sequences and functions on a
metric space. Equivalence of metrics in terms of the same topology induced by
two different metrics. Complete metric spaces. Completion of a metric space.
Sequential compactness of a metric space in terms of existence of a convergent
subsequence. Equivalence of sequential compactness to total boundedness (ex-
istence of a finite ε-net for any ε > 0) and completeness. Hence, equivalence of
sequential compactness to compactness of a metric space regarded as a topolog-
ical space. Rn as a metric space under any norm. Compactness of a subset of
Rn being equivalent to the closedness and boundedness of the subset of Rn

[2] Jordan canonical form. Let N be nilpotent on a complex vector space


V of dimension n. Assume N m = 0, N m−1 ≠ 0 for some m > 1. Then we
can choose a basis of the form B1 = {N m−1 v1,k , k = 1, 2, ..., p1 } for R(N m−1 ).
Then, {N m−1 v1,k , N m−2 v1,k : k = 1, 2, ..., p1 } is a linearly independent set and
can be extended to a basis

B2 = B1 ∪ {N m−2 v1,k1 : k1 = 1, 2, ..., p1 } ∪ {N m−2 v2,k2 : k2 = 1, 2, ..., p2 }

= {N m−1 v1,k1 , N m−2 v1,k1 , N m−2 v2,k2 : 1 ≤ k1 ≤ p1 , 1 ≤ k2 ≤ p2 }


for R(N m−2 ). Further, it is clear that we can arrange matters so that N m−1 v2,k =
0, k = 1, 2, ..., p2 , for if N m−1 v2,k ≠ 0 for some k ∈ {1, 2, ..., p2 }, then we can find
constants c1 , ..., cp1 so that


N m−1 v2,k = Σ_{j=1}^{p1} cj N m−1 v1,j

and hence v2,k may be replaced by v2,k − Σ_{j=1}^{p1} cj v1,j without affecting the
required linear independence. Likewise, it is clear that

B2 ∪ {N m−2 v1,k1 , N m−3 v1,k1 , N m−3 v2,k2 : 1 ≤ k1 ≤ p1 , 1 ≤ k2 ≤ p2 }

is a linearly independent set and hence can be extended to a basis

B3 = B2 ∪{N m−2 v1,k1 , N m−3 v1,k1 , N m−3 v2,k2 : 1 ≤ k1 ≤ p1 ,


1 ≤ k2 ≤ p2 }∪{N m−3 v3,k3 : k3 = 1, 2, ..., p3 }

for R(N m−3 ). Further, it is clear that we can arrange matters so that

N m−2 v3,k = 0, k = 1, 2, ..., p3

for otherwise, N m−2 v3,k is a linear combination of elements of B2 , ie,

N m−2 v3,k ∈ span{N m−1 v1,k1 , N m−2 v1,k1 , N m−2 v2,k2 : k1 = 1, 2, ..., p1 ,
k2 = 1, 2, ...p2 }
and hence we can replace v3,k by v3,k minus a linear combination of the vectors
N v1,k1 , v1,k1 , v2,k2 , k1 = 1, 2, ..., p1, k2 = 1, 2, ..., p2 yielding the desired basis.
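The chain construction above can be illustrated numerically; the nilpotent matrix N (a single Jordan chain with m = 3) and the starting vector below are illustrative choices, not part of the general proof:

```python
import numpy as np

# single-chain nilpotent with N^3 = 0, N^2 != 0 (illustrative)
N = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
assert np.allclose(np.linalg.matrix_power(N, 3), 0)
assert not np.allclose(np.linalg.matrix_power(N, 2), 0)

v = np.array([1., 2., 3.])                        # any v with N^2 v != 0 will do
chain = np.column_stack([N @ N @ v, N @ v, v])    # basis {N^2 v, N v, v}
assert abs(np.linalg.det(chain)) > 1e-12          # the chain is linearly independent
J = np.linalg.inv(chain) @ N @ chain              # N in the chain basis
assert np.allclose(J, np.array([[0., 1., 0.],
                                [0., 0., 1.],
                                [0., 0., 0.]]))   # a single Jordan block
```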
[3] An identity regarding resolvents of linear operators. The aim is to show
that if the resolvents of two operators are close to each other at a certain point
in the complex plane, then they are close to each other at all points in the
complex plane.
R(T, z) = (T − z)−1 = (T − z0 − (z − z0 ))−1 =

(z − z0 )−1 ((z − z0 )−1 (T − z0 ) − 1)−1


on the one hand. On the other,

R(T, z) = R(T, z0 )(1 − (z − z0 )R(T, z0 ))−1

= (z − z0 )−1 R(T, z0 )((z − z0 )−1 − R(T, z0 ))−1


= (z − z0 )−1 (R(T, z0 ) − (z − z0 )−1 + (z − z0 )−1 )((z − z0 )−1 − R(T, z0 ))−1
= (z − z0 )−1 (−1 − (z − z0 )−1 R(R(T, z0 ), (z − z0 )−1 ))
= −(z − z0 )−1 − (z − z0 )−2 R(R(T, z0 ), (z − z0 )−1 )
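The final identity can be sanity-checked numerically on a random matrix; the evaluation points z0, z below are arbitrary non-spectral points:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))

def R(A, z):
    """Resolvent R(A, z) = (A - z)^{-1}."""
    return np.linalg.inv(A - z * np.eye(A.shape[0]))

z0, z = 0.5 + 1.0j, 2.0 - 0.3j          # arbitrary points off the real spectrum
w = 1.0 / (z - z0)
lhs = R(T, z)
# R(T, z) = -(z-z0)^{-1} - (z-z0)^{-2} R(R(T, z0), (z-z0)^{-1})
rhs = -w * np.eye(4) - w ** 2 * R(R(T, z0), w)
assert np.allclose(lhs, rhs)
```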

[4] Assignment problems in applied linear algebra


[1] Let V be a vector space of dimension n over a field F. Let B = {e1 , ..., en }
and B′ = {f1 , ..., fn } be two bases for V . Show that there exists a unique matrix

A = ((aij ))1≤i,j≤n ∈ Fn×n

such that
fi = Σ_j aij ej , 1 ≤ i ≤ n

Prove that A is a non-singular matrix. Let

((tij )) = [T ]B , ((sij )) = [T ]B′

Prove that
T (fi ) = Σ_j aij T (ej ) = Σ_{j,k} aij tkj ek = Σ_j sji fj = Σ_{j,k} sji ajk ek

and hence
Σ_j aij tkj = Σ_j sji ajk

and hence
[T ]B AT = AT [T ]B′
or equivalently,
[T ]B′ = A−T [T ]B AT

[2] Let V be vector space and let {e1 , ..., en } and {f1 , ..., fm } be two bases
for V , ie, both are maximal linearly independent subsets of V . Prove that both
are minimal spanning sets for V and that n = m.
hint: To prove that n = m, write
 
fi = Σ_j aij ej , ej = Σ_k bjk fk

and deduce that


AB = Im , BA = In
and hence by taking trace, deduce that n = m.

[3] Consider a nonlinear system in Rn described by the differential equation

dx(t)/dt = F (x(t)) + ε.w(t), F : Rn → Rn

where w(t) ∈ Rn is noise. Assume F to be continuously differentiable. Solve


this equation up to O(ε²) using perturbation theory:

x(t) = x0 (t) + ε.x1 (t) + ε²x2 (t) + O(ε³)

Show that on equating coefficients of ε^m , m = 0, 1, 2, that

x0′ (t) = F (x0 (t)), x1′ (t) = F ′ (x0 (t))x1 (t) + w(t),

x2′ (t) = F ′ (x0 (t))x2 (t) + (1/2)F ′′ (x0 (t))(x1 (t) ⊗ x1 (t))

Define
J(t) = F ′ (x0 (t)),

Φ(t, s) = I + Σ_{n≥1} ∫_{s<tn <...<t1 <t} J(t1 )...J(tn ) dt1 ...dtn

Show that
‖Φ(t, s)‖ ≤ exp(∫_s^t ‖J(u)‖ du)
and that
x1 (t) = ∫_0^t Φ(t, s)w(s)ds,

‖x1 (t)‖ ≤ (∫_0^t ‖Φ(t, s)‖ ds).sup_{0≤s≤t} ‖w(s)‖
Also deduce that
x2 (t) = ∫_0^t Φ(t, s)K(s)(x1 (s) ⊗ x1 (s))ds

where
K(s) = (1/2)F ′′ (x0 (s))
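A minimal numerical sketch of this perturbation expansion, assuming a scalar system with F(x) = −x³ and a deterministic input w(t) = cos t (both illustrative choices): the Euler-integrated perturbed trajectory should agree with x0 + ε·x1 up to O(ε²).

```python
import numpy as np

# scalar system with F(x) = -x^3 and deterministic w(t) = cos t (illustrative)
F  = lambda x: -x ** 3
dF = lambda x: -3 * x ** 2                    # F'(x)
eps, dt, nsteps = 1e-3, 1e-3, 2000
w = lambda t: np.cos(t)

x_full, x0, x1 = 1.0, 1.0, 0.0
for i in range(nsteps):
    t = i * dt
    x_full += dt * (F(x_full) + eps * w(t))   # perturbed system
    x1     += dt * (dF(x0) * x1 + w(t))       # x1' = F'(x0) x1 + w
    x0     += dt * F(x0)                      # x0' = F(x0)

# the first-order expansion tracks the perturbed system to O(eps^2)
assert abs(x_full - (x0 + eps * x1)) < 1e-4
```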

[4] Let T be a Hermitian operator in a finite dimensional complex Hilbert


space V with spectral decomposition
 
T = Σ_a λ(a)Pa , Pa² = Pa = Pa∗ , Pa Pb = 0, a ≠ b, Σ_a Pa = I

Define
|T | = √(T ²), T+ = (|T | + T )/2, T− = (|T | − T )/2
show that
|T | = Σ_a |λ(a)|Pa , T+ = Σ_{a:λ(a)>0} λ(a)Pa ,

T− = −Σ_{a:λ(a)<0} λ(a)Pa

Show that for c ∈ R,



(T − c)+ = Σ_{a:λ(a)>c} (λ(a) − c)Pa

Define E(c) to be the orthogonal projection onto N ((T − c)+ ). Show that

E(c) = Σ_{a:λ(a)≤c} Pa

Show that
limc↓d E(c) = E(d), c, d ∈ R
but
limc↑d E(c)
need not be E(d). Show that if λ(a), a = 1, 2, ..., r are arranged in ascending
order, so that λ(a − 1) < λ(a) for all a, then

lim_{c↓λ(a)} E(c) = Σ_{b:b≤a} Pb

while
lim_{c↑λ(a)} E(c) = Σ_{b:b≤a−1} Pb

We write
limc↓x E(c) = E(x + 0), limc↑x E(c) = E(x − 0)
Thus, we have proved that

E(x + 0) = E(x) for all x, E(λ(a) − 0) = E(λ(a − 1))
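A numerical illustration of the one-sided continuity of the spectral family E(c); the diagonal matrix below, with the eigenvalue 1 repeated and a simple eigenvalue 3, is an illustrative choice:

```python
import numpy as np

A = np.diag([1.0, 1.0, 3.0])          # eigenvalues 1 (twice) and 3
lam, V = np.linalg.eigh(A)

def E(c):
    """E(c) = sum of P_a over eigenvalues lambda(a) <= c."""
    cols = V[:, lam <= c]
    return cols @ cols.T

assert np.allclose(E(1.0 + 1e-9), E(1.0))      # E(x + 0) = E(x)
assert not np.allclose(E(3.0 - 1e-9), E(3.0))  # left limit jumps at an eigenvalue
assert np.allclose(E(3.0 - 1e-9), E(1.0))      # E(lambda(a) - 0) = E(lambda(a-1))
assert np.allclose(E(3.0), np.eye(3))
```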

[5] Tutorial on linear algebra


[1] Prove that if c1 , ..., cr are distinct complex numbers and

pk (t) = Π_{j≠k} (t − cj )/Π_{j≠k} (ck − cj ), k = 1, 2, ..., r

then
pk (cj ) = δ(k, j)
Show that if f (t) ∈ C[t] and degf < r, then

f (t) = Σ_{k=1}^{r} f (ck )pk (t)

Deduce that if T is an operator in an n dimensional complex vector space such


that T has at least 2 distinct eigenvalues, then

I = Σ_{j=1}^{r} pj (T ), T = Σ_{j=1}^{r} cj pj (T )

If further T has a minimal polynomial

p(t) = Π_{j=1}^{r} (t − cj )

then deduce that

pj (T )pk (T ) = 0, j ≠ k, pj (T )² = pj (T )

and hence conclude that T is diagonable.
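A numerical check of these identities for an illustrative diagonable 3 × 3 matrix with distinct eigenvalues 1, 2, 4:

```python
import numpy as np

c = np.array([1.0, 2.0, 4.0])                    # distinct eigenvalues
S = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])                     # invertible (det = 2)
T = S @ np.diag(c) @ np.linalg.inv(S)

def p(k):
    """Lagrange polynomial at T: prod_{j != k} (T - c_j I)/(c_k - c_j)."""
    P = np.eye(3)
    for j in range(len(c)):
        if j != k:
            P = P @ (T - c[j] * np.eye(3)) / (c[k] - c[j])
    return P

P = [p(k) for k in range(3)]
assert np.allclose(sum(P), np.eye(3))                        # I = sum p_j(T)
assert np.allclose(sum(ck * Pk for ck, Pk in zip(c, P)), T)  # T = sum c_j p_j(T)
for j in range(3):
    for k in range(3):
        target = P[j] if j == k else np.zeros((3, 3))
        assert np.allclose(P[j] @ P[k], target)              # orthogonal idempotents
```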

[2] Give an example of an infinite dimensional separable Hilbert space H and


a linear mapping T : H → H such that T is injective but not surjective.

hint: Take an onb {e1 , e2 , ...} for H and define T (ej ) = e2j , j ≥ 1. Show
that this T is an isometry, ie, T ∗ T = I but T is not unitary.

[3] Let H be a separable Hilbert space, ie, H has a countable dense subset
D. Then, by applying the Gram-Schmidt process to the elements of D, deduce
that H has a countable orthonormal basis.

[4] T be a bounded Hermitian operator in a Hilbert space. Prove the fol-


lowing statements: (1) Define S = T /‖T ‖, so that ‖S‖ ≤ 1 and 0 ≤ S² ≤ I. Show that the
process
X(n + 1) = X(n) + (S² − X(n)²)/2, n ≥ 0, X(0) = 0
gives a sequence of polynomials X(n) in S that converge in operator norm to
X(∞) = √(S²). Hence give a constructive algorithm for computing

|T | = √(T ²) = ‖T ‖·√(S²)
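A sketch of the resulting constructive algorithm, applied to an illustrative positive matrix normalized to have norm at most one (the identity shift below keeps the spectrum away from 0, purely so the iteration converges quickly):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)            # positive matrix (identity shift keeps
A = A / np.linalg.norm(A, 2)       # spectrum away from 0); now 0 < A <= I

X = np.zeros_like(A)
for _ in range(200):
    X = X + (A - X @ X) / 2        # monotone polynomial iteration toward sqrt(A)

assert np.allclose(X @ X, A, atol=1e-6)
assert np.all(np.linalg.eigvalsh(X) >= -1e-9)   # the square root is positive
```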

(2) Define for c ∈ R,

(T − c)+ = (|T − c| + T − c)/2, (T − c)− = (|T − c| − (T − c))/2

Show that

(T −c)+ ≥ 0, (T −c)− ≥ 0, T −c = (T −c)+ −(T −c)− , |T −c| = (T −c)+ +(T −c)−

Show that if c1 > c2 , then

0 ≤ (T − c1 )+ ≤ (T − c2 )+

and hence that


N ((T − c2 )+ ) ⊂ N ((T − c1 )+ )
Let P (c) denote the projection onto N ((T − c)+ ). Then, deduce that

P (c1 ) ≥ P (c2 ), c1 > c2

0 ≤ (T − c2 )(P (c1 ) − P (c2 )) ≤ (c1 − c2 )(P (c1 ) − P (c2 )), c1 > c2


Note that
(T − c1 )P (c2 ) = (T − c2 + c2 − c1 )P (c2 )
≤ ((T − c2 )+ + c2 − c1 )P (c2 ) = (c2 − c1 )P (c2 )
(T − c2 )P (c2 ) ≤ (T − c2 )+ P (c2 ) = 0
(T −c2 )P (c1 ) = (T −c1 +c1 −c2 )P (c1 ) ≤ ((T −c1 )+ +c1 −c2 )P (c1 ) = (c1 −c2 )P (c1 )

[5] Prove that if p is a prime number, then F = {0, 1, ..., p − 1} is a field


provided that addition and multiplication are performed modulo p.
[6] If T is a bounded operator in a Hilbert space, prove that there exists an
isometry U and a unique positive operator P such that T = U P .
hint: Let
|T | = √(T ∗ T )
where the square root is as defined above. Define U1 : R(|T |) → R(T ) so that
U1 |T |x = T x, ∀x. Show that U1 is well defined. Let U2 : R(|T |)⊥ → R(T )⊥ be
any isometry and define U : H → H so that U coincides with U1 on R(|T |)
and with U2 on R(|T |)⊥ = N (|T |) = N (T ). Then show that U is an isometry
and T = U |T |.

Remark: For finite dimensional Hilbert spaces, we have

dimR(|T |) = dimR(T ), dimR(|T |)⊥ = dimR(T )⊥

so the required isometries are easily constructed. The result of this problem is
known as the polar decomposition.
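For matrices, the polar factors can be read off from the singular value decomposition; a sketch with an illustrative random T:

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((3, 3))

W, s, Vh = np.linalg.svd(T)
U = W @ Vh                            # isometry (here orthogonal)
P = Vh.T @ np.diag(s) @ Vh            # P = |T| = sqrt(T^T T), positive

assert np.allclose(U @ P, T)                      # T = U P
assert np.allclose(U.T @ U, np.eye(3))            # U^T U = I
assert np.allclose(P @ P, T.T @ T)                # P^2 = T^T T
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)    # P is positive
```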

[6] Course outline for Applied Linear Algebra


[1] Fields, rings, groups, vector spaces, algebras.

[2] Basis of a vector space. Alternative equivalent definitions.

[3] Examples: The algebra of n×n matrices with values in a field, the ring of
polynomials, the fields Fp , p a prime, R, C, the field of rational functions,
the group of permutations of a set, the vector space Fn , F being any field, the
vector space spanned by a set of functions on a set with values in a field. The
notion of algebraically closed and non-closed fields with examples.

[4] Ideals, polynomial ideals, the monic generator of a polynomial, uniqueness


and existence.

[5] Subspaces, quotient spaces, direct sum decompositions

[6] The co-ordinate representation of a vector relative to a basis.

[7] Linear transformations between two vector spaces. Matrix of a linear


transformation relative to bases, examples of linear transformations from signal
and system theory.

[8] Inner product and Hilbert spaces in finite and infinite dimensions.

[9] Injective, surjective and bijective linear transformations between vector


spaces, notion of vector space isomorphism.

[10] The rank-nullity theorem, range and nullspace of a linear operator, Ech-
elon form of a matrix and its application to solving linear systems of equations.

[11] Decomposition theorems of matrix theory:


[a] A = EF where E has full column rank and F has full row rank.

[b] The spectral theorem for normal, Hermitian and unitary operators in
finite dimensional vector spaces.

[c] Bounded and unbounded operators in a Hilbert space, the spectral theo-
rem for bounded Hermitian operators in an infinite dimensional Hilbert space.
Proof based on construction of the square root of a positive operator.

[d] The polar decomposition of matrices.


[e] The singular value decomposition.
[f] The Gram-Schmidt orthonormalization process and the QR decomposi-
tion.

[g] The LDU and U DL decomposition for positive definite matrices with
application to linear prediction theory.

[h] Characteristic polynomials, eigenvalues, the minimal polynomial, diago-


nability, the primary decomposition theorem for finite dimensional operators on
a vector space over an arbitrary field.

[i] The Jordan canonical form for nilpotent matrices over C.


[j] The Jordan canonical form for finite dimensional matrices over C.

[10] Definition of the spectrum of an operator in an infinite dimensional


Hilbert space. Compact operators in a Hilbert space and countability of their
spectra. Proof that only the zero eigenvalue of a compact operator can be an
accumulation point of its set of eigenvalues, proof of the finite dimensionality of
the eigensubspace of a compact operator corresponding to a non-zero eigenvalue.

[11] The matrix inversion lemma and its application to the RLS-Lattice
algorithm for time and order recursive prediction of a discrete time signal.

[12]
[a] Applications of vector space and linear transformation theory to quantum
mechanics: Pure state, mixed state, observables, Schrodinger and Heisenberg
dynamics, interaction picture dynamics, Dyson series solution to Schrodinger
evolution. Computing probabilities of events in quantum mechanics, projection
valued and positive operator valued measurements, collapse of a state following
a measurement, time independent perturbation theory,
[b] Scattering theory in quantum mechanics, the wave operators, Lippmann-
Schwinger equations for the scattered states, Born approximation.

[13] Basic quantum information theory over finite dimensional Hilbert spaces.
[a] Proof of the Shannon noiseless and noisy coding theorems and their converses
in classical communication theory based on typical sequences and the Feinstein-
Khintchine fundamental lemma.

[b] Von-Neumann Entropy of a state, entropy typical projections and Bernoulli


typical projections, Schumacher noiseless quantum compression.

[c] Classical-Quantum (Cq) coding theorem for classical sources communi-


cated via quantum mixed states.

[d] Noisy quantum channels: The Stinespring and Choi-Kraus representa-


tions.
[e] Recovery operators for noisy quantum channels and the Knill-Laflamme
theorem. Definition of a quantum code as a subspace supporting a class of
density matrices. Definition of the noise manifold of matrices as a vector space
of matrices such that each operator in the Choi-Kraus representation of the
quantum noisy channel is an element of the noise manifold. Proof of the Knill-
Laflamme theorem giving a relationship between the projection onto the code
subspace and the noise manifold. Explicit construction of the recovery opera-
tors.

[14] Application of the singular value decomposition to principal component


analysis.

[15] Application of the spectral theorem to the correlation matrix based


approach to MUSIC and ESPRIT algorithm for direction of arrival estimation.

[16] Application of the singular value decomposition to the data matrix based
approach to MUSIC and ESPRIT algorithms.

[17] Definition and properties of tensor product of vector spaces and opera-
tors.
[a] Definition of the tensor product of vector spaces and specialization to the
Kronecker tensor product.
[b] Symmetric and antisymmetric tensor products of vector spaces.
[c] Application of tensor products to Maxwell-Boltzmann, Bose-Einstein and
Fermi-Dirac statistics of elementary particles.
[d] Tensor product of infinite dimensional Hilbert spaces, construction based
on the GNS principle combined with Kolmogorov’s consistency theorem.

[18] Basics of classical and quantum filtering theory and control. (comes
under the heading ”stochastic calculus in Boson-Fock space” which is a special
kind of Hilbert space constructed using multiple tensor products).
[a] The Hudson-Parthasarathy quantum stochastic calculus and the HP noisy
Schrodinger equation.
[b] Derivation of the Belavkin filter for a mixture of quadrature and photon
counting measurements.
[c] Quantum control applied to the Belavkin filter for reduction of Lindblad
noise.

[19] Linearization of nonlinear systems.


[a] Lyapunov exponents, linearization of the logistic and Lotka-Volterra
equations.
[b] Linearization of non-linear dynamical systems described by ode and pde
with applications to galactic evolution via linearization of the Einstein field
equations.
[c] The algebra of creation, annihilation and conservation operators with
application to quantum field theory in the presence of noise.

[20] Solving stochastic differential equations driven by discontinuous semi-


martingales (as a part of linearization of non-linear stochastic systems). Con-
struction of the stochastic integral w.r.t a square integrable Martingale is treated
as a Hilbert space isomorphism problem while solving stochastic differential
equations driven by semimartingales using perturbation theory comes under
the heading ”linearization of nonlinear systems”.
[a] Construction of the stochastic integral w.r.t semi-martingales based on
the Doob-Meyer decomposition.
[b] The Doleans-Dade-Meyer-Ito formula for discontinuous semi-martingales
with application to superpositions of compound Poisson processes and Brownian
motion.

[c] The generalized Lipschitz conditions for proving existence and uniqueness
of solutions to sde’s driven by discontinuous semi-martingales.
[d] Applications of stochastic calculus to mathematical finance.
[e] Stochastic optimal control for Markov processes.
[f] Stochastic nonlinear filtering for Markov processes in the presence of Levy
measurement noise.

[21] Quantum image processing and Gaussian states.


[1] Conversion of a classical N × N image field into a pure quantum state of
size 2^{N²} × 1.
[2] Converting two classical image fields, one a signal field and another a
noise field into two quantum states, applying a preprocessing unitary operator
U dependent on some parameter θ to the signal state, superposing it with the
noise state, and then normalizing it resulting in a preprocessed noisy quantum
state.
[3] Repeat the process in [2] for a set of p signal and noise image field pairs
and then applying the inverse of the same unitary U dependent on the same
parameter to each preprocessed state and matching it with the corresponding
signal image state by a least squares method. The matching is carried out by
selecting the parameter θ so that the sum of norm squares of the mismatch error
over all the pairs is a minimum. This constitutes the training process. Here
the trained parameter will depend on the statistics of the classical noisy image
fields. We assume that these statistics are the same for the noisy image field in
each pair. Thus, we can safely say that the trained parameter in the unitary
U = U (θ) used in the pre and post processing operations is a function of the noise
statistics.
[4] The unitary U (θ) may be designed by first coupling all the signal and
noise quantum states to a fixed coherent bath state via the tensor product
and then designing U (θ) as the unitary evolution operator at time T of the
Hudson-Parthasarathy noisy Schrodinger equation with Lindblad operators be-
ing linear functions of the parameter, then solving approximately the Hudson-
Parthasarathy equation using a second order Dyson series so that U (θ) becomes
a linear-quadratic function of θ which then acts on the signal, noisy and noisy
states after they get coupled to the coherent bath. The optimal parameter θ is
then obtained by retaining only upto quadratic term in this parameter in the
matching energy function. Thus the optimal parameter θ satisfies a system of
linear matrix equations. While acting the Dyson series approximated unitary
U (θ) on the bath coupled states, the standard matrix elements of the creation
and annihilation processes of Hudson and Parthasarathy w.r.t coherent states
must be used.
[5] Applying the designed unitary U (θ) based on the above training sequence
of image field pairs to another pair of a signal and noisy image field with the noise
field generated from the same statistics as that used in the training sequence
and then testing whether the unitary has ”learnt” the statistics by computing
the ratio of mismatch error energy to total signal field energy.

[6] Definition of quantum Gaussian states in L2 (Rn ) which is isomorphic to


Γs (Cn ) via the quantum Fourier transform w.r.t the Weyl operator.
[7] Examples of Gaussian states constructed using canonical position and
momentum operators in L2 (Rn ).
[8] Evaluating the matrix elements of a Gaussian state specified in terms
of position and momentum operators w.r.t coherent states using the Stone-
Von-Neumann theorem on unitary isomorphism between two pairs of canonical
position and momentum operators and Williamson’s theorem on diagonalization
of positive definite matrices using symplectic matrices. Final expression for the
Gaussian states as the exponential of a weighted sum of number operators for n
quantum Harmonic oscillators, or equivalently as a diagonalized quadratic form
in the creation and annihilation operators for n harmonic oscillators.

[7] LDU and UDL decompositions in prediction theory


Let x(t), t ∈ Z be a stochastic process with correlation R(t, s) = E(x(t)x(s)), t, s ∈
Z. We predict x(t) linearly based on x(t − 1), ..., x(t − p). The predictor is given
by
x̂p (t) = −ap (t, 1)x(t − 1) − ... − ap (t, p)x(t − p)
and the corresponding prediction error is


ef (t|p) = x(t) − x̂p (t) = Σ_{k=0}^{p} ap (t, k)x(t − k), ap (t, 0) = 1

The prediction filter coefficients ap (t, k) are determined from the normal equa-
tions
< ef (t|p), x(t − k) >= 0, k = 1, 2, ..., p
or equivalently,


R(t, t − k) = −Σ_{m=1}^{p} ap (t, m)R(t − m, t − k), 1 ≤ k ≤ p

These are obtained by minimizing < ef (t|p), ef (t|p) >. It is easy to see that
ef (t|p) is orthogonal to ef (t − k|p − k), k = 1, 2, ..., p. Then, we can write

[ef (t|p), ef (t − 1|p − 1), ..., ef (t − p|0)]T =


⎛ 1  ap (t, 1)  ap (t, 2)  ...  ap (t, p) ⎞ ⎛ x(t) ⎞
⎜ 0  1  ap−1 (t − 1, 1)  ...  ap−1 (t − 1, p − 1) ⎟ ⎜ x(t − 1) ⎟
⎜ ..  ..  ..  ..  .. ⎟ ⎜ .. ⎟
⎝ 0  0  ...  0  1 ⎠ ⎝ x(t − p) ⎠
This can be expressed as

ep (t) = Up (t)xp (t), where ep (t) denotes the error vector above and xp (t) = [x(t), x(t − 1), ..., x(t − p)]^T



where Up (t) is the above upper triangular matrix. Forming the correlation
matrix on both sides then gives

Dp (t) = Up (t)Rp (t)Up (t)T

where
Rp (t) = ((R(t − k, t − m)))0≤k,m≤p
is a (p + 1) × (p + 1) positive definite matrix and

Dp (t) = diag[Ef (t|p), Ef (t − 1|p − 1), ..., Ef (t − p|0)]

with
Ef (t|p) =< ef (t|p), ef (t|p) >
This decomposition can also be expressed as

Rp (t) = Up (t)−1 Dp (t)Up (t)−T

which is a U DL decomposition of the data correlation matrix.
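A numerical sketch of this decomposition, assuming the stationary covariance r(k) = ρ^{|k|} (an illustrative AR(1)-type model): the normal equations are solved at each order and Up(t)Rp(t)Up(t)^T is checked to be diagonal.

```python
import numpy as np

rho, p = 0.8, 3
r = lambda k: rho ** abs(k)              # stationary covariance (AR(1) model)
Rfull = np.array([[r(i - j) for j in range(p + 1)] for i in range(p + 1)])

def coeffs(order):
    """Prediction-error coefficients [1, a(1), ..., a(order)] solving
    R(t, t-k) = -sum_m a(m) R(t-m, t-k), k = 1..order."""
    if order == 0:
        return np.array([1.0])
    Rm = np.array([[r(m - k) for m in range(1, order + 1)]
                   for k in range(1, order + 1)])
    rhs = np.array([r(k) for k in range(1, order + 1)])
    return np.concatenate([[1.0], np.linalg.solve(Rm, -rhs)])

# row i carries the order-(p-i) error ef(t-i | p-i) in terms of x(t), ..., x(t-p)
U = np.zeros((p + 1, p + 1))
for i in range(p + 1):
    a = coeffs(p - i)
    U[i, i:i + len(a)] = a

D = U @ Rfull @ U.T
assert np.allclose(D, np.diag(np.diag(D)), atol=1e-10)     # errors orthogonal
Ui = np.linalg.inv(U)
assert np.allclose(Ui @ D @ Ui.T, Rfull)                   # UDL of Rp(t)
```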


[8] Examination problems in applied linear algebra
In question [1] full marks will be given if any two parts are correct.
[1] Let x → ρ(x) be a mapping from a finite alphabet A into the convex set
of all density matrices in CN . Let p(.) be a probability distribution on A. For
δ > 0 and a large integer n, define the Bernoulli typical subset of An by:

TB (n, p, δ) = {u ∈ A^n : |N (x|u) − np(x)| ≤ δ√(np(x)(1 − p(x))) ∀x ∈ A}

and if ρ is any density matrix with spectral representation


ρ = Σ_{j=1}^{N} |j > q(j) < j|
and m any positive integer, define the Bernoulli typical orthogonal projection:
 
E(ρ^{⊗m} , δ) = Σ_{{(j1 ,...,jm ) : |N (j|j1 ,...,jm )−mq(j)| ≤ δ√(mq(j)(1−q(j))) ∀j=1,2,...,N }} |j1 , ..., jm >< j1 , ..., jm |

= Σ_{(j1 ,...,jm )∈TB (m,q,δ)} |j1 , ..., jm >< j1 , ..., jm |

Define
ρ(u) = ⊗_{x∈A} ρ(x)^{⊗N (x|u)} , u ∈ A^n
where N (x|u) denotes the number of times that the element x occurs in the
sequence u. Prove the following statements:
[a]
p^{(n)} (TB (n, p, δ)) ≥ 1 − a/δ²

where a is the number of elements in A.

[b]
p^{(n)} (u) = 2^{−nH(p)+δ·O(√n)} , u ∈ TB (n, p, δ)
where
H(p) = −Σ_{x∈A} p(x).log(p(x))

[c]
μ(TB (n, p, δ)) = 2^{n·H(p)+δ·O(√n)}
where μ(E) denotes the number of elements in the set E,

[d]
2^{−mH(ρ)−δ·O(√m)} E(ρ^{⊗m} , δ) ≤ ρ^{⊗m} E(ρ^{⊗m} , δ) ≤ 2^{−mH(ρ)+δ·O(√m)} E(ρ^{⊗m} , δ)

where
H(ρ) = −T r(ρ.log(ρ)) = −Σ_j q(j).log(q(j))

Deduce from [d] that if u ∈ TB (n, p, δ), then

[e]
2^{−n·Σ_x p(x)H(ρ(x))−δ·O(√n)} E(n, u, δ) ≤ ρ(u)E(n, u, δ)
≤ 2^{−n·Σ_x p(x)H(ρ(x))+δ·O(√n)} E(n, u, δ)

Deduce from [c] that

T r(E(ρ^{⊗m} , δ)) = μ(TB (m, q, δ)) = 2^{mH(ρ)+δ·O(√m)}

Deduce from [e] that for u ∈ TB (n, p, δ),

[f]
T r(E(n, u, δ)) = 2^{n·Σ_{x∈A} p(x)H(ρ(x))+δ·O(√n)}

[g] Let u1 , ..., uM ∈ TB (n, p, δ) be distinct and D1 , ..., DM be positive oper-


ators in CN with the constraint

Σ_{k=1}^{M} Dk ≤ IN

Assume that M is maximal subject to this constraint and

T r(ρ(uk )Dk ) ≥ 1 − ε, k = 1, 2, ..., M,



Dk ≤ E(n, uk , δ), k = 1, 2, ..., M


Then
[h] prove that there exists a constant γ > 0 depending only on ε, a, δ and
not on n such that

T r(ρ(u)·Σ_{k=1}^{M} Dk ) ≥ γ ∀u ∈ TB (n, p, δ)

and hence deduce that

[i]
M = Mn ≥ 2^{n·I(p,ρ)−δ·O(√n)}
so that
liminf_{n→∞} log(Mn )/n ≥ I(p, ρ)
where
I(p, ρ) = H(Σ_x p(x)ρ(x)) − Σ_x p(x)H(ρ(x))

[2] Calculate the row reduced echelon form of the following 3 × 3 matrix by
indicating the sequence of row and column operations:
⎛ ⎞
0 a12 a13
A = ⎝ 0 a22 a23 ⎠
0 0 0

where
a12 , a13 , a22 , a23 ≠ 0 and a12 a23 − a13 a22 ≠ 0

[3] Evaluate the spectral/eigen decomposition of the following real symmetric


(and hence Hermitian) matrix:
⎛ a11 a12 0 ⎞
A = ⎜ a12 a22 0 ⎟
⎝ 0 0 a33 ⎠

where
a11 , a12 , a22 , a33 ∈ R
Use this decomposition to evaluate exp(tA).

[4] State and prove the Cayley-Hamilton theorem. Use it to evaluate cos(A)
where
A = ⎛ a b ⎞
    ⎝ c d ⎠

where a, b, c, d are complex numbers such that the two eigenvalues of A are
distinct.
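For a 2 × 2 matrix with distinct eigenvalues, the Cayley-Hamilton theorem reduces cos(A) to a linear polynomial αA + βI determined by cos(λi) = αλi + β; a numerical sketch with an illustrative A:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])            # eigenvalues 4 and -1, distinct
l1, l2 = np.linalg.eigvals(A)
# solve cos(l_i) = alpha*l_i + beta for the interpolating linear polynomial
alpha = (np.cos(l1) - np.cos(l2)) / (l1 - l2)
beta = np.cos(l1) - alpha * l1
C = alpha * A + beta * np.eye(2)      # cos(A) = alpha*A + beta*I

# cross-check against the eigendecomposition cos(A) = V cos(D) V^{-1}
w, V = np.linalg.eig(A)
assert np.allclose(C, V @ np.diag(np.cos(w)) @ np.linalg.inv(V))
```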

[5] Write short notes on any two of the following:


[a] Converse of Shannon’s noisy coding theorem in the classical and quantum
contexts.
[b] Primary decomposition of a 4 × 4 matrix with two distinct eigenvalues
λ1 , λ2 and minimal polynomial

p(t) = (t − λ1 )2 (t − λ2 )

[c] Schrodinger’s wave mechanics and Heisenberg’s matrix mechanics.


[d] Jordan decomposition of a complex n×n matrix as a sum of a diagonable
matrix and a nilpotent matrix, both of which commute. Give a proof of the
existence and uniqueness of this decomposition.
[e] Rank-Nullity theorem.

[9] Simultaneous triangulability of a family of matrices of same size


Proof: Let F be a commuting family of n × n complex matrices. We wish to
show that there exists a basis for V = Cn so that in this basis all the elements
of F are lower triangular. Without loss of generality, we may assume that
F = {T1 , T2 , ..., Tr } is a finite family of matrices because F is contained in the
n2 -dimensional vector space of all n×n matrices and since F forms a commuting
family, we can replace F by a maximal linearly independent subset of F.
Now we shall prove that if W is a proper F-invariant subspace of V , then
there exists an α ∉ W such that Tj α is in the subspace spanned by α and W
for each j = 1, 2, ..., r. Equivalently, there exist constants c1 , ..., cr such that
(Tj − cj )α ∈ W, j = 1, 2, ..., r. Now suppose that we have proved this claim.
Then, it is easy to see that F is simultaneously triangulable. Indeed, to see
this, first choose an eigenvector e1 of T1 : (T1 − c1 )e1 = 0. Let W1 = N (T1 − c1 ).
Clearly W1 is F-invariant since F is a commuting family. If W1 = V , we are
through by induction on r. If W1 ≠ V , then by the above claim, there exists an
e2 ∉ W1 such that Tj e2 ∈ span{e2 , W1 }, j = 1, 2, ..., r. Moreover, by induction on
dim V , we can in this case choose a basis for W1 such that each Tj is lower
triangular in this basis, j = 1, 2, ..., r. Now, if span{e2 , W1 } = V , we are through;
otherwise, span{e2 , W1 } ≠ V . Obviously, span{e2 , W1 } is F-invariant
and hence by the above claim, we can choose an e3 ∉ span{e2 , W1 } such that
Tj e3 ∈ span{e3 , e2 , W1 } for each j = 1, 2, ..., r. Now, if span{e3 , e2 , W1 } = V , we
are through; otherwise, continue this process, which will terminate in at most n
steps by the finite dimensionality of V . Now, we come to the proof of the claim:
Let W be as above and choose, using the result given below, a vector e1 ∉ W
and a scalar c1 such that (T1 − c1 )e1 ∈ W . Let V1 denote the set of all x such
that (T1 − c1 )x ∈ W .
Then V1 is an F-invariant subspace that properly contains W , since it
contains e1 and W , and e1 ∉ W . (Note that W is F-invariant.) Let U2 denote the
restriction of T2 to V1 . Then by the result below, there exists an e2 ∈ V1 , e2 ∉ W ,
such that (U2 − c2 )e2 ∈ W for some scalar c2 , ie, (T2 − c2 )e2 ∈ W . Obviously,
(T1 − c1 )e2 ∈ W by the definition of V1 . Now let V2 denote the set of all x ∈ V1 such
that (T1 − c1 )x ∈ W, (T2 − c2 )x ∈ W . Then, V2 is F-invariant and contains both
e2 and W . In other words, V2 properly contains W and hence there exists an
e3 ∈ V2 , e3 ∉ W , and a scalar c3 such that (T3 − c3 )e3 ∈ W . Obviously, by the definition
of V2 , we have that (T1 − c1 )e3 ∈ W, (T2 − c2 )e3 ∈ W . Continuing in this
way, after r steps, we obtain a vector er ∉ W and scalars c1 , ..., cr such that
(Tj − cj )er ∈ W, j = 1, 2, ..., r, and this completes the proof of the theorem.
Note that we have made use of the following result:
Result: Let T be a linear operator in V and let W be a proper T -invariant
subspace. Then, there exists a vector α ∉ W such that T α ∈ span{α, W }. To
prove this, we choose a β ∉ W and note that if p0 is the monic generator of the
ideal of all polynomials p for which p(T )β ∈ W (this set is not empty since it
contains the minimal polynomial of T ), then we can write p0 (t) = (t − c)q(t)
where c is a scalar and q is another polynomial, and by the minimality of deg p0 ,
it follows that α = q(T )β ∉ W while (T − c)α ∈ W . This completes the proof.
Note that we have used the T -invariance of W in stating that the above set is
an ideal in the algebra of polynomials.

[10] A problem in linear algebra


Let W1 , W2 be two subspaces of a vector space such that the set theoretic
union W = W1 ∪ W2 is also a vector space. Then show that either W1 ⊂ W2 or
else W2 ⊂ W1 .
Suppose that this is false. Then W = W1 ∪ W2 properly contains either
W1 or W2 . Suppose that it properly contains W1 . Let {e1 , ..., er } be a basis
for W1 and extend this to a basis {e1 , ..., er , er+1 , ..., en } for W . It is clear that
since W properly contains W1 , n > r. Moreover, we may assume without loss
of generality that er+1 , ..., en ∈ W2 for otherwise, we can subtract vectors in
W1 = span{e1 , ..., er } from er+1 , ..., en without altering the basis property and
yielding er+1 , ..., en ∈ W2 . Now, Then since W = W1 ∪ W2 is a subspace,
e1 + ... + en ∈ W , it follows that e1 + ... + en is either a linear combination of
e1 , ..., er or else a linear combination of er+1 , ..., en which is clearly false since
e1 , ..., en are linearly independent vectors.

[11] Some results in operator theory


[1] Spectral theorem for compact Hermitian operators in a Hilbert space.
Let T be a compact operator in a Banach space X. The spectrum of T is
then countable and has no non-zero accumulation point. Every element of the
spectrum of T is an eigenvalue of T with finite multiplicity. Further, if c is an
eigenvalue of T with multiplicity m, then c̄ is an eigenvalue of T ∗ with the same
multiplicity m.
Proof:
[a] Suppose cn is a sequence of distinct eigenvalues with eigenvectors un such that
cn → c ≠ 0. Let Mn = span{u1 , ..., un }. Mn is T -invariant. We can choose
vn ∈ Mn − Mn−1 such that d(vn , Mn−1 ) = 1 and ‖vn ‖ = 1. For example, if X
Advanced Probability and Statistics: Applications to Physics and Engineering 85


is a Hilbert space, choose any vn ∈ Mn ∩ Mn−1⊥ having unit norm. Now, vn /cn is
bounded and hence T (vn )/cn has a convergent subsequence. But m < n implies

T (vn )/cn − T (vm )/cm = vn − (T (vm )/cm − (T − cn )(vn )/cn )

Now it is easy to see that

(T − cn )(vn ) ∈ Mn−1

and hence,
T (vn )/cn − T (vm )/cm ∈ vn − Mn−1
Thus,
‖T (vn )/cn − T (vm )/cm ‖ ≥ d(vn , Mn−1 ) = 1
which contradicts the fact that T (vn )/cn has a convergent subsequence.
[b] Suppose c ≠ 0 is not an eigenvalue of T . Then R(T − c) is closed. Indeed,
suppose un is a bounded sequence such that (T − c)un → v. We must show that
v ∈ R(T − c). Since T is compact and un is bounded, there is a subsequence
up(n) of un such that T (up(n) ) → w. Then,

c.up(n) = T (up(n) ) − (T − c)(up(n) ) → w − v

or
up(n) → (w − v)/c = x,
say, and we get from the continuity of T ,

T (x) = lim T (up(n) ) = w

and therefore,
T (x) − v = cx
or equivalently,
v = (T − c)(x) ∈ R(T − c)
proving the claim for bounded sequences. Now suppose un is a sequence with
‖un ‖ → ∞ and (T − c)(un ) → v. We must show that v ∈ R(T − c). But defining
wn = un /‖un ‖,
we get that wn is a bounded sequence and

(T − c)(wn ) → 0

By compactness of T , wn has a convergent subsequence wp(n) such that

T (wp(n) ) → w

Then,
c.wp(n) → w
or equivalently,
wp(n) → w/c ≠ 0
(note that ‖wp(n) ‖ = 1 gives ‖w/c‖ = 1, so w ≠ 0)
and hence
T (wp(n) ) → T (w)/c
and therefore
T (w)/c = w
ie,
(T − c)(w) = 0
ie, c is an eigenvalue of T , a contradiction. Thus ‖un ‖ cannot tend to ∞, so un
has a bounded subsequence and the claim follows from the previous case.
Let B be a closable operator in a Hilbert space. Then R(B̄)⊥ = R(B)⊥ .
Indeed, suppose x ∈ R(B)⊥ . It suffices to show that x ∈ R(B̄)⊥ . We have that
< x, By > = 0 ∀y ∈ D(B). Choose z ∈ R(B̄). Then there is a sequence yn ∈ D(B)
such that yn → y, Byn → z and z = B̄y. Then

< x, z >=< x, B̄y >= lim < x, Byn >= 0

proving the claim.

Let A be essentially self-adjoint. Then R(A ± i)⊥ = {0}, ie, R(A ± i) is dense
in H. Indeed, suppose z ∈ R(A + i)⊥ . Then

< z, (A + i)x >= 0∀x ∈ D(A)

This implies that


< z, (Ā + i)x > = 0 ∀x ∈ D(A)
But Ā is self-adjoint and hence

(Ā − i)z = 0

(Note that D(A) is dense in H.) Taking the norm on both sides and using the
fact that Ā is self-adjoint, we get

‖(Ā − i)z‖² = ‖Āz‖² + ‖z‖² = 0

and hence z = 0. Thus, R(A + i)⊥ = {0}. Likewise, we establish that R(A − i)⊥ = {0}.

[12] Problems in operator theory


[1] Define the adjoint of an operator in a Hilbert space.
If A is a densely defined operator in H, ie, D(A) is dense in H, then we
define D(A∗ ) as the set of all g ∈ H such that there exists a ug ∈ H such that

< g, Af >=< ug , f > ∀f ∈ D(A)

In this case, we define


A∗ g = ug

Show that A∗ is a well defined unique linear operator in H. For proving the
uniqueness, you must use the density of D(A).
[2] Let A be a linear operator in H with dense domain D(A). Show that A∗
is a closed operator in H.
hint: fn ∈ D(A∗ ), fn → f, A∗ fn → g imply that for all h ∈ D(A),

< A∗ fn , h >=< fn , Ah >→< f, Ah >

and
< A∗ fn , h >→< g, h >
proving that
< f, Ah >=< g, h > ∀h ∈ D(A)
and therefore,
g = A∗ f
so that A∗ is closed.
[3] If an operator A in H is closed and D(A) = H, then A is bounded.
Equivalently, if A is a closed invertible operator in H with range H, then A−1
is bounded, ie, A maps open sets into open sets. These two equivalent versions
of the same result are respectively known as the closed graph theorem and the
open mapping theorem. Its proof is based on Baire’s category theorem in general
topology.
Let A be closed with D(A) = H. Let S = A−1 (B(0, 1)). Then H is the
union of nS, n = 1, 2, .... Hence, by the category theorem, S̄ contains a ball
K = B[u, r], say. If ‖x‖ < 2r, we can write x = u1 − u2 with u1 , u2 ∈ K,
ie ‖uk − u‖ ≤ r, k = 1, 2. Since u1 , u2 ∈ S̄, there exist sequences
u1 (n), u2 (n) ∈ S such that uk (n) → uk , k = 1, 2. Then,

‖A(u1 (n) − u2 (n))‖ < 2

and hence
u1 (n) − u2 (n) ∈ 2S
Thus taking limits,
x = u1 − u2 ∈ 2S̄
By scaling, this argument implies that x ∈ B(0, λr) implies x ∈ λS̄ for every λ > 0.
Now fix any ε ∈ (0, 1) and choose any x ∈ B(0, r). Then x ∈ S̄ and hence there
exists v1 ∈ S such that ‖x − v1 ‖ < εr. Then, it follows that x − v1 ∈ εS̄. So
there exists v2 ∈ εS such that ‖x − v1 − v2 ‖ < ε²r and hence x − v1 − v2 ∈ ε²S̄.
Continuing this way, we get for each n = 1, 2, ... a vn ∈ εⁿ⁻¹S such that
‖x − v1 − v2 − ... − vn ‖ < εⁿr. Thus,

Σ_{j=1}^{n} vj → x,

and

‖A(Σ_{j=m+1}^{n} vj )‖ ≤ Σ_{j=m+1}^{n} εʲ⁻¹

Thus, A(Σ_{j=1}^{n} vj ), n = 1, 2, ... is Cauchy in H and hence, by the closedness of
A, it converges to A(x). This means that

‖A(x)‖ ≤ Σ_{j=1}^{∞} ‖A(vj )‖ ≤ Σ_{j=1}^{∞} εʲ⁻¹ = (1 − ε)⁻¹

It follows that
sup_{x∈B(0,r)} ‖Ax‖ ≤ (1 − ε)⁻¹
thereby establishing that A is a bounded operator.

[13] Spectral theorem in infinite dimensional Hilbert space


T is a bounded Hermitian operator in a Hilbert space. Its absolute value

|T | = (T ²)^{1/2}

is defined via an iterative process, ie, as the operator norm limit of a sequence
of polynomials in T . Define

T+ = (|T | + T )/2, T− = (|T | − T )/2

Then,
T± ≥ 0, T = T+ − T− , |T | = T+ + T−
Let E denote the projection onto N (T+ ). Then since

T+ T− = 0

it follows that
R(T− ) ⊂ N (T+ ) = R(E)
and hence
ET− = T−
Taking adjoints gives us
T− E = T−
and hence,
T− = ET− = T− E = ET− E
Let [S, T ] = 0 with S ≥ 0, S ≥ T . Then S ≥ T+ . To see this we note that
[S, T+ ] = 0 and hence

S = ESE + ES(1 − E) + (1 − E)SE + (1 − E)S(1 − E)

T+ SE = ST+ E = 0
so that R(SE) ⊂ N (T+ ) = R(E), and hence
SE = ESE

and hence
(1 − E)SE = 0
Thus,

S = ESE + (1 − E)S(1 − E) ≥ (1 − E)S(1 − E) ≥ (1 − E)T (1 − E) = T+

Note that E projects onto N (T+ ) = R(T+ )⊥ and hence 1 − E projects onto
R(T+ ). Hence (1−E)T+ = T+ and taking adjoints, we get T+ (1−E) = T+ . This
also directly follows from T+ E = 0. Also T+ T− = T− T+ = (|T |2 − T 2 )/4 = 0
implies ET− = T− and hence (1 − E)T− = T− (1 − E) = 0. Thus,

(1 − E)T (1 − E) = (1 − E)(T+ − T− )(1 − E) = (1 − E)T+ (1 − E) = T+

Now let μ > λ. We have that

T − μ ≤ T − λ ≤ (T − λ)+

and hence by the above result,

(T − λ)+ ≥ (T − μ)+

Thus, if E(λ) is the projection onto N ((T − λ)+ ), then we have that

E(λ) ≤ E(μ)

Then,

(T − λ)(E(μ) − E(λ)) = (T − μ)(E(μ) − E(λ)) + (μ − λ)(E(μ) − E(λ))

≤ (T − μ)+ (E(μ) − E(λ)) + (μ − λ)(E(μ) − E(λ))


= (μ − λ)(E(μ) − E(λ))
and also
(T − λ)(E(μ) − E(λ)) ≥ −(T − λ)− (E(μ) − E(λ))
Now
(T − λ)− (T − λ)+ = 0
and hence
(T − λ)− (1 − E(λ)) = 0
It follows that since μ > λ,

(T − λ)− (E(μ) − E(λ)) ≤ (T − λ)− (1 − E(λ)) = 0

and hence
(T − λ)(E(μ) − E(λ)) ≥ 0
and therefore from the above two inequalities,

‖(T − λ)(E(μ) − E(λ))‖ ≤ μ − λ, μ > λ



from which it follows at once that we have the spectral representation:

T = ∫ x dE(x)

(We let M = ‖T ‖ and choose a partition P : −M = c0 < c1 < ... < cn = M )
and then note from the above that

‖T − Σ_{j=0}^{n−1} cj (E(cj+1 ) − E(cj ))‖²

= ‖Σ_{j=0}^{n−1} (T − cj )(E(cj+1 ) − E(cj ))‖²

≤ ‖Σ_{j=0}^{n−1} (cj+1 − cj )(E(cj+1 ) − E(cj ))‖²

≤ Σ_{j=0}^{n−1} (cj+1 − cj )² ≤ 2M |P | → 0, |P | → 0

Note that we have used the fact that since E(cj ) ≤ E(cj+1 ) ∀j and since the
E(cj )'s are orthogonal projections, the subspaces R(E(cj+1 ) −
E(cj )), j = 0, 1, 2, ..., n − 1 are all mutually orthogonal.
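The decomposition above can be illustrated concretely. The Python sketch below is my own example; it assumes T is already diagonal, so that |T| = (T²)^{1/2}, T±, and the spectral family E(λ) all act entrywise on the list of eigenvalues. It checks T = T+ − T−, T+T− = 0, the monotonicity of E(λ), and the Riemann-sum reconstruction T ≈ Σ cj(E(cj+1) − E(cj)).

```python
import math

spec = [-3.0, -1.0, 2.0, 5.0]            # eigenvalues of a diagonal Hermitian T

absT    = [math.sqrt(t * t) for t in spec]           # |T| = (T^2)^{1/2}
T_plus  = [(a + t) / 2 for a, t in zip(absT, spec)]
T_minus = [(a - t) / 2 for a, t in zip(absT, spec)]

# T = T_+ - T_-,  |T| = T_+ + T_-,  T_+ T_- = 0  (entrywise for diagonals)
assert all(abs((p - m) - t) < 1e-12 for p, m, t in zip(T_plus, T_minus, spec))
assert all(abs((p + m) - a) < 1e-12 for p, m, a in zip(T_plus, T_minus, absT))
assert all(abs(p * m) < 1e-12 for p, m in zip(T_plus, T_minus))

def E(lam):
    # E(lam) projects onto N((T - lam)_+): the eigenvalues <= lam
    return [1.0 if t <= lam else 0.0 for t in spec]

assert all(a <= b for a, b in zip(E(0.0), E(3.0)))   # E(lam) <= E(mu), lam < mu

# Riemann-sum reconstruction T ~ sum_j c_j (E(c_{j+1}) - E(c_j))
cuts = [-6.0 + 0.01 * k for k in range(1201)]        # partition of [-6, 6]
approx = [0.0] * len(spec)
for c0, c1 in zip(cuts, cuts[1:]):
    jump = [b - a for a, b in zip(E(c0), E(c1))]
    approx = [s + c0 * j for s, j in zip(approx, jump)]
print(approx)    # close to spec, within the partition mesh 0.01
```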

[14] Riesz representation theorem


Let H be a Hilbert space and let ψ be a non-zero bounded linear functional
on H. Consider
N (ψ) = {x ∈ H : ψ(x) = 0}
Then since ψ is bounded, N (ψ) is a closed subspace of H. Let P denote the
orthogonal projection onto N (ψ). Let y be non-zero unit vector in N (ψ)⊥ .
Such a vector exists because ψ is not the zero linear functional. Also note that
N (ψ)⊥ has dimension unity. Let x ∈ H. Then, define

z = x− < y, x > y

Then, obviously z ∈ N (ψ) and hence

0 = ψ(z) = ψ(x)− < y, x > ψ(y)

whence
ψ(x) =< u, x >, u = ψ̄(y).y
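The construction in the proof can be traced numerically. In the Python sketch below, the functional ψ on C² and the unit vector y spanning N(ψ)⊥ are my own illustrative choices; the recipe u = ψ̄(y).y then reproduces ψ as an inner product.

```python
import math

def inner(a, b):
    # < a, b >: conjugate-linear in the first argument, as in the text
    return sum(x.conjugate() * y for x, y in zip(a, b))

def psi(x):
    # a bounded linear functional on C^2 (illustrative choice)
    return 2 * x[0] - 1j * x[1]

# N(psi) = span{(i, 2)}; y is a unit vector in N(psi)^perp
nrm = math.sqrt(5.0)
y = (2 / nrm, 1j / nrm)
assert abs(psi((1j, 2))) < 1e-12 and abs(inner(y, (1j, 2))) < 1e-12

# the representing vector of the proof: u = conj(psi(y)).y
u = tuple(psi(y).conjugate() * c for c in y)

# check psi(x) = <u, x> on a few vectors
for x in [(1, 0), (0, 1), (1 + 2j, -3j), (0.5, 0.25j)]:
    assert abs(psi(x) - inner(u, x)) < 1e-12
print(u)    # approximately (2, i)
```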

[15] All norms on a finite dimensional vector space are equivalent



Let ‖·‖ be a norm on a finite dimensional vector space V . Let {e1 , ..., eN }
be a basis for V . Clearly, the inequality

‖Σ_{k=1}^{N} x(k)ek ‖ ≤ K.Σ_{k=1}^{N} |x(k)|

where
K = max{‖ek ‖ : 1 ≤ k ≤ N }
holds good. So it suffices to show that there exists a δ > 0 such that for all
x(1), ..., x(N ) ∈ C, we have


δ.Σ_{k=1}^{N} |x(k)| ≤ ‖Σ_{k=1}^{N} x(k)ek ‖

Suppose that this is false. Then there exists a sequence of non-zero vectors
(x(1, n), ..., x(N, n)), n = 1, 2, ... in C^N such that

δn .Σ_{k=1}^{N} |x(k, n)| > ‖Σ_{k=1}^{N} x(k, n)ek ‖

for all n where δn → 0. Then defining


y(k, n) = x(k, n)/Σ_{j=1}^{N} |x(j, n)|

we find that
y(n) = (y(1, n), ..., y(N, n))
is a bounded sequence in CN such that


δn > ‖Σ_{k=1}^{N} y(k, n)ek ‖, ∀n

By the Bolzano-Weierstrass theorem for CN , y(n) has a convergent subsequence


say
y(n(k)) = (y(1, n(k)), ..., y(N, n(k))), k = 1, 2, ...
Thus,
y(n(k)) → z = (z(1), ..., z(N )), k → ∞
and hence using continuity of the norm, we get


‖Σ_{m=1}^{N} z(m)em ‖ = lim_{k→∞} ‖Σ_{m=1}^{N} y(m, n(k))em ‖ ≤ lim_{k→∞} δ_{n(k)} = 0

Thus z(k) = 0, k = 1, 2, ..., N , which contradicts the fact that

Σ_{k=1}^{N} |z(k)| = 1

in view of

Σ_{k=1}^{N} |y(k, n)| = 1
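The two constants can be estimated numerically for a concrete norm and basis. In the Python sketch below (my own example: the Euclidean norm on R² with basis e1 = (1, 0), e2 = (1, 1)), sampling the l¹ unit sphere yields a strictly positive δ and a finite K, as the theorem guarantees.

```python
import math

e1, e2 = (1.0, 0.0), (1.0, 1.0)       # illustrative basis for R^2

def norm_of(x1, x2):
    # || x1 e1 + x2 e2 ||  in the Euclidean norm
    v = (x1 * e1[0] + x2 * e2[0], x1 * e1[1] + x2 * e2[1])
    return math.hypot(v[0], v[1])

# sample the l^1 unit sphere |x1| + |x2| = 1 with all four sign patterns
ratios = []
for k in range(2001):
    t = k / 2000.0
    for s1, s2 in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        ratios.append(norm_of(s1 * t, s2 * (1.0 - t)))

delta, K = min(ratios), max(ratios)
print(delta, K)    # delta is strictly positive, K is finite
```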

[16] Proof of the Bolzano Weierstrass theorem in RN in terms of


the Bolzano Weierstrass theorem in R
Theorem Let (x1 (n), ..., xN (n)), n = 1, 2, ... be a bounded sequence in RN .
Then, for each k = 1, 2, ..., N , xk (n), n = 1, 2, ... is a bounded sequence in R.
Choose a subsequence (n1 (k)) of (n) such that limk x1 (n1 (k)) = z1 exists. This
is possible in view of the Bolzano-Weierstrass theorem for R. Then, choose a
subsequence (n2 (k)) of (n1 (k)) such that limk x2 (n2 (k)) = z2 exists. Continuing
this way, we finally choose a subsequence (nN (k)) of (nN −1 (k)) such that
limk xN (nN (k)) = zN exists. Then, obviously,

lim(x1 (nN (k)), x2 (nN (k)), ..., xN (nN (k))) = (z1 , z2 , ..., zN )

This proves the theorem. Note that this proof fails if the dimension N →
∞, since the iterative procedure of choosing convergent subsequences does not
terminate and in fact there will generally not be any ”final subsequence”.
[17] Proof of the theorem that a Hilbert space is finite dimensional
iff the closed unit ball is compact
Let H be a finite dimensional Hilbert space, say dimH = N < ∞. Let

B = {x ∈ H : x ≤ 1}

Since, as proved earlier, all norms on a finite dimensional vector space are
equivalent, there exist finite positive numbers K1 , K2 such that

K1 .Σ_{k=1}^{N} |x(k)| ≤ ‖Σ_{k=1}^{N} x(k)ek ‖ ≤ K2 .Σ_{k=1}^{N} |x(k)|

for all x(k) ∈ C, k = 1, 2, ..., N , where {e1 , ..., eN } is a basis for H. Then, let


zn = Σ_{k=1}^{N} xn (k)ek

be a bounded sequence in H. Then we get from the left-hand inequality above
that Σ_{k=1}^{N} |xn (k)| is a bounded sequence in R and hence

un = (xn (1), ..., xn (N ))



is a bounded sequence in CN and hence has a convergent subsequence say

(xn(m) (1), ..., xn(m) (N )) → (v(1), ..., v(N ))

Then again, from the right-hand inequality above, we get

‖Σ_{k=1}^{N} xn(m) (k)ek − Σ_{k=1}^{N} v(k)ek ‖ ≤ K2 .Σ_{k=1}^{N} |xn(m) (k) − v(k)| → 0, m → ∞

which proves that Σ_{k=1}^{N} xn (k)ek has a convergent subsequence in H and hence
we have proved that the closed unit ball in H is compact. Note that the closed
unit ball is a bounded subset of H. Conversely, suppose H is infinite dimensional.
Then, we can choose an infinite sequence of linearly independent vectors
x1 , x2 , ... in H such that

xn+1 ∉ Mn = span{x1 , ..., xn }, n = 1, 2, ...

Define
en+1 = (xn+1 − Pn xn+1 )/‖xn+1 − Pn xn+1 ‖
where Pn is the orthogonal projection onto Mn . Then {en : n = 1, 2, ...} forms
an orthonormal set in H and hence does not have any Cauchy subsequence
which proves that the closed unit ball is non-compact.
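The Gram-Schmidt construction used in the converse direction can be carried out numerically. In the sketch below (my own example vectors), the resulting orthonormal en satisfy ‖ei − ej‖ = √2 for i ≠ j, which is exactly why {en} has no Cauchy subsequence.

```python
import math

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def sub(a, b): return [x - y for x, y in zip(a, b)]
def scale(a, c): return [c * x for x in a]

def gram_schmidt(vectors):
    # classical Gram-Schmidt: subtract the projection P_n x, then normalize
    es = []
    for x in vectors:
        p = x[:]
        for e in es:
            p = sub(p, scale(e, dot(e, x)))
        es.append(scale(p, 1.0 / norm(p)))
    return es

# linearly independent vectors in R^4 (stand-ins for the x_n of the proof)
xs = [[1, 1, 0, 0], [1, 2, 3, 0], [0, 1, 1, 1], [2, 0, 0, 1]]
es = gram_schmidt(xs)

dists = [norm(sub(es[i], es[j])) for i in range(4) for j in range(i + 1, 4)]
print(dists)   # every pairwise distance equals sqrt(2)
```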

[18] Syllabus for end-sem exam for SPC01, Applied linear algebra
[1] Schur’s lemmas and the Peter-Weyl theorem with proofs based on the
Schur orthogonality relations and the completeness proof based on the spectral
theory for compact self-adjoint operators in a Hilbert space. Intertwining op-
erators, irreducible representations, completely reducible representations, proof
that unitary representations of a group are completely reducible. Left and right
invariant Haar measures on locally compact groups.
[2] Proof of the Cq Shannon coding theorem. Proof of Schumacher’s noiseless
quantum coding theorem.
[3] Differential equations satisfied by the scale factor S(t) and the density
and pressure ρ(t), p(t) in an expanding universe that is homogeneous, isotropic
and spatially flat.
[4] The general form of the energy-momentum tensor of the matter fluid
taking into account viscous and thermal effects. Derivation based on the second
law of thermodynamics.
[5] The linearized Einstein field equations applied to the derivation of the
propagation of inhomogeneities in the form of metric perturbations, density and
pressure perturbations and the velocity field perturbations

[6] Jordan canonical form, Jordan decomposition, primary decomposition,


examples of minimal and characteristic polynomials, row reduced echelon form,
calculation of functions of a matrix using the Cayley-Hamilton theorem.
[7] Proof of the Knill-Laflamme theorem for quantum error correcting codes.
What is a t-error correcting quantum code ? Examples using the Weyl operators
acting on L2 (An ) where A is a finite Abelian group.
[8] Commutation properties of creation, annihilation and conservation oper-
ator fields in Boson Fock space.
[9] Derivation of the EKF equations by linearizing the Kushner-Kallianpur
non-linear filtering equations.
[10] Derivation of the Belavkin quantum filtering equations using the refer-
ence probability method of Gough and Köstler.
[11] Basics of operator theory in infinite dimensional Hilbert spaces: spectral
norm of an operator, compactness of the closed unit ball and its relationship to
finite dimensionality of the Hilbert space, topological equivalence of all norms
in a finite dimensional vector space

[19] Questions on applied linear algebra


[1] Let X1 , X2 , ... be iid random variables with moment generating function

M (λ) = E[exp(λX1 )]

Assume that EX1 = 0 and let Sn = X1 + ... + Xn . Prove that

Pr(|Sn |/n > δ) ≈ exp(−n.inf {I(x) : |x| > δ}), n → ∞

where
I(x) = sup_λ (λx − log(M (λ)))
Show that
inf {I(x) : |x| > δ} = sup_{λ≥0} (λδ − log(M (λ)))
What is the meaning of this result ?
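A hedged numeric illustration of this problem (my own worked example, taking the Xi to be fair ±1 coin flips so that M(λ) = cosh(λ)): the rate function I(x) computed by grid maximization matches the known closed form, and the infimum of I over the tail {|x| > δ} agrees with sup_{λ≥0}(λδ − log M(λ)), since here I is even and convex with its minimum at the mean 0.

```python
import math

def logM(lam):
    # log moment generating function of X = +/-1 with probability 1/2
    return math.log(math.cosh(lam))

def I(x):
    # rate function I(x) = sup_lam (lam*x - log M(lam)), by grid search
    return max(lam * x - logM(lam) for lam in
               [k * 0.001 - 3.0 for k in range(6001)])

delta = 0.5
# closed form for the +/-1 coin: I(x) = (1+x)/2 log(1+x) + (1-x)/2 log(1-x)
closed = 0.75 * math.log(1.5) + 0.25 * math.log(0.5)
assert abs(I(delta) - closed) < 1e-4

# inf{I(x) : |x| > delta}: attained at |x| = delta by convexity ...
tail = min(I(x) for x in [delta + 0.01 * k for k in range(51)])

# ... and equal to sup_{lam >= 0}(lam*delta - log M(lam))
sup0 = max(lam * delta - logM(lam) for lam in [k * 0.001 for k in range(3001)])
print(tail, sup0)
```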

[2] (a) State and prove the contraction principle in large deviation theory.
Specifically, show that if I(x) is the rate function for the family of random
variables Z(ε), ε → 0, then the rate function for the family ψ(Z(ε)), ε → 0 is
given by
Iψ (x) = inf {I(z) : ψ(z) = x}
Let X1 , X2 , ... be iid N (0, σ²) random variables and define

Sn = X1 + ... + Xn

Consider the stochastic dynamical system in discrete time

Yn+1 = f (Yn ) + √ε.Xn+1

Show that for any Borel subset B of R^N ,

lim_{ε→0} ε.log(P ((Y1 , ..., YN ) ∈ B)) =

−inf {Σ_{n=0}^{N−1} (yn+1 − f (yn ))²/2σ², (y1 , ..., yN ) ∈ B}

What is the interpretation of this result from the viewpoint of stability of a


dynamical system in the presence of white Gaussian noise?

[20] Hudson-Parthasarathy quantum stochastic calculus


[1] Prove that if T is a closed symmetric operator in a Hilbert space H, then
T is self-adjoint iff R(T + c ± i) = H for any real non-zero c. Show further that
if T is a self-adjoint operator and A is a closed symmetric operator such that
D(T ) ⊂ D(A) and ‖Ax‖ ≤ a‖T x‖ + b‖x‖ ∀x ∈ D(T ), where 0 ≤ a < 1,
then ‖A(T ± ci)⁻¹‖ < 1 for all sufficiently large real c. Hence deduce that
R(T + A ± i) = H, implying that T + A is self-adjoint.

[2] If an , n = 1, 2, ... are operators in a Hilbert space H satisfying [an , am ] =


0, [an , a∗m ] = δnm , then (a) show that for u, v ∈ l²(Z+ ) and

a(u) = Σ_n ūn an

we have
[a(u), a(v)] = 0, [a(u), a(v)∗ ] =< u, v >
Now define, for an infinite matrix H = (Hnm ), the operator

Λ(H) = Σ_{n,m} Hnm a∗n am

Show that
[a(u), Λ(H)] = a(Hu)
where
(Hu)n = Σ_m Hnm um

Now let {|en >} be an orthonormal basis for the Hilbert space H ⊗ L2 (R+ ) and
define
At (u) = Σ_n < uχ[0,t] , en > an

where
u ∈ H ⊗ L2 (R+ )
Prove the quantum Ito formula:

dAt (u).dAt (v)∗ =< u, v > dt

Your may do this problem by considering the matrix elements on both sides
w.r.t coherent vectors |e(w) > defined by

an |e(w) >=< en , w > |e(w) >, w ∈ H ⊗ L2 (R+ )



Also use the fact that (prove it)


[At (u), As (v)∗ ] = ∫_0^{min(t,s)} < u(α), v(α) > dα

[21] Consider the following unperturbed metric of space-time

dτ 2 = dt2 − S(t)2 (dx2 + dy 2 + dz 2 )

Let S(t) be such that the Einstein field equations

Rμν = −8πG(Tμν − (1/2)T gμν )

are satisfied with

T00 = ρ(t), Trr = −p(t)grr , r = 1, 2, 3, T0r = 0, r = 1, 2, 3

First prove that such a solution exists. Write down the first order perturbed
field equations taking the perturbed metric tensor as

gμν + δgμν , δg0μ = 0,

the covariant components of the perturbed velocity field as

1 + δv0 , δvr , r = 1, 2, 3,

the perturbed density field as

ρ(t) + δρ(t, r)

and the perturbed pressure field as

p(t) + δp(t, r)

[22] Write short notes on any two of the following:


[a] Post-Newtonian celestial mechanics
[b] Post-Newtonian Einstein field equations
[c] Post-Newtonian hydrodynamics
[d] Proof of the Knill-Laflamme theorem in quantum error correcting codes.

[23] Consider the following situation: The underlying Hilbert space is H =
C^n. The code subspace is C = C^r. Thus ρ is a density matrix whose range falls
in the subspace C iff it has the block structure

ρ =
[ ρ1  0 ]
[ 0   0 ]  ,  ρ1 ∈ C^{r×r}

The noise manifold N consists of all matrices having the block structure

N =
[ 0    X ]
[ cY   Z ]

where X is an arbitrary r × (n − r) matrix, c is an arbitrary complex number,


Y is a fixed (n − r) × r matrix satisfying Y ∗ Y = Ir and Z is an arbitrary
(n − r) × (n − r) matrix. Show that N is a vector space and secondly show
that the Knill-Laflamme conditions are satisfied for this choice of (C, N ). Now
explicitly construct the appropriate recovery operators.
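The Knill-Laflamme condition P Ni∗ Nj P = λij P can be checked numerically for the block structure above. In the sketch below, the choices n = 3, r = 1, Y = (1, 0)^T (so Y∗Y = I_1), and the particular X, c, Z entries are my own illustrative ones; for this structure the expected scalar is λ12 = c̄1 c2.

```python
def mat(rows): return [list(r) for r in rows]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

P = mat([[1, 0, 0], [0, 0, 0], [0, 0, 0]])      # projection onto C = C^1

def noise(c, X, Z):
    # N = [[0, X], [cY, Z]] with the fixed isometry Y = (1, 0)^T
    return mat([[0, X[0], X[1]],
                [c, Z[0][0], Z[0][1]],
                [0, Z[1][0], Z[1][1]]])

N1 = noise(2 + 1j, [1, -1j], [[0, 1], [1j, 3]])
N2 = noise(-1j,    [5,  2 ], [[1, 0], [2,  1]])

# Knill-Laflamme: P N1* N2 P should equal conj(c1)*c2 * P
M = mul(P, mul(mul(dagger(N1), N2), P))
lam = (2 + 1j).conjugate() * (-1j)
assert all(abs(M[i][j] - lam * P[i][j]) < 1e-12
           for i in range(3) for j in range(3))
print(lam)
```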

[24] Let A be the set {0, 1, ..., N − 1} with addition modulo N . For a, b ∈ A,
define
< a, b >= exp(2πiab/N )
Let |a > denote the N ×1 vector having a one in the a+1th position and zeros at
all the other positions. Thus, L2 (A) = CN has the onb {|a >: a = 0, 1, ..., N −1}
and we define the operators U (a), V (a) on this space by

U (a)|x >= |x + a >, V (a)|x >=< a, x > |x >

Prove that W (a, b) = U (a)V (b), a, b ∈ A are unitary operators in L2 (A) and
that
V (b)U (a) =< b, a > U (a)V (b), a, b ∈ A
Prove that
T r(W (a, b)∗ W (u, v)) = N δa,u δb,v
and hence that {N −1/2 W (a, b), a, b ∈ A} forms an orthonormal basis for the
space B(L2 (A)) of all linear operators in L2 (A) with inner product

< X, Y >= T r(X ∗ Y )

Write down the spectral representations of the Abelian unitary groups of op-
erators {U (a), a ∈ A} and {V (a), a ∈ A}. Specifically, determine the spectral
families {P (a), a ∈ A} and {Q(a), a ∈ A} such that
 
U (a) = Σ_{b∈A} < a, b > P (b), V (a) = Σ_{b∈A} < a, b > Q(b)

You may have to use the identity



Σ_{a∈A} < c, a >< a, b >∗ = N δc,b , c, b ∈ A
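The Weyl commutation relation and the trace orthogonality asked for above can be verified directly for small N. The Python sketch below (my own illustration, N = 4, matrices as nested lists) builds U(a) and V(b) and checks both identities.

```python
import cmath

N = 4

def w(a, b):
    # the pairing <a, b> = exp(2 pi i a b / N)
    return cmath.exp(2j * cmath.pi * a * b / N)

def U(a):   # shift: U(a)|x> = |x + a>
    return [[1.0 if (r - a) % N == c else 0.0 for c in range(N)]
            for r in range(N)]

def V(b):   # phase: V(b)|x> = <b, x>|x>
    return [[w(b, r) if r == c else 0.0 for c in range(N)]
            for r in range(N)]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def close(A, B):
    return all(abs(A[i][j] - B[i][j]) < 1e-10
               for i in range(N) for j in range(N))

a, b = 1, 3
# Weyl commutation relation: V(b)U(a) = <b, a> U(a)V(b)
lhs = mul(V(b), U(a))
rhs = [[w(b, a) * x for x in row] for row in mul(U(a), V(b))]
assert close(lhs, rhs)

# trace orthogonality: Tr(W(a,b)* W(u,v)) = N delta_{a,u} delta_{b,v}
def W(a, b): return mul(U(a), V(b))
def tr_dag(A, B): return sum(A[i][j].conjugate() * B[i][j]
                             for i in range(N) for j in range(N))
assert abs(tr_dag(W(1, 3), W(1, 3)) - N) < 1e-10
assert abs(tr_dag(W(1, 3), W(2, 3))) < 1e-10
print("Weyl relations verified for N =", N)
```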
Chapter 4

Group Theory in Statistical Signal Processing and Control

[1] Weyl’s character formula for the characters of compact semisimple Lie groups.
[2] Weyl’s integration formula for compact groups. Let G be a compact
semisimple group and H a Cartan subgroup (maximal torus). Let X = G/H and
define the map ψ : X × H → G by ψ(gH, h) = ghg −1 . It is clear that the map ψ
is well defined. The differential of ψ can be obtained as follows:

gh(1 + δh)g −1 = ghg −1 + ghδh.g −1 =

ghg −1 (1 + g.δh.g −1 )
on the one hand and on the other,

g(1 + δg)h(1 + δg)−1 g −1 =

g(1 + δg)h(1 − δg)g −1 = g(h + [δg, h])g −1


= ghg −1 (1 + gh−1 [δg, h]g −1 )
The determinant of the map

δh → g.δh.g −1

is unity and taking δg = Xα (root vector) and h = exp(H), we get

h−1 [δg, h] = h−1 .δg.h − δg =

exp(−ad(H))(Xα ) − Xα = (exp(−α(H)) − 1)Xα


It follows that the Jacobian determinant of the map ψ is given by

det(Dψg,h ) = Π_{α∈Δ+} |exp(−α(H)) − 1|² = |Δ(h)|²

say, where H = log(h) and

Δ(h) = Π_{α∈Δ+} (exp(α(H)/2) − exp(−α(H)/2))

Now let W denote the Weyl group acting on H. We then have

∫_G f (g)dg = |W |⁻¹ ∫_{G/H×H} f (ψ(gH, h))det(Dψg,h ) d(G/H)dH

= |W |⁻¹ ∫_{G/H×H} f (ghg −1 )|Δ(h)|² d(G/H)dH

The reason for the factor |W |⁻¹ appearing on the rhs is that the map ψ is
generically |W |-to-one: for w ∈ W , ψ(gw−1 H, whw−1 ) = ghg −1 = ψ(gH, h), so in
the process of integration each element of G is being counted |W | times.
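A hedged numeric illustration of Weyl's integration formula (my own worked example for G = SU(2), which is not treated explicitly in the text): the maximal torus consists of diag(e^{it}, e^{−it}), |W| = 2, and the single positive root gives |Δ|² = |e^{it} − e^{−it}|². The formula then reduces integrals of class functions to one-dimensional integrals, which we use to verify orthonormality of the irreducible characters χn(t) = sin((n+1)t)/sin(t).

```python
import math

def weyl_integral(f, steps=20000):
    # (1/|W|) * integral of f(t)|Delta(t)|^2 over the torus against
    # normalized Haar measure dt/(2 pi), with |W| = 2 for SU(2)
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h                     # midpoint rule, avoids sin t = 0
        total += f(t) * (2 * math.sin(t)) ** 2 * h
    return total / (2 * 2 * math.pi)

def chi(n, t):
    # character of the (n+1)-dimensional irreducible representation
    return math.sin((n + 1) * t) / math.sin(t)

g11 = weyl_integral(lambda t: chi(1, t) * chi(1, t))
g12 = weyl_integral(lambda t: chi(1, t) * chi(2, t))
print(g11, g12)    # close to 1 and 0: the characters are orthonormal
```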

[3] Induced representations and the imprimitivity theorem.


Let G be a group and H a subgroup of it. Consider the coset space X =
G/H. Let L be a representation of H. It is clear that G acts on G/H in the
natural way. There are two ways to define the representation of G obtained by
induction of L. The first way is to suppose that h denotes the representation
Hilbert space of L. We define the Hilbert space H to be the space of all functions
f : G → h such that (a) f (gh) = L(h)−1 f (g) for all g ∈ G, h ∈ H and (b)
∫_{G/H} ‖f (g)‖² dG/H < ∞. Note that since we are assuming L to be unitary,

‖f (gh)‖ = ‖L(h)−1 f (g)‖ = ‖f (g)‖, g ∈ G, h ∈ H

and hence ‖f (g)‖² is a well defined function on G/H whenever condition (a)
is satisfied. We define the induced representation UL = ind_H^G L to act in H so
that
(UL (g1 )f )(g) = f (g1−1 g), g, g1 ∈ G, f ∈ H
A second way to define the induced representation is as follows. Let γ(.) be
a cross section for G/H, ie, for each x ∈ G/H, let γ(x) ∈ G satisfy γ(H) =
e, γ(x)H = x and for any distinct x, y ∈ G/H, γ(x) = γ(y). Then define for
x ∈ G/H, g ∈ G,

(U (g)φ)(x) = L(γ(x)−1 gγ(g −1 x))φ(g −1 x)

where φ : G/H → h is square integrable. We first show that U is a representation
of G:
U (g2 )(U (g1 )φ)(x) = L(γ(x)−1 g2 γ(g2−1 x))(U (g1 )φ)(g2−1 x)
= L(γ(x)−1 g2 γ(g2−1 x))L(γ(g2−1 x)−1 g1 γ(g1−1 g2−1 x))φ(g1−1 g2−1 x)
= L(γ(x)−1 g2 g1 γ((g2 g1 )−1 x))φ((g2 g1 )−1 x)
= (U (g2 g1 )φ)(x)
This proves that U is a representation of G in the Hilbert space H1 comprising
all square integrable maps from G/H into h. It is clear that H and H1 are
isomorphic in a natural way.

Induced representation theory of semidirect products and its relation to im-


primitivity systems. Let N be an Abelian normal subgroup and H a closed subgroup
of G so that G = N ⊗s H. Let U be a unitary representation of G acting in
the Hilbert space H. For n ∈ N, h ∈ H, define V (n) = U (n), W (h) = U (h) so
that U (nh) = V (n)W (h) where V is a unitary representation of the Abelian
group N and W is a unitary representation of the group H. Now since the
operators V (n), n ∈ N all commute, they can be simultaneously diagonalized.
Let the eigenvalues of V (n) be χ1 (n), ...χN (n) and the corresponding onb of
eigenvectors be v1 , ..., vN , ie

V (n)vj = χj (n)vj , j = 1, 2, ..., N, < vj , vk >= δjk

Note that some of the characters χj , j = 1, 2, ..., N may be the same so we denote
by N̂ the set of distinct characters of N and let P (χ) the orthogonal projection
onto the common eigenspace of V (n), n ∈ N with corresponding eigenvalues
χ(n), n ∈ N . Then by the spectral theorem,

V (n) = Σ_{χ∈N̂} χ(n)P (χ)

We have for h ∈ H since hnh−1 ∈ N ,



W (h)V (n)W (h)−1 = U (hnh−1 ) = V (hnh−1 ) = Σ_{χ∈N̂} χ(hnh−1 )P (χ)

= Σ_{χ∈N̂} (h−1 .χ)(n)P (χ) = Σ_{χ∈N̂} χ(n)P (h.χ)

on the one hand and on the other,



W (h)V (n)W (h)−1 = Σ_{χ∈N̂} χ(n)W (h)P (χ)W (h)−1

and hence, we deduce the identity

W (h)P (χ)W (h)−1 = P (h.χ), h ∈ H, χ ∈ N̂

Thus, (H, W, P ) defines an imprimitivity system. Irreducibility of the repre-


sentation U is easily seen to be equivalent to the irreducibility of the family

of operators {W (h), V (n), h ∈ H, n ∈ N } or equivalently, irreducibility of the


family of operators {W (h), P (χ), h ∈ H, χ ∈ N̂ }, ie, irreducibility of the im-
primitivity system (H, W, P ). Now choose and fix a character χ0 ∈ N̂ and
consider its orbit under H:

O(χ0 ) = {h.χ0 : h ∈ H}

Define the vector spaces

V (χ) = P (χ)H, χ ∈ N̂

For n ∈ N , we have

U (n)V (χ) = V (n)P (χ)H = P (χ)H = V (χ)

and for h ∈ H,

W (h)V (χ) = W (h)P (χ)H = W (h)P (χ)W (h)−1 H

= P (h.χ)H = V (h.χ)

Thus K = span{V (χ) : χ ∈ O(χ0 )} is invariant under the representation W of
H. We define H0 to be the set of all h ∈ H for which h.χ0 = χ0 ; then H0
is a subgroup of H (the stabilizer of χ0 ). Let V0 be any proper subspace of V (χ0 ).
It is clearly invariant under the representation V of N ; note that we have seen
above that V (χ0 ) is invariant under V . If, in addition, V0 is invariant under
W (h), h ∈ H0 , then span{W (h)V0 : h ∈ H} is a proper subspace of K that is
U -invariant; note that W (h)V0 ⊂ V (h.χ0 ), h ∈ H. Thus, U |K is irreducible if
V (χ0 ) has no proper invariant subspace under the representation W of H, or
equivalently (since W (h)V (χ0 ) = V (h.χ0 ) has zero intersection with V (χ0 )
unless h.χ0 = χ0 ), if V (χ0 ) has no proper invariant subspace under the
representation W of H0 . The converse is also easily seen to be true. Specifically,
suppose S is a proper subspace of K that is invariant under U . Then it is easily
seen that V0 = S ∩ V (χ0 ) is a proper subspace of V (χ0 ) that is invariant under
H0 , because V (χ0 ) and V (h.χ0 ) are unitarily isomorphic under W (h). Hence
the representation U |K of G is irreducible iff the representation W of H0 on
V (χ0 ) is irreducible. Now let L be any irreducible representation of H0 . Then
it is easy to see that nh → (χ0 ⊗ L)(n, h) = χ0 (n)L(h) is an irreducible
representation of N ⊗s H0 and hence, by the above argument, U = ind^G_{N ⊗s H0} (χ0 ⊗ L)
is irreducible.

[4] Work of Harish-Chandra


To derive the Plancherel formula for any semi-simple Lie group.
Consider first G = SL(2, C). We use the Iwasawa decomposition

G = KM AN

Choose a character χ of the diagonal group M A and denote by Lχ the repre-


sentation induced by χ from M A to G. Then Lχ is called a principal series

representation and Gelfand proved that it is irreducible. Let B = M AN so that


G = KB. Note that B is a group. For u ∈ K and x ∈ G, we write x[u] = k(xu).
Note that by definition

x = k(x)a(x)n(x), k(x) ∈ K, a(x) ∈ A, n(x) ∈ N

We have for x = kb with k ∈ K, b ∈ B, with g(kb) = f (k)E(b),

∫_G g(x)dx = ∫ f (k(x))E(b(x))dx = ∫ f (k)E(b)μB (b)dkdb = ∫ f (k)dk

with E normalized so that ∫_B E(b)μB (b)db = 1.

On the other hand, since G is unimodular,


  
∫_G g(x)dx = ∫_G g(xb)μB (b)dxdb = ∫ g(k(x)b(x)b)μB (b)dxdb

= ∫ f (k(x))E(b(x)b)μB (b)dxdb = ∫ f (k(x))E(b)μB (b(x)−1 b)dxdb

= ∫ f (k(x))E(b)μB (b(x))−1 μB (b)dxdb

= ∫ f (k(x))μB (b(x))−1 dx

For any u ∈ K, this equals

∫ f (k(xu))μB (b(xu))−1 dx = ∫ f (x[u])μB (b(xu))−1 dx

Now,

∫_K f (k)dk = ∫ f (x[u])(dx[u]/du)du = ∫ f (x[u])(dx[u]/du)dxdu

and hence we deduce that

dx[u]/du = μB (b(xu))−1

Principal series: Let χ be a character of M A. We realize induced represen-


tation of G by this character in the space of functions f on G = KAN such
that
f (xa) = χ(a)−1 f (x)
The principal series representation Lχ is then given by

Lχ (g)f (x) = f (g −1 x), x, g ∈ G

Equivalently we can realize the principal series in the space of functions on


G/M A or equivalently on K by the standard formula

Lχ (x)f (u) = (dx−1 [u]/du)^{1/2} χ(b(x−1 u))f (x−1 [u])



where b(x) is the B-component of x, ie, x = k(x)b(x), k(x) ∈ K, b(x) ∈ AN =


B. Now,
d(ana−1 ) = δ 2 (a)dn
where
δ(a) = Πα∈P+ (A) |exp(α(loga)) − 1|
where P+ (A) are the positive roots of (g, a) where g is the Lie algebra of G and
a is the Lie algebra of A. We have

dx−1 [u]/du = μB (b(x−1 u))−1

So computing this Radon-Nikodym derivative amounts to computing the modu-


lar function μB of AN . Note that dr b = μB (b)db where dr b is the right invariant
measure on B and db is the left invariant measure. Now,
 
∫ f (ann′ )δ(a)²dadn = ∫ f (an)δ(a)²dadn,

∫ f (ana′ )δ(a)²dadn = ∫ f (aa′ .a′−1 na′ )δ(a)²dadn

= ∫ f (an)δ(aa′−1 )²da d(a′−1 na′ ) = ∫ f (an)δ(a)²δ(a′ )−2 .δ(a′ )²dadn

= ∫ f (an)δ(a)²dadn

and, hence, we deduce that

μB (an) = μB (a) = δ(a)²

Thus, we obtain finally, the principal series representations in the form

Lχ (x)f (u) = δ(a(x−1 u))−1 χ(b(x−1 u))f (x−1 [u])

where f varies over all smooth functions on K.

The principal series character distributions are Fourier transforms of the


orbital integral. Let χ be an invariant function so that χ restricted to the
Cartan subgroup A is a character. Then,
 
F̂f (χ) = ∫_A Ff (h)χ(h)dh = ∫_{G/A×A} |Δ(h)|f (xhx−1 )χ(h)dxdh

= |W (A)|−1 ∫ |Δ(h)|²f (xhx−1 )(Σ_{s∈W (A)} χ(sh)/|Δ(h)|)dxdh

= ∫_G f (x)θχ (x)dx = Tχ (f )

where θχ is the principal series character distribution on G induced by the


character χ of A and is given by

θχ (h) = Σ_{s∈W (A)} χ(sh)/|Δ(h)|, h ∈ A

Some other remarks on the Fourier transform on a Lie algebra. Define

f̂ (X) = ∫_g f (Y )exp(iB(X, Y ))dY, B(X, Y ) = T r(ad(X).ad(Y ))

Then for g ∈ G,

f̂ (gXg −1 ) = ∫_g f (Y )exp(iB(gXg −1 , Y ))dY

= ∫_g f (Y )exp(iB(X, g −1 Y g))dY

= ∫_g f (gY g −1 )exp(iB(X, Y ))d(gY g −1 )

= ∫_g f (Y )exp(iB(X, Y ))dY = f̂ (X)

provided that f is G-invariant, ie,

f (gXg −1 ) = f (X), ∀g ∈ G, X ∈ g

In the above steps, we’ve used the identities

d(gY g −1 ) = dY, B(gXg −1 , gY g −1 ) = B(X, Y )∀g ∈ G, X, Y ∈ g

Remark: The Euclidean measure dX on g is defined w.r.t the invariant inner
product B(X, Y ) = T r(ad(X)ad(Y )). Specifically, if X1 , ..., Xn is an orthonormal
basis for g w.r.t this inner product, then for t1 , ..., tn ∈ R,

d(t1 X1 + ... + tn Xn ) = dt1 ...dtn

It follows from this that dX is invariant under adjoint action of G. More pre-
cisely, the Euclidean measure in N -dimensional Euclidean space remains in-
variant under orthogonal transformations and X → gXg −1 is an orthogonal
transformation w.r.t the inner product B(., .). Now observe that

Lχ (x)f (u) = χ(a(x−1 u))−1 δ(a(x−1 u))−1 f (x−1 [u])

so
Lχ (α)f (u) = (∫_G α(x)Lχ (x)dx)f (u)

= ∫_G α(x)χ̄(a(x−1 u))δ(a(x−1 u))−1 f (x−1 [u])dx

= ∫_G α(x)χ̄(a(x−1 u))δ(a(x−1 u))−1 f (k(x−1 u))dx

= ∫_G α(ux−1 )χ̄(a(x))δ(a(x))−1 f (k(x))dx

= ∫ α(un−1 a−1 k −1 )χ̄(a)δ(a)−1 f (k)δ(a)²dkdadn

Remark 1: The Haar measure on G = KAN is given by

dx = d(kan) = δ(a)2 dk.da.dn

where we recall that


d(an) = δ(a)2 dadn
is the Haar measure on B = AN .
Remark 2: Let x, y ∈ G. Then,

y[x[u]] = k(y.x[u]) = k(y.k(xu))

on the other hand,

(yx)[u] = k(yxu) = k(yk(xu)a(xu)n(xu)) = k(yk(xu))

Hence,
y[x[u]] = (yx)[u]
Now, we define
a(x : u) = a(xu), x ∈ G, u ∈ K
Then,
k(xu)a(x : u)n(xu) = xu
or equivalently,
x[u].a(x : u)n(xu) = xu
and also

a(xy : u) = a(xyu) = a(k(xy)a(xy)n(xy)u) = a(xk(yu)a(yu)n(yu))


= a(xk(yu))a(yu)
and hence,
a(xy : u) = a(x : y[u])a(y : u)
Thus, the map (x, u) → a(x : u) from G × K into A is a cocycle.

[5] Discrete series characters of SL(2, R) applied to pattern classification. As


we know, the adjoint action of SL(2, R) on matrices of the form

[ t + z     x + iy ]
[ x − iy    t − z  ]     − − − (1)

precisely is equivalent to Lorentz transformations in the xzt plane. Suppose


that we are given two time varying image fields f1 (t, x, y, z), f2 (t, x, y, z). Both
of these two image fields are subject to the same Lorentz transformation in
the xzt plane, ie, rotations in the xz plane followed by Lorentz boosts in the
xz plane. We can thus regard the two image fields as functions on the group
G = SL(2, R) by taking a fixed point in R4 say ξ0 and writing fk (t, x, y, z) as
fk (gξ0 ) where g ∈ SL(2, R) is chosen so that τ (g)ξ0 = (t, x, y, z). Here, τ (g) is
the Lorentz transformation in the xzt plane corresponding to the adjoint action
of g on the matrix (1). We write fk (gξ0 ) = fk (g), g ∈ SL(2, R), k = 1, 2. Let
χm denote the character of a discrete series representation of SL(2, R). It is
given by (see V.S. Varadarajan, "Harmonic Analysis on Semisimple Lie Groups")

χm (at ) = exp(mt)/(exp(t) − exp(−t)),   at = exp(tH) ∈ L

and

χm (θ) = exp(imθ)/(exp(iθ) − exp(−iθ)),   u(θ) = exp(θ.(X − Y )) ∈ B
In this formula, L is the non-compact Cartan subgroup of G generated by the
Lie algebra R.H and B is compact Cartan subgroup of G generated by the Lie
algebra R.(X − Y ) where
H = [ 1   0 ]
    [ 0  −1 ],

X = [ 0  1 ]
    [ 0  0 ],

Y = [ 0  0 ]
    [ 1  0 ]
Note that {X, Y, H} is a basis for the Lie algebra of SL(2, R) and these satisfy
the standard Lie algebra commutation relations:

[H, X] = 2X, [H, Y ] = −2Y, [X, Y ] = H
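Both the commutation relations and the claim that the action of SL(2, R) on matrices of the form (1) leaves y fixed and preserves t^2 − x^2 − z^2 can be verified numerically (a sketch; for real g the action on the Hermitian matrix M is M → gMg*, and the variable names are ours):

```python
import numpy as np

H = np.array([[1., 0.], [0., -1.]])
X = np.array([[0., 1.], [0., 0.]])
Y = np.array([[0., 0.], [1., 0.]])
br = lambda A, B: A @ B - B @ A
assert np.allclose(br(H, X), 2 * X)        # [H, X] = 2X
assert np.allclose(br(H, Y), -2 * Y)       # [H, Y] = -2Y
assert np.allclose(br(X, Y), H)            # [X, Y] = H

def M(t, x, y, z):
    # the Hermitian matrix (1)
    return np.array([[t + z, x + 1j * y], [x - 1j * y, t - z]])

rng = np.random.default_rng(1)
g = rng.standard_normal((2, 2))
if np.linalg.det(g) < 0:
    g[:, 0] *= -1
g /= np.sqrt(np.linalg.det(g))             # g in SL(2,R)
t, x, y, z = rng.standard_normal(4)
Mp = g @ M(t, x, y, z) @ g.conj().T        # g real, so g^* = g^T
tp = (Mp[0, 0] + Mp[1, 1]).real / 2
zp = (Mp[0, 0] - Mp[1, 1]).real / 2
xp, yp = Mp[0, 1].real, Mp[0, 1].imag
assert np.isclose(yp, y)                   # y untouched: the action lives in the xzt plane
assert np.isclose(tp**2 - xp**2 - zp**2, t**2 - x**2 - z**2)  # Lorentz invariant
```

The second assertion is just det(gMg*) = det M together with the fact that a real unimodular g fixes the antisymmetric part of M.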

We observe that B, L are two non-conjugate Cartan subgroups of G and any


element of G is conjugate to either an element of L or to an element of B.
Further, Weyl's integration formula gives

∫_G f (g)dg = ∫_{G/L×L} |exp(t) − exp(−t)|^2 f (xhx^{-1})dxdh

+ ∫_{G/B×B} |exp(iθ) − exp(−iθ)|^2 f (xux^{-1})dxdu

where
h = at = exp(tH), u = u(θ) = exp(θ.(X − Y ))

Consider now the expression

Am (f1 , f2 ) = ∫ f1 (g1 )f2 (g2 )χm (g2^{-1} g1 )dg1 dg2

We have

Am (f1 og, f2 og) = ∫ f1 (gg1 )f2 (gg2 )χm (g2^{-1} g1 )dg1 dg2

= ∫ f1 (g1 )f2 (g2 )χm (g2^{-1} g1 )dg1 dg2 = Am (f1 , f2 ), g ∈ SL(2, R)

where the second equality follows on substituting gk → g^{-1} gk and using the
invariance of the Haar measure together with (gg2 )^{-1} (gg1 ) = g2^{-1} g1 .
Thus Am (f1 , f2 ) is a G-invariant bilinear functional of two image fields f1 and


f2 . This fact enables us to do pattern classification. To apply this fact in a
practical situation, we first need to compute the G-invariant measures on G/L
and G/B. By the Iwasawa decomposition, G/L is isomorphic to BN and G/B
to AN . We therefore compute the invariant measure on G = BAN in terms
of this Iwasawa decomposition: g = g(θ, t, s) = u(θ)a(t)n(s) using the standard
method of calculating Haar measures as the reciprocal of the wedge of product
of a basis of left invariant vector fields on the group. Note that

u(θ) = [ cos(θ)   sin(θ) ]
       [ −sin(θ)  cos(θ) ] = exp(θ(X − Y ))

a(t) = [ exp(t)     0    ]
       [   0    exp(−t)  ] = exp(tH)

n(s) = [ 1  s ]
       [ 0  1 ]
Then,
∂g/∂s = g.X,
∂g/∂t = u(θ)a(t)Hn(s) = g.(n(s)^{-1} Hn(s)),
∂g/∂θ = u(θ)(X − Y )a(t)n(s) = g.(n(−s)a(−t)(X − Y )a(t)n(s))
Now,

n(s)^{-1} Hn(s) = [ 1  −s ] [ 1   0 ] [ 1  s ]
                  [ 0   1 ] [ 0  −1 ] [ 0  1 ]

                = [ 1   2s ]
                  [ 0   −1 ] = H + 2sX

so
∂/∂t → H + 2sX,

n(−s)a(−t)(X − Y )a(t)n(s) =

[ 1  −s ] [ exp(−t)    0   ] [  0  1 ] [ exp(t)     0    ] [ 1  s ]
[ 0   1 ] [   0    exp(t)  ] [ −1  0 ] [   0    exp(−t)  ] [ 0  1 ]

= s.exp(2t)H + (s^2 .exp(2t) + exp(−2t))X − exp(2t)Y

(conjugation by a(t) scales Y by exp(2t), which fixes the exponent in the
Y -term). So,

∂/∂θ → s.exp(2t)H + (s^2 .exp(2t) + exp(−2t)).X − exp(2t).Y
and
∂/∂s → X
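The conjugation identities used above are easy to confirm numerically; in particular, decomposing the conjugated matrix in the basis {H, X, Y} shows the Y-coefficient to be −exp(2t). A short check (our own, with our variable names):

```python
import numpy as np

H = np.array([[1., 0.], [0., -1.]])
X = np.array([[0., 1.], [0., 0.]])
Y = np.array([[0., 0.], [1., 0.]])
a = lambda t: np.diag([np.exp(t), np.exp(-t)])
n = lambda s: np.array([[1., s], [0., 1.]])

t, s = 0.7, -1.3
# n(s)^{-1} H n(s) = H + 2sX
assert np.allclose(np.linalg.inv(n(s)) @ H @ n(s), H + 2 * s * X)

# decompose n(-s) a(-t) (X - Y) a(t) n(s) in the basis {H, X, Y}
W = n(-s) @ a(-t) @ (X - Y) @ a(t) @ n(s)
cH = (W[0, 0] - W[1, 1]) / 2
cX, cY = W[0, 1], W[1, 0]
assert np.isclose(cH, s * np.exp(2 * t))
assert np.isclose(cX, s**2 * np.exp(2 * t) + np.exp(-2 * t))
assert np.isclose(cY, -np.exp(2 * t))      # conjugation by a(t) scales Y by exp(2t)
```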
Solving these equations gives us the correspondence

X → ∂/∂s, H → ∂/∂t − 2s.∂/∂s,

Y → −exp(−2t)∂/∂θ + s.(∂/∂t − 2s.∂/∂s) + (s^2 + exp(−4t))∂/∂s

Computing the reciprocal of the determinant associated with this representation
of the three linearly independent vector fields X, H, Y gives us the Haar measure
on SL(2, R):

dg(θ, t, s) = exp(2t)dθdtds, g(θ, t, s) = u(θ)a(t)n(s)

in agreement with Remark 1, since δ(a(t))^2 = exp(2t).

Actually, this expression has to be multiplied by one half if we use the Iwasawa
decomposition of G = SL(2, R) in the form

g = kman, G = KM AN

where M = {±I2 }. The Haar integral on this group is then

∫_G f (g)dg = ∫_{[0,2π)×R×R} f (u(θ)a(t)n(s))exp(2t)dθdtds

Let Ge denote the elliptic subgroup and Gh the hyperbolic subgroup of G. Thus
G is the union of these two sets and Ge ∩ Gh = {I}. We note that any
g ∈ Ge is conjugate to an element of the form u(θ) while any element g ∈ Gh is
conjugate to an element of the form a(t). Thus,
  
∫_G f (g)dg = ∫_{Ge} f (g)dg + ∫_{Gh} f (g)dg

Now by Weyl's integration formula,

∫_{Ge} f (g)dg = (2π)^{-1} ∫_{G/T ×T} |Δ(θ)|^2 f (x̄u(θ)x̄^{-1})dx̄dθ

where
Δ(θ) = exp(iθ) − exp(−iθ)
and dx̄ is the invariant measure on G/T where T is the one dimensional torus

T = {u(θ) : 0 ≤ θ < 2π} = exp(R.(X − Y ))

For any g ∈ G = SL(2, R), we also have the singular value decomposition

g = u(θ1 )a(t).u(θ2 )

We compute the Haar measure using this representation as above and obtain

dg = f (t, θ1 , θ2 )dθ1 dtdθ2

We shall in what follows determine f .

∂g/∂θ2 = g.(X − Y )

so
∂/∂θ2 → X − Y
∂g/∂t = u(θ1 )a(t)Hu(θ2 )
= g.(u(−θ2 )Hu(θ2 )) = g.(cos(2θ2 ).H + sin(2θ2 ).(X + Y ))
so we can write
∂/∂t → cos(2θ2 ).H + sin(2θ2 ).(X + Y )

and finally,
∂g/∂θ1 = u(θ1 )(X − Y )a(t)u(θ2 )
= g.(u(−θ2 )a(−t)(X − Y )a(t)u(θ2 ))
= g.[sinh(2t)sin(2θ2 ).H + (exp(2t)sin^2 (θ2 ) + exp(−2t)cos^2 (θ2 ))X
− (exp(2t)cos^2 (θ2 ) + exp(−2t)sin^2 (θ2 ))Y ]
Thus,

∂/∂θ1 → sinh(2t)sin(2θ2 ).H + (exp(2t)sin^2 (θ2 ) + exp(−2t)cos^2 (θ2 ))X

− (exp(2t)cos^2 (θ2 ) + exp(−2t)sin^2 (θ2 ))Y
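In practice the decomposition g = u(θ1)a(t)u(θ2) can be computed with an SVD, pushing any reflection factors into a diagonal sign matrix so that both orthogonal factors land in SO(2). A sketch (the function name kak is ours):

```python
import numpy as np

def kak(g):
    # g = u1 @ diag(e^t, e^{-t}) @ u2 with u1, u2 in SO(2), via SVD
    U, S, Vt = np.linalg.svd(g)
    if np.linalg.det(U) < 0:               # det g = 1 forces det U = det Vt = -1 together
        D = np.diag([1., -1.])
        U, Vt = U @ D, D @ Vt              # D cancels against itself around diag(S)
    return U, np.log(S[0]), Vt

rng = np.random.default_rng(2)
g = rng.standard_normal((2, 2))
if np.linalg.det(g) < 0:
    g[:, 0] *= -1
g /= np.sqrt(np.linalg.det(g))             # g in SL(2,R)

u1, t, u2 = kak(g)
at = np.diag([np.exp(t), np.exp(-t)])      # singular values multiply to det g = 1
assert np.allclose(u1 @ at @ u2, g)
assert np.isclose(np.linalg.det(u1), 1) and np.isclose(np.linalg.det(u2), 1)
```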

[6] Pattern recognition using discrete series for SL(2, R): Steps

[1] Show that G = SL(2, R), acting by the adjoint action on the space of all
2 × 2 Hermitian matrices, induces the group of all proper Lorentz transforma-
tions in the xzt plane.
[2] Take a 2-D time varying image field f1 (t, x, z) and subject it to a Lorentz
transformation g0 in the txz space D. The field becomes

f2 (ξ) = f1 (g0^{-1} ξ), ξ = (t, x, z)

[3] Choose and fix a ξ0 ∈ D and define

K1 (g  ) = f1 (g  ξ0 ), K2 (g) = f2 (gξ0 ), g ∈ G0

where G0 is the group of all proper Lorentz transformations on D. Let Φ denote


the homomorphism that carries g ∈ G to Φ(g) ∈ G0 . Then define

K1 (g) = f1 (Φ(g)ξ0 ), K2 (g) = f2 (Φ(g)ξ0 ), g ∈ G

[4] Let B denote the elliptic Cartan subgroup of G and L the hyperbolic
Cartan subgroup of the same. Determine the invariant measures d(G/B) and

d(G/L) on G/B and G/L respectively. Note that the Iwasawa decomposition
for G is G = KAN so that G/B can be identified with AN and every element
in L is conjugate to an element in A while every element in B is conjugate to
an element in K. Note that K = exp(R.(X − Y )) while A = exp(R.H). Given
a function f (g) on G we use Weyl's integration formula:

∫_G f (g)dg = ∫_{G/K×K} |ΔK (k)|^2 f (xkx^{-1})dx̄dk

+ ∫_{G/A×A} |ΔA (a)|^2 f (yay^{-1})dȳda

Here dx̄ is the invariant measure on G/K while dȳ is the invariant measure
on G/A. Suppose we write g ∈ G using the singular value decomposition g =
k1 ak2 , k1 , k2 ∈ K, a ∈ A. The invariant measure on G can be computed using
this decomposition as
dg = P (k1 , a, k2 )dk1 dadk2
Now given a function f on G/K, we can write it as f (gK) and we have that
∫_{K×A×K} f (k1 ak2 K)P (k1 , a, k2 )dk1 dadk2 is an invariant integral on G. It can
be expressed as ∫ f (k1 aK)P (k1 , a)dk1 da where P (k1 , a) = ∫_K P (k1 , a, k2 )dk2 .
It is clear then that this integral defines an invariant integral on G/K. An-
other way to do this is to use the decomposition G = N AK to write dg =
Q(n, a, k)dndadk for the invariant measure on G and then note that ∫ f (naK)Q(n, a)dnda
is the invariant integral on G/K where Q(n, a) = ∫ Q(n, a, k)dk. Likewise, sup-
pose we use the decomposition G = KN A to write the invariant measure on
G as dg = R(k, n, a)dkdnda. Then defining R(k, n) = ∫_A R(k, n, a)da we can
express the invariant integral of f on G/A as ∫ f (knA)R(k, n)dkdn.
Now let χ(g) be a character of G. We wish to evaluate I = I(f, χ) =
∫_G f (g)χ(g)dg. We have, using Weyl's integration formula,

I = ∫_{G/K×K} |ΔK (k)|^2 f (xkx^{-1})χ(k)dxdk + ∫_{G/A×A} |ΔA (a)|^2 f (xax^{-1})χ(a)dxda

= ∫_{K×A×K} |ΔK (k)|^2 f (k′ aka^{-1} k′^{-1})χ(k)P (k′ , a)dk′ dadk

+ ∫_{K×N ×A} |ΔA (a)|^2 f (knan^{-1} k^{-1})χ(a)R(k, n)dkdnda

This method can be used to compute image pair invariants for the Lorentz group
in the txz space or equivalently for SL(2, R).
[7] Computing Haar integrals
[1] Suppose that G is a semisimple Lie group with H as its Cartan subgroup.
We consider the mapping ψ : (x̄, h) → x̄hx̄−1 from G/H × H into G. Note
that the mapping is well defined since if x1 ∈ G, h1 ∈ H are arbitrary, then
(x1 h1 )h(x1 h1 )^{-1} = x1 h1 hh1^{-1} x1^{-1} = x1 hx1^{-1} , H being abelian. Hence,
it does not matter what representative
in G we choose for any x̄ ∈ G/H in computing x̄hx̄^{-1} . Now consider the


parametrization (x̄, h) ∈ G/H × H for the element g = ψ(x̄, h) ∈ G. From the
elementary theory of the Haar measure, we can write the Haar integral as
 
∫ f (g)dg = ∫ f (ψ(x̄, h))|det(∂L(x̄,h) (x̄′ , h′ )/∂(x̄′ , h′ ))|x̄′ =H,h′ =e |^{-1} dx̄dh

where
L(x̄,h) (x̄′ , h′ ) = (x̄, h)o(x̄′ , h′ ) = ψ^{-1} (ψ(x̄, h)oψ(x̄′ , h′ ))
= ψ^{-1} (gog′ ), g = ψ(x̄, h), g′ = ψ(x̄′ , h′ )
Thus,
L′(x̄,h) (x̄′ , h′ )|x̄′ =H,h′ =e = (ψ^{-1})′ (g).L′g (e)ψ′ (H, e)
where
Lg (g′ ) = gog′
and hence
|det(L′(x̄,h) (x̄′ , h′ )|x̄′ =H,h′ =e )|^{-1} dx̄dh =
|dg/d(x̄, h)|.|L′g (e)|^{-1} |ψ′ (H, e)|^{-1} dx̄dh
= |dψ(x̄, h)/d(x̄, h)||ψ′ (H, e)|^{-1} dx̄dh
since |L′g (e)|^{-1} dg = dg is the Haar measure on G and hence |L′g (e)| = 1. More
precisely, if G is also parametrized, then its invariant measure should be taken as
dg/|L′g (e)| and the above calculation will then show that the invariant measure
on G can be written in the G/H × H parametrization as

dμ(x̄, h) = |dψ(x̄, h)/d(x̄, h)|dx̄dh

where the constant |ψ  (H, e)|−1 has been set equal to unity without any loss
of generality. In this formula, dx̄dh is any parametrization of a measure on
G/H × H. Now we observe the following: in this parametrization, the left
invariant measure on G/H is dx̄/|L′x̄ (e)| where Lx̄ (ȳ) = x̄oȳ. Then we can
write the above as

dμ(x̄, h) = |dψ(x̄, h)/d(x̄, h)||L′x̄ (e)|dx̄dh/|L′x̄ (e)|

and if we denote the invariant measure on G/H by dx̄, ie we denote dx̄/|L′x̄ (e)|
by dx̄ so that dx̄ becomes the invariant measure on G/H, then we have for the
invariant measure on G,

dμ(x̄, h) = |dψ(x̄, h)/d(x̄, h)||L′x̄ (e)|dx̄dh

Now, for any function χ(x̄) on G/H we have

(d/dȳ)χ(x̄oȳ) = χ′ (x̄oȳ)L′x̄ (ȳ)

and in particular,
(d/dȳ)χ(x̄oȳ)|ȳ=H = χ′ (x̄)L′x̄ (e)

which gives on taking determinants,

|(d/dȳ)χ(x̄oȳ)|ȳ=H | = |χ′ (x̄)||L′x̄ (e)|

In view of this formula, we can write the above formula as

dμ(x̄, h) = |Dψ(x̄oȳ, h)/D(ȳ, h)|ȳ=H,h=e dx̄dh

where dx̄ is now the invariant measure on G/H and dh is the invariant measure
on H.
Remark: We can give a more precise proof using parametrizations of G
and G/H × H in terms of Euclidean measures. Specifically, in terms of these
parametrizations we require
 
∫ f (g)dg/|L′g (e)| = ∫ f (ψ(x̄, h))|L′(x̄,h) (H, e)|^{-1} dx̄dh

Note that dx̄ is now a Euclidean measure on G/H relative to this parametriza-
tion, not the invariant measure on G/H and by the above calculation,

|L′(x̄,h) (H, e)|^{-1} = |dg/d(x̄, h)|.|L′g (e)|^{-1}

Comparing these two expressions gives us

dg = |dg/d(x̄, h)|dx̄dh = |Dψ(x̄, h)/D(x̄, h)|dx̄dh

Note that in this formula dg is a Euclidean measure on G not the invariant


measure and likewise dx̄dh is a Euclidean measure on G/H ×H. In this formula,
g = ψ(x̄, h) is actually the coordinate version of g = x̄hx̄−1 . Thus, if g → F (g)
is the map that takes g ∈ G to its Euclidean coordinates, then the above formula
should be interpreted as

dF (g) = |DF (ψ(x̄, h))/D(x̄, h)|dx̄dh

where now ψ(x̄, h) = x̄hx̄−1 . Also the invariant measure on G in terms of these
Euclidean coordinates is therefore

dF (g)/|dF (gog1 )/dF (g1 )|g1 =e = dF (g)/(|F ′ (g)||L′g (e)|)

provided that we set |F ′ (e)| = 1. Now, as noted above, we have

DF (ψ(x̄oȳ, h))/D(ȳ, h)|ȳ=H,h=e =

= F ′ (g).(Dψ(x̄, h)/D(x̄, h)).|L′x̄ (H)|

and further,
(d/dg1 )F (gog1 )|g1 =e = F ′ (g)L′g (e)
and hence

dF (g)/(|F ′ (g)||L′g (e)|) = |Dψ(x̄oȳ, h)/D(ȳ, h)|ȳ=H,h=e dx̄dh/(|L′x̄ (H)||L′g (e)|)



with dx̄/|L′x̄ (H)| being the invariant measure on G/H so that if we use the
notation dx̄ in place of dx̄/|L′x̄ (H)|, then we can write the above formula as

dF (g)/(|F ′ (g)||L′g (e)|) = |Dψ(x̄oȳ, h)/D(ȳ, h)|ȳ=H,h=e dx̄dh/|L′g (e)|

The lhs of this equation gives the invariant measure on G in terms of its coor-
dinate parametrization while the rhs gives the same invariant measure in terms
of the parametrization of G using coordinates on G/H × H with g = x̄hx̄−1 .
It is clear that for the groups, SL(n, R), SL(n, C), U (n, R), U (n, C) and their
subgroups, |Lg (e)| = 1 and hence we can derive using the above formula, the
celebrated Weyl integration formula.

[8] Work of Harish-Chandra continued


[1] Invariant eigen-distributions on a semi-simple Lie group and on a semisim-
ple Lie algebra.
[2] Fourier transforms on a Lie algebra.
Let g be a semisimple Lie algebra with root space decomposition

g = h ⊕ (⊕α∈Δ gα )

= h ⊕ n+ ⊕ n−
where h is a Cartan subalgebra and

n+ = ⊕α∈P gα ,

n− = ⊕α∈P g−α

where P is the set of positive roots w.r.t a fixed Weyl chamber and we note that
Δ = P ∪ (−P ) is the set of all roots. Let

B(X, Y ) = T r(ad(X).ad(Y )), X, Y ∈ g

and for f : g → C, define its Fourier transform by

f̂ (Y ) = ∫_g f (X)exp(B(X, Y ))dX, Y ∈ g

By Weyl's integration formula,

f̂ (Y ) = ∫_{G/A×h} f (Ad(x)H)exp(B(Ad(x)H, Y ))|π(H)|^2 dxdH

where A = exp(h) and

π(H) = Πα∈P α(H)
Here we are assuming the Cartan decomposition

g=t⊕p

where u = t + ip is the Lie algebra of a maximal compact subgroup U of G and


θ is the involutive automorphism of g defined by

θ(X) = X, θ(Y ) = −Y, X ∈ t, Y ∈ p

Then
θ(X + Y ) = X − Y, X ∈ t, Y ∈ p
Note that if X ∈ t, then ad(X) has all imaginary eigenvalues and hence B(X, X) <
0. Likewise, if Y ∈ p, then ad(Y ) has all real eigenvalues and hence B(Y, Y ) > 0.
Now since B(X, Y ) = 0 for X ∈ t, Y ∈ p, it follows that

B(X + Y, θ(X + Y )) = B(X + Y, X − Y ) = B(X, X) − B(Y, Y ) < 0, X ∈ t, Y ∈ p

and this means that (U, V ) → −B(U, θ(V )) is a positive definite form on g × g.
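For g = sl(2, R) with Cartan involution θ(Z) = −Zᵀ this positive definiteness can be checked directly, computing B from the ad matrices as in its definition (a sketch; the helper names ad_matrix, killing, coords are ours):

```python
import numpy as np

H = np.array([[1., 0.], [0., -1.]])
X = np.array([[0., 1.], [0., 0.]])
Y = np.array([[0., 0.], [1., 0.]])
basis = [H, X, Y]

def coords(W):
    # coordinates of a traceless 2x2 matrix W in the basis {H, X, Y}
    return np.array([(W[0, 0] - W[1, 1]) / 2, W[0, 1], W[1, 0]])

def ad_matrix(Z):
    # matrix of ad(Z) acting on sl(2,R) in the basis {H, X, Y}
    return np.column_stack([coords(Z @ B - B @ Z) for B in basis])

def killing(U, V):
    return np.trace(ad_matrix(U) @ ad_matrix(V))

theta = lambda Z: -Z.T                     # Cartan involution of sl(2,R)
assert np.isclose(killing(H, H), 8) and np.isclose(killing(X, Y), 4)

rng = np.random.default_rng(3)
for _ in range(5):
    c = rng.standard_normal(3)
    Z = c[0] * H + c[1] * X + c[2] * Y
    assert -killing(Z, theta(Z)) > 0       # (U,V) -> -B(U, θV) is positive definite
```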
We have

f̂ (Y ) = ∫_{G/A×h} f (Ad(x)H)exp(B(Ad(x)H, Y ))|π(H)|^2 dxdH

and hence for H ′ ∈ h, we have

f̂ (H ′ ) = ∫ f (Ad(x)H)exp(B(Ad(x)H, H ′ ))|π(H)|^2 dxdH

Now if x = exp(tXα ) for some root α, we get

Ad(x)(H) = H − tα(H)Xα

and hence
B(Ad(x)H, H ′ ) = B(H, H ′ )
since
B(Xα , H ′ ) = 0
for any H ′ ∈ h and α any root. Thus, it is clear that B(Ad(x)H, H ′ ) is non-
zero only when Ad(x)H ∈ h. This can happen when either x = exp(tXα ) with
α(H) = 0 or else when Ad(x) belongs to the Weyl group of h so that we get
Ad(x)H ∈ h. For a given H ∈ h, the set of linear functionals λ on h for which
λ(H) = 0 has zero measure and hence it easily follows from the above argument
that the Fourier transform of f at H ′ can be expressed as a superposition of
functions of the form (the superposition is over different H0 ’s)

fH0 (H ′ ) = Σs∈W c(s, H0 )exp(B(sH0 , H ′ ))

Now, for any Z ∈ g, the Fourier transform of g = ∂(Z)f is given by

ĝ(Y ) = ∫ f (X; ∂(Z))exp(B(X, Y ))dX = −B(Z, Y )f̂ (Y )

Let π be an irreducible representation of a semisimple Lie group G and
denote by the same symbol π the corresponding representation of the universal
enveloping algebra U(g). Let g denote the Lie algebra of G. Suppose now
that Z is any element in the centre of the universal enveloping algebra of g. We
observe that for any H ∈ h, we have, where f̂ = fH0 ,

∂(H).f̂ (H ′ ) = ∂(H) Σs∈W c(s, H0 )exp(B(sH0 , H ′ ))

= Σs∈W c(s, H0 )B(sH0 , H)exp(B(sH0 , H ′ ))

Hence if p is an invariant polynomial on h (ie invariant under the Weyl group),


then,
∂(p)fˆ(H) = p(H0 )fˆ(H)
In other words, f̂ (H) is an eigenfunction of all invariant differential operators.
If further f̂ (H ′ )/π(H ′ ) is invariant under the Weyl group, then we must clearly
have c(ss1 , H0 ) = ε(s)c(s1 , H0 ) for all s, s1 ∈ W . In other words, we must have
c(s, H0 ) = ε(s)c(H0 ), s ∈ W , ie, c is skew-symmetric in s ∈ W . This
is because π(sH ′ ) = ε(s)π(H ′ ). A particular case of this is

FH0 (H) = π(H)^{-1} Σs∈W ε(s)exp(B(sH0 , H)), H ∈ h

and we get for this function that

(π(H)−1 ∂(p)π(H))FH0 (H) = p(H0 )FH0 (H)

provided that p is any invariant polynomial on h, ie, p(sH) = p(H), H ∈


h, s ∈ W . In other words, FH0 is an invariant eigenfunction of the operator
π(H)^{-1} ∂(p)π(H) for any invariant polynomial p on h. It then follows from the
Chevalley-Harish-Chandra theory that FH0 extends to an invariant eigenfunc-
tion, or more precisely an invariant eigendistribution, on g. Let U(G) denote the
universal enveloping algebra of G. We denote this function by FH0 (X), X ∈ g.
Then if p is an invariant polynomial on g, we may by use of HarishChandra’s
λ-map, assume p to be an element of the symmetric algebra S(g) for which
Ad(x)(p) = p, x ∈ G, or equivalently, p to be an element of the centre Z of U(G)
and then we have

FH0 (Ad(x)X) = FH0 (X), x ∈ G, X ∈ g,

and
FH0 (X, p) = χ(p)FH0 (X), p ∈ Z

where χ is a linear function on Z dependent upon H0 . Note that if p(X) is a poly-


nomial on g, we can by using the non-singularity of B(U, V ) = T r(ad(U ).ad(V )),
associate uniquely an element ξ(p) ∈ S(g) such that if p is a linear function,
then
B(ξ(p), Y ) = p(Y ), Y ∈ g
if p = p1 p2 ...pk where p1 , ..., pk are linear functions, then ξ(p) = S[ξ(p1 )...ξ(pk )]
where S denotes the symmetrization operator in the tensor algebra and finally,
if p, q are two polynomials and c a scalar, then

ξ(cp + q) = cξ(p) + ξ(q)

It should be noted that if X ∈ g, then ∂(X) is a differential operator on g, ie,


it acts on functions defined on g and further if x ∈ G, then

∂(Ad(x)X)f (Y ) = df (Y + tAd(x)X)/dt|t=0
−1
= df x (Ad(x−1 )Y + tX)/dt|t=0
−1 −1
= (∂(X)f x )(Ad(x−1 )Y ) = (∂(X)f x )x (Y )
= T (x)∂(X)T (x−1 )f (Y )

[9] Let g, h etc. be as above, ie a semisimple Lie algebra with a Cartan


subalgebra. We wish to show that if D is an invariant differential operator
acting in g, then there exists a unique differential operator δ(D) acting in h
such that δ(D) is invariant under the Weyl group W and further,

f (H; D) = f (H; δ(D))

Further, if p is an invariant polynomial on g, then there is a unique invariant


polynomial p̄ on h such that

f (H; ∂(p)) = f (H; π −1 ∂(p̄)π), H ∈ h

or more precisely,
(∂(p)f )(H) = (π −1 .∂(p̄).π.f )(H)
The Casimir element: Consider

B(X, Y ) = T r(ad(X).ad(Y ))

Clearly, p(X) = B(X, X) is an invariant polynomial on g since

B(Ad(g)X, Ad(g)Y ) = B(X, Y ), g ∈ G, X, Y ∈ g

Choose a basis {X1 , ..., Xn } for g and define

gij = B(Xi , Xj )

Let ((g^{ij} )) = ((gij ))^{-1} and define

ω = Σ_{i,j=1}^n g^{ij} Xi Xj

as an element of the universal enveloping algebra U(G) of G. Then we wish to
show that ω ∈ Z where Z is the centre of U(G). We have for g ∈ G,

Ad(g)ω = Σ_{i,j} g^{ij} (Ad(g)Xi ).(Ad(g)Xj )

Let {X^1 , ..., X^n } be the dual basis to {X1 , ..., Xn }, ie,

B(X^i , Xj ) = δ^i_j

Then clearly,
g^{ij} = B(X^i , X^j )
and

Σ_{i,j} g^{ij} B(Ad(g)Xi , X).B(Ad(g)Xj , Y ) = Σ_{i,j} g^{ij} B(Xi , Ad(g^{-1})X).B(Xj , Ad(g^{-1})Y )

= Σ_j B(X^j , Ad(g^{-1})X).B(Xj , Ad(g^{-1})Y ) = B(Ad(g^{-1})X, Ad(g^{-1})Y ) = B(X, Y )

This proves that Ad(g)ω is independent of g ∈ G and hence

Ad(g)ω = ω, g ∈ G

ie,
ω∈Z
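A concrete check for sl(2): build the Gram matrix gij = B(Xi, Xj) (for sl(2) the Killing form is B(U, V) = 4 Tr(UV), a known fact we assume here), form ω in the defining 2-dimensional representation, and verify that it commutes with everything; by Schur's lemma it is then a scalar, namely 3/8, on this irreducible representation. A sketch under these assumptions:

```python
import numpy as np

H = np.array([[1., 0.], [0., -1.]])
X = np.array([[0., 1.], [0., 0.]])
Y = np.array([[0., 0.], [1., 0.]])
basis = [H, X, Y]

# Killing form of sl(2): B(U, V) = 4 Tr(UV)
gram = np.array([[4 * np.trace(A @ B) for B in basis] for A in basis])
ginv = np.linalg.inv(gram)

# Casimir ω = Σ g^{ij} X_i X_j, evaluated in the defining representation
omega = sum(ginv[i, j] * basis[i] @ basis[j]
            for i in range(3) for j in range(3))
for Z in basis:
    assert np.allclose(omega @ Z, Z @ omega)        # ω is central
assert np.allclose(omega, (3 / 8) * np.eye(2))      # scalar by Schur's lemma
```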

Let
F (x : X) = F (Ad(x)X), x ∈ G, X ∈ g
Then let U, V ∈ g. We have

F (x; U : X; ∂(V )) = ∂ 2 /∂t∂s(F (Ad(x.exp(sU ))(X + tV )))|t=s=0

= ∂/∂s[(∂(Ad(x.exp(sU ))(V ))F )(Ad(x.exp(sU ))(X))]|s=0


= (∂(Ad(x)[U, V ])F )(Ad(x)X) + (∂(Ad(x)V )∂(Ad(x)[U, X])F )(Ad(x)X)
= ∂(Ad(x)([U, V ] + V [U, X]))F (Ad(x)X)
More generally, for U1 , ..., Ur , V1 , ..., Vm ∈ g, we have

F (x; U1 ...Ur : X; ∂(V1 ...Vm )) =

∂t1 ...∂tr ∂s1 ...∂sm F (Ad(x.exp(t1 U1 )...exp(tr Ur )).(X + s1 V1 + ... + sm Vm ))|t=0,s=0

We consider the integral

F (H : H ′ ) = ∫_G exp(B(Ad(x)H, H ′ ))dx, H, H ′ ∈ h

We have

F (H; ∂(H1 ) : H ′ ) = ∫_G B(Ad(x)H1 , H ′ )exp(B(Ad(x)H, H ′ ))dx

and more generally if p is any polynomial on h, we get

F (H; ∂(p) : H ′ ) = ∫_G p(Ad(x^{-1})H ′ )exp(B(Ad(x)H, H ′ ))dx

and in particular, if p is any invariant polynomial on g and p̄ denotes its restric-
tion to h, then we get

F (H; ∂(p̄) : H ′ ) = p̄(H ′ )F (H : H ′ )

ie, for any H ′ ∈ h, H → F (H : H ′ ) is an eigenfunction of ∂(p̄) with eigenvalue
p̄(H ′ ). Note that we are here identifying polynomials on a Lie algebra with their
images in the symmetric algebra via the duality defined by the invariant bilinear
form B(., .). By symmetry, the same is true for invariant differential operators
acting on the second variable H ′ . Thus, we can easily see that the solution to
these eigenequations is of the form

F (H : H ′ ) = Σs∈W c(s).exp(B(sH, H ′ ))

This is because a polynomial on g that is G-invariant has its restriction to h


invariant under the Weyl group W because any Weyl group element is induced
by an element Ad(x) for some x ∈ G acting on h. However, by the left and right
invariance of the measure dx on G (G is semisimple and hence unimodular), it
easily follows that

F (sH : H ′ ) = F (H : sH ′ ) = F (H : H ′ ), H, H ′ ∈ h, s ∈ W

Therefore,
c(s) = c0 , s ∈ W
is a constant, ie,

F (H : H ′ ) = c0 Σs∈W exp(B(sH, H ′ ))

Let p be an invariant polynomial on g. Then ∂(p) is an invariant differential
operator in g. We have, using Weyl's integration formula,

∫_g f (X)ḡ(X)dX = ∫_{G/A×h} f (Ad(x)H)ḡ(Ad(x)H)|π(H)|^2 dxdH

Note that in deriving this, we use the Jacobian formula

d(Ad(exp(tXα ))H)/dt = d/dt(exp(t.ad(Xα ))H)

= ad(Xα )exp(t.ad(Xα ))H

and hence
d(Ad(x)H) = (Πα∈Δ α(H))dxdH = |π(H)|^2 dxdH
Then,

∫_g f (X; ∂(p))ḡ(X)dX =

∫ f (x : H; ∂(p̄))ḡ(x : H)|π(H)|^2 dxdH

where f (x : H) = f (Ad(x)H) and likewise for g. This identity can be expressed
as

∫_g f (X; ∂(p))ḡ(X)dX =

∫ f (x : H; ∂(p̄)π)ḡ(x : H; π)dxdH

− ∫ f (x : H; [∂(p̄), π])ḡ(x : H; π)dxdH

The first integral can be written as

∫ |π(H)|^2 f (x : H; π^{-1} ∂(p̄)π)ḡ(x : H)dxdH

Define the function f1 on g by

f1 (x : H) = f (x : H; π −1 ∂(p̄)π)

Then, the above formula can be expressed as

∫_g f (X; ∂(p))ḡ(X)dX =

∫_g f1 (X)ḡ(X)dX − ∫ f (x : H; [∂(p̄), π])ḡ(x : H; π)dxdH

Can we prove that the second integral vanishes?

[10] Some aspects of Lie algebras



Let g be a semisimple Lie algebra and h a Cartan subalgebra. Consider

F (x : X) = F (Ad(x)X), x ∈ G, X ∈ g

For U1 , ..., Ur , V1 , ..., Vm ∈ g, consider

F (x; U1 ...Ur : X; ∂(V1 ...Vm )) =

(∂ 2 /∂t∂s)F (x.exp(t1 U1 )...exp(tr Ur ) : (X + s1 V1 + ... + sm Vm ))|t=0,s=0


= (∂ 2 /∂t∂s)F (Ad(x.exp(t1 U1 )...exp(tr Ur )).(X + s1 V1 + ... + sm Vm ))|t=0,s=0
= ∂/∂t[∂(Ad(x.exp(t1 U1 )...exp(tr Ur ))(V1 ...Vm ))F (Ad(x.exp(t1 U1 )...exp(tr Ur )).X)|t=0 ]
= ∂/∂t [∂(Ad(x)[U1 , Ad(exp(t2 U2 )...exp(tr Ur ))(V1 ...Vm )])F (Ad(x.exp(t2 U2 )...exp(tr Ur )X)+
∂(Ad(x.exp(t2 U2 )...exp(tr Ur )).(V1 ...Vm ))∂(Ad(x)[U1 , Ad(exp(t2 U2 )...exp(tr Ur ))X])
F (Ad(x.exp(t2 U2 )...exp(tr Ur )).X)|t2 =...=tr =0 ]
= ∂/∂t [∂(Ad(x)((L[U1 ,Ad(exp(t2 U2 )...exp(tr Ur ))X] +ad(U1 )).Ad(exp(t2 U2 )...exp(tr Ur )))V1 ...Vm ))
.F (Ad(x).Ad(exp(t2 U2 )...exp(tr Ur ))X)]|t =0
where
t = (t2 , .., tr )
This can be expressed as

∂[Ad(x)[((L[U1 ,ψ(t )X] + ad(U1 ))ψ(t ))p]]F (Ad(x).ψ(t )X)

= ∂[Ad(x)[([U1 , ψ(t )X] + ad(U1 ))ψ(t )p])F (Ad(x).ψ(t )X)


where
p = V1 ...Vm , ψ(t ) = Ad(exp(t2 U2 )...exp(tr Ur ))
Here, multiplication is being assumed to be taking place in U(G). Now differ-
entiating the expression

∂(([U1 , ψ(t )X] + ad(U1 ))ψ(t )p)F (ψ(t )X)

w.r.t t2 and setting t2 = 0 gives

[∂(([U1 , [U2 , ψ(t )X] + ad(U1 ).ad(U2 ))ψ(t )p)F (ψ(t )X)

+∂([U1 , ψ(t )X][U2 , ψ(t )p])


+∂([U2 , ψ(t )X])∂(([U1 , ψ(t )X] + ad(U1 )ψ(t ))p)].F (ψ(t )X)
where
t = (t3 , ..., tr )
If we put t = 0, then this expression becomes

[∂(ad(U1 )([U2 , X]+ad(U2 ))p)F (X)+∂([U2 , X]).∂(([U1 , X]+ad(U1 ))p)+∂([U1 , X][U2 , p])]F (X)

= ∂([U1 , [U2 , X]]p + [U1 , [U2 , p]])F (X) + ∂([U1 , X].[U2 , X]p)F (X)

+∂([U2 , X].[U1 , p])F (X) + ∂([U1 , X][U2 , p])F (X)


Define
σX (Y ) = L[X,Y ] + ad(Y )
Then, σX is a representation of g in S(g) and further,

σX (U1 )σX (U2 )p =

σX (U1 )([U2 , X]p + [U2 , p]) =


([U1 , X] + ad(U1 ))([U2 , X]p + [U2 , p]) =
= [U1 , X][U2 , X]p + [U1 , X][U2 , p] + [U1 , [U2 , X]p] + [U1 , [U2 , p]]
= [U1 , X][U2 , X]p + [U1 , X][U2 , p] + [U1 , [U2 , X]]p + [U2 , X][U1 , p] + [U1 , [U2 , p]]
These expressions are valid in the symmetric algebra.

A simple case:

F (x; U1 U2 : X; V ) = ∂^3 /∂t1 ∂t2 ∂s (F (Ad(x.exp(t1 U1 ).exp(t2 U2 )).(X+sV )))|t1 =t2 =s=0

= (∂ 2 /∂t1 ∂t2 )∂(Ad(x.exp(t1 U1 ).exp(t2 U2 ))V )F (Ad(x.exp(t1 U1 ).exp(t2 U2 ))X)|t1 =t2 =0


= (∂/∂t2 )[∂(Ad(x.[U1 , Ad(exp(t2 U2 ))V ])F (Ad(x.exp(t2 U2 ))X)+
∂(Ad(x).[U1 , Ad(exp(t2 U2 )X])∂(Ad(x.exp(t2 U2 ))V )F (Ad(x.exp(t2 U2 )).X)]|t2 =0

Without considering the term Ad(x), this equals

∂([U1 , [U2 , V ]])F (X) + ∂([U2 , X][U1 , V ]) + [U1 , [U2 , X]]

+[U1 , X][U2 , V ] + [U1 , X][U2 , X]V )

Let G be a Lie group and let χ be an irreducible character of g, the Lie


algebra of G, given by Weyl's character formula

Δ(H)χ(H) = Σs∈W ε(s)exp(s(Λ + ρ)(H)), H ∈ h

where Λ is a dominant integral weight and ρ = (1/2)Σα∈P α is one half the
sum of the positive roots. Here

Δ(H) = Πα∈P (exp(α(H)/2) − exp(−α(H)/2))

= Σs∈W ε(s)exp(sρ(H))

χ is defined on Ad(g)h by the obvious relation

χ(Ad(g)H) = χ(H), g ∈ G, H ∈ h

We wish to prove two things: one, that χ is a finite integer linear combination
of characters of the torus h and two, that denoting χ above by χΛ , we have

∫_h |Δ(H)|^2 χΛ (H)χ̄μ (H)dH = δ(Λ, μ)

where μ is any other dominant integral weight. To prove this last statement,
it suffices to show that if s ∈ W differs from the identity, then s(Λ + ρ) − ρ is
integral but not dominant integral. For then it would follow that

∫_h exp(s(Λ + ρ)(H)).exp(−s′ (μ + ρ)(H))dH

= ∫_h exp((s′^{-1} s.(Λ + ρ) − ρ)(H)).exp(−μ(H))dH = 0

if μ is dominant integral and s′ ≠ s. Suppose α is any simple root. Then,

(sα .(Λ + ρ) − ρ)(H) =

Λ(H) − (Λ + ρ)(Hα )α(H)


When evaluated at Hα , this gives

−Λ(Hα ) − 2ρ(Hα )

which is a non-positive integer. A better way to see this is to note that since α
is simple, sα (P − {α}) = P − {α} and sα α = −α. Thus, sα ρ = ρ − α and we
get sα ρ − ρ = −α from which the result follows.

Reference:
[1] Harish-Chandra, Collected papers, Edited by V.S.Varadarajan, Springer.
[2] V.S.Varadarajan, ”Harmonic Analysis on Semisimple Lie Groups”, Cam-
bridge University Press.

[11] Probability theory in mechanics and image processing.


[a] An introduction to Lie groups, Lie algebras and their representations.
[b] The group of rotations SO(3) and its Lie algebra.
[c] The proper orthochronous Lorentz group and its Lie algebra.
[d] The Haar measure on a group.
[e] Examples of computation of the Haar measure.
[f] The groups SL(2, C), SL(2, R) and their irreducible representations.
[g] Realizing the full orthochronous Lorentz group using SL(2, C)
[h] Realizing the orthochronous Lorentz group in the plane (ie txy plane)
using SL(2, R).
[i] The kinetic energy of a rigid body in terms of the Lie algebra generators
of SO(3).
[j] Root space decomposition of a semi-simple Lie algebra.
[k] Classification theory of the classical simple Lie algebras.

[l] Dynkin diagrams.


[m] Hunt’s theorem on the generator of an infinitely divisible distribution on
a Lie group.
  
Lf (x) = (1/2)Σ_{i,j} a(i, j)Xi Xj f (x) + ∫ (f (yx) − f (x) − Σ_i b(i)Xi f (x))dν(y)

[n] Cartan’s criteria for solvability and semisimplicity of a Lie algebra.

[o] The equations of motion of several rigid bodies pivoted to each other in the
presence of external non-random torque plus a small random torque component.
Computing the approximate mean and variance propagation equations using
perturbation theory. Computing the dynamics in coordinate free Lie algebra
domain using the formula for the differential of the exponential map. Large
deviation analysis of the perturbed motion by calculating the rate functional for
the perturbed Lie algebra process. First moment generating functional, then
limiting logarithmic moment generating functional of the Lie algebra process and
finally the Fenchel-Legendre transform of the moment generating functional and
then applying the Gartner-Ellis theorem to calculate the asymptotic probability
(in the limit of small noise amplitude) for the perturbed Lie algebra element to
remain within a small neighbourhood of the zero matrix (stability zone). We
do this calculation after introducing control feedback torque into the system in
the form of the error between a desired Lie algebra process and the actual Lie
algebra process. The coefficients of this control torque are then designed so that
the probability of deviation of the process from the stability zone (calculated
using large deviation theory) is as small as possible.

[12] Group theory in image processing


Problems:
[1] Let M be a manifold on which a Lie group G acts transitively. Let μ be a
left invariant measure on G. For any fixed x0 ∈ M, define the map τ : G → M
by τ (g) = gx0 . Show that μoτ^{-1} is a G-invariant measure on M.
[2] Let H denote the stability subgroup of G that fixes x0 . Let π be a unitary
representation of G in a Hilbert space H. Define the operator

P = ∫_H π(h)dh

where dh is a left invariant measure on H that is also invariant under the
mapping h → h^{-1} . Assume that this measure can be normalized so that
∫_H dh = 1. Show that

P^2 = P = P^∗
For this you must use the fact that

π(h−1 ) = π(h)∗ , h ∈ H

and

P^2 = ∫_{H×H} π(h1 h2 )dh1 dh2 = ∫_{H×H} π(h)dh1 dh = P

by using the left invariance of the measure dh. Let Ĝ denote the collection of
all inequivalent irreducible representations of G. We expand a function f on G
using the Plancherel formula

f (g) = ∫ T r(f̂ (π)π(g))dν(π)

where ν is the Plancherel measure on Ĝ. Then, we can write

f (g) = Σ_{a,b} ∫_Ĝ f̂ab (π)πba (g)dν(π)

and hence

< f1 , f2 >= ∫_G f̄1 (g)f2 (g)dg =

Σ_{abcd} ∫_{G×Ĝ×Ĝ} f̂1ab (π)^∗ f̂2cd (σ)πba (g)^∗ σcd (g)dgdν(π)dν(σ)

If we assume the Schur orthogonality relations

∫_G πba (g)^∗ σcd (g)dg = ω(π)δ(π, σ).δbc δad

then we get

< f1 , f2 >= Σ_{π,a,b} ω(π)f̂1ab (π)^∗ f̂2ab (π)

and in particular,

‖f‖^2 = ∫_G |f (g)|^2 dg = Σ_{π∈Ĝ} ω(π)‖f̂ (π)‖^2

where
‖A‖^2 = T r(A^∗ A)
is the Frobenius norm square of the matrix A. In this case, the Plancherel
measure ν on Ĝ is discrete which is true in particular if the group is compact.
In the general case, we must replace the above by

πba (g)∗ σcd (g)dg = ω(π)δ(π, σ)δbc δad
G

where now δ(π, σ) is not the Kronecker delta function, but rather the Dirac
delta function w.r.t. the measure ν. Thus, in this case, we get

 f 2 =  fˆ(π) 2 ω(π)dν(π)
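For the simplest compact group G = Z_n (irreducibles = the characters, ω(π) = 1, normalized Haar measure) the Plancherel identity reduces to Parseval's theorem for the DFT. A sketch with our own normalizations:

```python
import numpy as np

n = 8
rng = np.random.default_rng(4)
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # f on G = Z_n

# Fourier coefficients f̂(k) = ∫_G f(g) χ_k(g)* dg with normalized Haar dg = 1/n
k = np.arange(n)
fhat = np.array([np.mean(f * np.exp(-2j * np.pi * j * k / n)) for j in range(n)])

lhs = np.mean(np.abs(f) ** 2)              # ∫_G |f(g)|^2 dg
rhs = np.sum(np.abs(fhat) ** 2)            # Σ_π ω(π) ||f̂(π)||^2 with ω = 1
assert np.isclose(lhs, rhs)
```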


In particular, we find that if $f$ is a function defined on $M$, then we can write
$$f(gx_0) = \int_{\hat{G}} Tr(\hat{f}_1(\pi)\pi(g))\,d\nu(\pi)$$
where
$$f_1(g) = f(gx_0), \quad g \in G$$
and then we get for $g \in G$, $h \in H$ that
$$f(gx_0) = f_1(gh) = f_1(g) = \int_H f_1(gh)\,dh = \int_{\hat{G}} Tr(\hat{f}_1(\pi)\pi(g)P)\,d\nu(\pi)$$
For each $x \in M$, we can, by the assumed transitivity of the $G$ action on $M$, assume that there is an element $\gamma(x) \in G$ such that $\gamma(x)x_0 = x$. We then get from the above identity,
$$f(x) = \int_{\hat{G}} Tr(\hat{f}_1(\pi)\pi(\gamma(x))P)\,d\nu(\pi), \quad x \in M$$
which is the desired expansion for functions defined on $M$ in terms of the functions
$$\psi_{ab}(x) = (\pi(\gamma(x))P)_{ab} = \sum_c [\pi(\gamma(x))]_{ac}P_{cb}$$
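The projection property $P^2 = P = P^*$ of the subgroup average can be checked numerically on a small example. The following sketch (Python/numpy; the choice of group, subgroup and representation is illustrative, not from the text) averages the permutation representation of $S_3$ over the two-element subgroup generated by a transposition; for permutation matrices $\pi(h^{-1}) = \pi(h)^T = \pi(h)^*$, as required above.

```python
import numpy as np

def perm_matrix(p):
    # pi(p): permutation representation on C^3, where p[j] = image of j
    n = len(p)
    M = np.zeros((n, n))
    for j, i in enumerate(p):
        M[i, j] = 1.0
    return M

# H = {e, (0 1)}: a two-element subgroup of S3; averaging over H with
# normalized counting measure gives P = (1/|H|) sum_h pi(h)
H = [(0, 1, 2), (1, 0, 2)]
P = sum(perm_matrix(h) for h in H) / len(H)

assert np.allclose(P @ P, P)   # P^2 = P
assert np.allclose(P, P.T)     # P = P* (real matrices here)
```

The same check works for any finite subgroup, since the sum over $H$ reproduces itself when multiplied by any $\pi(h)$.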

[13] Weyl's character formula

Let $\Delta$ be the set of roots and $\Delta^+$ the set of positive roots relative to an order. Note that $G$ is assumed to be a compact group. Let $\lambda$ be a dominant integral weight. Let $\pi_\lambda$ denote the irreducible representation of $G$ having $\lambda$ as its maximal weight. Let
$$\chi_\lambda(g) = Tr(\pi_\lambda(g)), \quad g \in G$$
$\chi_\lambda$ is the irreducible character of $G$ corresponding to the irreducible representation $\pi_\lambda$. We write $\phi_\lambda(X)$ for $log(\chi_\lambda(exp(X)))$ where $X \in \mathfrak{g}$, with $\mathfrak{g}$ being the Lie algebra of $G$. Let $\mathfrak{h}$ denote a Cartan subalgebra of $\mathfrak{g}$. Then $\chi_\lambda(h)$ for $h = exp(H)$, $H \in \mathfrak{h}$, is a finite Fourier series with non-negative integral coefficients, ie, we can write
$$\chi_\lambda(exp(H)) = \sum_k n_k\,exp(2\pi i\chi_k(H))$$
where the $\chi_k$'s are integral linear functions on $\mathfrak{h}$. It is an easy result in Fourier series that the functions $exp(2\pi i\chi_k(H))$, $k = 1,2,...$ are orthogonal on $\mathfrak{h}$ w.r.t the Euclidean measure on $\mathfrak{h}$. From the irreducibility of $\chi_\lambda$ and Weyl's integration formula, we have
$$|W|^{-1}\int_H \chi_\lambda(h)\chi_\mu(h)^*|\Delta(h)|^2\,dh = \delta(\lambda,\mu)$$
for any two dominant integral weights $\lambda,\mu$. Here, $H = exp(\mathfrak{h})$ is the Cartan subgroup of $G$ corresponding to the Cartan subalgebra $\mathfrak{h}$. In this formula,
$$\Delta(h) = \prod_{\alpha\in\Delta^+}(exp(\alpha(H)/2) - exp(-\alpha(H)/2))$$
$$= exp(\rho(H))\prod_{\alpha\in\Delta^+}(1 - exp(-\alpha(H))), \quad \rho = (1/2)\sum_{\alpha\in\Delta^+}\alpha$$
$$= \sum_{s\in W}\epsilon(s)\,exp(s\rho(H)), \quad h = exp(H)$$
where $\epsilon(s)$ denotes the sign of the Weyl group element $s$. We wish to prove that
$$\chi_\lambda(exp(H)) = \frac{\sum_{s\in W}\epsilon(s)\,exp(s(\lambda+\rho)(H))}{\sum_{s\in W}\epsilon(s)\,exp(s\rho(H))}$$
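For $G = SU(2)$ the formula can be checked concretely: the irreducible representation of maximal weight $n$ has character $\sum_{j=0}^n e^{i(n-2j)\theta}$ on the maximal torus, the Weyl group is $W = \{\pm 1\}$, and $\rho$ corresponds to $1$. A minimal numerical sketch (Python/numpy; function names are ours):

```python
import numpy as np

def chi_sum(n, theta):
    # Character of the (n+1)-dimensional irrep of SU(2) at diag(e^{i t}, e^{-i t}),
    # written as a finite Fourier series with non-negative integer coefficients
    return sum(np.exp(1j * (n - 2 * j) * theta) for j in range(n + 1))

def chi_weyl(n, theta):
    # Weyl character formula: alternating sums over W = {+1, -1}, rho = 1
    num = np.exp(1j * (n + 1) * theta) - np.exp(-1j * (n + 1) * theta)
    den = np.exp(1j * theta) - np.exp(-1j * theta)
    return num / den

theta = 0.7
for n in range(6):
    assert np.allclose(chi_sum(n, theta), chi_weyl(n, theta))
```

Both expressions reduce to the familiar $\sin((n+1)\theta)/\sin\theta$.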

[14] Root systems and their classification

Let $\mathfrak{g}$ be a semisimple Lie algebra with a Cartan subalgebra $\mathfrak{h}$ and let $\Delta$ denote the set of roots of $(\mathfrak{g},\mathfrak{h})$. Let $\Delta^+$ denote the set of positive roots w.r.t a given Weyl chamber. Let $\mathfrak{g}_\alpha$ denote the root space corresponding to the root $\alpha$, ie,
$$\mathfrak{g}_\alpha = \{X \in \mathfrak{g} : [H,X] = \alpha(H)X\ \forall H \in \mathfrak{h}\}$$
Let $\{H_\alpha, X_\alpha, X_{-\alpha}\}$ be a canonical triple, ie, $X_\alpha \in \mathfrak{g}_\alpha$, $X_{-\alpha} \in \mathfrak{g}_{-\alpha}$, and then it is clear that
$$H_\alpha = [X_\alpha, X_{-\alpha}] \in \mathfrak{h}$$
since for any $H \in \mathfrak{h}$, we have
$$[H, H_\alpha] = [H,[X_\alpha, X_{-\alpha}]] = -([X_\alpha,[X_{-\alpha},H]] + [X_{-\alpha},[H,X_\alpha]])$$
$$= -\alpha(H)[X_\alpha, X_{-\alpha}] - \alpha(H)[X_{-\alpha}, X_\alpha] = 0$$
and $\mathfrak{h}$ is maximal Abelian in $\mathfrak{g}$.
Remark: If $\alpha \in \Delta$, then $-\alpha \in \Delta$. For suppose $-\alpha \notin \Delta$; then $\mathfrak{g}_\alpha \perp \mathfrak{g}_\beta$ for all $\beta \in \Delta$. Indeed, if $X \in \mathfrak{g}_\alpha$, $Y \in \mathfrak{g}_\beta$, then
$$\beta(H)B(X,Y) = B(X,[H,Y]) = B([X,H],Y) = -\alpha(H)B(X,Y)$$
so that
$$(\alpha(H) + \beta(H))B(X,Y) = 0\ \forall H \in \mathfrak{h}$$
which implies that $B(X,Y) = 0$, since by hypothesis $\alpha + \beta \neq 0$ and hence there exists an $H \in \mathfrak{h}$ such that $\alpha(H) + \beta(H) \neq 0$. Therefore, since also $\mathfrak{g}_\alpha \perp \mathfrak{h}$, it follows that $\mathfrak{g}_\alpha \perp \mathfrak{g}$, which contradicts the non-degeneracy of the symmetric bilinear form for a semisimple Lie algebra (Cartan's theorem on semisimplicity).

Now let $\alpha, \beta \in \Delta$, $\alpha \neq \pm\beta$. Then since
$$[\mathfrak{g}_\alpha, \mathfrak{g}_{\beta+k\alpha}] \subset \mathfrak{g}_{\beta+(k+1)\alpha}$$
for all integers $k$, it follows that there exist two non-negative integers $p, q$ such that
$$\bigoplus_{k=-q}^{p} \mathfrak{g}_{\beta+k\alpha}$$
is invariant under the adjoint action of $\{\bar{H}_\alpha, X_\alpha, X_{-\alpha}\}$ where
$$\bar{H}_\alpha = 2H_\alpha/\langle\alpha,\alpha\rangle$$

Remark:
$$[X_\alpha, X_{-\alpha}] = cH_\alpha$$
where $H_\alpha \in \mathfrak{h}$ is defined so that
$$B(H_\alpha, H) = \alpha(H), \quad H \in \mathfrak{h}$$
Note that $B(.,.)$ is non-singular on $\mathfrak{h}\times\mathfrak{h}$. Indeed, since $\mathfrak{h}$ is perpendicular to all the root spaces, if any $H \in \mathfrak{h}$ were perpendicular to $\mathfrak{h}$, then $H$ would be perpendicular to the whole of $\mathfrak{g}$, which would contradict Cartan's theorem on the non-degeneracy of the symmetric bilinear form for semisimple Lie algebras. We have
$$c\alpha(H) = B(H,[X_\alpha, X_{-\alpha}]) = B([H,X_\alpha], X_{-\alpha}) = \alpha(H)B(X_\alpha, X_{-\alpha})$$
and hence,
$$c = B(X_\alpha, X_{-\alpha})$$
We then define
$$\bar{H}_\alpha = 2H_\alpha/\langle\alpha,\alpha\rangle$$
where
$$\langle\alpha,\alpha\rangle = \alpha(H_\alpha) = B(H_\alpha, H_\alpha)$$
Then,
$$[\bar{H}_\alpha, X_\alpha] = \alpha(\bar{H}_\alpha)X_\alpha = 2X_\alpha \quad (a)$$
$$[\bar{H}_\alpha, X_{-\alpha}] = -2X_{-\alpha} \quad (b)$$
Note that
$$B(cH_\alpha, H_\alpha) = c\langle\alpha,\alpha\rangle$$
or equivalently,
$$cB(H_\alpha, H_\alpha) = B(cH_\alpha, H_\alpha) = B([X_\alpha, X_{-\alpha}], H_\alpha) = B(X_\alpha,[X_{-\alpha}, H_\alpha]) = \langle\alpha,\alpha\rangle B(X_\alpha, X_{-\alpha})$$



We can always scale $X_\alpha$ and $X_{-\alpha}$ so that
$$c = B(X_\alpha, X_{-\alpha}) = 2/\langle\alpha,\alpha\rangle$$
without affecting the commutation relations (a) and (b). We then get $\bar{H}_\alpha = cH_\alpha$ and
$$[\bar{H}_\alpha, X_\alpha] = 2X_\alpha, \quad [\bar{H}_\alpha, X_{-\alpha}] = -2X_{-\alpha}, \quad [X_\alpha, X_{-\alpha}] = \bar{H}_\alpha$$
In other words, $\{\bar{H}_\alpha, X_\alpha, X_{-\alpha}\}$ form a canonical $sl(2,\mathbb{C})$ triple, and as such $\rho_\alpha$, defined as the adjoint representation of the Lie algebra spanned by this triple acting on $\bigoplus_{k=-q}^{p}\mathfrak{g}_{\beta+k\alpha}$, is an irreducible representation. Note that $dim\,\mathfrak{g}_\alpha = 1$ for any root $\alpha$. To see this, we consider the subspace ($X_{-\alpha}$ is an arbitrary non-zero element in $\mathfrak{g}_{-\alpha}$)
$$V_\alpha = span\{X_{-\alpha}\} \oplus \mathfrak{h} \oplus \bigoplus_{k>0}\mathfrak{g}_{k\alpha}$$
This space is finite dimensional since it is a subspace of the finite dimensional Lie algebra $\mathfrak{g}$ and further, it is invariant under each of the operators $ad\,\bar{H}_\alpha$, $ad\,X_{\pm\alpha}$ (note that for any $\alpha,\beta \in \Delta$, we have $[\mathfrak{g}_\alpha,\mathfrak{g}_\beta] \subset \mathfrak{g}_{\alpha+\beta}$; we are here using the convention that if $\gamma$ is not a root, then $\mathfrak{g}_\gamma = \{0\}$). It follows therefore that
$$0 = Tr([ad\,X_\alpha, ad\,X_{-\alpha}]|_{V_\alpha}) = Tr(ad\,[X_\alpha, X_{-\alpha}]|_{V_\alpha}) = Tr(ad\,\bar{H}_\alpha|_{V_\alpha})$$
$$= -\alpha(\bar{H}_\alpha) + 0 + \sum_{k>0} k\alpha(\bar{H}_\alpha)\,dim\,\mathfrak{g}_{k\alpha} = -2 + \sum_{k\geq 1} 2k\,dim\,\mathfrak{g}_{k\alpha}$$
It easily follows from this that
$$dim\,\mathfrak{g}_\alpha = 1, \quad dim\,\mathfrak{g}_{k\alpha} = 0,\ k > 1$$


Now $\mathfrak{g}_{\beta,\alpha} = \bigoplus_{k=-q}^{p}\mathfrak{g}_{\beta+k\alpha}$ is the representation space of the irreducible representation $\rho_\alpha$ of $sl(2,\mathbb{C}) = span\{\bar{H}_\alpha, X_{\pm\alpha}\}$, and consequently its highest weight $(\beta+p\alpha)(\bar{H}_\alpha)$ and lowest weight $(\beta-q\alpha)(\bar{H}_\alpha)$ are integers satisfying
$$(\beta - q\alpha)(\bar{H}_\alpha) = -(\beta + p\alpha)(\bar{H}_\alpha)$$
or equivalently,
$$a(\beta,\alpha) = \beta(\bar{H}_\alpha) = 2\langle\beta,\alpha\rangle/\langle\alpha,\alpha\rangle = q - p$$
is an integer.
Remark: The subspace $\mathfrak{g}_{\beta+k\alpha}$ has weight given by $(\beta+k\alpha)(\bar{H}_\alpha)$, because if $X$ is any vector in this subspace,
$$\rho_\alpha(\bar{H}_\alpha)(X) = ad(\bar{H}_\alpha)(X) = (\beta+k\alpha)(\bar{H}_\alpha)X = (\beta(\bar{H}_\alpha) + 2k)X$$
Now, we further observe that $\beta + k\alpha$ is a root for each $k = -q, -q+1, ..., p$ and that $X_{\beta+k\alpha}$ has weight $\beta(\bar{H}_\alpha)+2k$ for each $k = -q, -q+1, ..., p$. Note that these weights are all integers since these are weights of an irreducible representation of $sl(2,\mathbb{C})$. In particular, with $k = 0$, we get that $\beta(\bar{H}_\alpha)$ is a weight (an integer) and hence, from the representation theory of $sl(2,\mathbb{C})$, it follows that $-\beta(\bar{H}_\alpha)$ is also a weight of the irreducible representation $\rho_\alpha$. Thus, $\beta - \beta(\bar{H}_\alpha)\alpha$ is also a root. But this root is precisely $s_\alpha\beta$, namely the reflection of the root $\beta$ in the hyperplane $\{H : \alpha(H) = 0\}$. More precisely,
$$s_\alpha(H_\beta) = H_\beta - \beta(\bar{H}_\alpha)H_\alpha = H_\beta - 2\langle\beta,\alpha\rangle H_\alpha/\langle\alpha,\alpha\rangle$$
is the reflection of $H_\beta$ in the hyperplane $\{H : \alpha(H) = 0\}$. We can in fact define for any $H \in \mathfrak{h}$, $s_\alpha(H) \in \mathfrak{h}$ by
$$s_\alpha(H) = H - 2\alpha(H)H_\alpha/\langle\alpha,\alpha\rangle$$
This is a reflection of $H \in \mathfrak{h}$ in the hyperplane $\{H' : \alpha(H') = 0\}$.

We also note that if α is a root and c is any complex number such that cα
is a root, then c = ±1.
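The root-string integer $\beta(\bar{H}_\alpha) = 2\langle\beta,\alpha\rangle/\langle\alpha,\alpha\rangle = q - p$ can be verified concretely for the root system of $sl(3,\mathbb{C})$, whose roots are the vectors $e_i - e_j$, $i \neq j$. A small sketch (pure Python; the helper names are ours):

```python
# Roots of sl(3, C) as integer vectors e_i - e_j in Z^3.  For each pair of
# non-proportional roots (alpha, beta), beta + p*alpha and beta - q*alpha
# are the ends of the alpha-string through beta, and
#   2 <beta, alpha> / <alpha, alpha>  =  q - p
def e(i, n=3):
    v = [0] * n
    v[i] = 1
    return tuple(v)

def add(u, v): return tuple(a + b for a, b in zip(u, v))
def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def dot(u, v): return sum(a * b for a, b in zip(u, v))

roots = {sub(e(i), e(j)) for i in range(3) for j in range(3) if i != j}

for alpha in roots:
    for beta in roots:
        if beta == alpha or beta == sub((0, 0, 0), alpha):
            continue
        p = 0
        while add(beta, tuple(a * (p + 1) for a in alpha)) in roots:
            p += 1
        q = 0
        while sub(beta, tuple(a * (q + 1) for a in alpha)) in roots:
            q += 1
        assert 2 * dot(beta, alpha) // dot(alpha, alpha) == q - p
```

For example, with $\alpha = e_1 - e_2$ and $\beta = e_2 - e_3$ one has $p = 1$, $q = 0$ and $\beta(\bar{H}_\alpha) = -1$.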

[15] Image processing on $S^2$

Let $x, y, z \in S^2$. The undistorted image field is given by $f(x)$ and the distorted image field is given by
$$g(x) = K_1 f(x) + K_2(f\otimes f)(x) + w(x) = \int_{S^2} K_1(x,y)f(y)\,dy + \int_{S^2\times S^2} K_2(x,y,z)f(y)f(z)\,dy\,dz + w(x)$$
where $dy, dz$ denote the area measure on $S^2$, ie, in spherical polar coordinates, if $x = (\theta,\phi)$, then $dx = sin(\theta)\,d\theta\,d\phi$. We assume that $K_1, K_2$ are $G$-invariant kernels where $G = SO(3)$. This means that
$$K_1(gx,gy) = K_1(x,y), \quad K_2(gx,gy,gz) = K_2(x,y,z), \quad x,y,z \in S^2,\ g \in G$$
We wish then to determine the most general forms of $K_1, K_2$. Let $Y_{lm}(x)$ denote the spherical harmonics on $S^2$, $m = -l, -l+1, ..., l-1, l$, $l = 0,1,2,...$. We can expand
$$K_1(x,y) = \sum_{ll'mm'} K_1(l,m,l',m')Y_{lm}(x)\bar{Y}_{l'm'}(y)$$

Then the $G$-invariance condition for $K_1$ translates to
$$\sum K_1(lml'm')[\pi_l(g)]_{m_1 m}[\bar{\pi}_{l'}(g)]_{m_1'm'}Y_{lm_1}(x)\bar{Y}_{l'm_1'}(y) = \sum_{ll'mm'} K_1(l,m,l',m')Y_{lm}(x)\bar{Y}_{l'm'}(y)$$
This condition is equivalent to
$$\sum_{mm'} K_1(lml'm')[\pi_l(g)]_{m_1 m}[\bar{\pi}_{l'}(g)]_{m_1'm'} = K_1(lm_1 l'm_1')$$
or equivalently, in terms of matrices,
$$K_1(l,l') = \pi_l(g)K_1(l,l')\pi_{l'}(g^{-1}), \quad g \in G$$
One way to satisfy this is to take
$$K_1(l,l') = c(l)I_l\,\delta(l,l')$$

We now require analogously a similar condition for the $G$-invariance of the kernel $K_2$. The condition is
$$K_2(x,y,z) = \sum_{ll'l''mm'm''} K_2(lml'm'l''m'')Y_{lm}(x)\bar{Y}_{l'm'}(y)\bar{Y}_{l''m''}(z)$$
with $K_2$ satisfying
$$\sum_{mm'm''} K_2(lml'm'l''m'')[\pi_l(g)]_{m_1 m}[\bar{\pi}_{l'}(g)]_{m_1'm'}[\bar{\pi}_{l''}(g)]_{m_1''m''} = K_2(lm_1 l'm_1' l''m_1'')$$
This condition can be put into the form
$$\pi_l(g)K_2(ll'l'')(\pi_{l'}(g^{-1})\otimes\pi_{l''}(g^{-1})) = K_2(ll'l'')\ \forall g \in G$$
To see how this equation can be satisfied, we use the Clebsch-Gordan theorem to expand
$$\pi_{l'}(g)\otimes\pi_{l''}(g) = T_{l',l''}\Big[\bigoplus_{k=|l'-l''|}^{l'+l''}\pi_k(g)\Big]T_{l',l''}^{-1}$$
where $T$ is a non-singular matrix of size $N\times N$ with $N = (2l'+1)(2l''+1)$. Note that $\pi_l(g)$ is a square matrix of size $(2l+1)\times(2l+1)$. $T$ can be chosen to be a unitary matrix. Then the above condition on $K_2$ can be expressed as
$$K_2(ll'l'') = \pi_l(g)K_2(ll'l'')T_{l',l''}\Big[\bigoplus_{k=|l'-l''|}^{l'+l''}\pi_k(g^{-1})\Big]T_{l',l''}^*, \quad g \in G$$

In terms of block structured matrices, this equation can be expressed as
$$\sum_{sk}\pi_l(g)[K_2(ll'l'')](s)T_{l',l''}(s,k)\pi_k(g^{-1})\bar{T}_{l',l''}(m,k) = [K_2(ll'l'')](m), \quad g \in G$$
or equivalently,
$$\sum_s \pi_l(g)[K_2(ll'l'')](s)T_{l',l''}(s,k)\pi_k(g^{-1}) = \sum_m [K_2(ll'l'')](m)T_{l',l''}(m,k), \quad g \in G$$
One way in which this can be satisfied is to assume that
$$\sum_s K_2(ll'l'')(s)T_{l',l''}(s,k) = \delta(l,k)\sum_m K_2(ll'l'')(m)T_{l',l''}(m,k)$$
This condition is equivalent to requiring that
$$\sum_s K_2(ll'l'')(s)T_{l',l''}(s,k) = 0, \quad l \neq k$$

[16] Dynamics of 3-D rigid body links.

We assume that $R_1^{(0)}(t)$ and $R_2^{(0)}(t)$ are respectively the rotation matrices $R_1(t), R_2(t)$ in the absence of noise. Thus, if the total torque rates are expressed as
$$T_1(t) + \sqrt{\epsilon}\,\tau_1(t), \quad T_2(t) + \sqrt{\epsilon}\,\tau_2(t)$$
where $T_1, T_2$ are the non-random components of the torques, $\tau_k(t)$, $k = 1,2$ are white noise processes taking values in the space of $3\times 3$ skew-symmetric matrices, and $\epsilon > 0$ is a small parameter quantifying the weak noise strength, then $R_k^{(0)}(t)$, $k = 1,2$ satisfy the following matrix differential equations:
$$J_1 R_1^{(0)\prime\prime}(t)^T + kR_2^{(0)\prime\prime}(t)^T = R_1^{(0)\prime\prime}(t)J_1 + R_2^{(0)\prime\prime}(t)l^T + T_1(t)^T - T_2(t)^T$$
and likewise with the subscripts 1 and 2 interchanged. Let
$$R_k(t) = R_k^{(0)}(t)\,exp(\delta X_k(t)), \quad k = 1,2$$
where $\delta X_k(t)$, $k = 1,2$ are skew-symmetric matrices and represent the effect of noise. We first derive linearized equations for $\delta X_k$. For this we note that, upto linear orders in the $\delta X_k$'s, we have
$$R_k(t) = R_k^{(0)}(t)(1 + \delta X_k(t)) = A_k(t)(1 + \delta X_k(t))$$
$$R_k' = A_k'(1 + \delta X_k) + A_k\delta X_k'$$
$$R_k'' = A_k''(1 + \delta X_k) + 2A_k'\delta X_k' + A_k\delta X_k''$$
where
$$A_k = R_k^{(0)}(t)$$
Thus the linearized differential equations are
$$(A_1''\delta X_1 + 2A_1'\delta X_1' + A_1\delta X_1'')J_1 + J_1(\delta X_1 A_1''^T + 2\delta X_1' A_1'^T + \delta X_1'' A_1^T)$$
$$+ (A_2''\delta X_2 + 2A_2'\delta X_2' + A_2\delta X_2'')k^T + k(\delta X_2 A_2''^T + 2\delta X_2' A_2'^T + \delta X_2'' A_2^T) = \sqrt{\epsilon}(\tau_1 - \tau_2)$$
with another equation obtained by interchanging the subscripts 1 and 2 in the above equation. Now let $L_1, L_2, L_3$ be the standard basis for the linear space of $3\times 3$ skew-symmetric matrices, ie, the standard basis for the Lie algebra of $SO(3)$. We can therefore write
$$\delta X_k(t) = x_{k1}(t)L_1 + x_{k2}(t)L_2 + x_{k3}(t)L_3, \quad k = 1,2$$
where $x_{km}(t)$, $k = 1,2$, $m = 1,2,3$ are real number valued stochastic processes. Substituting these expressions into the above linearized sde's gives us
$$\sum_{m=1}^3 [x_{1m}(t)(A_1''L_m J_1 + J_1 L_m A_1''^T) + 2x_{1m}'(t)(A_1'L_m J_1 + J_1 L_m A_1'^T) + x_{1m}''(t)(A_1 L_m J_1 + J_1 L_m A_1^T)]$$
$$+ \sum_{m=1}^3 [x_{2m}(t)(A_2''L_m k^T + kL_m A_2''^T) + 2x_{2m}'(t)(A_2'L_m k^T + kL_m A_2'^T) + x_{2m}''(t)(A_2 L_m k^T + kL_m A_2^T)] = \sqrt{\epsilon}\,W_1(t)$$
with another equation obtained by interchanging the subscripts 1 and 2. Here,
$$W_1 = \tau_1 - \tau_2, \quad W_2 = -W_1$$
We note that in the above equation, the coefficient matrices of $x_{km}, x_{km}', x_{km}''$, $k = 1,2$, $m = 1,2,3$ are all skew-symmetric real matrices. Hence we obtain a set of $3 + 3 = 6$ linearly independent sde's for the six variables $x_{km}$, $k = 1,2$, $m = 1,2,3$. These six scalar equations can be obtained by multiplying the above matrix sde's with $L_n$, $n = 1,2,3$ respectively and taking the trace.

Acknowledgement: This problem is a part of the PhD thesis work of Mr. Rohit Rana, who has used this formalism in the presence of weak noise to compute the large deviation stability zone exit probability and hence minimize this probability using controllers.

[17] The root space structure of SO(2n, C)



Choose a symmetric bilinear form $(.|.)$ for $SO(2n,\mathbb{C})$ and a basis $u_1, ..., u_{2n}$ for $\mathbb{C}^{2n}$ so that
$$(u_i|u_{j+n}) = \delta(i,j), \quad 1 \leq i,j \leq n,$$
$$(u_i|u_j) = (u_{n+i}|u_{n+j}) = 0, \quad 1 \leq i,j \leq n$$
This is equivalent to saying that $u_1, ..., u_{2n}$ is an onb w.r.t. the weight matrix
$$W = \begin{pmatrix} 0 & I_n \\ I_n & 0 \end{pmatrix}$$
One way to construct such a basis is to start with the standard basis $e_1, ..., e_n$ for $\mathbb{C}^n$ and then to define
$$u_k = [e_k^T, 0]^T, \quad u_{n+k} = [0, e_k^T]^T, \quad k = 1,2,...,n$$
In this basis, a matrix $X \in M(2n,\mathbb{C})$ is in $SO(2n,\mathbb{C})$ iff $X^TWX = W$. Thus, the Lie algebra of $SO(2n,\mathbb{C})$ is the set of all matrices $X \in M(2n,\mathbb{C})$ for which $X^TW + WX = 0$. Writing
$$X = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$
this is equivalent to requiring that
$$D = -A^T, \quad B^T = -B, \quad C^T = -C$$
We denote this matrix by $X(A,B,C)$. Let $\mathfrak{g}$ denote this Lie algebra. Let $E(p,q)$ denote the $n\times n$ matrix whose $(p,q)$th entry is a one and whose all other entries are zeros. Then, define
$$F(p,q) = X(0,(E(p,q) - E(q,p))/2, 0), \quad G(p,q) = X(0, 0, (E(p,q) - E(q,p))/2),$$
$$H(p,q) = X(E(p,q), 0, 0)$$
where $1 \leq p,q \leq n$. Then $\mathfrak{h} = span\{H(p,p) : 1 \leq p \leq n\}$ is a Cartan subalgebra of $\mathfrak{g}$, and the root vectors for this Cartan subalgebra are
$$F(p,q), G(p,q),\ 1 \leq p < q \leq n, \quad H(p,q),\ 1 \leq p \neq q \leq n$$
We leave it as an exercise to verify that these are precisely the set of all linearly independent eigenvectors of $ad(\mathfrak{h})$ in $\mathfrak{g}$.
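The defining condition of the Lie algebra can be checked numerically. A minimal sketch (Python/numpy; names illustrative) builds $X(A,B,C)$ with $D = -A^T$ and skew-symmetric $B, C$ and verifies $X^TW + WX = 0$:

```python
import numpy as np

n = 2
W = np.block([[np.zeros((n, n)), np.eye(n)],
              [np.eye(n), np.zeros((n, n))]])

def X_of(A, B, C):
    # General element X(A, B, C) of so(2n, C) w.r.t. the bilinear form W:
    # block D is forced to be -A^T, while B and C must be skew-symmetric
    return np.block([[A, B], [C, -A.T]])

rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B0 = rng.standard_normal((n, n)); B = (B0 - B0.T) / 2
C0 = rng.standard_normal((n, n)); C = (C0 - C0.T) / 2
X = X_of(A, B, C)

# Defining condition of the Lie algebra of SO(2n, C)
assert np.allclose(X.T @ W + W @ X, 0)
```

Expanding the blocks shows $X^TW + WX = \mathrm{diag}(C + C^T, B + B^T)$, which vanishes exactly when $B$ and $C$ are skew.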
Chapter 5

Statistical Aspects of EEG Signal Analysis and Image Processing in the Medical Sciences

MATLAB Problems in classical and quantum signal processing with medical science applications

[1] Simulate a random variable given its probability distribution function $F(X)$. For this, you must write a program to calculate approximately the inverse of the non-decreasing function $F(.)$.
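A possible sketch of this problem in Python rather than MATLAB (numpy assumed; function names are ours): invert $F$ numerically on a grid and map uniform variates through the approximate inverse.

```python
import numpy as np

def simulate_from_cdf(F, grid, n, rng):
    """Draw n samples with distribution function F by numerically inverting
    F on a grid: X = F^{-1}(U) with U uniform on (0, 1)."""
    Fg = F(grid)                      # non-decreasing values of F on the grid
    U = rng.uniform(size=n)
    idx = np.searchsorted(Fg, U)      # smallest grid index with F(x) >= U
    return grid[np.clip(idx, 0, len(grid) - 1)]

# Example: exponential distribution, F(x) = 1 - exp(-x), mean 1
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 20.0, 200001)
x = simulate_from_cdf(lambda t: 1.0 - np.exp(-t), grid, 100000, rng)
assert abs(x.mean() - 1.0) < 0.05    # sample mean should be close to 1
```

The grid resolution controls the quantization error of the inverse; any distribution function with known $F$ can be plugged in.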

[2] Let $x[n]$, $0 \leq n \leq L-1$ be a given signal. Simulate the signal $y_0[n]$ obtained by repeating the signal $x$ over $K$ non-overlapping time slots, each slot comprising $L$ samples. This signal is given by
$$y_0[Lm + n] = x[n], \quad 0 \leq n \leq L-1,\ 0 \leq m \leq K-1$$
Note that this "periodized signal" $y_0$ is defined over the time interval $0 \leq n \leq KL-1$. Now add noise to this periodized signal to get
$$y[n] = y_0[n] + v[n], \quad 0 \leq n \leq KL-1$$
After this, delay this noisy signal over each slot by a different amount. Let $\tau(m)$ denote the delay over the $m$th slot. The resulting signal is then given by $z[n]$ where
$$z[mL + n] = y[mL + n - \tau(m)], \quad 0 \leq m \leq K-1,\ \tau(m) \leq n \leq L-1$$
and
$$z[mL + n] = 0, \quad 0 \leq n \leq \tau(m) - 1$$


Now take the DFT of $z$ over each slot and estimate the $\tau(m)$'s by comparing this DFT with the DFT of the original signal $x$. Reconstruct the signal $x$ from knowledge of these delay estimates and the noisy signal $z[.]$, and determine the reconstruction error variance.
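One way to sketch the delay-estimation step (Python instead of MATLAB; a simplification of the general setup, with integer delays, per-slot noise, and the circular cross-correlation computed through the DFT):

```python
import numpy as np

rng = np.random.default_rng(2)
L, K = 128, 8
x = rng.standard_normal(L)                 # known slot signal x[n]

# Build z: each slot holds a noisy copy of x delayed by tau[m] samples,
# with zeros filling the first tau[m] positions of the slot
tau = rng.integers(0, 9, size=K)
z = np.zeros(K * L)
for m in range(K):
    y = x + 0.05 * rng.standard_normal(L)
    z[m * L + tau[m]: (m + 1) * L] = y[: L - tau[m]]

# Estimate each tau[m] as the peak of the circular cross-correlation,
# corr = IDFT( Z_m * conj(X) ), which compares the slot DFT with that of x
X = np.fft.fft(x)
tau_hat = np.empty(K, dtype=int)
for m in range(K):
    Zm = np.fft.fft(z[m * L: (m + 1) * L])
    corr = np.fft.ifft(Zm * np.conj(X)).real
    tau_hat[m] = int(np.argmax(corr))

assert np.array_equal(tau_hat, tau)
```

With the delays recovered, the slots can be realigned and averaged to reconstruct $x$ and measure the residual error variance.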

[3] Let $X$ be a random vector having the pdf $p(X|\theta)$. Let $X[n]$, $n = 1,2,...$ be iid samples of $X$, so that $(X[n], n = 1,2,...,M)$ has the pdf $\Pi_{n=1}^M p(X[n]|\theta)$. Estimate $\theta$ from measurements of $X[n]$, $n = 1,2,...,M$ by the maximum likelihood method. Denote this estimate by $\hat{\theta}[M]$. Calculate using the large deviation principle
$$lim_{M\to\infty} M^{-1}\,log(P(|\hat{\theta}[M] - \theta| > \delta))$$
Make appropriate approximations.
Hint: Write $\hat{\theta}[M] = \theta + \delta\theta[M]$ and write
$$p(X|\theta + \delta\theta) \approx p(X|\theta) + \delta\theta\,p'(X|\theta) + ((\delta\theta)^2/2)p''(X|\theta)$$
and
$$log(p(X|\theta + \delta\theta)) \approx log\,p(X|\theta) + \delta\theta\,p'(X|\theta)/p(X|\theta) + ((\delta\theta)^2/2)\big(p''(X|\theta)/p(X|\theta) - (p'(X|\theta)/p(X|\theta))^2\big)$$
Use this formula to approximately maximize $\sum_{n=1}^M log(p(X[n]|\theta + \delta\theta))$ w.r.t $\delta\theta$ and then apply the LDP.
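For the Gaussian location model the limit can be computed in closed form and compared against the Cramér rate. A small check (Python; hedged to this special case, where the MLE is the sample mean):

```python
import math

# Gaussian location model p(x|theta) = N(theta, 1): the ML estimate from M
# iid samples is the sample mean, and Cramer's theorem gives the LDP rate
#   lim_{M->inf} M^{-1} log P(|theta_hat[M] - theta| > delta) = -delta^2 / 2
# The exact tail probability is P(...) = erfc(delta * sqrt(M/2)), so the
# limit can be checked directly.
delta = 0.1
rates = []
for M in [10**4, 10**5]:
    rate = math.log(math.erfc(delta * math.sqrt(M / 2.0))) / M
    rates.append(rate)

# rates approach -delta^2/2 = -0.005 from below as M grows
assert abs(rates[-1] + delta**2 / 2) < 2e-4
assert abs(rates[-1] + delta**2 / 2) < abs(rates[0] + delta**2 / 2)
```

The $O(M^{-1}\log M)$ gap between the finite-$M$ rate and the LDP limit comes from the polynomial prefactor of the Gaussian tail.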

[4] Let $X^\mu(\tau,\sigma)$, $\mu = 0,1,...,D-1$ be a string field in $D$ dimensions. Its Lagrangian is given by
$$L_0(\partial_\tau X, \partial_\sigma X) = (1/2)\int_0^1 (\partial_\tau X.\partial_\tau X - \partial_\sigma X.\partial_\sigma X)\,d\sigma$$
where
$$U.V = \eta_{\mu\nu}U^\mu V^\nu$$
Consider a perturbing Lagrangian
$$\Delta L = \int_0^1 (F.\partial_\tau X + G.\partial_\sigma X)\,d\sigma$$
where $F^\mu(\tau,\sigma), G^\mu(\tau,\sigma)$ are random fields. Calculate the perturbation in the solution to the string field equations caused by these random perturbations and evaluate, using the LDP, the probability that these perturbations will cause the string deviation from its unperturbed solution to be more than a given threshold $\delta$.

[5] Consider the three dimensional Helmholtz equation
$$(\nabla^2 + k^2)\psi(r) = 0$$
Substitute
$$\psi(r) = A(r)\,exp(i\phi(r))$$
where $A, \phi$ are real functions, into the Helmholtz equation and derive nonlinear pde's for the two functions $A, \phi$. Hence, by making appropriate approximations, derive a nonlinear Schrodinger equation for $\phi$ with the $z$ coordinate taking the role of the time variable.
Hint:
$$\nabla\psi = (\nabla A + iA\nabla\phi)\,exp(i\phi)$$
$$\nabla^2\psi = (\nabla^2 A + iA\nabla^2\phi + 2i(\nabla A,\nabla\phi) - A|\nabla\phi|^2)\,exp(i\phi)$$
and hence the Helmholtz equation reads, on neglecting the single and double partial derivatives of $A$ (which corresponds to the situation when the amplitude of the wave does not vary rapidly with position, only the phase varies rapidly),
$$i\nabla^2\phi - |\nabla\phi|^2 + k^2 = 0$$
We now write
$$\phi(r) = kz + \chi(r)$$
and note that
$$|\nabla\phi|^2 = k^2 + |\nabla\chi|^2 + 2k\,\partial\chi/\partial z$$
so that the above equation becomes
$$i\nabla^2\chi - |\nabla\chi|^2 - 2k\,\partial\chi/\partial z = 0$$
which is the required nonlinear Schrodinger equation, provided that in $\nabla^2\chi$ we retain only the transverse derivatives w.r.t. $x, y$, ie, $\partial^2\chi/\partial z^2$ is neglected (the paraxial approximation).

[6] Estimating MRI parameters from distorted speech.


[7] Quantum neural networks for pdf estimation.

iψ,t (t, x) = (−1/2)ψ,xx (t, x) + V (t, x)ψ(t, x)

W,t (t, x) = −βW (t, x) + β0 (p(t, x) − |ψ(t, x)|2 )


V (t, x) = W (t, x)(p(t, x) − |ψ(t, x)|2 )
First we construct an approximate estimate of the signal pdf p(t, x) and then use
the quantum neural network to improve upon this estimate like smoothen it etc.
The logic of the above system of differential equations is that if |ψ(t, x)|2 <<
p(t, x), then W (t, x) will increase with time and hence so will V (t, x) causing
via the Schrodinger equation, ψ(t, x) to change so that |ψ(t, x)|2 increases. Let
us see why this happens. Put

$$P(t,x) = |\psi(t,x)|^2$$
Then,
$$P_{,t}(t,x) = \psi^*\psi_{,t} + \psi_{,t}^*\psi = i\psi^*\psi_{,xx}/2 - i\psi\psi_{,xx}^*/2 = -J_{,x}(t,x)$$
where
$$J(t,x) = (1/2i)(\psi^*\psi_{,x} - \psi\psi_{,x}^*) = Im(\psi^*\psi_{,x})$$
Therefore if P is to increase with time, then J must decrease with increasing
x. It is plausible to believe from the Gibbs principle, that if the potential V
is large at a point, then the probability of the particle to be at that point is
smaller. However, the basis of that result is a part of quantum statistics which
involves bringing in the notion of a mixed state. We can bring in quasi-classical
quantum mechanics to understand the quantum neural network better. Let

ψ(t, x) = A(t, x).exp(iS(t, x))

then
P (t, x) = A(t, x)2
Substituting this into the above Schrodinger equation gives us

iA,t − AS,t = V A − (1/2)(A − AS 2 + 2iA S  + iAS  )

and hence

A,t = −A S  − AS  /2, S,t = −V + A /2 − AS 2 /2

Also,

P,t = −Im(ψψ,x ),x
= 2AA,t = −A(A S  + AS  /2)
If we fix our energy E, then we get

EA = V A − (1/2)(A − AS 2 + 2iA S  + iAS  )

so that we have approximately



2
S = 2(E − V )

or 
S = 2(E − V )
and hence the equation
2A S  + AS  = 0
gives
A2 S  = C
where C is a constant so that
√ 
A = C0 / S  = C0 / 2(E − V )

or
P = A2 = C02 /2(E − V )
Advanced Probability and Statistics: Applications to Physics and Engineering 139

This equation implies that if V < E and V increases, then P will also increase
and this fact is at the heart of the neural network.

[8] Random time delay estimation based on the pdf.

The signal $x(t)$ is assumed to be a continuous time signal and the delay times $\tau_0 = 0, \tau_k$, $k = 1,2,...$ are Poisson arrival times, so that $\tau_{k+1} - \tau_k$, $k = 0,1,2,...$ are exponentially distributed r.v's with mean $1/\lambda$. The received signal is
$$y(t) = \sum_{k\geq 0} x(t - \tau_k) + v(t) = \int_0^\infty x(t-\tau)\,dN(\tau) + v(t), \quad t \geq 0$$
The aim is to estimate the arrival times $\tau_k$, $k = 1,2,...$ based on measurements of the received signal $y(t)$, $t \geq 0$. For this, we construct the posterior density
$$p(y(.)|\{\tau_k\})\,p(\{\tau_k\})$$
and maximize this w.r.t $\{\tau_k\}$. The density is
$$p(\tau_1,...,\tau_n) = \lambda^n\,\Pi_{k=0}^{n-1}\,exp(-\lambda(\tau_{k+1} - \tau_k))$$
and
$$p(y(t), t \in [0,T]|\{\tau_k\}) = C\,exp\Big(-(2\sigma_v^2)^{-1}\int_0^T \big(y(t) - \sum_{k\geq 0} x(t-\tau_k)\big)^2 dt\Big)$$
Thus, the posterior negative log likelihood function of the delay times is
$$L(\{\tau_k\}|y(.)) = (1/2\sigma_v^2)\int_0^T \big(y(t) - \sum_{k\geq 0} x(t-\tau_k)\big)^2 dt + \lambda\sum_k(\tau_{k+1} - \tau_k)$$
The last sum telescopes to $\lambda\tau_N$ where $N$ is the number of Poisson jumps in the time interval $[0,T]$. In order to formulate this more clearly, we require to estimate both $N = N(T)$, the number of arrival times in the time interval $[0,T]$, as well as the arrival times $\tau_k$, $k = 1,2,...,N$. This is done by the following evaluation:
$$p(N,\tau_1,...,\tau_N|y(t), t \in [0,T]) = C\,p(y(t), t \in [0,T]|N,\tau_1,...,\tau_N)\,p(\tau_1,...,\tau_N|N)\,p(N)$$
We know that given $N = N(T)$, the arrival times $\tau_1 < ... < \tau_N$ are uniformly distributed over $[0,T]$ and hence the above evaluates to
$$C\,exp\Big(-(2\sigma_v^2)^{-1}\int_0^T \big(y(t) - \sum_{0\leq k\leq N} x(t-\tau_k)\big)^2 dt\Big)\,\lambda^N/N!$$
and hence the negative log-likelihood function is given by
$$L(N,\tau_1,...,\tau_N|y(t), t \in [0,T]) = (1/2\sigma_v^2)\int_0^T \big(y(t) - \sum_{0\leq k\leq N} x(t-\tau_k)\big)^2 dt - N\,log(\lambda) + log(N!)$$
This function must be minimized w.r.t $N, \{\tau_k\}_{k=1}^N$ in order to obtain their respective estimates.
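A discretized sketch of the objective $L(N,\tau_1,...,\tau_N|y)$ (Python; the pulse shape, rate, and noise level are illustrative choices, not from the text):

```python
import math
import numpy as np

rng = np.random.default_rng(3)
T, dt, lam, sigv = 10.0, 0.01, 0.5, 0.1
t = np.arange(0.0, T, dt)
x = lambda s: np.exp(-5.0 * s**2)            # a smooth illustrative pulse x(t)

# Received signal: superposition of delayed pulses (tau_0 = 0) plus noise
tau_true = np.array([0.0, 2.3, 4.1, 7.6])
y = sum(x(t - tk) for tk in tau_true) + sigv * rng.standard_normal(len(t))

def neg_log_lik(taus):
    # (1/2 sigma_v^2) int_0^T (y - sum_k x(t - tau_k))^2 dt - N log(lam) + log(N!)
    N = len(taus) - 1
    resid = y - sum(x(t - tk) for tk in taus)
    return (np.sum(resid**2) * dt / (2 * sigv**2)
            - N * math.log(lam) + math.lgamma(N + 1))

# The true arrival times should score better than a perturbed guess
assert neg_log_lik(tau_true) < neg_log_lik(np.array([0.0, 2.0, 4.5, 7.0]))
```

A full estimator would minimize this function jointly over $N$ and the $\tau_k$'s, e.g. by grid search plus local refinement.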

[9] Random delay estimation for arbitrary renewal process arrival times.
Assume that the delay times $\tau_k$, $k = 1,2,...$, $\tau_0 = 0$ are such that the inter-delay times $\tau_{k+1} - \tau_k$, $k = 0,1,2,...$ are iid with probability distribution $F(.)$ concentrated on $\mathbb{R}_+$. Then, let $N(T)$ denote the number of arrival times in the time interval $[0,T]$. Thus,
$$N = N(T) = max(n \geq 0 : \tau_n \leq T)$$
Then in terms of probability densities,
$$p(\tau_k, k = 1,2,...,N|N) = p(\tau_k, k = 1,2,...,N)/P(N)$$
$$= \Big(\Pi_{k=0}^{N-1} f(\tau_{k+1} - \tau_k)\Big)/P(\tau_N \leq T < \tau_{N+1})$$
$$= \Big[\Pi_{k=0}^{N-1} f(\tau_{k+1} - \tau_k)\Big]/(F^{N*}(T) - F^{(N+1)*}(T))$$
where
$$f(t) = F'(t), \quad F^{n*}(t) = \int_{t_1+...+t_n\leq t} dF(t_1)...dF(t_n)$$
This gives us the negative log likelihood function for the delays, given the measurement signal over the time interval $[0,T]$, as
$$L(\tau_1,...,\tau_N,N|y(t), t \in [0,T]) = (1/2\sigma_v^2)\int_0^T \big(y(t) - \sum_{k=0}^N x(t-\tau_k)\big)^2 dt - \sum_{k=0}^{N-1} log(f(\tau_{k+1} - \tau_k))$$
and minimizing this function w.r.t $N, \tau_1,...,\tau_N$ gives us the optimum delay estimates. We can compute this likelihood function approximately using Parseval's theorem of Fourier analysis. Let $X(\omega), Y(\omega)$ denote the Fourier transforms of $x(t), y(t)$ respectively. Then,
$$y(t) = (2\pi)^{-1}\int_{\mathbb{R}} Y(\omega)\,exp(i\omega t)\,d\omega,$$
$$\sum_{k=0}^N x(t-\tau_k) = (1/2\pi)\int_{\mathbb{R}} X(\omega)\sum_{k=0}^N exp(i\omega(t-\tau_k))\,d\omega$$
Thus, if $T$ is large, we have approximately,
$$\int_0^T \big(y(t) - \sum_{k=0}^N x(t-\tau_k)\big)^2 dt = (1/2\pi)\int_{\mathbb{R}} \Big|Y(\omega) - X(\omega)\sum_{k=0}^N exp(-i\omega\tau_k)\Big|^2 d\omega$$
and therefore the delays are estimated by minimizing
$$L(N,\tau_1,...,\tau_N|y(.)) = (1/4\pi\sigma_v^2)\int_{\mathbb{R}} \Big|Y(\omega) - X(\omega)\sum_{k=0}^N exp(-i\omega\tau_k)\Big|^2 d\omega - \sum_{k=0}^{N-1} log(f(\tau_{k+1} - \tau_k))$$
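The frequency-domain data-fit term can be evaluated directly with the FFT. A minimal sketch (Python; the pulse, delays, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
Nt, dt = 4096, 0.01
t = np.arange(Nt) * dt
x = np.exp(-5.0 * (t - 2.0)**2)              # known pulse (centered at t = 2)
tau = [0.0, 5.0, 11.0]                       # delays, tau_0 = 0
y = sum(np.interp(t - tk, t, x, left=0.0) for tk in tau) \
    + 0.05 * rng.standard_normal(Nt)

# Data-fit term from Parseval's theorem (up to constants):
#   int |Y(w) - X(w) sum_k exp(-i w tau_k)|^2 dw
X = np.fft.rfft(x)
Y = np.fft.rfft(y)
w = 2 * np.pi * np.fft.rfftfreq(Nt, d=dt)

def fit(taus):
    phase = sum(np.exp(-1j * w * tk) for tk in taus)
    return np.sum(np.abs(Y - X * phase)**2)

# True delays fit the data better than perturbed ones
assert fit(tau) < fit([0.0, 4.5, 11.5])
```

The renewal-prior term $-\sum_k \log f(\tau_{k+1}-\tau_k)$ would simply be added to `fit` before minimization.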

[10] Detecting brain diseases using EEG data

M.R. Raghuveer, along with C.L. Nikias, created and developed the theory of the bispectrum as an analytical and practical tool for analyzing quadratic phase coupling in a harmonic process comprising a discrete superposition of sinusoids. Quadratic phase relations indicate the presence of a quadratic nonlinearity that distorts a superposition of statistically independent sinusoids, and hence bispectrum estimation can be used to detect a quadratic nonlinearity and also characterize it. In what follows, we discuss how bispectral analysis has enabled us to determine the nature of brain disease using this idea.

The sources of various rhythms present within the brain generate signals of different frequencies, say $\omega_1,...,\omega_p$. When some stimuli in the form of light, sound, skin perturbations etc. are applied to the person, then these frequencies get phase coupled. Usually these stimuli are modeled by a discrete Poisson train, so that if $\tau_1,\tau_2,...$ are the arrival times of a Poisson process with rate $\lambda$ (or equivalently, the inter-arrival times $\tau_{k+1} - \tau_k$, $k = 1,2,...$ are independent exponential random variables with mean $1/\lambda$), and the original EEG signal is
$$f(t) = \sum_{k=1}^p A(k)\,cos(\omega_k t + \phi_k)$$
then after the application of this discrete stimulus, the EEG signal becomes
$$g(t) = \sum_m f(t - \tau_m) = \sum_{k,m} A(k)\,cos(\omega_k t - \omega_k\tau_m + \phi_k)$$
and it can be shown easily that if the $\phi_k$'s are iid uniform over $[0,2\pi)$, then the triple correlation $E(f(t)f(t+t_1)f(t+t_2))$ vanishes but $E(g(t)g(t+t_1)g(t+t_2))$ does not vanish. In fact, $f(t), g(t)$ are stationary processes, with $f(t)$ Gaussian provided that the $A(k)$'s are iid Rayleigh, but $g(t)$ is a non-Gaussian process. This suggests that by estimating the bispectrum (bivariate Fourier transform of the triple correlations) of $g(t)$, we can get information about the stimulus

rate $\lambda$. Moreover, if there is some sort of a brain disease, then even without the stimulus, the different phases $\{\phi_k\}$ in $f(t)$ may not be statistically independent, causing the bispectrum of $f(t)$ to be non-zero. One model for a brain disease could be that the original harmonic signal $f(t)$ comprising independent phases gets non-linearly distorted, leading to the resulting output having linear phase relations among its components and hence a non-zero bispectrum. Thus, by estimating the signal bispectrum and noting the frequency pairs at which it peaks, we get information about which frequencies in the rhythms are phase coupled. For example, if the non-linearity is of the Volterra type
$$y(t) = \int h_1(\tau)f(t-\tau)\,d\tau + \int\int h_2(\tau_1,\tau_2)f(t-\tau_1)f(t-\tau_2)\,d\tau_1\,d\tau_2$$
then an easy calculation shows that
$$y(t) = \sum_k |H_1(\omega_k)|A(k)\,cos(\omega_k t + \phi_k + \phi_{H_1}(\omega_k))$$
$$+ (1/2)\sum_{k,m} A(k)A(m)|H_2(\omega_k,\omega_m)|\,cos((\omega_k+\omega_m)t + \phi_k + \phi_m + \phi_{H_2}(\omega_k,\omega_m))$$
$$+ (1/2)\sum_{k,m} A(k)A(m)|H_2(\omega_k,-\omega_m)|\,cos((\omega_k-\omega_m)t + \phi_k - \phi_m + \phi_{H_2}(\omega_k,-\omega_m))$$
where
$$H_1(\omega) = \int h_1(\tau)\,exp(-j\omega\tau)\,d\tau, \quad H_2(\omega_1,\omega_2) = \int h_2(\tau_1,\tau_2)\,exp(-j(\omega_1\tau_1 + \omega_2\tau_2))\,d\tau_1\,d\tau_2$$
$$\phi_{H_1}(\omega) = Arg(H_1(\omega)), \quad \phi_{H_2}(\omega_1,\omega_2) = Arg(H_2(\omega_1,\omega_2))$$
Therefore the phases $\{\phi_k\}$ are independent in the input $f(t)$ but not so in the output $y(t)$. The spectrum of the input $f(t)$ displays impulse (Dirac delta function) peaks of strengths $A(k)^2$ at the frequencies $\omega_k$, while that of the output $y(t)$ displays peaks of strengths $A(k)^2|H_1(\omega_k)|^2$ at $\omega_k$, and peaks of strengths $(1/4)A(k)^2A(m)^2|H_2(\omega_k,\pm\omega_m)|^2$ at the frequencies $\omega_k \pm \omega_m$. Therefore, if we just estimate the output spectrum, we would be led to believe that the independent rhythms in the brain EEG data are at the frequencies $\omega_k, \omega_k \pm \omega_m$, and we would completely miss the fact that these frequency components have phase relations, ie, the phase of the frequency component $\omega_k \pm \omega_m$ has a term $\phi_k \pm \phi_m$ which is a linear combination of the phases $\phi_k$ and $\phi_m$ of the frequency components $\omega_k, \omega_m$. However, the triple moments of $f(t)$ are zero, while those of the output $y(t)$, in view of the above mentioned phase relations, are
$$E(y(t)y(t+t_1)y(t+t_2)) =$$
$$\sum_{k,m} A(k)^2A(m)^2|H_1(\omega_k)||H_1(\omega_m)||H_2(\omega_k,\omega_m)|\,cos(\omega_k t_1 + \omega_m t_2 + \phi_{H_2}(\omega_k,\omega_m) - \phi_{H_1}(\omega_k) - \phi_{H_1}(\omega_m))$$
which means that the bispectrum of $y$ is
$$B_y(\Omega_1,\Omega_2) = \int E(y(t)y(t+t_1)y(t+t_2))\,exp(-j(\Omega_1 t_1 + \Omega_2 t_2))\,dt_1\,dt_2$$
$$= \sum_{k,m} A(k)^2A(m)^2|H_1(\omega_k)||H_1(\omega_m)||H_2(\omega_k,\omega_m)|\,exp(j\phi_H(\omega_k,\omega_m))\delta(\Omega_1 - \omega_k)\delta(\Omega_2 - \omega_m)$$
plus similar terms with $\omega_m$ replaced by $-\omega_m$ and $\phi_{H_1}(\omega_m)$ replaced by $-\phi_{H_1}(\omega_m)$. In this expression, we have defined
$$\phi_H(\omega_k,\omega_m) = \phi_{H_2}(\omega_k,\omega_m) - \phi_{H_1}(\omega_k) - \phi_{H_1}(\omega_m)$$
Thus, the strength of the bispectral peak at $(\Omega_1,\Omega_2) = (\omega_k,\omega_m)$ gives us $|H_1(\omega_k)||H_1(\omega_m)||H_2(\omega_k,\omega_m)|$, and hence $H_2(\omega_k,\omega_m)$ once $H_1$ is known. From bispectral analysis of the output, we therefore get information about the nonlinearity $H_2(\omega,\omega')$, or equivalently $h_2(t,t')$, which is characteristic of the brain disease. It is my conjecture that even several months before the brain disease has set in, we can get information about it by estimating the nonlinear kernel $H_2$ using bispectral analysis.
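Quadratic phase coupling and its bispectral signature can be demonstrated with the direct (segment-averaged) bispectrum estimator $B(k,l) = E[F(k)F(l)F^*(k+l)]$. A sketch (Python; the bin choices, segment count, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
Nseg, L = 500, 256
k1, k2 = 20, 31                     # DFT bins of two "rhythms"

def segment(coupled):
    # One record: two rhythms plus a component at the sum frequency whose
    # phase is either ph1 + ph2 (quadratic coupling) or independent
    n = np.arange(L)
    ph1, ph2 = rng.uniform(0, 2 * np.pi, 2)
    ph3 = ph1 + ph2 if coupled else rng.uniform(0, 2 * np.pi)
    s = (np.cos(2 * np.pi * k1 * n / L + ph1)
         + np.cos(2 * np.pi * k2 * n / L + ph2)
         + np.cos(2 * np.pi * (k1 + k2) * n / L + ph3))
    return s + 0.1 * rng.standard_normal(L)

def bispec_at(k, l, coupled):
    # Direct bispectrum estimate at bin pair (k, l): average of
    # F(k) F(l) conj(F(k + l)) over independent segments
    acc = 0.0 + 0.0j
    for _ in range(Nseg):
        F = np.fft.fft(segment(coupled))
        acc += F[k] * F[l] * np.conj(F[k + l])
    return acc / Nseg

# Phase coupling gives a strong bispectral peak at (k1, k2); with
# independent phases the estimate averages toward zero
assert abs(bispec_at(k1, k2, True)) > 5 * abs(bispec_at(k1, k2, False))
```

The power spectrum alone cannot distinguish the two cases, which is exactly the point made above.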
The use of group representation theory in inferring about the brain disease: Precisely speaking, the EEG sensor data is of a spatio-temporal nature, ie, this signal is a function $f(t,x)$ of time and the position $x$ on the head's surface. If we assume the head to be spherical, then $x \in S^2$, ie, $x$ is a unit vector on the unit sphere, and its model has the form
$$f(t,x) = \sum_k A(k,x)\,cos(\omega_k t + \phi(k,x))$$
The characteristic frequencies $\omega_k$ are fixed and independent of time and position on the head's surface, because the sources of these frequencies are fixed and the undiseased EEG data is generated by a linear mechanism. For example, such a model would arise if we convolved in time the impulse response $h(t,x)$ with a temporal signal $\sum_k A(k)\,cos(\omega_k t + \phi_k)$ to get
$$f(t,x) = \sum_k A(k)|H(\omega_k,x)|\,cos(\omega_k t + \phi_k + \phi_H(k,x))$$
where
$$H(\omega,x) = \int_{\mathbb{R}} h(t,x)\,exp(-j\omega t)\,dt, \quad \phi_H(k,x) = arg(H(\omega_k,x))$$


Another model for such a spatio-temporal EEG signal field could be one specified via a partial differential equation
$$Lf(t,x) = s(t,x)$$
where $L$ is a partial differential operator in space and time. $L^{-1}$ would then be an integral kernel, and we would get by solving the above pde
$$f(t,x) = \int L(t-\tau,x,y)s(\tau,y)\,d\tau\,dy$$
which would be a signal of the above form when $s(t,x)$ is a harmonic process w.r.t time. We can also model this signal using a dynamic model and apply the EKF to estimate the parameters of the operator $L$. For example, suppose
$$L = \partial/\partial t + M$$
where $M$ is time independent; then our dynamic model would be
$$\partial f(t,x)/\partial t + \int M(x,y|\theta)f(t,y)\,dy = s(t,x) + w(t,x)$$
where $w(t,x)$ is white Gaussian noise. $\theta$ are the parameters of the EEG signal, and they can be estimated using the EKF applied to spatially discrete measurement data:
$$g(t,x_k) = f(t,x_k) + v(t,x_k), \quad k = 1,2,...,d$$
obtained by placing sensors at the discrete points $x_k$, $k = 1,2,...,d$ on the brain's surface. If we take into account non-linearities caused by the disease, then the dynamic model for the signal would be
$$\partial f(t,x)/\partial t + \int M_1(x,y|\theta)f(t,y)\,dy + \int M_2(t,x,y,z|\theta)f(t,y)f(t,z)\,dy\,dz = s(t,x) + w(t,x)$$
and once again, we could apply the EKF to estimate the model parameters $\theta$, from which the nature of the brain disease could be classified.
Application of neural networks and artificial intelligence in the classification of brain disease: Suppose we take a recurrent neural network (RNN) governed by the state equations
$$X(k+1) = F(X(k), W(k), u(k)) + noise$$
where $X(k)$ are the signals at the various layers, $W(k)$ are the weights and $u(k)$ is the input vector. The measurement is taken on the final layer, ie, the $L$th layer:
$$y(k) = X_L(k) + v(k) = HX(k) + v(k)$$
where
$$X(k) = [X_1(k)^T,...,X_L(k)^T]^T$$
with $X_j(k)$ being the signal vector at the $j$th layer and $H$ a matrix of the form $[0,0,...,0,I]$. The weights $W(k)$ are governed by the evolution
$$W(k+1) = W(k) + noise$$
We apply a vector stimulus signal $u(t)$ to the diseased brain and record the output signal vector $y_d(t)$ measured on the surface of the brain. We then use the EKF for the above neural network model to estimate the weights $W(k)$, with the driving input for the EKF taken as $y_d(k)$. Once the weights have converged, we regard these converged weights as the characteristic features of the disease.

The use of quantum neural networks in estimating the brain signal pdf: Quantum neural networks are nature inspired algorithms that naturally generate a whole family of probability densities and hence can be used to estimate the joint pdf of the EEG signal on the brain surface.

[11] Modeling speech signals based on distorted MRI data and estimating the MRI parameters from distorted speech using the EKF
This is just a special case of synthesizing a higher dimensional signal from a lower dimensional signal.
Synthesis of MRI data from speech/EEG signals
The MRI space-time signal M (t, x, y) = M (t, x, y|θ) modeled as a function
of time and space is
∂t M(t, x, y|θ) = Σ_{k,m=0}^p a_km(θ) ∂x^k ∂y^m M(t, x, y|θ) + (dz0(t)/dt)·χ(t, x, y) + w(t, x, y)
where z0(t) is the noiseless speech/EEG signal and θ are the unknown brain parameters to be estimated, which will give us the nature of the brain disease. The
distorted speech/EEG signal z(t) is related to the MRI data via a "measurement model"

dz(t) = (∫ φ(t, x, y)M(t, x, y|θ)dxdy)dt + dv(t)

In the special case when the brain is normal, θ = θ0 and in the absence of noise
z(t) = z0 (t) so that we have an identity
∂t M0(t, x, y|θ0) = Σ_{k,m=0}^p a_km(θ0) ∂x^k ∂y^m M0(t, x, y|θ0) + z0(t)χ(t, x, y)

This identity determines the normal-brain, noise-free MRI data M0(t, x, y|θ0) as
a function of the computer generated speech data z0(t). Our aim is to estimate
the brain disease parameters θ from recordings of the speech data z(t). This we
propose to do using the EKF.

If θ deviates slightly from θ0 , we call this deviation δθ:

θ = θ0 + δθ

Likewise, if z(t) deviates slightly from z0 (t), we call this deviation δz(t):

z(t) = z0 (t) + δz(t)

Thus we have a linearized state model

δM (t, x, y) = M (t, x, y|θ) − M0 (t, x, y|θ0 ),

∂t δM(t, x, y) = Σ_{k,m=0}^p (∇θ a_km(θ0)·δθ(t)) ∂x^k ∂y^m M0(t, x, y|θ0) + w(t, x, y)

dδθ(t) = dεθ(t),
and the linearized measurement model is
dδz(t) = (∫ φ(t, x, y)δM(t, x, y)dxdy)dt + dv(t)

The problem now amounts to estimating δθ(t) dynamically from δz(s), s ≤ t and
this can be achieved via the Kalman filter. Another model that could describe
this situation involves considering that the MRI image data within the brain
depends on the distorted brain parameters θ as well as on the distorted speech
data z(t). In this case, the state and measurement models will be intertwined.
However, such a model is not too realistic since it is natural to suppose that
first the brain acquires a disease independent of the spoken speech so that the
corresponding distorted parameters θ have nothing to do with the spoken speech
but could depend on the noiseless speech that the computer has stored. One
may also assume that this MRI data is independent of the noiseless speech but
the spoken speech depends on the MRI data and the computer stored noiseless
speech. In this case, a state and measurement model would be

∂t M(t, x, y|θ) = Σ_{k,m=0}^p a_km(θ) ∂x^k ∂y^m M(t, x, y|θ) + w(t, x, y)

dz(t) = dz0(t) + (∫ φ(t, x, y)M(t, x, y|θ)dxdy)dt + dv(t)

The role of the bispectrum in modeling brain disease: Brain disease is usually
manifested by the generation of nonlinear mechanisms which distort the speech
data. So, if M (t, x, y|θ) is the MRI image field data, then the spoken speech
z(t) would satisfy an sde of the form
dz(t) = (δ·∫ f(M(t, x, y|θ), z0(t))φ(t, x, y)dxdy)dt + dz0(t) + dv(t)

where δ is a small parameter. The question is: when M(t, x, y) is a stationary Gaussian random field, can we approximately derive the higher order moments and spectra of z(t) by some sort of perturbation-theoretic analysis?

Reference: Vijay Upreti, Sagar, Vijyant Agrawal and Harish Parthasarathy, paper communicated.

[12] Estimating higher dimensional signals from lower dimensional signals
M (t, x, y) is the true MRI data. Here, t is an index that depends on the kind
of brain disease. We call this index as time and this index assumes different
values for different kinds of disease. We model the dynamics of M (t, x, y) by
a linear pde that is of the first order in ”time” but can contain any number of
spatial partial derivatives ∂xk ∂ym . Thus, the dynamics is modeled as

∂t M (t, x, y|θ) = L(θ)M (t, x, y|θ) + z0 (t)ψ(t, x, y)

where z0 (t) is the computer recorded speech data. We are assuming that when
the patient carrying the brain disease varies, the parameter t will also vary and
for the tth patient, z0 (t) is fixed. Actually to be more precise the forcing term
in the above pde should depend on the entire speech record z0 (.) taken over the
finite duration [0, T ] and therefore we should write

∂t M (t, x, y|θ) = L(θ)M (t, x, y|θ) + ψ(t, x, y, z0 (.))

L(θ) is a partial differential operator depending upon the parameter vector θ, and we assume that this operator is linear in the parameter, so that we can write
L(θ) = Σ_{k=1}^p θ[k]Lk

After discretizing this pde, in both time and space, we obtain a partial difference
equation

M[t + 1, x, y|θ] = M[t, x, y|θ] + δ(Σ_k θ[k]L_{d,k})M[t, x, y|θ] + δ·ψ[t, x, y, z0]

which can be converted to vector form as



M[t + 1|θ] = (I + δ Σ_k θ[k]L_{d,k})M[t|θ] + δ·ψ[t, z0]

where δ is the time discretization step size and Ld,k is the discrete matrix repre-
sentation of the partial differential operator Lk after spatial discretization. For
example if Lk = ∂xr ∂ys , then

L_{d,k} = Δ^{−(r+s)} D_x^r D_y^s



where Δ is the spatial discretization step size and Dx , Dy are the partial differ-
ence operators
Dx f (x, y) = f (x + 1, y) − f (x, y), Dy f (x, y) = f (x, y + 1) − f (x, y)
where x + 1 means x + Δ. More precisely, the integer x corresponds to the
spatial length xΔ and likewise for y. Thus,

D_x^r D_y^s M(t, x, y)

is represented by the N² × 1 vector

Σ_{x,y=0}^{N−1} (D_x^r D_y^s M(t, x, y)) e_x ⊗ e_y = A M(t), where M(t) = Σ_{x,y=0}^{N−1} M(t, x, y) e_x ⊗ e_y

and A is the N² × N² matrix

A = D_x^r ⊗ D_y^s = Σ_{x,y=0}^{N−1} (e_x ⊗ e_y)((D_x^r)^T e_x ⊗ (D_y^s)^T e_y)^T
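This matrix representation of D_x^r D_y^s acting on the stacked vector Σ M(t, x, y) e_x ⊗ e_y is just the Kronecker product of the one-dimensional difference matrices, which can be checked numerically. A minimal sketch (the grid size N, the orders r, s and the zero boundary convention f(N) = 0 are illustrative assumptions):

```python
import numpy as np

N, r, s = 5, 1, 2
M = np.random.default_rng(6).standard_normal((N, N))

def Dx(F):
    # D_x F(x, y) = F(x+1, y) - F(x, y), with F(N, y) taken as 0 at the boundary
    return np.vstack([F[1:], np.zeros((1, F.shape[1]))]) - F

def Dy(F):
    # D_y F(x, y) = F(x, y+1) - F(x, y), with F(x, N) taken as 0 at the boundary
    return np.hstack([F[:, 1:], np.zeros((F.shape[0], 1))]) - F

# apply the difference operators directly on the grid
G = M.copy()
for _ in range(r):
    G = Dx(G)
for _ in range(s):
    G = Dy(G)

# one-dimensional difference matrix D and the N^2 x N^2 operator A = D^r (x) D^s
D = np.eye(N, k=1) - np.eye(N)
A = np.kron(np.linalg.matrix_power(D, r), np.linalg.matrix_power(D, s))

# stacking M[x, y] with component index x*N + y reproduces the e_x (x) e_y ordering
assert np.allclose(A @ M.reshape(-1), G.reshape(-1))
```

The row-major `reshape(-1)` is what makes the component ordering agree with e_x ⊗ e_y.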

In this way, we obtain an "MRI dynamics"

M [t + 1|θ] = (I + A(θ))M [t|θ] + ψ[t, z0 ]

where

A(θ) = Σ_{k=1}^p θ[k]Ak

with the Ak's being N² × N² matrices. The solution is


M[t|θ] = (I + A(θ))^t M[0] + Σ_{k=0}^{t−1} (I + A(θ))^{t−k−1} ψ[k, z0]

and by matching M[t|θ] at different times t, say at t = t[r], r = 1, 2, ..., q, to given data, we can estimate θ. The matching can be carried out via a gradient search algorithm:

θ[k, t+1] = θ[k, t] − μ(∂/∂θ[k, t]) Σ_{r=1}^q ‖ M[t[r]] − (I + A(θ[:, t]))^{t[r]} M[0] − Σ_{m=0}^{t[r]−1} (I + A(θ[:, t]))^{t[r]−m−1} ψ[m, z0] ‖²

for k = 1, 2, ..., p.
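A minimal numerical sketch of this gradient-search matching (the dimensions, the stand-in matrices Ak and the forcing ψ below are all illustrative assumptions; the gradient is taken numerically and the step size is halved until the loss decreases, so the loss is monotone):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, T = 2, 4, 6        # no. of parameters, state dimension (stands in for N^2), time steps
Ak = [0.3 * rng.standard_normal((n, n)) for _ in range(p)]  # stand-ins for the matrices A_k
psi = rng.standard_normal((T, n))                           # stand-in forcing psi[k, z0]
theta_true = np.array([0.3, -0.2])

def simulate(theta):
    # M[t+1] = (I + A(theta)) M[t] + psi[t], M[0] = 0, A(theta) = sum_k theta[k] A_k
    A = sum(th * B for th, B in zip(theta, Ak))
    M, out = np.zeros(n), []
    for t in range(T):
        M = (np.eye(n) + A) @ M + psi[t]
        out.append(M.copy())
    return np.array(out)

data = simulate(theta_true)          # the "given data" M[t[r]] (here every t is matched)

def loss(theta):
    return np.sum((simulate(theta) - data) ** 2)

theta, eps = np.zeros(p), 1e-6
loss0 = loss(theta)
for _ in range(200):                 # gradient search: numerical gradient + backtracking
    g = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                  for e in np.eye(p)])
    step, cur = 1.0, loss(theta)
    while step > 1e-12 and loss(theta - step * g) >= cur:
        step *= 0.5
    if loss(theta - step * g) < cur:
        theta = theta - step * g
```

With the matching loss driven down, theta approaches theta_true whenever the model is identifiable from the chosen matching times.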

The effect of stochastic perturbations on the performance of the gradient search algorithm. Consider first a simplified situation when the model dynamics of a system is given by
of a system is given by
x[t + 1] = f (t, x[t], θ)
We propose to estimate the parameter vector θ via the gradient search algorithm
by matching x[.] to a desired process y[t] which satisfies a noisy version of the
above. The algorithm reads

θ[t + 1] = θ[t] − μ.∇θ  y[t + 1] − f (t, y[t], θ[t]) 2

We assume that
y[t] = x[t] + v[t]
where v[.] is noise and get

y[t + 1] − f (t, y[t], θ[t]) = x[t + 1] + v[t + 1] − f (t, x[t] + v[t], θ[t])

= v[t + 1] + f(t, x[t]|θ) − f(t, x[t] + v[t]|θ[t])

≈ v[t + 1] + f(t, x[t]|θ) − f(t, x[t]|θ[t]) − fx(t, x[t]|θ[t])v[t]
≈ v[t + 1] + fθ(t, x[t]|θ)(θ − θ[t]) − fx(t, x[t]|θ)v[t]

[13] Reconstruction of higher dimensional data from lower dimensional ones
[1] The first step is to build a dynamic model for the slurred speech and
MRI data for a given computer recorded speech. Let ζ0 (t), t = 1, 2, ... denote
the computer recorded speech and let M (t), t = 1, 2, ... denote the dynamic
MRI data vector constructed by arranging the MRI pixel intensities in the
form of a vector at each time. Let ζ(t), t = 1, 2, ... denote the slurred speech
data corresponding to the computer recorded speech ζ0 (.). These processes are
assumed to satisfy the stochastic difference equations

ζ(t + 1) = ζ0 (t + 1) + A1 (θ)(ζ(t) − ζ0 (t)) + A2 (θ)M (t) + a3 (θ) + W1 (t + 1),

M (t + 1) = A4 (θ)M (t) + A5 (θ)ζ0 (t + 1) + a6 (θ) + W2 (t + 1)


In this model, θ is an unknown parameter vector designed initially to fit a given MRI and slurred speech data record corresponding to a given computer recorded speech sequence. The parameter θ is initially fitted to given data M(.), ζ(.) by taking

θ0 = argmin_φ [Σ_t ‖ ζ(t+1) − ζ0(t+1) − A1(φ)(ζ(t) − ζ0(t)) − A2(φ)M(t) − a3(φ) ‖² + Σ_t ‖ M(t+1) − A4(φ)M(t) − A5(φ)ζ0(t+1) − a6(φ) ‖²]

Now while running the EKF to estimate θ dynamically from the measurement
model
z(t) = Hζ(t) + v(t)
we assume that θ(t) = θ0 + δθ(t) and linearize the above model around θ0 and
then estimate δθ(t) using the EKF. For the EKF, the extended state vector is
taken as
ξ(t) = [ζ(t)^T, M(t)^T, δθ(t)^T]^T
and the EKF estimate of this gives us M̂(t|t), ie, the MRI data estimate, as a by-product. It also gives us δθ̂(t|t) and hence θ̂(t|t), which characterizes the nature of the disease.

[14] Quantum neural networks for estimating a slowly time varying pdf of a parameter vector
The parameter vector θ is assumed to vary from slot to slot. Over each slot, its time variation is estimated using the EKF, and from these estimates we construct its empirical pdf over each slot. We then smoothen this pdf using a quantum neural network, based on approximating the pdf of a random vector by the modulus square of a Schrodinger wave function. The method of updating the wave function rests on the fact that, in time-independent quasi-classical quantum mechanics, the magnitude square of the wave function increases with increasing potential as long as the potential is smaller than the energy of the state. Hence, if the modulus square of the wave function is much smaller than the initial pdf estimate based on the empirical density, then we can make the potential proportional to the difference between the empirical density and that predicted using the Schrodinger equation; the Schrodinger equation will then cause the modulus square of the wave function to increase at the next iteration. We can also incorporate neural weights to make this algorithm more effective, namely, cause the neural weights to increase with increase in the difference between the empirical pdf and that obtained from the Schrodinger equation, and then make the potential proportional to the neural weight.

[15] Detecting brain diseases using EEG data based on group invariant processing
The sources of various rhythms present within the brain generate signals of
different frequencies, say ω1 , ..., ωp . When some stimuli in the form of light,
sound, skin perturbations etc. are applied on the person, then these frequencies
get phase coupled. Usually these stimuli are modeled by a discrete Poisson train, so that if τ1, τ2, ... are the arrival times of a Poisson process with rate λ (or equivalently, the inter-arrival times τk+1 − τk, k = 1, 2, ... are independent exponential random variables with mean 1/λ), and the original EEG signal is

f(t) = Σ_{k=1}^p A(k)cos(ωk t + φk)

then after the application of this discrete stimulus, the EEG signal becomes
g(t) = Σ_k f(t − τk) = Σ_{k,m} A(k)·cos(ωk t − ωk τm + φk)

and it can be shown easily that if the φk's are iid uniform over [0, 2π), then the triple correlation E(f(t)f(t + t1)f(t + t2)) vanishes but E(g(t)g(t + t1)g(t + t2)) does not. In fact, f(t) and g(t) are stationary processes, with f(t) Gaussian provided that the A(k)'s are iid Rayleigh, but g(t) is a non-Gaussian process.
This suggests that by estimating the bispectrum (bivariate Fourier transform
of the triple correlations) of g(t), we can get information about the stimulus
rate λ. Moreover, if there is some sort of brain disease, then even without the stimulus, the different phases {φk} in f(t) may not be statistically independent, causing the bispectrum of f(t) to be non-zero. One model for a brain disease
could be that the original harmonic signal f (t) comprising independent phases
gets non-linearly distorted leading to the resulting output having linear phase
relations causing its bispectrum to be non-zero. Thus by estimating the signal
bispectrum and noting the frequency pairs at which it peaks, we get information
about which frequencies in the rhythms are phase coupled. For example if the
non-linearity is of the Volterra type
y(t) = ∫ h1(τ)f(t − τ)dτ + ∫∫ h2(τ1, τ2)f(t − τ1)f(t − τ2)dτ1 dτ2

then an easy calculation shows that

y(t) = Σ_k |H1(ωk)|A(k)cos(ωk t + φk + φH1(ωk))

+ (1/2) Σ_{k,m} A(k)A(m)|H2(ωk, ωm)|·cos((ωk + ωm)t + φk + φm + φH2(ωk, ωm))

+ (1/2) Σ_{k,m} A(k)A(m)|H2(ωk, −ωm)|·cos((ωk − ωm)t + φk − φm + φH2(ωk, −ωm))

where

H1(ω) = ∫ h1(τ)·exp(−jωτ)dτ,

H2(ω1, ω2) = ∫∫ h2(τ1, τ2)·exp(−j(ω1τ1 + ω2τ2))dτ1 dτ2

φH1 (ω) = Arg(H1 (ω)), φH2 (ω1 , ω2 ) = Arg(H2 (ω1 , ω2 ))


Therefore the phases {φk} are independent in the input f(t) but not in the output y(t). The spectrum of the input f(t) displays impulse (Dirac delta function) peaks of strengths A(k)² at the frequencies ωk, while that of the output y(t) displays peaks of strengths A(k)²|H1(ωk)|² at ωk and peaks of strength (1/4)A(k)²A(m)²|H2(ωk, ±ωm)|² at the frequencies ωk ± ωm. Therefore, if we just estimate the output spectrum, we would be led to believe that the

independent rhythms in the brain EEG data are at the frequencies ωk, ωk ± ωm, and we would completely miss the fact that these frequency components have phase relations, ie, the phase of the frequency component ωk ± ωm has a term φk ± φm which is a linear combination of the phases φk and φm of the frequency components ωk, ωm. However, the triple moments of the input f(t) are zero, while those of the output y(t), in view of the above mentioned phase relations, are

E(y(t)y(t + t1)y(t + t2)) =

Σ_{k,m} A(k)²A(m)²|H1(ωk)||H1(ωm)||H2(ωk, ωm)|cos(ωk t1 + ωm t2 + φH2(ωk, ωm) − φH1(ωk) − φH1(ωm))

which means that the bispectrum of y is

By(Ω1, Ω2) = ∫∫ E(y(t)y(t + t1)y(t + t2))exp(−j(Ω1 t1 + Ω2 t2))dt1 dt2

= Σ_{k,m} A(k)²A(m)²|H1(ωk)||H1(ωm)||H2(ωk, ωm)|exp(jφH(ωk, ωm))δ(Ω1 − ωk)δ(Ω2 − ωm)

plus similar terms with ωm replaced by −ωm and φH1(ωm) replaced by −φH1(ωm). In this expression, we have defined

φH(ωk, ωm) = φH2(ωk, ωm) − φH1(ωk) − φH1(ωm)

Thus, the strength of the bispectral peak at (Ω1, Ω2) = (ωk, ωm) gives us |H1(ωk)||H1(ωm)||H2(ωk, ωm)|, or equivalently H2(ωk, ωm). From bispectral analysis of the output, we therefore get information about the nonlinearity H2(ω, ω′), or equivalently h2(t, t′), which is characteristic of the brain disease. It is my conjecture that even several months before the brain disease has set in, we can get information about it by estimating the nonlinear kernel H2 using bispectral analysis.
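The vanishing of the triple moments for independent phases, and their non-vanishing after a quadratic Volterra distortion, can be checked numerically. A small sketch (the two frequencies, the record length and the quadratic gain 0.5 are illustrative assumptions; the lag pair (0, 0), ie the third central moment, is used as the simplest triple correlation):

```python
import numpy as np

L = 256
t = np.arange(L)
w1, w2 = 2 * np.pi * 5 / L, 2 * np.pi * 9 / L   # two "rhythm" frequencies on integer bins
rng = np.random.default_rng(1)
p1, p2 = rng.uniform(0, 2 * np.pi, 2)           # independent uniform phases

f = np.cos(w1 * t + p1) + np.cos(w2 * t + p2)   # linear (undistorted) harmonic signal
y = f + 0.5 * f ** 2                            # memoryless quadratic Volterra distortion

def triple(x):
    # triple correlation at lags (0, 0): the third central moment of the record
    x = x - x.mean()
    return np.mean(x ** 3)

# independent phases: the triple moment vanishes (exactly, over whole periods)
assert abs(triple(f)) < 1e-9
# quadratic phase coupling makes it non-zero, whatever phases were drawn
assert abs(triple(y) - 2.0625) < 1e-6
```

The value 2.0625 is independent of the drawn phases because every zero-frequency triple product carries the net phase (φ1 + φ2) − (φ1 + φ2) = 0; this is exactly the phase-coupling mechanism described above.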
The use of group representation theory in inferring about the brain disease:
Precisely speaking, the EEG sensor data is of a spatio-temporal nature, ie, this
signal is a function f (t, x) of time and the position x on the head’s surface. If
we assume the head to be spherical, then x ∈ S 2 , ie, x is a unit vector on the
unit sphere and its model has the form
f(t, x) = Σ_k A(k, x)·cos(ωk t + φ(k, x))
The characteristic frequencies ωk are fixed and independent of time and position
on the head’s surface because the sources of these frequencies are fixed and the
undiseased EEG data is generated by a linear mechanism. For example such a
model would arise if we convolved in time the impulse response h(t, x) with a temporal signal Σ_k A(k)·cos(ωk t + φk) to get

f(t, x) = Σ_k A(k)|H(ωk, x)|·cos(ωk t + φk + φH(k, x))

where

H(ω, x) = ∫_R h(t, x)·exp(−jωt)dt

φH(k, x) = arg(H(ωk, x))


Another model for such a spatio-temporal EEG signal field could be one specified
via a partial differential equation
Lf (t, x) = s(t, x)
where L is a partial differential operator in space and time. L−1 would then be
an integral kernel and we would get by solving the above pde
f(t, x) = ∫ L^{−1}(t − τ, x, y)s(τ, y)dτ dy

which would be a signal of the above form when s(t, x) is a harmonic process
w.r.t time. We can also model this signal using a dynamic model and apply the
EKF to estimate the parameters of the operator L. For example, suppose
L = ∂/∂t + M
where M is time independent, then our dynamic model would be
∂f(t, x)/∂t + ∫ M(x, y|θ)f(t, y)dy = s(t, x) + w(t, x)

where w(t, x) is white Gaussian noise. θ are the parameters of the EEG signal
and they can be estimated using the EKF applied to spatially discrete measure-
ment data:
g(t, xk ) = f (t, xk ) + v(t, xk ), k = 1, 2, ..., d
obtained by placing sensors at the discrete points xk , k = 1, 2, ..., d on the brain’s
surface. If we take non-linearities into account caused by the disease, then the
dynamic model for the signal would be
∂f(t, x)/∂t + ∫ M1(x, y|θ)f(t, y)dy + ∫∫ M2(t, x, y, z|θ)f(t, y)f(t, z)dydz = s(t, x) + w(t, x)
and once again, we could apply the EKF to estimate the model parameters θ
from which the nature of the brain disease could be classified.
Now suppose that the functions M1 and M2 are G-invariant. Then the
implementation of the EKF is greatly simplified. Here, G is a Lie group of
transformations acting on the head surface manifold M. Let (g, x) → g.x
denote the group action from G × M → M. Then we denote by U the unitary
representation of G in L2 (M) induced by this group action and a G-invariant
measure on M. Specifically

U (g)f (x) = f (g −1 x), f ∈ L2 (M)



where μ is a measure on M that is G-invariant, ie μ(g −1 E) = μ(E) for all sets


E in B(M) with the latter being a σ-algebra of sets such that the group action
is measurable, ie, x → gx from M → M is measurable for each g ∈ G. We now
decompose this representation into irreducibles:

L2(M) = ⊕_{n≥1} Hn

where Hn is a subspace invariant and irreducible under U (.). We can write for
any f (x) ∈ L2 (M),

f(x) = Σ_{n≥1} Pn f(x)

where Pn is the orthogonal projection of L2(M) onto Hn. Specifically, if {en,k : k = 1, 2, ..., dn} is an onb for Hn, then

Pn f = Σ_{k=1}^{dn} |en,k >< en,k, f >

where

< en,k, f > = ∫_M ēn,k(x)f(x)dμ(x)

If K(x, y) is a G-invariant kernel, it can be expressed as

K(x, y) = Σ_{n,k} k(n)en,k(x)ēn,k(y) = Σ_n k(n)Pn(x, y)

where k(n) are constants dependent only on n and Pn (x, y) is the kernel of the
orthogonal projection operator onto Hn . To verify the G-invariance of K(., .),
we note that
K(g⁻¹x, g⁻¹y) = Σ_{n,k} k(n)en,k(g⁻¹x)·ēn,k(g⁻¹y)

= Σ_{n,k,k′,k″} k(n)[πn(g)]k′k en,k′(x)·[π̄n(g)]k″k ēn,k″(y)

= Σ_{n,k′} k(n)en,k′(x)ēn,k′(y) = K(x, y)

since

Σ_k [πn(g)]k′k [π̄n(g)]k″k = δk′k″

as πn (.) is a unitary representation of G. Consider now the differential equation

∂f(t, x)/∂t + ∫ M(x, y|θ)f(t, y)dy = s(t, x)

where M is a G-invariant kernel. We can represent

f(t, x) = Σ_{n,k} f[t, n, k]en,k(x),

s(t, x) = Σ_{n,k} s[t, n, k]en,k(x),

M(x, y|θ) = Σ_{n,k} M[n|θ]en,k(x)ēn,k(y)

and substituting these expressions into the differential equation gives us

∂f [t, n, k]/∂t + M [n|θ]f [t, n, k] = s[t, n, k]

which can be solved to give

f[t, n, k] = ∫_0^t exp(−(t − τ)M[n|θ])s[τ, n, k]dτ
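The statement that a G-invariant kernel acts as a scalar M[n|θ] on each irreducible subspace can be checked numerically in the simplest case M = Z_N, with G = Z_N acting by rotation: the invariant kernels are then the circulant matrices and the irreducible subspaces are the one-dimensional Fourier lines (a small sketch; N and the kernel values are illustrative assumptions):

```python
import numpy as np

N = 8
rng = np.random.default_rng(2)
kvec = rng.standard_normal(N)

# G-invariant kernel on Z_N: M[x, y] = k[(x - y) mod N], ie a circulant matrix
M = np.array([[kvec[(x - y) % N] for y in range(N)] for x in range(N)])

# orthonormal basis of the irreducibles: e_n(x) = exp(2*pi*i*n*x/N)/sqrt(N), d_n = 1
E = np.array([[np.exp(2j * np.pi * n * x / N) / np.sqrt(N) for x in range(N)]
              for n in range(N)])

# modal coefficients M[n] = <e_n, M e_n>; the kernel is diagonal in this basis
coef = np.array([E[n].conj() @ M @ E[n] for n in range(N)])
M_rebuilt = sum(coef[n] * np.outer(E[n], E[n].conj()) for n in range(N))
assert np.allclose(M_rebuilt, M)          # M(x, y) = sum_n M[n] P_n(x, y)

# G-invariance: shifting both arguments by the rotation g: x -> x+1 leaves M unchanged
S = np.roll(np.eye(N), 1, axis=0)         # permutation matrix of the group action
assert np.allclose(S @ M @ S.T, M)
```

The per-mode solution formula above then reduces each mode to a scalar convolution with exp(−t·coef[n]).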

Thus, by measuring the input s(t, x) and the response f(t, x) at different values of (t, x) ∈ R+ × M, we can estimate θ. Note that by exploiting our apriori knowledge that M is G-invariant, we have reduced the computation, since in view of this G-invariance, M[n|θ] depends only on n and not on both (n, k). θ can be extracted from i/o data by the following elementary algorithm:

θ̂ = argmin_θ Σ_{t,n,k} |f[t, n, k] − ∫_0^t exp(−(t − τ)M[n|θ])s[τ, n, k]dτ|²
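A minimal sketch of this elementary algorithm, under the illustrative assumption M[n|θ] = θ·n, with a forward-Euler discretization of the per-mode equation and a one-dimensional grid search for the argmin:

```python
import numpy as np

theta_true, dt, T, modes = 0.7, 0.01, 400, 3
rng = np.random.default_rng(3)
s = rng.standard_normal((T, modes))          # modal input coefficients s[t, n]

def response(theta):
    # forward-Euler solution of df[t,n]/dt + theta*n*f[t,n] = s[t,n], one scalar ODE per mode
    f = np.zeros((T, modes))
    for t in range(T - 1):
        f[t + 1] = f[t] + dt * (s[t] - theta * np.arange(1, modes + 1) * f[t])
    return f

data = response(theta_true)                  # "measured" modal output f[t, n]

grid = np.linspace(0.1, 1.5, 141)            # grid search over the scalar parameter
errs = [np.sum((response(th) - data) ** 2) for th in grid]
theta_hat = grid[int(np.argmin(errs))]
assert abs(theta_hat - theta_true) < 1e-6
```

In the book's setting the grid search would be replaced by the EKF run on the modal coefficients, but the least-squares criterion is the same.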

Suppose that s(t, x) is a random process with G-invariant correlation function, ie,

E[s(t, gx)s(t′, gx′)] = E[s(t, x)s(t′, x′)] = Ks(t, x, t′, x′), g ∈ G

Then we can expand

s(t, x) = Σ_{n,k} s[t, n, k]en,k(x)

where s[t, n, k] is a random process with correlations of the form

E[s[t, n, k]s[t′, n′, k′]] = Ks[t, t′, n]δ(n, n′)δ(k, k′)

and so we get for the output correlations

E[f[t, n, k]f[t′, n′, k′]] =

= [∫_0^t ∫_0^{t′} exp(−(t − τ)M[n|θ])·exp(−(t′ − τ′)M[n|θ])Ks[τ, τ′, n]dτ·dτ′]δ(n, n′)δ(k, k′)

= Kf[t, t′, n]δ(n, n′)δ(k, k′)



[16] Application of neural networks and artificial intelligence in the classification of brain disease. Suppose we take a recurrent neural network (RNN) governed by the state equations

X(k + 1) = F (X(k), W (k), u(k)) + noise

where X(k) are the signals at the various layers, W(k) are the weights and u(k) is the input vector. The measurement is taken on the final layer, ie, the
Lth layer:
y(k) = XL (k) + v(k) = HX(k) + v(k)
where
X(k) = [X1 (k)T , ..., XL (k)T ]T
with Xj (k) being the signal vector at the j th layer and H a matrix of the form
[0, 0, ..., 0, I]. The weights W (k) are governed by the evolution

W (k + 1) = W (k) + noise

We apply a vector stimulus signal u(k) to the diseased brain and record the output signal vector yd(k) measured on the surface of the brain. We then use the EKF for the above neural network model to estimate the weights W(k), with the driving input for the EKF taken as yd(k). Once the weights have converged, we regard these converged weights as the characteristic features of the disease.
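A minimal single-neuron sketch of this weight-estimation idea (the scalar tanh recursion, the noise levels and the sinusoidal stimulus are illustrative assumptions, not the text's model): the joint state is [X(k), W(k)], with W a slow random walk, and the EKF linearizes the recursion at each step.

```python
import numpy as np

rng = np.random.default_rng(7)
w_true, T = 0.8, 1000
q = np.diag([1e-4, 1e-6])         # process noise for [x, w]; w is a slow random walk
r = 1e-2                          # measurement noise variance

x = 0.0
s, P = np.array([0.0, 0.0]), np.eye(2)    # EKF state [x_hat, w_hat] and covariance
for k in range(T):
    u = np.sin(0.1 * k)                                    # persistent stimulus
    x = np.tanh(w_true * x + u) + 1e-2 * rng.standard_normal()
    y = x + np.sqrt(r) * rng.standard_normal()             # measurement on the only layer
    # EKF predict: [x, w] -> [tanh(w x + u), w]
    a = 1.0 - np.tanh(s[1] * s[0] + u) ** 2                # tanh' at the operating point
    F = np.array([[a * s[1], a * s[0]], [0.0, 1.0]])       # Jacobian of the state map
    s = np.array([np.tanh(s[1] * s[0] + u), s[1]])
    P = F @ P @ F.T + q
    # EKF update with H = [1, 0]
    g = P[:, 0] / (P[0, 0] + r)
    s = s + g * (y - s[0])
    P = P - np.outer(g, P[0, :])

assert abs(s[1] - w_true) < 0.3   # the converged weight is the "feature" of the network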

[17] The use of quantum neural networks in estimating the brain signal pdf.
Quantum neural networks are nature inspired algorithms that naturally generate
a whole family of probability densities and hence can be used to estimate the
joint pdf of the EEG signal on the brain surface.

[18] Polyspectral analysis of EEG data via the EKF
Independent realizations of a signal y(t), 0 ≤ t ≤ N − 1 are given. Denote the ith realization by yi(t), i = 1, 2, ..., q. The polyspectrum of the signal of order r + 1 is estimated over the kth slot, each slot comprising L samples, as follows:

Py(ω1, ..., ωr) = q⁻¹ Σ_{i=1}^q [Yi(ω1, k)...Yi(ωr, k)Ȳi(ω1 + ... + ωr, k)]

where


Yi(ω, k) = L⁻¹ Σ_{t=0}^{L−1} yi(kL + t)exp(−i2πωt/L), ω = 0, 1, ..., L − 1
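For r = 2 this is the familiar slot-averaged bispectrum. A small sketch (treating consecutive slots of a single record as the independent realizations, which is an extra assumption): a component at ω1 + ω2 whose phase is φ1 + φ2 survives the averaging, while bins without phase coupling average to zero.

```python
import numpy as np

def bispectrum(y, L):
    # slot-averaged third-order periodogram: the r = 2 case of the estimator above
    q = len(y) // L
    B = np.zeros((L, L), dtype=complex)
    for i in range(q):
        Y = np.fft.fft(y[i * L:(i + 1) * L]) / L          # Y_i(omega, k)
        for a in range(L):
            for b in range(L):
                B[a, b] += Y[a] * Y[b] * np.conj(Y[(a + b) % L])
    return B / q

L, q = 64, 50
rng = np.random.default_rng(4)
t = np.arange(L)
slots = []
for _ in range(q):
    p1, p2 = rng.uniform(0, 2 * np.pi, 2)                 # fresh phases in each slot
    slots.append(np.cos(2 * np.pi * 5 * t / L + p1)
                 + np.cos(2 * np.pi * 9 * t / L + p2)
                 + np.cos(2 * np.pi * 14 * t / L + p1 + p2))  # phase-coupled harmonic
B = bispectrum(np.concatenate(slots), L)

assert abs(B[5, 9] - 0.125) < 1e-6    # coherent peak: (1/2)^3 with the phases cancelling
assert abs(B[5, 8]) < 1e-9            # bin with no component: averages to (numerically) zero
```

The peak location (5, 9) identifies which pair of rhythms is phase coupled, which is the diagnostic quantity in the text.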

We model the signal as the output of a Volterra system driven by an input harmonic signal comprising statistically independent harmonics:


x(t) = Σ_{ω=0}^{L−1} A(ω)cos(2πωt/L + φ(ω))

with the A(ω)'s independent and the φ(ω)'s independent and uniform over [0, 2π). The
output is modeled as

z(t) = Σ_{k=1}^r Σ_{m1,...,mk} hk(m1, ..., mk|θ)x(t − m1)...x(t − mk)

where

hk(m1, ..., mk|θ) = Σ_{l=1}^s θ(l)·ψk(m1, ..., mk, l)

It is not hard to see that the (r + 1)th order polyspectrum of z(.) has a term of the form

Pz(ω1, ..., ωr) = (A(ω1)...A(ωr))² Re(H̄r(ω1, ..., ωr)H1(ω1)...H1(ωr))

where

Hm(ω1, ..., ωm) = Hm(ω1, ..., ωm|θ) = Σ_{l=1}^s θ(l)ψ̂m(ω1, ..., ωm, l) = Σ_{l1,...,lm} hm(l1, ..., lm|θ)exp(−i2π(ω1l1 + ... + ωmlm)/L)

where

ψ̂k(ω1, ..., ωk, l) = Σ_{m1,...,mk=0}^{L−1} ψk(m1, ..., mk, l)exp(−(i2π/L)(m1ω1 + ... + mkωk))

We can express the above polyspectrum equation in the presence of noisy measurements, after taking into account the fact that the parameters θ can have slow time variations:

P(ω|t) = F(ω, θ(t)) + W(t)

or equivalently in vector form as

P(t) = F(θ(t)) + V(t) ∈ R^{L^r × 1}

where

P(t) = Σ_ω P(ω|t)e(ω), with ω ranging over {0, 1, ..., L − 1}^r

where
e(ω) = f (ω1 ) ⊗ ... ⊗ f (ωr )

with
ω = (ω1 , ..., ωr ), f (ω) = [δ(ω, 1), ..., δ(ω, L)]T
Here V(t) is measurement noise and

F(θ) = Σ_ω F(ω|θ)e(ω)

with

F(ω) = F(ω|θ) = (A(ω1)...A(ωr))² Re(H̄r(ω1, ..., ωr|θ)H1(ω1|θ)...H1(ωr|θ))

We can write
F(θ) = ψ·θ^{⊗r}
where

ψ = Σ_{ω1,...,ωr,l0,l1,...,lr} (A(ω1)...A(ωr))² Re(ψ̂̄r(ω1, ..., ωr, l0)ψ̂1(ω1, l1)...ψ̂1(ωr, lr))·(e(ω1) ⊗ ... ⊗ e(ωr))(u(l0) ⊗ ... ⊗ u(lr))^T


The parameter dynamics is
θ(l, t + 1) = θ(l, t) + W (l, t)
An estimate of this parameter vector in the absence of parameter noise is given
by
θ̂ = argmin_θ Σ_t ‖ P(t) − F(θ) ‖² = argmin_θ Σ_t ‖ P(t) − ψ·θ^{⊗r} ‖²
Assume that we have an apriori estimate θ0 of θ and we write its estimate based
on the above data as
θ̂ = θ0 + δ θ̂
Then in a linearized approximation, we have

ψ·θ̂^{⊗r} ≈ ψ·θ0^{⊗r} + K·δθ̂

where

K = ψ·(θ0^{⊗(r−1)} ⊗ I + θ0^{⊗(r−2)} ⊗ I ⊗ θ0 + ... + θ0^{⊗(r−k)} ⊗ I ⊗ θ0^{⊗(k−1)} + ... + I ⊗ θ0^{⊗(r−1)})
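The sum in parentheses is just the Jacobian of the map θ → θ^{⊗r} at θ0, which ψ then acts on linearly. The Jacobian identity itself can be verified numerically (a small sketch with illustrative dimensions; ψ is omitted since it acts linearly):

```python
import numpy as np

def tpow(v, k):
    # v^{(x)k} as a column vector, with v^{(x)0} = [1]
    out = np.array([[1.0]])
    for _ in range(k):
        out = np.kron(v.reshape(-1, 1), out)
    return out

p, r = 3, 3
rng = np.random.default_rng(8)
theta0 = rng.standard_normal(p)

# J = sum_{k=1}^r theta0^{(x)(k-1)} (x) I (x) theta0^{(x)(r-k)}
J = sum(np.kron(np.kron(tpow(theta0, k - 1), np.eye(p)), tpow(theta0, r - k))
        for k in range(1, r + 1))

d = 1e-6 * rng.standard_normal(p)
lhs = tpow(theta0 + d, r) - tpow(theta0, r)   # first-order change of theta^{(x)r}
assert np.allclose(lhs.ravel(), J @ d, atol=1e-9)
```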

Then our measurement model can be expressed in this linearized approximation as

δP(t) = Kδθ + √ε·V(t)

where ε is a small parameter that signifies low amplitude measurement noise.
Assuming this noise to be white Gaussian, the parameter deviation estimate
based on N time samples is given by the familiar least squares formula:

δθ̂ = N⁻¹(K^T K)⁻¹K^T Σ_{t=1}^N δP(t)

= N⁻¹(K^T K)⁻¹K^T Σ_{t=1}^N (Kδθ + √ε·V(t))

= δθ + √ε·N⁻¹(K^T K)⁻¹K^T Σ_{t=1}^N V(t)

and we get, using standard large deviation techniques for small ε,

ε·log(P(‖δθ̂ − δθ‖ > δ)) ≈ −min_{‖v‖>δ} I(v)

where

I(v) = (1/2)v^T Qv,  Q = (N/σv²)·K^T K
Now we come to the case when the parameter vector θ has slow time variations. Let θ0(t) be our initial guess for this parameter at time t, and denote the estimate of the deviation δθ(t) by δθ̂(t). Then our measurement model acquires the form

δP (t) = K(t)δθ(t) + V (t)

where

K(t) = ψ·(θ0(t)^{⊗(r−1)} ⊗ I + θ0(t)^{⊗(r−2)} ⊗ I ⊗ θ0(t) + ... + θ0(t)^{⊗(r−k)} ⊗ I ⊗ θ0(t)^{⊗(k−1)} + ... + I ⊗ θ0(t)^{⊗(r−1)})
The dynamical model for δθ(t) is given by

δθ(t + 1) = δθ(t) + W (t + 1)

and its EKF estimate is denoted by δθ̂(t|t). The LDP must be applied to estimate the probability

P(max_{t∈[0,T]} ‖δθ(t) − δθ̂(t|t)‖ > δ)

To get this estimate, we must first implement the EKF.
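Before the EKF, the linear-Gaussian core of this problem (a random-walk parameter observed through the fixed matrix K) is handled exactly by the Kalman filter. A minimal sketch with illustrative dimensions and noise levels, compared against per-sample least squares:

```python
import numpy as np

rng = np.random.default_rng(5)
p, m, T = 2, 6, 400
K = rng.standard_normal((m, p))       # stand-in linearized measurement matrix
qv, rv = 1e-5, 1e-2                   # slow parameter drift vs measurement noise

theta = np.zeros(p)                   # true delta-theta(t)
est, P = np.zeros(p), np.eye(p)
Kpinv = np.linalg.pinv(K)
kf_err, ls_err = [], []
for t in range(T):
    theta = theta + np.sqrt(qv) * rng.standard_normal(p)    # delta-theta(t+1) = delta-theta(t) + W(t+1)
    z = K @ theta + np.sqrt(rv) * rng.standard_normal(m)    # delta-P(t) = K delta-theta(t) + V(t)
    P = P + qv * np.eye(p)                                  # predict
    S = K @ P @ K.T + rv * np.eye(m)                        # update
    G = P @ K.T @ np.linalg.inv(S)
    est = est + G @ (z - K @ est)
    P = (np.eye(p) - G @ K) @ P
    kf_err.append(np.sum((est - theta) ** 2))
    ls_err.append(np.sum((Kpinv @ z - theta) ** 2))

# the filter averages over an effective window and beats per-sample least squares
assert np.mean(kf_err[50:]) < np.mean(ls_err[50:])
```

With the time-varying K(t) of the text, the same recursion applies, and the EKF is the further step needed once the measurement map ψθ^{⊗r} is kept nonlinear.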

[19] Generalized synthesis of higher dimensional data from lower dimensional data based on stochastic pde models
The MRI space-time signal M (t, x, y) = M (t, x, y|θ) modeled as a function
of time and space is

∂t M(t, x, y|θ) = Σ_{k,m=0}^p a_km(θ) ∂x^k ∂y^m M(t, x, y|θ) + (dz0(t)/dt)·χ(t, x, y) + w(t, x, y)


where z0 (t) is the noiseless speech/EEG signal and θ are the unknown brain pa-
rameters to be estimated which will give us the nature of the brain disease. The

distorted speech/EEG signal z(t) is related to the MRI data via a "measurement model"

dz(t) = (∫ φ(t, x, y)M(t, x, y|θ)dxdy)dt + dv(t)

In the special case when the brain is normal, θ = θ0 and in the absence of noise
z(t) = z0 (t) so that we have an identity


∂t M0(t, x, y|θ0) = Σ_{k,m=0}^p a_km(θ0) ∂x^k ∂y^m M0(t, x, y|θ0) + z0(t)χ(t, x, y)

This identity determines the normal-brain, noise-free MRI data M0(t, x, y|θ0) as a function of the computer generated speech data z0(t). Our aim is to estimate
the brain disease parameters θ from recordings of the speech data z(t). This we
propose to do using the EKF.

If θ deviates slightly from θ0 , we call this deviation δθ:

θ = θ0 + δθ

Likewise, if z(t) deviates slightly from z0 (t), we call this deviation δz(t):

z(t) = z0 (t) + δz(t)

Thus we have a linearized state model

δM (t, x, y) = M (t, x, y|θ) − M0 (t, x, y|θ0 ),

∂t δM(t, x, y) = Σ_{k,m=0}^p (∇θ a_km(θ0)·δθ(t)) ∂x^k ∂y^m M0(t, x, y|θ0) + w(t, x, y)

dδθ(t) = dεθ(t),
and the linearized measurement model is
dδz(t) = (∫ φ(t, x, y)δM(t, x, y)dxdy)dt + dv(t)

The problem now amounts to estimating δθ(t) dynamically from δz(s), s ≤ t and
this can be achieved via the Kalman filter. Another model that could describe
this situation involves considering that the MRI image data within the brain
depends on the distorted brain parameters θ as well as on the distorted speech
data z(t). In this case, the state and measurement models will be intertwined.
However, such a model is not too realistic since it is natural to suppose that
first the brain acquires a disease independent of the spoken speech so that the
corresponding distorted parameters θ have nothing to do with the spoken speech
but could depend on the noiseless speech that the computer has stored. One
may also assume that this MRI data is independent of the noiseless speech but

the spoken speech depends on the MRI data and the computer stored noiseless
speech. In this case, a state and measurement model would be

∂t M(t, x, y|θ) = Σ_{k,m=0}^p a_km(θ) ∂x^k ∂y^m M(t, x, y|θ) + w(t, x, y)

dz(t) = dz0(t) + (∫ φ(t, x, y)M(t, x, y|θ)dxdy)dt + dv(t)

The role of the bispectrum in modeling brain disease: Brain disease is usually
manifested by the generation of nonlinear mechanisms which distort the speech
data. So, if M (t, x, y|θ) is the MRI image field data, then the spoken speech
z(t) would satisfy an sde of the form
dz(t) = (δ·∫ f(M(t, x, y|θ), z0(t))φ(t, x, y)dxdy)dt + dz0(t) + dv(t)

where δ is a small parameter. We compute the approximate bispectrum of e(t) = z(t) − z0(t), assuming that z0(t) is a Gaussian random process. We have

y(t) = de(t)/dt = δ·∫ f(M(t, x, y|θ), z0(t))φ(t, x, y)dxdy + v′(t)

Expanding f as a Taylor series in its second argument,

f(M, z0) = Σ_{k≥0} f^{(k)}(M, 0)z0(t)^k/k!


Now

y(t) = Σ_{k≥0} (∫ φ(t, x, y)f^{(k)}(M(t, x, y|θ), 0)dxdy) z0(t)^k/k! + v′(t)

More generally, we can incorporate memory into this model by writing

y(t) = Σ_{k≥1} ∫ fk(M(t, x, y|θ), τ1, ..., τk)z0(t − τ1)...z0(t − τk)dτ1...dτk

[20] Historic remark: It was Kalman who created the theory of real-time (time-recursive) estimation of the state of a dynamical system. His theory was however
based on linear state and measurement models. It was only after G.Kallianpur
and Kushner generalized Kalman’s work to include nonlinear state and mea-
surement models that engineers were able to solve a variety of signal estimation
problems wherever the objective was to do real time processing. Kallianpur’s
work was very mathematical involving measure valued random processes but
engineers were able to simplify it and obtain approximate implementable al-
gorithms. The Kushner-Kallianpur filter is an infinite dimensional filter as it

talks about how to estimate the conditional pdf of a signal given measurements
recursively. The EKF is a finite dimensional approximation to this.
Reference: Vijay Upreti, Sagar, Vijyant Agrawal and Harish Parthasarathy,
paper communicated.

[21] Prediction of higher dimensional data from lower dimensional data in discrete time
The MRI field (3D) is g(t, x, y, z), 1 ≤ x, y, z ≤ N. This is represented by an N³ × 1 column vector g(t). The computer recorded speech is s0(t) and the slurred speech is s(t). Set S(t) = [s(t), s(t − 1), ..., s(t − p)]^T and S0(t) = [s0(t), s0(t − 1), ..., s0(t − p)]^T.
Dynamics of speech is given by an LPC model whose parameters θ depend upon
the nature of the diseased brain:

S(t + 1) = S0(t + 1) + Σ_{k=1}^p θ(k)Ck S(t) + Ws(t)

If the brain has no disease, θ = 0 and if further there is no noise, S(t) = S0 (t), ie,
in this case there is no slurring of the speech so that the spoken speech coincides
with the computer recorded speech. The MRI dynamics also depends upon the
brain parameters θ and the computer recorded speech S(t):

g(t + 1) = Σ_{k=1}^p θ(k)Dk g(t) + BS0(t) + Wg(t + 1)

This dynamics may be compared with the motion of a pendulum forced by a torque:

θ″(t) = −a·sin(θ(t)) + τ(t)

The term BS0(t) that drives the MRI should be compared to the torque, while the term involving memory, Σ_k θ(k)Dk g(t), should be compared with −a·sin(θ(t)).
The parameter a of the pendulum is to be compared with the brain parameters
θ(k). The aim is to take noisy measurements of the speech s(t) and use that to
estimate the brain parameters θ(k) as well as the MRI g(t) dynamically from
this speech data using the EKF. Note that we can even directly estimate the
θ(k) from speech data and then use these estimated θ(k)'s to estimate the MRI
from its dynamics by setting the noise Wg to zero.
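The coupled recursion above can be simulated directly. The sketch below uses small hypothetical dimensions and randomly chosen Ck, Dk, B and θ (none of these values come from the text) just to exhibit the data-generation mechanism:

```python
import numpy as np

# Toy simulation of the coupled slurred-speech/MRI recursion
#   S(t+1) = S0(t+1) + sum_k theta(k) Ck S(t) + Ws(t)
#   g(t+1) = sum_k theta(k) Dk g(t) + B S0(t) + Wg(t+1)
# All dimensions, matrices and parameter values are illustrative only.
rng = np.random.default_rng(0)

p, ns, ng = 2, 4, 8                                    # model order and toy vector sizes
theta = np.array([0.3, -0.2])                          # hypothetical brain parameters
C = [0.1 * rng.standard_normal((ns, ns)) for _ in range(p)]
D = [0.1 * rng.standard_normal((ng, ng)) for _ in range(p)]
B = rng.standard_normal((ng, ns))                      # speech-to-MRI coupling

T, sigma = 50, 0.01
S0 = rng.standard_normal((T + 1, ns))                  # computer recorded speech
S = np.zeros((T + 1, ns)); S[0] = S0[0]                # slurred speech
g = np.zeros((T + 1, ng)); g[0] = rng.standard_normal(ng)

for t in range(T):
    S[t + 1] = (S0[t + 1] + sum(theta[k] * C[k] @ S[t] for k in range(p))
                + sigma * rng.standard_normal(ns))
    g[t + 1] = (sum(theta[k] * D[k] @ g[t] for k in range(p)) + B @ S0[t]
                + sigma * rng.standard_normal(ng))

print(S.shape, g.shape)
```

With θ = 0 and zero noise the recursion reduces to S(t) = S0(t), i.e. no slurring, as noted above.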
Remark: The above vector model is similar to constructing a time series
model of the speech signal with the parameters of the time series being slowly
varying functions of time. By estimating these parameters on a real time basis,
we can therefore determine the time varying spectrum and bispectrum of the
speech signal over each time slot from the time series model and the noise
variance and skewness. Once we know the spectrum and bispectrum over each
time slot, we can determine the dominant frequencies in the signal as well as
the average time delay of the signal over each time slot which manifests itself
as a change in its phase.
In general, while making a signal model, we have to also incorporate non-
random disturbances in the model coming from other sources, like disturbances
and reverberations within the brain caused by signals coming from neurons connected
to other parts of the body, or disturbances caused by change in the positions
and movement of the measuring apparatus. By incorporating such disturbance
terms in the dynamical model, we can design a disturbance observer that will
provide real time estimates of the disturbance and hence subtract this distur-
bance estimate from the dynamical model.
[22] Stochastic instability of a stable system caused by small random spikes:
analysis of the probability of diffusion exit from a domain using large deviation
theory, with application to computing the probability of exit of the EKF for MRI
data from its stability domain. Consider a dynamical system

x[n + 1] = f(x[n]) + √ε w[n + 1]

where x[n] ∈ R^p, w[n] is an iid N(0, σ²Ip) sequence and ε > 0 is a small parameter.
Let x0 be a fixed point of the noiseless dynamical system, ie,

f(x0) = x0

Assume that G is a connected open set containing x0 with the property that if
x[n] is the trajectory of the noiseless system with x[0] ∈ G, then lim_{n→∞} x[n] =
x0. In other words, x0 is an asymptotically stable fixed point. Then the
large deviation principle implies that the probability of the trajectory of the
noisy system exiting G for the first time at n = N is given, for small values
of ε, by the formula exp(−V(N)/ε), where V(N) equals the minimum of
(2σ²)^{−1} Σ_{n=0}^{N−1} ||x[n + 1] − f(x[n])||² over all those trajectories
{x[n] : n = 0, 1, ..., N} for which x[0] = x0, x[n] ∈ G for 1 ≤ n ≤ N − 1 and
x[N] ∉ G. It follows easily from this result that if τ_ε denotes the first time of
exit of the noisy system from G, then for small ε,

E(τ_ε) ≈ exp(V̄/ε)

where

V̄ = inf_N V(N)

The precise result is that

lim_{ε→0} ε.log E(τ_ε) = V̄

Remark: This exact formula has been obtained by us based on the intuitive
consideration that if the probability of hitting the exterior of G in time N is
p(N, ε), then the system exits once in an average time of N/p(N, ε), provided
that N is large. Taking the logarithm of this, multiplying by ε and letting ε
tend to zero keeping N fixed then gives the above formula for the average
hitting time. We can use these computations to determine the average time
taken by the EKF state estimation error to exceed a given threshold value.
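The scaling E(τ_ε) ≈ exp(V̄/ε) can be checked by a crude Monte Carlo experiment on a one dimensional example. The stable map x → ax with exit domain G = (−1, 1) below is an illustrative choice, not a model from the text:

```python
import numpy as np

# Mean exit time of x[n+1] = a*x[n] + sqrt(eps)*sigma*w[n+1] from G = (-1, 1),
# starting at the stable fixed point 0, for a few noise levels eps.
rng = np.random.default_rng(1)
a, sigma = 0.5, 1.0

def mean_exit_time(eps, trials=200, tmax=100000):
    times = []
    for _ in range(trials):
        x, n = 0.0, 0
        while abs(x) < 1.0 and n < tmax:
            x = a * x + np.sqrt(eps) * sigma * rng.standard_normal()
            n += 1
        times.append(n)
    return float(np.mean(times))

for eps in (0.25, 0.15, 0.10):
    m = mean_exit_time(eps)
    print(f"eps={eps:.2f}  E[tau]~{m:8.1f}  eps*log(E[tau])={eps * np.log(m):.3f}")
```

As ε decreases, E[τ_ε] grows roughly exponentially while ε·log E[τ_ε] stabilizes near the quasi-potential value V̄.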
Reference: A. Dembo and O. Zeitouni, "Large Deviations Techniques and Applications", Springer.

[23] DNN approach to estimating the parameters/mechanism of higher dimensional signal data generation by training the network
We have a time recursive model for MRI and slurred speech signals X(s0 , θ)
given the diseased brain parameters θ and the computer recorded speech data
s0 (t). We can train a neural network for different choices of s0 , θ that generate
a given MRI and slurred speech data X(s0 , θ). For the neural network, we give
as input X, ie, the slurred speech and MRI and as output the brain parameters
θ. We apply this to several pairs and determine the optimal weights W of the
neural network. Then, given a new patient, we record his speech and MRI
and give this as input to the nn which outputs the parameter θ. We can even
train the nn to generate the MRI and brain parameters for a given slurred
speech sequence. This amounts to generating the slurred speech s alone for
given θ and applying s as input to the nn and θ as the output. This training
is carried out for different θ, s and after the training is complete, we present
the nn with fresh s, ie, slurred speech data and the nn outputs the parameter
θ. Once θ is known, we can use the recursive model to generate the MRI. The
advantage with the ANN approach over the EKF is that once the nn weights
have been determined by the training process, we do not have to do any further
programming. The advantage with the EKF approach is that it can filter out
process and measurement noise, while the nn training is based on noiseless
data. It is an open problem to determine the mean squared
parameter estimation error in the nn output when its training has been carried
out for deterministic data while, in practice, we finally present it with noisy data.
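A toy stand-in for this training procedure, with a scalar θ and a one-hidden-layer network in place of a full DNN (all sizes, learning rates and the scalar slurring model are illustrative, not taken from the text):

```python
import numpy as np

# Learn the map from a slurred-speech trajectory (generated for a scalar theta)
# back to theta, using a one-hidden-layer network trained by full-batch
# gradient descent on noiseless synthetic pairs (s, theta).
rng = np.random.default_rng(2)
T = 20

def slurred(theta, s0):
    s = np.zeros(T)
    for t in range(T - 1):
        s[t + 1] = s0[t + 1] + theta * s[t]      # S(t+1) = S0(t+1) + theta*S(t)
    return s

s0 = rng.standard_normal(T)                      # fixed computer-recorded speech
thetas = rng.uniform(-0.8, 0.8, 400)
X = np.stack([slurred(th, s0) for th in thetas]) # network inputs
y = thetas[:, None]                              # network targets

H = 16
W1 = 0.1 * rng.standard_normal((T, H)); b1 = np.zeros(H)
W2 = 0.1 * rng.standard_normal((H, 1)); b2 = np.zeros(1)

def mse():
    return float(((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2).mean())

loss_before = mse()
lr = 0.01
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)
    err = (h @ W2 + b2) - y                      # residuals
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)             # backprop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1
loss_after = mse()
print(loss_before, loss_after)
```

After training, presenting a fresh slurred trajectory to the network yields an estimate of its θ, as described above.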

[24] Estimating higher dimensional signals in the presence of disturbance using disturbance observers.
Taking disturbance into account, the model becomes


S(t + 1) = S0(t + 1) + Σ_{k=1}^p θ(k) Ck S(t) + Ws(t) + ds(t + 1)

g(t + 1) = Σ_{k=1}^p θ(k) Dk g(t) + B S0(t) + Wg(t + 1) + dg(t + 1)

This model can be cast in a single vector form, with the driving terms S0(t + 1)
and BS0(t) absorbed into the disturbance:

(S(t + 1), g(t + 1))^T = Σ_{k=1}^p θ(k) diag(Ck, Dk) (S(t), g(t))^T + (ds(t + 1), dg(t + 1))^T + W(t + 1)

or equivalently, writing

X(t) = (S(t)^T, g(t)^T)^T, Fk = diag(Ck, Dk),

as

X(t + 1) = Σ_{k=1}^p θ(k) Fk X(t) + W(t + 1) + d(t + 1) = F X(t) + d(t + 1) + W(t + 1)

Disturbance observer:

d̂(t + 1) = d̂(t) + L(t)(d(t + 1) − d̂(t))

= d̂(t) + L(t)(X(t + 1) − Σ_{k=1}^p θ̂(k, t) Fk X(t) − d̂(t))

Another scheme for designing the disturbance observer: let

z(t + 1) = d̂(t) − L(F X(t) + d̂(t)), d̂(t) = z(t) + P X(t)

Then,

d̂(t + 1) = z(t + 1) + P X(t + 1) = d̂(t) + L(−F X(t) − d̂(t) + L^{−1} P X(t + 1))

If we choose P = L, then we get

d̂(t + 1) = d̂(t) + L(X(t + 1) − F X(t) − d̂(t)) = d̂(t) + L(d(t + 1) − d̂(t) + W(t + 1))

since X(t + 1) = F X(t) + d(t + 1) + W(t + 1).

How to choose L: taking the Z transform of the above disturbance observer gives

(z − 1)D̂(z) = L(zD(z) − D̂(z) + zW(z))

or equivalently,

D̂(z) = (Lz/(z − 1 + L))(D(z) + W(z))

Thus,

E(z) = D(z) − D̂(z) = [(1 − z)(L − 1)/(z + L − 1)]D(z) − (Lz/(z + L − 1))W(z)

It is clear then from the final value theorem of Z-transform theory that if (z −
1)²D(z) → 0 as z → 1, which is equivalent to d(n + 1) − d(n) → 0, then

lim_n e(n) = −lim_n w(n)

provided that lim_n w(n) exists. In the special case of zero noise, the disturbance
estimation error converges to zero.
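A minimal sketch of the observer recursion d̂(t + 1) = d̂(t) + L(X(t + 1) − F X(t) − d̂(t)) on an assumed stable toy system with a constant disturbance and zero noise; the estimation error should then converge to zero geometrically (factor 1 − L):

```python
import numpy as np

# Disturbance observer on X(t+1) = F X(t) + d(t+1), W = 0, with constant d.
F = np.array([[0.5, 0.1],
              [0.0, 0.4]])          # stable toy state matrix (assumed)
d_true = np.array([1.0, -2.0])      # constant disturbance
L = 0.5                             # observer gain, 0 < L < 2 for stability

x = np.zeros(2)
d_hat = np.zeros(2)
for t in range(200):
    x_next = F @ x + d_true                       # state update, no noise
    d_hat = d_hat + L * (x_next - F @ x - d_hat)  # observer update
    x = x_next

err = np.linalg.norm(d_true - d_hat)
print(err)  # essentially zero
```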

[25] Time varying spectrum and bispectrum estimation from the EKF: Consider an AR time series model satisfied by the speech-MRI vector process X(t) = [S(t)^T, g(t)^T]^T:

X(t) = a(1)X(t − 1) + ... + a(p)X(t − p) + d(t) + W(t)



By defining the vector

Y (t) = [X(t − 1)T , ..., X(t − p)T ]T

we get
Y (t + 1) = A(a)Y (t) + bd(t) + W (t)
where A is a matrix dependent upon the AR parameters a. Estimation of a, Y(t)
using an EKF gives us â(t). If we assume that a varies slowly and is therefore
almost constant over time slots of duration L, we get the spectral estimate of
X(t) over each slot as

S_X(ω, t) = |A(ω, t)|^{−2} Σ_W

where Σ_W is the power spectral density matrix of W. More generally, we can allow the AR parameters a(k) to be matrices, and then the power spectral estimate
of X(t) would be given by

S_X(ω, t) = A(ω, t)^{−1} Σ_W A(ω, t)^{∗−1}

where A(ω, t) is the matrix polynomial in exp(jω) given by



A(ω, t) = I − Σ_{k=1}^p Â(k, t) exp(−jωk)

How can wavelet based state and parameter estimation be incorporated into the EKF?
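Evaluating the slot-wise spectral estimate is a one-line computation once the AR coefficients are available; the scalar AR(2) coefficients below are hypothetical stand-ins for the EKF estimates â(k, t):

```python
import numpy as np

# S_X(omega) = sigma^2 / |A(omega)|^2 with A(omega) = 1 - sum_k a(k) exp(-j omega k),
# for an illustrative scalar AR(2) model.
a = np.array([0.5, -0.25])          # hypothetical AR coefficients a(1), a(2)
sigma2 = 1.0                        # driving-noise variance

def ar_spectrum(omega):
    k = np.arange(1, len(a) + 1)
    A = 1.0 - np.sum(a * np.exp(-1j * omega * k))
    return sigma2 / abs(A) ** 2

for w in np.linspace(0.0, np.pi, 5):
    print(f"omega={w:.3f}  S_X={ar_spectrum(w):.4f}")
```

The dominant frequencies in a slot show up as peaks of S_X(ω, t) over that slot.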

[26] Wavelet based signal parameter estimation from stochastic difference models.
Suppose we are given a vector valued signal X(t) evolving in time in accor-
dance with the dynamics

X(t + 1) = F (θ)X(t) + W (t + 1)

We define the temporal wavelet transform of the signal upto time t by



W_X(t, n, k) = Σ_{s=0}^t X(s) ψ_{n,k}(s)

Then we have the recursion

W_X(t + 1, n, k) = W_X(t, n, k) + X(t + 1)ψ_{n,k}(t + 1) = W_X(t, n, k) + F(θ)X(t)ψ_{n,k}(t + 1) + noise


By collecting signal data upto time t + 1, we can thus estimate the parameter
θ by minimizing the energy function

E(t, θ) = Σ_{n,k} || W_X(t + 1, n, k) − W_X(t, n, k) − F(θ)X(t)ψ_{n,k}(t + 1) ||²

Can this parameter estimation be made time recursive ? After a long duration
has elapsed, it is clear that if the signal is transient, its wavelet transform
taken upto time t will not change significantly with time and hence if we have
an apriori idea about what dominant frequencies are present in the signal over
each time slot, we need retain only those wavelet coefficients of the signal having
resolution indices corresponding to these dominant frequencies. This knowledge
therefore saves us from storing excessive data during the estimation process. For
example, suppose that we solve the above state variable system as


X(t) = Σ_{k=0}^{t−1} F(θ)^{t−k−1} W(k), t = 0, 1, 2, ...

We take the wavelet transform over the time slot t ∈ [rL, (r + 1)L − 1] retaining
only those coefficients which fall in the index range Dr and then use this data for
different slot numbers r to estimate θ. However, this is not a recursive parameter
estimation scheme. Again, suppose that we have a dynamically varying three
dimensional signal field X(t, x, y, z) = X(t, r). Suppose that its dynamics is
described by a pde

∂t X(t, r) = L(θ)X(t, r) + W (t, r)

where L is a linear partial differential operator in the spatial variables depending


upon a parameter vector θ. We can estimate θ by taking noisy measurements on
the signal field at a sparse set of spatial locations and applying the EKF to it.
An example of such a 3-D signal field model is an MRI excited by speech. We
take the wavelet transform w.r.t the spatial variables on both sides. Indexing
the 3-D wavelet indices by a single index n, we get

∫ ∂_t X(t, r)ψ_n(r)d³r = ∫ X(t, r)L(θ)*ψ_n(r)d³r + noise

or equivalently,

∂_t W_X(t, n) = Σ_m W_X(t, m) ∫ ψ_m(r)L(θ)*ψ_n(r)d³r + noise

or equivalently,

∂_t W_X(t, n) = Σ_m a(n, m, θ)W_X(t, m) + noise

where

a(n, m, θ) = ∫ ψ_n(r)L(θ)ψ_m(r)d³r

We require to store this wavelet domain dynamical information only for a few
dominant wavelets in order to estimate θ efficiently, and the resulting estimation
algorithm can then be carried out with lower computational complexity.
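The reduction to a few dominant basis coefficients can be illustrated numerically. In the sketch below a 1-D diffusion ∂_t X = θ ∂²X/∂x² plays the role of the field equation, and a handful of discrete cosine modes (eigenvectors of the discrete Laplacian, standing in for the dominant wavelets) give the reduced dynamics ∂_t W(t, n) = Σ_m a(n, m, θ)W(t, m); θ is then recovered by least squares from the coefficient dynamics. All sizes are illustrative:

```python
import numpy as np

N, M = 64, 6                      # grid size, number of retained basis functions
theta_true = 0.8
x = np.arange(N)
# DCT-II modes: exact eigenvectors of the Neumann discrete Laplacian below
Psi = np.stack([np.cos(np.pi * n * (x + 0.5) / N) for n in range(1, M + 1)])
Psi /= np.linalg.norm(Psi, axis=1, keepdims=True)

D2 = -2 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)   # discrete d^2/dx^2
D2[0, 0] = D2[-1, -1] = -1.0                             # zero-flux ends

dt, T = 1e-3, 500
X = np.exp(-((x - N / 2) ** 2) / 50.0)                   # initial field
W = [Psi @ X]
for _ in range(T):
    X = X + dt * theta_true * (D2 @ X)                   # Euler step of the pde
    W.append(Psi @ X)
W = np.array(W)

# dW/dt = theta * (Psi D2 Psi^T) W  =>  least-squares estimate of theta
A1 = Psi @ D2 @ Psi.T                                    # a(n, m, theta)/theta
dW = (W[1:] - W[:-1]) / dt
rhs = W[:-1] @ A1.T
theta_hat = float(np.sum(dW * rhs) / np.sum(rhs ** 2))
print(theta_hat)
```

Because the retained modes span an invariant subspace of the discrete operator, only M coefficients per time step need to be stored, which is the storage saving argued for above.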
Chapter 6

Electromagnetism and
Quantum Field Theory
[1] Stochastic analysis of electromagnetic wave propagation in curved
space-time
Problem statement: The background metric is g_{μν}(x). The permittivity-permeability tensor is ε^{μν}_{αβ}(x); this is assumed to be a Gaussian random field of the form

δ^μ_α δ^ν_β + δ.χ^{μν}_{αβ}(x)

where χ^{μν}_{αβ}(x) is a zero mean Gaussian field with known correlations

E[χ^{μν}_{αβ}(x) χ^{μ'ν'}_{α'β'}(x')]

The aim is to solve for the electromagnetic potentials upto O(δ 2 ) and express its
correlation as well as the correlation of the antisymmetric electromagnetic field
tensor upto O(δ 2 ) terms. The next problem is to consider in addition to such a
random permittivity-permeability tensor, the condition that the metric tensor of
the gravitational field is also a small zero-mean random Gaussian perturbation
of a non-random background metric tensor:
g_{μν}(x) = g^{(0)}_{μν}(x) + δg_{μν}(x)

and then calculate the electromagnetic field correlations.

[2] A patch is cut on a substrate having a T shape. Assuming that the T


shape is defined by the union of the two rectangles

T = R 1 ∪ R2

where
R1 = {(x, y) : 0 ≤ x ≤ a, 0 ≤ y ≤ b}


R2 = {(x, y) : a ≤ x ≤ c, −d ≤ y ≤ b + d}
determine the electromagnetic fields within the box having T as its upper sur-
face at z = L/2 and the same surface at z = −L/2 with all the walls being
perfectly conducting surfaces. For doing this problem, you may regard this
cavity resonator as the union of two rectangular boxes, determine the fields
within each rectangular box assuming standard formulae for rectangular cavity
resonators and then applying the continuity condition on the interface between
the two rectangular boxes.

[3] Report on sapphire antennas


[1] The author claims that while in metallic wall rdra’s the main component
of the radiation field arises from the surface current density, here in sapphire
rdra’s the main component of the radiation arises from the dielectric material
inside. Is it possible for a dielectric material to carry sufficient volume current
density for the resulting radiation pattern to be significant ?

[2] Let (ε, μ, σ) be the permittivity, permeability and conductivity parameters of the dielectric. Then the volume current density J(ω, r) is calculated from the Maxwell equations taking into account Ohm's law:

curl H = jωεE + J, J = σE, curl E = −jωμH

The boundary wall is assumed to be another dielectric with a smaller dielectric


constant and also possibly different permeability and conductivity. So, we must
also solve for the electromagnetic field in this region and apply the boundary
conditions that the tangential components of the electric field E and the normal
components of the magnetic field B are continuous at this boundary. In addition,
since the boundary is not a perfect conductor, it cannot carry any surface current
density and hence, the tangential components of the magnetic field H = B/μ
should also be continuous at this boundary. How would you solve this boundary
value problem to obtain the possible values of the resonant frequencies ? Kindly
explain how the expression for the resonant frequencies when the boundary of
the rdra is another dielectric differs from that when the boundary is a perfect
conductor ?

[3] Various kinds of feed structures are discussed by the author. In any case,
once the probe shape that feeds into the cavity antenna is known, the Maxwell
equations within the cavity must be supplemented apart from the boundary wall
condition with the boundary condition on the probe surface. Specifically, the
tangential magnetic field on the probe surface should equal the surface current
density on it. I would like to see a satisfactory reply to this at the time of the
viva-voce exam.

[4] The following typical one dimensional example may be used to explain
how resonance occurs when there is a transition in the permittivity from within
the rdra to outside. Assume that the region 0 ≤ x ≤ a has permittivity ε1 while

the regions x < 0 and x > a have permittivity ε2. Thus the one dimensional
Helmholtz equations in these regions are (a) ψ''(x) + k1²ψ(x) = 0 for 0 ≤ x ≤ a,
and (b) ψ''(x) + k2²ψ(x) = 0 for x < 0 or x > a, where k1² = ω²ε1μ0 and
k2² = ω²ε2μ0. The solutions in the three regions, taking into account the
absence of reflection for x < 0 and x > a, are given by

ψ(x) = A1 exp(jk1 x) + B1 exp(−jk1 x), 0 < x < a

ψ(x) = A2 exp(−jk2 x), x < 0, ψ(x) = B2 exp(jk2 x), x > a

The boundary conditions are

ψ(0+) = ψ(0−), ψ(a+) = ψ(a−), ψ'(0+) = ψ'(0−), ψ'(a+) = ψ'(a−)

These four conditions correspond to continuity of the tangential electric field
and the normal magnetic field. They lead to a set of four homogeneous
linear equations for A1 , B1 , A2 , B2 . Setting the determinant of the coefficient
matrix to zero then gives us a transcendental nonlinear equation for the possible
values of the resonant frequency ω. How would you generalize this idea to two
and three dimensions ?
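The determinant condition in this one dimensional example can be examined numerically. In the sketch below (illustrative parameter values), the 4×4 matching matrix is scanned over real ω; local minima of |det| indicate the approximate locations of the (generally complex, leaky) resonant frequencies:

```python
import numpy as np

# Matching conditions for psi and psi' at x = 0 and x = a, with outgoing
# waves outside the slab; unknowns ordered (A1, B1, A2, B2).
eps1, eps2, mu0 = 4.0, 1.0, 1.0     # illustrative units
a = 1.0

def det(omega):
    k1 = omega * np.sqrt(eps1 * mu0)
    k2 = omega * np.sqrt(eps2 * mu0)
    e1p, e1m = np.exp(1j * k1 * a), np.exp(-1j * k1 * a)
    e2 = np.exp(1j * k2 * a)
    M = np.array([
        [1,             1,              -1,      0],              # psi at 0
        [1j * k1,      -1j * k1,        1j * k2, 0],              # psi' at 0
        [e1p,           e1m,            0,      -e2],             # psi at a
        [1j * k1 * e1p, -1j * k1 * e1m, 0,      -1j * k2 * e2],   # psi' at a
    ], dtype=complex)
    return np.linalg.det(M)

omegas = np.linspace(0.5, 10.0, 2000)
mag = np.array([abs(det(w)) for w in omegas])
# interior local minima of |det| ~ approximate resonant frequencies
idx = [i for i in range(1, len(mag) - 1) if mag[i] < mag[i - 1] and mag[i] < mag[i + 1]]
print([round(float(omegas[i]), 3) for i in idx[:5]])
```

The two and three dimensional generalizations replace this 4×4 system by larger mode-matching systems, but the idea of locating zeros (or near-zeros) of a determinant is the same.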

[5] The Helmholtz equations for Ez within and outside the rdra, when the dielectric within has permittivity ε1 and that outside has permittivity ε2, are given by

(∇⊥² + γ² + k1²)ψ(x, y) = 0

within the boundary and

(∇⊥² + γ² + k2²)ψ(x, y) = 0

outside the boundary, where the z dependence is exp(±γz). Note that γ has to
be the same for the fields within and outside in view of the continuity of Ez at
the walls. Here

∇⊥² = ∂²/∂x² + ∂²/∂y²

and

k1² = ω²ε1μ0, k2² = ω²ε2μ0
Since the tangential electric field vanishes at the top and bottom surfaces, we must have

γ = jπp/d

where p is an integer and d is the height of the rdra. The z-dependence of Ez


must be cos(pπz/d) so that E⊥ vanishes on the top and bottom surfaces. We
write the solution within as

ψ(x, y) = (A1 .cos(α1 x) + B1 .sin(α1 x)).(A2 .cos(α2 y) + B2 .sin(α2 y))

and outside as

ψ(x, y) = (C1.cos(β1 x) + C2.sin(β1 x)).(D1.cos(β2 y) + D2.sin(β2 y))

where in order that Helmholtz equation be satisfied,

−α12 − α22 + γ 2 + k12 = 0,

−β12 − β22 + γ 2 + k22 = 0


In this formalism, we are assuming that the top and bottom surfaces of the
rdra are perfect conductors while the side-walls are dielectrics. For continuity of
Dx = εEx at x = 0, a, we require that (ε/h²)∂Ez/∂x be continuous at these
two boundaries. This gives us α2 = β2 and likewise for the other boundaries.
Following this procedure, can you derive the transcendental equation satisfied
by ω?
[6] How would you evaluate the far field radiation integrals using the Green's
function/retarded potential approach applied to the current density within the
sapphire rdra?
[7] At one place, the author talks about two sapphire rdra's of different
dimensions stacked one above the other. How does one compute the
eigen-frequencies in this case?
[8] On the whole, the thesis is well written and contains several new ideas
giving the advantages of sapphire rdra's as regards bandwidth, efficiency, lower
loss etc. relative to the usual rdra’s based on perfectly conducting walls. I rec-
ommend that the author be awarded the PhD degree after all the above queries
are answered during the viva-voce exam. I must also say that the publications
of the author are of a very high standard and hence the author should attempt
to translate this work into a book for the benefit of other research workers in
the field.
[4] Design of cavity resonator antennas using microstrip patches on
a substrate
Suppose our patch has a depth d and on the z = 0 surface, the patch consists
of two rectangles defined by the regions I:

0 < x < a1, 0 < y < a2,

II:

a3 < x < a3 + a4, a2 < y < a2 + a5,
The interface between these two regions is

y = a 2 , a3 < x < a 3 + a 4

Then we find for the Ez field in the two regions I:



Ez(t, x, y, z) = Σ_{mnp} sin(mπx/a1)sin(nπy/a2)cos(pπz/d).Re(cI(mnp)exp(jωI(mnp)t))

and II:

Ez(t, x, y, z) = Σ_{mnp} sin(mπ(x − a3)/a4)sin(nπ(y − a2)/a5)cos(pπz/d).Re(cII(mnp)exp(jωII(mnp)t))

where

ωI(mnp)²/c² = (mπ/a1)² + (nπ/a2)² + (pπ/d)²,

ωII(mnp)²/c² = (mπ/a4)² + (nπ/a5)² + (pπ/d)²
It is clear that Ez in both the regions vanishes at the interface and so does
∂Ez /∂x. Thus to ensure continuity of all the components of the em field at
the interface, we require that ∂Ez /∂y should be continuous at the interface.
Since this is only an approximate analysis, we shall require that the coefficients
cI (mnp) and cII (mnp) be such that if we denote by EzI and EzII the above
expressions for the z-components of the electric field in the two regions, then

∫ |∂EzI(t, x, a2, z)/∂y − ∂EzII(t, x, a2, z)/∂y|² dxdzdt

be a minimum subject to other constraints where the region of integration is


a3 < x < a3 + a4 , 0 < z < d, 0 < t < T . In this way, we can analyze the
T M modes. To analyze the T E modes, we again first solve for Hz in the two
rectangular patches in the same way taking care of the boundary conditions: I:

Hz = Σ_{mnp} Re(dI(mnp)exp(jωI(mnp)t))cos(mπx/a1)cos(nπy/a2)sin(pπz/d)

and II:

Hz = Σ_{mnp} Re(dII(mnp)exp(jωII(mnp)t))cos(mπ(x − a3)/a4)cos(nπ(y − a2)/a5)sin(pπz/d)


Now we find that ∂Hz/∂y in both the cases vanishes at the interface while Hz
and ∂Hz/∂x do not vanish. Thus, in order that all the components of the em
field be nearly continuous at the interface, we shall require that the coefficients
dI and dII be such that

∫ |HzI(t, x, a2, z) − HzII(t, x, a2, z)|² dxdzdt

be a minimum subject to other constraints given by for example the feed points.
The region of integration is the same as that in the above discussed T M case.
Now consider another example in which there are three connected rectangu-
lar patches given by the regions (0 < x < a1 , 0 < y < a2 ), (a3 < x < a3 +a4 , a2 <
y < a2 + a5 ) and (a6 < x < a6 + a7 , a2 + a5 < y < a2 + a5 + a8 ).
[5] The Abelian and non-Abelian anomalies in effective quantum
field theories
Although the action functional may be invariant under a symmetry group,
the path measure may not be, as happens for Chiral symmetries. The quantum
effective action is the Legendre transform of the logarithm of

exp(iW(J)) = ∫ exp(iI(φ) − i ∫ J.φ d⁴x) Dφ

For a Fermionic field ψ, we are therefore interested in the measure exp(iI(ψ))Dψ.Dψ̄.
Although I(ψ) is globally gauge invariant, the path measure Dψ.Dψ̄ is not for
Chiral symmetry. This is because exp(iαγ5) does not have unit determinant.
The determinant of the Jacobian matrix associated with the transformation of
the path measure is clearly given by

exp(iα Tr(γ5) ∫ δ⁴(x − x) d⁴x)

which is not defined. So we use a gauge invariant version of the infinitesimal
change ∫ δ⁴(x − x)d⁴x in the path measure, ie, we choose a function f
such that f(0) = 1 and replace ∫ δ⁴(x − x)d⁴x by the gauge invariant quantity
Tr(f((γ.D)²/M²)) in the limit of M → ∞. Here

D = ∂ + ieA(x)

ie,
γ.D = γ.∂ + ieγ.A(x) = γ μ ∂μ + ieγ μ Aμ (x)
Here, we are considering the Abelian U(1) anomaly, ie, the field considered is the
electromagnetic field. Note that f((γ.D)²/M²) is an operator that converges
to the identity operator as M → ∞, so its trace converges to the trace of the
identity, ie, ∫ δ⁴(x − x)d⁴x.

f((γ.D)²/M²) = f(0) + f'(0)(γ.D)²/M² + (f''(0)/2)(γ.D)⁴/M⁴

and

Tr((γ.D)²) = 0

effectively, since

(γ.D)² = γ^μγ^ν(∂_μ + iA_μ)(∂_ν + iA_ν) = □ − A² + i(2A^μ∂_μ + γ^μγ^ν A_{ν,μ})

Now exp(Tr(□)) contributes a field independent constant to the factor
by which the path measure is to be multiplied after the Chiral transformation,
while the term Tr(−A²) may also be removed by an appropriate gauge
transformation of the electromagnetic four potential. Also Tr(A^μ∂_μ) = 0 since ∂_μ
is a skew-Hermitian operator. Finally,

∫ Tr(γ^μγ^ν A_{ν,μ}(x)) d⁴x

is zero since

γ^μγ^ν A_{ν,μ} = (2η^{μν} − γ^νγ^μ)A_{ν,μ} = 2A^ν_{,ν} − γ^νγ^μ A_{ν,μ}

so that after choosing a gauge in which A^ν_{,ν} = 0, we get

Tr(γ^μγ^ν A_{ν,μ}) = 0

Thus, the only term that effectively contributes to the anomaly is the fourth
degree term f''(0)Tr((γ.D)⁴)/2M⁴, and the non-zero part of this evaluates to

(f''(0)/2M⁴) ∫ Tr((γ.∂ + iγ.A)⁴) d⁴x = (f''(0)/2M⁴) ∫ Tr((γ^μγ^ν A_{μ,ν})²) d⁴x

Now,

γ^μγ^ν A_{μ,ν} = −γ^νγ^μ A_{μ,ν}

(since A^μ_{,μ} = 0) and thus,

γ^μγ^ν A_{μ,ν} = (1/2)γ^μγ^ν F_{μν}

Thus, the anomaly after scaling by M⁴ is one plus an infinitesimal imaginary
number α times

∫ Tr[(γ^μγ^ν F_{μν}(x))²] d⁴x = ∫ Tr(γ^μγ^νγ^αγ^β) F_{μν}(x)F_{αβ}(x) d⁴x

When the infinitesimal symmetry transformation becomes finite, this calculation
amounts to a change in the action by the quantity

∫ ε(μναβ) F_{μν}(x)F_{αβ}(x) d⁴x

where

ε(μναβ) = Tr(γ^μγ^νγ^αγ^β)
The non-Abelian anomaly: Here, the gauge group is SU(N) with N > 1. Let
ta, a = 1, 2, ..., N² − 1, be Hermitian generators of this gauge group and take
Chirality into account, ie, the Lie algebra is a direct sum of (1 + γ5)su(N) and
(1 − γ5)su(N). The factor defining the change in the measure is again

Tr(f((γ.D)²/M²))

and the term that contributes to the anomaly is, as before,

Tr((γ.D)⁴)

where now

γ.D = γ.∂ + iγ.A, A(x) = ta Aa(x)

ie,

γ.A(x) = γ^μ ta A^a_μ(x)

By γ^μ ta we mean γ^μ ⊗ ta, and we get by a similar calculation as in the Abelian
case (dropping the field independent term −i(γ.∂)²),

−i(γ.D)² = −i(γ.∂ + iγ.A)² = [γ.∂.γ.A + γ.A.γ.∂] + i(γ.A)²
so that effectively (ie, after neglecting factors that have zero trace),

[γ.∂.γ.A + γ.A.γ.∂]² = γ.A.(γ.∂)².γ.A + γ.∂.γ.A.γ.∂.γ.A + γ.A.γ.∂.γ.A.γ.∂ + γ.∂.(γ.A)².γ.∂

Taking the trace,

−Tr((γ.D)⁴) = iTr((A²)²) + iTr(A_μA^μ.[γ.∂.γ.A + γ.A.γ.∂]) + Tr[(γ.A)²(γ.∂)²] + 2Tr[γ.A.γ.∂.γ.A.γ.∂]

A better way to make this computation is as follows:

D_μ = ∂_μ + iA_μ, A_μ = A^a_μ ta

(γ.D)² = γ^μγ^ν D_μ D_ν = (2η^{μν} − γ^νγ^μ)D_μ D_ν = 2D² − [γ^νγ^μ[D_μ, D_ν] + γ^νγ^μ D_ν D_μ] = 2D² − γ^νγ^μ F_{μν} − γ^μγ^ν D_μ D_ν

Therefore,

2(γ.D)² = 2D² − γ^νγ^μ F_{μν}

and we get on squaring this, followed by taking the trace and dropping the
terms involving D² (which are pure gauge terms, as shown below),

4Tr((γ.D)⁴) = ∫ Tr((γ^νγ^μ F_{μν}(x))²) d⁴x = ∫ ε(μναβ) Tr(F_{μν}(x).F_{αβ}(x)) d⁴x

Note that

D² = (∂_μ + iA_μ).(∂^μ + iA^μ) = □ − A² + 2iA^μ(x)∂_μ

provided that we impose the gauge condition Aμ,μ = 0. We have

Tr((D²)²) = Tr((□ − A²)²) − 4Tr(A^μ∂_μ A^ν∂_ν) + 4iTr((□ − A²)A^μ∂_μ)


We neglect the first term as it is a pure gauge term and can be made to vanish
by an appropriate choice of the gauge. The second term can be expressed as

Tr(A^μA^ν∂_μ∂_ν) + Tr(A^μA^ν_{,μ}∂_ν)

Now let ρ(x) be a smooth approximation to the δ function in R⁴. We have, approximately,

Tr(A^μA^ν∂_μ∂_ν) = ∫ A^μ(x)A^ν(x) ρ_{,μν}(0) d⁴x

Now we can choose ρ(x) so that

ρ_{,μν}(0) = Cη_{μν}

so

Tr(A^μA^ν∂_μ∂_ν) = C ∫ A²(x) d⁴x

which is a pure gauge term and can be made to vanish by an appropriate choice
of the gauge. Again,

Tr(A^μA^ν_{,μ}∂_ν) = ∫ A^μ(x)A^ν_{,μ}(x) ρ_{,ν}(0) d⁴x = 0

since by choosing ρ to be an even function like a Gaussian pulse, we can make
ρ_{,ν}(0) = 0.

[6] Problems in electrodynamics, fluid dynamics and quantum mechanics
[1] A perfectly conducting sphere of radius R is placed in a uniform electric
field extending over all space. Calculate (a) the electric field outside the sphere,
(b) the surface charge distribution on the spherical surface and (c) the force
exerted by the electric field on the sphere.
[2] Repeat [1] when a total charge Q is distributed on the surface of the per-
fectly conducting sphere whose potential is maintained at V0 relative to infinity.
How does the total charge Q distribute itself on the spherical surface ?
[3] Describe a quantum electromagnetic field in terms of creation and anni-
hilation operators in wave-vector space or equivalently momentum space. Do
the following steps:
[1] Write down the Hamiltonian H of a single quantum harmonic oscillator:

H = p²/2m + mω²x²/2, p = −ih d/dx



where h is Planck’s constant divided by 2π.


[2] Solve the eigenvalue equation

Hψ(x) = Eψ(x), ∫_R |ψ(x)|²dx < ∞

as follows. Assume
ψ(x) = u(x).exp(−ax2 /2)
where a is determined so that the coefficient of x2 cancels out in the differential
equation for u(x). Then, write down an infinite series expansion for u(x) and
show that if the infinite series does not terminate with some term being zero,
then ψ(x) = exp(−ax2 /2).u(x) explodes as |x| → ∞ and hence cannot be
square integrable. Show that the infinite series terminates to a polynomial iff
E = (n + 1/2)hω for some non-negative integer n.
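The conclusion E = (n + 1/2)hω can be checked numerically by diagonalizing a finite-difference discretization of H; in units m = ω = h = 1 the first levels should be 0.5, 1.5, 2.5, 3.5. This is an illustrative companion to the series argument, not part of the derivation:

```python
import numpy as np

# H = -(1/2) d^2/dx^2 + (1/2) x^2 on a grid, central-difference Laplacian.
N, Lbox = 1200, 10.0
x = np.linspace(-Lbox, Lbox, N)
dx = x[1] - x[0]

H = (np.diag(1.0 / dx**2 + 0.5 * x**2)
     + np.diag(-0.5 / dx**2 * np.ones(N - 1), 1)
     + np.diag(-0.5 / dx**2 * np.ones(N - 1), -1))

E = np.linalg.eigvalsh(H)[:4]
print(np.round(E, 3))   # close to [0.5, 1.5, 2.5, 3.5]
```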

[7] Course outline for electromagnetic field theory


[1] The Maxwell equations in the absence of displacement current, discrep-
ancies.
[2] The Maxwell equations in the presence of displacement current, removal
of the discrepancy in charge conservation.
[3] Boundary conditions on the electromagnetic field.
[4] The propagation of TEM waves, unification of light with electricity and
magnetism.
[5] Computing capacitance between various kinds of conducting surfaces in
non-uniform media using the generalized Laplace equation.
[6] Finite difference and finite element methods for numerically solving the
Maxwell equations with boundaries.
[7] Maxwell equations in the presence of random charge and current density
fields: analysis based on stochastic calculus for Brownian motion and Poisson
processes.
[8] Relativistic invariance of the wave equation.
[9] Transformation of the electromagnetic fields between two inertial frames.
[10] Motion of charged particles in an electromagnetic field: The Lorentz
equation.
[11] The energy density and Poynting power flux in an electromagnetic field.
[12] Quantization of the electromagnetic fields as superposition of plane
waves with wave-vector domain creation and annihilation operator field coef-
ficients.
[13] Quantization of the matter field into the electron-positron field.
[14] Dirac’s relativistic wave equation for electrons and positrons with elec-
tromagnetic interactions.
[15] Dirac’s equation for nuclear matter with interaction with non-Abelian
gauge bosons.
[16] Symmetry breaking and the electro-weak theory. How the scalar Higgs
particle in its ground state breaks the symmetry thereby giving mass to the
gauge bosons and the electrons.

[17] Computation of the photon and electron propagators using the Green’s
function satisfying an inhomogeneous pde driven by the four dimensional Dirac
delta function.
[18] Spectral function sum rules: From general principles of Lorentz invari-
ance, the form of the propagator of any scalar field can be expressed as an
integral of the propagator of a particle of mass μ over all possible masses μ
with respect to a measure on the space of masses μ. This is also called the
Kallen-Lehmann representation.
[19] Notion of the quantum effective action derived from the Legendre trans-
form of the logarithm of the path integral for a field in the presence of interaction
between a current field J and the field φ. Properties of the quantum effective
action like its invariance under a gauge transformation when the original action
is invariant under the gauge transformation. How to express the quantum equa-
tions of motion in terms of the quantum effective action. How to compute the
quantum effective action for superconductivity as a function of the gap function
and the external magnetic vector potential and from this function, how to derive
the basic properties of a superconductor.
[20] The ADM action for the gravitational field. How to express the Einstein-
Hilbert action in terms of canonical ADM position and momentum fields. How
to quantize the gravitational field by introducing commutation relations for the
canonical position and momentum fields in the ADM action. While introducing
the canonical commutation relations, we should bear in mind that the ADM
Hamiltonian has constraints like some of the momenta vanish and hence one
should use the Dirac bracket rather than the Poisson/Lie bracket for the com-
mutation rules.
[21] Proof of the general fact that spontaneous symmetry breaking leads to
the generation of massless Goldstone bosons and that the addition of a sym-
metry breaking action to the original invariant action leads to the generation
of particles having nonzero masses. In other words, approximate symmetry
breaking gives masses to massless particles; this has applications to the electroweak
theory, which explains not only how the gauge bosons that propagate
nuclear forces acquire masses but also how the electron acquires mass when the
matter and gauge fields get coupled to the scalar Higgs field.
[22] The thermal emission of particles from a blackhole via Hawking radiation
and how this phenomenon can be used to communicate from within the critical
radius to without by the use of entangled states. First we start with a state
on the tensor product between the Hilbert spaces for fields with support within
the critical radius and without. Then by using local operations combined with
classical communication, we can generate an entangled state between the interior
and exterior of a blackhole using which we can transmit quantum states from
inside to the outside of the blackhole and vice-versa. By local operations, we
mean the following: Let Ha , a = 1, 2 be two Hilbert spaces and let H = H1 ⊗ H2
be their tensor product. Let ρ be a state in H. I choose a POVM {Mk } acting
in H1 . After applying this measurement to ρ, I note the outcome, say a. Then
180 Advanced Probability and Statistics: Applications to Physics and Engineering

the state ρ of the joint system collapses to

ρa = (√Ma ⊗ I2)ρ(√Ma ⊗ I2)/Tr(ρ(Ma ⊗ I2))

Then you apply a unitary Ua acting in H2 dependent on my measurement
outcome a. The overall state then gets changed to

σa = (I1 ⊗ Ua)ρa(I1 ⊗ Ua)∗

and my state after this operation is

ρ1a = Tr2(σa) = Tr2[(√Ma ⊗ Ua)ρ(√Ma ⊗ Ua∗)]/Tr(ρ(Ma ⊗ I2))

while your state is

ρ2a = Tr1(σa) = Tr1[(√Ma ⊗ Ua)ρ(√Ma ⊗ Ua∗)]/Tr(ρ(Ma ⊗ I2))

If |Φ> denotes a maximally entangled state in H, then the question is how to
select the measurements {Mk} and the unitaries {Uk} so that σa becomes as
close as possible to |Φ><Φ| w.r.t some distance measure?
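One LOCC round of the kind described above can be sketched numerically. In the following Python fragment the two-outcome POVM {M0, M1} and the conditional unitaries U0, U1 are illustrative choices (not taken from the text); any POVM with Σk Mk = I works the same way.

```python
import numpy as np

# One LOCC round on H1 (x) H2 (two qubits); operators are illustrative.
M = [np.diag([0.7, 0.3]), np.diag([0.3, 0.7])]        # M_a >= 0, M_0 + M_1 = I
U = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]   # unitary on H2 per outcome

amp = 2**-0.5
psi = np.kron([1.0, 0.0], [amp, amp])                 # |0> (x) |+>
rho = np.outer(psi, psi.conj())                       # state rho on H1 (x) H2

def locc_round(rho, Ma, Ua):
    # collapse: (sqrt(Ma) (x) I2) rho (sqrt(Ma) (x) I2) / Tr(rho (Ma (x) I2));
    # Ma is diagonal here, so its square root is the entrywise square root
    K = np.kron(np.diag(np.sqrt(np.diag(Ma))), np.eye(2))
    p = np.trace(rho @ np.kron(Ma, np.eye(2))).real   # probability of outcome a
    rho_a = K @ rho @ K.conj().T / p
    # conditional local unitary on the second factor: (I1 (x) Ua) rho_a (...)^*
    V = np.kron(np.eye(2), Ua)
    return V @ rho_a @ V.conj().T, p

sigmas, probs = zip(*[locc_round(rho, Ma, Ua) for Ma, Ua in zip(M, U)])
```

The outcome probabilities sum to one and each conditional state σa is again a valid (unit-trace, Hermitian) state, as the collapse formula requires.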

[8] Symmetry breaking in quantum field theory


[1] Let V(φ) be the quantum effective potential. If the original action I(φ)
is invariant under an infinitesimal transformation φn → φn + tnm φm, then
it is well known that the quantum effective potential has the same invariance
property. For a proof, see Steven Weinberg, "The Quantum Theory of Fields,
Vol. II". Thus, we have

Σn,m (∂V(φ)/∂φn) tnm φm = 0

Another differentiation w.r.t. φk at the ground state φ0, where ∂V(φ0)/∂φn = 0, gives us

Σn,m [∂²V(φ0)/∂φn ∂φk] tnm φ0m = 0

This formula means that (tnm φ0m)n is an eigenvector of the mass matrix with
zero eigenvalue. Thus, spontaneous symmetry breaking, which occurs when
the physics is viewed from the ground state, leads to the generation of massless
particles, called massless Goldstone bosons.
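The zero mode can be exhibited on the toy potential V(φ) = λ(φ1² + φ2² − v²)², which is invariant under SO(2) rotations with generator t = [[0, −1], [1, 0]]. A numerical Hessian at a ground state on the vacuum circle (λ and v are illustrative values):

```python
import numpy as np

lam, v = 0.5, 2.0
phi0 = np.array([v, 0.0])                 # a ground state on the circle |phi| = v

def V(p):
    return lam * (p @ p - v**2)**2

def hessian(f, x0, eps=1e-4):
    # central-difference Hessian of a scalar function f at x0
    n = len(x0)
    H = np.zeros((n, n))
    E = np.eye(n) * eps
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x0 + E[i] + E[j]) - f(x0 + E[i] - E[j])
                       - f(x0 - E[i] + E[j]) + f(x0 - E[i] - E[j])) / (4 * eps**2)
    return H

H = hessian(V, phi0)
t = np.array([[0.0, -1.0], [1.0, 0.0]])   # SO(2) generator t_nm
zero_mode = t @ phi0                      # (t phi0): the predicted null vector of H
```

The mass matrix has one positive eigenvalue (the radial mode, 8λv²) and one zero eigenvalue along t·φ0, the Goldstone direction.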
Remark: Let

exp(W(J)) = ∫ exp(iI(φ) + iJ.φ)Dφ

where

J.φ = ∫ Jφ d⁴x

We have

δW(J)/δJ(x) = i<φ(x)>J = i∫ exp(iI(φ) + iJ.φ)φ(x)Dφ / ∫ exp(iI(φ) + iJ.φ)Dφ

Let J = Jψ be defined so that

< φ >J = ψ

Then define
iV (ψ) = ExtK (iK.ψ − W (K))
The extremum is attained when

iψ(x) = δW(K)/δK(x)

or in other words, when

K = Jψ
Thus,
V (ψ) = Jψ .ψ + iW (Jψ )
We now compute

δV(ψ)/δψ(x) = Jψ(x) + (δJψ/δψ(x)).ψ + i(δW(Jψ)/δJ).(δJψ/δψ(x))

= Jψ(x)
since by construction,
δW (Jψ )/δJ = iψ
It then follows that

δ²V(ψ)/δψ(x)δψ(y) = δJψ(x)/δψ(y)

On the other hand, we have

exp(−W(J))δ²exp(W(J))/δJ(x)δJ(y) = −<φ(x)φ(y)>J = −ΔJ(x, y)
where ΔJ is the propagator at current J. Then,

δexp(W(J))/δJ(x) = exp(W(J))δW(J)/δJ(x)

and hence,

−ΔJ(x, y) = exp(−W(J))δ²exp(W(J))/δJ(x)δJ(y) = δ²W(J)/δJ(x)δJ(y) + (δW(J)/δJ(x)).(δW(J)/δJ(y))

If at a certain current J, φ has zero expectation value, then we get

ΔJ(x, y) = −δ²W(J)/δJ(x)δJ(y)

Now if
< φ >J = ψ

we get that

−exp(W(J))ΔJ = (δ/δJ)(exp(W(J))<φ>J) = δ(exp(W(J))ψ)/δJ

= exp(W(J))[δψ/δJ + (δW(J)/δJ) ⊗ ψ]

= exp(W(J))[δψ/δJ + iψ ⊗ ψ]

so that if we evaluate it when ψ = <φ>J = 0, we then get

−exp(W(J))ΔJ = exp(W(J))δψ/δJ

or equivalently,

−ΔJ = δψ/δJ = (δJ/δψ)⁻¹|ψ=0 = (δ²V(ψ)/δψ ⊗ δψ)⁻¹|ψ=0

In other words, the Hessian of the quantum effective potential is the negative
of the inverse of the propagator kernel evaluated at zero mean field.
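A finite-dimensional, real-variable analogue of this Legendre-transform statement (the field-theory version above carries factors of i, hence the minus sign there) can be checked numerically: for W(J) = (1/2)JᵀDJ, the transform V(ψ) = ext_J(J.ψ − W(J)) has Hessian D⁻¹, the inverse of W's Hessian.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
D = A @ A.T + 4.0 * np.eye(4)        # positive-definite stand-in for the propagator

def V(psi):
    # Legendre transform: the extremum condition is psi = dW/dJ = D J
    J = np.linalg.solve(D, psi)
    return J @ psi - 0.5 * J @ D @ J

def hessian(f, x0, eps=1e-4):
    # central-difference Hessian (exact up to rounding for quadratic f)
    n = len(x0)
    H = np.zeros((n, n))
    E = np.eye(n) * eps
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x0 + E[i] + E[j]) - f(x0 + E[i] - E[j])
                       - f(x0 - E[i] + E[j]) + f(x0 - E[i] - E[j])) / (4 * eps**2)
    return H

H_V = hessian(V, rng.standard_normal(4))
```

Since V is exactly quadratic here, the numerical Hessian agrees with D⁻¹ to rounding error.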

[9] A short course in quantum filtering theory for mixed process
measurements

We shall derive the quantum filtering equations and apply them to estimate the
noisy quantum electromagnetic field within a cavity resonator based on non-
demolition measurements of the cavity field.

The Hudson-Parthasarathy (HP) equation is

dU(t) = (−(iH + P)dt + L1 dA(t) + L2 dA(t)∗ + S dΛ(t))U(t)

where P, L1, L2, S are chosen so that U(t) is unitary for all t. Make the
non-demolition measurements

Yo(t) = U(t)∗Yi(t)U(t), Yi(t) = c.A(t) + c̄.A(t)∗ + b.Λ(t)

Application of the quantum Ito formula then gives

dYo(t) = jt(cL2 + c̄L2∗)dt + jt(c + bL2∗ + cS)dA + jt(c̄ + b̄L2 + c̄S∗)dA∗ + jt(b + bS∗ + b̄S)dΛ

Next, derive an algorithm for computing (dYo (t))n , n = 1, 2, ...: Let

(dYo (t))n = jt (P [n])dt+jt (Q[n])dA(t)+jt (R[n])dA(t)∗ +jt (S[n])dΛ(t), n = 1, 2, ...

Clearly,

P [1] = cL2 + c̄L∗2 , Q[1] = c + bL∗2 + cS, R[1] = c̄ + b̄L2 + c̄S ∗ , S[1] = b + bS ∗ + b̄S

Thus, the equation


(dYo (t))n+1 = dYo (t).(dYo (t))n

gives us

(jt (Q[1])dA + jt (S[1])dΛ).(jt (R[n])dA∗ + jt (S[n])dΛ) =

jt (P [n + 1])dt + jt (Q[n + 1])dA + jt (R[n + 1])dA∗ + jt (S[n + 1])dΛ


and therefore, on using the quantum Ito formula and the fact that jt is an algebra
homomorphism, we get

P[n+1] = Q[1]R[n], Q[n+1] = Q[1]S[n], R[n+1] = S[1]R[n], S[n+1] = S[1]S[n], n ≥ 1

Solving these recursions, keeping track of the operator ordering, gives us

S[n] = S[1]^n, n ≥ 1,
Q[n] = Q[1].S[1]^(n−1), n ≥ 1,
R[n] = S[1]^(n−1).R[1], n ≥ 1,
P[n] = Q[1].S[1]^(n−2).R[1], n ≥ 2
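The recursion and its closed-form solution can be checked with noncommuting stand-ins: below, random matrices play the roles of the system operators Q[1], R[1], S[1] (an illustrative sketch, not the operator algebra itself), with matrix products mirroring the operator ordering.

```python
import numpy as np

rng = np.random.default_rng(1)
Q1, R1, S1 = (rng.standard_normal((3, 3)) for _ in range(3))

# build P, Q, R, S from the recursion coming from the quantum Ito table
Q = {1: Q1}; R = {1: R1}; S = {1: S1}; P = {}
for n in range(1, 6):
    P[n + 1] = Q[1] @ R[n]
    Q[n + 1] = Q[1] @ S[n]
    R[n + 1] = S[1] @ R[n]
    S[n + 1] = S[1] @ S[n]

mp = np.linalg.matrix_power   # for comparing against the closed forms
```

The closed forms S[n] = S1ⁿ, Q[n] = Q1·S1ⁿ⁻¹, R[n] = S1ⁿ⁻¹·R1 and P[n] = Q1·S1ⁿ⁻²·R1 then reproduce the recursion exactly.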
Remarks: The cavity em fields can be expanded in terms of modes:

Ez(t, r) = Σ_n cn(t)ψn(x, y, z),  Hz(t, r) = Σ_n dn(t)χn(x, y, z)

where

cn(t) = Re(cn(0).exp(jω(n)t)), dn(t) = Re(dn(0).exp(jω(n)t))
The basis functions ψn (r) form an orthonormal family and likewise χn (r). The
transverse components of the em field can be computed in terms of the longitu-
dinal components by standard formulae:

E⊥(t, r) = Re Σ_{mnp} cmnp(0)exp(jω(mnp)t)∂z∇⊥ψmnp(r)/h(m, n)²

+ Re Σ_{mnp} (−jω(mnp)/h(m, n)²)∇⊥ × χmnp(r)dmnp(0)exp(jω(mnp)t)

and likewise for H⊥(t, r). It follows that the total energy of the em field within
the cavity resonator can be expressed as

U(t) = ∫₀ᵃ∫₀ᵇ∫₀ᵈ [(ε/2)|E(t, r)|² + (μ/2)|H(t, r)|²]d³r

and its time average is the same as its instantaneous value in view of the
orthogonality of the modes, ie the energy within the guide is a constant:

U(t) = <U> = lim_{T→∞} T⁻¹∫₀ᵀ U(t)dt = Σ_{mnp} [α(mnp)|cmnp(0)|² + β(mnp)|dmnp(0)|²]

= Σ_{mnp} [α(mnp)|cmnp(t)|² + β(mnp)|dmnp(t)|²]

In the case of a cavity with arbitrary curvilinear cross-section, the form of the
energy is the same:

U(t) = <U> = Σ_n [α(n)|cn(t)|² + β(n)|dn(t)|²]

It should be noted that

cn (t) = cn (0).exp(−iα(n)t), dn (t) = dn (0).exp(−iβ(n)t)
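The constancy of the energy under the phase evolution cn(t) = cn(0)exp(−iα(n)t) can be illustrated numerically. Below, complex exponentials on [0, 2π] stand in for the orthonormal mode functions, and all coefficients and frequencies are illustrative:

```python
import numpy as np

N = 4096
x = np.linspace(0.0, 2*np.pi, N, endpoint=False)
dx = x[1] - x[0]
modes = [np.exp(1j*n*x) / np.sqrt(2*np.pi) for n in (1, 2, 3)]  # orthonormal
c0 = np.array([1.0 + 0.5j, -0.3j, 0.8])
alpha = np.array([1.0, 2.0, 3.0])

def energy(t):
    c = c0 * np.exp(-1j * alpha * t)          # c_n(t) = c_n(0) e^{-i alpha_n t}
    field = sum(cn * m for cn, m in zip(c, modes))
    return np.sum(np.abs(field)**2) * dx      # = sum_n |c_n(t)|^2 by orthonormality
```

The integrated field energy equals Σ_n |cn|², and since each |cn(t)|² is constant, U(t) is time independent.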

Note that taking the cavity quantum Hamiltonian H = U , we get on using the
commutation relations

[cn , c∗m ] = δnm , [dn , d∗m ] = δnm ,

[cn , cm ] = [dn , dm ] = [c∗n , c∗m ] = [d∗n , d∗m ] = [cn , d∗m ] = 0


Then, the Heisenberg equations of motion are

cn′(t) = i[H, cn(t)] = −iα(n)cn(t), dn′(t) = i[H, dn(t)] = −iβ(n)dn(t)

It should be noted that in a cavity of arbitrary cross sectional shape, the char-
acteristic frequencies of oscillation for the TE and TM modes are different
owing to the different kinds of boundary conditions involved. Specifically, in
the TM case, the boundary condition is of the Dirichlet type, which follows
from the condition that Ez vanish on the boundary, while in the TE case,
the boundary condition is of the Neumann type, which follows from the condi-
tion that the normal component of the magnetic field, or equivalently ∂Hz/∂n̂,
vanishes on the boundary. Now suppose that the cavity is placed within a
noisy bath described by the annihilation, creation and conservation processes
An (t), An (t)∗ , Λn (t), n = 1, 2, .... The magnetic vector potential of the em field
within the guide has the form

Asys(t, r) = 2Re Σ_n [cn(t)ψsys,n(r) + dn(t)χsys,n(r)]

= Σ_n [cn(t)ψsys,n(r) + cn(t)∗ψsys,n(r)∗ + dn(t)χsys,n(r) + dn(t)∗χsys,n(r)∗]

and likewise, that of the bath has the form



Abath(t, r) = Σ_n [An(t)ψbath,n(r) + An(t)∗ψbath,n(r)∗]

The photon number operator of the bath is given by

dΛn (t) = dAn (t)∗ .dAn (t)/dt



and hence if there are photocells in the system whose noise current is propor-
tional to the number of bath photons, then this noise current will in turn gener-
ate a noisy em field within the cavity and bath whose corresponding magnetic
vector potential has the form

Aph(t, r) = Σ_n Λn(t)ψph,n(r)

The total electric field within the system is the sum of the system and bath
components:
E(t, r) = Esys (t, r) + Ebath (t, r) =
−∂Asys (t, r)/∂t − ∂Abath (t, r)/∂t − ∂Aph (t, r)/∂t
and likewise for the magnetic field:

B(t, r) = Bsys (t, r)+Bbath (t, r) = ∇×Asys (t, r)+∇×Abath (t, r)+∇×Aph (t, r)

The total em field energy of the system and bath is then given by

Utot = Usys + Ubath (t) + Uint (t)

It is the component H(t) = Usys + Uint (t) which affects the system dynamics
and this concerns us here. It has the form

H(t) = Σ_n [α(n)|cn(t)|² + β(n)|dn(t)|²]

+ Σ_{n,m} (L1(n, m)cn(t) + L2(n, m)cn(t)∗ + L3(n, m)dn(t) + L4(n, m)dn(t)∗)Am(t)

+ Σ_{n,m} (L5(n, m)cn(t) + L6(n, m)cn(t)∗ + L7(n, m)dn(t) + L8(n, m)dn(t)∗)Am(t)∗

+ Σ_{n,m} (L9(n, m)cn(t) + L10(n, m)cn(t)∗ + L11(n, m)dn(t) + L12(n, m)dn(t)∗)Λm(t)

[10] Examination problems in electromagnetics


[1] [a] Write down the expressions for r̂, θ̂ and φ̂ for the spherical-polar co-
ordinate system in terms of x̂, ŷ, ẑ and x, y, z.
or

[b] Write down the expression for the vector field

F(r) = Fx (x, y, z)x̂ + Fy (x, y, z)ŷ + Fz (x, y, z)ẑ

in the cylindrical coordinate system.

[2] Calculate the flux of the vector field

F(r) = Fr (r, θ)r̂ + Fθ (r, θ)θ̂



out of the surface of sphere of radius R centred at the origin both directly and
using Gauss’ divergence theorem. Prove that these two expressions agree. You
may use the following formula for the divergence in spherical polar coordinates:

divF = (1/r²)∂(r²Fr)/∂r + (1/(r.sin(θ)))∂(sin(θ)Fθ)/∂θ + (1/(r.sin(θ)))∂Fφ/∂φ

You may also use the expression dS = R² sin(θ)dθdφ for the surface element
on the spherical surface.
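For a concrete instance of this problem, take the illustrative choice F = r²r̂ (so Fθ = 0); a sympy sketch comparing the direct flux through the sphere r = R with the volume integral of the divergence:

```python
import sympy as sp

r, th, ph, R = sp.symbols('r theta phi R', positive=True)
Fr = r**2

# direct flux: surface integral of F_r over dS = R^2 sin(theta) dtheta dphi
flux = sp.integrate(Fr.subs(r, R) * R**2 * sp.sin(th),
                    (th, 0, sp.pi), (ph, 0, 2*sp.pi))

# div F = (1/r^2) d(r^2 F_r)/dr when F_theta = F_phi = 0
divF = sp.diff(r**2 * Fr, r) / r**2
vol = sp.integrate(divF * r**2 * sp.sin(th),
                   (r, 0, R), (th, 0, sp.pi), (ph, 0, 2*sp.pi))
```

Both computations give 4πR⁴, as Gauss' divergence theorem requires.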

[3] Two concentric rings of radii a < b exist in the plane with centre at the
origin. If the potential on the inner ring is V1 (φ) and that on the outer ring
is V2 (φ) with 0 ≤ φ < 2π, then by solving Laplace’s equation in the annulus
a < ρ < b and applying the boundary conditions, calculate the potential in this
annulus. Express your solution in the form:
V(ρ, φ) = ∫₀²π [K1(ρ, φ, φ′)V1(φ′) + K2(ρ, φ, φ′)V2(φ′)]dφ′

The kernels K1 , K2 may be expressed as infinite series.
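A minimal numerical sketch of the annulus solution for the illustrative boundary data V1(φ) = cos(φ), V2(φ) = 0, where only the n = 1 harmonic of the general series V = a0 + b0 ln ρ + Σ(an ρⁿ + bn ρ⁻ⁿ)(...) survives:

```python
import numpy as np

a, b = 1.0, 2.0                            # illustrative ring radii
M = np.array([[a, 1.0/a],
              [b, 1.0/b]])
A, B = np.linalg.solve(M, [1.0, 0.0])      # A a + B/a = 1, A b + B/b = 0

def V(rho, phi):
    # the surviving n = 1 harmonic of the Laplace solution in the annulus
    return (A*rho + B/rho) * np.cos(phi)

def laplacian(rho, phi, h=1e-4):
    # polar Laplacian V_rr + V_r/rho + V_phiphi/rho^2 by finite differences
    d2r = (V(rho+h, phi) - 2*V(rho, phi) + V(rho-h, phi)) / h**2
    dr = (V(rho+h, phi) - V(rho-h, phi)) / (2*h)
    d2p = (V(rho, phi+h) - 2*V(rho, phi) + V(rho, phi-h)) / h**2
    return d2r + dr/rho + d2p/rho**2
```

The solution matches both boundary values exactly and is harmonic in the interior of the annulus.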

[4] A two dimensional plane in three dimensions has the equation

nx x + ny y + nz z = d, n2x + n2y + n2z = 1, d > 0

[a] Show that the vector û = −ny x̂ + nx ŷ is tangent to the plane.


Assume that there exists a constant surface current density Js û on this plane
and a constant surface charge density σs on the same plane. Also assume that
in the region I : nx x + ny y + nz z − d > 0, the permittivity and permeability are
εI, μI respectively and that in the region II : nx x + ny y + nz z − d < 0, these
are εII, μII. Then,
[b] Calculate the electric and magnetic fields in region II if a constant electric
and magnetic field EI , HI exists in region I.

[5] Write down the general equation of continuity, Gauss' law and Ohm's law
in terms of ρ, J, E, σ, ε. By manipulating these, show that ρ decays with time
as exp(−σt/ε). Using this formula, define and determine the relaxation time.
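A numerical illustration of the decay law with copper-like (illustrative) constants; the relaxation time τ = ε/σ is the time for the charge density to fall to 1/e of its initial value:

```python
import numpy as np

eps, sigma = 8.854e-12, 5.8e7     # illustrative permittivity and conductivity
tau = eps / sigma                 # relaxation time, about 1.5e-19 s

t = np.linspace(0.0, 5*tau, 6)    # so t[1] = tau exactly
rho = np.exp(-sigma * t / eps)    # rho(t)/rho0 = exp(-sigma t / eps)
```

At t = τ the density has decayed to exp(−1) of its initial value, which is the defining property of the relaxation time.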

[11] Tutorial problems in electromagnetics


[1] Calculate the magnetic field produced by an infinitely long cylindrical
wire of radius R, if it carries a current of density Jz(ρ), 0 ≤ ρ ≤ R, along its axis.
The field is to be computed both within the cylinder and outside it. Do this
problem by first using cylindrical symmetry combined with Ampere’s law in
integral form and then by actually solving Poisson's equation

∇2 Az (ρ) = −μ0 Jz (ρ)

combined with appropriate boundary conditions.



[2] Let D denote the cross sectional region in the xy plane of a cylindrical
waveguide with axis along the z-axis. Choose an orthogonal curvilinear coor-
dinate system (q1 , q2 ) in the xy plane and express the transverse components
of the electric and magnetic field in the q1 − q2 system in terms of the first
order partial derivatives of their z-components w.r.t q1 and q2 . Formulate the
Helmholtz equations for Ez , Hz with appropriate boundary conditions in the
q1 − q2 system. Assume now that the boundary of the guide is q1 = constant
and explain how by drawing a triangular grid in the q1 − q2 system and apply-
ing the finite element method combined with the Dirichlet boundary conditions
on Ez and the Neumann boundary conditions on Hz , you can approximately
determine the modal eigenvalues and eigenfunctions for these fields as a matrix
generalized eigenvalue problem.

[3] Consider the problem of calculating the multipole moments of a radiation


field. Let Ylm (r̂) denote the spherical harmonics and consider the following
vector valued functions on the unit sphere

Xlm (r̂) = LYlm (r̂), L = −ir × ∇

Show that
r̂.Xlm = 0
Show that the Helmholtz equation

(∇2 + k 2 )f (r)Ylm (r̂) = 0

leads to an ode for f (r) parametrized by l and denote two of its linearly inde-
pendent solutions by fl (r), gl (r). Consider solving the Maxwell equations

divE = 0, divH = 0, (∇2 + k 2 )E = 0, (∇2 + k 2 )H = 0

Show that these equations imply

(∇2 + k 2 )(r.E) = 0, (∇2 + k 2 )(r.H) = 0

Show that fl (r)Xlm (r̂) satisfies the Helmholtz equation and that this vector is
perpendicular to the radial direction. Thus, we can define a multipole transverse
electric field component as

Elm = fl (r)Xlm (r̂)

and the magnetic field corresponding to this is given by

H̃lm = (−1/jωμ)∇ × Elm

Likewise we can define a transverse multipole magnetic field as

Hlm = fl(r)Xlm(r̂)

and then the corresponding electric field is given by

Ẽlm = (1/jωε)∇ × Hlm

Show then that the general solution for the radiation fields can be expressed as

E = Σ_{lm} [a(l, m)fl(r)Xlm(r̂) + b(l, m)(1/jωε)∇ × (fl(r)Xlm(r̂))]

H = Σ_{lm} [(−1/jωμ)a(l, m)∇ × (fl(r)Xlm(r̂)) + b(l, m)fl(r)Xlm(r̂)]

Using the orthogonality relations and eigenfunction relations for the spherical
harmonics, prove orthogonality relations for the vector spherical harmonics Xlm .
Using these orthogonality relations, explain how if you know r̂.E and r̂.H on the
surface of a sphere of radius R with centre at the origin, then you can compute
the coefficients a(l, m), b(l, m) in this multipole expansion.
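The identity r̂.Xlm = 0 follows from r.(r × ∇f) = 0 for any scalar f; a symbolic check for the case l = 1, m = 0, where in Cartesian form Y10 is proportional to z/r (normalization omitted):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
rvec = sp.Matrix([x, y, z])
Y10 = z / sp.sqrt(x**2 + y**2 + z**2)      # Y_10 up to normalization

grad = sp.Matrix([sp.diff(Y10, v) for v in (x, y, z)])
X10 = -sp.I * rvec.cross(grad)             # X_10 = L Y_10, L = -i r x grad

r_dot_X = sp.simplify(rvec.dot(X10))
```

Here X10 comes out proportional to (y, −x, 0)/r, so it is manifestly perpendicular to the radial direction.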

[4] Explain via the Boltzmann kinetic transport equation, how a plasma
interacts with an electromagnetic field and derive approximate dispersion rela-
tions that describe the propagation of perturbations in the plasma distribution
function as well as those in the electromagnetic field.

[12] Electromagnetic fields


[1] Gravitational waves within a cavity resonator interacting with the noisy
electromagnetic field in the surrounding bath.
The weak gravitational field is described by the metric tensor

gμν (x) = ημν + hμν (x)

where hμν is of the first order of smallness. The bath electromagnetic field is
described by an electromagnetic four potential of the form

Aμ(t, r) = ∫ [a(k, s)ψμ(k, s, t, r) + a(k, s)∗ψμ(k, s, t, r)∗]d³k

+ Σ_m [Am(t)φμ(m, r) + Am(t)∗φμ(m, r)∗ + Λm(t)χm(r)] --- (1)

where a(k, s), a(k, s)∗ are the usual annihilation and creation operator fields in
momentum space within the bath satisfying the CCR

[a(k, s), a(k′, s′)∗] = δss′ δ³(k − k′)

and Am (t), Am (t)∗ , Λm (t) are the annihilation, creation and conservation noise
processes in the bath satisfying the usual Hudson-Parthasarathy quantum Ito
formula:

dAm(t)dAn(t)∗ = δmn dt, dΛm(t)dΛn(t) = δmn dΛm(t),

dAm(t)dΛn(t) = δmn dAm(t), dΛm(t)dAn(t)∗ = δmn dAn(t)∗

The quantum field operators a(k, s), a(k, s)∗ are assumed to commute with the
quantum noise operators Am (t), Am (t)∗ , Λm (t). The bath is assumed to be in
the following pseudo-coherent state:

|ψ(v), φ(u)>

defined so that the above quantum field and quantum noise operators have the
following action:

a(k, s)|ψ(v), φ(u)> = v(k, s)|ψ(v), φ(u)>,

<ψ(v′), φ(u′)|a(k, s)|ψ(v), φ(u)> = v(k, s)<ψ(v′), φ(u′)|ψ(v), φ(u)> =

v(k, s)exp(<v′|v>)exp(<u′|u>)

It should be noted that

<v′|v> = ∫ v′(k, s)∗v(k, s)d³k, <u′|u> = ∫ u′m(t)∗um(t)dt

with the summation over the repeated index s = 1, 2 being implied, and likewise
over the repeated index m. We have therefore
<ψ(v′), φ(u′)|a(k, s)∗|ψ(v), φ(u)> = v′(k, s)∗.exp(<v′|v>)exp(<u′|u>)

and further,

Am(t)|ψ(v), φ(u)> = (∫₀ᵗ um(t′)dt′)|ψ(v), φ(u)>,

so that

<ψ(v′), φ(u′)|Am(t)|ψ(v), φ(u)> = (∫₀ᵗ um(t′)dt′).exp(<v′|v>).exp(<u′|u>)

<ψ(v′), φ(u′)|Am(t)∗|ψ(v), φ(u)> = (∫₀ᵗ u′m(t′)∗dt′).exp(<v′|v>).exp(<u′|u>)

and finally,

<ψ(v′), φ(u′)|Λm(t)|ψ(v), φ(u)> = (∫₀ᵗ u′m(t′)∗um(t′)dt′)exp(<v′|v>).exp(<u′|u>)

In this last expression, there is no summation over the index m.


Construction of the coherent state |ψ(v)> for the bath quantum field: Define

|ψ(v)> = C(v) Σ_{n≥0} (n!)⁻¹ ∫ v(k1, s1)...v(kn, sn)a(k1, s1)∗...a(kn, sn)∗ d³k1...d³kn |0>

where the summation over the repeated spin indices s1, ..., sn is implied and
|0> is the vacuum state, while C(v) is a normalization constant. We have for
example

a(k, s) ∫ v(k1, s1)v(k2, s2)a(k1, s1)∗a(k2, s2)∗ d³k1 d³k2 |0>


= ∫ v(k1, s1)v(k2, s2)([a(k, s), a(k1, s1)∗] + a(k1, s1)∗a(k, s))a(k2, s2)∗ d³k1 d³k2 |0>

= ∫ v(k1, s1)v(k2, s2)δss1 δ³(k − k1)a(k2, s2)∗ d³k1 d³k2 |0>

+ ∫ v(k1, s1)v(k2, s2)a(k1, s1)∗([a(k, s), a(k2, s2)∗] + a(k2, s2)∗a(k, s)) d³k1 d³k2 |0>

= v(k, s) ∫ v(k2, s2)a(k2, s2)∗ d³k2 |0> + v(k, s) ∫ v(k1, s1)a(k1, s1)∗ d³k1 |0>

= 2v(k, s) ∫ v(k1, s1)a(k1, s1)∗ d³k1 |0>

since

a(k, s)|0> = 0

This result is easily extended to give

a(k, s) ∫ v(k1, s1)...v(kn, sn)a(k1, s1)∗...a(kn, sn)∗ d³k1...d³kn |0>

= n.v(k, s) ∫ v(k1, s1)...v(kn−1, sn−1)a(k1, s1)∗...a(kn−1, sn−1)∗ d³k1...d³kn−1 |0>, n ≥ 1

From this result, we easily get that

a(k, s)|ψ(v)> = v(k, s)|ψ(v)>
The Lagrangian of the free gravitational field has the following form upto
quadratic orders in the metric perturbation, on assuming the coordinate condi-
tion hμ0 = 0:

Lg = ∫ [C1(abrcms)hab,r(x)hcm,s(x) + C2(ab0cm0)hab,0(x)hcm,0(x)
+ C3(abrcm0)hab,r(x)hcm,0(x) + C4(abcm)hab(x)hcm(x)]d³x

The Lagrangian density of the gravitational field interacting with the bath elec-
tromagnetic field has the following form

Lint = ∫ C5(abμναβ)hab(x)Fμν(x)Fαβ(x)d³x

where
Fμν (x) = Aν,μ (x) − Aμ,ν (x)

and Aμ(x) given by (1) above is a superposition of annihilation and creation
operator fields and the fundamental quantum noise creation, annihilation and
conservation processes of the Hudson-Parthasarathy quantum stochastic calcu-
lus. So we can write

Lint = ∫ hab(x)(D1(ab, k, s, k′, s′, x)a(k, s)a(k′, s′) + D1(ab, k, s, k′, s′, x)∗a(k, s)∗a(k′, s′)∗

+ D2(ab, k, s, k′, s′, x)a(k, s)∗a(k′, s′)) d³k d³k′ d³x


where we have neglected the conservation process term. Note that Lint is de-
rived from the electromagnetic Lagrangian in a background gravitational field
in general relativity:
 
Lem = (−1/4) ∫ Fμν(x)F^μν(x)√(−g(x)) d³x

where
F μν = g μα g νβ Fαβ
by replacing
g μν ≈ ημν − hμν
where
hμν = ημα ηνβ hαβ

[13] A problem in electromagnetics


Let J(t, r) be a spatio-temporal stationary current density field. Its correla-
tion is given by

E[Ja(t, r).Jb(t′, r′)] = Rab(t − t′, r − r′)

Its temporal Fourier transform is

Ĵa(ω, r) = ∫_R Ja(t, r).exp(−jωt)dt

and the far field magnetic vector potential produced by this density is given by

Aa(ω, r) = K.(exp(−jkr)/r) ∫ Ĵa(ω, r′)exp(jkr̂.r′)d³r′

We get

E(Ĵa(ω, r)Ĵb(ω′, r′)∗) = ∫ Rab(t − t′, r − r′)exp(−j(ωt − ω′t′))dtdt′

= 2πSab(ω, r − r′)δ(ω − ω′)


where

Sab(ω, r) = ∫_R Rab(t, r).exp(−jωt)dt

Then, in the far field zone,

E[Aa(ω1, r1)Ab(ω2, r2)∗] =

(K²/r1r2)exp(−jk1(r1 − r2)) ∫ Sab(ω1, r1′ − r2′)exp(jk1(r̂1.r1′ − r̂2.r2′))d³r1′d³r2′.δ(ω1 − ω2)

Now suppose that an atom with a single electron is excited by this random
electromagnetic field.
The interaction Hamiltonian in the Dirac picture is given by

V(t) = e(α, A(t, r)) − eΦ(t, r) = −eα^μ Aμ(t, r)

The transition probability in time [0, T] from an initial state u(r) to a final state
v(r) is given upto O(e²) in perturbation theory by

PT(u → v) = e²E[|∫₀ᵀ <v|α^μ Aμ(t, r)|u> dt|²]

= e² ∫_{[0,T]²} <v(r)|α^μ|u(r)><u(r′)|α^ν|v(r′)> E[Aμ(t, r)Aν(t′, r′)]dtdt′d³rd³r′

where

Aμ(t, r) = K ∫ Jμ(x′)δ((x − x′)²)d⁴x′

where x = (t, r) and x² = t² − r² with c = 1. Thus, the above formula for the
transition probability can be expressed in four dimensional notation as

PT(u → v) =

K²e² ∫ <v(r1)|α^μ|u(r1)><u(r2)|α^ν|v(r2)> E(Jμ(x1′)Jν(x2′))δ((x1 − x1′)²).δ((x2 − x2′)²)d⁴x1d⁴x2d⁴x1′d⁴x2′

= K²e² ∫ <v(r1)|α^μ|u(r1)><u(r2)|α^ν|v(r2)> R^J_μν(x1′ − x2′)δ((x1 − x1′)²).δ((x2 − x2′)²)d⁴x1d⁴x2d⁴x1′d⁴x2′
[14] Problems on physics in a curved background metric

[1] Write down the Maxwell equations in the curved metric

dτ² = dt² − S(t)²f(r)dr² − S(t)²r²(dθ² + sin²(θ)dφ²)

and obtain approximate solutions assuming |f(r) − 1| is small.


hint: Write the metric as

dτ 2 = dt2 − S(t)2 (dx2 + dy 2 + dz 2 ) − S 2 (t)h(r)dr2

where
h(r) = f (r) − 1

We note that

dr = (x.dx + y.dy + z.dz)/r

so that

dr² = r⁻²(x²dx² + y²dy² + z²dz² + 2xy.dxdy + 2yz.dydz + 2zx.dzdx)

Thus, writing

x¹ = x, x² = y, x³ = z

we can write

dτ² = dt² − S²(t)(δab + ε.hab(r))dx^a dx^b

where ε is a small perturbation parameter. The Maxwell equations can be
expressed as

Fμν = Aν,μ − Aμ,ν,

(F^μν √(−g)),ν = 0

We have

g00 = 1, g0a = 0, gab = −S²(t)(δab + ε.hab(r))

Thus,

g^00 = 1, g^0a = 0, g^ab = −S(t)⁻²(δab − ε.hab) + O(ε²)
The Maxwell equations upto O(ε) terms read as follows:

Fμν = Aν,μ − Aμ,ν,

F^μν = g^μα g^νβ Fαβ

F^0a = g^ab F0b = −S⁻²F0a + ε.S⁻²hab F0b

F^ab = g^ak g^bm Fkm = S⁻⁴(δak − ε.hak)(δbm − ε.hbm)Fkm

= S⁻⁴(Fab − ε.(hak Fkb + hbk Fak))

g = −S⁶(1 + ε.h), h = haa

√(−g) = S³(1 + ε.h/2)

√(−g) F^0a = S³(1 + ε.h/2)(−S⁻²)(F0a − ε.hab F0b)

= −S(F0a + ε.(h.F0a/2 − hab F0b))
[15] A problem in electrodynamics

A perfectly conducting cylindrical surface of radius R and length L is placed
with its axis along the z axis and extending from z = −L/2 upto z = L/2.
An electromagnetic wave with electric field Ei(ω, ρ, φ, z) is incident upon this
cylinder. Let

Js(φ, z) = Jsφ(φ, z)φ̂ + Jsz(φ, z)ẑ

be the induced surface current density on the cylindrical surface. Derive integral
equations satisfied by it at frequency ω.

Solution: The magnetic vector potential produced by the surface current
density is given by

A(ω, ρ, φ, z) = ∫_{0≤φ′<2π, |z′|≤L/2} (Jsφ(φ′, z′)(−sin(φ′)x̂ + cos(φ′)ŷ) + Jsz(φ′, z′)ẑ) G(ρ, φ, z|φ′, z′) R dφ′ dz′

where

G(ρ, φ, z|φ′, z′) = (μ/4π).exp(−jK√(ρ² + R² − 2ρR.cos(φ − φ′) + (z − z′)²)) / √(ρ² + R² − 2ρR.cos(φ − φ′) + (z − z′)²)

Note that

G(R, φ, z|φ′, z′) = (μ/4π).exp(−jK√(4R²sin²((φ − φ′)/2) + (z − z′)²)) / √(4R²sin²((φ − φ′)/2) + (z − z′)²)

The corresponding electric field in space produced by this surface current density
is given by

Es(ω, ρ, φ, z) = (jωε)⁻¹(∇(divA) + K²A)

Since the tangential components Ez, Eφ of the total electric field E = Ei + Es
must vanish when ρ = R, |z| ≤ L/2, we get two integral equations

Eiz(ω, R, φ, z) + Esz(ω, R, φ, z) = 0,

Eiφ(ω, R, φ, z) + Esφ(ω, R, φ, z) = 0

for the two components Jsφ and Jsz of the surface current density. These two
integral equations must be solved by numerical techniques.
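The kernel of these integral equations is the surface Green function G(R, φ, z|φ′, z′) above; a small numerical sketch (with illustrative values of μ, K and R) checking its reciprocity and its dependence only on φ − φ′ and z − z′:

```python
import numpy as np

mu, K, R = 4e-7*np.pi, 2.0, 0.5   # illustrative permeability, wavenumber, radius

def G_surface(phi, z, phip, zp):
    # G(R, phi, z | phi', z') from the formula above
    d = np.sqrt(4*R**2*np.sin((phi - phip)/2)**2 + (z - zp)**2)
    return (mu/(4*np.pi)) * np.exp(-1j*K*d) / d

g1 = G_surface(0.3, 0.1, 1.2, -0.4)
g2 = G_surface(1.2, -0.4, 0.3, 0.1)                          # reciprocity
g3 = G_surface(0.3 + 0.7, 0.1 + 0.2, 1.2 + 0.7, -0.4 + 0.2)  # translation invariance
```

In a numerical solution (e.g. the method of moments) this kernel is sampled on a grid in (φ′, z′) and the two integral equations become a linear system for Jsφ, Jsz.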
[16] Problems in electrodynamics related to general relativity

[1] Evaluate the components R00, R11, R22, R33 of the Ricci tensor for the
following spherically symmetric metric:

dτ² = A(r)dt² − B(r)dr² − r²(dθ² + sin²(θ)dφ²)

Set these equal to zero and thereby obtain the Schwarzschild solution

A(r) = 1 − 2m/r, B(r) = (1 − 2m/r)⁻¹

where m is a constant.
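The Ricci-flatness of the Schwarzschild choice can be verified by brute force with sympy, a direct sketch of the computation the problem asks for:

```python
import sympy as sp

# metric dtau^2 = A dt^2 - B dr^2 - r^2 dth^2 - r^2 sin^2(th) dphi^2,
# with the Schwarzschild choice A = 1 - 2m/r, B = 1/A
t, r, th, ph, m = sp.symbols('t r theta phi m', positive=True)
A = 1 - 2*m/r
g = sp.diag(A, -1/A, -r**2, -r**2*sp.sin(th)**2)
ginv = g.inv()
X = [t, r, th, ph]

def Gamma(a, b, c):
    """Christoffel symbol Gamma^a_{bc} of the metric g."""
    return sp.Rational(1, 2)*sum(
        ginv[a, d]*(sp.diff(g[d, b], X[c]) + sp.diff(g[d, c], X[b])
                    - sp.diff(g[b, c], X[d]))
        for d in range(4))

def Ricci(b, c):
    """R_{bc} = Gamma^a_{bc,a} - Gamma^a_{ba,c}
              + Gamma^a_{ad} Gamma^d_{bc} - Gamma^a_{cd} Gamma^d_{ba}."""
    val = sum(sp.diff(Gamma(a, b, c), X[a]) - sp.diff(Gamma(a, b, a), X[c])
              for a in range(4))
    val += sum(Gamma(a, a, d)*Gamma(d, b, c) - Gamma(a, c, d)*Gamma(d, b, a)
               for a in range(4) for d in range(4))
    return sp.simplify(val)
```

For this diagonal static metric the off-diagonal Ricci components vanish identically; the diagonal components vanish only for the Schwarzschild A, B.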
[2] Study the dynamics of small metric perturbations around the Schwarzschild
solution by linearizing the Einstein field equations:

δRμν = 0

hint:

δRμν = (δΓ^α_μα):ν − (δΓ^α_μν):α

where the covariant derivatives are carried out w.r.t the unperturbed metric.
Further,

(δΓ^α_μα):ν = (δΓ^α_μα),ν − Γ^β_μν δΓ^α_βα

Note that although Γ^α_μν does not transform as a tensor, δΓ^α_μν, which is the
difference between the connections corresponding to two metrics with the same
system of coordinates, does transform as a tensor. Give a reason for this.

[17] Gravitational radiation

[a] Calculate the quadratic part of the Ricci tensor:

R_μν^(2) = (Γ^α_μα,ν)^(2) − (Γ^α_μν,α)^(2) − (Γ^α_μν)^(1)(Γ^β_αβ)^(1) + (Γ^α_μβ)^(1)(Γ^β_να)^(1)

where

gμν = ημν + hμν(x)

We have

(Γ^α_μα)^(2) = −(1/2)h^αβ(hβμ,α + hβα,μ − hμα,β)

Thus,

(Γ^α_μα,ν)^(2) = (−1/2)[h^αβ(hβμ,α + hβα,μ − hμα,β)],ν

Likewise,

(Γ^α_μν,α)^(2) = (−1/2)[h^αβ(hβμ,ν + hβν,μ − hμν,β)],α

(Γ^α_μν)^(1) = (1/2)η^αβ(hβμ,ν + hβν,μ − hμν,β) = (1/2)[h^α_μ,ν + h^α_ν,μ − hμν^,α]

(Γ^β_αβ)^(1) = (1/2)(h^β_α,β + h,α − h^β_α,β) = h,α/2

Substituting all these into the expression for R_μν^(2) gives us an expression of
the form

R_μν^(2) = C1(μνραβσ)hμν,ρ hαβ,σ + C2(μνραβσ)hμν hαβ,ρσ --- (1)

Note that

R_μν^(1) = (Γ^α_μα,ν)^(1) − (Γ^α_μν,α)^(1) = (1/2)[h,μν − h^α_μ,αν − h^α_ν,αμ + □hμν]

The quadratic part of the Einstein tensor is given by

G_μν^(2) = (Rμν − (1/2)Rgμν)^(2)

= R_μν^(2) − (1/2)R^(2)ημν − (1/2)R^(1)hμν

Now,

R^(2) = (g^μν Rμν)^(2) = η^μν R_μν^(2) − h^μν R_μν^(1)

Substituting these, it is easy to see that G_μν^(2) has the same form as R_μν^(2)
as given in equation (1) but with a different set of constants C1(.), C2(.). Then
the quadratic component in the contravariant form of Gμν is given by

G^μν(2) = [(η^μα − h^μα)(η^νβ − h^νβ)Gαβ]^(2)

= η^μα η^νβ G_αβ^(2) − (η^μα h^νβ + η^νβ h^μα)G_αβ^(1)

It is easy to see once again that this has the same form as the rhs of (1) but
with different constants C1(.), C2(.).

The solution for hμν(t, r) is a retarded potential of the form

hμν(t, r) = ∫ Sμν(t − |r − r′|, r′)d³r′/|r − r′|

where Sμν(x) is proportional to Tμν − (1/2)Tημν, with Tμν the energy-momentum
tensor of the matter plus electromagnetic field. We take

Tμν = ρvμvν − (1/4)Fαβ F^αβ ημν + Fμα Fνβ η^αβ

The pressure terms in the energy-momentum tensor of the matter field are
taken care of in a generalized form in the energy-momentum tensor of the elec-
tromagnetic field Fμν. Note that the energy-momentum tensor of the em field
is traceless. Thus,

T = ρ

In the frequency domain, we have

hμν(ω, r) = ∫ Sμν(ω, r′).exp(−jω|r − r′|)d³r′/|r − r′|

which in the far field zone becomes

hμν(ω, r) = (exp(−jωr)/r) ∫ Sμν(ω, r′)exp(jωr̂.r′)d³r′

= r⁻¹exp(−jωr).Pμν(ω, r̂)

Note that

Pμν(ω, r̂) = ∫ Sμν(t, r′)exp(−j(ω(t − r̂.r′)))dtd³r′

[18] Post-Newtonian equations of hydrodynamics

The expansion is in powers of the velocity v, or equivalently in powers of 1/c
(ie v/c). Note that v is proportional to √M, so equivalently, the expansion can
be considered in powers of √M.

g00 = 1 + g00(2) + g00(4) + ..., g0r = g0r(3) + g0r(5) + ...,
grs = −δrs + grs(2) + grs(4) + ...,

g^00 = 1 + g^00(2) + g^00(4) + ..., g^0r = g^0r(3) + g^0r(5) + ...,
g^rs = −δrs + g^rs(2) + g^rs(4) + ...

ρ = ρ(2) + ρ(4) + ..., p = p(4) + p(6) + ...

T00 = T00(2) + T00(4) + ..., T0r = T0r(3) + T0r(5) + ...,
Trs = Trs(2) + Trs(4) + ...

T^00 = T^00(2) + T^00(4) + ..., T^0r = T^0r(3) + T^0r(5) + ...,
T^rs = T^rs(2) + T^rs(4) + ...

R00 = R00(2) + R00(4) + ...,
R0r = R0r(3) + R0r(5) + ...,
Rrs = Rrs(2) + Rrs(4) + ...

Upto fourth order, we have (note that a time derivative increases the order by
unity while a spatial derivative does not change the order):

Γ^r_00 = g^r0 Γ_000 + g^rs Γ_s00 = −Γ_r00 = (−1/2)(2g_r0,0(3) − g_00,r(2)) = Γ^r_00(2) + Γ^r_00(4)

where

Γ^r_00(2) = (1/2)g_00,r(2), Γ^r_00(4) = −g_r0,0(3)

Γ^r_sm = g^r0 Γ_0sm + g^rk Γ_ksm

= (1/2)g^rk(2)(g_ks,m(2) + g_km,s(2) − g_sm,k(2)) − (1/2)(g_rs,m(2) + g_rm,s(2) − g_sm,r(2))

= Γ^r_sm(2) + Γ^r_sm(4)

where

Γ^r_sm(2) = −(1/2)(g_rs,m(2) + g_rm,s(2) − g_sm,r(2)),

Γ^r_sm(4) = (1/2)g^rk(2)(g_ks,m(2) + g_km,s(2) − g_sm,k(2))

Γ^r_0m = g^r0 Γ_00m + g^rs Γ_s0m = −Γ_r0m(3) = (−1/2)(g_r0,m(3) + g_rm,0(2) − g_m0,r(3))

Γ^0_00 = Γ^0_00(3) = (1/2)g_00,0(3)

Γ^0_sm = Γ^0_sm(3) = (1/2)(g_0s,m(3) + g_0m,s(3) − g_sm,0(2))

Now we compute the perturbation terms of the Ricci tensor components upto
fourth order.

[19] Scattering of a gravitational wave by an electromagnetic field

This problem has applications to the evolution of gravitational perturbations
in the early stages of the evolution of our universe, namely during the radiation
dominated era. In the present era, we have a small remnant of this cosmic
microwave background radiation (CMBR) which affects the expansion of our
universe in the form of fluctuations in the homogeneous and isotropic metric
tensor. We already know that

R_μν^(1) = (Γ^α_μα,ν)^(1) − (Γ^α_μν,α)^(1) = (1/2)[h,μν − h^α_μ,αν − h^α_ν,αμ + □hμν]

Thus,

R^(1) = η^μν R_μν^(1) = □h − h^αβ_,αβ

And hence the linearized Einstein field tensor is

G_μν^(1) = R_μν^(1) − (1/2)R^(1)ημν

= (1/2)[h,μν − h^α_μ,αν − h^α_ν,αμ + □hμν − ημν □h + ημν h^αβ_,αβ]

It is easily seen that

G_μν^(1),ν = 0

Further, if we choose our coordinate system so that

h^ν_μ,ν − (1/2)h,μ = 0

then we get

G_μν^(1) = (1/2)□(hμν − (1/2)hημν)

So in this coordinate system, in the radiation dominated era, we have the
Einstein-Maxwell equations

□(hμν − (1/2)hημν) = −16πG Sμν

[20] Expressing the basic equations of general relativity in three


dimensional notation

dτ 2 = gμν dxμ dxν = g00 dt2 + 2g0r dxr dt + grs dxr sxs

The light ray takes a time dt = dx0 to travel from xr to xr + dxr where dx0
satisfies
dτ 2 = 0
Thus, 
dx0 = dt = hr dxr + γrs dxr dxs
where
2 1/2
hr = −g0r /g00 , γrs = [(g0r g0s − g00 grs )/g00 ]
Likewise the light ray starting from xr + dxr and travelling to xr takes a time

(dx0 ) = −hr dxr + γrs dxr dxs
Advanced Probability and Statistics: Applications to Physics and Engineering 199

Hence
[dx0 − (dx0 ) ]/2 = hr dxr
and this correction must be taken into account while performing synchronization
of a moving particle. Specifically, the corrected proper time interval should be
taken as

g00 (dx0 − hr dxr )
and hence the three velocity of a moving particle must be defined as
√ √
wr = dxr /[ g00 (dx0 − hr dxr )] = v r /[ g00 (1 − hr v r )]

where
v r = dxr /dx0
We find that
dτ² = g_{00}dt² + 2g_{0r}dt·dx^r + g_{rs}dx^r dx^s
= g_{00}dt² + 2g_{0r}dt·dx^r + g_{00}((g_{rs}g_{00} − g_{0r}g_{0s})/g_{00}²)dx^r dx^s + g_{00}(g_{0r}dx^r/g_{00})²
= g_{00}dt² − 2g_{00}h_r dt·dx^r − g_{00}dl² + g_{00}(h_r dx^r)²
= g_{00}(dt − h_r dx^r)² − g_{00}dl²
where
dl² = γ_{rs}dx^r dx^s
is the three-length element. We define
v² = γ_{rs}v^r v^s, w² = g_{00}γ_{rs}w^r w^s = v²/(1 − h_r v^r)²
and then get
dτ² = g_{00}dt²(1 − h_r v^r)² − g_{00}v²dt²
or equivalently,
dτ = √g_{00}·dt·((1 − h_r v^r)² − v²)^{1/2} = √g_{00}·dt·(1 − h_r v^r)(1 − w²)^{1/2}
Thus,
u^r = dx^r/dτ = w^r/(1 − w²)^{1/2}
and the geodesic equations of motion can be expressed as
d/dτ(w^r/(1 − w²)^{1/2}) + Γ^r_{00}(u^0)² + 2Γ^r_{0s}u^0u^s + Γ^r_{sm}u^su^m = 0
or equivalently,
d/dt(w^r/(1 − w²)^{1/2}) + Γ^r_{00} + 2Γ^r_{0s}v^s + Γ^r_{sm}v^sv^m = 0
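The algebraic identities dτ² = g₀₀(dt − h_r dx^r)² − g₀₀dl² and the velocity form of dτ can be checked numerically on sample metric values. The numbers below are hypothetical, chosen only so that g₀₀ > 0 and w² < 1.

```python
import numpy as np

# Hypothetical sample metric, signature (+,-,-,-): g00 > 0, small g0r
g00 = 1.2
g0 = np.array([0.05, -0.03, 0.02])                # g_{0r}
grs = -np.eye(3)                                  # g_{rs}

h = -g0 / g00                                     # h_r = -g_{0r}/g_{00}
gamma = (np.outer(g0, g0) - g00 * grs) / g00**2   # gamma_{rs}

dt = 1.0
v = np.array([0.1, -0.2, 0.15])                   # v^r = dx^r/dt
dx = v * dt

# Line element computed directly ...
dtau2 = g00 * dt**2 + 2 * (g0 @ dx) * dt + dx @ grs @ dx
# ... and in the split form g00 (dt - h.dx)^2 - g00 dl^2
dl2 = dx @ gamma @ dx
assert np.isclose(dtau2, g00 * (dt - h @ dx)**2 - g00 * dl2)

# Velocity form: dtau = sqrt(g00) dt (1 - h.v) sqrt(1 - w^2)
v2 = v @ gamma @ v
w2 = v2 / (1 - h @ v)**2                          # w^2 = g00 gamma_{rs} w^r w^s
assert np.isclose(np.sqrt(dtau2),
                  np.sqrt(g00) * dt * (1 - h @ v) * np.sqrt(1 - w2))
```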

[21] The Dyson-Schwinger equations and their connection with the vacuum polarization tensor and the electron self-energy

□A^μ(x) = eψ(x)*α^μψ(x),
(iγ^μ∂_μ − m)ψ(x) = −iγ^μA_μ(x)ψ(x)

[A^μ(t,r), ∂_0A^ν(t,r′)] = iη^{μν}δ³(r − r′),
{ψ(t,r), ψ(t,r′)*} = γ^0δ³(r − r′)

D′_{μν}(x − y) = <0|T(A_μ(x)A_ν(y))|0> = θ(x^0 − y^0)<0|A_μ(x)A_ν(y)|0> + θ(y^0 − x^0)<0|A_ν(y)A_μ(x)|0>
S′(x − y) = <0|T(ψ(x)ψ(y)*)|0> = θ(x^0 − y^0)<0|ψ(x)ψ(y)*|0> − θ(y^0 − x^0)<0|ψ(y)*ψ(x)|0>
Thus,
□D′_{μν}(x) = δ⁴(x)η_{μν} − e<0|T(ψ(x)*α_μψ(x)A_ν(0))|0>
(iγ^μ∂_μ − m)S′(x) = iδ⁴(x) − iγ^μ<0|A_μ(x)ψ(x)ψ(0)*|0>

Note that D′_{μν}(x) is the exact photon propagator while
D_{μν}(x) = F^{−1}D_{μν}(k), D_{μν}(k) = η_{μν}/(k² + i0)
is the photon propagator for the free photon field, ie, in the absence of the Dirac current. Likewise S′(x) is the exact electron propagator while
S(x) = F^{−1}S(p), S(p) = i(γ.p − m + i0)^{−1}
is the electron propagator for the free electron field, ie, in the absence of the photon field. These approximate propagators are also called the "bare propagators", ie, the propagators in the absence of interactions. More precisely, the bare photon propagator means the photon propagator in the absence of interactions with the electron field, and the bare electron propagator means the electron propagator in the absence of interactions with the photon field. We define the vertex function Γ^μ(p′,p) by
vertex function Γμ (p , p) by


∫S′(p′)Γ^ν(p′,p)S′(p)D′_{μν}(p′ − p)exp(−ip′.x + ip.y − i(p − p′).z)d⁴p·d⁴p′ = <0|T(ψ(x)ψ(y)*A_μ(z))|0>

It follows then that

∫S′(p′)Γ^ν(p′,p)S′(p)D′_{μν}(p′ − p)exp(−i(p′ − p).x)d⁴p·d⁴p′ = <0|T(ψ(x)ψ(x)*A_μ(0))|0>

and hence
<0|T(ψ(x)*α^μψ(x)A_ν(0))|0> = ∫Tr(α^μS′(p′)Γ^ρ(p′,p)S′(p))D′_{νρ}(p − p′)exp(i(p − p′).x)d⁴p·d⁴p′

and taking the Minkowski space 4-D Fourier transform on both sides gives us

∫<0|T(ψ(x)*α^μψ(x)A_ν(0))|0>exp(ik.x)d⁴x = (∫Tr(α^μS′(p + k)Γ^ρ(p + k, p)S′(p))d⁴p)·D′_{νρ}(k)

The equation
□D′_{μν}(x) = δ⁴(x)η_{μν} − e<0|T(ψ(x)*α_μψ(x)A_ν(0))|0>
therefore gives us on taking the Fourier transform,

D′_{μν}(k) = D_{μν}(k) + D_{μρ}(k)Π^{ρσ}(k)D′_{σν}(k)

where Π^{ρσ}(k) is the vacuum polarization tensor defined by

Π^{μν}(k) = −e∫Tr(α^μS′(p + k)Γ^ν(p + k, p)S′(p))d⁴p

Likewise, we can derive an equation for the exact electron propagator in terms of the vertex function as follows: We start with

(iγ^μ∂_μ − m)S′(x) = iδ⁴(x) − iγ^μ<0|A_μ(x)ψ(x)ψ(0)*|0>

and note that

∫S′(p′)Γ^ν(p′,p)S′(p)D′_{μν}(p′ − p)exp(−ip.x)d⁴p·d⁴p′ = <0|T(ψ(x)ψ(0)*A_μ(x))|0>

Taking the Fourier transform on both sides gives

∫<0|T(ψ(x)ψ(0)*A_μ(x))|0>exp(ik.x)d⁴x = ∫S′(p′)Γ^ν(p′,k)S′(k)D′_{μν}(p′ − k)d⁴p′
= ∫S′(p + k)Γ^ν(p + k, k)S′(k)D′_{μν}(p)d⁴p

Thus,
(γ.k − m)S′(k) = i − i[∫γ_μS′(p + k)Γ_ν(p + k, k)D′^{μν}(p)d⁴p]S′(k)
or equivalently,
(γ.p − m)S′(p) = i − i[∫γ_μS′(p + k)Γ_ν(p + k, p)D′^{μν}(k)d⁴k]S′(p)
or equivalently,
S′(p) = S(p) + S(p)Σ(p)S′(p)
where
Σ(p) = −∫γ_μS′(p + k)Γ_ν(p + k, p)D′^{μν}(k)d⁴k
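The Dyson equation S′ = S + SΣS′ resums a geometric series. A minimal scalar toy model (an illustration added here, with a constant self-energy standing in for the momentum-dependent Σ(p)) shows the iteration converging to the fully dressed propagator:

```python
# Toy scalar version of the Dyson equation S' = S + S*Sigma*S':
# bare propagator S = 1/(p^2 - m^2), constant self-energy Sigma;
# the exact resummed propagator is 1/(p^2 - m^2 - Sigma).
p2, m2, Sigma = 3.0, 1.0, 0.4
S = 1.0 / (p2 - m2)                 # bare propagator
exact = 1.0 / (p2 - m2 - Sigma)     # fully dressed propagator

Sp = S
for _ in range(200):                # iterate S' <- S + S Sigma S'
    Sp = S + S * Sigma * Sp         # converges since |S*Sigma| = 0.2 < 1

assert abs(Sp - exact) < 1e-12
```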

We can approximately evaluate the vertex function as follows:

∫S(p′)Γ^ν(p′,p)S(p)D_{μν}(p′ − p)exp(−ip′.x + ip.y − i(p − p′).z)d⁴p·d⁴p′ ≈ <0|T(ψ(x)ψ(y)*A_μ(z))|0>
≈ <0|T(ψ(x)ψ(y)*∫G(z − w)ψ(w)*α_μψ(w)d⁴w)|0>

where
G(z) = e∫δ(q²)exp(−iq.z)d⁴q

and where we now regard ψ(.) to be the free electron-positron wave field, ie, the Dirac field in the absence of the photon field:

(iγ.∂ − m)ψ(x) = 0

or equivalently,
ψ(x) = ∫(a(p,σ)u(p,σ)exp(−ip.x) + b(p,σ)*v(p,σ)exp(ip.x))d⁴p
with a(p,σ) and b(p,σ) being respectively the electron and positron annihilation operator fields in momentum-spin space. We evaluate this term by the usual Wick relations:
<0|T(ψ_a(x)ψ_b(y)*ψ(w)*α_μψ(w))|0>
= (α_μ)_{cd}<0|T(ψ_a(x)ψ_b(y)*ψ_c(w)*ψ_d(w))|0>
= (α_μ)_{cd}(S_{ab}(x − y)S_{cd}(w − w) + S_{ac}(x − w)S_{db}(w − y))
or, after neglecting the infinite constant matrix S(w − w) = S(0), it evaluates to
(α_μ)_{cd}S_{ac}(x − w)S_{db}(w − y) = [S(x − w)α_μS(w − y)]_{ab}
so that
∫S(p′)Γ^ν(p′,p)S(p)D_{μν}(p′ − p)exp(−ip′.x + ip.y − i(p − p′).z)d⁴p·d⁴p′ ≈ ∫G(z − w)S(x − w)α_μS(w − y)d⁴w

Putting z = 0 in this equation gives us

∫S(p′)Γ^ν(p′,p)S(p)D_{μν}(p′ − p)exp(ip.y − ip′.x)d⁴p·d⁴p′ = ∫G(w)S(x − w)α_μS(w − y)d⁴w

and we get by inverse Fourier transforming this equation,

S(p′)Γ^ν(p′,p)S(p)D_{μν}(p′ − p) = ∫G(w)S(x − w)α_μS(w − y)exp(ip′.x − ip.y)d⁴w·d⁴x·d⁴y

or equivalently,
D_{μν}(p′ − p)Γ^ν(p′,p) = G(p′ − p)α_μ
ie,
Γ^ν(p′,p) = (p′ − p)²G(p′ − p)α^ν = 0
since G(k) is proportional to δ(k²) and k²δ(k²) = 0. So in order to calculate Γ^ν(p′,p), we must go to the next degree of approximation. This approximation is given by
D_{μν}(p′ − p)S′(p′)Γ^ν(p′,p)S′(p) = G(p′ − p)S(p′)α_μS(p),
and
S′(p) = S(p) + S(p)Σ(p)S(p),
Σ(p) = −∫γ_μS(p + k)Γ_ν(p + k, p)D^{μν}(k)d⁴k
[22] The gravitational field in the presence of an external quantum photon field

Metric:
g_{μν}(x) = η_{μν} + h_{μν}(x)
Choose the coordinate system so that
h_{0μ} = 0
Lagrangian density of the gravitational field:

L_G = K·g^{μν}√(−g)(Γ^α_{μν}Γ^β_{αβ} − Γ^α_{μβ}Γ^β_{να})
≈ C₁(abcd)h_{ab,0}h_{cd,0} + C₂(abcde)h_{ab,0}h_{cd,e} + C₃(abcdef)h_{ab,e}h_{cd,f}

upto quadratic orders in the metric perturbations and their differentials. The canonical position fields are h_{ab}, 1 ≤ a ≤ b ≤ 3, totally six in number. The corresponding momentum fields are
π^{ab} = ∂L_G/∂h_{ab,0} = 2C₁(abcd)h_{cd,0} + C₂(abcde)h_{cd,e}
and inverting this equation gives us
h_{ab,0} = D₁(abcd)π^{cd} + D₂(abcde)h_{cd,e}
The Hamiltonian density is then
H_G(h_{ab}, h_{ab,c}, π^{ab}) = π^{ab}h_{ab,0} − L_G
= F₁(abcd)π^{ab}π^{cd} + F₂(abecdf)h_{ab,e}h_{cd,f} + F₃(abcdf)π^{ab}h_{cd,f}

The canonical Hamiltonian equations for this free gravitational field are then
h_{ab,0} = δH_G/δπ^{ab} = 2F₁(abcd)π^{cd} + F₃(abcdf)h_{cd,f},
π^{ab}_{,0} = −δH_G/δh_{ab} = ∂_e(∂H_G/∂h_{ab,e}) = ∂_e(2F₂(abecdf)h_{cd,f} + F₃(cdabe)π^{cd})
which have free gravitational plane waves as solutions. The arbitrary coefficients in these plane wave solutions are precisely the graviton creation and annihilation operator fields in momentum-helicity space. Gravitons have spin two, which means that there are five degrees of freedom for their polarization corresponding to each direction of propagation. Now consider the interaction between this gravitational field and the photon field, and between the gravitational field and the Dirac electron-positron field. The first interaction term is derived from the Maxwell Lagrangian
(−1/4)F_{μν}F^{μν}√(−g) = (−1/4)F_{μν}F_{αβ}g^{μα}g^{νβ}√(−g)
≈ E₁(μναβab)F_{μν}F_{αβ}h_{ab}
after neglecting the pure electromagnetic component
η^{μα}η^{νβ}F_{μν}F_{αβ}
and also neglecting terms that are quadratic or higher in the metric perturbations. The photon field F_{μν}(x) can be expanded as a linear superposition of the creation and annihilation operator fields of the photon in momentum space:
F_{μν}(x) = ∫[a(K,s)e_{μν}(K,s)exp(−ik.x) + a(K,s)*ē_{μν}(K,s)exp(ik.x)]d³K

[23] Characterizing diseased tissues in terms of permittivity and permeability

The diseased tissue is characterized by three fields: the permittivity field ε(ω,r), the permeability field μ(ω,r) and the conductivity field σ(ω,r). Thus, we shine an incident electromagnetic field Ei(ω,r), Hi(ω,r) on the tissue and solve Maxwell's equations perturbatively to obtain the scattered electromagnetic field Es(ω,r), Hs(ω,r) as functionals of the incident em field and the permittivity, permeability, conductivity fields:

E = Ei + δ·Es + O(δ²), H = Hi + δ·Hs + O(δ²)
curl E = −jωμH, curl H = jωεE, div(εE) = 0, div(μH) = 0

Thus,
∇(div E) − ∇²E = curl curl E = −jω·curl(μH) = −jω(∇μ × H + μ·curl H)
= −jω∇μ × H + ω²με·E
Likewise,
∇(div H) − ∇²H = curl curl H = jω·curl(εE) = jω(∇ε × E + ε·curl E)
= jω∇ε × E + ω²με·H
From div(εE) = 0,
ε·div E + (∇ε, E) = 0, so div E = −(∇ε, E)/ε
and similarly
div H = −(∇μ, H)/μ
So,
(∇² + ω²με)E + ∇((∇log ε, E)) − jω∇μ × H = 0
(∇² + ω²με)H + ∇((∇log μ, H)) + jω∇ε × E = 0
Writing
ε(ω,r) = ε₀(1 + δ·χe(ω,r)), μ(ω,r) = μ₀(1 + δ·χm(ω,r))
we get by first order perturbation theory,
(∇² + ω²μ₀ε₀)(Ei, Hi) = 0,
(∇² + ω²μ₀ε₀)Es + ω²μ₀ε₀(χe + χm)Ei + ∇((∇χe, Ei)) − jωμ₀∇χm × Hi = 0,
(∇² + ω²μ₀ε₀)Hs + ω²μ₀ε₀(χe + χm)Hi + ∇((∇χm, Hi)) + jωε₀∇χe × Ei = 0.
We note that the inverse (∇² + k²)^{−1} of the operator (∇² + k²), where k² = ω²μ₀ε₀, has the kernel
Gk(r,r′) = −exp(−jk|r − r′|)/4π|r − r′|
and
Es(ω,r) = ∫Gk(r,r′)[−k²(χe(ω,r′) + χm(ω,r′))Ei(ω,r′) − ∇′((∇′χe(ω,r′), Ei(ω,r′))) + jωμ₀∇′χm(ω,r′) × Hi(ω,r′)]d³r′
Hs(ω,r) = ∫Gk(r,r′)[−k²(χe(ω,r′) + χm(ω,r′))Hi(ω,r′) − ∇′((∇′χm(ω,r′), Hi(ω,r′))) − jωε₀∇′χe(ω,r′) × Ei(ω,r′)]d³r′
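That Gk is indeed the kernel of (∇² + k²)^{−1} can be spot-checked numerically: away from the source point the kernel must satisfy the homogeneous Helmholtz equation. A finite-difference sketch (the step size and test point below are arbitrary choices):

```python
import numpy as np

k = 5.0

def G(r):
    # G_k(r, 0) = -exp(-jk|r|) / (4 pi |r|)
    d = np.linalg.norm(r)
    return -np.exp(-1j * k * d) / (4 * np.pi * d)

r0 = np.array([1.0, 0.3, -0.2])      # test point away from the source at the origin
h = 1e-3

# 7-point finite-difference Laplacian
lap = -6 * G(r0)
for i in range(3):
    for s in (+1, -1):
        e = np.zeros(3)
        e[i] = s * h
        lap += G(r0 + e)
lap /= h**2

residual = lap + k**2 * G(r0)        # (del^2 + k^2) G should vanish here
assert abs(residual) < 1e-3 * abs(k**2 * G(r0))
```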

In other words, we can write for the scattered em field, when the incident field is fixed and known,
Fs(ω,r) = (Es(ω,r), Hs(ω,r)), Fs(ω,r) = F₀(ω,r,χe,χm)
with the rhs being a linear functional of χe, χm. The noisy measurement data therefore has the model

Fs(ω,r) = F₀(ω,r,χe,χm) + v(ω,r) --- (1)

where v is noise. We must estimate the entire permittivity-permeability field χe(ω,r), χm(ω,r) from Fs. Since this is an infinite dimensional estimation problem, we approximate it by a finite dimensional estimation problem using the method of moments, ie, we choose basis/test functions ψn(ω,r), n = 1,2,...,N and expand (approximately)

χe(ω,r) = Σ_{n=1}^N θe(n)ψn(ω,r)

χm(ω,r) = Σ_{n=1}^N θm(n)ψn(ω,r)

Substituting these expressions into (1), the model assumes the form
Fs(ω,r) = F₀(ω,r, Σ_n θe(n)ψn, Σ_n θm(n)ψn) + v(ω,r)
Rather than storing this entire measurement data Fs(ω,r), (ω,r) ∈ D, we compress it by storing only its dominant wavelet coefficients
W_F(n,m) = ∫_D Fs(ω,r)φ_{n,m}(ω,r)dω·d³r, (n,m) ∈ I
and the measurement model thus becomes
W_F(n,m) = G₀(n,m,θe,θm) + v(n,m), (n,m) ∈ I
where
θe = ((θe(n))), θm = ((θm(n)))
and we can express this measurement model in vector form
W_F = G₀(θe,θm) + v
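For the first-order (linear) model, recovering (θe, θm) from W_F is an ordinary least squares problem. The toy sketch below (with a hypothetical random matrix standing in for the actual operator G₀) illustrates exact recovery in the noiseless case and graceful degradation under weak noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear first-order model: wavelet data W = G0 @ theta + v, where theta
# stacks the method-of-moments coefficients (here a single block for brevity).
N, M = 8, 40                              # unknowns, measurements
G0 = rng.standard_normal((M, N))          # hypothetical stand-in for G0
theta = rng.standard_normal(N)

# Noiseless data: exact recovery by least squares
W = G0 @ theta
est = np.linalg.lstsq(G0, W, rcond=None)[0]
assert np.allclose(est, theta, atol=1e-8)

# Weak noise: estimation error shrinks with the noise level
for eps in (1e-3, 1e-2):
    noisy = np.linalg.lstsq(G0, W + eps * rng.standard_normal(M), rcond=None)[0]
    assert np.linalg.norm(noisy - theta) < 10 * eps * np.sqrt(N)
```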

More generally, if we use pth order perturbation theory for p > 1, then in this measurement model we will have G₀ as a polynomial function of θe, θm. The problem of characterizing the disease is to obtain θe, θm from W_F. In the noiseless case, we can do this by a neural network: we present sample values of (θe,θm) and train the weights of the network to take as input W_F = G₀(θe,θm) and output (θe,θm). Then, when presented with a general measurement data, we compute its wavelet coefficient vector W_F and input this into the network, which will output the corresponding parameters (θe,θm). Now we wish to evaluate the robustness of this algorithm with respect to measurement noise. We select a parameter vector (θe,θm) and input G₀(θe,θm) + v into the neural network with the trained weights. If Q(.) denotes the trained network function, then the output parameter estimate is given by

(θ̂e, θ̂m) = Q(G₀(θe,θm) + v)

and the robustness is evaluated by computing

P(‖(θ̂e, θ̂m) − (θe,θm)‖ > δ)

using the large deviation principle for weak noises. Specifically, if we write εv in place of v, where ε is a small parameter that signifies the fact that the noise is weak, then for small ε we have an LDP

P(εv ∈ B) ≈ exp(−inf(I(x) : x ∈ B)/ε)

where I(.) is the rate function for v. Then the contraction principle gives

P(‖(θ̂e, θ̂m) − (θe,θm)‖ > δ) ≈ exp(−inf(I_W(x) : ‖x‖ > δ)/ε)

where
I_W(x) = inf(I(y) : Q(G₀(θe,θm) + y) − (θe,θm) = x)
Chapter 7

Some Aspects of Superstring Theory

[1] Discuss the irreducible representations of the permutation group using the group algebra method and Young diagrams. Explain how this theory can be used to obtain the characters of the permutation group and how, by using the duality between the action of the unitary group in its standard tensor representation and the corresponding action of the permutation group on tensors, one can derive a formula for the generating function for the characters of the permutation group, provided that one uses Weyl's character formula for the characters of the unitary group.

[2] Superstring theory and supergravity coupled to the super Yang-Mills fields

The supergravity field equations are determined by a Lagrangian density
L_SUGR = e[R^{mn}_{μν}e^μ_m e^ν_n + χ̄_aΓ^{abc}D_bχ_c]
where Γ^{abc} is obtained by completely antisymmetrizing the product γ^aγ^bγ^c of the Dirac matrices. The χ_a are Majorana Fermion gravitino fields. Let ω^{mn}_μ denote the spin connection of the gravitational field. Then it defines a spinor covariant derivative by
D_μ = ∂_μ + ω^{mn}_μΓ_{mn}, D_a = e^μ_aD_μ
where
Γ_{mn} = [Γ_m, Γ_n]
The Riemann curvature tensor in the spin representation is

R_{μν} = [∂_μ + ω^{mn}_μΓ_{mn}, ∂_ν + ω^{rs}_νΓ_{rs}]
= [ω^{mn}_{ν,μ} − ω^{mn}_{μ,ν} + [ω_μ, ω_ν]^{mn}]Γ_{mn}

Note that the {Γ_{mn}} satisfy the Lorentz Lie algebra commutation relations and that we can write
R_{μν} = R^{mn}_{μν}Γ_{mn}
where
R^{mn}_{μν}e^α_m e^β_n = R^{αβ}_{μν}
is the standard Riemann-Christoffel curvature tensor in the coordinate basis. The curvature scalar is therefore
R = R^{mn}_{μν}e^μ_m e^ν_n

The supergravity Lagrangian is invariant, ie, changes by a perfect space-time divergence, under the local supersymmetry transformation
δχ_μ = c₁D_με(x), δe^a_μ = c₂ε̄(x)Γ^aχ_μ(x)
where
χ_a = e^μ_aχ_μ
and
D_μ = ∂_μ + ω^{ab}_μΓ_{ab}
is the gravitational spinor covariant derivative. The supersymmetry transformation of ω^{mn}_μ is not required here, since ω^{mn}_μ is assumed to be determined by the field equation that it satisfies, obtained by setting the variational derivative of the supergravity action w.r.t. it to zero. This equation turns out to be a purely algebraic equation for ω^{mn}_μ which determines it in terms of the tetrad field e^μ_n and the gravitino field χ_a, or equivalently χ_μ.

Remark: In the special relativistic Yang-Mills theory, the fields F^a_{μν} are obtained from the gauge Boson fields A^a_μ by
ieF^a_{μν} = [∂_μ + ieA_μ, ∂_ν + ieA_ν]^a
with
A_μ = A^a_μτ_a
where the τ_a's are Hermitian generators of the gauge group. A gauge invariant option for the Lagrangian density, obtained by adding matter action terms to the Yang-Mills Lagrangian, is
L = (1/2)F^a_{μν}F^{μνa} + ψ̄Γ^a[(i∂_a + eA_a) − m]ψ + η̄Γ^{ab}ηB_{ab}
where
Γ^{ab} = [Γ^a, Γ^b]
It remains to determine the global supersymmetry transformations which change this Lagrangian by a total differential.
this Lagrangian by a total differential.

Exercise: Determine the above mentioned global supersymmetry transformation.

Some additional details on supergravity:

Supergravity:
Let ω^{mn}_μ denote the spinor connection of the gravitational field. Then if the Γ^m are the Dirac matrices in four dimensions and e^m_μ is the tetrad basis of space-time being used, the covariant derivative of a spinor field is defined by
D_μψ = (∂_μ + (1/4)ω^{mn}_μΓ_{mn})ψ
where
Γ_{mn} = [Γ_m, Γ_n]
The curvature tensor in spinor notation is
R_{μν} = [∂_μ + (1/4)ω^{mn}_μΓ_{mn}, ∂_ν + (1/4)ω^{rs}_νΓ_{rs}]
= (1/4)(ω^{mn}_{ν,μ} − ω^{mn}_{μ,ν})Γ_{mn} + (1/16)ω^{mn}_μω^{rs}_ν[Γ_{mn}, Γ_{rs}]
Now using the anticommutator
{Γ_m, Γ_n} = 2η_{mn}
we can easily show that
[Γ_{mn}, Γ_{rs}] = 4(η_{ms}Γ_{nr} + η_{nr}Γ_{ms} − η_{mr}Γ_{ns} − η_{ns}Γ_{mr})
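This commutation relation can be verified numerically in the Dirac representation (an illustrative check added here; any representation with {Γ_m, Γ_n} = 2η_{mn} works):

```python
import numpy as np
from itertools import product

# Dirac representation; eta = diag(1,-1,-1,-1), {Gamma_m, Gamma_n} = 2 eta_{mn} I
I2, Z2 = np.eye(2), np.zeros((2, 2))
sig = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]]),
       np.array([[1, 0], [0, -1]], dtype=complex)]
G = [np.block([[I2, Z2], [Z2, -I2]]).astype(complex)]
G += [np.block([[Z2, s], [-s, Z2]]) for s in sig]
eta = np.diag([1.0, -1.0, -1.0, -1.0])

def comm(a, b):
    return a @ b - b @ a

Gmn = [[comm(G[m], G[n]) for n in range(4)] for m in range(4)]

# [Gamma_{mn}, Gamma_{rs}] = 4(eta_{ms}Gamma_{nr} + eta_{nr}Gamma_{ms}
#                              - eta_{mr}Gamma_{ns} - eta_{ns}Gamma_{mr})
for m, n, r, s in product(range(4), repeat=4):
    lhs = comm(Gmn[m][n], Gmn[r][s])
    rhs = 4 * (eta[m, s] * Gmn[n][r] + eta[n, r] * Gmn[m][s]
               - eta[m, r] * Gmn[n][s] - eta[n, s] * Gmn[m][r])
    assert np.allclose(lhs, rhs)
```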

Thus
R_{μν} = (1/4)(ω^{mn}_{ν,μ} − ω^{mn}_{μ,ν})Γ_{mn}
+ (1/4)ω^{mn}_μω^{rs}_ν(η_{ms}Γ_{nr} + η_{nr}Γ_{ms} − η_{mr}Γ_{ns} − η_{ns}Γ_{mr})
This can be expressed as
R_{μν} = (1/4)R^{mn}_{μν}Γ_{mn}
where
R^{mn}_{μν} = ω^{mn}_{ν,μ} − ω^{mn}_{μ,ν} − ω^{rn}_μω^{ms}_νη_{rs} + ω^{ms}_μω^{rn}_νη_{sr} + ω^{sn}_μω^{rm}_νη_{sr} − ω^{mr}_μω^{ns}_νη_{rs}
= ω^{mn}_{ν,μ} − ω^{mn}_{μ,ν} + η_{rs}(−ω^{rn}_μω^{ms}_ν + ω^{ms}_μω^{rn}_ν + ω^{sn}_μω^{rm}_ν − ω^{mr}_μω^{ns}_ν)
= ω^{mn}_{ν,μ} − ω^{mn}_{μ,ν} + 2η_{rs}(ω^{mr}_μω^{sn}_ν − ω^{mr}_νω^{sn}_μ)
It is easily shown that when the spinor connection ω^{mn}_μ for the gravitational field is appropriately chosen, so that the Dirac equation in curved space-time remains invariant under both diffeomorphisms and local Lorentz transformations, then the Riemann curvature tensor as defined usually in terms of the Christoffel connection symbols coincides with R_{μνρσ} = R^{mn}_{μν}e_{mρ}e_{nσ}. In particular, R = R^{mn}_{μν}e^μ_m e^ν_n is the scalar curvature of space-time. The spinor connection ω^{mn}_μ is chosen so that the covariant derivative of the tetrad e^n_μ, having one Lorentz index and one coordinate index, is zero:

0 = D_νe^n_μ = e^n_{μ,ν} − Γ^α_{μν}e^n_α + ω^{nm}_ν e_{mμ}

This is an algebraic equation for ω^{mn}_μ and is easily solved. However, when there are spinor fields like the gravitino in addition to the gravitational field specified by the tetrad e^μ_n (ie, the graviton), then the definition of the spinor connection has to be modified, and it is expressed in terms of both the graviton and the gravitino fields. This equation is obtained by first considering the supergravity Lagrangian in four space-time dimensions

c₁eR + iχ̄_μΓ^{μνρ}D_νχ_ρ

where χ_μ is a Majorana spinor having an additional vector index μ. The graviton tetrad field e^μ_n is Bosonic while the gravitino χ_μ is Fermionic. These are all considered in the quantum theory to be operator valued fields. Note the following self-consistent definitions:

Γ^μ = e^μ_nΓ^n, Γ_μ = g_{μν}Γ^ν = Γ^n e_{nμ}

Γ_n = η_{nm}Γ^m, e_{nμ}e^n_ν = g_{μν}, e^μ_n e_{mμ} = η_{nm},

e_{nμ} = η_{nm}e^m_μ = g_{μν}e^ν_n

Γ^{μνρ} = Γ^μΓ^{νρ} + Γ^νΓ^{ρμ} + Γ^ρΓ^{μν}

where
Γ^{μν} = [Γ^μ, Γ^ν]
Thus Γ^{μνρ} is obtained by antisymmetrizing the product Γ^μΓ^νΓ^ρ over all the three indices. We can also clearly write

Γ^{μν} = e^μ_m e^ν_nΓ^{mn}, Γ^{μνρ} = e^μ_m e^ν_n e^ρ_kΓ^{mnk}

In general, we can define

Γ^{μ₁...μk} = Σ_{σ∈S_k}sgn(σ)Γ^{μ_{σ(1)}}...Γ^{μ_{σ(k)}}

This is obtained by totally antisymmetrizing the product Γ^{μ₁}...Γ^{μk} over all its k indices. The basic property of a Majorana Fermionic operator field ψ(x) is that, apart from all its components anticommuting with each other, it has four components and satisfies

(ψ(x)*)^T = ψ^TεΓ⁵Γ⁰

where, if
ψ(x) = [ψ₁(x), ψ₂(x), ψ₃(x), ψ₄(x)]^T
then
ψ(x)* = [ψ₁(x)*, ψ₂(x)*, ψ₃(x)*, ψ₄(x)*]^T
ψ_k(x)* denoting the operator adjoint of ψ_k(x) in the Fock space on which it acts. Also we define

ψ(x)^T = [ψ₁(x), ψ₂(x), ψ₃(x), ψ₄(x)]

so that we have

(ψ(x)*)^T = [ψ₁(x)*, ψ₂(x)*, ψ₃(x)*, ψ₄(x)*]

Now observe that
ε = diag[iσ², iσ²], σ² = [[0, −i], [i, 0]]
Note that ε is a real skew-symmetric matrix. We write
e = iσ² = [[0, 1], [−1, 0]]
so that
ε = diag[e, e], e² = −I
εΓ⁵ = [[e, 0], [0, −e]], Γ⁰ = [[0, I], [I, 0]]
so that
εΓ⁵Γ⁰ = [[0, e], [−e, 0]]
Thus, the condition for ψ to be a Majorana Fermion can be stated as
ψ*_{1:2} = eψ_{3:4}, ψ*_{3:4} = −eψ_{1:2}
or equivalently,
(ψ*_{1:2})^T = −(ψ_{3:4})^T e, (ψ*_{3:4})^T = (ψ_{1:2})^T e

Also,
Γ⁰Γ_n, Γ⁰Γ_μ, Γ_nΓ⁰, Γ_μΓ⁰
are Hermitian matrices. We observe that if ψ is a Majorana Fermion,

ψ̄ = (ψ*)^TΓ⁰ = ψ^TεΓ⁵

So we can also write down the Lagrangian of the gravitino as

iχ̄_μΓ^{μνρ}D_νχ_ρ = iχ^{∗T}_μΓ⁰Γ^{μνρ}D_νχ_ρ = iχ^T_μεΓ⁵Γ^{μνρ}D_νχ_ρ

We can verify that, apart from a perfect divergence, this quantity is a Hermitian operator. First observe that

(Γ⁰Γ^μΓ^νΓ^ρ)* = (Γ⁰Γ^μΓ⁰Γ⁰Γ^νΓ⁰Γ⁰Γ^ρ)* = Γ⁰Γ^ρΓ^νΓ⁰Γ⁰Γ^μ = Γ⁰Γ^ρΓ^νΓ^μ

so that on antisymmetrizing over the three indices, we get

(Γ⁰Γ^{μνρ})* = −Γ⁰Γ^{μνρ}

Thus,
(iχ^{∗T}_μΓ⁰Γ^{μνρ}∂_νχ_ρ)* = iχ^{∗T}_{ρ,ν}Γ⁰Γ^{μνρ}χ_μ
= iχ^{∗T}_{μ,ν}Γ⁰Γ^{ρνμ}χ_ρ
= −iχ^{∗T}_{μ,ν}Γ⁰Γ^{μνρ}χ_ρ
= ∂_ν(−iχ^{∗T}_μΓ⁰Γ^{μνρ}χ_ρ) + iχ^{∗T}_μΓ⁰Γ^{μνρ}χ_{ρ,ν}
proving our claim, provided that we replace D_ν by ∂_ν. If we take the connection into account, ie,
D_νχ_ρ = ∂_νχ_ρ + (1/4)ω^{mn}_νΓ_{mn}χ_ρ − Γ^α_{ρν}χ_α
then it follows that we must prove the Hermitianity of the operator fields

χ^{∗T}_μΓ⁰Γ^{μνρ}Γ_{mn}χ_ρ·ω^{mn}_ν ---(a)

and
χ^{∗T}_μΓ⁰Γ^{μνρ}χ_α·Γ^α_{ρν} ---(b)
χα .Γα

However, the field (b) is identically zero since Γ^{μνρ} is antisymmetric in (ν,ρ) while Γ^α_{ρν} is symmetric in (ν,ρ). Hence, we have to prove only the Hermitianity of the field
χ^{∗T}_μΓ⁰Γ^{μνρ}Γ_{mn}χ_ρ ---(c)
Now,
Γ^{μνρ}Γ_{mn} = [Γ^{μνρ}, Γ_{mn}] + Γ_{mn}Γ^{μνρ}
and
[Γ_{pqr}, Γ_{mn}] = [Γ_pΓ_{qr} + Γ_qΓ_{rp} + Γ_rΓ_{pq}, Γ_{mn}]
Now,
[Γ_pΓ_{qr}, Γ_{mn}] = Γ_p[Γ_{qr}, Γ_{mn}] + [Γ_p, Γ_{mn}]Γ_{qr}
= 4Γ_p(η_{qn}Γ_{rm} + η_{rm}Γ_{qn} − η_{qm}Γ_{rn} − η_{rn}Γ_{qm}) + 4(η_{pm}Γ_n − η_{pn}Γ_m)Γ_{qr}
Summing this equation over cyclic permutations of (pqr) gives us

[Γ_{pqr}, Γ_{mn}] = 4Σ_{(pqr)}η_{mq}(Γ_pΓ_{nr} + Γ_rΓ_{pn} + Γ_nΓ_{rp}) + 4Σ_{(pqr)}η_{nq}(Γ_pΓ_{rm} + Γ_rΓ_{mp} + Γ_mΓ_{pr})
= 4Σ_{(pqr)}(η_{mq}Γ_{pnr} + η_{nq}Γ_{prm})

Note that this quantity is antisymmetric w.r.t. interchange of (m,n). It thus follows that
e_{qν}χ^{∗T}_μΓ⁰[Γ^{μνρ}, Γ_{mn}]χ_ρ = χ^{p∗T}Γ⁰[Γ_{pqr}, Γ_{mn}]χ^r
= 4Σ_{(pqr)}[η_{mq}χ^{p∗T}Γ⁰Γ_{pnr}χ^r + η_{nq}χ^{p∗T}Γ⁰Γ_{prm}χ^r]
Now,
(χ^{p∗T}Γ⁰Γ_{pnr}χ^r)* = χ^{r∗T}Γ⁰Γ_{rnp}χ^p = χ^{p∗T}Γ⁰Γ_{pnr}χ^r
which proves the Hermitianity. Another way to see the Hermitianity of this is to use the Majorana Fermion property of χ^p to write

χ^{p∗T}Γ⁰Γ_{pnr}χ^r = χ^{pT}εΓ⁵Γ_{pnr}χ^r

and use the fact that

(εΓ⁵Γ_{pnr})^T = −Γ^T_{pnr}Γ⁵ε = −εΓ_{rnp}Γ⁵ = εΓ⁵Γ_{rnp} = −εΓ⁵Γ_{pnr}

where we have used the identities

εΓ^T_n = Γ_nε, εΓ⁵ = Γ⁵ε, Γ_nΓ⁵ = −Γ⁵Γ_n

Now consider
X = χ^{∗T}_μΓ⁰{Γ^{μνρ}, Γ_{mn}}χ_ρ
where {.,.} denotes the anticommutator. We have

X = X₁ + X₂

where
X₁ = χ^{∗T}_μΓ⁰Γ^{μνρ}Γ_{mn}χ_ρ

X₂ = χ^{∗T}_μΓ⁰Γ_{mn}Γ^{μνρ}χ_ρ
We have
X₁* = χ^{∗T}_ρΓ⁰Γ_{mn}Γ^{μνρ}χ_μ

= −χ^{∗T}_μΓ⁰Γ_{mn}Γ^{μνρ}χ_ρ = −X₂
which shows that
X* = −X
ie, X is skew-Hermitian. Note that we have used the fact that

(Γ⁰Γ_pΓ_qΓ_rΓ_mΓ_n)* = (Γ⁰Γ_pΓ⁰Γ⁰Γ_qΓ_rΓ⁰Γ⁰Γ_mΓ⁰Γ⁰Γ_n)*
= Γ⁰Γ_nΓ_mΓ⁰Γ⁰Γ_rΓ⁰Γ⁰Γ_qΓ_pΓ⁰Γ⁰ = Γ⁰Γ_nΓ_mΓ_rΓ_qΓ_p
since Γ⁰Γ_n, Γ_nΓ⁰, Γ⁰ are Hermitian and Γ⁰² = I. Thus, by antisymmetrizing over (pqr) and over (mn), we get

(Γ⁰Γ^{pqr}Γ^{mn})* = Γ⁰Γ^{nm}Γ^{rqp} = Γ⁰Γ^{mn}Γ^{pqr}

Now consider the following local supersymmetry transformation

δχ_μ(x) = D_με(x), δe^n_μ = Kε̄(x)Γ^nχ_μ(x)

where ε(x) is an infinitesimal Majorana Fermionic parameter. We can easily check that D_με also satisfies the Majorana Fermion property. Indeed,

((D_με(x))*)^T = ∂_μ(ε(x)*)^T + (ε(x)*)^TΓ^∗_{mn}ω^{mn}_μ

= (∂_με(x))^TεΓ⁵Γ⁰ − ε(x)^TεΓ⁵Γ_{mn}Γ⁰ω^{mn}_μ

Now,
εΓ_{mn} = −Γ^T_{mn}ε
since
εΓ^T_n = Γ_nε
Thus, since Γ^{0T} = Γ⁰, it follows that

εΓ⁵Γ_{mn}Γ⁰ = Γ⁵εΓ_{mn}Γ⁰ = −Γ⁵Γ^T_{mn}εΓ⁰ = −Γ^T_{mn}Γ⁵εΓ⁰ = −Γ^T_{mn}εΓ⁵Γ⁰

This gives
((D_με(x))*)^T = (∂_με(x))^TεΓ⁵Γ⁰ + ε(x)^TΓ^T_{mn}εΓ⁵Γ⁰ω^{mn}_μ = (D_με(x))^TεΓ⁵Γ⁰
proving thereby the Majorana property of D_με(x). Now, under the local supersymmetry transformation of χ_μ, the gravitino Lagrangian changes by

δ_χ(χ̄_μΓ^{μνρ}D_νχ_ρ) = δχ̄_μΓ^{μνρ}D_νχ_ρ + χ̄_μΓ^{μνρ}D_νδχ_ρ
= D̄_με(x)Γ^{μνρ}D_νχ_ρ + χ̄_μΓ^{μνρ}D_νD_ρε(x)
The term in this quantity that is quadratic in {ω^{mn}_μ} is given by

[3] The action integral of a superparticle is given by

S = ∫e·p_μp^μ·dτ

where e is the square root of a one dimensional metric and

p^μ = dx^μ/dτ − θ^{AT}γ^μ·dθ^A/dτ

with the θ^A(τ) being Fermionic coordinates. The sum is over A, and each θ^A is therefore a D dimensional Majorana Fermion. We wish to determine the supersymmetry transformation under which S is invariant. We consider an infinitesimal supersymmetry transformation

δθ^A = γ⁰γ^μp_μ·k^A = α^μp_μ·k^A

δx^μ = θ^{AT}γ^μδθ^A

where the k^A are infinitesimal Grassmannian vectors, each of size D × 1. Note that γ^μ is a skew-symmetric matrix. Then, writing ′ for d/dτ,

δp^μ = (δx^μ)′ − δθ^{AT}γ^μθ′^A − θ^{AT}γ^μδθ′^A
= θ′^{AT}γ^μδθ^A + θ^{AT}γ^μδθ′^A − δθ^{AT}γ^μθ′^A − θ^{AT}γ^μδθ′^A
= θ′^{AT}γ^μδθ^A − δθ^{AT}γ^μθ′^A
= 2θ′^{AT}γ^μδθ^A
since k^B, and hence δθ^B, commutes with θ^A and γ^μ is skew-symmetric. We thus get
δ(p²) = 2p_μδp^μ = 4θ′^{AT}γ^μp_μδθ^A
= 4θ′^{AT}γ⁰(α.p)δθ^A
= 4θ′^{AT}γ⁰(α.p)²k^A
= 4p²θ′^{AT}γ⁰k^A
and hence
δ(e.p²) = δ(e)p² + eδ(p²)
= δ(e)p² + 4ep²θ′^{AT}γ⁰k^A
This is zero provided that under the supersymmetry transformation, e changes by

δe = −4e·θ′^{AT}γ⁰k^A
We note that in this analysis, we are not assuming that k^A is independent of the time τ. It can depend on τ, and hence our action is not only globally supersymmetric but also locally supersymmetric.

[4] Generalization of supersymmetric actions from superparticles to superstrings

We define
p^μ_α = ∂_αX^μ − θ^{AT}γ^μ∂_αθ^A
where α = 1,2 and
∂₁ = ∂_τ, ∂₂ = ∂_σ
We define the infinitesimal supersymmetric transformations

δθ^A = k^A, δX^μ = k^{AT}γ^μθ^A

where k^A is an infinitesimal Grassmannian parameter. Then,

δp^μ_α = (δX^μ)_{,α} − δθ^{AT}γ^μ∂_αθ^A − θ^{AT}γ^μ(δθ^A)_{,α}
= k^{AT}_{,α}γ^μθ^A + k^{AT}γ^μθ^A_{,α} − k^{AT}γ^μθ^A_{,α} − θ^{AT}γ^μk^A_{,α}
= 2k^{AT}_{,α}γ^μθ^A
This is zero iff k^A is a constant Grassmannian parameter, in which case it follows that the action
S = ∫η_{μν}h^{αβ}√h·p^μ_αp^ν_β·d²σ

is supersymmetric, provided that the world sheet metric h_{αβ} is assumed to be supersymmetric invariant. Thus the action has only global supersymmetry and not local supersymmetry. To obtain local supersymmetry, we have to add other terms to the action. (See M. Green, J. Schwarz and E. Witten, Superstring Theory, Vol. I, Cambridge University Press.)

[5] Super Yang-Mills action

L = K₁F^a_{μν}F^{μνa} + K₂ψ^{aT}γ^μD_μψ^a

where
iF_{μν} = [∂_μ + iA_μ, ∂_ν + iA_ν]
or equivalently,
F^a_{μν} = A^a_{ν,μ} − A^a_{μ,ν} + C(abc)A^b_μA^c_ν
where the C(abc) are the structure constants of a set of Hermitian basis elements for the Lie algebra of the gauge group. The gauge covariant derivative D_μ acts in the adjoint representation on the gaugino fields ψ^a:

D_μψ^a = ∂_μψ^a + iC(abc)A^b_μψ^c

This formula can be obtained using

D_μψ^a = [∂_μ + iA_μ, ψ]^a = ∂_μψ^a + i[A^b_μτ_b, ψ^cτ_c]^a
= ∂_μψ^a + iC(abc)A^b_μψ^c
Note that A^a_μ is a gauge Boson field and its superpartner ψ^a is a gauge Fermion field, also called a gaugino field. Now, we must introduce infinitesimal supersymmetry transformations under which the action ∫L·d⁴x is invariant. We assume that such a transformation has the form

δA^a_μ = k^Tγ_μψ^a,

δψ^a = (γ^{μν}k)F^a_{μν}

where k is an infinitesimal Grassmannian four component spinor and γ^{μν} = [γ^μ, γ^ν]. Under this transformation, we get
δ(F^a_{μν}F^{μνa}) = 2F^{μνa}δF^a_{μν}
= 2F^{μνa}(δA^a_{ν,μ} − δA^a_{μ,ν} + C(abc)(A^b_μδA^c_ν + A^c_νδA^b_μ))
= 4F^{μνa}(δA^a_{ν,μ} + C(abc)A^b_μδA^c_ν)
δ(ψ^{aT}γ^μD_μψ^a) = δψ^{aT}γ^μD_μψ^a + ψ^{aT}γ^μD_μδψ^a + ψ^{aT}γ^μ(δD_μ)ψ^a
where
(δD_μ)ψ^a = iC(abc)(δA^b_μ)ψ^c

[6] Dirac's equation in curved space-time

Let M be a Riemannian manifold with metric g, ie, for each pair of vector fields X, Y on M, we have a scalar field g(X,Y), such that the map (X,Y) → g(X,Y) is symmetric and bilinear. We are given a connection ∇ on M. We say that ∇ is induced by the metric if ∇_Xg = 0 for every vector field X defined on M. This is equivalent to saying that

X(g(Y,Z)) = g(∇_XY, Z) + g(Y, ∇_XZ)

for any three vector fields X, Y, Z. It is not hard to prove that the connection ∇ induced by the metric is uniquely determined by the metric, provided we assume in addition that the torsion of the connection is zero, ie,

∇_XY − ∇_YX − [X,Y] = 0

for any two vector fields X, Y. An elliptic differential operator on the Riemannian manifold is a second order differential operator of the form

D = g^{ij}(x)∂_i∂_j + c^i(x)∂_i + b(x)

where we have chosen and fixed the coordinate system x and have defined g^{ij}(x) so that
((g^{ij}(x))) = ((g_{ij}(x)))^{−1}
where
g_{ij}(x)X^i(x)Y^j(x) = g(X,Y)(x)
A Dirac operator on the Riemannian manifold is an operator of the form

D = V^μ_a(x)γ^a(∂_μ + ieA^b_μ(x)τ_b + Γ_μ(x)) − m

where V^μ_a(x) is the Vierbein of the metric g_{μν}(x). We can define the space-time dependent Dirac matrices
Γ^μ(x) = V^μ_a(x)γ^a
Then, we can define the Dirac operator as

D = Γ^μ(x)(i∂_μ + eA_μ(x)) − m

where now A_μ(x) is a matrix valued four vector potential. It takes into account both the Yang-Mills connection A^b_μ(x)τ_b and the spinor connection
Γ_μ(x) = ω_{μab}(x)[γ^a, γ^b]
The Dirac operator D acts on the space of N-component spinor fields defined on the manifold M.
[7] Integration on a differentiable manifold

Let M be an n-dimensional manifold and ω a k-form on M, with k ≤ n. Stokes' theorem states that

∫_{∂M}ω = ∫_M dω

In terms of coordinates,
ω(x) = ω_{μ₁...μk}(x)dx^{μ₁}∧...∧dx^{μk}
Then,
dω(x) = ω_{μ₁...μk,m}(x)dx^m∧dx^{μ₁}∧...∧dx^{μk}
Then
∫_M dω = Σ sgn(m,μ₁,...,μk)∫ω_{μ₁...μk,m}(x)dx^m dx^{μ₁}...dx^{μk}
where the sum is over all indices (m,μ₁,...,μk) that are distinct and with μ₁,...,μk in increasing order, and sgn(m,μ₁,...,μk) is the signature of the permutation that takes the sequence (m,μ₁,...,μk) to the sequence obtained by arranging m,μ₁,...,μk in increasing order.
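For a 1-form on the unit square, Stokes' theorem reduces to Green's theorem and can be checked exactly with sympy. The polynomial coefficients P, Q below are hypothetical examples:

```python
import sympy as sp

x, y = sp.symbols('x y')
# 1-form omega = P dx + Q dy on the unit square; d omega = (Q_x - P_y) dx ^ dy
P = x**2 * y
Q = x * y**3

# Interior integral of d omega
interior = sp.integrate(sp.diff(Q, x) - sp.diff(P, y), (x, 0, 1), (y, 0, 1))

# Boundary integral, counterclockwise: bottom, right, top (reversed), left (reversed)
boundary = (sp.integrate(P.subs(y, 0), (x, 0, 1))
            + sp.integrate(Q.subs(x, 1), (y, 0, 1))
            - sp.integrate(P.subs(y, 1), (x, 0, 1))
            - sp.integrate(Q.subs(x, 0), (y, 0, 1)))

assert interior == boundary == sp.Rational(-1, 12)
```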
[8] Index of an Elliptic operator

Now, let P be an elliptic differential operator. Consider the heat kernel

K_t = exp(−tP)

Let K_t(x,y) denote its kernel, ie,

∫K_t(x,y)f(y)dy = (exp(−tP))f(x)

We have
Tr(exp(−tP)) = ∫_M K_t(x,x)dx
Let c₁,...,c_k be the positive eigenvalues of P and c_{k+1},...,c_N its negative eigenvalues. Then,
index(P) = k − (N − k) = 2k − N
On the other hand,
Tr(exp(−tP)) = Σ_{k=1}^N exp(−tc_k)
Suppose P is positive semi-definite. Then we can write P = Q*Q for some operator Q. We consider

N(t) = Tr(exp(−tQ*Q)) − Tr(exp(−tQQ*))

We claim that N(t) = dim(N(Q)) − dim(N(Q*)). This can be proved, for example, using the singular value decomposition. We write

Q = UDV

where D is non-negative diagonal and U, V are unitary matrices. Then Q*Q = V*D²V, QQ* = UD²U*, and hence if λ is any non-zero eigenvalue of P = Q*Q, then it is also a non-zero eigenvalue of QQ* and vice-versa, and further the multiplicities of this eigenvalue are the same for both Q*Q and QQ*. Thus, the claim follows. Thus for all non-zero t, we have that

N(t) = dim(N(Q)) − dim(N(Q*)) = dim(N(Q)) − dim(R(Q)^⊥)
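The claim that N(t) = dim N(Q) − dim N(Q*) independently of t can be checked numerically for a rectangular matrix Q of known rank (the sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 6, 9, 4                      # Q is m x n with rank r
Q = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

QsQ = Q.T @ Q                          # Q*Q, n x n
QQs = Q @ Q.T                          # QQ*, m x m

for t in (0.1, 1.0, 7.0):
    Nt = (np.sum(np.exp(-t * np.linalg.eigvalsh(QsQ)))
          - np.sum(np.exp(-t * np.linalg.eigvalsh(QQs))))
    # Nonzero singular values cancel between the two traces, leaving
    # dim N(Q) - dim N(Q*) = (n - r) - (m - r) = n - m, independent of t.
    assert np.isclose(Nt, n - m)
```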

[9] Design of quantum gates using superstring theory

The Lagrangian density for a superstring is given by

L = (1/2)h^{αβ}√h·∂_αX^μ∂_βX_μ + ψ^{μT}ρ^α∂_αψ_μ

where ψ^μ(τ,σ) is a Majorana Fermion field and the ρ^α's are skew-symmetric matrices. It should be noted that for Majorana Fermion fields ψ on R⁴, we have

(ψ*)^T = ψ^Tεγ⁵γ⁰
and
ψ̄ = (ψ*)^Tγ⁰
so that
ψ̄γ^μ∂_μψ = ψ^Tεγ⁵γ^μ∂_μψ
We note that εγ⁵γ^μ and εγ^μ are skew-symmetric matrices, and we may as well therefore remove the γ⁵ factor to get the Fermionic contribution to the Lagrangian density as
ψ^Tεγ^μ∂_μψ
For strings, however, space-time is two dimensional and hence we should replace εγ^μ by skew-symmetric matrices ρ^α to obtain the Fermionic contribution to the Lagrangian density as
ψ^Tρ^α∂_αψ
The infinitesimal supersymmetry transformations that leave the total superstring action invariant are

δX^μ = k^Tψ^μ, δψ^μ = ρ^{αT}k·∂_αX^μ

Then we find that
δ(h^{αβ}√h·X^μ_{,α}X_{μ,β}) = 2h^{αβ}√h·X^μ_{,α}δX_{μ,β}
= 2h^{αβ}√h·X^μ_{,α}k^T∂_βψ_μ
and on the other hand,
δ(ψ^{μT}ρ^α∂_αψ_μ) = 2ψ^{μT}ρ^α∂_αδψ_μ = 2ψ^{μT}ρ^αρ^{βT}k·X_{μ,αβ}
As in the case of the Dirac matrices, we assume that

ρ^αρ^β + ρ^βρ^α = 2h^{αβ}

and then deduce supersymmetry invariance of the superstring action integral.


[10] Project on the design of quantum gates using strings and su-
perstrings
[1] The superstring Lagrangian density is a quadratic form in the Bosonic
and Fermionic component string fields. The corresponding string-field equations
give the basic equations for the Bosonic and Fermionic strings in decoupled form.
The Bosonic part is simply the wave equation with one time variable and one
spatial variable. The Fermionic part is simply the Dirac equation with zero mass
in one time and one spatial dimension. The solutions to these equations yield
the Bosonic string field as a linear combination of a countably infinite number of
Bosonic creation and annihilation operators and the Fermionic string field as a
linear combination of a countably infinite number of Fermionic creation and
annihilation operators. Thus the quantum superstring acts in a tensor product
of Boson and Fermion Fock spaces. The Hamiltonian of the superstring can be
expressed as

H = Σ_{n≥1} c(n) α(−n) α(n) + Σ_{n≥1} d(n) β(−n) β(n)

where

α(−n) = α(n)∗,  β(−n) = β(n)∗

and the α's satisfy the CCR while the β's satisfy the CAR:

[α(n), α(m)] = δ(n + m)

{β(n), β(m)} = δ(n + m)
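These relations can be checked concretely in a finite-dimensional truncation. The sketch below is an illustration, not from the text; the cutoff size and the unit normalization δ(n + m) → 1 are assumptions. It models a single α-mode by a truncated oscillator matrix and a single β-mode by a 2 × 2 fermionic matrix:

```python
import numpy as np

# Truncated bosonic annihilation operator: a|k> = sqrt(k)|k-1>
Ncut = 8
a = np.diag(np.sqrt(np.arange(1, Ncut)), k=1)
comm = a @ a.T - a.T @ a
# The CCR [α(n), α(-n)] = 1 holds exactly away from the truncation edge
assert np.allclose(comm[:-1, :-1], np.eye(Ncut - 1))

# Fermionic annihilation operator on a two-dimensional Fock space
f = np.array([[0.0, 1.0], [0.0, 0.0]])
# The CAR {β(n), β(-n)} = 1 and β(n)² = 0 hold exactly
assert np.allclose(f @ f.T + f.T @ f, np.eye(2))
assert np.allclose(f @ f, 0)
print("CCR (truncated) and CAR verified")
```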


Now we perturb this Hamiltonian with a gauge field Bμν (X) which corresponds
to a string generalization of the electromagnetic potential for particles. The
corresponding contribution to the action is given by

∫ B_{μν}(X(τ, σ)) ε^{αβ} ∂_α X^μ ∂_β X^ν dτ dσ

It should be noted that the unperturbed action for the superstring is given by
S[X, ψ] = (1/2) ∫ X^μ_{,α} X_μ^{,α} dτ dσ + (1/2) ∫ ψ^{μT} ρ^α ∂_α ψ_μ dτ dσ

where α = 1, 2 and α = 1 corresponds to the τ variable while α = 2 corresponds


to the σ variable. This action has global supersymmetry under Boson-Fermion
exchange. This exchange constitutes the global supersymmetry transformation:

δX^μ = −k^T ψ^μ

δψ^μ = ρ^α k X^μ_{,α}

where k is a two-component Grassmannian variable. The ρ^α, α = 1, 2, are 2 × 2


versions of the Dirac matrices. They satisfy the Dirac anticommutation relations

ρα ρβ + ρβ ρα = 2η αβ

where ((η αβ )) = diag[1, −1] is the string sheet metric. This form of the metric
can be obtained by the application of an appropriate Weyl scaling. ρα are
skew-symmetric matrices just as in four space-time dimensions γ μ are skew-
symmetric matrices (See S.Weinberg, Vol.III, Supersymmetry). We shall now
verify invariance of the above superstring action under the above infinitesimal
supersymmetry transformation:
δ((1/2) X^μ_{,α} X_μ^{,α}) = X_{μ,α} δX^{μ,α} = −2 X_{μ,α} k^T ψ^{μ,α}

δ(ψ^{μT} ρ^α ψ_{μ,α}) = δψ^{μT} ρ^α ψ_{μ,α} + ψ^{μT} ρ^α δψ_{μ,α}
= X^μ_{,β} k^T ρ^{βT} ρ^α ψ_{μ,α} + ψ^{μT} ρ^α ρ^β k X_{μ,βα}
= −ψ^{μT}_{,α} ρ^α ρ^β k X_{μ,β} + ψ^{μT} ρ^α ρ^β k X_{μ,βα}
= −(ψ^{μT} ρ^α ρ^β k X_{μ,β})_{,α} + 2 ψ^{μT} ρ^α ρ^β k X_{μ,βα}
= −(ψ^{μT} ρ^α ρ^β k X_{μ,β})_{,α} + ψ^{μT} {ρ^α, ρ^β} k X_{μ,βα}
= η^{αβ} ψ^{μT} k X_{μ,βα}
where a total two-divergence has been neglected since such a term does not
contribute to the action integral. Adding the two terms, we find that, on neglecting
total divergence terms and using the skew-symmetry of the ρ^α, the variation in
the action is given by

δS = −2 X_{μ,α} k^T ψ^{μ,α} − X_{μ,α}{}^{,α} k^T ψ^μ = 0

[11] Superstring action for local supersymmetry transformations

p^μ_α = X^μ_{,α} − θ^{AT} γ^μ θ^A_{,α}
where A = 1, 2. Define

L1 = (1/2) p^μ_α p_μ^α = (1/2) η_{μν} p^μ_α p^{να}

where α is raised using the worldsheet metric diag[1, −1]. Also define

L2 = c1 ε^{αβ} X^μ_{,α} (θ^{1T} γ_μ θ^1_{,β} − θ^{2T} γ_μ θ^2_{,β}) + c2 ε^{αβ} (θ^{1T} γ^μ θ^1_{,α})·(θ^{2T} γ_μ θ^2_{,β})
The local supersymmetry transformations are

δX^μ = k^{AT} γ^μ θ^A,  δθ^A = k^A

where k^A is a Grassmannian parameter dependent on the space-time coordinates τ, σ. We wish to select c1, c2 so as to get local supersymmetry of the action ∫ (L1 + L2 + L3) dτ dσ.

[12] Spinors
Let V be a vector space and Q a quadratic form on V so that for any u, v ∈ V ,
we have a multiplication u.v satisfying

uv + vu = B(u, v)

where B(u, v) is the symmetric bilinear form on V induced by Q. Thus,

u2 = B(u, u)/2 = Q(u)

and hence
B(u, v) = Q(u + v) − Q(u) − Q(v)
or equivalently,
B(u, v) = (Q(u + v) − Q(u − v))/2
We assume that the product u1...un is defined for any u1, ..., un ∈ V. The formal
linear span of all such products is denoted by C(V), and C(V) equipped with the
quadratic form Q, written (C(V), Q), is called a Clifford algebra over V. C(V)^+
denotes the subalgebra of C(V) spanned by products of an even number of elements.
One example of a Clifford algebra is as follows. Let ∧ be an antisymmetric tensor
product on V . For u ∈ V , let a(u)∗ act on ∧V by

a(u)∗ w1 ∧ ... ∧ wn = u ∧ w1 ∧ ... ∧ wn

and let a(u) act on the same by the adjoint operation, ie, contraction:

a(u) w1 ∧ ... ∧ wn = c(n) Σ_{k=1}^n ⟨u, wk⟩ (−1)^{k−1} w1 ∧ ... ∧ w_{k−1} ∧ w_{k+1} ∧ ... ∧ wn

Then, it is easy to see that

a(u)a(v)∗ + a(v)∗ a(u) =< u, v >

Now we are in a position to describe a construction of a Clifford algebra over


an even dimensional vector space. Let V be a real vector space of dimension 2n
and let {e1 , ..., e2n } be a basis for this vector space satisfying

(ek |em ) = (en+k |en+m ) = 0, 1 ≤ k, m ≤ n,

(ek |en+m ) = δkm , 1 ≤ k, m ≤ n


Note that (.|.) is an inner product on V . We wish to construct a Clifford
structure so that
uv + vu = (u|v), u, v ∈ V

To this end, we consider the n-dimensional real vector space

W = span{e1 , ..., en }

and consider the 2^n-dimensional vector space ∧W where ∧ is an antisymmetric


tensor product on W . Then define the actions of ek and en+k , 1 ≤ k ≤ n on
∧W by
γ(ek ).(u1 ∧ ... ∧ ur ) = ek ∧ u1 ∧ ... ∧ ur , u1 , ..., ur ∈ W
and
γ(e_{n+k})(u1 ∧ ... ∧ ur) = c(r) Σ_{m=1}^r (−1)^{m−1} (e_{n+k}|um) (u1 ∧ ... ∧ ûm ∧ ... ∧ ur)

Then, it is immediate that for 1 ≤ k, r ≤ n,

γ(ek )γ(er ) + γ(er )γ(ek ) = 0,

γ(en+k )γ(en+r ) + γ(en+r )γ(en+k ) = 0,


γ(ek )γ(en+r ) + γ(en+r )γ(ek ) = δ(k, r)
and hence on extending the map γ linearly and by formally defining an element
u1 ...ur with u1 , ..., ur ∈ V so that

γ(u1 ...ur ) = γ(u1 )...γ(ur )

we get that
γ(u)γ(v) + γ(v)γ(u) = (u|v), u, v ∈ V
Thus, we get a Clifford structure on V and it is clear that the dimension of this
Clifford algebra is 2^{2n}.
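This construction can be realized with explicit matrices and checked numerically. In the sketch below (an illustration, not from the text), the basis of ∧W is encoded by bit masks over {e_1, ..., e_n}; γ(e_k) acts by wedging with e_k and γ(e_{n+k}) by the transposed (contraction) matrix, and the three displayed anticommutation relations are verified for n = 3:

```python
import numpy as np

n = 3
N = 2 ** n                      # dim ∧W = 2^n

def wedge(k):
    """Matrix of γ(e_k) on the subset basis of ∧W, with fermionic signs."""
    M = np.zeros((N, N))
    for s in range(N):
        if not (s >> k) & 1:
            # sign = (-1)^{number of basis vectors occupying positions before k}
            sign = (-1) ** bin(s & ((1 << k) - 1)).count('1')
            M[s | (1 << k), s] = sign
    return M

# γ(e_1),...,γ(e_n) wedge; γ(e_{n+1}),...,γ(e_{2n}) contract (the adjoints)
gam = [wedge(k) for k in range(n)] + [wedge(k).T for k in range(n)]

def anti(Aop, Bop):
    return Aop @ Bop + Bop @ Aop

for k in range(n):
    for r in range(n):
        assert np.allclose(anti(gam[k], gam[r]), 0)
        assert np.allclose(anti(gam[n + k], gam[n + r]), 0)
        expected = np.eye(N) if k == r else np.zeros((N, N))
        assert np.allclose(anti(gam[k], gam[n + r]), expected)   # = δ(k, r)
print("Clifford relations hold; the algebra has dimension", N * N)  # 2^{2n}
```

The printed dimension N² = 2^{2n} matches the count stated above, since the γ-matrices generate the full matrix algebra on the 2^n-dimensional space ∧W.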

[13] Bosonic string field theory.


The Bosonic string field is given by

X^μ(τ, σ) = x^μ + p^μ τ + i Σ_{n≠0} [(α(n)^μ/n) exp(2πin(τ−σ)) + (β(n)^μ/n) exp(2πin(τ+σ))]

where

α(n)^{μ∗} = α(−n)^μ, n ≠ 0
in order that X^μ be a Hermitian operator field; this string field satisfies the
wave equation

X^μ_{,ττ} − X^μ_{,σσ} = 0

that is derived from the action

S[X] = (1/2) ∫ (X^μ_{,τ} X_{μ,τ} − X^μ_{,σ} X_{μ,σ}) dτ dσ

Here, the metric of D-dimensional space-time is given by

((ημν )) = diag[1, −1, −1, ..., −1]

The space-time Lorentz group is SO(1, D−1). The canonical momentum field is

P_μ = δS/δX^μ_{,τ} = X_{μ,τ}

and hence the CCR gives

[X^μ(τ, σ), X_{ν,τ}(τ, σ′)] = i δ^μ_ν δ(σ − σ′)

or equivalently,

[X^μ(τ, σ), X^ν_{,τ}(τ, σ′)] = i η^{μν} δ(σ − σ′)
and hence we get

[Σ_{n≠0} (α(n)^μ/n) exp(2πin(τ−σ)) + (β(n)^μ/n) exp(2πin(τ+σ)), Σ_{m≠0} (α(m)^ν/m) exp(2πim(τ−σ′)) + (β(m)^ν/m) exp(2πim(τ+σ′))] = (−η^{μν}/2π) δ(σ − σ′)

from which we deduce that

[α(n)^μ, α(m)^ν] = (−n/4π²) η^{μν} δ(n + m),
[β(n)^μ, β(m)^ν] = (−n/4π²) η^{μν} δ(n + m),
[α(n)^μ, β(m)^ν] = 0

for all n, m ≠ 0. We now construct the string field Hamiltonian.

H = ∫ (P_μ X^μ_{,τ} − L) dσ = (1/2) ∫ (X^μ_{,τ} X_{μ,τ} + X^μ_{,σ} X_{μ,σ}) dσ
= p²/2 − 2π² Σ_{n≠0} (α^μ(−n) α_μ(n) + β^μ(−n) β_μ(n))

where

p² = p_μ p^μ = (p⁰)² − pⁱ pⁱ
The condition that the state of the system have total energy E gives (H −
E)|φ >= 0, or formally, on such states, H = E. On the other hand, from
relativistic mechanics, we know that the mass is given by M 2 = p2 = pμ pμ .
Thus, we get the result that the mass operator of the string is given by
2M² − 4π² Σ_{n≥1} (α(−n)·α(n) + β(−n)·β(n)) = E

or equivalently,

M² = E/2 + 2π² Σ_{n≥1} (α(−n)·α(n) + β(−n)·β(n))

Formally, the propagator of the (Bosonic) string is Δ = (H − E)^{−1}, which can be
expressed as

Δ = ∫₀¹ z^{H−E−1} dz

Equivalently, apart from scaling factors, the Bosonic string propagator is given by

Δ = ∫₀¹ z^L dz

where

L = −a Σ_{n≥1} α(n)·α(−n)

where a is a constant. We are absorbing the other modes β(n) into the α(n)'s.
We can evaluate the matrix elements of the propagator as follows.

[α(n)μ , α(−n)ν ] = nη μν

and if |w > is a coherent state of the set of harmonic oscillators α(n), n ≥ 1, we


have
α(n)|w >= w(n)|w >
Suppose |k1^{μ1}, k2^{μ2}, ... > are number states of the system of oscillators. Then, we
have

α(−n)·α(n)|k1 k2 ... > = k_n |k1 k2 ... >

where

k_n = k_n^0 − k_n^1 − ... − k_n^{D−1}

and hence

z^{Σ_{n≥1} α(−n)·α(n)} |k1 k2 ... > = (Π_{n≥1} z^{k_n}) |k1 k2 ... > = z^{Σ_n k_n} |k1 k2 ... >

whence

< m1 m2 ...| z^L |k1 k2 ... > = z^{−a Σ_n k_n} δ[m − k]

[14] Conformal weights


Let A(z) be an analytic function of the complex variable z. Suppose that
on applying the transformation w = w(z), or equivalently its inverse z = z(w),
this function changes to

B(w) = A(z)(dz/dw)^J

Then we say that A has conformal weight J. Consider an infinitesimal analytic


transformation

w = z + ε(z)

or equivalently,

z = w − ε(w)

since ε(z) is assumed to be of the first order of smallness. Then, if A has
conformal weight J, it gets transformed under this infinitesimal transformation
to A + δA where

A(w) + δA(w) = A(z)(1 − ε′(w))^J = A(z)(1 − Jε′(w))

or equivalently,

A(z) − A′(z)ε(z) + δA(z) = A(z)(1 − Jε′(z))

so that

δA(z) = A′(z)ε(z) − JA(z)ε′(z)
This is the condition for A to have conformal weight J. Now, consider the
following vertex function for a Bosonic string:
V(k, z) = :exp(ik·X(z)): = exp(k·Σ_{n≤−1} α(n) z^n/n)·exp(k·Σ_{n≥1} α(n) z^n/n)

We wish to compute its conformal weight. First we introduce the Fourier components of the energy-momentum tensor:

L_n = (1/2) Σ_m α(n − m) α(m)

and find that

[L_n, X(z)] = (1/2)[Σ_m α(n−m)α(m), i Σ_k (α(k) z^k/k)]
= (1/2) Σ_m {α(n−m), [α(m), i Σ_k α(k) z^k/k]}
= (1/2) Σ_m {α(n−m), −i z^{−m}} = −i Σ_m α(n−m) z^{−m}
= −i z^{−n} Σ_m α(m) z^m

On the other hand,

z dX(z)/dz = i Σ_m α(m) z^m

and hence we deduce that

[L_n, X(z)] = −z^{−n}·z dX(z)/dz

Now consider the infinitesimal transformation w = z + δz = z + ε(z). Under
this transformation, X(z) changes to X(z − ε(z)) = X(z) − ε(z)X′(z). Hence,
we can identify

ε(z) = ε·z^{−n}

for the infinitesimal Lie transformation ε·L_n.

[15] Super-strings
S[X, ψ] = (1/2) ∫ √h h^{αβ} X^μ_{,α} X_{μ,β} d²σ − ∫ ψ^{aT} ρ^α ψ^a_{,α} d²σ

Weyl scaling: First, we can choose our string coordinates (τ, σ) so that

((hαβ )) = exp(φ)diag[1, −1]

and then

h = exp(2φ), ((h^{αβ})) = exp(−φ) diag[1, −1], ((√h h^{αβ})) = diag[1, −1]

and so there is no need for Weyl scaling. Already, in our system of coordinates,
the Bosonic part of the string action is

(1/2) ∫ η^{αβ} X^μ_{,α} X_{μ,β} d²σ

The total angular momentum tensor

J^{μν} = ∫ (X^μ X^ν_{,τ} − X^ν X^μ_{,τ}) dσ

is conserved. Indeed, by the equations of motion,

dJ^{μν}/dτ = ∫₀¹ (X^μ X^ν_{,ττ} − X^ν X^μ_{,ττ}) dσ
= ∫₀¹ (X^μ X^ν_{,σσ} − X^ν X^μ_{,σσ}) dσ
= ∫₀¹ (X^μ X^ν_{,σ} − X^ν X^μ_{,σ})_{,σ} dσ = 0

The conservation of the angular momentum can also be deduced as a Noether


current conservation corresponding to the invariance of the Lagrangian under
Lorentz transformations of the D-dimensional space-time. Writing

X^μ(τ, σ) = i Σ_{n≠0} [(α^μ(n)/n) exp(−2πin(τ−σ)) + (β^μ(n)/n) exp(−2πin(τ+σ))]

we get that

J^{μν} = Σ_{n≠0} n^{−1} (α^μ(n)α^ν(−n) − α^ν(n)α^μ(−n)) + Σ_{n≠0} n^{−1} (β^μ(n)β^ν(−n) − β^ν(n)β^μ(−n))

Note that we have used

[α^μ(n), β^ν(m)] = 0
Now we look at the Fermionic part of the super-string equations:

ρ^α ψ^a_{,α} = 0

ie,

ρ⁰ ψ^a_{,τ} + ρ¹ ψ^a_{,σ} = 0
We first start with the Fermionic Lagrangian density

L_F = ψ^{aT} ρ^α ψ^a_{,α} = ψ^{aT} ρ⁰ ψ^a_{,τ} + ψ^{aT} ρ¹ ψ^a_{,σ}

The canonical momentum field is

P_a = ∂L_F/∂ψ^a_{,τ} = −ρ⁰ ψ^a

Note that ρ^α, α = 0, 1, are 2 × 2 skew-symmetric matrices. Thus,

ψ^a = ρ⁰ P_a

and the Hamiltonian density is given by

H_F = P_a^T ψ^a_{,τ} − L_F = −ψ^{aT} ρ¹ ψ^a_{,σ}

The CAR are

{ψ^a(τ, σ), P_b(τ, σ′)^T} = δ^a_b δ(σ − σ′) I₂

or equivalently,

{ψ^a(τ, σ), ψ^b(τ, σ′)^T} ρ⁰ = δ^a_b δ(σ − σ′) I₂

or equivalently,

{ψ^a(τ, σ), ψ_b(τ, σ′)^T} = −δ^a_b δ(σ − σ′) ρ⁰

Note that ψ a are Majorana Fermions, which means that

ψ a∗ = ψ aT ρ0

and hence
ψ̄ a = ψ a∗ ρ0 = ψ aT
We can expand the solution as
ψ^a(τ, σ) = Σ_n S^a(n, τ) exp(−2πinσ)

For this to satisfy the above equation of motion, we require that

ρ0 ∂τ S a (n, τ ) = 2πinρ1 S a (n, τ )

and hence
S a (n, τ ) = exp(2πinτ A)S a (n)
where
A = ρ0 ρ1
The eigenvalues of A are ±1 and therefore as in the Bosonic case, we again
get the result that in the Fermionic case, the wave field is a superposition of a
forward and a backward travelling wave:

A = e0 eT0 − e1 eT1 , eT0 e1 = 0, eT0 e0 = eT1 e1 = 1

and hence

exp(2πinτ A) = exp(2πinτ )e0 eT0 + exp(−2πinτ )e1 eT1

and hence

S a (n, τ ) = exp(2πinτ )P0 S a (n) + exp(−2πinτ )P1 S a (n)

where
P0 = e0 eT0 , P1 = e1 eT1
and hence,

ψ^a(τ, σ) = Σ_n [P₀ S^a(n) exp(2πin(τ−σ)) + P₁ S^a(n) exp(−2πin(τ+σ))]

For simplicity of notation, we denote P0 S a (n) by S a (n) and P1 S a (n) by T a (n).


Then, we can express the solution for this two dimensional massless Dirac equation as

ψ^a(τ, σ) = Σ_n [S^a(n) exp(2πin(τ−σ)) + T^a(n) exp(−2πin(τ+σ))]

We are now in a position, to derive the CAR satisfied by S a (n), T a (n), n ∈ Z


for our Fermionic string.
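The projector decomposition used above can be verified numerically with the explicit choice ρ⁰ = σ₁, ρ¹ = iσ₂ introduced in section [16]. This check is my own illustration, not from the text; the mode number and τ value are arbitrary:

```python
import numpy as np

rho0 = np.array([[0, 1], [1, 0]], dtype=complex)     # σ1
rho1 = np.array([[0, 1], [-1, 0]], dtype=complex)    # iσ2
A = rho0 @ rho1                                      # = -σ3, eigenvalues ±1
assert np.allclose(A @ A, np.eye(2))

# Spectral projectors onto the ±1 eigenspaces of A
P0 = (np.eye(2) + A) / 2
P1 = (np.eye(2) - A) / 2
assert np.allclose(P0 @ P0, P0) and np.allclose(P1 @ P1, P1)
assert np.allclose(P0 @ P1, 0)

n, tau = 2, 0.37
x = 2j * np.pi * n * tau
# Since A² = I, the power series gives exp(xA) = cosh(x) I + sinh(x) A
expA = np.cosh(x) * np.eye(2) + np.sinh(x) * A
split = np.exp(x) * P0 + np.exp(-x) * P1
assert np.allclose(expA, split)
print("exp(2πinτA) = exp(2πinτ) P0 + exp(-2πinτ) P1 verified")
```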

[16] Energy-Momentum tensor and super-current for a superstring

L = (1/2) X^μ_{,α} X_μ^{,α} + ψ^{μT} ρ⁰ (ρ⁰∂₀ + ρ¹∂₁) ψ_μ

where
ρ0 = σ1 , ρ1 = iσ2
Thus
(ρ0 )2 = I, ρ0 ρ1 = iσ1 σ2 = −σ3

Then writing

ψ^μ = [ψ^μ_+, ψ^μ_−]^T

we get that the Fermionic component of the superstring Lagrangian density is
given by

L_F = ψ^{μT} ρ⁰ (ρ⁰∂₀ + ρ¹∂₁) ψ_μ = ψ^{μT} (I∂₀ − σ₃∂₁) ψ_μ = ψ^μ_+ ∂_− ψ_{μ+} + ψ^μ_− ∂_+ ψ_{μ−}
where
∂ ± = ∂0 ± ∂ 1
Note that we may define

x+ = (τ + σ)/2, x− = (τ − σ)/2

and then
∂0 = (1/2)(∂/∂x+ + ∂/∂x− )
∂1 = (1/2)(∂/∂x+ − ∂/∂x− )
and hence
∂/∂x+ = ∂0 + ∂1 = ∂+ ,
∂/∂x− = ∂0 − ∂1 = ∂−
The Euler-Lagrange equations for the Fermionic components are easily seen to
be
∂+ ψ− = 0, ∂− ψ+ = 0
μ
where ψ+ is an abbreviation for ((ψ+ )) and likewise ψ− is an abbreviation for
μ
ψ− . These equations imply that ψ+ is any function of τ − σ only and ψ− is
any function of τ + σ only. The solutions to these equations are of two kinds
depending upon the boundary conditions. These boundary conditions can be
described as follows. The spatial string variable σ is assumed to vary over
[0, π]. The variation in S_F = ∫₀^π L_F dσ gives us the following boundary term on
integration by parts:
ψ+ δψ+ − ψ− δψ−
This must vanish at the boundary, ie, at σ = 0, π. To make it vanish at σ = 0,
we assume the boundary condition that

ψ+ (τ, 0) = ψ− (τ, 0)

and hence
δψ+ (τ, 0) = δψ− (τ, 0)
To make it vanish at σ = π, we may assume that the boundary condition that
either
ψ+ (τ, π) = ψ− (τ, π)

and hence
δψ+ (τ, π) = δψ− (τ, π)
or else
ψ+ (τ, π) = −ψ− (τ, π)
and hence
δψ+ (τ, π) = −δψ− (τ, π)
The first kind of boundary conditions implies that ψ± admit the modal expansions

ψ^μ_+(τ, σ) = Σ_{n∈Z} b^μ(n) exp(in(τ−σ)),

ψ^μ_−(τ, σ) = Σ_{n∈Z} b^μ(n) exp(in(τ+σ))

and the second kind of boundary conditions implies that

ψ^μ_+(τ, σ) = Σ_{n∈Z+1/2} b^μ(n) exp(in(τ−σ)),

ψ^μ_−(τ, σ) = Σ_{n∈Z+1/2} b^μ(n) exp(in(τ+σ))

This is the situation for general strings, open or closed. If we further restrict to
closed strings, then we must impose the periodic boundary conditions, that for
any solutions, the value at σ = 0 must coincide with the value at σ = π. This
additional restriction implies that we obtain the following modal expansions:
ψ^μ_+(τ, σ) = Σ_{n∈Z} b^μ(n) exp(2in(τ−σ))

and

ψ^μ_−(τ, σ) = Σ_{n∈Z} b^μ(n) exp(2in(τ+σ))

for the first kind of boundary conditions and for the second kind, we do not
have any periodic solutions unless we extend the spatial domain to the range
[−π, π] in which case, we get the (2π periodic) solution
ψ^μ_+(τ, σ) = Σ_{n∈Z+1/2} b^μ(n) exp(2in(τ−σ))

and

ψ^μ_−(τ, σ) = Σ_{n∈Z+1/2} b^μ(n) exp(2in(τ+σ))

We now compute the components of the energy-momentum tensor as well as


the super-current of our superstring. The Fermionic components are

T_F^{++} = (∂L_F/∂(∂_+ψ_−))·∂_+ψ_− = ψ_−·∂_+ψ_− = 0

T_F^{−+} = (∂L_F/∂(∂_−ψ_+))·∂_+ψ_+ = ψ_+·∂_+ψ_+

T_F^{+−} = (∂L_F/∂(∂_+ψ_−))·∂_−ψ_− = ψ_−·∂_−ψ_−

and finally,

T_F^{−−} = (∂L_F/∂(∂_−ψ_+))·∂_−ψ_+ = ψ_+·∂_−ψ_+ = 0
Now, we choose a coordinate system followed by a Weyl rescaling so that the
metric becomes

ds² = dτ² − dσ² = d(τ−σ)·d(τ+σ) = dx^− dx^+

so that in the (x^+, x^−) system, the world-sheet metric tensor is given by

[ g_{++}  g_{+−} ]   [ 0  1 ]
[ g_{−+}  g_{−−} ] = [ 1  0 ]

ie,

g_{++} = g_{−−} = 0, g_{+−} = g_{−+} = 1
and then we get that the covariant components of the Fermionic part of the
energy-momentum tensor are given by

T_{F++} = g_{+−} T_F^{−+} = T_F^{−+} = ψ_+·∂_+ψ_+

Likewise,

T_{F−−} = g_{−+} T_F^{+−} = ψ_−·∂_−ψ_−

T_{F+−} = g_{+−} T_F^{−−} = T_F^{−−} = ψ_+·∂_−ψ_+ = 0

T_{F−+} = g_{−+} T_F^{++} = 0
We note the Fermionic energy-momentum conservation laws. Since

∂^+ = g^{+−} ∂_− = ∂_−,  ∂^− = g^{−+} ∂_+ = ∂_+

we have

∂^+ T_{F++} + ∂^− T_{F−+} = ∂_− T_{F++} = ∂_−(ψ_+ ∂_+ ψ_+) = 0

since

∂_− ψ_+ = 0, ∂_− ∂_+ = ∂_+ ∂_−

Likewise,

∂^− T_{F−−} + ∂^+ T_{F+−} = ∂_+ T_{F−−} = ∂_+(ψ_− ∂_− ψ_−) = 0

since

∂_+ ψ_− = 0
We now also observe that the Noether theorem applied to the supersymmetry
invariance of the combined Boson-Fermion action implies the conservation of the
supercurrent. The components of the supercurrent are obtained by observing
that the variation of the total action under an infinitesimal local supersymmetry
transformation is given by an expression of the form

δ_{ε,susy} S = ∫ (∂_α ε(σ)) J^α(σ) d²σ

and hence if the equations of motion are satisfied, then the above variation must
vanish for all infinitesimal local parameters and hence the supercurrent must
be conserved:
∂α J α (σ) = 0
This conservation law can be stated in an alternate form as

∂− J+ = 0, ∂+ J− = 0

where
J+ = ∂+ X μ ψμ+ , J− = ∂− X μ ψμ−
In this form, it is immediate to see that these currents are conserved from the
equations of motion:

∂_− ∂_+ = ∂_α ∂^α

and hence

∂_− ∂_+ X^μ = 0,  ∂_− ψ_{μ+} = 0

[17] Super-symmetric, gauge invariant and Lorentz invariant action


for non-Abelian gauge fields

D = γ5 ε ∂_θ − ε γ^μ θ ∂_μ

D_L = (1 + γ5)D/2, D_R = (1 − γ5)D/2

V^A(x, θ) is the gauge superfield which, in the Wess-Zumino gauge, can be expressed as

V^A(x, θ) = θ^T ε γ^μ θ·V^A_μ(x) + θ^T ε θ·θ^T ε λ^A(x) + (θ^T ε θ)² D^A



We put
θL = (1 + γ5 )θ/2, θR = (1 − γ5 )θ/2
Then,

D_L = ε ∂_{θ_L} − ε γ^μ θ_R ∂_μ

D_R = −ε ∂_{θ_R} − ε γ^μ θ_L ∂_μ
where we have used the identity,

γ5 ε γ^μ + ε γ^μ γ5 = 0,

Note that V A is not a Chiral field. We write

t.V = tA V A

where summation over the non-Abelian gauge index A is implied. {tA } form
a complete set of Hermitian generators for the gauge group assumed to be a
subgroup of U (N ). We define the left Chiral fields

W^A_a(x, θ) = D_R^T ε D_R (exp(−t·V)·D_{La} exp(t·V))

Note that Φ is a left Chiral superfield iff DR Φ = 0 and it is right Chiral iff


DL Φ = 0. By definition, Φ is left Chiral iff it is a function of only θL and

x^μ_+ = x^μ + θ_R^T ε γ^μ θ_L

Note that
D_R x^μ_+ = −γ^μ θ_L + (−ε)(ε γ^μ θ_L) = 0, using ε² = −1,
and likewise, Φ is right Chiral iff it is a function of only θ_R and

x^μ_− = x^μ − θ_R^T ε γ^μ θ_L

We note that
DL xμ− = 0
Also,
D_R θ_{La} = −ε ∂_{θ_R} θ_{La} = 0
and likewise,
DL θRa = 0
since
(1 + γ5 )(1 − γ5 ) = (1 − γ5 )(1 + γ5 ) = 0
These relations can be expressed in matrix notation as

D_L^T θ_R = D_R^T θ_L = 0

Also note that

θ_R^T ε γ^μ θ_L = θ^T (1 − γ5) ε γ^μ (1 + γ5) θ/4 = θ^T (1 − γ5) ε γ^μ θ/2
Then,

V^A(x, θ) = θ^T ε γ^μ θ·V^A_μ(x) + θ^T ε θ·θ^T ε λ^A(x) + (θ^T ε θ)² D^A

gives

exp(t·V) = 1 + θ^T ε γ^μ θ·V^A_μ(x) t_A + θ^T ε θ·θ^T ε λ^A(x) t_A + (θ^T ε θ)² D^A t_A + (θ^T ε γ^μ θ·V^A_μ(x) t_A)²/2

The last term here is the same as

θ^T ε γ^μ θ·θ^T ε γ^ν θ·V^A_μ V^B_ν t_A t_B

[18] Lagrangian for Abelian gauge superfields


[1] Show that

L = c1 F_{μν} F^{μν} + c2 λ^T ε γ5 γ^μ ∂_μ λ + c3 D²
is supersymmetry invariant for an appropriate choice of the constants c1 , c2 , c3 .
Here
Fμν = Vν,μ − Vμ,ν
Here the superfield is given by

S[x, θ] = C(x) + θT ω(x) + θT θM (x)

+θT γ5 θN (x) + θT γ μ θ.Vμ (x)


+θT θθT γ5 (λ(x) + a.γ μ ∂μ ω(x))
+(θT θ)2 (D(x) + bC(x))
The supersymmetry generator is given by αT L where α is a Majorana Fermionic
parameter and
L = γ5 ε ∂_θ + ε γ^μ θ ∂_μ
We have
δC(x) = α^T γ5 ε² ω(x) = −α^T γ5 ω(x),

θ^T ε θ·δM(x) + θ^T ε γ5 θ·δN(x) + θ^T ε γ^μ θ·δV_μ(x)
= α^T ε γ^μ θ·θ^T ε ω_{,μ}(x) + α^T γ5 ε ∂_θ [θ^T ε θ·θ^T ε γ5 (λ + a γ^μ ω_{,μ})]

Now, we observe that for any function w(x) ∈ C⁴, we have

∂_θ (θ^T ε θ·θ^T ε w) = 2 ε θ·θ^T ε w + θ^T ε θ·ε w

[19] Problems in supersymmetric quantum theory.


[1] Consider the superfield

S[x, θ] = C(x) + θ^T ε ω(x) + θ^T ε θ·M(x) + θ^T ε γ5 θ·N(x) + θ^T ε γ^μ θ·V_μ(x)
+ θ^T ε θ·θ^T ε γ5 (λ(x) + a·γ^μ ω_{,μ}) + (θ^T ε θ)² (D(x) + b·□C(x))
The infinitesimal supersymmetry transformation is

D = α T γ5 L

where
L = γ5 ε ∂_θ + ε γ^μ θ ∂_μ
Then the change in the component fields under such an infinitesimal supersym-
metry transformation are given by

δC(x) = −α^T γ5 ω(x)

θ^T ε δω(x) = α^T γ5·[ε γ^μ θ C_{,μ}(x) + 2 γ5 ε θ M + 2(γ5)² ε θ N + 2 γ5 ε γ^μ θ·V_μ]

or equivalently,

δω(x) = (γ5 γ^μ)^T α·C_{,μ} + 2α M + 2γ5 α·N + 2γ^μ α V_μ

or equivalently,

δω(x) = γ5 γ^μ α·C_{,μ} + 2α·M + 2γ5 α·N + 2γ^μ α·V_μ

Likewise,
θ^T ε θ·δM + θ^T ε γ5 θ·δN + θ^T ε γ^μ θ·δV_μ
= α^T γ5 [ε γ^μ θ·θ^T ε ω_{,μ} + γ5 ε ∂_θ (θ^T ε θ·θ^T ε γ5 (λ + a γ^μ ω_{,μ}))]
= α^T γ5 [ε γ^μ θ·θ^T ε ω_{,μ} + γ5 (2 ε θ·θ^T ε γ5 (λ + a γ^μ ω_{,μ}) + (θ^T ε θ) ε γ5 (λ + a γ^μ ω_{,μ}))]
= α^T γ5 ε γ^μ θ·θ^T ε ω_{,μ} − 2 α^T θ·θ^T ε γ5 (λ + a γ^μ ω_{,μ}) − (θ^T ε θ) α^T γ5 (λ + a γ^μ ω_{,μ})
Writing

θ θ^T = c1 θ^T ε θ·ε + c2 θ^T ε γ5 θ·γ5 ε + c3 θ^T ε γ^μ θ·ε γ_μ

we get, on equating coefficients of θ^T ε θ, θ^T ε γ5 θ and θ^T ε γ^μ θ respectively, the
equations

δM = α^T γ5 (2c1 λ + ((2c1 − 1)a − c1) γ^μ ω_{,μ}),

δN = 2c2 α^T λ + c2(1 + 2a) α^T γ^μ ω_{,μ} = c2 α^T (2λ + (1 + 2a) γ^μ ω_{,μ})

[20] Bosonic string theory: Derivation of the Einstein field equations for
gravitation in vacuum based on conformal invariance of the string action.
(τ, σ) represent respectively the time variable and the length variable along
the string, τ ≥ 0, 0 ≤ σ ≤ 1. The string is assumed to be D dimensional, so
that any point on its surface is parameterized by (τ, σ): X μ = X μ (τ, σ). The
space-time metric on the string surface is a two dimensional metric given by
hαβ (τ, σ). Thus, the string action functional is given by


S1(X) = ∫ h^{αβ}(τ, σ) √(−h) g_{μν}(X(τ, σ)) X^μ_{,α} X^ν_{,β} dτ dσ

We usually assume the string metric to be flat, ie

((hαβ )) = diag[1, −1]

Then writing
∂ α = hαβ ∂β
we get
∂ 0 = ∂0 = ∂/∂τ,
∂ 1 = −∂1 = −∂/∂σ
and then the string action can be expressed as

S1[X] = ∫ g_{μν}(X) ∂_α X^μ·∂^α X^ν dτ dσ

Under an infinitesimal conformal Weyl transformation, the metric changes to

exp(ε φ(X)) g_{μν}(X) = (1 + ε φ(X)) g_{μν}(X)

and then the change in the string action becomes

ε ∫ φ(X) g_{μν}(X) ∂_α X^μ·∂^α X^ν dτ dσ

What we require is a quantum average of this variation in the action. To evaluate


this, we write
X μ (τ, σ) = X0μ (τ, σ) + xμ (τ, σ)
where xμ (τ, σ) is a small quantum fluctuation. The propagator of this fluctua-
tion can be derived using the Green’s function method applied to the equations
of motion of the free string. Specifically, we have the flat space-time equations
of motion
∂τ2 X μ (τ, σ) = ∂σ2 X μ (τ, σ)
and then defining the propagator as

Gμν (τ, σ|τ  , σ  ) =< T (X μ (τ, σ).X ν (τ  , σ  )) >

= θ(τ − τ  ) < X μ (τ, σ).X ν (τ  , σ  ) > +θ(τ  − τ ) < X ν (τ  , σ  ).X μ (τ, σ) >
the equations of motion and the equal time Bosonic commutation relations

[∂_τ X^μ(τ, σ), X^ν(τ, σ′)] = η^{μν} δ(σ − σ′)

imply that the string propagator satisfies the following pde:

(∂²_τ − ∂²_σ) G^{μν}(τ, σ|τ′, σ′) = η^{μν} δ(τ − τ′)·δ(σ − σ′)

Hence we can formally express this propagator as

G^{μν}(τ, σ|τ′, σ′) = G^{μν}(τ − τ′, σ − σ′) = η^{μν} ∫ (d²k/k²) exp(i(k₁(τ − τ′) − k₂(σ − σ′)))

In this expression,

d²k = dk₁ dk₂,  k² = k₁² − k₂²
We now evaluate the change in the string action caused by a Weyl conformal
transformation of the metric:

g_{μν}(X) → exp(ε φ(X)) g_{μν}(X) = (1 + ε φ(X)) g_{μν}(X)

The change in the action under such a transformation is

δS[X] = ε ∫ φ(X) g_{μν}(X) ∂_α X^μ·∂^α X^ν dτ dσ

Now we evaluate the average of this quantity:

gμν (X) ≈ gμν (X0 ) + (1/2)gμν,ρσ (X0 )xρ xσ

In a system of normal coordinates around X0μ (τ, σ), we have

gμν,ρσ (X0 ) = Rμρνσ (X0 )



since in such a normal coordinate system, the first order partial derivatives of
gμν vanish at X0 . Further, from the above calculation of the propagator, using
dimensional regularization,

< x^ρ(τ, σ)·x^σ(τ, σ) > = η^{ρσ} lim_{τ′→τ, σ′→σ} ∫ d^{2+ε}k exp(i(k₁(τ − τ′) − k₂(σ − σ′)))/k²
= η^{ρσ} ∫ k^{1+ε} dk/k² = η^{ρσ} ∫ dk/k^{1−ε} ≈ η^{ρσ}/ε

and hence we get

< δS[X] > = ∫ η^{ρσ} φ(X₀) R_{μρνσ}(X₀(τ, σ)) ∂_α X₀^μ ∂^α X₀^ν dτ dσ
= ∫ φ(X₀) R_{μν}(X₀) ∂_α X₀^μ ∂^α X₀^ν dτ dσ

and the condition for this variation to be zero, ie conformal invariance of the
quantum averaged string action is that

Rμν = 0

ie the Einstein field equations be satisfied. This is true in a normal coordinate


system, but since Rμν is a tensor, it should be true in all reference frames. Thus,
we have proved that the Einstein field equations in vacuum naturally follow from
the conformal invariance of the string action.

[21] Virasoro algebra in bosonic quantum string theory


Study project. Expand the bosonic string field as a Fourier series describing
forward and backward propagating waves having as coefficients creation and
annihilation operators and express the components of the energy momentum
tensor as a Fourier series with coefficients being quadratic combinations of cre-
ation and annihilation operators. Derive the Virasoro Lie algebra commutation
relations between these Fourier coefficients of the energy-momentum tensor in-
troducing central charge terms caused by the ambiguity of the operator ordering
in the energy component.
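Before quantization, the modes L_n close under the Witt algebra, [l_m, l_n] = (m − n) l_{m+n}; the central charge term described above arises only from the quantum ordering ambiguity. The classical bracket can be previewed symbolically with the representation l_n = −z^{n+1} d/dz (my own illustration, not from the text):

```python
import sympy as sp

z = sp.symbols('z')
f = sp.Function('f')(z)

def l(nn, g):
    # Witt algebra generator l_n = -z^{n+1} d/dz acting on g(z)
    return -z**(nn + 1) * sp.diff(g, z)

def commutator(m, nn, g):
    return sp.expand(l(m, l(nn, g)) - l(nn, l(m, g)))

# Check [l_m, l_n] = (m - n) l_{m+n} on a generic test function
for m, nn in [(3, -2), (1, 0), (2, 2)]:
    assert sp.simplify(commutator(m, nn, f) - (m - nn) * l(m + nn, f)) == 0
print("[l_m, l_n] = (m - n) l_{m+n} verified")
```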
Chapter 8

Superconductivity

Notes on superconductivity using path integrals


[1] The action for the electrons is described by

S[ψ] = ∫ ψ_a(x)^∗ (i∂_t − E(−i∇ + eA(x)) + eV(x)) ψ_a(x) d⁴x
+ ∫ V_{ab}(x, y) ψ_a(x)^∗ ψ_a(x) ψ_b(y)^∗ ψ_b(y) d⁴x d⁴y

To calculate the quantum effective action, we must first evaluate the path integral

Γ(A, V) = ∫ exp(iS[ψ]) Dψ·Dψ^∗

and then derive formulas for the superconductivity current etc. from this effec-
tive action. We add another term to this action, namely

ΔS[ψ, Ψ] = −∫ V_{ab}(x, y)(ψ_a(x)^∗ ψ_b(y)^∗ − Ψ_{ab}(x, y)^∗)(ψ_a(x) ψ_b(y) − Ψ_{ab}(x, y)) d⁴x d⁴y

It is immediate that the path integral corresponding to the modified action,

∫ exp(iS[ψ] + iΔS[ψ, Ψ]) Dψ·Dψ^∗·DΨ·DΨ^∗,

is the same as Γ(A, V ) except for a multiplicative constant that is independent


of the external electromagnetic field A, V . Now,

S[ψ] + ΔS[ψ, Ψ] = ∫ ψ_a(x)^∗ (i∂_t − E(−i∇ + eA(x)) + eV(x)) ψ_a(x) d⁴x
+ ∫ [V_{ab}(x, y)(ψ_a(x)^∗ ψ_b(y)^∗ Ψ_{ab}(x, y) + ψ_a(x) ψ_b(y) Ψ_{ab}(x, y)^∗
− Ψ_{ab}(x, y)^∗ Ψ_{ab}(x, y))] d⁴x d⁴y


Remark: ψa (x) describes the Fermion wave field of a-type particles at the
space-time point x while ψa (x)∗ ψa (x) describes the number density operator
for the Fermion field of a-type particles at the space-time point x. Vab (x, y)
described the interaction potential between one particle of a type located at x
and another particle of b type located at y. Ψab (x, y) describes the Cooper pair
field formed by a Fermion of type a located at x with another another Fermion of
type b located at y. In terms of matrices, we can write the various components
of the above Lagrangian density as
L1 = δ(y − x) ψ_a(y)^∗ (i∂_t − E(−i∇ + eA(x)) + eV(x)) ψ_a(x)
+ (ψ_a(x)^∗ ψ_b(y)^∗ Ψ_{ab}(x, y) + ψ_a(x) ψ_b(y) Ψ_{ab}(x, y)^∗)

= [ψ(y)^∗, ψ(y)] [ δ(y − x)(i∂_t − E(−i∇ + eA(x)) + eV(x))   V(x, y) ⊗ Ψ(x, y)
                   V(x, y) ⊗ Ψ(x, y)^∗   δ(y − x)(i∂_t − E(−i∇ + eA(x)) + eV(x)) ] [ψ(x)
                                                                                     ψ(x)^∗]

The associated path integral for this component, taken over ψ, ψ^∗, has the form

det[ ω − E(−i∇ + eA) + eV   V ⊗ Ψ
     V ⊗ Ψ^∗   ω − E(−i∇ + eA) + eV ]·exp(−i ∫ V_{ab}(x, y) Ψ_{ab}(x, y)^∗ Ψ_{ab}(x, y) d⁴x d⁴y)

where ω = i∂t is in the frequency domain. The logarithm of this quantity is the
quantum effective action. It is therefore given by

Γ(A, V, Ψ) = ∫ V_{ab}(x, y) Ψ_{ab}(x, y)^∗ Ψ_{ab}(x, y) d⁴x d⁴y
+ i·log det[ ω − E(−i∇ + eA) + eV   V ⊗ Ψ
             V ⊗ Ψ^∗   ω − E(−i∇ + eA) + eV ]
Some properties of the quantum effective potential: Consider the path integral
for an action S[φ] with a coupling current:
Z(J) = ∫ exp(iS[φ] + i ∫ J φ·d⁴x) Dφ

We have
δlog(Z(J))/δJ(x) = i < φ >J (x)
Consider the equation (as in large deviation theory)
(δ/δJ(x)) (i ∫ J ψ d⁴x − log(Z(J))) = 0

This gives
ψ(x) =< φ >J (x)
Let the solution to this equation be given by

J(x) = Jψ (x)

Then, define the quantum effective action as



Γ(ψ) = i ∫ J_ψ·ψ·d⁴x − log(Z(J_ψ))

Γ(ψ) is called the quantum effective action corresponding to the action S. We


have

δΓ(ψ)/δψ(x) = iJ_ψ(x) + i ∫ (δJ_ψ(y)/δψ(x))·ψ(y) d⁴y − ∫ (δ log(Z(J))/δJ(y))|_{J=J_ψ}·(δJ_ψ(y)/δψ(x)) d⁴y

= iJ_ψ(x) + i ∫ (δJ_ψ(y)/δψ(x))·ψ(y) d⁴y − i ∫ ψ(y)·(δJ_ψ(y)/δψ(x)) d⁴y

= iJ_ψ(x)
Now suppose that the original action and path measure is invariant under an
infinitesimal transformation φ → φ(x) + χ(x). Then what can we say about a
corresponding invariance of the quantum effective action ? We observe that

Simulating path integrals for fields using MATLAB. Consider for example
the KG path integral in an external electromagnetic field:
Z[A_μ] = ∫ exp(i ∫ [(1/2)(∂_μ + ieA_μ(x))φ(x)·(∂^μ − ieA^μ(x))φ(x) − (m²/2)φ²] d⁴x) Dφ

This integral is evaluated as an infinite dimensional determinant using Gaussian


integration theory:

log(Z[A_μ]) = log(det((∂_μ + ieA_μ(x))(∂^μ − ieA^μ(x)) − m²I/2))

Reference:
D.Swaroop and H.Parthasarathy, ”Simulation using MATLAB of the quan-
tum effective action for Cooper pairs given the electromagnetic field”, Technical
Report, NSUT, 2019.
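The log-determinant evaluation can be previewed in a minimal discretized setting. The sketch below is my own 1D Euclidean analogue with illustrative parameters, not the cited report's MATLAB code: it builds a lattice covariant derivative for an external potential A(x) on a periodic chain and evaluates log Z as a Gaussian log-determinant.

```python
import numpy as np

N, L, m, e = 64, 10.0, 1.0, 0.3
dx = L / N
xs = np.arange(N) * dx
A = 0.1 * np.sin(2 * np.pi * xs / L)      # illustrative external potential

# Forward-difference covariant derivative D = d/dx - ieA(x), with a link phase
D = np.zeros((N, N), dtype=complex)
for j in range(N):
    D[j, (j + 1) % N] = np.exp(-1j * e * A[j] * dx) / dx
    D[j, j] = -1.0 / dx

M = D.conj().T @ D + m**2 * np.eye(N)     # Euclidean operator -D² + m², positive definite
M = (M + M.conj().T) / 2                  # symmetrize against rounding
sign, logdet = np.linalg.slogdet(M)
logZ = -0.5 * logdet                      # Gaussian integration: Z ∝ det(M)^{-1/2}
assert abs(sign - 1) < 1e-8
print("log Z =", logZ)
```

For a positive-definite operator the sign of the determinant is 1 and the whole computation reduces to a single stable `slogdet` call, which is the practical point of casting the path integral as a determinant.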

[2] Ginzburg-Landau theory of superconductivity


The joint Lagrangian density of the em field and the Cooper pair field is
given by
L(A_μ, ψ) = ψ_n(x)^∗ γ⁰ [γ^μ (iδ_{nm} ∂_μ + eA_μ(x) t_{nm}) − m₀ δ_{nm}] ψ_m(x)
+ F(ψ_n^∗(x) ψ_n(x)) − (1/4) F_{μν} F^{μν}
248 Advanced Probability and Statistics: Applications to Physics and Engineering

in the Dirac picture, or in the stationary Schrodinger picture

L = (h2 /2m0 )|(δnm ∇+ieA(x)tnm /h)ψm (x)|2 +F (ψn (x)∗ ψn (x))−(1/4)Fμν F μν

where
ψn (x) = ρn (x).exp(iφn (x))
Substituting this into the Schrodinger Lagrangian, we get

∇ψn = (∇ρn + iρn ∇φn ).exp(iφn ),

so that
(δ_{nm} ∇ + ieA t_{nm}/h) ψ_m = (∇ρ_n + iρ_n ∇φ_n) exp(iφ_n) + (ieA/h) t_{nm} ρ_m exp(iφ_m)
Summation here over the repeated index m is implied. For a single Cooper pair
field, this reduces to

(∇ρ + iρ∇φ + (ieA/h)ρ)exp(iφ)

Thus, in this special case, the Lagrangian density reduces to

(1/2)(∇ρ)2 + (ρ2 /2)(∇φ + eA/h)2 + F (ρ2 ) − (1/4)Fμν F μν

When we minimize this integral, then it naturally follows that |∇φ + eA/h| will
become very small, ie, the magnetic vector potential will be close to a perfect
gradient which means that the magnetic field B = ∇ × A will nearly be expelled
from the body of the superconductor. This is precisely the Meissner effect. In
the general case of several Cooper pair fields, we have

Σ_n |(δ_{nm} ∇ + ieA t_{nm}/h) ψ_m|² =
Σ_n (∇ρ_n)² + Σ_{k,n,m,s} (δ_{nm} ρ_n ∇φ_n + (eA/h) t_{nm} ρ_m, δ_{ks} ρ_k ∇φ_k + (eA/h) t_{ks} ρ_s) cos(φ_m − φ_s)
+ 2 Σ_{n,m} (∇ρ_n, δ_{nm} ρ_n ∇φ_n + (eA/h) t_{nm} ρ_m) sin(φ_n − φ_m)

The Ginzburg-Landau Lagrangian density can in fact be extended to describe
the motion of Cooper pairs in the presence of non-Abelian gauge fields in
place of the electromagnetic field.
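The Meissner-effect argument above can be illustrated in one dimension: on an open chain the phase φ can absorb eA/h exactly, so the minimized gauge term vanishes. The following sketch is my own illustration with assumed parameters, not from the text:

```python
import numpy as np

N, L, e_over_h = 200, 1.0, 5.0
dx = L / N
xs = (np.arange(N) + 0.5) * dx
A = np.cos(2 * np.pi * xs / L) + 0.5     # illustrative vector potential on links

def energy(phi):
    # Discretized gauge term (ρ frozen to 1): 0.5 ∫ (∇φ + eA/h)² dx
    r = np.diff(phi) / dx + e_over_h * A
    return 0.5 * np.sum(r**2) * dx

phi0 = np.zeros(N + 1)                   # φ = 0: the gauge term costs energy
# Exact minimizer: φ'(x) = -eA(x)/h, built by a cumulative sum
phi_min = np.concatenate([[0.0], -np.cumsum(e_over_h * A) * dx])
assert energy(phi_min) < 1e-10 < energy(phi0)
print("energy before:", energy(phi0), "after:", energy(phi_min))
```

On a closed ring with nonzero total flux the phase cannot absorb eA/h completely; that residual is the topological obstruction behind flux quantization, which this open-chain sketch deliberately avoids.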
Chapter 9

Some Aspects of Large Deviation Theory

[1] Large deviations in physics


[a] In classical mechanics.
[b] In fluid dynamics: A major portion of results in this area have been
obtained by Varadhan along with other scientists. They deal with the use of
entropic methods in proving hydrodynamical scaling limits. The basic idea is to
start with a simple exclusion process which essentially describes the stochastic
dynamics of a set of particles located at the sites of a lattice. If at a particular
time, a given site is occupied by a particle, then this particle can jump to one of
the empty sites in accordance with a Poisson clock, ie, the particle waits at a site
for an exponentially distributed duration and then jumps to one of the empty
sites in accordance with a given probability distribution. One then defines the
joint probability distribution fN,t (η) of the entire exclusion process η : ZN →
{0, 1} and proves that N −1 times the relative entropy between this distribution
and the distribution obtained by assuming that the number of particles at each
site x of the lattice is Bernoulli distributed with parameter ρ(t, x/N ) and the
number of particles at the different sites are independent random variables,
converges to zero as N → ∞ provided that the density ρ(t, θ), θ ∈ [0, 1], t ≥ 0
satisfies Burgers' equation. From this fact, by applying large deviation theory,
Varadhan et.al. are able to prove the hydrodynamical limit, namely that for
any test function J(θ), the quantity

d(N^{-1} Σ_{x∈Z_N} J(x/N)ηt(x))

converges uniformly in time over [0, T] in probability to dt ∫_0^1 J′(θ)ρ(t, θ)(1 −
ρ(t, θ))dθ, where the measure ρ(t, θ)dθ is the weak limit of the sequence of random
measures
μ_{N,t} = N^{-1} Σ_{x∈Z_N} ηt(x)δ_{x/N}
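The exclusion dynamics described above (Poisson clocks, jumps blocked to occupied sites) can be simulated directly. The sketch below is a hypothetical minimal version on a ring with symmetric nearest-neighbour jumps; it only checks the two structural invariants, conservation of particle number and the hard-core constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                    # sites of the ring Z_N
eta = np.zeros(N, dtype=int)
eta[: N // 2] = 1                         # density-1/2 initial profile
rng.shuffle(eta)
n0 = int(eta.sum())

t, T = 0.0, 5.0
while t < T:
    # The minimum of eta.sum() independent rate-1 exponential clocks
    # rings after an Exp(eta.sum()) waiting time.
    t += rng.exponential(1.0 / eta.sum())
    xsite = int(rng.choice(np.flatnonzero(eta)))    # particle whose clock rang
    ysite = (xsite + int(rng.choice([-1, 1]))) % N  # proposed neighbour site
    if eta[ysite] == 0:                             # jump only to empty sites
        eta[xsite], eta[ysite] = 0, 1
```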


Alternate proofs of the hydrodynamic scaling limit without the use of entropic principles have also been given by Varadhan et.al., based on the local
averaging principle. This involves expressing the above time differential of
N^{-1} Σ_x J(x/N)ηt(x) in terms of a drift component and a Poisson martingale component, proving by standard martingale arguments that the martingale
component converges to zero as N → ∞ and then applying the local averaging
principle to arrive at the hydrodynamic scaling limit. The averaging principle roughly states that (2εN + 1)^{-1} Σ_{y:|y−x|≤εN} f(τ_y η) converges to
f̂(ρ(x)) as N → ∞ followed by ε → 0, where the η(x)'s are independent Bernoulli
with means ρ(x). Varadhan has also used large deviation theory to establish
the hydrodynamic scaling limits for a system of N particles following Hamilto-
nian dynamics in phase space. This is very similar to how one derives the fluid
dynamical equations in statistical mechanics from the Boltzmann kinetic trans-
port equation with the difference that the Boltzmann distribution function is a
function of just one position and one velocity variable, but in Varadhan’s work,
one starts with the joint distribution of N particles in 6N -dimensional phase
space and then forms the empirical averages of the positions and momenta of
the N particles to arrive at the hydrodynamical equations of Euler including
an energy equation. Varadhan clearly states that this derivation is not rigorous
since it assumes the validity of the averaging principle but he states and proves
clearly that if noise is present in the Hamiltonian dynamics, then the averaging
principle is valid. In fact the presence of random noise ensures ergodic behaviour
of the system.

[c] In quantum mechanics, especially scattering theory. When the scattering
potential is perturbed by a small random self-adjoint operator, the
corresponding scattering matrix, and hence the scattering cross section, gets
correspondingly perturbed by a small random amount. The problem is to apply large
deviation theory to determine the approximate probability of the scattering
matrix perturbation falling within a given set, and hence obtain bounds on the
probability that the gate designed using the scattering matrix obtained from the
non-random component of the scattering potential deviates from the desired gate
by more than a prescribed threshold w.r.t a given operator norm.
[d] In electromagnetism.
[e] In general relativity and cosmology.
[f] In control theory for random partial differential equations. Let L be a
partial differential operator, linear or nonlinear, acting on a field f : Rⁿ → Rⁿ.
We consider the pde

L(f) = s + √ε·w

where w is a noise random field, say a mixture of a Gaussian and a Poisson field.
When ε = 0, ie, in the absence of noise, the solution is f0 = L^{−1}(s). When noise
is present, write f = f0 + δf, so that δf approximately satisfies the pde

L0(δf) + δL(f0) = √ε·w

where L = L0 + δL decomposes L into a large linear partial differential operator
L0 and a small nonlinear differential operator δL. The solution for δf is then
given by

δf = √ε·L0^{−1}(w) − L0^{−1}(δL(f0))
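In finite dimensions this perturbative solution is easy to sanity-check. The sketch below is a hypothetical stand-in for the operators in the text: L0 is a well-conditioned matrix (identity plus a discrete Laplacian) and δL(f) = μf³ a small cubic nonlinearity; it verifies that f0 + δf leaves only a higher-order residual in L(f) = s + √ε·w.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
# L0: large linear part (I + discrete Laplacian), well conditioned.
L0 = (np.eye(n) + np.diag(2.0 * np.ones(n))
      - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1))
mu, eps = 1e-2, 1e-4
dL = lambda f: mu * f ** 3               # small nonlinear part of L

s = rng.standard_normal(n)               # source field
w = rng.standard_normal(n)               # noise sample

f0 = np.linalg.solve(L0, s)              # noiseless linear solution
df = np.sqrt(eps) * np.linalg.solve(L0, w) - np.linalg.solve(L0, dL(f0))
f = f0 + df                              # perturbative solution

# Residuals of L0 f + dL(f) = s + sqrt(eps) w, with and without df
res1 = np.linalg.norm(L0 @ f + dL(f) - s - np.sqrt(eps) * w)
res0 = np.linalg.norm(L0 @ f0 + dL(f0) - s - np.sqrt(eps) * w)
rel = res1 / np.linalg.norm(s)
```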

In fluid dynamics:
Consider a curved p dimensional surface M on which we introduce curvilin-
ear coordinates xi, i = 1, 2, ..., p and we embed this surface in N > p dimensional
Euclidean space by introducing N cartesian coordinates yn(x), n = 1, 2, ..., N.
The metric tensor on this surface is

gμν(x) = Σ_{n=1}^N (∂yn/∂xμ)(∂yn/∂xν)

An edge on this surface is a curve t → xμ (t). If f : M → R is a smooth function


on M, then the tangent vector to this curve is defined by the equation

df (x(t))/dt = (dxμ (t)/dt)∂f (x(t))/∂xμ

so that the tangent vector to this curve at x(t) is specified by the partial differ-
ential operator
Xx(t) = (dxμ (t)/dt).∂/∂xμ
This operator has the interpretation of the velocity operator, ie, writing v μ =
dxμ /dt, the velocity field in component form, the velocity field in the vector
field differential operator formalism can be expressed as v(x) = v μ (x)∂μ . The
energy momentum tensor of the fluid on this curved surface is

T μν (x) = (ρ(x) + p(x))v μ (x)v ν (x) − p(x)g μν (x)

and, taking into account the fluid viscosity, the Navier-Stokes equations of
motion of the fluid on this curved surface can be expressed as

T^{μν}_{:ν} = f^μ(x)

where f μ (x) is the external random forcing field. We can express these equations
as
((ρ + p)v μ v ν ):ν − g μν p,ν = f μ
To be precise, we take x0 = t and the space-time metric as

g00 = 1, g0k = 0, gkm = Σ_n y^n_{,k} y^n_{,m}, k, m = 1, 2, ..., p

ie the proper time interval for the particle is

dτ² = dt² − Σ_{k,m=1}^p gkm(x)dx^k dx^m

The manifold is then transformed by an element h of a Lie group acting on


the spatial coordinates. As a result, the velocity field transforms to
u(x) = hv(h−1 x)
or more precisely, in terms of spatial components,
uk (x) = hki v i (h−1 x)
with summation over the repeated index i. We may also project this velocity
field onto a plane (for example, by taking a photograph). That amounts to
multiplying the velocity field by a rectangular matrix ((Pik)). Denote the resulting
projected velocity field by (wk (x)). Then, we have
wi (x) = Pik uk (x) = Pik hkm v m (h−1 x)
or in terms of vectors and matrices,
w(x) = P hv(h−1 x) + n(x)
where n(.) is the noise field.
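For a linear velocity field v(x) = Mx the transformation rule gives u(x) = (hMh⁻¹)x in closed form, which makes a quick numerical check possible. In the sketch below the rotation h, the matrix M and the projection P are hypothetical choices and the noise term is omitted.

```python
import numpy as np

c, s = np.cos(0.3), np.sin(0.3)
h = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotation about z

M = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.5]])
v = lambda x: M @ x                       # linear velocity field
u = lambda x: h @ v(h.T @ x)              # u(x) = h v(h^{-1} x), h^{-1} = h^T

P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # projection (photograph)
w = lambda x: P @ u(x)                    # projected velocity field

rng = np.random.default_rng(2)
pts = rng.standard_normal((5, 3))
ok = all(np.allclose(u(xp), (h @ M @ h.T) @ xp) for xp in pts)
```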

[2] LDP for stationary process perturbation of a Markov process


[1] Let X(n), n ∈ Z be a stationary process and let p denote its probability
distribution on path space. Consider the rth order empirical distribution of this
process:
μN = N^{-1} Σ_{i=1}^N δ_{Xi Xi+1 ...Xi+r−1}
We compute the Gartner-Ellis scaled logarithmic moment generating function:

N^{-1}·log E exp(N ∫ f dμN) =

N^{-1}·log(E[exp(Σ_{i=1}^N f(Xi Xi+1 ...Xi+r−1))]) − − − (1)
Now suppose that this process is a Markov process with transition probability
measure π(x, dy) and we wish to evaluate the rate function for the sequence of
empirical measures μN, N ≥ 1 at ν where ν is a probability measure on R^r and
then let r → ∞ so that ν is a stationary, ie shift invariant probability measure
on RZ . We have that (1) equals for large N approximately,

N^{-1}·log(E exp(N ∫ f(x)dμω(x)))

where μω is the invariant probability distribution on Rr generated by the Markov


transition probability π. Specifically,

∫ f(x)dμω(x) = lim_{N→∞} N^{-1} Σ_{i=1}^N f(Xi Xi+1 ...Xi+r−1)

almost surely, where ω = (Xn )n≥1 is the Markov path. So if ν is any measure
on R^r, then the rate function for μN, N ≥ 1 is given by

I(ν) = sup_f [∫ f(x)dν(x) − lim_{N→∞} N^{-1}·log(E[exp(N ∫ f(x)dμω(x))])]

Let P (dω) denote the probability measure on RZ generated by the Markov


transition probability π for some initial distribution. Then, setting to zero
the variational derivative of the above expression w.r.t f gives us the following
equation for the optimal f (x):

dν(x) − lim_{N→∞} [∫ dP(ω)·dμω(x)·exp(N ∫ f(y)dμω(y))] / [∫ dP(ω)·exp(N ∫ f(y)dμω(y))] = 0

or equivalently,

dν(x) − lim_{N→∞} E[dμω(x)·exp(N ∫ f(y)dμω(y))] / E[exp(N ∫ f(y)dμω(y))] = 0
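For a finite-state Markov chain the limiting empirical block measure appearing above is explicit: for blocks of length r = 2 it is π(x)π(x, y), with π the invariant distribution. The sketch below is a hypothetical two-state example checking this against a simulated empirical pair measure.

```python
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.9, 0.1], [0.2, 0.8]])    # transition probabilities pi(x, dy)
pi = np.array([2.0 / 3.0, 1.0 / 3.0])     # invariant distribution: pi P = pi

N = 200_000
U = rng.random(N)
X = np.empty(N + 1, dtype=int)
X[0] = 0
for i in range(N):
    # flip with the off-diagonal probability of the current state
    flip = U[i] < (0.1 if X[i] == 0 else 0.2)
    X[i + 1] = 1 - X[i] if flip else X[i]

mu2 = np.zeros((2, 2))                    # empirical measure of pairs
for a in range(2):
    for b in range(2):
        mu2[a, b] = np.mean((X[:-1] == a) & (X[1:] == b))

limit = pi[:, None] * P                   # pi(x) pi(x, y)
err = float(np.abs(mu2 - limit).max())
```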

[3] Spectrum and higher order spectrum estimation using EKF


Consider a process x[n] modeled using the time series


x[n] = −Σ_{k=1}^p a[k]x[n−k] + δ·f(x[n−k], 1 ≤ k ≤ p) + w[n]

We can construct a Markov state model for this process by defining the state
vector
X[n] = [x[n − 1], ..., x[n − p]]T
and then writing the above time series model as

X[n + 1] = AX[n] + δ.F(X[n]) + bw[n + 1]

We estimate the parameters a[k] in A along with the state X[n] using the EKF.
We can in fact, also incorporate some parameters θ into the function F and
estimate it also using the EKF. Suppose that the parameter estimates of a[k], θ
are nearly constant over time slots of duration L. Then, within each such slot, we
can use these estimates to estimate the spectrum and higher order spectrum of
x[n] approximately. For example suppose a[k], θ denote the parameter estimate
over a given time slot. Then, we have


x[n] + Σ_{k=1}^p a[k]x[n−k] = δ·F(x[n−1], ..., x[n−p], θ) + w[n]

and writing
x[n] = x0 [n] + δ.x1 [n] + δ 2 x2 [n] + ...

we get

x0[n] + Σ_{k=1}^p a[k]x0[n−k] = w[n],

x1[n] + Σ_{k=1}^p a[k]x1[n−k] = F(x0[n−1], ..., x0[n−p], θ)

This gives the solution



x0[n] = Σ_{k≥0} h[k]w[n−k],

x1[n] = Σ_{k≥0} h[k]F(x0[n−k−1], ..., x0[n−k−p], θ)

where
H(z) = Σ_{k≥0} h[k]z^{−k} = A(z)^{−1}
and
A(z) = 1 + Σ_{k=1}^p a[k]z^{−k}

From these expressions, we deduce the approximate spectrum and higher order
spectra of x[n].
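At zeroth order in δ the process is an ordinary AR(p) process, and its spectrum can be computed either from A(z) directly or from the truncated impulse response h[k]; the two must agree. A minimal sketch with hypothetical (stable) coefficients and unit noise variance:

```python
import numpy as np

a = np.array([1.0, -0.5, 0.25])           # A(z) = 1 - 0.5 z^-1 + 0.25 z^-2
p = len(a) - 1

# Impulse response of H(z) = 1/A(z):  h[0] = 1,  h[n] = -sum_k a[k] h[n-k]
K = 64
h = np.zeros(K)
h[0] = 1.0
for n in range(1, K):
    for k in range(1, min(n, p) + 1):
        h[n] -= a[k] * h[n - k]

# Spectrum S(w) = |H(e^{jw})|^2 = 1/|A(e^{jw})|^2 for unit-variance noise
w = np.linspace(0.0, np.pi, 128)
A_w = sum(a[k] * np.exp(-1j * w * k) for k in range(p + 1))
H_w = sum(h[k] * np.exp(-1j * w * k) for k in range(K))
S_from_A = 1.0 / np.abs(A_w) ** 2
S_from_h = np.abs(H_w) ** 2
```

Since the chosen A(z) has both poles at radius 0.5, the truncation at K = 64 terms is far below floating-point precision.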

[4] Time delay estimation using the EKF


We model the signal dynamics as an ARMA process with unknown parame-
ters and cast this dynamics in state variable form. The parameters are assumed
to be slowly time varying and hence we use the EKF to estimate both the state
and the ARMA parameters over each time slot from noisy signal measurements.
From the ARMA parameters over each time slot, we construct an estimate of
the slowly time varying spectrum and bispectrum using our knowledge of the
variance and skewness of the process noise. The spectrum gives us the set of
dominant frequencies present in the signal along with their amplitudes/intensity
while the bispectrum yields the relative time delay/phase shift of the signal in
each time slot.

Large deviation analysis of the EKF error process: The state model is
x[n + 1] = f (x[n], n) + g(x[n], n)w[n + 1]
and the measurement model is

z[n] = h(x[n], n) + v[n]

The EKF is
x̂[n + 1|n] = f (x̂[n|n], n),

P [n + 1|n] = F [n]P [n|n]F [n]T + Q


where
F [n] = ∂f (x̂[n|n], n)/∂x
is the Jacobian matrix of the state model,

x̂[n + 1|n + 1] = x̂[n + 1|n] + K[n](z[n + 1] − h(x̂[n + 1|n], n))

where

K[n] = P[n+1|n]H[n]^T(H[n]P[n+1|n]H[n]^T + R)^{−1}, H[n] = ∂h(x̂[n+1|n], n)/∂x

P[n+1|n+1] = (I − K[n]H[n])P[n+1|n](I − K[n]H[n])^T + K[n]RK[n]^T


where
R = cov(v[n]), Q = cov(w[n])
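A minimal scalar implementation of the EKF recursion above (the maps f, h, the noise levels and the horizon are hypothetical choices; the Joseph-form covariance update is used) is:

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: 0.9 * np.sin(x)             # state map (hypothetical)
h = lambda x: x                           # measurement map (hypothetical)
Q, R = 0.05 ** 2, 0.1 ** 2                # process / measurement noise variances

# Simulate truth and measurements
T = 200
x = np.zeros(T)
z = np.zeros(T)
for n in range(1, T):
    x[n] = f(x[n - 1]) + np.sqrt(Q) * rng.standard_normal()
    z[n] = h(x[n]) + np.sqrt(R) * rng.standard_normal()

# Scalar EKF
xh, P = 0.0, 1.0
errs = []
for n in range(1, T):
    F = 0.9 * np.cos(xh)                  # Jacobian of f at the estimate
    xp = f(xh)                            # x^[n+1|n]
    Pp = F * P * F + Q                    # P[n+1|n]
    H = 1.0                               # Jacobian of h
    K = Pp * H / (H * Pp * H + R)         # Kalman gain
    xh = xp + K * (z[n] - h(xp))          # x^[n+1|n+1]
    P = (1.0 - K * H) * Pp * (1.0 - K * H) + K * R * K   # Joseph form
    errs.append(x[n] - xh)

rmse = float(np.sqrt(np.mean(np.square(errs))))
```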

Problem: Write down the linearized stochastic difference equation for the
error process e[n] = x[n] − x̂[n|n] and using formulas for the rate function of
a Gaussian process, calculate the LDP rate function for the process e[n]. Note
that in the difference equation for e[n], the coefficients will generally be functions
of x̂[n|n]. If we assume that tracking is good, then the stochastic process x̂[n|n]
is replaced by the deterministic process xd [n] in the linear difference equation
for e[n] thus guaranteeing that e[n] is a Gaussian process. If we wish to be more
accurate, then we expand functions of x̂[n|n] around xd[n], in which case we
obtain nonlinear difference equations satisfied by the two error processes x[n] −
x̂[n|n] and xd[n] − x̂[n|n] that are driven by a white Gaussian process and hence
by applying the contraction principle, we can obtain a more accurate formula for
the joint rate function of these two error processes from which the approximate
formula for the probability of deviation of both of these error processes by a
threshold value around zero can be calculated and controllers can be designed
to minimize this deviation probability.

[5] Large deviations in classical and quantum hypothesis problems.


[a] Classical case: Under the hypothesis H1 , the random variable has the
probability distribution p(x) and under H0 , it has the probability distribution
q(x). Consider n independent trials of the experiment. Then, under H1 , the
pdf of the random vector x = (x1 , .., xn ) is given by

p⊗n (x) = p(x1 )...p(xn )

and under H0 , its pdf is given by

q ⊗n (x) = q(x1 )..q(xn )

From the Neyman-Pearson decision theory, it is known that the optimal test
that minimizes the false alarm probability P (H1 |H0 ) for a given miss probability

P(H0|H1) = ε is given by the following: Decide H1 if p^{⊗n}(x)/q^{⊗n}(x) > c(n)
and decide H0 if the same is < c(n), where c(n) is chosen so that

∫_{Zn} q^{⊗n}(x)dx = ε

where
Zn = {x : p⊗n (x)/q ⊗n (x) > c(n)}
Define the time averaged log likelihood ratio


Ln(x) = n^{−1} Σ_{k=1}^n log(p(xk)/q(xk)) = n^{−1}·log(p^{⊗n}(x)/q^{⊗n}(x))

Then our test region can be written as

Zn = {x : Ln (x) > R}, R = n−1 .log(c(n))

or equivalently as

Zn = {x : p⊗n (x)/q ⊗n (x) > exp(nR)}

Applying Cramer’s large deviation principle, we have that for large n,

n−1 log(P (Ln (x) > R|H0 )) ≈ −inf {I0 (x) : x > R}

and
n−1 log(P (Ln (x) < R|H1 )) ≈ −inf {I1 (x) : x < R}
where
I0(x) = sup_{s∈R} {sx − log(Σ_x q^{1−s}(x)p^s(x))}

and
I1(x) = sup_{s∈R} {−sx − log(Σ_x p^{1−s}(x)q^s(x))}

Define

F0(s) = log(Σ_x q^{1−s}(x)p^s(x)), F1(s) = log(Σ_x p^{1−s}(x)q^s(x))

We have then for 0 < ε < 1,

F0(ε) = −ε·H(q|p) + O(ε²), ε ↓ 0,

F0(1 − ε) = −ε·H(p|q) + O(ε²), ε ↓ 0,
F1(ε) = −ε·H(p|q) + O(ε²),
F1(1 − ε) = −ε·H(q|p) + O(ε²)

where
H(p|q) = Σ_x p(x)log(p(x)/q(x))

is the relative entropy between the pdfs p, q. Note that H(p|q), H(q|p) ≥ 0 with
equality iff the pdfs p, q coincide. Now,

I0(x) ≥ ε·x − F0(ε)

and therefore,

inf{I0(x) : x > R} ≥ ε·R − F0(ε) = ε(R + H(q|p)) + O(ε²)

It follows on taking R = −H(q|p) + δ that

inf{I0(x) : x > R} ≥ εδ + O(ε²)

and hence
−inf{I0(x) : x > R} ≤ −εδ + O(ε²)
This means that for this choice of R, we have that

P(H1|H0) → 0, n → ∞

On the other hand, we have

I0(x) ≥ (1 − ε)x − F0(1 − ε)

and hence, for any R,

inf{I0(x) : x > R} ≥ (1 − ε)R + ε·H(p|q) + O(ε²)

= R + ε(H(p|q) − R) + O(ε²)
and hence,

−inf{I0(x) : x > R} ≤ −R − ε(H(p|q) − R) + O(ε²)

= −R − εδ + O(ε²)
for the choice R = H(p|q) − δ. By choosing ε sufficiently small, we find that

−inf{I0(x) : x > H(p|q) − δ} ≤ −H(p|q) + δ

which means that for the choice of R = H(p|q) − δ, we can get the probability
of false alarm to approach zero at a rate arbitrarily close to −H(p|q). Moreover,
for this choice of R,

I1(x) = sup_s{−sx − F1(s)} ≥ −ε·x − F1(ε)

so that
inf{I1(x) : x < R} ≥ −ε·R − F1(ε)

and so

−inf{I1(x) : x < R} ≤ ε·R + F1(ε) = ε·R − ε·H(p|q) + O(ε²)

= ε(R − H(p|q)) + O(ε²) = −ε·δ + O(ε²)
which means that for this same choice R = H(p|q) − δ, the miss probability
converges to zero. On the other hand, now take R = H(p|q) + δ. Then, we have
by the same logic,

n^{−1}·log(P(H1|H0)) = −inf{I0(x) : x > R} ≤ −R − εδ + O(ε²)

= −H(p|q) − δ(1 + ε) + O(ε²) ≤ −H(p|q) − δ


implying thereby that the false alarm probability goes to zero at a rate slightly
faster than −H(p|q). In this case, we find that

I1 (x) = sups {−sx − F1 (s)}

so that n−1 logP (H0 |H1 ) asymptotically equals

−inf {I1 (x) : x < R} = −infx<R sups {−sx − F1 (s)}

= supx<R infs {sx + F1 (s)}


Recall that
F1(s) = log(Σ_x p^{1−s}(x)q^s(x))

Now, F1(s), being a logarithmic moment generating function, is convex, and
hence sx + F1(s) is also convex in s and attains its minimum where

F1′(s) = −x

ie,
exp(−F1(s)) Σ_x p^{1−s}q^s·log(p/q) = x − − − (a)

Let s0 denote the solution to this equation. Then, we observe that

(d/ds)(lhs of (a)) < 0

Hence, exp(−F1(s)) Σ_x p^{1−s}q^s·log(p/q) is a decreasing function of s. However,
as s → 0, this function converges to H(p|q). Hence, it follows that for all s > 0,
this function is smaller than H(p|q), and hence we get that for x > H(p|q),
equation (a) has a negative solution for s. Denote this value of s by s0(x). Thus,
s0(x) < 0 and we find that for R = H(p|q) + δ,

−inf{I1(x) : x < R} = sup_{x<R} inf_s{sx + F1(s)} = sup_{x<R}{s0(x)x + F1(s0(x))}

≥ s0(R)R + F1(s0(R))



Now for δ small and R = H(p|q) + δ, s0(R) is close to zero and negative. So,
we can write

s0(R)R + F1(s0(R)) = s0(R)H(p|q) + F1′(0)s0(R) + O(δ²)

= s0(R)H(p|q) − s0(R)H(p|q) + O(δ²) = O(δ²)


This shows that asymptotically, P (H0 |H1 ) equals unity.
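The small-ε expansions of F0 and F1 used in this argument are easy to verify numerically. The sketch below uses hypothetical pmfs p, q and checks F0(ε) ≈ −εH(q|p), F0(1−ε) ≈ −εH(p|q), F1(ε) ≈ −εH(p|q), and the convexity of F0 (which satisfies F0(0) = F0(1) = 0).

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])             # hypothetical pmfs
q = np.array([0.2, 0.5, 0.3])

H_pq = float(np.sum(p * np.log(p / q)))   # H(p|q)
H_qp = float(np.sum(q * np.log(q / p)))   # H(q|p)

F0 = lambda s: float(np.log(np.sum(q ** (1 - s) * p ** s)))
F1 = lambda s: float(np.log(np.sum(p ** (1 - s) * q ** s)))

eps = 1e-5
slope0 = F0(eps) / eps                    # ~ F0'(0) = -H(q|p)
slope1 = -F0(1 - eps) / eps               # ~ F0'(1) = H(p|q), since F0(1) = 0
slopeF1 = F1(eps) / eps                   # ~ F1'(0) = -H(p|q)
```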

[6] Large deviations applied to some problems of stochastic control theory
[a] Given a stochastic dynamical system in discrete time

x(n + 1) = f (x(n)) + g(x(n))w(n + 1)

where w(n) are iid standard normal random vectors, we make a measurement

z(n) = h(x(n)) + σ.v(n)

where v(n) are iid normal independent of w(.) and construct the EKF:

x̂(n + 1|n) = f (x̂(n|n)),

x̂(n + 1|n + 1) = x̂(n + 1|n) + K(n)(z(n + 1) − h(x̂(n + 1|n))


where the Kalman gain matrix K(n) is chosen so that

T r(P (n + 1|n + 1))

is a minimum where

P (n + 1|n + 1) = Cov(e(n + 1|n + 1)|Zn ), e(n|n) = x(n) − x̂(n|n),

and

P (n + 1|n) = Cov(e(n + 1|n)|Zn ), e(n + 1|n) = x(n + 1) − x̂(n + 1|n)

Note that

e(n+1|n+1) = e(n+1|n) − K(n)(h(x(n+1)) − h(x̂(n+1|n)) + σ·v(n+1))

≈ e(n+1|n) − K(n)(h′(x̂(n+1|n))e(n+1|n) + σ·v(n+1))

= (I − K(n)H(n+1))e(n+1|n) − σ·K(n)v(n+1)
where
H(n + 1) = h′(x̂(n + 1|n))
Thus,

P (n+1|n+1) = (I−K(n)H(n+1))P (n+1|n).(I−K(n)H(n+1))T +σ 2 K(n)K(n)T



Now T r(P (n + 1|n + 1)) can be minimized easily w.r.t K(n) by a variational
principle in matrix calculus. Also note that

e(n+1|n) ≈ x(n+1) − f(x̂(n|n)) = f(x(n)) − f(x̂(n|n)) + g(x(n))w(n+1)

≈ f′(x̂(n|n))e(n|n) + g(x̂(n|n))w(n+1)
and so
P (n + 1|n) = F (n)P (n|n)F (n)T + G(n).G(n)T
where
F(n) = f′(x̂(n|n)), G(n) = g(x̂(n|n))
Now consider the error sequence e(n) = e(n|n) = x(n)− x̂(n|n). From the above
equations, it is clear that its covariance P(n) = P(n|n) approximately satisfies a
matrix quadratic (Riccati-type) difference equation. The question is, under what conditions
does P (n) converge to a constant matrix P (∞) and if so, then at what rate does
it converge to this limiting error covariance ? First we observe that the optimal
K(n) satisfies

−H(n+1)P(n+1|n)(I − K(n)H(n+1))^T + σ²K(n)^T = 0

or
(H(n+1)P(n+1|n)H(n+1)^T + σ²I)K(n)^T = H(n+1)P(n+1|n)
or
K(n) = P(n+1|n)H(n+1)^T(H(n+1)P(n+1|n)H(n+1)^T + σ²I)^{−1}
so
P(n+1|n+1) = P(n+1|n)(I − K(n)H(n+1))^T
= P(n+1|n) − P(n+1|n)H(n+1)^T(H(n+1)P(n+1|n)H(n+1)^T + σ²I)^{−1}H(n+1)P(n+1|n)
The problem involving large deviation theory is therefore the following: Suppose
e(n), n = 1, 2, ... is a sequence of independent normal random vectors with
covariance matrices P (n), n = 1, 2, .... Assume that P (n) → P as n → ∞.
Then consider
Tn = n^{−1} Σ_{k=1}^n (e(k)e(k)^T − P(k))

Then it is clear that ETn = 0 and Cov(Tn) → 0 as n → ∞. The problem
is to calculate the rate function of Tn. Alternately, we note that Un =
(n^{−1} Σ_{k=1}^n e(k)e(k)^T) − P is a random matrix satisfying E(Un) → 0 and
Cov(Un) → 0. In fact, from the law of large numbers, Un converges almost
surely to zero and we can ask the question, at what rate does it converge to
zero, ie, we must determine the LDP rate function of {Un}. Suppose Xn is any
zero mean independent sequence. We form

Sn = Σ_{k=1}^n Xk

and define Zn = Sn/n. Assume for the moment that Xn is real valued. Then,

exp(Λn(nλ)) = E exp(nλZn) = E[exp(λSn)] = Π_{k=1}^n E[exp(λXk)]

= Π_{k=1}^n Mk(λ)
where Mk is the moment generating function of Xk. We form

n^{−1}Λn(nλ) = n^{−1} Σ_{k=1}^n log(Mk(λ))

Suppose Mk(λ) → M(λ). Then, by the Cesaro sum theorem,

n^{−1}Λn(nλ) → Λ̄(λ) = log(M(λ))

and we can apply the Gartner-Ellis theorem to obtain the LDP for {Zn }.

[7a] Large deviation principle in super-conductivity


[7b] Large deviation for Schrodinger equation perturbed by a random Gaus-
sian magnetic vector potential field.
Statement of the problem: Suppose A(t, r) is a random Gaussian magnetic
vector potential. Assume that this 3-vector valued process is stationary w.r.t
the time variable. We calculate the wave function of the electron as a function
of time by solving the Schrodinger equation:

i∂t ψ(t, r) = −(2m)−1 (∇ + ieA(t, r))2 ψ(t, r) + V (r)ψ(t, r)

where V (r) is the static nuclear potential. The solution to this equation can
formally be written as
ψ(t, r) = T{exp(−i ∫_0^t H(s)ds)}ψ(0, r)

where
H(t) = −(2m)−1 (∇ + ieA(t, r))2 + V (r)
is a random operator valued stationary stochastic process. Define the random
unitary operator valued stochastic process
U(t) = T{exp(−i ∫_0^t H(s)ds)}

Then, we wish to formulate a large deviation principle for U (t). The transition
probability amplitude per unit time from a state |u > to another state |v > is
given by
Kt (u, v) = t−1 < v|U (t)|u >
and the transition probability per unit time between the same states is given by

Pt (u, v) = t−1 | < v|U (t)|u > |2



We compute U (t) using the Dyson series:

H(t) = H0 + eV1 (t) + e2 V2 (t)

where
H0 = −∇2 /2m + V (r),
V1 (t) = (−i/2m)(divAt + 2(At , ∇)), V2 (t) = A2t /2m
Upto O(e2 ), we get
U(t) = U0(t) − ie ∫_0^t U0(t−s)V1(s)U0(s)ds

−e² ∫_{0<s2<s1<t} U0(t−s1)V1(s1)U0(s1−s2)V1(s2)U0(s2)ds1ds2

−ie² ∫_0^t U0(t−s)V2(s)U0(s)ds
V1 , V2 are random operator valued stationary stochastic processes. A special
case in which the A(t, r) is white w.r.t the time variable implies that the oper-
ators V1 (t), V2 (t) are also white noises with V1 being white Gaussian. However,
V2 (t) being the square of a white noise process is not precisely defined. So
we pass over to discrete time in order to formulate the LDP precisely. The
discretized form of the above Dyson series solution is
W[n] = U0[n]^{−1}[U[n] − U0[n]] = −ieΔ Σ_{k=0}^n U0[−k]V1[k]U0[k]

−e²(iΔ Σ_{k=0}^n U0[−k]V2[k]U0[k] + Δ² Σ_{0≤k≤m≤n} U0[−m]V1[m]U0[m−k]V1[k]U0[k])

In this expression, {(V1 [k], V2 [k])} is an iid bivariate operator sequence. Let us
first consider only the linear term:

W[n] = −ieΔ Σ_{k=0}^n U0[−k]V1[k]U0[k]

Let
Eexp(T r(XV1 [k])) = M1 (X)
where X is a Hermitian matrix. Then,

E exp(Tr(XW[n])) = Π_{k=0}^n M1(U0[k]XU0[−k])
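The quality of the first order Dyson term can be checked for small matrices against the exact time-ordered product. The sketch below is a hypothetical example (random H0 and V1[k], piecewise constant in time, midpoint quadrature for the Dyson integral, O(e²) terms dropped); the first order approximation should improve substantially on the free evolution U0(T).

```python
import numpy as np

rng = np.random.default_rng(5)

def herm(n):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (A + A.conj().T) / 2.0

def expi(H, t):
    # exp(-i H t) for Hermitian H via its eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

d, nsteps, dt, e = 3, 100, 0.01, 0.05
H0 = herm(d)
V1 = [herm(d) for _ in range(nsteps)]

# "Exact": time-ordered product for piecewise constant H(t) = H0 + e V1[k]
U = np.eye(d, dtype=complex)
for k in range(nsteps):
    U = expi(H0 + e * V1[k], dt) @ U

# First order Dyson approximation (midpoint rule on each step)
Ttot = nsteps * dt
U1 = expi(H0, Ttot)
for k in range(nsteps):
    tk = (k + 0.5) * dt
    U1 = U1 - 1j * e * dt * expi(H0, Ttot - tk) @ V1[k] @ expi(H0, tk)

err0 = np.linalg.norm(U - expi(H0, Ttot))  # free-evolution error, O(e)
err1 = np.linalg.norm(U - U1)              # first-order error, O(e^2)
```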

[8] LDP in quantum state discrimination


Let ρ, σ be two states in the same finite dimensional Hilbert space. Let
0 ≤ T ≤ I. Then let ρ, σ have the spectral representations:

ρ = Σ_k |uk > p(k) < uk|, σ = Σ_k |vk > q(k) < vk|

Then, the error probability for the test T where ρ has a-priori probability P1
and σ has a-priori probability P2 is given by

P r(e, T ) = P1 T r(ρ(1 − T )) + P2 T r(σT )


 
= P1 Σ_k < vk|ρ(1−T)|vk > + P2 Σ_k < uk|σT|uk >

= P1 Σ_{k,m} < vk|um > p(m) < um|1−T|vk > + P2 Σ_{k,m} < uk|vm > q(m) < vm|T|uk >

= P1 Σ_{k,m} p(m)| < um|vk > |² + Σ_{k,m} (P2 q(k) < um|vk >< vk|T|um > − P1 p(m) < vk|um >< um|T|vk >)

For the special case P1 = P2 = 1/2, this becomes

Pr(e, T) = (1/2) Σ_{k,m} [p(m) < vk|um >< um|1−T|vk > + q(k) < um|vk >< vk|T|um >]

Consider now the special case when T is an orthogonal projection. Then T² =
T = T* and we have

Tr(ρ(1−T)) = Σ_{k,m} p(k) < uk|1−T|vm >< vm|1−T|uk > = Σ_{k,m} p(k)| < uk|1−T|vm > |²

Tr(σT) = Σ_{k,m} q(k) < vk|T|um >< um|T|vk > = Σ_{k,m} q(m)| < uk|T|vm > |²

Further,
p(k)| < uk|1−T|vm > |² + q(m)| < uk|T|vm > |²
≥ min(p(k), q(m))·(| < uk|1−T|vm > |² + | < uk|T|vm > |²)
≥ (1/2)α(k, m)| < uk|1−T|vm > + < uk|T|vm > |² = (1/2)α(k, m)| < uk|vm > |²
where
α(k, m) = min(p(k), q(m))
Thus,
Pr(e, T) ≥ (1/4) Σ_{k,m} α(k, m)| < uk|vm > |²

Note that this result is true for any two positive definite matrices ρ, σ. Now let
ρ, σ be two states and define Tn to be the orthogonal projection onto {exp(nR)ρ⊗n −
σ ⊗n > 0}. Let
ω = (k1 , ..., kn , m1 , ..., mn )
and
P^{⊗n}(ω) = p(k1)...p(kn) Π_{r=1}^n | < u_{kr}|v_{mr} > |²
Q^{⊗n}(ω) = q(m1)...q(mn) Π_{r=1}^n | < u_{kr}|v_{mr} > |²

Then, we have from the above discussion,

Tr(ρ^{⊗n}(1 − Tn)) + Tr(σ^{⊗n}Tn)

≥ (1/2) Σ_{k1,...,kn,m1,...,mn} min(p(k1)...p(kn), q(m1)...q(mn)) Π_{r=1}^n | < u_{kr}|v_{mr} > |²

= (1/2) Σ_ω min(P^{⊗n}(ω), Q^{⊗n}(ω))
= (1/2)P^{⊗n}{ω : P^{⊗n}(ω) < Q^{⊗n}(ω)}
+ (1/2)Q^{⊗n}{ω : P^{⊗n}(ω) ≥ Q^{⊗n}(ω)}
Note that P^{⊗n}(ω) is the n-fold product measure whose one dimensional marginals
are P1(k, m) = p(k)| < uk|vm > |² and Q^{⊗n}(ω) is the n-fold product measure
whose one dimensional marginals are Q1(k, m) = q(m)| < uk|vm > |². Large
deviation theory or more precisely, Cramer’s theorem in the classical setting can
now be applied to this problem to derive an asymptotic lower bound for

n^{−1}·log(Tr(ρ^{⊗n}(1 − Tn)) + Tr(σ^{⊗n}Tn))

ie the scaled logarithmic error probability.
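For a single copy (n = 1) the inequality chain above can be checked numerically. The sketch below uses two hypothetical 2×2 states, takes T to be the orthogonal projection onto the positive eigenspace of ρ − σ, and verifies Tr(ρ(1−T)) + Tr(σT) ≥ (1/2) Σ min(p(k), q(m))| < uk|vm > |².

```python
import numpy as np

rho = np.array([[0.7, 0.2], [0.2, 0.3]])    # hypothetical density matrices
sig = np.array([[0.4, -0.1], [-0.1, 0.6]])

p, Ue = np.linalg.eigh(rho)                 # rho = sum_k p(k)|u_k><u_k|
q, Ve = np.linalg.eigh(sig)                 # sigma = sum_k q(k)|v_k><v_k|

# Projection onto the positive eigenspace of rho - sigma (Helstrom test)
w, W = np.linalg.eigh(rho - sig)
T = W[:, w > 0] @ W[:, w > 0].conj().T

err_sum = float(np.trace(rho @ (np.eye(2) - T)) + np.trace(sig @ T))

overlap = np.abs(Ue.conj().T @ Ve) ** 2     # |<u_k|v_m>|^2
bound = 0.5 * float(np.sum(np.minimum.outer(p, q) * overlap))
```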

[9] LDP in image processing


If an image field f is subjected to rotation and AWGN blurring and we try to
estimate the rotation, then its estimate is likely to be close to the true rotation
provided that the noise amplitude is small. In the limit when the noise amplitude
becomes very small, we can, using the LDP, estimate the probability that the
rotation estimate will deviate from the true rotation by an amount greater than
a threshold. This LDP estimate of the error probability decreases when the
minimum of the rate function over the error space increases. Therefore, we can
effectively estimate our rotation by choosing our input image field in such a
manner that the minimum of the LDP rate function over the error space is as
large as possible.

Let G = SO(3) act on an image field f (r) to produce another image field

g(r) = f(R^{−1}r) + √ε·w(r), R ∈ SO(3)

where w(r) is a zero mean Gaussian noise field that has G-invariant correlations.
Let Ylm (r̂) denote the standard spherical harmonics. Then the noise correlations
have the form
Kw(r, r′) = Σ_{l,m} σ(l)² Ylm(r̂)·Ylm(r̂′)*

Let
flm = ∫ f(r)Ylm(r̂)* dΩ(r̂), fl = ((flm))_{−l≤m≤l}

Then,
gl = πl(R)fl + √ε·wl, l = 0, 1, 2, ...
and moreover by the orthogonality of the spherical harmonics for different l, it
is easy to see that wl , l = 0, 1, 2, ... are independent complex Gaussian random
vectors with zero mean and moreover,

E(wl wl∗ ) = σ(l)2 Il , l = 0, 1, 2, ...

The maximum likelihood estimator of R is therefore given by

R̂ = argmin_{S∈G} Σ_l σ(l)^{−2} ||gl − πl(S)fl||²
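For the l = 1 block, π1(R) acts on 3-vectors as R itself, so the least squares problem above reduces to fitting a rotation to noisy rotated vectors, which has the closed-form orthogonal Procrustes (SVD) solution. The sketch below is a hypothetical finite-dimensional analogue (the rotation, the vectors and the noise level are arbitrary choices) and recovers R to within the noise.

```python
import numpy as np

rng = np.random.default_rng(6)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

R = rot_z(0.5) @ rot_y(0.3)               # true rotation
F = rng.standard_normal((3, 10))          # columns play the role of the f_l
G = R @ F + 1e-3 * rng.standard_normal((3, 10))  # noisy g_l = R f_l + noise

# Orthogonal Procrustes: argmin_S ||G - S F||_F over rotations S
Um, _, Vt = np.linalg.svd(G @ F.T)
S = np.diag([1.0, 1.0, float(np.linalg.det(Um @ Vt))])
Rhat = Um @ S @ Vt

err = float(np.linalg.norm(Rhat - R))
```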

We can write
S = exp(X)R
and then
R̂ = exp(X̂)R
where
X̂ = argmin_X Σ_l σ(l)^{−2} ||gl − exp(Xl)·πl(R)fl||²

Here X ranges over all real skew symmetric 3 × 3 matrices and

Xl = dπl (X)

ie, dπl is the representation of the Lie algebra of G corresponding to the repre-
sentation πl of G:
πl (exp(X)) = exp(dπl (X))
Our aim is to use the LDP to determine approximately the probability P(||X̂|| > δ)
in terms of a rate function I(X), ie,

P(||X̂|| > δ) ≈ exp(−ε^{−1}·inf{I(X) : ||X|| > δ})

and then determine an optimal image field f (r) so that this ”error probability”
is as small as possible subject to the constraint that f belongs to a class of
image fields. We make an approximate calculation noting that X ≈ 0:

exp(Xl ) ≈ 1 + Xl + Xl2 /2

Then,
||gl − exp(Xl)·πl(R)fl||² ≈
||gl − πl(R)fl − Xlπl(R)fl − Xl²πl(R)fl/2||²
≈ ||gl − πl(R)fl||² + ||Xlπl(R)fl||²
−2Re(< gl − πl(R)fl, Xlπl(R)fl >) − Re(< gl − πl(R)fl, Xl²πl(R)fl >)
= ||gl − πl(R)fl||² + Tr(Xlπl(R)flfl*πl(R^{−1})Xl*)
−2·Re(Tr[(gl − πl(R)fl)fl*πl(R^{−1})Xl*])
−Re(Tr[(gl − πl(R)fl)fl*πl(R^{−1})Xl*²])
We write
X = x1 Z1 + x2 Z2 + x3 Z3
where {Z1 , Z2 , Z3 } is the standard real basis for the Lie algebra of all 3 × 3 real
skew-symmetric matrices and x1 , x2 , x3 are variable real numbers. Then, the
above expression becomes

||gl − exp(Xl)·πl(R)fl||² ≈

||gl − πl(R)fl||² + Σ_{i,j=1}^3 xixj·Tr(Zilπl(R)flfl*πl(R^{−1})Zjl*)

−2 Σ_{i=1}^3 xi·Re(Tr[(gl − πl(R)fl)fl*πl(R^{−1})Zil*])

− Σ_{i,j=1}^3 xixj·Re(Tr[(gl − πl(R)fl)fl*πl(R^{−1})Zil*Zjl])

Setting the gradient of the negative log-likelihood function



Σ_l σ(l)^{−2} ||gl − exp(Xl)·πl(R)fl||²
l

w.r.t x = (xi ) to zero gives us the optimal normal equations:

Σ_l Σ_{j=1}^3 σ(l)^{−2} Tr(Zilπl(R)flfl*πl(R^{−1})Zjl*)·xj

− Σ_l Σ_{j=1}^3 σ(l)^{−2} Re(Tr[(gl − πl(R)fl)fl*πl(R^{−1})Zil*Zjl])·xj

= Σ_l σ(l)^{−2} Re(Tr[(gl − πl(R)fl)fl*πl(R^{−1})Zil*]), i = 1, 2, 3

This can be inverted to yield

x = (xi ) = A−1 b

where
A = ((A(i, j))) ∈ R3×3 , b = ((b(i))) ∈ R3×1
are respectively defined by

A(i, j) = Σ_l σ(l)^{−2} Tr(Zilπl(R)flfl*πl(R^{−1})Zjl*)

− Σ_l σ(l)^{−2} Re(Tr[(gl − πl(R)fl)fl*πl(R^{−1})Zil*Zjl])
l

and
b(i) = Σ_l σ(l)^{−2} Re(Tr[(gl − πl(R)fl)fl*πl(R^{−1})Zil*])

We can write
A = A1 + √ε·A2, b = √ε·c
where
A1(i, j) = Σ_l σ(l)^{−2} Tr(Zilπl(R)flfl*πl(R^{−1})Zjl*)

A2(i, j) = − Σ_l σ(l)^{−2} Re(Tr[wlfl*πl(R^T)Zil*Zjl])

c(i) = Σ_l σ(l)^{−2} Re(Tr[wlfl*πl(R^T)Zil*])
l

It is clear that when there is no noise, ie, ε = 0, then

x = A^{−1}b = 0

since ε = 0 implies
A = A1, b = 0
with A1 being non-singular provided that we base our estimates on choosing a
sufficiently large number of indices l. Note that A, b depend on f, R. Now, we
have the approximate solution

x = √ε·A1^{−1}c + O(ε)

or more precisely,

x = √ε·(A1^{−1} − √ε·A1^{−1}A2A1^{−1})c + O(ε^{3/2})

= √ε·A1^{−1}c − ε·A1^{−1}A2A1^{−1}c + O(ε^{3/2})
From the standard LDP for Gaussian random vectors, the approximate rate function
for x is given by
I(x) = (1/2)x^T(A1^{−1}RccA1^{−1})^{−1}x = (1/2)x^T A1Rcc^{−1}A1 x

where
Rcc = E(cc^T) = ((Rcc(i, j)))
where

Rcc(i, j) = Σ_{l,k} σ(l)^{−2}σ(k)^{−2} Cov(Re(Tr[wlfl*πl(R^T)Zil*]), Re(Tr[wkfk*πk(R^T)Zjk*]))

= Σ_l σ(l)^{−4} Cov(Re(fl*πl(R^T)Zil*wl), Re(fl*πl(R^T)Zjl*wl))

= (1/2) Σ_l σ(l)^{−2} Re(fl*πl(R^T)Zil*Zjlπl(R)fl)
l

Note that we are using the following notation:

Zjl = dπl (Zj )

Now
inf{I(x) : ||x|| > δ} = inf{(1/2)x^T A1Rcc^{−1}A1x : ||x|| > δ}

= (δ²/2)λmin(T)
where λmin(T) is the minimum eigenvalue of the 3 × 3 positive definite matrix

T = A1Rcc^{−1}A1

and hence f must be chosen so that λmin (T ) is a maximum in order to minimize


the error probability computed using the LDP for small noise amplitudes.

[10] Lower bound in Cramer’s theorem on large deviations


Let X1 , ..., Xn , ... be iid with logarithmic mgf Λ(η):

Λ(η) = log(E[exp(ηX1 )])

Define
Zn = Sn /n, Sn = X1 + ... + Xn
and let Λ_n(·) be the logarithmic moment generating function of Z_n. Clearly

Λn (nη) = nΛ(η)

Then let μn (.) denote the probability distribution of Zn and for some fixed
η ∈ R, let νn (.) denote the probability distribution on R defined by

(dνn /dμn )(z) = exp(nηz − Λn (nη)) = exp(n(ηz − Λ(η)))

Then ν_n has mean

∫ z dν_n(z) = ∫ z (dν_n/dμ_n)(z) dμ_n(z)

= ∫ z·exp(nηz − Λ_n(nη)) dμ_n(z) = exp(−Λ_n(nη))·(d/d(nη)) ∫ exp(nηz) dμ_n(z)

= exp(−Λ_n(nη))·(d/d(nη))(exp(Λ_n(nη))) = Λ_n'(nη) = Λ'(η) = y

say. We have

μ_n(B(y, δ)) = ∫_{B(y,δ)} exp(−nηz + Λ_n(nη)) dν_n(z)


= ∫_{B(y,δ)} exp(−nη(z − y) + nΛ(η) − nηy) dν_n(z)

= exp(n(Λ(η) − yη)) ∫_{B(y,δ)} exp(−nη(z − y)) dν_n(z)

≥ exp(n(Λ(η) − yη))·exp(−n|η|δ)·ν_n(B(y, δ))

Taking logarithms, we get

n^{-1} log(μ_n(B(y, δ))) ≥ n^{-1} log(ν_n(B(y, δ))) + Λ(η) − yη − |η|δ
By the weak law of large numbers applied under the tilted measures ν_n (which have mean y),

ν_n(B(y, δ)) → 1

and hence we get on letting δ → 0 that

liminf n^{-1}·log(μ_n(B(y, δ))) ≥ Λ(η) − yη = Λ(η) − ηΛ'(η)

= −Λ*(y)

This completes the proof of the lower bound in Cramer's theorem on large
deviations for sample averages of iid random variables.
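The bound can be illustrated numerically. For iid standard Gaussian variables, Λ(η) = η²/2 and Λ*(y) = y²/2, and since Z_n ~ N(0, 1/n) the tail P(Z_n ≥ a) = (1/2)erfc(a√n/√2) is available in closed form, so one can watch n^{-1} log P(Z_n ≥ a) approach −Λ*(a). The following is a sketch of my own in stdlib Python, not from the text.

```python
import math

def gaussian_tail_rate(a, n):
    """n^{-1} log P(Z_n >= a) for Z_n = (X_1+...+X_n)/n, X_i iid N(0,1).

    Z_n ~ N(0, 1/n), so P(Z_n >= a) = (1/2) erfc(a*sqrt(n)/sqrt(2)) exactly.
    """
    p = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))
    return math.log(p) / n

a = 0.5
rate_star = -a * a / 2          # -Lambda*(a) for the standard Gaussian
r500 = gaussian_tail_rate(a, 500)
r2000 = gaussian_tail_rate(a, 2000)
print(r500, r2000, rate_star)   # the empirical rates approach -0.125 from below
```

With a = 0.5 the limit is −0.125, and the finite-n rates are slightly more negative and increase towards it as n grows, as the large deviation principle predicts.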

[11] Generalization to the non-iid case: The Gartner-Ellis theorem. Let Zn


be a family of random variables with logarithmic moment generating function
Λn (.). Let μn denote the probability distribution of Zn and let νn denote the
probability distribution on R defined by

dνn (z) = exp(nηz − Λn (nη))dμn (z)

Assume that
n−1 Λn (nη) → Λ(η)
exists. The mean of νn is given by

m_n = Λ_n'(nη) → Λ'(η) = y

say. Thus by the same reasoning as above,

n^{-1}·log(μ_n(B(y, δ))) ≥ n^{-1}·log(ν_n(B(y, δ))) + n^{-1}Λ_n(nη) − yη − |η|δ

Now, the logarithmic moment generating function of ν_n evaluated at nλ is Λ_n(n(λ +
η)) − Λ_n(nη), and this divided by n converges to Λ(λ + η) − Λ(η) by hypothesis.
The Legendre transform of this is

I(η, x) = sup_λ(λx − Λ(λ + η) + Λ(η))

= Λ*(x) + Λ(η) − ηx ≥ 0
By the large deviation upper bound (proved easily using the Chebyshev-Markov
inequality),

limsup n^{-1}·log(ν_n(B(y, δ)^c)) ≤ −inf_{x∈B(y,δ)^c} I(η, x) < 0

by hypothesis on y. Thus,

ν_n(B(y, δ)^c) → 0

and hence we infer that

ν_n(B(y, δ)) → 1

from which it follows that

liminf n^{-1}·log(μ_n(B(y, δ))) ≥ Λ(η) − yη − |η|δ

and letting δ → 0 yields the desired lower bound.
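The identity I(η, x) = Λ*(x) + Λ(η) − ηx used above can be checked numerically. The following is a small sketch of my own, assuming the Gaussian log-mgf Λ(η) = η²/2, for which Λ*(x) = x²/2 and so I(η, x) = (x − η)²/2; the supremum over λ is approximated by a grid search.

```python
# Check I(eta, x) = sup_lambda (lambda*x - Lambda(lambda+eta) + Lambda(eta))
#                 = Lambda*(x) + Lambda(eta) - eta*x
# for the Gaussian log-mgf Lambda(eta) = eta^2/2, where Lambda*(x) = x^2/2.
Lambda = lambda t: t * t / 2.0

def I(eta, x, lo=-10.0, hi=10.0, steps=20000):
    # crude grid search for the supremum over lambda
    best = float("-inf")
    for k in range(steps + 1):
        lam = lo + (hi - lo) * k / steps
        best = max(best, lam * x - Lambda(lam + eta) + Lambda(eta))
    return best

eta, x = 0.4, 1.3
closed_form = x * x / 2.0 + Lambda(eta) - eta * x   # = (x - eta)^2 / 2
print(I(eta, x), closed_form)
```

The grid supremum matches the closed form (x − η)²/2, which is zero exactly at x = y = Λ'(η) = η, consistent with the claim that I(η, x) is strictly positive away from y.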


[12] Continuation of the summary of Varadhan’s work on large
deviations
Let B(t) be a Brownian motion process. Consider
u(t, x) = E[exp(∫_0^t V(x + B(s)) ds) f(x + B(t))]

It is well known (the Feynman-Kac formula) that u(t, x) satisfies


∂u(t, x)/∂t = (1/2)∂²u(t, x)/∂x² + V(x)u(t, x)
with the initial condition
u(0, x) = f (x)
Thus, we can write
u(t, x) = exp(t(L + V ))f (x)
where
L = (1/2)∂²/∂x²
It follows by spectral decomposition of the self-adjoint operator L + V that if
λ(V ) is the largest eigenvalue of L + V , then
lim_{t→∞} t^{-1}·log(u(t, x)) = λ(V)
On the other hand, we know from Rayleigh's variational principle that λ(V) is
the supremum of ⟨g, (L + V)g⟩ over all g with ‖g‖₂ = 1. Equivalently, λ(V)
is the supremum of ∫V(x)|g(x)|² dx − (1/2)∫|g'(x)|² dx over all g ∈ L²(R) with
∫g²(x) dx = 1. Thus, we get the formula
lim_{t→∞} t^{-1}·log E[exp(∫_0^t V(x + B(s)) ds) f(x + B(t))]

= sup_g (∫V(x)|g(x)|² dx − (1/2)∫|g'(x)|² dx)

where the supremum is taken over all g with ∫g²(x) dx = 1. Equivalently,
writing g(x) = √f(x), we get that the above limit equals the supremum of
∫V(x)f(x) dx − ∫(f'(x))² dx/(8f(x)) over all probability densities f(x) on R. Varadhan
generalized this formula to arbitrary processes in the form of an integral lemma
which is the starting point for the weak convergence approach to large devi-
ations pioneered by Dupuis and Ellis. The statement of Varadhan's integral
lemma is as follows: if Z(ε), ε → 0 is a family of random variables satisfying the
LDP with rate function I(z), then for all bounded continuous functions φ, we have
that

lim_{ε→0} ε·log E[exp(φ(Z(ε))/ε)] = sup_z (φ(z) − I(z))

As a special case of this general theorem, let X(t) be a process whose empirical
measure L_X(t, ·) defined by

L_X(t, B) = t^{-1} ∫_0^t χ_B(X(s)) ds, t > 0

satisfies the LDP as t → ∞ with rate function I(ν). Then Varadhan's integral lemma
implies that

t^{-1}·log E[exp(t ∫V(x) L_X(t, dx))] = t^{-1}·log E[exp(∫_0^t V(X(s)) ds)]

converges as t → ∞ to sup_ν (∫V(x) dν(x) − I(ν)) where the supremum is taken
over all probability measures ν. In the special case when X(t) is Brownian
motion, this formula reduces to the earlier mentioned one by noting that the
rate function for the empirical distribution of Brownian motion is

I(ν) = ∫(f'(x))² dx/(8f(x))

where

f(x) = dν(x)/dx

is the probability density of the probability distribution ν.
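As an illustration of the variational formula (an example of my own, not from the text): for the hypothetical potential V(x) = −x²/2, the operator L + V = (1/2)d²/dx² − x²/2 is the negative of the quantum harmonic oscillator Hamiltonian, whose largest eigenvalue is λ(V) = −1/2. Over the Gaussian trial family g_s(x) ∝ exp(−s x²/2), the Rayleigh value is −1/(4s) − s/4, maximised at s = 1 where it equals −1/2. A stdlib-Python sketch evaluating the functional by Riemann sums:

```python
import math

def rayleigh(s, L=8.0, m=4000):
    """Rayleigh value of V(x) = -x^2/2 at the (unnormalised) Gaussian trial
    function g(x) = exp(-s x^2 / 2), i.e.
    ( int V g^2 dx - (1/2) int g'^2 dx ) / int g^2 dx,  using g' = -s x g."""
    dx = 2 * L / m
    num = 0.0
    den = 0.0
    for k in range(m + 1):
        x = -L + k * dx
        g2 = math.exp(-s * x * x)          # g(x)^2
        num += (-x * x / 2 - 0.5 * (s * x) ** 2) * g2 * dx
        den += g2 * dx
    return num / den

# Scan the trial-function width; analytically rayleigh(s) = -1/(4s) - s/4,
# maximised at s = 1 where it equals -1/2, the largest eigenvalue of L + V.
best = max(rayleigh(0.5 + 0.05 * k) for k in range(31))
print(best)
```

The scanned maximum reproduces λ(V) = −1/2, so the spectral limit lim t^{-1} log u(t, x) and the variational supremum agree for this choice of V, as the formula asserts.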
Chapter 10

Contributions of some Indian Scientists

[1] A survey of the pedagogical work of V.S.Varadarajan


During the period 1950-1965, the Indian Statistical Institute at Calcutta was
a centre of excellence in research, though restricted only to probability and
statistics, under the leadership of C.R.Rao, the most famous Indian statistician,
credited with important discoveries like the Cramer-Rao lower bound used in all
disciplines of science and engineering and the Rao-Blackwell theorem. A few
brilliant research students like S.R.S.Varadhan, K.R.Parthasarathy and R.Ranga-Rao
carried out for their doctoral work pioneering research in probability, like the
representation theory of infinitely divisible probability distributions on locally
compact Abelian groups and Hilbert space, and into the then recently evolved field
of information theory due to the coding theorems of Claude Shannon. When
V.S.Varadarajan joined this institute as a doctoral student, he gave a new proof
of Yuri Prohorov’s celebrated theorem relating tightness of a family of measures
to their weak compactness, ie, to the existence of a weakly convergent subse-
quence. This theorem provided a fundamental connection between topology and
probability theory. Still the ISI remained a centre of excellence only in statistics
where students were taught number crunching techniques using hand operated
calculators. After completing his PhD, Varadarajan visited America, attended
the lectures of G.W.Mackey on his imprimitivity theory relating unitary rep-
resentations of a group acting on a measure space and spectral measures on
the same measure space with values in the lattice of projections in a Hilbert
space to a covariant description of observables in quantum mechanics wherein
observables correspond to spectral measures. In this process, Mackey had de-
veloped the theory of induced representations for arbitrary groups generalizing
the results of Frobenius for finite groups and had discovered the important re-
lationship between an irreducible imprimitivity system and irreducibility of the
induced representation for the case when the larger group is a semidirect prod-
uct of an Abelian group and another group that normalizes the former. Mackey


had applied this theory to give a rigorous derivation of the general form of ob-
servables like position, momentum, angular momentum, spin and energy of a
quantum mechanical system using the projective unitary representations of the
Galilean group. Varadarajan, after mastering Mackey’s theory returned to ISI
Calcutta and gave a course to the research students on all this. Simultaneously,
he gave courses at the ISI on the structure theory of Lie groups and Lie alge-
bras and on Von-Neumann’s celebrated work on operator theory in quantum
mechanics, all from a rigorous mathematical angle. Varadarajan was thus a
pioneer in Indian Mathematics even at a very young age when he injected se-
rious mathematics and mathematical physics like functional analysis, operator
theory, Lie groups, Lie algebras and their representations and the mathematical
foundations of quantum mechanics into an otherwise dull environment where
only probability and statistics existed. The outcome of Varadarajan’s lectures
on these new subjects can be seen today when many research students in his
audience at that time have flowered into celebrated mathematicians and math-
ematical physicists whose works in statistical mechanics, group representation
theory and quantum noise and non-commutative probability theory are known
all over the world.
Whilst visiting America, G.W.Mackey had advised Varadarajan to read
the ”terrifying algebraic machinery” of Harish-Chandra for constructing irre-
ducible representations of semisimple Lie groups and Lie algebras and deriv-
ing Plancherel formulae for semisimple Lie groups that have more than one
non-conjugate Cartan subgroup, extending by leaps and bounds the work of
Gelfand on the representation theory of complex semisimple Lie groups (which
have just a single non-conjugate Cartan subgroup). So Varadarajan proceeded
to Princeton, met Harish-Chandra and after many discussions with the mas-
ter, he mastered the general theory of roots developed by Cartan leading to
the classification of all simple Lie algebras, distributional characters, Harish-
Chandra’s (g, K) modules and the Discrete series for real semisimple Lie groups,
the outcome of which was two marvellous books, one, a textbook on Lie groups,
Lie algebras and their representations and another on Harmonic analysis on
semisimple Lie groups. The former gives a detailed account accessible to the
graduate student on Harish-Chandra’s 1951 paper in which the master had de-
rived beautiful algebraic formulae for finite and infinite dimensional irreducible
representations of a semisimple Lie algebra corresponding to each dominant in-
tegral Cartan weight using quotients of the universal enveloping algebra by a
maximal ideal. In this textbook, Varadarajan gives a detailed and easily acces-
sible account of Cartan’s theory of roots and weights for classifying all simple
Lie algebras and their irreducible representations. This book culminates in the
proof of the celebrated Weyl character formula for all irreducible representa-
tions of compact group for a given dominant integral weight. An innumerable
number of exercises are present in this book which are all based on Varadara-
jan’s original research. Exercises include the later discovered Verma module for
constructing infinite dimensional irreducible representations of semisimple Lie
algebras and the Frobenius-Young theory of irreducible characters of the permu-
tation groups. Hints are provided wherever required which enable the student

to understand the theory much better than if he were given the complete solu-
tion. It is no exaggeration to say that any student who is able to work through
all these exercises can easily begin research in this field especially on infinite
dimensional representations and representations of Lie super-algebras with ap-
plications to supersymmetry. The second book is a priceless gem as it is all
about presenting Harish-Chandra’s original work on distributional characters,
the discrete series and Plancherel’s formula for real semisimple Lie groups in a
readily accessible form by considering the prototype example of SL(2, R) which
is the simplest example of a semisimple Lie group having more than one non-
conjugate Cartan subgroup. Varadarajan, in this book, begins with the work of
Gelfand and Naimark on the representation theory of complex semisimple Lie
group and their Plancherel formula. For this, the Principal series representation
of the SL(n, C), namely that induced by the characters of the upper triangular
subgroup is first explained in full detail along with a proof of its irreducibil-
ity based on Mackey’s theory of induced representations. The supplementary
series discovered by Gelfand is then introduced in terms of an invariant inner
product on functions defined on C2 . Varadarajan then explains the startling
result of Gelfand that only the principal series appear in the Plancherel for-
mula for the complex semisimple Lie groups and notes that this was the reason
why Gelfand totally missed out the Discrete series while attempting to derive
a Plancherel formula for the real semisimple Lie groups. Varadarajan then
explains via the theory of Harish-Chandra’s (g, K) modules, the infinite dimen-
sional irreducible representations (more precisely (g, K) modules) of SL(2, R)
and how apart from the principal series, the discrete series also appears. Later
on in the book, Varadarajan introduces the discrete series modules from an
analytic viewpoint by realizing it within the principal series of representations.
Using the powerful theory of orbital integrals developed by Gelfand for arriving
at the Plancherel formula for the complex semisimple Lie groups, Varadarajan
remarks that Harish-Chandra perfected Gelfand’s orbital integral theory on the
group to orbital integral theory on the Lie algebra using the method of Fourier
transforms on a Lie algebra and using this theory arrived at the Plancherel
formula for real semisimple Lie groups like SL(2, R). Here, unlike the complex
case where the orbital integral theory is a generalization of Weyl’s integration
formula for compact Lie groups, the orbital theory involves integration over
the different non-conjugate Cartan subgroups and as in the complex case, not-
ing that the orbital integrals can directly be used to obtain the distributional
characters of the principal and discrete series. The discrete series are infinite
dimensional irreducible representations of SL(2, R), first introduced in this spe-
cial case by Bargmann, which when included along with the principal series
yields the Plancherel formula for SL(2, R). All this along with many interesting
stories are told in Varadarajan’s book like how Gelfand missed out the discrete
series because he adopted the Lie group approach rather than the Lie algebraic
approach and hence failed in deriving the Plancherel formula for groups having
more than one non-conjugate Cartan subgroup like SL(n, R), n ≥ 2. Whilst
dealing with the discrete series representations of SL(n, R) for n > 1 Harish-
Chandra introduced parabolic subgroups and parabolic induction to construct
representations by inducing them from the characters of compact Cartan
subgroups. The compact Cartan subgroups consist of 2 × 2 diagonal blocks of real
rotation matrices, ie, SO(2) matrices, and it is proved that the representations
of SL(n, R) that are irreducible and belong to the discrete series are closely
related to the compact Cartan subgroups. The discrete series of representa-
tions has properties very similar to the irreducible representations of compact
Lie groups. In the case of SL(2, R), which has two non-conjugate Cartan sub-
groups, namely, the hyperbolic group diag[a, a^{-1}], a ≠ 0 and the elliptic group
[[cos(θ), −sin(θ)], [sin(θ), cos(θ)]], 0 ≤ θ < 2π, it is easy to evaluate the discrete series
character on elliptic subgroup but harder to evaluate it on the hyperbolic sub-
group. HarishChandra had observed that the characters of the principal series
for SL(2, R) vanish on the elements that are conjugate to the compact Cartan
subgroup, ie, the elliptic subgroup but not on those that are conjugate to the
non-compact Cartan subgroup, ie, the hyperbolic subgroup and it is precisely
this reason that prevents one from obtaining a Plancherel formula for SL(2, R)
using only the principal series of representations. The discrete series of represen-
tations on the other hand do not vanish on either the elliptic or the hyperbolic
subgroup and thus proves to be decisive in obtaining the complete Plancherel
formula. For SL(2, R), Bargmann was the first to perform these evaluations
from a spectral theory standpoint for differential operators but Harish-Chandra
attacked this problem from a more general standpoint, namely that of deriv-
ing differential equations satisfied by the character. The associated differential
operators are invariant differential operators and the characters are invariant
eigen-distributions for these differential operators. HarishChandra explicitly
solved these differential equations for the Discrete series character and obtained
the values of the constants appearing in this solution from the so-called
"celebrated Harish-Chandra jump relations" which relate the orbital integrals for
the non-conjugate Cartan subgroups at boundary points. Harish-Chandra thus
effectively combined formidable algebraic machinery with formidable analytic
machinery to arrive at the complete Plancherel formula for any semi-simple Lie
group real or complex. Varadarajan’s presentation of HarishChandra’s analysis
for the Plancherel formula for SL(2, R) is based on expressing the value of a
function at the identity as a superposition of the Fourier transforms of
the orbital integral of f over the two non-conjugate Cartan subgroups elliptic
and hyperbolic and observing that the Fourier transform of the orbital integral
over the elliptic subgroup is expressible in terms of the discrete series characters
while that of the orbital integral over the hyperbolic subgroup is expressible in
terms of the Principal series character. This derivation of the Plancherel for-
mula HarishChandra extended to all semisimple Lie groups and it is definitely
one of the greatest achievements of twentieth century mathematics which Varadara-
jan presents in a form accessible to even a graduate student. It is remarkable
that since SL(2, C) is the double cover of the proper orthochronous Lorentz
group, its irreducible representations and characters can be used to construct
invariants for space-time image fields and hence do pattern recognition for three
dimensional time varying object fields. Further, SL(2, R) is the double cover

of the proper orthochronous Lorentz group acting in the 2-D plane × time axis
and hence can be used for pattern recognition of two dimensional time vary-
ing image fields. Many engineers working in signal and image processing today
learn about the representation theory of SL(2, C) and SL(2, R) from Varadara-
jan’s book and apply it successfully to pattern recognition and estimation of
the Lorentz transformation from the initial object field and the final object
field corrupted with noise. Further, the representation theory of the symplectic
group is important in doing pattern recognition for classical mechanical systems.
For example, H(q, p) = (1/2)p^T Ap + (1/2)q^T Bq, where A, B are positive definite
matrices, is the Hamiltonian of a system of linear coupled harmonic oscillators.
Under this Hamiltonian dynamics, (q(t)T , p(t)T )T is obtained by applying a lin-
ear symplectic transformation to (q(0)T , p(0)T )T . Now consider an observable
f (x), (x = (q, p) on phase space. This observable may for example represent a
signal field produced by the particles of the oscillator system dependent on their
phase space configuration, ie positions and velocities, for example if they emit
electromagnetic waves of definite frequencies, then the measured field amplitude
will depend on their positions and the measured field frequencies will depend
on their momenta in view of the Doppler effect. After time t, the measured
signal field becomes f(T_t^{-1} x) where T_t is a symplectic matrix dependent on
the matrices A, B. When this signal field gets corrupted with noise, we can
apply representation theory of the symplectic group to pull the matrix T_t out of the
observable function and hence estimate the parameters on which A, B depend
easily. Varadarajan’s exposition of the representation theory of the symplectic
group is very useful. I have myself benefited a great deal by reading his text
book and applying it to this system feature estimation problem.
Even before meeting Harish-Chandra, just after meeting Mackey and lectur-
ing at the ISI on the axiomatic foundations of quantum mechanics, Varadarajan
had summarized his ISI lectures on the mathematical foundations of quantum
mechanics in the form of an impeccable book titled ”Geometry of quantum
theory” which later on matured into a volume published by Springer. In this
book, Varadarajan talks about various kinds of quantum logics that do not
obey Boolean rules, like for example, the logic based on orthogonal projections
in Hilbert space. He then deals with the precise formulation of Mackey’s imprim-
itivity theorem which states that upto a unitary isomorphism, every imprimitiv-
ity system is equivalent to a certain canonical imprimitivity system whose uni-
tary representation is obtained by inducing a representation of a smaller group
in a certain ”smaller” Hilbert space. Varadarajan’s book on quantum theory
also contains a nice proof of Gleason’s theorem which is one of the cornerstone
theorems in the foundations of quantum mechanics. This theorem states that
every probability measure on the lattice of orthogonal projections in a Hilbert
space can be obtained via a density operator/density matrix. Then Varadarajan
in this book discusses Wigner’s theorem on representing automorphisms of the
projection lattice by unitary and antiunitary operators in the Hilbert space and
showing that the theorem is intimately connected with Mackey’s imprimitivity
system. More precisely, Mackey’s system consists of covariant observables under
a group action which assumes that given a group G and a map g → τ (g) from

G into the group of automorphisms of the projection lattice, we consider a spec-


tral measure P on the Borel subsets of G/H on which the group G acts in the
canonical way such that τ (g)(P (E)) = P (gE), we can realize using Wigner’s
theorem a projective unitary-antiunitary (pua) representation g → U (g) of
G such that τ (g)(P ) = U (g)P.U (g)−1 . Hence, connecting the two notions,
one of covariant observables under an automorphism and two, realizing auto-
morphisms using pua representations, one arrives at an imprimitivity system
(H, P (.), G, H, U ) so that U (g)P (E)U (g)−1 = P (g.E), g ∈ G, E ∈ G/H where
H is a subgroup of G. Mackey proved that any irreducible imprimitivity sys-
tem defined by (H, P (.), G, H, U ) (ie, there is no non-trivial subspace of H that
is invariant under the operators U (g), P (E), g ∈ G, E ∈ Borel subsets of the
measurable space G/H) can be realized by induction from an irreducible repre-
sentation L of H acting in a smaller Hilbert space upto an isomorphism. The
proofs of these facts are neatly presented in Varadarajan’s book. Varadarajan’s
book on quantum theory also discusses the mathematical aspects of relativis-
tic quantum mechanics like the generalization of Dirac’s Gamma matrices used
in his relativistic wave equation to general Clifford algebras, Wigner’s particle
classification theory based on constructing irreducible unitary representations of
the Poincare group by the method of induced representations of the semidirect
products for each irreducible representation of the little group corresponding
to a given character of the Abelian group appearing in the semidirect product
and many other interesting features. For particle classification, we require irre-
ducible representations of the Poincare group which is the semidirect product of
the Lorentz group with the space-time translation group. Particle classification
is based on spin and mass. The representations of the little group specify the
particle spin while for the induction, we require in addition, the characters of
the Abelian component, ie, R4 and the nature of this character specifies the
mass of the particle. It is not an exaggeration to say that after Von-Neumann’s
celebrated treatise on the mathematical foundations of quantum mechanics,
Varadarajan’s book is the next major reference on the mathematical founda-
tions that summarizes future developments. The last major book of Varadara-
jan is titled ”supersymmetry for mathematicians”. In this book, Varadarajan
formalizes the mathematical foundations of almost all the physical theories of
supersymmetry existing today as developed by the physicists. Physicists prefer
to confine themselves to 4 space-time (Bosonic) coordinates and 4 Fermionic
coordinates in order to express superfields as fourth order polynomials in the
Fermionic coordinates with the coefficients being functions of the space-time
Bosonic coordinates. Then they define a representation of the supersymmetry
algebra in terms of a super Lie algebra of supervector fields acting on general su-
per fields. Such a representation was first constructed by Salam and Strathdhee.

[2] The scientific contributions of the Indian School of probability


In the 1940’s, C.R.Rao simultaneously with H.Cramer discovered one of the
most important results in mathematical statistics, known today as the Cramer-
Rao bound. It gives a lower bound on the mean square error/variance of any

estimator of a parameter or a parameter vector or a parameter function. The


probability distribution p(X|θ) of a random variable, random vector or random
process X depends upon the parameter θ and one constructs an estimator θ̂(X)
of θ as a function/functional of X. C.R.Rao proved that if this estimator is
unbiased, then the associated mean square error is always greater than a cer-
tain lower bound obtained by forming the trace of the inverse of the Fisher
information matrix:

E[‖θ̂(X) − θ‖²] ≥ Tr(J(θ)^{-1})

where

J(θ) = −E[∂²log(p(X|θ))/∂θ∂θ^T] = E[(∂log(p(X|θ))/∂θ)·(∂log(p(X|θ))/∂θ^T)]
This lower bound states that no matter how one may construct an estimate of
the parameter, one cannot obtain an accuracy beyond a certain limit. This is
just like the uncertainty principle of Heisenberg in quantum mechanics which
states that no matter what the state of a quantum system is, one cannot simulta-
neously measure two non-commuting observables like position and momentum
with infinite accuracy. Recently, mathematical physicists have shown that it
is possible to derive the Heisenberg uncertainty principle from the CRLB and
vice-versa. The idea is roughly to choose a wave function ψ(x) on R and regard
it as the position space wave function. We then shift this wave function by u so
that it becomes ψ(x|u) = ψ(x − u). The corresponding momentum space wave
function is its Fourier transform ψ̂(p|u) = exp(−ipu)ψ̂(p). The probability den-
sity of the position observable in this state is |ψ(x|u)|2 = |ψ(x − u)|2 while the
probability density of the momentum observable in this state is |ψ̂(p)|2 . When
we try to estimate the position shift u from measurement of the position x, we
denote this estimate by û(x). Then the CRLB implies that
 
∫(û(x) − u)²|ψ(x|u)|² dx ≥ 1/∫(∂log(|ψ(x|u)|²)/∂u)²|ψ(x|u)|² dx

or equivalently,

(∫(û(x) − u)²|ψ(x − u)|² dx)·(∫((|ψ(x − u)|²)_{,u})² dx/|ψ(x − u)|²) ≥ 1

If ψ(x) is a real wave function, then we can easily see by appropriate choice of
the estimator û(x) that this inequality implies

(∫x²ψ(x)² dx)·(∫ψ'(x)² dx) ≥ 1/4

which on using Plancherel's formula for the Fourier transform, becomes

(∫x²ψ(x)² dx)·(∫p²|ψ̂(p)|² dp/2π) ≥ 1/4

which is precisely the Heisenberg uncertainty principle. It can also be extended
to complex valued wave functions. C.R.Rao proved another important theorem
in statistics known as the Rao-Blackwell theorem and many theorems on the
existence and construction of sufficient statistics. If X is a random vector or a
random process whose probability distribution given a parameter θ is p(X|θ),
then we say that T (X) is a sufficient statistic for θ if p(X|θ) can be factorized
as

p(X|θ) = q(T (X)|θ).r(X)

ie, r does not depend on θ. This implies that knowledge of T (X) alone suffices to
determine all the information required to estimate θ from X. C.R.Rao then did
a great deal of work on estimation of parameters in linear models using various
kinds of generalized inverses of matrices and algorithms for the construction of
such generalized inverses and their properties. For example, if X = Hθ + V is
a linear model and we wish to estimate θ from X by minimizing  X − Hθ 2
where the norm may be taken w.r.t any positive semidefinite matrix W , then we
may have a unique or non unique solution. If the solution is non-unique and we
choose that solution having a minimum norm, then the estimate of θ is uniquely
determined, and we can write this solution as θ̂ = pinv(H)X where pinv(H)
is termed as the pseudo-inverse and is also called the Moore-Penrose inverse
of H. C.R.Rao derived necessary and sufficient conditions for the non-unique
least squares generalized inverse, and the unique pseudo-inverse. Further, if the
solution to the linear system X = Hθ exists for θ, then it may be non-unique,
but if we choose that solution having the minimum norm, then the solution
set becomes smaller and is given by the ”minimum-norm generalized inverse”
of H applied to X. C.R.Rao derived necessary and sufficient conditions that
the minimum norm generalized inverses of a given matrix must satisfy. It turns
out that all these generalized inverses can be expressed very easily in terms of
the singular value decomposition of the matrix. C.R.Rao did a variety of work
on singular Gaussian distributions which occur commonly in statistics. This
means that we have a Gaussian random vector amongst which one or more non-
trivial linear relations exist. This is equivalent to saying that the covariance
matrix of the Gaussian vector is singular. In this case, we cannot get an explicit
representation of the joint probability density of the Gaussian vector but we
can get an explicit representation for the joint characteristic function of the
Gaussian vector. One way to analyze such singular Gaussian distributions is to
express the Gaussian vector as a rectangular matrix acting on a non-singular
Gaussian random vector of a smaller size and to look at the density of the smaller
sized vector. However, this is not a coordinate free approach since there exist an
infinite number of such representations. The coordinate free approach developed
by C.R.Rao is based on his theory of generalized inverses of singular matrices and
is well summarized in his celebrated book ”Linear statistical inference and its
applications”. C.R.Rao has proved a variety of theorems concerning asymptotic
efficiency of statistical estimators of parameters. For example, if we are given an
infinite sequence X1 , X2 , ... of iid random variables having common pdf p(x|θ)

and we estimate θ based on X1, ..., Xn in the maximum likelihood way so that

θ̂(X1, ..., Xn) = argmax_θ Σ_{k=1}^n log(p(Xk|θ))

then a natural question to ask is whether this sequence of estimators will
have a variance that asymptotically decays as the CRLB given by
[E[(∂log(p(X1, ..., Xn|θ))/∂θ)²]]^{-1}.
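As a numerical illustration (an example of my own, not the book's): for iid N(θ, σ²) observations with known σ, the MLE of θ is the sample mean, the Fisher information of n samples is n/σ², and the variance of the MLE equals the CRLB σ²/n exactly for every n, so the MLE is efficient in this model. A seeded Monte Carlo sketch:

```python
import random

random.seed(0)
theta, sigma, n, trials = 1.5, 2.0, 50, 4000
crlb = sigma ** 2 / n            # [n * (Fisher info of one sample)]^{-1}

# For iid N(theta, sigma^2) data the MLE of theta is the sample mean
sq_err = 0.0
for _ in range(trials):
    xs = [random.gauss(theta, sigma) for _ in range(n)]
    mle = sum(xs) / n
    sq_err += (mle - theta) ** 2
mse = sq_err / trials

print(mse, crlb)   # the empirical MSE of the MLE hovers around the CRLB
```

The empirical mean square error of the MLE matches σ²/n up to Monte Carlo fluctuation, which is the sense in which the likelihood estimator attains the Cramer-Rao bound here.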
He developed analytic tools for examining such problems also in the stationary
dependent case. A statistical estimator of a parameter is said to be efficient if its
variance equals the CRLB, it is said to be asymptotically efficient if the ratio of
its variance to the CRLB converges to unity almost surely as the number of data
samples goes to infinity. It is a well known fact that if there exists an estimator
that is efficient then the maximum likelihood estimator is one such. The proof
of this fact is contained in the appendix. C.R.Rao’s work on the CRLB has been
extended to quantum systems. The problem with quantum systems unlike clas-
sical systems is that given a state, it is not clear what observable to measure or
equivalently, what commuting family of observables to measure or equivalently,
what complete orthogonal family of orthogonal projections (PVM or projection
valued measurement) to measure in order to get an estimator that will have
the least variance. Researchers working in quantum information theory have
therefore arrived at the most general kind of measurement, namely POVMs (an
abbreviation for positive operator valued measurements), which contain all the
above sets of measurements as special cases. Such a measurement is given by
M = {Mα : α ∈ I} where I is a countable index set such that Mα ≥ 0 ∀α ∈ I
and ∑_{α∈I} Mα = 1. If ρ is a state in the same Hilbert space on which these
measurement operators are defined, then on making the measurement M, the
probability of getting the outcome α ∈ I is given by
PM (α) = Tr(ρ Mα )
Thus, PM is a probability distribution on I. If we map I into the real line in any
way, then we can regard I as a countable subset of R and if ρ = ρ(θ) depends
on a parameter θ, we can write
PM (α|θ) = Tr(ρ(θ) Mα )
Based on this measurement system, an estimator of θ is a function θ̂(α) of
the outcome α and by the classical CRLB, its mean square error based on the
measured outcome satisfies
EM [(θ̂(α) − θ)2 ] ≥ JM (θ)−1
where
JM (θ) = ∑_α PM (α|θ) (∂ log PM (α|θ)/∂θ)²
The optimal measurement M in order to estimate θ is obviously that POVM
M for which JM (θ) is a maximum and clearly, this depends on the parameter
θ itself. This result is readily extended to the vector parameter case.
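A small numerical sketch (my own toy example, not from the text): for the one-parameter qubit pure state ρ(θ) = |ψ(θ)><ψ(θ)| with |ψ(θ)> = (cos θ, sin θ)ᵀ, and the two-outcome projective measurement {|0><0|, |1><1|} (a special case of a POVM), the outcome probabilities PM(α|θ) = Tr(ρ(θ)Mα) and the classical Fisher information JM(θ) can be computed directly; for this particular family JM(θ) = 4 for every θ away from multiples of π/2.

```python
import numpy as np

def rho(theta):
    # pure qubit state |psi(theta)><psi(theta)|
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

# two-outcome projective measurement, a special case of a POVM:
# each M_alpha >= 0 and M_0 + M_1 = I
M = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def fisher(theta, eps=1e-6):
    # J_M(theta) = sum_alpha P(alpha|theta) (d log P(alpha|theta)/d theta)^2,
    # with the derivative taken by a finite difference
    p  = np.array([np.trace(rho(theta) @ Ma).real for Ma in M])
    p2 = np.array([np.trace(rho(theta + eps) @ Ma).real for Ma in M])
    dp = (p2 - p) / eps
    return float(np.sum(dp**2 / p))

J = fisher(0.3)   # equals 4 for this family, independent of theta
```

Maximizing JM over all POVMs for each θ is exactly the optimal-measurement problem described above.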

[3] The work of S.R.S. Varadhan on large deviation theory and the
Stroock-Varadhan Martingale characterization of diffusions
Martingale characterization of diffusions: Stroock and Varadhan developed
in a series of historic papers the general theory of characterizing diffusion pro-
cesses by Martingales. The earlier work of Kiyosi Ito characterized a diffusion
process as the solution of a stochastic differential equation or more precisely,
as the solution of stochastic integral equations where the integral is not the
Riemann-Stieltjes integral used for bounded variation processes, but rather the
Ito integral of an adapted process with respect to Brownian motion which is
of unbounded variation. The Riemann-Stieltjes integral for an adapted process
w.r.t Brownian motion does not exist; in fact, if we define it as the limit of sums
of the form ∑_i f (si )(B(ti+1 ) − B(ti )) where si ∈ [ti , ti+1 ), then as the size of
the partition tends to zero, one gets different results depending on how the si 's
are chosen. So Ito preferred to define his integral as the limit with si = ti .
The adaptedness of f then implies that the expectation of the Ito integral for
an adapted process is zero and the expectation of the square of the Ito integral
of an adapted process is simply E ∫ f (t)² dt. This fact implies a nice Hilbert
space isomorphism approach to the construction and properties of the Ito inte-
gral. In other words, we get a Hilbert space isomorphism between the Hilbert
space of square integrable adapted processes and the Hilbert space of finite
mean square random variables that are measurable w.r.t the Brownian motion
process. This Hilbert space approach to the construction of the Ito stochastic
integral gives us only an L2 -construction. Ito proved almost sure convergence
of his integral, which is a stronger version of the L2 -construction. Then, for
the solution of his stochastic differential equation to exist and be unique, Ito
had to assume Lipschitz conditions for the drift and diffusion coefficients. Ito's
method works further only when the drift and diffusion coefficients depend on
time and the instantaneous state process and, in addition, satisfy a Lipschitz
condition. The work of Stroock and Varadhan is a far-reaching generalization
of the Ito theory. They first observe that if X(t) is a diffusion process in the
sense of Ito with infinitesimal generator L, then for any twice continuously
differentiable function f , or equivalently for f in the domain of L, the process
Mf,X (t) = f (X(t)) − ∫_0^t Lf (X(s)) ds is a Martingale. Conversely if for an adapted
process X, if Mf,X (t) is a Martingale for all twice continuously differentiable
functions f , then by taking f (X) = Xa and f (X) = Xa Xb respectively, and
using the Brownian motion representation for continuous martingales, we get
that X(t) satisfies the Ito stochastic differential equation with drift and diffusion
coefficients determined from L. Already we note that in this argument nowhere
does one require the use of Ito stochastic integrals w.r.t Brownian motion, only
ordinary Riemann integrals w.r.t time are involved. The next giant step taken
by Stroock and Varadhan was to replace the generator L defined as a par-
tial differential operator by a similar one, ie, linear combinations of ∂/∂xa and
∂ 2 /∂xa ∂xb but now with coefficients that not only need not satisfy the Lipshitz
conditions but which can be functions of the entire process history and then
calling the process X(t) an Ito process if Mf,X (t) is a Martingale for all twice
differentiable functions f . Thus, if for a given L = Lω , there exists a unique

probability measure P on the space of continuous paths such that when under
P , the process Mf,X (t) is a Martingale for all f ∈ C 2 , then in the language of
Stroock and Varadhan, we say that the Martingale problem is well posed and
the solution X having the probability law P is then called an Ito process. The
form of Lω is
 
Lω = μi (t, ω)∂/∂xi + (1/2) aij (t, ω)∂ 2 /∂xi ∂xj
i i,j

where ω is the coordinate process and μi , aij are progressively measurable pro-
cesses. Stroock and Varadhan determine existence and uniqueness conditions
for Ito processes and as one clearly expects, such processes are constructed by
patching together scaled Brownian motion processes with drift over infinitesi-
mal time intervals. The Stroock-Varadhan theory therefore extends the scope
of the Ito theory of stochastic differential equations by including drift and diffu-
sion coefficients satisfying only a boundedness condition and with the resulting
process not necessarily satisfying an sde in the Ito pathwise sense, but rather
being defined by a probability measure on the space of continuous paths
which is uniquely determined by the solution to the Martingale problem.
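The martingale property above is easy to check by simulation (a quick sketch of my own; the choice f(x) = x² is mine). For Brownian motion, L = (1/2) d²/dx², so with f(x) = x² we get Lf = 1 and M_{f,B}(t) = B(t)² − t, which should have mean zero at every t:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, T = 20000, 200, 1.0
dt = T / n_steps

# simulate Brownian paths by summing independent Gaussian increments
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)

# f(x) = x^2 gives Lf = (1/2) f'' = 1, so
# M(t) = f(B(t)) - integral_0^t Lf(B(s)) ds = B(t)^2 - t
M_T = B[:, -1]**2 - T
mean_M = M_T.mean()   # should be close to 0
```

Only ordinary Riemann integrals in time are needed here, which is exactly the point of the Stroock-Varadhan formulation.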
After this work, Donsker and Varadhan in a series of historic papers developed
the theory of large deviations for general random variables taking values in a
metric space or even more generally, taking values in a topological vector space.
This includes process valued random variables. Varadhan while starting his
research in this field, was first inspired by the following question of Donsker:
Suppose B(t) is Brownian motion and consider evaluating the expectation
u(t, x) = E[exp(∫_0^t V (x + B(s)) ds)]

It is a well established result of Feynman and Kac that u satisfies the pde

∂u(t, x)/∂t − (1/2)∂ 2 u(t, x)/∂x2 − V (x)u(t, x) = 0,

u(0, x) = 1
The solution to this can formally be expressed as

u(t, x) = ∑_{n≥1} cn un (x) exp(λn t)

where un (x), λn are solutions to the eigenvalue problem

(1/2)un,xx (x) + V (x)un (x) = λn un (x)

Let λ1 denote the maximum eigenvalue. Then it is clear that

limt→∞ t−1 log(u(t, x)) = λ1 = λ1 (V )



On the other hand, we have by the variational principle of Rayleigh that

λ1 (V ) = sup_f (∫_R f (x)² V (x) dx − (1/2) ∫_R f ′(x)² dx)

where the maximization is over all f such that

∫_R f (x)² dx = 1
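The principal eigenvalue itself can be checked numerically (a sketch under my own choice V(x) = −x²/2, for which (1/2)u″ + V u = λu is the negative of the quantum harmonic oscillator Hamiltonian and the principal eigenvalue is λ1 = −1/2). Discretizing on a large interval with Dirichlet boundary conditions:

```python
import numpy as np

# discretize (1/2) u'' + V(x) u = lambda u on [-L, L] with Dirichlet ends
L, N = 8.0, 1200
x = np.linspace(-L, L, N)
h = x[1] - x[0]
V = -0.5 * x**2

# tridiagonal finite-difference matrix for (1/2) d^2/dx^2 + V
main = -1.0 / h**2 + V
off = 0.5 / h**2 * np.ones(N - 1)
H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

lam1 = np.linalg.eigvalsh(H)[-1]   # principal (largest) eigenvalue, ~ -0.5
```

The same matrix could equally be fed to the Rayleigh variational formula above; the supremum over normalized f is attained at the top eigenvector.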

Donsker asked Varadhan the question: how does one generalize this result to arbitrary
stochastic processes, not necessarily Brownian motion? The answer was a re-
markable result due to Varadhan, known as Varadhan's integral lemma, which
states that if a family of random variables Z(ε), ε → 0 satisfies a large deviation
principle with rate function I(x), ie,

Pr(Z(ε) ∈ E)

[4] Varadhan’s work on large deviations


[1] Sanov was the first who derived the rate function for the empirical distri-
bution of a sequence of iid random variables. He observed that the rate function
is simply the relative entropy between the desired probability measure and the
actual probability measure of each element of the iid sequence.
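Sanov's rate can be checked by hand in the Bernoulli case (a back-of-the-envelope sketch of my own with fair coin flips): the probability that n tosses of a fair coin show empirical frequency q decays like exp(−n D(q‖p)), where D is the relative entropy, so −(1/n) log P should be close to D(q‖p):

```python
import math

n, p, q = 400, 0.5, 0.7
k = int(q * n)

# exact probability that the empirical frequency of heads equals q
prob = math.comb(n, k) * p**k * (1 - p)**(n - k)
empirical_rate = -math.log(prob) / n

# Sanov / relative-entropy rate D(q || p) for Bernoulli measures
kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
# empirical_rate differs from kl only by an O(log(n)/n) Stirling correction
```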
[2] Cramer was the first who derived the rate function and proved the LDP
for the empirical average of a sequence of iid random variables. He also observed
that the LDP for the empirical measure of this sequence derived by Sanov could
be obtained as a special case.
[3] Gartner and Ellis were the first to observe that LDP’s for arbitrary fami-
lies of random variables on topological vector spaces can be derived by replacing
the logarithmic moment generating function in Cramer’s theorem by a scaled
limit of that of the sequence under certain very general conditions. Using this,
they showed how Sanov’s theorem and Cramer’s theorem followed as special
cases.
[4] Varadhan derived an integral lemma which was further completed by
Bryc in the form of an inverse Varadhan integral lemma. These theorems gave
an alternate equivalent formulation of the large deviation principle according to
which one can compute the low temperature limit of the logarithm of the mo-
ment generating function of any smooth function of a family of random variables
that satisfy the LDP. This result was fundamental since it opened up the weak
convergence approach to large deviations and enabled one to use the standard
Euler-Lagrange equations for maximizing a functional of a random variable to
obtain the low temperature limit of the logarithmic partition function.
[5] The weak convergence approach to large deviations was propounded by
Dupuis and Ellis. By duality, it is in a sense another version of the integral
lemmas of Varadhan and Bryc.

[6] Freidlin and Wentzell created the theory of large deviations for diffusion
processes driven by weak white Gaussian noise, or equivalently, for stochastic
differential equations driven by Brownian motion with small amplitude. In a
sense these results can be obtained by using the contraction principle according
to which if I1 (x) is the rate function for a random variable/random process
x and if y = f (x) is another random variable/random process, then the rate
function for y is
I2 (y) = inf {I1 (x) : f (x) = y}
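For instance (a toy numerical sketch of my own): the empirical mean of iid N(0,1) variables satisfies an LDP with I1(x) = x²/2; for the map y = f(x) = 2x, the contraction principle gives I2(y) = I1(y/2) = y²/8, which a crude grid search over {x : f(x) = y} reproduces:

```python
import numpy as np

I1 = lambda x: x**2 / 2      # rate function of the mean of iid N(0,1)
f = lambda x: 2.0 * x        # the continuous map y = f(x)

def I2(y):
    # contraction principle: I2(y) = inf { I1(x) : f(x) = y },
    # approximated by a grid search over x
    grid = np.linspace(-10.0, 10.0, 20001)
    near = np.abs(f(grid) - y) <= 1.1e-3
    return float(I1(grid[near]).min())

val = I2(3.0)   # analytic answer: 3**2 / 8 = 1.125
```

For non-invertible f the infimum genuinely matters: several x can map to the same y, and the cheapest one sets the rate.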

[7] Donsker and Varadhan in a series of path-breaking papers established
the LDP for the empirical distribution of Markov processes both in discrete and
continuous time. This result has an easy derivation using the Gartner-Ellis the-
orem. Indeed, one can obtain the limiting logarithmic function for finite time
averages of a function of the Markov process samples in terms of the principal
eigenvalue of a modified transition probability kernel and from this expression,
derive the associated rate function by applying the Legendre transform. Al-
though Donsker and Varadhan do not use the Gartner-Ellis method (which
depends on existence of exposed points), the Gartner-Ellis method provides an
easier proof.
[8] Before the work of Freidlin and Wentzell, Schilder obtained the rate
function for Brownian motion. His result can also be obtained from the Gartner-Ellis
theorem. Once we know the rate function of Brownian motion, we can, using
the contraction principle, obtain the rate function for any process driven by
Brownian motion, by considering the diffusion process as a functional of the
driving Brownian motion and then applying the contraction principle.
[9] Donsker and Varadhan derived the LDP for stationary process pertur-
bations of a Markov process. More precisely, given a Markov process with a
known transition probability distribution, we can ask the question, what is the
probability that the empirical measure of this process will assume values in a set
whose elements are stationary measures? The rate function for this problem
is a minor modification of the relative entropy between the stationary measure
and the true Markovian measure. The general result is a
rate function of the form

I(Q) = EQ [log(dQ(x0 |x−1 , x−2 , ...)/dπ(x0 |x−1 ))]

= ∫ dQ(x0 , x−1 , ...) log(dQ(x0 |x−1 , x−2 , ...)/dπ(x0 |x−1 ))

Varadhan has also posed the question that if the original process is any sta-
tionary process with probability measure P , then can the rate function for the
empirical density of this process be expressed as

I(Q|P ) = ∫ dQ(x0 , x−1 , x−2 , ...) log(dQ(x0 |x−1 , x−2 , ...)/dP (x0 |x−1 , x−2 , ...))

There are difficulties with the existence of such a rate function. For example if

both P and Q are ergodic processes, then P will be concentrated on the set


{(x0 , x−1 , x−2 , ...) | lim_{n→∞} n⁻¹ ∑_{k=0}^{n−1} x−k = ∫ x0 dP = μ(P )}

while Q will be concentrated on the set

{(x0 , x−1 , x−2 , ...) | lim_{n→∞} n⁻¹ ∑_{k=0}^{n−1} x−k = ∫ x0 dQ = μ(Q)}

so if μ(P ) ≠ μ(Q),
then the Radon-Nikodym derivative dQ(x0 |x−1 , x−2 , ...)/dP (x0 |x−1 , x−2 , ...)
will not exist.
[10] Varadhan obtained the rate function for random walks in a random
environment. Specifically, the environment consists of a lattice such that the
transition probability from x to x + z is a random function of z. More generally,
we can consider a random environment such that the transition probability from
x to x + z is of the form π(ω, x, z) such that for each z, the map x → π(ω, x, z)
is a stationary process on the lattice.
[5] Survey of some of the work in quantum probability by the Indian
school of probabilists
Computing scattering probabilities in quantum mechanics
Work of Amrein, Sinha and Jauch on the determination of time
delay in scattering processes
Let H, H0 be two Hamiltonians. H0 corresponds to the free particle Hamilto-
nian while H = H0 +V is the Hamiltonian of the free particle plus its interaction
energy V with the scattering centre. Let Ω+ , Ω− denote the wave operators and
S the scattering matrix, ie, S = Ω∗+ Ω− . Then, we have

Ω− .exp(−itH0 ) = exp(−itH)Ω− , ∀t

and also
Ω+ .exp(−itH0 ) = exp(−itH).Ω+ , ∀t
so that
exp(itH0 ).S.exp(−itH0 ) = S, ∀t
which means that [S, H0 ] = 0. Now let |f > be a free particle state. Regarding
this as the input free state, the corresponding in-scattered state is Ω− |f >. The
average time spent by the particle in this state within the ball B = B(0, R) =
B(R) of radius R in position space is then
T (R) = ∫_0^∞ ‖χB (Q) exp(−itH) Ω− f ‖² dt

= ∫_0^∞ ‖χB (Q) Ω− exp(−itH0 ) f ‖² dt

Work on Coulomb scattering


Although a finite value of the scattering cross section for repulsive Coulomb
scattering occurs in the classical case, in the quantum case the wave operators
do not exist. To rectify this problem, we introduce a time varying phase term,
ie, another time varying unitary operator whilst computing the wave operators
and thereby obtain a finite value of these modified wave operators.

Wt = exp(−iH0 t), Ut = exp(−itH), H = H0 + V, H0 = P 2 , V = V (Q) = K/|Q|

Let
Xt = ∫ V (2P t) dt = ∫ K dt/|2P t| = K log(t)/(2|P |)

lim_{t→∞} (Ut∗ Wt exp(−iXt ) − I)f =

i ∫_0^∞ Ut∗ (V (Q) − V (2P t)) Wt exp(−iXt ) f dt

So to ensure that this limit exists, we require that the function of time

‖(V (Q) − V (2P t)) Wt exp(−iXt ) f ‖

is integrable as t → ∞. Note that [Wt , Xs ] = 0. Now observe that

Wt∗ V (Q)Wt = V (Wt∗ QWt ) = V (Q + 2P t)

= Zt∗ V (2P t)Zt


where
Zt = exp(iQ2 /4t)
because
Zt∗ 2P tZt = 2P t − (i/4t)2t[Q2 , P ] = Q + 2P t
Thus,
‖(V (Q) − V (2P t)) Wt exp(−iXt ) f ‖
= ‖(Zt∗ V (2P t) Zt − V (2P t)) exp(−iXt ) f ‖
= ‖∫_0^1 (d/du)[exp(−iuQ²/4t) V (2P t) exp(iuQ²/4t)] exp(−iXt ) f du‖
≤ ∫_0^1 ‖[Q²/4t, V (2P t)] exp(iuQ²/4t) exp(−iXt ) f ‖ du

Now,
[Q², V (2P t)] = 2it(QV ′(2P t) + V ′(2P t)Q)
= 2it(2itV ″(2P t) + 2V ′(2P t)Q) = −4t² V ″(2P t) + 4itV ′(2P t)Q

Further,

exp(−iuQ²/4t) F (2P t) exp(iuQ²/4t) = F (Q + 2P t/u)

= exp(itP ²/u) F (Q) exp(−itP ²/u) = W∗_{t/u} F (Q) W_{t/u}
Thus, we get

‖(V (Q) − V (2P t)) Wt exp(−iXt ) f ‖

≤ ∫_0^1 ‖(−tV ″(Q) + iV ′(Q)Q) W_{t/u} exp(−iXt ) f ‖ du
Note that we also have the inequality

‖(V (Q) − V (2P t)) Wt exp(−iXt ) f ‖

≤ ‖(−tV ″(2P t) + iV ′(2P t)Q) Z_{t/u} exp(−iXt ) f ‖


[6] The role of probability and statistics in general relativity and
cosmology
Summary: Structure formation in the universe analyses the problem of
studying the evolution of galaxies, ie, their density, pressure and velocity field
perturbations and the metric perturbations under the linearized Einstein field
equations when the initial values of these are random. In other words, we are
interested in how an initial random configuration evolves with time under deter-
ministic dynamics. This is basically a problem in dynamical systems determined
by deterministic ode’s and pde’s but with random initial conditions.
Consider the Einstein field equations with a random energy-momentum ten-
sor of matter plus radiation T μν . T μν includes terms from random forces subject
to the conservation condition
T^{μν}_{:ν} = 0
in view of the Einstein field equations

Rμν − (1/2)Rg μν = −8πGT μν

and the Bianchi identity

(Rμν − (1/2)Rg μν ):ν = 0

[7] Work of the Indian school of general relativists


[a] S. Chandrasekhar on the collapse of a star into a white dwarf, post-Newtonian equations of hydrodynamics, separation of Dirac's relativistic equation for an electron in the Kerr metric, detailed analysis of axial and polar gravitational perturbations of various kinds of blackholes, a study of electromagnetic waves in the Schwarzschild and Kerr metrics and the solution of Dirac's relativistic wave equation in a background Kerr metric. As observed by Hawking, the Dirac equation in curved space-time can be used to compute the wave

function of an electron at a later time given that at time t = 0, all the proba-
bility mass of the wave function was concentrated within the critical blackhole
radius. The general form of the metric of a rotating blackhole is given by

dτ 2 = A(r, θ)dt2 − B(r, θ)dr2 − C(r, θ)dθ2 − D(r, θ)(dφ − ω(r, θ)dt)2

Thus,
g00 = A − Dω 2 , g11 = −B, g22 = −C, g33 = −D,
g03 = Dω
The Klein-Gordon equation in this metric is

(√(−g) g^{μν} ψ,ν ),μ + m² √(−g) ψ = 0

At time t = 0, the probability distribution of the position of the KG particle in
space is
|ψ(0, r, θ, φ)|² r² sin(θ) dr dθ dφ
and we assume that this is non-zero only when r < rc where rc is the critical
radius of the blackhole. The problem is to compute the probability distribution
of the particle at time t, ie, |ψ(t, r, θ, φ)|², and show that as t → ∞, there is a
larger and larger probability of finding the particle in regions exterior to
the critical radius.

[b] C.V. Vishweshwara on some special solutions to the Einstein field equations, especially those having cylindrical symmetry.
[c] J.Narlikar on the quasi-steady state model in cosmology in which matter is
assumed to be continuously flowing into our universe from a baby universe that
is unobservable, thereby maintaining a constant matter density in our observable
universe despite its expansion.
[d] Raychaudhuri on the geodesic deviation equation.

d2 xμ /dτ 2 + Γμαβ (dxα /dτ ).(dxβ /dτ ) = 0

gives on taking first order variations

d2 δxμ /dτ 2 + Γμαβ,ρ (x)δxρ (dxα /dτ ).(dxβ /dτ ) + 2Γμαβ (dxα /dτ ).dδxβ /dτ = 0

[e] T. Padmanabhan on structure formation in the universe, basically
expounding on how small initial perturbations in the metric tensor of space-
expounding on how small initial perturbations in the metric tensor of space-
time, density and velocity of the matter field evolve under the Einstein field
equations into larger values thus leading to the formation of galaxies and other
structures in our universe on the large scale.
If we write down the linearized Einstein field equations as

Lt f (x) = 0

where Lt is a partial differential operator dependent on time only through the
scale factor S(t) of the expanding universe and f denotes the vector field whose

components are the six metric perturbation coefficients δgkm (x), 1 ≤ k ≤ m ≤ 3
(the coordinate condition may be chosen so that δg0μ = 0), the three velocity
field components δv r (x), r = 1, 2, 3 and the density perturbation δρ(x) (the
equation of state is assumed to be known so that, using it, we can express
the pressure fluctuations in terms of the density fluctuations), then we can in
principle obtain a dispersion relation for these perturbations. Specifically, we
write
Lt = L0 (t) + L1 (t)∂t + L2k (t)∂k + L3k (t)∂t ∂k
+L4km (t)∂k ∂m + L5 (t)∂t²
where summation over the repeated spatial indices k, m is implied and L0 (t), L1 (t),
L2k (t), L3k (t), L4km (t)
are all matrices of size 10 × 10. Note that f (x) is also a 10 × 1 vector field. To
use this pde for arriving at dispersion relations, we assume that

f (x) = Re(h(t).exp(ik.r))

and find on substituting this into the above pde, the following matrix ode for
h(t):
L0 (t)h(t) + L1 (t)h′(t) + ikm L2m (t)h(t)
+ikm L3m (t)h′(t) − kp km L4pm (t)h(t) + L5 (t)h″(t) = 0
In the special case of a non-expanding background metric, the L matrices
are all constant and the dispersion relation is obtained by assuming h(t) =
Re(h(0)exp(iωt)) the following dispersion relation relating the frequency ω to
the wave-vector (km )m=1,2,3 :
det(L0 + iωL1 + ikm L2m − km ωL3m − kp km L4pm − ω 2 L5 ) = 0
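In the constant-coefficient case this is, for each wave-vector k, a polynomial (here quadratic) eigenvalue problem in ω. A minimal numerical sketch (2×2 toy matrices of my own choosing, not the actual 10×10 cosmological system) linearizes det(A + ωB − ω²L5) = 0 into an ordinary generalized eigenvalue problem:

```python
import numpy as np

# toy 2x2 instance of det(L0 + i w L1 + i k L2 - k w L3 - k^2 L4 - w^2 L5) = 0
k = 1.0
L0 = np.diag([1.0, 2.0]); L1 = 0.1 * np.eye(2)
L2 = np.zeros((2, 2));    L3 = np.zeros((2, 2))
L4 = np.eye(2);           L5 = np.eye(2)

A = L0 + 1j * k * L2 - k**2 * L4   # omega-independent part
B = 1j * L1 - k * L3               # coefficient of omega

# companion linearization: [[0, I], [A, B]] v = w [[I, 0], [0, L5]] v
# with v = [u; w u] recovers A u + w B u = w^2 L5 u
Z, I2m = np.zeros((2, 2)), np.eye(2)
lhs = np.block([[Z, I2m], [A, B]])
rhs = np.block([[I2m, Z], [Z, L5]])
omega = np.linalg.eigvals(np.linalg.solve(rhs, lhs))
```

Each eigenvalue ω is one branch of the dispersion relation at this k; sweeping k traces out the full dispersion curves.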
In the time varying case, ie, when the background metric describes an expanding
isotropic and homogeneous universe, we assume that the frequency varies slowly
with time and obtain approximations for it using time dependent perturbation
theory. Specifically, we assume that each of the L matrices is the sum of a large
time independent part and a small time dependent part. Such a decomposition
is obtained by writing the scale factor as
S(t) = 1 + δ.(S(t) − 1)
and considering δ(S(t) − 1) and all its derivatives to be of the first order of
smallness ie of O(δ) and then decomposing the frequency ω also in the same
way:
ω(t) = ω0 + δω1 (t)
and solving for ω0 , ω1 (t) by equating coefficients of δ 0 = 1 and δ 1 = δ on both
sides. Padmanabhan also talks about how statistical features like second and
higher order metric,velocity and density correlations evolve with time given their
initial values.
[f] Ashoke Sen on quantum gravity using string and superstring theories.

The usual way of describing the physics of systems in the world is either to
regard an object as an ensemble of point particles and describe the dynamics of
each particle in terms of differential equations if we adopt the classical viewpoint,
or else if we adopt the quantum viewpoint, we describe the dynamics of the
wave function of an ensemble of a finite number of particles or else the wave
functional of a quantum field. In any case, dynamics of particles or systems
of interacting particles have motion described by the time variable. In string
theory, there are no point particles only strings. Thus, to describe classical
physics, rather than considering the world line of a particle xμ (τ ) as a function
of the proper time τ which is a single parameter, consider the world sheet of a
string X μ (τ, σ) where τ is some real parameter which we may call proper time
and σ is another parameter which may be assumed to vary over the interval
[0, 1] after an appropriate scaling. For each time τ , the map σ → X μ (τ, σ) from
[0, 1] into R^{D+1}, μ = 0, 1, ..., D, is the parametric equation of a string in D + 1
dimensions and, as time τ varies, this mapping traces out a world sheet in D + 1 dimensional space-time.
The metric of space-time in general relativity has the form

dτ 2 = gμν (x)dxμ dxν

while in string theory, it has the form

gμν (X)dX μ ∧ dX ν

In the former case, gμν dxμ dxν determines an infinitesimal proper time interval
while in the latter case, gμν dX μ ∧ dX ν describes an infinitesimal area element.
Just as minimizing the proper time interval ∫ dτ for point particles gives a
geodesic trajectory in the former case, so also minimizing the proper area
interval ∫ gμν (X) dX^μ ∧ dX^ν gives a geodesic world sheet trajectory, ie, it tells
us the sheet traced out by the string. We note that

gμν (X) dX^μ ∧ dX^ν = gμν (X)(X^μ_{,τ} X^ν_{,σ} − X^μ_{,σ} X^ν_{,τ}) dτ dσ

and setting its variation to zero gives us the world sheet equation:
∂τ (∂L/∂X^μ_{,τ}) + ∂σ (∂L/∂X^μ_{,σ}) − ∂L/∂X^μ = 0

where

L = L(X^μ , X^μ_{,τ} , X^μ_{,σ}) = gμν (X)(X^μ_{,τ} X^ν_{,σ} − X^μ_{,σ} X^ν_{,τ})
For quantizing the world-sheet dynamics of a string, we may start with this
Lagrangian density L and compute the corresponding Hamiltonian density

H(X, X,σ , P ) = Pμ X^μ_{,τ} − L

where

Pμ = ∂L/∂X^μ_{,τ}

The classical Hamiltonian equations that are equivalent to the above Euler-
Lagrange equations are
X^μ_{,τ} = ∂H/∂Pμ ,
Pμ,τ = −∂H/∂X^μ + ∂σ (∂H/∂X^μ_{,σ})

The quantum mechanical wave equation for the wave functional of a string has
the form
i∂τ ψ(τ, X) = Hψ(τ, X)
where
X = {X(σ) : 0 ≤ σ ≤ 1}
and

H = ∫_0^1 H(X(σ), X,σ (σ), P (σ)) dσ

with
Pμ (σ) = −i∂/∂X μ (σ)

[g] Numerical general relativity based on the ADM formalism carried out at
the NSUT school of astronomy
The ADM action expresses the Einstein-Hilbert action for the gravitational
field in terms of a spatial component and a temporal component. The idea is
basically to start with a space-time manifold having coordinates X μ and imbed
into this four dimensional manifold a family Σt of spatial manifolds for different
times t. Let gμν denote the metric w.r.t the coordinates X μ and g̃μν that w.r.t
the space-time coordinates xμ , x0 = t that parametrize Σt . Then, we write

g^{μν} = g̃^{αβ} X^μ_{,α} X^ν_{,β}

= g̃^{ab} X^μ_{,a} X^ν_{,b} + 2g̃^{a0} X^μ_{,a} X^ν_{,t} + g̃^{00} X^μ_{,t} X^ν_{,t}


We write
X^μ_{,t} = T^μ = N^μ + N n^μ
where N μ is purely spatial and hence tangential to Σt while nμ is orthogo-
nal/normal to Σt , and N is a scalar field chosen so that nμ has unit length.
Thus we can write
N^μ = N^a X^μ_{,a}

and express the condition for orthogonality of n^μ to Σt as

gμν (T^μ − N^a X^μ_{,a}) X^ν_{,b} = 0

This gives

g̃_{ab} N^a = gμν T^μ X^ν_{,b} = g̃_{0b}

This is a system of three linear equations for the three components N a and its
inversion is easy. Using this identity, it is easily proven from the above identities
that
g^{μν} = g̃^{ab} X^μ_{,a} X^ν_{,b} + n^μ n^ν
ie, the metric tensor in the X μ system decomposes into the sum of a purely
spatial part and a purely normal part with the cross terms cancelling out. Note
that the cross term in the expansion of g μν is given by

g̃^{a0} N (X^μ_{,a} n^ν + X^ν_{,a} n^μ ) + g̃^{00} N (N^μ n^ν + N^ν n^μ )

and to check that this is zero, it suffices to check that

g̃^{a0} + g̃^{00} N^a

vanishes. Recall that


N^μ = N^a X^μ_{,a}

in our notation. However, we’ve already noted that

g̃_{ab} N^b = g̃_{a0}

and this implies that


g̃^{μa} g̃_{ab} N^b = g̃^{μa} g̃_{a0}
or equivalently,
(δ^μ_b − g̃^{μ0} g̃_{0b}) N^b = δ^μ_0 − g̃^{μ0} g̃_{00}
Setting μ = c and μ = 0 respectively in this equation gives

N^c − g̃^{c0} g̃_{0b} N^b + g̃^{c0} g̃_{00} = 0,

−g̃^{00} g̃_{0b} N^b = 1 − g̃^{00} g̃_{00}

and these two imply

N^c + g̃^{c0} (1 − g̃^{00} g̃_{00})/g̃^{00} + g̃^{c0} g̃_{00} = 0

or equivalently,

N^c + g̃^{c0}/g̃^{00} = 0
which is what we sought to prove. We write q_{ab} for g̃_{ab} and

q^{μν} = g̃^{ab} X^μ_{,a} X^ν_{,b}

so that q becomes purely spatial, ie, tangential to Σt :

q^{μν} nν = 0
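The shift identity N^c = −g̃^{c0}/g̃^{00} derived above is easy to verify numerically (a sketch with a randomly generated symmetric positive-definite 4×4 matrix of my own, standing in for g̃; indices run 0..3 with 0 the time index):

```python
import numpy as np

rng = np.random.default_rng(2)
Arand = rng.normal(size=(4, 4))
g = Arand @ Arand.T + 5.0 * np.eye(4)   # random symmetric positive-definite stand-in for g~
ginv = np.linalg.inv(g)

# solve g_{ab} N^b = g_{a0} over the spatial block a, b = 1..3
N = np.linalg.solve(g[1:, 1:], g[1:, 0])

# check N^c + g^{c0}/g^{00} = 0, i.e. the shift equals -g^{c0}/g^{00}
residual = np.max(np.abs(N + ginv[1:, 0] / ginv[0, 0]))
```

The identity is just the block-inverse formula for a partitioned symmetric matrix, so it holds for any invertible g̃ with an invertible spatial block.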

Remark: Define the antisymmetric tensor

Lμν = ∇μ nν − ∇ν nμ

= nν,μ − nμ,ν
and note that, since nμ is normal to a three dimensional surface, it can
be expressed as
nμ = F G,μ
where F, G are scalar fields. Then,

Lμν = F,μ G,ν − F,ν G,μ

= (logF ),μ nν − (logF ),ν nμ


which implies that
X^μ_{,a} X^ν_{,b} Lμν = 0
ie, Lμν does not have a purely spatial component. Now define

K_{μν} = q^{μ′}_μ q^{ν′}_ν ∇_{μ′} n_{ν′}

This is obviously a purely spatial tensor and we find that

K_{νμ} = q^{μ′}_ν q^{ν′}_μ ∇_{μ′} n_{ν′} = q^{μ′}_μ q^{ν′}_ν ∇_{ν′} n_{μ′}

Thus,

K_{μν} − K_{νμ} = q^{μ′}_μ q^{ν′}_ν (n_{ν′,μ′} − n_{μ′,ν′}) = q^{μ′}_μ q^{ν′}_ν L_{μ′ν′} = 0

by the above calculation. Thus, K_{μν} is a purely spatial symmetric tensor. Now
we seek to evaluate

K_{ab} = X^μ_{,a} X^ν_{,b} K_{μν} = X^μ_{,a} X^ν_{,b} ∇μ nν

in terms of q_{ab} , q_{ab,c} , q_{ab,0} , N^a , N . Note that

q^μ_ν X^ν_{,a} = (δ^μ_ν − n^μ nν ) X^ν_{,a} = X^μ_{,a}

Now
K_{ab} = X^μ_{,a} X^ν_{,b} (n_{ν,μ} − Γ^σ_{νμ} n_σ )

= n_{ν,a} X^ν_{,b} − (1/2) n^σ X^μ_{,a} X^ν_{,b} (g_{σν,μ} + g_{σμ,ν} − g_{μν,σ} )

= −n_ν X^ν_{,ab} − (1/2) n^σ X^μ_{,a} X^ν_{,b} (g_{σν,μ} + g_{σμ,ν} − g_{μν,σ} )

since n_ν X^ν_{,b} = 0. Further,
X^μ_{,a} X^ν_{,b} g_{σν,μ} = g_{σν,a} X^ν_{,b} = (g_{σν} X^ν_{,b} )_{,a} − g_{σν} X^ν_{,ab}

= (q_{σν} X^ν_{,b} )_{,a} − g_{σν} X^ν_{,ab} = q_{σb,a} − g_{σν} X^ν_{,ab}
and likewise,

X^μ_{,a} X^ν_{,b} g_{σμ,ν} = q_{σa,b} − g_{σμ} X^μ_{,ab}

Further,

X^μ_{,a} X^ν_{,b} g_{μν,σ} = q_{μν,σ} X^μ_{,a} X^ν_{,b}
Substituting all these expressions and making appropriate cancellations gives us

K_{ab} = (−1/2) n^σ (q_{σb,a} + q_{σa,b} − q_{μν,σ} X^μ_{,a} X^ν_{,b} )

Now,

n^σ = (X^σ_{,0} − N^σ )/N,   N^σ = N^c X^σ_{,c}

We thus get

N n^σ q_{σb,a} = q_{σb,a} X^σ_{,0} − q_{σb,a} N^σ

= g̃_{0b,a} − q_{σb} X^σ_{,0a} − q_{σb,a} N^c X^σ_{,c}

= g̃_{0b,a} − q_{σb} X^σ_{,0a} − q_{cb,a} N^c + q_{σb} N^c X^σ_{,ac}

Interchanging a and b gives us

N n^σ q_{σa,b} = g̃_{0a,b} − q_{σa} X^σ_{,0b} − q_{ca,b} N^c + q_{σa} N^c X^σ_{,bc}

and further,

N n^σ q_{μν,σ} X^μ_{,a} X^ν_{,b} = q_{μν,σ} X^μ_{,a} X^ν_{,b} X^σ_{,0} − q_{μν,σ} X^μ_{,a} X^ν_{,b} X^σ_{,c} N^c

= q_{μν,0} X^μ_{,a} X^ν_{,b} − q_{μν,c} X^μ_{,a} X^ν_{,b} N^c

= q_{ab,0} − q_{μν} (X^μ_{,a} X^ν_{,b} )_{,0} − q_{ab,c} N^c + q_{μν} (X^μ_{,a} X^ν_{,b} )_{,c} N^c

= q_{ab,0} − q_{aν} X^ν_{,b0} − q_{bν} X^ν_{,a0} − q_{ab,c} N^c + q_{aν} X^ν_{,bc} N^c + q_{bν} X^ν_{,ac} N^c
[h] Abhay Ashtekar on quantum gravity using canonical Ashtekar variables.

[8] Research work carried out at NSUT on quantum signal processing
[1] Naman Garg (Ph.D thesis)
[2] Rohit Singh (Ph.D thesis)
The latter part of this work deals with denoising classical image fields using
quantum processing on Gaussian states. The idea is first to use classical to

quantum conversion of noisy and noiseless image field pairs, so that each pair
of such classical image fields is represented by a pair of pure states. The next
step is to approximate each pure state in each such pair by a quantum Gaussian
state (mixed) and then to purify all these Gaussian states by choosing the
Boson-Fock space appearing in the Hudson-Parthasarathy quantum stochastic
calculus as the reference Hilbert space. In this purification, the ”system part” of
the purified state is expressed in terms of the eigenstates of harmonic oscillators
because the Gaussian states are all diagonal w.r.t. these eigenstates provided
that one assumes that the Gaussian states are in diagonal form, ie they commute
with the Hamiltonian or equivalently with the number operator of each of these
harmonic oscillators. The reference states appearing as a tensor product term in
the purification of quantum Gaussian states are all obtained by orthonormalizing
a set of exponential/coherent vectors in the Boson Fock space. The reason for
choosing such a purification is simple. We are designing our unitary processor
based on the Hudson-Parthasarathy Schrodinger equation in which the Lindblad
system operators are linear combinations of creation and annihilation operators
of harmonic oscillators appearing also in the expression of the quantum Gaussian
state in the system Hilbert space. The noise operator processes on the other
hand are the creation and annihilation operator processes described in terms of
families of operators in the Boson Fock space of the reference system required
for purification. Now the system space creation and annihilation operators have
an easy to describe action on the number states appearing as the first term
in the tensor product that describes the purification of the Gaussian state.
On the other hand the noise operator processes of the Hudson-Parthasarathy
quantum stochastic calculus have an easy to describe action on the exponential
vectors of the reference Boson Fock space that appear as the second term in
the tensor product that describes the purification. Thus, the overall quantum
noisy generator of the unitary evolution in the Hudson-Parthasarathy noisy
Schrodinger equation has an easy to describe action on the purification of the
quantum Gaussian state. Specifically,

(a ⊗ dA(t))|n > ⊗|e(u) > = √n u(t)dt |n − 1 > ⊗|e(u) >

and more generally,

< m ⊗ e(v)|[c1 a ⊗ dA(t) + c2 a ⊗ dA(t)∗ + c3 a∗ ⊗ dA(t) + c4 a∗ ⊗ dA(t)∗ ]|n ⊗ e(u) >

= c1 √n u(t)dt < m|n − 1 >< e(v)|e(u) > + c2 √n v̄(t)dt < m|n − 1 >< e(v)|e(u) >

+ c3 √(n + 1) u(t)dt < m|n + 1 >< e(v)|e(u) > + c4 √(n + 1) v̄(t)dt < m|n + 1 >< e(v)|e(u) >
Chapter 11

Applied Differential
Equations
[1] Construction of adaptive parameter updates and controller gains that guar-
antee Lyapunov stability of a linearized robotic system as regards tracking error
and parameter estimation error.

The exact robot dynamics is

M (q, θ0 )q′′ + N (q, q′ , θ0 ) = τ (t) + M (qd , θ)(Kp (qd − q) + Kd (qd′ − q′ ))

where θ0 is the exact parameter value (ie link masses and lengths). Let θ(t) be
the parameter estimate at time t and define δθ(t) = θ(t) − θ0 , e(t) = qd (t) − q(t)
where qd (t) is the desired robot trajectory. We define τ (t) to be the computed
control torque for trajectory tracking:

τ (t) = M (q(t), θ(t))qd′′ (t) + N (q(t), q′ (t), θ(t))

Note that q, q′ are measurable at any instant of time, perhaps via an EKF observer based on noisy measurements. Then, the above differential equation becomes, after neglecting quadratic and higher order terms in e(t), δθ(t),

M (q, θ0 )q′′ + N (q, q′ , θ0 ) =

M (q(t), θ(t))qd′′ (t) + N (q(t), q′ (t), θ(t)) + M (q, θ0 )(Kp e + Kd e′ )


or

(M (q, θ0 ) − M (q, θ))q′′ + (N (q, q′ , θ0 ) − N (q, q′ , θ)) = M (q, θ0 )(e′′ + Kd e′ + Kp e)

or equivalently,

e′′ + Kd e′ + Kp e = W (t, q, q′ , q′′ )δθ(t) − − − (1)


where

W (t, q, q′ , q′′ ) = −M (q, θ0 )^{−1} [(∂M (q, θ0 )/∂θ)(I ⊗ q′′ ) + (∂N (q, q′ , θ0 )/∂θ)]
Note that in (1), q, q′ , q′′ on the rhs may be replaced by qd , qd′ , qd′′ in view of the
linearized approximation. We now assume that Q1 , Q2 are two positive definite
matrices and propose the adaptive estimation law

δθ′ (t) = F e(t) − Gδθ(t)

Note that defining


f (t) = [e(t)^T , e′ (t)^T ]^T

we can write (1) as

f ′ (t) = Af (t) + W(t)δθ(t)

where, in block form (with ; separating block rows),

A = [ 0, I ; −Kp I, −Kd I ]

and

W(t) = [ 0 ; W (t, q, q′ , q′′ ) ]
Then define the trajectory tracking and parameter estimation error energy func-
tion
V (t) = (1/2)f (t)T Q1 f (t) + (1/2)δθ(t)T Q2 δθ(t)
Question: Under what relationship between the matrices A, W, F, G are we guaranteed that there exist positive definite matrices Q1 , Q2 so that V ′ (t) < 0 ?
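The question above can be probed numerically: with V (t) = (1/2)x^T Qx for the augmented state x = [f ; δθ] and Q = blockdiag(Q1 , Q2 ), V ′ (t) < 0 for all states iff QM + M^T Q is negative definite, where M is the augmented system matrix. A minimal sketch, with hypothetical dimensions, gains and coupling blocks (none taken from the text):

```python
import numpy as np

# Illustrative check of the Lyapunov condition: with V = (1/2) x^T Q x for the
# augmented state x = [f; delta_theta], V' < 0 for all x iff Q M + M^T Q < 0,
# where M is the augmented system matrix.  All numbers below are hypothetical.
n = 2                                   # number of joints (assumed)
rng = np.random.default_rng(0)
Kp, Kd = 4.0, 3.0                       # scalar PD gains times identity
A = np.block([[np.zeros((n, n)), np.eye(n)],
              [-Kp*np.eye(n),    -Kd*np.eye(n)]])
W = np.vstack([np.zeros((n, n)), 0.1*rng.standard_normal((n, n))])  # [0; W(t)]
F = np.hstack([0.1*np.eye(n), np.zeros((n, n))])   # delta_theta' = F e - G delta_theta
G = 2.0*np.eye(n)

M = np.block([[A, W], [F, -G]])         # x' = M x

def lyapunov_derivative_negative(Q1, Q2):
    # V'(t) = (1/2) x^T (Q M + M^T Q) x with Q = blockdiag(Q1, Q2)
    Q = np.block([[Q1, np.zeros_like(W)], [np.zeros_like(F), Q2]])
    S = Q @ M + M.T @ Q
    return bool(np.all(np.linalg.eigvalsh(S) < 0))

ok = lyapunov_derivative_negative(np.eye(2*n), np.eye(n))
print(ok)  # False here: Q1 = Q2 = I is not a valid Lyapunov certificate
```

For these gains the naive choice Q1 = Q2 = I fails (the symmetric part of A already has a positive eigenvalue), illustrating that the existence of suitable Q1 , Q2 is a genuine constraint on A, W, F, G.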

[2] Work done by the NSUT school of robotics


[1] Determine algorithms based on stochastic Lyapunov theory for simultaneous trajectory tracking of robots and dynamic robot parameter estimation when
the robot dynamical equation has a white Gaussian noise torque component.

[2] Analyze the teleoperation of two robot systems with teleoperation delay
feedback using infinite dimensional stochastic differential equations and pertur-
bation theory for approximately solving stochastic differential equations applied
to the computation of the statistical moments of the fluctuations of the robot
trajectories around the non-random value.
[3] Study the dynamics of several robots on a lattice in simple exclusion
interacting with each other via electromagnetic forces. Each robot at a lattice
site is described by its azimuth and elevation angle at time t and a current
flows along this axis thereby generating an electromagnetic field which interacts
with the currents in the robots at other lattice sites. A lattice site may or may
not be occupied by a robot and a robot can jump from one site to another
in accordance with the simple exclusion model. For a given configuration of
sites, the total kinetic energy of each robot as well as its potential energy of

interaction with other robots is calculated and thereby the Lagrangian of this
system of interacting robots is formed. This Lagrangian is a function of the
configuration of the lattice, ie it is dependent on which sites in the lattice are
occupied and which not. The statistical average of this Lagrangian is then taken
w.r.t the probability distribution of the occupation numbers of the sites which
follow the simple exclusion stochastic differential equation triggered by Poisson
clocks. Approximations are then made to obtain approximate solutions to the
link angles of the robots at the different sites.

[4] Nano robots interacting via quantum electromagnetic fields


Here a system of nano/quantum robots contains each a certain number of
electrons, positrons in the form of a current. The state of each robot is described
by a set of creation and annihilation operators for electrons and positrons and
the Dirac current in each robot is expressed in terms of the electron and positron
creation and annihilation operators. Using the retarded potential formula, the
quantum electromagnetic field produced by each robot is calculated and the
interaction Hamiltonian of this field with the quantum currents in the other
robots is calculated. This interaction also includes terms that are bilinear in
the quantum em field of one robot and the system variables like spin, momen-
tum, orbital angular momentum of other robots. Using this Hamiltonian, the
Heisenberg dynamical equations for the system and quantum field variables are
written down and simulated.

[5] Construction of disturbance observer using the Lyapunov energy method


and proving that the disturbance error energy decays with time. Modelling the
disturbance estimation error as a white Gaussian noise process and analyzing
the stochastic behaviour of the robot under disturbance estimate subtraction
from the dynamics.
[6] Neural network modelling of chaotic and speech synthesis systems with
EKF used for weight updation.
[7] The UKF used in place of the EKF for robot state and parameter esti-
mation.
[8] Wavelet based parameter estimation of the potential energy in particle
and rigid body dynamics.
[9] Large deviation principle applied to the design of PD controllers for robots
subject to weak noise so that the probability of escape of the robot from the
stability region becomes as small as possible.
[10] Nonlinear filtering theory applied to robot state estimation when the
process torque noise is a Levy process, ie, a process with independent incre-
ments modelled according to the Levy-Khintchine theorem as a superposition
of Brownian motion and several independent Poisson processes so that its time
derivative is white Gaussian noise plus white compound Poissonian noise.
[11] Lie-group and Lie-algebra theoretic techniques for analyzing robot dy-
namics having several 3-D links. Differential equations satisfied by the rotation
matrices.

[12] Large deviation principle applied to stochastic differential equations for


rotation matrices occurring in 3-D link robot dynamical analysis.
[13] Quantization of rigid body dynamics via path integrals starting from the
Lagrangian in terms of Euler angles and via the canonical operator theoretic
formalism starting from the Hamiltonian of the rigid body.
Remark on [2] Teleoperation systems: The master robot satisfies the sde

dXm (t) = [Fm (Xm (t), t) + Σ_{k=0}^{p} Cm [k]ψm (Xs (t − (k + 1)T ) − Xm (t − kT ))]dt + Gm (Xm (t))dBm (t) + τm (t)dt

and the slave robot satisfies the sde

dXs (t) = [Fs (Xs (t), t) + Σ_{k=0}^{p} Cs [k]ψs (Xm (t − (k + 1)T ) − Xs (t − kT ))]dt + Gs (Xs (t))dBs (t) + τs (t)dt
where ψm , ψs are odd functions and this form of the feedback force clearly dis-
plays the teleoperation delay T taken to receive a signal from the other end be-
fore applying the feedback. We design the feedback coefficients Cm [k], Cs [k], 1 ≤
k ≤ p so that the mean square trajectory tracking error
 T
E  Xm (t) − Xs (t) 2 dt
0
is a minimum. Suppose that such a design has been made assuming no noise.
Then while implementation, there must be small changes δCm [k], δCs [k] in the
feedback coefficients to ensure minimality of the trajectory tracking errors. Let
the corresponding changes in the master and slave trajectories be δXm (t) and
δXs (t) respectively. Then, we get by applying perturbation theory,


dδXm (t) = Fm′ (Xm (t), t)δXm (t)dt + Σ_{k=0}^{p} Cm [k]ψm′ (Xs (t − (k + 1)T ) − Xm (t − kT ))(δXs (t − (k + 1)T ) − δXm (t − kT ))dt

+ Σ_{k=0}^{p} δCm [k]ψm (Xs (t − (k + 1)T ) − Xm (t − kT ))dt + Gm (Xm (t))dBm (t)

dδXs (t) = Fs′ (Xs (t), t)δXs (t)dt + Σ_{k=0}^{p} Cs [k]ψs′ (Xm (t − (k + 1)T ) − Xs (t − kT ))(δXm (t − (k + 1)T ) − δXs (t − kT ))dt

+ Σ_{k=0}^{p} δCs [k]ψs (Xm (t − (k + 1)T ) − Xs (t − kT ))dt + Gs (Xs (t))dBs (t)

These equations are of the following general form, ie, linear delay stochastic
differential equations:

dξ(t) = Σ_{k=0}^{N} Fk (t)ξ(t − kT )dt + G(t)dB(t)

= F0 (t)ξ(t)dt + Σ_{k=1}^{N} Fk (t)ξ(t − kT )dt + G(t)dB(t)

Here, ξ(t) is a vector valued process and the Fk (t)'s are matrices. We can solve
this using perturbation theory by considering the delay terms to be of the first
order of smallness. The solution upto first order of smallness is then given by
ξ(t) = Φ(t, 0)ξ(0) + Σ_{k=1}^{N} ∫_0^t Φ(t, s)Fk (s)Φ(s − kT, 0)ξ(0)ds + ∫_0^t Φ(t, s)G(s)dB(s)
where
∂Φ(t, s)/∂t = F0 (t)Φ(t, s), t ≥ s, Φ(s, s) = I
This formula can be used to evaluate statistical correlations upto second order
of smallness.
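The first-order perturbation formula can be checked in a scalar example against a direct Euler-Maruyama simulation of the delay SDE; all numerical values below are illustrative, and the agreement is only up to second order in the delay coefficient plus discretization error:

```python
import numpy as np

# Scalar check of the first-order perturbation formula for the linear delay SDE
# d xi = F0 xi dt + F1 xi(t - T) dt + g dB, with the delay term treated as a
# first-order-small perturbation.  All numerical values are illustrative.
rng = np.random.default_rng(1)
F0, F1, g, T, dt, nsteps = -1.0, 0.05, 0.2, 0.1, 1e-3, 1000
kT = int(T/dt)
dB = rng.standard_normal(nsteps)*np.sqrt(dt)

# Euler-Maruyama reference, with xi(t) = xi0 frozen for t <= 0
xi0 = 1.0
xi = np.empty(nsteps + 1); xi[0] = xi0
for i in range(nsteps):
    delayed = xi[i - kT] if i >= kT else xi0
    xi[i + 1] = xi[i] + (F0*xi[i] + F1*delayed)*dt + g*dB[i]

# Perturbative solution: Phi(t, s) = exp(F0 (t - s)); the delay correction uses
# the zeroth-order deterministic flow Phi(s - T, 0) xi0 (frozen before s = T)
t = nsteps*dt
s = np.arange(nsteps)*dt
pert = np.exp(F0*t)*xi0 \
    + np.sum(np.exp(F0*(t - s))*F1*np.exp(F0*np.maximum(s - T, 0.0))*xi0*dt) \
    + np.sum(np.exp(F0*(t - s))*g*dB)

print(abs(xi[-1] - pert))  # small: O(F1^2) plus discretization error
```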
[3] Disturbance observer construction for speech models.
x[n] = f (x[n − 1], ..., x[n − p]) + d[n]

We construct the disturbance observer using the following recursive algorithm:

d̂[n] = d̂[n − 1] + Ln (d[n] − d̂[n − 1])

= d̂[n − 1] + Ln (x[n] − f (x[n − 1], ..., x[n − p]) − d̂[n − 1])

We then find

d[n] − d̂[n] = d[n] − d̂[n − 1] − Ln (d[n] − d[n − 1] + d[n − 1] − d̂[n − 1])

We write

ε[n] = d[n] − d̂[n], δ[n] = d[n] − d[n − 1]

and then get

ε[n] = δ[n] + ε[n − 1] − Ln (δ[n] + ε[n − 1]) = (1 − Ln )δ[n] + (1 − Ln )ε[n − 1]

If Ln = L is a constant matrix, then we can solve the above difference equation to get

ε[n] = (1 − L)^n ε[0] + Σ_{k=1}^{n} (1 − L)^{n−k+1} δ[k]

Assuming that 1 − L has all its singular values within the unit circle, it follows that when

‖δ[n]‖ ≤ K ∀n

then

limsup_{n→∞} ‖ε[n]‖ ≤ K · Σ_{k=0}^{∞} ‖1 − L‖^{k+1} = K · ‖1 − L‖/(1 − ‖1 − L‖)
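A quick scalar simulation confirms this asymptotic bound on the observer error; the gain, increment bound and initial error below are illustrative:

```python
import numpy as np

# Scalar simulation (illustrative values) of the observer error recursion
# eps[n] = (1 - L)(delta[n] + eps[n-1]) with bounded increments |delta[n]| <= K,
# checking the asymptotic bound K*|1-L|/(1-|1-L|) derived above.
L, K, N = 0.7, 0.1, 5000
rng = np.random.default_rng(2)
delta = K*(2*rng.random(N) - 1.0)        # |delta[n]| <= K
eps = np.empty(N); eps[0] = 0.5          # large initial error, decays geometrically
for n in range(1, N):
    eps[n] = (1 - L)*(delta[n] + eps[n - 1])

bound = K*abs(1 - L)/(1 - abs(1 - L))
print(np.max(np.abs(eps[-100:])), bound)  # the tail stays below the bound
```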

[4] The UKF: The basic logic behind the UKF is that the EKF does not yield
accurate state estimates since it involves taking the expectation operator inside
a nonlinear function of a Gaussian state. This is not justified. A nonlinear
transformation of a Gaussian vector is non-Gaussian and hence we must make
use of the law of large numbers to calculate its expectation by simulating inde-
pendent realizations of it conditioned on the past observations. The philosophy
of the UKF is precisely based on using the law of large numbers to approximate
conditional expectations of nonlinear functions of Gaussian random vectors.
The state equations are

dX(t) = Ft (X(t))dt + Gt (X(t))dB(t)

The measurement model is

dY (t) = ht (X(t))dt + σv dV (t)

where B(.), V (.) are independent vector valued Brownian motions. In dis-
cretized form these are of the form

X[n + 1] = Fn (X[n]) + Gn (X[n])W [n + 1],

Y [n] = hn (X[n]) + σv V [n]


where W [n], V [n] are independent standard normal random vectors. Let Zn =
{Y [k] : k ≤ n} and

X̂[n + 1|n] = E[X[n + 1]|Zn ], X̂[n|n] = E[X[n]|Zn ]

We then evaluate


X̂[n + 1|n] = E[Fn (X[n])|Zn ] ≈ (1/K) Σ_{k=1}^{K} Fn (X̂[n|n] + P [n|n]^{1/2} ξ[k])

where
P [n|n] = Cov(X[n]|Zn ) = Cov(X[n] − X̂[n|n]|Zn )
and ξ[k], k = 1, 2, ..., K are iid standard normal random vectors. Further,


P [n + 1|n] = (1/K) Σ_{k=1}^{K} (Fn (X̂[n|n] + P [n|n]^{1/2} ξ[k]) − X̂[n + 1|n])(Fn (X̂[n|n] + P [n|n]^{1/2} ξ[k]) − X̂[n + 1|n])^T + Q

where Q = Cov(W [n]).


Again, we find that under the joint Gaussian approximation assumption,

X̂[n + 1|n + 1] = E[X[n + 1]|Zn , Y [n + 1]]

= E[X[n + 1]|Zn ] + Cov(X[n + 1], Y [n + 1]|Zn ).Cov(Y [n + 1]|Zn )^{−1} .(Y [n + 1] − E[Y [n + 1]|Zn ])

= X̂[n + 1|n] + PXY PY Y^{−1} (Y [n + 1] − E[hn (X[n + 1])|Zn ])

P [n + 1|n + 1] = P [n + 1|n] − PXY PY Y^{−1} PXY^T

where we make the approximations, writing h̄ = (1/K) Σ_{m=1}^{K} hn (X̂[n + 1|n] + P [n + 1|n]^{1/2} ξ[m]),

PXY = Cov(X[n + 1], Y [n + 1]|Zn ) ≈ (1/K) Σ_{k=1}^{K} [P [n + 1|n]^{1/2} ξ[k]][hn (X̂[n + 1|n] + P [n + 1|n]^{1/2} ξ[k]) − h̄]^T

PY Y = Cov(Y [n + 1]|Zn ) ≈ (1/K) Σ_{k=1}^{K} [hn (X̂[n + 1|n] + P [n + 1|n]^{1/2} ξ[k]) − h̄][hn (X̂[n + 1|n] + P [n + 1|n]^{1/2} ξ[k]) − h̄]^T + σv² R

where R = Cov(V [n]).
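The contrast with the EKF can be seen in one dimension: for X ~ N(x̂, P ) one has E[sin(X)] = sin(x̂)e^{−P/2} exactly, which the sample average over simulated realizations recovers, while linearization would give sin(x̂). A sketch with illustrative names and values:

```python
import numpy as np

# One-dimensional illustration of the time update: for X ~ N(xhat, P),
# E[sin(X)] = sin(xhat)*exp(-P/2), which the sample average over simulated
# realizations recovers, while EKF-style linearization would give sin(xhat).
# All names and values here are illustrative.
rng = np.random.default_rng(3)
Fmap = np.sin                        # nonlinear state map F_n
xhat, P, Q, K = 0.8, 0.5, 0.01, 200000

xi = rng.standard_normal(K)
samples = Fmap(xhat + np.sqrt(P)*xi)
x_pred = samples.mean()                          # approximates E[F(X)|Z_n]
P_pred = np.mean((samples - x_pred)**2) + Q      # predicted covariance

exact = np.sin(xhat)*np.exp(-P/2)                # closed form for comparison
print(abs(x_pred - exact))  # Monte Carlo error, small for large K
```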

[5] The kinetic energy of the two 3-D link robot system is given by

K(t) = (1/2)T r(R′ (t)J1 R′ (t)^T ) + (1/2)T r(S ′ (t)J2 S ′ (t)^T ) + T r(R′ (t)J3 S ′ (t)^T )

where R(t), S(t) ∈ SO(3). Here, R(t) is the rotation experienced by the first
link and S(t)R(t) that by the second link. The potential energy of the system
can be expressed as
V (t) = aT1 R(t)b1 + aT2 S(t)b2
where a1 , b1 , a2 , b2 ∈ R3 . Thus the Lagrangian of this system of two rigid bodies
in the absence of external torques is given by

L(R(t), S(t), R′ (t), S ′ (t)) = K(t) − V (t) =

(1/2)T r(R′ (t)J1 R′ (t)^T ) + (1/2)T r(S ′ (t)J2 S ′ (t)^T ) + T r(R′ (t)J3 S ′ (t)^T ) − a1^T R(t)b1 − a2^T S(t)b2

Problem: Taking into account the constraints R(t)T R(t) = S(t)T S(t) = I
by writing R(t) = exp(X(t)), S(t) = exp(Y (t)) where X(t), Y (t) are real 3 × 3
skew-symmetric matrices and making use of the differential of the exponential
map
R′ (t) = R(t).((I − exp(−ad(X(t))))/ad(X(t)))(X ′ (t))
write down the above Lagrangian in terms of Lie algebra coordinates, ie, choose a fixed set of three linearly independent real 3 × 3 skew-symmetric matrices L1 , L2 , L3 and write X(t) = x1 (t)L1 + x2 (t)L2 + x3 (t)L3 , Y (t) = y1 (t)L1 + y2 (t)L2 + y3 (t)L3 and then express the above Lagrangian in terms of xk (t), xk′ (t), yk (t), yk′ (t), k = 1, 2, 3.
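The differential of the exponential map quoted above can be verified numerically by truncating both matrix series; the trajectory x(t) below is hypothetical and the truncations are adequate for the small norms used:

```python
import numpy as np

# Numerical verification of R'(t) = R(t) * ((I - exp(-ad_X))/ad_X)(X'(t)),
# with exp and the dexp series truncated (adequate for the small norms here).
# The trajectory x(t) is hypothetical.
def expm(A, terms=30):
    out, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def dexp_series(X, V, terms=30):
    # ((I - exp(-ad_X))/ad_X)(V) = sum_{k>=0} (-1)^k ad_X^k(V)/(k+1)!
    out, term, fact = V.copy(), V.copy(), 1.0
    for k in range(1, terms):
        term = -(X @ term - term @ X)     # -ad_X applied repeatedly
        fact *= (k + 1)
        out = out + term/fact
    return out

def skew(v):
    return np.array([[0., -v[2], v[1]], [v[2], 0., -v[0]], [-v[1], v[0], 0.]])

x    = lambda t: np.array([0.3*t, 0.2*t**2, -0.1*t])   # Lie algebra coordinates
xdot = lambda t: np.array([0.3, 0.4*t, -0.1])

t, h = 0.7, 1e-5
R = expm(skew(x(t)))
Rdot_formula = R @ dexp_series(skew(x(t)), skew(xdot(t)))
Rdot_numeric = (expm(skew(x(t + h))) - expm(skew(x(t - h))))/(2*h)
print(np.max(np.abs(Rdot_formula - Rdot_numeric)))  # central-difference accuracy
```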
References: See the published papers by Rohit Singla, Vijyant Agrawal and
Harish Parthasarathy, Rohit Singla and Harish Parthasarathy, Vijyant Agrawal
and Harish Parthasarathy, Rohit Rana, Vijyant Agrawal, Prerna Gaur and
Harish Parthasarathy and the papers communicated by Rohit Rana, Harish
Parthasarathy, Vijyant Agrawal and Prerna Gaur.

[6] A problem suggested by Prof. Vijyant Agarwal on machine intelligence
[1] Let
y(k + 1) = F (y(k))
describe a chaotic system in Rn . We wish to approximate this dynamics by
a neural network (NN) defined in terms of weight matrices V ∈ Rn×m and
W ∈ Rm×n so that the approximated system becomes

y0 (k + 1) = V.tanh(W y0 (k))

or more precisely in terms of components,



y0 (r, k + 1) = Σ_{l=1}^{m} V (r, l)tanh(Σ_{p=1}^{n} W (l, p)y0 (p, k)), 1 ≤ r ≤ n
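As a quick sanity check, the componentwise formula agrees with the matrix form y0 (k + 1) = V.tanh(W y0 (k)); the dimensions below are chosen purely for illustration:

```python
import numpy as np

# Sanity check: the componentwise formula above agrees with the matrix form
# y0(k+1) = V tanh(W y0(k)).  Dimensions n = 3, m = 4 chosen for illustration.
rng = np.random.default_rng(4)
n, m = 3, 4
V = rng.standard_normal((n, m))
W = rng.standard_normal((m, n))
y = rng.standard_normal(n)

y_matrix = V @ np.tanh(W @ y)
y_comp = np.array([sum(V[r, l]*np.tanh(sum(W[l, p]*y[p] for p in range(n)))
                       for l in range(m)) for r in range(n)])
print(np.max(np.abs(y_matrix - y_comp)))  # zero up to rounding
```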

Suppose that we have made such an approximation. Then, while implementing it, we need to make the estimates of the weight matrices V, W adaptive, ie, time dependent, so that even in the presence of disturbances and stochastic noise, the NN delivers a good approximation to the original chaotic system. Our noisy model is thus given by

y0 (k + 1) = Vk .tanh(Wk y0 (k)) + d(k + 1) + w(k + 1)

where d is disturbance and w is noise. The weight matrices follow the dynamics
of being approximately constant except for some small weight noise:

Vk+1 = Vk + δV (k + 1), Wk+1 = Wk + δW (k + 1)



The measurement model is

z(k) = Hy(k) + v(k)

It should be noted that the plant dynamics function F is not known although
we are able to take noisy measurements of some linear transformation of the
state vector y(k) generated by it. Our goal is to use these measurements in the
NN to get a reasonably good approximation to the original plant dynamics. The
model for the disturbance estimate is
d̂(k + 1) = d̂(k) + Lk (d(k + 1) − d̂(k) + w(k + 1))

= d̂(k) + Lk (y0 (k + 1) − Vk .tanh(Wk y0 (k)) − d̂(k))

We may even approximate this disturbance estimate further by replacing the NN output state y0 (k) by the noisy measurement z(k) of the original plant:

d̂(k + 1) = d̂(k) + Lk (y(k + 1) − Vk .tanh(Wk y(k)) − d̂(k))

in the special case when H is the identity matrix. We assume that the disturbance estimation error d(k + 1) − d̂(k) = w1 (k + 1) is nearly white noise so that w(k) + w1 (k) = w2 (k) is also white noise whose covariance is given by Q0 = Cov(w1 (k)) + Cov(w(k)). Then the disturbance estimate follows the stochastic model

d̂(k + 1) = d̂(k) + Lk w2 (k + 1)
and we construct an EKF for estimating d̂(k), Vk , Wk , y(k) based on the noisy measurements z(k). To this end, we define the extended state vector

ξ(k) = [y0 (k)^T , V ec(Vk )^T , V ec(Wk^T )^T , d̂(k)^T ]^T

so that the extended NN dynamics with DO is given by

ξ(k + 1) = ψk (ξ(k)) + w3 (k + 1)

where

ψk (ξ(k)) = [(Vk .tanh(Wk y0 (k)) + d̂(k))^T , V ec(Vk )^T , V ec(Wk^T )^T , d̂(k)^T ]^T

and

w3 (k + 1) = [w(k + 1)^T , V ec(δV (k + 1))^T , V ec(δW (k + 1))^T , (Lk w2 (k + 1))^T ]^T

The fact that we are considering d(k + 1) − d̂(k) to be white noise means that the disturbance must be slowly time varying, or more precisely, it must asymptotically converge to a constant vector so that

d(k + 1) − d(k) → 0, k → ∞

D.O convergence analysis for constant Lk = L:


d̂(k + 1) = d̂(k) + L(d(k + 1) − d̂(k) + w(k + 1))

We can write this as

d̂(k + 1) = (1 − L)d̂(k) + L(d(k + 1) + w(k + 1)) − − − (a)

This equation implies that if w(k) = 0 and d(k) → d(∞) and d̂(k) → d̂(∞), then

d̂(∞) = (1 − L)d̂(∞) + Ld(∞)

which implies

d̂(∞) = d(∞)

a result which states that if asymptotically, the disturbance converges to a constant dc vector, and its estimate also converges, then the limiting value of the disturbance estimation error is zero. So the question arises: if noise is absent and the disturbance is asymptotically constant, then under what conditions on L will its estimate also be asymptotically constant? Taking the one sided Z-transform of (a) gives us

(z − 1 + L)D̂(z) = zL(D(z) + W (z)) + z d̂(0) − − − (b)

which gives on Z-transform inversion,


d̂(k) = L Σ_{m=1}^{k} (1 − L)^{k−m} (d(m) + w(m)) + (1 − L)^k d̂(0)

Thus, if we assume that ‖1 − L‖ < 1 (in spectral norm), then (1 − L)^k → 0 and the transient term satisfies

‖(1 − L)^k d̂(0)‖ ≤ ‖1 − L‖^k ‖d̂(0)‖ → 0, k → ∞

Writing
E(z) = D(z) − D̂(z), e(k) = d(k) − d̂(k)

gives us on neglecting the transient term,

(z − 1 + L)(D(z) − E(z)) = zL(D(z) + W (z))

or
(z − 1 + L)E(z) = (z − 1)(1 − L)D(z) − zLW (z)
which gives on Z-transform inversion,


e(k) = Σ_{m=0}^{k−1} (1 − L)^{k−m} (d(m) − d(m − 1)) − L Σ_{m=0}^{k} (1 − L)^{k−m} w(m)

This formula clearly shows that if d(m) − d(m − 1) → 0 and noise is absent, then e(k) → 0 as required. It is also clear that in the absence of noise, the rate at which e(k) converges to zero is −log(‖1 − L‖). To increase this rate, we must make ‖1 − L‖ as small as possible. But then if L is close to unity, in the presence of noise, the d.o. estimation error variance will become large as shown by the above formula. Explicitly, we choose an ε > 0 and a positive integer N = N (ε) such that

‖d(m) − d(m − 1)‖ < ε ∀m > N

Then, in the absence of noise, we have


‖e(k)‖ ≤ Σ_{m=0}^{N} ‖1 − L‖^{k−m} ‖d(m) − d(m − 1)‖ + ε Σ_{m=N+1}^{k−1} ‖1 − L‖^{k−m}

→ ε · Σ_{r=1}^{∞} ‖1 − L‖^r = ε · ‖1 − L‖/(1 − ‖1 − L‖), k → ∞

Then letting ε → 0, we conclude that in the absence of noise,

e(k) → 0

Now when noise is present and is zero mean white,

E[‖e(k)‖² ] ≤ 2(Σ_{m=0}^{k−1} ‖1 − L‖^{k−m} ‖d(m) − d(m − 1)‖)² + 2‖L‖² Σ_{m=0}^{k} ‖1 − L‖^{2(k−m)} σw²

the rhs of which converges as k → ∞ to

2‖L‖² σw² /(1 − ‖1 − L‖² )

Hence, to reduce this noise variance bound and simultaneously to get a decently
fast rate of convergence of the d.o. estimation error to zero, we must choose L
so that a cost function of the form
C(L) = a.‖1 − L‖ + b.σw² ‖L‖² /(1 − ‖1 − L‖² )

is a minimum for some positive weights a, b. Another way to consider this problem is to assume that

‖d(m) − d(m − 1)‖ ≤ K < ∞ ∀m

and then obtain from the above,

E[‖e(k)‖² ] ≤ 2K² (Σ_{m=0}^{k−1} ‖1 − L‖^{k−m} )² + 2‖L‖² Σ_{m=0}^{k} ‖1 − L‖^{2(k−m)} σw²

which is upper bounded for all k by

2K² ‖1 − L‖² /(1 − ‖1 − L‖)² + 2‖L‖² σw² /(1 − ‖1 − L‖² )
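The trade-off between convergence rate and noise amplification can be illustrated by minimizing the scalar version of this cost on a grid; the weights a, b and noise variance below are hypothetical:

```python
import numpy as np

# Scalar illustration of the rate-versus-noise trade-off: minimize
# C(L) = a|1-L| + b*sigma_w^2 * L^2/(1 - |1-L|^2) over L in (0,1).
# The weights a, b and sigma_w^2 are hypothetical.
a, b, sigma_w2 = 1.0, 1.0, 1.0
L = np.linspace(1e-3, 1 - 1e-3, 10000)
C = a*np.abs(1 - L) + b*sigma_w2*L**2/(1 - np.abs(1 - L)**2)
L_opt = L[np.argmin(C)]
print(L_opt)  # interior optimum; analytically 2 - sqrt(2) for these weights
```

For these weights the scalar cost reduces to (1 − L) + L/(2 − L), whose stationary point is at L = 2 − √2 ≈ 0.586: neither the fastest observer (L near 1) nor the least noisy one (L near 0) is optimal.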

[7] The EKF equations corresponding to the state equations

ξ(k + 1) = ψk (ξ(k)) + gk (ξ(k))W (k + 1)

and the measurement model

z(k) = hk (ξ(k)) + V (k)

are
ξ̂(k + 1|k) = ψk (ξ̂(k|k)),

e(k + 1|k) = ξ(k + 1) − ξ̂(k + 1|k), e(k|k) = ξ(k) − ξ̂(k|k)

cov(e(k|k)|Zk ) = P (k|k), cov(e(k + 1|k)|Zk ) = P (k + 1|k)

ξ̂(k + 1|k + 1) = ξ̂(k + 1|k) + K(k + 1)(z(k + 1) − hk+1 (ξ̂(k + 1|k)))

where

K(k + 1) = P (k + 1|k)Hk+1^T (Hk+1 P (k + 1|k)Hk+1^T + RV )^{−1}

Hk+1 = hk+1′ (ξ̂(k + 1|k))

P (k + 1|k) = Ψk+1 P (k|k)Ψk+1^T + Gk+1 RW Gk+1^T

where

Ψk+1 = ψk′ (ξ̂(k|k)), Gk+1 = gk (ξ̂(k|k))

and finally,

P (k + 1|k + 1) = (I − K(k + 1)Hk+1 )P (k + 1|k)(I − K(k + 1)Hk+1 )^T + K(k + 1)RV K(k + 1)^T
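The recursion above can be sketched in one dimension; the model functions ψ, h and the noise levels below are illustrative and not taken from the text:

```python
import numpy as np

# One iteration of the EKF recursion above in one dimension, with illustrative
# model functions (psi, h) and noise levels not taken from the text.
psi  = lambda x: 0.9*x + 0.1*np.sin(x)
dpsi = lambda x: 0.9 + 0.1*np.cos(x)
h    = lambda x: x**2
dh   = lambda x: 2.0*x
g, RW, RV = 1.0, 0.01, 0.04

xi_hat, P = 1.0, 0.3            # xi_hat(k|k) and P(k|k)
z_next = 1.3                    # observed z(k+1)

# prediction step
xi_pred = psi(xi_hat)
Psi = dpsi(xi_hat)
P_pred = Psi*P*Psi + g*RW*g

# measurement update
H = dh(xi_pred)
K = P_pred*H/(H*P_pred*H + RV)
xi_new = xi_pred + K*(z_next - h(xi_pred))
P_new = (1 - K*H)*P_pred*(1 - K*H) + K*RV*K   # Joseph form, stays positive

print(xi_new, P_new)
```

The Joseph-form covariance update used in the last line keeps P (k + 1|k + 1) positive even under rounding, and with the optimal gain it is strictly smaller than P (k + 1|k).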
[8] Some remarks on fracture analysis of materials
A brittle object like a chalk is in some sense an elastic material. When a
twist torque or deforming torque is applied to it, it suddenly gets fractured
along certain curves on its boundary surface. Let D denote the surface of the
material. On applying the twisting torque, the material gets partitioned into
disjoint open sets Dk , k = 1, 2, ..., p. The boundaries on the material surface
are therefore Djk = Cl(Dj ) ∩ Cl(Dk ), 1 ≤ j < k ≤ p, where Cl(E)
denotes the closure of the set E. Within each domain Dj , the equations of
elasticity are valid. Thus if D0 denotes the union of the disjoint open sets
Dj , j = 1, 2, ..., p, then in D0 , the material has a displacement field u(t, r) with
cartesian components uk (t, r), k = 1, 2, 3. The strain tensor of the material in
D0 is then
ujk = (1/2)(uj,k + uk,j )
and if C(jklm) denote the elastic constants, then the Lagrangian density of the
material is given by
L(uj , uj,k , uj,t ) = (ρ/2) Σ_{j=1}^{3} u_{j,t}² − (1/2) Σ_{jklm} C(jklm)ujk ulm

Applying the variational principle to this Lagrangian density gives us the wave
equation
ρuj,tt = C(jklm)ul,mk
where the Einstein summation convention has been used, ie, summation over
repeated indices. This equation is solved within each open set Dj resulting in
waves with unknown coefficients. Then we apply the condition that on the
boundary between two regions Dj and Dk , ie, on Djk a dis-continuity condition
is imposed, for example on this boundary, we may impose the condition that
the difference between the displacements is prescribed. Such a discontinuity
condition must be imposed because it is precisely this that characterises the
nature of the fracture. To make this analysis more precise, we take the spatial
Fourier transform of the above wave equation leading to

ρûj,tt (t, k) = −C(jplm)km kp ûl (t, k)

We absorb the density ρ within the elastic constants and then solve the above
equation to get
û(t, k) = exp(itF(k))a(k) + exp(−itF(k))b(k)

where F(k) is a square root of the matrix ((C(jplm)km kp ))_{1≤j,l≤3} . The constant vectors a(k), b(k) depend on the region Dj under consideration. Specifically,
the inverse spatial Fourier transform of the above equation has the general form

u(t, r) = uj (t, r) = ∫_{Dj} G(t, r − r′ )Aj (r′ )d³ r′ , r ∈ Dj

and the functions Aj (r), r ∈ Dj must be constrained by a boundary condition of the form

ψjk (r, Aj , Ak ) = 0, r ∈ Djk
For example, the condition that after time T , the displacements of the material on both sides of the boundary Djk differ by cjk (r), r ∈ Djk would translate to the equation

uj (T, r) − uk (T, r) = cjk (r), r ∈ Djk

Now suppose we take into account an external force/torque applied to the ma-
terial. Then, the above anisotropic wave equation gets replaced with a wave
equation with source:
uj,tt (t, r) = C(jplm)ul,mp (t, r) + fj (t, r)

which translate in the spatial frequency domain to the ode

ûj,tt (t, k) = −C(jplm)km kp ûl (t, k) + fˆj (t, k)

or equivalently in vector-matrix notation,

û,tt (t, k) = −F2 (k)û(t, k) + f̂ (t, k)

The general solution to this equation is

û(t, k) = exp(itF (k))a(k) + exp(−itF (k))b(k) + F (k)^{−1} ∫_0^t sin((t − s)F (k))f̂ (s, k)ds

and then after time T , this solution can be matched at the boundaries Djk to
the appropriate discontinuity condition.
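The variation-of-parameters solution just quoted can be checked numerically for a small symmetric F(k), evaluating the matrix sine through an eigendecomposition; the matrix and forcing below are illustrative:

```python
import numpy as np

# Check (for an illustrative 2x2 symmetric F(k)) that
# u_p(t) = F^{-1} int_0^t sin((t-s)F) f(s) ds  solves  u'' = -F^2 u + f,
# with the matrix sine evaluated through an eigendecomposition.
F = np.array([[2.0, 0.3], [0.3, 1.5]])
w, V = np.linalg.eigh(F)
sinF = lambda t: V @ np.diag(np.sin(w*t)) @ V.T
f = lambda s: np.array([np.cos(3*s), s])
Finv = np.linalg.inv(F)

def u_p(t, ns=4000):
    s = np.linspace(0.0, t, ns)
    vals = np.array([sinF(t - si) @ f(si) for si in s])
    integ = (vals[:-1] + vals[1:]).sum(axis=0)*(s[1] - s[0])/2   # trapezoid rule
    return Finv @ integ

t, h = 1.0, 1e-3
lhs = (u_p(t + h) - 2*u_p(t) + u_p(t - h))/h**2      # numerical u''
rhs = -F @ F @ u_p(t) + f(t)
print(np.max(np.abs(lhs - rhs)))  # small discretization error
```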

Example of one dimensional fracture: Consider the fracture boundary to be


x = c ∈ (a, b) where [a, b] is the region over which the elastic waves propagate.
The wave equations in the disjoint regions (a, c) and (c, b) are the same, namely,

utt (t, x) − m2 uxx (t, x) = 0

Taking the Fourier transform w.r.t the time variable gives

û,xx (k, x) + k² û(k, x) = 0

where k = ω/m. We find the solutions in the two regions to be

û(k, x) = A1 (k).cos(kx) + B1 (k)sin(kx), a < x < c,

û(k, x) = A2 (k).cos(kx) + B2 (k).sin(kx), c < x < b


If we impose the boundary conditions u(t, a) = u(t, b) = 0, then this becomes

û(k, x) = A(k).sin(k(x − a)), a < x < c,

û(k, x) = B(k).sin(k(x − b)), c < x < b

We then impose the discontinuity condition

u(t, c + 0) − u(t, c − 0) = d(t)

This condition corresponds to fracture at x = c and it translates in the frequency domain to

û(k, c + 0) − û(k, c − 0) = d̂(k)

Thus,

B(k).sin(k(c − b)) − A(k).sin(k(c − a)) = d̂(k)

This furnishes one relationship between the functions A(k), B(k).

Example of two dimensional fracture: The material boundary is assumed to be rectangular, ie, [0, a] × [0, b]. The fracture boundary is assumed to be

x = c

Then the wave equations in the two regions x < c and x > c are the same:

u,tt (t, x, y) − u,xx (t, x, y) − u,yy (t, x, y) = 0

Using separation of variables, the solution that vanishes at x = 0, a and y = 0, b at a given temporal frequency k is given by

û(k, x, y) = Σ_m A1 (k, m).sin(α(m)x)sin(mπy/b), 0 < x < c, 0 < y < b,

û(k, x, y) = Σ_m A2 (k, m).sin(α(m)(x − a)).sin(mπy/b), c < x < a, 0 < y < b

where

k² = α(m)² + (mπ/b)²

Applying the boundary condition,

û(k, c + 0, y) − û(k, c − 0, y) = f̂ (k, y) = Σ_m f̂ (k, m)sin(mπy/b)

we get that

A2 (k, m).sin(α(m)(c − a)) − A1 (k, m)sin(α(m)c) = f̂ (k, m)

This gives us one relationship between the coefficients A1 (k, m), A2 (k, m). However, in these two examples, we have not taken into account the external deforming forces. We shall now do so. In the first 1-D example, our equations of motion are

u,tt (t, x) − u,xx (t, x) = g(t, x)

which gives on taking the temporal Fourier transform,

û,xx (k, x) + k² û(k, x) = −ĝ(k, x), x ∈ (a, c) ∪ (c, b)

The general solution to this is

û(k, x) = A(k).cos(kx) + B(k).sin(kx) − (1/k) ∫_a^x sin(k(x − x′ ))ĝ(k, x′ )dx′

Applying the boundary condition that this vanishes at x = a, b, we get the solutions in the two disjoint regions as

û(k, x) = A1 (k).sin(k(x − a)) − (1/k) ∫_a^x sin(k(x − x′ ))ĝ(k, x′ )dx′ , a < x < c,

û(k, x) = A2 (k).sin(k(x − b)) + B2 (k).cos(k(x − b)) − (1/k) ∫_a^x sin(k(x − x′ ))ĝ(k, x′ )dx′ , c < x < b

where

B2 (k) = (1/k) ∫_a^b sin(k(b − x′ ))ĝ(k, x′ )dx′
The fracture boundary condition is

û(k, c + 0) − û(k, c − 0) = d̂(k)

and this gives one relationship between the coefficients A1 (k), A2 (k), B2 (k).
Now we come to the fracture analysis of general anisotropic materials. It
is clear that to describe the fracture using domains in the spatial regions, we
should not use spatial Fourier transforms. Rather, we must use temporal Fourier
transforms. This leads to the generalized anisotropic Helmholtz equation

C(lpmn)un,pm (k, r) + k 2 ul (k, r) = 0



Substituting ul (k, r) = Al (K)exp(iK.r) into this equation gives us

k² Al (K) − C(lpmn)Kp Km An (K) = 0

which has a non-zero solution for Al (K) only when

det(k² I3 − F (K)) = 0

where F (K) is the matrix ((C(lpmn)Kp Km ))_{l,n} . Writing formally the solutions as k = ±k^r (K), r = 1, 2, 3, with corresponding eigenvectors A^r (K), B^r (K), r = 1, 2, 3, we get

un (t, r) = ∫ Σ_{r=1}^{3} (A^r_n (K)exp(ik^r (K)t) + B^r_n (K)exp(−ik^r (K)t))exp(iK.r)d³ K

or equivalently, in the temporal frequency domain,

un (k, r) = ∫ Σ_{r=1}^{3} [δ(k − k^r (K))A^r_n (K) + δ(k + k^r (K))B^r_n (K)].exp(iK.r)d³ K

The functions A^r_n (K), B^r_n (K) are different for the different domains and a discontinuity matching condition at the boundary has to be applied. Since un (t, r) is a real function, we can write
 
un (t, r) = ∫ Σ_{r=1}^{3} Re(A^r_n (K).exp(i(k^r (K)t − K.r)))d³ K
or equivalently, upto a proportionality factor, in the temporal frequency domain,

un (k, r) = ∫ Σ_{r=1}^{3} [A^r_n (K)exp(−iK.r)δ(k − k^r (K)) + Ā^r_n (K)exp(iK.r)δ(k + k^r (K))]d³ K

Note that k^r (−K) = k^r (K) since C(lpmn)Kp Km does not change sign if the 3-vector K is replaced by −K. In the j th region, we therefore have the solution

un (t, r, j) = ∫ Σ_{r=1}^{3} Re(A^r_n (K, j).exp(i(k^r (K)t − K.r)))d³ K, r ∈ Dj

Now applying the boundary condition

un (t, r, j) − un (t, r, l) = dn (t, r, j, l), r ∈ Dj ∩ Dl

we get

∫ Σ_{r=1}^{3} Re((A^r_n (K, j) − A^r_n (K, l)).exp(i(k^r (K)t − K.r)))d³ K − dn (t, r, j, l) = 0,

∀r ∈ Dj ∩ Dl . This equation imposes constraints on the functions A^r_n (K, j), j = 1, 2, ..., p.
Acknowledgements: This problem was suggested to me by my colleague Dr. Abhishek Tevatia.
Quantization of fracture mechanics
Suppose we have a pseudo-elastic material like chalk. When we apply a twist
(ie, torsion force) to it, then it develops fractures along certain curves on the
boundary surface. We can thus partition the material into disjoint regions where
the fractures occur along the boundaries between the disjoint region and within
each disjoint region, the laws of elasticity are valid. Let therefore, D1 , D2 , .., DN
denote the disjoint regions. The boundaries are therefore the sets Bjk = cl(Dj ) ∩ cl(Dk ), 1 ≤ j < k ≤ N , where the Dj s are open sets on the surface of the
topological manifold on which the fractures appear. Let ua (t, r) denote the ath
component of the displacement on the surface of the body. Then, in the union of the open sets Dk , k = 1, 2, ..., N , the strain tensor is given by

sab (t, r) = (1/2)(ua,b + ub,a )(t, r)

and the stress tensor is


σab = C(abcd)scd
with summation over the repeated indices c, d. In this expression, C(abcd) are
the elastic constants. The Lagragian density of the body is thus given by
L(ua , ua,t , ua,b ) = (ρ/2) Σ_{a=1}^{3} u_{a,t}² − (1/2) Σ_{abcd} C(abcd)uab ucd

Quantization of the wave-equations of elasticity: The equations of motion


obtained from the variational principle and taking external forces into account
are
ua,tt (t, r) = C(abcd)uc,bd (t, r) + fa (t, r)
Alternatively, passing over to the Hamiltonian, we get the canonical momentum fields as

πa (t, r) = ∂L/∂ua,t = ρua,t

Thus (setting ρ = 1), the Hamiltonian density is given by

H(t, r, ua , ua,b , πa ) = πa ua,t − L

= (1/2)πa πa + (1/2)C(abcd)ua,b uc,d − fa (t, r)ua

where the symmetries of C(abcd) have been used to replace uab ucd by ua,b uc,d .



We now calculate the Hamiltonian for this elasticity field inside a cube of side-length L:

H(t, ua , πa ) = ∫_{[0,L]³} H d³ r

To evaluate this, we expand ua (t, r) as a Fourier series within this box:

ua (t, r) = Σ_n ua (t, n)exp(2πin.r/L)

where n varies over Z³ . Then with B = [0, L]³ , we have

2K(t) = ∫_B ua,t (t, r)² d³ r = L³ Σ_n |ua,t (t, n)|²

for twice the kinetic energy, and for twice the elastic potential energy,

2V (t) = C(abcd) ∫_B uab (t, r)ucd (t, r)d³ r

= C(abcd) ∫_B ua,b uc,d d³ r

= 4π² L C(abcd) Σ_n ua (t, n)ūc (t, n)nb nd

The Lagrangian can therefore be expressed as

L(t, u_a(t,n), u_{a,t}(t,n)) = (L^3/2) Σ_n |u_{a,t}(t,n)|^2 − 2π^2 L C(abcd) Σ_n n_b n_d u_a(t,n) ū_c(t,n)

This is the Lagrangian in the Fourier series domain.
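The step from ∫_B u_{a,t}^2 d^3r to L^3 Σ_n |u_{a,t}(t,n)|^2 is Parseval's theorem for Fourier series. As a sanity check, here is a minimal pure-Python sketch of the one-dimensional discrete analogue; the test signal is an arbitrary made-up choice:

```python
import cmath

# Discrete Parseval identity:  sum_k |x[k]|^2 = (1/N) * sum_n |X[n]|^2,
# where X[n] is the DFT of x.  This mirrors the continuum identity
# integral |u_t|^2 d^3r = L^3 * sum_n |u_t(n)|^2 used in the text.
N = 16
x = [cmath.exp(2j * cmath.pi * 3 * k / N).real + 0.5 for k in range(N)]
X = [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
     for n in range(N)]
lhs = sum(abs(v) ** 2 for v in x)
rhs = sum(abs(v) ** 2 for v in X) / N
assert abs(lhs - rhs) < 1e-9
```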


[9] Neural network based EKF with disturbance observer for chaotic
systems
The given chaotic system, taking into account disturbance effects, is

y(k+1) = f(y(k)) + d(k+1), y(k) ∈ R^n − − − (1)

The NN approximation to this dynamics is


y(k+1) = V^*·tanh(W^* y(k)) − − − (2)

where

V^* ∈ R^{n×p}, W^* ∈ R^{p×n} − − − (3)
To improve upon this approximation, we make NN weights adaptive so that the
NN approximation to the chaotic system is

ŷ(k + 1) = Vk .tanh(Wk y(k)) − − − (4)



We write for the weight errors:


Ṽk = Vk − V ∗ , W̃k = Wk − W ∗ − − − (5)
so that retaining only first order of smallness terms, (2) can be expressed as
y(k + 1) = (Vk − Ṽk ).tanh((Wk − W̃k )y(k))

= Vk .tanh(Wk y(k)) − Ṽk .tanh(Wk y(k)) − Vk Dk .W̃k y(k)


= ŷ(k + 1) − Ṽk .tanh(Wk y(k)) − Vk .Dk .W̃k y(k)
where

D_k = diag[sech^2((W_k y(k))(i)) : i = 1, 2, ..., p]
Thus,
ỹ(k + 1) = ŷ(k + 1) − y(k + 1) = Ṽk .tanh(Wk y(k)) + Vk .Dk .W̃k y(k)

The NN weight EKF observer is thus given by

V ec(V̂k+1 ) = V ec(V̂k ) − K1 ỹ(k + 1),

and
V ec(Ŵk+1 ) = V ec(Ŵk ) − K2 ỹ(k + 1)
We know that

Vec(ỹ(k+1)) = ỹ(k+1) = (tanh(W_k y(k))^T ⊗ I_n)Vec(Ṽ_k) + (y(k)^T ⊗ V_k D_k)Vec(W̃_k)
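The identity used here is the standard vectorization rule vec(AXB) = (B^T ⊗ A)vec(X), specialized to Ax = (x^T ⊗ I)vec(A) with column-major vec. A small self-contained numeric check (the matrix entries are arbitrary):

```python
# Check A @ x == (x^T kron I) vec(A) for a 2x2 example, pure Python.
def vec(A):  # column-major stacking
    rows, cols = len(A), len(A[0])
    return [A[i][j] for j in range(cols) for i in range(rows)]

def kron(A, B):  # Kronecker product of two dense matrices
    ra, ca = len(A), len(A[0])
    rb, cb = len(B), len(B[0])
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(ca * cb)] for i in range(ra * rb)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[1.0, 2.0], [3.0, 4.0]]
x = [0.5, -1.0]
I2 = [[1.0, 0.0], [0.0, 1.0]]
lhs = matvec(A, x)                        # A @ x
rhs = matvec(kron([x], I2), vec(A))       # (x^T kron I_2) vec(A)
assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
```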
To implement the EKF, we use D̂k in place of Dk and also use Ŵk , V̂k in place
of Wk , Vk respectively. Thus, the EKF observer becomes
Vec(V̂_{k+1}) = Vec(V̂_k) − K_1((tanh(Ŵ_k y(k))^T ⊗ I_n)Vec(V̂_k − V^*) + (y(k)^T ⊗ V̂_k D̂_k)Vec(Ŵ_k − W^*)),

and

Vec(Ŵ_{k+1}) = Vec(Ŵ_k) − K_2((tanh(Ŵ_k y(k))^T ⊗ I_n)Vec(V̂_k − V^*) + (y(k)^T ⊗ V̂_k D̂_k)Vec(Ŵ_k − W^*)),

where

D̂_k = diag[sech^2((Ŵ_k y(k))(i)) : i = 1, 2, ..., p]
Taking noise effects into account, the Kalman gains K1 , K2 must be chosen so
as to minimize
T rE(V ec(Ṽk+1 ).V ec(Ṽk+1 )T ) + T rE(V ec(W̃k+1 ).V ec(W̃k+1 )T )
where
Ṽk+1 = V̂k+1 − V ∗ , W̃k+1 = Ŵk+1 − W ∗
We observe that with these definitions, the weight update equations can be
expressed as

Vec(Ṽ_{k+1}) = Vec(Ṽ_k) − K_1((tanh(Ŵ_k y(k))^T ⊗ I_n)Vec(Ṽ_k) + (y(k)^T ⊗ V̂_k D̂_k)Vec(W̃_k)),

and

Vec(W̃_{k+1}) = Vec(W̃_k) − K_2((tanh(Ŵ_k y(k))^T ⊗ I_n)Vec(Ṽ_k) + (y(k)^T ⊗ V̂_k D̂_k)Vec(W̃_k)),
Also, for the disturbance observer, we take

d̂(k+1) = d̂(k) + L(d(k+1) − d̂(k))
= d̂(k) + L(y(k+1) − f(y(k)) − d̂(k))
≈ d̂(k) + L(y(k+1) − V̂_k·tanh(Ŵ_k y(k)) − d̂(k))
≈ d̂(k) + L(V^*·tanh(W^* y(k)) − V̂_k·tanh(Ŵ_k y(k)) − d̂(k))
= d̂(k) − L(ỹ(k+1) + d̂(k))
= (1 − L)d̂(k) − L ỹ(k+1)
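To see what the final recursion does, note that when the weight estimates are accurate, ỹ(k+1) ≈ −d(k+1), so the observer reduces to the first-order tracker d̂(k+1) = (1−L)d̂(k) + L d(k+1). A toy scalar simulation (the gain and disturbance values are made-up) confirms geometric convergence for a constant disturbance:

```python
# Scalar disturbance observer under the simplifying assumption that the
# NN weights have converged, so y~(k+1) = yhat - y ≈ -d(k+1).
L = 0.3                      # observer gain, 0 < L < 1
d = 2.0                      # constant true disturbance (made-up value)
d_hat = 0.0
for _ in range(100):
    y_tilde = -d             # output error under the stated assumption
    d_hat = (1 - L) * d_hat - L * y_tilde   # the recursion from the text
assert abs(d_hat - d) < 1e-6   # error decays like (1-L)^k
```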
[10] A problem given by Mridul:
The continuous time case: Consider an ideal integrator in continuous time
given by the transfer function H(s) = 1/s in the Laplace domain or H(jω) =
1/jω in the frequency domain. More precisely, if x(t) is the input signal, then
the integrator output is

y(t) = u(t) ∗ x(t) = ∫_{−∞}^t x(τ)dτ

so its transfer function in the frequency domain is

H(jω) = ∫_0^∞ exp(−jωt)dt = lim_{σ→0} 1/(σ + jω) = lim_{σ→0}(σ/(σ^2 + ω^2) − jω/(σ^2 + ω^2)) = 1/jω + πδ(ω)
Let x(t) be a stationary input random process with power spectral density
Sx (ω). On passing this signal through the integrator, the output has a power
spectral density
Sy (ω) = Sx (ω)/ω 2
and hence the total output power in the frequency band {ω : |ω| > ε} is given
by

P_y(ε) = 2∫_ε^∞ S_x(ω)dω/ω^2

and the total output power in the band {ω : |ω| > ω_0} is given by

P_y(ω_0) = 2∫_{ω_0}^∞ S_x(ω)dω/ω^2

To obtain the analogue of the 3-dB bandwidth, we choose a fraction 0 < α < 1,
for example 1/2, and ask for what value of ω_0 the power P_y(ω_0) falls to the
fraction α of P_y(ε). The minimum such ω_0 satisfies the equation

P_y(ω_0) = α·P_y(ε)

For example, choosing x(t) to be D^β w(t) where w(t) is white Gaussian noise
(S_w(ω) = 1 ∀ω), we find that S_x(ω) = |ω|^{2β}. Here β is any real number and
D^β is a fractional derivative if β > 0 or a fractional integral if β < 0. If
2β − 2 < −1, i.e., β < 1/2, we have

P_y(ε) = 2∫_ε^∞ ω^{2β−2}dω = (2/(1 − 2β))ε^{2β−1}

and hence the α-bandwidth of this lowpass filter for such an x(t) is given by

1/ω_0^{1−2β} = α/ε^{1−2β}

or equivalently,

ω_0 = ε·α^{1/(2β−1)} > ε
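The closed form above is easy to verify numerically; the values of β, ε and α below are arbitrary illustrative choices:

```python
import math

# With S_x(w) = |w|^(2*beta), beta < 1/2:
#   P_y(x) = (2/(1-2*beta)) * x**(2*beta - 1),
# and the alpha-bandwidth is  w0 = eps * alpha**(1/(2*beta - 1)) > eps.
beta, eps, alpha = -0.25, 0.1, 0.5      # made-up example values

def Py(x):
    return (2.0 / (1 - 2 * beta)) * x ** (2 * beta - 1)

w0 = eps * alpha ** (1.0 / (2 * beta - 1))
assert w0 > eps                                   # bandwidth exceeds eps
assert abs(Py(w0) - alpha * Py(eps)) < 1e-9 * Py(eps)
```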
More generally, we can choose an input

x(t) = Σ_{m=1}^N c(m) D^{β_m} w_m(t)

where w_m(·), m = 1, 2, ..., N are independent white noise processes and β_m <
1/2, m = 1, 2, ..., N. Then the input power spectral density is

S_x(ω) = Σ_{m=1}^N |c(m)|^2 |ω|^{2β_m}

and corresponding to this input signal the α-bandwidth ω_0 satisfies the equation

Σ_{m=1}^N (|c(m)|^2/(1 − 2β_m)) ω_0^{2β_m−1} = α·Σ_{m=1}^N (|c(m)|^2/(1 − 2β_m)) ε^{2β_m−1}
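For the mixture input this equation has no closed form in ω_0, but since P_y is strictly decreasing in its argument it can be solved by bisection. An illustrative pure-Python sketch (the coefficients c(m) and exponents β_m are made-up):

```python
# Solve  Py(w0) = alpha * Py(eps)  by bisection for a two-term mixture.
c = [1.0, 0.5]                 # made-up coefficients |c(m)|
betas = [-0.25, 0.0]           # made-up exponents, each < 1/2
eps, alpha = 0.05, 0.5

def Py(x):  # sum_m |c_m|^2/(1-2*beta_m) * x**(2*beta_m - 1), decreasing in x
    return sum(cm ** 2 / (1 - 2 * b) * x ** (2 * b - 1)
               for cm, b in zip(c, betas))

target = alpha * Py(eps)
lo, hi = eps, 1e6              # Py(lo) > target > Py(hi)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if Py(mid) > target:
        lo = mid
    else:
        hi = mid
w0 = 0.5 * (lo + hi)
assert abs(Py(w0) - target) < 1e-6 * target
```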

The discrete time case: The integrator here is an accumulator described by
the i/o relation

y(n) = Σ_{k=−∞}^n x(k) = u(n) ∗ x(n)

The transfer function of the accumulator is obtained by taking the Z-transform
on both sides of the identity

δ[n] = u[n] − u[n−1]



It gives

H(z) = (1 − z^{−1})^{−1}, |z| > 1

There is a singularity at z = 1 which prevents one from defining the DTFT; it
can be sorted out as follows:

H(exp(jω)) = lim_{r>1, r→1}(1 − r^{−1}exp(−jω))^{−1} = lim_{r→1} (1 − r^{−1}exp(jω))/(1 + r^{−2} − 2r^{−1}cos(ω))

Since the singularity is at ω = 0, we consider the above expression for small |ω|.
It is approximately given by

(1 − r^{−1} − jω)/((1 − r^{−1})^2 + ω^2)

Writing ε = 1 − r^{−1}, this expression is

(ε − jω)/(ε^2 + ω^2)

As ε → 0, the real part of this has area π and therefore we find that

H(exp(jω)) = (1 − exp(−jω))^{−1} + πδ(ω)

Note that unlike the continuous time case, here the frequency range is [−π, π).
Now we take a random signal x(t) with power spectral density S_x(ω) that is
zero when |ω| < ε. The total output power after passing this signal through the
above low pass filter is given by

P_y(ε) = 2∫_ε^π |1 − exp(−jω)|^{−2} S_x(ω)dω = ∫_ε^π (1 − cos(ω))^{−1} S_x(ω)dω

since |1 − exp(−jω)|^2 = 2(1 − cos(ω)). Likewise, if ω_0 ∈ [ε, π) is the
α-bandwidth, we must have

P_y(ω_0) = ∫_{ω_0}^π (1 − cos(ω))^{−1} S_x(ω)dω = α·P_y(ε)
To get closed form solutions for the α-bandwidth, we may take S_x(ω) = (1 −
cos(ω))^n where n is a positive integer; then, corresponding to this input
spectrum, the α-bandwidth ω_0 satisfies

∫_{ω_0}^π (1 − cos(ω))^{n−1} dω = α·∫_ε^π (1 − cos(ω))^{n−1} dω
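A hedged numerical sketch of this prescription: the two integrals are evaluated by the trapezoidal rule and ω_0 is found by bisection (n, ε and α are example values only):

```python
import math

n, eps, alpha = 2, 0.1, 0.5       # made-up example values

def integral(a, b, steps=2000):   # trapezoidal rule for (1-cos w)^(n-1)
    h = (b - a) / steps
    f = lambda w: (1 - math.cos(w)) ** (n - 1)
    return h * (0.5 * f(a) + 0.5 * f(b)
                + sum(f(a + k * h) for k in range(1, steps)))

target = alpha * integral(eps, math.pi)
lo, hi = eps, math.pi             # the tail integral decreases in its lower limit
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if integral(mid, math.pi) > target:
        lo = mid
    else:
        hi = mid
w0 = 0.5 * (lo + hi)
assert eps < w0 < math.pi
assert abs(integral(w0, math.pi) - target) < 1e-4
```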

The integrals in this expression are readily evaluated and the α-bandwidth determined. Note that in both the discrete and continuous time situations, we
must set a threshold on the lower frequency region, so that the input spectrum
does not have any frequency components smaller than this threshold because of
the singularity of the integrator at zero frequency. This is equivalent to requiring that the input signal does not contain any d.c. component, since a constant
or dc signal when integrated or summed over an infinite time range gives an
infinite output.

[11] Problems in stochastic filtering and control


[1] Estimating the neural network weights using the EKF driven by the
output of a nonlinear plant dynamical system in order to model the plant using
the neural network. Whilst estimating the neural weights, we take into account
the presence of disturbance in the plant dynamics via a disturbance observer.

The dynamical system has the form

y(t + 1) = F (y(t), y(t − q)|θ) + d(t + 1)

We propose to estimate the function F by modeling this system using a single


layer neural network as

x(t + 1) = V (t).tanh(W (t)x(t)) + d(t + 1), y(t) = x1 (t),

W (t + 1) = W (t) + noise, V (t + 1) = V (t) + noise


where V (t), W (t) are weight matrices of size n × p and p × n respectively with
x(t) being of size n × 1. The disturbance observer is constructed as

d̂(t+1) = d̂(t) + L(d(t+1) − d̂(t)) = d̂(t) + Lε(t+1)

where ε(t+1) = d(t+1) − d̂(t) is assumed to be white noise. Note that we can
write

d(k+1) = d̂(k) + ε(k+1)
The EKF for the extended state vector

ξ(t) = [x(t)^T, Vec(W(t))^T, Vec(V(t))^T, d̂(t)]^T

based on the noisy measurements

z(t) = y(t) + v(t)

is given by
x̂(t+1|t) = V̂(t|t)·tanh(Ŵ(t|t)x̂(t|t)) + d̂(t|t),
Ŵ (t + 1|t) = Ŵ (t|t), V̂ (t + 1|t) = V̂ (t|t),
d̂(t+1|t) = d̂(t|t)

or equivalently,

ξ̂(t+1|t) = ψ(ξ̂(t|t))

where

ψ(ξ) = [(V·tanh(Wx) + d̂)^T, Vec(W)^T, Vec(V)^T, d̂^T]^T

for

ξ = [x^T, Vec(W)^T, Vec(V)^T, d̂^T]^T
and the next step in the EKF is

ξ̂(t+1|t+1) = ξ̂(t+1|t) + K·(z(t+1) − Hξ̂(t+1|t))

where

H = [1, 0^T]
Note that
z(t) = Hξ(t) + v(t)
The Kalman gain vector K is chosen so that T r(P (t + 1|t + 1)) is a minimum,
where

P(t+1|t+1) = cov(ξ(t+1) − ξ̂(t+1|t+1))
= cov(ξ̂(t+1|t) + e(t+1|t) − ξ̂(t+1|t) − K(He(t+1|t) + v(t+1)))
= (I − KH)·P(t+1|t)·(I − KH)^T + KR_vK^T


minimizing the trace of this over K gives the optimal K as

K = P (t + 1|t)H T .(HP (t + 1|t)H T + Rv )−1

and correspondingly the optimum P(t+1|t+1), obtained by substituting this K
into the above equation, is

P(t+1|t+1) = P(t+1|t)(I − KH)^T = (I − KH)P(t+1|t)

To complete the EKF iteration loop, we require


e(t+1|t) = ξ(t+1) − ξ̂(t+1|t) = ψ(ξ(t)) + noise − ψ(ξ̂(t|t)) ≈ ψ'(ξ̂(t|t))e(t|t) + noise

so that

P(t+1|t) = ψ'(ξ̂(t|t))P(t|t)·ψ'(ξ̂(t|t))^T + Q
Now suppose that we run this EKF iteration loop for T iterations. Then the
converged weight matrices are W^* = Ŵ(T|T), V^* = V̂(T|T), and our approximated
dynamical system is

x̃(t+1) = V^*·tanh(W^* x̃(t)) + d̂(t|t), ỹ(t) = x̃_1(t)

We wish to compare this dynamical system with the original dynamical system,
or more precisely, estimate the parameter θ of the original dynamical system

based on this approximated system. For that purpose, we apply the EKF to the
original dynamical system to estimate θ based on output measurements ỹ(t) of
the approximated dynamical system. Define the state vector
η(t) = [y(t), y(t−1), ..., y(t−q), θ(t)^T, d̂(t)]^T

Then the original dynamical system can be expressed as

η(t+1) = [f(y(t), y(t−q)|θ(t)), y(t), y(t−1), ..., y(t−q+1), θ(t)^T, d̂(t)]^T + noise = φ(η(t)) + noise
The EKF for this is
η̂(t + 1|t) = φ(η̂(t|t))
η̂(t + 1|t + 1) = η̂(t + 1|t) + K0 .(z̃(t + 1) − H0 η̂(t + 1|t))
Note that the measurement z̃(t) is taken from the approximated plant:

z̃(t) = ỹ(t) + noise

Here,
H_0 = [1, 0^T],
and the Kalman gain K_0 and the error correlations P(t|t), P(t+1|t) are computed as usual, but based on the original plant φ(·):

P(t+1|t) = φ'(η̂(t|t))P(t|t)·φ'(η̂(t|t))^T + Q

K_0 = P(t+1|t)H_0^T(H_0P(t+1|t)H_0^T + R_v)^{−1}

P(t+1|t+1) = (I − K_0H_0)P(t+1|t)
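The gain and covariance recursions quoted above can be exercised in the simplest scalar case (all numbers below are made-up; H, R_v and the covariances are scalars):

```python
# Scalar specialization of  K = P H^T (H P H^T + Rv)^-1  and
# P(t+1|t+1) = (I - K H) P(t+1|t).
def ekf_step(P_pred, H, Rv):
    K = P_pred * H / (H * P_pred * H + Rv)   # Kalman gain
    P_post = (1 - K * H) * P_pred            # posterior covariance
    return K, P_post

K, P_post = ekf_step(P_pred=2.0, H=1.0, Rv=0.5)
assert abs(K - 0.8) < 1e-12
assert abs(P_post - 0.4) < 1e-12             # measurement reduces uncertainty
```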

[12] Dual EKF with neural network for (a) modeling the plant dynamics by
a neural network and (b) using the output of the neural network to estimate
the plant parameters. The plant dynamics is given by

y(n + 1) = F (y(n), y(n − q)|b, c) + d(n + 1)

and the neural network that approximates this plant has a dynamics given by

y1 (n + 1) = V (n).tanh(W (n)y1 (n)) + d(n + 1)

The plant measurement model is

z(n) = y(n) + ε_v(n)

and the neural network measurement model is

z_1(n) = y_1(n) + ε_1(n)



Based on z(n), we construct the EKF for estimating the weights W (n), V (n) and
based on z1 (n), we construct another EKF for estimating the plant parameters
θ = (b, c). These two EKFs are run in parallel and constitute the
dual EKF. Write the plant dynamics in state variable form

x_s(n+1) = ψ(x_s(n)) + ε_s(n+1)

where

ε_d(n+1) = d(n+1) − d̂(n)

the first component of ε_s being ε_d, and

x_s(n) = [y(n), y(n−1), ..., y(n−q), θ(n)^T, d̂(n)]^T

with the dynamics of θ, d̂ being given by

θ(n+1) = θ(n) + noise,

d̂(n+1) = d̂(n) + Lε_d(n+1)

Thus,

ψ(xs ) = [F (xs,1 (n), xs,q+1 (n)|xs,q+2 (n), ..., xs,q+p+1 (n)), xs,1 (n), ...,

xs,q (n), xs,q+2 (n), ..., xs,q+p+1 (n), xs,q+p+2 (n)]T


Note that

xs,1 (n) = y(n), ..., xs,q (n) = y(n − q + 1), θ(n) = [xs,q+2 (n), ..., xs,q+p+1 (n)]T ,

ˆ = xs,q+p+2 (n)
d(n)
The DEKF:

ŷ_1(n+1|n) = V̂(n|n)·tanh(Ŵ(n|n)ŷ_1(n|n)) + d̂(n|n),

V̂ (n + 1|n) = V̂ (n|n), Ŵ (n + 1|n) = Ŵ (n|n),


d̂(n+1|n) = d̂(n|n),

[ŷ_1(n+1|n+1), V̂(n+1|n+1), Ŵ(n+1|n+1), d̂(n+1|n+1)]^T = [ŷ_1(n+1|n), V̂(n+1|n), Ŵ(n+1|n), d̂(n+1|n)]^T
+KN (z(n + 1) − ŷ1 (n + 1|n))
where KN is constructed in the usual way:
K_N = P_N(n+1|n)H_N^T(H_NP_N(n+1|n)H_N^T + R_s)^{−1}

P_N(n+1|n) = φ'(x̂_N(n|n))P_N(n|n)φ'(x̂_N(n|n))^T + Q_N

P_N(n+1|n+1) = (I − K_NH_N)P_N(n+1|n)

where

x_N = [y_1, V, W, d̂]^T, φ(x_N) = [V·tanh(Wy_1), V, W, d̂]^T

and the dual EKF for the original plant is

x̂s (n + 1|n) = ψ(x̂s (n|n)),

x̂s (n + 1|n + 1) = x̂s (n + 1|n) + Ks (z1 (n + 1) − ŷ(n + 1|n))


K_s = P_s(n+1|n)H_s^T(H_sP_s(n+1|n)H_s^T + R_N)^{−1}

P_s(n+1|n) = ψ'(x̂_s(n|n))P_s(n|n)·ψ'(x̂_s(n|n))^T + Q_s,
Ps (n + 1|n + 1) = (I − Ks Hs )Ps (n + 1|n)

[13] Performance analysis of the EKF when the state is an arbitrary Markov
process.
Consider the scalar case first. Let x(t) be a Markov process with generator
K_t. Thus,

E[dφ(x(t))|x(t) = x] = K_tφ(x)dt = dt·∫K_t(x,y)φ(y)dy

Write

K'_t(x,y) = ∂K_t(x,y)/∂x, K''_t(x,y) = ∂^2K_t(x,y)/∂x^2
The Kushner-Kallianpur filter without any approximations is given by

dπ_t(φ) = π_t(K_tφ)dt + σ_v^{−2}(π_t(hφ) − π_t(h)π_t(φ))(dz(t) − π_t(h)dt)

where

π_t(φ) = E[φ(x(t))|Z_t]
Here, the measurement process is

dz(t) = h(x(t))dt + σv dv(t)

where v(·) is standard Brownian motion independent of x(·). Also, Z_t stands
for σ(z(s) : s ≤ t), the measurement process up to time t. We now make
approximations to this exact infinite dimensional filter. Specifically, we let
φ(x) = x, x^2 and linearize:
K_t(x) = ∫K_t(x,y)y dy

π_t(K_t(x)) = ∫E(K_t(x(t),y)|Z_t)y dy ≈ ∫K_t(x̂(t),y)y dy = μ_t(x̂(t))

where

μ_t(x) = ∫K_t(x,y)y dy

π_t(x) = x̂(t), π_t(h) = E(h(x(t))|Z_t) ≈ h(x̂(t))



π_t(x·h) = E(x(t)h(x(t))|Z_t) ≈ x̂(t)h(x̂(t)) + h'(x̂(t))P(t)

where

e(t) = x(t) − x̂(t), P(t) = E(e^2(t)|Z_t)

Then we find with these approximations that

dx̂(t) = μ_t(x̂(t))dt + σ_v^{−2}h'(x̂(t))P(t)(dz(t) − h(x̂(t))dt)
To complete the approximate filter, we need an equation for P(t):

K_t(x^2) = ∫K_t(x,y)y^2 dy

so that

π_t(K_t(x^2)) = ∫E(K_t(x(t),y)|Z_t)y^2 dy ≈ ∫K_t(x̂(t),y)y^2 dy + (1/2)(∫K''_t(x̂(t),y)y^2 dy)P(t) = L_t(x̂(t))P(t) + M_t(x̂(t))

where

L_t(x) = (1/2)∫K''_t(x,y)y^2 dy,

M_t(x) = ∫K_t(x,y)y^2 dy

Further,

π_t(x^2) = E(x(t)^2|Z_t) = x̂(t)^2 + P(t),

π_t(h) ≈ h(x̂(t))

π_t(x^2h(x)) ≈ x̂(t)^2h(x̂(t)) + (1/2)(2h(x̂(t)) + 4h'(x̂(t))x̂(t) + h''(x̂(t))x̂(t)^2)P(t)

Thus,

π_t(x^2h(x)) − π_t(x^2)π_t(h(x)) ≈ 2h'(x̂(t))x̂(t)P(t)
Thus, we get with these approximations,

dP(t) + 2x̂(t)dx̂(t) + (dx̂(t))^2 = M_t(x̂(t))dt + L_t(x̂(t))P(t)dt + 2σ_v^{−2}h'(x̂(t))x̂(t)P(t)(dz(t) − h(x̂(t))dt)

or equivalently, using Ito's formula,

dP(t) + 2x̂(t)(μ_t(x̂(t))dt + σ_v^{−2}P(t)h'(x̂(t))(dz(t) − h(x̂(t))dt)) + σ_v^{−2}P(t)^2h'(x̂(t))^2 dt = M_t(x̂(t))dt + L_t(x̂(t))P(t)dt + 2σ_v^{−2}h'(x̂(t))x̂(t)P(t)(dz(t) − h(x̂(t))dt)

which simplifies to give

P'(t) = −σ_v^{−2}P(t)^2h'(x̂(t))^2 + M_t(x̂(t)) + L_t(x̂(t))P(t) − 2x̂(t)μ_t(x̂(t))

Consider now the special situation in which x(t) satisfies the SDE

dx(t) = μ(x(t))dt + σ(x(t))dN(t)

where N(·) is a Poisson process with rate λ. Then the generator of x(·) is given
by

K_tφ(x) = μ(x)φ'(x) + λ(φ(x + σ(x)) − φ(x))

We get

K_t(x) = μ(x) + λ·σ(x),

K_t(x^2) = 2μ(x)x + λ(σ(x)^2 + 2xσ(x))

We have

π_t(K_t(x)) ≈ μ(x̂(t)) + λ·σ(x̂(t))

π_t(K_t(x^2)) ≈ 2μ(x̂(t))x̂(t) + 2μ'(x̂(t))P(t)

π_t(x·h(x)) − π_t(x)·π_t(h(x)) ≈ h'(x̂(t))P(t)
and the EKF in this case simplifies to

dx̂(t) = (μ(x̂(t)) + λ·σ(x̂(t)))dt + σ_v^{−2}h'(x̂(t))P(t)(dz(t) − h(x̂(t))dt),

d(P(t) + x̂(t)^2) = dP(t) + 2x̂(t)dx̂(t) + (dx̂(t))^2
= dP(t) + 2x̂(t)((μ(x̂(t)) + λ·σ(x̂(t)))dt + σ_v^{−2}h'(x̂(t))P(t)(dz(t) − h(x̂(t))dt)) + σ_v^{−2}h'(x̂(t))^2P(t)^2 dt
= (2μ(x̂(t))x̂(t) + 2μ'(x̂(t))P(t))dt + 2σ_v^{−2}h'(x̂(t))x̂(t)P(t)(dz(t) − h(x̂(t))dt)

which simplifies to

P'(t) = 2μ'(x̂(t))P(t) − 2λx̂(t)σ(x̂(t)) − σ_v^{−2}h'(x̂(t))^2P(t)^2
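To get a feel for the boundedness question discussed next, one can freeze the estimate x̂ so that a = μ'(x̂), b = h'(x̂)^2/σ_v^2 and c = 2λx̂σ(x̂) are constants, giving the Riccati-type ODE P' = 2aP − c − bP^2. An Euler integration (with made-up constants; this is only a qualitative sketch, not the full time-varying analysis) converges to the stable stationary point:

```python
import math

# P' = 2aP - c - bP^2 with frozen, made-up coefficients.
a, b, c = 0.5, 1.0, 0.2
P, dt = 5.0, 1e-3
for _ in range(200000):
    P += dt * (2 * a * P - c - b * P ** 2)
# stationary points solve b P^2 - 2 a P + c = 0; the larger root is stable
P_star = (a + math.sqrt(a * a - b * c)) / b
assert abs(P - P_star) < 1e-3     # P(t) stays bounded and settles at P_star
```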

The estimation error covariance P (t) must be shown to remain bounded with
time. This will guarantee the validity of our approximations. More precisely,
we have to show that the variance of

e(t) = x(t) − x̂(t),

is bounded where x̂(t) satisfies the EKF while x(t) satisfies the above sde driven
by the Poisson process N (.). We cannot use P (t) for the variance of e(t) since
it is based on linearization of the original sde around the state estimate. We
shall do better by expanding the functions μ(x), σ(x) around x̂(t) up to second
degree terms in e(t) and then calculating how E(e(t)^2|Z_t) evolves with time.
This will be a more accurate analysis of the error than that obtained using only

P(t). In order to proceed with this computation, we shall make a central limit
approximation that e(t) is zero mean Gaussian with variance P(t) conditioned
on Z_t:

de(t) = dx(t) − dx̂(t) = μ(x(t))dt + σ(x(t))dN(t) − μ(x̂(t))dt − λσ(x̂(t))dt − σ_v^{−2}P(t)h'(x̂(t))(dz(t) − h(x̂(t))dt)

= (μ(x(t)) − μ(x̂(t)))dt − σ_v^{−2}P(t)h'(x̂(t))((h(x(t)) − h(x̂(t)))dt + σ_v dv(t)) + σ(x(t))dN(t) − λσ(x̂(t))dt

≈ [μ'(x̂(t))e(t) + μ''(x̂(t))e(t)^2/2]dt − σ_v^{−2}P(t)h'(x̂(t))[(h'(x̂(t))e(t) + h''(x̂(t))e(t)^2/2)dt + σ_v dv(t)] + σ(x̂(t))(dN(t) − λdt) + σ'(x̂(t))e(t)dN(t)

= [μ'(x̂(t)) − σ_v^{−2}P(t)h'(x̂(t))^2]e(t)dt + [μ''(x̂(t))/2 − σ_v^{−2}P(t)h'(x̂(t))h''(x̂(t))/2]e(t)^2dt − σ_v^{−1}P(t)h'(x̂(t))dv(t) + σ(x̂(t))(dN(t) − λdt) + σ'(x̂(t))e(t)dN(t)

= (A_1(t)e(t) + A_2(t)e(t)^2)dt + B(t)(dN(t) − λdt) + C(t)e(t)dN(t) + D(t)dv(t)
say. Writing
Q(t) = E(e(t)2 )
(This is not the same as P (t), it is a more accurate computation of P (t) based
on quadratic approximations not on linear approximations), we get by applying
Ito’s formula for Brownian motion and Poisson processes,

d(e^2(t)) = 2e(t)de(t) + (de(t))^2 = 2e(t)[(A_1e(t) + A_2e(t)^2)dt + BdM + Ce(t)dN + Ddv] + (B + Ce(t))^2dN + D^2dt


where
M = N − λt
Taking conditional expectations given Zt gives us

dQ(t)/dt = 2A1 (t)Q(t) + 2λC(t)Q(t) + D2 + λ(B 2 + C 2 Q(t))


[14] Stochastic optimal control in discrete time
Let w(n), n = 0, 1, 2, ... be iid random variables with pdf p(w). Consider a
stochastic dynamical system

x(n+1) = f(x(n), w(n+1), u(n)), n = 0, 1, 2, ...

The aim is to choose the control input u(n) as a non-random function of x(n)
so that

E Σ_{n=0}^N L(n, x(n), u(n))

is minimized. To this end, we define

V(k, x(k)) = min_{u(m), k≤m≤N} E[Σ_{n=k}^N L(n, x(n), u(n))|x(k)]
where while taking the minimum here, we restrict u(m) to vary over all non-
random functions of x(m) for each m = k, k + 1, ..., N . From the Markovian
structure of the process, it is clear that we can write
V(k, x(k)) = min_{u(k)}(L(k, x(k), u(k)) + E[V(k+1, x(k+1))|x(k)]) = min_{u(k)}(L(k, x(k), u(k)) + ∫V(k+1, f(x(k), w, u(k)))p(w)dw)

and the resulting u(k) that optimizes this will automatically be guaranteed to be
a function of x(k) and k only. This optimal u(k) is precisely the control input
at time k. We can write this equation, also called the stochastic Bellman-
Hamilton-Jacobi equation, as

V(k, x) = min_u(L(k, x, u) + ∫V(k+1, f(x, w, u))p(w)dw), k = N−1, N−2, ..., 0

It should be noted that the iteration starts with

V(N, x) = min_u L(N, x, u)
and goes backward in time.
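The backward recursion can be sketched directly. The following toy dynamic-programming pass (illustrative only: scalar state and control on small made-up grids, quadratic cost, additive noise uniform on three values, dynamics x(n+1) = x(n) + u(n) + w) implements V(N,x) = min_u L(N,x,u) followed by the backward Bellman step:

```python
xs = [i * 0.5 for i in range(-8, 9)]    # state grid
us = [-0.5, 0.0, 0.5]                    # control grid
ws = [-0.5, 0.0, 0.5]                    # equally likely noise values
N = 10

def snap(x):                             # project a point back onto the grid
    return min(xs, key=lambda g: abs(g - x))

def cost(x, u):                          # stage cost L(n, x, u)
    return x * x + 0.1 * u * u

V = {x: min(cost(x, u) for u in us) for x in xs}        # V(N, x)
for k in range(N - 1, -1, -1):                          # backward in time
    V = {x: min(cost(x, u)
                + sum(V[snap(x + u + w)] for w in ws) / len(ws)
                for u in us)
         for x in xs}
assert V[0.0] <= V[2.0]      # staying near the origin is cheaper
```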
An example: placing a satellite on the surface of the moon.
Let M_e, M_m denote respectively the earth's mass and the moon's mass. The
differential equations of motion are
r'(t) = v(t),

v'(t) = f(t, r(t), v(t), u(t)) + σ(t, r(t), v(t))·w(t)

where

f(t, r, v, u) = −GM_e(r − r_e)/|r − r_e|^3 − GM_m(r − r_m)/|r − r_m|^3 − Γv + Cu

and

w(t) = B'(t)
where B(t) is 3-D Brownian motion. We wish to choose the control input (i.e.,
fuel) u(t) as a non-random function of r(t), v(t) and t so that

E∫_0^T L(r(t), v(t), u(t), t)dt − μ^T(Er(T) − r_f) − ν^T(Ev(T) − v_f)

is a minimum. In this expression, μ, ν are Lagrange multipliers introduced to
take into account the constraint that at the final time T, the mean values of the
position and velocity of the satellite are prescribed. Let

min_{u(s), t≤s≤T}[E[∫_t^T L(r(τ), v(τ), u(τ), τ)dτ|r(t), v(t)] − μ^TE[r(T) − r_f|r(t), v(t)] − ν^TE[v(T) − v_f|r(t), v(t)]]

= V (t, r(t), v(t))


Then we get using the Markovian properties of the processes involved and Ito’s
formula,

V (t, r(t), v(t)) = minu(t) (L(r(t), v(t), u(t), t)dt+E[V (t+dt, r(t)+dr(t),
v(t)+dv(t))|r(t), v(t)])
or equivalently,

∂V (t, r, v) + minu (D(u)V (t, r, v) + L(r, v, u, t)) = 0

where D(u) is the partial differential operator

D(u) = vT ∂/∂r + f (t, r, v, u)T ∂/∂v + (1/2)T r(σ(t, r, v)σ(t, r, v)T ∂ 2 /∂v∂vT )

This operator depends upon u as a parameter. It is the generator of the Markov


process r(t), v(t) at time t for a given input u(t) = u.

[15] Modeling dynamical systems with constraints using neural networks
[1] Consider a system of n particles with n − p constraints. The system
therefore has p degrees of freedom, which we denote by q_1, ..., q_p, and the
particles' positions can be expressed as

r_k = r_k(q), k = 1, 2, ..., n, q = (q_1, ..., q_p)

D'Alembert's principle of virtual work can therefore be expressed as

Σ_{k=1}^n (m_k d^2r_k/dt^2 − F_k(r), δr_k) = 0

where

F_k(r) = F_k(r(q)), δr_k = Σ_{j=1}^p (∂r_k/∂q_j)δq_j

and hence the equations of motion, taking into account the constraints, are

Σ_{k=1}^n [m_k(d^2r_k/dt^2, ∂r_k/∂q_j) − (F_k, ∂r_k/∂q_j)] = 0, j = 1, 2, ..., p

In the special case when the forces are derived from a potential V(r), i.e.,

F_k(r) = −∂V/∂r_k

it is easily shown via a lengthy calculation that these D'Alembert equations of
motion can be derived from a Lagrangian

L(q, q') = (1/2)Σ_{k=1}^n m_k|dr_k/dt|^2 − V(r(q))

where

dr_k/dt = Σ_{j=1}^p (∂r_k/∂q_j)q'_j

This Lagrangian has the form

L(q, q') = (1/2)Σ_{j,k=1}^p M_{jk}(q)q'_jq'_k − U(q)

where

M_{jk}(q) = Σ_{m=1}^n m_m(∂r_m/∂q_j, ∂r_m/∂q_k)

and

U(q) = V(r(q))
Remark: The constraints physically mean that there exist constraint forces
that are normal to the p dimensional surface defined by the equations r = r(q),
ie rk = rk (q1 , ..., qp ), k = 1, 2, ..., n that cause the particles to remain always
on this surface or equivalently, cause the particles to move tangential to this
surface or in other words, Newton’s equations of motion hold after projecting
both sides of it onto the tangent plane to the surface at each point. Now consider
a situation in which we wish to model this constrained dynamics using a neural
network. Let W_1, ..., W_{n−1} denote the weight matrices of the first, second,
..., (n−1)-th layers respectively. Then the states of the NN can be expressed as

X(k + 1) = σ(Wk X(k)), k = 1, 2, ..., n − 1

where X(k) is the signal vector at the k th layer. u = X(1) is the input signal
vector and y = X(n) is the output signal vector. In the discretized Lagrangian
dynamics, we assume that noise is present. Therefore, the discretized dynamics
can be expressed as

y(n + 1) = f (y(n)) + u(n) + w(n + 1)

We wish to investigate the effect of this noise on the NN weight fluctuations.


Let us say that by solving this difference equation, we get the output at the
final time N as
y(N ) = F (u, w)
where
u = (u(1), ..., u(N − 1)), w = (w(1), ..., w(N − 1))
We write

X(n) = σ(W_{n−1}σ(W_{n−2}...σ(W_1X(1))...)) = G(W, u)

The aim is to minimize

E[‖y(N) − X(n)‖^2] = E[‖F(u, w) − G(W, u)‖^2]



w.r.t W. Suppose that we already have a guess value W0 for the weight matrix.
Let the optimal weights be a small perturbation of this, ie,

W = W0 + δW

Then, we have approximately,

G(W, u) ≈ G(W_0, u) + G'(W_0, u)δW + (1/2)G''(W_0, u)(δW ⊗ δW)

and we get

E[‖F(u, w) − G(W, u)‖^2] ≈ E[‖F(u, w) − G(W_0, u)‖^2] + δW^T E[G'(W_0, u)^T G'(W_0, u)]δW − 2δW^T E[G'(W_0, u)^T (F(u, w) − G(W_0, u))] − (δW ⊗ δW)^T E[G''(W_0, u)^T (F(u, w) − G(W_0, u))]
In these equations, it is being assumed that the input process u is also a random
process just as the dynamical system noise w is. We now define the column
vector

c = E[G''(W_0, u)^T(F(u, w) − G(W_0, u))]

Then,

(δW ⊗ δW)^Tc = Σ_{a,b=1}^K δW_a·δW_b·c(K(a−1) + b) = δW^TCδW

where

C = ((c(K(a−1) + b)))_{1≤a,b≤K}
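The index bookkeeping in this last step can be checked numerically; the sketch below (with arbitrary values for δW and c) confirms that (δW ⊗ δW)^T c equals δW^T C δW under the stated indexing:

```python
# 0-based version of the index map: C[a][b] = c[K*a + b].
K = 3
dW = [0.5, -1.0, 2.0]                      # made-up perturbation vector
c = [float(i + 1) for i in range(K * K)]   # made-up moment vector c(1..K^2)

lhs = sum(dW[a] * dW[b] * c[K * a + b]     # (dW kron dW)^T c
          for a in range(K) for b in range(K))
C = [[c[K * a + b] for b in range(K)] for a in range(K)]
rhs = sum(dW[a] * C[a][b] * dW[b]          # dW^T C dW
          for a in range(K) for b in range(K))
assert abs(lhs - rhs) < 1e-12
```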

[16] Quantum neural network for estimating the joint pdf of a random signal
y(t), t = 0, 1, 2, ... is a given stationary random process. We wish to estimate
the joint probability density of y(t) = (y(t + τk ), k = 1, 2, ..., p)T .
Let f (y, t) be the tentative joint pdf. We wish to improve upon it using
quantum mechanics. It is known that if the initial wave function ψ(y, 0) of a
quantum system evolves according to Schrodinger's equation with a real potential,
then ∫|ψ(y, 0)|^2 d^p y = 1 implies ∫|ψ(y, t)|^2 d^p y = 1 ∀t > 0. This suggests
that we improve upon our knowledge about the wave function by defining an
"error potential"

V(y, t) = W(y, t)(f(y, t) − |ψ(y, t)|^2)

where the weights W (y, t) evolve according to a learning algorithm

∂t W (y, t) = −β1 W (y, t) + β2 (f (y, t) − |ψ(y, t)|2 )



and the wave function required for approximating the data pdf evolves according
to the Schrodinger equation

iψ_{,t}(y, t) = −(1/2m)∇_y^2 ψ(y, t) + V(y, t)ψ(y, t)

This Schrodinger evolution will always guarantee that |ψ(y, t)|2 remains a prob-
ability density function. The idea is that if the neural weight W (y, t) is large
positive, then the decay term −β1 W in the weight learning algorithm will guar-
antee a decrease in the same provided that |ψ(y, t)|2 < f (y, t). On the other
hand, if f (y, t) >> |ψ(y, t)|2 , then the second term in the weight learning algo-
rithm will guarantee rapid increase of the weight W (y, t) causing the potential
V (y, t) to get large and then the Schrodinger equation will guarantee increase
of |ψ(y, t)|2 so that it gets closer to f (y, t).
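The qualitative behaviour of the weight learning rule is visible already in the scalar case: if the mismatch e = f − |ψ|^2 is held constant (a simplifying assumption, since in reality ψ evolves too), the rule ∂_tW = −β_1W + β_2e relaxes to the equilibrium W^* = β_2e/β_1. An Euler sketch with made-up constants:

```python
# Euler integration of dW/dt = -beta1*W + beta2*e with e frozen.
beta1, beta2, e = 2.0, 1.0, 0.3
W, dt = 0.0, 1e-3
for _ in range(20000):
    W += dt * (-beta1 * W + beta2 * e)
assert abs(W - beta2 * e / beta1) < 1e-6   # settles at W* = beta2*e/beta1
```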
Chapter 12

Quantum Signal Processing

Goal: To design quantum gates using quantum scattering theory.


[1] Syllabus for Quantum Signal Processing
[1] The notion of a qubit as a generalization of bit.
[2] Tensor product of several qubit states leading to a qudit state.
[3] Pure and mixed states in quantum computation. The principle of super-
position for pure states.
[4] Notion of a quantum probability space (H, P (H), ρ) and its comparison
with a classical probability space (Ω, F, P ).
[5] The notion of a quantum measurement, PVM and POVM, the collapse
postulate in quantum measurement theory.
[6] The Schrodinger evolution, time independent perturbation theory, the
Dyson series in the interaction picture and its application to the design of unitary
gates for quantum computation.
[7] Various commonly used quantum gates like the CNOT gate (two qubits),
the Hadamard gate (one qubit), the Swap gate (three qubits), the Fredkin and
Toffoli gates, the phase gate, the quantum Fourier transform (QFT) gate.
[8] Tensor product of states, observables and unitary gates.
[9] Using harmonic oscillators to design quantum gates.
[10] Performance analysis of quantum gates designed using Schrodinger’s
evolution in the presence of noise.
[11] The Choi-Kraus and Stinespring representation of a noisy quantum
channel.
[12] The Knill-Laflamme theorem for recovering an input state passed through
a noisy quantum channel.
[13] The notion of a quantum error correcting code based on the Knill-
Laflamme theorem.
[14] Quantum error correcting codes designed via Imprimitivity theorem
applied to Weyl operators acting on states defined in L2 (A) where A is a finite
Abelian group.
[15] Quantum hypothesis testing between two mixed states. Construction of
the decision operator.


[16] Quantum teleportation using entangled states as a fast means for trans-
mitting a d-qubit quantum state by transmitting only 2d classical bits.
[17] Quantum image processing using the Hudson-Parthasarathy unitary
evolution operator.
[18] Quantum entropy of a state and quantum relative entropy between two
states with application to the proof of the classical-quantum Shannon coding
theorem by encoding classical alphabets into density matrices, ie, mixed states.
[19] Generation of entangled states from mixed states for fast communication
using the Schrodinger unitary dynamics.
[2 a] Wave operators given two unbounded Hermitian operators with appli-
cation to scattering theory. Application of scattering theory to the design of
quantum gates. Choosing the control scattering potential so that the S-matrix
at a given energy E is as close as possible to a given unitary matrix.
[2 b] Scattering theory for quantum field theory in which the two Hamiltoni-
ans are functionals of a set of quantum fields like the creation and annihilation
operators in momentum space for electrons, positrons and photons.
[3] An introduction to quantum stochastic calculus and quantum filtering
and control theory.
[a] Boson Fock space,
[b] Exponential vectors in Boson Fock space.
[c] The Weyl operator and its role in the construction of the fundamental
noise operator fields on a Hilbert space and on the space of Hermitian operators
in a Hilbert space.
[d] The Weyl operator and its role in the construction of the fundamental
quantum noise processes.
[e] Alternative construction of the fundamental quantum noise processes
using the algebra generated by an infinite sequence of independent harmonic
oscillators.
[f] The Hudson-Parthasarathy noisy Schrodinger equation.
[g] Non-demolition measurements in the sense of Belavkin.
[h] Derivation of the Belavkin quantum filter.
[i] The equations of motion of a spin operator in the presence of quantum
noisy processes and its estimation in real time based on non-demolition mea-
surements.

[4] Quantum field theory of interactions between electrons, positrons and


photons and its application to the design of large sized quantum gates using
Feynman diagrams.

[5] Design of quantum gates using quantum gravity. A weak gravitational


field is described by small (quantum) fluctuations of the metric tensor around a
background classical metric tensor. The linear wave equation satisfied by such
a metric perturbation is derived and its solution is expressed as linear superpo-
sitions of basis wave functions (depending upon the background metric) with
coefficients being graviton creation and annihilation operator fields. Bosonic
commutation relations between these operator fields are derived starting from

the canonical commutation relations between the position and momentum den-
sity fields associated with the Lagrangian density of the gravitational field ap-
proximated upto quadratic orders in the fluctuating metric. Cubic terms in this
Lagrangian density are also considered as small perturbations (self-interacting
terms) to the quadratic component of the Lagrangian density and its effect on
graviton propagator corrections is derived.

[6] Image processing using quantum Gaussian states


[1] Take an N × N classical image field (noisy) where N = mr. We partition
this image field matrix into r^2 square blocks of size m × m. We convert each
m × m block into a pure quantum state of size 2^{m^2} × 1. This is done as
follows. Let the original image field be ((X(n, k)))_{1≤n,k≤N}. Then the (k, l)-th
block is the m × m matrix ((X(m(k−1) + p, m(l−1) + q)))_{1≤p,q≤m}. Denote this
sub-image field by ((X_{kl}(p, q)))_{1≤p,q≤m}. Make a standard C → Q
transformation to represent this classical image field by a pure quantum state
of size 2^{m^2} × 1 and then approximate this pure state by a mixed Gaussian
state. Train a unitary operator U so that, acting on this Gaussian state in the
adjoint representation, it outputs a good approximation of a given Gaussian
state obtained by applying the same transformation to another partitioned image
field (noiseless). The training of the unitary operator is based on minimizing the
sum of the Frobenius norm squared errors over all the r^2 blocks.
Reference:
Rohit Singh, Harish Parthasarathy and Jyotsna Singh, Paper in preparation.
[7] Quantum stochastic integration and quantum stochastic differential equations, proofs of existence and uniqueness
References:
[1] K.R.Parthasarathy, ”An introduction to quantum stochastic calculus”,
Birkhauser, 1992.
[2] P.A.Meyer, ”Quantum probability for probabilists”, Springer lecture notes,
1992.
Let L1 , L2 , L3 be bounded operators and consider the qsde
dX(t) = (L_1 dA(t) + L_2 dA(t)^* + L_3 dt)X(t)

We wish to show that this equation has a strong solution, ie, for any X(0), there exists an adapted process X(t) such that for any f ∈ h and u ∈ L^2(R_+), we have

X(t)|f e(u) > = X(0)|f e(u) > + ∫_0^t (L_1 X(s)dA(s) + L_2 X(s)dA(s)^* + L_3 X(s)ds)|f e(u) >, t ≥ 0

where the integrals appearing on the rhs are standard quantum stochastic integrals. To this end, we construct a sequence of adapted processes X_n(t), n ≥ 0, such that X_0(t) = X(0) and

X_{n+1}(t) = X(0) + ∫_0^t (L_1 X_n(s)dA(s) + L_2 X_n(s)dA(s)^* + L_3 X_n(s)ds), n ≥ 0

Then, writing

D_n(t) = X_n(t) − X_{n−1}(t),

we get

D_{n+1}(t)|f e(u) > = ∫_0^t (L_1 D_n(s)dA(s) + L_2 D_n(s)dA(s)^* + L_3 D_n(s)ds)|f e(u) >

and hence

‖D_{n+1}(t)|f e(u) >‖^2 ≤ 3‖∫_0^t L_1 D_n(s)dA(s)|f e(u) >‖^2 + 3‖∫_0^t L_2 D_n(s)dA(s)^*|f e(u) >‖^2 + 3‖∫_0^t L_3 D_n(s)ds|f e(u) >‖^2

Now,

‖∫_0^t L_1 D_n(s)dA(s)|f e(u) >‖^2 = ‖∫_0^t u(s)L_1 D_n(s)|f e(u) > ds‖^2 ≤ (∫_0^t |u(s)| ‖L_1 D_n(s)|f e(u) >‖ ds)^2

≤ (∫_0^t |u(s)|^2 ds)·∫_0^t ‖L_1‖^2 ‖D_n(s)|f e(u) >‖^2 ds ≤ t‖u‖^2 ‖L_1‖^2 ∫_0^t Δ_n(s)ds

where

Δ_n(t) = ‖D_n(t)|f e(u) >‖^2

Again by the quantum Ito formula, on defining

F_n(t) = ∫_0^t L_2 D_n(s)dA(s)^*,

we get

‖F_n(t)f e(u)‖^2 = ‖∫_0^t L_2 D_n(s)dA(s)^*|f e(u) >‖^2

= < ∫_0^t L_2 D_n(s)dA(s)^* f e(u), ∫_0^t L_2 D_n(s)dA(s)^* f e(u) >

= 2Re(∫_0^t < L_2 D_n(s)f e(u), F_n(s)f e(u) > u(s)ds) + ∫_0^t ‖L_2 D_n(s)f e(u)‖^2 ds

≤ 2∫_0^t |u(s)| ‖L_2 D_n(s)f e(u)‖·‖F_n(s)f e(u)‖ ds + ‖L_2‖^2 ∫_0^t ‖D_n(s)f e(u)‖^2 ds

≤ ‖L_2‖^2 ∫_0^t ‖D_n(s)f e(u)‖^2 ds + K(u)∫_0^t ‖F_n(s)f e(u)‖^2 ds + ‖L_2‖^2 ∫_0^t ‖D_n(s)f e(u)‖^2 ds

= 2‖L_2‖^2 ∫_0^t Δ_n(s)ds + K(u)∫_0^t ‖F_n(s)f e(u)‖^2 ds

where

K(u) = sup_{s≥0} |u(s)|^2

(the middle step uses 2ab ≤ a^2 + b^2 with a = ‖L_2‖‖D_n(s)f e(u)‖ and b = |u(s)|‖F_n(s)f e(u)‖).
Combining all these inequalities, we get

Δ_{n+1}(t) ≤ 3t‖u‖^2 ‖L_1‖^2 ∫_0^t Δ_n(s)ds + 3‖F_n(t)f e(u)‖^2 + 3t‖L_3‖^2 ∫_0^t Δ_n(s)ds

where

‖F_n(t)f e(u)‖^2 ≤ 2‖L_2‖^2 ∫_0^t Δ_n(s)ds + K(u)∫_0^t ‖F_n(s)f e(u)‖^2 ds

Application of Gronwall's lemma, which we prove below, to this last inequality gives us

‖F_n(t)f e(u)‖^2 ≤ 2‖L_2‖^2 ∫_0^t Δ_n(s)ds + 2K(u)‖L_2‖^2 ∫∫_{0<s<v<t} exp(K(u)(t − v))Δ_n(s)ds dv

= 2‖L_2‖^2 ∫_0^t exp(K(u)(t − s))Δ_n(s)ds ≤ 2exp(K(u)t)‖L_2‖^2 ∫_0^t Δ_n(s)ds

In short, we have just derived an inequality of the form

Δ_{n+1}(t) ≤ K_0(u, T, L_1, L_2, L_3)∫_0^t Δ_n(s)ds, 0 ≤ t ≤ T, n = 0, 1, 2, ...

where K_0 is a finite constant dependent upon the parameters u, T, L_k, k = 1, 2, 3, and iteration gives

Δ_{n+1}(t) ≤ (K_0^{n+1}/n!)∫_0^t (t − s)^n Δ_0(s)ds

from which we derive, on defining

δ(n) = sup{Δ_n(t) : t ∈ [0, T]},

δ(n) ≤ C·(K_0 T)^n/n!

where C is a constant dependent on T and Δ_0. We thus get from the triangle and Cauchy-Schwarz inequalities,

sup_{t∈[0,T]} ‖(X_{n+p}(t) − X_n(t))|f e(u) >‖^2 ≤ p·Σ_{k=n+1}^{n+p} δ(k)

which converges to zero as n → ∞ for any fixed p.
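The factorial decay δ(n) ≤ C·T^n/n! that drives this convergence can be illustrated on the deterministic part of the qsde alone (the noise terms L_1 dA, L_2 dA^* dropped, a scalar standing in for the bounded operator L_3 — a toy sketch, not the quantum iteration itself):

```python
import numpy as np

# Picard iteration for dX(t) = L3 X(t) dt; the successive differences
# delta(n) = sup_t |X_{n+1}(t) - X_n(t)| decay like C*T^n/n!.
L3 = 0.7
X0 = 1.0
T, m = 1.0, 2000
t = np.linspace(0.0, T, m)
dt = t[1] - t[0]

def picard_step(X):
    # X_{n+1}(t) = X(0) + int_0^t L3*X_n(s) ds  (cumulative trapezoid rule)
    g = L3 * X
    integral = np.concatenate(([0.0], np.cumsum((g[1:] + g[:-1]) * dt / 2)))
    return X0 + integral

X = np.full(m, X0)          # X_0(t) = X(0)
deltas = []
for n in range(12):
    Xn = picard_step(X)
    deltas.append(np.max(np.abs(Xn - X)))
    X = Xn

err = np.max(np.abs(X - X0 * np.exp(L3 * t)))   # limit should be exp(L3 t)X0
print(deltas[-1], err)
```

The iterates converge (here to the exact exponential solution) and the δ(n) sequence shrinks super-geometrically, mirroring the estimate above.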


Remark:

Now we prove Gronwall's lemma. Let a(t), b(t) ≥ 0, K > 0 and

b(t) ≤ a(t) + K∫_0^t b(s)ds

Then, writing

c(t) = ∫_0^t b(s)ds

we have

c'(t) ≤ a(t) + K·c(t)

and hence

c(t) ≤ ∫_0^t exp(K(t − s))a(s)ds ≤ exp(tK)·∫_0^t a(s)ds

Another version of this, which follows by iteration, is

b(t) ≤ a(t) + K∫_0^t a(s)ds + K^2 ∫_0^t (t − s)b(s)ds

≤ a(t) + K∫_0^t a(s)ds + K^2 ∫_0^t (t − s)a(s)ds + (K^3/2)∫_0^t (t − s)^2 b(s)ds

≤ ... ≤ a(t) + Σ_{n=1}^{N} (K^n/(n − 1)!)∫_0^t (t − s)^{n−1} a(s)ds + (K^{N+1}/N!)∫_0^t (t − s)^N b(s)ds

which gives, on letting N → ∞,

b(t) ≤ a(t) + K·∫_0^t exp(K(t − s))a(s)ds
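A quick numerical illustration of the lemma (a discretized b is built so that the hypothesis holds with equality, so b should essentially coincide with the bound; grid sizes are ad hoc choices):

```python
import numpy as np

# Construct b satisfying b(t) = a(t) + K*int_0^t b(s) ds and compare with
# the Gronwall conclusion b(t) <= a(t) + K*int_0^t exp(K(t-s)) a(s) ds.
K = 2.0
T, m = 1.0, 4000
t = np.linspace(0.0, T, m)
dt = t[1] - t[0]
a = 1.0 + t**2                      # an arbitrary nonnegative a(t)

b = np.empty(m)
acc = 0.0                           # running value of int_0^t b(s) ds
for i in range(m):
    b[i] = a[i] + K * acc
    acc += b[i] * dt

bound = np.empty(m)
for i in range(m):
    bound[i] = a[i] + K * np.sum(np.exp(K * (t[i] - t[:i+1])) * a[:i+1]) * dt

print(np.max(b - bound))
```

Up to discretization error, b never exceeds the Gronwall bound, and in this equality case the two curves agree.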

[8] Quantum stochastic calculus, some aspects


[1] Existence and uniqueness of solutions to a class of qsde’s
[2] CAR from CCR
[3] Representations of creation and annihilation operator fields and processes
using Guichardet kernels.

[9] On how the Belavkin quantum filter can be applied to quantum


image processing problems
Consider a classical image field X(m, n), 1 ≤ m, n ≤ N and suppose we
convert it into a quantum image field by representing the intensity of the (m, n)th
pixel with a qubit of the form

|ψ(m, n) >= a(m, n)|1 > +b(m, n)|0 >

where

|a(m, n)| = √(X(m, n)), |b(m, n)| = √(1 − |a(m, n)|^2)

and X(m, n) has been shifted and normalized so that it falls in the range [0, 1]. Specifically, if

α = max_{m,n} X(m, n), β = min_{m,n} X(m, n)

then we replace X(m, n) by (X(m, n) − β)/(α − β). We then approximate the resulting pure quantum state

|ψ > = ⊗_{m,n=1}^{N} |ψ(m, n) >

by a state ρ belonging to some family F, like say the family of all Gaussian states:

ρ = argmax_{σ∈F} < ψ|σ|ψ >
This is equivalent to minimizing

 σ − |ψ >< ψ| 2

over σ ∈ F where the norm used is the Frobenius norm. To process this quantum
image state, we first take the tensor product of this with the bath coherent state
|φ(u) >< φ(u)| where

|φ(u) >= exp(−  u 2 /2)|e(u) >



and then allow the resulting state to evolve under the HP noisy Schrodinger
dynamics so that after time t, the state of the system⊗bath becomes
ρ(t) = U (t)(ρ ⊗ |φ(u) >< φ(u)|)U (t)∗
Normally, in standard quantum image processing theory, we would take the
partial trace of this evolved state after time T, construct its pure state approximation and then transform the resulting state into a classical image field by the reverse of the process described above for converting a classical image field into a pure quantum state. In order to do this, however, we require to measure the quantum state ρ(T) after time T. An accurate measurement of this state is however difficult, and hence we adopt Belavkin's quantum filtering method, in
which we take non-demolition measurements of the form

Yo (t) = U (t)Yi (t)U (t)∗ , t ≥ 0

where
Yi (t) = c1 A(t) + c̄1 A(t)∗ + c2 Λ(t)
and then based on ηo (t) = {Yo (s) : s ≤ t}, we estimate the state using Belavkin’s
filter as ρB (t) which is defined by

T r(ρB (t)X) = T r(ρπt (X))

where
πt (X) = E[jt (X)|ηo (t)], jt (X) = U (t)∗ XU (t)
is the Belavkin filtered observable at time t. Here, X is a system observable and
ρ is the system state at time t = 0. ρB (t) should be interpreted as a random
system space density matrix. We can write in terms of dual operators,
ρB (t) = πt∗ (ρ)
The Belavkin filter for observables is

dπ_t(X) = π_t(LX)dt + (π_t(M_t X + XM_t^*) − π_t(X)π_t(M_t + M_t^*))(dY_o(t) − π_t(M_t + M_t^*)dt)

and by duality, for states, it is

dπ_t^*(ρ) = L^*π_t^*(ρ)dt + [π_t^*(ρ)M_t + M_t^*π_t^*(ρ) − Tr(π_t^*(ρ)(M_t + M_t^*))π_t^*(ρ)]·[dY_o(t) − Tr(π_t^*(ρ)(M_t + M_t^*))dt]
After thus estimating the system state at time t, we can construct its optimum
pure state approximation and then do a quantum → classical conversion. We
note that if ρ is the initial system state and expectations are taken w.r.t the
probability distribution of the Abelian family Yo (t), t ≥ 0 in the state |f ⊗
φ(u) >, then with T rs denoting trace over the system Hilbert space, we have

T rs (πt∗ (ρ)X) = T rs (ρπt (X)) = T rs (ρE(jt (X)|ηo (t)))



= T rs (E(ρ.jt (X)|ηo (t))) = T rs (E(ρ.U (t)∗ XU (t)|ηo (t)))


= E(T rs (ρ.U (t)∗ XU (t))|ηo (t))
Now let C(t) be any ηo (t)-measurable observable. Then we get from this result,

E(T rs (πt∗ (ρ)X)C(t)) =

E(T rs (ρ.U (t)∗ XU (t))C(t))


=< f φ(u)|T rs (ρ.U (t)∗ XU (t))C(t)|f φ(u) >
= T r(T rs (ρ.U (t)∗ XU (t))C(t)|f φ(u) >< f φ(u)|)
It is thus not clear whether we can write πt∗ (ρ) as E(U (t)ρ.U (t)∗ |ηo (t)). It
should be noted that although U (t)XU (t)∗ commutes with ηo (t), it cannot be
regarded as a system space operator valued function of ηo (t) since there are in
general other observables than system space operator valued functions of ηo (t)
that commute with ηo (t). It should also be noted that U (t)ρU (t)∗ does not
commute with ηo (t) and hence we cannot define E(U (t)ρU (t)∗ |ηo (t))
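The classical → quantum pixel encoding described at the beginning of this section can be sketched as follows (synthetic image values; only the amplitude-encoding step is shown, not the HP evolution or the Belavkin filter):

```python
import numpy as np

# Each normalized intensity X(m,n) in [0,1] becomes a qubit a|1> + b|0>
# with |a| = sqrt(X), |b| = sqrt(1-|a|^2); the shift/scale uses
# alpha = max X, beta = min X as in the text.
rng = np.random.default_rng(0)
X = rng.uniform(10.0, 200.0, size=(4, 4))     # a toy 4x4 classical image field
alpha, beta = X.max(), X.min()
Xn = (X - beta) / (alpha - beta)              # shifted/normalized to [0,1]

a = np.sqrt(Xn)                               # |a(m,n)| = sqrt(X(m,n))
b = np.sqrt(1.0 - a**2)                       # |b(m,n)| = sqrt(1-|a|^2)
qubits = np.stack([b, a], axis=-1)            # pixel -> [<0|psi>, <1|psi>]

norms = np.sum(qubits**2, axis=-1)            # each pixel state is normalized
recovered = qubits[..., 1]**2                 # P(|1>) = |a|^2 recovers Xn
print(np.max(np.abs(norms - 1.0)), np.max(np.abs(recovered - Xn)))
```

Measuring each qubit in the computational basis recovers the normalized intensity as the probability of |1>, which is the sense in which the reverse Q → C conversion works.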

[10] Some remarks and problems on quantum Gaussian states


This problem is about how to compute the matrix elements of a quantum
Gaussian state defined as the exponential of a quadratic form in the creation
and annihilation operators of a finite set of independent quantum harmonic os-
cillators. The matrix elements are taken w.r.t the coherent states or alternately
between the occupation number states of the harmonic oscillators. The main
idea is to use the Weyl unitary operator associated with a unitary matrix or
more generally with a symplectic matrix acting on the phase space to transform
the quadratic form in the creation and annihilation operators or equivalently
in the position and momentum operators to a diagonal form involving only the
number operators of the different oscillators which mutually commute. The next
problem is about how to realize quantum Gaussian channels ie linear channels
which take as input any Gaussian state and output another Gaussian state. We
may specify such channels by their dual action on the Weyl operators corre-
sponding to translations in Cn and realize this action via quantum Schrodinger
evolution with Lindblad terms ie, in terms of the master equation or the GKSL
equation whose Hamiltonian is that of the set of harmonic oscillators and whose
Lindblad operator terms are linear combinations of the creation and annihilation
operators of the different oscillators. The quantum Fourier transform of a state is defined as the trace of the product of the state and the Weyl translation operator with translation variable z ∈ C^n. It is shown that when the Gaussian state is the exponential of a quadratic form in the creation and annihilation operators, or equivalently in the position and momentum operators, then the quantum Fourier transform of such a state is the exponential of a linear-quadratic form in z, z̄. This fact characterizes a Gaussian state. Methods for inverting the quantum Fourier transform to recover the Gaussian state are given, all based on the
Glauber-Sudarshan formula for a non-orthogonal resolution of the identity op-
erator in terms of coherent states. When the Gaussian state is the exponential

of an arbitrary real symmetric quadratic form in the position and momentum


operators, then computing its matrix elements w.r.t the coherent states is hard
but can be achieved using Williamson’s theorem on diagonalizing a quadratic
form using symplectic matrices. Each symplectic matrix acts as a linear combi-
nation of the position and momentum operators yielding a new set of position
and momentum operators which also satisfy the canonical commutation rela-
tions and hence by the Stone-Von-Neumann theorem, the new set of position
and momentum operators can be expressed as a unitary operator in L2 (Rn )
acting on the previous set of position and momentum operators in the adjoint
representation, or equivalently by saying that the new set of position and mo-
mentum operators are unitarily isomorphic to the previous set. This unitary
operator Γ(L) is uniquely determined by the symplectic matrix L that linearly
transforms the previous set of position and momentum operators to the new
set. The operators Γ(L) have an easy to describe action on the coherent vec-
tors and by selecting the symplectic L in accordance with Williamson’s theorem
so as to diagonalize the quadratic form, we are able to represent the Gaussian
state using commuting number operators and hence obtain a neat formula for
its matrix elements w.r.t the coherent states/exponential vectors.

[1] Let a_1, ..., a_p be operators in a Hilbert space such that [a_k, a_m^*] = δ_{km} and [a_k, a_m] = 0. Consider the Gaussian state

ρ = C(Q)·exp(−Σ_{k,m=1}^{p} Q_{km} a_k^* a_m)

where

C(Q) = det(1 − exp(−Q))

Here, Q is a positive definite p × p matrix. Prove that Tr(ρ) = 1 and that ρ ≥ 0. Calculate the quantum Fourier transform of ρ, ie

ρ̂(z) = Tr(ρ·W(z))

where

W(z) = exp(Σ_{k=1}^{p} (z̄_k a_k − z_k a_k^*)) = exp(a(z) − a(z)^*)
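For p = 1 the normalization and the Gaussian form of ρ̂(z) can be checked numerically on a truncated Fock space (the cutoff and the closed-form thermal characteristic function exp(−|z|^2(n̄ + 1/2)) are standard facts assumed here, not derived in the text):

```python
import numpy as np
from scipy.linalg import expm

# Single-mode check: rho = C(Q) exp(-Q a* a), C(Q) = 1 - e^{-Q}, has unit
# trace, and rho_hat(z) = Tr(rho W(z)) with W(z) = exp(zbar a - z a*) is
# Gaussian in z; for this thermal state, rho_hat(z) = exp(-|z|^2 (nbar+1/2))
# with nbar = 1/(e^Q - 1).
N = 120                                    # Fock cutoff (assumption)
a = np.diag(np.sqrt(np.arange(1, N)), 1)   # annihilation operator
num = a.conj().T @ a                       # number operator a* a
Q = 0.8
rho = (1.0 - np.exp(-Q)) * expm(-Q * num)

z = 0.4 + 0.3j
W = expm(np.conj(z) * a - z * a.conj().T)
rho_hat = np.trace(rho @ W)

nbar = 1.0 / (np.exp(Q) - 1.0)
expected = np.exp(-abs(z)**2 * (nbar + 0.5))
print(np.trace(rho).real, abs(rho_hat - expected))
```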

Remark: How to invert the quantum Fourier transform so that ρ can be


expressed in terms of ρ̂(z), z ∈ Cp .
hint: Use the identity

I = π^{−p} ∫ |φ(z) >< φ(z)| d^p z d^p z̄

and hence

W(z) = C(p) ∫ |φ(u) >< φ(u)|W(z)|φ(v) >< φ(v)| du dv

so that

ρ̂(z) = C(p) ∫ < φ(v)|ρ|φ(u) >< φ(u)|W(z)|φ(v) > du dv

with

< φ(u)|W(z)|φ(v) > = exp(−|z|^2/2 − < z, v >)·exp(< u|v + z >)·exp(−|u|^2/2 − |v|^2/2)

This formula enables us to determine < φ(v)|ρ|φ(u) > and, using

|φ(u) > = exp(−|u|^2/2)Σ_{n≥0} (u^n/√(n!))|n >

we obtain a sequence of linear equations for the matrix elements < n|ρ|m > of ρ between the number states of the independent harmonic oscillators. Alternately, once we know < φ(u)|ρ|φ(v) >, by inverting a classical Fourier transform we can determine ρ by the formula

ρ = C(p) ∫ |φ(u) >< φ(u)|ρ|φ(v) >< φ(v)| du dv
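The overcompleteness identity in the hint can be verified by direct quadrature for a single mode (grid radius and size are ad hoc choices):

```python
import numpy as np

# Check I = pi^{-1} \int |phi(z)><phi(z)| d^2 z on Fock levels 0..N-1,
# with phi(z) = e^{-|z|^2/2} sum_n z^n/sqrt(n!) |n>.
N = 12
R, m = 6.0, 120
xs = np.linspace(-R, R, m)
dA = (xs[1] - xs[0])**2

ns = np.arange(N)
log_fact = np.cumsum(np.log(np.maximum(ns, 1)))   # log(n!)
acc = np.zeros((N, N), dtype=complex)
for x in xs:
    for y in xs:
        z = x + 1j * y
        phi = np.exp(-abs(z)**2 / 2) * z**ns / np.exp(log_fact / 2)
        acc += np.outer(phi, phi.conj()) * dA
acc /= np.pi

err = np.max(np.abs(acc - np.eye(N)))
print(err)
```

The Riemann sum reproduces the identity matrix on the retained Fock levels, which is all that the inversion formulas for ρ̂(z) require.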


[2] Evaluate the matrix elements of exp(−Σ_{k,m} Q_{km} a_k^* a_m) between the occupation number states |n >.
hint: Write the spectral decomposition of Q as

Q = Σ_k c(k) e_k e_k^*, c(k) ≥ 0, e_k^* e_m = δ_{km}

Then

Σ_{k,m} Q_{km} a_k^* a_m = Σ_k c(k)(a^* e_k)(e_k^* a) = Σ_k c(k)a(e_k)^* a(e_k)

where

a(e_k) = Σ_m ē_k(m)a_m

and clearly

[a(e_k), a(e_m)^*] = δ_{km}, [a(e_k), a(e_m)] = 0

Now let |f(n) > be the normalized joint eigenstate of the a(e_k)^* a(e_k) with eigenvalues n_k:

a(e_k)^* a(e_k)|f(n) > = n_k |f(n) >, k = 1, 2, ..., p, < f(m)|f(n) > = δ[n − m]

We write

|f(n) > = Σ_m K(n, m)|m >

and then get

n_k Σ_m K(n, m)|m > = Σ_{r,s,m} e_k(r)ē_k(s)K(n, m)a_r^* a_s |m >

and therefore,

n_k K(n, q) = Σ_{r,s,m} e_k(r)ē_k(s)K(n, m) < q|a_r^* a_s|m >

= Σ_{r,s,m} e_k(r)ē_k(s)K(n, m)√(q_r)√(m_s)·δ[q − m + u_s − u_r]

= Σ_{r,s} e_k(r)ē_k(s)√(q_r (q + u_s − u_r)_s)·K(n, q + u_s − u_r)

where ur is the p × 1 vector with a one in its rth position and zeros at all the
other positions. This set of linear equations has to be solved for the Kernel
K(n, m), n, m ∈ Zp+ .

[3] Evaluate the matrix element of exp(a(z)^* a(z)) between two coherent states, or equivalently between two occupation number states, where

a(z) = Σ_{k=1}^{p} z̄_k a_k, [a_k, a_j^*] = δ_{kj}, [a_k, a_j] = 0

Equivalently, if n is any positive integer, evaluate

< φ(u)|(a(z)^* a(z))^n |φ(v) >

W(u)a(z)|e(v) > = < z, v > W(u)|e(v) > = < z, v > exp(−|u|^2/2 − < u, v >)|e(v + u) >

a(z)W(u)|e(v) > = a(z)·exp(−|u|^2/2 − < u, v >)|e(v + u) > = exp(−|u|^2/2 − < u, v >) < z, v + u > |e(v + u) >
Thus,
[W (u), a(z)]|e(v) >= − < z, u > W (u)|e(v) >
and hence
W (u)a(z)W (u)∗ = a(z)− < z, u >
Then,

W (u)f (a(z), a(v)∗ )W (u)∗ = f (a(z)− < z, u >, a(v)∗ − < u, v >)

In particular,

W (u).exp(a(z)∗ a(z))W (u)∗ = exp((a(z)∗ − < u, z >)(a(z)− < z, u >))
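The displacement relation W(u)a(z)W(u)^* = a(z) − < z, u > derived above can be checked on a truncated Fock space; note that the convention W(u)|e(v) > ∝ |e(v + u) > used in the derivation corresponds, for a single mode, to W(u) = exp(u a^* − ū a):

```python
import numpy as np
from scipy.linalg import expm

# Truncated check of W(u) a W(u)* = a - u (single mode, z = 1).
# Truncation corrupts the highest Fock levels, so only a top-left
# block of the matrices is compared.
N = 60
u = 0.3 - 0.2j
a = np.diag(np.sqrt(np.arange(1, N)), 1)
W = expm(u * a.conj().T - np.conj(u) * a)

lhs = W @ a @ W.conj().T
rhs = a - u * np.eye(N)
k = 20
err = np.max(np.abs(lhs[:k, :k] - rhs[:k, :k]))
print(err)
```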



[4] Consider a linear transformation T on the subspace spanned by the Weyl


operators W (z), z ∈ Cp given by

T (W (z)) = exp((−1/2)R(z)T KR(z))W (Az)

where A is any p × p complex matrix, R(z) = [xT , y T ]T , and K is a real positive


semidefinite 2p × 2p matrix. Suppose ρ is a Gaussian state. Then

T r(T ∗ (ρ)W (z)) = T r(ρ.T (W (z))


= T r(ρ.W (Az)).exp((−1/2)R(z)T KR(z))

= ρ̂(Az).exp((−1/2)R(z)T KR(z))
Now since ρ is a Gaussian state, we have
ρ̂(z) = exp(−(1/2)R(z)T SR(z) + mT R(z))
where S is a complex Hermitian matrix of size 2p× 2p and m is a complex 2p × 1
vector. Hence T ∗ transforms Gaussian states into Gaussian states. The problem
is to realize the transformation T using the GKSL equation. This problem was
first solved by K.R.Parthasarathy who considered a qsde that yields a family
of unitary operators which are dilations of T and hence by tracing out over
the bath, the unitary evolution becomes a GKSL equation that transforms a
Gaussian state after any time t into another Gaussian state.
[5] Let z ∈ Cp . Choose a unitary operator U on Cp so that U z = |z|e1 . For
example, writing
z = z1 e1 + ... + zp ep
where {e1 , ..., ep } is the standard onb for Cp , we define

U = |e1 >< z|/  z  +(I − |e1 >< e1 |)(I − |z >< z|/  z 2 )

Then, with Γ(U ) = W (0, U ) denoting the second quantization of U , we have

Γ(U )a(z)Γ(U )−1 = a(U z)

since
Γ(U )a(z)|e(v) >=< z, v > |e(U v) >
on the one hand while on the other,

a(U z)Γ(U )|e(v) >= a(U z)|e(U v) >=< U z, U v > |e(U v) >=< z, v > |e(U v) >

so that,
Γ(U )a(z) = a(U z)Γ(U )
or equivalently,

a(U z) = Γ(U )a(z)Γ(U )−1 = Γ(U )a(z)Γ(U )∗



Note that the proof works only when U is unitary. Now


< e(u)|exp(−βa(z)∗ a(z))|e(v) >=

< e(u)|Γ(U )∗ Γ(U )exp(−βa(z)∗ a(z))Γ(U )|e(v) >=


< e(U u)|exp(−βa(e1 )∗ a(e1 ))|e(U v) >

= < e(U u)|n1 ...np >< n1 ...np |e(U v) > exp(−βn1 )
n≥0

More generally, let |f_1 >, ..., |f_p > be any onb for C^p. Choose a unitary operator U in C^p so that U|f_k > = |e_k >, k = 1, 2, ..., p. We have

U = Σ_{k=1}^{p} |e_k >< f_k |
Then,

< e(u)|exp(−Σ_{k=1}^{p} β_k a(f_k)^* a(f_k))|e(v) > = < e(Uu)|exp(−Σ_k β_k a(e_k)^* a(e_k))|e(Uv) >

= Σ_{n_1,...,n_p≥0} exp(−Σ_{k=1}^{p} β_k n_k) < e(Uu)|n_1...n_p >< n_1...n_p |e(Uv) >

where

< n_1 ... n_p |e(z) > = z_1^{n_1} ... z_p^{n_p}/√(n_1! ... n_p!)

Thus,

< e(u)|exp(−Σ_{k=1}^{p} β_k a(f_k)^* a(f_k))|e(v) > = Σ_{n_1,...,n_p} exp(−Σ_{k=1}^{p} β_k n_k)·conj((Uu)_1)^{n_1} ... conj((Uu)_p)^{n_p} (Uv)_1^{n_1} ... (Uv)_p^{n_p}/(n_1! ... n_p!)

= Π_{k=1}^{p} exp(exp(−β_k)·conj((Uu)_k)(Uv)_k) = exp(< Uu|exp(−D)|Uv >)

= exp(< u|U^* exp(−D)U|v >)
where

D = diag[β_1, ..., β_p] = Σ_k β_k |e_k >< e_k |

Now,

U^*·exp(−D)·U = Σ_k exp(−β_k)U^*|e_k >< e_k |U = Σ_k exp(−β_k)|f_k >< f_k |

so, defining

Q = Σ_k β_k |f_k >< f_k |

we have

U^*·exp(−D)·U = exp(−Q)

and thus,

< e(u)|exp(−Σ_{k=1}^{p} β_k a(f_k)^* a(f_k))|e(v) > = exp(< u|exp(−Q)|v >)

It then follows that, on defining the quantum Gaussian state

ρ = C·exp(−Σ_{k=1}^{p} β_k a(f_k)^* a(f_k))

where

C = det(1 − exp(−Q))
we have

ρ̂(z) = Tr(ρ·W(z)) = π^{−2p} ∫ exp(−|u|^2/2 − |v|^2/2) < e(v)|ρ|e(u) >< e(u)|W(z)|e(v) > du dv

= Cπ^{−2p} ∫ exp(−|u|^2/2 − |v|^2/2)·exp(< v|exp(−Q)|u >) < e(u)|W(z)|e(v) > du dv

= Cπ^{−2p} ∫ exp(−|u|^2/2 − |v|^2/2)·exp(< v|exp(−Q)|u >)·exp(−|z|^2/2 − < z, v > + < u, z + v >) du dv

which is obviously the exponential of a quadratic form in (z, z̄).
[6] The symmetry group of Gaussian states. Let L be a symplectic matrix
acting on R2p . Thus, L is a 2p × 2p real matrix satisfying

LT JL = J

where

J = [[0, I_p], [−I_p, 0]]
Clearly detL = ±1 and let ρ be a Gaussian state. We wish to show that
Γ(L)ρΓ(L)∗ is also a Gaussian state where Γ(L) is the unique unitary operator
acting in L2 (Rp ) = Γs (Cp ) defined by the equation

Γ(L)W (z)Γ(L)∗ = W (L̃z)


where L̃ is defined by

[R(L̃z)^T, I(L̃z)^T]^T = L·[R(z)^T, I(z)^T]^T

Here R(z) = Real(z) = x, I(z) = Im(z) = y, z = x + iy. Note that in block matrix form,

R(L̃z) = L_{11}x + L_{12}y, I(L̃z) = L_{21}x + L_{22}y

and the symplectic condition

[[L_{11}^T, L_{21}^T], [L_{12}^T, L_{22}^T]]·[[0, I], [−I, 0]]·[[L_{11}, L_{12}], [L_{21}, L_{22}]] = [[0, I], [−I, 0]]

is equivalent to

L_{11}^T L_{22} − L_{21}^T L_{12} = I, L_{11}^T L_{21} − L_{21}^T L_{11} = 0, L_{12}^T L_{22} − L_{22}^T L_{12} = 0
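These block identities (L_{11}^T L_{22} − L_{21}^T L_{12} = I, with L_{11}^T L_{21} and L_{12}^T L_{22} symmetric) can be confirmed numerically; the construction L = exp(JS) with S symmetric is a standard way to produce symplectic matrices and is an assumption of this sketch:

```python
import numpy as np
from scipy.linalg import expm

# Generate a symplectic L = expm(J S), S symmetric (this works because
# (JS)^T J + J (JS) = 0), then verify L^T J L = J and the block identities.
p = 3
rng = np.random.default_rng(1)
J = np.block([[np.zeros((p, p)), np.eye(p)], [-np.eye(p), np.zeros((p, p))]])
S = rng.standard_normal((2*p, 2*p))
S = (S + S.T) / 2
L = expm(J @ S)

L11, L12, L21, L22 = L[:p, :p], L[:p, p:], L[p:, :p], L[p:, p:]
e0 = np.max(np.abs(L.T @ J @ L - J))
e1 = np.max(np.abs(L11.T @ L22 - L21.T @ L12 - np.eye(p)))
e2 = np.max(np.abs(L11.T @ L21 - L21.T @ L11))
e3 = np.max(np.abs(L12.T @ L22 - L22.T @ L12))
print(e0, e1, e2, e3)
```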


Note that the definition of Γ(L) and its uniqueness is in complete accordance with the Stone-von Neumann theorem, which states that canonical position and momentum operators in a Hilbert space are unitarily equivalent. More specifically, if (q, p) form a set of canonical position and momentum operators in a Hilbert space, ie they are self-adjoint and [q_i, p_j] = iδ_{ij}, [q_i, q_j] = 0, [p_i, p_j] = 0, and if L is any symplectic matrix, then q' = L_{11}^T q + L_{21}^T p, p' = L_{12}^T q + L_{22}^T p define another set of canonical position and momentum operators, ie

[q'_i, q'_j] = 0, [q'_i, p'_j] = iδ_{ij}, [p'_i, p'_j] = 0

in view of the L being symplectic and hence by the Stone-Von-Neumann the-


orem, there exists a unique unitary operator Γ(L) in the Hilbert space L2 (Rp )
such that

q'_k = Γ(L)q_k Γ(L)^*, p'_k = Γ(L)p_k Γ(L)^*, 1 ≤ k ≤ p
In other words, all canonical position and momentum observables in L2 (Rp )
are uniquely determined upto a unitary isomorphism, this is the Stone-Von-
Neumann theorem and the existence of a unique Γ(L) for a symplectic L is an
application of this theorem. Now the Weyl operator can be expressed as

W (z) = exp(z̄.(q + ip)/sqrt2 − z.(q − ip)/ 2)

= exp(−i 2(x.q − y.p))
and hence, this application of the Stone-Von-Neumann theorem is equivalent to
saying that there exists a unique unitary Γ(L) so that

Γ(L)W (z)Γ(L)∗ = W (L̃z)

Note that the canonical commutation relations satisfied by q, p are equivalent


to the Weyl commutation relations:

W(z)W(z') = exp(i·Im(< z, z' >))W(z + z') = exp(−i·Im(< z', z >))W(z' + z)

= exp(2i·Im(< z, z' >))W(z')W(z)

for all z, z' ∈ C^p. Now let ρ be a Gaussian state. Thus,

ρ̂(z) = T r(ρ.W (z)) = exp(mT1 x + mT2 y − [xT , y T ]S[xT , y T ]T )



where m1 , m2 are in Cp and S ∈ C2p×2p is symmetric. We wish to determine


the set of all (m1 , m2 , S) such that ρ is a state. The set of all such ρ is then
called the family of Gaussian states in L2 (Rp ) = Γs (Cp ). If ρ is one such state
with parameters (m1 , m2 , S) and if L is symplectic, then Γ(L)∗ ρ.Γ(L) is another
such state. Indeed, we have
T r(Γ(L)∗ ρ.Γ(L).W (z)) =
T r(ρ.Γ(L)W (z).Γ(L)∗ ) = T r(ρ.W (L̃z))
= exp(mT1 R(L̃z) + mT2 I(L̃z) − [R(L̃z)T , I(L̃z)T ]S.[R(z̃)T , I(z̃)T ]T )
= exp([mT1 , mT2 ]L[xT , y T ]T − [xT , y T ]LT SL[xT , y T ]T )
which is the Fourier transform of a state from the same family as described
above but with parameters (LT [mT1 , mT2 ]T , LT SL). To completely characterize
a Gaussian state, we require another restriction on the covariance matrix S
stemming from the uncertainty principle or equivalently from the Weyl com-
mutation relations. This constraint is absent in classical Gaussian probability
distributions since all observables, ie, random variables commute in the classical
case. Since ρ is a state, we require that the matrix
P = ((T r(ρ.W (zi )W (zj )∗ )))1≤i,j≤N

be positive definite for all z1 , ..., zN ∈ Cp and for all N = 1, 2, .... Now the Weyl
commutation relations give

W (zi )W (zj )∗ = W (zi )W (−zj ) = exp(−i.Im(< zi , zj >))W (zi − zj )

Thus

P_{ab} = exp(−i·Im(< z_a, z_b >))·Tr(ρ·W(z_a − z_b))

= exp([m_1^T, m_2^T][R(z_a − z_b)^T, I(z_a − z_b)^T]^T − i·Im(< z_a, z_b >) − [R(z_a − z_b)^T, I(z_a − z_b)^T]·S·[R(z_a − z_b)^T, I(z_a − z_b)^T]^T)

Now, we note that

Im(< z_a, z_b >) = x_a^T y_b − y_a^T x_b = [x_a^T, y_a^T]·J·[x_b^T, y_b^T]^T

and hence, from a well-known theorem in analysis, a necessary and sufficient condition for P to be positive definite for all N is that

2S + iJ ≥ 0
Remark: Let N be a 2p × 2p real matrix such that N T N is symplectic. Then
N T N J = J(N T N )−1
so if v is an eigenvector of N T N with eigenva[lue c, then

N T N Jv = J(N T N )−1 v = c−1 Jv



ie Jv is an eigenvector of N^T N with eigenvalue c^{−1}. Hence, we can arrange the eigenvalues of N^T N as c_1, ..., c_p, c_1^{−1}, ..., c_p^{−1} with corresponding orthonormal eigenvectors v_1, ..., v_p, v_{p+1}, ..., v_{2p}. Define
eigenvectors v1 , ..., vp , vp+1 , ..., v2p . Define

V = [v1 , ..., v2p ] = [V1 |V2 ], V1 = [v1 , ..., vp ], V2 = [vp+1 , ..., v2p ]

Also define
C = diag[c1 , ..., cp ]
so we get

N T N V1 = V1 C, N T N V2 = V2 C −1 , N T N V = V.diag[C, C −1 ]

Note that
V T V = V V T = I2p
ie V is a real orthogonal matrix. This is possible since N T N is real symmetric.
Define
W = [V2 |V1 ]
Then,
N T N W = W.diag[C −1 , C]
Now,

N^T N JV = J(N^T N)^{−1}V = JV·diag[C^{−1}, C]

or equivalently,

N^T N Jv_k = c_k^{−1} Jv_k, N^T N Jv_{k+p} = c_k Jv_{k+p}, 1 ≤ k ≤ p

Hence, if we assume that the c_k's are all distinct and further all differ from unity, or more specifically, that c_1, ..., c_p < 1 so that c_1^{−1}, ..., c_p^{−1} > 1, then

Jv_k = α_k v_{k+p}, Jv_{k+p} = −α_k^{−1} v_k, 1 ≤ k ≤ p

where α_1, ..., α_p are non-zero real numbers, and since J^T J = I and the v_k's are normalized, it follows that

α_k^2 = 1, k = 1, 2, ..., p

ie,

α_k = ±1, 1 ≤ k ≤ p
Thus, we have

JV = WA, A = diag[α_1, ..., α_p, −α_1^{−1}, ..., −α_p^{−1}] = diag[A_1, −A_1^{−1}]

where

A_1 = diag[α_1, ..., α_p]

and

V^T JV = V^T WA

Now,

W^T V = [V_2|V_1]^T [V_1|V_2] = [[0, I_p], [I_p, 0]] = K = V^T W,

say. Thus,

V^T JV = KA
Then, taking transposes,

A^T K = −V^T JV = −KA

or, since A is diagonal,

KA + AK = 0

or equivalently,

A_1 = A_1^{−1}

which merely tells us what we already know, ie, α_k = ±1. We can actually, without loss of generality, assume that α_k = 1, k = 1, 2, ..., p. In fact, this simply amounts to changing the sign of those eigenvectors v_{k+p}, k = 1, 2, ..., p for which α_k = −1. Equivalently, we may first define v_1, ..., v_p as real orthonormal eigenvectors of N^T N with eigenvalues c_1, ..., c_p respectively, and then define v_{k+p} = Jv_k, k = 1, 2, ..., p. Then by symplecticity of N^T N, it follows that N^T N v_{k+p} = N^T N Jv_k = J(N^T N)^{−1}v_k = c_k^{−1} Jv_k = c_k^{−1} v_{k+p}, k = 1, 2, ..., p. We then easily get A = diag[I, −I] and hence,

V^T JV = J

with the condition

N^T N = V·diag[C, C^{−1}]·V^T, V^T V = I = VV^T

being satisfied. It follows that, since D = diag[C, C^{−1}] is also symplectic, then defining M = D^{1/2}V^T, we get that M^T M = N^T N and M is symplectic.
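The reciprocal pairing of the eigenvalues of N^T N used in this remark is easy to confirm numerically (again using L = exp(JS) to manufacture a symplectic matrix):

```python
import numpy as np
from scipy.linalg import expm

# The eigenvalues of a symmetric positive definite symplectic matrix N^T N
# come in reciprocal pairs (c, 1/c): sorting them and sorting their
# reciprocals must give the same list.
p = 3
rng = np.random.default_rng(2)
J = np.block([[np.zeros((p, p)), np.eye(p)], [-np.eye(p), np.zeros((p, p))]])
S = rng.standard_normal((2*p, 2*p))
S = (S + S.T) / 4                 # mild scaling keeps N^T N well conditioned
N = expm(J @ S)                   # symplectic
ev = np.linalg.eigvalsh(N.T @ N)  # eigenvalues of the spd symplectic N^T N
err = np.max(np.abs(np.sort(ev) - np.sort(1.0 / ev)))
print(err)
```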

[7] Consider the following quantum state in L^2(R^n):

ρ = C·exp(−(1/2)[q^T, p^T]·Q·[q^T, p^T]^T)

where

q = (q_1, ..., q_n)^T, p = (p_1, ..., p_n)^T, [q_i, q_j] = 0, [p_i, p_j] = 0, [q_i, p_j] = iδ_{ij}

Assume that Q is a real positive definite matrix. Then by Williamson's theorem, there exists a real 2n × 2n symplectic matrix L such that LQL^T = diag[D, D], where D is an n × n diagonal matrix with positive diagonal entries. Then there exists a unique unitary Γ(L) in L^2(R^n) such that

Γ(L)qΓ(L)∗ = L11 q + L12 p, Γ(L)pΓ(L)∗ = L21 q + L22 p



since by symplecticity of L it follows that q  = L11 q + L12 p and p = L21 q + L22 p


satisfy the canonical ccr when q, p satisfy the same. Then, we have

< e(u)|ρ|e(v) > = C < e(u)|Γ(L)^*Γ(L)·exp(−(1/2)[q^T, p^T]Q[q^T, p^T]^T)·Γ(L)^*Γ(L)|e(v) >

= C < e(L̃u)|exp(−(1/2)[q^T, p^T]LQL^T[q^T, p^T]^T)|e(L̃v) >

= C < e(L̃u)|exp(−(1/2)[q^T, p^T]diag[D, D][q^T, p^T]^T)|e(L̃v) >

= C < e(L̃u)|exp(−(1/2)(q^T Dq + p^T Dp))|e(L̃v) >

= C < e(L̃u)|exp(−(1/2)Σ_{k=1}^{n} d_k(q_k^2 + p_k^2))|e(L̃v) >

Writing

a_k = (q_k + ip_k)/√2, a_k^* = (q_k − ip_k)/√2
we get that

[a_k, a_j] = 0, [a_k^*, a_j^*] = 0, [a_k, a_j^*] = δ_{kj}

and then, since

q_k^2 + p_k^2 = 2a_k^* a_k + 1

we get

< e(u)|ρ|e(v) > = C < e(L̃u)|exp(−(1/2)Σ_{k=1}^{n} d_k(2a_k^* a_k + 1))|e(L̃v) >

= C·exp(−(1/2)Tr(D)) < e(L̃u)|exp(−Σ_{k=1}^{n} d_k a_k^* a_k)|e(L̃v) >

= C·exp(−(1/2)Tr(D))·Σ_n < e(L̃u)|n >< n|e(L̃v) > exp(−d·n)

= C·exp(−(1/2)Tr(D))·Σ_n exp(−d·n)·conj((L̃u)^n)·(L̃v)^n/n!

= C·exp(−(1/2)Tr(D))·exp((L̃u)^*·exp(−D)·L̃v)

= C·exp(−(1/2)Tr(D))·exp(< u|L̃^*·exp(−D)·L̃|v >)
It is also clear that

C^{−1} = Tr(exp(−(1/2)[q^T, p^T]Q[q^T, p^T]^T))

= Tr(Γ(L)·exp(−(1/2)[q^T, p^T]Q[q^T, p^T]^T)·Γ(L)^*)

= Tr(exp(−(1/2)[q^T, p^T]LQL^T[q^T, p^T]^T))

= Tr(exp(−(1/2)Σ_k d_k(2a_k^* a_k + 1)))

= exp(−(1/2)Tr(D))·Σ_n exp(−d·n) = exp(−(1/2)Tr(D))/det(1 − exp(−D))

It follows that

< e(u)|ρ|e(v) > = det(1 − exp(−D))·exp(< u|L̃^*·exp(−D)·L̃|v >)

and hence ρ is a Gaussian state. Note that the Fourier transform of ρ is given by

ρ̂(z) = Tr(ρW(z)) = π^{−n} ∫ < e(u)|ρ|e(v) >< e(v)|W(z)|e(u) > exp(−|u|^2/2 − |v|^2/2) du dv

= π^{−n} det(1 − exp(−D)) ∫ exp(< u|L̃^*·exp(−D)·L̃|v >)·exp(−|z|^2/2 − < z, u > + < v, z + u > − |u|^2/2 − |v|^2/2) du dv

which is of the form of the exponential of a linear-quadratic function of z, z̄, and hence ρ is a Gaussian state.

[8] Realizing a Gaussian channel using the GKSL equation. By a Gaussian channel, we mean a linear transformation on the space of operators that is CPTP (completely positive trace preserving). The Weyl operator in L^2(R^n) is given by

W(z) = exp(z̄·a − z·a^*)

Define

L_k = Σ_{j=1}^{n} (c(k, j)a_j + d(k, j)a_j^*), L_k^* = Σ_{j=1}^{n} (c̄(k, j)a_j^* + d̄(k, j)a_j),

H = Σ_k ω_k a_k^* a_k

Here, c(k, j), d(k, j) are complex numbers. The GKSL equation in Heisenberg matrix mechanics is
matrix mechanics is

dX(t)/dt = i[H, X(t)] − (1/2)θ(X(t))

where

θ(X) = Σ_k (L_k^* L_k X + XL_k^* L_k − 2L_k^* XL_k) = Σ_k (L_k^* [L_k, X] + [X, L_k^*]L_k)

We write
θT (X) = i[H, X] − (1/2)θ(X)

Then
X(t) = Tt (X(0)), Tt = exp(tθT )
Note that
Tt+s = Tt oTs , t, s ≥ 0
T_t^* is a CPTP map. If ρ is the state at time 0, then ρ(t) = T_t^*(ρ) is the state at time t:

Tr(ρ(t)X) = Tr(T_t^*(ρ)X) = Tr(ρ·T_t(X))

So

ρ̂(t, z) = Tr(ρ(t)W(z)) = Tr(ρ·T_t(W(z)))
We shall derive a pde satisfied by ρ̂(t, z), the quantum Fourier transform of the
state at time t under the GKSL dynamics, ie dynamics of a quantum system
coupled to a bath. We have

[H, W (z)] = ωk [a∗k ak , W (z)] =
k

ωk ([a∗k , W (z)]ak + a∗k [ak , W (z)])
k

= ωk (z̄k W (z)ak + zk a∗k W (z))
k

= ωk (−|zk |2 + z̄k ak + zk a∗k )W (z)
k

Also, 
[Lk , W (z)] = c(k, j)[aj , W (z)] + d(k, j)[a∗j , W (z)]
j

= (c(k, j)zj + d(k, j)z̄j )W (z)
j

So,  
L∗k [Lk , W (z)] = (c(k, j)zj L∗k + d(k, j)z̄j L∗k )W (z)
k k,j

= [c(k, j)zj (c̄(k, m)a∗m +d(k,
¯ m)am )+d(k, j)z̄j (c̄(k, m)a∗ +d(k,
m
¯ m)am )]W (z)
k,j,m

Likewise, 
[W (z), L∗k ] = [W (z), (c̄(k, j)a∗j + d(k,
¯ j)aj )]
j

=− ¯ j)zj )W (z)
(c̄(k, j)z̄j + d(k,
j

so that 
[W (z), L∗k ]Lk =
k


− ¯ j)zj W (z)Lk )
(c̄(k, j)z̄j W (z)Lk + d(k,
k,j

=− ¯ j)zj ([W (z), Lk ] + Lk W (z))
c̄(k, j)z̄j ([W (z), Lk ] + Lk W (z)) + d(k,
k,j

=[ c̄(k, j)z̄j (c(k, m)zm + d(k, m)z̄m − c(k, m)am − d(k, m)a∗m )
k,j,m

+ ¯ j)zj (c(k, m)zm + d(k, m)z̄m − c(k, m)am − d(k, m)a∗ )]W (z)
(d(k, m
k,j,m

Combining all these equations gives us


θT (W (z)) = i[H, W (z)] − (1/2)θ(W (z)) =

i ωk (−|zk |2 + z̄k ak + zk a∗k )W (z)
k

−(1/2) [c(k, j)zj (c̄(k, m)a∗m +d(k,
¯ m)am )+d(k, j)z̄j (c̄(k, m)a∗
m
k,j,m

¯ m)am )]W (z)


+d(k,

−(1/2)[ c̄(k, j)z̄j (c(k, m)zm + d(k, m)z̄m − c(k, m)am − d(k, m)a∗m )
k,j,m

+ ¯ j)zj (c(k, m)zm + d(k, m)z̄m − c(k, m)am − d(k, m)a∗ )]W (z)
(d(k, m
k,j,m

=i ωk (−|zk |2 + z̄k ak + zk a∗k )W (z)
k

+q0 (z) + (ψ̄m (z)am − ψm (z)a∗m )
m

where

q_0(z) = (−1/2)Σ_{k,j,m} [c̄(k, j)c(k, m)z̄_j z_m + d̄(k, j)c(k, m)z_j z_m + c̄(k, j)d(k, m)z̄_j z̄_m + d̄(k, j)d(k, m)z_j z̄_m]

is quadratic in z, z̄ and

ψ_m(z) = (1/2)Σ_{k,j} [(c(k, j)c̄(k, m)z_j + d(k, j)c̄(k, m)z̄_j) − (d̄(k, j)d(k, m)z_j + c̄(k, j)d(k, m)z̄_j)]

is linear in z, z̄. It follows that

θ_T(W(z)) = (q(z) + Σ_m (φ_m(z)a_m + χ_m(z)a_m^*))W(z)

where 
q(z) = q0 (z) − i ωk |zk |2 ,
k

φm (z) = ψ̄m (z) + iωm z̄m ,


χm (z) = −ψm (z) + iωm zm = −φ̄m (z)
We can thus write

θT (W (z)) = [q(z) + (φ̄m (z)am − φm (z)a∗m )]W (z)
m

where q(z) is quadratic in z, z̄ and φm (z) is linear in z, z̄. Now,

∂ρ̂(t, z)/∂t = Tr(ρ(0).Tt(θT(W(z)))) = Tr(ρ(t)θT(W(z)))

= q(z)ρ̂(t, z) + ∑m Tr(ρ(t)(φ̄m(z)am − φm(z)a∗m)W(z))

Now

exp(δt.∑m (φ̄m(z)am − φm(z)a∗m))W(z) = exp(δt.(φ̄.a − φ.a∗)).exp(z̄.a − z.a∗)

Now consider an operator G(t) such that

exp(t(φ̄.a − φ.a∗)).G(t) = exp(t(φ̄.a − φ.a∗ + z̄.a − z.a∗)) = F(t)

say. Then, differentiating w.r.t. t,

exp(t(φ̄.a − φ.a∗))(G′(t) + (φ̄.a − φ.a∗)G(t)) = (φ̄.a − φ.a∗ + z̄.a − z.a∗).exp(t(φ̄.a − φ.a∗))G(t)

and hence,

G′(t) = exp(−t.ad(φ̄.a − φ.a∗))(z̄.a − z.a∗)G(t) = [(z̄.a − z.a∗) + t(φ̄.z − φ.z̄)]G(t)

so we deduce that

G(t) = exp((t²/2)(φ̄.z − φ.z̄)).exp(t(z̄.a − z.a∗))

and hence

exp((δt/2)(φ̄.z − φ.z̄)).exp(δt.(φ̄.a − φ.a∗)).exp(z̄.a − z.a∗) = exp(δt(φ̄.a − φ.a∗) + z̄.a − z.a∗)

This gives us

exp(δt.(φ̄(z).a − φ(z).a∗))W(z) = exp(−(δt/2)(φ̄(z).z − φ(z).z̄)).W(z + δt.φ(z))

Thus,

ρ̂(t + δt, z) = ρ̂(t, z) + δt.[q(z)ρ̂(t, z) + Tr(ρ(t)(φ̄.a − φ.a∗)W(z))]

= δt.q(z).ρ̂(t, z) + Tr(ρ(t).exp(δt.(φ̄.a − φ.a∗))W(z))

= δt.q(z).ρ̂(t, z) + exp(−(δt/2)(φ̄(z).z − φ(z).z̄)).Tr(ρ(t).W(z + δt.φ(z)))

= δt.q(z).ρ̂(t, z) + exp(−(δt/2)(φ̄(z).z − φ(z).z̄)).ρ̂(t, z + δt.φ(z))
with neglect of O(δt²) terms. Thus,

∂ρ̂(t, z)/∂t = (q(z) − (1/2)(φ̄(z).z − φ(z).z̄))ρ̂(t, z) + [(φ(z), ∂/∂z) + (φ̄(z), ∂/∂z̄)]ρ̂(t, z)

= q1(z)ρ̂(t, z) + [(φ(z), ∂/∂z) + (φ̄(z), ∂/∂z̄)]ρ̂(t, z)

where

q1(z) = q(z) − (1/2)(φ̄(z).z − φ(z).z̄)

is also a quadratic polynomial in z, z̄.
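The Weyl/BCH identity exp(A)exp(B) = exp([A, B]/2)exp(A + B), for generators A, B linear in a, a∗ (so that [A, B] is a scalar), underlies the computation of G(t) above. It can be checked numerically in a truncated Fock space; the truncation size N, the amplitudes, and the use of an eigendecomposition of the anti-Hermitian generators are choices made only for this sketch, not taken from the text:

```python
import numpy as np

N = 40
a = np.diag(np.sqrt(np.arange(1, N)), 1).astype(complex)  # truncated annihilation op
ad = a.conj().T

def gen(alpha):
    # anti-Hermitian generator conj(alpha).a - alpha.a* of the Weyl operator
    return np.conj(alpha) * a - alpha * ad

def expm_a(G):
    # matrix exponential of an anti-Hermitian G via eigh of the Hermitian iG
    w, v = np.linalg.eigh(1j * G)
    return (v * np.exp(-1j * w)) @ v.conj().T

al, be = 0.3 + 0.1j, -0.2 + 0.25j
# [gen(al), gen(be)] is the scalar al*conj(be) - conj(al)*be times the identity
c = al * np.conj(be) - np.conj(al) * be
comm = gen(al) @ gen(be) - gen(be) @ gen(al)
assert np.allclose(comm[:10, :10], c * np.eye(10), atol=1e-12)

lhs = expm_a(gen(al)) @ expm_a(gen(be))
rhs = np.exp(c / 2) * expm_a(gen(al) + gen(be))
# the identity holds on low Fock levels, where truncation effects are negligible
assert np.allclose(lhs[:10, :10], rhs[:10, :10], atol=1e-6)
```

The comparison is restricted to the low-lying Fock block because the canonical commutation relation fails at the truncation edge.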

Williamson's theorem is a result in linear algebra that proves to be very important and useful in the analysis of quantum Gaussian states.

[1] Williamson's theorem: Let A be a 2n × 2n real positive definite matrix. Then, there exists a real 2n × 2n symplectic matrix L such that

LT AL = diag[D, D]

where D is an n × n positive definite diagonal matrix.


Proof: Let

J = [[0, In], [−In, 0]]

Define
B = A1/2 JA1/2
Then B is a 2n × 2n real skew-symmetric matrix and hence there exists a real orthogonal matrix U of size 2n × 2n such that

B = U [[0, D], [−D, 0]] UT

where D is some real n × n positive definite diagonal matrix. To prove this, we first diagonalize B w.r.t. a complex orthonormal basis:

Bvk = iλk vk, k = 1, 2, ..., 2n, λk ∈ R − {0}, < vk, vj >= δkj


Taking conjugate gives

Bv̄k = −iλk v̄k , k = 1, 2, ..., 2n

Thus without loss of generality, we may assume that λ1, ..., λn > 0. Then, define

uk = Re(vk ), wk = Im(vk ), k = 1, 2, ..., n

Then, taking real and imaginary parts of the above equations,

B.uk = −λk .wk , B.wk = λk uk , k = 1, 2, ..., n

and moreover,
uk = (vk + v̄k )/2, wk = (vk − v̄k )/2i
implies in conjunction with the fact that since vk is an eigenvector of the Her-
mitian matrix −iB with eigenvalue λk while v̄k is an eigenvector of −iB with
eigenvalue −λk , we have vkT vj = 0, k, j = 1, 2, ..., n. Then

uTk uj = (1/4)(vkT vj + v̄kT v̄j + vkT v̄j + vk∗ vj )

= (1/4)(0 + 0 + δkj + δkj ) = (1/2)δkj , k, j = 1, 2, ..., n


and likewise,
wkT wj = (1/2)δkj
while,

uTk wj = (1/4i)(vkT vj + vk∗ vj − vkT v̄j − vk∗ v̄j ) = 0, k, j = 1, 2, ..., n

This proves the claim. Now, define

L = A−1/2 U [[0, D1/2], [−D1/2, 0]]

Then it is easily seen that

LJLT = A−1/2 BA−1/2 = J

and further,

LT AL = diag[D, D]

It is clear that LJLT = J implies J = L−1 JL−T which in turn implies on taking inverse that LT JL = J. Thus, L is a symplectic matrix satisfying the desired conditions.
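The construction in the proof translates directly into a numerical routine. The following sketch (test matrix, sizes, and tolerances are arbitrary choices, not from the text) builds L from A−1/2, the orthogonal matrix U obtained from the eigenvectors of −iB, and the block matrix containing D1/2, then verifies both conclusions:

```python
import numpy as np

def williamson(A):
    """Symplectic block-diagonalization of a real positive definite 2n x 2n
    matrix A, mirroring the proof above: returns (L, d, J) with
    L^T A L = diag(d, d) and L J L^T = J."""
    n = A.shape[0] // 2
    J = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.eye(n), np.zeros((n, n))]])
    ev, P = np.linalg.eigh(A)
    Ah = P @ np.diag(np.sqrt(ev)) @ P.T          # A^{1/2}
    Aih = P @ np.diag(ev ** -0.5) @ P.T          # A^{-1/2}
    B = Ah @ J @ Ah                              # real skew-symmetric
    lam, V = np.linalg.eigh(-1j * B)             # -iB is Hermitian
    idx = np.argsort(-lam)[:n]                   # the n positive eigenvalues
    d = lam[idx]
    # U = [sqrt(2) Re v_k | sqrt(2) Im v_k] is real orthogonal (see proof)
    U = np.hstack([np.sqrt(2) * V[:, idx].real, np.sqrt(2) * V[:, idx].imag])
    S = np.block([[np.zeros((n, n)), np.diag(np.sqrt(d))],
                  [-np.diag(np.sqrt(d)), np.zeros((n, n))]])
    return Aih @ U @ S, d, J

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)                      # random positive definite A
L, d, J = williamson(A)
assert np.allclose(L.T @ A @ L, np.diag(np.concatenate([d, d])), atol=1e-8)
assert np.allclose(L @ J @ L.T, J, atol=1e-8)    # L is symplectic
```

The phase ambiguity of the complex eigenvectors only rotates each (uk, wk) plane and does not affect either verified identity.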

[2] Let L be a symplectic matrix. Then,

Γ(L)∗ W(z)Γ(L)|e(u) >= Γ(L)∗ W(z)|e(L̃u) >

so

< e(v)|Γ(L)∗ W(z)Γ(L)|e(u) >=< e(L̃v)|W(z)|e(L̃u) >

= exp(−|z|²/2− < z, L̃u > + < L̃v|L̃u + z >)

On the other hand,

< e(v)|W(L̃z)|e(u) >= exp(−|L̃z|²/2− < L̃z|u > + < v|L̃z + u >)

In particular, if L̃ is unitary, then it follows by replacing L̃ with L̃∗ that

Γ(L)W(z)Γ(L)∗ = W(L̃z)

Question: What is the class of symplectic matrices L for which this identity is true?
[11] A set of prerequisites for understanding the mathematical foundations of quantum mechanics, quantum stochastics and quantum scattering theory.
[1] Unbounded operators in a Hilbert space.
[2] Born scattering
[3] Basics of quantum field theory.
[4] Statistics of Brownian motion, Poisson processes and other stochastic processes derived from these.
[5] Quantum stochastic integration and quantum stochastic calculus.
[7] Lippmann-Schwinger equations in quantum scattering theory.

H = H0 + V, U0(t) = exp(−itH0), U(t) = exp(−itH)

|Φa > are the free particle states and |Ψa > are the corresponding scattered states. The input scattered state |Ψ+a > at energy Ea satisfies

|Ψ+a >= |Φa > −(H0 − Ea + iε)−1 V|Ψ+a > − − −(1)

or equivalently, using the spectral theorem,

|Ψ+a >= |Φa > − ∫ |Φb >< Φb|V|Ψ+a > dEb/(Eb − Ea + iε)

or equivalently, in the time domain,

∫ g(a)exp(−iEa t)|Ψ+a > da

= ∫ g(a)exp(−iEa t)|Φa > da − ∫ g(a)exp(−iEa t)|Φb > Tba dadb/(Eb − Ea + iε)

where
Tba =< Φb|V|Ψ+a >

If the contour for the integral w.r.t. a = Ea on the rhs is taken over the infinite lower semicircle, then the pole at Ea = Eb + iε is not enclosed and the resulting contour integral is zero. However, in this case, as t → ∞, it is clear that the contribution to the contour integral from the semicircular arc goes to zero since Re(−iEa t) for Ea having a negative imaginary part goes to −∞ as t → ∞. Thus, the contour integral is the same as the integral over R and hence, we deduce that

∫ g(a)exp(−iEa t)|Ψ+a > da − ∫ g(a)exp(−iEa t)|Φa > da

converges to zero as t → ∞. This proves that |Ψ+ a > defined by the Lippman-
Schwinger equation (1) is an out scattered state, ie, its time dependent version
converges at t → ∞ to the corresponding time dependent version of the free
particle state |Φa >. A similar argument shows that if |Ψ− a > is defined by
|Ψ−
a >= |Φa > −(H0 − Ea − i )
−1
V |Ψ−
a > − − −(2)

then |Ψ−
a > is an in-scattered state, ie, its time dependent version
g(a)exp(−iEa t)|Ψ−
a >

da converges as t → −∞ to g(a)exp(−iEa t)|Φa > da. We therefore define the


wave operators Ω+ , Ω− by the formulas
Ω+ |Φa >= |Ψ+ −
a >, Ω− |Φa >= |Ψa >
The scattering matrix element Sba(E) where Ea = Eb = E is then defined by

Sba(E) =< Ψ+b|Ψ−a >=< Φb|Ω∗+Ω−|Φa >

or equivalently, the scattering matrix at energy E is given by

S(E) = Ω∗+Ω− = (I + (H0 − E + iε)−1 V)−1∗ (I + (H0 − E − iε)−1 V)−1

= (I + V(H0 − E − iε)−1)−1 (I + (H0 − E − iε)−1 V)−1

= (H0 − E − iε).(H − E − iε)−2.(H0 − E − iε)

= (H − E − iε − V).(H − E − iε)−2.(H − E − iε − V)

= I − V(H − E − iε)−1 − (H − E − iε)−1 V + V(H − E − iε)−2 V = I + R

where

R = −V(H − E − iε)−1 − (H − E − iε)−1 V + V(H − E − iε)−2 V

A better representation of the scattering matrix is given by the formula

S(b, a) = δ(Eb − Ea)(I + R(Ea))

where
R(E) = 2πi(V − V(H − E + iε)−1 V)

To see this, we start with the Lippmann-Schwinger equations

Ω+(Eb) = (I + (H0 − Eb + iε)−1 V)−1,

Ω−(Ea) = (I + (H0 − Ea − iε)−1 V)−1

so that

S(Eb, Ea) = Ω+(Eb)∗ Ω−(Ea) = (I + V(H0 − Eb − iε)−1)−1.(I + (H0 − Ea − iε)−1 V)−1

= (H0 − Eb − iε)(H − Eb − iε)−1 (H − Ea − iε)−1 (H0 − Ea − iε)

= (H − Eb − iε − V)(H − Eb − iε)−1 (H − Ea − iε)−1 (H − Ea − iε − V)

= (I − V(H − Eb − iε)−1)(I − (H − Ea − iε)−1 V)

= I + R(b, a)

where

R(b, a) = −V(H − Eb − iε)−1 − (H − Ea − iε)−1 V + V(H − Eb − iε)−1 (H − Ea − iε)−1 V

= −V(H − Eb − iε)−1 − (H − Ea − iε)−1 V + (Eb − Ea)−1 V[(H − Eb − iε)−1 − (H − Ea − iε)−1]V

or equivalently,

R(b, a)(Eb − Ea) = V(H − Eb − iε)−1 (V − Eb + Ea) − (Eb − Ea + V)(H − Ea − iε)−1 V

Thus,

R(b, a)(Eb − Ea)δ(Eb − Ea) = −(Eb − Ea)δ(Eb − Ea).[V(H − Ea − iε)−1 + (H − Ea − iε)−1 V]

This equation is the same as

R(a, a)dEb.δ(Eb − Ea) = R(a, a)dEa = −[V(H − Ea − iε)−1 + (H − Ea − iε)−1 V]dEa

or equivalently,

R(a, a) = −[V(H − Ea − iε)−1 + (H − Ea − iε)−1 V]

(A remark of caution: It should be noted that if we erroneously interpret (Eb − Ea)−1 as (Eb − Ea + iε)−1 (by replacing Eb with Eb + iε), which is P(Eb − Ea)−1 − iπδ(Eb − Ea), then we get

R(b, a)δ(Eb − Ea) = −iπδ(Eb − Ea)[V(H − Ea − iε)−1 + (H − Ea − iε)−1 V]

which in turn implies

R(b, a)δ(Eb − Ea)dEb = R(a, a)dEa = −iπ[V(H − Ea − iε)−1 + (H − Ea − iε)−1 V]dEa

This is an incorrect result since energies are all real and we cannot interpret Eb as Eb + iε.)
On the other hand,

V − V(H − E + iε)−1 V = −V(H − E + iε)−1 (V − (H − E + iε)) = V(H − E + iε)−1 (E − H0 − iε)

and likewise this also equals

−(V − (H − E + iε))(H − E + iε)−1 V = (H0 − E + iε)(H − E + iε)−1 V

It follows that

2(V − V(H − E + iε)−1 V) = V(H − E + iε)−1 (E − H0 − iε) + (H0 − E + iε)(H − E + iε)−1 V
Thus if |E > denotes a state of the free particle Hamiltonian H0 at continuous energy E, we have

2 < Eb|V − V(H − Ea + iε)−1 V|Ea >= (Eb − Ea + iε) < Eb|(H − Ea + iε)−1 V|Ea >

and noting that

(Eb − Ea + iε)−1 = P(Eb − Ea)−1 − iπδ(Eb − Ea)

where Px−1 equals zero when x = 0 and x−1 otherwise, we get on multiplying both sides of the above equation by δ(Eb − Ea)(Eb − Ea + iε)−1 = −iπ.δ(Eb − Ea),

−2πiδ(Eb − Ea) < Eb|V − V(H − Ea + iε)−1 V|Ea >= δ(Eb − Ea) < Eb|(H − Ea + iε)−1 V|Ea >

which gives on multiplying by dEb and integrating,

−2πi < Ea|V − V(H − Ea + iε)−1 V|Ea > dEa =< Ea|(H − Ea + iε)−1 V|Ea > dEa

and likewise,

−2πiδ(Eb − Ea) < Eb|V − V(H − Eb + iε)−1 V|Ea >= δ(Eb − Ea) < Eb|V(H − Eb + iε)−1|Ea >

which gives on multiplying by dEb and integrating,

−2πi < Ea|V − V(H − Ea + iε)−1 V|Ea > dEa =< Ea|V(H − Ea + iε)−1|Ea > dEa

It follows from this discussion that we can interpret S(E) = I + 2πi(V − V(H − E + iε)−1 V) as the scattering matrix at energy E provided that while forming the matrix elements, we assume that the initial and final states have the same energy E for the free particle Hamiltonian, with only the directions of the initial and final particle momenta possibly differing.
[12] Test problems on basic quantum mechanics
[1] Explain using plane waves and the De-Broglie wave-particle duality and
Planck’s relation between energy and frequency, why the momentum operator
in position space must be taken as p = (−ih/2π)∇ and the energy operator as
E = (ih/2π)∂/∂t.

[2] For a free particle, the total energy operator is

H = p2 /2m = (−h2 /8π 2 m)∇2

If such a particle is enclosed within a cuboid of side lengths a, b, d, then derive the energy spectrum by applying the boundary conditions that the wave function ψ(r) satisfies the stationary Schrodinger equation

Hψ(r) = Eψ(r)

within the box and that ψ vanishes at the boundary. Show that this energy
spectrum is given by

E(n, m, p) = (π 2 h2 /8π 2 m)(n2 /a2 + m2 /b2 + p2 /d2 ), n, m, p = 1, 2, 3, ...

with the corresponding normalized eigenfunctions given by

ψnmp (r) = (8/abd)1/2 sin(nπx/a)sin(mπy/b)sin(pπz/d)

Show by explicit calculation that these eigenfunctions are orthonormal.
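A hedged numerical sketch of the orthonormality check asked for in problem [2] (grid size and box dimensions are arbitrary choices): on a uniform grid, the trapezoidal rule reproduces the sine orthogonality relations essentially exactly, because of the discrete orthogonality of sampled sines.

```python
import numpy as np

a, N = 2.0, 200
x = np.linspace(0, a, N + 1)
dx = a / N

def psi(n):
    # 1-D factor sqrt(2/a) sin(n pi x / a) of the box eigenfunctions
    return np.sqrt(2.0 / a) * np.sin(n * np.pi * x / a)

def inner(f, g):
    # trapezoidal rule on [0, a]
    h = f * g
    return dx * (h[0] / 2 + h[1:-1].sum() + h[-1] / 2)

G = np.array([[inner(psi(m), psi(n)) for n in range(1, 5)] for m in range(1, 5)])
assert np.allclose(G, np.eye(4), atol=1e-12)     # orthonormal to machine precision

# energy spectrum (hbar = 1, so E = (pi^2/2m)(n^2/a^2 + m^2/b^2 + p^2/d^2))
m_, b, d = 1.0, 1.0, 1.5
E = lambda n, mm, p: (np.pi**2 / (2 * m_)) * (n**2 / a**2 + mm**2 / b**2 + p**2 / d**2)
levels = sorted(E(n, mm, p) for n in range(1, 4) for mm in range(1, 4) for p in range(1, 4))
assert levels[0] == E(1, 1, 1)                   # the ground state is (1,1,1)
```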


[3] If a particle moves in a finite potential V (x) in one dimension, then its
stationary state Schrodinger equation is given by

ψ  (x) + (8π 2 m/h2 )(E − V (x))ψ(x) = 0

Show that this equation implies the continuity conditions

ψ(x + 0) = ψ(x − 0), ψ  (x + 0) = ψ  (x − 0), x ∈ R

Use this fact to solve the quantum mechanical tunneling problem: If the potential is V(x) = V1, x < 0, V(x) = V2, 0 < x < L, V(x) = V3, x > L and if the particle comes from x = −∞ with an energy E satisfying V1, V3 < E < V2, then the stationary state Schrodinger equation for this particle is given by (take h/2π = 1)

ψ′′(x) + 2m(E − V1)ψ(x) = 0, x < 0,

ψ′′(x) + 2m(E − V2)ψ(x) = 0, 0 < x < L,

ψ′′(x) + 2m(E − V3)ψ(x) = 0, x > L

with boundary conditions derived from above:

ψ(0−) = ψ(0+), ψ′(0−) = ψ′(0+), ψ(L−) = ψ(L+), ψ′(L−) = ψ′(L+)

The solution corresponding to plane waves from the left getting partly reflected and partly transmitted at x = 0 and getting transmitted from x = L to x → ∞
is given by
ψ(x) = C1 .exp(ik1 x) + C2 .exp(−ik1 x), x < 0,
ψ(x) = C3 exp(αx) + C4 exp(−αx), 0 < x < L,

ψ(x) = C5 .exp(ik2 x), x > L


Derive by applying the above boundary conditions formulas for the reflection
and transmission coefficients:

R = |C2 |2 /|C1 |2 , T = |C5 |2 /|C1 |2

in terms of k1 , k2 , α. Also evaluate k1 , k2 , α in terms of E, V1 , V2 .
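One way to carry out the computation asked for in problem [3] is to impose the four matching conditions as a linear system for (C2, C3, C4, C5) with C1 = 1. The parameter values below are arbitrary choices satisfying V1, V3 < E < V2; the final assertion is the probability-flux identity k1(1 − R) = k2 T:

```python
import numpy as np

m, E, V1, V2, V3, Lb = 1.0, 1.0, 0.0, 2.0, 0.2, 1.0   # hbar = 1, V1, V3 < E < V2
k1 = np.sqrt(2 * m * (E - V1))
k2 = np.sqrt(2 * m * (E - V3))
al = np.sqrt(2 * m * (V2 - E))

# matching at x = 0 and x = Lb; unknowns (C2, C3, C4, C5), incident C1 = 1
A = np.array([
    [1.0,      -1.0,                -1.0,                 0.0],
    [-1j * k1, -al,                  al,                  0.0],
    [0.0,      np.exp(al * Lb),      np.exp(-al * Lb),    -np.exp(1j * k2 * Lb)],
    [0.0,      al * np.exp(al * Lb), -al * np.exp(-al * Lb), -1j * k2 * np.exp(1j * k2 * Lb)],
], dtype=complex)
b = np.array([-1.0, -1j * k1, 0.0, 0.0], dtype=complex)
C2, C3, C4, C5 = np.linalg.solve(A, b)
R, T = abs(C2) ** 2, abs(C5) ** 2
assert 0 < R < 1 and T > 0
assert abs(k1 * (1 - R) - k2 * T) < 1e-10    # probability-flux conservation
```

Note that because the asymptotic wavenumbers differ, flux conservation reads k1(1 − R) = k2 T rather than R + T = 1.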


[4] Prove that the position and momentum operators in L2(R) are unbounded operators that are self-adjoint. Specifically, the domain of the position operator is

D(x) = {f : ∫R (1 + x²)|f(x)|²dx < ∞}

while that of the momentum operator is

D(p) = {f : ∫R (|f(x)|² + |f′(x)|²)dx < ∞}

For proving that x, p are self-adjoint, you may use Von-Neumann’s condition:
Verify that the operators (x + i)−1 , (p + i)−1 are bounded operators in L2 (R),
and hence x + i, p + i both have range equal to the whole of L2 (R).
[5] Path integrals
[a] If H = p²/2m + V(x) is the Hamiltonian of a non-relativistic particle, the associated Lagrangian is L(q, q′) = mq′²/2 − V(q). Evaluating

ψ1(q2) = C.∫ exp(iΔ.L(q1, (q2 − q1)/Δ))ψ0(q1)dq1

upto O(Δ), show that with the constant C chosen appropriately, we have

ψ1(q2) = ψ0(q2) − iΔ.(−ψ0′′(q2)/2m + V(q2)ψ0(q2)) + o(Δ)

Deduce that with neglect of o(Δ) terms,

ψ1 (q2 ) =< q2 |exp(−iΔ.H)|ψ0 >

Conclude that if T > 0, we can express the solution to the Schrodinger evolution equation after time T as

ψT(q2) = ∫ KT(q2, q1)ψ0(q1)dq1

where

KT(q, q0) = limn→∞ Cn.∫ exp(i ∑k=0^(n−1) Δ.L(qk, (qk+1 − qk)/Δ))dq1...dqn−1

where
Δ = Δn = T/n, qn = q

for an appropriate sequence of constants {Cn}. Justify that this solution to the Schrodinger evolution kernel can be expressed as a path integral

KT(q, q0) = ∫q(0)=q0,q(T)=q exp(i ∫0^T L(q(t), q′(t))dt)Π0<t<T dq(t)

[b] Evaluate the Schrodinger evolution kernel for a 1-D quantum harmonic oscillator with Hamiltonian

H = p²/2m + mω²q²/2

using path integrals and verify that it agrees with that evaluated by actually solving the stationary Schrodinger equation for the eigenfunctions ψn(x), n = 0, 1, 2, ... with respective energy eigenvalues En, n = 0, 1, 2, ... and then forming the evolution kernel

KT(x2, x1) = ∑n=0^∞ exp(−iEn T)ψn(x2)ψ̄n(x1)

hint: The Lagrangian is given by

L(q, q′) = mq′²/2 − mω²q²/2

Now expand q(t) as a Fourier sinewave series over [0, T] keeping the end points fixed:

q(t) = a + bt + ∑n≥1 qn √(2/T).sin(nπt/T)

where
a = x1, a + bT = x2

Then justify that the path measure Π0<t<T dq(t), after an appropriate normalization, can be replaced by the product measure Πn≥1 dqn and then substitute the Fourier series expansion for q(t) into the action integral and evaluate the path integral using standard Gaussian integrals.
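The eigenfunction-sum form of the kernel in problem [b] can be tested in imaginary time (T → −iT), where the sum converges absolutely and equals Mehler's formula for the oscillator heat kernel — a standard closed form assumed here, not taken from the text:

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sinh, cosh, sqrt, exp

m, w, T = 1.0, 1.0, 0.7    # hbar = 1; arbitrary parameter choices

def psi(n, x):
    # oscillator eigenfunction (m w/pi)^{1/4} H_n(sqrt(m w) x) e^{-m w x^2/2}/sqrt(2^n n!)
    coef = [0] * n + [1]
    norm = (m * w / pi) ** 0.25 / sqrt(2.0 ** n * factorial(n))
    return norm * hermval(sqrt(m * w) * x, coef) * np.exp(-m * w * x ** 2 / 2)

x1, x2 = 0.4, -0.3
# imaginary-time eigenfunction sum, E_n = w(n + 1/2)
ksum = sum(exp(-w * (n + 0.5) * T) * psi(n, x2) * psi(n, x1) for n in range(60))
# Mehler's closed form for the same heat kernel
s, c = sinh(w * T), cosh(w * T)
mehler = sqrt(m * w / (2 * pi * s)) * exp(-m * w * ((x1**2 + x2**2) * c - 2 * x1 * x2) / (2 * s))
assert abs(ksum - mehler) < 1e-8
```

The real-time kernel of the text is the analytic continuation of this expression.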

[c] Evaluate using the above method, the path integral for a forced harmonic oscillator described by the Lagrangian

L(q, q′, t) = mq′²/2 − mω²q²/2 + f(t)q, 0 ≤ t ≤ T

[d] In the limit as h → 0, show that the path integral

∫ exp(iS(q)/h)Dq

behaves as exp(iS(q∗)/h) where q∗ is the classical path obtained by extremizing


the action S keeping the end-points fixed. What does this result mean from the
viewpoint of approximating quantum mechanics with the equations of classical
mechanics for macroscopic systems ?
[d] Let H(t, q, p) be any Hamiltonian. The evolution kernel in position space for this Hamiltonian over the small time interval [t, t + Δ] is given by

< q′′|exp(−iΔ.H(t, q, p))|q′ >= K(q′′, t + Δ|q′, t)

where q′′, q′ are real vectors and q, p are the position and momentum operator vectors. Show that this kernel can be expressed as

∫ < q′′|p′ > dp′ < p′|exp(−iΔ.H(t, q, p))|q′ >

so that if by using the canonical commutation relations, we push all the p′s to the left of all the q′s in H(t, q, p), then the above integral becomes

∫ < q′′|p′ > dp′ < p′|q′ > .exp(−iΔ.H(t, q′, p′))

Note that now H(t, q′, p′) is a real/complex number, not an operator. Now show that the position space wave functions that are eigen-functions of the momentum operator are

< q′|p′ >= C.exp(i(q′, p′))

Hence, the above kernel becomes

∫ exp(ip′.(q′′ − q′) − iΔ.H(t, q′, p′))dp′

Deduce using this result that the finite time evolution kernel corresponding to this Hamiltonian is the path integral

∫ exp(i ∫0^T (p(t).q′(t) − H(t, q(t), p(t)))dt)Π0<t<T dq(t)dp(t)

Deduce that in the special case when H(t, q, p) = p²/2m + V(q), we get on integration w.r.t. p in this path integral the earlier result for non-relativistic quantum mechanics involving a path integral only over q.

[13] Large deviation theory in quantum mechanics.

Let H(t) = H0 + εf(t)V be the Hamiltonian of a quantum system with f(t) a zero mean Gaussian process. Define

V(t) = exp(itH0)V.exp(−itH0)

Then, the transition probability from state |n > to state |m > in time t is given by

Pt(m|n, ε) = ε²|∫0^t f(s) < m|V(s)|n > ds|² + O(ε³)

This transition probability is random since f(.) is a random process. Now in the limit as ε → 0, Pt(m|n, ε) → 0 and we wish to determine the rate at which this transition probability converges to zero. It is clear that Pt(m|n, ε) has a χ²-distribution since it is a quadratic functional of a zero mean Gaussian process. So we can apply the contraction principle to determine the rate function of this transition probability.

The rate function of the process ε.f(.) over the time interval [0, T] is given by

IT(x) = supλ (< λ, x >T − Λ̄(λ))

where

Λ̄(λ) = limε→0 ε².logE[exp(ε^(−1) ∫0^T λ(t)f(t)dt)] = (1/2)∫[0,T]² Rff(t, s)λ(t)λ(s)dtds

and hence

IT(x) = (1/2)∫[0,T]² R−1ff(t, s)x(t)x(s)dtds

where R−1ff denotes the kernel of the inverse of the covariance operator of f.

Now, consider the random variable

Z(ε) = Q(ε.f) = ε² ∫[0,T]² q(t, s)f(t)f(s)dtds

By the contraction principle, the rate function of the family of r.v.'s Z(ε), ε → 0 is given by

IZ(z) = inf{IT(x) : Q(x) = z}

This minimization is carried out using the method of Lagrange multipliers, ie, we minimize

IT(x) − μ(Q(x) − z)

w.r.t. x, μ. Setting the variational derivative of this quantity w.r.t. x to zero gives us

∫0^T R−1ff(t, s)x(s)ds − 2μ ∫0^T q(t, s)x(s)ds = 0, 0 ≤ t ≤ T

Let μ0 be the minimum generalized eigenvalue for this generalized eigenvalue problem. Then if c.x0(t) is the corresponding generalized eigenfunction, we have

c²Q(x0) = z

so

c = √(z/Q(x0))

and hence

IZ(z) = IT(c.x0) = z.IT(x0)/Q(x0) = μ0 z

This is the rate function for the family Z(ε), ε → 0.
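A finite-dimensional sketch of this contraction-principle computation (all matrices below are random stand-ins, not from the text): with f ~ N(0, R) in R^d and Q(x) = x^T q x, the constrained minimization gives IZ(z) = μ0 z with μ0 = 1/(2λmax), where λmax is the largest eigenvalue of R^{1/2} q R^{1/2}.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
Mr = rng.standard_normal((d, d)); R = Mr @ Mr.T + np.eye(d)   # covariance of f
Nq = rng.standard_normal((d, d)); q = Nq @ Nq.T               # Z(eps) = eps^2 f^T q f
Rh = np.linalg.cholesky(R)                                    # R = Rh Rh^T
Rinv = np.linalg.inv(R)
lam, Wv = np.linalg.eigh(Rh.T @ q @ Rh)
z = 3.0
I_rate = z / (2 * lam[-1])                  # mu0 * z with mu0 = 1/(2 lam_max)

# the minimizer x0 lies along Rh u_max, rescaled onto the constraint Q(x) = z
x_star = Rh @ Wv[:, -1]
x_star *= np.sqrt(z / (x_star @ q @ x_star))
assert np.isclose(0.5 * x_star @ Rinv @ x_star, I_rate)

# any other feasible point has a larger value of I_T(x) = (1/2) x^T R^{-1} x
for _ in range(1000):
    x = rng.standard_normal(d)
    x *= np.sqrt(z / (x @ q @ x))
    assert 0.5 * x @ Rinv @ x >= I_rate - 1e-9
```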

[14] Harmonic oscillators with small anharmonic perturbations

The Hamiltonian is
H = H0 + V
where the unperturbed Hamiltonian is

H0 = ∑k=1^p ω(k)a∗k ak,

and the anharmonic perturbation is

V = f (a∗ , a)

where
a = (a1 , ..., ap ), a∗ = (a∗1 , ..., a∗p )
f is assumed to be a polynomial function. The Bosonic commutation relations
are
[ak , a∗m ] = δkm
We have
[W (z), ak ] = −zk W (z), [W (z), a∗k ] = −z̄k W (z)
Equivalently,

W(z)ak W(−z) = ak + zk, W(z)a∗k W(−z) = a∗k + z̄k

and hence,
W (z)f (a∗ , a)W (−z) = f (a∗ + z̄, a + z)
which gives

[f (a∗ , a), W (z)] = −(W (z)f (a∗ , a)W (−z) − f (a∗ , a))W (z)

= −(f (a∗ + z̄, a + z) − f (a∗ , a))W (z)


Now, the GKSL generator for observables can be expressed as

θT(X) = i[H, X] − (1/2)∑k (L∗k[Lk, X] + [X, L∗k]Lk) = i[H, X] − (1/2)θ(X)

Let
Lk = lk.a + mk.a∗, L∗k = l̄k.a∗ + m̄k.a
or equivalently, when written in full,

Lk = ∑j=1^p (lk(j)aj + mk(j)a∗j),

L∗k = ∑j=1^p (l̄k(j)a∗j + m̄k(j)aj)

We shall more generally consider Lindblad operators that are arbitrary functions of the creation and annihilation operators:

Lk = Fk(a∗, a)

In the special case of Lk's that are linear in the a, a∗, we have as observed earlier,

θ(W(z)) = (q0(z) + ψ̄(z).a − ψ(z).a∗)W(z)

where ψ(z) is a linear function of z, z̄ and q0(z) is a quadratic function of z, z̄. In this more general nonlinear case, we have

θ(W(z)) = ∑k (F∗k[Fk, W(z)] + [W(z), F∗k]Fk)

= ∑k [−F∗k(Fk(a∗ + z̄, a + z) − Fk)W(z) − (Fk(a∗ − z̄, a − z)∗ − F∗k)Fk W(z)]

Then taking H = V = f(a∗, a), we get

[H, W(z)] = −(f(a∗ + z̄, a + z) − f(a∗, a))W(z)

Thus the total Lindblad generator is given by

θT(W(z)) = [(f(a∗, a) − f(a∗ + z̄, a + z)) + (1/2)∑k (F∗k(Fk(a∗ + z̄, a + z) − Fk) + (Fk(a∗ − z̄, a − z)∗ − F∗k)Fk)]W(z)
and hence we find that

Tt(W(z)) = exp(t.θT)(W(z)) = exp(tG(z, a∗, a))W(z)

where

G(z, a∗, a) = (f(a∗, a) − f(a∗ + z̄, a + z)) + (1/2)∑k (F∗k(Fk(a∗ + z̄, a + z) − Fk) + (Fk(a∗ − z̄, a − z)∗ − F∗k)Fk)

Define

ψ(a∗, a) = f(a∗, a) − (1/2)∑k Fk(a∗, a)∗Fk(a∗, a)

and

χ(z, a∗, a) = −f(a∗ + z̄, a + z) + (1/2)∑k [Fk(a∗, a)∗Fk(a∗ + z̄, a + z) + Fk(a∗ − z̄, a − z)∗Fk(a∗, a)]

Then we can write

θT(W(z)) = [ψ(a∗, a) + χ(z, a∗, a)]W(z)

Thus,

Tt(W(z)) = exp(t(ψ(a∗, a) + χ(z, a∗, a)))W(z)

and hence

ρ̂(t, z) = Tr(ρ(0).exp(t(ψ(a∗, a) + χ(z, a∗, a)))W(z))

We have then,

∂ρ̂(t, z)/∂t = Tr(ρ(t).(ψ(a∗, a) + χ(z, a∗, a))W(z))

Now write

ψ(a∗, a) + χ(z, a∗, a) = ∫ K(z, u).exp(ū.a − u.a∗)dudū

Then,

∂ρ̂(t, z)/∂t = ∫ K(z, u).Tr(ρ(t).exp(ū.a − u.a∗)W(z))dudū

Now,

exp(ū.a − u.a∗)W(z) = W(u)W(z)

W(z)|e(v) >= exp(−|z|²/2− < z, v >)|e(v + z) >

W(u)W(z)|e(v) >= exp(−|z|²/2− < z, v > −|u|²/2− < u, v + z >)|e(v + z + u) >

= exp(−|z + u|²/2− < z + u, v > +Re(< z, u >)− < u, z >)|e(v + z + u) >

= exp(−iIm(< u, z >))W(u + z)|e(v) >

and hence,

∂ρ̂(t, z)/∂t = ∫ K(z, u).exp(−iIm(< u, z >))ρ̂(t, u + z)dudū

[15] Dirac's equation in a radial potential, a simplified approach

H = (α, p) + βm + V(r)

Define
αr = (α, n), n = r/r
Then,
αr² = 1
Note that
α = [[σ, 0], [0, −σ]]
We also denote by σ the 4 × 4 matrix vector
diag[σ, σ] = I2 ⊗ σ
Then

αr(α, p) = [[(σ, n)(σ, p), 0], [0, (σ, n)(σ, p)]]

= pr + i(σ, n × p) = pr + ir−1(σ, L)

where
pr = (n, p) = −i∂/∂r, L = r × p
Thus,
(α, p) = αr(pr + ir−1(σ, L))
and we get
H = αr(pr + ir−1(σ, L)) + βm + V
Note that
[αr, pr] = r−1[r−1(α, r), (r, p)]
= r−2 αk xm[xk, pm] + r−1 xk[r−1, pk](α, r)
= ir−2(α, r) − ir−1(xk xk/r³)(α, r)
= ir−1 αr − ir−1 αr = 0
Now define the observable
k = β((σ, L) + 1)
where by σ, we mean diag[σ, σ]. We have

k = [[0, (σ, L) + 1], [(σ, L) + 1, 0]]

Then,
[αr, k] = [diag[(σ, n), −(σ, n)], k]

= [[0, {(σ, n), (σ, L) + 1}], [−{(σ, n), (σ, L) + 1}, 0]]
where {., .} means anticommutator. Now,

{(σ, n), (σ, L)} = i(σ, n × L + L × n)

since
n.L = L.n = 0
Now,
(n × L + L × n)a = (abc)(nb Lc − Lc nb) = (abc)[nb, Lc]
But,
[nb, Lc] = (crs)[nb, xr ps] = (crs)xr[nb, ps]
= i(crs)xr ∂nb/∂xs
= i(crs)xr(δbs/r − xb xs/r³)
= i(crb)xr/r
and thus,
(abc)[nb, Lc] = i(abc)(crb)xr/r = 2ixa/r = 2ina
It follows that
{(σ, n), (σ, L)} = −2(σ, n)
and hence, we deduce that
[αr, k] = 0
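The key algebraic step used repeatedly above is the Pauli identity (σ, a)(σ, b) = (a, b) + i(σ, a × b), exact for commuting vectors a, b (for the operator-valued n, p the ordering corrections are the ones worked out above). A quick numerical check of the commuting-vector version:

```python
import numpy as np

sig = [np.array([[0, 1], [1, 0]], dtype=complex),    # sigma_x
       np.array([[0, -1j], [1j, 0]]),                # sigma_y
       np.array([[1, 0], [0, -1]], dtype=complex)]   # sigma_z

def sdot(v):
    # (sigma, v) = v_x sigma_x + v_y sigma_y + v_z sigma_z
    return sum(v[i] * sig[i] for i in range(3))

rng = np.random.default_rng(2)
a, b = rng.standard_normal(3), rng.standard_normal(3)
lhs = sdot(a) @ sdot(b)
rhs = np.dot(a, b) * np.eye(2) + 1j * sdot(np.cross(a, b))
assert np.allclose(lhs, rhs)
```

Setting b = a recovers αr² = (σ, n)² = |n|² = 1 used at the start of this section.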

[16] A simulational problem in molecular chemistry has been addressed in


the paper. The simulation is based on using quantum circuits in place of a
classical computer. Specifically, simulation of the dynamics of a laser driven
isomerization reaction has been carried out. The reaction is described by a
Hamiltonian
H(t) = Hmol + Hint (t)
where Hmol = T + V is the time independent molecular Hamiltonian and
Hint (t) describes the laser-molecule interaction Hamiltonian as a function of
time. Hint (t) has the standard form of an interaction between the electric
dipole moment of the molecule and the laser time dependent electric field. The
authors use a formula equn.(7) for the corresponding approximate (upto second
order in dt) unitary evolution U (t + dt, t) based on using the Schrodinger equa-
tion with Hamiltonian H(t). For facilitating readability, a short proof of this
formula may be included in the appendix. The proof may be based for example
on the expansion

exp(−iV dt/2).exp(−iE(t + dt/2)dt/2).exp(−iT dt).exp(−iE(t + dt/2)dt/2).exp(−iV dt/2)

= (1 − iV dt/2 − V²dt²/8)(1 − iE(t + dt/2)dt/2 − E²(t + dt/2)dt²/8)

×(1 − iT dt − T²dt²/2)(1 − iE(t + dt/2)dt/2 − E²(t + dt/2)dt²/8)(1 − iV dt/2 − V²dt²/8)

+O(dt³)
The authors simplify the computation of this evolution operator by passing over
to the momentum representation where the kinetic energy operator T becomes
diagonal. To go over from the position to the momentum representation, they
use the quantum Fourier transform. Finally, to simulate the reaction dynamics,
the authors use a 3-qubit system based on discretizing space into eight pixels
and representing the wave function in space by an 8 × 1 column vector that
varies with time. The paper is interesting both from a theoretical and an ap-
plication viewpoint. Some indication of higher order in dt approximations of
the Schrodinger unitary evolution operator U (t + dt, t) may be provided in the

appendix. Moreover, I suggest a derivation of equn.(7) based on the Campbell-


Baker-Hausdorff formula in Lie group theory, ie, a formula of the form
exp(tA).exp(tB) = exp(t(A + B) + c1 (t)[A, B] + c2 (t)[A, [A, B]] + ...)
After including these comments, I recommend publication.
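The second-order accuracy discussed for equn.(7) can be illustrated with random Hermitian matrices standing in for T, V and the dipole term (hypothetical stand-ins; the operators in the paper are different): the symmetric splitting has local error O(dt³), so halving dt should reduce the error by roughly a factor of 8.

```python
import numpy as np

def U(Hm, t):
    # exp(-i Hm t) for Hermitian Hm, via its eigendecomposition
    w, v = np.linalg.eigh(Hm)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

rng = np.random.default_rng(6)

def randherm(n):
    Mc = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (Mc + Mc.conj().T) / 2

T_, V_, E_ = randherm(6), randherm(6), randherm(6)
H = T_ + V_ + E_

def err(dt):
    # symmetric (Strang-type) splitting as in the expansion above
    strang = U(V_, dt / 2) @ U(E_, dt / 2) @ U(T_, dt) @ U(E_, dt / 2) @ U(V_, dt / 2)
    return np.linalg.norm(strang - U(H, dt))

e1, e2 = err(1e-2), err(5e-3)
assert e2 < e1
assert e1 / e2 > 6.0    # ~8x reduction per halving, i.e. local error O(dt^3)
```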

[17] Quantum image processing via Gaussian states

[1] Let X(n, m), 1 ≤ n, m ≤ M be a classical image field with intensities normalized to fall in the range [0, 1]. We assume M to be divisible by K and represent X as a block structured matrix with M/K × M/K blocks, each block being of size K × K. Thus, the (p, q)th block is the K × K matrix Xp,q(u, v) = X(K(p − 1) + u, K(q − 1) + v), 1 ≤ u, v ≤ K. Here, 1 ≤ p, q ≤ M/K. We convert each such block into a quantum pure state vector |ψp,q > of size 2^(K²) × 1 as

|ψp,q >= ⊗u,v=1^K (Xp,q(u, v)|1 > + √(1 − Xp,q(u, v)²)|0 >)

where the tensor product is taken in the lexicographic order. Define the column vector

|a(u, v) >= ⊗u′,v′=1^K (δ(u − u′, v − v′)|1 > + (1 − δ(u − u′, v − v′))|0 >)

Then clearly
< a(u, v)|ψp,q >= Xp,q(u, v)
and therefore,

|ψp,q >= ∑u,v=1^K Xp,q(u, v)|a(u, v) >

[2] Choose R independent creation and annihilation operator pairs (a∗k, ak), k = 1, 2, ..., R satisfying the CCR [ak, a∗j] = δkj, [ak, aj] = 0 and construct a Gaussian state

ρ(λ1, ..., λR) = [ΠRk=1 (1 − exp(−λk))]exp(−∑k=1^R λk a∗k ak)

We call this a diagonal Gaussian state since it is diagonal w.r.t. the canonical basis |n >= |n1, ..., nR >, nk = 0, 1, ... with

a∗k ak|n >= nk|n >, 1 ≤ k ≤ R

We approximate ρ by its truncated version

ρ̃ = ∑n1,...,nR=0^(Q−1) |n1, ..., nR > p(n1, ..., nR|λ) < n1, ..., nR|

where
p(n1, ..., nR|λ) = exp(−(λ1 n1 + ... + λR nR))/ZQ(λ)
where

ZQ(λ) = ∑n1,...,nR=0^(Q−1) exp(−(λ1 n1 + ... + λR nR)) = ΠRk=1 ZQ(λk)

with

ZQ(x) = ∑n=0^(Q−1) exp(−nx) = (1 − exp(−Qx))/(1 − exp(−x))

Now consider the Boson Fock space Γs(L²(R+)). Choose distinct vectors u1, ..., uP in L²(R+) so that P = QR. Now consider the vectors

|f(r) >= ∑s=1^P c(r, s)|e(us) >, r = 1, 2, ..., P

with the complex constants c(r, s) chosen so that

< f(r)|f(r′) >= δ(r, r′), r, r′ = 1, 2, ..., P

This means that

∑s,s′=1^P c̄(r, s)c(r′, s′)exp(< us|us′ >) = δ(r, r′)

or equivalently,

C̄WCT = IP

where
C = ((c(r, s))), W = ((exp(< us|us′ >)))

One way to choose C is to take
C̄ = W−1/2
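A small numerical sketch of this orthonormalization (the dimension, the number of vectors P, and the scaling of the us below are arbitrary choices): build the Gram matrix W of the exponential vectors, take C̄ = W−1/2, and check the orthonormality condition C̄WCT = IP.

```python
import numpy as np

rng = np.random.default_rng(3)
P_, dim = 4, 5
# u_s represented as finite-dimensional complex vectors (hypothetical stand-ins
# for elements of L^2(R_+)); scaled down to keep W well conditioned
Us = 0.5 * (rng.standard_normal((P_, dim)) + 1j * rng.standard_normal((P_, dim)))
ip = Us.conj() @ Us.T                 # <u_s|u_s'>
W = np.exp(ip)                        # Gram matrix of exponential vectors, Hermitian pos. def.

ew, Vw = np.linalg.eigh(W)
W_inv_half = Vw @ np.diag(ew ** -0.5) @ Vw.conj().T   # Hermitian W^{-1/2}
C = W_inv_half.conj()                                 # so that Cbar = W^{-1/2}

G = C.conj() @ W @ C.T                # ((<f(r)|f(r')>))
assert np.allclose(G, np.eye(P_), atol=1e-10)
```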
One purification of ρ̃ is given by

|φ >= ∑n1,...,nR=0^(Q−1) √(p(n|λ))|n > ⊗|f(r(n)) >

where

r(n) = QR−1 n1 + QR−2 n2 + ... + QnR−1 + nR + 1, n = (n1, ..., nR), n1, ..., nR = 0, 1, ..., Q − 1

Note that n → r(n) is a one-one mapping of {0, 1, ..., Q − 1}R onto {1, 2, ..., P}. We now observe how to compute the matrix elements of the creation and
annihilation differentials in Boson Fock space relative to the vectors |f(r(n)) >, n ∈ {0, 1, ..., Q − 1}R:

< f(r(m))|dA(t)|f(r(n)) >= ∑s,s′ c̄(r(m), s).c(r(n), s′) < e(us)|dA(t)|e(us′) >

= dt ∑s,s′ c̄(r(m), s).c(r(n), s′)us′(t).exp(< us|us′ >)

and likewise,

< f(r(m))|dA(t)∗|f(r(n)) >= ∑s,s′ c̄(r(m), s).c(r(n), s′) < e(us)|dA(t)∗|e(us′) >

= dt ∑s,s′ c̄(r(m), s).c(r(n), s′)ūs(t).exp(< us|us′ >)

We then consider a similar purification of another Gaussian state:

|ψ >= ∑n1,...,nR=0^(Q−1) √(q(n|λ))|n > ⊗|f(r(n)) >

We consider the following second order Dyson series approximation for the HP equation in the absence of a Hamiltonian:

W = I − i∫0^T (LdA(t) − L∗dA(t)∗) − ∫0<s<t<T (LdA(t) − L∗dA(t)∗)(LdA(s) − L∗dA(s)∗) − iTLL∗/2

We then evaluate the matrix element

< ψ|W|φ >

upto second degree in the Lindblad operators.

Reference: Rohit Singh, Ph.D thesis, NSUT.

[18] EKF applied to state estimation in quantum systems. The state vector follows the noisy Schrodinger dynamics

ψ′(t) = −i(H + f(t)V)ψ(t)

where f(t) is a random process. If f(t) is white noise, then we must formulate it as an sde with an Ito correction term that guarantees unitary dynamics:

dψ(t) = (−(iH + V²/2)dt − iV.dB(t))ψ(t)

Measurements are taken at discrete times t1 < t2 < ... < tn < ... on taking into account the notion of state collapse. Let Ma, a = 1, 2, ..., N define a POVM, ie, Ma > 0, ∑a Ma = I. Then, after the measurement at time tn is taken, the state collapses to

ψ(tn + 0) = Man ψ(tn − 0)/ < ψ(tn − 0)|Man|ψ(tn − 0) >1/2

provided that the measured outcome is an. It should be noted that the probability of this outcome is

pn(an) =< ψ(tn − 0)|Man|ψ(tn − 0) >

To take this fact into account, we introduce independent random variables η(n), n = 1, 2, ... having probability distributions

P(η(n) = a) = pn(a), a = 1, 2, ..., N

Then the state at time tn + 0 following the measurement and after noting the outcome is given by

ψ(tn + 0) = M(η(n))ψ(tn − 0)/√(pn(η(n)))
ψ(tn + 0) = M (η(n))ψ(tn − 0)/ pn (η(n))

Here, pn (a) is of the form

pn (a) = q(a|ψ(tn − 0))

It should be noted that during the time interval (tn , tn+1 ), the dynamics of
the state is the above noisy Schrodinger dynamics. If we adopt a discrete time
version of this state and measurement model, we have
X(n + 1) = A1 X(n) + w(n)A2 X(n),

Z(n) = h(X(n), η(n))


where w(n), n = 1, 2, ... are iid N (0, 1) random variables and η(n), n = 1, 2, ...
are independent random variables conditioned on X(n − 1) with a conditional
probability distribution

P r(η(n) = a|X(n − 1)) = q(a|X(n − 1)), a = 1, 2, ..., p

The problem is to calculate the conditional probability p(X(n)|Yn) recursively where

Yn = (Z(k) : k ≤ n)

is the measurement process upto time n. We have

p(X(n + 1)|Yn+1) = p(Z(n + 1), Yn, X(n + 1))/p(Yn+1)

= ∫ p(Z(n+1)|X(n+1), X(n)).p(X(n+1)|X(n)).p(X(n)|Yn)dX(n) / ∫ p(Z(n+1)|X(n+1), X(n)).p(X(n+1)|X(n)).p(X(n)|Yn)dX(n)dX(n+1)
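A toy realization of this measurement model for a two-level system (all matrices, step sizes and the noise strength below are hypothetical choices, not from the text): Euler–Maruyama steps for the noisy Schrodinger sde, with renormalization, interleaved with projective POVM collapse at fixed measurement times.

```python
import numpy as np

rng = np.random.default_rng(4)
H = np.array([[1.0, 0.3], [0.3, -1.0]], dtype=complex)   # Hermitian Hamiltonian
V = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)    # Hermitian noise coupling
M = [np.diag([1.0, 0.0]).astype(complex),
     np.diag([0.0, 1.0]).astype(complex)]                # projective POVM, sums to I

psi = np.array([1.0, 0.0], dtype=complex)
dt, eps = 1e-3, 0.1
for step in range(1, 2001):
    dB = np.sqrt(dt) * rng.standard_normal()
    # d psi = (-(iH + eps^2 V^2/2) dt - i eps V dB) psi  (Ito-corrected sde)
    drift = -(1j * H + 0.5 * eps**2 * (V @ V)) * dt
    psi = psi + (drift - 1j * eps * V * dB) @ psi
    psi = psi / np.linalg.norm(psi)       # Euler step preserves the norm only to O(dt)
    if step % 500 == 0:                   # measurement followed by state collapse
        p = np.array([np.real(psi.conj() @ Ma @ psi) for Ma in M])
        assert abs(p.sum() - 1.0) < 1e-9
        a = rng.choice(2, p=p / p.sum())  # outcome eta with P(eta = a) = p[a]
        psi = (M[a] @ psi) / np.sqrt(p[a])

assert abs(np.linalg.norm(psi) - 1.0) < 1e-9
```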

Alternate models:

First, state evolution from time 2n to time 2n + 1:

X(2n + 1) = (A1 + w(2n + 1)A2)X(2n),

Then, state collapse on making a measurement at time 2n + 1:

Z(2n + 1) = h(η(2n + 1), X(2n + 1))

Then, state evolution starting from collapsed state:

X(2n + 2) = Z(2n + 1)

where
P (η(2n + 1) = a|X(2n + 1)) = q(a|X(2n + 1)), a = 1, 2, ..., p
This is one particular model for quantum state measurement in which state
evolution from an even time instant to the next (odd) time instant takes place
according to noisy Schrodinger dynamics while measurement followed by state
collapse takes place from an odd time instant to the next (even) time instant.
In another model, the state evolution under noisy Schrodinger dynamics takes
place from time Kn to time K(n + 1) − 1 and measurement followed by state
collapse takes place from time K(n + 1) − 1 to time K(n + 1). The difference
equation model for this is
X(m + 1) = (A1 + w[m + 1]A2 )X(m), Kn ≤ m ≤ K(n + 1) − 1

Z(K(n + 1) − 1) = h(η(n + 1), X(K(n + 1) − 1)),


X(K(n + 1)) = Z(K(n + 1) − 1)
where
Pr(η(n + 1) = a|X(K(n + 1) − 1)) = q(a|X(K(n + 1) − 1))
The conditional densities
p(X(Kn + r)|Y (Kn − 1)), r = 0, 1, ..., K − 1, n = 0, 1, ...
are to be calculated recursively where
Y (Kn − 1) = {Z(Km − 1) : m ≤ n}
is the set of measurements up to time Kn − 1. Clearly, we have

p(X(Kn + r + 1)|Y(Kn − 1)) = ∫ p(X(Kn + r + 1)|X(Kn + r)).p(X(Kn + r)|Y(Kn − 1))dX(Kn + r), r = 0, 1, ..., K − 2

or equivalently,

E(φ(X(Kn + r + 1))|Y(Kn − 1)) = ∫ φ((A1 + A2 w)X(Kn + r))p(X(Kn + r)|Y(Kn − 1))p(w)dw dX(Kn + r)

for r = 0, 1, ..., K − 2. Equivalently, defining

π_{n,r}(φ) = E(φ(X(Kn + r))|Y(Kn − 1)), r = 0, 1, ..., K − 1

we have the recursion

π_{n,r+1}(φ) = π_{n,r}(∫ p(w)φ((A1 + A2 w)x)dw), r = 0, 1, ..., K − 2

with the initialization

π_{n,0}(φ) = φ(Z(Kn − 1))

In other words, since the state after the measurement, collapses to the measured
state, there is no need for a filtering algorithm here. However, if there is an
additive noise present during the measurement process, we would then require
a filtering algorithm.
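For scalar A1, A2 and the moment functions φ(x) = x and φ(x) = x², this recursion has the closed forms π_{n,r}(x) = A1^r z and π_{n,r}(x²) = (A1² + A2²)^r z², where z = Z(Kn − 1) is the collapsed state. A Monte Carlo sketch (all numerical values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2 = 0.9, 0.2        # hypothetical scalar coefficients
z = 2.0                  # collapsed state Z(Kn-1), so pi_{n,0}(phi) = phi(z)
K = 6

# Monte Carlo version of pi_{n,r+1}(phi) = pi_{n,r}(integral p(w) phi((A1+A2 w)x) dw)
M = 200000
x = np.full(M, z)
mean_mc, sq_mc = [z], [z*z]
for r in range(K - 1):
    x = (A1 + A2*rng.standard_normal(M))*x
    mean_mc.append(x.mean())
    sq_mc.append((x*x).mean())

# closed forms for phi(x) = x and phi(x) = x^2
mean_cf = [A1**r * z for r in range(K)]
sq_cf = [(A1**2 + A2**2)**r * z*z for r in range(K)]
print(mean_mc[-1], mean_cf[-1], sq_mc[-1], sq_cf[-1])
```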
In yet another simplified model, the state evolves without collapse and measurements on the state are made at every time instant, the measurement at any given time instant being a function of the current state and of a discrete random variable whose distribution is determined by the state at the previous time instant. For this model,

X(n + 1) = (A1 + w[n + 1]A2 )X(n),

Z(n + 1) = h(η(n + 1), X(n + 1)),


P r(η(n + 1) = a|X(n)) = q(a|X(n))
Define
Yn = (Z(k) : k ≤ n)
and note that

p(X(n + 1)|Y_{n+1}) = p(X(n + 1), Y_n, Z(n + 1))/p(Y_{n+1}) = N/D

where

N = ∫ p(Z(n + 1)|X(n + 1), X(n))p(X(n + 1)|X(n))p(X(n)|Y_n)dX(n)

and

D = ∫ N dX(n + 1)

Now,

∫ φ(Z(n + 1))p(Z(n + 1)|X(n + 1), X(n))dZ(n + 1)

= E(φ(h(η(n + 1), X(n + 1)))|X(n + 1), X(n))

= Σ_a q(a|X(n))φ(h(a, X(n + 1)))

In terms of conditional probability densities,

p(Z(n + 1)|X(n + 1), X(n)) = Σ_a q(a|X(n))δ(Z(n + 1) − h(a, X(n + 1)))

Now consider another model in which measurement noise is present. The


state model is
X(2n + 1) = (A1 + w[2n + 1]A2 )X(2n),
and the measurement model is

Z(2n + 2) = (h(η(2n + 2), X(2n + 1)) + v(2n + 2))/‖h(η(2n + 2), X(2n + 1)) + v(2n + 2)‖

Following this measurement, the state collapses to

X(2n + 2) = h(η(2n + 2), X(2n + 1))/‖h(η(2n + 2), X(2n + 1))‖

= F(X(2n + 1), η(2n + 2))

where v(.) is measurement noise. The measurement model can be approximated up to linear order in the measurement noise as

Z(2n + 2) ≈ (h(η(2n + 2), X(2n + 1)) + v(2n + 2))/√(‖h(η(2n + 2), X(2n + 1))‖² + 2v(2n + 2)^T h(η(2n + 2), X(2n + 1)))

≈ h(η(2n + 2), X(2n + 1))/‖h(η(2n + 2), X(2n + 1))‖ + v(2n + 2)/‖h(η(2n + 2), X(2n + 1))‖

− (v(2n + 2)^T h(η(2n + 2), X(2n + 1))/‖h(η(2n + 2), X(2n + 1))‖³).h(η(2n + 2), X(2n + 1))

We can formally express this identity as

Z(2n + 2) = X(2n + 2) + G(η(2n + 2), X(2n + 1))v(2n + 2),

X(2n + 2) = F(X(2n + 1), η(2n + 2)),

X(2n + 1) = (A1 + w(2n + 1)A2)X(2n)

where
F(x, η) = h(η, x)/‖h(η, x)‖
We write
Y2n = (Z(2k) : k ≤ n)
and find that

p(X(2n + 1)|Y_{2n}) = ∫ p(X(2n + 1)|X(2n))p(X(2n)|Y_{2n})dX(2n)

or equivalently,

E(φ(X(2n + 1))|Y_{2n}) = E[∫ φ((A1 + wA2)X(2n))p(w)dw|Y_{2n}]

p(X(2n)|Y_{2n}) = p(X(2n), Y_{2n−2}, Z(2n))/p(Y_{2n})

= ∫ p(Z(2n)|X(2n), X(2n − 1))p(X(2n)|X(2n − 1))p(X(2n − 1)|Y_{2n−2})dX(2n − 1)/p(Y_{2n})
Based on these equations, we could formally derive the EKF as follows:

X̂(2n + 1|2n) = A1 X̂(2n|2n)

X̂(2n + 2|2n) = Σ_a F(X̂(2n + 1|2n), a)q(a|X̂(2n + 1|2n))

X̂(2n + 2|2n + 2) = X̂(2n + 2|2n) + K(Z(2n + 2) − Ẑ(2n + 2|2n))

where
Ẑ(2n + 2|2n) = X̂(2n + 2|2n)
and the Kalman gain K is chosen so that

E[‖X(2n + 2) − X̂(2n + 2|2n + 2)‖²]

is a minimum. This amounts to minimizing

Tr(P(2n + 2|2n + 2))

where
P (2n + 2|2n + 2) =
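In the scalar case, the gain minimizing the mean square error above is the usual least squares value Cov(X̃, Z̃)/Var(Z̃), where X̃ and Z̃ denote the prediction and innovation errors. A Monte Carlo illustration with hypothetical joint statistics (not derived from the model above):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 100000
# hypothetical joint samples of prediction error X - Xhat and innovation Z - Zhat
xerr = rng.standard_normal(M)
innov = 0.8*xerr + 0.5*rng.standard_normal(M)

Kopt = np.mean(xerr*innov)/np.mean(innov*innov)   # least squares gain

def mse(Kg):
    # mean square error of the update Xhat + Kg*(Z - Zhat)
    return np.mean((xerr - Kg*innov)**2)

print("optimal gain", Kopt, "mse", mse(Kopt))
```

Any perturbation of the gain away from Kopt increases the empirical mean square error, which is the defining property used above.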

[19] Nonlinear quantum mechanics simulations using neural networks
Consider the two electron problem of the Helium atom. The Hartree-Fock
approximation involves approximating the potential generated by each electron
as a Coulomb potential associated with a charge density equal to −e times the
modulus square of the electron’s wave function. Thus, if ψk (t, r) denotes the
wave function of the k th electron for k = 1, 2, then the nonlinear Schrodinger
dynamics of the system of two electrons is approximated as

iψ_{1,t}(t, r) = (−1/2m)∇²ψ_1(t, r) − (2e²/r)ψ_1(t, r) + Ke²(∫ |ψ_2(t, r′)|²d³r′/|r − r′|)ψ_1(t, r)

and

iψ_{2,t}(t, r) = (−1/2m)∇²ψ_2(t, r) − (2e²/r)ψ_2(t, r) + Ke²(∫ |ψ_1(t, r′)|²d³r′/|r − r′|)ψ_2(t, r)

Formally, we can write these two equations as

iψ_1′(t) = H_0 ψ_1(t) + ε.V(ψ_2(t))ψ_1(t),

iψ_2′(t) = H_0 ψ_2(t) + ε.V(ψ_1(t))ψ_2(t)

We solve this system of two coupled differential equations using perturbation theory for differential equations. The idea is to treat the terms ε.V(ψ_k), k = 1, 2 as small perturbations and to apply time dependent perturbation theory:

ψ_k(t) = ψ_{k,0}(t) + ε.ψ_{k,1}(t) + ε².ψ_{k,2}(t) + ... = Σ_{m≥0} ε^m ψ_{k,m}(t)

Plugging this into the equations and using the prime notation for derivatives (with V′ denoting a variational derivative), we get a sequence of differential equations by equating coefficients of each power of ε:

iψ′_{1,0}(t) = H_0 ψ_{1,0}(t), iψ′_{2,0}(t) = H_0 ψ_{2,0}(t),

iψ′_{1,1}(t) = H_0 ψ_{1,1}(t) + V(ψ_{2,0}(t))ψ_{1,0}(t),

iψ′_{2,1}(t) = H_0 ψ_{2,1}(t) + V(ψ_{1,0}(t))ψ_{2,0}(t),

iψ′_{1,2}(t) = H_0 ψ_{1,2}(t) + V′(ψ_{2,0}(t)).ψ_{1,1}(t),

iψ′_{2,2}(t) = H_0 ψ_{2,2}(t) + V′(ψ_{1,0}(t))ψ_{2,1}(t),

etc.
Now to simulate this system, we design a 2-layer neural network. The zeroth
layer is the initial state ψ1 (0), ψ2 (0) and the
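Independently of any neural network realization, the perturbation hierarchy above can be integrated directly. The sketch below is my own toy instance: a small diagonal H_0 and a hypothetical Hartree-type coupling V(ψ) = diag(|ψ|²); it checks that including the first order term ε.ψ_{k,1} reduces the error relative to the exact nonlinear evolution.

```python
import numpy as np

d = 4
H0 = np.diag(np.arange(d, dtype=float))   # small diagonal free Hamiltonian (hypothetical)

def V(psi):
    # hypothetical Hartree-type mean field coupling
    return np.diag(np.abs(psi)**2)

eps, dt, steps = 0.05, 1e-3, 500
p1 = np.ones(d, complex)/np.sqrt(d)
p2 = np.zeros(d, complex); p2[0] = 1.0

# exact coupled nonlinear evolution (Euler steps)
e1, e2 = p1.copy(), p2.copy()
for _ in range(steps):
    e1, e2 = (e1 - 1j*dt*(H0 @ e1 + eps*V(e2) @ e1),
              e2 - 1j*dt*(H0 @ e2 + eps*V(e1) @ e2))

# perturbation hierarchy: zeroth order psi_{k,0} and first order psi_{k,1}
z1, z2 = p1.copy(), p2.copy()
f1 = np.zeros(d, complex); f2 = np.zeros(d, complex)
for _ in range(steps):
    f1 = f1 - 1j*dt*(H0 @ f1 + V(z2) @ z1)
    f2 = f2 - 1j*dt*(H0 @ f2 + V(z1) @ z2)
    z1 = z1 - 1j*dt*(H0 @ z1)
    z2 = z2 - 1j*dt*(H0 @ z2)

err0 = np.linalg.norm(e1 - z1)              # zeroth order error, O(eps)
err1 = np.linalg.norm(e1 - (z1 + eps*f1))   # including first order, O(eps^2)
print(err0, err1)
```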

[20] Neural network for simulating an N-electron atom using the Hartree-Fock equations derived from a variational principle.
For the N-electron atom with Hamiltonian

H = Σ_a H_a + Σ_{a<b} V_{ab}

where the H_a are identical copies of H_0 acting on the different components H_a of the tensor product Hilbert space and the V_{ab} are identical copies of V_{12} acting on the two fold tensor products H_a ⊗ H_b. We choose as our test wave function, in accordance with the Pauli exclusion principle, the following antisymmetric tensor product of wave functions:

|ψ> = C Σ_{σ∈S_N} sgn(σ)ψ_{σ1} ⊗ ... ⊗ ψ_{σN}

with the constraint that ψ_1, ..., ψ_N are orthonormal vectors. Then, with

H_0 = Σ_a H_a,

<ψ|H_0|ψ> = N C² Σ_{σ,ρ} sgn(σρ) <ψ_{σ1}|H_1|ψ_{ρ1}><ψ_{σ2}|ψ_{ρ2}>...<ψ_{σN}|ψ_{ρN}>

= N C² Σ_{σ,ρ} [sgn(σρ) <ψ_{σ1}|H_1|ψ_{ρ1}> δ(σ2, ρ2)...δ(σN, ρN)]

= N C² Σ_σ <ψ_{σ1}|H_1|ψ_{σ1}>

= N(N − 1)! C² Σ_a <ψ_a|H_1|ψ_a> = Σ_a <ψ_a|H_1|ψ_a>

since normalization gives C² = 1/N!.

Further, with

V = Σ_{a<b} V_{ab},

we have

<ψ|V|ψ> = (N(N − 1)/2) <ψ|V_{12}|ψ>

= (N(N − 1)/2) C² Σ_{σ,ρ} <ψ_{σ1} ⊗ ... ⊗ ψ_{σN}|V_{12}|ψ_{ρ1} ⊗ ... ⊗ ψ_{ρN}>

= (N(N − 1)/2) C² Σ_{σ,ρ} [sgn(σρ).<ψ_{σ1} ⊗ ψ_{σ2}|V_{12}|ψ_{ρ1} ⊗ ψ_{ρ2}>
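The first of these expectation formulas, <ψ|H_0|ψ> = Σ_a <ψ_a|H_1|ψ_a>, is easy to verify numerically for N = 2; the dimension and the random Hermitian H_1 below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
# random orthonormal psi_1, psi_2 and a random Hermitian one-body operator H1
Q, _ = np.linalg.qr(rng.standard_normal((d, 2)) + 1j*rng.standard_normal((d, 2)))
psi1, psi2 = Q[:, 0], Q[:, 1]
A = rng.standard_normal((d, d)) + 1j*rng.standard_normal((d, d))
H1 = (A + A.conj().T)/2

# antisymmetric (Slater) state |psi> = (psi1 x psi2 - psi2 x psi1)/sqrt(2)
psi = (np.kron(psi1, psi2) - np.kron(psi2, psi1))/np.sqrt(2)
I = np.eye(d)
H0 = np.kron(H1, I) + np.kron(I, H1)   # H0 = sum_a H_a

lhs = np.vdot(psi, H0 @ psi).real
rhs = (np.vdot(psi1, H1 @ psi1) + np.vdot(psi2, H1 @ psi2)).real
print(lhs, rhs)
```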

[21] Using entangled states to communicate between the interior and exterior of a Schwarzschild black hole.
Let ψ_0(r) be the initial state of the system and ψ(t, r) the state after time t. The Schrodinger equation for the state is based on replacing the Laplacian by the Laplace-Beltrami operator w.r.t the spatial metric

γ_{ab} = (g_{0a}g_{0b} − g_{00}g_{ab})/g_{00}

This spatial metric is derived as follows. Let a light pulse start at time t from r and arrive at time t + dt_1 at r + dr. Likewise, it starts at time t from r + dr and arrives at time t + dt_2 at r. It now easily follows that dt_1 is the positive root of the quadratic equation

g_{00}dt² + 2g_{0a}dt dx^a + g_{ab}dx^a dx^b = 0

while dt_2 is the positive root of

g_{00}dt² − 2g_{0a}dt dx^a + g_{ab}dx^a dx^b = 0

Thus,

dt_1 = [−g_{0a}dx^a + √((g_{0a}g_{0b} − g_{00}g_{ab})dx^a dx^b)]/g_{00}

and

dt_2 = [g_{0a}dx^a + √((g_{0a}g_{0b} − g_{00}g_{ab})dx^a dx^b)]/g_{00}
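As a sanity check, dt_1 can be verified numerically to be a root of the first quadratic; the metric components below are arbitrary values with signature (+, −, −, −):

```python
import numpy as np

rng = np.random.default_rng(5)
g00 = 1.2
g0 = 0.1*rng.standard_normal(3)              # g_{0a}, hypothetical values
gsp = -np.eye(3) + 0.05*rng.standard_normal((3, 3))
gsp = (gsp + gsp.T)/2                        # spatial g_{ab}, near -delta_ab
dx = 0.01*rng.standard_normal(3)

b = g0 @ dx
c = dx @ gsp @ dx
disc = b*b - g00*c                           # (g0a g0b - g00 gab) dx^a dx^b
dt1 = (-b + np.sqrt(disc))/g00               # positive root of the quadratic
resid = g00*dt1**2 + 2*b*dt1 + c
print(dt1, resid)
```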

[22] Proof of the Knill-Laflamme theorem

C is the code subspace and P is the orthogonal projection onto C. N is the noise manifold. We say that C corrects N if there exist operators R_k, k = 1, 2, ..., L such that

Σ_{k=1}^{L} R_k* R_k = I,

and

R_k N|ψ> = λ_k(N, ψ)|ψ> ∀k = 1, 2, ..., L, ∀|ψ> ∈ C, N ∈ N − − − (1)

Note that this condition is equivalent to requiring that

Σ_{k=1}^{L} R_k N|ψ><ψ|N* R_k* = λ(N, ψ)|ψ><ψ| ∀ψ ∈ C, ∀N ∈ N − − − (2)

Indeed it is clear that (1) implies (2). Conversely, suppose (2) holds. Then for any |φ> ⊥ |ψ>, we have

Σ_k <φ|R_k N|ψ><ψ|N* R_k*|φ> = 0

or equivalently,

Σ_k |<φ|R_k N|ψ>|² = 0

and hence
<φ|R_k N|ψ> = 0 ∀k, ∀|φ> ⊥ |ψ>
Thus,
R_k N|ψ> = λ_k(N, ψ)|ψ>
proving the claim.
Remark: It is easy to see that if C corrects N with R_k, k = 1, 2, ..., L as recovery operators, then for any mixed state ρ whose range is contained in C, and for any N_m ∈ N, m = 1, 2, ..., K, we have that

Σ_k R_k (Σ_m N_m ρ N_m*) R_k* = λ.ρ

and the converse is also trivially true. In fact, more generally, it is immediate to see that if T is a quantum noisy channel of the form

T(ρ) = Σ_k N_k ρ N_k*, N_k ∈ N, Σ_k N_k* N_k = I

then C is N-correcting with recovery operators R_k, k = 1, 2, ..., L iff for every such T,

Σ_k R_k.T(ρ).R_k* = λρ

The only if part is immediate. To prove the if part, let N_1 ∈ N. We must assume that there exist operators N_2, ..., N_L ∈ N such that

[N_1*, ..., N_L*][N_1, N_2, ..., N_L]^T = I

ie

Σ_{k=1}^{L} N_k* N_k = I

Then the proof of the theorem becomes straightforward.


The Knill-Laflamme theorem: C is N -correcting iff

P N1∗ N2 P = λ(N1∗ N2 )P, ∀N1 , N2 ∈ N


Note that from the above definition of λ, the map (N1 , N2 ) → λ(N1∗ N2 ) from
N × N into C is a semi-inner product, ie, λ(N ∗ N ) ≥ 0, it is conjugate linear in
the first argument and linear in the second argument and λ̄(N1∗ N2 ) = λ(N2∗ N1 ).
We then define the linear manifold

N0 = {N ∈ N : λ(N ∗ N ) = 0}

It is easy to see that N0 is a vector space, ie, a linear manifold, since N1 , N2 ∈ N0


implies
λ(N3∗ (c1 N1 + c2 N2 )) = c1 λ(N3∗ N1 ) + c2 λ(N3∗ N2 )
∀N3 ∈ N and using the Cauchy-Schwarz inequality for semi-inner products, we
get
|λ(N3∗ N1 )|2 ≤ λ(N3∗ N3 ).λ(N1∗ N1 ) = 0
ie,
λ(N3∗ N1 ) = 0
and likewise
λ(N3∗ N2 ) = 0
thus yielding
λ(N3∗ (c1 N1 + c2 N2 )) = 0∀N3 ∈ N

and taking
N3 = c 1 N 1 + c 2 N 2
results in
c1 N1 + c2 N2 ∈ N0
Now let M = N /N0 and define

< Ñ1 , Ñ2 >= λ(N1∗ N2 ), Ñ1 = N1 + N0 , Ñ2 = N2 + N0 , N1 , N2 ∈ N

This is a well defined inner product on M. Indeed, if

N_1′ = N_1 + M_1, N_2′ = N_2 + M_2, M_1, M_2 ∈ N_0

then we get
λ(N_1′* N_2′) = λ(N_1* N_2)
because
|λ(N_1* M_2)|² ≤ λ(N_1* N_1).λ(M_2* M_2) = 0
ie,
λ(N_1* M_2) = 0
and likewise,
λ(M_1* N_2) = λ(M_1* M_2) = 0
The positive definiteness of < ., . > follows from the following: Let N ∈ N, N ∉ N_0, Ñ = N + N_0. Then
<Ñ, Ñ> = λ(N* N) > 0
Now choose an onb Ñ1 , ...Ñp for M. We can write

Ñk = Nk + N0 , k = 1, 2, ..., p

Then obviously

λ(Nk∗ Nj ) =< Ñk , Ñj >= δkj , k, j = 1, 2, ..., p

Define

P_k = N_k P N_k*, k = 1, 2, ..., p,  Q = 1 − Σ_{k=1}^{p} P_k

Then,

P_k* = P_k, P_k P_j = N_k P N_k* N_j P N_j* = λ(N_k* N_j)N_k P N_j* = δ_{kj} P_k

and hence {P_k : k = 1, 2, ..., p} ∪ {Q} is an orthogonal resolution of the identity. Define

R_k = P N_k*, k = 1, 2, ..., p, R_{p+1} = Q

We have

Σ_{k=1}^{p+1} R_k* R_k = Σ_{k=1}^{p} N_k P N_k* + Q = Σ_{k=1}^{p} P_k + Q = I,

and secondly, if |ψ> ∈ C, then

R_k N_j|ψ> = P N_k* N_j P|ψ> = δ_{kj} P|ψ> = δ_{kj}|ψ>, k, j = 1, 2, ..., p,

R_{p+1} N_j|ψ> = Q N_j|ψ>

= N_j|ψ> − Σ_{k=1}^{p} N_k P N_k* N_j P|ψ> = N_j|ψ> − Σ_{k=1}^{p} δ_{kj} N_k P|ψ>

= N_j|ψ> − N_j|ψ> = 0, j = 1, 2, ..., p


Further, if N_0 ∈ N_0, then

‖R_k N_0|ψ>‖² = <ψ|N_0* R_k* R_k N_0|ψ>

= <ψ|N_0* N_k P P N_k* N_0|ψ> = <ψ|P N_0* N_k P.P N_k* N_0 P|ψ>

= |λ(N_0* N_k)|² <ψ|ψ> = 0, k = 1, 2, ..., p

since

|λ(N_0* N_k)|² ≤ λ(N_0* N_0).λ(N_k* N_k) = 0

Finally,
R_{p+1} N_0|ψ> = 0
since
N_0|ψ> = 0
because

‖N_0|ψ>‖² = <ψ|P N_0* N_0 P|ψ> = λ(N_0* N_0)<ψ|ψ> = 0

Note that we are taking |ψ> ∈ C, so P|ψ> = |ψ> is true. This completes the proof of the Knill-Laflamme theorem.
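As a concrete check of the hypothesis of the theorem (my own illustration, not from the text), the 3-qubit bit flip code with code space span{|000>, |111>} and noise manifold spanned by I, X_1, X_2, X_3 satisfies P N_1* N_2 P = λ(N_1* N_2)P:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

# code space C = span{|000>, |111>}; P is the orthogonal projection onto C
e0 = np.zeros(8); e0[0] = 1.0
e7 = np.zeros(8); e7[7] = 1.0
P = np.outer(e0, e0) + np.outer(e7, e7)

# noise manifold spanned by the identity and the single bit flips X1, X2, X3
noise = [kron3(I2, I2, I2), kron3(X, I2, I2), kron3(I2, X, I2), kron3(I2, I2, X)]

# Knill-Laflamme condition: P N1* N2 P = lambda(N1* N2) P for all N1, N2
dev = 0.0
for N1 in noise:
    for N2 in noise:
        M = P @ N1.T.conj() @ N2 @ P
        lam = np.trace(M)/np.trace(P)
        dev = max(dev, np.max(np.abs(M - lam*P)))
print("max deviation from lambda*P:", dev)
```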

[23] Solve Dirac’s equation in a radial potential

(E + eV (r) − m)φ = (σ, p)χ, (E + eV (r) + m)χ = (σ, p)φ

These equations can be derived from

(σ, p)φ1 + mφ2 = (E + eV )φ1 , −(σ, p)φ2 + mφ1 = (E + eV )φ2


Advanced Probability and Statistics: Applications to Physics and Engineering 387

Indeed, adding and subtracting these equations gives us

(σ, p)(φ1 − φ2 ) + m(φ1 + φ2 ) = (E + eV )(φ1 + φ2 ),

(σ, p)(φ1 + φ2 ) − m(φ1 − φ2 ) = (E + eV )(φ1 − φ2 )


so that writing √ √
φ = (φ1 + φ2 )/ 2, χ = (φ1 − φ2 )/ 2
gives us the above transformed Dirac equation. Now write

φ = [c_1 Y_{l,m1}(r̂), c_2 Y_{l,m2}(r̂)]^T u(r),

χ = (σ, r)[d_1 Y_{l,m1}(r̂), d_2 Y_{l,m2}(r̂)]^T v(r)


where
m1 = jz − 1/2, m2 = jz + 1/2
Then, we get using
(σ, p)(σ, r) = (p, r) + (σ, L)
where
(p, r) = px x + py y + pz z = (r, p) − 3i = rpr − 3i
where
pr = −id/dr
Also, in terms of ladder operators,

(σ, L) = σx Lx + σy Ly + σz Lz

= σx (L+ + L− )/2 + σy (L+ − L− )/2i + σz Lz

= σ− L+ + σ+ L− + σz Lz
where

σ_+ = (σ_x + iσ_y)/2 = [[0, 1], [0, 0]],

σ_− = (σ_x − iσ_y)/2 = [[0, 0], [1, 0]]
Thus,
(σ, p)χ =
(σ, p)(σ, r)[d1 Yl,m1 (r̂), d2 Yl,m2 (r̂)]T v(r)
= [(rpr − 3i)v(r) + v(r)(σ+ L− + σ− L+ + σz Lz )][d1 Yl,m1 , d2 Yl,m2 ]T
= (rpr − 3i)v(r)[d1 Yl,m1 , d2 Yl,m2 ]T + v(r)[d2 L− Yl,m2 , d1 L+ Yl,m1 ]T
+v(r)[d1 Yl,m1 , −d2 Yl,m2 ]T
= (rpr − 3i)v(r)[d1 Yl,m1 , d2 Yl,m2 ]T + v(r)[d2 b(l, m2 )Yl,m1 , d1 a(l, m1 )Yl,m2 ]T
= (E + eV − m)φ = (E + V − m)u(r)[c1 Yl,m1 , c2 Yl,m2 ]T

This gives us two equations:

c1 (E + eV − m)u(r) = d1 (rpr − 3i)v(r) + (d1 + d2 b(l, m2 ))v(r)

c2 (E + eV − m)u(r) = d2 (rpr − 3i)v(r) + (d1 a(l, m1 ) − d2 )v(r)


Likewise,
(σ, p)φ = (σ, p)[c1 Yl,m1 , c2 Yl,m2 ]T u(r)
= (E + eV + m)χ = (E + eV + m)(σ, r)[d1 Yl,m1 , d2 Yl,m2 ]T v(r)
which gives on premultiplying by (σ, r):

(rpr − (σ, L))u(r)[c1 Yl,m1 , c2 Yl,m2 ]T =

r2 (E + eV + m)v(r)[d1 Yl,m1 , d2 Yl,m2 ]T


or equivalently,

(rpr − σ+ L− − σ− L+ − σz Lz )u(r)[c1 Yl,m1 , c2 Yl,m2 ]T =

r2 (E + eV + m)v(r)[d1 Yl,m1 , d2 Yl,m2 ]T


or equivalently,

c1 rpr u(r) − (c2 b(l, m2 ) + c1 )u(r) = d1 r2 (E + eV + m)v(r),

c2 rpr u(r) − (c1 a(l, m1 ) − c2 )u(r) = d2 r2 (E + eV + m)v(r)


Collecting all the four of these equations in one place, we write:

c1 (E + eV − m)u(r) = d1 (rpr − 3i)v(r) + (d1 + d2 b(l, m2 ))v(r) − − − (1)

c2 (E + eV − m)u(r) = d2 (rpr − 3i)v(r) + (d1 a(l, m1 ) − d2 )v(r) − − − (2)


c1 rpr u(r) − (c2 b(l, m2 ) + c1 )u(r) = d1 r2 (E + eV + m)v(r) − − − (3)
c2 rpr u(r) − (c1 a(l, m1 ) − c2 )u(r) = d2 r2 (E + eV + m)v(r) − − − (4)
Note that a(l, m), b(l, m) are defined by the equations:

L+ Yl,m = a(l, m)Yl,m+1 , L− Yl,m = b(l, m)Yl,m−1

These are four ordinary differential equations for two functions u(r), v(r) and
hence for consistency, some relations between the constants c1 , c2 , d1 , d2 are
required. These are as follows: For (1) and (2) to correspond to the same
equation, we need

d1 /c1 = d2 /c2 , (d1 + d2 b(l, m2 ))/c1 = (d1 a(l, m1 ) − d2 )/c2 ,

and for (3) and (4) to correspond to the same equation, we need

d1 /c1 = d2 /c2 , (c1 + c2 b(l, m2 ))/c1 = (c1 a(l, m1 ) − c2 )/c2



Let
α = d_1/c_1 = d_2/c_2, β = c_2/c_1
Then, the above conditions are equivalent to

2 + βb(l, m_2) − (1/β)a(l, m_1) = 0 − − − (5)

2 + βb(l, m_2) − a(l, m_1)/β = 0

both of which are the same equation. Thus, β is determined from (5), and then
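Assuming the standard ladder coefficients for normalized spherical harmonics (ℏ = 1), a(l, m) = √((l − m)(l + m + 1)) and b(l, m) = √((l + m)(l − m + 1)), the consistency condition is the quadratic b(l, m_2)β² + 2β − a(l, m_1) = 0 in β, which can be solved directly:

```python
import numpy as np

# standard ladder coefficients (hbar = 1), taken as assumptions:
# L+ Y_{l,m} = a(l,m) Y_{l,m+1},  L- Y_{l,m} = b(l,m) Y_{l,m-1}
def a(l, m): return np.sqrt((l - m)*(l + m + 1))
def b(l, m): return np.sqrt((l + m)*(l - m + 1))

l, jz = 2, 0.5
m1, m2 = jz - 0.5, jz + 0.5

# b(l,m2)*beta^2 + 2*beta - a(l,m1) = 0, positive root
A_, B_, C_ = b(l, m2), 2.0, -a(l, m1)
beta = (-B_ + np.sqrt(B_*B_ - 4*A_*C_))/(2*A_)
resid = 2 + beta*b(l, m2) - a(l, m1)/beta    # should vanish
print(beta, resid)
```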

[24] Syllabus for a course on quantum computation

[1] Classical and quantum probability spaces.


[2] Events, observables and states in quantum probability.
[3] Schrodinger’s equation in finite and infinite dimensional Hilbert spaces.
[4] Heisenberg’s matrix mechanics and Dirac’s interaction picture represen-
tation of quantum dynamics in a perturbing potential.
[4] Schrodinger’s equation in the presence of classical and quantum noise.
[5] The basic finite dimensional unitary gates: CNOT, SWAP, phase, Hadamard,
Fredkin and Toffoli gates.
[6] The quantum Fourier transform gate (QFT).
[7] The notion of PVM and POVM measurements in quantum mechanics.
How measurement causes a state collapse to occur.
[8] Entropy in classical and quantum probability.
[9] The basic quantum classical (Cq) coding theorem.
[10] Schumacher noiseless compression in quantum computing.
[11] The Dyson series expansion for quantum evolution in a time dependent
potential.
[12] Design of quantum unitary gates using physical systems, ie, by perturb-
ing physical Hamiltonians appropriately so that after time T , the Schrodinger
evolution operator matches a desired unitary operator.
[13] Noise performance of quantum gate design algorithms. When a small
noisy potential corrupts the Hamiltonian, then how we would evaluate the mean
square gate matrix design error.
[14] Transmission of a quantum state using entanglement: Quantum tele-
portation with examples.
[15] Phase estimation using the quantum Fourier transform.
[16] Shor’s order finding algorithm using the quantum Fourier transform.
[17] Grover’s search algorithm based on oracles and reflection operators.
[18] Proof of the lower computational complexity of the quantum Fourier
transform and Grover’s search algorithm as compared to the corresponding clas-
sical algorithms.
[19] Bell’s inequalities leading to the proof of impossibility of hidden variables
classical probabilistic model for quantum mechanics.
[20] The notion of an error correcting code in quantum information theory
and the Knill-Laflamme theorem.
[21] Examples of error correcting codes using the Weyl operator and imprim-
itivity systems.
[22] Design of quantum gates using the Feynman diagrammatic approach
for scattering amplitudes of electrons and positrons interacting with an external
electromagnetic source.
[23] Design of quantum gates using non-Abelian and super-symmetric La-
grangians and Hamiltonians.
[24] Design of quantum gates using quantization of classical string theories
based on the Feynman path integral.
[25] Implementing the Hudson-Parthasarathy quantum stochastic noisy Schrodinger equation on a quantum computer by means of interaction of quantum electromagnetic waves with a system Hamiltonian.
[26] Belavkin’s quantum filtering theory as a non-commutative generalization
of classical Kushner-Kallianpur non-linear filtering theory.
[27] Application of Belavkin’s quantum filtering equations to estimating the
spin of an electron.
[28] The interaction of Dirac’s relativistic Hamiltonian with a quantum noisy
electromagnetic fields described in the Hudson-Parthasarathy qsde formalism.
[29] Design of quantum gates using the matching generator method applied to perturbations of a 3-D charged harmonic oscillator by a classical and quantum electromagnetic field.
[30] Design of a quantum receiver by generating a current by modulating the
symbol sequence with a PAM pulse, allowing the electromagnetic field gener-
ated by the current to fall on an atom and then take repeated measurements on
the atomic states at discrete times and then by taking into account the collapse
postulate following a measurement, calculating the joint probability of the mea-
surement outcomes and maximizing this probability w.r.t the current symbol
sequence.
[31] Quantum hypothesis testing: Discriminating between two states by an
optimum POVM.
[32] Large deviation techniques in quantum mechanics. Computing the rate function for the optimal asymptotic false alarm probability when the two states are multiple tensor product states of the form ρ^{⊗n} and σ^{⊗n}. Expressing this asymptotic false alarm probability in terms of the information theoretic distance between the two states.
[33] Application of quantum scattering theory to the design of quantum
gates.

[25] The Knill-Laflamme theorem

Let N be the noise manifold and P denote the orthogonal projection onto the code subspace C. We assume the conditions of the Knill-Laflamme theorem hold:

P N_2* N_1 P = λ(N_2* N_1)P, N_1, N_2 ∈ N

This is equivalent to stating that if |ψ_k>, k = 1, 2, ..., p is an onb for C, then (a) <ψ_k|N_2* N_1|ψ_k> is independent of k for fixed N_1, N_2 ∈ N and (b) <ψ_k|N_2* N_1|ψ_m> = 0 for k ≠ m. Now we wish to construct recovery operators R_k so that R_k N|ψ> is proportional to |ψ> for all |ψ> ∈ C, N ∈ N, and secondly Σ_k R_k* R_k = I. Now let {|ψ_{kj}> : 1 ≤ j ≤ r} be an onb for N|ψ_k>. Note that dim N|ψ_k> = r is independent of k because the map N|ψ_1> → N|ψ_k> from N|ψ_1> onto N|ψ_k> is a well defined isometry for each k = 2, 3, ..., p and hence unitary in view of the above stated Knill-Laflamme conditions. Indeed, denoting this isometry by U_k, we see that U_k is well defined because N|ψ_1> = N′|ψ_1> for any N, N′ ∈ N implies (N − N′)|ψ_1> = 0, which implies ‖(N − N′)|ψ_1>‖² = 0, which implies λ((N − N′)*(N − N′)) = 0, which implies ‖(N − N′)|ψ_k>‖² = 0, ie, N|ψ_k> = N′|ψ_k>. The surjectivity of U_k is obvious. The isometry property of U_k also follows from the Knill-Laflamme conditions, namely ‖N|ψ_1>‖² = ‖N|ψ_k>‖² = λ(N* N). Now define E_j to be the orthogonal projection onto span{|ψ_{lj}> : l = 1, 2, ..., p} and let V_j denote the operator that maps |ψ_{lj}> to |ψ_l> for each l = 1, 2, ..., p and maps span{|ψ_{lk}> : 1 ≤ l ≤ p, 1 ≤ k ≤ r}^⊥ to zero. Note that {|ψ_{lk}> : 1 ≤ l ≤ p, 1 ≤ k ≤ r} is an orthonormal set by virtue of the Knill-Laflamme conditions. We note that

|ψ_{lj}> = U_l|ψ_{1j}>

We now define

R_j = V_j E_j, j = 1, 2, ..., p
and observe that if |ψ> ∈ C and N ∈ N, then

R_j N|ψ> = Σ_{l,m} R_j|ψ_{lm}><ψ_{lm}|N|ψ>

= Σ_{l,m} V_j E_j|ψ_{lm}><ψ_{lm}|N|ψ>

= Σ_l V_j|ψ_{lj}><ψ_{lj}|N|ψ> = Σ_l |ψ_l><ψ_{lj}|N|ψ>

Now, it is clear from the above construction that we can write

|ψ_{lj}> = N_{lj}|ψ_l>

We wish to show that the noise operators N_{lj} can be chosen to be independent of the index l. In fact, we have

|ψ_{1j}> = N_{1j}|ψ_1>

and then if we choose N_{lj} = N_{1j}, then we get that

<ψ_{lj}|ψ_{lk}> = <ψ_l|N_{1j}* N_{1k}|ψ_l> = λ(N_{1j}* N_{1k}) = <ψ_{1j}|ψ_{1k}> = δ_{jk}

In other words, we first choose an onb |ψ_{1j}> = N_{1j}|ψ_1>, j = 1, 2, ..., r for N|ψ_1> and then the above argument shows that |ψ_{lj}> = N_{1j}|ψ_l>, j = 1, 2, ..., r will automatically be an onb for N|ψ_l>. We then get

R_j N|ψ> = Σ_l |ψ_l><ψ_{lj}|N|ψ> = Σ_l |ψ_l><ψ_l|N_{1j}* N|ψ>

= Σ_l |ψ_l> λ(N_{1j}* N)<ψ_l|ψ> = λ(N_{1j}* N)|ψ> = a(j, N)|ψ>

proving the reconstruction property for {R_j}. Further, we have from the above that

Σ_j <φ|N_2* R_j* R_j N_1|ψ> = Σ_j ā(j, N_2)a(j, N_1)<φ|ψ>

for any |φ>, |ψ> ∈ C. Now,

a(N_1, N_2) = Σ_j ā(j, N_2)a(j, N_1) = Σ_j λ̄(N_{1j}* N_2)λ(N_{1j}* N_1)

on the one hand, while on the other,

λ(N_2* N_1) = <ψ_1|N_2* N_1|ψ_1> = Σ_j <ψ_1|N_2*|ψ_{1j}><ψ_{1j}|N_1|ψ_1>

= Σ_j <ψ_1|N_2* N_{1j}|ψ_1><ψ_1|N_{1j}* N_1|ψ_1>

= Σ_j λ(N_2* N_{1j})λ(N_{1j}* N_1) = Σ_j ā(j, N_2).a(j, N_1) = a(N_1, N_2)

Thus,

Σ_j <φ|N_2* R_j* R_j N_1|ψ> = λ(N_2* N_1)<φ|ψ> = <φ|N_2* N_1|ψ>

This is the same as saying that Σ_{j=1}^{p} R_j* R_j equals I on N C, ie on the subspace span{N|ψ> : N ∈ N, |ψ> ∈ C}. We can now add another operator R_{p+1} so that it is zero on N C and R_{p+1}* R_{p+1} = I on (N C)^⊥, thereby guaranteeing the reconstruction property that R_m N|ψ> is proportional to |ψ> for all m = 1, 2, ..., p + 1 and simultaneously

Σ_{m=1}^{p+1} R_m* R_m = I

This completes the proof of the Knill-Laflamme theorem.

[26] Simulation of quantum gates by perturbing an atom with a


magnetic field, both classical and quantum
Reference: Mahendra Gupta, M.Tech thesis, NSUT.
Chapter 13

Aspects of Circuit Theory and Device Physics

[1] Derive, using the Kronig-Penney model for the periodic potential of a lattice, the Bloch wave functions and hence the existence of energy bands in a solid.
[2] Derive the total diffusion plus drift current in a doped semiconductor
in the presence of an external electric field. What does the equation of charge
conservation/continuity give for the density of electrons/holes in the presence
of a charge generation term.
[3] When a potential is applied across a doped pn junction semiconductor
with the space charge regions having definite widths on both sides of the junc-
tion, then write down Poisson’s equation for the potential and evaluate the space
charge widths in terms of the applied potential and the concentration of donors
and acceptors.
[4] Prove using the Gibbs distribution that the current in a pn junction diode is given by I = I_0(exp(eV/kT) − 1).
[5] If in a material the potential is V(r), then by the Gibbs distribution principle, the charge density is ρ_0 exp(−qV(r)/kT) and V satisfies Poisson's equation

∇²V(r) = −(ρ_0/ε).exp(−qV(r)/kT)

Obtain a perturbative series solution for this equation.
[6] Derive the conductivity of a plasma in a weak electric field using the
Boltzmann kinetic transport equation.
[2] Ph.D thesis report for the thesis "Design of voltage/current mode analog circuits using second generation current conveyor"
The author begins by noting that current mode designs of amplifiers are
based on replacing biasing voltages and other dc voltages by biasing dc currents
and other dc currents. Voltage mode based amplifier/oscillator designs are usu-
ally carried out using BJT transistors and nonlinear and small signal analysis
of such circuits are based on the Ebers-Moll model for the transistor. On the


other hand, current mode designs are usually carried out using MOSFET transistors. The reason for this may kindly be highlighted at the time of the final presentation. The candidate's argument must be based on comparison of the Ebers-Moll model for the BJT with the following model for MOSFETs:
BJT:

IC = IC0 (exp((VB − VC )/VT ) − 1) − αC IE0 (exp((VB − VE )/VT ) − 1),

IE = IE0 (exp((VB − VE )/VT ) − 1) − αE IC0 (exp((VB − VC )/VT ) − 1),


MOSFET:
IG = 0, ID = IS = K(VG − VS − VP P )2
The generation of dc current sources is based on current mirrors constructed us-
ing MOSFETs as shown in fig.2.2a. The generated dc currents should be stable
to temperature fluctuations so that these sources can be used as biasing sources
in MOSFET amplifiers as shown in figs 2.2 b,c and d. For this, impedance of
the source must be very high and it should not fluctuate w.r.t temperature. For
example if we have a voltage source Vi with a high impedance Ri and this source
is connected to a load RL , then the current flowing through the load is

Vi
I=
Ri + RL

and in the limit when RL → ∞, Vi → ∞, Vi /RL → I0 we get I = I0 , ie the


current flowing through the load is independent of its impedance RL . The
author may choose to include using nonlinear MOSFET analysis how current
mode designs have the advantage of constant current gain at higher frequencies
where the voltage gain of a voltage mode circuit falls. For this purpose, the
author must explain during the presentation how one draws the small signal
equivalents of BJT’s and MOSFET’s by calculating using the standard nonlinear
two port models the g parameters of both and then calculating the small signal
voltage gain for amplifiers designed using the former and the small signal current
gain for amplifiers designed using the latter. The g-parameters of the BJT and
MOSFET must respectively be based on dc voltage and dc current biasing. The
author mentions that current mode circuits show higher slew rates as compared
to voltage mode circuits. The argument given on pp 17-18 requires some rigorous
mathematical justification. The author also mentions that voltage mode circuits
are more sensitive to supply/biasing voltage fluctuations as compared to current
mode circuits. For fig.2.4, the author may compute the output voltage v0 as a
function of time with input square wave to determine the slew rate theoretically.
Specifically, if J is the biasing current and C0 is the output capacitance with
vi as the input voltage, then we have from KCL and the standard two port
MOSFET model, with I1 as the source current,

I1 (t) = K(vi (t) − vo (t) − VP P )2

dvo (t)
J − I1 (t) = C0
dt

so that
C0 dvo(t)/dt = J − K.(vi(t) − vo(t) − VPP)²
dt
Solving this differential equation for vo(t) with vi(t) as a square wave gives us immediately the slew-rate. On the other hand, for a voltage mode BJT amplifier circuit with one collector resistance and one load capacitance, the KCL gives, with I1(t) as the collector current,

I1(t) = f(vi(t), vo(t)),

C.dvo(t)/dt + I1(t) = (Vcc − vo)/R

or equivalently,
C.dvo/dt = (Vcc − vo)/R − f(vi, vo)
which shows that dvo/dt has, apart from the transistor Ebers-Moll nonlinear function f, a linear term (Vcc − vo)/R which decreases the slew-rate.
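A toy integration of the current mode stage (all component values hypothetical, and with the sign convention of a source follower whose MOSFET current charges the output node while the constant bias J discharges it, so that the falling edge is slew limited to J/C0):

```python
# toy slew rate integration (hypothetical values); sign convention: the MOSFET
# current charges the output node and the constant bias J discharges it
K, VPP, J, C0 = 1e-3, 1.0, 1e-4, 1e-9
dt, period = 1e-9, 1e-5
steps = int(2*period/dt)

def vi(t):
    # square wave input
    return 2.0 if (t % period) < period/2 else 0.0

vo = 0.0
fall_slew = 0.0
for k in range(steps):
    t = k*dt
    u = vi(t) - vo - VPP
    I1 = K*max(u, 0.0)**2            # transistor cuts off for u < 0
    dvo = (I1 - J)/C0
    fall_slew = max(fall_slew, -dvo) # fastest observed discharge rate
    vo += dt*dvo

print("falling slew rate:", fall_slew, " bias limited bound J/C0 =", J/C0)
```

The bias-limited bound J/C0 on the falling edge is what makes the slew rate of the current mode stage independent of the signal amplitude on that edge.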
Verify all the points mentioned on p.22 for a current mirror. Specifically verify the high bandwidth property, high slew rate, zero input impedance and infinite output impedance. Your verifications must be based on the two port nonlinear model of the MOSFETs taking into account resistances and capacitances between gate and source and gate and drain. These parasitics prevent IG = 0. Instead the two port model becomes

IS = K(VG − VS − VP)², ID = IG + IS,

IG = CGS.d(VG − VS)/dt + (VG − VS)/RGS + CGD.d(VG − VD)/dt + (VG − VD)/RGD

Some clear equations need to be given for two port models of MOSFETs designed using SOI technology. Issues like leakage current, sensitivity, stability of current source to temperature fluctuations etc. need to be addressed.
Derive the squarer equation given on p.67 from the circuit given in fig.4.4 from first principles. Explain clearly the derivation. For each MOSFET transistor, use only the algebraic relations and then explain how the capacitances between gate and source and gate and drain introduce transient effects in the squarer, ie, we get the squarer plus some small memory terms.
In the fifth chapter, the author states that low power circuits can be designed
using some special kinds of current conveyors. Two kinds of current conveyors
are compared, one BD and two BDQFG. Some reasons for preferring the latter
over the former are given. Some theoretical proofs of these may be presented
during the viva-voce exam. The basic equations given by the author regarding a
current conveyor are that there are two input nodes and two output nodes and
that the current in the output nodes must match that in one of the input nodes
and the author states that one of the input nodes must have a low impedance
while the other input and both the output nodes must have very high impedance.
Why such a requirement is imposed must be stated.
In conclusion, I congratulate the author for having proposed many new tech-
niques regarding the design of current mode transistor circuits as a preference
398 Advanced Probability and Statistics: Applications to Physics and Engineering

to voltage mode circuits. I must say that the author has put in a lot of effort
to analyze and design current mode circuits using standard circuit software.
This study will motivate other researchers to study current mode based cir-
cuits more carefully and try to get improved two port models for MOSFETs
and apply these models to analyzing MOSFET circuits. The candidate well
deserves a PhD degree for this mammoth task. Before awarding her the degree,
I would however appreciate it if she can present some partial answers to the
above queries.

Remark: The Ph.D. student for this thesis was Mrs. Bindu Thakral and her supervisor was Dr. Arthi Vaish.
Chapter 14

Index on the Contents and Notes
[1] A survey of the pedagogical work of V.S.Varadarajan.
[2] The scientific contributions of the Indian school of probability. Summary
of the work of C.R.Rao and S.R.S.Varadhan.
C.R.Rao’s work on
[a] The Cramer-Rao lower bound on the minimum possible variance of any
estimator of a parameter.
[b] Sufficient statistics.
[c] Asymptotic efficiency of a statistical estimator.
[d] Use of generalized inverses in solving least squares problems.
S.R.S.Varadhan’s work with D.W.Stroock on
[a] Martingale characterization of diffusion processes.
[b] Martingale formulation of Ito processes and Ito processes as a general-
ization of solutions to Ito’s stochastic differential equations. Giving meaning to
Ito sde’s even when the coefficients do not satisfy the Lipshitz conditions.
S.R.S Varadhan’s work with M.D.Donsker on
[a] The large deviation principle for Markov processes. Generalization of Kac's formula for the maximum eigenvalue of the operator (1/2)d²/dx² + V(x) using Brownian motion to other stochastic processes using the variational form
of the LDP. This involves using the rate function for the empirical distribution
of the process.
[b] Formulation of the large deviation principle as a variational problem that
characterizes the low temperature limit of the partition function in classical
statistical mechanics.
[c] The ldp applied for Brownian local time.
[d] Proof of Pekar’s conjecture.
[e] Asymptotics of the Wiener sausage.
S.R.S. Varadhan’s work with Guo, Papanicolau, Yau, Donsker, Quastel,
Rezakhanlou and others on


[a] Hydrodynamical scaling limit for interacting diffusions.


[b] Hydrodynamical scaling limit for the simple exclusion process on a lattice.
[c] Entropy methods in hydrodynamical scaling.
[d] Simple exclusion models for particles of different colour.
[e] LDP for the simple exclusion process. How much will the empirical den-
sity of the system of particles deviate from that obeying the non-linear Burger’s
differential equation ?
[f] LDP for random graphs. For example, consider a random graph in which we have a two dimensional finite grid of points with a probability p that an edge connects two points. We then wish to estimate the number of polygons having r sides.
[g] Random walks in random environment. For example, the environment
can be a one dimensional lattice on which the walk takes place. The probability
that a jump from site n to the site n + k takes place is p(n, k, ω) where the
process {p(n, k, ω) : k ∈ Z}, n ∈ Z is a stationary and ergodic process. The
problem is to obtain an LDP for probabilities associated with such walks.

[3] Survey of the work in quantum probability by the Indian school of prob-
abilists.
[a] The work of R.L.Hudson and K.R.Parthasarathy on quantum Ito’s for-
mula and the precise meaning of a noisy Schrodinger equation in quantum me-
chanics.
[b] The work of K.R.Parthasarathy on quantum Markov processes defined
in terms of star unital homomorphisms.
[c] Quantum stochastic differential equations as unitary dilations of the quan-
tum master equation of Gorini, Kossakowski, Sudarshan and Lindblad.
[d] The work of W.O.Amrein, K.B.Sinha and Jauch on time delay in quantum
scattering theory. To determine expressions for the difference in the average
times spent by the particle in a Borel subset of space before the scattering and
after the scattering.
[e] The work on defining the Scattering matrix for Coulomb scattering after
noting that the wave operators do not exist for the Coulomb potential.

[4] Probability and statistics in general relativity and cosmology; The work of
the Indian school of general relativists. How small random perturbations in the
positions and velocities of stars in a galaxy evolve under Newton’s inverse square
law of gravitation and under Einstein’s law of gravitation into clusters having
specific shapes like globular clusters, spiral galaxies etc. Also if we solve the
Einstein-Maxwell field equations in the presence of matter under initial random
conditions on the metric perturbations, the velocity and density perturbations
and the electromagnetic four potential perturbations, then what will be the
mean square fluctuations in the same quantities as time progresses ?

[5] Work on quantum signal processing carried out at the NSUT.



[a] Work on simulating the Belavkin filter for mixtures of quantum Gaussian and Poisson noise measurements and computing the entropy evolution of the filter using Lie algebraic methods.
[b] Work on applying classical nonlinear filtering techniques for estimating
a quantum electromagnetic field using a time varying windowed version of this
field to excite a quantum mechanical system.
[c] Work on quantum image processing, specifically, to transform a classical
image field into a quantum state vector having many more degrees of freedom
and then processing this quantum state using optimal unitary operators and
finally converting the processed quantum state into a classical image field.
[d] Work on quantum image processing using Gaussian states. This involves
first transforming a classical image field into a quantum state, then approxi-
mating this quantum state by a quantum Gaussian state and applying stan-
dard processing algorithms on quantum Gaussian states based on the Hudson-
Parthasarathy noisy Schrodinger equation.

[6] Weyl’s integration formula on compact groups.


[a] The integration formula.
[b] Mackey’s theory of induced representations for the semidirect product.
[c] The imprimitivity theorem of Mackey. Equivalence of an imprimitivity
system to the canonical imprimitivity system for a group acting on a manifold.
Let G act on M. For f defined on M, define U(g)f(x) = f(g⁻¹x) and for E ⊂ M, define P(E)f = χ_E f. Then U(g)P(E)f(x) = χ_E(g⁻¹x)U(g)f(x) and hence U(g)P(E)U(g)⁻¹ = P(gE) as operators. This is an example of a canonical imprimitivity system.
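The covariance relation U(g)P(E)U(g)⁻¹ = P(gE) can be checked numerically on a finite toy model, say G = Z/6 acting on M = {0, ..., 5} by translation (an illustrative example, not from the text):

```python
import numpy as np

n = 6  # G = Z/6 acting on M = {0,...,5} by translation

def U(g):
    # (U(g)f)(x) = f(g^{-1} x) = f(x - g mod n): a permutation matrix
    P = np.zeros((n, n))
    for x in range(n):
        P[x, (x - g) % n] = 1.0
    return P

def Pproj(E):
    # P(E)f = chi_E * f : multiplication by the indicator function of E
    chi = np.array([1.0 if x in E else 0.0 for x in range(n)])
    return np.diag(chi)

g, E = 2, {0, 1, 3}
gE = {(g + x) % n for x in E}              # the translated set
lhs = U(g) @ Pproj(E) @ np.linalg.inv(U(g))
rhs = Pproj(gE)
print(np.allclose(lhs, rhs))               # covariance: U(g)P(E)U(g)^{-1} = P(gE)
```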

[7] Syllabus for a course on Applied linear algebra.


[8] Syllabus for a course on quantum signal processing.
[9] Unbounded operators in Hilbert space and quantum scattering theory.
[10] Discrete series characters of Harish-Chandra applied to pattern classi-
fication for Lorentz transformations acting in two dimensional space and one
dimensional time.
[11] Work carried out at the NSUT school of robotics.
[12] Cavity resonator antennas designed using microstrip patches-an approx-
imate method of analysis based on patching up the results for rectangular res-
onators.
[13] A short summary of the work of Harish-Chandra on the Plancherel for-
mula for semisimple Lie groups. The idea is based on noting that Fourier trans-
forms of the orbital integral are irreducible characters (this can even be noted
from Weyl’s character formula for compact semisimple Lie groups), then to note
that G = SL(2, R) has two non-conjugate one dimensional Cartan subgroups,
one an elliptic group and the other a hyperbolic group, and then to apply Weyl's integration formula to express the integral of a function on G as a superposition
of the integrals of the two orbital integrals on the two Cartan subgroups and
then to relate these orbital integrals to the irreducible characters of the prin-
cipal and discrete series. The characters of the discrete series are obtained by

solving differential equations with boundary conditions determined by the celebrated Harish-Chandra jump relations. These differential equations are derived
by noting that every irreducible character is an invariant eigen-distribution.
[14] Report on a PhD thesis on design of voltage/current mode analog circuits
using current conveyors.
[15] A lecture on wavelet construction from multiresolution analyses using
the scaling sequence.
[16] Quantum stochastic integration and quantum stochastic differential
equations, proofs of existence and uniqueness of solutions.
[17] Pattern recognition using Harish-Chandra’s discrete series representa-
tions of SL(2, R)–an outline of the main steps [a] The main idea, [b] Computing
certain Haar integrals.
[18] Some aspects of quantum stochastic calculus.
[19] LDP rate function for stationary process perturbation of a Markov pro-
cess. The problem is: Let X(n), n ≥ 1 be a Markov process and let p be a
positive integer. Consider the probability distribution


μN,p = N⁻¹ Σ_{n=1}^{N} δ_{(X(n),X(n+1),...,X(n+p−1))}

Then compute
lim_{N→∞} N⁻¹ log(Pr(μN,p ∈ B))
where B is a subset of measures on R^p. When B is a subset of stationary measures on R^Z, then compute the above probability when p → ∞. The solution
to this problem is a generalization of Sanov’s theorem that the rate function for
the empirical distribution of iid random variables equals the relative entropy
between the distribution of a random variable in the sequence and the value
assumed by the empirical distribution.
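The object μN,p can be illustrated numerically. The sketch below (an illustrative two-state chain, not from the text) builds the empirical pair measure μN,2 for a Markov chain and checks that, by ergodicity, it converges to π(a)P(a,b), the measure around which the LDP quantifies deviations:

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-state Markov chain with an illustrative transition matrix; p = 2 blocks
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
N = 50_000
X = np.empty(N, dtype=int)
X[0] = 0
for k in range(1, N):
    X[k] = rng.choice(2, p=P[X[k - 1]])

# Empirical pair measure mu_{N,2} = N^{-1} sum_n delta_{(X(n), X(n+1))}
counts = np.zeros((2, 2))
for a, b in zip(X[:-1], X[1:]):
    counts[a, b] += 1
mu2 = counts / counts.sum()

# By ergodicity mu2 -> pi(a) P(a,b), pi the stationary distribution; the
# LDP rate function governs how unlikely large deviations from this are.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
target = pi[:, None] * P

print("max deviation of mu_{N,2} from pi x P:", np.abs(mu2 - target).max())
```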
[20] Application of the quantum Belavkin filter to image processing prob-
lems. We first convert the classical image field into a pure quantum state. After
coupling this quantum state to a coherent state of the bath, we then process this
state using a family of unitary operators satisfying the HP noisy Schrodinger
equation with the Lindblad parameters selected according to some optimum
criteria (training the Lindblad parameters). We then take noisy measurements
on the system plus bath satisfying the non-demolition property to estimate the
processed quantum state and we use this processed quantum state to reconstruct
the quantum processed classical image field. The design of the optimal quantum
processor is based on the coherent state vector and not on the creation, annihilation and conservation processes that drive the qsde. Therefore, the only way of estimating the processed quantum state must involve some real time algorithm for filtering out the quantum noise from the non-demolition measurements, and the Belavkin filter does precisely that.

[21] Some remarks and problems on quantum Gaussian states. The main
focus of this section is to determine how the Weyl operator in system space

transforms under the GKSL equation with the Hamiltonian and Lindblad oper-
ators being some functions of the creation and annihilation operators in system
Hilbert space and to use this transformation to determine how the quantum
Fourier transform of a state evolves with time. This result can then be used to
show that if the Hamiltonian in the GKSL equation is a quadratic function of
the creation and annihilation operators ie the Hamiltonian of a system of har-
monic oscillators and further, if the Lindblad operators in the GKSL equation
are linear functions of the creation and annihilation operators, then Gaussianity
of a state is preserved under the GKSL dynamics.

[22] Some aspects of Harish-Chandra’s work on group representation theory.


[a] Fourier transforms on a Lie algebra.
[b] invariant differential equations satisfied by superposition of exponential
functions on the Cartan subalgebra.
[c] Restriction of an invariant differential operator to the Cartan sub-algebra.
[d] Differential operators acting on invariant functions evaluated at a Cartan
sub-algebra element.
[e] invariant differential equations satisfied by the character in Weyl’s char-
acter formula for compact semisimple Lie groups.
[f] Invariant differential operators acting on orbital integrals evaluated at a
Cartan sub-algebra element.

[23] Stochastic analysis of electromagnetic waves propagating in curved space-


time when the permittivity-permeability-conductivity is a random tensor field.

[24] Some theorems in linear algebra.


[a] Williamson’s theorem on diagonalization of a positive definite matrix in
even dimensions using a symplectic matrix.
[b] Second quantization of symplectic matrices acting on the Weyl operator.
[c] The Jordan canonical form.
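Williamson's theorem in [24][a] can be checked numerically: for a positive definite 2n × 2n matrix M, the symplectic eigenvalues are the moduli of the eigenvalues of iΩM, which occur in ± pairs. A minimal sketch with a random test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2
A = rng.standard_normal((2 * n, 2 * n))
M = A @ A.T + 2 * n * np.eye(2 * n)    # a random positive definite matrix

# Standard symplectic form Omega = [[0, I], [-I, 0]]
Omega = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.eye(n), np.zeros((n, n))]])

# Williamson: the eigenvalues of i*Omega*M are real and come in +/- d_j
# pairs; the d_j > 0 are the symplectic eigenvalues of M.
w = np.linalg.eigvals(1j * Omega @ M)
d = np.sort(np.abs(np.real(w)))
sympl = d.reshape(n, 2).mean(axis=1)   # collapse the +/- pairs

print("symplectic eigenvalues:", sympl)
```

A quick consistency check: since det(Ω) = 1, the product of the squared symplectic eigenvalues equals det(M).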

[25] A problem suggested by Prof.Vijayant Agarwal on machine intelligence.


This problem deals with modeling the plant dynamics of a chaotic system using a
two layer neural network with a single layer non-linearity in the form of the tanh
function. The weight update equations are based on an EKF applied to noisy
plant output measurements. Further, when our neural network model has non-
random disturbance, then how to optimally design a disturbance observer and
apply the EKF to estimate the disturbance observer output. The optimization
problem here is to design the disturbance observer so that on the one hand,
the rate of convergence of the disturbance estimate error to zero is fast enough
and on the other hand, the mean square noise fluctuation in the disturbance
estimation error must not be too large.

[26] Test problems on basic quantum mechanics.



[a] Intuitive derivation of the form of the energy and momentum operators in wave mechanics using Planck's quantum hypothesis and De-Broglie's wave-particle duality.
[b] Energy spectrum of a free particle in a 3-D box and verification of the orthogonality of the stationary state eigenfunctions.
[c] Proof of the continuity of the wave function and its spatial gradient across
a boundary for finite potentials starting from Schrodinger’s equation.
[d] Definition of the (maximal) domains of the position and momentum oper-
ators in quantum mechanics and proofs of the self-adjointness of these operators.
[e] Proof that Feynman’s path integral for the evolution kernel satisfies the
Schrodinger equation. Also includes a discussion of the infinite normalization constant involved in defining the path integral.
[f] Evaluating the path integral for the forced harmonic oscillator using two
methods. One by direct discretization of the Gaussian path integral, evaluating
the finite dimensional Gaussian integral and then passing over to the limit, and
two, by expressing the path integral in the frequency domain using expansion
of the position process as a Fourier series over the finite time interval [0, T ] and
using the transformation of the path measure to the measure on the countable
Fourier coefficient space.
[g] As Planck’s constant approaches zero, prove that the path integral reduces
to a single phase factor with the phase proportional to the action integral along
the classical path. This proves that in the limit of zero Planck’s constant,
quantum mechanics reduces to classical mechanics, or equivalently, interference
terms in quantum mechanics arising from a superposition over different paths disappear, reducing to a contribution from just a single classical path.
[h] Large deviation theory in quantum mechanics. We consider a quantum
mechanical system with a small randomly time varying potential perturbing
the Hamiltonian. We assume that this perturbation is a Gaussian process and
calculate approximately the probability of transition between two stationary
states under this small randomly time varying perturbation. Using the LDP
rate function for a Gaussian process and the contraction principle, we then
calculate the rate at which this transition probability approaches zero in the
limit when the perturbation amplitude tends to zero.
[i] Evaluating the evolution of the quantum Fourier transform of a state under the dynamics of an open quantum system whose Hamiltonian and Lindblad operators are functions of only the creation and annihilation operators of a
sequence of independent harmonic oscillators. The main idea in this calculation
is to exploit the commutation relations between the creation and annihilation
operators with the Weyl operators to derive the basic equations that describe
the evolution of the Weyl operator under the Heisenberg dynamics of the open quantum system.

[27] A brief summary of Varadhan’s work on large deviations.


[28] Notes on the basic equations for fracture analysis of materials. The
Lagrangian is first setup assuming that in each domain, it is a quadratic form
in the position and velocity field of the elastic material described by an elastic

constant tensor. The position field is the strain tensor and the kinetic energy is
a quadratic form in the velocity field, ie, in the time derivative of the displace-
ment vector while the potential energy is a quadratic form in the strain tensor.
The strain tensor is a symmetric linear function of the spatial derivatives of the
displacement vector and the quadratic form for the potential energy is one half
of the inner product between the strain tensor and the stress tensor. The stress
tensor is obtained by multiplying the strain tensor of second rank with the fourth rank elastic constant tensor. Finally, we discuss the canonical quantization of
this elastic field theory based on regarding the displacement field as the canon-
ical position fields, the partial derivative of the Lagrangian density w.r.t the
time derivative of the displacement field as the canonical momentum fields, in-
troducing canonical commutation relations between the canonical position and
canonical momentum fields and then formulating the functional Schrodinger
equation in which the canonical position fields become multiplication opera-
tors and the canonical momentum fields become partial functional/variational
derivatives w.r.t. the canonical position fields. This idea could be of use when
we are interested in determining the quantum probability laws for fracture of
molecular bonds on the Angstrom scale.
Reference: I wish to acknowledge my debt to my colleague Dr. Abhishek Tevatia for suggesting this problem, based on his research into the theoretical and experimental aspects of fracture of materials.

[29] Neural network based EKF with disturbance observer for chaotic system
modeling. The crucial idea is to identify the chaotic system plant function by
approximating it with a neural network whose weights are updated using the
EKF driven by the noisy output of the original chaotic system. The fact that the output of the chaotic system drives the neural EKF guarantees that after several iterations on the neural weights, the neural network will closely approximate the original plant dynamics.
[30] Some remarks on classical and quantum entropy.
[31] Large deviations in classical and quantum hypothesis testing problems.
This section deals with the problem that if we have many independent copies
each of two quantum states and we apply a decision POV operator to discriminate between these two tensor product states, with the POV operator selected according to the Neyman-Pearson rule, ie, minimize the probability of false alarm keeping the miss probability fixed, then what is the maximum possible rate at which the false alarm error probability converges to zero as the number of copies in the tensor product tends to infinity, assuming that the miss probability converges to zero at some positive rate?
[32] Large deviations applied to some problems of stochastic control theory.
[33a] Large deviation principle in super-conductivity. When the vector po-
tential that drives the Fermionic fields in superconductivity has a small random
component that is modeled as a stationary stochastic process in time, then we
wish to compute the rate at which the superconductivity current fluctuations
converge to zero. The super-conductivity current density is computed using par-
tial differential operators acting on the temperature Greens function with the

Greens’ function satisfying standard partial differential equations of quantum mechanics driven by the magnetic vector potential.
[33b] LDP in quantum state discrimination.
[34] Dirac’s equation in a radial potential, a simplified approach based on
conserved quantities.
[35] Problems in Brownian motion.
[a] Use of Doob’s optional stopping theorem for Martingales and the reflec-
tion principle to compute the pdf for absorbed Brownian motion as well as the
statistics of first hitting times.
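The reflection principle in [35][a] gives P(max_{t≤T} B_t ≥ a) = 2 P(B_T ≥ a), which is exactly the ingredient needed for first hitting time statistics. A small Monte Carlo sketch (illustrative parameters; the discretized maximum slightly underestimates the continuous one):

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(2)
a, T = 1.0, 1.0
n_steps, n_paths = 500, 10_000
dt = T / n_steps

# Simulate discretized Brownian paths and test whether the running max hits a
dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
B = np.cumsum(dW, axis=1)
hit = (B.max(axis=1) >= a).mean()

# Reflection principle: P(max_{t<=T} B_t >= a) = 2 P(B_T >= a) = erfc(a/sqrt(2T))
exact = erfc(a / sqrt(2 * T))
print(f"Monte Carlo: {hit:.3f}  reflection principle: {exact:.3f}")
```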
[36] A problem suggested by Dr.Mridul Gupta: Computing the 3-dB band-
width for a fractional order low pass filter based on analysis of the power spectral
density after a stationary process is passed through the filter.
[37] Levy’s modulus for Brownian motion.
[38] A simulational problem in molecular chemistry based on computing
the Schrodinger evolution kernel of a time varying Hamiltonian operator using
approximate numerical techniques.
[39] Quantum image processing via Gaussian states.
[40] Abelian and non-Abelian anomalies in quantum field theory and the
interpretation of the anomaly integral using the Atiyah-Singer index theorem.
The anomaly integral can be interpreted in terms of the Chern character computed using the curvature tensor of the non-Abelian connection and the Euler
characteristic of the Riemannian manifold. This is the general feature of all
versions of the index theorem.
[41] Simulation of quantum gates by perturbing an atom with a magnetic
field.
[41a] Estimating parameters in a continuously evolving quantum system by taking discrete measurements and incorporating the collapse postulate.
[41b] Estimating parameters in a continuously evolving quantum system
subject to classical stochastic noise by taking continuous measurements like
transition probabilities and using the extended Kalman filter.
[42a] Estimating the image field intensity when the measured intensity is the sum of a Poisson random field and a mean zero Gaussian random field, where the true intensity field is the mean of the Poisson random field.
[42b] Simulating using a neural network the non-linear Hartree-Fock equa-
tions for a two electron atom in which a linear Schrodinger equation for the
joint wave function is approximated using two coupled non-linear Schrodinger
equations for single particle wave functions.
[43a] Assignment problems in applied linear algebra.
[43b] Supersymmetry in quantum stochastic calculus.
[44] Neural network for simulating an N -electron atom using the Hartree-
Fock equations derived from a variational principle.
[45] Tutorial on linear algebra.
[46] Problems in electrodynamics, quantum mechanics and stochastic fluid
dynamics. If one writes down the Navier-Stokes equation for the velocity field of
a viscous fluid taking into account external random forcing field terms along with
the incompressibility equation, then the resulting dynamics after discretization

over spatial pixels can be expressed as a nonlinear stochastic differential equation driven by a Martingale process. The Martingale here models the external force
field and can in general be taken as a superposition of a Brownian motion and
a Poisson process component both of which can be expressed as the sum of a
Martingale and a finite variation process in accordance with the Doob-Meyer
decomposition.
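The jump-diffusion forcing described in [46] can be sketched in one dimension with an Euler scheme (a hypothetical linear drift standing in for the discretized Navier-Stokes nonlinearity; all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 10.0, 10_000
dt = T / n
a, sigma, lam, jump = 1.0, 0.3, 2.0, 0.2   # illustrative parameters

# Euler scheme for dX = -a X dt + sigma dW + jump d(N_t - lam t): the forcing
# is a Brownian martingale plus a compensated Poisson martingale, ie, the
# martingale parts of the Doob-Meyer decomposition mentioned above.
X = np.zeros(n + 1)
for k in range(n):
    dW = np.sqrt(dt) * rng.standard_normal()
    dN = rng.poisson(lam * dt)
    X[k + 1] = X[k] - a * X[k] * dt + sigma * dW + jump * (dN - lam * dt)

print("time-average:", X.mean(), " sample variance:", X.var())
```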
[47] Course outline for applied linear algebra.
[48] Problems in discrete time stochastic filtering and control involving mod-
eling the plant dynamics by a neural network and taking a disturbance observer
into account for canceling out the disturbance in the neural network.
[49] Ginzburg-Landau theory of superconductivity.
[50] Relative entropy in quantum mechanics.
[51] Symmetry breaking in quantum field theory.
[52] Proof of the Knill-Laflamme theorem in quantum coding theory.
[53] Large deviation principle in group theoretic image processing.
[54] LDU and UDL decompositions in linear prediction theory. Both the
methods are natural consequences of the Gram-Schmidt orthonormalization pro-
cess for stationary stochastic processes.
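The connection in [54] between the LDL^T factorization and Gram-Schmidt on a stationary process can be sketched for an AR(1) autocovariance (an illustrative example): the diagonal factor holds the innovation variances, which for AR(1) equal the driving noise variance after the first step.

```python
import numpy as np

# Autocovariance of an AR(1) process x(n) = rho x(n-1) + w(n), Var(w) = 1
rho, p = 0.8, 5
r = rho**np.arange(p) / (1 - rho**2)
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])

# LDL^T via Cholesky: R = L D L^T. The diagonal of D gives the innovation
# (one-step prediction error) variances produced by Gram-Schmidt on
# x(0), ..., x(p-1); the rows of L^{-1} are the prediction-error filters.
C = np.linalg.cholesky(R)        # R = C C^T
d = np.diag(C)**2                # innovation variances
L = C / np.diag(C)               # unit lower-triangular factor

print("innovation variances:", d)
```

For this AR(1) model the first innovation variance is Var(x(0)) = 1/(1 − ρ²) and all subsequent ones are exactly 1, reflecting that the one-step predictor needs only the immediate past.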
[55] Quantum filtering theory for mixed process measurements with appli-
cation to estimating the quantum electromagnetic field inside a cavity res-
onator from non-demolition measurements of the bath field in the context of
the Belavkin quantum filter.
[56] Examination problems in electromagnetics.
[57] Examination problems in applied linear algebra.
[58] Simultaneous triangulability of a family of matrices.
[59] Supersymmetric, gauge invariant and Lorentz invariant action for non-
Abelian gauge fields.
[60] Tutorial problems in electromagnetics. Includes application of the cou-
pling between the Maxwell equations and Boltzmann’s kinetic transport equa-
tion to derive dispersion relations for waves in a plasma. The crucial idea here
is based on applying first order perturbation theory to the coupled Boltzmann
and Maxwell equations, then substituting into these linearized partial differ-
ential equations plane wave solutions to the Boltzmann distribution function
perturbation and the electromagnetic field perturbations and obtain the disper-
sion relation between frequency and wave number by equating the determinant
of the coefficient matrix to zero.
[61] Approximate performance analysis of the EKF when the state is an
arbitrary Markov process. Write down the EKF for the state estimate and
subtract this equation from the true state equation to obtain a stochastic difference/differential equation for the error process. This equation is linear in the error process but the coefficient matrices in this equation are functions of the
state estimate at the previous time instant. This means that we have a linear
stochastic difference equation with random coefficient matrices. We obtain an
approximate solution to this difference equation using first order perturbation
theory. This approximate solution is based on assuming that the random coef-
ficient matrices in the difference equation are constant matrices plus small time

varying stochastic matrices. After obtaining this approximate solution, we assume that the driving process and measurement noises in this equation are small
amplitude. Our aim in performance analysis is to calculate the large deviation
rate function for the error process and then use this rate function to calculate
the approximate probability that the error process will remain within a small
neighbourhood of the zero process. The calculation of the LDP rate function is
based on Legendre transforming the approximate limiting logarithmic moment
generating function for the error process.
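The EKF error recursion underlying [61] can be simulated directly for a scalar model (all model functions and noise levels below are hypothetical, for illustration only), and the smallness of the resulting error process is what the LDP rate function then quantifies:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative scalar model: x(n+1) = 0.9 sin(x(n)) + w(n), y(n) = x(n) + v(n)
q, r, N = 0.01, 0.04, 2000
f = lambda x: 0.9 * np.sin(x)
fprime = lambda x: 0.9 * np.cos(x)

x, xh, P = 0.5, 0.0, 1.0
errs = []
for _ in range(N):
    x = f(x) + np.sqrt(q) * rng.standard_normal()   # true state
    y = x + np.sqrt(r) * rng.standard_normal()      # noisy measurement
    F = fprime(xh)                                  # linearize about the estimate
    xh_pred, P_pred = f(xh), F * P * F + q          # EKF prediction step
    K = P_pred / (P_pred + r)                       # Kalman gain (H = 1)
    xh = xh_pred + K * (y - xh_pred)                # measurement update
    P = (1 - K) * P_pred
    errs.append(x - xh)

rms = float(np.sqrt(np.mean(np.square(errs))))
print("RMS estimation error:", rms, " measurement std:", np.sqrt(r))
```

The error x − xh indeed follows a linear recursion whose coefficient F depends on the previous estimate, which is exactly the random-coefficient structure analyzed above.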
[62] Application of the Euler-Maclaurin summation formula for computing
the maximum likelihood estimator of an image field subject to Poisson and
Gaussian noise. This includes a self-contained derivation of the Euler-Maclaurin
summation formula.
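The Euler-Maclaurin formula of [62] with its leading correction terms can be checked numerically on a smooth test function (an illustrative choice, not the image-field likelihood of the text):

```python
import numpy as np

# Euler-Maclaurin with the leading correction term:
# sum_{k=0}^{n} f(k) ~ int_0^n f dx + (f(0)+f(n))/2 + (f'(n)-f'(0))/12
f = lambda x: np.exp(-x / 10)
fp = lambda x: -np.exp(-x / 10) / 10
n = 100

direct = sum(f(k) for k in range(n + 1))     # the sum, computed directly
integral = 10.0 * (1.0 - np.exp(-n / 10))    # closed form of int_0^n e^{-x/10} dx
approx = integral + (f(0) + f(n)) / 2 + (fp(n) - fp(0)) / 12

print(f"direct sum = {direct:.8f}, Euler-Maclaurin = {approx:.8f}")
```

For this function the first neglected term involves f''' and is of order 10⁻⁶, so the two numbers agree to about six digits.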
[63] Lorentz, gauge and supersymmetry invariant Lagrangian for Abelian
gauge superfields.
[64] Stochastic optimal control in discrete time based on the Bellman-dynamic
programming algorithm.
[65] Optimal control of a satellite carrying rocket in continuous time subject
to velocity damping forces and the inverse square law of gravitational forces
exerted on the satellite from the earth and the moon.
[66] Dirac’s equation in a radial potential. Reduction to ordinary differential
equations in the radial coordinate by assuming a solution ansatz in the form of
spherical harmonic spinors multiplied by scalar radial functions.
[67] Heisenberg matrix mechanics equations for the metric tensor of the
gravitational field in a cavity resonator interacting with a quantum noisy elec-
tromagnetic field of the surrounding bath. The main idea here is to assume
that the quantum electromagnetic field has two components, one a quantum
field theoretic component built out of plane wave superpositions of the photon
creation and annihilation operator fields in wave-number helicity space satisfy-
ing the CCR and two a quantum noise component built out of a superposition
of the creation, annihilation and conservation processes appearing in the quantum
stochastic calculus of Hudson and Parthasarathy. This quantum noisy electro-
magnetic field is substituted into the interaction component of the Maxwell field
with the gravitational field described by considering the component in the action
for the Maxwell field that contains the metric perturbations. This interaction
Lagrangian is added to the Einstein-Hilbert Lagrangian for the gravitational
field and when one applies the variational principle to this total Lagrangian
w.r.t the metric perturbation, the metric perturbations satisfy the linearized
Einstein field equations driven by the energy-momentum tensor of the quantum
noisy electromagnetic field. The solution to these equations yields the metric
perturbations as quadratic functionals of the noisy quantum electromagnetic
field whose joint moments can be easily evaluated when the quantum noisy
electromagnetic field is in a state described by a set of N photons with specified
wave number and helicities tensored with the noisy photon coherent state.
[68] Fundamental characteristics of classical and quantum probability. In-
cludes a discussion on the violation of Bell inequality in quantum mechanics
which conclusively proves the impossibility of constructing a hidden variable

theory in quantum mechanics, ie, of constructing a large underlying classical probability space from which quantum probabilities can be derived by averaging with respect to classical probability distributions.
[69] A problem in linear algebra.
[70] Weyl’s character formula for compact semisimple Lie groups.
[71] Modeling dynamical systems with constraints using neural networks.
[72] Some results in operator theory.
[73] Problems in operator theory.
[74] Problems in group theory.
[75] Problems on physics in a curved background metric. Focuses on the
Maxwell, Schrodinger and Dirac equation in a curved background metric. By
regarding the metric as a small perturbation of flat Minkowski space-time, we
may derive corrections to the solution of Maxwell’s equations for the electro-
magnetic field caused by interaction between gravity and the incident Maxwell
field. Likewise, we may derive corrections to the solution of Schrodinger’s and
Dirac’s wave function and energy eigenvalues caused by the interaction between
gravity and the unperturbed wave function of the atom.
[76] A problem in statistical image processing.
[77] Root systems in the classification theory of semisimple Lie algebras.
[78] The spectral theorem in infinite dimensional Hilbert space.
[79] The Riesz representation theorem in Hilbert space. It is important
because it gives us the ability to define the adjoint of a densely defined operator.
The definition of the adjoint is required to give meaning to a self-adjoint operator
in a Hilbert space which is required for doing quantum mechanics.
[80] Proof based on the Bolzano-Weierstrass property of the real number
system that all the norms on a finite dimensional vector space are equivalent,
ie, they all generate the same topology.
[81] Proof that a Hilbert space is finite dimensional iff the closed unit ball is
compact.
[82] Bosonic string theory.
[83] Quantum scattering theory: scattering cross sections.
[84] Problems in scattering theory. Includes a discussion of the non-existence
of wave operators in Coulomb scattering and how to adjoin a unitary time
varying term so that the wave operators exist. Also includes a proof of the
Hausdorff-Young inequality based on the Hadamard three line theorem which
enables us to deduce new interesting conditions for the existence of the wave
operators.
[85] A problem in electrodynamics. The discussion here is about computing
the surface current density induced on a cylinder of finite length when an elec-
tromagnetic wave is incident upon it. The integral equations satisfied by the
induced surface current density are obtained by setting to zero the sum-total of
the tangential components of the incident and scattered electromagnetic field
on the cylinder surface. These equations are generalizations of the well known
Pocklington and Hallen integral equations in antenna theory.
[85] Other problems in scattering theory. Focuses on some important
inequalities required to prove the existence of the wave operators.
[86] Quantum neural network for estimating the joint pdf of a random signal.
Focuses on using the Schrodinger equation to generate in time a family of
probability densities that are smooth approximations of a given family of
probability densities. The key fact is that, for any time-varying real potential,
the squared magnitude of the wave function in the Schrodinger equation is a
probability density because the evolution is unitary; hence, by manipulating the
potential in accordance with a given error signal, we can generate a family of
pdfs that tracks a given family of pdfs. Other pdf-tracking algorithms must
impose at each iteration the conditions that the pdf remain non-negative and
integrate to unity, whereas in a quantum neural network this is guaranteed
naturally.
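As a minimal illustration of the fact exploited above (a sketch, not the
author's construction; the grid size, time step and the particular time-varying
potential are arbitrary choices), a split-step unitary integration of the 1-D
Schrodinger equation keeps |psi|^2 a normalized pdf at every step:

```python
import numpy as np

# Split-step unitary integration of i d(psi)/dt = (-d^2/dx^2 + V) psi.
# Every factor applied to psi has unit modulus and the FFT pair is unitary,
# so |psi|^2 remains a normalized pdf however the real potential V is varied.
N, Lbox, dt = 256, 20.0, 1e-3
x = np.linspace(-Lbox / 2, Lbox / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)

psi = np.exp(-x**2).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)        # normalize once

def step(psi, V):
    """Strang splitting: half potential kick, full kinetic drift, half kick."""
    psi = np.exp(-0.5j * dt * V) * psi
    psi = np.fft.ifft(np.exp(-1j * dt * k**2) * np.fft.fft(psi))
    return np.exp(-0.5j * dt * V) * psi

for n in range(100):
    V = 0.5 * x**2 * (1.0 + 0.1 * np.sin(0.05 * n))  # arbitrary time-varying potential
    psi = step(psi, V)

pdf = np.abs(psi) ** 2        # a valid pdf: non-negative, integrates to one
assert abs(np.sum(pdf) * dx - 1.0) < 1e-9
```

Non-negativity and unit mass need never be imposed as constraints; they are
consequences of the unitary evolution, which is the point made in the entry.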
[87] Problems in electrodynamics related to general relativity. Focuses on
two problems. One, evaluation of the Ricci tensor components for a general
spherically symmetric metric and using these expressions to solve the Maxwell
equations in a spherically symmetric background metric. Two, study the
dynamics of small perturbations around the Schwarzschild metric and use these
perturbed metric coefficients to determine the corresponding perturbation in
the electromagnetic field by solving Maxwell’s equations in the perturbed back-
ground metric using perturbation theory.
[88] Syllabus for a course on quantum computation.
[89] Some remarks on operator theory related to quantum scattering. The
aim here is to derive a far field formula for the scattered wave-function for an
incident plane wave function in terms of the Legendre polynomials and hence
obtain a formula for the total scattering cross section by integrating over all
solid angles. Azimuthal symmetry around the direction of the incident plane
wave is assumed; indeed, this is a consequence of the radial character of the
interaction potential and the azimuthal symmetry of the incident plane wave.
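The far-field formula referred to above is, in standard partial-wave notation
(a textbook statement with $\delta_l$ the phase shifts, not quoted from the
text):

```latex
f(\theta) = \frac{1}{k}\sum_{l=0}^{\infty} (2l+1)\, e^{i\delta_l}\sin\delta_l\, P_l(\cos\theta),
\qquad
\sigma_{\mathrm{tot}} = \int |f(\theta)|^2\, d\Omega
= \frac{4\pi}{k^2}\sum_{l=0}^{\infty} (2l+1)\sin^{2}\delta_l,
```

where the orthogonality of the Legendre polynomials performs the solid-angle
integral term by term.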
[90] Some properties of compact operators in the context of scattering theory.
[91] Gravitational radiation.
[92] The post-Newtonian equations of hydrodynamics. Focuses on expand-
ing the metric tensor as well as the energy-momentum tensor of the matter field
in powers of the characteristic velocity of the system (or equivalently in terms of
the square root of the characteristic mass of the system) to derive a sequence of
linear equations for terms of each order of perturbation in terms of the lower or-
der perturbation terms and thereby obtain corrections to the Newtonian theory
of a fluid evolving under its own gravitational potential.
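The bookkeeping described above follows the standard post-Newtonian ordering
(a sketch in a common notation where a superscript $(n)$ marks a term of order
$(v/c)^n$; not quoted from the text):

```latex
g_{00} = -1 + g^{(2)}_{00} + g^{(4)}_{00} + \cdots,\qquad
g_{0i} = g^{(3)}_{0i} + g^{(5)}_{0i} + \cdots,\qquad
g_{ij} = \delta_{ij} + g^{(2)}_{ij} + \cdots
```

Substituting these into the Einstein field equations and matching orders yields
the Newtonian potential equation at lowest order and, at each higher order, a
linear equation sourced by the lower-order terms.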
[93] Scattering of a gravitational wave by an electromagnetic field. This
problem focuses on determining the metric perturbations caused by the energy-
momentum tensor of the background electromagnetic field. The relevant equa-
tions that describe these perturbations are obtained by linearizing the Einstein
field equations with the energy-momentum tensor of the electromagnetic field
being treated as a first order perturbation.

[94] General relativity in three dimensional notation. This discussion relies


on the fact that any metric can, by an appropriate transformation of the space-
time coordinates, be brought to the synchronous form in which there are just
six components of the metric tensor to be solved for rather than the original
ten. We compute the three velocity in terms of the four velocity and write
down the geodesic equation in terms of the three velocity. The time coordinate
required in the definition of the three velocity is the synchronized coordinate
time, obtained by splitting the square of the proper-time differential into a
spatial component and a time component.

[95] Virasoro algebra in bosonic quantum string theory

[96] Image processing on S 2 . Discusses (a) nonlinear G-invariant model for


a noisy image field obtained by passing a given field into a G-invariant
Volterra filter and which also gets corrupted by a noise field on the manifold
having G-invariant statistics. If we apply a G-transformation to the input image
field, then we wish to estimate this G-transformation from the noisy output by
the maximum likelihood/least squares method, by expressing the G-Fourier
transform of the output image field in terms of the G-Fourier transform of the input
image. These Fourier transforms are expressed in terms of the irreducible repre-
sentations of G. The likelihood function is easily computed using the spherical
harmonic transform of the model since this transform is essentially the same
as the G-Fourier transform of the field after averaging over the isotropy group.
Since the noise has G-invariant statistics, the likelihood function is easy to com-
pute. Further, we derive conditions on the Volterra kernels for the total system
to be G-invariant. These conditions essentially amount to requiring that the
spherical harmonic transforms of these kernels commute with the correspond-
ing irreducible representation of G. By applying the Schur lemmas to these
conditions, we derive the general form of the Volterra kernels.
[97] The dynamics of two connected three dimensional links is studied in
the Lie algebra domain. First the Lagrangian for the two link system is set up
in terms of the two rotation matrices taking small random forcing torque into
account. The equations of motion are derived from the variational principle by
incorporating Lagrange multipliers into the system that constrain the matrices
to be rotations. Then each rotation matrix is expressed as the exponential of a
Lie algebra element and finally using the well known formula for the differential
of the exponential map, the equations of motion are expressed in the Lie algebra
domain. The large deviation principle (LDP) is then applied to these coupled
SDEs to obtain approximate formulas for the probability of the system falling
within the stability zone over a given duration of time when the torques have
small random components.
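The well known formula invoked above for the differential of the exponential
map is, with $\mathrm{ad}_X(Y) = XY - YX$:

```latex
\frac{d}{dt}\exp(X(t))
= \exp(X(t))\,\frac{1 - e^{-\mathrm{ad}_{X(t)}}}{\mathrm{ad}_{X(t)}}\big(\dot X(t)\big)
= \exp(X(t))\sum_{n=0}^{\infty} \frac{\big(-\mathrm{ad}_{X(t)}\big)^{n}}{(n+1)!}\big(\dot X(t)\big),
```

which is what allows the matrix equations of motion to be rewritten entirely in
the Lie algebra domain.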

[98] The Knill-Laflamme theorem. This theorem derives necessary and
sufficient conditions for the existence of recovery operators, defined on the
range of a set of mixed states in a finite dimensional Hilbert space, relative to
a noise manifold of operators. Under these conditions, a state transmitted
through a noisy quantum channel, constructed out of the noise-manifold
operators in the Choi-Kraus/Stinespring representation, can be decoded without
any error by passing the output state through another quantum channel built
out of the recovery operators.
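In the usual notation (a standard statement of the condition, not quoted from
the text), with $P$ the projection onto the code subspace and $\{E_i\}$
spanning the noise manifold, the Knill-Laflamme condition reads:

```latex
P\, E_i^{\dagger} E_j\, P \;=\; \lambda_{ij}\, P \qquad \text{for all } i, j,
```

for some Hermitian matrix $(\lambda_{ij})$; recovery operators exist if and only
if this holds.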
[99] Detecting brain diseases using EEG data


[a] Bispectral analysis of the EEG data. A non-zero bispectral peak indicates
a quadratic phase coupling between the corresponding frequencies; such coupling
usually arises when a harmonic signal comprising uncoupled frequencies is
transmitted through a quadratic nonlinearity. The presence of a nonlinearity in
the brain mechanism that causes the uncoupled rhythms in the brain to become
frequency- and phase-coupled indicates the presence of a disease. Thus, brain
diseases can be detected and characterized by estimating the bispectrum of the
brain-surface EEG signal, and the nature of the nonlinearity can be determined
by an analysis of the bispectrum.
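A minimal numpy sketch of the detection principle described above (illustrative
only; the frequencies, segment length and segment count are arbitrary choices,
not taken from the text). The direct bispectrum estimate averages
X(f1) X(f2) conj(X(f1+f2)) over segments, and its magnitude is large only under
quadratic phase coupling:

```python
import numpy as np

# Direct bispectrum estimate at one bifrequency (f1, f2).
rng = np.random.default_rng(0)
fs = nfft = 256          # 1-second segments; integer frequencies fall on bins
nseg, f1, f2 = 200, 24, 33
t = np.arange(nfft) / fs

def bispectrum_peak(coupled):
    """Average X[f1] X[f2] conj(X[f1+f2]) over segments. The triple product
    has phase p1 + p2 - p3, which vanishes in every segment only under
    quadratic phase coupling (p3 = p1 + p2), so only then does the average
    add coherently."""
    acc = 0j
    for _ in range(nseg):
        p1, p2 = rng.uniform(0, 2 * np.pi, 2)
        p3 = p1 + p2 if coupled else rng.uniform(0, 2 * np.pi)
        x = (np.cos(2 * np.pi * f1 * t + p1)
             + np.cos(2 * np.pi * f2 * t + p2)
             + np.cos(2 * np.pi * (f1 + f2) * t + p3))
        X = np.fft.fft(x)
        acc += X[f1] * X[f2] * np.conj(X[f1 + f2])
    return abs(acc) / nseg

# Coupled rhythms give a bispectral peak; uncoupled ones average out.
assert bispectrum_peak(True) > 3 * bispectrum_peak(False)
```

With independent phases the triple products are unit phasors with random
angles, so their average decays like 1/sqrt(nseg), which is why the bispectrum
distinguishes coupled from merely co-occurring rhythms.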

[100] The use of group representation theory in inferring about the brain
disease. The brain surface signal is modeled by a partial differential equation
in space-time driven by a noise source. The coefficients in this pde model are
the parameters to be estimated which give us information about the brain dis-
ease. From noisy measurements on the output of this pde, we can estimate
these parameters in real time using the extended Kalman filter (EKF). When,
further, the pde operators are invariant under a group G of transformations
acting on the curved brain surface and the driving noise also has G-invariant
statistics, the pde signal model can be considerably simplified by using the
group-theoretic Fourier transform. A special case of this is when the brain surface is
modeled as a sphere with the group of rotations acting on it. In this case under
G-invariance of the system pde dynamics and noise statistics, taking the group
theoretic Fourier transform amounts to expanding the signal and noise fields in
terms of spherical harmonics and in this domain, the signal representation and
computational complexity is considerably reduced.

[101] Application of neural networks and artificial intelligence in the classi-


fication of brain disease. A neural network is used to model the brain system
in which the applied stimulus to the human being is taken as the input signal
and the EEG signal on the brain surface is taken as the output. The converged
weights of the neural network then represent the characteristics of the disease.
By comparing these converged weights to the prototype weights for each kind of
disease obtained using a training algorithm, we can classify the kind of disease
present in the given brain sample.

[102] The use of quantum neural networks in estimating the brain signal pdf.
Quantum neural networks are nature inspired algorithms that naturally generate
a whole family of probability densities and hence can be used to estimate the
joint pdf of the EEG signal on the brain surface.

[103] The Dyson-Schwinger equations and their connection with the vacuum
polarization tensor and the electron self-energy. This section first considers
writing down the Maxwell equations driven by the Dirac four-current density
source and the Dirac equation driven by the electromagnetic potential connection
term. The Dirac wave function and the electromagnetic potentials are treated as wave
field operators in the second quantized formalism. We then derive using these
field equations, exact differential equations for the electromagnetic field propa-
gator and the Dirac field propagator. Extra interaction terms appear in these
propagator differential equations in the form of trilinear vertex terms which
are vacuum expectations of the time ordered product of three field operators,
namely a Dirac field operator, its adjoint and an electromagnetic field operator.
These are known as the Dyson-Schwinger equations and can be used to develop
power series expansions for the exact photon and electron propagators.

[104] Syllabus for end-sem exam for SPC01, Applied linear algebra

[105] Lower bound in Cramér's theorem on large deviations


[106] Generalization to the non-iid case: the Gärtner-Ellis theorem
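For reference, the rate function appearing in both results is the Legendre
transform of the logarithmic moment generating function (a standard statement,
not quoted from the text): for iid $X_i$ with
$\Lambda(\theta) = \log E\, e^{\theta X_1}$,

```latex
P\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i \in A\right) \approx e^{-n \inf_{x \in A} I(x)},
\qquad
I(x) = \sup_{\theta \in \mathbb{R}} \big(\theta x - \Lambda(\theta)\big),
```

and the Gärtner-Ellis theorem drops the iid assumption, replacing $\Lambda$ by
the limit $\Lambda(\theta) = \lim_{n} \frac{1}{n}\log E\, e^{\theta S_n}$,
assumed to exist and to be suitably regular.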
[107] Miscellaneous problems in group representation and scattering theory
[108] Polyspectral analysis of EEG data via the EKF.

[109] The gravitational field in the presence of an external quantum photon


field. The Hamiltonian for the gravitational field interacting with the Maxwell
field is set up via the Einstein-Hilbert-Maxwell Lagrangian. The Hamilto-
nian field equations for the gravitational field are written given the forcing
Maxwell field. The forcing Maxwell field is described as a quantum noisy field in
terms of the creation, annihilation and conservation processes in the Hudson-
Parthasarathy formalism. The resulting solution for the metric tensor of the
gravitational field is then expressed as the sum of a free field part in terms of
gravitons and a forced part in terms of the Maxwell noisy source. Quantum
fluctuations in the gravitational field are then computed using this model, ie,
quantum averaged moments of the metric tensor in a given state of gravitons and
photons. This is used to evaluate quantum averaged moments of the position
and velocity of particles and of fluid velocity fields moving under the influence
of the quantized metric tensor.

[110] Characterizing diseased tissues in terms of the permittivity and per-


meability fields estimated using scattering theory for electromagnetic fields.
[111] Questions on applied linear algebra
Continuation of the summary of Varadhan’s work on large deviations
[112] Continuation of the summary of the work of Hudson and Parthasarathy
on quantum stochastic calculus

[113] Synthesis of MRI data from speech/EEG signals


Raghuveer along with Nikias developed bispectral analysis and mentioned
that some of its important applications are (a) To determine phase coupling in
the EEG signals and apply it to characterize brain disease, (b) To estimate the
phase of nonminimum phase systems and hence by combining both power spec-
tral and bispectral analysis, to obtain algorithms for estimating the transfer
function of nonminimum phase systems, (c) to estimate quadratic nonlinear-
ities. After this initial pioneering work of Raghuveer and Nikias, many other
well known engineers like Jerry Mendel, Georgios B. Giannakis and Anantharam
Swamy developed a host of applications of polyspectra to signal and image
processing algorithms. The primary feature of higher order statistics and
polyspectra is that they can detect non-Gaussianity and nonlinearity and can
determine when two collections of random variables are mutually statistically
independent (a test of independence), among other uses.

[114] Prediction of Neurological diseases by Speech Signals
