VECTOR VALUED DATA

BUDDHANANDA BANERJEE

Definition 1. $(X_1(\omega), X_2(\omega), \cdots, X_k(\omega))$ is a vector valued random variable, where $\omega \in \Omega$.

(1) Stock Portfolio Returns: In finance, stock portfolio returns, derived from the daily returns of individual stocks, form a vector-valued random variable representing the aggregated performance of multiple securities.
(2) Weather Data: Meteorological data, encompassing parameters like temperature, humidity, wind speed, and precipitation, constitute a vector-valued random variable capturing the variability of weather conditions.
(3) Economic Indicators: Monthly or quarterly economic indicators such as GDP growth rate, inflation rate, unemployment rate, and consumer spending, forming vectors over time periods.
(4) Environmental Monitoring: Measurements from air quality monitoring stations, including concentrations of pollutants such as CO2, NOx, SO2, and particulate matter, organized as vectors.
(5) Traffic Flow Analysis: Traffic data collected from sensors installed on roads, highways, and intersections, including vehicle speed, volume, and traffic density.
(6) Wireless Communication Channels: In wireless communication, the channel matrix captures signal attenuation and phase shifts across transmission paths, representing a matrix-valued random variable subject to environmental fluctuations.
(7) Medical Imaging: Medical imaging data in MRI, organized as a matrix of voxel intensities, forms a matrix-valued random variable influenced by noise and other factors, reflecting the stochastic nature of the imaging process.
(8) Genomic Data: Genomic data, represented as a matrix with genes and experimental conditions, constitutes a matrix-valued random variable shaped by biological variation and measurement noise.

Date: Last updated June 12, 2024.
Definition 2. Let $(X, Y)$ be a pair of random variables with joint distribution function $F$ on the same probability space $(\Omega, \mathcal{A}, P)$; then
$$ F(x, y) = P(X \le x, Y \le y) = P(\{\omega \mid \omega \in \Omega,\ X(\omega) \in (-\infty, x],\ Y(\omega) \in (-\infty, y]\}). \qquad (0.1) $$

A joint CDF satisfies the following properties:
(a) $\lim_{x \downarrow -\infty,\, y \downarrow -\infty} F(x, y) = 0$
(b) $\lim_{x \uparrow \infty,\, y \uparrow \infty} F(x, y) = 1$
(c) $\lim_{y \uparrow \infty} F(x, y) = F_X(x)$ [Marginal distribution of $X$]
(d) $\lim_{x \uparrow \infty} F(x, y) = F_Y(y)$ [Marginal distribution of $Y$]
(e) $P(a < X \le b,\ c < Y \le d) = F(b, d) - F(a, d) - F(b, c) + F(a, c) \ge 0$ [Non-decreasing]
(f) $P(a < X \le b,\ c < Y \le d) = (F_X(b) - F_X(a))(F_Y(d) - F_Y(c))$ iff $X$ and $Y$ are independent.

[Figure: Joint and marginal distributions]

1. Properties of joint pmf/pdf

A non-negative function $P(X = x, Y = y) = f(x, y)$ is said to be the joint p.m.f. of discrete $(X, Y)$ if it has the following properties:
* $\sum_x \sum_y f(x, y) = 1$
* $\sum_y f(x, y) = f_X(x)$, i.e. the marginal p.m.f. of $X$.
* $\sum_x f(x, y) = f_Y(y)$, i.e. the marginal p.m.f. of $Y$.

If $(X, Y)$ is a pair of continuous valued random variables then
(1) The joint p.d.f. of $(X, Y)$ is $f(x, y) = \frac{\partial^2}{\partial x \partial y} F(x, y)$, such that $\int_x \int_y f(u, v)\, du\, dv = 1$.
(2) Marginal densities: $\int_x f(u, y)\, du = f_Y(y)$ and $\int_y f(x, v)\, dv = f_X(x)$.
(3) $f(x, y) = f_X(x) f_Y(y)$ iff $X$ and $Y$ are independent.
(4) The conditional density of $Y \mid X = x$ is $f_{Y|x}(y|x) = \frac{f(x, y)}{f_X(x)}$ if $f_X(x) > 0$.
(5) The conditional expectation (regression) of $Y$ given $X = x$, i.e. $E(Y \mid X = x) = \int y\, f_{Y|x}(y|x)\, dy$, is a function of $x$.

2. Laws of expectation

Assume that $(X, Y)$ has joint p.m.f. $f(x, y)$ with $\sum_x |x|^2 f_X(x) < \infty$ and $\sum_y |y|^2 f_Y(y) < \infty$; then
(1) $E(\alpha X) = \alpha E(X)$
(2) $E(X + Y) = E(X) + E(Y)$. NOTE: Independence is NOT required.
(3) $E(XY) = E(X)E(Y)$ if $X$ and $Y$ are independent, but the reverse is not true in general.
(4) Product moment: $\mathrm{Cov}(X, Y) = E(X - \mu_x)(Y - \mu_y) = E(XY) - \mu_x \mu_y$.
(5) $\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) + 2ab\, \mathrm{Cov}(X, Y)$
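The marginalization and covariance identities above are easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the joint p.m.f. table is made up purely for illustration. It recovers the marginal p.m.f.s, the product moment $\mathrm{Cov}(X, Y) = E(XY) - \mu_x\mu_y$, and the variance formula for $aX + bY$.

```python
import numpy as np

# Hypothetical joint p.m.f. of (X, Y) on a small grid (rows: x values, cols: y values).
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([-1.0, 1.0])
f = np.array([[0.10, 0.20],
              [0.25, 0.15],
              [0.20, 0.10]])          # entries sum to 1

assert np.isclose(f.sum(), 1.0)        # sum_x sum_y f(x, y) = 1

f_X = f.sum(axis=1)                    # marginal p.m.f. of X: sum over y
f_Y = f.sum(axis=0)                    # marginal p.m.f. of Y: sum over x

mu_x = (x_vals * f_X).sum()
mu_y = (y_vals * f_Y).sum()
E_XY = (np.outer(x_vals, y_vals) * f).sum()
cov_XY = E_XY - mu_x * mu_y            # product moment: E(XY) - mu_x * mu_y

var_X = ((x_vals - mu_x) ** 2 * f_X).sum()
var_Y = ((y_vals - mu_y) ** 2 * f_Y).sum()

# Check Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y) directly.
a, b = 2.0, -3.0
z_vals = a * x_vals[:, None] + b * y_vals[None, :]
mu_z = (z_vals * f).sum()
var_Z = ((z_vals - mu_z) ** 2 * f).sum()
assert np.isclose(var_Z, a**2 * var_X + b**2 * var_Y + 2 * a * b * cov_XY)
```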
Definition 3. $\mathrm{Corr}(X, Y) = \rho(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$, satisfying $|\rho(X, Y)| \le 1$.

[Figure: Correlation coefficient]

Remark 1. Correlation can only measure linear dependency between two random variables. Consider $\theta \sim \mathrm{Uniform}(0, 2\pi)$ and note that $\mathrm{Corr}(\sin\theta, \cos\theta) = 0$ although we know that $\sin^2\theta + \cos^2\theta = 1$.

Let $\mathbf{X} = (X_1, X_2, \cdots, X_n)^T$ be a random vector with finite expectation for each of the components; then we define the expectation of the random vector as $E(\mathbf{X}) = (E(X_1), E(X_2), \cdots, E(X_n))^T$. Similarly, if $\mathbf{Y} = ((Y_{ij}))_{m \times n}$ is a random matrix with finite expectation for each of the components, then we define the expectation of the random matrix as $E(\mathbf{Y}) = ((E(Y_{ij})))_{m \times n}$.

Dispersion matrix: The dispersion matrix or variance-covariance matrix is
$$ D(\mathbf{X}) = ((\mathrm{Cov}(X_i, X_j)))_{n \times n} = E[(\mathbf{X} - E(\mathbf{X}))(\mathbf{X} - E(\mathbf{X}))^T] = \Sigma_x $$
with the following properties:
* $\mathrm{Cov}(\mathbf{U}_p, \mathbf{V}_q) = ((\mathrm{Cov}(U_i, V_j)))_{p \times q}$
* $E(\mathbf{X} + \mathbf{b}) = E(\mathbf{X}) + \mathbf{b}$
* $D(\mathbf{X} + \mathbf{b}) = D(\mathbf{X})$
* $\mathrm{Cov}(\mathbf{X} + \mathbf{b}, \mathbf{Y} + \mathbf{c}) = \mathrm{Cov}(\mathbf{X}, \mathbf{Y})$

[Figure: Dispersion matrix - Scatterplot]
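A quick Monte Carlo illustration of Remark 1 and of the dispersion matrix; this is a minimal sketch assuming NumPy, with the sample size and parameter values chosen arbitrarily. The sample correlation of $\sin\theta$ and $\cos\theta$ is close to zero even though the two variables are functionally dependent, and np.cov returns the sample analogue of $D(\mathbf{X})$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Remark 1: theta ~ Uniform(0, 2*pi); sin(theta) and cos(theta) are
# functionally dependent, yet their (sample) correlation is ~ 0.
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
print(np.corrcoef(np.sin(theta), np.cos(theta))[0, 1])   # close to 0

# Sample dispersion (variance-covariance) matrix of a random vector X.
# Rows of X below are i.i.d. draws of a 3-component random vector.
X = rng.multivariate_normal(mean=[0.0, 1.0, 2.0],
                            cov=[[2.0, 0.5, 0.0],
                                 [0.5, 1.0, 0.3],
                                 [0.0, 0.3, 1.5]],
                            size=n)
D_hat = np.cov(X, rowvar=False)   # estimates Sigma_x = E[(X - EX)(X - EX)^T]
print(np.round(D_hat, 2))

# D(X + b) = D(X): shifting by a constant vector b leaves the dispersion unchanged.
b = np.array([10.0, -5.0, 3.0])
print(np.allclose(np.cov(X + b, rowvar=False), D_hat))
```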
Important Results: Let $\mathbf{X}$ be a random vector with $n$ components such that $E(\mathbf{X}) = \boldsymbol{\mu}$ and $D(\mathbf{X}) = \Sigma$; then
(1) $E(\mathbf{l}^T \mathbf{X}) = \mathbf{l}^T \boldsymbol{\mu}$, where $\mathbf{l} \in \mathbb{R}^n$ is a constant vector
(2) $D(\mathbf{l}^T \mathbf{X}) = \mathbf{l}^T \Sigma\, \mathbf{l}$
(3) $E(A\mathbf{X}) = A\boldsymbol{\mu}$, where $A \in \mathbb{R}^{p \times n}$ is a constant matrix
(4) $D(A\mathbf{X}) = A \Sigma A^T$ and $\mathrm{Cov}(A\mathbf{X}, B\mathbf{X}) = A \Sigma B^T$
(5) If $\mathrm{Cov}(\mathbf{U}_p, \mathbf{V}_q) = \Sigma_{uv}$ then $\mathrm{Cov}(A\mathbf{U}, B\mathbf{V}) = A \Sigma_{uv} B^T$
A simulation illustration of (3) and (4) appears after Exercise 2 below.

3. Multivariate Normal Distribution

Definition 4. Multivariate Normal: A random vector $\mathbf{X}$ is said to follow the multivariate normal $N(\boldsymbol{\mu}, \Sigma)$ if it has density
$$ f(\mathbf{x}) = \frac{\exp\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\}}{(\sqrt{2\pi})^n \sqrt{|\Sigma|}} $$
for some $\boldsymbol{\mu} \in \mathbb{R}^n$ and $|\Sigma| > 0$.

Exercise 1. $(X, Y)$ follow the bivariate normal $(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$ if $(X, Y)$ has joint density function
$$ f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_x}{\sigma_x}\right)^2 + \left(\frac{y - \mu_y}{\sigma_y}\right)^2 - 2\rho \left(\frac{x - \mu_x}{\sigma_x}\right)\left(\frac{y - \mu_y}{\sigma_y}\right) \right] \right\}. $$

Exercise 2. If $(X, Y)$ follow the bivariate normal $(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$ then show that $Y \mid X = x$ follows
$$ N\!\left(\mu_y + \rho \frac{\sigma_y}{\sigma_x}(x - \mu_x),\ (1 - \rho^2)\sigma_y^2\right). $$

[Figure: Filter]
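Exercise 2 can be sanity-checked numerically by conditioning on $X$ falling in a thin slab around $x$ and comparing the conditional sample mean and variance of $Y$ with $\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x)$ and $(1 - \rho^2)\sigma_y^2$. A minimal sketch assuming NumPy; the parameter values and slab width are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_x, mu_y, sd_x, sd_y, rho = 1.0, -1.0, 2.0, 1.5, 0.6

cov = np.array([[sd_x**2, rho * sd_x * sd_y],
                [rho * sd_x * sd_y, sd_y**2]])
XY = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000)

x0, h = 2.0, 0.05                       # condition on X in (x0 - h, x0 + h)
y_given = XY[np.abs(XY[:, 0] - x0) < h, 1]

print(y_given.mean(), mu_y + rho * sd_y / sd_x * (x0 - mu_x))   # conditional mean
print(y_given.var(),  (1 - rho**2) * sd_y**2)                   # conditional variance
```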
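As flagged above, the linear-transformation results $E(A\mathbf{X}) = A\boldsymbol{\mu}$ and $D(A\mathbf{X}) = A\Sigma A^T$ can also be checked by simulation. This is a minimal sketch assuming NumPy, with $\boldsymbol{\mu}$, $\Sigma$, and $A$ chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, -2.0, 0.5])                 # E(X) = mu
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])             # D(X) = Sigma
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0,  3.0]])                # constant 2x3 matrix

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows: draws of X
Y = X @ A.T                                            # rows: draws of AX

print(Y.mean(axis=0), A @ mu)                   # E(AX) ~= A mu
print(np.cov(Y, rowvar=False))                  # D(AX) ~= A Sigma A^T
print(A @ Sigma @ A.T)
```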
Definition 5. If $\mathbf{X} \sim N(\boldsymbol{\mu}, I_n)$ then $\mathbf{X}^T A \mathbf{X}$ has a Chi-squared distribution iff $A$ is idempotent. Moreover $\mathbf{X}^T A \mathbf{X} \sim \chi^2_{df = \mathrm{Rank}(A),\ ncp = \boldsymbol{\mu}^T A \boldsymbol{\mu}}$ (see the simulation sketch below).

[Figure: Linear combination]

[Figure: Central vs Non-Central Chi-square distribution]

4. Tools from Linear Algebra

Column Space: The column space of a matrix $A = [\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_n]$ with columns $\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_n$ is
$$ C(A) = \mathrm{Sp}\{\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_n\} = \{A\mathbf{x} \mid \mathbf{x} \in \mathbb{R}^n\}. $$
Hence the row-space of $A$, denoted by $R(A)$, equals $C(A^T)$, and
$$ C(AA^T) = C(A) \implies \mathrm{Rank}(AA^T) = \mathrm{Rank}(A). $$

Quadratic form: A square matrix $A = ((A_{ij}))_{n \times n}$ is said to be
(a) positive definite (p.d.) if $\mathbf{x}^T A \mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0} \in \mathbb{R}^n$;
(b) positive semi-definite (p.s.d.) if $\mathbf{x}^T A \mathbf{x} \ge 0$ for all $\mathbf{x} \neq \mathbf{0} \in \mathbb{R}^n$ [also called non-negative definite (n.n.d.)].
Properties:
(a) If $A$ is p.d. then $|A| > 0$.
(b) If $A$ is p.s.d. then $|A| \ge 0$.

If $S \subseteq V$ then the projection matrix of the subspace $S$ is $P_S$ satisfying
(a) $P_S \mathbf{v} = \mathbf{v}$ if $\mathbf{v} \in S$
(b) $P_S \mathbf{v} \in S$ for all $\mathbf{v} \in V$
A projection matrix $P_S$ is an orthogonal projection matrix of the subspace $S \subseteq V$ if $(I - P_S)$ is a projection matrix of $S^\perp \subseteq V$ too.
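Definition 5 can be illustrated with the idempotent centering matrix $A = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, for which $\mathbf{X}^T A \mathbf{X} = \sum_i (X_i - \bar{X})^2$. Below is a minimal sketch assuming NumPy, with an arbitrary choice of $n$ and simulation size; since a central $\chi^2_k$ variable has mean $k$ and variance $2k$, the simulated quadratic forms are compared against those moments with $k = \mathrm{Rank}(A) = n - 1$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

# Centering matrix A = I_n - (1/n) * 1 1^T is idempotent with rank n - 1.
A = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(A @ A, A)                       # idempotent
print(np.linalg.matrix_rank(A))                    # n - 1

# X ~ N(0, I_n)  =>  X^T A X ~ chi-square with df = n - 1 (ncp = 0).
X = rng.standard_normal((100_000, n))
q = np.einsum('ij,jk,ik->i', X, A, X)              # quadratic forms X^T A X

# A central chi-square(df) variable has mean df and variance 2*df.
print(q.mean(), n - 1)
print(q.var(), 2 * (n - 1))
```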
[Figure: Projection]

[Figure: Quadratic forms]

Orthogonal Vectors: Two vectors $\mathbf{u}, \mathbf{v} \in V$ are said to be orthogonal if $\mathbf{u}^T \mathbf{v} = \sum_i u_i v_i = 0$. The orthogonal projection matrix onto $C(A)$ is $A(A^T A)^{-1} A^T$ (see the numerical sketch below).

Vector differentiation:
* $\frac{\partial\, \mathbf{x}^T A \mathbf{x}}{\partial \mathbf{x}} = \mathbf{x}^T (A + A^T)$
* $\frac{\partial\, \mathbf{a}^T \mathbf{x}}{\partial \mathbf{x}} = \frac{\partial\, \mathbf{x}^T \mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}^T$

5. Prediction & Least square problem

Consider a data set $D = \{(x_i, y_i) \mid x_i \in \mathbb{R},\ y_i \in \mathbb{R},\ \forall i = 1, 2, \cdots, n\}$ where the $x_i$'s are non-stochastic but the $y_i$'s are stochastic, being realized values of the random variables $Y_i$ respectively. If the relation between the response variable $y$ and the regressor variable $x$ is linear in the parameters then it is called a simple linear regression model. For example,
$$ y = \beta_0 + \beta_1 x + \epsilon $$
$$ y = \beta_0 + \beta_1 e^x + \epsilon $$
are both linear in the parameters and hence simple linear regression models.
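A minimal numerical sketch, assuming NumPy, of the orthogonal projection matrix $A(A^TA)^{-1}A^T$ and of the defining properties of a projection matrix listed earlier; the matrix $A$ is random and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# A tall matrix with full column rank; C(A) is a 3-dimensional subspace of R^6.
A = rng.standard_normal((6, 3))
P = A @ np.linalg.inv(A.T @ A) @ A.T      # orthogonal projection onto C(A)

print(np.allclose(P @ P, P))              # idempotent
print(np.allclose(P, P.T))                # symmetric (orthogonal projection)

v = A @ np.array([1.0, -2.0, 0.5])        # a vector already in C(A)
print(np.allclose(P @ v, v))              # P_S v = v for v in S

w = rng.standard_normal(6)
print(np.allclose(A.T @ ((np.eye(6) - P) @ w), 0))   # (I - P)w is orthogonal to C(A)
```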
Gauss-Markov model: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$. Here $\beta_0 \in \mathbb{R}$, $\beta_1 \in \mathbb{R}$, $\sigma > 0$ are unknown model parameters. Since $E(y_i) = \beta_0 + \beta_1 x_i$ and $\mathrm{Var}(y_i) = \sigma^2$, we have
$$ y_i \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2) \quad \forall i = 1, 2, \cdots, n. \qquad (5.1) $$

[Figure: Least square condition]

Estimation of model parameters: The least squares condition to estimate the model parameters is to minimize
$$ S(\beta_0, \beta_1) = \sum_i (y_i - \beta_0 - \beta_1 x_i)^2. \qquad (5.2) $$

[Figure: Simple linear regression]

If $(\hat{\beta}_0, \hat{\beta}_1)$ minimizes $S(\beta_0, \beta_1)$ then their values can be obtained by solving the normal equations
$$ \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = 0 \implies n\hat{\beta}_0 + \hat{\beta}_1 \sum_i x_i = \sum_i y_i \qquad (5.3) $$
$$ \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = 0 \implies \hat{\beta}_0 \sum_i x_i + \hat{\beta}_1 \sum_i x_i^2 = \sum_i y_i x_i \qquad (5.4) $$

NOTE: (1) Defining $S_{xy} = \sum_i (y_i - \bar{y})(x_i - \bar{x})$ and $S_{xx} = \sum_i (x_i - \bar{x})^2$, we have the solutions
$$ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. $$
(2) We have never used the normality assumption for this estimation, i.e. the estimators will be the same even if $\epsilon_i$ does not follow a normal distribution.
(3) Prediction or regression line: For any $x$, such as the old $x_i$'s or some $x_{new}$, the prediction line is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_{new}$.
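A minimal sketch, assuming NumPy, of the closed-form solution $\hat{\beta}_1 = S_{xy}/S_{xx}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, compared against np.polyfit; the data are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data from y_i = beta0 + beta1*x_i + eps_i with iid N(0, sigma^2) errors.
beta0, beta1, sigma, n = 2.0, -1.5, 0.7, 200
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

x_bar, y_bar = x.mean(), y.mean()
S_xy = np.sum((y - y_bar) * (x - x_bar))
S_xx = np.sum((x - x_bar) ** 2)

beta1_hat = S_xy / S_xx                  # slope from the normal equations
beta0_hat = y_bar - beta1_hat * x_bar    # intercept

print(beta0_hat, beta1_hat)
print(np.polyfit(x, y, deg=1))           # returns [slope, intercept]; should agree

# Prediction line at a new x.
x_new = 4.2
print(beta0_hat + beta1_hat * x_new)
```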
It is a natural extension when there is more than one regressor variable in the model. The model is written as
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon = \mathbf{x}^T \boldsymbol{\beta} + \epsilon, \qquad (5.5) $$
where $\epsilon \sim N(0, \sigma^2)$, $\mathbf{x} = (1, x_1, x_2, \cdots, x_k)^T$ and $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \cdots, \beta_k)^T$. It is trivial to notice from equation (5.5) that $E(y|\mathbf{x}) = \mathbf{x}^T \boldsymbol{\beta}$ is a hyperplane, whereas the same equation represents a straight line in simple linear regression. When we have more than one observation from the above model, the $i$th observation can be represented as
$$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \epsilon_i = \mathbf{x}_i^T \boldsymbol{\beta} + \epsilon_i, $$
where $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ and $\mathbf{x}_i = (1, x_{1i}, x_{2i}, \cdots, x_{ki})^T$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \cdots, \beta_k)^T$.

[Figure: Multiple linear regression]

For $n$ such observations we use matrix notation to represent the model as
$$ \mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad (5.6) $$
where $\mathbf{Y} = (y_1, y_2, \cdots, y_n)^T$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \cdots, \beta_k)^T$, $X$ is the $n \times (k+1)$ design matrix whose $i$th row is $\mathbf{x}_i^T = (1, x_{1i}, x_{2i}, \cdots, x_{ki})$, and $\boldsymbol{\epsilon} = (\epsilon_1, \epsilon_2, \cdots, \epsilon_n)^T \sim N(\mathbf{0}, \sigma^2 I_n)$. Hence there are $k + 2$ unknown model parameters, $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \cdots, \beta_k)^T$ and $\sigma^2 > 0$, which are to be estimated, where
$$ \mathbf{Y} \sim N(X\boldsymbol{\beta},\ \sigma^2 I_n). \qquad (5.7) $$

Least Squares Estimation: The least squares condition to be minimized to estimate $\boldsymbol{\beta}, \sigma^2$ is
$$ S(\boldsymbol{\beta}) = (\mathbf{Y} - X\boldsymbol{\beta})^T (\mathbf{Y} - X\boldsymbol{\beta}) = \mathbf{Y}^T \mathbf{Y} - 2\boldsymbol{\beta}^T X^T \mathbf{Y} + \boldsymbol{\beta}^T X^T X \boldsymbol{\beta}. \qquad (5.8) $$
If $\hat{\boldsymbol{\beta}}$ minimizes the least squares condition then it satisfies the normal equations
$$ \frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\Big|_{\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}} = 0 \implies -2X^T \mathbf{Y} + 2X^T X \hat{\boldsymbol{\beta}} = 0 \implies \hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}. \qquad (5.9) $$

So $\hat{\mathbf{y}} = X\hat{\boldsymbol{\beta}} = X(X^T X)^{-1} X^T \mathbf{y} = P_X \mathbf{y}$, where $P_X = X(X^T X)^{-1} X^T$ is the orthogonal projection matrix onto the column space of $X$, i.e. $C(X)$. It means $\hat{\mathbf{y}} \in C(X) = C(XX^T)$. Hence the estimated error in prediction is
$$ \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (I_n - P_X)\mathbf{y} \in C(X)^\perp = C(XX^T)^\perp. $$
It is interesting to note that $\hat{\mathbf{y}}^T \mathbf{e} = 0$.
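A minimal sketch, assuming NumPy and data simulated only for illustration, of equation (5.9) and the projection interpretation: $\hat{\boldsymbol{\beta}} = (X^TX)^{-1}X^T\mathbf{y}$, $\hat{\mathbf{y}} = P_X\mathbf{y}$, and the orthogonality $\hat{\mathbf{y}}^T\mathbf{e} = 0$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 100, 3

# Design matrix with an intercept column; true beta chosen arbitrarily.
Z = rng.standard_normal((n, k))
X = np.column_stack([np.ones(n), Z])              # n x (k+1)
beta_true = np.array([1.0, 0.5, -2.0, 3.0])
y = X @ beta_true + rng.normal(0, 0.5, size=n)

# Normal equations: beta_hat = (X^T X)^{-1} X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)

P_X = X @ np.linalg.inv(X.T @ X) @ X.T            # orthogonal projection onto C(X)
y_hat = P_X @ y                                   # same as X @ beta_hat
e = y - y_hat                                     # residuals, in C(X)^perp

print(np.allclose(y_hat, X @ beta_hat))
print(np.allclose(P_X @ P_X, P_X))                # P_X is idempotent
print(float(y_hat @ e))                           # ~ 0: fitted values orthogonal to residuals
```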
6. Eigenvalue

In statistics, multicollinearity or collinearity occurs when the predictors in a regression model are linearly dependent. Perfect multicollinearity arises when the predictive variables have an exact linear relationship. In this case, the matrix $X^T X$ cannot be inverted, leading to parameter estimates of the regression that are not well-defined, as the system of normal equations has infinitely many solutions. Imperfect multicollinearity, on the other hand, occurs when the predictive variables have a nearly exact linear relationship.

[Figure: Multicollinearity]

Consider a matrix $A$ and a nonzero vector $\mathbf{v}$. If applying $A$ to $\mathbf{v}$ is equivalent to simply scaling $\mathbf{v}$ by a factor $\lambda$, where $\lambda$ is a scalar, then $\mathbf{v}$ is an eigenvector of $A$ and $\lambda$ is the corresponding eigenvalue. This relationship can be expressed as $A\mathbf{v} = \lambda\mathbf{v}$.

[Figure: Eigenvalue]

[Figure: Principal Component Analysis of $X^T X$]
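A minimal sketch, assuming NumPy and two nearly collinear predictors simulated only for illustration, connecting the two ideas above: the eigen-decomposition of $X^TX$ reveals near-collinearity through a very small eigenvalue, and its eigenvectors are the principal component directions referred to in the figure.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Two nearly collinear predictors: x2 is almost a linear function of x1.
x1 = rng.uniform(0, 10, size=n)
x2 = 3.0 * x1 + rng.normal(0, 0.01, size=n)       # imperfect multicollinearity
X = np.column_stack([np.ones(n), x1, x2])

XtX = X.T @ X
eigvals, eigvecs = np.linalg.eigh(XtX)            # X^T X is symmetric p.s.d.
print(eigvals)                                    # one eigenvalue is nearly 0
print(eigvals.max() / eigvals.min())              # huge condition number

# Eigenvector check: A v = lambda v for each eigenpair.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(XtX @ v, lam * v)

# The eigenvector with the smallest eigenvalue points along the (near) linear
# dependence among the columns of X; the larger eigenpairs give the leading
# principal component directions of X^T X.
print(eigvecs[:, 0])
```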
Department of Mathematics, IIT Kharagpur
URL: https://sites.google.com/site/buddhanandastat/
E-mail address: [email protected]