
VECTOR VALUED DATA

BUDDHANANDA BANERJEE

Date: Last updated June 12, 2024.

Definition 1. (X_1(ω), X_2(ω), ..., X_k(ω)) is a vector-valued random variable, where ω ∈ Ω.

(1) Stock Portfolio Returns: In finance, stock portfolio returns, derived from the daily returns of individual stocks, form a vector-valued random variable representing the aggregated performance of multiple securities.
(2) Weather Data: Meteorological data, encompassing parameters like temperature, humidity, wind speed, and precipitation, constitute a vector-valued random variable capturing the variability of weather conditions.
(3) Economic Indicators: Monthly or quarterly economic indicators such as GDP growth rate, inflation rate, unemployment rate, and consumer spending, forming vectors over time periods.
(4) Environmental Monitoring: Measurements from air quality monitoring stations, including concentrations of pollutants such as CO2, NOx, SO2, and particulate matter, organized as vectors.
(5) Traffic Flow Analysis: Traffic data collected from sensors installed on roads, highways, and intersections, including vehicle speed, volume, and traffic density.
(6) Wireless Communication Channels: In wireless communication, the channel matrix captures signal attenuation and phase shifts across transmission paths, representing a matrix-valued random variable subject to environmental fluctuations.
(7) Medical Imaging: Medical imaging data in MRI, organized as a matrix of voxel intensities, forms a matrix-valued random variable influenced by noise and other factors, reflecting the stochastic nature of the imaging process.
(8) Genomic Data: Genomic data, represented as a matrix with genes and experimental conditions, constitutes a matrix-valued random variable shaped by biological variation and measurement noise.
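As a small illustration of Definition 1 and of example (1), the sketch below (Python with numpy; all numbers are made up for illustration) treats each trading day as an outcome ω and the daily returns of three stocks as one realization of a 3-component random vector, then summarizes the realizations componentwise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: each row is one realization of the vector-valued
# random variable X(w) = (X1(w), X2(w), X3(w)), e.g. daily returns of
# three stocks on one trading day w.
n_days, k = 250, 3
returns = rng.normal(loc=0.0005, scale=0.01, size=(n_days, k))

# Componentwise sample means and the sample variance-covariance matrix
print("mean vector:", returns.mean(axis=0))
print("cov matrix :\n", np.cov(returns, rowvar=False))
```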
Definition 2. Let (X, Y) be a pair of random variables with joint distribution function F on the same probability space (Ω, A, P); then

F(x, y) = P(X ≤ x, Y ≤ y) = P({ω | ω ∈ Ω, X(ω) ∈ (−∞, x], Y(ω) ∈ (−∞, y]}).   (0.1)

A joint CDF satisfies the following properties:
(a) lim_{x↓−∞, y↓−∞} F(x, y) = 0
(b) lim_{x↑∞, y↑∞} F(x, y) = 1
(c) lim_{y↑∞} F(x, y) = F_X(x)  [marginal distribution of X]
(d) lim_{x↑∞} F(x, y) = F_Y(y)  [marginal distribution of Y]
(e) P(a < X ≤ b, c < Y ≤ d) = F(b, d) − F(a, d) − F(b, c) + F(a, c) ≥ 0  [non-decreasing]
(f) P(a < X ≤ b, c < Y ≤ d) = (F_X(b) − F_X(a))(F_Y(d) − F_Y(c)) iff X and Y are independent.

Joint and Marginal distributions

1. Properties of joint pmf/pdf

A non-negative function P(X = x, Y = y) = f(x, y) is said to be the joint p.m.f. of a discrete pair (X, Y) if it has the following properties:
∗ ∑_x ∑_y f(x, y) = 1
∗ ∑_y f(x, y) = f_X(x), i.e. the marginal p.m.f. of X
∗ ∑_x f(x, y) = f_Y(y), i.e. the marginal p.m.f. of Y

If (X, Y) is a pair of continuous-valued random variables, then
(1) The joint p.d.f. of (X, Y) is f(x, y) = ∂²F(x, y)/∂x∂y, such that ∫_x ∫_y f(u, v) du dv = 1.
(2) Marginal densities: ∫_x f(u, y) du = f_Y(y) and ∫_y f(x, v) dv = f_X(x).
(3) f(x, y) = f_X(x) f_Y(y) iff X and Y are independent.
(4) The conditional density of Y | X = x is f_{Y|X}(y|x) = f(x, y)/f_X(x) if f_X(x) > 0.
(5) The conditional expectation (regression) of Y given X = x, i.e. E(Y | X = x) = ∫_y y f_{Y|X}(y|x) dy, is a function of x.

2. Laws of expectation

Assume that (X, Y) has joint p.m.f. f(x, y) with ∑_x |x|² f_X(x) < ∞ and ∑_y |y|² f_Y(y) < ∞; then
(1) E(αX) = αE(X).
(2) E(X + Y) = E(X) + E(Y). NOTE: Independence is NOT required.
(3) E(XY) = E(X)E(Y) if X and Y are independent, but the converse is not true in general.
(4) Product moment: Cov(X, Y) = E[(X − μ_x)(Y − μ_y)] = E(XY) − μ_x μ_y.
(5) Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).
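A minimal numerical sketch of the definitions and laws above, assuming a small hand-made joint p.m.f. on the support {0, 1, 2} × {0, 1, 2} (the table entries are arbitrary but sum to one):

```python
import numpy as np

# Hypothetical joint p.m.f. f(x, y): x in {0,1,2} indexes rows, y in {0,1,2} columns
f = np.array([[0.10, 0.05, 0.05],
              [0.10, 0.20, 0.10],
              [0.05, 0.15, 0.20]])
x = np.array([0, 1, 2])
y = np.array([0, 1, 2])
assert np.isclose(f.sum(), 1.0)           # sum_x sum_y f(x, y) = 1

fX = f.sum(axis=1)                         # marginal p.m.f. of X
fY = f.sum(axis=0)                         # marginal p.m.f. of Y

# Conditional p.m.f. of Y given X = 1 and the regression E(Y | X = 1)
fY_given_x1 = f[1] / fX[1]
EY_given_x1 = (y * fY_given_x1).sum()

# Law of expectation: E(X + Y) = E(X) + E(Y) holds without independence
EX, EY = (x * fX).sum(), (y * fY).sum()
EXY = (np.outer(x, y) * f).sum()
E_sum = ((x[:, None] + y[None, :]) * f).sum()
assert np.isclose(E_sum, EX + EY)

# Product moment: Cov(X, Y) = E(XY) - E(X)E(Y)
cov_XY = EXY - EX * EY
print("E(Y|X=1) =", EY_given_x1, " Cov(X,Y) =", round(cov_XY, 4))
```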
Definition 3. Corr(X, Y) = ρ(X, Y) = [E(XY) − E(X)E(Y)] / √(Var(X) Var(Y)), satisfying |ρ(X, Y)| ≤ 1.

Correlation coefficient

Remark 1. Correlation can measure only the linear dependency between two random variables. Consider θ ∼ Uniform(0, 2π) and note that Corr(sin θ, cos θ) = 0, although we know that sin²θ + cos²θ = 1.

Let X = (X_1, X_2, ..., X_n)^T be a random vector with finite expectation for each of its components; then we define the expectation of a random vector as E(X) = (E(X_1), E(X_2), ..., E(X_n))^T. Similarly, if Y = ((Y_ij))_{m×n} is a random matrix with finite expectation for each of its components, then we define the expectation of a random matrix as E(Y) = ((E(Y_ij)))_{m×n}.

Dispersion matrix: The dispersion matrix, or variance-covariance matrix, is

D(X) = ((Cov(X_i, X_j)))_{n×n} = E[(X − E(X))(X − E(X))^T] = Σ_X

with the following properties:
∗ Cov(U_p, V_q) = ((Cov(U_i, V_j)))_{p×q}
∗ E(X + b) = E(X) + b
∗ D(X + b) = D(X)
∗ Cov(X + b, Y + c) = Cov(X, Y)

Dispersion matrix - Scatterplot
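The sketch below gives a quick Monte Carlo check of Remark 1 and of the shift-invariance property D(X + b) = D(X) of the dispersion matrix (the sample size and the shift vector b are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200_000)

# Remark 1: sin(theta) and cos(theta) are functionally dependent,
# yet their (linear) correlation is essentially zero.
print("corr(sin t, cos t) ~", np.corrcoef(np.sin(theta), np.cos(theta))[0, 1])

# Sample dispersion matrix D(X) and the property D(X + b) = D(X)
X = np.column_stack([theta, np.sin(theta), np.cos(theta)])   # 3-component random vector
b = np.array([5.0, -2.0, 7.0])                               # constant shift vector
D_X = np.cov(X, rowvar=False)
D_Xb = np.cov(X + b, rowvar=False)
print("max |D(X+b) - D(X)| =", np.abs(D_Xb - D_X).max())      # ~ 0 up to float error
```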
Important Results: Let X be a random vector with n components such that E(X) = μ and D(X) = Σ; then
(1) E(l^T X) = l^T μ, where l ∈ R^n is a constant vector,
(2) D(l^T X) = l^T Σ l,
(3) E(AX) = Aμ, where A ∈ R^{p×n} is a constant matrix,
(4) D(AX) = A Σ A^T and Cov(AX, BX) = A Σ B^T,
(5) If Cov(U_p, V_q) = Σ_{UV}, then Cov(AU, BV) = A Σ_{UV} B^T.

3. Multivariate Normal Distribution

Definition 4. Multivariate Normal: A random vector X is said to follow the multivariate normal distribution N(μ, Σ) if it has a density

f(x) = (1 / ((√(2π))^n √|Σ|)) exp{ −(1/2) (x − μ)^T Σ^{−1} (x − μ) }

for some μ ∈ R^n and |Σ| > 0.

Exercise 1. (X, Y) follow the bivariate normal distribution (μ_x, μ_y, σ_x², σ_y², ρ) if (X, Y) has joint density function

f(x, y) = (1 / (2π σ_x σ_y √(1 − ρ²))) exp{ −(1 / (2(1 − ρ²))) [ ((x − μ_x)/σ_x)² − 2ρ ((x − μ_x)/σ_x)((y − μ_y)/σ_y) + ((y − μ_y)/σ_y)² ] }.

Exercise 2. If (X, Y) follow the bivariate normal distribution (μ_x, μ_y, σ_x², σ_y², ρ), then show that Y | X = x follows

N( μ_y + ρ (σ_y/σ_x)(x − μ_x), (1 − ρ²) σ_y² ).
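These results can be verified by simulation. The sketch below draws from N(μ, Σ) for arbitrary illustrative μ, Σ and a 2 × 3 matrix A, and compares the empirical mean and dispersion of AX with Aμ and AΣA^T:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters (any valid choice works; Sigma must be positive definite)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, -1.0]])          # A is 2x3, so AX is a 2-component vector

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of X
AX = X @ A.T

print("empirical E(AX)      :", AX.mean(axis=0))
print("theoretical A mu     :", A @ mu)
print("empirical D(AX)      :\n", np.cov(AX, rowvar=False))
print("theoretical A Sigma A^T:\n", A @ Sigma @ A.T)
```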

Definition 5. If X ∼ N(μ, I_n), then X^T A X has a chi-squared distribution iff A is idempotent. Moreover, X^T A X ∼ χ² with df = Rank(A) and non-centrality parameter ncp = μ^T A μ.

Linear combination

Central vs. non-central chi-square distribution.

4. Tools from Linear Algebra

Column space: The column space of a matrix A = [a_1, a_2, ..., a_n] with columns a_1, a_2, ..., a_n is

C(A) = Sp{a_1, a_2, ..., a_n} = {Ax | x ∈ R^n}.

Hence, the row space of A, denoted by R(A), equals C(A^T), and

C(AA^T) = C(A) ⟹ Rank(AA^T) = Rank(A).

Quadratic form: A square matrix A = ((A_ij))_{n×n} is said to be
(a) positive definite (p.d.) if x^T A x > 0 for all x ≠ 0 ∈ R^n,
(b) positive semi-definite (p.s.d.) if x^T A x ≥ 0 for all x ≠ 0 ∈ R^n [also called non-negative definite (n.n.d.)].
Properties:
(a) If A is p.d. then |A| > 0.
(b) If A is p.s.d. then |A| ≥ 0.

If S ⊆ V, then the projection matrix of the subspace S is P_S, satisfying
(a) P_S v = v if v ∈ S,
(b) P_S v ∈ S for all v ∈ V.
A projection matrix P_S is an orthogonal projection matrix of the subspace S ⊆ V if (I − P_S) is a projection matrix of S^⊥ ⊆ V too.
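Definition 5 and the projection matrices just described fit together: an orthogonal projection matrix is symmetric and idempotent, so X^T A X should behave like a (non-central) chi-square with df = Rank(A) and ncp = μ^T A μ. The sketch below checks its mean, df + ncp, by simulation (the matrix B used to build the projection and the vector μ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
B = rng.normal(size=(n, 3))
A = B @ np.linalg.inv(B.T @ B) @ B.T     # orthogonal projection onto C(B) => idempotent
assert np.allclose(A @ A, A)             # A^2 = A

mu = np.arange(1.0, n + 1.0) / 10.0
X = rng.normal(size=(200_000, n)) + mu   # rows: draws of X ~ N(mu, I_n)
Q = np.einsum('ij,jk,ik->i', X, A, X)    # quadratic forms X^T A X

df = np.linalg.matrix_rank(A)            # = 3 here
ncp = mu @ A @ mu
print("empirical mean of X^T A X:", Q.mean())
print("df + ncp                 :", df + ncp)
```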
Projection

Orthogonal vectors: Two vectors u, v ∈ V are said to be orthogonal if u^T v = ∑_i u_i v_i = 0. The orthogonal projection matrix onto C(A) is A(A^T A)^{−1} A^T.

Quadratic forms

Vector differentiation:
∗ ∂(x^T A x)/∂x = x^T (A + A^T)
∗ ∂(a^T x)/∂x = ∂(x^T a)/∂x = a^T
(A quick numerical check of these two facts appears after this subsection.)

5. Prediction & Least square problem

Consider a data set D = {(x_i, y_i) | x_i ∈ R, y_i ∈ R, ∀ i = 1, 2, ..., n}, where the x_i are non-stochastic but the y_i are stochastic, being realized values of the random variables Y_i respectively. If the relation between the response variable y and the regressor variable x is linear in the parameters, then it is called a simple linear regression model. For example,

y = β_0 + β_1 x + ε
y = β_0 + β_1 e^x + ε

are both linear in the parameters and hence simple linear regression models.
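Before fitting the model, here is the promised numerical check of the two linear-algebra facts above: that P = A(A^T A)^{−1} A^T is a symmetric, idempotent (orthogonal) projection onto C(A), and that the gradient of x^T M x is x^T (M + M^T). The matrices A and M are arbitrary illustrative choices, and the gradient is checked by finite differences:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(8, 3))                        # arbitrary full-column-rank matrix
P = A @ np.linalg.inv(A.T @ A) @ A.T               # orthogonal projection onto C(A)

assert np.allclose(P, P.T)                         # symmetric
assert np.allclose(P @ P, P)                       # idempotent
v = A @ rng.normal(size=3)                         # a vector already in C(A)
assert np.allclose(P @ v, v)                       # P v = v for v in C(A)

# Vector differentiation: d(x^T M x)/dx = x^T (M + M^T), checked by finite differences
M = rng.normal(size=(3, 3))
x = rng.normal(size=3)
h = 1e-6
num_grad = np.array([((x + h * e) @ M @ (x + h * e) - x @ M @ x) / h
                     for e in np.eye(3)])
print("max gradient error:", np.abs(num_grad - x @ (M + M.T)).max())
```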
Gauss-Markov model: y_i = β_0 + β_1 x_i + ε_i, where the ε_i are i.i.d. N(0, σ²). Here β_0 ∈ R, β_1 ∈ R, σ > 0 are unknown model parameters. Since E(y_i) = β_0 + β_1 x_i and Var(y_i) = σ², we have

y_i ∼ N(β_0 + β_1 x_i, σ²)  ∀ i = 1, 2, ..., n.   (5.1)

Least square condition

Estimation of model parameters: The least squares condition to estimate the model parameters is to minimize

S(β_0, β_1) = ∑_i (y_i − β_0 − β_1 x_i)².   (5.2)

Simple linear regression

If (β̂_0, β̂_1) minimizes S(β_0, β_1), then their values can be obtained by solving the normal equations

∂S(β_0, β_1)/∂β_0 = 0 ⟹ n β̂_0 + β̂_1 ∑_i x_i = ∑_i y_i,   (5.3)
∂S(β_0, β_1)/∂β_1 = 0 ⟹ β̂_0 ∑_i x_i + β̂_1 ∑_i x_i² = ∑_i y_i x_i.   (5.4)

NOTE: (1) Defining S_xy = ∑_i (y_i − ȳ)(x_i − x̄) and S_xx = ∑_i (x_i − x̄)², we have the solutions

β̂_1 = S_xy / S_xx  and  β̂_0 = ȳ − β̂_1 x̄.

(2) We have never used the normality assumption for this estimation, i.e. the estimators remain the same even if ε_i does not follow a normal distribution.
(3) Prediction or regression line: For any x, such as the old x_i's or some new value x_new, the prediction line is ŷ = β̂_0 + β̂_1 x_new.
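A minimal sketch of this estimation on simulated data, assuming illustrative true values β_0 = 2, β_1 = 0.5 and σ = 1; the closed-form solution S_xy/S_xx is compared with the solution of the normal equations (5.3)-(5.4):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.uniform(0, 10, size=n)                      # non-stochastic regressor values
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)    # Gauss-Markov model with sigma = 1

# Closed-form least squares solution
xbar, ybar = x.mean(), y.mean()
Sxy = np.sum((y - ybar) * (x - xbar))
Sxx = np.sum((x - xbar) ** 2)
b1_hat = Sxy / Sxx
b0_hat = ybar - b1_hat * xbar

# The same estimates from the normal equations (5.3)-(5.4)
lhs = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b0_ne, b1_ne = np.linalg.solve(lhs, rhs)

print("closed form :", b0_hat, b1_hat)
print("normal eqns :", b0_ne, b1_ne)
print("prediction at x_new = 4.2:", b0_hat + b1_hat * 4.2)
```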

It is a natural extension when there is more than one regressor variable in the model. The model is written as

y = β_0 + β_1 x_1 + β_2 x_2 + ··· + β_k x_k + ε = x^T β + ε,   (5.5)

where ε ∼ N(0, σ²), x = (1, x_1, x_2, ..., x_k)^T and β = (β_0, β_1, β_2, ..., β_k)^T. It is trivial to notice from equation (5.5) that E(y | x) = x^T β is a hyperplane, whereas the same equation represents a straight line in simple linear regression. When we have more than one observation from the above model, the i-th observation can be represented as

y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + ··· + β_k x_{ki} + ε_i = x_i^T β + ε_i,

where the ε_i are i.i.d. N(0, σ²), x_i = (1, x_{1i}, x_{2i}, ..., x_{ki})^T and β = (β_0, β_1, β_2, ..., β_k)^T.

Multiple linear regression

For n such observations we use matrix notation to represent the model as

Y = X β + ε,   (5.6)

where Y = (y_1, y_2, ..., y_n)^T, β = (β_0, β_1, β_2, ..., β_k)^T, X = (x_1, x_2, ..., x_n)^T is the n × (k + 1) design matrix whose i-th row is x_i^T, and ε = (ε_1, ε_2, ..., ε_n)^T ∼ N(0, σ² I_n). Hence there are k + 2 unknown model parameters, β = (β_0, β_1, β_2, ..., β_k)^T and σ² > 0, which are to be estimated, where

Y ∼ N(X β, σ² I_n).   (5.7)

Least squares estimation: The least squares condition to be minimized to estimate β, σ² is

S(β) = (Y − Xβ)^T (Y − Xβ) = Y^T Y − 2 β^T X^T Y + β^T X^T X β.   (5.8)

If β̂ minimizes the least squares condition, then it satisfies the normal equations

∂S(β)/∂β |_{β = β̂} = 0 ⟹ −2 X^T Y + 2 X^T X β̂ = 0 ⟹ β̂ = (X^T X)^{−1} X^T y.   (5.9)

So, ŷ = X β̂ = X(X^T X)^{−1} X^T y = P_X y, where P_X = X(X^T X)^{−1} X^T is the orthogonal projection matrix onto the column space of X, i.e. C(X) = C(XX^T). It means ŷ ∈ C(X). Hence the estimated error in prediction is

e = y − ŷ = (I_n − P_X) y ∈ C(X)^⊥.

It is interesting to note that ŷ^T e = 0.
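The matrix formulas (5.6)-(5.9) in a short simulated sketch (k = 2 regressors and the true β are arbitrary choices); it also checks that P_X is idempotent and that ŷ^T e = 0:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 2
Z = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), Z])                  # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -3.0])                # (beta0, beta1, beta2), illustrative
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)      # Y = X beta + eps, sigma = 1

# Least squares estimate from the normal equations (5.9)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)          # = (X^T X)^{-1} X^T y

# Fitted values via the projection matrix P_X, and the residuals e
P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y
e = y - y_hat                                         # = (I_n - P_X) y

print("beta_hat         :", beta_hat)
print("P idempotent?    :", np.allclose(P @ P, P))
print("y_hat . e (~ 0)  :", float(y_hat @ e))
```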
6. Eigenvalue

Consider a matrix A and a nonzero vector v. If applying A to v is equivalent to simply scaling v by a factor of λ, where λ is a scalar, then v is an eigenvector of A and λ is the corresponding eigenvalue. This relationship can be expressed as Av = λv.

Eigenvalue

Multicollinearity

In statistics, multicollinearity or collinearity occurs when the predictors in a regression model are linearly dependent. Perfect multicollinearity arises when the predictive variables have an exact linear relationship; in this case the matrix X^T X cannot be inverted, leading to regression parameter estimates that are not well defined, as the system of equations has infinitely many solutions. Imperfect multicollinearity, on the other hand, occurs when the predictive variables have a nearly exact linear relationship.

Principal Component Analysis of X^T X
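A short sketch, under an arbitrary simulated design in which the third regressor is a nearly exact linear combination of the first two, showing how the eigenvalues of X^T X expose (imperfect) multicollinearity and how its eigenvectors yield the principal components:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - x2 + rng.normal(scale=1e-3, size=n)   # nearly exact linear combination
X = np.column_stack([x1, x2, x3])

# Eigen-decomposition of X^T X: a near-zero eigenvalue signals multicollinearity
G = X.T @ X
eigvals, eigvecs = np.linalg.eigh(G)
print("eigenvalues of X^T X:", eigvals)
print("condition number    :", eigvals.max() / eigvals.min())

# Principal components: project X onto the eigenvectors (largest eigenvalue first)
order = np.argsort(eigvals)[::-1]
pcs = X @ eigvecs[:, order]
print("variance captured by each PC:", pcs.var(axis=0))
```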

Department of Mathematics, IIT Kharagpur
URL: https://sites.google.com/site/buddhanandastat/
E-mail address: [email protected]
