Time Series Analysis and Stochastic Processes


Contents

1 Introduction
1.1 Stochastic processes
1.2 Mean and covariance functions

Chapter 1

Introduction

Let us consider the following example:

Figure 1.1: Global mean land–ocean temperature index from 1880 to 2009, with the base period
1951–1980. The data are deviations, measured in degrees centigrade, from the 1951–1980 average.

Example 1.0.1. We observe the global mean land–ocean temperature index from 1880 to 2009,
with the base period 1951–1980. Each data point xt gives the mean deviation at the year
t ∈ {1880, . . . , 2009}. We observe the following:
• The order of the observations is important for drawing conclusions.
• A correlation between consecutive years is observed.
In this course we will learn to interpret time series and provide mathematical models. In particular,
the goals are
1. Understand the underlying mechanism driving the time series.
2. Separate (filter) noise from signals.
3. Form accurate and reliable forecasts.
We start with a basic definition.
Definition 1.0.1. A discrete time series is a sequence {x1 , . . . , xT } of observations.
We must highlight the following particularities of a time series:
1. Data points are ordered with respect to time.
2. Observations are generally not independent.
3. Time series analysis studies this dependence among adjacent observations.
4. This dependence allows us to forecast future values of a time series from current and past
values.
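The dependence between adjacent observations, and the way it supports forecasting, can be illustrated with a small simulation. This is a sketch not taken from the notes: a hypothetical AR(1) recursion x_t = φ·x_{t−1} + ε_t, whose lag-1 sample correlation is compared with that of an independent sequence.

```python
# Illustration (assumption: AR(1) model, not defined in the notes):
# consecutive observations of x_t = phi * x_{t-1} + eps_t are correlated,
# while an i.i.d. sequence shows essentially no lag-1 correlation.
import random

random.seed(0)
phi, T = 0.8, 5000

x = [0.0]
for _ in range(T - 1):
    x.append(phi * x[-1] + random.gauss(0, 1))

def lag1_corr(s):
    """Sample correlation between (s_1,...,s_{T-1}) and (s_2,...,s_T)."""
    n = len(s) - 1
    a, b = s[:-1], s[1:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / (va * vb) ** 0.5

iid = [random.gauss(0, 1) for _ in range(T)]
print(lag1_corr(x))    # close to phi = 0.8
print(lag1_corr(iid))  # close to 0
```

The past value x_{t−1} carries information about x_t exactly because of this correlation, which is what makes forecasting from current and past values possible.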

1.1 Stochastic processes


Definition 1.1.1. A probability space is a triple (Ω, A, P ) where

• Ω is a set;

• A is a σ-field (algebra);

• P is a probability measure.

The pair (Ω, A) is called the measurable space.

Remark 1.1.1. One of the most common measurable spaces is (Rd , B(Rd )) where B(Rd ) is called
the Borel σ-field and it is defined as the smallest σ-field (in the sense of inclusion) containing the
open sets (elements of the Euclidean topology) of Rd .

Definition 1.1.2. A random variable is a measurable function X : Ω → R. That is, a function


X : Ω → R such that for every Borel set A ∈ B(R), the set

X −1 (A) = {ω ∈ Ω : X(ω) ∈ A}

is measurable (it belongs to A). A random vector is a measurable function X : Ω → Rd .

A random variable X defines a probability measure PX over (R, B(R)), via the relation

PX (A) = P (X −1 (A)).

Such a probability measure can be uniquely characterized via sets of the form

(−∞, a], a ∈ R,

i.e., by means of the cumulative distribution function (cdf)

FX (x) = PX ((−∞, x]) = P (X ≤ x).

By the same means, a random vector X defines a probability measure PX over (Rd , B(Rd )), via the
relation
PX (A) = P (X −1 (A)),
which is determined by the probability of the rectangles

(−∞, a1 ] × · · · × (−∞, ad ],

and a fortiori by the multivariate cdf

FX (a) = P (X1 ≤ a1 , . . . , Xd ≤ ad ).

Definition 1.1.3. Let T be a set. A stochastic process with index set T is a family of random variables
{Xt }t∈T .

We observe that a stochastic process {Xt } with index set T defines a function

Ω × T ∋ (ω, t) 7→ Xt (ω) ∈ R

such that ω 7→ Xt (ω) is measurable for every fixed t. What the practitioner sees is not the entire
process {Xt }t∈T (otherwise, it would be perfectly understood, and inference would no longer be
required) but just a realization.

Definition 1.1.4. A realization of a stochastic process {Xt } with index set T is the curve {Xt (ω)}t∈T ,
i.e., a function
T ∋ t 7→ Xt (ω).

Statistical inference involves estimating the process given the observation of a realization.
However, first, we need to understand how stochastic processes are characterized and ensure their
existence.

Definition 1.1.5. Let {Xt }t∈T be a stochastic process with index set T ⊂ R and

F = {t = (t1 , . . . , tn ) ∈ T n , n ∈ N : t1 < · · · < tn }.

Then the family of (finite-dimensional) distribution functions of {Xt }t∈T is {Ft }t∈F , where for every
t = (t1 , . . . , tn ) ∈ F

Rn ∋ x = (x1 , . . . , xn ) 7→ Ft (x) = F(Xt1 ,...,Xtn ) (x) = P (Xt1 ≤ x1 , . . . , Xtn ≤ xn ).

The following result is known as Kolmogorov’s Theorem.

Theorem 1.1.2. Let T ⊂ R be a set and

F = {t = (t1 , . . . , tn ) ∈ T n , n ∈ N : t1 < · · · < tn }.

The probability distribution functions {Ft }t∈F are the distribution functions of some stochastic
process if and only if for any t = (t1 , . . . , tn ) ∈ F and 1 ≤ i ≤ n,

lim_{xi →+∞} Ft (x) = Ft−i (x−i ),     (1.1)

where
t−i = (t1 , . . . , ti−1 , ti+1 , . . . , tn )

and
x−i = (x1 , . . . , xi−1 , xi+1 , . . . , xn ).

Note that the compatibility condition in Kolmogorov’s Theorem can be also stated in terms of
the probability laws as follows.

Theorem 1.1.3. Let T ⊂ R. Let {µt }t∈F be such that for each t = (t1 , . . . , tn ) ∈ F , n ∈ N, µt is a
probability measure over Rn . Then the following are equivalent:

1. There exists a stochastic process {Xt }t∈T such that for every t = (t1 , . . . , tn ) ∈ F

µt (A1 × · · · × An ) = P (Xt1 ∈ A1 , . . . , Xtn ∈ An )

for all (measurable) subsets A1 , . . . , An of R.

2. The compatibility condition

µ(t1 ,...,ti−1 ,ti+1 ,...,tn ) (A1 × · · · × Ai−1 × Ai+1 × · · · × An ) = µ(t1 ,...,tn ) (A1 × · · · × Ai−1 × R × Ai+1 × · · · × An )

holds for every i ∈ {1, . . . , n}, (t1 , . . . , tn ) ∈ F , n ∈ N and (measurable) subsets A1 , . . . , An
of R.
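The compatibility condition has a very concrete reading in the discrete case: setting one coordinate's set to R means summing the joint law over that coordinate, which must reproduce the lower-dimensional law. The toy example below (a hypothetical joint pmf, not from the notes) checks this for a pair of coordinates on {0, 1}.

```python
# Toy check of the compatibility condition for a discrete joint law:
# mu_{(t1)}(A1) = mu_{(t1,t2)}(A1 x R), i.e. summing out the removed
# coordinate recovers the marginal. The joint pmf here is hypothetical.
joint = {  # pmf of (X_{t1}, X_{t2}) on {0,1} x {0,1}
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

marginal = {}
for (x1, x2), p in joint.items():
    # taking A2 = R amounts to summing over all values of x2
    marginal[x1] = marginal.get(x1, 0.0) + p

print(marginal)  # approximately {0: 0.4, 1: 0.6}
```

Kolmogorov's theorem says that any family of finite-dimensional laws consistent in this sense is realized by some process.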

Remark 1.1.4. In general, for an arbitrary index set T without an order relation, we should add the
condition of permutation invariance.

1.2 Mean and covariance functions


The mean of a random variable X is denoted as E[X] = ∫_R x dPX (x). A process {Xt }t∈T will define
a “curve of means”

T ∋ t 7→ E[Xt ],

as long as E[Xt ] is finite for all t ∈ T . In the same way, we can define the autocovariance function.

Definition 1.2.1. Let {Xt }t∈T be a process with E[Xt2 ] < +∞ for all t ∈ T . The autocovariance
function is

T × T ∋ (s, t) 7→ γX (s, t) = Cov(Xt , Xs ) = E[(Xt − E[Xt ])(Xs − E[Xs ])] ∈ R.

From the autocovariance function of a process we can define the concept of stationarity, which
we establish as follows. We will then observe that stationary processes have an autocovariance
function that depends only on the time difference.

Definition 1.2.2. Set T = Z. A process {Xt }t∈T is stationary if

1. for every t ∈ T , E[Xt2 ] < ∞ ;

2. E[Xt ] = m is constant; and

3. for all t, s, h ∈ T , γX (s + h, t + h) = γX (s, t).

Remark 1.2.1. A stationary process {Xt }t∈T satisfies γX (s, t) = γX (s − t, 0) for all t, s ∈ T .
Therefore, for stationary processes the autocovariance function might be written as

T ∋ h 7→ γX (h) = γX (h, 0).
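In practice γX (h) must be estimated from a single realization. The sketch below uses the standard sample autocovariance estimator γ̂(h) = (1/T) Σ_{t=1}^{T−h} (x_{t+h} − x̄)(x_t − x̄); this estimator is conventional but is not defined in the notes, so treat it as an assumption.

```python
# Sample autocovariance of a stationary process from one realization.
# For white noise, gamma(0) = 1 and gamma(h) = 0 for h != 0, so the
# estimates should be near those values.
import random

random.seed(1)
T = 20000
x = [random.gauss(0, 1) for _ in range(T)]  # white-noise realization

def sample_acvf(x, h):
    """gamma_hat(h) = (1/T) sum_{t} (x_{t+h} - xbar)(x_t - xbar)."""
    T = len(x)
    xbar = sum(x) / T
    return sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(T - h)) / T

print(sample_acvf(x, 0))  # near gamma(0) = 1
print(sample_acvf(x, 1))  # near gamma(1) = 0
```

Dividing by T (rather than T − h) is the usual convention because it keeps the estimated autocovariance matrix non-negative definite.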

Lemma 1.2.2. Let γ be the autocovariance of a stationary process. Then the following holds:

• (non-negative) γ(0) ≥ 0;

• |γ(h)| ≤ γ(0);

• (even) γ(h) = γ(−h) for all h ∈ T .

Definition 1.2.3. A function K : T = Z → R is non-negative definite if

∑_{j,k=1}^{n} aj ak K(j − k) ≥ 0,

for all n ∈ N and (a1 , . . . , an ) ∈ Rn .

Theorem 1.2.3. A real-valued function defined on the integers is the autocovariance function of a
stationary time series if and only if it is even and non-negative definite.
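Theorem 1.2.3 can be probed numerically. The sketch below (an illustration, assuming the candidate function γ(h) = 0.5^|h|, which is the autocovariance shape of an AR(1)-type process) builds the matrix K[j, k] = γ(j − k) and checks that its quadratic form is non-negative.

```python
# Numerical check of non-negative definiteness (Definition 1.2.3) for the
# even function gamma(h) = 0.5**|h|: the Toeplitz matrix K[j,k] = gamma(j-k)
# should give a' K a >= 0 for every a, equivalently non-negative eigenvalues.
import numpy as np

rng = np.random.default_rng(0)

def gamma(h):
    return 0.5 ** np.abs(h)

n = 8
t = np.arange(n)
K = gamma(t[:, None] - t[None, :])  # K[j,k] = gamma(j - k)

for _ in range(100):
    a = rng.standard_normal(n)
    assert a @ K @ a >= 0  # quadratic form is non-negative

print(np.linalg.eigvalsh(K).min())  # smallest eigenvalue (non-negative)
```

Evenness is visible in the matrix as symmetry, K = Kᵀ; non-negative definiteness is what guarantees that γ can be the autocovariance of an actual stationary series.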

A stronger notion of stationarity is introduced below.

Definition 1.2.4. Set T = Z. A process {Xt }t∈T is strictly stationary if the random vectors
(Xt1 , . . . , Xtn ) and (Xt1 +h , . . . , Xtn +h ) are equally distributed for all t1 , . . . , tn , h ∈ T and n ∈ N.

The following result is straightforward.

Lemma 1.2.4. Set T = Z. Let {Xt }t∈T be strictly stationary. Then PXt = PXs for all t, s ∈ T .

It is straightforward to observe that if we have the equality in distribution of the vectors
(Xt , Xs ) and (Xt+h , Xs+h ) and the process has finite second-order moments, then the covariances
automatically coincide. Therefore,

(strictly stationary + E[Xt2 ] < ∞) =⇒ stationary (the converse is not true).

As the previous display indicates, stationarity does not imply strict stationarity. However,
there is a family of random variables determined by the mean and covariance operator: the Gaussian
random variables.
Definition 1.2.5. A process {Xt }t∈T is Gaussian if (Xt1 , . . . , Xtn ) is Gaussian in Rn for all n ∈ N
and t1 , . . . , tn ∈ T .
For Gaussian processes, stationarity implies strict stationarity, i.e.,

(stationary + Gaussian) =⇒ (strictly stationary).

Gaussian random vectors


Definition 1.2.6. Let µ ∈ R and σ > 0. A random variable X ∈ R is Gaussian N (µ, σ 2 ) if

FX (x) = ∫_{−∞}^{x} e^{−(t−µ)²/(2σ²)} / √(2πσ²) dt.
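The integral in Definition 1.2.6 has no closed form, but it can be expressed through the error function as FX (x) = ½(1 + erf((x − µ)/(σ√2))). The sketch below compares this expression against a crude midpoint Riemann sum of the density.

```python
# Gaussian cdf via the error function, cross-checked against a direct
# numerical integration of the N(mu, sigma^2) density.
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gaussian_cdf_riemann(x, mu=0.0, sigma=1.0, lo=-20.0, n=200000):
    """Midpoint Riemann sum of the density over [lo, x]."""
    dt = (x - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) * dt
    return total / math.sqrt(2 * math.pi * sigma ** 2)

print(gaussian_cdf(0.0))          # 0.5, by symmetry of the density
print(gaussian_cdf_riemann(0.0))  # close to 0.5
```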

In Rd we can define Gaussian random vectors by means of the inner product

⟨x, y⟩ = x1 y1 + · · · + xd yd .

Definition 1.2.7. A random vector X in Rd is Gaussian if for every x ∈ Rd , the univariate random
variable ⟨x, X⟩ is Gaussian.
The mean of a multivariate random variable X = (X1 , . . . , Xd ) is

E[X] = (E[X1 ], . . . , E[Xd ]).

Remark 1.2.5. We notice that a vector x ∈ Rd is determined by (and can be identified with) the
mapping
y 7→ ⟨x, y⟩.
Therefore, we can also see the expectation of a random vector X ∈ Rd as the unique vector
E[X] ∈ Rd such that
⟨E[X], x⟩ = E[⟨X, x⟩].
The covariance (form) between two random vectors X, Y is the bilinear form

Rd × Rd ∋ (x, y) 7→ CovXY (x, y) = E[⟨x, X − E[X]⟩⟨y, Y − E[Y]⟩].

The covariance operator (or matrix) is

ΣXY = E[(X − E[X])t (Y − E[Y])].

The covariance matrix ΣXY satisfies

x(ΣXY )yt = CovXY (x, y).
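The identity x ΣXY yᵗ = CovXY (x, y) can be verified on simulated data, replacing expectations with sample means. This is a numerical sketch with a hypothetical pair of correlated random vectors, not an example from the notes.

```python
# Check that the bilinear covariance form equals the quadratic form of the
# covariance matrix: x Sigma_XY y^t = E[<x, X-EX><y, Y-EY>] (sample version).
import numpy as np

rng = np.random.default_rng(0)
N, d = 20000, 3
X = rng.standard_normal((N, d))
Y = X @ rng.standard_normal((d, d)) + rng.standard_normal((N, d))  # correlated with X

Xc = X - X.mean(axis=0)  # centered samples (rows are observations)
Yc = Y - Y.mean(axis=0)
Sigma_XY = Xc.T @ Yc / N  # sample version of E[(X - EX)^t (Y - EY)]

x, y = rng.standard_normal(d), rng.standard_normal(d)
lhs = x @ Sigma_XY @ y                 # x (Sigma_XY) y^t
rhs = np.mean((Xc @ x) * (Yc @ y))     # sample Cov_XY(x, y)
print(lhs, rhs)  # equal up to floating-point rounding
```

The two quantities agree exactly (up to rounding), since both are the same sum written in different orders; this is the content of the displayed identity.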

Lemma 1.2.6. The covariance CovXX satisfies the following:


1. (symmetric) for all x, y ∈ Rd , it holds CovXX (x, y) = CovXX (y, x);

2. (positive) for every x ∈ Rd , it holds CovXX (x, x) ≥ 0.



As a consequence, ΣXX is symmetric and non-negative definite.

Notice that a symmetric non-negative definite matrix M has non-negative eigenvalues λ1 ≥ · · · ≥ λd ≥ 0.


Moreover, there exists an orthogonal matrix1 P such that

M = P t diag(λ1 , . . . , λd ) P,

where diag(λ1 , . . . , λd ) denotes the diagonal matrix with entries λ1 , . . . , λd .

The matrix P encodes the change of basis from the canonical to the basis of eigenvectors of M .
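This spectral decomposition is easy to check numerically. A sketch with numpy: `np.linalg.eigh` returns the factorization M = V diag(w) Vᵗ with orthonormal columns in V, so in the notation above P = Vᵗ.

```python
# Spectral decomposition M = P^t diag(lambda) P of a symmetric
# non-negative definite matrix, using numpy's eigh.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = A @ A.T  # symmetric and non-negative definite by construction

w, V = np.linalg.eigh(M)  # eigenvalues (ascending) and orthonormal eigenvectors
P = V.T                   # P is orthogonal: P^{-1} = P^t

assert np.allclose(P @ P.T, np.eye(4))       # orthogonality
assert np.allclose(M, P.T @ np.diag(w) @ P)  # M = P^t diag(w) P
print(w)  # all eigenvalues are non-negative (up to rounding)
```

This is the decomposition used, for instance, to construct square roots Σ^{1/2} of covariance matrices when simulating Gaussian vectors.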

Definition 1.2.8. The characteristic function of a random vector X is defined as

ϕX : Rd ∋ x 7→ E[ei⟨X,x⟩ ] ∈ C.

Recall here that the complex exponential function over the real line is defined via Euler’s formula as

eit = cos(t) + i sin(t), t ∈ R.

Lemma 1.2.7. Two random vectors are equal in distribution if and only if their characteristic
functions agree.

Lemma 1.2.8. Let X be a Gaussian random vector with mean m and covariance matrix Σ. Then

• it has characteristic function

ϕX (x) = e^{i⟨m,x⟩ − ½⟨x,Σx⟩};

• and, if det(Σ) > 0, it has density

fX (x) = (1 / (det(Σ)^{1/2} (2π)^{d/2})) e^{−½⟨x−m, Σ^{−1}(x−m)⟩},

with respect to the Lebesgue measure.
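The characteristic function formula in Lemma 1.2.8 can be checked by Monte Carlo: the empirical average of e^{i⟨X,x⟩} over many Gaussian samples should approach the closed form. A sketch with hypothetical parameters m and Σ:

```python
# Monte Carlo check of the Gaussian characteristic function:
# mean of exp(i <X, x>) over samples vs exp(i<m,x> - <x, Sigma x>/2).
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

X = rng.multivariate_normal(m, Sigma, size=200000)  # Gaussian samples, rows
x = np.array([0.4, -0.7])

empirical = np.mean(np.exp(1j * (X @ x)))                  # E[e^{i<X,x>}], estimated
closed_form = np.exp(1j * (m @ x) - 0.5 * (x @ Sigma @ x))  # Lemma 1.2.8
print(empirical, closed_form)  # close for large sample sizes
```

Since the characteristic function determines the distribution (Lemma 1.2.7), this is also a numerical illustration of why a Gaussian vector is fully described by (m, Σ).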

Since the characteristic function determines the distribution of a random variable we obtain the
following result.

Theorem 1.2.9. Let X, Y ∈ Rd be Gaussian random vectors. Then X and Y are equally distributed
if and only if E[X] = E[Y] and ΣXX = ΣYY .

Remark 1.2.10. Theorem 1.2.9 yields the fact that, for Gaussian processes, stationarity implies
strict stationarity.

1
A matrix P is orthogonal if P −1 = P t .
