SE 513: System Identification
Topic 07: ARX, ARMAX, and Other Models
Md Shafiullah, Ph.D.
Lecture Outline
Random Processes
Shift Operators
AutoRegressive with eXtra input (ARX) model
AutoRegressive Moving Average with eXtra input (ARMAX) model
Other models
Random Processes
A stochastic process is a mathematical description of random events that
occur one after another. It is possible to order these events according to the
time at which they occur.
A stochastic process, also known as a random process, is a collection of
random variables indexed by time. A random process is conceptually an
extension of a random variable.
If $t_1 = t_2 = t$:
$C_{XX}(t,t) = E[X^2(t)] - (E[X(t)])^2 = \mathrm{Var}[X(t)]$
Note: $\rho_{XX}(t,t) = 1$
$R_{XY}(t, t+\tau) = E[X(t)\,Y(t+\tau)]$
For a WSS process:
$E[X(t)] = \text{constant}$
$V[X(t)] = \text{constant}$
Example: $X(t) = A\sin(\omega t + \varphi)$, where $\varphi$ is uniformly distributed over $[0, 2\pi]$.
$E[X(t)] = \int_0^{2\pi} A\sin(\omega t + \varphi) \times \frac{1}{2\pi}\,d\varphi = \frac{A}{2\pi}\int_0^{2\pi} \sin(\omega t + \varphi)\,d\varphi$
$E[X(t)] = 0 = \text{constant}$
From the definition of the uniform distribution: $f(\varphi) = \frac{1}{2\pi - 0} = \frac{1}{2\pi}$
Example: $X(t) = \cos(\omega_0 t + \theta)$, where $\theta$ is uniformly distributed over $[-\pi, \pi]$.
$E[X(t)] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \cos(\omega_0 t + \theta)\,d\theta = 0 = \text{constant}$
From the definition of the uniform distribution: $f(\theta) = \frac{1}{\pi - (-\pi)} = \frac{1}{2\pi}$
The second moment of the process:
$v(t) = \mathrm{Var}[X(t)] = V[X(t)] = E[X^2(t)] - (E[X(t)])^2$
$V[X(t)] = E[X^2(t)] - 0^2 = E[X^2(t)] = E[\cos^2(\omega_0 t + \theta)]$
$V[X(t)] = E\left[\frac{1 + \cos[2(\omega_0 t + \theta)]}{2}\right] = E\left[\frac{1}{2}\right] + E\left[\frac{\cos[2(\omega_0 t + \theta)]}{2}\right]$
$V[X(t)] = \frac{1}{2} + \frac{1}{2}E[\cos(2\omega_0 t + 2\theta)] = \frac{1}{2} + 0 = \frac{1}{2} = \text{constant}$
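As a quick numerical sanity check, the following sketch (base MATLAB; the values of $\omega_0$ and $t$ are arbitrary choices) estimates the ensemble mean and variance of $X(t) = \cos(\omega_0 t + \theta)$; both should come out close to the theoretical values 0 and 1/2 at any time instant.
MATLAB Command:
% Monte Carlo check of E[X(t)] = 0 and Var[X(t)] = 1/2
% for X(t) = cos(w0*t + theta), theta ~ Uniform(-pi, pi)
N = 1e6;                       % ensemble size
w0 = 2; t = 0.7;               % arbitrary frequency and time instant
theta = -pi + 2*pi*rand(N,1);  % uniform phase on [-pi, pi]
X = cos(w0*t + theta);         % ensemble of realizations at time t
fprintf('mean = %.4f (theory 0)\n', mean(X));
fprintf('var  = %.4f (theory 0.5)\n', var(X));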
Example: The random process $X(t)$ takes the value $-1$ with probability $1/3$ and the value $+1$ with probability $2/3$. Find whether $X(t)$ is a stationary process or not.
Mean:
$E[X(t)] = \sum_{n} nP_n = (-1)\times\frac{1}{3} + (+1)\times\frac{2}{3} = \frac{1}{3} = \text{constant}$
Variance:
$V[X(t)] = E[X^2(t)] - (E[X(t)])^2 = \sum_{n} n^2 P_n - \left(\frac{1}{3}\right)^2 = 1 - \frac{1}{9} = \frac{8}{9} = \text{constant}$
Since both the mean and the variance are constant in time, $X(t)$ satisfies these stationarity conditions.
By contrast, a process whose mean varies with time cannot be stationary, e.g.:
$E[X(t)] = \frac{2}{\pi}\cos t \neq \text{constant}$
$E[X(t)] = -\frac{2A}{\pi}\sin(\omega_0 t) \neq \text{constant}$
A process is strict-sense stationary if its joint density is invariant under a time shift $\tau$:
$f_X(x_1, x_2, \ldots, x_N; t_1, t_2, \ldots, t_N) = f_X(x_1, x_2, \ldots, x_N; t_1+\tau, t_2+\tau, \ldots, t_N+\tau)$
Two random processes, $X(t)$ and $Y(t)$, are called jointly WSS if the following conditions are satisfied:
o $X(t)$ is WSS
o $Y(t)$ is WSS
o $R_{XY}(t_1, t_2) = E[X(t_1)\,Y(t_2)] = R_{XY}(t, t+\tau) = R_{XY}(\tau) = R_{XY}(t_2 - t_1)$
Because $\cos(\omega t)$ and $\sin(\omega t)$ are deterministic signals, the expectation leaves them unchanged and acts only on the random amplitudes. Since $E[A] = E[B] = 0$, the mean of the process is zero.
Example: $X(t) = A\cos(\omega t + \varphi)$, where $\varphi$ is uniformly distributed over $[0, 2\pi]$.
$E[X(t)] = \int_0^{2\pi} A\cos(\omega t + \varphi) \times \frac{1}{2\pi}\,d\varphi = \frac{A}{2\pi}\int_0^{2\pi} \cos(\omega t + \varphi)\,d\varphi$
$E[X(t)] = 0 = \text{constant}$
From the definition of the uniform distribution: $f(\varphi) = \frac{1}{2\pi - 0} = \frac{1}{2\pi}$
$R(t_1, t_2) = \frac{A^2}{2}\,E\big[\cos\{\omega(t_1 + t_2) + 2\varphi\} + \cos\{\omega(t_1 - t_2)\}\big]$
$R(t_1, t_2) = \frac{A^2}{2} \times E\big[\cos\{\omega(t_1 + t_2) + 2\varphi\}\big] + \frac{A^2}{2} \times E\big[\cos\{\omega(t_1 - t_2)\}\big]$
$R(t_1, t_2) = \frac{A^2}{2} \times 0 + \frac{A^2}{2} \times \cos\{\omega(t_1 - t_2)\} = \text{a function of } t_1 - t_2$
Hence, $X(t)$ is a WSS process.
Wide-Sense Stationary (WSS) Process
Example: $X(t) = Y\sin(\omega t)$, where $Y$ is a uniform random variable on the interval $[-1, +1]$. Check whether the process is WSS or not.
The mean of the process:
$E[X(t)] = \int_{-\infty}^{\infty} X(t)\,f(Y)\,dY = \int_{-1}^{1} Y\sin(\omega t) \times f(Y)\,dY$
$E[X(t)] = \sin(\omega t)\int_{-1}^{1} Y \times \frac{1}{2}\,dY$
$E[X(t)] = 0 = \text{constant}$
From the definition of the uniform distribution: $f(Y) = \frac{1}{1-(-1)} = \frac{1}{2}$
$R(t_1, t_2) = \int_{-1}^{1} Y\sin(\omega t_1) \times Y\sin(\omega t_2) \times f(Y)\,dY$
$R(t_1, t_2) = \sin(\omega t_1) \times \sin(\omega t_2)\int_{-1}^{1} Y^2 \times \frac{1}{2}\,dY$
From the definition of the uniform distribution: $f(Y) = \frac{1}{1-(-1)} = \frac{1}{2}$
$R(t_1, t_2) = \frac{1}{3}\sin(\omega t_1) \times \sin(\omega t_2)$
Using the product-to-sum identity:
$R(t_1, t_2) = \frac{1}{6}\big[\cos(\omega(t_1 - t_2)) - \cos(\omega(t_1 + t_2))\big]$
$R(t_1, t_2) \neq \text{a function of } t_1 - t_2 \text{ only}$
Hence, $X(t)$ is not a WSS process.
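An ensemble simulation makes this concrete: estimating $R(t_1, t_2)$ for two pairs of instants with the same lag $t_1 - t_2$ gives different values. A minimal sketch (base MATLAB; $\omega = 1$ is an arbitrary choice):
MATLAB Command:
% Ensemble estimate of R(t1,t2) for X(t) = Y*sin(w*t), Y ~ Uniform(-1,1)
N = 1e6; w = 1;
Y = -1 + 2*rand(N,1);                 % uniform on [-1, 1]
R = @(t1,t2) mean((Y*sin(w*t1)).*(Y*sin(w*t2)));
fprintf('R(1,2) = %.4f\n', R(1,2));   % one pair with lag 1
fprintf('R(2,3) = %.4f\n', R(2,3));   % same lag, different value: not WSS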
Shift Operators
The unit delay is represented by the shift operator $q^{-1}$: $y(k) = q^{-1}u(k) = u(k-1)$; in the z-domain, $Y(z) = z^{-1}U(z)$.
Two-tap moving average, Finite Impulse Response (FIR) filtering:
$y(k) = w_1 u(k) + w_2 u(k-1)$
$y(k) = \frac{1}{2}u(k) + \frac{1}{2}u(k-1)$; when $w_1 = w_2 = \frac{1}{2}$
$y(k) = \frac{1}{2}\left(1 + q^{-1}\right)u(k)$
[Block diagram: $u(k) \rightarrow \frac{1}{2}(1 + q^{-1}) \rightarrow y(k)$]
First-order autoregressive process:
$y(k) = y(k-1) + e(k)$
$\left[1 - q^{-1}\right]y(k) = e(k)$
$y(k) = \frac{1}{1 - q^{-1}}\,e(k)$
[Block diagram: $e(k) \rightarrow \frac{1}{1 - q^{-1}} \rightarrow y(k)$]
First-order Infinite Impulse Response (IIR) filtering:
$y(k) = y(k-1) + u(k) + u(k-1)$
$\left(1 - q^{-1}\right)y(k) = \left(1 + q^{-1}\right)u(k)$
$y(k) = \frac{1 + q^{-1}}{1 - q^{-1}}\,u(k)$
[Block diagram: $u(k) \rightarrow \frac{1 + q^{-1}}{1 - q^{-1}} \rightarrow y(k)$]
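Each of these shift-operator relations maps directly onto MATLAB's filter(b,a,x), where b holds the numerator coefficients in powers of $q^{-1}$ and a the denominator. A minimal sketch (base MATLAB; the input and noise sequences are arbitrary):
MATLAB Command:
% Shift-operator models realized as digital filters
u = randn(1,500);                   % arbitrary input sequence
e = randn(1,500);                   % white-noise sequence
y_fir = filter([0.5 0.5], 1, u);    % y(k) = (1/2)(1 + q^-1) u(k)
y_ar  = filter(1, [1 -1], e);       % y(k) = e(k)/(1 - q^-1)
y_iir = filter([1 1], [1 -1], u);   % y(k) = (1 + q^-1)/(1 - q^-1) u(k)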
Using shift operators, the general input-output relation can be expressed as:
$y(k) = G(q^{-1})\,u(k) + H(q^{-1})\,e(k)$
$y(k)$: output
$u(k)$: input
$e(k)$: noise
$G(q^{-1})$ is the transfer operator [analogous to the transfer function in the Laplace and z-domains]
[Block diagram: $u(k)$ passes through $G(q^{-1})$ and $e(k)$ passes through $H(q^{-1})$; the two branches sum to give $y(k)$]
Most common plants can be modelled as:
$G(q^{-1}) = \frac{q^{-1}B^*(q^{-1})}{A(q^{-1})}$
$A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2} + \cdots + a_{n_a} q^{-n_a}$
$y(k) + a_1 y(k-1) + a_2 y(k-2) + a_3 y(k-3) + \cdots + a_{n_a} y(k-n_a) = b_1 u(k-1) + b_2 u(k-2) + b_3 u(k-3) + \cdots + b_{n_b} u(k-n_b) + e(k)$
$G(q^{-1}) = \frac{B(q^{-1})}{A(q^{-1})}$ and $H(q^{-1}) = \frac{1}{A(q^{-1})}$
AutoRegressive with eXtra input (ARX) model
The error $e(k)$ enters directly into the difference equation; thus, the model is also known as the equation error model:
$y(k) = \left[1 - A(q^{-1})\right]y(k) + B(q^{-1})\,u(k) + e(k)$
$\hat{y}(k|k-1) = \left[1 - A(q^{-1})\right]y(k) + B(q^{-1})\,u(k)$ (the white-noise term $e(k)$ is unpredictable, so it does not appear in the predictor)
The term $\left[1 - A(q^{-1})\right]y(k)$ is the AutoRegressive (AR) part; $B(q^{-1})u(k)$ is the eXtra (X) input part.
The regressor vector is:
$\varphi(k) = [y(k-1), y(k-2), \ldots, y(k-n_a);\ u(k-1), u(k-2), \ldots, u(k-n_b)]$
Inputs: $[u(1), u(2), u(3), \ldots, u(10)]$
Outputs: $[y(1), y(2), y(3), \ldots, y(10)]$
$y(11)$ does not depend on $u(11)$ for this specific model; but in other models, it can be included.
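Because the one-step predictor is linear in the parameters, the ARX coefficients can be estimated by ordinary least squares built from the regressor $\varphi(k)$. A sketch for $n_a = n_b = 1$ (the simulated system and its coefficients are illustrative assumptions):
MATLAB Command:
% LSE of a first-order ARX model: y(k) + a1*y(k-1) = b1*u(k-1) + e(k),
% simulated here with assumed true values a1 = -0.8, b1 = 2
N = 5000;
u = randn(N,1); e = 0.1*randn(N,1);
y = filter([0 2], [1 -0.8], u) + filter(1, [1 -0.8], e);  % simulate
Phi = [-y(1:N-1), u(1:N-1)];       % regressor rows: [-y(k-1), u(k-1)]
theta = Phi \ y(2:N);              % least-squares estimate [a1; b1]
fprintf('a1 = %.3f, b1 = %.3f\n', theta(1), theta(2));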
AutoRegressive Moving Average with eXtra input (ARMAX) model
ARMAX Model:
$y(k) + a_1 y(k-1) + a_2 y(k-2) + a_3 y(k-3) + \cdots + a_{n_a} y(k-n_a) = b_1 u(k-1) + b_2 u(k-2) + b_3 u(k-3) + \cdots + b_{n_b} u(k-n_b) + e(k) + c_1 e(k-1) + c_2 e(k-2) + \cdots + c_{n_c} e(k-n_c)$
$y(k) = \frac{B(q^{-1})}{A(q^{-1})}\,u(k) + \frac{C(q^{-1})}{A(q^{-1})}\,e(k)$
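If the System Identification Toolbox is available (an assumption), the arx and armax estimators accept the polynomial orders directly. A sketch, with u and y column vectors from any experiment and Ts the sampling time:
MATLAB Command:
% ARX and ARMAX estimation (System Identification Toolbox)
data = iddata(y, u, Ts);
m_arx   = arx(data, [2 2 1]);       % orders [na nb nk]
m_armax = armax(data, [2 2 2 1]);   % orders [na nb nc nk]
present(m_armax)                    % display the estimated polynomials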
Other models
ARARX Model:
$A(q^{-1})\,y(k) = B(q^{-1})\,u(k) + \frac{1}{D(q^{-1})}\,e(k)$
Here $A(q^{-1})y(k)$ is the AutoRegressive (AR) part, $B(q^{-1})u(k)$ is the eXtra (X) input part, and the noise term $\frac{1}{D(q^{-1})}e(k)$ is itself an AR model.
ARARMAX Model:
$A(q^{-1})\,y(k) = B(q^{-1})\,u(k) + \frac{C(q^{-1})}{D(q^{-1})}\,e(k)$
Here the noise term $\frac{C(q^{-1})}{D(q^{-1})}e(k)$ is an ARMA model.
Investigate the last terms: how they are AR and ARMA!
Box-Jenkins Model:
$y(k) = \frac{B(q^{-1})}{A(q^{-1})}\,u(k) + \frac{C(q^{-1})}{D(q^{-1})}\,e(k)$
Output Error (OE) Model:
$y(k) = \frac{B(q^{-1})}{F(q^{-1})}\,u(k) + e(k)$
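With the System Identification Toolbox (again an assumption), these structures have dedicated estimators; a sketch using the iddata object from the earlier example (note the toolbox names the plant denominator F in both cases):
MATLAB Command:
% Box-Jenkins and Output Error estimation (System Identification Toolbox)
m_bj = bj(data, [2 2 2 2 1]);   % orders [nb nc nd nf nk]
m_oe = oe(data, [2 2 1]);       % orders [nb nf nk]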
ARX Model
[Block diagram: the noise $e(k)$ is filtered by $\frac{1}{A(q^{-1})}$ before being added to the output]
ARMAX Model
[Block diagram: the noise $e(k)$ is filtered by $\frac{C(q^{-1})}{A(q^{-1})}$ before being added to the output]
Box-Jenkins Model
[Block diagram: the noise $e(k)$ is filtered by $\frac{C(q^{-1})}{D(q^{-1})}$ before being added to the output]
Output Error Model
[Block diagram: the noise $e(k)$ is added directly to the output, unfiltered]
Model Order Selection
ARMAX Model:
$y(k) + a_1 y(k-1) + a_2 y(k-2) + a_3 y(k-3) + \cdots + a_{n_a} y(k-n_a) = b_1 u(k-1) + b_2 u(k-2) + b_3 u(k-3) + \cdots + b_{n_b} u(k-n_b) + e(k) + c_1 e(k-1) + c_2 e(k-2) + \cdots + c_{n_c} e(k-n_c)$
$B(q^{-1}) = b_1 q^{-1} + b_2 q^{-2} + \cdots + b_{n_b} q^{-n_b}$
$A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2} + \cdots + a_{n_a} q^{-n_a}$
$C(q^{-1}) = 1 + c_1 q^{-1} + c_2 q^{-2} + \cdots + c_{n_c} q^{-n_c}$
$\theta = [a_1, a_2, \ldots, a_{n_a}, b_1, b_2, \ldots, b_{n_b}, c_1, c_2, \ldots, c_{n_c}]$
$d = n_a + n_b + n_c$ = a fixed number
$d = n_a$ and $n_b = n_c = 0$ for the AR model
$d = n_a + n_b$ and $n_c = 0$ for the ARX model
$d = n_a + n_c$ and $n_b = 0$ for the ARMA model
$d = n_a + n_b + n_c$ for the ARMAX model
LSE can be used for model parameter estimation.
For simplicity, we consider equally balanced models:
$d = n_a$ and $n_b = n_c = 0$ for the AR model
$d = n_a + n_b$ and $n_c = 0$ for the ARX model, where $n_a = n_b$
$d = n_a + n_c$ and $n_b = 0$ for the ARMA model, where $n_a = n_c$
$d = n_a + n_b + n_c$ for the ARMAX model, where $n_a = n_b = n_c$
Naïve approach (see the sketch after this list):
▪ Calculate the error for $d = 1$ to $d_{max}$
▪ Choose the value of $d$ for which the error is minimum
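A sketch of the naïve approach for an equally balanced ARX structure, fitting each order by least squares and recording the residual sum of squares (base MATLAB; u and y are assumed to be measured column vectors):
MATLAB Command:
% Naive order selection: fit ARX(d,d) for d = 1..dmax, track the error
dmax = 10; N = length(y); J = zeros(dmax,1);
for d = 1:dmax
    Phi = zeros(N-d, 2*d);
    for i = 1:d
        Phi(:,i)   = -y(d+1-i : N-i);   % -y(k-i)
        Phi(:,d+i) =  u(d+1-i : N-i);   %  u(k-i)
    end
    th = Phi \ y(d+1:N);                % least-squares fit of order d
    J(d) = sum((y(d+1:N) - Phi*th).^2); % residual sum of squares
end
[~, dbest] = min(J);                    % order with the minimum error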
[Figure: polynomial fits of increasing order, where $d = m$ = model order]
▪ $d = 1$: not enough degrees of freedom to represent the data. Underfitting: the model is too simple.
▪ $d = 2$: the data are well described by this model choice. Correct model order.
▪ $d > 11$: too many degrees of freedom; the model is "perfect" for this particular set of data. Overfitting: the model is too complex!
Model fitting vs. model complexity:
[Figure: correlation function]
Note: the autocorrelation of a continuous white noise signal has a strong peak (a Dirac delta function) at $t = 0$ and is 0 for all $t \neq 0$.
Cross-correlation test:
▪ The cross-correlation between the input(s) and the error should be zero, meaning that there is nothing left to be extracted by playing with the model order!
▪ The cross-correlation between the error $\varepsilon(t)$ and the input $u(t)$ can be defined as:
$r_{\varepsilon u}(\tau) = E[\varepsilon(t+\tau)\,u(t)]$
▪ The prediction errors $\varepsilon(t)$ should be independent of the input $u(t)$ for $\tau \geq 0$ (current and future errors are independent of current inputs).
▪ Ideally, the prediction errors $\varepsilon(t)$ are independent of the input $u(t)$ for any $\tau$ (all the errors are independent of all the inputs).
A few references also calculate the cross-correlations of (a) the squared input and the residuals, (b) the squared input and the squared residuals, and (c) the residuals and (input × residuals). A sketch of the basic test follows.
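A minimal version of the test (xcorr is from the Signal Processing Toolbox, an assumption; eps_r denotes the residuals of whichever model was fitted):
MATLAB Command:
% Cross-correlation test between residuals eps_r and input u
N = length(eps_r); maxlag = 25;
[r, lags] = xcorr(eps_r - mean(eps_r), u - mean(u), maxlag, 'unbiased');
rho = r/(std(eps_r)*std(u));      % normalized cross-correlation
conf99 = 2.58/sqrt(N);            % approximate 99% confidence band
stem(lags, rho); hold on
plot(lags,  conf99*ones(size(lags)), 'r--');
plot(lags, -conf99*ones(size(lags)), 'r--');
xlabel('\tau'); ylabel('r_{\epsilon u}(\tau)');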
Example: adding zero-mean white noise with a variance of 4 to a sinusoid.
MATLAB Command:
a=5; b=4; t=1:0.05:20;
x=a*cos(2*pi*t/10);              % clean signal
xn=x+sqrt(b)*randn(size(x));     % noisy signal, noise variance b = 4
Example: adding white noise at 20 dB SNR to a sawtooth signal.
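One way to do this is to scale the noise power to the target SNR (a sketch; sawtooth is from the Signal Processing Toolbox, and the signal parameters are arbitrary):
MATLAB Command:
% Adding white noise at 20 dB SNR to a sawtooth signal
t = 0:0.01:10;
x = sawtooth(2*pi*0.5*t);          % 0.5 Hz sawtooth
SNRdB = 20;
Px = mean(x.^2);                   % signal power
Pn = Px/10^(SNRdB/10);             % noise power for the target SNR
xn = x + sqrt(Pn)*randn(size(x));  % noisy signal at ~20 dB SNR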
Validation: the model is validated on a fresh set of data.
$y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a) = b_1 u(t-n_k) + \cdots + b_{n_b} u(t-n_b-n_k+1) + e(t)$
$y(t)$: output at time $t$
$n_k$: number of input samples that occur before the input affects the output, also called the dead time of the system
MATLAB Command:
% Generating a random input
L=51200;              % length of input
u=2*randn(1,L);       % input
% Computing output by filtering u and e through G and H: model output
y=filter(b,a,u)+filter(1,a,e);
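A sketch of validation on fresh data with the System Identification Toolbox (an assumption): estimate on the first half of the records and check the fit, predictions, and residuals on the second half.
MATLAB Command:
% Estimate on one data set, validate on another
u = u(:); y = y(:);                             % column vectors for iddata
ze = iddata(y(1:end/2), u(1:end/2), 1);         % estimation data
zv = iddata(y(end/2+1:end), u(end/2+1:end), 1); % fresh validation data
m  = arx(ze, [2 2 1]);                          % fit on estimation data
compare(zv, m);                                 % simulated fit on fresh data
yp = predict(m, zv, 1);                         % one-step-ahead prediction
resid(m, zv);                                   % residual correlation tests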
Canonical form of a stochastic process
If the process $y(t)$ is in canonical form, we can now define the inverse of $W(z)$ as: $\widetilde{W}(z) = W(z)^{-1}$
$Y(z) = W(z)\,E(z) = \frac{C(z)}{A(z)}\,E(z)$
$E(z) = \widetilde{W}(z)\,Y(z) = \frac{A(z)}{C(z)}\,Y(z)$
$y(t) = \sum_{j=-\infty}^{t} w(t-j)\,e(j) = \sum_{i=0}^{\infty} w(i)\,e(t-i)$
$w(t)$ is the system impulse response with input $e(t)$.
We can now do the same for $e(t)$:
$e(t) = \sum_{i=0}^{\infty} \widetilde{w}(i)\,y(t-i)$
[Block diagrams: $e(t) \rightarrow W(z) = \frac{C(z)}{A(z)} \rightarrow y(t)$ and $y(t) \rightarrow \widetilde{W}(z) = \frac{A(z)}{C(z)} \rightarrow e(t)$]
To predict $y(t)$ at $r$ steps ahead:
$y(t+r) = \sum_{i=0}^{\infty} w(i)\,e(t+r-i)$
$y(t+r) = w(0)e(t+r) + w(1)e(t+r-1) + w(2)e(t+r-2) + \cdots + w(r-1)e(t+1) + w(r)e(t) + w(r+1)e(t-1) + \cdots$
The first part of the above equation (the terms up to $w(r-1)e(t+1)$) is not computable, as we cannot obtain $e(t+1), e(t+2), \ldots, e(t+r)$: we have no knowledge of $y$ at those time instants. However, the second part (from $w(r)e(t)$ onwards) can be obtained.
$y(t+r) = \varepsilon(t+r) + \hat{y}(t+r|t)$
$\hat{y}(t+r|t) = y(t+r) - \varepsilon(t+r)$
$\varepsilon(t+r) = y(t+r) - \hat{y}(t+r|t)$
$W(z) = \frac{C(z)}{A(z)} = \sum_{t=0}^{\infty} w(t)\,z^{-t} = w(0) + w(1)z^{-1} + w(2)z^{-2} + \cdots + w(r-1)z^{-r+1} + w(r)z^{-r} + w(r+1)z^{-r-1} + \cdots$
The second group of terms (from $w(r)z^{-r}$ onwards) contains the expressions used in the predictor equation, while the first group contains the expressions used in the error equation. Therefore, taking the long division of the transfer function $W(z)$ might be useful.
Computing the $r$-step long division of the transfer function $W(z)$:
$W(z) = \frac{C(z)}{A(z)} = Q_r(z) + \frac{R_r(z)}{A(z)} = Q_r(z) + z^{-r}\,\frac{R(z)}{A(z)}$
$W(z) = w(0) + w(1)z^{-1} + w(2)z^{-2} + \cdots + w(r-1)z^{-r+1} + z^{-r}\left\{w(r) + w(r+1)z^{-1} + \cdots\right\}$
so the quotient $Q_r(z)$ collects the first $r$ impulse-response coefficients.
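The quotient and remainder can be computed numerically by truncating the impulse response of $C(z)/A(z)$; a sketch in base MATLAB, shown here on the MA(2) example from the later slides, where $A(z) = 1$:
MATLAB Command:
% r-step long division of W(z) = C(z)/A(z), coefficients in powers of z^-1
C = [1 -1/12 -1/12]; A = 1; r = 2;
w  = filter(C, A, [1 zeros(1,r-1)]);   % impulse response w(0)..w(r-1)
Qr = w;                                % quotient = first r coefficients
num = conv(A, Qr);                     % A(z)*Qr(z)
L = max(length(C), length(num));
Rr = [C zeros(1,L-length(C))] - [num zeros(1,L-length(num))]; % C - A*Qr
% Qr = [1 -1/12] and Rr = [0 0 -1/12], i.e., R_2(z) = -(1/12) z^-2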
Recalling the whitening filter $e(t) = \widetilde{W}(z)\,y(t) = \frac{A(z)}{C(z)}\,y(t)$, therefore:
$\hat{y}(t+r|t) = \frac{R(z)}{A(z)}\,e(t) = \frac{R(z)}{A(z)} \times \frac{A(z)}{C(z)}\,y(t)$
$\hat{y}(t+r|t) = \frac{R(z)}{C(z)}\,y(t)$
Calculation of the variance of the prediction error:
$\mathrm{Var}[\varepsilon(t+r)] = \mathrm{Var}[w(0)e(t+r) + w(1)e(t+r-1) + \cdots + w(r-1)e(t+1)]$
$= E\left[\left(w(0)e(t+r) + w(1)e(t+r-1) + \cdots + w(r-1)e(t+1)\right)^2\right]$
$= E\big[\left(w(0)e(t+r)\right)^2 + \left(w(1)e(t+r-1)\right)^2 + \cdots + \left(w(r-1)e(t+1)\right)^2 + 2w(0)w(1)e(t+r)e(t+r-1) + \cdots\big]$
$= E\big[\left(w(0)e(t+r)\right)^2 + \left(w(1)e(t+r-1)\right)^2 + \cdots + \left(w(r-1)e(t+1)\right)^2\big]$
as $E[e(t+r)\,e(t+r-1)] = 0$ for white noise.
$\mathrm{Var}[\varepsilon(t+r)] = w(0)^2 E[e(t+r)^2] + w(1)^2 E[e(t+r-1)^2] + \cdots + w(r-1)^2 E[e(t+1)^2]$
$= 1 \times \lambda^2 + w(1)^2 \times \lambda^2 + \cdots + w(r-1)^2 \times \lambda^2$
$= \lambda^2\left\{1 + w(1)^2 + \cdots + w(r-1)^2\right\}$
Example (2-step long division): $Q_2(z) = 1 - \frac{1}{6}z^{-1}$, $R_2(z) = \frac{1}{12}z^{-2} = z^{-2}R(z)$
Optimal Predictor for MA processes
The following MA process is WSS:
$y(t) = c_0 e(t) + c_1 e(t-1) + \cdots + c_n e(t-n) = \sum_{i=0}^{n} c_i e(t-i)$; $e(t) \sim \mathrm{wn}(\mu, \lambda^2)$
To predict 1 step ahead, we need to make the process canonical by imposing the following conditions:
$c_0 = 1$, therefore: $C(z) = 1 + c_1 z^{-1} + \cdots + c_n z^{-n}$
Making the mean of the white noise zero: $e(t) \sim \mathrm{wn}(0, \lambda^2)$
The MA process now becomes:
$y(t) = e(t) + c_1 e(t-1) + \cdots + c_n e(t-n)$; $e(t) \sim \mathrm{wn}(0, \lambda^2)$
$y(t) = \varepsilon(t) + \hat{y}(t|t-1)$
It can also be expressed as:
$y(t+1) = \varepsilon(t+1) + \hat{y}(t+1|t)$
MA Process: Example
Evaluate the 2-step-ahead predictor of the following MA(2) process:
$y(t) = e(t) - \frac{1}{12}e(t-1) - \frac{1}{12}e(t-2)$; $e(t) \sim \mathrm{wn}(0, 1)$
To predict 2 steps ahead, we need to check the following aspects first:
Mean of the process: $E[y(t)] = 0$, as $e(t)$ is of zero mean.
Transfer function of the process:
$W(z) = \frac{C(z)}{A(z)} = \frac{1 - \frac{1}{12}z^{-1} - \frac{1}{12}z^{-2}}{1}$
The two polynomials, $C(z)$ and $A(z)$, are coprime, monic, and of the same order; therefore, the process is in canonical form.
$C(z)$ has roots $\left[\frac{1}{3}, -\frac{1}{4}\right]$, both inside the unit circle: an asymptotically stable filter.
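The root locations are easy to verify numerically: multiplying $C(z)$ through by $z^2$ gives $z^2 - \frac{1}{12}z - \frac{1}{12}$, whose roots MATLAB returns directly (base MATLAB):
MATLAB Command:
% Roots of C(z) = 1 - (1/12)z^-1 - (1/12)z^-2, i.e., of z^2 - z/12 - 1/12
r = roots([1 -1/12 -1/12])   % returns 1/3 and -1/4, both inside the unit circle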
Recall the $r$-step-ahead prediction definition:
$y(t+r) = w(0)e(t+r) + w(1)e(t+r-1) + \cdots + w(r)e(t) + w(r+1)e(t-1) + \cdots$
For an MA process, $w(i) = c_i$, with $c_0 = w(0) = 1$ for the canonical form:
$y(t+r) = e(t+r) + c_1 e(t+r-1) + c_2 e(t+r-2) + \cdots + c_{r-1}e(t+1) + c_r e(t) + c_{r+1}e(t-1) + \cdots + c_n e(t+r-n)$
Shifting back by $r$ steps:
$y(t) = e(t) + c_1 e(t-1) + c_2 e(t-2) + \cdots + c_{r-1}e(t-r+1) + c_r e(t-r) + c_{r+1}e(t-r-1) + \cdots + c_n e(t-n)$
$y(t) = \varepsilon(t) + \hat{y}(t|t-r)$
Given process ($r = 2$):
$y(t) = e(t) - \frac{1}{12}e(t-1) - \frac{1}{12}e(t-2)$
Therefore,
$\hat{y}(t|t-2) = -\frac{1}{12}e(t-2)$
$\varepsilon(t) = e(t) - \frac{1}{12}e(t-1)$
We need a predictor that uses only output data. To do so, we compute the whitening filter:
$y(t) = W(z)\,e(t)$
$e(t) = \widetilde{W}(z)\,y(t)$
$e(t-2) = \widetilde{W}(z)\,y(t-2)$
Therefore,
$\hat{y}(t|t-2) = -\frac{1}{12}e(t-2) = -\frac{1}{12}\widetilde{W}(z)\,y(t-2)$
Substituting $\widetilde{W}(z) = \frac{A(z)}{C(z)} = \frac{1}{C(z)}$:
$\hat{y}(t|t-2) = -\frac{1}{12} \times \frac{1}{1 - \frac{1}{12}z^{-1} - \frac{1}{12}z^{-2}}\,y(t-2)$
$\left(1 - \frac{1}{12}z^{-1} - \frac{1}{12}z^{-2}\right)\hat{y}(t|t-2) = -\frac{1}{12}\,y(t-2)$
$\hat{y}(t|t-2) - \frac{1}{12}\hat{y}(t-1|t-3) - \frac{1}{12}\hat{y}(t-2|t-4) = -\frac{1}{12}\,y(t-2)$
$\hat{y}(t|t-2) = \frac{1}{12}\hat{y}(t-1|t-3) + \frac{1}{12}\hat{y}(t-2|t-4) - \frac{1}{12}\,y(t-2)$
Variance of the prediction error:
$\mathrm{Var}[\varepsilon(t)] = \mathrm{Var}\left[e(t) - \frac{1}{12}e(t-1)\right] = E\left[\left(e(t) - \frac{1}{12}e(t-1)\right)^2\right]$
$= E\left[e(t)^2 + \frac{1}{144}e(t-1)^2 - \frac{2}{12}\,e(t)\,e(t-1)\right]$
$= E[e(t)^2] + \frac{1}{144}E[e(t-1)^2] - \frac{1}{6}E[e(t)\,e(t-1)]$
$= 1 + \frac{1}{144} \times 1 - \frac{1}{6} \times 0 = \frac{145}{144}$
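Both the recursion and the variance are easy to confirm by simulation (base MATLAB; the recursion's initial conditions are taken as zero and a transient is discarded):
MATLAB Command:
% Simulation check of the 2-step MA(2) predictor and its error variance
N = 1e5; e = randn(N,1);                 % e(t) ~ wn(0,1)
y = filter([1 -1/12 -1/12], 1, e);       % the MA(2) process
yhat = zeros(N,1);
for t = 5:N                              % recursion derived above
    yhat(t) = (1/12)*yhat(t-1) + (1/12)*yhat(t-2) - (1/12)*y(t-2);
end
eps_r = y(1000:N) - yhat(1000:N);        % prediction errors past transient
fprintf('Var = %.4f (theory 145/144 = %.4f)\n', var(eps_r), 145/144);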
AR Process: Example
Evaluate the 2-step-ahead predictor of the following AR(1) process:
$y(t) = -\frac{1}{2}y(t-1) + e(t)$; $e(t) \sim \mathrm{wn}(14, 108)$
$\hat{y}(t|t-2) = ?$
Further exercises:
$y(t) = \frac{1 + \frac{1}{5}z^{-1}}{1 - \frac{1}{2}z^{-1}}\,u(t-1) + \frac{1}{1 - \frac{1}{2}z^{-1}}\,e(t)$; $e(t) \sim \mathrm{wn}(0, 2)$; $\hat{y}(t|t-1) = ?$
$y(t) = -\frac{1}{3}y(t-1) + e(t) + \frac{1}{2}e(t-1)$; $e(t) \sim \mathrm{wn}(0, 2)$; $\hat{y}(t|t-1) = ?$
$y(t) = \frac{1 + \frac{1}{5}z^{-1}}{1 - \frac{1}{2}z^{-1}}\,u(t-1) + \frac{1 + \frac{1}{4}z^{-2}}{1 - \frac{1}{2}z^{-1}}\,e(t)$; $e(t) \sim \mathrm{wn}(0, 2)$; $\hat{y}(t|t-1) = ?$; $\hat{y}(t|t-2) = ?$
Content Courtesy
Dr. Mujahed Mohammad Al-Dhaifallah
Dr. Fouad M. Al-Sunni
Mr. Mohamed Mohamed Ahmed
Text and Reference Books
Online Materials
Appendix
The $n$-step-ahead predictor can be written as:
$\hat{y}(k|k-n) = W_n(q^{-1})\,G(q^{-1})\,u(k) + \left[1 - W_n(q^{-1})\right]y(k)$
$W_n(q^{-1}) = \widetilde{H}(q^{-1})\,H^{-1}(q^{-1})$,
where $\widetilde{H}(q^{-1})$ is the first $n$ terms of $H(q^{-1})$.
Example: Consider the following system, where $w(t)$ is white noise with a variance of 1:
$y(k) - y(k-1) + 0.09\,y(k-2) = u(k-1) + 2w(k)$
$H(q) = \frac{1}{(1 - 0.1q^{-1})(1 - 0.9q^{-1})} = \left(1 + 0.1q^{-1} + 0.01q^{-2} + \cdots\right)\left(1 + 0.9q^{-1} + 0.81q^{-2} + \cdots\right) = 1 + q^{-1} + 0.91q^{-2} + \cdots$
$\widetilde{H}_2(q) = 1 + q^{-1}$
$W_2(q) = \widetilde{H}_2(q^{-1})\,H^{-1}(q^{-1}) = (1 + q^{-1})(1 - q^{-1} + 0.09q^{-2}) = 1 - 0.91q^{-2} + 0.09q^{-3}$
$\hat{y}(k|k-2) = W_2(q)\,G(q)\,u(k) + \left[1 - W_2(q)\right]y(k)$
$= (1 + q^{-1})\,q^{-1}\,u(k) + \left(0.91q^{-2} - 0.09q^{-3}\right)y(k)$
$= u(k-1) + u(k-2) + 0.91\,y(k-2) - 0.09\,y(k-3)$
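The polynomial algebra is quick to verify (base MATLAB):
MATLAB Command:
% Check W2(q) = (1 + q^-1)(1 - q^-1 + 0.09 q^-2) by polynomial multiplication
H2t  = [1 1];                 % Htilde_2: first two terms of H
Ainv = [1 -1 0.09];           % H^-1 = 1 - q^-1 + 0.09 q^-2
W2 = conv(H2t, Ainv)          % = [1 0 -0.91 0.09], i.e., 1 - 0.91q^-2 + 0.09q^-3
% so 1 - W2(q) = 0.91 q^-2 - 0.09 q^-3, the output weights in the predictor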
https://arch.readthedocs.io/en/latest/univariate/univariate_forecasting_with_exogenous_variables.html
https://www.mathworks.com/help/ident/ref/arx.html
https://www.mathworks.com/help/ident/ref/predict.html
https://busoniu.net/teaching/sysid2017/