NIPS Tutorial 2016
Forecasting Non-Stationary Time Series
• stock values.
• economic variables.
• sensors: Internet-of-Things.
• earthquakes.
• energy demand.
• signal processing.
• sales forecasting.
• IID assumption.
• stationarity: time-shifted blocks $(Z_t, \ldots, Z_{t+m})$ and $(Z_{t+k}, \ldots, Z_{t+m+k})$ have the same distribution.
• weak stationarity: $\mathbb{E}[Z_t]$ is independent of $t$ and $\mathrm{Cov}(Z_t, Z_{t+j})$ depends only on $j$.

Autoregressive AR(p) model:
$$Y_t = \sum_{i=1}^{p} a_i\, Y_{t-i} + \epsilon_t,$$
where $(\epsilon_t)$ is white noise.
Characteristic polynomial
$$P(z) = 1 - \sum_{i=1}^{p} a_i z^i$$
can be used to study properties of this stochastic process.
$$P(z) = c_0 \big(1 - \lambda_1^{-1} z\big) \cdots \big(1 - \lambda_p^{-1} z\big),$$
where $|\lambda_i| > 1$ for all $i = 1, \ldots, p$ and $c_0$ is a constant.
where $L$ denotes the lag operator and
$$\big(1 - \lambda_i^{-1} L\big)^{-1} = \sum_{k=0}^{\infty} \big(\lambda_i^{-1} L\big)^{k}$$
is well-defined since $|\lambda_i^{-1}| < 1$.
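A quick numerical check of this condition (not from the slides; NumPy and the made-up coefficients below are illustrative assumptions) computes the roots of $P(z)$ and verifies that they lie outside the unit circle:

import numpy as np

def ar_is_stationary(a):
    """Return True if all roots of P(z) = 1 - a_1 z - ... - a_p z^p lie outside the unit circle."""
    # np.roots expects coefficients ordered from the highest degree down:
    # P(z) = -a_p z^p - ... - a_1 z + 1
    coeffs = np.concatenate(([-c for c in a[::-1]], [1.0]))
    return np.all(np.abs(np.roots(coeffs)) > 1.0)

print(ar_is_stationary([0.5, 0.2]))  # True: both roots lie outside the unit circle
print(ar_is_stationary([1.2]))       # False: the root 1/1.2 lies inside the unit circle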
Proof
Therefore, it suffices to show that
$$Y_t = \sum_{j=0}^{\infty} \psi_j\, \epsilon_{t-j}$$
is weakly stationary.
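To complete this step, assuming the innovations $\epsilon_t$ are zero-mean and uncorrelated with variance $\sigma^2$, and that the coefficients $\psi_j$ are absolutely summable:
$$\mathbb{E}[Y_t] = \sum_{j=0}^{\infty} \psi_j\, \mathbb{E}[\epsilon_{t-j}] = 0,
\qquad
\mathrm{Cov}(Y_t, Y_{t+k}) = \sigma^2 \sum_{j=0}^{\infty} \psi_j\, \psi_{j+k},$$
neither of which depends on $t$, so $Y_t$ is weakly stationary.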
ARIMA(p, D, q) model:
$$\Big(1 - \sum_{i=1}^{p} a_i L^i\Big)\,(1 - L)^D\, Y_t = \Big(1 + \sum_{j=1}^{q} b_j L^j\Big)\epsilon_t.$$
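As an illustration only (not from the slides), a minimal NumPy simulation of the $D = 0$ case, i.e. an ARMA(p, q) process, with assumed coefficients and Gaussian innovations:

import numpy as np

def simulate_arma(a, b, T, seed=0):
    """Simulate Y_t = sum_i a_i Y_{t-i} + eps_t + sum_j b_j eps_{t-j}
    with standard Gaussian innovations (illustrative choice)."""
    rng = np.random.default_rng(seed)
    p, q, burn = len(a), len(b), max(len(a), len(b))
    eps = rng.standard_normal(T + burn)
    Y = np.zeros(T + burn)
    for t in range(burn, T + burn):
        Y[t] = eps[t]
        Y[t] += sum(a[i] * Y[t - 1 - i] for i in range(p))
        Y[t] += sum(b[j] * eps[t - 1 - j] for j in range(q))
    return Y[burn:]

series = simulate_arma(a=[0.5, 0.2], b=[0.3], T=500)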
Learning Theory
Stationary and β-mixing process: generalization bounds.
• not testable.
Key average quantity:
$$\frac{1}{T}\sum_{t=1}^{T}\Big[\mathcal{L}(h, Z_1^T) - \mathcal{L}(h, Z_1^{t-1})\Big].$$
[Figure: tree of sample pairs $Z_t, Z_t'$ drawn from the conditional distributions, e.g. $Z_1, Z_1' \sim D_1$ and $Z_4, Z_4' \sim D_3(\cdot \mid Z_1, Z_2)$; $Z^T$: distribution based on the $Z_t$'s.]
• where D = {r : rt 1}.
• convex optimization problem (see the sketch below).
[Figure: estimated discrepancies and the corresponding weights.]
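The extracted slides do not spell out the objective; purely as a hedged sketch, assume per-round discrepancy estimates d_t and a trade-off parameter λ (both hypothetical), and select weights q on the simplex by trading the discrepancy term against closeness to the uniform weights u:

import numpy as np
import cvxpy as cp

T = 50
rng = np.random.default_rng(0)
d = rng.random(T)        # hypothetical per-round discrepancy estimates
u = np.ones(T) / T       # uniform weights
lam = 0.1                # hypothetical trade-off parameter

q = cp.Variable(T, nonneg=True)                          # weights on the simplex
problem = cp.Problem(cp.Minimize(d @ q + lam * cp.norm1(q - u)),
                     [cp.sum(q) == 1])
problem.solve()
weights = q.value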
[Figure: running MSE.]
Batch scenario:
• distributional assumption.
On-line scenario:
• no distributional assumption.
For t = 1 to T do
• player receives x_t ∈ X.
• player selects h_t ∈ H.
• adversary selects y_t ∈ Y.
• player incurs loss L(h_t(x_t), y_t).
EW({E_1, ..., E_N})
    for i ← 1 to N do
        w_{1,i} ← 1
    for t ← 1 to T do
        Receive(x_t)
        h_t ← (Σ_{i=1}^N w_{t,i} E_i) / (Σ_{i=1}^N w_{t,i})
        Receive(y_t)
        Incur-Loss(L(h_t(x_t), y_t))
        for i ← 1 to N do
            w_{t+1,i} ← w_{t,i} · e^{−η L(E_i(x_t), y_t)}    (parameter η > 0)
    return h_T
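A minimal Python sketch of the EW forecaster above (the experts, the loss, and the learning rate η are supplied by the caller; the constant experts and squared loss in the usage comment are assumed examples):

import numpy as np

def exponentially_weighted_average(experts, loss, xs, ys, eta):
    """EW forecaster sketch: `experts` is a list of callables E_i(x),
    `loss` a loss L(prediction, y) convex in its first argument,
    and `eta` > 0 the learning rate."""
    w = np.ones(len(experts))                      # w_{1,i} = 1
    predictions, total_loss = [], 0.0
    for x, y in zip(xs, ys):
        preds = np.array([E(x) for E in experts])
        h = np.dot(w, preds) / w.sum()             # weighted-average prediction
        predictions.append(h)
        total_loss += loss(h, y)
        w *= np.exp(-eta * np.array([loss(p, y) for p in preds]))   # multiplicative update
    return predictions, total_loss

# Example usage (assumed constant experts and squared loss):
# experts = [lambda x, c=c: c for c in (0.0, 0.5, 1.0)]
# preds, total = exponentially_weighted_average(
#     experts, lambda p, y: (p - y) ** 2, xs=range(100), ys=[0.4] * 100, eta=0.5)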
$$\frac{\mathrm{Reg}_T}{T} = O\!\left(\sqrt{\frac{\log N}{T}}\right).$$
Upper bound:
$$
\begin{aligned}
\Phi_t - \Phi_{t-1}
&= \log \frac{\sum_{i=1}^{N} w_{t-1,i}\, e^{-\eta L(E_i(x_t), y_t)}}{\sum_{i=1}^{N} w_{t-1,i}} \\
&= \log \mathop{\mathbb{E}}_{w_{t-1}}\big[ e^{-\eta L(E_i(x_t), y_t)} \big] \\
&= \log \mathop{\mathbb{E}}_{w_{t-1}}\Big[ \exp\Big( -\eta \big( L(E_i(x_t), y_t) - \mathop{\mathbb{E}}_{w_{t-1}}[L(E_i(x_t), y_t)] \big) \Big) \Big] - \eta \mathop{\mathbb{E}}_{w_{t-1}}[L(E_i(x_t), y_t)] \\
&\le -\eta \mathop{\mathbb{E}}_{w_{t-1}}[L(E_i(x_t), y_t)] + \frac{\eta^2}{8} \quad \text{(Hoeffding's ineq.)} \\
&\le -\eta\, L\Big( \mathop{\mathbb{E}}_{w_{t-1}}[E_i(x_t)],\, y_t \Big) + \frac{\eta^2}{8} \quad \text{(convexity of first arg. of $L$)} \\
&= -\eta\, L(h_t(x_t), y_t) + \frac{\eta^2}{8}.
\end{aligned}
$$
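A standard way to finish the argument, assuming the potential is $\Phi_t = \log \sum_{i=1}^{N} w_{t,i}$ (so that $\Phi_0 = \log N$, up to the indexing of the weights): summing the inequality over $t = 1, \ldots, T$ and telescoping,
$$\Phi_T - \Phi_0 \le -\eta \sum_{t=1}^{T} L(h_t(x_t), y_t) + \frac{\eta^2 T}{8}.$$
Since $\Phi_T \ge \log \max_i w_{T,i} = -\eta \min_i \sum_{t=1}^{T} L(E_i(x_t), y_t)$, rearranging gives
$$\mathrm{Reg}_T = \sum_{t=1}^{T} L(h_t(x_t), y_t) - \min_i \sum_{t=1}^{T} L(E_i(x_t), y_t) \le \frac{\log N}{\eta} + \frac{\eta T}{8},$$
and the choice $\eta = \sqrt{8 \log N / T}$ yields $\mathrm{Reg}_T \le \sqrt{(T/2)\log N}$, i.e. the $O\big(\sqrt{\log N / T}\big)$ bound above.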
Average difference:
$$\frac{1}{T}\sum_{t=1}^{T}\Big[\mathcal{L}_{T+1}(h_t, Z_1^T) - \mathcal{L}_t(h_t, Z_1^{t-1})\Big].$$
Alternative method:
where
$$\widehat{\mathrm{disc}}_H(q) = \sup_{h \in H,\, \mathbf{h} \in H_A}\, \sum_{t=1}^{T} q_t \Big[ L\big(h_t(X_{T+1}),\, h(X_{T+1})\big) - L\big(h_t, Z_t\big) \Big].$$
For $h = \sum_{t=1}^{T} q_t h_t$, with probability at least $1 - \delta$,
$$\mathcal{L}_{T+1}(h, Z_1^T) \le \inf_{h^* \in H} \mathcal{L}_{T+1}(h^*, Z_1^T) + 2\,\mathrm{disc}(q) + \frac{\mathrm{Reg}_T}{T} + M\|q - u\|_1 + 2M\|q\|_2\sqrt{2\log\tfrac{2}{\delta}}.$$
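As a small illustration of the combined hypothesis $h = \sum_t q_t h_t$ appearing in this bound (a hedged sketch; the per-round hypotheses and the weights come from the online algorithm and the weight-selection step, assumed real-valued here):

def q_weighted_hypothesis(hypotheses, q):
    """Combine per-round hypotheses h_1, ..., h_T (callables) into h = sum_t q_t * h_t."""
    def h(x):
        return sum(q_t * h_t(x) for q_t, h_t in zip(q, hypotheses))
    return h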
β-Mixing
Definition: a sequence of random variables $Z = \{Z_t\}_{t=-\infty}^{+\infty}$ is β-mixing if
$$\beta(k) = \sup_{n}\, \mathop{\mathbb{E}}_{B \in \sigma(Z_{-\infty}^{n})}\Big[ \sup_{A \in \sigma(Z_{n+k}^{+\infty})} \big| \mathbb{P}[A \mid B] - \mathbb{P}[A] \big| \Big] \xrightarrow[k \to \infty]{} 0.$$