
Lecture 9

Time Series
Dr. Amr El-Wakeel
Lane Department of Computer
Science and Electrical Engineering

Spring 24
Time Series

Acknowledgment: Dr. Omid Dehzangi


Learning Objectives
• What is forecasting?
• Explain time series & its components
• Smooth a data series
  – Moving average
  – Exponential smoothing
• Forecast using trend models
  – Simple linear regression
  – Auto-regressive models and beyond
What Is Forecasting?
• Process of predicting a future event
• Underlying basis of many decisions
  – Business
  – Medicine
  – Production
  – …
Forecasting Approaches
Qualitative Methods
• Used when the situation is vague & little data exist
  – New products
  – New technology
• Involve intuition and experience
• e.g., forecasting sales on the Internet

Quantitative Methods
• Used when the situation is 'stable' & historical data exist
  – Existing products
  – Current technology
• Involve mathematical techniques
• e.g., forecasting sales of color televisions
Quantitative Forecasting
• Select several forecasting methods
• ‘Forecast’ the past
• Evaluate forecasts
• Select best method
• Forecast the future
• Monitor the forecast accuracy continuously
Quantitative Forecasting Methods

Quantitative forecasting methods split into two families:
• Time Series Models
  – Moving Average
  – Exponential Smoothing
  – Trend Models
• Causal Models
  – Regression
What is a Time Series?
• Set of evenly spaced numerical data
  – Obtained by observing the response variable at regular time periods
• Forecast based only on past values
  – Assumes that factors influencing past, present, & future will continue
• Example
  – Year: 1995 1996 1997 1998 1999
  – Sales: 78.7 63.5 89.7 93.2 92.1
Time Series

Time series data is a sequence of observations
– collected from a process
– with equally spaced periods of time
– dynamic: it changes over time
Time Series
• When working with time series data, it is essential to plot the data first so that its structure can be inspected visually.
Topics

• Component Factors of the Time-Series Model


• Smoothing of Data Series
– Moving Averages
– Exponential Smoothing
• Least Square Trend Fitting and Forecasting
– Linear, Quadratic and Exponential Models
• Autoregressive Models and beyond
• Choosing Appropriate Models
Time Series Components

• Trend
• Cyclical
• Seasonal
• Irregular
Trend Component

• Overall Upward or Downward Movement
• Data Taken Over a Period of Years

[Figure: Sales vs. Time, rising along a long-run trend line]
Cyclical Component

• Upward or Downward Swings
• May Vary in Length

[Figure: Sales vs. Time with multi-year cyclical swings]
Seasonal Component

• Upward or Downward Swings
• Regular Patterns
• E.g. Observed Within 1 Year

[Figure: Sales vs. Time (Monthly or Quarterly) with a repeating seasonal pattern]
Random or Irregular Component

• Erratic, Nonsystematic, Random, 'Residual' Fluctuations
• Due to Random Variations of
  – Nature
  – Accidents
• Short Duration and Non-repeating
Time Series Variations

[Figure: a sales series annotated with its trend, cyclical cycles spanning years (e.g. '99 to '01), seasonal variations, and irregular variation]
Multiplicative Time-Series Model

• Used Primarily for Forecasting
• Observed Value in the Time Series is the Product of Components
• For Annual Data:
  Yi = Ti × Ci × Ii
• For Quarterly or Monthly Data:
  Yi = Ti × Si × Ci × Ii
where Ti = Trend, Si = Seasonal, Ci = Cyclical, Ii = Irregular
Time Series Forecasting
Is there a trend in the time series?
• No → Smoothing Methods: Moving Average, Exponential Smoothing
• Yes → Trend Models: Linear, Quadratic, Exponential, Auto-Regressive
Plotting Time Series Data
Time Series Forecasting
Is there a trend in the time series?
• No → Smoothing Methods: Moving Average, Exponential Smoothing
• Yes → Trend Models: Linear, Quadratic, Exponential, Auto-Regressive
Moving Average Method

• Series of arithmetic means


• Used primarily for smoothing
– Provides overall impression of data over time

• Used for elementary forecasting


Moving Averages

• Series of Arithmetic Means Over Time
• Result Dependent Upon Choice of L, the Length of the Period for Computing Means
• Generally, L Should Be Odd
• Example: 3-year Moving Average
  – First Average:  MA(3) = (Y1 + Y2 + Y3) / 3
  – Second Average: MA(3) = (Y2 + Y3 + Y4) / 3
Moving Average [Example]
• You work for Firestone Tire and want to
smooth random fluctuations using a 3-period
moving average.
Year   Sales    MA(3), in 1,000s
1995   20,000   NA
1996   24,000   (20 + 24 + 22)/3 = 22
1997   22,000   (24 + 22 + 26)/3 = 24
1998   26,000   (22 + 26 + 25)/3 ≈ 24.3
1999   25,000   NA
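
As a quick check, here is a minimal Python sketch of MA(3) applied to the sales figures above (the function and variable names are illustrative, not from the slides):

# 3-period centered moving average, a minimal sketch.
sales = [20, 24, 22, 26, 25]          # 1995..1999, in thousands

def moving_average(y, L=3):
    """Return centered L-period moving averages (L odd); None at the ends."""
    half = L // 2
    return [None if i < half or i >= len(y) - half
            else sum(y[i - half:i + half + 1]) / L
            for i in range(len(y))]

print(moving_average(sales))          # [None, 22.0, 24.0, 24.33..., None]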
Moving Average Example

John is a building contractor with a record of a total of 24 single-family homes constructed over a 6-year period. Provide John with a moving average graph.

Year   Units   Moving Ave
1994   2       NA
1995   5       3
1996   2       3
1997   2       3.67
1998   7       5
1999   6       NA
Moving Average Example Solution

Year   Response   Moving Ave
1994   2          NA
1995   5          3
1996   2          3
1997   2          3.67
1998   7          5
1999   6          NA

[Figure: sales (0 to 8) plotted for 1994 to 1999 with the 3-year moving average overlaid]
Exponential Smoothing

• Weighted Moving Average
  – Weights Decline Exponentially
  – Most Recent Observation Weighted Most
• Used for Smoothing and Short-Term Forecasting
• Weights Are:
  – Subjectively Chosen
  – In the Range 0 to 1
ES as Weighted Average

Ei = W·Yi + (1 − W)·Ei−1

Idea: the most recent observations might have the highest predictive value, along with the most recent forecast errors. Let us balance them: Ei weighs the current observation, W·Yi, against the previous smoothed value, (1 − W)·Ei−1.
Exponential Smoothing Constant

Exponential smoothing constant, 0 < w < 1


• w close to 0
– More weight given to previous values of time
series
– Smoother series
• w close to 1
– More weight given to current value of time series
– Series looks similar to original (more variable)



Steps for Calculating an Exponentially
Smoothed Series

1. Select an exponential smoothing constant, w,


between 0 and 1. Remember that small
values of w give less weight to the current
value of the series and yield a smoother
series. Larger choices of w assign more
weight to the current value of the series and
yield a more variable series.
Steps for Calculating an Exponentially
Smoothed Series

2. Calculate the exponentially smoothed series


Et from the original time series Yt as follows:
E1 = Y1
E2 = wY2 + (1 – w)E1
E3 = wY3 + (1 – w)E2
…
Et = wYt + (1 – w)Et–1
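
These steps translate directly into code. A minimal Python sketch, using the deck's six-year sales series and an arbitrary illustrative choice of w = 0.5:

# Exponentially smoothed series: E_t = w*Y_t + (1 - w)*E_{t-1}, with E_1 = Y_1.
def exponential_smoothing(y, w):
    e = [y[0]]                        # E_1 = Y_1
    for obs in y[1:]:
        e.append(w * obs + (1 - w) * e[-1])
    return e

y = [2, 5, 2, 2, 7, 6]                # the sales series used in the deck
print(exponential_smoothing(y, w=0.5))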

Exponential Weight: Example

Ei = W·Yi + (1 − W)·Ei−1
Exponential Weight: Example Graph

[Figure: sales for 1994 to 1999 comparing the raw data with the smoothed series]
Forecast Effect of Smoothing Coefficient (W)

Ŷi+1 = W·Yi + W·(1−W)·Yi−1 + W·(1−W)²·Yi−2 + …

Weight given to:
W       Prior Period (W)   2 Periods Ago (W(1−W))   3 Periods Ago (W(1−W)²)
0.10    10%                9%                       8.1%
0.90    90%                9%                       0.9%
Linear Time-Series
Forecasting Model
Linear Time-Series Forecasting Model

• Used for forecasting trend


• Relationship between response variable Y &
time X is a linear function
• Coded X values used often
– Year X: 1995 1996 1997 1998 1999
– Coded year: 0 1 2 3 4
– Sales Y: 78.7 63.5 89.7 93.2 92.1
Linear Time-Series Model

Ŷi = b0 + b1·Xi

[Figure: straight-line trends over Time X, sloping upward when b1 > 0 and downward when b1 < 0]
The Linear Trend Model

Year   Coded X   Sales
94     0         2
95     1         5
96     2         2
97     3         2
98     4         7
99     5         6

Output:
Intercept      2.14285714
X Variable 1   0.74285714

Ŷi = b0 + b1·Xi = 2.143 + 0.743·Xi

[Figure: the fitted trend line over the 1994 to 1999 data, projected to year 2000]
Computing a and b

Given n data points, find the intercept a and the slope b that minimize the sum of squared errors, i.e. the sum of squared deviations from the line:

minimize Σt=1..n (yt − a − b·xt)²

The solutions are:

b = [ n·Σ xt·yt − (Σ xt)(Σ yt) ] / [ n·Σ xt² − (Σ xt)² ]

a = (Σ yt)/n − b·(Σ xt)/n
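
A minimal Python sketch of these closed-form formulas, checked against the coded-year sales data from the linear trend slide (names are illustrative):

# Least-squares intercept a and slope b from the closed-form expressions above.
def fit_line(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n
    return a, b

x = [0, 1, 2, 3, 4, 5]                # coded years
y = [2, 5, 2, 2, 7, 6]                # sales
print(fit_line(x, y))                 # (2.1428..., 0.7428...), i.e. Y = 2.143 + 0.743 X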
Linear Model Seems Reasonable
X    Y
7    15
2    10
6    13
4    15
14   25
15   27
16   24
12   20
14   27
20   44
15   34
7    17

[Figure: scatter of the (X, Y) pairs with the computed linear relationship overlaid, confirming a linear model seems reasonable]
Another Example
Variables: Weeks and Sales

Week t   Sales y   t²   t·y
1        150       1    150
2        157       4    314
3        162       9    486
4        166       16   664
5        177       25   885

Σt = 15,  Σt² = 55,  Σy = 812,  Σty = 2499,  (Σt)² = 225
Linear Trend Calculation

b = [5(2499) − 15(812)] / [5(55) − 225] = (12495 − 12180) / (275 − 225) = 6.3

a = [812 − 6.3(15)] / 5 = 143.5

y = 143.5 + 6.3t
Sales in week t = 143.5 + 6.3t
Linear Trend Calculation

y = 143.5 + 6.3t

When t = 0, the value of y is 143.5, and the slope of the line is 6.3, meaning that y will increase by 6.3 units for each time period. If t = 10, the forecast is 143.5 + 6.3(10) = 206.5.
Quadratic Time-Series Model
• Used for forecasting trend
• Relationship between response variable Y &
time X is a quadratic function
• Quadratic model

Ŷi = b0 + b1·Xi + b2·Xi²
The Quadratic Trend Model

Ŷi = b0 + b1·Xi + b2·Xi²

Year   Coded X   Sales
94     0         2
95     1         5
96     2         2
97     3         2
98     4         7
99     5         6

Output:
Intercept      2.85714286
X Variable 1   −0.3285714
X Variable 2   0.21428571

Ŷi = 2.857 − 0.329·Xi + 0.214·Xi²
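
For reference, a short sketch that reproduces these coefficients with numpy's polynomial least squares (numpy is an assumed dependency):

import numpy as np

x = np.arange(6)                      # coded years 0..5
y = np.array([2, 5, 2, 2, 7, 6])      # sales
b2, b1, b0 = np.polyfit(x, y, 2)      # coefficients returned highest power first
print(b0, b1, b2)                     # ~2.857, -0.329, 0.214, matching the slide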
Exponential Time-Series Model

• Used for forecasting trend


• Relationship is an exponential function
• Series increases (decreases) at increasing
(decreasing) rate
Exponential Time-Series Model
Relationships

[Figure: exponential curves vs. Year X1, growing when b1 > 1 and decaying when 0 < b1 < 1]
Exponential Trend Model
Ŷi = b0·b1^Xi   or   log Ŷi = log b0 + Xi·log b1

Year   Coded X   Sales
94     0         2
95     1         5
96     2         2
97     3         2
98     4         7
99     5         6

Output (values in logs):
Intercept      0.33583795
X Variable 1   0.08068544

antilog(0.33583795) = 2.17
antilog(0.08068544) = 1.2

Ŷi = (2.17)·(1.2)^Xi
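
A short sketch of the same log-transform fit in Python (numpy assumed); taking antilogs of the fitted line recovers b0 and b1:

import numpy as np

x = np.arange(6)                      # coded years 0..5
y = np.array([2, 5, 2, 2, 7, 6])      # sales
slope, intercept = np.polyfit(x, np.log10(y), 1)   # fit log10(Y) = log b0 + X log b1
b0, b1 = 10 ** intercept, 10 ** slope              # antilogs
print(b0, b1)                         # ~2.17 and ~1.2, as on the slide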
Autoregressive Modeling
• Like a regression model
  – Independent variables are lagged response variables Yi−1, Yi−2, Yi−3, etc.
• Makes an assumption that the observations at
previous time steps are useful to predict the
value at the next time step
– 1st Order: Correlated with prior period
• Estimate with ordinary least squares
Auto-correlation
• Autocorrelation, AKA serial correlation, is the
correlation of a signal with a delayed copy of
itself as a function of delay.
• The similarity between observations as a
function of the time lag between them.

• A mathematical tool for finding


repeating patterns, e.g. the
presence of a periodic signal
obscured by noise.
Autoregressive Modeling

• Used for Forecasting


• Takes Advantage of Autocorrelation
– 1st order - correlation between consecutive values
– 2nd order - correlation between values 2 periods apart

• Autoregressive Model of p-th order:

Yi = A0 + A1·Yi−1 + A2·Yi−2 + ••• + Ap·Yi−p + δi

where δi is a random error term.


Autoregressive Model: Example

The Office Concept Corp. has acquired a number of office units (in thousands of square feet) over the last 8 years. Develop the 2nd-order autoregressive model.

Year   Units
92     4
93     3
94     2
95     3
96     2
97     2
98     4
99     6
Autoregressive Model: Example Solution

• Develop the 2nd-order lag table
• Run a regression model

Year   Yi   Yi−1   Yi−2
92     4    ---    ---
93     3    4      ---
94     2    3      4
95     3    2      3
96     2    3      2
97     2    2      3
98     4    2      2
99     6    4      2

Output:
Intercept      3.5
X Variable 1   0.8125
X Variable 2   −0.9375

Ŷi = 3.5 + 0.8125·Yi−1 − 0.9375·Yi−2


Autoregressive Model Example: Forecasting

Use the 2nd-order model to forecast the number of units for 2000:

Ŷi = 3.5 + 0.8125·Yi−1 − 0.9375·Yi−2

Ŷ2000 = 3.5 + 0.8125·Y1999 − 0.9375·Y1998
      = 3.5 + 0.8125 × 6 − 0.9375 × 4
      = 4.625
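
A minimal sketch that reproduces the fitted AR(2) coefficients and the 2000 forecast by ordinary least squares (numpy assumed; names are illustrative):

import numpy as np

units = [4, 3, 2, 3, 2, 2, 4, 6]      # 1992..1999
y  = np.array(units[2:])              # Y_i
y1 = np.array(units[1:-1])            # Y_{i-1}
y2 = np.array(units[:-2])             # Y_{i-2}
X = np.column_stack([np.ones_like(y1), y1, y2])
a0, a1, a2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(a0, a1, a2)                     # 3.5, 0.8125, -0.9375
print(a0 + a1 * units[-1] + a2 * units[-2])   # 2000 forecast: 4.625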
Autoregressive Modeling Steps

1. Choose p: Note that df = n - 2p - 1


2. Form a series of “lag predictor” variables
Yi-1 , Yi-2 , … Yi-p
3. Run regression model using all p variables
4. Test significance of Ap
If null hypothesis rejected, this model is selected
If null hypothesis not rejected, decrease p by 1 and
repeat
Stationary Time Series

• (Weakly) stationary
  – The mean is independent of t:  E(Xt) = μ
  – The covariance is independent of t for each lag h:
    γX(h) = E[(Xt − μ)(Xt−h − μ)]
Why Stationary Time Series?

• Stationary time series have the best linear


predictor.
• Nonstationary time series models are usually
slower to implement for prediction.
Converting to Stationary Time Series

• Remove deterministic factors
  – Trends, via:
    • Exponential smoothing
    • Moving average smoothing
    • Differencing

∇Xt = Xt − Xt−1 = (1 − B)Xt
∇dXt = Xt − Xt−d = (1 − B^d)Xt

where B is the backshift operator.
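
A minimal sketch of first and lag-d differencing in Python (numpy assumed; the numbers are illustrative):

import numpy as np

x = np.array([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119])
first_diff = np.diff(x)               # X_t - X_{t-1} = (1 - B) X_t
seasonal_diff = x[4:] - x[:-4]        # X_t - X_{t-4}, e.g. for quarterly data
print(first_diff)
print(seasonal_diff)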
E.g. AR: Stationary Models

• AR (AutoRegressive)

Xt = φ1·Xt−1 + ⋯ + φp·Xt−p + δt,   δt ~ WN(0, σ²)

• AR's one-step-ahead predictor:

Pn Xn+1 = φ1·Xn + ⋯ + φp·Xn−p+1
ARMA and ARIMA

• We introduced moving average processes,


and autoregressive processes as two ways to
model time series.

• Now, we will combine both methods and


explore how ARMA(p,q) and ARIMA(p,d,q)
models can help us to model and forecast
more complex time series.
Stationary Models

• ARMA
– Reduces large autocovariance functions
– A transformed linear predictor is used
ARMA = AR + MA
An autoregressive process of order p is defined as:

Xt = c + φ1·Xt−1 + ⋯ + φp·Xt−p + εt

where:
• p is the order
• c is a constant
• ε is noise

Also, a moving average process of order q is defined as:

Xt = c + εt + θ1·εt−1 + ⋯ + θq·εt−q

where:
• q is the order
• c is a constant
• ε is noise
ARMA
ARMA(p,q) is simply the combination of both models into a single equation:

Xt = c + φ1·Xt−1 + ⋯ + φp·Xt−p + εt + θ1·εt−1 + ⋯ + θq·εt−q

Hence, this model can explain the relationship of a time series with both random noise (the moving average part) and itself at previous steps (the autoregressive part): both an AR(p) process and an MA(q) process are in play.
ARIMA
ARIMA stands for AutoRegressive Integrated Moving Average:

– AR Autoregression. A model that uses the dependent


relationship between an observation and some number of
lagged observations.
– I Integrated. The use of differencing of raw observations in
order to make the time series stationary.
– MA Moving Average. A model that uses the dependency
between an observation and a residual error from a moving
average model applied to lagged observations.

• AR and MA: linear models that work on stationary time


series, and I is a preprocessing procedure to “stationarize”
time series if needed.
ARIMA

• A standard notation is used of ARIMA(p, d, q) where the parameters are


substituted with integer values to quickly indicate the specific ARIMA model
being used.
o p The number of lag observations included in the model, also called the lag order.
o d The number of times that the raw observations are differenced, also called the
degree of differencing.
o q The size of the moving average window, also called the order of moving average.

• A value of 0 can be used for a parameter, which indicates not to use that element of the model.
• In other words, an ARIMA model can be configured to perform the function of an ARMA model, and even a simple AR, I, or MA model.
• In backshift notation, the equation is expressed as:

φ(B)·(1 − B)^d·Xt = c + θ(B)·εt

where φ(B) and θ(B) are the AR and MA polynomials in the backshift operator B.
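
As an illustration, a sketch of fitting an ARIMA with the statsmodels library; the library, the toy series, and the (1, 1, 1) order are assumptions for illustration, not from the slides:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

series = np.array([2.0, 5, 2, 2, 7, 6, 5, 8, 7, 9, 8, 11])  # toy data
model = ARIMA(series, order=(1, 1, 1))   # p=1 lag, d=1 difference, q=1 MA term
fit = model.fit()
print(fit.params)                        # estimated AR, MA, and variance terms
print(fit.forecast(steps=3))             # forecast the next 3 periods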
ACF and PACF
• ACF is an (complete) auto-correlation function which gives us
values of auto-correlation of any series with its lagged values.
ACF and PACF
PACF is a partial auto-correlation function. Instead of finding correlations of the present with lags, as ACF does, it finds the correlation of the residuals (what remains after removing the effects already explained by earlier lags) with the next lag value; hence 'partial' and not 'complete', since we remove already-found variations before we find the next correlation.
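
A sketch of inspecting ACF and PACF plots with statsmodels and matplotlib (assumed dependencies; the series is a toy random walk):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

series = np.random.default_rng(0).standard_normal(200).cumsum()  # toy random walk
plot_acf(series, lags=20)     # slow decay here suggests differencing is needed
plot_pacf(series, lags=20)    # a sharp cutoff at lag p would suggest an AR(p) term
plt.show()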
ARMA and ARIMA
• ACF and PACF can be used for determining the ARIMA model hyperparameters p and q.

• "Cuts off" means that it becomes zero abruptly,


• "tails off" means that it decays to zero asymptotically ( usually exponentially).
ARMA and ARIMA

• ARMA and ARIMA are mainly suited for


building univariate (single variable) based
model.

• There are other versions that can handle multiple variables, e.g.:
  o Vector Autoregressive Model (VAR): basically a multivariate linear time-series model, designed to capture the dynamics between multiple time series.
Other Models

• Multivariate Cointegration
• SARIMA
• FARIMA
• …
Ensemble methods

• A single decision tree does not perform well
• But it is super fast
• What if we learn multiple trees?

We need to make sure they do not all just learn the same thing
Bagging

• If we split the data in random different ways, decision trees give different results: high variance.

• Bagging (bootstrap aggregating) is a method that results in low variance.

• If we had multiple realizations of the data (or multiple samples), we could calculate the predictions multiple times and take the average; averaging multiple noisy estimates produces less uncertain results.
Bagging
• Say for each bootstrap sample b we calculate fb(x); the bagged prediction is then the average (see the sketch below):

f̂avg(x) = (1/B) Σb=1..B fb(x)

How? Bootstrap:
• Construct B (hundreds of) trees (no pruning)
• Learn a classifier for each bootstrap sample and average them
• Very effective
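
A minimal scikit-learn sketch of bagging regression trees; the library, the synthetic dataset, and B = 100 are illustrative assumptions:

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
# Each of the B = 100 trees is fit on a bootstrap sample; predictions are averaged.
bag = BaggingRegressor(n_estimators=100, random_state=0)  # base learner defaults to a tree
bag.fit(X, y)
print(bag.predict(X[:3]))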
Bagging

• Reduces overfitting (variance)

• Normally uses one type of classifier

• Decision trees are popular

• Easy to parallelize
Bagging decision trees

Hastie et al.,”The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, Springer (2009)
No Overfitting
[Figure: decision boundary in the (X1, X2) feature plane, illustrating that bagging does not overfit]
Random Forest Time series prediction

• Random Forest can also be used for time


series forecasting
• It requires that the time series dataset be transformed into a supervised learning problem first
• It also requires the use of a specialized technique for evaluating the model, called walk-forward validation (see the sketch below)
  o Evaluating the model using k-fold cross validation would result in optimistically biased results.
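
A sketch of both steps, lagging the series into a supervised dataset and walk-forward validation, assuming scikit-learn; the toy series, lag count, and split point are illustrative:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
series = np.sin(np.arange(120) / 6.0) + rng.normal(0, 0.1, 120)  # toy series
lags = 4
# Each row holds the previous `lags` values; the target is the next value.
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

errors = []
for t in range(80, len(y)):           # walk forward one step at a time
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:t], y[:t])           # train only on data observed so far
    pred = model.predict(X[t:t + 1])[0]
    errors.append(abs(y[t] - pred))
print("walk-forward MAD:", np.mean(errors))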
Quantitative Forecasting Steps

• Select several forecasting methods
• 'Forecast' the past
• Evaluate forecasts
• Select best method
• Forecast the future
• Monitor forecast accuracy continuously
Forecasting Guidelines
• No pattern or direction in forecast error
– ei = (Actual Yi - Forecast Yi)
– Seen in plots of errors over time
• Smallest forecast error
– Measured by e.g. mean absolute deviation
• Simplest model
– Called principle of parsimony
Selecting A Forecasting Model

• Perform A Residual Analysis


– Look for pattern or direction
• Measure Sum Square Errors - SSE (residual
errors)
• Measure Residual Errors Using MAD
• Use Simplest Model
– Principle of Parsimony
Residual Analysis
• We call time series forecasts the fitted values; they are denoted by ŷt|t−1, the forecast of yt based on observations y1, …, yt−1.
• The "residuals" are what is left over after fitting a model. For most time series models, the residuals are equal to the difference between the observations and the corresponding fitted values:

et = yt − ŷt

• Residuals are useful in checking whether a model has adequately captured the information in the data.

https://otexts.com/fpp2/residuals.html
Residual Analysis

[Figure: four residual-vs-time plots: random errors; cyclical effects not accounted for; trend not accounted for; seasonal effects not accounted for]
Measuring Errors

• Sum of Squared Errors (SSE)

SSE = Σi=1..n (Yi − Ŷi)²

• Mean Absolute Deviation (MAD)

MAD = (1/n) Σi=1..n |Yi − Ŷi|
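
A minimal Python sketch of both measures; the forecasts below are the rounded fitted values of the earlier linear trend, used purely for illustration:

def sse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast))

def mad(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

actual   = [2, 5, 2, 2, 7, 6]
forecast = [2.1, 2.9, 3.6, 4.4, 5.1, 5.9]
print(sse(actual, forecast), mad(actual, forecast))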
Principle of Parsimony
• Suppose two or more models provide
good fit for data
• Select the Simplest Model
– Simplest model types:
• least-squares linear
• least-square quadratic
• 1st order autoregressive
– More complex types:
• 2nd and 3rd order autoregressive
• least-squares exponential
References

• Introduction to Time Series and Forecasting, 2nd ed., P. Brockwell and R. Davis, Springer Verlag
• Adaptive Filter Theory, 4th ed., Simon Haykin, Prentice Hall
• Time Series Analysis, James Douglas Hamilton, Princeton University Press
