
Data Support Systems

Lesson 16: Time Series

Dr. Le, Hai Ha


Contents

• Time Series
• Smoothing based methods
• Regression based methods
• Machine learning methods
• Performance evaluation
Introduction
• A time series is a series of observations listed in time order.
• The data points in a time series are usually recorded at constant, successive time intervals.
• Time series analysis is the process of extracting meaningful, non-trivial information and patterns from time series data.
• Time series forecasting is the process of predicting future values of a time series based on past observations and other inputs.
• The objective in time series forecasting is to use historical information about a particular quantity to make forecasts about the value of the same quantity in the future.
Example

Time series analysis vs. supervised predictive models
1. In time series analysis, time itself is an important predictor; in supervised predictive models, the predictors are observed at one point in time (like a cross-section of a block of wood).
2. In time series forecasting, one may not be interested in, or might not even have data for, other attributes that could potentially influence the target variable.
Taxonomy of Time Series Forecasting
• Time series modeling can be broadly divided into descriptive modeling, called time series analysis, and predictive modeling, called time series forecasting.
• Time series forecasting can be further classified into four broad categories of techniques.
Example

Time Series Decomposition
• Time series data, even when univariate, are an amalgamation of multiple underlying phenomena:
• Trend: the long-term tendency of the data. It represents the change from one period to the next. The trend can be further split into a zero-basis trend and the level of the data.
• Seasonality: the repetitive behavior during a cycle of time. Seasonality can be further split into hourly, daily, weekly, monthly, quarterly, and yearly seasonality.
• Cycle: the cyclic component represents longer-than-a-year patterns, where there is no fixed time frame between cycles. A non-linear trend can represent a cyclic component.
• Noise: anything in a time series that is not represented by the level, trend, seasonal, or cyclic components is the noise in the series.
Example

Additive and Multiplicative
• The trend and seasonality are the systematic components of a time series. Systematic components can be forecast; noise, the non-systematic component, cannot.
• Time series decomposition can be classified into additive decomposition and multiplicative decomposition, based on the nature of the different components and how they are composed:
• Additive decomposition: Time series = Trend + Seasonality + Noise
• Multiplicative decomposition: Time series = Trend × Seasonality × Noise
• If the magnitude of the seasonal fluctuation or the variation in the trend changes over time, then multiplicative decomposition is the better model.
Example

Classical Decomposition
1. Estimate the trend T_t: if the seasonal period m is even, calculate a 2 × m moving average (an m-MA followed by a 2-MA); if m is odd, calculate an m-period moving average. A moving average here is the average of m consecutive data points centered on time t.
2. Calculate the detrended series: compute y_t − T_t for each data point in the series.
3. Estimate the seasonal component S_t: average (y_t − T_t) over each of the m periods. For example, calculate the average of all January values of (y_t − T_t) and repeat for all the months. Normalize the seasonal values so that their mean is zero.
4. Calculate the noise component E_t: E_t = y_t − T_t − S_t for each data point in the series.

• Multiplicative decomposition is similar to additive decomposition: replace subtraction with division in the algorithm described above (see the sketch below).
• An alternative approach is to convert a multiplicative decomposition into an additive one by applying the logarithmic function: log(y_t) = log(T_t) + log(S_t) + log(E_t).
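A minimal Python sketch of this procedure, using statsmodels' seasonal_decompose (which implements the centered moving-average algorithm above); the quarterly series here is synthetic, for illustration only:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic quarterly series: linear trend + quarterly seasonality + noise
idx = pd.date_range("2010-01-01", periods=40, freq="QS")
trend = np.linspace(100, 140, 40)
seasonal = np.tile([10.0, -5.0, -10.0, 5.0], 10)
noise = np.random.default_rng(0).normal(0, 2, 40)
y = pd.Series(trend + seasonal + noise, index=idx)

# Classical additive decomposition with seasonal period m = 4
result = seasonal_decompose(y, model="additive", period=4)
print(result.trend.dropna().head())   # T_t from the 2 x 4 centered moving average
print(result.seasonal.head(4))        # S_t, one value per quarter, zero mean
print(result.resid.dropna().head())   # E_t = y_t - T_t - S_t
```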

Implement

The quarterly Australian beer production dataset

Decomposition

Forecasting Using Decomposed Data
• The idea is to break the time series down into its parts, forecast the parts, and put them back together to obtain the forecast of future time series values.
• It is assumed that the seasonal component of the time series does not change; hence, the forecast of the seasonal component is simply the values extracted from the time series.
• The time series data without the seasonal component is called the seasonally adjusted time series.
• The adjusted time series can be forecast by relatively simple methods, such as linear or polynomial regression (a sketch follows below).
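A sketch of the decompose, forecast, recompose idea, assuming an additive series with a known period; the helper name decompose_and_forecast is my own:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

def decompose_and_forecast(y, period, horizon):
    """Seasonally adjust, fit a linear trend, then re-add the seasonal component."""
    dec = seasonal_decompose(y, model="additive", period=period)
    adjusted = y - dec.seasonal                   # seasonally adjusted series
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, adjusted.values, 1)  # simple linear regression
    t_future = np.arange(len(y), len(y) + horizon)
    trend_fc = intercept + slope * t_future
    season_fc = np.resize(dec.seasonal.values[-period:], horizon)  # repeat last cycle
    return trend_fc + season_fc

y = pd.Series([30.0, 21, 26, 34, 33, 23, 28, 38, 36, 25, 31, 41])
print(decompose_and_forecast(y, period=4, horizon=4))
```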

Smoothing based methods
• In smoothing based approaches, a forecast is a function of the past few observations.
• Time periods: t = 1, 2, 3, ..., n. Time periods can be seconds, days, weeks, months, or years, depending on the problem.
• Data series: the observation corresponding to each time period above: y_1, y_2, y_3, ..., y_n.
• Forecasts: F_{n+h} is the forecast for the h-th time period following n. Usually h = 1, the next time period following the last data point; however, h can be greater than 1. h is called the horizon.
• Forecast errors: e_t = y_t − F_t for any given time t.
• The observations are assumed to be made at a constant time interval. In the case of intermittent data, one can assume that an interpolation scheme is applied to obtain equally spaced (in time) data points.
Simple Forecasting Methods
• Naïve method: F_{n+1} = y_n, the last observed value.
• Seasonal naïve method: F_{n+1} = y_{n+1−s}, where s is the seasonal period.
• Average method: F_{n+1} = (y_1 + y_2 + ... + y_n) / n.
• Moving average smoothing: F_{n+1} = (y_{n−k+1} + ... + y_n) / k, the average of the last k observations.
• Weighted moving average smoothing: F_{n+1} = (a·y_n + b·y_{n−1} + c·y_{n−2}) / (a + b + c), where typically a > b > c.
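A small sketch of these baselines in Python (the function names are my own, for illustration):

```python
import numpy as np

def naive(y):                        # F_{n+1} = y_n
    return y[-1]

def seasonal_naive(y, s):            # F_{n+1} = y_{n+1-s}
    return y[-s]

def average(y):                      # F_{n+1} = mean of all observations
    return np.mean(y)

def moving_average(y, k):            # F_{n+1} = mean of the last k observations
    return np.mean(y[-k:])

def weighted_moving_average(y, weights):   # e.g. weights = [a, b, c] with a > b > c
    w = np.asarray(weights, dtype=float)
    recent = y[-len(w):][::-1]       # most recent observation first, to match w
    return np.dot(w, recent) / w.sum()

y = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
print(naive(y), seasonal_naive(y, s=4), moving_average(y, k=3))
print(weighted_moving_average(y, [0.5, 0.3, 0.2]))
```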
Comparing one-step-ahead forecasts

Exponential Smoothing
• Exponential smoothing is a weighted average of the past data, with recent data points given more weight than earlier data points.
• The weights decay exponentially towards the earlier data points, hence the name:

F_{n+1} = α·y_n + α(1 − α)·y_{n−1} + α(1 − α)²·y_{n−2} + ...

where the smoothing parameter α is generally between 0 and 1.
• To forecast future values using exponential smoothing, the formula can be rewritten recursively as:

F_{n+1} = α·y_n + (1 − α)·F_n
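A minimal sketch of simple exponential smoothing, assuming the first forecast is initialized to the first observation (a common choice):

```python
import numpy as np

def simple_exp_smoothing(y, alpha):
    """Return one-step-ahead forecasts F_2 .. F_{n+1}, with F_1 = y_0."""
    forecasts = [y[0]]                        # F_1 initialized to the first observation
    for obs in y:
        # Recursive form: F_{t+1} = alpha * y_t + (1 - alpha) * F_t
        forecasts.append(alpha * obs + (1 - alpha) * forecasts[-1])
    return np.array(forecasts[1:])            # the last value is the next-period forecast

y = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
print(simple_exp_smoothing(y, alpha=0.3)[-1])  # forecast for the next period
```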
Example

Holt's Two-Parameter Exponential Smoothing
• The forecast can be expressed as the sum of two components, the average value or "level" of the series, L_n, and the trend, T_n, estimated recursively as:

L_n = α·y_n + (1 − α)·(L_{n−1} + T_{n−1})
T_n = β·(L_n − L_{n−1}) + (1 − β)·T_{n−1}

• The model can forecast h periods ahead over the horizon: F_{n+h} = L_n + h·T_n
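A sketch using statsmodels' Holt implementation, letting the library estimate α and β from the data:

```python
import numpy as np
from statsmodels.tsa.holtwinters import Holt

# A short upward-trending series (synthetic, for illustration)
y = np.array([112.0, 118.0, 132.0, 129.0, 141.0, 148.0, 155.0, 160.0])

fit = Holt(y).fit()      # alpha (level) and beta (trend) estimated by optimization
print(fit.forecast(3))   # F_{n+h} = L_n + h * T_n, for h = 1, 2, 3
```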
Holt-Winters' Three-Parameter Exponential Smoothing
• When a time series contains seasonality in addition to a trend, yet another parameter, γ, is needed to estimate the seasonal component of the time series:

L_n = α·(y_n − S_{n−p}) + (1 − α)·(L_{n−1} + T_{n−1})
T_n = β·(L_n − L_{n−1}) + (1 − β)·T_{n−1}
S_n = γ·(y_n − L_n) + (1 − γ)·S_{n−p}
F_{n+h} = L_n + h·T_n + S_{n+h−p}

where p is the seasonality period.
• One can estimate the values of the parameters α, β, and γ by fitting the smoothing equations to the training data.
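A sketch of Holt-Winters fitting with statsmodels' ExponentialSmoothing; the quarterly data is synthetic:

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic quarterly series with trend and additive seasonality
rng = np.random.default_rng(1)
y = 100 + 0.5 * np.arange(24) + np.tile([8.0, -4.0, -9.0, 5.0], 6) + rng.normal(0, 1, 24)

model = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=4)
fit = model.fit()        # alpha, beta, gamma estimated from the training data
print(fit.forecast(4))   # next full seasonal cycle
```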
Implement

Example

Regression based methods
• In regression based methods, the variable time is the predictor or independent variable, and the time series value is the dependent variable.
• Regression based methods are generally preferable when the time series has a global pattern.
• For a time series with local patterns instead of a global pattern, using a regression based approach requires one to specify how and when the patterns change, which is difficult.
• For such a series, smoothing approaches work best, because these methods usually rely on extrapolating the most recent local pattern, as seen earlier.
Example

Regression
• The simplest of the regression based approaches for analyzing a time series is linear regression.
• The time period is the independent variable, and the model attempts to predict the time series value from it.
• A linear regression model is able to capture the long-term tendency of the series, but it often does a poor job of fitting the data.
• More sophisticated polynomial functions can be used to improve the fit (see the sketch below).
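A minimal sketch fitting a linear and a polynomial trend against the time index with numpy:

```python
import numpy as np

y = np.array([102.0, 105.0, 104.0, 110.0, 115.0, 113.0, 121.0, 127.0])
t = np.arange(len(y))

linear = np.poly1d(np.polyfit(t, y, deg=1))      # y_t ~ b0 + b1 * t
quadratic = np.poly1d(np.polyfit(t, y, deg=2))   # adds a t^2 term for curvature

t_future = np.arange(len(y), len(y) + 3)
print(linear(t_future))      # linear-trend forecasts
print(quadratic(t_future))   # polynomial-trend forecasts
```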
Regression

Regression With Seasonality
• Seasonal dummy variables are introduced for each period (quarter) of the series; each dummy attribute takes the value 1 or 0 depending on the period of the observation (see the sketch below).
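A sketch of a trend-plus-seasonal-dummies regression, assuming quarterly data; one dummy is dropped to avoid collinearity with the intercept:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic quarterly observations (three full years)
y = np.array([110.0, 96.0, 90.0, 105.0, 114.0, 100.0, 94.0, 109.0,
              118.0, 104.0, 98.0, 113.0])
t = np.arange(len(y))

# Design matrix: time index plus one 0/1 dummy per quarter (first quarter dropped)
X = pd.concat([pd.DataFrame({"t": t}),
               pd.get_dummies(t % 4, prefix="Q", drop_first=True)], axis=1)
model = LinearRegression().fit(X, y)

# Forecast the next four quarters with the same encoding
t_new = np.arange(len(y), len(y) + 4)
X_new = pd.concat([pd.DataFrame({"t": t_new}),
                   pd.get_dummies(t_new % 4, prefix="Q", drop_first=True)], axis=1)
print(model.predict(X_new))
```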
Implement

Autoregressive Integrated Moving Average
• ARIMA stands for Autoregressive Integrated Moving Average and is one of the most popular models for time series forecasting.
• The ARIMA methodology was originally developed by Box and Jenkins in the 1970s.
Autocorrelation
• Correlation measures how two variables depend on each other, i.e., whether they have a linear relationship with each other.
• To study this within a single time series, one constructs lagged copies of the series, e.g., a "1-lag" series and a "4-lag" series, and correlates them with the original.
Autocorrelation
• There is a strong correlation between the original time series "prod" and the 4-lag series "prod-4": they tend to move together.
• This phenomenon is called autocorrelation: the time series is correlated with its own data points, at a lag.
• One can measure the strength of correlation between the original time series and all the lag series. The plot of the resulting correlations against the lag is called an Autocorrelation Function (ACF) chart.
• The ACF chart is used to study all the seasonality present in the time series (see the sketch below).
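A sketch of computing and plotting the ACF with statsmodels; the spikes at multiples of the seasonal lag reveal the seasonality:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import acf

# Synthetic quarterly series with a 4-period seasonal pattern
rng = np.random.default_rng(2)
y = np.tile([10.0, -5.0, -10.0, 5.0], 10) + rng.normal(0, 1, 40)

print(acf(y, nlags=8))   # correlation of y with its 1-lag, 2-lag, ... series
plot_acf(y, lags=8)      # spikes at lags 4 and 8 reveal the seasonality
plt.show()
```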
Autoregressive Models
• Autoregressive models are regression models applied to lag series generated from the original time series.
• In an autoregressive model, the output is the future data point, expressed as a linear combination of the past p data points, where p is the lag window:

y_t = l + α_1·y_{t−1} + α_2·y_{t−2} + ... + α_p·y_{t−p} + e

where l is the level in the dataset, e is the noise, and the α_i are the coefficients that need to be learned from the data.
• This is referred to as an autoregressive model with p lags, or an AR(p) model.
• In an AR(p) model, each lag series is a new predictor used to fit the dependent variable, which is still the original series value, y_t.
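A sketch of fitting an AR(p) model with statsmodels' AutoReg on synthetic AR(2)-style data:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(3)
# Synthetic data where y_t depends on its last two values plus noise
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 5 + 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal(0, 1)

res = AutoReg(y, lags=2).fit()                    # estimates l, alpha_1, alpha_2
print(res.params)                                 # [level, alpha_1, alpha_2]
print(res.predict(start=len(y), end=len(y) + 2))  # three-step-ahead forecast
```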
Stationary Data
• In a time series with trends or seasonality, the value is affected by time.
• A time series is called stationary when the value of the time series is not dependent on time.
• For instance, random white noise is a stationary time series.
• White noise itself cannot be forecast, as it is completely random; stationarity in general simply means the series has no time-dependent structure such as trend or seasonality.
Example

Differencing
• A non-stationary time series can be converted to a stationary time series through a technique called differencing.
• The differenced series is the change between consecutive data points in the series:

y′_t = y_t − y_{t−1}

• This is called first order differencing. In some cases a second order differencing is required:

y″_t = y′_t − y′_{t−1}

• To generalize, differencing of order d is used to convert a non-stationary time series to a stationary one.
• Seasonal differencing is the change between the same period in two different seasons. Assume a season has period m:

y′_t = y_t − y_{t−m}

• This is also called m-lag first order differencing (see the sketch below).
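Differencing is a one-liner in pandas; a sketch assuming quarterly data:

```python
import pandas as pd

y = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])

first_diff = y.diff()            # y'_t  = y_t - y_{t-1}
second_diff = y.diff().diff()    # y''_t = y'_t - y'_{t-1}
seasonal_diff = y.diff(4)        # m-lag differencing with m = 4 (quarterly)
print(first_diff.dropna().tolist())
```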
Example

Moving Average of Error
• In addition to a regression on the actual past p values, one can also create a regression equation involving the forecast errors of past data points and use it as a predictor:

y_t = μ + e_t + θ_1·e_{t−1} + θ_2·e_{t−2} + ... + θ_q·e_{t−q}

where μ is the series mean and e_i is the forecast error of data point i.
• The regression equation for y_t can be understood as the weighted (θ) moving average of the past q forecast errors. This is called a Moving Average with q lags model, or MA(q).
Autoregressive Integrated Moving Average
• The Autoregressive Integrated Moving Average (ARIMA) model is a combination of the differenced autoregressive model and the moving average model.
• It is expressed as:

y′_t = l + α_1·y′_{t−1} + ... + α_p·y′_{t−p} + θ_1·e_{t−1} + ... + θ_q·e_{t−q} + e_t

• The "I" part of ARIMA indicates that the data values have been replaced with differenced values of order d to obtain stationary data, which is a requirement of the ARIMA model approach.
• The predicted y′_t is the d-th order differenced y_t.
• This is called the ARIMA(p, d, q) model.
• Estimating the coefficients α and θ for a given (p, d, q) is what ARIMA does when it learns from the training data of a time series.
ARIMA
• Specifying p, d, and q can be tricky (a key limitation of the method), but one can try out different combinations and evaluate the performance of each model.
• Once the ARIMA model is specified with values of p, d, and q, the coefficients need to be estimated.
• The most common way to estimate them is Maximum Likelihood Estimation (MLE).
• MLE is similar to least squares estimation for a regression equation, except that it finds the coefficients of the model that maximize the likelihood of observing the actual data (see the sketch below).
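A sketch of specifying and fitting an ARIMA(1,1,1) with statsmodels; the series here is a synthetic random walk with drift:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(0.5, 1.0, 120))   # random walk with drift: non-stationary

model = ARIMA(y, order=(1, 1, 1))          # p = 1, d = 1, q = 1
res = model.fit()                          # coefficients estimated by MLE
print(res.params)                          # alpha and theta estimates
print(res.forecast(steps=5))               # five-step-ahead forecast
```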
Special cases of an ARIMA model
• ARIMA(0,1,0) is expressed as y_t = y_{t−1} + e. It is the naïve model with error, called the random walk model.
• ARIMA(0,1,0) with a constant is expressed as y_t = y_{t−1} + c + e. It is a random walk model with a constant trend, called a random walk with drift.
• ARIMA(0,0,0) is y_t = e, or white noise.
• ARIMA(p,0,0) is the pure autoregressive model, AR(p).
Implement

Seasonal ARIMA
• The ARIMA model can be further enhanced to take into account seasonality in the time series.
• Seasonal ARIMA is expressed by the notation ARIMA(p, d, q)(P, D, Q)_m, where:
• p is the order of the non-seasonal autoregression
• d is the degree of differencing
• q is the order of the non-seasonal moving average of the error
• P is the order of the seasonal autoregression
• D is the degree of seasonal differencing
• Q is the order of the seasonal moving average of the error
• m is the number of observations per season (e.g., per year for yearly seasonality)
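A sketch of a seasonal ARIMA(1,1,1)(1,1,1)_4 fit using statsmodels' SARIMAX on a synthetic quarterly series:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(5)
# Synthetic quarterly series: trend + seasonality + noise
y = 0.3 * np.arange(48) + np.tile([6.0, -3.0, -7.0, 4.0], 12) + rng.normal(0, 1, 48)

# Non-seasonal (p, d, q) and seasonal (P, D, Q, m) orders
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 4))
res = model.fit(disp=False)
print(res.forecast(steps=4))   # forecast for the next seasonal cycle
```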
Implement

Machine learning methods
• A time series is a unique dataset in that the information used to predict future data points can be extracted from the past data points themselves.
• A subset of the past known data points can be used as input to an inferred model that computes a future data point as its output.
• Standard machine learning techniques can then be used to build a model based on the inferred relationship between the input (past data) and the target (future data).
Machine learning methods
• In order to use supervised learners on time series data, the series is transformed into cross-sectional data using a technique called windowing.
• This technique defines a set of consecutive time series data points as a window, in which the latest record forms the target while the other data points, which are lagged relative to the target, form the input variables.
• This is similar to an autoregressive model, where the past p data points are used to predict the next data point.
Windowing
• The purpose of windowing is to transform the time series data into a generic machine learning input dataset (see the sketch after this list).
• Parameters:
1. Window size: the number of lag points in one window, excluding the target data point.
2. Step: the number of data points between the first values of two consecutive windows. If the step is 1, the maximum number of windows can be extracted from the time series dataset.
3. Horizon width: the prediction horizon controls how many records in the time series end up as the target variable. The common value for the horizon width is 1.
4. Skip: the offset between the window and the horizon. If the skip is zero, the data point(s) immediately following the window are used for the horizon.
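A minimal sketch of the windowing transformation; the parameter names mirror the list above:

```python
import numpy as np

def windowing(y, window_size, step=1, horizon=1, skip=0):
    """Transform a series into (inputs, targets) for a supervised learner."""
    X, targets = [], []
    start = 0
    while start + window_size + skip + horizon <= len(y):
        X.append(y[start:start + window_size])       # lagged input variables
        t0 = start + window_size + skip
        targets.append(y[t0:t0 + horizon])           # horizon record(s) as target
        start += step
    return np.array(X), np.array(targets)

y = np.arange(10.0)
X, t = windowing(y, window_size=3)
print(X[0], t[0])   # [0. 1. 2.] [3.]
```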
Model Training
[Figure: the model]
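Once windowed, any standard learner can be trained on the cross-sectional data; a sketch using, hypothetically, a random forest with window size 12, step 1, horizon 1, and skip 0:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic series with a 12-period seasonality and a mild trend
y = np.sin(np.arange(200) * 2 * np.pi / 12) + 0.01 * np.arange(200)

# Windowing: 12 lagged inputs per row, the next value as the target
p = 12
X = np.array([y[i:i + p] for i in range(len(y) - p)])
t = y[p:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, t)

print(model.predict(y[-p:].reshape(1, -1)))   # one-step-ahead forecast
```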
Implement

Neural Network Autoregressive
• Time series data transformed into a cross-sectional dataset can be fed into a neural network as inputs to predict the output.
• A feed-forward neural network with one hidden layer, in the context of time series, is denoted Neural Network Autoregressive, NNAR(p, P, k)_m, where p is the number of lagged inputs (the order of the autoregressive model), k is the number of nodes in the hidden layer, P is the autoregressive part of the seasonal component, and m is the seasonal period.
• The NNAR(p, P, k)_m model has particular relevance in time series forecasting: it functions similarly to a seasonal ARIMA(p, 0, 0)(P, 0, 0)_m model (see the sketch below).
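There is no NNAR implementation in statsmodels (R's forecast package provides nnetar), but an NNAR(p, k)-style model can be sketched with scikit-learn's MLPRegressor on lagged inputs:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
y = np.sin(np.arange(300) * 2 * np.pi / 12) + rng.normal(0, 0.1, 300)

# NNAR(p, k)-style setup: p lagged inputs, one hidden layer of k nodes
p, k = 12, 8
X = np.array([y[i:i + p] for i in range(len(y) - p)])
t = y[p:]

net = MLPRegressor(hidden_layer_sizes=(k,), max_iter=2000, random_state=0)
net.fit(X, t)
print(net.predict(y[-p:].reshape(1, -1)))   # one-step-ahead forecast
```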
Implement

Performance Evaluation
• The training process uses the data only up to a chosen point in time; the rest of the later time series is held out for validation.
Performance Evaluation
• Forecast error: e_t = y_t − F_t
• Mean Absolute Error: MAE = (1/T) · Σ |e_t|
• Root Mean Squared Error: RMSE = sqrt((1/T) · Σ e_t²)
• Mean Absolute Percentage Error: MAPE = (100/T) · Σ |e_t / y_t|
• Mean Absolute Scaled Error: MASE = MAE / ((1/(T−1)) · Σ_{t=2}^{T} |y_t − y_{t−1}|)

where T is the total number of data points (see the sketch below).
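A sketch of the four metrics; note that MASE is properly scaled by the naive forecast's in-sample MAE on the training data, while this simplified version scales by the naive MAE over the same window:

```python
import numpy as np

def evaluate(actual, forecast):
    """Compute common time series error metrics for a validation window."""
    e = actual - forecast
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    mape = 100 * np.mean(np.abs(e / actual))       # undefined if any actual is 0
    naive_mae = np.mean(np.abs(np.diff(actual)))   # naive one-step benchmark
    mase = mae / naive_mae                         # < 1 means better than naive
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "MASE": mase}

actual = np.array([112.0, 118.0, 132.0, 129.0])
forecast = np.array([110.0, 120.0, 128.0, 131.0])
print(evaluate(actual, forecast))
```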
