DSS16-Time Series
Time Series
Smoothing based methods
Regression based methods
Machine learning methods
Performance evaluation
Introduction
• A time series is a series of observations listed in time order
• The data points in a time series are usually recorded at constant, successive time intervals
• Time series analysis is the process of extracting meaningful, non-trivial information and patterns from time series data
• Time series forecasting is the process of predicting future values of time series data based on past observations and other inputs
• The objective of time series forecasting is to use historical information about a particular quantity to make forecasts about the value of the same quantity in the future
Example
Time series analysis vs. supervised predictive models
• In time series analysis, time itself is an important predictor; in standard supervised models, the predictors are observed at a single point in time (like a cross-section of a block of wood)
Taxonomy of TS Forecasting
• Time series techniques can be broadly divided into descriptive modeling, called time series analysis, and predictive modeling, called time series forecasting
• Time series forecasting can be further classified into four broad categories of techniques: decomposition based, smoothing based, regression based, and machine learning based methods
Example
Time Series Decomposition
• Time series data, though univariate, are an amalgamation of multiple underlying phenomena
• Trend: the long-term tendency of the data; it represents the change from one period to the next. The trend can be further split into a zero-basis trend and the level of the data
• Seasonality: the repetitive behavior during a cycle of time. Seasonality can be further split into hourly, daily, weekly, monthly, quarterly, and yearly seasonality
• Cycle: the cyclic component represents longer-than-a-year patterns where there are no specific time frames between the cycles. A non-linear trend can represent a cyclic component
• Noise: anything in the time series that is not represented by the level, trend, seasonality, or cyclic component is noise
Example
Additive and Multiplicative
• The trend and seasonality are the systematic components of a time series. Systematic components can be forecasted; it is impossible to forecast noise, the non-systematic component of a time series
• Time series decomposition can be classified into additive decomposition and multiplicative decomposition, based on the nature of the different components and how they are composed
• Additive decomposition: Time series = Trend + Seasonality + Noise
• Multiplicative decomposition: Time series = Trend × Seasonality × Noise
• If the magnitude of the seasonal fluctuation or the variation in trend changes over time, then multiplicative decomposition is the better model
Example
Classical Decomposition
1. Estimate the trend $T_t$: if $m$ is even, calculate a $2 \times m$ moving average (an $m$-MA followed by a 2-MA); if $m$ is odd, calculate an $m$-term moving average. A moving average is the average of the last $m$ data points.
2. Calculate the detrended series: compute $y_t - T_t$ for each data point in the series.
3. Estimate the seasonal component $S_t$: average $(y_t - T_t)$ for each of the $m$ periods. For example, calculate the average of all January values of $(y_t - T_t)$ and repeat for all the months. Normalize the seasonal values so that their mean is zero.
4. Calculate the noise component $E_t$: $E_t = y_t - T_t - S_t$ for each data point in the series.
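• A minimal Python sketch of these steps, assuming statsmodels and a synthetic monthly series ($m = 12$); the data and names are illustrative, not from the slides:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical monthly series: linear trend + yearly seasonality + noise
idx = pd.date_range("2015-01", periods=48, freq="MS")
t = np.arange(48)
y = pd.Series(10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12)
              + np.random.default_rng(0).normal(0, 0.5, 48), index=idx)

# Steps 1-4 in one call: trend via a centered 2x12 moving average,
# normalized seasonal means of the detrended series, and the residual
parts = seasonal_decompose(y, model="additive", period=12)
parts.trend, parts.seasonal, parts.resid  # T_t, S_t, E_t
```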
Implement
Decomposition
Forecasting Using Decomposed Data
• The idea is to break down the time series into its parts, forecast the parts separately, and put them back together to obtain the forecasted future time series values
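• A rough sketch of this decompose-forecast-recombine idea, assuming an additive monthly series and a simple straight-line trend extrapolation; all names and data are illustrative:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2015-01", periods=48, freq="MS")
t = np.arange(48)
y = pd.Series(10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12), index=idx)
parts = seasonal_decompose(y, model="additive", period=12)

h = 12
# Extrapolate the trend with a straight-line fit over its known positions
pos = np.arange(len(y))[parts.trend.notna().to_numpy()]
coef = np.polyfit(pos, parts.trend.dropna().values, 1)
trend_fc = np.polyval(coef, np.arange(len(y), len(y) + h))

# Reuse the last full seasonal cycle; noise is left out (not forecastable)
season_fc = parts.seasonal.values[-12:]
forecast = trend_fc + season_fc
```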
Smoothing based methods
• In smoothing based approaches, an observation is a function of the past few observations
• Time periods: $t = 1, 2, 3, \dots, n$. Time periods can be seconds, days, weeks, months, or years depending on the problem
• Data series: the observation corresponding to each time period above: $y_1, y_2, y_3, \dots, y_n$
• Forecasts: $F_{n+h}$ is the forecast for the $h$th time period following $n$. Usually $h = 1$, the next time period following the last data point; however, $h$ can be greater than 1. $h$ is called the horizon
• Forecast errors: $e_t = y_t - F_t$ for any given time $t$
• The observations are made at a constant time interval. In the case of intermittent data, one can assume that an interpolation scheme is applied to obtain equally spaced (in time) data points
Simple Forecasting Methods
• Naïve method: $F_{t+1} = y_t$
• Seasonal naïve method: $F_{t+1} = y_{t+1-s}$, where $s$ is the seasonal period
• Average method: $F_{t+1}$ is the mean of all past observations
• Moving average smoothing: $F_{t+1}$ is the mean of the last $k$ observations
• Weighted moving average smoothing: $F_{t+1}$ is a weighted mean of the last $k$ observations, with weights summing to 1 (a code sketch of these methods follows)
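• A minimal pandas sketch of these one-step-ahead forecasts; the series, $s$, $k$, and the weights are illustrative choices:

```python
import numpy as np
import pandas as pd

y = pd.Series([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119])
s, k = 4, 3                                     # seasonal period, MA window

naive_fc = y.iloc[-1]                           # F_{n+1} = y_n
seasonal_naive_fc = y.iloc[-s]                  # F_{n+1} = y_{n+1-s}
average_fc = y.mean()                           # mean of all observations
moving_avg_fc = y.iloc[-k:].mean()              # mean of the last k points
w = np.array([0.2, 0.3, 0.5])                   # weights sum to 1, newest last
weighted_ma_fc = float(w @ y.iloc[-k:].to_numpy())
```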
Comparing one-step-ahead forecasts
Exponential Smoothing
• Exponential smoothing is a weighted average of the past data, with recent data points given more weight than earlier data points
• The weights decay exponentially toward the earlier data points, hence the name
• The smoothed forecast is computed recursively as $F_{t+1} = \alpha y_t + (1 - \alpha) F_t$, where $0 < \alpha \le 1$ is the smoothing parameter
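• A small sketch assuming statsmodels; the smoothing level $\alpha = 0.3$ and the data are arbitrary illustrative choices:

```python
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

y = pd.Series([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119])
ses = SimpleExpSmoothing(y).fit(smoothing_level=0.3, optimized=False)
ses.fittedvalues      # one-step-ahead smoothed values F_t
ses.forecast(3)       # flat forecast at the last smoothed level
```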
Example
Holt's Two-Parameter Exponential Smoothing
• The forecast can be expressed as a sum of two components, the average value or "level" of the series, $L_n$, and the trend, $T_n$, computed recursively as:
$L_n = \alpha y_n + (1 - \alpha)(L_{n-1} + T_{n-1})$
$T_n = \beta (L_n - L_{n-1}) + (1 - \beta) T_{n-1}$
$F_{n+h} = L_n + h\,T_n$
Holt-Winters' Three-Parameter Exponential Smoothing
• When a time series contains seasonality in addition to a trend, yet another parameter, $\gamma$, is needed to estimate the seasonal component of the time series
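• A hedged sketch using statsmodels' ExponentialSmoothing for Holt-Winters; additive components and $m = 12$ are illustrative assumptions, not choices from the slides:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

idx = pd.date_range("2015-01", periods=48, freq="MS")
t = np.arange(48)
y = pd.Series(10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12), index=idx)

hw = ExponentialSmoothing(y, trend="add", seasonal="add",
                          seasonal_periods=12).fit()  # fits alpha, beta, gamma
hw.forecast(12)   # next 12 months
```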
Implement
Example
Regression based methods
• In regression based methods, the variable time is the predictor or independent variable, and the time series value is the dependent variable
• Regression based methods are generally preferable when the time series appears to have a global pattern
• For a time series with local patterns instead of a global pattern, using a regression based approach requires one to specify how and when the patterns change, which is difficult
• For such a series, smoothing approaches work best, because these methods rely on extrapolating the most recent local pattern, as seen earlier
Example
Regression
• The simplest of the regression based approaches for analyzing a time series is linear regression
• The time period is the independent variable, and the model attempts to predict the time series value using it
• The linear regression model is able to capture the long-term tendency of the series, but it does a poor job of fitting the data
• More sophisticated polynomial functions can be used to improve the fit (both fits are sketched below)
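• A minimal numpy sketch of regression on the time index; the data and the cubic degree are illustrative:

```python
import numpy as np

y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119], float)
t = np.arange(len(y))                 # the time period is the only predictor

b1, b0 = np.polyfit(t, y, 1)          # linear: captures the long-term tendency
linear_fit = b0 + b1 * t

poly = np.polyfit(t, y, 3)            # cubic: degree chosen arbitrarily
poly_fit = np.polyval(poly, t)
```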
Regression
Regression With Seasonality
• A seasonal dummy variable is introduced for each period (quarter) of the series, taking the value 1 or 0 depending on whether the record falls in that period
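• A sketch of quarterly dummies with pandas and a plain least-squares fit; the series and variable names are illustrative assumptions:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2015-01", periods=16, freq="QS")
y = pd.Series([10, 14, 8, 12] * 4, index=idx, dtype=float)

# One 0/1 dummy per quarter (first quarter dropped to avoid redundancy)
X = pd.get_dummies(y.index.quarter, prefix="Q", drop_first=True).astype(float)
X["t"] = np.arange(len(y))            # keep the time trend as a predictor

A = np.column_stack([np.ones(len(y)), X.to_numpy()])
coef, *_ = np.linalg.lstsq(A, y.to_numpy(), rcond=None)  # ordinary least squares
```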
Implement
Autoregressive Integrated Moving Average
• ARIMA stands for Autoregressive Integrated Moving Average model and is one of the most popular models for time series forecasting
• The ARIMA methodology was originally developed by Box and Jenkins in the 1970s
Autocorrelation
• Correlation measures how two variables depend on each other, or whether they have a linear relationship with each other
• A lagged series is a copy of the original series shifted back by a fixed number of periods, e.g., the "1-lag" and "4-lag" series
Autocorrelation
• There is a strong correlation between the original time series "prod" and its 4-lag series "prod-4": they tend to move together
• This phenomenon is called autocorrelation: the time series is correlated with its own data points, at a lag
• One can measure the strength of the correlation between the original time series and each of the lag series. The plot of the resulting correlations is called an Autocorrelation Function (ACF) chart
• The ACF chart is used to study all the available seasonality in the time series
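• A small sketch of lag correlations and the ACF, assuming pandas and statsmodels; the synthetic "prod" series is illustrative:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf

t = np.arange(60)
prod = pd.Series(100 + 5 * np.sin(2 * np.pi * t / 4)
                 + np.random.default_rng(1).normal(0, 1, 60))

prod.autocorr(lag=4)   # correlation of the series with its own 4-lag copy
acf(prod, nlags=12)    # the values plotted in an ACF chart
```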
Autoregressive Models
• Autoregressive models are regression models applied to the lag series generated from the original time series
• In an autoregressive model, the output is the future data point, expressed as a linear combination of the past $p$ data points (see the sketch below):
$y_t = c + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \dots + \alpha_p y_{t-p} + e_t$
• $p$ is the lag window
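• A sketch of an AR($p$) fit assuming statsmodels; $p = 4$ and the data are illustrative choices:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

t = np.arange(80)
y = pd.Series(100 + 5 * np.sin(2 * np.pi * t / 4)
              + np.random.default_rng(2).normal(0, 1, 80))

ar = AutoReg(y, lags=4).fit()              # regress y_t on y_{t-1} .. y_{t-4}
ar.params                                  # c and alpha_1 .. alpha_4
ar.predict(start=len(y), end=len(y) + 7)   # 8 steps ahead
```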
Example
Differencing
• A non-stationary time series can be converted to a stationary time series through a technique called differencing
• The differenced series is the change between consecutive data points in the series: $y'_t = y_t - y_{t-1}$
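• A minimal pandas sketch of first- and second-order differencing; the data are illustrative:

```python
import pandas as pd

y = pd.Series([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119])
dy = y.diff()           # y'_t = y_t - y_{t-1}  (first difference, d = 1)
d2y = y.diff().diff()   # second-order difference, d = 2
```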
Example
Moving Average of Error
• In addition to regressing on the actual past $p$ values, one can also create a regression equation involving the forecast errors of past data points and use it as a predictor. Consider this equation with $q$ past error terms:
$y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}$
Autoregressive Integrated Moving Average
• The Autoregressive Integrated Moving Average (ARIMA) model is a combination of the differenced autoregressive model and the moving average model
• It is expressed as:
$y'_t = c + \alpha_1 y'_{t-1} + \dots + \alpha_p y'_{t-p} + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} + e_t$
• The I part of ARIMA indicates that the data values have been replaced with differenced values of order $d$ to obtain stationary data, which is a requirement of the ARIMA model approach
• The predicted quantity $y'_t$ is $y_t$ differenced to the $d$th order
• This is called the $ARIMA(p, d, q)$ model
• Estimating the coefficients $\alpha$ and $\theta$ for a given $(p, d, q)$ is what ARIMA does when it learns from the training data of a time series
ARIMA
• Specifying $(p, d, q)$ can be tricky (and is a key limitation), but one can try out different combinations and evaluate the performance of the model
• Once the ARIMA model is specified with values of $(p, d, q)$, the coefficients need to be estimated
• The most common way to estimate them is through Maximum Likelihood Estimation (MLE)
• It is similar to Least Squares Estimation for a regression equation, except that MLE finds the coefficients of the model in such a way that it maximizes the chance of observing the actual data
Special cases of an ARIMA model
• $ARIMA(0,1,0)$ is expressed as $y_t = y_{t-1} + e$. It is the naïve model with error, which is called the random walk model
• $ARIMA(0,1,0)$ with a constant is expressed as $y_t = y_{t-1} + c + e$. It is a random walk model with a constant trend, called a random walk with drift
• $ARIMA(0,0,0)$ is $y_t = e$, or white noise
• $ARIMA(p,0,0)$ is the autoregressive model
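• A hedged sketch of fitting an ARIMA model with statsmodels; the order (1, 1, 1) is an arbitrary starting point and the data are synthetic:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# A drifting random-walk-like series as stand-in data
y = pd.Series(np.cumsum(np.random.default_rng(3).normal(0.5, 1.0, 100)))

arima = ARIMA(y, order=(1, 1, 1)).fit()   # MLE estimates of alpha and theta
arima.params
arima.forecast(steps=5)
```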
Implement
Seasonal ARIMA
• The ARIMA model can be further enhanced to take into account the seasonality in the time series
• Seasonal ARIMA is expressed by the notation $ARIMA(p, d, q)(P, D, Q)_m$ (a sketch follows the list) where:
• $p$ is the order of the nonseasonal autoregression
• $d$ is the degree of differencing
• $q$ is the order of the nonseasonal moving average of the error
• $P$ is the order of the seasonal autoregression
• $D$ is the degree of seasonal differencing
• $Q$ is the order of the seasonal moving average of the error
• $m$ is the number of observations in a year (for yearly seasonality)
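• A sketch of a seasonal ARIMA fit via statsmodels' SARIMAX; the orders and $m = 12$ are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

idx = pd.date_range("2012-01", periods=96, freq="MS")
t = np.arange(96)
y = pd.Series(100 + t + 10 * np.sin(2 * np.pi * t / 12), index=idx)

sarima = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
sarima.forecast(steps=12)   # one seasonal cycle ahead
```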
Implement
Machine learning methods
• A time series is a unique dataset where the information
used to predict future data points can be extracted from
past data points
• A subset of the past known data points can be used as
inputs to an inferred model to compute the future data
point as an output
• Standard machine learning techniques are used to build a
model based on the inferred relationship between input
(past data) and target (future data)
Machine learning methods
• In order to use supervised learners on time series data, the series is transformed into cross-sectional data using a technique called windowing
• This technique defines a set of consecutive time series data points as a window, where the latest record forms the target while the other data points, which are lagged relative to the target, form the input variables
• This is similar to an autoregressive model, where the past $p$ data points are used to predict the next data point
Windowing
• The purpose of windowing is to transform the time series data into a generic machine learning input dataset (see the sketch after this list)
• Parameters:
1. Window size: the number of lag points in one window, excluding the target data point.
2. Step: the number of data points between the first values of two consecutive windows. If the step is 1, the maximum number of windows can be extracted from the time series dataset.
3. Horizon width: the prediction horizon controls how many records in the time series end up as the target variable. The common value for the horizon width is 1.
4. Skip: the offset between the window and the horizon. If the skip is zero, the data point(s) immediately following the window are used for the horizon.
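• A minimal windowing sketch in pandas (window size 3, step 1, horizon width 1, skip 0); the column names and data are illustrative:

```python
import pandas as pd

y = pd.Series([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119])

# Window size 3, step 1, horizon width 1, skip 0
window = pd.DataFrame({
    "lag_3": y.shift(3),
    "lag_2": y.shift(2),
    "lag_1": y.shift(1),    # the point immediately before the target (skip 0)
    "target": y,            # horizon width 1
}).dropna()
# Each row is now a cross-sectional record for any supervised learner
```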
Model Training
• The model is trained on the windowed cross-sectional records: the lagged data points are the input variables and the horizon value is the target, so any supervised learner can be applied
Implement
Neural Network Autoregressive
• Time series data transformed into a cross-sectional dataset can be fed into a neural network as inputs to predict the output
• A feed-forward neural network with one hidden layer, in the context of time series, is denoted Neural Network Autoregressive, $NNAR(p, P, k)_m$, where $p$ is the number of lagged inputs (the order of the autoregressive model), $k$ is the number of nodes in the hidden layer, $P$ is the autoregressive part of the seasonal component, and $m$ is the seasonal period
• The $NNAR(p, P, k)_m$ network has particular relevance for time series forecasting: an $NNAR(p, P, k)_m$ model functions similarly to a seasonal $ARIMA(p, 0, 0)(P, 0, 0)_m$ model
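• A rough NNAR-style sketch: a one-hidden-layer MLP trained on lagged inputs. scikit-learn, $p = 4$, $k = 8$, and the data are illustrative assumptions, not the slides' tooling:

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

t = np.arange(120)
y = pd.Series(100 + 5 * np.sin(2 * np.pi * t / 12)
              + np.random.default_rng(4).normal(0, 1, 120))

# Windowed inputs: 4 lags per record, target is the current value
frame = pd.DataFrame({f"lag_{i}": y.shift(i) for i in range(1, 5)})
frame["target"] = y
frame = frame.dropna()

nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
nn.fit(frame.drop(columns="target").to_numpy(), frame["target"].to_numpy())

last_lags = y.iloc[-4:][::-1].to_numpy().reshape(1, -1)   # y_n .. y_{n-3}
nn.predict(last_lags)                                     # one step ahead
```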
Implement
Performance Evaluation
• For training, the series is restricted to data up to a chosen point in time; the rest of the later time series is used for validation
Performance Evaluation
• Error: $e_t = y_t - F_t$
• Mean Absolute Error: $MAE = \frac{1}{n} \sum_{t=1}^{n} |e_t|$ (computed in the sketch below)
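• A sketch of a time-ordered train/validation split and MAE, assuming statsmodels for the forecaster; the split point and data are illustrative:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

y = pd.Series(100 + np.random.default_rng(5).normal(0, 2, 60).cumsum())
train, valid = y.iloc[:48], y.iloc[48:]    # split in time order, no shuffling

fc = SimpleExpSmoothing(train).fit().forecast(len(valid))
errors = valid.to_numpy() - fc.to_numpy()  # e_t = y_t - F_t
mae = np.abs(errors).mean()                # Mean Absolute Error
```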