
ARIMA and Sugar Cane Juice

May and June are the hottest months in India. Indians drink sugarcane juice to stay hydrated during this time. The process of making sugarcane juice is similar to ARIMA modeling, which extracts information from time series data in multiple passes, just as sugarcane is pressed multiple times to extract all its juice. ARIMA modeling uses three parts: differencing to remove trends, autoregression to account for the dependency of current values on past values, and moving average to account for the dependency of errors on past errors. Each part extracts some "juice" or information from the data.

Uploaded by

Rounaq Dhar
Copyright © All Rights Reserved

May and June are the peak summer months in India. Indian summers are extremely hot
and draining, and they are followed by the monsoon rains. It's no wonder that during
summer everyone in India looks up at the sky, hoping to see clouds that signal the
arrival of the monsoon. While waiting for the monsoon, Indians have a few drinks that
keep them hydrated. Sugar cane juice, or ganne-ka-ras, is by far my favorite drink to
beat the heat. The process of making sugar cane juice is fascinating and has
similarities with ARIMA modeling.

Sugar cane juice is prepared by crushing a long piece of sugar cane through a juicer
with two large cylindrical rollers, as shown in the adjacent picture. However, it is
difficult to extract all the juice from a tough sugar cane in one go, so the process is
repeated multiple times. In the first go, a fresh sugar cane is passed through the
juicer. Then the residual of the sugar cane, which still contains juice, is passed
through the juicer again and again until no juice is left in the residual. This is
precisely how ARIMA models work. Consider your time series data as a sugar cane and
ARIMA models as sugar cane juicers. The idea with ARIMA models is that the final
residual should look like white noise; otherwise, there is still juice, or information,
left in the data to extract.
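The "is there juice left?" question can be asked numerically. The sketch below is our own illustration and not part of the article's method. It uses the Ljung-Box Q statistic, a common portmanteau test swapped in here because the article does not name a specific test. The test runs over 10 lags, and the 5% chi-square critical value (18.307 for 10 degrees of freedom) is hard-coded so that only the Python standard library is needed. The names `sample_acf`, `ljung_box_q`, and `has_juice_left` are hypothetical helpers. When the residuals come from a fitted ARIMA(p,d,q) model, the degrees of freedom should be reduced by p + q, which this simplified sketch ignores.

```python
import random

# 95% critical value of the chi-square distribution with 10 degrees of
# freedom, hard-coded so this sketch needs no statistics library.
CHI2_95_DF10 = 18.307


def sample_acf(series, max_lag):
    """Sample autocorrelations of `series` at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(1, max_lag + 1)
    ]


def ljung_box_q(residuals, max_lag=10):
    """Ljung-Box Q statistic: large values mean autocorrelation is left over."""
    n = len(residuals)
    acf = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


def has_juice_left(residuals):
    """True if the residuals fail a 5%-level Ljung-Box test over 10 lags,
    i.e. they do not look like white noise (assumes no fitted parameters)."""
    return ljung_box_q(residuals, max_lag=10) > CHI2_95_DF10


random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]  # juice-less residual
walk = []  # cumulative sum of the same noise: a random walk, full of "juice"
level = 0.0
for e in noise:
    level += e
    walk.append(level)

print(has_juice_left(noise), has_juice_left(walk))
```

Pure noise should usually pass, though by design about 5% of white-noise samples fail any 5%-level test by chance. The random walk built from the same noise leaves a very large Q, because each value is strongly tied to the previous one.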

We will come back to white noise (the juice-less residual) in later sections of this
article. Before that, however, let's explore ARIMA modeling in more detail.

ARIMA Modeling
ARIMA is a combination of three parts: AR (AutoRegressive), I (Integrated), and MA
(Moving Average). A convenient notation for an ARIMA model is ARIMA(p,d,q), where p, d,
and q are the orders of the AR, I, and MA parts respectively. Each of these three parts
is an effort to make the final residuals display a white noise pattern (or no pattern at
all). In each step of ARIMA modeling, the time series data is passed through these three
parts, like a sugar cane through a juicer, to produce a juice-less residual. The
sequence of the three passes in ARIMA analysis is as follows:
1st Pass of ARIMA to Extract Juice / Information

Integrated (I) – subtract the lagged series from the time series to extract trends from
the data

In this pass of the ARIMA juicer, we extract trend(s) from the original time series data.
Differencing is one of the most commonly used mechanisms for extracting trends. Here,
the lagged series is subtracted from the original series, e.g. October's sales value is
subtracted from November's to produce a trend-less residual series. The formulae for
different orders of differencing are as follows:

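No Differencing (d=0): the original series is used as is
$$y_t$$
1st Differencing (d=1)
$$y'_t = y_t - y_{t-1}$$
2nd Differencing (d=2)
$$y''_t = y'_t - y'_{t-1} = y_t - 2y_{t-1} + y_{t-2}$$
where $y_t$ is the value of the original series in period $t$.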
For example, the adjacent plot displays a time series with a linear upward trend. Just
below it is the plot of the same data after 1st order differencing. As you can see, 1st
order differencing extracts the trend part of the series, and the differenced data
(residual) does not display any trend.
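Here is a small numerical version of the plot described above. It assumes a made-up series with a trend of 2 units per period plus Gaussian noise, not the article's data. `difference` is a hypothetical helper that applies $y_t - y_{t-1}$ d times. `slope` is a hypothetical helper that fits a least-squares line against time to measure how much trend is left.

```python
import random


def difference(series, d=1):
    """Apply d rounds of differencing; each round replaces y_t by
    y_t - y_{t-1} and shortens the series by one observation."""
    for _ in range(d):
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
    return series


def slope(series):
    """Least-squares slope of the series against time t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


random.seed(42)
# Hypothetical series: linear upward trend of 2 units per period plus noise.
trended = [10 + 2 * t + random.gauss(0, 1) for t in range(200)]
diffed = difference(trended, d=1)

print(round(slope(trended), 2))  # close to 2: a clear upward trend
print(round(slope(diffed), 3))   # close to 0: the trend has been extracted
print(round(sum(diffed) / len(diffed), 2))  # close to 2: the trend becomes the mean
```

The trend does not disappear without a trace. It moves into the mean of the differenced series, which is why a once-differenced series that still has a non-zero mean corresponds to a series with drift.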

The residual data of most time series becomes trend-less after first order differencing,
which is represented as ARIMA(0,1,0). Notice that the AR (p) and MA (q) values in this
notation are 0 and the integrated (I) part has order one. If the residual series still
has a trend, it is differenced again; this is called 2nd order differencing. The
resulting trend-less series is called stationary in the mean, i.e. the mean or average
value of the series does not change over time. We will come back to stationarity and
discuss it in detail when we create an ARIMA model for our tractor sales data in the
next article.
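To see why a second pass is sometimes needed, here is a sketch on a noise-free quadratic series. The series was chosen only to keep the arithmetic readable and is not taken from the article. One round of differencing leaves a linear trend, and the second round (d=2) leaves a constant. `difference` is a hypothetical helper that applies $y_t - y_{t-1}$ d times.

```python
def difference(series, d=1):
    """Apply d rounds of differencing: y_t -> y_t - y_{t-1}."""
    for _ in range(d):
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
    return series


# A series with a curved (quadratic) trend: y_t = t^2.
quadratic = [t * t for t in range(6)]  # [0, 1, 4, 9, 16, 25]
print(difference(quadratic, d=1))      # [1, 3, 5, 7, 9]  -> still trending
print(difference(quadratic, d=2))      # [2, 2, 2, 2]     -> constant mean
```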
2nd Pass of ARIMA to Extract Juice / Information

AutoRegressive (AR) – extract the influence of the previous periods' values on the
current period

After the time series data is made stationary through the integrated (I) pass, the AR
part of the ARIMA juicer gets activated. As the name auto-regression suggests, here we
try to extract the influence of the values of previous periods on the current period,
e.g. the influence of September's and October's sales values on November's sales. This
is done by developing a regression model with the time-lagged period values as
independent or predictor variables. The general form of the equation for this
regression model is shown below:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$
Here $y_t$ stands for the stationary series that comes out of the I pass, $c$ is a
constant, $\phi_1, \dots, \phi_p$ are the regression coefficients, and $\varepsilon_t$
is the error term for period $t$. You may want to read the following articles on
regression modeling: Article 1 and Article 2.

An AR model of order 1, i.e. p=1 or ARIMA(1,0,0), is represented by the following
regression equation:
$$y_t = c + \phi_1 y_{t-1} + \varepsilon_t$$
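The sketch below shows how AR(1) coefficients can be estimated. It simulates a series from $y_t = c + \phi_1 y_{t-1} + \varepsilon_t$ with made-up values c = 4 and phi_1 = 0.6. It then recovers those values by ordinary least squares of $y_t$ on $y_{t-1}$, which is the regression with the lagged value as predictor. The helper names are hypothetical. Statistical packages typically fit ARIMA models by maximum likelihood instead, so their estimates can differ slightly.

```python
import random


def simulate_ar1(phi, c, n, seed=0):
    """Simulate y_t = c + phi * y_{t-1} + e_t with standard normal errors e_t."""
    rng = random.Random(seed)
    y = [c / (1 - phi)]  # start at the long-run mean c / (1 - phi)
    for _ in range(n - 1):
        y.append(c + phi * y[-1] + rng.gauss(0, 1))
    return y


def fit_ar1(y):
    """Ordinary least squares of y_t on y_{t-1}; returns (c_hat, phi_hat)."""
    x, target = y[:-1], y[1:]
    x_mean = sum(x) / len(x)
    t_mean = sum(target) / len(target)
    cov = sum((a - x_mean) * (b - t_mean) for a, b in zip(x, target))
    var = sum((a - x_mean) ** 2 for a in x)
    phi_hat = cov / var
    c_hat = t_mean - phi_hat * x_mean
    return c_hat, phi_hat


series = simulate_ar1(phi=0.6, c=4.0, n=5000, seed=1)
c_hat, phi_hat = fit_ar1(series)
print(round(c_hat, 2), round(phi_hat, 2))  # roughly 4 and 0.6
```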

3rd Pass of ARIMA to Extract Juice / Information

Moving Average (MA) – extract the influence of the previous period’s error terms on
the current period’s error

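Finally, the last component of the ARIMA juicer, i.e. MA, involves finding the
relationship between the previous periods' error terms and the current period's error
term. Keep in mind that this moving average (MA) has nothing to do with the moving
average we learned about in the previous article on time series decomposition. The
Moving Average (MA) part of ARIMA is developed as the following regression-style
equation, with the lagged error values as independent or predictor variables:
$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$
Here $\theta_1, \dots, \theta_q$ are the coefficients on the lagged error terms. Because
these past errors are not observed directly, in practice the coefficients are estimated
iteratively rather than by ordinary least squares.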

An MA model of order 1, i.e. q=1 or ARIMA(0,0,1), is represented by the following
regression equation:
$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$
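One way to see what $y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1}$ implies is to look at which values share errors. Consecutive values share the error $\varepsilon_{t-1}$, so the series is correlated with its previous value but not with values two or more periods back. For theta_1 = 0.5, the theoretical lag-1 autocorrelation is theta_1 / (1 + theta_1^2) = 0.4. The sketch below simulates such a series and checks its sample autocorrelations. It uses a made-up theta_1, c = 0, and hypothetical helper names.

```python
import random


def simulate_ma1(theta, c, n, seed=0):
    """Simulate y_t = c + e_t + theta * e_{t-1} with standard normal errors."""
    rng = random.Random(seed)
    errors = [rng.gauss(0, 1) for _ in range(n + 1)]
    return [c + errors[t] + theta * errors[t - 1] for t in range(1, n + 1)]


def sample_acf(series, max_lag):
    """Sample autocorrelations of `series` at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(1, max_lag + 1)
    ]


y = simulate_ma1(theta=0.5, c=0.0, n=20000, seed=7)
r1, r2, r3 = sample_acf(y, 3)
# Theory for MA(1): rho_1 = theta / (1 + theta^2) = 0.4, rho_k = 0 for k >= 2.
print(round(r1, 2), round(r2, 2), round(r3, 2))
```

The sharp cut-off in the sample autocorrelations after lag q is a common way to spot an MA(q) pattern in data.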
