REport Time Series

Uploaded by

Akshaya Kennedy

TIME SERIES FORECASTING

BUSINESS REPORT
AKSHAYA J K
Sparkling.csv Rose.csv

Solution:

The required packages were loaded and the monthly Sparkling wine sales dataset was first
read without converting the dates to pandas' date-time format.

The 'Sparkling' dataset contains two columns of data:

the monthly time stamp, from January 1980 to July 1995, and the corresponding wine
sales.
Method-1:
Create the time stamps and add them to the data frame to turn it into time-series data.

Add the time stamps to the original data frame, set them as the index, and drop the
YearMonth column from the dataset.
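Method-1 can be sketched as follows. This is a minimal illustration, not the report's actual notebook: the three-row sample, its values and the column name "Sparkling" are assumptions, and only the YearMonth column name comes from the text above.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for Sparkling.csv (illustrative values only).
raw = io.StringIO("YearMonth,Sparkling\n1980-01,1600\n1980-02,1500\n1980-03,2300\n")
df = pd.read_csv(raw)  # YearMonth is read as plain text here

# Build one month-start time stamp per row, beginning in January 1980.
df["Time_Stamp"] = pd.date_range(start="1980-01-01", periods=len(df), freq="MS")

# Use the time stamps as the index and drop the now-redundant text column.
df = df.set_index("Time_Stamp").drop(columns="YearMonth")
print(df)
```

Generating the stamps with `pd.date_range` relies on the rows being in calendar order with no missing months, which matches the description of the dataset.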

Method-2:
An alternative way to read the original data as a time series is to use pandas'
parsing options. [parse_dates=True, squeeze=True, index_col=0]

All values are loaded properly, with the index in pandas' date-time format.
The Sparkling time series does not contain any missing values.
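A sketch of Method-2 on a hypothetical in-memory sample is shown here. The `squeeze=True` argument of `read_csv` was deprecated in pandas 1.4 and removed in pandas 2.0, so the sketch calls `.squeeze("columns")` on the result instead. The sample values and the column name "Sparkling" are assumptions.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for Sparkling.csv (illustrative values only).
raw = io.StringIO("YearMonth,Sparkling\n1980-01,1600\n1980-02,1500\n1980-03,2300\n")

# index_col=0 makes the first column the index; parse_dates=True asks pandas
# to convert that index to datetimes while reading.
df = pd.read_csv(raw, index_col=0, parse_dates=True)

# Replacement for the old squeeze=True: turn the one-column frame into a Series.
series = df.squeeze("columns")

print(type(series.index).__name__)
print(int(series.isna().sum()))  # number of missing values
```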

Plot the Sparkling Time Series to understand the behaviour of the data:

The Sparkling wine dataset shows significant seasonality and no consistent trend,
although it has upward and downward slopes over the period.
Sparkling wine has been consistently favoured by customers over the years.

Solution:

Check the basic measures of descriptive statistics:

The basic descriptive statistics show how sales varied over the whole period. They are
computed over all observations, however, without taking the time component into account.

The descriptive summary shows that on average 2402 units of Sparkling wine were sold
each month over the period. The middle 50% of monthly sales ranged from 1605 to 2549
units. The maximum sale reported in a month is 7242 units.
Yearly Boxplot:

Monthly Boxplot:

The yearly boxplot shows that the median sale of Sparkling has been more or
less consistent across the period, at or a little below 2000 units.
The outliers in the yearly boxplot most probably represent the higher sales during
the seasonal months.
The monthly boxplot shows clear seasonality during the festive months of October,
November and December, peaking in December. Sales are lowest in June.

The monthly plot for Sparkling shows the mean and variation of units sold in each month
over the years. Sales in the seasonal months show higher variation than in the
lean months.
December sales, with a mean a little below 6000 units, vary from 4500 to 7400 units
over the years, whereas November sales vary from 3500 to 5000 units and October
sales from 2500 to 4000 units.
The lean months from January to September show more or less consistent sales of
around 2000 units.

The plot of monthly sales over the years also shows the seasonal component of
the time series, with October, November and December selling markedly
higher volumes.
The highest volume of Sparkling wine was sold in December 1987 and the lowest
December sale was in 1981. After 1987, December sales average around
6500 units, up from around 5000 in the early 1980s.
Since 1990 the seasonal sales have been more or less consistent at around 6000 units
in December, 4000 units in November and 3000 units in October.
Sales for the months from January to July are consistent across the
years compared with the rest of the months.

Decompose the Time Series and plot the different components:

The takeaways from the decomposition plots of Sparkling wine sales are:

As the height of the seasonal peaks in the observed plot changes with
the trend, the time series is assumed to be ‘multiplicative’.
The trend component does not show a consistent trend; an
intermediate period shows an upward trend, which stabilises in the later half
of the time series.
The additive model shows a seasonal variation of about 3000 units and the
multiplicative model a seasonal variation of about 30%.
The residuals show high variability across the whole period,
which is more or less the same in the additive and multiplicative decompositions.
The additive residuals are centred around 0 and the multiplicative residuals
vary by around 10%.
If the seasonal and residual components are independent of the trend, the series
is additive. If they depend on the trend, meaning they scale with its level,
the series is multiplicative.
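The two decomposition models compared above can be written compactly. The symbols are introduced here for illustration and do not appear in the original report: y_t is the observed sale in month t, T_t the trend, S_t the seasonal component and R_t the residual.

```latex
\begin{align}
\text{Additive:}       \quad y_t &= T_t + S_t + R_t \\
\text{Multiplicative:} \quad y_t &= T_t \times S_t \times R_t \\
\text{Multiplicative, after taking logs:} \quad \log y_t &= \log T_t + \log S_t + \log R_t
\end{align}
```

The last line shows why a logarithmic transformation is a common way to handle a multiplicative series: on the log scale the components add.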

Solution:

The train and test datasets are created with 1991 as the starting year of the test data.

Plot of the Sparkling time series split into train and test:
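The split by year can be sketched as below. The series is a synthetic stand-in covering the same months as the dataset (January 1980 to July 1995); its values are placeholders.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monthly series: January 1980 to July 1995.
index = pd.date_range(start="1980-01-01", end="1995-07-01", freq="MS")
series = pd.Series(np.arange(len(index), dtype=float), index=index, name="Sparkling")

# Everything before 1991 is training data; 1991 onward is held out for testing.
train = series[series.index.year < 1991]
test = series[series.index.year >= 1991]

print(len(train), len(test))
```

Splitting on the calendar year keeps the test set strictly later in time than the training set, which is what a forecasting evaluation needs.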

Note: Please do try to build as many models as possible and as many iterations of models as
possible with different parameters.
Solution:

To regress Sparkling wine sales on time, a numerical time index was generated for both the
training and test sets and added to the respective datasets.

The linear regression plot shows a gradual upward trend in the Sparkling wine
forecast, consistent with a trend in the observed data that was not visually apparent.
For the Regression on Time forecast on the test data, RMSE is 1389.135.
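A minimal sketch of regression on time follows, using toy numbers rather than the report's data. It fits a straight line to the training values against a running time index with `numpy.polyfit` and scores the extrapolation with RMSE. The report does not say which regression routine it used, so the routine here is an assumption.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy values standing in for train and test sales (not the report's data).
train = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
test = np.array([20.0, 23.0, 24.0])

# Numerical time index: 1..n for train, continuing with n+1.. for test.
t_train = np.arange(1, len(train) + 1)
t_test = np.arange(len(train) + 1, len(train) + len(test) + 1)

# Ordinary least squares fit of sales = intercept + slope * time.
slope, intercept = np.polyfit(t_train, train, deg=1)
forecast = intercept + slope * t_test

print(slope, intercept, rmse(test, forecast))
```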

The naive model takes the last observed value of the training set and uses that same
value as the forecast for every point in the test set.
For the Naive forecast on the test data, RMSE is 3864.279.
The model does not capture the trend or seasonality of the dataset.

In the Simple Average model, the forecast is the mean of the time-series
variable in the training set.

The model cannot capture the trend or the
seasonality present in the dataset.
For the Simple Average forecast on the test data, RMSE is 1275.
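Both baseline forecasts are flat lines, which a few lines of plain Python make explicit. The numbers are toy values, not the report's data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values standing in for train and test sales (not the report's data).
train = [10.0, 12.0, 14.0, 16.0, 18.0]
test = [20.0, 23.0, 24.0]

# Naive: repeat the last training observation over the whole test horizon.
naive_forecast = [train[-1]] * len(test)

# Simple average: repeat the training mean over the whole test horizon.
train_mean = sum(train) / len(train)
average_forecast = [train_mean] * len(test)

print(rmse(test, naive_forecast), rmse(test, average_forecast))
```

Since neither forecast changes over the horizon, neither can follow a trend or a seasonal pattern in the test period.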

For the moving average model, rolling means (trailing moving averages) are calculated
for different intervals. The best interval is the one with the highest accuracy, that
is, the lowest test RMSE.
The moving average models are built for trailing 2 points, 4 points, 6 points and 9
points.
For the Sparkling dataset the accuracy is higher with the shorter rolling
windows.
In moving average forecasts the fitted values only become available once n points
have been observed, so they lag the actual series.
The best interval of moving average is 2 points.
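The window comparison can be sketched with `pandas` rolling means. The sketch assumes the rolling mean is computed over the whole series and then scored on the test portion, which is one plausible reading of the text. The data and the two window sizes are toy choices.

```python
import numpy as np
import pandas as pd

# Toy monthly series (not the report's data).
index = pd.date_range("1990-01-01", periods=8, freq="MS")
sales = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 21.0], index=index)
n_test = 3  # assumption: the last three months form the test set

scores = {}
for window in (2, 4):
    # Mean of the current and the previous (window - 1) observations; the
    # first (window - 1) values are NaN because the window is incomplete.
    trailing = sales.rolling(window).mean()
    errors = sales.iloc[-n_test:] - trailing.iloc[-n_test:]
    scores[window] = float(np.sqrt((errors ** 2).mean()))

best_window = min(scores, key=scores.get)
print(scores, best_window)
```

A short window tracks the latest observations more closely, which is consistent with the 2-point average scoring best in the report.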

RMSE Values:

The Simple Exponential Smoothing model was run without passing a value for alpha, using
the parameters ‘optimized=True, use_brute=True’.
The auto-fit model picked alpha = 0.0496 as the smoothing parameter.
Simple Exponential Smoothing is meant for a time series with neither trend nor
seasonality, which is not the case with the given data.
The forecasts for manually passed smoothing levels of alpha between 0 and 1 are as below.
For alpha values closer to 1 the forecast follows the actual observations closely; for
values closer to 0 the forecast moves away from the actual values and the line gets smoother.
For Sparkling, the test RMSE is lowest for alpha values close to zero, where the forecast
behaves like the Simple Average forecast.
Among the manually passed values, alpha = 0.025 gives a better RMSE than the
optimized alpha.
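The effect of alpha can be seen in a from-scratch sketch of the smoothing recursion. This is not the library routine used in the report. It starts the level at the first observation, which is an assumption, whereas a library fit may estimate the starting level, so its numbers will differ.

```python
def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing; returns a flat forecast of length `horizon`."""
    level = train[0]  # assumption: start the level at the first observation
    for y in train[1:]:
        # New level = weighted average of the new observation and the old level.
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

# Toy training values (not the report's data).
train = [10.0, 12.0, 14.0, 16.0, 18.0]
for alpha in (0.025, 0.5, 1.0):
    print(alpha, ses_forecast(train, alpha, horizon=3))
```

With alpha = 1 the level is simply the last observation, so the forecast coincides with the naive forecast. As alpha approaches 0 the level barely moves from its starting value and the fitted line becomes very smooth.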

The Double Exponential Smoothing model is applicable when the data has a trend but no
seasonality. The Sparkling data contains a slight trend component and very significant
seasonality.
In the first iteration, the smoothing level (alpha) and trend (beta) were fitted
iteratively over values from 0.1 to 1, and the best combination was chosen based on the
RMSE values: alpha 0.1 and beta 0.1.
In the second iteration the model was allowed to choose the optimized values using the
parameters ‘optimized=True, use_brute=True’.
The auto-fit model returned a higher RMSE than the iterative alpha=0.1 and
beta=0.1 model.
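The iterative search over alpha and beta can be sketched as follows. Holt's linear method is written out by hand with a simple initialisation, which is an assumption and not the library routine used in the report. The data is a toy series.

```python
import math

def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear (double exponential) smoothing with an additive trend."""
    level = train[0]
    trend = train[1] - train[0]  # assumption: simple initial trend estimate
    for y in train[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]

def rmse(actual, predicted):
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values (not the report's data).
train = [10.0, 12.0, 13.0, 15.0, 18.0, 19.0]
test = [21.0, 22.0, 25.0]

grid = [k / 10 for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
results = {
    (a, b): rmse(test, holt_forecast(train, a, b, len(test)))
    for a in grid
    for b in grid
}
best = min(results, key=results.get)
print(best, results[best])
```

Choosing the pair by test RMSE mirrors the selection described above. The optimized fit instead minimises the error on the training data, which may explain why the two approaches can disagree.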

The Triple Exponential Smoothing model (Holt-Winters model) is applicable when the data
has both trend and seasonality. The Sparkling data contains a slight trend and significant
seasonality.
In the first iteration, the smoothing level (alpha), trend (beta) and seasonality (gamma)
were fitted iteratively over values from 0.1 to 1, and the best combination was chosen
based on the RMSE values: alpha 0.4, beta 0.1 and gamma 0.3.
In the second iteration the model was allowed to choose the optimized values using the
parameters ‘optimized=True, use_brute=True’.
The auto-fit model returned a higher RMSE than the iterative alpha=0.4, beta=0.1
and gamma=0.3 model.

Model Comparison:

From the comparison of accuracy values and the plot it can be inferred that Triple
Exponential Smoothing is the best model, with its trend and seasonality
components fitting the test data well.
The 2-point trailing moving average model also fits well, with a slight lag on the test
dataset.

Note: Stationarity should be checked at alpha = 0.05

Solution:
The Augmented Dickey-Fuller (ADF) test is the statistical test used to check the
stationarity of a time series. The test checks for the presence of a unit root in the
series to determine whether the series is stationary.
Null Hypothesis: The series has a unit root, that is, the series is non-stationary.
Alternate Hypothesis: The series has no unit root, that is, the series is stationary.
If we fail to reject the null hypothesis, the series is treated as non-stationary; if we
reject the null hypothesis, the series is treated as stationary.
The ADF test on the original Sparkling series returned the values below; the p-value is
greater than alpha = 0.05, so we fail to reject the null hypothesis.

Differencing of order one was applied to the Sparkling series and tested for
stationarity. At differencing order 1, the series is found to be stationary, as below.
The rolling mean and standard deviation are also plotted to examine the seasonal
component and to ascertain whether it is multiplicative or additive in character.
The level of the rolling mean and standard deviation changes with the slope,
which indicates multiplicative seasonality.
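For reference, the regression behind the ADF test can be written as below, in its general form with a constant and a time trend (either may be dropped depending on the variant used). The notation is introduced here and is not from the report: c is the constant, p the number of lagged differences, and the significance level alpha = 0.05 is a separate quantity from the coefficients.

```latex
\Delta y_t = c + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)} \quad \text{vs.} \quad H_1:\ \gamma < 0 \ \text{(stationary)}
```

A p-value below 0.05 rejects H_0, and the series is then treated as stationary.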

The ADF test is also run in this exercise on the log-transformed train data
with seasonal differencing of order 12, to see whether removing the multiplicative
character of the seasonal component affects the accuracy of the model.
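The transformations mentioned above can be sketched with `numpy` on a synthetic series. The data is made up to have a linear trend and a multiplicative December lift, and is not the report's data.

```python
import numpy as np

# Synthetic monthly series with a linear trend and multiplicative seasonality
# (36 months; not the report's data).
months = np.arange(36)
seasonal = 1.0 + 0.5 * (months % 12 == 11)  # a 50% lift every December
series = (100.0 + 2.0 * months) * seasonal

log_series = np.log(series)  # turns multiplicative effects into additive ones

# Seasonal differencing at lag 12: y_t - y_(t-12).
seasonal_diff = log_series[12:] - log_series[:-12]

# First-order differencing of the result: z_t - z_(t-1).
stationary_candidate = np.diff(seasonal_diff, n=1)

print(len(series), len(seasonal_diff), len(stationary_candidate))
```

Each differencing step shortens the series, by 12 observations for the seasonal difference and by one more for the first difference. A stationarity test would then be applied to the differenced values.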

Solution:

An ARIMA model was built by automated search over candidate orders; the lowest AIC,
2210.62, was found at (2, 1, 2).
As the Sparkling series contains a seasonal component, the ARIMA model does not
perform well. The RMSE value for this Auto-ARIMA model is 1375.

The SARIMA model was built on the train data with a seasonal period of 12, searching over
different (p, d, q)x(P, D, Q) orders; the lowest AIC, 1382.35, was obtained at
(1, 1, 2)x(0, 1, 2, 12).
The model was built with the above parameters.

The model was also built on the log-transformed train data with a seasonal period of 12,
searching over different (p, d, q)x(P, D, Q) orders; the lowest AIC, 284.48, was
obtained at (0, 1, 1)x(1, 0, 1, 12).
The model was built with the above parameters.

The diagnostics plot of the model shows that the standardized residuals
have a mean of zero, and the histogram shows that the residuals follow a normal
distribution.
The Normal Q-Q plot also shows that the quantiles come from a normal distribution, as the
points form roughly a straight line.
The correlogram shows the autocorrelation of the residuals; there are no significant
lags outside the confidence band.
From the above model summary it can be inferred that the MA.L1, AR.L.S12 and MA.L.S12
terms have the largest absolute coefficients.
From the p-values it can be inferred that the MA.L1, AR.L.S12 and MA.L.S12 terms are
significant, as their p-values are below 0.05.
The RMSE value of the automated SARIMA model on the log series is 336.58.

The model built on the log-transformed series has a lower RMSE than the model built on
the original train data.

Solution:

Here, we have taken alpha=0.05.

The auto-regressive parameter 'p' in an ARIMA model is taken from the last significant
lag before the PACF plot cuts off to 0.
The moving-average parameter 'q' in an ARIMA model is taken from the last significant
lag before the ACF plot cuts off to 0.
From the plots above, both the PACF and the ACF cut off at lag 0.
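To make the idea of a "significant lag" concrete, here is a minimal sketch of the sample autocorrelation and the approximate 95% band (matching alpha = 0.05) that ACF plots usually draw. The function is written by hand for illustration and the input is random noise, not the report's series. The PACF is not covered.

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    n = len(x)
    return np.array([np.sum(x[k:] * x[:n - k]) / denom for k in range(max_lag + 1)])

# White noise as a stand-in series (not the report's data).
rng = np.random.default_rng(0)
noise = rng.normal(size=200)

values = acf(noise, max_lag=12)
bound = 1.96 / np.sqrt(len(noise))  # approximate 95% band around zero

# Lags whose autocorrelation falls outside the band count as significant.
significant = [k for k in range(1, 13) if abs(values[k]) > bound]
print(bound, significant)
```

Lag 0 is always 1 by construction, so only lags from 1 upward are compared against the band.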

The RMSE value of the manual ARIMA model is 4780. Since the ARIMA model does not capture
the seasonality, it does not perform well.

From the ACF plot of the observed train data, it can be inferred that at the seasonal
interval of 12 the plot does not taper off quickly, so a seasonal differencing of order
12 has to be applied.

From the plots above, a slight trend still remains after seasonal differencing
of order 12. With a further differencing of order one, no trend is present.

An ADF test is needed to check the stationarity after the above differencing. With a
p-value below alpha = 0.05 and a test statistic below the critical values, it can be
confirmed that the data is stationary.

ACF and PACF plots of the seasonally differenced and first-differenced data are created
to find the values for (p,d,q)x(P,D,Q).

Here we have taken alpha = 0.05 and seasonal period as 12.

From the PACF plot it can be seen that the lags are significant up to the 3rd lag before
the cut-off, so the AR term ‘p = 3’ is chosen. At the seasonal lag of 12 the plot almost
cuts off, so the seasonal AR term is ‘P = 1’.
From the ACF plot it can be seen that lag 1 is significant before the cut-off, so the MA
term ‘q = 1’ is selected. At the seasonal lag of 12 a significant lag is apparent, so the
seasonal MA term was initially kept at ‘Q = 1’.
The seasonal MA term ‘Q’ was later optimized to 2 by validating model performance, as
the data might be under-differenced.
The final selected terms for the SARIMA model are (3, 1, 1)x(1, 1, 2, 12).
The diagnostics plot of the model shows that the standardized residuals
have a mean of zero, and the histogram shows that the residuals follow a normal
distribution.
The Normal Q-Q plot also shows that the quantiles come from a normal distribution, as the
points form roughly a straight line.
The correlogram shows the autocorrelation of the residuals; there are no significant
lags outside the confidence band.
The RMSE value of the manual SARIMA model is 324.10.

Solution:

Manual SARIMA (3,1,1)x(1,1,2,12) is found to be the best model, followed by the
Auto_SARIMA model.

Solution:
Based on the overall model evaluation and comparison, Manual SARIMA is selected for the
final prediction 12 months into the future.
The Manual SARIMA model with optimal parameters (3,1,1)x(1,1,2,12) is found to be the
best model in terms of accuracy scored against the full data.
The model predicts an upward trend and a continuation of the seasonal surge in sales in
the upcoming 12 months. According to the model, the seasonal sales will be higher than
those of the previous year.

Please explain and summarise the various steps performed in this project. There should be
proper business interpretation and actionable insights present.

Solution:
The model forecasts sales of 29535 units of Sparkling wine over the next 12 months,
an average of about 2461 units per month.

The seasonal sales will peak at 6136 units in December 1995 before dropping to
the lowest sale, 1246 units, in January 1996.
The wine company is recommended to ramp up its procurement and production line in
accordance with the above forecasts for the fourth quarter of 1995 (October, November and
December), when a total of 13,370 units of Sparkling wine is expected to be sold.
The forecast also indicates that the year-on-year sale of Sparkling wine is not showing an
upward trend. The winery should adopt innovative marketing strategies to improve sales
compared with previous years.
Problem Statement – TSF – Rose Dataset

Solution:
The required packages were loaded and the monthly Rose wine sales dataset was first read
without converting the dates to pandas' date-time format.

The Rose dataset contains two columns of data:

the monthly time stamp, from January 1980 to July 1995, and the corresponding wine
sales.
Method-1:
Create the time stamps and add them to the data frame to turn it into time-series data.

Add the time stamps to the original data frame, set them as the index, and drop the
YearMonth column from the dataset.

Method-2:
An alternative way to read the original data as a time series is to use pandas'
parsing options. [parse_dates=True, squeeze=True, index_col=0]

All values are loaded properly, with the index in pandas' date-time
format. The Rose time series has values in the float64 data type.
The Rose time series contains 2 missing values, for the time stamps '1994-07-01'
and '1994-08-01'.
The null values are imputed by interpolation [polynomial of order 2].
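The imputation step can be sketched with `Series.interpolate`. The values below are toy numbers, not the Rose data. The gaps are placed at July and August 1994 as in the text. Note that `method="polynomial"` relies on SciPy being installed.

```python
import numpy as np
import pandas as pd

# Toy series with July and August 1994 missing (values are illustrative only).
index = pd.date_range("1994-03-01", periods=8, freq="MS")
values = [0.0, 1.0, 4.0, 9.0, np.nan, np.nan, 36.0, 49.0]
rose = pd.Series(values, index=index, name="Rose")

print(int(rose.isna().sum()))  # two missing months before imputation

# Fill the gaps with a second-order polynomial interpolation (needs SciPy).
filled = rose.interpolate(method="polynomial", order=2)
print(filled)
```

Only the missing positions are filled; the observed values are left untouched.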

Plot the Rose Time Series to understand the behaviour of the data:

The Rose wine dataset shows significant seasonality and a decreasing trend,
with multiplicative seasonality present.
Rose has fallen out of favour with customers over the years.

Solution:

Check the basic measures of descriptive statistics:

The mean of the time series is nearly the same as the median. For time
series data this may signify the presence of a decreasing trend and multiplicative
seasonality.
The descriptive summary shows that on average 90 units of Rose
wine were sold each month over the period. The middle 50% of monthly sales
ranged from 63 to 112 units. The maximum sale reported in a month is 267 units
and the minimum is 28 units.
The basic descriptive statistics show how sales varied over the whole period. They
are computed over all observations, however, without taking the time component into
account.

Yearly Boxplot:

The yearly boxplot shows the median sale of Rose wine following
the downward trend in sales over the years. The outliers above the upper bound in the
yearly boxplot most probably represent the higher sales during the seasonal
months.
The monthly boxplot shows clear seasonality during the seasonal months of
November and December. Although sales are lowest in January, they pick
up over the course of the year.
The average sale is around 140 units in December, around 110 units in November and
around 90 units in October.

The monthly plot for Rose shows the mean and variation of units sold in each month over
the years. Sales in months such as July, August, September and December show
higher variation than the rest.
December sales, with a mean a little below 100 units, vary from 75 to 270 units
over the years, whereas the average sale is below or close to 100 units
(above 50) for the rest of the year.

The plot of monthly sales over the years also shows the seasonal component of
the time series, with November and December selling markedly higher
volumes than other months.
The highest volume of Rose wine was sold in December 1980 and the lowest
December sale was in 1993. Though December sales picked up after 1983, they
dipped consistently after 1987.

Decompose the Time Series and plot the different components:

The observed plot of the decomposition shows visible annual seasonality
and a downward trend. The early part of the plot shows higher variation than
the later periods.
The trend plot shows an overall downward trend. Sharp dips can be seen
between 1981 and 1983 and later from 1991 to 1993.
The seasonal component is clearly visible and consistent in both the observed and
seasonal plots. The multiplicative model shows a seasonal variation
of about 16%.
The residuals show high variability across the whole period,
which is more or less consistent.
The residuals vary more in the early part of the series,
which explains the higher variation in the observed plot over the same period.
As the seasonal peaks shrink in step with the
trend, the series can be treated as multiplicative in model building.

Solution:

The train and test datasets are created with 1991 as the starting year of the test data.

Plot of the Rose time series split into train and test:

Note: Please do try to build as many models as possible and as many iterations of models as
possible with different parameters.
Solution:

To regress Rose wine sales on time, a numerical time index was generated for both the
training and test sets and added to the respective datasets.

The linear regression on the Rose dataset shows a clear downward trend, consistent
with the observed time series.
For the Regression on Time forecast on the test data, RMSE is 15.278.
The model has successfully captured the trend of the series, but does not reflect the
seasonality.

In the naive model, the prediction for tomorrow is the same as today's value, and the
prediction for the day after tomorrow is tomorrow's; since tomorrow's prediction
is today's value, the prediction for the day after tomorrow is also today's value.
The model therefore takes the last observed value of the training set and uses that
same value as the forecast for every point in the test set.
For the Naive forecast on the test data, RMSE is 79.75.
The model does not capture the trend or seasonality of the dataset.

In the Simple Average model, the forecast is done using the mean of the time-series
variable from the training set.

The model cannot capture the trend or the
seasonality present in the dataset.
For the Simple Average forecast on the test data, RMSE is 53.48.

For the moving average model, rolling means (trailing moving averages) are calculated
for different intervals. The best interval is the one with the highest accuracy, that
is, the lowest test RMSE.
The moving average models are built for trailing 2 points, 4 points, 6 points and 9
points.
For the Rose dataset the accuracy is higher with the shorter rolling
windows.
In moving average forecasts the fitted values only become available once n points
have been observed, so they lag the actual series.
The best interval of moving average is 2 points.

The Simple Exponential Smoothing model was run without passing a value for alpha, using
the parameters ‘optimized=True, use_brute=True’.
The auto-fit model picked alpha = 0.0987 as the smoothing parameter.
Simple Exponential Smoothing is meant for a time series with neither trend nor
seasonality, which is not the case with the given data.
The forecasts for manually passed smoothing levels of alpha between 0 and 1 are as below.
For alpha values closer to 1 the forecast follows the actual observations closely; for
values closer to 0 the forecast moves away from the actual values and the line gets smoother.
For Rose, the test RMSE is found to be higher for alpha values closer to zero, where the
forecast behaves like the Simple Average forecast.
The manual alpha = 0.10 and the optimized alpha give similar RMSE values.

The Double Exponential Smoothing model is applicable when the data has a trend but no
seasonality. The Rose data contains a significant trend component and seasonality.
In the first iteration, the smoothing level (alpha) and trend (beta) were fitted
iteratively over values from 0.1 to 1, and the best combination was chosen based on the
RMSE values: alpha 0.1 and beta 0.1.
In the second iteration the model was allowed to choose the optimized values using the
parameters ‘optimized=True, use_brute=True’.
The auto-fit model has a lower RMSE than the iterative alpha=0.1 and beta=0.1
model.

The Triple Exponential Smoothing model (Holt-Winters model) is applicable when the data
has both trend and seasonality. The Rose data contains a significant trend and seasonality.
In the first iteration, the smoothing level (alpha), trend (beta) and seasonality (gamma)
were fitted iteratively over values from 0.1 to 1, and the best combination was chosen
based on the RMSE values: alpha 0.1, beta 0.2 and gamma 0.3.
In the second iteration the model was allowed to choose the optimized values using the
parameters ‘optimized=True, use_brute=True’.
The auto-fit model returned a higher RMSE than the iterative alpha=0.1, beta=0.2
and gamma=0.3 model.

Model Comparison:

Note: Stationarity should be checked at alpha = 0.05

Solution:
The Augmented Dickey-Fuller (ADF) test is the statistical test used to check the
stationarity of a time series. The test checks for the presence of a unit root in the
series to determine whether the series is stationary.
Null Hypothesis: The series has a unit root, that is, the series is non-stationary.
Alternate Hypothesis: The series has no unit root, that is, the series is stationary.
If we fail to reject the null hypothesis, the series is treated as non-stationary; if we
reject the null hypothesis, the series is treated as stationary.
The ADF test on the original Rose series returned the values below; the p-value is
greater than alpha = 0.05, so we fail to reject the null hypothesis.

Differencing of order one was applied to the Rose series and tested for stationarity.
At differencing order 1, the series is found to be stationary, as below.
The rolling mean and standard deviation are also plotted to examine the seasonal
component and to ascertain whether it is multiplicative or additive in character.
The level of the rolling mean and standard deviation changes with the slope,
which indicates multiplicative seasonality.
The ADF test is also run in this exercise on the log-transformed train data
with seasonal differencing of order 12, to see whether removing the multiplicative
character of the seasonal component affects the accuracy of the model.

Solution:

An ARIMA model was built by automated search over candidate orders; the lowest AIC,
1276, was found at (0, 1, 2).
As the Rose series contains a seasonal component, the ARIMA model does not perform
well. The RMSE value for this Auto-ARIMA model is 15.63.

The SARIMA model was built on the train data with a seasonal period of 12, searching over
different (p, d, q)x(P, D, Q) orders; the lowest AIC, 774.97, was obtained at
(0, 1, 2)x(2, 1, 2, 12).
The model was built with the above parameters.

The model was also built on the log-transformed train data with a seasonal period of 12,
searching over different (p, d, q)x(P, D, Q) orders; the lowest AIC, -247.08, was
obtained at (0, 1, 1)x(1, 0, 1, 12).
The model was built with the above parameters.

The model built on the log-transformed series has a higher RMSE than the model built on
the original train data.

Solution:

Here, we have taken alpha=0.05.

The RMSE value of the manual ARIMA model is 84.16. Since the ARIMA model does not capture
the seasonality, it does not perform well.

ACF and PACF plots of the seasonally differenced and first-differenced data are created
to find the values for (p,d,q)x(P,D,Q).

Here we have taken alpha = 0.05 and seasonal period as 12.

From the PACF plot it can be seen that the lags are significant up to the 4th lag before
the cut-off, so the AR term ‘p = 4’ is chosen. At the seasonal lag of 12, the seasonal AR
term is ‘P = 0’.
From the ACF plot it can be seen that the lags are significant up to the 2nd lag before
the cut-off, so the MA term ‘q = 2’ is selected. At the seasonal lag of 12 a significant
lag is apparent, so the seasonal MA term was initially kept at ‘Q = 1’.
The seasonal MA term ‘Q’ was later optimized to 2 by validating model performance, as
the data might be under-differenced.
The final selected terms for the SARIMA model are (4, 1, 2)x(0, 1, 2, 12).
The diagnostics plot of the model shows that the standardized residuals
have a mean of zero, and the histogram shows that the residuals follow a normal
distribution.
The Normal Q-Q plot also shows that the quantiles come from a normal distribution, as the
points form roughly a straight line.
The correlogram shows the autocorrelation of the residuals; there are no significant
lags outside the confidence band.
The RMSE value of the manual SARIMA model is 15.38.

Solution:

Triple Exponential Smoothing (Holt Winter’s) with alpha: 0.1, beta: 0.2 and gamma: 0.3 is
found to be the best model, followed by 2-point trailing moving average model.

Solution:
Based on the overall model evaluation and comparison, Triple Exponential Smoothing
(Holt-Winters) is selected for the final prediction 12 months into the future.
The TES model with alpha: 0.1, beta: 0.2 and gamma: 0.3, trend: ‘additive’ and seasonal:
‘multiplicative’, is found to be the best model in terms of accuracy scored against the
full data.
The model predicts a continuation of the trend in sales and of the seasonality in
year-end sales.
The prediction shows the downward trend stabilizing, as sales will be almost the same
as in the previous observed year.
The RMSE value of TES obtained for the entire dataset is 17.88.
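The selected model type, additive trend with multiplicative seasonality, can be sketched from scratch as below. This is an illustration of the recursion and not the library fit used in the report. The initialisation from the first two seasons is an assumption, and the example runs on a toy series with a period of 4 rather than the monthly period of 12.

```python
def holt_winters_forecast(train, alpha, beta, gamma, period, horizon):
    """Holt-Winters smoothing: additive trend, multiplicative seasonality.

    Needs at least two full seasons of data for the initialisation below.
    """
    first = train[:period]
    second = train[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) / period - level) / period
    seasonals = [y / level for y in first]  # one index per position in the cycle

    for t in range(period, len(train)):
        y = train[t]
        pos = t % period
        expected = level + trend  # deseasonalised one-step-ahead value
        new_level = alpha * (y / seasonals[pos]) + (1 - alpha) * expected
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonals[pos] = gamma * (y / expected) + (1 - gamma) * seasonals[pos]
        level = new_level

    n = len(train)
    return [
        (level + h * trend) * seasonals[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]

# Toy series: a repeating four-step pattern with no trend (not the report's data).
train = [10.0, 20.0, 30.0, 40.0] * 3
forecast = holt_winters_forecast(train, alpha=0.1, beta=0.2, gamma=0.3, period=4, horizon=4)
print(forecast)
```

For the monthly wine data the same function would be called with `period=12` and `horizon=12`. The smoothing parameters in the example are the ones reported above.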

Please explain and summarise the various steps performed in this project. There should be
proper business interpretation and actionable insights present.

Solution:

The model forecasts sales of 585 units of Rose wine over the next 12 months, an
average of about 49 units per month.
The seasonal sales will peak at 82 units in December 1995 before dropping to
the lowest sale, 30 units, in January 1996.
Unlike Sparkling wine, Rose wine sells a very low number of units and the standard
deviation is only 12.75, which means that the higher seasonal demand has little impact
on procurement and production.
ABC Estate Wine should investigate the low demand for Rose wine in the market and take
corrective action in marketing and promotions.
