Rajendra Ladda Time Series Forecasting Project Report

The document provides a report on time series forecasting projects for rose wine and sparkling wine sales data. It includes: 1) Data exploration including missing value imputation, plots, and decomposition. 2) Model building and evaluation on training and test data including regression, naive forecasting, smoothing methods. 3) Checking and correcting for non-stationarity. 4) Building an automated ARIMA/SARIMA model selection using AIC on training data and evaluating it on test data.


Rajendra Ladda

Time Series Forecasting Project Report


05-Nov-2023

1. Read the data as an appropriate Time Series data and plot the data.
The Rose wine time series data has 2 missing values, which I imputed by
interpolation. Following is the time series plot after missing value imputation.

The Sparkling wine time series data has no missing values. Following is the time series plot.
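The imputation step can be illustrated with a minimal sketch of linear interpolation. This assumes the interpolation was linear (the report does not say which variant was used), and the sample values are hypothetical, not the actual Rose data. In pandas the equivalent would typically be Series.interpolate(method="linear").

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps with a straight line between the known neighbours.

    Leading or trailing gaps have only one neighbour and are left untouched.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    # Walk over each pair of consecutive observed points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Hypothetical monthly sales with two missing months (not the real Rose values).
sales = [45.0, 52.0, None, None, 46.0, 51.0]
print(interpolate_linear(sales))  # [45.0, 52.0, 50.0, 48.0, 46.0, 51.0]
```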
2. Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.

Null Value Check:


I checked both datasets for null values. The Rose wine data had 2 null values, which I imputed as
stated above. After imputation, there are no null values in the Rose wine data.

There are no null values in the Sparkling wine data.

Duplicate Value Check:


There are no duplicate values in either the Rose or the Sparkling wine dataset.

Data Description:

ROSE WINE
           count       mean        std     min     25%     50%     75%     max
Rose       187.0  89.914439  39.238325    28.0    62.5    85.0   111.0   267.0

SPARKLING WINE
           count         mean         std     min     25%     50%     75%     max
Sparkling  187.0  2402.417112  1295.11154  1070.0  1605.0  1874.0  2549.0  7242.0

Month Wise Box Plots

Decomposition

Additive – Rose Wine


Multiplicative – Rose Wine

Additive – Sparkling Wine


Multiplicative – Sparkling Wine

From the decompositions above, both wine time series are clearly multiplicative in nature, and both
have a seasonal component.

Rose sales show a downward trend, while Sparkling sales show a slightly upward trend. Sales in both
series fluctuate considerably and clearly show seasonality.
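To make the additive versus multiplicative distinction concrete, here is a rough sketch of a classical decomposition for monthly data. It is not the routine used to produce the plots above. It assumes a period of 12, a series that starts at month position 0, and a trend estimated with a 2x12 centered moving average. The function names are hypothetical.

```python
from typing import List


def centered_ma_12(y: List[float]) -> List[float]:
    """2x12 centered moving average; NaN where the window does not fit."""
    n = len(y)
    trend = [float("nan")] * n
    for t in range(6, n - 6):
        window = y[t - 6 : t + 7]  # 13 points; the two end points get half weight
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / 12.0
    return trend


def seasonal_indices(y: List[float], multiplicative: bool = True) -> List[float]:
    """Average the detrended values per month position and normalise them."""
    if len(y) < 24:
        raise ValueError("need at least two full years of monthly data")
    trend = centered_ma_12(y)
    buckets: List[List[float]] = [[] for _ in range(12)]
    for t in range(6, len(y) - 6):
        # Multiplicative model: y = trend * seasonal * residual.
        # Additive model:       y = trend + seasonal + residual.
        detrended = y[t] / trend[t] if multiplicative else y[t] - trend[t]
        buckets[t % 12].append(detrended)
    raw = [sum(b) / len(b) for b in buckets]
    centre = sum(raw) / 12.0
    # Multiplicative indices average to 1; additive effects average to 0.
    return [r / centre for r in raw] if multiplicative else [r - centre for r in raw]


# Synthetic series (not the wine data): constant level 100 times a repeating pattern.
pattern = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.7, 0.9, 1.1, 1.3, 1.2, 0.8]
series = [100.0 * pattern[t % 12] for t in range(48)]
print([round(s, 3) for s in seasonal_indices(series, multiplicative=True)])
```

In a multiplicative series the seasonal swing scales with the level of the series, so the indices are ratios around 1. In an additive series the swing stays roughly constant, so the effects are offsets around 0.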

3. Split the data into training and test. The test data should start in 1991.

After splitting both the time series datasets, following are the data sizes.

Rose Wine Data: Training Data 132 rows, Test Data 55 rows

Sparkling Wine Data: Training Data 132 rows, Test Data 55 rows
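The split sizes can be checked with a short sketch. It assumes a monthly series of 187 observations starting in January 1980. That start date is inferred from the 132-row training set ending before 1991 and is not stated in the report. The helper names and placeholder values are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def month_range(start: date, periods: int) -> List[date]:
    """First-of-month dates for `periods` consecutive months."""
    months = []
    year, month = start.year, start.month
    for _ in range(periods):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def split_at_year(
    index: List[date], values: List[float], test_start_year: int
) -> Tuple[List[Tuple[date, float]], List[Tuple[date, float]]]:
    """Observations before `test_start_year` go to training, the rest to test."""
    pairs = list(zip(index, values))
    train = [(d, v) for d, v in pairs if d.year < test_start_year]
    test = [(d, v) for d, v in pairs if d.year >= test_start_year]
    return train, test


index = month_range(date(1980, 1, 1), 187)  # assumed start date
values = [0.0] * 187                        # placeholder values, not real sales
train, test = split_at_year(index, values, 1991)
print(len(train), len(test))  # 132 55
```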

Following are the plots of training and test data in both Rose and Sparkling wine data.

Rose Wine Training and Test Data Plot


Sparkling Wine Training and Test Data Plot

4. Build all the exponential smoothing models on the training data and evaluate the
models using RMSE on the test data. Other models such as regression, naïve
forecast models, and simple average models should also be built on the training
data, and their performance checked on the test data using RMSE.
Following are the RMSE comparisons of all the different methods used, followed by the charts
with predictions using all the different methods.

Model                                                  Test RMSE Rose   Test RMSE Sparkling

RegressionOnTime                                            15.268955           1389.135175
NaiveModel                                                  79.718773           3864.279352
SimpleAverageModel                                          53.460570           1275.081804
2pointTrailingMovingAverage                                 11.529278            813.400684
4pointTrailingMovingAverage                                 14.451403           1156.589694
6pointTrailingMovingAverage                                 14.566327           1283.927428
9pointTrailingMovingAverage                                 14.727630           1346.278315
Simple Exponential Smoothing                                36.796241           1338.008384
Double Exponential Smoothing                                15.268944           5291.879833
Triple Exponential Smoothing (Additive Season)              14.249661            378.951023
Triple Exponential Smoothing (Multiplicative Season)        20.156763            404.286809
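For reference, the RMSE metric and the simplest baselines in the comparison can be sketched as follows. This is an illustrative sketch, not the code behind the table. The function names and toy numbers are hypothetical. The trailing moving average is assumed to be the mean of each observation and the window - 1 observations before it.

```python
import math
from typing import List, Optional, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def naive_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last training observation for every future step."""
    return [train[-1]] * horizon


def simple_average_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Repeat the mean of the training data for every future step."""
    return [sum(train) / len(train)] * horizon


def trailing_moving_average(series: Sequence[float], window: int) -> List[Optional[float]]:
    """Rolling mean over the last `window` points; None until the window is full."""
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i - window + 1 : i + 1]) / window)
    return out


# Toy numbers, not the wine data.
train, test = [10.0, 12.0, 14.0], [13.0, 15.0]
print(rmse(test, naive_forecast(train, len(test))))  # 1.0
print(rmse(test, simple_average_forecast(train, len(test))))
print(trailing_moving_average(train + test, 2))
```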


The best model for the Rose data is the 2-point trailing moving average model.

The best model for the Sparkling data is the Triple Exponential Smoothing model (Additive Season).
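The exponential smoothing models compared above all build on the same level-update recursion. Below is a minimal sketch of the simplest member, simple exponential smoothing. It assumes the level is initialised with the first observation and that alpha is supplied by hand, not optimised. Libraries such as statsmodels provide SimpleExpSmoothing, Holt and ExponentialSmoothing in statsmodels.tsa.holtwinters, though the report does not state which implementation was used.

```python
from typing import List, Sequence


def simple_exponential_smoothing(
    train: Sequence[float], alpha: float, horizon: int
) -> List[float]:
    """Flat forecast from the final smoothed level.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = train[0]  # assumption: start the level at the first observation
    for y in train[1:]:
        level = alpha * y + (1.0 - alpha) * level
    # With no trend or seasonal term, every future step gets the same value.
    return [level] * horizon


# Toy numbers, not the wine data.
print(simple_exponential_smoothing([10.0, 20.0], alpha=0.5, horizon=2))  # [15.0, 15.0]
```

Double and triple exponential smoothing add analogous recursions for a trend term and a seasonal term.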

5. Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical
test. If the data is found to be non-stationary, take appropriate steps to make it
stationary. Check the new data for stationarity and comment. Note: Stationarity
should be checked at alpha = 0.05.

Rose Wine Data


DF test statistic is -2.240
DF test p-value is 0.46713716277931555
p > 0.05, so we fail to reject the null hypothesis that the series is non-stationary. Hence the data is not stationary.

After taking a first-order difference, the DF statistics are as follows:


DF test statistic is -8.162
DF test p-value is 3.0159761158274063e-11
Number of lags used 12
p < 0.05

Hence we conclude that the Rose wine data becomes stationary after first-order differencing.

Sparkling Wine Data
DF test statistic is -1.798
DF test p-value is 0.7055958459932068
p > 0.05, so we fail to reject the null hypothesis that the series is non-stationary. Hence the data is not stationary.

After taking a first-order difference, the DF statistics are as follows:


DF test statistic is -44.912
DF test p-value is 0.0
p < 0.05, hence the Sparkling wine data becomes stationary after first-order differencing.
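The differencing step and the decision rule at alpha = 0.05 can be sketched as follows. The Dickey-Fuller test itself is not reimplemented here. A routine such as adfuller from statsmodels.tsa.stattools would typically supply the p-value, although the report does not name its tool. The helper names are hypothetical, and the p-values are the ones reported above for the Rose data.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Differenced series: y_t - y_{t-lag}. The result is `lag` points shorter."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def is_stationary(p_value: float, alpha: float = 0.05) -> bool:
    """Decision rule for a unit-root test.

    H0: the series has a unit root (non-stationary).
    H1: the series is stationary.
    Reject H0 when the p-value is below alpha.
    """
    return p_value < alpha


print(difference([3.0, 5.0, 8.0, 12.0]))      # [2.0, 3.0, 4.0]
print(is_stationary(0.46713716277931555))     # False: original Rose series
print(is_stationary(3.0159761158274063e-11))  # True: after first differencing
```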

6. Build an automated version of the ARIMA/SARIMA model in which the parameters
are selected using the lowest Akaike Information Criteria (AIC) on the training data
and evaluate this model on the test data using RMSE.
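The selection loop described in this task can be sketched generically: try every (p, d, q) combination, record the AIC of each fit, and keep the lowest. In the sketch below, fit_aic is a hypothetical callable standing in for the model fit. In practice it might wrap something like statsmodels' ARIMA(train, order=order).fit().aic. The toy AIC function is invented purely to make the example runnable.

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Tuple

Order = Tuple[int, int, int]


def select_order_by_aic(
    fit_aic: Callable[[Order], float],
    p_values: Iterable[int],
    d_values: Iterable[int],
    q_values: Iterable[int],
) -> Tuple[Order, float]:
    """Return the (p, d, q) order with the lowest AIC and that AIC."""
    best_order: Optional[Order] = None
    best_aic = math.inf
    for order in itertools.product(p_values, d_values, q_values):
        try:
            aic = fit_aic(order)
        except Exception:
            continue  # some orders fail to fit; skip them in this sketch
        if aic < best_aic:
            best_order, best_aic = order, aic
    if best_order is None:
        raise RuntimeError("no candidate order could be fitted")
    return best_order, best_aic


def toy_aic(order: Order) -> float:
    """Invented stand-in for a model fit; its minimum is at p=2, q=1."""
    p, _, q = order
    return 100.0 + (p - 2) ** 2 + (q - 1) ** 2


# d is fixed at 1 here, matching the first-order differencing found in section 5.
print(select_order_by_aic(toy_aic, range(4), [1], range(4)))  # ((2, 1, 1), 100.0)
```

The selected order would then be refit on the training data and scored on the test data with RMSE. A SARIMA search works the same way with the seasonal order added to the grid.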
