2015 IEEE International Conference on Big Data (Big Data)

Forecast UPC-Level FMCG Demand, Part II: Hierarchical Reconciliation

Dazhi Yang*, Gary S. W. Goh, Siwei Jiang and Allan N. Zhang
Singapore Institute of Manufacturing Technology (SIMTech)
Agency for Science, Technology and Research (A*STAR)
Singapore, Singapore
Email: *[email protected]

Orkan Akcan
Antuit
Singapore, Singapore
[email protected]

Abstract: In a big data enabled environment, manufacturers and distributors may have access to previously unobserved retailer-level demand related information. This additional information can be considered in demand forecasting to produce more accurate forecasts, and thus enable better stock-out management. In Part II of this two-part paper, we explore the hierarchical nature of fast moving consumer goods (FMCG) demand (represented by sales) time series and produce one-week-ahead rolling forecasts on universal product code (UPC) level (or distributor level, as per our definition below). We show that the hierarchical forecasting framework gives a significant accuracy improvement over the conventional univariate forecasting methods. The observed improvements are mainly due to the price and promotion information available at the retailer level, which is assumed to be unknown to the distributor. To reconcile forecasts according to the hierarchy, only the forecast values at retailer level are needed, so the business strategies of individual retailers remain proprietary. A freely available dataset is considered to encourage further exploration. Data exploratory analysis and visualization tools are discussed in Part I of the paper.

Keywords: FMCG; forecasting; hierarchical reconciliation; visualization

I. INTRODUCTION

Demand forecasting at various horizons is essential to the sales and operations planning (S&OP) process. Good forecasts provide strong decision support for operation management tasks such as capacity planning, inventory management and planning & scheduling [1]. We are interested in forecasting the demand of fast moving consumer goods (FMCG) on universal product code (UPC) level in this paper.

In a big data enabled environment, manufacturers and distributors may have access to previously unobserved retailer-level demand related information. This additional information enables potential integration of the hierarchical demand. The hierarchy should be considered in demand forecasting in order to produce more accurate forecasts at various levels of the hierarchy. Before we introduce hierarchical forecasting, we review the literature on demand forecasting for a single product in a single market.

A. Literature Review

Generally speaking, the demand for a product is governed by economic theory. A typical retailer would collect panel data over several dimensions including items, stores, markets and categories. In addition, product characteristics and demographics are also frequently recorded and updated. Theoretically, when individual consumer characteristics are matched to the goods they purchased, the complex and evolving relationships between consumer behavior and demand can be studied.

Econometrics theory has potential in integrating the above-mentioned data for demand prediction¹. One such example is the random-coefficients logit model [2]. It considers the utility $u_{ijt}$ of an individual consumer $i$ from a product $j$ in a market $t$:

$$u_{ijt} = x_{jt}\beta_i - \alpha_i p_{jt} + \xi_{jt} + \varepsilon_{ijt} \tag{1}$$

where $x$ and $\xi$ are observed and unobserved product characteristics, $p$ is the price of the product, $\varepsilon$ is a zero mean stochastic term and $\alpha$ and $\beta$ are interpretable model coefficients. It can be seen from Eq. (1) that prices of different products in different markets, observed and unobserved product characteristics and demographics are used to model $u_{ijt}$. The demand can thus be estimated by integrating over the probability mass of the consumers who choose brand $j$ in market $t$ (see Ref. [2] for details). However, such econometrics models are mostly used for market share prediction. Furthermore, if we consider a particular market at different time $t$, Eq. (1) suggests that in order to forecast future utility, the future values of the predictors are needed. Owing to the theoretical nature of the method and the complexity of the formulation, work along this track is still ongoing; manufacturers, distributors and retailers often rely on simple but effective demand forecasting methods in their day-to-day operations.

¹ Prediction is more general than forecasting.

978-1-4799-9926-2/15/$31.00 ©2015 IEEE 2113
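To make the step from Eq. (1) to demand more concrete, the following is a minimal sketch of how market shares could be simulated from the utility specification. It is not the estimation procedure of Ref. [2]. It assumes, for illustration only, that the tastes $\beta_i$ and $\alpha_i$ are independent normal draws, that $\varepsilon_{ijt}$ is i.i.d. type-I extreme value (so choice probabilities take the logit form), and that an outside good has utility normalized to zero. The function name `simulated_market_shares` and all numeric inputs are hypothetical.

```python
import numpy as np


def simulated_market_shares(x, p, xi, beta_mean, beta_sd, alpha_mean, alpha_sd,
                            n_consumers=5000, seed=0):
    """Monte Carlo market shares implied by Eq. (1) for a single market.

    x  : (J, K) observed product characteristics
    p  : (J,)   prices
    xi : (J,)   unobserved product characteristics
    Tastes beta_i and alpha_i are drawn from independent normals; this is an
    illustrative assumption, not something specified in the paper.
    """
    rng = np.random.default_rng(seed)
    n_products, n_chars = x.shape
    beta = rng.normal(beta_mean, beta_sd, size=(n_consumers, n_chars))  # beta_i
    alpha = rng.normal(alpha_mean, alpha_sd, size=(n_consumers, 1))     # alpha_i
    # Deterministic part of u_ijt: x_jt * beta_i - alpha_i * p_jt + xi_jt
    v = beta @ x.T - alpha * p[None, :] + xi[None, :]                   # (n, J)
    # Logit choice probabilities with an outside good of utility 0.
    # Subtracting a per-consumer shift keeps the exponentials stable.
    shift = np.maximum(v.max(axis=1, keepdims=True), 0.0)
    exp_v = np.exp(v - shift)
    exp_outside = np.exp(-shift)
    probs = exp_v / (exp_outside + exp_v.sum(axis=1, keepdims=True))
    # Averaging over simulated consumers approximates the integral
    # over the consumer population.
    return probs.mean(axis=0)


# Hypothetical inputs: three products, two characteristics each.
x = np.array([[1.0, 0.5], [1.0, 0.8], [1.0, 0.2]])
p = np.array([2.0, 2.5, 1.5])
xi = np.zeros(3)
shares = simulated_market_shares(x, p, xi,
                                 beta_mean=np.array([1.0, 0.5]), beta_sd=0.3,
                                 alpha_mean=1.0, alpha_sd=0.2)
print(shares, 1.0 - shares.sum())  # inside-good shares and outside-good share
```

The sketch also shows why forecasting with such a model is demanding: every input (`x`, `p`, `xi`) must be known for the future market before a share can be computed.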


An alternative to econometric demand models is the family of time series models (or extrapolative methods). These models consider the evolution and dependence of the temporal process of demand. Such demand forecasting models rely on discerning demand patterns and are constructed based on previous observations. Simple exponential smoothing (SES) is one of the well-accepted models; it has been shown that SES is efficient in capturing the level component in demand over time [3]. Given a demand time series $\{y_t : t \in D_t\}$, where $D_t = \{0, 1, 2, \cdots\}$ is the set of time indices, the weighted average form of SES is given by:

$$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\hat{y}_t \tag{2}$$

where the symbol $\hat{\ }$ denotes an estimate and $0 \le \alpha \le 1$ is the smoothing parameter. Eq. (2) shows that the forecast at time $t + 1$ is a weighted average of the most recent observation $y_t$ and the most recent forecast $\hat{y}_t$. The deficiency of SES in FMCG demand prediction is thus apparent: when the characteristics of the demand series change due to special events such as promotions or holidays, the fitted parameter can no longer describe the series. Exogenous inputs or multivariate methods are needed to explain those special events [4].

The so-called “base-times-lift” model is one of the simple models that use exogenous variables [5]; it is commonly adopted by the industry [3]. The base-times-lift model first generates a baseline forecast using a model, such as SES, for non-promoted time periods. A “lift effect” $L$ is added to the baseline forecast during the promoted periods. Ref. [5] models $L$ using the promotional index $I$, i.e., $\hat{L}_c = (I_c / I_p) L_p$, where subscripts $c$ and $p$ represent the current and previous promotions. However, such a linear assumption on the lift effect has been shown to be sub-optimal in FMCG forecasting; it often performs worse than the SES model [3]. Better representations of the lift effect are thus desired.

In the literature, promotional demand is related to various factors. Ref. [6] suggests that the promotional demand, and thus sales, are mainly affected by promotion style, item promotion history and store promotion history. Ref. [7] considers deal types (promotion style), depth of deals, as well as variations across categories, time and brands. Among several alternatives, Ref. [3] presents a more general formulation of FMCG demand with the following predictors: (1) past values of demand of the focal product, (2) price of the focal product, (3) promotional index of the focal product, (4) price of the competitor products, (5) promotional indices of the competitor products, (6) monthly dummy variables and (7) dummies for calendar events; all future information (other than demand itself) is assumed to be known to the forecaster. These predictors are related to the future demand of the focal product through the autoregressive distributed lag (ADL) model:

$$
\begin{aligned}
\ln(y_{0,t}) ={}& \alpha_0 + \sum_{j=1}^{\ell} \alpha_j \ln(y_{0,t-j}) + \sum_{j=0}^{\ell} \beta_{0,j} \ln(p_{0,t-j}) \\
&+ \sum_{j=0}^{\ell} \gamma_{0,j} I_{0,t-j} + \sum_{p=1}^{P} \sum_{j=0}^{\ell} \beta_{p,j} \ln(p_{p,t-j}) \\
&+ \sum_{q=1}^{Q} \sum_{j=0}^{\ell} \gamma_{q,j} I_{q,t-j} + \sum_{d=1}^{12} \theta_d\, \text{monthly\_dummy}_d \\
&+ \sum_{c=1}^{C} \sum_{\nu=0}^{1} \delta_{c,\nu}\, \text{calendar\_event}_{c,t-\nu} + \varepsilon_t
\end{aligned}
\tag{3}
$$

where $\ell$ is the maximum lag; $P$ and $Q$ are the numbers of competitor products in terms of price and promotional index respectively; $\nu = 0$ corresponds to the calendar event week; $\nu = 1$ corresponds to the week before the calendar event week; $C$ is the number of distinct calendar events in a year; the remaining symbols in Eq. (3) are self-explanatory. The values of $P$ and $Q$ are selected via the lasso (least absolute shrinkage and selection operator). It was shown that the ADL model outperforms both the SES and base-times-lift models.

The method proposed in Ref. [3] can be considered the state-of-the-art forecasting model for FMCG. It reflects a strong belief in the effect of predictor variables in predicting the response variable. However, the ADL model shown in Eq. (3) has two major deficiencies for operational forecasting:

1) The first deficiency lies within the modeling part. Suppose $C = 9$ and $P = Q = \ell = 2$ (which are the typical values [8]); the total number of predictors is then 50 (2 autoregressive terms, 3 + 3 own price and promotion terms, 6 + 6 competitor terms, 12 monthly dummies and 18 calendar-event terms). There is a high chance that the model is misspecified (e.g., the t-test retains the null hypothesis for some coefficients). This would lead to an additional regression parameter selection stage beside the original lasso step, i.e., the lasso determines $P$ and $Q$, and the additional parameter selection step reduces the number of predictors from 50 to a smaller number. Without the additional parameter selection stage, the insignificant predictors only contribute to the variance of the predicted value.

2) Exogenous factors such as promotion information are usually available at store level or retail enterprise level but not available to manufacturers and distributors. Therefore the ADL model is not applicable to manufacturers and distributors; they have to make forecasts based on the historical order placement information provided by the retailers.

Besides the above two major deficiencies, other issues such as missing data handling, scalability of the algorithm, lasso regression design and error evaluations in Ref. [3] also have room for improvement. However, we mainly address the
second issue in this paper, i.e., improving FMCG unit sales² forecasting on universal product code (UPC) level, without using the retailer-level strategic information (such as an upcoming promotion).

² Despite the differences between demand and sales [9], we consider using unit sales to represent demand and evaluate various models’ performance based on sales data.

B. Hierarchical nature of FMCG sales

FMCG unit sales data can be modeled as a hierarchical structure according to how the UPCs are distributed in a supply chain; Fig. 1 gives such an illustration. Level 0 (Total) denotes the completely aggregated distributor-level time series. Level 1 (A, B and C) shows the first level of disaggregation, which could represent various UPCs handled by the distributor. Level 2 (AA, AB, ···, CC) contains the most disaggregated time series, e.g. the demand at individual outlets. The dataset used in this paper is disaggregated first by product type then by geography, although other configurations of the hierarchical structure could be defined.

Level 0: Total
Level 1: A, B, C
Level 2: AA, AB, AC (under A); BA, BB, BC (under B); CA, CB, CC (under C)

Figure 1. A three-level hierarchical time series structure.

In a typical supply chain, distributors receive orders from the retailers and thus make forecasts based on historical orders. If we assume independent decision making among the retailers, price and promotion information is unobservable by the distributors. The forecasts made by the distributors are thus often based solely on the aggregated sales time series. However, such distributor-side forecasts are inconsistent with the sum of all retailer-side forecasts, due to the different forecast generating processes used by the various parties. In an ideal environment where every retailer produces its own forecasts based on its business strategy and feeds the results to the distributor, the forecasts in the supply chain need to be “aggregate consistent”. Forecast reconciliation is therefore needed.

In fact, the problem discussed here is related to information sharing in supply chains. In the field of management science, information sharing is not new [10], [11]. However, most studies are theoretical; much attention is focused on the effects of information sharing on operation management issues such as inventory management. Ref. [12] reviewed the current state-of-the-art on the value of sharing demand information. It is concluded that whether to share demand information depends on whether the downstream demand process can be inferred. At this stage, we consider the case where the downstream demand cannot be inferred. In addition, we note that the forecasts at retailer level are not actual orders. If the forecasts were the same as orders, providing such information to the distributor would have no practical relevance.

In this paper, we show that hierarchical reconciliation using the retail-level data improves UPC-level forecast accuracy significantly. The proposed forecasting framework needs collaborative efforts among the individual retailers and the distributor. Such collaboration protects proprietary information as only the forecasts are needed, without disclosure of business strategies. The paper is organized as follows: Section II introduces the formulation and variants of hierarchical reconciliation. As the name suggests, the method reconciles the forecasts instead of producing them; base forecasts are generated independently by the various distributors and retailers prior to reconciliation. Section III illustrates our assumptions and approaches to produce base forecasts. Detailed results and discussions are shown in Section IV and Section V concludes the paper.

II. HIERARCHICAL RECONCILIATION

In this section, hierarchical forecasting methods are discussed based on the illustrative example shown in Fig. 1. A more general description of the methods can be found in Refs. [13]–[15]. We note that the number of levels and the aggregation/disaggregation interpretation may vary based on the granularity of the available research data.

The underlying principle of hierarchical prediction originates from a summing matrix $S$. Suppose $Y_{i,t}$ denotes all observations at level $i$ and time $t$ and $Y_t = (Y_{0,t}, Y_{1,t}, Y_{2,t})'$, then:

$$Y_t = S Y_{2,t} \tag{4}$$
where

$$
S = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\tag{5}
$$

The summing effect is apparent: each row of $S$ specifies which bottom-level time series add up to a given series in the hierarchy. Suppose $\hat{Y}_{t+h} = (\hat{Y}_{0,t+h}, \hat{Y}_{1,t+h}, \hat{Y}_{2,t+h})'$ denotes the $h$-step-ahead base forecasts (see below); all hierarchical forecasting methods can then be written as:

$$\tilde{Y}_{t+h} = S P \hat{Y}_{t+h} \tag{6}$$

where $P$ is a matrix of choice to reconcile forecasts and $\tilde{Y}_{t+h}$ is the vector of final revised hierarchical forecasts.

In general, there are four ways to reconcile the forecasts, namely, the top-down approach, the bottom-up approach, the middle-out approach and optimal reconciliation [14]. For example, if $P = (0_{9\times4} \mid I_9)$, where $0_{i\times j}$ is the $i \times j$ null matrix and $I_9$ is a size 9 identity matrix, Eq. (6) represents the bottom-up approach. In this case $SP = (0_{13\times4} \mid S)$, so the time series at levels 0 and 1 are simple sums of the bottom-level series. Similarly, the top-down approach corresponds to $P = (p \mid 0_{9\times12})$ where $p = (p_1, \cdots, p_9)'$ is a vector of “weights” of the bottom-level series.

Compared to the bottom-up and top-down approaches, much effort is needed to derive the representation for optimal reconciliation. We refer the interested readers to Ref. [13] for details; we only state the main results here. The general idea is derived from the representation of the $h$-step-ahead base forecasts by a linear regression model:

$$\hat{Y}_{t+h} = S\beta_h + \varepsilon_h \tag{7}$$

where $\varepsilon_h$ has zero mean and covariance $\Sigma_h$; $\beta_h = E(\hat{Y}_{2,t+h})$, i.e., $\beta_h$ is the expectation of the base forecasts. Ref. [13] showed that under a reasonable assumption³, the optimal reconciled forecasts are given by the generalized least squares (GLS) solution:

$$\tilde{Y}_{t+h} = S\hat{\beta}_h = S(S'S)^{-1}S'\hat{Y}_{t+h} \tag{8}$$

which is equivalent to the ordinary least squares (OLS) solution. Note that $\hat{\beta}_h = (S'S)^{-1}S'\hat{Y}_{t+h}$ in Eq. (8) is the best linear unbiased estimator [14]. In this case, $P$ in Eq. (6) is $(S'S)^{-1}S'$. Therefore, we only need to construct the summing matrix and produce the base forecasts in order to obtain the optimal reconciled forecasts. If, on the other hand, the earlier assumption does not hold, the GLS solution is:

$$\tilde{Y}_{t+h} = S(S'\Sigma_h^{\dagger}S)^{-1}S'\Sigma_h^{\dagger}\hat{Y}_{t+h} \tag{9}$$

where $\Sigma_h^{\dagger}$ is the generalized inverse of the error covariance matrix. However, $\Sigma_h$ is not known and is very difficult⁴ (or impossible) to estimate for large hierarchies [15]. The weighted least squares (WLS) solution given by:

$$\tilde{Y}_{t+h} = S(S'\Lambda_h S)^{-1}S'\Lambda_h\hat{Y}_{t+h} \tag{10}$$

may be appropriate. $\Lambda_h$ in Eq. (10) is a diagonal matrix with elements equal to the inverse of the variances of $\varepsilon_h$.

III. PRODUCING BASE FORECASTS

Following Eq. (6), the collection of base forecasts is needed to reconcile the forecasts according to the hierarchy. We consider the Dominick’s database in this paper; the dataset is provided by the James M. Kilts Center, University of Chicago Booth School of Business, in a collaborative effort with Dominick’s Finer Food (DFF). In Part I of this two-part paper [16], data exploratory analysis, visualization and preprocessing are described in detail. However, we reiterate the necessary points in Part II, so that the present paper can be read relatively independently. At this stage, we note that 37 UPCs (level 1 time series) from the bottled juice category are selected to demonstrate hierarchical forecasting.

A. General methods versus specific methods

Univariate statistical forecasting methods such as the autoregressive integrated moving average (ARIMA) models and exponential smoothing state space (ETS) models are general; they can arguably be applied to any time series to generate forecasts (see Refs. [17], [18] for details of ARIMA and ETS respectively). These univariate methods do not consider the effects of exogenous factors. In FMCG forecasting, exogenous factors such as price and promotion are well studied. There is little reason not to use these factors when data are available. Forecasting methods which consider domain knowledge are specific methods.

³ Sometimes, it is reasonable to assume $\varepsilon_h \approx S\varepsilon_{2,h}$, where $\varepsilon_{2,h}$ is the vector of level 2 forecast errors in our case, or more generally, the bottom-level forecast errors.

⁴ In general, given the symmetrical structure of $\Sigma_h$, there are $n(n+1)/2$ parameters to be estimated in a GLS setting, which is difficult especially when $n$ is large. This is usually counteracted by imposing some structure on $\Sigma_h$, so that the number of parameters to be estimated is much smaller. Even so, the appropriateness of such structure needs to be carefully examined.
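The reconciliation formulas of Section II can be illustrated with a short numerical sketch. The code below builds the summing matrix of Eq. (5) for the hierarchy of Fig. 1 and applies Eq. (6) with the bottom-up choice of $P$, the OLS choice of Eq. (8) and the WLS choice of Eq. (10). The function names, the base forecasts and the error variances are hypothetical; in particular, how the variances in $\Lambda_h$ would be estimated is not covered here and the values used are placeholders.

```python
import numpy as np


def summing_matrix(n_groups=3, n_per_group=3):
    """Summing matrix S of Eq. (5) for a balanced three-level hierarchy."""
    n_bottom = n_groups * n_per_group
    total = np.ones((1, n_bottom))                                  # level 0 row
    middle = np.kron(np.eye(n_groups), np.ones((1, n_per_group)))   # level 1 rows
    bottom = np.eye(n_bottom)                                       # level 2 rows
    return np.vstack([total, middle, bottom])


def reconcile(base, S, method="ols", variances=None):
    """Reconciled forecasts S P base, following Eq. (6)."""
    n_all, n_bottom = S.shape
    if method == "bottom_up":
        # P = (0 | I): keep only the bottom-level base forecasts.
        P = np.hstack([np.zeros((n_bottom, n_all - n_bottom)), np.eye(n_bottom)])
    elif method == "ols":
        # P = (S'S)^{-1} S', Eq. (8).
        P = np.linalg.solve(S.T @ S, S.T)
    elif method == "wls":
        # P = (S' Lambda S)^{-1} S' Lambda with Lambda = diag(1 / variance), Eq. (10).
        lam = np.diag(1.0 / np.asarray(variances, dtype=float))
        P = np.linalg.solve(S.T @ lam @ S, S.T @ lam)
    else:
        raise ValueError(f"unknown method: {method}")
    return S @ P @ base


S = summing_matrix()
bottom = np.arange(1.0, 10.0)            # hypothetical level 2 base forecasts (sum = 45)
middle = np.array([6.0, 15.0, 24.0])     # hypothetical level 1 base forecasts
total = np.array([50.0])                 # deliberately inconsistent level 0 forecast
base = np.concatenate([total, middle, bottom])
variances = np.concatenate([[9.0], [3.0] * 3, [1.0] * 9])  # placeholder error variances

reconciled = {m: reconcile(base, S, m, variances) for m in ("bottom_up", "ols", "wls")}
for name, values in reconciled.items():
    # After reconciliation the total equals the sum of the bottom-level forecasts.
    print(name, values[0], values[4:].sum())
```

Whatever $P$ is chosen, the output has the form $S \times (\text{bottom-level vector})$, which is what makes the revised forecasts aggregate consistent.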
Rolling forecast 1: training weeks 1–200, forecast week 201, unused weeks 202–377
Rolling forecast 2: training weeks 2–201, forecast week 202, unused weeks 203–377
Rolling forecast 3: training weeks 3–202, forecast week 203, unused weeks 204–377
Rolling forecast 4: training weeks 4–203, forecast week 204, unused weeks 205–377
...
Rolling forecast n: training weeks n–376, forecast week 377

Figure 2. Experimental design: 177 (n = 177 in this case) rolling forecasts are performed for each time series. The data point being forecast is shown in red. The size of training window (shown in blue) is fixed at 200.
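The rolling design of Fig. 2 can be sketched in a few lines. The code below slides a 200-week window over a 377-week series, refits SES in every window, produces a one-step-ahead forecast with Eq. (2), and scores the result by the mean absolute percentage error. Several details are assumptions made for illustration: the series is synthetic, the SES recursion is initialized with the first observation of the window, and the smoothing parameter is chosen by a grid search on in-sample squared errors. None of these choices is specified in the paper.

```python
import numpy as np


def ses_forecast(y, alpha):
    """One-step-ahead SES forecast, Eq. (2), initialised with the first observation."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1.0 - alpha) * level
    return level


def fit_alpha(y, grid=None):
    """Pick alpha by minimising in-sample one-step squared errors over a grid."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid, dtype=float)
    level = np.full(grid.shape, y[0], dtype=float)  # one running forecast per alpha
    sse = np.zeros(grid.shape)
    for obs in y[1:]:
        sse += (obs - level) ** 2
        level = grid * obs + (1.0 - grid) * level
    return float(grid[np.argmin(sse)])


def rolling_forecasts(y, window=200, method="ses"):
    """One-step-ahead forecasts from a moving training window of fixed size."""
    y = np.asarray(y, dtype=float)
    forecasts, actuals = [], []
    for start in range(len(y) - window):
        train = y[start:start + window]
        if method == "persistence":
            forecast = train[-1]              # forecast equals the latest observation
        else:
            forecast = ses_forecast(train, fit_alpha(train))  # refit in every window
        forecasts.append(forecast)
        actuals.append(y[start + window])
    return np.array(forecasts), np.array(actuals)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))


# Synthetic weekly unit sales fluctuating about a level (illustrative only).
rng = np.random.default_rng(0)
y = 1000.0 + 50.0 * rng.standard_normal(377)

for method in ("persistence", "ses"):
    f, a = rolling_forecasts(y, window=200, method=method)
    print(method, len(f), round(mape(a, f), 2))
```

With 377 observations and a window of 200, the loop yields exactly the 177 forecasts of Fig. 2, the last one using weeks 177 to 376 to forecast week 377.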

In this paper, we consider using SES (equivalent to an ARIMA(0,1,1) model) to represent the general methods. This is because our time series do not possess strong seasonal and trend components. For specific methods, we consider using the ADL model. Instead of following the fully specified Eq. (3), a reduced form is used:

$$
\ln(y_{0,t}) = \alpha_0 + \sum_{j=1}^{\ell} \alpha_j \ln(y_{0,t-j}) + \sum_{j=0}^{\ell} \beta_{0,j} \ln(p_{0,t-j}) + \sum_{j=0}^{\ell} \gamma_{0,j} I_{0,t-j} + \varepsilon_t
\tag{11}
$$

To support our model of choice, we use a data visualization technique in Ref. [16]. It is shown that the calendar events (9 holidays) and the 12 months in a year have minimal correlation with the demand fluctuation. The monthly dummy variables and the dummy variables for calendar events in Eq. (3) are thus dropped.

Besides the dummy variables, the full ADL model shown in Eq. (3) also considers the promotional index and price of the competitor products as predictors. In a separate analysis we found that, for the 37 selected UPCs, including the lasso-identified competitors only adds to the prediction variance but not to the accuracy. An obvious reason for this observation is that our products are from the same category, so the complementary effect (e.g. bread and butter) of FMCG sales is not applicable. The substitution effect (e.g. eggs from different farms) is also hypothesized to be minimal due to the relatively persistent price over the years, and thus the consumer loyalty. Therefore, we use Eq. (11) in this paper to optimize the trade-off between prediction variance and accuracy by only considering the promotional index and price information of the focal product.

B. The pitfalls of using averaged information

Ideally, if all the retailer-level business strategies for a particular UPC are available to the distributor, the distributor would make an informed decision on overall price and promotional index, and thus calculate the price curves and promotion stripes shown in Fig. 2 of Ref. [16]. The UPC-level forecasts can then be made using ADL models such as Eq. (11). However, as mentioned in point 2) of Section I, such information is usually not available to distributors; applying ADL on UPC level may not be practically viable. In addition, as all the stores in the DFF dataset belong to the same retailer group, the retailers tend to adopt a uniform pricing strategy where they increase or decrease the price for all the stores at the same time [19]. When the pricing strategies are decided independently and the number of independent retailers is large, the price and promotion information on UPC level may be redundant (e.g., each and every week there may be several retailers making promotions so that the overall promotional index is “flat” throughout a year). In such scenarios, the distributor would rely on simple time series models to make forecasts. Following this assumption, we only use SES to forecast the UPC-level sales, i.e., forecast the level 1 time series; whereas for the retailer-level sales, where price and promotion information is more readily available, ADL is used.

C. Experimental design

An ad hoc data preprocessing sequence is used in Ref. [16] to yield a total of 1971 store-level time series
for 37 UPCs over a period of 377 weeks. For each of the 2009 time series (1, 37 and 1971 time series at levels 0, 1 and 2 respectively), we produce 177 rolling forecasts with a moving window of 200 weeks of training data. In other words, the first 200 data points are used in model fitting to produce the first forecast; when the ‘new data’ becomes available, we refit the model using the new window of 200 weeks and produce another forecast. An illustration of the experimental design is depicted in Fig. 2. An alternative to the present approach is to use a fixed model throughout the evaluation period. It is found that some time series have a gradual change in level component; the iterative approach could capture the gradual change and thus make better forecasts. We also note that it is possible to perform experiments to investigate the effect of training length on forecasting accuracy; however, we fix the training length at 200 weeks following Ref. [3]. In forecasting studies, another commonly evaluated parameter is the forecast horizon. For the DFF dataset, it was shown that the difference between 1-week-ahead and 12-weeks-ahead forecasting accuracies is small [3].

There is a rich literature on the choice of error metrics in forecasting. In this study we choose the scale-independent mean absolute percentage error (MAPE), which is most widely used in practice [20].

Table I
FORECAST ERRORS (MAPE) OF 37 SELECTED UPCS FROM THE BOTTLED JUICE CATEGORY. THE ERRORS ARE IN PERCENTAGE.
Univariate methods: Persistence, SES. Hierarchical methods: Bottom-up, Optimal (OLS), Optimal (WLS).

No. | UPC | Product name | No. of series | Persistence | SES | Bottom-up | Optimal (OLS) | Optimal (WLS)
1 | 7045011484 | SUNSWEET PRUNE JUIC 48Oz | 55 | 10.16 | 10.74 | 9.02 | 10.73 | 8.65
2 | 5300015154 | REALEMON PLASTIC LEM 45Oz | 61 | 11.17 | 9.64 | 9.00 | 11.03 | 8.92
3 | 5300015108 | REALEMON LEMON JUICE 8Oz | 64 | 15.11 | 13.05 | 10.89 | 14.98 | 10.89
4 | 3828103123 | HH LEMON JUICE 32Oz | 61 | 16.74 | 16.92 | 18.32 | 16.55 | 17.50
5 | 3828103091 | HH APPLE JUICE 128Oz | 66 | 31.81 | 37.66 | 20.92 | 31.40 | 20.86
6 | 3828103025 | DOM APPLE JUICE 32Oz | 52 | 39.11 | 41.96 | 21.97 | 38.40 | 21.68
7 | 3828103021 | HH APPLE JUICE 48Oz | 63 | 62.84 | 182.55 | 34.23 | 61.84 | 33.88
8 | 3120027407 | OS PINK GRAPEFRUIT 64Oz | 42 | 29.99 | 34.58 | 17.61 | 30.04 | 17.51
9 | 3120027007 | OS GRAPEFRUIT JUICE 64Oz | 53 | 28.16 | 30.34 | 12.68 | 27.94 | 12.59
10 | 3120026134 | OS LC CRANRASP 48Oz | 52 | 26.11 | 24.66 | 12.26 | 26.22 | 12.33
11 | 3120021007 | OS CRANAPPLE DRINK 64Oz | 62 | 25.18 | 27.01 | 12.21 | 24.74 | 12.12
12 | 3120020035 | OS LO CAL CRANBERRY 48Oz | 63 | 21.33 | 20.29 | 10.49 | 21.37 | 10.43
13 | 3120020007 | OS CRNBRY JCE COCKTA 64Oz | 65 | 26.15 | 30.91 | 11.67 | 25.86 | 11.51
14 | 3120020005 | OS CRANBERRY COCKTA 48Oz | 66 | 34.65 | 50.54 | 12.46 | 34.38 | 12.41
15 | 1480031656 | MOTTS NATURAL APPLE 64Oz | 65 | 74.67 | 55.48 | 20.62 | 73.62 | 20.57
16 | 1480000034 | MOTTS REGULAR APPLE 64Oz | 67 | 85.04 | 73.67 | 26.49 | 83.91 | 27.72
17 | 7045011329 | SUNSWEET PRUNE JUICE 32Oz | 61 | 12.28 | 11.99 | 12.10 | 12.32 | 11.70
18 | 5300015132 | REALEMON LEMON JUICE 32Oz | 66 | 16.55 | 17.49 | 17.05 | 16.57 | 16.05
19 | 4850000193 | TROP TWSTR ORGCRAN 46Oz | 34 | 30.35 | 29.55 | 16.27 | 30.06 | 16.20
20 | 4180022700 | WELCHS WHITE GRAPE J 64Oz | 54 | 10.49 | 11.11 | 16.51 | 10.36 | 15.14
21 | 4180020750 | WELCHS GRAPE JUICE 64Oz | 65 | 11.21 | 11.11 | 14.87 | 11.41 | 13.89
22 | 4176000394 | INDIAN SUMMER APPLE 64Oz | 64 | 141.24 | 262.59 | 31.19 | 139.11 | 30.83
23 | 3828103017 | HH APPLE JUICE 64Oz | 67 | 66.32 | 76.37 | 25.33 | 65.51 | 25.16
24 | 3828103009 | HH PRUNE JUICE 40Oz | 40 | 15.72 | 15.93 | 13.47 | 16.14 | 12.91
25 | 3120027005 | OS GRAPEFRUIT JUICE 48Oz | 39 | 29.75 | 30.56 | 16.69 | 29.61 | 16.60
26 | 3120026107 | OS CRANRASPBERRY DR 64Oz | 56 | 28.70 | 33.27 | 12.07 | 28.38 | 11.97
27 | 3120026105 | OS CRANRASPBERRY 48Oz | 45 | 43.58 | 71.13 | 20.12 | 43.51 | 20.30
28 | 3120021005 | OS CRANAPPLE DRINK 48Oz | 41 | 41.11 | 55.02 | 18.67 | 41.08 | 18.65
29 | 3828103115 | DOM CRANBERRY RSPBRY 48Oz | 7 | 33.58 | 33.57 | 27.12 | 37.00 | 26.60
30 | 3828103033 | HH CRANBERRY JUICE C 48Oz | 61 | 23.28 | 24.27 | 18.61 | 22.94 | 18.45
31 | 3828103005 | DOM PRUNE JUICE 32Oz | 41 | 16.49 | 15.89 | 15.48 | 16.62 | 14.63
32 | 3120027405 | OS PINK GRAPEFRUIT 48Oz | 26 | 31.49 | 36.20 | 19.40 | 31.97 | 19.33
33 | 1480000032 | MOTTS APPLE JUICE P 32Oz | 62 | 9.57 | 9.07 | 11.71 | 9.62 | 10.60
34 | 5300015407 | REAL LIME JUICE 8Oz | 7 | 24.35 | 22.16 | 22.72 | 25.99 | 21.44
35 | 7045011402 | SUNSWEET PRUNE JCE W 40Oz | 58 | 12.53 | 14.76 | 11.56 | 12.43 | 11.00
36 | 3120020000 | OS CRANBERRY COCKT 32Oz | 59 | 10.99 | 9.96 | 10.50 | 11.06 | 10.20
37 | 1480051324 | MOTTS CLAMATO JUICE 32Oz | 61 | 14.10 | 12.69 | 15.27 | 14.10 | 14.75
overall | | | 1971 | 31.40 | 39.59 | 16.96 | 31.32 | 16.65

IV. RESULTS AND DISCUSSION

We apply both SES and persistence models on the 37 UPC-level aggregated series. The persistence model assumes the forecast is equal to the current observation; it is often included as a naive benchmark. Table I shows the results. It is observed that the overall performance of the persistence model is better than that of SES, although SES gives improved results for some UPCs. Fig. 3 shows
[Figure 3 panels: UPC 3828103021 (left) and UPC 5300015154 (right). Horizontal axis: Timestep (200 to 350). Vertical axis: Unit sales. Legend: Actual, Persistence, SES.]

Figure 3. Time series plot of forecast values using univariate models for 2 example UPCs.

[Figure 4 panels: Bottom-up, Optimal (OLS), Optimal (WLS), Persistence, SES. Horizontal axis: Measured log of sales, ln(y0,t). Vertical axis: Forecast log of sales, ln(y0,t). Color scale: Count.]

Figure 4. Scatter plots of the measured and forecast log of unit sales. For each forecasting method, all forecast points from the 37 UPCs are included (177×37 = 6549 points in each sub-plot). Hexagon binning algorithm is used for visualization.
the forecast results time series plot for 2 selected UPCs. achieve an overall of 58% lower MAPE over SES. With
The relatively big SES forecast error (182.55% for SES; these more accurate forecasts and without the need for
62.84% for persistence) for UPC-3828103021 originates either distributors or retailers to divulge sensitive business
from the misidentified level component (see left plot). The information, it depicts a win-win situation for this supply
misidentification is caused by the large sales peaks present chain cooperative model to be both efficient and practical.
in the data. On the other hand (right plot), when the data In future, we will further explore hierarchical fore-
fluctuates about a level, SES gives slightly better results casting (for supply chain applications) by considering an
owing to its smoothing property. extended hierarchy, i.e., including manufacturers/suppliers
Recall in Section II, we introduced several variants of and/or products from different categories. To improve scal-
hierarchical forecasting, including top-down, bottom-up and ability, the recursive calculation of (S  Λh S)−1 as shown
optimal approaches. For FMCG data, it is well-known that in Ref. [15] can be employed. Other exogenous information
information loss is substantial in aggregation and therefore such as sentiment scores from online product reviews as
the top-down method may not be suitable. Furthermore, well as marketing and advertising strategies will be explored
given our hierarchical structure, it is not easy to assign weights to the middle- and bottom-level series. For example, we need to know the market share of each UPC in order to assign weights to the middle-level series. We therefore consider bottom-up and optimal approaches for hierarchical forecasting. Both OLS and WLS, but not GLS, are used in optimal reconciliation, for the reasons stated in Section II. Since persistence is superior to SES for our particular dataset, we use results from the persistence model (for levels 0 and 1 only; level 2 series are forecast using ADL models) during reconciliation. The results are tabulated in Table I. To further understand the errors, the scatter plot of measured versus forecast unit sales for each forecasting method is shown in Fig. 4. Note that both the ordinate and abscissa are on log-scale; a hexagon binning algorithm is used for visualization.

From Table I and Fig. 4, both bottom-up and optimal (WLS) approaches show significant improvements over the univariate methods, whereas the optimal (OLS) reconciliation gives similar results to the persistence model. The unsatisfactory performance of OLS-based reconciliation is caused by the heteroscedasticity in the data. The bottom-up and optimal (WLS) approaches are indistinguishable from each other in this case study. It is believed that the good performance of the bottom-up approach can be attributed to the nature of the data. Even at the very bottom level, the data are well behaved, with price- and promotion-driven unit sales for most series. It is also hypothesized that when the number of time series and the number of levels in the hierarchy scale up, the optimal reconciliation would be more advantageous than the bottom-up approach.

V. CONCLUSION

We perform one-week-ahead UPC-level FMCG sales forecasting by considering the inherent hierarchical structure of the distributor-retailer relationship in a supply chain. The optimally reconciled hierarchical forecasts using WLS are shown to be the most promising compared with the other benchmark models. The WLS approach is able to [...] to evaluate whether they can improve forecast accuracy. In addition, probabilistic forecasts instead of the current point forecasts will be investigated. We will also attempt to address the computational challenge in estimating the variance-covariance structure of the base forecast, so that a GLS solution to the hierarchical reconciliation can be employed.

ACKNOWLEDGMENT

This work is partially supported under the A*STAR TSRP fund 1424200021 and Antuit–SIMTech Supply Chain Analytics Lab.

REFERENCES

[1] D. Samson and P. J. Singh, Operations Management. Cambridge University Press, 2008, Cambridge Books Online. [Online]. Available: http://dx.doi.org/10.1017/CBO9781139150002

[2] A. Nevo, “Measuring market power in the ready-to-eat cereal industry,” Econometrica, vol. 69, no. 2, pp. 307–342, Mar. 2001.

[3] T. Huang, R. Fildes, and D. Soopramanien, “The value of competitive information in forecasting FMCG retail product sales and the variable selection problem,” European Journal of Operational Research, vol. 237, no. 2, pp. 738–748, 2014.

[4] R. Fildes, K. Nikolopoulos, S. F. Crone, and A. A. Syntetos, “Forecasting and operational research: a review,” Journal of the Operational Research Society, vol. 59, no. 9, pp. 1150–1172, May 2008.

[5] O. G. Ali, S. Sayin, T. van Woensel, and J. Fransoo, “SKU demand forecasting in the presence of promotions,” Expert Systems with Applications, vol. 36, no. 10, pp. 12340–12348, Dec. 2009.

[6] L. G. Cooper, P. Baron, W. Levy, M. Swisher, and P. Gogos, “Promocast™: A new forecasting method for promotion planning,” Marketing Science, vol. 18, no. 3, pp. 301–316, 1999.

[7] S. Bogomolova, S. Dunn, G. Trinh, J. Taylor, and R. J. Volpe, “Price promotion landscape in the US and UK: Depicting retail practice to inform future research agenda,” Journal of Retailing and Consumer Services, vol. 25, pp. 1–11, Jul. 2015.

[8] T. Huang, personal communication, 2015.

[9] S. A. Conrad, “Sales data and the estimation of demand,” Operational Research Quarterly (1970-1977), vol. 27, no. 1, pp. 123–127, 1976.

[10] C. Terwiesch, Z. J. Ren, T. H. Ho, and M. A. Cohen, “An Empirical Analysis of Forecast Sharing in the Semiconductor Equipment Supply Chain,” Management Science, vol. 51, no. 2, pp. 208–220, Feb. 2005.

[11] H. L. Lee, K. C. So, and C. S. Tang, “The Value of Information Sharing in a Two-Level Supply Chain,” Management Science, vol. 46, no. 5, pp. 626–643, May 2000.

[12] M. M. Ali and J. Boylan, “On the value of sharing demand information in supply chains,” in OR56 Annual Conference, London, 2014, pp. 44–56.

[13] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, and H. L. Shang, “Optimal combination forecasts for hierarchical time series,” Computational Statistics & Data Analysis, vol. 55, no. 9, pp. 2579–2589, Sep. 2011.

[14] G. Athanasopoulos, R. A. Ahmed, and R. J. Hyndman, “Hierarchical forecasts for Australian domestic tourism,” International Journal of Forecasting, vol. 25, no. 1, pp. 146–166, Jan. 2009.

[15] R. J. Hyndman, A. Lee, and E. Wang, “Fast computation of reconciled forecasts for hierarchical and grouped time series,” 2014, working paper.

[16] D. Yang, G. S. W. Goh, C. Xu, and A. N. S. Zhang, “Forecast UPC-level FMCG demand, Part I: Exploratory analysis and visualization,” in Big Data (Big Data), 2015 IEEE International Conference on, Oct. 2015.

[17] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control. Englewood Cliffs, New Jersey: Prentice Hall, Inc., 1994.

[18] R. J. Hyndman, A. B. Koehler, J. K. Ord, and R. D. Snyder, Forecasting with Exponential Smoothing. Deblik, Berlin, Germany: Springer, 2008.

[19] S. J. Hoch, B.-D. Kim, A. L. Montgomery, and P. E. Rossi, “Determinants of store-level price elasticity,” Journal of Marketing Research, vol. 32, no. 1, pp. 17–29, 1995.

[20] R. Fildes and P. Goodwin, “Fine judgements: do organizations follow best practice when applying management judgement to forecasting?” Interfaces, vol. 37, pp. 570–576, 2007.
