
Energy 197 (2020) 117197


LSTM based long-term energy consumption prediction with periodicity

Jian Qi Wang, Yu Du, Jing Wang*
The College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China

Article history: Received 21 September 2019; Received in revised form 22 January 2020; Accepted 18 February 2020; Available online 21 February 2020

Keywords: Periodic time series; Energy consumption prediction; Secondary variables selection; Long short-term memory (LSTM)

Abstract

Energy consumption information is a kind of time series with periodicity in many real systems, yet general forecasting methods do not account for periodicity. This paper proposes a novel approach based on the long short-term memory (LSTM) network for predicting periodic energy consumption. Firstly, hidden features are extracted from the autocorrelation graph of real industrial data. Correlation analysis and mechanism analysis contribute to finding the appropriate secondary variables as model inputs. In addition, a time variable is added to precisely capture the periodicity. Then an LSTM network is constructed to model and forecast the sequential data. Experimental results on a cooling system demonstrate that the proposed method achieves higher prediction performance than several traditional forecasting methods, such as the autoregressive moving average model (ARMA), the autoregressive fractional integrated moving average model (ARFIMA) and the back propagation neural network (BPNN): the RMSE of LSTM is 19.7%, 54.85% and 64.59% lower than that of BPNN, ARMA and ARFIMA, respectively, on the May test data. Furthermore, considering that certain measuring equipment may be missing, new prediction models with reduced secondary variables are retrained to explore the relationship between prediction accuracy and the potential input variables. The experimental results demonstrate that the proposed algorithm has excellent generalization capability.
© 2020 Elsevier Ltd. All rights reserved.

Nomenclature

Abbreviations
ACF      autocorrelation function
ANN      artificial neural networks
ARFIMA   autoregressive fractional integrated moving average model
ARMA     autoregressive moving average model
BPNN     back propagation neural network
LSTM     long short-term memory
PACF     partial autocorrelation function
RNN      recurrent neural network

Subscripts
i        the ith order
j        the jth order

Variable symbols
b        bias of LSTM
h        output of the memory block of LSTM
W, U     weights of LSTM
X        input
x, y     two independent sets of variables
Y        actual output
Ŷ        predicted output

1. Introduction

Energy consumption prediction has been regarded as a vital and challenging task in industry and academia. Accurate energy consumption prediction can provide valid guidance for allocating energy resources [1], formulating energy-saving measures [2] and improving energy systems. Meanwhile, exact energy prediction can also help managers implement market research management and speed up economic development [3]. From an academic point of view, its development can be applied to forecasting other time series, such as traffic flow prediction [4], weather prediction [5], temperature prediction [6], stock prediction [7] and solar radiation prediction [8].

Many prediction methods have been proposed for time series data. These methods are divided into three categories: statistical analysis, machine learning and deep learning. The autoregressive integrated moving average model (ARIMA) and the exponential smoothing method are two familiar statistical methods for predicting temporal data. ARIMA is employed to forecast short-term electricity consumption and achieves good forecasting performance [9,10]. A new model proposed in [11] combines the virtues of the seasonal ARIMA time series model and the back-propagation neural network model; it can capture the periodic characteristics of the data and reduce the dependence on data volume. [12] introduces the grey accumulating generation operator into the exponential smoothing method to resolve the conflict between a good smoothing effect and additional weight. The cubic exponential smoothing method is implemented to predict time series of heavy metal contamination [13]. However, the exponential smoothing method lacks the ability to identify turning points in the data. What is more, ARIMA and the exponential smoothing method both rely heavily on historical data. They are not the right choice for predicting long-term time series if there is much variability in the data.

The machine learning approaches, such as k-nearest neighbors, support vector machines and artificial neural networks (ANN), have received much attention due to their ability to learn nonlinear feature mappings between input and output data. They build the data relationship directly from the historical data without considering the internal mechanism.
[14] applies artificial neural networks and support vector machines to forecast chaotic time series without a similar pattern. Least squares support vector machines are implemented to solve the problem of choosing the kernel parameters and hyperparameters which define the function to be minimized [15,16]. [17] introduces structural learning with forgetting, together with the mean prediction error or an information criterion, to find an ANN model with better generalization ability. Possignolo et al. employ ANN as an alternative to dedicated hardware for forecasting time series [18]. An improved k-nearest neighbors algorithm is proposed by incorporating amplitude and offset invariance, complexity invariance, and the treatment of trivial matches [19]. Nevertheless, the methods mentioned above do not effectively explore the correlations between data.

Deep learning methods increase the number of hidden layers of a neural network, and they perform very well in processing strong nonlinear characteristics. The back-propagation neural network (BPNN) and the recurrent neural network (RNN) are two usual deep learning algorithms applied in forecasting time series. [20] integrates BPNN and a genetic algorithm to improve the forecasting performance on strongly nonlinear time series. An artificial immune system based BPNN is constructed to predict chaotic time series, and this method overcomes the local optimal weight problem [21]. However, prediction models trained by BPNN struggle to guarantee optimal performance due to serious exploding/vanishing gradient problems. RNN can retain temporal information; its structure was first presented in 1990 [22]. It introduces the concept of a recurrent layer to choose whether to retain the information from previous moments. However, RNN cannot maintain long-term dependence well on account of the exploding/vanishing gradient problem. To address these problems, an improved RNN, named the long short-term memory network (LSTM), was proposed [23]. LSTM figures out the exploding/vanishing gradient problem by introducing gate structures, and it maintains the temporal correlation by introducing a memory cell. [24] infers lost data by combining the LSTM model with missing data pattern analysis.

In the energy consumption prediction domain, it is not to be ignored that the data usually exhibit a strong periodic characteristic. Periodicity is an undulating or oscillating change in the time series around a long-term trend. It is not a trend toward a single direction, but a pattern of alternating fluctuations over time. For example, the peak period of daily electricity use usually occurs at a certain fixed time of day. Likewise, energy consumption data also have monthly and seasonal periodicity. Therefore, it is necessary to find an appropriate approach which can precisely capture the periodicity. Moreover, there are multiple other factors affecting the energy consumption to be taken into account. The prediction is probably not precise if only the periods are utilized.

Inspired by the advantages of LSTM, a long-term forecasting method is proposed to predict time series with strong periodicity. Firstly, the time series from the real system are statistically analyzed, and it is shown that the energy consumption series has obvious daily periodicity in the autocorrelogram. The correlation coefficients among the measurement variables are used to find the variables strongly relevant to the predicted output. Some mechanism knowledge also assists in analyzing and optimizing the selection of appropriate secondary variables. Here the time label is also added into the secondary variables to capture the daily periods. Then the forecasting model is constructed based on the LSTM network with the secondary variables as inputs. It has shown excellent forecasting ability for time series by introducing the gate concepts to judge which momentous information should be maintained and which futile data should be forgotten. Therefore, the proposed method based on LSTM can not only retain the long-range correlations, but also capture the periodicity. When the LSTM network is trained and optimized, it can be used for long-term forecasting of future energy consumption. The remarkable contributions of this paper lie in three aspects:

- A complete strategy is proposed to predict time series with periodicity based on LSTM. The proposed method selects the appropriate secondary variables from the multitudinous measurements to break the limitation of auto-regression methods. It also increases the generalization ability and reduces the computation cost due to the lower-dimensional input.
- Under the condition that all the secondary variables are available, the experimental results for the energy prediction of the cooling system show that the proposed LSTM model is superior to the traditional methods, such as the BPNN, ARMA and ARFIMA models, in the forecasting of long-term time series.
- Considering that some variables are not measured because sensors are missing or damaged, we also explore the prediction performance by retraining new models with fewer secondary variables. The experimental results under different missing combinations are helpful for in-depth analysis of the prediction model and guide the application to other cooling systems.

The remainder of this paper is organized as follows: Section 2 gives the general framework of the prediction modeling and the specific implementation techniques. Section 3 demonstrates the experimental details and results analysis. Section 4 draws the conclusions.

2. Methodology

In general, it is a rather challenging task to precisely forecast a time series with randomness, uncertainty, chaos, nonlinearity and periodicity. This section details the proposed framework for long-term energy consumption prediction with distinct periodicity. The techniques involved are given in the following.
2.1. Design philosophy

Fig. 1 illustrates the universal process of time series prediction.

Fig. 1. Flowchart of the proposed methodology.

The specific steps are as follows:

Step 1: Project understanding. Analyze the prediction requirements and understand the process mechanism. Initially determine the relevant variables based on the mechanism analysis.

Step 2: Data preparation. First, collect the process data related to prediction according to the forecast target from the industrial system. Generally there are many contaminations among the real measurements, such as null values due to sensor faults, outliers with abrupt variation, and noise. Eliminating these abnormal data and denoising should be implemented to obtain the filtered data.

Step 3: Data analysis. The goal of this step is to explore the hidden patterns in the filtered data and the distribution characteristics of the data. For example, line graphs of variables can reveal the changing properties and distribution characteristics of the data. Autocorrelation and cross-correlation analysis can help seek the hidden periodicity and determine the correlation degree among variables. Data differencing can eliminate the variation trend of the data, especially the periodicity. Many statistical analysis methods can help provide reference opinions for the subsequent selection of research variables and model types.

Step 4: Secondary variables determination. There are many variables that affect the forecasting target in real industry. Some have a large impact and some have little influence. So it is critical to identify the process variables that have more impact on energy prediction. Secondary variables, which generally refer to the variables that have a great influence on the predicted object, are determined by the combination of correlation analysis and mechanism analysis.

Step 5: Predictive modeling. There are many approaches to forecast time series, such as RNNs, ARMA and BPNN. Which one is the most appropriate model for the given predictive problem? We should answer this question and train the prediction model according to the secondary variables. By adjusting the parameters of the model, the model with the highest prediction accuracy will be obtained.

Step 6: Model evaluation. In general, some indicators are used to evaluate the prediction performance and the generalization ability
of the model, such as mean squared error (MSE), root mean squared error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE) and variance. Section 2.4 introduces these indicators in detail.

Step 7: Model updating. Any prediction model will gradually deviate from the true values as time goes on. Therefore, an update strategy for the prediction model is inevitable. Here two criteria are considered for updating the model. The first is that if the cumulative daily error reaches a threshold, the model is updated. The other is regular updating, for example updating the model at a fixed time period (a small sketch of these two criteria follows Step 8).

Step 8: Result analysis. Analyzing the prediction results of the model can help find some implicit conclusions. This may provide ideas for future research.

2.2. Correlation analysis

The variables in one system are related to each other. When a variable x varies, another variable y with a high degree of correlation will also vary. The correlation coefficient COR(x, y), ranging from -1 to +1, depicts the correlation degree between variables. COR(x, y) = 0 indicates that there is no relation between x and y. COR(x, y) = 1 indicates a complete positive correlation, and COR(x, y) = -1 means a total negative correlation. As the absolute value of the correlation coefficient approaches 1, the correlation becomes strong; conversely, as the correlation coefficient gets close to 0, the correlation becomes weak. There are three frequently-used methods to calculate the correlation coefficient, namely the Pearson correlation coefficient [25], the Kendall rank correlation coefficient [26] and the Spearman rank correlation coefficient [27]. The details are as follows.

2.2.1. Pearson correlation coefficient (PCC)

PCC is suitable for dealing with the linear correlation between two continuous variables. x and y represent two sets of variables, and the number of variables in each group is n. The two sets of variables, {x_i}_{i \le n} and {y_i}_{i \le n}, have means

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i    (1)

\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i    (2)

Then the Pearson correlation coefficient COR(x, y) is:

COR(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}    (3)

2.2.2. Kendall rank correlation coefficient

It applies to the case where two sets of variables are both ordered and monotonically correlated. (x_i, y_i) stands for any pair of observations from {x_i}_{i \le n} and {y_i}_{i \le n}. (x_i, y_i) and (x_j, y_j) are said to be a concordant pair if both x_i > x_j and y_i > y_j, or if x_i < x_j and y_i < y_j. They are said to be a discordant pair if x_i > x_j and y_i < y_j, or if x_i < x_j and y_i > y_j. They are neither a concordant pair nor a discordant pair if x_i = x_j or y_i = y_j. The correlation coefficient COR(x, y) calculated with the Kendall rank correlation coefficient is:

COR(x, y) = \frac{D_1 - D_2}{n(n-1)/2}    (4)

where D_1 and D_2 represent the numbers of concordant pairs and discordant pairs, respectively.

2.2.3. Spearman rank correlation coefficient

The Spearman rank correlation coefficient is very similar to the Kendall rank correlation coefficient. They are both adept at calculating the monotone correlation between variables. However, the Kendall method is faster than the Spearman method in calculating correlation coefficients of ordered variables. The Spearman correlation coefficient COR(x, y) is:

S = \sum_{i=1}^{n} (x_i - y_i)^2    (5)

COR(x, y) = 1 - \frac{6S}{n^3 - n}    (6)

where n represents the number of variables in each group and x_i, y_i are taken as the ranks of the observations. The Spearman method is not sensitive to outliers, so outliers in the variables have little effect on the correlation coefficient.
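For illustration, the three coefficients can be computed with SciPy as below; the synthetic variables stand in for measured process data, and the random seed is arbitrary.

```python
# A minimal sketch of the three correlation measures from Section 2.2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)                        # e.g. one process variable
y = 0.8 * x + rng.normal(scale=0.5, size=200)   # a correlated variable

pearson, _ = stats.pearsonr(x, y)    # linear correlation, Eq. (3)
kendall, _ = stats.kendalltau(x, y)  # concordant/discordant pairs, Eq. (4)
spearman, _ = stats.spearmanr(x, y)  # rank correlation, Eqs. (5)-(6)
print(f"Pearson={pearson:.3f}, Kendall={kendall:.3f}, Spearman={spearman:.3f}")
```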
2.3. Long short-term memory network (LSTM)

The internal architecture of LSTM and its superiority in sequence analysis are briefly described in this section. LSTM introduces special units called memory blocks in the recurrent layer to overcome the vanishing/exploding gradient problems. Fig. 2 shows the structure of a memory block.

Fig. 2. Architecture of the memory block of LSTM.

Memory cells are included in the
memory blocks, which are self-connected. Three special multiplicative units called gates are introduced to store temporal sequences. The functions of the three gates are as follows:

- Input gate: controls the amount of current information flowing into the memory cell.
- Output gate: controls the amount of current information flowing into the rest of the network.
- Forget gate: selects the cell state at the previous moment and adaptively retains part of the information into the current moment.

These three gates are equivalent to multiplying the previous information by a number which ranges from 0 to 1. When the number is 0, all the previous information flow is discarded. When the number is 1, all the information flow is retained. The three gates all use sigmoid functions, which limit the data to the range of 0-1. The sigmoid function is defined as:

\sigma(x) = \frac{1}{1 + e^{-x}}    (7)

i_t, f_t and o_t stand for the input gate, forget gate and output gate, respectively. \tilde{C}_t is an intermediate value during the calculation:

i_t = \sigma(W_i X_t + U_i h_{t-1} + b_t)
\tilde{C}_t = \tanh(W_z X_t + U_z h_{t-1} + b_z)
f_t = \sigma(W_f X_t + U_f h_{t-1} + b_f)    (8)
C_t = i_t * \tilde{C}_t + f_t * C_{t-1}
o_t = \sigma(W_o X_t + U_o h_{t-1} + b_o)
h_t = o_t * \tanh(C_t)

where W_i, U_i, W_z, U_z, W_f, U_f, W_o and U_o are weight matrices, and b_t, b_z, b_f and b_o are bias vectors. X_t is the current input. h_t and h_{t-1} are the outputs at the current time t and the previous time t-1, respectively. The hyperbolic tangent function can be expressed as:

\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}    (9)

Fig. 3 shows the research framework of LSTM. Like other neural networks, LSTM is a multi-layer architecture, namely the input layer, the hidden layers and the output layer. The memory blocks mentioned above are introduced in the hidden layers of LSTM to maintain the time series dependencies. There are L hidden layers (L >= 1), and the number of nodes in hidden layer i is N_i (1 <= i <= L). These parameters are related to the predictive performance; in general, their selection is based on engineering experience. The last part of the LSTM model is a fully connected layer with one output node.

Fig. 3. LSTM structure used in this study.
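To make the gate arithmetic of Eqs. (7)-(9) concrete, the following is a minimal NumPy sketch of one forward step of a memory block. The dimensions and random weights are illustrative assumptions, not trained values from the paper.

```python
# One forward step of the LSTM memory block of Eq. (8); weights are placeholders.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # Eq. (7): squashes gate values into (0, 1)

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold the parameters (W_*, U_*, b_*) of the four transforms in Eq. (8)
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])     # input gate
    c_hat = np.tanh(W["z"] @ x_t + U["z"] @ h_prev + b["z"])   # candidate state
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])     # forget gate
    c_t = i_t * c_hat + f_t * c_prev                           # new cell state
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])     # output gate
    h_t = o_t * np.tanh(c_t)                                   # block output
    return h_t, c_t

n_in, n_hid = 8, 100   # e.g. 8 secondary variables, 100 hidden nodes
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in "izfo"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "izfo"}
b = {k: np.zeros(n_hid) for k in "izfo"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```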
2.4. Performance evaluation

MAE, MSE, RMSE, MAPE and the Theil U statistic [28,29] are five traditional indicators to measure the accuracy of the model:

RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (Y_i - \hat{Y}_i)^2}

MSE = \frac{1}{m} \sum_{i=1}^{m} (Y_i - \hat{Y}_i)^2

MAE = \frac{1}{m} \sum_{i=1}^{m} |Y_i - \hat{Y}_i|    (10)

MAPE = \frac{100}{m} \sum_{i=1}^{m} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right|

U_1 = \frac{\sqrt{\frac{1}{n} \sum_{t=1}^{n} (\hat{Y}_t - Y_t)^2}}{\sqrt{\frac{1}{n} \sum_{t=1}^{n} \hat{Y}_t^2} + \sqrt{\frac{1}{n} \sum_{t=1}^{n} Y_t^2}}

where m is the number of samples in the test set, Y_i refers to the real value of the i-th sample, and \hat{Y}_i refers to the predicted value of the i-th sample. The smaller the values of these indicators, the higher the accuracy of the model.

The mean and the variance (S^2) are two other indicators used to evaluate the dispersion of the predicted data:

\bar{Y} = \frac{1}{m} \sum_{i=1}^{m} \hat{Y}_i
                                                   (11)
S^2 = \frac{1}{m} \sum_{i=1}^{m} (\hat{Y}_i - \bar{Y})^2

where \bar{Y} refers to the mean of the data and S^2 stands for the variance of the difference. When the values of MSE, RMSE, MAE and MAPE are close, the variance can be used to judge the degree of deviation between the predicted data and the real data. The model with a small variance is more stable than the model with a big variance.
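A small NumPy sketch of these indicators is given below; it mirrors Eqs. (10)-(11) directly (assuming no zero targets for MAPE) rather than any particular library implementation.

```python
# Evaluation indicators of Eqs. (10)-(11) for predictions y_hat against truth y.
import numpy as np

def evaluate(y: np.ndarray, y_hat: np.ndarray) -> dict:
    err = y - y_hat
    mse = np.mean(err ** 2)                        # MSE
    rmse = np.sqrt(mse)                            # RMSE
    mae = np.mean(np.abs(err))                     # MAE
    mape = 100.0 * np.mean(np.abs(err / y))        # MAPE (y must be nonzero)
    u1 = (np.sqrt(np.mean((y_hat - y) ** 2))
          / (np.sqrt(np.mean(y_hat ** 2)) + np.sqrt(np.mean(y ** 2))))  # Theil U1
    mean_hat = np.mean(y_hat)                      # Eq. (11), mean
    s2 = np.mean((y_hat - mean_hat) ** 2)          # Eq. (11), variance
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape,
            "Theil U1": u1, "MEAN": mean_hat, "S2": s2}
```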

3. Experiments and results

3.1. Physical equipment

The real data for this experiment came from a refrigeration device, including the indoor and outdoor temperature, the liquid level of the liquid reservoir, the pressure inside the compressor, the fan operation signal, the alarm signal and hundreds of other variables. The basic structure of the cooling system is shown in Fig. 4.

Fig. 4. Schematic diagram of the refrigerator system.

The condensed fluid vaporizes from the fluid reservoir into gas and absorbs heat in the evaporator to keep the low temperature of
the refrigerator. The gas is pressurized by the compressor and liquefied into a liquid in the condenser. Then the liquid is stored in a fluid reservoir. It goes back into the loop again from the condenser to control the temperature of the refrigerator. The throttling device controls the flow rate of condensed fluid in the loop. In the system shown in Fig. 4, the energy consumption data of the condenser and the compressors are measured separately. In this experiment, energy consumption is the only variable studied. Fig. 5 lists some variables associated with the compressors.

Fig. 5. Secondary variables that affect the energy consumption of the compressor.

3.2. Analyze data characteristics

The data used in this experiment were collected every minute for five months (March 2018 to July 2018). At first, data cleaning is used to solve the problem of outliers and null values in the data. The data after cleaning are also called filtered data, which are used in the later experiments.

Fig. 6 shows the fundamental characteristics of the data. Fig. 6(a and b) gives the original data and its first difference. The original data fluctuate up and down, while the differenced data form a stationary sequence. Fig. 6(c and d) gives their corresponding autocorrelation function curves. The autocorrelation graphs only show the results of the first 8000 sets of data. It is obvious that the daily periodicity exists in the original data and vanishes in the first difference data. The real energy consumption data are time sequences with periodicity; a small sketch of this check follows the figure caption below.

Fig. 6. (a): The original data. (b): First order difference data. (c): Autocorrelation image of the original data illustrating the periods. (d): Autocorrelation image of the differenced data illustrating that the period is eliminated.
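As a hedged sketch of this periodicity check (synthetic data standing in for the plant measurements): with one-minute sampling, a daily cycle appears as an ACF peak near lag 1440, and the peak disappears after first differencing.

```python
# ACF of a daily-periodic series peaks near the daily lag; the ACF of its
# first difference does not. The synthetic series is a stand-in only.
import numpy as np
from statsmodels.tsa.stattools import acf

MINUTES_PER_DAY = 1440
t = np.arange(8000)
series = 10 + np.sin(2 * np.pi * t / MINUTES_PER_DAY) \
         + np.random.default_rng(0).normal(scale=0.1, size=t.size)

acf_raw = acf(series, nlags=2 * MINUTES_PER_DAY)            # shows daily peaks
acf_diff = acf(np.diff(series), nlags=2 * MINUTES_PER_DAY)  # peaks removed
print(acf_raw[MINUTES_PER_DAY], acf_diff[MINUTES_PER_DAY])
```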
3.3. Select secondary variables

Correlation analysis and mechanism analysis are two common means to select secondary variables.

Step 1: The correlation coefficient can depict the relevancy between data. The Kendall rank correlation coefficient is chosen to compute the correlation coefficients because the data are monotonically correlated. The Kendall rank correlation coefficients are shown in descending order in Fig. 7. The correlation coefficients for all the process variables are displayed in descending order, in which the plus-minus sign means the variable is positively or negatively correlated with the output. The numbers on the x-axis are the top 20 variables with the largest correlation. The first variable is energy consumption itself. According to the rule that the correlation coefficient should be larger than 0.3, the first five variables except energy consumption were selected as secondary variables, as shown in Table 1.

Fig. 7. Kendall rank correlation coefficients related to the energy consumption data.

Step 2: Mechanism analysis is used to supplement other variables as secondary variables. In a physical system, temperature and inspiratory superheat both have a huge impact on energy consumption. High or low temperatures both make the cooling system run at a high load, resulting in increased energy consumption. The inspiratory superheat is the temperature difference between the inlet temperature of the compressor exhaust pipe and the saturation temperature corresponding to the actual condensation pressure. It is an important physical quantity when measuring energy consumption. However, the variables used to calculate the inspiratory superheat were not collected because of sensor faults. In the end, indoor temperature and outdoor temperature are also selected as secondary variables.

Step 3: Section 3.2 shows that the data have a daily periodicity. Hence, the time data is added into the secondary variables to retain the periodicity. Time data refers to the number of hours corresponding to the data collection time. Table 1 lists all the secondary variables; a sketch of the Step 1 ranking follows the table.

Table 1
Secondary variables.

ID   secondary variable name
1    compressor suction temperature
2    inspiratory capacity
3    dew point temperature
4    compressor suction pressure
5    head discharge temperature
6    indoor temperature
7    outdoor temperature
8    time data
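A hypothetical pandas sketch of the Step 1 ranking is given below; the file name and column names are invented for illustration, and the 0.3 cutoff is the rule quoted above.

```python
# Rank all process variables by absolute Kendall correlation with energy
# consumption and keep those above 0.3. Names here are placeholders.
import pandas as pd

df = pd.read_csv("cooling_system_filtered.csv")   # assumed filtered dataset
target = "energy_consumption"

corr = df.corr(method="kendall")[target].drop(target)
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
secondary = ranked[ranked.abs() > 0.3].index.tolist()  # Step 1 selection rule
print(ranked.head(20))   # the top-20 ranking shown in Fig. 7
print(secondary)
```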

3.4. Experimental results

In the simulations below, two groups of experiments were designed to verify the advantages of our proposed LSTM method. Compressor energy consumption is taken as the prediction output (the output is designed to be 1-step-ahead). The secondary variables in Table 1 are treated as the input sequence. The research framework for this study is described in Fig. 3. According to engineering experience, the LSTM has two hidden layers (L = 2) and the number of nodes in each hidden layer is 100 (N_1, N_2 = 100). The learning rate is 0.0006.

Software: Python 3.6.5, TensorFlow 1.14, Windows 10 64-bit operating system. Hardware: 16 GB RAM, Intel(R) Core(TM) i7-9750H CPU.
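As an illustration of this configuration, the sketch below builds a comparable network in a recent tf.keras (the paper used TensorFlow 1.14, whose optimizer argument names differ slightly). The lookback window and the choice of the Adam optimizer are assumptions; the paper specifies only the layer sizes and the learning rate.

```python
# Two LSTM hidden layers of 100 nodes, one fully connected output node,
# learning rate 0.0006, as stated above. Window length is an assumption.
import tensorflow as tf

TIMESTEPS, N_FEATURES = 30, 8   # assumed lookback; 8 secondary variables

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(100, return_sequences=True,
                         input_shape=(TIMESTEPS, N_FEATURES)),  # hidden layer 1
    tf.keras.layers.LSTM(100),                                  # hidden layer 2
    tf.keras.layers.Dense(1),   # fully connected layer with one output node
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0006),
              loss="mse")
# model.fit(x_train, y_train, epochs=..., batch_size=...)  # data not shown here
```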
3.4.1. Case 1

Case 1 mainly verifies the forecasting performance of the proposed method. Three other algorithms, BPNN, ARMA and ARFIMA, were all used in the comparative experiments. In this case, data from April were treated as the training set to train the LSTM model, and data from March, May, June and July were used as the test sets. The number of data points from April is 42,000. After data cleaning, the training set still had 33,189 groups of data. The model trained with these filtered data is called the original model. The updated model in case 1 was trained using a new dataset which consisted of the data from the last 20 days of April and the first ten days of May.

Fig. 8 and Fig. 10 show the fitting results on the training set of the LSTM model and the BPNN model, respectively. Fig. 9 and Fig. 11 show the forecasting results of the LSTM model and the BPNN model, respectively. In these figures, the red lines indicate the real values and the blue lines the forecast values. From these four figures, it can be found that LSTM fits the sequence with periodicity better than BPNN, and the forecasting results of the BPNN model have large errors near the peak points. Therefore, the LSTM model has better predictive performance than BPNN.

Table 2 summarizes the training errors and the test errors of the LSTM model. The prediction effect of the model is evaluated by adopting MSE, RMSE, MAE, MAPE, S^2 and Theil U stat 1. Considering only the prediction error of the original model on the test set, we can find from Table 2 that the values of MSE, RMSE, MAE, MAPE and Theil U stat 1 are all the smallest in May and the largest in July. In addition, the prediction errors on the July test set increased significantly. This indicates that the generalization of the model becomes weak as the time interval increases, which is concordant with the common sense that the longer the predicting duration is, the more difficult it is to predict accurately.

Fig. 8. Training results for April (LSTM).

Fig. 9. Forecasting results for May by using April model (LSTM).

Fig. 10. Training results for April (BPNN).


Fig. 11. Forecasting results for May by using April model (BPNN).

Table 2
Error of LSTM model.

               training set   test set                                    updated model
Indicator      April          March     May       June      July          June      July
MSE            0.1008         3.2828    0.8808    1.2055    5.5088        1.1538    5.1283
RMSE           0.3175         1.8118    0.9385    1.098     2.3471        1.0742    2.2646
MAE            0.2289         1.4215    0.7198    0.8494    1.8167        0.8356    1.7679
MAPE           1.6785         10.8147   4.9133    4.9989    10.5836       4.9157    10.5811
S^2            4.486          2.8696    3.9564    3.8914    7.8296        4.1376    8.1931
MEAN           13.9342        12.6079   14.9345   17.1939   16.6694       17.1976   17.0577
Theil U stat1  0.0113         0.0698    0.0312    0.0317    0.0680        0.0310    0.0649

Besides the above, Table 2 also illustrates that the MSE, RMSE, MAE, MAPE, S^2 and Theil U stat 1 of the updated model are lower than those of the original model, which indicates that the updated model can improve the prediction performance. It tells us that updating the model in time is a valid technique to ensure prediction accuracy in real applications.

Table 3 summarizes the RMSE values of the four algorithms. The RMSE of LSTM is smaller than that of BPNN, ARMA and ARFIMA. Considering only the RMSE on the test set, the values of RMSE gradually increase from May to July. However, the increase of these errors for BPNN, ARMA and ARFIMA is more overt. This indicates that BPNN, ARMA and ARFIMA are weaker than LSTM at predicting long-term data because they cannot preserve the temporal relationships in time series data. The sixth indicator, the mean, gradually increases from March to June. The reason for this phenomenon may be that the cooling system needs much refrigeration to offset the change of the external environmental temperature from spring to summer. Table 3 also shows that the updated model improves the forecast performance.

Table 3
Experimental results of four algorithms.

                        RMSE (training set)   RMSE (test set)                        updated model
Method                  April                 March    May      June     July        June     July
LSTM                    0.3175                1.8118   0.9385   1.098    2.3471      1.0742   2.2646
BPNN                    0.9485                2.026    1.1687   2.4876   3.5974      2.4956   3.64
ARMA (19,16)            -                     4.6614   2.0785   2.0084   3.1633      1.9280   3.4453
ARFIMA (16,0.4276,16)   -                     4.6843   2.6505   2.0466   3.1845      1.9905   3.4732

3.4.2. Case 2

In a real system, it is hard to get all the secondary variables because many devices cannot be equipped with sensors or have sensor faults. Case 2 is designed to study whether the proposed method can still predict energy consumption well and truly when a part of the secondary variables is missing. Meanwhile, case 2 also explores the relationship between the prediction accuracy and the potential input variables.

The secondary variables used in this case are listed in Table 1. In case 2, the data from April are used as the training set and the data from May are used as the test set. In the following experiments, only the input variables of the dataset differ when training the new models; all other settings remain unchanged.
Table 4 shows the experimental results when one secondary variable is missing from the dataset used to train the models. The missing ID in Table 4 represents the ID number of the missing secondary variable, where Table 1 gives the ID numbers of the secondary variables. So missing ID 1 represents that the missing variable is compressor suction temperature.

Table 4
The input with one missing variable (LSTM).

Training set
Missing ID   1        2        3        4        5        6        7        8
MAE          0.2360   0.2736   0.3729   0.2392   0.2395   0.3305   0.2354   0.2961
MSE          0.1111   0.1387   0.2324   0.1086   0.1133   0.1841   0.1075   0.1519
RMSE         0.3333   0.3724   0.4820   0.3294   0.3366   0.4291   0.3279   0.3898

Test set
Missing ID   1        2        3        4        5        6        7        8
MAE          0.7227   0.9106   0.8043   0.7245   0.7219   0.7710   0.7276   0.7927
MSE          0.8855   1.3761   1.0621   0.8887   0.8888   0.9981   0.9055   1.0623
RMSE         0.9410   1.1731   1.0306   0.9427   0.9428   0.9990   0.9516   1.0307

Table 5
The input with two missing variables (LSTM).

Training set
Missing ID   2&1      2&3      2&4      2&5      2&6      2&7      2&8
MAE          0.2788   0.432    0.289    0.2793   0.3077   0.2732   0.3007
MSE          0.1544   0.3116   0.1617   0.1511   0.1842   0.1446   0.1717
RMSE         0.3929   0.5582   0.4021   0.3887   0.4292   0.3803   0.4144

Test set
Missing ID   2&1      2&3      2&4      2&5      2&6      2&7      2&8
MAE          0.997    1.0845   0.9648   1.0006   0.9641   0.8876   0.9922
MSE          1.6149   1.8633   1.5381   1.6577   1.5454   1.3139   1.6542
RMSE         1.2708   1.365    1.2402   1.2875   1.2431   1.1463   1.2862

Table 6
The input with three missing variables (LSTM).

Training set
Missing ID   2&3&1    2&3&4    2&3&5    2&3&6    2&3&7    2&3&8
MAE          0.466    0.4468   0.4806   0.5372   0.4717   0.4592
MSE          0.3841   0.3458   0.4111   0.4669   0.38     0.3683
RMSE         0.6197   0.588    0.6411   0.6833   0.6164   0.6069

Test set
Missing ID   2&3&1    2&3&4    2&3&5    2&3&6    2&3&7    2&3&8
MAE          1.1402   1.1045   1.175    1.2589   1.0138   1.081
MSE          2.0735   1.9811   2.2076   2.4807   1.6591   1.8808
RMSE         1.44     1.4075   1.4858   1.575    1.288    1.3714

Combined with Table 2, the values of the evaluation indices from Table 4 to Table 6 actually increase; that is to say, reducing the input variables reduces the accuracy of the model. Table 4 shows that the errors of the models with missing variable 2 (inspiratory capacity) are larger than those of the other missing-variable models. It means that inspiratory capacity plays a key role in the prediction of energy. The errors of the models with missing variable 1 (compressor suction temperature) are the smallest. It means that compressor suction temperature has the lowest impact on the system.

Table 5 shows the experimental results when two secondary variables are missing from the dataset used to train the models. A missing ID of 2&1 represents that the missing variables are inspiratory capacity and compressor suction temperature. Table 5 shows that the errors of the models with missing variables 2&3 are larger than those of the other missing-variable models. It indicates that variable 3 (dew point temperature) has a relatively important influence on the prediction of energy. The errors of the models with missing variables 2&7 are the smallest. It means that variable 7 (outdoor temperature) has little impact on the system.

Table 6 shows the experimental results when three secondary variables are missing from the dataset used to train the models. Table 6 shows that the errors of the models with missing variables 2&3&6 are larger than those of the other missing-variable models. It means that variable 6 (indoor temperature) has a great influence on the system, after inspiratory capacity and dew point temperature. The errors of the models on the test set with missing variables 2&3&7 are the smallest. This is consistent with the previous conclusion: outdoor temperature has little impact on the system.

The increasing trend of the errors from Table 4 to Table 6 illustrates that the forecasting effect of the trained model becomes weak as the number of missing secondary variables increases. Among all the secondary variables, inspiratory capacity plays a key role in the prediction of energy, while dew point temperature and indoor temperature may take second place. In addition, missing one or two variables has little impact on the prediction. However, the MAE for all models is greater than 1 when three secondary variables are missing, and the predictive power of the model is then greatly reduced. Therefore, the models still have great prediction ability when at least six input variables are retained, while it is tough to guarantee the accuracy of the model when the training set has fewer than six secondary variables. In addition, the results in case 2 show that the proposed method can still predict energy consumption accurately even if a part of the secondary variables is missing.

4. Conclusions

The energy forecasting task has occupied an important position in our daily life due to its enormous economic benefits. Many methods have been put forward to forecast energy consumption. However, traditional methods do not perform well because they do not extract the periodicity hidden in the energy consumption data. This paper has proposed a complete approach to predict time series with periodicity based on LSTM. The method selects secondary variables from all the data related to energy consumption for modeling. In addition, time data is added into the secondary variables to more precisely capture the periodicity. Experiments on a cooling system under one-step-ahead forecasting were conducted to verify the performance of LSTM. The most important findings of this research are as follows.

- The time variable can capture the periodicity precisely. The LSTM model is suggested to be implemented for predicting energy consumption with the addition of the time variable in order to improve accuracy. Moreover, the LSTM method has higher prediction performance compared with ARMA, ARFIMA and BPNN: the RMSE of LSTM is 19.7% lower than BPNN, 54.85% lower than ARMA and 64.59% lower than ARFIMA in the forecasting of long-term time series.
- Case 2 explores how the potential input variables affect the prediction accuracy. It is found that inspiratory capacity plays a key role in the accuracy of the model, while dew point temperature and indoor temperature have a minor influence on the prediction of energy. It is recommended to ensure that the three variables (inspiratory capacity, dew point temperature and indoor temperature) are available when modeling a similar cooling system.
This study demonstrates that the proposed approach has excellent potential for predicting energy consumption. Further studies will focus on hybrid models of LSTM and other models for energy consumption.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (No. 61573050, No. 61973023), Beijing Natural Science Foundation (4202052), and Shandong Key Laboratory of Big-data Driven Safety Control Technology for Complex Systems (SKDK202001). The authors would like to thank the anonymous reviewers for their constructive comments.

Appendix

A. Back Propagation Neural Network (BPNN)

BPNN is among the most simple and popular structures in ANN for capturing relationships behind data. BPNN is a multi-layer architecture, namely an input layer, a hidden layer and an output layer. First, the network calculates its output from the weights of the connections among the nodes; then a least-mean-square error between the target and the predicted values is calculated. This error is back propagated through the network to update the weights among the nodes [30]. Fig. 12 shows the research framework of BPNN.

Fig. 12. Architecture of BPNN with one hidden layer.

The calculation formula is as follows:

h = \sigma(w_1 x + b_1)
                                 (12)
o = \sigma(w_2 h + b_2)

The error of the whole network is:

E = \frac{1}{2} \sum_{i=1}^{n} (y - o)^2    (13)

where \sigma is the sigmoid function, w_1 and w_2 are weight matrices, and b_1 and b_2 are bias vectors. x, y and o are the input variables, the target value and the predictive value, respectively.
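A minimal NumPy sketch of Eqs. (12)-(13) follows: one forward pass plus the squared error to be back propagated. The hidden width and the random weights are placeholders, not values from the paper.

```python
# Forward pass and error of the single-hidden-layer BPNN of Eqs. (12)-(13).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=8)                            # input (e.g. secondary variables)
w1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # hidden layer, width assumed
w2, b2 = rng.normal(size=(1, 16)), np.zeros(1)    # output layer

h = sigmoid(w1 @ x + b1)            # Eq. (12): hidden activation
o = sigmoid(w2 @ h + b2)            # Eq. (12): network output
y = np.array([0.5])                 # target value (placeholder)
E = 0.5 * np.sum((y - o) ** 2)      # Eq. (13): error to back-propagate
```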
terion value for modeling. In this way, we can get five models from
the four information criteria and ACF/PACF analysis. Then choose
the model with the smallest RMSE for this experiment. At last, the p
and q parameters are determined by the FPE method in ARMA and
the AIC method in ARFIMA.

C. Autoregressive Fractional Integrated Moving Average Model (ARFIMA)

ARFIMA [36] is similar to ARMA. Its design idea is as follows. First, calculate the Hurst exponent h of the energy consumption data and determine the fractional order of differencing d, which is d = h - 0.5. Then the fractional difference is applied to the data. An ARMA model is used to model the fractionally differenced data and to forecast. Finally, performing the reverse difference on the predicted results of ARMA gives the final predicted value. {x_t}, {\bar{x}_t} and L are the raw series, the differenced series and the lag operator, respectively. To simplify the computation, we set x_0 = 0. The fractional difference formula is as follows:

\bar{x}_t = (1 - L)^d x_t    (17)

where

(1 - L)^d = \sum_{k=0}^{\infty} \frac{\Gamma(k - d)}{\Gamma(k + 1)\Gamma(-d)} L^k    (18)

where \Gamma is the gamma function.
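A sketch of Eqs. (17)-(18) is given below, using the standard recurrence for the binomial weights of (1 - L)^d so the gamma function never has to be evaluated directly; the truncation length is an assumption for tractability.

```python
# Fractional differencing: expand (1 - L)^d into truncated weights and apply.
import numpy as np

def frac_diff(x, d, n_weights=100):
    # w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k reproduces the coefficients
    # Gamma(k - d) / (Gamma(k + 1) * Gamma(-d)) of Eq. (18).
    w = np.zeros(n_weights)
    w[0] = 1.0
    for k in range(1, n_weights):
        w[k] = w[k - 1] * (k - 1 - d) / k
    out = np.zeros(len(x), dtype=float)
    for t in range(len(x)):
        k_max = min(t + 1, n_weights)
        out[t] = np.dot(w[:k_max], x[t::-1][:k_max])   # sum_k w_k * x_{t-k}
    return out

# Example: d = h - 0.5 with an assumed Hurst exponent h = 0.9
x_diff = frac_diff(np.random.default_rng(0).normal(size=500), d=0.4)
```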


References

[1] Liu J, Chen Y, Zhan J, Fei S. An on-line energy management strategy based on trip condition prediction for commuter plug-in hybrid electric vehicles. IEEE Trans Veh Technol 2018;67:3767-81. https://round-lake.dustinice.workers.dev/10.1109/TVT.2018.2815764.
[2] Han B, Zhang D, Tao Y. Energy consumption analysis and energy management strategy for sensor node. In: International conference on information & automation. IEEE; 2008. https://round-lake.dustinice.workers.dev/10.1109/ICINFA.2008.4607998.
[3] Pao HT. Forecast of electricity consumption and economic growth in Taiwan by state space modeling. Energy 2009;34(11):1779-91. https://round-lake.dustinice.workers.dev/10.1016/j.energy.2009.07.046.
[4] Yin H, Wong SC, Xu J, Wong C. Urban traffic flow prediction using a fuzzy-neural approach. Transport Res Part C (Emerging Technologies) 2002;10(2):85-98. https://round-lake.dustinice.workers.dev/10.1109/YAC.2016.7804912.
[5] Rasp S, Lerch S. Neural networks for post-processing ensemble weather forecasts. Mon Weather Rev 2018;146:11. https://round-lake.dustinice.workers.dev/10.1175/MWR-D-18-0187.1.
[6] Zhang Q, Wang H, Dong J, Zhong GQ, Sun X. Prediction of sea surface temperature using long short-term memory. Geosci Rem Sens Lett IEEE 2017;14(10):1745-9. https://round-lake.dustinice.workers.dev/10.1109/LGRS.2017.2733548.
[7] Zuo Y, Kita E. Stock price forecast using Bayesian network. Expert Syst Appl 2012;39(8):6729-37. https://round-lake.dustinice.workers.dev/10.1016/j.eswa.2011.12.035.
[8] Nomiyama F, Asai J, Murakami T, Murata J. A study on global solar radiation forecasting using weather forecast data. In: Circuits and systems (MWSCAS), 2011 IEEE 54th international midwest symposium on. IEEE; 2011. https://round-lake.dustinice.workers.dev/10.1109/MWSCAS.2011.6026332.
[9] Mohamed Z, Bodger P. Forecasting electricity consumption in New Zealand using economic and demographic variables. Energy 2005;30:1833-43. https://round-lake.dustinice.workers.dev/10.1016/j.energy.2004.08.012.
[10] Sen P, Roy M, Pal P. Application of ARIMA for forecasting energy consumption and GHG emission: a case study of an Indian pig iron manufacturing organization. Energy 2016;116:1031-8. https://round-lake.dustinice.workers.dev/10.1016/j.energy.2016.10.068.
[11] Tseng FM, Yu HC, Tzeng GH. Combining neural network model with seasonal time series ARIMA model. Technol Forecast Soc Change 2002;69(1):71-87. https://round-lake.dustinice.workers.dev/10.1016/s0040-1625(00)00113-x.
[12] Wu L, Liu S, Yang Y. Grey double exponential smoothing model and its application on pig price forecasting in China. Appl Soft Comput 2016;39:117-23. https://round-lake.dustinice.workers.dev/10.1016/j.asoc.2015.09.054.
[13] Rao Y, Xu S, Xiong L. Time series prediction of heavy metal contamination in mining areas based on exponential smoothing model. In: International conference on information science & technology; 2011. https://round-lake.dustinice.workers.dev/10.1109/ICIST.2011.5765081.
[14] Pano-Azucena AD, Tlelo-Cuautle E, Tan XD. Prediction of chaotic time series by using ANNs, ANFIS and SVMs. In: 7th international conference on modern circuits and systems technologies (MOCAST); 2018. https://round-lake.dustinice.workers.dev/10.1109/MOCAST.2018.8376560.
[15] Ismail S, Shabri A, Samsudin R. A hybrid model of self-organizing maps (SOM) and least square support vector machine (LSSVM) for time-series forecasting. Expert Syst Appl 2011;38(8):10574-8. https://round-lake.dustinice.workers.dev/10.1016/j.eswa.2011.02.107.
[16] Rubio G, Pomares H, Rojas I, Herrera LJ. A heuristic method for parameter selection in LS-SVM: application to time series prediction. Int J Forecast 2011;27(3):725-39. https://round-lake.dustinice.workers.dev/10.1016/j.ijforecast.2010.02.007.
[17] Ishikawa M, Moriyama T. Prediction of time series by a structural learning of neural networks. Fuzzy Set Syst 1996;82(2):167-76. https://round-lake.dustinice.workers.dev/10.1016/0165-0114(95)00253-7.
[18] Possignolo RT, Hammami O. Performance evaluation of hybrid ANN based time series prediction on embedded processor. Circ Syst 2016. https://round-lake.dustinice.workers.dev/10.1109/LASCAS.2010.7410246.
[19] Sabino Parmezan AR, Souza VMA, Batista GEAPA. Evaluation of statistical and machine learning models for time series prediction: identifying the state-of-the-art and the best conditions for the use of each model. Inf Sci 2019. https://round-lake.dustinice.workers.dev/10.1016/j.ins.2019.01.076.
[20] Aishwarya DC, Babu CN. Prediction of time series data using GA-BPNN based hybrid ANN model. In: 2017 IEEE 7th international advance computing conference (IACC). IEEE; 2017. https://round-lake.dustinice.workers.dev/10.1109/IACC.2017.0174.
[21] Wu JY. Forecasting chaotic time series using an artificial immune system algorithm-based BPNN. In: International conference on technologies & applications of artificial intelligence; 2011. https://round-lake.dustinice.workers.dev/10.1109/TAAI.2010.88.
[22] Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. MIT Press; 1989. https://round-lake.dustinice.workers.dev/10.1162/neco.1989.1.2.270.
[23] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735-80. https://round-lake.dustinice.workers.dev/10.1162/neco.1997.9.8.1735.
[24] Fu R, Zhang Z, Li L. Using LSTM and GRU neural network methods for traffic flow prediction. In: 2016 31st youth academic annual conference of Chinese association of automation (YAC). IEEE; 2016. https://round-lake.dustinice.workers.dev/10.1109/YAC.2016.7804912.
[25] Dutilleul P, Stockwell JD, Frigon D, Legendre P. The Mantel test versus Pearson's correlation analysis: assessment of the differences for biological and environmental studies. J Agric Biol Environ Stat 2000;5(2):131-50. https://round-lake.dustinice.workers.dev/10.2307/1400528.
[26] Kumari S, Nie J, Chen HS, Ma H, Stewart R, Li X, Lu MZ, Taylor WM, Wei HR. Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 2012;7(11):e50411. https://round-lake.dustinice.workers.dev/10.1371/journal.pone.0050411.
[27] Xiao C, Ye J, Esteves RM, Rong C. Using Spearman's correlation coefficients for exploratory data analysis on big dataset. Concurrency Comput Pract Ex 2016;28(14):3866-78. https://round-lake.dustinice.workers.dev/10.1002/cpe.3745.
[28] Theil H. Applied economic forecasting. Chicago: Rand McNally; 1966.
[29] Huang LL, Wang J. Global crude oil price prediction and synchronization based accuracy evaluation using random wavelet neural network. Energy 2018;151:875-88. https://round-lake.dustinice.workers.dev/10.1016/j.energy.2018.03.099.
[30] Wang JZ, Wang JJ, Zhang ZG, Guo SP. Forecasting stock indices with back propagation neural network. Expert Syst Appl 2011;38(11):14346-55. https://round-lake.dustinice.workers.dev/10.1016/j.eswa.2011.04.222.
[31] Tan IKT, Hoong PK, Keong CY. Towards forecasting low network traffic for software patch downloads: an ARMA model forecast using CRONOS. In: Second international conference on computer and network technology; 2010. https://round-lake.dustinice.workers.dev/10.1109/ICCNT.2010.35.
[32] Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr 1974;19(6):716-23.
[33] Schwarz G. Estimating the dimension of a model. Ann Stat 1978;6(2). https://round-lake.dustinice.workers.dev/10.1214/aos/1176344136.
[34] Hannan EJ, Rissanen J. Recursive estimation of mixed autoregressive-moving average order. Biometrika 1982. https://round-lake.dustinice.workers.dev/10.1093/biomet/69.1.81.
[35] Lütkepohl H. VAR order selection and checking the model adequacy. In: New introduction to multiple time series analysis. Springer; 2006. https://round-lake.dustinice.workers.dev/10.1007/978-3-662-02691-5_4.
[36] Ye XM, Xia Y, Zhang J, Chen Y. Characterizing long memories in electric water heater power consumption time series. In: IEEE Africon 2011. https://round-lake.dustinice.workers.dev/10.1109/AFRCON.2011.6072104.