LSTM Based Long-Term Energy Consumption Prediction With Periodicity

Energy
journal homepage: www.elsevier.com/locate/energy

Article history:
Received 21 September 2019
Received in revised form 22 January 2020
Accepted 18 February 2020
Available online 21 February 2020

Keywords:
Periodic time series
Energy consumption prediction
Secondary variable selection
Long short-term memory (LSTM)

Abstract
Energy consumption information is a kind of time series with periodicity in many real systems, while general forecasting methods do not account for this periodicity. This paper proposes a novel approach based on the long short-term memory (LSTM) network for predicting periodic energy consumption. Firstly, hidden features are extracted from autocorrelation graphs of real industrial data. Correlation analysis and mechanism analysis contribute to finding the appropriate secondary variables to serve as model inputs. In addition, the time variable is added to precisely capture the periodicity. Then an LSTM network is constructed to model and forecast the sequential data. Experimental results on a cooling system demonstrate that the proposed method has higher prediction performance than several traditional forecasting methods, such as the autoregressive moving average model (ARMA), the autoregressive fractional integrated moving average model (ARFIMA) and the back propagation neural network (BPNN): the RMSE of LSTM is 19.7%, 54.85% and 64.59% lower than that of BPNN, ARMA and ARFIMA, respectively, on the May test data. Furthermore, considering the limitation of missing measuring equipment, new prediction models with reduced sets of secondary variables are retrained to explore the relationship between the prediction accuracy and the potential input variables. The experimental results demonstrate that the proposed algorithm has excellent generalization capability.

© 2020 Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.energy.2020.117197
Energy consumption data is a kind of time series with randomness, uncertainty, chaos, nonlinearity and periodicity. This section details the proposed framework for long-term energy consumption prediction with distinct periodicity. The techniques involved are given in the following.

2.1. Design philosophy

Fig. 1 illustrates the universal process of time series prediction. The specific steps are as follows:

Step 1: Project understanding. Analyze the prediction requirements and understand the process mechanism. Initially determine the relevant variables based on the mechanism analysis.

Step 2: Data preparation. First, collect the process data related to the forecast target from the industrial system. Generally there are many contaminations among the real measurements, such as null values due to sensor faults, outliers with abrupt variation, and noise. Eliminating these abnormal data and denoising should be implemented to obtain the filtered data.

Step 3: Data analysis. The goal of this step is to explore the hidden patterns and the distribution characteristics of the filtered data. For example, line graphs of variables can reveal their changing properties and distribution characteristics. Autocorrelation and cross-correlation analysis can help seek the hidden periodicity and determine the degree of correlation among variables. Data differencing can eliminate the variation trend of the data, especially the periodicity. Many statistical analysis methods can help provide reference opinions for the subsequent selection of research variables and model types. (A brief sketch of Steps 2 and 3 follows the step list.)

Step 4: Secondary variable determination. There are many variables that affect the forecasting target in real industry. Some have a large impact and some have little influence, so it is critical to identify the process variables that have the most impact on energy prediction. Secondary variables generally refer to the variables that have a great influence on the predicted object; they are determined by a combination of correlation analysis and mechanism analysis.

Step 5: Predictive modeling. There are many approaches to forecast time series, such as RNNs, ARMA and BPNN. Which one is the most appropriate model for the given predictive problem? We should answer this question and train the prediction model according to the secondary variables. By adjusting the parameters of the model, the model with the highest prediction accuracy is obtained.

Step 6: Model evaluation. In general, several indicators are used to evaluate the prediction performance and the generalization ability of the model, such as the mean squared error (MSE), root mean squared error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE) and variance. Section 2.4 introduces these indicators in detail.

Step 7: Model updating. Any prediction model gradually deviates from the true values as time increases. Therefore, an update strategy for the prediction model is inevitable. Here two criteria are considered for updating the model. The first is that if the cumulative daily error reaches a threshold, the model is updated. The other is regular updating, for example updating the model at a fixed time interval.

Step 8: Result analysis. Analyzing the prediction results of the model can help find implicit conclusions. This may provide ideas for future research.
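As an illustration of Steps 2 and 3, the following minimal Python sketch cleans a measured series and checks its autocorrelation before and after first-order differencing. The synthetic series, 3-sigma outlier rule and rolling-median filter are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
import pandas as pd

def clean_series(df: pd.DataFrame, col: str = "energy") -> pd.DataFrame:
    """Step 2: remove null values, 3-sigma outliers, and high-frequency noise."""
    df = df.dropna(subset=[col]).copy()
    mu, sigma = df[col].mean(), df[col].std()
    df = df[(df[col] - mu).abs() <= 3 * sigma].copy()      # drop abrupt outliers
    df[col] = df[col].rolling(5, min_periods=1).median()   # simple denoising filter
    return df

def autocorr(x: np.ndarray, lag: int) -> float:
    """Step 3: sample autocorrelation at one lag; peaks reveal the period."""
    x = x - x.mean()
    return float(x[:-lag] @ x[lag:] / (x @ x))

# Synthetic stand-in for the industrial data: a 144-sample period plus noise and nulls.
t = np.arange(4000)
raw = pd.DataFrame({"energy": 10 + np.sin(2 * np.pi * t / 144) + 0.1 * np.random.randn(4000)})
raw.iloc[::500] = np.nan
df = clean_series(raw)
energy = df["energy"].to_numpy()
diff = np.diff(energy)                                     # first-order difference
for lag in (1, 144, 288):
    print(lag, round(autocorr(energy, lag), 3), round(autocorr(diff, lag), 3))
```

The autocorrelation peak at the period lag disappears after differencing, which is the effect illustrated in Fig. 6.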
2.2. Correlation analysis

For two variables x and y with n observations each, the sample means are

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad (1)

\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad (2)

2.2.2. Kendall rank correlation coefficient
The Kendall rank correlation coefficient COR(x, y) is

\mathrm{COR}(x, y) = \frac{D_1 - D_2}{n(n-1)/2} \quad (4)

where D_1 and D_2 represent the numbers of concordant pairs and discordant pairs, respectively.

2.2.3. Spearman rank correlation coefficient
The Spearman rank correlation coefficient is very similar to the Kendall rank correlation coefficient: both are adept at measuring the monotone correlation between variables. However, the Kendall method is faster than the Spearman method in calculating the correlation coefficients of ordered variables. With x_i and y_i denoting the ranks of the i-th observations, the sum of squared rank differences is

S = \sum_{i=1}^{n} (x_i - y_i)^2 \quad (5)

and the Spearman correlation coefficient COR(x, y) is

\mathrm{COR}(x, y) = 1 - \frac{6S}{n(n^2 - 1)} \quad (6)
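Both rank coefficients can be computed directly from their definitions. The sketch below implements Eq. (4) and Eqs. (5)-(6) with NumPy and cross-checks them against SciPy, assuming no tied samples (in which case Kendall's tau-b reduces to Eq. (4)).

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def kendall_cor(x, y):
    """Eq. (4): (D1 - D2) / (n(n-1)/2), D1/D2 = concordant/discordant pair counts."""
    n = len(x)
    d1 = d2 = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            if s > 0:
                d1 += 1
            elif s < 0:
                d2 += 1
    return (d1 - d2) / (n * (n - 1) / 2)

def spearman_cor(x, y):
    """Eqs. (5)-(6): COR = 1 - 6S / (n(n^2 - 1)), S = sum of squared rank differences."""
    rx = np.argsort(np.argsort(x)) + 1   # ranks (no ties assumed)
    ry = np.argsort(np.argsort(y)) + 1
    s = np.sum((rx - ry) ** 2)
    n = len(x)
    return 1 - 6 * s / (n * (n ** 2 - 1))

x, y = np.random.rand(50), np.random.rand(50)
assert np.isclose(kendall_cor(x, y), kendalltau(x, y)[0])
assert np.isclose(spearman_cor(x, y), spearmanr(x, y)[0])
```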
2.3. Long short-term memory network

The hidden units of an LSTM network are memory blocks, which are self-connected. Three special multiplicative units called gates are introduced to store temporal sequences. The functions of the three gates are as follows:

Input gate: controls the amount of current information flowing into the memory cell.
Output gate: controls the amount of current information flowing into the rest of the network.
Forget gate: selects the cell state at the previous moment and adaptively retains part of its information into the current moment.

These three gates amount to multiplying the previous information by a number between 0 and 1. When the number is 0, all the previous information flow is discarded; when the number is 1, all of it is retained. The three gates all use sigmoid functions, which limit the data to the range 0-1. The sigmoid function is defined as

\sigma(x) = \frac{1}{1 + e^{-x}} \quad (7)

The gate and state updates take the standard form

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad z_t = \tanh(W_z x_t + U_z h_{t-1} + b_z)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t \odot c_{t-1} + i_t \odot z_t, \quad h_t = o_t \odot \tanh(c_t) \quad (8)

where W_i, U_i, W_z, U_z, W_f, U_f, W_o and U_o are weight matrices, and b_i, b_z, b_f and b_o are bias vectors. x_t is the current input; h_t and h_{t-1} are the outputs at the current time t and the previous time t-1, respectively. The hyperbolic tangent function can be expressed as

\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \quad (9)
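A single forward step of the memory block can be written directly from Eqs. (7)-(9). The sketch below is a plain-NumPy illustration; the dictionary p holding the weight matrices and bias vectors named above is an assumed container, and shapes and initialization are left open.

```python
import numpy as np

def sigmoid(x):
    """Eq. (7)."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of the memory block, following Eq. (8)."""
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])   # input gate
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])   # forget gate
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])   # output gate
    z = np.tanh(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])   # candidate state, Eq. (9)
    c = f * c_prev + i * z            # forget part of the old cell state, add new information
    h = o * np.tanh(c)                # output at the current time t
    return h, c
```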
Fig. 3 shows the research framework of LSTM. Like other neural networks, LSTM has a multi-layer architecture, namely an input layer, hidden layers and an output layer. The memory blocks mentioned above are introduced in the hidden layers of LSTM to maintain the time series dependencies. There are L hidden layers (L ≥ 1), and the number of nodes in the i-th hidden layer is N_i (1 ≤ i ≤ L). These parameters are related to the predictive performance; in general, their selection is based on engineering experience. The last part of the LSTM model is a fully connected layer with one output node.

2.4. Performance evaluation

MAE, MSE, RMSE, MAPE and the Theil U statistic [28,29] are five traditional indicators to measure the accuracy of the model:

\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(Y_i - \hat{Y}_i)^2}

\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}(Y_i - \hat{Y}_i)^2

\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}|Y_i - \hat{Y}_i|

where m is the number of samples in the test set, Y_i is the real value of the i-th sample and \hat{Y}_i is the predicted value of the i-th sample. The smaller these values are, the higher the accuracy of the model is.

The mean and variance (S^2) are two further indicators, used to evaluate the dispersion of the predicted data:

\bar{Y} = \frac{1}{m}\sum_{i=1}^{m}\hat{Y}_i, \qquad S^2 = \frac{1}{m}\sum_{i=1}^{m}(\hat{Y}_i - \bar{Y})^2 \quad (11)
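These indicators translate directly into code. In the sketch below, the MAPE and Theil U1 expressions follow their common textbook definitions [28,29], since the paper's exact forms for them are not reproduced here.

```python
import numpy as np

def evaluate(y: np.ndarray, y_hat: np.ndarray) -> dict:
    """Accuracy and dispersion indicators of Section 2.4."""
    err = y - y_hat
    mse = float(np.mean(err ** 2))
    rmse = mse ** 0.5
    return {
        "MSE": mse,
        "RMSE": rmse,
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / y))),   # common definition (assumed form)
        "Theil U1": rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(y_hat ** 2))),
        "mean": float(np.mean(y_hat)),             # Eq. (11)
        "S2": float(np.var(y_hat)),                # Eq. (11), variance of the predictions
    }
```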
Correlation analysis and mechanism analysis are two common methods to determine the secondary variables.

Fig. 5. Secondary variables that affect the energy consumption of the compressor.

Fig. 6. (a) The original data; (b) the first-order differenced data; (c) the autocorrelation of the original data, illustrating the periods; (d) the autocorrelation of the differenced data, illustrating that the period is eliminated.

3.4.1. Case 1
Case 1 mainly verifies the forecasting performance of the proposed method. Three other algorithms, BPNN, ARMA and ARFIMA, were used in the comparative experiments. In this case, data from April were treated as the training set to train the LSTM model, and data from March, May, June and July were used as the test sets. The number of data points from April is 42,000; after data cleaning, the training set still had 33,189 groups of data. The model trained with these filtered data is called the original model. The updated model in Case 1 was trained on a new dataset consisting of the data from the last 20 days of April and the first ten days of May. (A code sketch of this setup is given below.)
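A compact sketch of this setup, using Keras as one possible implementation: the window length, layer sizes and training hyperparameters are illustrative assumptions, and the random arrays stand in for the cleaned April and May data of the eight secondary variables and the energy values.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def make_model(window: int = 30, n_features: int = 8) -> keras.Model:
    """Stacked-LSTM regressor in the spirit of Fig. 3; sizes are assumed, not the paper's."""
    m = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        layers.LSTM(64, return_sequences=True),   # hidden layer 1 (N1 = 64, assumed)
        layers.LSTM(32),                          # hidden layer 2 (N2 = 32, assumed)
        layers.Dense(1),                          # fully connected layer, one output node
    ])
    m.compile(optimizer="adam", loss="mse")
    return m

def windows(X: np.ndarray, y: np.ndarray, w: int = 30):
    """Slice a multivariate series into (samples, w, features) windows for one-step-ahead prediction."""
    idx = np.arange(w, len(X))
    return np.stack([X[i - w:i] for i in idx]), y[idx]

rng = np.random.default_rng(0)
april_X, april_y = rng.normal(size=(4000, 8)), rng.normal(size=4000)  # stand-ins for the April training data
may_X, may_y = rng.normal(size=(4000, 8)), rng.normal(size=4000)      # stand-ins for the May test data

Xtr, ytr = windows(april_X, april_y)
Xte, yte = windows(may_X, may_y)
model = make_model()
model.fit(Xtr, ytr, epochs=20, batch_size=128, validation_split=0.1)
rmse = float(np.sqrt(model.evaluate(Xte, yte, verbose=0)))
```

An "updated" model would be produced the same way, refitting on the last 20 days of April plus the first ten days of May.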
Fig. 8 and Fig. 10 show the fitting results on the training set of the LSTM model and the BPNN model, respectively, and Fig. 9 and Fig. 11 show the forecasting results of the two models. In these figures the red lines indicate the real values and the blue lines the forecast values. From these four figures, it can be seen that LSTM fits the sequence with periodicity better than BPNN, and the forecasting results of the BPNN model have large errors near the peak points. Therefore, the LSTM model has better predictive performance than BPNN.

Table 1
Secondary variables.

ID   Secondary variable name
1    compressor suction temperature
2    inspiratory capacity
3    dew point temperature
4    compressor suction pressure
5    head discharge temperature
6    indoor temperature
7    outdoor temperature
8    time data

Fig. 11. Forecasting results for May by using the April model (BPNN).

Table 2
Errors of the LSTM model.

Table 2 summarizes the training errors and the test errors of the LSTM model. The prediction effect of the model is evaluated by adopting MSE, RMSE, MAE, MAPE, S^2 and Theil U stat 1. When only the prediction error of the original model on the test set is considered, we can find from Table 2 that the values of MSE, RMSE, MAE, MAPE and Theil U stat 1 are all the smallest in May and the largest in July. In addition, the prediction errors on the test set in July increased significantly. This indicates that the generalization of the model becomes weaker as the time interval increases, which is concordant with the common sense that the longer the predicting duration is, the more difficult it is to predict accurately.
Besides the above, Table 2 also illustrates that the MSE, RMSE, MAE, MAPE, S^2 and Theil U stat 1 of the updated model are lower than those of the original model, which indicates that the updated model improves the prediction performance. It tells us that updating the model in time is a valid technique to ensure prediction accuracy in real applications.

Table 3 summarizes the RMSE values of the four algorithms. The RMSE of LSTM is smaller than that of BPNN, ARMA and ARFIMA. When only the RMSE on the test set is considered, the RMSE values gradually increase from May to July; however, the increase of these errors for BPNN, ARMA and ARFIMA is more overt. This indicates that BPNN, ARMA and ARFIMA are weaker than LSTM at predicting long-term data because they cannot preserve the temporal relationships in time series data. The sixth indicator, the mean, gradually increased from March to June. The reason for this phenomenon may be that the cooling system needs much more refrigeration to offset the change of the external environmental temperature from spring to summer. Table 3 also shows that the updated model improves the forecast performance.

3.4.2. Case 2
In a real system it is hard to obtain all the secondary variables, because many devices cannot be equipped with sensors or suffer sensor faults. Case 2 is designed to study whether the proposed method can still predict energy consumption accurately with a part of the secondary variables missing. Meanwhile, Case 2 also explores the relationship between the prediction accuracy and the potential input variables.

The secondary variables used in this case are listed in Table 1. In Case 2, the data from April are used as the training set and the data from May are used as the test set. In the following experiments, only the input variables of the dataset differ when training new models; all other settings remain unchanged. Table 4 shows the experimental results when one secondary variable is missing from the dataset used to train the models. The missing ID in Table 4 is the ID number of the missing secondary variable as given in Table 1; missing ID 1, for example, means that the missing variable is the compressor suction temperature. Combined with Table 2, the values of the evaluation indices from Table 4 to Table 6 actually increase; that is to say, reducing the input variables reduces the accuracy of the model. (A sketch of this retraining protocol follows.)
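One way to organize this ablation, reusing the hypothetical make_model and windows helpers and the stand-in arrays from the Case 1 sketch; the Table 1 IDs are assumed to map onto the feature columns in order.

```python
import itertools
import numpy as np

FEATURE_IDS = list(range(1, 9))                 # Table 1 IDs 1..8

def rmse_without(missing, april_X, april_y, may_X, may_y):
    """Retrain with the given Table 1 IDs dropped and return the test RMSE."""
    keep = [i - 1 for i in FEATURE_IDS if i not in missing]   # kept column indices
    Xtr, ytr = windows(april_X[:, keep], april_y)
    Xte, yte = windows(may_X[:, keep], may_y)
    m = make_model(n_features=len(keep))
    m.fit(Xtr, ytr, epochs=20, batch_size=128, verbose=0)
    return float(np.sqrt(m.evaluate(Xte, yte, verbose=0)))

# Missing-one and missing-two experiments (cf. Tables 4 and 5):
for missing in itertools.chain(((i,) for i in FEATURE_IDS),
                               ((2, j) for j in FEATURE_IDS if j != 2)):
    print(missing, rmse_without(set(missing), april_X, april_y, may_X, may_y))
```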
Table 3
Experimental results of the four algorithms (RMSE for the training set, the test set and the updated model, by month).

Table 4
Errors when one input variable is missing (LSTM), for missing IDs 1-8 on the training and test sets.

Table 5
Errors when two input variables are missing (LSTM).

dataset      train
missing ID   2&1     2&3     2&4     2&5     2&6     2&7     2&8
MAE          0.2788  0.432   0.289   0.2793  0.3077  0.2732  0.3007
MSE          0.1544  0.3116  0.1617  0.1511  0.1842  0.1446  0.1717
RMSE         0.3929  0.5582  0.4021  0.3887  0.4292  0.3803  0.4144

dataset      test
missing ID   2&1     2&3     2&4     2&5     2&6     2&7     2&8
MAE          0.997   1.0845  0.9648  1.0006  0.9641  0.8876  0.9922
MSE          1.6149  1.8633  1.5381  1.6577  1.5454  1.3139  1.6542
RMSE         1.2708  1.365   1.2402  1.2875  1.2431  1.1463  1.2862
Table 6
Errors when three input variables are missing (LSTM).

dataset      train
missing ID   2&3&1   2&3&4   2&3&5   2&3&6   2&3&7   2&3&8
MAE          0.466   0.4468  0.4806  0.5372  0.4717  0.4592
MSE          0.3841  0.3458  0.4111  0.4669  0.38    0.3683
RMSE         0.6197  0.588   0.6411  0.6833  0.6164  0.6069

dataset      test
missing ID   2&3&1   2&3&4   2&3&5   2&3&6   2&3&7   2&3&8
MAE          1.1402  1.1045  1.175   1.2589  1.0138  1.081
MSE          2.0735  1.9811  2.2076  2.4807  1.6591  1.8808
RMSE         1.44    1.4075  1.4858  1.575   1.288   1.3714
Table 4 shows that the errors of the models missing variable 2 (inspiratory capacity) are larger than those of the other missing-variable models. It means that inspiratory capacity plays a key role in the prediction of energy. The errors of the models missing variable 1 (compressor suction temperature) are the smallest, which means that compressor suction temperature has the lowest impact on the system.

Table 5 shows the experimental results when two secondary variables are missing from the training dataset; missing ID 1&2, for example, means that the missing variables are compressor suction temperature and inspiratory capacity. Table 5 shows that the errors of the models missing variables 2&3 are larger than those of the other models. It indicates that variable 3 (dew point temperature) has a relatively important influence on the prediction of energy. The errors of the models missing variables 2&7 are the smallest, which means that variable 7 (outdoor temperature) has little impact on the system.

Table 6 shows the experimental results when three secondary variables are missing from the training dataset. Table 6 shows that the errors of the models missing variables 2&3&6 are larger than those of the other models. It means that variable 6 (indoor temperature) has a great influence on the system, after inspiratory capacity and dew point temperature. The errors of the models on the test set missing variables 2&3&7 are the smallest, which is consistent with the previous conclusion that outdoor temperature has little impact on the system.

The increasing trend of the errors from Table 4 to Table 6 illustrates that the forecasting effect of the trained model becomes weak as the number of missing secondary variables increases. Among all the secondary variables, inspiratory capacity plays a key role in the prediction of energy, while dew point temperature and indoor temperature take second place. In addition, missing one or two variables has little impact on the prediction. However, the MAE of all models is greater than 1 when three secondary variables are missing, and the predictive power of the model is then greatly reduced. Therefore, the models still have great prediction ability when at least six input variables are retained; it is tough to guarantee the accuracy of the model when the training set has fewer than six secondary variables. In addition, the results of Case 2 show that the proposed method can still predict energy consumption accurately even if a part of the secondary variables is missing.

4. Conclusions

The energy forecasting task occupies an important position in our daily life due to its enormous economic benefits. Many methods have been put forward to forecast energy consumption. However, traditional methods do not perform well because they do not extract the periodicity hidden in the energy consumption data. This paper has proposed a complete approach to predict time series with periodicity based on LSTM. This method selects secondary variables from all data related to energy consumption for modeling. In addition, time data is complemented into the secondary variables to more precisely capture the periodicity. Experiments using a cooling system under one-step-ahead forecasting are conducted to verify the performance of LSTM. The most important findings of this research are as follows.

• The time variable can capture the periodicity precisely. The LSTM model is suggested for predicting energy consumption with the addition of the time variable in order to improve accuracy. Moreover, the LSTM method has higher prediction performance than ARMA, ARFIMA and BPNN: the RMSE of LSTM is 19.7% lower than BPNN, 54.85% lower than ARMA and 64.59% lower than ARFIMA in the forecasting of long-term time series.
• Case 2 explores how the potential input variables affect the prediction accuracy. It is found that inspiratory capacity plays a key role in the accuracy of the model, while dew point temperature and indoor temperature have only a minor influence on the prediction of energy. It is recommended to ensure that the three variables (inspiratory capacity, dew point temperature and indoor temperature) are available as model inputs.
A. Back Propagation Neural Network (BPNN)

Fig. 12. Architecture of BPNN with one hidden layer.

The calculation formula is as follows:

h = \sigma(w_1 x + b_1), \qquad o = \sigma(w_2 h + b_2) \quad (12)

The error of the whole network is

E = \frac{1}{2}\sum_{i=1}^{n}(y - o)^2 \quad (13)

where \sigma is the sigmoid function, w_1 and w_2 are weight matrices, and b_1 and b_2 are bias vectors. x, y and o are the input variables, the target value and the predicted value, respectively.

B. Auto Regressive and Moving Average Model (ARMA)

x_t = u_t + \sum_{i=1}^{q} \theta_i u_{t-i} + \sum_{j=1}^{p} w_j x_{t-j} \quad (14)

ARFIMA [36] is similar to ARMA. Its design idea is as follows. First, calculate the Hurst exponent h of the energy consumption data and determine the fractional difference order d = h - 0.5. Then the fractional difference is applied to the data, an ARMA model is fitted to the fractionally differenced data for forecasting, and at last the reverse difference of the ARMA predictions gives the final predicted value. {x_t}, {\tilde{x}_t} and L are the raw series, the differenced series and the lag operator, respectively. To simplify the computation, we set x_0 = 0. The fractional difference formula is as follows:

\tilde{x}_t = (1 - L)^d x_t \quad (17)

where

(1 - L)^d = \sum_{k=0}^{\infty} \frac{\Gamma(k - d)}{\Gamma(k + 1)\,\Gamma(-d)} L^k \quad (18)
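For concreteness, the fractional difference of Eqs. (17)-(18) can be sketched as follows, truncating the infinite expansion at K lags; the estimation of the Hurst exponent that yields d = h - 0.5 is omitted, and d is assumed non-integer.

```python
import numpy as np
from scipy.special import gamma

def frac_diff(x: np.ndarray, d: float, K: int = 100) -> np.ndarray:
    """Apply (1 - L)^d to series x, truncating the expansion of Eq. (18) at K lags."""
    k = np.arange(K)
    w = gamma(k - d) / (gamma(k + 1) * gamma(-d))   # Eq. (18) weights; w[0] = 1, w[1] = -d
    out = np.zeros(len(x))
    for t in range(len(x)):
        m = min(t + 1, K)
        out[t] = np.dot(w[:m], x[t::-1][:m])        # sum_k w_k x_{t-k}; earlier values treated as 0
    return out
```

An ARMA model can then be fitted to the differenced series and its forecasts inverse-differenced to obtain the final prediction, as described above.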
References

[1] Liu J, Chen Y, Zhan J, Fei S. An on-line energy management strategy based on trip condition prediction for commuter plug-in hybrid electric vehicles. IEEE Trans Veh Technol 2018;67:3767–81. https://doi.org/10.1109/TVT.2018.2815764.
[2] Han B, Zhang D, Tao Y. Energy consumption analysis and energy management strategy for sensor node. In: International conference on information & automation. IEEE; 2008. https://doi.org/10.1109/ICINFA.2008.4607998.
[3] Pao HT. Forecast of electricity consumption and economic growth in Taiwan by state space modeling. Energy 2009;34(11):1779–91. https://doi.org/10.1016/j.energy.2009.07.046.
[4] Yin H, Wong SC, Xu J, Wong C. Urban traffic flow prediction using a fuzzy-neural approach. Transport Res Part C (Emerging Technologies) 2002;10(2):85–98. https://doi.org/10.1109/YAC.2016.7804912.
[5] Rasp S, Lerch S. Neural networks for post-processing ensemble weather forecasts. Mon Weather Rev 2018;146:11. https://doi.org/10.1175/MWR-D-18-0187.1.
[6] Zhang Q, Wang H, Dong J, Zhong GQ, Sun X. Prediction of sea surface temperature using long short-term memory. Geosci Rem Sens Lett IEEE 2017;14(10):1745–9. https://doi.org/10.1109/LGRS.2017.2733548.
[7] Zuo Y, Kita E. Stock price forecast using Bayesian network. Expert Syst Appl 2012;39(8):6729–37. https://doi.org/10.1016/j.eswa.2011.12.035.
[8] Nomiyama F, Asai J, Murakami T, Murata J. A study on global solar radiation forecasting using weather forecast data. In: 2011 IEEE 54th international midwest symposium on circuits and systems (MWSCAS). IEEE; 2011. https://doi.org/10.1109/MWSCAS.2011.6026332.
[9] Mohamed Z, Bodger P. Forecasting electricity consumption in New Zealand using economic and demographic variables. Energy 2005;30:1833–43. https://doi.org/10.1016/j.energy.2004.08.012.
[10] Sen P, Roy M, Pal P. Application of ARIMA for forecasting energy consumption and GHG emission: a case study of an Indian pig iron manufacturing organization. Energy 2016;116:1031–8. https://doi.org/10.1016/j.energy.2016.10.068.
[11] Tseng FM, Yu HC, Tzeng GH. Combining neural network model with seasonal time series ARIMA model. Technol Forecast Soc Change 2002;69(1):71–87. https://doi.org/10.1016/s0040-1625(00)00113-x.
[12] Wu L, Liu S, Yang Y. Grey double exponential smoothing model and its application on pig price forecasting in China. Appl Soft Comput 2016;39:117–23. https://doi.org/10.1016/j.asoc.2015.09.054.
[13] Rao Y, Xu S, Xiong L. Time series prediction of heavy metal contamination in mining areas based on exponential smoothing model. In: International conference on information science & technology; 2011. https://doi.org/10.1109/ICIST.2011.5765081.
[14] Pano-Azucena AD, Tlelo-Cuautle E, Tan XD. Prediction of chaotic time series by using ANNs, ANFIS and SVMs. In: 7th international conference on modern circuits and systems technologies (MOCAST); 2018. https://doi.org/10.1109/MOCAST.2018.8376560.
[15] Ismail S, Shabri A, Samsudin R. A hybrid model of self-organizing maps (SOM) and least square support vector machine (LSSVM) for time-series forecasting. Expert Syst Appl 2011;38(8):10574–8. https://doi.org/10.1016/j.eswa.2011.02.107.
[16] Rubio G, Pomares H, Rojas I, Herrera LJ. A heuristic method for parameter selection in LS-SVM: application to time series prediction. Int J Forecast 2011;27(3):725–39. https://doi.org/10.1016/j.ijforecast.2010.02.007.
[17] Ishikawa M, Moriyama T. Prediction of time series by a structural learning of neural networks. Fuzzy Set Syst 1996;82(2):167–76. https://doi.org/10.1016/0165-0114(95)00253-7.
[18] Possignolo RT, Hammami O. Performance evaluation of hybrid ANN based time series prediction on embedded processor. Circ Syst 2016. https://doi.org/10.1109/LASCAS.2010.7410246.
[19] Sabino Parmezan AR, Souza VMA, Batista GEAPA. Evaluation of statistical and machine learning models for time series prediction: identifying the state-of-the-art and the best conditions for the use of each model. Inf Sci 2019. https://doi.org/10.1016/j.ins.2019.01.076.
[20] Aishwarya DC, Babu CN. Prediction of time series data using GA-BPNN based hybrid ANN model. In: 2017 IEEE 7th international advance computing conference (IACC). IEEE; 2017. https://doi.org/10.1109/IACC.2017.0174.
[21] Wu JY. Forecasting chaotic time series using an artificial immune system algorithm-based BPNN. In: International conference on technologies & applications of artificial intelligence; 2011. https://doi.org/10.1109/TAAI.2010.88.
[22] Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. MIT Press; 1989. https://doi.org/10.1162/neco.1989.1.2.270.
[23] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.
[24] Fu R, Zhang Z, Li L. Using LSTM and GRU neural network methods for traffic flow prediction. In: 2016 31st youth academic annual conference of Chinese association of automation (YAC). IEEE; 2016. https://doi.org/10.1109/YAC.2016.7804912.
[25] Dutilleul P, Stockwell JD, Frigon D, Legendre P. The Mantel test versus Pearson's correlation analysis: assessment of the differences for biological and environmental studies. J Agric Biol Environ Stat 2000;5(2):131–50. https://doi.org/10.2307/1400528.
[26] Kumari S, Nie J, Chen HS, Ma H, Stewart R, Li X, Lu MZ, Taylor WM, Wei HR. Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PloS One 2012;7(11):e50411. https://doi.org/10.1371/journal.pone.0050411.
[27] Xiao C, Ye J, Esteves RM, Rong C. Using Spearman's correlation coefficients for exploratory data analysis on big dataset. Concurrency Comput Pract Ex 2016;28(14):3866–78. https://doi.org/10.1002/cpe.3745.
[28] Theil H. Applied economic forecasting. Chicago: Rand McNally; 1966.
[29] Huang LL, Wang J. Global crude oil price prediction and synchronization based accuracy evaluation using random wavelet neural network. Energy 2018;151:875–88. https://doi.org/10.1016/j.energy.2018.03.099.
[30] Wang JZ, Wang JJ, Zhang ZG, Guo SP. Forecasting stock indices with back propagation neural network. Expert Syst Appl 2011;38(11):14346–55. https://doi.org/10.1016/j.eswa.2011.04.222.
[31] Tan IKT, Hoong PK, Keong CY. Towards forecasting low network traffic for software patch downloads: an ARMA model forecast using CRONOS. In: Second international conference on computer and network technology; 2010. https://doi.org/10.1109/ICCNT.2010.35.
[32] Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr 1974;19(6):716–23.
[33] Schwarz G. Estimating the dimension of a model. Ann Stat 1978;6(2). https://doi.org/10.1214/aos/1176344136.
[34] Hannan EJ, Rissanen J. Recursive estimation of mixed autoregressive-moving average order. Biometrika 1982. https://doi.org/10.1093/biomet/69.1.81.
[35] Lütkepohl H. VAR order selection and checking the model adequacy. In: New introduction to multiple time series analysis; 2006. https://doi.org/10.1007/978-3-662-02691-5_4.
[36] Ye XM, Xia Y, Zhang J, Chen Y. Characterizing long memories in electric water heater power consumption time series. In: IEEE Africon; 2011. https://doi.org/10.1109/AFRCON.2011.6072104.