
Received: 13 January 2019 Revised: 27 July 2019 Accepted: 1 August 2019

DOI: 10.1002/cpe.5595

RESEARCH ARTICLE

An effective deep learning neural network model for short-term load forecasting

Ning Li1, Lu Wang1, Xinquan Li1, Qing Zhu2

1 State Grid Xinjiang Electric Power Co., Ltd., Institute of Electric Power Science, Urumqi, China
2 NARI Technology Co., Ltd., Nanjing, China

Correspondence: Qing Zhu, NARI Technology Co., Ltd., Nanjing 211106, China. Email: [email protected]

Funding information: National Key Technology Research and Development Program of China, Grant/Award Number: 2016YFB0901103; National Natural Science Foundation of China, Grant/Award Number: 61603239

Summary

Energy load forecasting plays an important role in the smart grid, as it affects energy production and consumption decision-making processes. In this paper, state-of-the-art deep learning (DL) neural models are applied to short-term load forecasting, including the multilayer perceptron (MLP), the convolutional neural network (CNN), and the long short-term memory (LSTM) network. A novel loss function is proposed for load forecasting, and two commonly used benchmarks are used to verify its validity. The simulation results show that the mean absolute percentage error (MAPE) of the proposed loss function is 19.63% lower than cross-entropy and 2.34% lower than mean absolute error (MAE). We compared the mentioned neural networks in different aspects, and the results show that in energy load forecasting, CNN outperforms MLP and LSTM in terms of accuracy and robustness to weather changes.

KEYWORDS

deep learning, energy load forecasting, neural network, smart grid

1 INTRODUCTION

Energy load forecasting is an essential part of the electricity retail side, as it enables electricity suppliers to make proper decisions for energy trading.1,2 Generally, load forecasting refers to predicting future load data by systematically processing past data while considering certain important characteristics, such that the predicted results satisfy given accuracy requirements. Predicting future load values is important to energy management with minimized energy wastage, including facilitating rational arrangement of grid operation and maintenance plans, saving fuel and coal, reducing costs, facilitating reasonable power supply construction, and promoting improvements in electricity supply. Furthermore, the contracting strategy, quotation strategy, trading strategy, and user economic measurement are all based on the load forecasting results. Without precise prediction, an electricity supplier cannot even prepare bids and offers on the spot market; therefore, the accuracy of load forecasting is crucial to electricity suppliers.3
Generally, the load forecasting can be divided into two levels; one is the system (grid)-level load forecasting, and the other is the consumer
(household)-level load forecasting. The power companies mainly care about the grid-level loads and focus on the stable and safe operation of the
entire power system. However, for the electricity suppliers, the consumer-level load forecasting is much more important. They have to obtain
the whole grid's loads data and predict the individual consumer's data; then, based on the predicted data, the electricity suppliers perform their
power dispatch allocation strategies. The load curves of the two levels are shown in Figure 1; it can be seen that the grid-level curve is relatively flat and easy to predict, while the user-level curve fluctuates drastically and randomly under the same conditions. Although consumer-level load forecasting is much more difficult than grid-level forecasting, it is unavoidable for electricity suppliers and virtual power plants, especially for short-term (within two weeks) and ultra-short-term (within one day) consumer-level load forecasting.4-7
At present, there are many methods for short-term load forecasting,8-16 including the artificial neural network (ANN) methods, the time series
methods, regression analysis methods, support vector machines, fuzzy prediction, etc. Due to various factors such as weather changes, social activities, and festivals, traditional shallow neural networks and models become hard to train and cannot achieve satisfactory performance. However, most of these factors are regular; for example, there are morning and afternoon peaks within one day and a trough at midnight. At the weekly level, the overall consumption is lower and the peak is delayed on weekends compared with workdays. Next, long-term patterns
Concurrency Computat Pract Exper. 2020;32:e5595. wileyonlinelibrary.com/journal/cpe © 2020 John Wiley & Sons, Ltd. 1 of 10
https://doi.org/10.1002/cpe.5595
15320634, 2020, 7, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/cpe.5595 by Universität zu Lübeck, Wiley Online Library on [12/01/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
2 of 10 LI ET AL.

FIGURE 1 The load curves of different levels (grid level and household level; power P versus time t)

such as the season have less effect on short-term forecasting; empirical data show that consumption during winter is higher in cold climates than in warm climates. Therefore, we can utilize these regular factors to perform effective load prediction. The deep learning (DL) methods
such as the long short-term memory (LSTM) and the convolutional neural networks (CNNs)17-27 are expected to extract different features and improve load prediction accuracy by adopting multiple network layers. For example, Akbari Asanjan et al28 proposed a model for short-term quantitative precipitation forecasting. Zhang et al29 investigated the applicability of the LSTM model to reservoir operation simulation and compared LSTM with the back-propagation (BP) network. Yang et al30,31 employed and compared artificial intelligence and data mining techniques to forecast reservoir operations, including the random forest (RF), ANN, and support vector regression (SVR) methods. Høverstad et al32 studied data-driven short-term load forecasting and showed that extracting the seasonality before forecasting leads to more accurate forecasts. Lago et al33 and Ugurlu et al34 investigated various neural networks and showed that neural network methods outperform state-of-the-art statistical forecasting methods. Hosein and Hosein35 used deep neural networks to forecast network loads; the results indicated that DL methods perform better than traditional methods.
The present load forecasting methods show that load forecasting is a challenging task, especially at the household level, where many factors can affect the prediction accuracy. Moreover, applying any load forecasting technique to each household requires a detailed investigation of the influences on individual load consumption behavior in order to adequately capture the complex dynamics. Consumer-level forecasting requires a large amount of high-resolution data, which in turn demands substantial computing resources. Since meteorological management systems have been established that can provide various high-precision historical climatic data, an effective forecasting model can be built to further improve the accuracy of short-term load forecasting.
In this paper, we present an end-to-end neural network model to predict the loads for the following 24 hours. Since DL can accurately predict the energy load,35-41 three deep learning algorithms, namely the multilayer perceptron (MLP),42 the convolutional neural network (CNN),43 and the long short-term memory (LSTM) network,44 are chosen for analysis.
We also proposed an effective loss function, and based on the loss function, the efficiency and accuracy of the three algorithms are compared
in various real-world scenarios. The contribution of this work is twofold. Firstly, a complete end-to-end model based on deep learning network
is proposed for short-term load forecasting. The proposed model does not involve external feature extraction or feature selection algorithms,
which only uses the raw readily available load data, the temperature data, and the holiday information as the input. Secondly, an efficient loss
function is proposed, which can reduce the mean absolute percentage error (MAPE).
The remainder of this paper is organized as follows. In Section 2, the MLP, CNN, and LSTM algorithms, as well as the models for load
forecasting, are provided. In Section 3, the model of input data and a novel loss function are proposed. In Section 4, the experimental results of
short-term load forecasting by the proposed model are presented, and the comparison of the proposed model with different networks, including
MLP, CNN, and LSTM, is given. The conclusion is given in Section 5.

2 ARTIFICIAL NEURAL NETWORK APPROACHES AND MODEL FOR SHORT-TERM LOAD FORECASTING

The artificial neural network (ANN) has always been one of the primary solutions for short-term load forecasting. The latest developments in neural networks, especially deep learning methods, have had a tremendous impact in the fields of computer vision, natural language processing, and speech recognition.45-49 Researchers fuse their understanding of different tasks into specific network structures rather than using a fixed shallow neural network structure. Different building blocks, including CNN50,51 and LSTM, make deep neural networks highly flexible and efficient. At present, researchers have proposed various techniques that can effectively train multilayer neural networks without vanishing gradients or serious overfitting. The application of DL to short-term load forecasting is a relatively new topic. In the past decade, researchers have been using the restricted Boltzmann machine (RBM)52 and multilayer feedforward neural networks to predict the demand-side load and the natural gas load.53,54 However, the hidden layers of the above algorithms are usually very small; as the number of layers increases, the training

of the above models becomes more and more difficult, which limits their performance. In the following, we discuss the state-of-the-art deep learning neural networks and, based on these networks, propose an efficient model for short-term load forecasting.

2.1 Multilayer perceptron (MLP)


The multilayer perceptron (MLP) is a feedforward neural network that maps a set of input vectors to a set of output vectors. An MLP can be considered as a directed graph consisting of multiple node layers, with each node connected to the next layer. Apart from the input nodes, each node is a neuron with a nonlinear activation function. The back-propagation (BP) algorithm is often used to train MLPs.55 The MLP overcomes the limitation that single-layer perceptrons cannot recognize linearly inseparable data. The simplest MLP has a three-layer structure, comprising an input layer, a hidden layer, and an output layer. The layers of the MLP are fully connected: every neuron in each layer is connected with all the neurons in the previous layer.56
From the input layer to the hidden layer, we can use X as the input and H as the hidden layer output; then, H is given by

H = f(w_1 X + b_1),   (1)

where w_1 represents the weight matrix, b_1 represents the bias, and f is the activation function.
From the hidden layer to the output layer, we can use Y as the output and obtain

Y = g(w_2 H + b_2),   (2)

where w_2 and b_2 are the training parameters and g is the activation function.
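Equations (1) and (2) can be sketched as a minimal forward pass. This is an illustrative NumPy snippet, not the authors' code; the layer sizes (74 inputs, 8 hidden units) are borrowed from Table 2, and the choice of ReLU for f and identity for g is our own assumption:

```python
import numpy as np

def relu(x):
    # Elementwise ReLU activation: max(0, x)
    return np.maximum(0.0, x)

def mlp_forward(X, w1, b1, w2, b2):
    """Forward pass of a three-layer MLP: H = f(w1 X + b1), Y = g(w2 H + b2).
    Here f is ReLU and g is the identity (a common choice for regression)."""
    H = relu(w1 @ X + b1)   # Equation (1)
    Y = w2 @ H + b2         # Equation (2)
    return Y

rng = np.random.default_rng(0)
X = rng.normal(size=(74,))          # 74 input features, as in Table 2
w1, b1 = rng.normal(size=(8, 74)), np.zeros(8)
w2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
y = mlp_forward(X, w1, b1, w2, b2)  # a single predicted load value
```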

2.2 Convolutional neural network (CNN)


Different from its application in image processing, the CNN uses one-dimensional (1D) convolution for load forecasting: a kernel of window size kernel_size slides over the input sequence and is convolved with it to obtain a new sequence. In addition, the convolutional network contains a pooling operation, which extracts the useful data features.57-59 In this paper, we use a convolutional network with two convolution layers and two max-pooling layers for short-term load forecasting; the CNN model for short-term load forecasting is shown in Figure 2.
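The 1D convolution and max-pooling operations described above can be sketched as follows. This is an illustrative NumPy snippet under simplifying assumptions (single channel, stride 1, no padding), not the authors' implementation; the kernel values are arbitrary:

```python
import numpy as np

def conv1d(seq, kernel):
    """Slide a window of size len(kernel) over seq and take the dot product
    at each position (valid cross-correlation, as used in DL 'convolution')."""
    k = len(kernel)
    return np.array([seq[i:i + k] @ kernel for i in range(len(seq) - k + 1)])

def max_pool1d(seq, window=2):
    # Keep the maximum of each non-overlapping window.
    n = len(seq) // window
    return seq[:n * window].reshape(n, window).max(axis=1)

load = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0])
kernel = np.array([0.25, 0.5, 0.25])   # kernel_size = 3 here (Table 2 uses 5)
features = max_pool1d(conv1d(load, kernel))
```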

2.3 Long short-term memory (LSTM)


The recurrent neural network (RNN) is a powerful and useful tool for dealing with time series. Compared with a conventional neural network, the calculation results of the hidden layers of an RNN are related to both the present input and the previous outputs of the hidden layers. When dealing with long-term data, the RNN model needs to link the current hidden state with the previous N time steps. Therefore, the amount of calculation increases sharply, resulting in a significant increase in model training time. To address the excessive calculation of

FIGURE 2 The model of CNN for short-term load forecasting: the historical load (1D grid) together with inputs such as the temperature value, loads of hours/days/months, season, month, and holiday passes through two convolution layers with different kernels/filters, each followed by max-pooling, and a fully connected layer produces the output from the feature maps

FIGURE 3 The proposed model for short-term load forecasting: raw data passes through data preprocessing into the input layer, then into the hidden layer (MLP, LSTM, or CNN units), and a fully connected output layer produces the predicted load data; during network training, the predicted data is compared with the actual data via the loss calculation, and the model is updated with RMSprop optimization

RNN, the LSTM adds three gate nodes to each layer: the forget gate, the input gate, and the output gate. These gates can be opened or closed to determine whether the memory state of the network reaches a threshold at each level of output. A gate node applies the sigmoid function to the memory state of the network as its input; if the output reaches the threshold, the gate output is multiplied by the calculation result of the present layer and used as the input of the next layer; if the threshold is not reached, the output is ignored. The weights of each layer and the gate nodes are updated during each back-propagation training step.
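A single time step of the gate mechanism described above can be sketched as follows. This is an illustrative NumPy snippet using the standard LSTM formulation, not code from the paper; the input dimension is arbitrary, and the 8 hidden units follow Table 2:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the parameters of the forget gate (f),
    input gate (i), output gate (o), and candidate memory (g)."""
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])  # forget gate
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])  # input gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])  # output gate
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])  # candidate memory
    c = f * c_prev + i * g      # new cell state: keep part of the old memory
    h = o * np.tanh(c)          # new hidden state, gated by the output gate
    return h, c

rng = np.random.default_rng(1)
dim_x, dim_h = 4, 8             # 8 hidden units, as in Table 2
W = {k: rng.normal(size=(dim_h, dim_x)) for k in 'fiog'}
U = {k: rng.normal(size=(dim_h, dim_h)) for k in 'fiog'}
b = {k: np.zeros(dim_h) for k in 'fiog'}
h, c = lstm_step(rng.normal(size=dim_x), np.zeros(dim_h), np.zeros(dim_h), W, U, b)
```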

2.4 The common ANN model for short-term forecasting


In this section, we propose an ANN model for short-term load forecasting, which is shown in Figure 3. There are four parts in the model: the input layer, the hidden layer, the output layer, and the training/optimization part. The hidden layer is the core of the prediction model; in this paper, we can use the proposed MLP, CNN, and LSTM networks to process the data. In order to predict the load data of the hth hour, we need to obtain the previous L data points as the input, where L denotes the data length. Assume that the raw data is S = {s_1, s_2, ..., s_n}; after the data preprocessing, we obtain an (n − L) × L dataset Φ = {φ_1, φ_2, ..., φ_{n−L}}, where φ_i = {s_{j−L}, s_{j−L+1}, ..., s_{j−1}}, 1 ≤ i ≤ n − L, j = i + L. Then, we divide the dataset into two parts: the training set Φ_train = {φ_1, φ_2, ..., φ_m} and the testing set Φ_test = {φ_{m+1}, φ_{m+2}, ..., φ_{n−L}}. Since the training sample is large, the mini-batch gradient descent (MBGD) algorithm is used as the optimization algorithm, which reduces the calculation cost and increases the operation speed. Moreover, all the models are trained using the RMSprop optimizer.60
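The preprocessing step above — turning the raw series S into the (n − L) × L dataset Φ and splitting it into training and test sets — can be sketched as follows (an illustrative NumPy snippet; the helper name build_dataset is our own):

```python
import numpy as np

def build_dataset(s, L):
    """Build phi_i = (s_{j-L}, ..., s_{j-1}) with j = i + L: each row holds
    the L values preceding one target, giving an (n - L) x L matrix."""
    n = len(s)
    phi = np.array([s[i:i + L] for i in range(n - L)])
    targets = s[L:]                 # the value each window is used to predict
    return phi, targets

s = np.arange(1.0, 11.0)            # toy raw series s_1 ... s_10, so n = 10
phi, y = build_dataset(s, L=3)      # phi has shape (7, 3)
m = 5                               # first m windows for training, rest for testing
phi_train, phi_test = phi[:m], phi[m:]
```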

3 MODEL INPUT AND THE LOSS FUNCTION

In this section, we use the proposed model to make a preliminary prediction of the electrical loads within the following 24 hours. The parameters are shown in Table 1, where L_h represents the input data used to predict the load of the next day's hth hour. In order to simplify the data processing, the loads and the temperature data are normalized, and the input data should support both short-term prediction accuracy and long-term trend identification.61-63 More specifically, the parameters L_d, L_w, T_d, and T_w are expected to help the model identify long-term trends, while L_h and T_h are expected to forecast the loads and temperature of the following 24 hours. The predicted loads are used to replace the value of L_h and correlate the next 24 hours' loads. All the above parameters can be adjusted according to different situations. Moreover, we also consider meteorological information and special days such as seasons, weekdays and weekends, holidays and nonholidays, etc., to capture the periodicity and special time characteristics of the time series.
Since the output of the model must be close to the actual load of the hth hour of day i, it is necessary to properly train the model; here, we propose a loss function for the proposed model, given by

L = (1/(MN)) Σ_{i=1}^{M} Σ_{h=1}^{N} |ŷ_h^i − y_h^i| / y_h^i,   (3)

TABLE 1 Inputs of the deep learning model for the prediction of the hth hour of the next 24 hours

Input  Size  Description
T_h    24    Temperature of the recent 24 hours
T_d    7     Temperature of the hth hour of the recent 7 days
T_w    4     Temperature of the hth hour of the days 7, 14, 21, and 28 days prior to the next 24 hours
L_h    24    Loads of the recent 24 hours
L_d    7     Loads of the hth hour of the recent 7 days
L_w    4     Loads of the hth hour of the days 7, 14, 21, and 28 days prior to the next 24 hours
T      1     The forecasted temperature of the hth hour of the next 24 hours
S      1     Season
W      1     Weekday/weekend
H      1     Holiday

TABLE 2 Neural network parameters

Model            Output dimension/value
Dense1 (ReLU)    74
Dense2 (ReLU)    148
Dense4           1
Hidden units     8
Input features   74
Dense            1
Filter number    64
Kernel size      5
Pool window      2
Dropout rate     0.4
Dense1 (ReLU)    148
Dense2           1

where M and N represent the number of days in the samples and the number of hours within a day, respectively, and ŷ_h^i and y_h^i represent the predicted and actual loads for the hth hour of day i, respectively. Based on the proposed model, we investigate the performance of the proposed loss function and compare the aforementioned three networks. All the experiments are run on a computer with an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz and 6 GB RAM, running Ubuntu 16.04.
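Equation (3) can be implemented directly for evaluation purposes (an illustrative NumPy snippet; for training, a differentiable framework version would be used instead). The normalized loads are assumed to be positive so that the division is well defined:

```python
import numpy as np

def proposed_loss(y_hat, y):
    """Equation (3): mean over M days and N hours of |y_hat - y| / y,
    i.e. the average relative (percentage-style) error."""
    return np.mean(np.abs(y_hat - y) / y)

# Toy example: M = 2 days, N = 3 hours.
y     = np.array([[1.0, 2.0, 4.0],
                  [2.0, 2.0, 2.0]])
y_hat = np.array([[1.1, 1.8, 4.0],
                  [2.0, 2.4, 1.8]])
loss = proposed_loss(y_hat, y)
```

Because each error term is divided by the actual load, this loss weighs errors relative to the load level, which matches the MAPE metric used for evaluation.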

4 EXPERIMENTAL RESULTS

We use two well-known datasets, the ISO-NE dataset and the North-American Utility dataset, to test the validity of the proposed model.* The ISO-NE dataset covers load and temperature data from January 3, 2003 to December 31, 2014. The North-American Utility dataset contains load and temperature data at one-hour resolution and covers the time range between January 1, 1985 and October 12, 1992. In all experiments, the ranges of Spring, Summer, Autumn, and Winter are March 8 to June 7, June 8 to September 7, September 8 to December 7, and December 8 to March 7, respectively. The codes have been made available online at https://github.com/ningningLiningning/load-forecasting-codes.

4.1 Experimental setup


For the ISO-NE dataset, the data from December 31, 2012 to December 31, 2014 are used as the test set, and the data before December 31, 2012 are used as the training set. For the North-American Utility dataset, the data from October 12, 1990 to October 12, 1992 are used as the test set, and the data before October 12, 1990 are used as the training set. We use threefold cross-validation to validate the load prediction models, and the best-performing hyperparameters are shown in Table 2.
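The threefold cross-validation step can be sketched as plain k-fold index splitting (an illustrative NumPy snippet; the paper does not specify how the folds are constructed, so contiguous folds are our own assumption):

```python
import numpy as np

def kfold_indices(n_samples, k=3):
    """Split indices 0..n_samples-1 into k contiguous folds; each fold serves
    once as the validation set while the remaining folds form the training set."""
    folds = np.array_split(np.arange(n_samples), k)
    splits = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, val))
    return splits

splits = kfold_indices(9, k=3)   # three (train, validation) index pairs
```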
We use the mean absolute percentage error (MAPE) to evaluate the proposed models.64 MAPE is a widely used measure of prediction accuracy for statistical data, and it is given by

M = (1/n) Σ_{t=1}^{n} |A_t − F_t| / A_t,   (4)

where A_t denotes the actual loads, F_t denotes the forecasted loads, and n denotes the number of test loads.
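Equation (4) can be sketched as follows (an illustrative NumPy snippet; MAPE is reported here as a percentage, so the ratio is multiplied by 100):

```python
import numpy as np

def mape(actual, forecast):
    """Equation (4): mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return 100.0 * np.mean(np.abs(actual - forecast) / actual)

# Toy check: each forecast is off by 10% of the actual value.
actual   = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 440.0]
err = mape(actual, forecast)   # close to 10.0
```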

* The datasets have been made available online at https://class.ee.washington.edu/555/el-sharkawi and https://github.com/ningningLiningning/iso-ne.

TABLE 3 Comparison of the proposed model on the North-American Utility dataset with respect to MAPE

Model       1990    1991    1992    Average
CNN + CE    18.0%   17.8%   19.6%   18.5%
CNN + MAE   6.51%   6.27%   6.75%   6.19%
LSTM + PL   4.31%   7.92%   9.14%   7.30%
MLP + PL    3.61%   3.97%   3.17%   3.55%
CNN + PL    3.57%   3.89%   3.42%   3.13%

FIGURE 4 The training and validation curves of accuracy (0.5-MAPE) and loss on the ISO-NE dataset, showing the training loss and validation 0.5-MAPE of each network over 300 epochs

Furthermore, we compare the proposed loss function with the state-of-the-art loss functions, including the cross-entropy (CE) loss and mean
absolute error (MAE) loss.65
Assume ŷ_h^i and y_h^i represent the predicted and actual loads for the hth hour of day i, respectively. The CE loss is given by

CE = −(1/(MN)) Σ_{i=1}^{M} Σ_{h=1}^{N} y_{h,o}^i log(ŷ_{h,o}^i),   (5)

where y_{h,o}^i represents the one-hot vector of y_h^i. The MAE loss is given by

MAE = (1/(MN)) Σ_{i=1}^{M} Σ_{h=1}^{N} |ŷ_h^i − y_h^i|.   (6)
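For comparison, the MAE baseline of Equation (6) is straightforward to implement (an illustrative NumPy snippet; the CE loss is omitted because it additionally requires discretizing the loads into one-hot vectors). The toy example also shows why the relative loss of Equation (3) behaves differently from MAE: the same absolute error weighs more heavily at low-load hours:

```python
import numpy as np

def mae_loss(y_hat, y):
    """Equation (6): mean absolute error over all days and hours."""
    return np.mean(np.abs(y_hat - y))

# One day, two hours: a low-load hour (1.0) and a high-load hour (10.0),
# each predicted with the same absolute error of 0.5.
y     = np.array([[1.0, 10.0]])
y_hat = np.array([[1.5, 10.5]])
abs_err = mae_loss(y_hat, y)              # 0.5 regardless of load level
rel_err = np.mean(np.abs(y_hat - y) / y)  # 0.275: the low-load hour dominates
```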

The comparison of the proposed model on the North-American Utility dataset with respect to MAPE is shown in Table 3; we can see that for
the CNN, the MAPE of the proposed loss function is 15.37% lower than CE and 3.06% lower than MAE.

4.2 Performance of the proposed model on the ISO-NE dataset


We first present the training and validation curves of accuracy and loss for the proposed model.66 As shown in Figure 4, the loss curves go down and the accuracy curves keep rising for all of the mentioned neural networks. We can also observe that the training loss drops sharply at the beginning and then descends steadily; moreover, the validation loss has a similar trend to the training loss. Table 4 shows the performance comparison of the proposed model and other models on the ISO-NE dataset with respect to MAPE, where MLP + PL represents the MLP network with the proposed loss function; similarly, CNN + PL and LSTM + PL represent the CNN and LSTM networks with the proposed loss function, respectively. From Table 4, we can see that CNN performs the best among all of the mentioned networks for short-term load forecasting.
In the existing load forecasting literature, LSTM outperforms CNN.17,33,34 Our results lead to the opposite conclusion. The reason is that in ordinary time-dependent sequence prediction problems, LSTM does outperform CNN; however, as Table 1 shows, we do not simply treat energy load prediction as a sequence prediction problem. The loads are related not only to the usage during certain preceding periods but also to other factors such as the season S and the holiday H. For example, the load at 10 o'clock the next morning is related not only to the load usage during certain periods before 10 o'clock but also to the season, be it winter or autumn. CNN can effectively capture the local similarities of the load usage through its convolution kernels, whereas LSTM and MLP cannot capture these inherent patterns of the loads. Therefore, CNN can achieve better performance in load forecasting problems. Based on the above discussion, in the following, we use CNN as the network to test the efficiency of the proposed loss function. Let CNN + CE represent the CNN network with the CE loss function and CNN + MAE the CNN network with the MAE loss function. The performance is given in Table 5. We can see that the proposed model CNN + PL outperforms the other existing models; the MAPE of the proposed loss function is 19.63% lower than CE and 2.34% lower than MAE.
From the above comparison, we can see that the proposed loss function outperforms CE and MAE because it measures the distance between the predicted load and the actual load, whereas CE and MAE only reflect whether the prediction results are

TABLE 4 Comparison of the proposed model on the ISO-NE dataset with respect to MAPE

Model       2012    2013    2014    Average
LSTM + PL   5.49%   5.43%   5.57%   5.49%
MLP + PL    4.54%   5.11%   4.76%   4.80%
CNN + PL    2.77%   2.81%   2.77%   2.82%

TABLE 5 Comparison of the proposed loss function with respect to MAPE

Model       2012    2013    2014    Average
CNN + CE    22.4%   21.6%   21.7%   21.3%
CNN + MAE   5.11%   5.15%   4.89%   5.07%
CNN + PL    2.77%   2.81%   2.77%   2.82%

FIGURE 5 The comparison of actual loads and the predicted loads between January 1, 2012 and January 10, 2012 on the ISO-NE dataset (MLP, CNN, and LSTM panels)

FIGURE 6 The training and validation curves of 0.5-MAPE and loss on the North American dataset

correct. Moreover, according to the above discussion, we can conclude that the CNN model performs the best among the three networks and is suitable for short-term load forecasting.
Figure 5 gives the comparison of actual loads and predicted loads on the ISO-NE dataset. It can be seen that the curve of the predicted loads is similar to that of the actual data.

4.3 Performance of the proposed model on the North-American Utility dataset


In this section, the generalization capability of the proposed model is evaluated using the North-American Utility dataset. The training and validation curves of accuracy and loss are shown in Figure 6. It can be seen that the loss curves go down and the accuracy curves keep rising for all the proposed

FIGURE 7 The comparison of actual load and the predicted load between January 1, 1992 and January 10, 1992 on the North American dataset (MLP, CNN, and LSTM panels)

neural networks, and the training loss drops sharply at the beginning and then descends steadily. The validation loss curve has a similar trend to the training curve. The predicted and actual load data curves are given in Figure 7.

5 CONCLUSION

This paper has proposed an effective neural network model for short-term load forecasting. To verify its effectiveness, we have conducted experiments with three deep learning methods, MLP, CNN, and LSTM, to train and test the model, and two commonly used benchmarks are used to validate it. Comparisons with existing models show that the proposed model achieves superior prediction accuracy and robustness to temperature changes, and the simulation results also show that, within the proposed model, CNN outperforms the other two networks. We do not simply treat energy load prediction as a sequence prediction problem: one-hot codes for season, weekday/weekend distinction, and holiday/nonholiday distinction are added to help the model capture the periodic and irregular temporal characteristics of the load time series, since the load depends not only on usage during certain periods but also on these calendar factors. As a result, CNN can effectively capture the local similarities of load usage through its convolution kernels and achieves better performance than LSTM and MLP.
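The calendar encoding described above can be sketched as follows. The paper does not specify its exact feature layout, so the season/weekend/holiday one-hot construction below, including the placeholder holiday list, is an illustrative assumption rather than the authors' implementation:

```python
import numpy as np
import pandas as pd

def calendar_features(timestamps):
    """Build one-hot calendar features (season, weekday/weekend, holiday)
    for each timestamp. The holiday set is a placeholder assumption."""
    ts = pd.to_datetime(pd.Series(timestamps))
    # Season as a 4-way one-hot (0=winter, 1=spring, 2=summer, 3=autumn).
    season = (ts.dt.month % 12) // 3
    season_oh = np.eye(4)[season.to_numpy()]
    # Weekday/weekend distinction as a 2-way one-hot.
    weekend = (ts.dt.dayofweek >= 5).astype(int)
    weekend_oh = np.eye(2)[weekend.to_numpy()]
    # Holiday/nonholiday distinction; real holiday calendars vary by region.
    holidays = {(1, 1), (12, 25)}  # assumed: New Year's Day, Christmas
    holiday = ts.apply(lambda t: int((t.month, t.day) in holidays)).to_numpy()
    holiday_oh = np.eye(2)[holiday]
    return np.hstack([season_oh, weekend_oh, holiday_oh])

X = calendar_features(["1992-01-01", "1992-01-04"])
print(X.shape)  # (2, 8)
```

Such features are typically concatenated with the historical load window before being fed to the network.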
Moreover, we have proposed a new loss function, and the experiments show that it outperforms CE and MAE. This is because the proposed loss function directly represents the distance between the predicted load and the actual load. From the experiments, we conclude that the proposed model and loss function are well suited for short-term load forecasting. Since we cover only the latest deep neural network techniques, more network structures will be applied to the model in the future to improve performance. Furthermore, the implementation of deep learning for ultra-short-term load forecasting will be considered in our following work.
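For reference, the MAPE evaluation metric and the MAE baseline can be written as losses. Note that the paper's proposed loss function is only described qualitatively above, so this sketch uses the conventional MAPE and MAE formulations, not the paper's exact function:

```python
import numpy as np

def mape_loss(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error (in percent). The small eps guards
    against division by zero when the actual load is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / (y_true + eps))) * 100.0)

def mae_loss(y_true, y_pred):
    """Mean absolute error, one of the baselines compared against."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

print(round(mape_loss([2000.0, 1000.0], [1900.0, 1100.0]), 6))  # 7.5
```

Unlike MAE, MAPE normalizes each error by the actual load, so errors on low-load periods are not dominated by those on high-load periods.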

ACKNOWLEDGMENT

This work was supported in part by the National Key Technology Research and Development Program of China under grant 2016YFB0901103.

ORCID

Ning Li https://orcid.org/0000-0001-7897-9288


How to cite this article: Li N, Wang L, Li X, Zhu Q. An effective deep learning neural network model for short-term load forecasting.
Concurrency Computat Pract Exper. 2020;32:e5595. https://doi.org/10.1002/cpe.5595
