Deep Neural Networks for Energy Load Forecasting

Abstract—Smart grids of the future promise unprecedented flexibility in energy management. Therefore, accurate predictions/forecasts of energy demands (loads) at individual site and aggregate level of the grid are crucial. Despite extensive research, load forecasting remains a difficult problem. This paper presents a load forecasting methodology based on deep learning. Specifically, the work presented in this paper investigates the effectiveness of using Convolutional Neural Networks (CNN) for performing energy load forecasting at individual building level. The presented methodology uses convolutions on historical loads. The output from the convolutional operation is fed to fully connected layers together with other pertinent information. The presented methodology was implemented on a benchmark data set of electricity consumption for a single residential customer. Results obtained from the CNN were compared against results obtained by Long Short Term Memory sequence-to-sequence (LSTM S2S), Factored Conditional Restricted Boltzmann Machines (FCRBM), "shallow" Artificial Neural Networks (ANN) and Support Vector Machines (SVM) on the same dataset. Experimental results showed that the CNN outperformed the SVM while producing comparable results to the ANN and the other deep learning methodologies. Further testing is required to compare the performances of different deep learning architectures in load forecasting.

Keywords—Deep Learning; Deep Neural Networks; Convolutional Neural Networks; Building Energy; Energy; Artificial Neural Networks

I. INTRODUCTION

Smart grids demand an unprecedented level of flexibility in the way energy is generated and distributed to minimize energy wastage and optimize usage [1], [2], [3]. Therefore, the power grid has to be able to dynamically adapt to changes in demand and efficiently distribute the generation. Further, the grid controller should have the capability of efficiently handling distributed generation from various sources such as renewables [4]. Intelligent control decisions should therefore be made in real-time at aggregate level as well as at individual component level, and the ability to forecast future demands accurately is imperative [2], [5].

Forecasting the demand of an individual component is as important as forecasting the aggregate demand on several fronts. In terms of demand response, individual component/building level demand forecasting enables responding to local demand with local generation [2]. In terms of future uncertainty mitigation, demand/load forecasting at aggregate and individual component level plays a crucial role [2]. Further, apart from being a major energy consumer, accounting for 20%-40% of the total worldwide energy production [6], [7], [8], [9], buildings are shown to account for a significant portion of the total energy wastage [10]. Accurate forecasts or predictions of the energy demand at building level can help optimize energy usage and minimize that wastage. The advent of smart meters has made energy consumption data readily available [11]. Thus, data driven and statistical modeling have been made possible [3].

Load forecasting is subcategorized into three categories: 1) short-term load forecasting (one hour to one week), 2) medium-term load forecasting (a week to a year) and 3) long-term load forecasting (longer than a year) [2]. Regardless of the category, load forecasting has been shown to be a difficult problem, and individual building level load forecasting has been shown to be a harder task than aggregate load forecasting [2], [12]. Two major approaches exist in the literature for performing energy load forecasting: 1) physics principles based forecasting and 2) statistical and machine learning based forecasting. This paper focuses only on the second approach. In the literature, many machine learning approaches have been explored. In [3], Artificial Neural Network (ANN) ensembles were used to perform building level load forecasting. ANNs have been explored in detail for all three categories of load forecasting [5], [13], [14], [15], [16]. In [17], support vector machines coupled with empirical mode decomposition were used to perform long term load forecasting. Kernel based multi-task learning methodologies were used for electricity demand prediction in [18]. In [12], individual household electricity loads are modeled using sparse coding to perform medium term load forecasting. Apart from the methods enumerated above, a multitude of different methods have been proposed in the literature for solving the load forecasting problem. In the interest of brevity, an exhaustive coverage of these methods is not provided in this paper. Readers are referred to [19], [20] and [4] for comprehensive surveys of different techniques used for load forecasting. However, despite the extensive research, individual site/building level load forecasting remains a difficult problem.

This paper investigates the possibility of using deep learning for performing individual building level load forecasting. The multi-layered architecture of deep learning allows the learning process to be carried out with multiple layers of abstraction, so deep learning architectures can learn complex relationships between inputs and outputs and complex patterns in data. As a result, deep learning has revolutionized a range of fields such as speech recognition and computer vision. A comprehensive overview and review of deep learning methodologies can be found in [21].

In previous work on load forecasting using deep learning, the authors of [2] performed building level load forecasting using
Conditional Restricted Boltzmann Machines (CRBM) [22] and Factored Conditional Restricted Boltzmann Machines (FCRBM) [23]. The authors compared the two methods to several traditional methods including Support Vector Machines and traditional or "shallow" Artificial Neural Networks (ANN). Their experimental results showed that the FCRBM method outperformed the other tested methodologies. In another deep learning approach, the authors of [24] used Deep Belief Networks (DBN) for performing short-term load forecasting on a Macedonian hourly electricity consumption dataset. They compared the results to an ANN and concluded that the DBN significantly outperforms the ANN on the tested dataset. In [25], the authors present an ensemble method combining Deep Belief Networks and Support Vector Machines, and present load forecasting as a test case of the presented deep learning based regression methodology.

In our previous work, we investigated the effectiveness of using Long Short Term Memory (LSTM) based algorithms for building level load forecasting [26]. Two variants of the LSTM architecture were investigated: 1) the standard LSTM architecture and 2) an LSTM based sequence-to-sequence architecture. The same dataset which was used in [2], with the same train and test split, was used for comparing results. It was shown that the sequence-to-sequence architecture performed well and produced comparable results to the FCRBM algorithm in [2].

This paper presents a continuation of the work on load forecasting using deep learning by exploring a different deep learning technique for performing building level energy load forecasting. The presented methodology uses Convolutional Neural Networks (CNNs) to perform the forecasting. In the presented work, multiple convolutional layers are used on historical load data before performing the final regression task. The presented CNN based load forecasting methodology was tested on a benchmark dataset which contained electricity consumption data for a single residential customer with a time resolution of one hour. In order to compare results with the earlier work in deep learning, the same dataset used in [2] and [26] was used. To benchmark the performance, the same forecasting process was carried out using standard "shallow" Artificial Neural Networks (ANNs) and Support Vector Machines (SVM). Therefore, the performance of the CNN was compared against the ANN, SVM, LSTM, LSTM-S2S and FCRBM. In addition to comparing with other algorithms, several architectures of CNNs were tested.

The rest of the paper is organized as follows. Section II provides background on Convolutional Neural Networks. Section III elaborates the presented energy load forecasting methodology using CNNs. Section IV elaborates the experimental setup. Section V presents the experimental results. Finally, Section VI concludes the paper.

II. CONVOLUTIONAL NEURAL NETWORKS

This section provides a brief overview of Convolutional Neural Networks (CNNs).

CNNs are a special type of neural network mainly used for processing data with a grid topology [27]. For instance, images can be viewed as 2D grids, and time series data, such as energy consumption data, can be viewed as 1D grids. CNNs have been used successfully in the literature mainly for computer vision tasks such as image classification [21], [28]. CNNs use a specialized linear operation named convolution in at least one of the layers in the network.

Convolution is defined as an operation on two functions of real valued arguments [28]. The convolution operation is denoted with an asterisk:

$s = (x \ast w)$    (1)

where $x$ denotes the input function and $w$ denotes the weighting function. In the context of CNNs, the weighting function is called a "kernel". The output of the convolution operation is often called the "feature map" (denoted by $s$).
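As a concrete illustration of Eq. (1), the short Python sketch below (the numeric values are hypothetical) computes a 1-D feature map by sliding a three-element kernel over an input sequence. As is common in CNN implementations [28], the kernel is applied without flipping, which is strictly the cross-correlation form of the operation:

```python
import numpy as np

def conv1d(x, w):
    """Slide kernel w across input x (valid positions only) and
    return the resulting feature map s."""
    n = len(x) - len(w) + 1          # number of valid kernel positions
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

# Hypothetical hourly loads (kW) and a 3-tap averaging kernel
x = np.array([2.1, 2.4, 3.0, 2.8, 2.2, 1.9])
w = np.array([0.25, 0.5, 0.25])
s = conv1d(x, w)                     # feature map of length 6 - 3 + 1 = 4
print(s)                             # [2.475 2.8   2.7   2.275]
```

In a CNN, the kernel weights are not fixed as in this example; they are learned during training.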
Fig. 1: Convolutional Neural Network architecture employed for energy load forecasting
The convolution operation is usually applied to inputs represented as multidimensional arrays. Further, the kernel is also a multidimensional array of weights that is changed as the algorithm learns through the iterations. Therefore, in the general case, since the inputs and kernels are multidimensional, the convolution operation is applied over more than one dimension. The convolution operation for a two dimensional input can be expressed as:

$s(i,j) = (I \ast K)(i,j) = \sum_{l} \sum_{m} I(l,m) \, K(i+l, j+m)$    (2)

where $I$ represents a two dimensional input, $K$ represents a two dimensional kernel and $s$ is the resulting feature map after the convolution.

Each convolutional layer is comprised of three phases. The first phase performs the above described convolution operation and produces a feature map. Then, the elements of the feature map are run through a nonlinear activation function. The rectified linear activation function [29] is usually used in this stage [28]. Finally, a pooling function is used in the third stage to further modify and smooth the feature map. The pooling operation makes the representation less susceptible to small variations in the input. Out of the various pooling techniques, the max pooling method is used in the presented work. In max pooling, the operation returns the maximum value within a predefined rectangular neighborhood. Other pooling techniques such as average pooling, min pooling and weighted average pooling have been used in the literature [28].

As mentioned, the designed network can consist of one or more convolutional layers. Once the convolutional layer(s) produce their outputs, the output is sent to one or more fully connected layers. Fully connected layers can be thought of as hidden layers in a standard neural network. The output layer is placed after the fully connected layers and performs a similar function to an output layer in a standard ANN. The learning process of the CNN is carried out using back propagation.

III. CONVOLUTIONAL NEURAL NETWORKS FOR ENERGY LOAD FORECASTING

This section elaborates the presented methodology for energy load forecasting using Convolutional Neural Networks.

The objective of the presented load forecasting methodology is to estimate the energy load for a time step or multiple time steps in the future, given historical electricity load data. For the purpose of simplicity, in the presented work, the time step between two measurements is assumed constant.

Let historical energy load data for M time steps be available. Then, the vector of available historical energy load data can be expressed as:

$\vec{y} = \{ y_{[0]}, y_{[1]}, y_{[2]}, y_{[3]}, \ldots, y_{[M-1]} \}$    (3)

where $y_{[t]}$ represents the actual energy load measurement for the time step $t$. The forecasting algorithm should predict the energy load for the next N time steps. The vector of the predicted N time steps can be expressed as:

$\hat{\vec{y}} = \{ \hat{y}_{[M]}, \hat{y}_{[M+1]}, \hat{y}_{[M+2]}, \hat{y}_{[M+3]}, \ldots, \hat{y}_{[M+N-1]} \}$    (4)

where $\hat{y}_{[t]}$ represents the forecasted/predicted energy load for the time step $t$.

In performing the energy load prediction, a pre-defined number of historic energy load measurements are fed into the convolutional layers of the CNN. Since energy load measurements are time series data, 1-D convolutions are performed on the data. As mentioned in the previous section, three stages of the convolution operation are applied to the historical data. The input vector to the convolutional layers takes the same form as Eq. (3).

Once the convolutional operations are performed, their outputs are forwarded to a set of fully connected layers. In this stage, in addition to the historic energy load measurement data, information about the date and time of the first prediction is fed into the CNN as inputs. The input vector for the fully connected layers of the CNN can be expressed as:

$\vec{i}_{fc} = \{ \vec{o}_{c}, h_{[M]}, m_{[M]}, dw_{[M]}, dm_{[M]}, wf_{[M]} \}$    (5)

where $\vec{i}_{fc}$ is the input vector for the fully connected layers, $\vec{o}_{c}$ is the output from the convolutional layers, and $h_{[M]}$, $m_{[M]}$, $dw_{[M]}$, $dm_{[M]}$ and $wf_{[M]}$ represent the hour, month, day of the week, day of the month and a flag which is set for weekends, respectively, for the time stamp of the first prediction. That is, if historical energy load data for 10 previous time steps are used as inputs to make the prediction of the next time steps, the time stamp data for the 11th time step are used as inputs for the hidden layer. Fig. 1 shows the proposed CNN architecture for energy load forecasting.

Standard back propagation is used for training the model. A gradient based optimization method such as Stochastic Gradient Descent (SGD) is used to perform the optimization. The loss function for the optimization can be expressed as:

$L = \sum_{t=1}^{N} ( y_{[t]} - \hat{y}_{[t]} )$    (6)

where $L$ is the total prediction error over the N future time steps.
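To make this formulation concrete, the following sketch (in Python, using the Keras API of TensorFlow) assembles a network of the described shape: a stack of 1-D convolutional layers over the M historical loads of Eq. (3), concatenation with the calendar features of Eq. (5), fully connected layers, and N linear outputs as in Eq. (4). The filter counts and kernel sizes are illustrative assumptions rather than the exact architectures of Table I, and the mean absolute error loss is a stand-in for Eq. (6):

```python
from tensorflow.keras import layers, Model

M, N, CAL = 60, 60, 5  # history length, forecast horizon, calendar features

# Convolutional branch over the historical load window (Eq. 3)
hist_in = layers.Input(shape=(M, 1), name="history")
z = layers.Conv1D(16, kernel_size=3, activation="relu")(hist_in)
z = layers.MaxPooling1D(pool_size=2)(z)
z = layers.Conv1D(16, kernel_size=3, activation="relu")(z)
z = layers.MaxPooling1D(pool_size=2)(z)
z = layers.Flatten()(z)  # o_c of Eq. (5)

# Calendar inputs for the first predicted time stamp (Eq. 5):
# hour, month, day of week, day of month, weekend flag
cal_in = layers.Input(shape=(CAL,), name="calendar")

# Fully connected layers on the concatenated vector, linear outputs (Eq. 4)
h = layers.Concatenate()([z, cal_in])
h = layers.Dense(20, activation="relu")(h)
h = layers.Dense(20, activation="relu")(h)
out = layers.Dense(N, activation="linear")(h)

model = Model(inputs=[hist_in, cal_in], outputs=out)
model.compile(optimizer="adam", loss="mae")  # ADAM [31]; loss is an assumption
```

The two hidden layers of 20 ReLU units, the 60 linear outputs and the ADAM optimizer match the implementation details given in Section IV.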
IV. EXPERIMENTAL SETUP

This section elaborates the experimentation process of the presented methodology. First, the used dataset is introduced. Then, the specifics of the implementation are discussed.

A. Dataset

The presented CNN based energy load forecasting methodology was implemented on a benchmark dataset of electrical consumption, the "Individual household electric power consumption dataset" [30], which contains electrical consumption data for a single residential customer. The dataset contains power consumption data for a period of four years (December 2006 – November 2010) with one-minute resolution. The dataset contains the aggregate active power and sub-metering measurements for three separate sections of the house. In the presented work, only the aggregate active power data are used.

The dataset contains 2,075,259 active power measurements in the aforementioned range of dates. Even though the dataset contains one-minute resolution data, the study was conducted on one-hour resolution data, obtained by averaging the one-minute resolution data. Therefore, the dataset was reduced to 34,608 records. The dataset was split into training and testing sets. In [2] and [26], the first three years were used for training and the last year of data was used for testing. For comparison, the same train/test split was used. In addition, to assess generalized performance, training was carried out using k-fold cross validation as well.

B. Implementation Details

As mentioned, the CNN based load forecasting algorithm was implemented on one-hour resolution data. The presented methodology was implemented to perform a forecast for the next 60 hours, i.e., N in Eq. (4) was set to 60. In order to perform the said prediction, historic power consumption data for the 60 immediately previous hours were fed into the CNN based methodology, i.e., M was set to 60 in Eq. (3).

Therefore, 60 inputs were fed into the convolutional layers of the CNN. Since the inputs were time series data, they were considered to be data in the form of a 1D grid. Therefore, Eq. (2) presented in Section II has to be modified for a 1D input, and the kernels used in the convolutional layers are defined as one dimensional kernels. Fig. 1 shows the architecture of the designed CNN. In the implemented CNN, each of the convolutional layers was designed to have the above mentioned three phases: 1) convolution phase, 2) non-linear transformation and 3) pooling (subsampling) phase. As mentioned, the convolution phase for the three layers was carried out with 1D kernels. The non-linear transformation for all convolutional layers was carried out using the rectified linear unit (ReLU) activation function. The pooling phase for all convolutional layers was performed using max-pooling. Once the convolutional layers produced their output, it was sent to the fully connected layer(s) (hidden layer(s)). In this experiment, two hidden layers with 20 neurons each were used for all the test cases. The hidden layers used the ReLU function as their activation function. Since there were 60 outputs, the output layer contained 60 neurons with a linear activation function. Different CNN architectures with different numbers of convolutional layers, kernel sizes and pooling filter sizes were tested. Table I summarizes the different CNN architectures for each test case. For all architectures, the ADAM [31] algorithm was used as the gradient based optimizer for training. For all the test cases the same training and testing data were used.

TABLE I: TESTED CNN ARCHITECTURES

In order to provide a benchmark, the load forecasting process was carried out using a standard feed forward "shallow" ANN and Support Vector Machines (SVM). Historical energy load data for the 60 previous time steps, together with the same time stamp data sent to the CNN, were used as inputs to the ANN and SVM. All three algorithms were implemented with k-fold cross validation.
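For reference, the sketch below shows one plausible rendering (in Python with pandas) of the preprocessing described in this section: parsing the benchmark file [30], averaging the one-minute readings to one-hour resolution, and slicing the hourly series into (M, N) input/output windows. The file and column names follow the published UCI dataset; the windowing helper and the treatment of missing readings are simplifying assumptions:

```python
import numpy as np
import pandas as pd

# Parse the UCI "Individual household electric power consumption" file [30];
# missing readings are marked with '?' in the published data
df = pd.read_csv("household_power_consumption.txt", sep=";",
                 na_values="?", low_memory=False)
df.index = pd.to_datetime(df["Date"] + " " + df["Time"],
                          format="%d/%m/%Y %H:%M:%S")

# Average the one-minute readings to one-hour resolution (Section IV-A)
hourly = df["Global_active_power"].astype(float).resample("1h").mean()

M, N = 60, 60  # input window and forecast horizon (Section IV-B)

def make_windows(series, m, n):
    """Slide over the hourly series to build (history, target) pairs.
    Gaps are simply dropped here; a full run needs gap-aware windowing."""
    v = series.dropna().to_numpy()
    starts = range(len(v) - m - n + 1)
    x = np.array([v[i:i + m] for i in starts])
    y = np.array([v[i + m:i + m + n] for i in starts])
    return x, y

X, Y = make_windows(hourly, M, N)  # training inputs and 60-step targets
```

The calendar features of Eq. (5) can then be derived from `hourly.index` for the first predicted time stamp of each window.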
V. EXPERIMENTAL RESULTS

This section elaborates the results obtained from the conducted experiments. As mentioned, the dataset was split into a training set and a testing set. The training set contained the data for the first three years and the testing set contained the data for the final year. This specific train/test split was used to follow the same procedure used in [2] and [26] so that results could be compared. It was noticed that this specific split resulted in testing errors which were lower than the training errors. Even though this was an unusual observation, the specific split had to be used to be able to compare with the previous work done with DNNs. In addition to the specific split, k-fold cross validation was performed for the CNN, ANN and SVM.

Table II summarizes the results obtained from all the algorithms on the testing sets in terms of RMSE. It can be seen that all the DNN architectures perform better than the SVM but produce similar results to the ANN. It was seen that the CNN slightly outperformed the FCRBM but not the LSTM-S2S presented in [26]. The ANN performed better than the SVM but could not outperform any of the deep architectures.

TABLE II: TESTING ERROR COMPARISON WITH OTHER ALGORITHMS

Algorithm        RMSE (Specific Train/Test Split)    RMSE (With Cross Validation)
CNN              0.677                               0.732
LSTM-S2S [26]    0.625                               N/A
FCRBM [2]        0.663                               N/A
SVM              0.814                               0.895
ANN              0.691                               0.736

As mentioned, it was noticed that when the specific train/test split was used, the testing error was lower than the training error. Therefore, it was important to get a more general idea of the algorithms' performance. Table II also presents the results obtained by 4-fold cross validation. It can be seen that when cross validation was performed, the results are not as good as with the specific split. Cross-validated results were not available for the LSTM and the FCRBM.

Fig. 2(a) and Fig. 2(b) depict two samples from the CNN predictions on the training and testing sets respectively. The figures show samples where the actual measurement is compared with the CNN estimation.

Fig. 2(a): Sample 60 hour prediction – CNN (Training set)
Fig. 2(b): Sample 60 hour prediction – CNN (Testing set)

In both training and testing it can be seen that the CNN is able to follow the general trend of the data, showing generalization capability. This generalization capability of the CNN is shown in the cross validated results as well.

In addition to comparing the presented method with other algorithms, several architectures of the CNN were tested as mentioned in Section IV (see Table I). Eight different test cases were tried, and not much variation in performance was seen, in training or testing, regardless of the architecture. Table III presents the results obtained over all the test cases. Test Case 6 (3 convolutional layers) managed to produce the lowest training error while Test Case 1 (1 convolutional layer) managed to produce the lowest testing error.

TABLE III: ERROR SUMMARY FOR DIFFERENT CNN TEST CASES (CROSS VALIDATION)

CNN Test Case    RMSE (Testing)
1                0.732
2                0.734
3                0.746
4                0.766
5                0.744
6                0.737
7                0.757
8                0.756

However, the method needs to be tested on several different datasets and on a wider variety of architectures to accurately analyze the effect of having multiple convolutional layers and different kernel sizes on the forecasting/prediction accuracy. Further, all these methods need to be extensively tested on several datasets to be able to provide accurate comparisons of their performance.

VI. CONCLUSION AND DISCUSSION

This paper investigated the effectiveness of using Convolutional Neural Networks for individual building level energy load forecasting. The work was an extension of our work on deep learning based energy load forecasting in [26]. The presented method was implemented on a benchmark dataset of electricity consumption data for one residential customer. For comparison, load forecasting was carried out using an ANN and an SVM. The results obtained by the CNN were compared to the ANN, the SVM and the other deep architectures, LSTM and FCRBM. In addition to the algorithm comparisons, testing was carried out with different CNN architectures that consisted of different numbers of convolutional layers. It was noticed that the results did not vary much across the different architectures. The presented CNN managed to produce comparable results to the FCRBM presented in [2] and the sequence-to-sequence LSTM presented by us in [26] on the same dataset. Therefore, it can be concluded that the CNN remains a viable candidate for producing accurate load forecasts. However, it needs to be further tested on different datasets to validate its performance. In addition to these methods, the authors in [24] and [25] presented Deep Belief Networks for load forecasting. In order to compare the effectiveness of all these algorithms, they need to be tested on different real world datasets. Further, the majority of the work (except [24]) does not use weather data as an input. It is important to add weather data as inputs to the learning algorithms as it has a direct relationship with energy consumption. As future work, we plan to compare the results of the above algorithms with weather data to find the effectiveness of deep learning algorithms in load forecasting. In addition, short term, medium term and long term forecasting will be carried out with different methods to check their effectiveness.
REFERENCES

[1] C. Clastres, "Smart grids: Another step towards competition, energy security and climate change objectives," Energy Policy, vol. 39, no. 9, pp. 5399–5408, Sep. 2011.
[2] E. Mocanu, P. H. Nguyen, M. Gibescu, and W. L. Kling, "Deep learning for estimating building energy consumption," Sustainable Energy, Grids and Networks, vol. 6, pp. 91–99, Jun. 2016.
[3] J. G. Jetcheva, M. Majidpour, and W.-P. Chen, "Neural network model ensembles for building-level electricity load forecasts," Energy and Buildings, vol. 84, pp. 214–223, Dec. 2014.
[4] P. Siano, "Demand response and smart grids—A survey," Renewable and Sustainable Energy Reviews, vol. 30, pp. 461–478, Feb. 2014.
[5] C. Roldán-Blay, G. Escrivá-Escrivá, C. Álvarez-Bel, C. Roldán-Porta, and J. Rodríguez-García, "Upgrade of an artificial neural network prediction method for electrical consumption forecasting using an hourly temperature curve model," Energy and Buildings, vol. 60, pp. 38–46, May 2013.
[6] L. Pérez-Lombard, J. Ortiz, and C. Pout, "A review on buildings energy consumption information," Energy and Buildings, vol. 40, no. 3, pp. 394–398, 2008.
[7] K. Amarasinghe, D. Wijayasekara, H. Carey, M. Manic, D. He, and W. P. Chen, "Artificial neural networks based thermal energy storage control for buildings," in IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society, 2015, pp. 005421–005426.
[8] D. Wijayasekara and M. Manic, "Data-fusion for increasing temporal resolution of building energy management system data," in IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society, 2015, pp. 004550–004555.
[9] M. Manic, D. Wijayasekara, K. Amarasinghe, and J. J. Rodriguez-Andina, "Building Energy Management Systems: The Age of Intelligent and Adaptive Buildings," IEEE Industrial Electronics Magazine, vol. 10, no. 1, pp. 25–39, Mar. 2016.
[10] S. Naji et al., "Estimating building energy consumption using extreme learning machine method," Energy, vol. 97, pp. 506–516, Feb. 2016.
[11] M. Manic, K. Amarasinghe, J. J. Rodriguez-Andina, and C. Rieger, "Intelligent Buildings of the Future: Cyberaware, Deep Learning Powered, and Human Interacting," IEEE Industrial Electronics Magazine, vol. 10, no. 4, pp. 32–49, Dec. 2016.
[12] C. N. Yu, P. Mirowski, and T. K. Ho, "A Sparse Coding Approach to Household Electricity Demand Forecasting in Smart Grids," IEEE Transactions on Smart Grid, vol. 8, no. 2, pp. 738–748, Mar. 2017.
[13] M. Q. Raza and Z. Baharudin, "A review on short term load forecasting using hybrid neural network techniques," in 2012 IEEE International Conference on Power and Energy (PECon), 2012, pp. 846–851.
[14] M. D. Felice and X. Yao, "Short-Term Load Forecasting with Neural Network Ensembles: A Comparative Study [Application Notes]," IEEE Computational Intelligence Magazine, vol. 6, no. 3, pp. 47–56, Aug. 2011.
[15] R. Kumar, R. K. Aggarwal, and J. D. Sharma, "Energy analysis of a building using artificial neural network: A review," Energy and Buildings, vol. 65, pp. 352–358, Oct. 2013.
[16] S. M. Sulaiman, P. A. Jeyanthy, and D. Devaraj, "Artificial neural network based day ahead load forecasting using Smart Meter data," in 2016 Biennial International Conference on Power and Energy Systems: Towards Sustainable Energy (PESTSE), 2016, pp. 1–6.
[17] L. Ghelardoni, A. Ghio, and D. Anguita, "Energy Load Forecasting Using Empirical Mode Decomposition and Support Vector Regression," IEEE Transactions on Smart Grid, vol. 4, no. 1, pp. 549–556, Mar. 2013.
[18] J.-B. Fiot and F. Dinuzzo, "Electricity Demand Forecasting by Multi-Task Learning," IEEE Transactions on Smart Grid, vol. PP, no. 99, pp. 1–1, 2016.
[19] A. K. Singh, Ibraheem, S. Khatoon, M. Muazzam, and D. K. Chaturvedi, "Load forecasting techniques and methodologies: A review," in 2012 2nd International Conference on Power, Control and Embedded Systems, 2012, pp. 1–10.
[20] L. Hernandez et al., "A Survey on Electric Power Demand Forecasting: Future Trends in Smart Grids, Microgrids and Smart Buildings," IEEE Communications Surveys & Tutorials, vol. 16, no. 3, pp. 1460–1495, Third Quarter 2014.
[21] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[22] V. Mnih, H. Larochelle, and G. E. Hinton, "Conditional Restricted Boltzmann Machines for Structured Output Prediction," arXiv:1202.3748 [cs, stat], Feb. 2012.
[23] G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Two Distributed-State Models For Generating High-Dimensional Time Series," Journal of Machine Learning Research, vol. 12, pp. 1025–1068, Mar. 2011.
[24] A. Dedinec, S. Filiposka, A. Dedinec, and L. Kocarev, "Deep belief network based electricity load forecasting: An analysis of Macedonian case," Energy, vol. 115, Part 3, pp. 1688–1700, Nov. 2016.
[25] X. Qiu, L. Zhang, Y. Ren, P. N. Suganthan, and G. Amaratunga, "Ensemble deep learning for regression and time series forecasting," in 2014 IEEE Symposium on Computational Intelligence in Ensemble Learning (CIEL), 2014, pp. 1–6.
[26] D. L. Marino, K. Amarasinghe, and M. Manic, "Building Energy Load Forecasting using Deep Neural Networks," arXiv:1610.09460 [cs], Oct. 2016.
[27] Y. LeCun and Y. Bengio, "Convolutional Networks for Images, Speech, and Time-Series," in The Handbook of Brain Theory and Neural Networks, 1995.
[28] I. J. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[29] X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2011, pp. 315–323.
[30] M. Lichman, "UCI Machine Learning Repository," 2013.
[31] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980 [cs], Dec. 2014.