Renewable Energy
journal homepage: www.elsevier.com/locate/renene
ARTICLE INFO

Keywords: Renewable energy; Solar and wind power forecasting; Transformer model; Bidirectional long short-term memory model; Hybrid model

ABSTRACT

Accurate prediction of solar and wind power output is crucial for effective integration into the electrical grid. Existing methods, including conventional approaches, machine learning (ML), and hybrid models, have limitations such as limited adaptability, narrow generalizability, and difficulty in forecasting multiple types of renewable energy, respectively. To address these challenges, this study introduces two novel hybrid models: the CNN-ABiLSTM, which integrates Convolutional Neural Networks (CNN) with Attention-based Bidirectional Long Short-Term Memory (ABiLSTM), and the CNN-Transformer-MLP, which integrates CNN with Transformers and Multi-Layer Perceptrons (MLP). In both hybrid models, the CNN captures short-term patterns in solar and wind power data, while the ABiLSTM and Transformer-MLP models address the long-term patterns. CNN, BiLSTM, and Encoder-based Transformer were taken as baseline standalone models. The proposed hybrid models and standalone baseline models were trained on quarter-hourly real-time data. The hybrid models outperform the standalone baseline models in day-, week-, and month-ahead forecasting. The CNN-Transformer-MLP hybrid provides more accurate day- and week-ahead solar and wind power predictions with lower mean absolute error (MAE), root mean square error (RMSE), and mean square error (MSE) values. For month-ahead forecasts, the CNN-ABiLSTM hybrid excels in wind power prediction, demonstrating its strength in long-term forecasting.
* Corresponding author.
E-mail address: [email protected] (H. Wang).
https://doi.org/10.1016/j.renene.2024.122055
Received 18 August 2024; Received in revised form 12 November 2024; Accepted 28 November 2024
Available online 29 November 2024
0960-1481/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
T. Bashir et al. Renewable Energy 239 (2025) 122055
the time span of the forecast, ranging from ultra-short-term (up to 1 h), short-term (daily), medium-term (daily to weekly), to long-term forecasting (weekly to yearly) [9]. Ultra-short-term forecasts are vital for grid management, short-term forecasts support real-time electricity dispatch and unit commitment, medium-term forecasts assist in grid maintenance planning, and long-term forecasts help in planning grid expansion [10]. Regarding forecasting techniques, RES power generation prediction utilizes three main methods. Analytical equations are the foundation of the 'White-Box' or 'Physical' methods, which investigate the interaction between a variety of factors that affect the production of RES [11]. Conversely, 'Black-Box' methods, which include statistical and machine learning approaches, rely on historical RES power generation data to uncover hidden relationships between output and input variables using mathematical models [12]. Statistical methods, such as Exponential Smoothing (ES), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) [13], and regression models [14], are particularly effective for ultra-short-term forecasting of RES data. Machine learning (ML) [15] techniques, such as artificial neural networks (ANN), decision trees [16,17], and k-nearest neighbors (kNN) [18], are predominantly used for short-term forecasting [19]. Lastly, 'Grey-Box' methods, which are hybrids of deep and ML models, have been shown to yield the most accurate forecasting results compared to the aforementioned methods in all time-frames [20,21]. These methodologies profit from the advantages of both 'White-Box' and 'Black-Box' methodologies, thereby delivering more precise predictions [22]. Different RES forecasting models have been compared in Ref. [23], highlighting the research trend toward hybrid models due to their better performance than standalone counterparts.

Various studies have employed diverse combinations of machine and deep learning-based hybrid models to predict RES power generation data. In Ref. [24], the Transformer model's forecasting capabilities were investigated in light of the correlation between various wind farms in order to forecast short-term wind power production. Although the Transformer model excels at capturing long-term dependencies in sequences through its self-attention mechanism [25], it may overlook local relations. Therefore, to thoroughly investigate the local relations within sequences, the Transformer model has been utilized in a hybrid fashion. For example, in Ref. [26] the Transformer (TF) model was combined with CNN and LSTM in different sequences for accuracy improvement. Different model combinations were tested on solar power generation data, and the CNN-LSTM-TF combination was found to be optimal. Mostly, the combination of CNN with other deep learning models has been employed to harness its capability to capture the intrinsic features of RES power generation data, complemented by weather data [27,28]. The integration of CNN with other models has been utilized for predicting outcomes using both univariate and multivariate input datasets. Research indicates that multivariate hybrid CNN models tend to perform better in initial time steps. Conversely, the performance of univariate hybrid CNN models typically improves over later time steps [29,30]. This pattern suggests that hybrid CNN models are effectively adaptable for forecasting with both types of datasets.

Most hybrid models for short-term forecasting focused on a single RES power generation source, either solar power or wind power. In Ref. [31], the author suggested a hybrid model for forecasting short-term solar power generation data that consists of two noise-removal decomposition layers: Variational Mode Decomposition (VMD) and Improved Complementary Ensemble Empirical Mode Decomposition (ICEEMD). Then, the Whale Optimization Algorithm (WOA) is used to determine the optimal parameters for tailoring a BiLSTM model. Finally, an attention layer was introduced to focus on key information, enhancing the forecasting accuracy. The incorporation of diverse deep learning models enhances the precision of solar power generation data prediction. However, this integration also escalates the computing expense required to execute the hybrid model. In Ref. [32], a hybrid model, utilizing the combination of Weighted Extreme Learning Machine (ELM) and Particle Swarm Optimization (PSO), was employed to predict hour- and day-ahead wind power generation data. Similarly, a hybrid CNN and Transformer model proposed in Ref. [33] focuses on forecasting wind power generation data across multiple farms for ultra-short-term and short-term periods, thereby overlooking the need for a model that handles both solar and wind power data. A hybrid Wavelet Packet Decomposition (WPD)-CNN-LSTM-MLP model was proposed in Ref. [33] to forecast hourly-based solar irradiance data. The WPD-CNN-LSTM-MLP hybrid exhibited superior performance compared to other network combinations, indicating that a random combination of diverse networks for forecasting tasks is inadequate. This was notably underscored by the efficacy of the WPD-CNN-LSTM combination. A hybrid model, comprising a CNN layer, fully connected neural network layers, and Gated Recurrent Unit (GRU) layers, was used to predict very short-term wind data in Ref. [34]. In Ref. [35], a Multi-Head-Attention (MHA) probabilistic CNN-BiLSTM model was proposed to forecast wind speed data. All these studies focused on short-term forecasting for a single RES generation source but did not tackle the problem of combined solar and wind forecasting over longer periods of time.

There is a dearth of published research on hybrid models that attempt to predict data from both solar and wind power sources. For example, in Ref. [36], a novel approach was introduced to forecast solar and wind power generation data by utilizing a hybrid deep learning model that integrates time2vec, wide-first-layer-kernel CNN, and BiLSTM. Although this hybrid model utilized hourly-based wind and solar data, it did not address the model's forecasting accuracy over longer periods. In Ref. [37], the author proposed using statistical Markov Chain Monte Carlo (MCMC) simulations to forecast wind and solar power generation, aiming to optimize long-term energy contracts for purchasing renewable energy. The forecasting results from the MCMC simulations were compared with those derived from Bayesian estimates to evaluate their effectiveness. However, it did not address the model's forecasting over shorter periods. In addition, data from a single site was used to evaluate the forecasting accuracy in the above studies.

Each of the hybrid models mentioned above, including MHA-CNN-BiLSTM, CNN-LSTM-TF, WPD-CNN-LSTM-MLP, and so on, surpasses the performance of traditional models (like ARIMA, ES, etc.) or ML models (such as ANN, kNN, LSTM, Transformer, etc.) when used standalone. These hybrid approaches improve prediction accuracy by combining the strengths of both methods and compensating for their individual weaknesses. However, most hybrid approaches are specifically designed and applied to data from a single renewable energy source (RES), either solar or wind power. While some hybrid techniques utilize both solar and wind power data, their use has been confined to short-term forecasting. While short-term forecasts are essential for real-time electric grid management and operational efficiency, longer-interval forecasts are also crucial for tactical planning and ensuring reliability in energy supply. However, there is a significant gap for hybrid models that are computationally efficient, free from manual feature selection, and can effectively forecast not only short-term but also longer periods of time.

1.3. Contribution and organization

To mitigate the aforementioned issues, this study proposes two hybrid techniques: CNN-ABiLSTM and CNN-Transformer-MLP. The proposed work addresses the technical gap in solar and wind power data forecasting by overcoming the limitations of recent studies. These approaches were rigorously tested and evaluated using several evaluation metrics. The hybrid models outperform the standalone CNN, BiLSTM, and Transformer methods in terms of performance accuracy, as evidenced by empirical findings. This confirms the competitiveness and efficacy of the proposed models in forecasting production data for solar and wind power. The main findings of this study are outlined as follows.

● Hybrid CNN-ABiLSTM and CNN-Transformer-MLP models are proposed in this work for both solar and wind power forecasting. The
proposed hybrid methods considered both short-term and long-term patterns in solar and wind power data.
● Both hybrid models utilize the CNN model to address the short-term patterns in solar and wind power data. The Transformer-MLP and ABiLSTM models capture the residual long-term patterns.
● The hybrid models were trained on real-time, quarter-hourly univariate solar and wind power data sourced from the European Network of Transmission System Operators for Electricity (ENTSO-E), specifically covering Germany and Luxembourg.
● The proposed hybrid models have a forecasting horizon that spans from one day to a month and were evaluated and compared against standalone CNN, Encoder-based Transformer, and BiLSTM models using evaluation metrics including MSE, RMSE, MAE, and the coefficient of determination (R2).

The remaining sections of the paper are organized in the following manner: Section 2 provides a detailed explanation of the suggested approach, including the methodology, data preparation, and an overview of the model components, such as CNN, BiLSTM, Transformer, and hybrid models. Section 3 briefly delves into the evaluation metrics. The experimental setup is described in Section 4, and the results are discussed in Section 5. Section 6 concludes the paper.

2. Methodology

This section outlines various prediction methodologies using deep learning-based neural network models to extract patterns in solar and wind power data. It concludes with the principles of the proposed hybrid models, which aim to enhance accuracy and efficiency in renewable power prediction. The overall forecasting process is illustrated in Fig. 1.

2.1. Data preparation

Data preparation involves data collection, handling missing values and noise, data division, and normalization. This study gathers power generation data from renewable energy sources, which often contain missing values and noise due to sensor malfunctions. Methods like linear interpolation and forward or backward fill are used to address missing values [38]. Similarly, noise in RES data can be addressed using various methods, with normalization commonly applied. The dataset is subsequently divided into three subsets for the purpose of training, validating, and evaluating the forecasting model [39]. The training set changes model parameters, the validation set examines performance after each epoch, and the testing set assesses accuracy.

Data normalization is essential for optimizing model performance and accelerating convergence, particularly for neural network-based models. The Min-Max normalization method scales data within the range of 0–1 and is given in Eq. (1) below:

\hat{x}_o = \frac{x_o - x_{min}}{x_{max} - x_{min}} \quad (1)

where \hat{x}_o is the normalized value of the data and x_o denotes the original input. x_{max} and x_{min} denote the maximum and minimum values of the original data, respectively.

In this study, real-time solar PV and wind power generation data were downloaded from the ENTSO-E platform [40], covering the German (DE) and Luxembourg (LU) regions with 15-min intervals. The dataset spans four years (2018–2023) with 178,272 samples. Data from 2018 to 2021 (119,712 observations) was used for training, while the following 13 months (29,280 observations) were used for validation and testing. The data shows minor short-term fluctuations, as each instance represents the combined power generation from various solar and wind farms. Prior to forecasting, the data was processed as per the mentioned criteria.
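As a concrete illustration, the preprocessing pipeline described above, Min-Max scaling per Eq. (1) followed by a chronological train/validation/test division, can be sketched in NumPy. The toy series and split sizes below are illustrative assumptions, not the ENTSO-E data:

```python
import numpy as np

def min_max_scale(x):
    # Eq. (1): x_hat = (x - x_min) / (x_max - x_min), scaling into [0, 1].
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def chronological_split(series, n_train, n_val):
    # Time-ordered split: train, then validation, then test (no shuffling,
    # so the temporal order of the power series is preserved).
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

# Toy stand-in for a quarter-hourly power series (values in MW).
power = np.array([0.0, 5.0, 20.0, 35.0, 50.0, 40.0, 25.0, 10.0])
scaled, lo, hi = min_max_scale(power)
train, val, test = chronological_split(scaled, n_train=4, n_val=2)
print(scaled.min(), scaled.max(), len(train), len(val), len(test))
```

Keeping `lo` and `hi` allows forecasts to be mapped back to MW via the inverse transform `x = x_hat * (hi - lo) + lo`.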
2.2. Model components

The model components subsection briefly describes the basic principle of each component employed in the proposed hybrid frameworks.

2.2.1. Convolutional neural network
A convolutional neural network (CNN) is a feedforward neural network that uses convolutional operations to extract high-level features and correlations from time series [41]. Incorporating a 1D convolutional layer enhances accuracy and reduces complexity, while CNNs are effective at extracting informative features from noisy data [42].

A typical CNN architecture consists of three primary components: a convolutional layer, a pooling layer, and a fully connected layer. The convolutional layer extracts contextual spatial information from data using convolutional kernels, with multiple kernels generating feature maps through their interactions:

y_j^m = f\left( \sum_{i \in M_j} x_i^{m-1} \odot w_{ij}^m + b_j^m \right) \quad (2)

where y_j^m is the j-th output of the m-th convolutional layer and x_i^{m-1} is the i-th output of the (m−1)-th convolutional layer. M_j is the selection of input maps, and w_{ij}^m denotes the weight between the i-th input and the j-th output. \odot denotes the convolution operation and b_j^m denotes the j-th bias of the m-th layer. Finally, f denotes the rectified linear unit (ReLU) activation function.

A pooling layer in a CNN architecture reduces parameters by either taking the global average or selecting the maximum value from the convolutional kernel outputs, as represented by Eq. (3):

y_j^m = f\left( \beta_j^m \max\left( x_j^{m-1} \right) + b_j^m \right) \quad (3)

where \max(\cdot) denotes the max-pooling subsampling function and \beta_j^m represents the j-th bias of the m-th layer.

The fully connected layer produces the final results of the CNN model by utilizing the outputs from the convolutional and pooling layers as its inputs, as specified in Eq. (4):

y_j^m = f\left( w^m x_j^{m-1} + b_j^m \right) \quad (4)

where y_j^m denotes the final j-th output of the m-th layer of the CNN model, x_j^{m-1} denotes the j-th input vector of the (m−1)-th layer, and w^m denotes the weight matrix between the m-th layer and the (m−1)-th layer.

2.2.2. BiLSTM
BiLSTM is an advanced version of the LSTM model. Before delving into the BiLSTM model, it is essential to understand the fundamental structure and working principles of the LSTM model. LSTM enhances traditional RNNs by addressing the vanishing gradient problem and improving long-term sequence retention through a gating mechanism [43,44]. The LSTM unit cell comprises three gates: input, forget, and output. These gates regulate the model's capacity to retain long-term dependencies. The mathematical dynamics of LSTM can be explained as follows:

i_t = \sigma\left( U_i \cdot [h_{t-1}, x_t] + b_i \right)
f_t = \sigma\left( U_f \cdot [h_{t-1}, x_t] + b_f \right)
o_t = \sigma\left( U_o \cdot [h_{t-1}, x_t] + b_o \right) \quad (5)
c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left( U_a \cdot [h_{t-1}, x_t] + b_a \right)
h_t = o_t \odot \tanh(c_t)

where U denotes the weight matrix and b the bias of the corresponding gate. The \odot symbol denotes element-wise multiplication. σ is the sigmoid activation function and is responsible for controlling the opening and closing of the corresponding gates. The sigmoid activation function also enables an LSTM model to capture the non-linear features of time series data. The output gate's hyperbolic tangent activation function, conversely, controls the output between −1 and 1. i_t stands for the input gate, f_t for the forget gate, o_t for the output gate, c_t for the cell state, and h_t for the hidden state.

In an LSTM model, the forget gate f_t uses a sigmoid function to decide whether to retain or discard the previous cell content. Subsequently, the cell state is modified by merging the forget gate (f_t) with the prior cell state (c_{t−1}) and the input gate (i_t) with the current input. The output gate, denoted as o_t, utilizes a sigmoid function to ascertain whether to retain or transmit the present output. The hidden state h_t is obtained by multiplying the controlled output of o_t with the current cell state c_t. Both c_t and h_t are then transmitted to the next stage. This process is illustrated in Fig. 2.

Fig. 2. Single Cell of LSTM model [45].

The BiLSTM model comprises two LSTM blocks: a forward LSTM block processing the input sequence from t−m to t, and a backward LSTM block processing from t to t−m. The forward LSTM block's output (\overrightarrow{o}_t) and the backward LSTM block's output (\overleftarrow{o}_t) are obtained by utilizing the working principles of the single LSTM model described above. The output of the BiLSTM model (y_t) is given in Eq. (6):

y_t = \sigma\left( \overrightarrow{o}_t, \overleftarrow{o}_t \right) \quad (6)

The purpose of utilizing the sigmoid activation function (σ) is to unify the outputs of the single LSTM models. To regulate overfitting of the BiLSTM model's output, the drop-out technique is employed. The rate of the drop-out layer is a hyperparameter that needs to be tuned accordingly while defining the BiLSTM model. The working operation of the BiLSTM model is shown in Fig. 3.

2.2.3. Transformer model
The Transformer model, by means of its self-attention mechanism, learns temporal and recurring patterns in RES data more efficiently than traditional RNNs. Unlike RNNs, Transformers do not require processing data in sequential order. The components of the Transformer model are discussed below.

2.2.3.1. Encoder-decoder architecture. A Transformer model typically consists of stacked encoder-decoder layers. Each layer includes a positional encoding, a self/multi-head-attention sublayer, and a fully connected sublayer. Each sublayer, following layer normalization, employs a residual structure. The mathematical dynamics of these sublayers are described as follows:

output_{sublayer} = \mathrm{LayerNorm}\left( x + \mathrm{Sublayer}(x) \right) \quad (7)

where output_{sublayer} denotes the output of the sublayer and \mathrm{Sublayer}(x) represents the function implemented by the sublayer. To support residual connection
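As an illustration of Eq. (7), a minimal NumPy sketch of the residual-plus-normalization sublayer is given below; the identity sublayer and the toy shapes are assumptions for demonstration only, not the paper's trained model:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each time step (last axis) to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def residual_sublayer(x, sublayer):
    # Eq. (7): output_sublayer = LayerNorm(x + Sublayer(x)).
    return layer_norm(x + sublayer(x))

# Toy sequence: 4 time steps, model dimension 8, identity "sublayer".
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = residual_sublayer(x, lambda z: z)
print(out.shape)  # (4, 8)
```

The residual path lets gradients bypass the sublayer, which is the property the stacked encoder layers rely on.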
\mathrm{Atten}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{T}}{\sqrt{d_K}} \right) V \quad (11)

where d_K denotes the dimension of K.

The multi-head attention mechanism splits a single self-attention mechanism into h parallel heads, each calculated using the formula in Eq. (11) to generate different weight matrices. The individual weight matrices W_i^Q, W_i^K, and W_i^V are responsible for transforming Q, K, and V of dimension d_{model} into h vectors Q_i, K_i, and V_i, each of dimension d_{model}/h. Finally, the outputs of all heads of each parallel layer are concatenated and passed through a linear layer to get the final value. The mathematical dynamics of the above explanation are given as follows:

\mathrm{MultiHeadAtten}(Q, K, V) = \mathrm{Concat}(h_1, \cdots, h_h) W^{O} \quad (12)

where h_i = \mathrm{Atten}(Q_i, K_i, V_i), and

Q_i = Q W_i^Q, \quad W_i^Q \in \mathbb{R}^{d_{model} \times d_K}
K_i = K W_i^K, \quad W_i^K \in \mathbb{R}^{d_{model} \times d_K}, \quad d_K = d_V = d_{model}/h \quad (13)
V_i = V W_i^V, \quad W_i^V \in \mathbb{R}^{d_{model} \times d_V}
i = 1, 2, 3, \cdots, h

where W_i^Q, W_i^K, and W_i^V are linearly projected from Q, K, and V. W^{O} is also a projected parameter matrix utilized to remap the result of concatenation to d_{model} dimensions.
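Eqs. (11)–(13) can be sketched directly in NumPy. The head count, dimensions, and random projection matrices below are illustrative assumptions, not the paper's trained weights; each head's projection is stored as a column block of a single d_model × d_model matrix:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Eq. (11): softmax(Q K^T / sqrt(d_K)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(Q, K, V, Wq, Wk, Wv, Wo, h):
    # Eqs. (12)-(13): project into h heads, attend per head, concatenate,
    # then remap the concatenation back to d_model with Wo.
    d_model = Q.shape[-1]
    d_k = d_model // h
    heads = []
    for i in range(h):
        sl = slice(i * d_k, (i + 1) * d_k)  # column block = head i's projection
        heads.append(attention(Q @ Wq[:, sl], K @ Wk[:, sl], V @ Wv[:, sl]))
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(1)
T, d_model, h = 6, 16, 4
x = rng.normal(size=(T, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
out = multi_head_attention(x, x, x, Wq, Wk, Wv, Wo, h)
print(out.shape)  # (6, 16)
```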
2.2.4. Hybrid model strategy

This study explores various hybrid framework strategies for forecasting renewable power generation data, including CNN-LSTM, CNN-BiLSTM, CNN-ABiLSTM, and CNN-Transformer-MLP techniques. However, only two of the hybrid techniques, CNN-ABiLSTM and CNN-Transformer-MLP, outperformed the other hybrid techniques, and the CNN-Transformer-MLP hybrid model outperformed the aforementioned CNN-ABiLSTM. Therefore, these two hybrid techniques are discussed in detail in this section. Fig. 5 elaborates on the two proposed hybrid models, which share the same preprocessing steps for wind and solar power datasets. Note that the models were trained separately for wind and solar power predictions, using wind data for wind power forecasting and solar data for solar power forecasting. The figure also highlights the structural differences between the proposed hybrid models.

2.2.4.1. CNN-ABiLSTM model. The hybrid CNN-ABiLSTM model is a stacked hybrid strategy that utilizes the strength of each model at a different stage of the forecasting process. The hybrid model structure begins with the 1D-CNN, followed by BiLSTM networks, and concludes with the incorporation of an attention mechanism along with a dense layer. Initially, the rolling (sliding) window method is essential for structuring the input data. This method allows the model to forecast the subsequent time step by utilizing the renewable power data from the previous time steps (particularly solar and wind power). The rolling window strategy utilizes the following time step as the output variable and the preceding time steps as the input variables.

The width of the window in the rolling window method is defined by the number of previous time steps that were utilized. Following the selection of the ideal window width, the CNN is fed renewable power data with the ideal window size. A CNN model has a strong capability to capture the local (short-term) correlations between the previous and next time steps of univariate renewable power data. The correlations extracted by the CNN are fed to the BiLSTM model. The BiLSTM model thus utilizes the new information extracted by the CNN to identify innate contextual temporal features (long-term correlations) and forecast renewable power generation data. In addition, the output of the BiLSTM model is fed into the attention mechanism module.

The primary function of the attention mechanism is to allocate more weight in the output layer of the BiLSTM model to time steps that are more meaningful. This allows the model to effectively adjust to shifting patterns and variable lengths of temporal dependencies. It ensures that the BiLSTM layer can selectively attend to the more relevant portions of the time series during forecasting.

The proposed hybrid framework of CNN-ABiLSTM employs the simple attention mechanism, foregoing self-attention or multi-head attention mechanisms. The mathematical dynamics of this simple attention mechanism are given as follows:

e = \tanh(W \cdot x + b)
\beta = \mathrm{softmax}(e) \quad (14)
context = \sum_{i=1}^{n} (x_i \cdot \beta_i)

The set of new values, denoted by e, is created by calculating the dot product of the input sequence x and the weight matrix W, and then passing the result through a hyperbolic tangent activation function. β is the probability distribution of e. Finally, the context is the weighted sum of the i-th input sequence (x_i) and the i-th probability distribution (β_i).

2.2.4.2. CNN-Transformer-MLP model. The working principle of the hybrid CNN-Transformer-MLP is somewhat similar to that of the hybrid CNN-ABiLSTM. The input fed to the hybrid CNN-Transformer-MLP model also employs the same rolling (sliding) window technique. The CNN model is also an important part of this hybrid technique, and the purpose of employing the CNN model is the same as in the hybrid CNN-ABiLSTM, that is, extraction of the high-level features and correlations present in the renewable power generation data. After the extraction of high-level features and correlations by the CNN, these features are passed through the positional encoding layer. The purpose of employing positional encoding is to assign a specific position to a specific time step for processing by the Transformer layer. Although the Transformer does not possess inherent sequential data processing skills, it is adept at capturing long-term dependencies in RES data. The output of the Transformer model is subsequently fed to a multilayer perceptron to generate the ultimate result.

A key difference between the two hybrid techniques resides in the utilization of the BiLSTM and Transformer models. The BiLSTM models may be susceptible to the vanishing gradient issue, which can hinder their ability to model the long-range dependencies present in RES data. Because renewable power generation data is sequential in nature, the BiLSTM model requires additional training time compared to the Encoder-based Transformer model. However, the Encoder-based Transformer models cope with the aforementioned problems by leveraging their strong parallel processing ability. Similarly, to address the BiLSTM model's difficulty in capturing long-range dependencies, which the Transformer model can easily capture with the help of its in-built attention mechanism, different attention mechanism modules, such as simple attention, self-attention, and multi-head attention, were employed. However, the simple attention mechanism modules gave the optimal results at the optimal training time. So, the simple attention mechanism is finally utilized in the hybrid strategy of the CNN-ABiLSTM model.

3. Evaluation metrics

The performance of the previously described forecasting models is evaluated by employing four separate evaluation metrics: the mean absolute error (MAE), the root mean squared error (RMSE), the mean squared error (MSE), and finally the coefficient of determination (R2). Mathematically, these evaluation metrics can be described as follows:

\mathrm{MAE} = \frac{1}{N} \sum_{L=1}^{N} \left| \hat{y}_R - y_R \right| \quad (15)

\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{L=1}^{N} \left( \hat{y}_R - y_R \right)^2 } \quad (16)

\mathrm{MSE} = \frac{1}{N} \sum_{L=1}^{N} \left( \hat{y}_R - y_R \right)^2 \quad (17)

R^2 = 1 - \frac{R_{ss}}{T_{ss}} \quad (18)

where N denotes the total number of data points, \hat{y}_R represents the forecasted value of the renewable power generation data, and y_R is the original data. Finally, R_{ss} represents the residual sum of squares, and T_{ss} is the total sum of squares.

4. Experimental setup

The open-source TensorFlow machine learning framework served as the platform for coding in Python. In the hybrid CNN-ABiLSTM model, the CNN layer is composed of two Conv1D layers. The Conv1D layers are utilized by employing the built-in "keras.layers" package, with 64 filters (convolutional kernels), a kernel size of 2, and ReLU as the activation function. The BiLSTM layer consists of 50 LSTM units, each having a ReLU activation function. Finally, the mathematical formulation of Eq. (14) helped to build a customized attention mechanism function by
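A minimal NumPy sketch of the simple attention computation in Eq. (14) is given below; the feature sizes, toy inputs, and function name are assumptions for illustration, not the authors' Keras implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def simple_attention(x, W, b):
    # Eq. (14): e = tanh(W.x + b); beta = softmax(e); context = sum_i x_i * beta_i
    e = np.tanh(x @ W + b)                      # one score per time step
    beta = softmax(e)                           # attention weights, sum to 1
    context = (x * beta[:, None]).sum(axis=0)   # weighted sum over time steps
    return context, beta

rng = np.random.default_rng(2)
T, d = 5, 8                                     # 5 time steps of 8 BiLSTM features
x = rng.normal(size=(T, d))
W = rng.normal(size=(d,)) * 0.1
context, beta = simple_attention(x, W, 0.0)
print(context.shape, round(beta.sum(), 6))  # (8,) 1.0
```

The context vector is what the final dense layer consumes: time steps with larger scores contribute more to the forecast.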
Fig. 6. Day ahead predictions (a) solar power data, (b) wind power data.
Fig. 7. Week ahead predictions (a) solar power data, (b) wind power data.
Fig. 8. Month ahead predictions (a) solar power data, (b) wind power data.
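The metric values reported in Tables 3–5 follow Eqs. (15)–(18); as a reference sketch, they can be computed in NumPy as below (the toy arrays are illustrative, not the paper's data):

```python
import numpy as np

def evaluate(y_true, y_pred):
    # Eqs. (15)-(18): MAE, RMSE, MSE, and R^2.
    err = y_pred - y_true
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    rmse = np.sqrt(mse)
    rss = (err ** 2).sum()                       # residual sum of squares
    tss = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    r2 = 1.0 - rss / tss
    return mae, rmse, mse, r2

# Toy power values in MW.
y_true = np.array([100.0, 120.0, 140.0, 160.0])
y_pred = np.array([110.0, 115.0, 145.0, 150.0])
mae, rmse, mse, r2 = evaluate(y_true, y_pred)
print(mae, mse)  # 7.5 62.5
```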
Table 3
Comparison of day-ahead forecasts across different models.
Model Name and Date Model’s Solar Power Generation Data Performance Evaluation Model’s Wind Power Generation Data Performance Evaluation
2022-11-30 RMSE (MW) MAE (MW) MSE (MW²) R2 RMSE (MW) MAE (MW) MSE (MW²) R2
CNN 88.95 81.60 7912.78 0.9953 274.48 270.88 75342.13 0.6471
BiLSTM 56.79 46.34 3225.51 0.9981 62.86 52.45 3952.02 0.9815
Encoder-based Transformer 48.09 29.02 2313.48 0.9986 46.48 36.82 2160.43 0.9898
CNN-ABiLSTM 36.94 26.70 1364.89 0.9992 44.82 37.68 2009.41 0.9905
CNN-Trans.-MLP 31.32 17.11 981.02 0.9994 43.07 33.75 1855.04 0.9913
for the solar power dataset. Its RMSE value for solar power predictions, 54.89, is lower than that of the CNN-ABiLSTM model. However, for the wind power dataset, the CNN-ABiLSTM model proved to be more efficient than CNN-Transformer-MLP, with an RMSE value of 256.19. In terms of percentage performance, the CNN-ABiLSTM model outperformed the CNN-Transformer-MLP model for
Table 4
Comparison of week-ahead forecasts across different models.
Model Name and Date Model’s Solar Power Generation Data Performance Evaluation Model’s Wind Power Generation Data Performance Evaluation
2022-12-02-09 RMSE (MW) MAE (MW) MSE (MW²) R2 RMSE (MW) MAE (MW) MSE (MW²) R2
CNN 68.92 48.68 4749.77 0.9972 196.96 170.88 38793.06 0.9966
BiLSTM 56.88 39.69 3235.38 0.9981 133.51 97.95 17824.25 0.9985
Encoder-based Transformer 49.68 33.42 2469.04 0.9985 140.59 103.16 19766.94 0.9983
CNN-ABiLSTM 32.89 16.13 1082.19 0.9993 125.60 89.37 15775.73 0.9986
CNN-Trans.-MLP 32.28 15.35 1042.20 0.9993 117.74 84.20 13882.27 0.9988
Table 5
Comparison of month-ahead forecasts across different models.
Model Name and Date Model’s Solar Power Generation Data Performance Evaluation Model’s Wind Power Generation Data Performance Evaluation
2023–01 RMSE (MW) MAE (MW) MSE (MW²) R2 RMSE (MW) MAE (MW) MSE (MW²) R2
CNN 131.90 83.96 17397.78 0.9965 294.27 220.36 86599.44 0.9994
BiLSTM 72.48 56.79 5254.72 0.9989 268.88 185.40 72297.18 0.9995
Encoder-based Transformer 83.91 50.33 7041.47 0.9986 368.40 262.34 135722.1 0.9992
CNN-ABiLSTM 63.81 46.26 4071.96 0.9992 256.19 178.01 65634.47 0.9996
CNN-Trans.-MLP 54.89 32.60 3013.95 0.9994 266.61 178.49 71085.26 0.9996
Fig. 9. CNN-Trans.-MLP performance comparison with other models: (a) day-ahead, (b) week-ahead.
6. Conclusion

Accurate forecasting of renewable power generation is crucial to ensure the stable operation and effective administration of the electric grid. Because renewable energy is intermittent and occasionally unpredictable, simple standalone machine-learning models fail to provide efficient results. This article proposed two hybrid strategies, a CNN-ABiLSTM model and a CNN-Transformer-MLP model, for forecasting renewable power production, specifically wind and solar power. These hybrid strategies combine individual methods according to their strengths in extracting and forecasting the complex, non-linear trends in renewable power data. The developed hybrid strategies were validated on a real-time renewable power production dataset for Germany. The simulation results confirm that the proposed hybrid strategies outperform the standalone CNN, BiLSTM, and Encoder-based Transformer models with respect to RMSE, MAE, MSE, and R². The hybrid strategies significantly outperform their standalone counterparts for long-term predictions and marginally surpass them for short-term predictions.

The analysis of the forecasting results reveals significant performance differences across time horizons and model architectures. For day-ahead forecasts, the hybrid CNN-Transformer-MLP model significantly outperformed the other models, achieving average improvements of 31.94 % in RMSE, 43.81 % in MAE, and 48.58 % in MSE on the solar power data. Similarly, it enhanced wind power forecasting accuracy, with improvements of 28.39 % in MAE and 34.48 % in MSE. For week-ahead predictions of solar power production, although the CNN-ABiLSTM model improved on its day-ahead performance, the CNN-Transformer-MLP model still outperformed it by a narrow margin, showing a 4.84 % improvement in MAE, a 1.85 % improvement in RMSE, and a 3.69 % improvement in MSE. For week-ahead predictions of wind power, the CNN-Transformer-MLP model again outperformed all other models, including the CNN-ABiLSTM, with average improvements of 18.64 % in RMSE, 22.73 % in MAE, and 32.53 % in MSE. Nevertheless, for month-ahead predictions of wind power, the CNN-ABiLSTM model outperformed the other models, including the CNN-Transformer-MLP, achieving a 4.07 % reduction in RMSE, a 0.27 % reduction in MAE, and an 8.31 % reduction in MSE. This also confirms the necessity of testing multiple models across various time horizons, as demonstrated in the spider plot depicted in Fig. 10. Overall, the hybrid CNN-Transformer-MLP model proved more effective for short-horizon forecasting of both solar and wind power, while over longer horizons the CNN-ABiLSTM model begins to challenge it on the solar power data and surpasses it on the wind power data.
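The models are trained on quarter-hour-resolution power data and evaluated at day-, week-, and month-ahead horizons. The sketch below illustrates how such a series might be sliced into supervised (input window, forecast target) pairs; the 96-step look-back and horizon (one day at 15-minute resolution) are illustrative assumptions, not the paper's reported configuration.

```python
import numpy as np

STEPS_PER_DAY = 96  # quarter-hourly data: 4 samples/hour * 24 hours

def make_windows(series, lookback=STEPS_PER_DAY, horizon=STEPS_PER_DAY):
    """Slice a 1-D power series into (input window, forecast target) pairs.

    Each sample pairs `lookback` past steps (the input the CNN front-end
    would see) with the next `horizon` steps to predict (e.g. day-ahead).
    """
    X, y = [], []
    for start in range(len(series) - lookback - horizon + 1):
        X.append(series[start:start + lookback])
        y.append(series[start + lookback:start + lookback + horizon])
    return np.array(X), np.array(y)

# Toy example: one week of synthetic quarter-hourly power values
week = np.arange(7 * STEPS_PER_DAY, dtype=float)
X, y = make_windows(week)
print(X.shape, y.shape)  # (481, 96) (481, 96)
```

A week-ahead variant would simply pass `horizon=7 * STEPS_PER_DAY`; the windowing logic is unchanged.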
T. Bashir et al. Renewable Energy 239 (2025) 122055