
Energy and AI 12 (2023) 100224

Contents lists available at ScienceDirect

Energy and AI
journal homepage: www.sciencedirect.com/journal/energy-and-ai

Explainable deep transfer learning for energy efficiency prediction based on uncertainty detection and identification

Chanin Panjapornpon a,*, Santi Bardeeniz a, Mohamed Azlan Hussain b, Patamawadee Chomchai a

a Department of Chemical Engineering, Center of Excellence on Petrochemicals and Materials Technology, Faculty of Engineering, Kasetsart University, Bangkok 10900, Thailand
b Department of Chemical Engineering, Faculty of Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia

HIGHLIGHTS

• Robust prediction of energy efficiency under measurement anomaly is presented.
• A transfer learning approach is implemented based on a deep network structure.
• An aberrant measurement fault detection task is chosen as a source domain.
• Interpretability, accuracy, and reliability are enhanced with the proposed model.
• The revelation of interconnection between domains is deeply explored.

ARTICLE INFO

Keywords:
Energy efficiency prediction
Transfer learning
Petrochemical process
Measurement reliability
Fault detection and identification

ABSTRACT

Energy efficiency is an important aspect of increasing production capacity, minimizing environmental impact, and reducing energy usage in the petrochemical industries. However, in practice, data quality can be degraded by measurement malfunction throughout the operation, leading to unreliable and inaccurate prediction results. Therefore, this paper presents a transfer learning fault detection and identification-energy efficiency predictor (TFDI-EEP) model formulated using long short-term memory. The model aims to predict the energy efficiency of the petrochemical process under uncertainty by using the knowledge gained from the uncertainty detection task to improve prediction performance. The transfer procedure resolves weight initialization by applying partial layer freezing before fine-tuning the additional part of the model. The performance of the proposed model is verified on a wide range of fault variations to thoroughly examine the maximum contribution of faults that the model can tolerate. The results indicate that the TFDI-EEP achieved the highest r-squared and lowest error in the testing step for both the 10% and 20% fault variation datasets compared to other conventional methods. Furthermore, the revelation of interconnection between domains shows that the proposed model can also identify strong fault-correlated features, enhancing monitoring ability and strengthening the robustness and reliability of the model observed by the number of outliers. The transfer parameter improves the prediction performance by 9.86% based on detection accuracy and achieves an r-squared greater than 0.95 on the 40% testing fault variation.

* Corresponding author.
E-mail address: [email protected] (C. Panjapornpon).

https://doi.org/10.1016/j.egyai.2022.100224

Available online 18 December 2022

2666-5468/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

The industrial sector, especially the petrochemical industry, accounts for a significant portion of world energy consumption and greenhouse gas emissions [1]. To address the global energy-environment issues and satisfy the sustainable development goal, the petrochemical industry aims to quantify energy efficiency by providing guidance on daily operations, minimizing environmental impact, and reducing energy consumption [2]. With the rapid evolution in the field of artificial intelligence and data-driven predictive modeling, many techniques have been proposed recently in the literature to perform energy analysis and determine the saving opportunities of the petrochemical process at the industrial level. Machalek et al. proposed the hybrid encoder-decoder method combined with the physics-based model to predict multi-step time series of heat absorbed by the water and steam in a thermal power plant [3]. Chen et al. proposed the long short-term memory (LSTM) model to predict the hourly time-series energy consumption and carbon emission with various occupant densities in the building [4]. Nevertheless, these data-driven prediction approaches are sensitive to the quality of the measurement signal. When signal quality is degraded, the performance of the model suffers from the influence of fault, which affects its stability.

In practice, one of the most challenging tasks when dealing with a complex system is that the measurement signal is degraded by the effects of complex interconnection between process equipment, such as the multiple thermal interaction units, mass transportation equipment, phase separation, and unexpected events during production hours caused by control systems and production rate changes. These complexities result in the distinct characteristics of petrochemical process data, which are generally multidimensional, imbalanced, noisy, and inconsistent [5]. Sensor malfunctions cause outliers and aberrant signals in measured process variables [6], which significantly impact the result [7] and the accuracy of the model [8]. Several studies have focused on diagnostics to eliminate undesirable input data from the original dataset. K-means clustering [9] and density-based spatial clustering of applications with noise [10] have recently been implemented to eliminate outliers. Even so, such unsupervised techniques rely on user-specified parameters, which limits them. Geng et al. proposed a Cartesian product combining the cross-feature approach and a supervised convolutional neural network to give a graphical analysis for energy efficiency evaluation [11]. Some researchers proposed the unsupervised graph convolutional autoencoder for the multivariate time series model [12] and auto-encoder-based anomaly root cause analysis [13] to predict anomalous behavior. Adding this new function to the framework would increase the amount of time necessary for offline data training, as well as restrict real-time deployment due to the many intricate data processing stages involved. Some researchers proposed an integrated framework to solve the two task problems simultaneously. Fang et al. proposed a hybrid deep LSTM transfer learning for labeling the classifier and energy predictor based on the sharing parameters of the feature extractor [14]. Shi and Chehade proposed a consecutively dual-LSTM framework of the change point detection model, which resulted in an improvement in the prediction performance of the second LSTM structure [15]. However, a robust energy efficiency analysis approach considering uncertainty in measurement without outlier removal has not been comprehensively investigated yet. The energy efficiency forecast model has been created to utilize reliable and trustworthy information [16]. The energy efficiency prediction model that accounts for uncertainty is incapable of identifying abnormal system behavior [17]. Most fault detection models finish only the detection task [18], whereas these fault labels can enhance the prediction performance [19]. The gap between the fault detection task and energy prediction can be resolved by the concept of transfer learning [20]. The transfer learning algorithm can significantly improve learnability [21] by resolving the weight initialization problem. Also, the transfer procedure strengthens the performance of conventional deep learning models by transferring knowledge across the model [22]. Modal et al. proposed transfer learning applied to deep neural networks to formulate an accurate surrogate model with limited available data for predicting unstable operating conditions in combustors [23]. However, the model did not reveal the interconnection of the energy prediction task under transient fault operation, and the functional performance of AI-based prediction under aberrant signals was not thoroughly examined, which has been carried out in this work to address the mentioned research gap.

This paper proposes a model parameter-based transfer learning measurement fault detection and identification - energy efficiency predictor (TFDI-EEP) integrated with LSTM computational layers to predict energy efficiency under aberrant measurement signals. The model is transferred from the fault detection - identification task, or source domain, into the energy efficiency task, or target domain, by applying the partial layer freezing technique to maintain the detection ability from the source domain and then fine-tuning the additional part of the model. The proposed study provides a novel perspective on prediction and model development for researchers using artificial intelligence techniques to strengthen energy efficiency prediction for complex petrochemical industries when signal quality is degraded. The main contributions of this work can be summarized as follows:

1. Interpret insights of the network using the concept of explainable AI. (Section 5.1)
2. Produce energy efficiency prediction techniques for systems in the presence of faults in the application of the vinyl chloride monomer process. (Section 5.2)
3. Propose the transfer learning strategy based on LSTM computational layers to improve the model training reproducibility and reliability. (Section 5.3)
4. Explain the consequence of classification accuracy from the source task related to prediction performance in the target task. (Section 5.4)
5. Reveal the effect of fault variation magnitude on the prediction performance. (Section 5.5)

2. Model development

2.1. Deep network structure

In this study, TFDI-EEP is proposed in comparison with a non-transfer-learning integrated framework of measurement fault detection and identification - energy efficiency predictor (FDI-EEP), an LSTM-based energy efficiency predictor (EEP), and traditional artificial neural networks, including the feedforward neural network (FFNN), recurrent neural network (RNN), and convolutional neural network (CNN). The structural information of each model used in this study is summarized in Table 1.

Table 1
Structural information of the networks.

Characteristics   Name       Layers   Connections   Visualization   Network type
Model 1           FFNN       4        3             Fig. 1          Series
Model 2           RNN        4        4             Fig. 1          Recurrent
Model 3           CNN        7        6             Fig. 1          Series
Model 4           EEP        4        3             Fig. 1          DAG
Model 5           FDI-EEP    8        8             Fig. 1          DAG
Model 6           TFDI-EEP   8        8             Figs. 1, 3, 4   DAG

Fig. 1 demonstrates the model formulation of the deep learning models focused on in this study. Among deep learning models, FFNN is the most straightforward learning algorithm. This model employs several hidden and activation layers before passing data on to the output layers


Fig. 1. Deep network formulation.


Table 2
Comparison of advantages and disadvantages of each deep network structure.

Model   Advantages                                          Disadvantages
FFNN    • Less complexity.                                  • Only applicable to linearly separable data.
        • Low computational resources.
RNN     • Supporting time-series calculation.               • Vanishing and exploding gradient problem.
        • Can use previous observations to determine a      • Recurrent computation can be a bottleneck.
          current output.
CNN     • Strong feature extraction ability.                • Low interpretability.
                                                            • Sensitive to hyperparameters.
                                                            • Requires a very large training dataset.
LSTM    • Resolves the gradient problem in training.        • Time-consuming.
        • Strong dynamic learning ability.                  • Fixed number of predictors in the target domain
                                                              when performing transfer learning due to the size
                                                              of recurrent parameters.

without any cycle or loop [24]. The hidden layers are constructed using fully connected layers that sum each channel of the previous layer output and transform it into the final output using linear activation of the regression layers. RNN and CNN are the more complex deep learning models. RNN is structured based on a fully connected and regression layer similar to FFNN. RNN, on the contrary, contains cyclic connections that allow state variables to capture the temporal dynamic behavior of data and recall the state variables at previous time steps [25], where this dynamic recurrent feature feeds the current state to the following observations. CNN consists of various convolutional and pooling layers. When dealing with time series data (input variables, observations, and time steps), the convolution layer convolves a significant part of the inputs over the time dimension using a specified filter to create a feature mapping [26]. The pooling layers then carry out downsampling by reducing the size of the information and sending it to the subsequent layers (a fully connected layer to adjust the size and a regression layer). CNN is highly capable of performing strong feature extraction and multidimensional feature condensing. The EEP is an advanced deep learning model for energy efficiency prediction based on time-series data; it is composed of LSTM computational layers, a fully connected layer, and a regression layer. Furthermore, FDI-EEP uses the result of fault classification as an additional predictor, while TFDI-EEP uses the partial transfer knowledge of the FDI task concatenated with the predictors to perform energy efficiency prediction. Details of the LSTM-based and proposed models are discussed in the subsequent sections. The comparison of advantages and disadvantages for each type of deep neural network is shown in Table 2.

2.2. LSTM cell structure

In the LSTM-based models, the LSTM layer is deployed to improve the ability of information capturing. The LSTM uses state variables to extract temporal behavior and manipulate long-term dependency by a gating mechanism. Additionally, LSTM prevents gradient problems by storing the memory in sigmoid ranges, which is suitable for future backpropagation. The input gate of LSTM contains two types of activation functions: the sigmoidal and hyperbolic tangent functions. First, the hyperbolic tangent generates a cell candidate, which is updated by the gating vector from the sigmoid activation function. Then, the forget gate decides which part of the information needs to be considered and which may be disregarded. Finally, the output gate calculates the output vector from the current and previous inputs. By the updated cell state, the output gate decides which part of the current cell state will be carried out as the final output from the LSTM layers. The LSTM information processing steps are visualized in Fig. 2.

2.3. Output task

The difference in tasks between the source and target domains requires different types of output layers. In the source domain, the classification output layer is deployed for the measurement fault detection - identification task. This layer computes the cross-entropy loss (Loss_C) for weighted classification tasks with mutually exclusive classes. The classification layer usually follows a fully connected layer and a softmax layer. The fully connected layer summarizes all the vectors into a vector equal to the output size, while the softmax activation normalizes it into a probability distribution. The calculations of the fully connected value, softmax activation, and cross-entropy loss for single-output classification are given in the equations below.

o_t^2 = w_{fc} h_t + b_{fc}   (1)

y_{nk} = o_t^3 = \mathrm{Softmax}(o_t^2) = \frac{e^{o_t^2}}{\sum_{t=1}^{n} e^{o_t^2}}   (2)

\mathrm{Loss}_C = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} w_k \, t_{nk} \, \ln y_{nk}   (3)

where w_{fc} and b_{fc} are the array of weighting factors and the bias of the fully connected layer, o_t^2 is the output vector of the fully connected layer, N is the number of samples, K is the number of classes, w_k is the penalty weight for class k, t_{nk} is the indicator that the n-th sample belongs to the k-th class, and y_{nk} is the output class for sample n for the k-th class.

For the target domain, a regression layer determines the half-mean-squared-error loss function (Loss_R) of the predicted responses at each time step in sequence-to-sequence regression networks. The loss function in Eq. (4) is utilized to tune hyperparameters and update the weighting factors as well as the bias in each layer of the deep network structure.

\mathrm{Loss}_R = \frac{1}{2N} \sum_{t=1}^{N} \left( y_{\mathrm{actual},t} - o_t^2 \right)^2   (4)

where y_{\mathrm{actual},t} is the actual value, and N is the number of samples.

To prevent the overfitting problem in this work, regularization is


Fig. 2. LSTM cell structure.
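The gating sequence described in Section 2.2 (sigmoid input, forget, and output gates acting on a tanh cell candidate) can be sketched as a single NumPy step. This is an illustrative sketch, not the authors' implementation; the stacked parameter layout and toy weights are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step as described in Section 2.2. W (4H, D), U (4H, H), and
    b (4H,) stack the input-gate, forget-gate, candidate, and output-gate
    parameters; this layout is an assumption for illustration."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate (sigmoid)
    f = sigmoid(z[H:2*H])        # forget gate: keep or disregard old state
    g = np.tanh(z[2*H:3*H])      # cell candidate (hyperbolic tangent)
    o = sigmoid(z[3*H:4*H])      # output gate
    c = f * c_prev + i * g       # updated cell state
    h = o * np.tanh(c)           # part of the cell state carried out as output
    return h, c

# Toy dimensions: 3 input features, 2 hidden units
rng = np.random.default_rng(0)
D, H = 3, 2
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_cell_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```

Because the output is squashed through sigmoid and tanh gates, each entry of `h` stays inside (-1, 1), which is the bounded-memory property the section attributes to the LSTM cell.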


Fig. 3. Transfer learning network information.
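For concreteness, the output-task computations of Eqs. (1)-(3) and the half-mean-squared-error of Eq. (4) can be sketched in NumPy. This is a hedged reimplementation for illustration only; the array shapes and toy values are assumptions, not the authors' code.

```python
import numpy as np

def softmax(o):
    """Eq. (2): normalize fully connected outputs into a probability distribution."""
    e = np.exp(o - o.max(axis=-1, keepdims=True))  # shifted for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_loss(y_prob, t, w_k):
    """Eq. (3): weighted cross-entropy over N samples and K mutually exclusive
    classes. y_prob: (N, K) softmax outputs, t: (N, K) one-hot indicators,
    w_k: (K,) per-class penalty weights."""
    N = y_prob.shape[0]
    return -np.sum(w_k * t * np.log(y_prob)) / N

def half_mse_loss(y_actual, y_pred):
    """Eq. (4): half-mean-squared-error used by the regression output layer."""
    N = y_actual.shape[0]
    return np.sum((y_actual - y_pred) ** 2) / (2 * N)

# Fully connected layer, Eq. (1): o = W h + b (toy weights for illustration)
h = np.array([0.5, -1.0])
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.zeros(3)
o = W @ h + b
probs = softmax(o)   # sums to 1 over the classes
```

A perfect regression prediction drives Eq. (4) to zero, while Eq. (3) reduces to the negative log-probability assigned to the true class when all class weights are 1.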

applied to the target task by adding a penalty term, which minimizes the weighting factors in each layer. The final error function in the target task can be calculated by Eqs. (5) and (6).

E_R = \mathrm{Loss}_R + \lambda \Omega(w)   (5)

\Omega(w) = \frac{1}{2} w^T w   (6)

where w is the weighting vector and \lambda is the regularization factor.

2.4. Transfer learning modeling

The transfer learning problem allows users to transfer knowledge and adopt the model across domains differently based on the source and target tasks [27]. In this study, model parameter-based transfer learning is deployed by presupposing that the source and target tasks overlap in some parameters or subsequent distributions of the hyperparameters [28]. The knowledge gained from the source task can solve the weight initialization problem and increase the prediction reliability of the target task [29]. Partial layer freezing is also implemented to prevent overwriting the pre-trained weights and biases of the learnable activations and to maintain the monitoring ability in fault detection - identification of the source task. The learnable activation information of the source and target tasks is summarized in Fig. 3, and the structure of the proposed TFDI-EEP is visualized in Fig. 4.

3. Data processing steps

This section provides the data processing steps and modeling procedures for the conventional neural networks and transfer learning across the domains. The information in this section is based on the data perspective, which includes data normalization, hyperparameter tuning, model validation, and model performance indicator calculation, as illustrated in Fig. 5.

3.1. Data normalization

The scale of the input variables is one of the most important aspects of learning stability. Input variables of relatively different scales not only produce imbalanced weighting factors and biases in prediction but also cause the learning process to fail with an exploding gradient. In this study, the input variables consist of large-scale inputs, i.e., flow rate and utility consumption, and small-scale inputs, including positive and negative measurable temperatures. Hence, this study applied z-score normalization on the sequence input layers to adjust the input variables to the same scale and ensure that the network can operate on the information using Eqs. (7)-(9).

z_i^j = \frac{x_i^j - \mu^j}{\sigma^j}   (7)

\mu^j = \frac{\sum_{i=1}^{n} x_i^j}{n}   (8)

\sigma^j = \sqrt{\frac{\sum_{i=1}^{n} \left( x_i^j - \mu^j \right)^2}{n-1}}   (9)

where x_i^j is the original i-th data point of the j-th input feature, \mu^j is the mean value of the j-th input feature, and \sigma^j is the standard deviation of the j-th input feature.

3.2. Hyperparameter tuning and model validation

The grid-search method explores the optimal set of hyperparameters under the searching domain specified in Table 3. To evaluate the general


Fig. 4. TFDI-EEP network structure and training steps.
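The training steps in Fig. 4 hinge on partial layer freezing: the transferred FDI layers keep their source-task weights while only the newly added layers are updated. A minimal sketch of such a masked gradient step, with hypothetical layer names and gradients (and the L2 penalty of Eqs. (5)-(6) folded into the update), could look like this; real frameworks usually expose the same idea through trainable flags or per-layer learning-rate factors.

```python
import numpy as np

def sgd_step_with_freezing(params, grads, frozen, lr=0.1, lam=0.0):
    """Update only the unfrozen layers; frozen (pre-trained) layers keep their
    source-task weights. lam adds the L2 penalty term of Eqs. (5)-(6)."""
    for name, w in params.items():
        if name in frozen:
            continue                                  # partial layer freezing
        params[name] = w - lr * (grads[name] + lam * w)
    return params

# Hypothetical two-part network: transferred LSTM layers + a new regression head
params = {"lstm_transferred": np.array([1.0, -2.0]),
          "fc_new": np.array([0.5, 0.5])}
grads = {"lstm_transferred": np.array([9.0, 9.0]),   # ignored: layer is frozen
         "fc_new": np.array([1.0, -1.0])}
params = sgd_step_with_freezing(params, grads, frozen={"lstm_transferred"})
```

After the step, the transferred layer is untouched (its large gradient is skipped), so the detection ability learned in the source domain survives fine-tuning of the new head.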


Fig. 5. Overall transfer learning procedure.
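As part of the data-processing pipeline in Fig. 5, the z-score normalization of Eqs. (7)-(9) is straightforward to sketch; note the sample standard deviation (n - 1 denominator) used by the formulas. The toy flow-rate and temperature columns below are invented for illustration.

```python
import numpy as np

def zscore(x):
    """Eqs. (7)-(9): per-feature z-score with the sample standard deviation
    (n - 1 denominator). x has shape (n_samples, n_features)."""
    mu = x.mean(axis=0)              # Eq. (8): per-feature mean
    sigma = x.std(axis=0, ddof=1)    # Eq. (9): sample standard deviation
    return (x - mu) / sigma          # Eq. (7)

# Large-scale flow rate and small-scale temperature end up on the same scale
x = np.array([[1000.0, -5.0],
              [1200.0,  0.0],
              [1100.0,  5.0]])
z = zscore(x)
```

Both columns of `z` have zero mean and unit sample standard deviation, which is exactly the balancing effect the section describes.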

performance of the model under fault scenarios and prevent the model from overfitting problems, k-fold cross-validation is applied with the number of folds equal to five in each iteration of hyperparameter exploration. Cross-validation is a resampling method that uses different parts of the information to validate the model by dividing the entire dataset into five groups. Four folds are used as the training dataset, while the remaining one is the testing dataset. The procedure is repeated until every fold has been used as the testing dataset. The average performance indicator is calculated from every fold result and reported as the validation performance. Finally, the hyperparameter set that gives the highest validation performance is chosen as the optimal set of hyperparameters for offline and online testing.

Table 3
Hyperparameter searching domain.

Hyperparameter                        Range
FFNN hidden node                      {5 - 200}
Number of FFNN hidden layers          {1 - 2}
RNN hidden node                       {5 - 200}
Recurrent delay                       {1 - 4}
CNN filter size                       {3 - 10}
Number of CNN filters                 {10 - 100}
Number of CNN layers                  {1 - 2}
CNN padding                           {Same, Causal}
Number of the 1st LSTM layers         {1 - 2}
Hidden node of the 1st LSTM layers    {5 - 200}
Number of the 2nd LSTM layers         {1 - 2}
Hidden node of the 2nd LSTM layers    {5 - 200}
Regularization factor                 {0 - 50}
Learning rate                         {0.01 - 10}
Weight update optimizer               {Adam, RMSProp, SGD}

3.3. Performance indicator calculation

In this study, three statistical performance metrics are evaluated in the final step of the modeling framework. These indicators, including the coefficient of determination (r-squared), root mean squared error (RMSE), and mean absolute percentage error (MAPE), are introduced to track the performance of the model, as expressed in Eqs. (10)-(12). The value of r-squared depicts the overall model fitness and efficiently interprets the reproducibility of the model. The RMSE and MAPE are alternative residual-quantifying methods, demonstrating the exact scale of the output and the dimensionless relative error in percentages, respectively.

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}   (10)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}   (11)

\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%   (12)

where y_i is the actual output of the i-th sample, \hat{y}_i is the predicted output value of the i-th sample, and \bar{y} is the mean value of the original output values.
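The five-fold split of Section 3.2 and the indicators of Eqs. (10)-(12) can be sketched as follows. This is an illustrative NumPy version with invented toy vectors, not the authors' evaluation code.

```python
import numpy as np

def r_squared(y, y_hat):
    """Eq. (10): coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, y_hat):
    """Eq. (11): root mean squared error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Eq. (12): mean absolute percentage error."""
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

def five_fold_indices(n):
    """Split n samples into 5 folds; each fold serves once as the test set,
    the remaining four as the training set."""
    folds = np.array_split(np.arange(n), 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.1, 1.9, 3.0, 4.2, 4.8])
scores = (r_squared(y, y_hat), rmse(y, y_hat), mape(y, y_hat))
```

Averaging the three scores over the five folds reproduces the validation-performance number used to pick the hyperparameter set.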


Fig. 6. Simple block diagram of the VCM production process.

4. Descriptions of process and data generation

The studied vinyl chloride monomer (VCM) production process consists of five sections: chlorination, oxychlorination, ethylene dichloride (EDC) purification, EDC cracking, and VCM purification, as illustrated in Fig. 6. According to the energy distribution in Fig. 7, the EDC cracking and VCM purification sections consume a high energy load because they consist of the cracking furnace, multiple energy-interaction units, and a series of quenchers and distillation columns [30]. Therefore, the EDC cracking and VCM purification sections are selected as the case study for this energy efficiency prediction.

4.1. EDC cracking and VCM purification sections

In the EDC cracking section, vaporized EDC is fed into the thermal furnace, in which it is pyrolyzed into VCM and other byproducts. A series of quenchers and condensers quickly cool the hot gas mixture before transferring it to the VCM purification section. The first column refines HCl to the top before recycling it to the oxychlorination section, while the second column separates VCM from unreacted EDC. Fig. 8 depicts the process flow diagram of the EDC cracking and VCM purification sections, including the notation of the 40 measurable variables used to estimate energy efficiency. These input variables are supplied directly into the input layer of the model without any data preprocessing techniques or feature selection approaches employed.

4.2. Aberrant measurement signal

An aberrant measurement signal can be defined as a temporary or permanent fluctuation in the output signal of sensors under normal conditions [31]. The period of these variations depends on the types of anomalies that cause the faulty behavior. It can be present in the form of outliers, measurement faults [32], and process uncertainty. The number of fault classes of the studied VCM process is 41, which corresponds to the number of sensors included in the normal operation of the measuring system.

The data used in this study was simulated to gather process information with UniSim Design Suite and to develop aberrant process signals with MATLAB through a co-simulation approach. MATLAB delivered the requested operational condition information to the UniSim Design Suite, which utilized it to produce normal samples and construct datasets of defective signals. As a result, two datasets with over 2000 datapoints and different amplitudes of fault variation were generated: 10% fault variation (randomly between 10% and 15%) and 20% fault variation (randomly between 20% and 25%). The whole dataset is divided into training, validation, and testing sets with percentages of 60%, 20%, and 20%. Fig. 9 illustrates examples of the input variables under 10% fault variation, characterized by temperature and flow rate, respectively.

4.3. Energy efficiency and specific energy consumption

Energy efficiency can be interpreted in various forms, according to the aims of the study. The specific energy consumption (SEC), which refers to the ratio of the energy supplied as an input to the quantity of product produced [33], is adopted as the monitoring indicator in this work. The SEC represents the energy intensity and productivity of the process, and it is calculated by Eq. (13):

\mathrm{SEC} = \frac{E_{in}}{V}   (13)

where E_{in} is the energy supplied to the process, and V is the VCM production rate.

Since the SEC is easily understood and enables direct reporting of the amount of energy consumed per unit of product, it is beneficial for managing, monitoring, and comparing the energy usage of the petrochemical industry. Additionally, it can be utilized for benchmarking at multiple scales, including process, national, and international benchmarking. Likewise, the SEC calculations also have the potential to determine operating profit margins per product and machine safety.

5. Results and discussion

5.1. Hyperparameter tuning result

The optimal hyperparameters tuned by the grid-search method with the maximum cross-validation r-squared for the conventional and LSTM-based methods are shown in Tables 4 and 5, respectively. Before being transferred to the target domain to predict energy efficiency, the classification models of the FDI-EEP and TFDI-EEP must first be optimized in the training step for the number of hidden layers and hidden nodes until the maximum detection accuracy is achieved.

Fig. 10 visualizes the activation nodes and their values along the training sequence. The activation visualization by an activation map [34] improves interpretability and transparency for valuable insights from a data analytics perspective. Furthermore, it shows the values of


Fig. 7. Energy efficiency distribution of the VCM production process.
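The SEC indicator of Eq. (13) that underlies this energy distribution is a one-line ratio of energy input to product output; the hourly figures below are hypothetical examples, not plant data.

```python
def specific_energy_consumption(e_in, production):
    """Eq. (13): SEC = E_in / V, energy supplied per unit of product (e.g., GJ/t)."""
    return e_in / production

# Hypothetical operating hours: a lower SEC means less energy per tonne of VCM
sec_baseline = specific_energy_consumption(180.0, 60.0)   # 3.0 GJ/t
sec_improved = specific_energy_consumption(165.0, 60.0)   # 2.75 GJ/t
```

Because SEC is a simple intensity ratio, two operating periods (or two plants) can be benchmarked directly by comparing their values, as Section 4.3 notes.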


Fig. 8. Process flow diagram of the EDC cracking and VCM purification sections.


Fig. 9. Examples of a) flow and b) temperature input variables under 10% fault variation.
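One way to mimic the faulty signals of Section 4.2, where a random variation of 10-15% of the reading is superimposed on a measurement, is sketched below. This is a simplified stand-in for the MATLAB/UniSim co-simulation; the sampling scheme, the fraction of faulty samples, and the steady 50 degC signal are all assumptions made for illustration.

```python
import numpy as np

def inject_fault_variation(signal, low=0.10, high=0.15, frac=0.2, seed=0):
    """Scale a random fraction of samples by (1 +/- u) with u ~ U(low, high),
    e.g., a 10-15% measurement fault. Returns the faulty signal and the
    indices of the affected samples."""
    rng = np.random.default_rng(seed)
    faulty = signal.copy()
    n_fault = int(frac * signal.size)
    idx = rng.choice(signal.size, size=n_fault, replace=False)
    magnitude = rng.uniform(low, high, size=n_fault)
    sign = rng.choice([-1.0, 1.0], size=n_fault)
    faulty[idx] = signal[idx] * (1.0 + sign * magnitude)
    return faulty, idx

clean = np.full(100, 50.0)           # a steady 50 degC temperature reading
faulty, idx = inject_fault_variation(clean)
```

Every affected sample deviates from the clean reading by between 10% and 15%, while untouched samples keep their original values, mirroring the transient character of the aberrant signals in Fig. 9.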


Table 4 LSTM over time to evaluate the involvement of the input variables in the
Optimized hyperparameters of the conventional models. source task. There are 41 fault cases, but only 26 features were used
Hyperparameter 10% Variation 20% Variation since the activation steps demonstrate that the LSTM has feature selec­
FFNN RNN CNN FFNN RNN CNN tion capability, and not all the information from the process features is
FFNN hidden node 5 – – 25 – – necessary to detect and identify all the cases with measurement faults.
FFNN hidden layers 1 – – 1 – –
RNN hidden node – 120 – – 130 – 5.2. Energy efficiency prediction
Recurrent delay – 2 – – 1 –
CNN Filter Size 3 3
Table 6 summarizes the energy efficiency prediction performance
– – – –
Number of CNN filters – – 30 – – 30
under 10% and 20% fault variations. All models were averaged over 15 iterations, and the training epoch was set to 500 to evaluate the general performance of the network. The results show that TFDI-EEP performs energy efficiency prediction outstandingly compared with the other models. In addition, the transferred parameters assist the target prediction task, providing the lowest RMSE and MAPE and the highest r-squared for both fault variations. Table 7 shows the speed indicators of each model, including training time, prediction speed, and execution time. These indicators refer to the time required in the training and model implementation steps. The training time of TFDI-EEP is 4 to 5 times longer than that of the regular LSTM-based models such as EEP, but this is compensated by higher prediction accuracy. The execution time and prediction speed of every model in this study are less than a second, making them applicable for real-time implementation.

Fig. 11 shows the comparative results of the predicted and actual energy efficiency values. One important point to note from the figure is that most models encounter bias/variance problems: as the fault variation increases, the error on the predicted SEC inadvertently increases. However, as can be seen from the prediction residuals around the diagonal line, the model-parameter-based transfer learning method helps the model discover an excellent tradeoff between bias and variance.

Number of CNN layers       –      –      1      –      –      2
CNN padding                –      –      Same   –      –      Same
Learning rate              0.1    1      0.1    0.1    1      0.1
Regularization factor      1      0      0      0.5    0      0
Weight update optimizer    Adam   SGD    Adam   Adam   SGD    Adam

Table 5
Optimized hyperparameters of LSTM-based models.

Hyperparameter                          10% Variation                 20% Variation
                                        EEP    FDI-EEP   TFDI-EEP    EEP    FDI-EEP   TFDI-EEP
LSTM layers for classification          –      2         2           –      1         1
Hidden node of the first LSTM layers    –      40        40          –      40        40
Hidden node of the second LSTM layers   –      40        40          –      –         –
LSTM layers for regression              1      1         1           1      1         1
Hidden node of LSTM layers              10     60        5           40     65        5
Learning rate                           0.45   0.75      0.4         1      0.9       0.4
Regularization factor                   0.6    0.3       0.2         0.3    0.4       0.3
Weight optimizer                        Adam   Adam      Adam        Adam   Adam      Adam
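The transfer scheme behind the TFDI-EEP hyperparameters in Table 5 keeps the layers learned on the source (FDI) task frozen and fine-tunes only the regression part on the target task. The mechanics can be illustrated with a deliberately tiny sketch: a two-stage linear model y = w2 * (w1 * x), where w1 plays the role of the transferred source-task layer and stays frozen, and only w2 is updated. All names and numbers here are hypothetical, not the paper's actual network.

```python
# Toy illustration of model-parameter transfer with partial layer freezing.
# w1_frozen is the "transferred" source-task parameter and is never updated;
# only the target-task parameter w2 is fine-tuned by gradient descent.

def fine_tune(data, w1_frozen, w2_init, lr=0.01, epochs=200):
    w2 = w2_init
    for _ in range(epochs):
        grad = 0.0
        for x, y in data:
            hidden = w1_frozen * x              # frozen transferred layer
            grad += 2.0 * (w2 * hidden - y) * hidden
        w2 -= lr * grad / len(data)             # only w2 is updated
    return w2

# Target data generated by y = 6x; with the frozen layer fixed at w1 = 2,
# fine-tuning should drive w2 toward 3 so that w2 * w1 = 6.
data = [(x, 6.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w2 = fine_tune(data, w1_frozen=2.0, w2_init=0.0)
```

In a deep-learning framework the same effect is obtained by disabling gradient updates on the transferred layers (e.g., excluding their parameters from the optimizer), so the optimizer only touches the regression head.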

Fig. 10. Training LSTM activation.

C. Panjapornpon et al. Energy and AI 12 (2023) 100224

Table 6
Model performance for energy efficiency prediction.

Dataset      Model      10% fault variation             20% fault variation
                        R-squared   MAPE    RMSE        R-squared   MAPE    RMSE
Training     FFNN       0.792       1.729   0.096       0.582       3.055   0.165
             RNN        0.844       1.619   0.099       0.776       2.083   0.119
             CNN        0.928       1.360   0.078       0.913       1.305   0.081
             EEP        0.875       1.654   0.093       0.802       2.055   0.157
             FDI-EEP    0.972       0.928   0.046       0.966       0.968   0.047
             TFDI-EEP   0.983       0.545   0.034       0.981       0.698   0.035
Validation   FFNN       0.771       4.082   0.232       0.578       2.768   0.153
             RNN        0.681       1.977   0.153       0.558       3.640   0.204
             CNN        0.891       1.355   0.097       0.887       1.362   0.104
             EEP        0.821       1.714   0.121       0.762       2.111   0.139
             FDI-EEP    0.961       1.065   0.051       0.951       1.087   0.054
             TFDI-EEP   0.975       0.576   0.039       0.972       0.893   0.041
Testing      FFNN       0.817       4.026   0.228       0.581       3.226   0.172
             RNN        0.676       1.989   0.188       0.597       3.780   0.218
             CNN        0.914       1.427   0.085       0.891       1.528   0.101
             EEP        0.807       1.654   0.129       0.786       2.274   0.150
             FDI-EEP    0.971       1.016   0.047       0.961       1.091   0.049
             TFDI-EEP   0.976       0.541   0.038       0.973       0.767   0.041
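The three metrics reported in Table 6 can be computed with a minimal sketch like the following (plain Python, with MAPE expressed in percent and r-squared as the usual coefficient of determination):

```python
# Minimal implementations of the metrics in Table 6.

def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent.
    return 100.0 / len(y_true) * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))

def rmse(y_true, y_pred):
    # Root mean squared error.
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5
```

Equivalent implementations are available in common libraries (e.g., scikit-learn's metrics module); the hand-rolled versions above are only meant to make the definitions behind Table 6 explicit.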

Table 7
Speed indicator for energy efficiency prediction.

Variation   Model      Training time (s)   Prediction speed (obs/s)   Execution time (ms)
10%         FFNN       4                   3479                       0.28
            RNN        14                  2073                       0.48
            CNN        28                  1001                       0.99
            EEP        58                  1325                       0.75
            FDI-EEP    239                 762                        1.31
            TFDI-EEP   265                 977                        1.02
20%         FFNN       3                   3164                       0.32
            RNN        12                  2130                       0.47
            CNN        40                  818                        1.22
            EEP        53                  1611                       0.62
            FDI-EEP    238                 1585                       0.63
            TFDI-EEP   264                 1094                       0.91

5.3. Effect of detection accuracy on energy efficiency prediction

Fig. 12 shows the improvement rate of TFDI-EEP over the conventional LSTM energy efficiency predictor. The result shows that the additional task on the source domain, i.e., the pretrained classifier, improved the prediction accuracy on the target domain; the improvement rate ranges from approximately 3.74% to 9.89%, depending on the classification accuracy.

Fig. 13 illustrates the effect of classification accuracy on the testing r-squared with the 10% fault variation dataset. In the range between 30% and 70% classification accuracy, the prediction r-squared is maintained at the average performance of TFDI-EEP. The prediction r-squared slightly deviates from the average performance when the classification accuracy is less than 30%. In contrast, when the classification accuracy is greater than 70%, the testing r-squared performs significantly better than the average performance of TFDI-EEP.

5.4. Reliability distribution of energy efficiency predictor

Another essential indicator for model-based prediction is reliability and reproducibility [35], which implies that the model can repeatedly run a specific algorithm and obtain reliable, identical, or nearly identical results, and indicates that the model is scalable and capable of large-scale production. A single small change in one step of the modeling phase, i.e., a change in data, framework, data limitation, or GPU floating-point discrepancy, results in different outcomes. Fig. 14 shows the retrained validation r-squared of each model over 100 runs on 10% and 20% fault variation. It is apparent that TFDI-EEP is reliable in predicting energy efficiency under fault scenarios, as identified by the interquartile range and whisker enlargement. Also, TFDI-EEP gives the highest mean, median, and maximum validation r-squared on both 10% and 20% fault variation. Conversely, the validation r-squared decreases as the fault variation intensifies, as evidenced by a lower minimum validation r-squared, except for RNN.

5.5. Wide-ranging test

The fault variation has a direct effect on the prediction performance. This section provides the wide-ranging test result, which uses the 10% fault variation hyperparameters and datasets to train the model and tests it on 10%–100% fault variation datasets. Fig. 15 demonstrates the testing r-squared under the 10%–100% fault variation datasets. The result shows that TFDI-EEP can handle up to 40% fault variation while preserving a testing r-squared greater than 0.95. Moreover, TFDI-EEP remains more accurate than the other models under higher fault variation, as indicated by fewer outliers and a higher minimum testing r-squared.

6. Conclusions

In this paper, TFDI-EEP is proposed to track the specific energy consumption of production and monitor the measurement behavior of the vinyl chloride monomer process. The proposed model tackles the issue of energy efficiency prediction under aberrant measurements. TFDI-EEP is constructed from the trained network of the LSTM classification task, combined with partial layer freezing and transfer to the target domain. Furthermore, TFDI-EEP was retrained to check the reliability and reproducibility of the model, which determines the overall performance by k-fold cross-validation techniques. The model was also put through extensive testing to assess its robustness as the fault variation increased from 10% to 100%. The following conclusions can be drawn from the current findings of the study:

1 The transfer procedure has the potential to improve energy prediction performance by transferring knowledge gained from elementary to more complex tasks in order to strengthen the learnability and reproducibility of the network under fault conditions.
2 The activation mapping provided by FDI (the source task) reveals meaningful insights into which variables should be monitored to detect any system anomaly.


Fig. 11. Actual and predicted SEC comparison.
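Fig. 11 plots predicted against actual SEC. As a point of reference, SEC (specific energy consumption) is conventionally defined as the energy consumed per unit of production output (cf. ref. [33]); the helper below is a hypothetical one-liner, with units (e.g., GJ per tonne of product) depending on the process:

```python
# SEC = energy consumed / production output over the same period.
def specific_energy_consumption(energy_consumed, production_output):
    return energy_consumed / production_output
```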

Fig. 12. Percentages of improvement over baseline of prediction.

3 Based on the reproducibility and wide-ranging tests, TFDI-EEP is more reliable when predicting energy efficiency under both small-scale and wide-range fault variations.
4 The maximum magnitude of fault variation that TFDI-EEP can handle is 40%, and the prediction performance is stable when the detection accuracy is between 30% and 70%.

Theoretically, this work bridges the gap between fault detection and identification and energy efficiency prediction. However, some limitations need to be considered. The model consumes additional training time due to the construction of the classification task, and its proficiency in energy efficiency prediction strongly depends on the FDI accuracy. If the accuracy of the source domain is less than 30%, the model needs to be recalibrated. A self-adaptive AI [36] could be considered in the future for model selection to automatically update the information in real time without retraining, especially

Fig. 13. Effect of detection accuracy and prediction r-squared on TFDI-EEP.


Fig. 14. Box plot of validation r-squared.
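The distribution behind the box plot in Fig. 14 comes from retraining each model many times and summarizing the validation r-squared values. A minimal sketch of collecting such a distribution, where `retrain_and_validate` is a hypothetical stand-in for one full training and validation cycle (here it only simulates run-to-run variability):

```python
import random
import statistics

# One "retraining run": in the paper this would be a full LSTM training
# and validation cycle; here we simulate run-to-run spread around 0.97.
def retrain_and_validate(seed):
    rng = random.Random(seed)
    return 0.97 + rng.uniform(-0.02, 0.01)

scores = sorted(retrain_and_validate(s) for s in range(100))
q1, _, q3 = statistics.quantiles(scores, n=4)
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "min": scores[0],
    "max": scores[-1],
    "iqr": q3 - q1,   # box height in the box plot
}
```

The mean, median, min, max, and interquartile range collected this way are exactly the quantities the box-and-whisker comparison in Fig. 14 visualizes.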


Fig. 15. Wide-ranging testing result.
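The logic of the wide-ranging test in Fig. 15 can be sketched as follows: evaluate one trained model on datasets of increasing fault variation and report the largest level whose testing r-squared stays above a threshold. The scores below are illustrative placeholders, not the paper's measurements; only the 40% outcome mirrors the reported finding.

```python
# Find the largest fault-variation level (in percent) whose testing
# r-squared stays above the threshold, scanning levels in ascending order.
def max_tolerated_variation(scores_by_level, threshold=0.95):
    best = None
    for level in sorted(scores_by_level):
        if scores_by_level[level] > threshold:
            best = level
        else:
            break   # stop at the first level that fails the threshold
    return best

scores = {10: 0.975, 20: 0.970, 30: 0.960, 40: 0.955, 50: 0.930, 60: 0.900}
limit = max_tolerated_variation(scores)   # 40 with these placeholder scores
```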

when the classification accuracy drops. Moreover, the number of predictor variables for the two domains must be identical because the size of the transferred weights and biases can otherwise be inconsistent with the dimension of the designed network. We plan to use non-identical features in transfer learning by implementing alternative transfer procedures in the future.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Acknowledgments

The authors would like to acknowledge the support of the Faculty of Engineering, Kasetsart University (Grant No. 65/10/CHEM/M.Eng), the Kasetsart University Research and Development Institute, and Kasetsart University. In addition, the authors would like to acknowledge Honeywell and GRD Tech Co., Ltd for providing the UniSim Design Suite simulation software used in this study.

References

[1] Golmohamadi H. Demand-side management in industrial sector: a review of heavy industries. Renew Sustain Energy Rev 2022;156:111963. https://doi.org/10.1016/j.rser.2021.111963.
[2] Hassani H, Silva ES, Al Kaabi AM. The role of innovation and technology in sustaining the petroleum and petrochemical industry. Technol Forecast Soc Change 2017;119:1–17. https://doi.org/10.1016/j.techfore.2017.03.003.
[3] Machalek D, Tuttle J, Andersson K, Powell KM. Dynamic energy system modeling using hybrid physics-based and machine learning encoder–decoder models. Energy AI 2022;9:100172. https://doi.org/10.1016/j.egyai.2022.100172.
[4] Chen C-Y, Chai KK, Lau E. AI-assisted approach for building energy and carbon footprint modeling. Energy AI 2021;5:100091. https://doi.org/10.1016/j.egyai.2021.100091.
[5] Sharifian S, Sotudeh-Gharebagh R, Zarghami R, Tanguy P, Mostoufi N. Uncertainty in chemical process systems engineering: a critical review. Rev Chem Eng 2021;37:687–714. https://doi.org/10.1515/revce-2018-0067.
[6] Jan SU, Lee YD, Koo IS. A distributed sensor-fault detection and diagnosis framework using machine learning. Inf Sci (Ny) 2021;547:777–96. https://doi.org/10.1016/j.ins.2020.08.068.
[7] Xu C, Zhao S, Liu F. Sensor fault detection and diagnosis in the presence of outliers. Neurocomputing 2019;349:156–63. https://doi.org/10.1016/j.neucom.2019.01.025.
[8] Yoo M. A resilience measure formulation that considers sensor faults. Reliab Eng Syst Saf 2020;7. https://doi.org/10.1016/j.ress.2019.02.025.
[9] Beisheim B, Rahimi-Adli K, Krämer S, Engell S. Energy performance analysis of continuous processes using surrogate models. Energy 2019;183:776–87. https://doi.org/10.1016/j.energy.2019.05.176.
[10] Moghadasi M, Ozgoli HA, Farhani F. Steam consumption prediction of a gas sweetening process with methyldiethanolamine solvent using machine learning approaches. Int J Energy Res 2021;45:879–93. https://doi.org/10.1002/er.5979.
[11] Geng Z, Zhang Y, Li C, Han Y, Cui Y, Yu B. Energy optimization and prediction modeling of petrochemical industries: an improved convolutional neural network based on cross-feature. Energy 2020;194:116851. https://doi.org/10.1016/j.energy.2019.116851.
[12] Miele ES, Bonacina F, Corsini A. Deep anomaly detection in horizontal axis wind turbines using graph convolutional autoencoders for multivariate time series. Energy AI 2022;8:100145. https://doi.org/10.1016/j.egyai.2022.100145.
[13] Roelofs CMA, Lutz M-A, Faulstich S, Vogt S. Autoencoder-based anomaly root cause analysis for wind turbines. Energy AI 2021;4:100065. https://doi.org/10.1016/j.egyai.2021.100065.
[14] Fang X, Gong G, Li G, Chun L, Li W, Peng P. A hybrid deep transfer learning strategy for short term cross-building energy prediction. Energy 2021;215:119208. https://doi.org/10.1016/j.energy.2020.119208.
[15] Shi Z, Chehade A. A dual-LSTM framework combining change point detection and remaining useful life prediction. Reliab Eng Syst Saf 2021;205:107257. https://doi.org/10.1016/j.ress.2020.107257.
[16] Tien PW, Wei S, Darkwa J, Wood C, Calautit JK. Machine learning and deep learning methods for enhancing building energy efficiency and indoor environmental quality – a review. Energy AI 2022;10:100198. https://doi.org/10.1016/j.egyai.2022.100198.


[17] Panjapornpon C, Bardeeniz S, Hussain MA. Improving energy efficiency prediction under aberrant measurement using deep compensation networks: a case study of petrochemical process. Energy 2023;263:125837. https://doi.org/10.1016/j.energy.2022.125837.
[18] Westermann P, Evins R. Using Bayesian deep learning approaches for uncertainty-aware building energy surrogate models. Energy AI 2021;3:100039. https://doi.org/10.1016/j.egyai.2020.100039.
[19] Panjapornpon C, Bardeeniz S, Hussain MA. Deep learning approach for energy efficiency prediction with signal monitoring reliability for a vinyl chloride monomer process. Reliab Eng Syst Saf 2022:109008. https://doi.org/10.1016/j.ress.2022.109008.
[20] Oyewole I, Chehade A, Kim Y. A controllable deep transfer learning network with multiple domain adaptation for battery state-of-charge estimation. Appl Energy 2022;312:118726. https://doi.org/10.1016/j.apenergy.2022.118726.
[21] Yang D, Peng X, Ye Z, Lu Y, Zhong W. Domain adaptation network with uncertainty modeling and its application to the online energy consumption prediction of ethylene distillation processes. Appl Energy 2021;303:117610. https://doi.org/10.1016/j.apenergy.2021.117610.
[22] Wang C, Chen D, Chen J, Lai X, He T. Deep regression adaptation networks with model-based transfer learning for dynamic load identification in the frequency domain. Eng Appl Artif Intell 2021;102:104244. https://doi.org/10.1016/j.engappai.2021.104244.
[23] Mondal S, Chattopadhyay A, Mukhopadhyay A, Ray A. Transfer learning of deep neural networks for predicting thermoacoustic instabilities in combustion systems. Energy AI 2021;5:100085. https://doi.org/10.1016/j.egyai.2021.100085.
[24] Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw 2015;61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003.
[25] Wang H, Lei Z, Zhang X, Zhou B, Peng J. A review of deep learning for renewable energy forecasting. Energy Convers Manage 2019;198:111799. https://doi.org/10.1016/j.enconman.2019.111799.
[26] Walser T, Sauer A. Typical load profile-supported convolutional neural network for short-term load forecasting in the industrial sector. Energy AI 2021;5:100104. https://doi.org/10.1016/j.egyai.2021.100104.
[27] Peirelinck T, Kazmi H, Mbuwir BV, Hermans C, Spiessens F, Suykens J, et al. Transfer learning in demand response: a review of algorithms for data-efficient modelling and control. Energy AI 2022;7:100126. https://doi.org/10.1016/j.egyai.2021.100126.
[28] Pinto G, Wang Z, Roy A, Hong T, Capozzoli A. Transfer learning for smart buildings: a critical review of algorithms, applications, and future perspectives. Adv Appl Energy 2022;5:100084. https://doi.org/10.1016/j.adapen.2022.100084.
[29] Fan C, Sun Y, Xiao F, Ma J, Lee D, Wang J, et al. Statistical investigations of transfer learning-based methodology for short-term building energy predictions. Appl Energy 2020;262:114499. https://doi.org/10.1016/j.apenergy.2020.114499.
[30] Chinprasit J, Panjapornpon C. Model predictive control of vinyl chloride monomer process by Aspen Plus Dynamics and MATLAB/Simulink co-simulation approach. IOP Conf Ser: Mater Sci Eng 2020;778:012080. https://doi.org/10.1088/1757-899X/778/1/012080.
[31] Saeed U, Lee Y-D, Jan SU, Koo I. CAFD: context-aware fault diagnostic scheme towards sensor faults utilizing machine learning. Sensors 2021;21:617. https://doi.org/10.3390/s21020617.
[32] Zhang Z, Mehmood A, Shu L, Huo Z, Zhang Y, Mukherjee M. A survey on fault diagnosis in wireless sensor networks. IEEE Access 2018;6:11349–64. https://doi.org/10.1109/ACCESS.2018.2794519.
[33] Lawrence A, Thollander P, Andrei M, Karlsson M. Specific energy consumption/use (SEC) in energy management for improving energy efficiency in industry: meaning, usage and differences. Energies 2019;12:247. https://doi.org/10.3390/en12020247.
[34] Salahuddin Z, Woodruff HC, Chatterjee A, Lambin P. Transparency of deep neural networks for medical image analysis: a review of interpretability methods. Comput Biol Med 2022;140:105111. https://doi.org/10.1016/j.compbiomed.2021.105111.
[35] Gundersen OE, Shamsaliei S, Isdahl RJ. Do machine learning platforms provide out-of-the-box reproducibility? Future Gener Comput Syst 2022;126:34–47. https://doi.org/10.1016/j.future.2021.06.014.
[36] Luo X, Oyedele LO. A self-adaptive deep learning model for building electricity load prediction with moving horizon. Mach Learn Appl 2022;7:100257. https://doi.org/10.1016/j.mlwa.2022.100257.

