

Article
Multi-Sensor Vibration Signal Based Three-Stage Fault
Prediction for Rotating Mechanical Equipment
Huaqing Peng 1 , Heng Li 1 , Yu Zhang 1 , Siyuan Wang 2 , Kai Gu 1, * and Mifeng Ren 2, *
1 State Key Laboratory of Nuclear Power Safety Monitoring Technology and Equipment,
China Nuclear Power Engineering Co., Ltd., Shenzhen 518172, China; [email protected] (H.P.);
[email protected] (H.L.); [email protected] (Y.Z.)
2 College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China;
[email protected]
* Correspondence: [email protected] (K.G.); [email protected] (M.R.)

Abstract: In order to reduce maintenance costs and avoid safety accidents, it is of great significance to carry out fault prediction to reasonably arrange maintenance plans for rotating mechanical equipment. At present, the relevant research mainly focuses on fault diagnosis and remaining useful life (RUL) predictions, which cannot provide information on the specific health condition and fault types of rotating mechanical equipment in advance. In this paper, a novel three-stage fault prediction method is presented to realize the identification of the degradation period and the type of failure simultaneously. Firstly, based on the vibration signals from multiple sensors, a convolutional neural network (CNN) and long short-term memory (LSTM) network are combined to extract the spatiotemporal features of the degradation period and fault type by means of the cross-entropy loss function. Then, to predict the degradation trend and the type of failure, the attention-bidirectional (Bi)-LSTM network is used as the regression model to predict the future trend of features. Furthermore, the predicted features are given to the support vector classification (SVC) model to identify the specific degradation period and fault type, which can eventually realize a comprehensive fault prediction. Finally, the NSF I/UCR Center for Intelligent Maintenance Systems (IMS) dataset is used to verify the feasibility and efficiency of the proposed fault prediction method.

Keywords: vibration signal; fault prediction; multiple sensors; CNN; attention-Bi-LSTM

Citation: Peng, H.; Li, H.; Zhang, Y.; Wang, S.; Gu, K.; Ren, M. Multi-Sensor Vibration Signal Based Three-Stage Fault Prediction for Rotating Mechanical Equipment. Entropy 2022, 24, 164. https://doi.org/10.3390/e24020164

Academic Editors: Chi-Hua Chen, Jianhua Zhang and Qichun Zhang
Received: 9 December 2021; Accepted: 18 January 2022; Published: 21 January 2022

1. Introduction

In the production processes of modern industries, the performance of rotating mechanical equipment may degrade over time, even resulting in failure due to long-term operation under severe conditions such as high speed, high temperature, high pressure, and heavy loads. To ensure the safety and efficiency of the operation, health monitoring and the establishment of a maintenance strategy have become an active research focus in both industry and academia [1-4]. Initially, the maintenance strategy was implemented after fault diagnosis or as preventive maintenance. We know that different types of equipment faults have specific vibration frequency characteristics. Therefore, traditional signal processing methods, such as Fourier transform (FT) [5], short-time Fourier transform (STFT) [6], and wavelet transform (WT) [7], have been proposed to obtain useful features from the vibration signal to reflect the operating status of the rotating mechanical equipment. However, the above fault diagnosis methods rely heavily on expert experience. In order to solve this problem, deep learning methods, such as CNN [8], LSTM [9], and the combined CNN and LSTM [10], have been used in fault diagnosis more recently, displaying the ability of deep feature self-learning without relying on manual intervention and prior knowledge. Although these methods can achieve excellent results in fault diagnosis, they cannot provide early warnings or allow recovery measures to be taken before the fault occurs. Therefore, fault prediction is gradually emerging as a preventive maintenance method.


In [11], Peng Y et al. pointed out that fault prediction involves determining the RUL
or working time of the diagnostic component based on the historical condition of the
component. The research on RUL prediction can be divided into two categories: model-
based methods and data-driven methods [12]. Lei Y et al. used maximum likelihood
estimation and a particle filter algorithm to predict the RUL of bearings [13]. The proposed
method is not well applicable to abrupt degeneration trends. Model-based methods rely on
prior knowledge and specific conditions, whereas the data-driven approach attempts to use
deep learning to deduce the degradation of equipment based on a large amount of historical
data. In [14], a spectrum-principal-energy-vector method was used to extract features first,
and then a deep CNN was formulated to obtain the RUL of bearings according to the
features. Xia et al. were the first to divide the monitoring data into different health
stages [15]. Based on this approach, the RUL of equipment was predicted using de-noising
auto-encoder-based deep neural networks (DNNs). Although the above RUL methods
can estimate how long it is until a fault will occur based on historical information, they
are unable to provide the exact degradation period and fault type. To solve this problem,
based on grey relational analysis, Xu et al. used a neural network model to predict
the future state of a rolling bearing [16]. In [17], Xu H et al. used two models: a regression
model and a classification model, which could not only predict the stage of degradation,
but also classify the type of fault that would occur. However, the traditional wavelet packet
transform (WPT) method was used to extract the time-frequency domain features of the
original vibration signal, which requires expertise to select the appropriate basis function.
Moreover, deep learning methods rely heavily on data information. Recent studies have
shown that using multi-sensor data with sensor fusion technology can improve the accuracy
and robustness of fault diagnosis models [18,19].
Therefore, in this paper, CNN, LSTM, and support vector classification (SVC) are
combined to establish a novel three-stage fault prediction model for rotating mechanical
equipment based on vibration signals from multiple sensors. Compared with the existing
fault prediction results, the main contributions of the paper are as follows:
1. More informative vibration signals, used for training the fault prediction model, are
collected from multiple sensors to improve the accuracy of the prediction method;
2. Deep features of various degradation periods and fault types can be extracted by
CNN and LSTM automatically without relying on manual intervention and profes-
sional knowledge;
3. The degradation period and fault type can be predicted simultaneously in advance
with high accuracy.
The rest of this paper is structured as follows. In Section 2, the proposed three-
stage fault prediction framework is generally introduced. Section 3 presents the details of
the combined CNN and LSTM feature extraction, the attention-bidirectional (Bi)-LSTM
regression model and the SVC classification mode. In Section 4, the superiority of the
proposed fault prediction method is verified by applying it to the Intelligent Maintenance
Systems (IMS) dataset. Section 5 concludes this paper.

2. Problem Formulation and Main Fault Prediction Framework


To accomplish the tasks of predicting the degradation period and the type of failure,
a novel three-stage fault prediction framework is presented based on the multi-sensor
vibration signal, using deep learning and machine learning. The proposed architecture
is illustrated in Figure 1, and can be divided into three parts: CNN-LSTM-based feature
extraction, attention-Bi-LSTM-based prediction, and SVC-based classification.
As shown in Figure 1, the design of the three-stage architecture is related to three objectives:
1. In the feature extraction stage, the original vibration signals collected by multiple
sensors are sent to the CNN-LSTM network for the extraction of spatiotemporal
features, which contain operating status information;
2. In the prediction stage, the attention-Bi-LSTM is trained to predict the trend of the features;

3. In the classification stage, based on the spatiotemporal features and their trends,
the SVC model is formulated to identify the degradation period and the future
fault type.

[Figure 1: multi-sensor inputs (Sensor S1, . . ., Sensor Sn) pass through the feature extraction stage (1D conv layers with BN, 1D max-pooling, and LSTM per sensor); the extracted features feed the prediction stage (attention-Bi-LSTM with train/test split), and the predicted features feed the classification stage (support vector classification of the future fault mode).]

Figure 1. The overall framework of the three-stage failure prediction method.

The original vibration signal from multiple sensors is collected first. In order to extract
the spatiotemporal information of the obtained vibration signal, a CNN with a convolution-
pooling-convolution structure and an LSTM network are combined to formulate the feature
extraction model. Then, an attention-Bi-LSTM network is used to predict the feature trends,
which can reflect the future health state of the rotating mechanical equipment. Finally,
the predicted features are sent into the SVC for classification, which can achieve the purpose
of predicting the degradation period and fault type simultaneously. The entire process of
the proposed three-stage fault prediction method is detailed in the following section.

3. Deep Learning Network-Based Three-Stage Fault Prediction


3.1. Feature Extraction Stage
Vibration signals collected by multiple sensors take the form of noisy time series, and they are susceptible to amplitude fluctuations. Therefore, it is necessary to perform feature extraction on the time-domain vibration signals. The CNN can extract the features of the original vibration signal, reducing the amount of data and diminishing the noise. However, it can only capture the spatial features, and the temporal features are ignored. Therefore,
the CNN-LSTM is used in this paper to extract the spatiotemporal features of the original
vibration signal. The framework of the CNN-LSTM model is illustrated in Figure 2. First,
a CNN with a convolution-pooling-convolution structure is used to extract the spatial
features of the original vibration signal. Then, deep abstractions of the temporal features can be
obtained by adding the LSTM network after the CNN.

[Figure 2: two parallel sensor branches (Sensor S1, . . ., Sensor Sn), each composed of layers C1-C5 (convolution, pooling, convolution, and LSTM layers), followed by flattening and concatenation, dense layers producing FT, a Softmax layer, and the cross-entropy loss.]

Figure 2. The model structure of the feature extraction model (Ci denotes the output of the ith layer; FT denotes the output of the last dense layer).

We can denote the originally collected vibration signal, in the form of a time series from the mth sensor, as $V^m = \{V_t^m\}_{m=1,2,\cdots,n;\,t=1,2,\cdots}$. In order to deal with the problem of sample imbalance and obtain more detailed characteristics of the vibration signal, the data are divided into windows. At the same time, it is necessary to ensure that the window size is properly chosen. For the data in each window, 1D convolution is first used to extract the shallow features. The convolution operation formula for the τth window is as follows:

$$C_i^1(\tau) = f\!\left(\sum_{i=1}^{L} \omega_i^1 * V^m(\tau) + b_i^1\right),\quad (i = 1, 2, \cdots, L) \qquad (1)$$

where $C_i^1(\tau)$ is named the feature map, and it denotes the ith channel output of the convolution operation; L is the total number of channels; $V^m(\tau)$ denotes the vibration signals in the τth window; ∗ denotes the dot product; f(·) is the activation function; N represents the number of convolution kernels; $\omega_i^1$ is the ith weight parameter of the 1st layer; $b_i^1$ represents the ith bias of the 1st layer; and ω and b are undetermined parameters to be trained.
After the 1D convolution operations, 1D max-pooling operations for the processing results $\{C_i^1\}_{i=1,2,\ldots,L}$ are performed, which can reduce the dimension of the feature maps and further extract key features of the vibration signal:

$$C_i^2 = \max\left\{C_{i,1}^1, C_{i,2}^1, \cdots, C_{i,d}^1\right\},\quad (i = 1, 2, \cdots, L) \qquad (2)$$

where $C_{i,k}^1$ represents the kth value of the ith channel of the 1st layer, d is the number of samples participating in the max-pooling operation, and $C_i^2$ denotes the output of the pooling layer.
Another convolution operation is performed on the processing result of the pooling operation in order to extract the deeper features of the vibration signal:

$$C_l^3(\tau) = f\!\left(\sum_{l=1}^{N} \omega_l^3 * C_l^2(\tau) + b_l^3\right) \qquad (3)$$

where the symbols in (3) are the same as those in (1).


Considering the time characteristics of the vibration signal, the LSTM neural network is used to extract the temporal features, processing the series data with the forgetting memory scheme while avoiding the gradient disappearance and gradient explosion problems. LSTM realizes the function of long-term and short-term state transfer through the structure of an input gate, forget gate, and output gate. After obtaining the spatial features $\{C_l^3\}_{l=1,2,\ldots}$ from the CNN, two LSTM networks are employed to extract the temporal features. The operation principle is as follows: the update of the hidden state $h_l$ at time l is based on the joint action of the input gate $i_l$, forget gate $f_l$, output gate $o_l$, cell state $c_l$, and hidden state $h_{l-1}$ at time l − 1. The specific formula is as follows:

$$
\begin{aligned}
i_l &= \sigma\!\left(W_{1i} C_l^3 + W_{2i} h_{l-1} + b_i\right),\\
f_l &= \sigma\!\left(W_{1f} C_l^3 + W_{2f} h_{l-1} + b_f\right),\\
o_l &= \sigma\!\left(W_{1o} C_l^3 + W_{2o} h_{l-1} + b_o\right),\\
c_l &= f_l \odot c_{l-1} + i_l \odot \tanh\!\left(W_{1c} C_l^3 + W_{2c} h_{l-1} + b_c\right),\\
h_l &= o_l \odot \tanh\!\left(c_l\right).
\end{aligned} \qquad (4)
$$

where W denotes weight and b is bias, σ denotes the sigmoid activation function, and ⊙ is the Hadamard product.
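To make the gate computations in (4) concrete, the following NumPy sketch performs a single LSTM update step. It is purely illustrative (our own code, not the authors' implementation), and the array names and sizes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(c3_l, h_prev, c_prev, W1, W2, b):
    """One LSTM update following Equation (4).

    c3_l   : CNN feature vector C_l^3 at step l, shape (d_in,)
    h_prev : previous hidden state h_{l-1}, shape (d_h,)
    c_prev : previous cell state c_{l-1}, shape (d_h,)
    W1, W2, b : dicts keyed by gate name ('i', 'f', 'o', 'c')
    """
    i = sigmoid(W1['i'] @ c3_l + W2['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W1['f'] @ c3_l + W2['f'] @ h_prev + b['f'])   # forget gate
    o = sigmoid(W1['o'] @ c3_l + W2['o'] @ h_prev + b['o'])   # output gate
    c = f * c_prev + i * np.tanh(W1['c'] @ c3_l + W2['c'] @ h_prev + b['c'])  # cell state
    h = o * np.tanh(c)                                        # new hidden state
    return h, c

# Tiny usage example with random parameters (sizes are arbitrary)
d_in, d_h = 32, 100
rng = np.random.default_rng(0)
W1 = {g: rng.normal(scale=0.1, size=(d_h, d_in)) for g in 'ifoc'}
W2 = {g: rng.normal(scale=0.1, size=(d_h, d_h)) for g in 'ifoc'}
b = {g: np.zeros(d_h) for g in 'ifoc'}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W1, W2, b)
```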
We can input the LSTM-processed features $\{h_l\}_{l=1,2,\ldots}$ into the two fully connected layers to obtain the eventual features of the original vibration signal, which can be denoted as $\{FT_l\}_{l=1,2,\ldots}$:

$$FE_l = f\left(\omega_l \cdot h_l + b_l\right) \qquad (5)$$

$$FT_l = f\left(\omega_l' \cdot FE_l + b_l'\right) \qquad (6)$$

where ω and ω′ denote weights, b and b′ are biases, and f(·) is the activation function.
We can input the obtained features $\{FT_l\}_{l=1,2,\ldots}$ into the Softmax layer to obtain the probability $\hat{p}$ that the current feature belongs to the various categories. Then, one can input the probability $\hat{p}$ into the cross-entropy loss function to complete the entire back-propagation process. The cross-entropy loss function is defined as follows:

$$L = -\sum_{i=1}^{N}\left[p(i)\log\left(\hat{p}(i)\right) + \left(1 - p(i)\right)\log\left(1 - \hat{p}(i)\right)\right] \qquad (7)$$

where p(i) denotes the true probability that $V^m = \{V_t^m\}_{m=1,2,\cdots,n;\,t=1,2,\cdots}$ belongs to the ith category, and $\hat{p}(i)$ denotes the corresponding probability obtained from the deep networks' classification result. The Softmax layer and the cross-entropy loss function are used here to provide a target for the back-propagation process of the deep network, endowing the extracted features with classification attributes.
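For readers who want to see how such a feature extractor can be wired together, the following tf.keras sketch assembles one convolution-pooling-convolution branch per sensor, stacks two LSTM layers on top, concatenates the branches, and trains with the cross-entropy loss of (7). It is a minimal sketch under assumed layer sizes (loosely following Table 3 in Section 4.2.1), not the authors' exact code, and the paper itself used standalone Keras 2.3.1:

```python
from tensorflow.keras import layers, models

def sensor_branch(inp):
    # Convolution-pooling-convolution spatial feature extraction, Eqs. (1)-(3)
    x = layers.Conv1D(16, 64, strides=16, padding='same', activation='relu')(inp)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(32, 32, strides=8, padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    # Two stacked LSTMs for temporal feature extraction, Eq. (4)
    x = layers.LSTM(100, return_sequences=True)(x)
    x = layers.LSTM(40)(x)
    return x

window, n_sensors, n_classes = 256, 2, 7
inputs = [layers.Input(shape=(window, 1)) for _ in range(n_sensors)]
merged = layers.Concatenate()([sensor_branch(i) for i in inputs])
fe = layers.Dense(128)(merged)            # Eq. (5)
ft = layers.Dense(10, name='FT')(fe)      # Eq. (6): the extracted 10-dimensional features
out = layers.Dense(n_classes, activation='softmax')(ft)  # Softmax head trained with Eq. (7)

model = models.Model(inputs, out)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

In such a setup, the trained FT layer would then be read out with a sub-model (e.g., `models.Model(inputs, ft)`) and its outputs passed on to the prediction stage.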
Entropy 2022, 24, 164 6 of 16

3.2. Prediction Stage


In order to achieve the purpose of failure prediction, after obtaining the operating state
characteristics of the rotating machinery, the feature trend should be predicted. Since these
features have a time-series correlation, the attention-Bi-LSTM is used as the prediction model
here, as shown in Figure 3. Bi-LSTM consists of a forward LSTM and a backward LSTM,
which are able to capture features of both past and upcoming time series. In addition,
in order to enhance the correlation between the output results of Bi-LSTM, an attention
mechanism is added, which can achieve the purpose of redistributing the weights between
the output results.

[Figure 3: the features FT enter a Bi-LSTM (forward and backward LSTMs producing the hidden states hl); an attention block combines hl with the target hidden state htar to form the context vector cp and the attention vector ap, and a fully connected layer outputs the upcoming feature.]

Figure 3. Flow chart of the feature prediction stage.


n o
The features obtained from the feature extraction stage, FT l are first sent into
l =1,2,...
the Bi-LSTM network. The specific principle formulas are as follows:
 
ilF = σ WF1i FTFl + WF2i hlF−1 + biF
 
1f 2f f
f Fl = σ WF FTFl + WF hlF−1 + bF
 
2f
o lF = σ WF1o FTFl + WF hlF−1 + boF (8)
 
clF = f Fl clF−1 + ilF tanh WF1c FTFl + WF2c hlF−1 + bcF
 
hlF = o lF tanh clF
Entropy 2022, 24, 164 7 of 16

 
ilB = σ WB1i FTBl + WB2i hlB+1 + biB
 
1f 2f f
f Bl = σ WB FTBl + WB hlB+1 + bB
 
2f
o lB = σ WB1o FTBl + WB hlB+1 + boB (9)
 
clB = f Bl clB+1 + ilB tanh WB1c FTBl + WB2c hlB+1 + bcB
 
hlB = o lB tanh clB

hl = hlF hlB
 
(10)
where the subscripts F and B denote forward and backward, respectively, and hl represents
the concatenation of the forward output hlF and backward output hlB of Bi-LSTM.
The basic idea of adding the attention mechanism to the Bi-LSTM is to calculate the correlation between the target hidden state $h_{tar}$ and the output hidden states $h^l$ of the Bi-LSTM, and then to output the attention vector [20]. The principle formulas are listed as follows:

$$\mathrm{score}\left(h_{tar}, h^l\right) = h_{tar}^{T} W h^l \qquad (11)$$

$$\alpha_{ts} = \frac{\exp\left(\mathrm{score}\left(h_{tar}, h^l\right)\right)}{\sum_{l'=1}^{L}\exp\left(\mathrm{score}\left(h_{tar}, h^{l'}\right)\right)} \qquad (12)$$

$$c_p = \sum_{s=1}^{S}\alpha_{ts}\, h^l \qquad (13)$$

$$a_p = \tanh\left(W_c\left[c_p ; h_{tar}\right]\right) \qquad (14)$$

where score(·) is a score function, $\alpha_{ts}$ is the attention weight, $c_p$ denotes the context vector, $a_p$ represents the attention vector, and W and $W_c$ denote weights.
We input $\{a_p\}_{p=1,2,\ldots}$ into the two fully connected layers to obtain the result of the feature prediction stage, which we denote as $\{ft_p\}_{p=1,2,\ldots}$:

$$fe_p = f\left(\omega_p \cdot a_p + b_p\right) \qquad (15)$$

$$ft_p = f\left(\omega_p' \cdot fe_p + b_p'\right) \qquad (16)$$

where ω and ω′ denote weights, b and b′ are biases, and f(·) is the activation function.
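A rough tf.keras sketch of this prediction stage is given below. It follows the layer sizes in Table 5 (Section 4.3.1) and, as an assumption, predicts the next value of a single feature channel from its previous six values; the built-in dot-product attention layer stands in for the learned score function of (11). It is illustrative code, not the authors' implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def rmse(y_true, y_pred):
    # Root mean square error used as the training loss, cf. Eq. (22) in Section 4.3.1
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))

seq_len = 6                      # six past points predict the next one (Section 4.3.1)
inp = layers.Input(shape=(seq_len, 1))
# Bi-LSTM over the feature sequence, Eqs. (8)-(10)
h = layers.Bidirectional(layers.LSTM(100, return_sequences=True))(inp)
h_tar = layers.Lambda(lambda t: t[:, -1:, :])(h)      # target hidden state h_tar
# Dot-product attention over the Bi-LSTM outputs, Eqs. (11)-(13)
context = layers.Attention()([h_tar, h])
# Attention vector from [c_p ; h_tar], Eq. (14), then the dense layers of Eqs. (15)-(16)
a = layers.Concatenate()([layers.Flatten()(context), layers.Flatten()(h_tar)])
a = layers.Dense(75, activation='relu')(a)
out = layers.Dense(1)(a)                               # next value of one feature channel

model = models.Model(inp, out)
model.compile(optimizer='adam', loss=rmse)
```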

3.3. Classification Stage


After obtaining the future spatiotemporal features of the original vibration signal using
the attention-Bi-LSTM prediction model, the degradation period and fault type should be
identified by formulating a classification model. In fact, the feature types are divided into
several categories by training a Softmax classifier in the feature extraction stage. However,
since the prediction step is added after the feature extraction step, the classification model
and the prediction model cannot perform the same backpropagation. Therefore, considering
the errors of the prediction results, the more robust SVC is used as the classifier instead of
the Softmax with rigorous function mapping [21]. The basic idea of the SVC model is to
find a classifier that maximizes the classification interval between the hyperplane and the
support vector. The principle of the SVC can be addressed as follows. For the predicted
features $\{ft_p\}_{p=1,2,\ldots}$, we denote their failure modes as $\{y_p\}_{p=1,2,\ldots}$, $y \in \{-1, 1\}$. The SVC problem can be converted into the following quadratic optimization problem:

$$\max_{\alpha}\;\sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K\!\left(ft_i^p, ft_j^p\right) \qquad (17)$$

$$\mathrm{s.t.}\quad \alpha_i \ge 0,\; i = 1,\cdots,n,\qquad \sum_{i=1}^{n}\alpha_i y_i = 0 \qquad (18)$$
 
where α is the Lagrange multiplier and $K(ft_i^p, ft_j^p)$ is the kernel function, which can map the linearly inseparable samples in the initial space to linearly separable samples in the high-dimensional space. In this paper, we use the radial basis function (RBF) as the kernel function:

$$K\!\left(ft_i^p, ft_j^p\right) = e^{-\gamma\left\|ft_i^p - ft_j^p\right\|^2},\quad \gamma > 0 \qquad (19)$$

where γ is the width of the RBF. The output function of the category can be obtained using the following formula:

$$f(x) = \mathrm{sgn}\!\left[\sum_{i=1}^{n}\alpha_i^{*} y_i K\!\left(ft_i^p, ft_j^p\right) + b^{*}\right] \qquad (20)$$

where $b^{*}$ is the classification threshold, which is obtained by substituting the support vectors.


The SVC fault diagnosis model in this paper adopts the One vs. Rest (OvR) approach
to realize the multi-classification of faults. The main ideas of OvR are as follows: if N
categories need to be classified, N binary classifiers should be constructed. In the training
process, select one category as positive and the others as negative, and then classify them
in turn. In the test process, the test samples are sequentially fed into the N trained classifiers, and the final classification result is determined from their outputs.
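As an illustration only (the feature arrays here are random stand-ins for the predicted features, and the grid of C and γ values is an assumption), this stage could be set up in scikit-learn as follows:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Stand-in arrays; in the real pipeline these are the predicted 10-dimensional
# spatiotemporal features and their fault-mode labels (7 classes, Section 4.1).
rng = np.random.default_rng(0)
ft_train, y_train = rng.normal(size=(200, 10)), rng.integers(0, 7, size=200)
ft_test = rng.normal(size=(50, 10))

ovr_svc = OneVsRestClassifier(SVC(kernel='rbf'))          # RBF kernel, Eq. (19), one-vs-rest
param_grid = {'estimator__C': [1, 10, 100],
              'estimator__gamma': ['scale', 0.1, 0.01]}
clf = GridSearchCV(ovr_svc, param_grid, cv=5)             # cross-validation, as in Algorithm 1
clf.fit(ft_train, y_train)
y_pred = clf.predict(ft_test)
```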

3.4. Implementing the Proposed Fault Prediction Strategy


Finally, it is worth summarizing the pseudo-code of the entire fault prediction algorithm, which is shown in Algorithm 1:

Algorithm 1 Fault prediction algorithm

1: procedure TRAINING PROCESS
2:   Input: original signals from sensor S1, . . ., sensor Sn, and initial labels {y_m}_{m=1,2,...}, which are fault modes
3:   Output: the model parameters of the trained attention-Bi-LSTM and SVC; test dataset {FT^d}_{d=1,2,...}
4:   Random initialization: the feature extraction network parameters {ω}_E and {b}_E; the feature prediction network parameters {W}_P and {b_i}_P
5:   Split the original signals from sensor S1, . . ., sensor Sn into {training set} and {test set} for the feature extraction model
6:   for {sensor S1, . . ., sensor Sn} in {{training set}, {test set}} do
7:     for k in range(times) do, where times = epochs if {sensor S1, . . ., sensor Sn} == {training set}, and times = 1 if {sensor S1, . . ., sensor Sn} == {test set}
8:       Calculate the 1st (one-dimensional convolution) layer for sensor S1 as C_k^1(S1) based on ω_1^1 and b_1^1; . . .; calculate the 1st layer for sensor Sn as C_k^1(Sn) based on ω_n^1 and b_n^1
9:       Calculate the 2nd (one-dimensional max-pooling) layer for C_k^1(S1) as C_k^2(S1); . . .; calculate the 2nd layer for C_k^1(Sn) as C_k^2(Sn)
10:      Calculate the 3rd (one-dimensional convolution) layer for C_k^2(S1) as C_k^3(S1) based on ω_1^3 and b_1^3; . . .; calculate the 3rd layer for C_k^2(Sn) as C_k^3(Sn) based on ω_n^3 and b_n^3
11:      Calculate the 4th and 5th (LSTM) layers for C_k^3(S1) as C_k^5(S1) based on ω_1^{4,5} and b_1^{4,5}; . . .; calculate the 4th and 5th layers for C_k^3(Sn) as C_k^5(Sn) based on ω_n^{4,5} and b_n^{4,5}
12:      Concatenate C_k^5(S1), . . ., C_k^5(Sn) as C_k^6
13:      Calculate the 1st dense layer for C_k^6 as C_k^7 and the 2nd dense layer for C_k^7 as {FT^m}_{m=1,2,...} = {FT^tr}_{tr=1,2,...} if {training set}, or {FT^te}_{te=1,2,...} if {test set}, based on ω^7 and b^7; then input {FT^m}_{m=1,2,...} to the softmax layer
14:      Calculate the cross-entropy loss; update {ω}_E and {b}_E, i.e., {ω}_E, {b}_E ← {ω′}_E, {b′}_E
15:    end for
16:  end for
17:  Reorder {FT^m}_{m=1,2,...} in time series
18:  Divide {FT^m}_{m=1,2,...} into the training dataset {FT^{train,a}}_{a=1,2,...} and the test dataset {FT^{test,d}}_{d=1,2,...}
19:  for epoch in range(EPOCHS) do
20:    Calculate the Bi-LSTM model for {FT^{train,a}}_{a=1,2,...} as P_k^1 based on W^1 and b_i^1
21:    Calculate the attention layer for P_k^1 as P_k^2 based on W^2 and b_i^2
22:    Calculate the dense layer for P_k^2 as P_k^3 based on W^3 and b_i^3
23:    Calculate the RMSE loss function and update {W}_P and {b_i}_P, i.e., {W}_P, {b_i}_P ← {W′}_P, {b_i′}_P
24:  end for
25:  Use cross-validation to train the SVC model with {FT^m}_{m=1,2,...} and {ŷ_m}_{m=1,2,...} to obtain the parameters of the SVC {support vector}_C
26: end procedure
27: procedure TEST PROCESS
28:   Input: {FT^{test,d}}_{d=1,2,...} and {y_d}_{d=1,2,...}, which are the fault-mode labels of {FT^{test,d}}_{d=1,2,...}
29:   Output: the labels {ŷ_d}_{d=1,2,...} of the fault prediction and their accuracy
30:   Load the parameters of the trained attention-Bi-LSTM model and SVC model
31:   Calculate the attention-Bi-LSTM model prediction results for {FT^{test,d}}_{d=1,2,...} as {ft^{test,d}}_{d=1,2,...}
32:   Calculate the SVC classification results {ŷ_d}_{d=1,2,...} for {ft^{test,d}}_{d=1,2,...}
33:   Calculate accuracy = †[ŷ_d == y_d] / †[y_d], d = 1, 2, . . ., where †[x] denotes the number of x
34: end procedure

4. Validating the Proposed Method


In order to verify the fault prediction method proposed in this paper, the IMS dataset
was used [22]. The IMS dataset contains three data sets: dataset 1, with two acceleration
sensors on each bearing, and dataset 2 and dataset 3, with one accelerometer on each
bearing, respectively. In view of the fact that this experiment needs to predict the failure
modes through multiple sensors, dataset 1 is used to verify the proposed method. The sam-
pling frequency of dataset 1 is 20 kHz, and the sampling duration is 1 s. The sampling
interval of the first 43 rounds of each acceleration sensor was used to collect data every
5 min, then to collect data every 10 min and generate a data file containing 20,480 sampling
points. The Python programming environment was used, based on the Keras framework
of version 2.3.1. All experiments were performed on Intel Xeon ES-2620 CPU.

4.1. The Description of the Dataset


At the end of the test-to-failure experiment, in dataset 1, bearing 3 displayed an
inner race defect and bearing 4 displayed a roller element defect [23]. The purpose of this
experiment was to classify the bearing degradation period and identify the early fault type
of the bearing based on the vibration signals from multiple sensors. Therefore, bearings
1-3 and 1-4 were selected for research, as shown in Table 1. The entire life of the bearing
was divided into four stages: the norm period, slight period, severe period, and failure
period. In addition, there were two types of faults: the inner race defect and the roller
element defect. Therefore, there were seven fault modes in the experiment, which were
the norm period, the slight inner race defect, the severe inner race defect, the inner race
failure, the slight roller element defect, the severe roller element defect, and the roller
element failure.

Table 1. Introduction to dataset 1.

| Dataset   | Bearing | Fault Type            | Sensor Number |
|-----------|---------|-----------------------|---------------|
| dataset 1 | 1-1     | -                     | 2             |
| dataset 1 | 1-2     | -                     | 2             |
| dataset 1 | 1-3     | inner race defect     | 2             |
| dataset 1 | 1-4     | roller element defect | 2             |

The IMS dataset does not have a detailed true label of the degradation period and
specific failure of the bearing. Therefore, according to the labeling method in [17,24],
the threshold of each stage should be set according to actual needs. In this simulation,
the root mean square (RMS) features of the 20,480 vibration signals collected per second
from each sensor were first extracted. Then, expertise was involved in labeling the degra-
dation period, which is shown in Figure 4 and Table 2. In our simulation, 20,480 samples of
vibration signals were collected every 10 min, and these samples were saved in one file.
There were 2156 sampling files in the whole life cycle. For example, the samples from the
2120th file to the 2151st file belonged to the severe period for bearing 1-3 H.

Table 2. Degradation period settings (file numbers for each period).

| Bearing | Norm   | Slight    | Severe    | Failure   |
|---------|--------|-----------|-----------|-----------|
| 1-3 H   | 1–1850 | 1851–2119 | 2120–2151 | 2152–2156 |
| 1-3 V   | 1–1850 | 1851–2119 | 2120–2151 | 2152–2156 |
| 1-4 H   | 1–1600 | 1601–2128 | 2129–2151 | 2152–2156 |
| 1-4 V   | 1–1600 | 1601–2128 | 2129–2151 | 2152–2156 |
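For illustration, the period labels in Table 2 can be assigned from file indices with a simple lookup; the helper below is hypothetical code, not part of the authors' pipeline:

```python
# Hypothetical helper mapping a 1-based file index to its degradation period
# for bearings 1-3 and 1-4, following the thresholds in Table 2.
PERIOD_BOUNDS = {
    '1-3': [(1850, 'norm'), (2119, 'slight'), (2151, 'severe'), (2156, 'failure')],
    '1-4': [(1600, 'norm'), (2128, 'slight'), (2151, 'severe'), (2156, 'failure')],
}

def degradation_period(bearing: str, file_no: int) -> str:
    for upper, period in PERIOD_BOUNDS[bearing]:
        if file_no <= upper:
            return period
    raise ValueError('file number outside the recorded life cycle')

print(degradation_period('1-3', 2120))  # -> 'severe'
```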

4.2. Feature Extraction


4.2.1. Training Process
The data need to be processed appropriately in order to extract more detailed characteristics of the vibration signal and train a better-performing fault prediction model. Using the trial-and-error method, the 20,480 samples in each file were divided into 80 windows. Therefore, there were 256 samples in every window.
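A minimal NumPy sketch of this windowing step (illustrative; the stand-in signal replaces a real IMS data file) could be:

```python
import numpy as np

WINDOW, N_WINDOWS = 256, 80      # 20,480 samples per file = 80 windows of 256 samples

def split_into_windows(file_signal: np.ndarray) -> np.ndarray:
    """Reshape one sensor's 20,480-sample recording into (80, 256, 1) windows."""
    assert file_signal.size == WINDOW * N_WINDOWS
    return file_signal.reshape(N_WINDOWS, WINDOW, 1)

# Example with a stand-in signal in place of a real IMS data file
windows = split_into_windows(np.random.randn(20480))
print(windows.shape)  # (80, 256, 1)
```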
In the feature extraction stage, the task of the CNN-LSTM network is to effectively
extract the spatiotemporal features of the degradation period and fault type simultaneously.
The framework of the CNN-LSTM network for feature extraction in Figure 2 was used here.
The network settings are shown in Table 3. The inputs of Sensor H and Sensor V were both
N × 256 × 1, with N being the number of windows. The batch size was 512. The number
of epochs was 25, and the optimizer used was Adam. The cross-entropy in (7) is employed
here as the loss function.

Figure 4. Root mean square features of each bearing.

Table 3. Network parameter settings in the feature extraction stage.

| Layer | Type           | Kernel Size/Stride/Numbers | Activation Function | Padding | BN |
|-------|----------------|----------------------------|---------------------|---------|----|
| 1-1   | Sensor H       | -                          | -                   | -       | N  |
| 1-2   | Sensor V       | -                          | -                   | -       | N  |
| 2-1   | 1D Convolution | 64/16/16                   | ReLU                | same    | Y  |
| 2-2   | 1D Convolution | 64/16/16                   | ReLU                | same    | Y  |
| 3-1   | 1D Maxpooling  | 2/2                        | -                   | valid   | N  |
| 3-2   | 1D Maxpooling  | 2/2                        | -                   | valid   | N  |
| 4-1   | 1D Convolution | 32/8/32                    | ReLU                | same    | Y  |
| 4-2   | 1D Convolution | 32/8/32                    | ReLU                | same    | Y  |
| 5-1   | LSTM           | 100                        | Tanh/Sigmoid        | -       | N  |
| 5-2   | LSTM           | 100                        | Tanh/Sigmoid        | -       | N  |
| 6-1   | LSTM           | 40                         | Tanh/Sigmoid        | -       | N  |
| 6-2   | LSTM           | 40                         | Tanh/Sigmoid        | -       | N  |
| 7     | Concatenate    | -                          | -                   | -       | N  |
| 8     | Dense1         | 128                        | -                   | -       | N  |
| 9     | Dense2         | 10 (feature)               | -                   | -       | N  |
| 10    | Softmax        | -                          | -                   | -       | N  |
| 11    | Cross-entropy  | -                          | -                   | -       | N  |

4.2.2. Verifying the Validity of the Feature Extraction Model


In order to verify the effectiveness of the proposed multi-sensor CNN-LSTM feature
extraction model, a CNN-LSTM network with a single sensor was used as the comparative
model. The evaluation equation was chosen as follows:

$$\mathrm{accuracy} = \frac{TP}{TP + FP} \qquad (21)$$
where TP is the number of true positives and FP is the number of false positives. The
comparison results between our model and other models are shown in Table 4:

Table 4. Comparison of feature extraction model with different sensor types.

| Sensor Type           | Accuracy |
|-----------------------|----------|
| Sensor H              | 0.892    |
| Sensor V              | 0.832    |
| Sensor H and Sensor V | 0.928    |

From Table 4, it can be seen that the classification accuracy based on a single-sensor
CNN-LSTM feature extraction model was lower than that based on the multi-sensor model.
The reason is that the large fluctuation and noise of the vibration signals from a single
sensor may mislead the identification of the degradation period, such as the fluctuation of
1-4V shown in Figure 4. However, the vibration signals from multiple sensors can provide
more comprehensive information, which could improve the classification accuracy.

4.3. Trend Prediction


4.3.1. Training Process
The input of the prediction model is the features obtained from the previous section.
The characteristics of the normal period were not used to predict the degradation trend, because the data in that period often show no trend information. In this simulation, the severe and failure periods for the inner race and the roller element were predicted, respectively. In order to illustrate the effectiveness of the proposed attention-Bi-LSTM model, only the prediction results of the failure period for roller defects are presented in the following as a representation of the model's performance. The input of our training process was set at
1994 × window width × 10, and the input of the testing process was 400 × window width × 10.
The parameter settings of the prediction model are shown in Table 5.

Table 5. Predictive model network parameter settings.

| Layer     | Units | Activation Function |
|-----------|-------|---------------------|
| Input     | -     | -                   |
| Bi-LSTM   | 100   | Tanh/Sigmoid        |
| Attention | -     | -                   |
| Dense     | 75    | ReLU                |
| Dense     | 1     | -                   |

Sliding time window technology was used to segment the dataset. When the window
moves backward, a series of overlapping sample sequences is formed. In the
selection of the sliding window width, we tried three, six, and nine sample points to predict
the next sample point, and used the root mean square error (RMSE) as the scoring criterion
for the prediction error, which is defined in (22). The results are shown in Table 6. It was finally determined that the width of the input window with the smallest prediction error
was six. Therefore, six sample points were used to predict the next sample point.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \qquad (22)$$

where yi is the true value and ŷi is the predicted value.
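As a small self-contained illustration of the sliding-window construction and the RMSE score in (22) (our own sketch with a stand-in feature series):

```python
import numpy as np

def make_sliding_windows(series: np.ndarray, width: int = 6):
    """Build (inputs, targets) pairs: `width` past points predict the next point."""
    X = np.stack([series[i:i + width] for i in range(len(series) - width)])
    y = series[width:]
    return X, y

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))  # Eq. (22)

feature_series = np.sin(np.linspace(0, 20, 200))   # stand-in for one extracted feature
X, y = make_sliding_windows(feature_series, width=6)
print(X.shape, y.shape)                            # (194, 6) (194,)
print(rmse(y, np.roll(y, 1)))                      # RMSE of a naive one-step-lag guess
```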

Table 6. The impact of different input window widths on the prediction results.

| Input        | RMSE  |
|--------------|-------|
| Three inputs | 2.221 |
| Six inputs   | 1.818 |
| Nine inputs  | 1.906 |

4.3.2. Testing Results


During model training and evaluation stages, the RMSE was used as the loss function
of the prediction model, and the prediction results are shown in Figure 5.

Figure 5. Comparison of prediction results (features 1 to 10 are shown in sequence).

From Figure 5, it can be seen that the general feature trends can be predicted with
certain errors. The reason for this is that the IMS dataset we used was designed for RUL
prediction instead of degradation period prediction [25]. The final task of this paper was to
classify the predicted features, which allows for an acceptable prediction error.

4.3.3. Comparison with Other Models


In order to verify the prediction effect of the proposed attention-Bi-LSTM model,
LSTM, Bi-LSTM, and attention-LSTM were applied to the IMS dataset for comparison.
The comparative results are shown in Table 7. Since the attention mechanism can redis-
tribute the proportions between sequences according to the predicted target, the use of a
prediction model with an attention mechanism was able to improve the accuracy. More-
over, Bi-LSTM was able to capture features of both past and upcoming time series, and

the performance of Bi-LSTM was also better than that of a single LSTM. We can see that the
best prediction model was attention-Bi-LSTM (1.818/1.686).

Table 7. Comparison of prediction models.

| Algorithm         | RMSE  |
|-------------------|-------|
| LSTM              | 1.875 |
| Bi-LSTM           | 1.828 |
| Attention-LSTM    | 1.838 |
| Attention-Bi-LSTM | 1.818 |

4.4. Classification
After obtaining the spatiotemporal features and the degradation trends of the bearing,
the SVC was used to identify the degradation period and fault type. In this step, the classification results for the severe and failure periods of the inner race and roller element faults are presented, as these are more significant than identification of the normal mode. In this
simulation, the data were collected every ten minutes, and our task was to realize the iden-
tification of the degradation period and fault type for the future 50 min. The classification
results are shown in Figure 6.

Figure 6. Confusion matrix of the failure prediction results (“in” and “rl” denote inner race and roller element defects, respectively; “sl”, “se”, and “fa” represent the slight, severe, and failure periods).

As seen in the confusion matrix in Figure 6, the classification accuracy of the proposed
SVC model combined with the attention-Bi-LSTM prediction model can reach 0.944. Fur-
thermore, the classification accuracy of the failure mode can even reach 0.985. According to
the background running data, it can be found that samples with classification errors are
distributed at the boundary between two adjacent periods, whereas the other samples are almost
classified correctly. In summary, we can achieve short-term predictions of failure types and
degradation periods through the use of our proposed fault prediction method.

Remark 1. The time-series signals in each window have one class label. The window size is
determined using trial and error with a simulation method. The final accuracy of the prediction
results with different window sizes is listed in Table 8. Table 8 shows a comparison of the test
accuracy, sampling time, and test time of the proposed fault prediction method with three different

window sizes. From Table 8, it can be seen that under the window size of 256, the test accuracy
of the proposed method was higher than that of 2048 and 4096. This is because when the window
size is larger, the number of windows is fewer. This directly leads to a lack of training set data
in the feature prediction stage, especially for severe and failure periods, and the key information
cannot be captured during feature prediction. As a result, the final classification accuracy of the
predicted features would be lower. On the other hand, although the test time with a window size
of 256 was about 0.01 s more than the other two cases, the fault prediction accuracy was about 0.1
higher. Therefore, the window size was chosen to be 256 in this simulation. When the programming
environment and CPU change, the fault prediction time will also change accordingly.

Table 8. Window size and its influence on failure prediction.

| Window Size × Number of Windows | Accuracy of Fault Mode Prediction | Sampling Time (s) | Fault Prediction Time (s) |
|---------------------------------|-----------------------------------|-------------------|---------------------------|
| 4096 × 5                        | 0.8                               | 0.2               | 0.244                     |
| 2048 × 10                       | 0.86                              | 0.1               | 0.248                     |
| 256 × 80                        | 0.944                             | 0.0125            | 0.255                     |

5. Conclusions
In this study, we divided the fault prediction task into three stages: feature extraction,
feature prediction, and fault mode classification. In the first stage, the spatiotemporal
features of the degradation period and fault mode are extracted through CNN-LSTM, based
on vibration signals from multiple sensors. In the second stage, the features are sent to a
Bi-LSTM network with an added attention mechanism to predict the feature trends. The Bi-
LSTM method can take into account two-way sequences, and the attention mechanism can
make the Bi-LSTM network work more efficiently by adjusting the weights. Finally, an
SVC is used to classify the predicted spatiotemporal features of the deterioration period
and fault type simultaneously. The IMS dataset was used to illustrate the effectiveness
of the proposed fault prediction method. The simulation results show that a short-term
prediction of the deterioration period and failure mode was achieved using the established fault
prediction model. This would be helpful in arranging a maintenance plan in an industrial
production setting.
The efficiency of the proposed three-stage fault prediction method is based on the
premise that the training set and test set obey the same distribution, with plenty of samples.
However, in real engineering scenarios, rolling bearings usually operate in the normal state under varying working conditions, which leads to differing data distributions and scarce fault data. Therefore, fault prediction strategies that account for these problems should be investigated in future work.

Author Contributions: Conceptualization, H.P. and M.R.; methodology, H.P., S.W., H.L. and Y.Z.;
software, S.W.; validation, H.P., S.W. and M.R.; formal analysis, M.R.; investigation, H.P.; resources,
H.P. and K.G.; writing—original draft preparation, S.W. and M.R.; writing—review and editing, S.W.
and M.R. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the Natural Science Foundation of China (No. 61973226) and
the Key Research and Development Program of Shanxi Province (No. 201903D121143).
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Baraldi, P.; Di Maio, F.; Zio, E. Unsupervised clustering for fault diagnosis in nuclear power plant components. Int. J. Comput.
Intell. Syst. 2013, 6, 764–777. [CrossRef]
2. Kim, K.; Bartlett, E.B. Nuclear power plant fault diagnosis using neural networks with error estimation by series association.
IEEE Trans. Nucl. Sci. 1996, 43, 2373–2388.

3. Gong, Y.; Su, X.; Qian, H.; Yang, N. Research on fault diagnosis methods for the reactor coolant system of nuclear power plant
based on DS evidence theory. Ann. Nucl. Energy 2018, 112, 395–399. [CrossRef]
4. Lu, B.; Upadhyaya, B.R. Monitoring and fault diagnosis of the steam generator system of a nuclear power plant using data-driven
modeling and residual space analysis. Ann. Nucl. Energy 2005, 32, 897–912. [CrossRef]
5. Rai, V.K.; Mohanty, A.R. Bearing fault diagnosis using FFT of intrinsic mode functions in Hilbert–Huang transform. Mech. Syst.
Signal Process. 2007, 21, 2607–2615. [CrossRef]
6. Fakhfakh, T.; Bartelmus, W.; Chaari, F.; Zimroz, R.; Haddar, M. Condition Monitoring of Machinery in Non-Stationary Operations;
STFT Based Approach for Ball Bearing Fault Detection in a Varying Speed Motor; Springer: Berlin/Heidelberg, Germany, 2012;
pp. 41–50.
7. Chen, J.; Li, Z.; Pan, J.; Chen, G.; Zi, Y.; Yuan, J.; Chen, B.; He, Z. Wavelet transform based on inner product in fault diagnosis of
rotating machinery: A review. Mech. Syst. Signal Process. 2016, 70, 1–35. [CrossRef]
8. Eren, L.; Ince, T.; Kiranyaz, S. A generic intelligent bearing fault diagnosis system using compact adaptive 1D CNN classifier.
J. Signal Process. Syst. 2019, 91, 179–189. [CrossRef]
9. Zhao, H.; Sun, S.; Jin, B. Sequential fault diagnosis based on LSTM neural network. IEEE Access 2018, 6, 12929–12939. [CrossRef]
10. Qiao, M.; Yan, S.; Tang, X.; Xu, C. Deep convolutional and LSTM recurrent neural networks for rolling bearing fault diagnosis
under strong noises and variable loads. IEEE Access 2020, 8, 66257–66269. [CrossRef]
11. Peng, Y.; Liu, D.; Peng, X. A review: Prognostics and health management. J. Electron. Meas. Instrum. 2010, 24, 1–9. [CrossRef]
12. Liu, J.; Wang, W.; Ma, F.; Yang, Y.B.; Yang, C.S. A data-model-fusion prognostic framework for dynamic system state forecasting.
Eng. Appl. Artif. Intell. 2012, 25, 814–823. [CrossRef]
13. Lei, Y.; Li, N.; Gontarz, S.; Lin, J.; Radkowski, S.; Dybala, J. A model-based method for remaining useful life prediction of
machinery. IEEE Trans. Reliab. 2016, 65, 1314–1326. [CrossRef]
14. Ren, L.; Sun, Y.; Wang, H.; Zhang, L. Prediction of bearing remaining useful life with deep convolution neural network. IEEE
Access 2018, 6, 13041–13049. [CrossRef]
15. Xia, M.; Li, T.; Shu, T.; Wan, J.; De Silva, C.W.; Wang, Z. A two-stage approach for the remaining useful life prediction of bearings
using deep neural networks. IEEE Trans. Ind. Inform. 2018, 15, 3703–3711. [CrossRef]
16. Xu, W.; Liu, W.B.; Zhou, M.; Yang, J.F.; Xing, C.H. Application of Neural Network Model for Grey Relational Analysis in Bearing
Fault Prediction. Bearing 2012. [CrossRef]
17. Xu, H.; Ma, R.; Yan, L.; Ma, Z. Two-stage prediction of machinery fault trend based on deep learning for time series analysis.
Digit. Signal Process. 2021, 117, 103150. [CrossRef]
18. Park, J.W.; Sim, S.H.; Jung, H.J. Displacement estimation using multimetric data fusion. IEEE/ASME Trans. Mechatron. 2013,
18, 1675–1682. [CrossRef]
19. Olofsson, B.; Antonsson, J.; Kortier, H.G.; Bernhardsson, B.; Robertsson, A.; Johansson, R. Sensor fusion for robotic workspace
state estimation. IEEE/ASME Trans. Mechatron. 2015, 21, 2236–2248. [CrossRef]
20. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015,
arXiv:1508.04025.
21. Tang, Y. Deep learning using linear support vector machines. arXiv 2013, arXiv:1306.0239.
22. The Bearing Dataset Was Provided by the Center for Intelligent Maintenance Systems (IMS), University of Cincinnati. Available online: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/ (accessed on 2 November 2021).
23. Qiu, H.; Lee, J.; Lin, J.; Yu, G. Wavelet filter-based weak signature detection method and its application on rolling element bearing
prognostics. J. Sound Vib. 2006, 289, 1066–1090. [CrossRef]
24. Hong, S.; Zhou, Z.; Zio, E.; Hong, K. Condition assessment for the performance degradation of bearing based on a combinatorial
feature extraction method. Digit. Signal Process. 2014, 27, 159–166. [CrossRef]
25. Yan, M.; Xie, L.; Muhammad, I.; Yang, X.; Liu, Y. An effective method for remaining useful life estimation of bearings with elbow
point detection and adaptive regression models. ISA Trans. 2021. [CrossRef]
