Ali Akbar Abdoos, Peyman Khorshidian Mianaei, Mostafa Rayatpanah Ghadikolaei
Applied Soft Computing 38 (2016) 637–646
http://dx.doi.org/10.1016/j.asoc.2015.10.038
1568-4946/© 2015 Elsevier B.V. All rights reserved.

Article history: Received 25 March 2015; received in revised form 24 September 2015; accepted 18 October 2015; available online 28 October 2015.

Keywords: Feature selection; Pattern recognition; Support vector machines; Signal analysis; S-transform; Variational mode decomposition

Abstract

Power quality (PQ) issues have become more important than before due to the increased use of sensitive electrical loads. In this paper, a new hybrid algorithm is presented for the detection of PQ disturbances in electrical power systems. The proposed method is constructed from four main steps: simulation of PQ events, extraction of features, selection of dominant features, and classification of the selected features. Using two powerful signal processing tools, namely variational mode decomposition (VMD) and the S-transform (ST), potential features are extracted from different PQ events. VMD, as a new tool, decomposes signals into different modes, while ST analyzes signals in both the time and frequency domains. In order to avoid a large feature-vector dimension and to obtain a detection scheme with an optimal structure, sequential forward selection (SFS) and sequential backward selection (SBS), as wrapper-based methods, and Gram–Schmidt orthogonalization (GSO) based feature selection, as a filter-based method, are used to eliminate redundant features. In the next step, PQ events are discriminated by support vector machines (SVMs) as the classifier core. The results of extensive tests prove the satisfactory performance of the proposed method in terms of speed and accuracy, even under noisy conditions. Moreover, the start and end points of PQ events can be detected with high precision.

© 2015 Elsevier B.V. All rights reserved.
its learning process is a very time-consuming task, and there is no definite rule for the optimum setting of an NN's structure. A PNN needs a large number of exemplar patterns to yield acceptable classification accuracy; this obstacle leads to slow execution speed and large memory requirements.

In spite of the large number of research works in the field of PQ event detection, there is still a lack of analysis of the effect of the extracted features on the classifier's detection accuracy. On the other hand, a large dimension of extracted features may mislead the classifier, which results in reduced detection accuracy. So, redundant features should be eliminated from the extracted features. Moreover, most of the proposed methods are not able to detect the interval of time in which the signal is distorted [3–17].

In this study, relatively large-dimension feature vectors are extracted using ST [21] and VMD [22], and the more useful features are selected by applying several feature selection methods based on filters and wrappers. Wrapper-based methods are very time consuming due to their huge computational burden, while filters are faster since they rank features based on intrinsic attributes. Sequential forward selection (SFS) [23] and sequential backward selection (SBS) [24] as wrapper-based methods, and Gram–Schmidt orthogonalization (GSO) based feature selection [25] as a filter-based method, are used to eliminate redundant features. In the last step, the dominant selected features are discriminated by a support vector machine (SVM) classifier [26,27]. The proposed detection scheme has advantages from the following aspects:

• VMD and ST have few tuning parameters as compared to WT, which has many mother wavelet filters and decomposition levels.
• The SVM classifier, with its simple structure and only a few adjustable parameters, supports the accuracy of the proposed method.
• Results have been presented for both filter- and wrapper-based feature selection methods.
• The start and end points of events can be detected by using mode 2 of the VMD analysis.
• The proposed algorithm is robust in noisy conditions.

2. Required tools

2.1. Variational mode decomposition

VMD is a signal processing technique that decomposes a real-valued signal into different levels named modes u_k, which have specific sparsity properties while reproducing the main signal. Each mode k is assumed to be concentrated around a center pulsation ω_k determined during the decomposition process. Thus, the sparsity of each mode is chosen to be its bandwidth in the spectral domain. In order to obtain the mode bandwidth, the following steps should be implemented: (1) applying the Hilbert transform to each mode u_k in order to obtain a unilateral frequency spectrum; (2) shifting the mode's frequency spectrum to "baseband" by using an exponential tuned to the respective estimated center frequency; (3) estimating the bandwidth through the H^1 Gaussian smoothness of the demodulated signal, i.e. the squared L^2-norm of the gradient. Thus, the decomposition process is realized by solving the following optimization problem [22]:

\[
\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \ast u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_k u_k = f(t) \qquad (1)
\]

where f(t) is the main signal to be decomposed, and {u_k} = {u_1, ..., u_K} and {ω_k} = {ω_1, ..., ω_K} denote the set of all modes and their center frequencies, respectively. δ(t) is the Dirac distribution and ∗ denotes convolution. In order to address the constraint, both a penalty term and a Lagrangian multiplier are considered. The combination of the two terms benefits both from the nice convergence properties of the quadratic penalty at finite weight and from the strict enforcement of the constraint by the Lagrangian multiplier. So, the above optimization problem is changed to the following unconstrained one [22]:

\[
L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \ast u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_k u_k(t) \right\rangle \qquad (2)
\]

Then the alternate direction method of multipliers (ADMM) is used to solve the original minimization problem (2) by finding the saddle point of the augmented Lagrangian L in a sequence of iterative sub-optimizations. Plugging the solutions of the sub-optimizations into the ADMM, and directly optimizing in the Fourier domain, the complete procedure for variational mode decomposition is summarized in the following algorithm [22]:

Algorithm: complete optimization of VMD
  Initialize {û_k^1}, {ω_k^1}, λ̂^1, n ← 0
  repeat
    n ← n + 1
    for k = 1 : K do
      Update û_k:  û_k^{n+1}(ω) ← ( f̂(ω) − Σ_{i<k} û_i^{n+1}(ω) − Σ_{i>k} û_i^n(ω) + λ̂^n(ω)/2 ) / ( 1 + 2α(ω − ω_k^n)^2 )
      Update ω_k:  ω_k^{n+1} ← ∫_0^∞ ω |û_k^{n+1}(ω)|^2 dω / ∫_0^∞ |û_k^{n+1}(ω)|^2 dω
    end for
    Dual ascent for all ω ≥ 0:  λ̂^{n+1}(ω) ← λ̂^n(ω) + τ ( f̂(ω) − Σ_k û_k^{n+1}(ω) )
  until convergence:  Σ_k ‖û_k^{n+1} − û_k^n‖_2^2 / ‖û_k^n‖_2^2 < ε
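To make the update loop concrete, the following is a minimal NumPy sketch of the algorithm box above: the Wiener-filter mode update of Eq. (4) and the center-frequency update of Eq. (6), both derived below, together with the dual-ascent step. It is an illustrative simplification rather than the authors' implementation; a faithful VMD operates on the one-sided (analytic) spectrum and restores Hermitian symmetry afterwards, and the parameter values chosen here (K, α, τ, ε) are arbitrary placeholders.

```python
import numpy as np

def vmd(f, K=3, alpha=2000.0, tau=0.1, tol=1e-7, max_iter=500):
    """Decompose a real signal f into K modes; returns (modes, center_freqs)."""
    N = len(f)
    f_hat = np.fft.fft(f)
    grid = np.fft.fftfreq(N)                   # normalized frequency grid
    u_hat = np.zeros((K, N), dtype=complex)    # mode spectra, all start at zero
    omega = np.linspace(0.05, 0.45, K)         # initial center frequencies
    lam_hat = np.zeros(N, dtype=complex)       # Lagrangian multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Eq. (4): Wiener filtering of the current residual around omega_k
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam_hat / 2) / (1 + 2 * alpha * (grid - omega[k]) ** 2)
            # Eq. (6): move omega_k to the gravity center of the power spectrum
            pos = slice(0, N // 2)             # positive-frequency half only
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(grid[pos] * power) / (np.sum(power) + 1e-30)
        # Dual ascent step on the multiplier
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Stopping rule from the algorithm box
        diff = sum(np.sum(np.abs(u_hat[k] - u_prev[k]) ** 2) /
                   (np.sum(np.abs(u_prev[k]) ** 2) + 1e-30) for k in range(K))
        if diff < tol:
            break

    return np.real(np.fft.ifft(u_hat, axis=1)), omega
```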
According to the above algorithm, u_k and ω_k should be updated to realize the VMD analysis process. To update the modes, the optimization problem of relation (2) is solved with respect to u_k. This sub-optimization problem is represented as follows [22]:

\[
u_k^{n+1} = \arg\min_{u_k \in X} \left\{ \alpha \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \ast u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_i u_i(t) + \frac{\lambda(t)}{2} \right\|_2^2 \right\} \qquad (3)
\]

The solution of this quadratic optimization problem is readily found by letting the first variation vanish for the positive frequencies [22]:

\[
\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \qquad (4)
\]

which is clearly identified as a Wiener filtering of the current residual, with signal prior 1/(ω − ω_k)^2. The full spectrum of the real mode is then simply obtained by Hermitian symmetric completion.
Conversely, the real part of the inverse Fourier transform of this filtered analytic signal yields the mode in the time domain.

In the second subproblem, the optimization problem of relation (2) is solved with respect to ω_k. The center frequencies do not appear in the reconstruction fidelity term, but only in the bandwidth prior. The relevant problem thus reads [22]:

\[
\omega_k^{n+1} = \arg\min_{\omega_k} \left\{ \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \ast u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \qquad (5)
\]

The solution of the above sub-optimization problem in the frequency domain is obtained as follows [22]:

\[
\omega_k^{n+1} = \frac{\int_0^\infty \omega \, |\hat{u}_k(\omega)|^2 \, d\omega}{\int_0^\infty |\hat{u}_k(\omega)|^2 \, d\omega} \qquad (6)
\]

which puts the new ω_k at the gravity center of the corresponding mode's power spectrum. This means the carrier frequency is the frequency of a least-squares linear regression to the instantaneous phase observed in the mode [22].
2.2. Support vector machines

The purpose of an SVM is to find an optimal separating hyperplane by maximizing the margin between the separating hyperplane and the data set [26]. Given a set of data T = {(x_i, y_i)}_{i=1}^m, x_i ∈ R^n denotes the input vectors, y_i ∈ {+1, −1} stands for the two classes, and m is the number of samples. It is supposed that the hyperplane f(x) = 0 separates the given data, which are linearly separable:

\[
f(x) = w \cdot x + b = \sum_{k=1}^{n} w_k x_k + b = 0 \qquad (7)
\]

where w and b denote the weight and bias terms. These variables adjust the position of the separating hyperplane. The separating hyperplane should satisfy the following constraints:

\[
y_i f(x_i) = y_i (w \cdot x_i + b) \geq 1, \quad i = 1, 2, \ldots, m \qquad (8)
\]

Positive slack variables ξ_i are defined to measure the distance between the margin and the vectors x_i that lie on the wrong side of the margin. Then, the optimal hyperplane separating the data can be obtained from the following optimization problem:

\[
\text{Minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \geq 1 - \xi_i, \;\; \xi_i \geq 0, \;\; i = 1, 2, \ldots, m \qquad (9)
\]

where C is the error penalty.

By introducing the Lagrangian multipliers α_i, the above-mentioned optimization problem is transformed into the dual quadratic optimization problem, that is:

\[
\text{Maximize} \quad L(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \qquad (10)
\]

\[
\text{subject to} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \quad \alpha_i \geq 0, \quad i = 1, 2, \ldots, m \qquad (11)
\]

Thus, the linear decision function is created by solving the dual optimization problem, and is defined as:

\[
f(x) = \operatorname{sign}\left( \sum_{i=1}^{m} \alpha_i y_i (x_i \cdot x) + b \right) \qquad (12)
\]

The SVM can also be used for nonlinear classification using a kernel function. The nonlinear mapping function ϕ is applied to map the original data x into a high-dimensional feature space, where linear classification is possible. Then, the nonlinear decision function is:

\[
f(x) = \operatorname{sign}\left( \sum_{i=1}^{m} \alpha_i y_i K(x_i, x) + b \right) \qquad (13)
\]

where K(x_i, x_j) is called the kernel function, K(x_i, x_j) = ϕ(x_i) · ϕ(x_j). The most commonly used kernel functions are as follows [26,27]:

(1) Linear: K(x_i, x_j) = x_i^T x_j
(2) Polynomial: K(x_i, x_j) = (β x_i^T x_j + r)^d, β > 0
(3) Gaussian radial basis function: K(x_i, x_j) = exp(−γ ‖x_i − x_j‖^2), γ > 0
(4) Sigmoid: K(x_i, x_j) = tanh(β x_i^T x_j + r)

Here β, r, γ and d are the kernel parameters. Currently, there is no method available in the literature for deciding the value of C, for choosing the best kernel function, or for setting the kernel parameters. As the proper setting of the SVM parameters has a direct effect on the detection accuracy, the appropriate kernel function and the other parameters are obtained using heuristic optimization techniques such as continuous ant colony optimization algorithms [28].
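As a concrete illustration of this classifier core, the sketch below trains an RBF-kernel SVM, K(x_i, x_j) = exp(−γ‖x_i − x_j‖²), and searches over (C, γ). Note that the paper tunes these parameters with a continuous ant colony optimizer [28]; the cross-validated grid search and the random placeholder data used here are stand-ins for illustration only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 19))     # placeholder 19-dimensional feature vectors
y = rng.integers(0, 9, size=200)   # placeholder labels for nine PQ classes

search = GridSearchCV(
    SVC(kernel="rbf"),             # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    param_grid={"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```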
An SVM finds an optimal hyperplane in the feature space, so to classify more than two classes, two straightforward approaches can be used. The first approach is to compare class against class with several machines and combine the outputs using some decision rule. The number of machines N_m needed for the m-class separation problem is given by [26,27]:

\[
N_m = \frac{m!}{2(m-2)!} \qquad (14)
\]

In this approach, each class is associated with m − 1 outputs. The advantage of such a method is that it gives information about class-by-class separation, which could be used in another system to solve the problem of simultaneous events. In this work, a simple decision rule is used: the winner class is the one for which all m − 1 related outputs are greater than zero.

The second approach to solve the multiple-class (m > 2) separation problem using SVMs is to compare one class against all the others. In this case, the required number of machines N_m equals the number of classes m, and the decision rule applied to the m outputs is winner-takes-all. The decision rule is simplified because each output is related to one class. An advantage of this solution is the expected lower computational cost in comparison with the first approach, as fewer machines are needed.
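A two-line check of the machine counts discussed above, under the assumption of m = 9 event classes (the number of PQ classes considered later in this paper):

```python
# Machine counts for the two multiclass strategies: Eq. (14) for
# one-against-one versus one machine per class for one-against-all.
from math import factorial

m = 9                                                 # nine PQ event classes
one_vs_one = factorial(m) // (2 * factorial(m - 2))   # Eq. (14): m(m-1)/2 = 36
one_vs_all = m                                        # second approach: 9
print(one_vs_one, one_vs_all)
```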
2.3. Sequential forward selection

Sequential forward selection was first proposed in [23]. It operates in a bottom-to-top manner: the selection procedure starts with an empty set, and then, at each step, the feature maximizing the criterion function is added to the current set. This operation continues until the desired number of features is selected. The nesting effect is present, in that a feature added to the set in one step cannot be removed in subsequent steps [11]. As a consequence, the SFS method can offer only a suboptimal result.

2.4. Sequential backward selection

The sequential backward selection method [24] works in a top-to-bottom manner; it is the reverse case of the SFS method.
Initially, the complete feature set is considered. At each step, a single feature is deleted from the present set so that the criterion function is maximized for the remaining features within the set. The removal operation continues until the desired number of features is obtained. Once a feature is eliminated from the set, it cannot enter the set again in subsequent steps [11].
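A compact sketch of both wrapper searches (Sections 2.3 and 2.4) under an abstract criterion; `score_fn` is a hypothetical stand-in for the paper's criterion function, e.g. the cross-validated detection accuracy of the classifier on the candidate subset.

```python
def sfs(n_features, n_select, score_fn):
    """Greedy forward selection: grow from the empty set (Section 2.3)."""
    selected = []
    while len(selected) < n_select:
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)        # nesting effect: never removed later
    return selected

def sbs(n_features, n_select, score_fn):
    """Greedy backward elimination: shrink from the full set (Section 2.4)."""
    selected = list(range(n_features))
    while len(selected) > n_select:
        worst = max(selected, key=lambda f: score_fn([g for g in selected if g != f]))
        selected.remove(worst)       # once eliminated, never re-enters
    return selected

# Toy usage with a dummy criterion, just to show the call pattern:
print(sfs(19, 5, score_fn=lambda s: -abs(sum(s) - 40)))
```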
2.5. Gram–Schmidt orthogonalization based feature selection method

The GSO process is a simple forward selection method which can be used effectively for feature ranking. Suppose the kth feature of M patterns is denoted by the vector X_k = [x_k1, x_k2, ..., x_kM]^T and Y = [y_1, y_2, ..., y_M]^T represents the output target vector. In order to select the feature best correlated with the output, the cosine of the angle between each input feature X_k, k = 1, ..., N, and the target Y is calculated as an evaluation criterion [25]:

\[
\cos(\varphi_k) = \frac{X_k \cdot Y}{\|X_k\| \, \|Y\|} \qquad (15)
\]

where φ_k is the angle between the kth input feature vector X_k and the output target Y, N is the number of all features, and X_k · Y denotes the inner product between X_k and Y. If the output is fully proportional to the input, φ_k is zero; inversely, if the output is fully uncorrelated with the input, φ_k is π/2 [25]. So, in an iterative procedure, the feature that maximizes the above-mentioned evaluation criterion is selected as the feature most correlated with the target. For selection of the next feature, the output vector and all other candidate features are mapped to the null space of the selected feature, and the input features and output vector are then updated with the new data. The ranking procedure is repeated until all candidate features are ranked, or until a predetermined stopping condition is met [25].
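The ranking loop can be sketched in a few lines of NumPy. This is an illustrative reading of the procedure in [25], not the authors' implementation: Eq. (15) scores each candidate, the winner is recorded, and the target and the remaining candidates are deflated onto the winner's null space.

```python
import numpy as np

def gso_rank(X, y):
    """X: (M patterns, N features); y: (M,) target vector. Returns ranked feature indices."""
    X = X.astype(float).copy()
    y = y.astype(float).copy()
    remaining = list(range(X.shape[1]))
    ranking = []
    while remaining:
        # Eq. (15): cosine of the angle between each candidate and the target
        scores = {k: abs(X[:, k] @ y) /
                     (np.linalg.norm(X[:, k]) * np.linalg.norm(y) + 1e-30)
                  for k in remaining}
        best = max(scores, key=scores.get)
        ranking.append(best)
        remaining.remove(best)
        # Map the target and the remaining candidates to the winner's null space
        v = X[:, best] / (np.linalg.norm(X[:, best]) + 1e-30)
        y = y - (v @ y) * v
        for k in remaining:
            X[:, k] = X[:, k] - (v @ X[:, k]) * v
    return ranking
```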
3. Proposed method

Fig. 1 demonstrates the flowchart of the proposed method. The proposed detection scheme is realized through four main steps: generation of PQ disturbances, feature extraction, feature selection, and classification of the selected features (detection of the PQ type), followed by interval detection, in which mode 2 is used to detect the start and end points of events.

Fig. 1. Flowchart of the proposed algorithm.

3.1. Generation of PQ disturbances

Table 1 shows a detailed summary of the PQ disturbance types as well as their controlling parameters, definitions, and equations. Generating PQ exemplar signals by adjustable parametric equations for recognition purposes has advantages from different viewpoints. The parameters allow the training and testing PQ disturbances to be changed over a wide range and in a controlled manner. The signals simulated in this way are also very similar to the PQ events that occur in real power systems. On the other hand, the generalization capability of the classifier core improves by using different signals belonging to the same class. Two distinct data sets are generated for the training and testing phases: a total of 100 cases of each class with various parameters are generated for training, and 100 cases are generated for testing. A ten-cycle data window of the PQ event signals is considered for the extraction of features. In real power systems the sampling frequency can be increased up to 10 kHz. If a high sampling frequency is considered in the simulation, more samples will appear in each cycle. More samples increase the signal resolution, but the computational burden is consequently increased. On the other hand, few samples give a poor resolution of the signal, which then contains less information. For power system applications, high-frequency components usually do not appear in voltage signals. Therefore, the sampling frequency is set to 3200 Hz, which is suitable for the analysis of PQ disturbances in power systems. Thus, each cycle contains 64 samples for power systems with a frequency of 50 Hz. According to the Nyquist theorem, the harmonic content can be monitored up to 1600 Hz, which is suitable for the analysis of PQ events. The proposed method can be applied to systems with different sampling frequencies, but for higher sampling frequencies the time required to implement the proposed method increases.
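As an illustration of the event generation just described, the sketch below synthesizes a ten-cycle, 50 Hz sag window sampled at 3200 Hz (64 samples per cycle) and adds AWGN for the noisy tests reported later. The sag expression used here is a common parametric form assumed for illustration; the equations actually used are those summarized in Table 1, and the depth and interval values are arbitrary.

```python
import numpy as np

fs, f0, cycles = 3200, 50, 10
t = np.arange(cycles * fs // f0) / fs            # 640 samples covering 0.2 s

def sag(t, alpha=0.5, t1=0.06, t2=0.14, f0=50.0):
    """v(t) = (1 - alpha*(u(t-t1) - u(t-t2))) * sin(2*pi*f0*t), a typical sag model."""
    window = (t >= t1) & (t < t2)                # step-function interval of the sag
    return (1.0 - alpha * window) * np.sin(2 * np.pi * f0 * t)

v = sag(t)

# Noisy copy for the robustness tests: AWGN scaled to a chosen SNR (in dB)
snr_db = 30
noise_power = np.var(v) / (10 ** (snr_db / 10))
v_noisy = v + np.random.default_rng(0).normal(scale=np.sqrt(noise_power), size=v.shape)
```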
3.2. Feature extraction

Feature extraction is the most important part of intelligent pattern recognition schemes. As mentioned in Section 1, many signal processing techniques have been used for the extraction of features. Some DFT-based algorithms analyze the signal only in the frequency domain, and some DWT-based algorithms are very sensitive to noise. In this paper, ST as well as VMD are used as powerful signal processing tools for the extraction of potential features. ST can analyze a signal in both the time and frequency domains simultaneously. On the other hand, VMD decomposes the signal into different modes such that each mode contains a specific spectrum. Thus, VMD is not strongly affected by noise, as it can separate the high-frequency components into a high-level mode. Moreover, VMD can trace the signal changes more accurately, so that the start and end points of disturbances can be recognized more precisely.

After generation of the PQ events using the parametric equations presented in Table 1, features are extracted by analysis of the disturbances using ST and VMD.
Table 1
PQ disturbance model.
Typical sag and swell signals, as well as their decomposed modes, are depicted in Figs. 2 and 3, respectively. As can be seen in Fig. 2, mode 1 shows the approximation of the main signal, which includes the low-frequency components. As opposed to mode 1, the higher-level modes, i.e. modes 2 and 3, include higher-frequency components. This attribute of the VMD helps to detect the start and end points of PQ events. It is clearly seen that at the start and end points of the sag event, the magnitudes of modes 1 and 2 have an oscillatory behavior. Identically, this characteristic is also observed in the swell waveform depicted in Fig. 3. By setting a proper threshold value, the duration interval of PQ disturbances can be determined precisely. Extensive simulations have been performed, and it was found that by using the absolute value of mode 2 and setting the threshold value to 0.05, the duration of events is recognized with high detection accuracy. In this study, some features are extracted from the decomposed modes: the standard deviation, energy, and maximum absolute value of each mode are calculated as extracted features. Since three decomposition levels are considered in the analysis, nine features are extracted using the VMD analysis.
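The nine VMD features and the mode-2 interval rule just described can be written directly; in this sketch, `modes` is assumed to be the (3 × N) output of a VMD routine such as the one sketched in Section 2.1.

```python
import numpy as np

def vmd_features(modes):
    """modes: array of shape (3, N) from the VMD of one event window."""
    feats = []
    for u in modes:
        # Standard deviation, energy, and maximum absolute value of each mode
        feats += [np.std(u), np.sum(u ** 2), np.max(np.abs(u))]
    return np.array(feats)            # nine features per event

def event_interval(mode2, fs=3200, threshold=0.05):
    """Return (start, end) times where |mode 2| exceeds the 0.05 threshold, or None."""
    idx = np.flatnonzero(np.abs(mode2) > threshold)
    if idx.size == 0:
        return None
    return idx[0] / fs, idx[-1] / fs
```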
Fig. 2. A typical sag signal and its decomposed modes (modes 1–3) obtained by VMD analysis.

Fig. 3. A typical swell signal and its decomposed modes (modes 1–3) obtained by VMD analysis.
ST is also used for the analysis of PQ events to extract additional features. ST yields a complete visualization of the signal in both the time and frequency domains. The output of ST is a matrix with complex elements whose rows pertain to frequency and whose columns correspond to time. The maximum elements of the columns and rows yield the magnitude and frequency contours, respectively. The phase contour is obtained by calculating the corresponding phase of the element having the maximum amplitude in each column. The S-contours of the previously illustrated sag and swell are shown in Figs. 4 and 5, respectively. Ten features are extracted using the ST analysis, as follows [13]:

• Feature 10: Standard deviation of the magnitude contour.
• Feature 11: Energy of the magnitude contour.
• Feature 12: Standard deviation of the frequency contour.
• Feature 13: Standard deviation of the phase contour.
• Feature 14: Energy of the frequency contour.
• Features 15–19: Energy of contour levels 1–5.
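A hedged sketch of these ten features follows, assuming `S` is the complex ST output matrix (rows = frequency, columns = time) produced by a hypothetical `stockwell()` helper. The reading of "contour levels 1–5" as five frequency bands of |S| is an assumption made purely for illustration; the paper does not spell out that definition here.

```python
import numpy as np

def st_features(S, n_levels=5):
    """S: complex S-transform matrix, rows = frequency, columns = time."""
    mag = np.abs(S)
    rows = mag.argmax(axis=0)                  # row of the column-wise maximum
    cols = np.arange(S.shape[1])
    magnitude_contour = mag.max(axis=0)        # max amplitude in each column
    frequency_contour = rows.astype(float)     # frequency (row index) of each maximum
    phase_contour = np.angle(S[rows, cols])    # phase of the max-amplitude elements
    feats = [
        np.std(magnitude_contour),             # feature 10
        np.sum(magnitude_contour ** 2),        # feature 11
        np.std(frequency_contour),             # feature 12
        np.std(phase_contour),                 # feature 13
        np.sum(frequency_contour ** 2),        # feature 14
    ]
    # Features 15-19: energies of five "contour levels" (assumed: frequency bands)
    feats += [np.sum(band ** 2) for band in np.array_split(mag, n_levels, axis=0)]
    return np.array(feats)
```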
3.3. Feature selection

In pattern recognition schemes, the elimination of redundant features is a challenging problem. This main stage has been neglected in most research in the field of PQ event detection techniques, although it has important impacts from the following aspects:

• The required memory for data storage decreases, and consequently less computational burden is needed during training.
• The generalization capability of the classifier increases, and hence the probability of convergence difficulties decreases.

Therefore, feature selection is an essential stage that should be considered in intelligent pattern recognition techniques.

There are many feature selection methods, which can be generally classified into two main groups: wrappers and filters. A wrapper method scores a feature subset based on the performance of the forecaster model, which requires a huge computational burden [11,13]. Filter methods use arithmetic analysis of the features without evaluating the forecaster model's performance.

In this step, the most useful subset of features is selected from among the extracted features, because inseparable features cannot be recognized even by powerful classifiers. Feature selection is performed using different filter- and wrapper-based methods: SFS and SBS as wrappers, and GSO as a filter, are examined to evaluate the performance of the proposed method.
Fig. 4. S-contour of the sag signal.

Fig. 5. S-contour of the swell signal.
Table 2
Acquired results considering only extracted features from ST analysis.
Table 3
Acquired results considering only extracted features from VMD analysis.
In the last step, the SVM classifier is used as the classifier core for the detection of PQ events, while the subset of features selected in the preprocessing steps (feature extraction and feature selection) is utilized for the training and testing process. The radial basis function is selected as the kernel function of the SVM, and the penalty factor (C) and the adjustable parameter of the radial basis function (γ) are determined using the improved ant colony optimization algorithm presented in [28]. After the detection of the PQ disturbance type, mode 2 is used for the detection of the start and end points of events.

To investigate the effect of the selected features on the detection accuracy of the proposed method, the features extracted from the ST and VMD analyses are first considered separately. Tables 2 and 3 report the detection accuracy of the proposed method using the extracted features obtained from the ST and VMD analyses. For each feature selection method, the feature subset with the highest accuracy and the lowest feature dimension is shown in bold font. The first nine features are those extracted using VMD, and the remaining ten features, numbered from 10 to 19, are those extracted using ST.
Table 4
SFS method with combination of all extracted features using VMD and ST.

No. of features | Selected features | Accuracy (%)
1 | 12 | 61.88
2 | 10, 12 | 85.77
3 | 3, 10, 12 | 95.88
4 | 3, 10, 12, 14 | 97.44
5 | 3, 5, 10, 12, 14 | 98.77
6 | 3, 5, 10, 12, 13, 14 | 99.11
7 | 3, 5, 8, 10, 12, 13, 14 | 99.33
8 | 3, 5, 8, 10, 12, 13, 14, 16 | 99.33
9 | 3, 5, 6, 8, 10, 12, 13, 14, 16 | 99.33
10 | 3, 5, 6, 8, 10, 12, 13, 14, 16, 17 | 99.22
11 | 3, 5, 6, 8, 9, 10, 12, 13, 14, 16, 17 | 99.11
12 | 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 17 | 99.33
13 | 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17 | 99.44
14 | 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18 | 99.44
15 | 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18 | 99.55
16 | 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19 | 99.22
17 | 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 | 98.66
18 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 | 96.33
19 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 | 96.11

Boldface indicates the highest accuracy together with the lowest feature dimension.
Table 5
SBS method with combination of all extracted features using VMD and ST.

No. of features | Selected features | Accuracy (%)
1 | 3 | 48.88
2 | 3, 11 | 86.44
3 | 3, 9, 11 | 96.00
4 | 3, 9, 10, 11 | 98.00
5 | 3, 9, 10, 11, 14 | 99.00
6 | 3, 9, 10, 11, 12, 14 | 99.22
7 | 3, 7, 9, 10, 11, 12, 14 | 99.33
8 | 3, 7, 9, 10, 11, 12, 13, 14 | 99.44
9 | 2, 3, 7, 9, 10, 11, 12, 13, 14 | 99.55
10 | 2, 3, 5, 7, 9, 10, 11, 12, 13, 14 | 99.55
11 | 2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14 | 99.55
12 | 2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 16 | 99.55
13 | 2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 16, 17 | 99.66
14 | 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17 | 99.44
15 | 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19 | 99.33
16 | 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19 | 99.00
17 | 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19 | 96.66
18 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19 | 96.55
19 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 | 96.11

Boldface indicates the highest accuracy together with the lowest feature dimension.
Table 6
GSO method with combination of all extracted features using VMD and ST.

No. of features | Selected features | Accuracy (%)
1 | 2 | 43.66
2 | 2, 7 | 57.55
3 | 2, 7, 9 | 81.66
4 | 2, 3, 7, 9 | 87.66
5 | 2, 3, 7, 9, 18 | 92.44
6 | 2, 3, 7, 9, 16, 18 | 92.88
7 | 2, 3, 7, 9, 12, 16, 18 | 93.88
8 | 2, 3, 7, 9, 12, 14, 16, 18 | 95.44
9 | 2, 3, 7, 9, 11, 12, 14, 16, 18 | 98.00
10 | 2, 3, 7, 9, 11, 12, 14, 15, 16, 18 | 98.00
11 | 2, 3, 6, 7, 9, 11, 12, 14, 15, 16, 18 | 98.11
12 | 2, 3, 6, 7, 8, 9, 11, 12, 14, 15, 16, 18 | 98.33
13 | 2, 3, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18 | 98.77
14 | 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18 | 96.66
15 | 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18 | 96.77
16 | 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 | 97.00
17 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 | 96.77
18 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 | 96.33
19 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 | 96.11

Boldface indicates the highest accuracy together with the lowest feature dimension.
Table 2 shows that for all the SFS, SBS and GSO methods, the best detection accuracy is 94.33%. Features 3, 4, 5, 6, 7, 8 and 9 appear in all the best solutions, so these features can be considered "must select" features with regard to the classification accuracy. Similarly, the detection accuracy of the proposed method is calculated for the extracted features when VMD by itself is used for the analysis of PQ events. The best detection accuracies are 96.66%, 96.66% and 93.77% for the SFS, SBS and GSO methods, respectively, and features 12, 13, 14 and 17, as dominant features, are repeated in all the best solutions of the different feature selection methods.

The obtained results show that features extracted using a single analysis tool cannot yield acceptable detection accuracy. Thus, in the following investigation, the combination of all features extracted using both the ST and VMD analysis tools is considered as the candidate feature set. Then, by applying the different feature selection methods to the 19-dimension vector of extracted features, the best subsets of features are selected.
Table 7
Percentage of correct classification results under noiseless and different noisy conditions considering three classifiers. Each group of three columns reports the SVM/ANN/KNN accuracies for one condition, ordered from the heaviest noise level (leftmost) to the noiseless case (rightmost).

Class | SVM ANN KNN | SVM ANN KNN | SVM ANN KNN | SVM ANN KNN | SVM ANN KNN
C1 | 100 99 99 | 100 100 100 | 100 100 100 | 100 100 100 | 100 100 100
C2 | 96 93 95 | 97 96 95 | 98 97 96 | 98 98 97 | 99 98 98
C3 | 99 94 95 | 100 95 96 | 100 95 97 | 99 96 97 | 99 97 97
C4 | 100 100 98 | 100 100 99 | 100 100 100 | 100 100 100 | 100 100 100
C5 | 99 96 97 | 99 97 98 | 99 98 99 | 100 99 100 | 100 99 100
C6 | 98 96 98 | 99 97 98 | 99 97 98 | 100 97 99 | 100 98 99
C7 | 99 96 97 | 100 97 98 | 100 98 98 | 100 98 98 | 99 98 99
C8 | 97 93 95 | 97 95 96 | 97 97 97 | 98 97 96 | 100 97 98
C9 | 95 94 95 | 96 95 95 | 97 96 96 | 99 97 97 | 100 98 97
Overall accuracy (%) | 98.11 96.00 96.55 | 98.66 96.88 97.22 | 99.00 97.55 97.88 | 99.33 98.00 98.22 | 99.66 98.33 98.66
Table 8
Performance comparison in terms of percentage of correct classification results.

Ref. | Method | PQ events considered | Accuracy (%), noiseless | Accuracy (%), noisy
[7] | WT + SVM | Pure signal, sag, swell, interruption, harmonics, transient, sag & harmonics, swell & harmonics, flicker | 98.89 | 96.33
[8] | WT + RVM | Pure signal, sag, swell, interruption, harmonics, transient, sag & harmonics, swell & harmonics, flicker | 99.03 | 98.47
[9] | WT + NN | Pure signal, sag, swell, interruption, harmonics, sag & harmonics, swell & harmonics | 95.71 | 89.92
[10] | WT + NN | Frequency capacitor switching (high, low), normal, impulsive transient, sag, interruption | 92.3 | –
[13] | (WT + ST) + FS + PNN | Pure signal, sag, swell, interruption, harmonics, transient, sag & harmonics, swell & harmonics, flicker | 99.22 | 97.44
[17] | HT + RBFN | Normal signal, sag, swell, harmonics, transients, voltage flicker | 97.00 | 94.00
This paper | (ST + VMD) + FS + SVM | Pure signal, sag, swell, interruption, harmonics, transient, sag & harmonics, swell & harmonics, flicker | 99.66 | 98.11
Tables 4–6 present the detection accuracy for different numbers of selected features. By applying SFS, a 15-dimension feature subset yields the best detection accuracy, i.e. 99.55%. As observed from Table 4, the detection accuracy of the proposed method improves as the dimension of the selected feature subset increases, until it reaches its maximum value when the subset feature vector size reaches 15. Thereafter, the detection accuracy decreases until all the extracted features are selected. Similarly, Tables 5 and 6 give the calculated detection accuracies of the SBS and GSO methods for different dimensions of the feature subset. The best selected subsets yield 99.66% and 98.77% detection accuracy for the SBS and GSO feature selection methods, respectively. Features 2, 3, 6, 7 and 9, which are extracted using VMD, appear in all the best subsets. On the other hand, features 10, 11, 12, 14 and 16 are the selected features of the ST analysis that appear in all the best solutions of the different feature selection methods. The results demonstrate that the best performance belongs to the SBS method, with a detection accuracy of 99.66%.

Moreover, the signals in real electric power systems are usually contaminated with noise. To consider noisy conditions, additive white Gaussian noise (AWGN) is added to the PQ events [8,13]. The sensitivity of the proposed algorithm under different noise levels, with three different classifiers, is reported in Table 7.
Fig. 6. Real three-phase sag voltages together with decomposed modes of VMD analysis. (a) Normalized three-phase voltages, (b) mode 1, (c) mode 2, (d) mode 3.