Automatic Recognition of Power Quality Disturbances
The discrete ambiguity plane (AP) of a signal $v[n]$ of length $N$ is

$$A[\theta, \tau] = \sum_{n=0}^{N-1} v[n]\, v^{*}[((n+\tau))_{N}]\, e^{-j 2\pi n \theta / N}$$

where $\theta$ and $\tau$ are the discrete Doppler and lag, respectively, and $((\cdot))_{N}$ denotes modulo-$N$ indexing.
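For illustration, a minimal numpy sketch of this computation; the function name ambiguity_plane is an assumption, not part of the original material:

```python
import numpy as np

def ambiguity_plane(v):
    """Discrete ambiguity plane A[theta, tau] of a signal v.

    A[theta, tau] = sum_n v[n] * conj(v[(n + tau) mod N]) * exp(-j*2*pi*n*theta/N)
    """
    N = len(v)
    A = np.zeros((N, N), dtype=complex)
    for tau in range(N):
        # instantaneous product at circular lag tau:
        # r[n] = v[n] * conj(v[(n + tau) mod N])
        r = v * np.conj(np.roll(v, -tau))
        # the DFT over n yields the Doppler axis theta
        A[:, tau] = np.fft.fft(r)
    return A
```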
Ambiguity Plane
[Figures 8–13: ambiguity planes of the six disturbance classes, plotted on a 160 × 160 Doppler–lag grid]
Figure 8: AP corresponding to harmonics
Figure 9: AP corresponding to capacitor fast switching
Figure 10: AP corresponding to capacitor slow switching
Figure 11: AP corresponding to voltage sudden sag
Figure 12: AP corresponding to voltage gradual decay
Figure 13: AP corresponding to voltage swell
To extract features, class-
dependent kernels need to be
designed for smoothing the
ambiguity plane.
Intuitive Feature Extraction From AP
Each signal's AP is an $n \times n$ matrix of points:

$$\mathrm{AP} = \begin{pmatrix} ap_{11} & ap_{12} & \cdots & ap_{1,n-1} & ap_{1n} \\ ap_{21} & ap_{22} & \cdots & ap_{2,n-1} & ap_{2n} \\ ap_{31} & ap_{32} & \cdots & ap_{3,n-1} & ap_{3n} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ ap_{n1} & ap_{n2} & \cdots & ap_{n,n-1} & ap_{nn} \end{pmatrix}$$

A small subset of these points is selected to serve as features.
How to Choose Feature Points from the AP?
Design a class-dependent TFR from the AP
Features for a pattern recognition task should:
Maximize the separability of signals from different classes
Maximize the similarity of signals from the same class
Therefore, select those points whose:
Between-class variances are largest
Within-class variances are smallest
Design the Kernel
Fisher's discriminant kernel (FDK) scores each AP point by the ratio of between-class to within-class scatter:

$$\Phi_{\mathrm{FDK}}[\theta,\tau] = \frac{\sum_{i=1}^{C} \sum_{j=1}^{C} w_{ij}\, \big| \bar{A}^{(i)}[\theta,\tau] - \bar{A}^{(j)}[\theta,\tau] \big|^{2}}{\sum_{c=1}^{C} \big( \sigma_{c}[\theta,\tau] \big)^{2}}$$

where $\bar{A}^{(i)}$ is the mean AP of class $i$, $\sigma_{c}^{2}[\theta,\tau]$ is the within-class variance of class $c$ at that point, and $w_{ij}$ are class-pair weights.
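A minimal sketch of this scoring over training APs; the function name and the uniform choice $w_{ij}=1$ are assumptions for illustration:

```python
import numpy as np

def fisher_discriminant_kernel(aps_by_class):
    """Pointwise Fisher discriminant over the ambiguity plane.

    aps_by_class: list of arrays, one per class, each of shape
    (num_signals, N, N) holding |AP| values of the training signals.
    Returns an (N, N) score map; large values mark good feature points.
    """
    means = [a.mean(axis=0) for a in aps_by_class]      # class-mean APs
    variances = [a.var(axis=0) for a in aps_by_class]   # within-class variance
    C = len(aps_by_class)
    between = np.zeros_like(means[0])
    for i in range(C):
        for j in range(C):
            between += np.abs(means[i] - means[j]) ** 2  # w_ij = 1 assumed
    within = np.sum(variances, axis=0)
    return between / (within + 1e-12)                    # avoid divide-by-zero
```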
The selected points form the feature vector $f = (f_{1}, f_{2}, \ldots, f_{12})$.
Overview of Algorithm II
[Flowchart]
1. V/I waveforms
2. 2nd-level wavelet decomposition and reconstruction
3. Test: WTCs > T? (for the first 3 levels of WTCs)
   no → not PQ disturbances
   yes → continue
4. 9-level multiresolution signal decomposition (MSD)
5. Moving average filter
6. Feature extraction from the MSD matrix
7. Classification by neural networks → types of PQ disturbances
Wavelet Analysis
A mathematical tool for signal analysis
A wavelet is a short-duration wave that grows and decays essentially within a limited time period
[Figures: Morlet mother wavelet (real part); Daubechies 4 mother wavelet]
Wavelet coefficients tell us how the weighted averages of certain other functions vary from one averaging period to the next
Why Use Wavelets?
Fourier Analysis
Assumes periodic time functions
Wide bandwidth for short-term transients
Does not capture frequencies that evolve in time
Suffers from certain annoying anomalies
Gibbs phenomenon
Aliasing (with the FFT)
FFT computation complexity: O(N log₂ N)
Wavelet Analysis
Choose desirable frequency and time characteristics
Use short windows at high frequencies and long windows at low frequencies
Basis functions employ time compression or dilation
Freedom in the choice of mother wavelet
WT computation complexity: O(N)
Wavelet Transform
A family of wavelet and scaling functions is generated by dilating and shifting the mother wavelet $\psi(t)$ and the scaling function $\phi(t)$:

$$\psi_{j,k}(t) = 2^{j/2}\, \psi(2^{j} t - k), \qquad \phi_{j,k}(t) = 2^{j/2}\, \phi(2^{j} t - k), \qquad j, k \in \mathbb{Z}$$

The wavelet and scaling coefficients are calculated from the following inner products:

$$w_{j,k} = \langle x(t), \psi_{j,k}(t) \rangle = \int x(t)\, 2^{j/2}\, \psi^{*}(2^{j} t - k)\, dt$$

$$v_{j,k} = \langle x(t), \phi_{j,k}(t) \rangle = \int x(t)\, 2^{j/2}\, \phi^{*}(2^{j} t - k)\, dt$$
Wavelet MSD
The WT decomposes a signal into different scales with multiple levels of resolution by dilating a single mother wavelet
It decomposes the signal into its detailed and smoothed versions
[Filter bank] At each level, the current smoothed coefficients are convolved with a low-pass filter h(n) and a high-pass filter g(n), each followed by downsampling by 2:

c₀(n) → [h(n), ↓2] → c₁(n) → [h(n), ↓2] → c₂(n) → ⋯
c₀(n) → [g(n), ↓2] → d₁(n);  c₁(n) → [g(n), ↓2] → d₂(n)
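A minimal sketch of the 9-level MSD using the PyWavelets package; the function name msd_matrix and the db4 wavelet choice are assumptions for illustration, not the authors' implementation:

```python
import pywt

def msd_matrix(x, wavelet="db4", levels=9):
    """9-level multiresolution signal decomposition (MSD).

    Returns the detail rows d_1..d_M and the final smoothed row c_M,
    mirroring the h(n)/g(n) filter bank above (pywt performs the
    convolution and downsampling internally).
    """
    coeffs = pywt.wavedec(x, wavelet, level=levels)
    c_M, details = coeffs[0], coeffs[1:]       # details ordered d_M .. d_1
    rows = {"c%d" % levels: c_M}
    for j, d in zip(range(levels, 0, -1), details):
        rows["d%d" % j] = d
    return rows
```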
Wavelet MSD
Conventional Methods
Use WTCs in a certain row: loses information on the other scales
Use energy values $\|d_{i}\|^{2}$ of all rows: their weights are not equal
The MSD matrix consists of the rows $c_{0}, d_{1}, d_{2}, \ldots, d_{M}, c_{M}$.
Wavelet MSD
Basic ideas:
Divide the WTCs in the MSD matrix into disjoint clusters
Each cluster contributes one feature
More important frequency/scale ranges are assigned a larger number of clusters
WTCs producing more features thus carry more weight in classification
One possible result of grouping clusters (24 features) is shown in the table below.
Scale m | WTCs in MSD matrix
1 | {d_{1,1}, d_{1,2}, d_{1,3}, ..., d_{1,259}}
2 | {d_{2,1}, d_{2,2}, d_{2,3}, ..., d_{2,133}}
3 | {d_{3,1}, d_{3,2}, d_{3,3}, ..., d_{3,70}}
4 | {d_{4,1}, d_{4,2}, ..., d_{4,6}} {d_{4,7}, d_{4,8}, ..., d_{4,38}}
5 | {d_{5,1}, d_{5,2}, ..., d_{5,22}}
6 | {d_{6,1}, d_{6,2}, ..., d_{6,9}} {d_{6,10}, d_{6,11}, ..., d_{6,14}}
7 | {d_{7,1}, ..., d_{7,3}} {d_{7,4}, d_{7,5}} {d_{7,6}} {d_{7,7}, ..., d_{7,10}}
8 | {d_{8,1}, d_{8,2}, d_{8,3}} {d_{8,4}, d_{8,5}} {d_{8,6}} {d_{8,7}, d_{8,8}}
9 | {d_{9,1}, d_{9,2}, d_{9,3}} {d_{9,4}} {d_{9,5}} {d_{9,6}, d_{9,7}}
9 | {d_{9,1}, d_{9,2}, d_{9,3}, d_{9,4}} {d_{9,5}} {d_{9,6}} {d_{9,7}}
Wavelet MSD
Cluster Determination
Clusters are determined from a set of training signals
Peaks of the wavelet coefficients indicate the occurrences of PQ events
Coefficients at the same position in different MSD matrices are treated as independent random variables
Clusters are constructed around the high peaks of the wavelet coefficients
Feature Extraction
Divide the MSD matrix according to the cluster pattern
Determine the $d$ features $u_{1}, \ldots, u_{d}$, one per cluster; each feature is the energy of the coefficients in its cluster:

$$u_{i} = \|v_{i}\|^{2}$$

where $v_{i}$ is the vector of WTCs falling in cluster $U_{i}$.
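A minimal sketch of this step under that reading; the helper name cluster_features and the (row, start, stop) encoding of the cluster pattern are illustrative assumptions:

```python
import numpy as np

def cluster_features(msd_rows, cluster_pattern):
    """Turn the MSD matrix into one energy feature per cluster.

    msd_rows: dict of row name -> coefficient array (e.g. from msd_matrix).
    cluster_pattern: list of (row_name, start, stop) index ranges, e.g.
    [("d4", 0, 6), ("d4", 6, 38), ...] following the grouping table above.
    Each feature u_i is the squared norm (energy) of its cluster.
    """
    return np.array([np.sum(msd_rows[row][start:stop] ** 2)
                     for row, start, stop in cluster_pattern])
```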
Wavelet MSD
[Figure: clustering a row vector in the MSD matrix into Clusters 1–4]
Classifier: Neural Networks
Input: signal signatures from feature extractors
Output: identified class type
[Figure: feed-forward network; input patterns enter the input layer, pass through weighted connections to the hidden layer, and the output layer produces the decision]
Neuronal Model
Inputs $X_{1}, \ldots, X_{N}$ are weighted by $W_{1}, \ldots, W_{N}$, summed, and passed through the activation function:

$$s = \sum_{n=1}^{N} W_{n} X_{n}, \qquad y = T(s)$$

Activation function: bipolar sigmoid, $y = T(s)$
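The slides do not spell out $T(s)$; a common bipolar sigmoid is $2/(1+e^{-s}) - 1$ (equivalently $\tanh(s/2)$), which is the assumption in this minimal sketch:

```python
import numpy as np

def bipolar_sigmoid(s):
    """Bipolar sigmoid T(s) = 2 / (1 + exp(-s)) - 1, with range (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-s)) - 1.0

def neuron_output(x, w):
    """y = T(s) with s = sum_n W_n * X_n."""
    return bipolar_sigmoid(np.dot(w, x))
```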
Basic Idea of Error Backpropagation
Minimum sum-squared error methodology: minimize

$$E = \frac{1}{2} \sum_{q=1}^{Q} \| t^{(q)} - z^{(q)} \|^{2}$$

by driving $\partial E(w, u) / \partial w_{nm} = 0$ and $\partial E(w, u) / \partial u_{mj} = 0$.

Gradient descent: starting from $w(0)$, follow the negative slope of the error surface,

$$w(1) = w(0) - \eta \left. \frac{df(w)}{dw} \right|_{w(0)}$$

where $\eta$ is the learning rate.
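A minimal sketch of this update rule; the function gradient_descent and the quadratic example are illustrative, not taken from the slides:

```python
def gradient_descent(df_dw, w0, eta=0.1, steps=100):
    """Iterate w <- w - eta * df/dw, the update shown above.

    df_dw: callable returning the gradient of the error at w.
    eta: learning rate.
    """
    w = w0
    for _ in range(steps):
        w = w - eta * df_dw(w)
    return w

# Example: minimize f(w) = (w - 3)^2, so df/dw = 2 * (w - 3)
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)  # converges toward 3
```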
Testing of Algorithm I
Results of testing the classification method (6-class case); the "Cn" columns give the percentage mistaken as class n:

Class tested                            | Correct | C1 | C2 | C3 | C4 | C5 | C6
1. Harmonics                            | 100%    | -  | 0% | 0% | 0% | 0% | 0%
2. Capacitor fast switching transients  | 100%    | 0% | -  | 0% | 0% | 0% | 0%
3. Capacitor slow switching transients  | 94%     | 0% | 6% | -  | 0% | 0% | 0%
4. Voltage sudden sag                   | 92%     | 0% | 0% | 1% | -  | 7% | 0%
5. Voltage gradual decay sag            | 93%     | 0% | 0% | 7% | 0% | -  | 0%
6. Voltage swell                        | 100%    | 0% | 0% | 0% | 0% | 0% | -
Demo of PQ Event Recognition System
PSCAD/EMTDC
Visual power system simulator
Developed by the Manitoba HVDC Research Centre
Simulates electromagnetic transients for DC and AC systems
PSCAD is the user interface; EMTDC is the simulation engine
Similar to EMTP and ATP, but faster
Complete Circuit: AC System
AC System (Continued)
Dranetz BMI
Power Platform 4300
Specifications:
Sampling frequency: 7 kHz
Update rates: once per second (harmonic-based parameters updated every 5 seconds)
Voltage: 10–600 Vrms
Frequency: fundamental range 16–450 Hz
Current: depends on the current probes used (TR-2520: 300–3000 A RMS)
Partnerships
Signal databases are being built with possible
help from
R.W. Beck
Bonneville Power Administration
SRP (Salt River Project)
University of Washington Physical Plant
American Public Power Association
Classifier: HMM (Hidden Markov Model)
Initially introduced in the late 1960s and early 1970s
An extension of the Markov process
Utilized extensively in a wide range of applications:
Pattern recognition, especially speech recognition
Biological signal processing, e.g., gene prediction in DNA
Artificial intelligence and image understanding
Possible advantages as a classifier:
Very good classification performance
Very competitive learning speed
Requires a small number of training examples
Classifier: HMM
A discrete Markov process (first order) satisfies

$$P[q_{t} = S_{j} \mid q_{t-1} = S_{i}, q_{t-2} = S_{k}, \ldots] = P[q_{t} = S_{j} \mid q_{t-1} = S_{i}] = a_{ij}$$

If the current state is known, past states are useless for predicting future states.

Example (weather model):
State 1: rain; State 2: cloudy; State 3: sunny (these labels play the role of B)
State transition matrix:

$$A = \{a_{ij}\} = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$

Given that day 1 (t = 1) is sunny, state 3 (this is π), what is the probability that the next 7 days will be sun-sun-rain-rain-sun-cloudy-sun? (this sequence is O)
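As a sanity check, a short sketch computing the probability of that 7-day sequence under the matrix A above; the 0/1/2 state indexing for rain/cloudy/sunny is an implementation choice:

```python
import numpy as np

A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])   # a_ij = P(next = j | current = i)

# states: 0 = rain, 1 = cloudy, 2 = sunny
# path: sunny (given), then sun, sun, rain, rain, sun, cloudy, sun
path = [2, 2, 2, 0, 0, 2, 1, 2]

p = 1.0                           # day 1 is sunny with certainty (pi)
for i, j in zip(path[:-1], path[1:]):
    p *= A[i, j]                  # chain rule under the Markov property
print(p)                          # 0.8*0.8*0.1*0.4*0.3*0.1*0.2 = 1.536e-4
```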
Classifier: HMM
Coin toss model (an HMM):
We can only see the results; we don't know what's going on behind them
Observation sequence: O = H H T T T H T T H H
Underlying state transition model: matrix A = {a_{ij}}
Probability model between observations and states: B = {b_{j}(k)}
Elements of an HMM:
N, the number of states
A and B
π, the initial state distribution
Complete HMM model: λ = (A, B, π)
Classifier: HMM
An HMM poses three questions:
Evaluation: P(O | λ). Given the model, what is the probability of the observation sequence? Solved by the forward-backward procedure.
Decoding: the hidden state transition sequence. Given O and λ, what is the best explanation of the observations? Solved by the Viterbi algorithm.
Learning: λ = argmax_λ P(O | λ). Given O, how do we adjust λ = (A, B, π) to approach the real model? Solved by the Expectation-Maximization (EM) algorithm.
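A compact sketch of the forward procedure for the evaluation problem, assuming discrete observations; the vectorized form is an implementation choice:

```python
import numpy as np

def forward(O, A, B, pi):
    """Forward procedure: P(O | lambda) for a discrete-observation HMM.

    O: sequence of observation indices; A: (N, N) transition matrix;
    B: (N, M) emission probabilities; pi: (N,) initial distribution.
    """
    alpha = pi * B[:, O[0]]               # initialization
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]     # induction step
    return alpha.sum()                    # termination: sum over final states
```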
Classifier: HMM
A tree structure is constructed over the wavelet coefficients (a hidden Markov tree, HMT):
[Figure: hidden Markov tree over the wavelet coefficients]
Black circles: hidden states
White circles: observations
State transition matrix A links the hidden states
Probability model (B) for the wavelet coefficients: a two-component Gaussian mixture model (states S1 and S2)
Classifier: HMM
The pdf of a wavelet coefficient $W_{j,k}$ is a mixture over its hidden state $S_{j,k}$:

$$f_{W_{j,k}}(w) = \sum_{i=1}^{L} p_{S_{j,k}}(i)\, f_{W_{j,k} \mid S_{j,k}}(w \mid S_{j,k} = i), \qquad L = 2 \text{ here}$$

With this, the wavelet-based HMM $\lambda = (A, B, \pi)$ is constructed.
How do we calculate (train) the HMM? The EM algorithm:
Select an initial model $\lambda^{0}$, set $m = 0$
E step: compute $p(S \mid w, \lambda^{m})$
M step: set $\lambda^{m+1} = \arg\max_{\lambda} E_{S}[\ln f(w, S \mid \lambda) \mid w, \lambda^{m}]$
Set $m = m + 1$; if converged, stop; else go to the E step
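A simplified sketch of the E/M alternation for the two-state mixture above, treating the coefficients as independent zero-mean Gaussians; the full HMT ties states across the tree, and that coupling is omitted here:

```python
import numpy as np

def gmm_em(w, iters=50):
    """EM for the two-state Gaussian mixture above (L = 2).

    w: 1-D array of wavelet coefficients, modeled as zero-mean.
    Returns the mixture weights p(i) and state variances.
    """
    # initial model (m = 0): equal weights, split variances
    p = np.array([0.5, 0.5])
    var = np.array([np.var(w) * 0.1, np.var(w) * 2.0])
    for _ in range(iters):
        # E step: posterior p(S = i | w, lambda_m) for zero-mean Gaussians
        lik = p * np.exp(-w[:, None] ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M step: re-estimate lambda_{m+1} from the expected memberships
        p = resp.mean(axis=0)
        var = (resp * w[:, None] ** 2).sum(axis=0) / resp.sum(axis=0)
    return p, var
```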
Classifier: HMM
Maximum-likelihood classification:
Comparing two classes: class $m$ wins if

$$\frac{f(W \mid \lambda_{m})}{f(W \mid \lambda_{n})} > 1$$

Many-class classification (finding the shortest distance of signal likelihoods):

$$C = \arg\min_{m} \frac{1}{J} \sum_{j=1}^{J} \big| \ln f(S_{d}, W_{d} \mid \lambda_{T}) - \ln f(S_{d}, W_{d} \mid \lambda_{m}) \big|$$
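In log-likelihood form the two-class ratio test reduces to picking the larger $\ln f(W \mid \lambda)$, which generalizes naturally to many classes; a minimal sketch of that decision rule, with an assumed dict-based interface:

```python
def classify(log_likelihoods):
    """Pick the class whose trained HMM assigns the observations the
    highest log-likelihood ln f(W | lambda_m).

    log_likelihoods: dict mapping class name -> ln f(W | lambda_m),
    e.g. produced by running the forward procedure per class model.
    """
    return max(log_likelihoods, key=log_likelihoods.get)

# Two-class case: classify({"sag": -41.2, "swell": -44.7}) returns "sag",
# which is equivalent to the likelihood-ratio test above exceeding 1.
```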