State Estimation

This document discusses linear least squares estimation for state estimation in electric power grids. It begins with an introduction to state estimation and its motivation to analyze the power network based on current measurements. It then presents an example circuit with four measurements and sets up the linear least squares problem to estimate the unknown state variables (node voltages and voltage source). The goal is to minimize the sum of squared errors between the actual and estimated measurements. Taking the gradient of the error function leads to the normal equations, whose solution provides the linear least squares estimate of the state.

Uploaded by

Fengxing Zhu

1.0 Introduction
State estimation for electric transmission grids was first formulated as a weighted least-squares problem by Fred Schweppe and his research group [1] in 1969. (Schweppe also developed spot pricing, the precursor of the modern-day locational marginal prices (LMPs) that are a central feature of electricity markets.) Figure 0 below shows Dr. Schweppe (seated in the chair).

Fig. 0
The basic motivation for state estimation is that we want to perform computer analysis of the network under the conditions characterized by the current set of measurements. Specifically, we want to know the values of the bus voltage phasor magnitudes and angles, $|V_k|$ and $\theta_k$, for all buses $k=1,\dots,N$ in the network. (We assume $\theta_1=0$, so we do not need to find that one.)
We begin with linear least squares estimation.
2.0 Linear least squares estimation
The material in this section closely follows that
in [2, chapter 2].
Consider the circuit given in Fig. 1 below, where current injections I1 and I2 and voltage E are unknown. Let R1=R2=R3=1.0 Ω. The measurements are as follows:
meter A1: i1,2=1.0 Ampere
meter A2: i3,1=-3.2 Ampere
meter A3: i2,3=0.8 Ampere
meter V: e=1.1 volt

The problem is to determine the state of the circuit, which in this case is the nodal voltages v1 and v2 and the voltage e across the voltage source.
[Fig. 1: circuit with nodes 1, 2, and 3; resistors R1, R2, R3; a voltage source with voltmeter V; ammeters A1, A2, A3; current injections I1 and I2]
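Let's write each of the measured currents in terms of the node voltages (node 3 is the reference, so its voltage is 0); we may also write down our one voltage measurement.
$$i^m_{1,2} = \frac{v_1 - v_2 - e}{1} = v_1 - v_2 - e = 1.0 \qquad (1)$$
$$i^m_{3,1} = \frac{0 - v_1}{1} = -v_1 = -3.2 \qquad (2)$$
$$i^m_{2,3} = \frac{v_2 - 0}{1} = v_2 = 0.8 \qquad (3)$$
$$e = 1.1 \qquad (4)$$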
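Expressing all of the above in matrix form:
$$\begin{bmatrix} 1 & -1 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ e \end{bmatrix} = \begin{bmatrix} 1.0 \\ -3.2 \\ 0.8 \\ 1.1 \end{bmatrix} \qquad (5)$$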

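Let's denote the terms in eq. (5) as A, x, and b, so that:
$$Ax = b \qquad (6a)$$
where
$$A = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} v_1 \\ v_2 \\ e \end{bmatrix}, \quad b = \begin{bmatrix} 1.0 \\ -3.2 \\ 0.8 \\ 1.1 \end{bmatrix} \qquad (6b)$$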

How do we solve eq. (5)?

Observe that the multiplying matrix is not square, i.e., there are 4 rows but only 3 columns. This is because there are 4 equations but only 3 variables, which means that the system of equations defined by eq. (5) is over-determined. This is a standard feature in state estimation. Since there is one equation for each measurement, the implication is that we will always attempt to obtain as many measurements as we can.

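In general, no single x satisfies all four equations of eq. (5) exactly, but there is a single solution that is normally thought of as best. This solution is the one that minimizes the sum of the squared errors between what should be computed by each equation, which is
$$b = \begin{bmatrix} 1.0 \\ -3.2 \\ 0.8 \\ 1.1 \end{bmatrix} \qquad (7)$$
and what is computed by each equation, which is
$$Ax = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ e \end{bmatrix} \qquad (8)$$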

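The difference, or error, is then:
$$b - Ax = \begin{bmatrix} 1.0 \\ -3.2 \\ 0.8 \\ 1.1 \end{bmatrix} - \begin{bmatrix} 1 & -1 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ e \end{bmatrix} = \begin{bmatrix} 1.0 - v_1 + v_2 + e \\ -3.2 + v_1 \\ 0.8 - v_2 \\ 1.1 - e \end{bmatrix} \qquad (9)$$
The squared errors are then
$$(1.0 - v_1 + v_2 + e)^2, \quad (-3.2 + v_1)^2, \quad (0.8 - v_2)^2, \quad (1.1 - e)^2$$
and the sum of the squared errors is:
$$(1.0 - v_1 + v_2 + e)^2 + (-3.2 + v_1)^2 + (0.8 - v_2)^2 + (1.1 - e)^2$$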

Careful tracking of the previous expression will indicate that it can be written as
$$(b - Ax)^T (b - Ax)$$
Let's multiply the above by 1/2 and give it a name:
$$J = \frac{1}{2}(b - Ax)^T (b - Ax) \qquad (10)$$
Our problem is then to choose x so as to minimize J. (This is an unconstrained minimization problem.) We can do this by setting the gradient of J with respect to x to zero and solving for x.
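To do this, we expand J as follows:
$$J = \frac{1}{2}(b - Ax)^T (b - Ax) = \frac{1}{2}\left(b^T - (Ax)^T\right)(b - Ax) = \frac{1}{2}\left[b^T b - (Ax)^T b - b^T A x + (Ax)^T A x\right] \qquad (11)$$
Using $(Ax)^T = x^T A^T$, we have:
$$J = \frac{1}{2}\left[b^T b - x^T A^T b - b^T A x + x^T A^T A x\right] \qquad (12a)$$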
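Consider the second and third terms in (12a). Using a 2×2 matrix to illustrate,
$$x^T A^T b = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = b_1 x_1 A_{11} + b_1 x_2 A_{12} + b_2 x_1 A_{21} + b_2 x_2 A_{22}$$
$$b^T A x = \begin{bmatrix} b_1 & b_2 \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = b_1 x_1 A_{11} + b_2 x_1 A_{21} + b_1 x_2 A_{12} + b_2 x_2 A_{22}$$
we observe that these terms are equal. (More generally, $x^T A^T b$ is a scalar, so it equals its own transpose, $(x^T A^T b)^T = b^T A x$.) Therefore, (12a) becomes
$$J = \frac{1}{2}\left[b^T b - 2 b^T A x + x^T A^T A x\right] \qquad (12b)$$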
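To remind us all about gradients, we recall that the gradient is given by (13):
$$\nabla_x J = \begin{bmatrix} \partial J/\partial x_1 \\ \partial J/\partial x_2 \\ \vdots \\ \partial J/\partial x_n \end{bmatrix} \qquad (13)$$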

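Now (12b) is written in compact notation, and it may not be obvious how to differentiate each term in it. To assist with this, I provide the following relations:
$$\begin{array}{c|c|c} \# & \text{Function} & \text{Gradient} \\ \hline 1 & F = x^T b & \nabla_x F = b \\ 2 & F = b^T x & \nabla_x F = b \\ 3 & F = x^T A u & \nabla_x F = A u \\ 4 & F = u^T A x & \nabla_x F = A^T u \\ 5 & F = x^T A x & \nabla_x F = (A + A^T)\,x \;\; (= 2A^T x = 2Ax \text{ if } A \text{ is symmetric}) \end{array}$$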
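The gradient relations can be spot-checked numerically. The sketch below is pure Python; the helper names `matvec`, `transpose`, `dot`, and `num_grad` are illustrative and not from the source. It compares central-difference gradients with relations #4 and #5 for a deliberately non-symmetric A, and shows that the general gradient of $x^TAx$ is $(A+A^T)x$, which coincides with $2A^Tx$ only when A is symmetric (as $A^TA$ always is).

```python
def matvec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def num_grad(F, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function F at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((F(xp) - F(xm)) / (2 * h))
    return g

A = [[1.0, 2.0], [0.5, -3.0]]   # deliberately non-symmetric
u = [0.7, -1.2]
x = [0.3, 1.5]

# Relation #4: F = u^T A x has gradient A^T u.
g4 = num_grad(lambda v: dot(u, matvec(A, v)), x)
exact4 = matvec(transpose(A), u)

# Relation #5 for a general A: F = x^T A x has gradient (A + A^T) x.
g5 = num_grad(lambda v: dot(v, matvec(A, v)), x)
exact5 = [p + q for p, q in zip(matvec(A, x), matvec(transpose(A), x))]
twice_At_x = [2 * q for q in matvec(transpose(A), x)]   # 2 A^T x: valid only for symmetric A

print(g4, exact4)
print(g5, exact5, twice_At_x)
```

Because every function here is at most quadratic, the central differences agree with the exact gradients up to rounding error, which makes the mismatch with `twice_At_x` easy to see.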

The above gradient relations apply to (12b) as illustrated in (12c):
$$J = \frac{1}{2}\Big[b^T b - 2\underbrace{b^T A x}_{\#4} + \underbrace{x^T (A^T A)\, x}_{\#5}\Big] \qquad (12c)$$

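Using the appropriate relations in the above table (#4 for the second term, with $u=b$, and #5 for the third term, with $A^TA$ in place of A), the gradient of (12c) can be expressed as:
$$\nabla_x J = \frac{1}{2}\left[-2A^T b + 2(A^T A)^T x\right] \qquad (14)$$
Using the transpose rule $(MN)^T = N^T M^T$ (the same rule that gives $(Ax)^T = x^T A^T$), the second term is $2(A^T A)^T x = 2A^T A x$, so (14) becomes
$$\nabla_x J = \frac{1}{2}\left[-2A^T b + 2A^T A x\right] = -A^T b + A^T A x \qquad (15)$$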

The minimum of J is obtained when
$$\nabla_x J = -A^T b + A^T A x = 0 \qquad (16)$$
And this implies that
$$A^T b = A^T A x \qquad (17)$$

Note:
- Equation (17) is referred to in statistics as the normal equations.
- We could have obtained (17) by just multiplying Ax=b through by $A^T$.
- If A is m×n (m measurements, n states), then $A^TA$ multiplies an n×m matrix by an m×n matrix to get an n×n matrix: square!
- $(A^TA)^T = A^T(A^T)^T = A^TA$, so $A^TA$ is its own transpose, i.e., $A^TA$ is symmetric.
- Reference [3, p. 157] shows that if A has linearly independent columns, then $A^TA$ is invertible.
Solving eq. (17) for x results in
$$x = (A^T A)^{-1} A^T b \qquad (18)$$
Define the gain matrix G as
$$G = A^T A \qquad (19)$$
Also define the pseudo-inverse of A as
$$A^I = (A^T A)^{-1} A^T = G^{-1} A^T \qquad (20)$$
Now we can find the answer to our problem as:
$$x = (A^T A)^{-1} A^T b = G^{-1} A^T b = A^I b \qquad (21)$$
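First, the gain matrix is given as
$$G = A^T A = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & 1 \\ -1 & 1 & 2 \end{bmatrix}$$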

The inverse of the gain matrix is then found from Matlab as
$$G^{-1} = \frac{1}{4}\begin{bmatrix} 3 & 1 & 1 \\ 1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}$$

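The pseudo-inverse is then
$$A^I = G^{-1}A^T = \frac{1}{4}\begin{bmatrix} 3 & 1 & 1 \\ 1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}\begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 1 & -3 & 1 & 1 \\ -1 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{bmatrix}$$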

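Then we can obtain the least squares estimate of the 3 states from the 4 measurements as
$$x = A^I b = \frac{1}{4}\begin{bmatrix} 1 & -3 & 1 & 1 \\ -1 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{bmatrix}\begin{bmatrix} 1.0 \\ -3.2 \\ 0.8 \\ 1.1 \end{bmatrix} = \begin{bmatrix} 3.125 \\ 0.875 \\ 1.175 \end{bmatrix}$$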
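As a cross-check of the numbers above, the following pure-Python sketch forms $G = A^TA$ and solves the normal equations (17) by Gaussian elimination with partial pivoting, rather than forming $G^{-1}$ explicitly as was done here in Matlab. The helper names `transpose`, `matmul`, and `solve` are illustrative; A and b hold the values of eq. (5).

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix product P Q for lists of lists."""
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def solve(M, y):
    """Solve M z = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [y[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                           # back substitution
        s = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

# Measurement model of eq. (5): rows are meters A1, A2, A3, V; columns are v1, v2, e.
A = [[1, -1, -1],
     [-1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
b = [1.0, -3.2, 0.8, 1.1]

At = transpose(A)
G = matmul(At, A)                                        # gain matrix, eq. (19)
Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]
x = solve(G, Atb)                                        # normal equations, eq. (17)
print(G)   # [[2, -1, -1], [-1, 2, 1], [-1, 1, 2]]
print(x)   # approximately [3.125, 0.875, 1.175]
```

Solving $Gx = A^Tb$ directly is generally preferred numerically over computing $G^{-1}$; for this 3×3 example both give the same estimate.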

It is also interesting at this point to look at the difference between the measurements we actually had, which is b, and the values corresponding to those measurements that we would compute using the state vector x, which is Ax. This difference is referred to as the residual, r, and is given by eq. (9) as
$$r = b - Ax = \begin{bmatrix} 1.0 \\ -3.2 \\ 0.8 \\ 1.1 \end{bmatrix} - \begin{bmatrix} 1 & -1 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 3.125 \\ 0.875 \\ 1.175 \end{bmatrix} = \begin{bmatrix} 1.0 - 1.075 \\ -3.2 - (-3.125) \\ 0.8 - 0.875 \\ 1.1 - 1.175 \end{bmatrix} = \begin{bmatrix} -0.075 \\ -0.075 \\ -0.075 \\ -0.075 \end{bmatrix}$$
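The residual can be checked the same way. This short pure-Python sketch (variable names are illustrative) evaluates $r = b - Ax$ at the estimate, the cost J of eq. (10), and the gradient of eq. (15), which can be rewritten as $-A^Tb + A^TAx = -A^Tr$; at the least squares solution that gradient should vanish.

```python
A = [[1, -1, -1], [-1, 0, 0], [0, 1, 0], [0, 0, 1]]
b = [1.0, -3.2, 0.8, 1.1]
x = [3.125, 0.875, 1.175]                     # least squares estimate found above

Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
r = [bi - yi for bi, yi in zip(b, Ax)]        # residual r = b - Ax
J = 0.5 * sum(ri * ri for ri in r)            # cost of eq. (10) at the estimate
At = [list(col) for col in zip(*A)]
grad = [-sum(a * ri for a, ri in zip(row, r)) for row in At]   # -A^T r, eq. (15)

print(r)      # each entry is approximately -0.075
print(J)      # approximately 0.01125
print(grad)   # approximately [0, 0, 0]: the normal equations hold
```

A side observation that follows from the numbers: every column of A sums to zero, so $A^T[1,1,1,1]^T = 0$. Since A has rank 3, that vector spans the null space of $A^T$, and the residual, which must satisfy $A^Tr = 0$, is a multiple of it. That is why all four residuals are equal.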

3.0 A motivating system and some basics


Consider Fig. 2, where we obtain measurements on the indicated quantities.

[Fig. 2: three-bus network (buses 1, 2, 3) with measurements V1 at bus 1, P12 and Q12 on line 1-2, and P23 and Q23 on line 2-3]
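We denote the measured quantities as follows:
$$z_1 = V_1, \quad z_2 = P_{12}, \quad z_3 = Q_{12}, \quad z_4 = P_{23}, \quad z_5 = Q_{23} \qquad (22)$$
Then we can write
$$z_i = \bar{z}_i + \eta_i \qquad (23)$$
where
$z_i$ is the measured value,
$\bar{z}_i$ is the true value (unknown),
$\eta_i$ is the error (unknown).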

Not knowing $\bar{z}_i$ and $\eta_i$ is a problem. However, we may obtain statistical information from calibration curves (error as a function of measurement) of measuring instruments. It is usually assumed that $\eta_i$ is a random variable with a normal (Gaussian) distribution having zero mean, as illustrated in Fig. 3.

[Fig. 3: zero-mean normal (Gaussian) probability density function f(η_i)]

This makes sense because a particular measuring instrument, if it is reasonably calibrated, may read a little high (positive error) at times or a little low (negative error) at times, so that the average error is zero. Calibration curves enable determination of the variance $\sigma_i^2$.
Recall the expectation operator, which we denote as E(·) (i.e., the expected value of ·). It is defined as:
$$E(x) = \int_{-\infty}^{\infty} x f(x)\, dx \qquad (24)$$
which gives the mean value of a variable x described by the probability density function f(x).
We also define variance as:
$$\operatorname{var}(x) = \sigma_x^2 = \int_{-\infty}^{\infty} \left(x - E(x)\right)^2 f(x)\, dx \qquad (25)$$

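We relate variance to mean beginning with (25), using $\int_{-\infty}^{\infty} f(x)\,dx = 1$:
$$\begin{aligned} \operatorname{var}(x) = \sigma_x^2 &= \int_{-\infty}^{\infty} \left(x - E(x)\right)^2 f(x)\, dx \\ &= \int_{-\infty}^{\infty} \left[x^2 - 2xE(x) + (E(x))^2\right] f(x)\, dx \\ &= \int_{-\infty}^{\infty} x^2 f(x)\, dx - 2E(x)\int_{-\infty}^{\infty} x f(x)\, dx + (E(x))^2 \int_{-\infty}^{\infty} f(x)\, dx \\ &= E(x^2) - 2(E(x))^2 + (E(x))^2 \\ &= E(x^2) - (E(x))^2 \end{aligned} \qquad (26)$$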

From eq. (26) (which is true for any random variable x), we see that if the mean is 0 (E(x)=0), then the last term in eq. (26) is 0 and
$$\sigma_x^2 = E(x^2) \qquad (27)$$
Regarding the calibration error, characterized by the random variable $\eta_i$, we then have:
$$E(\eta_i) = 0 \quad \text{(zero mean)}$$
$$E(\eta_i^2) = \sigma_i^2 \quad \text{(variance)}$$
Note that the larger the variance, the less accurate the measuring device.

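Since we have multiple measuring instruments, we also need to understand how the statistics of one random variable relate to the statistics of another. The covariance measure is effective in doing this, and is defined as
$$\operatorname{cov}(x, y) = \sigma_{xy}^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left(x - E(x)\right)\left(y - E(y)\right) f(x, y)\, dx\, dy \qquad (28)$$
where f(x,y) is the joint probability density function of x and y.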

Note that variance is a special case of covariance when x=y, i.e., var(x)=cov(x,x). The covariance cov(x,y) is a measure of how two variables change together:
- If cov(x,y)>0, then x tends to increase when y increases.
- If cov(x,y)<0, then x tends to decrease when y increases.
- If x and y are independent, then cov(x,y)=0. The converse does not hold: cov(x,y)=0 does not necessarily imply independence, because covariance reflects only linear dependence. Two variables can be nonlinearly dependent (and therefore not independent) yet have a covariance of 0.

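It can be shown [4], by expanding eq. (28) in the same way that eq. (26) expands eq. (25), that
$$\sigma_{xy}^2 = \operatorname{cov}(x, y) = E(xy) - E(x)E(y) \qquad (29)$$
If x and y are independent, then $E(xy)=E(x)E(y)$. Therefore, for two independent random variables, the covariance is:
$$\sigma_{xy}^2 = \operatorname{cov}(x, y) = 0$$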
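A standard example of the caveat that zero covariance need not mean independence: let x take the values -1, 0, 1 with equal probability and let y = x². The sketch below (hypothetical data, exact arithmetic via the standard `fractions` module) evaluates eq. (29) and then shows that the independence condition P(x=a, y=b) = P(x=a)P(y=b) fails.

```python
from fractions import Fraction

# x is -1, 0 or 1 with probability 1/3 each; y = x**2 is fully determined by x.
pairs = [(-1, 1), (0, 0), (1, 1)]
p = Fraction(1, 3)

E_x = sum(p * x for x, _ in pairs)
E_y = sum(p * y for _, y in pairs)
E_xy = sum(p * x * y for x, y in pairs)
cov = E_xy - E_x * E_y            # eq. (29)

# Independence would need P(x=0, y=1) == P(x=0) * P(y=1); it does not hold.
P_x0_y1 = sum(p for x, y in pairs if x == 0 and y == 1)
P_x0 = sum(p for x, _ in pairs if x == 0)
P_y1 = sum(p for _, y in pairs if y == 1)
print(cov, P_x0_y1, P_x0 * P_y1)   # 0 0 2/9
```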

Now, back to our state estimation problem.

A basic assumption: The errors $\eta_i$ and $\eta_j$ for any two measuring instruments i and j are independent. This means that
$$\operatorname{cov}(\eta_i, \eta_j) = \begin{cases} 0, & i \ne j \\ \sigma_i^2, & i = j \end{cases} \qquad (30)$$

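Thus, we can define a covariance matrix R, where the element in position (i,j) is cov(η_i, η_j). Given (30), the matrix will appear as:
$$R = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_m^2 \end{bmatrix} \qquad (31)$$
where it is assumed that we have m measuring instruments.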
We will use eq. (31) in our development.
4.0 Problem for AC State Estimator
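We define the state vector for an N-bus network as:
$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_{N-1} \\ x_N \\ \vdots \\ x_{2N-1} \end{bmatrix} = \begin{bmatrix} \theta_2 \\ \vdots \\ \theta_N \\ V_1 \\ \vdots \\ V_N \end{bmatrix} \qquad (32)$$
where $V_k$ here denotes the voltage magnitude at bus k. We will have n=2N-1 states.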
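A minimal sketch of the ordering in eq. (32), assuming bus 1 is the reference bus with θ1 = 0 and that per-bus values are stored in 0-indexed Python lists (index k-1 holds bus k). The function names `pack_state` and `unpack_state` are hypothetical.

```python
def pack_state(theta, V):
    """Return x = [theta_2, ..., theta_N, V_1, ..., V_N] as in eq. (32).

    theta[k-1] and V[k-1] hold the angle (radians) and voltage magnitude of bus k.
    theta[0] belongs to reference bus 1 (assumed 0), so it is not a state.
    """
    if len(theta) != len(V):
        raise ValueError("theta and V need one entry per bus")
    return list(theta[1:]) + list(V)

def unpack_state(x, N):
    """Inverse of pack_state: per-bus angles (with theta_1 = 0 reinserted) and magnitudes."""
    if len(x) != 2 * N - 1:
        raise ValueError("an N-bus network has n = 2N - 1 states")
    theta = [0.0] + list(x[:N - 1])
    V = list(x[N - 1:])
    return theta, V

# Example with N = 3 buses: n = 2N - 1 = 5 states.
x = pack_state([0.0, -0.05, -0.10], [1.02, 1.00, 0.98])
print(x)                    # [-0.05, -0.1, 1.02, 1.0, 0.98]
print(unpack_state(x, 3))   # ([0.0, -0.05, -0.1], [1.02, 1.0, 0.98])
```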


For each parameter for which we have a measurement, we want to write an equation in terms of the states. In other words, if we have a measurement $z_i$, then the true value of that measurement will be
$$\bar{z}_i = h_i(x) \qquad (33)$$
For a voltage measurement, the function $h_i$ is very simple:
$$\bar{z}_i = V_k \qquad (34)$$
where measurement i occurs at bus k.


For MW and MVAR flows, the function $h_i$ is given by the expressions for power flow across a line from bus p to bus q. These are given by
$$P_{pq} = V_p^2 g_{pq} - V_p V_q g_{pq}\cos(\theta_p - \theta_q) - V_p V_q b_{pq}\sin(\theta_p - \theta_q) \qquad (35)$$
$$Q_{pq} = -V_p^2 (b_{pq} + b_p) - V_p V_q g_{pq}\sin(\theta_p - \theta_q) + V_p V_q b_{pq}\cos(\theta_p - \theta_q) \qquad (36)$$
where the line has
- series admittance $g_{pq} + jb_{pq}$, with $g_{pq}>0$ and $b_{pq}<0$ for an inductive line;
- shunt susceptance $b_p$ at bus p (which includes any reactive shunt at the bus plus half of the line charging). If capacitive, then $b_p>0$.
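Eqs. (35) and (36) translate directly into code. In this sketch, `line_flow` and `line_flow_complex` are hypothetical helpers (angles in radians, quantities in per unit). As a consistency check, `line_flow_complex` recomputes the same flow from complex power, $S_{pq} = \tilde V_p \tilde I_{pq}^*$ with $\tilde I_{pq} = (\tilde V_p - \tilde V_q)(g_{pq} + jb_{pq}) + jb_p\tilde V_p$, which is assumed here to be the π-model current behind the two equations.

```python
import math

def line_flow(Vp, Vq, thp, thq, g, b, bp=0.0):
    """P_pq and Q_pq from eqs. (35)-(36).

    Vp, Vq: voltage magnitudes (pu); thp, thq: angles (rad);
    g + jb: series admittance of line p-q; bp: shunt susceptance at bus p.
    """
    d = thp - thq
    P = Vp**2 * g - Vp * Vq * g * math.cos(d) - Vp * Vq * b * math.sin(d)
    Q = -Vp**2 * (b + bp) - Vp * Vq * g * math.sin(d) + Vp * Vq * b * math.cos(d)
    return P, Q

def line_flow_complex(Vp, Vq, thp, thq, g, b, bp=0.0):
    """Same flow via S = Vp * conj(I_pq), with I_pq = (Vp - Vq)(g + jb) + j*bp*Vp."""
    vp = complex(Vp * math.cos(thp), Vp * math.sin(thp))
    vq = complex(Vq * math.cos(thq), Vq * math.sin(thq))
    i_pq = (vp - vq) * complex(g, b) + 1j * bp * vp
    S = vp * i_pq.conjugate()
    return S.real, S.imag

args = (1.02, 0.98, 0.05, -0.03, 2.0, -8.0, 0.1)   # hypothetical operating point
print(line_flow(*args))
print(line_flow_complex(*args))   # should agree to rounding error
```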
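Now define some vectors:
$$\text{Measured values: } z = \begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix} \qquad (37)$$
$$\text{True values: } \bar{z} = \begin{bmatrix} \bar{z}_1 \\ \vdots \\ \bar{z}_m \end{bmatrix} \qquad (38)$$
$$\text{Errors: } \eta = \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_m \end{bmatrix} \qquad (39)$$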

Generalizing eq. (23), we have:
$$z = \bar{z} + \eta \qquad (40)$$
From (33), we also have, for a vector of functions expressing the measurement values in terms of the states:
$$\bar{z} = h(x) \qquad (41)$$
Substituting eq. (41) into (40), we have:
$$z = h(x) + \eta \qquad (42)$$
Now consider what we have here. The number of unknowns is n=2N-1 (the states in x, which are the angle and voltage variables), and we have some number of measurements m. Let's assume that m>n, i.e., that we have more measurements than states.

One thing we could do is
- set η=0, so that z=h(x);
- choose m=n equations (each corresponding to a measurement);
- solve for x (it would need a nonlinear solver, but once done, the solution is unique).

However, the tough question would be: Which measurements should we choose to keep? Which are the best?

Since we do not know which measurements are the best, we instead make sure that we have more measurements than states, i.e., we will solve the problem for m>n=2N-1.

So our strategy is, as we saw in our earlier example, to choose x so as to minimize the sum of the squared errors between the measured values z and the true values h(x).
From eq. (42), the error is
$$\eta = z - h(x) \qquad (43)$$
Similar to eq. (10), we can then express the sum of squared errors as
$$\tilde{J} = \frac{1}{2}\sum_{i=1}^{m} \eta_i^2 = \frac{1}{2}\eta^T \eta = \frac{1}{2}\left[z - h(x)\right]^T\left[z - h(x)\right] \qquad (44)$$
We denote the above as $\tilde{J}$ (rather than J) because we will find it convenient to modify it a bit.

By minimizing $\tilde{J}$, we are effectively choosing the x that best fits the measurements. Remember, however, that some measurement devices are better than others (which is a different statement than "some measurements are better than others"). It is reasonable, then, to place more weight on the better measuring devices.

A good choice for this weight is $1/\sigma_i^2$, since
Good device ⇒ small $\sigma_i^2$ ⇒ large $1/\sigma_i^2$
Bad device ⇒ large $\sigma_i^2$ ⇒ small $1/\sigma_i^2$

Therefore, we will modify eq. (44) to be:
$$J = \frac{1}{2}\sum_{i=1}^{m} \frac{\eta_i^2}{\sigma_i^2} \qquad (45)$$

And so the weight $1/\sigma_i^2$ will magnify the error terms for accurate measurements, making the optimization (and the solution) more dependent on those measurements.
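Recall the covariance matrix given by eq. (31), repeated here for convenience:
$$R = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_m^2 \end{bmatrix} \qquad (31)$$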

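Because R is diagonal, its inverse is easy to find:
$$R^{-1} = \begin{bmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_m^2 \end{bmatrix} \qquad (46)$$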

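We can therefore express eq. (45) as:
$$J = \frac{1}{2}\sum_{i=1}^{m}\frac{\eta_i^2}{\sigma_i^2} = \frac{1}{2}\eta^T R^{-1}\eta = \frac{1}{2}\left[z - h(x)\right]^T R^{-1}\left[z - h(x)\right] = \frac{1}{2}\sum_{i=1}^{m}\frac{\left(z_i - h_i(x)\right)^2}{\sigma_i^2} \qquad (47)$$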
The problem then becomes to find x that
minimizes J. Note, however, that h is nonlinear,
and so our solution will necessarily be iterative.
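Because R is diagonal, eq. (47) can be evaluated without forming any matrix. The sketch below uses a hypothetical helper, `weighted_cost`, and illustrates the weighting argument: the same 0.1 mismatch costs 100 times more on a device with σ = 0.01 than on one with σ = 0.1.

```python
def weighted_cost(z, hx, sigma):
    """J = 1/2 * sum_i (z_i - h_i(x))**2 / sigma_i**2, eq. (47).

    z: measured values; hx: h(x) at the current state; sigma: standard deviations.
    The weights 1/sigma_i**2 are the diagonal entries of R^{-1}, eq. (46).
    """
    if not (len(z) == len(hx) == len(sigma)):
        raise ValueError("z, h(x) and sigma must have the same length")
    return 0.5 * sum((zi - hi) ** 2 / s ** 2 for zi, hi, s in zip(z, hx, sigma))

# Same mismatch of 0.1 on an inaccurate and on an accurate device.
print(weighted_cost([1.0], [0.9], [0.1]))    # about 0.5
print(weighted_cost([1.0], [0.9], [0.01]))   # about 50
```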

5.0 Solution for AC State Estimator


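So the problem can be stated as follows: minimize
$$J = \frac{1}{2}\sum_{i=1}^{m}\frac{\eta_i^2}{\sigma_i^2} = \frac{1}{2}\eta^T R^{-1}\eta = \frac{1}{2}\left[z - h(x)\right]^T R^{-1}\left[z - h(x)\right] = \frac{1}{2}\sum_{i=1}^{m}\frac{\left(z_i - h_i(x)\right)^2}{\sigma_i^2} \qquad (48)$$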
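We can apply first-order conditions, which means that all first derivatives of the objective function with respect to the decision variables must be zero, i.e., that $\nabla_x J = 0$. That is,
$$\nabla_x J = \begin{bmatrix} \partial J/\partial x_1 \\ \vdots \\ \partial J/\partial x_n \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (49)$$
For a single element in $\nabla_x J$,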
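we have:
$$J = \frac{1}{2}\sum_{i=1}^{m}\frac{\left(z_i - h_i(x)\right)^2}{\sigma_i^2} \qquad (50)$$
$$\frac{\partial J}{\partial x_1} = \frac{1}{2}\sum_{i=1}^{m}\frac{2\left(z_i - h_i(x)\right)}{\sigma_i^2}\left(-\frac{\partial h_i(x)}{\partial x_1}\right) = -\sum_{i=1}^{m}\frac{z_i - h_i(x)}{\sigma_i^2}\,\frac{\partial h_i(x)}{\partial x_1} \qquad (51)$$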

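This can be written in matrix form as
$$\frac{\partial J}{\partial x_1} = -\begin{bmatrix} \dfrac{\partial h_1(x)}{\partial x_1} & \dfrac{\partial h_2(x)}{\partial x_1} & \cdots & \dfrac{\partial h_m(x)}{\partial x_1} \end{bmatrix}\begin{bmatrix} \left(z_1 - h_1(x)\right)/\sigma_1^2 \\ \left(z_2 - h_2(x)\right)/\sigma_2^2 \\ \vdots \\ \left(z_m - h_m(x)\right)/\sigma_m^2 \end{bmatrix} \qquad (52)$$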

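And we then see how to write the vector of derivatives, according to:
$$\nabla_x J = -\begin{bmatrix} \dfrac{\partial h_1(x)}{\partial x_1} & \dfrac{\partial h_2(x)}{\partial x_1} & \cdots & \dfrac{\partial h_m(x)}{\partial x_1} \\ \dfrac{\partial h_1(x)}{\partial x_2} & \dfrac{\partial h_2(x)}{\partial x_2} & \cdots & \dfrac{\partial h_m(x)}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial h_1(x)}{\partial x_n} & \dfrac{\partial h_2(x)}{\partial x_n} & \cdots & \dfrac{\partial h_m(x)}{\partial x_n} \end{bmatrix}\begin{bmatrix} \left(z_1 - h_1(x)\right)/\sigma_1^2 \\ \left(z_2 - h_2(x)\right)/\sigma_2^2 \\ \vdots \\ \left(z_m - h_m(x)\right)/\sigma_m^2 \end{bmatrix} \qquad (53)$$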
We recognize the matrix of partial derivatives in (53) as a sort of Jacobian matrix, but
(a) it is n × m, i.e., it is not square, and
(b) unlike a standard Jacobian, here the rows vary with the variable ($x_1, x_2, \dots$), not with the function ($h_1, h_2, \dots$).
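Let's define a matrix H that does not have the second attribute (b), i.e.,
$$H = \begin{bmatrix} \dfrac{\partial h_1(x)}{\partial x_1} & \dfrac{\partial h_1(x)}{\partial x_2} & \cdots & \dfrac{\partial h_1(x)}{\partial x_n} \\ \dfrac{\partial h_2(x)}{\partial x_1} & \dfrac{\partial h_2(x)}{\partial x_2} & \cdots & \dfrac{\partial h_2(x)}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial h_m(x)}{\partial x_1} & \dfrac{\partial h_m(x)}{\partial x_2} & \cdots & \dfrac{\partial h_m(x)}{\partial x_n} \end{bmatrix} \qquad (54)$$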
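In the text, H is formed from analytic derivatives of eqs. (34) to (36). When checking hand-derived entries, a finite-difference approximation is a convenient cross-check; the sketch below uses that substitute technique, not the method of the text, and `numerical_jacobian` is a hypothetical name. For the linear circuit of Section 2, h(x) = Ax, so the result should reproduce A.

```python
def numerical_jacobian(h, x, step=1e-6):
    """Central-difference estimate of H = dh/dx (m x n), laid out as in eq. (54).

    h maps a list of n states to a list of m measurement values;
    entry [i][j] approximates dh_i/dx_j.
    """
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        hp, hm = h(xp), h(xm)
        cols.append([(a - c) / (2 * step) for a, c in zip(hp, hm)])
    return [list(row) for row in zip(*cols)]   # transpose columns into m x n rows

# Linear circuit of Section 2: h(x) = A x, so H should equal A.
A = [[1, -1, -1], [-1, 0, 0], [0, 1, 0], [0, 0, 1]]
h = lambda x: [sum(a * xj for a, xj in zip(row, x)) for row in A]
H = numerical_jacobian(h, [3.125, 0.875, 1.175])
print([[round(v, 6) for v in row] for row in H])
```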

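Note that H is m × n; apart from the sign, it is the transpose of the matrix of partial derivatives in eq. (53). Then we see that the optimality condition can be written as:
$$\nabla_x J = \begin{bmatrix} \partial J/\partial x_1 \\ \vdots \\ \partial J/\partial x_n \end{bmatrix} = -H^T(x)\, R^{-1}\left[z - h(x)\right] = 0 \qquad (55)$$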

The solution to eq. (55) will yield the estimated state vector x, which minimizes the weighted squared error. Because there are n elements in the vector of partial derivatives of J on the left, eq. (55) gives n equations. Since there are n variables in x, it is possible to solve eq. (55) for x, although not explicitly, since the equations are nonlinear in x.

Now we need to determine a solution procedure for eq. (55). To do so, let's define the left-hand side of eq. (55) as G(x), i.e.,
$$G(x) = -H^T(x)\, R^{-1}\left[z - h(x)\right] = 0 \qquad (56)$$

Perform a Taylor series expansion of G(x) around a point $x_0$:
$$G(x_0 + \Delta x) = G(x_0) + \left.\frac{\partial G(x)}{\partial x}\right|_{x_0}\Delta x + \text{h.o.t.} = 0 \qquad (57)$$
Note that eq. (57) indicates that if $x_0 + \Delta x$ is to be a solution, then the right-hand side of eq. (57) must be zero.
Recall that in a Taylor series expansion, the higher-order terms contain products of $\Delta x$, so if $\Delta x$ is relatively small, terms containing products of $\Delta x$ will be very small and, in fact, negligible. So we will neglect the h.o.t. in eq. (57). This results in:
$$G(x_0 + \Delta x) \approx G(x_0) + \left.\frac{\partial G(x)}{\partial x}\right|_{x_0}\Delta x = 0 \qquad (58)$$
Since G is nonlinear, eq. (58) is only an approximation, so we must resort to an iterative algorithm to solve G(x)=0. We will use a Newton-type algorithm.

Let's assume that we can make a pretty good guess at the solution, i.e., that the difference between our guess and the real solution is relatively small.

Denote this guess as $x^{(k)}$. Because it is not the solution, $G(x^{(k)}) \ne 0$. So we want a better guess. Denote the better guess as $x^{(k+1)}$. The difference between the old guess $x^{(k)}$ and the new guess $x^{(k+1)}$ is $\Delta x^{(k)}$, i.e.,
$$\Delta x^{(k)} = x^{(k+1)} - x^{(k)} \qquad (59)$$
Or we can write
$$x^{(k+1)} = x^{(k)} + \Delta x^{(k)} \qquad (60)$$
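Evaluating G at the better guess, we have
$$G(x^{(k+1)}) = G(x^{(k)} + \Delta x^{(k)}) \approx G(x^{(k)}) + \left.\frac{\partial G(x)}{\partial x}\right|_{x^{(k)}}\Delta x^{(k)} \qquad (61)$$
We desire that $G(x^{(k+1)}) = 0$. Under this desired condition, eq. (61) becomes:
$$G(x^{(k+1)}) = G(x^{(k)} + \Delta x^{(k)}) \approx G(x^{(k)}) + \left.\frac{\partial G(x)}{\partial x}\right|_{x^{(k)}}\Delta x^{(k)} = 0 \qquad (62)$$
Solving for $\left.\frac{\partial G(x)}{\partial x}\right|_{x^{(k)}}\Delta x^{(k)}$, we have:
$$\left.\frac{\partial G(x)}{\partial x}\right|_{x^{(k)}}\Delta x^{(k)} = -G(x^{(k)}) \qquad (63)$$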
In considering eq. (63), we already understand the right-hand side: it is just the negative of eq. (56), evaluated at $x^{(k)}$, i.e.,
$$-G(x^{(k)}) = H^T(x^{(k)})\, R^{-1}\left[z - h(x^{(k)})\right] \qquad (64)$$
There are n functional expressions in eq. (64).

But what is $\partial G(x)/\partial x$, evaluated at $x^{(k)}$? It is the set of derivatives, with respect to each of the n state variables, of each of the n functional expressions in eq. (56):
$$G(x) = -H^T(x)\, R^{-1}\left[z - h(x)\right] = 0 \qquad (56)$$
Since there are n functional expressions and n derivatives to take for each one, we can see that $\partial G(x)/\partial x$ will be n×n, a square matrix.

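Remembering eq. (55),
$$\nabla_x J = \begin{bmatrix} \partial J/\partial x_1 \\ \vdots \\ \partial J/\partial x_n \end{bmatrix} = -H^T(x)\, R^{-1}\left[z - h(x)\right] = 0 \qquad (55)$$
reminds us that the elements of G(x) are themselves derivatives of J with respect to each of the n state variables. Therefore, the elements of $\partial G(x)/\partial x$ are second derivatives of J with respect to the state variables.

Can we obtain a form for these second derivatives? Let's start from eq. (56):
$$G(x) = -H^T(x)\, R^{-1}\left[z - h(x)\right] \qquad (56)$$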
So what we want is:
$$\left.\frac{\partial G(x)}{\partial x}\right|_{x^{(k)}} = \frac{\partial}{\partial x}\left\{-H^T(x)\, R^{-1}\left[z - h(x)\right]\right\} \qquad (65)$$

The differentiation of what is inside the braces of eq. (65) is formidable. We will make it easier for ourselves by assuming that H(x) is a constant matrix. This implies $H(x + \Delta x) \approx H(x)$, i.e., the derivatives of the power flow equations do not change.
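Remember, from eq. (54),
$$H = \frac{\partial h(x)}{\partial x} = \begin{bmatrix} \dfrac{\partial h_1(x)}{\partial x_1} & \dfrac{\partial h_1(x)}{\partial x_2} & \cdots & \dfrac{\partial h_1(x)}{\partial x_n} \\ \dfrac{\partial h_2(x)}{\partial x_1} & \dfrac{\partial h_2(x)}{\partial x_2} & \cdots & \dfrac{\partial h_2(x)}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial h_m(x)}{\partial x_1} & \dfrac{\partial h_m(x)}{\partial x_2} & \cdots & \dfrac{\partial h_m(x)}{\partial x_n} \end{bmatrix} \qquad (54)$$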

where the functions $h_i$ are the expressions for the measurements (voltages, real and reactive power flows) in terms of the states (angles and voltage magnitudes). So H is really a power flow Jacobian matrix. It is well known that the power flow Jacobian is relatively insensitive to small variations in state.

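With the above assumption, differentiating the right-hand side of eq. (65) becomes not so bad:
$$\left.\frac{\partial G(x)}{\partial x}\right|_{x^{(k)}} = \frac{\partial}{\partial x}\left\{-H^T(x)\, R^{-1}\left[z - h(x)\right]\right\} = -H^T(x)\, R^{-1}\left(-\frac{\partial h(x)}{\partial x}\right) = H^T(x)\, R^{-1}\frac{\partial h(x)}{\partial x} \qquad (66)$$
But we recognize from eq. (54) the term $\partial h(x)/\partial x$ in eq. (66) as H. Therefore, eq. (66) becomes:
$$\left.\frac{\partial G(x)}{\partial x}\right|_{x^{(k)}} = H^T(x)\, R^{-1} H(x) \qquad (67)$$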
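Making this substitution into eq. (63),
$$\left.\frac{\partial G(x)}{\partial x}\right|_{x^{(k)}}\Delta x^{(k)} = -G(x^{(k)}) \qquad (63)$$
results in:
$$\left[H^T(x^{(k)})\, R^{-1} H(x^{(k)})\right]\Delta x^{(k)} = -G(x^{(k)}) \qquad (68)$$
Finally, replacing the right-hand side of eq. (68) with eq. (64) yields:
$$\left[H^T(x^{(k)})\, R^{-1} H(x^{(k)})\right]\Delta x^{(k)} = H^T(x^{(k)})\, R^{-1}\left[z - h(x^{(k)})\right] \qquad (69)$$
Equation (69) provides a way to solve for $\Delta x^{(k)}$.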
6.0 Solution Algorithm
Given:
- measurements $z = [z_1, \dots, z_m]$
- standard deviations $\sigma_1, \dots, \sigma_m$
- the network

Compute: the state estimate $x = [x_1, \dots, x_n]$ (this is all voltage magnitudes and all voltage angles except the swing bus angle)

1. Form the measurement expressions h(x).
2. Form the derivative expressions $\partial h(x)/\partial x$.
3. Form R.
4. Let k=0. Guess the solution $x^{(0)}$.
5. Compute $H(x^{(k)})$ and $h(x^{(k)})$.
6. Compute $A = H^T(x^{(k)})\, R^{-1} H(x^{(k)})$ and $b = H^T(x^{(k)})\, R^{-1}\left[z - h(x^{(k)})\right]$.
7. Solve $A\,\Delta x = b$ for $\Delta x$.
8. Compute $x^{(k+1)} = x^{(k)} + \Delta x$.
9. If $\max_i |\Delta x_i| > \varepsilon$, then set k=k+1 and go to 5; else stop.
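Homework #10:
For the lossless network shown below, the following data is given:
$z_1 = V_1 = 1.02$, $\sigma_1 = 0.1$
$z_2 = V_2 = 1.0$, $\sigma_2 = 0.1$
$z_3 = P_{12} = 2.0$, $\sigma_3 = 0.05$
$z_4 = Q_{12} = 0.2$, $\sigma_4 = 0.05$
Let
$$x^{(0)} = \begin{bmatrix} x_1^{(0)} \\ x_2^{(0)} \\ x_3^{(0)} \end{bmatrix} = \begin{bmatrix} \theta_2^{(0)} \\ V_1^{(0)} \\ V_2^{(0)} \end{bmatrix} = \begin{bmatrix} 0 \\ 1.02 \\ 1.0 \end{bmatrix}$$
and perform one iteration of the solution procedure to find $x^{(1)}$.

[Figure: two-bus network, Bus 1 and Bus 2 connected by a line labeled -j10]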

References:
[1] F. Schweppe, J. Wildes, and D. Rom, "Power system static state estimation: Parts I, II, and III," Power Industry Computer Conference (PICA), Denver, Colorado, June 1969.
[2] A. Monticelli, State Estimation in Electric Power Systems: A Generalized Approach, Kluwer, Boston, 1999.
[3] G. Strang, Linear Algebra and Its Applications, 3rd ed., Harcourt Brace, 1988.
[4] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1984.
