Multiple-model estimation with variable structure
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 41, NO. 4, APRIL 1996

Abstract: Existing multiple-model (MM) estimation algorithms have a fixed structure, i.e., they use a fixed set of models. An important fact that has been overlooked for a long time is how the performance of these algorithms depends on the set of models used. Limitations of the fixed structure algorithms are addressed first. In particular, it is shown theoretically that the use of too many models is performance-wise as bad as that of too few models, apart from the increase in computation. This paper then presents theoretical results pertaining to the two ways of overcoming these limitations: select/construct a better set of models and/or use a variable set of models. This is in contrast to the existing efforts of developing better implementable fixed structure estimators. Both the optimal MM estimator and practical suboptimal algorithms with variable structure are presented. A graph-theoretic formulation of multiple-model estimation is also given which leads to a systematic treatment of model-set adaptation and opens up new avenues for the study and design of the MM estimation algorithms. The new approach is illustrated in an example of a nonstationary noise identification problem.

Manuscript received July 3, 1992; revised May 19, 1995. Recommended by Associate Editor, P. J. Ramadge. This work was supported by ONR/BMDO under Grant N00014-91-J-1950 and by the NSF under Grant ECS-9496319.
X.-R. Li is with the Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148 USA.
Y. Bar-Shalom is with the Department of Electrical and Systems Engineering, University of Connecticut, Storrs, CT 06268 USA.
Publisher Item Identifier S 0018-9286(96)02828-0.

I. INTRODUCTION

MULTIPLE-MODEL (MM) estimation is a powerful approach to adaptive estimation. It is particularly good for problems involving structural as well as parametric changes. In this approach, a set of models is selected/designed to represent (or cover) the possible system behavior patterns (called system modes), and the overall estimate is obtained by a certain combination of the estimates from the filters running in parallel based on the individual models that match the system modes. This approach was initiated in [26]. The early work did not consider jumps in system modes and led to the nonswitching MM algorithms. In the more recent and more realistic switching MM algorithms, first proposed in [1], the jumping of system modes is modeled by switching between models.

The MM approach is best described in terms of stochastic hybrid systems. A hybrid system is one that can be suitably described in a hybrid space $\mathbb{R}^{n_x} \times S$, the Cartesian product of the continuous-valued base state space $\mathbb{R}^{n_x}$ and a discrete set $S$, the collection of the system modes (modal states) which characterize the behavior patterns of the system. A stochastic hybrid system thus distinguishes itself from conventional systems in its imbedded random jump process which governs the (random) sudden transition of its operational modes (system behavior patterns). Many real-world problems can be successfully formulated in terms of such systems. Typical examples can be found in systems subject to failures/repairs, piecewise linearization of nonlinear systems, target tracking, reconfigurable systems, etc. [17], [18], [10]. Increasing attention has been given recently to hybrid systems due to their wide applicability.

To the authors' knowledge, all of the existing MM estimators developed prior to [21], except [14] and [29], use a fixed set of models, i.e., a fixed structure, at each time. When these algorithms are applied to solve real-world problems, it is often the case that the use of only a small number of models is not good enough. Such situations exist in, for example, failure detection and isolation, where many parts can fail or deteriorate. The use of more models increases the computational burden considerably. More importantly, it does not necessarily improve the performance. In fact, the performance will deteriorate if too many models are used due to the excessive "competition" from the "unnecessary" (excess) models. Thus, one may face a dilemma: more models have to be used to improve the accuracy, but the use of too many models can degrade the performance, not to mention the increase in computational burden.

To find a way out of this dilemma, this paper introduces the concept of variable structure MM estimation and proposes several variable structure algorithms, in contrast to the existing efforts of developing better implementable fixed structure estimators.

The paper is organized as follows. After defining in Section II the hybrid estimation problem to be considered, Section III briefly describes the fixed structure algorithms and the associated problems. In Section IV, theoretical results pertaining to the effects of the choice of model set are obtained. Section V deals with the theoretical aspects of the variable structure estimators. The MM estimation algorithms are formulated in a new graph-theoretic framework in Section VI for better design and study of both fixed and variable structure algorithms. Then a number of variable structure frameworks for MM estimation are presented in Section VII. Numerical examples of a nonstationary noise identification problem given in Section VIII illustrate the superiority of the proposed variable structure algorithms to the existing fixed structure algorithms. The conclusions are summarized in the last section.

II. PROBLEM STATEMENT OF THE HYBRID ESTIMATION

Consider a stochastic hybrid system
with (possibly state-dependent) Markovian transition of the system mode

$$P\{m_j(k+1) \mid m_i(k), x(k)\} = \phi[k, m_i, m_j, x(k)] \quad \forall m_i, m_j \in S \tag{2}$$

and the mode-dependent measurements of the base state

$$z(k) = h[k, x(k), m(k)] + w[k, m(k), x(k)] \tag{3}$$

where $x$ is the base state vector; $z$ is the noisy measurement vector; $m(k)$ is the modal state (system mode index) at time $k$, which denotes the mode in effect during the sampling period $k$; $P\{\cdot\}$ denotes probability; the event that mode $m_j$ is in effect at time $k$ is denoted as

$$m_j(k) \triangleq \{m(k) = m_j\} \tag{4}$$

$S$ is the set of all modal states at all times; and $v[k, m(k), x(k)]$ and $w[k, m(k), x(k)]$ are the state-dependent (mode-dependent, in particular) process and measurement noise sequences, respectively. It is implied by (3) that the base state measurements are noisy and mode-dependent, and the mode information is imbedded in the measurement sequence. In other words, the system mode sequence is an indirectly observed (and state-dependent) Markov chain¹. The hybrid state $(x, m)$ is sometimes denoted by $\xi$ in this paper.

The problem of hybrid estimation is to estimate the base state and the modal state based on the sequence of the noisy (mode-dependent) measurements.

III. EXISTING MULTIPLE-MODEL ESTIMATION AND ITS LIMITATIONS

An estimation algorithm that uses the same set of models at all times is referred to as a fixed structure or fixed model-set MM algorithm. To the authors' knowledge, all of the MM algorithms prior to [21] belong to this class with [14] and [29] being the only exceptions. Several papers appeared after [21], such as [24]. In the fixed model-set MM approach, a set of models must be determined in advance. Denoting by $M$ this fixed set of models assumed in the algorithm, system (1)-(3) is approximated by one that consists of a set of $N$ conventional models as

$$x(k+1) = f_j[k, x(k)] + g_j[k, x(k), v_j[k, x(k)]] \quad \forall m_j \in M \tag{5}$$

$$z(k) = h_j[k, x(k)] + w_j[k, x(k)] \quad \forall m_j \in M \tag{6}$$

and jumps between the system modes are modeled by switching from one model to another governed by a Markov law similar to (2). Here, $N = |M|$ is the cardinality of $M$, i.e., the number of models used, and the subscript $j$ denotes quantities pertaining to model $m_j$. For example

$$h_j[k, x(k)] \triangleq h[k, x(k), m_j(k)], \qquad w_j[k, x(k)] \triangleq w[k, m_j(k), x(k)]. \tag{7}$$

It is thus clear that the MM approach fits well into problems that can be characterized by structural as well as parametric changes. The existing MM algorithms have a fixed structure in the sense that the set $M$ used in (5) and (6) is time-invariant, even though the models themselves may be time-varying.

Let $z^k = \{z(0), z(1), \ldots, z(k)\}$ be the measurement sequence through time $k$ (or more rigorously, the corresponding $\sigma$-algebra), where $z(0)$ denotes the initial information.

The state estimate and its associated covariance matrix are calculated in a fixed model-set MM algorithm using the minimum mean-square error (MMSE) criterion as follows:

$$\hat{x}(k|k) = \sum_j \hat{x}_j(k|k) P\{H_j(k) \mid z^k\} \tag{8}$$

$$P(k|k) = \sum_j \left\{ P_j(k|k) + [\hat{x}(k|k) - \hat{x}_j(k|k)][\hat{x}(k|k) - \hat{x}_j(k|k)]' \right\} P\{H_j(k) \mid z^k\} \tag{9}$$

where $\hat{x}_j(k|k)$ is the optimal estimate at time $k$ under the hypothesis $H_j(k) = \{$mode history $j$ through time $k$ is the true one$\}$², and $P_j(k|k)$ is the associated covariance. As such, there are $N$ assumed possible modes at each time (since any model in $M$ can be in effect at any time) and thus $N^k$ hypotheses (possible mode histories) at time $k$. Therefore, the full-hypothesis-tree (FHT) estimator in this setting has to use $N^k$ different permutations³ of $N$ models at time $k$, and both summations in (8) and (9) are over all hypotheses. This exponential increase in computation and memory renders the implementation of the FHT algorithms infeasible in real time. This is true even for the simplest stochastic hybrid systems, the so-called (Markovian) "jump-linear" systems [27]. In view of this, existing efforts have been focusing on developing more cost-effective real-time implementable versions of this FHT estimator using certain hypothesis management techniques such as merging "similar" hypotheses and/or pruning "unlikely" hypotheses to keep the number of the remaining hypotheses within a certain limit. As such, the summations in (8) and (9) are over the remaining hypotheses. Different suboptimal MM algorithms are distinct from each other in the criteria and techniques of hypothesis management. The above discussion also holds for algorithms based on other optimality criteria, such as the MAP (maximum a posteriori) estimators.

The problems associated with the fixed structure MM algorithms are closely related to an important fact hardly mentioned in the literature and largely ignored in the MM estimation theory: the performance of an MM estimator depends to a large extent on the model set $M$ used.

If $S$, the set of all (true) system modes, is known and not too large, it is natural to choose $M$ to exactly match this set⁴.

¹ It is known as the hidden (state-dependent) Markov chain in the speech recognition literature [30].
² A mode history through time $k$ is defined as a sequence of modes from the initial time up to and including time $k$, that is, $\{m(i)\}_{i=0}^{k}$.
³ That is, each of the $N$ recursive filters at time $k$ has to run $N^{k-1}$ times, each time with a different previous estimate.
⁴ In this paper, a "mode" refers to the system behavior pattern and a "model" refers to the (possibly reduced order or simplified) representation (or description) on which the estimator is based. Such a distinction is convenient and necessary where the mismatch of the model and mode is a major concern. In other cases, they may be used interchangeably.
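As a minimal sketch of the combination step in (8) and (9), the following pure-Python fragment fuses hypothesis-conditioned estimates into an overall estimate and covariance, and enumerates the mode histories of a fixed model set to show the $N^k$ growth. The function names `mm_fuse` and `mode_histories` and the list-based vector and matrix representation are illustrative assumptions, not part of the paper; the per-hypothesis estimates, covariances, and probabilities are taken as given.

```python
from itertools import product


def mm_fuse(estimates, covariances, probs):
    """Fuse hypothesis-conditioned estimates as in (8) and (9).

    estimates:   list of state vectors x_j (lists of floats)
    covariances: list of covariance matrices P_j (lists of lists)
    probs:       list of hypothesis probabilities P{H_j | z^k}
    """
    n = len(estimates[0])
    # (8): probability-weighted sum of the conditional estimates
    x = [sum(p * xj[i] for p, xj in zip(probs, estimates)) for i in range(n)]
    # (9): conditional covariance plus the spread-of-the-means term
    P = [[0.0] * n for _ in range(n)]
    for p, xj, Pj in zip(probs, estimates, covariances):
        d = [x[i] - xj[i] for i in range(n)]
        for r in range(n):
            for c in range(n):
                P[r][c] += p * (Pj[r][c] + d[r] * d[c])
    return x, P


def mode_histories(models, k):
    """All length-k mode histories over a fixed model set (N**k of them)."""
    return list(product(models, repeat=k))


# Two scalar hypotheses with equal probability (illustrative numbers)
x_hat, P_hat = mm_fuse([[0.0], [2.0]], [[[1.0]], [[1.0]]], [0.5, 0.5])
n_hyp = len(mode_histories(["m1", "m2", "m3"], 4))
```

With three models, four time steps already give 81 histories, which is the growth that hypothesis merging and pruning are meant to contain.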
480 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 41, NO. 4, APRIL 1996
However, since $S$ is usually not known exactly or is too large, a set of models that can "cover" in some sense the possible system modes at any time should be selected or constructed; this is the major task in model design for MM estimation. To have reliable results, at least one of the models in $M$ must be "close"⁵ to the system mode in effect at any time. The existing MM algorithms, which use a fixed set of models, usually perform reasonably well for problems that can be handled with a small set of good models. However, in many practical situations, especially with high-dimensional systems, this requirement is not satisfied. The use of more models in an MM algorithm will increase the computational burden considerably. What is even worse is that, to many people's surprise, increasing the number of models in a fixed structure MM algorithm does not guarantee better performance; rather, it may yield poorer results. This could be true even if an FHT estimator is used. This is a dilemma: additional models should be used to improve the accuracy, but the use of too many models can degrade the performance, let alone the increase in computation. There are two possible ways out of this dilemma: 1) to design a better set of models and 2) to use a variable set of models. These are the two main themes of this paper.

Note that the sequence of the hybrid states $\xi(k) = [x(k), m(k)]$ is Markov, whereas the base state alone is not. It seems better to develop a recursive estimator for $\xi$ directly to eliminate the explosion in computational requirements of the above hypothesis-tree-based approach. This would be really nice if it were true. For simplicity, however, consider a jump-linear system with the linear-Gaussian assumption of the Kalman filter (for each system mode). Suppose that $p[x(k-1) \mid m_j(k-1), z^{k-1}]$ is Gaussian. Then

perfectly. Consequently, the best Gaussian approximation for $p[x(k), m(k) \mid z^k]$ at each time in this "Markovian approach" is equivalent to the Generalized Pseudo-Bayesian algorithm of first order (GPB1) [9]. If a Gaussian approximation for each $p[x(k), m_j(k) \mid z^k]$ is used, the result is equivalent to the GPB2 algorithm.

IV. DESIGN OF THE MODEL SET

Although the importance of the model set on the performance has been hardly mentioned in the literature, it is evident in practice since the primary difficulty in applying the MM estimation theory is to design an appropriate set of models. A major concern is thus the selection of the model set. This can provide a guideline for estimator design. Unfortunately, very limited theoretical results on this important issue are available⁶.

The major reason for the unsatisfactory performance of the existing fixed structure algorithms with a large model set is that many models in this set are so different from the system mode in effect at a particular time that not only is the computation for the filters based on these models a waste, but also the excessive "competition" from the "unnecessary" models degrades performance. To show this degradation theoretically and to obtain better insight into the MM algorithms, the effects of using different model sets are investigated in this section.

For simplicity, a static problem is considered with a measurement of the state based on (3) for a single time index that is omitted here.

Notations: Denote by $\hat{x}_A$ a multiple-model MMSE estimate using an arbitrary model set $A$, that is
Proposition 4.1: Suppose that the optimal model set $S$ and an arbitrary model set $A$ are given. Denote by $J$ the set of common models of $S$ and $A$

$$J = S \cap A \quad \text{(the set of common models)}. \tag{15}$$

Let $L$ be the mismatched models between $S$ and $A$ (i.e., those not common to them), that is, the set of missing models (i.e., those in $S$ but not in $A$) and extra models (i.e., those in $A$ but not in $S$), given by

$$L = (S \cup A) - (S \cap A) = (S - J) \cup (A - J) \quad \text{(symmetric difference)}. \tag{16}$$

Also let

$$\alpha_j = P\{m_j \mid z, A\} \quad \forall m_j \in J \quad \text{(mode probability)} \tag{17}$$

$$\beta_i = P\{m_i \mid z, S\} \quad \forall m_i \in J \quad \text{(mode probability)} \tag{18}$$

$$\hat{x}_j = E[x \mid z, m_j] \quad \text{(mode-conditioned estimate)}. \tag{19}$$

Then the distance between the optimal estimate $\hat{x}_S$ and the estimate $\hat{x}_A$ based on model set $A$ is proportional to the distance between $\hat{x}_S$ and the "equivalent" estimate $\hat{x}_L$ based on the set $L$ of the mismatched models

$$\|\hat{x}_S - \hat{x}_A\|^2 = (1 - b)^2 \|\hat{x}_S - \hat{x}_L\|^2 \tag{20}$$

where $\hat{x}_L$ is the equivalent estimate (referred to set $A$) based on the mutually mismatched models of $S$ and $A$, given by⁷

$$\hat{x}_A = \sum_{m_j \in J} \alpha_j \hat{x}_j + \sum_{m_j \in (A-J)} \alpha_j \hat{x}_j = b\hat{x}_S + \Big( \sum_{m_j \in (A-J)} \alpha_j \hat{x}_j - b \sum_{m_i \in (S-J)} \beta_i \hat{x}_i \Big) = b\hat{x}_S + (1 - b)\hat{x}_L \tag{23}$$

and (20) follows from

$$\hat{x}_S - \hat{x}_A = (1 - b)[\hat{x}_S - \hat{x}_L]. \tag{24}$$

Q.E.D.

⁷ The first summation is the weighted sum of the estimates based on the uncalled-for (extra) models, and the second one is of the estimates based on the missing models as calculated using $S$; it is converted to $A$ by scaling by $b$.
⁸ $b = 1$ if and only if $\hat{x}_A \equiv \hat{x}_S$.

Corollary 4.1: If either $A$ or $S$ is included in the other, then the equivalent estimate using the mismatched models becomes

$$\hat{x}_L = \begin{cases} \dfrac{1}{1-b} \displaystyle\sum_{m_j \in (A-S)} \alpha_j \hat{x}_j & \text{if } A \supset S \text{ (extra models case)} \\[2ex] \dfrac{1}{1 - 1/b} \displaystyle\sum_{m_i \in (S-A)} \beta_i \hat{x}_i & \text{if } A \subset S \text{ (missing models case)} \end{cases} \tag{25}$$

where $b \geq 1$ if $A \subset S$ and $b \leq 1$ if $A \supset S$.

Note that the sum of the probabilities of the mismatched models is $1 - b$ if $A \supseteq S$ or $1 - 1/b$ if $A \subset S$.

Proposition 4.1 and Corollary 4.1 indicate that the use of too many models is as bad as the use of too few models. They also show that the degradation of the estimation is proportional to both the mismatched model probabilities and the distance between the optimal estimate and the equivalent estimate using the mismatched models. An important question thus arises naturally: When do we add (or delete) a certain group of models, denoted as set $C$ below, to improve the performance? The following theorem provides an answer.

Theorem 4.1: Consider two model sets $A$ and $B$. Assume that $A$ is a subset of $B$. Let $C = B - A$. As in (22), the mode probabilities as calculated based on $B$ and on $A$ have the relation

and $\|\hat{x}_S - \hat{x}_B\|^2 = \|\hat{x}_S - \hat{x}_A\|^2$ if and only if the equality in (30) holds.

⁹ $b = 1$ iff $P\{m_j \mid z, B\} = 0$, $\forall m_j \in C$, which is equivalent to $\hat{x}_B = \hat{x}_A$.
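The relation (24) of Proposition 4.1 can be checked numerically with a small scalar example. The sketch below rests on two assumptions that are not legible in this copy of the text: that the mode probabilities of the common models satisfy $\alpha_j = b\,\beta_j$ for all $m_j \in J$, and that the equivalent estimate has the form $\hat{x}_L = \frac{1}{1-b}\big[\sum_{m_j \in (A-J)} \alpha_j \hat{x}_j - b \sum_{m_i \in (S-J)} \beta_i \hat{x}_i\big]$, which is the form consistent with (23), footnote 7, and Corollary 4.1. The function names and all numbers are hypothetical.

```python
def mm_estimate(probs, est):
    """MMSE estimate: probability-weighted sum of mode-conditioned estimates."""
    return sum(p * est[m] for m, p in probs.items())


def equivalent_estimate(alpha, beta, est, b):
    """Assumed form of the equivalent estimate over the mismatched models."""
    common = alpha.keys() & beta.keys()
    # Extra models: in A but not in S, weighted by their probabilities under A
    extra = sum(p * est[m] for m, p in alpha.items() if m not in common)
    # Missing models: in S but not in A, weighted by their probabilities under S
    missing = sum(p * est[m] for m, p in beta.items() if m not in common)
    return (extra - b * missing) / (1.0 - b)


# Mode-conditioned scalar estimates for modes 1..4 (illustrative numbers)
est = {1: 0.0, 2: 1.0, 3: 4.0, 4: 10.0}
beta = {1: 0.2, 2: 0.5, 3: 0.3}                    # optimal set S = {1, 2, 3}
b = 0.75                                           # assumed scaling on J = {2, 3}
alpha = {2: b * 0.5, 3: b * 0.3, 4: 1 - b * 0.8}   # set A = {2, 3, 4}

x_S = mm_estimate(beta, est)
x_A = mm_estimate(alpha, est)
x_L = equivalent_estimate(alpha, beta, est, b)

lhs = x_S - x_A                 # left-hand side of (24)
rhs = (1 - b) * (x_S - x_L)     # right-hand side of (24)
```

Under these assumptions the two sides agree, so the error of the estimate based on $A$ is the error of the equivalent estimate scaled by $1 - b$.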
$$\|\hat{x}_S - \hat{x}_B\|^2 - \|\hat{x}_S - \hat{x}_A\|^2 = [2br\cos\theta + (1 - b)r^2 - (1 + b)] \cdot (1 - b)\|\hat{x}_S - \hat{x}_A\|^2. \tag{32}$$

It is then easy to check that the right-hand side (RHS) of (32) is not positive if and only if (30) holds. Q.E.D.

It can be shown that the region described by (30) is a circle of radius $1/(1 - b)$ centered at $(b/(1 - b), 180°)$ on the plane determined by the two vectors $(\hat{x}_S - \hat{x}_A)$ and $(\hat{x}_S - \hat{x}_C)$.

Theorem 4.1 provides a guideline for model set design. Specifically, it provides a criterion for deciding when the addition or deletion of certain models is beneficial.

The geometric interpretation of Theorem 4.1 is interesting: given $\hat{x}_S$, $\hat{x}_A$, and model set $A$, consider adding a set $C$ of new models to set $A$ ($C$ and $A$ are thus disjoint). The estimate $\hat{x}_C$ can be obtained by (29). Note that $\hat{x}_C$ depends on $b$. Place $\hat{x}_S$ at the origin and $\hat{x}_A$ at $(1, 0)$, meaning that the space has a unit length of $\|\hat{x}_S - \hat{x}_A\|$. Vary $\hat{x}_C$ (i.e., vary $\cos\theta$ and $r$) such that the equality in (30) holds. If $\hat{x}_C$ is confined to the plane determined by the two vectors $(\hat{x}_S - \hat{x}_A)$ and $(\hat{x}_S - \hat{x}_C)$,¹⁰ then the circular loci of $\hat{x}_C$ are shown in Fig. 1 with values of $b$ fixed at 0.5, 0.6, and 0.7, respectively. Clearly, the circular loci become spherical surfaces without the above-mentioned confinement. For a given set $C$, Theorem 4.1 then states that if and only if $\hat{x}_C$ falls inside its corresponding circle (ball), using a model set $B = A \cup C$ (i.e., adding models in $C$ to set $A$) is superior to using $A$ alone in the sense that $\hat{x}_B$ is closer to $\hat{x}_S$ than $\hat{x}_A$ is. As such, Theorem 4.1 is somewhat similar to the unit-circle stability criterion for a discrete-time linear system. In general, for higher dimensional cases, Theorem 4.1 states that the estimate $\hat{x}_A$ can be improved by adding models in $C$ to set $A$ if and only if $\hat{x}_C$ falls inside the corresponding ball depending on the $b$ value of (26) which itself depends on the number of the models in $C$ and the quality of the estimates based on them.

Fig. 1. Illustration of Theorem 4.1. If and only if the estimate based on model set $C$ falls inside the corresponding ball will the use of the set $C$ in addition to set $A$ improve the estimation accuracy.

Fig. 1 is somewhat surprising at first glance in that the improvement ball is larger if $A$ and $B$ match each other better (in the sense of a larger probability of the common modes being the true ones), meaning that it is easier to use the additional models to improve the performance. This can be explained as follows. Since $\hat{x}_S$ is at the origin and $\hat{x}_A$ is located at $(r, 0) = (1, 0)$, $\hat{x}_B$ will be better than $\hat{x}_A$ if $\hat{x}_B$ is in the unit ball. In the case that $A$ and $B$ match each other well, $\hat{x}_B$ will be away from $\hat{x}_S$ in approximately the same direction as $\hat{x}_A$ is, and thus will leave more room in the opposite direction for $\hat{x}_C$ using additional models with a smaller probability $1 - b$ to balance the offset of $\hat{x}_A$ and thus improve the estimation accuracy.

Theorem 4.1 requires knowledge of the optimal estimate $\hat{x}_S$. It would be much more useful if this assumption could be relaxed somehow. Still, the significance of Theorem 4.1 is not as limited as it may appear. An analogy is the tracking in clutter problem for which the Kalman filter is an invaluable tool even though it "unrealistically" assumes the availability of correct measurements. In fact, Theorem 4.1 provides not only insight into the MM estimation but also a theoretical guideline for the model set adaptation required in a variable structure MM estimator. It is also very useful, for instance, in the design and performance evaluation of the MM algorithms and for a comparison between two MM algorithms. Specifically, the (generally time-varying) parameters $r$, $\cos\theta$, and $b$ can be obtained (perhaps by Monte Carlo simulation) for the particular scenario of interest. Theorem 4.1 can be applied to determine at what time one estimator is superior to the other one. It is also possible to use Theorem 4.1 to obtain a new estimator based on two estimators that use disjoint model sets.

It is thus clear that the FHT estimator described in the previous section is optimal if and only if the model set used at any time exactly matches (with probability one) the set of possible system modes at that time, which is often not the case in reality. In view of this, it is inappropriate to say that an FHT estimator (based on an arbitrary model set) is optimal, since it does not provide the performance limit for all practical MM algorithms. Using a better model set (obtained generally in real time), it is possible that a real-time implementable estimator can provide better results. In addition, a fixed structure FHT estimator is clearly not optimal if the set of possible system modes is time-varying, which is in general the case, as shown later.

¹⁰ Or for simplicity, consider that the $\hat{x}_S$, $\hat{x}_A$, and $\hat{x}_C$ are two dimensional.

V. VARIABLE STRUCTURE MULTIPLE-MODEL APPROACH

It is well known that the most powerful (MP) test is the best test for simple hypothesis testing problems in the Neyman-Pearson framework under the assumption of a fixed sample size. For problems of a sequential nature, however,
the sequential probability ratio test (SPRT) is superior to the MP test in the sense of more efficient use of samples; this is the well-known optimality of the SPRT. The major reason for this is the following. Using a fixed sample size, the MP test does not utilize intermediate information in the sense that it does not make any decision until the required sample size is obtained (and at that time, it has to make a decision even if sufficient information is not available), whereas the SPRT allows a variable sample size and makes a decision when and only when sufficient information is obtained.

Drawbacks exist in the fixed structure MM algorithms similar to those in the MP test. The model set $M$, like the sample size of the MP test, has to be determined beforehand based only on the initial information about the possible system modes. Actually, the real-time measurements carry valuable information concerning the system mode being currently in effect; they also provide useful information about the mode set, that is, which modes are "reasonable candidates" to consider for being in effect. In addition, the set of possible system modes depends on the previous state of the system and is thus time-varying. It is therefore reasonable to consider using variable structures. That is, $M$ in (5) and (6) is replaced by a variable set of models to be determined from all (off-line and on-line) information available, in particular, the measurement sequence. In this sense, the variable structure makes it possible to fuse the prior knowledge and the posterior information about the system modes.

Similarly to the superiority of the SPRT to the MP test, an MM estimator with a variable structure may be superior to existing fixed structure schemes in terms of more efficient use of models.

In this section, the theoretical basis for MM estimation with a variable structure is established. Variable structure estimation is clearly a generalization of the fixed structure estimation in the sense that the former includes the latter as a special case.

Variable structure MM estimation is based on the following important facts that have been long overlooked. 1) The FHT estimator is not optimal if the model set used does not exactly match the true system mode set at any time, as shown before; 2) the measurement sequence carries valuable information concerning the state and the system mode set in effect; and 3) the set of possible system modes at any time depends, in general, on the previous state of the system. This state dependency of the mode set arises from the fact that a particular system mode can, in general, jump to only certain system modes that correspond to nonzero transition probabilities.

The state-dependent system mode set at time $k + 1$ with respect to (w.r.t.) the previous hybrid state $\xi(k) = (x(k), m(k))$ is defined as

$$S_{\xi(k)}(k+1) = \{m(k+1) : P\{m(k+1) \mid \xi(k)\} > 0\} \tag{33}$$

where $P\{m(k+1) \mid \xi(k)\}$ was defined by (2). The mode-dependent system mode set w.r.t. mode $m(k)$ is defined as

$$S_{m(k)}(k+1) = \{m(k+1) : P\{m(k+1) \mid m(k), x(k)\} > 0 \text{ for some } x(k)\}. \tag{34}$$

Note that $S_{\xi(k)}(k+1)$ is a subset of $S_{m(k)}(k+1)$.

Notations: Let $\mathcal{S} = \{S_1, S_2, \ldots, S_N\}$ be the family of all distinct state-dependent system mode sets. As such, the set of system modes $S(k)$ at any time $k$ is a member of $\mathcal{S}$. As before, let $S$ be the set of all possible system modes, i.e., the union of $S(t)$, $t = 0, 1, \ldots$. Let $S^k$ (or $m^k$) be a sequence of the system mode sets (or system modes) through $k$.

In the sequel, $S^k$ and $m^k$ are also used to denote the model(-set) sequence that exactly matches the system mode(-set) sequence, respectively.

In view of the state dependency of the system mode sets, it is meaningless to consider the following sequence:

$$\ldots, S(t-1), S_{m_j}(t), \ldots \quad \text{with } m_j \notin S(t-1) \tag{35}$$

because $S_{m_j}(t)$ will never be a true mode set¹¹ at time $t$ if $S(t-1)$ is the true one at time $t - 1$. In other words, a true mode sequence will never be a member sequence of (35). Thus, we introduce the following definitions.

Definitions: A (finite) sequence¹² of mode (model) sets $S^k \triangleq \{S(0), S(1), \ldots, S(k)\}$ is said to be admissible if $S(t)$ is a state-dependent mode (model) set w.r.t. one or more members of the previous mode (model) set $S(t-1)$ for all $1 \leq t \leq k$. Similarly, a mode (model) sequence $m^k \triangleq \{m(0), m(1), \ldots, m(k)\}$ is said to be admissible if, for each $1 \leq t \leq k$, there is an $x(t-1)$ such that $P\{m(t) \mid m(t-1), x(t-1)\} \neq 0$. Loosely speaking, a mode(-set) sequence is admissible if and only if it is a possible true mode(-set) sequence. Clearly, the admissibility defined here concerns the feasibility of the evolution of the system mode (set). More about the admissibility will be given in Section VI.

¹¹ A mode set (sequence) is said to be a true one if it includes the true mode (sequence) in it.
¹² The term "sequence" instead of the more rigorous term "ordered k-tuple" is used in this paper to keep the terminology as simple as possible (but not simpler).

Optimal MM Estimator: The optimal MM estimator in the MMSE sense with a variable structure is given by

$$\hat{x}(k|k) = E[E[x(k) \mid S^k, z^k] \mid z^k] = \sum_{S^k} \hat{x}_{S^k}(k|k) P\{S^k \mid z^k\} \tag{36}$$

$$P(k|k) = \sum_{S^k} \left[ P_{S^k}(k|k) + [\hat{x}(k|k) - \hat{x}_{S^k}(k|k)][\hat{x}(k|k) - \hat{x}_{S^k}(k|k)]' \right] P\{S^k \mid z^k\} \tag{37}$$

where $z^k$ is the measurement sequence through time $k$, and $\hat{x}_{S^k}(k|k)$ and $P_{S^k}(k|k)$ are, respectively, the optimal estimate and its covariance at time $k$ assuming that the true mode-set sequence is $S^k$. They can be obtained in a similar way as in (8) and (9) but with a variable model set. Specifically, that is

$$\hat{x}_{S^k}(k|k) \triangleq E[E[x(k) \mid m^k, S^k, z^k] \mid S^k, z^k] = \sum_{m^k \in S^k} \hat{x}_{m^k}(k|k) P\{m^k \mid S^k, z^k\} \tag{38}$$

$$P_{S^k}(k|k) = \sum_{m^k \in S^k} \left[ P_{m^k}(k|k) + [\hat{x}_{S^k}(k|k) - \hat{x}_{m^k}(k|k)][\hat{x}_{S^k}(k|k) - \hat{x}_{m^k}(k|k)]' \right] P\{m^k \mid S^k, z^k\} \tag{39}$$
where 2 m k ( k l k ) is the optimal estimate at time k assuming The recursion for the probability of the mode-set sequence
that the true mode sequence is m' and P,. ( k l k ) is the in (36) and (37) can be obtained by Bayes' formula as
associated covariance.
P { S k I z k } = P { S ( k ) ,S"-llz(k), 2-1)
Proof: Equations (36) and (38) follow straightforwardly
1
from the smoothing property of the expectation operation [9]. = - p [ S ( k ) , z ( k )1sk-1, zk-l]P{Sk-l }'".I
The mixture covariance matrices (37) and (39) can be proven
1
similarly as in [9], even though the proof given there is in a = - p [ z ( k ) I S k , z"']P{S(k) I s k - 1 , 2-1)
C
somewhat different setting. Q.E.D.
Remarks: The summations in (36) and (37) are over all . P(S"-1I2k--l } (42)
the admissible mode-set sequences rather than all possible where c = p [ z (k ) I Z " ~ ] is a normalization constant.
mode sequences. This is a key difference between the variable The first term on the RHS of (42) reflects the information
structure and the fixed structure. To appreciate this better, about the system mode-set sequence gained from the current
suppose that the system mode set depends only on the modal measurement (i.e., the likelihood function of the mode-set
state, that is, the system mode sequence is a Markov instead of sequence) which is given by
a base-state-dependent Markov chain. Then the state (mode)
dependency of the system inode set at any time is a (many-to- P[+)ISk, zk-7
one) point-to-set mapping $G : S \to \mathbf{S}$ which assigns to each element of $S$ some member of $\mathbf{S}$. The summations in (36) and (37) are over the range (not even the codomain) of the mapping $G^k : S \times \cdots \times S \to \mathbf{S} \times \cdots \times \mathbf{S}$ (both of $k$ folds), whereas those in (8) and (9) are over the domain of $G^k$. This is conceptually analogous to the difference between the Lebesgue integral and the Riemann integral. Since $S_j$, $j = 1, 2, \ldots, N$, form a partition of $\mathbf{S}$ and each $S_j$ is a mode set, it follows immediately that the optimal variable structure estimator has a two-level hierarchical structure: multiple model-sets at the higher level for the elements of $\mathbf{S}$ (as a partition of $\mathbf{S}$), and multiple models at the lower level for the elements of each $S_j$, as is clear from (36)-(39). Also, it is not difficult to see that a partition of the domain implicitly requires the use of a fixed structure (since $S$ is a constant set), whereas a partition of the range does not (because $S_j$, $j = 1, 2, \ldots, N$, are all distinct mode sets). Clearly, the variable structure estimator reduces to the fixed structure estimator if $\mathbf{S}$ is a singleton, i.e., $\mathbf{S} = \{S\}$, which is usually not the case (see Section VI).

Further Discussions of the Optimal Estimator: The mode sequence probability $P\{m^k | S^k, z^k\}$ in (38) and (39) can be obtained similarly as in a fixed structure estimator. Specifically, one has the following recursion:

$$
\begin{aligned}
P\{m^k | S^k, z^k\}
&= \frac{1}{c}\, p[z(k) | m^k, z^{k-1}]\, P\{m^k | S^k, z^{k-1}\} \\
&= \frac{1}{c}\, p[z(k) | m^k, z^{k-1}]\, P\{m(k) | m^{k-1}, S(k), z^{k-1}\} \cdot P\{m^{k-1} | S^{k-1}, z^{k-1}\}
\end{aligned}
\tag{40}
$$

where $c = p[z(k) | S^k, z^{k-1}]$ is a normalization constant and $p[\cdot]$ denotes the pdf. In view of the fact that only admissible mode sequences are considered, the last term above follows from

$$
P\{m^{k-1} | S^{k-1}, z^{k-1}\} = P\{m^{k-1} | S^k, z^{k-1}\}
\tag{41}
$$

since the knowledge of $S(k)$ carries only feasibility information (and nothing else) about $m(k-1)$. The first term on the RHS of (40) is the likelihood function of the system mode sequence, which can be obtained similarly as in a fixed structure estimator. The second term is the mode transition probability.

$$
= \sum_{m^k \in S^k} p[z(k) | m^k, z^{k-1}]\, P\{m^k | S^k, z^{k-1}\}
\tag{43}
$$

where the second term is the predicted mode (sequence) probability, given implicitly in the derivation of (40) (i.e., mode sequence probability times mode transition probability), and the first term is the likelihood function of the mode sequence.

The mode-set transition probability, i.e., the second term on the RHS of (42), can be obtained as follows. Define the mapping $G$ as the (hybrid) state dependency of the system mode set, similarly as shown before for the mode dependency. Let $E_j$ be the inverse image of $S_j$ under $G$, that is, the subset consisting of the elements of the hybrid state space that are mapped by $G$ onto $S_j$, given by

$$
E_j = X_j \times M_j = G^{-1}(\{S_j\}) = \{\xi : G(\xi) = S_j\} = \{(x, m) \in \mathbb{R}^{n_x} \times S : G(\xi) = S_j\}
\tag{44}
$$

where $X_j$ and $M_j$ are the projections of $E_j$ in the base-state subspace and the system mode set $S$, respectively. Then

$$
P\{S_j(k) | S^{k-1}, z^{k-1}\} = \int dP\{\xi | S^{k-1}, z^{k-1}\}
\tag{45}
$$

where $S_j(k) = \{S(k) = S_j\}$; $P\{\xi | S^{k-1}, z^{k-1}\}$ is a generalized cumulative distribution function because $\xi$ has both continuous and discrete components, and the integration is over the region in probability space such that $\xi \in E_j$. In the case that the mode sequence is a Markov chain instead of a state-dependent Markov chain, (45) becomes

$$
P\{S_j(k) | S^{k-1}, z^{k-1}\} = \sum_{m(k-1) \in M_j} P\{m(k-1) | S^{k-1}, z^{k-1}\}
\tag{46}
$$

or

$$
P\{S_j(k) | S^{k-1}, z^{k-1}\} = \sum_{m(k-1) \in M_j} \sum_{m^{k-2}} P\{m(k-1), m^{k-2} | S^{k-1}, z^{k-1}\}
\tag{47}
$$

where the summand in (47) is the mode sequence probability, given by (40). Note that $M_j$ may contain more than one mode, as $M_2$ in the following simple illustrative example.
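For a Markov mode sequence, (46) says that the probability of a mode set being in effect at time $k$ is the total probability, at time $k-1$, of the modes that lead to that mode set. The following is a minimal sketch of that sum. The mode labels, the probabilities, and the dictionary `mode_set_of` (standing in for the mapping $G$ restricted to modes) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def mode_set_transition_probs(posterior, mode_set_of):
    """Evaluate the sum in (46) for every mode set.

    posterior   : dict mode -> P{m(k-1) | S^{k-1}, z^{k-1}}
    mode_set_of : dict mode -> frozenset of modes that may be in effect
                  at time k if that mode is in effect at time k-1
    """
    probs = defaultdict(float)
    for mode, p in posterior.items():
        # Modes that share a mode set S_j together form M_j.
        probs[mode_set_of[mode]] += p
    return dict(probs)


# Hypothetical numbers, for illustration only.
posterior = {"m1": 0.5, "m2": 0.3, "m3": 0.2}
mode_set_of = {
    "m1": frozenset({"m1", "m4"}),
    "m2": frozenset({"m2", "m3", "m5"}),
    "m3": frozenset({"m2", "m3", "m5"}),  # m2 and m3 lead to the same mode set
}
result = mode_set_transition_probs(posterior, mode_set_of)
```

Here the mode set reached from both `m2` and `m3` collects the probability of both modes, which is the situation in which $M_j$ contains more than one mode.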
LI AND BAR-SHALOM: MULTIPLE-MODEL ESTIMATION 485
With the variable structure estimator, the estimate at $k = 2$ will be the probabilistically weighted sum of the estimates from filters matched to modes 3-7, respectively. As explained before, at $k = 2$, a GPB1 algorithm can be applied to modes 3-5 with $\hat{x}_1(1|1)$ as the initial estimate, where $\hat{x}_1(1|1)$ denotes the estimate of $x(1)$ based on $m_1$ at $k = 1$, and likewise for modes 6 and 7 (with $\hat{x}_2(1|1)$ as the initial estimate). Note that such an application of the GPB1 does not lead to a loss of the optimality of the estimator.

In a fixed structure GPB1 estimator, however, the mode-matched filters at $k = 2$ use the weighted sum of $\hat{x}_1(1|1)$ and $\hat{x}_2(1|1)$ as the initial estimate for that cycle.$^{15}$ This merging makes the overall estimate at $k = 2$ nonoptimal. It should also be noted that this estimator uses seven filters at $k = 2$ as opposed to five filters (or a group of three filters plus a group of two filters) with the variable structure estimator.

$^{15}$ If the designer correctly chooses the initial mode probability vector to be $[0.5, 0.3, 0, 0, 0, 0, 0]'$ instead of, say, $[1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7]'$,

If, in the above example, $m_2$ may jump to $m_5$, then the initial estimate for the filter based on $m_5$ in the variable structure estimator should be the probabilistically weighted sum of $\hat{x}_1(1|1)$ and $\hat{x}_2(1|1)$.

A Suboptimal Variable Structure MM Estimator: A simple yet good suboptimal variable structure approach is the one based on the best mode-set sequence (BMSS) (possibly with a proper merging of mode-set sequences), which can be viewed as an application of the well-known Viterbi technique to the management of the mode-set sequences.

Let $M(k)$ and $M^k$ be the model set and model-set sequence, respectively, at time $k$ in a BMSS algorithm. A large class of recursive algorithms that use a variable model set consists of the following five key steps at each time $k$.

RAMS Approach (Recursive Adaptive Model-Set Approach):
1) Model set adaptation: determine the model set $M(k)$ based on $\{M^{k-1}, z^k\}$.
2) Initialization of model-based filters: obtain the "initial" conditions for each filter based on a model in $M(k)$.
3) Mode-matched estimation: for each model in $M(k)$, obtain estimates under the assumption that this model matches the system mode in effect exactly.
4) Mode sequence probability calculation: compute $P\{M^{k-1}, m(k) | z^k\}$ for all $m(k)$ in $M(k)$.
5) Estimate fusion: obtain the overall estimate and its associated covariance.

Step 1) is unique for variable structure algorithms. Its theoretical basis is the merging/pruning criterion and (42)-(47). More details will be given in Section VII. Step 2) has been discussed. Steps 3)-5) are similar to those in a fixed mode-set algorithm.

VI. GRAPH-THEORETIC FORMULATION OF MULTIPLE-MODEL ALGORITHMS

Definition 6.1: A digraph (or directed graph) $D$ is an ordered pair of disjoint sets $(V, E)$ such that $E$ is a subset of the set of ordered pairs of elements of $V$.$^{16}$ The elements of $V$ and $E$ are called the vertices and the (directed) edges of $D$, respectively. A stochastic digraph is such a digraph in which each edge is assigned a probabilistic weight that sums up to unity for each vertex. Associated with a stochastic digraph, there is an adjacency matrix $A$ whose $(i, j)$th element $a_{ij}$ is the weight of the edge from vertex $v_i$ to vertex $v_j$. The adjacent sets of vertices from vertex $v_i$ and to vertex $v_j$ are, respectively, defined in terms of $a_{ij}$ as

[...]

$^{16}$ Figs. 2-6 are all examples of digraphs.

It is assumed in this paper that every vertex of the digraph under consideration has a self-loop (i.e., is adjacent to itself).

Theorem 6.1: The set of models (or modes) used in the MM algorithms, together with the model switching law of (2), can be represented by a stochastic digraph without parallel edges (i.e., edges that are incident with the same ordered pair of models).

Proof: Represent a model by a vertex with the same index. Add an edge $e(v_i, v_j)$ from vertex $v_i$ to vertex $v_j$ if, according to (2), mode $m_i$ may jump to mode $m_j$. Assign the transition probability from $m_i$ to $m_j$ to be the weight of $e(v_i, v_j)$. Obviously, the digraph constructed in this process is a stochastic digraph without parallel edges. Q.E.D.

The terms "mode," "model," and "vertex" will be used interchangeably.

Definition 6.2: A directed walk of $D$ is a sequence of vertices $v_0, v_1, \ldots, v_k$ such that there is an edge in $D$ from $v_{i-1}$ to $v_i$ for $i = 1, 2, \ldots, k$. A digraph is said to be strongly connected if there exists a directed walk between any two vertices.

With the help of graph theory, four essential ingredients of a multiple model algorithm can be identified: a set of single-model based algorithms (e.g., Kalman filters), each matched to a particular mode; a fusion rule that provides the overall results based on these individual mode-matched (or mode-sequence-matched) algorithms; an initialization rule to obtain the initial conditions for each recursive filter at each time; and an evolution mechanism of the underlying digraph at each time that defines the graph-theoretic relationship between modes. We shall call this underlying digraph of an MM algorithm its (supporting) digraph. It should be emphasized that there may be many different supporting digraphs associated with a mode set, and thus it is better to use digraphs rather than mode sets to describe the MM algorithms.

This graph-theoretic formulation opens up new avenues for the study and application of MM estimation. We provide below a sampling of useful results without proof.
• A Markov chain is ergodic if and only if its associated digraph is a strongly connected stochastic digraph.
• The state-dependent mode set w.r.t. $m_j$ is the adjacent set of modes from $m_j$.
• The set $S$ of the system modes is not state dependent if and only if its associated digraph is complete symmetric (i.e., every mode can be jumped from any other one directly). It is thus clear that the state-dependent system mode sets are usually not the same as their union $S$.
488 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 41, NO. 4, APRIL 1996
• A mode sequence consisting of the members of $S$ is admissible if and only if it corresponds to a directed walk of the digraph associated with $S$.
• The number of admissible mode sequences of $S$ at time $k$ is given by

$$
N_S(k) = \sum_{i, j} a_{ij}^{(k)}
\tag{61}
$$

where $a_{ij}^{(k)}$ is the $(i, j)$th entry of $A^k$, the $k$th power of the adjacency matrix $A$ of the digraph associated with $S$. This follows from a theorem in graph theory [32] which states that the number of the directed walks of length $k$ from $v_i$ to $v_j$ is equal to $a_{ij}^{(k)}$.

Certain properties of MM algorithms are related to their supporting digraphs, and therefore MM algorithms can be classified into several categories in terms of their supporting digraphs.

Definition 6.3: A fixed digraph (or fixed structure) MM algorithm is one whose supporting digraphs must be identical (or isomorphic$^{17}$) at all times; otherwise, it is said to be a variable digraph (or variable structure) algorithm. A fixed model-set MM algorithm is one in which all supporting digraphs at different times must have the same set of models. An MM algorithm is said to be switchable if its supporting digraphs need not consist of only isolated vertices. If all of the supporting digraphs are strongly connected, then the algorithm is said to be strongly switchable.

$^{17}$ Two digraphs are isomorphic if there is a one-to-one correspondence between their vertex sets that preserves adjacency.

Remarks: 1) A fixed digraph algorithm must have a fixed structure, but a fixed structure algorithm may have, at different times, supporting digraphs with different assignments of nonzero weights. In other words, a fixed structure algorithm allows adaptive (or time-varying) mode transition probabilities, that is, a nonhomogeneous Markov chain model for the system mode sequence. 2) A fixed structure algorithm has to use a fixed set of models, whereas a fixed model-set algorithm need not have a fixed structure since both the zero and nonzero weights can be reassigned for its supporting digraph at different times. 3) Almost all effective practical algorithms are strongly switchable. In fact, the supporting digraphs of practical MM algorithms are usually symmetric (or bidirectional) in the sense that $e(v_i, v_j)$ exists if and only if $e(v_j, v_i)$ exists, with only a few exceptions [11].

This graph-theoretic formulation of the MM algorithms provides a rigorous framework, which not only makes available many well-developed techniques and results of graph theory, but also provides a systematic way of handling mode-set evolution (adaptation) in real time required for the variable structure MM algorithms.

VII. ADAPTATION OF MODEL SET (SUPPORTING DIGRAPH)

Definitions: Given $V' \subset V(D)$, where $V(D)$ denotes the vertex set of $D$, then $D' = (V', E')$ is said to be the subdigraph of $D$ induced by $V'$, denoted by $D[V']$, if $E'$ contains all edges of $D$ with both end vertices in $V'$. Normalization of a weighted digraph is the procedure of scaling all weights in the digraph to obtain a stochastic digraph. In the sequel, let $D$ always be the total digraph obtained by normalizing the union of the supporting digraphs at all times of the MM algorithm under consideration.

A. Active Digraph (AD) Algorithm

One way of obtaining the variable digraph can be referred to as the active digraph (or model-set) algorithm. The basic idea is to use a subdigraph of the total digraph as the "active" digraph at each time. This was inspired by the active set method in constrained nonlinear programming [25]. One cycle of the AD algorithm is as follows.

AD Algorithm:
1) Obtain the union of the system mode sets $Y = \bigcup_{m \in D(k-1)} S_m(k)$, where $D(k-1)$ is the active digraph at time $k - 1$ and $S_m(k)$ is the state-dependent system mode set w.r.t. $m$, defined by (34).
2) Evaluate the probability of each mode in $Y$.
3) Form the active mode set $Y'$ as the subset of $Y$ consisting of no more than $K$ modes that have the largest probabilities, where $K$ depends on the maximum allowable computational burden.
4) Obtain $D(k)$ by normalizing $D[Y']$, the subdigraph of $D$ induced by $Y'$.
5) Execute steps 2)-5) of the RAMS approach of Section V using digraph $D(k)$.

Step 2) above can be done based on the following:

$$
P\{m(k) | m(k-1), D[Y], z^k\} = \frac{1}{c}\, p[z(k) | m(k), m(k-1), z^{k-1}] \cdot P\{m(k) | m(k-1), D[Y], z^{k-1}\}, \quad \forall m(k) \in D[Y]
\tag{62}
$$

where $c = p[z(k) | m(k-1), D[Y], z^{k-1}]$ is a normalization constant; $p[z(k) | m(k), m(k-1), z^{k-1}]$ is the likelihood function; and $P\{m(k) | m(k-1), D[Y], z^{k-1}\}$ is the predicted mode probability, which can be obtained from $P\{m(k-1) | z^{k-1}\}$ and $D[Y]$. Note that $D[Y]$ is used for notational simplicity to denote the mode set $Y$ along with the mode transitions governed by a (state-dependent) Markov chain.

The above AD algorithm can be simplified as follows. All modes of a digraph can be classified into three categories: unlikely (or insignificant), significant, and principal. Consequently, a reasonable set of rules for mode-set (digraph) evolution is: 1) discard the unlikely modes, 2) keep the significant modes, and 3) activate the modes strongly adjacent to the principal modes. Such an adjacent set of modes w.r.t. mode $m_i$ is defined by

$$
A_i = F_i \cap T_i = \{m_j \in V(D) : a_{ij} \neq 0 \text{ and } a_{ji} \neq 0\}.
\tag{63}
$$

Let $U(k-1)$ be a maximum set of the unlikely modes in $D(k-1)$ whose strong connectedness is retained after the removal of $U$ and the incident edges. Then steps 2) and 3) of the AD algorithm can be replaced by the following two steps:
2') Identify each mode in $D(k-1)$ to be unlikely, significant, or principal.
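Steps 1)-4) of the AD algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the total digraph is given as a row-stochastic weight matrix `A`, every mode has a self-loop (as assumed in Section VI), the likelihoods are supplied by the caller, and step 2) is approximated by a prediction through `A` followed by a Bayes update of unconditional mode probabilities rather than by the conditional form (62). The function name, arguments, and numbers are hypothetical.

```python
def ad_cycle(A, active, mu_prev, likelihood, K):
    """One model-set adaptation cycle in the spirit of the AD algorithm.

    A          : n x n list of lists, weights of the total digraph D
    active     : list of mode indices in the active digraph D(k-1)
    mu_prev    : probabilities of the active modes at time k-1 (same order)
    likelihood : length-n list, likelihood of z(k) under each mode
    K          : maximum number of modes to keep
    """
    n = len(A)
    # Step 1: union Y of the modes reachable in one jump from the active modes.
    Y = sorted({j for i in active for j in range(n) if A[i][j] > 0})
    # Step 2 (simplified): predicted probability times likelihood, normalized.
    post = {}
    for j in Y:
        predicted = sum(mu * A[i][j] for i, mu in zip(active, mu_prev))
        post[j] = predicted * likelihood[j]
    total = sum(post.values())
    post = {j: p / total for j, p in post.items()}
    # Step 3: active mode set Y' = at most K most probable modes of Y.
    keep = sorted(sorted(Y, key=lambda j: -post[j])[:K])
    # Step 4: induced subdigraph D[Y'], rows rescaled to sum to one.
    # Self-loops guarantee that every row sum is nonzero.
    sub = []
    for i in keep:
        row = [A[i][j] for j in keep]
        s = sum(row)
        sub.append([w / s for w in row])
    return keep, sub


# Hypothetical four-mode total digraph in which only neighbors are adjacent.
A = [
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.2, 0.8],
]
keep, sub = ad_cycle(A, active=[0, 1], mu_prev=[0.3, 0.7],
                     likelihood=[0.1, 1.0, 2.0, 0.5], K=2)
```

With these numbers the active set moves from modes 0 and 1 to modes 1 and 2, and the retained weights are rescaled so that each row of the new active digraph again sums to one.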
TABLE I
MODEL SETS AND MODELS USED IN DIFFERENT DESIGNS
design | model sets used | models used | # of models in each set

Fig. 5. Two representative members of the strong cover of Fig. 4 (panels: digraph D1, digraph D2).

is in a specified switching region, defined by a certain combination of thresholds. A simple switching logic is, assuming that $D_2$ was in effect at time $k - 1$

[...]

where $t_1$ and $t_2$ are design parameters, equal to, say, 0.5 and 0.6, respectively. A more sophisticated switching logic based on (42)-(47) can also be used.

Assuming that the supporting digraph sequence is a Markov chain, it is also possible to obtain (design) the transition probability matrix of this chain and to apply the fixed structure MM algorithm to it. This can be referred to as the soft switching of digraphs (just like the soft model switching in a decision-free MM algorithm). In this scheme, the fixed structure MM approach is applied in two levels: model (lower) level and digraph (higher) level.

The initialization of new filters can be done as follows based on the principle described in Section V. After switching, say, from $D_1$ to $D_2$ at time $k$, $\hat{x}_2(k-1|k-1)$ and $P_2(k-1|k-1)$ should be used as the initial conditions for filters based on $m_9$, $m_7$, and $m_{11}$. However, after switching, say, from $D_2$ to $D_4$ at time $k$, the probabilistically weighted sum of $\hat{x}_1(k-1|k-1)$ and $\hat{x}_7(k-1|k-1)$ and the corresponding covariance should be used as the initial conditions for the filter based on $m_4$; at time $k$, the filters based on $m_9$ and $m_{13}$ should have no contributions to the overall estimate because their probabilities should both be zero; and at time $k + 1$, $\hat{x}_4(k|k)$ and its covariance should be used as the initial conditions for the filters based on $m_9$ and $m_{13}$.

C. Adaptive Grid (AG) Algorithm

A third means of obtaining the supporting digraphs is to make adaptive the grid of the parameters that characterize the possible modes. This algorithm follows a similar idea to that of the adaptive MMPDA filter [14] or the moving-bank MM estimators [29], [15]. In this scheme, a coarse grid is set up initially, and then the grid is adjusted recursively according to an adaptation scheme based possibly on the current estimate, mode probabilities, and measurement residual. This approach is particularly advantageous in the case where the set of possible system modes is large.

For example, a simple AG MM algorithm for the example to be considered in the next section may be

$$
D(0) = \{2, 20\}
$$

$$
D(k) = \{Q_1, Q_2, Q_3\}
\tag{67}
$$

with

$$
Q_1 = \max\left[1, \tfrac{1}{2}\hat{Q}(k-1)\right], \quad Q_2 = \hat{Q}(k-1), \quad Q_3 = \min\left[30, 2\hat{Q}(k-1)\right]
\tag{68}
$$

where $\hat{Q}$ is the estimated value of $Q$.

VIII. AN EXAMPLE OF NONSTATIONARY NOISE IDENTIFICATION

This section compares, via an example of nonstationary noise identification, the new adaptive digraph approach and the fixed digraph approach. The IMM algorithm is chosen as the MM algorithm on which the comparison is based, because it is one of the most cost-effective hybrid estimation schemes [7], [13], [23].

The following second-order linear kinematic system is used in our example:

$$
x(k+1) = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix} x(k) + \begin{bmatrix} \tfrac{1}{2}T^2 \\ T \end{bmatrix} v(k)
\tag{69}
$$

$$
z(k) = \begin{bmatrix} 1 & 0 \end{bmatrix} x(k) + w(k)
\tag{70}
$$

with the sampling period $T = 1$, where $v(k)$ and $w(k)$ are zero-mean white with variances $Q(k)$ and $R(k)$, respectively.

Consider the problem of estimating the time-varying variance $Q$ of a scalar process noise sequence. If the only information about the noise variance to be estimated is its upper and lower bounds, an MM algorithm with only two models can be adopted [20]. If the difference between the upper and lower bounds is large, however, this approach with a fixed digraph may lead to unsatisfactory results no matter how many models are used. In this case, the new adaptive digraph approach can improve the accuracy significantly. Assume that the only a priori knowledge is that $Q$ can vary suddenly in the interval $[1, 30]$. Denoting by IMMn a fixed structure IMM algorithm with $n$ models, and by DSIMM a digraph switching IMM algorithm, several designs are listed in Table I with the model switching probabilities defined in Fig. 6 (self-loops are omitted). In Table I, the numbers denote the values of $Q$. The first model set in all of the four variable structure designs is the initial digraph specially designed to have a coarse coverage of the mode space.
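The adaptive grid of (67)-(68) can be sketched in a few lines. The bounds 1 and 30 and the initial set {2, 20} come from the text; the function name `adaptive_grid`, its argument names, and the sample estimates are hypothetical and chosen only to show the interior case and the two clipped cases.

```python
def adaptive_grid(q_est_prev, q_min=1.0, q_max=30.0):
    """Model set D(k) of (67)-(68): three process-noise variances placed
    around the previous estimate and clipped to [q_min, q_max]."""
    q1 = max(q_min, 0.5 * q_est_prev)   # Q_1 = max[1, Q_hat(k-1) / 2]
    q2 = q_est_prev                     # Q_2 = Q_hat(k-1)
    q3 = min(q_max, 2.0 * q_est_prev)   # Q_3 = min[30, 2 Q_hat(k-1)]
    return [q1, q2, q3]


initial_grid = [2.0, 20.0]        # D(0) = {2, 20}
grid_mid = adaptive_grid(8.0)     # neither bound is active
grid_low = adaptive_grid(1.5)     # lower bound clips Q_1
grid_high = adaptive_grid(25.0)   # upper bound clips Q_3
```

The grid contracts around small estimates and widens around large ones, so three models can cover the whole interval over time without a fixed fine quantization.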
In the results that follow the standard-deviation version was used:

$$
\hat{Q}(k) = \hat{\sigma}(k)^2
\tag{71}
$$

where

$$
\hat{\sigma}(k) = \sum_{m_i \in M(k)} \sigma_i P\{m_i(k) | z^k\}
\tag{72}
$$

in which $\sigma$ and $Q$ denote the standard deviation and the variance, respectively. Use of this standard-deviation version was found to be superior to the normally used variance version [20]

$$
\hat{Q}(k) = \sum_{m_i \in M(k)} Q_i P\{m_i(k) | z^k\}.
\tag{73}
$$

Fig. 7 shows the results via 100 Monte Carlo simulations from IMM2, IMM3, and DSIMM1. Three arbitrarily chosen sequences of the true process noise variances were used, as denoted by the solid lines in the figure. Note that the set of possible modes is infinite here, and thus there is no optimal MM algorithm. The following simple switching logic was used in DSIMM1:$^{19}$

$$
\begin{aligned}
D_1 &\to D_2 && \text{if } P\{m_1\} > 0.6 \\
D_1 &\to D_3 && \text{if } P\{m_2\} > 0.6 \\
D_2 &\to D_3 && \text{if } P\{m_2\} > 0.7 \\
D_3 &\to D_2 && \text{if } P\{m_1\} > 0.7.
\end{aligned}
\tag{74}
$$

$^{19}$ $D_i$ denotes the digraph associated with the $i$th model set.

The following observations can be made from Fig. 7:
• The IMM3 is not better than the IMM2. Specifically, the IMM3 is too "conservative" to any variation in $Q$. This is a general behavior when too many models are used. The IMM algorithm will become even more conservative if more models are used. This is an example where the use of more models in a fixed digraph MM algorithm does not improve the performance.
• The DSIMM1, which consists of exactly the same models as in the IMM3, provides significantly better results than those from the IMM2 and IMM3. The DSIMM1 is also computationally cheaper than the IMM3, since all of its digraphs consist of only two models and the switching logic used is simple. Better results are obtained if a more sophisticated adaptation scheme is used. The improvement of the DSIMM algorithm over the fixed structure IMM algorithm is even more significant if the difference between the upper and lower bounds is larger.
• The digraph switching algorithm is not sensitive to the choices of models. This is illustrated in Fig. 8 for DSIMM1, DSIMM2, DSIMM3, and DSIMM4, where DSIMM2 and DSIMM4 used the same logic (74) as DSIMM1. For DSIMM3, the following simple logic was used:

$$
\begin{aligned}
D_1 &\to D_2 && \text{if } P\{m_1\} > 0.6 \\
D_1 &\to D_4 && \text{if } P\{m_2\} > 0.6 \\
D_2 &\to D_3 && \text{if } P\{m_2\} > 0.7 \\
D_3 &\to D_2 && \text{if } P\{m_1\} > 0.7 \\
D_3 &\to D_4 && \text{if } P\{m_2\} > 0.7 \\
D_4 &\to D_3 && \text{if } P\{m_1\} > 0.7.
\end{aligned}
\tag{75}
$$

Note that in this example, DSIMM3, the adaptive digraph IMM using four digraphs, is not significantly better than the DSIMM using three digraphs. The major reason is that the noise is not "wild" enough: the lower and upper bounds of its standard deviation are approximately 1 and 5.5, respectively. A choice of $Q = 1, 10, 30$ corresponds roughly to $\sigma = 1$, 3.2, and 5.5. Hence, the model discernibility (separation) will not be good if more quantization levels are used because it is too difficult for the algorithm to tell which mode is in effect from a small number of measurements. A DSIMM using four digraphs indeed gives better results than those from a DSIMM using three digraphs if the difference between the lower and upper bounds is larger. Also, if the quantity to be estimated is the mean, rather than the variance, a well-designed DSIMM estimator with four digraphs is better than that with three digraphs [20].

IX. CONCLUSIONS

This paper makes contributions in the following four aspects of MM estimation: 1) the introduction of the variable structure and the construction of the optimal estimator with variable structure, 2) the presentation of practical variable structure schemes, 3) the formulation of the MM approach in terms of graph-theoretic notions, and 4) the theoretical results on the choice of model set.

Existing fixed structure MM algorithms, which use a fixed set of models, usually perform reasonably well for problems that can be handled with a small set of models. Many practical
[Figure: plots against time; legend: estimated with G2, estimated with G3, estimated with G4.]
problems, however, involve more than just a few modes. The computational burden increases considerably as the number of models increases. More importantly, the use of more models does not necessarily improve the performance; instead, as shown theoretically in this paper, it may degrade the performance because of the excessive "competition" from the "unnecessary" models.

The MM algorithms can be formulated in a graph-theoretic framework. This opens up a new area in MM estimation. Three

REFERENCES
[1] G. A. Ackerson and K. S. Fu, "On state estimation in switching environments," IEEE Trans. Automat. Contr., vol. AC-15, pp. 10-17, Jan. 1970.
[2] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[3] A. Averbuch, S. Itzikowitz, and T. Kapon, "Radar target tracking - Viterbi versus IMM," IEEE Trans. Aerosp. Electron. Syst., vol. 27, pp. 550-563, May 1991.
[4] E. Balas and M. W. Padberg, "Set partitioning: A survey," SIAM Rev., vol. 18, pp. 710-761, Oct. 1976.
[5] Y. Baram and N. R. Sandell, Jr., "An information theoretic approach to dynamical systems modeling and identification," IEEE Trans. Automat. Contr., vol. AC-23, pp. 61-66, Feb. 1978.
[6] -, "Consistent estimation on finite parameter sets with application to linear systems identification," IEEE Trans. Automat. Contr., vol. AC-23, pp. 451-454, June 1978.
[7] Y. Bar-Shalom, Ed., Multitarget-Multisensor Tracking: Applications and Advances, vol. II. Norwood, MA: Artech House, 1992.
[8] Y. Bar-Shalom, K. C. Chang, and H. A. P. Blom, "Tracking a maneuvering target using input estimation versus the interacting multiple model algorithm," IEEE Trans. Aerosp. Electron. Syst., vol. 25, pp. 296-300, Apr. 1989.
[9] Y. Bar-Shalom and X. R. Li, Estimation and Tracking: Principles, Techniques, and Software. Boston, MA: Artech House, 1993.
[10] -, Multitarget-Multisensor Tracking: Principles and Techniques. Storrs, CT: YBS, 1995.
[11] W. D. Blair, G. A. Watson, and T. R. Rice, "Tracking maneuvering targets with an interacting multiple model filter containing exponentially correlated acceleration models," in Proc. Southeastern Symp. Syst. Theory, Columbia, SC, Mar. 1991.
[12] H. A. P. Blom and Y. Bar-Shalom, "The interacting multiple model algorithm for systems with Markovian switching coefficients," IEEE Trans. Automat. Contr., vol. 33, pp. 780-783, Aug. 1988.
[13] N. Christofides, Graph Theory: An Algorithmic Approach. London: Academic, 1975.
[14] M. Gauvrit, "Bayesian adaptive filter for tracking with measurements of uncertain origin," Automatica, vol. 20, pp. 217-224, Mar. 1984.
[15] J. A. Gustafson and P. S. Maybeck, "Flexible spacestructure control via moving-bank multiple model algorithms," IEEE Trans. Aerosp. Electron. Syst., vol. 30, pp. 750-757, July 1994.
[16] A. H. Jazwinski, Stochastic Processes and Filtering Theory. New York: Academic, 1970.
[17] X. R. Li, "Hybrid estimation techniques," in Control and Dynamic Systems: Advances in Theory and Applications, vol. 76, C. T. Leondes, Ed. New York: Academic, 1996, pp. 1-76.
[18] -, "Hybrid state estimation and performance prediction with applications to air traffic control and detection threshold optimization," Ph.D. dissertation, Univ. Connecticut, Storrs, 1992.
[19] X. R. Li and Y. Bar-Shalom, "A hybrid conditional averaging technique for performance prediction of algorithms with continuous and discrete uncertainties," in Proc. 1994 Amer. Contr. Conf., Baltimore, MD, June 1994, pp. 1530-1534.
[20] -, "A recursive multiple model approach to noise identification," IEEE Trans. Aerosp. Electron. Syst., vol. 30, pp. 671-684, July 1994.
[21] -, "Mode-set adaptation in multiple-model estimators for hybrid systems," in Proc. 1992 Amer. Contr. Conf., Chicago, IL, June 1992, pp. 1794-1799.
[22] -, "Performance prediction of hybrid algorithms," in Control and Dynamic Systems: Advances in Theory and Applications, vol. 72, C. T. Leondes, Ed. New York: Academic, 1995, pp. 99-151.
[23] -, "Performance prediction of the interacting multiple model algorithm," IEEE Trans. Aerosp. Electron. Syst., vol. 29, pp. 755-771, July 1993.
[24] H. Lin and D. P. Atherton, "An investigation of the SFIMM algorithm for tracking maneuvering targets," in Proc. 32nd IEEE Conf. Decision Contr., San Antonio, TX, Dec. 1993, pp. 930-935.
[25] D. G. Luenberger, Linear and Nonlinear Programming, 2nd ed. Reading, MA: Addison-Wesley, 1984.
[26] D. T. Magill, "Optimal adaptive estimation of sampled stochastic processes," IEEE Trans. Automat. Contr., vol. AC-10, pp. 434-439, 1965.
[27] M. Mariton, Jump Linear Control Systems in Automatic Control. New York: Marcel Dekker, 1990.
[28] P. S. Maybeck, Stochastic Models, Estimation and Control, vols. II, III. New York: Academic, 1982.
[29] P. S. Maybeck and K. P. Hentz, "Investigation of moving-bank multiple model adaptive algorithms," AIAA J. Guidance, Contr., Dynamics, vol. 10, pp. 90-96, Jan.-Feb. 1987.
[30] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, pp. 257-286, Feb. 1989.
[31] S. N. Sheldon and P. S. Maybeck, "An optimizing design strategy for multiple model adaptive estimation and control," IEEE Trans. Automat. Contr., vol. 38, pp. 651-654, Apr. 1993.
[32] K. Thulasiraman and M. N. S. Swamy, Graph Theory: Theory and Algorithms. New York: Wiley, 1992.

Xiao-Rong Li (M'92-SM'95) received the B.S. and M.S. degrees from Zhejiang University, Hangzhou, Zhejiang, P.R.C., in 1982 and 1984, respectively, and the M.S. and Ph.D. degrees from the University of Connecticut, Storrs, in 1990 and 1992, respectively, all in electrical engineering.
Since 1994, he has been with the Department of Electrical Engineering, University of New Orleans, New Orleans, LA, where he is an Assistant Professor. During 1986-1987, he did research on power transmission at the University of Calgary, Alberta, Canada. He was an Assistant Professor in the Department of Electrical Engineering, University of Hartford, West Hartford, CT, from 1992 to 1994. He is a consultant to several companies. His current research interests include stochastic systems, statistical signal processing, and electric power.
Dr. Li has published over 15 refereed journal articles, three book chapters, and coauthored (with Y. Bar-Shalom) two books: Estimation and Tracking: Principles, Techniques, and Software (Boston, MA: Artech House, 1993) and Multitarget-Multisensor Tracking: Principles and Techniques (Storrs, CT: YBS, 1995). He has also won several outstanding paper awards. He is an Editor for Tracking and Navigation of the IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS.

Yaakov Bar-Shalom (S'63-M'66-SM'80-F'84) was born on May 11, 1941. He received the B.S. and M.S. degrees from the Technion, Israel Institute of Technology, Haifa, in 1963 and 1967, and the Ph.D. degree from Princeton University, Princeton, NJ, in 1970, all in electrical engineering.
From 1970 to 1976, he was with Systems Control, Inc., Palo Alto, CA. Currently, he is a Professor of Electrical and Systems Engineering at the University of Connecticut, Storrs. His research interests are in estimation theory and stochastic adaptive control, and he has published over 180 papers in these areas. He coauthored the monograph, Tracking and Data Association (New York: Academic, 1988); the graduate text, Estimation and Tracking: Principles, Techniques and Software (Boston: Artech House, 1993); the text, Multitarget-Multisensor Tracking: Principles and Techniques (YBS, 1995); and edited the books, Multitarget-Multisensor Tracking: Applications and Advances (Boston: Artech House, vol. I, 1990, vol. II, 1992). He has been consulting for numerous companies and originated the series of multitarget-multisensor tracking short courses offered via UCLA Extension, at government laboratories, private companies, and overseas. He has also developed the commercially available interactive software packages MULTIDAT(TM) for automatic track formation and tracking of maneuvering or splitting targets in clutter, PASSDAT(TM) for data association from multiple passive sensors, BEARDAT(TM) for target localization from bearing and frequency measurements in clutter, IMDAT(TM) for image segmentation and target centroid tracking, and FUSEDAT(TM) for fusion of possibly heterogeneous multisensor data for tracking.
Dr. Bar-Shalom served as Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, and from 1978 to 1981 as Associate Editor of Automatica. He was Program Chairman of the 1982 American Control Conference, General Chairman of the 1985 ACC, and Co-chairman of the 1989 IEEE International Conference on Control and Applications. During 1983-1987, he served as Chairman of the Conference Activities Board of the IEEE Control Systems Society, and during 1987-1989, he was a member of the Board of Governors of the IEEE Control Systems Society. In 1987, he received the IEEE Control Systems Society Distinguished Member Award. Currently, he is a Distinguished Lecturer of the IEEE Aerospace and Electronic Systems Society. He was elected a Fellow of the IEEE for "contributions to the theory of stochastic systems and of multitarget tracking."