
ON ADAPTIVE CONTROL PROCESSES

Richard Bellman and Robert Kalaba

The RAND Corporation
Santa Monica, California

Summary
One of the most challenging areas in the field of automatic control is the design of automatic control devices that 'learn' to improve their performance based upon experience, i.e., that can adapt themselves to circumstances as they find them. The military and commercial implications of such devices are impressive, and interest in the two main areas of research in the field of control, the USA and the USSR, runs high. Unfortunately, though, both theory and construction of adaptive controllers are in their infancy, and some time may pass before they are commonplace. Nonetheless, development at this time of adequate theories of processes of this nature is essential.

The purpose of our paper is to show how the functional equation technique of a new mathematical discipline, dynamic programming, can be used in the formulation and solution of a variety of optimization problems concerning the design of adaptive devices. Although, occasionally, a solution in closed form can be obtained, in general, numerical solution via the use of high-speed digital computers is contemplated.

We discuss here the closely allied problems of formulating adaptive control processes in precise mathematical terms and of presenting feasible computational algorithms for determining numerical solutions.

To illustrate the general concepts, consider a system which is governed by the inhomogeneous Van der Pol equation

    ẍ + μ(x² − 1)ẋ + x = r(t),   0 ≤ t ≤ T,

where r(t) is a random function whose statistical properties are only partially known to a feedback control device which seeks to keep the system near the unstable equilibrium state x = 0, ẋ = 0. It proposes to do this by selecting the value of μ as a function of the state of the system at time t, and the time t itself. By observing the random process r(t), the controller may, with the passage of time, infer more and more concerning the statistical properties of the function r(t) and thus may be expected to improve the quality of its control decisions. In this way the controller adapts itself to circumstances as it finds them. The process is thus an interesting example of adaptive control, and, conceivably, one with some immediate applications.

Lastly, some areas of this general domain requiring additional research are indicated.

1. Introduction

In many engineering, economic, biological, and statistical control processes, a device of one type or another which we shall call a controller is called upon to perform under various conditions of uncertainty regarding the structure of the underlying physical processes. These conditions range all the way from complete knowledge to total ignorance. As the process unfolds, however, additional information concerning these factors may become available to the controller, which then has the possibility of 'learning' to improve its performance based upon experience, or, in fact, actual experimentation. In this case we say that the controller adapts itself to its environment.

In an earlier paper, [7], a broad and general foundation was laid for the mathematical study of adaptive processes, through the use of concepts from the field of dynamic programming, [4]. The specific purpose of this paper is to render these notions more concrete through the detailed study of some special control processes involving a nonlinear system with a tendency to be stimulated to undesirable oscillations.

We approach the adaptive control process in a series of steps. First we assume that the controller has complete information concerning the behavior of the forcing function over time, a process which is referred to as a deterministic control process. Then we introduce some unknown factors, which appear mathematically as random variables having distribution functions which are known to the controller. This leads to a stochastic control process. Lastly, we allow the controller still less information about the unknown factors and require that the controller learn to improve its performance through observation of the values of r(t): an adaptive control process.

In this paper, we limit the deficiency in the controller's knowledge to incomplete information concerning a random disturbing force. There are, needless to say, many other ways in which ignorance can manifest itself. Among these we may mention uncertainties concerning the determination of the state of the system and its environment by the sensing devices, the objective (figure of merit) of the process, the transformations of the state of the system produced by control decisions, the set of allowable decisions, and so on. These will be examined in subsequent investigations.

Although the design and operation of adaptive controllers are in their infancy, interest in these devices runs high, [10]. The functional equation technique of dynamic programming, [4], can be used to attack a wide variety of problems involving the determination of optimal control policies for control devices having the ability to adapt themselves to circumstances. In particular, it provides a useful conceptual framework for the very discussion of such devices. Though, on occasion, analytic results are obtained, [?], the emphasis is upon the development of methods which are suitable for use in conjunction with high-speed digital computers having large memories.

Authorized licensed use limited to: the Leddy Library at the University of Windsor. Downloaded on December 08,2023 at 16:48:18 UTC from IEEE Xplore. Restrictions apply.
Some of the advantages of the dynamic programming approach are its suitability for use with nonlinear, as well as linear, systems, its automatic production of a desirable parameter study (a 'sensitivity' or stability analysis), its straightforwardness and computational feasibility, and its ability to incorporate stochastic elements in a routine fashion.

Let us now turn our attention to an example which will serve to illustrate these remarks.

2. A Feedback Control Problem

Let us consider a system which, if uncontrolled, is governed by the well-known nonlinear differential equation

    ẍ + μ(x² − 1)ẋ + x = 0,   x(0) = c1,  ẋ(0) = c2,   (1)

(the Van der Pol equation) where the dots denote differentiation with respect to time. This equation is of fundamental importance in describing the development of relaxation oscillations in triode oscillator circuits and in describing the operation of multivibrators. We shall call μ the system parameter. If we introduce the function v(t) by means of the relation

    ẋ = v,   (2)

then the equation in (1) can be replaced by the first-order system

    ẋ = v,   x(0) = c1,   (3)
    v̇ = −μ(x² − 1)v − x,   v(0) = c2.

It is well known that if c1² + c2² ≠ 0 then the solution of the system (3) will tend toward a unique periodic solution. In the (x, v) phase plane, this periodic solution is represented by a closed curve which all trajectories (except x = 0, v = 0) approach. Thus, when the system is disturbed from its (unstable) equilibrium position (x = 0, v = 0), a periodic oscillation tends to develop. Full details are available in the book on nonlinear oscillations by Stoker, [1].

Let us assume, though, that the oscillations are undesirable ('parasitic'), and that the system can be controlled by varying the system parameter, μ, in a given range in an effort to maintain the system as close as possible to its equilibrium state.

Consider that the process begins at time 0 and terminates at time T, and that the system is initially in state (c1, c2), where c1 is the displacement, x, and c2 is the velocity, ẋ = v. We shall arbitrarily measure the 'cost' of deviation from equilibrium during the process by the integral

    J[μ] = ∫₀ᵀ (|x(t)| + |v(t)|) dt + exp(|x(T)| + |v(T)|),   (4)

where exp(z) is the exponential function of z. We deliberately use such a monstrous function in order to squelch any direct analytic approach in embryo. Our objective will be the determination of the system parameter μ as a function of the state of the system at time t and the time t itself, for 0 ≤ t ≤ T, in order to minimize J[μ]. The control function will be subject to a constraint

    m1 ≤ μ(t) ≤ m2,

where m1 may be negative. Notice that the criterion function is not the usual mean-square deviation, which in this case would be of little avail since the underlying equations are nonlinear. The first term on the right-hand side of Eq. (4) measures the cost of deviation during the entire course of the process and the second term measures the cost of deviation at the termination of control.

The temptation is to view this as a problem in the calculus of variations, [16], in which one seeks to determine μ = μ(t) as a function of time over the entire interval 0 ≤ t ≤ T in an attempt to minimize the functional J. The fact that μ is constrained to lie between m1 and m2 is a cause of some complication. Furthermore, there are no classical prototypes for the stochastic control processes we wish to study below.

The approach which we shall use, by contrast, emphasizes the feedback control nature of the problem. We shall imbed the original problem within a class of problems in which we regard the system as being in some general state x = c1, ẋ = c2 at the time t, and ask what the optimal choice of μ is under these circumstances. Notice (as a consequence of the usual existence and uniqueness results for differential equations) that the past history of the process is of no consequence in making this decision, only the current state. Pursuing this approach, in which we have a continuous decision process, [2], we characterize the curve μ = μ(t) as an envelope of tangents, rather than as a locus of points, as would be the case were the earlier viewpoint adopted.

In order to prepare the way, though, for the use of digital computing machines, we wish to reformulate the problem in terms of a discontinuous time variable, which will also materially simplify matters conceptually when we deal with the cases of stochastic and adaptive control.

3. A Discrete Version

The problem could be treated in the form in which it now stands, [2]. Since our objective is to devise methods which are particularly suitable for high-speed digital computational purposes, we prefer to reformulate the model itself in terms of discrete variables. It must be borne in mind that, in any event, digital computers consider all variables to be discrete.

The time interval from 0 to T is divided into N intervals of length h, so that

    Nh = T.   (1)

If at time kh the system is in state (x_k, v_k), and the control decision at that time is to have the system parameter be μ_k, then the new state at time (k + 1)h is given by the finite difference equations

    x_{k+1} = x_k + v_k h,   (2)
    v_{k+1} = v_k − [μ_k(x_k² − 1)v_k + x_k] h,

relations which hold for k = 0, 1, 2, ..., N − 1. These equations are the finite difference analogues of the equations in (2.3). The cost of the deviation from equilibrium from time kh to time (k + 1)h is taken to be (|x_k| + |v_k|)h, and the cost for deviation of the final state from equilibrium is exp(|x_N| + |v_N|). The total cost of deviation from the equilibrium state during the entire process is considered to be

    J = Σ_{k=0}^{N−1} (|x_k| + |v_k|) h + exp(|x_N| + |v_N|),   (3)

where μ_k is the value of the system parameter selected at time kh. Let us assume that the system is in state (c1, c2) initially and that we seek a set of parameter values, [μ_0, μ_1, ..., μ_{N−1}], which will minimize the total cost of deviation given in equation (3).

4. Deterministic Control

As stated, the problem requires a constrained N-dimensional minimization to be performed, and as such may be quite difficult to carry out computationally in its native form. Even so, this problem is conceptually much simpler than the original continuous version, which required a minimization over elements in a function space. To solve this new discrete problem we imbed the given decision process within a class of processes in such a way that we shall have a sequence of N simple one-dimensional optimizations to perform, rather than the one difficult N-dimensional problem. This decomposition makes possible an efficient machine solution.

The imbedding is accomplished by focusing our attention upon determining what value between m1 and m2 of the system parameter to choose at time (N − k)h if the system is then in some general state (a, b), where k may have any of the values 0, 1, 2, ..., N − 1 and a and b are any real numbers. The original discrete problem is one of the members of this class of problems. Notice that this is the general problem of interest to the feedback controller, for it must decide what value of the system parameter to call for with the system in some physical state (a, b) and kh time units remaining before the termination of the process.

To formulate the problem analytically, we note first that the minimal cost of deviation over the last k stages of the process, with the system starting this portion of the process in some state, say (c1, c2), is some definite function of k and c1 and c2. It is, namely, the cost that is incurred during the last k stages of the process using an optimal selection of the sequence of system parameter values with the system in state (c1, c2) at time (N − k)h. Let us therefore define, for k = 1, 2, ..., N, the functions

    f_k(c1, c2) = the cost of the last k stages of the control process, with the system beginning those last k stages in state (c1, c2) and using an optimal selection of the system parameter throughout the remainder of the process.   (1)

We shall determine first f_1(c1, c2), then f_2(c1, c2), and so on, until f_N(c1, c2) has been determined. At the same time, we shall determine the optimal choices of μ to make.

The function f_1(c1, c2) is easily determined. Here we are concerned with a process which begins at time (N − 1)h and terminates at time Nh = T, with the system in state (c1, c2) at time (N − 1)h. The cost of deviation during the process is (|c1| + |c2|)h. If the value of the system parameter selected is μ, then the state at the termination of the process will be given by the equations

    x_N = c1 + c2 h,   (2)
    v_N = c2 − [μ(c1² − 1)c2 + c1] h,

where use has been made of the formulas in (3.2). The cost of this terminal deviation is

    exp(|c1 + c2 h| + |c2 − [μ(c1² − 1)c2 + c1] h|).

Consequently, the system parameter μ must be selected so that the total cost, given by the expression

    (|c1| + |c2|) h + exp(|c1 + c2 h| + |c2 − [μ(c1² − 1)c2 + c1] h|),   (3)

is minimized. The minimizing value of μ for this one-stage process will depend on the state (c1, c2), and it can easily be determined by a digital computer using a search technique. The bracket is evaluated for sample values of μ in the range m1 ≤ μ ≤ m2, and the value of μ yielding the smallest value of the bracket is the optimal system parameter value. Here we see by inspection that μ should be chosen equal to m2 if (c1² − 1)c2 > 0 and equal to m1 if (c1² − 1)c2 < 0. If (c1² − 1)c2 = 0, the choice of μ is immaterial. Let us denote this optimal choice of μ for the one-stage process under consideration, initially in state (c1, c2), by μ_1(c1, c2).
We have the recurrence relation

    f_1(c1, c2) = (|c1| + |c2|) h + Min_{m1 ≤ μ ≤ m2} exp(|c1 + c2 h| + |c2 − [μ(c1² − 1)c2 + c1] h|),   (4)

where the right-hand side represents the minimum over all choices of μ between m1 and m2 of the expression in brackets. It is clear that this minimum value actually depends on c1 and c2.

Now let us assume that the functions f_1(c1, c2), f_2(c1, c2), ..., f_k(c1, c2), with k < N, have all been determined. We wish next to determine the function f_{k+1}(c1, c2). We do this by making use of the principle of optimality, which is a special, but quite important, application of the concept of invariant imbedding, [8]. According to this principle, an optimal sequence of decisions has the property that, whatever the initial decision and initial state are, the remaining decisions must constitute an optimal sequence of decisions with respect to the state which results from the first decision.

To apply this principle, let us suppose that at time (N − k − 1)h, with k + 1 decisions remaining and the system in some state (c1, c2), the first decision is to set the system parameter equal to μ. The effect of this decision is to transform the system into the state (x_k, v_k) at time (N − k)h (when k decisions remain), where

    x_k = c1 + c2 h,   (5)
    v_k = c2 − [μ(c1² − 1)c2 + c1] h.

From the cost point of view, we see that this results in a cost (|c1| + |c2|)h during the time interval from (N − k − 1)h to (N − k)h and (since optimal decisions must be made over the remaining k decisions, beginning with the system in the state (x_k, v_k) given by equation (5)) a cost of f_k(c1 + c2 h, c2 − [μ(c1² − 1)c2 + c1] h) during the remainder of the process. Clearly, the choice of the system parameter at time (N − k − 1)h must be made so as to minimize the sum of these two costs. This observation results in the equation

    f_{k+1}(c1, c2) = (|c1| + |c2|) h + Min_{m1 ≤ μ ≤ m2} f_k(c1 + c2 h, c2 − [μ(c1² − 1)c2 + c1] h).   (6)

In particular, since f_1(c1, c2) is known from the above discussion, f_2(c1, c2) can be determined from the formula

    f_2(c1, c2) = (|c1| + |c2|) h + Min_{m1 ≤ μ ≤ m2} f_1(c1 + c2 h, c2 − [μ(c1² − 1)c2 + c1] h),   (7)

which follows from equation (6). The value of μ which minimizes the expression in brackets in equation (7) is the optimal value of μ to choose at time (N − 2)h (the initial stage of a two-stage decision process) with the system in state (c1, c2). We denote this optimal choice of μ by μ_2(c1, c2).

Similarly, the remaining minimal cost functions f_k(c1, c2) and optimal decision functions μ_k(c1, c2) can now be determined recursively.

In order to apply this solution, it is necessary, of course, to construct a control device that will call for the indicated optimal value of the system parameter μ for each state of the system and time remaining. Should this prove not to be feasible, other sub-optimal policies will have to be employed. These can also be determined by dynamic programming methods, by imposing suitable constraints on the allowable choice of μ and the information fed into the computer-controller. In the event of this sub-optimization, [13], the loss in system performance can be quantitatively assessed by comparison with the performance of an optimal controller.

5. Infinite Duration

In the event that the process is of sufficiently great duration, we may wish to approximate it by means of an infinitely long process. Furthermore, we may now wish to exert control so as to minimize the maximum deviation of the function

    |x(t)| + |v(t)|

over all time. Let us then define the function
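The backward recursion (6) can be sketched as a table-filling computation. This is a much-simplified illustration of ours, not the authors' program: the coarse state grid, the step size, the μ sample set, and the nearest-neighbour state lookup are all hypothetical stand-ins for whatever resolution and interpolation a production code would use.

```python
import math

H, M1, M2 = 0.1, -1.0, 1.0                       # hypothetical h, m1, m2
GRID = [i * 0.25 for i in range(-8, 9)]          # states from -2 to 2
MUS = [M1 + (M2 - M1) * i / 20 for i in range(21)]

def nearest(z):
    """Snap a transformed state coordinate back onto the grid."""
    return min(GRID, key=lambda g: abs(g - z))

def terminal(c1, c2):
    """Terminal deviation cost exp(|x_N| + |v_N|)."""
    return math.exp(abs(c1) + abs(c2))

def sweep(f_prev):
    """Given the table for f_k, return the table for f_{k+1} via (6)."""
    f_new = {}
    for c1 in GRID:
        for c2 in GRID:
            best = min(
                f_prev[nearest(c1 + c2 * H),
                       nearest(c2 - (mu * (c1**2 - 1) * c2 + c1) * H)]
                for mu in MUS)
            f_new[c1, c2] = (abs(c1) + abs(c2)) * H + best
    return f_new

# Start from the terminal cost and sweep backward five stages.
f = {(a, b): terminal(a, b) for a in GRID for b in GRID}
for _ in range(5):
    f = sweep(f)
```

Note the decomposition the text describes: each sweep performs only one-dimensional searches over μ, one per grid state, rather than a single N-dimensional minimization.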

    f(c1, c2) = the maximum value of |x| + |v| over all time, with the system initially in state (c1, c2), using an optimal control policy.   (1)

The relevant functional equation now becomes

    f(c1, c2) = Min_{m1 ≤ μ ≤ m2} max[ |c1| + |c2|, f(c1 + c2 h, c2 − [μ(c1² − 1)c2 + c1] h) ].   (2)

This is based on the observation that μ must be chosen so that the larger of |c1| + |c2| and the greatest deviation over the remainder of the process will be as small as possible. A further discussion of such equations can be found in [2, 3].

6. Stochastic Control

Let us now complicate matters for the controller somewhat by assuming that the system is subjected to a random external force, the influence of which cannot be neglected. If we denote the random force which is exerted at time nh by r_n, then equations (3.2) become

    x_{n+1} = x_n + v_n h,   (1)
    v_{n+1} = v_n − [μ_n(x_n² − 1)v_n + x_n] h + r_n h.

The controller can no longer predict precisely what state the system will be transformed into when a value for μ is chosen, for the transformed state depends on the value that the random force r_n assumes.

For simplicity, we shall assume that the N random variables r_n are independent and that

    Prob(r_n = +δ) = p*,   Prob(r_n = −δ) = 1 − p*,   (2)

so that the disturbing force is either ±δ. The case where r_n is correlated with the value of x_n can also be considered, at some increase in complexity.

We do, however, wish to assume that the probability of a positive force has the known value p*. This last assumption is not justified in many situations. If the value of p* is not known, further complications arise, leading to the adaptive control processes discussed in §7.

Once again we wish to control the development of the system in such a way that the function

    J = Σ_{k=0}^{N−1} (|x_k| + |v_k|) h + exp(|x_N| + |v_N|)   (3)

is minimized. This can now only be accomplished in some average sense, since the x_k and v_k are random variables for k = 1, 2, ..., N. Once again we imbed the original process within a class of processes. Denoting the taking of an expected value by E, we define the sequence of functions

    f_k(c1, c2) = Min E{ Σ_{j=N−k}^{N−1} (|x_j| + |v_j|) h + exp(|x_N| + |v_N|) },   (4)

for k = 1, 2, ..., N, where

    x_{N−k} = c1,   v_{N−k} = c2.   (5)

Thus, f_k(c1, c2) represents the minimal expected total cost of deviation from equilibrium for a process beginning at time (N − k)h and terminating at time T = Nh, with the system initially in the state (c1, c2).

We first consider the one-stage process, which begins at time (N − 1)h and terminates at Nh = T. We have

    f_1(c1, c2) = Min_{m1 ≤ μ ≤ m2} E{ |c1| h + |c2| h + exp[ |c1 + c2 h| + |c2 − (μ(c1² − 1)c2 + c1) h + r h| ] },

where r is the force exerted during this stage, or, taking the expected value over r,

    f_1(c1, c2) = |c1| h + |c2| h + Min_{m1 ≤ μ ≤ m2} { p* exp[ |c1 + c2 h| + |c2 − (μ(c1² − 1)c2 + c1) h + δh| ]
                  + (1 − p*) exp[ |c1 + c2 h| + |c2 − (μ(c1² − 1)c2 + c1) h − δh| ] }.

Once again, this minimization can easily be performed by a computer, so that the functions f_1(c1, c2) and μ_1(c1, c2), the minimizing value of μ for each state (c1, c2), can be taken as known.
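The stochastic one-stage minimization admits the same direct-search treatment as the deterministic case. The sketch below is ours, with hypothetical numeric choices throughout; it averages the terminal cost over the two possible disturbances r = +δ and r = −δ with known probability p*.

```python
import math

def expected_one_stage_cost(c1, c2, mu, h, delta, p_star):
    """Expected total cost of the one-stage stochastic process:
    stage cost plus the p*-weighted average of the two terminal costs."""
    xN = c1 + c2 * h
    base = c2 - (mu * (c1**2 - 1.0) * c2 + c1) * h
    up = math.exp(abs(xN) + abs(base + delta * h))    # force was +delta
    down = math.exp(abs(xN) + abs(base - delta * h))  # force was -delta
    return (abs(c1) + abs(c2)) * h + p_star * up + (1 - p_star) * down

def best_mu(c1, c2, h, delta, p_star, m1, m2, samples=201):
    """Direct search over sample values of mu in [m1, m2]."""
    grid = [m1 + (m2 - m1) * i / (samples - 1) for i in range(samples)]
    return min(grid, key=lambda mu: expected_one_stage_cost(
        c1, c2, mu, h, delta, p_star))
```

With δ = 0 the expectation collapses and the search reduces to the deterministic one-stage problem of §4.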

Next we consider the process which is initiated at time (N − k)h with the system in a general state (c1, c2), so that the process involves k decisions. We wish to determine the optimal decision for the controller to make under these circumstances, and we denote the optimal value of μ for each state (c1, c2) by μ_k(c1, c2).

For any choice of the system parameter μ, the state of the system is transformed from the state (c1, c2) at time (N − k)h into the state

    (c1 + c2 h, c2 − [μ(c1² − 1)c2 + c1] h + δh)

with probability p*, and into the state

    (c1 + c2 h, c2 − [μ(c1² − 1)c2 + c1] h − δh)

with probability 1 − p*. Consequently, once again using the principle of optimality, we obtain the recurrence relation

    f_k(c1, c2) = (|c1| + |c2|) h + Min_{m1 ≤ μ ≤ m2} { p* f_{k−1}(c1 + c2 h, c2 − [μ(c1² − 1)c2 + c1] h + δh)
                  + (1 − p*) f_{k−1}(c1 + c2 h, c2 − [μ(c1² − 1)c2 + c1] h − δh) },   (4)

k = 2, 3, ..., N. The first term represents the cost during the first period, from (N − k)h to (N − k + 1)h, and the second the minimal expected cost over the remaining k − 1 periods. As before in the deterministic case, we can determine computationally the desired functions f_k(c1, c2) and μ_k(c1, c2), using the foregoing recursive relations.

7. Adaptive Control

In some circumstances, even less information than was assumed in the previous section will be available to the controller concerning external influences which may affect the behavior of the system being controlled. Provision may be made, though, for the controller to "learn" about the nature of these influences as the process unfolds. It may then be able to improve its control decision-making capability in the course of time. In this sense the controller adapts itself to circumstances.

Observe that we are using the word in a quite precise sense. There is nothing mystical about the machine "thinking" or "creating" or "learning" in this restricted sense. That the human mind works in this way, or that the machine in any sense approximates the behavior of the human mind, can only be concluded on the basis of a rash evaluation of the possibilities of a digital computer or a brash contempt for the power of the human mind.

Let us return to our nonlinear system which is being disturbed by a random force. Let us now first deprive the controller of the knowledge of the exact value of p*. The controller still knows that r_n = ±δ, but the probability of each outcome is not known. Although this is an unpleasant situation, this controller is still much more fortunate than one that does not know even the form of the distributions of the variables r_n, or their degree of correlation. We shall not enter into a discussion of such matters here.

We can proceed with the design of an adaptive controller along the following lines. The state of the system will now be characterized not only by a position and a velocity, but also by a current estimate for p*, which in the absence of further information we shall agree to regard as the precise value of the probability that r_n = +δ.

At any particular stage of the process when a control decision is made, not only does the system change state physically, but, on the basis of the knowledge of the original physical state, the transformed physical state, and the parameter value chosen, the controller can determine the sign of the unknown force for that stage. This may lead the controller to change its estimate of p*. But how shall the estimate be changed?

Though there are many ways of answering this question, let us indicate one specific approach. Let us regard p itself as a random variable with an a priori probability density function w(x), i.e.,

    Prob(a ≤ p ≤ a + Δ) = w(a)Δ + O(Δ²).   (1)

As the initial estimate of p*, a value we shall call p_1, we take the expected value of p,

    p_1 = ∫₀¹ x w(x) dx.   (2)

Upon observing that a positive disturbing force is exerted, r = +δ, our new estimate of the probability density function for p will be given by

    w₊(x) = x w(x) / ∫₀¹ x w(x) dx.   (3)

Upon observing a negative disturbing force, r = −δ, we shall change our estimate of the probability density function to

    w₋(x) = (1 − x) w(x) / ∫₀¹ (1 − x) w(x) dx.   (4)

Here we have adopted a Bayes approach, [15]. This is the procedure adopted in [6,7,9]. Consequently, after observing a positive disturbing force, the new estimate of p* itself is
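The Bayes updates (3) and (4) can be carried out numerically by holding the density w(x) on a grid and renormalizing after each observed force. The sketch below is an illustration of ours, not the paper's procedure; the uniform prior and the 100-point midpoint grid are hypothetical choices.

```python
XS = [(i + 0.5) / 100 for i in range(100)]     # midpoint grid on (0, 1)
w = [1.0] * 100                                # uniform a priori density

def update(w, positive):
    """Multiply the density by x (positive force) or 1 - x (negative
    force), then renormalize so it integrates to one on (0, 1)."""
    w = [wi * (x if positive else 1.0 - x) for wi, x in zip(w, XS)]
    total = sum(w) / len(w)                    # midpoint-rule integral
    return [wi / total for wi in w]

def estimate(w):
    """Current estimate of p*: the mean of the posterior density."""
    return sum(wi * x for wi, x in zip(w, XS)) / len(w)

# After m positive and n negative forces under a uniform prior, the
# estimate should approach (m + 1)/(m + n + 2), the beta-style value
# of equation (7); e.g. three positive and one negative give about 2/3.
for _ in range(3):
    w = update(w, positive=True)
w = update(w, positive=False)
```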

heldfixed. Then we define the sequence of
functions.

fk(cl,c2;m,n) h
te expected cost of control
=
over the last k stages of a
and after observing a negative disturbing force, process in which i n i t i a l l y
the new estimate of p* i s the system is inphysical (10)
state (clc' 2 ) and m posi-
/- x(l - x)w(x)ax t i v e and n negative forces
havebeen observed, Using an
optimal control policy.
In taking expected values we shall assume that
current estimates of distributions are the true
Byway of s m r y we m y note that i f the a distributions, consistent with our previous
p r i o r i choice of probability density function i s practice.
w(x), then after m positive and n negative forces
havebeen observed the new estimate of the density For the k-stage
process we f i n d (U)
m c t i o n is

xm(l - xpW(x)
6 l x y 1 - X)%(X)dx '
+cl]h+Sh;m+l,n)
and p,, the new estim'e of p itself, i s

lxm+l(l - x)nw(x)ax
PmJn -4 lxm(l - x)nv(x)dx
(7) +cJbGh;m,n+l) ,
The controller i s t o a c t as i f this estimate i s f o r k=.2,3,...,N, and f o r the one-stage process
the exactvalueof p*. This should be recognized
as an assumption of our analysis.
The i n t e M s i n equation (7) s i m p ~ ~ if fy
w(x) is the density function for a beta distri-
bution,

where B(a,b) i s the beta function. G r e a t flexi-


b i l i t y i n the shape of the curve of w(x) can be
achieved by selecting the pamters a and b
appropriately. For this choiceof w(x) we have

4 xWa(l - x)n+bldx A s before, the functions rk and the decision


'm,n -l 1 xm+a-l(l - xln+b-ldx (9)
functions %(c1,c2;m,n)
recursively.
are t o be determined

me n u m e r i a r e s o l u t i o n of equations (11) and


- B(m + a + 1, n + b) ?resents some difficulties, however, i n that
(E)
- B(m + a, n + b) sequences of functions of four arguments a r e t o be
determined. Several methods present themselves
forconsideration, however. I n particular, we
- (m + a ) note that when m + n i s large we can essert with
(m +
a ) + (n + b )
some confidence that m/(m + n) i s a good est-te
The parameters a and b play the roles of t h e a f o r p*. The decisions called for in the solution
p r i o r i numbers of positive and negative forces of the stochastic control process discussed in 5'6
observed. If the sum i s small,not much w e i g h t should povide nearly optimal control decisions.
i s given t o t h e i n i t i a l estimate; i f it i s large, Some advantage may be gained by considering an
many periods of the process are required before i n f i n i t e stage process as i n 5 5 , which has the
the estimate of p* can be significantly changed. e f f e c t of eliminating one subscript. These, and
other matters, will be discussed i n a forthcoming
opt-
We now take up the problem of determining the
decisions for the controller to make.
thesis by M. Aoki fU] , .
F i r s t the function w(x) i s chosen and it i s then
7

Authorized licensed use limited to: the Leddy Library at the University of Windsor. Downloaded on December 08,2023 at 16:48:18 UTC from IEEE Xplore. Restrictions apply.
Notice that the adaptive controller discussed in this section does no "research" regarding the random variables. It merely observes the history of the process to date and combines this with its a priori knowledge in a way which is specified at the beginning of the process, in order to arrive at its current control decision. How the controller will act as the result of any particular observed history is fully specified initially. More sophisticated controllers will be designed to look for correlations, provide for non-stationarities in the unknown process, and so on. Much remains to be done along these lines.

8. An Extension

Let us now turn our attention to the case in which the system can be controlled both by modifying the system parameter p and also by exerting a control force, g_k, at time kh, where k = 0, 1, 2, ..., N - 1. The equation for the state of the system becomes

    x_{n+1} = x_n + v_n h,                                    x_0 = c_1,
    v_{n+1} = v_n - [p_n(x_n^2 - 1)v_n + x_n]h + r_n h + g_n h,    v_0 = c_2,    (1)

where now both p_n and g_n are at our disposal. As before, p_n is limited to lie within the region

    m_1 <= p_n <= m_2,    n = 0, 1, ..., N - 1,    (2)

and we shall also assume that the control force g_n is constrained by the inequality

    |g_n| <= G,    n = 0, 1, 2, ..., N - 1.

Upon introduction of the functions f_k(x, v; m, n), with

    f_1(c_1, c_2; m, n) = |c_1|h + |c_2|h + ...,

this case differs mathematically from the previous ones only in that the minimizations are now over a two-dimensional region rather than a one-dimensional region.

9. Discussion

In the earlier sections of this paper we have sketched a treatment of an adaptive control process from the dynamic programming viewpoint. Much remains to be done at various levels in the treatment of these fascinating control processes.

At the conceptual level, for example, models involving other types of uncertainties on the part of the controller, mentioned in §1, have yet to be constructed. One of the principal difficulties occurs in describing the state of knowledge of the controller and how this changes as new information is added. Furthermore, so much information may become available that a way must be found to summarize it succinctly without impairing the decision-making capability to a marked degree. In this connection, see the discussion of sufficient statistics in [14].

Insofar as the mathematical analysis itself is concerned, many perplexing problems arise, as, for example, questions concerning the convergence of discrete adaptive processes to continuous processes and the very formulation of adaptive processes of continuous type.
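The remark on sufficient statistics can be made concrete with a small sketch of our own (not the paper's): for the two-valued disturbing forces considered above, the pair of counts (m, n) summarizes the entire observed history, so any two histories with the same counts yield the same estimate of p* and hence the same control decision.

```python
# Sketch: the counts (m, n) of positive and negative forces are a
# sufficient statistic for the disturbance history; the posterior
# estimate of p* depends on the history only through (m, n).

def counts(history):
    """Reduce a history of +1/-1 forces to the pair (m, n)."""
    m = sum(1 for r in history if r > 0)
    return m, len(history) - m

def estimate(history, a=1, b=1):
    """Posterior-mean estimate of p*, computed only through (m, n)."""
    m, n = counts(history)
    return (m + a) / ((m + a) + (n + b))

h1 = [1, 1, -1, 1, -1, 1]    # 4 positive, 2 negative, one ordering
h2 = [-1, 1, 1, 1, -1, 1]    # same counts, different ordering
assert estimate(h1) == estimate(h2)   # identical decisions follow
```

This is the succinct summarization the text asks for: the growing history can be compressed to two integers without impairing the decision-making capability.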

Lastly, as we have already indicated, for the more realistic processes involving more state variables, the computational solutions present special problems of their own, all of which must be carefully investigated.

Another whole problem area beyond those already mentioned is encompassed by the actual construction of optimal adaptive controllers. Challenging problems arise in trying to pursue a straight and narrow path between the complexity of exact solutions and the fallibility of approximations.
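As one concrete instance of the computational questions raised above, the two-dimensional minimization of §8 might be carried out by a simple grid search over the rectangle m_1 <= p <= m_2, |g| <= G. The sketch below is ours, not the paper's: the one-stage cost |x|h + |v|h, the grid resolution, and the use of a current estimate p_est to weight the two possible disturbing forces are all illustrative assumptions.

```python
# A minimal sketch of the two-dimensional minimization of Section 8:
# at each stage both the parameter p and the control force g are chosen,
# subject to m1 <= p <= m2 and |g| <= G. Cost and grids are illustrative.

def step(x, v, p, g, r, h):
    """One Euler step of the controlled system of eq. (1)."""
    x_new = x + v * h
    v_new = v - (p * (x * x - 1.0) * v + x) * h + r * h + g * h
    return x_new, v_new

def best_choice(x, v, m1, m2, G, p_est, h, grid=11):
    """Grid search over [m1, m2] x [-G, G] for the pair (p, g)
    minimizing the expected next-stage cost |x|h + |v|h, with the
    random force r = +1 or r = -1 weighted by the estimate p_est."""
    best = None
    for i in range(grid):
        p = m1 + (m2 - m1) * i / (grid - 1)
        for j in range(grid):
            g = -G + 2.0 * G * j / (grid - 1)
            cost = 0.0
            for r, w in ((1.0, p_est), (-1.0, 1.0 - p_est)):
                xn, vn = step(x, v, p, g, r, h)
                cost += w * (abs(xn) + abs(vn)) * h
            if best is None or cost < best[0]:
                best = (cost, p, g)
    return best[1], best[2]

p, g = best_choice(x=0.5, v=0.0, m1=0.1, m2=1.0, G=2.0, p_est=0.5, h=0.1)
```

Even this crude search makes the computational burden visible: the grid must be revisited at every stage and for every (m, n) pair, which is exactly the difficulty the text points to for processes with more state variables.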

References

1. Stoker, J. J., Nonlinear Vibrations in Mechanical and Electrical Systems, Interscience Publishers, Inc., New York, 1950.
2. Bellman, R., 'On the application of the theory of dynamic programming to the study of control processes,' Proc. Symposium on Nonlinear Circuit Analysis, Polytechnic Institute of Brooklyn, Brooklyn, New York, 1957, pp. 199-213.

3. Bellman, R., 'Dynamic programming and stochastic control processes,' Information and Control, vol. 1, 1958, pp. 228-239.

4. Bellman, R., Dynamic Programming, Princeton University Press, Princeton, New Jersey, 1957.

5. Bellman, R., and R. Kalaba, 'On the role of dynamic programming in statistical communication theory,' IRE Trans. on Information Theory, vol. IT-3, 1957, pp. 197-203.

6. Bellman, R., and R. Kalaba, 'On communication processes involving learning and random duration,' 1958 IRE National Convention Record, part 4, July 1958, pp. 16-21.

7. Bellman, R., and R. Kalaba, Dynamic Programming and Adaptive Processes--I: Mathematical Foundation, The RAND Corporation, Paper P-1416, July 3, 1958.

8. Bellman, R., and R. Kalaba, 'On the principle of invariant imbedding and propagation through inhomogeneous media,' Proc. Nat. Acad. Sci. USA, vol. 42, 1956, pp. 629-632.

9. Bellman, R., 'A problem in the sequential design of experiments,' Sankhya, vol. 16, 1956, pp. 221-229.

10. Aseltine, J. A., A. R. Mancini, and C. W. Sarture, 'A survey of adaptive control systems,' IRE Trans. on Automatic Control, PGAC-6, Dec. 1958, pp. 102-108.

11. Aoki, M., Ph.D. Thesis, University of California at Los Angeles, to appear.

12. Freimer, M., Ph.D. Thesis, Harvard University, to appear.

13. Bellman, R., and R. Kalaba, On k-th Best Policies, The RAND Corporation, Paper P-1417, July 1958.

14. Mood, A. M., Introduction to the Theory of Statistics, McGraw-Hill Book Company, Inc., New York, 1950.

15. Cramer, H., Mathematical Methods of Statistics, Princeton University Press, Princeton, New Jersey, 1951.

16. Courant, R., and D. Hilbert, Methods of Mathematical Physics, vol. 1, Interscience Publishers, Inc., New York, 1953.