Computational Nonlinear Stochastic Control based on the
Fokker-Planck-Kolmogorov Equation
Mrinal Kumar∗, S. Chakravorty†, John L. Junkins‡
February 13, 2008
Abstract
The optimal control of nonlinear stochastic systems is considered in this paper. The central role played by the
Fokker-Planck-Kolmogorov equation in the stochastic control problem is shown under the assumption of asymptotic
stability. A computational approach for the problem is devised based on policy iteration/successive approximations,
and a finite dimensional approximation of the control parametrized diffusion operator, i.e., the controlled Fokker-Planck operator. Several numerical examples are provided to show the efficacy of the proposed computational
methodology.
1 Introduction
Numerous fields of science and engineering present the problem of uncertainty propagation through nonlinear dynamic
systems.1 One may be interested in the determination of the response of engineering structures - beams/plates/entire
buildings under random excitation (in structure mechanics2 ), or the motion of particles under the influence of stochastic force fields (in particle physics3 ), or the computation of the prediction step in the design of a Bayesian filter (in
filtering theory)4, 5 among others. All these applications require the study of the time evolution of the Probability
Density Function (PDF), p(t, x), corresponding to the state, x, of the relevant dynamic system. The pdf is given
by the solution to the Fokker-Planck-Kolmogorov equation (FPE), which is a PDE in the pdf of the system, defined
by the underlying dynamical system’s parameters. In this paper, approximate solutions of the FPE are considered,
and subsequently leveraged for the design of controllers for nonlinear stochastic dynamical systems. In the past few
years, the authors have developed a generalized multi-resolution meshless FEM methodology, partition of unity FEM
(PUFEM), utilizing the recently developed GLOMAP (Global Local Orthogonal MAPpings) methodology to provide
the partition of unity functions6, 7 and the orthogonal local basis functions, for the solution of the FPE. The PUFEM
is a Galerkin projection method and the solution is characterized in terms of a finite dimensional representation of
the Fokker-Planck operator underlying the problem. The methodology is also highly amenable to parallelization.8–12
Though the FPE is invaluable in quantifying the uncertainty evolution through nonlinear systems, perhaps its greatest
benefit may be in the stochastic analysis, design and control of nonlinear systems. In the context of nonlinear stochastic
control, Markov Decision Processes have long been one of the most widely used methods for discrete time stochastic
control. However, the Dynamic Programming (DP) equations underlying MDPs suffer from the curse of dimensionality.13–15 Various approximate Dynamic Programming (ADP) methods have been proposed in the past several years for overcoming the curse of dimensionality,15–19 and can broadly be grouped under functional reinforcement learning. These are essentially model-free methods for approximating the optimal control policy in stochastic optimal control problems. They generally fall into three categories: value function approximation methods,15 policy gradient/approximation methods17, 18 and actor-critic methods.16, 19 These methods attempt to reduce the dimensionality of the DP problem through a compact parametrization of the value function (with respect to a policy or the optimal function) and the policy function. They differ mainly in the parametrization employed to achieve this goal, which ranges from nonlinear function approximators such as neural networks16 to linear approximation architectures.15, 20 These methods learn the optimal policies by repeated simulations on a dynamical system and thus can take a long time to converge to a good policy, especially when the problem has continuous state and control spaces. In contrast, the methodology proposed here is model-based, and uses the finite dimensional representation of the underlying parametrized diffusion operator to parametrize both the value function and the control policy in the stochastic optimal control problem. Considering the low order finite dimensional controlled diffusion operator allows us to significantly reduce the dimensionality of the planning problem, while providing a computationally efficient recursive method for obtaining progressively better control policies.

∗ Mrinal Kumar is a Ph.D. student in the Department of Aerospace Engineering at Texas A&M University, College Station, TX-77843, mrinal@neo.tamu.edu, http://people.tamu.edu/˜mrinal.
† Suman Chakravorty is an Assistant Professor with the Faculty of Aerospace Engineering at Texas A&M University, College Station.
‡ John L. Junkins is the Royce E. Weisenbaker Distinguished Professor of Aerospace Engineering at Texas A&M University, College Station.
The literature on computational methods for solving continuous time stochastic control problems is relatively sparse
when compared to the discrete time problem. One of the approaches is through the use of locally consistent Markov
decision processes.21 In this approach, the continuous controlled diffusion operator is approximated by a finite dimensional Markov chain which satisfies certain local consistency conditions, namely that its drift and diffusion coefficients
match that of the original process locally. The resulting finite state MDP is solved by standard DP techniques such as
value iteration and policy iteration. The method relies on a finite difference discretization and thus, can be computationally very intensive in higher dimensional spaces. In another approach,22, 23 the diffusion process is approximated
by a finite dimensional Markov chain through the application of generalized cell to cell mapping.24 However, even
this method suffers from the curse of dimensionality because it involves discretizing the state space into a grid which
becomes increasingly infeasible as the dimension of the system grows. Also, finite difference and finite element methods
have been applied directly to the nonlinear Hamilton-Jacobi-Bellman partial differential equation.25, 26 The method
proposed here differs in that it uses policy iteration in the original infinite dimensional function space, along with a
finite dimensional representation of the controlled diffusion operator in order to solve the problem. Considering a lower
order approximation of the underlying operator results in a significant reduction in dimensionality of the computational problem. With the policy iteration algorithm, one typically has to solve only a short sequence of linear
equations (typically < 5) before practical convergence is obtained, as opposed to solving a high dimensional nonlinear
equation if the original nonlinear HJB equation is solved.
The literature for solving deterministic optimal control problems in continuous time is relatively mature when compared to its stochastic counterpart. The method of successive approximations/ policy iteration has widely been used
to solve deterministic optimal control problems. Various different methods have been proposed for solving the policy
evaluation step in the policy iteration algorithms that include Galerkin-based methods27, 28 and neural network based
methods29–33 among others. The methodology outlined here can be viewed as an extension of this extensive body of
work to the stochastic optimal control problem. However, it should be noted that the extension is far from trivial, as
the stochastic optimal control problem, especially the control of degenerate diffusions, has pathologies that have
to be treated carefully in order to devise effective computational methods for solving such problems. We would like
to mention that we are interested in the feedback control of dynamical systems here, i.e., in the solution of the HJB
equation as opposed to solving the open loop optimal control problem based on the Pontryagin minimum principle34
that results in a two-point boundary value problem involving the states and co-states of the dynamical system. Please
see references34–36 for more on this approach to solving the optimal control problem. Also, we would like to state here
that, to the best of our knowledge, there is no stochastic equivalent to the two-point boundary value problems that
result from the Pontryagin minimum principle being applied to the deterministic optimal control problem.
The rest of the paper is arranged as follows. Section 2 introduces the forward and backward Kolmogorov equations
and shows their equivalence under the condition of asymptotic stability. Section 3 outlines the computational method
used to obtain a finite dimensional representation of the infinite dimensional Fokker-Planck or forward Kolmogorov
equation (FPE), and the backward Kolmogorov equation (BKE). In Section 4, the finite dimensional representations of the
FPE and BKE operators are used to outline a recursive procedure, based on policy iteration, to obtain the solution
to the stochastic nonlinear control problem. In Section 5, the methodology is applied to various different nonlinear
control problems.
2 The Forward and Backward Kolmogorov Equations
In this section, we shall introduce the forward Kolmogorov or the Fokker-Planck Equation (FPE) and the Backward
Kolmogorov equation. While the former is central to the propagation of uncertainty and nonlinear filtering of dynamical
systems, the latter is paramount in the solution of stochastic control problems. It can be shown that if the FPE is
asymptotically stable, then the forward and backward equations are equivalent in a sense to be made precise later in
this section.
For the sake of simplicity, we shall consider the case of a one-dimensional system in this section; the results are easily
generalized to multidimensional systems. Consider the stochastic differential equation:
\[ \dot{x} = f(x) + g(x) W, \tag{1} \]
where W represents white noise. Associated with the above equation is the Forward Kolmogorov or the Fokker-Planck
Equation (FPE):
\[ \frac{\partial p}{\partial t} = -\frac{\partial (f p)}{\partial x} + \frac{1}{2} \frac{\partial^2 (g^2 p)}{\partial x^2}, \tag{2} \]
and the Backward Kolmogorov Equation (BKE):
\[ \frac{\partial p}{\partial t} = f \frac{\partial p}{\partial x} + \frac{g^2}{2} \frac{\partial^2 p}{\partial x^2}. \tag{3} \]
The FPE governs the evolution of the probability density function of the stochastic system Eq.1 forward in time while
the BKE governs the evolution of the uncertainty (pdf) in the system backward in time.
The FPE is said to be asymptotically stable if there exists a unique pdf p∞ such that any initial condition, either
deterministic or probabilistic, decays to the pdf p∞ as time goes to infinity. In the following, we show the equivalence
of the BKE and the FPE under the condition of asymptotic stability of the FPE.
Since the FPE is asymptotically stable, let the pdf at any time be represented as follows:
\[ p(x, t) = p_\infty(x)\, \bar{p}(x, t), \tag{4} \]
i.e., as the product of the stationary pdf and a time varying part. Substituting into the FPE, we obtain:
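\[ \frac{\partial (p_\infty \bar{p})}{\partial t} = -\frac{\partial (f p_\infty \bar{p})}{\partial x} + \frac{1}{2} \frac{\partial^2 (g^2 p_\infty \bar{p})}{\partial x^2}, \tag{5} \]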
which after using the fact that p∞ is the stationary solution, i.e., that
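\[ \frac{\partial p_\infty}{\partial t} = -\frac{\partial (f p_\infty)}{\partial x} + \frac{1}{2} \frac{\partial^2 (g^2 p_\infty)}{\partial x^2} = 0, \tag{6} \]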
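and expanding the various terms in Eq. 5 leads us to the following identity:
\[ p_\infty \frac{\partial \bar{p}}{\partial t} = -f p_\infty \frac{\partial \bar{p}}{\partial x} + \frac{1}{2} g^2 p_\infty \frac{\partial^2 \bar{p}}{\partial x^2} + \frac{\partial \bar{p}}{\partial x} \frac{\partial (g^2 p_\infty)}{\partial x}. \tag{7} \]
Consider the term −f p∞ (∂p̄/∂x) + (∂p̄/∂x)(∂(g² p∞)/∂x) in the above equation. Note from Eq. 6 that
\[ \frac{\partial}{\partial x} \left( -f p_\infty + \frac{1}{2} \frac{\partial (g^2 p_\infty)}{\partial x} \right) = 0, \tag{8} \]
and hence,
\[ -f p_\infty + \frac{1}{2} \frac{\partial (g^2 p_\infty)}{\partial x} = \text{const.} \tag{9} \]
Hence, if we assume that the invariant distribution is well-behaved such that the terms f p∞ → 0 and ∂(g² p∞)/∂x → 0 as
x → ±∞, i.e., if the stationary distribution is not heavy-tailed, then
\[ -f p_\infty + \frac{1}{2} \frac{\partial (g^2 p_\infty)}{\partial x} = 0, \tag{10} \]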
and hence, substituting the above into Eq. 7, and dividing throughout by p∞ (x), it follows that
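\[ \frac{\partial \bar{p}}{\partial t} = f \frac{\partial \bar{p}}{\partial x} + \frac{g^2}{2} \frac{\partial^2 \bar{p}}{\partial x^2}, \tag{11} \]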
i.e., the time varying part of the pdf follows the BKE.
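As a quick sanity check of Eqs. 10 and 11, consider an illustrative special case that is not part of the original development: the scalar Ornstein-Uhlenbeck process with f(x) = −a x, a > 0, and constant g(x) = σ, assuming white noise of unit intensity. The symbols a and σ are introduced here only for this example.

```latex
p_\infty(x) \propto \exp\!\left( -\frac{a x^2}{\sigma^2} \right), \qquad
-f p_\infty + \frac{1}{2} \frac{\partial (g^2 p_\infty)}{\partial x}
  = a x\, p_\infty + \frac{\sigma^2}{2} \left( -\frac{2 a x}{\sigma^2} \right) p_\infty = 0,
\\[4pt]
\frac{\partial \bar{p}}{\partial t}
  = -a x \frac{\partial \bar{p}}{\partial x} + \frac{\sigma^2}{2} \frac{\partial^2 \bar{p}}{\partial x^2}.
```

The first line confirms that the Gaussian stationary density satisfies the zero-flux condition of Eq. 10, and the second line is Eq. 11 written out for this drift and diffusion.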
Denote by Lf (.) and Lb (.) the FPE and BKE operators respectively, i.e.,
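\[ L_f(p) = -\frac{\partial (f p)}{\partial x} + \frac{1}{2} \frac{\partial^2 (g^2 p)}{\partial x^2}, \tag{12} \]
\[ L_b(p) = f \frac{\partial p}{\partial x} + \frac{g^2}{2} \frac{\partial^2 p}{\partial x^2}. \tag{13} \]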
The BKE operator is the formal adjoint of the FPE operator. Under the condition of asymptotic stability of the
FPE, and in view of the above development, the BKE and FPE are equivalent in the following sense: if ψ(x) is an
eigenfunction of the BKE operator Lb (.) with corresponding eigenvalue λ, then p∞ (x)ψ(x) is an eigenfunction of the
FPE operator with the same eigenvalue λ. Hence, the knowledge of the eigenfunctions of the BKE operator along with
knowledge of the stationary distribution of the FPE is sufficient for identifying the eigenstructure of the FPE, and
vice-versa. Hence, under the condition of asymptotic stability, knowledge of the stationary solution along with any
one of the FPE or the BKE operators is sufficient to completely determine the characteristics of the other operator.
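The eigenfunction correspondence can be checked numerically on a simple case. The sketch below assumes an Ornstein-Uhlenbeck process, f(x) = −A x and g(x) = SIGMA, which is not an example from the paper; the parameter values and function names are hypothetical. For this process ψ(x) = x satisfies Lb(ψ) = −A ψ, so p∞(x) ψ(x) should be an eigenfunction of the FPE operator with the same eigenvalue −A.

```python
import math

A, SIGMA = 1.5, 0.8   # assumed OU parameters: f(x) = -A x, g(x) = SIGMA


def p_inf(x):
    """Stationary density of the OU process: Gaussian with variance SIGMA^2 / (2 A)."""
    var = SIGMA**2 / (2.0 * A)
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def psi(x):
    """BKE eigenfunction: L_b(x) = f(x) * 1 + (g^2 / 2) * 0 = -A x, so lambda = -A."""
    return x


def fpe_operator(p, x, h=1e-3):
    """L_f(p) = -d(f p)/dx + 0.5 d^2(g^2 p)/dx^2, evaluated by central differences."""
    def fp(y):
        return -A * y * p(y)

    d_fp = (fp(x + h) - fp(x - h)) / (2.0 * h)
    d2_p = (p(x + h) - 2.0 * p(x) + p(x - h)) / (h * h)
    return -d_fp + 0.5 * SIGMA**2 * d2_p


def max_eigen_residual(points):
    """Largest |L_f(p_inf * psi) - lambda * p_inf * psi| over the sample points."""
    def candidate(y):
        return p_inf(y) * psi(y)

    return max(abs(fpe_operator(candidate, x) + A * candidate(x)) for x in points)


if __name__ == "__main__":
    print(max_eigen_residual([-1.0, -0.3, 0.2, 0.7, 1.5]))
```

The residual is limited only by the finite-difference step, which is consistent with p∞ψ being an eigenfunction of the forward operator for this test case.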
In the following, we outline some of the other important properties of the FPE/ BKE operators under the condition
of asymptotic stability of the FPE. Consider the following Hilbert space, L2 (p∞ ), i.e., the L2 space where the inner
product is weighted by the stationary distribution:
\[ \langle f, g \rangle_\infty = \int f(x)\, g(x)\, p_\infty(x)\, dx. \tag{14} \]
Then, from the development above it is quite simple to show that the BKE operator is a symmetric operator in L2 (p∞ ),
i.e.,
\[ \langle L_b(f), g \rangle_\infty = \langle f, L_b(g) \rangle_\infty, \quad \forall\, f, g \in L^2(p_\infty). \tag{15} \]
Under the condition that the stationary distribution is not heavy tailed, the semi-group associated with the BKE
operator is compact as well as self-adjoint and thus, the operator has a discrete spectrum. Moreover, the eigenfunctions
of the operator also form a complete orthonormal basis for L2 (p∞ ). However, the sufficient conditions for the existence
of a discrete spectrum are not well-known in the multidimensional case, especially for degenerate diffusions (please see
below), to the best of the knowledge of the authors. However, in obtaining the finite dimensional representations of the
operators, the symmetry of the operator is enough to assure the well-posedness of the associated problems. In the case
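of multidimensional problems, the FPE and the BKE operators are respectively given by the following expressions:
\[ L_f(p) = -\sum_i \frac{\partial}{\partial x_i} (f_i\, p) + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j} \left( (G G^t)_{ij}\, p \right), \tag{16} \]
\[ L_b(V) = \sum_i f_i \frac{\partial V}{\partial x_i} + \frac{1}{2} \sum_{i,j} (G G^t)_{ij} \frac{\partial^2 V}{\partial x_i \partial x_j}, \tag{17} \]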
given the nonlinear stochastic multidimensional system
\[ \dot{X} = F(X) + G(X) W, \tag{18} \]
\[ F(X) = [f_1(X), \cdots, f_n(X)], \tag{19} \]
where W is multidimensional white noise. Note that in most multidimensional systems, especially mechanical ones,
the matrix G is rank deficient since the input to the systems is at only the velocity level variables, and hence, so is the
matrix GGt . In general, a diffusion operator may be written as
\[ D(V) \equiv \sum_i a_i(X) \frac{\partial V}{\partial x_i} + \frac{1}{2} \sum_{i,j} d_{ij}(X) \frac{\partial^2 V}{\partial x_i \partial x_j}, \tag{20} \]
where A(X) = [a1 (X), · · · , aN (X)] is a drift vector and D(X) = [dij (X)] is a diffusion matrix. In the case when
D(X) is not strictly positive definite for all X, the diffusion process is termed degenerate (it is also sometimes said that the
diffusion process violates the uniform ellipticity condition). For diffusions arising out of stochastic dynamical systems,
i.e., the BKE operator, it follows that they are degenerate diffusions due to the rank deficiency of G. In general, the
theoretical treatment of the solution of such equations is made very difficult by the degeneracy because classical solutions to PDEs governed by such operators may not exist. In these cases, the generalized notion of viscosity solutions
needs to be introduced.37–39
The FPE and the BKE are paramount in the propagation of uncertainty through, and the filtering and control
of stochastic nonlinear systems. As mentioned above, the theoretical treatment of these operators in the
multidimensional case is very difficult; nevertheless, we may obtain a finite dimensional, i.e., a matrix representation
of the above operators in order to practically solve these problems, at least in an approximate fashion within the
resulting finite dimensional space. We shall show the exact application of the finite dimensional representation of the
operators to uncertainty propagation, filtering and control in the next two sections.
3 Finite Dimensional Representation of the FPE/ BKE operators
As mentioned in the previous section, the FPE and the BKE operators are the key to the analysis, filtering and control
of stochastic dynamical systems. However, these are infinite dimensional operators and in order to be practically
feasible, a finite dimensional representation of these operators needs to be obtained. The Galerkin residual projection
method is one of the most powerful methods of obtaining such a representation and is outlined in the following.
The Galerkin projection method is a general method to solve partial differential equations and is often used to solve the
FPK equation. Let p(t, x) represent the time varying solution to the FPK equation (2). Let {φ1 (x), · · · , φn (x), · · · }
represent a set of basis functions on some compact domain D. The various choices of the basis functions lead to different
methodologies. The Galerkin method makes the assumption that the pdf p(t, x) can be written as a linear combination
of the basis functions φi (x) with time varying coefficients, i.e.,
\[ p(t, x) = \sum_{i=1}^{\infty} c_i(t)\, \phi_i(x). \tag{21} \]
The key to obtaining the solution to the pdf p(t, x) is to find the coefficients ci (t) in the above equation. The Galerkin
method achieves this by truncating the expansion of p(t, x) for some finite N and projecting the equation error onto
the set of basis functions {φi (x)}, i.e.,
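\[ p(t, x) \approx \sum_{i=1}^{N} c_i(t)\, \phi_i(x), \tag{22} \]
and
\[ \left\langle \frac{\partial}{\partial t} \sum_{i=1}^{N} c_i(t)\, \phi_i(x) - L_f\!\left( \sum_{i=1}^{N} c_i(t)\, \phi_i(x) \right),\ \phi_j(x) \right\rangle = 0, \tag{23} \]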
where < ., . > represents the inner product on L2 (D), the space of square-integrable functions on the domain D, and
Lf represents the FPE operator. Note that the above represents a finite set of differential equations in the coefficients
{ci (t)}, which can be solved for the coefficients ci (t), given by:
\[ \sum_{i=1}^{N} \dot{c}_i \langle \phi_i, \phi_j \rangle - \sum_{i=1}^{N} c_i \langle L_f(\phi_i), \phi_j \rangle = 0, \quad \forall j. \tag{24} \]
Let
\[ M = [\langle \phi_i, \phi_j \rangle], \tag{25} \]
\[ K = [\langle L_f(\phi_i), \phi_j \rangle]. \tag{26} \]
The pair (M, K) represents the finite dimensional approximation of the FPE operator. If the basis functions are
mutually orthogonal, then the stiffness matrix K represents the finite dimensional representation of the FPE operator.
An eigendecomposition of the pair (M, K) yields the approximation of the eigenstructure of the infinite dimensional
FPE operator. The uncertainty propagation problem, as well as the prediction step of the filtering problem, involves
the solution of the linear differential equation
\[ M \dot{C} - K C = 0, \tag{27} \]
over a finite or infinite time interval. Given the eigenstructure of the above LTI system has been found and stored
a priori, the above equation can be trivially solved in real-time. Moreover, if the FPK equation is asymptotically
stable, it can be shown after some work that the eigenvalues of the FPE operator are all real and non-positive. In the
following section, we shall show how the FPE/ BKE operator may be used to solve the stochastic nonlinear control
problem.
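The assembly of the pair (M, K) and the extraction of its eigenstructure can be illustrated with a small sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the test system dx = −x dt + √2 dW (whose FPE operator has the known spectrum 0, −1, −2, ...), the basis φn(x) = xⁿ exp(−x²/2), the grid-based quadrature, and the function names. The sketch uses the convention M[j, i] = <φi, φj>, K[j, i] = <Lf(φi), φj>, so that the projected dynamics read M dc/dt = K c.

```python
import numpy as np
from numpy.polynomial import Polynomial


def galerkin_fpe_ou(n_basis=5, half_width=10.0, n_grid=4001):
    """Assemble (M, K) for the FPE of dx = -x dt + sqrt(2) dW (assumed test system).

    Basis (an assumption of this sketch): phi_n(x) = x^n * exp(-x^2 / 2).
    """
    x = np.linspace(-half_width, half_width, n_grid)
    dx = x[1] - x[0]
    weight = np.exp(-0.5 * x**2)
    X = Polynomial([0.0, 1.0])

    def ddx(p):
        # d/dx [p(x) exp(-x^2/2)] = [p'(x) - x p(x)] exp(-x^2/2)
        return p.deriv() - X * p

    phi, Lf_phi = [], []
    for n in range(n_basis):
        p = Polynomial.basis(n)
        # For f = -x and g^2 = 2: L_f(phi) = d(x phi)/dx + d^2(phi)/dx^2
        Lp = ddx(X * p) + ddx(ddx(p))
        phi.append(p(x) * weight)
        Lf_phi.append(Lp(x) * weight)

    phi, Lf_phi = np.array(phi), np.array(Lf_phi)
    # Riemann sums; the integrands are negligible at the ends of the grid.
    M = phi @ phi.T * dx          # M[j, i] = <phi_i, phi_j>
    K = phi @ Lf_phi.T * dx       # K[j, i] = <L_f(phi_i), phi_j>
    return M, K


def fpe_spectrum(M, K):
    """Eigenvalues of the projected operator M^{-1} K, sorted in ascending order."""
    return np.sort(np.linalg.eigvals(np.linalg.solve(M, K)).real)


if __name__ == "__main__":
    M, K = galerkin_fpe_ou()
    print(fpe_spectrum(M, K))
```

For this particular basis the span is invariant under the FPE operator, so the Galerkin eigenvalues reproduce the leading part of the exact spectrum: all real and non-positive, with a zero eigenvalue for the stationary density. Once the eigenvectors are stored, the coefficient vector at any time follows from exponentials of the eigenvalues rather than from time stepping.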
The Galerkin method is not the only method that is available in order to obtain the finite dimensional representation of
the FPE/ BKE operator. The generalized Galerkin residual projection method can be used to obtain the representation.
In the generalized procedure, the residual (or equation error) obtained by substituting the approximate solution into
the equation of interest, is projected onto a set of generalized functions instead of the basis functions themselves.
For instance, collocation/ pseudospectral methods involve projecting the equation error onto a set of delta functions
located at a specially selected set of collocation points.35, 36 A full comparison of these methods for the computational
solution of the FPE/ BKE equations is beyond the scope of this paper. However, we would like to mention that
the generalized FEM based methods developed by the authors8–12 have consistently outperformed other competing
Galerkin residual projection methods.
4 Nonlinear Stochastic Control
An iterative approach to solving the Hamilton-Jacobi-Bellman (HJB) equation is presented now for the optimal control
problems of nonlinear stochastic dynamical systems. Consider the following dynamical system:
\[ \dot{x} = f(x) + g(x) u + h(x) w(t); \quad x(t = 0) = x_0 \tag{28} \]
where, u is the control input (vector) and w represents white noise excitation. We are interested in minimizing the
following discounted cost functional over the infinite horizon:
\[ J(x_0) = E\left[ \int_0^\infty \left( l(\varphi(t)) + \| u(\varphi(t)) \|_R^2 \right) e^{-\beta t}\, dt \right] \tag{29} \]
where ϕ(t) is a trajectory that solves Eq. 28, l is often referred to as the state-penalty function, and β is a given
discount factor. The infinite horizon problem results in a stationary, i.e., time invariant, control law. However, for
systems with additive noise, i.e., when h(x) is independent of x, as would be the case for instance in the LQG
problem, the undiscounted cost-to-go is unbounded since the trajectories of the system never decay to zero. Hence, in that case, the
discounting is a practical step to ensure a bounded cost-to-go function. The discount factor may also be interpreted
as imposing an effective finite horizon over which the control law tries to optimize the system performance. For this problem, the optimal
control law, u∗ (x), is known to be given in terms of a value function V ∗ (x) as follows:
\[ u^*(x) = -\frac{1}{2} R^{-1} g^T \frac{\partial V^*}{\partial x}(x) \tag{30} \]
where, the value function V ∗ (x) solves the following well known (stationary) Hamilton-Jacobi Bellman equation:
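\[ \frac{\partial V^*}{\partial x}^T f + \frac{1}{2} h Q h^T \frac{\partial^2 V^*}{\partial x^2} + l - \frac{1}{4} \frac{\partial V^*}{\partial x}^T g R^{-1} g^T \frac{\partial V^*}{\partial x} - \beta V^* = 0, \quad V^*(0) = 0 \tag{31} \]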
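The step connecting Eqs. 30 and 31 can be sketched as follows, assuming that ‖u‖²R denotes uᵀRu with R symmetric positive definite (an interpretation suggested by the quadratic control penalty, not stated explicitly here). Starting from the HJB equation with an explicit minimization over u:

```latex
\min_u \left[ \frac{\partial V^*}{\partial x}^T (f + g u) + u^T R u \right]
  + l + \frac{1}{2} h Q h^T \frac{\partial^2 V^*}{\partial x^2} - \beta V^* = 0,
\\[4pt]
g^T \frac{\partial V^*}{\partial x} + 2 R u = 0
  \;\Longrightarrow\;
  u^* = -\frac{1}{2} R^{-1} g^T \frac{\partial V^*}{\partial x},
\\[4pt]
\frac{\partial V^*}{\partial x}^T g u^* + {u^*}^T R u^*
  = \left( -\frac{1}{2} + \frac{1}{4} \right)
    \frac{\partial V^*}{\partial x}^T g R^{-1} g^T \frac{\partial V^*}{\partial x}
  = -\frac{1}{4} \frac{\partial V^*}{\partial x}^T g R^{-1} g^T \frac{\partial V^*}{\partial x}.
```

The second line is the first-order optimality condition, and the third line shows how the cross term and the control penalty combine into the −1/4 quadratic term of the stationary HJB equation.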
The above framework (solving for the optimal control law, u∗ (x)) is known as the H2 control paradigm. Notice that
designing the optimal control in the above framework requires solving a nonlinear PDE (Eq.31), which is in general
a very difficult problem. It is however possible to restructure the above equations so that the central problem can
be reduced to solving a sequence of linear PDEs - a much easier proposition. This is achieved via the substitution of
Eq.30 into the quadratic term involving the gradient of the value function in the HJB (Eq.31). Doing so gives us the
following equivalent form of the HJB:
\[ \frac{\partial V^*}{\partial x}^T (f + g u) + \frac{1}{2} h Q h^T \frac{\partial^2 V^*}{\partial x^2} + l + \| u \|_R^2 - \beta V^* = 0 \tag{32} \]
The above form of the HJB is known as the generalized HJB. Notice that the substitution discussed above has converted
the nonlinear PDE to a linear PDE in the value function, V ∗ (x). Eq.32 forms the core of a policy iteration algorithm
in which the optimal control law can be obtained by iterating upon an initial stabilizing control policy as follows:
1. Let u(0) be an initial stabilizing control law (policy) for the dynamical system (Eq. 28), i.e., the FPE corresponding to the closed loop under u(0) is asymptotically stable.
2. For i = 0 to ∞
• Solve for V (i) from:
\[ \frac{\partial V^{(i)}}{\partial x}^T (f + g u^{(i)}) + \frac{1}{2} h Q h^T \frac{\partial^2 V^{(i)}}{\partial x^2} + l + \| u^{(i)} \|_R^2 - \beta V^{(i)} = 0 \tag{33} \]
• Update policy as:
\[ u^{(i+1)}(x) = -\frac{1}{2} R^{-1} g^T \frac{\partial V^{(i)}}{\partial x}(x) \tag{34} \]
3. End
[Policy Iteration]
The convergence of the above algorithm has been proven for the deterministic case (Q = 0) in references27, 28 and is
true for the stochastic case too.40 However, conditions for the existence of classical solutions to the linear elliptic PDE
in the policy evaluation step (Eq. 33), and the asymptotic stability of the closed loop systems under the resulting
control policies, in the sense that the associated FPE is stable, have not been obtained to the best knowledge of the
authors. In this paper, we shall assume that the policy evaluation step admits a classical solution as well as that the
sequence of control policies generated asymptotically stabilize the closed loop system. In fact, these assumptions are
borne out by the numerical examples considered in the next section. The great advantage of using the policy iteration
algorithm over directly trying to solve the nonlinear HJB equation is that the algorithm typically converges in very
few steps (≤ 5).
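The structure of the iteration, and its fast convergence, can be seen in a toy case where the policy evaluation step has a closed-form solution. The sketch below assumes a scalar linear system dx = (a x + b u) dt + h dW with state penalty l = q x² and control penalty r u²; this linear-quadratic setting, the parameter values and the function names are illustrative assumptions, not one of the paper's examples. With the policy u = −k x, the linear policy evaluation equation is solved exactly by V(x) = P x² + c0, and the policy update reduces to k = b P / r.

```python
import math


def policy_iteration_scalar_lq(a, b, q, r, beta, noise, k0, n_iter=10):
    """Policy iteration for dx = (a x + b u) dt + h dW, cost E int (q x^2 + r u^2) e^{-beta t} dt.

    `noise` stands for h Q h^T. Returns a list of (P, c0, k) per iteration, where
    V(x) = P x^2 + c0 is the evaluated value function and u = -k x the updated policy.
    """
    k = k0
    history = []
    for _ in range(n_iter):
        closed_loop = a - b * k
        if beta - 2.0 * closed_loop <= 0.0:
            raise ValueError("policy does not give a bounded discounted cost")
        # Policy evaluation: match the x^2 and constant terms of the linear PDE.
        P = (q + r * k * k) / (beta - 2.0 * closed_loop)
        c0 = noise * P / beta
        # Policy update: u = -(1/2) R^{-1} g^T dV/dx = -(b P / r) x.
        k = b * P / r
        history.append((P, c0, k))
    return history


def riccati_root(a, b, q, r, beta):
    """Positive root of (b^2 / r) P^2 - (2a - beta) P - q = 0, the stationary HJB for this case."""
    s = 2.0 * a - beta
    return (s + math.sqrt(s * s + 4.0 * b * b * q / r)) * r / (2.0 * b * b)


if __name__ == "__main__":
    for step, (P, c0, k) in enumerate(policy_iteration_scalar_lq(1.0, 1.0, 1.0, 1.0, 0.1, 0.5, 3.0)):
        print(step, P, c0, k)
    print("HJB root:", riccati_root(1.0, 1.0, 1.0, 1.0, 0.1))
```

In this toy case the noise only enters the constant c0 = noise · P / β, which grows without bound as β → 0; this mirrors the role of the discount factor discussed above for systems with additive noise.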
Let us take a closer look at Eq.33 - writing it in operator form, we have:
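\[ (L - \beta) V = q \tag{35} \]
where,
\[ L = (f + g u)^T \frac{\partial}{\partial x} + \frac{1}{2} h Q h^T \frac{\partial^2}{\partial x^2} \tag{36} \]
\[ q = -(l + \| u \|_R^2) \tag{37} \]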
Notice that the operator L() is identical to the BKE operator of the closed loop under control policy u. The approximation for the value function is then written in terms of basis functions in the following manner:
\[ \hat{V} = \sum_{i=1}^{N} c_i\, \phi_i(x) \tag{38} \]
where, φi (x) could be local or global basis functions depending on the nature of the approximation being used. In the
current paper, we use global polynomials (i.e., φi (·) are polynomials supported on Ω). Then, following the Galerkin
approach, we project the residual error resulting from plugging the approximation (Eq.38) into the generalized HJB
(Eq.33) on to the space of polynomials on Ω as follows:
\[ \sum_{i=1}^{N} c_i \int_\Omega (L - \beta)[\phi_i(x)]\, \phi_j(x)\, d\Omega = \int_\Omega q\, \phi_j\, d\Omega, \quad j = 1, 2, \ldots, N \tag{39} \]
Eq.39 is equivalent to a linear system of algebraic equations in ci :
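\[ K c = f \tag{40} \]
where
\[ K_{ij} = \int_\Omega (L - \beta)[\phi_j]\, \phi_i\, d\Omega \tag{41} \]
\[ f_i = \int_\Omega q\, \phi_i\, d\Omega \tag{42} \]
Note that
\[ K = K_b - \beta I, \tag{43} \]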
where Kb is the finite dimensional representation of the backward Kolmogorov operator. Some notes on the admissibility of basis functions (φi ) are due here. Notice that the BK operator involves only derivatives of the unknown
function, V (x). It is thus clear that when the discount factor (β) is zero, it is not possible to use a complete
set of basis functions without making the stiffness matrix K singular (φ ≡ 1 causes a rank deficiency of 1). However,
so far as solving Eq. 40 is concerned, it is not important to have the constant basis function in the approximation
space. In theory, a nontrivial β allows us to include φ ≡ 1 in the basis set; but in practice it would require a large
discount factor before the K is reasonably well-conditioned for inversion; which in turn would take the solution far
from being optimal since then the control policy becomes very short sighted.
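The assembly of Kc = f and the effect of the constant basis function can be illustrated on a one-dimensional toy problem. The sketch below is built on assumptions that are not from the paper: a linear closed-loop drift −drift_rate · x, constant noise term hQhᵀ = noise, a quadratic right-hand side q = −cost_weight · x², the monomial basis {1, x², x⁴} on Ω = [−1, 1], and hypothetical function names. For this case the exact solution of (L − β)V = q is V = c0 + P x² with P = cost_weight / (2 · drift_rate + β) and c0 = noise · P / β, which gives a reference to compare against.

```python
import numpy as np
from numpy.polynomial import Polynomial

X = Polynomial([0.0, 1.0])


def integrate(p, half_width):
    """Exact integral of a polynomial over Omega = [-half_width, half_width]."""
    antiderivative = p.integ()
    return antiderivative(half_width) - antiderivative(-half_width)


def assemble_policy_evaluation(basis, drift_rate, noise, beta, cost_weight, half_width=1.0):
    """Return (K, f) with K_ij = int (L - beta)[phi_j] phi_i and f_i = int q phi_i.

    Assumed toy operator: L = -drift_rate * x d/dx + 0.5 * noise * d^2/dx^2,
    with q = -cost_weight * x^2.
    """
    def modified_bk(p):
        return -drift_rate * X * p.deriv() + 0.5 * noise * p.deriv(2) - beta * p

    q = -cost_weight * X**2
    n = len(basis)
    K = np.array([[integrate(modified_bk(basis[j]) * basis[i], half_width)
                   for j in range(n)] for i in range(n)])
    f = np.array([integrate(q * basis[i], half_width) for i in range(n)])
    return K, f


if __name__ == "__main__":
    basis = [Polynomial.basis(d) for d in (0, 2, 4)]
    K, f = assemble_policy_evaluation(basis, 2.0, 0.5, 0.1, 10.0)
    print("coefficients:", np.linalg.solve(K, f))
    K0, _ = assemble_policy_evaluation(basis, 2.0, 0.5, 0.0, 10.0)
    print("rank of K with beta = 0:", np.linalg.matrix_rank(K0))
```

With β = 0 the column of K belonging to φ ≡ 1 vanishes identically, which is the rank deficiency noted above. One caveat of this sketch: the monomials are not orthonormal, so the discount term enters K through the Gram matrix of the basis rather than through the identity.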
The above formulation can also be carried out on local subdomains by choosing φi to be supported on subdomains
forming a cover for the global domain Ω. Then the projection integrals (39) would be evaluated over the local
subdomains. This is the basic approach in finite element methods and meshless finite element methods, where the
local subdomains overlap each other. The authors have developed a method based on the Partition of Unity Finite
Element method (PUFEM) to solve the forward (Fokker Planck) and backward Kolmogorov equations.8–12
5 Numerical Examples
5.1 Two Dimensional System - Van der Pol Oscillator
Consider the following two dimensional stochastic Van der Pol oscillator:
\[ \ddot{x} + v_1 x + v_2 (1 - x^2) \dot{x} = u + g w(t) \tag{44} \]
with v1 = 1, v2 = −1, g = 1. The objective is to minimize the following cost function:
\[ J(x_0, \dot{x}_0) = E\left[ \int_0^\infty \left( \frac{1}{2} (x^2 + \dot{x}^2) + u^2 \right) dt \right] \tag{45} \]
The approximation for the value function was written using a complete set of global polynomials up to fourth order
(using polynomials up to the sixth order did not provide significant improvement in the approximation accuracy). It
is notable that in the above formulation, no direct means of enforcing state or control input constraints have been
used. In order to obtain reasonable bounds on the control input, the cost function (Eq.45) was scaled so as to modify
the generalized HJB (Eq.33) as follows (in addition to including the discount factor (β)):
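\[ \frac{\partial V^{(i)}}{\partial x}^T (f + g u^{(i)}) + \frac{1}{2} h Q h^T \frac{\partial^2 V^{(i)}}{\partial x^2} - \beta V^{(i)} = -e^{\gamma} \left( l + \| u^{(i)} \|_R^2 \right) \tag{46} \]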
i.e.,
\[ l(\cdot) \to e^{\gamma} l(\cdot) \tag{47} \]
\[ R \to e^{\gamma} R, \quad \gamma < 0 \tag{48} \]
Fig. 1 shows the converged results obtained for the above system. A linear starting policy was used to start the
iteration process, i.e., u(0) = K ẋ, with K = −20. Tuning parameters β = 0.05 and γ = −0.15 were found to give
acceptable results after about 13 iterations of the control policy. Fig.2(i) shows the system response without a control
input (open loop). Clearly, the states diverge. Fig.2(ii) shows the system response with the initial stabilizing control.
For both these results, nominal initial conditions were used inside the domain of operation, Ω = [−5, 5] × [−5, 5]. Fig.3
shows the optimal state trajectories obtained and the optimal control law.
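The closed-loop response to the initial stabilizing policy can be reproduced qualitatively with a simple Euler-Maruyama simulation of Eq. 44. The sketch below is not the authors' simulation code: the integration scheme, step size, initial condition, random seed, unit noise intensity and the function name are all assumptions made for illustration.

```python
import numpy as np


def simulate_van_der_pol(gain, x0=(2.0, 0.0), t_final=10.0, dt=1e-3, seed=0):
    """Euler-Maruyama simulation of x'' + v1 x + v2 (1 - x^2) x' = u + g w with u = gain * x'.

    Returns an array of shape (n_steps + 1, 2) holding (x, x') at each time step.
    """
    v1, v2, g = 1.0, -1.0, 1.0           # parameter values quoted for Eq. 44
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_final / dt))
    path = np.empty((n_steps + 1, 2))
    path[0] = x0
    for k in range(n_steps):
        x, v = path[k]
        u = gain * v                      # linear velocity feedback
        accel = -v1 * x - v2 * (1.0 - x * x) * v + u
        dw = np.sqrt(dt) * rng.standard_normal()   # unit-intensity noise assumed
        path[k + 1] = (x + v * dt, v + accel * dt + g * dw)
    return path


if __name__ == "__main__":
    closed_loop = simulate_van_der_pol(gain=-20.0)
    print("largest |x| under u = -20 x':", np.max(np.abs(closed_loop[:, 0])))
```

Calling the same function with gain = 0.0 gives an open-loop sample path for comparison; a single path is only indicative, since the quantities of interest in the text are expectations.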
5.2 Two Dimensional System - Duffing Oscillator
Next consider the following hard spring stochastic Duffing oscillator:
\[ \ddot{x} - x + x^3 = u + g w(t) \tag{49} \]
Figure 1 Converged Results for the Stochastic Van der Pol Oscillator. (i) Converged Value Function Surface. (ii) Converged Control Input Surface. The maximum control required was modulated using the tuning parameter γ.
Figure 2 System Response of the Stochastic Van der Pol Oscillator I. (i) Open Loop System Response for Eq. 44. (ii) System Response to Initial Stabilizing Control, u(0) = −20ẋ.
Figure 3 System Response of the Stochastic Van der Pol Oscillator II. (i) Optimal Path Obtained, Shown Alongside Response to u(0). (ii) Optimal Control Law Converged upon and u(0). Legend: Response to u(0); Response to uconverged.
Figure 4 Converged Results for the Stochastic Hard Spring Duffing Oscillator. (i) Converged Value Function Surface. (ii) Converged Control Input Surface. The maximum control required was modulated using the tuning parameter γ.
The above system represents the hard-spring case ( = +1.5) with an unstable open-loop equilibrium at (x, ẋ) = (0, 0)
and stable equilibria at (√ , 0) and (−√ , 0). The cost function to be minimized is the same as for the previous system,
and we use a linear stabilizing control given by: u(0) = −30ẋ. Similar convergence characteristics as for the Van der
Pol oscillator were obtained and are shown in Figs. 4-6.
5.3 Four Dimensional System - Missile Pitch Control Autopilot
Finally, we consider a four-dimensional system that models the pitch motion control for a missile autopilot. This
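model is the same as that considered in references27, 28 with a noise term added to all kinetic equations:
\[ \dot{q} = \frac{M_y}{I_y} + g_q w_1(t), \qquad M_y = C_m(\alpha, \delta)\, Q S d \tag{50} \]
\[ \dot{\alpha} = \frac{\cos^2 \alpha}{m U} F_z + q + g_\alpha w_2(t), \qquad F_z = C_n(\alpha, \delta)\, S d \tag{51} \]
\[ \ddot{\delta} = -2 \zeta \omega_n \dot{\delta} + \omega_n^2 (\delta_c - \delta) + g_\delta w_3(t) \tag{52} \]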
Figure 5 System Response of the Stochastic Duffing Oscillator I. (i) Open Loop System Response for Eq. 49. (ii) System Response to Initial Stabilizing Control, u(0) = −30ẋ.
Figure 6 System Response of the Stochastic Duffing Oscillator II. (i) Optimal Path Obtained, Shown Alongside Response to u(0). (ii) Optimal Control Law Converged upon and u(0). Legend: Response to u(0); Response to uconverged.
Figure 7 Converged Results for the Missile Pitch Controller, Showing Various Sections of the Four-D State Space. (i) Converged Value Function Surface. (ii) Converged Control Input Surface.
The control input in the above equations is δc (t), which is the commanded tail-fin deflection, while δ(t) denotes the
actual tail-fin deflection. Pitch rate and angle of attack are denoted by q and α respectively. w1 (t), w2 (t) and w3 (t)
are independent components of a 3 dimensional white noise process. The aerodynamic coefficients are given by the
following nonlinear functions:
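\[ C_m(\alpha, \delta) = b_1 \alpha^3 + b_2 \alpha |\alpha| + b_3 \alpha + b_4 \delta \tag{53} \]
\[ C_n(\alpha, \delta) = a_1 \alpha^3 + a_2 \alpha |\alpha| + a_3 \alpha + a_4 \delta \tag{54} \]
The values of the various constants can be found in Beard et al. (1998). The cost function to be minimized is given
by:
\[ J(x_0) = E\left[ \int_0^\infty \left( \frac{1}{2} (q - q_{ss})^2 + 25 (\alpha - \alpha_{ss})^2 + \left( \frac{F_z - F_{z_{ss}}}{m} \right)^2 + \frac{1}{2} \| u - \delta_{ss} \|_R^2 \right) dt \right] \tag{55} \]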
where the ‘ss’ subscripted quantities denote steady state values. We use the same starting stabilizing control as that
used in Beard et al.:
\[ u^{(0)} = 0.08 (q + q_{ss}) + 0.38 (\alpha + \alpha_{ss}) + 0.37 \frac{F_z}{m} \tag{56} \]
A complete basis of global quadratic shape functions was used to approximate the value function in the policy iteration
algorithm. Various sections of the converged value function and optimal control surface are shown in Fig. 7. Fig. 8
shows the optimal trajectory alongside the open loop response, together with the optimal control law obtained. Notice
that the pitch rate settles to zero and the angle of attack acquires the desired steady state value.
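As an illustration of the model structure only, the four-dimensional state (q, α, δ, δ̇) can be propagated with an Euler-Maruyama step, as in the minimal sketch below. All numerical constants in `PARAMS` are hypothetical placeholders chosen so that the sketch runs; they are not the values from Beard et al. The function names, the choice of Euler-Maruyama integration, and the zero commanded deflection are likewise assumptions of this sketch, not the method used to produce the results in this paper.

```python
import numpy as np

# HYPOTHETICAL placeholder constants, for illustration only.
# The actual values are those referenced from Beard et al.
PARAMS = {
    "a": (1.0, -1.0, -2.0, -0.5),  # a1..a4 in Cn (placeholders)
    "b": (1.0, -1.0, -2.0, -1.0),  # b1..b4 in Cm (placeholders)
    "Q": 1.0, "S": 1.0, "d": 1.0,
    "m": 1.0, "U": 1.0, "Iy": 1.0,
    "zeta": 0.7, "wn": 10.0,
    "g": (0.01, 0.01, 0.01),       # noise gains g_q, g_alpha, g_delta
}

def aero_coeff(alpha, delta, c):
    """c1*alpha^3 + c2*alpha*|alpha| + c3*alpha + c4*delta."""
    return c[0] * alpha**3 + c[1] * alpha * abs(alpha) + c[2] * alpha + c[3] * delta

def drift(x, delta_c, p=PARAMS):
    """Deterministic part of the dynamics for x = (q, alpha, delta, delta_dot)."""
    q, alpha, delta, delta_dot = x
    My = aero_coeff(alpha, delta, p["b"]) * p["Q"] * p["S"] * p["d"]
    Fz = aero_coeff(alpha, delta, p["a"]) * p["S"] * p["d"]  # as written in the text
    return np.array([
        My / p["Iy"],
        np.cos(alpha) ** 2 / (p["m"] * p["U"]) * Fz + q,
        delta_dot,
        -2.0 * p["zeta"] * p["wn"] * delta_dot + p["wn"] ** 2 * (delta_c - delta),
    ])

def euler_maruyama_step(x, delta_c, dt, rng, p=PARAMS):
    """One Euler-Maruyama step; noise enters the q, alpha and delta_dot equations."""
    dw = rng.normal(0.0, np.sqrt(dt), size=3)
    gq, ga, gd = p["g"]
    noise = np.array([gq * dw[0], ga * dw[1], 0.0, gd * dw[2]])
    return x + drift(x, delta_c, p) * dt + noise

# Open-loop run (commanded deflection held at zero) from a small angle of attack.
rng = np.random.default_rng(0)
x = np.array([0.0, 0.1, 0.0, 0.0])
for _ in range(1000):
    x = euler_maruyama_step(x, delta_c=0.0, dt=1e-3, rng=rng)
```

A feedback law such as the initial stabilizing control can be tested in the same loop by computing `delta_c` from the current state at each step.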
5.4
Notes on Modal Analysis
It is easy to see that any constant function, ψ = c, is a stationary solution of the BK equation, and hence an eigenfunction
of the BK operator with the trivial (zero) eigenvalue. It was mentioned earlier that the constant basis function, φ ≡ 1, is
not admissible while solving for the coefficients, ci. However, when considering the eigenvalue problem for the BK
operator, Lψ = λψ, it is imperative that the constant basis function be included in the set, because the stationary
solution of the BK equation is precisely the constant function. The modified BK operator (β > 0), by contrast, admits
neither a constant stationary solution nor an eigenfunction with a trivial eigenvalue (Fig. 9). In summary, the
eigenvalue problem requires a complete basis set in the approximation space (i.e. one including the constant function)
in order to recover the stationary eigenfunction of the BK operator, whereas the coefficients, ci, can be computed
without the constant basis function. Fig. 9 shows the absolute
values of the eigenvalues for both the Fokker-Planck and the Backward Kolmogorov operators.
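The role of the constant basis function can be checked on a toy problem. The sketch below is not the Van der Pol example of Fig. 9. It assumes a one-dimensional Ornstein-Uhlenbeck process, dx = −x dt + √2 dW, whose BK operator is Lψ = −xψ′ + ψ″, and represents L exactly on the monomial basis 1, x, ..., x^(n−1). It also assumes that the modified operator has the form L − βI; that form, the monomial basis, and the function names are assumptions made for illustration only.

```python
import numpy as np

def bk_matrix(n_basis, beta=0.0):
    """Matrix of (L - beta*I) on monomials 1, x, ..., x^(n_basis-1),
    where L psi = -x psi' + psi'' (assumed OU process, illustrative only).
    Column k holds the coefficients of (L - beta) x^k in the same basis."""
    A = np.zeros((n_basis, n_basis))
    for k in range(n_basis):
        A[k, k] = -k - beta            # -x d/dx x^k = -k x^k, minus the beta shift
        if k >= 2:
            A[k - 2, k] = k * (k - 1)  # d^2/dx^2 x^k = k (k-1) x^(k-2)
    return A

def abs_spectrum(A):
    """Sorted absolute values of the eigenvalues of A."""
    return np.sort(np.abs(np.linalg.eigvals(A)))

full = abs_spectrum(bk_matrix(6))                # constant function included
modified = abs_spectrum(bk_matrix(6, beta=0.02)) # assumed modified operator
no_const = abs_spectrum(bk_matrix(6)[1:, 1:])    # constant function dropped

print("BK, full basis:       ", full)
print("modified BK:          ", modified)
print("BK, constant dropped: ", no_const)
```

In this toy setting the full basis recovers the zero eigenvalue belonging to the constant eigenfunction, the shifted operator has no zero eigenvalue, and removing the constant from the basis loses the trivial eigenvalue, which mirrors the remarks above.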
Figure 8 System Response of the Missile Pitch Controller. Panels: (i) Optimal Path Obtained, Shown Alongside Open Loop Trajectory (legend: Open Loop, Response to uconverged); (ii) Optimal Control Law Obtained.
Figure 9 Spectra of the BK, modified BK and FP Operators for the Van der Pol Oscillator. Panels: (i) Comparison of the Spectra of the BK and FP Operators for the Van der Pol Oscillator; (ii) The Modified BK Operator does not Contain the Trivial Eigenvalue in its Spectrum (β = 0.02).
6
Conclusion
In this paper, we have outlined a computational methodology for solving the nonlinear stochastic optimal control problem,
based on the policy iteration algorithm and a finite dimensional approximation of the controlled Fokker-Planck-Kolmogorov/
diffusion operator. The methodology was tested on a number of test cases, for which it showed satisfactory
performance. The next step in our research will be to extend these methods to more complex
stochastic systems such as jump diffusion processes and stochastic hybrid systems. We envisage a hierarchical
hybrid methodology for these problems, wherein the methods outlined in this paper would
form the lower, continuous level of the hierarchy.
REFERENCES
1. H. Risken, The Fokker Planck Equation: Methods of Solution and Applications, ser. Springer Series in Synergetics.
2. D. C. Polidori and J. L. Beck, “Approximate solutions for nonlinear vibration problems,” Prob. Engr. Mech.,
vol. 11, pp. 179–185, 1996.
3. M. C. Wang and G. Uhlenbeck, “On the theory of brownian motion ii,” Reviews of Modern Physics, vol. 17, no.
2-3, pp. 323–342, 1945.
4. A. H. Jazwinski, Stochastic Processes and Filtering Theory.
New York, NY: Academic Press, 1970.
5. A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Eds., Sequential Monte Carlo Methods in Practice. New York:
Springer-Verlag, 2001.
6. J. L. Junkins, P. Singla, T. D. Griffith, and T. Henderson, “Orthogonal global/local approximation in n-dimensions: Applications to input/output approximation,” in 6th International conference on Dynamics and
Control of Systems and Structures in Space, Cinque-Terre, Italy, 2004.
7. J. L. Junkins and P. Singla, “Orthogonal global/local approximation in n-dimensions: Applications to input/output approximation,” in to appear in Computer Methods in Engg. and Science, Submitted.
8. M. Kumar, P. Singla, J. Junkins, and S. Chakravorty, “A multi-resolution meshless approach to steady state
uncertainty determination in nonlinear dynamical systems,” in 38th IEEE Southeastern Symposium on Systems
Theory, Cookeville, TN, Mar 5-7, 2006.
9. M. Kumar, P. Singla, S. Chakravorty, and J. L. Junkins, “The partition of unity finite element approach to the
stationary fokker-planck equation,” in AIAA/AAS Astrodynamics Specialist Conference, Keystone, CO, USA,
21-24 August, 2006, AIAA Paper 2006-6285.
10. S. Chakravorty, “A homotopic galerkin approach to the solution of the fokker-planck-kolmogorov equation,” in
Proceedings of the 2006 American Control Conference, 2006.
11. M. Kumar, S. Chakravorty, and J. L. Junkins, “An eigendecomposition approach for the transient fokker-planck
equation,” in Proceedings of the 18th Engineering Mechanics Division Conference of the ASCE, Blacksburg, VA,
2007.
12. ——, “A semianalytic meshless approach to the fokker-planck equation with application to hybrid systems,” in
Proc. IEEE Conf. Dec. and Control, vol. to appear, 2007.
13. R. E. Bellman, Dynamic Programming.
Princeton, NJ: Princeton University Press, 1957.
14. D. P. Bertsekas, Dynamic Programming and Optimal Control, vols I and II. Cambridge: Athena Scientific, 2000.
15. D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming.
Boston, MA: Athena Scientific, 1996.
16. P. Werbos, “Approximate dynamic programming for real-time control and neural modeling,” in Handbook of
Intelligent Control, White and Sofge, Eds. Van Nostrand Reinhold.
17. P. Marbach, Simulation based Optimization of Markov Reward Processes, PhD Thesis. Boston, MA: Massachusetts Institute of Technology, 1999.
18. J. Baxter and P. Bartlett, “Infinite horizon policy-gradient estimation,” Journal of Artificial Intelligence Research,
vol. 15, pp. 319–350, 2001.
19. R. S. Sutton et al., “Policy gradient methods for reinforcement learning with function approximation,” in Proc. 1999
Neural Information Proc. Sys., 1999.
20. M. Lagoudakis and R. Parr, “Least squares policy iteration,” Journal of Machine Learning Research, vol. 4, pp.
1107–1149, 2003.
21. H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time. New York, NY: Springer, 2001.
22. L. G. Crespo and J. Q. Sun, “Stochastic optimal control of nonlinear systems via short-time gaussian approximation and cell mapping,” Nonlinear Dynamics, vol. 28, pp. 323–342, 2002.
23. ——, “Stochastic optimal control via bellman’s principle,” Automatica, vol. 39, pp. 2109–2114, 2003.
24. C. S. Hsu, Cell to Cell Mapping: A Method for the Global Analysis of Nonlinear Systems. New York, NY: Springer-Verlag, 1987.
25. F. B. Hanson, “Techniques in computational stochastic dynamic programming,” in Control and Dynamical Systems: Advances in Theory and Applications, C. T. Leondes, Ed., vol. 76. New York, NY: Academic Press,
1996.
26. ——, “Computational dynamic programming on a vector multiprocessor,” IEEE Transactions on Automatic
Control, vol. 36, pp. 507–511, 1991.
27. R. Beard, G. Saridis, and J. Wen, “Galerkin approximation of the generalized hamilton-jacobi-bellman equation,”
Automatica, vol. 33, pp. 2159–2177, 1997.
28. ——, “Approximate solutions to the time-invariant hamilton-jacobi-bellman equation,” Journal of Optim. Th.
Appl., 1998.
29. F. L. Lewis et al., Neural Network control of Robotic Manipulators and nonlinear systems. Taylor and Francis, 1999.
30. M. Abu-Khalaf and F. L. Lewis, “Nearly optimal state feedback control of constrained nonlinear system using a
neural network hjb,” Annual Reviews in Control, vol. 28, pp. 239–251, 2004.
31. M. Abu-Khalaf, J. Huang, and F. L. Lewis, Nonlinear H2/ H-∞ constrained Feedback control: a practical approach
using neural networks. New York: Springer Verlag, June 2006.
32. R. Padhi and S. N. Balakrishnan, “Proper orthogonal decomposition based optimal neurocontrol synthesis of a
chemical reactor process using approximate dynamic programming,” Neural Netw., vol. 16, no. 5-6, pp. 719–728,
2003.
33. S. Ferrari, Algebraic and Adaptive Learning in Neural Control Systems, PhD thesis. Princeton University, 2002.
34. A. E. Bryson and Y. Ho, Applied Optimal Control.
Washington, D.C.: Hemisphere Publishing Co., 1975.
35. I. M. Ross and F. Fahroo, “Pseudospectral knotting methods for solving nonsmooth optimal control problems,”
AIAA J. of Guid. Cont. Dyn., vol. 27, pp. 397–405, 2004.
36. ——, “Issues in real-time computation of optimal control,” Mathematical and Computer Modeling, vol. 43, pp.
1172–1188, 2006.
37. W. H. Fleming and H. Soner, Controlled Markov Processes and Viscosity Solutions. New York, NY: Springer, 2001.
38. M. G. Crandall and P. L. Lions, “Viscosity solutions of hamilton-jacobi equations,” Transactions of the American
Mathematical Society, vol. 277, pp. 1–42, 1983.
39. M. G. Crandall, H. Ishii, and P. L. Lions, “User’s guide to viscosity solutions of second order partial differential
equations,” Bulletin of the American Mathematical Society, vol. 27, pp. 1–67, 1992.
40. M. Puterman, Markov Decision Processes.
Hoboken, NJ: Wiley-Interscience, 1994.