Control in the Reliable Region of a Statistical Model

Youngmok Yun, Student member, IEEE, Jonathan Ko, Member, IEEE
and Ashish D. Deshpande, Member, IEEE
Abstract—We present a novel statistical model-based control
algorithm, called control in the reliable region of a statistical
model (CRROS). First, CRROS builds a statistical model with
Gaussian process regression which provides a prediction function
and uncertainty of the prediction. Then, CRROS avoids high
uncertainty regions of the statistical model by regulating the
null space of the pseudo-inverse solution. The simulation results
demonstrate that CRROS drives the states toward high density
and low noise regions of training data, ensuring high reliability
of the model. The experiments with a robotic finger, called Flexfinger, show the potential of CRROS to control robotic systems
that are difficult to model, contain constrained inputs, and exhibit
heteroscedastic noise output.
I. INTRODUCTION
Robotics researchers have developed statistical model learning algorithms with neural networks (NN) [1], support vector regression (SVR) [2], and Gaussian process regression
(GPR) [3], [4], to resolve the challenges of system identification. However, control with statistical models is still
challenging since these data-dependent models are sometimes
unreliable, and the incorrect model might lead to control failure and safety accidents. In addition, the heavy computational
burden of statistical models results in a low control loop
frequency, which is critical in control. Our goal is to develop
a statistical model-based control algorithm which is reliable,
fast, and operated in a safe range.
A crucial weakness of the statistical modeling methods is
that the reliability of statistical models is highly dependent
on the training data. Even if a learning algorithm successfully
works with a given training data set, it is almost impossible to
reliably predict the behavior of nonlinear systems when the
predicted region includes sparse data or a significant level
of noise (Fig. 1). If a state is out of the reliable region,
the statistical model might be substantially different from
the actual system behavior, eventually resulting in failure of
control.
We consider a tracking control problem for the following system:

y = g(x),  dim(x) > dim(y)    (1)
Y. Yun and A. D. Deshpande are with the Department of Mechanical
Engineering, The University of Texas at Austin, Austin, TX, 78703, USA,
e-mail: {yunyoungmok@utexas.edu, ashish@austin.utexas.edu}.
J. Ko is with the Department of Computer Science and Engineering, University of Washington, Seattle, WA, 98195, USA, e-mail: jonko6@gmail.com
where y is the output, x is the controllable state, and g is
a nonlinear system. Since dim(x) > dim(y), multiple x can
work as solution for a given y. For nonlinear systems, this
redundancy is common for various reasons (e.g., singularity). A
kinematically redundant manipulator with joint angles as state
and end-effector position as output is a well-known example of
such a system [5]. Another example is a human-like tendon-driven finger [6]. We seek to design a control algorithm to
determine values of x that will track a desired trajectory of
y with a statistical model, denoted as f . Here, for successful
control, the statistical model f should reliably represent g.
We present a statistical model-based control strategy for the
system (1), called CRROS (control in the reliable region of a
statistical model). The underlying idea is to drive x toward an
area of reliable f by regulating redundant DOFs while pursuing a desired output trajectory. With a statistical modeling
method, unreliable f is inevitable in sparse data regions or
serious noise region, but CRROS is designed to avoid these
unreliable regions. The first step of CRROS is to develop
a statistical model with GPR. Since GPR is fully based on
Bayes’ theorem, it provides not only the predicted output for a
given input but also the uncertainty of the prediction. Here, the
prediction uncertainty is important because it indicates which
regions are reliable. The next step in CRROS is to track a
desired output y by using the pseudoinverse of the Jacobian
of the GPR model, and simultaneously regulate the null space
to drive x toward a region of low uncertainty.
A previous work related with our research is the obstacle
avoidance algorithm for manipulators and mobile robots using
a virtual repulsive force field [7]. The force field in [7] is
generated by the location of obstacles in output space, but
in CRROS, the force field is generated by the distribution of
prediction uncertainty in state space. The approach of force
field provides an optimum direction to minimize the prediction
uncertainty by using the gradient operator without iterative
optimization procedure. Since the introduction of GPR, a few
researchers have integrated nonlinear model predictive
control (NMPC) [8] with the uncertainty prediction function of
GPR to address the model reliability issues. However, because
the combination of GPR with NMPC requires considerable
computational burden, the applications have been restricted
to only chemical process control [9], [10]. Another related
approach has been developed in the field of reinforcement
learning (RL). RL researchers have developed a learning
algorithm to determine a behavior policy of robots to achieve a
specific goal [11], [12], [13]. The learning procedure contains
an effort to minimize the uncertainty of the statistical model.
The policy search algorithms may allow robots to pursue more sophisticated goals, but the selection of an appropriate policy and the optimization of policy parameters are still challenging problems.

CRROS solves a fundamental problem of a statistical model (i.e., prediction reliability) for controlling the system of (1) to track a desired trajectory. Since CRROS takes advantage of the force field approach (gradient), which can provide an optimum solution at the current state, it does not need any iterations, and thus allows for a high-speed control loop. CRROS pushes the state towards a region with dense and low-noise data. This tendency makes the system control safer because the training data region is a guaranteed safe region and low noise leads to predictable outputs. These advantages enable CRROS to control systems for which it is challenging to develop a parametric model, e.g., soft robots. Also, CRROS is an efficient algorithm for teaching a redundant robot because its state follows the region where the robot has learned and the learning has been repetitive. For example, if we teach a kinematically redundant manipulator desired motions, the robot will generate a new x to track a desired y, and the redundant motion will be optimized to be close to the learned motion.

In the following sections, we explain the GPR-based modeling method in two versions, a standard GPR and a heteroscedastic GPR (Section II), and the details of CRROS (Section III). For validation, we carried out computer simulations (Section IV) and experiments with a robotic system (Section V). In the experiments, Flex-finger, for which it is challenging to build an analytical model, is controlled to prove the practical effectiveness of CRROS. In addition, we provide all simulation codes written in MATLAB to help the reader's understanding of the core principle of CRROS [14]. We also provide all experiment videos to show the details of the experimental results [14]. Preliminary work on this topic was presented in a conference [15].

Figure 1. From a nonlinear function, a training data set was sampled with heteroscedastic noise. The training data can be classified into three regions. The first includes a high density of data and low noise. The second includes sparse data. The third includes a high density of data and high noise. Only the first region of training data can be reliably modeled with a statistical method. The standard version of GPR [3] considers the density of training data and the overall noise level for uncertainty estimation. The heteroscedastic GPR [19] can consider not only the density of training data but also heteroscedastic noise when it estimates prediction uncertainty. As a result, in the third region, the uncertainty of prediction with the standard GPR is low, which is wrong. In contrast, the heteroscedastic GPR is able to estimate its uncertainty correctly for all regions.

In this paper, we present CRROS with two GPR algorithms. The first one uses a standard version of GPR [3], while the second uses a heteroscedastic GPR [19]. Fig. 1 compares the different properties of the two GPRs. The core principle of CRROS is the same for both versions, and indeed CRROS can be combined with other statistical modeling methods if they provide the uncertainty of prediction. The standard GPR is a well-known regression algorithm in robotics and statistics, supported by many communities because of its robustness and mathematical simplicity. The heteroscedastic GPR is able to model more complex systems, but is approximately half as fast as the standard GPR. There are ongoing works to improve the speed of the heteroscedastic GPR [19], [20], [21]. In general, we recommend using a heteroscedastic GPR for CRROS when required to model heteroscedastic systems, and a standard GPR otherwise. The detailed properties of CRROS with the two GPRs are discussed in the later sections.

II. SYSTEM MODELING WITH GAUSSIAN PROCESS REGRESSION

In this section, we present an overview of building a model for a nonlinear system with the GPR algorithm [3], [4]. An advantage of GPR is that the user does not need to actively tune the complexity and parameters of the model, in contrast to other nonlinear regression modeling methods including NN [1] and SVR [2]. Once the Gaussian process (GP) is defined with a mean and a covariance function (a zero mean and a Gaussian kernel are most commonly used), a detailed model is determined by the database itself. Another merit is that its prediction output is a probabilistic distribution defined by a Gaussian distribution, and the variance of the distribution can be regarded as the level of prediction uncertainty. This uncertainty plays a key role in the development of CRROS. One disadvantage of GPR is the heavy computational burden, which increases as O(n²) for prediction, where n is the number of training data [3]. This can be partially resolved by implementing methods such as sparse approximation [16], local Gaussian process regression [17], or usage of GPUs [18].
A. Modeling with a Standard GPR
We first build a statistical model with the standard version
of GPR. We will explain only the essential part of GPR for
CRROS. For further details, refer to [3]. Let us assume that we
have a training data set D for the system (1), which contains
inputs xi and outputs yi: D = {(xi, yi) | i = 1, 2, . . . , n},
X = [x1 x2 . . . xn]⊤, y = [y1 y2 . . . yn]⊤. Given this
data set, we wish to predict the probability density of y∗
for a new query point x∗ . In other words, we want to find
p(y∗ |x∗ , D, Θ), where Θ is a hyperparameter set for the
GPR model. If the system (1) has a multidimensional output
(MIMO system), its output is predicted by multiple functions,
p(y∗j |x∗ , Dj , Θj ), where j is the index of the output. In this
section, for simplicity, we only explain p(y∗ |x∗ , D, Θ). The
next section describes the expansion into a multidimensional
output.
A Gaussian process is a stochastic process, defined as a collection of random variables, any finite number of which have a joint Gaussian distribution. It is completely determined by a mean function and a covariance function as:

f(x) ≡ GP(m(x), k(x, x′))    (2)
where the mean function is m(x) = E[f(x)] and the covariance function is k(x, x′) = E[(f(x) − m(x))(f(x′) − m(x′))]. In our GP model, the mean function and the covariance function are determined as:

m(x) = 0    (3)

k(x, x′) = νf² exp(−(δx)⊤Λ(δx)/2) + νn² ϕ(i, j)    (4)

where δx is defined as x − x′, and ϕ(i, j) is the Kronecker delta function. Λ is a diagonal matrix whose elements are (squared inverse) length scales for the input space. νf² specifies the overall vertical scale of variation of the latent values, and νn² the latent noise variance. The hyperparameter set, denoted as Θ, is defined as the set of νf, νn, and Λ, and determines the characteristics of a GP model.
The Gaussian process prior is updated with a training data
set based on Bayes’ theorem. The posterior is given as:
p(y∗|x∗, D, Θ) = N(k∗⊤K⁻¹y, κ − k∗⊤K⁻¹k∗)    (5)

where K is a matrix whose elements Kij are the covariance function values k(xi, xj), k∗ = [k(x∗, x1) . . . k(x∗, xn)]⊤, and κ = k(x∗, x∗). For the derivation, refer to [3]. Fig. 1 helps in understanding this useful feature of GPR.
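The posterior (5) can be exercised on a toy one-dimensional data set. This Python sketch is ours (the paper's code is MATLAB [14]); it assumes a unit-scale kernel and a small latent noise term, and uses a small Gaussian-elimination solver in place of an explicit matrix inverse:

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def kern(a, b, noise=False):
    # 1-D squared-exponential kernel with unit scales (cf. Eq. (4)).
    k = math.exp(-0.5 * (a - b) ** 2)
    return k + 1e-4 if noise else k

def gp_posterior(xs, ys, x_star):
    """Posterior mean and variance of Eq. (5) at a query point x_star."""
    K = [[kern(a, b, noise=(i == j)) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [kern(x_star, a) for a in xs]
    alpha = solve(K, ys)                     # K^{-1} y
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    Kinv_kstar = solve(K, k_star)            # K^{-1} k_*
    var = kern(x_star, x_star) - sum(ks * v for ks, v in zip(k_star, Kinv_kstar))
    return mean, var
```

Near a training point the predicted variance drops to roughly the noise level, while far from the data it returns to the prior variance, which is the behavior Fig. 1 illustrates.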
The characteristics of a GP model are determined by the
choice of the hyperparameter set, so selection of an appropriate
hyperparameter set is critical for a good representation of a
system. We select a hyperparameter set which maximizes the
log-likelihood p(y|X, Θ):
log p(y|X, Θ) = −(1/2) y⊤K⁻¹y − (1/2) log |K| − (n/2) log 2π    (6)
and the maximization is performed with a gradient-based
optimization algorithm. The gradient is calculated as:
∂ log p(y|X, Θ)/∂θ = (1/2) y⊤K⁻¹ (∂K/∂θ) K⁻¹y − (1/2) tr(K⁻¹ ∂K/∂θ)    (7)
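The gradient (7) can be checked against a finite difference of (6). Below is our minimal Python sketch for a two-point data set with a single free hyperparameter θ = νf² (a simplification made only for this illustration; the 2 × 2 inverse is written in closed form):

```python
import math

def loglik(theta, xs, ys, nu_n2=0.1):
    """Log marginal likelihood (6) for n = 2, with theta = nu_f^2 free."""
    k0 = math.exp(-0.5 * (xs[0] - xs[1]) ** 2)
    K = [[theta + nu_n2, theta * k0], [theta * k0, theta + nu_n2]]
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    Kinv = [[K[1][1] / det, -K[0][1] / det], [-K[1][0] / det, K[0][0] / det]]
    yKy = sum(ys[i] * Kinv[i][j] * ys[j] for i in range(2) for j in range(2))
    # n = 2, so the constant term is -(n/2) log 2*pi = -log 2*pi
    return -0.5 * yKy - 0.5 * math.log(det) - math.log(2 * math.pi), Kinv, k0

def grad_loglik(theta, xs, ys, nu_n2=0.1):
    """Analytic gradient (7), with dK/dtheta = [[1, k0], [k0, 1]]."""
    _, Kinv, k0 = loglik(theta, xs, ys, nu_n2)
    dK = [[1.0, k0], [k0, 1.0]]
    a = [sum(Kinv[i][j] * ys[j] for j in range(2)) for i in range(2)]  # K^{-1} y
    quad = sum(a[i] * dK[i][j] * a[j] for i in range(2) for j in range(2))
    tr = sum(Kinv[i][j] * dK[j][i] for i in range(2) for j in range(2))
    return 0.5 * quad - 0.5 * tr
```

A gradient-based optimizer ascends this quantity over all hyperparameters in Θ; the sketch isolates one of them to keep the algebra closed-form.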
B. Modeling with a Heteroscedastic GPR

Secondly, we build a statistical model with the heteroscedastic GPR devised by [19]. The basic idea behind this method is to create two GP models. The main GP is used to find a regression model for the training data, while the second GP is used to fit the residuals of the main regression model. By combining the two GPs, the heteroscedastic GPR accounts for both the density of data and the noise level in estimating the prediction uncertainty.

For the main GP, the same data set D introduced in the previous subsection is used. The data set for the second GP is determined as D′ = {(xi, zi) | i = 1, 2, . . . , n}, where zi = log(ri) and ri is the squared residual of the main GPR at the query point xi. The log function was used to ensure a positive value of r in the inverse function, ri = exp(zi). The posterior of the main GP, or the heteroscedastic GPR, is given as:

p(y∗|x∗, D, Θ) = N(k∗⊤(K + R)⁻¹y, κ + r∗ − k∗⊤(K + R)⁻¹k∗)    (8)

where R = diag(r1 . . . rn), r∗ = exp(z∗), and z∗ is the mean of the second GP posterior at the query point x∗. K, k∗, and κ are determined in the same way as in (5).

The main and second GPs are recursively connected because the prediction with the main GP changes the residuals, and the regression of these residuals (the second GP) in turn affects the main GP. The hyperparameter sets determining the two GPs can be optimized by iterating the predictions of the two GPs until the hyperparameters converge, like an EM algorithm [22]. For the initial step of the iteration, a standard GP is used as the main GP since we cannot obtain the residuals yet.

III. DEVELOPMENT OF CRROS

As stated in Section I, a statistical model has different features compared to an analytical model, thus it demands a different control strategy. Specifically, if we apply a conventional control algorithm to a statistical model, it will output a path of x with no regard to the training data. Since the reliability of the model is highly dependent on the density of data and the noise level, this may result in failure of control. One example of this situation is presented in Section IV. To address this problem, we propose a novel control strategy, called CRROS, that takes advantage of the system redundancy to drive the system states to reliable regions.

The basic idea of CRROS is to first track a desired output with the pseudoinverse of the Jacobian of the GPR model, then push the state of the system in the direction which minimizes the prediction uncertainty on the null space associated with the Jacobian. The direction that minimizes the prediction uncertainty is found by the gradient operator, which provides the slope of the prediction uncertainty field obtained from GPR. The advantage of this approach is that CRROS can find the optimum solution at the current state without iteration. This method is inspired by the obstacle avoidance algorithm [5], [7]. In CRROS, the obstacle is the unreliable region of a statistical model.

A. Necessary equations for developing CRROS with standard GPR
To introduce CRROS, we first need to define a number of mathematical terms which will be used in the main CRROS algorithm. The standard GP in (5) provides two values, the predicted output and the variance of the prediction, which can be rewritten as two functions. The first function, fµ(x∗) in (9), is the predicted output at a query point x∗, and the other function, fΣ(x∗) in (10), is the variance (uncertainty) of the prediction.

fµ(x∗) = k∗⊤K⁻¹y    (9)

fΣ(x∗) = κ − k∗⊤K⁻¹k∗    (10)

Next, their Jacobians are calculated as:

Jµ(x∗) = ∂fµ(x∗)/∂x∗ = (K⁻¹y)⊤ ∂k∗/∂x∗    (11)

JΣ(x∗) = ∂fΣ(x∗)/∂x∗ = −2k∗⊤K⁻¹ ∂k∗/∂x∗    (12)

The size of both matrices Jµ and JΣ is 1 × d, where d = dim(x). The term ∂k∗/∂x∗, common to (11) and (12), is defined as:

∂k∗/∂x∗ = [∂k(x∗, xi)/∂x∗[j]]  (i = 1, . . . , n; j = 1, . . . , d)    (13)

that is, an n × d matrix whose (i, j) element is the partial derivative of k(x∗, xi) with respect to x∗[j], where x∗[j] indicates the j-th element of the vector x∗. In our case, since the covariance function of the GP uses the squared exponential kernel, each element is defined as:

∂k(x∗, x)/∂x∗[i] = −Λii (x∗[i] − x[i]) νf² exp(−(δx)⊤Λ(δx)/2)    (14)

For the multidimensional output case introduced in Section III-C, the total prediction variance and its gradient are defined as:

fΞ(x∗) = Σ_{i=1}^{q} fΣi(x∗)    (18)

∇fΞ(x∗) = ( Σ_{i=1}^{q} JΣi(x∗) )⊤    (19)

where fΞ is a scalar function providing the sum of the prediction variances; the variances are already squared values of standard deviations, thus the sum operation can represent the total variance. ∇fΞ(x∗) is a d-dimensional gradient vector which indicates the direction of the linearized slope of fΞ at the point x∗.
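As a numerical sanity check, the analytic kernel derivative (14), on which both Jacobians (11) and (12) rely, can be compared with a finite difference of the kernel. A small Python sketch of ours (not from the paper):

```python
import math

def k_se(x_star, x, lam, nu_f2):
    """Noise-free squared-exponential covariance, cf. Eq. (4)."""
    quad = sum(l * (a - b) ** 2 for l, a, b in zip(lam, x_star, x))
    return nu_f2 * math.exp(-0.5 * quad)

def dk_dxstar(x_star, x, lam, nu_f2, i):
    """Analytic partial derivative of Eq. (14) w.r.t. x_star[i]."""
    return -lam[i] * (x_star[i] - x[i]) * k_se(x_star, x, lam, nu_f2)
```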
B. Necessary equations for developing CRROS with heteroscedastic GPR

For the heteroscedastic GPR, we only need to replace (9)-(12) with (20)-(23):

fµ(x∗) = k∗⊤(K + R)⁻¹y    (20)

fΣ(x∗) = κ + r∗ − k∗⊤(K + R)⁻¹k∗    (21)

Jµ(x∗) = ((K + R)⁻¹y)⊤ ∂k∗/∂x∗    (22)

JΣ(x∗) = ∂r∗/∂x∗ − 2k∗⊤(K + R)⁻¹ ∂k∗/∂x∗    (23)
Since r∗ = exp(z∗ ) and z∗ is the mean of GP posterior,
∂r∗ /∂x∗ is calculated with (11). All other elements are
obtained in the same way of the standard GPR.
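The effect of the R term in (20)-(21) can be seen in a toy computation. This Python sketch of ours evaluates the predictive variance of (8) for two training points, with the 2 × 2 inverse written in closed form and the learned noise levels ri and predicted noise r∗ passed in explicitly:

```python
import math

def het_gp_variance(xs, rs, x_star, r_star):
    """Predictive variance of Eq. (8): kappa + r_* - k_*^T (K+R)^{-1} k_*,
    sketched for two training points with a unit-scale kernel."""
    k = lambda a, b: math.exp(-0.5 * (a - b) ** 2)
    # (K + R): kernel matrix with per-point noise variances on the diagonal
    KR = [[k(xs[0], xs[0]) + rs[0], k(xs[0], xs[1])],
          [k(xs[1], xs[0]), k(xs[1], xs[1]) + rs[1]]]
    det = KR[0][0] * KR[1][1] - KR[0][1] * KR[1][0]
    inv = [[KR[1][1] / det, -KR[0][1] / det],
           [-KR[1][0] / det, KR[0][0] / det]]
    ks = [k(x_star, xs[0]), k(x_star, xs[1])]
    quad = sum(ks[i] * inv[i][j] * ks[j] for i in range(2) for j in range(2))
    return k(x_star, x_star) + r_star - quad
```

Raising ri and r∗ raises the reported uncertainty at the same query point, which is exactly what the standard GPR cannot express.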
C. CRROS algorithm

So far, we have been limited to one-dimensional outputs. Let us expand this problem into a multidimensional output problem, dim(y) = q > 1. We assume that we do not have any a priori knowledge of the relation between the outputs, thus the multidimensional output functions are simply stacked from the one-dimensional functions. That is:
fµ (x∗ ) = [fµ1 (x∗ ) fµ2 (x∗ )
...
fµq (x∗ )]>
(15)
fΣ (x∗ ) = [fΣ1 (x∗ ) fΣ2 (x∗ )
...
fΣq (x∗ )]>
(16)
fµi and fΣi are for individual outputs, built in (9) and (10).
Both fµ and fΣ are q-dimensional vectors. In the same way,
q × d dimensional jacobian matrixes Jµ and JΣ are built as:
 1

Jµ (x∗ )
Jµ2 (x∗ )


Jµ (x∗ ) =  .  ,
 .. 
Jµq (x∗ )
 1

JΣ (x∗ )
JΣ2 (x∗ )


JΣ (x∗ ) =  . 
 .. 
(17)
JΣq (x∗ )
Since we want to solve a control problem, we are more interested in the sum of the variances of the multiple outputs than in the individual variances; this total variance and its gradient are the functions fΞ and ∇fΞ defined in (18) and (19).
The system equation g in (1) is modeled with the GP mean function fµ, and it can be linearized as (24). Inversely, it can be written as (25). From the viewpoint of feedback control, (25) can be rewritten as (26).

∆y = Jµ∆x    (24)

∆x = Jµ⁺∆y + (Id − Jµ⁺Jµ)v    (25)

x̃t+1 = ξJµ⁺(yt+1 − ŷt) + (Id − Jµ⁺Jµ)v + x̂t    (26)
where x̃t+1 is the next target state to be controlled, x̂t is the observed current state, yt+1 is the next desired output, and ŷt is the observed current output. From here on, since we are only interested in the prediction at the point x̂t, in other words x∗ = x̂t, we will write Jµ instead of Jµ(x̂t); other notations will follow this omission rule as well. Jµ is the Jacobian matrix of fµ, Jµ⁺ is the Moore-Penrose pseudoinverse of Jµ, and ξ is a scalar gain. (Id − Jµ⁺Jµ) is a matrix projecting a vector onto the null space associated with Jµ, where Id is a d × d identity matrix. v is an arbitrary d-dimensional vector. Here, a remarkable point is that we can select any v, since it is projected onto the null space and therefore does not affect ∆y, but it can affect ∆x. In many control applications, v is set to be a zero vector because this minimizes ||x̃t+1 − x̂t||; in other words, it is the minimum norm solution. In our case, v is set in a different way.
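The null-space property of (25), namely that the second term changes ∆x but not ∆y, can be demonstrated for a single-output system, where the pseudoinverse has the closed form Jµ⁺ = Jµ⊤/(JµJµ⊤). A Python sketch of ours (function names are our own):

```python
def null_space_step(J, dy, v):
    """One resolved-rate step of Eq. (25) for a single output (J is 1 x d).
    The second term projects an arbitrary vector v onto the null space of J,
    so it changes x without disturbing y to first order."""
    d = len(J)
    JJt = sum(j * j for j in J)
    J_pinv = [j / JJt for j in J]                  # d x 1 pseudoinverse
    Jv = sum(J[i] * v[i] for i in range(d))
    null_v = [v[i] - J_pinv[i] * Jv for i in range(d)]  # (I - J^+ J) v
    dx = [J_pinv[i] * dy + null_v[i] for i in range(d)]
    return dx, null_v
```

By construction J · null_v = 0, so J · ∆x equals the commanded ∆y regardless of the choice of v.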
In CRROS, we wish to minimize the sum of prediction variances fΞ while tracking a desired path. To reduce fΞ, we need to consider a direction of ∆x that minimizes ∆fΞ. The direction of ∆x is defined with a direction vector vd:

vd = argmin_u (fΞ(x̂t + εu) − fΞ(x̂t)) = −∇fΞ/||∇fΞ||    (27)

where ε is a small positive real number. (27) is a widely used result in gradient-based optimization algorithms [7]. The magnitude of the change of fΞ caused by vd is defined with vm as in (28):

vm = fΞ(x̂t + vd) − fΞ(x̂t) ≃ ∇fΞ⊤vd = −||∇fΞ||    (28)
where "≃" is caused by the linearization, or the gradient operation. Now, let us build CRROS's v, whose direction is vd and whose magnitude is β||vm||. Using (27) and (28), v = −β∇fΞ, where β is a control gain for reducing the prediction variance. v points toward a reliable region in the GPR statistical model, and its magnitude is proportional to the slope of fΞ. Finally, (26) is rewritten with the new v as:

x̃t+1 = ξJµ⁺(yt+1 − ŷt) − β(Id − Jµ⁺Jµ)∇fΞ + x̂t    (29)
Because the primary task is to track a desired trajectory, Jµ⁺ first uses a part of the x space, and then the remaining space (i.e., the null space) is used for −β∇fΞ. The value of β determines how strongly the algorithm pushes the target state x̃t+1 toward the low uncertainty region. A large value of β increases ||x̃t+1 − x̂t|| because the null space is orthogonal to the space of Jµ⁺.
IV. SIMULATION WITH 3-DOF ARTICULATED MANIPULATOR
For validation of the algorithm, we conducted a simulation
with a 3-DOF articulated manipulator. First, we validate the
CRROS algorithm with a dataset including homogeneous noise
to effectively observe the properties of CRROS. Second, a
training data set including heteroscedastic noise is used to see
the difference of the two versions of CRROS. Simulation codes
written in MATLAB are in [14].
Fig. 2 demonstrates the configuration of the simulated 3-DOF articulated manipulator. The state x consists of three
joint angles and the output y consists of the x-y positions
of the end-effector. Based on the system configuration, the
kinematics are governed by a nonlinear function g as shown
in (30):
y = g(x):

y1 = l1 cos(x1) + l2 cos(x1 + x2) + l3 cos(x1 + x2 + x3)
y2 = l1 sin(x1) + l2 sin(x1 + x2) + l3 sin(x1 + x2 + x3)    (30)

where l1, l2, and l3 are the lengths of the links, and were set as 1.5, 1.2, and 1.0 respectively in the simulation. (y in Fig. 2 has an offset for generality.)
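The forward kinematics (30) translate directly into code; a Python sketch of ours with the link lengths used in the simulation (the offset in Fig. 2 is omitted here):

```python
import math

def fk_3dof(x, lengths=(1.5, 1.2, 1.0)):
    """Planar forward kinematics of Eq. (30) for the simulated manipulator."""
    l1, l2, l3 = lengths
    x1, x2, x3 = x
    y1 = l1 * math.cos(x1) + l2 * math.cos(x1 + x2) + l3 * math.cos(x1 + x2 + x3)
    y2 = l1 * math.sin(x1) + l2 * math.sin(x1 + x2) + l3 * math.sin(x1 + x2 + x3)
    return y1, y2
```

With dim(x) = 3 and dim(y) = 2, this system has the one-dimensional redundancy that CRROS exploits.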
A. Simulation with homogeneous noise
For generating the training data, 30 input angles, x, were
randomly sampled within π/2 rad ranges. This range constraint has two purposes. The first is to simulate a realistic
situation; most articulated arm actuators have a limited operation range of motion due to hardware contact and safety.
The second reason is to create a sparse region in the training
data, in order to clearly show the effectiveness of CRROS.
Corresponding training data outputs y were collected using
(30). Then we built a statistical model with the standard
GPR. Since the noise is homogeneous, the statistical model
of heteroscedastic GPR is almost same with that of the
standard GPR. Based on the GPR model constructed with
the collected training data, CRROS controlled the manipulator
to track a desired circular trajectory. The simulations were
conducted with three different conditions, β = 0, 10 and 100
to investigate the effect of β, the gain pushing the state away
from uncertainty regions. We assumed observation error and control error, x̂ = x + εx, ŷ = y + εy, and xt+1 = x̃t+1 + εc, where εx, εy, and εc are unbiased Gaussian noise terms whose standard deviations are set as 0.02 rad, 0.01, and 0.02 rad respectively.
Fig. 3 and Fig. 4 illustrate the simulation results.
In the first case β = 0, as shown in Fig. 3 (a), (b) and
Fig. 4, the algorithm failed to track a desired trajectory. With
β = 0, CRROS does not consider the unreliable region, and
simply outputs the minimum norm solution. As a result, the
state entered the region with sparse training data, and this significantly deteriorated the control performance. In the second
and third cases, CRROS successfully tracked the desired path,
and simultaneously stayed in low uncertainty regions. This
verifies that the algorithm successfully pushed the state toward
a reliable region by projecting ∇fΞ on a null space. Because
the low uncertainty region exists around a training data point,
the trajectory of state space tended to be close to those points.
This effect was more significant for the third case because in
this case CRROS pushed the state more forcefully toward the
low uncertainty region. This resulted in several sharp turns in
the state space (Fig. 3 (f)) and longer travel distances (Fig. 4
(b)). The uncertainty in the third case is the lowest during the
control time as shown in Fig. 4 (a).
Figure 2. The configuration of the 3-DOF articulated manipulator used in the simulations. The position of the end-effector y is controlled by the three joint angles x.

B. Simulation with heteroscedastic noise
For the second simulation, we generated a particular data
set to observe the different features of two CRROS versions.
First, we sampled 100 data points with a Gamma distribution
for x1 and uniform distributions for x2 and x3 . The Gamma
distribution was used to make a biased density of distribution
along x1 (Fig. 5 (c) and (d)). Second, when obtaining y, we
[Figure 3 panels (a)-(f): results in y space (left column) and x space (right column) for β = 0, 10, and 100. Figure 4 panels: (a) prediction uncertainty and (b) ||x̃t+1 − x̂t|| over time.]
Figure 4. Simulation results of 3-DOF articulated manipulator control. (a) shows the prediction uncertainty as a standard deviation, expressed as √fΞ. In the case β = 0, CRROS does not consider prediction uncertainty; as a result, the prediction uncertainty dramatically increased, and this unreliable model eventually caused the control to fail. The case β = 100 had slightly smaller prediction uncertainties during the control time than the case β = 10. (b) shows the norm of x̃t+1 − x̂t. Initially the first case had the smallest value because it provides the minimum norm solution. However, at the end, the large error in y space sharply increased the value. For the third case, CRROS strongly pushed the state to reduce the prediction uncertainty, and as a result, several peaks appear in the plot.
Figure 3. Simulation results of 3-DOF articulated manipulator control. Experiments were conducted with three different conditions, β = 0, 10, and 100. Each row shows the results for a different β value. The analyses were performed in y space (left column) and x space (right column). We illustrate the configuration of the manipulator over time in the left column as images which fade with time. If β = 0, CRROS provides the minimum norm solution and does not consider the distribution of training data. As a result, it fails to track the desired trajectory. In the second and third cases (β = 10, 100), CRROS successfully tracked the desired trajectory. In y space, the results (c) and (e) look similar, but the state paths in x space are significantly different. The path in (f) tends to be closer to the training data points than that in (d), thus the trajectory has several sharp curvatures. This is caused by the high gain β pushing the state more forcefully toward a reliable region.
added unbiased Gaussian noise on the samples of x1 whose standard deviation is (x1 + 1.5)², that is, heteroscedastic noise.
The motivation for this particular training data is to effectively
observe the different features of two CRROS versions. In this
training data, there is a special region existing in the right
side of x1 axis, which includes many data points and also
significant noise. We conducted simulations with a standard
GPR and a heteroscedastic GPR in the same configuration.
Fig. 5 shows the results.
The CRROS with a standard GPR drove the state toward the
high density data region (Fig. 5 (c)), and the substantial noise
in the region resulted in the failure of control (Fig. 5 (a)). For clear visualization, the x3 axis in Fig. 5 (c) and (d) is omitted.
Fig. 5 (e) shows the evidence more clearly. For the comparison,
two prediction uncertainty plots, acquired by the standard GPR
and the heteroscedastic GPR for the x trajectory, were drawn
together in Fig. 5 (e). The standard GPR was not able to
detect the change of noise level until it diverged, although
the heteroscedastic GPR was able to do so. In contrast, the
CRROS with a heteroscedastic GPR successfully tracked the
desired trajectory by avoiding unreliable regions caused by the
low data density and the high noise level (right column of Fig.
5).
V. EXPERIMENTS WITH FLEX-FINGER
In the previous section, we presented simulation results of a
3-DOF articulated manipulator system to validate our methodology. Of course, for such a system control engineers could
also use an analytical model. However, there are a number
of nonlinear systems whose models cannot be represented
analytically. To demonstrate the effectiveness of CRROS for
those systems, we selected a robotic system whose analytical
model is difficult to derive.
The robot is called Flex-finger and is designed to study various
features of the human finger (Fig. 6). The robot is a highly
nonlinear system. Its main body is made of a spring-tempered
steel plate and plastic bones, which act as phalanges and joints
respectively, but do not have a deterministic rotational center.
Four compliant cables (corresponding to the human tendons),
consisting of strings and springs, connect the plastic bones
to the motors in order to generate flexion/extension motion.
Four servo motors (corresponding to human muscles) change
the cable lengths, which leads to a change in the pose of
Flex-finger. The pose of the fingertip is observed by a motion
capture system (corresponding to the human eye).
The objective in this experiment is to control the tip of Flex-finger to track a desired path, which is similar to the control of the human finger with muscles. In this case, going back
[Figure 5 panels: (a)-(b) results in y space, (c)-(d) results in x space, and (e)-(f) error and prediction uncertainty over time, for the standard GPR (left) and the heteroscedastic GPR (right). Figure 6 panels: (a) overview of Flex-finger, (b) a pose involving high friction.]
Figure 6. A manipulator called Flex-finger was used to validate the performance of CRROS. Flex-finger consists of a spring-tempered steel plate, plastic bones, strings, springs, servo motors, and pulleys. Compliant elements and the nonlinear tendon configuration make it difficult to build an analytical model. Depending on the amount of tension, the friction on the cables varies and results in heteroscedastic noise. Some finger poses which deform the steel plate significantly involve high friction on the cables, as shown in (b). The input range of Flex-finger, i.e., the angles of the motor pulleys, is limited to prevent damage to the springs and the finger.
1
1
Std, Het. GPR
Error
0.5
0.2
1
x2
1.5
(d) Result in x space, Het. GPR
Std, St. GPR
Std, Het. GPR
Error
Training data
State
−2
1
(c) Result in x space, St. GPR
0
0
2
0.5
Noise level
−1
0.5
1
1
1
1
y1
2
Initial point
x2
2
x
1.5
0
(b) Result in y space, Het. GPR
3
2
−1
x3
−1
(b) A pose involving high friction
2
0.8
1
0
0
0.2
0.4
0.6
Time
0.8
1
(e) Error and prediction uncertainty, (f) Error and prediction uncertainty,
St. GPR
Het. GPR
Figure 5. Simulation results of CRROS with two versions of GPRs. The left
column figures show the results of the CRROS with a standard GPR, and the
right column figures show the results of the CRROS with a heteroscedastic
GPR. We sampled 100 training data points with a Gamma distribution for x1
and uniform distributions for x2 and x3 . The Gamma distribution was used to
create a biased density of data as shown in (c) and (d). The higher x1 area has
the higher density of data. In addition, we added heteroscedastic noise. The y
with a higher x1 has a bigger noise. With a standard GPR, CRROS failed to
track the given trajectory because it cannot consider the heteroscedastic noise
in uncertainty estimation as shown in (a)(c)(e). In contrast, the CRROS with
a heteroscedastic GPR was able to track the desired trajectory by avoiding
statistically unreliable regions (i.e, sparse data region or high noise region).
to (1), x consists of the rotational angles of the servo motors
and y consists of the x-y coordinates of the fingertip position.
This is a challenging control problem, even for humans. First, the kinematics of the system are highly nonlinear. In particular, there are several bifurcation points due to the characteristics of the spring-tempered steel plate, which means that for a small change in x, y can diverge dramatically. Second, the system has heteroscedastic noise caused by friction along the string routing: the configuration of x varies the friction and changes the noise level on y. Third, the range of the operational control input is limited. To prevent damage to Flex-finger (e.g., permanent deformation of the springs), the control operation must always stay within a safe input range.
To collect the training data, 700 motor angle values, or x, were supplied, and the corresponding finger positions, or y, were recorded. The input values were randomly selected
with uniform distributions from within the available operation
range. This sampling naturally generated dense data areas and
sparse data areas. We will call the upper-right area of y the
‘high noise area’ (Fig. 7 (a) (b)) because y in this area is
generated by bending Flex-finger significantly as shown in
Fig. 6 (b), requiring stronger tension combinations and causing
higher friction than other areas.
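The reliability signal that CRROS relies on, the predictive uncertainty, can be illustrated with a minimal standard-GPR sketch. This is not the paper's implementation: the squared-exponential kernel, its hyperparameters, and the toy data below are illustrative assumptions.

```python
import numpy as np

def sq_exp(A, B, ell=1.0, sf=1.0):
    # Squared-exponential kernel between row-vector inputs A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_predict(X, y, Xs, sn2=1e-2):
    # Standard GPR predictive mean and std at test inputs Xs.
    K = sq_exp(X, X) + sn2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = sq_exp(Xs, X)
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = sq_exp(Xs, Xs).diagonal() - (v ** 2).sum(axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

# Inputs clustered near 0 create a dense data region: the model is
# confident there and uncertain far from the data.
X = np.random.default_rng(0).normal(0.0, 0.5, size=(50, 1))
y = np.sin(X[:, 0])
mu, std = gp_predict(X, y, np.array([[0.0], [4.0]]))
```

Here `std[0]` (inside the dense region) is much smaller than `std[1]` (far from the data). Note that a standard GPR reports only this data-density-driven uncertainty plus a constant noise floor, which is exactly why it cannot flag the input-dependent noise of the high noise area.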
The CRROS algorithm was implemented in C++ (with RT
Linux) and executed on a desktop with Intel Xeon 3.5 GHz
CPU and 8 GB RAM. In the program, the control loop ran
at 30 Hz. The loop frequency was mainly determined by the
communication speed with servo motors, not by the speed of
CRROS. The execution time of the CRROS algorithm in each control loop was less than 10 ms for a standard GPR and less than 20 ms for a heteroscedastic GPR.
We have conducted two sets of experiments. In the first
experiment, the performances and limitations of CRROS were
explored with a standard GPR while tracking two desired
trajectories. In the second, CRROS with a heteroscedastic GPR
was used to overcome the shortcomings of a standard GPR.
A. CRROS with a standard GPR
In the first set of experiments, CRROS with a standard
GPR tracked two desired trajectories. The first trajectory was
designed to avoid the high noise area, and the second trajectory
was designed to pass through the high noise area. The control
gains were tuned for optimal tracking performance, and the optimized gains were used for all experiments. Fig. 7 (a)-(d) illustrates the experimental results.
Figure 7. CRROS with a standard GPR controlled Flex-finger to track two different trajectories: (a) result in y space, without passing through the high noise area; (b) result in y space, with passing through the high noise area; (c), (d) the corresponding tracking error and prediction uncertainty. The errors were reasonably low when the state was not near the high noise area. When the state was close to or within the high noise area, the tracking errors significantly increased. This result indicates that CRROS with a standard GPR is not able to avoid the unreliable region caused by the high noise level.
In the experiments, as soon as the CRROS algorithm started, the fingertip immediately converged to the desired trajectory, and the prediction uncertainty also decreased. The tracking errors were reasonably low until the fingertip reached the high noise area; when the fingertip was close to or within the high noise area, the tracking errors increased. For the first trajectory, the prediction uncertainty and the tracking errors were always low (Fig. 7 (c)), validating that CRROS pushes the state toward the low uncertainty region. For the second trajectory, the tracking error was larger than in the first case when the state was in the high noise area, indicating that CRROS with a standard GPR is limited by heteroscedastic noise. While the state was in the high noise area, the statistical model differed from the actual system because CRROS failed to avoid the unreliable area; as a result, CRROS generated wrong directional outputs, causing a fluctuating motion and increasing the tracking error.
B. CRROS with a heteroscedastic GPR
In the second set of experiments, CRROS with a heteroscedastic GPR tracked two desired trajectories that pass through the high noise area. The first desired trajectory was the same trajectory that CRROS with a standard GPR could not track. The other trajectory was created with a different shape, a rectangle, and a different direction, counterclockwise. Fig. 8 and Fig. 9 illustrate the experimental results.

Figure 8. CRROS with a heteroscedastic GPR controlled Flex-finger to track two trajectories: (a) result in y space, elliptical path; (b) result in y space, rectangular path; (c), (d) the corresponding tracking error and prediction uncertainty. Overall, the tracking errors were kept at a reasonably low level along the whole trajectories. In the high noise region, the prediction uncertainty increased, but CRROS was able to track the desired path. This verifies that CRROS drove the state toward the reliable region containing dense data and low noise.
Overall, the tracking performance of CRROS with a heteroscedastic GPR was superior to that of CRROS with a standard GPR. At first, the tracking error was similar, but after entering the high noise region, the superiority of the heteroscedastic GPR was evident. The prediction uncertainty increased in the high noise region, but CRROS drove the state toward the reliable region containing dense data and low noise, while maintaining low tracking errors. Another notable point is that most of the control inputs used in tracking the two different trajectories were located within the range of training data that we defined for data collection (Fig. 9). This verifies that CRROS tends to push the state toward the dense data region.
The minor exception shown in Fig. 9 (c) can be explained by the priority ordering of CRROS. The first priority of CRROS is to track a desired trajectory: if the two objectives, tracking and uncertainty minimization, conflict, CRROS chooses to track the desired trajectory.
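This priority ordering is a direct consequence of the null-space projector in the CRROS update: any secondary (uncertainty-reducing) motion is filtered through I − J⁺J, so to first order it cannot disturb the task-space output. A small NumPy check illustrates this (the Jacobian entries are arbitrary, chosen only for illustration):

```python
import numpy as np

# A redundant system: 2 outputs, 3 inputs (full row rank).
J = np.array([[1.0, 0.5, -0.3],
              [0.2, -1.0, 0.8]])
Jp = np.linalg.pinv(J)
P = np.eye(3) - Jp @ J          # null-space projector of J

v = np.array([0.7, -0.2, 1.5])  # any secondary-objective direction
# The projected step causes no first-order task-space motion:
residual = np.linalg.norm(J @ (P @ v))   # numerically ~0
```

Because P @ v always lies in the null space of J, the secondary objective can only be pursued to the extent that it does not interfere with tracking.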
Figure 9. Histograms of the system states used in tracking two different trajectories: (a)-(d) histograms of states x1 through x4, in degrees, with the training data range marked. Red histograms were obtained when CRROS with a heteroscedastic GPR tracked the ellipse trajectory, and blue histograms correspond to the rectangle trajectory. Most of the states were located within the range of training data. This verifies that CRROS tends to push the state toward the dense data region. The minor exception shown in (c) can be explained by the priority of CRROS: if its two objectives, tracking and uncertainty minimization, conflict, CRROS chooses to track the desired trajectory.

VI. DISCUSSION

We have presented a statistical model-based control algorithm using GPR, called CRROS. The features of CRROS have been explored in simulation with a 3-DOF articulated manipulator, and its effectiveness has been validated by the
experiments with Flex-finger. The experiments with a physical
system introduce a host of practical considerations, including challenges in modeling, heteroscedastic noise, and restrictions on the range of inputs. CRROS drives the state toward a low uncertainty region while tracking a desired output trajectory. Since CRROS takes advantage of the pseudo-inverse and gradient operators, it does not need any iteration and thus allows for a high-speed control loop, which is essential in many robotics or large-scale applications. Furthermore, the control is performed around the training data and in the low-noise region, which ensures safety in control.
Robotic system designers have often avoided optimal designs to ensure that analytical models can be developed for their systems. For example, they have preferred to use linear elements, such as linear springs, which are easily modeled. A significant motivation for CRROS is to provide control schemes for a wide range of systems. The advantages shown in the simulations and experiments indicate the potential of CRROS for various robotic systems that are difficult to model, such as soft robots. In addition, CRROS provides a unique optimal solution for the redundant DOFs of robotic manipulators. There are various optimization techniques for the redundant DOFs of manipulators, e.g., minimizing energy or torque [23]. CRROS selects the solution closest to the previous experience within low-noise training data. This unique optimization feature is useful when teaching a robot an objective that is challenging to express in a simple mathematical form.
For a practical implementation of CRROS, a user needs to consider several points. The first is the selection of which version of GPR to use with CRROS. Theoretically, a heteroscedastic
GPR can model a system more accurately than a standard
GPR. However, the superiority of a heteroscedastic GPR in
modeling does not always guarantee a higher performance
in CRROS. The higher complexity of a heteroscedastic GPR
requires a longer prediction time and may constrain the step
size of a gradient-based optimization. Therefore, the significance of heteroscedastic noise and the step size in a control
loop should be considered together to select a version of
GPR for CRROS. Another practical issue is the tuning of the control gains. The first gain ξ can be chosen depending on how aggressively the control engineer wants the system to be controlled. The second gain β can be tuned to balance the two terms in (29), J_µ^+ (y_{t+1} − ŷ_t) and β(I_d − J_µ^+ J_µ)∇f_Ξ. The balance can be confirmed by monitoring each term, and in practice suitable gains can be obtained in a few iterations.
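Combining the two terms of (29), one control step can be sketched as below. This is a hedged sketch, not the paper's code: `J` stands for the GPR-model Jacobian J_µ, `grad_f` for the reliability gradient ∇f_Ξ, and the sign convention (∇f_Ξ pointing toward the reliable region) is an assumption here.

```python
import numpy as np

def crros_step(x, y_des, y_hat, J, grad_f, xi=1.0, beta=0.1):
    # One CRROS-style update: pseudo-inverse tracking (primary
    # objective) plus a null-space step toward the reliable region
    # (secondary objective).
    Jp = np.linalg.pinv(J)
    track = Jp @ (xi * (y_des - y_hat))                   # tracking term
    explore = beta * (np.eye(len(x)) - Jp @ J) @ grad_f   # null-space term
    return x + track + explore
```

Because the second term lies in the null space of J, the two terms can be monitored independently when tuning: the first drives the output error, while the second only redistributes the redundant inputs.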
In the future, CRROS can be improved in several ways. First, a more systematic data collection method can improve the performance of CRROS. Because the performance of statistical model-based control is highly dependent on the distribution of the training data set, a better data collection strategy would lead to better control performance. Active learning theories developed in reinforcement learning can be a solution [24]. Another important issue is the correlation of multivariate outputs. In this paper, we predicted the multidimensional output with multiple independent regression functions, that is, multi-kriging [25]. Thus, the predicted multivariate outputs are uncorrelated. This issue might be resolved by using dependent GPs [26]. As stated above, the acceleration of GPR prediction may improve the control performance for large-scale systems. Sparse approximation [16], [17] and GPU-based calculation [18] are possible solutions.
REFERENCES
[1] C. M. Bishop, Neural Networks for Pattern Recognition. New York,
NY, USA: Oxford University Press, Inc., 1995.
[2] A. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
[3] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine
learning. The MIT Press, 2006.
[4] G. Gregorčič and G. Lightbody, “Gaussian process approach for modelling of nonlinear systems,” Engineering Applications of Artificial Intelligence, vol. 22, no. 4-5, pp. 522–533, 2009.
[5] C. Klein and C.-H. Huang, “Review of pseudoinverse control for
use with kinematically redundant manipulators,” Systems, Man and
Cybernetics, IEEE Transactions on, vol. SMC-13, no. 2, pp. 245–250,
1983.
[6] A. D. Deshpande, J. Ko, D. Fox, and Y. Matsuoka, “Control strategies
for the index finger of a tendon-driven hand,” The International Journal
of Robotics Research, vol. 32, no. 1, pp. 115–128, 2013.
[7] O. Khatib, “Real-time obstacle avoidance for manipulators and mobile
robots,” The international journal of robotics research, vol. 5, no. 1,
pp. 90–98, 1986.
[8] F. Allgöwer, T. A. Badgwell, J. S. Qin, J. B. Rawlings, and S. J. Wright, “Nonlinear predictive control and moving horizon estimation: an introductory overview,” in Advances in Control, pp. 391–449, Springer, 1999.
[9] B. Likar and J. Kocijan, “Predictive control of a gas-liquid separation plant based on a Gaussian process model,” Computers and Chemical Engineering, vol. 31, no. 3, pp. 142–152, 2007.
[10] J. Kocijan, R. Murray-Smith, C. Rasmussen, and A. Girard, “Gaussian process model based predictive control,” in Proceedings of the American Control Conference, vol. 3, pp. 2214–2219, 2004.
[11] M. Deisenroth and C. E. Rasmussen, “PILCO: A model-based and data-efficient approach to policy search,” in Proceedings of the 28th International Conference on Machine Learning, pp. 465–472, 2011.
[12] J. Peters and S. Schaal, “Policy gradient methods for robotics,” in Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference
on, pp. 2219–2225, IEEE, 2006.
[13] R. Martinez-Cantin, N. de Freitas, A. Doucet, and J. A. Castellanos, “Active policy learning for robot planning and exploration under uncertainty,” in Robotics: Science and Systems, pp. 321–328, 2007.
[14] “Supplementary material of CRROS,” http://www.me.utexas.edu/~reneu/res/CRROS.html.
[15] “Control in the reliable region of a statistical model with gaussian
process regression,” in IEEE/RSJ International Conference on Intelligent
Robots and Systems, pp. 654–660, 2014.
[16] E. Snelson and Z. Ghahramani, “Sparse Gaussian processes using pseudo-inputs,” in Advances in Neural Information Processing Systems 18, pp. 1257–1264, 2006.
[17] D. Nguyen-Tuong and J. Peters, “Local Gaussian process regression for real-time model-based robot control,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 380–385, 2008.
[18] B. V. Srinivasan, Q. Hu, and R. Duraiswami, “GPUML: Graphical processors for speeding up kernel machines,” in Workshop on High Performance Analytics: Algorithms, Implementations, and Applications, SIAM Conference on Data Mining, 2010.
[19] K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard, “Most likely heteroscedastic Gaussian process regression,” in Proceedings of the 24th International Conference on Machine Learning, pp. 393–400, 2007.
[20] Q. V. Le, A. J. Smola, and S. Canu, “Heteroscedastic Gaussian process regression,” in Proceedings of the 22nd International Conference on Machine Learning, pp. 489–496, 2005.
[21] M. Lázaro-Gredilla and M. K. Titsias, “Variational heteroscedastic Gaussian process regression,” in Proceedings of the 28th International Conference on Machine Learning, pp. 841–848, 2011.
[22] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, pp. 1–38, 1977.
[23] J. Park, Y. Choi, W. K. Chung, and Y. Youm, “Multiple tasks kinematics
using weighted pseudo-inverse for kinematically redundant manipulators,” in IEEE International Conference on Robotics and Automation,
pp. 4041–4047, 2001.
[24] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in
robotics: A survey,” International Journal of Robotics Research, vol. 32,
no. 11, pp. 1238–1274, 2013.
[25] C. Williams and C. Rasmussen, “Gaussian processes for regression,” in
Advances in neural information processing systems, pp. 514–520, 1996.
[26] P. Boyle and M. Frean, “Dependent Gaussian processes,” in Advances in Neural Information Processing Systems, pp. 217–224, 2004.