Control in the Reliable Region of a Statistical Model
Youngmok Yun, Student Member, IEEE, Jonathan Ko, Member, IEEE, and Ashish D. Deshpande, Member, IEEE

Abstract—We present a novel statistical model-based control algorithm, called control in the reliable region of a statistical model (CRROS). First, CRROS builds a statistical model with Gaussian process regression, which provides a prediction function and the uncertainty of the prediction. Then, CRROS avoids high-uncertainty regions of the statistical model by regulating the null space of the pseudoinverse solution. Simulation results demonstrate that CRROS drives the states toward high-density and low-noise regions of the training data, ensuring high reliability of the model. Experiments with a robotic finger, called Flex-finger, show the potential of CRROS to control robotic systems that are difficult to model, contain constrained inputs, and exhibit heteroscedastic output noise.

I. INTRODUCTION

Robotics researchers have developed statistical model learning algorithms with neural networks (NN) [1], support vector regression (SVR) [2], and Gaussian process regression (GPR) [3], [4] to resolve the challenges of system identification. However, control with statistical models remains challenging: these data-dependent models are sometimes unreliable, and an incorrect model may lead to control failure and safety accidents. In addition, the heavy computational burden of statistical models results in a low control-loop frequency, which is critical in control. Our goal is to develop a statistical model-based control algorithm that is reliable, fast, and operates in a safe range. A crucial weakness of statistical modeling methods is that the reliability of the resulting models is highly dependent on the training data.
Even if a learning algorithm works well on a given training data set, it is almost impossible to reliably predict the behavior of a nonlinear system when the predicted region includes sparse data or a significant level of noise (Fig. 1). If a state is out of the reliable region, the statistical model may differ substantially from the actual system behavior, eventually resulting in failure of control. We consider a tracking control problem for the following system:

y = g(x),  dim(x) > dim(y)   (1)

(Y. Yun and A. D. Deshpande are with the Department of Mechanical Engineering, The University of Texas at Austin, Austin, TX 78703, USA; e-mail: {yunyoungmok@utexas.edu, ashish@austin.utexas.edu}. J. Ko is with the Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA; e-mail: jonko6@gmail.com.)

where y is the output, x is the controllable state, and g is a nonlinear system. Since dim(x) > dim(y), multiple x can serve as solutions for a given y. For nonlinear systems, this redundancy is common for various reasons (e.g., singularity). A kinematically redundant manipulator with joint angles as the state and end-effector position as the output is a well-known example of such a system [5]. Another example is a human-like tendon-driven finger [6]. We seek to design a control algorithm that determines values of x to track a desired trajectory of y with a statistical model, denoted f. For successful control, the statistical model f should reliably represent g. We present a statistical model-based control strategy for the system (1), called CRROS (control in the reliable region of a statistical model). The underlying idea is to drive x toward a region where f is reliable by regulating the redundant DOFs while pursuing a desired output trajectory. With any statistical modeling method, an unreliable f is inevitable in regions of sparse data or serious noise, but CRROS is designed to avoid these unreliable regions.
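As a toy illustration of the redundancy in (1) — a sketch, not from the paper — consider the 3-link planar arm used later in the simulations (Section IV): three joint angles map to a 2-D end-effector position, so motion along the null space of the Jacobian changes x while leaving y essentially unchanged. The joint angles below are arbitrary, and the numerical-Jacobian helper is illustrative:

```python
import numpy as np

L = np.array([1.5, 1.2, 1.0])  # link lengths, as in the later simulation

def g(x):
    """Forward kinematics of a 3-link planar arm: R^3 -> R^2."""
    a = np.cumsum(x)  # absolute link angles
    return np.array([np.sum(L * np.cos(a)), np.sum(L * np.sin(a))])

def jacobian(f, x, eps=1e-6):
    """Numerical Jacobian of f at x via central differences."""
    J = np.zeros((f(x).size, x.size))
    for i in range(x.size):
        d = np.zeros(x.size)
        d[i] = eps
        J[:, i] = (f(x + d) - f(x - d)) / (2 * eps)
    return J

x = np.array([0.3, 0.5, 0.4])
J = jacobian(g, x)                   # 2 x 3: one redundant DOF
v = np.linalg.svd(J)[2][-1]          # unit vector spanning the null space
x2 = x + 0.02 * v                    # move along the null space
# g(x2) is, to first order, unchanged: a different x, (almost) the same y.
```

CRROS exploits exactly this freedom: the null-space motion is "free" with respect to the output, so it can be spent on reaching reliable regions of the model.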
The first step of CRROS is to develop a statistical model with GPR. Since GPR is fully based on Bayes' theorem, it provides not only the predicted output for a given input but also the uncertainty of the prediction. This prediction uncertainty is important because it indicates which regions are reliable. The next step in CRROS is to track a desired output y using the pseudoinverse of the Jacobian of the GPR model, while simultaneously regulating the null space to drive x toward a region of low uncertainty.

Prior work related to our research is the obstacle avoidance algorithm for manipulators and mobile robots using a virtual repulsive force field [7]. The force field in [7] is generated by the locations of obstacles in output space; in CRROS, the force field is generated by the distribution of prediction uncertainty in state space. The force-field approach provides an optimal direction for reducing the prediction uncertainty via the gradient operator, without an iterative optimization procedure. Since the introduction of GPR, a few researchers have integrated nonlinear model predictive control (NMPC) [8] with the uncertainty prediction function of GPR to address model reliability issues. However, because the combination of GPR with NMPC imposes a considerable computational burden, the applications have been restricted to chemical process control [9], [10]. Another related approach has been developed in the field of reinforcement learning (RL). RL researchers have developed learning algorithms that determine a behavior policy for robots to achieve a specific goal [11], [12], [13]. The learning procedure contains an effort to minimize the uncertainty of the statistical model. Policy search algorithms may allow robots to pursue more sophisticated goals, but the selection of an appropriate policy and the optimization of the policy parameters are still challenging problems.

CRROS solves a fundamental problem of a statistical model (i.e., prediction reliability) for controlling the system (1) to track a desired trajectory. Since CRROS takes advantage of the force-field approach (the gradient), which provides an optimal solution at the current state, it does not need any iterations and thus allows a high-speed control loop. CRROS pushes the state toward a region with dense, low-noise data. This tendency makes the control safer, because the training data region is a guaranteed safe region and low noise leads to predictable outputs. These advantages enable CRROS to control systems for which it is challenging to develop a parametric model, e.g., soft robots. CRROS is also an efficient algorithm for teaching a redundant robot, because its state follows the region where the robot has learned repeatedly. For example, if we teach a kinematically redundant manipulator desired motions, the robot will generate a new x to track a desired y, and the redundant motion will be optimized to stay close to the learned motion.

In the following sections, we explain the GPR-based modeling method in two versions, a standard GPR and a heteroscedastic GPR (Section II), and the details of CRROS (Section III). For validation, we carried out computer simulations (Section IV) and experiments with a robotic system (Section V). In the experiments, Flex-finger, for which it is challenging to build an analytical model, is controlled to prove the practical effectiveness of CRROS. In addition, we provide all simulation code, written in MATLAB, to aid the reader's understanding of the core principle of CRROS [14], as well as all experiment videos showing the details of the experimental results [14]. Preliminary work on this topic was presented in a conference [15].

II. SYSTEM MODELING WITH GAUSSIAN PROCESS REGRESSION

In this section, we present an overview of building a model for a nonlinear system with the GPR algorithm [3], [4]. An advantage of GPR is that the user does not need to actively tune the complexity and parameters of the model, in contrast to other nonlinear regression methods such as NN [1] and SVR [2]. Once the Gaussian process (GP) is defined with a mean and a covariance function (a zero mean and a Gaussian kernel are most commonly used), the detailed model is determined by the database itself. Another merit is that the prediction output is a probability distribution, defined as a Gaussian, whose variance can be regarded as the level of prediction uncertainty. This uncertainty plays a key role in the development of CRROS. One disadvantage of GPR is its heavy computational burden, which grows as O(n²) per prediction, where n is the number of training data [3]. This can be partially mitigated by methods such as sparse approximation [16], local Gaussian process regression [17], or the use of a GPU [18].

Figure 1. From a nonlinear function, a training data set was sampled with heteroscedastic noise. The training data can be classified into three regions: the first has a high density of data and low noise; the second has sparse data; the third has a high density of data and high noise. Only the first region can be reliably modeled with a statistical method.

The standard version of GPR [3] considers the density of the training data and the overall noise level when estimating uncertainty. The heteroscedastic GPR [19] considers not only the density of the training data but also heteroscedastic noise when estimating the prediction uncertainty.
As a result, in the third region the prediction uncertainty estimated by the standard GPR is low, which is wrong; the heteroscedastic GPR, in contrast, estimates the uncertainty correctly in all regions.

In this paper, we present CRROS with two GPR algorithms: the first uses the standard version of GPR [3], while the second uses a heteroscedastic GPR [19]. Fig. 1 compares the properties of the two GPRs. The core principle of CRROS is the same for both versions, and it can indeed be combined with other statistical modeling methods as long as they provide the uncertainty of the prediction. The standard GPR is a well-known regression algorithm in robotics and statistics, supported by many communities because of its robustness and mathematical simplicity. The heteroscedastic GPR can model more complex systems, but is approximately half as fast as the standard GPR; there is ongoing work to improve its speed [19], [20], [21]. In general, we recommend using a heteroscedastic GPR for CRROS when heteroscedastic systems must be modeled, and a standard GPR otherwise. The detailed properties of CRROS with the two GPRs are discussed in the later sections.

A. Modeling with a Standard GPR

We first build a statistical model with the standard version of GPR, explaining only the parts of GPR essential for CRROS; for further details, refer to [3]. Assume that we have a training data set D for the system (1), which contains inputs xi and outputs yi:

D = {(xi, yi) | i = 1, 2, …, n},  X = [x1 x2 … xn]ᵀ,  y = [y1 y2 … yn]ᵀ

Given this data set, we wish to predict the probability density of y∗ for a new query point x∗. In other words, we want to find p(y∗ | x∗, D, Θ), where Θ is a hyperparameter set for the GPR model. If the system (1) has a multidimensional output (a MIMO system), the output is predicted by multiple functions p(y∗j | x∗, Dj, Θj), where j is the index of the output.
In this section, for simplicity, we only explain p(y∗ | x∗, D, Θ); the next section describes the expansion to a multidimensional output.

A Gaussian process is a stochastic process, defined as a collection of random variables, any finite number of which have a joint Gaussian distribution. It is completely determined by a mean function and a covariance function:

f(x) ≡ GP(m(x), k(x, x′))   (2)

where the mean function is m(x) = E[f(x)] and the covariance function is k(x, x′) = E[(f(x) − m(x))(f(x′) − m(x′))]. In our GP model, the mean and covariance functions are chosen as:

m(x) = 0   (3)

k(x, x′) = νf² exp( −(δx)ᵀ Λ (δx) / 2 ) + νn² φ(i, j)   (4)

where δx is defined as x − x′, φ(i, j) is the Kronecker delta function, and Λ is a diagonal matrix whose elements are (squared inverse) length scales for the input space. νf² specifies the overall vertical scale of variation of the latent values, and νn² the latent noise variance. The hyperparameter set, denoted Θ, consists of νf, νn, and Λ, and determines the characteristics of a GP model.

The Gaussian process prior is updated with a training data set based on Bayes' theorem. The posterior is given as:

p(y∗ | x∗, D, Θ) = N( k∗ᵀ K⁻¹ y, κ − k∗ᵀ K⁻¹ k∗ )   (5)

where K is the matrix whose elements Kij are the covariance values k(xi, xj), k∗ = [k(x∗, x1) … k(x∗, xn)]ᵀ, and κ = k(x∗, x∗). For the derivation, refer to [3]. Fig. 1 helps in understanding this useful feature of GPR.

The characteristics of a GP model are determined by the choice of the hyperparameter set, so selecting an appropriate hyperparameter set is critical for a good representation of a system. We select the hyperparameter set that maximizes the log-likelihood p(y | X, Θ):

log p(y | X, Θ) = −(1/2) yᵀ K⁻¹ y − (1/2) log |K| − (n/2) log 2π   (6)

and the maximization is performed with a gradient-based optimization algorithm.
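As a concrete illustration, the posterior (5) with the squared-exponential kernel (4) amounts to a few lines of linear algebra. The following is a minimal numpy sketch, not the paper's MATLAB implementation; the hyperparameter values are arbitrary placeholders rather than the likelihood-maximizing ones:

```python
import numpy as np

def se_kernel(A, B, lam=1.0, vf=1.0):
    """Squared-exponential kernel (4) with a single scalar length-scale term."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return vf**2 * np.exp(-0.5 * lam * d2)

def gp_posterior(X, y, Xs, lam=1.0, vf=1.0, vn=0.1):
    """Posterior mean and variance (5) at query points Xs."""
    K = se_kernel(X, X, lam, vf) + vn**2 * np.eye(len(X))  # noise on the diagonal
    ks = se_kernel(X, Xs, lam, vf)                         # n x m
    kappa = se_kernel(Xs, Xs, lam, vf).diagonal()          # k(x*, x*)
    mean = ks.T @ np.linalg.solve(K, y)                    # k*^T K^-1 y
    var = kappa - np.einsum('im,im->m', ks, np.linalg.solve(K, ks))
    return mean, var

# The variance is low where data are dense and rises toward vf^2 far from the data,
# which is exactly the reliability signal CRROS exploits.
X = np.random.default_rng(0).uniform(-1, 1, (30, 1))
y = np.sin(X[:, 0])
m, v = gp_posterior(X, y, np.array([[0.0], [5.0]]))   # v[0] << v[1]
```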
The gradient is calculated as:

∂ log p(y | X, Θ) / ∂θ = (1/2) yᵀ K⁻¹ (∂K/∂θ) K⁻¹ y − (1/2) tr( K⁻¹ ∂K/∂θ )   (7)

B. Modeling with a Heteroscedastic GPR

Secondly, we build a statistical model with the heteroscedastic GPR devised by [19]. The basic idea behind this method is to create two GP models. The main GP finds a regression model for the training data, while the second GP fits the residuals of the main regression model. By combining the two GPs, the heteroscedastic GPR accounts for both the density of data and the noise level when estimating the prediction uncertainty. For the main GP, the same data set D introduced in the previous subsection is used. The data set for the second GP is:

D′ = {(xi, zi) | i = 1, 2, …, n}

where zi = log(ri) and ri is the squared residual of the main GPR at the query point xi. The log function ensures a positive value of r in the inverse mapping, ri = exp(zi). The posterior of the main GP, i.e., of the heteroscedastic GPR, is given as:

p(y∗ | x∗, D, Θ) = N( k∗ᵀ (K + R)⁻¹ y, κ + r∗ − k∗ᵀ (K + R)⁻¹ k∗ )   (8)

where R = diag(r1, …, rn), r∗ = exp(z∗), and z∗ is the mean of the second GP posterior at the query point x∗. K, k∗, and κ are determined in the same way as in (5). The main and second GPs are recursively connected: the prediction of the main GP changes the residuals, and the regression of these residuals (the second GP) in turn affects the main GP. The hyperparameter sets of the two GPs can be optimized by iterating the predictions of the two GPs until the hyperparameters converge, like an EM algorithm [22]. For the initial iteration, a standard GP is used as the main GP, since the residuals are not yet available.

III. DEVELOPMENT OF CRROS

As stated in Section I, a statistical model has different features compared to an analytical model, and thus demands a different control strategy. Specifically, if we apply a conventional control algorithm to a statistical model, it will output a path of x with no regard to the training data. Since the reliability of the model is highly dependent on the data density and noise level, this may result in failure of control. One example of this situation is presented in Section IV.

To address this problem, we propose a novel control strategy, called CRROS, that takes advantage of the system redundancy to drive the system states to reliable regions. The basic idea of CRROS is to first track a desired output with the pseudoinverse of the Jacobian of the GPR model, and then push the state of the system in the direction that minimizes the prediction uncertainty on the null space associated with the Jacobian. The direction that minimizes the prediction uncertainty is found with the gradient operator, which provides the slope of the prediction uncertainty field obtained from GPR. The advantage of this approach is that CRROS can find the optimal solution at the current state without iteration. The method is inspired by the obstacle avoidance algorithm [5], [7]; in CRROS, the obstacle is the unreliable region of the statistical model.

A. Necessary Equations for Developing CRROS with Standard GPR

To introduce CRROS, we first define a number of mathematical terms used in the main CRROS algorithm. The standard GP in (5) provides two values, the predicted output and the variance of the prediction, which can be rewritten as two functions. The first function, fµ(x∗) in (9), is the predicted output at a query point x∗, and the other function, fΣ(x∗) in (10), is the variance (uncertainty) of the prediction.
fµ(x∗) = k∗ᵀ K⁻¹ y   (9)

fΣ(x∗) = κ − k∗ᵀ K⁻¹ k∗   (10)

Next, their Jacobians are calculated as:

Jµ(x∗) = ∂fµ(x∗)/∂x∗ = (K⁻¹ y)ᵀ ∂k∗/∂x∗   (11)

JΣ(x∗) = ∂fΣ(x∗)/∂x∗ = −2 k∗ᵀ K⁻¹ ∂k∗/∂x∗   (12)

The size of both matrices Jµ and JΣ is 1 × d, where d = dim(x). The term ∂k∗/∂x∗, common to (11) and (12), is the n × d matrix:

∂k∗/∂x∗ = [ ∂k(x∗, xi)/∂x∗[j] ],  i = 1, …, n,  j = 1, …, d   (13)

where x∗[j] indicates the j-th element of the vector x∗. Since the covariance function of our GP uses the squared-exponential kernel, each element is:

∂k(x∗, x)/∂x∗[i] = −Λii (x∗[i] − x[i]) νf² exp( −(δx)ᵀ Λ (δx) / 2 )   (14)

So far, we have been limited to one-dimensional outputs. Let us expand the problem to a multidimensional output, dim(y) = q > 1. We assume no a priori knowledge of the relation between the outputs, so the multidimensional output functions are simply stacked from the one-dimensional functions:

fµ(x∗) = [fµ1(x∗) fµ2(x∗) … fµq(x∗)]ᵀ   (15)

fΣ(x∗) = [fΣ1(x∗) fΣ2(x∗) … fΣq(x∗)]ᵀ   (16)

where fµi and fΣi are the individual output functions built as in (9) and (10); both fµ and fΣ are q-dimensional vectors. In the same way, the q × d Jacobian matrices Jµ and JΣ are built by stacking the rows:

Jµ(x∗) = [Jµ1(x∗); Jµ2(x∗); …; Jµq(x∗)],  JΣ(x∗) = [JΣ1(x∗); JΣ2(x∗); …; JΣq(x∗)]   (17)

Since we want to solve a control problem, we are more interested in the sum of the variances of the multiple outputs than in the individual variances. Therefore, another function and its gradient are defined as:

fΞ(x∗) = Σᵢ₌₁..q fΣi(x∗)   (18)

∇fΞ(x∗) = ( Σᵢ₌₁..q JΣi(x∗) )ᵀ   (19)

where fΞ is a scalar function providing the sum of the prediction variances; the variances are already squared standard deviations, so the sum represents the total variance. ∇fΞ(x∗) is a d-dimensional gradient vector indicating the direction of the linearized slope of fΞ at the point x∗.

B. Necessary Equations for Developing CRROS with Heteroscedastic GPR

For the heteroscedastic GPR, we only need to replace (9)-(12) with (20)-(23):

fµ(x∗) = k∗ᵀ (K + R)⁻¹ y   (20)

fΣ(x∗) = κ + r∗ − k∗ᵀ (K + R)⁻¹ k∗   (21)

Jµ(x∗) = ((K + R)⁻¹ y)ᵀ ∂k∗/∂x∗   (22)

JΣ(x∗) = ∂r∗/∂x∗ − 2 k∗ᵀ (K + R)⁻¹ ∂k∗/∂x∗   (23)

Since r∗ = exp(z∗) and z∗ is the mean of a GP posterior, ∂r∗/∂x∗ is calculated with (11). All other elements are obtained in the same way as for the standard GPR.

C. CRROS Algorithm

The system equation g in (1) is modeled with the GP mean function fµ, which can be linearized as (24) and inverted as (25). From the viewpoint of feedback control, (25) can be rewritten as (26):

∆y = Jµ ∆x   (24)

∆x = Jµ⁺ ∆y + (Id − Jµ⁺ Jµ) v   (25)

x̃t+1 = ξ Jµ⁺ (yt+1 − ŷt) + (Id − Jµ⁺ Jµ) v + x̂t   (26)

where x̃t+1 is the next target state to be controlled, x̂t is the observed current state, yt+1 is the next desired output, and ŷt is the observed current output. From here on, since we are only interested in the prediction at the point x̂t, i.e., x∗ = x̂t, we write Jµ instead of Jµ(x̂t); other notation follows this convention as well. Jµ is the Jacobian matrix of fµ, Jµ⁺ is the Moore-Penrose pseudoinverse of Jµ, and ξ is a scalar gain. (Id − Jµ⁺ Jµ) is a matrix projecting a vector onto the null space associated with Jµ, where Id is the d × d identity matrix, and v is an arbitrary d-dimensional vector. A remarkable point is that we can select any v, since it is projected onto the null space and therefore does not affect ∆y; it does, however, affect ∆x. In many control applications, v is set to the zero vector because this minimizes ||x̃t+1 − x̂t||, i.e., it is the minimum-norm solution. In our case, v is set differently. In CRROS, we wish to minimize the sum of prediction variances fΞ while tracking a desired path. To reduce fΞ, we need to consider a direction of ∆x that minimizes ∆fΞ.
The direction of ∆x is defined with a direction vector vd:

vd = argmin over unit vectors u of ( fΞ(x̂t + εu) − fΞ(x̂t) ) = −∇fΞ / ||∇fΞ||   (27)

where ε is a small positive real number. Equation (27) is a widely used result in gradient-based optimization algorithms [7]. The magnitude of the change of fΞ caused by vd is defined as vm:

vm = fΞ(x̂t + vd) − fΞ(x̂t) ≈ ∇fΞᵀ vd = −||∇fΞ||   (28)

where the "≈" is due to the linearization, i.e., the gradient operation. Now we build CRROS's v with direction vd and magnitude β||vm||. Using (27) and (28), v = −β∇fΞ, where β is a control gain for reducing the prediction variance. v points toward a reliable region of the GPR statistical model, and its magnitude is proportional to the slope of fΞ. Finally, (26) is rewritten with the new v as:

x̃t+1 = ξ Jµ⁺ (yt+1 − ŷt) − β (Id − Jµ⁺ Jµ) ∇fΞ + x̂t   (29)

Because the primary task is to track the desired trajectory, Jµ⁺ first uses part of the x space, and the remaining space (i.e., the null space) is used for −β∇fΞ. The value of β determines how strongly the algorithm pushes the target state x̃t+1 toward the low-uncertainty region. A large value of β increases ||x̃t+1 − x̂t||, because the null space is orthogonal to the space of Jµ⁺.

IV. SIMULATION WITH A 3-DOF ARTICULATED MANIPULATOR

For validation of the algorithm, we conducted a simulation with a 3-DOF articulated manipulator. First, we validate the CRROS algorithm with a data set including homogeneous noise, to clearly observe the properties of CRROS. Second, a training data set including heteroscedastic noise is used to show the difference between the two versions of CRROS. Simulation code written in MATLAB is available in [14]. Fig. 2 shows the configuration of the simulated 3-DOF articulated manipulator. The state x consists of the three joint angles, and the output y consists of the x-y position of the end-effector.
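The update law (29) reduces to a few lines of linear algebra. The following is a minimal sketch, assuming the Jacobian Jµ and the gradient ∇fΞ at the current state are supplied by the GPR model; the numeric values below are illustrative placeholders:

```python
import numpy as np

def crros_step(x_hat, y_hat, y_des, J_mu, grad_fxi, xi=1.0, beta=10.0):
    """One CRROS update (29): pseudoinverse tracking plus
    null-space descent on the prediction-uncertainty field."""
    J_pinv = np.linalg.pinv(J_mu)          # Moore-Penrose pseudoinverse
    d = x_hat.size
    N = np.eye(d) - J_pinv @ J_mu          # projector onto the null space of J_mu
    return x_hat + xi * J_pinv @ (y_des - y_hat) - beta * N @ grad_fxi

# Toy check: with no tracking error, the step is pure null-space motion,
# so it does not change the output to first order.
J = np.array([[1.0, 0.5, 0.2],
              [0.0, 1.0, 0.3]])            # placeholder 2x3 Jacobian (q=2, d=3)
x = np.zeros(3)
g = np.array([0.4, -0.2, 0.1])            # placeholder gradient of f_Xi
step = crros_step(x, np.zeros(2), np.zeros(2), J, g)
print(np.allclose(J @ (step - x), 0))      # True
```

The two terms split the d-dimensional state space: the pseudoinverse term spends rank(Jµ) directions on tracking, and the uncertainty-descent term acts only in the remaining null-space directions.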
Based on the system configuration, the kinematics are governed by the nonlinear function g in (30):

y = g(x)   (30)
y1 = l1 cos(x1) + l2 cos(x1 + x2) + l3 cos(x1 + x2 + x3)
y2 = l1 sin(x1) + l2 sin(x1 + x2) + l3 sin(x1 + x2 + x3)

where l1, l2, and l3 are the link lengths, set to 1.5, 1.2, and 1.0, respectively, in the simulation. (y in Fig. 2 has an offset for generality.)

A. Simulation with Homogeneous Noise

To generate the training data, 30 input angles x were randomly sampled within π/2 rad ranges. This range constraint has two purposes. The first is to simulate a realistic situation: most articulated-arm actuators have a limited operational range of motion due to hardware contact and safety. The second is to create a sparse region in the training data, in order to clearly show the effectiveness of CRROS. The corresponding training outputs y were collected using (30). We then built a statistical model with the standard GPR; since the noise is homogeneous, the statistical model of the heteroscedastic GPR is almost the same as that of the standard GPR.

Based on the GPR model constructed from the collected training data, CRROS controlled the manipulator to track a desired circular trajectory. The simulations were conducted under three conditions, β = 0, 10, and 100, to investigate the effect of β, the gain pushing the state away from uncertain regions. We assumed observation and control errors, x̂ = x + εx, ŷ = y + εy, and xt+1 = x̃t+1 + εc, where εx, εy, and εc are unbiased Gaussian noise with standard deviations of 0.02 rad, 0.01, and 0.02 rad, respectively.

Fig. 3 and Fig. 4 illustrate the simulation results. In the first case, β = 0, as shown in Fig. 3 (a), (b) and Fig. 4, the algorithm failed to track the desired trajectory. With β = 0, CRROS does not consider the unreliable region and simply outputs the minimum-norm solution.
As a result, the state entered the region with sparse training data, and this significantly deteriorated the control performance. In the second and third cases, CRROS successfully tracked the desired path while staying in low-uncertainty regions. This verifies that the algorithm successfully pushed the state toward a reliable region by projecting ∇fΞ onto the null space. Because the low-uncertainty region exists around the training data points, the trajectory in state space tended to stay close to those points. This effect was more significant in the third case, where CRROS pushed the state more forcefully toward the low-uncertainty region, which resulted in several sharp turns in state space (Fig. 3 (f)) and longer travel distances (Fig. 4 (b)). The uncertainty in the third case is the lowest throughout the control time, as shown in Fig. 4 (a).

Figure 2. The configuration of the 3-DOF articulated manipulator used in the simulations. The position of the end-effector y is controlled by the three joint angles x.

Figure 3. Simulation results of the 3-DOF articulated manipulator control. Experiments were conducted under three conditions, β = 0, 10, and 100. Each row shows the results for a different β value, analyzed in y space (left column) and x space (right column); the configuration of the manipulator over time is illustrated in the left column as images that fade with time. If β = 0, CRROS provides the minimum-norm solution and does not consider the distribution of the training data; as a result, it fails to track the desired trajectory. In the second and third cases (β = 10, 100), CRROS successfully tracked the desired trajectory. In y space the results (c)(e) look similar, but the state paths in x space are significantly different: the path in (f) tends to be closer to the training data points than that in (d), so the trajectory has several sharp curvatures. This is caused by the high gain β pushing the state more forcefully toward the reliable region.

Figure 4. Simulation results of the 3-DOF articulated manipulator control. (a) shows the prediction uncertainty as a standard deviation, expressed as √fΞ. In the case β = 0, CRROS does not consider the prediction uncertainty; as a result, the prediction uncertainty dramatically increased, and the unreliable model eventually caused the control to fail. The case β = 100 had slightly smaller prediction uncertainties during the control time than the case β = 10. (b) shows the norm of x̃t+1 − x̂t. Initially, the first case had the smallest value because it provides the minimum-norm solution; at the end, however, a large error in y space sharply increased the value. In the third case, CRROS strongly pushed the state to reduce the prediction uncertainty, so several peaks appear in the plot.

B. Simulation with Heteroscedastic Noise

For the second simulation, we generated a particular data set to observe the different features of the two CRROS versions. First, we sampled 100 data points with a Gamma distribution for x1 and uniform distributions for x2 and x3. The Gamma distribution was used to create a biased density of the distribution along x1 (Fig. 5 (c) and (d)). Second, when obtaining y, we added unbiased Gaussian noise whose standard deviation is (x1 + 1.5)², that is, heteroscedastic noise depending on x1. The motivation for this particular training data is to clearly expose the different features of the two CRROS versions: there is a special region on the right side of the x1 axis that includes many data points but also significant noise. We conducted simulations with a standard GPR and a heteroscedastic GPR in the same configuration. Fig. 5 shows the results.

The CRROS with a standard GPR drove the state toward the high-density data region (Fig. 5 (c)), and the substantial noise in that region resulted in failure of control (Fig. 5 (a)). (For clear visualization, the x3 axis in Fig. 5 (c) and (d) is omitted.) Fig. 5 (e) shows the evidence more clearly: for comparison, the prediction uncertainties computed by the standard GPR and the heteroscedastic GPR along the x trajectory are drawn together. The standard GPR was not able to detect the change in noise level until the control diverged, whereas the heteroscedastic GPR was able to do so. In contrast, the CRROS with a heteroscedastic GPR successfully tracked the desired trajectory by avoiding the unreliable regions caused by low data density and high noise (right column of Fig. 5).

V. EXPERIMENTS WITH FLEX-FINGER

In the previous section, we presented simulation results for a 3-DOF articulated manipulator system to validate our methodology. For such a system, of course, control engineers could also use an analytical model. However, there are many nonlinear systems whose models cannot be represented analytically. To demonstrate the effectiveness of CRROS for those systems, we selected a robotic system whose analytical model is difficult to derive. The robot, called Flex-finger, is designed to study various features of the human finger (Fig. 6), and it is a highly nonlinear system.
Its main body is made of a spring-tempered steel plate and plastic bones, which act as phalanges and joints, respectively, but do not have a deterministic rotational center. Four compliant cables (corresponding to human tendons), consisting of strings and springs, connect the plastic bones to the motors to generate flexion/extension motion. Four servo motors (corresponding to human muscles) change the cable lengths, which changes the pose of Flex-finger. The pose of the fingertip is observed by a motion capture system (corresponding to the human eye).

The objective of this experiment is to control the tip of Flex-finger to track a desired path, which is similar to the control of the human finger with muscles. Going back to (1), x consists of the rotational angles of the servo motors and y consists of the x-y coordinates of the fingertip position. This is a challenging control problem, even for a human. First, the kinematics of the system are highly nonlinear; in particular, there are several bifurcation points due to the characteristics of the spring-tempered steel plate, which means that for a small change in x, y can diverge dramatically. Second, the system has heteroscedastic noise caused by friction along the string routing: the configuration of x varies the friction and changes the noise level on y. Third, the range of the operational control input is limited. To prevent damage to Flex-finger (e.g., permanent deformation of the springs), the control operation must always stay within a safe input range.

Figure 5. Simulation results of CRROS with the two versions of GPR. The left column shows the results of CRROS with a standard GPR, and the right column the results of CRROS with a heteroscedastic GPR. We sampled 100 training data points with a Gamma distribution for x1 and uniform distributions for x2 and x3. The Gamma distribution was used to create a biased density of data, as shown in (c) and (d): the higher-x1 area has the higher density of data. In addition, we added heteroscedastic noise: y with a higher x1 has larger noise. With a standard GPR, CRROS failed to track the given trajectory because the standard GPR cannot account for the heteroscedastic noise in its uncertainty estimation, as shown in (a)(c)(e). In contrast, CRROS with a heteroscedastic GPR was able to track the desired trajectory by avoiding statistically unreliable regions (i.e., sparse-data or high-noise regions).

Figure 6. A manipulator called Flex-finger was used to validate the performance of CRROS. Flex-finger consists of a spring-tempered steel plate, plastic bones, strings, springs, servo motors, and pulleys. Compliant elements and the nonlinear tendon configuration make it difficult to build an analytical model. Depending on the amount of tension, the friction on the cables varies, resulting in heteroscedastic noise. Some finger poses that deform the steel plate significantly involve high friction on the cables, as shown in (b). The input range of Flex-finger, i.e., the angle of the motor pulleys, is limited to prevent damage to the springs and the finger.
To collect the training data, 700 motor-angle values, or x, were commanded and the corresponding finger positions, or y, were recorded. The input values were randomly selected with uniform distributions from within the available operating range. This sampling naturally generated dense and sparse data areas. We call the upper-right area of y the 'high noise area' (Fig. 7 (a), (b)) because y in this area is generated by bending Flex-finger significantly, as shown in Fig. 6 (b), which requires stronger tension combinations and causes higher friction than in other areas. The CRROS algorithm was implemented in C++ (with RT Linux) and executed on a desktop with an Intel Xeon 3.5 GHz CPU and 8 GB of RAM. In the program, the control loop ran at 30 Hz; the loop frequency was mainly determined by the communication speed with the servo motors, not by the speed of CRROS. The execution time of the CRROS algorithm within a control loop was less than 10 ms and 20 ms for a standard GPR and a heteroscedastic GPR, respectively. We conducted two sets of experiments. In the first, the performance and limitations of CRROS were explored with a standard GPR while tracking two desired trajectories. In the second, CRROS with a heteroscedastic GPR was used to overcome the shortcomings of the standard GPR.

A. CRROS with a standard GPR

In the first set of experiments, CRROS with a standard GPR tracked two desired trajectories. The first trajectory was designed to avoid the high noise area, and the second was designed to pass through it. The control gains were tuned for optimal tracking performance, and the optimized gains were used for all experiments. Fig. 7 (a)-(d) illustrates the experimental results.
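Going back to the data-collection step above, the uniform random sampling can be sketched as follows. This is an illustrative sketch only: the per-motor angle limits LOWER/UPPER are assumed values (the paper does not list Flex-finger's actual safe range), and measure_fingertip is a placeholder for commanding the servos and reading the fingertip position from motion capture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed per-motor safe ranges in degrees (hypothetical values,
# not Flex-finger's real limits).
LOWER = np.array([60.0, 60.0, 200.0, 100.0])
UPPER = np.array([160.0, 140.0, 300.0, 250.0])

# Sample 700 motor-angle vectors x uniformly within the safe range.
X_train = rng.uniform(LOWER, UPPER, size=(700, 4))

def measure_fingertip(x):
    """Placeholder for commanding the servos and reading the fingertip
    position from motion capture; NOT a model of the real plant."""
    return np.array([np.sin(x[:2]).sum(), np.cos(x[2:]).sum()])

# Record the corresponding fingertip positions y.
Y_train = np.array([measure_fingertip(x) for x in X_train])
```

Because the inputs are drawn only from the safe range, every training point (and, by extension, the dense-data region CRROS is attracted to) automatically respects the input constraint.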
Figure 7. CRROS with a standard GPR controlled Flex-finger to track two different trajectories. (a), (b) show the tracking paths in y space, and (c), (d) show the tracking error and prediction uncertainty during tracking. The errors were reasonably low when the state was not near the high noise area; when the state was close to or within it, the tracking errors increased significantly. This result indicates that CRROS with a standard GPR is not able to avoid the unreliable region caused by the high noise level. (a) Result in y space, without passing the high noise area. (b) Result in y space, with passing the high noise area. (c) Error and prediction uncertainty, without passing the high noise area. (d) Error and prediction uncertainty, with passing the high noise area.

In the experiments, as soon as the CRROS algorithm started, the fingertip quickly converged to the desired trajectory, and the prediction uncertainty also decreased. The tracking errors were reasonably low until the fingertip reached the high noise area; once the fingertip was close to or within it, the tracking errors increased. For the first trajectory, the prediction uncertainty and the tracking errors were always low (Fig. 7 (c)), validating that CRROS pushes the state toward the low-uncertainty region. For the second trajectory, the tracking error was larger than in the first case when the state was in the high noise area, which indicates that CRROS with a standard GPR is limited under heteroscedastic noise.
While the state was in the high noise area, the statistical model differed from the actual system because CRROS failed to avoid the unreliable area; as a result, CRROS generated outputs in wrong directions, causing a fluctuating motion and increasing the tracking error.

B. CRROS with a heteroscedastic GPR

In the second set of experiments, CRROS with a heteroscedastic GPR tracked two desired trajectories that pass through the high noise area. The first desired trajectory was the same one that CRROS with a standard GPR could not track. The other trajectory was created with a different shape, a rectangle, and a different direction, counterclockwise. Fig. 8 and Fig. 9 illustrate the experimental results.

Figure 8. CRROS with a heteroscedastic GPR controlled Flex-finger to track two trajectories. (a), (b) show the tracking paths in y space, and (c), (d) show the tracking error and prediction uncertainty during tracking. Overall, the tracking errors were kept at a reasonably low level over the whole trajectories. In the high noise region, the prediction uncertainty increased, but CRROS was still able to track the desired path. This verifies that CRROS drove the state toward the reliable region containing dense data and low noise. (a) Result in y space, elliptical path. (b) Result in y space, rectangular path. (c) Error and prediction uncertainty, elliptical path. (d) Error and prediction uncertainty, rectangular path.

Overall, the tracking performance of CRROS with a heteroscedastic GPR was superior to that of CRROS with a standard GPR. At first, the tracking error was similar, but after the state entered the high noise region, the superiority of the heteroscedastic GPR was evident.
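The mechanism behind this difference can be shown with a minimal sketch (not the paper's implementation): in a standard GPR a single constant noise variance is added to the kernel matrix, whereas in a heteroscedastic GPR the added noise term depends on the input, so the predictive variance grows where the data are noisy. Function names (rbf, gp_predict_var) and the toy 1-D data are illustrative; the noise model mirrors the simulation's (x1 + 1.5)^2 standard deviation.

```python
import numpy as np

def rbf(A, B, ell=1.0, sf=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_predict_var(X, y, Xs, noise_var):
    """GP predictive mean/variance. noise_var is a scalar for a standard
    GPR or a per-training-point array for a heteroscedastic GPR."""
    K = rbf(X, X) + np.diag(np.broadcast_to(noise_var, len(X)))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.diag(rbf(Xs, Xs)) - (v**2).sum(axis=0)
    return mean, var

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
noise_sd = (X[:, 0] + 1.5) ** 2              # noise grows toward high x, as in Sec. IV
y = np.sin(3.0 * X[:, 0]) + noise_sd * rng.standard_normal(50)

Xs = np.array([[-0.9], [0.9]])               # a low-noise and a high-noise query point
_, var_std = gp_predict_var(X, y, Xs, 0.1**2)        # one constant noise level
_, var_het = gp_predict_var(X, y, Xs, noise_sd**2)   # input-dependent noise
```

Here var_het is larger at the noisy side (x = 0.9), so a controller that descends this uncertainty can detect and avoid that region, while var_std stays low everywhere the data are dense and therefore cannot.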
The prediction uncertainty increased in the high noise region, but CRROS drove the state toward the reliable region containing dense data and low noise while maintaining low tracking errors. Another notable point is that most of the control inputs used in tracking the two trajectories were located within the range of training data that we defined for data collection (Fig. 9). This verifies that CRROS tends to push the state toward the dense data region. The minor exception shown in Fig. 9 (c) can be explained by the priority of CRROS: its first priority is to track the desired trajectory, so when the two objectives, tracking and uncertainty minimization, conflict, CRROS chooses to track the desired trajectory.

Figure 9. Histograms of the system states used in tracking the two trajectories. Red histograms were obtained when CRROS with a heteroscedastic GPR tracked the elliptical trajectory, and blue histograms correspond to the rectangular trajectory. Most of the states were located within the range of the training data, which verifies that CRROS tends to push the state toward the dense data region. The minor exception shown in (c) is explained by the priority of CRROS: when tracking and uncertainty minimization conflict, CRROS chooses to track the desired trajectory. (a) Histogram of state x1. (b) Histogram of state x2. (c) Histogram of state x3. (d) Histogram of state x4.

VI. DISCUSSION

We have presented a statistical model-based control algorithm using GPR, called CRROS. The features of CRROS have been explored in simulation with a 3-DOF articulated manipulator, and its effectiveness has been validated by the experiments with Flex-finger.
The experiments with a physical system introduce a host of practical considerations, including challenges in modeling, heteroscedastic noise, and restrictions on the range of inputs. CRROS drives the state toward a low-uncertainty region while tracking a desired output trajectory. Since CRROS takes advantage of the pseudo-inverse and a gradient operator, it does not need any iteration and thus allows for a high-speed control loop, which is essential in many robotics and large-scale applications. Furthermore, the control is performed around the training data and in low-noise regions, which helps ensure safe control. Robotic system designers have often avoided optimal designs to ensure that analytical models can be developed for their systems; for example, they have preferred linear elements, such as linear springs, which are easily modeled. A significant motivation for CRROS is to provide a control scheme for a wider range of systems. The advantages shown in the simulations and experiments indicate the potential of CRROS for various robotic systems that are difficult to model, such as soft robots. In addition, CRROS provides a unique optimal solution for the redundant DOFs of robotic manipulators. There are various optimization techniques for the redundant DOFs of manipulators, e.g., minimizing energy or torque [23]. CRROS selects the solution closest to the previous experience within low-noise training data. This unique optimization feature is useful when teaching a robot an objective that is challenging to express in a simple mathematical form. For practical implementation of CRROS, a user needs to consider several points. The first is the selection of the version of GPR to use with CRROS. Theoretically, a heteroscedastic GPR can model a system more accurately than a standard GPR. However, the superiority of a heteroscedastic GPR in modeling does not always guarantee higher performance in CRROS.
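The iteration-free pseudo-inverse-plus-null-space structure mentioned above can be sketched in a few lines of numpy. This is a hedged illustration of the general technique, not the paper's equation verbatim: crros_step, J_mu, and grad_fxi are names chosen here, the Jacobian is a toy redundant system, and the gains are arbitrary.

```python
import numpy as np

def crros_step(x, y_hat, y_des, J_mu, grad_fxi, xi=1.0, beta=0.5):
    """One CRROS-style update (sketch): a pseudo-inverse term tracks the
    desired output, and a null-space-projected gradient term lowers the
    model's prediction uncertainty without disturbing the tracking task."""
    J_pinv = np.linalg.pinv(J_mu)
    null_proj = np.eye(x.size) - J_pinv @ J_mu   # projector onto null(J_mu)
    dx = J_pinv @ (y_des - y_hat) + beta * null_proj @ grad_fxi
    return x + xi * dx

# Toy redundant system: dim(x) = 3 > dim(y) = 2, as in (1).
J = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x = np.zeros(3)
x_new = crros_step(x, y_hat=np.zeros(2), y_des=np.array([1.0, -1.0]),
                   J_mu=J, grad_fxi=np.array([0.0, 0.0, 1.0]))
# Because J @ null_proj == 0, the uncertainty term moves x only inside the
# null space: the commanded output change J @ (x_new - x) equals y_des - y_hat.
```

The null-space projector is what gives tracking its strict priority over uncertainty minimization: the gradient term can never change the first-order output motion, only the redundant DOFs.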
The higher complexity of a heteroscedastic GPR requires a longer prediction time and may constrain the step size of the gradient-based optimization. Therefore, the significance of the heteroscedastic noise and the step size of the control loop should be considered together when selecting a version of GPR for CRROS. Another practical issue is the tuning of the control gains. The first gain, ξ, can be chosen depending on how aggressively the control engineer wants the system to be controlled. The second gain, β, can be tuned to balance the left and right terms in (29), i.e., Jµ⁺(yt+1 − ŷt) and β(Id − Jµ⁺Jµ)∇fΞ. The balance can be confirmed by monitoring each term, and in practice suitable gains can be obtained in a few iterations. In the future, CRROS can be improved in several ways. First, a more systematic data collection method could improve the performance of CRROS. Because the performance of statistical model-based control is highly dependent on the distribution of the training data set, a better data collection strategy would lead to better control performance; active learning theories developed in reinforcement learning can be a solution [24]. Another important issue is the correlation of multivariate outputs. In this paper, we predicted the multidimensional output with multiple independent regression functions, that is, multi-kriging [25]; thus, the predicted multivariate outputs are uncorrelated. This issue might be resolved by using dependent GPs [26]. Finally, as stated above, accelerating GPR prediction may improve the control performance for large-scale systems; sparse approximation [16], [17] and GPU-based computation [18] are possible solutions.

REFERENCES

[1] C. M. Bishop, Neural Networks for Pattern Recognition. New York, NY, USA: Oxford University Press, Inc., 1995.
[2] A. Smola and B. Schölkopf, "A tutorial on support vector regression," Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
[3] C. E. Rasmussen and C. K. I.
Williams, Gaussian Processes for Machine Learning. The MIT Press, 2006.
[4] G. Gregorčič and G. Lightbody, "Gaussian process approach for modelling of nonlinear systems," Engineering Applications of Artificial Intelligence, vol. 22, no. 4-5, pp. 522–533, 2009.
[5] C. Klein and C.-H. Huang, "Review of pseudoinverse control for use with kinematically redundant manipulators," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-13, no. 2, pp. 245–250, 1983.
[6] A. D. Deshpande, J. Ko, D. Fox, and Y. Matsuoka, "Control strategies for the index finger of a tendon-driven hand," The International Journal of Robotics Research, vol. 32, no. 1, pp. 115–128, 2013.
[7] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," The International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[8] F. Allgöwer, T. A. Badgwell, J. S. Qin, J. B. Rawlings, and S. J. Wright, "Nonlinear predictive control and moving horizon estimation - an introductory overview," in Advances in Control, pp. 391–449, Springer, 1999.
[9] B. Likar and J. Kocijan, "Predictive control of a gas-liquid separation plant based on a Gaussian process model," Computers and Chemical Engineering, vol. 31, no. 3, pp. 142–152, 2007.
[10] J. Kocijan, R. Murray-Smith, C. Rasmussen, and A. Girard, "Gaussian process model based predictive control," in Proceedings of the American Control Conference, vol. 3, pp. 2214–2219, 2004.
[11] M. Deisenroth and C. E. Rasmussen, "PILCO: A model-based and data-efficient approach to policy search," in Proceedings of the 28th International Conference on Machine Learning, pp. 465–472, 2011.
[12] J. Peters and S. Schaal, "Policy gradient methods for robotics," in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2219–2225, 2006.
[13] R. Martinez-Cantin, N. de Freitas, A. Doucet, and J. A.
Castellanos, "Active policy learning for robot planning and exploration under uncertainty," in Robotics: Science and Systems, pp. 321–328, 2007.
[14] "Supplementary material of CRROS," http://www.me.utexas.edu/~reneu/res/CRROS.html.
[15] "Control in the reliable region of a statistical model with Gaussian process regression," in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 654–660, 2014.
[16] E. Snelson and Z. Ghahramani, "Sparse Gaussian processes using pseudo-inputs," in Advances in Neural Information Processing Systems 18, pp. 1257–1264, 2006.
[17] D. Nguyen-Tuong and J. Peters, "Local Gaussian process regression for real-time model-based robot control," in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 380–385, 2008.
[18] B. V. Srinivasan, Q. Hu, and R. Duraiswami, "GPUML: Graphical processors for speeding up kernel machines," in Workshop on High Performance Analytics - Algorithms, Implementations, and Applications, SIAM Conference on Data Mining, 2010.
[19] K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard, "Most likely heteroscedastic Gaussian process regression," in Proceedings of the 24th International Conference on Machine Learning, pp. 393–400, 2007.
[20] Q. V. Le, A. J. Smola, and S. Canu, "Heteroscedastic Gaussian process regression," in Proceedings of the 22nd International Conference on Machine Learning, pp. 489–496, 2005.
[21] M. Lázaro-Gredilla and M. K. Titsias, "Variational heteroscedastic Gaussian process regression," in Proceedings of the 28th International Conference on Machine Learning, pp. 841–848, 2011.
[22] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, pp. 1–38, 1977.
[23] J. Park, Y. Choi, W. K. Chung, and Y.
Youm, "Multiple tasks kinematics using weighted pseudo-inverse for kinematically redundant manipulators," in IEEE International Conference on Robotics and Automation, pp. 4041–4047, 2001.
[24] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[25] C. Williams and C. Rasmussen, "Gaussian processes for regression," in Advances in Neural Information Processing Systems, pp. 514–520, 1996.
[26] P. Boyle and M. Frean, "Dependent Gaussian processes," in Advances in Neural Information Processing Systems, pp. 217–224, 2004.