preprint
Distributed Asynchronous Time-Varying Constrained Optimization

Andrea Simonetto and Geert Leus
Faculty of EEMCS, Delft University of Technology, 2826 CD Delft, The Netherlands
e-mails: {a.simonetto, g.j.t.leus}@tudelft.nl

Abstract: We devise a distributed asynchronous gradient-based algorithm to enable a network of computing and communicating nodes to solve a constrained discrete-time time-varying convex optimization problem. Each node updates its own decision variable only once every discrete time step. Under some assumptions (strong convexity, Lipschitz continuity of the gradient, persistent excitation), we prove the algorithm's asymptotic convergence in expectation to an error bound whose size is related to the constant stepsize choice and the variability in time of the optimization problem. Moreover, the convergence rate is linear. In addition, we present an interesting by-product of the proposed algorithm in the context of time-varying consensus, and we discuss some numerical evaluations in multi-robot scenarios to assess the algorithm performance and the tightness of the proven asymptotic bounds.

I. INTRODUCTION

We consider a time-varying optimization problem defined on time-varying functions that are distributed over a network of computing and communicating nodes. Let the nodes be labeled with $i \in V = \{1, \dots, n\}$, and for each discrete time $k \in \mathbb{N}$, we equip each of them with the private local function $f_{i,k}(x) : \mathbb{R}^d \to \mathbb{R}$. The main goal for the computing nodes at each discrete time $k$ is to solve the optimization problem

$$\underset{x \in X}{\text{minimize}} \; \sum_{i \in V} f_{i,k}(x) \tag{1}$$

where each of the $f_{i,k}(x)$ is a convex function of $x$, while $X$ is a nonempty, closed, convex set. By solving, we mean computing an optimizer of (1) for each $k$. We allow the computing nodes to communicate with their immediate neighbors defined via the undirected communication graph $G_k = (V, E_k)$, with time-varying edge set $E_k$.
In particular, at each time $k$, every node $i$ can communicate with all the nodes $j \in N_{i,k} := \{j \in V \mid (i,j) \in E_k\}$ (that is, we assume an edge-asynchronous protocol).

Problems such as (1) appear in distributed estimation of stochastic time-varying signals [1], in distributed control of mobile multi-robot systems with time-varying tasks [2], and even as a result of sequential convex programming approaches to multi-agent non-convex problems [3]–[5].

When each of the $f_{i,k}(x)$'s is time-invariant, several approaches can be applied to solve (1). These techniques differ in the assumptions they require and the properties they can ensure (convergence, convergence rate, resilience to asynchronous communication protocols, among others). Examples of such approaches are the subgradient method [6], dual averaging [7], and the alternating direction method of multipliers [8], [9]. All the mentioned techniques are iterative and they require communication among the nodes to converge to an optimizer of (1). In the case of time-varying $f_{i,k}(x)$'s, they would converge (in theory) only if each node could exchange an infinite number of messages with its neighbors between consecutive time steps $k$ and $k+1$. Specific (so-called running) methods that account for a finite number of messages between consecutive time steps and still guarantee convergence have been proposed in [2], [10]–[17], but they are all limited to specific versions of (1). Notably, in [1], [17], the authors work under the same general assumptions that we use, but consider unconstrained optimization, while in [16], the authors use subgradient methods but assume that the optimizers of (1) are not time-varying.

This research was supported in part by STW under the D2S2 project from the ASSYS program (project 10561).

Contributions. We propose an asynchronous gradient-based distributed algorithm for the computing nodes to converge to an optimizer of (1), here presented in Algorithm 1.
In fact, due to the time-varying nature of the problem, the convergence will be shown up to an error bound, whose size is directly dependent on the change in time of the optimizer of (1). This algorithm can be seen as a generalization to a time-varying context of the work in [6], where only one iteration of the algorithm is performed between consecutive time steps, as well as a generalization of the work in [1], [17] to asynchronous and constrained settings. In addition, in contrast to [1], [17], our algorithm does not hinge on dual variables to reach a common decision vector among the nodes (which significantly complicates the theoretical analysis of convergence), but is instead based on consensus protocols, which are easier to analyze and embed on real hardware.

II. DISTRIBUTED ALGORITHM

We want to enable the computing nodes to solve (1) in a distributed fashion, where each node communicates with its neighbors only. For this task, we introduce local copies of the decision variable $x_k$. These local copies are referred to as $y_{i,k}$. We formulate the problem at hand as follows: devise a distributed algorithm that enforces that the local decision variable $y_{i,k}$ eventually converges (up to a bounded error) to the optimal solution $x_k^*$ of (1) at time step $k$, or formally,

$$\forall i \in V, \quad \liminf_{k \to \infty} \mathbb{E}\big[\|y_{i,k} - x_k^*\|^2\big] \le \delta,$$

for some $\delta \ge 0$.

We now describe our proposed algorithm: it consists of two basic steps, the first is a single consensus iteration, while the second is a projected gradient descent. In order to enforce consensus among the local decision variables, let us define the time-varying consensus matrix $W_k$ and two different stepsizes $\alpha > 0$ and $\beta > 0$. The consensus matrix $W_k$ is a symmetric (owing to the edge-asynchronous protocol assumption) matrix constructed based on the edge set $E_k$, and thus on the adjacency matrix $A_k$, as

$$[W_k]_{i,j} = \begin{cases} -[A_k]_{i,j} & \text{for } j \ne i, \\ \sum_{l=1}^{n} [A_k]_{i,l} & \text{for } j = i. \end{cases}$$

As for any consensus matrix, we assume that $W_k$ has nonzero elements if and only if the related nodes can communicate with each other, that it is rank deficient with, in particular, $W_k 1_n = 0_n$, and finally, for the sequence of matrices $\{W_k\}$, we define $\mathbb{E}[W_k] = \bar{W} = \bar{W}^T$.

With this in place, we are ready to describe our gradient-based distributed algorithm, as in Algorithm 1.

Algorithm 1 Asynchronous distributed gradient algorithm
Initialize by picking locally an arbitrary $y_{i,1} \in X$. Then for $k \ge 1$:
1) compute the local variable $v_{i,k+1}$ by local communication as
$$v_{i,k+1} = y_{i,k} - \beta \sum_{j=1}^{n} [W_k]_{i,j}\, y_{j,k}; \tag{2}$$
2) compute locally the gradient of $f_{i,k}$ with respect to $x$ at $v_{i,k+1}$, as $g_{i,k} = \nabla_x f_{i,k}(x)|_{v_{i,k+1}}$;
3) update the local variable $y_{i,k}$ as
$$y_{i,k+1} = P_X[v_{i,k+1} - \alpha g_{i,k}], \tag{3}$$
where $P_X[\cdot]$ indicates the projection operator;
4) go to step 1.

The convergence of Algorithm 1 is characterized as follows.

Theorem 1: Assume that:
A1) each one of the functions $f_{i,k}(x)$ is strongly convex with parameter $m$ for all $k$, and their gradient is Lipschitz continuous with constant $L$ for all $k$;
A2) the distance between the optimizer of (1) at two subsequent time steps is bounded as $\|x_k^* - x_{k-1}^*\| \le \delta_x$;
A3) the second smallest eigenvalue of $\bar{W}$ is positive, i.e., $\lambda_2(\bar{W}) > 0$.
Call $y_k$ the stacked vector of all $y_{i,k}$'s. Define

$$r := 1 + \alpha^2 L^2 - \alpha m, \qquad \gamma := 1 - \beta \lambda_2(\bar{W}), \qquad M := \max_{i,k} \Big\{ \Big\| \sum_{j \in V,\, j \ne i} \nabla_x f_{j,k}(x)|_{x_k^*} \Big\| \Big\}.$$

If we choose $\beta < 1/n$ and $\alpha < m/L^2$, then the sequence $\{y_k\}$ generated by Algorithm 1 converges as

$$\mathbb{E}\big[\|y_{k+1} - 1_n \otimes x_{k+1}^*\|^2\big] \le r\, \mathbb{E}\big[\|y_k - 1_n \otimes x_k^*\|^2\big] + \frac{\alpha^2 n M^2}{\sqrt{\gamma}\,(1 - \sqrt{\gamma})} + \frac{n \delta_x^2}{1 - \sqrt{\gamma}}.$$

Furthermore, we have $0 < r < 1$ and thus, the convergence rate is linear.

Proof: The proof can be found in [18, Theorem 1].

A few words are in order for the assumptions. Assumption A1) makes the solution set of (1) a unique point, and it is a recurrent assumption in the time-varying optimization literature, see for instance [19, Chapter 6].
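As a rough illustration, one pass of steps 1) to 3) of Algorithm 1 can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`laplacian`, `clip_box`, `algorithm1_step`), the box-shaped constraint set, and the quadratic local costs in the toy run are hypothetical choices made for the example, and all nodes are updated inside one loop rather than on separate devices.

```python
def laplacian(adj):
    """Consensus matrix W_k from a symmetric 0/1 adjacency matrix A_k
    (zero diagonal assumed): [W]_ij = -[A]_ij for j != i, [W]_ii = sum_l [A]_il."""
    n = len(adj)
    return [[sum(adj[i]) if i == j else -adj[i][j] for j in range(n)]
            for i in range(n)]


def clip_box(lo, hi):
    """Projection P_X onto the box X = [lo, hi]^d (an assumed constraint set)."""
    return lambda x: [min(max(xc, lo), hi) for xc in x]


def algorithm1_step(y, W, grads, project, alpha, beta):
    """One pass of steps 1)-3) for every node.

    y       : list of n local vectors y_{i,k} (each a list of d floats)
    W       : n x n consensus matrix W_k
    grads   : list of n callables, grads[i](v) returns the gradient of f_{i,k} at v
    project : callable implementing P_X
    """
    n, d = len(y), len(y[0])
    y_next = []
    for i in range(n):
        # Step 1: consensus iteration, v_i = y_i - beta * sum_j [W]_ij y_j.
        v = [y[i][c] - beta * sum(W[i][j] * y[j][c] for j in range(n))
             for c in range(d)]
        # Step 2: local gradient evaluated at v_i.
        g = grads[i](v)
        # Step 3: projected gradient update.
        y_next.append(project([v[c] - alpha * g[c] for c in range(d)]))
    return y_next


# Toy run (hypothetical data): two connected nodes, scalar decision variable,
# f_i(x) = 0.5 * x^2 for both nodes (gradient x), and X = [-1, 1].
W = laplacian([[0, 1], [1, 0]])
grads = [lambda v: list(v), lambda v: list(v)]
y_next = algorithm1_step([[0.0], [4.0]], W, grads, clip_box(-1.0, 1.0),
                         alpha=0.5, beta=0.25)
print(y_next)  # [[0.5], [1.0]]
```

In the toy run the consensus step moves the two nodes from 0 and 4 to 1 and 3, the gradient step halves both values, and the projection clips the second node from 1.5 to 1. The stepsizes respect $\beta < 1/n$ and $\alpha < m/L^2$ for this choice of costs ($m = L = 1$, $n = 2$).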
Assumption A2) gives a handle on the variability of the optimizer, and it is also quite standard. Assumption A3) is required for the nodes to reach an agreement: it basically says that the communication graph $G_k$ is connected in expectation, i.e., $\mathbb{E}[G_k]$ is connected. Finally, the scalar $M$ in the theorem quantifies how different the local optimal gradients are from their mean value, and it is bounded, given A1).

Corollary 1: Under the same assumptions of Theorem 1, we obtain

$$\liminf_{k \to \infty} \mathbb{E}\big[\|y_{k+1} - 1_n \otimes x_{k+1}^*\|^2\big] \le \delta,$$

where

$$\delta = \frac{1}{1 - r}\, \frac{n}{1 - \sqrt{\gamma}} \left( \alpha^2 M^2 / \sqrt{\gamma} + \delta_x^2 \right).$$

Proof: Straightforward by applying the properties of geometric series.

The last result shows the bounded error floor to which the algorithm converges. In particular, $\delta$ depends on the constant stepsize choice $\alpha$, on the dissimilarity of the local optimal gradients $M$, on the network connectivity $\gamma$, and on the variability of the optimizer $\delta_x$. In principle, one could optimize the choices of $\alpha$ and $\beta$ to trade off the convergence rate $r$ and the asymptotic error $\delta$.

III. EXAMPLE: TIME-VARYING CONSENSUS

In this section, we present an interesting by-product of the proposed algorithm. In particular, we show that it can be used to solve time-varying consensus problems. A time-invariant consensus problem can be written as the following strongly convex program

$$\underset{x}{\text{minimize}} \; \frac{1}{2} \sum_{i \in V} \|x - c_i\|^2. \tag{4}$$

The solution of (4) is $x^* = \frac{1}{n} \sum_{i \in V} c_i$, that is, the average value of the vectors $c_i$ across the network. In a time-varying case, $c_i$ is time-dependent, i.e., $c_{i,k}$, and we want the nodes to agree on a time-varying average. In particular, we want the nodes to solve (for each $k$) the optimization problem

$$\underset{x}{\text{minimize}} \; \frac{1}{2} \sum_{i \in V} \|x - c_{i,k}\|^2, \tag{5}$$

which perfectly fits our problem setting (1). Applying Algorithm 1 to this problem yields the iterate

$$y_{i,k+1} = (1 - \alpha) \Big( y_{i,k} - \beta \sum_{j=1}^{n} [W_k]_{i,j}\, y_{j,k} \Big) + \alpha c_{i,k}. \tag{6}$$

Corollary 2: Assume that the second smallest eigenvalue of $\bar{W}$ is positive, i.e., $\lambda_2(\bar{W}) > 0$. Choose $\beta < 1/n$ and $\alpha < 1/2$. Then iteration (6) converges to the solution of (5) as

$$\liminf_{k \to \infty} \mathbb{E}\Big[ \Big\| y_{i,k+1} - \frac{1}{n} \sum_{i \in V} c_{i,k} \Big\| \Big] \le \delta, \quad \forall i.$$

Proof: Straightforward given that $m = 1/2$, $L = 1$. Notice that the value of $\delta$ is the same as in Theorem 1.

Remark 1: The algorithms in [1], [17] could also be applied to time-varying unconstrained consensus problems. The benefit of ours is that we can allow the nodes to communicate asynchronously via a time-varying edge set $E_k$. A thorough comparison with [1], [17] is a matter of future research.

[Figure 1: twelve snapshot panels at k = 2, 79, 313, 702, 1247, 1947, 2804, 3816, 4983, 6306, 7785, 9420.]
Fig. 1. Snapshots of the algorithm's waypoint generation (red points) and reference ones (blue diamonds) for the chosen example. Black lines between red nodes represent the edge set $E$, while the light red lines are the waypoints' trajectories from $\tau = 0$ till $\tau = k$. The reference points move along a circular trajectory, although the radius is too big to be appreciated in these snapshots. A video of this simulation result is available at http://ens.ewi.tudelft.nl/~asimonetto/.

IV. NUMERICAL EVALUATIONS

A. Multi-Robot Control

The numerical example we are presenting is a formation control problem, where a number of mobile nodes need to track a defined point in space and maintain a certain formation. The example is inspired by [20] and has the added aim to show that the proposed algorithm can work with partially overlapping decision variables $x$ (i.e., there is no need for each of the computing nodes to agree on the total decision variable $x$ but only on subsets of it).
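Going back to iteration (6) of Section III, the update is simple enough to sketch in a few lines of plain Python. The following is only an illustrative sketch: the helper names (`sample_adjacency`, `consensus_step`), the ring-shaped edge set, the seed, the drift rate, and the signal values are assumptions made for the example and do not come from the paper's simulations.

```python
import random


def sample_adjacency(edges, n, p, rng):
    """Draw a symmetric adjacency matrix A_k: each edge (i, j) of the fixed
    edge set is active independently with probability p."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        if rng.random() < p:
            adj[i][j] = adj[j][i] = 1
    return adj


def consensus_step(y, adj, c, alpha, beta):
    """Iteration (6) for all nodes:
    y_i <- (1 - alpha) * (y_i - beta * sum_j [W_k]_ij y_j) + alpha * c_i,
    where W_k is built from adj (degree on the diagonal, -adj off it)."""
    n, d = len(y), len(y[0])
    y_next = []
    for i in range(n):
        deg = sum(adj[i])
        new_i = []
        for q in range(d):
            # sum_j [W_k]_ij y_j = deg_i * y_i - sum over active neighbors of y_j
            w_y = deg * y[i][q] - sum(adj[i][j] * y[j][q] for j in range(n))
            new_i.append((1 - alpha) * (y[i][q] - beta * w_y) + alpha * c[i][q])
        y_next.append(new_i)
    return y_next


# Hypothetical run: 4 nodes on a ring, scalar signals drifting slowly in time.
rng = random.Random(0)
n, alpha, beta, p = 4, 0.1, 0.2, 0.8      # alpha < 1/2 and beta < 1/n
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
c0 = [[1.0], [2.0], [3.0], [6.0]]
y = [[0.0] for _ in range(n)]
for k in range(2000):
    c = [[ci[0] + 1e-3 * k] for ci in c0]  # time-varying signals c_{i,k}
    y = consensus_step(y, sample_adjacency(ring, n, p, rng), c, alpha, beta)
avg = sum(ci[0] for ci in c) / n
print([round(abs(yi[0] - avg), 3) for yi in y])  # per-node tracking error
```

Because $W_k$ is symmetric with $W_k 1_n = 0_n$, the network average of the iterates follows $\bar{y}_{k+1} = (1 - \alpha)\bar{y}_k + \alpha \bar{c}_k$ whichever edges happen to be active; the random graph only affects how closely the individual nodes agree with each other.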
We consider $n = 36$ mobile nodes that have a fixed connection structure $E$, and need to track a square pattern in two dimensions (Figure 1). At a given discrete time $k$, each mobile node $i$ needs to compute a waypoint $x_{i,k}$ to head to; this waypoint depends on the current value of the reference point $x^{\mathrm{ref}}_{i,k}$ and on the neighboring waypoint/reference values. We consider each of the neighboring reference points to be known to the nodes. Putting this together, the computing mobile nodes have to solve the optimization problem

$$\underset{x_k}{\text{minimize}} \; \sum_{i \in V} \Big( \theta \|x_{i,k} - x^{\mathrm{ref}}_{i,k}\|^2 + \sum_{j \in N_i} \|x_{i,k} - x_{j,k} - (x^{\mathrm{ref}}_{i,k} - x^{\mathrm{ref}}_{j,k})\|^2 \Big), \tag{7}$$

where $x_k$ is the stacked version of all the $x_{i,k}$'s and $\theta > 0$ is a chosen scaling factor. It is easy to see that problem (7) fits our problem formulation (1). It is sufficient to call

$$f_{i,k}(x) = \theta \|x_i - x^{\mathrm{ref}}_{i,k}\|^2 + \sum_{j \in N_i} \|x_i - x_j - (x^{\mathrm{ref}}_{i,k} - x^{\mathrm{ref}}_{j,k})\|^2.$$

The reference states ($x^{\mathrm{ref}}_{i,k}$) evolve along circular trajectories with constant angular velocity $\omega$. At each iteration $k$ the symmetric adjacency matrix of the communication graph $A_k$ is generated by an i.i.d. Bernoulli process with $\Pr[[A_k]_{ij} = 1] = 0.7$ for all $(i,j) \in E$ (that is, $E_k \subseteq E$). The weight is $\theta = 0.5$, while the stepsizes $\alpha$ and $\beta$ are determined according to Theorem 1; in fact, in this example, the bounds are analytically computable. For the simulations, we set $\alpha = 1.75\,\theta / (2(\theta + \lambda_n(\bar{W})/0.7))^2$ and $\beta = 0.3/n$. We select the angular velocity as $\omega = 0.5/40\alpha$, and run the distributed asynchronous time-varying optimization algorithm up to $k = 10{,}000$.

By using snapshots of the agents' trajectories, we show the algorithm's behavior (Figure 1). The blue diamonds are the reference waypoints, while the red points are the agents' computed waypoints at discrete time $k$. The black lines represent the fixed edge set $E$, while the light red lines are the waypoints' trajectories from $\tau = 0$ till $\tau = k$. As we further see, the convergence performance is in line with the asymptotic bound of Theorem 1, which is reasonable in this particular case (Figure 2, where we have called $y_k^* = 1_n \otimes x_k^*$).

[Figure 2: $\mathbb{E}[\|y_k - y_k^*\|^2]$ versus discrete time $k$, together with the curve labeled "Bound: Theorem 1".]
Fig. 2. Convergence performance of the algorithm for the chosen example.

B. Time-Varying Consensus

A second numerical example involves a time-varying consensus problem in two dimensions. In particular, with the notation of Section III, we consider $n = 50$ nodes, the vector $c_{i,0}$ generated by using a uniform probability distribution of width 1 around the point $(10, 0)$, and $c_{i,k}$ following a circular trajectory of angular velocity $\omega = 10^{-4}$. The initial vectors $y_{i,0}$ are randomly picked, the stepsizes are $\alpha = 1/(15n)$ and $\beta = 0.2/n$, $\lambda_2(\bar{W}) = 0.4846$, while $\Pr[[A_k]_{ij} = 1] = 0.8$ for all $(i,j) \in E$.

Figure 3 depicts how the nodes reach consensus and follow the time-varying optimizer. The black circles are the values of $y_{i,1}$ for all $i$, the red square represents the value of the optimizer at the last simulated discrete time, $k = 50{,}000$, while the black diamonds close by are the values of $y_{i,k}$ at the same discrete time. In Figure 4, we display the convergence of the proposed algorithm, which is linear up to the asymptotic bound of Corollary 2. In this case, the bound is also reasonably tight (once again $y_k^* = 1_n \otimes x_k^*$).

[Figure 3: trajectories in the $(x, y)$ plane, with labels $y_{1,1}$, $y_{2,1}$, $y_{50,1}$, $x_1^*$, $y_{i,50{,}000}$, and $x_{50{,}000}^*$.]
Fig. 3. Trajectories of the vectors $y_{i,k}$ while reaching consensus and tracking the optimizer.

[Figure 4: $\mathbb{E}[\|y_k - y_k^*\|^2]$ versus discrete time $k$, together with the curve labeled "Bound: Corollary 2".]
Fig. 4. Convergence performance of the algorithm for the chosen example.

V. CONCLUSIONS

We have proposed a distributed stochastic gradient asynchronous algorithm to solve a convex separable time-varying program. The overall scheme converges linearly to an error bound whose size depends on the constant stepsize $\alpha$ and the variability in time of the optimizer. The numerical evaluations illustrate the performance of the proposed approach.

In addition to the results presented in this paper, some extensions have already been studied: in [18], we present a variant of the algorithm for cases in which the constraint set $X$ is time-varying, the optimization problem is stochastic, and the gradients are computed only up to a defined accuracy $\epsilon$. In [21], we extend the algorithm presented here to deal with non-strongly convex multi-user optimization.

Nonetheless, several open questions remain and have been left for future research. For example, the derived bounds could be tightened. Moreover, if some knowledge of how the functions $f_{i,k}(x)$ vary with time can be acquired or learned by the nodes, a predictive-corrective tracking scheme could be added to the algorithm, as proposed in centralized nonlinear programming [22], [23].

REFERENCES
[1] F. Y. Jakubiec and A. Ribeiro, "D-MAP: Distributed Maximum a Posteriori Probability Estimation of Dynamic Systems," IEEE Transactions on Signal Processing, vol. 61, no. 2, pp. 450–466, 2013.
[2] S.-Y. Tu and A. H. Sayed, "Mobile Adaptive Networks," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, pp. 649–664, 2011.
[3] A. Simonetto, T. Keviczky, and R. Babuška, "On Distributed Algebraic Connectivity Maximization in Robotic Networks," in Proceedings of the American Control Conference, San Francisco, USA, June–July 2011, pp. 2180–2185.
[4] A. Simonetto, "Distributed Estimation and Control for Robotic Networks," Ph.D. dissertation, Delft University of Technology, Delft, The Netherlands, 2012.
[5] A. Simonetto, T. Keviczky, and R. Babuška, "Constrained Distributed Algebraic Connectivity Maximization in Robotic Networks," Automatica, vol. 49, no. 5, pp. 1348–1357, 2013.
[6] K. Srivastava and A. Nedić, "Distributed Asynchronous Constrained Stochastic Optimization," IEEE Journal of Selected Topics in Signal Processing, vol.
5, no. 4, pp. 772–790, 2011.
[7] J. C. Duchi, A. Agarwal, and M. Wainwright, "Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling," IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592–606, 2012.
[8] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in Ad Hoc WSNs With Noisy Links - Part I: Distributed Estimation of Deterministic Signals," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350–364, 2008.
[9] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[10] M. Kamgarpour and C. Tomlin, "Convergence Properties of a Decentralized Kalman Filter," in Proceedings of the 47th IEEE Conference on Decision and Control, Cancun, Mexico, December 2008, pp. 3205–3210.
[11] P. Braca, S. Marano, V. Matta, and P. Willett, "Asymptotic Optimality of Running Consensus in Testing Binary Hypotheses," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 814–825, 2010.
[12] F. S. Cattivelli and A. H. Sayed, "Diffusion Strategies for Distributed Kalman Filtering and Smoothing," IEEE Transactions on Automatic Control, vol. 55, no. 9, pp. 2069–2084, 2010.
[13] M. Farina, G. Ferrari-Trecate, and R. Scattolini, "Distributed Moving Horizon Estimation for Linear Constrained Systems," IEEE Transactions on Automatic Control, vol. 55, no. 11, pp. 2462–2475, 2010.
[14] D. Bajovic, D. Jakovetic, J. Xavier, B. Sinopoli, and J. M. F. Moura, "Distributed Detection via Gaussian Running Consensus: Large Deviations Asymptotic Analysis," IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4381–4396, 2011.
[15] M. M. Zavlanos, A. Ribeiro, and G. J. Pappas, "Network Integrity in Mobile Robotic Networks," IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 3–18, 2013.
[16] R. L. G. Cavalcante and S. Stanczak, "A Distributed Subgradient Method for Dynamic Convex Optimization Problems Under Noisy Information Exchange," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 2, pp. 243–256, 2013.
[17] Q. Ling and A. Ribeiro, "Decentralized Dynamic Optimization Through the Alternating Direction Method of Multipliers," IEEE Transactions on Signal Processing, vol. 62, no. 5, pp. 1185–1197, 2014.
[18] A. Simonetto, L. Kester, and G. Leus, "Distributed Time-Varying Stochastic Optimization and Utility-based Communication," submitted to IEEE Transactions on Control of Network Systems, 2014, available at http://arxiv.org/abs/1408.5294.
[19] B. T. Polyak, Introduction to Optimization. Optimization Software, Inc., 1987.
[20] F. Borrelli and T. Keviczky, "Distributed LQR Design for Identical Dynamically Decoupled Systems," IEEE Transactions on Automatic Control, vol. 53, no. 8, pp. 1901–1912, 2008.
[21] A. Simonetto and G. Leus, "Double Smoothing for Time-Varying Distributed Multi-user Optimization," in Proceedings of the IEEE Global Conference on Signal and Information Processing, Atlanta, US, December 2014.
[22] V. M. Zavala and M. Anitescu, "Real-Time Nonlinear Optimization as a Generalized Equation," SIAM Journal on Control and Optimization, vol. 48, no. 8, pp. 5444–5467, 2010.
[23] A. L. Dontchev, M. I. Krastanov, R. T. Rockafellar, and V. M. Veliov, "An Euler-Newton Continuation Method for Tracking Solution Trajectories of Parametric Variational Inequalities," SIAM Journal on Control and Optimization, vol. 51, no. 51, pp. 1823–1840, 2013.