D2.4 Etude d`une approche de type “forward recovery” pour l
Transcription
D2.4 Etude d`une approche de type “forward recovery” pour l
D2.4 Etude d’une approche de type “forward recovery” pour l’infrastructure de gestion du Runtime Petascale. VERSION DATE EDITORIAL MANAGER AUTHORS STAFF 1.0 2010 Sylvain Peyronnet Swan Dubois, Thomas Hérault, Toshimitsu Masuzawa, Olivier Pérès, Sylvain Peyronnet et Sébastien Tixeuil. Copyright ANR SPADES. 08-ANR-SEGI-025. D2.4 Contents 1 Préambule 2 Scalable Overlay for Address-based Networks with 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 2.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Pack algorithm . . . . . . . . . . . . . . . . . 2.2.2 List algorithm . . . . . . . . . . . . . . . . . 2.2.3 Ranking algorithm . . . . . . . . . . . . . . . 2.2.4 Routing Algorithm . . . . . . . . . . . . . . . 2.2.5 Convergence time of the global algorithm . . 2.3 Related Works . . . . . . . . . . . . . . . . . . . . . 2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . 4 Resource Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 6 7 8 13 16 18 18 20 20 3 Stabilizing Locally Maximizable Tasks in Unidirectional Networks is Hard 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Impossibility Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Possibility Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Deterministic solution with identifiers . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2 Probabilistic solution with unbounded memory in asynchronous anonymous networks 3.4.3 Probabilistic solution with bounded memory in synchronous anonymous networks 3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 22 23 24 27 27 29 31 35 4 The 4.1 4.2 4.3 4.4 36 36 37 38 39 42 43 47 4.5 Impact of Topology on Byzantine Containment in Stabilization Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Distributed System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Self-Stabilizing Protocol Resilient to Byzantine Faults . . . . . . . . . . Maximum Metric Tree Construction . . . . . . . . . . . . . . . . . . . . 4.4.1 Impossibility Result . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.2 Topology-Aware Strict Stabilizing Protocol . . . . . . . . . . . . Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ANR SPADES. 08-ANR-SEGI-025 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Page 3 D2.4 Chapter 1 Préambule Ce document présente les travaux réalisés dans le cadre de la tâche 2.4 du projet SPADES. Il s’agit de présenter des travaux afférents à la thématique d’un approche de type “forward recovery” pour la mise en place d’une infrastructure de gestion d’un runtime petascale. Parmi les approches “forward recovery”, l’auto-stabilisation occupe une place à part en raison de sa simplicité apparente. 
En effet, une approche auto-stabilisante est par nature robuste aux défaillances et attaques transitoires, sans qu’il ne semble y avoir de mécanisme ad-hoc mis en jeu dans son comportement. Cependant, garantir la compatibilité d’une telle approche avec une architecture à l’échelle du petascale n’est pas simple. Ainsi l’impact d’une défaillance peut être très fort (retentissement sur l’ensemble du système) et le temps de retour à un état légitime peut être en pratique trop long (même si il est toujours théoriquement borné). Dans tous les travaux de ce livrable, on trouve l’hypothèse sous-jacente que les défaillances sont à priori décorrelées (pas de phénomènes épidémiques). Cela permet ainsi de s’affranchir du premier problème mentionné ci-dessus. Les quatres chapitres suivants correspondent à plusieurs articles réalisés par les participants au projet SPADES au sein de l’équipe-projet Grand-Large à l’INRIA Saclay-Île-de-France et leurs collaborateurs. Voici une description rapide de ces chapitres. Chapitre 2. Il correspond à l’article (en cours de soumission) suivant : Scalable Overlay for Addressbased Networks with Resource Discovery par Olivier Pérès, Thomas Hérault et Sylvain Peyronnet. Dans ce chapitre nous présentons un algorithme qui construit une structure similaire à un arbre couvrant équilibré. Cela permet de garantir que si le système contient n processus, alors la distance entre la racine et n’importe quelle feuille est au plus de dlog ne. L’algorithme est construit par composition d’algorithmes auto-stabilisants (et l’est également lui aussi [23]). Le premier algorithme utilisé regroupe les processus en groupes de taille bien choisie, puis un second algorithme met en place un chaînage entre les groupes. Une fois cette structure globale mise en place, un troisième algorithme va distribuer des identifiants (en ordre croissant) aux processus qui constituent le système. A l’aide de cette structure un algorithme additionnel permet de router les messages efficacement. L’algorithme global construit donc exactement le type de structure qui nous intéresse dans le cadre de systèmes à grande échelle : routage simple, robuste et rapide, nommage uniques des processus. Chapitre 3. Il correspond à l’article (publié dans les actes de ICDCS 2010) suivant : Stabilizing Locally Maximizable Tasks in Unidirectional Networks is Hard par Toshimitsu Masuzawa et Sébastien Tixeuil [46]. Ce chapitre présente des avancées sur le problème de la construction auto-stabilisante de tâches localement maximisables (comme par exemple la construction d’un ensemble indépendant maximum) dans des réseaux unidirectionnels à topologie quelconques. Nous présentons tout d’abord des résultats négatifs montrant l’impossibilité de la construction d’algorithmes au-stabilisants deterministe pour ce ANR SPADES. 08-ANR-SEGI-025 Page 4 D2.4 problème dans un modèle très général. Puis nous présentons des algorithmes fonctionnels dans des modèles sujets à des hypothèses plus restrictives. Chapitre 4. Il correspond à l’article (publié dans les actes de DISC 2010) suivant : The Impact of Topology on Byzantine Containment in Stabilization par Swan Dubois, Toshimitsu Masuzawa et Sébastien Tixeuil [28]. La tolérance aux pannes bizantines est une caractéristique souhaitée des systèmes distribués puisqu’elle permet de tolérer des comportements malicieux au sein du système (typiquement une corruption de mémoire). 
Ce chapitre aborde le problème de la construction d’arbres maximisant certaines métriques (sous entendue comme maximisant pour chaque noeud du système la valeur de la métrique respectivement à une relation d’ordre pre-établie). Le problème est réputé difficile. Nous montrons tout d’abord un résultat d’impossibilité sur la contention des pannes bizantines dans un contexte autostabilisant pour ce problème. Nous présentons ensuite un contexte plus favorable qui permet de résoudre une version plus faible de ce problème. D’autres travaux ont été réalisés dans un contexte similaire, ils ont donnés lieu à publication [30] mais ne sont pas présentés dans ce livrable. 08-ANR-SEGI-025 Page 5 D2.4 Chapter 2 Scalable Overlay for Address-based Networks with Resource Discovery 2.1 Introduction Many systems, like peer-to-peer file sharing systems [1, 51] or runtime environments of parallel systems [11, 12, 13, 9], rely on a resilient communication infrastructure to provide their service. This communication infrastructure is built on top of an existing network. In this chapter, we consider the prevalent case of Address-Based networks. These are networks where each process possesses a unique address and can communicate with any other process whose address is known. Addresses can be transmitted in messages, enabling processes to discover other processes and establish new communications. In this model, opening a communication between two processes, and keeping this connection alive, is a major part of the resources used by the processes. This model encompasses a realistic deployment of an application over the Internet: any process can communicate with any other, as soon as it knows its IP address and port, which can be communicated using existing connections. Connections consume significant resources in kernel memory and processing, and induce some communication costs to establish and maintain. The communication infrastructure that is built is an overlay network on the underlying, potentially fully connected, network. The topology used to build this network has a significant impact on the scalability of the infrastructure. The diameter of the overlay network must be small enough to guarantee a small latency of communication from any point to any other; but at the same time, the number of resources used by each process must also be constrained, to spare system resources for the application. These two goals are contradictory, since increasing the number of connections reduces the diameter of the network. In this work, we choose a simple tradeoff to bound the number of resources used as well as the diameter of the system by log(n), where n is the number of processes in the system. Another fundamental function of the communication infrastructure is to abstract out the system enough to simplify the communications: the communication infrastructure provides names to processes, and routing abilities to send a message from any process to any other, using the communication channels of the overlay network. The last two algorithms that we present in this chapter use the overlay network that is built to give each process a rank, i.e. a unique integer between 0 and n − 1, and provide efficient routing using these ranks. A major property of the communication infrastructure is that it must be reliable, even in case of unexpected failures. All the algorithms that we present here are self-stabilizing, which means that they will converge to an appropriate behavior starting from any configuration. 
As a consequence, if the system is subject to any arbitrary transient failure (messages loss or replication, process crash, memory corruption, anything that does not modify the code of the processes and only has an effect limited in duration), after a convergence time, the system will rebuild a correct overlay network and route messages on it as expected. This property makes these algorithms suitable for use in highly volatile systems, where it is hard to predict the possible failure scenarios, such as the Internet. ANR SPADES. 08-ANR-SEGI-025 Page 6 D2.4 The first algorithm that we present build packs of processes organized along complete binomial trees. This algorithm uses a constant t = log D such that D is an upper bound on the number of processes in the system. A value of t higher than necessary does not slow down the convergence of the algorithm. Moreover, in a system where the domain I of process identifiers is known, if |I| is the number of bits of I, then each process knows that D ≤ 2|I| : a tighter estimate of D is therefore not necessary. Using fair composition of self-stabilizing algorithms [23], the second algorithm links the packs together, building a single tree with the desired properties. The ranking per se operates on top of the global structure, for a total of three composed algorithms. We present in the end a routing algorithm that allows the processes to communicate efficiently with each other on the topology built by the other three algorithms. As compared to our previous spanning tree algorithm [41], this one builds a less constrained topology. Many processes are eligible to become root, its children may not have the immediately smaller identifiers, etc. It also converges faster: Θ(nB), as opposed to Θ(n(nB)) for the previous algorithm. 2.2 Algorithms In this section, we present in details the algorithms we introduced in the previous section. As mentioned earlier, the top level algorithms (ranking and routing) rely on the fair composiConstants: tion of three composed algorithms that perform a t : N {The upper bound on log n} global task together. my_id: I {The process unique identifier} We express our algorithms in an asynchronous Variables: neighbor[0..t] : I ∪ {⊥} distributed algorithm model [41] where message Definitions: passing is used to communicate between processes. active(0) ≡ true It has two additional abstractions to cope with the active(i) ≡ active(i − 1)∧ address-based concept of the underlying network: neighbor[i − 1] 6= ⊥, ∀i ∈ J1, tK an oracle that only gives a weak knowledge of the level ≡ max{i | active(i)} system, and a failure detector. When queried, the leader ≡ (level = 0) ∨ neighbor[level − 1] < my_id oracle gives one process identifier, which can be the identifier of a valid process, or not. We assume Figure 2.1: Pack Algorithm Constants, Varithat the oracle is weakly fair: if queried infinitely, ables and Definitions it will give all process identifiers infinitely. The failure detector is necessary because of the lack of synchronicity in the communications, and is an eventually perfect failure detector. This failure detector is represented in the algorithms by a local function that any process p can call: S(q) is true if and only if q is suspected by p to have failed at the time of the call. The first algorithm builds a forest of binomial trees. The first step is to pair processes together: basically, each process queries its oracle, looking for a neighbor, until it finds one that has no partner. 
Since the algorithm is self-stabilizing, a mechanism takes care of the cases where pairs are not wellformed by exchanging keep-alive messages and negative acknowledges. The case of an initialization with the identifier of a non-active process is handled using the failure detector. As a result, the system eventually consists of a set of pairs. If there is an odd number of processes, one of them remains unpaired. Now, the same mechanism that worked with individual processes can be applied to the pairs themselves, considering each of them has a leader that executes the algorithm. Pair of pairs are grouped together, forming packs of four processes. Applying the same principle as long as larger groups can be made results in a set of packs where no two packs have the same size. P To describe the system after log n convergence, let us consider the number of processes, n, in base 2: n = j=0 aj .2j , with aj ∈ {0, 1}. For all j ∈ J0; log nK, there exists a pack of processes of size 2j if and only if aj = 1. Any process in the system is part of a pack, and there are no two packs of the same size (they would fuse if they existed). In the general case, the topology is not connected when this first algorithm has converged. A second algorithm then builds a doubly-linked list connecting the pack leaders together, which yields a single spanning tree. Pack leaders are detected in the same way, using the oracle to discover processes. 08-ANR-SEGI-025 Page 7 D2.4 Then, there remains to give each process a unique identifier which is a number in J0, n − 1K. A third algorithm, which assumes that the other two have converged, is responsible for allocating these identifiers. This algorithm uses a simple weight propagating protocol, from the leaves up to the root, to compute the weight (expressed as the number of processes) of each branches of the tree. Ranking is then done by propagating name-assigning tokens along the spanning tree. Lastly, it is possible to ensure an efficient routing using this topology. Each process can then send a message to another process, knowing only its rank. The fourth algorithm solves this problem while guaranteeing a maximum number of 2dlog ne hops. We now present these four algorithms, along with their proof of self-stabilization and convergence time. 2.2.1 Pack algorithm The goal of this algorithm is to build groups of processes, called packs, whose cardinality is a power of two, and to elect a leader in each pack. Each process has a vector of neighbors holding up to t identifiers of a neighbor process or the special value ⊥ that denotes no valid identifier and thus the absence of a neighbor. If the vector of some process p at index i holds a valid identifier q, we say that the neighbor of p at level i is q. A process can be active at some level or not. All processes are active at level 0. Then, a process is active at level i iff it is active at level i − 1 and it has a neighbor at level i − 1. Being active at level i means, for a process, that it is looking for a neighbor at this level, or has an active neighbor at this level. Using the active function, we can define the level of a process: it is the highest level at which the process is active. A level of i denotes that the process has i − 1 neighbors. A process that is active at its level l and that has an identifier greater than the identifier of its neighbor at level l − 1 is the leader of its pack (any process of level 0 is also a leader). 
This process continuously prospects to find a l + 1th neighbor to increase the size of the pack. As a consequence, the number of neighbors varies among processes: processes at a high level will have more neighbors than processes at a lower level. For a single pack, this builds a binomial tree. As an example of execution, each process p of level 0 first uses its oracle to look for a neighbor q that is also at level 0. When a neighbor is found, which yields a pack of two processes, the process that has the highest identifier becomes leader and begins looking for a neighbor at level 1. This has to be the leader of a pack of two processes. The result is a graph defined recursively: a pack(0) is a pair of processes, a pack(k, k > 0)) is a pair of packs(k − 1). Rules: Rule Cleanup: true −→ for all i ∈ J0, tK do if neighbor[i] 6= ⊥ ∧ (S(neighbor[i]) ∨ ¬active(i) ∨ neighbor[i] = my_id) then neighbor[i] ← ⊥ end if end for Rule Link Maintenance: ∃i ∈ J0, tKactive[i] ∧ neighbor[i] 6= ⊥ −→ send Hello(i) to neighbor[i] Rule Prospection: leader −→ v =getPeer() if v >my_id then send Exists(level) to v end if Rule Reaction to Exists: reception of Exists(j) sent by v −→ if leader ∧level = j then neighbor[j] ← v end if Rule Reaction to Hello: reception of Hello(j) sent by v −→ if neighbor[j] = ⊥ ∨ v > neighbor[j] then neighbor[j] ← v else if neighbor[j] 6= v∨ = 6 active(j) then send Goodbye(j) to v end if Rule Reaction to Goodbye: reception of Goodbye(j) sent by v −→ if neighbor[j] = v then neighbor[j] ← ⊥ end if Figure 2.2: Pack Algorithm 08-ANR-SEGI-025 Rules Page 8 D2.4 3 6 7 4 2 3 3 2 1 7 5 3 7 1 5 6 7 2 0 1 3 1 1 2 3 1 1 0 4 5 y x a b c Node x is active at level y neigh_x [0]=a neigh_x [1]=b neigh_x [2]=c Figure 2.3: Structure of an 8-process pack Figure 2.3 shows the structure of a pack. The eight processes, identified 0 to 7, are represented by circles. Each contains a vector of t = 3 neighbors, ordered from left to right. Colors are also used to define the index of a neighbor: black is used for the index 0, green for index 1 and red for index 2. The number outside the circle represents the level of each process, while the colors of the links express which index in the neighbor vector define this link. Process 7 being the leader of this pack, it is represented with a bold circle. The tree that represents a pack(k) has a subtree of size 2i for each i ∈ J0, log kK. The longest distance, in number of hops, between the root of a pack(k) and any process in the pack is thus log k. To build the pack, and recover from potential failures, the protocol uses three messages types: • Each process spontaneously sends Hello messages to each of its neighbors to allow them to check that the links between the processes are symmetrical; • When a process receives a Hello message, it can come from a neighbor at the same level (which is correct and ignored), or from any other process. If it comes from any other process, it means that the sender is incorrectly initialized, so it breaks the link by sending a Goodbye message to the corresponding neighbor. A process receiving a Goodbye message removes the corresponding process identifier from the neighbor vector. • The processes looking for a neighbor send an Exists message to a process given by the oracle. 
The Exists message hold the level of prospection and the sender’s identifier, and if the level of prospection matches the level of the receiver, and the proposition of pairing is more advantageous for the receiver (it replaces a ⊥ neighbor, thus increasing the level of activity of the receiver, or comes from a neighbor with a higher identity, thus removing the burden of prospection on the receiver), the receiver accepts the emitter as a neighbor at this level. The formal version of the algorithm is given in algorithm 2.2. Since the algorithm permanently tries to fuse packs, each pack eventually reaches its maximum size in the system. For example, in a system comprising 18 processes ((18)10 = (10010)2 ), there is a pack of 16 processes and a pack of 2 processes. At this stage, in the general case, the topology is not connected since the packs are not linked to one another. 08-ANR-SEGI-025 Page 9 D2.4 Proof of self-stabilization To prove the self-stabilizing property of the Pack algorithm, we first define the set of legitmate configurations. To do so, we use the concept of stable processes, defined below. Definition 1 (stable) Let p and q be two processes.A system σ is stable at level l if and only if the following properties hold for all m ∈ J0, lK: • if active(m)(p) ∧ active(m)(q) ∧ neighbor[m](p) = q then neighbor[m](q) = p. • there are at most 2m+1 − 1 processes (pi ) s.t. ∀i, active(m)(pi ) and neighbor[m](pi ) = ⊥. • if leader(p) ∧ leader(q) ∧ level(p) = level(q) then p = q. • if Exists(m) ∈ cq→p then level(p) 6= m or ¬leader(p). • Hello(m) ∈ cp→q ⇒ neighbor[m](p) = q. • Goodbye(m) 6∈ cp→q . Let p be a process of a system σ. If the pack leader of p has level l and σ is stable at level l, then p is stable. Definition 2 (Lp ) A system is in the set Lp of legitimate configurations if and only if it is stable at level dlog ne. Theorem 1 The pack algorithm is self-stabilizing to Lp . Proof 1 This proof is divided into three parts: correction (Lemma 1), closure (Lemma 2) and convergence (Lemma 3). Lemma 1 (correctness) Let σ be a system with n processes in a legitimate configuration. For any i, there is one pack of size 2i if and only if n.[i], the ith binary digit of n, is 1. Proof 2 For all i such that n.[i] = 1, we show that there exists a pack of size 2i , then we show that this pack is unique. First notice that the first point of the definition of stable indicates that the neighbor relationship is reflexive. Then, the second point of this definition implies that for any pack of size m, there cannot be m processes without a neighbor at its level. That is, if the number of processes is sufficient to pair another block with it, the processes are already in other blocks. Lastly, because of the third point of this definition, there can be no two blocks of the same size. Definition 3 (paired) A process p is paired at level m iff p is stable at level m − 1 and there is a process q, stable at level m − 1, s.t. neighbor[m](p) = q. Lemma 2 Lp is closed under the execution of the algorithm. Proof 3 This is a consequence of the fact that a system stable at level l remains so throughout any execution, which we now prove. Here are the possible transitions. None of them can change neighbor[k](q) for a process q stable at level k ≤ l. 
• Cleanup: no process is suspect because the failure detectors have converged, no process is its own neighbor in the initial configuration by definition of stable(l), and the possible correction affecting an inactive process does not make the configuration illegitimate since the conditions only concern active processes. • Link maintenance and Prospecting: the only messages that can be sent are Hello to a neighbor, which obeys the rule on Hello messages, and Exists to an already paired process, which verifies the rule on Exists messages. 08-ANR-SEGI-025 Page 10 D2.4 • Reaction to Exists: by definition of stable, an Exists(m) message can only be received by a process p s.t. neighbor[m](p) 6= ⊥, thus p does nothing. • Reaction to Hello: by definition of stable, a Hello(m) message can only be sent by p to q s.t. neighbor[m](p) = q, thus q does nothing. • Reaction to Goodbye: by definition of stable, there is no such message in the channels linking stable processes together. Lemma 3 (convergence) The pack algorithm converges to Lp from any configuration. Proof 4 It is enough to prove that any system σ stable at level l − 1 and unstable at level l ≥ 0, or unstable at level l = 0, eventually becomes stable at level l. First notice that the execution of the sanity checking rule eliminates the cases of suspect neighbors and self-connections (m, p s.t. neighbor[m](p) = p). Since we suppose that the failure detectors are stabilized at this point, we disregard crashed processes. Similarly, at no point in the algorithm is it possible for a process to connect to itself. Also, all the messages present in the initial configuration are consumed and all the processes have executed their sanity checking rule. Finally, the fact that σ is stable at level l − 1 means that none of the values of active, level or leader can change in σ. This is because they only depend on active itself and neighbor[m](p) for p stable at level m, and this cannot change. Let z be the highest non-paired process at level l. Notice that no process can send Goodbye to z. Thus, if z writes the identifier of a correct process p in its neighbor variable, then p and z become paired. Suppose there is an execution of σ where no pair is formed at level l. Let p be a process distinct from z, not paired at level l (p has to exist, or σ would be stable at level l). As part of its spontaneous prospection rule, p sends out an infinite number of Exists messages and thus, because of the global condition on the oracle, eventually sends Exists to z. Since z has ⊥ in its neighbor variable, it takes p as a neighbor: contradiction. Hence, eventually the number of process pairs at level l is maximal, which leaves at most 2m+1 − 1 unpaired processes. Complexity of the Pack Algorithm We evaluate the complexity of this algorithm using two significant measures: the convergence time and the number of variables in I used. Number of variables in I used. It is important to minimize this number since each variable holding a different process identifier will request memory and processing time to establish, and maintain, a communication channel with the destination. With the vectorial notation used here, the algorithm needs at best O(log n) variables in I. However, it is possible to implement this algorithm by dynamically allocating the memory. In this case, each process only allocates a variable in I to record a non-⊥ value. The global leader still needs blog nc variables for its neighbors, but the global memory usage is much lower. 
Intuitively, since the topology is a tree, there are n − 1 edges, which means each process uses on average two variables representing its neighbors. Formally, n processes have a neighbor at level 1, . . . , 2 processus have a neighbor at level blog nc. Thus, the average number of neighbors per process is: Pk=blog nc k=0 n n 2k k=blog nc = X k=0 1 2k ∼ n7→+∞ 2 Convergence time. Since the algorithm uses a resource discovery oracle, the convergence time of the system depends on the time spent by the oracle to achieve its specification. We call this time B. The self-stabilization proof allows to characterize the convergence of the system. As shown in lemma 3, in any configuration in which convergence is not reached, the system is stable at a given 08-ANR-SEGI-025 Page 11 D2.4 level i − 1 and unstable at level i (or unstable at level 0). We thus calculate the maximum time the system needs to go from unstable at level 0 to stable at level log n. As shown above, the system stabilizes at worst level by level. At level 0, n processes participate in the stabilization of the system by looking for a neighbor. At level i > 0, only the 2ni leaders look for a neighbor. When the system is stable at level log n, convergence is achieved. Lemma 4 Convergence is achieved, in the worst case, in Θ(nB) asynchronous rounds. Proof 5 Let E be an un upper bound on the number of rounds necessary for two processes p and q to become neighbors at a given level, once p has obtained the identifier of q. We first prove (lemma 5 that the stabilization time of any level j is, in the worst case, Θ n(B+E) . j 2 The convergence time of the system is thus the sum of the convergence times of all the levels, i.e. j=n X j=0 Θ n(B + E) 2j ≈ Θ(B + E) E is the number of asynchronous rounds necessary to send a message Exists and receive the answer Hello, i.e. O(1). Therefore, the system converges in Θ(nB) asynchronous rounds. Lemma 5 Consider a system stable at level j − 1 and unstable at level j, or unstable at level j = 0. In the worst case, the system becomes stable at level j in Θ(B) asynchronous rounds. Proof 6 Let P be the set of unstable processes at level j. Without loss of generality, we write P = p1 , p2 , . . . , pi such that if a > b, then the identifier of pa is lower than that of pb . Since half the active processes at a given level are active at the higher level, at worst,i = 2nj . By definition of unstable these processes are leader at level j and do not have a neighbor at level j − 1, the guard of their prospection rule is true. Thus, in B asynchronous rounds, all the processes in P obtain, by definition of the oracle, all the identifiers in P . The proof of convergence implies that at least one pair of processes is formed. We now show that it is possible to form exactly one pair. We build a first asynchronous round as follows : all the processes execute their prospection rule such that for all i ∈ J1, nK, pi obtains the identifier of pi+1 (modn). Each process thus sends Exists to the corresponding process. In the second round, each process receives the Exists message that was sent to it during the previous round and takes the sender as its neighbor at level j. In the i − 1 subsequent asynchronous rounds, which are still in the first B rounds, each process executes the following actions : • it executes its prospection rule; • it receives the identifier of a process p ∈ Π to which it has not yet sent an Exists message; • it sends Exists to p. 
All the Exists messages of these i − 1 asynchronous rounds are received and ignored, because each process already has a neighbor. During the next round, each process pi sends Hello to its neighbor pi+1 . Except in the case of pi , this neighbor already has a neighbor with a higher identifier; it thus replies Goodbye. In the last asynchronous round, upon reception of the Goodbye messages, each process replaces the identifier of its neighbor at level i with ⊥. Finally, only one pair remains: pn−1 is paired with pn . 08-ANR-SEGI-025 Page 12 D2.4 Algorithm 2 List Algorithm P3 ⊥ Variables: −∞ prev, next: I ∪ {⊥} level, next level: !0, t" ∪ {−∞, +∞} Algorithm 2 List Algorithm • • • • •prev P1 P2 P3 P2 P3 ⊥• P1 P2 ⊥ Rules: Variables: • • • • +∞ Cleanup3Prev: −∞ 5 2 3 Rule prev, next: I ∪ {⊥} S(prev) ∨ prev level ≤ level ∨¬ leader −→ prev • • • • • • • • • level, next level: !0, t" ∪ {−∞, +∞} (prev, prev level) ← (⊥, −∞) Rules: • • • • • • • • Rule Cleanup Prev: Rule P,4 • Cleanupt • • • Next: R Q R,1 P S(prev) ∨ prev level ≤ level ∨¬ leader −→ S(next) ∨ next level ≥ level ∨¬ leader −→ • • • • 2 8 16 Q,3 level) (prev, prev ← (⊥, −∞) Q,3 (next, next level) ← (⊥, +∞) • • • • Rule Cleanupt Next: Rule Link maintenance: • • • • S(next) ∨ next level ≥ level ∨¬ leader −→ leader • •−→ • • Example of a list (next, next level) ← (⊥, +∞) send to prev • • ListHello(level) • • send ListHello(level) to next Rule Link maintenance: level 3), the last one is led by leader −→ Rule Prospection: Fig. 2. Example of a list ocesses (level 2). send ListHello(level) to prev leader −→ send ListHello(level) to next list building uses three mesv ← getPeer() larly to the algorithm of pack if ¬S(v) then contains 8 processes (level 3), the last one is led by Rule Prospection: prospect continuously using ListExists(level) to v 2). P3 andsend contains 4 processes (level leader −→ er other leaders with a level end if The algorithm of list building uses three mesv ← getPeer() Figure 2.4: Exampleifof¬S(v) a list of the leader, but lower than sages, and works to the algorithm of pack then Rule Reaction to similarly ListExists: a successor, and other leaders building: will prospect continuously using send ListExists(level) to v receptionleaders of ListExists(l) sent by v −→ an the level of the leader, but theiforacle, discover other leaders with a level end if leader tothen ent one to be a predecessor. higherif than of the leader, levelthe < llevel < prev level thenbut lower than Rule Reaction to ListExists: sts is used to prospect, while List 2.2.2 algorithm the current oneprev to be a successor, (prev, level) ← (v, l) and other leaders reception of ListExists(l) sent by v −→ nd ListGoodBye are used to with else a level than<the the leader, but if lower next level l <level levelof then leader This evaluate definitions read variablesiffrom thethen Pack algorithm, but cannot modify and symmetry of the prevalgorithm / highercan than the one(v,tol)be and a predecessor. (next, nextcurrent level) ← if level < l < prev level then them. Being with this algorithm, it allows packs together through a doubly linked Thecomposed message is used to prospect, while to connect end if ListExists (prev, prev level) ← (v, l) of this algorithm is list. givenThe in first messages are of used process in theand listListGoodBye is the leader theto highest-level then end if ListHello else if pack, next level < all l < the levelother then leaders follow ensure the coherency and symmetry of the prev / in decreasing order. 
(next, next level) ← (v, l) Rule Reaction to ListHello: ization: We define as follow next variables. end if the identifiers (namely prev and next) reception of ListHello(l) sent by v adds −→ two variables holding purpose, the algorithm configurations for the To Listserve this The formal version of this algorithm is given in end if if v = prev then that point Algorithm to the successor 2. ← l and predecessor of the process in the doubly-linked list of packs for leaders. prev level Rule Reaction to ListHello: y): Let m be the number of with Together these output variablesWethat define the linked list, we store the level of the predecessor and Proof define as follow else if ofv self-stabilization: = next then reception of ListHello(l) sent by v −→ pack algorithm. Iff thesuccessor, l lowest the set oflevel legitimate configurations thet,List which a number between 0for and plus two special values: +∞ and −∞ to handle the nextare ←l if v = prev then he following conditions, they cases algorithm, l: special of the L first and last leaders. else prev level ← l eady and the system is steady Definition 4 (steady): Let of send ListGoodBye towhere vm be the 2.4 gives an example n =number 26. Processes P if, Q, are leaders of their packs, of else v =and next Rthen leaders and their packs Figure are packs by the pack algorithm. Iff the l lowest endformed if 8 and next level ← lis represented on the left of the respective sizes 16, 2 processes. The couple (next, next_level) pack leaders verify the following conditions, they elsethe right of the process. P has no successor, Rule Reaction to(prev, ListGoodBye: process, is isrepresented on er of the smallest pack, then while and the theircouple packs are steadyprev_level) and the system steady send ListGoodBye to v reception of ListGoodBye sent by v −→ d next level(p) = −∞, else while R hasat no predecessor. predecessor of Qare (resp. successor of R) is R (resp. is Q). level l. The other The leaders and their packs end if if leader then ch that q is the leaderThe of List unsteady. algorithm uses three messages, and works similarly to the Pack algorithm: leaders continuif v = prev then Rule Reaction to ListGoodBye: smaller than that of p and • if pprev is the the thediscover smallest other pack, then ously prospect using oracle,of to leadersreception with a level higher than ← ⊥leader of ListGoodBye sentthe by vlevel −→ of the leader, evel(q). next(p) = ⊥ and next level(p) = −∞, else end if but lower than the current one to be a successor, and other leaders with a level lower than the level if leader then er of the largest pack, then next(p) = q then such that q is the leader of if v = next if v = prev then of the leader, but higher than the current one to be a predecessor. The message ListExists is used to d prev level(p) = +∞, else thenext largest ← ⊥pack smaller than that of p and prev ← ⊥ prospect, whilenext messages ListHello and ListGoodBye are used to ensure the coherency and symmetry of level(p) = level(q). end if end if • if variables. p is the leader of the largest pack, then the prev / next end if if v = next then level(p) is= given +∞, else = of ⊥ this and prev The formalprev(p) version algorithm in Algorithmnext 0. ← ⊥ end if end if P1 5 P2 P3 2 P2 3 Proof of self-stabilization We now define the set of legitimate configurations for the List algorithm, Ll : Definition 4 (steady) Let m be the number of packs formed by the pack algorithm. 
Iff the l lowest pack leaders verify the following conditions, they and their packs are steady and the system is steady at level l. The other leaders and their packs are unsteady. 08-ANR-SEGI-025 Page 13 D2.4 Algorithm 0 List Algorithm Variables: prev, next: I ∪ {⊥} prev_level, next_level: J0, tK ∪ {−∞, +∞} Rules: Rule Cleanup Prev : S(prev) ∨ prev_level ≤ level ∨¬ leader −→ (prev, prev_level) ← (⊥, −∞) Rule Cleanupt Next: S(next) ∨ next_level ≥ level ∨¬ leader −→ (next, next_level) ← (⊥, +∞) Rule Link maintenance: leader −→ send ListHello(level) to prev send ListHello(level) to next Rule Prospection: leader −→ v ← getPeer() if ¬S(v) then send ListExists(level) to v end if Rule Reaction to ListExists: reception of ListExists(l) sent by v −→ if leader then if level < l < prev_level then (prev, prev_level) ← (v, l) else if next_level < l < level then (next, next_level) ← (v, l) end if end if Rule Reaction to ListHello: reception of ListHello(l) sent by v −→ if v = prev then prev_level ← l else if v = next then next_level ← l else send ListGoodBye to v end if Rule Reaction to ListGoodBye: reception of ListGoodBye sent by v −→ if leader then if v = prev then prev ← ⊥ end if if v = next then next ← ⊥ end if end if 08-ANR-SEGI-025 Page 14 D2.4 • if p is the leader of the smallest pack, then next(p) = ⊥ and next_level(p) = −∞, else next(p) = q such that q is the leader of the largest pack smaller than that of p and next_level(p) = level(q). • if p is the leader of the largest pack, then prev(p) = ⊥ and prev_level(p) = +∞, else prev(p) = q such that q is the leader of the smallest pack larger than that of p and prev_level(p) = level(q). • in all other cases, prev_level(p) = level(prev(p)) and next_level(p)) = level(next(p)). • no channel contains a ListGoodBye message. • if the channel cp→q contains a message ListHello(l), then prev(p) = q or next(p) = q. Definition 5 (Ll ) A configuration that is steady at level m and in which the leader p of the largest pack is such that prev(p) = ⊥ and prev_level(p) = t is legitimate. The set of such configurations is called Ll . Theorem 2 The list algorithm is self-stabilizing to Ll . Proof 7 This proof is divided into three parts: correction (Lemma 6), closure (Lemma 7) and convergence (Lemma 8). Lemma 6 (correction) In any legitimate configuration, the doubly-linked list of the list algorithm includes the leaders of all the packs, from the largest down to the smallest. Proof 8 For this algorithm, the definition of a legitimate configuration is the same as the one of a correct configuration. Lemma 7 (closure) The set Ll is closed under the execution of the list algorithm. Proof 9 None of the possible transitions in a legitimate configuration yields to an illegitimate configuration. • Cleanup (next and prev): all the conditions are false by definition of Ll . • Link maintenance: the messages sent verify the condition on ListHello messages. • Prospection: sending a ListeExists message cannot make the configuration illegitimate. • Reaction to ListHello: the transition has no effect. • Reaction to ListExists: the transition has no effect. • Reaction to ListGoodBye: there is no such message in a legitimate configuration. Lemma 8 The list algorithm converges to Ll from any configuration. Proof 10 We consider that the failure detectors are stabilized, all initial messages are consumed and the pack algorithm is stabilized. 
We first show that undesirable values are eventually eliminated in lemma 9, then we show that the system eventually becomes steady, level by level, in lemma10. As a consequence, the system eventually stabilizes to Ll . Definition 6 (spurious) Let p be a pack leader. The value of prev(p) or next(p) is spurious if the associated level does not match that of the corresponding process, i.e. next_level(p) 6= level( next(p)) or prev_level(p) 6= level( prev(p)). Lemma 9 All spurious values are eventually eliminated. Proof 11 Suppose next_level(p) 6= level( next(p)). Eventually, p executes its sanity checking rule and sends ListHello to q. If q is not a pack leader, it replies with ListGoodBye, on reception of which p executes next ← ⊥. If q is a pack leader then level(q) > level(p) by definition of p, so q also replies with ListGoodBye. The other case, prev_level(p) 6= level( prev(p)), is symmetric. 08-ANR-SEGI-025 Page 15 D2.4 The level variables are updated at the same time as prev and next, which makes it impossible to introduce a spurious value in the system. We then suppose that no spurious value exists in the system. An immediate consequence of this is that the system eventually becomes steady at level 0. Lemma 10 A system steady at level l < t eventually becomes steady at level l + 1. Proof 12 Let p be the leader of the smallest unsteady pack and q be the leader of the largest steady pack. We prove that eventually, next(p) = q using the fairness of the oracle and next(p) cannot change afterwards. First, notice that if next(p) = q, then p cannot change the value of its next variable. This would require one of the following: • a process r such that level(p) > level(r) > level(q) sends ListHello to p, but by definition of p and q, there is no such r. • q sends ListGoodBye to p. Since q is a pack leader, this can only happen if q receives ListHello from p, but in this case, since level(p) > level(q) and there is no level between those two, q does not send a ListGoodBye message. Now, consider an execution where next(p) is never q. By its prospection rule, q sends an infinite number of times ListExists to all the processes returned by its oracle, including p (by the global contition on oracles). Since level(p) > level(q) > next_level(p), p writes the identifier of q in its next variable. This is a contradiction. Convergence time Theorem 3 After the pack algorihm is stabilized, the list algorithm converges in Θ(B) asynchronous rounds in the worst case. Proof 13 First, the variables prev of the first process and next of the last process take the value ⊥ during the first execution of the spontaneous rule. Notice that, through the use of the prospection rule, all the processes obtain the identifiers of all the other processes in B asynchronous rounds. Therefore, in B rounds, each pack leader receives the identifier and level of all the other leaders. Then, as seen in the self-stabilization proof, when a leader p receives the ListExists message from the leader q whose level is immediately higher (resp. lower) than its own, it takes q for prev (resp. next). Therefore, after B rounds, the list is stabilized. 2.2.3 Ranking algorithm This algorithm, to be composed with the previous two, gives all the processes in the system consecutive integer identifiers, starting with 0. To achieve its goal, it uses the tree structure of the packs (see figure 2.3). Each process has a variable called name, distinct from the constant my_id. The root, i.e. 
the leader of the largest block, spontaneously sends Rank(0) to itself. Each pack being constituted of a regular structure (a binomial tree), it is possible to recursively assign a unique rank to each of the sons of the root, knowing only the depth of the tree (neighbor number i can take the name 2i ). It is also possible to compute the size of the tree knowing its level: a pack whose root is of level l holds 2l processes. Thus, when any process receives a Rank(r) message, it takes r as its name, then assigns the name i r + σj=0 2level−i to each neighbor i such that neighbor[i] 6= ⊥, by sending a corresponding Rank message to its neighbors. Lastly, if the process is a leader and has a successor, it sends the next available rank to its successor, which is r + 2level . Note that name assignments can occur in parallel: knowing only the level of the process receiving the Rank message and the next available identifier which is held in the Rank message, is enough to assign a name to all of its direct neighbors. 08-ANR-SEGI-025 Page 16 D2.4 Algorithm 1 Ranking Algorithm Variables: name: J0, n − 1K Rules: Rule Spontaneous Rule: previous = ⊥ −→ send Rank (0) to my_id Rule Reaction to Rank : reception of Rank(r) −→ name ← r n←r+1 for all i = 1 to level do send Rank (n) to neighbor[i − 1] n ← n + 2level−i end for if next 6= ⊥ then send Rank (n) to next end if 0 0 0 3: Rank(2) 1 1: Rank(0) P 3 0 4: Rank(13) 0 2: Rank(1) 2: Rank(5) 1 2 0 2 1 2: Rank(7) 10 3: Rank(4) 3: Rank(6) 0 0 4: Rank(3) Q 2 0 2: Rank(8) 0 3: Rank(9) 1 3: Rank(12) R 1 1 3: Rank(11) 0 0 4: Rank(10) 0 Figure 2.5: Example of Ranking Algorithm 08-ANR-SEGI-025 Page 17 D2.4 The formal algorithm is given in Algorithm 1. Figure 2.5 shows a system comprising 14 processes. Thus, there are three packs: one of 8 processes, lead by P, one of 4 processes lead by Q and one of 2 processes, lead by R. P is the root of the tree, and the black arrows in the figure represent the next pointer of each process. The number inside the processes represent their level. The number next to the links between processes inside a pack represents the index of the process in the neighbor vector of the parent. Blue arrows and text represent Rank messages: first, P sends Rank (0) to itself, according to the spontaneous rule. Then, it sends in phase 2 Rank (0 + 1 = 1) to its neighbor at index 0, Rank (1 + 23−1 = 5) to its neighbor at index 1, Rank (5 + 23−2 = 7) to its neighbor at index 2, and because it is a leader, Rank (7 + 23−3 = 8) to its next neighbor. Self-stabilization This algorithm is self-stabilizing because once the pack and list algrithms are stabilized, the topology does not change anymore. Eventually, the root sends Rank(0) to itself, which launches a new distribution of names. As soon as the wave has finished propagating along the tree, each process has a unique name that is suitable for routing. Convergence time Let E be an upper bound on the transmission time of a message. Messages are created by the root of the spanning tree and forwarded down to the leaves, which takes E log n asynchronous rounds. Since E is normally Θ(1) for the system to be usable, the convergence time of this algorithm is Θ(log n). 2.2.4 Routing Algorithm The routing algorithm provides a procedure post that takes a message and an integer which is the name of the final destination. The goal of the routing algorithm is to deliver the message at the final destination, using the shortest established route between the caller of post and the destination. 
The message is delivered at the destination by calling the deliver function with the message as a parameter. The routing algorithm is done with a single message Route, that takes the destination rank and the message to deliver. The algorithm is straightforward: at the reception of a Route message, any process that is the destination delivers the message. Otherwise, using its name and its level, the process computes whether the message must be routed on one of the children (because of the ranking algorithm, children of any process of rank r and level l are ranked r + 1 to r + 2l ), in which case the appropriate child is computed by iterating on the size of the subtrees rooted in each child. If the message is not directed to one of the children, then it must be forwarded to the parent in the tree (neighbor[level − 1]) if the process is not a leader, to the leader pointed by next if the destination possess a higher rank than the process, or by prev if the destination possess a lower rank than the process. Because the resulting tree has a diameter of at most 2 log n, a message is routed in at most 2 log n hops. The formal algorithm is given in Algorithm 2. Since this algorithm uses no memory, it stabilizes instantly: no additional convergence time is needed in order to be able to route when the ranking algorithm is converged. Note however that because of the composition, since the global algorithm relies on regular self-stabilizing algorithms, routing is not guaranteed to work before all the other algorithms have converged. 2.2.5 Convergence time of the global algorithm In the worst case, the algorithms converge one after another. The global convergence time is thus Θ(nB) + Θ(B) + Θ(log n) = Θ(nB). This improves over the previous fastest spanning tree algorithm known in this model [41], which converges in Θ(n(Bn)) [50]. 08-ANR-SEGI-025 Page 18 D2.4 Algorithm 2 Routing Algorithm Definitions: SendM essage(m, d) ≡ send Route(m, d) to my_id Rules: Rule Reaction to Route: reception of Route(m, r) −→ if r = name then Deliver m else if r ≥ name + 1 ∧ r < name + 2level then n ← name + 1 i←0 f ound ← false while ¬f ound do if r ≥ n ∧ r < n + 2level−i−1 then send Route(m, r) to neighbor[i] f ound ← false end if i←i+1 n ← n + 2level−i−1 end while else if ¬ leader then send Route(m, r) to neighbor[level−1] else if r <name then send Route(m, r) to prev else {r > name + 2level ∧ leader} send Route(m, r) to next end if 08-ANR-SEGI-025 Page 19 D2.4 2.3 Related Works Many self-stabilizing algorithms that build spanning trees can be found in the literature, mostly using the state reading model. They can build depth first search trees, or breadth first search trees, depending on the algorithm. Many information can be found in the survey of Gärtner [35]. Sandeep Gupta and Pradip Srimani [39] presented an algorithm to build and maintain trees in an adhoc message-passing network. This algorithm assumes a fixed range of communication and the discovery of the list of neighbors. In our case, this algorithm is not applicable, because we would have to model the range of communication as infinity, and this would require each processes to know about all the processes in the system. Yehuda Afek and Anat Bremler [2] define the principle of power supply to build a spanning tree over an unidirectional network and state reading. The algorithms presented here use a similar concept, messages being sent permanently to supply the topology. 
This concept is common in self-stabilization and peer-to-peer systems, where it is known as gossiping. Vijay Garg and Anurag Agarwal [34] gave a self-stabilizing spanning tree algorithm for large scale systems under the assumption that processes already posses consecutive ranks, and are joinable using these ranks. Our algorithms provide this abstraction, and does not make the same assumption. Brian Bourgon, Ajoy Datta and Viruthagiri Natarajan [10] introduced a self-stabilizing ranking algorithm for a tree-based network. Their algorithm could be used on top of the tree built in this work, instead of the ranking and routing algorithms. However, our ranking algorithm uses specificities of the tree that is built, in order to improve the convergence time. The self-stabilizing algorithm proposed by Dolev, Israeli and Moran [24] addresses the issue of ranking in an anonymous network. It first builds a spanning tree, using random choices to break topological symmetries. However, the algorithm is built for a shared memory / state reading model, and is thus not fitted for large scale address-based networks. In a previous work [41], we introduced the abstractions that we reuse in this chapter to model a large-scale address-based network using message passing. We presented a first self-stabilizing algorithm to build a spanning tree. This work improves on the previous result by giving a much more efficient algorithm to build a tree, removing many constraints on the order of nodes in the tree that were necesary in the previous algorithm. We also solve higher-level problems, using the tree that is built, the specificities of the topology, and the fair composition of self-stabilizing algorithms. 2.4 Conclusion We presented self-stabilizing algorithms for large scale address-based systems. Address-based systems, such as the Internet, enable communication between any pair of processes, as long as one of them knows the address of the other. Maintaining such addresses, and corresponding communication channels, is costly at large scale, and the algorithms we presented limit the amount of such resources, while still building a resilient and efficient communication infrastructure for classical peer-to-peer distributed algorithms. The first algorithm packs processes together in a forest of complete binomial trees. Composed with the second algorithm that doubly links trees together, this creates a single tree whose diameter and depth are both logarithmic in the number of processes in the system. Each process, moreover, has at most log n communication channels to maintain, where n is the number of processes in the system. The third algorithm assigns ranks (consecutive unique identifiers, ranging from 0 to n − 1) to the processes of the tree, creating a higher-level abstraction for the communication layer. The fourth algorithm presents a routing mechanism using these ranks, thus completing the creation of a fully usable communication infrastructure. Since the distance between two nodes is bounded by 2 log n hops this infrastructure is efficient. It is also reliable since the whole structure is built in a self-stabilizing manner. It means that in case of failures, the system would converge back to a normal behavior. The algorithm built from the composition of the four aforementioned algorithms relies on a computation model that replaces traditional neighbor list with an oracle. This weakening of the system 08-ANR-SEGI-025 Page 20 D2.4 assumptions allows scaling up to very large systems. 
The algorithm converges in Θ(nB) asynchronous rounds, which improves over the best previously known spanning tree algorithm in such settings. 08-ANR-SEGI-025 Page 21 D2.4 Chapter 3 Stabilizing Locally Maximizable Tasks in Unidirectional Networks is Hard 3.1 Introduction One of the most versatile techniques to ensure forward recovery of distributed systems is that of selfstabilization [20, 27, 53]. A distributed algorithm is self-stabilizing if after faults and attacks hit the system and place it in some arbitrary global state, the system recovers from this catastrophic situation without external (e.g. human) intervention in finite time. The vast majority of self-stabilizing solutions in the literature [27] considers bidirectional communication capabilities, i.e. if a process u is able to send information to another process v, then v is always able to send information back to u. This assumption is valid in many cases, but can not capture the fact that asymmetric situations may occur, e.g. in wireless networks, it is possible that u is able to send information to v yet v can not send any information back to u (u may have a wider range antenna than v). Asymmetric situations, that we denote in the following under the term of unidirectional networks, preclude many common techniques in self-stabilization from being used, such as preserving local predicates (a process u may take an action that violates a predicate involving its outgoing neighbors without u knowing it, since u can not get any input from its outgoing neighbors). Self-stabilizing solutions are considered easier to implement in bidirectional networks since detecting incorrect situations requires less memory and computing power [5], recovering can be done locally [4], and Byzantine containment can be guaranteed [44, 45, 48]. Investigating the possibility of self-stabilization in unidirectional networks was recently emphasized in several papers [3, 8, 14, 16, 18, 19, 25, 32, 33]. However, topology or knowledge about the system varies: [16] considers acyclic unidirectional networks, where erroneous initial information may not loop; [3, 14, 19, 25] assume unique identifiers and strongly connected communication graph so that global communication can be implemented; [18, 32, 33] make use of distinguished processes yet operate on arbitrary unidirectional networks. Tackling arbitrary uniform unidirectional networks in the context of self-stabilization proved to be hard. In particular, [8, 7] studied the self-stabilizing vertex coloring problem in unidirectional uniform networks (where adjacent nodes must ultimately output different colors). Deterministic and probabilistic solutions to the vertex coloring problem [37, 47] in bidirectional networks have local complexity (∆ states per process are required, and O(∆) –resp. O(1)– actions per process are needed to recover from an arbitrary state in the case of a deterministic –resp. probabilistic– algorithm, where ∆ denotes the maximum degree of a process). By contrast, in unidirectional networks, [8] proves a lower bound of n states per process (where n is the network size) and a recovery time of at least n(n − 1)/2 actions in total (and thus Ω(n) actions per process) in the case of deterministic uniform algorithms, while [7] provides a probabilistic solution that remains either local in space or local in time, but not both. In this chapter, we consider the problem of constructing self-stabilizingly a locally maximizable task ANR SPADES. 08-ANR-SEGI-025 Page 22 D2.4 (e.g. 
It turns out that local maximization is strictly more difficult than local predicate maintenance (i.e. vertex coloring). On the negative side, we present evidence that in uniform networks, deterministic self-stabilization of this problem is impossible. Also, the silence property (i.e. having communication fixed from some point in every execution) is impossible to guarantee, either for deterministic or for probabilistic variants of protocols. On the positive side, we present a series of generic protocols that can be instantiated for all considered locally maximizable tasks. First, we design a deterministic protocol for arbitrary unidirectional networks with unique identifiers that exhibits O(m log n) space complexity and O(D) time complexity under asynchronous scheduling, where n (resp. m) is the number of processes (resp. links) in the network and D is the network diameter. We complement the study with probabilistic generic protocols for the uniform case: the first probabilistic protocol requires infinite memory but copes with asynchronous scheduling (stabilizing in time O(log n + log ℓ + D), where ℓ denotes the number of fake identifiers in the initial configuration), while the second probabilistic protocol has polynomial space complexity (in O(m log n)) but can only handle synchronous scheduling (stabilizing in time O(n log n + ℓ)).

The remainder of the chapter is organized as follows: Section 3.2 presents the programming model and problem specification. Section 3.3 presents our negative results, while Section 3.4 details the protocols. Section 3.5 gives some concluding remarks and open questions.

3.2 Preliminaries

Program model. A program consists of a set V of n processes. A process maintains a set of variables that it can read or update, which define its state. A process contains a set of constants that it can read but not update. A binary relation E of cardinality m is defined over distinct processes such that (i, j) ∈ E if and only if j can read the variables maintained by i; i is a predecessor of j, and j is a successor of i. The set of predecessors (resp. successors) of i is denoted by P.i (resp. S.i), and the union of predecessors and successors of i is denoted by N.i, the neighbors of i. The ancestors of a process i are recursively defined as follows: i itself is an ancestor of i, and the ancestors of each predecessor of i are also ancestors of i. The descendants of i are defined similarly, using successors instead of predecessors. The relation E is not necessarily symmetric, which reflects the assumption that the networks we consider are unidirectional. Another remarkable point that distinguishes our model from the ordinary unidirectional model is that each process is aware of its predecessors but unaware of its successors; each process knows how many predecessors it has and can distinguish them, but has no knowledge about its successors. Notice that this unawareness of successors is inherent to some unidirectional networks such as wireless networks.

For processes i and j in V, d(i, j) denotes the distance (the length of a shortest path) from i to j in the directed graph (V, E). For convenience, we define d(i, i) = 0 and d(i, j) = ∞ if j is not reachable from i. The diameter D is defined as D = max{d(i, j) | (i, j) ∈ V × V, d(i, j) ≠ ∞}. A graph G = (V, E) is strongly connected if for any two vertices i and j, both d(i, j) ≠ ∞ and d(j, i) ≠ ∞ hold. The strongly connected components (abbreviated as SCC) of G are its maximal strongly connected subgraphs.
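As a concrete illustration of these graph-theoretic definitions (an illustrative aid only, not part of the model), the following Python sketch computes d(i, j) by breadth-first search over the successor relation and derives the diameter D as the largest finite distance.

  from collections import deque

  def distances_from(i, succ):
      """d(i, j) for all reachable j, where succ[u] lists the successors of u;
      unreachable nodes are simply absent from the result (d = infinity)."""
      dist = {i: 0}
      queue = deque([i])
      while queue:
          u = queue.popleft()
          for v in succ.get(u, []):
              if v not in dist:
                  dist[v] = dist[u] + 1
                  queue.append(v)
      return dist

  def diameter(nodes, succ):
      """D = max of the finite d(i, j) over all ordered pairs."""
      return max(d for i in nodes for d in distances_from(i, succ).values())

  # A unidirectional 4-cycle is strongly connected and has D = 3.
  succ = {0: [1], 1: [2], 2: [3], 3: [0]}
  print(diameter(range(4), succ))  # 3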
An action has the form ⟨name⟩ : ⟨guard⟩ −→ ⟨command⟩. A guard is a Boolean predicate over the variables of the process and its predecessors. A command is a sequence of statements assigning new values to the variables of the process. Recall that a process is unaware of its successors, so the actions of a process can depend on its predecessors but are completely independent of its successors. A parameter is used to define a set of actions as one parameterized action.

A configuration of the program is the assignment of a value to every variable of each process from its corresponding domain. Each process contains a set of actions. In some configuration, an action is enabled if its guard is true in that configuration, and a process is enabled if it has at least one enabled action in the configuration. A computation is a maximal sequence of configurations γ0, γ1, . . . such that for each configuration γi, the next configuration γi+1 is obtained by executing the command of at least one action that is enabled in γi. Maximality of a computation means that the computation is infinite, or that it terminates in a configuration where none of the actions is enabled. A program that only has terminating computations is silent.

A scheduler is a predicate on computations; that is, a scheduler is a set of possible computations, such that every computation in this set satisfies the scheduler predicate. We consider only weakly fair schedulers, where no process can remain enabled in a computation without ever executing an action. We distinguish three particular schedulers in the sequel of the chapter: the distributed scheduler corresponds to the predicate true (that is, all weakly fair computations are allowed); the locally central scheduler implies that in any configuration belonging to a computation satisfying the scheduler, no two enabled actions are executed simultaneously on neighboring processes; the synchronous scheduler implies that in any configuration belonging to a computation satisfying the scheduler, every enabled process executes one of its enabled actions. The distributed and locally central schedulers model asynchronous distributed systems.

In asynchronous distributed systems, time is usually measured in asynchronous rounds (simply called rounds). Let C = γ0, γ1, . . . be a computation. The first round of C is the minimal prefix of C, C1 = γ0, γ1, . . . , γk, such that every process enabled in γ0 executes an action or becomes disabled in C1. Round t (t ≥ 2) is defined recursively, by applying the above definition of the first round to C′ = γk, γk+1, . . .. Intuitively, every process has a chance to update its state in every round.

A configuration conforms to a predicate if this predicate is true in this configuration; otherwise the configuration violates the predicate. By this definition, every configuration conforms to the predicate true and none conforms to false. Let R and S be predicates over the configurations of the program. Predicate R is closed with respect to the program actions if every configuration of any computation that starts in a configuration conforming to R also conforms to R. Predicate R converges to S if R and S are closed and any computation starting from a configuration conforming to R contains a configuration conforming to S. The program deterministically stabilizes to R if and only if true converges to R. The program probabilistically stabilizes to R if and only if true converges to R with probability 1.
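The guarded-action semantics under the synchronous scheduler can be mimicked directly; the sketch below is purely illustrative (the encodings and the toy protocol are ours, not part of the model): all guards are evaluated on the old configuration, and every enabled process executes one enabled action simultaneously.

  def synchronous_round(config, predecessors, actions):
      """One synchronous step: guards read the *old* configuration, and every
      enabled process executes one of its enabled actions simultaneously.
      `actions` is a list of (guard, command) pairs over (my_state, pred_states)."""
      new_config = dict(config)
      for i in config:
          pred_states = [config[j] for j in predecessors[i]]
          for guard, command in actions:
              if guard(config[i], pred_states):
                  new_config[i] = command(config[i], pred_states)
                  break  # a process executes at most one action per step
      return new_config

  # Toy protocol (hypothetical): adopt 1 + max of the predecessors' values;
  # on an acyclic graph this silently stabilizes, since all guards become false.
  actions = [(lambda s, ps: bool(ps) and s != 1 + max(ps),
              lambda s, ps: 1 + max(ps))]
  config = {0: 7, 1: 0, 2: 0}
  predecessors = {0: [], 1: [0], 2: [1]}
  for _ in range(3):
      config = synchronous_round(config, predecessors, actions)
  print(config)  # {0: 7, 1: 8, 2: 9}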
Problem specification. In this chapter we consider locally maximizable tasks, and instantiate them on the following three classical problems:

UMIS. Each process i defines a function mis.i that takes as input the states of i and its predecessors, and outputs a value in {true, false}. The unidirectional maximal independent set (denoted by UMIS in the sequel) predicate is satisfied if and only if for every i ∈ V, either mis.i = true ∧ ∀j ∈ N.i, mis.j = false, or mis.i = false ∧ ∃j ∈ N.i, mis.j = true.

UGC. Each process i defines a function col.i that takes as input the states of i and its predecessors, and outputs a non-negative integer (or color). The unidirectional Grundy coloring (denoted by UGC) predicate is satisfied if and only if for every i ∈ V, ∀j ∈ N.i, col.i ≠ col.j and col.i = min(Z≥0 − {col.j | j ∈ N.i}), where Z≥0 denotes the set of non-negative integers.

UMM. Each process i defines a function match.i that takes as input the states of i and its predecessors, and outputs one of its predecessors (actually the local label of the incoming link from the predecessor) or a symbol ⊥. The unidirectional maximal matching (denoted by UMM) predicate is satisfied if and only if
- for any two distinct processes i and j such that match.i ≠ ⊥ and match.j ≠ ⊥, {i, match.i} ∩ {j, match.j} = ∅, and
- for any neighboring processes i and j such that match.i = match.j = ⊥, either ∃g ∈ N.i, match.g = i or ∃h ∈ N.j, match.h = j.
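Being local predicates over configurations, these conditions can be checked mechanically. The following Python sketch (illustrative only; the undirected adjacency encoding of N.i is our assumption) tests the UMIS and UGC predicates on a given output assignment.

  def is_umis(mis, neighbors):
      """UMIS predicate: a process outputs true iff none of its neighbors does,
      and a process outputting false has at least one neighbor outputting true."""
      return all(
          (mis[i] and not any(mis[j] for j in neighbors[i])) or
          (not mis[i] and any(mis[j] for j in neighbors[i]))
          for i in mis)

  def is_ugc(col, neighbors):
      """UGC predicate: neighboring colors differ and each process uses the
      smallest non-negative integer unused in its neighborhood."""
      return all(
          all(col[i] != col[j] for j in neighbors[i]) and
          col[i] == min(set(range(len(col) + 1)) - {col[j] for j in neighbors[i]})
          for i in col)

  # Directed triangle a -> b -> c -> a: N.i is the union of predecessors and successors.
  neighbors = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}
  print(is_umis({'a': True, 'b': False, 'c': False}, neighbors))  # True
  print(is_ugc({'a': 0, 'b': 1, 'c': 2}, neighbors))              # True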
3.3 Impossibility Results

In this section, we consider anonymous and uniform networks, where processes of the same in-degree execute exactly the same code (note however that probabilistic protocols may exhibit different actual behaviors when making use of a random variable).

[Figure 3.1: Impossibility of silent self-stabilizing locally maximizable tasks. (a) System A, with processes a, b, c. (b) System B, with processes d, e, f, g, h.]

Definition 7 (Local View) The local view of a process p consists of the states of itself and its predecessors (with local labels on the incoming links). Two processes are called locally equivalent if they have the same local view.

Since all the processes have the same program to execute and the program depends only on the states of the process itself and its predecessors, locally equivalent processes take the same action when activated. In particular, if a process is disabled at a configuration, all the processes locally equivalent to it are also disabled at that configuration. Using the concept of local equivalence, we can characterize the problems for which no silent self-stabilizing solution exists.

Theorem 4 A problem allows no silent self-stabilizing solution if it satisfies the following condition: from any configuration, say γ, satisfying the problem predicate, a configuration γ′ (possibly of a network different from that of γ) can be constructed such that
S1. γ′ does not satisfy the problem predicate, and
S2. every process at γ′ is locally equivalent to some process at γ.

Proof 14 Assume, for contradiction, a silent self-stabilizing solution A. Starting from any configuration, A reaches a silent configuration, say γ, satisfying the problem predicate. Now construct a configuration γ′ satisfying conditions S1 and S2. The configuration γ′ is silent, since all the processes are disabled at γ and thus also at γ′ (from S2). Thus A remains at γ′ forever when starting from γ′. This contradicts the assumption that A is a self-stabilizing solution, since γ′ does not satisfy the problem predicate (from S1).

Notice that the impossibility result of Theorem 4 holds even for potential probabilistic solutions. Impossibility results for the UMIS, the UGC and the UMM problems are easily obtained from Theorem 4.

Corollary 1 There exists no silent self-stabilizing solution for the UMIS problem.

Proof 15 Consider System A as depicted in Figure 3.1 (a). In any configuration γ satisfying the UMIS predicate, exactly one of the three processes, say a, has mis.a = true. Now consider System B in Figure 3.1 (b) and construct configuration γ′, where process d has the same state as a, e and g have the same state as b, and f and h have the same state as c. Configuration γ′ does not satisfy the UMIS predicate since only d has mis.d = true (so S1 holds). It is easy to see that d is locally equivalent to a, e and g are locally equivalent to b, and f and h are locally equivalent to c (so S2 holds). Thus, the corollary holds from Theorem 4.

Corollary 2 There exists no silent self-stabilizing solution for the UGC problem.

Proof 16 Consider System A as depicted in Figure 3.1 (a). In any configuration γ satisfying the UGC predicate, col.a, col.b and col.c return mutually distinct colors drawn from {0, 1, 2}. Without loss of generality, assume col.a = 0. Now consider System B in Figure 3.1 (b) and construct configuration γ′, where process d has the same state as a, e and g have the same state as b, and f and h have the same state as c. Configuration γ′ does not satisfy the UGC predicate since h does not satisfy the minimum-color requirement (so S1 holds): col.g and col.h return mutually distinct colors drawn from {1, 2}, and thus the minimum color 0 is used neither at h nor at its neighbor g. It is easy to see that d is locally equivalent to a, e and g are locally equivalent to b, and f and h are locally equivalent to c (so S2 holds). Thus, the corollary holds from Theorem 4.

Corollary 3 There exists no silent self-stabilizing solution for the UMM problem.

Proof 17 Consider System A as depicted in Figure 3.1 (a). In any configuration γ satisfying the UMM predicate, exactly one of the three processes, say a, has match.a ≠ ⊥ (actually match.a = c). Now consider System B in Figure 3.1 (b) and construct configuration γ′, where process d has the same state as a, e and g have the same state as b, and f and h have the same state as c. Configuration γ′ does not satisfy the UMM predicate since match.g = match.h = ⊥ and match.d ≠ g (so S1 holds). It is easy to see that d is locally equivalent to a, e and g are locally equivalent to b, and f and h are locally equivalent to c (so S2 holds). Thus, the corollary holds from Theorem 4.

Initial symmetry can be broken in a self-stabilizing way by a probabilistic approach [37, 38, 47]; deterministic protocols, however, cannot use such techniques. Thus, even relaxing the silence property does not enable deterministic solutions, since symmetry breaking is impossible in some situations (e.g. a ring where all the processes are initially in the same state). We can thus obtain the following theorem. First, we introduce the unidirectional view, which is simply a unidirectional version of the view introduced in [54].
Definition 8 (Unidirectional View) The unidirectional view V_p^1 at distance 1 of a node p is the local view of p. The unidirectional view at distance k of p is a tree V_p^k of height k that contains one unidirectional view V_q^(k−1) as a subtree of p for each predecessor q of p.

In the following, a unidirectional view is simply called a view. The following theorem is derived from the result of [54]. Intuitively, the view at infinite distance of a process p is the best information p can use. Thus, processes with the same view cannot make their states distinct from each other's when all the processes are activated every time. It is also known that two processes have the same view at infinite distance if their views at distance n are the same.

Theorem 5 A problem allows no deterministic self-stabilizing solution if the following configuration γ exists for some network G: for any configuration γ′ of G satisfying the problem predicate, there exist processes p and q that have the same unidirectional view at distance n in γ but have different states at γ′, where n is the number of processes in G.

Theorem 5 implies the impossibility of symmetry breaking. Thus, the following corollary obviously holds from Theorem 5 by taking as γ, for example, a ring network consisting of processes with the same state.

Corollary 4 There exists no deterministic self-stabilizing solution for the UMIS, the UGC and the UMM problems.

3.4 Possibility Results

The previous impossibility results imply that in the deterministic case, only non-uniform networks admit a self-stabilizing solution for the UMIS, the UGC and the UMM problems. In Section 3.4.1, we present such a deterministic solution. For anonymous and uniform networks, there remains the probabilistic case. We proved that probabilistic yet silent solutions are impossible, so both our solutions are non-silent. The one presented in Section 3.4.2 operates in asynchronous networks but requires unbounded memory, while the one presented in Section 3.4.3 operates in synchronous networks and uses O(m log n) memory per process.

3.4.1 Deterministic solution with identifiers

This subsection deals with networks where each process has a unique identifier. First, we present a deterministic scheme of self-stabilizing solutions and then characterize a problem class that can be solved by the scheme. The problem class contains the UMIS, the UGC and the UMM problems, as explained later.

The intuition of the scheme is as follows. Every process collects the predecessor information from all of its ancestors using the self-stabilizing approach given in [19, 22, 33]. From the collected information, each process i can reconstruct the exact topology of the subgraph consisting of all ancestors of i. In the case where each process has a given input value of the problem to be solved, the input values of all the ancestors are also collected. Then, using the topology and the input values, each process locally solves the problem and changes its state according to the solution.

The details of the scheme are given in Algorithm 1. Each process i maintains a variable Topology_i to store tuples of the form (id, ID, inp, d), where id is a process identifier, ID is the (identifier) set of the predecessors of process id, inp is the input value of id, and d is the distance from id to i. For Topology_i, G(Topology_i) denotes the directed graph G = (V, E) obtained from the predecessor information contained in Topology_i: V = {id | (id, ∗, ∗, ∗) ∈ Topology_i}, and E = {(j, k) ∈ V × V | ∃(k, ID, ∗, ∗) ∈ Topology_i s.t. j ∈ ID}.
Lemma 11 Let i be any process. At the end of the k-th round (k ≥ 1) and later, variable Topology_i stores correct tuples up to distance k − 1: {(id, ID, inp, d) ∈ Topology_i | d ≤ k − 1} = {(j, P.j, inp_j, d(j, i)) | d(j, i) ≤ k − 1}.

The following corollary is derived from Lemma 11.

Corollary 5 Let i be any process and D(i) be the maximum distance to i from the ancestors of i. At the end of the (D(i) + 1)-th round and later, Topology_i stores exactly the correct tuples of all the ancestors of i.

Corollary 5 shows that the scheme of Algorithm 1 eventually provides each process i with the topology of its ancestors, including their identifiers and input values. This is the maximum information that process i can use in a unidirectional network, which intuitively implies that the scheme allows us to solve any problem that is solvable in our model. The following theorem characterizes the problems that can be solved by the scheme of Algorithm 1.

Theorem 6 In asynchronous networks with identifiers, the scheme of Algorithm 1 can provide a self-stabilizing solution to any problem such that each process can find its final state (or its output) solely from the topology of its ancestors, including their identifiers and input values. Its convergence time is D + 1 rounds and the memory space required at each process is O(m log n) bits.

Algorithm 1 A generic deterministic scheme in asynchronous networks with identifiers

constants of process i
  id_i : identifier of i;
  P_i : identifier set of its predecessors P.i;
  inp_i : input value of i (of the problem to be solved);
variables of process i
  Topology_i : set of (id, ID, inp, d) tuples; // topology that i is currently aware of
    // id: process identifier; ID: identifier set of P.(id);
    // inp: input value of id; d: distance from id to i in Topology_i
function update(Topology_i)
  Topology_i := {(id_i, P_i, inp_i, 0)} ∪ ⋃_{j∈P.i} {(id, ID, inp, d + 1) | (id, ID, inp, d) ∈ Topology_j};
  while ∃(id, ID, inp, d), (id′, ID′, inp′, d′) ∈ Topology_i s.t. id = id′ and d < d′
    remove (id′, ID′, inp′, d′) from Topology_i;
  while ∃(id, ID, inp, d), (id′, ID′, inp′, d′) ∈ Topology_i s.t. id = id′ and (ID ≠ ID′ or inp ≠ inp′)
    remove all the tuples (id, ∗, ∗, ∗) from Topology_i;
  while ∃(id, ID, inp, d) ∈ Topology_i s.t. i is not reachable from id in G(Topology_i)
    remove (id, ID, inp, d) from Topology_i;
function solve(Topology_i)
  change the state of i to the task-dependent solution using Topology_i;
actions of process i
  true −→ update(Topology_i); solve(Topology_i);

Modification to a silent protocol. Algorithm 1 can easily be modified to be silent. For simplicity of presentation, every process always has an enabled action with guard true, and thus Algorithm 1 is not silent. However, Algorithm 1 becomes silent by changing the guard so that the action becomes enabled only when Topology_i needs to be updated. Precisely, the guard is changed to
  Topology_i ≠ {(id_i, P_i, inp_i, 0)} ∪ ⋃_{j∈P.i} {(id, ID, inp, d + 1) | (id, ID, inp, d) ∈ Topology_j}.

The rest of this subsection shows that the UMIS, the UGC and the UMM problems are contained in the problem class of Theorem 6.

Corollary 6 The scheme of Algorithm 1 can provide self-stabilizing deterministic solutions for the UMIS, the UGC and the UMM problems in asynchronous networks with identifiers.
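A centralized Python rendering of one activation of Algorithm 1's update may help fix ideas (a sketch only: dictionaries stand in for the tuple set, and the conflict-removal loops are simplified to keeping the closest tuple per identifier).

  def update(i, pred_ids, inp, pred_topologies):
      """One activation of Algorithm 1's update at process i (sketch).
      A topology maps id -> (ID, inp, d), where ID is the predecessor set of id
      and d the distance to i; pred_topologies are the Topology variables
      currently exposed by i's predecessors."""
      topo = {}
      for t in pred_topologies:                  # gather, shifting distances by one
          for pid, (ID, pinp, d) in t.items():
              if pid not in topo or d + 1 < topo[pid][2]:
                  topo[pid] = (ID, pinp, d + 1)
      topo[i] = (frozenset(pred_ids), inp, 0)
      # Keep only identifiers that can reach i in G(Topology_i): j reaches i
      # iff j is a transitive predecessor of i in the collected graph.
      reach, frontier = {i}, [i]
      while frontier:
          q = frontier.pop()
          for j in topo[q][0]:                   # predecessors of q
              if j in topo and j not in reach:
                  reach.add(j)
                  frontier.append(j)
      return {pid: t for pid, t in topo.items() if pid in reach}

  # Line r <- a <- b: with this activation order one activation each suffices;
  # in general, Corollary 5 guarantees convergence after D(i) + 1 rounds.
  t_r = update('r', [], 0, [])
  t_a = update('a', ['r'], 0, [t_r])
  t_b = update('b', ['a'], 0, [t_a])
  print(sorted((pid, d) for pid, (_, _, d) in t_b.items()))  # [('a', 1), ('b', 0), ('r', 2)]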
Algorithm 2 A task-dependent function at process i for the UMIS problem

function UMIS(Topology_i)
  WorkingTp_i := Topology_i; UMIS_i := ∅;
  while (id_i, P_i, inp_i, 0) ∈ WorkingTp_i {
    let W be a source SCC of WorkingTp_i;
    for each id ∈ W, in descending order of identifiers
      if UMIS_i ∪ {id} is an independent set
        UMIS_i := UMIS_i ∪ {id};
    WorkingTp_i := WorkingTp_i − W;
  }
  if id_i ∈ UMIS_i output true; else output false;

3.4.2 Probabilistic solution with unbounded memory in asynchronous anonymous networks

In this subsection, we present a probabilistic scheme of self-stabilizing solutions for locally maximizable tasks in asynchronous anonymous networks. The scheme is based on a probabilistic unique naming of processes, which allows each process, in the same way as in Algorithm 1, to deterministically find the exact topology consisting of all of its ancestors. In the naming algorithm, each process is given a name variable that can grow arbitrarily large (hence the unbounded memory requirement). The naming is unique with probability 1 after a bounded number of new name draws. A new name draw consists in appending a random bit at the end of the current identifier. Each time the process is activated, a new random bit is appended.

In parallel, we essentially run the deterministic algorithm to find the topology consisting of all ancestors of the process. The main difference from Algorithm 1 lies in handling the process identifiers. The variable Topology (similar to that of Algorithm 1) of a particular process may contain several different identifiers of the same process, since the identifier of a process keeps getting longer in every execution of the algorithm. To circumvent the problem, we consider two distinct identifiers to be the same if one is a prefix of the other, and anytime such same identifiers conflict, only the longest one is retained.

Another difference is that we do not need the distance information. The distance information is used in Algorithm 1 to remove the fake tuples (i, ID, inp, d) of a process i such that i is the identifier of a non-existing process or ID ≠ P.i, which may exist in the initial configuration. In our scheme, fake tuples with identifiers that are prefixes of identifiers of real processes are eventually removed, since each correct identifier eventually becomes longer than any fake identifier. Notice that tuples with fake identifiers eventually become disconnected from the constructed subgraph topology and are thus removed. The details of the algorithm are given in Algorithm 3.

Algorithm 3 A probabilistic scheme in asynchronous anonymous networks

constants of process i
  inp_i : input value of i (of the problem to be solved);
variables of process i
  id_i : identifier (binary string) of i;
  P_i : identifier set of P.i;
  Topology_i : set of (id, ID, inp) tuples; // topology that i is currently aware of
    // id: a process identifier; ID: identifier set of P.(id); inp: input value of id
function update(Topology_i)
  id_i := append(id_i, random_bit); // append a random bit to the current id
  P_i := identifier set of P.i; // update the identifier set of i's predecessors
  Topology_i := {(id_i, P_i, inp_i)} ∪ ⋃_{j∈P.i} Topology_j;
  while ∃(id, ID, inp), (id′, ID′, inp′) ∈ Topology_i s.t. id′ is a prefix of id
    remove (id′, ID′, inp′) from Topology_i;
  while ∃(id, ID, inp) ∈ Topology_i s.t. i is not reachable from id in G(Topology_i)
    remove (id, ID, inp) from Topology_i;
function solve(Topology_i)
  change the state of i to the task-dependent solution using Topology_i;
actions of process i
  true −→ update(Topology_i); solve(Topology_i);
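The prefix rule of Algorithm 3 (two identifiers name the same process when one is a prefix of the other, and only the longest survives) can be sketched in a few lines of illustrative Python, with binary identifiers encoded as strings.

  def merge_prefixes(ids):
      """Among identifiers related by the prefix relation, keep only the longest:
      an identifier is discarded if it is a proper prefix of another one."""
      return {i for i in ids
              if not any(i != j and j.startswith(i) for j in ids)}

  # '01' is a stale, shorter name of the process now called '0110'; it is dropped.
  print(sorted(merge_prefixes({'01', '0110', '111', '10'})))  # ['0110', '10', '111']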
Theorem 7 In asynchronous anonymous networks, the scheme of Algorithm 3 can provide a self-stabilizing probabilistic solution to any problem such that each process can find its final state (or its output) solely from the following information: (i) the topology of its ancestors, including their input values, and (ii) a total order over its ancestors that is consistent with an arbitrarily given total order over all the processes. Its expected convergence time is O(log n + log ℓ + D) rounds, where ℓ is the number of fake identifiers in the initial configuration.

By an argument similar to that of Corollary 6, we can obtain the following corollary from Theorem 7.

Corollary 7 The scheme of Algorithm 3 can provide self-stabilizing probabilistic solutions for the UMIS, the UGC and the UMM problems in asynchronous anonymous networks.

3.4.3 Probabilistic solution with bounded memory in synchronous anonymous networks

The scheme in Algorithm 3 is based on global unique naming; however, self-stabilizing global unique naming in unidirectional networks inherently requires unbounded memory. The goal of this subsection is to present a scheme of self-stabilizing solutions with bounded memory. To avoid the use of unbounded memory space, the scheme attains and utilizes a local unique naming instead of a global one. The local unique naming guarantees that two processes have distinct identifiers whenever one is reachable from the other. Indeed, such a local naming is sufficient for each process to recognize the strongly connected component it belongs to. Once the component is recognized, some problems such as the UMIS, the UGC and the UMM problems can be solved by a method similar to that of Section 3.4.2.

In our scheme to achieve local unique naming, each process extends its identifier by appending a random bit whenever it finds an ancestor with the same identifier as its own. To be able to perform such a detection, a process needs to distinguish any of its ancestors from itself even when they have the same identifier. The detection mechanism is basically executed as follows: each process draws a random number, and disseminates its identifier together with the random number to its descendants. When process i receives the same identifier as its own, it checks whether the attached random number is the same as its own. If they are different, the process detects that this is a distinct process (that is, a real ancestor) with the same identifier as its own current identifier. When the process receives the same identifier with the same random number as its own for a given period of time, it draws a new random number and repeats the above procedure. Hence, as two different processes eventually draw different random numbers, eventually every process is able to detect an ancestor with the same identifier, if such an ancestor exists.

The above method may cause false detection (or false positives) when a process receives its own identifier with an old random number. To avoid such false detections, each identifier is relayed with a distance counter and is removed when the counter becomes sufficiently large. Moreover, the process repeats the detection checks while keeping sufficiently long periods of time between them.
The details of the self-stabilizing probabilistic algorithm for local naming are presented in Algorithm 4.

Algorithm 4 Probabilistic local naming in synchronous anonymous networks

variables of process i
  id_i : identifier (binary string) of i;
  rnd_i : random number selected from {1, 2, . . . , k}; // k (≥ 2) is a constant
  ID_i : set of (id, rnd, d) tuples; // identifiers that i is currently aware of
    // id: a process identifier; rnd: random number of id; d: distance that id has traversed
function update(ID_i)
  ID_i := {(id_i, rnd_i, 0)} ∪ ⋃_{j∈P.i} {(id, rnd, d + 1) | (id, rnd, d) ∈ ID_j};
  while ∃(id, rnd, d) ∈ ID_i s.t. d > |{id | (id, ∗, ∗) ∈ ID_i}|
    remove (id, rnd, d) from ID_i;
  if timer > |{id | (id, ∗, ∗) ∈ ID_i}| // timer is incremented by one every round
    naming(ID_i);
function naming(ID_i)
  if ∃(id_i, rnd, ∗) ∈ ID_i s.t. rnd ≠ rnd_i
    id_i := append(id_i, random_bit); // append a random bit to the current id
    rnd_i := number randomly selected from {1, 2, . . . , k};
    reset_timer; // reset timer to 0
  update(ID_i);
actions of process i
  true −→ update(ID_i);

Lemma 12 Algorithm 4 is a self-stabilizing probabilistic local naming algorithm in synchronous anonymous networks. Its expected convergence time is O(n log n + ℓ) rounds, where ℓ is the number of fake identifiers in the initial configuration.

Proof Sketch: First we show that the algorithm is a self-stabilizing probabilistic local naming algorithm. For contradiction, we assume that two processes i and j (where j is an ancestor of i) keep the same identifier after some configuration. Without loss of generality, the distance from j to i is minimal among the process pairs keeping the same identifiers. Let j, u1, u2, . . . , um, i be the shortest path from j to i. Since all processes on the path have mutually distinct identifiers except for the pair i and j, (id_j, rnd_j, k) is not discarded at any intermediate process u_k (1 ≤ k ≤ m) (because k ≤ |{id | (id, ∗, ∗) ∈ ID_{u_k}}|) and is delivered to i. Thus, eventually i detects id_i = id_j and rnd_i ≠ rnd_j. Then i extends its identifier by appending a random bit, which is a contradiction.

We now evaluate the expected convergence time of the algorithm. By an argument similar to the proof of Theorem 7, we can show that the expected number of bits added to a process identifier is O(log n). Notice that the number ℓ of fake identifiers has no influence on this evaluation, since the distance d of a fake identifier is larger than the timer value (once the timer is reset) and the identifier is thus removed (because d > |{id | (id, ∗, ∗) ∈ ID_i}|) when function naming is executed. Actually, we can show that all the fake identifiers existing in the initial configuration are removed within O(n + ℓ) rounds. On the other hand, the time between two executions of function naming at a process depends on the number of currently existing identifiers (including the fake ones), which is initially O(n + ℓ) and becomes O(n) within O(n + ℓ) rounds. Thus, the expected convergence time is O(n log n + ℓ) rounds.

Algorithm 5 presents a scheme of self-stabilizing solutions in networks with local naming. Thus, fair composition [27] with the local-naming algorithm of Algorithm 4 provides probabilistic self-stabilizing algorithms in synchronous anonymous networks. For simplicity, we omit the code for removing fake initial information in Algorithm 5, since such fake initial information can be removed in a similar way as in Algorithm 4.
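One execution of the naming function of Algorithm 4 can be sketched as follows (illustrative Python for a single process; the (id, rnd, d) tuples play the same role as in the listing above).

  import random

  def naming_step(my_id, my_rnd, known, k=2):
      """One execution of `naming` at a process (sketch). `known` is the current
      ID set: tuples (id, rnd, d) received from ancestors. If the process's own
      identifier is seen with a different random number, a real distinct ancestor
      shares that identifier: extend the identifier and redraw the number."""
      conflict = any(ident == my_id and rnd != my_rnd for ident, rnd, _ in known)
      if conflict:
          my_id += random.choice('01')      # append a random bit
          my_rnd = random.randint(1, k)     # redraw the random number
      return my_id, my_rnd

  # A tuple ('010', 2, 3) arriving at a process named '010' with rnd 1 forces a rename.
  print(naming_step('010', 1, {('010', 2, 3), ('11', 1, 1)}))  # e.g. ('0101', 2)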
Similar to the previous schemes, each process i has a variable Topology_i to store the topology of the ancestors of i. However, unlike in the previous algorithms, the exact topology of the ancestors of i cannot be constructed, because two distinct ancestors may have the same identifier when they are mutually unreachable. This may make it difficult or impossible for each process to find the solution of the problem solely from the topology information. Instead, as shown in Lemma 13 below, each process i can exactly construct the topology of the strongly connected component it belongs to. To compensate for the weakness of the topology information, a tuple stored in Topology_i is of the form (id, ID, inp, lview, d) and has an additional entry lview to store the local view of process id (the other entries id, ID, inp and d are the same as those in Algorithm 1). The final states of the external predecessors of the strongly connected component (i.e. the processes that are predecessors of processes in the component but are not in the component) can be obtained from the local views and can be utilized to find the solution of the problem.

Algorithm 5 A scheme in networks with local naming

constants of process i
  id_i : identifier of i; // distinct from that of any ancestor
  P_i : identifier set of P.i;
  inp_i : input value of i (of the problem to be solved);
variables of process i
  st_i : state of i to be communicated (task-specific);
  lview_i : set of (id, st, label) tuples; // local view of i
    // id: an identifier of a predecessor of i; st: the state of id;
    // label: the link label at i assigned to the incoming link from id
  Topology_i : set of (id, ID, inp, lview, d) tuples; // topology that i is currently aware of
    // id: a process identifier; ID: identifier set of P.(id); inp: input value of id;
    // lview: local view of id; d: distance from id to i
function update(Topology_i)
  lview_i := ⋃_{j∈P.i} {(id_j, st_j, label_i(j))}; // label_i(j): local label at i for link (j, i)
  Topology_i := {(id_i, P_i, inp_i, lview_i, 0)} ∪ ⋃_{j∈P.i} {(id, ID, inp, lview, d + 1) | (id, ID, inp, lview, d) ∈ Topology_j};
  while ∃(id, ID, inp, lview, d), (id′, ID′, inp′, lview′, d′) ∈ Topology_i s.t. id = id′ and d < d′
    remove (id′, ID′, inp′, lview′, d′) from Topology_i;
function solve(Topology_i)
  change the state of i to the task-dependent solution using Topology_i;
actions of process i
  true −→ update(Topology_i); solve(Topology_i);

[Figure 3.2: An actual locally-named graph G and the graph G1 constructed at process 1. (a) An actual graph G. (b) Graph G1 (and G3, G8, and G9).]
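The component recognition that the solve step relies on (formalized in Lemma 13 below) amounts to intersecting the ancestors and descendants of i in the collected graph; the following Python sketch is illustrative only, with edges encoded, as in G(Topology_i), by the predecessor sets carried in the ID entries.

  def scc_of(i, edges):
      """Strongly connected component of i (sketch): the nodes that both reach i
      and are reachable from i. `edges[v]` lists the predecessors of v."""
      def closure(start, step):
          seen, frontier = {start}, [start]
          while frontier:
              u = frontier.pop()
              for v in step(u):
                  if v not in seen:
                      seen.add(v)
                      frontier.append(v)
          return seen
      preds = lambda u: edges.get(u, [])
      succs = lambda u: [v for v, ps in edges.items() if u in ps]
      return closure(i, preds) & closure(i, succs)

  # A 3-cycle 1 -> 3 -> 9 -> 1 with an external predecessor 7 of process 1:
  edges = {1: [9, 7], 3: [1], 9: [3], 7: []}
  print(sorted(scc_of(1, edges)))  # [1, 3, 9]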
Lemma 13 In synchronous locally-named networks, the scheme presented in Algorithm 5 allows each process to exactly recognize the topology of the strongly connected component it belongs to in O(D) rounds.

Proof Sketch: It is obvious that the variable Topology_i of each process i after D rounds consists of the tuples (id, P.(id), inp_id, lview_id, d(id, i)) of all the ancestors id of i. Notice that the local naming allows two distinct processes to have the same identifier if they are mutually unreachable. Thus, Topology_i may contain one tuple (id, P.(id), inp_id, lview_id, d(id, i)) standing for two or more distinct processes, and/or may contain two tuples (id, P, inp, lview, d) and (id, P′, inp′, lview′, d) with the same id and d but different values in some other entry.

Each process i recognizes the topology of the strongly connected component it belongs to as the one in the following graph G_i = (V_i, E_i): V_i = {id | (id, ∗, ∗, ∗, ∗) ∈ Topology_i} and E_i = {(u, v) | (v, P, ∗, ∗, ∗) ∈ Topology_i s.t. u ∈ P} (see Figure 3.2). In other words, G_i can be obtained from the actual graph G as follows: first consider the subgraph G′_i induced by all ancestors of i, and then merge the processes with the same identifier into a single process. What we have to show is that G_i and G′_i are the same with respect to the topology of the strongly connected component i belongs to. It is obvious that all processes in G_i and G′_i can reach i. What we have to show is that a process j is reachable from i in G_i (i.e. j belongs to the strongly connected component of i) if and only if j is also reachable from i in G′_i. The if part is obvious, since G_i is obtained from G′_i by merging processes. The only-if part holds as follows. Consider two distinct processes j and j′ with the same identifier, if they exist. Since they are mutually unreachable but can both reach i, they are unreachable from i in G′_i (otherwise one of them would be reachable from the other). This implies that, in the construction of G_i from G′_i, merging is applied only to processes unreachable from i; that is, the merging has no influence on reachability from i. Thus, any process unreachable from i in G′_i remains unreachable from i in G_i.

Theorem 8 In synchronous anonymous networks, the fair composition of the local-naming algorithm of Algorithm 4 and the scheme of Algorithm 5 can provide a probabilistic self-stabilizing solution to any problem such that each process can find its final state (or its output) solely from the following information: (i) the topology of its strongly connected component, including its identifiers and input values, and (ii) the final states of the external predecessors of its strongly connected component (given as local views of the processes in the component). Its expected convergence time is O(n log n + ℓ) rounds, where ℓ is the number of fake identifiers in the initial configuration. The expected space complexity of the resulting algorithm is O(m log n).

Corollary 8 The fair composition of the local-naming algorithm of Algorithm 4 and the scheme of Algorithm 5 can provide probabilistic self-stabilizing solutions for the UMIS, the UGC and the UMM problems in synchronous anonymous networks.

Algorithm 6 A problem-dependent function at a process i for the UMIS problem

function UMIS(Topology_i)
  UMIS_i := ∅;
  let W be the SCC of Topology_i that i belongs to;
  for each id ∈ W, in descending order of identifiers
    if ∄(∗, true, ∗) ∈ lview_id and P.id ∩ UMIS_i = ∅,
       where (id, P.id, inp_id, lview_id, d) ∈ Topology_i,
      then UMIS_i := UMIS_i ∪ {id};
  if id_i ∈ UMIS_i output true; else output false;
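In the same greedy spirit as Algorithms 2 and 6 (a loose sketch under our own simplifying assumptions, not a faithful rendering of either listing): identifiers are scanned in descending order, and a process joins the set whenever the set remains independent; since every process of the component evaluates the same deterministic function on the same data, the outputs are mutually consistent.

  def greedy_umis(ids, edges):
      """Greedy maximal independent set over identifiers (sketch): scan in
      descending order, adding a process whenever the result stays independent.
      `edges` is the set of directed links (u, v); independence is taken over
      N.i, i.e. ignoring link direction."""
      chosen = set()
      for pid in sorted(ids, reverse=True):
          if all((pid, q) not in edges and (q, pid) not in edges for q in chosen):
              chosen.add(pid)
      return chosen

  # Cycle 5 -> 8 -> 2 -> 5: the highest identifier wins, its neighbors are excluded.
  print(greedy_umis({5, 8, 2}, {(5, 8), (8, 2), (2, 5)}))  # {8}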
3.5 Conclusion

Although in bidirectional networks self-stabilizing maximal independent set construction is as difficult as (non-Grundy) vertex coloring [37], this work proves that in unidirectional networks, the computing power and memory required to solve the two problems differ greatly. Silent solutions to coloring in unidirectional uniform networks do exist and require Θ(n²) (resp. Θ(1)) stabilization time when deterministic (resp. probabilistic) solutions are considered. By contrast, deterministic maximal independent set construction in uniform unidirectional networks is impossible, and silent maximal independent set construction is impossible, regardless of the deterministic or probabilistic nature of the protocols. Similar differences can be observed for maximal matching and Grundy coloring.

The self-stabilizing probabilistic naming techniques (defining equivalence classes over identifiers) that we introduced here could be of independent interest for solving other tasks in cases where similar impossibility results hold. While we presented positive results for the deterministic case with identifiers and for the non-silent probabilistic cases, there remains the immediate open question of whether a probabilistic solution with bounded memory can be devised in the asynchronous setting. Another interesting issue for further research relates to global tasks. The global unique naming that we present in Section 3.4.2 solves a truly global problem in networks where global communication is not feasible, by defining proper equivalence classes between various identifiers. The case of other classical global tasks in distributed systems (e.g. leader election) is worth investigating.

Chapter 4

The Impact of Topology on Byzantine Containment in Stabilization

4.1 Introduction

The advent of ubiquitous large-scale distributed systems advocates that tolerance to various kinds of faults and hazards must be included from the very early design of such systems. Byzantine fault-tolerance [43, 49] is traditionally used to mask the effect of a limited number of malicious faults. Making distributed systems tolerant to both transient and malicious faults is appealing yet proved difficult [26, 15, 48], as impossibility results are expected in many cases. Two main paths have been followed to study the impact of Byzantine faults in the context of self-stabilization:

- Byzantine fault masking (every correct process eventually satisfies its specification). In completely connected synchronous systems, one of the most studied problems in the context of self-stabilization with Byzantine faults is that of clock synchronization. In [6, 26], probabilistic self-stabilizing protocols were proposed for up to one third of Byzantine processes, while in [21, 42] deterministic solutions tolerate up to one fourth and one third of Byzantine processes, respectively.

- Byzantine containment. For local tasks (i.e. tasks whose correctness can be checked locally, such as vertex coloring, link coloring, or dining philosophers), the notion of strict stabilization was proposed [48, 52, 45, 31]. Strict stabilization guarantees that there exists a containment radius outside which the effect of permanent faults is masked, provided that the problem specification makes it possible to break the causality chain that is caused by the faults. As many problems are not local, it turns out that it is impossible to provide strict stabilization for them. Note that a strictly stabilizing algorithm with a radius of 0 running on a completely connected system provides a masking approach.

In this chapter, we investigate the possibility of Byzantine containment in a self-stabilizing setting for tasks that are global (i.e. for which there exists a causality chain of size r, where r depends on n, the size of the network), and focus on a global problem, namely maximum metric tree construction (see [40, 36]).
As strict stabilization is impossible for such global tasks, we weaken the containment constraint by relaxing the notion of containment radius to that of containment area; that is, Byzantine processes may disturb infinitely often a set of processes which depends on the topology of the system and on the location of the Byzantine processes.

The main contribution of this chapter is to present new possibility results for containing the influence of unbounded Byzantine behaviors. In more detail, we define the notion of topology-aware strict stabilization as a novel form of containment, and we introduce the containment area to quantify the quality of the containment. The notion of topology-aware strict stabilization is weaker than strict stabilization but stronger than the classical notion of self-stabilization (i.e. every topology-aware strictly stabilizing protocol is self-stabilizing, but not necessarily strictly stabilizing). To demonstrate the possibility and effectiveness of our notion of topology-aware strict stabilization, we consider maximum metric tree construction. It is shown in [48] that there exists no strictly stabilizing protocol with a constant containment radius for this problem. In this chapter, we provide a topology-aware strictly stabilizing protocol for maximum metric tree construction and we prove that the containment area of this protocol is optimal.

4.2 Distributed System

A distributed system S = (V, E) consists of a set V = {v1, v2, . . . , vn} of processes and a set E of bidirectional communication links (simply called links). A link is an unordered pair of distinct processes. A distributed system S can be regarded as a graph whose vertex set is V and whose link set is E, so we use graph terminology to describe a distributed system S. Processes u and v are called neighbors if (u, v) ∈ E. The set of neighbors of a process v is denoted by N_v, and its cardinality (the degree of v) is denoted by ∆_v (= |N_v|). The degree ∆ of a distributed system S = (V, E) is defined as ∆ = max{∆_v | v ∈ V}. We do not assume the existence of a unique identifier for each process. Instead, we assume each process can distinguish its neighbors from each other by locally arranging them in some arbitrary order: the k-th neighbor of a process v is denoted by N_v(k) (1 ≤ k ≤ ∆_v). The distance between two processes u and v is the length of a shortest path between u and v. In this chapter, we consider distributed systems of arbitrary topology. We assume that a single process is distinguished as a root, and all the other processes are indistinguishable.

We adopt the shared state model as the communication model in this chapter, where each process can directly read the states of its neighbors. The state of a process is defined by the variables it maintains. A process may take actions during the execution of the system. An action is simply a function that is executed in an atomic manner by the process. The actions executed by each process are described by a finite set of guarded actions of the form ⟨guard⟩ −→ ⟨statement⟩. Each guard of a process u is a boolean expression involving the variables of u and its neighbors.

A global state of a distributed system is called a configuration and is specified by the product of the states of all processes. We define C to be the set of all possible configurations of a distributed system S.
For a process set R ⊆ V and two configurations ρ and ρ′, we write ρ →_R ρ′ when ρ changes to ρ′ by executing an action of each process in R simultaneously. Notice that ρ and ρ′ can differ only in the states of the processes in R. For completeness of the execution semantics, we should clarify the configuration resulting from simultaneous actions of neighboring processes: the action of a process depends only on its state at ρ and the states of its neighbors at ρ, and the result of the action is reflected in the state of the process at ρ′.

A schedule of a distributed system is an infinite sequence of process sets. Let Q = R1, R2, . . . be a schedule, where R_i ⊆ V holds for each i (i ≥ 1). An infinite sequence of configurations e = ρ0, ρ1, . . . is called an execution from an initial configuration ρ0 by a schedule Q if e satisfies ρ_{i−1} →_{R_i} ρ_i for each i (i ≥ 1). Process actions are executed atomically, and we also assume that a distributed daemon schedules the actions of processes, i.e. any subset of processes can simultaneously execute their actions. A more constrained daemon is the central one, which must choose exactly one enabled process at each step. Note that, as the central daemon allows only executions that are also allowed under the distributed daemon, an impossibility result under the central daemon is stronger than one under the distributed one. In the same way, a possibility result under the distributed daemon is stronger than one under the central one.

The set of all possible executions from ρ0 ∈ C is denoted by E_{ρ0}. The set of all possible executions is denoted by E, that is, E = ⋃_{ρ∈C} E_ρ. We consider asynchronous distributed systems where we make no assumption on schedules, except that any schedule is weakly fair: every process is contained in an infinite number of the subsets appearing in any schedule.

In this chapter, we consider (permanent) Byzantine faults: a Byzantine process (i.e. a Byzantine-faulty process) can exhibit arbitrary behavior, independently of its protocol. In other words, a Byzantine process always has an enabled rule, and the daemon arbitrarily chooses a new state for this process when the process is activated. If v is a Byzantine process, v can repeatedly change its variables arbitrarily. The only restriction we place on Byzantine processes is that the root process can never be Byzantine.

4.3 Self-Stabilizing Protocol Resilient to Byzantine Faults

The problems considered in this chapter are so-called static problems, i.e. they require the system to find static solutions. For example, the spanning-tree construction problem is a static problem, while the mutual exclusion problem is not. Some static problems can be defined by a local specification predicate (specification for short), spec(v), for each process v: a configuration is a desired one (with a solution) if every process v ∈ V satisfies spec(v) in this configuration. A specification spec(v) is a boolean expression on variables of V_v (⊆ V), where V_v is the set of processes whose variables appear in spec(v). The variables appearing in the specification are called output variables (O-variables for short). In what follows, we consider a static problem defined by a local specification predicate.

Self-Stabilization. A self-stabilizing protocol ([20]) is a protocol that eventually reaches a legitimate configuration, where spec(v) holds at every process v, regardless of the initial configuration.
Once it has reached a legitimate configuration, every process never changes its O-variables and always satisfies spec(v). From this definition, a self-stabilizing protocol is expected to tolerate any number and any type of transient faults, since it can eventually recover from any configuration affected by the transient faults. However, the recovery from any configuration is guaranteed only when every process correctly executes its actions from that configuration; i.e., we do not consider the existence of permanently faulty processes.

Strict stabilization. When (permanent) Byzantine processes exist, the Byzantine processes may not satisfy spec(v). In addition, correct processes near the Byzantine processes can be influenced and may be unable to satisfy spec(v). Nesterenko and Arora [48] define a strictly stabilizing protocol as a self-stabilizing protocol resilient to an unbounded number of Byzantine processes. Given an integer c, a c-correct process is defined as follows.

Definition 9 (c-correct process) A process is c-correct if it is correct (i.e. not Byzantine) and located at distance more than c from any Byzantine process.

Definition 10 ((c, f)-containment) A configuration ρ is (c, f)-contained for specification spec if, given at most f Byzantine processes, in any execution starting from ρ, every c-correct process v always satisfies spec(v) and never changes its O-variables.

The parameter c of Definition 10 refers to the containment radius defined in [48]. The parameter f refers explicitly to the number of Byzantine processes, while [48] dealt with an unbounded number of Byzantine faults (that is, f ∈ {0 . . . n}).

Definition 11 ((c, f)-strict stabilization) A protocol is (c, f)-strictly stabilizing for specification spec if, given at most f Byzantine processes, any execution e = ρ0, ρ1, . . . contains a configuration ρ_i that is (c, f)-contained for spec.

An important limitation of the model of [48] is the notion of r-restrictive specifications. Intuitively, a specification is r-restrictive if it prevents combinations of states that belong to two processes u and v that are at least r hops away. An important consequence related to Byzantine tolerance is that the containment radius of protocols solving those specifications is at least r. For any global problem, such as the spanning tree construction we consider in this chapter, r cannot be bounded by a constant. The results of [48] show that there exists no (o(n), 1)-strictly stabilizing protocol for such problems, and in particular for spanning tree construction.

Topology-aware strict stabilization. In the previous paragraph, we saw that there exists a number of impossibility results on strict stabilization due to the notion of r-restrictive specifications. To circumvent these impossibility results, we define here a new notion, which is weaker than strict stabilization: topology-aware strict stabilization (denoted by TA-strict stabilization for short). Here, the requirement on the containment radius is relaxed: the set of processes which may be disturbed by Byzantine ones is not reduced to the union of the c-neighborhoods of the Byzantine processes, but can be defined depending on the topology of the system and on the location of the Byzantine processes. In the following, we give a formal definition of this new kind of Byzantine containment.
From now on, B denotes the set of Byzantine processes and S_B (which is a function of B) denotes a subset of V (intuitively, this set gathers all the processes which may be disturbed by the Byzantine processes).

Definition 12 (S_B-correct node) A node is S_B-correct if it is a correct node (i.e. not Byzantine) which does not belong to S_B.

Definition 13 (S_B-legitimate configuration) A configuration ρ is S_B-legitimate for spec if every S_B-correct node v is legitimate for spec (i.e. if spec(v) holds).

Definition 14 ((S_B, f)-topology-aware containment) A configuration ρ0 is (S_B, f)-topology-aware contained for specification spec if, given at most f Byzantine processes, in any execution e = ρ0, ρ1, . . ., every configuration is S_B-legitimate and every S_B-correct process never changes its O-variables.

The parameter S_B of Definition 14 refers to the containment area. Any process which belongs to this set may be infinitely often disturbed by Byzantine processes. The parameter f refers explicitly to the number of Byzantine processes.

Definition 15 ((S_B, f)-topology-aware strict stabilization) A protocol is (S_B, f)-topology-aware strictly stabilizing for specification spec if, given at most f Byzantine processes, any execution e = ρ0, ρ1, . . . contains a configuration ρ_i that is (S_B, f)-topology-aware contained for spec.

Note that, if B denotes the set of Byzantine processes and S_B = {v ∈ V | min{d(v, b), b ∈ B} ≤ c}, then a (S_B, f)-topology-aware strictly stabilizing protocol is a (c, f)-strictly stabilizing protocol. A TA-strictly stabilizing protocol is thus generally weaker than a strictly stabilizing one, but stronger than a classical self-stabilizing protocol (which may never meet its specification in the presence of Byzantine processes). The parameter S_B is introduced to quantify the strength of fault containment; we do not require each process to know the actual definition of this set. Indeed, the protocol proposed in this chapter assumes no knowledge of this parameter.

4.4 Maximum Metric Tree Construction

In this work, we deal with maximum (routing) metric spanning trees as defined in [36] (note that [40] provides a self-stabilizing solution to this problem). Informally, the goal of a routing protocol is to construct a tree that simultaneously maximizes the metric values of all of the nodes with respect to some total ordering ≺. In [36], the authors give a general definition of a routing metric and provide a characterization of maximizable metrics, that is, metrics which always allow the construction of a maximum (routing) metric spanning tree. In the following, we recall all the definitions and notations introduced in [36].

Definition 16 (Routing metric) A routing metric is a five-tuple (M, W, met, mr, ≺) where:
- M is a set of metric values,
- W is a set of edge weights,
- met is a metric function whose domain is M × W and whose range is M,
- mr is the maximum metric value in M with respect to ≺ and is assigned to the root of the system,
- ≺ is a less-than total order relation over M that satisfies the following three conditions for arbitrary metric values m, m′, and m′′ in M:
  - irreflexivity: m ⊀ m,
  - transitivity: if m ≺ m′ and m′ ≺ m′′ then m ≺ m′′,
  - totality: m ≺ m′ or m′ ≺ m or m = m′.
Any metric value m ∈ M \ {mr} satisfies the utility condition (that is, there exist w0, . . . , w_{k−1} in W and m0 = mr, m1, . . . , m_{k−1}, m_k = m in M such that ∀i ∈ {1, . . . , k}, m_i = met(m_{i−1}, w_{i−1})).
For instance, this model captures the three following classical metrics: the shortest path metric (SP), the flow metric (F), and the reliability metric (R):

SP = (M1, W1, met1, mr1, ≺1), where M1 = ℕ, W1 = ℕ, met1(m, w) = m + w, mr1 = 0, and ≺1 is the classical > relation.

F = (M2, W2, met2, mr2, ≺2), where mr2 ∈ ℕ, M2 = {0, . . . , mr2}, W2 = {0, . . . , mr2}, met2(m, w) = min{m, w}, and ≺2 is the classical < relation.

R = (M3, W3, met3, mr3, ≺3), where M3 = [0, 1], W3 = [0, 1], met3(m, w) = m ∗ w, mr3 = 1, and ≺3 is the classical < relation.

Definition 17 (Assigned metric) An assigned metric over a system S is a six-tuple (M, W, met, mr, ≺, wf) where (M, W, met, mr, ≺) is a metric and wf is a function that assigns to each edge of S a weight in W.

Let a rooted path (from v) be a simple path from a process v to the root r. The next set of definitions is given with respect to an assigned metric (M, W, met, mr, ≺, wf) over a given system S.

Definition 18 (Metric of a rooted path) The metric of a rooted path in S is the prefix sum of met over the edge weights in the path and mr. For example, if a rooted path p in S is v_k, . . . , v_0 with v_0 = r, then the metric of p is m_k = met(m_{k−1}, wf({v_k, v_{k−1}})), with ∀i ∈ {1, . . . , k−1}, m_i = met(m_{i−1}, wf({v_i, v_{i−1}})) and m_0 = mr.

Definition 19 (Maximum metric path) A rooted path p from v in S is called a maximum metric path with respect to an assigned metric if and only if for every other rooted path q from v in S, the metric of p is greater than or equal to the metric of q with respect to the total order ≺.

Definition 20 (Maximum metric of a node) The maximum metric of a node v ≠ r (or simply the metric value of v) in S is defined by the metric of a maximum metric path from v. The maximum metric of r is mr.

Definition 21 (Maximum metric tree) A spanning tree T of S is a maximum metric tree with respect to an assigned metric over S if and only if every rooted path in T is a maximum metric path in S with respect to the assigned metric.

The goal of the work of [36] is the study of metrics that always allow the construction of a maximum metric tree. More formally, the definition follows.

Definition 22 (Maximizable metric) A metric is maximizable if and only if for any assignment of this metric over any system S, there is a maximum metric tree for S with respect to the assigned metric.

Note that [40] provides a self-stabilizing protocol to construct a maximum metric tree with respect to any maximizable metric. Moreover, [36] provides a full characterization of maximizable metrics, as follows.

Definition 23 (Boundedness) A metric (M, W, met, mr, ≺) is bounded if and only if: ∀m ∈ M, ∀w ∈ W, met(m, w) ≺ m or met(m, w) = m.

Definition 24 (Monotonicity) A metric (M, W, met, mr, ≺) is monotonic if and only if: ∀(m, m′) ∈ M², ∀w ∈ W, m ≺ m′ ⇒ (met(m, w) ≺ met(m′, w) or met(m, w) = met(m′, w)).

Theorem 9 (Characterization of maximizable metrics [36]) A metric is maximizable if and only if this metric is bounded and monotonic.

Given a maximizable metric M = (M, W, mr, met, ≺), the aim of this work is to construct a maximum metric tree with respect to M which spans the system in a self-stabilizing way, in a system subject to permanent Byzantine failures. Obviously, these Byzantine processes may disturb some correct processes. This is why we relax the problem in the following way: we want to construct a maximum metric forest with respect to M. The root of any tree of this forest must be either the real root or a Byzantine process.
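The maximum metric values of Definition 20 can be obtained by a Bellman-Ford-style relaxation; the following Python sketch is illustrative only (the function names and the encoding of edges are our assumptions), instantiated here on the flow metric F, whose met is min and for which larger values are better.

  def max_metric_values(nodes, edges, root, met, better, mr):
      """Maximum metric of every node w.r.t. a root (Definition 20), by iterated
      relaxation; boundedness and monotonicity of the metric ensure convergence.
      `edges` maps (u, v) to the weight wf({u, v}); `better(a, b)` means b ≺ a."""
      mu = {v: None for v in nodes}
      mu[root] = mr
      for _ in range(len(nodes)):                  # at most n-1 useful sweeps
          for (u, v), w in edges.items():
              if mu[u] is not None:
                  cand = met(mu[u], w)
                  if mu[v] is None or better(cand, mu[v]):
                      mu[v] = cand
      return mu

  # Flow metric F: met(m, w) = min(m, w), mr = the root's capacity.
  edges = {('r', 'u'): 3, ('u', 'v'): 1, ('r', 'v'): 2,
           ('u', 'r'): 3, ('v', 'u'): 1, ('v', 'r'): 2}   # bidirectional links
  mu = max_metric_values(['r', 'u', 'v'], edges, 'r',
                         met=min, better=lambda a, b: a > b, mr=5)
  print(mu)  # {'r': 5, 'u': 3, 'v': 2}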
Each process v has three O-variables: a pointer to its parent in its tree (prnt_v ∈ N_v ∪ {⊥}), a level which stores its current metric value (level_v ∈ M), and a variable which stores its distance to the root of its tree (dist_v ∈ {0, . . . , D}). Obviously, Byzantine processes may disturb (at least) their neighbors. We use the following specification of the problem, introducing new notations as follows. Given an assigned metric (M, W, met, mr, ≺, wf) over the system S and two processes u and v, we denote by µ(u, v) the maximum metric of node u when v plays the role of the root of the system. If u and v are two neighbor processes, we denote by w_{u,v} the weight of the edge {u, v} (that is, the value of wf({u, v})).

Definition 25 (M-path) Given an assigned metric M = (M, W, met, mr, ≺, wf) over a system S, an M-path is a path (v_0, . . . , v_k) (k ≥ 1) such that:
(i) prnt_{v_0} = ⊥, level_{v_0} = mr, dist_{v_0} = 0, and v_0 ∈ B ∪ {r},
(ii) ∀i ∈ {1, . . . , k}, prnt_{v_i} = v_{i−1}, level_{v_i} = met(level_{v_{i−1}}, w_{v_i,v_{i−1}}), and dist_{v_i} = i,
(iii) ∀i ∈ {1, . . . , k}, met(level_{v_{i−1}}, w_{v_i,v_{i−1}}) = max_≺{met(level_u, w_{v_i,u}), u ∈ N_{v_i}}, and
(iv) level_{v_k} = µ(v_k, v_0).

We define the specification predicate spec(v) of the maximum metric tree construction with respect to a maximizable metric M as follows:

spec(v): prnt_v = ⊥, level_v = mr, and dist_v = 0 if v is the root r; there exists an M-path (v_0, . . . , v_k) such that v_k = v otherwise.

Figure 4.1: Examples of containment areas for flow spanning tree construction.

Figure 4.2: Examples of containment areas for reliability spanning tree construction.

Following the discussion of Section 4.3 and the results from [48], it is clear that there exists no strictly stabilizing protocol for this problem. That is why we consider the weaker notion of topology-aware strict stabilization. First, we show an impossibility result in order to define the best possible containment area. Then, we provide a maximum metric tree construction protocol which is (S_B, f)-TA-strictly stabilizing for f ≤ n − 1 and which matches this optimal containment area. From now on, S_B denotes this optimal containment area, i.e.:

S_B = {v ∈ V \ B | µ(v, r) ⪯ max_≺{µ(v, b), b ∈ B}} \ {r}

(where ⪯ denotes "≺ or ="). Intuitively, Byzantine faults may disturb only the processes that are (not strictly) closer to a Byzantine process than to the root with respect to the metric. Figures 4.1 and 4.2 provide some examples of containment areas with respect to two maximizable metrics. We introduce here a new definition that is used in the following.

Definition 26 (Fixed point) A metric value m is a fixed point of a metric M = (M, W, met, mr, ≺) if m ∈ M and if, for every value w ∈ W, we have met(m, w) = m.

4.4.1 Impossibility Result

In this section, we show that there exist some constraints on the containment area of any topology-aware strictly stabilizing protocol for the maximum metric tree construction, depending on the metric.

Theorem 10 Given a maximizable metric M = (M, W, met, mr, ≺), even under the central daemon, there exists no (A_B, 1)-TA-strictly stabilizing protocol for maximum metric spanning tree construction with respect to M where A_B ⊊ S_B.

Proof 18 Let M = (M, W, met, mr, ≺) be a maximizable metric and P be a (A_B, 1)-TA-strictly stabilizing protocol for maximum metric spanning tree construction with respect to M where A_B ⊊ S_B. We must distinguish the following cases:

Case 1: |M| = 1. Denote by m the metric value such that M = {m}.
For any system and for any process v ≠ r, we have µ(v, r) = min_≺{µ(v, b), b ∈ B} = m. Consequently, S_B = V \ (B ∪ {r}) for any system. Consider the following system: V = {r, u, v, b} and E = {{r, u}, {u, v}, {v, b}} (b is a Byzantine process). As S_B = {u, v} and A_B ⊊ S_B, we have u ∉ A_B or v ∉ A_B. Consider now the following configuration ρ⁰₀: prnt_r = prnt_b = ⊥, prnt_v = b, prnt_u = v, level_r = level_u = level_v = level_b = m, dist_r = dist_b = 0, dist_v = 1 and dist_u = 2 (other variables may have arbitrary values). Note that ρ⁰₀ is A_B-legitimate for spec (whatever A_B is). Assume now that b behaves as a correct process with respect to P. Then, by convergence of P in a fault-free system starting from ρ⁰₀, which is not legitimate (remember that a strictly stabilizing protocol is a special case of a self-stabilizing protocol), we can deduce that the system reaches in finite time a configuration ρ⁰₁ in which: prnt_r = ⊥, prnt_u = r, prnt_v = u, prnt_b = v, level_r = level_u = level_v = level_b = m, dist_r = 0, dist_u = 1, dist_v = 2 and dist_b = 3. Note that processes u and v modify their O-variables in this execution. This contradicts the (A_B, 1)-TA-strict stabilization of P (whatever A_B is).

Case 2: |M| ≥ 2. By definition of a bounded metric, we can deduce that there exist m ∈ M and w ∈ W such that m = met(mr, w) ≺ mr. Then, we must distinguish the following cases:

Case 2.1: m is a fixed point of M. Consider the following system: V = {r, u, v, b}, E = {{r, u}, {u, v}, {v, b}}, w_{r,u} = w_{v,b} = w, and w_{u,v} = w′ (b is a Byzantine process). As met(m, w′) = m for any w′ ∈ W (by definition of a fixed point), we have S_B = {u, v}. Since A_B ⊊ S_B, we have u ∉ A_B or v ∉ A_B. Consider now the following configuration ρ¹₀: prnt_r = prnt_b = ⊥, prnt_v = b, prnt_u = v, level_r = level_b = mr, level_u = level_v = m, dist_r = dist_b = 0, dist_v = 1 and dist_u = 2 (other variables may have arbitrary values). Note that ρ¹₀ is A_B-legitimate for spec (whatever A_B is). Assume now that b behaves as a correct process with respect to P. Then, by convergence of P in a fault-free system starting from ρ¹₀, which is not legitimate (remember that a strictly stabilizing protocol is a special case of a self-stabilizing protocol), we can deduce that the system reaches in finite time a configuration ρ¹₁ in which: prnt_r = ⊥, prnt_u = r, prnt_v = u, prnt_b = v, level_r = mr, level_u = level_v = level_b = m (since m is a fixed point), dist_r = 0, dist_u = 1, dist_v = 2 and dist_b = 3. Note that processes u and v modify their O-variables in this execution. This contradicts the (A_B, 1)-TA-strict stabilization of P (whatever A_B is).

Case 2.2: m is not a fixed point of M. This implies that there exists w′ ∈ W such that met(m, w′) ≺ m (remember that M is bounded). Consider the following system: V = {r, u, v, v′, b}, E = {{r, u}, {u, v}, {u, v′}, {v, b}, {v′, b}}, w_{r,u} = w_{v,b} = w_{v′,b} = w, and w_{u,v} = w_{u,v′} = w′ (b is a Byzantine process). We can see that S_B = {v, v′}. Since A_B ⊊ S_B, we have v ∉ A_B or v′ ∉ A_B. Consider now the following configuration ρ²₀: prnt_r = prnt_b = ⊥, prnt_v = prnt_{v′} = b, prnt_u = r, level_r = level_b = mr, level_u = level_v = level_{v′} = m, dist_r = dist_b = 0, dist_v = dist_{v′} = 1 and dist_u = 1 (other variables may have arbitrary values). Note that ρ²₀ is A_B-legitimate for spec (whatever A_B is). Assume now that b behaves as a correct process with respect to P.
Then, by convergence of P in a fault-free system starting from ρ²₀, which is not legitimate (remember that a strictly stabilizing protocol is a special case of a self-stabilizing protocol), we can deduce that the system reaches in finite time a configuration ρ²₁ in which: prnt_r = ⊥, prnt_u = r, prnt_v = prnt_{v′} = u, prnt_b = v (or prnt_b = v′), level_r = mr, level_u = m, level_v = level_{v′} = met(m, w′) = m′, level_b = met(m′, w) = m″, dist_r = 0, dist_u = 1, dist_v = dist_{v′} = 2 and dist_b = 3. Note that processes v and v′ modify their O-variables in this execution. This contradicts the (A_B, 1)-TA-strict stabilization of P (whatever A_B is).

4.4.2 Topology-Aware Strict Stabilizing Protocol

In this section, we provide our self-stabilizing protocol that achieves an optimal containment area under permanent Byzantine failures when constructing a maximum metric tree for any maximizable metric M = (M, W, met, mr, ≺). More formally, our protocol is (S_B, f)-TA-strictly stabilizing, which is optimal with respect to the result of Theorem 10. Our protocol is borrowed from the one of [40] (which is self-stabilizing). The key idea of this protocol is to use the distance variable (upper bounded by a given constant D) to detect and break cycles of processes which have the same maximum metric. The main modifications we bring to this protocol are the following. In the initial protocol, when a process modifies its parent, it chooses arbitrarily one of the "better" neighbors (with respect to the metric). To achieve (S_B, f)-TA-strict stabilization, we must ensure a fair selection among the set of neighbors; we achieve this fairness with a round-robin order over the set of neighbors. The second modification is to give priority to rules (R2) and (R3) over (R1) for any correct non-root process (that is, such a process which has (R1) and another rule enabled in a given configuration always executes the other rule). Our solution is presented as Algorithm 3. In the following, we provide a sketch¹ of the proof of the TA-strict stabilization of SSMAX.

¹Due to space constraints, formal proofs are omitted. A full version of this work is available in the companion technical report (see [29]).

Algorithm 3 SSMAX: A TA-strictly stabilizing protocol for maximum metric tree construction.

Data:
N_v: totally ordered set of neighbors of v.
D: upper bound on the number of processes in a simple path.

Variables:
prnt_v: pointer to the parent of v in the tree; prnt_v = ⊥ if v = r, prnt_v ∈ N_v if v ≠ r.
level_v ∈ {m ∈ M | m ⪯ mr}: metric value of the node.
dist_v ∈ {0, . . . , D}: distance to the root.

Macro:
For any subset A ⊆ N_v, choose(A) returns the first element of A which is bigger than prnt_v (in a round-robin fashion).

Rules:
(Rr) :: (v = r) ∧ ((level_v ≠ mr) ∨ (dist_v ≠ 0))
−→ level_v := mr; dist_v := 0

(R1) :: (v ≠ r) ∧ (prnt_v ∈ N_v) ∧ ((dist_v ≠ min(dist_{prnt_v} + 1, D)) ∨ (level_v ≠ met(level_{prnt_v}, w_{v,prnt_v})))
−→ dist_v := min(dist_{prnt_v} + 1, D); level_v := met(level_{prnt_v}, w_{v,prnt_v})

(R2) :: (v ≠ r) ∧ (dist_v = D) ∧ (∃u ∈ N_v, dist_u < D − 1)
−→ prnt_v := choose({u ∈ N_v | dist_u < D − 1}); dist_v := dist_{prnt_v} + 1; level_v := met(level_{prnt_v}, w_{v,prnt_v})

(R3) :: (v ≠ r) ∧ (∃u ∈ N_v, (dist_u < D − 1) ∧ (level_v ≺ met(level_u, w_{u,v})))
−→ prnt_v := choose({u ∈ N_v | (dist_u < D − 1) ∧ (met(level_u, w_{u,v}) = max_≺{met(level_q, w_{q,v}), q ∈ N_v, dist_q < D − 1})}); level_v := met(level_{prnt_v}, w_{prnt_v,v}); dist_v := dist_{prnt_v} + 1

Remember that the real root r cannot be a Byzantine process by hypothesis.
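Before sketching the proof, the following Python fragment illustrates how the four rules of Algorithm 3 can be read operationally. It is a simulation sketch under stated assumptions, not the protocol implementation of [40] or [29]: it instantiates SSMAX with the shortest-path metric (mr = 0, met(m, w) = m + w, ≺ being >), and the names Proc, choose, and try_step are illustrative; in particular, choose approximates the round-robin macro with a cursor over the ordered neighbor list.

```python
# Illustrative sketch of the four rules of SSMAX for the shortest-path
# metric. Assumptions: mr = 0, met(m, w) = m + w, and a ≺ b iff a > b.

def met(m, w): return m + w        # SP composition function
def prec(a, b): return a > b       # a ≺ b: a larger SP value is worse

def max_prec(vals):
    """≺-maximum of a non-empty list (the numeric minimum for SP)."""
    m = vals[0]
    for x in vals[1:]:
        if prec(m, x):             # m ≺ x, so x is better: keep x
            m = x
    return m

class Proc:
    def __init__(self, pid, is_root):
        self.pid, self.is_root = pid, is_root
        self.prnt, self.level, self.dist = None, 0, 0  # arbitrary start
        self.rr = 0                # round-robin cursor over neighbors

def choose(v, nbrs, candidates):
    """Round-robin macro: first candidate after the previous choice."""
    n = len(nbrs)
    for k in range(1, n + 1):
        u = nbrs[(v.rr + k) % n]
        if u in candidates:
            v.rr = (v.rr + k) % n
            return u

def try_step(v, nbrs, w, D):
    """Execute the highest-priority enabled rule of v, giving (R2) and
    (R3) priority over (R1); w maps frozenset edges to weights."""
    if v.is_root:                                          # rule (Rr)
        if v.level != 0 or v.dist != 0:
            v.level, v.dist = 0, 0                         # mr = 0 for SP
            return True
        return False
    def adopt(u):                  # attach to u and refresh level/dist
        v.prnt = u
        v.level = met(u.level, w[frozenset((u.pid, v.pid))])
        v.dist = min(u.dist + 1, D)
    low = [u for u in nbrs if u.dist < D - 1]
    offers = {u: met(u.level, w[frozenset((u.pid, v.pid))]) for u in low}
    if v.dist == D and low:                                # rule (R2)
        adopt(choose(v, nbrs, set(low)))
        return True
    if any(prec(v.level, m) for m in offers.values()):     # rule (R3)
        b = max_prec(list(offers.values()))
        adopt(choose(v, nbrs, {u for u in low if offers[u] == b}))
        return True
    if v.prnt in nbrs:                                     # rule (R1)
        want_dist = min(v.prnt.dist + 1, D)
        want_level = met(v.prnt.level, w[frozenset((v.prnt.pid, v.pid))])
        if (v.dist, v.level) != (want_dist, want_level):
            v.dist, v.level = want_dist, want_level
            return True
    return False
```

In a fault-free run, a driver modeling the daemon would repeatedly call try_step on some enabled process until none remains enabled; the proof sketch below explains why, even with Byzantine processes, the correct processes outside S_B eventually stop being disturbed.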
Note that, by boundedness of the metric, the subsystem whose set of nodes is (V \ S_B) \ B is connected. Given ρ ∈ C and m ∈ M, let us define the following predicate:

IM_m(ρ) ≡ ∀v ∈ V, level_v ⪯ max_≺{m, max_≺{µ(v, u), u ∈ B ∪ {r}}}

If we take a configuration ρ ∈ C such that IM_m(ρ) holds for a given m ∈ M, then we can prove that the boundedness of M implies that, for any step ρ ↦ ρ′ of SSMAX, IM_m(ρ′) holds. Hence, we can deduce that:

Lemma 14 For any metric value m ∈ M, the predicate IM_m is closed by actions of SSMAX.

Given an assigned metric over a system S, observe that the set of metric values M is finite and that we can label the elements of M as m_0 = mr, m_1, . . . , m_k such that ∀i ∈ {0, . . . , k − 1}, m_{i+1} ≺ m_i. We introduce the following notations:

∀m_i ∈ M, P_{m_i} = {v ∈ (V \ S_B) \ B | µ(v, r) = m_i}
∀m_i ∈ M, V_{m_i} = ⋃_{j=0}^{i} P_{m_j}
∀m_i ∈ M, I_{m_i} = {v ∈ V | max_≺{µ(v, u), u ∈ B ∪ {r}} ≺ m_i}
∀m_i ∈ M, LC_{m_i} = {ρ ∈ C | (∀v ∈ V_{m_i}, spec(v)) ∧ IM_{m_i}(ρ)}
LC = LC_{m_k}

If we consider a configuration ρ ∈ LC_{m_i} for a given metric value m_i and a process v ∈ V_{m_i}, then we can show from the closure of IM_{m_i} (established in Lemma 14), the boundedness of M, and the construction of the protocol that v is not enabled in ρ. Then, the closure of IM_{m_i} is sufficient to conclude that:

Lemma 15 For any m_i ∈ M, the set LC_{m_i} is closed by actions of SSMAX.

Lemma 15 applied to LC = LC_{m_k} gives us the following result:

Lemma 16 Any configuration of LC is (S_B, n − 1)-TA contained for spec.

This lemma establishes the closure of SSMAX. To prove the TA-strict stabilization of SSMAX, it remains to prove its convergence. To this end, we prove that any execution starting from an arbitrary configuration of C converges to LC_{m_0} = LC_{mr}, then to LC_{m_1}, and so on until LC_{m_k} = LC. Note that IM_{mr} is satisfied by any configuration of C, and that if no process of P_{mr} is enabled in a configuration, then this configuration belongs to LC_{mr}. Then, we can prove that any process of P_{mr} takes only a finite number of steps in any execution. This implies the following result:

Lemma 17 Starting from any configuration of C, any execution of SSMAX reaches in finite time a configuration of LC_{mr}.

Given a metric value m_i ∈ M and a configuration ρ_0 ∈ LC_{m_i}, assume that e = ρ_0, ρ_1, . . . is an execution of SSMAX starting from ρ_0. We then define the following variant function: for any configuration ρ_j of e, we denote by A_j the set of processes v of I_{m_i} such that level_v = m_i in ρ_j, and we define f(ρ_j) = min{dist_v, v ∈ A_j}. We can prove that there exists an integer k such that f(ρ_k) = D. This implies the following lemma:

Lemma 18 For any m_i ∈ M and for any configuration ρ ∈ LC_{m_i}, any execution of SSMAX starting from ρ reaches in finite time a configuration such that ∀v ∈ I_{m_i}, level_v = m_i ⇒ dist_v = D.

Given a metric value m_i ∈ M, consider a configuration ρ_0 ∈ LC_{m_i} such that ∀v ∈ I_{m_i}, level_v = m_i ⇒ dist_v = D. Assume that e = ρ_0, ρ_1, . . . is an execution of SSMAX starting from ρ_0. For any configuration ρ_j of e, we define the set E_{ρ_j} = {v ∈ I_{m_i} | level_v = m_i}. First, we prove that there exists an integer k such that, for any integer j ≥ k, we have E_{ρ_{j+1}} ⊆ E_{ρ_j}. In other words, there exists a point of the execution after which the set E cannot grow. Moreover, we prove that if a process v of E_{ρ_j} (j ≥ k) is activated during the step ρ_j ↦ ρ_{j+1}, then v ∉ E_{ρ_{j+1}}.
Finally, we observe that any process v ∈ I_{m_i} such that dist_v = D is activated in finite time. In conclusion, we obtain that there exists an integer j such that E_{ρ_j} = ∅. In other words, we have:

Lemma 19 For any m_i ∈ M and for any configuration ρ ∈ LC_{m_i} such that ∀v ∈ I_{m_i}, level_v = m_i ⇒ dist_v = D, any execution of SSMAX starting from ρ reaches in finite time a configuration such that ∀v ∈ I_{m_i}, level_v ≺ m_i.

A direct consequence of Lemmas 18 and 19 is the following:

Lemma 20 For any m_i ∈ M and for any configuration ρ ∈ LC_{m_i}, any execution of SSMAX starting from ρ reaches in finite time a configuration ρ′ such that IM_{m_{i+1}}(ρ′) holds.

Given a metric value m_i ∈ M, consider a configuration ρ ∈ LC_{m_i}. We know by Lemma 20 that any execution starting from ρ reaches in finite time a configuration ρ′ such that IM_{m_{i+1}}(ρ′) holds. Denote by e an execution starting from ρ′. Now, we can observe that, if no process of P_{m_{i+1}} is enabled in a configuration of e, then this configuration belongs to LC_{m_{i+1}}. Then, we can prove that any process of P_{m_{i+1}} takes only a finite number of steps in any execution starting from ρ′. This implies the following result:

Lemma 21 For any m_i ∈ M and for any configuration ρ ∈ LC_{m_i}, any execution of SSMAX starting from ρ reaches in finite time a configuration of LC_{m_{i+1}}.

Let ρ be an arbitrary configuration. We know by Lemma 17 that any execution starting from ρ reaches in finite time a configuration of LC_{mr} = LC_{m_0}. Then, we can apply the result of Lemma 21 at most k times to obtain that any execution starting from ρ reaches in finite time a configuration of LC_{m_k} = LC, which proves the following result:

Lemma 22 Starting from any configuration, any execution of SSMAX reaches a configuration of LC in finite time.

Lemmas 16 and 22 respectively imply the closure and the convergence of SSMAX. We can summarize our results with the following theorem.

Theorem 11 SSMAX is a (S_B, n − 1)-TA-strictly stabilizing protocol for spec.

4.5 Conclusion

We introduced a new notion of Byzantine containment in self-stabilization: topology-aware strict stabilization. This notion relaxes the constraint on the containment radius of strict stabilization to a containment area. In other words, the set of correct processes which may be infinitely often disturbed by Byzantine processes is a function of the topology of the system and of the actual location of the Byzantine processes. We illustrated the relevance of this notion by providing a topology-aware strictly stabilizing protocol for the maximum metric tree construction problem, which admits no strictly stabilizing solution. Moreover, our protocol achieves the optimal containment area with respect to topology-aware strict stabilization.

Our work raises some open questions. A number of problems admit no strictly stabilizing solution. Do any of them admit a topology-aware strictly stabilizing solution? Is it possible to give a necessary and/or sufficient condition for a problem to admit a topology-aware strictly stabilizing solution? What happens if we consider only bounded Byzantine behavior?

Bibliography

[1] Eytan Adar and Bernardo A. Huberman. Free riding on Gnutella. First Monday, 2000.

[2] Yehuda Afek and Anat Bremler. Self-stabilizing unidirectional network algorithms by power-supply. In SODA '97: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 111–120.
Society for Industrial and Applied Mathematics, 1997.

[3] Yehuda Afek and Anat Bremler-Barr. Self-stabilizing unidirectional network algorithms by power supply. Chicago J. Theor. Comput. Sci., 1998.

[4] Yehuda Afek and Shlomi Dolev. Local stabilizer. J. Parallel Distrib. Comput., 62(5):745–765, 2002.

[5] Joffroy Beauquier, Sylvie Delaët, Shlomi Dolev, and Sébastien Tixeuil. Transient fault detectors. Distributed Computing, 20(1):39–51, June 2007.

[6] Michael Ben-Or, Danny Dolev, and Ezra N. Hoch. Fast self-stabilizing byzantine tolerant digital clock synchronization. In Rida A. Bazzi and Boaz Patt-Shamir, editors, PODC, pages 385–394. ACM, 2008.

[7] Samuel Bernard, Stéphane Devismes, Katy Paroux, Maria Potop-Butucaru, and Sébastien Tixeuil. Probabilistic self-stabilizing vertex coloring in unidirectional anonymous networks. In Proceedings of ICDCN 2010, Lecture Notes in Computer Science, Kolkata, India, January 2010. Springer Berlin / Heidelberg.

[8] Samuel Bernard, Stéphane Devismes, Maria Gradinariu Potop-Butucaru, and Sébastien Tixeuil. Optimal deterministic self-stabilizing vertex coloring in unidirectional anonymous networks. In Proceedings of the IEEE International Conference on Parallel and Distributed Processing Systems (IPDPS 2009), Rome, Italy, May 2009. IEEE Press.

[9] George Bosilca, Camille Coti, Thomas Herault, Pierre Lemarinier, and Jack Dongarra. Constructing resilient communication infrastructure for runtime environments. In Proceedings of the Abstracts of ParCo'09, to appear, 2009.

[10] B. Bourgon, A. K. Datta, and V. Natarajan. A self-stabilizing ranking algorithm for tree structured networks. In Conference Proceedings of the 1995 IEEE Fourteenth Annual International Phoenix Conference on Computers and Communications, pages 23–28, March 1995.

[11] Darius Buntinas, George Bosilca, Richard L. Graham, Geoffroy Vallée, and Gregory R. Watson. A scalable tools communications infrastructure. In Annual International Symposium on High Performance Computing Systems and Applications, pages 33–39, 2008.

[12] Ralph Butler, William Gropp, and Ewing Lusk. A scalable process-management environment for parallel programs. In Euro PVM/MPI, pages 168–175. Springer-Verlag, 2000.

[13] R. H. Castain, T. S. Woodall, D. J. Daniel, J. M. Squyres, B. Barrett, and G. E. Fagg. The open run-time environment (OpenRTE): A transparent multicluster environment for high-performance computing. Future Gener. Comput. Syst., 24(2):153–157, 2008.

[14] Jorge Arturo Cobb and Mohamed G. Gouda. Stabilization of routing in directed networks. In Ajoy Kumar Datta and Ted Herman, editors, WSS, volume 2194 of Lecture Notes in Computer Science, pages 51–66. Springer, 2001.

[15] Ariel Daliot and Danny Dolev. Self-stabilization of byzantine protocols. In Ted Herman and Sébastien Tixeuil, editors, Self-Stabilizing Systems, volume 3764 of Lecture Notes in Computer Science, pages 48–67. Springer, 2005.

[16] Sajal K. Das, Ajoy Kumar Datta, and Sébastien Tixeuil. Self-stabilizing algorithms in DAG structured networks. Parallel Processing Letters, 9(4):563–574, December 1999.

[17] Ajoy Kumar Datta and Maria Gradinariu, editors. Stabilization, Safety, and Security of Distributed Systems, 8th International Symposium, SSS 2006, Dallas, TX, USA, November 17-19, 2006, Proceedings, volume 4280 of Lecture Notes in Computer Science. Springer, 2006.

[18] Sylvie Delaët, Bertrand Ducourthial, and Sébastien Tixeuil. Self-stabilization with r-operators revisited.
Journal of Aerospace Computing, Information, and Communication, 2006.

[19] Sylvie Delaët and Sébastien Tixeuil. Tolerating transient and intermittent failures. Journal of Parallel and Distributed Computing, 62(5):961–981, May 2002.

[20] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Commun. ACM, 17(11):643–644, 1974.

[21] Danny Dolev and Ezra N. Hoch. On self-stabilizing synchronous actions despite byzantine attacks. In Andrzej Pelc, editor, DISC, volume 4731 of Lecture Notes in Computer Science, pages 193–207. Springer, 2007.

[22] Shlomi Dolev and Ted Herman. Superstabilizing protocols for dynamic distributed systems. Chicago J. Theor. Comput. Sci., 1997.

[23] Shlomi Dolev, Amos Israeli, and Shlomo Moran. Self-stabilization of dynamic systems. In Proceedings of the MCC Workshop on Self-Stabilizing Systems, MCC Technical Report No. STP-379-89, 1989.

[24] Shlomi Dolev, Amos Israeli, and Shlomo Moran. Uniform dynamic self-stabilizing leader election. IEEE Transactions on Parallel and Distributed Systems, 8(4):424–440, 1997.

[25] Shlomi Dolev and Elad Schiller. Self-stabilizing group communication in directed networks. Acta Inf., 40(9):609–636, 2004.

[26] Shlomi Dolev and Jennifer L. Welch. Self-stabilizing clock synchronization in the presence of byzantine faults. J. ACM, 51(5):780–799, 2004.

[27] S. C. Douglas. Self-stabilized gradient algorithms for blind source separation with orthogonality constraints. IEEE Transactions on Neural Networks, 11(6):1490–1497, 2000.

[28] Swan Dubois, Toshimitsu Masuzawa, and Sébastien Tixeuil. The impact of topology on byzantine containment in stabilization. In Proceedings of DISC 2010, Lecture Notes in Computer Science, Boston, Massachusetts, USA, September 2010. Springer Berlin / Heidelberg.

[29] Swan Dubois, Toshimitsu Masuzawa, and Sébastien Tixeuil. The impact of topology on byzantine containment in stabilization. Research report inria-00481836 (http://hal.inria.fr/inria-00481836/en/), May 2010.

[30] Swan Dubois, Toshimitsu Masuzawa, and Sébastien Tixeuil. On byzantine containment properties of the min+1 protocol. In Proceedings of SSS 2010, Lecture Notes in Computer Science, New York, NY, USA, September 2010. Springer Berlin / Heidelberg.

[31] Swan Dubois, Maria Gradinariu Potop-Butucaru, Mikhail Nesterenko, and Sébastien Tixeuil. Self-stabilizing byzantine asynchronous unison. CoRR, abs/0912.0134, 2009.

[32] Bertrand Ducourthial and Sébastien Tixeuil. Self-stabilization with r-operators. Distributed Computing, 14(3):147–162, July 2001.

[33] Bertrand Ducourthial and Sébastien Tixeuil. Self-stabilization with path algebra. Theoretical Computer Science, 293(1):219–236, February 2003. Extended abstract in Sirocco 2000.

[34] Vijay K. Garg and Anurag Agarwal. Self-stabilizing spanning tree algorithm with a new design methodology. Technical Report TR-PDS-2004-001, University of Texas at Austin, PDS Laboratory Technical Reports, 2004.

[35] Felix C. Gärtner. A survey of self-stabilizing spanning-tree construction algorithms. Technical Report IC/2003/38, Ecole Polytechnique Fédérale de Lausanne, Technical Reports in Computer and Communication Sciences, 2003.

[36] Mohamed G. Gouda and Marco Schneider. Maximizable routing metrics. IEEE/ACM Trans. Netw., 11(4):663–675, 2003.

[37] Maria Gradinariu and Sébastien Tixeuil. Self-stabilizing vertex coloring of arbitrary graphs. In International Conference on Principles of Distributed Systems (OPODIS 2000), pages 55–70, Paris, France, December 2000.
[38] Maria Gradinariu and Sébastien Tixeuil. Conflict managers for self-stabilization without fairness assumption. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS 2007), page 46. IEEE, June 2007.

[39] Sandeep K. S. Gupta and Pradip K. Srimani. Self-stabilizing multicast protocols for ad hoc networks. J. Parallel Distrib. Comput., 63(1):87–96, 2003.

[40] Sandeep K. S. Gupta and Pradip K. Srimani. Mobility tolerant maintenance of multi-cast tree in mobile multi-hop radio networks. In Proceedings of the 1999 International Conference on Parallel Processing, pages 490–497, 1999.

[41] Thomas Herault, Pierre Lemarinier, Olivier Peres, Laurence Pilard, and Joffroy Beauquier. A model for large scale self-stabilization. In 21st IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2007.

[42] Ezra N. Hoch, Danny Dolev, and Ariel Daliot. Self-stabilizing byzantine digital clock synchronization. In Datta and Gradinariu [17], pages 350–362.

[43] Leslie Lamport, Robert E. Shostak, and Marshall C. Pease. The byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, 1982.

[44] Toshimitsu Masuzawa and Sébastien Tixeuil. Bounding the impact of unbounded attacks in stabilization. In Datta and Gradinariu [17], pages 440–453.

[45] Toshimitsu Masuzawa and Sébastien Tixeuil. Stabilizing link-coloration of arbitrary networks with unbounded byzantine faults. International Journal of Principles and Applications of Information Science and Technology (PAIST), 1(1):1–13, December 2007.

[46] Toshimitsu Masuzawa and Sébastien Tixeuil. Stabilizing locally maximizable tasks in unidirectional networks is hard. In Proceedings of ICDCS 2010. IEEE Press, June 2010.

[47] Nathalie Mitton, Bruno Séricola, Sébastien Tixeuil, Eric Fleury, and Isabelle Guérin-Lassous. Self-stabilization in self-organized multihop wireless networks. Ad Hoc and Sensor Wireless Networks, January 2010.

[48] Mikhail Nesterenko and Anish Arora. Tolerance to unbounded byzantine faults. In 21st Symposium on Reliable Distributed Systems, page 22. IEEE Computer Society, 2002.

[49] Mikhail Nesterenko and Sébastien Tixeuil. Discovering network topology in the presence of byzantine nodes. IEEE Trans. Parallel Distrib. Syst., October 2009.

[50] Olivier Peres. Construction de topologies autostabilisante dans les systèmes à grande échelle. PhD thesis, Univ. Paris Sud 11, 2008.

[51] Antony I. T. Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Rachid Guerraoui, editor, Middleware, volume 2218 of Lecture Notes in Computer Science, pages 329–350. Springer, 2001.

[52] Yusuke Sakurai, Fukuhito Ooshita, and Toshimitsu Masuzawa. A self-stabilizing link-coloring protocol resilient to byzantine faults in tree networks. In Principles of Distributed Systems, 8th International Conference, OPODIS 2004, volume 3544 of Lecture Notes in Computer Science, pages 283–298. Springer, 2005.

[53] Sébastien Tixeuil. Algorithms and Theory of Computation Handbook, Second Edition, chapter Self-stabilizing Algorithms, pages 26.1–26.45. Chapman & Hall/CRC Applied Algorithms and Data Structures. CRC Press, Taylor & Francis Group, November 2009.

[54] Masafumi Yamashita and Tsunehiko Kameda. Computing on anonymous networks: Part I: characterizing the solvable cases. IEEE Trans. Parallel Distrib. Syst., 7(1):69–89, 1996.