SAMPLE PATH OPTIMIZATION TECHNIQUES FOR DYNAMIC RESOURCE ALLOCATION IN DISCRETE EVENT SYSTEMS

A Dissertation Presented
by
CHRISTOS PANAYIOTOU

Submitted to the Graduate School of the University of Massachusetts, Amherst in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

May 1999

Department of Electrical and Computer Engineering

© Copyright by Christos Panayiotou 1999. All Rights Reserved.

Approved as to style and content by:
Christos G. Cassandras, Chair
Theodore E. Djaferis, Co-Chair
Wei-Bo Gong, Member
Agha Iqbal Ali, Member
Seshu Desu, Department Head, Department of Electrical and Computer Engineering

ABSTRACT

SAMPLE PATH OPTIMIZATION TECHNIQUES FOR DYNAMIC RESOURCE ALLOCATION IN DISCRETE EVENT SYSTEMS

SEPTEMBER 1999

CHRISTOS PANAYIOTOU
B.S.E.C.E., UNIVERSITY OF MASSACHUSETTS AMHERST
M.B.A., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Christos G. Cassandras

The main focus of this dissertation is the dynamic allocation of discrete resources in stochastic environments. For this purpose, we develop two algorithms that can be used to address such problems. The first is a descent algorithm: at every iteration it moves to an allocation with a lower cost, and it is suitable for problems with a separable convex structure. Furthermore, every iterate is a feasible allocation, which makes the algorithm appropriate for on-line use. The second is an incremental algorithm: it starts with zero resources and, at every step, allocates one additional resource. Both algorithms are proven to converge in deterministic as well as stochastic environments. Furthermore, because they are driven by ordinal comparisons, they are robust with respect to estimation noise and converge fast.
To complement the implementation of the derived optimization algorithms, we develop two techniques for predicting the system performance under several parameters while observing a single sample path under a single parameter. The first technique, Concurrent Estimation, can be applied directly to general DES; for the second, Finite Perturbation Analysis (FPA), we demonstrate a general procedure for deriving such algorithms from the system dynamics. Moreover, both procedures can be used for systems with general event lifetime distributions. The dissertation ends with applications of the derived resource allocation methodologies to three different problems. First, the incremental algorithm is used on a kanban-based manufacturing system to find the kanban allocation that optimizes a given objective function (e.g., throughput, mean delay). Next, a variation of the descent algorithm is used to solve the channel allocation problem in cellular telephone networks so as to minimize the number of lost calls. Finally, a combination of the FPA and kanban approaches is used to solve the ground-holding problem in air traffic control so as to minimize the congestion over busy airports.

Contents

ABSTRACT
LIST OF FIGURES

1 INTRODUCTION
  1.1. Classification of Resource Allocation Problems
  1.2. Dissertation Overview
  1.3. Contributions
  1.4. Organization of the Dissertation

2 BACKGROUND ON STOCHASTIC OPTIMIZATION
  2.1. Problem Formulation
  2.2. Ordinal Comparison
  2.3. Stochastic Ruler (SR)
  2.4. Stochastic Comparison (SC)
  2.5. Nested Partitioning (NP)
  2.6. Multi-armed Bandit Theory
  2.7. Noise Effects

3 DESCENT ALGORITHMS FOR DISCRETE-RESOURCE ALLOCATION
  3.1. Introduction
  3.2. Characterization of the Optimal Allocation
  3.3. Deterministic Descent On-Line Optimization Algorithm
    3.3.1 Interpretation of D-DOP
    3.3.2 Properties of the D-DOP Process
    3.3.3 Convergence of the D-DOP Process
  3.4. Stochastic On-Line Optimization Algorithm
    3.4.1 Properties of the S-DOP Process
    3.4.2 Convergence of the S-DOP Process
    3.4.3 A Stronger Convergence Result
  3.5. Future Directions
  3.6. Summary

4 INCREMENTAL ALGORITHMS FOR DISCRETE-RESOURCE ALLOCATION
  4.1. Problem Formulation
  4.2. Deterministic Case
    4.2.1 Deterministic Incremental Optimization Algorithm (DIO)
    4.2.2 Complementary Deterministic Incremental Optimization Algorithm
    4.2.3 Extension of the Incremental Optimization Algorithms
  4.3. Stochastic Case
    4.3.1 Stronger Convergence Results
  4.4. Discussion on the Incremental Algorithm
  4.5. Summary

5 PERTURBATION ANALYSIS
  5.1. Introduction
  5.2. Problem Definition
  5.3. Concurrent Simulation
    5.3.1 Notation and Definitions
    5.3.2 Timed State Automaton
    5.3.3 Coupling Dynamics
    5.3.4 Extensions of the TWA
  5.4. Finite Perturbation Analysis
    5.4.1 Notation and Definitions
    5.4.2 Derivation of Departure Time Perturbation Dynamics
  5.5. Summary

6 OPTIMIZATION OF KANBAN-BASED MANUFACTURING SYSTEMS
  6.1. Introduction
  6.2. More on the Smoothness Condition
  6.3. Application of the Incremental Optimization Algorithms
    6.3.1 Application of SIO on a Serial Manufacturing Process
    6.3.2 Application of SIO on a Network
    6.3.3 Application of SIO on a Network
  6.4. Summary

7 CHANNEL ALLOCATION IN CELLULAR TELEPHONE NETWORKS
  7.1. Introduction
  7.2. Overlapping Cells and Modeling Assumptions
  7.3. DR and DH Schemes
  7.4. Performance Enhancements
    7.4.1 Simple Neighborhood (SN)
    7.4.2 Extended Neighborhood (EN)
    7.4.3 On-Line Implementation of SN and EN
  7.5. Simulation Results
  7.6. Conclusions and Future Direction

8 GROUND-HOLDING PROBLEM IN AIR TRAFFIC CONTROL
  8.1. Introduction
  8.2. System Model
  8.3. Kanban-Smoothing (KS) Control Policy
    8.3.1 Representation of KS as a Timed State Automaton
    8.3.2 Evaluation of GHD Under KS
  8.4. Airplane Scheduling Using Finite Perturbation Analysis
    8.4.1 FPA-Based Control Algorithm
    8.4.2 Global Optimality of the FPA Approach
    8.4.3 Algorithm Complexity
  8.5. Numerical Results
    8.5.1 Performance of KS
    8.5.2 Performance of L-FPA
  8.6. Conclusions and Future Directions

9 EPILOGUE
  9.1. Summary
  9.2. Future Directions

A SELECTED ALGORITHMS
  A.1 S-DOP Pseudo Code
  A.2 Time Warping Algorithm (TWA)
  A.3 Finite Perturbation Algorithm for Serial Queueing Systems

B PROOFS FROM CHAPTER 3
  B.1 Proof of Theorem 3.2.1
  B.2 Proof of Theorem 3.2.2
  B.3 Proof of Lemma 3.3.1
  B.4 Proof of Lemma 3.3.2
  B.5 Proof of Theorem 3.3.1
  B.6 Proof of Lemma 3.4.1
  B.7 Proof of Lemma 3.4.3
  B.8 Proof of Theorem 3.4.1
  B.9 Proof of Theorem 3.4.2

C PROOFS FROM CHAPTER 4
  C.1 Proof of Theorem 4.2.1
  C.2 Proof of Theorem 4.3.1
  C.3 Proof of Theorem 4.3.2

D PROOFS FROM CHAPTER 5
  D.1 Proof of Theorem 5.4.1

E PROOFS FROM CHAPTER 8
  E.1 Proof of Theorem 8.4.1
  E.2 Proof of Lemma 8.4.1
  E.3 Proof of Lemma 8.4.2
  E.4 Proof of Theorem 8.4.2

BIBLIOGRAPHY

List of Figures

3.1 Evolution of the modified D-DOP
5.1 The sample path constructability problem for DES
5.2 FPA System Model
6.1 Manufacturing system consisting of N stations in series
6.2 Manufacturing network
6.3 Queueing network
6.4 Evolution of the SIO algorithm
6.5 Evolution of the SIO algorithm
6.6 Ranking of the allocations picked by SIO
6.7 Evolution of the SIO algorithm
6.8 Ranking of the allocations picked by SIO
7.1 Overlapping cell structure
7.2 Cell overlapping as a function of the cell radius
7.3 Call loss probabilities as a function of the traffic intensity ρ when the cell radius is 1.14
7.4 Call loss probabilities as a function of the traffic intensity ρ when the cell radius is 1.4
7.5 Average number of induced handoffs for EN and DH
7.6 Call loss probabilities as a function of the cell radius
7.7 Call loss probabilities as a function of the parameter τ
7.8 Call loss probabilities for non-uniform traffic
8.1 Destination airport queueing model
8.2 Stage representation for the KS control policy
8.3 Assignment of Ground-Holding Delay (GHD) under KS: (a) GHD until the beginning of the next stage; (b) GHD until the end of the next stage; (c) GHD until the previous airplane clears the runway
8.4 Timing diagram for ground-holding delay
8.5 Global optimality: (a) L-FPA result; (b) global optimum
8.6 FPA-based algorithms: (a) Local FPA (L-FPA) controller triggered by airplane departures; (b) Global FPA (G-FPA) controller triggered at any time
8.7 Hourly landings at airport D
8.8 Trade-off between airborne and ground delays for the KS controller
8.9 Overall cost improvement under the KS scheme
8.10 Trade-off between airborne and ground delays for the L-FPA controller
8.11 Overall cost improvement under the L-FPA scheme
E.1 τ* for Case I
E.2 τ* for Case II
E.3 τ* for Case III
E.4 Case IV subcases: (a) P > 0, (b) P < 0

Chapter 1
INTRODUCTION

This dissertation focuses on optimal resource allocation in the context of Discrete Event Systems (DES). These are systems, mainly "man-made", where state changes are due to asynchronous occurrences of discrete events [12]. For example, consider a computer that processes jobs on a first-come, first-served basis. For this system, the state is described by the number of jobs that are either being processed or waiting to be processed. The state changes only when a new job is submitted to the computer (a job arrival) or when the computer finishes processing a job (a job departure). Therefore, for this system, all activity is observed only at the instants of a job arrival or departure; at any other point in time, the state of the system remains unchanged.

For the purposes of this thesis, a "resource" corresponds to any means that can be used by a "user" to achieve a goal. This interpretation of resource allocation can be applied to a broad class of systems.
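The first-come, first-served computer example above can be sketched as a minimal event-driven simulation. This is our own illustrative Python (the function name, rates, and horizon are hypothetical, not from the dissertation); it exhibits the defining DES property that the state changes only at the asynchronous instants of the two discrete events, "arrival" and "departure":

```python
import heapq
import random

def simulate_fcfs(arrival_rate, service_rate, horizon, seed=0):
    """Event-driven simulation of a single-server FCFS queue.

    The state (number of jobs in the system) changes only at event
    times; between events nothing happens.
    """
    rng = random.Random(seed)
    events = [(rng.expovariate(arrival_rate), "arrival")]
    state, history = 0, []
    while events:
        time, kind = heapq.heappop(events)
        if time > horizon:
            break
        if kind == "arrival":
            state += 1
            if state == 1:  # server was idle: start serving this job
                heapq.heappush(events, (time + rng.expovariate(service_rate), "departure"))
            # schedule the next job submission
            heapq.heappush(events, (time + rng.expovariate(arrival_rate), "arrival"))
        else:  # departure
            state -= 1
            if state > 0:  # next queued job enters service
                heapq.heappush(events, (time + rng.expovariate(service_rate), "departure"))
        history.append((time, state))
    return history
```

Every entry of the returned history records an event time and the state just after that event; by construction, consecutive states differ by exactly one job.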
For example, in the context of wireless communications, a "channel" is a resource, a mobile phone is the user, and the goal is to allow two people who are physically located in distant areas to communicate. Another example is the automatic teller machine (resource) that allows people (users) to get cash. Note also that many entities may be viewed as either users or resources depending on the context. For example, in a queueing system, buffers may be the resources while servers are the users; from another perspective, the servers may be viewed as the resources while the customers become the users. Finally, it is worth pointing out that several other problems may be mapped into resource allocation problems. For example, consider the ground-holding problem in air traffic control (see Chapter 8), where airplane arrivals must be scheduled so that congestion is avoided. In this case, the time during which the runway can be used is divided into small intervals, each representing a resource which is then allocated to a flight to facilitate its landing or takeoff.

1.1. Classification of Resource Allocation Problems

Resources, depending on their nature, can be classified as "continuous" or "discrete". The basic premise of this classification is whether a resource is divisible or not. Note, however, that this distinction may not always be clear. Take, for example, a computer link with 1 megabit per second (Mbps) capacity. If a user (computer) can request any amount of the available capacity, then the resource allocation problem is considered continuous; one user may request 100 Kbps while another requests 133 Kbps. In another setting, the capacity allocation may be viewed as a discrete resource allocation problem. For example, suppose the aforementioned link is divided into 10 discrete channels with a capacity of 100 Kbps each.
While the first user will request a single channel, the second one, with the 133 Kbps requirement, must decide either to get a single channel and suffer a loss in quality of service (QoS), or to request two channels and pay a higher cost while wasting 67 Kbps.

Another classification of resource allocation problems is whether they are "static" or "dynamic". In static problems, the objective function corresponds to a long, possibly infinite, time horizon; the optimization problem is solved once and is not revisited until the end of this long interval. In dynamic problems, on the other hand, the objective function is defined over a finite horizon whose length is much smaller than the time horizon of the static problem, and the optimization problem is solved multiple times, once at the end of each short interval. A dynamic controller can therefore reallocate resources so as to optimize the objective function based on the information available at the end of each interval, which is at least as much as the information available at the beginning of that interval. Since, in general, more information leads to better decisions, a dynamic controller will perform better than a static one, at the expense of collecting more information.

Yet another classification of resource allocation problems refers to the environment in which the underlying system operates. If the event times and state transitions are known exactly, and if the objective function under any allocation can be calculated exactly, then the system is said to be "deterministic". If, on the other hand, any of the event times or state transitions are random variables, then the system is said to be "stochastic".

1.2. Dissertation Overview

In general, resources are scarce while many users compete to gain control over them.
The first goal of this thesis is to derive ways in which resources can be allocated to users so that an objective function is optimized. The main focus of this dissertation is the dynamic allocation of discrete resources in stochastic environments. Discrete resource allocation problems often arise in the context of Discrete Event Systems (DES); classic examples include channel allocation in communication networks [15, 76] and the buffer allocation problem in queueing models [36, 78]. Our second goal is to apply the derived techniques to real, complex systems for which finding a closed-form expression for the system's performance under any allocation is very difficult, if at all possible. Thus, performance must be estimated through Monte Carlo simulation or by direct measurements made on the actual system, and one is forced to make resource reallocation decisions based on noisy estimates of the system's performance. (When appropriate, or to help the clarity of the presentation, we may assume a deterministic system.)

While the area of stochastic optimization over continuous decision spaces is rich and usually involves gradient-based techniques, as in several well-known stochastic approximation algorithms [46, 64], the literature in the area of discrete stochastic optimization is relatively limited. The known approaches are based on (i) multi-armed bandit theory [28, 6], (ii) branch and bound techniques [66, 53], and (iii) random search algorithms [77, 31, 3]; for more details see Chapter 2. The main difficulty in solving optimization problems over a discrete parameter set is that gradients are not defined; therefore, the mathematical tools developed for continuous optimization problems simply do not apply. Since gradient information is not meaningful for the type of systems we are investigating, we substitute it with a finite difference.
The finite difference is of the form

    ΔL(n) = L(n + 1) − L(n),    (1.1)

where L(n) is the value of the objective function under n resources. We derive necessary and sufficient conditions that this finite difference must satisfy at the optimum, and we use it to develop optimization algorithms that yield the optimal allocation.

The first problem we consider is that of allocating discrete resources to a set of users when the objective function has the separable structure

    J(x) = Σ_{i=1}^{N} J_i(x_i),    (1.2)

where x_i is the number of resources allocated to user i. For this problem, we identify necessary and sufficient conditions that the optimal allocation must satisfy in a deterministic environment. Based on these conditions, we develop a "descent" optimization algorithm that yields the optimal allocation in a finite number of steps. Subsequently, we adapt the optimization algorithm to stochastic environments and show that the modified algorithm converges to the optimal allocation in probability. These results were published in [13]. Furthermore, we show that the modified algorithm, under some additional mild assumptions, converges to the optimal allocation almost surely. This result has been accepted for publication in [23], where it is also shown that the rate of convergence of this algorithm for the class of "regenerative" systems is exponential.

Subsequently, we consider the problem of resource allocation for systems that do not have the nice separable structure of (1.2) but satisfy a "smoothness" condition (defined in Chapter 4). For such systems, we have developed an "incremental" optimization algorithm that yields the optimal allocation in a finite number of steps when the performance estimates are known exactly. Moreover, we modified the algorithm for stochastic environments and showed that it converges to the optimal allocation in probability and, under some additional mild assumptions, almost surely.
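As a rough illustration of how a finite-difference-driven descent step can operate (a sketch under our own assumptions, not the dissertation's D-DOP/S-DOP pseudocode), the loop below moves one resource per iteration from the user with the smallest removal penalty to the user with the largest marginal gain. Every iterate is a feasible allocation with strictly lower cost, mirroring the descent property described in the text; the cost function J is supplied by the caller and is hypothetical here:

```python
def descent_allocate(J, n_users, total, max_iter=10_000):
    """Descent search over feasible allocations of `total` identical
    resources among `n_users`, minimizing sum_i J(i, x_i).

    Each iteration uses finite differences of J (no gradients) and
    reassigns exactly one resource, so every iterate is feasible.
    """
    x = [total // n_users] * n_users
    x[0] += total - sum(x)                      # feasible starting point
    for _ in range(max_iter):
        # cost increase from removing one resource from each user
        loss = [(J(i, x[i] - 1) - J(i, x[i]), i) for i in range(n_users) if x[i] > 0]
        # cost decrease from adding one resource to each user
        gain = [(J(j, x[j]) - J(j, x[j] + 1), j) for j in range(n_users)]
        dl, i = min(loss)
        dg, j = max(gain)
        if i == j or dg <= dl:                  # no strictly improving move
            break
        x[i] -= 1                               # take resource from i ...
        x[j] += 1                               # ... and give it to j
    return x
```

For a separable convex objective such as the made-up J(i, x) = w_i / (x + 1), a local optimum under single-resource exchanges is also globally optimal, which is why this simple loop terminates at the optimal allocation in a finite number of steps.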
These results have been accepted for publication in [57].

Two features of the resource allocation schemes we analyze are worth noting because of their practical implications. All iterative reallocation steps are driven by ordinal comparisons, which are particularly robust with respect to noise in the estimation process (see Chapter 2). Consequently: (i) as in other ordinal optimization schemes (e.g., [37, 38]), convergence is fast, because short estimation intervals are adequate to guide allocations towards the optimal; and (ii) there is no need for the "step size" or "scaling" parameters which arise in algorithms driven by cardinal estimates of derivatives or finite differences; instead, based on the results of comparisons of various quantities, allocations are updated by reassigning one resource with respect to the current allocation. This avoids the difficult problem of selecting appropriate values for such parameters, which is often crucial to the convergence properties of such algorithms.

In order to apply the proposed optimization algorithms on-line, it is necessary to develop efficient ways of calculating finite differences of the form of (1.1). For this reason we have developed the following two schemes:

(i) Concurrent Estimation (CE): a fairly general method of constructing sample paths under any parameter by observing a sample path under a single parameter. The results of this work are presented in [16].

(ii) Finite Perturbation Analysis (FPA): a more efficient but less general way of constructing sample paths under some "neighboring" allocations². This scheme takes advantage of the special structure of some systems.

Subsequently, we use the principles from the derived optimization schemes to solve resource allocation problems in three different applications. The first application is from the area of kanban-based manufacturing systems, where kanban (tokens) are used to maintain low work-in-process inventory (WIP).
In that context, kanban constitute discrete resources that are allocated to the various production stations (users) so as to optimize an objective function while maintaining low WIP. To solve this problem, we use the "incremental" optimization algorithm described in Chapter 4 in conjunction with the FPA scheme mentioned earlier, as described in [57]. Our second application comes from the area of wireless communications and deals with the problem of channel allocation in cellular telephone networks. In this case, we assume the model of "overlapping" cells described in [26] and apply a variation of the descent algorithm to distribute subscribers over the various base stations so as to minimize the probability that an arriving call is lost due to the unavailability of a free channel. Finally, the last application is from the area of air traffic control. In that context it is generally true that airborne delays are more expensive than ground-holding delays; hence the objective is to determine the ground-holding delay of each airplane so as to minimize the overall waiting cost. For this problem we propose two solutions. The first is the Kanban-Smoothing (KS) flow control policy, first proposed in [56]; KS is designed to "smooth" an arrival process by systematically reducing its variance. The second uses FPA to determine the change in the value of the cost function if a new airplane is allowed to arrive at the destination airport, as a function of its ground delay, and hence determines the delay that minimizes that cost. These schemes are also presented in [58].

1.3. Contributions

The contributions of this dissertation are the following.

• For the separable convex resource allocation problem, we have extended the results of [14] in two ways. First, in the deterministic case, we derived several properties of the on-line descent optimization algorithm.
Second, we have modified the algorithm so that it is applicable in a stochastic environment and have investigated its convergence properties. The resulting stochastic optimization algorithm uses pseudo-sensitivities (finite differences of the form (1.1)) to determine the next allocation and is considerably different from the existing approaches, which are based on bandit theory, branch and bound techniques, and random search.

• Using the pseudo-sensitivities again, we have shown that the algorithm INCREASE described in [41] for deterministic, separable, convex objective functions can also be used for non-separable resource allocation problems that satisfy the "smoothness" condition defined in Chapter 4. In addition, we modified the algorithm so that it can be used in stochastic environments and have proved that it converges to the optimal allocation.

(² These are the allocations that result from adding or removing a single resource from the allocation of the observed sample path.)

• We have developed the Time Warping Algorithm (TWA), which implements "concurrent estimation/simulation" and can be used to solve the sample path constructability problem for DES (see Chapter 5). Even though the basic idea behind concurrent estimation/simulation is not new (it is often implied in the literature, e.g., [12, 80]), it was never developed for general systems. TWA is a fairly general simulation algorithm which can solve the sample path constructability problem for DES with arbitrary lifetime distributions, unlike Augmented System Analysis (ASA) [10] and the Standard Clock (SC) approach [74], which require exponentially distributed lifetimes.

• We illustrate a procedure that can be used to obtain estimates of finite perturbations for systems whose dynamics can be described through Lindley-type recursions.

• We apply the resource allocation techniques we developed to cellular telephone networks.
Specifically, we extend the methodologies for channel allocation to the case where "overlapping cells" are allowed. In this context, Karlsson [44] and Everitt [26] have developed heuristic channel allocation algorithms for overlapping cells. We have developed two new channel allocation algorithms based on the aforementioned discrete resource allocation algorithms and have used simulation to show that our algorithms improve the system performance, which is usually defined as the call loss probability.

• Another area where we apply our resource allocation techniques is the ground-holding problem in air traffic control. For this problem, we developed two new and efficient approaches, the Kanban-Smoothing and the FPA approach, which are unrelated to the linear-programming-type approaches that have previously been used to attack this problem.

1.4. Organization of the Dissertation

This dissertation can be divided into two main parts. The first part deals with the development of new methodologies for attacking resource allocation problems, while the second deals with applications of resource allocation techniques to real systems. The first part starts with Chapter 2, which reviews some of the literature on approaches for solving stochastic optimization problems and presents some relevant results that will be used in the development of the methodologies proposed in this thesis. Subsequently, in Chapter 3 we address the problem of allocating identical resources to a set of users when the objective function to be optimized is separable and convex, and propose a "descent" optimization algorithm. In Chapter 4 we relax the separability assumption and propose an "incremental" algorithm that yields the optimal allocation under the "smoothness" condition. In both chapters, we first address the deterministic problem and show that the proposed algorithms yield the optimal allocation in a finite number of steps.
Subsequently, we address the stochastic problem and show that under some mild assumptions the proposed algorithms converge to the global optimum. The first part ends with Chapter 5, where we develop techniques for obtaining the finite differences of (1.1) by observing a single sample path under a single parameter. The second part includes the applications. In Chapter 6 we deal with the problem of allocating a finite number of kanban to the various production stages of a manufacturing system. In Chapter 7 we address the issue of allocating communication channels to base stations in a TDMA/FDMA cellular communication network, and in Chapter 8 we address the ground-holding problem in air traffic control. Summary and conclusions are presented in Chapter 9. Finally, the appendices include some of the derived algorithms as well as the proofs of all theorems and lemmas presented in the main part of the thesis.

Chapter 2
BACKGROUND ON STOCHASTIC OPTIMIZATION

This chapter describes the problem of stochastic optimization and reviews some of the approaches to solving it.

2.1. Problem Formulation

In many resource allocation problems, we wish to find the allocation x that optimizes a performance measure J(x), where x belongs to a finite set X. For many real systems, however, it is often impossible to derive a closed-form expression for J(x); as a result, when evaluating system performance, we are forced to resort to estimates obtained through simulation or on-line system observation over an interval of length t. For the purposes of this thesis, we assume that the effect of estimation noise decreases as the observation interval t increases.

A2.1: Ergodicity Assumption: For every allocation x, the performance estimate Ĵ_t(x) taken over a sample path of length t approaches its true value as t goes to infinity. That is,

lim_{t→∞} Ĵ_t(x) = J(x), a.s.

This assumption is mild, and in the context of discrete-event systems it usually holds.
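As a quick illustration of A2.1, the following sketch shows a sample-mean estimate Ĵ_t approaching its true value as the observation interval t grows. The true performance value 2.0 and the zero-mean Gaussian observation noise are purely hypothetical assumptions chosen for the example, not from the text.

```python
import random

def J_hat(t, rng):
    """Sample-mean performance estimate over an observation interval of
    length t; the true performance J(x) = 2.0 and the zero-mean Gaussian
    observation noise are illustrative assumptions only."""
    return sum(2.0 + rng.gauss(0.0, 1.0) for _ in range(t)) / t

rng = random.Random(1)
for t in (10, 1000, 100000):
    # the estimation error typically shrinks roughly like 1/sqrt(t)
    print(t, abs(J_hat(t, rng) - 2.0))
```

Running the loop shows the error decreasing with t, which is exactly the behavior that the increasing-observation-interval schemes of Chapters 3 and 4 exploit.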
Note that the performance measures of DES are often expressed in the form of an expectation

J(x) = E[L(x, ξ)]    (2.1)

where L(x, ξ) is a random variable corresponding to the system performance under parameter x, while ξ represents the uncertainty. When evaluating (2.1), only realizations of L(x, ξ) are available, so a standard approach for estimating the performance of such systems is through sampling, i.e.,

Ĵ(x) = (1/t) Σ_{i=1}^{t} L_i(x, ξ_i).    (2.2)

In this case, assuming that the realizations L_i(x, ξ) form an i.i.d. process with E[L_i(x, ξ)] < ∞ and Var(L_i(x, ξ)) < ∞, A2.1 holds due to the strong law of large numbers. However, it is also true that the rate of convergence is only O(1/√t), in the sense that

E[Ĵ(x) − E[L(x, ξ)]]² = (1/t) Var(L(x, ξ)) = O(1/t).    (2.3)

Note that for large systems this rate of convergence is prohibitively slow. It implies that very long simulations are required to obtain an accurate performance estimate under even a single parameter. It is then easy to see that, for systems with a large number of feasible allocations, it would be practically impossible to obtain estimates under all possible allocations. Next, we present some of the techniques that were developed to solve the stochastic optimization problem.

2.2. Ordinal Comparison

The ordinal comparison technique [37] is based on two main principles:
1. Goal softening.
2. Ordinal rather than cardinal optimization.
Goal softening is the realization that in many applications, rather than spending a lot of resources on finding the optimal allocation, it is often more desirable to find an allocation that is good enough with the minimum amount of effort; in other words, to find an allocation that is within the top α% of all possible designs. The second principle, ordinal comparison, suggests comparing the relative goodness (rank) of different designs without knowing the exact values of the corresponding performance measures.
For example, suppose that we want to choose between two allocations x₁ and x₂, and suppose that J(x₁) < J(x₂). Then, even when the performance estimates Ĵ(x) have a large noise component, it is highly probable that Ĵ(x₁) < Ĵ(x₂). This suggests that it is possible to perform the resource allocation without having to obtain accurate estimates of J(x). Next, we present without proof two results from [22] that reveal some of the properties of ordinal comparison and that will prove useful in the sequel. The first lemma is a direct consequence of A2.1 and states that, as the observation interval t increases, the probability that the performance estimates give the correct order goes to one.

Lemma 2.2.1 Let J(x₁) < J(x₂) and suppose that assumption A2.1 holds. Then,

lim_{t→∞} Pr[Ĵ_t(x₁) ≥ Ĵ_t(x₂)] = 0, and lim_{t→∞} Pr[Ĵ_t(x₁) < Ĵ_t(x₂)] = 1.

The second lemma establishes the rate of convergence for comparing an estimate δ̂_t against 0.

Lemma 2.2.2 Suppose that {δ̂_t, t ≥ 0} is a stochastic process satisfying
(a) lim_{t→∞} δ̂_t = δ, a.s.;
(b) lim_{t→∞} E[δ̂_t] = δ;
(c) Var[δ̂_t] = O(1/t).
If δ > 0, then Pr[δ̂_t ≤ 0] = O(1/t).

The assumptions of Lemma 2.2.2 are very mild and almost always satisfied in the simulation or direct sample path observation of discrete-event dynamic systems. Another interesting result, proven by Dai [22], indicates that for the class of "regenerative systems¹" the order of the performance estimates converges exponentially fast. Finally, note that the principles and properties of ordinal comparison can be used to complement other optimization schemes. This is true of the algorithms described next, as well as of some of the schemes that will be developed later in the sequel.

2.3. Stochastic Ruler (SR)

The Stochastic Ruler [77] is motivated by the Simulated Annealing [1] method.
In essence, this algorithm defines a sequence of allocations {x_i, i = 1, 2, ...} and, for every allocation, a neighborhood N(x_i) ⊂ X. To determine the next allocation x_{i+1}, the algorithm randomly picks an allocation y ∈ N(x_i) and compares its performance H(y) to a random variable Θ(a, b), uniformly distributed in the interval (a, b), which represents the stochastic ruler. If the system's performance is better than the random variable, SR adopts y as the new allocation (x_{i+1} = y); otherwise, x_{i+1} = x_i. Application of SR is complicated for two reasons. First, a priori information is needed on the range of the performance estimates in order to determine the stochastic ruler range (a, b). Second, it is necessary to define the neighborhood structure N(x_i) for all i = 1, 2, .... Clearly, identifying a good neighborhood structure will benefit the algorithm's performance; in general, however, this is a very difficult task. These restrictions of SR have motivated the development of Stochastic Comparison, which is described next.

2.4. Stochastic Comparison (SC)

The Stochastic Comparison approach [32] uses the principles of random search [9] to overcome the shortcomings of SR. The SC approach also defines a sequence of allocations {x_i, i = 1, 2, ...}, but in order to determine the next allocation, it randomly picks an allocation y from the entire search space and compares its performance to the performance of the current allocation. If the performance of the new allocation is better, SC adopts it (x_{i+1} = y); otherwise, x_{i+1} = x_i. SC eliminates the neighborhood identification problem by always selecting an allocation from the entire search space; in other words, the neighborhood of any allocation includes every feasible allocation.

¹ See [43] for details on regenerative systems.
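The two acceptance rules can be contrasted side by side. In the sketch below, the quadratic cost H, the noise level, and the ruler range (a, b) are all hypothetical choices made for illustration; only the acceptance logic follows the descriptions above.

```python
import random

def H(x, rng):
    """Noisy cost observation for allocation x (hypothetical quadratic cost)."""
    return (x - 3) ** 2 + rng.gauss(0.0, 0.5)

X = range(10)       # search space
a, b = 0.0, 9.0     # ruler range; note that SR needs this a priori information

def sr_step(x, y, rng):
    # Stochastic Ruler: accept candidate y if its observed cost beats a
    # uniform random "ruler" Theta(a, b).
    return y if H(y, rng) < rng.uniform(a, b) else x

def sc_step(x, y, rng):
    # Stochastic Comparison: accept y if it beats the current allocation in a
    # head-to-head noisy comparison; no ruler range is needed.
    return y if H(y, rng) < H(x, rng) else x

rng = random.Random(0)
x_sr = x_sc = 9
for _ in range(200):
    y = rng.choice(X)  # SC samples the whole space; for SR this stands in
    x_sr = sr_step(x_sr, y, rng)  # for a draw from the neighborhood N(x)
    x_sc = sc_step(x_sc, y, rng)
print(x_sr, x_sc)  # both processes tend to drift toward the minimizer x = 3
```

The sketch makes the structural difference plain: `sr_step` compares against an externally supplied ruler, while `sc_step` compares two allocations directly.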
Furthermore, SC does not require any a priori information on the system performance, since the comparison is always between the performances of the current allocation and the allocation under test. Different variations of SR and SC are proposed in [3] and in the references therein.

2.5. Nested Partitioning (NP)

The Nested Partitioning approach [68, 67] combines the principles of random search with branch-and-bound techniques to determine the globally optimal allocation. The algorithm consists of four basic steps: (i) partitioning, (ii) random sampling, (iii) identification of the most promising region, and (iv) backtracking. Specifically, the algorithm works as follows. First, it divides the entire search space into M regions σ_i⁰, i = 1, ..., M, and randomly samples allocations from each region. Using the obtained samples, it determines the most promising region (say σ₁⁰). Subsequently, it divides the selected region σ₁⁰ into M smaller regions σ_i¹, i = 1, ..., M, and aggregates all other regions into a single region σ¹_{M+1}. Then it randomly samples allocations from all M + 1 regions and identifies the new most promising region. If the most promising region is any of the regions 1 to M, it divides that region into another M sub-regions and again aggregates the remaining regions into a single region M + 1. This continues until the most promising region is a singleton. In the event that the surrounding region M + 1 becomes the most promising region, NP backtracks to a larger region. The implementation of the four steps of NP can vary depending on the application. Note, however, that the region partitioning rule is crucial to the algorithm's performance. If the partitioning is such that most of the good allocations tend to be clustered in the same sub-region, it is likely that the algorithm will concentrate its effort in these regions. On the other hand, a bad choice of region partitioning rule may have adverse effects on performance.
Unfortunately, identifying a good region partitioning strategy is not a trivial task.

2.6. Multi-armed Bandit Theory

The bandit theory approach [28] addresses a slightly different problem than the stochastic optimization problem described earlier. In this case, rather than allocating several resources to the users, the goal is to determine how to dynamically allocate a single resource among all possible users so as to optimize the objective function over time. In the basic version of the multi-armed bandit problem there are N possible choices, each carrying a random reward r_i, i = 1, ..., N, drawn from a distribution f_{r_i}(r_i). At the nth iteration, the system reward R(n) is given by the reward of the selected choice, i.e., R(n) = r_j, where j is the selected choice. The objective is to optimize the discounted reward over an infinite horizon,

R = Σ_{n=1}^{∞} β^n R(n)

where 0 < β < 1 is a discount factor. The solution suggested by Gittins and Jones [29] associates an index ν_i(n) with each option and then selects the choice with the largest index. The calculation of the index ν_i(n), which depends on the underlying distributions of the options, is beyond the scope of this thesis and is omitted. Variations of the problem appear in [6] and the references therein.

2.7. Noise Effects

For all of the algorithms described above, convergence to the optimal allocation requires that the effect of noise in the performance estimates be gradually reduced. There are several possible approaches to achieving this goal. First, if the performance measure of interest satisfies assumption A2.1, then at every iteration one can increase the length of the observation interval. This approach will be used in the methodologies presented later in Chapters 3 and 4. Another possibility is the approach described in [31], where the observation interval is kept constant but the number of comparisons is increased.
Under this scheme, a new allocation is adopted only if its performance is found to be better than the performance of the current allocation more than M_k times, where {M_k} is a monotonically increasing sequence.

Chapter 3
DESCENT ALGORITHMS FOR DISCRETE-RESOURCE ALLOCATION

In this chapter we develop a descent optimization algorithm that can be used for discrete resource allocation problems with separable and convex structure. This algorithm is very efficient and can be used in real-time (on-line) applications. Furthermore, it is shown to converge to the optimal allocation in both deterministic and stochastic environments.

3.1. Introduction

In this chapter we consider the problem of allocating K identical resources over N user classes so as to optimize some system performance measure (objective function). Let the resources be sequentially indexed, so that the "allocation" is represented by the K-dimensional vector s = [s_1, ..., s_K]^T, where s_j ∈ {1, ..., N} is the user class index assigned to resource j. Let S be the finite set of feasible resource allocations, S = {[s_1, ..., s_K] : s_j ∈ {1, ..., N}}, where "feasible" means that the allocation may have to be chosen to satisfy some basic requirements such as stability or fairness. Let L_i(s) be the class i cost associated with the allocation vector s. The class of resource allocation problems we consider is formulated as

(RA1)   min_{s∈S} Σ_{i=1}^{N} L_i(s)

(RA1) is a special case of a nonlinear integer programming problem (see [41, 59] and references therein) and is in general NP-hard [41]. However, in some cases, depending upon the form of the objective function (e.g., separability, convexity), efficient algorithms based on finite-stage dynamic programming or generalized Lagrange relaxation methods are known (see [41] for a comprehensive discussion of deterministic resource allocation algorithms).
Alternatively, if no a priori information about the structure of the problem is available, some form of search algorithm is employed (e.g., Simulated Annealing [1], Genetic Algorithms [39]).

3.2. Characterization of the Optimal Allocation

In order to specify the class of discrete resource allocation problems we shall study in this chapter, we define

n_i = Σ_{j=1}^{K} 1[s_j = i],   i = 1, ..., N    (3.1)

where 1[·] is the standard indicator function; n_i is simply the number of resources allocated to user class i under some allocation s. We shall now make the following assumption:

A3.1: L_i(s) depends only on the number of resources assigned to class i, i.e., L_i(s) = L_i(n_i).

This assumption asserts that resources are indistinguishable, as opposed to cases where the identity of a resource assigned to user i affects that user's cost function. Even though A3.1 limits the applicability of the approach to a class of resource allocation problems, this class includes a number of interesting problems. Examples include: (a) buffer allocation in parallel queueing systems, where the blocking probability is a function of the number of buffer slots assigned to each server (for details, see [54]); (b) cellular systems, where the call loss probability of each cell depends only on the number of channels assigned to that cell (see also Chapter 7); (c) scheduling packet transmissions in a mobile radio network, where the resources are the time slots in a transmission frame (see [15, 76, 55]). Under A3.1, an allocation written as the K-dimensional vector s = [s_1, ..., s_K] can be replaced by the N-dimensional vector s = [n_1, ..., n_N]. In this case, the resource allocation problem (RA1) is reformulated as

(RA2)   min_{s∈S} Σ_{i=1}^{N} L_i(n_i)   s.t.   Σ_{i=1}^{N} n_i = K

Although (RA2) is not NP-hard, the state space is still combinatorially explosive (|S| = (K+N−1)!/(K!(N−1)!)), so an exhaustive search of the state space is generally infeasible.
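To make the combinatorial explosion concrete, the following short computation evaluates |S| = (K+N−1)!/(K!(N−1)!) (the "stars and bars" count of nonnegative integer vectors summing to K) for a few arbitrary problem sizes chosen only for illustration:

```python
from math import comb

def num_allocations(K, N):
    """Number of feasible allocations [n_1, ..., n_N] with n_1 + ... + n_N = K:
    the binomial coefficient C(K + N - 1, N - 1)."""
    return comb(K + N - 1, N - 1)

for K, N in [(10, 5), (50, 10), (100, 20)]:
    print(K, N, num_allocations(K, N))  # grows explosively with K and N
```

Even the modest case K = 10, N = 5 already yields 1001 feasible allocations, so estimating the performance of every allocation quickly becomes hopeless.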
Several off-line algorithms based on the theory of generalized Lagrange multipliers, which determine the optimal solution in polynomial time, are presented in Chapter 4 of [41]. Our objective, however, is to solve stochastic resource allocation problems where the cost function is not available in closed form. This requires that (a) we resort to estimates of L_i(n_i) and ∆L_i(n_i) for all i = 1, ..., N over some observation period, and (b) we iterate after every such observation period by adjusting the allocation, which must therefore remain feasible at every step of this process. It is for this reason that we wish to derive on-line discrete optimization algorithms. We shall first deal with issue (b) in Section 3.3, and then address issue (a) in Section 3.4. In addition to A3.1, we make the following assumption regarding the cost functions of interest:

A3.2: For all i = 1, ..., N, L_i(n_i) is such that ∆L_i(n_i + 1) > ∆L_i(n_i), where

∆L_i(n_i) = L_i(n_i) − L_i(n_i − 1),   n_i = 1, ..., K    (3.2)

with boundary values ∆L_i(0) ≡ −∞ and ∆L_i(K + 1) ≡ ∞.

This assumption is the analog of the usual convexity/concavity requirement of most gradient-driven optimization over continuous search spaces. It is the assumption that typically allows an extremum to be a global optimum; the alternative is to settle for local optima. From a practical standpoint, the most common performance criteria in systems where resource allocation arises are quantities such as throughput, mean delay, and blocking probability, which generally satisfy such properties. Next, we present two key results that will provide a stopping condition for the proposed algorithm.

Theorem 3.2.1 Under assumptions A3.1-A3.2, an allocation s̄ = [n̄_1, ..., n̄_N] is a global optimum (i.e., a solution of (RA2)) if and only if

∆L_i(n̄_i + 1) ≥ ∆L_j(n̄_j)   for any i, j = 1, ..., N    (3.3)

The proof of the theorem is included in Appendix B.
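Condition (3.3) is straightforward to check numerically. The sketch below is a minimal illustration for hypothetical convex decreasing costs L_i(n) = c_i/(n+1); these cost functions are an assumption made for the example, not from the text.

```python
def is_global_optimum(ns, dL):
    """Check condition (3.3): dL[i](n_i + 1) >= dL[j](n_j) for all pairs i, j,
    where dL[i](n) = L_i(n) - L_i(n - 1) and dL[i](0) is taken as -infinity."""
    N = len(ns)
    def delta(i, n):
        return float("-inf") if n == 0 else dL[i](n)
    return all(dL[i](ns[i] + 1) >= delta(j, ns[j])
               for i in range(N) for j in range(N))

# Hypothetical costs L_i(n) = c_i / (n + 1), for which
# dL[i](n) = c_i / (n + 1) - c_i / n, an increasing sequence (so A3.2 holds).
dL = [lambda n, c=c: c / (n + 1) - c / n for c in (4.0, 1.0)]
print(is_global_optimum([2, 1], dL))  # True: the optimum for K = 3
print(is_global_optimum([3, 0], dL))  # False
```

Enumerating all four allocations of K = 3 resources over these two classes confirms that [2, 1] has the lowest total cost, consistent with the check.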
Note that Theorem 3.2.1 gives a necessary and sufficient condition that the optimal allocation must satisfy in terms of the cost differences ∆L_i(·), i = 1, ..., N, over only a small set of feasible allocations, namely the neighborhood B(s*)¹ of the optimal allocation. Here, s* denotes the solution of the optimization problem (RA2), i.e., s* is such that L(s*) ≤ L(s) for all s ∈ S, where S is redefined as

S = { s = [n_1, ..., n_N] | Σ_{i=1}^{N} n_i = K }.

Next, we derive a different necessary and sufficient condition for global optimality in solving (RA2), expressed in terms of max_{i=1,...,N} {∆L_i(n_i)}. As will be seen in the proof of Theorem 3.2.2, necessity still relies on assumptions A3.1-A3.2 alone, but sufficiency requires an additional technical condition:

A3.3: Let [n̄_1, ..., n̄_N] be an allocation such that

max_{i=1,...,N} {∆L_i(n̄_i)} ≤ max_{i=1,...,N} {∆L_i(n_i)}

for all s = [n_1, ..., n_N] ∈ S. If i* = arg max_{i=1,...,N} {∆L_i(n̄_i)}, then ∆L_{i*}(n̄_{i*}) > ∆L_j(n̄_j) for all j = 1, ..., N, j ≠ i*.

This assumption guarantees a unique solution to (RA2) and, as mentioned above, is only used to prove the sufficiency part of Theorem 3.2.2. If the condition is violated, i.e., there is a set of optimal allocations, then in the deterministic case the algorithm will converge to one member of the set, depending on the initial allocation. In the stochastic case, the algorithm will oscillate between the members of the set, as mentioned in the remark at the end of Section 3.4.

Theorem 3.2.2 Under assumptions A3.1-A3.2, if an allocation s̄ = [n̄_1, ..., n̄_N] is a global optimum (i.e., a solution of (RA2)) then

max_{i=1,...,N} {∆L_i(n̄_i)} ≤ max_{i=1,...,N} {∆L_i(n_i)}    (3.4)

for all s ∈ S. If in addition A3.3 holds, then (3.4) also implies that s̄ is a solution of (RA2).

¹ B(s) = {x : x = s + e_i − e_j, i, j = 1, ..., N}, where e_i is an N-dimensional vector with all elements equal to 0 except the ith, which is equal to 1.
The proof of the theorem is included in Appendix B. Note that Theorem 3.2.2 provides a characterization of the optimal allocation in terms of only the largest ∆L_i(·) element of the allocation. What is interesting about condition (3.4) is that it can be interpreted as the discrete analog of a standard condition in continuous-variable optimization, where the partial derivatives of the cost function with respect to the control variables must be equal (e.g., see [27]). To derive a similar result for a discrete optimization problem, one must replace derivatives by finite cost differences, such as the quantities ∆L_i(·), i = 1, ..., N, defined in (3.2), and keep them as close as possible; this is expressed through the maximum value of such finite differences at the optimal point in condition (3.4). Having established necessary and sufficient conditions that characterize the optimal allocation, namely Theorems 3.2.1 and 3.2.2, our next task is to develop an algorithm that iteratively adjusts allocations on-line. These conditions then serve as a stopping condition for such an algorithm, guaranteeing that an optimal allocation has been found.

3.3. Deterministic Descent On-Line Optimization Algorithm

In this section, we present an iterative process, referred to as the deterministic descent optimization process (D-DOP), for determining the solution to (RA2). In particular, we generate sequences {n_{i,k}}, k = 0, 1, ..., for each i = 1, ..., N, as follows. We define a set C_0 = {1, ..., N} and initialize all sequences so that the allocation s_0 = [n_{1,0}, ..., n_{N,0}] is feasible.
Then, let

n_{i,k+1} = n_{i,k} − 1   if i = i*_k and δ_k > 0
          = n_{i,k} + 1   if i = j*_k and δ_k > 0
          = n_{i,k}       otherwise    (3.5)

and

C_{k+1} = C_k − {j*_k}   if δ_k ≤ 0
        = C_k            otherwise    (3.6)

where i*_k, j*_k and δ_k are defined as follows:

i*_k = arg max_{i∈C_k} {∆L_i(n_{i,k})}    (3.7)

j*_k = arg min_{i∈C_k} {∆L_i(n_{i,k})}    (3.8)

δ_k = ∆L_{i*_k}(n_{i*_k,k}) − ∆L_{j*_k}(n_{j*_k,k} + 1)    (3.9)

To complete the specification of this process, we will henceforth set ∆L_i(0) ≡ −∞ for all i = 1, ..., N. Finally, note that ties in equations (3.7) and (3.8) (i.e., cases where more than one index qualifies as i*_k or j*_k) can be broken arbitrarily, but for simplicity we adopt the following convention:

If i*_k = p and δ_k ≤ 0, then i*_{k+1} = p    (3.10)

This statement is trivial if the maximization in (3.7) gives a unique value. If this is not the case, we simply leave this index unchanged as long as δ_l ≤ 0 for l > k, which implies that all ∆L_i(n_{i,k}) values remain unchanged.

3.3.1. Interpretation of D-DOP

Looking at (3.7), i*_k identifies the user "most sensitive" to the removal of a resource among the users in the set C_k, while in (3.8), j*_k identifies the user who is "least sensitive". Then, (3.5) forces a natural exchange of resources from the least to the most sensitive user at the kth step of this process, provided the quantity δ_k is strictly positive (an interpretation of δ_k is provided below). Otherwise, the allocation is unaffected, but the user with index j*_k is removed from the set C_k through (3.6). Thus, as the process evolves, users are gradually removed from this set. As we will show in the next section, the process terminates in a finite number of steps when this set contains a single element (user index), and the corresponding allocation is globally optimal. As defined in (3.9), δ_k represents the "potential improvement" (cost reduction) incurred by a transition from allocation s_k to s_{k+1}.
That is,

δ_k = L(s_k) − L(s_{k+1})    (3.11)

which is seen as follows:

L(s_k) − L(s_{k+1}) = Σ_{i=1}^{N} L_i(n_{i,k}) − Σ_{i=1}^{N} L_i(n_{i,k+1})
                    = L_{i*_k}(n_{i*_k,k}) + L_{j*_k}(n_{j*_k,k}) − L_{i*_k}(n_{i*_k,k} − 1) − L_{j*_k}(n_{j*_k,k} + 1)
                    = ∆L_{i*_k}(n_{i*_k,k}) − ∆L_{j*_k}(n_{j*_k,k} + 1) = δ_k

Note that if δ_k > 0, which implies that the cost will be reduced by allocation s_{k+1}, the reallocation is implemented in (3.5). If, on the other hand, δ_k ≤ 0, there is no cost reduction under the candidate allocation s_{k+1}, and s_k remains unchanged, as seen in (3.5).

3.3.2. Properties of the Process D-DOP

We begin by establishing, in Lemma 3.3.1 below, a number of properties satisfied by the sequences {n_{i,k}} and {C_k} in (3.5) and (3.6), respectively. Based on these properties, we will show that {s_k}, where s_k = [n_{1,k}, ..., n_{N,k}], converges to a globally optimal allocation. We will also use them to determine an upper bound on the number of steps required to reach this global optimum.

Lemma 3.3.1 The D-DOP process defined by (3.5)-(3.9) is characterized by the following properties:

P1. ∆L_{i*_k}(·) is non-increasing in k = 0, 1, ..., that is,
∆L_{i*_{k+1}}(n_{i*_{k+1},k+1}) ≤ ∆L_{i*_k}(n_{i*_k,k}) for all k = 0, 1, ...    (3.12)

P2. ∆L_{j*_k}(·) is non-decreasing in k = 0, 1, ..., that is,
∆L_{j*_{k+1}}(n_{j*_{k+1},k+1}) ≥ ∆L_{j*_k}(n_{j*_k,k}) for all k = 0, 1, ...    (3.13)

P3. Let p = i*_k and suppose there exists some m > k such that j*_m = p and p ≠ i*_l for all k < l < m. Then,
C_{m+1} = C_m − {p}    (3.14)

P4. Let p = j*_k and suppose there exists some m > k such that i*_m = p and p ≠ j*_l for all k < l < m. Then, there exists some q, 1 ≤ q ≤ N − 1, such that
C_{m+q+1} = C_{m+q} − {p} if |C_{m+q+1}| > 1, and C_{m+q+1} = {p} if |C_{m+q+1}| = 1    (3.15)

P5. Let i*_k = p. Then,
n_{p,m} ≤ n_{p,k} for any k = 0, 1, ... and for all m > k    (3.16)

P6. Let j*_k = p. Then,
n_{p,m} ≥ n_{p,k} for any k = 0, 1, ... and for all m > k    (3.17)

The proof of the lemma is included in Appendix B.
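The process (3.5)-(3.9) is compact enough to state directly in code. The sketch below is an illustrative implementation only: the convex decreasing costs L_i(n) = c_i/(n+1) are a hypothetical stand-in for a real objective, and ties are broken by Python's max/min rather than by convention (3.10).

```python
def d_dop(dL, K, N):
    """Deterministic descent optimization process (3.5)-(3.9).
    dL[i](n) = L_i(n) - L_i(n - 1), with dL[i](0) treated as -infinity.
    Returns the final allocation [n_1, ..., n_N]."""
    n = [K // N + (1 if i < K % N else 0) for i in range(N)]  # any feasible start
    C = set(range(N))
    def delta(i, m):
        return float("-inf") if m == 0 else dL[i](m)
    while len(C) > 1:
        i_star = max(C, key=lambda i: delta(i, n[i]))  # most sensitive, (3.7)
        j_star = min(C, key=lambda i: delta(i, n[i]))  # least sensitive, (3.8)
        d = delta(i_star, n[i_star]) - dL[j_star](n[j_star] + 1)  # delta_k, (3.9)
        if d > 0:              # strict cost reduction: move one resource, (3.5)
            n[i_star] -= 1
            n[j_star] += 1
        else:                  # no improvement: drop j* from the set C, (3.6)
            C.discard(j_star)
    return n

# Hypothetical costs L_i(n) = c_i / (n + 1).
dL = [lambda m, c=c: c / (m + 1) - c / m for c in (4.0, 1.0, 9.0)]
print(d_dop(dL, 6, 3))  # → [2, 1, 3], the global optimum for this example
```

Note that every intermediate allocation is feasible (resources are only exchanged, never created or destroyed), which is exactly the property that makes the process usable on-line.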
Properties P3 and P4 are particularly important in characterizing the behavior of D-DOP and in establishing the main results of this section. In particular, P3 states that if any user p is identified as i*_k at some step k of the process and as j*_m at some later step m > k, then this user is immediately removed from the set C_m. This also implies that n_{p,m} is the number of resources finally allocated to p. Property P4 is a dual statement with a different implication. Once a user p is identified as j*_k at some step k and as i*_m at some m > k, there are two possibilities: either p will be the only user left in C_l, l > m, and the allocation process will terminate, or p will be removed from C_l for some m < l < m + N − 1. This discussion also serves to point out an important difference between P5 and P6 which, at first sight, seem exact duals of each other. By P5, a user p = i*_k for some k will never in the future take any resources from other users. On the other hand, by P6 it is not true that a user p = j*_k will never in the future give away any resources to other users; rather, as seen in P4, user p may give away at most one resource. This happens if δ_m > 0 when p = i*_m, m > k, as is clear from the proof of P4, since n_{p,m} = n_{p,k+1} = n_{p,k} + 1 and then n_{p,m+1} = n_{p,m} − 1.

3.3.3. Convergence of the D-DOP Process

The next result establishes an upper bound on the number of steps required for the D-DOP process to converge to a final allocation, where a final allocation s_L is defined to be one at a step L with |C_L| = 1. We then show that the final allocation is also a global optimum.

Lemma 3.3.2 The D-DOP process reaches a final state (s̄, C̄) in M steps, such that |C̄| = 1 and M ≤ K + 2(N − 1).

The proof of the lemma is included in Appendix B.

Theorem 3.3.1 Let s̄ = [n̄_1, ..., n̄_N] be the final allocation of the D-DOP process. Then, s̄ is a global optimum (i.e., a solution of (RA2)).
The proof of the theorem is also included in Appendix B.

Corollary 3.3.1 The D-DOP process defines a descent algorithm, i.e., L(s_k) ≥ L(s_l) for any l > k.

The proof of the corollary follows immediately from equations (3.5) and (3.9) and the fact that δ_k = L(s_k) − L(s_{k+1}) in (3.11).

3.4. Stochastic On-Line Optimization Algorithm

In this section, we turn our attention to discrete resource allocation performed in a stochastic setting. In this case, as mentioned in Chapter 2, the cost function L(s) is usually an expectation whose exact value is difficult to obtain (except for very simple models). We therefore resort to estimates of L(s), which may be obtained through simulation or through direct on-line observation of a system. In either case, we denote by L̂_t(s) an estimate of L(s) based on observing a sample path for a time period of length t. We are now faced with the problem of finding the optimal allocation using the noisy information L̂_t(s). It should be clear that D-DOP, described by equations (3.5)-(3.9), does not work in a stochastic environment if we simply replace L(s) by its estimate L̂_t(s). For instance, suppose that δ_k > 0; due to noise, however, we may obtain an estimate of δ_k, denoted by δ̂_k, such that δ̂_k ≤ 0. In this case, rather than reallocating resources, we would remove a user from the set C permanently. This implies that this user could never receive any more resources, and hence the optimal allocation would never be reached. Two modifications are therefore necessary. First, we provide a mechanism through which users can re-enter the set C, to compensate for the case where a user is erroneously removed because of noise. Second, we progressively improve the estimates of the cost differences ∆L(s) so as to eliminate the effect of estimation noise; this can often be achieved by increasing the observed sample path length over which an estimate is taken.
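The effect of the two modifications just described can be illustrated by adapting the deterministic loop. In the sketch below, the shrinking Gaussian noise stands in for averaging over a growing observation interval, and the cost functions L_i(n) = c_i/(n+1) are hypothetical; both are illustrative assumptions, not the implementation of Appendix A.

```python
import random

def s_dop(dL, K, N, iters=300, rng=None):
    """Descent driven by noisy finite-difference estimates, with the two
    modifications of the text: (i) the candidate set is reset whenever it
    shrinks to one user, so erroneously removed users can re-enter, and
    (ii) estimation noise decreases over the iterations, standing in for a
    growing sample path length f(k)."""
    rng = rng or random.Random(0)
    n = [K // N + (1 if i < K % N else 0) for i in range(N)]
    C = set(range(N))
    for k in range(iters):
        noise = 1.0 / (k + 1)                  # illustrative noise schedule
        def est(i, m):
            return float("-inf") if m == 0 else dL[i](m) + rng.gauss(0.0, noise)
        i_s = max(C, key=lambda i: est(i, n[i]))
        j_s = min(C, key=lambda i: est(i, n[i]))
        d = est(i_s, n[i_s]) - est(j_s, n[j_s] + 1)
        if d > 0 and i_s != j_s:
            n[i_s] -= 1
            n[j_s] += 1
        else:
            C.discard(j_s)
            if len(C) <= 1:
                C = set(range(N))  # reset: removed users re-enter the set
    return n

dL = [lambda m, c=c: c / (m + 1) - c / m for c in (4.0, 1.0, 9.0)]
print(s_dop(dL, 6, 3))  # with decreasing noise, settles near the optimum
```

Because the comparisons are ordinal (only the sign of the noisy difference matters), early iterations can tolerate substantial estimation error, which is the robustness property emphasized throughout this chapter.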
We will henceforth denote the length of such a sample path at the kth iteration of our process by f(k). The following is the stochastic descent optimization process (S-DOP), i.e., D-DOP adjusted for a stochastic environment. The state is now denoted by {(ŝ_k, Ĉ_k)}, with ŝ_k = [n̂_{1,k}, ..., n̂_{N,k}]. After proper initialization, at the kth iteration we set:

n̂_{i,k+1} = n̂_{i,k} − 1   if i = î*_k and δ̂_k(î*_k, ĵ*_k) > 0
           = n̂_{i,k} + 1   if i = ĵ*_k and δ̂_k(î*_k, ĵ*_k) > 0
           = n̂_{i,k}       otherwise    (3.18)

and

Ĉ_{k+1} = Ĉ_k − {ĵ*_k}   if δ̂_k(î*_k, ĵ*_k) ≤ 0
        = C_0             if |Ĉ_k| = 1
        = Ĉ_k             otherwise    (3.19)

where

î*_k = arg max_{i∈Ĉ_k} {∆L̂_i^{f(k)}(n̂_{i,k})}    (3.20)

ĵ*_k = arg min_{i∈Ĉ_k} {∆L̂_i^{f(k)}(n̂_{i,k})}    (3.21)

δ̂_k(î*_k, ĵ*_k) = ∆L̂_{î*_k}^{f(k)}(n̂_{î*_k,k}) − ∆L̂_{ĵ*_k}^{f(k)}(n̂_{ĵ*_k,k} + 1)    (3.22)

It is clear that equations (3.18)-(3.22) define a Markov process {(ŝ_k, Ĉ_k)} whose state transition probability matrix is determined by î*_k, ĵ*_k, and δ̂_k(î*_k, ĵ*_k). Before proceeding, let us point out that the only structural difference between S-DOP and the deterministic process D-DOP of the previous section occurs in equation (3.19), where we reset the set Ĉ_k every time it contains only one element. By doing so, we allow users that have been erroneously removed from Ĉ_k due to noise to re-enter the user set at the next step. Another difference is of course that the actual values of all ∆L_i(·) are now replaced by their estimates ∆L̂_i^{f(k)}(·). An implementation of S-DOP is included in Appendix A.

3.4.1. Properties of the S-DOP Process

Before describing the properties of S-DOP, let us make some assumptions about the behavior of the objective function under consideration. As stated earlier, the second modification we impose is to eliminate the effect of estimation noise by increasing the observed sample path length as the number of iterations increases. In addition to the ergodicity assumption (A2.1), we make the following assumption.
A3.4: Let δ_k(i, j) = ∆L_i(n̂_{i,k}) − ∆L_j(n̂_{j,k} + 1). Whenever δ_k(î*_k, ĵ*_k) = 0, there is a constant p_0 such that

Pr[δ̂_k(î*_k, ĵ*_k) ≤ 0 | δ_k(î*_k, ĵ*_k) = 0, (ŝ_k, Ĉ_k) = (s, C)] ≥ p_0 > 0

for any k and any pair (s, C).

Assumption A3.4 guarantees that an estimate does not always give one-sided, biased, incorrect information. This assumption is mild and usually holds in the context of the discrete-event dynamic systems where such problems arise. We are now ready to describe some useful properties of the process {(ŝ_k, Ĉ_k)} in the form of the following lemmas, the proofs of which are included in Appendix B. These properties pertain to the asymptotic behavior of the probabilities of certain events crucial to the behavior of {(ŝ_k, Ĉ_k)}. First, let

d_k(s, C) = 1 − Pr[L(ŝ_{k+1}) ≤ L(ŝ_k) | (ŝ_k, Ĉ_k) = (s, C)]    (3.23)

so that [1 − d_k(s, C)] is the probability that either some cost reduction or no change in cost results from the kth transition of our process (i.e., the new allocation has at most the same cost). The next lemma shows that the probability of this event is asymptotically 1, i.e., our process corresponds to an asymptotic descent resource allocation algorithm.

Lemma 3.4.1 For any s = [n_1, n_2, ..., n_N] ∈ S and any C,

lim_{k→∞} d_k(s, C) = 0    (3.24)

Moreover, define

d̄_k = sup_{i≥k} max_{(s,C)} d_i(s, C).    (3.25)

Then d̄_k ≥ d_k(s, C), d̄_k is monotone decreasing, and

lim_{k→∞} d̄_k = 0.    (3.26)

Next, given any state (ŝ_k, Ĉ_k) reached by the S-DOP process, define

A_k^max = { j | ∆L_j(n̂_{j,k}) = max_i {∆L_i(n̂_{i,k})} },    (3.27)

A_k^min = { j | ∆L_j(n̂_{j,k}) = min_i {∆L_i(n̂_{i,k})} }.    (3.28)

Observe that A_k^max and A_k^min are, respectively, the sets of indices i*_k and j*_k defined in (3.7) and (3.8) for the deterministic optimization process (with exact measurements). Recall that i*_k and j*_k need not be unique at each step k, hence the need for these sets.
We then define

    a_k(s, C) = 1 − Pr[ î*_k ∈ A_k^max | (ŝ_k, Ĉ_k) = (s, C) ],          (3.29)

    b_k(s, C) = 1 − Pr[ ĵ*_k ∈ A_k^min | (ŝ_k, Ĉ_k) = (s, C) ].          (3.30)

Here, [1 − a_k(s, C)] is the probability that our stochastic resource allocation process at step k correctly identifies an index î*_k as belonging to the set A_k^max (similarly for [1 − b_k(s, C)]).

Lemma 3.4.2 Suppose that Assumption A2.1 holds. Then, for every pair (s, C), we have

    lim_{k→∞} a_k(s, C) = 0,   lim_{k→∞} b_k(s, C) = 0.                  (3.31)

Moreover, define

    ā_k = sup_{i≥k} max_{(s,C)} a_i(s, C),   b̄_k = sup_{i≥k} max_{(s,C)} b_i(s, C).   (3.32)

Then ā_k ≥ a_k(s, C), b̄_k ≥ b_k(s, C), both ā_k and b̄_k are monotone decreasing, and

    lim_{k→∞} ā_k = 0,   lim_{k→∞} b̄_k = 0.                              (3.33)

The proof of the first part of the lemma follows immediately from Lemma 2.2.1 given the definitions of the sets A_k^max and A_k^min. The second part then follows from the fact that, by their definitions, ā_k and b̄_k are monotone decreasing.

The last asymptotic property we need establishes the fact that there will be an improvement (i.e., strictly lower cost) to an allocation at step k if that allocation is not optimal. However, this improvement may not occur within a single step; rather, we show in Lemma 3.4.3 that such improvement may require a number of steps, α_k, beyond the kth step, where α_k is an increasing sequence that satisfies certain technical requirements (see Appendix B.7). Next, define:

    e_k(s, C) = 1 − Pr[ L(ŝ_{k+α_k}) < L(ŝ_k) | (ŝ_k, Ĉ_k) = (s, C) ]    (3.34)

and observe that [1 − e_k(s, C)] is the probability that strict improvement (i.e., strictly lower cost) results when transitioning from a state whose allocation is not optimal to a future state α_k steps later. Lemma 3.4.3, whose proof is included in Appendix B, establishes that this probability is asymptotically 1.

Lemma 3.4.3 Suppose that the ergodicity assumption (A2.1) as well as A3.4 hold.
For any allocation s = [n₁, …, n_N] ≠ s* and any set C,

    lim_{k→∞} e_k(s, C) = 0.                                             (3.35)

Moreover, define

    ē_k = sup_{i≥k} max_{s∈S,C} e_i(s, C).                               (3.36)

Then ē_k ≥ e_k(s, C), ē_k is monotone decreasing, and

    lim_{k→∞} ē_k = 0.                                                   (3.37)

3.4.2. Convergence of the S-DOP Process

With the help of the properties established in the previous section, we can prove the following theorem on the convergence of the S-DOP process; its proof is included in Appendix B.

Theorem 3.4.1 Suppose that the ergodicity assumption A2.1 and A3.4 hold and that the optimum s* is unique. Then the S-DOP process described by equations (3.18)-(3.22) converges in probability to the optimal allocation s*.

Remark: If the optimal allocation s* is not unique, the analysis above can be extended to show that convergence is to a set of "equivalent" allocations, as long as each optimum neighbors at least one other optimum. When this arises in practice, what we often observe is oscillation between allocations that all yield optimal performance.

3.4.3. A Stronger Convergence Result

By proper selection of the sample path length f(k), i.e., the kth estimation period, and under the additional mild assumptions of Lemma 2.2.2, we can show that the following lemma holds:

Lemma 3.4.4 Assume that, for every i, the estimate L̂_i^t(n_i) satisfies the assumptions of Lemma 2.2.2. Then, for any i, j, i ≠ j,

    Pr[ΔL̂_i^t(n_i) ≥ ΔL̂_j^t(n_j)] = O(1/t)

and

    Pr[ΔL̂_i^t(n_i) < ΔL̂_j^t(n_j)] = 1 − O(1/t)

provided that ΔL_i(n_i) < ΔL_j(n_j).

Using Lemma 3.4.4, we can show that the S-DOP process converges almost surely. This result is formally stated in the following theorem, the proof of which is included in Appendix B.

Theorem 3.4.2 Suppose A3.1-A3.3 and the assumptions of Lemma 3.4.4 hold. If f(k) ≥ k^{1+c} for some constant c > 0, then the S-DOP process converges almost surely to the global optimal allocation.

3.5.
Future Directions

The algorithms described in this chapter are easy to implement either on-line or off-line and are robust with respect to estimation noise. In addition, they converge fast; as shown in [23], for the class of regenerative systems they converge exponentially fast. Given these advantages, it is interesting to see whether such algorithms can work for more general systems that do not fall under (RA2) and/or systems that violate assumptions A3.1 and A3.2 (i.e., systems that are neither convex nor separable). The first question that arises is what would happen if we simply applied either D-DOP or S-DOP to a general system. The answer is that there are two potential problems:

1. The algorithm might oscillate between two or more allocations, and so it will never converge.

2. The algorithm may converge to some allocation other than the global optimum (we will refer to such allocations as "local" optima).

For the first problem, we can find a quick fix using the properties derived in Section 3.3.2. Specifically, properties P3 and P5 state that once a user has given up one or more resources, it cannot receive any resource back. Enforcing such a policy guarantees that the algorithm will converge to an allocation; however, there is no guarantee that this allocation is the global optimum. To solve the second problem, i.e., to escape a local optimum, we can use principles from Random Search techniques [37]. The basic idea is to randomly pick an initial allocation and then let D-DOP or S-DOP evolve until it reaches its final allocation. Once this allocation is reached, randomly pick a new initial allocation and repeat the same process.
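The random multi-start idea just described can be sketched as follows. All names here are illustrative: `descent` stands for any local routine such as the modified D-DOP, and `cost` for the objective being minimized.

```python
import random

def random_restart(descent, cost, N, K, n_restarts, seed=0):
    """Multi-start search sketch: draw a random feasible allocation of
    K resources over N users, run the supplied `descent` routine (e.g.
    the modified D-DOP) from it, and keep the best local optimum."""
    rng = random.Random(seed)
    best_x, best_c = None, float("inf")
    for _ in range(n_restarts):
        # random feasible start: drop K resources uniformly over N users
        x = [0] * N
        for _ in range(K):
            x[rng.randrange(N)] += 1
        x = descent(x)              # converges to a "local" optimum
        c = cost(x)
        if c < best_c:
            best_x, best_c = x, c
    return best_x, best_c
```

A fixed seed makes the runs reproducible; in practice the number of restarts trades off computation against the chance of escaping all local optima.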
Figure 3.1 shows several sample paths of this algorithm when it is used to find the minimum of the deterministic function shown below:

    f(x₁, x₂, x₃, x₄) =
        10                                                   if x₁ ≤ 2
        5 |cos(0.01π x₂ x₃) + cos(0.005π x₄)|                if 2 < x₁ ≤ 15
        5/(1+x₂) + 5/(1+x₃) + 1500/(2x₁+3x₂+4x₃+5x₄)²        if 15 < x₁ ≤ 25
        9                                                    if x₁ > 25
                                                                         (3.38)

where x₁, …, x₄ are integers such that Σ_{i=1}^4 x_i = 30 and x_i ≥ 0, i = 1, …, 4. The minimum value for this problem is 0 and it occurs at eight allocations out of the possible 5,456. If we apply the D-DOP scheme as presented in Section 3.3, it does not converge, because the cosine function causes the algorithm to oscillate. Thus we enforce P3 and P5 (i.e., we do not allow a user to get a resource if it has given up one or more resources). As seen in Figure 3.1, the modified D-DOP starts at a bad allocation and quickly converges to a local minimum. At this point, in order to get out of the local minimum, it randomly selects different initial allocations and implements D-DOP until one results in an allocation with better performance.

3.6. Summary

In this chapter we considered a class of DES with separable convex cost functions, i.e., problems of the form (RA2) that satisfy assumptions A3.1 and A3.2. For this class of systems, we derived necessary and sufficient conditions that the optimal allocation must satisfy. Based on these conditions, we developed an optimization algorithm which, in a deterministic environment, i.e., when the objective function is known with certainty, yields the optimal allocation in a finite number of steps. Subsequently, we adapted the optimization algorithm for stochastic environments, i.e., when only noisy estimates of the performance measure are available, and showed that it converges to the optimal allocation in probability and, under some mild assumptions, almost surely as well. Finally, these algorithms have several desirable properties.
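The count of 5,456 feasible allocations quoted above (all nonnegative integer 4-tuples summing to 30, i.e., C(33,3)) can be checked by direct enumeration; a small sketch:

```python
def feasible_allocations(total, users):
    """Enumerate the set A_total of the text: all nonnegative integer
    allocations of `total` resources among `users` stages."""
    if users == 1:
        return [(total,)]
    out = []
    for first in range(total + 1):                 # resources to stage 1
        for rest in feasible_allocations(total - first, users - 1):
            out.append((first,) + rest)
    return out

allocs = feasible_allocations(30, 4)
# the text's count of possible allocations
assert len(allocs) == 5456
```

The same enumeration is also how one would verify by brute force which allocations attain the minimum of a small test function such as (3.38).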
They are easy to implement and can be used for both on-line and off-line applications. They converge fast and are robust with respect to estimation noise.

[Figure 3.1: Evolution of the modified D-DOP (cost vs. iteration)]

Chapter 4

INCREMENTAL ALGORITHMS FOR DISCRETE-RESOURCE ALLOCATION

Not all problems have a nice separable structure. In this chapter we develop incremental optimization algorithms that can be applied to systems with non-separable structure. The optimality properties of the algorithms are proven under a necessary and sufficient "smoothness condition". Furthermore, it is shown that the incremental algorithms converge to the optimal allocation in probability and, under additional mild conditions, almost surely as well.

4.1. Problem Formulation

In this chapter we consider systems with performance measures J(x). The major difficulty here is due to the fact that the performance measures are not separable. In other words, one cannot express J(x) as J(x) = Σ_{i=1}^N J_i(x_i); therefore the algorithms described in the previous chapter do not directly apply. We will make use of the following definitions. First, e_i = [0, …, 0, 1, 0, …, 0] is an N-dimensional vector with all of its elements zero except the ith element, which is equal to 1. Second,

    ΔJ_i(x) = J(x + e_i) − J(x)                                          (4.1)

is the change in J(x) due to the addition of a new resource to the ith element of an allocation x = [x₁, …, x_N]. In other words, it is the sensitivity of J(x) with respect to x_i. Finally, let

    A_k = { x : Σ_{i=1}^N x_i = k, x_i ≥ 0 },   k = 0, 1, …

be the set of all possible allocations of k resources to N stages.
Using the above definitions, the optimization problem is formally stated as:

    (RA3)   max_{x∈A_K} J(x)

In addition, we define the following conditions on J(x):

• Smoothness Condition, or Condition (S): If J(x*) ≥ J(x) for some x* ∈ A_k and any x ∈ A_k, k = 1, …, K, then

    max_{i=1,…,N} J(x* + e_i) ≥ max_{i=1,…,N} J(x + e_i)                 (4.2)

• Complementary Smoothness Condition, or Condition (S̄): If J(x*) ≤ J(x) for some x* ∈ A_k and any x ∈ A_k, k = 1, …, K, then

    min_{i=1,…,N} J(x* + e_i) ≤ min_{i=1,…,N} J(x + e_i)                 (4.3)

• Uniqueness Condition, or Condition (U): Let i* = arg max_{i=1,…,N} {ΔJ_i(x)}; then

    ΔJ_{i*}(x) > ΔJ_j(x)                                                 (4.4)

for any x ∈ A_k, k = 1, …, K, and any j ≠ i*.

Conditions (S) and (S̄) imply that if in an optimal allocation of k resources a user i is given n_i resources, then, in an optimal allocation of k + 1 resources, user i will receive at least n_i resources. These conditions might sound restrictive, but, as indicated in a later chapter, they are satisfied by a wide range of systems (see Section 6.2). Condition (U) requires that at every allocation the maximum finite difference as defined in (4.1) is unique. This is a rather technical condition, as will become clear in the sequel, and it may be relaxed as shown in Section 4.2.3.

4.2. Deterministic Case

Problem (RA3) falls in the class of discrete resource allocation problems. Under conditions (S) and (U), the following simple incremental allocation process, similar to one found in [41], provides an optimal allocation of K resources in K steps.

4.2.1. Deterministic Incremental Optimization Algorithm (DIO)

Define the sequence {x_k}, k = 0, …, K, such that

    x_{k+1} = x_k + e_{i*_k}                                             (4.5)

where

    i*_k = arg max_{i=1,…,N} {ΔJ_i(x_k)}                                 (4.6)

and x₀ := [0, …, 0]. After K steps, x_K is the optimal solution of (RA3), as shown in the theorem that follows.

Theorem 4.2.1 For any k = 0, 1, 2, …, x_k defined in (4.5) yields a solution to problem (RA3) if and only if J(x) satisfies conditions (S) and (U).
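The DIO recursion (4.5)-(4.6) is short enough to state directly in code. The following is a minimal sketch, with `J` a caller-supplied performance measure assumed to satisfy conditions (S) and (U):

```python
def dio(J, N, K):
    """Deterministic incremental optimization (DIO), eqs. (4.5)-(4.6):
    start from x0 = [0,...,0] and, at each of K steps, give the next
    resource to the stage with the largest finite difference."""
    x = [0] * N
    for _ in range(K):
        def dJ(i):
            # ΔJ_i(x) = J(x + e_i) - J(x), eq. (4.1)
            y = list(x)
            y[i] += 1
            return J(y) - J(x)
        i_star = max(range(N), key=dJ)   # eq. (4.6)
        x[i_star] += 1                   # eq. (4.5)
    return x
```

For a separable concave example such as J(x) = 5x₁ − x₁² + 3x₂ − 0.5x₂², the greedy marginals decide each step, and the K-step run matches the remark above: K evaluations of the marginals instead of an exhaustive search over all of A_K.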
The proof of this theorem is included in Appendix C. It should come as no surprise that a similar algorithm will deliver an allocation that minimizes an objective function that satisfies conditions (S̄) and (U).

4.2.2. Complementary Deterministic Incremental Optimization Algorithm (D̄IO)

Consider the complementary problem

    (R̄A3)   min_{x∈A_K} J(x)

To solve (R̄A3), define the sequence {x_k}, k = 0, …, K, such that

    x_{k+1} = x_k + e_{i*_k}                                             (4.7)

where

    i*_k = arg min_{i=1,…,N} {ΔJ_i(x_k)}                                 (4.8)

and x₀ := [0, …, 0]. After K steps, x_K is the optimal solution of (R̄A3), as shown in the theorem that follows.

Theorem 4.2.2 For any k = 0, 1, 2, …, x_k defined in (4.7) yields a solution to problem (R̄A3) if and only if J(x) satisfies conditions (S̄) and (U).

The proof of this theorem is similar to the proof of Theorem 4.2.1 and is omitted.

Remarks: If there are K available resources to be allocated to N stages, then the DIO as well as the D̄IO processes require K steps before they deliver the optimal allocation. In contrast, exhaustive search requires a number of steps which is combinatorially explosive: (K+N−1)!/((N−1)!K!). It is possible to relax the Uniqueness condition (U) through a straightforward extension of the DIO algorithm, as described in the next section.

4.2.3. Extension of the Incremental Optimization Algorithms

Suppose that the sequence {x_k} in (4.5) yields x̄_k after k steps and assume that (4.6) attains its maximum at two distinct indices i, j ∈ {1, …, N}. In this case, it is clear that J(x̄_k + e_i) = J(x̄_k + e_j), but the process has no way of distinguishing between i and j in order to define a unique new state x_{k+1} given x_k = x̄_k. Note also that random selection cannot guarantee convergence to the optimum, since it is possible that at the next iteration only one of the two allocations (either x̄_k + e_i or x̄_k + e_j) can yield the optimum.
Since there is inadequate information to choose between i and j, it is natural to postpone the decision until more information is available. To achieve this we modify the process as described next, by using a recursion on a set of allocations U_k ⊆ A_k. In particular, we define a sequence of sets {U_k}, k = 0, …, K, such that

    U_{k+1} = { x_k + e_i | ΔJ_i(x_k) = ΔJ_{i*_k}(x_k), i = 1, …, N, x_k ∈ U_k }   (4.9)

where

    i*_k = arg max_{i=1,…,N, x_k∈U_k} {ΔJ_i(x_k)}                        (4.10)

and U₀ = {x₀}, x₀ = [0, 0, …, 0]. After K steps, it is easy to see that any allocation in U_K is an optimal solution to (RA3). The extra cost incurred by this scheme compared to (4.5)-(4.6) involves storing additional information. It is straightforward to show that a similar extension applies to D̄IO, and so it is omitted.

4.3. Stochastic Case

In this section we focus our attention on the resource allocation problem in a stochastic environment. Following the definitions of Chapter 2, we assume that the performance measure is of the form of an expectation, J(x) = E[L(x)], where L(x) is a sample function used as the noisy performance estimate. The problem then is to determine the optimal resource allocation based on a scheme similar to the DIO algorithm defined by (4.5)-(4.6), now driven by estimates of J(x_k). In particular, let Ĵ^t(x_k) denote a noisy estimate of J(x_k) obtained through simulation or on-line observation of the system over an "estimation period" t. Clearly, the DIO algorithm (as well as D̄IO and their extensions of Section 4.2.3) can no longer guarantee convergence in such a stochastic setting. For instance, suppose that at the kth step of the allocation process i*_k = j but, due to noise, we obtain an estimate î*_k = m ≠ j. In this case, the mth stage will get an additional resource, whereas it is possible that at the optimal allocation the mth stage has only as many resources as it had prior to the kth iteration.
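The set-valued recursion (4.9)-(4.10) can be sketched as follows; a minimal illustration, assuming again a caller-supplied J:

```python
def dio_sets(J, N, K):
    """Set-valued DIO extension, eqs. (4.9)-(4.10): carry forward every
    allocation achieving the maximal finite difference, so ties in
    ΔJ_i(x) need not be broken arbitrarily."""
    U = {(0,) * N}                             # U_0 = {x0}
    for _ in range(K):
        # ΔJ_i(x) over all x in U_k and all stages i
        dJ = lambda x, i: J(x[:i] + (x[i] + 1,) + x[i+1:]) - J(x)
        best = max(dJ(x, i) for x in U for i in range(N))   # (4.10)
        U = {x[:i] + (x[i] + 1,) + x[i+1:]                  # (4.9)
             for x in U for i in range(N) if dJ(x, i) == best}
    return U
```

On a symmetric objective the tie at the first step keeps both candidate allocations alive, and the set collapses back to a single allocation once one branch dominates, which is exactly the postponed-decision behavior described above.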
Since there is no way of reallocating resources to another stage, the optimal allocation will never be reached. With this observation in mind, we introduce a number of modifications to the DIO algorithm. First, as in the previous chapter, there should be a mechanism through which resources erroneously allocated to some stage are reallocated to other stages. Second, it must be possible to progressively improve the performance estimates so as to eliminate the effects of estimation noise. Toward this goal, let f(l) denote the length of the sample path on the lth iteration and let it be such that lim_{l→∞} f(l) = ∞. We then define a stochastic process {x̂_{k,l}}, k = 0, …, K, l = 1, 2, …, as follows:

    x̂_{k+1,l} = x̂_{k,l} + e_{î*_{k,l}},   k = 0, …, K − 1                (4.11)

for all l = 1, 2, …, and every K iterations, i.e., after allocation x̂_{K,l}, the process is reset to

    x̂_{0,l+1} = [0, …, 0],   l = 1, 2, …                                 (4.12)

where

    î*_{k,l} = arg max_{i=1,…,N} { ΔĴ_i^{f(l)}(x̂_{k,l}) }.               (4.13)

We will subsequently refer to the above allocation scheme as the Stochastic Incremental Optimization (SIO) algorithm. Note that in order to derive the stochastic version of D̄IO, (S̄IO), we simply replace the max operator in (4.13) with a min operator. Again we make the ergodicity assumption A2.1 about the performance estimates Ĵ^t(x) and prove the following result.

Theorem 4.3.1 For any performance measure J(x) that satisfies assumption A2.1 and conditions (S) and (U), {x̂_{K,l}} converges in probability to the global optimal allocation as l → ∞.

The proof of the theorem is included in Appendix C.

4.3.1. Stronger Convergence Results

Under some additional mild conditions and by properly selecting the "observation interval" t = f(l), it is possible to show that the SIO process converges to the optimal allocation almost surely. First, assume that Ĵ(x) satisfies the conditions of Lemma 2.2.2; therefore Lemma 3.4.4 also holds.
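The SIO scheme (4.11)-(4.13) can be sketched as below. This is an illustration only: `est_J(x, t)` is a hypothetical caller-supplied noisy estimator of J(x) over an observation interval t, and the default schedule `f` is just one choice satisfying lim f(l) = ∞.

```python
def sio(est_J, N, K, n_passes, f=lambda l: 100 * l):
    """Stochastic incremental optimization (SIO), eqs. (4.11)-(4.13).

    est_J   : est_J(x, t) -> noisy estimate of J(x) over a sample
              path of length t (supplied by the caller)
    f       : growing observation-interval schedule
    Each pass l rebuilds the allocation from x0 = [0,...,0], per the
    reset (4.12); the final allocations x_hat_{K,l} converge (in
    probability) to the optimum as l grows.
    """
    x = None
    for l in range(1, n_passes + 1):
        x = [0] * N                        # reset, eq. (4.12)
        t = f(l)                           # longer paths as l grows
        for _ in range(K):
            base = est_J(x, t)
            def dJ(i):
                y = list(x)
                y[i] += 1
                return est_J(y, t) - base
            x[max(range(N), key=dJ)] += 1  # eqs. (4.11), (4.13)
    return x
```

Replacing `max` with `min` in the last line gives the corresponding sketch of the complementary (minimizing) version.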
As in Chapter 3, we prove the following result, the proof of which is included in Appendix C.

Theorem 4.3.2 For any performance measure J(x) that satisfies conditions (S) and (U) and the conditions of Lemma 3.4.4, if the observation interval f(l) ≥ l^{1+c} for some constant c > 0, the process {x̂_{k,l}} converges to the global optimal allocation almost surely.

Note that it is straightforward to derive similar convergence results for S̄IO, and so they are omitted.

4.4. Discussion on the Incremental Algorithm

First note that if the Uniqueness condition (U) is not satisfied, then there is no guarantee that either SIO or S̄IO will converge to the optimum in any sense. In this case, one can proceed in a way similar to the set-iterative process described in Section 4.2.3. For example, one can include in the set U_k all the allocations whose estimated performance lies within some distance r_k from the observed maximum/minimum, where r_k is a decreasing sequence. Second, using the "Law of Diminishing Returns" from economic theory (e.g., see [2]), it is expected that as the number of resources increases, then, assuming no economies of scale, the marginal impact of every additional resource on the objective function will be decreasing. This implies the following:

1. ΔJ(x*_k) = J(x*_k) − J(x*_{k−1}), where x*_k is the solution to (RA3) when there are k available resources, is a decreasing function of k.

2. As k increases, the set of allocations that exhibit optimal or near-optimal performance grows larger.

3. Pr[x̂_{k,l} = x*_k] is a decreasing function of k, since the number of near-optimal allocations increases and hence the distance between two or more such allocations increases.

These repercussions have practical implications for the usage of the developed incremental algorithms. First, if the total number of resources is not fixed, then one can determine the maximum number of resources based on a cost-benefit analysis.
In other words, one can continue adding resources as long as the marginal benefit of the new resource, ΔJ(x*_k), is greater than the cost of the resource. Second, there are implications in terms of the convergence of the stochastic version of the algorithm. For small values of k, there are fewer optimal allocations and the difference in performance between such allocations and any other allocation is large. As a consequence, the probability of identifying the optimal allocation under a small k is also large. As k increases, there are more near-optimal allocations, hence the probability of identifying the true optimum decreases. As a result, the algorithm may oscillate within a set of allocations with near-optimal performance. This can be used to make the algorithm even more efficient. For example, in SIO, every K steps the algorithm is reset to x₀ = [0, …, 0], as shown in (4.12). Rather than resetting to x₀ for all l = 1, 2, …, it may be more efficient to reset to x₀ only for l = n, 2n, …, where n > 1 is an integer. For all other iterations we may reset to an allocation x_z, z < K, such that x_z^i ≤ x̂_K^i for all i = 1, …, N. That is, find an allocation that is common to all allocations that are picked as optimal and make it the initial allocation.

4.5. Summary

In this chapter we considered another class of DES, one that does not have separable objective functions but which satisfies either the "smoothness" (S) or the "complementary smoothness" (S̄) condition. For this class of systems, we developed an incremental optimization algorithm which, in a deterministic environment, yields the optimal allocation in a finite number of steps. Finally, we adapted the optimization algorithm for use in stochastic environments and showed that it converges to the optimal allocation in probability and, under some mild assumptions, almost surely as well.
Chapter 5

PERTURBATION ANALYSIS

The optimization algorithms presented in the previous two chapters are contingent upon availability of the value of the finite difference ΔJ, or at least the availability of an unbiased estimate of its value. This implies the observation of at least two sample paths, one under parameter x and one under x + e_i. Hence, at least two sample paths are required to obtain one such Δ. Our objective, however, is to use the optimization algorithms on-line. In other words, we would like to develop controllers that would be able to observe a real system and automatically reallocate the system's resources to maintain optimal performance. Note that, when observing a real system operating under some parameter x, it is usually straightforward to obtain one of the necessary estimates needed to calculate the required difference, i.e., Ĵ(x). The objective of this chapter is to develop techniques that enable us to predict the system's performance under other parameters while observing the system under x, without actually switching to the other parameter.

5.1. Introduction

It is by now well documented in the literature that the nature of sample paths of DES can be exploited so as to extract a significant amount of information, beyond merely an estimate of J(θ). It has been shown that observing a sample path under some parameter value θ allows us to efficiently obtain estimates of derivatives of the form dJ/dθ which are in many cases unbiased and strongly consistent (e.g., see [12, 30, 35], where Infinitesimal Perturbation Analysis (IPA) and its extensions are described). Similarly, Finite Perturbation Analysis (FPA) has been used to estimate finite differences of the form ΔJ(Δθ), or to approximate the derivative dJ/dθ through ΔJ/Δθ when other PA techniques fail.
In the discrete-resource allocation context, of particular interest are often parameters θ that take values from a discrete set {θ₁, …, θ_m} (e.g., queueing capacities), in which case we desire to effectively construct sample paths under any of θ₁, …, θ_m by just observing a sample path under one of these parameter values. In this chapter, we develop Concurrent Estimation (CE), a general approach for constructing the sample paths under any parameter θ₁, …, θ_m using observations of a single sample path under θ. Subsequently, we develop a Finite Perturbation Scheme which takes advantage of the special structure of some systems to construct sample paths under neighboring allocations, i.e., allocations that result from adding a resource to one of the users.

5.2. Problem Definition

We will concentrate on the general sample path constructability problem for DES. That is, given a sample path under a particular parameter value θ, the problem is to construct multiple sample paths of the system under different values using only information available along the given sample path. A solution to this problem can be obtained when the system under consideration satisfies the Constructability Condition (CO) presented in [11, 10]. Suppose that a sample path of the system is observed under parameter θ and we would like to construct the corresponding sample path under some θ′. Then (CO) consists of two parts. The first part is the Observability Condition (OB), which states that at every state the feasible event set of the constructed sample path must be a subset of the feasible event set of the observed sample path. The second part is a requirement that all lifetimes of feasible events, conditioned on event ages, are equal in distribution. Unfortunately, (CO) is not easily satisfied. Nonetheless, two methods have been developed that solve (CO) for systems with exponential lifetime distributions.
In particular, the Standard Clock (SC) approach [74] solves the sample path constructability problem for models with exponentially distributed event lifetimes by exploiting the well-known uniformization technique for Markov chains. This approach allows the concurrent construction of multiple sample paths under different (continuous or discrete) parameters at the expense of introducing "fictitious" events. Chen and Ho [18] have proposed a Generalized Standard Clock approach that uses approximation techniques to extend the SC approach to systems with non-exponential event lifetime distributions. On the other hand, Augmented System Analysis (ASA) [11, 10] solves the constructability problem by "suspending" the construction of one or more paths during certain segments of the observed sample path, in a way such that the stochastic characteristics of the observed sample path are preserved. In ASA, it is still necessary to assume exponential event lifetime distributions, although, with a minor extension, it is possible to allow at most one event to have a non-exponential lifetime distribution (see [12, 10] for details).

We consider a DES and adopt the modeling framework of a stochastic timed state automaton (E, X, Γ, f, x₀) (see [12]). Here, E is a countable event set, X is a countable state space, and Γ(x) is a set of feasible (or enabled) events, defined for all x ∈ X such that Γ(x) ⊆ E. The state transition function f(x, e) is defined for all x ∈ X, e ∈ Γ(x), and specifies the next state resulting when e occurs at state x. Finally, x₀ is a given initial state.

Remark: The definition is easily modified to (E, X, Γ, p, p₀) in order to include probabilistic state transition mechanisms. In this case, the state transition probability p(x′; x, e′) is defined for all x, x′ ∈ X, e′ ∈ E, and is such that p(x′; x, e′) = 0 for all e′ ∉ Γ(x). In addition, p₀(x) is the pmf P[x₀ = x], x ∈ X, of the initial state x₀.
Assuming the cardinality of the event set E is N, the input to the system is a set of event lifetime sequences {V₁, …, V_N}, one for each event, where V_i = {v_i(1), v_i(2), …} is characterized by some arbitrary distribution. Under some system parameter θ₀, the output is a sequence ξ(θ₀) = {(e_k, t_k), k = 1, 2, …}, where e_k ∈ E is the kth event and t_k is its corresponding occurrence time (see Figure 5.1). Based on any observed ξ(θ₀), we can evaluate L[ξ(θ₀)], a sample performance metric for the system. For a large family of performance metrics of the form J(θ₀) = E[L[ξ(θ₀)]], L[ξ(θ₀)] is therefore an estimate of J(θ₀). Defining a set of parameter values of interest {θ₀, θ₁, …, θ_M}, the sample path constructability problem is:

For a DES under θ₀, construct all sample paths ξ(θ₁), …, ξ(θ_M) given a realization of lifetime sequences V₁, …, V_N and the sample path ξ(θ₀).

[Figure 5.1: The sample path constructability problem for DES — the lifetime sequences V₁, …, V_N drive the DES under each of θ₀, θ₁, …, θ_M, producing the sample paths ξ(θ₀), ξ(θ₁), …, ξ(θ_M).]

We emphasize that the proposed schemes are suited to on-line sample path construction, where actual system data are processed for performance estimation purposes. Furthermore, unlike SC and ASA, they can be used for arbitrary lifetime distributions.

5.3. Concurrent Simulation

For simplicity, in the rest of this section we assume that the DES under investigation satisfies the following three assumptions.

A5.1: Feasibility Assumption: Let x_n be the state of the DES after the occurrence of the nth event. Then, for any n, there exists at least one r > n such that e ∈ Γ(x_r) for any e ∈ E.

A5.2: Invariability Assumption: Let E be the event set under the nominal parameter θ₀ and let E_m be the event set under θ_m ≠ θ₀. Then E_m = E.
A5.3: Similarity Assumption: Let G_i(θ₀), i ∈ E, be the event lifetime distribution for event i under θ₀ and let G_i(θ_m), i ∈ E, be the corresponding event lifetime distribution under θ_m. Then G_i(θ₀) = G_i(θ_m) for all i ∈ E.

Assumption A5.1 guarantees that in the evolution of any sample path all events in E will always become feasible at some point in the future. If for some DES assumption A5.1 is not satisfied, i.e., there exists an event α that never gets activated after some point in time, then, as we will see, it is possible that the construction of some sample path will remain suspended forever waiting for α to happen. Note that a DES with an irreducible state space immediately satisfies this condition. Assumption A5.2 states that changing a parameter from θ₀ to some θ_m ≠ θ₀ does not alter the event set E. More importantly, A5.2 guarantees that changing to θ_m does not introduce any new events, so that all event lifetimes for all events can be observed from the nominal sample path. Finally, assumption A5.3 guarantees that changing a parameter from θ₀ to some θ_m ≠ θ₀ does not affect the distribution of one or more event lifetime sequences. This allows us to use exactly the same lifetimes that we observe in the nominal sample path to construct the perturbed sample path. In other words, our analysis focuses on structural system parameters rather than distributional parameters, which is appropriate for the resource allocation problems that we are dealing with in this thesis. Note that these assumptions can be relaxed at some additional computational cost, as discussed later in this chapter. Before presenting the coupling approach we use to solve the constructability problem and the explicit procedure we will refer to as the Time Warping Algorithm, let us present the necessary notation.

5.3.1.
Notation and Definitions

First, let ξ(n, θ) = {e_j : j = 1, …, n}, with e_j ∈ E, be the sequence of events that constitute the observed sample path up to n total events. Although ξ(n, θ) is clearly a function of the parameter θ, we will write ξ(n) to refer to the observed sample path and adopt the notation ξ̃(k) = {ẽ_j : j = 1, …, k} for any constructed sample path under a different value of the parameter, up to k events in that path. It is important to realize that k is actually a function of n, since the constructed sample path is coupled to the observed sample path through the observed event lifetimes. However, again for the sake of notational simplicity, we will refrain from continuously indicating this dependence. Next we define the score of an event i ∈ E in a sequence ξ(n), denoted by s_i^n = [ξ(n)]_i, to be the non-negative integer that counts the number of instances of event i in this sequence. The corresponding score of i in a constructed sample path is denoted by s̃_i^k = [ξ̃(k)]_i. In what follows, all quantities with the symbol " ˜ " refer to a typical constructed sample path. Associated with every event type i ∈ E in ξ(n) is a sequence of s_i^n event lifetimes

    V_i(n) = {v_i(1), …, v_i(s_i^n)}   for all i ∈ E

The corresponding set of sequences in the constructed sample path is

    Ṽ_i(k) = {v_i(1), …, v_i(s̃_i^k)}   for all i ∈ E

which is a subsequence of V_i(n) with k ≤ n. In addition, we define the following sequence of lifetimes:

    V_i(n, k) = {v_i(s̃_i^k + 1), …, v_i(s_i^n)}   for all i ∈ E

which consists of all event lifetimes that are in V_i(n) but not in Ṽ_i(k). Associated with any one of these sequences are the following operations. Given some W_i = {w_i(j), …, w_i(r)},

Suffix Addition: W_i + {w_i(r + 1)} = {w_i(j), …, w_i(r), w_i(r + 1)}

and

Prefix Subtraction: W_i − {w_i(j)} = {w_i(j + 1), …, w_i(r)}.
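The two sequence operations are exactly the behavior of a FIFO queue: new lifetimes enter at the tail and old lifetimes leave from the head. A tiny sketch using a double-ended queue:

```python
from collections import deque

# Each lifetime sequence W_i behaves as a FIFO queue: suffix addition
# appends a newly observed lifetime at the tail; prefix subtraction
# consumes the oldest unused lifetime from the head.
W = deque()
W.append(1.7)          # suffix addition: W + {w(r+1)}
W.append(0.4)
W.append(2.2)
oldest = W.popleft()   # prefix subtraction: W - {w(j)} returns w(j)
```

This first-in, first-out discipline is what preserves the order in which observed lifetimes are reused by the constructed sample path.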
Note that the addition and subtraction operations are defined so that a new element is always added as the last element (the suffix) of a sequence, whereas subtraction always removes the first element (the prefix) of the sequence. Next, define the set

    A(n, k) = { i : i ∈ E, s_i^n > s̃_i^k }                               (5.1)

which is associated with V_i(n, k) and consists of all events i whose corresponding sequence V_i(n, k) contains at least one element. Thus, every i ∈ A(n, k) is an event that has been observed in ξ(n) and has at least one lifetime that has yet to be used in the coupled sample path ξ̃(k). Hence, A(n, k) should be thought of as the set of available events to be used in the construction of the coupled path. Finally, we define the following set, which is crucial in our approach:

    M(n, k) = Γ(x̃_k) − (Γ(x̃_{k−1}) − {ẽ_k})                             (5.2)

where, clearly, M(n, k) ⊆ E. Note that ẽ_k is the triggering event at the (k−1)th state visited in the constructed sample path. Thus, M(n, k) contains all the events that are in the feasible event set Γ(x̃_k) but not in Γ(x̃_{k−1}); in addition, ẽ_k also belongs to M(n, k) if it happens that ẽ_k ∈ Γ(x̃_k). Intuitively, M(n, k) consists of all missing events from the perspective of the constructed sample path when it enters a new state x̃_k: those events already in Γ(x̃_{k−1}) which were not the triggering event remain available to be used in the sample path construction as long as they are still feasible; all other events in the set are "missing" as far as residual lifetime information is concerned. The concurrent sample path construction process we are interested in consists of two coupled processes, each generated by a timed state automaton. This implies that there are two similar sets of equations that describe the dynamics of each process. In addition, we need a set of equations that captures the coupling between them.

5.3.2.
Timed State Automaton

We briefly review here the standard timed state automaton dynamics, also known as a Generalized Semi-Markov Scheme (GSMS) (see [12, 30, 35]). We introduce two additional variables: t_n, the time when the nth event occurs, and y_i(n), i ∈ Γ(x_n), the residual lifetime of event i after the occurrence of the nth event (i.e., the time left until event i occurs). On a particular sample path, just after the nth event occurs, the following information is known: the state x_n, from which we can determine Γ(x_n); the time t_n; the residual lifetimes y_i(n) for all i ∈ Γ(x_n); and all event scores s_i^n, i ∈ E. The following equations describe the dynamics of the timed state automaton.

step 1: Determine the smallest residual lifetime among all feasible events at state x_n, denoted by y_n^*:

y_n^* = min_{i∈Γ(x_n)} {y_i(n)}    (5.3)

step 2: Determine the triggering event:

e_{n+1} = arg min_{i∈Γ(x_n)} {y_i(n)}    (5.4)

step 3: Determine the next state:

x_{n+1} = f(x_n, e_{n+1})    (5.5)

step 4: Determine the next event time:

t_{n+1} = t_n + y_n^*    (5.6)

step 5: Determine the new residual lifetimes for all new feasible events i ∈ Γ(x_{n+1}):

y_i(n + 1) = y_i(n) − y_n^*   if i ≠ e_{n+1} and i ∈ Γ(x_n)
y_i(n + 1) = v_i(s_i^n + 1)   if i = e_{n+1} or i ∉ Γ(x_n)    (5.7)

step 6: Update the event scores for all i ∈ Γ(x_{n+1}):

s_i^{n+1} = s_i^n + 1   if i = e_{n+1}
s_i^{n+1} = s_i^n       otherwise    (5.8)

Equations (5.3)-(5.8) describe the sample path evolution of a timed state automaton. These equations apply to both the observed and the constructed sample paths. Next, we need to specify the mechanism through which these two sample paths are coupled in a way that enables event lifetimes from the observed ξ(n) to be used to construct a sample path ξ̃(k).

5.3.3.
Coupling Dynamics

Upon occurrence of the (n + 1)th observed event, e_{n+1}, the first step is to update the event lifetime sequences V_i(n, k) as follows:

V_i(n + 1, k) = V_i(n, k) + {v_i(s_i^n + 1)}   if i = e_{n+1}
V_i(n + 1, k) = V_i(n, k)                      otherwise    (5.9)

The addition of a new event lifetime implies that the "available event set" A(n, k) defined in (5.1) may be affected. Therefore, it is updated as follows:

A(n + 1, k) = A(n, k) ∪ {e_{n+1}}    (5.10)

Finally, note that the "missing event set" M(n, k) defined in (5.2) remains unaffected by the occurrence of observed events:

M(n + 1, k) = M(n, k)    (5.11)

At this point, we are able to decide whether all lifetime information needed to proceed with a state transition in the constructed sample path is available or not. In particular, we check the condition

M(n + 1, k) ⊆ A(n + 1, k).    (5.12)

Assuming (5.12) is satisfied, equations (5.3)-(5.8) may be used to update the state x̃_k of the constructed sample path. In so doing, lifetimes v_i(s̃_i^k + 1) for all i ∈ M(n + 1, k) are used from the corresponding sequences V_i(n + 1, k). Thus, upon completion of the six state update steps, all three variables associated with the coupling process, i.e., V_i(n + 1, k), A(n + 1, k), and M(n + 1, k), need to be updated. In particular,

V_i(n + 1, k + 1) = V_i(n + 1, k) − {v_i(s̃_i^k + 1)}   for all i ∈ M(n + 1, k)
V_i(n + 1, k + 1) = V_i(n + 1, k)                       otherwise    (5.13)

This operation immediately affects the set A(n + 1, k), which is updated as follows:

A(n + 1, k + 1) = A(n + 1, k) − {i : i ∈ M(n + 1, k), s̃_i^{k+1} = s_i^{n+1}}    (5.14)

Finally, applying (5.2) to the new state x̃_{k+1},

M(n + 1, k + 1) = Γ(x̃_{k+1}) − (Γ(x̃_k) − {ẽ_{k+1}})    (5.15)

Therefore, we are again in a position to check condition (5.12) for the new sets M(n + 1, k + 1) and A(n + 1, k + 1). If it is satisfied, then we can proceed with one more state update on the constructed sample path; otherwise, we wait for the next event on the observed sample path until (5.12) is again satisfied.
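The automaton steps (5.3)-(5.8) and the coupling bookkeeping of (5.9)-(5.15) can be sketched in Python. This is an illustrative sketch, not the dissertation's TWA pseudocode (which appears in Appendix A.2): the function and class names are hypothetical, `next_lifetime` stands in for reading the next unused lifetime v_i(·) from the observed sequences (or a random-variate generator), and the state and event encodings are left to the caller.

```python
from collections import defaultdict, deque

def gsms_step(state, clocks, scores, t, feasible, f, next_lifetime):
    """One state update of the timed state automaton, eqs. (5.3)-(5.8)."""
    gamma = feasible(state)
    e = min(gamma, key=lambda i: clocks[i])    # steps 1-2: y_n* and e_{n+1}
    y_star = clocks[e]
    new_state = f(state, e)                    # step 3: x_{n+1}
    t_next = t + y_star                        # step 4: t_{n+1}
    new_scores = dict(scores)
    new_scores[e] = new_scores.get(e, 0) + 1   # step 6: score update
    new_clocks = {}
    for i in feasible(new_state):              # step 5: residual lifetimes
        if i != e and i in gamma:
            new_clocks[i] = clocks[i] - y_star  # surviving clock runs down
        else:
            new_clocks[i] = next_lifetime(i)    # fresh lifetime v_i(s_i^n + 1)
    return new_state, new_clocks, new_scores, t_next, e

class Coupler:
    """Bookkeeping for the coupling dynamics, eqs. (5.9)-(5.15)."""
    def __init__(self):
        self.unused = defaultdict(deque)  # V_i(n, k): observed, unused lifetimes
        self.available = set()            # A(n, k)
        self.missing = set()              # M(n, k)

    def observe(self, event, lifetime):
        # (5.9)-(5.11): suffix-add the observed lifetime; M is unchanged.
        self.unused[event].append(lifetime)
        self.available.add(event)

    def can_advance(self):
        # Condition (5.12): every missing event has an unused lifetime.
        return self.missing <= self.available

    def advance(self, new_missing):
        # (5.13)-(5.15): prefix-subtract the lifetimes used by the state
        # update, prune A accordingly, and set M for the new state.
        used = {}
        for i in self.missing:
            used[i] = self.unused[i].popleft()
            if not self.unused[i]:
                self.available.discard(i)
        self.missing = set(new_missing)
        return used
```

A constructed-path driver would call `observe` on every observed event, test `can_advance`, and, whenever (5.12) holds, perform one `gsms_step` on the constructed state while feeding the lifetimes returned by `advance` into `next_lifetime`.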
The analysis above is summarized by the Time Warping Algorithm (TWA) included in Appendix A.2.

5.3.4. Extensions of the TWA

Earlier in this chapter we stated a few assumptions that were made to simplify the development of our approach and keep the TWA notationally simple. It turns out that we can extend the application of the TWA to DES by relaxing these assumptions at the expense of some extra work.

In A5.2 we assumed that changing a parameter from θ_0 to some θ_m ≠ θ_0 does not alter the event set E. Clearly, if the new event set E_m is such that E_m ⊆ E, the development and analysis of the TWA are not affected. If, on the other hand, E ⊂ E_m, then events required to cause state transitions under θ_m are unavailable in the observed sample path, which makes the application of our algorithm impossible. In this case, one can introduce phantom event sources which generate all the unavailable events as described, for example, in [17], provided that the lifetime distributions of these events are known. The idea of phantom sources can also be applied to DES that do not satisfy A1. In this case, if a sample path remains suspended for a long period of time, then a phantom source can provide the required event(s) so that the sample path construction can resume.

In A3 we assumed that changing a parameter from θ_0 to some θ_m ≠ θ_0 does not affect the distribution of one or more event lifetime sequences. This assumption is used in (5.9), where the observed lifetime v_i(s_i^n + 1) is directly suffix-added to the sequence V_i(n + 1, k). Note that this problem can be overcome by transforming observed lifetimes V_i = {v_i(1), v_i(2), · · ·} with an underlying distribution G_i(θ_0) into samples of a similar sequence corresponding to the new distribution G_i(θ_m), and then suffix-adding them to V_i(n + 1, k). This is indeed possible, if G_i(θ_0) and G_i(θ_m) are known, at the expense of some additional computational cost for this transformation (for example, see [12]).
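The lifetime transformation just described is the standard inverse-transform construction: map the observed lifetime to a uniform variate through the original CDF, then push it through the inverse of the new CDF. A minimal sketch, using exponential lifetimes purely for illustration (the function names and rate values are assumptions, not the dissertation's notation):

```python
import math

def transform_lifetime(v, cdf_old, inv_cdf_new):
    """Map an observed lifetime v with CDF G(theta_0) into a sample from
    G(theta_m): u = cdf_old(v) is uniform on (0, 1), so inv_cdf_new(u)
    follows the new distribution while reusing the observed randomness."""
    return inv_cdf_new(cdf_old(v))

# Exponential example: G(v) = 1 - exp(-rate * v).
def exp_cdf(rate):
    return lambda v: 1.0 - math.exp(-rate * v)

def exp_inv_cdf(rate):
    return lambda u: -math.log(1.0 - u) / rate
```

For exponential lifetimes the composition reduces to rescaling by the ratio of the rates, which is consistent with the scale-parameter special case discussed in the text.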
One interesting special case arises when the parameter of interest is a scale parameter of some event lifetime distribution (e.g., it is the mean of a distribution in the Erlang family). Then, simple rescaling suffices to transform an observed lifetime v_i under θ_0 into a new lifetime v̂_i under θ_m:

v̂_i = (θ_m / θ_0) v_i

Finally, note that in a simulation environment it is possible to eliminate the overhead qK which is due to checking the subset condition in step 2.5. To achieve this, we need to eliminate the coupling between the observed and constructed sample paths. Towards this goal, we can simulate the nominal sample path but, rather than discarding the event lifetimes, save them all in memory. Once the simulation is done, we simulate one by one all the perturbed sample paths exactly as we do with the brute-force simulation scheme, but rather than generating the required random variates we read them directly from memory. In this way we trade off computer memory for higher speedup. A quantification of this tradeoff is the subject of ongoing research.

5.4. Finite Perturbation Analysis

As noted before, the optimization algorithms of the previous chapters do not require the system performance under any allocation, but only under the neighboring allocations, i.e., allocations that differ by ±1 resources in two distinct users. In this section we take advantage of the structure of queueing systems in order to derive a more efficient constructability scheme that will provide us with the necessary information. Even though the method of deriving this scheme is fairly general and can be applied to any queueing model, we will develop it using the serial queueing model shown in Figure 5.2. For this model, our objective is to observe the system under some buffer allocation and, based on that, predict the performance of the system if we had added an extra buffer slot to one of the queues.

Figure 5.2: FPA System Model (a serial line of queues Q_0, Q_1, · · · , Q_N with arrival rate λ and service rates µ_0, µ_1, · · · , µ_N)

5.4.1.
Notation and Definitions

We begin by establishing some basic notation and defining quantities we will use in our analysis. First, for any x, let [x]^+ ≡ max{0, x}. The pair (k, n) will be used to denote the kth job in the nth stage. Associated with such a job are:

Z_k^n : the service time of (k, n).
C_k^n : the service completion time of (k, n) at stage n.
D_k^n : the departure time of (k, n) from stage n; if no blocking occurs, then D_k^n = C_k^n.

We also define

I_k^n ≡ D_k^{n−1} − D_{k−1}^n ≡ −W_k^n    (5.16)

Observe that when I_k^n > 0, this quantity is the length of an idle period at stage n that starts with the departure of (k − 1, n) and ends with the arrival of (k, n) at time D_k^{n−1}. Conversely, if W_k^n = −I_k^n > 0, this is the waiting time of (k, n), which can only begin processing at time D_{k−1}^n > D_k^{n−1}. Similarly, we define

B_k^n ≡ D_{k−x_{n+1}}^{n+1} − C_k^n    (5.17)

which, if B_k^n > 0, provides the length of a blocking period for the job (k, n) completing service at time C_k^n. Finally,

Q_k^n ≡ D_k^n − D_{k−1}^n = Z_k^n + [I_k^n]^+ + [B_k^n]^+    (5.18)

so that Q_k^n represents the interdeparture time between (k − 1, n) and (k, n) at stage n.

For our purposes, a perturbed sample path is one that would have resulted if the exact same nominal sample path had been reproduced under an allocation with one buffer slot added at some queue. To distinguish between quantities pertaining to the nominal path and their counterparts on a perturbed path we will use a tilde ("˜") as follows: if the number of buffers allocated to queue n is x_n in the nominal path, then x̃_n denotes the number of buffers in the perturbed path. Similar notation applies to other quantities such as D_k^n, etc. With this in mind, we define the indicator function

1[n + 1] = 1[x̃_{n+1} = x_{n+1} + 1] = 1 if x̃_{n+1} = x_{n+1} + 1, and 0 if x̃_{n+1} = x_{n+1}

to identify the downstream stage to any stage n where an additional buffer would have been added in a perturbed path.
We also define

ΔD_k^n ≡ D_k^n − D̃_k^n    (5.19)

to be the departure time perturbation for (k, n) due to the addition of a buffer to the nominal allocation. Finally, we will find useful the following quantity, defined as the relative perturbation in departure times for two jobs (k_1, n_1) and (k_2, n_2):

Δ_{(k_2,n_2)}^{(k_1,n_1)} ≡ ΔD_{k_1}^{n_1} − ΔD_{k_2}^{n_2}    (5.20)

5.4.2. Derivation of Departure Time Perturbation Dynamics

We begin with the simple observation that the departure time D_k^n satisfies the following Lindley-type recursive equation:

D_k^n = max{ D_k^{n−1} + Z_k^n , D_{k−1}^n + Z_k^n , D_{k−x_{n+1}}^{n+1} }    (5.21)

There are three cases captured in this equation:

1. The departure of (k, n) was activated by the departure of (k, n − 1). This corresponds to the case where (k, n) starts a new busy period at stage n and, upon completion of service, is not blocked by the downstream stage n + 1. Thus, D_k^n = D_k^{n−1} + Z_k^n, and from the definitions (5.16) and (5.17) it is easy to see that

D_k^n = D_k^{n−1} + Z_k^n ⇐⇒ W_k^n ≤ 0, B_k^n ≤ 0    (5.22)

2. The departure of (k, n) was activated by the departure of (k − 1, n). This corresponds to the case where (k, n) belongs to an ongoing busy period (hence, experiencing some waiting in queue before receiving service) and is not blocked by the downstream server n + 1. Thus, D_k^n = D_{k−1}^n + Z_k^n, and from (5.16) and (5.17) it is once again easy to check that

D_k^n = D_{k−1}^n + Z_k^n ⇐⇒ W_k^n ≥ 0, B_k^n ≤ 0    (5.23)

3. The departure of (k, n) was activated by the departure of (k − x_{n+1}, n + 1). This corresponds to the case where (k, n) is blocked and must remain at the nth stage after service completion¹. In this case, D_k^n = D_{k−x_{n+1}}^{n+1}, and from (5.17) it is easy to check that

D_k^n = D_{k−x_{n+1}}^{n+1} ⇐⇒ B_k^n ≥ 0    (5.24)

¹ Actually, this case combines two subcases: one where (k, n) starts a new busy period and leaves stage n after being blocked for some time, and another where (k, n) belongs to an ongoing busy period and leaves stage n after being blocked.
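Recursion (5.21) can be applied directly to compute all departure times of a serial line from its service times. The sketch below is an illustrative, 0-indexed implementation with hypothetical names: arrivals to the first stage play the role of the "upstream departures" for n = 0, and the last stage is taken to be never blocked.

```python
import math

def departures(A, Z, x):
    """Departure times from the Lindley-type recursion (5.21), 0-indexed.

    A: A[k] arrival time of job k to the first stage
    Z: Z[n][k] service time of job k at stage n
    x: x[n] buffer (kanban) count of stage n; blocking of stage n is
       caused by stage n + 1 holding x[n + 1] jobs; the last stage never blocks.
    """
    N, K = len(Z), len(A)
    D = [[0.0] * K for _ in range(N)]
    for k in range(K):
        for n in range(N):
            upstream = A[k] if n == 0 else D[n - 1][k]   # D_k^{n-1}
            prev = D[n][k - 1] if k > 0 else -math.inf   # D_{k-1}^n
            cands = [upstream + Z[n][k], prev + Z[n][k]]
            if n + 1 < N and k - x[n + 1] >= 0:
                cands.append(D[n + 1][k - x[n + 1]])     # blocking term
            D[n][k] = max(cands)
    return D
```

For instance, with two stages, unit service upstream, two-time-unit service downstream, and x = [1, 1], the slower downstream stage paces the line: upstream jobs are held (blocked) until the downstream departure releases them.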
Next, we consider the perturbation ΔD_k^n defined by (5.19). In order to find the perturbation we apply (5.21) to both the nominal and perturbed paths and identify 9 distinct cases. After some algebra we arrive at the following theorems:

Theorem 5.4.1 If W_k^n ≤ 0 and B_k^n ≤ 0, then:

ΔD_k^n = ΔD_k^{n−1} − max{ 0 , Δ_{(k−1,n)}^{(k,n−1)} − I_k^n , Δ_{(k−x_{n+1}−1[n+1], n+1)}^{(k,n−1)} + B_k^n − Q_{k−x_{n+1}}^{n+1} · 1[n + 1] }    (5.25)

The proof of this theorem is included in Appendix D.

Theorem 5.4.2 If W_k^n > 0 and B_k^n ≤ 0, then:

ΔD_k^n = ΔD_{k−1}^n − max{ 0 , Δ_{(k,n−1)}^{(k−1,n)} − W_k^n , Δ_{(k−x_{n+1}−1[n+1], n+1)}^{(k−1,n)} + B_k^n − Q_{k−x_{n+1}}^{n+1} · 1[n + 1] }    (5.26)

Theorem 5.4.3 If B_k^n > 0, then:

ΔD_k^n = ΔD_{k−x_{n+1}}^{n+1} − max{ Δ_{(k,n−1)}^{(k−x_{n+1}, n+1)} − B_k^n − [W_k^n]^+ , Δ_{(k−1,n)}^{(k−x_{n+1}, n+1)} − B_k^n − [I_k^n]^+ , ( Δ_{(k−x_{n+1}−1, n+1)}^{(k−x_{n+1}, n+1)} − Q_{k−x_{n+1}}^{n+1} ) · 1[n + 1] }    (5.27)

The proofs of Theorems 5.4.2 and 5.4.3 are similar to the proof of Theorem 5.4.1 and are omitted.

Theorems 5.4.1-5.4.3 directly lead to the FPA algorithm presented in Appendix A.3. Note that this algorithm exactly constructs the sample path that would have been observed if one of the stages had an extra kanban. Also, note that all quantities used are directly observable from the nominal sample path. Finally, note that this algorithm subsumes the FPA results presented in [36], where perturbations were associated with stages rather than individual jobs.

5.5. Summary

Several optimization algorithms, including the ones described in earlier chapters, require "derivative"-like information in order to determine their next step. In general, to provide such information it is necessary to observe two sample paths under two different parameter values. It is well known that for the class of discrete-event systems it is possible to obtain such information by observing only a single sample path. This is referred to as the constructability problem, which is the subject of this chapter.
First we develop a general algorithm that can be used to construct a sample path under any parameter value in {θ_1, · · · , θ_m} while observing a single sample path under θ_0. Subsequently, we take advantage of the special structure of queueing systems and develop a more efficient algorithm for constructing sample paths, but its applicability is restricted to a neighborhood of allocations "close" to the parameter of the observed sample path (i.e., allocations that differ by ±1 resources).

Chapter 6

OPTIMIZATION OF KANBAN-BASED MANUFACTURING SYSTEMS

This chapter presents the first application of the developed resource allocation methodologies, on kanban-based manufacturing systems. In this context, kanban constitute the discrete resources while the machine stages represent the users. We show that such systems satisfy the smoothness or complementary smoothness conditions defined in Chapter 4, and hence use the developed incremental optimization algorithms to allocate a fixed number of kanban to all stages so as to optimize an objective function.

6.1. Introduction

The Just-In-Time (JIT) manufacturing approach (see Sugimori et al. [71] and Ashburn [7]) was developed to reduce the work-in-process inventory and its fluctuations and hence reduce production costs. The main principle of the technique is to produce material only when it is needed. Its most celebrated component is the so-called kanban method, the basic idea of which is the following. A production line is divided into several stages and at every stage there is a fixed number of tags (or tickets) called kanban. An arriving job receives a kanban at the entrance of the stage and maintains possession of it until it exits the stage. If an arriving job does not find an available kanban at the entrance, it is not allowed to enter that stage until a kanban is freed; in this case, the job is forced to wait in the previous stage and becomes blocked.
There are several variations of the basic kanban production system (see [24] for an overview) and, over the past years, much work has been devoted to the analysis and performance evaluation of such schemes. One of the main issues associated with the kanban method is the determination of the number of kanban at every stage. It is obvious that in order to achieve a minimum work-in-process inventory, no more than one job should be allowed at every stage. This, however, would severely restrict other objective functions such as throughput, mean delay, etc. Therefore, the selection of the number of kanban is closely linked to a tradeoff between work-in-process inventory and some other possible objective. Several authors have investigated such tradeoffs. Philipoom et al. [60] investigated the relation of the number of kanban to the coefficient of variation in processing times, machine utilization, and the autocorrelation of processing times, and proposed an empirical methodology for determining the number of kanban. In related work, Gupta and Gupta [34] investigated additional performance measures such as production idle time and shortage of final products. In studying the performance of kanban systems, both analytical models (e.g., [47, 52, 70]) and simulation (e.g., [40, 50, 65]) have been used. In the former case, one must resort to certain assumptions regarding the various stochastic processes involved (e.g., modeling demand and service processes at different stages through exponential distributions); however, even in the case of a simple finite Markov chain model, the large state space of such models necessitates the use of several types of approximations. In the case where simulation is used, any kind of parametric analysis of the model requires a large number of simulation runs (one for each parameter setting) to be performed. For a comparative overview see [73].
In this chapter we use simulation to first show that several manufacturing systems satisfy the "smoothness" or "complementary smoothness" conditions introduced in Chapter 4. Hence, depending on the objective, we apply the incremental optimization (SIO) algorithms that were developed in that chapter to allocate a fixed number of kanban K to the N stations so as to either maximize the system throughput or minimize the average delay of each part (i.e., minimize system time).

6.2. More on the Smoothness Condition

Unfortunately, conditions (S) and (S̄) are difficult to test. Since no easy method for testing these conditions exists, we simulated the investigated systems exhaustively and obtained the system performance under every possible allocation for all possible numbers of kanban K = 1, 2, · · ·, and verified that (S) and (S̄) indeed hold. Specifically, we performed the following experiments.

First, we simulated the serial manufacturing system of Figure 6.1 for N = 4, 5, 6, i.e., for systems with 5, 6, and 7 queues respectively, where the first queue (Q_0) was assigned an infinite number of kanban. The objective function under consideration was the system throughput, and it was found that smoothness was satisfied for all test cases.

Figure 6.1: Manufacturing systems consisting of N stations in series (queues Q_0, Q_1, · · · , Q_N with arrival rate λ and service rates µ_0, µ_1, · · · , µ_N)

Subsequently, we tested the system shown in Figure 6.2 where, again, we assumed that Q_0 had an infinite number of kanban. For this system, we considered throughput as the objective function of interest and found that smoothness was satisfied for all test cases. In addition, we considered the mean delay as another possible objective function and tested whether such a system satisfies the complementary smoothness condition. The findings were that for all examined cases, (S̄) was valid. Similar results were reported for the system of Figure 6.3. In this case, however, all stages have a finite capacity. As a result, entities (parts, customers, etc.)
are lost if they arrive at queues Q_1, · · · , Q_3 when no kanban is available¹. In this case, we found that for all examined cases, the throughput satisfies (S) while the mean delay satisfies (S̄). In addition, we considered the customer loss probability as another possible objective measure and found that it too satisfies (S̄).

Figure 6.2: Manufacturing network (queues Q_0 through Q_6 with arrival rate λ, service rates µ_0, · · · , µ_6, and routing probabilities p_1, p_2, p_3)

Figure 6.3: Queueing network (queues Q_1 through Q_6 with arrival rates λ_1, λ_2, λ_3 and service rates µ_1, · · · , µ_6)

The results reported above suggest that even though conditions (S) and (S̄) may seem restrictive, they are satisfied by several important systems; this reflects the fact that the addition of a single kanban to a stage of a non-optimal allocation will not cause a "non-smooth" jump in the overall performance of a system.

¹ Note that such a system is more relevant in the context of communication systems rather than manufacturing systems, but we consider it just to test the applicability of the smoothness condition.

6.3. Application of the Incremental Optimization Algorithms

In this section we consider two manufacturing processes modeled as a kanban system consisting of N + 1 stages. The entrance to stage 0 contains an infinite-capacity buffer, i.e., stage 0 has infinite kanban. When a job completes service at any stage, it continues to a downstream stage if that stage has an available kanban; otherwise it waits, hence blocking the operation of the corresponding server. The exception is the last stage (N), which is assumed to be connected to an infinite sink. Finally, jobs at all stages are processed on a First-In-First-Out (FIFO) basis and no distinction among job types is made.

Let x_i denote the number of kanban allocated to stage i and define the N-dimensional vector x = [x_1, · · · , x_N] to represent a kanban allocation. We will assume that at least one kanban is initially allocated to each of stages 1, · · · , N (x_i ≥ 1), otherwise the throughput of the system is zero.
We will further assume that an upper bound on the work-in-process is given, such that Σ_{i=1}^N x_i = K′. Note that since every stage must have at least one kanban, only K = K′ − N kanban are available to be allocated to the N stages. Therefore, the search space must be redefined as follows:

A_k = { x : Σ_{i=1}^N x_i = k + N, x_i ≥ 1 } ,   k = 0, 1, · · ·

6.3.1. Application of SIO on a Serial Manufacturing Process

First, we consider the serial manufacturing process shown in Figure 6.1. In this case, the objective function J(x) is the throughput of the system and hence the problem is to determine an allocation x that maximizes J(x) subject to the constraint on the total number of kanban. In this case, SIO can be applied directly with the only modification that x^0 = [1, · · · , 1] rather than [0, · · · , 0]. Figure 6.4 shows the evolution of the algorithm for a system with five stages in series (N = 4) when the available kanban are K = 9 (K′ = 13), and therefore there are 220 possible allocations. Furthermore, the arrival process is Poisson with rate λ = 1.0 and the service processes are all exponential with rates µ_0 = 2.0, µ_1 = 1.5, µ_2 = 1.3, µ_3 = 1.2, and µ_4 = 1.1. In this figure, the horizontal axis is given in terms of steps, where a "step" represents the interval between the allocation of an additional resource through (4.11) and (4.12). Initially, we obtain performance estimates every 100 departures (f(l) = 100 departures) and every time we reset x̂^{k,l} we increase the observation interval by another 100 departures. Through exhaustive search, it was found that the optimal allocation is [1, 3, 4, 5] with approximate throughput 0.9033. As seen in Figure 6.4, the SIO algorithm yields near-optimal allocations within the first four or five iterations² (SIO performance curve). It is also worth reporting some additional results not evident from Figure 6.4.
Specifically, the algorithm delivered allocations which were among the top 10% of designs even at the very first iteration, when the observation interval was limited to only 100 departures. After the first 10 iterations (observation intervals greater than 1000 departures) the allocations obtained were among the top 1% of designs, and after the 20th iteration the SIO algorithm consistently picked the top design [1, 3, 4, 5]. Finally, notice the saw-tooth shape of the SIO evolution curve, which reflects the incremental nature of the algorithm and the resetting of (4.12).

² An iteration corresponds to K steps.

Figure 6.4: Evolution of the SIO algorithm (throughput vs. step; SIO performance, SIO evolution, and optimal solution curves)

6.3.2. Application of SIO on a Network

Next, we consider the manufacturing process shown in Figure 6.2. For this case, the objective function J(x) is the throughput of the system and hence the problem is to determine an allocation x that maximizes J(x) subject to the constraint on the total number of kanban. Again we apply SIO directly using x^0 = [1, · · · , 1] rather than [0, · · · , 0]. Figure 6.5 shows the evolution of the algorithm for the network when the available kanban are K = 9 (K′ = 15), and therefore there are 2,002 possible allocations. The arrival process is Poisson with rate λ = 1.3 and the service processes are all exponential with rates µ_0 = 3.0, µ_1 = µ_2 = 1.0, µ_3 = µ_4 = µ_5 = 1.5, and µ_6 = 3.0.

Initially, we obtain performance estimates every 100 departures (f(l) = 100 departures) and every time we reset x̂^{k,l} we increase the observation interval by another 100 departures. Through exhaustive search (simulating every allocation for 10^6 departures), it was found that the optimal allocation is [2, 0, 0, 3, 2, 2] with approximate throughput 1.304.
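The candidate sets searched exhaustively in these experiments are exactly the sets A_k defined earlier in this section. A stars-and-bars enumeration (an illustrative sketch; the function name is hypothetical) reproduces the counts quoted in the text: 220 allocations for N = 4 and 2,002 for N = 6, with k = 9 free kanban in both cases.

```python
def allocations(N, k):
    """Yield every x in A_k = {x : sum_i x_i = k + N, x_i >= 1}, i.e.
    every way of distributing k extra kanban over N stages that each
    hold at least one kanban."""
    def place(stages, extra):
        if stages == 1:
            yield (extra + 1,)  # last stage takes its base kanban + remainder
            return
        for first in range(extra + 1):
            for rest in place(stages - 1, extra - first):
                yield (first + 1,) + rest
    yield from place(N, k)
```

For example, `len(list(allocations(4, 9)))` counts the candidate allocations for the serial system, and `allocations(6, 9)` enumerates the candidates for the network example.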
As seen in Figure 6.5, initially the performance estimates are very noisy since they are taken over very short observation intervals. As the observation interval f(k) grows larger, the variance of the estimates is reduced but not eliminated. Note that, even with these noisy estimates, SIO is able to pick allocations with relatively good performance.

Figure 6.5: Evolution of the SIO algorithm (throughput vs. step; SIO evolution, SIO performance, and optimal solution curves)

Next, we used the results from the exhaustive search and ranked every possible allocation according to its performance. Subsequently, we used this ranking to check the rank of the allocations picked by SIO; the results are shown in the histogram of Figure 6.6. This figure shows that all of the picked allocations are among the top 40% of all possible allocations, but only 13% are among the top 5%. This suggests that either SIO does not work very well, or there were several allocations with optimal or near-optimal performance and hence the order we obtained through the exhaustive simulation does not represent the "true" ranking. In this case, we believe that the latter is true, since more than 400 allocations (i.e., 20% of all allocations) exhibited performance within 0.1% of the optimum.

6.3.3. Application of SIO on a Network

Finally, we used the system considered in the previous section, but rather than maximizing throughput we minimize the mean delay. For this system, through exhaustive search, we found that the optimal allocation is [2, 3, 2, 0, 2, 0] with mean delay 3.975. As seen in Figure 6.7, the SIO algorithm yields near-optimal allocations even within the first iteration. As before, we used the results from the exhaustive search and ranked every possible allocation according to its performance. Based on that ranking, we checked the rank of the allocations picked by SIO; the results are shown in the histogram of Figure 6.8.
This figure shows that 58% of the picked allocations are among the top 1% of all possible allocations, while none of the picked allocations is worse than the top 30% of all possible designs. This is considerably better than the results presented in the previous section, and the reason is that there are considerably fewer allocations that exhibit near-optimal performance.

6.4. Summary

In this chapter we considered the problem of allocating kanban to the various stages of a manufacturing system so as to optimize a performance measure such as the throughput or mean delay. First, we showed that the systems under consideration satisfy conditions (S) and (S̄). Subsequently, we used the SIO algorithms to determine the optimal allocation and showed that these algorithms can yield good solutions with a limited amount of effort.

Figure 6.6: Ranking of the allocations picked by SIO (histogram over percentiles of top designs)

Figure 6.7: Evolution of the SIO algorithm (system time vs. step; SIO evolution, SIO performance, and optimal solution curves)

Figure 6.8: Ranking of the allocations picked by SIO (histogram over percentiles of top designs)

Chapter 7

CHANNEL ALLOCATION IN CELLULAR TELEPHONE NETWORKS

This chapter presents another application of resource allocation methodologies, on mobile cellular telecommunication networks. In this context, channels are the discrete resources while mobile phones are the users. Using principles from the preceding chapters we are able to develop channel allocation algorithms that can reduce the probability that a new call will not find an available channel, while using the smallest number of reconfigurations.

7.1. Introduction

In recent years mobile communications have experienced tremendous growth.
In order to cope with the increased demand, the service area is divided into cells where frequency channels may be reused as long as there is sufficient distance separation to prevent interference [51]. In addition, frequency allocation algorithms may further increase the system capacity, at least for systems that use either frequency or time division multiple access schemes (FDMA or TDMA, respectively). The simplest frequency allocation scheme is fixed channel allocation (FCA), where each cell is pre-assigned a fixed set of channels for its exclusive use, making sure that none of the surrounding cells can use those channels. Apart from its simplicity, FCA can reduce interference by increasing the frequency separation among the channels assigned to the same cell, and it has superior performance under heavy traffic. On the other hand, FCA cannot adapt to changing traffic conditions. Dynamic channel allocation (DCA) schemes [21, 61] overcome this problem by allowing all cells to use any of the available channels as long as interference is kept below a certain level. DCA schemes increase the flexibility and traffic adaptability of the system, and consequently increase the system capacity under light and medium traffic conditions. However, for heavy traffic conditions, DCA may lead to inefficient allocations, making FCA a better choice. To combine the benefits of the two, hybrid channel allocation (HCA) schemes [42] have been developed, where all available channels are divided into fixed and dynamic sets. For a comprehensive study of channel allocation schemes the reader is referred to [45].

In all of the channel allocation schemes described above, a mobile phone is always connected to the base station with the highest signal-to-noise ratio, which is usually the base station physically located closest to the mobile user.
In practical systems, in order to achieve complete area coverage it is inevitable that some mobile phones, depending on their actual location, may be able to communicate with two or more base stations, i.e., they receive a signal with sufficiently high signal-to-noise ratio from multiple base stations. Since such phones can be connected to two or more base stations, it may be possible to increase the network capacity by connecting new mobile phones to the "least" congested base station. Towards this end, two algorithms have been developed, namely directed retry (DR) and directed handoff (DH) [26, 44, 25]. DR directs a new call to the base station with the greatest number of available channels, while DH may redirect an existing call from one cell to a neighboring one to further increase the system capacity (for comparison purposes both algorithms are described in Section 7.3).

Both DR and DH perform the "phone allocation" to base stations based on some state information, i.e., the number of available channels. In effect, the two algorithms try to balance the number of available channels over all base stations. However, using Lagrangian relaxation, it is easy to show that in order to solve a convex constrained optimization problem it is important to balance the partial derivatives of the objective function with respect to the control variables [27]. This idea has motivated the development of the optimization algorithms of the preceding chapters, and it is also a motivating factor for the algorithms derived in this chapter, namely Simple Neighborhood (SN) and Extended Neighborhood (EN). These algorithms use derivative-like information (in the form of a finite difference) to determine the channel allocation. As a result, these algorithms enhance the performance of DR and DH by decreasing the call loss probability while requiring fewer reconfigurations.
Moreover, we demonstrate that using overlapping cell structures leads to increased system capacities that can be better than those of DCA. Consequently, one can expect such structures to perform even better in hierarchical models due to the larger overlap between micro and macro cells [].

7.2. Overlapping Cells and Modeling Assumptions

In this section we describe a cellular system model where the cells are allowed to overlap, as seen in Figure 7.1. In the model we assume that the service area is flat and each cell is represented by a hexagon inscribed in a circle of radius r = 1. Each cell Ci is serviced by base station Bi, which is located in the center of the cell. The coverage area of each base station is represented by a circle around Bi whose radius, in order to achieve full coverage, must be greater than or equal to 1. Note that a model with overlapping cells is realistic since, even under the idealized conditions described above, at least 21% of the cell's area must overlap with neighboring cells. In practical systems this overlap is usually much larger, and it can be further increased by increasing the transmitted power of some transmitters and by applying principles of power control [33, 79]. Next, we define some sets that will be useful when describing the various algorithms presented in the remainder of this chapter. A(Bi) is the set of all base stations located in the cells adjacent to cell Ci, the cell serviced by base station Bi. IN(c), the Immediate Neighborhood, is the set of all base stations that mobile phone c can be directly connected to. EN(c), the Extended Neighborhood, is the set of all base stations that are adjacent to the cells in IN(c), i.e., EN(c) = \(\bigcup_{j \in IN(c)} A(j)\).
M(Bi) is the set of all mobile phones that are connected to base station Bi, and H(Bi) is the set of all mobiles that are not connected to base station Bi but which can be connected to Bi, i.e., these are calls connected to a base station in A(Bi) that are located in the area that overlaps with Bi. For example, in Figure 7.1, A(B1) = {B2, B3, B4, B5, B11, B12}, IN(c) = {B1, B2, B3} and EN(c) = {B1, · · · , B12}. To complete the notational definitions, we assume that base station Bi is assigned a fixed number of channels Ki. Furthermore, we use mi to denote the number of channels that are currently available in Bi (mi = Ki − |M(Bi)|). Finally, by B∗(c) we denote the base station that is located closest to mobile c, or has the highest signal to noise ratio.

Figure 7.1: Overlapping Cell Structure

In addition, the results of this chapter follow the basic assumptions made in [26], that is:

1. Call arrivals are described by a Poisson process with rate λ, while the call duration is exponentially distributed with mean 1/µ.
2. Blocked calls are cleared and do not return.
3. Fading and co-channel interference are not accounted for. Propagation and interference considerations are simply represented by the constraint that, if a channel is used in a given cell, it cannot be reused in a ring of R cells around that cell, R = 1, 2, · · ·.
4. Mobile phones are assumed stationary, i.e., in this model we do not account for mobile users that roam from one cell to another.
5. Certain mobiles may be connected to multiple base stations, i.e., mobiles that are located in the intersection of the coverage areas of two or more base stations, as shown in Figure 7.1.

7.3. DR and DH Schemes

In this section we describe variations of two algorithms that have appeared in the literature, adapted to the overlapping cell model, namely "directed retry" and "directed handoff".
In DR, when a new call is initiated, it is connected to the closest base station if the number of available channels in that base station is greater than a threshold T. If the number of available channels is less than or equal to T, then the new call is assigned to the base station that has the most available channels among all base stations in the immediate neighborhood of the new call (IN(c)).

Directed Retry (DR)
When a new call c arrives:
1. If mi∗ > T, carry call c at Bi∗, where Bi∗ = B∗(c).
2. If mi∗ ≤ T, carry call c at Bj∗, where j∗ = arg maxj∈IN(c){mj}.
3. If mj = 0 for all j ∈ IN(c), then call c is blocked.

DH is an enhancement to DR where an existing call may be redirected to a neighboring base station in order to accommodate more new calls. Specifically, this scheme works as shown below.

Directed Handoff (DH)
When a new call c arrives:
1. Find Bi∗ = B∗(c) and define Q = {Bi∗} ∪ A(Bi∗).
2. Let j∗ = arg maxj∈Q{mj}.
3. If j∗ = i∗, carry c at Bi∗ and go to 7.
4. If there exists a call e ∈ Mi∗ ∩ Hj∗, then hand off e to Bj∗, carry the new call c at Bi∗, and go to 7.
5. If Mi∗ ∩ Hj∗ = ∅, Q := Q − {Bj∗}.
6. If Q ≠ ∅ go to 2, else the call is blocked.
7. END.

DH can improve the performance of DR because it can redistribute calls over seven base stations (the base station located closest to the new call plus its six adjacent cells), while DR can only redistribute over at most three base stations. The trade-off is an induced handoff on an existing call, which is forced to switch its carrying base station.

7.4. Performance Enhancements

As mentioned in the introduction, DR and DH perform the call allocation to the base stations based on the current state information, i.e., the number of available channels in each cell, whereas it would be desirable to use derivative-like information to perform the optimization.
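The DR decision rule described in Section 7.3 can be sketched in a few lines; this is an illustrative reimplementation, with all names (`directed_retry`, `avail`, etc.) chosen here rather than taken from the dissertation.

```python
def directed_retry(in_c, closest, avail, T):
    """Directed Retry (DR): choose a base station for a new call.

    in_c    : list of base-station ids the call can hear (the set IN(c))
    closest : id of the closest base station, B*(c)
    avail   : dict mapping base-station id -> free channels m_i
    T       : threshold on free channels
    Returns the chosen base-station id, or None if the call is blocked.
    """
    # Step 1: use the closest base station if it has more than T free channels.
    if avail[closest] > T:
        return closest
    # Step 2: otherwise pick the neighbor with the most free channels.
    best = max(in_c, key=lambda j: avail[j])
    # Step 3: block the call if no neighbor has a free channel.
    return best if avail[best] > 0 else None
```

Setting T equal to the total number of channels makes step 1 never fire, so the rule degenerates to "most free channels", which is the limiting behavior discussed in Section 7.5.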
To derive such information, we need an objective function, and given that we are interested in minimizing the number of lost calls, it is natural to consider the steady state loss probability as given by the Erlang B formula

\[
LP_i = \frac{\rho_i^{K_i}/K_i!}{\sum_{j=0}^{K_i} \rho_i^j/j!} \tag{7.1}
\]

where Ki is the number of channels assigned to base station Bi and ρi = λi/µi is the traffic intensity in cell Ci. Note, however, that for the type of algorithms we are interested in, the number of channels Ki is fixed for all cells; hence, any derivative-like function of the steady state loss probability with respect to any parameter other than Ki will always be equal to zero. This suggests that rather than using a steady state measure, it may be preferable to use a transient one. So, rather than directly trying to minimize the steady state loss probability, one can try to minimize the loss probability over the next τ time units and hope that such actions will also minimize the steady state loss probability. To derive a transient objective function, recall that any base station can be modeled by an M/M/m/m queueing system (see [48, 12]). Such a system generates a birth-death Markov chain with a probability mass function π^i(t) which is the solution of the differential equation

\[
\frac{d\pi^i(t)}{dt} = \pi^i(t)\,Q^i \tag{7.2}
\]

where π^i(t) = [π^i_0(t), · · · , π^i_{K_i}(t)] and π^i_j(t) is the probability that at time t there will be j, j = 0, · · · , Ki, active calls in cell Ci. Furthermore, Q^i is the transition rate matrix, which for the M/M/m/m queueing system is given by

\[
Q^i = \begin{bmatrix}
-\lambda_i & \lambda_i & 0 & \cdots & \cdots & 0 \\
\mu_i & -(\lambda_i+\mu_i) & \lambda_i & 0 & \cdots & 0 \\
0 & 2\mu_i & -(\lambda_i+2\mu_i) & \lambda_i & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & (K_i-1)\mu_i & -(\lambda_i+(K_i-1)\mu_i) & \lambda_i \\
0 & \cdots & \cdots & 0 & K_i\mu_i & -K_i\mu_i
\end{bmatrix} \tag{7.3}
\]

What we are after is the probability that a call is lost within the next τ time units, i.e., π^i_{K_i}(τ), which for small τ is clearly going to be a function of the initial conditions.
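For concreteness, the transient quantity π^i_{K_i}(τ) can be computed numerically from (7.2)–(7.3); the sketch below builds the birth-death rate matrix and integrates the forward equation with simple Euler steps. The function names and the step count are illustrative choices, not part of the dissertation.

```python
import numpy as np

def transition_matrix(lam, mu, K):
    """Transition rate matrix Q of eq. (7.3) for the M/M/K/K birth-death chain."""
    Q = np.zeros((K + 1, K + 1))
    for j in range(K + 1):
        if j < K:
            Q[j, j + 1] = lam          # birth: a new call arrives
        if j > 0:
            Q[j, j - 1] = j * mu       # death: one of the j active calls ends
        Q[j, j] = -Q[j].sum()          # rows of a rate matrix sum to zero
    return Q

def transient_loss(lam, mu, K, m, tau, steps=2000):
    """L_i(m) = pi_K(tau) given m free channels at t = 0 (cf. eq. (7.4)).

    Integrates d(pi)/dt = pi Q with fixed Euler steps, which is adequate
    for the small values of tau that the SN/EN algorithms use.
    """
    Q = transition_matrix(lam, mu, K)
    pi = np.zeros(K + 1)
    pi[K - m] = 1.0                    # initial condition e_{K-m}
    dt = tau / steps
    for _ in range(steps):
        pi = pi + dt * (pi @ Q)
    return pi[K]
```

The finite difference of (7.5) then follows as `transient_loss(lam, mu, K, m - 1, tau) - transient_loss(lam, mu, K, m, tau)`.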
Hence, for every cell Ci we define the following objective function:

\[
L_i(m_i) = \pi^i_{K_i}(\tau), \quad \text{s.t. } \pi^i(0) = e_{K_i - m_i} \tag{7.4}
\]

where ej is a (Ki + 1)-dimensional vector with all of its elements equal to zero except the jth one, which is equal to 1, and mi is the number of the base station's free channels at the decision time, i.e., at t = 0. Based on this objective function, we then define the following finite difference

\[
\Delta L_i(m_i) = L_i(m_i - 1) - L_i(m_i), \quad m_i = 1, \cdots, K_i \tag{7.5}
\]

with the boundary condition ΔLi(0) = ∞, so that a cell with no free channels is never selected. Next, we are ready to describe our optimization algorithms.

7.4.1. Simple Neighborhood (SN)

The SN algorithm is very similar to DR. Their only difference is that SN assigns the new call to the "least sensitive" base station with respect to the number of available channels, while DR assigns the new call to the base station with the largest number of available channels. Note that for systems with uniform traffic over the entire coverage area, and when the threshold T of DR is set equal to the number of available channels, the two algorithms behave in exactly the same way, since the least sensitive base station will always be the one with the most available channels. This is also observed in some of the simulation results that we present in the next section. More specifically, the algorithm works as follows:

Simple Neighborhood (SN)
When a new call c arrives:
1. Let i∗ = arg mini∈IN(c){ΔLi(mi)}.
2. Assign c to base station Bi∗.
3. END.

7.4.2. Extended Neighborhood (EN)

The Extended Neighborhood algorithm, rather than looking for the least sensitive base station among the cells in the immediate neighborhood, searches the entire extended neighborhood. If the least sensitive base station (say i∗) is within IN(c), then it assigns the call to i∗ as in SN.
If the least sensitive base station is not in IN(c), then this scheme looks for an existing call that is connected to one of the IN(c) base stations and is located in the intersection of the least sensitive base station's coverage area with any of the cells in IN(c). For example, in Figure 7.1, call e is connected to B1 and is located in the intersection of the coverage areas of B1 and B4 (the least sensitive base station in EN(c)). Then, the algorithm induces a handoff, connecting e to B4 while assigning c to B1. Note that if there is no call in the intersection of the least sensitive base station with any of the cells in IN(c), then the scheme looks for the next best option. Next, we formally describe the EN algorithm.

Extended Neighborhood (EN)
When a new call c arrives:
1. Define Q = EN(c).
2. Let i∗ = arg mini∈Q{ΔLi(mi)}.
3. If Bi∗ ∈ IN(c), assign c to Bi∗ and go to 7.
4. If there exists e ∈ Hi∗ ∩ \(\left(\bigcup_{j \in IN(c)} M_j\right)\), go to 5; else go to 6.
5. Hand off call e to Bi∗ and assign c to the base station that e belonged to. Then go to 7.
6. Q := Q − {Bi∗}. If |Q| = 0 the call is blocked; otherwise, go to 2.
7. END.

The EN algorithm is also similar to the DH algorithm. Their differences lie in step 2, where EN tries to minimize the sensitivities with respect to the number of available channels while DH tries to find the base station with the most available channels. Another difference is that EN searches for the least sensitive base station among all base stations in EN(c), while DH searches for the base station with the most available channels in A(B∗(c)) ⊆ EN(c).

7.4.3. On-Line Implementation of SN and EN

Under the Poisson arrival and exponential call duration assumptions, one can monitor the system over an interval of length T and obtain estimates of the actual parameters λi and µi for all cells Ci, i = 1, 2, · · ·.
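Under the Poisson/exponential assumptions these estimates are simple ratios; the following is a minimal sketch with hypothetical names, assuming the monitoring mechanism supplies the raw counts over the window.

```python
def estimate_rates(n_arrivals, busy_time, n_completions, T):
    """Estimate lambda_i and mu_i for one cell over a window of length T.

    n_arrivals    : number of call arrivals observed in the window
    busy_time     : total channel-occupancy time (summed over all channels)
    n_completions : number of calls that completed in the window
    Returns (lambda_hat, mu_hat).
    """
    lam_hat = n_arrivals / T              # Poisson rate estimate
    mu_hat = n_completions / busy_time    # exponential service rate estimate
    return lam_hat, mu_hat
```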
Based on these estimates, we solve the differential equation (7.2) Ki + 1 times, once for each initial condition, and use the results to determine the finite differences ΔLi(j) for all j = 0, · · · , Ki, which are then saved and used by the controller over the next interval. Therefore, this algorithm does not impose any significant computational burden. An interesting question arises when the underlying distributions are not exponential. In this case, the differential equation (7.2) is not valid, and therefore neither are the finite differences (7.5). We can instead directly estimate the finite differences over the interval T and use those to drive the SN and EN algorithms. A simple algorithm for obtaining such estimates is the following: Every τ time units observe the current state of each cell, i.e., the number of active channels. Count the number of times Nl that each state l was observed, for l = 0, · · · , Ki. Also count the number of times Nlm that the state at t0 is l and at t0 + τ it is m, for all l, m = 0, · · · , Ki. Finally, form the appropriate ratios to get the required estimates.

7.5. Simulation Results

In this section we present simulation results for the call loss probability of SN and EN, and compare them with the corresponding results of DR and DH. In addition, we use the results derived in [19, 20] to compare the performance of these algorithms with the performance of Dynamic Channel Allocation (DCA) algorithms. Specifically, we reproduced the lower bounds on two DCA algorithms: (a) the "Timid DCA" scheme, which allows a new mobile phone to connect via any channel that is not used in any of the cells located in the R consecutive rings surrounding the closest base station, and (b) the "Aggressive DCA" scheme, which allows a new mobile to get any channel, even if it is used in an adjacent cell, and forces existing calls to search for a new interference-free channel in their area¹.
These bounds are the result of an ad hoc Erlang-B model which uses the Erlang B formula (7.1), substituting the traffic intensity ρ → Nρ and the total number of available channels K → δK, where N is the reuse factor² [51] and δ is the normalized channel utilization³. For the results presented next, we assume that there is a total of 70 channels. Any channel assigned to base station Bi cannot be reused in any base station in the ring of radius R = 2 cells around Bi. In this case, the reuse factor is N = 7 and hence each base station is assigned 10 channels. In addition, we assume that the service area consists of 64 cells, each of which is represented by a hexagon inscribed in a circle of radius r = 1, arranged in an 8 × 8 grid with a base station located in the center. Note that in order to achieve full area coverage, the coverage radius of each base station must be at least equal to 1. Finally, in order to reduce the effect of cells being located at the edges, we have used the model in [26]: a cell on an edge is assumed to be adjacent to the cells on the opposite edge. Figure 7.2 shows how the probability of a mobile being able to hear multiple base stations changes as a function of the coverage radius of each base station, assuming that demand is uniformly distributed over the entire area. Note that even at the minimum coverage radius, at least 20% of the mobile phones are covered by two base stations. Next we compare the call loss probabilities for various channel allocation and "mobile allocation" algorithms. Figure 7.3 compares the call loss probability of five channel allocation schemes, namely FCA, DR, SN, DH and EN, when the traffic is uniform over the entire service area and the coverage radius is 1.14. In this case, about 50% of all calls can hear a single base station, 43% can hear two base stations and 7% can hear three base stations. For DR, we simulated the system with two different thresholds, T = 5 and T = 10.
As T increases, the loss probability decreases and, in the limit, i.e., when T is equal to the number of available channels, the performance of DR is identical to that of SN, as indicated in Section 7.4.1. Furthermore, note that DH and EN exhibit superior performance, considerably better than the Timid DCA, while for intensities ρ ≥ 7.5, EN outperforms even the Aggressive DCA. A similar picture is presented in Figure 7.4, which compares the loss probabilities of the seven algorithms when the coverage radius is equal to 1.4, i.e., when 15% of the calls can hear one base station, 33% can hear two base stations and 52% can hear three base stations. Note how the call loss probability has been dramatically reduced, and that both EN and DH outperform even the Aggressive DCA. However, increasing the coverage radius by that much will increase the co-channel interference between adjacent base stations, and it is possible that such a configuration may not be feasible due to noise. On the other hand, we point out that such overlapping probabilities may be feasible for non-uniform traffic or in hierarchical cell structures. For example, when planning a system, the base stations may be placed in a way such that the overlapping areas are over high-usage regions. EN and DH improve the system performance at the expense of intracell handoffs. When a new call arrives, they may redirect an existing call from its present base station to a neighboring one to

¹ Note that in [19] it is stated that no practical algorithm exists that can implement the Aggressive DCA, and it is conjectured that this bound may not be attainable.
² N = i² + ij + j², where i, j are integers. For R odd, i = j = (R + 1)/2, and for R even, i = R/2 and j = R/2 + 1.
³ For a Timid DCA scheme δ(R = 1) = 0.693, δ(R = 2) = 0.658, δ(R = 3) = 0.627, while for the Aggressive DCA scheme δ = 1 [19].
Figure 7.2: Cell overlapping as a function of the cell radius

accommodate the new call. As shown in Figure 7.5, the use of derivative-like information allows EN to achieve a lower call loss probability than DH while requiring significantly fewer reconfigurations (fewer induced intracell handoffs). Figure 7.6 shows how the call loss probability changes as the cell radius changes when the traffic intensity in each cell is ρ = 8 Erlangs. Note that when DH and EN are used, a small increase in the base stations' coverage area may dramatically decrease the call loss probability. Next, we investigate the effect of the parameter τ on the performance of the SN and EN algorithms. This is shown in Figure 7.7, which indicates that the algorithms perform well under small values of τ; however, as τ gets larger, their performance degrades. This is reasonable because algorithms like SN and EN behave like the D-DOP and S-DOP algorithms presented in Chapter 3, and hence require some convexity assumptions like A3.2. Note that the transient performance measure we used (7.4) is convex with respect to the initial conditions for τ = 0, but not strictly convex. For values of 0 < τ < τ₀ it becomes strictly convex, but for values of τ > τ₀ it becomes non-convex, where τ₀ is a constant that depends on λ and µ. The objective of SN and EN is to direct new calls to the "least sensitive" base station. If the objective function is convex, then the minimum finite difference (7.5) correctly identifies that base station. On the other hand, this is not guaranteed when the function is not convex. Hence, the performance of the SN and EN algorithms degrades as the objective function becomes non-convex.
Figure 7.3: Call loss probabilities as a function of the traffic intensity ρ when the cell radius is 1.14

As mentioned previously, under uniform traffic the performance of DR is almost identical to that of SN when the threshold T is equal to the total number of channels. For non-uniform traffic, however, SN has an advantage due to its use of the sensitivity information. Figure 7.8 shows the overall call loss probability when the coverage radius is 1.14 and the traffic is such that, for every three neighboring cells, one has a fixed intensity of 8 Erlangs, another has an intensity of 2 Erlangs, and the intensity of the third varies as in the horizontal axis of the figure. For this case, SN exhibits a slightly lower loss probability than DR, while again EN exhibits the best performance.

7.6. Conclusions and Future Directions

In this chapter we presented two algorithms (SN and EN) which use sensitivity-like information to improve the performance of the DR and DH schemes in the context of overlapping cellular networks. SN and EN exhibit lower call loss probability than DR and DH respectively, while EN also reduces the number of intracell handoffs. Furthermore, for instances where a high enough percentage of calls can be connected to multiple base stations, these algorithms can achieve lower call loss probabilities than many DCA schemes. It should be interesting to investigate the effect of such algorithms in systems with a hierarchical cell structure, where overlapping areas are potentially much bigger. Furthermore, for this work we assumed that the mobile users are "stationary", i.e., they do not cross the boundaries of the cell where their call was initiated. Clearly, it would be interesting to look at how such algorithms affect the number of dropped calls when users are allowed to roam.
Finally, another interesting aspect of such systems is the issue of fairness, as described in [49] (i.e., mobile users, depending on their location, can perceive a different quality of service because they can hear multiple base stations).

Figure 7.4: Call loss probabilities as a function of the traffic intensity ρ when the cell radius is 1.4

Figure 7.5: Average number of induced handoffs for EN and DH

Figure 7.6: Call loss probabilities as a function of the cell radius

Figure 7.7: Call loss probabilities as a function of the parameter τ

Figure 7.8: Call loss probabilities for non-uniform traffic

Chapter 8
GROUND-HOLDING PROBLEM IN AIR TRAFFIC CONTROL

This chapter presents our final application of resource allocation methodologies. Specifically, we consider the ground-holding problem as it arises in air traffic control. In this context, a runway is considered the resource, while the airplanes using it to land or take off are the users. First, we consider the runway as a discrete resource by dividing the interval during which it can be used into time slots, and hence assign each time slot to an aircraft.
Subsequently, we relax the time discretization and view the runway as a continuous resource.

8.1. Introduction

In recent years, air traffic has increased dramatically while airport capacity has remained stagnant. This has resulted in congestion problems which degrade the performance of the air traffic control system, raise safety concerns, and cause excessive costs (of the order of several billion US dollars). Adding to the problem is the important fact that the capacity of an airport is sensitive to changes in weather conditions (e.g., visibility, wind). Thus, even if the maximum airport capacity were adequate to meet the scheduled demand, it is not unusual for this capacity to drop by half or even more due to bad weather conditions, resulting in serious congestion problems that typically propagate to other airports as well. Solutions to this problem vary according to the planning horizon. Long-term considerations involve building new airports and additional runways. Medium-term approaches focus on ways to disperse traffic to less utilized airports through regulation, incentives, etc. Finally, short-term solutions aim at minimizing the unavoidable delay costs under the current capacity and demand. This chapter proposes and analyzes control schemes that belong to the latter category. The most important class of solutions to the short-term congestion problem is Ground-Holding Policies (GHP), which are based on the premise that ground delays are less expensive (less fuel) and safer than airborne delays. The objective of any GHP is to trade off airborne delays for ground delays. Thus, when a flight is scheduled to arrive at some destination airport at a time when high congestion is expected, it is instructed to delay its departure from the origin airport until air traffic subsides. The fundamental issue in any GHP is to determine which flights should be delayed and by how long. Several authors have considered this problem for a single destination airport.
Andreatta and Romanin-Jacur [5] studied the single-period GHP problem and used dynamic programming (DP) to obtain a GHP that minimizes the delay cost. Terrab and Odoni [72] extended these results to the multi-period GHP, while Richetta and Odoni [62, 63] also addressed the same problem, formulated as a stochastic linear program which they solved to obtain the optimal solution. However, it is well known that the DP approach suffers from the "curse of dimensionality": for a problem of realistic size, the resulting state space explodes, making a solution practically intractable and necessitating the use of heuristics. For the GHP problem in a network of several destination airports, Vranas et al. [75] developed three models under slightly different assumptions, which are based on a zero-one integer program formulation. Bertsimas and Patterson [8] enriched the model by introducing en-route capacity constraints, and also addressed issues that arise due to banks of flights in the hub-and-spoke system, as well as rerouting of aircraft. A comparative study of three different approaches for solving the GHP problem for multi-airport systems has been conducted by Andreatta and Brunetta [4]. The GHP problem is generally viewed as having a stochastic and a dynamic component. First, airport capacity is stochastic, since it is weather dependent and cannot be predicted accurately, especially over long periods of time. A dynamic component is present since, as time progresses, better weather estimates become available, which may repeatedly require changes in the scheduling policy. In addition, sudden changes in operating conditions (e.g., emergencies) also require rapid rescheduling capabilities. Note that the aforementioned methodologies reported in the literature are based on some type of Linear Program (LP) to solve the GHP problem, which for typical-size problems contains several hundreds of thousands of variables and constraints.
An additional difficulty is present in some of the techniques which also involve integer programming: since there is no guarantee that the LP relaxation will yield an integral solution, techniques such as branch-and-bound may also be necessary, increasing the time needed to solve a single instance of the problem. Due to the dynamic component of the GHP problem, it is essential to solve several instances of it over the period of a day, as one tracks changes in operating conditions and hence airport capacities. However, due to the size of the problem, such approaches may easily become impractical. In this chapter we address the GHP problem for a single destination airport and introduce two new solution approaches suited to this problem. Our first approach is motivated by the kanban control policies introduced in the previous chapter; more specifically, it follows the Kanban Smoothing (KS) flow control policy, first proposed in [56]. Its main advantage is that it is inherently dynamic and at the same time very simple to implement. By controlling certain parameters, one can trade off ground and airborne delays. This approach, however, is not aimed at completely eliminating airborne delays. This has motivated our second approach, in which we study the sample paths generated by scheduled flight departures and arrivals with the explicit aim of assigning each flight a GH delay that minimizes a specific cost function. By invoking Finite Perturbation Analysis (FPA) techniques, we develop an efficient algorithm which eliminates airborne delays. The advantages of KS include: (a) It is very easy to implement and inherently dynamic. As soon as new estimates of the airport capacity become available, they can be immediately entered into the controller by simply changing the number of kanban in the relevant stages; thus, there is no a priori need to limit airport capacity to a small number of profiles so as to keep computational effort at manageable levels (as was done in [62]).
(b) It automatically addresses uncertainties in the departure time or travel time of each flight through its kanban release mechanism. (c) It is distributed over airports and scalable, since every destination airport can have its own controller and a new airport can be added by simply adding a new controller. (d) It facilitates en-route speed control; for example, an airplane that enters a stage without an available kanban can be instructed to slow down and later speed up as kanban become available. (e) It easily addresses en-route capacities by assigning a similar controller to every sector and requiring that every flight obtain a kanban from each sector in its path before it is allowed to take off. The second approach bypasses a limitation of the proposed KS policy, as well as of all LP policies proposed in the literature, in which a day is divided into small intervals. The smaller the intervals, the higher the effectiveness of the resulting policy; however, the size of the problem increases, requiring more computational effort to obtain a solution. As an example, if a day is divided into 15-minute intervals and the airport capacity is 5 landings per interval, then the 5 flights assigned to any such interval may arrive at any point during the interval; therefore, they are not guaranteed zero airborne waiting time. On the other hand, if a day is divided into 3-minute intervals, then only 1 landing per interval will be allowed, reducing, but not eliminating, the airborne delay cost. This, however, will increase the number of variables and constraints required, and hence the complexity of the problem. Depending on the amount of actual delay assigned by such policies, they may become more "conservative" or more "liberal", in the sense that they may introduce excessive GH delays while the runway at the destination airport is idle, or allow more than the minimum possible airborne delays.
This problem has motivated our second control scheme, which aims at forcing airborne waiting times to zero while avoiding excessive ground-holding delays. The cornerstone of this approach is to replace a time-driven model, in which one considers flights within intervals of fixed length, by an event-driven model, in which one studies sample paths of the air traffic system generated by departure and arrival events under a specific policy. Towards this end, we have employed an FPA scheme similar to the one described in an earlier chapter (Section 5.4). In the GHP case, the "specified change" is the introduction of a new flight arrival, and the FPA scheme we develop evaluates the impact of such an event on the delay cost for all possible GH delay values. This allows us to determine the GH delay which minimizes the cost that this new flight will induce on the entire system.

8.2. System Model

For the purposes of this chapter we consider a network with M departure (source) airports (S1, · · · , SM) and a single destination airport D, as shown in Figure 8.1. Over the period (0, T), there are K flights (f1, · · · , fK) scheduled to arrive at D at times T1, · · · , TK respectively. Any flight fk that departs from Si, i = 1, · · · , M, at time tk arrives at the destination airport at time Tk = tk + di, where the travel times di of each airplane are deterministic and known in advance. Delays, therefore, are only due to congestion.

Figure 8.1: Destination airport queueing model

Upon an airplane's arrival at D, if the runway of D is not occupied by a preceding airplane, it immediately proceeds to the runway in order to land; otherwise it delays its landing until all preceding airplanes clear the runway. Once an airplane has landed, it proceeds to the airport's gates, making the runway available for the next airplane.
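On this single-runway model, the landing times and airborne delays along a sample path follow a simple first-come-first-served recursion once a minimum landing separation is fixed; the sketch below assumes a constant separation Z (a fixed value of the weather-dependent separation discussed next) and uses names chosen here for illustration.

```python
def landing_times(arrivals, Z):
    """FCFS landing schedule on a single runway with minimum separation Z.

    arrivals : list of arrival times T_k at D, sorted in increasing order
    Returns (landings, airborne_delays), one entry per flight.
    """
    landings, delays = [], []
    last = float("-inf")                  # landing time of the previous flight
    for Tk in arrivals:
        t = max(Tk, last + Z)             # wait, if needed, for the runway to clear
        landings.append(t)
        delays.append(t - Tk)             # airborne waiting time of this flight
        last = t
    return landings, delays
```

For example, with arrivals at times 0, 1 and 5 and Z = 3, the second flight circles for 2 time units and the third for 1, which is exactly the congestion-induced delay the GHP tries to push back to the ground.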
Note that for safety reasons there is a minimum time separation, Zt, between any two consecutive airplane landings, which depends on the weather conditions. Under “good weather”, Zt attains its minimum value and therefore the airport capacity CtD, defined as the number of landings per unit of time, attains its maximum value. As weather conditions deteriorate, Zt increases and so CtD decreases. In the context of queueing networks, a runway corresponds to a server. Therefore, the system can be represented by a single queue served by multiple servers, one for each available runway. For simplicity, in our analysis we assume a single server, but extensions are straightforward.

8.3. Kanban-Smoothing (KS) Control Policy

According to the KS policy, the entire air traffic network is divided into N stages based on the distance of any point from the destination airport D, which is defined as stage 0. Thus, at any given time every airplane belongs to one of the stages depending on its distance from D. An airplane that is either landing or waiting to land (i.e., it is in the airborne wait queue) is in stage 0. Any other airplane is in the ith stage if it is at a distance d away from D such that (i − 1)σ < d ≤ iσ, where σ is the stage range or stage duration and is a parameter set by the controller. Note that rather than using actual distance units (e.g., km or miles) to describe each stage, it is preferable to use time units in order to accommodate airplanes with different speeds. In this case, d corresponds to the expected time needed by the airplane to reach D and σ has time units (e.g., minutes). In Figure 8.2, flight fd is expected to arrive at D after d time units, with 2σ < d < 3σ, so it is assigned to stage 3. Similarly, fe is assigned to stage N − 1. Every stage i is assigned ki kanban (tokens or permits). Every time an airplane enters stage i, it releases the kanban from the previous stage, i + 1, and receives a kanban from the new stage i.
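The stage-assignment rule can be sketched as a small helper; this is our own illustration rather than code from the dissertation, and the numeric values below are made up.

```python
import math

def stage_of(d: float, sigma: float) -> int:
    """KS stage index for a flight whose expected time to reach the
    destination D is d, given stage duration sigma: stage i covers
    (i-1)*sigma < d <= i*sigma, and stage 0 is the airborne wait
    queue/runway at D itself."""
    if d <= 0:
        return 0
    return math.ceil(d / sigma)

# With sigma = 10 minutes, a flight 24 minutes from D (2*sigma < d < 3*sigma)
# is assigned to stage 3, as flight fd is in Figure 8.2.
print(stage_of(24.0, 10.0))  # -> 3
```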
Unlike manufacturing systems, however, the stage boundaries in the air traffic network cannot be rigidly set. In a serial production line, if a part finds no kanban available in the next stage, it is simply forced to wait at its current stage until a kanban becomes available. In our case, however, a traveling airplane cannot be forced to wait in the midst of its trip until a kanban is freed. Thus, our controller allows for a relaxed exchange of kanban as follows: if an airplane that already holds a kanban from a previous stage enters a new stage that does not have a free kanban, the airplane simply does not release the kanban of the previous stage. In other words, it only releases a kanban of an upstream stage when it can get a kanban of some downstream stage. The control policy operates on airplanes that have just become ready to take off as follows: the airplane informs the controller of its departure time from a source airport and its expected arrival time at D. Using this information, the controller determines the airplane's original stage, say i. If that stage has an available kanban, it assigns it to the airplane, which is then allowed to take off. On the other hand, if no kanban is available, the controller searches all upstream stages for a free kanban. If it finds one, say in stage m > i, it assigns it to the new airplane, but instructs it to delay its departure by an amount of time corresponding to the difference between the stage the airplane is currently in and the stage where the available kanban was found, i.e., by (m − i)σ.

[Figure 8.2: Stage representation for the KS control policy]
8.3.1. Representation of (KS) as a Timed State Automaton

To formalize this process, we shall model it through a timed state automaton (E, X, Γ, f, x0, V), where E is the event set, X is the state space, Γ contains the feasible event sets for all x ∈ X, f is the state transition function, x0 is the initial state, and V is the clock structure associated with the events in E (for the dynamics of the automaton see Chapter 5 and for more details see [12]). In this context, the event set is given by E = {α1, · · · , αN−1, β1, · · · , βN−1, γ}, where αi denotes the event that an airplane located in stage i becomes ready to take off, βi denotes the event that an airplane has moved from stage i to i − 1, and γ denotes the event that an airplane has landed at D. The state space of the system is described by two N-dimensional integer vectors (x, a) such that

x = [x0, · · · , xN−1] and a = [a0, · · · , aN−1].   (8.1)

In x, element x0 is the number of airplanes physically present in stage 0 (i.e., airplanes that are either landing or waiting to land). In a, element a0 is the number of free kanban of stage 0. Similarly, xi is the number of airplanes that are physically in the ith stage, that is, they are at a distance d away from D such that (i − 1)σ < d ≤ iσ. Lastly, ai is the number of free kanban at the ith stage. Note that 0 ≤ xi ≤ Σ_{j=i}^{N−1} kj and 0 ≤ ai ≤ ki, where kj is the number of kanban assigned to stage j. For every state (x, a), the feasible event set is given by

Γ(x) = ⋃_{j=1}^{N−1} {αj} ∪ ⋃_{j=1}^{N−1} {βj : xj > 0} ∪ {γ : x0 > 0}   (8.2)

that is, events αj are always feasible, while an event βj or γ is only feasible if the corresponding stage contains at least one airplane. To specify the state transition functions, we first define the following auxiliary variables:

pi = N if ai = · · · = aN−1 = 0, and pi = min{m : m ≥ i, am > 0} otherwise,

for all i = 1, · · · , N − 1, and

qi = min{m : m ≥ i, Σ_{j=i}^{m} xj ≤ Σ_{j=i}^{m} kj}

for all i = 0, · · · , N − 1.
The variable pi is used to determine the first upstream stage (i.e., pi ≥ i) that has an available kanban; if none exists we set pi = N. The variable qi determines the first upstream stage that does not have any airplane with a kanban assigned from a further upstream stage. The criterion for such a stage (say m) is that the total number of airplanes in stages i through m must not exceed the total number of kanban assigned to these stages. We now specify the state transition functions as follows (a prime is used to indicate the next state following an event occurrence):

• If event αi occurs and pi < N, set: x′_{pi} = x_{pi} + 1 and a′_{pi} = a_{pi} − 1.
• If event βi occurs, set: x′_i = x_i − 1 and x′_{i−1} = x_{i−1} + 1. In addition, if a_{i−1} > 0, set a′_{i−1} = a_{i−1} − 1 and a′_{qi} = a_{qi} + 1.
• If event γ occurs, set: x′_0 = x_0 − 1 and a′_{q0} = a_{q0} + 1.

When an αi event occurs, the state of stage pi is updated. If none of the feasible stages i, · · · , N − 1 has an available kanban, i.e., pi = N, then the corresponding flight is cancelled (or must be held and scheduled only when some kanban becomes available). When a βi event occurs, if the new stage (i − 1) has an available kanban (i.e., a_{i−1} > 0), then the airplane involved will release the kanban of stage i and get a kanban from i − 1. However, it is possible that in stage i there is an airplane that still holds a kanban from stage i + 1 because when it arrived at i there was no kanban available. In this case, this airplane will get the newly freed kanban and release the kanban of i + 1. In general, there may be several airplanes that are physically in stage s ≥ i but hold kanban from another stage (say z > s). Those airplanes will get a kanban from z − 1 and release the kanban from z. In effect, a newly released kanban will propagate upstream until it finds the first stage that does not have any airplane with a kanban from a further upstream stage, determined through qi.
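As a concrete illustration, here is a minimal sketch of these transition rules, assuming the state vectors x and a are plain Python lists; the helper names (p_index, q_index, alpha, beta, gamma) are ours and not part of the dissertation's notation.

```python
def p_index(i, a):
    """p_i: the first stage m >= i holding a free kanban, or N if none."""
    return next((m for m in range(i, len(a)) if a[m] > 0), len(a))

def q_index(i, x, k):
    """q_i: the first stage m >= i with sum(x[i..m]) <= sum(k[i..m]),
    i.e. the stage that absorbs a newly released kanban."""
    sx = sk = 0
    for m in range(i, len(x)):
        sx, sk = sx + x[m], sk + k[m]
        if sx <= sk:
            return m
    raise ValueError("infeasible state: more airplanes than kanban")

def alpha(i, x, a):
    """Event alpha_i: an airplane whose original stage is i becomes
    ready to take off. Returns the stage whose kanban it receives
    (implying a ground delay of (m - i)*sigma), or None if p_i = N."""
    m = p_index(i, a)
    if m == len(a):
        return None                 # flight held or cancelled
    x[m] += 1
    a[m] -= 1
    return m

def beta(i, x, a, k):
    """Event beta_i: an airplane crosses from stage i into stage i - 1."""
    x[i] -= 1
    x[i - 1] += 1
    if a[i - 1] > 0:                # take the new stage's kanban; the
        a[i - 1] -= 1               # freed kanban propagates upstream
        a[q_index(i, x, k)] += 1    # to stage q_i

def gamma(x, a, k):
    """Event gamma: an airplane lands at D, freeing a kanban."""
    x[0] -= 1
    a[q_index(0, x, k)] += 1
```

For example, with k = [1, 1, 1], x = [0, 1, 0] and a = [1, 0, 1], a β1 event yields x = [1, 0, 0] and a = [0, 1, 1]: the airplane takes stage 0's kanban and the kanban it frees is absorbed at stage q1 = 1.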
A similar upstream propagation of kanban may occur when a γ event occurs, in which case q0 is used to determine the ultimate stage that will receive the newly released kanban. For example, in Figure 8.2, when fa enters stage 0, it gets a kanban from stage 0 and releases the kanban of stage 1. Subsequently, fb gets the kanban of stage 1 and releases the kanban of stage 2, which is then taken by fc. Finally, fc releases the kanban of stage q = 3. So even though fa released a kanban of stage 1, in the end the released kanban is the one from stage 3. The initial state x0 can be any feasible state, but for the sake of simplicity we assume that the system starts out empty, so x = [0, · · · , 0] and a = [k0, · · · , kN−1]. Note that at most airports there are usually no scheduled arrivals between midnight and six o'clock in the morning, which gives the system enough time to empty, making this a reasonable assumption. Finally, the clock structure V describes the lifetime sequence of every event in E through some probability distribution.

8.3.2. Evaluation of GHD Under (KS)

In the event that there is no kanban available in the current stage for a flight that has become ready to take off, it is immediately apparent that this flight will experience some ground-holding delay (GHD). The determination of the exact delay can result in a “conservative” or “liberal” approach with respect to the airborne delay allowed, as illustrated next.

[Figure 8.3: Assignment of Ground-Holding Delay (GHD) under KS. (a) GHD until the beginning of the next stage. (b) GHD until the end of the next stage. (c) GHD until the previous airplane clears the runway.]

For simplicity, we assume that there is a single source airport S (i.e., M = 1) located d time units away from the destination airport D.
Furthermore, assume that each stage is assigned a single kanban (i.e., ki = 1 for all i = 0, · · · , N − 1) and that the system starts out empty. At t = t1, flight f1 is ready to depart from S and, since (i − 1)σ < d ≤ iσ, f1 is assigned to stage i as shown in Figure 8.3. After δt, at t = t2 = t1 + δt, f2 is ready to take off from S. If f1 is still in stage i, then f2 will be assigned to stage i + 1 and will be given a kanban from that stage. Now, any airplane in stage i + 1 will arrive at D after an interval s such that iσ ≤ s ≤ (i + 1)σ. In order for f2 to arrive within this range, it must delay its takeoff by an interval wg with iσ − d ≤ wg ≤ (i + 1)σ − d. In the more liberal approach, wg = iσ − d. Then, it is possible that f2 will arrive at D before f1 clears the runway and will therefore experience some airborne delay wa, as shown in case (a) of Figure 8.3. On the other hand, in the more conservative approach, wg = (i + 1)σ − d. In this scenario, the runway of D will be idle for a period I while f2 is experiencing unnecessary ground delay, as shown in case (b). Notice that in the best-case scenario, f2 arrives at D immediately after f1 has cleared the runway, as shown in case (c). Under this scenario, the runway does not remain idle, while the airborne delay wa is zero. Clearly, the total delay (wg + wa) in cases (a) and (c) is the same; however, since we assume that there is a higher cost associated with airborne delays, case (c) is preferred. The discussion above raises two issues. First, the performance of KS depends on the controller parameters, i.e., the duration of each stage σ and the number of kanban assigned to each stage, ki, i = 0, · · · , N − 1. Thus, it is necessary to find ways of determining these parameters, and we address this issue in Section 8.5.
Second, it should be apparent that the division of time into intervals of length σ prevents us from applying a control policy that can completely eliminate airborne delays, unless an additional mechanism is developed (thus complicating the KS scheme). This motivates our second approach, which is based on using Perturbation Analysis (PA) techniques to analyze a sample path of the system and aims at minimizing the airborne delays without creating unnecessary idle periods. This approach is described in the next section.

8.4. Airplane Scheduling Using Finite Perturbation Analysis

At any given time, every airport maintains a list of the airplanes scheduled to arrive during the day. Each new airplane requesting takeoff from a source airport represents an addition to this schedule and incurs an additional cost. Our approach in this section is based on a derivation of the incremental cost associated with the arrival of an extra airplane as a function of its ground-holding delay, and the use of Finite Perturbation Analysis (FPA) [12, 35] to minimize this cost. Since the destination airport corresponds to a single-server queue, the dynamics associated with it are given by the standard Lindley recursion (e.g., see [48]):

Lk(d) = max{Ak(d), Lk−1} + Zk   (8.3)

where Ak(d) is the time until airplane k will arrive at D when it is assigned a ground-holding time d, Zk is the time that it occupies the runway, and Lk(d) is the time until it lands. Note that applying some ground-holding delay to k implies that its arrival time Ak will increase, which in turn will affect its landing time through the max operation in (8.3). Therefore, any cost function expressed in terms of Ak, Lk will be non-differentiable, making it difficult to solve any associated minimization problem.
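The Lindley recursion (8.3) is simple to simulate directly; the sketch below uses a constant runway time Z (anticipating assumption A8.1) and illustrative arrival times of our own choosing.

```python
def landing_times(arrivals, Z):
    """Lindley recursion (8.3): L_k = max{A_k, L_{k-1}} + Z_k, with a
    constant runway-occupancy time Z (assumption A8.1). Returns the
    landing times and the airborne waits W_k = max{0, L_{k-1} - A_k}."""
    L_prev = float("-inf")
    landings, waits = [], []
    for A in arrivals:
        waits.append(max(0.0, L_prev - A))
        L_prev = max(A, L_prev) + Z
        landings.append(L_prev)
    return landings, waits

# Three arrivals 3 minutes apart with Z = 5: the airborne queue builds up.
print(landing_times([0.0, 3.0, 6.0], Z=5.0))
# -> ([5.0, 10.0, 15.0], [0.0, 2.0, 4.0])
```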
To overcome this problem, we recognize that points of non-differentiability correspond to ground-holding delays that result in event-order changes (i.e., ground-holding delays that result in two airplanes arriving at exactly the same time). Based on the arrival times of the already expected airplanes, we break all possible ground-holding delays into smaller intervals [Aj, Aj+1) for all j such that Aj > Ak(0), and optimize the cost function in each interval. Then, we determine the ground-holding delay that corresponds to the minimum cost among all these intervals. Before we describe the details of our optimization procedure, it is essential that we define all timing intervals that we will use in studying a sample path. At any point in time, the controller of the destination airport has a list of all airplanes that are expected to arrive, with their corresponding arrival times {A1, · · · , Am, · · ·}, as shown in Figure 8.4.

[Figure 8.4: Timing diagram for ground-holding delay]

When the kth airplane is ready to take off from some airport Sj, it informs the controller about its expected arrival time, which is the earliest possible arrival time for k when the ground delay d is zero, that is, Ak(0). Based on this value, the controller can identify the airplanes that are expected right before and right after k, denoted by a − 1 and a respectively, with a = min{i : Ak(0) < Ai}. If the kth airplane is assigned a ground-holding delay d ≥ 0, its new expected arrival time is delayed by d, that is,

Ak(d) = Ak(0) + d.   (8.4)

In general, this will place the arrival time in some interval [Am−1, Am), where m ≥ a, as illustrated in Figure 8.4.
For reasons that will become clear in the sequel, we break the ground-holding delay into two parts:

d = sm + τ,   (8.5)

where sm is the ground-holding delay which forces k to arrive at exactly the same time as Am−1, and τ is any additional delay within the interval [Am−1, Am). Specifically, define

sm = max{Am−1 − Ak(0), 0}, for any m ≥ a.   (8.6)

Notice that for the first possible interval [Aa−1, Aa), we have sa = 0, hence the need for the max operator. In addition, since Am−1 ≤ Ak(0) + sm + τ < Am, the “residual” delay τ is constrained as follows:

0 ≤ τ ≤ Am − Ak(0) if sm = 0,
0 ≤ τ ≤ Am − Am−1 otherwise.

Finally, we point out that, given the ground-holding delay d of k, airplane k will arrive after m − 1; therefore, its arrival will not affect the airborne waiting time of a − 1 or any other airplane j < m − 1. On the other hand, the arrival of k may increase the airborne delay of any airplane that is expected after k, that is, airplanes m, m + 1, · · ·. Next, define

∆Lj(d) = L̃j(d) − Lj ≥ 0, j = 1, 2, · · ·   (8.7)

where Lj is the expected landing time of airplane j before airplane k is considered, and L̃j(d) is the expected landing time of j if k is assigned a ground-holding time d. Using Perturbation Analysis (PA) nomenclature, Lj is the landing time in the nominal sample path of this system. The addition of the new airplane k results in a perturbed sample path. The values of d ≥ 0 define a family of such perturbed sample paths, and L̃j(d) is the landing time of j in the perturbed sample path corresponding to some d. As already pointed out, ∆Lj(d) = 0 for all j such that Aj < Ak(d), while ∆Lj(d) ≥ 0 for j such that Aj ≥ Ak(d). Now we are ready to express the additional cost due to k as a function of the ground-holding delay d = sm + τ.
Letting cg and ca be the costs per unit time of ground and airborne delays respectively, we have

Ck(sm, τ) = cg(sm + τ) + ca max{0, Lm−1 − Ak(sm + τ)} + ca Σ_{j: Aj > Ak(sm)} ∆Lj(τ)   (8.8)

The first term is the cost due to ground-holding of airplane k, and the second term is its airborne delay cost, which is positive only if Lm−1 > Ak(sm + τ). The last term is the cost incurred by all airplanes that are expected after k. Note that since ∆Lj(d) = 0 for all j such that Aj < Ak(sm + τ), those terms are left out of the summation, which is why ∆Lj becomes a function of τ only. Finally, throughout the remainder of the chapter we make the obvious assumption that ca ≥ cg. Our objective then is to determine τ = τ* such that Ck(sm, τ*) ≤ Ck(sm, τ) for all possible sm, and then find the value of sm with the minimum cost. That is, we seek

min_{sm, τ} Ck(sm, τ), k = 1, · · · , K   (8.9)

where K is the number of expected arrivals in a time period of interest, typically a day. Next, we concentrate on deriving the optimal point in any interval [Am−1, Am), m = a, a + 1, · · ·. In evaluating Ck(sm, τ) we need to evaluate all perturbations ∆Lj(τ) for j ≥ m. We first consider the case j > m, and then the case j = m. For all j > m (with m fixed) we can easily evaluate the perturbation ∆Lj(τ) using (8.3) and (8.7) as follows:

∆Lj(τ) = max{Aj, L̃j−1(τ)} + Zj − max{Aj, Lj−1} − Zj
       = max{0, ∆Lj−1 − Ij} if Aj > Lj−1, and ∆Lj−1 otherwise
       = [∆Lj−1(τ) − [Ij]+]+,  j > m   (8.10)

where [x]+ = max{0, x} and Ij = Aj − Lj−1 is the idle period preceding the arrival of airplane j, which is present if Aj > Lj−1. Equation (8.10) indicates that a perturbation is generated only at the landing of airplane m. The perturbation of any airplane j > m is due purely to perturbation propagation. Also, note that since the perturbed sample path contains a new arrival, it is impossible for it to generate a new idle period.
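The propagation step of (8.10) translates directly into code; in this sketch, arrivals and landings hold the nominal Aj and Lj (here a consistent schedule with Z = 2), and the function name and example numbers are ours.

```python
def propagate(dLm, arrivals, landings, start):
    """Propagate the perturbation generated at airplane m per (8.10):
    dL_j = [dL_{j-1} - [I_j]^+]^+ with I_j = A_j - L_{j-1}.
    arrivals/landings hold the nominal A_j, L_j; start indexes the
    first airplane after m. Returns the list of dL_j for j > m."""
    out, dL = [], dLm
    for j in range(start, len(arrivals)):
        idle = max(0.0, arrivals[j] - landings[j - 1])
        dL = max(0.0, dL - idle)
        out.append(dL)
    return out

# Nominal schedule with Z = 2: arrivals A and landings L obey (8.3).
A = [0.0, 3.0, 5.0, 12.0]
L = [2.0, 5.0, 7.0, 14.0]
# A 2-unit perturbation at airplane m = 1 survives the busy period
# (no idle before j = 2) and is absorbed by the 5-unit gap before j = 3:
print(propagate(2.0, A, L, start=2))  # -> [2.0, 0.0]
```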
We now express the perturbation in the landing time ∆Lj(·) for all j > m as a function of ∆Lm(τ) for the given m. Let us group all airplanes j ≥ m into busy periods, starting from m. Note that (8.10) implies that for any airplane z that belongs to the same busy period as m, we have ∆Lz(τ) = ∆Lm(τ), since [Iz]+ = 0. Furthermore, for any airplane n that starts a new busy period we have In > 0, hence ∆Ln(τ) = [∆Ln−1(τ) − In]+. Let Bn be the number of airplanes in the busy period that started with n. Then, clearly, for all z = n + 1, · · · , n + Bn, we have ∆Lz(τ) = ∆Ln(τ). With these observations, we can express the incremental cost Ck(sm, τ) in (8.8) as a function of the generated perturbation ∆Lm(τ):

Ck(sm, τ) = cg(sm + τ) + ca max{0, Lm−1 − Ak(sm) − τ} + ca Σ_{b=1}^{B} Bb [∆Lm(τ) − Σ_{i=1}^{b} I_i^B]+   (8.11)

where B is the number of busy periods after the arrival of k, B1 is the number of airplanes that follow m and are in the same busy period as m, and Bb, b = 2, · · · , B, is the number of airplanes in the bth busy period. Finally, I_i^B is the idle period preceding busy period i = 1, · · · , b. Next, we make the following simplifying assumption:

A8.1. The minimum time between any two consecutive airplane landings is constant and equal to Z, i.e., Zk = Z for all k.

This assumption is reasonable since Z depends on the weather conditions, which do not change dramatically over a small period of time (e.g., a few minutes), so that Zk−1 ≈ Zk ≈ Zk+1. This assumption maintains a First Come First Serve (FCFS) scheduling discipline at D; without it, an optimization algorithm aiming at minimizing the delay would give precedence to the airplanes with the smallest landing time. Further, we assume that Z does not depend on the type of airplane, but only on the weather, so it is the same for all airplanes. We now consider the case j = m and determine the generated perturbation ∆Lm(τ).
This provides the initial condition for the recursive relationship in (8.10). Recalling (8.3), we have Lm = max{Am, Lm−1} + Z and L̃m(τ) = max{Am, L̃k(τ)} + Z; therefore,

∆Lm(τ) = max{Am, L̃k(τ)} − max{Am, Lm−1}   (8.12)

where L̃k(τ) ≥ Lm−1. It follows that

∆Lm(τ) = L̃k(τ) − Lm−1 if Am ≤ Lm−1 (i.e., m does not start a new busy period),
∆Lm(τ) = max{0, L̃k(τ) − Am} if Am > Lm−1 (i.e., m starts a new busy period).

Let Wk be the airborne waiting time of airplane k, given by Wk = [Lm−1 − Ak(sm) − τ]+. Since L̃k(τ) = Ak(sm) + τ + Wk + Z, we get

∆Lm(τ) = Ak(sm) + τ + Wk + Z − Lm−1 if Am ≤ Lm−1,
∆Lm(τ) = max{0, Ak(sm) + τ + Wk + Z − Am} if Am > Lm−1.

Expanding Wk = max{0, Lm−1 − Ak(sm) − τ}, we get

∆Lm(τ) = max{0, Ak(sm) + τ − Lm−1} + Z if Am ≤ Lm−1,
∆Lm(τ) = max{0, Ak(sm) + τ + Z − Am, Lm−1 + Z − Am} if Am > Lm−1.   (8.13)

This expression gives the perturbation generated when airplane m lands, resulting from the addition of airplane k such that Am−1 = Ak(sm) ≤ Ak(sm) + τ ≤ Am. Thus, the control variable τ is constrained by 0 ≤ τ ≤ Am − Ak(sm). Using the perturbation expression in (8.13) and the cost expression in (8.11), we can determine the ground-holding delay τ that minimizes the additional cost Ck(sm, τ) (under fixed sm). This is accomplished in the following theorem, the proof of which is found in Appendix E.

Theorem 8.4.1 Let T1 = Lm−1 − Ak(sm) and T2 = Am − Ak(sm). Then, in any interval [Am−1, Am), the additional cost Ck(sm, τ) is minimized by

τ* = T1 + T2 − max{T1, T2} − min{0, T1}   (8.14)

So far we have identified the minimum cost in each of the intervals [Am−1, Am) for all m = a, a + 1, · · ·. Next we summarize the results in the form of an algorithm that gives the solution to (8.9).
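Theorem 8.4.1 and (8.13) are straightforward to evaluate numerically; a small sketch with our own function names and illustrative numbers:

```python
def tau_star(L_m1, A_m, Ak_sm):
    """Theorem 8.4.1: tau* = T1 + T2 - max{T1, T2} - min{0, T1},
    with T1 = L_{m-1} - A_k(s_m) and T2 = A_m - A_k(s_m)."""
    T1 = L_m1 - Ak_sm
    T2 = A_m - Ak_sm
    return T1 + T2 - max(T1, T2) - min(0.0, T1)

def dL_m(tau, Ak_sm, L_m1, A_m, Z):
    """Generated perturbation (8.13) at the landing of airplane m."""
    if A_m <= L_m1:                 # m does not start a new busy period
        return max(0.0, Ak_sm + tau - L_m1) + Z
    return max(0.0, Ak_sm + tau + Z - A_m, L_m1 + Z - A_m)

# k can arrive at 10.0; the previous airplane clears the runway at 13.0
# and the next arrival is at 18.0, so tau* holds k on the ground until 13.0:
print(tau_star(13.0, 18.0, 10.0))         # -> 3.0
# With Z = 5, this choice generates no perturbation at m's landing:
print(dL_m(3.0, 10.0, 13.0, 18.0, 5.0))   # -> 0.0
```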
8.4.1. FPA-Based Control Algorithm

The algorithm we present here determines the GHD that minimizes the incremental cost in (8.9), based on the assumption that the controller's information consists of a list of all expected airplanes j = 1, · · · , k − 1 that have already been scheduled prior to airplane k requesting permission to arrive at D. The list includes the expected arrival times (Aj), calculated landing times (Lj), and idle period lengths (Ij ≥ 0) preceding each scheduled airplane. Note that this algorithm solves a cost-minimization problem for each airplane k; as such, this is a local optimization problem, in the sense that it is driven only by an individual takeoff request and does not take into account all expected arrival information in the system. This justifies the term Local qualifying the FPA-based control algorithm (L-FPA). In the next section, we will show that the local nature of this algorithm may lead to sub-optimal solutions and then examine how it may be used to achieve global optimality. The L-FPA algorithm starts by identifying the airplanes that will precede and follow the new airplane (denoted by k) when its ground-holding time is zero, that is, the airplanes indexed by a − 1 and a respectively. Using the information in the expected airplane list (Aa−1, Aa) together with Theorem 8.4.1, the controller determines the ground-holding delay that would minimize Ck(0, τ) in the interval [Ak(0), Aa). Next it asks what would happen if airplane k were to arrive between the next two airplanes; in other words, if m − 1 = a and m = a + 1. The algorithm continues traversing the subsequent intervals, checking whether increasing the ground-holding delay may reduce the cost. The question that arises is when the algorithm should stop this search. Note that any GHP trades off airborne delay for ground delay, based on the assumption that airborne delay is more expensive.
Therefore, when the cost due to ground-holding delay becomes greater than the airborne cost under zero ground delay, increasing the ground-holding delay any further will not reduce the cost, hence defining a stopping condition. In other words, the stopping condition is cg sm ≥ Ck(0, 0), where cg is the ground-holding cost per unit time, sm = max{0, Am−1 − Ak(0)}, and Ck(·, ·) is defined in (8.11).

Local FPA-based Control Algorithm (L-FPA)
When k requests permission to arrive at D,
Step 1. INITIALIZE: m := min{i : Ak(0) < Ai}, sm := 0, Cmin := Ck(0, 0), GHD := 0
Step 2. IF cg sm ≥ Ck(0, 0) GOTO Step 6
Step 3. Determine τ* using equation (8.14)
Step 4. IF Ck(sm, τ*) < Cmin THEN Cmin := Ck(sm, τ*) and GHD := sm + τ*
Step 5. Set m := m + 1, sm := Am−1 − Ak(0), and GOTO Step 2
Step 6. END.

To complete the specification of the algorithm, we include the cases where m − 1 or m does not exist by setting Lm−1 = 0 and Am = ∞ respectively. Upon termination of the algorithm, GHD holds the value of the ground-holding delay that minimizes the cost function Ck(·), i.e., d* = s*m + τ* = GHD. The following lemma provides some insight into the operation of the L-FPA algorithm and will prove useful in our analysis of global optimality. Lemma 8.4.1 asserts that when an airplane k is expected to arrive at D, it will be assigned a GHD such that its own airborne delay is zero. In other words, when k is expected to arrive during a busy period of the system, it is assigned a GHD which is at least long enough to allow the last airplane of the busy period to clear the runway. In the sequel we shall use dk to denote the GHD of the kth airplane as determined by L-FPA.

Lemma 8.4.1 Suppose that Ak(0) ≤ Aa ≤ La−1, where a := min{i : Ak(0) < Ai}. Then, the GHD assigned to k by the L-FPA controller is such that dk ≥ Ll(dl) − Ak(0), where l is the index of the last airplane of the busy period that a belongs to.

The proof of the lemma is included in Appendix E.
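As an illustration, the sketch below evaluates the incremental cost by re-simulating the Lindley recursion (8.3) on the merged schedule, a brute-force stand-in for the closed forms (8.11) and (8.14), and scans the same candidate arrival slots as L-FPA with the same stopping rule; all function names and numbers are ours, not the dissertation's.

```python
def landings(A, Z):
    """Lindley recursion (8.3) with constant runway time Z (A8.1)."""
    out, prev = [], float("-inf")
    for a in A:
        prev = max(a, prev) + Z
        out.append(prev)
    return out

def incremental_cost(Ak, d, arrivals, Z, cg, ca):
    """Cost of inserting airplane k with GHD d into the nominal
    schedule, obtained by re-simulating instead of using (8.11)."""
    base = landings(arrivals, Z)
    merged = sorted(arrivals + [Ak + d])
    pos = merged.index(Ak + d)
    pert = landings(merged, Z)
    own_wait = pert[pos] - Z - (Ak + d)          # airborne wait of k
    others = pert[:pos] + pert[pos + 1:]
    induced = sum(p - b for p, b in zip(others, base))
    return cg * d + ca * (own_wait + induced)

def l_fpa(Ak, arrivals, Z, cg, ca):
    """L-FPA search: try d = 0 and every d that lands k exactly when a
    previously scheduled airplane clears the runway; stop once the
    ground cost alone exceeds C_k(0, 0) (Step 2's stopping rule)."""
    c0 = incremental_cost(Ak, 0.0, arrivals, Z, cg, ca)
    best_d, best_c = 0.0, c0
    for L in sorted(landings(arrivals, Z)):
        d = L - Ak
        if d <= 0:
            continue
        if cg * d >= c0:                 # stopping condition
            break
        c = incremental_cost(Ak, d, arrivals, Z, cg, ca)
        if c < best_c:
            best_d, best_c = d, c
    return best_d, best_c

# k could arrive at t = 4 amid arrivals at 0 and 3 (Z = 5, ca = 2*cg):
print(l_fpa(4.0, [0.0, 3.0], Z=5.0, cg=1.0, ca=2.0))  # -> (6.0, 6.0)
```

In the example, holding k for 6 time units trades 6 units of airborne delay (cost 12) for 6 units of ground delay (cost 6), matching case (c) of Figure 8.3.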
An obvious corollary of Lemma 8.4.1 is that within any busy period, the additional cost Ck(·, ·) is monotonically decreasing as dk increases. From a practical standpoint, the L-FPA algorithm is well suited for dynamic control: it is triggered by the kth airplane ready to take off, which requests permission from the controller of the destination airport by sending its earliest possible arrival time Ak(0). The controller then determines the GHD that minimizes the incremental cost based on all information available up to that time instant, i.e., all expected arrival times Aj, where j = 1, · · · , k − 1 are airplanes already scheduled prior to k. Note that Ak(0) will account for possible congestion at the source airport, as well as any other known factors that may delay its departure. In addition, Ak(0) will include the best current estimate of the travel time of k until the destination is reached. Whether this approach can also achieve global optimality in the sense of completely eliminating all airborne delays is an issue addressed in the next section.

8.4.2. Global Optimality of the FPA Approach

As previously mentioned, the L-FPA algorithm addresses a local optimization problem pertaining to an individual airplane k, and it does not take into account all expected arrival information in the system; this includes future airplanes expected to request permission to take off from a source airport and fly to D. This raises the question of whether the algorithm can nevertheless achieve global optimality. In this section, we shall first show how the L-FPA algorithm may lead to sub-optimal solutions and then examine how it may be modified to achieve global optimality. The fact that the L-FPA controller does not take into account all expected arrivals may lead to sub-optimal policies, as shown in Figure 8.5. Suppose that flight fk is expected to arrive at D after Ak(sm) time units.
Under the FPA scheme, assuming appropriate ground and airborne costs cg and ca respectively, fk will be assigned a ground delay τ*, as indicated in Figure 8.5 (a). In addition, assume that flight fm, originating from an airport located farther away than the source airport of k, had already requested permission to take off and is scheduled to arrive at Am, as shown in the same figure. When the L-FPA control policy is implemented, fk will induce an airborne delay w_m^a on flight fm. One can easily notice that the performance of the L-FPA scheme can be improved by introducing a ground delay τ_m^* to flight fm, thus eliminating all airborne delays, as shown in Figure 8.5 (b). However, under the L-FPA control policy the latter cannot materialize, since at the time the information on fk becomes available, fm is already en route to D; therefore, introducing ground delays is infeasible and the only option at this point is to reduce the speed of fm.

[Figure 8.5: Global Optimality: (a) L-FPA result (b) Global Optimum.]

Therefore, if we consider a “global optimum” policy to be one through which airborne delay is entirely traded off for ground-holding delay, then the absence of future information, i.e., information on airplanes that are expected to take off after k, prevents the L-FPA control scheme from achieving this goal. The question that arises is whether this scheme, equipped with additional information on all expected flights, can converge to a globally optimal policy. For instance, every morning the airport controller has information on the expected arrival times of all flights that are scheduled for the day, as well as predictions of the airport capacity. Is it possible to use this information together with the FPA approach to derive a policy such that airborne delay is reduced to zero? It turns out that this is possible, as shown in Theorem 8.4.2.
Before further considering optimality properties, let us completely characterize a “globally optimal” policy. To do so, assume that during a particular day, under a zero ground-holding policy, all airplanes will experience a total of Da^0 time units of airborne delay and, of course, zero ground delay, Dg^0 = 0. In this case, under any ground-holding policy, it is unavoidable that all airplanes will experience a total of at least Da^0 time units of delay, either in the air or on the ground. Assuming that airborne delay is more expensive than ground delay, the globally optimal policy is one where

Da^* = 0 and Dg^* = Da^0   (8.15)

That is, all of the airborne delay is traded off for exactly the same amount of ground delay. We will show in Theorem 8.4.2 that the L-FPA algorithm can achieve (8.15) under an appropriate information structure. To do so, we will make use of the following lemma, the proof of which is in Appendix E.

Lemma 8.4.2 Suppose that all flights are ordered based on their arrival times such that A1(0) ≤ A2(0) ≤ · · · ≤ Ak(0) ≤ · · · ≤ AK(0), and are used to drive the L-FPA algorithm in that order. Then, the L-FPA algorithm will assign GHDs dj, j = 1, · · · , K, such that A1(d1) ≤ A2(d2) ≤ · · · ≤ Ak(dk) ≤ · · · ≤ AK(dK), where dj = sj + τj and sj, τj are the solutions to (8.9).

The proof of Lemma 8.4.2 provides additional insight into the operation of the L-FPA algorithm when each airplane is considered according to its arrival order. The following corollary allows us to immediately determine the GHD of k + 1 given the schedule up to k.

Corollary 8.4.1 Suppose that all flights are ordered based on their arrival times such that A1(0) ≤ A2(0) ≤ · · · ≤ Ak(0) ≤ · · · ≤ AK(0), and are used to drive the L-FPA algorithm in that order. Then, the GHD assigned by L-FPA to each airplane is given by

dk = sk + τk = max{0, Lk−1(dk−1) − Ak(0)} for all k = 1, · · · , K.   (8.16)

The proof of the corollary follows from the proof of Lemma 8.4.1.
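The recursion (8.16) makes a global FPA pass only a few lines of code; the sketch below assumes a constant runway time Z (A8.1) and uses illustrative arrival times of our own choosing.

```python
def g_fpa(A0, Z):
    """One pass of the recursion (8.16): process flights in arrival
    order and set d_k = max{0, L_{k-1}(d_{k-1}) - A_k(0)}, so that
    every airplane lands the moment it arrives and all airborne delay
    is converted into ground delay."""
    d, L_prev = [], float("-inf")
    for Ak in sorted(A0):
        dk = max(0.0, L_prev - Ak)
        L_prev = Ak + dk + Z       # k arrives exactly when the runway frees
        d.append(dk)
    return d

# Four flights bunched at a morning peak, runway time Z = 5:
print(g_fpa([0.0, 2.0, 3.0, 11.0], Z=5.0))  # -> [0.0, 3.0, 7.0, 4.0]
```

Note that the GHDs sum to 14, exactly the total airborne delay these four flights would experience under zero ground holding, in line with (8.15).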
When k − 1 is the last scheduled airplane, H = 0 and hence Ak(dk) = Lk−1(dk−1); therefore, dk = Lk−1(dk−1) − Ak(0). The max operator is necessary for the case when Ak(0) > Lk−1(dk−1). This suggests that if the kth airplane is scheduled to arrive before airplane k − 1 clears the runway (Ak(0) < Lk−1(d∗k−1)), it should delay its departure by Lk−1(d∗k−1) − Ak(0), which is the expected airborne delay that would be experienced by airplane k. On the other hand, if the kth airplane is expected after k − 1 lands (Ak(0) > Lk−1(d∗k−1)), then it should depart immediately. In this case, the loop of steps 2-5 of the L-FPA algorithm in Section 8.4.1 is eliminated.

Theorem 8.4.2 Suppose that all flights are ordered based on their arrival time such that A1(0) ≤ A2(0) ≤ · · · ≤ Ak(0) ≤ · · · ≤ AK(0), and are used to drive the L-FPA algorithm in that order. Then, the L-FPA algorithm yields a globally optimal policy.

The proof of the theorem is included in Appendix E. Theorem 8.4.2 suggests an alternative way of using the FPA algorithm, which is referred to as "Global FPA" (G-FPA) because it results in a globally optimal schedule. In this case, the FPA algorithm is triggered at any desired time instant and is given the list of all expected airplane arrivals in ascending order (i.e., Aj(0) ≤ Aj+1(0), for j = 1, · · · , K − 1). The output of FPA is also a list with the optimal GHDs of all airplanes dj, j = 1, · · · , K.

Global FPA Algorithm (G-FPA)
At any point in time
Step 1. Order airplanes according to their arrival times A1(0) ≤ A2(0) ≤ · · · ≤ AK(0)
Step 2. FOR k = 1 TO K
            dk = max{0, Lk−1(dk−1) − Ak(0)}
Step 3. END.

Figure 8.6 shows two different ways of implementing the underlying FPA algorithm developed in Section 8.4 which result in different control structures. First, the L-FPA controller is shown in Figure 8.6 (a).
In this case, the underlying FPA algorithm is triggered by every airplane k which is ready to depart and informs the controller of its expected arrival time Ak(0). The FPA algorithm determines a GHD and adds k to the schedule. Note that when this is done, L-FPA does not change the GHD of any airplane that has already been scheduled. Furthermore, this controller does not take into consideration any airplanes that will depart after k and, for this reason, it cannot guarantee a globally optimal solution in the sense of (8.15), as indicated at the beginning of this section. The G-FPA controller is shown in Figure 8.6 (b). This controller is invoked every time there is a change in some of the current estimates: the airport capacity (i.e., changes in Z) or the estimates of airplane expected arrival times Aj(0), j = 1, · · · , K. Every time the FPA algorithm is triggered it considers future information and can change the GHD of any airplane, leading to a global optimum as proven in Theorem 8.4.2. As shown next, even though this algorithm needs to be executed several times, it is efficient enough not to pose any computational problems.

Figure 8.6: FPA-based algorithms. (a) Local FPA (L-FPA) controller triggered by airplane departures; (b) Global FPA (G-FPA) controller triggered at any time.

8.4.3. Algorithm Complexity

The efficiency of the L-FPA algorithm depends on several parameters, including the airport capacity. For example, when the airport capacity is at its maximum, the optimal GHD for all airplanes is zero, assuming that the original flight scheduling was done optimally. In this case, the loop of steps 2, 3 and 4 of L-FPA is implemented only once.
On the other hand, when the airport capacity is close to zero, every airplane will experience a long GHD and it is possible that the loop will be implemented close to k times. In the case of G-FPA, the input is the entire list of all expected arrivals in ascending order (i.e., Aj(0) ≤ Aj+1(0), for j = 1, · · · , K − 1). Using Corollary 8.4.1, determining the GHD of any airplane involves just the evaluation of dk through (8.16), which in turn involves a single addition and comparison. Therefore, the total number of operations required to determine the optimal schedule is just 2K, where K is the total number of flights expected. Since the G-FPA algorithm is computationally efficient, it is reasonable to have it executed several times during a day, for every significant change in weather conditions or in the expected arrival time of a flight.

8.5. Numerical Results

This section describes some numerical results that were obtained through simulation of the KS and FPA schemes. We used the model of Figure 8.1 with M = 20 source airports located at distances of 30, 45, · · · , 285 minutes away from D. Furthermore, based on the distance from D, each flight was assigned a scheduled take-off time so that the arrival pattern at D is the one shown in Figure 8.7. This figure shows the number of scheduled arrivals at Boston's Logan International airport for every hour of a typical day and was also used in the studies described in [62]. For the purposes of our experiments, we assumed that the airplanes expected within a given hour were equally spaced over the interval.

Figure 8.7: Hourly landings at airport D

8.5.1. Performance of KS

To test the performance of the KS algorithm, we simulated the system with and without the KS controller and compared their corresponding results.
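The equally spaced arrival pattern described above is straightforward to generate from the hourly counts of Figure 8.7. The sketch below is an illustration only (the function name and interface are ours, not the dissertation's simulator), and it places each hour's flights at the start of uniform sub-intervals; centering them within their slots would be an equally valid convention.

```python
def hourly_to_arrivals(hourly_counts, start_hour=6):
    """Spread each hour's scheduled landings uniformly over that hour.
    Returns expected arrival times in minutes after midnight."""
    arrivals = []
    for h, count in enumerate(hourly_counts, start=start_hour):
        if count == 0:
            continue
        gap = 60.0 / count                       # uniform spacing within the hour
        arrivals.extend(h * 60 + i * gap for i in range(count))
    return arrivals

# First three hours of a pattern like Figure 8.7 (3, 16, 23 landings):
times = hourly_to_arrivals([3, 16, 23])
print(len(times), times[:3])  # 42 [360.0, 380.0, 400.0]
```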
First we assumed that the landing capacity of D is fixed at 40 landings per hour, i.e., the minimum separation between any two consecutive airplanes is Z = 1.5 minutes. Furthermore, every time a flight was given a kanban from a higher stage, the delay assigned to it was the average of the maximum and minimum possible delays (see Section 8.3.2); in other words, delayed airplanes were placed in the middle of the stage. Finally, we fixed the number of kanban per stage to three (3) and observed the performance of the KS scheme for various values of the landing stage duration σ, which is a parameter of the controller. As indicated in Figure 8.8, increasing the landing stage duration decreases the airborne delay while increasing the ground holding time. This is expected, since increasing the interval σ decreases the instantaneous arrival rate at D (3/σ).

Figure 8.8: Trade-off between airborne and ground delays for the KS controller

Figure 8.9 shows the percentage improvement of the delay cost for various ca/cg ratios. Note that the improvement is maximized when the stage duration is equal to 4.5 minutes, which is also equal to the number of kanban times the minimum separation between any two consecutive arrivals. This observation can be generalized through "flow equilibrium" to provide a rule for setting the number of kanban at every stage i, that is

ki = ⌈σ/Zi⌉ (8.17)

where Zi is the predicted minimum separation between any two consecutive airplanes at time t = iσ, and ⌈x⌉ is the smallest integer greater than or equal to x.

Figure 8.9: Overall cost improvement under the KS scheme

8.5.2.
Performance of L-FPA

As in the KS case, to demonstrate the effectiveness of the L-FPA scheme we compare the performance under the L-FPA controller with the uncontrolled case. In Figure 8.10 we set the cost parameters to ca = 1.2cg and obtain the expected ground and airborne delays as a function of the airport capacity. Note that the total delay in the uncontrolled case consists of only the airborne delay. On the other hand, L-FPA forces the airborne delay down to almost zero at the expense of longer ground delays. Also note that the total delay (ground plus airborne) is the same for both cases. Notice that in this case L-FPA performed very well even though, as we have seen, it cannot guarantee a globally optimal schedule. Had we used the G-FPA controller, which uses future information as well, the airborne delay would have been reduced to zero, while the ground delay would have been exactly equal to the airborne delay of the uncontrolled case. Figure 8.11 shows the cost benefits of L-FPA under various ca/cg ratios. As indicated in the figure, the higher the airborne cost, the higher the benefit of the L-FPA scheme. Also note that in the case where ca/cg = 1 there is no benefit in using a ground holding policy, as indicated in Section 8.3.2. Finally, one can see the benefit of using L-FPA instead of KS by observing the maximum improvements achieved by the two algorithms. Suppose that the ratio ca/cg = 2. Then, as indicated in Figure 8.9, KS can achieve an improvement of about 38%. On the other hand, when the capacity is 40 landings per hour, L-FPA can achieve an improvement of about 46%, as indicated in Figure 8.11.

Figure 8.10: Trade-off between airborne and ground delays for the L-FPA controller

8.6.
Conclusions and Future Directions

In this chapter we present two new approaches for solving the ground-holding problem in air traffic control. The first approach is a heuristic based on the kanban control policy. The second approach stems from ideas in perturbation analysis of discrete-event systems and is proven to generate an optimal policy. Both approaches are very easy to implement, inherently dynamic, and scalable, while they can accommodate various aspects of the problem such as limited sector capacities. In the future it would be interesting to investigate extensions that will enable the analysis of networks of airports. Specifically, we need to address multiple-destination flights and banks of flights in the hub-and-spoke system. In multiple-destination flights, a GHD at any intermediate airport may generate additional costs at several downstream destination airports, so the monotonicity result of Lemma 8.4.1 may no longer hold. In the hub-and-spoke system, a GHD of an incoming flight at the hub may delay several outgoing flights, or it may cause passengers to miss their connecting flights. In these cases, it is still possible to utilize both of the proposed approaches, but communication between airport controllers may be necessary.

Figure 8.11: Overall cost improvement under the L-FPA scheme

Chapter 9

EPILOGUE

In conclusion, we summarize the achievements of the dissertation and outline some possible directions for future research.

9.1. Summary

The main focus of this dissertation was the dynamic allocation of discrete resources in stochastic environments. Such problems arise frequently in communication networks, manufacturing systems, transportation systems, etc. To solve such problems, we developed two algorithms.
The first one is a descent algorithm: at every iteration it moves to an allocation with a lower cost, and it is most suitable for problems with a separable convex structure. At every step, this algorithm takes a resource from the "most sensitive" user and reallocates it to the "least sensitive" user; thus it always visits feasible allocations, which makes it appropriate for on-line use. The second algorithm is incremental in the sense that it starts with zero resources and at every step allocates an additional resource; it is suited for systems that satisfy the smoothness or complementary smoothness conditions. In a deterministic environment, both algorithms were proven to converge to the globally optimal allocation in a finite number of steps which grows linearly with the number of resources to be allocated. Furthermore, in a stochastic environment they were proven to converge in probability, while under some additional mild assumptions they converge with probability one. Moreover, because they are driven by ordinal comparisons they are robust with respect to estimation noise; their convergence is thus accelerated, since they can use estimates taken over shorter observation intervals.

In addition, we used perturbation analysis to improve the performance of the optimization algorithms and to enable on-line implementation. Towards this end, we developed two techniques for predicting the system performance under several parameters while observing a single sample path under a single parameter. The first technique, Concurrent Estimation, can be directly applied to general DES, while for the second one, FPA, we demonstrated a general procedure for deriving such an algorithm from the system dynamics. Furthermore, we point out that both procedures can be used for systems with general event lifetime distributions. Subsequently, we applied principles from the derived optimization techniques to three different problems.
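Before turning to the applications, the descent idea summarized above can be sketched concretely: repeatedly move one resource from the user who loses least by giving it up to the user who gains most by receiving it, and stop when no move lowers the total cost (which is the pairwise optimality condition in prose). This is an illustration only; the interface `costs[i](n)` and the exhaustive pair search are ours, whereas the dissertation's algorithm works with finite-difference estimates and a candidate set.

```python
def descent_step(alloc, costs):
    """Find donor i and recipient j such that moving one resource from i
    to j lowers the total cost the most; return None if no improving move
    exists (i.e. the stopping condition of the descent scheme holds)."""
    N = len(alloc)
    best = None
    for i in range(N):
        if alloc[i] == 0:
            continue
        loss_i = costs[i](alloc[i] - 1) - costs[i](alloc[i])      # donor's cost increase
        for j in range(N):
            if j == i:
                continue
            gain_j = costs[j](alloc[j]) - costs[j](alloc[j] + 1)  # recipient's cost decrease
            if gain_j > loss_i and (best is None or gain_j - loss_i > best[0]):
                best = (gain_j - loss_i, i, j)
    return best and best[1:]

def descent_allocate(alloc, costs):
    """Apply descent moves until none improves; every intermediate
    allocation is feasible, which is what makes the scheme usable on-line."""
    alloc = list(alloc)
    move = descent_step(alloc, costs)
    while move:
        i, j = move
        alloc[i] -= 1
        alloc[j] += 1
        move = descent_step(alloc, costs)
    return alloc

# Separable convex decreasing per-user costs L_i(n) = w_i / (n + 1):
costs = [lambda n, w=w: w / (n + 1) for w in (8.0, 3.0, 1.0)]
print(descent_allocate([4, 0, 0], costs))  # [3, 1, 0]
```

For this separable convex example the final allocation [3, 1, 0] is the global minimum of the total cost, regardless of the feasible starting point, consistent with the convergence result summarized above.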
The first problem is from the area of manufacturing systems. Specifically, we addressed the problem of allocating a fixed number of kanban to the various production stages so as to optimize a performance measure (e.g., throughput, mean delay) while maintaining a low work-in-process inventory. We showed that such systems satisfy the smoothness and complementary smoothness conditions and hence used the "incremental" algorithm to perform the resource allocation. The second problem we considered is from the area of mobile telecommunication networks, where we addressed the problem of channel allocation so as to minimize the number of calls lost due to the unavailability of free channels. This problem can be modeled as a separable convex problem and hence we used a variation of the descent algorithm to solve it. Through simulation we showed that the proposed algorithms performed better than existing algorithms while requiring fewer reconfigurations. The last application is from the area of air traffic control, where we addressed the problem of determining the ground holding delay so as to minimize congestion over busy airports. To solve the problem we developed two new approaches. The first one is a heuristic based on the kanban control policy. The second approach stems from the ideas in FPA and is proven to generate an optimal policy. Both approaches are very easy to implement, inherently dynamic, and scalable, while they can accommodate various aspects of the problem such as limited sector capacities.

9.2. Future Directions

There are several directions in which one can extend the results presented in this dissertation. First, in Section 3.5 we started looking at ways of developing a descent-like algorithm for more general systems and proposed a modified version of the algorithm. One possibility is to investigate the conditions under which that modified algorithm converges to the optimal allocations, in both deterministic and stochastic environments.
Furthermore, another possibility worth exploring is whether it is possible to improve the performance of such algorithms by redefining the quantities that involve the finite differences ∆Li(·) and δk. These quantities, as defined in equations (3.2), (3.9) and (3.22), describe what would happen if some user gets an extra resource independently from what would happen if another user gives up a resource. This, however, works only for separable problems. For the general case, it may be necessary to define these quantities in such a way that they reflect the combined effect. For example, one may define ∆Lij(s) = L(s) − L(s + ei − ej) for all i, j = 1, 2, · · · , N, where again ei is a vector with all of its elements equal to 0 except the ith one, which is equal to 1. Of course, the drawback is that more such differences might be necessary at every step.

The channel allocation problem also has some directions worth exploring. First, it should be interesting to investigate the effect of the proposed algorithms (SN and EN) in systems with a hierarchical cell structure, where overlapping areas are potentially much bigger. Furthermore, the results presented in Chapter 7 assume that the mobile users are "stationary", i.e., they do not cross the boundaries of the cell where they were initiated. Clearly, it would be interesting to look at how such algorithms affect the number of dropped calls when users are allowed to roam. Finally, note that users in the overlapping areas experience a higher quality of service compared to users located in areas that are covered by a single base station. Another interesting issue is to see how SN and EN affect the fairness in the perceived quality of service among users.

Finally, there are several unresolved issues relating to the proposed approaches for solving the ground holding problem in air traffic control.
For this problem it would be interesting to investigate extensions that will enable the analysis of networks of airports. Specifically, one should address multiple-destination flights and banks of flights in the hub-and-spoke system. In multiple-destination flights, a GHD at any intermediate airport may generate additional costs at several downstream destination airports, so the monotonicity result of Lemma 8.4.1 may no longer hold. In the hub-and-spoke system, a GHD of an incoming flight at the hub may delay several outgoing flights, or it may cause passengers to miss their connecting flights. In these cases, it is still possible to utilize both of the proposed approaches, but communication between airport controllers may be necessary.

Appendix A

SELECTED ALGORITHMS

A.1 S-DOP Pseudo Code

1.0 Initialize: s(0) = [n1(0), · · · , nN(0)]; C(0) = {1, · · · , N}; k = 0; initialize f(k).
1.1 Evaluate D̂k(n1(k), · · · , nN(k)) ≡ [∆L̂1^{f(k)}(n1(k)), · · · , ∆L̂N^{f(k)}(nN(k))]
2.1 Set i∗ = arg max_{i∈C(k)} [∆L̂i^{f(k)}(ni(k))]
2.2 Set j∗ = arg min_{i∈C(k)} [∆L̂i^{f(k)}(ni(k))]
2.3 Increase f(k) and evaluate D̂k(n1(k), · · · , ni∗(k) − 1, · · · , nj∗(k) + 1, · · · , nN(k))
2.4 If ∆L̂j∗^{f(k)}(nj∗(k) + 1) < ∆L̂i∗^{f(k)}(ni∗(k)) Goto 3.1 ELSE Goto 3.2
3.1 Update allocation: ni∗(k+1) = ni∗(k) − 1; nj∗(k+1) = nj∗(k) + 1; nm(k+1) = nm(k) for all m ∈ C(k) and m ≠ i∗, j∗; set k ← k + 1 and go to 2.1
3.2 Replace C(k) by C(k) − {j∗}; if |C(k)| = 1, reset C(k) = {1, · · · , N}; go to 2.2

A.2 Time Warping Algorithm (TWA)

1. INITIALIZE n := 0, k := 0, tn := 0, t̃k := 0, xn := x0, x̃k := x̃0, yi(n) = vi(1) for all i ∈ Γ(xn), si^n = 0, s̃i^k = 0 for all i ∈ E, M(0, 0) := Γ(x̃0), A(0, 0) := ∅

2. WHEN EVENT en IS OBSERVED:

2.1 Use (5.3)-(5.8) to determine e_{n+1}, x_{n+1}, t_{n+1}, yi(n + 1) for all i ∈ Γ(x_{n+1}), si^{n+1} for all i ∈ E.
2.2 Add the e_{n+1} event lifetime to Vi(n + 1, k):

Vi(n + 1, k) = Vi(n, k) + vi(si^n + 1) if i = e_{n+1}; Vi(n + 1, k) = Vi(n, k) otherwise

2.3 Update the available event set A(n, k): A(n + 1, k) = A(n, k) ∪ {e_{n+1}}
2.4 Update the missing event set M(n, k): M(n + 1, k) = M(n, k)
2.5 IF M(n + 1, k) ⊆ A(n + 1, k) then Goto 3. ELSE set n ← n + 1 and Goto 2.1.

3. TIME WARPING OPERATION:

3.1 Obtain all missing event lifetimes to resume sample path construction at state x̃k:

ỹi(k) = vi(s̃i^k + 1) for i ∈ M(n + 1, k); ỹi(k) = ỹi(k − 1) otherwise

3.2 Use (5.3)-(5.8) to determine ẽ_{k+1}, x̃_{k+1}, t̃_{k+1}, ỹi(k + 1) for all i ∈ Γ(x̃_{k+1}) ∩ (Γ(x̃k) − {ẽ_{k+1}}), s̃i^{k+1} for all i ∈ E.
3.3 Discard all used event lifetimes: Vi(n + 1, k + 1) = Vi(n + 1, k) − vi(s̃i^k + 1) for all i ∈ M(n + 1, k)
3.4 Update the available event set A(n + 1, k): A(n + 1, k + 1) = A(n + 1, k) − {i : i ∈ M(n + 1, k), s̃i^{k+1} = si^{n+1}}
3.5 Update the missing event set M(n + 1, k): M(n + 1, k + 1) = Γ(x̃_{k+1}) − (Γ(x̃k) − {ẽ_{k+1}})
3.6 IF M(n + 1, k + 1) ⊆ A(n + 1, k + 1) then k ← k + 1 and Goto 3.1. ELSE k ← k + 1, n ← n + 1 and Goto 2.1.

A.3 Finite Perturbation Algorithm for Serial Queueing Systems

1. Initialize: ∆Dk^n = 0 for all k, n ≤ 0.

2.
At the departure of (k, n):

(a) If (k, n) did NOT wait and was NOT blocked (Wk^n ≤ 0, Bk^n ≤ 0), then

∆Dk^n = ∆Dk^{n−1} − max{ 0, ∆^{(k,n−1)}_{(k−1,n)} − Ik^n, ∆^{(k,n−1)}_{(k−x_{n+1}−1[n+1], n+1)} + Bk^n − Q^{n+1}_{k−x_{n+1}} · 1[n + 1] }

(b) If (k, n) waited but was NOT blocked (Wk^n > 0, Bk^n ≤ 0), then

∆Dk^n = ∆D_{k−1}^n − max{ 0, ∆^{(k−1,n)}_{(k,n−1)} − Wk^n, ∆^{(k−1,n)}_{(k−x_{n+1}−1[n+1], n+1)} + Bk^n − Q^{n+1}_{k−x_{n+1}} · 1[n + 1] }

(c) If (k, n) was blocked (Bk^n > 0), then

∆Dk^n = ∆D^{n+1}_{k−x_{n+1}} − max{ ∆^{(k−x_{n+1}, n+1)}_{(k,n−1)} − Bk^n − [Ik^n]^+, ∆^{(k−x_{n+1}, n+1)}_{(k−1,n)} − Bk^n − [Wk^n]^+, ∆^{(k−x_{n+1}, n+1)}_{(k−x_{n+1}−1, n+1)} − Q^{n+1}_{k−x_{n+1}} · 1[n + 1] }

Appendix B

PROOFS FROM CHAPTER 3

B.1 Proof of Theorem 3.2.1

First, define the set B(s) = {s′ : s′ = [n1, · · · , ni + 1, · · · , nj − 1, · · · , nN] for some i ≠ j}, which includes all feasible neighboring points of s = [n1, · · · , nN], i.e., vectors which differ from s by +1 and −1 in two distinct components (recall that n1 + · · · + nN = K). To prove that (3.3) is a necessary condition, assume that s̄ is a global optimum, that is, L(s̄) ≤ L(s′) for all s′ ∈ B(s̄). From this we can write:

∑_{i=1}^N Li(n̄i) ≤ L1(n̄1) + · · · + Li(n̄i + 1) + · · · + Lj(n̄j − 1) + · · · + LN(n̄N)

or

Li(n̄i) + Lj(n̄j) ≤ Li(n̄i + 1) + Lj(n̄j − 1)

and, therefore,

∆Li(n̄i + 1) ≥ ∆Lj(n̄j) for any i, j (B.1)

To prove the sufficiency of (3.3), let s̄ = [n̄1, · · · , n̄N] be an allocation that satisfies (3.3), and let s∗ = [n∗1, · · · , n∗N] be a global optimum. Therefore, [n∗1, · · · , n∗N] satisfies (B.1), i.e.,

∆Li(n∗i + 1) ≥ ∆Lj(n∗j) for any i, j (B.2)

Let n∗i = n̄i + di for all i = 1, · · · , N, where di ∈ {−K, · · · , −1, 0, 1, · · · , K} subject to the constraint

∑_{j=1}^N dj = 0

which follows from the constraint n1 + · · · + nN = K. Then, define the set A = {i : di = 0}. There are now two cases depending on the cardinality |A| of this set:

Case 1: |A| = N. In this case we have n̄i = n∗i for all i, so that, trivially, s̄ ≡ s∗.
88 Case 2: |A| = 6 N . This implies, that there exist indices i, j such that di > 0 and dj < 0. Therefore, we can write the following ordering: ∆Lj (¯ nj + dj + 1) ≥ ∆Li (¯ ni + di ) ≥ ∆Li (¯ ni + 1) ≥ ∆Lj (¯ nj ) (B.3) where the first inequality is due to (B.2), the second is due to A3.2, and the third is due to our assumption that ¯s satisfies (3.3). However, for dj ≤ −2, using A3.2, we have ∆Lj (¯ nj ) > ∆Lj (¯ nj + dj + 1) which contradicts (B.3). It follows that for an allocation to satisfy (3.3) only dj = −1 is possible, which in turn implies that (B.3) holds in equality, i.e., ∆Lj (¯ nj ) = ∆Li (¯ ni + di ) = ∆Li (¯ ni + 1) (B.4) Using A3.2, this implies that di = 1. This argument holds for any (i, j) pair, therefore we conclude that the only possible candidate allocations ¯s satisfying (3.3) are such that ∆Li (¯ ni + 1) = ∆Lj (¯ nj ) for all i, j 6∈ A, di = 1, dj = −1 (B.5) Let the difference in cost corresponding to ¯s and s∗ be ∆(¯s, s∗ ). This is given by ∆(¯s, s∗ ) = N X [Li (¯ ni ) − Li (n∗i )] = i=1 = N X N X [Li (¯ ni ) − Li (¯ ni + di )] i=1 i6∈A ∆Li (¯ ni ) − i=1 i6∈A di =−1 N X ∆Li (¯ ni + 1) = 0 i=1 i6∈A di =1 where in the last step we use (B.5). This establishes that if ¯s = [¯ n1 , · · · , n ¯ N ] satisfies (3.3), then either ¯s ≡ s∗ as in Case 1 or it belongs to a set of equivalent optimal allocations and hence the theorem is proved. B.2 Proof of Theorem 3.2.2 Suppose that ¯s is a global optimum, and consider an allocation s = [n1 , · · · , nN ] such that s 6= ¯s. We can then express ni (0 ≤ ni ≤ K) as: ni = n ¯ i + di for all i = 1, · · · , N where di ∈ {−K, · · · , −1, 0, 1, · · · , K} and subject to N X dj = 0 (B.6) j=1 which follows from the fact that [¯ n1 , · · · , n ¯ N ] is a feasible allocation. Let i∗ = arg maxi=1,···,N {∆Li (¯ ni )}. 
If s ≠ s̄, it follows from (B.6) that there exists some j such that dj > 0, and two cases arise:

Case 1: If j = i∗, then

max_{i=1,···,N} {∆Li(ni)} ≥ ∆Lj(nj) = ∆Li∗(n̄i∗ + di∗) > ∆Li∗(n̄i∗)

where the last step is due to A3.2, since di∗ > 0.

Case 2: If j ≠ i∗, then first apply Theorem 3.2.1 to the optimal allocation s̄ to get

∆Lj(n̄j + 1) ≥ ∆Li∗(n̄i∗) (B.7)

Then, we can write the following:

max_{i=1,···,N} {∆Li(ni)} ≥ ∆Lj(nj) = ∆Lj(n̄j + dj) ≥ ∆Lj(n̄j + 1) ≥ ∆Li∗(n̄i∗)

where the second inequality is due to A3.2 and the fact that dj ≥ 1, and the last inequality is due to (B.7). Hence, (3.4) is established.

Next, we show that if an allocation s̄ satisfies (3.4) and A3.3 holds, it also satisfies (3.3), from which, by Theorem 3.2.1, we conclude that the allocation is a global optimum. Let i∗ = arg max_{i=1,···,N} {∆Li(n̄i)} and suppose that (3.3) does not hold. Then, there exists a j ≠ i∗ such that:

∆Lj(n̄j + 1) < ∆Li∗(n̄i∗) = max_{i=1,···,N} {∆Li(n̄i)} (B.8)

Note that if no such j were to be found, we would have ∆Lj(n̄j + 1) ≥ ∆Li∗(n̄i∗) > ∆Lk(n̄k) for all j, k (because of A3.3) and we would not be able to violate (3.3) as assumed above. Now, without loss of generality, let i∗ = 1 and j = N (j satisfying (B.8)). Then, using A3.2, A3.3, and (B.8), the feasible allocation [n̄1 − 1, n̄2, · · · , n̄N−1, n̄N + 1] is such that:

∆L1(n̄1) = max{∆L1(n̄1), · · · , ∆LN(n̄N)} > max{∆L1(n̄1 − 1), ∆L2(n̄2), · · · , ∆LN(n̄N + 1)}

which contradicts (3.4) for the feasible allocation [n̄1 − 1, n̄2, · · · , n̄N−1, n̄N + 1], and the theorem is proved.

B.3 Proof of Lemma 3.3.1

To prove P1, first note that if δk ≤ 0, then, from (3.5), ∆Li∗k+1(ni∗k+1,k+1) = ∆Li∗k(ni∗k,k).
On the other hand, if δk > 0, then there are two cases:

Case 1: If i∗k = i∗k+1 then, from A3.2, ∆Li∗k(ni∗k,k) > ∆Li∗k(ni∗k,k − 1) = ∆Li∗k+1(ni∗k+1,k+1)

Case 2: If i∗k ≠ i∗k+1 = p, then there are two possible subcases:

Case 2.1: If p = j∗k, since δk > 0, we have ∆Li∗k(ni∗k,k) > ∆Lp(np,k + 1) = ∆Li∗k+1(ni∗k+1,k+1)

Case 2.2: If p ≠ j∗k, then by the definition of i∗k and the fact that np,k = np,k+1, ∆Li∗k(ni∗k,k) ≥ ∆Lp(np,k) = ∆Li∗k+1(ni∗k+1,k+1)

The proof of P2 is similar to that of P1 and is omitted.

Next, we prove property P3. First, note that when p = i∗k we must have δk > 0. Otherwise, from (3.5), we get ni,k+1 = ni,k for all i = 1, · · · , N. From (3.10), this implies that i∗k+1 = i∗k = p, which violates our assumption that p ≠ i∗l for k < l < m. Therefore, with δk > 0, (3.5) implies that np,k+1 = np,k − 1. In addition, np,m = np,k+1 = np,k − 1, since p ≠ i∗l for all l such that k < l < m, and p ∈ Cm. We then have:

δm = ∆Li∗m(ni∗m,m) − ∆Lp(np,m + 1) = ∆Li∗m(ni∗m,m) − ∆Lp(np,k) = ∆Li∗m(ni∗m,m) − ∆Li∗k(ni∗k,k) ≤ 0

where the last inequality is due to P1. Therefore, (3.14) immediately follows from (3.6).

To prove P4, first note that when p = j∗k we must have δk > 0. If δk ≤ 0, then from (3.6), p is removed from Ck, in which case p ∉ Cm for any m > k and it is not possible to have p = i∗m as assumed. Therefore, with δk > 0, we get np,k+1 = np,k + 1 from (3.5). Moreover, np,m = np,k+1 = np,k + 1, since p ≠ j∗l for all l such that k < l < m, and p ∈ Cm. We now consider two possible cases:

Case 1: If δm > 0, then np,m+1 = np,m − 1 = np,k. The following subcases are now possible:

Case 1.1: If there is at least one j ∈ Cm+1 such that ∆Lj(nj,m+1) > ∆Lp(np,m+1), then we are assured that i∗m+1 ≠ p. If j∗m+1 = arg min_{i∈Cm+1} {∆Li(ni,m+1)} is unique, then, since j∗k = p and np,m+1 = np,k, it follows from P2 that j∗m+1 = p.
Now consider δm+1 and observe that

δm+1 = ∆Li∗m+1(ni∗m+1,m+1) − ∆Lp(np,m+1 + 1) = ∆Li∗m+1(ni∗m+1,m+1) − ∆Lp(np,m) = ∆Li∗m+1(ni∗m+1,m+1) − ∆Li∗m(ni∗m,m) ≤ 0

where the last inequality is due to P1. Therefore, from (3.6), Cm+2 = Cm+1 − {p} and (3.15) holds for q = 1. If, on the other hand, j∗m+1 is not unique, then it is possible that j∗m+1 ≠ p since we have assumed that ties are arbitrarily broken. In this case, there are at most q ≤ N − 1 steps before j∗m+q = p. This is because at step m + 1 either δm+1 ≤ 0 and j∗m+1 is removed from Cm+1, or δm+1 > 0 and, from (3.5), nj∗m+1,m+2 = nj∗m+1,m+1 + 1, in which case ∆Lj∗m+2(nj∗m+2,m+2) > ∆Lj∗m+1(nj∗m+1,m+1) from A3.2. The same is true for any of the q steps after m. Then at step m + q + 1, we get δm+q+1 ≤ 0 by arguing exactly as in the case where j∗m+1 is unique, with m + 1 replaced by m + q + 1, and again (3.15) holds.

Case 1.2: If ∆Lj(nj,m+1) is the same for all j ∈ Cm+1, then it is possible that i∗m+1 = p. In this case, δl < 0 for all l > m + 1 due to A3.2. Therefore, j∗m+1 will be removed from Cm+1 through (3.6). Moreover, since i∗m+1 = p by (3.10), this process repeats itself for at most q ≤ N − 1 steps, resulting in Cm+q+1 = {p}.

Case 2: If δm ≤ 0 and j∗m = r where r ≠ p, then Cm+1 = Cm − {r}. In this case, note that i∗m+1 = i∗m = p and, depending on the sign of δm+1, we either go to Case 1 or we repeat the process of removing one additional user index from the Cm+1 set. In the event that δl ≤ 0 for all l > m, all j∗l will be removed from the Cl set. The only remaining element in this set is p, which reduces to Case 1.2 above.

Property P5 follows from P3 by observing in (3.5) that the only way to get np,m > np,k is if j∗l = p and δl > 0 for some k < l < m. However, P3 asserts that this is not possible, since p would be removed from Cl. Property P6 follows from P4 by a similar argument. The only way to get np,m < np,k is if i∗l = p and δl > 0 for some k < l < m.
However, it is clear from the proof of P4 that p would either be removed from Cl, possibly after a finite number of steps, or simply remain in this set until it is the last element in it.

B.4 Proof of Lemma 3.3.2

We begin by establishing the fact that the process terminates in a finite number of steps bounded by N(K + 1). This is easily seen as follows. At any step k, the process determines some i∗k (say p) with two possibilities: (i) either user p gives one resource to some other user through (3.5), or (ii) one user index is removed from Ck through (3.6), in which case i∗k+1 = p and we have the exact same situation as in step k (if case (ii) persists, clearly |Cl| = 1 for some l ≤ k + N − 1). Under case (i), because of property P5, p cannot receive any resources from other users; therefore, in the worst case p will give away all of its initial resources to other users and will subsequently not be able to either give or receive resources from other users. Since np,k ≤ K for any k, it follows that p can be involved in a number of steps that is bounded by K + 1, where 1 is the extra step when p is removed from Ck at some k. Finally, since there are N users undergoing this series of steps, in the worst case the process terminates in N(K + 1) steps. This simple upper bound serves to establish the fact that the process always terminates in a finite number of steps. We will use this fact together with some of the properties in Lemma 3.3.1 to find a tighter upper bound.

Let the initial allocation be s0. Since the process always terminates in a finite number of steps, there exists some final allocation s̄ = [n̄1, · · · , n̄N] which, given s0, is unique since the algorithm is deterministic. The allocation ni,k at the kth step can be written as ni,k = n̄i + di,k, where di,k ∈ {−K, · · · , −1, 0, 1, · · · , K} and ∑_{i=1}^N di,k = 0 for all k = 0, 1, · · ·, since all allocations are feasible. Now define the following three sets: Ak = {i : di,k > 0},
Now define the following three sets: Ak = {i : di,k > 0}, i=1 di,k = 0 for all k = 0, 1, · · ·, since all allocations are Bk = {i : di,k = 0}, Ck = {i : di,k < 0} and note that at the final state di = 0 for all i = 1, · · · , N . Due to P3, at every step we have i∗k ∈ Ak (recall that once a user is selected as i∗k it can only give away resources to other users). Similarly, due to P4, jk∗ ∈ Bk ∪ Ck . At every step of the process, there are only two possibilities: 1. If δk > 0, let p = i∗k ∈ Ak and q = jk∗ ∈ Bk ∪ Ck . Then, at the next step, (3.5) implies that dp,k+1 = dp,k − 1 and dq,k+1 = dq,k + 1. 2. If δk ≤ 0, then a user index from Bk is removed from the set Ck . 92 Moreover, from the definitions of the three sets above, we have N X di,k = 0 = i=1 N X di,k + i=1 i∈Ak N X di,k i=1 i∈Ck and, therefore, we can write Pk = N X di,k = − i=1 i∈Ak N X di,k i=1 i∈Ck where 0 ≤ Pk ≤ K for all k = 0, 1, · · ·, since 0 ≤ ni,k ≤ K. Now let P0 be the initial value of Pk and let |A0 | be the initial cardinality of the set Ak . We separate the number of steps required to reach the final allocation into three categories: (i) Clearly, P0 steps (not necessarily contiguous) are required to make Pk = 0 for some k ≥ 0 by removing one resource at each such step form users i ∈ Al , l ≤ k. During any such step, we have δl > 0 as in case 1 above. (ii) These P0 steps would suffice to empty the set Ak if it were impossible for user indices to be added to it from the set Bl , l ≤ k. However, from property P4 it is possible for a user j such that j ∈ Bk and j 6∈ Al for all l < k to receive at most one resource, in which case we have j ∈ Ak . There are at most N − |A0 | users with such an opportunity, and hence N − |A0 | additional steps are possible. During any such step, as in (i), we have δl > 0 as in case 1 above. (iii) Finally, we consider steps covered by case 2 above. Clearly, N − 1 steps are required to reach |Ck | = 1 for some k. 
Therefore, the number of steps $L$ required to reach the final allocation satisfies $L \le P_0 + N - |A_0| + N - 1$. Observing that $P_0 \le K$ and $|A_0| \ge 1$, we get $L \le K + 2(N-1)$. Note that $|A_0| = 0$ implies $s_0 = \bar{s}$, in which case only the $N - 1$ steps in (iii) above are required to reach the final state. Thus, $N - 1$ is a lower bound on the required number of steps.

B.5 Proof of Theorem 3.3.1

First, by Lemma 3.3.2, a final allocation exists. We will next show that this allocation satisfies
\[ \Delta L_i(\bar{n}_i + 1) \ge \Delta L_j(\bar{n}_j) \quad \text{for any } i, j \tag{B.9} \]
We establish this by contradiction. Suppose there exist $p \ne q$ such that (B.9) is violated, i.e., $\Delta L_p(\bar{n}_p + 1) < \Delta L_q(\bar{n}_q)$, and suppose that $p$, $q$ were removed from $C_k$ and $C_l$ respectively (i.e., at steps $k$, $l$ respectively). Then, two cases are possible:

Case 1: $k < l$. For $p$ to be removed from $C_k$ in (3.6), the following must be true: $j_k^* = p$ and $\delta_k \le 0$. However,
\[ \Delta L_p(n_{p,k} + 1) \ge \Delta L_{i_k^*}(n_{i_k^*,k}) \ge \Delta L_{i_l^*}(n_{i_l^*,l}) \ge \Delta L_q(n_{q,l}) \]
where the first inequality is due to $\delta_k \le 0$, the second is due to property P1 in (3.12), and the last is due to the definition of $i_l^*$. Therefore, our assumption is contradicted.

Case 2: $k > l$. Now $q$ is removed from $C_l$ first, and therefore
\[ \Delta L_q(n_{q,l}) = \Delta L_{j_l^*}(n_{j_l^*,l}) \le \Delta L_{j_k^*}(n_{j_k^*,k}) = \Delta L_p(n_{p,k}) < \Delta L_p(n_{p,k} + 1) \]
where the two equalities are due to (3.6) and the fact that $q$, $p$ were removed from $C_l$ and $C_k$ respectively. In addition, the first inequality is due to P2 in (3.13), and the last inequality is due to A3.2. Again, our assumption is contradicted. Therefore, (B.9) holds. We can now invoke Theorem 3.2.1, from which it follows that (B.9) implies global optimality.

B.6 Proof of Lemma 3.4.1

Suppose $\delta_k(i,j)$ satisfies Assumption A3.4. Given $\hat{s}_k = s$, $\hat{C}_k = C$, consider the event $L(\hat{s}_{k+1}) > L(\hat{s}_k)$. According to the process (3.18)–(3.22) and (3.11): if $\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) > 0$, then $L(\hat{s}_{k+1}) - L(\hat{s}_k) = -\delta_k(\hat{i}_k^*, \hat{j}_k^*)$; and if $\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) \le 0$, then $L(\hat{s}_{k+1}) = L(\hat{s}_k)$.
Therefore, $L(\hat{s}_{k+1}) > L(\hat{s}_k)$ occurs if and only if $\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) > 0$ and $\delta_k(\hat{i}_k^*, \hat{j}_k^*) < 0$. Then,
\begin{align*}
\Pr[L(\hat{s}_{k+1}) > L(\hat{s}_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] &= \Pr[\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) > 0 \text{ and } \delta_k(\hat{i}_k^*, \hat{j}_k^*) < 0] \\
&= \sum_{\{(i,j) \,\mid\, \delta_k(i,j) < 0\}} \Pr[\Delta\hat{L}_i^{f(k)}(n_i) - \Delta\hat{L}_j^{f(k)}(n_j + 1) > 0 \text{ and } (\hat{i}_k^*, \hat{j}_k^*) = (i,j)] \\
&\le \sum_{\{(i,j) \,\mid\, \delta_k(i,j) < 0\}} \Pr[\Delta\hat{L}_i^{f(k)}(n_i) - \Delta\hat{L}_j^{f(k)}(n_j + 1) > 0]. \tag{B.10}
\end{align*}
For each pair $(i,j)$ satisfying $\delta_k(i,j) < 0$, we know from Lemma 2.2.1 that
\[ \lim_{k\to\infty} \Pr[\Delta\hat{L}_i^{f(k)}(n_i) - \Delta\hat{L}_j^{f(k)}(n_j + 1) > 0] = 0. \]
Taking this limit in (B.10), and also noting the finiteness of the set $\{(i,j) \mid \delta_k(i,j) < 0\}$ for any pair $(\hat{s}_k, \hat{C}_k) = (s, C)$, we obtain
\[ \lim_{k\to\infty} d_k(s, C) = \lim_{k\to\infty} \Pr[L(\hat{s}_{k+1}) > L(\hat{s}_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] = 0 \]
and the proof of (3.24) is complete. The definition (3.25) immediately implies that $d_k$ is monotone decreasing and that $d_k \ge d_k(s, C)$. The limit (3.26) then follows from (3.24).

B.7 Proof of Lemma 3.4.3

Before we prove this lemma, we first need to formalize the definition of the $\alpha_k$ sequence, as well as an auxiliary result stated as Lemma B.7.1. Hence, we define a sequence of integers $\{\alpha_k\}$ satisfying
\[ \lim_{k\to\infty} \alpha_k = \infty, \qquad \lim_{k\to\infty} \alpha_k(a_k + b_k) = 0, \qquad \lim_{k\to\infty} (1 - d_{\lfloor k/2 \rfloor})^{\alpha_k} = 1, \tag{B.11} \]
where, for any $x$, $\lfloor x \rfloor = \max\{n \mid n \le x,\ n \text{ integer}\}$ is the greatest integer not exceeding $x$. Such a sequence $\{\alpha_k\}$ exists. For example, any $\alpha_k \le \lfloor (\max\{d_{\lfloor k/2 \rfloor}, a_k + b_k\})^{-1/2} \rfloor$ with $\lim_{k\to\infty} \alpha_k = \infty$ satisfies (B.11) (without loss of generality, we assume that $d_k a_k b_k \ne 0$; otherwise $\alpha_k$ can take any arbitrary value). The choice of $\{\alpha_k\}$ is rather technical. Its necessity will become clear from the proofs of Lemma B.7.1 and Lemma 3.4.3 below. Furthermore, observe that if $\{\alpha_k\}$ satisfies (B.11), we also have
\[ \lim_{k\to\infty} (1 - d_k)^{\alpha_k} = 1, \tag{B.12} \]
since $d_k \le d_{\lfloor k/2 \rfloor}$. Next, we state, as the following lemma, another property needed to establish Lemma 3.4.3. The property states that if an allocation is not optimal at step $k$, then the probability that this allocation remains unchanged over $\alpha_k$ steps is asymptotically zero.

Lemma B.7.1 Suppose that A2.1 and A3.4 hold and let $\{\alpha_k\}$ satisfy (B.11). Consider an allocation $s = [n_1, \cdots, n_N] \ne s^*$ and any set $C$. Then
\[ \lim_{k\to\infty} \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k + \alpha_k - 1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] = 0 \tag{B.13} \]

Proof: Given $\hat{s}_k = s$, $\hat{C}_k = C$, consider the event $L(\hat{s}_{k+1}) = L(\hat{s}_k)$. According to the process (3.18)–(3.22) and (3.11): if $\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) > 0$, then $L(\hat{s}_{k+1}) - L(\hat{s}_k) = -\delta_k(\hat{i}_k^*, \hat{j}_k^*)$, and if $\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) \le 0$, then $L(\hat{s}_{k+1}) = L(\hat{s}_k)$. Therefore, $L(\hat{s}_{k+1}) = L(\hat{s}_k)$ occurs if and only if either
\[ \{\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) > 0,\ \delta_k(\hat{i}_k^*, \hat{j}_k^*) = 0\} \quad \text{or} \quad \{\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) \le 0\} \tag{B.14} \]
for every $k$. For notational convenience, define, for any $k$, the events
\[ A_k^+ = \{\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) > 0,\ \delta_k(\hat{i}_k^*, \hat{j}_k^*) = 0\} \quad \text{and} \quad A_k^- = \{\hat{\delta}_k(\hat{i}_k^*, \hat{j}_k^*) \le 0\} \]
Next, for any $i \ge 1$, define the following subset of $\{k, \cdots, k+i-1\}$:
\[ \hat{R}(i) = \{h : \hat{\delta}_h(\hat{i}_h^*, \hat{j}_h^*) \le 0,\ h \in \{k, \cdots, k+i-1\}\} \]
and let $\hat{I}_k$ be the cardinality of the set $\hat{R}(\alpha_k)$. In addition, for any given integer $I$, let $\hat{R}_I(\alpha_k)$ denote such a set with exactly $I$ elements. Then, define the set
\[ \hat{Q}_I(\alpha_k) = \{k, \cdots, k + \alpha_k - 1\} - \hat{R}_I(\alpha_k) \]
containing all indices $h \in \{k, \cdots, k+\alpha_k-1\}$ which do not satisfy $\hat{\delta}_h(\hat{i}_h^*, \hat{j}_h^*) \le 0$. Finally, define
\[ A_I^+(\alpha_k) = \{A_h^+,\ \text{for all } h \in \hat{Q}_I(\alpha_k)\} \quad \text{and} \quad A_I^-(\alpha_k) = \{A_h^-,\ h \in \hat{R}_I(\alpha_k)\} \]
Depending on the value of $\hat{I}_k$ defined above, we can write
\begin{align*}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k \le |C| + N \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad + \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C| + N \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.15}
\end{align*}
We will now consider each of the two terms in (B.15) separately.
The first term in (B.15) can be rewritten as
\begin{align*}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k \le |C|+N \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \sum_{I=0}^{|C|+N} \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k = I \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \tag{B.16}
\end{align*}
Using the notation we have introduced, observe that
\begin{align*}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k = I \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \sum_{\hat{R}_I(\alpha_k)} \Pr[A_I^-(\alpha_k), A_I^+(\alpha_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \sum_{\hat{R}_I(\alpha_k)} \Pr[A_I^+(\alpha_k) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \Pr[A_I^-(\alpha_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \tag{B.17}
\end{align*}
Set $h_0 = k + \alpha_k - 1$ and, without loss of generality, assume that $h_0 \notin \hat{R}_I(\alpha_k)$ (otherwise, if $h_0 \in \hat{R}_I(\alpha_k)$, there must exist some $M$ such that $k + \alpha_k - M \notin \hat{R}_I(\alpha_k)$ and the same argument may be used with $h_0 = k + \alpha_k - M$). Then
\begin{align*}
&\Pr[A_I^+(\alpha_k) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \Pr[A_{h_0}^+, A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \sum_{(s', C')} \Pr[A_{h_0}^+, (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'), A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \tag{B.18}
\end{align*}
Recalling the definition of $A_{h_0}^+$, we can write
\begin{align*}
&\Pr[A_{h_0}^+, (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'), A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \Pr[\hat{\delta}_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) > 0 \mid \delta_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) = 0,\ (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'),\ A_I^+(\alpha_k - 1),\ A_I^-(\alpha_k),\ (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\qquad \times \Pr[\delta_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) = 0,\ (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'),\ A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)]
\end{align*}
Then, the Markov property of the process (3.18)–(3.22) implies that
\begin{align*}
&\Pr[A_{h_0}^+, (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'), A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \Pr[\hat{\delta}_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) > 0 \mid \delta_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) = 0,\ (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C')] \\
&\qquad \times \Pr[\delta_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) = 0,\ (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'),\ A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \tag{B.19}
\end{align*}
However, by Assumption A3.4,
\[ \Pr[\hat{\delta}_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) > 0 \mid \delta_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) = 0,\ (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C')] \le 1 - p_0 \]
Thus, (B.19) becomes
\begin{align*}
&\Pr[A_{h_0}^+, (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'), A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le (1 - p_0) \Pr[\delta_{h_0}(\hat{i}_{h_0}^*, \hat{j}_{h_0}^*) = 0,\ (\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'),\ A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le (1 - p_0) \Pr[(\hat{s}_{h_0}, \hat{C}_{h_0}) = (s', C'),\ A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)].
\end{align*}
Using this inequality in (B.18), we obtain
\[ \Pr[A_I^+(\alpha_k) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \le (1 - p_0) \Pr[A_I^+(\alpha_k - 1) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)]. \]
Continuing this recursive procedure, we finally arrive at
\[ \Pr[A_I^+(\alpha_k) \mid A_I^-(\alpha_k), (\hat{s}_k, \hat{C}_k) = (s, C)] \le (1 - p_0)^{\alpha_k - I} \]
which allows us to obtain the following inequality from (B.16) and (B.17):
\begin{align*}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k \le |C|+N \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le \sum_{I=0}^{|C|+N} \sum_{\hat{R}_I(\alpha_k)} (1 - p_0)^{\alpha_k - I} \Pr[A_I^-(\alpha_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le \sum_{I=0}^{|C|+N} (1 - p_0)^{\alpha_k - I} \le p_0^{-1} (1 - p_0)^{\alpha_k - (|C|+N)}. \tag{B.20}
\end{align*}
Since $0 \le 1 - p_0 < 1$ by Assumption A3.4, and since $\lim_{k\to\infty} \alpha_k = \infty$ according to (B.11), the preceding inequality implies that the first term in (B.15) satisfies
\[ \lim_{k\to\infty} \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k \le |C|+N \mid (\hat{s}_k, \hat{C}_k) = (s, C)] = 0. \tag{B.21} \]
Next we consider the second term in (B.15). Let $\hat{J}_k$ be the first step after $k$ at which either $\hat{i}_h^* \notin A_h^{\max}$ or $\hat{j}_h^* \notin A_h^{\min}$, $h = k, k+1, \ldots, k+\alpha_k-1$. Clearly, $k \le \hat{J}_k \le k + \alpha_k$. We also use (without confusion) $\hat{J}_k = k + \alpha_k$ to mean that $\hat{i}_h^* \in A_h^{\max}$ and $\hat{j}_h^* \in A_h^{\min}$ for all $h = k, k+1, \ldots, k+\alpha_k-1$. Then the second term in (B.15) can be written as
\begin{align*}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C|+N \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \sum_{J=k}^{k+\alpha_k-1} \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C|+N,\ \hat{J}_k = J \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\qquad + \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ \hat{i}_h^* \in A_h^{\max},\ \hat{j}_h^* \in A_h^{\min},\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C|+N \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \tag{B.22}
\end{align*}
We shall now consider each of the two terms in (B.22) separately.
In the first term, for any $J$, $k \le J < k + \alpha_k$, we have
\begin{align*}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C|+N,\ \hat{J}_k = J \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le \Pr[\hat{J}_k = J \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \sum_{\{(s', C')\}} \Pr[\hat{J}_k = J \mid (\hat{s}_J, \hat{C}_J) = (s', C')] \Pr[(\hat{s}_J, \hat{C}_J) = (s', C') \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \tag{B.23}
\end{align*}
where the second step above follows from the Markov property of (3.18)–(3.22). Moreover,
\begin{align*}
\Pr[\hat{J}_k = J \mid (\hat{s}_J, \hat{C}_J) = (s', C')] &\le \Pr[\{\hat{i}_J^* \notin A_J^{\max}\} \cup \{\hat{j}_J^* \notin A_J^{\min}\} \mid (\hat{s}_J, \hat{C}_J) = (s', C')] \\
&\le \Pr[\hat{i}_J^* \notin A_J^{\max} \mid (\hat{s}_J, \hat{C}_J) = (s', C')] + \Pr[\hat{j}_J^* \notin A_J^{\min} \mid (\hat{s}_J, \hat{C}_J) = (s', C')] \\
&\le a_J + b_J \le a_k + b_k
\end{align*}
where we have used (3.29), (3.30), (3.32), and the monotonicity of $a_k$ and $b_k$. This inequality, together with (B.23), implies that
\[ \sum_{J=k}^{k+\alpha_k-1} \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C|+N,\ \hat{J}_k = J \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \le (\alpha_k - 1)(a_k + b_k). \]
By Lemma 3.4.2 and (B.11) it follows that
\[ \lim_{k\to\infty} \sum_{J=k}^{k+\alpha_k-1} \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C|+N,\ \hat{J}_k = J \mid (\hat{s}_k, \hat{C}_k) = (s, C)] = 0 \tag{B.24} \]
As for the second term in (B.22), we note the following facts.

(a) Given that $\hat{s}_k \ne s^*$, either there is a $j \in \hat{C}_k$ such that $\delta_k(i_k^*, j) > 0$ for any $i_k^* \in A_k^{\max}$, or the set $\hat{C}_h$ first decreases to $|\hat{C}_h| = 1$ according to (3.19) and is then reset to $C_0$, in which case there is a $j \in C_0$ such that $\delta_k(i_k^*, j) > 0$ (otherwise $\hat{s}_k$ would be the optimum according to Theorem 3.2.1). Therefore, without loss of generality, we assume that there is a $j \in \hat{C}_k$ such that $\delta_k(i_k^*, j) > 0$ for any $i_k^* \in A_k^{\max}$.

(b) As long as (B.14) holds and $\hat{i}_h^* \in A_h^{\max}$, $\hat{j}_h^* \in A_h^{\min}$,
\[ \max_{j \in \hat{C}_h} \{\Delta L_j(\hat{n}_j)\} = \max_{j \in \hat{C}_k} \{\Delta L_j(\hat{n}_j)\}, \qquad h = k, k+1, \ldots, k+\alpha_k-1. \]

(c) One user is deleted from the set $\hat{C}_h$ every time $\hat{\delta}_h(\hat{i}_h^*, \hat{j}_h^*) \le 0$.

The previous facts (a)–(c) imply that, when
\[ L(\hat{s}_{h+1}) = L(\hat{s}_h),\quad \hat{i}_h^* \in A_h^{\max},\quad \hat{j}_h^* \in A_h^{\min},\quad h = k, \cdots, k+\alpha_k-1,\quad \hat{I}_k > |C|+N, \]
with probability one there exists an $\hat{M}_k$, $k \le \hat{M}_k \le k + \alpha_k - 1$, such that
\[ \hat{\delta}_{\hat{M}_k}(\hat{i}_{\hat{M}_k}^*, \hat{j}_{\hat{M}_k}^*) \le 0, \qquad \delta_{\hat{M}_k}(\hat{i}_{\hat{M}_k}^*, \hat{j}_{\hat{M}_k}^*) > 0. \]
Then, the second term in (B.22) becomes
\begin{align*}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ \hat{i}_h^* \in A_h^{\max},\ \hat{j}_h^* \in A_h^{\min},\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C|+N \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le \Pr[\hat{\delta}_{\hat{M}_k}(\hat{i}_{\hat{M}_k}^*, \hat{j}_{\hat{M}_k}^*) \le 0,\ \delta_{\hat{M}_k}(\hat{i}_{\hat{M}_k}^*, \hat{j}_{\hat{M}_k}^*) > 0 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \sum_{M=k}^{k+\alpha_k-1} \Pr[\hat{\delta}_M(\hat{i}_M^*, \hat{j}_M^*) \le 0,\ \delta_M(\hat{i}_M^*, \hat{j}_M^*) > 0 \mid \hat{M}_k = M,\ (\hat{s}_k, \hat{C}_k) = (s, C)] \Pr[\hat{M}_k = M \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \tag{B.25}
\end{align*}
Using Lemma 2.2.1, we know that
\begin{align*}
&\Pr[\hat{\delta}_M(\hat{i}_M^*, \hat{j}_M^*) \le 0,\ \delta_M(\hat{i}_M^*, \hat{j}_M^*) > 0 \mid \hat{M}_k = M,\ (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le \sum_{\{(i,j) \in \hat{C}_M,\ \delta_M(i,j) > 0\}} \Pr[\hat{\delta}_M(i,j) \le 0 \mid \hat{M}_k = M,\ (\hat{s}_k, \hat{C}_k) = (s, C)] \to 0 \quad \text{as } k \to \infty.
\end{align*}
Therefore, we get from (B.25):
\[ \lim_{k\to\infty} \Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ \hat{i}_h^* \in A_h^{\max},\ \hat{j}_h^* \in A_h^{\min},\ h = k, \ldots, k+\alpha_k-1,\ \hat{I}_k > |C|+N \mid (\hat{s}_k, \hat{C}_k) = (s, C)] = 0 \]
The combination of this fact with (B.24) and (B.21) yields the conclusion of the lemma.

Proof of Lemma 3.4.3

First, given $(\hat{s}_k, \hat{C}_k) = (s, C)$ and some $\alpha_k$ defined as in (B.11), consider sample paths such that $L(\hat{s}_{i+1}) \le L(\hat{s}_i)$ for all $i = k, k+1, \cdots, k+\alpha_k-1$. Observe that any such sample path belongs either to the set of paths along which $L(\hat{s}_{h+1}) < L(\hat{s}_h)$ for some $k \le h \le k+\alpha_k-1$, or to the set of paths along which $L(\hat{s}_{i+1}) = L(\hat{s}_i)$ for all $i = k, k+1, \cdots, k+\alpha_k-1$. Thus, we can write
\begin{align*}
&\{L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1\} \\
&\quad = \{\exists\, k \le h \le k+\alpha_k-1 \text{ s.t. } L(\hat{s}_{h+1}) < L(\hat{s}_h),\ \text{and } L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1,\ i \ne h\} \\
&\qquad \cup \{L(\hat{s}_{i+1}) = L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1\}. \tag{B.26}
\end{align*}
Therefore,
\begin{align*}
&\Pr[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \Pr[\exists\, k \le h < k+\alpha_k \text{ s.t. } L(\hat{s}_{h+1}) < L(\hat{s}_h),\ \text{and } L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1,\ i \ne h \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\qquad + \Pr[L(\hat{s}_{i+1}) = L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le \Pr[L(\hat{s}_{k+\alpha_k}) < L(\hat{s}_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] + \Pr[L(\hat{s}_{i+1}) = L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.27}
\end{align*}
Using Lemma B.7.1, the second term on the right-hand side above vanishes as $k \to \infty$, and (B.27) yields
\[ \lim_{k\to\infty} \Pr[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \le \lim_{k\to\infty} \Pr[L(\hat{s}_{k+\alpha_k}) < L(\hat{s}_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \]
On the other hand, we can write
\[ \Pr[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] = \sum_{\substack{(s_i, C_i):\ L(s_i) \le L(s_{i-1}) \\ i = k+1, \cdots, k+\alpha_k}} \Pr[(\hat{s}_i, \hat{C}_i) = (s_i, C_i),\ i = k+1, \cdots, k+\alpha_k \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.28} \]
The Markov property of $\{(\hat{s}_k, \hat{C}_k)\}$ implies that
\[ \Pr[(\hat{s}_i, \hat{C}_i) = (s_i, C_i),\ i = k+1, \ldots, k+\alpha_k \mid (\hat{s}_k, \hat{C}_k) = (s, C)] = \prod_{i=k+1}^{k+\alpha_k} \Pr[(\hat{s}_i, \hat{C}_i) = (s_i, C_i) \mid (\hat{s}_{i-1}, \hat{C}_{i-1}) = (s_{i-1}, C_{i-1})]. \]
Thus
\begin{align*}
&\Pr[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \sum_{\substack{(s_i, C_i):\ L(s_i) \le L(s_{i-1}) \\ i = k+1, \cdots, k+\alpha_k}} \ \prod_{i=k+1}^{k+\alpha_k} \Pr[(\hat{s}_i, \hat{C}_i) = (s_i, C_i) \mid (\hat{s}_{i-1}, \hat{C}_{i-1}) = (s_{i-1}, C_{i-1})] \\
&\quad = \sum_{\substack{(s_i, C_i):\ L(s_i) \le L(s_{i-1}) \\ i = k+1, \cdots, k+\alpha_k-1}} \Big( \prod_{i=k+1}^{k+\alpha_k-1} \Pr[(\hat{s}_i, \hat{C}_i) = (s_i, C_i) \mid (\hat{s}_{i-1}, \hat{C}_{i-1}) = (s_{i-1}, C_{i-1})] \\
&\qquad\qquad \times \sum_{\substack{(s_j, C_j):\ L(s_j) \le L(s_{j-1}) \\ j = k+\alpha_k}} \Pr[(\hat{s}_j, \hat{C}_j) = (s_j, C_j) \mid (\hat{s}_{j-1}, \hat{C}_{j-1}) = (s_{j-1}, C_{j-1})] \Big) \\
&\quad = \sum_{\substack{(s_i, C_i):\ L(s_i) \le L(s_{i-1}) \\ i = k+1, \cdots, k+\alpha_k-1}} \Big( \prod_{i=k+1}^{k+\alpha_k-1} \Pr[(\hat{s}_i, \hat{C}_i) = (s_i, C_i) \mid (\hat{s}_{i-1}, \hat{C}_{i-1}) = (s_{i-1}, C_{i-1})] \\
&\qquad\qquad \times \Pr[L(\hat{s}_{k+\alpha_k}) \le L(s_{k+\alpha_k-1}) \mid (\hat{s}_{k+\alpha_k-1}, \hat{C}_{k+\alpha_k-1}) = (s_{k+\alpha_k-1}, C_{k+\alpha_k-1})] \Big)
\end{align*}
Now, recalling the definition of $d_k(s, C)$ in (3.23), observe that the last term in the product above is precisely $[1 - d_{k+\alpha_k-1}(s_{k+\alpha_k-1}, C_{k+\alpha_k-1})]$. Moreover, by Lemma 3.4.1 we have $d_{k+\alpha_k-1} \ge d_{k+\alpha_k-1}(s_{k+\alpha_k-1}, C_{k+\alpha_k-1})$.
Therefore, we get
\begin{align*}
&\Pr[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \ge (1 - d_{k+\alpha_k-1}) \sum_{\substack{(s_i, C_i):\ L(s_i) \le L(s_{i-1}) \\ i = k+1, \cdots, k+\alpha_k-1}} \ \prod_{i=k+1}^{k+\alpha_k-1} \Pr[(\hat{s}_i, \hat{C}_i) = (s_i, C_i) \mid (\hat{s}_{i-1}, \hat{C}_{i-1}) = (s_{i-1}, C_{i-1})] \\
&\quad \ge \cdots \ge \prod_{i=k}^{k+\alpha_k-1} (1 - d_i) \ge (1 - d_k)^{\alpha_k} \tag{B.29}
\end{align*}
where the last inequality follows from Lemma 3.4.1, where it was shown that $d_i$ is monotone decreasing in $i$. Hence, since $\alpha_k$ satisfies (B.12), we get
\[ \lim_{k\to\infty} \Pr[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k)] = 1 \]
Finally, using this limit and the inequality (B.28), and recalling the definition of $e_k(s, C)$ in (3.34), we readily conclude that (3.35) holds. Moreover, the definition (3.36) immediately implies that $e_k$ is monotone decreasing and $e_k \ge e_k(s, C)$. The limit (3.37) then follows from (3.35).

B.8 Proof of Theorem 3.4.1

We begin by defining three auxiliary quantities we shall use in the proof. First, let us choose some $\epsilon > 0$ such that
\[ \epsilon < \min_{\{s, s'\}} \{L(s) - L(s') \mid L(s) - L(s') > 0\}. \tag{B.30} \]
Note that such an $\epsilon > 0$ exists because of the discrete nature of the cost function and the finiteness of the number of feasible allocations. Observe that $\epsilon$ is a real number strictly smaller than the smallest positive cost difference in the allocation process. Second, for any $s$, set $q_s = \lfloor (L(s) - L(s^*))/\epsilon \rfloor$. Then
\[ L(s) - L(s^*) \ge q_s \epsilon, \qquad L(s) - L(s^*) < (q_s + 1)\epsilon. \tag{B.31} \]
Finally, we shall define a convenient sequence $\{\alpha_k\}$ that satisfies (B.11). To do so, let $q = \max_{s \in S} q_s$ and, for any $k$, choose
\[ \alpha_k = \Big\lfloor \frac{1}{2q} \min\{k,\ (\max\{d_{\lfloor k/2 \rfloor}, a_k + b_k\})^{-1/2}\} \Big\rfloor \le \frac{1}{2q} \min\{k,\ (\max\{d_{\lfloor k/2 \rfloor}, a_k + b_k\})^{-1/2}\}. \]
Since the sequences $\{d_k\}$ and $\{a_k + b_k\}$ are monotone decreasing by their definitions, the sequence $\{\alpha_k\}$ is monotone increasing, and it is easy to verify that it satisfies (B.11). The next step in the proof is to define a particular subsequence of $\{(\hat{s}_i, \hat{C}_i)\}$ as follows. First, set $x = k - q\alpha_k$ and observe that
\[ x = k - q\alpha_k \ge k/2. \tag{B.32} \]
Then, define a sequence of indices $\{y_i\}$, $i = 0, \cdots, q_s$, through $y_0 = x$ and $y_i = y_{i-1} + \alpha_{y_{i-1}}$, $i = 1, 2, \ldots, q_s$. For sufficiently large $k$ such that $\alpha_x \ge 1$, it is easy to verify by induction that
\[ x = y_0 < y_1 < \cdots < y_{q_s} \le k. \tag{B.33} \]
Now, for any $k$ and $x$ defined above, consider a subsequence of $\{(\hat{s}_i, \hat{C}_i),\ i = x, \ldots, k\}$, denoted by $\psi = \{(\hat{s}_{y_i}, \hat{C}_{y_i}),\ i = 0, 1, \ldots, q_s\}$, starting at $\hat{s}_{y_0} = \hat{s}_x = s$, and such that either there is an $i$, $0 \le i \le q_s - 1$, such that
\[ L(\hat{s}_{y_{j+1}}) - L(\hat{s}_{y_j}) \le -\epsilon \ \text{ for all } j = 0, \cdots, i-1 \quad \text{and} \quad \hat{s}_{y_j} = s^* \ \text{ for all } j = i, \cdots, q_s \]
or
\[ L(\hat{s}_{y_{i+1}}) - L(\hat{s}_{y_i}) \le -\epsilon \ \text{ and } \ \hat{s}_{y_i} \ne s^* \ \text{ for all } i = 0, \cdots, q_s - 1 \]
In other words, any such subsequence is "embedded" into the original process $\{(\hat{s}_i, \hat{C}_i)\}$ so as to give strictly decreasing costs, and if it reaches the optimum it stays there afterwards. The subsequence defined above has the additional property that
\[ \hat{s}_{y_{q_s}} = s^*. \tag{B.34} \]
This is obvious in the case where $\hat{s}_{y_j} = s^*$ for some $j = 0, 1, \cdots, q_s$. On the other hand, if $\hat{s}_{y_i} \ne s^*$ for all $i = 0, 1, \ldots, q_s - 1$, we must have
\[ L(\hat{s}_{y_i}) - L(\hat{s}_{y_{i-1}}) \le -\epsilon, \qquad i = 1, \ldots, q_s. \tag{B.35} \]
Adding the $q_s$ inequalities above yields $L(\hat{s}_{y_{q_s}}) - L(\hat{s}_x) \le -q_s \epsilon$ or, since $\hat{s}_x = s$,
\[ L(\hat{s}_{y_{q_s}}) - L(s^*) \le L(s) - L(s^*) - q_s \epsilon. \]
This inequality, together with (B.31), implies that $L(\hat{s}_{y_{q_s}}) - L(s^*) \le \epsilon$. Since $\epsilon$ satisfies (B.30), we must have $L(\hat{s}_{y_{q_s}}) = L(s^*)$ for all paths satisfying (B.35), which in turn implies $\hat{s}_{y_{q_s}} = s^*$ since the optimum $s^*$ is assumed unique. Therefore, for every subsequence $\psi = \{(\hat{s}_{y_i}, \hat{C}_{y_i}),\ i = 0, 1, \ldots, q_s\}$ considered, (B.34) holds. Before proceeding with the main part of the proof, let us also define, for notational convenience, a set $\Psi$ to contain all subsequences of the form $\psi$ as specified above, or any part of any such subsequence, i.e., any $\{(\hat{s}_{y_n}, \hat{C}_{y_n}), \cdots, (\hat{s}_{y_m}, \hat{C}_{y_m})\}$ with $n \le m$ and $n, m \in \{0, 1, \cdots, q_s\}$.
Then, for any $s \ne s^*$ and any $C$, all sample paths restricted to include some $\psi \in \Psi$ form a subset of all sample paths that lead to a state such that $\hat{s}_{y_{q_s}} = s^*$, i.e.,
\[ \Pr[\hat{s}_{y_{q_s}} = s^* \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \ge \sum_{\{(s_i, C_i),\ i=1,\ldots,q_s\} \in \Psi} \Pr[(\hat{s}_{y_i}, \hat{C}_{y_i}) = (s_i, C_i),\ i = 1, \ldots, q_s \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \tag{B.36} \]
Because $\{(\hat{s}_k, \hat{C}_k)\}$ is a Markov process, setting $(s_0, C_0) = (s, C)$, the previous inequality can be rewritten as
\[ \Pr[\hat{s}_{y_{q_s}} = s^* \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \ge \sum_{\{(s_i, C_i),\ i=1,\ldots,q_s\} \in \Psi} \ \prod_{i=1}^{q_s} \Pr[(\hat{s}_{y_i}, \hat{C}_{y_i}) = (s_i, C_i) \mid (\hat{s}_{y_{i-1}}, \hat{C}_{y_{i-1}}) = (s_{i-1}, C_{i-1})] \]
In addition, let us decompose any subsequence $\psi$ into its first $(q_s - 1)$ elements and the remaining element $(s_{q_s}, C_{q_s})$. Thus, for any subsequence whose $(q_s - 1)$th element is $(\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}})$, there is a set of final states such that $(s_{q_s}, C_{q_s}) \in \Psi$, so that we may write
\begin{align*}
&\sum_{\{(s_i, C_i),\ i=1,\ldots,q_s\} \in \Psi} \ \prod_{i=1}^{q_s} \Pr[(\hat{s}_{y_i}, \hat{C}_{y_i}) = (s_i, C_i) \mid (\hat{s}_{y_{i-1}}, \hat{C}_{y_{i-1}}) = (s_{i-1}, C_{i-1})] \\
&\quad = \sum_{\{(s_i, C_i),\ i=1,\ldots,q_s-1\} \in \Psi} \ \prod_{i=1}^{q_s-1} \Pr[(\hat{s}_{y_i}, \hat{C}_{y_i}) = (s_i, C_i) \mid (\hat{s}_{y_{i-1}}, \hat{C}_{y_{i-1}}) = (s_{i-1}, C_{i-1})] \\
&\qquad \times \sum_{(s_{q_s}, C_{q_s}) \in \Psi} \Pr[(\hat{s}_{y_{q_s}}, \hat{C}_{y_{q_s}}) = (s_{q_s}, C_{q_s}) \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s_{q_s-1}, C_{q_s-1})] \tag{B.37}
\end{align*}
Let us now consider two possible cases regarding the value of $s_{q_s-1}$.

Case 1: If $s_{q_s-1} = s^*$, then, aggregating over all $C_{q_s}$ and recalling (B.34), we can write, for any $C_{q_s-1}$ in some subsequence of $\Psi$,
\[ \sum_{(s_{q_s}, C_{q_s}) \in \Psi} \Pr[(\hat{s}_{y_{q_s}}, \hat{C}_{y_{q_s}}) = (s_{q_s}, C_{q_s}) \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s^*, C_{q_s-1})] = \Pr[\hat{s}_{y_{q_s}} = s^* \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s^*, C_{q_s-1})] \]
Now let us consider a subsequence $\{(\hat{s}_i, \hat{C}_i)\}$ with $i = y_{q_s-1}, \cdots, y_{q_s}$ and $\hat{s}_{y_{q_s-1}} = \hat{s}_{y_{q_s}} = s^*$. Observing that all subsequences $\{(\hat{s}_i, \hat{C}_i)\}$ restricted to $\hat{s}_i = s^*$ for all $i = y_{q_s-1}, \cdots, y_{q_s}$ form a subset of all the subsequences above, and exploiting once again the Markov property of the process $\{(\hat{s}_k, \hat{C}_k)\}$, we can clearly write
\[ \Pr[\hat{s}_{y_{q_s}} = s^* \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s^*, C_{q_s-1})] \ge \sum_{\{C_i',\ i=y_{q_s-1},\ldots,y_{q_s}\}} \ \prod_{i=y_{q_s-1}+1}^{y_{q_s}} \Pr[(\hat{s}_i, \hat{C}_i) = (s^*, C_i') \mid (\hat{s}_{i-1}, \hat{C}_{i-1}) = (s^*, C_{i-1}')] \]
where $C_{y_{q_s-1}}' = C_{q_s-1}$. Using the definition of $d_k(s, C)$ in (3.23), and noticing that, given $\hat{s}_k = s^*$, the event $L(\hat{s}_{k+1}) \le L(\hat{s}_k)$ is equivalent to $\hat{s}_{k+1} = s^*$ when the optimum is unique, each term in the product above can be replaced by $[1 - d_{i-1}(\hat{s}_{i-1}, \hat{C}_{i-1})]$, $i = y_{q_s-1}+1, \cdots, y_{q_s}$. In addition, from Lemma 3.4.1, we have $d_k \ge d_k(s, C)$. Therefore,
\[ \Pr[\hat{s}_{y_{q_s}} = s^* \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s^*, C_{q_s-1})] \ge \prod_{i=y_{q_s-1}+1}^{y_{q_s}} (1 - d_i) \ge (1 - d_{y_{q_s-1}})^{y_{q_s} - y_{q_s-1}} \ge (1 - d_x)^{y_{q_s} - y_{q_s-1}} \tag{B.38} \]
where the last two inequalities follow from the fact that $d_k$ is monotone decreasing and the fact that $y_i \ge x$.

Case 2: If $s_{q_s-1} \ne s^*$, then by the definition of any subsequence $\psi \in \Psi$, we must have a strict cost decrease, i.e., $L(\hat{s}_{y_{q_s}}) - L(\hat{s}_{y_{q_s-1}}) \le -\epsilon$. Therefore, for any $C_{q_s-1}$ in some subsequence of $\Psi$, we can now write
\begin{align*}
&\sum_{(s_{q_s}, C_{q_s}) \in \Psi} \Pr[(\hat{s}_{y_{q_s}}, \hat{C}_{y_{q_s}}) = (s_{q_s}, C_{q_s}) \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s_{q_s-1}, C_{q_s-1})] \\
&\quad = \Pr[L(\hat{s}_{y_{q_s}}) - L(\hat{s}_{y_{q_s-1}}) \le -\epsilon \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s_{q_s-1}, C_{q_s-1})] \\
&\quad = \Pr[L(\hat{s}_{y_{q_s-1}+\alpha_{y_{q_s-1}}}) < L(\hat{s}_{y_{q_s-1}}) \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s_{q_s-1}, C_{q_s-1})] \tag{B.39}
\end{align*}
recalling the choice of $\epsilon$ in (B.30).
We can now make use of the definition of $e_k(s, C)$ in (3.34) and write
\[ 1 - e_{y_{q_s-1}}(s_{q_s-1}, C_{q_s-1}) = \Pr[L(\hat{s}_{y_{q_s-1}+\alpha_{y_{q_s-1}}}) < L(\hat{s}_{y_{q_s-1}}) \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s_{q_s-1}, C_{q_s-1})] \]
Then, making use of the monotonicity of $\{e_k\}$ established in Lemma 3.4.3 and the fact that $y_i \ge x$ for all $i = 1, \cdots, q_s$, we get
\[ \sum_{(s_{q_s}, C_{q_s}) \in \Psi} \Pr[(\hat{s}_{y_{q_s}}, \hat{C}_{y_{q_s}}) = (s_{q_s}, C_{q_s}) \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s_{q_s-1}, C_{q_s-1})] \ge 1 - e_{y_{q_s-1}} \ge 1 - e_x \tag{B.40} \]
Therefore, combining both cases, i.e., inequalities (B.38) and (B.40), we obtain
\[ \sum_{(s_{q_s}, C_{q_s}) \in \Psi} \Pr[(\hat{s}_{y_{q_s}}, \hat{C}_{y_{q_s}}) = (s_{q_s}, C_{q_s}) \mid (\hat{s}_{y_{q_s-1}}, \hat{C}_{y_{q_s-1}}) = (s_{q_s-1}, C_{q_s-1})] \ge \min\{(1 - d_x)^{y_{q_s} - y_{q_s-1}},\ 1 - e_x\} \ge (1 - d_x)^{y_{q_s} - y_{q_s-1}} (1 - e_x). \]
Returning to (B.36) and using the inequality above, we obtain
\[ \Pr[\hat{s}_{y_{q_s}} = s^* \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \ge (1 - d_x)^{y_{q_s} - y_{q_s-1}} (1 - e_x) \sum_{\{(s_i, C_i),\ i=1,\ldots,q_s-1\} \in \Psi} \ \prod_{i=1}^{q_s-1} \Pr[(\hat{s}_{y_i}, \hat{C}_{y_i}) = (s_i, C_i) \mid (\hat{s}_{y_{i-1}}, \hat{C}_{y_{i-1}}) = (s_{i-1}, C_{i-1})] \]
This procedure can now be repeated by decomposing a subsequence $\psi$ with $(q_s - 1)$ elements into its first $(q_s - 2)$ elements and the remaining element $(s_{q_s-1}, C_{q_s-1})$, and so on. Note that in this case the value of the last state at each step of this procedure, $s_{q_s-i}$, $i = 1, \cdots, q_s$, is not necessarily $s^*$. However, if $s_{q_s-i-1} = s_{q_s-i} = s^*$, then Case 1 considered earlier applies; if $s_{q_s-i-1} \ne s^*$, then Case 2 applies. Thus, after $q_s$ such steps, we arrive at
\[ \Pr[\hat{s}_{y_{q_s}} = s^* \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \ge (1 - d_x)^{y_{q_s} - y_0} (1 - e_x)^{q_s}. \]
Since $x \ge k/2 \ge \lfloor k/2 \rfloor$ according to (B.32), and since $d_k$ and $e_k$ are monotone decreasing according to (3.25) and (3.36) respectively, we have $d_x \le d_{\lfloor k/2 \rfloor}$ and $e_x \le e_{\lfloor k/2 \rfloor}$. Thus
\[ \Pr[\hat{s}_{y_{q_s}} = s^* \mid \hat{s}_x, \hat{C}_x] \ge (1 - d_{\lfloor k/2 \rfloor})^{y_{q_s} - y_0} (1 - e_{\lfloor k/2 \rfloor})^q. \tag{B.41} \]
On the other hand, noting that $y_{q_s} \le k$ according to (B.33), consider $\{(\hat{s}_k, \hat{C}_k)\}$ starting from $(\hat{s}_x, \hat{C}_x)$. Then,
\[ \Pr[\hat{s}_k = s^* \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \ge \sum_{\{C_i,\ i=y_{q_s},\ldots,k\}} \Pr[(\hat{s}_{y_{q_s}}, \hat{C}_{y_{q_s}}) = (s^*, C_{y_{q_s}}),\ (\hat{s}_i, \hat{C}_i) = (s^*, C_i),\ i = y_{q_s}+1, \ldots, k \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \]
where we have used the fact that $\hat{s}_{y_{q_s}} = s^*$. Using, once again, the Markov property and the same argument as in Case 1 earlier to introduce $d_k$, we get
\begin{align*}
&\sum_{\{C_i,\ i=y_{q_s},\ldots,k\}} \Pr[(\hat{s}_{y_{q_s}}, \hat{C}_{y_{q_s}}) = (s^*, C_{y_{q_s}}),\ (\hat{s}_i, \hat{C}_i) = (s^*, C_i),\ i = y_{q_s}+1, \ldots, k \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \\
&\quad = \sum_{\{C_i,\ i=y_{q_s},\ldots,k\}} \Pr[(\hat{s}_{y_{q_s}}, \hat{C}_{y_{q_s}}) = (s^*, C_{y_{q_s}}) \mid (\hat{s}_x, \hat{C}_x) = (s, C)] \times \prod_{h=y_{q_s}}^{k-1} \Pr[(\hat{s}_{h+1}, \hat{C}_{h+1}) = (s^*, C_{h+1}) \mid (\hat{s}_h, \hat{C}_h) = (s^*, C_h)] \\
&\quad \ge (1 - d_{\lfloor k/2 \rfloor})^{y_{q_s} - y_0} (1 - e_{\lfloor k/2 \rfloor})^q \prod_{h=y_{q_s}}^{k-1} (1 - d_h) \\
&\quad \ge (1 - e_{\lfloor k/2 \rfloor})^q (1 - d_{\lfloor k/2 \rfloor})^{k - x} = (1 - e_{\lfloor k/2 \rfloor})^q (1 - d_{\lfloor k/2 \rfloor})^{q\alpha_k}.
\end{align*}
Consequently,
\[ \Pr[\hat{s}_k = s^*] = E[\Pr[\hat{s}_k = s^* \mid (\hat{s}_x, \hat{C}_x) = (s, C)]] \ge (1 - e_{\lfloor k/2 \rfloor})^q (1 - d_{\lfloor k/2 \rfloor})^{q\alpha_k} = (1 - e_{\lfloor k/2 \rfloor})^q [(1 - d_{\lfloor k/2 \rfloor})^{\alpha_k}]^q \to 1, \quad \text{as } k \to \infty \tag{B.42} \]
where the limit follows from (3.37) in Lemma 3.4.3 and the choice of $\alpha_k$ satisfying (B.12). This proves that $\{\hat{s}_k\}$ converges to $s^*$ in probability.

B.9 Proof of Theorem 3.4.2

First, let us derive two relations that will prove useful in the proof of the theorem. From the definitions of the quantities $a_k(s, C)$, $b_k(s, C)$ given by (3.29) and (3.30) respectively, we get
\begin{align*}
a_k(s, C) &= \Pr[\hat{i}_k^* \notin A_k^{\max} \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\le \Pr\Big[\max_{j \notin A_k^{\max}} \{\Delta\hat{L}_j^{f(k)}(\hat{n}_{j,k})\} \ge \max_{i \in A_k^{\max}} \{\Delta\hat{L}_i^{f(k)}(\hat{n}_{i,k})\} \,\Big|\, (\hat{s}_k, \hat{C}_k) = (s, C)\Big] \\
&\le \Pr\Big[\max_{j \notin A_k^{\max}} \{\Delta\hat{L}_j^{f(k)}(\hat{n}_{j,k})\} \ge \Delta\hat{L}_i^{f(k)}(\hat{n}_{i,k}) \,\Big|\, (\hat{s}_k, \hat{C}_k) = (s, C)\Big], \quad i \in A_k^{\max} \\
&\le \sum_{j \notin A_k^{\max}} \Pr[\Delta\hat{L}_j^{f(k)}(\hat{n}_{j,k}) \ge \Delta\hat{L}_i^{f(k)}(\hat{n}_{i,k}) \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.43}
\end{align*}
Note that $\Delta L_j(\hat{n}_{j,k}) < \Delta L_i(\hat{n}_{i,k})$ for all $j \notin A_k^{\max}$ and $i \in A_k^{\max}$. Similarly, we get
\[ b_k(s, C) \le \sum_{j \notin A_k^{\min}} \Pr[\Delta\hat{L}_j^{f(k)}(\hat{n}_{j,k}) \le \Delta\hat{L}_i^{f(k)}(\hat{n}_{i,k}) \mid (\hat{s}_k, \hat{C}_k) = (s, C)], \quad i \in A_k^{\min} \tag{B.44} \]
and $\Delta L_j(\hat{n}_{j,k}) > \Delta L_i(\hat{n}_{i,k})$ for all $j \notin A_k^{\min}$ and $i \in A_k^{\min}$. Next, for any $\alpha_k$, consider the event $[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1]$ and observe that
\begin{align*}
&\Pr[L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad = \Pr[\exists\, h,\ k \le h < k+\alpha_k \text{ s.t. } L(\hat{s}_{h+1}) < L(\hat{s}_h),\ \text{and } L(\hat{s}_{i+1}) \le L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1,\ i \ne h \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\qquad + \Pr[L(\hat{s}_{i+1}) = L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le \Pr[L(\hat{s}_{k+\alpha_k}) < L(\hat{s}_k) \mid (\hat{s}_k, \hat{C}_k) = (s, C)] + \Pr[L(\hat{s}_{i+1}) = L(\hat{s}_i),\ i = k, \cdots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.45}
\end{align*}
In addition, it follows from Lemma B.7.1 (specifically, equations (B.15), (B.20), (B.22) and (B.25)) that
\begin{align*}
&\Pr[L(\hat{s}_{h+1}) = L(\hat{s}_h),\ h = k, \ldots, k+\alpha_k-1 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le p_0^{-1}(1 - p_0)^{\alpha_k - (|C|+N)} + (\alpha_k - 1)(a_k + b_k) + \sum_{M=k}^{k+\alpha_k-1} \Pr[\hat{\delta}_M(\hat{i}_M^*, \hat{j}_M^*) \le 0,\ \delta_M(\hat{i}_M^*, \hat{j}_M^*) > 0 \mid (\hat{s}_k, \hat{C}_k) = (s, C)] \\
&\quad \le p_0^{-1}(1 - p_0)^{\alpha_k - (|C|+N)} + (\alpha_k - 1)(a_k + b_k) + \sum_{M=k}^{k+\alpha_k-1} \ \sum_{\{(i,j) \in \hat{C}_M,\ \delta_M(i,j) > 0\}} \Pr[\hat{\delta}_M(i,j) \le 0 \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.46}
\end{align*}
We can now combine (B.45), (B.46) with (B.29) to establish the following inequality for any $(s, C)$:
\[ e_k(s, C) \le [1 - (1 - d_k)^{\alpha_k}] + p_0^{-1}(1 - p_0)^{\alpha_k - (|C|+N)} + (\alpha_k - 1)(a_k + b_k) + \sum_{M=k}^{k+\alpha_k-1} \ \sum_{\{(i,j) \in \hat{C}_M,\ \delta_M(i,j) > 0\}} \Pr[\hat{\delta}_M(i,j) \le 0 \mid (\hat{s}_k, \hat{C}_k) = (s, C)]. \tag{B.47} \]
Now we are ready to proceed with the proof of the theorem. If $f(k) \ge k^{1+c}$ for some $c > 0$ and the assumptions of Lemma 3.4.4 are satisfied, we know from Lemma 3.4.4, the definition in (3.32), and (B.10) that
\[ d_k = O\Big(\frac{1}{f(k)}\Big) = O\Big(\frac{1}{k^{1+c}}\Big). \]
Furthermore, since the space of $(s, C)$ is finite, Lemma 3.4.4, the definition in (3.32), and inequalities (B.43) and (B.44) imply that
\[ a_k = O\Big(\frac{1}{f(k)}\Big) = O\Big(\frac{1}{k^{1+c}}\Big), \qquad b_k = O\Big(\frac{1}{f(k)}\Big) = O\Big(\frac{1}{k^{1+c}}\Big). \]
Next, choose
\[ \alpha_k = \frac{1+c}{-\ln(1-p_0)} \ln(k), \qquad k = 1, 2, \ldots \]
and observe that $\{\alpha_k\}$ above satisfies (B.11) and that $(1 - p_0)^{\alpha_k} = \frac{1}{k^{1+c}}$. Then, (B.47) gives
\[ e_k = O(1 - (1 - d_k)^{\alpha_k}) + O((1 - p_0)^{\alpha_k}) + O((\alpha_k - 1)(a_k + b_k)) + O\Big(\frac{\alpha_k}{f(k)}\Big) = O\Big(\frac{\ln(k)}{k^{1+c}}\Big). \]
Finally, from (B.42) we get
\[ \Pr[\hat{s}_k = s^*] = 1 - O\big(1 - (1 - e_{\lfloor k/2 \rfloor})^q (1 - d_{\lfloor k/2 \rfloor})^{q\alpha_k}\big) = 1 - O(e_{\lfloor k/2 \rfloor} + \alpha_k d_{\lfloor k/2 \rfloor}) = 1 - O\Big(\frac{\ln(k)}{k^{1+c}}\Big). \]
Since $\hat{s}_k$ can take only a finite set of values, the previous equation can be rewritten as
\[ \Pr[|\hat{s}_k - s^*| \ge \epsilon] = O\Big(\frac{\ln(k)}{k^{1+c}}\Big) \tag{B.48} \]
for any sufficiently small $\epsilon > 0$. Since $\sum_k \frac{\ln(k)}{k^{1+c}} < \infty$, we know from the Borel–Cantelli Lemma ([69], pp. 255–256) that $\{\hat{s}_k\}$ converges almost surely to the optimal allocation $s^*$.

Appendix C

PROOFS FROM CHAPTER 4

C.1 Proof of Theorem 4.2.1

We use induction on $k = 0, 1, \cdots$ and establish the result for any number of kanban $k$ to be allocated over $N$ stages. First, define the following vectors: $x_k$ is the allocation reached at the $k$th step in (4.5); $x_k^*$ is the solution of (RA3) over $x \in A_k$; and finally $y_k$ is any allocation in $A_k$. For $k = 0$, (4.6) gives $i_0^* := \arg\max_{i=1,\ldots,N} \{\Delta J_i(x_0)\}$. Then, from the definition of $\Delta J_i(x)$ and condition (U), it follows that
\[ J(x_0 + e_{i_0^*}) > J(x_0 + e_{i_0}) \tag{C.1} \]
for all $i_0 = 1, \cdots, N$, $i_0 \ne i_0^*$, which implies that $x_1^* = x_0 + e_{i_0^*}$. Note that this is true because (4.6) is obtained from an exhaustive search over the entire space $A_1$, which includes only $N$ allocations, $x_0 + e_i$ for $i = 1, \cdots, N$. Since equation (4.5) gives $x_1 = x_0 + e_{i_0^*}$, it follows that $x_1 = x_1^*$, that is, $x_1$ is the solution of (RA3) over $A_1$. Now suppose that for some $k \ge 1$ the vector $x_k$ obtained from (4.5)–(4.6) yields the optimal allocation, that is,
\[ J(x_k) = J(x_k^*) \ge J(y_k) \quad \text{for all } y_k \in A_k \]
From (4.6), again $i_k^* = \arg\max_{i=1,\ldots,N} \{\Delta J_i(x_k)\}$ (a unique index under (U)). It then follows from the definition of $\Delta J_i(x)$ that
\[ J(x_k + e_{i_k^*}) = J(x_k) + \Delta J_{i_k^*}(x_k) \ge J(x_k) + \Delta J_{i_k}(x_k) = J(x_k + e_{i_k}), \]
for any $i_k = 1, \cdots, N$.
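As an aside, the incremental process (4.5)–(4.6) being analyzed can be sketched in a few lines: starting from $x_0 = 0$, each step adds one unit to the stage with the largest marginal gain $\Delta J_i(x) = J(x + e_i) - J(x)$. The objective below is a hypothetical separable concave reward (for which conditions (S) and (U) hold with generic weights), not the kanban performance measure of Chapter 4; the weights are assumptions for illustration only:

```python
# Sketch of the incremental (greedy) allocation process (4.5)-(4.6).
# J is a hypothetical separable concave objective; weights are assumptions.
import itertools, math

w = [1.0, 2.0, 3.5]                       # assumed per-stage weights
J = lambda x: sum(wi * math.log(1 + xi) for wi, xi in zip(w, x))

def incremental(J, N, K):
    x = [0] * N
    for _ in range(K):
        # (4.6): pick the stage with the largest marginal gain Delta J_i(x)
        i = max(range(N), key=lambda i: J(x[:i] + [x[i] + 1] + x[i + 1:]) - J(x))
        x[i] += 1                         # (4.5): commit one unit to stage i*
    return x

greedy = incremental(J, 3, 5)
brute = max((x for x in itertools.product(range(6), repeat=3) if sum(x) == 5), key=J)
print(greedy, list(brute))                # greedy allocation vs. exhaustive search
```

For this concave instance the greedy allocation coincides with the exhaustive-search optimum at every budget, which is exactly the content of Theorem 4.2.1.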
Therefore,
\[ J(x_k + e_{i_k^*}) = \max_{i=1,\ldots,N} \{J(x_k + e_i)\} \ge \max_{i=1,\ldots,N} \{J(y_k + e_i)\} \]
where the inequality is due to the smoothness condition (S). Hence, $x_{k+1}^* = x_k + e_{i_k^*}$. Finally, note that (4.5) also gives $x_{k+1} = x_k + e_{i_k^*}$, and therefore $x_{k+1} = x_{k+1}^*$, i.e., $x_{k+1}$ is the solution of (RA3) over $A_{k+1}$.

Conversely, suppose that the algorithm yields the optimal solution for any $K = 1, 2, \cdots$, but that conditions (S) and (U) are not satisfied for some $k < K$. This implies that there exists an allocation $x_k^* \in A_k$ such that $J(x_k^*) \ge J(y_k)$ for all $y_k \in A_k$ and $\max_{i=1,\ldots,N} \{J(x_k^* + e_i)\} < \max_{i=1,\ldots,N} \{J(y_k + e_i)\}$. This implies that the algorithm does not yield an optimal allocation over $A_{k+1}$, which is a contradiction.

C.2 Proof of Theorem 4.3.1

Let $y_k$, $k = 1, \cdots, K$, denote the allocations that the DIO process in (4.5) would visit if $J(x)$ were known exactly. Clearly, $y_K$ is the optimal allocation due to Theorem 4.2.1. We proceed by determining the probability that (4.11)–(4.13) will yield $y_K$ for some $l$:
\[ \Pr[\hat{x}_{K,l} = y_K] = \Pr[\hat{i}_{K-1,l}^* = i_{K-1}^* \mid \hat{x}_{K-1,l} = y_{K-1}] \Pr[\hat{x}_{K-1,l} = y_{K-1}] \]
where $\hat{i}_{K-1,l}^*$ and $i_{K-1}^*$ are defined in (4.13) and (4.6) respectively. Further conditioning, we get
\[ \Pr[\hat{x}_{K,l} = y_K] = \Big\{ \prod_{k=1}^{K-1} \Pr[\hat{i}_{k,l}^* = i_k^* \mid \hat{x}_{k,l} = y_k] \Big\} \Pr[\hat{x}_{0,l} = y_0] \tag{C.2} \]
Next, take any term of the product:
\begin{align*}
\Pr[\hat{i}_{k,l}^* = i_k^* \mid \hat{x}_{k,l} = y_k] &= \Pr\Big[\Delta\hat{J}_{i_{k,l}^*}^{f(l)}(x) > \max_{\substack{j=1,\cdots,N \\ j \ne i_{k,l}^*}} \{\Delta\hat{J}_j^{f(l)}(x)\} \,\Big|\, \hat{x}_{k,l} = y_k\Big] \\
&= 1 - \Pr\Big[\Delta\hat{J}_{i_{k,l}^*}^{f(l)}(x) \le \max_{\substack{j=1,\cdots,N \\ j \ne i_{k,l}^*}} \{\Delta\hat{J}_j^{f(l)}(x)\} \,\Big|\, \hat{x}_{k,l} = y_k\Big] \\
&= 1 - \Pr\Big[\bigcup_{\substack{j=1 \\ j \ne i_{k,l}^*}}^{N} \big\{\Delta\hat{J}_{i_{k,l}^*}^{f(l)}(x) \le \Delta\hat{J}_j^{f(l)}(x)\big\} \,\Big|\, \hat{x}_{k,l} = y_k\Big] \\
&\ge 1 - \sum_{\substack{j=1 \\ j \ne i_{k,l}^*}}^{N} \Pr[\Delta\hat{J}_{i_{k,l}^*}^{f(l)}(x) \le \Delta\hat{J}_j^{f(l)}(x) \mid \hat{x}_{k,l} = y_k] \tag{C.3}
\end{align*}
Since $\lim_{l\to\infty} f(l) = \infty$, and since $\Delta J_{i_{k,l}^*}(x) > \Delta J_j(x)$ for $j \ne i_{k,l}^*$, all terms in the summation go to 0 due to Lemma 2.2.1, and therefore all terms $\Pr[\hat{i}_{k,l}^* = i_k^* \mid \hat{x}_{k,l} = y_k]$ approach 1 as $l \to \infty$. Moreover, by (4.12) we have $\Pr[\hat{x}_{0,l} = y_0] = 1$, where $y_0 = [0, \cdots, 0]$.
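The vanishing-error fact just invoked can be seen numerically. The sketch below uses hypothetical true marginal gains and Gaussian estimation noise (both assumptions, not the model of Chapter 4) to estimate the probability that a noisy $\arg\max$ as in (4.13) selects the true best index as the number of averaged observations $f(l)$ grows:

```python
# Monte Carlo illustration of the key term in (C.3): the probability that the
# noisy argmax picks the true i* tends to 1 as f(l) grows (Lemma 2.2.1 flavor).
# mu and the Gaussian noise model are hypothetical assumptions.
import random

random.seed(0)
mu = [1.0, 0.8, 0.6]          # assumed true marginals; index 0 is the true i*

def p_correct(f, trials=2000):
    """Estimate Pr[argmax of f-sample averages equals the true argmax]."""
    hits = 0
    for _ in range(trials):
        est = [m + sum(random.gauss(0, 1) for _ in range(f)) / f for m in mu]
        hits += est.index(max(est)) == 0
    return hits / trials

probs = [p_correct(f) for f in (1, 10, 100, 1000)]
print(probs)
```

The estimated probabilities increase toward 1 with $f$, which is the ordinal-comparison robustness the convergence argument relies on.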
It follows that $\lim_{l\to\infty} \Pr[\hat{x}_{K,l} = y_K] = 1$ and the theorem is proved.

C.3 Proof of Theorem 4.3.2

From Lemma 3.4.4 we get that
\[ \Pr[\Delta\hat{J}_{i_{k,l}^*}^{f(l)}(x) \le \Delta\hat{J}_j^{f(l)}(x) \mid \hat{x}_{k,l} = y_k] = O\Big(\frac{1}{f(l)}\Big) = O\Big(\frac{1}{l^{1+c}}\Big). \]
Also, from (C.3) we get that
\[ \Pr[\hat{x}_{k,l} \notin X_k^* \mid \hat{x}_{k,l} = y_k] = O\Big(\frac{1}{l^{1+c}}\Big) \tag{C.4} \]
where $X_k^*$ is the set of all allocations that exhibit optimal performance from the set $A_k$. Clearly,
\[ \sum_{l=1}^{\infty} \frac{1}{l^{1+c}} < \infty \tag{C.5} \]
Hence, using the Borel–Cantelli Lemma (see pp. 255–256 of [69]) we conclude that $\hat{x}_{k,l}$ converges to the optimal allocation almost surely.

Appendix D

PROOFS FROM CHAPTER 5

D.1 Proof of Theorem 5.4.1

In order to prove this theorem, we first need to derive the perturbation of $(k, n)$, which is defined in (5.19). Given the definition of $D_k^n$ from equation (5.21), there are 9 distinct cases possible.

Case 1.
Nominal Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $W_k^n \le 0$, $B_k^n \le 0$.
Perturbed Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $\widetilde{W}_k^n \le 0$ and $\widetilde{B}_k^n \le 0$.
Applying (5.22) for both the nominal and perturbed sample paths, we get:
\[ \Delta D_k^n = D_k^{n-1} + Z_k^n - (\widetilde{D}_k^{n-1} + Z_k^n) = D_k^{n-1} - \widetilde{D}_k^{n-1} = \Delta D_k^{n-1} \tag{D.1} \]

Case 2.
Nominal Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $W_k^n \le 0$ and $B_k^n \le 0$.
Perturbed Sample Path: $(k, n)$ waits and is not blocked, i.e., $\widetilde{W}_k^n > 0$ and $\widetilde{B}_k^n \le 0$.
Applying (5.22) for the nominal sample path and (5.23) for the perturbed sample path, we get:
\begin{align*}
\Delta D_k^n &= D_k^{n-1} + Z_k^n - (\widetilde{D}_{k-1}^n + Z_k^n) \\
&= D_k^{n-1} + D_{k-1}^n - \widetilde{D}_{k-1}^n - D_{k-1}^n \\
&= \Delta D_{k-1}^n + D_k^{n-1} - D_{k-1}^n \\
&= \Delta D_{k-1}^n + I_k^n
\end{align*}
where (5.16) was used in the last step. Note that $I_k^n \ge 0$ since $W_k^n \le 0$ by assumption. Adding and subtracting $\Delta D_k^{n-1}$ and using (5.20) allows us to rewrite this equation in the following form (which will prove more convenient later on):
\[ \Delta D_k^n = \Delta D_k^{n-1} - \Big[ \Delta_{(k-1,n)}^{(k,n-1)} - I_k^n \Big] \tag{D.2} \]

Case 3.
Nominal Sample Path: $(k, n)$ starts a new busy period and is not blocked, i.e., $W_k^n \le 0$ and $B_k^n \le 0$.
Perturbed sample path: (k, n) is blocked, i.e., B̃_k^n > 0.
Using (5.24) for the perturbed path and the definition of ∆D_k^n,

∆D_k^n = D_k^n - D̃_{k-x_{n+1}-1[n+1]}^{n+1}
       = D_k^n + D_{k-x_{n+1}-1[n+1]}^{n+1} - D̃_{k-x_{n+1}-1[n+1]}^{n+1} - D_{k-x_{n+1}-1[n+1]}^{n+1}
       = ∆D_{k-x_{n+1}-1[n+1]}^{n+1} + D_k^n - D_{k-x_{n+1}-1[n+1]}^{n+1}

Using (5.17), (5.18), and the fact that D_k^n = C_k^n we get, if 1[n+1] = 0,

∆D_k^n = ∆D_{k-x_{n+1}}^{n+1} - B_k^n

and, if 1[n+1] = 1,

∆D_k^n = ∆D_{k-x_{n+1}-1}^{n+1} + D_k^n - D_{k-x_{n+1}}^{n+1} + D_{k-x_{n+1}}^{n+1} - D_{k-x_{n+1}-1}^{n+1}
       = ∆D_{k-x_{n+1}-1}^{n+1} - B_k^n + Q_{k-x_{n+1}}^{n+1}

Again add and subtract ∆D_k^{n-1} to obtain

∆D_k^n = ∆D_k^{n-1} - [ ∆_{(k-x_{n+1}-1[n+1],n+1)}^{(k,n-1)} + B_k^n - Q_{k-x_{n+1}}^{n+1} · 1[n+1] ]   (D.3)

For the six remaining cases, expressions for ∆D_k^n can be derived in a similar way. We omit the details and provide only the final equations:

Case 4.
Nominal sample path: (k, n) waits and is not blocked, i.e., W_k^n > 0 and B_k^n ≤ 0.
Perturbed sample path: (k, n) starts a new busy period and is not blocked, i.e., W̃_k^n ≤ 0 and B̃_k^n ≤ 0.

∆D_k^n = ∆D_{k-1}^n - [ ∆_{(k,n-1)}^{(k-1,n)} - W_k^n ]   (D.4)

Case 5.
Nominal sample path: (k, n) waits and is not blocked, i.e., W_k^n > 0 and B_k^n ≤ 0.
Perturbed sample path: (k, n) waits and is not blocked, i.e., W̃_k^n > 0 and B̃_k^n ≤ 0.

∆D_k^n = ∆D_{k-1}^n   (D.5)

Case 6.
Nominal sample path: (k, n) waits and is not blocked, i.e., W_k^n > 0 and B_k^n ≤ 0.
Perturbed sample path: (k, n) is blocked, i.e., B̃_k^n > 0.

∆D_k^n = ∆D_{k-1}^n - [ ∆_{(k-x_{n+1}-1[n+1],n+1)}^{(k-1,n)} + B_k^n - Q_{k-x_{n+1}}^{n+1} · 1[n+1] ]   (D.6)

Case 7.
Nominal sample path: (k, n) is blocked, i.e., B_k^n > 0.
Perturbed sample path: (k, n) starts a new busy period and is not blocked, i.e., W̃_k^n ≤ 0 and B̃_k^n ≤ 0.

∆D_k^n = ∆D_{k-x_{n+1}}^{n+1} - [ ∆_{(k,n-1)}^{(k-x_{n+1},n+1)} - B_k^n - [W_k^n]^+ ]   (D.7)

Case 8.
Nominal sample path: (k, n) is blocked, i.e., B_k^n > 0.
Perturbed sample path: (k, n) waits and is not blocked, i.e., W̃_k^n > 0 and B̃_k^n ≤ 0.

∆D_k^n = ∆D_{k-x_{n+1}}^{n+1} - [ ∆_{(k-1,n)}^{(k-x_{n+1},n+1)} - B_k^n - [I_k^n]^+ ]   (D.8)

Case 9.
Nominal sample path: (k, n) is blocked, i.e., B_k^n > 0.
Perturbed sample path: (k, n) is blocked, i.e., B̃_k^n > 0.

∆D_k^n = ∆D_{k-x_{n+1}}^{n+1} - [ ∆_{(k-x_{n+1}-1,n+1)}^{(k-x_{n+1},n+1)} - Q_{k-x_{n+1}}^{n+1} · 1[n+1] ]   (D.9)

Using these 9 cases we can prove the theorem as follows. First, we show that the last two terms in the max bracket of equation (5.25) can be expressed in terms of W̃_k^n and B̃_k^n alone:

∆_{(k-1,n)}^{(k,n-1)} - I_k^n = ∆D_k^{n-1} - ∆D_{k-1}^n - I_k^n
  = D_k^{n-1} - D̃_k^{n-1} - D_{k-1}^n + D̃_{k-1}^n - D_k^{n-1} + D_{k-1}^n
  = D̃_{k-1}^n - D̃_k^{n-1}
  = W̃_k^n   (D.10)

∆_{(k-x_{n+1}-1[n+1],n+1)}^{(k,n-1)} + B_k^n - Q_{k-x_{n+1}}^{n+1} · 1[n+1]
  = ∆D_k^{n-1} - ∆D_{k-x_{n+1}-1[n+1]}^{n+1} + B_k^n - Q_{k-x_{n+1}}^{n+1} · 1[n+1]
  = D_k^{n-1} - D̃_k^{n-1} - D_{k-x_{n+1}-1[n+1]}^{n+1} + D̃_{k-x_{n+1}-1[n+1]}^{n+1}
    + D_{k-x_{n+1}}^{n+1} - D_k^n - (D_{k-x_{n+1}}^{n+1} - D_{k-x_{n+1}-1}^{n+1}) · 1[n+1]
  = D̃_{k-x_{n+1}-1[n+1]}^{n+1} - D̃_k^{n-1} - Z_k^n
  = D̃_{k-x_{n+1}-1[n+1]}^{n+1} - (C̃_k^n - [W̃_k^n]^+)
  = B̃_k^n + [W̃_k^n]^+   (D.11)

Therefore, equation (5.25) is equivalent to:

∆D_k^n = ∆D_k^{n-1} - max{ 0, W̃_k^n, B̃_k^n + [W̃_k^n]^+ }   (D.12)

We can then consider the following three possible cases:

1. If W̃_k^n ≤ 0 and B̃_k^n ≤ 0, then Case 1 examined earlier applies and equation (D.1) gives ∆D_k^n = ∆D_k^{n-1}, which is precisely (D.12) since max{0, W̃_k^n, B̃_k^n + [W̃_k^n]^+} = 0.

2. If W̃_k^n > 0 and B̃_k^n ≤ 0, then Case 2 applies and equation (D.2) holds, which is again (D.12) since max{0, W̃_k^n, B̃_k^n + [W̃_k^n]^+} = W̃_k^n = ∆_{(k-1,n)}^{(k,n-1)} - I_k^n.

3. If B̃_k^n > 0, then Case 3 applies and (D.3) holds, which is the same as (D.12) since max{0, W̃_k^n, B̃_k^n + [W̃_k^n]^+} = B̃_k^n + [W̃_k^n]^+ = ∆_{(k-x_{n+1}-1[n+1],n+1)}^{(k,n-1)} + B_k^n - Q_{k-x_{n+1}}^{n+1} · 1[n+1].

Appendix E

PROOFS FROM CHAPTER 8

E.1 Proof of Theorem 8.4.1

In order to prove the theorem we identify the following four cases in (8.13).

Case I: A_k(s_m) ≤ A_m ≤ L_{m-1}, i.e., m does not start a new busy period.
From (8.13), even when τ attains its maximum value, τ = A_m - A_k(s_m), we have max{0, A_k(s_m) + A_m - A_k(s_m) - L_{m-1}} = 0 since L_{m-1} ≥ A_m. Therefore ∆L_m(τ) = Z. Substituting ∆L_m(τ) in the cost function (8.11) we get

C_k(s_m, τ) = (c_g - c_a)τ + c_g s_m + c_a(L_{m-1} - A_k(s_m)) + c_a Σ_{b=1}^B [ B_b Z - Σ_{i=1}^b I_i^B ]^+
            = (c_g - c_a)τ + H   (E.1)

where H is a constant independent of τ. Since c_a ≥ c_g, C_k(s_m, τ) attains its minimum value when τ attains its maximum value, that is,

τ* = A_m - A_k(s_m)   (E.2)

which is shown in Figure E.1. Note that in this case, from the definition of T_1 and T_2 and since A_k(s_m) ≤ A_m ≤ L_{m-1}, max{T_1, T_2} = T_1 and min{0, T_1} = 0. Therefore, (8.14) gives τ* = T_2 = A_m - A_k(s_m), which is the result in (E.2).

Case II: A_k(s_m) < L_{m-1} < A_m and Z - I_m ≤ 0 (in this case, m starts a new busy period). Under these conditions, the second case in (8.13) applies and gives

∆L_m(τ) = { 0,                        if 0 ≤ τ ≤ A_m - A_k(s_m) - Z
          { A_k(s_m) + τ + Z - A_m,   if A_m - A_k(s_m) - Z < τ ≤ A_m - A_k(s_m)   (E.3)

To simplify the notation, we define P = A_m - A_k(s_m) - Z; therefore, (E.3) is rewritten as

∆L_m(τ) = { 0,       if 0 ≤ τ ≤ P
          { τ - P,   if P < τ ≤ A_m - A_k(s_m)   (E.4)
Figure E.1: τ* for Case I.

Substituting ∆L_m(τ) in the cost function (8.11),

C_k(s_m, τ) = { c_g τ + c_g s_m + c_a max{0, L_{m-1} - A_k(s_m) - τ},   if 0 ≤ τ ≤ P
            { c_g τ + c_g s_m + c_a max{0, L_{m-1} - A_k(s_m) - τ}
              + c_a Σ_{b=1}^B [ B_b(τ - P) - Σ_{i=1}^b I_i^B ]^+,       if P ≤ τ ≤ A_m - A_k(s_m)   (E.5)

Since, by assumption, L_{m-1} > A_k(s_m),

max{0, L_{m-1} - A_k(s_m) - τ} = { L_{m-1} - A_k(s_m) - τ,   if 0 ≤ τ ≤ L_{m-1} - A_k(s_m)
                                 { 0,                         if L_{m-1} - A_k(s_m) ≤ τ ≤ A_m - A_k(s_m)   (E.6)

Substituting in the cost function (E.5), we get:

C_k(s_m, τ) = { (c_g - c_a)τ + c_g s_m + c_a(L_{m-1} - A_k(s_m)),                   if 0 ≤ τ ≤ L_{m-1} - A_k(s_m) ≤ P
            { c_g τ + c_g s_m,                                                      if L_{m-1} - A_k(s_m) ≤ τ ≤ P
            { c_g τ + c_g s_m + c_a Σ_{b=1}^B [ B_b(τ - P) - Σ_{i=1}^b I_i^B ]^+,   if P ≤ τ ≤ A_m - A_k(s_m)   (E.7)

Note that (E.6) breaks the first case of (E.5) into two subcases. Further, it simplifies the second case of (E.5), since for the range L_{m-1} - A_k(s_m) ≤ P ≤ τ ≤ A_m - A_k(s_m) we have max{·} = 0, leading to the third case of (E.7). Next we check each of the three possible cases of (E.7) to find the value of τ that minimizes the cost function. In the first case, the corresponding expression is minimized when τ attains its maximum value, i.e., τ = L_{m-1} - A_k(s_m). In the second case, the expression is minimized when τ attains its minimum value, which again is τ = L_{m-1} - A_k(s_m). Finally, the third expression is always greater than or equal to the second one, since the summation term is always non-negative. Therefore, the cost is minimized when

τ* = L_{m-1} - A_k(s_m)   (E.8)

as shown in Figure E.2. Note that in this case A_k(s_m) ≤ L_{m-1} ≤ A_m; therefore, max{T_1, T_2} = T_2 and min{0, T_1} = 0. Hence, (8.14) gives τ* = T_1 = L_{m-1} - A_k(s_m), which is the result of (E.8).

Case III: A_k(s_m) < L_{m-1} < A_m and Z - I_m > 0 (as in Case II, m again starts a new busy period). Since Z - I_m > 0, (8.13) reduces to

∆L_m(τ) = max{τ - P, Z - I_m} = { Z - I_m,   if 0 ≤ τ ≤ L_{m-1} - A_k(s_m)
                                { τ - P,     if L_{m-1} - A_k(s_m) < τ ≤ A_m - A_k(s_m)   (E.9)
where, as before, P = A_m - A_k(s_m) - Z.

Figure E.2: τ* for Case II.

Therefore, the additional cost due to k becomes

C_k(s_m, τ) = { (c_g - c_a)τ + c_g s_m + c_a(L_{m-1} - A_k(s_m))
              + c_a Σ_{b=1}^B [ B_b(Z - I_m) - Σ_{i=2}^b I_i^B ]^+,                 if 0 ≤ τ ≤ L_{m-1} - A_k(s_m)
            { c_g τ + c_g s_m + c_a Σ_{b=1}^B [ B_b(τ - P) - Σ_{i=2}^b I_i^B ]^+,   if L_{m-1} - A_k(s_m) ≤ τ ≤ A_m - A_k(s_m)   (E.10)

The first expression is minimized when τ attains its maximum value, i.e., τ = L_{m-1} - A_k(s_m). The second expression is minimized when τ attains its minimum value, i.e., τ = L_{m-1} - A_k(s_m). Therefore, the value of τ that minimizes the cost is given by:

τ* = L_{m-1} - A_k(s_m)   (E.11)

which is the same as (E.8) and is illustrated in Figure E.3. Note that, as in Case II, A_k(s_m) ≤ L_{m-1} ≤ A_m implies that max{T_1, T_2} = T_2 and min{0, T_1} = 0. Hence, (8.14) gives τ* = T_1 = L_{m-1} - A_k(s_m), which is the result of (E.11).

Figure E.3: τ* for Case III.

Case IV: L_{m-1} < A_k(s_m) < A_m (as in Cases II and III, m again starts a new busy period). In this case, with P as defined earlier,

τ - P = τ - A_m + A_k(s_m) + Z
      = Z - (A_m - L_{m-1}) + τ + A_k(s_m) - L_{m-1}
      = Z - I_m + τ + A_k(s_m) - L_{m-1}
      > Z - I_m

since A_k(s_m) > L_{m-1}. Therefore, (8.13) reduces to

∆L_m(τ) = max{0, τ - P} = { 0,       if 0 ≤ τ ≤ P
                          { τ - P,   if P < τ ≤ A_m - A_k(s_m)   (E.12)

and based on the sign of P = A_m - A_k(s_m) - Z we identify two subcases, which are also presented in Figure E.4:

(a) P ≥ 0. The cost function is then given by

C_k(s_m, τ) = { c_g τ + c_g s_m,                                                    if 0 ≤ τ ≤ P
            { c_g τ + c_g s_m + c_a Σ_{b=1}^B [ B_b(τ - P) - Σ_{i=1}^b I_i^B ]^+,   if P ≤ τ ≤ A_m - A_k(s_m)   (E.13)

Clearly, the minimum value is attained at τ* = 0.

(b) P < 0. Since τ ≥ 0, only ∆L_m(τ) = τ - P is possible in (E.12), for all 0 ≤ τ ≤ A_m - A_k(s_m). Therefore, the cost function becomes

C_k(s_m, τ) = c_g τ + c_g s_m + c_a Σ_{b=1}^B [ B_b(τ - P) - Σ_{i=1}^b I_i^B ]^+   (E.14)

which attains its minimum value at τ* = 0.
Therefore, under the conditions of Case IV, the value of τ that minimizes the cost is given by

τ* = 0   (E.15)

In this case, L_{m-1} ≤ A_k(s_m) ≤ A_m, hence max{T_1, T_2} = T_2 and min{0, T_1} = T_1. Therefore, (8.14) gives τ* = 0, which is the result in (E.15).

Figure E.4: Case IV subcases: (a) P > 0, (b) P < 0.

Note that no other case is possible since, by assumption, A_k(s_m) ≤ A_m, and the proof is complete.

E.2 Proof of Lemma 8.4.1

In the nominal path (before airplane k is considered), airplane a is expected to experience an airborne waiting time W_a = L_{a-1} - A_a. In the perturbed path (when k is considered) under a zero GHD we get W_k = L_{a-1} - A_k(0) and W_a = L_{a-1} - A_a + Z, since the presence of k prior to a imposes an additional airborne delay Z on airplane a. Clearly, this additional airborne delay propagates to all airplanes a+1, ..., l. Therefore,

C_k(0, 0) = c_a[L_{a-1} - A_k(0) + (l - a + 1)Z] + H   (E.16)

where H ≥ 0 accounts for the delay that k will induce on airplanes scheduled to arrive after l. Note that for A_k(0) + d_k ≥ L_l(d_l) the statement of the lemma holds trivially, while for A_k(0) + d_k < L_l(d_l), H is a constant independent of the GHD d_k, since in this case the departure of l in the perturbed path is fixed at L_l(d_l) + Z. Moreover, note that H = 0 if l is the last scheduled airplane, or if the idle period that precedes l+1 is such that I_{l+1} ≥ Z.

Now, let us invoke the L-FPA algorithm. In the first iteration, m^(1) = a and s_m^(1) = s_a^(1) = max{0, A_{a-1} - A_k(0)} = 0. By assumption, A_k(s_m^(1)) = A_k(0) ≤ A_a ≤ L_{a-1}, which corresponds to Case I of Theorem 8.4.1; therefore, τ*^(1) = A_a - A_k(0).
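The case analysis of Theorem 8.4.1 invoked here can be summarized computationally. In the sketch below, T1 = L_{m-1} - A_k(s_m) and T2 = A_m - A_k(s_m) as in the theorem, and the compact rule [min{T1, T2}]^+ is an equivalent restatement of the four case results (a sketch for illustration, not the dissertation's expression (8.14) verbatim):

```python
def tau_star(A_k_sm, A_m, L_m1):
    """Cost-minimizing tau per Cases I-IV of Theorem 8.4.1 (sketch).

    T1 and T2 follow the theorem's definitions; taking the positive
    part of min(T1, T2) reproduces all four case results."""
    T1 = L_m1 - A_k_sm   # offset that reaches the end of the busy period
    T2 = A_m - A_k_sm    # offset that reaches m's arrival instant
    return max(0.0, min(T1, T2))

# Case I (A_k(s_m) <= A_m <= L_{m-1}): tau* = T2
assert tau_star(1.0, 3.0, 5.0) == 2.0
# Cases II/III (A_k(s_m) < L_{m-1} < A_m): tau* = T1
assert tau_star(1.0, 6.0, 4.0) == 3.0
# Case IV (L_{m-1} < A_k(s_m) < A_m): tau* = 0
assert tau_star(4.0, 6.0, 2.0) == 0.0
```

In Case I we have 0 ≤ T2 ≤ T1, in Cases II and III 0 < T1 < T2, and in Case IV T1 < 0 < T2, so the single expression above covers every branch used in this proof.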
Then, the new additional cost is

C_k(0, τ*^(1)) = c_g τ*^(1) + c_a max{0, L_{a-1} - A_k(s_m^(1) + τ*^(1))} + c_a(l - a + 1)Z + H
  = c_g[A_a - A_k(0)] + c_a max{0, L_{a-1} - A_k(0) - A_a + A_k(0)} + c_a(l - a + 1)Z + H
  = c_g[A_a - A_k(0)] + c_a[L_{a-1} - A_a] + c_a(l - a + 1)Z + H
  = C_k(0, 0) - (c_a - c_g)(A_a - A_k(0))
  ≤ C_k(0, 0)

where the max evaluates to L_{a-1} - A_a since L_{a-1} - A_a ≥ 0, and the last inequality is due to c_a > c_g. Therefore, L-FPA will assign a GHD d_k = τ*^(1) (L-FPA step 4).

In the next iteration, m^(2) = a+1 and s_m^(2) = s_{a+1}^(2) = A_a - A_k(0). In this case, A_k(s_m^(2)) = A_k(0) + s_m^(2) = A_a ≤ A_{a+1} ≤ L_a since a+1 also belongs to the same busy period. Therefore, Case I of Theorem 8.4.1 again holds and τ*^(2) = A_{a+1} - A_k(s_m^(2)) = A_{a+1} - A_a. Hence, the new additional cost is

C_k(s_m^(2), τ*^(2)) = c_g(s_m^(2) + τ*^(2)) + c_a max{0, L_a - A_k(s_m^(2) + τ*^(2))} + c_a(l - a)Z + H
  = c_g(A_a - A_k(0) + A_{a+1} - A_a) + c_a max{0, L_a - A_{a+1}} + c_a(l - a)Z + H
  = c_g(A_{a+1} - A_k(0)) + c_a(L_{a-1} + Z - A_{a+1} - A_a + A_a) + c_a(l - a)Z + H
  = c_g(A_{a+1} - A_a) + c_g(A_a - A_k(0)) + c_a(L_{a-1} - A_a) - c_a(A_{a+1} - A_a) + c_a(l - a + 1)Z + H
  = C_k(0, τ*^(1)) - (c_a - c_g)(A_{a+1} - A_a)
  ≤ C_k(0, τ*^(1))   (E.17)

Therefore, L-FPA will again increase the ground-holding delay to d_k = s_m^(2) + τ*^(2) (L-FPA step 4). Equation (E.17) indicates that increasing the GHD so that A_k(d_k) is delayed from the interval [A_k(0), A_a) to [A_a, A_{a+1}) reduces the additional cost by an amount proportional to the length of this interval, i.e., by (c_a - c_g)(A_{a+1} - A_a). Proceeding in exactly the same way, in every iteration j, 1 < j ≤ l - a + 1, the additional cost is reduced by (c_a - c_g)(A_{a+j-1} - A_{a+j-2}), and therefore the ground-holding delay becomes

d_k = s_m^(l-a+1) + τ*^(l-a+1) = A_{l-1} - A_k(0) + A_l - A_{l-1} = A_l - A_k(0)

This in turn implies that

A_k(d_k) = A_k(0) + d_k = A_k(0) + A_l - A_k(0) = A_l

that is, the earliest that k will arrive is exactly at the same time as the arrival of l.
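The per-interval reductions described above telescope to (c_a - c_g)(A_l - A_k(0)). A quick numerical sketch, with hypothetical arrival times and cost rates chosen only for illustration:

```python
# Hypothetical arrival times A_a, ..., A_l within one busy period and
# unit costs with c_a > c_g, as assumed throughout the proof.
c_a, c_g = 3.0, 1.0
A = [10.0, 12.0, 15.0, 16.0]   # A_a, A_{a+1}, A_{a+2}, A_l
Ak0 = 9.0                      # A_k(0), k's unconstrained arrival time

# Iteration 1 reduces the additional cost by (c_a - c_g)(A_a - A_k(0));
# each later iteration j by (c_a - c_g)(A_{a+j-1} - A_{a+j-2}).
reductions = [(c_a - c_g) * (A[0] - Ak0)]
reductions += [(c_a - c_g) * (A[j] - A[j - 1]) for j in range(1, len(A))]

# The reductions telescope to (c_a - c_g)(A_l - A_k(0)).
assert abs(sum(reductions) - (c_a - c_g) * (A[-1] - Ak0)) < 1e-9
```

This is exactly the cancellation that produces the C_k(0, 0) - (c_a - c_g)[A_l - A_k(0)] term in the cost expression that follows.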
In this case, the additional cost is given by

C_k(s_m^(l-a+1), τ*^(l-a+1)) = C_k(s_m^(l-a), τ*^(l-a)) - (c_a - c_g)(A_l - A_{l-1})
  = C_k(s_m^(l-a-1), τ*^(l-a-1)) - (c_a - c_g)(A_l - A_{l-2})
  = ...
  = C_k(0, 0) - (c_a - c_g)[A_l - A_k(0)]
  = c_a[L_{a-1} + (l - a + 1)Z] - c_a A_k(0) - c_a A_l + c_a A_k(0) + c_g[A_l - A_k(0)] + H
  = c_a[L_l - A_l] + c_g[A_l - A_k(0)] + H
  = c_g[L_l - A_k(0)] + (c_a - c_g)[L_l - A_l] + H   (E.18)

where we have used (E.16). Finally, in the next iteration, m^(l-a+2) - 1 = l, m^(l-a+2) = l + 1, and s_m^(l-a+2) = A_l - A_k(0). Since l+1 starts a new busy period, A_{l+1}(s_{l+1}) = A_{l+1}(0) > L_l(d_l). In this case, A_k(s_m^(l-a+2)) < L_{m^(l-a+2)-1} < A_{m^(l-a+2)} = A_{l+1}(0); hence, either Case II or Case III of Theorem 8.4.1 holds, and thus τ*^(l-a+2) = L_{m^(l-a+2)-1} - A_k(s_m^(l-a+2)) = L_l - A_k(0) - s_m^(l-a+2). In this case, the additional cost consists only of the GHD assigned to k, that is,

C_k(s_m^(l-a+2), τ*^(l-a+2)) = c_g[s_m^(l-a+2) + τ*^(l-a+2)] + H
  = c_g[s_m^(l-a+2) + L_l - A_k(0) - s_m^(l-a+2)] + H
  = c_g[L_l - A_k(0)] + H
  ≤ C_k(s_m^(l-a+1), τ*^(l-a+1))   (E.19)

Hence, L-FPA will assign a new GHD so that

d_k = s_m^(l-a+2) + τ*^(l-a+2) = L_l - A_k(0)

If H is large enough, it is possible that following iterations further increase the ground-holding delay, so that d_k ≥ L_l - A_k(0), which is the statement of the lemma.

E.3 Proof of Lemma 8.4.2

The proof is by induction over the added airplanes k = 1, 2, .... First, we need to show that A_2(d_2) ≥ A_1(d_1). To show this, consider the scheduling done by L-FPA for airplane k = 1. In this case, since L-FPA starts with an empty list, airplanes m-1 and m do not exist; therefore, L_{m-1} = 0 and A_m = ∞. Also, s_m can only take a single value, s_m = max{0, A_{m-1} - A_k(0)} = 0, hence s^1 = 0. Thus, since L_{m-1} < A_k(0) = A_1(0) < A_m, Case IV of Theorem 8.4.1 holds and as a result τ* = τ^1 = 0. Hence, d_1 = s^1 + τ^1 = 0, and therefore A_1(d_1) = A_1(0) ≤ A_2(0) ≤ A_2(d_2) since d_2 ≥ 0.
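The order-preservation argument can be exercised on a toy instance. The sketch below is a simplified stand-in for the net effect of L-FPA suggested by Lemma 8.4.1 (each new airplane is ground-held just long enough to land after the previously scheduled landing, with runway separation Z); the function name and all values are illustrative, not the dissertation's algorithm verbatim:

```python
# Simplified stand-in for L-FPA's net result per Lemma 8.4.1: ground
# delay d_k absorbs all airborne waiting, so airplane k lands Z after
# the previous landing (or immediately if the runway is free).
Z = 2.0

def schedule(arrivals):
    landings = []   # L_k(d_k) for the scheduled airplanes, in order
    for Ak0 in arrivals:
        prev = landings[-1] if landings else float("-inf")
        dk = max(0.0, prev - Ak0)        # ground-holding delay d_k
        landings.append(Ak0 + dk + Z)    # L_k = A_k(d_k) + Z, no airborne wait
    return landings

L = schedule([10.0, 10.5, 11.0, 14.0])
assert L == sorted(L)   # landing order preserves the arrival order
```

Because each landing is placed at or after the previous one, the resulting landing sequence is non-decreasing, which is the order-preservation property the induction establishes.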
Next, suppose that order is preserved up to k and check whether A_{k+1}(d_{k+1}) ≥ A_k(d_k). Since order is preserved up to k, k is the last arrival in the last scheduled busy period, since A_k(0) ≥ A_j(0) for all j = 1, ..., k-1. We can now identify two cases:

Case 1: k starts the last busy period. In this case, A_k(0) > L_{k-1}(d_{k-1}); hence, Case IV of Theorem 8.4.1 holds, while s^k can only take the value 0 and thus τ* = τ^k = 0. Thus, d_k = s^k + τ^k = 0. Then A_{k+1}(d_{k+1}) ≥ A_{k+1}(0) ≥ A_k(0) = A_k(d_k); therefore, order is preserved.

Case 2: k does not start the last busy period. In this case, if A_{k+1}(0) > A_k(d_k), order is preserved trivially since d_{k+1} ≥ 0. On the other hand, if A_{k+1}(0) < A_k(d_k), then by Lemma 8.4.1, k+1 will be assigned a ground delay d_{k+1} such that A_{k+1}(d_{k+1}) ≥ L_k(d_k) > A_k(d_k); therefore, order will be preserved and the proof is complete.

E.4 Proof of Theorem 8.4.2

First, note that the total cost of the system with k flights under the L-FPA control policy is given by

C_T(k) = Σ_{j=1}^k C_j(s^j, τ^j)   (E.20)

where C_j(s^j, τ^j) is given by (8.11), and s^j, τ^j are the solutions to (8.9). Next, we proceed by induction over k = 1, 2, .... First, when k = 1, since we start with an empty list, airplanes m-1 and m do not exist; therefore, L_{m-1} = 0 and A_m = ∞. Also, s_m can only take a single value, s_m = 0, hence s^1 = 0. Then, since L_{m-1} < A_k(0) = A_1(0) < A_m, Case IV of Theorem 8.4.1 holds and as a result τ* = τ^1 = 0. Hence, C_1(s^1, τ^1) = C_1(0, 0) = 0. Then, the cost in (E.20) is C_T(1) = 0, which is minimal since the cost is non-negative.

Now suppose that for airplane k, L-FPA yields a GHD d_k = s^k + τ^k that minimizes the delay cost, that is, C_T(k) = C_T*(k). Note that since the delay cost is minimized, the airborne delay of the jth airplane must be zero, and therefore L_j(s^j + τ^j) = A_j(s^j + τ^j) + Z for all j = 1, ..., k. Next, consider what happens to airplane k+1.
From Corollary 8.4.1 we can identify two cases:

Case 1: If d_{k+1} = 0, then C_{k+1}(0, 0) = 0. As a result,

C_T(k+1) = Σ_{j=1}^{k+1} C_j(s^j, τ^j) = C_T*(k) + C_{k+1}(0, 0) = C_T*(k)

which implies that C_T(k+1) is minimized, since the cost is a non-decreasing function of k.

Case 2: If d_{k+1} = L_k(d_k) - A_{k+1}(0), observe the following: if k+1 were assigned a zero GHD, then its airborne waiting time would be L_k(d_k) - A_{k+1}(0) = d_{k+1}. Hence, under L-FPA, k+1 has traded off all of its airborne waiting time for exactly the same amount of ground-holding time, i.e., it satisfies (8.15). Further, note that k+1 is added at the end of the list due to Lemma 8.4.2; therefore, k+1 induces no delays on any airplane j = 1, ..., k. Hence, the cost remains minimal and the proof is complete.

Bibliography

[1] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines, John Wiley & Sons, 1989.
[2] B. T. Allen, Managerial Economics, Harper Collins, 1994.
[3] S. Andradóttir, A global search method for discrete stochastic optimization, SIAM Journal on Optimization, 6 (1996), pp. 513–530.
[4] G. Andreatta and L. Brunetta, Multi-airport ground holding problem: A computational evaluation of exact algorithms, Operations Research, 46 (1998), pp. 57–64.
[5] G. Andreatta and G. Romanin-Jacur, Aircraft flow management under congestion, Transportation Science, 21 (1987), pp. 249–253.
[6] M. Asawa and D. Teneketzis, Multi-armed bandits with switching penalties, IEEE Transactions on Automatic Control, 41 (1996), pp. 328–348.
[7] A. Ashburn, Toyota's famous Ohno system, American Machinist, in Applying Just in Time: The American/Japanese Experience, Y. Monden, ed., IIE Press, 1986.
[8] D. Bertsimas and S. S. Patterson, The air traffic flow management problem with enroute capacities, Operations Research, 46 (1998), pp. 406–422.
[9] S. Brooks, A discussion of random methods for seeking maxima, Operations Research, 6 (1958).
[10] C. Cassandras and S.
Strickland, Observable augmented systems for sensitivity analysis of Markov and semi-Markov processes, IEEE Transactions on Automatic Control, 34 (1989), pp. 1026–1037.
[11] C. Cassandras and S. Strickland, On-line sensitivity analysis of Markov chains, IEEE Transactions on Automatic Control, 34 (1989), pp. 76–86.
[12] C. G. Cassandras, Discrete Event Systems, Modeling and Performance Analysis, IRWIN, 1993.
[13] C. G. Cassandras, L. Dai, and C. G. Panayiotou, Ordinal optimization for a class of deterministic and stochastic discrete resource allocation problems, IEEE Transactions on Automatic Control, 43 (1998), pp. 881–900.
[14] C. G. Cassandras and V. Julka, Descent algorithms for discrete resource allocation problems, in Proceedings of the 33rd Conference on Decision and Control, Dec 1994, pp. 2639–2644.
[15] C. G. Cassandras and V. Julka, Scheduling policies using marked/phantom slot algorithms, Queueing Systems: Theory and Applications, 20 (1995), pp. 207–254.
[16] C. G. Cassandras and C. G. Panayiotou, Concurrent sample path analysis of discrete event systems, accepted in Journal of Discrete Event Dynamic Systems, (1999).
[17] C. G. Cassandras and W. Shi, Perturbation analysis of multiclass multiobjective queueing systems with 'quality-of-service' guarantees, in Proceedings of the 35th Conference on Decision and Control, Dec 1996, pp. 3322–3327.
[18] C. Chen and Y. Ho, An approximation approach of the standard clock method for general discrete event simulation, IEEE Transactions on Control Systems Technology, 3 (1995), pp. 309–317.
[19] L. Cimini, G. Foschini, C.-L. I, and Z. Miljanic, Call blocking performance of distributed algorithms for dynamic channel allocation in microcells, IEEE Transactions on Communications, 42 (1994), pp. 2600–2607.
[20] L. Cimini, G. Foschini, and L. Shepp, Single-channel user-capacity calculations for self-organizing cellular systems, IEEE Transactions on Communications, 42 (1994), pp. 3137–3143.
[21] D. C. Cox and D. O.
Reudink, Increasing channel occupancy in large-scale mobile radio systems: Dynamic channel reassignment, IEEE Transactions on Vehicular Technology, 22 (1973).
[22] L. Dai, Convergence properties of ordinal comparison in the simulation of discrete event dynamic systems, Journal of Optimization Theory and Applications, 91 (1996), pp. 363–388.
[23] L. Dai, C. G. Cassandras, and C. G. Panayiotou, On the convergence rate of ordinal optimization for stochastic discrete resource allocation problems, to appear in IEEE Transactions on Automatic Control, 44 (1999).
[24] M. Di Mascolo, Y. Frein, Y. Dallery, and R. David, A unified modeling of kanban systems using Petri nets, Intl. Journal of Flexible Manufacturing Systems, 3 (1991), pp. 275–307.
[25] B. Eklundh, Channel utilization and blocking probability in a cellular mobile telephone system with directed retry, IEEE Transactions on Communications, 34 (1986), pp. 329–337.
[26] D. Everitt, Traffic capacity of cellular mobile communication systems, Computer Networks and ISDN Systems, 20 (1990), pp. 447–454.
[27] R. Gallager, A minimum delay routing algorithm using distributed computation, IEEE Transactions on Communications, 25 (1977), pp. 73–85.
[28] J. Gittins, Multi-Armed Bandit Allocation Indices, Wiley, New York, 1989.
[29] J. Gittins and D. Jones, A dynamic allocation index for the sequential design of experiments, in Progress in Statistics, European Meeting of Statisticians, J. Gani, K. Sarkadi, and I. Vincze, eds., Amsterdam: North Holland, 1974, pp. 241–266.
[30] P. Glasserman, Gradient Estimation via Perturbation Analysis, Kluwer, Boston, 1991.
[31] W.-B. Gong, Y. Ho, and W. Zhai, Stochastic comparison algorithm for discrete optimization with estimation, in Proceedings of the 31st IEEE Conference on Decision and Control, Dec 1992, pp. 795–802.
[32] W.-B. Gong, Y. Ho, and W. Zhai, Stochastic comparison algorithm for discrete optimization with estimation, Journal of Discrete Event Dynamic Systems: Theory and Applications, (1995).
[33] S. Grandhi, R.
Vijayan, D. Goodman, and J. Zander, Centralized power control in cellular radio systems, IEEE Transactions on Vehicular Technology, 42 (1993), pp. 466–468.
[34] Y. Gupta and M. Gupta, A system dynamics model for a multistage multiline dual-card JIT-kanban system, Intl. Journal of Production Research, 27 (1989), pp. 309–352.
[35] Y. Ho and X. Cao, Perturbation Analysis of Discrete Event Systems, Kluwer, Boston, 1991.
[36] Y. Ho, M. Eyler, and T. Chien, A gradient technique for general buffer storage design in a production line, Intl. Journal of Production Research, 17 (1979), pp. 557–580.
[37] Y. Ho, R. Sreenivas, and P. Vakili, Ordinal optimization in DEDS, Journal of Discrete Event Dynamic Systems: Theory and Applications, 2 (1992), pp. 61–88.
[38] Y. C. Ho, Heuristics, rules of thumb, and the 80/20 proposition, IEEE Transactions on Automatic Control, 39 (1994), pp. 1025–1027.
[39] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
[40] P. Huang, L. Rees, and B. W. Taylor, A simulation analysis of the Japanese Just-In-Time technique (with kanbans) for a multiline multistage production system, Decision Sciences, 14 (1983), pp. 326–344.
[41] T. Ibaraki and N. Katoh, Resource Allocation Problems: Algorithmic Approaches, MIT Press, 1988.
[42] T. J. Kahwa and N. D. Georganas, A hybrid channel assignment scheme in large-scale cellular-structured mobile communication systems, IEEE Transactions on Communications, 26 (1978).
[43] V. Kalashnikov, Topics on Regenerative Processes, CRC Press, Boca Raton, Florida, 1994.
[44] J. Karlsson and B. Eklundh, A cellular mobile telephone system with load sharing: an enhancement of directed retry, IEEE Transactions on Communications, 37 (1989), pp. 530–535.
[45] I. Katzela and M. Naghshineh, Channel assignment schemes for cellular mobile telecommunication systems: A comprehensive survey, IEEE Personal Communications, 3 (1996), pp. 10–31.
[46] J. Kiefer and J.
Wolfowitz, Stochastic estimation of the maximum of a regression function, Annals of Mathematical Statistics, 23 (1952), pp. 462–466.
[47] O. Kimura and H. Terada, Design and analysis of pull system, a method of multi-stage production control, Intl. Journal of Production Research, 19 (1981), pp. 241–253.
[48] L. Kleinrock, Queueing Systems. Volume I: Theory, Wiley, 1975.
[49] X. Lagrange and B. Jabbari, Fairness in wireless microcellular networks, IEEE Transactions on Vehicular Technology, 47 (1998), p. 472.
[50] M. Lulu and J. Black, Effect of process unreliability on integrated manufacturing/production systems, Journal of Manufacturing Systems, 6 (1987), pp. 15–22.
[51] V. H. MacDonald, The cellular concept, Bell System Technical Journal, 58 (1979), pp. 15–41.
[52] D. Mitra and I. Mitrani, Analysis of a novel discipline for cell coordination in production lines, tech. report, AT&T Laboratories, 1988.
[53] V. I. Norkin, Y. M. Ermoliev, and A. Ruszczyński, On optimal allocation of indivisibles under uncertainty, Operations Research, 46 (1998), pp. 381–395.
[54] C. G. Panayiotou and C. G. Cassandras, Dynamic resource allocation in discrete event systems, in Proceedings of the IEEE Mediterranean Conference on Control and Systems, Jul 1997.
[55] C. G. Panayiotou and C. G. Cassandras, Dynamic transmission scheduling for packet radio networks, in Proceedings of the IEEE Symposium on Computers and Communications, Jun 1998, pp. 69–73.
[56] C. G. Panayiotou and C. G. Cassandras, Flow control for a class of transportation systems, in Proceedings of the IEEE Intl. Conference on Control Applications, Sep 1998, pp. 771–775.
[57] C. G. Panayiotou and C. G. Cassandras, Optimization of kanban-based manufacturing systems, accepted for publication in Automatica, (1999).
[58] C. G. Panayiotou and C. G. Cassandras, A sample path approach for solving the ground-holding policy problem in air traffic control, submitted to IEEE Transactions on Control Systems Technology, (1999).
[59] R. Parker and R. Rardin, Discrete Optimization, Academic Press, Boston, 1988.
[60] P. Philipoom, L. Rees, B. Taylor, and P.
Huang, An investigation of the factors influencing the number of kanbans required in the implementation of the JIT technique with kanbans, Intl. Journal of Production Research, 25 (1987), pp. 457–472.
[61] P. A. Raymond, Performance analysis of cellular networks, IEEE Transactions on Communications, 39 (1991), pp. 1787–1793.
[62] O. Richetta and A. R. Odoni, Solving optimally the static ground-holding policy problem in air traffic control, Transportation Science, 27 (1993), pp. 228–238.
[63] O. Richetta and A. R. Odoni, Dynamic solution to the ground-holding problem in air traffic control, Transportation Research, 28A (1994), pp. 167–185.
[64] H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics, 22 (1951), pp. 400–407.
[65] B. Schroer, J. Black, and S. Zhang, Just-In-Time (JIT), with kanban, manufacturing system simulation on a microcomputer, Simulation, 45 (1985), pp. 62–70.
[66] L. Shi and S. Ólafsson, Convergence rate of nested partitions method for stochastic optimization, submitted to Management Science, (1997).
[67] L. Shi and S. Ólafsson, Stopping rules for the stochastic nested partitions method, paper in progress, (1998).
[68] L. Shi and S. Ólafsson, Nested partitions method for global optimization, to appear in Operations Research, (1999).
[69] A. Shiryayev, Probability, Springer-Verlag, New York, 1979.
[70] K. C. So and S. C. Pinault, Allocating buffer storage in a pull system, Intl. Journal of Production Research, 26 (1988), pp. 1959–1980.
[71] Y. Sugimori, K. Kusunoki, F. Cho, and S. Uchikawa, Toyota production system and kanban system: materialization of Just-In-Time and respect-for-human systems, Intl. Journal of Production Research, 15 (1977), pp. 553–564.
[72] M. Terrab and A. R. Odoni, Strategic flow management for air traffic control, Operations Research, 41 (1993), pp. 138–152.
[73] R. Uzsoy and L. A. Martin-Vega, Modeling kanban-based demand-pull systems: a survey and critique, Manufacturing Review, 3 (1990), pp. 155–160.
[74] P.
Vakili, A standard clock technique for efficient simulation, Operations Research Letters, 10 (1991), pp. 445–452.
[75] P. Vranas, D. Bertsimas, and A. R. Odoni, The multi-airport ground-holding problem in air traffic control, Operations Research, 42 (1994), pp. 249–261.
[76] J. Wieselthier, C. Barnhart, and A. Ephremides, Optimal admission control in circuit-switched multihop radio networks, in Proceedings of the 31st IEEE Conference on Decision and Control, Dec 1992, pp. 1011–1013.
[77] D. Yan and H. Mukai, Stochastic discrete optimization, SIAM Journal on Control and Optimization, 30 (1992).
[78] H. Yan, X. Zhou, and G. Yin, Finding optimal number of kanbans in a manufacturing system via stochastic approximation and perturbation analysis, in Proceedings of the 11th Intl. Conference on Analysis and Optimization of Systems, 1994, pp. 572–578.
[79] J. Zander, Distributed co-channel interference control in cellular radio systems, IEEE Transactions on Vehicular Technology, 41 (1992).
[80] H. Zhu and V. S. Frost, In-service monitoring for cell loss quality of service violations in ATM networks, IEEE/ACM Transactions on Networking, 4 (1996), pp. 240–248.