Proceedings of MIMAR 2007
Modelling in Industrial Maintenance and Reliability

Proceedings of MIMAR 2007, the 6th IMA International Conference
10-11 September 2007, The Lowry Centre, Salford Quays, Manchester, United Kingdom

Edited by Matthew Carr, Philip Scarf and Wenbin Wang

Organised jointly by the Centre for Operational Research and Applied Statistics, Salford Business School, University of Salford, and by the Institute of Mathematics and its Applications, United Kingdom

Preface

These proceedings present a collected volume of papers submitted for the Institute of Mathematics and its Applications (IMA) 6th international conference on Modelling in Industrial Maintenance and Reliability (MIMAR), held on the 10th and 11th September 2007 at the Lowry Centre in Salford Quays, Manchester. The conference was jointly organised by the IMA and the Centre for Operational Research and Applied Statistics (CORAS) at the University of Salford. The MIMAR conferences follow a three-year cycle; previous conferences were held at the University of Edinburgh (1992, 1998) and the University of Salford (1995, 2001, 2004). The aim of the conference is to provide an overview of current research in industrial maintenance and reliability modelling and discussion of areas of future research. Topics for presentation include: life cycle analysis and maintenance strategies, inspections and replacements, condition monitoring and condition-based maintenance, and warranty analysis and logistics. The conference involves researchers from the UK, China, the Czech Republic, Brazil, Canada, Japan, Kuwait, Sweden, Finland and elsewhere.

All delegates were asked to produce a short paper, according to a particular format, describing the work that they intended to present at the conference. It is these papers that are collected together in this volume. Authors were asked to construct the papers according to a specific format, thus assisting the editing process and enabling the proceedings to have a consistent style and presentation. The short papers enable authors to: (i) publish research; (ii) provide the interested listener with supportive material during the course of presentations; and (iii) supply additional material that may be difficult to present, so as to complement the presentation and provide support and a point of reference for further discussion. Where possible, the papers are organised according to the scheduled order of presentation in the conference programme. When papers have not been supplied, due to restrictions on time or the required format, abstracts only have been included in these proceedings. In addition, authors have been invited to submit extended versions of the papers presented here for publication in a special issue of the IMA Journal of Management Mathematics. Papers submitted for the special issue will be subject to a more rigorous refereeing process, consistent with standard scientific journals. Details are available on the IMA web-site and papers are due by 31st September 2007.

MATTHEW CARR, PHILIP SCARF, WENBIN WANG
August 2007

Contents
(Organised according to the order of presentation at the conference)

Modelling different failure modes in CBM applications using a weighted combination of stochastic filters (M J Carr and W Wang), p. 1
Remaining useful life in condition based maintenance: Is it useful? (D Banjevic and A K S Jardine), p. 7
Demand categorisation in a European spare parts logistics network (Abstract only) (A A Syntetos, M Z Babai and M Keyes), p. 13
How academics can help industry and the other way around (R Dwight, H Martin and J Sharp), p. 14
Stochastic modelling maintenance actions of complex systems (J R Kearney, D F Percy and K A H Kobbacy), p. 18
Application of the delay-time concept in a manufacturing industry (B Jones, I Jenkinson and J Wang), p. 23
A preventive maintenance decision model based on a MCDA approach (R J P Ferreira, C A V Cavalcante and A T de Almeida), p. 29
Predicting the performance of future generations of complex repairable systems, through analysis of reliability and maintenance data (T J Jefferis, N Montgomery and T Dowd), p. 35
A note on a calibration method of control parameters in simulation models for software reliability assessment (M Kimura), p. 40
Estimating the availability of a reverse osmosis plant (M Hajeeh, F Faramarzi and G Al-Essa), p. 45
The use of IT within maintenance management for continuous improvement (A Ingwald and M Kans), p. 51
An approximate algorithm for condition-based maintenance applications (M J Carr and W Wang), p. 57
The utility of a maintenance policy (R D Baker), p. 63
Stochastic demand patterns for Markov service facilities with neutral and active periods (A Csenki), p. 68
Multicriteria decision model for selecting maintenance contracts by applying utility theory and variable interdependent parameters (A J de Melo Brito and A T de Almeida), p. 74
Spare parts planning and risk assessment associated with non-considering system operating environment (B Ghodrati), p. 80
Modern maintenance system based on web and mobile technologies (J Campos, E Jantunen and O Prakash), p. 91
A literature review of computerised maintenance management support (M Kans), p. 96
Some generalizations of age and block replacement (Abstract only) (P Scarf), p. 102
Scheduling the imperfect preventive maintenance policy for deteriorating systems (Y Jia and X Chen), p. 103
Contribution to modelling of dynamic dependability of complex systems (D Valis), p. 110
Weak sound pulse extraction in pipe leak inspection using stochastic resonance (Y Dingxin and X Yongcheng), p. 116
An empirical comparison of periodic stock control heuristics for intermittent demand items (Abstract only) (M Z Babai and A A Syntetos), p. 121
Minimising average long-run cost for systems monitored by the np control chart (S Wu and W Wang), p. 122
Condition evaluation of equipment in power plant based on grey theory (J Li and S Huang), p. 134

Modelling different failure modes in CBM applications using a weighted combination of stochastic filters

Matthew J. Carr, Wenbin Wang
CORAS, University of Salford, UK. m.j.carr@salford.ac.uk, w.wang@salford.ac.uk

Abstract: In the context of condition-based maintenance (CBM), probabilistic stochastic filters provide an established means of recursively estimating the residual life of an individual component using condition monitoring (CM) information. In this paper, we consider the potential for modelling the impact of multiple failure modes that exhibit specific types of behaviour and are identifiable using historical data. The behaviour may be categorised according to the pattern of the observed CM information, the failure times of components, or both. Stochastic filters are constructed for each contingency and we develop a Bayesian model that is used to recursively evaluate the probability that the observed CM information corresponds to each of the failure modes. The output from the individual filters is then weighted accordingly.
Two scenarios are considered: the first involves a fixed but unknown underlying failure mode and the second caters for transitions between the failure modes over time. An example is presented using simulated data to illustrate the applicability of the methodology.

1. Introduction

CBM applications utilise CM information when scheduling maintenance and replacement activities for individual components. The components degrade stochastically over time under operational conditions and CBM models are used to reduce the occurrence of costly and untimely failures. A number of relevant models exist in the literature, including proportional hazards models (Makis & Jardine (1991) and Vlok et al. (2002)), accelerated failure-time models (Cox & Oakes (1984)) and stochastic filters (Wang & Christer (2000) and Wang (2002)). The models are parameterised using data sets consisting of CM histories pertaining to analogous components.

In this paper, we consider CBM scenarios in which multiple failure modes exhibit themselves and are identifiable using historical CM data. We assume that the different failure modes display behaviour that can be categorised according to the CM output, the failure times of the components, or both. The methodology proposed in this paper involves constructing a stochastic filter for each of the defined failure modes. A given filter is used to recursively establish a conditional density for the residual life at each CM point under the relevant failure mode. An estimate of the residual life of a component is then defined as a weighted combination of the respective output from the individual filters.

Firstly, we consider a situation where the underlying dynamics (or failure mode) are assumed to be fixed and to conform to one of the proposed models. This model is for use when the behaviour can correspond to a number of distinct behavioural types and we are simply unaware which type the current component conforms to. The example considered in section 5 involves the modelling and estimation of the residual life of a component when the behaviour can correspond to one of two potential failure modes. The behaviour is assumed to manifest itself in the form of failure time clustering, as demonstrated in figure 1. Separate models are established for each scenario and a recursive procedure is developed to determine, during the life of a component, which model the underlying dynamics conform to, using both the age and the available CM history.

We also consider the potential for the dynamics to evolve or fluctuate during the life of a component. We assume that, at any given stage, the underlying failure mode conforms to one of the proposed models and that unknown transitions between the individual failure modes occur over time. The transition probabilities must be estimated from available data and are modelled using a Markov chain.

[Figure 1. Illustrating the clustering of failure times when two different failure modes exist]

2. An individual stochastic filter

In this section, we describe a stochastic filter designed to facilitate residual life prediction for a component under the $j$th individual failure mode ($j = 1, 2, \ldots, r$). $x_i$ is defined as the underlying residual life of a component at the $i$th CM point at time $t_i$. In addition, we observe a vector of CM parameters $y_i$ and have available the filtration or CM history $\Im_i = \{y_0, y_1, \ldots, y_i\}$.
From Wang (2002), the posterior conditional probability density for the residual life at the $i$th monitoring point, given the history of CM information for a component, is

$$p_{ji}(x_i \mid M_j, \Im_i) = p(x_i \mid M_j, y_i, \Im_{i-1}) = \frac{p(y_i \mid x_i, M_j, \Im_{i-1})\, p(x_i \mid M_j, \Im_{i-1})}{p(y_i \mid M_j, \Im_{i-1})} \qquad (1)$$

where $M_j$ represents the $j$th failure mode. As $y_i$ and $\Im_{i-1}$ are independent given $x_i$, we have

$$p(y_i \mid x_i, M_j, \Im_{i-1}) \equiv p(y_i \mid x_i, M_j) \qquad (2)$$

The influence of the individual failure mode is reflected in the specification of the component parts of the relevant filter, i.e. a density for the initial residual life $p_{j0}(x_0 \mid M_j, \Im_0) = p_{j0}(x_0 \mid M_j)$ and the density given by equation (2) representing the stochastic relationship between the monitored information and the underlying residual life. The second element of the numerator of equation (1) is derived as an updated version of the residual life distribution from the previous recursion, at time $t_{i-1}$, as

$$p(x_i \mid M_j, \Im_{i-1}) = \frac{p_{j,i-1}(x_{i-1} = x_i + t_i - t_{i-1} \mid M_j, \Im_{i-1})}{\int_{t_i - t_{i-1}}^{\infty} p_{j,i-1}(u \mid M_j, \Im_{i-1})\, du} \qquad (3)$$

The denominator of equation (1) is established as

$$p(y_i \mid M_j, \Im_{i-1}) = \int_0^{\infty} p(y_i \mid x_i, M_j, \Im_{i-1})\, p(x_i \mid M_j, \Im_{i-1})\, dx_i \qquad (4)$$

Using historical CM information and failure time data, the parameters of the individual stochastic filters are established for $W$ relevant histories using the likelihood function

$$L(\theta) = \prod_{d=1}^{W} \left( \prod_{i=1}^{n_d} p(y_{di} \mid M_j, \Im_{d,i-1})\, P_{jd,i-1}(x_{d,i-1} > t_{di} - t_{d,i-1} \mid M_j, \Im_{d,i-1}) \right) p_{jdn_d}(x_{dn_d} = T_d - t_{dn_d} \mid M_j, \Im_{dn_d}) \qquad (5)$$

where a lower-case $p$ represents a density function, an upper-case $P$ represents a probability, $\theta$ is the set of unknown parameters and $n_d$ is the number of monitoring points in the $d$th CM history ($d = 1, \ldots, W$).

3. Fixed failure mode

In this section, we discuss the weighted modelling approach for an individual component with a fixed underlying failure mode. The problem is essentially one of competing risks (Crowder (2001)) and we construct $r$ different stochastic filters, each pertaining to an individual and distinct failure mode. The notation $M_j$ represents failure mode $j$ ($j = 1, 2, \ldots, r$) and, with the availability of a set of past CM histories, the $j$th stochastic filter is parameterised using only those CM histories that correspond to failure mode $j$. The prior probability that the underlying dynamics of the CM process for a given component correspond to failure mode $j$ is denoted $p(M_j \mid \Im_0)$ and is easily estimated from historical data as

$$p(M_j \mid \Im_0) = \frac{\text{number of histories relevant to failure mode } j}{\text{total number of histories}}$$

Considering a vector of condition monitoring parameters, $y_i$, obtained at the $i$th discrete monitoring point at time $t_i$, we have

$$p(M_j \mid \Im_i) = p(M_j \mid y_i, \Im_{i-1}) \qquad (6)$$

as the conditional probability that the underlying dynamics of the current CM process correspond to failure mode $j$, given the CM history available until that point in time. By the application of Bayes' law we obtain

$$p(M_j \mid y_i, \Im_{i-1}) = \frac{p(y_i \mid M_j, \Im_{i-1})\, p(M_j \mid \Im_{i-1})}{p(y_i \mid \Im_{i-1})} \qquad (7)$$

where the initial probability $p(M_j \mid \Im_0)$ is assumed to be known and $p(M_j \mid \Im_{i-1})$, the probability that the underlying dynamics conform to failure mode $j$, is available from the previous recursion of the process. This is the means by which our best judgement regarding the actual underlying failure mode, and hence the residual life of the unit, is updated at each monitoring point.
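To make the recursion concrete, the following is a minimal numerical sketch of one update of the mode probabilities in equation (7) on a discretised residual-life grid. It is not the authors' implementation: the function names, the grid and the use of trapezoidal quadrature are illustrative assumptions, and the marginal likelihood $p(y_i \mid M_j, \Im_{i-1})$ is computed numerically in the way anticipated by equation (8) below.

```python
import numpy as np

# Illustrative sketch: one Bayes update of the failure-mode probabilities
# p(M_j | I_i) of equation (7), with the marginal likelihoods
# p(y_i | M_j, I_{i-1}) obtained by quadrature (equations (4)/(8)).

def update_mode_probs(y_i, prior_mode_probs, pred_densities, obs_density, x_grid):
    """prior_mode_probs : p(M_j | I_{i-1}), j = 1..r
    pred_densities      : per-mode one-step predictions p(x_i | M_j, I_{i-1})
                          evaluated on x_grid (equation (3))
    obs_density(y, x, j): p(y | x, M_j) of equation (2)"""
    r = len(prior_mode_probs)
    marg = np.array([np.trapz(obs_density(y_i, x_grid, j) * pred_densities[j],
                              x_grid) for j in range(r)])
    post = marg * np.asarray(prior_mode_probs)   # numerator of equation (7)
    return post / post.sum()                     # equation (9) normalises
```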
We also have

$$p(y_i \mid M_j, \Im_{i-1}) = \int_0^{\infty} p(y_i \mid x_i, M_j)\, p(x_i \mid M_j, \Im_{i-1})\, dx_i \qquad (8)$$

on the assumption that $p(y_i \mid x_i, M_j, \Im_{i-1}) = p(y_i \mid x_i, M_j)$, i.e. $y_i$ is controlled by $x_i$ and $M_j$ only. The denominator of equation (7) is obtained by enumerating over all the possible scenarios as

$$p(y_i \mid \Im_{i-1}) = \sum_{j=1}^{r} p(y_i \mid M_j, \Im_{i-1})\, p(M_j \mid \Im_{i-1}) \qquad (9)$$

A weighted mean estimate of the residual life can be obtained as

$$E[x_i \mid \Im_i] = \int_0^{\infty} x_i\, p_i(x_i \mid \Im_i)\, dx_i \qquad (10)$$

where the weighted conditional distribution is simply

$$p_i(x_i \mid \Im_i) = \sum_{j=1}^{r} p_{ji}(x_i \mid M_j, \Im_i)\, p(M_j \mid \Im_i) \qquad (11)$$

4. Failure mode transitions

To facilitate the modelling of underlying dynamics that can potentially vary over time as a component ages, we introduce the notation $M_{ji}$ to represent the underlying dynamics conforming to failure mode $j$ at the $i$th monitoring point. A time-invariant Markov chain is established with transition probabilities

$$a_{kj} = p(M_{ji} \mid M_{k,i-1}) \qquad (12)$$

corresponding to the conditional probability that the underlying dynamics conform to failure mode $j$ at the $i$th monitoring point, given that they conformed to failure mode $k$ at the previous monitoring point. The objective of the combined modelling approach with evolving dynamics is to establish the conditional distribution

$$p_i(x_i \mid \Im_i) = \sum_{j=1}^{r} p(x_i \mid M_{ji}, \Im_i)\, p(M_{ji} \mid \Im_i) \qquad (13)$$

Both terms in equation (13) require some explanation. The first is established as

$$p(x_i \mid M_{ji}, \Im_i) = p(x_i \mid M_{ji}, y_i, \Im_{i-1}) = \frac{p(y_i \mid x_i, M_{ji})\, p(x_i \mid M_{ji}, \Im_{i-1})}{p(y_i \mid M_{ji}, \Im_{i-1})} \qquad (14)$$

where the relationship $p(y_i \mid x_i, M_{ji})$ is available from the model specification and we have

$$p(x_i \mid M_{ji}, \Im_{i-1}) = \sum_{k=1}^{r} p(x_i \mid M_{ji}, M_{k,i-1}, \Im_{i-1})\, p(M_{k,i-1} \mid M_{ji}, \Im_{i-1}) \qquad (15)$$

In this context, $p(x_i \mid M_{ji}, M_{k,i-1}, \Im_{i-1}) = p(x_i \mid M_{k,i-1}, \Im_{i-1})$, as the one-step prediction of $x_i$ is available from the previous recursion and is not dependent on the current model, due to the lack of reliance on $y_i$. We also have the reverse transition expression

$$p(M_{k,i-1} \mid M_{ji}, \Im_{i-1}) = \frac{a_{kj}\, p(M_{k,i-1} \mid \Im_{i-1})}{\sum_{k=1}^{r} a_{kj}\, p(M_{k,i-1} \mid \Im_{i-1})} \qquad (16)$$

and the denominator of equation (14) is established as

$$p(y_i \mid M_{ji}, \Im_{i-1}) = \int_0^{\infty} p(y_i \mid x_i, M_{ji})\, p(x_i \mid M_{ji}, \Im_{i-1})\, dx_i \qquad (17)$$

Now we consider the second term of equation (13). Assuming the initial probability that the underlying dynamics at the start of the CM process for a new component correspond to failure mode $j$, $p(M_j \mid \Im_0)$, is known, we again employ Bayes' theorem to recursively obtain

$$p(M_{ji} \mid \Im_i) = p(M_{ji} \mid y_i, \Im_{i-1}) = \frac{p(y_i \mid M_{ji}, \Im_{i-1})\, p(M_{ji} \mid \Im_{i-1})}{p(y_i \mid \Im_{i-1})} \qquad (18)$$

where the constituent elements of the numerator are

$$p(y_i \mid M_{ji}, \Im_{i-1}) = \int_0^{\infty} p(y_i \mid x_i, M_{ji})\, p(x_i \mid M_{ji}, \Im_{i-1})\, dx_i \qquad (19)$$

$$p(M_{ji} \mid \Im_{i-1}) = \sum_{k=1}^{r} a_{kj}\, p(M_{k,i-1} \mid \Im_{i-1}) \qquad (20)$$

and the denominator is given by enumerating over the predictions available from all the potential models as

$$p(y_i \mid \Im_{i-1}) = \sum_{j=1}^{r} p(y_i \mid \Im_{i-1}, M_{ji})\, p(M_{ji} \mid \Im_{i-1}) \qquad (21)$$
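As an illustration of how the transition step of section 4 might be organised, the following sketch mixes the per-mode one-step predictions and propagates the mode probabilities (equations (15), (16) and (20)); it resembles the mixing step of interacting multiple model filters. The array names and the common discretisation grid are assumptions for the sketch, not part of the paper.

```python
import numpy as np

# Sketch of the transition step (equations (15), (16) and (20)).
# A[k, j] = a_kj = p(M_ji | M_k,i-1) is an assumed r x r transition matrix.

def propagate_modes(A, prev_mode_probs, prev_pred_densities):
    """prev_mode_probs     : p(M_k,i-1 | I_{i-1}), k = 1..r
    prev_pred_densities    : one-step predictions p(x_i | M_k,i-1, I_{i-1}),
                             each an array on a common residual-life grid"""
    r = len(prev_mode_probs)
    prior_i = A.T @ prev_mode_probs                # equation (20)
    mixed = []
    for j in range(r):
        w = A[:, j] * prev_mode_probs              # numerator of (16)
        w = w / w.sum()                            # reverse transition probs
        mixed.append(sum(w[k] * prev_pred_densities[k] for k in range(r)))  # (15)
    return prior_i, mixed
```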
5. Example – fixed dynamics

In this example, we consider the modelling and estimation of the residual life of a component using vibration information, when two potential failure modes are assumed to have been identified from relevant data in a scenario similar to that illustrated in figure 1. When the monitoring process commences for a new component, the underlying dynamics are assumed fixed but unknown, as described in section 3, and we develop two separate stochastic filters (filter 1 and filter 2) to represent each potential eventuality. The filters are developed using the same functional forms but are parameterised independently using relevant analogous component histories. The filters are then run in parallel and their respective output weighted according to the probability that the underlying dynamics correspond to the relevant failure mode.

In this example, we simulate a cycle of data in accordance with each modelling formulation and investigate the ability of the prescribed methodology to track the appropriate underlying failure mode and the residual life of the component. The estimate of the residual life at each monitoring point is then compared with the estimate from a general stochastic filter (filter 3) that is developed and parameterised using all the available monitoring information, i.e. the histories are not classified according to any failure mode and are all grouped together for parameter estimation purposes. This is achieved by simulating a large number of cycles of CM data corresponding to failure modes 1 and 2 and parameterising a general stochastic filter (filter 3) using all the simulated output. We then compare the weighted output from filters 1 and 2 with the output from filter 3 to ascertain the benefit of the combined modelling approach for this particular scenario.

From equations (1)–(4), the filtering expression for filter $j$ is

$$p_{ji}(x_i \mid \Im_i, M_j) = \frac{p(y_i \mid x_i, M_j)\, p_{j,i-1}(x_i + t_i - t_{i-1} \mid \Im_{i-1}, M_j)}{\int_0^{\infty} p(y_i \mid x_i, M_j)\, p_{j,i-1}(x_i + t_i - t_{i-1} \mid \Im_{i-1}, M_j)\, dx_i} \qquad (22)$$

for $j = 1, 2, 3$. The constituent elements of filter $j$ are the initial residual life distribution

$$p_{j0}(x_0 \mid M_j) = \frac{\alpha_j}{\Gamma(\beta_j)} (\alpha_j x_0)^{\beta_j - 1} e^{-\alpha_j x_0} \qquad (23)$$

which is defined as a Gamma distribution for each model but parameterised independently. Similarly, the distribution governing the conditional relationship between the observed vibration reading and the underlying residual life is taken to be Gaussian for all the filters:

$$p(y_i \mid x_i, M_j) = \frac{1}{\sigma_{ji}\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}\left( \frac{y_i - \mu_{ji}}{\sigma_{ji}} \right)^2 \right\} \qquad (24)$$

where, for filter $j$, $\mu_{ji} = A_j + B_j \exp\{-C_j x_i\}$ is the expected vibration level at the $i$th monitoring point given a particular realisation of the underlying residual life. We assume that the standard deviation parameter is proportional to the vibration level, $\sigma_{ji} = d_j y_i$. The specified and estimated parameters of filters 1 and 2 are given in table 1.

Parameter   Filter 1 (specified)   Filter 1 (estimate)   Filter 2 (specified)   Filter 2 (estimate)
A           5                      5.11                  5                      4.981
B           17.5                   17.3                  21                     20.07
C           0.025                  0.027                 0.01                   0.011
d           0.12                   0.126                 0.15                   0.141
α           0.2                    0.218                 0.1                    0.115
β           45                     44.205                75                     74.504

Table 1. The specified and estimated parameters of filters 1 and 2

The expected CM paths for the average life corresponding to model formulations 1 and 2 are illustrated in figure 2. The general filter (filter 3) is constructed with the same forms as filters 1 and 2, given by equations (23) and (24), and its parameters are estimated using 100 simulated histories: 50 of the histories are generated according to failure mode 1 and 50 according to mode 2. The reasoning for this is that, for simplicity and to demonstrate the methodology, we develop a scenario in which both contingencies are equally likely, i.e. before the monitoring process begins, we have the initial probabilities $p(M_1) = p(M_2) = 0.5$.

[Figure 2. Illustrating the expected CM paths for failure modes 1 and 2]
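For concreteness, a minimal sketch of how one CM history might be simulated under the specified forms (23) and (24), using the filter 1 parameters from table 1. The monitoring interval is an assumed value (the paper does not state it), and, because the paper takes $\sigma_{ji} = d_j y_i$, the noise scale is approximated here by $d_j \mu_{ji}$ for simplicity; the paper itself simulates readings by inversion on $p(y_i \mid x_i)$ exactly.

```python
import numpy as np

# Illustrative simulation of one CM history under failure mode 1.
A, B, C, d = 5.0, 17.5, 0.025, 0.12       # filter 1 parameters (table 1)
alpha, beta = 0.2, 45.0
dt = 10.0                                 # assumed CM interval (hours)

rng = np.random.default_rng(1)
life = rng.gamma(beta, 1.0 / alpha)       # failure time: x0 ~ Gamma (eq. (23))
times = np.arange(dt, life, dt)           # monitoring points t_i before failure
resid = life - times                      # true residual life x_i
mu = A + B * np.exp(-C * resid)           # expected vibration level (eq. (24))
y = rng.normal(mu, d * mu)                # simulated vibration readings
```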
Using the CM histories simulated according to failure modes 1 and 2, the estimated parameters of the general stochastic filter (filter 3) are given in table 2.

Parameter   General model (estimate)
A           5.482
B           17.702
C           0.02
d           0.195
α           0.00778
β           3.266

Table 2. The estimated parameters of the general stochastic filter (filter 3)

At the $i$th CM point, a closed-form stochastic filtering expression is available for filter $j$ as

$$p_{ji}(x_i \mid \Im_i, M_j) = \frac{(x_i + t_i)^{\beta_j - 1} \exp\{-\alpha_j (x_i + t_i) - \sum_{k=1}^{i} \phi_k(x_i, t_i)\}}{\int_0^{\infty} (u + t_i)^{\beta_j - 1} \exp\{-\alpha_j (u + t_i) - \sum_{k=1}^{i} \phi_k(u, t_i)\}\, du} \qquad (25)$$

for which we define the function

$$\phi_k(u, t_i) = \frac{1}{2\sigma_{jk}^2}\left( y_k - A_j - B_j e^{-C_j (u + t_i - t_k)} \right)^2 \qquad (26)$$

An essential element in both the parameter estimation process and the determination of $p(M_j \mid \Im_i)$ (see equations (6)–(9)) is the distribution $p(y_i \mid \Im_{i-1}, M_j)$. For the functional forms used in this example, we have

$$p(y_i \mid \Im_{i-1}, M_j) = \frac{\int_0^{\infty} \frac{1}{\sigma_{ji}\sqrt{2\pi}} (x_i + t_i)^{\beta_j - 1} \exp\{-\alpha_j (x_i + t_i) - \sum_{k=1}^{i} \phi_k(x_i, t_i)\}\, dx_i}{\int_{t_i - t_{i-1}}^{\infty} (u + t_{i-1})^{\beta_j - 1} \exp\{-\alpha_j (u + t_{i-1}) - \sum_{k=1}^{i-1} \phi_k(u, t_{i-1})\}\, du} \qquad (27)$$

Failure times for the components are simulated using inversion on the initial life distribution $p(x_0)$, and the vibration readings are then generated at each CM point using inversion on the conditional density $p(y_i \mid x_i)$. We now simulate a case corresponding to each of the two failure modes and demonstrate the ability of the proposed methodology to track the appropriate mode and the underlying residual life. We compare the estimates of residual life, and the prediction errors, obtained using the combined weighted modelling approach (filters 1 and 2) with those obtained using the general stochastic filter 3 at each simulated CM point. The prediction error at the $i$th CM point is

$$e_i = \left( (x_i - E[x_i \mid \Im_i])^2 \right)^{1/2} \qquad (28)$$

The mean-square error (MSE) about the simulated failure time is used as a criterion for comparing the weighted and general filters. Considering the weighted approach, the MSE attributable to each of the contributing filters is weighted according to the probability that each model provides an appropriate representation of the underlying dynamics for the particular component.
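The comparison criteria can be sketched as follows: the weighted density of equation (11), the mean estimate of equation (10), the error of equation (28) and the resulting MSE. The function names and grid representation are illustrative assumptions.

```python
import numpy as np

# Sketch of the comparison criteria used in the example.

def weighted_estimate(x_grid, filter_densities, mode_probs):
    """Weighted residual-life estimate (equations (10) and (11))."""
    mix = sum(p * f for p, f in zip(mode_probs, filter_densities))   # eq. (11)
    return np.trapz(x_grid * mix, x_grid)                            # eq. (10)

def mse(true_resid, estimates):
    """Mean-square prediction error built from equation (28)."""
    e = np.asarray(true_resid) - np.asarray(estimates)               # eq. (28)
    return np.mean(e ** 2)
```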
5.1 Case 1

For this first case, a cycle of CM data is simulated with the underlying dynamics corresponding to failure mode 1. The failure time for the cycle is 193 hours, and figure 3 demonstrates the ability of the recursive process to track the appropriate mode according to equations (18)–(21), using equation (27) developed for this specific case.

[Figure 3. Illustrating the tracking of failure mode 1 for case 1]

[Figure 4. Comparing the residual life predictions (actual, weighted and general) obtained using the weighted approach (filters 1 and 2) and filter 3 for case 1]

Figure 4 illustrates the tracking of the residual life at CM points throughout the life of the component. We compare the estimates of residual life given by the combined weighted modelling approach proposed in section 3 and the general filter. Figure 3 clearly illustrates that the methodology tracks the appropriate failure mode for this particular case, and figure 4 demonstrates a clear improvement in the residual life prediction capability of the combined modelling approach (filters 1 and 2) when compared with the general filter 3. In addition, the sum of squared errors for the weighted approach is 808.19, compared with 1776.6 for filter 3. The superiority of the combined approach is confirmed further by the MSE statistic of 345115, compared with 732541 for the general filter.

5.2 Case 2

For this second case, the CM process is simulated according to failure mode 2, with a failure time for the component of 651 hours. Figures 5 and 6 illustrate the tracking of the appropriate mode and the residual life respectively.

[Figure 5. Illustrating the tracking of failure mode 2 for case 2]

[Figure 6. Comparing the residual life predictions (actual, weighted and general) obtained using the weighted approach (filters 1 and 2) and filter 3 for case 2]

As with the first case, it is clear from figures 5 and 6 that the weighted approach (filters 1 and 2) quickly tracks the appropriate failure mode for this second case and that the estimates of the residual life are more accurate than those obtained using the general filter 3. This conclusion is again confirmed by the fit statistics: the sum of squared errors is 1422.9 for the weighted approach and 2234.3 for the general filter, and the MSE is 585240 for the weighted approach and 1050250 for the general filter.

6. Discussion

Cases 1 and 2 in the example have demonstrated that, in some situations, it may be advantageous to group the available CM histories and construct a number of probabilistic stochastic filters to represent the specified contingencies (failure modes/types). The filters are then applied in parallel to new component CM information and the output from each filter is weighted according to the recursively derived conditional probability that the filter is the appropriate representation of the current component's underlying dynamics. The model introduced in section 4, incorporating transitions between failure modes, will be explored in future research, and a study is currently being conducted to test the application of the fixed-mode methodology in an actual monitoring scenario.

Acknowledgement

The research documented in this paper has been supported by the Engineering and Physical Sciences Research Council (EPSRC, UK) under grant number EP/C54658X/1.

References

Cox, D. R. and Oakes, D. (1984) Analysis of Survival Data, Chapman and Hall
Crowder, M. (2001) Classical Competing Risks, Chapman & Hall/CRC
Makis, V. and Jardine, A. K. S. (1991) Optimal replacement policies in the proportional hazards model, INFOR, 30, 172-183
Vlok, P. J., Coetzee, J. L., Banjevic, D., Jardine, A. K. S. and Makis, V. (2002) Optimal component replacement decisions using vibration monitoring and the proportional hazards model, J. of the Operational Research Society, 53, 193-202
Wang, W. (2002) A model to predict the residual life of rolling element bearings given monitored condition information to date, IMA J. of Management Mathematics, 13, 3-16
Wang, W. and Christer, A. H. (2000) Towards a general condition based maintenance model for a stochastic dynamic system, J. of the Operational Research Society, 51, 145-155

Remaining useful life in condition based maintenance: Is it useful?

D. Banjevic, A. K. S. Jardine
CBM Laboratory, Department of Mechanical and Industrial Engineering, University of Toronto, 5 King's College Road, Toronto, Ontario, M5S 3G8, Canada
banjev@mie.utoronto.ca, jardine@mie.utoronto.ca

Abstract: Remaining useful life (RUL) is nowadays in fashion, both in theory and applications.
Engineers use it mostly when they have to decide whether to do maintenance or to delay it due to production requirements. Most often, it is assumed that in the later life of equipment (the wear-out period) the hazard function is increasing, and then the expected RUL, $\mu(t)$, is decreasing. It has been noticed that the standard deviation of RUL, $\sigma(t)$, is also decreasing, which was expected, but that the ratio $\sigma(t)/\mu(t)$ is increasing, which was a surprise. Prompted by this observation, we have proved that under some general conditions, which include the Weibull distribution with shape parameter > 1, this is indeed the case. Moreover, we have proved that the limiting distribution of the standardized RUL is exponential, so that the variability of RUL is relatively large. We may conclude from this that in later life the point prediction of RUL is relatively inaccurate and may not be very useful.

1. Introduction

In the modern industrial environment it is common to monitor periodically, or even continuously, the life and state of operating equipment, particularly if its lifetime (time to failure) is a random variable and cannot be predicted with certainty. Condition monitoring can provide information on the current working age and state of the system, measured by some diagnostic variables, and also on environmental conditions that may affect its future life. This information can then be used for prediction of the remaining useful life of the system and planning of maintenance activities. Let $T$ be the time to failure of the system, and suppose the system has survived until time $t$. Then the "conditional" random variable $X_t = T - t$ (defined when $T > t$), i.e. the remaining time to failure, is called the "remaining useful life" (RUL) of the system. The conditional reliability function $R_t(x) = P_t(X_t > x) = P(T - t > x \mid T > t)$ contains all the information required for prediction and planning of future activities that depend on RUL. For example, a risk-based maintenance decision policy may be to stop operation and do preventive maintenance at the first moment $t$ when, for fixed $x$ (e.g. the length of a regular inspection interval), the probability of failure before $x$, $F_t(x) = 1 - R_t(x)$, exceeds a certain predetermined level. Another method may be to calculate (estimate) the expected residual life, often called the mean residual life (MRL), and use it either as a point estimate of RUL or to create a prediction interval for RUL. Obviously, the MRL value itself, even if correct, may not be very useful, due to the variation of RUL. On the other hand, if a failure should be prevented, then the system should be stopped safely before the MRL. Regardless of how the MRL is used, as a function of $t$ it is an important practical and theoretical quantity that describes the aging of the equipment. The MRL function is closely related to the widely used hazard rate function $h(t)$. This relationship will be considered in detail later. Muth (1977) suggests that the MRL is more informative and useful than the hazard function. That may be the case, but $h(t)$ is so rooted in engineers' psyche that it will be difficult to replace it with something else, even if that something is more useful. A convenient relationship between $\mu(t)$ and $h(t)$ is therefore of interest.
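The risk-based stopping rule described above is easy to sketch numerically for a Weibull life; the parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch: stop at the first t where F_t(x) = 1 - R_t(x) exceeds a preset level,
# for a Weibull life with R(t) = exp(-(t/theta)**beta). All values illustrative.
theta, beta = 100.0, 3.0
x = 10.0          # length of a regular inspection interval
level = 0.10      # predetermined risk level

t = np.arange(0.0, 3.0 * theta, 0.1)
R = lambda u: np.exp(-(u / theta) ** beta)
F_t = 1.0 - R(t + x) / R(t)              # conditional failure probability F_t(x)
t_stop = t[np.argmax(F_t > level)]       # first age at which the risk is exceeded
print(t_stop)                            # about 54 time units for these values
```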
2. Definitions and basic properties

Consider a nonnegative random variable $T$ which represents the random time to failure of an item. Let $R(t) = P(T > t)$, $t \ge 0$, be its reliability function. For simplicity, let $R(t) > 0$ for all $t$, and let $T$ be absolutely continuous, so that its density function $f(t)$ and its hazard function $h(t)$ exist. Let also $H(t) = \int_0^t h(x)\,dx$ be the cumulative hazard function. Let $X_t = T - t$ be the remaining useful life of $T$ at $t$, which is defined if $T > t$, and let $R_t(x) = P_t(X_t > x) = P(T - t > x \mid T > t)$, $x \ge 0$, be its reliability function, $h_t(x)$ its hazard function and $f_t(x)$ its density function. Then

$$R_t(x) = \frac{R(t+x)}{R(t)}, \qquad f_t(x) = -\frac{\partial}{\partial x} R_t(x) = \frac{f(t+x)}{R(t)} = h(t+x)\, R_t(x),$$

$$h_t(x) = \frac{f_t(x)}{R_t(x)} = h(t+x), \qquad H_t(x) = \int_0^x h_t(s)\,ds = \int_t^{t+x} h(s)\,ds = H(t+x) - H(t).$$

The mean residual life function (MRLF) is defined as

$$\mu(t) = E_t X_t = E(T - t \mid T > t) = \int_0^{\infty} R_t(x)\,dx.$$

Then $\mu(0) = \mu = ET$, and it is easy to see that the MRLF is defined if and only if $\mu < \infty$, which we assume in the following. Let us also note that $\mu_t(x) = \mu(t + x)$. One can easily see from $\mu(t) = \int_t^{\infty} R(x)\,dx / R(t)$ that

$$h(t) = -\frac{R'(t)}{R(t)} = \frac{\mu'(t) + 1}{\mu(t)}.$$

This means that $\mu(t)$ cannot be an arbitrary positive function, but is subject to the slight restrictions that $\mu'(t) \ge -1$ (from $h(t) \ge 0$) and $\int_0^{\infty} 1/\mu(t)\,dt = \infty$. On the other side, the restriction on $h(t)$ is that $\int_0^{\infty} h(x)\,dx = \infty$. It is interesting to note that $\mu = \mu(0) = E(1/h(T))$; so, the mean time to failure is the average of the reciprocal hazard. A more detailed relationship will be considered in the following. For other details on RUL see Guess & Proschan (1988) and Reinertsen (1996).

Our interest is in deteriorating equipment, and from now on we will assume that the hazard function $h(t)$ is increasing (nondecreasing) for $t \ge t_0$, for some $t_0 \ge 0$. Note that all results in the following that involve $t$ are valid for $t \ge t_0$; for simplicity we will assume that $t_0 = 0$. In practice, the case of deterioration ("wear-out") is considered for more expensive and complicated equipment which is maintained regularly and fixed when it fails (most often a "minimal repair", which does not change the overall deteriorating trend of the unit). The goal is then to utilize the equipment as much as possible, until the end of its designed life, while still trying to avoid a catastrophic, nonrepairable failure. This situation may explain the interest in the "remaining useful life", its distribution and expectation. In general, an IFR (increasing failure rate) distribution can be defined by the property that $R_t(x)$ decreases in $t$ for each fixed $x$; that is, the reliability over any fixed interval decreases with age. It is easy to see that

$$\frac{\partial}{\partial t} R_t(x) = -R_t(x)\big(h(t+x) - h(t)\big),$$

which means that a distribution is IFR if and only if $h(t)$ is increasing. From the definition $\mu(t) = \int_0^{\infty} R_t(x)\,dx$ and the property that $R_t(x)$ decreases in $t$, it follows that $\mu(t)$ also decreases in $t$ when the distribution is IFR. The opposite is not true, as Muth (1977) shows by a counterexample.

3. Properties of the variance of the remaining life

Our main interest here is to investigate the behavior of the variance of RUL, that is,

$$\sigma^2(t) = \mathrm{Var}(X_t) = E[(T - t - \mu(t))^2 \mid T > t] = E[(T - t)^2 \mid T > t] - \mu^2(t).$$

Let the function $g(x)$ be such that $Eg(T)$ exists. Then $Eg(T) = \int_0^{\infty} g(x)\,dF(x) = g(0) + \int_0^{\infty} g'(x) R(x)\,dx$, and then

$$\sigma^2(t) = 2\int_0^{\infty} x R_t(x)\,dx - \mu^2(t).$$

Lemma 1. Let $h(t)$ be increasing. Then (a) $\sigma^2(t)$ is decreasing, (b) $\sigma^2(t) \le \mu^2(t)$.
Proof: We will first prove that

$$\int_0^{\infty} x R_t(x)\,dx \le \mu^2(t), \qquad (*)$$

from which both (a) and (b) will follow. For $s \ge t$, $\mu(t) \ge \mu(s) = \int_s^{\infty} R(x)\,dx / R(s)$, or $\mu(t) R(s) \ge \int_s^{\infty} R(x)\,dx$. Integrating over $s$ from $t$ to $\infty$,

$$\mu(t) \int_t^{\infty} R(s)\,ds \ge \int_t^{\infty} ds \int_s^{\infty} R(x)\,dx = \int_t^{\infty} (x - t) R(x)\,dx = \int_0^{\infty} x R(t + x)\,dx.$$

Noting that $\int_t^{\infty} R(s)\,ds = \mu(t)R(t)$ and dividing both sides of the previous inequality by $R(t)$, we have (*). Then, for (b), $\sigma^2(t) \le \mu^2(t)$ is obviously equivalent to (*). For (a), consider

$$(\sigma^2(t))' = 2\int_0^{\infty} x\, \tfrac{\partial}{\partial t} R_t(x)\,dx - 2\mu(t)\mu'(t) \le 0,$$

or $-\int_0^{\infty} x[h(t+x) - h(t)] R_t(x)\,dx \le \mu(t)\mu'(t) = \mu(t)(\mu(t)h(t) - 1)$. Then $-\int_0^{\infty} x h_t(x) R_t(x)\,dx + h(t)\int_0^{\infty} x R_t(x)\,dx \le \mu^2(t)h(t) - \mu(t)$, or $-\mu(t) + h(t)\int_0^{\infty} x R_t(x)\,dx \le \mu^2(t)h(t) - \mu(t)$, which is (*).

The inequality (*) can be generalized to a very useful general inequality.

Lemma 2. Let $u(x) \ge 0$, $g(x) \ge 0$, and $(u * g)(x) = \int_0^x u(s)\, g(x - s)\,ds$. Then

$$\int_0^{\infty} u(x) R_t(x)\,dx \int_0^{\infty} g(x) R_t(x)\,dx \ge \int_0^{\infty} (u * g)(x) R_t(x)\,dx.$$

Proof: Similar to Lemma 1. E.g., if we put $u \equiv g \equiv 1$, then $(u * g)(x) = x$, and Lemma 1 follows.

The following identities are useful in deriving some results (see Bradley & Gupta (2003)). Let the function $g(x)$ be such that $g(x)\mu_t(x)R_t(x) \to 0$ as $x \to \infty$ (and similarly $g(x)R_t(x)/h_t(x) \to 0$ for (ii)). Then

(i) $\displaystyle\int_0^{\infty} g(x) R_t(x)\,dx = g(0)\mu(t) + \int_0^{\infty} g'(x)\mu_t(x) R_t(x)\,dx$,

(ii) $\displaystyle\int_0^{\infty} g(x) R_t(x)\,dx = \frac{g(0)}{h(t)} + \int_0^{\infty} \left[ \frac{g(x)}{h_t(x)} \right]' R_t(x)\,dx$.

Identities (i) and (ii) are easily obtained by partial integration. For $g(x) = x$ in (i) we get

$$\int_0^{\infty} x R_t(x)\,dx = \int_0^{\infty} \mu_t(x) R_t(x)\,dx.$$

From $\mu_t(x) = \mu(t+x) \le \mu(t)$, we have $\int_0^{\infty} \mu_t(x) R_t(x)\,dx \le \mu(t)\int_0^{\infty} R_t(x)\,dx = \mu^2(t)$, i.e. (*) in Lemma 1. If we put $g(x) = \mu_t(x)$ in (i), we get

$$\int_0^{\infty} \mu_t(x) R_t(x)\,dx = \mu_t(0)\mu(t) + \int_0^{\infty} \mu_t'(x)\mu_t(x) R_t(x)\,dx = \mu^2(t) + \int_0^{\infty} (h_t(x)\mu_t(x) - 1)\mu_t(x) R_t(x)\,dx$$
$$= \mu^2(t) + \int_0^{\infty} h_t(x)\mu_t^2(x) R_t(x)\,dx - \int_0^{\infty} \mu_t(x) R_t(x)\,dx,$$

or

$$2\int_0^{\infty} \mu_t(x) R_t(x)\,dx = \mu^2(t) + \int_0^{\infty} h_t(x)\mu_t^2(x) R_t(x)\,dx. \qquad (**)$$

In Lemma 1 we proved that $\sigma^2(t) \le \mu^2(t)$, or $\sigma^2(t)/\mu^2(t) \le 1$. We also proved that both $\sigma^2(t)$ and $\mu^2(t)$ are decreasing. We are now interested in the behavior of the ratio $\sigma^2(t)/\mu^2(t)$. We will show that under broad conditions, which include the Weibull distribution with shape parameter greater than 1, $\sigma^2(t)/\mu^2(t) \uparrow 1$.
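A numerical check of Lemma 1 is straightforward; the following sketch evaluates $\mu(t)$ and $\sigma^2(t)$ by quadrature for a Weibull life with $\theta = 1$ and $\beta = 3$ (an illustrative choice), confirming that both decrease and that $\sigma^2(t) \le \mu^2(t)$.

```python
import numpy as np

# Quadrature check of Lemma 1 for a Weibull life, theta = 1, beta = 3.
# R_t(x) = exp{-((t + x)^beta - t^beta)} on a truncated grid.
beta = 3.0
x = np.linspace(0.0, 10.0, 100001)

def mu_var(t):
    Rt = np.exp(-((t + x) ** beta - t ** beta))
    mu = np.trapz(Rt, x)                    # mu(t) = int_0^inf R_t(x) dx
    m2 = 2.0 * np.trapz(x * Rt, x)          # E[(T - t)^2 | T > t]
    return mu, m2 - mu ** 2

for t in (0.0, 0.5, 1.0, 2.0):
    m, v = mu_var(t)
    print(t, m, v, v <= m * m)              # m and v decrease; the test is True
```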
We need a technical result.

Lemma 3. (a) If $h(t)\mu(t)$ is increasing, then $h(t)\mu(t) \to 1$, and $\int_0^{\infty} x R_t(x)\,dx \times (2 - h(t)\mu(t)) \ge \mu^2(t)$. (b) If $1/h(t)$ is a convex function (i.e. $(1/h(t))' \uparrow$), then $h(t)\mu(t)$ is increasing.

Proof: If $h(t)\mu(t)$ is increasing, then $\mu'(t) = h(t)\mu(t) - 1$ is also increasing, i.e. $\mu(t)$ is convex. We will show that $\mu'(t) \uparrow 0$, so that $h(t)\mu(t) \uparrow 1$. From $\mu(t) \downarrow$, $\mu'(t) \le 0$, that is, $h(t)\mu(t) \le 1$. From the convexity of $\mu(t)$ it follows that for $y > t$,

$$\mu(t) \ge \mu(y) + \mu'(y)(t - y) = \mu(y) + |\mu'(y)|(y - t) \ge |\mu'(y)|(y - t).$$

Then $\mu(t) \ge \limsup_{y\to\infty} |\mu'(y)|(y - t)$, so it must be that $\limsup_{y\to\infty} |\mu'(y)| = \lim_{y\to\infty} |\mu'(y)| = 0$. From the proof it also follows that $\mu'(y) = o(1/y)$, $y \to \infty$. From $h_t(x)\mu_t(x) = h(t+x)\mu(t+x) \ge h(t)\mu(t)$ and from (**),

$$2\int_0^{\infty} \mu_t(x) R_t(x)\,dx \ge \mu^2(t) + h(t)\mu(t)\int_0^{\infty} \mu_t(x) R_t(x)\,dx,$$

and (a) follows. For (b), we need $(h(t)\mu(t))' \ge 0$, or $h'\mu + h\mu' = h'\mu + h(h\mu - 1) = (h' + h^2)\mu - h \ge 0$; using $h' \ge 0$, this requires

$$\mu \ge \frac{h}{h^2 + h'} = \frac{1}{h}\,\frac{1}{1 + h'/h^2} = \frac{1}{h}\,\frac{1}{1 - (1/h)'}.$$

If we use $g(x) \equiv 1$ in identity (ii), we get

$$\mu(t) = \int_0^{\infty} R_t(x)\,dx = \frac{1}{h(t)} + \int_0^{\infty} \left( \frac{1}{h_t(x)} \right)' R_t(x)\,dx.$$

If $1/h \downarrow$, then $(1/h)' < 0$. If $(1/h)' \uparrow$, then $0 > (1/h_t(x))' > (1/h(t))'$, and

$$\mu(t) \ge \frac{1}{h(t)} + \left( \frac{1}{h(t)} \right)' \int_0^{\infty} R_t(x)\,dx = \frac{1}{h(t)} + \left( \frac{1}{h(t)} \right)' \mu(t), \quad \text{or} \quad \mu \ge \frac{1}{h}\,\frac{1}{1 - (1/h)'},$$

as required. So, if $1/h(t)$ is convex, then

$$\frac{1}{h(t)}\,\frac{1}{1 - (1/h(t))'} \le \mu(t) \le \frac{1}{h(t)}.$$

Note that also $(1/h)' \uparrow 0$. Let us denote $s(t) = 1/h(t)$. Then $s(t)$ is decreasing and $0 \ge s' \uparrow$. From the convexity of $s(t)$ it follows that $s(t) \ge s(y) + s'(y)(t - y) = s(y) + |s'(y)|(y - t) \ge |s'(y)|(y - t)$. Then, as above for $\mu'(t)$, it follows that $s'(t) \to 0$, and also that $s'(y) = o(1/y)$, $y \to \infty$. For a more general result, see Bradley & Gupta (2003, Theorem 4).

Theorem 1. If $1/h(t)$ is a convex function, that is $(1/h)' \uparrow$, then $\sigma^2(t)/\mu^2(t) \uparrow 1$.

Proof: From Lemma 3, $h(t)\mu(t) \uparrow 1$. Let $G_1(t) = \int_0^{\infty} x R_t(x)\,dx$. Then $\sigma^2(t)/\mu^2(t) = (2G_1(t) - \mu^2(t))/\mu^2(t) = 2G_1(t)/\mu^2(t) - 1$. We therefore have to prove that $v(t) = G_1(t)/\mu^2(t)$ is increasing, i.e. $v'(t) \ge 0$, or $G_1'\mu - 2G_1\mu' \ge 0$, or

$$\mu(t)\int_0^{\infty} x\, \tfrac{\partial}{\partial t} R_t(x)\,dx - 2G_1(t)(h(t)\mu(t) - 1) \ge 0,$$

or $-\mu(t)\int_0^{\infty} x[h_t(x) - h(t)]R_t(x)\,dx - 2G_1(t)(h(t)\mu(t) - 1) \ge 0$, or $-\mu(t)(\mu(t) - h(t)G_1(t)) - 2G_1(t)(h(t)\mu(t) - 1) \ge 0$, or $G_1(t)(2 - h(t)\mu(t)) \ge \mu^2(t)$. The monotonicity then follows from Lemma 3. Also, $1 \ge v(t) = G_1(t)/\mu^2(t) \ge 1/(2 - h(t)\mu(t)) \to 1$ as $t \to \infty$, and then also $\sigma^2(t)/\mu^2(t) = 2v(t) - 1 \uparrow 1$.

Example 1. For the Weibull distribution, $h(t) = \beta(t/\theta)^{\beta-1}$,

$$s(t) = \frac{1}{\beta}\left( \frac{t}{\theta} \right)^{1-\beta}, \qquad s'(t) = \frac{1-\beta}{\beta\theta}\left( \frac{t}{\theta} \right)^{-\beta}, \qquad s''(t) = \frac{\beta-1}{\theta^2}\left( \frac{t}{\theta} \right)^{-\beta-1}.$$

Assume that $\beta > 1$, so that $s(t)$ is convex and $0 > s'(t) \uparrow 0$, and then $\sigma^2(t)/\mu^2(t) \uparrow 1$. This result would be quite difficult to prove directly, using special properties of the Gamma function.

4. Limiting distribution of remaining useful life

It is of interest to investigate the limiting distribution of $X_t$, properly normalized. From the properties $h(t)\mu(t) \uparrow 1$ and $\sigma^2(t)/\mu^2(t) \uparrow 1$, it should be that

$$P\!\left( \frac{T-t}{\mu(t)} > x \,\Big|\, T > t \right), \quad P\!\left( \frac{T-t}{\sigma(t)} > x \,\Big|\, T > t \right) \quad \text{and} \quad P(h(t)(T-t) > x \mid T > t) = P\!\left( \frac{T-t}{s(t)} > x \,\Big|\, T > t \right)$$

have the same limit when $t \to \infty$, if the limit exists. In applications it would be simplest to use the last form.

Theorem 2. Under the assumptions for which $\sigma^2(t)/\mu^2(t) \uparrow 1$, that is $(1/h)' \uparrow 0$,

$$P\!\left( \frac{T-t}{s(t)} > x \,\Big|\, T > t \right) \to e^{-x}, \quad t \to \infty.$$

Proof: $P\!\left( \frac{T-t}{s(t)} > x \mid T > t \right) = P(T - t > x s(t) \mid T > t) = R_t(x s(t)) = \exp\{-[H(t + x s(t)) - H(t)]\} = \exp\{-\int_t^{t+x s(t)} h(u)\,du\}$. We have to prove that $\int_t^{t+xs(t)} h(u)\,du \to x$ when $t \to \infty$. From $h(t) \uparrow$,

$$\int_t^{t+xs(t)} h(u)\,du \ge h(t)\int_t^{t+xs(t)} du = h(t)s(t)x = x.$$

Due to the convexity of $s(t)$, $s(t + xs(t)) \ge s(t) + s'(t)xs(t) = s(t)(1 + xs'(t))$. Then

$$\int_t^{t+s(t)x} h(u)\,du \le h(t + s(t)x)\, s(t)x = \frac{s(t)x}{s(t + s(t)x)} \le \frac{x}{1 + xs'(t)} \to x,$$

because $s'(t) \to 0$ when $t \to \infty$, which follows from the assumption that $(1/h)' \uparrow$, as shown in Lemma 3. From the proof, bounds for the conditional distribution follow:

$$e^{-\frac{x}{1 - x|s'(t)|}} \le P\!\left( \frac{T-t}{s(t)} > x \,\Big|\, T > t \right) \le e^{-x}. \qquad (***)$$
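A numerical illustration of Theorems 1 and 2 for the Weibull case of Example 1 ($\theta = 1$, $\beta = 3$, illustrative values): $h(t)\mu(t)$ and $\sigma(t)/\mu(t)$ approach 1, and $R_t(xs(t))$ approaches $e^{-x}$. The truncated quadrature grid is an assumption of the sketch.

```python
import numpy as np

# Illustration of Theorems 1 and 2 for Weibull, theta = 1, beta = 3:
# h(t)mu(t) -> 1, sigma(t)/mu(t) -> 1 and R_t(x s(t)) -> exp(-x).
beta, x = 3.0, 1.0
u = np.linspace(0.0, 10.0, 100001)

for t in (1.0, 5.0, 10.0):
    Rt = np.exp(-((t + u) ** beta - t ** beta))
    mu = np.trapz(Rt, u)
    sd = np.sqrt(2.0 * np.trapz(u * Rt, u) - mu ** 2)
    h = beta * t ** (beta - 1)                    # hazard; s(t) = 1/h(t)
    Rxs = np.exp(-((t + x / h) ** beta - t ** beta))
    print(t, h * mu, sd / mu, Rxs, np.exp(-x))    # last two columns converge
```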
(***) s(t ) Example 2: For the case of Weibull distribution, for h(t ) = β t β −1 (with θ = 1 ), inequality (***) can be slightly improved. Let, for simplicity, β ≥ 2 , so that h(t ) is convex (similar derivation can be obtained when 1 < β < 2 , that is, when h(t ) is concave). Consider also directly X t = T − t . Then ∫ t+x t h(u )du ≥ ∫ t+x t x (h(t ) + h '(t )(u − t ))du = ∫ (h(t ) + h '(t )u )du = h(t ) x + h '(t ) 0 x2 h '(t ) ). = xh(t )(1 + 12 x 2 h(t ) In a similar way, h(t + x) − h(t ) h '(t + x) ) ≤ xh(t )(1 + 12 x ) , i.e. h(t ) x h(t ) exp{− x β t β −1 (1 + 12 x β t−1 (1 + xt ) β − 2 } ≤ P(T − t > x | T > t ) ≤ exp{− x β t β −1 (1 + 12 x β t−1 )} . Let β = 3 and t = 1 (i.e., somewhere around the average life), then exp{−3x(1 + x(1 + x)} ≤ P(T − t > x | T > t ) ≤ exp{−3x(1 + x)} . ∫ t+x t h(u )du ≤ 12 x(h(t ) + h(t + x)) = xh(t )(1 + 12 x For example, the probability that the unit will survive one more average life (x=1) is not greater than exp(−3 × 2) = 0.0025 = 0.25% , and that will survive at least a half of the average life (x=0.5) is not greater than exp(−3 × 0.5 × 1.5) = 0.105 = 10.5% , but is at least 7.2% (from the lower bound). For larger t the bounds are more accurate. From µ (t ) ≤ 1/ h(t ) = 1/ β t β −1 , and t = 1 , the MRL is not greater than 1/3. 5. Condition monitoring and remaining useful life Incorporation of conditional information into “calculation” of RUL is of great importance in current industrial practice with lot of condition monitoring and periodical inspections, but is a much more complicated theoretical and practical problem than with age information only. In practical applications, some longer term predictions are based on deterministic models, mainly empirical, depending on measures of deterioration. Results of regular condition monitoring, such as from oil or vibration analysis, are used for short term predictions, mostly of risk of failure. For an example of practical MRL estimation see Elsayed (2003). Some models such as the proportional hazards, or accelerated life are often used in reliability/statistical approach. Most of the models are based on experience and simple trending. Strictly speaking, these models are more empirical than based on a sound theory, due to technical difficulties requiring a probabilistic model for behavior of covariates. If Z (t ) is a covariate information (measurement) available at time t (which may also include all past information), the conditional reliability function is Rt ( x | Z (t )) = P(T − t > x | T > t , Z (t )) . It requires a description of the joint distribution of T and Z (t ) , which is a much more difficult problem than just a model for T . Usually, there is only an indirect relationship between Z (t ) and T , often with lot of noise and irrelevant information in Z (t ) , which makes the problem even more difficult. The mean residual life function can be defined as µ (t , Z (t )) = E (T − t | T > t , Z (t )) . Very little is devoted to this function in literature (see, e.g. Maguluri & Zhang (1994), Kemp (2000), and Yuen et al. (2003), where the covariate vector is time independent). A discrete Markov process model for Z (t ) is considered in Banjevic & Jardine (2006), with application to transmission oil analysis data. As it is for the function µ (t ) , which cannot be selected arbitrarily, but is subject to certain restrictions, the function µ (t , Z (t )) is subject to even more restrictions, depending on stochastic behavior of Z (t ) . 
5. Condition monitoring and remaining useful life

The incorporation of condition information into the "calculation" of RUL is of great importance in current industrial practice, with its extensive condition monitoring and periodical inspections, but it is a much more complicated theoretical and practical problem than with age information only. In practical applications, some longer-term predictions are based on deterministic models, mainly empirical, depending on measures of deterioration. Results of regular condition monitoring, such as from oil or vibration analysis, are used for short-term predictions, mostly of the risk of failure. For an example of practical MRL estimation see Elsayed (2003). Models such as proportional hazards or accelerated life are often used in the reliability/statistical approach. Most of the models are based on experience and simple trending. Strictly speaking, these models are more empirical than based on a sound theory, owing to the technical difficulty of requiring a probabilistic model for the behavior of covariates. If $Z(t)$ is covariate information (a measurement) available at time $t$ (which may also include all past information), the conditional reliability function is $R_t(x \mid Z(t)) = P(T - t > x \mid T > t, Z(t))$. It requires a description of the joint distribution of $T$ and $Z(t)$, which is a much more difficult problem than a model for $T$ alone. Usually, there is only an indirect relationship between $Z(t)$ and $T$, often with a lot of noise and irrelevant information in $Z(t)$, which makes the problem even more difficult. The mean residual life function can then be defined as $\mu(t, Z(t)) = E(T - t \mid T > t, Z(t))$. Very little is devoted to this function in the literature (see, e.g., Maguluri & Zhang (1994), Kemp (2000), and Yuen et al. (2003), where the covariate vector is time independent). A discrete Markov process model for $Z(t)$ is considered in Banjevic & Jardine (2006), with application to transmission oil analysis data. As with the function $\mu(t)$, which cannot be selected arbitrarily but is subject to certain restrictions, the function $\mu(t, Z(t))$ is subject to even more restrictions, depending on the stochastic behavior of $Z(t)$. As is also pointed out by Jewell & Nielsen (1993), an ad hoc model for $\mu(t, Z(t))$ (such as one of regression type, $\exp\{\gamma' Z(t)\}$) cannot be formally used, unless intended for a fixed $t$, because it may violate the "consistency condition" for $R_t(x \mid Z(t))$. For some MRL models see also Sen (2004), Muller & Zhang (2005), and Chen et al. (2005).

In a dynamic industrial environment, the models with covariates would be preferable, but they are less used, due to their complexity and the requirements for regular storage and retrieval of the condition information. Decisions are often made ad hoc, using experience, for short-term "emergency alarm" decisions to stop the operation. With degradation variables that show slow development, the prediction of RUL is easier, and it is used for the planning of maintenance. Let us finish with the words of an engineer whom we asked about RUL: "RUL – it is the operating hours left on equipment before it has to be down for major repair. Some RUL is based on Vendor recommendations, some are based on experience (e.g. wear on pumps) and some are based on deterministic analyses (e.g. crack growth). However, the majority is experience-based. Due to changing demands of operation, it is useful to have both predictions (remaining life and risk of failure). At our company, RUL is based on age and "normal" operating conditions. PM's are scheduled based on typical RUL's but may be moved to accommodate either unexpected deteriorating equipment conditions or unplanned production demands. [There are] good examples of maintenance items that [are] based on RUL, [and others that] are good examples of maintenance items that [are] not based on RUL but based on current condition. I guess a good summary of my comments is that when failure modes are known or predictable, RUL becomes critical in scheduling maintenance. When failure is unpredictable due to randomly changing conditions, then RUL becomes meaningless and maintenance decisions are based on current condition." This understanding of RUL may not be strictly as in the theory, but the comments show perfectly the standing of RUL in real-life situations.

Acknowledgment

This research was supported by Materials and Manufacturing Ontario and the Natural Sciences and Engineering Research Council of Canada.

References

Banjevic, D. and Jardine, A. K. S. (2006) Calculation of reliability function and remaining useful life for a Markov failure time process, IMA J. of Management Mathematics, 17, 115-130
Bradley, D. and Gupta, R. (2003) Limiting behaviour of the mean residual life, Ann. Inst. Statist. Math., 55, 217-226
Chen, Y. Q., Jewell, N. P., Lei, X. and Cheng, S. C. (2005) Semiparametric estimation of proportional mean residual life model in presence of censoring, Biometrics, 61, 170-178
Elsayed, E. (2003) Mean residual life and optimal operating conditions for industrial furnace tubes, in Case Studies in Reliability and Maintenance, eds. Blischke, W. R. and Murthy, D. N. P., Wiley, 497-515
Guess, F. and Proschan, F. (1988) Mean residual life: theory and applications, in Handbook of Statistics, 7, eds. Krishnaiah, P. R. and Rao, C. R., Elsevier Science Publishers B.V., 215-224
Jewell, N. P. and Nielsen, J. P. (1993) A framework for consistent prediction rules based on markers, Biometrika, 80, 153-164
Kemp, G. C. R. (2000) When is a proportional hazards model valid for both stock and flow sampled duration data? Economics Letters, 69, 33-37
Maguluri, M. and Zhang, C. (1994) Estimation in the mean residual life regression model, J. R. Statist. Soc.
B, 56, 477-489
Muller, H. and Zhang, Y. (2005) Time-varying functional regression for predicting remaining lifetime distributions from longitudinal trajectories, Biometrics, 61, 1064-1075
Muth, E. (1977) Reliability models with positive memory derived from the mean residual life function, in The Theory and Applications of Reliability, 2, eds. Tsokos, C. P. and Shimi, I. N., Academic Press, New York, 401-435
Reinertsen, R. (1996) Residual life of technical systems; diagnosis, prediction and life extension, Reliability Engineering and System Safety, 54, 23-34
Sen, P. K. (2004) HRQoL and concomitant adjusted mean residual life analysis, in Parametric and Semiparametric Models with Applications to Reliability, Survival Analysis and Quality of Life, eds. Nikulin, M. S., Balakrishnan, N., Mesbah, M. and Limnios, N., Birkhauser, Boston, 349-362
Yuen, K. C., Zhu, L. X. and Tang, N. Y. (2003) On the mean residual life regression model, J. of Statistical Planning and Inference, 113, 685-698

Demand categorisation in a European spare parts logistics network

Aris A. Syntetos 1, M. Zied Babai 2, Mark Keyes 3
1, 2. Centre for Operational Research and Applied Statistics, Salford Business School, University of Salford, Maxwell Building, The Crescent, Manchester M5 4WT, UK
3. Logistics Information Centre, Brother International UK, Brother House, 1 Tame Street, Audenshaw, Manchester M34 5JE, UK
a.syntetos@salford.ac.uk, m.z.babai@salford.ac.uk, Mark.Keyes@brother.co.uk

Abstract: Stock keeping units (SKUs) exhibit different demand patterns, requiring different methods for forecasting and stock control. Software such as SAP requires users to categorise demand patterns before selecting the technique that optimises the forecast; this planning functionality within SAP is typical of the industry-standard software packages available. This paper addresses issues related to demand categorisation and its impact on the inventory control and service level performance of a large business machine manufacturer. In particular, we focus on the management of spare parts; the demand patterns exhibited by such SKUs may range from relatively smooth and constant (normal) to intermittent and sporadic (non-normal). By analysing the effectiveness of the company's previous ABC classification and then considering the latest research findings, we can see the impact of utilising these new theories in a large organisation's spare parts management and contrast the differences in performance.

Keywords: Spare parts management, demand categorisation, forecasting, stock control

How academics can help industry and the other way around

R. Dwight, H. Martin, J. Sharp*, 1
1. University of Salford, UK. j.m.sharp@salford.ac.uk

Introduction

Many organisations are faced with an unprecedented level of competition. Markets are truly global now, and countries such as those of the former Eastern bloc and China are participating in this global market with growing success. Former developing countries, such as India and those of South-East Asia, are also growing economies in their own right. A growing and accessible global market means not just increased business potential; it also requires much closer attention to innovation and to creating effective and efficient business operations. For many organisations, this means that "management by gut feeling" may not be enough to be ready for this global challenge. Business processes need to be analysed, and possibly redesigned, in a systematic and justifiable manner.
Management by trial and error may not give an organisation a second chance. Viewed from this perspective, academic research aimed at the improvement of business processes may prove indispensable in supporting modern management. Several academic disciplines, such as industrial engineering, organisational sociology and psychology, and operations research, are active in this particular area, but mostly from a purely academic interest. On the other hand, academics have a basic responsibility to develop new and innovative knowledge, i.e. to produce scientifically relevant knowledge. Essentially, the quality, and therefore the success, of a researcher is very much dependent on the recognition of the quality of the work by peers, mainly through the medium of publications. Creating as many publications in as high-ranking journals as possible is a top priority for a researcher. In addition, there exists a moral claim from society, which finances most research directly or indirectly, that this new knowledge should have value through its application, i.e. that it should produce societally relevant knowledge. This second type of relevance is actually much less clearly recognised, for various reasons that we will discuss later in the paper. We argue that not recognising this second type of responsibility and focusing solely on the first will in the end create a self-fulfilling prophecy in which academic success is measured by the degree to which a researcher complies with the academic establishment. Research goals are mainly determined by a combination of pure academic curiosity and compliance with existing expectations set by peers. Usefulness and applicability do not seem to be a priority.

In this paper we discuss some key issues concerning the potential academic contribution to business process improvement and development in general, and to maintenance processes in particular. The authors have had hands-on experience both with maintenance management in industry and with academia. We hope that this creates a somewhat unique perspective and helps us to analyse the so-called "gap" between industry and academics (e.g. see Scarf (2004)) from both sides of the fence. In doing so, we deliberately take a more extreme position, to sharpen our point and to provoke discussion. We realise that this paper does not provide definitive answers, but if we can fuel the debate constructively, then we consider it a success. First, we present an example of the gap between academics and practitioners to set the scene. We then discuss and analyse some core issues that need to be resolved to close the gap, and finally we conclude by raising a key question.

Academics and industry; the nature of the problem

Christer et al. (1995) discussed a case study where a company would not increase the time given over to maintenance. Christer et al. (1995) were able to demonstrate that a considerable improvement in plant productivity would be achieved following an application of delay time modelling. This showed that, contrary to the belief then current within the company, increasing the time given over to maintenance actually increased the available production time. The predicted increase was then achieved in practice.
However, once the senior company contact that the modellers had worked with left, there was no modelling skill or even awareness within the company, which subsequently and unwittingly reverted to the previous, and demonstrably inefficient, practice of breakdown maintenance, thus demonstrating that the company did not learn.

Different goal setting

Given the discussion in the introductory section, it is rather obvious that academics and industrialists have very different main goals, which explains current behaviour quite adequately. Both groups need to be result-oriented if they want to be successful in their field. The industrialists have no strict requirement to explain good results: the results, i.e. a contribution to an organisation's mission, are proof of success in themselves. For researchers, on the other hand, the academic ethos usually demands methodological rigour, which usually entails consistency between a theoretical prediction, a test and a conclusion. The level of consistency determines the quality of the theory, rather than the prediction itself. The net result of this discussion is that, because substantial differences in goal setting exist between industrialists and academics, there is little motivation to develop joint efforts.

A disciplinary versus a problem-oriented approach

Historically, science has evolved dramatically over the years. Although research in accordance with strict methodological standards has only been conducted over the last 200 years or so, the body of knowledge has grown so fast that specialisation quickly became a necessity. Arguably, much research is driven by the study of phenomena, which in the end also drives the choices made for specialisation as science advances. In the early days, mathematicians were considered more or less universally knowledgeable in all types of strict abstracted logic. Nowadays, the qualification of mathematician is not sufficient to typify someone's scientific specialism: a mathematician specialises in Bayesian statistics, fuzzy logic, fractal modelling, etc. Interestingly, we may conclude that specialisation has also narrowed the scope within which phenomena are researched. Industrialists in general do not share the academics' enthusiasm for phenomena. Instead, they have to deal with problems on a day-to-day basis. Typically, these problems consist of many phenomena which probably interact in a certain way. Therefore, the industrialist cannot afford to specialise in the same way as researchers do. If problems are more generic, the best that can happen is that practitioners specialise in the types of problems for which they may have developed certain problem-solving skills. All of this usually happens on an individual basis, or is driven by dogmatic "fashion statements" from self-proclaimed management gurus. Again, we may conclude that the problem-oriented perspective of practitioners does not align very well with the phenomenon-driven or disciplinary specialisation of academics.

Scientific convenience or real-life relevance?

One of the problems researchers face in their endeavour to test their theories is the so-called lack of actual data. For a researcher studying phenomena, it is methodologically imperative to use data about the phenomenon at hand that are as realistic as possible. Unfortunately, collecting data is usually cumbersome, time-consuming and costly. Researchers interested in phenomena in real organisations may get into trouble in many ways.
The data actually required is not readily available in organisations; organisations are not prepared to spend time and effort collecting scientifically required data which has no other value from their perspective; organisations enforce a certain level of secrecy and impose restricted or no access to the relevant data; and so on. Most researchers do not have sufficient funds to compensate organisations for their participation. Instead, organisations are expected to take up their moral responsibility to support science. Needless to say, organisations rarely respond to this implicit responsibility. Also, it is very difficult for both parties to explain and understand each other's potential stakes in a possible cooperation. Researchers have difficulty understanding the relationship, if any, between particular problems in an organisation and how their research may ultimately help that organisation. The net result is very often that practitioners carry on without any potentially refreshing input, and researchers are tempted to take shortcuts in their methodology. Because this problem occurs on such a big scale, taking shortcuts seems almost to be accepted research policy: researchers use whatever less valid datasets they can get, which need to be treated in a certain way before they can be used, or, worse, simply make assumptions. In recent years, researchers have also made use of elaborate, assumed-realistic simulations as a stand-in for actual situations, which is an equally questionable practice. Of course, it is difficult to criticise researchers at the very core of their ethos, but every researcher still has the responsibility to search for the most objective way to test their hypotheses and to consider how corner-cutting will ultimately impair the quality of their work. You, as a responsible researcher, will be the judge of that.
What happened with quality management?
At the end of the second world war, Deming (1986) and others (Crosby (1979) and Juran (1989)) revolutionised management thinking by introducing the concept of quality. Japanese organisations in particular embraced the quality concept fully and implemented quality principles without compromise. Many attribute the Japanese success on international markets to the rigour and dedication of the Japanese to the quality principle. At its essence, quality improvement requires people to iterate through the quality cycle, "plan-do-check-act", which is in effect a learning cycle. So, it is relatively safe to assume that improvement requires some sort of learning. But that is what researchers do all the time. Yet industry in particular is struggling with what is nowadays called organisational learning and what it actually means (Burgoyne (1995)). Apart from the fact that organisational learning is still in its infancy in scientific terms and far from being a closed case, many organisations have little affinity with learning. Education is mostly considered a government responsibility. Industry sees itself predominantly as a consumer acting in a human resource market, in which more or less competent individuals can be "bought" when needed and rejected if deemed insufficient. The latter belief, in particular, clashes with organisational learning. We realise that this line of reasoning is a bit stretched and hard to substantiate, but can managers really afford to take a bystander's position when it comes to learning, or do we leave that to the Japanese?
Is it possible that industry can learn (no pun intended) from researchers and gain a little more appreciation for a methodological approach to addressing problems, or is the basic quality cycle just an empty phrase?
What is the value of research?
Research needs to be financed. Because of the many difficulties in measuring the value of research in terms of concrete societal benefits, the pragmatic approach of indirect measurement is taken. A thorough discussion of the problems of measuring the value of scientific research exceeds the scope of this paper by far; besides, plenty of publications have addressed this problem (Gray (1999)). But, in view of our problem of closing the gap between academics and industrialists, it would be desirable if not only the cost of research could be made visible, but also the planned revenue. Thinking in terms of costs and revenues touches the basic industrialist's mindset. Arguably, not much effort has been made lately to improve this valuation problem. Academics seem to have accepted the current "best practice" of counting publications as the main criterion for success. The government, being the main financer of research in many regions of the world, has settled for this type of measurement. In fact, this type of valuation has been institutionalised everywhere and has become the main criterion in most scientific accreditations. Strangely, the government doesn't act as a true financer and doesn't demand value for money. At the very least, the government could remind researchers of the moral responsibility we briefly introduced as the second type of responsibility in the introduction. Many ambitious researchers are a bit reluctant to make any statements on this point unless explicitly asked, e.g. when filling in an application for a research grant. Perhaps some researchers feel that any bonding with practicality would lead them to be classified as "applied researchers", which, in turn, would drive them away from reaching the Olympic temple of science. We admit that this is a bit exaggerated, and several government-funded research projects do require attention to the applicability of research. But still, we feel this is not truly recognised as important in today's scientific world. Applied research is by nature much more problem oriented and could potentially alleviate the problem of valuing the applicability of envisioned research output. Yet applied research is generally not seen as a scientific career booster.
What about self esteem?
Apart from the difficulty of expressing the value of research explicitly, a researcher cannot escape the feeling, when in discussion with industrialists, of being regarded as some kind of odd creature. Generally speaking, researchers are not driven to advance their career and their self esteem through increases in salary, or through the beauty and exuberance of their office, etc. (pun intended). Supposedly, researchers are driven by sheer curiosity and fancy the respect of their peers. We apologise for painting this stereotyped image of industrialists and researchers, but we believe that a cultural gap exists between the two groups, which can make it harder to take one another seriously.
How does maintenance fit into this discussion?
The discussion so far has been quite general in nature and has left maintenance out of the equation. We have not addressed maintenance specifically so far because the preceding discussion is very helpful in understanding what happens in maintenance.
Firstly, maintenance is more problem oriented than (mono-)disciplinary by nature. This creates many of the problems already identified. Quite a few researchers with a disciplinary research background have no real understanding of the problem area called maintenance: many papers are published by researchers who have never seen a real maintenance department, let alone worked in one and understood the true complexity of, and interrelationships between, the highly heterogeneous processes at work. Yet these researchers claim relevance and ponder why they are not taken seriously by practitioners. Understanding maintenance problems requires a multi-disciplinary approach. This, in turn, calls for applied research that does not take shortcuts via oversimplified and unrealistic assumptions. Metaphorically speaking, just as a biologist must go out into the wild to study the behaviour of primates, a maintenance researcher must go out and conduct his studies in real maintenance environments. Unfortunately, applied research requires applied researchers, who are on the brink of scientific extinction and therefore in short supply.
How can collaboration between academics and industry be improved?
If we look back at all the problems identified and agree on them, it looks next to impossible to change the world. But in good scientific tradition, we can claim that once we have identified the problem, we can work on a solution methodically. And eventually, we have no doubt, we could change the situation. At least, that is the expected rational answer. The real question here is: do we really want to change? Most problems relate to incompatibilities in attitude, which can be changed by sheer willpower. We suggest that industrialists, academics and the government look at this question first and deal with the details later. One attempt to close the gap was described by Christer (2005), who, with Sharp, a co-author of this paper, jointly organised a three-day IFRIM/EPSRC workshop, sponsored by the EPSRC of the UK, in the spring of 2004 to identify the future direction for maintenance research. It was organised as a high-level academic-industrial workshop, inviting senior industrial representatives with key responsibilities for maintenance and reliability, and recognised academic researchers with a track record of addressing real industrial maintenance problems. The remit of the workshop was to investigate the gap between maintenance theory and practice: to consider if it exists, and if so why; and to consider if the gap is worth closing, and if so what mechanisms might be considered. The consensus from the workshop was that a gap exists that is worth closing. Perhaps more significant for the future, during the workshop the perception of the industrial members as to what was possible as far as modelling and decision aids were concerned changed markedly. There are clear future challenges facing both the academic and industrial communities, with a payoff to both. The former need to be more closely allied to industry and proactive in formulating proposals for collaboration, and industry needs to be better informed at the workface, become more receptive to testing new ideas, and occasionally move outside its comfort zone. As Christer (2005) and Dwight (2005) both pointed out, industrialists and academics alike felt the gap was worth closing; yet three years later, nothing appears to have changed to demonstrate that academia and industry are working more closely together.
References
Burgoyne, J.
(1995) Feeding minds to grow the business, People Management, 1 (19), 2-6
Christer, A. H. (2005) Closing the gap between maintenance theory and practice, Keynote Address at the Int. Congress of Maintenance Societies, Australia, April 2005
Christer, A. H., Wang, W., Baker, R. and Sharp, J. (1995) Modelling maintenance practice of production plant using the delay time concept, IMA J. of Mathematics Applied in Business and Industry, 6, 67-84
Crosby, P. B. (1979) Quality is free: The art of making quality certain, New York: New American Library
Deming, W. E. (1986) Out of the crisis, Cambridge, MA: Massachusetts Institute of Technology
Dwight, R. (2005) Industrial-academic interface in academic research, Paper presented at the Int. Reliability and Maintenance Conf., Toronto, Canada, November 2005
Gray, H. (1999) Universities and the Creation of Wealth, Open University Press, Buckingham, UK
Juran, J. (1989) Juran on leadership for Quality: An Executive handbook, The Free Press, New York, USA
Scarf, P. (2004) Results of EPSRC Sponsored Survey, Paper presented at the IMA Conference on Reliability and Maintenance, University of Salford, May 2004

Stochastic modelling maintenance actions of complex systems
J. Rhys Kearney*, David F. Percy, Khairy A. H. Kobbacy
Salford Business School, University of Salford, Greater Manchester, M5 4WT, England. j.r.kearney@pgr.salford.ac.uk
Abstract: When a system is maintained, there are many different actions which can be performed to improve its reliability, such as the replacement of aged components, the reconfiguration of mechanisms, or the cleaning and lubrication of moving parts; it is assumed that each of these actions contributes to an overall improvement in the performance of the system as a whole. Few assumptions are made about the structure of the system, in the hope that effects can be estimated from history data. Presented is a discussion on the form of the adjustment that should be made to the system rate of occurrence of failure to best capture the physical properties of the intervention, be it replacement of parts of the system or preventive maintenance with limited efficacy.
1. Introduction
A system comprises a number of subsystems and components that are interconnected in such a way that the system is able to perform a set of required functions. As adopted by Rausand & Hoyland (2004), the term functional block will be used to denote an element of the system, whether it is a component or a large subsystem. The general definition of reliability in the ISO 8402 standard is 'the ability of an item to perform a required function, under given environmental and operational considerations, for a stated period of time'. According to the IEC 50(191) standard, failure is the event whereby a required function is terminated, while a fault is 'the state of an item characterized by inability to perform a required function'; a fault is hence the system state resulting from failure. This work's focus is on modelling the reliability of repairable and maintainable technical systems of a complex nature. A repairable system is one which, upon failure, can be restored to satisfactory performance by any method other than replacement of the entire system (Ascher & Feingold (1984)). A system which is repairable will also be considered maintainable in this work.
Here, the time taken to repair the system on failure, referred to as corrective maintenance (CM), is considered negligible in comparison with the length of time the system is in operation, and is therefore not considered in the modelling process. The purpose of preventive maintenance (PM) is to improve system reliability: maintenance actions are carried out on the system with the intention of retaining or restoring the system to a certain level of functional performance. These could include the repair, replacement, cleaning, lubrication or reconfiguration of functional blocks. Many methods have been proposed for modelling the effect of maintenance actions; most of these models are variations of the simple renewal process and the nonhomogeneous Poisson process. Ascher & Feingold (1984) claimed that these are the fundamental models for replacements and repairs, respectively. Discussion follows on these modelling methodologies, with regard to the factors which need to be taken into consideration in order to develop an accurate mathematical representation of the physical manifestation of the type of maintenance action undertaken on complex systems.
2. Maintenance actions and system dependencies
Given its operational history, the reliability of a complex system is an aggregate measure of the reliability of its functional blocks given the system's structural dependencies; that is, the system ROCOF depends on interactions between components, otherwise referred to as stochastic dependence (Nicolai & Dekker (2006)). The basic assumption is that the state of a functional block, given by its age, failure rate or other measure of condition, can influence the state of other components in the system. Suppose a functional block of a system is replaced by one which is regarded as newer and more reliable; one might infer that the reliability of the system as a whole would improve as a result. This would be the case if the block were functionally independent, but as part of a complex system the resultant effect on overall reliability can be more difficult to quantify. Take the simple example of the two-unit system illustrated in Figure 1: a pump which compresses gas into a sealed chamber via a system of pipes and valves. Valves 1 and 2 on the diagram are one-way, in that the gas can only travel in the direction indicated; the third valve is also one-way but only releases when a certain pressure in the chamber is attained. The system is operating satisfactorily if the blasts of gas coming out of Valve 3 occur within prescribed ranges of pressure, frequency and duration. If any of these criteria are not met, the system is at fault. Say it is observed that the pressure of the blasts being released by the third valve has fallen below the acceptable level and the frequency of the blasts has increased. The maintenance engineer makes the decision to replace Valve 3. This corrects the fault, bringing the system performance back within the acceptable limits, the previous valve having structurally failed as a result of repeated usage. Dependencies on this replaced block may affect the reliability of others in the system. For example, the fixed valve, resulting in increased work from the pump, may alter the pump's failure mode; the system may fail sooner as the rate of degradation of the pump increases.
Figure 1. A repairable system of a compressor pump and chamber
To assess the reliability of the system and the effect of repair and maintenance actions with certainty would, for any system, require detailed measurement of system parameters and investigation into their dynamic behaviour given inter-component and subsystem dependencies, which becomes increasingly difficult for ever more complicated systems. The model proposed in this work instead takes a top-down, holistic view of system reliability and of maintenance actions via a system-wide failure intensity function which implicitly models the effect of dependencies.
3. Multiplicative scaling
The core accepted model of failure modes is based upon the nonhomogeneous Poisson process, which has the capacity to describe nonstationary interfailure times. The proportional intensities model (PIM), introduced by Cox (1972), is suited to the modelling of repairable systems, as the effects of maintenance actions are captured in adjustments made to the rate of occurrence of failures (ROCOF). Percy & Alkali (2006) proposed a PIM where the intensity function is multiplicatively scaled upon failure and repair such that

λ(t) = λ0(t) ∏_{i=1}^{N(t)} s_i    (1)

where s_i > 0 are constants representing the intensity scaling factors and N(t) is the number of corrective maintenance actions up to and including time t. The scaling factors s_i can take the form of a positive constant, a random variable, a specified function of the number of failures i or of the times t_i at which these occur, or random variables with evolving means. In this work, a multiplicative adjustment is considered more conceptually justifiable than an additive one, which could potentially result in a negative intensity; a premise of multiplicative scaling is that the absolute effect of a maintenance action is proportional to the health of the system at the time the maintenance action is carried out. Say a system has a monotonically increasing baseline intensity function, i.e. the system health is degrading with usage, resulting in an increased frequency of failures, as is the case for the intensity functions illustrated in Figure 2a). It is reasonable to assume for this common scenario that the benefits of a maintenance action will increase as system health degrades: there would be little benefit in maintaining a new system in good working order, relative to a system which has suffered atrophy through continued usage. This effect can be most simply captured by a constant proportional reduction of the intensity function, where s_i = ρ. Figure 2a) illustrates a power-law intensity scaled multiplicatively to represent the effect of a maintenance action either at t = 4 or t = 8 by a constant ρ = 0.5. The effect is more pronounced when the system ROCOF is larger, at t = 8; however, the resultant system intensity is the same for either maintenance period.
Figure 2a), b). Constant proportional reduction on increasing intensity
Implicit in the assumption of proportional scaling of the intensity function is that the maintenance action has the effect of making an equivalent proportion of the system perfectly reliable; i.e. if ρ = 1/2, one can state that, of all those things in the system which were contributing to the rate of occurrence of failure prior to the maintenance action taking place, 50% have been eradicated by the maintenance action.
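As a concrete illustration of equation (1) with s_i = ρ, the short sketch below (ours, not the authors'; the power-law baseline λ0(t) = a·b·t^(b-1) and its parameters a = 1, b = 3 are illustrative assumptions) compares a maintenance action at t = 4 with one at t = 8 under ρ = 0.5:

```python
import numpy as np

def scaled_intensity(t, maintenance_times, rho=0.5, a=1.0, b=3.0):
    """Equation (1) with s_i = rho and an assumed power-law baseline
    lambda_0(t) = a*b*t**(b-1)."""
    t = np.asarray(t, dtype=float)
    baseline = a * b * t ** (b - 1)
    # N(t): number of maintenance actions performed at or before each t
    n = np.sum(np.subtract.outer(t, np.asarray(maintenance_times)) >= 0, axis=1)
    return baseline * rho ** n

t = np.linspace(0.01, 10.0, 500)
early = scaled_intensity(t, maintenance_times=[4.0])  # action at t = 4
late = scaled_intensity(t, maintenance_times=[8.0])   # action at t = 8
# The absolute reduction is larger at t = 8, where the ROCOF is higher,
# yet after the action both curves follow the same scaled intensity.
```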
Taking a holistic view of the system, this is equivalent to half the system being made perfectly reliable. One difficulty in scaling the intensity function by a multiplicative constant to represent the effect of a maintenance action is that the part of the system which, in theory, becomes perfectly reliable remains so, ceasing to contribute to the system failure intensity, since the rate of change of the ROCOF is also multiplicatively scaled such that

λ′(t) = λ0′(t) ρ^{N(t)}    (2)

Figure 2b) illustrates the scenario where the baseline is scaled by a constant ρ = 1/2 at times t = 2, 4, 6 and 8, the repeated scaling resulting in an exponential reduction in the proportion of the system which contributes to the system failure intensity.
4. System repair by component replacement
When a maintenance action results in a proportion of the system being replaced, it is supposed that this is equivalent to a proportion of the system being renewed. The baseline intensity function, λ0(t), represents the failure rate of the system in the absence of any maintenance action. It is assumed that this system-wide intensity function can be scaled to represent the failure intensity of a proportion p of the system; that is, components of the system are independent and identically distributed. This model is based upon an overhaul model proposed by Zhang & Jardine (1998), who attribute a proportion p of an intensity function to that of the system in the previous overhaul period. If at time t_i a proportion p ∈ [0,1] of the system's components is completely renewed during a maintenance action, the system intensity function for the period following the maintenance action, λ_i(t), given the system intensity function prior to the action, λ_{i-1}(t), is given by

λ_i(t) = p λ0(t - t_i) + (1 - p) λ_{i-1}(t);    t_i < t < t_{i+1}    (3)

On solving the recurrence relation, this can be written as

λ_i(t) = Σ_{j=0}^{i} p^{1-δ(i,j)} (1 - p)^j λ0(t - t_{i-j});    t_i < t < t_{i+1}    (4)

where

δ(i,j) = 0 if i ≠ j; 1 if i = j    (5)

We illustrate the intensity function corresponding to this model in Figure 3 for a quadratic baseline intensity function with p = 1/2 and repairs at times t1 = 100 and t2 = 200.
Figure 3. Typical effect of replacements on intensity function
Equation (3) proposes a simple way to model the effect of replacement on system reliability. An assumption of the replacement policy modelled is that all components of the system are equally likely to be chosen for replacement; that is, no account is taken of the relative health or age of components in the system. Further investigations into allowing for the inclusion of relative component health in the replacement policy will be presented in future work. The effects of dependencies between the replaced components and system reliability need further investigation. One hypothesis is that the replaced components will age faster due to the increasing rate of failures induced by the older components of the system, tending toward the systemic age. The functional age of the newer components could be moderated to include such an effect by rewriting equation (3) as

λ_i(t) = p λ0[t - t_i exp{-φ(t - t_i)}] + (1 - p) λ_{i-1}(t);    t_i < t < t_{i+1}    (6)

where φ > 0 is an unknown constant.
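A minimal sketch of the proportional renewal adjustment of equation (3), evaluated directly from the recurrence rather than the closed form (4); the quadratic baseline is an illustrative assumption of ours, while p = 1/2 and the repair times t1 = 100, t2 = 200 follow the Figure 3 example:

```python
def renewal_intensity(t, repair_times, p=0.5,
                      baseline=lambda u: (u / 60.0) ** 2):
    """lambda_i(t) from the recurrence (3): at each repair time t_i a
    proportion p of the system is renewed, so
    lambda_i(t) = p*lambda_0(t - t_i) + (1 - p)*lambda_{i-1}(t)."""
    lam = baseline(t)  # lambda_0(t): intensity with no maintenance yet
    for ti in sorted(repair_times):
        if t > ti:
            lam = p * baseline(t - ti) + (1 - p) * lam
    return lam

# Hypothetical spot checks with repairs at t1 = 100 and t2 = 200
for t in (50.0, 150.0, 250.0):
    print(t, renewal_intensity(t, repair_times=[100.0, 200.0]))
```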
5. Preventive maintenance
Percy & Alkali (2006) identified limitations with some of the proposed models for preventive maintenance, including difficult interpretation and poor definition. They concluded that a generalization of the proportional intensities model offers a flexible stochastic process that adapts readily to a variety of applications where only event history data are available. As touched upon in Section 3, constant scaling of the system intensity function implies that the beneficial effect of a maintenance action is permanent, in that it results in an equivalent proportion of the system becoming perfectly reliable. We suppose this is not the best way to model the effect of the preventive maintenance actions considered here, taken to be those maintenance tasks, other than the replacement of components, which restore the system to a level of functional performance but which are known to have a limited efficacy given system usage. The need for modelling the decay of maintenance effects was noted by Jack & Murthy (2002), who proposed investigating the idea for additive adjustments of the intensity function. Unless an additive reduction is a function of time (here taken as the measurement of system usage), the adjustment has no effect on the rate of change of the system ROCOF. This means that an additive reduction implies that the effect of a maintenance action is an instantaneous shift in the absolute frequency of system failure, while the rate at which the failure rate changes remains unaffected: the system is made younger in absolute terms but continues to age at the same rate, merely from a lower frequency of failure. In a virtual age model, a maintenance action is supposed to restore the intensity function of the system to that of an earlier time; the assumption here is that the overall effect of a maintenance action is that of globally making the system younger in all aspects of failure behaviour. Neither of these behaviours seems intuitively sensible. Instead, we propose that the effect of a maintenance action should be proportional to the health of the system, i.e. multiplicative scaling, but that the maintenance action should have limited efficacy such that the effect decays with time. The first decay factor that we consider is exponential in nature and involves defining the scaling factor in equation (1) to be of the form

s_i = 1 - (1 - ρ) exp{-φ(t - t_i)}    (7)

in terms of two unknown parameters, ρ > 0 and φ > 0. Note that s_i = ρ immediately after preventive maintenance, corresponding to the constant proportional intensities model, but we now have the realistic scenario that the effect of preventive maintenance vanishes over time, because s_i → 1 as t → ∞. Although this decay factor is satisfactory, we can improve upon it by imposing instantaneous proportional intensities scaling upon performing preventive maintenance. To do this, the gradient of the intensity function must instantaneously scale by the same factor ρ as affects the actual function value. We achieve this by modifying equation (7) so that the scaling factors take the time-squared form

s_i = 1 - (1 - ρ) exp{-φ(t - t_i)²}    (8)

again in terms of two unknown parameters, ρ > 0 and φ > 0. The effect then degrades in the shape of an S-curve, such that initially the rate at which the effect degrades is slight, but then increases until easing off as few benefits of the maintenance remain, similar to the logarithmic adoption curve. Figure 4 displays a graph of this intensity function for a quadratically increasing baseline intensity function with ρ = 1/2, φ = 12,000 and preventive maintenance at times t1 = 100 and t2 = 200.
Figure 4. Typical effect of preventive maintenance on intensity function
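The decaying scaling factors of equations (7) and (8) can be sketched as follows. Note that, to keep the decay visible over a horizon of a few hundred time units, φ is taken here as 1/12,000, i.e. we read the paper's 12,000 as the denominator of the exponent; this interpretation is our assumption:

```python
import numpy as np

def pm_intensity(t, pm_times, rho=0.5, phi=1.0 / 12000.0, squared=True,
                 baseline=lambda u: (u / 60.0) ** 2):
    """System intensity at time t: baseline times the product of decaying
    scaling factors s_i from equation (7) (squared=False) or equation (8)
    (squared=True). The quadratic baseline is an illustrative assumption."""
    factor = 1.0
    for ti in pm_times:
        if t > ti:
            d = (t - ti) ** 2 if squared else (t - ti)
            factor *= 1.0 - (1.0 - rho) * np.exp(-phi * d)
    return baseline(t) * factor

# Immediately after a PM the factor equals rho; it then tends back to 1,
# so the intensity returns to the baseline as the benefit decays.
print(pm_intensity(100.001, pm_times=[100.0]))  # about rho * baseline(100)
print(pm_intensity(400.0, pm_times=[100.0]))    # close to baseline(400)
```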
The diminishing effect of preventive maintenance interventions is clear from Figure 4, the system intensity tending to the baseline intensity as the time since the action increases. The baseline can then be regarded as the rate at which failures would occur if no interventions were made on the system, with maintenance actions acting as a shock shift from this baseline which, through system entropy, degrades, returning the system to its unmaintained ROCOF.
6. Conclusion
One measure of the immediate effect of a maintenance action on the system ROCOF, introduced in Section 3, is the equivalent proportion of the system which is made perfectly reliable, if all parts of the system are considered independent and identically distributed. It was noted that constant multiplicative scaling is not a sensible assumption, because mathematically it is equivalent to that proportion of the system remaining perfectly reliable for the duration of the system's operation. To counter this unrealistic behaviour, a proportional renewal was proposed in Section 4 for component replacement, and a decay parameterisation in Section 5 for preventive maintenance, to better approximate the effect of maintenance actions.
References
Ascher, H. and Feingold, H. (1984) Repairable systems reliability: modeling, inference, misconceptions and their causes, New York: Marcel Dekker
Cox, D. R. (1972) The statistical analysis of dependencies in point processes, in Stochastic Point Processes
Jack, N. and Murthy, D. N. P. (2002) A new preventive maintenance strategy for items sold under warranty, IMA J. of Management Mathematics, 13, 121-129
Nicolai, R. P. and Dekker, R. (2006) Optimal Maintenance of Multi-Component Systems: A Review
Percy, D. F. and Alkali, B. M. (2006) Generalized proportional intensities models for repairable systems, IMA J. of Management Mathematics, 17, 171-185
Rausand, M. and Hoyland, A. (2004) System reliability theory: models, statistical methods, and applications, Hoboken, NJ: Wiley-Interscience
Zhang, F. and Jardine, A. K. S. (1998) Optimal maintenance models with minimal repair, periodic overhaul and complete renewal, IIE Transactions, 30, 1109-1119

Application of the delay-time concept in a manufacturing industry
B. Jones*, I. Jenkinson, J. Wang
Liverpool John Moores University, Liverpool, UK. bryanjjones@aol.com
Abstract: This paper presents a methodology for applying delay-time analysis to a maintenance and inspection department. The aim is to reduce the downtime of plant items and/or to reduce maintenance and inspection costs. A case study of a company producing carbon black is included to demonstrate the proposed methodology.
Keywords: Maintenance, inspection maintenance, delay-time analysis
1. Introduction to the delay-time analysis concept
Delay-Time Analysis (DTA) is a concept whereby the time h between an initial telltale sign of failure u and the time of actual failure can be modelled in order to establish a maintenance strategy. The delay time is the period in which inspection or maintenance could be carried out in order to avoid total failure. Figure 1 illustrates the delay-time concept (Christer & Waller (1984)).
Figure 1. The delay-time concept
2. Methodology
In order to develop a maintenance model using delay-time analysis, a methodology is needed to give the process a framework. Delay-time analysis can be used as a tool for reducing the downtime, D(T) (Christer et al.
(1995)) of a machine or piece of equipment based on an inspection period T, given the probability b(T) of a defect arising within this time frame. For a particular plant item, component or series of machines, delay-time analysis is useful because the equipment in question is generally high volume and of high capital expense; therefore any reduction in downtime due to breakdown or over-inspection can be beneficial. As with the modelling of downtime per unit time, it is also possible to establish a cost model, C(T) (Leung & Kit-leung (1996)), again based on an inspection period T and probability b(T); this model estimates the expected cost per unit time of maintenance. This modelling has also been used for safety criticality (Pillay & Wang (2003)) on a fishing vessel, giving the safety criticality of a failure and the operational safety criticality. A methodology for applying delay-time analysis is proposed as follows:
• Understand the process.
• Identify the problems.
• Establish the data required.
• Gather data.
• Establish parameters.
• Validate the delay-times and the distribution.
• Establish assumptions.
• Establish a downtime model D(T) and cost model C(T).
When the probability density function of the delay-time, f(h), follows an exponential distribution, i.e. when the failure rate λ, or 1/MTBF, is constant over a specified time period, the density function shown in equation (1) is used to calculate the probability b(T) of a defect arising:

f(h) = λ e^{-λh}    (1)

The probability of a defect leading to a breakdown failure, b(T), can be expressed as in equation (2):

b(T) = ∫₀ᵀ ((T - h)/T) f(h) dh    (2)

Inserting the density function f(h) into the breakdown failure probability b(T) gives

b(T) = ∫₀ᵀ ((T - h)/T) λ e^{-λh} dh    (3)

This can be further simplified as

b(T) = (1/T) ∫₀ᵀ (T - h) λ e^{-λh} dh    (4)

It is important to note that b(T) is independent of the arrival rate of defects per unit time (k_f) but is dependent on the delay-time h.
2.1 Downtime model D(T)
It has been demonstrated (Leung & Kit-leung (1996), Pillay et al. (2001)) that, having established a probability of breakdown failure b(T), it is also possible to establish an expected downtime per unit time function D(T), as shown in equation (5):

D(T) = [d + k_f T b(T) d_b] / (T + d)    (5)

where
d = downtime due to inspection,
k_f = arrival rate of defects per unit time,
b(T) = probability of a defect arising,
d_b = average downtime for a breakdown repair,
T = inspection period.
Substituting b(T) from equation (4) into equation (5) gives

D(T) = [d + k_f T ((1/T) ∫₀ᵀ (T - h) λ e^{-λh} dh) d_b] / (T + d)    (6)

2.2 Cost model C(T)
Similarly, given the cost of inspection Cost_i, the cost of a breakdown Cost_B and the cost of an inspection repair Cost_IR, the expected cost per unit time of maintaining the equipment on an inspection schedule of period T is

C(T) = [k_f T (Cost_B b(T) + Cost_IR [1 - b(T)]) + Cost_i] / (T + d)    (7)

where
C(T) = expected cost per unit time of maintaining the equipment on an inspection schedule of period T,
Cost_B = breakdown repair cost,
Cost_IR = inspection repair cost,
Cost_i = inspection cost.
The cost of an inspection is shown in equation (8):

Cost_i = (Cost_ip + Cost_d) T_insp    (8)

where
Cost_ip = cost of inspection personnel per hour,
Cost_d = cost of downtime per hour,
T_insp = time taken to inspect.
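A sketch of these models in code; for the exponential delay-time density of equation (1), the integral in equation (4) has the closed form b(T) = 1 - (1 - e^{-λT})/(λT), which is used below:

```python
import numpy as np

def b(T, lam):
    """Equation (4): probability that a defect leads to a breakdown failure,
    in closed form for the exponential delay-time density of equation (1)."""
    return 1.0 - (1.0 - np.exp(-lam * T)) / (lam * T)

def D(T, lam, d, db, kf):
    """Equation (5): expected downtime per unit time for inspection period T."""
    return (d + kf * T * b(T, lam) * db) / (T + d)

def C(T, lam, d, kf, cost_b, cost_ir, cost_i):
    """Equation (7): expected cost per unit time for inspection period T."""
    bT = b(T, lam)
    return (kf * T * (cost_b * bT + cost_ir * (1.0 - bT)) + cost_i) / (T + d)
```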
The cost of a breakdown is calculated as the cost of the failure plus the costs of corrective action to bring the equipment back to a working condition. The details of a breakdown repair are shown in equation (9):

Cost_B = (M_staff + Cost_d)(T_insp + T_repair) + S_p + S_e    (9)

where
M_staff = maintenance staff cost per hour,
T_repair = time taken to repair,
S_p = spares and replacement parts cost,
S_e = special equipment / personnel / hire costs.
The cost of an inspection repair is largely identical to the breakdown repair cost, apart from the following:
• An inspection repair will not generally have equipment hire costs (S_e).
• The time to repair will be of shorter duration for an inspection repair.
The shorter duration of an inspection repair is mainly due to a breakdown having a greater knock-on effect. The equation for an inspection repair is shown in equation (10):

Cost_IR = (M_staff + Cost_d)(T_insp + T_repair) + S_p    (10)

A point to note regarding the cost model C(T) (equation (7)) is that it describes a worst case scenario: a fault leading to failure before an inspection takes place, or a fault being detected at inspection. Conversely, the best case scenario would be no failure taking place before inspection and no fault being present at inspection.
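The cost components of equations (8)-(10) translate directly into code; the argument names below simply mirror the symbols defined in the text:

```python
def inspection_cost(cost_ip, cost_d, t_insp):
    """Equation (8): inspection personnel cost plus downtime cost."""
    return (cost_ip + cost_d) * t_insp

def breakdown_cost(m_staff, cost_d, t_insp, t_repair, sp, se):
    """Equation (9): breakdown repair, including spares and special hire."""
    return (m_staff + cost_d) * (t_insp + t_repair) + sp + se

def inspection_repair_cost(m_staff, cost_d, t_insp, t_repair, sp):
    """Equation (10): as equation (9) but without special hire costs,
    and typically with a shorter repair time."""
    return (m_staff + cost_d) * (t_insp + t_repair) + sp
```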
3. Case study
In order to demonstrate the above models for downtime D(T) and cost C(T), a case study of a factory producing carbon black in the UK is given. This particular process of creating carbon black is made up of three units, A, C and D. The three units cover the whole process stream, from the reactor, through the MUF (Main Unit Filter), which collects and separates the product from the gases produced, to the conveying of the carbon black into storage containers. Low-pressure air and natural gas produce a flame of high temperature (1500 degrees centigrade) in the combustion zone of the reactor. Heavy oil, known as feedstock, is sprayed into the flame and the carbon black reaction occurs. After the feedstock is exposed to the high temperature, it is quenched with water in order to stop the carbon black formation reaction. At this point the basic form of carbon black, carbon black powder, is formed. The filter is a bag-type filter measuring approximately 10cm in diameter and 2.5m in length. The cost of a filter is around £28, with a life expectancy of three to four years. There is a second manufacturer whose filter costs around £7.50 but has a life expectancy of 12 to 14 months and a lower tolerance to acid than the more expensive filter.
3.1 Costs of a failure
When a filter bag is to be changed, the compartment has to be closed down. This requires 8 hours of cool-down, followed by a period of between 6 and 24 hours of downtime for repair and replacement, then a further 2 hours to warm the unit back up; if a total re-bag is required, downtime is generally around 7 days. When a unit is brought off-line, it continues to burn gases in order to keep the temperature in the reactor constant, thus wasting energy. The system also allows any energy created to be used by the facility, with any surplus energy sold back to the national grid; therefore any downtime is costly in respect of not just wasted energy but also potential income from surplus energy. Sometimes specialist maintenance crews need to be brought in to deal with the problem. A typical example of a breakdown which took 7 days to repair and replace all bags is demonstrated below.
• Loss of production per hour: £1,500
• Burn of gases per hour: £238
• Loss of export of energy per hour: £26
• Cost of maintenance personnel per hour: £28
• Cost of supervisor per hour: £36
• Cost of replacement filters (1,435): £40,180
• Jetting crew: £710
• Jetter hire: £300
• Cherry picker hire: £2,500
This gives a total cost of £350,794 for a breakdown resulting in 1,435 filters being replaced, affecting one MUF for a period of 7 days.
3.2 Establishing a delay-time analysis
In order to establish a delay-time analysis for this example, several parameters need to be known. The parameters used in this example are as follows:
• Arrival rate of a defect, k_f: 0.28 per day
• Mean time between failures (MTBF): 3 years
• Downtime for an inspection, d: 0.1 days
• Downtime for a breakdown repair, d_b: 7 days
• Breakdown repair cost, Cost_B: £350,794
• Inspection repair cost, Cost_IR: £5,000
• Inspection cost, Cost_i: £67
Applying the parameters to equation (6), it is possible to establish an inspection interval where minimum downtime is the primary concern, as illustrated in Figure 2.
Figure 2. Optimal inspection period based on minimum downtime D(T)
As illustrated in Figure 2, the optimal inspection interval based on minimum downtime D(T) is 14 days. When the cost C(T) is the primary concern, the optimal inspection interval is 11 days, with a cost of £940, as shown in Figure 3. If the inspection interval were moved to 14 days, in line with minimum downtime, the cost would rise to £977, a nominal increase of £37.
Figure 3. Optimal inspection period based on minimum cost C(T)
4. Validation
In order to analyse the effect of change on the results of D(T) and C(T), a sensitivity analysis was carried out on each model, varying certain input data by 5% and 10%, with the following results.
4.1 Validation of D(T)
The optimal inspection interval remains very close to the original interval given an increase or decrease of 5% or 10%. The sensitivity analysis for D(T) is shown graphically in Figure 4.
Figure 4. A graphical representation of the sensitivity analysis for D(T)
4.2 Validation of C(T)
A sensitivity analysis was carried out on the cost of an inspection repair and the cost of an inspection, in order to analyse the effect of a change in these costs. The cost of an inspection repair and of an inspection was increased and decreased by 5% and 10%. The sensitivity analysis is shown graphically in Figure 5. The optimal inspection interval remains very close to the original interval given an increase or decrease of 5% or 10%.
Figure 5. A graphical representation of the sensitivity analysis for C(T)
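A sketch of the sweep behind Figures 2 and 3. The unit conventions (λ = 1/MTBF, with the MTBF of 3 years taken as 1095 days) are our assumptions, so the computed optima need not reproduce the reported 14-day and 11-day intervals exactly:

```python
import numpy as np

def b(T, lam):  # equation (4), closed form for exponential delay-times
    return 1.0 - (1.0 - np.exp(-lam * T)) / (lam * T)

def D(T, lam, d, db, kf):  # equation (5)
    return (d + kf * T * b(T, lam) * db) / (T + d)

def C(T, lam, d, kf, cost_b, cost_ir, cost_i):  # equation (7)
    bT = b(T, lam)
    return (kf * T * (cost_b * bT + cost_ir * (1.0 - bT)) + cost_i) / (T + d)

lam = 1.0 / (3 * 365)        # assumed: per-day failure rate from 3-year MTBF
kf, d, db = 0.28, 0.1, 7.0   # defect arrivals/day; inspection, repair downtime
cost_b, cost_ir, cost_i = 350_794.0, 5_000.0, 67.0

Ts = np.arange(1.0, 42.0)    # candidate inspection intervals, in days
print("T minimising D(T):", Ts[np.argmin(D(Ts, lam, d, db, kf))])
print("T minimising C(T):", Ts[np.argmin(C(Ts, lam, d, kf, cost_b, cost_ir, cost_i))])
```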
5. Discussion
It has been demonstrated in this case study that an optimal inspection interval of 14 days, based on minimum downtime D(T), can be established using the delay-time analysis technique. Using minimum cost C(T) as the criterion, an inspection interval of 11 days with a cost of £940 was calculated. Current practice at the company is a weekly inspection interval involving a flame check and a cloth check. It can be argued that this inspection interval could move to a two-weekly interval, but given the nature of the two inspection checks and the fact that they do not stop production, a weekly inspection interval appears reasonable.
6. Conclusion
This case study looked at a company in the UK producing carbon black. The paper demonstrates the use of the delay-time concept for minimising downtime and costs, setting inspection intervals to achieve this. Information was gathered from historical data as well as expert judgement, with parameters established from this information in order to develop the delay-time models.
Acknowledgements
The authors wish to thank Mr G. Wright and Mr A. Whitehead for their kind help in providing data and other necessary information.
References
Arthur, N. (2005) Optimization of vibration analysis inspection intervals for an offshore oil and gas water injection pumping system, J. of Process Mechanical Engineering, 219, Part E, 251-259
Christer, A. H. and Waller, W. M. (1984) Delay-time models of industrial inspection maintenance problems, J. of the Operational Research Society, 33, 401-406
Christer, A. H., Lee, C. and Wang, W. (2000) A data deficiency based parameter estimating problem and case study in delay-time PM modelling, International J. of Production Economics, 67, 63-76
Christer, A. H., Wang, W. and Baker, R. D. (1995) Modelling maintenance practice of production plant using the delay time concept, IMA J. of Mathematics Applied in Business and Industry, 6, 67-83
Christer, A. H., Wang, W., Choi, K. and Sharp, J. (1998a) The delay-time modelling of preventive maintenance of plant given limited PM data and selective repair at PM, IMA J. of Mathematics Applied in Business and Industry, 15, 355-379
Christer, A. H., Wang, W., Sharp, J. and Baker, R. D. (1998b) A case study of modelling preventative maintenance of a production plant using subjective data, J. of the Operational Research Society, 49, 210-219
Leung, F. and Kit-leung, M. (1996) Using delay-time analysis to study the maintenance problem of gearboxes, International J. of Operation and Production Management, 6 (12), 98-105
Pillay, A. and Wang, J. (2003) Technology and safety of marine systems, Elsevier Science Publishers Ltd., Essex, UK, ISBN: 0 08 044148 3, 149-164, 179-199
Pillay, A., Wang, J., Wall, A. D. and Ruxton, T. (2001) A maintenance study of fishing vessel equipment using delay-time analysis, J. of Quality in Maintenance Engineering, 7 (2), 118-127
International Carbon Black Association (ICBA) (2004) Carbon black user's guide: safety, health and environmental information
The Environmental Protection Act 1990, Chapter 43
Integrated pollution prevention control (IPPC), European Commission (2006), http://ec.europa.eu/environment/ippc
OREDA (2002) Offshore reliability data, 4th edition, SINTEF Industrial Management

A preventive maintenance decision model based on a MCDA approach
Rodrigo José Pires Ferreira, Cristiano Alexandre Virgínio Cavalcante, Adiel Teixeira de Almeida
Federal University of Pernambuco, Caixa Postal 7462, Recife - PE, 50.630-970, Brazil. rodrigo@ufpe.br, cristiano@ufpe.br, aalmeida@ufpe.br
Abstract: Maintenance techniques have an essential role in keeping systems available, especially in the competitive and expanding environment of the services sector.
Failures have several negative implications, and the cost minimization model frequently used in a manufacturing context is not seen as efficient for many maintenance problems in a service context. Quality of service is related to customer perception, the monetary value of which is difficult to estimate. This paper deals with the problem of replacement in service production systems. A maintenance multicriteria decision aiding model is proposed, based on integrating the PROMETHEE method and the Bayesian approach. This allows decision makers to establish replacement intervals. A numerical application illustrates the proposed decision model and shows the model's effectiveness regarding the decision maker's preferences.
Keywords: multicriteria decision, preventive maintenance, maintenance policies
1. Introduction
Maintenance management has a vital role in organizations, and has been seen as increasingly important over recent years. In the early-to-mid-twentieth century, maintenance was characterized as a predominantly corrective activity, which became more complex from the middle of the century as a result of industrialization. Later, at the end of the century, maintenance became more critical and important because of automation. The maintenance management area developed into various segments, such as Preventive Maintenance, Condition Monitoring, Reliability Centered Maintenance (RCM), Total Productive Maintenance (TPM) and several applications of Operations Research (OR) (Dekker & Scarf (1998)). Given today's high degree of competitiveness, the efficiency of maintenance management represents an element of competitive differentiation for companies, both because of the high capital involved and because of its direct relationship with the quality of products and services. In the goods sector, inefficient maintenance management can directly result in loss of production, excessive use of overtime, extra hiring of staff, extra stocks and other damage. In the services sector, inefficient maintenance management can be critical and cause irreversible losses. Among the reasons for this are that a service is delivered at the moment it is produced and cannot be stored. The scale of the consequences in this sector can be visualized when lives are put at risk should equipment failures occur, for example during serious medical surgery. Other examples are failures in air transportation equipment and failures in the distribution of electricity, which cause several direct and indirect consequences of great impact. These consequences are difficult to quantify in monetary units, but they must be taken into consideration and evaluated adequately in each context. A service not delivered in a satisfactory way because of the non-availability of equipment can generate customer dissatisfaction. Customers will then change their perception of the service given and will probably stop requesting services from that company in future. Preventive maintenance consists of actions that, in an attempt to prevent the occurrence of failures, anticipate them by substituting parts of the system; in the terminology used in this paper, this refers to the plan to substitute equipment or parts that would otherwise fail in operation, unless a substitution is made in time. In this context, preventive maintenance is appropriate for equipment whose failure rate increases with use (Glasser (1969), Barlow & Proschan (1965) and Barlow & Proschan (1975)).
According to Lam (2006), preventive repair is usually adopted to improve a system's reliability and to operate it more economically. In many cases, such as in a hospital or a steel manufacturing complex, a cut in electricity supply may cause a serious catastrophe. Preventive repair is a very powerful measure, since it will extend the system's lifetime and raise reliability at a lower cost rate. This paper presents a proposal for a decision model in the preventive maintenance area, the objective of which is to determine intervals of preventive maintenance in order to obtain a higher return from the decision with regard to system reliability and the expected cost of the maintenance policy. Analysing reliability and cost simultaneously allows a policy of preventive maintenance to be established under the multicriteria decision aiding methodology, based on the PROMETHEE method proposed by Brans & Vincke (1985) combined with Bayesian reliability analysis (Martz & Waller (1982)). This proposal allows decision maker preferences to be dealt with in an appropriate way. The Bayesian approach is used in applying the model in order to overcome the absence of failure data. Many authors have described different models for Bayesian analysis applied to preventive maintenance. Brint (2000) considers a model that allows the interval between preventive maintenance actions to be extended for assets where failures can be catastrophic. Makis & Jardine (1992) consider a model for optimal replacement, the purpose of which is to specify simple rules for substitution in order to minimize the long-run expected average cost per unit of time. Mauer & Ott (1995) observe the importance and the effect of uncertainties on costs, following a similar line to the previous authors; they present a model the objective of which is to obtain the optimal time between substitutions while taking into account uncertainty about the cost. Percy & Kobbacy (1997, 2000) consider stochastic models where data are still available, emphasizing the intervention of preventive maintenance in order to prevent system failure. Silver & Fiechter (1995) deal with the problem of periodic maintenance using a Bayesian analysis approach combined with heuristic procedures. Within the context of programmed maintenance, there are works that use the multicriteria decision aiding approach. Quan (2007) models a multi-objective problem and uses evolutionary algorithms to solve it, introducing a form of utility theory to find Pareto optimal solutions. Kralj & Petrovic (1995) use a multi-objective approach. Gopalaswamy et al. (1993) propose a multi-criteria decision model in which three criteria are considered: the minimum replacement cost rate, the maximum availability and the reliability of the base component. Almeida (2004) presents a model based on multi-attribute utility theory. Lotfi (1995) presents a model based on multiple objective mixed linear programming. Chareonsuk et al. (1997) use the multi-criteria methodology PROMETHEE to establish the interval between maintenance actions, taking into account two important criteria: cost per unit of time and reliability. Section 2 of the paper describes the problem and its features. Section 3 develops hypotheses concerning preventive maintenance and decision maker preferences for developing the decision model.
Section 4 presents a numerical application illustrating the results of the analysis of the proposed methodology, and Section 5 provides a discussion of the results.
2. The problem
In this section, the principal characteristics of the preventive maintenance problem are presented, including the basic structure and the context of the problem. Initially, the problem considers that preventive maintenance can be applied to a piece of equipment, a component or a system. The policy of replacement by age is a procedure that consists of replacing an asset by a reserve one at the moment it fails or when it reaches a lifetime T (the replacement age). The reserve asset is subject to the same rules as the asset being replaced. This policy is only effective if replacement before failure costs less than replacement on failure, thus providing some savings compared with replacement due to failure. The main issue in this replacement policy is to determine the age at which an asset should be replaced at the lowest cost per unit time of use. In many cases, cost represents the most important or only aspect of decision maker preferences. This situation is frequently seen in the goods production sector. However, in the services sector, the decision maker can show a preference for minimizing undesirable consequences which are difficult to measure in financial units due to their complexity. According to Almeida (2004), in this context, the customer is in direct contact with the production system. The output is produced while the customer is being served. Therefore, the product received by the customer is affected by problems due to failures in the production system. As a result, losses due to failures, or interruptions for preventive maintenance, cannot simply be counted in monetary form. In the future, the consequences of interruptions in the service can affect the wish of the customer to make a contract with that supplier, or to cancel the current contract. In this case, the consequences of failures cannot be transformed into costs. The objectives for such a system endeavour to reduce costs as part of a mix with other objectives, such as availability, reliability of the production system, the time during which the system is interrupted, and quality of the service.
3. The decision model
The objective of the decision model is to determine the frequency of preventive maintenance so as to make the best of this decision with regard to decision maker preferences, given the possible difficulties in estimating the probability distribution function of failures. A Bayesian reliability approach has frequently been applied in such cases (Martz & Waller (1982)). Under a multicriteria approach, the set of alternatives for this problem is represented by the times T at which the activity of preventive maintenance may be carried out. In other words, the decision maker wants to evaluate the alternative times and to determine the one that is best adapted to his preferences, together with a recommendation as to the best time for preventive maintenance to take place. The timing of replacement determines both the reliability and the structure of maintenance costs. The criteria considered in the decision model are the expected cost of maintenance, Cm(T), and reliability, R(T), assuming that the decision maker wishes to take these two criteria into consideration simultaneously, instead of just considering the expected cost of maintenance, Cm(T).
The decision model should take into account the decision maker's subjective judgements, because, for each context, the criteria can require different levels of importance. For instance, if the consequence of the failure of a piece of hospital equipment is associated with deaths, the relative importance given to criterion R(T) may well be higher than in other contexts. In the literature, there are few applications of multicriteria decision support methodology in the area of programmed maintenance. This paper presents a multicriteria approach which considers the treatment of uncertainties related to maintenance data. There is a diversity of texts that tackle the problem of equipment replacement, with very different focuses and considering several aspects. However, a common feature of such works is the use of the optimization paradigm, where only one objective function is considered. The hypotheses of the model are:
- The set of alternatives is discrete; in other words, there is a finite number of alternative replacement times;
- The equipment is subject to wear; in other words, the equipment presents an increasing rate of failure;
- The replacement of a piece of equipment or part gives the system a good-as-new performance;
- The times to failure of the equipment can be modeled by a probability distribution;
- The parameters of the distribution can be elicited from specialist knowledge.
The main factors analyzed in the choice of a multicriteria decision support method are: the problem analyzed, the context considered in the problem, the structure of the decision maker's preferences, and the problematic. In this problem, the structure of the decision maker's preferences is assumed to be non-compensatory, because of the difficulty of establishing a trade-off between the two criteria. Therefore, the outranking concept should be used instead of the additive aggregation methods that consider a single synthesis function. The context of the problem justifies the choice of method. The problem is framed within the problematic of choice defined by Roy (1996). Fast use, easy interpretation by the decision maker and a flexible comparison process were fundamental factors in choosing the method. As a result of the features presented above, PROMETHEE was chosen as the multicriteria decision support method. The PROMETHEE (Preference Ranking Organization Method for Enrichment Evaluation) method consists of building and exploiting an outranking relation (Vincke (1992) and Brans & Mareschal (2002)). The methods of the PROMETHEE family are used in multicriteria problems of the type:

Max { f1(x), f2(x), ..., fk(x) | x ∈ A }    (1)

where A is a finite set of n potential actions and f_j(.), j = 1, 2, ..., k, are k criteria, each a mapping from A to the set of real numbers. Each criterion has its own units, and there is no restriction when certain criteria are to be maximized and others minimized. The method allows decision maker preferences for each attribute to be taken into consideration and modeled by a different function for each attribute, called a generalized criterion. The performance of each alternative is then evaluated for each criterion and, subsequently, the alternatives are compared by observing the differences in performance, modeled by the generalized criterion functions.
By using the procedures established for the method, the scores of the alternatives are calculated, and the best alternative is the one that attains the highest score. In addition to the multicriteria methodology adopted in order to deal with the two criteria Cm(T) and R(T) simultaneously, the approach is combined with Bayesian analysis in order to support situations where failure data are absent.

An interesting aspect of this problem is the conflict between the criteria. Reliability and maintenance cost are in conflict from the time origin up to the minimum point of the function Cm(T), denoted by Cm*(T). For any alternative time beyond Cm*(T), an increase in the function Cm(T) is accompanied by a decrease in the function R(T), which is a characteristic of dominance. Thus, the alternatives beyond this point are dominated by the alternative times prior to it and can be ignored. These aspects are shown in Figure 1.

Figure 1. Reliability and Cost of Maintenance

The estimate of the failure distribution should be obtained from historical data, from value judgements, or from a combination of both. Although the probability distribution of failures can be obtained directly from the data, there is difficulty in associating the data with the information necessary for planning replacement, and in addition the data sample is often not sufficient to obtain a reasonable estimate of the probability of failure. An alternative approach for obtaining a model that describes the failure behaviour of the equipment over time is to assume a failure distribution and then estimate its parameters. This approach is justified by the viability of using an appropriate mathematical model. The Weibull distribution was selected to model the distribution of the times between failures, since it is useful in a variety of applications, particularly for modelling the life of devices. Besides being commonly used to model equipment failures, the Weibull distribution is also flexible and can represent several types of data (Nelson (1982), Weibull (1951)). The failure density function of the Weibull distribution is:

f(x) = (β/η) · (x/η)^(β−1) · exp[ −(x/η)^β ]    (2)

When data are absent and consequently there is uncertainty about both parameters of this distribution, specialist knowledge can be used through the concept of subjective probability. The parameters of the Weibull distribution are treated as random variables θ1 and θ2, whose distributions π(θ1) and π(θ2) are elicited from specialist knowledge of these variables (Raiffa (1970)). In order to deal with the uncertainty about the parameters η and β, the computation of the reliability criterion, R(t), uses the prior distributions π(η) and π(β), which themselves follow the Weibull distribution. This criterion is denoted Z1(T) and is calculated for each alternative time according to the following equation:

Z1 = E[R(t)] = 1 − ∫_{−∞}^{t} ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} π(β) π(η) f(x) dβ dη dx    (3)

The second criterion, Z2(T), is the cost of maintenance, Cm(T), in accordance with the following expression:

Z2 = Cm(t) = [ ca·(1 − E[R(t)]) + cb·E[R(t)] ] / [ ∫_{−∞}^{t} ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x·π(β) π(η) f(x) dβ dη dx + t·E[R(t)] ]    (4)
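As a rough illustration of how equations (3) and (4) can be evaluated, the expectations over π(β) and π(η) may be estimated by Monte Carlo sampling instead of triple integration. The Python sketch below reads the prior parameters given later in Table 1 (π(β): 5.40, 3.15; π(η): 1.80, 6000) as Weibull shape/scale pairs; that reading, and all function and variable names, are our assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_priors(n):
    """Draw (beta, eta) from the priors. Table 1's parameters are read
    here as Weibull (shape, scale) pairs, an assumption on our part."""
    beta = 3.15 * rng.weibull(5.40, size=n)    # pi(beta)
    eta = 6000.0 * rng.weibull(1.80, size=n)   # pi(eta)
    return beta, eta

def criteria(T, ca=1000.0, cb=200.0, n=200_000):
    """Monte Carlo estimates of Z1 = E[R(T)] (eq. 3) and Z2 = Cm(T) (eq. 4)."""
    beta, eta = sample_priors(n)
    x = eta * rng.weibull(beta)                # failure times ~ f(x | beta, eta)
    er = np.mean(x > T)                        # expected reliability at age T
    cycle = np.mean(np.where(x <= T, x, 0.0)) + T * er   # E[min(X, T)]
    return er, (ca * (1.0 - er) + cb * er) / cycle

for T in (500, 1200, 2400):
    z1, z2 = criteria(T)
    print(f"T={T:4d}  Z1={z1:.4f}  Z2={z2:.5f}")
```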
4. Numerical application

This section presents a numerical application in order to illustrate the model presented in the previous section. A hypothetical example is generated based on values close to the reality of a company. The application is carried out for a given piece of equipment with a view to planning its preventive maintenance. The age-replacement policy is suggested in accordance with the features of the equipment, and the objective of the model is to determine the most appropriate time between replacements. Information on failure periods and on the costs of replacement before failure (cb) and after failure (ca) is necessary for the application of this policy. These values are presented in Table 1.

ca = 1000    cb = 200

Prior    β       η
π(β)     5.40    3.15
π(η)     1.80    6000
Table 1. Parameters for the decision model

The alternative times were generated over the interval from 500 to 3000 days, in steps of 100 days. After generating the alternatives, their performances on the two criteria were calculated; these are shown in Table 2. The alternative of 2400 days is optimal for criterion Z2(T); the alternatives with times greater than 2400 days are dominated and are therefore ignored.

T     Z1(T)     Z2(T)        T     Z1(T)     Z2(T)
500   0.98008   0.87698      1800  0.78588   0.47435
600   0.97010   0.76334      1900  0.76823   0.47141
700   0.95862   0.68669      2000  0.75051   0.46926
800   0.94599   0.63242      2100  0.73279   0.46771
900   0.93244   0.59265      2200  0.71507   0.46672
1000  0.91809   0.56283      2300  0.69741   0.46613
1100  0.90306   0.54007      2400  0.67985   0.46590
1200  0.88745   0.52250      2500  0.66239   0.46598
1300  0.87135   0.50884      2600  0.64509   0.46628
1400  0.85482   0.49817      2700  0.62795   0.46679
1500  0.83794   0.48982      2800  0.61101   0.46747
1600  0.82079   0.48330      2900  0.59427   0.46829
1700  0.80341   0.47825      3000  0.57779   0.46917
Table 2. Performances of the alternatives

In this context, through an interactive process between the decision maker and the decision analyst, the generalised criterion function (Fj(·)) is determined in order to model the decision maker's behaviour with respect to the size of the differences (dj(·)) between the evaluations on each criterion (fj(·)). In this way, the indifference and preference thresholds are estimated. Table 3 shows the type of generalised criterion function used and its respective parameters.

                          Z1(T)      Z2(T)
Objective                 Maximise   Minimise
Weight                    0.42       0.58
Preference function       Linear     Linear
Indifference threshold    0.030      0.015
Preference threshold      0.100      0.200
Table 3. Parameters for the decision model

The PROMETHEE II method is used, which, through the difference between the positive and negative flows, establishes a complete pre-order among the alternatives evaluated. The application of the multicriteria decision support method thus generates a ranking of the alternatives from best to worst; see Table 4. According to PROMETHEE II, 1200 days is the model's solution. A sensitivity analysis on the criteria weights was carried out: the weight of criterion Z1(T) can vary between 34.33% and 56.88% without affecting the model's solution.

T     Φ+      Φ−      Φ          T     Φ+      Φ−      Φ
1200  0.3054  0.1214   0.1841    800   0.3237  0.3182   0.0055
1100  0.3136  0.1317   0.1820    1900  0.2072  0.2381  -0.0309
1300  0.2960  0.1256   0.1704    2000  0.1915  0.2608  -0.0693
1000  0.3204  0.1635   0.1569    2100  0.1808  0.2833  -0.1026
1400  0.2854  0.1352   0.1502    700   0.3247  0.4441  -0.1193
1500  0.2731  0.1499   0.1231    2200  0.1749  0.3056  -0.1307
900   0.3223  0.2224   0.0998    2300  0.1742  0.3277  -0.1536
1600  0.2586  0.1687   0.0898    2400  0.1745  0.3496  -0.1751
1700  0.2427  0.1919   0.0508    600   0.3188  0.5129  -0.1940
1800  0.2254  0.2152   0.0102    500   0.3169  0.5641  -0.2473
Table 4. Results
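The PROMETHEE II computation itself is compact. The following sketch uses the Table 3 weights and thresholds and, for brevity, only a subset of the Table 2 alternatives; it is an illustration of the method, not the authors' implementation, and its flows will differ slightly from Table 4 because of the reduced alternative set.

```python
import numpy as np

# Alternatives (replacement ages, days) and their performances on the two
# criteria, taken from Table 2 (subset only, for brevity).
T  = np.array([500, 800, 1100, 1200, 1300, 1700, 2400])
Z1 = np.array([0.98008, 0.94599, 0.90306, 0.88745, 0.87135, 0.80341, 0.67985])
Z2 = np.array([0.87698, 0.63242, 0.54007, 0.52250, 0.50884, 0.47825, 0.46590])

def linear_pref(d, q, p):
    """Linear (type V) preference function with indifference threshold q
    and preference threshold p."""
    return np.clip((d - q) / (p - q), 0.0, 1.0)

# (values, weight, q, p, sign): sign = +1 maximise, -1 minimise (Table 3)
criteria = [(Z1, 0.42, 0.030, 0.100, +1),
            (Z2, 0.58, 0.015, 0.200, -1)]

n = len(T)
pi = np.zeros((n, n))                     # aggregated preference indices
for vals, w, q, p, sign in criteria:
    d = sign * (vals[:, None] - vals[None, :])   # pairwise differences
    pi += w * linear_pref(d, q, p)

phi_plus = pi.sum(axis=1) / (n - 1)       # positive outranking flow
phi_minus = pi.sum(axis=0) / (n - 1)      # negative outranking flow
phi = phi_plus - phi_minus                # PROMETHEE II net flow

for t, f in sorted(zip(T, phi), key=lambda z: -z[1]):
    print(f"T={t:5d}  phi={f:+.4f}")
```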
5. Conclusions

This work presents a multicriteria decision model for preventive maintenance planning. The model deals with the periodicity of replacement of an item on the basis of more than one criterion and in the absence of failure data, in order to give the decision maker appropriate support in determining the most opportune time to replace the item. In addition, the literature review showed that preventive maintenance, and replacement policies in particular, deserve further investigation: important works were identified which tackle equipment replacement from very different viewpoints. In conclusion, the proposed model addresses one of the great concerns of maintenance functions in organisations that have large sums tied up in the fixed assets of their production plants.

Acknowledgements

This work is part of a research study funded by the Brazilian Research Council (CNPq).

References

Almeida, A. T. de and Cavalcante, C. A. V. (2004) Multicriteria decision approaches for selection of preventive maintenance intervals, MIMAR 2004 - 5th IMA International Conference on Industrial Maintenance and Reliability, Salford
Barlow, R. E. and Proschan, F. (1965) Mathematical Theory of Reliability, John Wiley & Sons
Barlow, R. E. and Proschan, F. (1975) Reliability and Life Testing: Probability Models, Holt, Rinehart and Winston
Brans, J. P. and Vincke, P. (1985) A preference ranking organisation method (The PROMETHEE method for multiple criteria decision-making), Management Science, 31 (6), 647-656
Brans, J. P. and Mareschal, B. (2002) Promethee-Gaia, une Méthodologie d'Aide à la Décision en Présence de Critères Multiples, Editions Ellipses, Bruxelles
Brint, A. T. (2000) Sequential inspection sampling to avoid failure critical items being in an at risk condition, J. of the Operational Research Society, 51 (9), 1051-1059
Chareonsuk, C., Nagarura, N. and Tabucanona, M. T. (1997) A multicriteria approach to the selection of preventive maintenance intervals, International J. of Production Economics, 49 (1), 55-64
Dekker, R. and Scarf, P. A. (1998) On the impact of optimisation models in maintenance decision making: the state of the art, Reliability Engineering and System Safety, 60, 111-119
Glasser, G. J. (1969) Planned replacement: some theory and its application, J. of Quality Technology, 1 (2), 110-119
Gopalaswamy, V., Rice, J. A. and Miller, F. G. (1993) Transit vehicle component maintenance policy via multiple criteria decision making methods, J. of the Operational Research Society, 44 (1), 37-50
Kralj, B. and Petrovic, R. (1995) A multiobjective optimization approach to thermal generating units maintenance scheduling, European J. of Operational Research, 84 (2), 481-493
Lam, Y. (2006) A geometric process maintenance model with preventive repair, European J.
of Operational Research, in press
Lotfi, V. (1995) Implementing flexible automation: a multiple criteria decision making approach, International J. of Production Economics, 38 (2-3), 255-268
Martz, H. F. and Waller, R. A. (1982) Bayesian Reliability Analysis, John Wiley & Sons, New York
Makis, V. and Jardine, A. K. S. (1992) Optimal replacement in the proportional hazards model, INFOR, 30 (2), 172
Mauer, D. C. and Ott, S. H. (1995) Investment under uncertainty: the case of replacement investment decisions, J. of Financial and Quantitative Analysis, 30 (4), 581-605
Nelson, W. (1982) Applied Life Data Analysis, Wiley & Sons
Percy, D. F. and Kobbacy, K. A. H. (2000) Determining economical maintenance intervals, International J. of Production Economics, 67 (1), 87-94
Percy, D. F., Kobbacy, K. A. H. and Fawzi, B. B. (1997) Setting preventive maintenance schedules when data are sparse, International J. of Production Economics, 51 (3), 223-234
Quan, G., Greenwood, G. W., Liu, D. and Hu, S. (2007) Searching for multiobjective preventive maintenance schedules: combining preferences with evolutionary algorithms, European J. of Operational Research, 177, 1969-1984
Raiffa, H. (1970) Decision Analysis, Addison-Wesley
Roy, B. (1996) Multicriteria Methodology for Decision Aiding, Kluwer Academic Publishers
Silver, E. A. and Fiechter, C. N. (1995) Preventive maintenance with limited historical data, European J. of Operational Research, 82 (1), 125-144
Weibull, W. (1951) A statistical distribution function of wide applicability, J. of Applied Mechanics, 18, 293-297

Predicting the performance of future generations of complex repairable systems, through analysis of reliability and maintenance data

T. J. Jefferis* 1, N. Montgomery 2, T. Dowd 3
1. Defence Science and Technology Laboratory, Farnborough, UK
2. Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
3. Department of Aerospace Engineering, University of Bristol, Bristol, UK
tjjefferis@dstl.gov.uk

Abstract: Can the engineering performance of future generations of aircraft be predicted from current and historic data? In this paper, data on the operation of sixteen types of Royal Air Force military aircraft, gathered between 1983 and 2002, are analysed. The usage, failure rates and maintenance effort are considered for individual types, and comparisons are made between types with nominally similar operational profiles and between aircraft of different ages. Aircraft size and production cost are investigated as potential predictors of maintenance burden, with encouraging results.

1. Introduction

For many years the Royal Air Force (RAF) has centrally collected data on the usage, reliability and associated corrective maintenance burden of the fixed wing and rotary wing aircraft that it operates. From 1970 this was achieved through a system named (rather unoriginally) the Maintenance Data System (MDS). In common with most systems of that age, MDS required manual completion of record cards for each relevant fault and the related maintenance actions, which were forwarded to a central processing cell for input. The capture of usage data followed a similar process. MDS was very well designed and has proved to be a useful and flexible system for over 30 years, but advances in both hardware and software mean that a similar system could now be implemented more flexibly and effectively using different techniques.
At present the capture of platform usage and maintenance data is being transferred to a new system, the Logistics Information Technology System (LITS). This switch-over began in 2003, and various changes in the scope of the faults which are reported mean that comparisons between MDS and LITS data are problematic. Because of this difficulty, and also because of the potential for data to have been lost during the switch between the two systems, it was decided to limit this study to the data held on MDS.

The data stored in MDS and LITS are widely used by the project teams responsible for the management of individual aircraft types to examine fleet usage issues, to prioritise options for improving aircraft availability and for other management tasks. However, as far as is known, the subject data have never been used to address the broader issues raised in this paper. Recent publications in the area of fleet-wide reliability and maintainability have tended to focus on new reliability metrics such as maintenance-free and failure-free operating periods; see Kumar (1999) and Hockley & Appleton (1997), for example. These studies focus on the design requirements of new aircraft. We have found no published examples of using historical data to examine overall trends in reliability.

The data used in this study were for the calendar years 1983-2002 (inclusive) and cover sixteen types of fixed wing aircraft of various vintages (see Table 1). The first six of these aircraft types (Tornado IDS to Canberra) can be classified as 'Fast Jets' and represent the RAF's front line aircraft.

Aircraft Type      First Flight of Production   Year of Entry    Production Cost/Unit,      Max Take Off
                   Standard Aircraft            to RAF Service   relative to Tornado IDS    Weight (kg)
Tornado IDS (1)    1977                         1982             1                          28000
Tornado ADV (2)    1984                         1987             1.29                       27896
Harrier II (3)(4)  1987                         1987             0.76                       14061
Jaguar             1972                         1973             0.50                       15700
Phantom            1960                         1968             0.71                       28030
Canberra           1950                         1951             0.36                       24198
Hawk               1975                         1976             0.26                       9100
Tucano             1983                         1988             0.07                       3275
Jetstream          1968                         1973             0.27                       6950
Dominie (5)        1963                         1965             0.33                       12700
Hercules K (6)     1964                         1967             0.82                       70300
Nimrod (7)         1967                         1969             2.31                       87090
Sentry             1975                         1991             n/k                        156150
Tristar            1971                         1984             1.75                       225000
VC-10              1964                         1981             3.15                       151900
Table 1. Aircraft Types, Relevant Dates, Cost and Weight
Notes: (1) Interdictor Strike, GR1 and GR4. (2) Air Defence Variant, F2 and F3. (3) Based upon the HS Harrier, first production flight 1969. (4) GR5/GR7/GR9 standard. (5) Version of the HS.125 business jet; the current production version is named Raytheon Hawker. (6) Based upon the Lockheed Hercules A, first production flight 1956. (7) Based upon the de Havilland Comet 4, first flight 1958.

The next four types (Hawk to Dominie) can be classified as 'Trainers' and represent the aircraft used by the RAF (the data also include Royal Navy Jetstreams) as part of its training processes (generally of aircrew). The final five types (Hercules K to VC-10) can be classed as 'Large Aircraft', which include tankers, transport aircraft and patrol aircraft.

2. Data gathering

As MDS contains detailed data on every unscheduled maintenance action undertaken on the subject aircraft, a high degree of resolution can be achieved where the situation warrants it, limited only by the integrity of the underlying data. However, as this particular study was looking for long-term trends, the data for each aircraft type were grouped into six-month periods, 1 January - 30 June and 1 July - 31 December, for each calendar year.
Within these periods the following relevant data elements were extracted:
• Aircraft type
• Total fleet flying hours
• Arisings per flying hour (arisings are events that require unscheduled maintenance actions, i.e. faults)
• 1st, 2nd, 3rd/4th line and total maintenance man-hours per flying hour (MMH/FH); during the data collection period RAF maintenance was divided into four 'lines', with 1st and 2nd line undertaken at the operating bases and 3rd and 4th line at depot or in industry

More detailed data have also been extracted for the period 1997-2005, showing the contribution of different sub-systems to the arising, no-fault-found and adverse-operational-effect rates. It is hoped that, in due course, this will be utilised to further investigate the causes of differences in maintenance burden between aircraft types.

As discussed below, during analysis of the MDS data it was found desirable to identify potential surrogate measures of system complexity, ideally measured against the current 'state of the art' when the aircraft was designed. Two relatively objective measures were identified: aircraft production cost and aircraft maximum take-off weight, adjusted to constant economic conditions and constant production numbers. All of the data considered in this study are of some sensitivity. Therefore both the MDS data and the cost data have been scaled to conceal the absolute values, whilst maintaining the same relative relationships.

3. Initial data review and analysis

The two measures of maintenance burden contained in the data extracted from MDS are the number of arisings and the effort required to fix them. The relevant values for fast jets are shown in Figures 1 and 2. These two measures must be connected, as the overall maintenance effort (MMH/FH) is the number of arisings multiplied by the average time to repair; it is therefore no surprise that there are similarities between the two figures. In each case the Tornado IDS, Tornado ADV and Phantom have the three highest values, with the Harrier II, Jaguar and Canberra the lowest three. One can also observe that, if the initial entry-into-service upturns and the transfer-to-LITS downturns are excluded, there are no common, time-related trends. This might initially appear surprising, as conventional wisdom is that the maintenance burden of every aircraft increases with time; however, the long-term trend will actually reflect factors such as the relative effectiveness of maintainer 'learning' and of the system improvements undertaken in an attempt to improve reliability and maintainability, as well as the effects of ageing.

Figure 1. Fast Jet Arisings per Flying Hour (actual values normalised to Tornado IDS average = 1)
Figure 2. Fast Jet Maintenance Man Hours per Flying Hour (actual values normalised to Tornado IDS average = 1)

A similar lack of a common trend is seen in the graphs for the fleets of training aircraft and large aircraft, shown in Figures 3 and 4. Investigation shows no relationship between the age of the design (i.e. when the aircraft type first flew) and the maintenance burden for any of the classes of aircraft; however, many of the aircraft fleets show non-zero gradients in the trend of the maintenance burden over time, and these are shown in Table 2. It is possible that differences in the histories of these fleets (e.g. the implementation of modification programmes) may explain some or all of these differences, but this must be left as a future investigation. At present we may simply observe that most aircraft fleets exhibit some significant time-related trend in the maintenance effort required to support them, and that this trend can be increasing or decreasing.

Figure 3. Training Aircraft Maintenance Man Hours per Flying Hour (actual values normalised to Tornado IDS average = 1)

Figure 4. Large Aircraft Maintenance Man Hours per Flying Hour (actual values normalised to Tornado IDS average = 1)

Aircraft        Linear Correlation   p-value      Aircraft        Linear Correlation   p-value
Tornado IDS     -0.455               0.003        RAF Jetstream    0.017               0.915
Tornado ADV      0.365               0.026        All Jetstream    0.607               0.000
Harrier II       0.824               0.000        Dominie         -0.33                0.037
Jaguar          -0.33                0.037        Hercules K      -0.56                0.000
Phantom         -0.233               0.368        Nimrod          -0.833               0.000
Canberra         0.203               0.291        Sentry           0.182               0.443
Hawk            -0.802               0.000        Tristar          0.439               0.009
Tucano          -0.456               0.011        VC10             0.601               0.000
Table 2. Time trends of Maintenance Man Hours per Flying Hour

4. Analysis of aircraft size and complexity

In the absence of sufficient data to explain the time-related trends in maintenance burden, the average value of maintenance man hours per flying hour is now used as a surrogate for each aircraft type's maintenance burden, so that its relationship with aircraft size and complexity, as represented by maximum take-off weight and production cost respectively, can be investigated. As might be expected, the large aircraft, which are mainly modified airliners, do not follow the same relationships as the smaller military aircraft; Figure 5 therefore only includes data on fast jets and trainers. A linear regression on these data gives a good fit, with R² = 0.8002, for the equation:

Normalised average MMH/FH = 3.459×10⁻⁵ MTOW − 0.0628

The relationship between cost and MMH/FH for the same aircraft gives similar results, but in this case the fit is slightly less good, with R² = 0.746. As might be expected, there is a correlation between aircraft weight and cost, which is especially strong for the smaller aircraft. Because of this relationship, the joint regression including both weight and cost provides only a fractionally better predictive relationship (R² = 0.825) than weight alone. However, as prediction of the costs of future aircraft is difficult and contentious, the relationship which only requires weight is preferred.

Figure 5. Max Take Off Weight vs Average MMH per Flying Hour
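The published fit can be used directly to check the predictions quoted in the conclusions. The short sketch below applies the regression equation to the Typhoon MTOW quoted in Section 5 and to the Tornado ADV MTOW from Table 1; the ratio comes out near the paper's 'about 82%' figure, the small difference plausibly reflecting rounding of the published coefficients.

```python
# Published weight-based fit (R^2 = 0.80):
# normalised average MMH/FH = 3.459e-5 * MTOW - 0.0628
SLOPE, INTERCEPT = 3.459e-5, -0.0628

def predicted_mmh_per_fh(mtow_kg: float) -> float:
    """Predicted normalised maintenance man-hours per flying hour."""
    return SLOPE * mtow_kg + INTERCEPT

typhoon = predicted_mmh_per_fh(23_500)       # Typhoon MTOW quoted in Section 5
tornado_adv = predicted_mmh_per_fh(27_896)   # Tornado ADV MTOW from Table 1

print(f"Typhoon     : {typhoon:.3f}")
print(f"Tornado ADV : {tornado_adv:.3f}")
print(f"ratio       : {typhoon / tornado_adv:.2%}")   # ~83%, cf. 'about 82%'
```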
5. Conclusions

Based upon the data obtained from the RAF's Maintenance Data System, it is clear that long-term trends do exist in the maintenance burden associated with maintaining the different aircraft fleets. However it has been found that, perhaps contrary to expectations, some of these trends are positive and some negative. The engineering history of the aircraft fleets will be further investigated to see whether these differences can be explained in terms of modification programmes, maintenance learning, etc.

With the relatively high-level data available it has been possible to derive relationships to predict the maintenance burden of future fast jets and trainer aircraft. On the basis of the best of these, it is predicted that the Typhoon, with its maximum take-off weight of 23,500 kg, should require about 82% of the maintenance of the Tornado ADV, which it replaces. Similarly, it is predicted that the STOVL JSF will require about 33% more maintenance than the Harrier, which it replaces. Unfortunately, the variability in the data makes the prediction intervals rather wide. Naturally these predictions do not take account of any significant changes to construction techniques, such as greater use of composites, or to maintenance practices, such as the incorporation of prognostics, all of which could drastically change the maintenance burden.

References

Hockley, C. J. and Appleton, D. P. (1997) Setting the requirements for the Royal Air Force's next generation aircraft, Annual Reliability and Maintainability Symposium, 1997 Proceedings
Kumar, U. D. (1999) New trends in aircraft reliability and maintenance measures, J. of Quality in Maintenance Engineering, 5, 287-295

Crown Copyright 2007. Published with the permission of the Defence Science and Technology Laboratory on behalf of the Controller of HMSO.

A note on a calibration method of control parameters in simulation models for software reliability assessment

Mitsuhiro Kimura
Department of Industrial & Systems Engineering, Faculty of Engineering, Hosei University, 3-7-2, Koganei-shi, Tokyo 184-8584, Japan
kim@hosei.ac.jp

Abstract: In this study, we develop a calibration method for the control parameters included in a Monte Carlo simulation model which generates time series data for software reliability assessment. The control parameters are calibrated so as to minimize the mean sum of squared errors between the actual observed data and the generated data. As a result, the proposed method enables us to use the simulation model for assessing software reliability based on the observed data. We show several numerical examples of the calibration and of software reliability assessment by the simulation models.

1. Introduction

In the literature on software reliability assessment modelling based on software reliability data (see, e.g., Pham (2000)), one can find many stochastic process models. Needless to say, such stochastic models can give us several useful reliability assessment measures, such as software reliability and MTBF (Lyu (1995)). However, these models also carry several strong assumptions in general. For instance, although nonhomogeneous Poisson process (NHPP) models are widely known as software reliability assessment models, they cannot deal with the occurrence of simultaneous multiple fault removals, which naturally arises in a real software testing phase.
In order to overcome this unnatural assumption of NHPP models (Musa (1998)), compound Poisson process models, for example, have been proposed (Sahinoglu (1992)). However, it is known that the estimation of the unknown parameters of compound Poisson process models is difficult.

Let us explain this in a more general setting. We assume that there is a time series data set (ti, di) (i = 1, 2, 3, ..., I) observed from a non-deterministic system, where di is an outcome from the system when the system time is ti. Usually it is natural to treat di as a realization of some stochastic process at the given time point ti. Therefore, if we can find or develop an appropriate stochastic process model for the data set, we may be able to estimate the unknown parameters included in the model and forecast the future behaviour of (ti, di) (i = n+1, n+2, ...) together with several stochastic properties. However, since some stochastic models require several unrealistic and/or strong assumptions in order to remain mathematically tractable, such models may not describe the real phenomena, or may present difficulties in terms of parameter estimation.

In contrast, Monte Carlo simulation models have more flexibility. Such models can straightforwardly represent relatively complicated phenomena which are too hard for stochastic modelling in general. However, simulation models run in one direction with previously fixed control parameters, and we are only able to obtain the execution results of the simulation. That is, although we can investigate the sensitivity of the model with respect to the control parameters, we usually do not estimate or calibrate the control parameters inversely from the actual observed data. In this study, we propose a method of calibrating the control parameters in Monte Carlo simulation models based on the least squares rule. The models appear in software reliability assessment modelling; in particular, we consider models that simulate the time-behaviour of software failure occurrences during the software testing phase.

2. Method of calibration

Let us consider a Monte Carlo simulation program f (called the simulator in this study, for short) which outputs a time series data set at the given time points ti (i = 1, 2, 3, ..., I). We represent f by f(ti | θ), where θ = {θ1, θ2, θ3, ..., θm} is a vector consisting of the m control parameters included in the simulator. Since the mechanism of the simulator has some stochastic property, one execution of the simulation yields one result vector

f = ( f(t1 | θ), f(t2 | θ), f(t3 | θ), ..., f(tI | θ) )    (1)

If we execute this simulation J times with a fixed θ, we consequently have a set of result vectors

{ f1, f2, f3, ..., fJ }    (2)

One of our main purposes is to calibrate the parameter vector θ so that the simulator emulates well the observation (ti, di) (i = 1, 2, 3, ..., I) defined in Section 1. Let SEj(θ) be the sum of squared errors for the j-th simulation result fj, that is,

SEj(θ) = Σ_{i=1}^{I} ( f(ti | θ) − di )²    (3)

where j = 1, 2, 3, ..., J. Since fj is random, SEj(θ) also fluctuates. Therefore, we take the sample mean MSE(θ) as

MSE(θ) = Σ_{j=1}^{J} SEj(θ) / J    (4)

Consequently, we formulate the calibration problem of θ as

θ̃ = arg min_θ MSE(θ)    (5)

where θ̃ denotes the vector of calibrated parameters obtained. However, since MSE(θ) is not differentiable with respect to θ, finding θ̃ is not an easy task.
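A generic calibration harness for equations (3)-(5) might look as follows in Python. Here simulate is a stand-in for any black-box simulator f(t | θ) (Section 3.1 gives a concrete one), and grid search is one simple way to handle the non-differentiability of MSE(θ); the names and structure are ours, a sketch rather than the author's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(theta, simulate, t, d, J=100):
    """Monte Carlo estimate of MSE(theta) (eq. 4): the average over J runs
    of the sum of squared errors (eq. 3) between simulator output and data."""
    runs = np.array([simulate(t, theta, rng) for _ in range(J)])
    return np.mean(np.sum((runs - d) ** 2, axis=1))

def calibrate(grid, simulate, t, d, J=100):
    """Brute-force search for theta~ (eq. 5) over candidate parameter values;
    MSE is noisy and non-differentiable, so grid search (or a fitted response
    surface, as in the paper) is a natural choice."""
    scores = [mse(theta, simulate, t, d, J) for theta in grid]
    best = int(np.argmin(scores))
    return grid[best], scores[best]
```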
3. Numerical examples

In this section we show several numerical examples. We define the following:
• Sk (k = 1, 2, 3, ..., K) is the realization of the k-th software fault detection time
• K is the number of detected software faults

Table 1 and Figure 1 present the sample data set to be analyzed (denoted DS-1), which is cited from Goel & Okumoto (1979).

Fault number k   Sk (days)    Fault number k   Sk (days)
1                9            14               87
2                21           15               91
3                32           16               92
4                36           17               95
5                43           18               98
6                45           19               104
7                50           20               105
8                58           21               116
9                63           22               149
10               70           23               156
11               71           24               247
12               77           25               249
13               78           26               250
Table 1. Software fault detection time data (DS-1)

Figure 1. Plot of the data set (DS-1)

3.1 One-parameter simulator for DS-1

We assume a simple black-box test (Pham (2000)) for the testing process from which DS-1 was observed. A simulation algorithm for the black-box test is as follows.

Step 1: Prepare a one-dimensional array variable p[1...psize], which represents the program to be tested, where psize is an arbitrarily given positive integer related to the program size of the software system.
Step 2: Set zi = 0 for i = 1, 2, 3, ..., S_K (i.e., I = S_K).
Step 3: For l = 1 to psize, set

p[l] = 1 with probability θ1, 0 with probability 1 − θ1    (6)

where '1' represents a code section including a software fault and '0' a clean one, and θ1 denotes the unreliability per code section.
Step 4: Set i = 1.
Step 5: Choose an integer c at random from the range [1...psize]. If p[c] = 1 then set zi = zi + 1 and p[c] = 0.
Step 6: If ti < S_K then set i = i + 1 and go to Step 5; otherwise go to Step 7.
Step 7: Return f1, i.e., f(ti | θ) = zi (i = 1, 2, 3, ..., S_K).

As a result of the above, we have simulated fault occurrence data (ti, f(ti | θ1)) (i = 1, 2, 3, ..., S_K). Figure 3 illustrates a sample path of f1, where we arbitrarily set psize = 500 and θ1 = 0.15.

Figure 3. Sample path of the one-parameter simulator

3.1.1 Transformation of DS-1

In order to calculate equation (3), we need to transform DS-1 into the appropriate form (ti, di) by following the procedure listed below.

Step 1: Set S0 = 0, j = 1 and k = 1.
Step 2: If k satisfies S_{j−1} ≤ k < S_j, then set t_k = k, d_k = j − 1, k = k + 1 and go to Step 2; otherwise go to Step 3.
Step 3: Set j = j + 1. If j < S_K then go to Step 2, else go to Step 4.
Step 4: Set t_{S_K} = S_K, d_{S_K} = K.

The result of the transformation of DS-1 is presented in Figure 4.

Figure 4. Transformed data of DS-1

3.1.2 Calibration

Now we are ready to calibrate the parameter of f. By letting J = 100, we can obtain MSE(θ1) in equation (4) for a given θ1. We therefore calculated MSE(θ1) for varying values of θ1; the results are plotted in Figure 5, together with a fitted quadratic function. In order to obtain θ̃1 in equation (5), we simply used the fitted quadratic curve. Thus we have θ̃1 = 0.14465 and MSE(θ̃1) = 5485.04. Using this result, Figure 6 depicts the mean behaviour of the calibrated simulator f over 100 iterations with θ̃1, together with 95% confidence intervals for the mean, calculated under the assumption of a normal distribution for the variation around the mean value.

Figure 5. Behaviour of MSE and the fitted quadratic curve

Figure 6. Calibrated one-parameter simulator
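A minimal Python reading of the Steps 1-7 simulator is sketched below. It assumes that z accumulates the cumulative number of faults detected by time ti, which is how the simulator output is compared with the transformed (cumulative) DS-1 data; that reading, and the function name, are ours.

```python
import numpy as np

def simulate_blackbox(t, theta1, rng, psize=500):
    """Sketch of the one-parameter black-box test simulator (Steps 1-7).
    Assumes z_i is the cumulative number of faults detected by time t_i."""
    p = rng.random(psize) < theta1      # Steps 1/3: seed faults w.p. theta1
    z = np.zeros(len(t))
    found = 0
    for i in range(len(t)):             # Steps 4-6: one code probe per step
        c = rng.integers(psize)         # Step 5: pick a code section at random
        if p[c]:
            found += 1
            p[c] = False                # a detected fault is removed
        z[i] = found
    return z                            # Step 7: f(t_i | theta) = z_i

# Example: one sample path over t = 1..250 with theta1 = 0.15, psize = 500
rng = np.random.default_rng(1)
t = np.arange(1, 251)
path = simulate_blackbox(t, 0.15, rng)
print(int(path[-1]), "faults detected by t =", t[-1])
```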
3.2 Two-parameter simulator as an imperfect debugging model

In this section, we extend the one-parameter simulator discussed in Section 3.1 by adding an imperfect debugging factor. There are several software reliability assessment models which take imperfect debugging phenomena into account (Pham (2000)). However, it is known that for almost all imperfect debugging models the estimation of the model parameters is difficult. On the other hand, we can easily extend the one-parameter simulator by replacing Step 5 with Step 5':

Step 5': Choose an integer c at random from the range [1...psize]. If p[c] = 1 then set zi = zi + 1, and set p[c] = 0 with probability 1 − θ2; otherwise go to Step 6 with probability θ2 (0 ≤ θ2 ≤ 1).

In this case, we consider two control parameters θ1 and θ2, where θ1 denotes the unreliability per program code section and θ2 denotes the imperfect debugging rate per debugging activity. The evaluation result of MSE(θ) is illustrated in Figure 7. In order to find (θ̃1, θ̃2), we simply search for the minimum value of MSE(θ) without fitting any curved surface to the plot. The calibration results are (θ̃1, θ̃2) = (0.162, 0.20), with MSE(θ̃) = 3960.57. The fitted simulator and the actual data, with 95% confidence intervals, are plotted in Figure 8.

Figure 7. Behaviour of MSE for the two-parameter model

Figure 8. Calibrated two-parameter simulator (actual data, fitted mean and ±1.96 SD bounds)

As a measure of goodness of fit, we calculate a (pseudo) AIC (Akaike (1974)) for the two models:

AIC (one-parameter simulator) = 774.1
AIC (two-parameter simulator) = 690.7

The AIC indicates that the two-parameter simulator is better than the one-parameter model for DS-1.

3.2.1 Software reliability assessment measures

We propose a quantitative measure for software reliability assessment:

FRj = (number of faults after debugging) / (number of faults before debugging) = [ Σ_{l=1}^{psize} p[l] at t_{S_K} ] / [ Σ_{l=1}^{psize} p[l] at t_{S_0} ]    (7)

We call this ratio the residual fault ratio, FRj, where j = 1, 2, 3, ..., J and J represents the number of iterations of the simulation. By using FRj, we can estimate the number of latent faults in the software before the testing phase, LFj, given by

LFj = K / (1 − FRj)    (8)

where K represents the total number of actually detected software faults given by the data set. More directly, we can assess the fault removal ratio Rj by

Rj = 1 − FRj    (9)

Figures 9 and 10 illustrate the histograms of LFj and Rj with J = 100. Their mean values are E[LF] = 80.95 and E[R] = 0.3291. The estimated fault removal ratio is quite low for DS-1.

Figure 9. Histogram of LF_j (J = 100)

Figure 10. Histogram of R_j (J = 100)
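Equations (7)-(9) are straightforward to compute once each simulation run reports how many seeded faults remain at the end of testing. A small sketch follows, with two hypothetical run outputs chosen only to match the scale of the reported E[LF] and E[R]; the function name and inputs are illustrative, not from the paper.

```python
import numpy as np

def residual_fault_measures(sim_results, K=26):
    """Residual fault ratio FR_j (eq. 7), latent faults LF_j (eq. 8) and
    fault removal ratio R_j (eq. 9) from J simulator runs. Each run is
    assumed to report (faults_left_at_end, faults_seeded_at_start)."""
    fr = np.array([left / seeded for left, seeded in sim_results])
    lf = K / (1.0 - fr)     # latent faults before testing (eq. 8)
    r = 1.0 - fr            # fault removal ratio (eq. 9)
    return fr, lf, r

# Example: two hypothetical runs, 51 of 75 and 49 of 72 seeded faults remaining
fr, lf, r = residual_fault_measures([(51, 75), (49, 72)])
print(lf.round(2), r.round(4))   # values on the scale of E[LF]~81, E[R]~0.33
```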
4. Concluding remarks

This article has proposed a calibration method for the control parameters included in Monte Carlo simulation models, based on the least squares rule. We have applied the method to simple simulation models for software reliability assessment. The proposed method can find the optimal calibrated simulator by the heuristic approach discussed in this study; however, we need to develop an effective method for finding θ̃ when the number of control parameters becomes large. In the future, we also need to investigate the size effect of these simulation models. That is, in Section 3 we set the size of the array variable, psize = 500, arbitrarily, whereas it ought to be the optimal size for accurate reliability assessment by the calibrated simulator. In this sense, the estimate of the fault removal ratio Rj in the previous section might be defective.

Acknowledgment

This work was partially supported by the Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (C), 18500066, 2006-2007.

References

Akaike, H. (1974) A new look at the statistical model identification, IEEE Trans. Automatic Control, AC-19, 716-723
Goel, A. L. and Okumoto, K. (1979) Time-dependent error-detection rate model for software reliability and other performance measures, IEEE Trans. Reliability, R-28, 206-211
Lyu, M. (1995) Handbook of Software Reliability Engineering, IEEE Computer Society Press, Los Alamitos
Musa, J. D. (1998) Software Reliability Engineering, McGraw-Hill, New York
Pham, H. (2000) Software Reliability, Springer-Verlag, Singapore
Sahinoglu, M. (1992) Compound Poisson software reliability model, IEEE Trans. Software Engineering, 18, 624-630

Estimating the availability of a reverse osmosis plant

Mohammed Hajeeh*, Fatma Faramarzi, Ghadeer Al-Essa
Kuwait Institute for Scientific Research, P.O. Box 24885, Safat-13109, Kuwait
mhajeeh@safat.kisr.edu.kw

Abstract: This paper presents an assessment of a reverse osmosis (RO) plant in Kuwait obtained by analysing its operational and downtime patterns. The plant is divided into its main subsystems and the performance of each subsystem is derived; the overall performance of the RO plant is then assessed from the performance of its subsystems. Assessment of the operational time was considered more appropriate than other performance measures, since the plant was designed to operate continuously. Subjective assessment of the failure probabilities of subsystems was made wherever detailed data were not available. The overall unavailability of the RO plant with and without the energy recovery turbine is around 1.87 days/year and 0.9 days/year, respectively.

Keywords: Performance measures, operational time, availability, failure probabilities

1. Introduction

Fresh water is essential for life and living species. Many countries have abundant fresh water supplies, while others have limited resources. The scarcity of fresh water supplies is apparent in the Gulf Cooperation Council (GCC) countries, where fresh water resources are below poverty levels. In these countries, fresh water demand increased from 4.25 billion cubic metres (bm³) in 1980 to 29.3 bm³ in 2000 (Ebrahim & Abdel-Jawad (1994)). Therefore, desalination technologies have been used extensively in these countries to produce fresh water to cover the progressive increase in demand. The GCC region accounts for around 45 per cent of total desalination capacity in the world (Parekh (1988)).

Commercially available desalination techniques are categorised into two types: distillation and membrane-based technologies. The distillation processes transform water into vapour and then condense it into a liquid state; this requires power in the form of thermal and electrical energy. Commercially available distillation techniques include multistage flash (MSF), multi-effect desalination (MED) and vapour compression (VC). Membrane-based desalination techniques consume power in the form of mechanical or electrical energy. Two processes in this category are commonly used, reverse osmosis (RO) and electrodialysis (ED); the latter is mainly for brackish water desalination.
Although several desalination technologies are used in the GCC, MSF is dominant and accounts for approximately 80 per cent of the world's plants. RO has been considered a successful process for the desalination of brackish water and seawater (Ebrahim & Abdel-Jawad (1994), Parekh (1988)). The first major breakthrough in the commercial application of RO came in 1975, when Dow Chemical, Du Pont and Fluid Systems developed large-scale RO modules for the Office of Water Research and Technology, USA. A considerable amount of interest and research in the RO process throughout the world has been in evidence since that time. Today RO is considered a powerful process for the removal of various dissolved solids, thereby generating the ultrahigh-purity water needed for the pharmaceutical industry, research laboratories, haemodialysis, etc. RO has also assumed a prominent role in fresh water production because of its unique ability to remove ionic impurities, colloids, organics, microorganisms and pyrogenic materials. The most important subsystems of an RO plant are the semipermeable membranes, filters, high-pressure pumps, feed-water pre-treatment system and product-water post-treatment system (Figure 1).

Figure 1. A schematic presentation of the reverse osmosis plant (saline feed water → pretreatment → high-pressure pump → membrane assembly → post-treatment → fresh water, with concentrate discharge)

The plant under study receives feed from either of two beachwells located a short distance from it. Each beachwell is fitted with a submerged pump, each pump having the capacity to deliver seawater at a rate of 72 m³/h. Before the seawater is fed to the RO systems, it is pretreated to eliminate any coarse pollution matter and biofoulants. Additional treatments, including chlorination, filtration, antiscalant addition and dechlorination, are provided to ensure the long service of the RO module system. The design temperature is 25°C (22°C minimum and 35°C maximum). Fine filtration is done through 5-µm cartridge filters in the final filtration stage. A high-pressure pump then pressurises this high-purity filtrate to the pressure level required for the desalination process (60 to 65 bar). Two types of RO membrane (one for each train) are used, spiral-wound and hollow-fibre, each train having a capacity of 300 m³/d. The design recovery rate has been fixed at 35%. The product water (i.e., high-purity water) is taken off at the end of the trains and sent for post-treatment (the addition of minerals to make the water potable). The concentrated brine coming out of the system is still at a high pressure (up to 50 to 60 bar) and is passed through the energy recovery system (a Pelton wheel turbine). The RO process is permanently monitored and volumetrically controlled to comply with the predetermined parameters. The membranes are cleaned at intervals depending on the actual service conditions; the cleaning/flushing equipment is made of suitable materials and comprises a solution tank with motorised agitation, a pump, and cooling and heating facilities, equipped with all necessary instruments such as temperature and pressure gauges and flow meters. The membranes are preserved with formaldehyde solution if the shutdown period is longer than 4 or 5 days. Before the units are restarted, the formaldehyde solution is removed from the system and collected in the cleaning solution tank.
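As a consistency check on these design figures (our observation, not a calculation made in the paper), and assuming the recovery rate is defined as product flow over feed flow, a single beachwell pump feeding 72 m³/h supports the two 300 m³/d trains almost exactly:

```latex
% Recovery rate = product flow / feed flow (assumed definition)
0.35 \times 72\,\mathrm{m^3/h} \times 24\,\mathrm{h/d}
\;\approx\; 605\,\mathrm{m^3/d}
\;\approx\; 2 \times 300\,\mathrm{m^3/d}
```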
The pH of the product water is maintained at 7.5 to 8.2 by the addition of bicarbonate ions (using a dolomite/limestone dissolution filter). The equipment, materials and instruments (except the membranes) have a working life of 20 years when working continuously (90% availability) or intermittently at variable outputs. The layout and control of the plant and equipment ensure easy operation and minimum manpower requirements. The central control panel enables the operator to start up and shut down the plant partially or completely.

To minimise breakdowns, the design specifications for all the subsystems of the plant were reviewed. Since the plant was new and several other research objectives were attached to its operating parameters and performance, only the parameters relevant to the project were reviewed. In the RO process, the quality of the feed is extremely important for the life of the RO membranes. The feed should be free from all suspended particles, and this is ensured by the filtration process. In the existing plant, the standby filters remove any possibility of failure due to malfunctioning of the online filter in the system; hence, for any future RO plant, it is recommended that standby filters be installed. The quality of the seawater feed also determines the need for acid dosing, NaHSO4 dosing and the addition of antiscalant. Normally, failure of the system rarely arises from failures of the pumps used for these pretreatments of the feed, unless human error intervenes in the dosing activity.

The availability of the RO plant therefore depends greatly on the performance of the high-pressure pumps, the membranes and their housings, the various seals at all the junctions, and the dial gauges/indicators recording the various parameters. The thickness and material of the membrane housing should be chosen based on the maximum pressure delivered by the high-pressure pump and the desired safety factor. Before housing the membranes, the shell should be checked for any crack or non-uniformity of thickness. The high-pressure pump should have the highest reliability; hence, selection of the pump should be made very carefully, and there should be a proper preventive maintenance plan for it to reduce the probability of its failure during operation. The membranes should be cleaned following the procedure recommended by the supplier; if the membranes are cleaned regularly, both the quantity of product water and the membrane life improve. To obtain higher availability of the plant, all the important parameters, such as quality of feed, flow rate, pH, feed concentration, feed temperature, SDI, high-pressure pump outlet pressure, RO feed pressure and brine pressure, should be monitored at regular intervals. Should any parameter drift from its desired value, plant operation should be stopped; otherwise damage may occur to the materials, equipment and workers in the plant, and poor-quality product water may be produced.

2. Research objectives

The main aim of the work is to estimate the availability of the RO plant. The specific objectives are to:
• carry out a detailed survey of the operating conditions and performance of the materials, components and subsystems of the RO plant;
• identify causes and sequences of failures in the RO plant; and
• assess the failure rate of the RO plant by identifying the components and events causing downtime and unwanted effects.
3. Failure analysis

From the data recorded at the plant and from the design specifications of all the components and subsystems, it is considered that failure of the RO plant may arise from failures of: the beachwell pumps and the seawater feed line; the pretreatment process (chlorination, dechlorination, antiscalant addition and pH control); the cartridge filters; the high-pressure pumps and motors; the membranes; the cleaning/flushing system; the energy recovery system; the post-treatment system for the product water; valves (including leakage); instruments and controls; and the various pipelines. Failure could also arise from loss of the power supply and from human error; however, these are not considered here since they are independent of the RO technology.

Several safety analysis methods can be used to assess the failure of a system. Failure Mode, Effects and Criticality Analysis (FMECA) is considered an important step in the risk and safety analysis of a system. It involves reviewing as many components, assemblies and subsystems as possible to identify failure modes, critical failures, and their causes and effects (Billinton & Allan (1992), Henley & Kumamoto (1992)). It is mainly a qualitative analysis and is a very useful tool for suggesting design improvements to meet reliability requirements. In this study, a top-down approach was adopted, and the analysis was extended down to the level at which failure rate estimates were available from the data already collected. Since the RO plant was new, the construction of the FMECA was hampered by the lack of sufficient data on failures, their causes and their effects. Statistical estimation of the failure times of all the components and subsystems of the plant was very difficult in the absence of sufficient objective data; hence, in a few cases, subjective assessments of failure times, and of the causes and effects of failures, were made from the experience of the plant personnel. The data on all the special incidents noticed at the RO plant were recorded in the log book and analysed for the FMECA of the RO plant in the following fashion:
• The RO plant was divided into its subsystems: beachwell, pretreatment of the feed, filtration, high-pressure pump, membrane system, energy recovery, and post-treatment of the product water.
• The system's functional diagrams and drawings were reviewed to determine the interrelationships between the various subsystems.
• The operational conditions which might affect the system's performance were assessed and reviewed to determine the adverse effects that they could generate on the system.
• For each subsystem (and, where possible, for components), the operational and failure modes were identified and recorded. In addition, the possible failure mechanisms which might produce the identified failure modes were recorded.
• The mechanisms for detecting each failure mode were then studied.
• From the available data, a failure rate assessment was made for each of the failure modes (unfortunately the data were not sufficient, since only two and a half years' data were available).
• The failure effects were ranked according to their importance, and critical failures were identified.

Event tree analysis (ETA) is an inductive methodology. It starts with a specific initiating event and follows all progressions of the accident and its contribution to the failure of other components and subsystems.
The probability of failure of a component/subsystem is calculated by tracing back and identifying all the accident sequences that lead to it. Fault Tree Analysis (FTA) is another method used in safety analysis. It is a deductive methodology for determining the potential causes of accidents, or of system failures more generally, and for estimating the failure probabilities. FTA is centred on determining the causes of an undesired event, referred to as the top event, since fault trees are drawn with it at the top of the tree. It then proceeds downwards, dissecting the system in increasing detail to determine the root causes, or combinations of causes, of the top event. Top events are usually failures of major consequence, engendering serious safety hazards or the potential for significant economic loss. FTA yields both qualitative and quantitative information about the system under study. Fault tree construction provides the analysts with a better understanding of the potential sources of failure, which may lead them to rethink the design and operation of the system in order to eliminate many potential hazards. Once completed, the fault tree can be analysed to determine the combinations of component failures, operational errors or other faults that initiate the top event. Finally, the fault tree may be used to calculate the demand failure probability, unreliability or unavailability of the system under study. A fault tree (FT) is a diagram that displays the logical interrelationships between the basic causes of the failure. A few standard symbols, commonly known as gates, are used to depict the relationships between the events giving rise to the failure of the system: 'AND gates' connect groups of events and conditions all of which must be present simultaneously for the hazardous event to occur, whereas 'OR gates' represent the existence of alternative ways in which a failure can occur.

4. Methodology

The aim of reliability and availability management for any continuously operated industrial system is to produce the desired level of output on a continuous basis without failure, and to restore the system to an operable state as early as possible whenever it does fail (Haimes et al. (1992)). Management can achieve the lowest total costs possible if reliability and availability are maintained at a high level.
For such continuous systems, a more appropriate performance measure is availability, which is defined as the probability that a system or component is performing its required function at a given point in time or over stated period of time when operated and maintained in a prescribed manner, Eblening (1997). It is classified under point-wise availability (i.e., availability at specific points in time), interval availability (i.e., availability for an interval of time) and inherent availability (i.e., long-run availability). For continuously operated systems, inherent availability is the most meaningful; it is a ratio of the total uptime (i.e. total operating time) to the total system time (i.e. sum of uptime and downtime). Availability takes into account not only the failure aspect of the system (reliability), but also the restoration of the failed components through repair or replacement (maintainability). Maintainability is a design feature, and appropriate considerations are required regarding this aspect for any continuously operated system. Since the total downtime is composed of the time for inspection and detection of faults, the time to repair faults, and administrative time, one can aim at minimizing each of these components to reduce the total downtime. Mathematical models have been designed to estimate the downtimes of systems, and researchers have approximated the downtime distributions as negative exponential distributions. If M ( t ) represents the maintainability function and µ represents the mean down time, then M ( t ) is given by M (t ) = 1 − e − µt (1) If the time-to-failure distribution indicates the reliability of the system, and most commonly, exponential distribution is used for this purpose, the reliability function is given by R(t ) = e − λt (2) A detailed FT diagram was drawn for the plant using OR and AND gates, as shown in figure 4, Chaudhuri & Hajeeh (1999). The outputs of these gates in terms of the event unavailability were computed. Unavailability of the plant due to power supply disruption and human error was not included in the overall estimation of unavailability since these are independent of the RO technology. Consider the AND fault tree as given in figure 2 where simultaneous existence of the events B1, …,Bn results in the top events. Thus, the system unavailability Qs (t ) is the probability that all events exist at time t and is given by Equation 3. TOP EVENT ............... B1 B2 B3 Bn Figure 2. Gated AND fault tree n Qs (t ) = ∏ (1 − Qi ) = Pr( B1 ∩ B2 ∩ ... ∩ Bn ) = Pr( B1 ) Pr( B2 )... Pr( Bn ) (3) i =1 For an OR fault tree as given in Figure 3, the top event exists at time t if and only if at least one of the n basic event occurs at time t. Therefore, the system unavailability is given by Equation 4. 48 TOP EVENT B1 B2 .............. B3 Bn Figure 3. Gated OR fault tree n Q s (t ) = 1 − ∏ (1 − Qi ) = Pr( B1 ∪ B 2 ∪ ... ∪ B n ) = 1 − {[1 − Pr( B1 )][1 − Pr( B2 )].....[1 − Pr( Bn ]} (4) i =1 Figure 4. A FT diagram plant using OR and AND gates 5. Results and Discussion Since the plant under study is new, small, and efficient, the unavailability (down time) is used as a performance indicator. The availability is calculated as Downtime Q =1− A = (5) Uptime + Downtime Since the plant was new, and statistical assessment of failure and repair rates was difficult due to the lack of data, system availability was computed as a ratio of the time the system was working satisfactorily and the total system time. 
No | Sub-system | Unavailability (Q)
1 | Beachwell pumps (bwp): 1 operated, 1 standby | 0.1262 × 10^-3 (0.00013)
2 | Main filters: 1 operated, 2 standby | 0.00000
3 | Dosing: antiscalant, 1 operated, 1 standby | 0.2314 × 10^-4
  | acid, 1 operated, 1 standby | 0.1134 × 10^-2
  | NaHSO2, 1 operated, 1 standby | 0.2314 × 10^-4
  | Pre-treatment total | 0.00118
4 | High-pressure pumps (HPP): Train 1 | 0.03393
  | Train 2 | 0.02985
5 | Energy recovery turbines (ERT): Train 1 | 0.03065
  | Train 2 | 0.03107
6 | Reverse osmosis system: without ERT | 0.00118
  | with ERT | 0.00381
  | Plant overall unavailability (Qplant)*: without ERT | 0.002318 (0.9 days/year)
  | with ERT | 0.005113 (1.87 days/year)

* Qplant = 1 − (1 − Qbwp)(1 − Qpre-treatment)(1 − QRO system)

Table 1. Unavailability of the different sub-systems of the RO plant

6. Conclusion

The reverse osmosis (RO) process is one of the major processes for producing potable water from seawater through desalination. The performance of any RO plant depends on the failure behaviour of its subsystems. Since an RO plant is to be operated continuously with a minimum amount of downtime, the reliability of the subsystems must be maintained at a high level through proper design and selection of the materials of these subsystems. Standby redundancy is needed for those subsystems that are critical in nature and whose failure would cause the entire plant to stop. Since RO plants are likely to show very high availability, operation of a parallel combination of a few RO plants can become a viable alternative to the other desalination plants used in the Middle East. Design and installation of RO plants for desalination of seawater are recommended for the region because the technology offers high performance and is economical. Future research should compare the performance of RO technology with other water desalination technologies, such as multi-stage flash (MSF), which is extensively used in the region.

References

Bazovsky, I. (1961) Reliability Theory and Practice, Prentice Hall, Inc., Englewood Cliffs, New Jersey
Billinton, R. and Allan, R. N. (1992) Reliability Evaluation of Engineering Systems: Concepts and Techniques, Plenum Press, New York
Chaudhuri, D. and Hajeeh, M. (1999) Reliability, availability and risk assessment for Reverse Osmosis, Technical Report, Kuwait Institute for Scientific Research, Kuwait
Ebeling, C. E. (1997) Introduction to Reliability and Maintainability Engineering, McGraw-Hill Companies, Inc., New York
Ebrahim, S. and Abdel-Jawad, M. (1994) Economics of seawater desalination by reverse osmosis, Desalination, 99 (11), 39-55
Haimes, Y. Y., Moser, D. A. and Stakhiv, E. Z. (eds) (1992) Risk-Based Decision Making in Water Resources, American Society of Civil Engineers, New York
Henley, E. J. and Kumamoto, H. (1992) Probabilistic Risk Assessment, IEEE Press, New York
Parekh, B. S. (ed.) (1988) Reverse Osmosis Technology, Marcel Dekker Inc., New York

The use of IT within maintenance management for continuous improvement

Anders Ingwald, Mirka Kans
School of Technology and Design, Department of Terotechnology, Växjö University, S-351 95 Luckligs plats 1, Sweden
anders.ingwald@vxu.se, mirka.kans@vxu.se

Abstract: For a long time maintenance has been treated as a separate working area, isolated from other areas such as production and quality. However, awareness of the importance and complex nature of maintenance has increased. To take full advantage of maintenance, systems are required that assist in the tasks of planning and follow-up on a continuous improvement basis.
This paper describes maintenance practices and how IT is used for maintenance management, using data from a survey of maintenance management in Swedish industry. The paper finds that maintenance activities connected to the Plan and Check phases of the PDCA-cycle are emphasised to a low extent while the Do phase is emphasised more, and that companies tend to intentionally select CMMS functionality that supports the Plan phase of the PDCA-cycle if they put high emphasis on these kinds of activities.

1. Introduction

Traditionally, maintenance has been planned and executed separately from dependent working areas such as production and quality. New manufacturing philosophies and complex production equipment have changed the situation. The consequences of problems and disturbances in the production process are today diverse and severe, leading to higher demands on reliable production, see e.g. Luce (1999), Vineyard et al. (2000) and Holmberg (2001). It has been shown that company-wide integration of maintenance and long term maintenance plans are important for the success of companies, Jonsson (1999). Furthermore, Mitchell et al. (2002) showed in a case study performed in England that successful companies have better maintenance practice. They also conclude that good maintenance practice can have an impact on broader practices and strategies and generate positive synergy effects. The changed role of maintenance has led to new demands on information technology (IT) systems for the planning, execution and follow-up of maintenance activities. Traditional computerised maintenance management systems (CMMS) support the execution of maintenance but give little support for the planning and follow-up phases, see e.g. Liptrot & Palarchio (2000). Pintelon & Van Puyvelde (1997) point out the importance of a well functioning computerised maintenance reporting system, and also the fact that most systems in this area are limited to budget reporting. The level of computerisation and IT maturity varies between different activities within the company. The level of computerisation in maintenance management could, for instance, still be considered low, and maintenance management information technology (MMIT) has not in general been in focus when developing the enterprise IT strategy. Jonsson (1997) reports that 64% of the 284 Swedish manufacturing firms included in that study used manual information systems. We can still find maintenance departments relying on manual systems, or where MMIT is combined with paper documents and where little history is kept. The use of IT is connected to the main focus or main goals of the maintenance organisation. This focus can change over time, generally from efficiency goals towards higher goals of effectiveness and cost-effectiveness, although the opposite is also possible. If the change of focus is connected to the concept of continuous improvement, see for instance Deming (1986), the direction of the change will be towards higher maintenance goals. This paper further explores the use of IT within maintenance in Swedish manufacturing industry, especially trying to answer whether continuous improvement is supported by the use of IT tools.

2. Study of IT use within maintenance

To fulfil the aim of this paper, the results from a survey conducted in Swedish industry during 2003 were used. In the following, the survey participants, the choice and processing of variables, and the connection between the variables and the PDCA-cycle are presented.
2.1 Survey participants

For this cross-sectional survey, production plants with more than 100 employees were selected using information from Statistics Sweden (Statistiska Centralbyrån). The population was selected based on the Swedish Standard Industrial Classification (SE-SIC) 2002. The following industries were selected for the survey: mining and quarrying except energy producing material; manufacturing; electricity, gas and water supply; and transport, storage and communication. 1440 questionnaires were sent out. Because of the low number of responses from companies in some industries, we limited the survey to industries with a high response rate. The total number of questionnaires in this restricted group was 539 and the number of respondents 118, giving a response rate of about 22%, see Alsyouf (2004). The respondents in this restricted population were distributed according to figure 1.

Figure 1. Distribution of respondents by industry (Mechanical engineering; Wood and timber; Pharmaceutical, Chemical; Steel and Metal work; Pulp and Paper; Media, Printing; Petrochemical)

2.2 Determining and describing the variables of the study

From the survey material two questions were used as input for this study: M1 (how much emphasis is placed on different activities) and IT4 (to what extent different CMMS modules or functions are used), see Appendix 1. For question M1 an ordinal scale from 0 to 5 was used, where 1 denoted Not important and 5 Very important. For question IT4 an ordinal scale from 0 to 5 was also used, where 1 denoted Used minimally and 5 Used extensively. For both questions, alternative 0 denoted Do not have. From M1, 15 of a total of 26 variables were used; the remaining 11 variables were not relevant for our purpose. We then mapped the selected M1-variables against the IT4-variables, i.e. which IT functionalities are required to perform a certain maintenance activity. As a result, three of the twelve IT4-variables were excluded as irrelevant for our case. The results of the mapping after the reduction are shown in Table 1. In the following, a description of the connection between the remaining variables is given, starting from maintenance activity number one, where on-line monitoring of critical machinery is directly connected to the functionality of analysing the condition-monitoring parameters. Further, equipment failure data capture and storage, which is the basis for equipment failure diagnosis, is needed for the recording of the period and frequency of failures. The recording of the period and frequency of short stoppages, in addition to the ability to capture and store equipment failure data, is connected to the use of maintenance key performance measures, as short stoppages are one of the parameters required for calculating Overall Equipment Effectiveness (OEE). The poor quality rate is also a parameter of OEE and is therefore connected to the maintenance key performance measures. The activity of using failure historical data requires information about the equipment in the form of the parts list and the repair history. Information from different parts of the company of course requires access to data from production, finance, quality assurance, purchasing etc., but in this study only the maintenance-related parts of data capturing, processing and storage were included.
The functionalities of a CMMS included in IT4 that support the use of company-wide information are the equipment repair history, equipment failure data and maintenance key performance measures (which could include data from locations other than the maintenance organisation). For analysing equipment failure causes and effects, data from the equipment repair history, equipment failure data and condition-monitoring data could be utilised. Experience from past repairs and the ability to find the failed component are important for the activity of restoring equipment to operation; this information is found in the equipment parts list and the repair history. For performing maintenance according to original equipment manufacturer (OEM) recommendations, planning functionality is needed for work orders, preventive maintenance activities and spare parts requirements. The equipment parts list is needed to allocate the maintenance activities. The same functionalities are needed for preventive maintenance based on statistical modelling of failure data; in addition, failure data is needed for the modelling part. Activities performed based on condition monitoring require data from the repair history to adjust warning levels, and the ability to analyse condition-monitoring parameters. The activity does not have to be scheduled beforehand but, apart from that, the same planning functionality as for preventive maintenance is required. For decreasing the repair time, past repair experience is an important input when optimising the work order scheduling. Inventory control and spare parts requirement planning are required when working to keep the spare parts inventory at a minimum. Historical repair data from current equipment is used when evaluating and selecting OEMs, while data about failures, repairs and other available maintenance performance measures such as OEE could be utilised for the improvement of production, for instance by finding problem machines and reducing bottlenecks through more efficient maintenance.

Table 1. Variable mapping after reduction

2.3 Connecting the maintenance activities with continuous improvement

The 15 remaining variables of M1, describing where the emphasis of maintenance is placed, were categorised into four groups: Information gathering, Information analysis, Maintenance execution and Improvement activities, as shown in Table 2. Furthermore, the relationships between the activities and the PDCA-cycle were determined and included in Table 2. The first two groups, covering variables 1-7, are activities connected to information needs and are input to the first phase of the PDCA-cycle, Plan. Groups three and four, covering variables 8-15, are connected to the Do phase of the PDCA-cycle, where improvement activities are carried out. In the Check phase, information coverage is once more important, and the activities described in variables 1-7 are used again. The Act phase aims at making the improvement activity tested in the Do phase permanent and will affect future maintenance activities, for instance those described in variables 8-11, and the improvement activities, such as those described in variables 12-15. The mapping between maintenance activities and continuous improvement shown in Table 2 is only an example valid for this dataset; we do not in any way say that these are the maintenance activities that are required for continuous improvement.

Plan - Information gathering (activities 1 to 4): 1. On-line monitoring of critical machinery; 2. Recording of the period and frequency of failures; 3. Recording of the period and frequency of short stoppages; 4. Recording the poor quality rate. Information analysis (activities 5 to 7): 5. Use of failure historical data; 6. Use of company wide information for diagnosis; 7. Analysing equipment failure causes and effects.
Do - Maintenance execution (activities 8 to 11): 8. Restoring equipment to operation; 9. Performing the maintenance tasks according to OEM recommendations; 10. Performing the maintenance tasks based on statistical modelling of failure data; 11. Performing the maintenance tasks based on condition monitoring. Improvement activities (activities 12 to 15): 12. Decreasing the repair time; 13. Keeping the level low in spare parts inventory; 14. Helping the purchase department in OEM selection; 15. Helping to improve the production process.
Check - Draws once more on information gathering and information analysis (activities 1 to 7).
Act - Will affect maintenance execution and improvement activities.

Table 2. Maintenance activities related to improvement according to the PDCA-cycle
3. Analysis

Respondent answers for each of the fifteen maintenance activities were divided into two groups according to importance: Very high importance (5) and High importance (4) in one group, referred to as "High", and another group covering the other answers (3, 2, 1 or 0), referred to as "Low". The percentage distribution of these groups is shown in Figure 2. As can be seen, much emphasis is put on variables 7, 8, 9, 12 and 15, i.e. analysing equipment failure causes and effects, restoring equipment to operation, performing the maintenance tasks according to OEM recommendations, decreasing the repair time and helping to improve the production process, while low emphasis is put on variables 6, 10 and 13, i.e. use of company wide information for diagnosis, performing the maintenance tasks based on statistical modelling of failure data and keeping the level low in spare parts inventory.

Figure 2. Rating of the importance of maintenance activities (percentage of respondents with high emphasis (5 and 4) versus low emphasis (3 to 0) for variable numbers 1 to 15)

The variables of the matrix presented in Table 1 were analysed by comparing the maintenance activities seen as important with the extent to which CMMS functionality is used to support those activities, see Table 3. For example, of the companies that put high emphasis on activity 6, use of company wide information for diagnosis, 38% have high use of CMMS functionality e, equipment failure diagnosis, while of the companies that put low emphasis on the same activity, only 2% have high use of functionality e.

Table 3. CMMS functionality used for emphasised maintenance practices

A set of functionalities forms the core of a CMMS, see for instance Kans (2005): work order handling, preventive maintenance scheduling, spare parts inventory, plant register, cost and budgeting, and maintenance history. To be able to distinguish whether CMMS functionality is intentionally selected by the company, the authors defined a significant difference in the use of a functionality as an increase of at least 50% in high use among those companies with high emphasis on the activity compared with those with low emphasis. As expected, some CMMS functionalities have high use regardless of whether the maintenance activity is emphasised or not: work order planning and scheduling, preventive maintenance planning and scheduling, equipment parts list, equipment repair history, inventory control and spare parts requirement planning. These functionalities all belong to the core of a CMMS and are almost always available regardless of whether the user asks for them or not.
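To make the dichotomisation and the selection criterion concrete, the following is a minimal Python sketch of the computation described above, run on hypothetical 0-5 ordinal responses (the data and the paired variables are illustrative only, not the survey data):

```python
import numpy as np

def high(answers):
    # "High" group: ratings 4 or 5; ratings 0-3 count as "Low"
    return np.asarray(answers) >= 4

# Hypothetical paired responses: emphasis on one M1 activity (e.g. activity 6)
# and use of one mapped IT4 functionality (e.g. equipment failure diagnosis)
emphasis = np.array([5, 4, 2, 0, 3, 5, 4, 1, 4, 2])
cmms_use = np.array([5, 4, 1, 0, 2, 4, 5, 0, 3, 1])

hi_e = high(emphasis)
share_hi = high(cmms_use)[hi_e].mean()    # high use among high-emphasis firms
share_lo = high(cmms_use)[~hi_e].mean()   # high use among low-emphasis firms

# Criterion of Sect. 3: at least a 50% increase in high use between the groups
selected = share_hi >= 1.5 * share_lo if share_lo > 0 else share_hi > 0
print(f"high-emphasis: {share_hi:.0%}, low-emphasis: {share_lo:.0%}, "
      f"intentionally selected: {selected}")
```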
Significant differences could be found for the following functions: equipment failure diagnosis, analysis of condition-monitoring parameters, and maintenance key performance measures.

4. Results and conclusion

From Figure 2, we can see that the information gathering activities in general get the least attention. A similar situation can be noted for information analysis: only the activity of analysing equipment failure causes and effects is highly emphasised in over half of the cases. The main emphasis is put on the execution of maintenance, reflected in activities such as restoring equipment to operation and performing planned maintenance. Still, companies are interested in improvement: two out of four improvement activities scored over 50% and one was close to 50% (49%). Thus, the Plan and Check phases of the PDCA-cycle are emphasised to a low extent while the Do phase is emphasised more. This is remarkable, as gathering information and analysing the current situation is the basis for further improvement work. Our interpretation is that maintenance organisations are aware of the relationship between maintenance and other closely related working areas, but that resources, e.g. in the form of knowledge, time or appropriate tools, are lacking for the implementation of the continuous improvement concept within the organisation. A slightly different view is given when comparing maintenance activities with CMMS use. While much CMMS functionality seems to be used because it belongs to the core of a CMMS, some CMMS functionality appears to be selected intentionally based on the company's maintenance needs. These CMMS functionalities are mostly related to the planning and execution of maintenance and less to improvement activities. The exceptions are variable 15, helping to improve the production process, and to some extent variable 7, analysing equipment failure causes and effects, which can be regarded as improvement activities. In other words, companies tend to intentionally select CMMS functionality that supports the Plan phase of the PDCA-cycle if they put high emphasis on the activities defined in this study as belonging especially to information gathering, but also to analysis. According to the study, most CMMS functionality that directly supports improvement activities, see Table 2, is highly used. This is not to say that it is actually used for improvement, since most of it is used to almost the same extent regardless of whether the improvement activities are emphasised or not. Furthermore, the CMMS functionality that supports data gathering is not used to a great extent. Thus, the entire improvement cycle is in general not supported by the use of IT tools in Swedish industry.

References

Deming, W. E. (1986) Out of the Crisis, Cambridge University Press, Cambridge, Massachusetts
Holmberg, K. (2001) Competitive Reliability 1996-2000, Technology Programme Report 5/2001, Final Report, National Technology Agency, Helsinki
Jonsson, P. (1997) The status of maintenance management in Swedish manufacturing firms, J. of Quality in Maintenance Engineering, 3 (4), 233-258
Jonsson, P. (1999) Company-wide integration of strategic maintenance: an empirical analysis, International J. of Production Economics, 60-61, 155-164
Kans, M. (2005) On the identification and utilisation of relevant data for applying cost-effective maintenance, licentiate thesis, Växjö University, School of Technology and Design
Liptrot, D. and Palarchio, E.
(2000) Utilizing advanced maintenance practices and information technology to achieve maximum equipment reliability, International J. of Quality & Reliability Management, 17 (8), 919-928
Luce, S. (1999) Choice criteria in conditional preventive maintenance: short paper, Mechanical Systems and Signal Processing, 13 (1), 163-168
Mitchell, E., Robson, A. and Prabho, V. B. (2002) The impact of maintenance practice on operational and business performance, Managerial Auditing J., 15 (5), 234-240
Pintelon, L. and Van Puyvelde, F. (1997) Maintenance performance reporting systems: some experiences, J. of Quality in Maintenance Engineering, 3 (1), 4-15
Vineyard, M., Amoako-Gyampah, K. and Meredith, J. (2000) An evaluation of maintenance policies for flexible manufacturing systems: a case study, International J. of Operations and Production Management, 20 (4), 409-426

Appendix 1. Questions used from the survey

M1. How much emphasis is placed on each of the following activities? (Answered on a 0-5 scale: 0 = Do not have, 1 = Not important, 5 = Very important.)

a. Restoring equipment to operation (acute)
b. Installing new equipment
c. Keeping the level low in spare parts inventory
   Having inventory between machines, Work in Process/Progress (WIP)
d. Decreasing the repair time
e. Investing in improving the skills and competence of maintenance staff
f. Use of computerised maintenance management systems (CMMS)
g. Analysing equipment failure causes and effects
h. Using failure historical data
i. Off-line monitoring of critical machinery (production is stopped during test)
j. On-line monitoring of critical machinery (test is done during production)
k. Performing the maintenance tasks according to the original equipment manufacturer (OEM) recommendations
l. Performing the maintenance tasks based on condition monitoring
m. Performing the maintenance tasks based on statistical modelling of failure data
n. Helping the purchasing department in OEM selection
o. Performing periodic planned replacement
p. Automatic diagnosis (expert system)
q. Remote diagnosis (measurements are sent to another place for analysis)
r. Use of company wide information for diagnosis
s. Cross functional groups (for instance improvement groups)
t. Helping improve the production process
u. Helping design the production process
v. Recording the period and frequency of failures
w. Recording the period and frequency of short stoppages
x. Recording the poor quality rate
y. Annual overhaul

IT4. To what extent is each of the following computerised maintenance system modules or functions used? (Answered on a 0-5 scale: 0 = Do not have, 1 = Used minimally, 5 = Used extensively; a separate alternative denoted Do not have a CMMS.)

a. Work-order planning and scheduling
b. Preventive maintenance planning and scheduling
c. Analysis of condition monitoring parameters
d. Equipment failure diagnosis
e. Equipment repair history
f. Equipment parts list
g. Manpower planning and scheduling
h. Inventory control
i. Spare part requirement planning
j. Material and spare parts purchasing
k. Maintenance budgeting
l. Maintenance key performance measures

An approximate algorithm for condition-based maintenance applications

Matthew J. Carr, Wenbin Wang
Centre for Operational Research and Applied Statistics, Salford Business School, University of Salford, UK.
m.j.carr@salford.ac.uk, w.wang@salford.ac.uk

Abstract: Established condition-based maintenance techniques can be computationally expensive. In this paper we propose an approximate methodology that uses extended Kalman filtering and condition monitoring information to recursively establish a conditional density for the residual life of a component. The conditional density is then used to construct a maintenance/replacement model. The advantage of the methodology, when compared with alternative approaches, is its computational efficiency, which potentially enables simultaneous condition monitoring, and the associated inference, for a large number of components. The application of the methodology is described for a vibration monitoring scenario and demonstrated using actual case data.

Keywords: Condition-based maintenance, condition monitoring, extended Kalman filtering, residual life

1. Introduction

The Kalman filter is a well known technique in the context of state estimation for discrete time stochastic systems. Efficient updating and prediction equations are easily established to obtain the parameters of the conditional distribution for some underlying state using stochastically correlated indicatory information. As such, it can potentially be a useful technique for condition monitoring (CM) applications involving the simultaneous monitoring of a large number of components when limited computational resources are available. This is particularly important for on-line diagnosis and prognosis when a large amount of data must be processed rapidly. Existing techniques such as proportional hazards modelling (Makis & Jardine (1991), Banjevic & Jardine (2004)) and non-linear probabilistic stochastic filtering (Wang & Christer (2000), Wang (2002)) involve numerical integration routines and can be computationally expensive to apply simultaneously to a large number of individually monitored components. The standard Kalman filter can be derived within the framework of a general non-linear filter when the system and observation dynamics evolve linearly and the model errors are assumed to be independent and to follow 0-mean Gaussian white noise processes, see Jazwinski (1970). However, in reality, these assumptions rarely hold. There are a number of varieties of the extended Kalman filter (EKF) available in the literature on stochastic state estimation techniques. EKFs are designed to enable the application of variations on the standard Kalman filtering methodology to linearised versions of non-linear systems. The linearisation is achieved using Taylor expansions of the state and observation equations. The order of the filter depends on the number of terms of the Taylor expansions that are included in the linearised equations. In this paper, we introduce a semi-deterministic form of the EKF in which the state of the system is unknown but evolves deterministically. The approach is then adapted specifically for applications involving vibration information, where the underlying state that we are attempting to predict is defined as the residual life of an individual component.
The deterministic element of the EKF process is designed to capture the exact relationship between realisations of the actual underlying residual life at sequential CM points throughout the component's lifetime. We then illustrate the application of the vibration model using a case-based example. When constructing EKF algorithms for conditional residual life prediction, the same principles apply when using different types of CM information, the only difference being the specification of the relationship between the monitored information and the underlying residual life.

2. A semi-deterministic extended Kalman filter

For a discrete time process, the evolution of a general state vector is described using the non-linear function

x_{i+1} = f(x_i) + η_i   (1)

where x_i is the realisation of the state vector at the ith discrete time point at time t_i. For a deterministic relationship, η_i is a 0-mean process with covariance matrix 0 and is henceforth removed from consideration. At the ith discrete time point, we describe the relationship between an observed information vector y_i and the underlying state using the non-linear function

y_i = h(x_i) + e_i   (2)

where the measurement errors are normally distributed as e_i ~ N(0, R_i). The first step in applying the extended Kalman filtering methodology is to linearise the functions f and h in equations (1) and (2). At the ith discrete time point, we define x̂_{i|i} = E[x_i | Y_i] as an estimate of x_i that is conditioned on the observation history available up to that point, Y_i = {y_1, y_2, …, y_i}. We also define x̂_{i+1|i} = E[x_{i+1} | Y_i] as the one-step prediction of x_{i+1} that is again conditioned on Y_i. The non-linear functions f and h are linearised as

f(x_i) ≈ f(x̂_{i|i}) + f′(x̂_{i|i})(x_i − x̂_{i|i})   (3)

h(x_i) ≈ h(x̂_{i|i−1}) + h′(x̂_{i|i−1})(x_i − x̂_{i|i−1})   (4)

and using these approximations, the state transition expression of equation (1) is written as

x_{i+1} = f′(x̂_{i|i}) x_i + u_i   (5)

where u_i = f(x̂_{i|i}) − f′(x̂_{i|i}) x̂_{i|i}. Similarly, the observation expression (equation (2)) becomes

y_i = h′(x̂_{i|i−1}) x_i + e_i + w_i   (6)

where w_i = h(x̂_{i|i−1}) − h′(x̂_{i|i−1}) x̂_{i|i−1}. Assuming that the initial values for the underlying state, x_0, and the associated covariance matrix, P_0, are known, the Kalman filtering methodology can then be applied directly to the linearised system given by equations (5) and (6). The extended Kalman filter is a recursive algorithm incorporating prediction and updating steps at each recursion. At the ith discrete time point, with the availability of the observed information y_i, the equation for updating the mean estimate of the underlying state is

x̂_{i|i} = x̂_{i|i−1} + k_i [y_i − h′(x̂_{i|i−1}) x̂_{i|i−1} − w_i] = x̂_{i|i−1} + k_i [y_i − h(x̂_{i|i−1})]   (7)

where the Kalman gain function (Harvey, 1989) is

k_i = P_{i|i−1} h′(x̂_{i|i−1})ᵀ [h′(x̂_{i|i−1}) P_{i|i−1} h′(x̂_{i|i−1})ᵀ + R_i]⁻¹   (8)

For a semi-deterministic version of the EKF, the lack of variability is reflected in the prediction stage of each recursion where, using the original transition expression given by equation (1), a one-step forecast of the mean state vector is simply

x̂_{i+1|i} = f′(x̂_{i|i}) x̂_{i|i} + u_i = f(x̂_{i|i})   (9)

The covariance matrix of the state vector is also subjected to prediction and updating steps upon each recursion of the algorithm.
At the ith time point, the covariance matrix is updated using

P_{i|i} = P_{i|i−1} − P_{i|i−1} h′(x̂_{i|i−1})ᵀ [h′(x̂_{i|i−1}) P_{i|i−1} h′(x̂_{i|i−1})ᵀ + R_i]⁻¹ h′(x̂_{i|i−1}) P_{i|i−1}

which can be written as

P_{i|i} = P_{i|i−1} − k_i h′(x̂_{i|i−1}) P_{i|i−1}   (10)

using the gain function given by equation (8). A one-step prediction of the covariance matrix is achieved using the equation

P_{i+1|i} = f′(x̂_{i|i}) P_{i|i} f′(x̂_{i|i})ᵀ   (11)

This concludes the description of the semi-deterministic EKF algorithm for general discrete time state-vector and observation-vector processes. The parameters of the algorithm are estimated in two stages. Firstly, the initial values x_0 and P_0 are estimated from available data using maximum likelihood estimation (MLE). Secondly, considering n individual observation histories, the parameters of the function h are estimated using MLE, the expectation-maximisation (E-M) algorithm and the relationship

y_{ji} | x_{ji} ~ N(h(x_{ji}), R_{ji})   (12)

at the ith discrete time point in the jth history, where i = 1, 2, …, m_j and j = 1, 2, …, n. The selection of an appropriate function h(x_i) is essential when designing a useful filter for a particular scenario. We use the Akaike information criterion (AIC), which provides a means of comparing the maximum likelihood values obtained for different candidate forms and is derived on the assumption that the actual underlying dynamics of the state and observation processes can be described by a given model if its parameters are suitably adjusted, see Akaike (1974). The AIC statistic is

AIC = 2 (k − log_e L(θ))   (13)

where L is the maximum likelihood value for the formulation and k is the number of parameters. The modelling option producing the minimum AIC is the best choice for a given data set.

3. Residual life prediction using vibration information

3.1 The system

In this section, we tailor the extended Kalman filtering methodology described in section 2 to a CM scenario involving vibration monitoring and residual life estimation for individual components. At the ith vibration monitoring point at time t_i, we define a single underlying state, z_i, as the residual life that remains before the failure of an individual component. At this time, we obtain information in the form of the overall vibration level y_i, which is used to refine our estimate of z_i, and Y_i represents the vibration monitoring history {y_1, y_2, …, y_i} available up to that point. If we assume that a given component has not failed before a specific monitoring point, the residual life at that time can only be a positive value. As such, we choose to model the residual life as a log-normal random variable and define the unknown underlying state of the EKF algorithm as x_i = log_e(z_i) at the ith CM point. As discussed in the general specification (section 2), the updated and predictive conditional densities for x_i are derived using the EKF algorithm and are described by x_{i|i} ~ N(x̂_{i|i}, P_{i|i}) and x_{i+1|i} ~ N(x̂_{i+1|i}, P_{i+1|i}) respectively. Similarly, at the ith CM point, the conditional densities for the underlying residual life are z_{i|i} ~ logN(x̂_{i|i}, P_{i|i}) and z_{i+1|i} ~ logN(x̂_{i+1|i}, P_{i+1|i}) respectively.
Point estimates of the residual life are obtained as ẑ_{i|i} = E[z_{i|i}] = E[z_i | Y_i] and ẑ_{i+1|i} = E[z_{i+1|i}] = E[z_{i+1} | Y_i], with associated variances Λ_{i|i} and Λ_{i+1|i} respectively, where the conditional mean estimate and variance at the ith CM point are

ẑ_{i|i} = exp(x̂_{i|i} + 0.5 P_{i|i})   (15)

Λ_{i|i} = (exp(P_{i|i}) − 1) exp(2 x̂_{i|i} + P_{i|i})   (16)

and analogous results apply for ẑ_{i+1|i} and Λ_{i+1|i}. Modelling the transition in the underlying state between two successive monitoring points, we have z_{i+1} = z_i − (t_{i+1} − t_i) as the change in the residual life over the duration between the ith and (i+1)th CM points when z_i > t_{i+1} − t_i. From equation (2), the relationship between the observed vibration information and the underlying residual life is described by the expression

y_i = h(x_i) + e_i   (17)

at the ith CM point, where e_i represents the observation noise.

3.2 The algorithm

As discussed in section 2, EKFs and standard Kalman filtering algorithms are often presented as a two step process involving prediction and updating stages. The algorithm is initiated at the start of a new component's operational life using x̂_{0|0} and P_{0|0}. The prediction stage of the recursive algorithm forecasts the estimate of the state over the interval (t_i, t_{i+1}). Assuming that the initial estimates ẑ_{0|0} and Λ_{0|0} are known, the one-step deterministic prediction of the residual life at the ith CM point is

ẑ_{i+1|i} = ẑ_{i|i} − (t_{i+1} − t_i)   (18)

if ẑ_{i|i} > t_{i+1} − t_i, and ẑ_{i+1|i} → 0 otherwise. As the change in the state over the interval (t_i, t_{i+1}) is deterministic, the variance about the mean estimate remains fixed, Λ_{i+1|i} = Λ_{i|i}. Reversing the relationship given by equation (15), we transform ẑ_{i+1|i} into x̂_{i+1|i} for insertion into the next stage of the algorithm as

x̂_{i+1|i} = log_e(ẑ_{i+1|i}) − 0.5 log_e(1 + Λ_{i+1|i} / ẑ_{i+1|i}²)   (19)

and again, without any random variation in the prediction of the state, there is no change in the variance over the interval (t_i, t_{i+1}) between CM points, P_{i+1|i} = P_{i|i}. The updating stage of the algorithm is undertaken upon obtaining the vibration information at the ith CM point. The updating equation for the log of the residual life is

x̂_{i|i} = x̂_{i|i−1} + k_i (y_i − h(x̂_{i|i−1}))   (20)

where, from equation (8), the Kalman gain function is

k_i = P_{i|i−1} h′(x̂_{i|i−1}) / (h′(x̂_{i|i−1})² P_{i|i−1} + R_i)   (21)

with h′(x̂_{i|i−1}) = dh(x)/dx evaluated at x = x̂_{i|i−1}. The updating equation for the variance about x̂_{i|i} is

P_{i|i} = P_{i|i−1} − k_i h′(x̂_{i|i−1}) P_{i|i−1}   (22)

and to obtain the conditional mean and variance of the residual life, we utilise equations (15) and (16) respectively.
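The recursion of equations (15)-(22) is compact enough to sketch directly. The following minimal Python sketch (not from the paper) implements one predict/update cycle with the functional form h(x) = a + b exp{−c x} selected in section 4; the parameter and initial values are the case-study estimates reported there, while the monitoring intervals and vibration readings in the loop are hypothetical:

```python
import numpy as np

a, b, c, R = 6.002, 383.641, 1.357, 13.327   # Table 1, h(x) = a + b exp{-c x}
h  = lambda x: a + b * np.exp(-c * x)
dh = lambda x: -b * c * np.exp(-c * x)       # h'(x)

def predict(z_hat, Lam, dt):
    """Deterministic prediction over (t_i, t_{i+1}): equations (18) and (19).
    Lam and P carry over unchanged because the state transition is exact."""
    z_pred = max(z_hat - dt, 1e-6)                                 # eq. (18)
    x_pred = np.log(z_pred) - 0.5 * np.log(1.0 + Lam / z_pred**2)  # eq. (19)
    return z_pred, x_pred

def update(x_pred, P_pred, y):
    """Measurement update at a CM point: equations (20)-(22), then (15)-(16)."""
    k = P_pred * dh(x_pred) / (dh(x_pred)**2 * P_pred + R)         # eq. (21)
    x = x_pred + k * (y - h(x_pred))                               # eq. (20)
    P = P_pred - k * dh(x_pred) * P_pred                           # eq. (22)
    z = np.exp(x + 0.5 * P)                                        # eq. (15)
    Lam = (np.exp(P) - 1.0) * np.exp(2.0 * x + P)                  # eq. (16)
    return x, P, z, Lam

# Case-study initial values, then a hypothetical irregular monitoring sequence
x_hat, P, z_hat, Lam = 4.048, 0.223, 64.039, 1024.5
for dt, y in [(10.0, 9.5), (8.0, 12.1), (12.0, 17.8)]:
    z_pred, x_pred = predict(z_hat, Lam, dt)
    x_hat, P, z_hat, Lam = update(x_pred, P, y)
    print(f"updated residual life estimate {z_hat:6.1f} hrs, variance {Lam:8.1f}")
```

Each pass through the loop shortens the predicted residual life deterministically and then corrects it using the observed vibration level, exactly as in the two-stage description above.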
3.3 Replacement decisions

For this particular application scenario, the main advantage of establishing a conditional probability distribution, over the evaluation of a single point estimate of the residual life, is the availability of the cumulative distribution function. The c.d.f. enables the construction of replacement decision models that incorporate the probability of failure before a particular instant, conditioned on the CM history to date. At each monitoring point throughout the life of an individual component, an optimal replacement time can be scheduled using renewal-reward theory and the long-run expected cost per unit time, see Ross (1996). Some further notation is defined: T_R is the planned replacement time (to be optimised), C_P is the cost of a preventive replacement, and C_F is the replacement cost associated with the failure of an individual component. At the ith CM point (time t_i) the expected cost per unit time is given by

C(t_i, T_R) = E[cycle cost | T_R] / E[cycle length | T_R]   (23)

where C(t_i, T_R) is to be minimised with respect to T_R. As such, the replacement decision at the ith CM point is obtained via the minimisation of

C(t_i, T_R) = [C_P + (C_F − C_P) P_i(z_i < T_R − t_i | Y_i)] / [u + t_i + (T_R − t_i)(1 − P_i(z_i < T_R − t_i | Y_i)) + ∫_0^{T_R − t_i} z p_i(z | Y_i) dz]   (24)

where u represents the age of the component prior to monitoring (for new components, u = 0), an upper-case P represents a probability, a lower-case p denotes a conditional density, and the residual life at the ith CM point is distributed as z_{i|i} ~ logN(x̂_{i|i}, P_{i|i}).

4. Case example

In this study, the EKF algorithm developed in the previous section is applied to a scenario involving the estimation of residual life. The condition monitoring information consists of the overall vibration level recorded at irregular time points during the operational lives of 5 components, with associated failure-time information. Previous studies have shown that vibration monitoring scenarios can often be modelled effectively using a two-stage process representing normal and defective operation, see Wang (2002). The vibration signal is usually relatively stable during the first stage and begins to increase upon the occurrence of a defect. It is in the second stage that techniques such as stochastic filtering are useful for residual life prediction. In this example, we are only concerned with components operating in a defective state and assume that a fault detection technique has been used to determine the start of the second stage, defined as time 0 in the following. Figure 1 illustrates the vibration histories for the 5 components and the threshold level representing the start of the defective stage of operation. The costs associated with failures and preventive component replacements are £6000 and £2000 respectively.

Figure 1. The vibration monitoring histories (overall vibration level against time (hrs) for CM histories 1-5, with the threshold marking the start of the defective stage)

To initialise the EKF algorithm, we apply MLE using the failure time information from the 5 components in the data set. The initial values are x̂_{0|0} = 4.048 and P_{0|0} = 0.223, and the corresponding initial estimate of residual life is ẑ_{0|0} = 64.039 with associated variance Λ_{0|0} = 1024.5. We consider a number of candidate forms for the function h in equation (17), representing the relationship between the vibration information and the log of the residual life. MLE and the AIC criterion in equation (13) are used to parameterise and select between the candidate forms. The results of the parameter estimation and model selection process are given in Table 1. It is evident from the results that the EKF modelling option with the functional form h(x_i) = a + b exp{−c x_i} produces the minimum AIC and is the chosen function for this case study.

h(x_i)            | a      | b       | c     | R      | log(L(θ)) | AIC
a + b x_i         | 39.138 | −7.955  | -     | 16.572 | −64.924   | 135.848
a + b / x_i       | 0      | 41.688  | -     | 22.672 | −68.529   | 143.058
a + b exp{−c x_i} | 6.002  | 383.641 | 1.357 | 13.327 | −62.418   | 132.836*

Table 1. The estimated parameters and model selection results for candidate forms of the vibration-based EKF algorithm
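Given the selected form of h, the decision model of equation (24) can be evaluated numerically at any CM point. The following is a minimal Python sketch of this step (not from the paper): the conditional log-life mean and variance below are hypothetical mid-life values rather than outputs quoted in the case study, while the costs and the assumption u = 0 follow section 4.

```python
import numpy as np
from scipy.stats import lognorm
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

CP, CF = 2000.0, 6000.0    # preventive and failure replacement costs (Sect. 4)
u, t_i = 0.0, 30.0         # age prior to monitoring and current CM time (hrs)
x_hat, P = 3.5, 0.2        # hypothetical conditional mean/variance of log-life

rl = lognorm(s=np.sqrt(P), scale=np.exp(x_hat))   # z_i | Y_i ~ logN(x_hat, P)

def cost_rate(TR):
    """Expected cost per unit time of a planned replacement at TR, eq. (24)."""
    dt = TR - t_i
    pf = rl.cdf(dt)                                     # P(z_i < TR - t_i | Y_i)
    partial_mean = quad(lambda z: z * rl.pdf(z), 0.0, dt)[0]
    return (CP + (CF - CP) * pf) / (u + t_i + dt * (1.0 - pf) + partial_mean)

res = minimize_scalar(cost_rate, bounds=(t_i + 1.0, t_i + 200.0), method='bounded')
print(f"schedule replacement at TR = {res.x:.1f} hrs (cost rate {res.fun:.2f} GBP/hr)")
```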
Figure 2 illustrates the conditional residual life distributions obtained at each monitoring point for component 4 using the vibration-based EKF algorithm. The black dots represent the actual underlying residual life at each CM point. It is clear from the figure that the residual life distributions produced by the EKF algorithm are appropriately distributed about the actual residual life for the component history considered and, as we would expect, the accuracy improves as more CM information is obtained over time.

Figure 2. The conditional residual life distributions obtained at each CM point for component 4 using the EKF algorithm

As discussed in section 3.3, a replacement decision at the ith CM point can be evaluated using the age of the component, the average costs of failures and replacements and, crucially, the conditional distribution for the residual life. As such, the form of the conditional density has a large impact when scheduling the replacement time. We are considering an irregular monitoring process, and at the ith CM point the time of the next CM point is unknown. As we are not modelling a decreasing operational capability, the desirable property of the replacement decision is to provide the maximum possible operational availability without the component failing, which relies on the fit of the conditional density about the actual underlying residual life. Continuing the illustration of the analysis for component 4, by plotting the expected cost per unit time against the associated time until replacement, figure 3 demonstrates how the conditional distributions for the residual life affect the associated replacement decisions at each CM point.

5. Discussion

In this paper, we have presented an approximate methodology for residual life prediction using the principles of the well known Kalman filter. The methodology is designed to enable simultaneous and efficient processing and inference for a large number of components using CM information. The decision to use the EKF approach over less approximate methodologies, such as non-linear probabilistic filtering or proportional hazards modelling, will be application specific and will depend on the particular trade-off between efficiency and precision. The trade-off will depend on the number of components that require simultaneous condition monitoring, the processing power that is available, and the costs associated with preventive replacements and component failures.

Figure 3. The expected cost per unit time against potential replacement time, T_R, at each CM point for component 4

The results of the case study indicate that the EKF algorithm could be a useful approach for residual life prediction in applications involving multi-component monitoring and limited computational resources. In an extended version of this paper, currently in preparation, we include more information on the parameter estimation and model selection processes. In addition, we discuss the inclusion of higher order terms in the Taylor expansions of the system equations and provide a case comparison with some alternative techniques used for residual life estimation.

Acknowledgement

The research documented in this paper has been supported by the Engineering and Physical Sciences Research Council (EPSRC, UK) under grant EP/C54658X/1.

References

Akaike, H.
(1974) A new look at the statistical model identification, IEEE Trans. Automatic Control, 19 (6), 716-723
Banjevic, D. and Jardine, A. K. S. (2004) Calculation of reliability function and remaining useful life for a Markov failure time process, Proceedings of the 5th IMA conference on maintenance and reliability modelling, Eds Wang, W., Scarf, P. and Newby, M., 5-7 April 2004, University of Salford, UK
Harvey, A. C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press
Makis, V. and Jardine, A. K. S. (1991) Optimal replacement policies in the proportional hazards model, INFOR, 30, 172-183
Ross, S. (1996) Stochastic Processes, 2nd edition, Wiley
Wang, W. (2002) A model to predict the residual life of rolling element bearings given monitored condition information to date, IMA J. of Management Mathematics, 13, 3-16
Wang, W. and Christer, A. H. (2000) Towards a general condition based maintenance model for a stochastic dynamic system, J. of the Operational Research Society, 51, 145-155

The utility of a maintenance policy

Rose Baker
Centre for Operational Research and Applied Statistics, University of Salford, UK
R.D.Baker@salford.ac.uk

Abstract: In the author's concept of risk-averse maintenance, we seek to minimise the disutility of cost per unit time rather than to minimise cost per unit time itself. This gives a maintenance policy that is optimal under risk aversion. The concept, introduced at the last conference in this series, is illustrated with a new example: the maintenance of a standby system. A difficulty with the use of a utility function is that it is hard to elicit a value for the risk-aversion parameter. The use of a plot of the certainty-equivalent cost of all feasible maintenance policies against the risk aversion parameter is suggested as a way round this difficulty.

Keywords: maintenance, inspection, utility function, risk aversion, standby system, Wald identity

1. Introduction

Economists have long used the concept of utility to model decision-makers' risk aversion. When making an investment in the stock market, for example, a strategy that sought only to maximise expected gain would be unduly risky. Seeking rather to maximise a concave utility function of gain leads to the classic Markowitz scheme, in which a large portfolio of stocks is purchased. Utility functions are ubiquitous in economic thought; rather oddly, however, in operational research cost or cost per unit time is usually the criterion to be minimised, and risk aversion is simply ignored. The little published work that is the exception occurs in warranty and inventory. Thus Padmanabhan & Rao (1993) and Chun & Tang (1995) studied risk-averse warranty policies. In inventory, the classic newsvendor problem has now been tackled from a utility-function viewpoint, e.g. Dohi et al. (1994), Eeckhoudt et al. (1995), Dionne & Mounsif (1996) and Keren & Pliskin (2006). Maintenance and reliability seems a suitable area in which to explore risk-averse policies, because there are numerous cashflows occurring stochastically. What might be seen by some as overmaintenance, in the sense that mean cost per unit time is not minimised, could be optimal as a risk-averse policy, in which the large unscheduled losses from failure have such a disutility that very frequent maintenance is carried out. Baker (2006) studied risk-averse maintenance policies, and this paper extends that work in several directions.
A fresh example of the calculation of expected utility is given, for the inspection of standby systems. The use of utility functions other than the exponential Pratt-Arrow utility is discussed, and finally a graphical aid is introduced that can help decision makers evaluate the relative merits of differing maintenance policies.

2. A utility-based approach

We first briefly recapitulate the key results of the earlier work. The exponential utility function of money y was used, defined as

u = (1 − exp(−η y)) / η   (1)

where η > 0 is a measure of risk aversion. An expenditure x = −y has disutility

−u = (exp(η x) − 1) / η   (2)

and this form is used from now on. The certainty-equivalent cost D is the sum of money that, if definitely gained or lost, would have the same expected utility as the variable cashflows of the policy. Hence if a policy is carried out for time t, we have that

(exp(η D t) − 1) / η = (E exp(η X) − 1) / η, or D = ln{E exp(η X)} / (η t)   (3)

We consider failures of a system under some maintenance policy, where the system reaches a regeneration point after a cycle of variable length. For example, at a regeneration point the whole system might be replaced. Let the cost over the ith cycle be F_i. Then when cycles are of fixed length the certainty-equivalent expenditure per unit time over k cycles is

D = ln E{exp(η Σ_{i=1}^{k} F_i)} / (k η τ)

where τ is the fixed cycle length. Hence, as the cycles are independent, their expected utilities are equal, and

D = ln E exp(η F) / (η τ)   (4)

dropping the cycle subscript i. When cycle length is variable, equation 4 becomes exp(η D t) = E_N (E{exp(η F)})^{N(t)}, where E_N denotes the expectation over the number of cycles. Thus

ln E_N exp{ln(E exp(η F)) N(t)} = η D t   (5)

To find D, we use the Wald identity from the theory of random walk. Given times S_n to the end of the nth cycle, and with M(θ) as the moment generating function for cycle length, E exp(θ S_n)/M^n(θ) is a martingale with unit expectation. The Wald identity

E[exp(θ S_n) / M^n(θ)] = 1   (6)

follows, and the crucial step is to see that equation 6 is true under any stopping rule, and to choose to stop at a very large time, the first regeneration after N(t) >> 1 regenerations. Clearly, the stopping time ~ t for large t. Then, with n written as the random variable N(t), equation 6 becomes

E_N exp{θ t − N(t) ln M(θ)} ~ 1, or ln E_N exp{−ln M(θ) N(t)} ~ −θ t   (7)

Comparing this with equation 5, we see that it is identical if we make the choice θ = −η D, when we must have

−ln M(−η D) = ln E exp(η F), or M(−η D) = 1 / E exp(η F)   (8)

The value of D can be found by solving equation 8 by Newton-Raphson iteration, which only requires differentiation of M. When cycle length is fixed at τ, M(−η D) = exp(−η D τ) and we regain equation 4. As η → 0, writing the mean cycle length as l, M(−η D) → 1 − η D l, 1/E exp(η F) → 1 − η E(F), and D → E(F)/l, the mean cost per cycle divided by the mean cycle length.

3. Example: a standby system

To illustrate the use of this methodology, consider a standby system, such as a lifevest, that 'waits to work'. There is a cost of inspection c_i, and a cost of replacement or repair to 'good as new' of c_r. We assume that inspection and replacement take negligible time. Failure is undetected until the next inspection, and there is a cost per unit 'downtime'. It is probably most realistic to consider that there is a probability pt of a disaster occurring during a downtime t, where pt << 1.
A large cost c_f is incurred only if a disaster occurs; the lifevest is then needed and loss of life occurs because it is non-functional. A regeneration cycle ends with repair or replacement. A comprehensive reference for the non-statistical aspects of inspection models for standby systems is chapter 8 of Nakagawa (2005), who derives an inspection policy for a standby electric generator when repair time is not negligible. The earliest work is by Barlow (1965), who derived inspection times T_i to minimise the total cost of checking and replacing a unit, plus the cost of undetected failure, taken as proportional to the 'down' time, over the life of the unit. Another important early reference is Keller (1973). When η → 0 the risk-averse policy tends to the minimum cost per unit time policy, which we first derive. Let the lifetime distribution of the item (e.g. lifevest) have survival function S(x), with mean µ = ∫_0^∞ S(u) du, and let inspections occur at intervals T. The probability that the cycle length is mT is S((m−1)T) − S(mT) and the mean cycle length is l = T Σ_{m=0}^∞ S(mT). The mean cost per cycle is

E(F) = c_i l / T + c_r + c_f p E(T_f)

where the downtime T_f is the period from a failure at (m−1)T + u, after the (m−1)th inspection at (m−1)T, to the next inspection. Then

E(T_f) = Σ_{m=1}^∞ ∫_0^T f((m−1)T + u)(T − u) du = T Σ_{m=0}^∞ S(mT) − µ

on integrating the pdf f of failure by parts. Hence

E(F) / l = p c_f + c_i / T + (c_r − µ p c_f) / l   (9)

Note that there will be an optimum value of T to minimise the cost per unit time if c_r < µ p c_f, so that the expected cost of failure with no maintenance exceeds the cost of replacing the item. A more intuitive derivation of this cost is given in Baker (2007). Turning to the risk-averse solution, this parallels the previous derivation, but now we need the generating function of cycle length and E exp(η F) instead of E(F). From the cycle length probabilities S((m−1)T) − S(mT), we have that

M(−s) = exp(−sT) − (1 − exp(−sT)) Σ_{m=1}^∞ S(mT) exp(−msT)   (10)

Conditioning on a cycle length of mT, the survival function becomes [S((m−1)T + u) − S(mT)] / [S((m−1)T) − S(mT)] for 0 ≤ u < T, and the expected downtime is

E(T_f | m) = {T S((m−1)T) − ∫_0^T S((m−1)T + u) du} / {S((m−1)T) − S(mT)}

The expected cost of inspections is m c_i. We have

E{exp(η F) | m} = exp(η(c_r + m c_i)) {1 − p E(T_f | m) + p E(T_f | m) exp(η c_f)}

Removing the conditioning on cycle length,

E exp(η F) = Σ_{m=1}^∞ exp(η(c_r + m c_i)) {S((m−1)T) − S(mT) + (exp(η c_f) − 1) p (S((m−1)T) T − ∫_0^T S((m−1)T + u) du)}

The certainty-equivalent cost D per unit time may now be found from equation 8. Figure 1 shows the optimum value of the inspection period T as the risk aversion η increases, for a standby system where c_i = 1, c_r = 5, c_f = 1000 and p = 0.01. The failure time distribution is Weibull, so that the survival function is S(x) = exp(−(α x)^β), and here α = 1, β = 2.

Figure 1. How the optimum inspection period T decreases as the risk aversion parameter η increases for maintenance of a standby system

Although the optimum policy that minimises cost per unit time has T = 0.994, figure 1 shows that the optimum inspection period decreases with increasing η in a way that looks exponential. The very high cost of a low-probability failure drastically changes the optimum inspection period.
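The shape of figure 1 can be reproduced, in outline, from equations (8) and (10). The following minimal Python sketch (not from the paper; the truncation limit, the bracketing root-finder used in place of Newton-Raphson, and the illustrative η value are all our own choices) computes D for a given inspection period T and then minimises it over T:

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.integrate import trapezoid

alpha, beta = 1.0, 2.0                  # Weibull survival S(x) = exp(-(alpha x)^beta)
ci, cr, cf, p = 1.0, 5.0, 1000.0, 0.01  # costs and disaster probability per unit downtime
eta = 0.001                             # illustrative risk-aversion parameter
M_MAX = 400                             # truncation of the sums over cycle index m

S = lambda x: np.exp(-(alpha * x) ** beta)

def E_exp_etaF(T):
    """E exp(eta F), the expected disutility factor per cycle (truncated sum)."""
    m = np.arange(1, M_MAX + 1)
    u = np.linspace(0.0, T, 201)
    integral = trapezoid(S((m[:, None] - 1) * T + u[None, :]), u, axis=1)
    mean_down = S((m - 1) * T) * T - integral       # numerator of E(T_f | m)
    prob = S((m - 1) * T) - S(m * T)                # P(cycle length = mT)
    return np.sum(np.exp(eta * (cr + m * ci)) *
                  (prob + (np.exp(eta * cf) - 1.0) * p * mean_down))

def M(s, T):
    """Moment generating function of cycle length, M(s) = E exp(s L)."""
    m = np.arange(1, M_MAX + 1)
    return np.sum((S((m - 1) * T) - S(m * T)) * np.exp(s * m * T))

def D(T):
    """Certainty-equivalent cost per unit time: root of M(-eta D) = 1/E exp(eta F)."""
    target = 1.0 / E_exp_etaF(T)
    return brentq(lambda d: M(-eta * d, T) - target, 1e-9, 1e4)

res = minimize_scalar(D, bounds=(0.05, 3.0), method='bounded')
print(f"eta = {eta}: optimum T = {res.x:.3f}, certainty-equivalent cost D = {res.fun:.3f}")
```

Repeating the minimisation over a grid of η values traces out a curve of the kind shown in figure 1; as η grows, the exp(η c_f) term dominates and drives the optimum inspection period down.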
Figure 2 illustrates the way in which this methodology might be used by practitioners. The minimum cost per unit time policy is preferable to a policy with a shorter inspection interval when there is no risk aversion, but the certainty-equivalent cost becomes greater with the longer inspection interval as risk aversion increases.

Figure 2. Certainty-equivalent cost per unit time D as a function of the risk aversion parameter η for two inspection policies for maintenance of a standby system: T = 0.994 (the optimum policy at zero risk aversion) and T = 0.5

Clearly, it is unrealistic to ask engineers or managers 'what is the risk aversion parameter of your utility function?'. There are gambling questions that can elicit the degree of risk aversion less baldly. For example, consider a bet where one has probability p > 1/2 of winning some large amount X. For a given value of p, how much money would one be prepared to bet? The expectation of the exponential utility function is unchanged (one would be indifferent between betting and not betting) when

exp(η X) = {1 + (1 − 4p(1 − p))^{1/2}} / {2(1 − p)}

By considering 'bets' as closely related to the subject area as possible, an appropriate value of η for management could be imputed. However, the determination of utility function parameters is difficult; a cynical colleague has commented that the trick is not to ask too many questions! Instead of this, figure 2 could be plotted for several different maintenance policies, or for a proposed new policy and for an existing policy where the certainty-equivalent cost per unit time is estimated from cost data, without the necessity of building a model. A policy may be desirable in terms of mean cost per unit time, but it should also be robust, in that D should not increase rapidly as the degree of risk aversion increases.

4. Conclusions

This article presents ongoing work in the arena of risk-averse maintenance. The aim here has been to present the methodology for practitioners, by summarising the necessary mathematics and by suggesting the use of a graphical aid (figure 2) in choosing a maintenance policy. It may well be difficult to decide exactly what the value of the risk aversion parameter η should be. Thought experiments in which, for example, gambling tasks are contemplated can help here. However, the main suggestion is that the certainty-equivalent cost of various maintenance policies should be plotted against the degree of risk aversion. It can then be seen which policies are not robust, in that their cost increases dramatically under moderate risk aversion.

References

Barlow, R. E. and Proschan, F. (1965) Mathematical Theory of Reliability, Wiley, New York
Baker, R. D. (2007) Inspection Policies for Reliability, Encyclopedia of Statistics in Quality and Reliability, Wiley
Baker, R. D. (2006) Risk aversion in maintenance: overmaintenance and the principal-agent problem, IMA J. of Management Mathematics, 17 (2), 99-113
Baker, R. D. (2006) Risk aversion in maintenance, International J. of Polish Academy of Sciences 'Maintenance and Reliability', 14-16
Chun, Y. H. and Tang, K. (1995) Determining the optimal warranty price based on the producer's and customers' risk preferences, European J. of Operational Research, 85 (1), 97-110
Dohi, T., Watanabe, A. and Osaki, S. (1994) Risk averse newsboy problem, RAIRO - Recherche Opérationnelle - Operations Research, 28 (2), 181-202
Dionne, G. and Mounsif, T. (1996) Investment under demand uncertainty: the newsboy problem revisited, Geneva Papers on Risk and Insurance Theory, 21 (2), 179-189
Eeckhoudt, L., Gollier, C. and Schlesinger, H.
(1995) The risk-averse (and prudent) newsboy, Management Science, 41 (5), 786-794
Keller, J. B. (1973) Optimum checking schedules for systems subject to random failure, Management Science, 21, 256-260
Keren, B. and Pliskin, J. S. (2006) A benchmark solution for the risk-averse newsvendor problem, European J. of Operational Research, 174, 1643-1650
Nakagawa, T. (2005) Maintenance Theory of Reliability, Springer, London, 201-229
Padmanabhan, V. and Rao, R. C. (1993) Warranty policy and extended service contracts: theory and an application to automobiles, Marketing Science, 12 (3), 230-247

Stochastic demand patterns for Markov service facilities with neutral and active periods¹
Attila Csenki
School of Computing and Mathematics, University of Bradford, Bradford BD7 1DP, UK
a.csenki@bradford.ac.uk
¹ This conference paper is a shortened version of a paper currently under review with a journal.

Abstract: In an earlier paper, Csenki (2007), a closed form expression was obtained for the joint interval reliability of a Markov system with a partitioned state space S = U ∪ D, i.e. for the probability that the system will reside in the set of up states U throughout the union of some specific disjoint time intervals. These deterministic time intervals formed a demand pattern specifying the desired active periods. In the present paper, we admit stochastic demand patterns by assuming that the lengths of the time intervals, that is, of the active periods as well as of the neutral periods, are random. We explore two mechanisms for modelling random demand: (1) by alternating renewal processes; (2) by sojourn times of some continuous time Markov chain with a partitioned state space. The first construction results in an expression in terms of a revised version of the moment generating functions of the sojourns of the alternating renewal process. The second construction involves the probability that a Markov chain follows certain patterns of visits to some groups of states and yields an expression using Kronecker matrix operations. The model of a small computer system is considered to exemplify the ideas.

1. Introduction
In an earlier paper, Csenki (2007), a closed form expression was introduced for the joint interval reliability of systems modelled by a finite state Markov process X. The system's state space was partitioned into up and down states, S = U ∪ D, and of concern in Csenki (2007) was the probability that X will reside in the set of up states U throughout the union of a given disjoint set of k time intervals I_λ = [θ_λ, θ_λ + ς_λ], λ = 1, ..., k. The time intervals were termed a demand pattern specifying the system's active periods, and they were taken to model customer expectations. X was termed the supply process. This paper is a continuation of work reported in Csenki (2007), building upon the result therein. It stems from the recognition that in most cases the demand pattern is best described by a stochastic rather than a deterministic process. The systems modelled by the present paper are Markovian service facilities subject to random demand where the result of the service cannot be stored and is lost if not consumed immediately. Such systems are, for example, wind farms, or resources in organizations where resource unavailability results in immediate loss of custom. Queueing or storage is not possible. The 'commodity' produced is continuous: it is simply the instantaneous uptime or availability of the facility. The Markovian supply is described in Sect. 2.1.
The demand patterns comprise two kinds of random intervals interlaced: neutral and active periods. We introduce in Sect. 2.2 two alternative modes of constructing demand patterns: Scheme 1 is based on alternating renewal processes, and Scheme 2 is based on the sojourn times of a finite irreducible Markov chain alternating between some subsets of states N ('neutral') and A ('active'). (The latter construction is built, in a sense, 'to match' the supply process.) In Sect. 2.3, we define the quality of service to be the measure of compliance C(k), the probability that the facility will meet demand in the first k active periods. In Sect. 3, we give closed form expressions for C(k). For Scheme 1, the result will be represented in terms of the revised moment generating function. For Scheme 2, C(k) is an expression involving Kronecker products of various submatrices of the rate matrices of both Markov processes modelling supply and demand. Several special cases will be elaborated upon in the presentation. In Sect. 4, the theory is applied to analyze the Markov model of a small computer system from Muppala et al. (1996).

2. Model construction
2.1. The supply process
The supply is modelled by an irreducible Markov chain X = {X(t), t ≥ 0} whose finite state space S = U ∪ D is partitioned into the set of up states U and its complement D, the set of down states. The system's transition rate matrix Λ is partitioned thus:

\Lambda = \begin{pmatrix} \Lambda_{UU} & \Lambda_{UD} \\ \Lambda_{DU} & \Lambda_{DD} \end{pmatrix}.   (1)

X is started according to some probability vector α at time zero. This is a row vector of length |S|.

2.2. The demand process
2.2.1. Modelling customer requirements: neutral and active periods
In Csenki (2007) we introduced the notion of a demand pattern as a finite sequence of finite time intervals during each of which the facility is expected by the customer to be in a definite subset of the state space S. The intervals were assumed deterministic in Csenki (2007). Now we examine the case when the demand pattern comprises random time intervals. Random demand patterns are formed by interlacing neutral and active periods: we start with a neutral period of random length ξ₁, followed by an active period of random length ς₁, followed by a neutral period of random duration ξ₂, etc. During a neutral period there is no expectation concerning the facility from the customer side; the supply process can be anywhere in S. During an active period the supply process is hoped by the customer to be in the set of up states U. The lengths of the neutral and active periods are respectively denoted by ξ₁, ξ₂, ... and ς₁, ς₂, .... These random variables form the demand process W, and they are assumed independent of the supply process X. We consider three modelling alternatives.

2.2.2. Scheme 0: deterministic demand
This case was considered earlier (Csenki (2007)): the interval lengths ξ_i and ς_i are deterministic, and so are the 2k interval endpoints:

\theta_\lambda = \sum_{i=1}^{\lambda - 1} (\xi_i + \varsigma_i) + \xi_\lambda, \qquad \omega_\lambda = \sum_{i=1}^{\lambda} (\xi_i + \varsigma_i), \qquad \lambda = 1, \ldots, k.   (2)

2.2.3. Scheme 1: alternating renewal process
W may be assumed an alternating renewal process for modelling periodic customer requirements comprising stochastically identical independent cycles. The intervals' durations are the independent sequences ξ = (ξ₁, ξ₂, ...) and ς = (ς₁, ς₂, ...), each comprising independent and identically distributed random variables. A sketch of how such a demand pattern generates the active intervals is given below.
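As a small illustrative sketch (the exponential durations are an assumption for illustration, not from the paper), the following generates the active intervals [θ_λ, ω_λ] of a Scheme 1 demand pattern by interlacing neutral and active durations as in equation (2):

import random

def active_intervals(k, draw_neutral, draw_active):
    """Return the first k active intervals [theta, omega] of the demand pattern."""
    intervals, t = [], 0.0
    for _ in range(k):
        t += draw_neutral()            # neutral period xi_lambda
        theta = t
        t += draw_active()             # active period varsigma_lambda
        intervals.append((theta, t))
    return intervals

# Scheme 1 with assumed exponential durations: mean 8 h neutral, 16 h active
print(active_intervals(3, lambda: random.expovariate(1 / 8), lambda: random.expovariate(1 / 16)))

Scheme 0 is recovered by passing constant-valued callables for the two duration draws.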
2.2.4. Scheme 2: Markov sojourn times
To allow for the possibility of dependence between cycle lengths, here it is assumed that the random variables ξ₁, ξ₂, ... and ς₁, ς₂, ... are the respective sojourn times of an irreducible Markov process Y = {Y(t), t ≥ 0} in the disjoint subsets N, A ⊂ T, where T = N ∪ A is the finite, partitioned state space of Y. The transition rate matrix of Y is denoted by Ψ; it is in the partitioned form

\Psi = \begin{pmatrix} \Psi_{NN} & \Psi_{NA} \\ \Psi_{AN} & \Psi_{AA} \end{pmatrix}.   (3)

Y alternates between N and A indefinitely, thereby generating the sequences of neutral and active periods. Y is started at time zero in N according to some probability (row) vector β_N. The overall initial probability vector of Y is therefore

\beta = (\beta_N, 0, \ldots, 0) = (\beta_N, 0_A).   (4)

The sojourn times of Y in A define the active periods; they are the time intervals I_λ = [θ_λ, ω_λ]. We add in passing that the random variables forming the sequences ξ and ς are phase type distributed and, in general, dependent. If the subsets N and A are always entered into by Y through the same respective states, ξ and ς define an alternating renewal process of phase type distributed random variables. This is then a special case of Scheme 1 from Sect. 2.2.3. For recent applications of phase type distributions in the reliability setting, refer to Montoro-Cazorla & Perez-Ocon (2006), Perez-Ocon & Montoro-Cazorla (2004a,b) and Perez-Ocon & Ruiz Castro (2004).

2.3. The compliance measure C(k)
We will evaluate in Sect. 3 expressions for the probability of supply meeting demand during the first k active periods, i.e. for

C(k) = P\left( X(t) \in U \text{ for all } t \in \bigcup_{i=1}^{k} [\theta_i, \omega_i] \right).   (5)

C(k) is interpreted as a measure of the supply complying with demand.

3. Model analysis
3.1. Scheme 0
In this case, C(k) is the joint interval reliability over the deterministic intervals [θ_i, θ_i + ς_i], i = 1, ..., k. From our earlier work we know that this is

C(k) = \alpha \exp(\xi_1 \Lambda) I_{SU} \exp(\varsigma_1 \Lambda_{UU}) \prod_{\lambda = 2}^{k} \left\{ I_{US} \exp(\xi_\lambda \Lambda) I_{SU} \exp(\varsigma_\lambda \Lambda_{UU}) \right\} 1_U.   (6)

3.2. Scheme 1
Because of the independence of supply and demand, and because of the independence of the random durations constituting the demand process itself, (6) can be integrated termwise to give

C(k) = \alpha M_\xi(\Lambda) I_{SU} M_\varsigma(\Lambda_{UU}) \left\{ I_{US} M_\xi(\Lambda) I_{SU} M_\varsigma(\Lambda_{UU}) \right\}^{k-1} 1_U,   (7)

where

M_\kappa(Z) = E(\exp(\kappa Z))   (8)

stands for the revised moment generating function (mgf) of the random variable κ. In contrast to the usual practice, in (8) the matrix exponential is used and Z, the argument of M_κ, is a square matrix. The special cases where the demand periods are Gaussian, shifted Erlang or two-phase Coxian will be specifically addressed in the presentation, as they will be used in the application.

3.3. Scheme 2
3.3.1. Computation of C(k)
The combined, bivariate process Z(t) = (X(t), Y(t)) models the interaction between supply and demand. It has state space S × T, the Cartesian product of the individual state spaces. And, because of the independence of the component processes, Z is a Markov process with transition rate matrix Γ = Λ ⊕ Ψ = Λ ⊗ I_TT + I_SS ⊗ Ψ, where ⊕ and ⊗ stand, respectively, for the matrix operations Kronecker sum and Kronecker product (Graham (1981), Keller & Qamber (1988)).
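The Kronecker sum is straightforward to form numerically; the following toy sketch (two assumed 2-state rate matrices, not the paper's model) builds Γ = Λ ⊕ Ψ exactly as defined above:

import numpy as np

Lam = np.array([[-1.0,  1.0],
                [ 2.0, -2.0]])        # toy supply rate matrix Lambda
Psi = np.array([[-0.5,  0.5],
                [ 0.3, -0.3]])        # toy demand rate matrix Psi

# Gamma = Lambda (+) Psi = Lambda (x) I + I (x) Psi
Gamma = np.kron(Lam, np.eye(2)) + np.kron(np.eye(2), Psi)
print(Gamma)                          # 4x4 rate matrix of Z(t) = (X(t), Y(t))

The rows of Γ sum to zero, as they must for a rate matrix, because both summands inherit that property from Λ and Ψ.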
From the many results for these operations, we need the ones representing blocks of matrices thus generated: for any two square matrices Λ and Ψ of respective sizes |S| × |S| and |T| × |T|, it is, for S₁, S₂ ⊂ S and T₁, T₂ ⊂ T,

(\Lambda \otimes \Psi)_{S_1 \times T_1, \, S_2 \times T_2} = \Lambda_{S_1 S_2} \otimes \Psi_{T_1 T_2},   (9)

(\Lambda \oplus \Psi)_{S_1 \times T_1, \, S_2 \times T_2} = \Lambda_{S_1 S_2} \otimes I_{T_1 T_2} + I_{S_1 S_2} \otimes \Psi_{T_1 T_2}.   (10)

The initial probability vector of Z is α ⊗ β; this is a row vector of length |S| × |T|. By (4), the A-entries of the initial probability vector of Z are zero. In the presentation we shall indicate that the quantity of interest, C(k), is expressible in terms of an event involving Z. Then, after some work, the following formula is obtained for C(k):

C(k) = -(\alpha_S \otimes \beta_N) \left( \Gamma_{EE}^{-1} \Gamma_{EF} \Gamma_{FF}^{-1} \Gamma_{FE} \right)^{k-1} \Gamma_{EE}^{-1} \Gamma_{EF} (1_U \otimes 1_A),   (11)

where

\Gamma_{EE} = \Lambda_{SS} \otimes I_{NN} + I_{SS} \otimes \Psi_{NN},   (12)
\Gamma_{EF} = I_{SU} \otimes \Psi_{NA},   (13)
\Gamma_{FE} = I_{US} \otimes \Psi_{AN},   (14)
\Gamma_{FF} = \Lambda_{UU} \otimes I_{AA} + I_{UU} \otimes \Psi_{AA},   (15)
\Gamma_{FG} = I_{UD} \otimes \Psi_{AA}.   (16)

The assumed initial condition (4) may appear restrictive in that demand patterns starting with an active period are disallowed. It will be indicated in the presentation that this restriction on Y can be overcome. Two examples of demand patterns of the type discussed here will be spelt out in more detail in the presentation: we call them rest-then-work and work-then-rest, as they evoke these respective behavioural patterns.

4. Application: a small computer system
4.1. The Markov system
The Markov model of a computer system comprising three workstations connected to a file server via a computer network will be considered here; the model (in its basic form) stems from Muppala et al. (1996). System components fail independently of each other. Each of the workstations fails at a rate ρ_w, whereas the failure rate of the file server is ρ_f. The computer network is very reliable, and it may be assumed that it never fails. There is a single repairman to look after the system. The repair rate of a workstation is µ_w. Repair of a failed workstation commences immediately upon its failure if the repairman is not busy; otherwise, he starts repairing the waiting workstation as soon as he has finished repairing the other item. It is assumed that the system is functional if at least one of the workstations and the file server are in the up state. Furthermore, a component cannot fail while the system is in the down state. The states of the modelling process X will be denoted by pairs (i, j); i = 0, 1, 2, 3 refers to the number of workstations in the up state, and j indicates the state of the file server: zero for 'down', one for 'up'. As components cannot fail during system down time, the state (0,0) is not reachable from any of the other states; it will therefore be ignored. Repair of the file server enjoys priority. This policy becomes operational if the file server fails while one of the workstations undergoes repair and at least one of them is still functional. (This corresponds to the transition (i,1) → (i,0), i ∈ {1,2}.) Then the ongoing repair of the workstation is abandoned and resumed later (from scratch) once the file server has been repaired. The system's transition rate diagram is shown in Fig. 1. (Down states are indicated by dashed lines.) Notice that failure rates (denoted by ρ) are associated with transitions where one of the entries in the pair is decremented; repair rates (denoted by µ) are associated with transitions where one of the entries in the pair is incremented. The system's transition rate matrix Λ is easily constructed from Fig. 1; a hypothetical sketch of this construction follows.
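In the sketch below, the rate values ρ_w, ρ_f, µ_w and the file server repair rate (written µ_f here) are placeholders, since the paper adopts its numerical values from Muppala et al. (1996); the transition structure follows the description above.

import numpy as np

rho_w, rho_f, mu_w, mu_f = 0.001, 0.0005, 1.0, 0.5    # assumed placeholder rates
states = [(i, j) for j in (1, 0) for i in (3, 2, 1, 0) if (i, j) != (0, 0)]
idx = {s: n for n, s in enumerate(states)}
Lam = np.zeros((len(states), len(states)))

def system_up(i, j):
    # functional iff at least one workstation and the file server are up
    return i >= 1 and j == 1

for (i, j) in states:
    if system_up(i, j):                # components fail only while the system is up
        Lam[idx[(i, j)], idx[(i - 1, j)]] += i * rho_w   # a workstation fails
        Lam[idx[(i, j)], idx[(i, 0)]] += rho_f           # the file server fails
    if j == 0:
        Lam[idx[(i, j)], idx[(i, 1)]] += mu_f            # server repair has priority
    elif i < 3:
        Lam[idx[(i, j)], idx[(i + 1, j)]] += mu_w        # workstation repair
np.fill_diagonal(Lam, -Lam.sum(axis=1))
print(states)                          # state ordering; Lam is the 7x7 rate matrix

The up states U are those with i ≥ 1 and j = 1, so the partitioning of Λ as in (1) amounts to slicing this matrix by the first three indices of the state list.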
Initially, all components are up, i.e. α = (1, 0, ..., 0). The parameters' numerical values are adopted from Muppala et al. (1996).

4.2. Demand patterns
We examine the demand process {(ξ₁, ς₁), (ξ₂, ς₂), ...} under various distributional assumptions. For reasons of comparison, the respective mean periods will be the same under all schemes. Two scenarios will be considered:
1. The interval lengths are random but 'almost' deterministic, and therefore the duration variances are small. The supplier can reasonably 'guess' when and for how long the service will be in demand. Schemes 0, 1 (G) and 1 (E) (see below) will be used to model this situation.
2. There is more uncertainty about the durations of the neutral and active periods. Schemes 1 (C) and 2 will be used to model this situation. Scheme 2 will also allow dependence of active and neutral periods to be modelled.

Figure 1. Markov model of the computer system

Scheme 0. Neutral and active periods are deterministic and of lengths 8 and 16 hours, respectively. Expression (6) applies.

Scheme 1. Neutral and active periods are interlaced to form an alternating renewal process. Expression (7) applies. We consider three cases.
(G) (Gaussians) Neutral and active durations are normally distributed with respective means a and 2a. Their variances, denoted by σ² and τ², may be set, for example, to the values 1 hr² and 2 hr², respectively, so as to model small uncertainties in the periods' durations. The parameter a will be set to 8 hr.
(E) (Shifted Erlangs) Neutral and active durations are shifted Erlang distributed, with respective sets of parameters: n_i exponential stages, each with rate µ_i, the resulting Erlang distribution then being shifted by a_i, i = 1, 2. Choose the parameters such that the means and variances match those in Scheme 1 (G); for example,

E(\xi_i) = a_1 + n_1/\mu_1 = 8 \text{ hr}, \qquad Var(\xi_i) = n_1/\mu_1^2 = 1 \text{ hr}^2,   (17)
E(\varsigma_i) = a_2 + n_2/\mu_2 = 16 \text{ hr}, \qquad Var(\varsigma_i) = n_2/\mu_2^2 = 2 \text{ hr}^2.   (18)

A set of parameters conforming to (17) and (18), respectively, is

a_1 = 6 \text{ hr}, \quad n_1 = 4, \quad \mu_1 = 2/\text{hr},   (19)
a_2 = 14 \text{ hr}, \quad n_2 = 2, \quad \mu_2 = 1/\text{hr}.   (20)

(C) (Two-phase Coxians) Neutral and active durations are two-phase Coxian distributed. They model large random variations in the intervals' durations.

Scheme 2. The rest-then-work demand pattern will be discussed here.

4.3. Implementation and numerical results
In the conference presentation, unreliabilities will be used: these are the complementary probabilities 1 − C(k). The numerical results will be shown and discussed there. They were obtained with SCILAB (Campbell et al. (2006), Pincon (2003)), a public domain software package for numerical work with an emphasis on matrix representations.

5. Conclusions
We have developed formulae for the probability that a system modelled by a Markov process with a partitioned state space will be in the set of up states throughout the first k active periods, where these are stochastic and are generated either by an alternating renewal process or as sojourn times of another Markov process with a partitioned state space. Potential future developments will be discussed in the presentation.

Acknowledgements
I thank Dr. David Jerwood, Head of Mathematics at the University of Bradford, for the financial assistance which allowed me to attend the IMA conference MIMAR 2007 and to present this paper there.

References
Campbell, S. L., Chancelier, J.-Ph. and Nikoukhah, R. (2006) Modeling and Simulation in Scilab/Scicos, Springer, Heidelberg, New York
Csenki, A.
(2007) Joint interval reliability for Markov systems with an application in transmission line reliability, Reliability Engineering and System Safety, 92, 685-696
Graham, A. (1981) Kronecker Products and Matrix Calculus with Applications, Ellis Horwood/Wiley, Chichester, New York
Keller, A. Z. and Qamber, I. S. (1988) System availability synthesis. In: Proceedings of the 10th Advances in Reliability Technology Symposium, University of Bradford, 6-8 April 1988 (ed. Libberton, G. P.), 173-188, Elsevier Applied Science, London, New York
Montoro-Cazorla, D. and Perez-Ocon, R. (2006) A deteriorating two-system with two repair modes and sojourn times phase-type distributed, Reliability Engineering and System Safety, 91, 1-9
Muppala, J. K., Malhotra, M. and Trivedi, K. S. (1996) Markov dependability models of complex systems: analysis techniques. In: NATO Advanced Science Institute Series, Series F: Computer and System Sciences 154, Proceedings of the NATO Advanced Study Institute on Current Issues and Challenges in the Reliability and Maintenance of Complex Systems, Kemer-Antalya, Turkey, 12-22 June 1995 (ed. S. Ozekici), Springer, Berlin, Heidelberg, 442-486
Perez-Ocon, R. and Montoro-Cazorla, D. (2004a) A multiple system governed by a quasi-birth-and-death process, Reliability Engineering and System Safety, 84, 187-196
Perez-Ocon, R. and Montoro-Cazorla, D. (2004b) Transient analysis of a repairable system, using phase-type distributions and geometric processes, IEEE Transactions on Reliability, 53, 185-192
Perez-Ocon, R. and Ruiz Castro, J. E. (2004) Two models for a repairable two-system with phase-type sojourn time distributions, Reliability Engineering and System Safety, 84, 253-260
Pincon, B. (2003) Eine Einführung in Scilab (translated from the French by Jarausch, H.), Institut Élie Cartan Nancy, Université Henri Poincaré, France. http://www.scilab.org/publications/JARAUSCH/PinconD.pdf

Multicriteria decision model for selecting maintenance contracts by applying utility theory and variable interdependent parameters
Anderson Jorge de Melo Brito, Adiel Teixeira de Almeida
Federal University of Pernambuco, Cx. Postal 7462, Recife - PE, 50.630-970, Brazil
anderson@ufpe.br, aalmeida@ufpe.br

Abstract: Contract selection is a very important stage in the process of maintenance outsourcing, given the current trend towards reducing costs and increasing competitiveness by focusing on core competences. The prominence of this theme can be seen in the many studies carried out on outsourcing and maintenance contracts, most of which deal with qualitative aspects. However, quantitative approaches, such as Multicriteria Decision Aid, play an important role in helping decision makers to deal with multiple and conflicting criteria and uncertainties in the selection process for outsourcing contracts. In this context, several decision models have been developed, using Utility Theory, ELECTRE and other multicriteria methods. This paper presents a multicriteria methodology to support the selection of maintenance contracts in a context where information is imprecise, that is, when decision makers are not able to assign precise values to the importance parameters of the criteria used for contract selection. Utility Theory is combined with the Variable Interdependent Parameters Method to evaluate alternatives through an additive value function regarding interruption time, contract cost and the candidate's dependability.
To illustrate the use of the model, a numerical application with the VIP Analysis software is presented.

1. Introduction
Following the wider trend of outsourcing non-core competences as a strategic policy, several companies nowadays establish contracts with external firms to perform repair and maintenance services on their systems. The main objective of maintenance outsourcing is, in general, to increase the availability of such systems through a better maintainability structure offered by specialized staff, at costs lower than those related to the use of in-house professionals. Therefore, the process of selecting maintenance service outsourcing, as well as supplier selection, often presents a conflict between assessment criteria, and a conflicting and uncertain performance of contract alternatives on these criteria. These factors characterize the selection of maintenance and repair contracts as a multiple criteria decision-making process (Weber et al. (1991), Huang & Keskar (2007)). However, as Almeida (2005) pointed out, most studies found in the literature approach this theme, as well as the supplier selection problem, from qualitative aspects (see, for example, Kennedy (1993)). Few papers have analysed these problems by exploring a quantitative approach with Multicriteria Decision Aid (Keeney & Raiffa (1976), Vincke (1992)). For the choice of supply contracts, Ghodsypour & O'Brien (1998) presented a model based on the Analytic Hierarchy Process and Linear Programming in order to consider qualitative and quantitative factors in supplier selection. De Boer et al. (1998) addressed the use of outranking multicriteria methods for such problems and discussed the advantages of multicriteria methods in relation to traditional decision models. Huang & Keskar (2007) presented a set of configurable metrics to assess supplier performance and to support decision-making methodologies for the choice of suppliers. Regarding maintenance contract selection, little work has been conducted exploring a multicriteria decision-making approach. Almeida (2001) presented multicriteria decision models based on Multiattribute Utility Theory (Keeney & Raiffa (1976)) for selecting repair contracts, which aggregate interruption time and related cost through an additive utility function. A different approach can be found in Almeida (2002), where the ELECTRE I method is combined with utility functions for a repair contract problem. In this paper, we consider the use of a multicriteria decision approach through an additive value function to help decision makers find the most preferred alternative in selecting a maintenance contract. Additive value functions are a widespread and well-known approach in problems concerning ranking and choice according to multiple criteria or attributes (Dias & Clímaco (2000)). However, once the set of criteria used to assess the available alternatives is established, decision makers may not only find it difficult to provide precise information about their preferences, but these preferences may also change at any point during the decision-making process. In many cases, the procedures for eliciting the criteria importance parameters may take more time and require more patience than decision makers are willing to provide (Dias & Clímaco (2000)). This paper presents a multicriteria model to support decision making in selecting maintenance and repair contracts under the situation mentioned.
Utility functions, reflecting the decision makers' preferences and their behavior regarding risk, are combined in an additive value function with imprecise information on the scaling parameters of the criteria. To illustrate the model, a numerical application is performed with VIP Analysis, a decision support tool developed for the variable interdependent parameters approach (Dias & Clímaco (2000)).

2. The problem analysed
The problem under study deals with selecting outsourced maintenance contracts for a repairable system. It is widely recognized that the contract cost must no longer be taken as the only aspect guiding decisions on contract selection. In industrial and, mainly, in service systems, credibility, delivery time, customer satisfaction, quality and other non-monetary aspects are often affected by system availability, and they play an important role in a company's competitiveness. Hence, companies must be concerned with all aspects which may influence the availability of their system, so contract selection must take into consideration multiple and often conflicting objectives. In this process, the decision maker (DM) faces several options for maintenance contracts, each implying different system performances and related costs. The DM has to choose the most preferred option, the one with the best combination of contract conditions (Almeida (2002)). These conditions may vary depending on the company's market and strategy, and may involve delivery speed or response time, quality, flexibility, dependability and, obviously, cost (Slack & Lewis (2002), Almeida (2005)). The set of actions, represented here as A, corresponds to the set of all maintenance contract alternatives available to the decision maker. Assuming that A is composed of n alternatives (n > 1), A is a discrete set represented by A = {a1, a2, a3, ..., an}, where ai is any contract option in A. Each alternative ai presents a contract cost and other performances on the set of criteria considered. Almeida (2001, 2002) modeled this problem by considering response time and contract cost, using different methods and different probabilistic assumptions for response time. The decision model proposed here includes three basic criteria: interruption time, the applicant's dependability considering a contract option, and contract cost. It is assumed here that these three criteria are sufficiently comprehensive to allow the decision maker to assess alternatives and quantify the consequences of choosing any action ai in A. The interruption time, here represented by TI, corresponds to the interval in which the system is not working due to repair or other maintenance activities. It is related to system availability. Since reliability is an inherent design feature of a company's equipment, it is assumed to be the same for each alternative ai. So the alternative's maintainability structure, represented by TI, is the discriminating factor for availability. TI is related to the speed of the repair facility for each alternative ai (Almeida (2005)). It is influenced by staff training, the applicant's repair facilities and spares provisioning. In this paper, it is assumed that the administrative delay time TD (Almeida (2001)) is negligible in relation to the time to repair TTR, so we consider TI = TTR. For each contract alternative ai, TI is assumed to be a random variable. The uncertainty related to TI will be incorporated through a probability density function fi(ti) for each contract alternative ai.
The dependability criterion is used to assess contract alternatives in relation to 'deadlines' being met. It is a measure related to keeping delivery promises (Slack & Lewis (2002), Almeida (2005)). Assuming that TI = TTR, dependability will be represented by the probability di of achieving the time to repair under a specified probability distribution, as undertaken in the contract proposal related to ai (Almeida (2005)). A contract cost ci, presented by each applicant, represents the remuneration of the maintenance structure that will be made available to the contracting company. The cost ci is appraised for a period of time during which repair activities are under warranty of being performed according to the contract conditions associated with TI. Therefore, each contract alternative ai may have its performance represented by a multi-dimensional vector comprising: a parameter (or parameters) related to the probability density function of the interruption time ti; a probability di of achieving the interruption time undertaken in the contract proposal; and a contract cost ci. For each alternative ai, ci is assumed to be a fixed value, which is the cost of contract proposal i. The three criteria presented in this model may conflict among contract alternatives. Usually, lower interruption times (in this model, times to repair) are related to better resource conditions, better spares provisioning and higher professional skills, and they often imply higher costs. Besides, the dependability of an alternative is not directly related to the proposal conditions associated with interruption time, but is assessed by the contracting company taking into consideration other aspects such as the applicant's reputation, previous services, the structure of repair facilities, etc. This problem is analysed by means of a multicriteria decision model. The model seeks to be adapted to the decision maker's inability to fix constant values for criteria 'weights', which must translate not only the importance of the criteria but also the compensation rates between criteria in additive value functions (Vincke (1992)). This flexibility in modeling is obtained by using the Variable Interdependent Parameters approach, presented by Dias & Clímaco (2000) and discussed as follows.

3. The Variable Interdependent Parameters method
The Variable Interdependent Parameters (VIP) method is a compensatory method in which the coefficients of an additive value function (scale constants or 'weights') are treated as interdependent parameters subject to constraints imposed by the decision maker's preference structure (Dias & Clímaco (2000)). Unlike MAUT's additive utility function (Keeney & Raiffa (1976)), the logic of the VIP method considers the assessment of an alternative to be not only a function of its performance on the criteria but also a function of the criteria coefficients. This evaluation is made by means of an additive value function which takes the criteria coefficients as variables, by assessing all possible combinations of these parameters within a vectorial space, a 'weights space', allowed by the decision maker. The VIP method explores the combinations of all parameters for which the decision maker has expressed indifference through a set of constraints.
The VIP's additive value function is obtained as in equation (1):

V(a_i, k) = \sum_{j=1}^{n} k_j u_j(g_{ij}),   (1)

where ai is any alternative of the decision problem, k = (k1, k2, ..., kn) is a point in the decision set K (the space of coefficients informed by the decision maker) and uj(gij) is the utility of the performance of alternative i on criterion j. Imprecise information in this method is related to the criteria coefficients k = (k1, k2, ..., kn), assuming that K is bounded by linear constraints. These constraints imposed by the decision maker may bind criteria coefficients through upper and lower limits on coefficient values, through a ranking of these parameters, or through restrictions on criteria trade-offs. Then, to assess the global performance of an alternative, the VIP method makes use of four approaches that complement each other in order to obtain rich and robust conclusions supporting the resulting choice. These approaches are briefly described below (Dias & Clímaco (2000)):
• Approach based on optimality: searches for the alternative presenting the best performance in the additive value function for all k ∈ K. Although it is not easy to find such an alternative, this approach helps to identify and then eliminate dominated alternatives, thus reducing the set of alternatives under analysis.
• Approach based on pairwise comparison: explores the subsets of K that favour each alternative when two alternatives are compared. It helps to identify, inside the set K, the greatest advantage of choosing an alternative ai in relation to another specified alternative aj.
• Approach based on variation ranges: exploring all possible parameter combinations in K, this approach permits an observation of which alternatives are most affected by coefficient variations, which avoids subsequent sensitivity analysis.
• Approach based on pessimistic aggregation rules: this approach helps to identify, for each alternative ai, the greatest difference within K between the global performance of ai and the global performance of all other alternatives with a higher value in the additive functions. Due to the imprecise information related to the criteria coefficients, an alternative ak with the lowest relative disadvantage regarding global performance may be recommended under this approach.

4. The proposed decision model
Firstly, preference and probabilistic modelling is performed for each criterion of the decision model. Applying elicitation procedures, such as those described in Keeney & Raiffa (1976), a utility function for each criterion is obtained from the decision maker. The parameters and shapes of the three utility functions result from the elicitation procedures (Almeida (2005)). Using these functions, the decision maker's preference modeling is performed and his behavior (averse, neutral or prone) regarding risk is incorporated (Keeney & Raiffa (1976)). For the interruption time and contract cost criteria, exponential utility functions have been found in Almeida (2001) for U(ti) and U(ci), and they are given as follows:

U(t_i) = e^{-A_1 t_i},   (2)
U(c_i) = e^{-A_2 c_i}.   (3)

Often found in practice (Keeney & Raiffa (1976)), exponential utility functions mean that higher values of consequences are much more undesirable for the decision maker than lower ones (Almeida (2002)). In this problem of maintenance contract selection, it is assumed that each alternative ai has its particular contract cost, which is a constant value.
Hence, the cost criterion is evaluated directly for each alternative through expression (3). However, for the interruption time criterion, the evaluation of each alternative ai must be based on the probabilistic character of TI. This implies that a probability density function fi(ti) must be taken into account. In this paper, it is assumed that TTR (and consequently TI, since it has been assumed that TI = TTR) follows a Gamma distribution with shape parameter n = 2 for every alternative. The other parameter, ui, may be different for each contract option ai. This assumption is reasonable in practical situations where the interruption time is concentrated around a modal value. Thus, the p.d.f. of TI for each alternative is given as follows:

f_i(t_i) = u_i^2 \, t_i \, e^{-u_i t_i}.   (4)

Therefore, ti is not directly evaluated for each alternative ai; rather, the evaluation of contract alternatives on this criterion is based on the parameter ui through the utility function U(ui). This is derived from U(ti) by applying the linearity property of utility theory (Berger (1985)), as presented in expression (5) (Almeida (2005)):

U(u_i) = \int_0^{\infty} U(t_i) \Pr(t_i \mid u_i) \, dt_i.   (5)

In expression (5), Pr(ti | ui) is represented by fi(ti). So, substituting (2) and (4) into (5) and solving the resulting integral, it follows that

U(u_i) = u_i^2 / (A_1 + u_i)^2.   (6)

Thus, although the DM expresses his preferences on ti by U(ti), each alternative's performance on this criterion is analysed through U(ui) by means of expression (6). Regarding the dependability criterion, it is assumed in this paper that there is an uncertainty related to the real value of ui, and this uncertainty can be expressed through a prior probability density function on ui (Almeida (2005)). This probability, represented by π(ui), can be obtained by means of Bayesian elicitation procedures (Keeney & Raiffa (1976)). Then, for each alternative ai, di is expressed as the probability that u_i ≥ u_i^c, where u_i^c is the value specified for ui in the contract proposal. For this kind of criterion, as pointed out by Almeida (2005), a logarithmic utility function is often found in practice (Keeney & Raiffa (1976)). So, the utility of a dependability value di is given by means of expression (7) below:

U(d_i) = B_3 + C_3 \ln(A_3 \cdot d_i).   (7)

Once U(ui), U(ci) and U(di) are obtained for each contract proposal ai, an inter-criteria assessment is performed to provide a suggestion for the best alternative according to the decision maker's preferences. It is assumed in this paper that the decision maker presents a compensatory rationality represented by an additive value function. However, even when such a function is adequate for obtaining a global performance for each contract alternative, the decision maker may find it hard to set fixed values for the criteria coefficients and to state the relative importance of the criteria. He may not feel comfortable about saying how disposed he is to lose on one criterion in order to obtain a gain on another. This situation is framed in a context of partial information related to the scale of criteria coefficients used in an additive model. For tackling the selection of maintenance contracts in this context, the Variable Interdependent Parameters Method is used here to support the decision process with regard to the choice of the most preferred contract proposal.
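As a quick illustrative check (a sketch, not part of the original paper), expressions (3), (6) and (7) can be evaluated with the parameter values elicited in the numerical application of Sect. 5 below; doing so for alternative a1 of Table 1 reproduces its tabulated utilities.

import math

A1, A2, A3, B3, C3 = 0.09, 0.004, 55, -4.2, 1.3   # elicited parameters (Sect. 5)

def U_u(u):                     # interruption time criterion, expression (6)
    return u ** 2 / (A1 + u) ** 2

def U_c(c):                     # contract cost criterion, expression (3)
    return math.exp(-A2 * c)

def U_d(d):                     # dependability criterion, expression (7)
    return B3 + C3 * math.log(A3 * d)

# alternative a1 of Table 1: u = 0.9, c = 175, d = 0.95
print(round(U_u(0.9), 2), round(U_c(175), 2), round(U_d(0.95), 2))   # 0.83 0.50 0.94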
The additive value function used to obtain a global assessment for each contract alternative is presented in expression (8):

V(a_i, k_1, k_2, k_3) = k_1 U_u(u_i) + k_2 U_c(c_i) + k_3 U_d(d_i).   (8)

With this expression, the global performance of each contract proposal is not only a function of its performance on the interruption time, cost and dependability criteria, but also a function of the criteria coefficients. These are not fixed, but are limited in a three-dimensional space defined by preference constraints which the decision maker provides as an input to the model.

5. A numerical application
A numerical application is presented to illustrate the use of the proposed decision model. A DM has to select a maintenance contract from 6 contract alternatives, given in Table 1. Values on the criteria and the utilities of performance are given for all alternatives. Utility values have been assessed after applying an elicitation procedure as described in Keeney & Raiffa (1976). From this elicitation procedure, the parameters obtained for the utility functions were: A1 = 0.09; A2 = 0.004; A3 = 55; B3 = -4.2; C3 = 1.3. Once the utility values for all alternatives on all criteria are obtained, a multicriteria analysis through the variable interdependent parameters approach is performed. In this application, the software VIP Analysis (Dias & Clímaco (2000)), a decision support tool that implements the VIP method, is used to undertake the calculations and to display the results.

Table 1. Performances of maintenance contract alternatives

Alternative   u_i    c_i   d_i    U(u_i)  U(c_i)  U(d_i)
a1            0.90   175   0.95   0.83    0.50    0.94
a2            0.75   100   0.80   0.80    0.67    0.72
a3            0.75   120   0.85   0.80    0.62    0.80
a4            0.50    75   0.70   0.72    0.74    0.55
a5            0.70    75   0.65   0.79    0.74    0.45
a6            0.30    60   0.70   0.59    0.79    0.55

If the decision maker felt secure about setting fixed values for the three criteria coefficients of the additive model, the global performances of the alternatives could simply be ranked in order to obtain the one with the highest performance. However, in this problem, the decision maker has only felt able to record upper and lower limits for each criterion coefficient, and to rank the criteria in order of importance as follows: cost, interruption time and dependability. This ranking can be expressed by k2 ≥ k1 ≥ k3. This information is shown in Table 2, and, with the addition of a normalization constraint, it represents a set of constraints bounding a coefficient space in which robust conclusions will be sought.

Table 2. Upper and lower bounds for criteria coefficients

Criterion           Coefficient  Importance order  Lower bound  Upper bound
Interruption time   k1           2nd               0.25         0.60
Contract cost       k2           1st               0.40         0.80
Dependability       k3           3rd               0.10         0.50

The information presented in Table 1 and Table 2 is then inserted into the VIP Analysis software for a multicriteria assessment using the Variable Interdependent Parameters Method; the optimisation it performs over the coefficient space is sketched below.
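The 'variation ranges' analysis can be sketched with a linear program; the following hedged example (an illustration, not the VIP Analysis implementation) computes the performance range of alternative a2 over K, i.e. the minimum and maximum of expression (8) subject to the Table 2 bounds, the ranking k2 ≥ k1 ≥ k3 and the normalization k1 + k2 + k3 = 1.

import numpy as np
from scipy.optimize import linprog

u = np.array([0.80, 0.67, 0.72])              # U(u2), U(c2), U(d2) from Table 1
bounds = [(0.25, 0.60), (0.40, 0.80), (0.10, 0.50)]   # Table 2 limits on k1, k2, k3
A_ub = [[1, -1, 0],                           # k1 <= k2
        [0, -1, 1],                           # k3 <= k2
        [-1, 0, 1]]                           # k3 <= k1
b_ub = [0, 0, 0]
A_eq, b_eq = [[1, 1, 1]], [1.0]               # normalization of the coefficients

lo = linprog(u, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
hi = linprog(-u, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(lo.fun, -hi.fun)                        # min and max global value of a2 over K

Repeating this pair of linear programs for every alternative yields the minimum values and performance ranges that the software displays in Figure 1.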
Even without fixing exact values for the criteria coefficients, some conclusions about the range of the global performance of the alternatives can be drawn from Figure 1 below.

Figure 1. Minimum and maximum values for alternatives with an imprecise additive value function (left side); and performance ranges (right side)

From Figure 1, it can be seen that there are no dominated alternatives. Alternative a2, whose results are highlighted, presents the highest minimum value of global performance within the whole coefficient space bounded by the decision maker. On the right side of Figure 1, alternative a2 is also displayed as presenting the narrowest performance range under coefficient variation, which indicates a2 as the alternative with the most robust performance of all the contract alternatives. In Figure 2, two results are shown from a pairwise confrontation and a maximum regret analysis with the VIP Analysis software. In a maximum regret analysis, the set of coefficients is observed in which each alternative is dominated by the others, and the highest performance disadvantage between this alternative and all the others stands out.

Figure 2. Pairwise confrontation table (left side); and maximum regret related to alternatives (right side)

From this analysis, it is also recommended that alternative a2 be selected, as it presents the lowest maximum regret. Therefore, within a context of incomplete information related to the criteria coefficients of a maintenance contract selection model, alternative a2 has been indicated for selection due to its having the highest minimum global value, the lowest performance variability and the lowest maximum regret when considering all six alternatives for the maintenance contract.

6. Conclusions
This paper has presented a multicriteria decision model for dealing with the selection of maintenance contracts within a context where the decision maker, using a compensatory rationality modeled by an additive value function, is not able to give, or does not feel comfortable about giving, precise information about criteria importance or criteria trade-offs. Interruption time, contract cost and dependability criteria have been used to assess contract proposals. Utility Theory has been applied for assessing the utility of the performance of alternatives on each criterion, which allowed the incorporation of the decision maker's preferences and behavior regarding risk. The proposed model was applied using the Variable Interdependent Parameters Method and the VIP Analysis software. Using coefficient constraints given by the decision maker, the model allowed robust conclusions to be drawn about the performances of the contract alternatives and the best one to be identified with regard to its performance variability, maximum regret, and relative advantages and disadvantages within all possible combinations of criteria coefficients. Future studies may usefully explore the addition of new criteria and the use of new probabilistic assumptions to model the interruption time criterion. They may also explore possible model adjustments in order to allow application in a context of group decision making.

References
Almeida, A. T. (2001) Multicriteria decision making on maintenance: spares and contracts planning, European J. of Operational Research, 129 (2), 235-241
Almeida, A. T. (2002) Multicriteria modelling for repair contract problem based on utility function and ELECTRE I method, IMA J. of Management Mathematics, 13, 29-37
Almeida, A. T. (2005) Multicriteria modelling of repair contract based on utility and ELECTRE I method with dependability and service quality criteria, Annals of Operations Research, 138, 113-126
Berger, J. O. (1985) Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, Berlin
De Boer, L., Van Der Wegen, L. and Telgen, J. (1998) Outranking methods in support of supplier selection, European J. of Purchasing & Supply Management, 4, 109-118
Dias, L. C. and Clímaco, J. N. (2000) Additive aggregation with variable interdependent parameters: the VIP Analysis software, J.
of the Operational Research Society, 51 (9), 1070-1082
Ghodsypour, S. H. and O'Brien, C. (1998) A decision support system for supplier selection using an integrated analytic hierarchy process and linear programming, International J. of Production Economics, 56-57, 199-212
Huang, S. H. and Keskar, H. (2007) Comprehensive and configurable metrics for supplier selection, International J. of Production Economics, 105, 510-523
Kennedy, W. J. (1993) Modelling in-house vs contract maintenance with fixed costs and learning effects, International J. of Production Economics, 32 (3), 277-283
Keeney, R. L. and Raiffa, H. (1976) Decisions with Multiple Objectives: Preferences and Value Trade-Offs, John Wiley & Sons, New York
Slack, N. and Lewis, M. (2002) Operations Strategy, Prentice Hall, London
Vincke, P. (1992) Multicriteria Decision Aid, John Wiley & Sons, New York
Weber, C. A., Current, J. R. and Benton, W. C. (1991) Vendor selection criteria and methods, European J. of Operational Research, 50, 2-18

Spare parts planning and risk assessment associated with non-considering system operating environment
Behzad Ghodrati
Div. of Operation and Maintenance Engineering, Luleå University of Technology, Luleå, 97753, Sweden
Behzad.Ghodrati@ltu.se

Abstract: Spare parts needs – as an issue in the field of product support – are dependent on the technical characteristics of the product, e.g. its reliability and maintainability, and on the operating environment in which the product is going to be used (e.g. the temperature, humidity, and the user/operator's skills and capabilities), which constitute covariates. The covariates have a significant influence on the system reliability characteristics and consequently on system failures and the number of required spare parts. Ignoring this factor may cause irretrievable losses in terms of production and ultimately in economic terms. This is demonstrated by the event tree risk analysis method, used in a new and non-standard form in the present paper. It has been found that the percentage of risk associated with not considering the system operating environment in spare parts estimation is relatively high.

1. Introduction – Product support
Most industrial products and systems wear and deteriorate with use. In general, due to economic and technological considerations, it is almost impossible to design a machine/system that is maintenance-free. In fact, maintenance requirements arise mainly from limitations in the designed-in reliability and in task performance quality. Therefore, the role of maintenance and product support can be perceived as the process that compensates for deficiencies in design with regard to the reliability of the product and the quality of the output generated by the product (Markeset & Kumar (2003)). The product support and maintenance needs of systems are to a large extent decided during the design and manufacturing phase (e.g. Blanchard (2001), Blanchard & Fabrycky (1998), Goffin (2000), Markeset & Kumar (2001) and Smith & Knezevic (1996)). The product support and service delivery performance in the operational phase can be enhanced through better provision of spare parts and improvement of the technical support system. However, to ensure the desired product performance at the lowest cost, we have to design and develop maintenance and product support concepts right from the design phase.
The existing literature appears to have paid little attention to the influence of product design characteristics, as moderated by the product operating environment, on the dimensioning of product support, especially in the field of spare parts planning. Spare parts needs (as an issue of product support) are dependent on the engineering characteristics of the product (reliability and maintainability), on human factors (operators' skills and capabilities) and on the environment in which the product is working. Therefore, product support specifications should be based on the design specifications and the conditions faced by the customer. The risk associated with ignoring the system operating environmental factors is remarkable and plays an important role in the cost of operation (Ghodrati & Kumar (2005a)).

2. Operating environment
The operating environment should be seriously considered when dimensioning product support and service delivery performance strategies. Generally, the recommended maintenance program for systems and components is based on their age and condition without any consideration of the operating environment, which leads to many unexpected failures. This creates poor system performance and a higher Life Cycle Cost (LCC) due to unplanned repairs and/or restoration, as well as support. The environmental conditions in which the equipment is working, such as the temperature, humidity, dust, the maintenance crew's and operator's skill, the operation profile, etc., often have a considerable influence on the product failure behavior and thereby on the maintenance and product support requirements (Ghodrati (2005), Kumar et al. (1992)). Furthermore, the 'distance' of the user from the manufacturer/supplier, in many senses (not only geographical, but also in terms of infrastructure, culture, etc.), can exert an additional influence on spare parts management.

3. Spare parts
Industrial systems and technical installations may fail, and repair is then needed to retain them in, or restore them to, working condition. These systems and installations are also subject to planned maintenance. In most cases, maintenance and repair require pieces of equipment to replace defective parts. The common name for these parts is spare parts. They may be subdivided into:
1. Repairable parts: if a repairable item has failed and a shutdown of the system is unavoidable, the user has to accept at least the time required to repair the item before the system is up again. In this situation the user only has to wait for the time that it takes to repair the item.
2. Non-repairable parts, also called consumables, which are considered and studied in the present research. In other words, we limit ourselves to non-repairable spare parts in the normal phase. If such a part fails, it is removed and replaced by a new item.
Evidently, the control and management of spare parts constitute a complex matter. Common statistical models for inventory control lose their applicability, because the demand process differs from that assumed, owing to the machine characteristics, the operating situation and unpredictable events during operation. An essential element in many models is forecasting demand, which requires historical demand figures that are unavailable or invalid for new and/or low-consumption parts. Moreover, the shorter life cycles of products and better product quality further reduce the possibility of collecting historical data.
Unfortunately, the pragmatic approaches to spare parts inventory management and control are not validated in any way, so controllability and objectivity are hard to guarantee (Fortuin & Martin (1999)). The spare parts forecasting method based on product reliability characteristics and the operating environment (Ghodrati & Kumar (2005a,b)), as a systematic approach, may improve this undesirable situation. The key questions in any logistics management are the following:
• Which items are needed as spare parts?
• Which items do we put in stock?
• When do we (re)order? and
• How many items do we (re)order?
Therefore, the main objective of this paper is to estimate the required number of spare parts and evaluate the associated risks (the risk of a shortage of spare parts, due to not considering the operating environment, leading to financial losses).

4. The context of spare parts logistics
As mentioned earlier, spare parts are required in the maintenance process of systems. Regarded from the point of view of a spare parts supplier or systems manufacturer, we can make a distinction between two types of industrial products that require spare parts:
1. Conventional products: these are the products and systems sold to customers and installed at the customer's site for the purpose of providing products or services. These systems are under the users' control, and are exemplified by machines in production departments, transport vehicles, TVs, computers and private cars. Mostly there is a technical service department within the client location/organization performing maintenance and controlling an inventory of spare parts. In some cases a technical service department of the firm that sold the system, i.e. the original equipment manufacturer (OEM), carries out the maintenance of these systems under separate contracts and conditions.
2. Functional products: in the functional products category, the user does not buy a machine/system but the function that it delivers (Markeset & Kumar (2003a)). To avoid the complexities of maintenance management, many customers/users prefer to purchase only the required function and not the machines or systems providing it. In this case the responsibility for maintenance and product support lies with the organization delivering the required function.
The diversity of the characteristics of spare parts management situations that have to be taken into account is usually quite large. Therefore, we need to categorize the entire assortment of spare parts. This can be conveniently accomplished according to the characteristics of an individual aspect, after which specialized control methods can be developed for each category (e.g. defining the criticality and the risk of shortage). The following are examples of criteria that can be used for categorization:
• Demand intensity
• Purchasing lead-time
• Delivery time
• Planning horizon
• Essentiality, vitality, criticality of a part
• Price of a spare part
• Costs of stock keeping
• (Re)ordering costs
In fact, in any given situation, not all the criteria are necessarily relevant and usable. Therefore, as mentioned previously, it is wise to classify the spare parts into groups, to establish appropriate levels of control over each category. Based on the source of supply and cost, the spare parts can be classified (in our case as well) into three groups of items, A, B and C, as follows:
A: Parts which can be procured overseas only and whose unit cost is very high (such as hydraulic pumps).
B: Parts which can be procured overseas only and whose unit cost is not high (e.g. seals).
C: Parts that are available locally (e.g. brake pads).

4.1. Estimation of the required number of spare parts
The environmental conditions in which the equipment is to be operated (e.g. temperature, humidity, dust, etc.) often have a considerable influence on the product reliability characteristics (Kumar & Kumar (1992), Blischke & Murthy (2000) and Kumar et al. (1992)). In fact, the reliability characteristics (e.g. the failure (hazard) rate) of a system are a function of the time of operation and of the environment in which the system is operating. The failure (hazard) rate of a system/component is the product of the baseline hazard rate λ₀(t), dependent on time only, and one other positive functional term (basically independent of time) which incorporates the effects of the operating environment factors (covariates), e.g. the temperature, pressure and operator's skill. The baseline hazard rate is assumed to be identical and equal to the hazard rate when the covariates have no influence on the failure pattern. Therefore, the actual hazard rate (failure rate) in the Proportional Hazards Model (PHM) (Cox (1972)), with the usual exponential form for the time-independent function incorporating the effects of covariates, can be defined as follows:

\lambda(t, z) = \lambda_0(t) \exp(z \alpha) = \lambda_0(t) \exp\left( \sum_{j=1}^{n} \alpha_j z_j \right),   (1)

where z_j, j = 1, 2, ..., n are the covariates associated with the system and α_j, j = 1, 2, ..., n are the unknown parameters of the model, defining the effects of each of the n covariates. It has been found (Ghodrati et al. (2005b)) that, in the case of Weibull distributed lifetimes of components/systems, the influencing covariates change the scale parameter η only, and the shape parameter β remains almost unchanged. Therefore, the scale and shape parameters under the influence of covariates are

\beta = \beta_0, \qquad \eta = \eta_0 \left[ \exp\left( \sum_{j=1}^{n} \alpha_j z_j \right) \right]^{-1/\beta}.   (2)

The required number of spare parts can be calculated for exponentially distributed time to failure (constant failure rate) as (Ghodrati (2005))

1 - P(t) = \exp(-\lambda t) \times \sum_{x=0}^{N} \frac{(\lambda t)^x}{x!},   (3)

where P(t) is the probability of a shortage of spare parts (1 − P(t) is the confidence level of spare part availability, or service level) and N is the total number of spare parts available in period t. When the failure time follows a Weibull distribution, the number of required spare parts can be estimated as (Ghodrati (2005))

N_t = \frac{t}{MTTF} + \frac{\zeta^2 - 1}{2} + \zeta \, \Phi^{-1}(p) \sqrt{\frac{t}{MTTF}},   (4)

where ζ is the coefficient of variation of the times to failure (ζ = σ(T)/E(T)) and Φ⁻¹(p) represents the inverse normal distribution function.

4.2. Spare parts inventory management
The logistics of spare parts is very important and difficult, since the demand is hard to predict, the consequences of a stock-out may be disastrous, and the prices of parts are high. If parts are understocked, then defective systems/machines cannot be serviced, resulting in lost production and, consequently, customer dissatisfaction. On the other hand, if parts are overstocked, the holding costs are high. Situations where some parts have a very high inventory level while others are in shortage can be quite common. In such a service system, an efficient inventory management system is essential.
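Returning briefly to the estimates of Sect. 4.1, the following hedged numerical sketch (all figures below are assumed for illustration only, not taken from the paper) chains equations (2) and (4): the covariates rescale the Weibull scale parameter, which changes the MTTF and hence the number of spares required at a given service level.

import math
from statistics import NormalDist

beta0, eta0 = 1.8, 2000.0              # assumed baseline Weibull shape / scale (hours)
alpha_z = 0.6                          # assumed sum of alpha_j * z_j over the covariates
eta = eta0 * math.exp(alpha_z) ** (-1.0 / beta0)          # equation (2)

mttf = eta * math.gamma(1 + 1 / beta0)                    # Weibull mean life
var = eta ** 2 * (math.gamma(1 + 2 / beta0) - math.gamma(1 + 1 / beta0) ** 2)
zeta = math.sqrt(var) / mttf           # coefficient of variation of time to failure

t, p = 8760.0, 0.95                    # one year of operation, 95% service level
N = t / mttf + (zeta ** 2 - 1) / 2 + zeta * NormalDist().inv_cdf(p) * math.sqrt(t / mttf)
print(math.ceil(N))                    # spare parts to stock, equation (4)

Running the same calculation with alpha_z = 0 (covariates ignored) gives a smaller N, which is precisely the shortage risk the paper quantifies.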
The requirements for planning the logistics of spare parts differ from those of other materials in several ways: the service requirements are higher, as the effects of stock-outs may be financially severe; the demand for parts may be extremely sporadic and difficult to forecast; and the prices of individual parts may be very high. These characteristics put pressure on streamlining the logistic system for spare parts, and with high requirements for material flow, it is natural that spare parts management should be an important area of inventory research in the design phases of technological systems and product support systems. The principal objective of any inventory management system is to achieve an adequate service level with minimum inventory investment and administrative costs. The optimum spare parts management strategy must describe what level of service is to be offered and whether the customers are segmented and prioritized in terms of service, and it must ensure the availability of parts and the quality of service at reasonable cost as a main concern in maintenance. In general terms, when designing a spare parts logistics system, at least the following factors usually have to be considered: the product-specific characteristics (e.g. the reliability characteristics), the location of customers and their special requirements, and the system/machine operating environment.
There are some operational characteristics of maintenance spare parts that can be used for estimating the need for spare parts and controlling the inventory. The most relevant control characteristics are criticality, demand and value (Huiskonen (2001)). The criticality of an item is probably the first aspect considered by spare parts logistics practitioners. The criticality of a part is related to the consequences of the failure of the part for the process in question in the event of a replacement not being readily available. The impact of a shortage of a critical part may be a multiple of its commercial value. One practical approach is to relate the criticality to the time within which the failure has to be corrected. With respect to criticality, parts are either highly critical, medium-critical or non-critical (Huiskonen (2001)). High criticality means operationally that the need for the parts in the event of failure is immediate, while parts of medium criticality allow some lead-time to correct the failure. From the logistics control point of view, it is most essential to know how much time there is to react to the demand, i.e. whether the need is immediate or whether there is some time to operate. The predictability of demand is related to the failure mode and failure process of a part, the intensity of operation and the possibilities of estimating the failure pattern and rates by statistical means. From a control point of view, it is useful to divide the parts in terms of predictability into at least two categories: parts with random failures (e.g. electronic parts) and parts with a predictable wearing pattern (e.g. mechanical parts); the present research deals with the second category. The value of a part is a common control characteristic.

5. Risk analysis
5.1. Performance measurement
Since investments in spare parts can be substantial, management is interested in decreasing stock levels whilst maximizing the service performance of the spare parts management system. To assess the result of improvement actions, performance indicators (such as the fill rate and service rate) are needed.
For example, sometimes the duration of the unavailability of parts is a major factor of concern, and then the waiting time for parts is a more relevant performance indicator. Performance measurement for risk represents a problem in its own right. Usually risk items are not issued, but their presence in stock is justified. In this control category, the most important factor in performance measurement is the risk of unavailability. In general, this risk can be expressed as (Fortuin & Martin (1999)):

\[ \mathrm{RISK}_i = P(D_i > S_i) \times C_i \tag{5} \]

where
RISKi = expected financial loss due to risk item i being out of stock,
Di = demand for item i during its entire (or remaining) life cycle,
Si = initial number of items of type i in stock,
Ci = financial consequences if an out-of-stock situation occurs for item i.
In the following we discuss in greater detail the concept of risk analysis and the risk of unavailability of spare parts when required.

5.2. Risk definition
Kaplan & Garrick (1981) have discussed a number of alternative definitions of risk, including the following:
• Risk is a combination of uncertainty and damage.
• Risk is the ratio of hazards to safeguards.
• Risk is a triplet combination of an event, its probability and its consequences.
The term quantitative risk analysis refers to the process of estimating the risk of an activity based on the probability of events whose occurrence can lead to undesired consequences. The term hazard expresses the potential for producing an undesired consequence, without regard to how likely such a consequence is. Therefore, one of the hazards of the spare parts inventory is the shortage of a spare part when it is required, which could produce a number of different undesired consequences. The term risk usually expresses not only the potential for an undesired consequence, but also how probable it is that such a consequence will occur. Quantitative risk analysis attempts to estimate the frequency of accidents and the magnitude of their consequences by different methods, such as the fault tree and the event tree methods.
In fact, maintenance plays a pivotal role in managing risks at an industrial site, and it is important that the right risk assessment tools should be applied to capture and evaluate the hazards at hand to allow a functional risk-based approach (Rasche & Wooley (2000)). Unplanned stoppages or unnecessary downtime will always result in a temporary upset to the operations flow and output. The cumulative unavailability of the machine (in the case of a spare parts shortage) and of the beneficiation process, and the added cost incurred, can quickly affect the financial performance of a system.

6. Risk management
Risk management is an iterative process, as shown in Figure 1 in the appendix. Successful risk management depends on a clearly defined scope for the risk assessment, comprehensive and detailed hazard mapping, and a thorough understanding of the possible consequences. There are several tools and techniques available to managers and engineers that can help to estimate the level of risk better. These may be either 'subjective – qualitative' or 'objective – quantitative', as shown in Figure 2 in the appendix. Both categories of techniques have been used effectively in establishing risk-based safety and maintenance strategies in many industries (Rasche & Wooley (2000)). Quantitative methods are probably ideal for maintenance applications where some data is available and decisions on system safety and criticality are to be made.
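Returning to the risk measure in equation (5), the following is a minimal sketch of how it can be computed, assuming, purely for illustration, that the life-cycle demand D_i is Poisson distributed; the stock level, mean demand and consequence cost are invented numbers, not case-study data.

```python
# Minimal sketch of the unavailability-risk measure in equation (5),
# assuming (for illustration only) Poisson-distributed life-cycle demand.
import math

def shortage_risk(stock, mean_demand, cost):
    """RISK_i = P(D_i > S_i) * C_i for item i, with D_i ~ Poisson(mean_demand)."""
    p_covered = math.exp(-mean_demand) * sum(mean_demand ** x / math.factorial(x)
                                             for x in range(stock + 1))
    return (1.0 - p_covered) * cost

# Hypothetical class-A item (imported hydraulic pump): high cost, low demand.
print(round(shortage_risk(stock=2, mean_demand=1.3, cost=50_000.0), 2))
```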
Even very basic reliability analysis of maintenance data can be used effectively in determining the optimum maintenance intervention, replacement intervals or monitoring strategy. Fault Tree Analysis and Event Tree Analysis (FTA/ETA), which are considered semi-quantitative methods, are tried and tested system safety tools originating from the defence, nuclear and aviation industries. While ETA maps the development of an event and yields quantified risk estimates for all event paths, FTA is concerned with the identification and analysis of the conditions and factors which cause or contribute to the occurrence of a defined undesirable event, usually one which significantly affects system performance, economy, safety or other required characteristics. FTA is often applied to the safety analysis of systems (IEC 1025, 1990). In the following these methods are presented in greater detail.

6.1. Risk analysis process
As briefly mentioned earlier, the risk analysis can be accomplished through the following steps:
1. Define the potential event sequences and potential incidents.
2. Evaluate the incident outcomes (consequences).
3. Estimate the potential incident frequencies. Fault trees or generic databases may be used for the initial event sequences. Event trees may be used to account for mitigation and post-release events.
4. Estimate the incident impacts on health and safety, the environment and property (e.g. the economy).
5. Estimate the risk. This is achieved by combining the potential consequence of each event with the event frequency, and determining the total risk by summing over all consequences.

7.1. Fault tree analysis
Fault Tree Analysis (FTA) is classified as a deductive method which determines how a given system state can occur. FTA is a technique that can be used to predict the expected probability of the failure/hazardous outcome of a system in the absence of actual experience of failure (Rasmussen (1981)). This lack of experience may be due to the fact that there is very little operating experience, or to the fact that the system failure/hazard rate is so low that no failures have been observed. The technique is applicable when the system is made up of many parts and the failure/hazard rates of the parts are known. The fault tree analysis always starts with the definition of the undesired event whose possible causes, probability and conditions of occurrence are to be determined. The probability of failure can be a probability of failure on demand (such as the probability that a car will fail to start when the starter switch is turned). In our case the event is "system downtime", and it is shown in the top box as the top event. The fault tree technique has been evolving for the past four decades and is probably the most widely used method for the quantitative prediction of system failure. However, it becomes exceedingly difficult to apply to very complicated problems.

7.2. Event tree analysis
An event tree is a graphical logic model that identifies and quantifies possible outcomes following an initiating event. The event tree provides systematic coverage of the time sequence of the event propagation. The event tree structure is the same as that used in decision tree analysis (Brown et al. (1974)). Each event following the initiating event is conditional on the occurrence of its precursor event. The outcomes of each precursor event are most often binary (success or failure, yes or no), but can also include multiple outcomes (e.g. 100%, 40% or 0%).
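As a complement to the description of FTA in Section 7.1, the following minimal sketch shows how basic-event probabilities combine through OR and AND gates, assuming statistically independent basic events; the event names and probabilities are hypothetical, not values from the case study.

```python
# Minimal sketch of fault tree gate arithmetic (Section 7.1), assuming
# statistically independent basic events; probabilities are illustrative.
from functools import reduce

def or_gate(*probs):
    """The gate output occurs if ANY input occurs: 1 - prod(1 - p_i)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)

def and_gate(*probs):
    """The gate output occurs only if ALL inputs occur: prod(p_i)."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)

# Hypothetical contributing causes of the top event "system downtime":
p_no_spare = 0.10       # spare part unavailable when required
p_poor_planning = 0.05  # inadequate product support planning
print(round(or_gate(p_no_spare, p_poor_planning), 4))  # 0.145
```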
Event trees have found widespread application in risk analysis. Two distinct applications can be identified. The pre-incident application examines the systems in place that would prevent incidents that can develop into accidents. The event tree analysis of such a system is often sufficient for the purposes of estimating the safety of the system. The post-incident application is used to identify incident outcomes. Event tree analysis can be adequate for this application. Pre-incident event trees can be used to evaluate the effectiveness of a multi-element proactive system. A post-incident event tree can be used to identify and evaluate quantitatively the various incident outcomes that might arise from a single initiating (hazardous) event. Fault trees are often used to describe the causes of an event in an event tree. Moreover, the top event of a fault tree may be the initiating event of an event tree. Note the difference in the meaning of the term initiating event between the applications of fault tree and event tree analysis. A fault tree may have many basic events that lead to the single top event, but an event tree will have only one initiating event that leads to many possible outcomes. The sequence is shown in the logic diagram in the appendix (Figure 3).

8. Case study
As mentioned earlier, operation stoppages in the case of system/machine downtime are mostly due to the lack/unavailability of required spare parts. Incorrect estimation of the required number of spares within a specific time horizon is one of the reasons for these events. The system/machine operating environment is an important factor which affects the function of machines. This factor also influences the maintenance and support plan of a system, and ignoring it is one of the most significant reasons for inaccurate forecasting of the required number of spare parts. In this paper we have attempted to analyze the risk of ignoring the effects of operating environment factors on the output of a process, in the form of system/machine downtime and loss of production. For this risk analysis we mainly carried out event tree analysis, but also applied fault tree analysis as a complementary method. Both event tree and fault tree analysis have been used in a special, non-standard way, in which organizational states and decisions, as well as events and consequent changes, are introduced and taken into account in the analysis. The case studied concerns the hydraulic pump of the brake system of the fleet of loaders at the Choghart Iron Ore Mine in Iran.

8.1. Construction of the event tree
The construction of an event tree is sequential and, like fault tree analysis, is performed from left to right (in the usual event tree convention). The construction begins with the initiating event, and the temporal sequences of occurrence of all the relevant safety functions or events are entered. Each branch of the event tree represents a separate outcome (event sequence), as shown in Figure 4 in the appendix. The initiating event (Step 1) is usually a failure/undesired event corresponding to the release of a failure/hazard. The initiating event in our case is "ignoring the product operating environment", and the frequency of this incident was estimated from the historical records.
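The outcome frequencies used later in Section 8.1 follow the standard event tree calculation, described in the next step: the frequency of each branch is the initiating-event frequency multiplied by the conditional probabilities along the path. A minimal sketch, with invented numbers rather than the case-study estimates:

```python
# Minimal sketch of the event tree frequency calculation used in Section 8:
# the frequency of an outcome branch is the initiating-event frequency times
# the conditional probabilities along the path. All numbers are illustrative.
from math import prod

def branch_frequency(initiating_freq, cond_probs):
    return initiating_freq * prod(cond_probs)

# Hypothetical path: ignoring the operating environment -> poor support
# planning -> poor spares estimation -> shortage -> downtime -> lost output.
f0 = 200.0                          # initiating events per year (assumed)
path = [0.8, 0.7, 0.6, 0.5, 0.9]    # conditional probabilities (assumed)
print(round(branch_frequency(f0, path), 2))  # 30.24 outcomes per year
```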
The safety functions and organizational states (Step 2) are actions or barriers that can interrupt the sequence from an initiating event to a failure/hazardous outcome (in other words, safety functions/organizational states and decisions are different state descriptions and are components of a chain of explanations). Safety functions can be of different types, most of which can be characterized as having outcomes of either success or failure with regard to demand. In our case this step comprises:
a) Inadequate product support planning (organizational state)
b) Inadequate/poor spare parts estimation (organizational decision)
c) Shortage of spare parts when required (event)
d) Excessive system/machine downtime (consequent event)
e) Loss of production in the case of system downtime (consequent event)
f) Economic loss in the case of loss of production (consequent event)
As observed, and also as mentioned earlier, this is not a standard form of event tree analysis. It is a special form in which a safety function may also be defined as an undesired situation (state), instead of as a barrier-like state as in the standard form of event tree analysis. Each heading in the event tree corresponds to a state/event/condition (Step 5) of some outcome taking place if the preceding event has occurred. Therefore, the probability associated with each branch is conditional and differs from one state to another (e.g. depending on long/short term decisions and the criticality of spare parts). The sources of conditional probability data in our case are historical records (e.g. daily reports from the operators, the maintenance crew at the workshop and the inventory system), interviews with the management of the maintenance and spare parts inventory departments, and experience; the probabilities are shown on the branches in Figure 4 in the appendix. The frequency of each outcome is determined by multiplying the initiating event frequency by the conditional probabilities along each path leading to that outcome. The qualitative output shows the number of outcomes that result in the success versus the number of outcomes resulting in the failure of the protective system in a pre-incident application. The qualitative outcome from a post-incident analysis is the number of more hazardous outcomes versus the number of less hazardous ones. The quantitative output is the frequency of each event outcome. The event tree shown in the appendix (Figure 4) was developed on the basis of the existing situations and the experience of the people involved (e.g. maintenance and inventory management), who were aware of the related consequences of events. There are 15 output branches that cover most of the possible combinations of branches and cases. The upper branches represent success (yes), connected to poor situations such as the existence of poor product support planning and/or loss of production (output), and the lower branches represent the corresponding failure (no), indicating, for instance, a strong need for accurate spare parts estimation. The complete event tree analysis (Figure 4) includes the estimation of the frequencies of all the output branches. As is seen from the estimated frequencies, the sequences listed below have a high probability of loss (classified into two consequence groups: CRASH and HARD) related to ignoring the product operating environment factors in the dimensioning of product support and system function.
CRASH:                HARD:
ABCDEF   = 71.9712    ABCDE F  = 30.8448
ABC DEF  = 47.9808    ABCD EF  = 17.9928
ABCDEF   = 55.9776    ABC D EF = 7.7112
ABCDEF   = 55.9776    ABCDEF   = 5.1408

These high-probability outputs mostly belong to the situation in which the operating environment has been ignored. Therefore, it is important, and recommended, to take this factor into consideration when estimating and managing the spare parts inventory.

8.2. Fault tree analysis
A simple fault tree analysis was also carried out in this research, as a method complementary to the event tree analysis, to ascertain the influence of not considering the system's working environment on the system downtime, which creates a need for repair, maintenance and consequently spare parts. As can be observed in the fault tree chart (Figure 5 in the appendix), the probability of system stoppage is influenced and controlled by the operating environment factors. In this fault tree (Figure 5) no exact probabilities corresponding to the gates are calculated. In addition, because the figure uses only OR gates, it does not consider the system level.

Conclusions
It has been clearly shown that the operating environment has a significant influence on the planning of product support and spare parts requirements, through the product reliability characteristics (Ghodrati & Kumar (2005a,b)). Therefore, product support specifications should be based on the design specifications and the conditions faced by the customer. The remarkable influence of considering and/or ignoring the operating environment factors on the forecasting and estimation of the required spare parts is validated by the results of the risk analysis. In the present paper we have performed a risk analysis of not considering the system working conditions in spare parts planning, through a new and non-standard event tree and fault tree analysis. We introduced and implemented an event tree analysis in which organizational states and managerial decisions are taken into account in the risk analysis. In other words, we used undesired states, in combination with events and consequent changes, instead of barriers as safety functions in the event tree analysis. Based on the results from the event tree analysis, there is a considerable risk associated with ignoring these working environment factors, which might cause irretrievable losses.

References
Billinton, R. and Allan, R. N. (1983) Reliability Evaluation of Engineering Systems: Concepts and Techniques, Boston, Pitman Books Limited
Blanchard, B. S. (2001) Maintenance and support: a critical element in the system life cycle, Proceedings of the International Conference of Maintenance Societies, May, Melbourne, Paper 003
Blanchard, B. S. and Fabrycky, W. J. (1998) Systems Engineering and Analysis, 3rd ed., Upper Saddle River, NJ, Prentice-Hall
Brown, R. V., Kahr, A. S. and Peterson, C. (1974) Decision Analysis for the Manager, New York, Holt, Reinhardt & Winston
Cox, D. R. (1972) Regression models and life-tables, J. of the Royal Statistical Society, B34, 187-220
Fortuin, L. and Martin, H. (1999) Control of service parts, International J. of Operations & Production Management, 19 (9), 950-971
Ghodrati, B. (2005) Reliability and operating environment based spare parts planning, PhD thesis, Luleå University of Technology, Sweden, ISSN: 1402-1544
Ghodrati, B. and Kumar, U. (2005a) Operating environment based spare parts forecasting and logistics: a case study, International J. of Logistics: Research and Applications, 8 (2), 95-105
Ghodrati, B. and Kumar, U.
(2005b) Reliability and operating environment based spare parts estimation approach: a case study in Kiruna Mine, Sweden, J. of Quality in Maintenance Engineering, 11 (2), 169-184
Gnedenko, B. V., Belyayev, Y. K. and Solovyev, A. D. (1969) Mathematical Methods of Reliability, New York, Academic Press
Goffin, K. (2000) Design for supportability: essential component of new product development, Research Technology Management, 43 (2), March/April, 40-47
Huiskonen, J. (2001) Maintenance spare parts logistics: special characteristics and strategic choices, International J. of Production Economics, 71 (1-3), 125-133
Kaplan, S. and Garrick, B. J. (1981) On the quantitative definition of risk, Risk Analysis, 1, 11-27
Kumar, D. and Kumar, U. (1992) Proportional hazard model: a useful tool for the analysis of a mining system, Proceedings of the 2nd APCOM Symposium, Tucson, Arizona, 6-9 April, pp. 717-724
Kumar, D., Klefsjö, B. and Kumar, U. (1992) Reliability analysis of power cables of electric loader using proportional hazard model, Reliability Engineering and System Safety, 37, 217-222
Kumar, U. D., Crocker, J., Knezevic, J. and El-Haram, M. (2000) Reliability, Maintenance and Logistic Support: a Life Cycle Approach, Boston, Mass., Kluwer Academic Publishers
Markeset, T. and Kumar, U. (2001) R&M and risk analysis tools in product design to reduce life-cycle cost and improve product attractiveness, Proceedings of the Annual Reliability and Maintainability Symposium, 22-25 January, Philadelphia, 116-122
Markeset, T. and Kumar, U. (2003) Integration of RAMS information in design processes: a case study, Proceedings of the 2003 Annual Reliability and Maintainability Symposium, Tampa, FL, 20-24 January
Markeset, T. and Kumar, U. (2003a) Design and development of product support and maintenance concepts for industrial systems, J. of Quality in Maintenance Engineering, 9 (4), 376-392
Rasche, T. F. and Wooley, K. (2000) Importance of risk based integrity management in your safety management system: advanced methodologies and practical examples, Queensland Mining Industry Health & Safety Conference 2000, Townsville, Queensland Mining Council
Sheikh, A. K., Younas, M. and Raouf, A. (2000) Reliability based spare parts forecasting and procurement strategies, in Maintenance, Modeling and Optimization, 81-108, Boston, Mass., Kluwer Academic Publishers
Smith, C. and Knezevic, J. (1996) Achieving quality through supportability: Part 1: Concepts and principles, J. of Quality in Maintenance Engineering, 2 (2), 21-29

Appendix
Figure 1. Risk management process
Figure 2. Risk analysis options [Source: Rasche & Wooley (2000)]
Figure 3. Logic diagram for event tree analysis
Figure 4. Event tree analysis for the risk of ignoring the product operating-environment factor in spare parts planning (event tree: ignoring the operating environment in spare parts estimation)
Figure 5. Partial fault tree analysis

Modern maintenance system based on web and mobile technologies
Jaime Campos1, Erkki Jantunen2, Om Prakash3
1. School of Technology and Design, Växjö University, SE-351 95 Växjö, Sweden. 2. Senior Research Scientist, D.Sc. (Tech.), VTT Technical Research Centre of Finland, P.O. Box 1000, FI-02044 VTT, Finland. 3. Associate Professor, School of Technology and Design, Växjö University, SE-351 95 Växjö, Sweden. jaime.campos@vxu.se, erkki.jantunen@vtt.fi, om.prakash@vxu.se
Abstract: The paper illustrates the development of an e-monitoring and e-maintenance architecture and system based on web and mobile device, i.e.
PDA, technologies to access and report maintenance tasks. The rarity of experts led to the application of artificial intelligence, and later distributed artificial intelligence, for condition monitoring and the diagnosis of machine condition. Recently, web technology and wireless communication have emerged as an alternative that provides maintenance with a powerful decision support tool, making it possible to have all the necessary information wherever it is needed for maintenance analysis and its various tasks. The paper goes through the characteristics of using web and mobile devices for condition monitoring and maintenance. It illustrates the ICT used to communicate among the different layers in the architecture/system and its various client machines. The practical examples are related to the maintenance of rotating machinery, more specifically, diagnosing rolling element bearing faults.
Keywords: Condition monitoring, condition based maintenance, web application, mobile application, mobile device, PDA, database architecture

1. Introduction
Condition based maintenance is based on condition monitoring, which involves the acquisition of data and its processing, analysis and interpretation, and the extraction of useful information from it. It provides the maintenance personnel with the resources needed to identify a deviation from predetermined values. In the case of a deviation, a diagnosis is normally performed to determine its cause. Finally, a decision is taken regarding when and what maintenance tasks are to be performed. Prognosis is performed to foresee a failure as early as possible and to be able to plan the maintenance task in advance (Jantunen (2003)). The decision support systems that have been used to help maintenance departments address this matter have changed and developed over time. In the 1980s, expert systems were used, and in the 1990s various techniques like neural networks and fuzzy logic were used in condition monitoring (Wang (2003) and Warwick et al. (1997)). Distributed artificial intelligence has also been used in condition monitoring since the advent of the Internet in the late 1990s (Rao et al. (1996), Rao et al. (1998a), Rao et al. (1998b) and Reichard et al. (2000)). In this process, web technology and agent technology have recently started to appear in maintenance and condition monitoring; the first review on the subject appeared in 2006 (Campos & Prakash (2006)). These technologies have gained wider acceptance because of the agents' capability to operate in distributed open environments like the Internet or a corporate Intranet and to access heterogeneous and geographically distributed databases and information sources (Feng et al. (2004), Sycara (1998)). Recently, the combination of web technology and wireless communication has come up as an alternative for providing maintenance personnel with the right information on time, wherever it is needed for maintenance analysis and its various tasks. This paper proposes an e-maintenance, i.e. web and mobile device, architecture for maintenance and condition monitoring purposes.

2. The Web and Mobile architecture
Web technology, i.e. the Internet and Intranet, is continuously evolving and offering various techniques to utilise the application software that runs on the net. An Intranet uses Web technology to create and share knowledge within an enterprise only.
The Web consists of applications that are developed in different programming languages and technologies such as Hyper Text Markup Language (HTML), Dynamic Hyper Text Markup Language (DHTML), Extensible Markup Language (XML), Active Server Pages (ASP), Java Server Pages (JSP) and Java Database Connectivity (JDBC). The protocols that normally dominate the communication between the Web and its various actors are the Hypertext Transfer Protocol (HTTP) and the Transmission Control Protocol/Internet Protocol (TCP/IP). Recently, Web services (WS) have started to appear in Web applications. They also use HTTP to send and receive content messages. Figure 1 illustrates the proposed Web and Mobile architecture. On the left there are rotating machines. Next is the proposed three-tier web and mobile architecture system. Each tier has its own specific task. The database servers store the data entering the system. They provide data and information to the middleware and to the various client machines. The user interacts with the system through the client machines, i.e. computers and mobile devices.

Figure 1. The Web and Mobile Architecture

The middleware consists of the application/Web services and the Web server. The Web servers are the computers connected to the Internet or Intranet and acting as the server machines. WS are application software designed to support interoperability among distributed applications over a network (World Wide Web Consortium (W3C), www.w3.org). WS facilitate the conveying of messages from and to the client machines. The potential of WS is that they can be consumed through the Web by any application program, independently of the language used. They consist of three basic components (Newcomer (2002) and Meyne & Davis (2002)). The first is XML, a language that is used across the various layers in the web services. The second is the SOAP listener, which handles the packaging, sending and receiving of messages over HTTP. The third component is the Web Services Description Language (WSDL), the code that the client machine uses to read the messages it receives. WS development can be done in many programming environments, such as those from Sun (Java) or Microsoft. Another important component of WS is the registry built on the Universal Description, Discovery and Integration (UDDI) protocol. UDDI provides a standard platform through which various applications can find, access and consume WS over the Internet (www.uddi.org).

3. The data and the system architecture
Databases are characterised by various factors, such as their ability to provide long-term reliable data storage, multi-user access, concurrency control, query, recovery and security capabilities. These factors are important in maintenance because of the need, for example, to gather and store data for the purpose of monitoring the machines' health. Database technologies have been changing over time, and a review is available in Du & Wolfe (1997). The review goes through database architectures such as relational databases, semantic data modelling, distributed database systems, object oriented databases and active databases. The authors mention that the most used is the relational database architecture. It performs well when simple data requirements are involved and it has been widely accepted. However, other database architectures may be needed when complex data is used.
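Before turning to the data standards, the following minimal sketch illustrates how a client machine might consume such a service over HTTP; the endpoint URL and the XML message format are hypothetical illustrations, not the actual WSDL-defined interfaces of the system described here.

```python
# Minimal sketch of a client consuming a (hypothetical) maintenance web
# service over HTTP, in the spirit of the three-tier architecture above.
# The endpoint and the XML schema are invented for illustration only.
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "http://example.org/cbm/rms?machine=pump-01"  # hypothetical URL

def fetch_rms_readings(url):
    """GET an XML document like <readings><r date="..." value="..."/></readings>
    and return (date, mm/s) pairs for display on a mobile client."""
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    return [(r.get("date"), float(r.get("value"))) for r in root.iter("r")]

if __name__ == "__main__":
    for date, value in fetch_rms_readings(ENDPOINT):
        print(date, value)
```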
OSA-CBM (Open System Architecture for Condition Based Maintenance) and MIMOSA (Machinery Information Management Open Systems Alliance) are two organisations which have been active in developing standards for information exchange and communication among the different modules of CBM (Thurston (2001), www.mimosa.org). OSA-CBM has been partly funded by the US Navy through a DUST (Dual Use Science and Technology) program (Thurston (2001), www.osacbm.org). There were various participants from industrial, commercial and military applications of CBM technology, such as Boeing, Caterpillar, Rockwell Automation, Rockwell Science Center, Newport News and Oceana Sensor Technologies. MIMOSA developed the Common Relational Information Schema (CRIS), a relational database model for the different data types that need to be processed in a CBM application. The system interfaces have been defined according to the database schema based on CRIS. The interface definitions developed by MIMOSA are an open data exchange convention for data sharing in today's CBM systems. Another important contribution in this area is the ISO 17359 standard, which specifies the reference values to consider when a condition monitoring programme is implemented, for example standards for vibration monitoring and analysis. These were taken into consideration while developing the system.

4. Development of the system
The system used three information and communication technologies (ICT) for the communication between the database servers and the client machines (Fig. 2): web services, the web server and remote access. The database server in the system can also be accessed directly and remotely by mobile devices, via wireless communication. There are various communication protocols with different characteristics for the wireless communication between the client machines and the objects in the system. Mobile devices normally have narrow bandwidth. If interaction with the servers is too frequent, the network gets heavily loaded and slows down. This problem was partially overcome in the development process through the use of multiple forms on a single mobile page.

Figure 2. ICT in the architecture

The system was then tested with a simulated signal from a rolling element bearing. The data flow and the various processes involved are illustrated in Fig. 3. In doing so, the OSA-CBM and MIMOSA CRIS data structures and the ISO 17359 standard were taken into consideration.

Figure 3. The data flow and its various processes

In Fig. 3, the sensor data is gathered from the various sensors on the machine. The data are next stored in the database, more specifically in the data acquisition layer. From the data acquisition layer, the relevant time data are sent to the next layer, where the signal analysis takes place. The results of the signal analysis, expressed as a set of parameters, are compared with condition monitoring standards in the condition monitoring layer. Finally, a diagnosis is made and a decision is taken. The results of the diagnosis are displayed next. Figures 4 to 6 below show various outputs in the mobile device emulator's windows from the Web and Mobile architecture. The first mobile window, Fig. 4, illustrates the vibration velocity (RMS values, in mm/s) vs. date.

Figure 4. RMS chart

Figure 5 shows the vibration velocity (RMS values in mm/s) in the time domain, and Fig. 6 in the frequency domain.

Figure 5. Time data. Figure 6. Spectrum
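As an illustration of the signal-analysis step in Fig. 3, the following minimal sketch computes the RMS vibration velocity from time-domain samples and compares it with an alarm limit; the samples and the limit are invented for illustration (in practice, limits would be taken from guidance such as ISO 10816, in line with the ISO 17359 approach mentioned above).

```python
# Minimal sketch of the signal-analysis step in Fig. 3: compute the RMS
# vibration velocity from time-domain samples and compare it with an alarm
# limit. The signal and the limit are illustrative assumptions; a real
# system would take limits from standards guidance and data from the
# acquisition layer.
import math

def rms(samples):
    """Root-mean-square of a sequence of velocity samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def condition(v_rms_mm_s, alarm_limit=4.5):
    """Classify the reading against an assumed alarm limit in mm/s."""
    return "alarm" if v_rms_mm_s > alarm_limit else "ok"

velocity = [2.1, -1.8, 3.0, -2.6, 2.4, -2.2]   # mm/s, simulated samples
v = rms(velocity)
print(f"RMS = {v:.2f} mm/s -> {condition(v)}")
```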
Security factors should be considered when developing applications with ICT, in this case web and mobile device applications. The factors that make web and mobile devices more vulnerable are the lack of an authentication process and the lack of secure communication (Meyne & Davis (2002)). There are ways to mitigate these factors, for example security policies and encryption. The security aspects were not considered in the development of the system; nevertheless, they are important. The mobile device provides the maintenance personnel with a mobile user interface to the whole e-maintenance system. The device is a relatively lightweight monitoring system with long battery life, and memory that can be used for offline work. The maintenance engineers can also, through the device, get information from other sources such as the Computerized Maintenance Management System (CMMS), to be able to make a work order or see the availability of spare parts. It also provides the possibility to access, if needed, the history of the machine stored in the CMMS through the Wireless Local Area Network (WLAN). A maintenance engineer working offline with the mobile device can still have access to the relevant data available on the servers and services of the architecture. This is useful, since the mobile device has only a small memory for storing data and for further analysis, which in certain cases is needed to pinpoint the exact condition of the equipment; the data, however, is normally located and processed on the servers and services of the architecture. The mobile device also gives the personnel the ability to communicate with local intelligent sensors or other kinds of sensors. This is possible when the sensors are equipped with an AD-card connected via the Universal Serial Bus (USB). In any case, the normal way in which the mobile device communicates with the architecture is through the WLAN and web technology such as Web Services. Other features that the personnel can use are, for example, the calendar and word processing, which facilitate the maintenance personnel's daily work.

Conclusions
Wireless technology seems to be an important factor in future maintenance, due to the elimination of connecting cables between the monitored machine/equipment and the monitoring systems. Experience shows that the mobile device normally requires frequent interaction with the server, and this can cause the performance to decrease. For this reason, it is important that mobile internet performance is high, since user satisfaction is crucial. In the present work the performance was improved by the use of multiple forms on a single mobile page. The mobile device could also access the data using Web services. This is a useful development, as the data needed for diagnosis and prognosis are normally huge in amount and the storage capacity of a mobile device is small. For this reason, the use of Web services for this part of the system was a good approach to take. In this way the load on the server also decreases, which helps to improve the performance of the Web and wireless communication. Finally, maintenance personnel can remotely monitor the health of equipment that may be located geographically anywhere; the capacity of the wireless network used is the only limiting factor.

Acknowledgements
The work presented is based on results from the Dynamite project. The Dynamite Project is a European Community funded research project.
The project is an Integrated Project instrument funded under the Sixth Framework Programme.

References
Campos, J. and Prakash, O. (2006) Information and communication technologies in condition monitoring and maintenance, in Dolgui, A., Morel, G. and Pereira, C. E. (Eds.) Information Control Problems in Manufacturing, post-conference proceedings of the 12th IFAC International Symposium, St-Etienne, France, Elsevier, Vol. II
Du, T. C-T. and Wolfe, P. M. (1997) Overview of emerging database architectures, Computers & Industrial Engineering, 4 (32), 811-821
Feng, J. Q., Buse, D. P., Wu, Q. H. and Fitch, J. (2002) A multi-agent based intelligent monitoring system for power transformers in distributed substations, International Conference on Power System Technology Proceedings (Cat. No. 02EX572), 3, 1962-1965
Jantunen, E. (2003) Prognosis of wear progress based on regression analysis of condition monitoring parameters, Tribologia: Finnish Journal of Tribology, 22
Meyne, H. and Davis, S. (2002) Developing Web Applications with ASP.NET and C#, Wiley Computer Publishing, John Wiley & Sons, Inc., ISBN 0-471-12090-1
Newcomer, E. (2002) Understanding Web Services: XML, WSDL, SOAP, and UDDI, Addison Wesley Professional, ISBN 0-201-75081-3
Rao, M., Yang, H. and Yang, H. (1996) Integrated distributed intelligent system for incident reporting in DMI pulp mill, Success and Failures of Knowledge-Based Systems in Real-World Applications, Proceedings of the First International Conference, BKK'96, 169-178
Rao, M., Zhou, J. and Yang, H. (1998a) Architecture of integrated distributed intelligent multimedia system for on-line real-time process monitoring, SMC'98 Conference Proceedings, 1998 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No. 98CH36218), 2, 1411-1416
Rao, M., Yang, H. and Yang, H. (1998b) Integrated distributed intelligent system architecture for incidents monitoring and diagnosis, Computers in Industry, 37, 143-145
Reichard, K. M., Van Dyke, M. and Maynard, K. (2000) Application of sensor fusion and signal classification techniques in a distributed machinery condition monitoring system, Proceedings of SPIE - The International Society for Optical Engineering, 4051, 329-336
Sycara, K. P. (1998) MultiAgent systems, AI Magazine, 19 (2)
Thurston, M. G. (2001) An open standard for Web-based condition-based maintenance systems, 2001 IEEE Autotestcon Proceedings, IEEE Systems Readiness Technology Conference, 401-415
Wang, K. (2003) Intelligent Condition Monitoring and Diagnosis System: A Computational Intelligent Approach, Frontiers in Artificial Intelligence and Applications, 93, ISBN 1-58603-312-3, pp. 132
Warwick, K., Ekwue, A. O. and Aggarwal, R. (Eds.) (1997) Artificial Intelligence Techniques in Power Systems, Power & Energy Series, Publishing & Inspec, ISBN 0-85296-897-3

A literature review of computerised maintenance management support
Mirka Kans
School of Technology and Design, Department of Terotechnology, Växjö University, S-351 95 Luckligs plats 1, Sweden mirka.kans@vxu.se
Abstract: Maintenance management information technology (MMIT) systems have existed for some forty years. This paper investigates the advancement of these systems and compares the development of MMIT with that of other corporate information technology (IT) systems by means of a literature study of 97 scientific papers within the topic of MMIT and additional readings in books.
The study reveals that the focus of MMIT has changed in several aspects during the forty years that have been investigated: from technology to use; from the maintenance function to maintenance as an integrated part of the business; from supporting reactive maintenance to proactive maintenance; and from operative to strategic maintenance considerations.

1. Introduction
Research shows that information technology (IT) investments have a positive correlation with companies' profitability and competitiveness, and thus that IT has strategic importance; see, for instance, Jonsson (1999), Kini (2002) and Dedrick et al. (2003). Information technology systems have been in use in companies for some 40 years and are today a natural tool for many workers. IT systems for maintenance purposes have existed approximately as long as computers have been available for commercial use. Even so, has the development of maintenance management information technology (MMIT) kept pace with the general development of corporate IT? And in what way has MMIT advanced during its forty years of existence? These questions will be investigated using the literature as a basis. Having reviewed the literature about MMIT several times, the author has not yet found a literature study describing the development of MMIT. To fill this gap, this paper presents a literature review of the topic of MMIT. To be able to understand the development of maintenance management IT, we will first look at the general computerisation of companies. The next section presents three main phases within corporate information technology development: the Introduction, the Coordination and the Integration phase. The phases can be compared to the six stages of IT growth and maturity presented in Nolan (1979), see Figure 1, where the first two stages, Initiation and Contagion, are similar to the Introduction phase. Here technology and functional automation are stressed. In stages three and four, Control and Integration, top management gains control over the IT resources and the IT resources support the overall business strategy, i.e. the coordination of business activities, as in the Coordination phase. The last two stages in Nolan's IT maturity model, Data administration and Maturity, deal with data sharing and information systems as a strategic matter. These stages are similar to the Integration phase.

Figure 1. Corporate IT development: Nolan's six stages of growth and maturity (Initiation, Contagion, Control, Integration, Data administration, Maturity) mapped onto the three main phases of corporate IT development (Introduction, Coordination, Integration), with the transition point from computer management to data resources management.

A literature survey covering maintenance information technology was conducted in autumn 2003 / spring 2004. The databases used for the survey were Elsevier, Emerald and IEEE. The following combinations of keywords were used: decision support system, expert system, computerised and information system, combined with maintenance, asset management or maintenance management system. An additional search was made in a full text database search tool (ELIN) that integrates a vast number of databases, e.g. Elsevier, Emerald, IEEE, Proquest and Springer, using the same keywords as above, i.e.
decision support system, expert system, computerised and information system, combined with maintenance, asset management or maintenance management system. A total of 97 articles within the relevant topic were found in this survey. All articles were published in the period 1988 to 2003. Additional reading was made in books about maintenance and computerised maintenance management systems, especially to capture the missing period 1960-1988. The number of articles per year is presented in Table 1. The historical description is divided into three periods: 1960-1992, 1993-1998 and 1999-2003. The number of articles from each period is also found in Table 1. The periods represent different stages of maintenance information technology maturity and are consistent with the three phases of corporate IT development: Introduction, Coordination and Integration.

Year:     1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003  Total
Articles:    2    5    5    5    4    7    5    7    9    6    6    4    6    9    6   11     97

Period      Number of articles
1988-1992   21
1993-1998   40
1999-2003   36
Total       97

Table 1. Number of articles per year and per period

2. Information technology systems within companies: a historical perspective
A historical review of general corporate IT is given in the following, based upon the model presented in the previous section.

• Introduction: The emergence of corporate information technology
The use of IT emerged in administration mainly to automate information processing, Persson et al. (1981) and Dahlbom (1997). The first computers, the mainframes, filled a whole room, and a special crew fed the mainframe with input data and programs for e.g. sorting, listing and analysing payrolls, customer registers, supplier registers and so on. Computing time was expensive and the most crucial calculations and analyses were prioritised. These computers were for the larger companies to use, while the medium and small sized ones still had to rely on manual information handling. When the minicomputers emerged in the 70s, computer power became accessible to "everyone", Persson et al. (1981). The IT systems that were developed for minis contained department-specific applications. The financial department ran spreadsheets and ledger applications, the management ran word processors and market analysis applications, while the personnel administration computed payments and kept a payroll application. Each application structured the information in its own way and data were stored in file systems. Cross-functional access to data and information was not supported. Neither was the management able to get reports including information from more than one area of the business, unless special applications were bought or developed for this purpose, see for instance Mullin (1989).

• Coordination: Connecting the systems together
In the late 70s and 80s the use of middleware, devices that act like a translator between two systems each talking its own language, enabled communication without actual changes in the specific systems, see e.g. Tuunainen (1998). With the ability to interchange information between IT systems, the possibilities to coordinate the organisation grew, both at the horizontal and the hierarchical level. The 80s are characterised by incorporation efforts. Now all the separate IT systems were to be incorporated into a unity, and this resulted in compatibility problems for the technical devices. There were vast numbers of hardware and software standards, sometimes one for each vendor.
Even different versions of the same application were sometimes incompatible. These compatibility problems were slowly overcome by standardising hardware and software, see for instance Adler (1995) and Hosaka et al. (1981). On the data level, problems arose regarding syntax heterogeneity (the structure of the information), semantic heterogeneity (the content of the information) and pragmatic heterogeneity (how the information is handled, e.g. concurrency control), Toussaint et al. (2001). The data heterogeneity problems were overcome, for instance, by using relational databases as data repositories, Hoffer et al. (2005). In the early coordination phase, when middleware was used, only predefined data and information could be interchanged. With incorporated IT systems and central databases, data and information were accessible to everyone using the IT system, Dahlbom (1997) and Kelly et al. (1997).

• Integration: One company-wide solution
In the last part of the 90s total business solutions emerged. These solutions integrate databases and functionality as well as providing a common user interface. In industry, enterprise resource planning (ERP) systems and industrial automation such as computer integrated manufacturing (CIM) and supervisory control and data acquisition (SCADA) systems had, and still have, a central role in bringing total integration, Gordon (2002) and Nikolopoulos et al. (2003). CIM and ERP systems integrate "everything", CIM at the production control level and ERP at the company administrative level. Until recently, these systems have been separated, but in the field the trend is now moving towards integration of these technical systems and administrative systems, see for example Dahlfors & Pilling (1995) and Bratthall et al. (2002).

3. The development of computerised maintenance support
Based upon the literature survey described in the introduction, the author would like to present a historical review of the development of maintenance information technology from the 60s until today, as it is described in the scientific literature.

• Introduction: The emergence of maintenance information technology
According to Wilder and Cannon (1993), computerised maintenance support did not exist before 1960. There were maintenance planning systems available for mainframes in the 70s, where the computation time was shared with other departments, giving high priority to the most important processing, Kelly (1984). Maintenance was most likely not one of the high priority activities, and Kelly concludes that the tasks were limited to some scheduling of preventive actions. The first maintenance IT automation step was available to the large companies and supported preventive maintenance, though to a limited extent, while other companies had to rely on manual maintenance management. In the beginning of the 80s, minicomputers with dedicated programs were developed, giving the maintenance department greater freedom to systematise, plan and follow up maintenance activities, Wilder and Cannon (1993) and Kelly (1984). In 1985, at least 60 CMMS were available, Raouf et al. (1993). At this time, the backbone of CMMS was established, consisting of functionality for scheduling, plant inventory, stock control, cost and budgeting, and maintenance history, Wilson (1984). Another popular kind of IT support was expert systems (ES) for reducing downtime when conducting reactive maintenance.
About half of the papers written during the period 1988-1990 are on ES for fault detection and troubleshooting; see for example Walters (1990) and Ho et al. (1988). Technological innovation is also discussed in the late 80s. One project conducted by the US Navy aimed at digitalising and integrating several sources of maintenance information, using ultra-modern techniques such as optical disc storage, Landers et al. (1989). Furthermore, the US Air Force demonstrated the first hand-held computer that could integrate on-site failure data with historical data and manuals in order to reach a failure diagnosis, Link (1989). The state of the art in computerised maintenance management systems of the late 80s is given by Mullin (1989), who describes the computer aided maintenance management systems at Ford as developed independently, poorly integrated and with poor interfaces.

• Coordination: Structuring the maintenance IT resources
About a third of the papers studied from the first part of the 90s deal with the concept of CMMS, and words like efficiency and cost reduction occur. Ben-Bassat et al. (1993) present an expert system for cost-effective utilisation of maintenance resources. The ability to identify and follow up maintenance costs using CMMS is discussed in Gehl (1993). Jones (1994) concludes that if a CMMS is to be cost-efficient, its introduction and use must be connected to the organisation culture and to the maintenance as well as the business strategy. The aspect of easy-to-use interfaces, such as in Hall et al. (1994), where graphics are used to reach user friendliness, is also pointed out. The word integration shows up for the first time now. Fung (1993) promotes the use of CMMS to integrate maintenance with e.g. quality assurance and energy management. Nolasco (1994) discusses CMMS and integration between maintenance, purchasing and accounting, and Sherwin & Johnsson (1995) promote the use of management information systems to integrate maintenance and production. MMIT is thus apprehended as a useful resource; the work of connecting different maintenance applications begins, and the connections between maintenance and other working areas are explored. The main focus of the 90s lies in how to manage preventive maintenance, though, for instance in the shape of expert systems for policy planning, scheduling and fault diagnosis, see e.g. Batanov et al. (1993), or IT systems for preventive maintenance management, Fung (1993), Gehl (1993) and Raouf et al. (1993). At this time, there are more than 200 commercial CMMS packages available in North America alone, Campbell (1995). The military is still at the front end of maintenance, for instance with two projects aimed at computerised life cycle cost (LCC) analysis of weapon systems, including preventive maintenance considerations, Hansen et al. (1992) and Awtry et al. (1991). LCC simulation is also the topic of Ostebo (1993). Expert systems are still a common topic in the period, see for instance Batanov et al. (1993) and Mitchell (1991). Also, papers discussing computer support for predictive maintenance appear. Sato et al. (1992) and Itaka et al. (1991) describe an advanced system for condition monitoring and maintenance communication for power transmission lines. Wichers (1996) discusses a reliability-centred maintenance based system for maintenance planning, specially stressing condition monitoring, which is connected to a manual or computerised maintenance management system.
Pearce & Hall (1994) recognise the advantages of vibration monitoring and the importance of connecting on-line monitoring data to a computerised maintenance management system. We can see that computerised support for condition monitoring was developed but not widely incorporated into the administrative IT systems.

• Integration: Maintenance and maintenance IT as a part of the whole company
In the end of the 90s, the economic aspect appears even more strongly. Maintenance IT is discussed with respect to cost-effectiveness and cost reduction, see for example Labib (1998) and Weil (1998). Jonsson (2000) connects IT maturity in maintenance with profitability. The term integration is used to discuss integrated CMMS solutions during this period, where e.g. the integration of CMMS and asset management systems is discussed, Boyles (1999) and Weil (1998), and the benefits of integrated CMMS are addressed, Panucci (2000). Zhang et al. (1997) discuss the use of artificial intelligence to achieve an integrated maintenance management system that takes into consideration not only equipment condition, but also production quality, efficiency and costs. The development of computerised communication methods, such as remote monitoring, telemaintenance and geographical information systems, also affects the topics of papers; see e.g. Hadzilacos et al. (2000) and Laugier et al. (1996). The topic of decision support systems has increased continuously during the studied years. In the period 2001-2003, decision support systems are discussed in eight of twenty-six papers, i.e. about one third of the papers (compared with the period 1988-1990, when the figure was two out of twelve papers). Yam et al. (2001), for instance, discuss operational and maintenance cost reduction as the result of more accurate condition-based fault prediction and diagnosis reached by decision support systems. Other examples of IT support for diagnosis and prognosis are found in Yagi et al. (2003) and Zhang Wang et al. (2001). Noticeable is also that papers about expert systems have decreased from about 40% (five of twelve papers) in 1988-1990 to about 20% (five of twenty-six papers) in 2001-2003.

4. Conclusions
The survey of computerised maintenance support reveals that the focus of MMIT has changed in four aspects during the forty years that have been investigated: 1) from technology to use, 2) from the maintenance function to business integration, 3) from reactive maintenance to proactive maintenance, and 4) from operative to strategic maintenance considerations. These shifts in focus are further discussed below.

• Technology → use
In the microcomputer era, the automation of routines was in focus. The main benefits of maintenance IT lay in reducing manual paper work and getting a grip on maintenance-specific resources. IT in enterprises was a new phenomenon, and the technology itself was stressed in the early papers. As the IT maturity of enterprises grew, the technological construction of maintenance IT was discussed less often. Instead, the focus shifted to the use of IT. MMIT is in the later papers treated as a tool, which can benefit the user if used properly, and the actual benefits are stressed.

• Maintenance function → business integration
While the literature in the early years considers the maintenance function and its information technology needs, an increased use of the integration concept is seen in later papers. By the use of, and by integrating, CMMS, advantages in maintenance could be achieved.
• Reactive maintenance → predictive-proactive maintenance
A trend of increasing IT support for maintenance management activities appears in the description, from mainly supporting technical reactive and preventive maintenance strategies in the microcomputer era to predictive condition-based strategies when different corporate IT resources could be integrated. Today, as predictive-proactive maintenance strategies, which help to avoid damage initiation by detecting the damage causes, are strongly gaining ground, we should be able to see this reflected in contemporary research. The growth in the number of papers published in recent years discussing integration and DSS for maintenance could be an indication of this. Furthermore, the discussion about the financial benefits of maintenance and the connection between maintenance and production performance together with IT would imply a more holistic view of the maintenance role in companies. Having a holistic perspective on maintenance enables predictive-proactive maintenance.
• Operative maintenance considerations → strategic maintenance considerations
A shift in focus from operative maintenance concerns to strategic maintenance concerns could be seen in the study. Notable is e.g. the increased number of papers in the later years dealing with the economic advantages that could be reached by using CMMS, whereas the focus in the early years was on describing how the operative maintenance work could be sped up and automated by using computers.
References
Adler, R. M. (1995) Emerging standards for component software, Computer, 28 (3), 68-77
Al-Najjar, B. (1996) Total quality maintenance: An approach for continuous reduction in costs of quality products, J. of Quality in Maintenance Engineering, 2 (3), 4-20
Alsyouf, I. (2004) Cost effective maintenance for competitive advantages, PhD thesis, Växjö University, School of Industrial Engineering
Awtry, M. H., Calvo, A. B. and Debeljak, C. J. (1991) Logistics engineering workstations for concurrent engineering applications, Proceedings of the IEEE 1991 National Aerospace and Electronics Conference, NAECON 1991, 3, 1253-1259
Batanov, D., Nagarur, N. and Nitikhunkasem, P. (1993) EXPERT-MM: A knowledge-based system for maintenance management, Artificial Intelligence in Engineering, 8 (4), 283-291
Ben-Bassat, M., Beniaminy, I., Eshel, M., Feldman, B. and Shpiro, A. (1993) Workflow management combined with diagnostic and repair expert system tools for maintenance operations, AUTOTESTCON '93, IEEE Systems Readiness Technology Conference Proceedings, 367-375
Boyles, C. (1999) CMMS and return on assets, Chemical Processing, 62 (5), 62-65
Bratthall, L. G., van der Geest, R., Hofmann, H., Jellum, E., Korendo, Z., Martinez, R., Orkisz, M., Zeidler, C. and Andersson, J. S. (2002) Integrating hundred's of products through one architecture - the industrial IT architecture, Proceedings of the 24th International Conference on Software Engineering, 2002, 604-614
Campbell, J. D. (1995) Outsourcing in maintenance management: A valid alternative to self-provision, J. of Quality in Maintenance Engineering, 1 (3), 18-24
Dahlbom, B. (1997) The New Informatics, Scandinavian J. of Information Systems, 8 (2), 29-48
Dahlfors, F. and Pilling, J. (1995) Integrated information systems in a privatized and deregulated market, International Conference on Energy Management and Power Delivery, Proceedings of EMPD '95, 1, 249-254
Dedrick, J., Gurbaxani, V. and Kraemer, K. L.
(2003) Information Technology and Economic Performance: A Critical Review of the Empirical Evidence, ACM Computing Surveys, 35 (1), 1-29
Fung, W. Y. (1993) Computerized maintenance management system in a railway-based building services unit, ASHRAE Transactions, 99 (1), 72-83
Gehl, P. (1993) Management application of CMMS reports, Advances in Instrumentation and Control: International Conference and Exhibition, 48 (3), 1535-1556
Gordon, L. A. (2000) The e-skip-gen effect. The emergence of a cybercentric management model and the F2B market segment for industry, International J. of Production Economics, 80 (1), 11-29
Hadzilacos, T., Kalles, D., Preston, N., Melbourne, P., Camarinopoulos, L., Eimermacher, M., Kallidromitis, V., Frondistou-Yannas, S. S. and Saegrov, S. (2000) UtilNets: a water mains rehabilitation decision-support system, Computers, Environment and Urban Systems, 24 (3), 215-232
Hall, J. D., Biles, W. E. and Leach, J. (1994) An AutoCAD-12 based maintenance management system for manufacturing, Computers & Industrial Engineering, 29 (1-4), 285-289
Hansen, W. A., Edson, B. N. and Larter, P. C. (1992) Reliability, availability and maintainability expert systems (RAMES), Annual Reliability and Maintainability Symposium, 285-289
Ho, T.-L., Bayles, R. A. and Havlicsek, B. L. (1988) A diagnostic expert system for aircraft generator control unit (GCU), Proceedings of the IEEE 1988 National Aerospace and Electronics Conference, NAECON 1988, 4, 1355-1362
Hoffer, J., Prescott, M. and McFadden, F. R. (2005) Modern Database Management, Upper Saddle River, Pearson/Prentice Hall
Hosaka, T., Ueda, K. and Matsuura, H. (1981) A Design Automation System for Electronic Switching Systems, 18th Conference on Design Automation, 29 June - 1 July 1981, 51-58
Itaka, K., Matsubara, I., Nakano, T., Sakurai, K. and Taga, H. (1991) Advanced maintenance information systems for overhead power transmission lines, APSCOM-91, 1991 International Conference on Advances in Power System Control, Operation and Management, 2, 927-932
Jones, R. (1994) Computer-aided maintenance management systems, Computing & Control Engineering J., 5 (4), 189-192
Jonsson, P. (1999) The Impact of Maintenance on the Production Process - Achieving High Performance, Lund University, Institute of Technology, Department of Industrial Engineering, Division of Production Management
Jonsson, P. (2000) Towards a holistic understanding of disruptions in Operations Management, J. of Operations Management, 18, 701-718
Kelly, A. (1984) Maintenance planning and control, Butterworths, London
Kelly, G. J., Aouad, G., Rezgui, Y. and Crofts, J. (1997) Information systems development in the UK construction industry, Automation in Construction, 6, 17-22
Kini, R. B. (2002) IT in manufacturing for performance: the challenge for Thai manufacturers, Information Management & Computer Security, 10 (1), 41-48
Labib, A. W. (1998) World-class maintenance using a computerised maintenance management system, J. of Quality in Maintenance Engineering, 4 (1), 66-75
Landers, T., Nguyen, M. and Delgado, R. (1989) A digital maintenance information (DMI) system for ATE, AUTOTESTCON '89, IEEE Automatic Testing Conference. The Systems Readiness Technology Conference. Automatic Testing in the Next Decade and the 21st Century, Conference Record, 272-276
Laugier, A., Allahwerdi, N., Baudin, J., Gaffney, P., Grimson, W., Groth, T. and Schilders, L. (1996) Remote instrument telemaintenance, Computer Methods and Programs in Biomedicine, 50 (2), 187-194
Link, W. R.
(1989) The IMIS F-16 interactive diagnostic demonstration, Proceedings of the IEEE 1989 National Aerospace and Electronics Conference, NAECON 1989, 3, 1359-1362
Mitchell, J. (1991) Research into a sensor-based diagnostic maintenance expert system for the hydraulics of a continuous mining machine, Conference Record of the 1991 IEEE Industry Applications Society Annual Meeting, 2, 1192-1199
Mullin, A. (1989) The application of information technology to asset preservation in Ford of Europe, IEE Colloquium on IT in the Management of Maintenance, 2/1-2/4
Nikolopoulus, K., Metaxiotis, K., Lekatis, N. and Assimakopoulos, V. (2003) Integrating industrial maintenance strategy into ERP, Industrial Management & Data Systems, 103 (3), 184-191
Nolan, R. L. (1979) Managing the crises in data processing, Harvard Business Review, 57 (2), 115-126
Nolasco, A. (1994) Computerized maintenance management systems (CMMS) in cement plants, World Cement, 25 (12), 44-48
Ostebo, R. (1993) System-effectiveness assessment in offshore field development using life-cycle performance simulation, Proceedings of the Annual Reliability and Maintainability Symposium, 1993, 375-385
Panucci, D. (2000) Take CMMS seriously, Manufacturing Computer Solutions, 6 (5), 25
Pearce, D. F. and Hall, S. (1994) Using vibration monitoring techniques to minimize plant downtime, IEE, 8/1-8/2
Persson, P. O., Boberg, K-E., Broms, I., Docherty, P., Kraulis, G. and Kreimer, B. (1981) 80-talet på en bricka. Datateknik; utveckling och miljö 1980-1990 (The 80s on a tray. Computer technology; development and environment 1980-1990), Riksdataförbundet, Stockholm
Raouf, A., Ali, Z. and Duffuaa, S. O. (1993) Evaluating a Computerized Maintenance Management System, International J. of Operations & Production Management, 13 (3), 38-49
Sato, K., Atsumi, S., Shibata, A. and Kanemaru, K. (1992) Power transmission line maintenance information system for Hokusei line with snow accretion monitoring capability, IEEE Transactions on Power Delivery, 7 (2), 946-951
Sherwin, D. J. and Jonsson, P. (1995) TQM, maintenance and plant availability, J. of Quality in Maintenance Engineering, 1 (1), 15-19
Toussaint, P. J., Bakker, A. R. and Groenewegen, L. P. J. (2001) Integration of information systems: assessing its quality, Computer Methods and Programs in Biomedicine, 64, 9-35
Tuunainen, V. K. (1998) Opportunities of effective integration of EDI for small businesses in the automotive industry, Information and Management, 34 (6), 361-375
Walters, M. D. (1990) Inductive learning applied to diagnostics, AUTOTESTCON '90, IEEE Systems Readiness Technology Conference, 'Advancing Mission Accomplishment', Conference Record, 167-174
Weil, M. (1998) Raising the bar for maintenance apps, Manufacturing Systems, 16 (11), 5
Wichers, J. H. (1996) Optimising maintenance functions by ensuring effective management of your computerised maintenance management system, IEEE AFRICON, 4th AFRICON, 2, 788-794
Wilder, P. and Cannon, M. (1993) Advantages of a computerized maintenance management system in managing plant operations, Textile, Fiber and Film IEEE 1993 Annual Industry Technical Conference, 5/1-5/12
Wilson, A. (1984) Planning For Computerised Maintenance, Conference Communication, UK
Yagi, Y., Kishi, H., Hagihara, R., Tanaka, T., Kozuma, S., Ishida, T., Waki, M., Tanaka, M. and Kiyama, S. (2003) Diagnostic technology and an expert system for photovoltaic systems using the learning method, Solar Energy Materials and Solar Cells, 75 (3-4), 655-663
Yam, R. C. M., Tse, P. W., Li, L. and Tu, P.
(2001) Intelligent Predictive Decision Support System for Condition-Based Maintenance, International J. of Advanced Manufacturing Technology, 17 (5), 383-391
Zhang, J., Tu, Y. and Yeung, E. H. H. (1997) Intelligent decision support system for equipment diagnosis and maintenance management, Innovation in Technology Management - The Key to Global Leadership, PICMET '97: Portland International Conference on Management and Technology, 733
Wang, Z., Guo, J., Xie, J. and Tang, G. (2002) An introduction of a condition monitoring system of electrical equipment, Proceedings of 2001 International Symposium on Electrical Insulating Materials (ISEIM 2001), 221-224

Some generalizations of age and block replacement
Phil Scarf
Centre for OR and Applied Statistics, Salford Business School, University of Salford, Manchester, M5 4WT, UK
p.a.scarf@salford.ac.uk
Abstract: Consider a component that is subject to failure and preventive replacement. The simplest preventive replacement policies replace the component either when the component age reaches a critical limit (age based replacement) or at regular intervals of calendar time (block replacement). Various criteria may be used to quantify the properties of these preventive replacement policies; these are principally cost (cost per unit time) and reliability (distribution of the time between failures of the operational function of the component). We show how these criteria are related and that a value of one implies a value of the other. Furthermore, while it is generally accepted that age based replacement is cost efficient with respect to block replacement, there are circumstances under which it is not. These circumstances are investigated. The ideas are illustrated using a case study relating to traction motor replacement.

Scheduling the imperfect preventive maintenance policy for deteriorating systems
Yunxian Jia*, Xuhua Chen
Department of Management Engineering, Shijiazhuang Mechanical Engineering College, 97 Hepingxi Road, Shijiazhuang 050003, P.R. China
yunxian_jia@hotmail.com
Abstract: Imperfect preventive maintenance can reduce the wear-out and aging effects of deteriorating systems to a certain level between the conditions of as good as new and as bad as old. A hybrid hazards rate recursion rule based on the concepts of the age reduction factor and the hazards rate increase factor is built up to measure the extent of restoration of deteriorating systems and to predict the evolution of the system reliability in different maintenance cycles. After obtaining the parameters of the hybrid hazards rate, two different situations are considered when optimizing the imperfect preventive maintenance policy. In these situations, whenever the system reliability reaches a threshold R, which is determined by minimizing the cumulative maintenance cost per unit time over the life cycle of the system, imperfect preventive maintenance is performed on the system. Finally, a numerical example is presented to validate the methods discussed above and some conclusions are provided.
Keywords: imperfect maintenance, preventive maintenance, hybrid hazards rate, reliability, cost optimisation
Introduction
Maintenance optimization has been a popular issue among researchers since the early 1960s, and many optimal maintenance strategies have been developed and implemented for improving system availability, preventing system failure risk and reducing maintenance costs, Zhou et al. (2006).
To reflect the diversity of the degree of maintenance, Pham & Wang (1996) subdivided maintenance work into five different types: perfect maintenance, minimal repair, imperfect maintenance, worse maintenance and worst maintenance. Perfect maintenance is assumed to restore the system to the condition of as good as new, which means that the restored system or equipment has the same hazards rate function and reliability function as a new one. Minimal repair is assumed to repair the equipment to the condition of as bad as old, which means this work merely eliminates the failure and the repaired system has the same properties as it did before the failure. Imperfect maintenance can reduce the wear-out and aging effects of deteriorating systems to a certain level between the conditions of as good as new and as bad as old. In fact, preventive maintenance is generally imperfect and cannot restore the system to as good as new in most situations. Under the imperfect preventive maintenance policy, the system is maintained at a decreasing sequence of intervals, which is more practical since most systems need more frequent maintenance with increased usage and age.

There are several methods to model an imperfect preventive action. One of the most useful methods in engineering is the improvement factor method, expressed in terms of the system hazards rate or other reliability measures. In the literature, two different types of improvement factors have been developed. Malik (1979) introduces the concept of the age reduction factor. If $T_i$ and $h_i(t)$ for $t \in (0, T_i)$, respectively, represent the preventive maintenance interval and the hazards rate function of the system prior to the $i$th preventive maintenance, the hazards rate function after the $i$th preventive maintenance becomes $h_i(t + a_i T_i)$ for $t \in (0, T_{i+1})$, where $0 < a_i < 1$ is the age reduction factor due to the imperfect preventive maintenance action. This implies that each imperfect preventive maintenance changes the initial hazards rate value right after the preventive maintenance to $h_i(a_i T_i)$, but not all the way to zero. Nakagawa (1988) proposes another model based on the hazards rate increase factor. The hazards rate function becomes $b_i h_i(t)$ for $t \in (0, T_{i+1})$ after the $i$th preventive maintenance, where $b_i > 1$ is the hazards rate increase factor. This indicates that each preventive maintenance resets the increase rate of the hazards rate function higher and higher. In order to benefit from both the age reduction method, which has the advantage of determining the initial failure rate value right after a preventive maintenance, and the hazards rate increase method, which has the advantage of allowing the increase rate of the hazards rate function to be higher after each preventive maintenance, Zhou et al. (2006) proposed a hybrid hazards rate evolution rule based on these two methods. It is assumed that the relationship between the hazards rate functions before and after the $i$th preventive maintenance can be defined as

$h_{i+1}(t) = b_i h_i(t + a_i T_i)$ for $t \in (0, T_{i+1})$,    (1)

where $0 < a_i < 1$ and $b_i > 1$ are the age reduction factor and the hazards rate increase factor respectively, which need to be deduced from the historical maintenance data of the system. As shown in Figure 1, if $a_i = 0$, the hybrid hazards rate function reduces to that proposed by Nakagawa (1988); if $b_i = 1$, it reduces to that proposed by Malik (1979).

[Figure 1. Hybrid hazards rate function: the curves $h_{i+1}(t) = h_i(t + a_i T_i)$, $h_{i+1}(t) = b_i h_i(t)$ and $h_{i+1}(t) = b_i h_i(t + a_i T_i)$ compared with $h_i(t)$.]

This hybrid recursion rule makes it possible to predict the evolution of the system reliability in different maintenance cycles, as the short sketch below illustrates.
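As a concrete illustration (a minimal sketch, not the authors' code), the recursion (1) can be unrolled: the hazards rate in cycle $i$ is the base hazard shifted by the accumulated effective age $\sum_{j<i} a_j T_j$ and scaled by $\prod_{j<i} b_j$. The base hazard and the factor sequences below follow the numerical example of section 3; the preventive maintenance intervals are illustrative placeholders.

```python
import numpy as np

# A minimal sketch of the hybrid recursion (1), unrolled:
#   h_i(t) = (prod_{j<i} b_j) * h_1(t + sum_{j<i} a_j * T_j)
DELTA, ALPHA = 4.0, 120.0        # Weibull shape and scale (section 3)

def h1(t):
    """Base Weibull hazard h_1(t) = (delta/alpha) * (t/alpha)**(delta-1)."""
    return (DELTA / ALPHA) * (t / ALPHA) ** (DELTA - 1)

def a(i):                        # age reduction factor a_i = i / (3i + 5)
    return i / (3 * i + 5)

def b(i):                        # hazards rate increase factor b_i = (12i+3)/(10i+3)
    return (12 * i + 3) / (10 * i + 3)

def h(i, t, T):
    """Hazards rate h_i(t) prior to the i-th PM (i = 1, 2, ...)."""
    shift = sum(a(j) * T[j - 1] for j in range(1, i))
    scale = np.prod([b(j) for j in range(1, i)]) if i > 1 else 1.0
    return scale * h1(t + shift)

T = [83.6, 70.2, 54.4]           # illustrative PM intervals (placeholders)
print([h(i, 10.0, T) for i in (1, 2, 3)])
```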
Aiming at minimizing the cumulative maintenance cost per unit time over the life cycle of the system, this paper applies the hybrid hazards rate recursion rule proposed by Zhou et al. (2006) to measure the extent of the imperfect preventive maintenance for deteriorating systems and to predict the evolution of the system reliability in different maintenance cycles. Considering two different situations met in practice, the related formulas and algorithms to optimize the maintenance policies are deduced in section 2. Once the optimal $R$ and $N^*$ are obtained, maintenance engineers or managers can construct an optimal schedule to reduce the maintenance cost. In section 3, numerical examples concerning the two situations are presented to illustrate the methods, and some results and conclusions are provided in the last section.

2. Model development
2.1. Notation and assumptions
Notation:
$i$: ordinal of preventive maintenance cycles, $i = 1, 2, \ldots$
$N$: preventive maintenance cycle number
$N^*$: optimal preventive maintenance cycle number
$T_i$: time interval for preventive maintenance prior to the $i$th maintenance
$h_i(t)$: system hazards rate function prior to the $i$th preventive maintenance
$r_i(t)$: system reliability function prior to the $i$th preventive maintenance
$R$: system reliability threshold for the scheduled preventive maintenance
$c_p$: preventive maintenance cost
$c_r$: replacement cost
$c_f$: cost brought by the system's failure
$E(c)$: expected total cost for the system in the life cycle
$E(t)$: expected operational time for the system in the life cycle
$\varphi(R, N)$: expected cost per unit time for the system in the life cycle

In this section, two situations often met in practice are taken into account. In the first situation, preventive maintenance actions are taken periodically on the system before it suffers a failure; if the system fails, a replacement action is performed. In other words, if and only if the system suffers no failure, imperfect preventive maintenance is performed, and a perfect maintenance (replacement) action is taken if the system fails. In the second situation, imperfect preventive maintenance or imperfect corrective maintenance is taken whenever the system reliability reaches the threshold $R$ or whenever the system fails before the scheduled preventive maintenance; when the preventive maintenance cycle number reaches $N^*$, a replacement is performed when the system reliability reaches $R$ or when the system fails. The optimal preventive maintenance cycle number $N^*$ and the system reliability threshold $R$ for scheduled preventive maintenance are then to be decided so as to minimize the expected cost per unit time for the system in the life cycle. Compared with the operational time of the system, the duration of preventive maintenance or replacement is short enough to be ignored.

2.2. Model formulation for situation 1
In this situation, a scheduled preventive maintenance is performed whenever the system reliability reaches the reliability threshold $R$, and a replacement is performed whenever the system suffers a failure.
Based on this policy, a reliability equation can be constructed as

$\exp\left(-\int_0^{T_1} h_1(t)\,dt\right) = \exp\left(-\int_0^{T_2} h_2(t)\,dt\right) = \cdots = \exp\left(-\int_0^{T_N} h_N(t)\,dt\right) = R$    (2)

where $h_i(t)$ can be deduced from equation (1). Equation (2) can be rewritten as

$\int_0^{T_1} h_1(t)\,dt = \int_0^{T_2} h_2(t)\,dt = \cdots = \int_0^{T_N} h_N(t)\,dt = -\ln R$    (3)

where $\int_0^{T_i} h_i(t)\,dt$ represents the cumulative failure risk in maintenance cycle $i$. This implies that the cumulative risk of system failure in each maintenance cycle is equal to $-\ln R$. Since only one preventive maintenance action is performed in one maintenance cycle, the probability of implementing a scheduled preventive maintenance action is $R$ and the probability of implementing a replacement action is $1 - R$. For this situation, the expected total cost $E(c)$ for the system in the life cycle can be calculated as

$E(c) = (c_r + c_f)(1 - R) + c_p R + (c_r + c_f)(1 - R)R + c_p R^2 + \cdots = (c_r + c_f) + \frac{c_p R}{1 - R}$    (4)

and the expected operational time $E(t)$ for the system in the life cycle can be written as

$E(t) = \int_0^{T_1} r_1(t)\,dt + R \int_0^{T_2} r_2(t)\,dt + R^2 \int_0^{T_3} r_3(t)\,dt + \cdots = \sum_{i=0}^{\infty} R^i \int_0^{T_{i+1}} r_{i+1}(t)\,dt$    (5)

Since the expected operational time $E(t)$ calculated by equation (5) differs as the system reliability function $r_i(t)$ changes, there is no unified formulation for $E(t)$ as there is for $E(c)$. However, it can be proved that the infinite series in equation (5) converges for a deteriorating system, and an approximation to $E(t)$ can be calculated numerically for the given $r_i(t)$. After obtaining $E(c)$ and $E(t)$, the expected cost per unit time $\varphi(R, N)$ for the system in the life cycle can be calculated by

$\varphi(R, N) = \varphi(R) = \frac{E(c)}{E(t)}$    (6)

For the given $r_i(t)$, both $\varphi(R)$ and $T_i$ are functions of $R$. By minimizing $\varphi(R)$, the optimal system reliability threshold for scheduled preventive maintenance can be determined. At the same time, the value of $T_i$ can be calculated from equation (3), which should be helpful when preparing the scheduled preventive maintenance activities. A computational sketch of this procedure follows.
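The sketch below is a minimal illustration, not the paper's implementation. It evaluates $\varphi(R)$ under the Weibull hazard and improvement factors of the numerical example in section 3. For a Weibull base hazard, equation (3) can be solved for $T_i$ in closed form once the accumulated age shift and rate scale of the hybrid recursion are known; the series (5) is truncated at a large cycle number.

```python
import numpy as np

DELTA, ALPHA = 4.0, 120.0                 # Weibull shape/scale (section 3)
CP, CR, CF = 100.0, 1000.0, 1200.0        # c_p, c_r, c_f (section 3)

def H1(t):
    """Cumulative base hazard H_1(t) = (t/alpha)**delta."""
    return (t / ALPHA) ** DELTA

def phi1(R, n_cycles=200, grid=500):
    """phi(R) of equation (6) for situation 1, truncating the series (5)."""
    Ec = (CR + CF) + CP * R / (1.0 - R)                    # equation (4)
    Et, shift, scale = 0.0, 0.0, 1.0
    for i in range(1, n_cycles + 1):
        # Equation (3): scale * (H1(T_i + shift) - H1(shift)) = -ln R,
        # solved in closed form for the Weibull case.
        Ti = ALPHA * ((-np.log(R)) / scale + H1(shift)) ** (1 / DELTA) - shift
        t = np.linspace(0.0, Ti, grid)
        r = np.exp(-scale * (H1(t + shift) - H1(shift)))   # r_i(t)
        Et += R ** (i - 1) * np.trapz(r, t)                # term of equation (5)
        shift += (i / (3 * i + 5)) * Ti                    # a_i * T_i
        scale *= (12 * i + 3) / (10 * i + 3)               # b_i
    return Ec / Et

Rs = np.arange(0.60, 1.00, 0.01)
R_opt = min(Rs, key=phi1)
print(f"R* = {R_opt:.2f}, phi(R*) = {phi1(R_opt):.3f}")
```

With these inputs the search should land near the optimum reported later in Table 1 (around $R = 0.79$), although the exact figures depend on the integration grid and the truncation point.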
2.3. Model formulation for situation 2
In situation 2, preventive maintenance is scheduled for $N$ cycles. During the first $N - 1$ cycles, imperfect preventive maintenance or imperfect corrective maintenance is performed whenever the system reliability reaches the threshold $R$ or whenever the system suffers a failure before the scheduled preventive maintenance, and a replacement action is taken at cycle $N$, whether the system reliability reaches the threshold $R$ or the system fails. So there are two variables, the optimal cycle number $N^*$ and the reliability threshold $R$, to be chosen to minimize the expected cost per unit time. Based on this maintenance scheduling strategy, the expected total cost $E(c)$ for the deteriorating system in the life cycle is

$E(c) = \sum_{i=1}^{N-1} \left( c_p + c_f (1 - R) \right) + \left( c_r + c_f (1 - R) \right)$    (7)

and the expected operational time $E(t)$ for the deteriorating system in the life cycle is

$E(t) = \sum_{i=1}^{N} \int_0^{T_i} r_i(t)\,dt$    (8)

After calculating $E(c)$ and $E(t)$, the expected cost per unit time $\varphi(R, N)$ for the system in the life cycle can be determined by

$\varphi(R, N) = \frac{E(c)}{E(t)}$    (9)

In this situation, $T_i$ can be obtained from equation (3) as in situation 1, and it is a function of $R$. By minimizing $\varphi(R, N)$, which is a function of $R$ and $N$, the optimal system reliability threshold $R$ for scheduled preventive maintenance and the optimal maintenance cycle number $N^*$ can be determined. However, searching for the best $R$ and $N^*$ through numerical computation is still complicated and time-consuming work. To simplify it, for a given cycle number $N$, the first derivative of $\varphi(R, N)$ with respect to $R$ is

$\frac{d\varphi(R, N)}{dR} = -\frac{E(t) \cdot \sum_{i=1}^{N} c_f + E(c) \cdot \sum_{i=1}^{N} \frac{d \int_0^{T_i} r_i(t)\,dt}{dR}}{\left( E(t) \right)^2}$    (10)

where

$\frac{d \int_0^{T_i} r_i(t)\,dt}{dR} = a_{i-1} \int_0^{T_i} r_i(t) \left( h_i(t) - h_i(0) \right) \frac{dT_{i-1}}{dR}\,dt - \frac{1}{h_i(T_i)}$    (11)

Setting $\frac{d\varphi(R, N)}{dR} = 0$, from equation (10) there will be

$N c_f + \varepsilon \cdot \sum_{i=1}^{N} \left( a_{i-1} \int_0^{T_i} r_i(t) \left( h_i(t) - h_i(0) \right) \frac{dT_{i-1}}{dR}\,dt - \frac{1}{h_i(T_i)} \right) = 0$    (12)

where $\varepsilon = \varphi(R, N)$, which is a function of $R$ for a given $N$. In order to make finding the optimal reliability threshold $R$ and the maintenance cycle number $N^*$ less complicated and quicker, the following steps are recommended:
1. Set the search range $N \in [1, m]$ and let $n = 1$.
2. For the given $N = n$, solve equation (12) to find $R$ by a numerical method such as Newton-Raphson iteration. The $R$ that satisfies (12) is the optimal reliability threshold that minimizes $\varphi(R, N)$ for the given $N$.
3. If $n < m$, let $n = n + 1$ and go back to step 2; else, go to the next step.
4. By now, the optimal system reliability threshold $R$ has been found for each given $N$. The minimum $\varphi(R, N)$ and the corresponding $R$ and $N$ are what we expect to find.

After obtaining the optimal system reliability threshold $R$ and the optimal preventive maintenance cycle number $N^*$, preventive maintenance activities can be scheduled to reduce the maintenance cost for the deteriorating system.

3. Numerical examples
In this section, we present some numerical analysis to validate the maintenance decision models discussed above. Generally speaking, maintenance engineers should be responsible for the determination of the system hazards rate functions and the improvement factors, which can be obtained from their experience or through statistical methods if a large quantity of maintenance data is available. In these numerical examples, a Weibull distribution is used to describe the hazards rate function of the deteriorating system and the improvement factors $(a_i, b_i)$ are assumed to be known. It is assumed that $h_1(t) = \frac{\delta}{\alpha}\left(\frac{t}{\alpha}\right)^{\delta - 1}$, where the shape parameter $\delta = 4$ and the scale parameter $\alpha = 120$. Also, let $a_i = \frac{i}{3i + 5}$, $b_i = \frac{12i + 3}{10i + 3}$, $c_p = 100$, $c_r = 1000$ and $c_f = 1200$. The results for the two different situations are presented below; a sketch of the optimisation search is given first.
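The following sketch (an illustration under the same assumptions as before, not the authors' code) carries out steps 1-4 for situation 2. For simplicity it minimizes $\varphi(R, N)$ of equations (7)-(9) directly over a grid of $R$ values rather than solving equation (12) by Newton-Raphson iteration; both approaches should locate the same optimum.

```python
import numpy as np

DELTA, ALPHA = 4.0, 120.0
CP, CR, CF = 100.0, 1000.0, 1200.0

def H1(t):
    """Cumulative base hazard H_1(t) = (t/alpha)**delta."""
    return (t / ALPHA) ** DELTA

def phi2(R, N, grid=500):
    """phi(R, N) of equation (9) for situation 2."""
    Ec = (N - 1) * (CP + CF * (1 - R)) + (CR + CF * (1 - R))   # equation (7)
    Et, shift, scale = 0.0, 0.0, 1.0
    for i in range(1, N + 1):
        Ti = ALPHA * ((-np.log(R)) / scale + H1(shift)) ** (1 / DELTA) - shift
        t = np.linspace(0.0, Ti, grid)
        Et += np.trapz(np.exp(-scale * (H1(t + shift) - H1(shift))), t)  # eq. (8)
        shift += (i / (3 * i + 5)) * Ti
        scale *= (12 * i + 3) / (10 * i + 3)
    return Ec / Et

# Steps 1-4: for each N in [1, m], find the best R, then take the overall minimum.
m, Rs = 20, np.arange(0.60, 0.999, 0.001)
per_N = [(min(Rs, key=lambda R: phi2(R, N)), N) for N in range(1, m + 1)]
R_star, N_star = min(per_N, key=lambda rn: phi2(rn[0], rn[1]))
print(f"N* = {N_star}, R = {R_star:.3f}, phi = {phi2(R_star, N_star):.4f}")
```

With the stated parameters, this search should reproduce the neighbourhood of the optimum reported later in Table 2 ($N^* = 5$, $R \approx 0.9265$), subject to the grid resolution.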
3.1. Results for situation 1
In situation 1, the expected total cost $E(c)$ for the system in the life cycle can be obtained from equation (4), and the approximation to the expected operational time $E(t)$ can be estimated from equation (5) by setting a large maintenance cycle number $N$. After obtaining $E(c)$ and $E(t)$, the expected cost per unit time $\varphi(R)$ for the deteriorating system in the life cycle can be calculated by equation (6). Here, the search range is set as $R \in [0.60, 1.00)$ (step 0.01) and $N = 200$. The results are shown in Table 1 and Figure 2.

R     ϕ(R)      R     ϕ(R)      R     ϕ(R)      R     ϕ(R)      R     ϕ(R)
0.60  13.430    0.68  12.800    0.76  12.398    0.84  12.560    0.92  15.053
0.61  13.344    0.69  12.733    0.77  12.375    0.85  12.666    0.93  15.921
0.62  13.259    0.70  12.671    0.78  12.361    0.86  12.804    0.94  17.158
0.63  13.157    0.71  12.612    0.79  12.358    0.87  12.983    0.95  18.972
0.64  13.096    0.72  12.558    0.80  12.366    0.88  13.212    0.96  21.822
0.65  13.018    0.73  12.509    0.81  12.388    0.89  13.507    0.97  26.824
0.66  12.942    0.74  12.466    0.82  12.425    0.90  13.887    0.98  37.524
0.67  12.869    0.75  12.428    0.83  12.482    0.91  14.381    0.99  73.466

Table 1. Results for situation 1

From Table 1, it can be seen that the lowest cost per unit time for the deteriorating system in the life cycle is 12.358 and the optimal system reliability threshold $R$ for the preventive maintenance is 0.79. The first five scheduled preventive maintenance actions for this reliability threshold can be performed at 83.614, 70.23, 54.441, 40.335 and 29.397. From Figure 2, it can also be seen that as the system reliability threshold increases, the cost per unit time for the deteriorating system in the life cycle decreases at first and then increases rapidly after it passes the lowest point. This conclusion is true for all deteriorating systems, even when they have different hazards rate functions and different maintenance costs.

[Figure 2. Relationship between ϕ(R) and R.]

Other numerical results are also obtained by varying $c_p$, $c_r$ and $c_f$ in turn while the other parameters are fixed. It can be concluded that: (1) the optimal reliability threshold $R$ decreases gradually with an increase in the preventive maintenance cost $c_p$; this means that the more expensive the preventive maintenance, the less frequent the scheduled preventive maintenance work; (2) with an increase in the replacement cost $c_r$ or the system's failure cost $c_f$, the optimal system reliability threshold $R$ exhibits a gradual increase; this implies that scheduled preventive maintenance actions should be performed more frequently when replacement or system failure costs more.

3.2. Results for situation 2
For this situation, the steps to find the optimal reliability threshold $R$ and maintenance cycle number $N^*$ provided in section 2.3 are applied as follows. Firstly, let $m = 20$, so that the search range is $N \in [1, 20]$; secondly, for each given cycle number $n$, find the optimal reliability threshold $R_n$ that minimizes the expected cost per unit time $\varphi(R, n)$; lastly, when the reliability thresholds $R_n$ for all the cycle numbers have been found, select the lowest $\varphi(R, N^*)$ among the $\varphi(R, n)$; the corresponding $R_n$ and $n$ are the optimal reliability threshold $R$ and the optimal cycle number $N^*$. The results obtained are listed in Table 2 and depicted in Figure 3.

Cycle   Reliability   Cost per
Number  Threshold     Unit Time   Time Intervals for Scheduled Maintenance
1       0.75162       15.624      87.718
2       0.85793       10.769      75.080  63.061
3       0.89566       9.3798      69.138  58.071  45.015
4       0.91490       8.9603      65.532  55.042  42.668  31.612
5       0.92650       8.9529      63.076  52.979  41.068  30.427  22.176
6       0.93424       9.1550      61.283  51.473  39.901  29.562  21.546  15.788
7       0.93977       9.4757      59.910  50.320  39.007  28.900  21.063  15.434  11.494
8       0.94390       9.8693      58.821  49.405  38.298  28.375  20.680  15.154  11.286  8.5634
9       0.94712       10.311      57.934  48.661  37.721  27.947  20.369  14.925  11.115  8.4344  6.5113
10      0.94969       10.786      57.197  48.041  37.241  27.591  20.110  14.735  10.974  8.3270  6.4284  5.0357

Table 2. Results for situation 2

[Figure 3. Relationships between N, R and ϕ(R, N).]
From Table 2, it can be seen that the lowest cost per unit time in the system's life cycle is 8.9529 and the optimal system reliability threshold for the preventive maintenance is 0.92650. A replacement will be performed on the deteriorating system at the 5th preventive maintenance cycle. The corresponding time intervals for the preventive maintenance are provided in the table, and it can clearly be seen that the time intervals for the scheduled preventive maintenance decrease as the cycle number increases, which also indicates that the preventive maintenance is imperfect and the system is subject to a degradation process. From Figure 3, it can be seen that as the preventive maintenance cycle number $n$ increases, the optimal reliability threshold $R_n$ for the given $n$ increases too, while the expected cost per unit time $\varphi(R, n)$ for the deteriorating system decreases sharply at first and then increases slowly after it reaches the lowest cost. Generally, this is true for all deteriorating systems under imperfect preventive maintenance with a scheduled finite maintenance cycle.

Conclusions
Taking imperfect preventive maintenance into consideration, the hybrid hazards rate recursion model is applied to investigate the restoration effect of performing preventive maintenance on a deteriorating system. Two different situations that are often met in practice have been considered, and the corresponding models have been developed to minimize the expected cost per unit time for the system in the life cycle. Whenever the system reliability reaches the threshold $R$, which is deduced by minimizing the expected cost per unit time, a scheduled imperfect maintenance action is performed on the system. For the second situation, the optimal cycle number $N^*$ as well as the reliability threshold $R$ can also be determined by the cost optimization rules. The numerical examples show that the models described in this paper can be applied in practice to schedule the maintenance work. The results and discussion provided with the numerical examples exhibit how the optimal schedule depends on the different cost parameters in the different situations. With these results, maintenance personnel can construct their maintenance schedules for deteriorating systems under imperfect maintenance, and it becomes possible for enterprises to perform preventive maintenance actions with near-zero inventory of spare parts. In order to apply the imperfect maintenance models in practice more effectively and efficiently, further research, especially on deciding the improvement factors $(a_i, b_i)$ in the hybrid hazards rate recursion model, will be needed to perfect these models.

References
Zhou, X., Xi, L. and Lee, J. (2006) Reliability-centered Predictive Maintenance Scheduling for a Continuously Monitored System Subject to Degradation, Reliability Engineering and System Safety, 1
Pham, H. and Wang, H. (1996) Imperfect Maintenance, European J. of Operational Research, 94, 425-438
Wang, H. and Pham, H. (1996) Optimal Age-dependent Preventive Maintenance Policies with Imperfect Maintenance, International J. of Reliability, Quality and Safety Engineering, 3 (2), 119-135
Malik, M. A. K. (1979) Reliable Preventive Maintenance Policy, AIIE Trans., 11 (3), 221-228
Nakagawa, T. (1988) Sequential Imperfect Preventive Maintenance Policies, IEEE Trans. Reliab., 37 (3), 295-298
Cheng, C. and Chen, M.
(2003) The Periodic Preventive Maintenance Policy for Deteriorating Systems by Using Improvement Factor Model, International J. of Applied Science and Engineering, 1 (2), 114-122
Chen, Y. and Cheng, C. (2002) The System Development of Ordinary Periodic Preventive Maintenance Model, Department of Industrial Engineering and Management, Chaoyang University of Technology, Aug. 30, 2002
Kuang, Y., Miao, O. and Huang, H. (2006) Optimizing Sequential Imperfect Preventive Maintenance for Equipment in Life Cycle, Proceedings of the First International Conference on Maintenance Engineering, 242-249
Jayabalan, V. and Chaudhuri, D. (1992) Cost Optimization of Maintenance Scheduling for a System With Assured Reliability, IEEE Trans. Reliab., 41 (1), 21-25
Lin, D., Zuo, M. J. and Yam, R. C. M. (2000) General Sequential Imperfect Preventive Maintenance Models, International J. of Reliability, Quality and Safety Engineering, 7 (3), 253-266

Contribution to modelling of dynamic dependability of complex systems
David Valis
University of Defence, Kounicova 65, 612 00 Brno, Czech Republic
david.valis@unob.cz
Abstract: The paper deals with the dependability assessment of complex systems. As we investigate situations regarding military applications, dynamic dependability is very important for us. The dependability characteristics of military battle equipment are as important to us as the characteristics which serve to perform the battle missions themselves. There is no time on the battlefield to solve unpredicted and unexpected situations caused by unreliability, which might lead to the loss of both equipment and crew. Due to the high level of risk we face on the battlefield, many systems have to be robust enough or have to be redundant in order to succeed.

1. Introduction
As we know, there are a number of characteristics which might be investigated and solved regarding military applications. Some of them are typically related to the performance of the object, while others are related to supporting characteristics. That they are supporting characteristics does not mean that they play a second-class role, but they are usually not given the same preference as those related to performance. In our branch of interest we talk about dependability and its attributes. The common and well known dependability characteristics are often announced and used for various calculations as well as to describe the item itself. We typically know these characteristics from different types of tests performed during the development and testing phase. Such characteristics are related to the so-called inherent dependability, or inherent availability. Apart from these specifications, we also need to know the real behaviour in the battlefield, in real deployment while completing a mission. In real deployment, we talk about characteristics related to the so-called operational dependability, or operational availability. These characteristics are not calculated theoretically; their calculation is based on practical and realistically possible situations. As such, a realistic picture of the behaviour of the technical item, namely military battle vehicles, is the most important thing for us. Several measures join the set of "dynamic dependability" characteristics. To be able to carry out a dynamic dependability analysis, we have to know the boundary conditions and our limitations. Dynamic in these terms means having the information we need just in time. We may, for instance, choose among several possibilities for getting the time-related characteristics of a military battle vehicle.
One of the most appropriate seems to be Markov analysis. Beyond the dynamic characteristics, we also need to know the potential risk level in case an unexpected event occurs, both during the training phase and during real deployment while completing a mission. If we talk about dynamic dependability, we take into account those events which have the major impact on a vehicle's function: a failure. The only failures we assess are failures from internal causes. We do not count possible failures from external causes, which in the case of battle vehicles means failures caused by a hit or attack while performing a mission. In the following parts, we deal with all the above mentioned issues.

2. Risk on the battlefield and its assessment
In our lives we recognise and know plenty of circumstances which may generate the existence of a risk. When we talk about a risk, we subconsciously feel something wrong, negative and unpleasant. We feel endangered, or possibly a hazard. The more we know about a risk, the harder it is to cope with it. In some situations, we cannot do anything other than get used to it. In some cases, we may avoid it, reduce it or ignore it. There are many ways of observing a risk and ways of handling it. The whole discipline dealing with risk is called "risk management", and its elements are of crucial importance for us. Due to the fact that we are dealing with military battle vehicles, we have to recognise a somewhat broader spectrum of risk than the standard one, the risk profile we usually see for civilian vehicles. As battle vehicles have to perform their missions in very difficult environments under very adverse conditions, the spectrum of possible impacts is very wide. We talk about sources of risk. A battle vehicle has the potential to interact with more than one source of risk, regardless of whether the vehicle is carrying out training or is in real deployment. Of course, real deployment may bring more consequences in the case of an event occurrence. A failure in training does not necessarily need to be as crucial as one during a real mission. A failure occurrence, either in training or in a real mission, puts the vehicle into an involuntary situation arising from the military tasks it has to fulfil. Due to the very high possibility of being immediately attacked in battle, the risk that arises is also very high. Considering the above, we use the following description of risk for further work. Let there exist a certain source of risk, either tangible (environment, object, human being) or intangible (activity). This source can have a positive, but, as in our case, also a negative impact on its surroundings (other tangible or intangible elements). The existence of this impact is not always so important. The existence of such a risk (i.e. a negative impact) becomes important only when its impact or importance results from an interaction which exists between an element (individual, group, technical object or activity) and a source (environment or activity). At this moment it is necessary to realise that risk as such does not exist if there is no interaction between the source of risk and the object (element) that has a certain relationship to this source. It is necessary to take into account the fact that the interaction can have various forms. It may be, for example, a voluntary, involuntary, random or intentional interaction. The effect of these impacts can be attributed especially to the environment in which the object occurs during its existence.
Any such impacts shall generally be called the area of risk. An important and integral part of all analyses will be the precise, qualitative and sufficient identification of this source of risk itself. Without this source, we can hardly deal with a risk in a qualified way. Considering these facts, we may understand that risk can be assessed qualitatively or quantitatively (and of course in both ways as well). The basic expressions which put risk into a commonly understandable form, and which enable further dealing with risk, are as follows. The first and very well known description is in the form of an equation which may serve for both qualitative and quantitative assessment:

$R = P \times C$    (1)

where: R – risk; P – probability; C – consequences. This expression allows us to carry out both qualitative and quantitative assessments. The problem is that we do not have any numerical expressions with physical units. The second very well known form of risk expression is the following formula:

$R = \frac{P \times C}{M} \times E$    (2)

where: R – risk; P – probability; C – consequences; M – measures; E – exposition. This expression also allows us to carry out both qualitative and quantitative assessments. A very big advantage is that we may have physical units related to the risk for further analysis; a toy numerical reading of equation (2) is sketched below. For every element of the equations mentioned above, there are more or less clear procedures for their determination. We have to understand that risk assessment, as part of risk management, is subdivided in two possible ways. In terms of finding the solution, we talk about either a "logic (sometimes deterministic) approach" or a "probabilistic approach". In the case of probability, the situation is more than clear. Although in English-speaking countries we have to distinguish between the terms "probability" and "likelihood", the determination is clear enough. In the case of exposition, we do not have to discuss very much the possibility of determining its unit and function. We may expect problems in the determination of measures or consequences. Such decisions are more or less based on expert judgement. This way is not necessarily bad, but it does not give us the possibility to validate or verify a statement that has been made. From this point of view, as well as from our own historical experience, we recommend using new progressive forms and procedures for the determination of measures and consequences. As we very often work with linguistic and qualitative measures, which are consequently connected to scales (numerical expressions of qualitative expressions), we would like to be sure that our decision was not bad, and that in the same circumstances, under the same conditions, one day later it would be made in the same way. The theory of fuzzy probability and fuzzy logic seems to suit this purpose very well. For more details on how to solve such an issue, see Vališ (2006(b)).
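As a purely hypothetical illustration of how equation (2) acquires physical units, the snippet below plugs in assumed placeholder values; none of the numbers come from the paper.

```python
# Toy reading of equation (2): R = (P x C / M) x E.  All values are
# hypothetical placeholders chosen only to show the units involved.
P = 0.02      # probability of the undesired event per mission (assumed)
C = 5.0e5     # consequence of the event, in currency units (assumed)
M = 4.0       # dimensionless effectiveness of implemented measures (assumed)
E = 12.0      # exposition, e.g. missions per year (assumed)

R = (P * C / M) * E
print(f"risk = {R:.0f} currency units per year")  # 30000 per year here
```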
We know the basic characteristics and measures related to the object. Also, in this case – solving the issue related to the counting variable – we use Markov analysis for getting several characteristics of dynamic dependability. We have chosen an automatic 111 cannon which shoots using rounds for our purpose and description in such a case. If a failure on a round occurs, the part restoration system allows it to re-charge a faulty round with a new one. We talk about partial repair. The system may basically stay in two states as described bellow using scenarios for their description. The mission is completed. In the first case there can be a situation when all the ammunition of a certain amount which is placed in an ammunition belt is used up and a round failure occurs, or it is used up and a round failure does not occur. In this case a backup system of pyrotechnic cartridges is able to reverse a system into an operational state. Using up can be single, successive in small bursts with breaks between different bursts, or it might be on mass using one burst. Shooting is failure free or there is a round failure occurrence n. In case a round failure occurs, a system which restores a function of pyrotechnic cartridges is initiated. There are two scenarios here too – a system restoring a pyrotechnic cartridges function is failure free, or a pyrotechnic cartridge fails. If a function of pyrotechnic cartridges is applied, it can remove a failure m times. So the number of restorations of the function is the same as the number of available pyrotechnic cartridges. In order to complete the mission successfully we need a larger amount of pyrotechnic cartridges m, or in the worst case, the number of pyrotechnic cartridges should be equal to the number of failures. Another alternative is the situation that a round fails and in this case a pyrotechnic cartridge fails too. A different pyrotechnic cartridge is initiated and it restores the function. This must satisfy the requirement that an amount of all round failures n is lower or at least equal to a number of operational (undamaged) pyrotechnic cartridges m. The mission is completed in all the cases mentioned above and when following a required level of readiness of a block A. The mission is not completed. In the second case, the shooting is carried out one at a time, in small bursts or in one burst, and during the shooting there will be n round failures. At the time the failure occurs, a backup system for restoring the function will be initiated. Unlike the previous situation there will be m pyrotechnic cartridge failures and a total number of pyrotechnic cartridge failures equalling at least the number of round failures and at most, equal to the number of implemented pyrotechnic cartridges M. It might happen in this case that a restoring of the function does not take place and the mission is not completed at the same time because there are not enough implemented pyrotechnic cartridges. 1. An alternative of a function when the mission is completed. 1 - P(B) 1 - P(B) 1 - P(B) 1 - P(C) 1 - P(B) P(B) m1 0 P(C) m2 2. An alternative of a function when the mission is not completed. 1 1 - P(C) 1 - P(C) …… mm 1 P(C) P(C) Figure 1. Description of transitions among the states Characteristics of the states: 0 state: An initial state of an object until a round failure occurs with a probability function of a round P(B). 
It is also the state the object can return to, with pyrotechnic cartridge probability $P(C)$, in case a round failure occurs with probability $P(\bar{B}) = 1 - P(B)$, where $P(C \mid \bar{B}) = P(C \cap \bar{B}) / P(\bar{B})$.
States $m_1, \ldots, m_m$: States the object can occupy while completing the mission. Either a round failure occurs with probability $P(\bar{B}) = 1 - P(B)$, or there is a pyrotechnic cartridge failure with probability $P(\bar{C}) = 1 - P(C)$.
State 1: A state the object can reach while completing the mission; it is a so-called absorbing state. The transition to this state is described either by the probability $P(\bar{C}) = 1 - P(C)$ of a failure of the last pyrotechnic cartridge, as long as the object was in state "k_n" before this state, or by the probability of a round failure occurrence $P(\bar{B}) = 1 - P(B)$, as long as the object was in state 0 before this state and all pyrotechnic cartridges have been eliminated from the possibility of being used.

The transitions among the different states, as well as the absolute probabilities, can be put into the following formulae:

$P(0) = P(B) + P(C_{k_1} 0) + P(C_{k_2} 0) + P(C_{k_3} 0) + \ldots + P(C_{k_n} 0)$    (3)
$P(m_1) = 1 - P(B)$    (4)
$P(m_m) = (1 - P(B)) + (1 - P(C))$    (5)
$P(1) = 1$    (6)

The transition probabilities are described using a matrix of transition probabilities $P$:

$P = \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{bmatrix}$    (7)

The arrows in Figure 1 indicate the transitions that may occur with positive probability. If we know the form of the transition probability matrix $P$ and the original initial distribution of the variable $p_i(0)$, then we can express the absolute probability of the random variable $p_i(n)$ as follows:

$p_i(n) = \sum_{k \in I} p_k(0)\, p_{ki}(n), \quad i \in I$    (8)

This formula can also be expressed in matrix form as follows:

$P(n) = P(0)\,P^n$    (9)

We might describe the behaviour of the item in a stationary state in terms of probability using the limit probabilities $p_j$, defined as follows:

$p_j = \lim_{n \to \infty} p_{ij}(n), \quad j \in I$    (10)

The importance of the limit probabilities lies in expressing the weakening of the initial conditions. With the help of this statement, we can get quite an exact picture of the behaviour of our observed item. We may be content to know that, after the influence of the initial conditions fades, the item will stay in a given state with a certain probability; or we may use the absolute probabilities to determine in which state the item will be after a specific number of measured units has elapsed. This approach allows us to get a dynamic (in time) picture of the observed object; a small numerical sketch follows.
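The following sketch is an illustrative toy, not the paper's data: a three-state chain with an operational state, an intermediate restoration state and an absorbing failure state, with assumed transition probabilities, to which equations (8)-(10) are applied.

```python
import numpy as np

# Toy 3-state chain: 0 = operational, 1 = restoration state m, 2 = absorbing
# failure state.  The transition probabilities below are assumptions chosen
# only to illustrate equations (8)-(10).
P = np.array([
    [0.90, 0.08, 0.02],   # from operational state 0
    [0.70, 0.20, 0.10],   # from restoration state m
    [0.00, 0.00, 1.00],   # absorbing state
])
p0 = np.array([1.0, 0.0, 0.0])              # initial distribution p(0)

# Equation (9): p(n) = p(0) P^n
p20 = p0 @ np.linalg.matrix_power(P, 20)
print("after 20 steps:", p20.round(4))

# Equation (10): limit probabilities; with an absorbing state, all the
# probability mass eventually accumulates in that state.
p_inf = p0 @ np.linalg.matrix_power(P, 10_000)
print("limit distribution:", p_inf.round(4))
```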
4. Continuous distribution of the observed variable and dependability
Having described the counting variable relating to the observed item above, we may also use a continuous variable for getting a picture of the object's behaviour. We are looking for a random function $X(t)$, where $X(t)$ takes values from the set $I = \{0, 1, 2\}$. We call the items of set $I$ the states of the observed process. If the parameter involved is time (for instance $t \in [0, \infty)$), then we call the random function $X(t)$ a Markov chain with a continuous parameter. We call such a chain homogeneous if the following formula is valid:

$p_{ij}(s, t) = p_{ij}(0, t - s) = p_{ij}(t - s), \quad s < t$    (11)

It is clear from the above formula that the transition probabilities among the states depend only on the difference of the arguments $t - s$, and not on the arguments $t$ and $s$ themselves. Such a model is valid for those items and systems which are not capable of performing any operation, even in a reduced mode, when a failure occurs. From a state point of view, they immediately transfer from state "0", the operating state, to state "1", the disabled state. This form is the most frequently used, and for items or systems with partial performance capabilities it is extended by at least one intermediate state. Items or systems behaving in this way are not very suitable for us, due to the potential danger of a complete inability to perform any function in the case of failure. The transitions among states might be described using either probabilities or rates (as displayed below). Any transition among the states may occur, and the model has the following form.

As in the previous part with the counting parameter, we use the same description of the states: the assignment "0" means that the item/system is in an operational state, and the assignment "1" means that the item/system is in a disabled state. Such a description may be applied to various complete systems (e.g. vehicles, weapon systems) or subsystems (e.g. engines) within the frame of military equipment. We are also able to create plenty of different scenarios for each state description. The transition rates have the following form. For $i = 0$ and $j = 1$:

$q_{ij} = \frac{1}{\mathrm{MTBF}} = \frac{1}{E_P(X)}$    (12)

where $E_P(X)$ (MTBF, Mean Time Between Failures) is the mean value of the time to failure and $i \in \{0; 1\}$, $j \in \{0; 1\}$, with $j \neq i$. For $i = 1$ and $j = 0$:

$q_{ji} = \frac{1}{\mathrm{MTTR}} = \frac{1}{E_O(X)}$    (13)

where $E_O(X)$ (MTTR, Mean Time To Repair) is the mean value of the time to repair, with $i = 1$, $j = 0$. We summarise the relevant part of the above mathematical notation as follows. For the Markov chain with a continuous parameter, we define the transition rate as follows. Let $h$ denote an increment of the argument $t$; then $q_{ij}$ is

$q_{ij} = \lim_{h \to 0} \frac{p_{ij}(h)}{h}, \quad \text{for } i \neq j$    (14)

Whereas $p_{ij}(h)$ denotes the transition probability from state $i$ to state $j$ during an interval of length $h$, we call the value $q_{ij}$ the transition rate from state $i$ to state $j$. Using formula (14), the following is also valid:

$p_{ij}(h) \approx q_{ij} h$    (15)

If $p_{ii}(h)$ denotes the probability of remaining in state $i$ during a time interval of length $h$, then we define the value $q_i$, where

$q_i = \lim_{h \to 0} \frac{1 - p_{ii}(h)}{h}$    (16)

is the transition rate out of state $i$, and we have $q_i = -q_{ii}$. Using formula (16), the following form is also valid:

$p_{ii}(h) \approx 1 - q_i h$    (17)

The values $q_i$ and $q_{ij}$ also fulfil the condition

$q_i = \sum_{j \in I,\, j \neq i} q_{ij}, \quad \text{for all } i \in I$    (18)

where $I$ is the set of states considered, $I = \{0; 1; 2; \ldots\}$. We would also like to introduce the equations for the calculation of the transition probabilities:

$p'_{ij}(t) = \sum_{k \in I} p_{ik}(t)\, q_{kj}, \quad \text{for } i, j \in I$    (19)

and the equation system for the calculation of the absolute probabilities:

$p'_i(t) = \sum_{k \in I} p_k(t)\, q_{ki}, \quad \text{for all } k, i \in I$    (20)

It is necessary to know the particular transition rates among the states for the exact calculation of the above differential equations. These equations are used to give us exact information about the system, especially in which state the system will be at a given time. We regard as suitable the use of the theory of the inherent availability of a complex system composed of many mutually independent components for the calculation of each measure (like the transition rate, for instance). The results of these differential equations give us the transition probabilities as well as the absolute probabilities expressing in which state the system will be at what time; a two-state numerical sketch follows.
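For the two-state model, equations (19)-(20) with the rates (12)-(13) have a well-known closed-form solution for the probability of being operational. The sketch below uses assumed MTBF and MTTR values purely for illustration.

```python
import numpy as np

# Two-state continuous-time model: failure rate lam = 1/MTBF (eq. 12),
# repair rate mu = 1/MTTR (eq. 13).  MTBF and MTTR values are assumed.
MTBF, MTTR = 200.0, 8.0                  # hours (illustrative)
lam, mu = 1.0 / MTBF, 1.0 / MTTR

def p_operational(t):
    """Closed-form solution of eqs (19)-(20) for p_0(t), with p_0(0) = 1:
    p_0(t) = mu/(lam+mu) + lam/(lam+mu) * exp(-(lam+mu)*t)."""
    s = lam + mu
    return mu / s + (lam / s) * np.exp(-s * t)

print("p_0(24 h)      =", round(p_operational(24.0), 4))
print("limit, t -> oo =", round(mu / (lam + mu), 4))   # = MTBF/(MTBF+MTTR)
```

The limit value $\mu/(\lambda+\mu) = \mathrm{MTBF}/(\mathrm{MTBF}+\mathrm{MTTR})$ is the steady-state availability, i.e. the inherent availability mentioned above.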
Our decision making would be much harder without this kind of information. That is why we appreciate such procedures for dynamic dependability indication, especially when considering military vehicles.

5. Method applicability example
The applicability of the methods has been proven on two main military projects carried out in the Czech Armed Forces in recent years. The first of them was the main battle tank modernisation project. During the preparation phase, several analyses of the impact of the modernised items should have been carried out. Although the analysis of the power pack and other technical measures had been evaluated well, the other analyses (among others, the dependability analysis) had not been carried out properly. While testing the tactical and technical capabilities of the new modernised version of the tank, many unexpected failures occurred during this phase. Some of these failures might have led to the loss of the vehicle, either due to immobility or due to the loss of other key combat functions, both on the battlefield and in the training phase. The result was to stop the acceptance and operation of the test version until the causes of the failures could be found and removed. Using the method presented here (for the continuous variable), the dependability measures have been improved and efficiency has increased. The second example of an application of the method is a new project being conducted for the air force, concerning new two-barrel automatic cannons for sub-sonic fighters. The cannon is a very complex mechatronic system using two key parts which behave in the counting-variable way: ammunition and pyrotechnic cartridges. As the project has still been in the testing phase of operation, the method has been serving to diminish the frequency of failures which occur during the observed periods and to implement construction changes for dependability improvement. Again, the application of the method has proved successful. Both of the examples presented involve very complex procedures; a detailed description is much wider than this contribution allows. For further information or out of personal interest, please contact the author of this contribution.

6. Conclusions
This paper describes procedures which are suitable for dynamic dependability assessment. We have been earnestly looking for new and progressive methods which allow us to get a more precise view of military (battle) equipment. The more information about such equipment we have, the more successful its possible deployment might be. One of the things we have to take into account, and cannot pretend does not concern us, is risk. The risk is very high both in training and in real deployment, as reflected in the risk profile. The first part of the paper deals with the basic understanding of risk and elementary formulas for its expression. The following parts show the dynamic dependability assessment and investigation both for the counting and for the continuous situations. When using each procedure, we need to be aware of, and respect, the conditions of that particular procedure. Both of the procedures shown above have been proven within the Czech Armed Forces on the respective equipment. These examples have confirmed the ability of the mathematical procedures to express system behaviour in terms of dynamic dependability. The results corresponded with reality as well as with our expectations.

Acknowledgement
This contribution has been made with the support of the "Research Purpose Fund" of the Faculty of Military Technology Nr.
0000 401, University of Defence in Brno.

References

Kropáč, J. (2002) Vybrané partie z náhodných procesů a matematické statistiky [Selected topics in stochastic processes and mathematical statistics], Brno: VA v Brně, Skripta S-1971
Kropáč, J. (1987) Základy náhodných funkcí a teorie hromadné obsluhy [Fundamentals of random functions and queueing theory], Brno: VAAZ, Skripta S-1751/A
Vališ, D. (2003) Analysis of vetronics application consequences onto military battle vehicles dependability, Brno: VA v Brně, Dissertation thesis
Vališ, D. (2005) Fundamentals of description, perception and value determination of Risk, Liberec: Technical University, 2005
Vališ, D. (2006a) Contribution to Reliability and Safety Assessment of Systems, Proceedings of the conference Opotřebení Spolehlivost Diagnostika 2006, Brno: Universita Obrany, 31 October - 1 November 2006, 329-337, ISBN 80-7231-165-4
Vališ, D. (2006b) Assessment of Dependability of Mechatronics in Military Vehicles, Proceedings of the conference Opotřebení Spolehlivost Diagnostika 2006, Brno: Universita Obrany, 31 October - 1 November 2006, 309-319, ISBN 80-7231-165-4
Vališ, D. and Vintr, Z. (2006) Dependability of Mechatronics Systems in Military Vehicle Design, Proceedings of the European Safety and Reliability Conference "ESREL 2006" (September 18-22, 2006, Estoril, Portugal), London/Leiden/New York/Philadelphia/Singapore: Taylor & Francis Group, 1703-1707, ISBN 10 0 415 41620 5

Weak sound pulse extraction in pipe leak inspection using stochastic resonance

Yang Dingxin, Xu Yongcheng
The Institute of Mechatronics Engineering, National University of Defense Technology, Changsha, P.R. China, 410073
yangdingxincn@gmail.com

Abstract: Acoustic methods are widely used in industrial pipe leak inspections. A series of sound waves is transmitted through the pipe; when there is a leak at some position in the pipe, a sound pulse containing the position information of the leak will emerge in the received sound wave signal. Due to environmental noise or the small size of the leak, the amplitude of the sound pulse is usually very weak, so it is important to extract the weak sound pulse from the noisy signal. Stochastic resonance (SR) is a nonlinear phenomenon occurring in particular nonlinear systems subjected to noise and other excitation signals, which is reported to be useful in enhancing digital signal transmission and detecting weak periodic signals. In this paper, the SR model is applied to weak sound pulse extraction from the received sound signal. We detail the principle and algorithm of the SR model for extracting weak aperiodic pulses. A simulation scheme and results are presented, and actual leak inspection experiments on a PVC pipe with small leaks are carried out using the SR model to extract the weak sound pulse. The results show that SR provides a novel and effective way of extracting weak sound pulses in pipe leak inspection.

1. Introduction

Pipelines are widely used in the metallurgy, petroleum, water supply and sewage industries, and cracks and leaks often occur in the pipes because of extreme environmental conditions. Among acoustic methods, the use of a sound pulse for leak inspection is known to be a novel one, with the properties of fast inspection speed and high efficiency, and with no constraint on the material of the pipes. When a series of sound waves is transmitted through the pipe, if there is a leak at some position in the pipe, the reflected sound wave containing the position information of the leak will travel back to the transmission end.
A sound pulse representing the leak will emerge in the received sound wave signal. As the speed of sound transmission in a given pipe is constant, we can work out the position of the leak in the pipe by processing the received sound pulse signal with a computer. Due to environmental noise or the small size of the leak, the amplitude of the sound pulse is usually very weak; the pulse is often submerged in various kinds of noise and is hard to distinguish. Junming & Zingxin (2001) applied wavelets to process the weak sound pulse. Some weak sound pulse signals were extracted from noisy backgrounds using wavelets, but the wavelet method is sensitive to the selection of the basic wavelet function: different basic wavelet functions can produce entirely different results, and in the case of strong noise the performance is poor. Furthermore, as a kind of characteristic signal, weak aperiodic pulses often arise in weak chemical spectrum extraction, singularity detection etc. As a result, the extraction of weak sound pulses from noisy signals is receiving more and more attention.

SR is a nonlinear phenomenon generally occurring in dynamical bistable systems excited by noisy periodic signals, first introduced by Benzi, Sutera & Vulpiani (1981) to explain the periodic switching of the ancient Earth's climate between ice ages and periods of relative warmth. Through SR, the signal-to-noise ratio (SNR) of weak signals immersed in heavy noise can be greatly improved if the nonlinearity, the weak signal and the noise satisfy certain conditions. Initially, SR was limited to treating weak periodic signals, and aperiodic signals were rarely involved. In recent years, it has been found that SR can also enhance the transmission of weak aperiodic signals through nonlinear bistable systems. Collins, Chow & Imhoff (1995) introduced aperiodic SR in research on the response of excitable neurons to weak aperiodic signals, and Neiman & Schimansky-Geier (1994) discussed harmonic noise transmission through bistable systems. The weak chemical spectrum signal (a kind of aperiodic pulse wave submerged in noise) has been extracted by means of aperiodic SR in bistable systems, see Wang et al. (2000) and Pan et al. (2003).

In this paper, we use the SR algorithm of bistable systems to extract the sound pulse in pipe leak inspections. Firstly, a stochastic resonance numerical simulation algorithm is proposed to address the detection of weak aperiodic pulse signals. The effect of the parameters of the nonlinear system on the performance of the algorithm is discussed in detail and the optimisation of these parameters is studied. Finally, the SR algorithm is applied to PVC pipe leak inspection: extracting the weak sound pulse representing a small hole in the pipe from heavy noise.

2. Principle and algorithm of the SR model for extracting a weak aperiodic pulse

The nonlinear bistable system modulated by a signal and Gaussian white noise has been extensively exploited in the study of SR. It is defined by the nonlinear Langevin equations

ẋ = −U′(x) + K·p(t),  p(t) = u(t) + n(t),  ⟨n(t)⟩ = 0,  ⟨n(t)n(t′)⟩ = 2Dδ(t − t′)    (1)

where U(x) = −(a/2)x² + (b/4)x⁴ is a double-well potential with positive parameters a and b characterising the system, which has an unstable maximum at x = 0 and two stable minima at x = ±√(a/b). The potential barrier height is ΔU = a²/(4b). p(t) denotes the input signal embedded in a noisy background, with weak signal u(t) and noise n(t).
n(t) is zero-mean Gaussian white noise with variance σ² = 2D. K is an adjustable parameter used to normalise the input signal. When SR takes place, the input signal, the noise and the nonlinear system cooperate well and the signal extracts energy from the noise; the strength of the signal increases while that of the noise decreases, so the output signal of the system has a better SNR than the input. When the input signal is a sinusoid, u(t) = A sin ωt, then according to the adiabatic theory of SR the SNR of the output signal can be derived from Equation (1), see Gammaitoni (1998), as

SNR = √2·(A²ΔU/D²)·e^(−ΔU/D)    (2)

It is clear that the SNR of the output signal can be improved by modulating the potential barrier ΔU and the noise strength D. Since ΔU is determined by the system parameters a and b, the values of a and b have a substantial effect on the detection performance; the same conclusion holds for the aperiodic signal. In this work, no external noise is added: only the parameters of the system are adjusted to match the aperiodic pulse signal and the intrinsic noise so as to achieve SR. The input signal is normalised over [−1, 1] by the adjusting factor K, so that all samples have the same strength in the nonlinear system.

Until now, it has been difficult to conduct an accurate analytical analysis of the aperiodic SR behaviour of bistable systems subjected to aperiodic random signals. As a result, numerical simulation is adopted here for detecting weak aperiodic pulse signals. The Langevin equation (1) is approximately solved in discrete form using the Runge-Kutta method. The algorithm can be described as follows:

x_{n+1} = x_n + (1/6)(k₁ + 2k₂ + 2k₃ + k₄), n = 0, 1, …, N − 1    (3)

k₁ = h[a·x_n − b·x_n³ + K·p_n]
k₂ = h[a(x_n + k₁/2) − b(x_n + k₁/2)³ + K·p_n]
k₃ = h[a(x_n + k₂/2) − b(x_n + k₂/2)³ + K·p_{n+1}]
k₄ = h[a(x_n + k₃) − b(x_n + k₃)³ + K·p_{n+1}]    (4)

where x_n and p_n denote the nth samples of x(t) and p(t) respectively, h denotes the time step (the reciprocal of the sampling frequency fs), and N is the number of sampling points. A cross-correlation based measure that considers the correlation between the stimulus signal and the system response is commonly used to evaluate the quality of the aperiodic SR; this is termed the power norm C₀. If the signal is p(t) and the output is x(t), then C₀ is given by

C₀ = ⟨[p(t) − ⟨p(t)⟩][x(t) − ⟨x(t)⟩]⟩    (5)

where ⟨·⟩ denotes an average over time. Based on the power norm, the normalised power norm C₁ is given by

C₁ = C₀ / √(⟨[p(t) − ⟨p(t)⟩]²⟩·⟨[x(t) − ⟨x(t)⟩]²⟩)    (6)

From a signal processing perspective, maximising C₁ corresponds to maximising the shape matching between the input stimulus p(t) and the system response x(t). As such, this measure enables one to quantify the detection performance.

3. Numerical simulation performance of the SR model

A numerical simulation method is adopted in this research for detecting a weak aperiodic pulse signal. The effect of the system parameters on the detection performance and the constraint conditions on the weak aperiodic pulse signal itself are also investigated. Simulated aperiodic pulse signals were produced according to the following equation:

u(t) = h₀·exp[ε((t − t₀)/W)²]    (7)

There are three parameters controlling the waveform of the aperiodic pulse signal: in Equation (7), h₀ is the peak height, W denotes the peak half-width and ε is used to control the attenuation speed of the pulse signal; t and t₀ denote time and the peak position, respectively.
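A minimal sketch of the simulation loop follows, using the parameter values quoted in this section; computing C₁ between the noisy stimulus p(t) and the response x(t) follows our reading of Eqs. (5)-(6), and the random seed is an arbitrary assumption.

```python
import numpy as np

def bistable_sr(p, a, b, K, h):
    """Runge-Kutta integration of the bistable system
    dx/dt = a*x - b*x**3 + K*p(t), following Eqs. (3)-(4)."""
    f = lambda x, drive: a * x - b * x**3 + K * drive
    x = np.zeros(len(p))
    for n in range(len(p) - 1):
        k1 = h * f(x[n], p[n])
        k2 = h * f(x[n] + k1 / 2, p[n])
        k3 = h * f(x[n] + k2 / 2, p[n + 1])
        k4 = h * f(x[n] + k3, p[n + 1])
        x[n + 1] = x[n] + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def c1(p, x):
    """Normalised power norm, Eqs. (5)-(6)."""
    dp, dx = p - p.mean(), x - x.mean()
    return (dp * dx).mean() / np.sqrt((dp**2).mean() * (dx**2).mean())

# Simulated aperiodic pulse, Eq. (7), with the parameter values of Section 3.
fs, N = 1000.0, 1000
t = np.arange(N) / fs
h0, W, t0, eps, sigma = 0.8, 0.1, 0.3, -2.7, 1.0
u = h0 * np.exp(eps * ((t - t0) / W) ** 2)
p = u + sigma * np.random.default_rng(0).standard_normal(N)

K = 1.0 / np.max(np.abs(p))          # normalise the input to [-1, 1]
x = bistable_sr(p, a=10.0, b=0.01, K=K, h=1.0 / fs)
print("C1 =", c1(p, x))
```

Larger C₁ values indicate better shape matching between stimulus and response, which is how the parameter sweeps below are scored.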
The original aperiodic pulse signal with h₀ = 0.8, W = 0.1, t₀ = 0.3 and ε = −2.7 is shown in Figure 1(a), and, as an example, the mixed input signal with added Gaussian white noise with σ = 1 is shown in Figure 1(b), where the sampling frequency is fs = 1000 Hz and the number of sample points is N = 1000.

Figure 1. Simulated aperiodic pulse signal and the mixed input signal with noise

3.1 The effect of system parameters a and b on the performance of detection

In the numerical simulation, the proposed Runge-Kutta algorithm is used to solve the aperiodic pulse signal detection model. The parameters a and b of the nonlinear system define the height of the potential barrier and the potential's profile; therefore, a and b have a substantial effect on the quality of the output signal. The normalised power norm C₁ is used to quantify the detection performance. With the system parameter a = 10 held constant, the normalised power norm C₁ obtained from Equation (6) for different values of the system parameter b is shown in Figure 2, where h₀ = 0.8, W = 0.1, t₀ = 0.3, ε = −2.7, σ = 1 and the x-coordinate is logarithmic. It can be seen from Figure 2 that C₁ initially decreases with increasing b until it reaches a minimum; it then increases with increasing b until a maximum is attained, after which it decreases slowly again. The system parameter b varies over a wide range, so the optimisation of b should avoid falling into the valley of the curve; in general, a suitably small b is appropriate for detection. Figure 3 presents the results obtained for C₁ with different a for the same dataset used in Figure 2, while b = 0.01 is held constant; the x-coordinate is again logarithmic. It is clear that C₁ has a single maximum as a increases. The response speed of system (1) is mainly determined by a, so the optimised a should match the rate of change of the signal to improve the SNR of the output signal. In Figure 3, C₁ reaches its maximum when a is about 16.

Figure 2. Normalised power norm C₁ against parameter b
Figure 3. Normalised power norm C₁ against parameter a

In practice, the optimisation of the system parameters is essential and complicated. Initially, a small value of b should be considered; in addition, more attention should be given to the selection of parameter a. The match with the rate of change of the signal should be considered, and a is often selected through repeated trials and practical experience.

3.2 The effect of signal parameters h₀ and W on the performance of detection

In the following, the effect of the signal parameters h₀ and W on the detection performance is studied and constraint conditions for h₀ and W are obtained. Firstly, the parameter h₀ is considered. With a = 50, b = 0.01, W = 0.1, t₀ = 0.3, ε = −2.7 and σ = 1 held constant, the normalised power norm C₁ obtained from Equation (6) at different peak heights h₀ is shown in Figure 4. It is evident that C₁ increases smoothly with increasing peak height h₀. In order to distinguish the weak aperiodic pulse signal from the noise, C₁ must not be less than a certain value for the output signal to be of decent quality; according to the numerical experiments, C₁ must be greater than 0.5 and therefore, from Figure 4, the peak height must not be less than 0.3. Next, the effect of the signal parameter W on the detection performance is considered. With a = 50, b = 0.01, h₀ = 0.8, t₀ = 0.3, ε = −2.7 and σ = 1 held constant, the normalised power norm C₁ obtained from Equation (6) at different peak half-widths W is shown in Figure 5.
The curve of C₁ in Figure 5 is similar to that in Figure 4: C₁ increases monotonically with increasing peak half-width W. An increase in W enhances the detection performance, and a reduction in W has a substantial influence on the quality of the output signal. According to the numerical experiments, W must not be less than 0.05 for the weak aperiodic pulse signal to stand out from the noise background.

Figure 4. Normalised power norm C₁ against peak height h₀
Figure 5. Normalised power norm C₁ against peak half-width W

3.3 Example of weak aperiodic pulse signal detection

With W = 0.1, t₀ = 0.3, ε = −2.7, σ = 1 and h₀ = 0.5, 0.8, 1.2 and 1.5 respectively, the mixed input signals are shown in Figure 6. In accordance with the discussion above, the system parameters are taken as a = 10 and b = 0.01. The output signals obtained from Equation (3) are shown in Figure 7.

Figure 6. Simulated mixed input signal: (a) h₀ = 0.5 (b) h₀ = 0.8 (c) h₀ = 1.2 (d) h₀ = 1.5
Figure 7. Obtained output signal: (a) h₀ = 0.5 (b) h₀ = 0.8 (c) h₀ = 1.2 (d) h₀ = 1.5

From Figure 7, it can be seen that the weak aperiodic pulse signal is extracted from the strong noise background and that there is good shape matching between the output x(t) and the signal p(t).

4. PVC pipe experiment

When a series of sound waves is transmitted through the pipe, if there is a leak at some position in the pipe, a weak sound pulse representing the leak will emerge in the received sound wave signal. Because the speed of sound transmission in a given pipe is constant, we can work out the position of the leak along the pipe by processing the received sound pulse signal with a computer. Due to environmental noise or the small size of the leak, the amplitude of the sound pulse is usually very weak; the sound pulse is a kind of aperiodic pulse signal. In order to examine the effectiveness of the above stochastic resonance algorithm, sound pulse leak inspection experiments were carried out on a PVC pipe. The PVC pipe has a diameter of 25 centimetres and a length of 4 metres, and we drilled a very small hole (1.2 mm in diameter) in it. We sent a series of sound waves from one end of the pipe using an EEC-16/XB acoustic leak inspection instrument. Because of the drilled hole, an aperiodic sound pulse emerged in the sampled returned sound wave signal; the position of the sound pulse represents the position of the hole in the PVC pipe. Figure 8(a) illustrates the received sound wave signal without noise, with the x-coordinate being the sample point; the sound pulse peak is located near the 30th sample point, which represents the position of the leak in the pipe. Figure 8(b) illustrates the received sound signal with noise (approximately σ = 1); it is hard to distinguish the sound pulse peak in Figure 8(b), and the position of the hole cannot be determined. We input the sampled data of Figure 8(b) into Equation (3), with parameters a = 0.4, b = 0.4 and time step h = 1. The output signal of Equation (3), processed by the stochastic resonance algorithm, is shown in Figure 9.

Figure 8. Received sound wave signal
Figure 9. Output signal of Figure 8(b) through the SR process

In Figure 9, it is obvious that there is a hop between the two potential wells of the bistable system; thus the weak sound pulse representing the hole position in the pipe rises out of the noise signal. The peak position is near the 30th sample point, in accordance with the noiseless signal of Figure 8(a).

5. Conclusion

The sound pulse method is suitable for fast leak inspection in metal and non-metal pipes.
It is widely applicable to the online leak inspection of heater tubes, condenser tubes, heat exchanger tubes etc. The aperiodic characteristic signal containing the information about the leak is often submerged in various kinds of noise, and the SR method proposed in this paper is a new attempt at solving the problem. The simulation and experimental results show that the SR method can improve the ability to detect weak aperiodic sound pulses, and it provides a novel means of weak sound pulse extraction.

References

Benzi, R., Sutera, A. and Vulpiani, A. (1981) The mechanism of stochastic resonance, J. of Physics A: Mathematical and General, 14, 453-457
Collins, J., Chow, C. and Imhoff, T. (1995) Aperiodic stochastic resonance in excitable systems, Physical Review E, 52, R3321-R3324
Gammaitoni, L. (1998) Stochastic resonance, Reviews of Modern Physics, 70, 223-287
Heneghan, C. and Chow, C. C. (1996) Information measures quantifying aperiodic stochastic resonance, Physical Review E, 54, R2228-R2231
Junming, L. and Zingxin, Y. (2001) The application of wavelet analysis in sound pulse processing, Nondestructive Testing, 3, 34-35
Neiman, A. and Schimansky-Geier, L. (1994) Stochastic resonance in bistable systems driven by harmonic noise, Physical Review Letters, 72, 2988-2991
Pan, Z. et al. (2003) A new stochastic resonance algorithm to improve the detection limits for trace analysis, Chemometrics and Intelligent Laboratory Systems, 1362, 1-9
Wang, L. et al. (2000) A novel method of extracting weak signal, Chemical J. of Chinese Universities, 21, 53-55

An empirical comparison of periodic stock control heuristics for intermittent demand items

M. Zied Babai, Aris A. Syntetos
Centre for Operational Research and Applied Statistics, Salford Business School, University of Salford, Maxwell Building, The Crescent, Manchester M5 4WT, UK
m.z.babai@salford.ac.uk, a.syntetos@salford.ac.uk

Abstract: Intermittent demand patterns are common amongst spare parts. Typically, the inventories related to intermittent demand SKUs are managed through periodic stock control solutions, though the specific policy selected for application will depend upon the degree of intermittence (slow/fast intermittent demand) associated with the SKUs under concern. In this research, the performance of some periodic stock control heuristic solutions that are built upon specific demand distributional assumptions is examined in detail. Those heuristics have been shown to perform well for differing spare parts demand categories, and the investigation in question allows insight to be gained into demand classification issues.

Keywords: Spare parts management, intermittent demand, stock control, empirical analysis

Minimising average long-run cost for systems monitored by the np control chart

Shaomin Wu 1, Wenbin Wang 2
1. Sustainable Systems Department, School of Applied Science, Cranfield University, Cranfield, Bedfordshire, MK43 0AL, UK
2. Maxwell 603, Salford Business School, University of Salford, 43 The Crescent, Manchester, M5 4WT, UK
shaomin.wu@cranfield.ac.uk, w.wang@salford.ac.uk

Abstract: This paper formulates the average long-run cost for a system monitored by an np control chart. It is assumed that the system has two types of failure: minor failure and major failure. If a minor failure occurs, the system can still operate, but a signal from the np control chart indicates that the system is out of control. If a major failure occurs, the system cannot operate.
The system with a minor failure can be restored to a better state by a maintenance activity. In this paper, the optimisation of maintenance policies for such a system is presented. Geometric processes are utilised for modelling the lifetimes and repair times, and the average cost per time unit for maintaining the system is obtained. Numerical examples and sensitivity analysis are used to demonstrate the applicability of the methodology derived in the paper.

Keywords: Quality control, reliability, control charts, repairable systems, geometric processes

1. Introduction

This paper aims to optimise the economic cost of maintaining a manufacturing system through the use of statistical process control and the renewal process. A system monitored by an np chart may have three states: the in-control state, the out-of-control state and the failure state. An in-control state indicates that the system functions without any problem; an out-of-control state indicates that the system has been disturbed by the occurrence of events called assignable causes but still functions; a failure state indicates that the system fails to function. If repair is carried out when the system is in either the out-of-control or the failure state, the system can be brought back to the in-control state.

Maintenance policy design for such systems has drawn plenty of attention since Girshick & Rubin (1952). Research in the area has typically assumed both the times until the occurrence of assignable causes and the times between failures to be independent, identically distributed exponential random variables. These assumptions may hold for some manufacturing systems, which can be repaired as good as new. In some scenarios, however, the assumption may be violated because the system deteriorates with time and cannot be repaired as good as new, see Schneeweiss (1996). Models describing such deteriorating systems include non-homogeneous Poisson process (NHPP) models (Coetzee (1997)), Brown-Proschan imperfect repair models (Brown & Proschan (1983)) and generalised renewal process models (Lam (1988a)).

Control charts in statistical process control are normally applied to monitor the system when the assignable causes are not directly observable. Samples from the process output are taken at regular time intervals and the quality characteristic of the sample items is measured. Statistical inferences about the state of the process and the assignable cause in effect, if any, are drawn on the basis of those measurements. In this paper, the optimisation of the maintenance policy for a system monitored with an np chart is studied. Geometric processes are applied to analyse the expected times and costs and the average cost per time unit for maintaining the system. We then investigate the situations in which the geometric process can be applied to model the system failure process. The paper starts with the assumptions in Section 2; Section 3 analyses the average cost per time unit for maintaining the system; Section 4 presents data experiments; and the last section draws concluding remarks.

2. Assumptions

Consider a manufacturing system with outputs characterised by a discrete countable process. In order to detect assignable causes of defective items, an np control chart is used to monitor the system by taking samples of size n at fixed intervals of h time units. The number of defective items among the samples is observed; this provides information revealing possible problems (or assignable causes) with the system.
If the number of defective items is larger than a specific value, one can statistically infer that an assignable cause may exist in the system (i.e. the system is in the out-of-control state); such a signal is called an out-of-control alarm. If the number is smaller than this value, one can infer that the system is in the in-control state. The following assumptions also hold:

• The system can either shift from the in-control state to the out-of-control state and then to the failure state, or shift from the in-control state to the failure state without going through the out-of-control state. Neither the failure state nor the out-of-control state can shift to the in-control state without repair.
• The defective rates of the product are p0 and p1 when the system is in the in-control state and in the out-of-control state respectively, where p1 > p0.
• The chart is used to detect whether the system is in control or out of control; the failure state does not need detecting. If the np chart indicates that the system is in the out-of-control state, this means an assignable cause may exist and an investigation will be carried out. During the investigation of the assignable causes, the system continues running. When the system is confirmed as being in the out-of-control state, a minor repair is conducted, which brings the system back to the in-control state; the system continues operating while it is under this repair. Once the system fails, it stops running and a major repair takes place, which brings the system back to the in-control state. Neither repair type necessarily restores the assignable cause or the system to the condition it was in prior to failure.
• A cycle is defined as the time between two adjacent starts of the system after major repairs. A cycle may include many in-control states and out-of-control states, but only one failure state.

For the mth cycle, the following notation applies:

• X1m and X2m are the times from the beginning of the in-control state to the occurrence of an assignable cause and to the failure, respectively.
• Y1m and Y2m are the times on a minor and a major repair, respectively.
• τm0 and τm1 are the times between the shift of the process to the out-of-control state and the first inspection thereafter, ignoring and considering the possibility of failure, respectively.
• τs is the investigation time for an assignable cause.
• Tm0 is the time from the first start of the system to the end of the last minor repair (see Figure 1). It may include operating time, time spent inspecting samples, time spent investigating causes and time spent on minor repairs.
• Lm01 is the expected time from a start of the system to the end of its first adjacent minor repair in the mth cycle. It may include operating time and time spent inspecting samples, investigating causes and carrying out minor repairs.
• Tm01 is the expected time of the system operating and being repaired from a start of the system to the end of its first adjacent minor repair in the mth cycle. It includes only operating time and the time on a minor repair.
• Tm1 is the time from the start of the in-control state to the failure, with the assignable cause occurring within the out-of-control state, in the mth cycle (see Figure 1).
• Tm2 is the time from the start of the in-control state to the failure, with the failure occurring within the in-control state, in the mth cycle (see Figure 1).
• pio is the probability that the system shifts to the out-of-control state from the in-control state.
• pif is the probability that the system fails within the in-control state.
• cs is the investigation cost for an assignable cause.
• cp is the profit per time unit when the system is in the in-control state.
• co is the profit per time unit when the system is in the out-of-control state.
• cr1 and cr2 are the costs per time unit for minor and major repairs, respectively.
• ci is the inspection cost per item of output.
• M is the number of past cycles.
• X1m, X2m, Y1m and Y2m are exponentially distributed with means 1/λ1m, 1/λ2m, 1/µ1m and 1/µ2m, respectively.
• Figure 1 shows an example of an np control chart, where the in-control zone is Z0 = (LCL, UCL) and the out-of-control zone is Z1 = (0, LCL) ∪ (UCL, n), with UCL = np + δ√(np(1 − p)) and LCL = max(0, np − δ√(np(1 − p))).

3. Expected cost and reliability indices

Denoting by γj the probability that the number of defective items found in the sample falls in Z0 when the process is in state j, then for j = 0, 1 we have

γj = Σ_{k∈Z0} C(n,k)·pj^k·(1 − pj)^(n−k)

According to the above assumptions, in each cycle one of the following two scenarios occurs (see Figure 2):

Scenario 1: having passed some in-control states and out-of-control states, the system fails within the out-of-control state; and
Scenario 2: having passed some in-control states and out-of-control states, the system fails within the in-control state.

3.1 Expected time in the mth cycle

Denote by λ1m, λ2m, µ1m and µ2m the corresponding failure and repair rates in the mth cycle, and let λm = λ1m + λ2m. From Figure 1, in the mth cycle the expected time is

E(Tm) = E(Tm0) + pio·E(Tm1) + pif·E(Tm2) + E(Tm3)    (1)

where pio = λ1m/λm, pif = λ2m/λm and the other terms are derived below.

3.1.1 Expected time E(Tm0)

Within the time interval Lm01, the expected operating time of the system within the in-control state is 1/λ1m. The expected number of samples taken within the in-control state is exp(−λ1m·h)/(1 − exp(−λ1m·h)), see Lam (1988b). The probability of the appearance of a false alarm, which triggers unnecessary investigation effort, is (1 − γ0); thus, for every sampling investigation the process remains idle for an average of ((1 − γ0)τs + nt) time units. According to Duncan (1956), the density function of τm0 in the mth cycle is

f(τm0) = λ1m·exp(−λ1m(h − τm0)) / (1 − exp(−λ1m·h))    (2)

Therefore, the expected time between the shift of the system to the out-of-control state and the first inspection thereafter in the mth cycle is

E(τm0) = (λ1m·h − 1 + exp(−λ1m·h)) / (λ1m(1 − exp(−λ1m·h)))    (3)

Suppose the process is in the out-of-control state before the kth inspection and an out-of-control alarm then appears, resulting in an investigation of the alarm; the probability of this event is (1 − γ1)γ1^(k−1), and the time spent on inspections is (k − 1)h + knt. As an investigation will uncover whether the out-of-control alarm is true, there is only one minor repair, and the time on a minor repair is 1/µ1m. Hence, the whole expected time on this event is

Σ_{k=1}^∞ [(k − 1)h + knt + τs](1 − γ1)γ1^(k−1) + 1/µ1m = (nt + hγ1)/(1 − γ1) + τs + 1/µ1m

Therefore,

Lm01 = 1/λ1m + ((1 − γ0)τs + nt)·exp(−λ1m·h)/(1 − exp(−λ1m·h)) + E(τm0) + (nt + hγ1)/(1 − γ1) + τs + 1/µ1m    (4)

In Figure 1, Tm01 consists of the following time intervals.
The expected time in the in-control states is 1/λ1m; the expected time in the out-of-control states before the first inspection is carried out is E(τm0); the expected time on the inspection and the investigation of the first assignable alarm after the system shifts to the out-of-control state is nt + τs; and the expected time on a minor repair is 1/µ1m. Therefore, the expected time is

Tm01 = 1/λ1m + E(τm0) + nt + τs + 1/µ1m    (5)

However, all of the above events occur only if no failure occurs within the time interval Tm01; the probability of this event is exp(−λ2m·Tm01). Therefore, the expected time is

E(Tm0) = Σ_{k=0}^∞ k·Lm01·e^(−kλ2m·Tm01) = Lm01·e^(−λ2m·Tm01) / (1 − e^(−λ2m·Tm01))²    (6)

3.1.2 Expected time E(Tm1)

Within the time interval Tm1, the expected time in the in-control state is 1/λm. The expected number of samples taken during the in-control state is exp(−λm·h)/(1 − exp(−λm·h)). The probability of the appearance of a false alarm, which triggers unnecessary investigation effort, is (1 − γ0); thus, for every sampling inspection the process remains idle for an average of ((1 − γ0)τs + nt) time units. The expected time between the shift of the system to the out-of-control state and the first inspection thereafter in the mth cycle is

E(τm1) = (λm·h − 1 + exp(−λm·h)) / (λm(1 − exp(−λm·h)))    (7)

The probability of failure within ((k − 1)h, kh) is exp(−(k − 1)hλ2m) − exp(−khλ2m). The expected time between the first inspection after the shift of the process mean to the out-of-control state and the end of Tm1 is

Σ_{k=1}^∞ [(k − 1)h + knt]·γ1^(k−1)·(e^(−(k−1)hλ2m) − e^(−khλ2m)) = (hγ1·e^(−hλ2m) + nt)(1 − e^(−hλ2m)) / (1 − γ1·e^(−hλ2m))²

Hence, the total expected time in Tm1 is

E(Tm1) = 1/λm + ((1 − γ0)τs + nt)·e^(−λm·h)/(1 − e^(−λm·h)) + E(τm1) + (hγ1·e^(−hλ2m) + nt)(1 − e^(−hλ2m))/(1 − γ1·e^(−hλ2m))²    (8)

3.1.3 Expected time E(Tm2)

Within the time interval (0, t), the probability that the system fails without any shift from the in-control state to the out-of-control state is exp(−λ1m·t)(1 − exp(−λ2m·t)). Therefore, the expected operating time from the first start of the system to failure without any shift to the out-of-control state in the mth cycle is

−∫₀^∞ t d[e^(−λ1m·t)(1 − e^(−λ2m·t))] = 1/λ1m − 1/λm

The probability of the appearance of a false alarm, which triggers unnecessary investigation effort, is (1 − γ0); thus, for every sampling inspection the process remains idle for an average of ((1 − γ0)τs + nt) time units. The expected number of samples taken during the in-control state is e^(−λ1m·h)/(1 − e^(−λ1m·h)) − e^(−λm·h)/(1 − e^(−λm·h)), giving

E(Tm2) = 1/λ1m − 1/λm + ((1 − γ0)τs + nt)(e^(−λ1m·h)/(1 − e^(−λ1m·h)) − e^(−λm·h)/(1 − e^(−λm·h)))    (9)

3.1.4 Expected time E(Tm3)

Tm3 is the time required to repair the failure in the mth cycle. The expected time is

E(Tm3) = 1/µ2m    (10)

3.2 Expected cost in the mth cycle

The expected cost during the mth cycle is

E(Cm) = E(Cm0) + pio·E(Cm1) + pif·E(Cm2) + E(Cm3)    (11)

where E(Cmj) is the expected cost incurred during Tmj, j = 0, 1, 2, 3. Several components of the cost may be considered: the profit of the system, the investigation cost for out-of-control alarms, the cost of repair and the cost of inspecting samples.

3.2.1 Expected cost E(Cm0)

The expected profit from operating in the in-control state is cp/λ1m.
The expected profit from operating within the out-of-control state is E(τm0)(cp − co) + (nt + hγ1)(cp − co)/(1 − γ1). The investigation cost for false out-of-control alarms within the in-control state is ((1 − γ0)cs + ntci)·exp(−λ1m·h)/(1 − exp(−λ1m·h)), and the investigation cost for out-of-control alarms within the out-of-control state is ntci/(1 − γ1) + cs + cr1/µ1m. Therefore, the expected cost within Tm01 is

Cm01 = −cp/λ1m − E(τm0)(cp − co) − (nt + hγ1)(cp − co)/(1 − γ1) + ((1 − γ0)cs + ntci)·e^(−λ1m·h)/(1 − e^(−λ1m·h)) + ntci/(1 − γ1) + cs + cr1/µ1m    (12)

and the expected cost within Tm0 is

E(Cm0) = Σ_{k=0}^∞ k·Cm01·e^(−kλ2m·Tm01) = Cm01·e^(−λ2m·Tm01)/(1 − e^(−λ2m·Tm01))²    (13)

3.2.2 Expected cost E(Cm1)

The expected profit in the in-control state is cp/λm, and the expected profit when operating within the out-of-control state is E(τm1)(cp − co) + (1 − e^(−hλ2m))(nt + hγ1·e^(−hλ2m))(cp − co)/(1 − γ1·e^(−hλ2m))². The investigation cost for false alarms within the in-control state is ((1 − γ0)cs + ntci)·e^(−λm·h)/(1 − e^(−λm·h)), and that for alarms within the out-of-control state is (1 − e^(−hλ2m))ntci/(1 − γ1·e^(−hλ2m))². Therefore, the expected cost within Tm1 is

E(Cm1) = −cp/λm − E(τm1)(cp − co) − (1 − e^(−hλ2m))(nt + hγ1·e^(−hλ2m))(cp − co)/(1 − γ1·e^(−hλ2m))² + ((1 − γ0)cs + ntci)·e^(−λm·h)/(1 − e^(−λm·h)) + (1 − e^(−hλ2m))ntci/(1 − γ1·e^(−hλ2m))²    (14)

3.2.3 Expected cost E(Cm2)

The expected profit in the in-control state is cp/λ1m − cp/λm. The investigation cost for false alarms within the in-control state is ((1 − γ0)cs + ntci)(e^(−λ1m·h)/(1 − e^(−λ1m·h)) − e^(−λm·h)/(1 − e^(−λm·h))). Hence, the expected cost within the time interval Tm2 is

E(Cm2) = ((1 − γ0)cs + ntci)(e^(−λ1m·h)/(1 − e^(−λ1m·h)) − e^(−λm·h)/(1 − e^(−λm·h))) − (cp/λ1m − cp/λm)    (15)

3.2.4 Expected cost E(Cm3)

The expected time within Tm3 is given in Equation (10); therefore, the cost incurred within Tm3 is

E(Cm3) = cr2/µ2m    (16)

Proposition 1. The average cost per time unit for the operation of the system up to cycle M is

E(M) = Σ_{m=1}^M E(Cm) / Σ_{m=1}^M E(Tm)    (17)

To optimise the parameter settings of an np control chart, E(M) should be minimised for M → ∞. Unfortunately, as Equation (17) is complex, an explicit solution is hard to obtain; Sections 4 and 5 present data experiments for further discussion of the optimisation.

4. Numerical experiments

4.1 A geometric process case

We take the geometric process as an example. The definition of the geometric process is as follows.

Definition 1. Given random variables ξ and ζ, ξ is stochastically greater than ζ, written ξ ≥st ζ, if Pr{ξ > v} ≥ Pr{ζ > v} for all real v (stochastically less than, ξ ≤st ζ, is defined analogously). According to Ross (1996), a stochastic process {ξi, i = 1, 2, …} is stochastically increasing (decreasing) if ξi ≤st (≥st) ξi+1 for all i = 1, 2, ….

Definition 2 (Lam (1992)). A sequence of non-negative independent random variables {ξi, i = 1, 2, …} is called a geometric process (GP) if, for some τ > 0, the distribution function of ξi is F(τ^(i−1)x) for i = 1, 2, ….

From Definition 2 we can obtain the following:
• If τ > 1, then {ξi, i = 1, 2, …} is stochastically decreasing: ξi >st ξi+1, i = 1, 2, ….
• If 0 < τ < 1, then {ξi, i = 1, 2, …} is stochastically increasing: ξi <st ξi+1, i = 1, 2, ….
• If τ = 1, then {ξi, i = 1, 2, …} is a renewal process.
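To make the link between Definition 2 and Section 3 concrete, the following sketch builds per-cycle GP rates (with an exponential baseline, the mth rate is the base rate times ratio^(m−1)), computes γ0 and γ1 for the np chart, and evaluates the closed forms of Eqs. (3)-(6). All numeric values, including the GP ratios, are illustrative assumptions rather than the paper's settings.

```python
import math

# Illustrative values only (assumptions, not necessarily the paper's Table 1).
n, p0, p1, delta = 50, 0.03, 0.08, 1.0
lam1, lam2, mu1 = 0.01, 0.004, 0.04        # base failure/repair rates
h, t, tau_s = 5.0, 0.07, 1.0               # sampling interval, time per item, investigation time
tau1, tau2, tau3 = 1.0, 1.1, 1.0           # assumed GP ratios for X1m, X2m, Y1m

def cycle_rates(m):
    # Definition 2 with exponential baselines: the mth rate is the base
    # rate multiplied by ratio**(m-1).
    return lam1 * tau1**(m - 1), lam2 * tau2**(m - 1), mu1 * tau3**(m - 1)

# np-chart limits centred at n*p0, and gamma_j = P(count falls in Z0 | p_j).
w = delta * math.sqrt(n * p0 * (1 - p0))
ucl, lcl = n * p0 + w, max(0.0, n * p0 - w)
gamma = lambda p: sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                      for k in range(n + 1) if lcl < k < ucl)
g0, g1 = gamma(p0), gamma(p1)

def expected_Tm0(m):
    # Eqs. (3)-(6) for the mth cycle.
    l1, l2, m1 = cycle_rates(m)
    E_tau = (l1*h - 1 + math.exp(-l1*h)) / (l1 * (1 - math.exp(-l1*h)))
    L01 = (1/l1 + ((1 - g0)*tau_s + n*t) * math.exp(-l1*h) / (1 - math.exp(-l1*h))
           + E_tau + (n*t + h*g1) / (1 - g1) + tau_s + 1/m1)
    T01 = 1/l1 + E_tau + n*t + tau_s + 1/m1
    x = math.exp(-l2 * T01)                 # survival over one T01 block
    return L01 * x / (1 - x)**2             # closed form of Eq. (6)

print([round(expected_Tm0(m), 1) for m in (1, 5, 10)])
```

With tau2 > 1 the time-to-failure rate grows cycle by cycle, so the expected in-control sojourn E(Tm0) shrinks as m increases, which is the deterioration behaviour the GP is meant to capture.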
A GP benefits from its simplicity in describing times between failures and times between repairs. Studies involving repair models, maintenance policies and reliability indices include Lam (1992), Lam & Zhang (2003), Zhang (2002) and Wu & Clements-Croome (2006). However, the GP suffers from an important limitation: it can only model systems with a monotonically increasing, decreasing or constant failure intensity, whereas real intensities are usually more complicated (e.g. bathtub curves). In such cases, a single GP cannot model the whole life cycle of the system. A bathtub curve can be viewed as comprising three distinct periods: a burn-in failure period with decreasing intensity, an intrinsic failure period with constant intensity and a wear-out failure period with increasing intensity. A GP can only be applied to model the system within one of the three periods. Below, we define an extended GP for systems whose failure intensity exhibits a bathtub curve.

Definition 3 (Wu (1994)). A sequence of non-negative independent random variables {Xn, n = 1, 2, …} is called an extended Poisson process (EPP) if, for some αβ ≠ 0, α, β ≥ 0, a ≥ 1 and 0 < b ≤ 1, the cdf of Xn is G((α·a^(n−1) + β·b^(n−1))x), where G(x) is an exponential cdf. α, β, a and b are parameters of the process.

1) If a = b = 1, then the EPP is an HPP.
2) If α·a^(n−1) ≠ 0 and β·b^(n−1) = 0 (or α·a^(n−1) = 0 and β·b^(n−1) ≠ 0) for n = 1, 2, …, then {Xn, n = 1, 2, …} is a GP.
3) If α·a^(n−1) ≠ 0, a > 1 and b = 1, then {Xn, n = 1, 2, …} can describe the periods from the intrinsic failure period to the wear-out period of a bathtub curve.
4) If a = 1, b < 1 and β·b^(n−1) ≠ 0, then {Xn, n = 1, 2, …} can describe the periods from the burn-in period to the end of the intrinsic failure period of a bathtub curve.
5) If α·a^(n−1) ≠ 0, a > 1, 0 < b < 1 and β·b^(n−1) ≠ 0, then {Xn, n = 1, 2, …} can describe more complicated failure intensity curves.

Assume that X1m, X2m, Y1m and Y2m are exponentially distributed with means 1/((α1·a1^(m−1) + β1·b1^(m−1))λ1), 1/((α2·a2^(m−1) + β2·b2^(m−1))λ2), 1/((α3·a3^(m−1) + β3·b3^(m−1))µ1) and 1/((α4·a4^(m−1) + β4·b4^(m−1))µ2) respectively, where aj ≥ 1, 0 < bj ≤ 1 and j = 1, 2, 3, 4. Denote λ′1m = (α1·a1^(m−1) + β1·b1^(m−1))λ1, λ′2m = (α2·a2^(m−1) + β2·b2^(m−1))λ2, λ′m = λ′1m + λ′2m, µ′1m = (α3·a3^(m−1) + β3·b3^(m−1))µ1 and µ′2m = (α4·a4^(m−1) + β4·b4^(m−1))µ2. By replacing λ1m, λ2m, λm, µ1m and µ2m with λ′1m, λ′2m, λ′m, µ′1m and µ′2m in the previous equations, all of the aforementioned results can be extended. The following parameters (Table 1) are used in this section.

cs 0.9, co 0.07, cp 100, cr1 2, cr2 10, ci 0.1, λ1 0.01, λ2 0.004, µ1 0.04, µ2 0.008, h 5, t 0, p0 0.03, p1 0.08, τs 1
Table 1. Parameter settings

4.2 A comparison between the GP and the EPP

When GPs are used, we assume the parameter values shown in Table 2, and when extended GPs are used, we assume the parameter values shown in Table 3.

a1 1, b1 1.1, a2 1, b2 0.9, p1 0.03, p2 0.08, δ 1, n 50
Table 2. GP parameters

α1 α2 α3 α4 δ: 1, 0.01, 1, 0, 1
β1 β2 β3 β4 n: 0, 1, 0, 1, 50
a1 a2 a3 a4 p1: 1, 1.1, 1, 0, 0.03
b1 b2 b3 b4 p2: 0, 0.9, 0, 0.9, 0.08
Table 3. Extended GP parameters

Figure 3. Average cost E(M) under the GP
Figure 4. E(M) under the extended GP

Figure 3 shows that the cost per time unit changes monotonically when GPs are used to model the system. That may be oversimplified, because the cost per time unit may be lower while the system stays within the intrinsic failure period of the bathtub curve.
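To see why the extended process can repair this oversimplification, a two-line sketch of the EPP rate of Definition 3 follows; the values α = 0.01, a = 1.1, β = 1, b = 0.9 and λ = 0.004 follow our reading of Table 3 for the failure process and should be treated as illustrative assumptions.

```python
import numpy as np

# EPP rate of Definition 3: (alpha * a**(m-1) + beta * b**(m-1)) * lam.
# With a > 1 and b < 1 the beta-term decays (burn-in) while the alpha-term
# grows (wear-out), so the intensity traces a bathtub shape over the cycles.
alpha, a, beta, b, lam = 0.01, 1.1, 1.0, 0.9, 0.004
m = np.arange(1, 101)
rate = (alpha * a**(m - 1) + beta * b**(m - 1)) * lam
print(rate[:5].round(6), "minimum at cycle", rate.argmin() + 1)
```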
When extended GPs are used, the average cost per time unit (see Figure 4) shows a different shape from that in Figure 3: the change in the average cost per unit time in Figure 4 exhibits a bathtub curve shape. This result may be more realistic than that based on GPs, as the average cost per time unit should be lower when the system is within the intrinsic failure period.

5. Parameter sensitivity analysis

5.1 Comparing different values of δ

δ is a parameter in the upper (lower) control limit of an np chart. We use the parameter values shown in Tables 1 and 3 (except δ) and adjust the value of δ. Letting δ = 1, 2 and 3, the average cost per time unit E(M) is shown in Figure 5 for M ≤ 50 and in Figure 6 for 70 ≤ M ≤ 100; the interval 50 < M < 70 is omitted, as the average cost differences are too small for illustration within that interval. We can see from Figure 5 that when δ = 1, E(M) is larger than that observed when δ = 3, whereas the opposite is observed (in Figure 6) in the range 70 ≤ M ≤ 100. Therefore, we draw the conclusion that δ can be set to 3 when M ≤ 50 and set to 1 when 70 ≤ M ≤ 100.

Figure 5. Relationship between E(M) and δ when M ≤ 50
Figure 6. Relationship between E(M) and δ when 70 ≤ M ≤ 100

5.2 Comparing different values of n

Another parameter in the upper (lower) control limit of an np chart is the sample size n. The parameter values given in Tables 1 and 3 are again used here (except n), and the value of n is adjusted in increments of 5 in the range 40-150. The effect on E(M) of the differing values of n is illustrated in Figure 7. From the figure, it is clear that the cost is smallest when n = 55 and largest when n = 40; therefore, n = 55 is a better choice for the np control chart.

Figure 7. Cost changes under different values of n

6. Concluding remarks

Optimising the long-run average cost of monitoring manufacturing systems is an interesting research topic. A typical assumption is that the system can be repaired to an 'as good as new' state; in reality, this assumption may be violated when the system deteriorates over time. Research on the economic design of control charts for deteriorating systems is therefore more useful. This paper discusses the economic design of np control charts for systems that are not repaired 'as good as new'. The main contributions of the paper are: (1) GPs are applied to the economic design of control charts for the first time; (2) the average cost per unit time up to a given cycle is obtained. We obtained the average cost per unit time for np control charts under the assumptions that the times to the adjacent occurrences of assignable causes, the times to failure and the times to repair in each cycle form GPs, and we found that the average costs per unit time increase monotonically. We then borrowed the extended GP, which can describe the entire life cycle of a system, and applied it to optimise the average cost per unit time; the average cost per unit time then shows a bathtub curve. The experimental results show that the average cost per unit time of a given manufacturing system can vary considerably with changes in the parameter values.

References

Girshick, M. A. and Rubin, H. A. (1952) Bayes' approach to a quality control model, Annals of Mathematical Statistics, 23, 114-125
Schneeweiss, W. G. (1996) Mission success with components not as good as new, Reliability Engineering & System Safety, 52, 45-53
Coetzee, J. L.
(1997) The role of NHPP models in the practical analysis of maintenance failure data, Reliability Engineering & System Safety, 56, 161-168
Brown, M. and Proschan, F. (1983) Imperfect repair, J. of Applied Probability, 20, 851-859
Lam, Y. (1988a) Geometric processes and replacement problem, Acta Mathematicae Applicatae Sinica, 4, 366-377
Lam, Y. (1988b) A note on the optimal replacement problem, Advances in Applied Probability, 20, 479-482
Duncan, A. J. (1956) The economic design of X-charts used to maintain current control of a process, J. of the American Statistical Association, 51, 228-242
Ross, S. M. (1996) Stochastic Processes, Wiley
Lam, Y. (1992) Optimal geometric process replacement model, Acta Mathematicae Applicatae Sinica, 8, 73-81
Lam, Y. and Zhang, Y. L. (2003) A geometric-process maintenance model for a deteriorating system under a random environment, IEEE Transactions on Reliability, 52, 83-89
Zhang, Y. L. (2002) A geometric-process repair-model with good-as-new preventive repair, IEEE Transactions on Reliability, 51, 223-228
Wu, S. (1994) Reliability analysis of a repairable system without being repaired "as good as" new, Microelectronics & Reliability, 34, 357-360
Wu, S. and Clements-Croome, D. (2006) A novel repair model for imperfect maintenance, IMA J. of Management Mathematics, 17, 235-243
Wu, S. and Clements-Croome, D. (2005) Optimal maintenance policies under different operational schedules, IEEE Transactions on Reliability, 54 (2), 338-346

Condition evaluation of equipment in power plant based on grey theory

Jian-lan Li*, Shu-hong Huang
Huazhong University of Science & Technology, Wuhan, 430074, China
hust_ljl@hust.edu.cn

Abstract: Condition evaluation of equipment is important for reliability centered maintenance (RCM) in power plants. Based on the basic theory of grey systems, a new concept, the grey space relationship, is advanced and defined in this paper. The grey space relationship reflects the degree of approximation of two sequences in both distance and shape, and is a quantitative index of the relationship between sequences. A grey model for the condition evaluation of equipment in power plants, based on the grey space relationship, is constructed in this paper. By calculating the grey space relationship between the condition parameter sequence and the rating condition parameter sequence of the evaluated equipment, the degree of approximation of the equipment's condition to the rating condition is obtained; thus, a quantitative evaluation of the equipment is realised in this model. Finally, a condition evaluation of a feed water pump is carried out and good identification results are acquired; the evaluation results can be used to support scientific maintenance decisions.

Keywords: Equipment in power plant, grey model, condition evaluation, grey space relation

1. Introduction

In the process of reliability centered maintenance (RCM) for equipment in power plants, a core step is evaluating the equipment's current condition and its operational risk, which provides evidence for maintenance decision-making. Recently, some scholars have attempted to construct models of equipment condition evaluation using various mathematical methods. Li et al. (2002) evaluated a steam turbine unit by simple weighting, and Gu et al. (2004) judged equipment condition by calculating the degree of membership of the equipment's impairment grade.
Both models achieve some success in evaluation, but the results are not good when some condition character parameters deviate severely from their rating values. Grey theory is a relatively new subject founded by Professor J. L. Deng in 1982, which achieves a correct description and effective control of a system's behaviour by extracting valuable information from an uncertain system with small samples and poor information. Grey theory has now been widely applied to the analysis, modelling, forecasting, decision-making and programming of social, economic, weather, ecological, water conservancy, industrial, agricultural and medical systems, among others, and good results have been achieved; see Wang et al. (2005), Zhang & Liu (2006), Xu et al. (2006) and Chen & Li (2005) for details.

The condition of equipment in a power plant is commonly described by a number of condition parameters and has typical uncertainty characteristics. The essence of equipment condition evaluation is to compare the condition parameters with the rating parameters; thus, a grey relationship can be used to achieve this aim. In this paper, a grey space relationship, which reflects the degree of approximation of sequences in both relative distance and geometric shape, is advanced and defined using grey theory, and a condition evaluation model for equipment in power plants is constructed. The model accounts for the effect of changes in individual condition parameters on the equipment's condition by calculating the grey relationship between the condition parameters and the rating parameters of the evaluated equipment; thus, a quantitative evaluation of equipment in power plants is achieved. Finally, we show that the model can realise the condition evaluation of equipment in a power plant and provide support for scientific maintenance decisions.

The paper is organised as follows: in Section 2, three grey relationships (the grey distance relationship, the grey shape relationship and the grey space relationship) are defined, and a model for the condition evaluation of equipment in power plants based on grey theory is constructed. In Section 3, the condition evaluation of a feed water pump based on the grey model is carried out. Section 4 concludes the paper.

2. Condition evaluation of equipment in power plant based on grey theory

2.1 Dimensionless condition character parameters of equipment

As the units of the parameters in the condition character parameter sequence differ, a dimensionless transformation is needed before calculating the grey space relationship, in order to evaluate the effect of each parameter consistently. According to plant information and expert experience, equipment condition deteriorates rapidly when the condition character parameters deviate from their rating values, especially for some key parameters. Thus, the issue of how to express this relationship between a parameter and the condition exactly is very important in the dimensionless transformation. Li et al. (2002) and Gu et al. (2004) evaluate the equipment's condition by calculating the impairment grade using a simple linear model, which obviously cannot reflect the exact relationship between the parameter and the condition. A new dimensionless model of the condition character parameters is constructed in this paper:

yi = e^(k(xi − x0)/(xT − xi))    (1)
In Eq. (1), yi is the dimensionless evaluation index of the condition character parameter, xi is the practical measured value of the condition character parameter, x0 is the rating value of the condition character parameter, xT is the threshold value of the condition character parameter and k is an increasing factor. The larger k is, the more significant the effect of the parameter's change on the equipment's condition; k = 2 in this paper. It can be seen from Eq. (1) that the relationship between the equipment's condition and the deviation of the condition parameter is an exponential curve, which means that the equipment's condition deteriorates rapidly as the deviation of the practical value from the rating value increases.

2.2 Index of the equipment's condition evaluation

A sequence X of n dimensions is constructed from condition character parameters x(1), x(2), …, x(n) that represent the equipment's characteristic function. Let the condition character parameters in the rating state constitute a standard sequence X0 = {x0(1), x0(2), …, x0(n)}, and let the evaluated condition character parameters constitute an evaluated sequence Xi = {xi(1), xi(2), …, xi(n)}. The essence of equipment condition evaluation is to analyse the degree of approximation between the operational values and the rating values of the equipment's character parameters: the closer the two are, the better the equipment's condition. According to the basic theory of grey systems, the condition evaluation of equipment can thus be translated into calculating the grey relationship between the two sequences Xi and X0; a larger relationship indicates a better equipment condition.

For an evaluated sequence Xi and a standard sequence X0 of equipment in a power plant, the relationship between these two sequences depends not only on the sequences' approximation in geometric shape, but also on the distance between the sequences and on the effect (that is, the weight) of each parameter in the sequence. As such, all of these factors will be considered in calculating the grey relationship of the sequences. Firstly, a real number γ(X0, Xi) is presented and defined in this paper as

γ(X0, Xi) = 1 / (1 + Σ_{k=1}^n α(k)|xi(k) − x0(k)|)    (2)

where α(k) is a weighting factor of the condition character parameter x(k). γ(X0, Xi) satisfies the four axioms of grey relationships, namely normalisation, entirety, even symmetry and approximation. As such, γ(X0, Xi) is a grey relationship of Xi to X0, which is called the grey distance relationship and abridged as γ0i. The justification is as follows:

1) Normalisation: since 0 ≤ α(k)|xi(k) − x0(k)|, we have 0 < γ(X0, Xi) ≤ 1, so normalisation is satisfied.
2) Entirety: if X = {Xs | s = 0, 1, 2, …, n}, then for any Xs1, Xs2 ∈ X, α(k)|xi(k) − xs1(k)| ≠ α(k)|xi(k) − xs2(k)| in general, so entirety is satisfied.
3) Even symmetry: if X = {X0, X1}, then α(k)|xi(k) − x0(k)| = α(k)|xi(k) − x1(k)|, where i = 1 on the left-hand side and i = 0 on the right, so γ(X0, X1) = γ(X1, X0) and even symmetry is satisfied.
4) Approximation: the smaller |xi(k) − x0(k)| is, the greater γ(X0, Xi) is, so approximation is satisfied.

This concludes the justification. γ0i reflects the degree of approximation of the sequence Xi to X0 in distance, by accumulating the weighted absolute distances of the parameters in the two sequences; at the same time, the effect of each parameter is embodied in the weighting factors. The larger γ0i is, the closer Xi approaches X0 in distance. The value range of γ0i is (0, 1).
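A minimal sketch of Eqs. (1) and (2) follows; the parameter values, thresholds and weights are assumptions chosen for illustration only.

```python
import numpy as np

def dimensionless(x, x0, xT, k=2.0):
    """Dimensionless condition index of Eq. (1): grows exponentially as the
    measured value x moves from the rating value x0 towards the threshold xT
    (k = 2 as in the paper)."""
    return np.exp(k * (x - x0) / (xT - x))

def grey_distance(X0, Xi, alpha):
    """Grey distance relationship gamma of Eq. (2) with weighting factors alpha."""
    return 1.0 / (1.0 + np.sum(alpha * np.abs(np.asarray(Xi) - np.asarray(X0))))

# Three vibration-like parameters with rating value 20 and threshold 50
# (all values assumed). The rating sequence maps to all ones under Eq. (1).
x0, xT = np.full(3, 20.0), np.full(3, 50.0)
X0 = dimensionless(x0, x0, xT)                      # -> array of ones
Xi = dimensionless(np.array([22.0, 35.0, 25.0]), x0, xT)
alpha = np.array([0.4, 0.3, 0.3])
print(grey_distance(X0, Xi, alpha))
```

Because Eq. (1) blows up as x approaches the threshold, a single badly deviating key parameter drags γ0i down sharply, which is exactly the sensitivity the authors want.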
In order to reflect the geometric approximation of the sequences Xi and X0, a grey shape relationship ε(X0, Xi) is defined (Liu et al. (1999)), abridged as ε0i. We have

ε(X0, Xi) = (1 + |s0| + |si|) / (1 + |s0| + |si| + |si − s0|)    (3)

where X0⁰ = (x0⁰(1), …, x0⁰(n)) and Xi⁰ = (xi⁰(1), …, xi⁰(n)) are the zero images of the initial points of X0 and Xi respectively, and

s0 = Σ_{k=2}^{n−1} x0⁰(k) + ½x0⁰(n),  si = Σ_{k=2}^{n−1} xi⁰(k) + ½xi⁰(n),  si − s0 = Σ_{k=2}^{n−1} (xi⁰(k) − x0⁰(k)) + ½(xi⁰(n) − x0⁰(n))

ε0i reflects the degree of approximation of the sequences Xi and X0 in geometric shape; it relates to shape only and is independent of distance, that is to say, translation has no effect on ε0i. The larger ε0i is, the closer Xi approximates X0 in geometric shape. The value range of ε0i is (0, 1).

In the evaluation of equipment in a power plant, the grey distance relationship reflects the degree of approximation of the equipment's evaluated condition to the standard condition on the whole, but it cannot exactly reflect the effect on the evaluation when some parameters, especially key parameters, deviate substantially from the rating values. For example, consider two cases of a feed water pump: in one, the vibration exceeds a dangerous value whilst the other parameters remain normal; in the other, no parameter exceeds a dangerous value but all exceed the rating values. If only distance is considered, the grey distance relationships of the two cases may be the same; however, the feed water pump's condition in the former case is obviously worse than in the latter. As such, only when both the distance and the shape of the sequences are considered can the degree of approximation of the evaluated parameters to the rating parameters be truly reflected. Combining Eqs. (2) and (3), a grey space relationship ρ(X0, Xi), abridged as ρ0i, is advanced and defined in this paper (the justification for ρ(X0, Xi) is similar to that for γ0i):

ρ(X0, Xi) = θ·γ0i + (1 − θ)·ε0i    (4)

In Eq. (4), θ is the weighting factor of the grey distance relationship in the grey space relationship. As the equipment's condition is mainly deduced from the relative distance between the sequences, the value of θ is commonly in the range [0.5, 1]; θ = 0.7 in this paper. It follows from Eq. (4) that ρ0i is a weighted composite of the distance and shape relationships: ρ0i is a quantitative index of the relationship between the sequences, and the larger ρ0i is, the closer Xi is to X0 and the better the equipment's condition.

2.3 Evaluation of equipment condition

Equipment condition can be divided into four grades (good, common, bad and faulty), as shown in Table 1.

Good: the equipment's condition is good and it can continue operating.
Common: parameters deviate from the rating values; more inspections should be undertaken.
Bad: the equipment's condition has deteriorated obviously; faults should be located immediately.
Faulty: the equipment is in a faulty condition and should be stopped for examination and repair immediately.
Table 1. Condition grades of power generation equipment

According to power plant rules, the character parameters of equipment in power plants have certain permissible operational ranges and rating operational values. As such, the grades of the equipment's condition can be assigned using information from the character parameters.
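The full pipeline is sketched below under the same illustrative assumptions; the grading thresholds in grade() are the boundary values ρ1 = 0.684, ρ2 = 0.478 and ρ3 = 0.395 computed for the feed water pump in Section 3.

```python
import numpy as np

def grey_distance(X0, Xi, alpha):
    # Grey distance relationship gamma, Eq. (2).
    return 1.0 / (1.0 + np.sum(alpha * np.abs(np.asarray(Xi) - np.asarray(X0))))

def grey_shape(X0, Xi):
    # Grey shape relationship epsilon, Eq. (3), via the zero images of the
    # initial points of the two sequences.
    z0 = np.asarray(X0, float) - X0[0]
    zi = np.asarray(Xi, float) - Xi[0]
    s = lambda z: z[1:-1].sum() + 0.5 * z[-1]
    s0, si = s(z0), s(zi)
    return (1 + abs(s0) + abs(si)) / (1 + abs(s0) + abs(si) + abs(si - s0))

def grey_space(X0, Xi, alpha, theta=0.7):
    # Grey space relationship rho, Eq. (4); theta = 0.7 as in the paper.
    return theta * grey_distance(X0, Xi, alpha) + (1 - theta) * grey_shape(X0, Xi)

def grade(rho, rho1=0.684, rho2=0.478, rho3=0.395):
    # Condition grade from the boundary space relationships, Eq. (6).
    return ("good" if rho > rho1 else "common" if rho > rho2
            else "bad" if rho > rho3 else "faulty")

# Illustrative dimensionless sequences and weights (values assumed).
alpha = np.array([0.4, 0.3, 0.3])
X0, Xi = [1.0, 1.0, 1.0], [1.2, 1.1, 1.3]
rho = grey_space(X0, Xi, alpha)
print(round(rho, 3), grade(rho))
```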
Three character parameter sequences X₁, X₂ and X₃ are defined from the rating operating values and the permissible operating ranges:

    X₁: {x0i + 0.3(xTi − x0i)},  X₂: {x0i + 0.7(xTi − x0i)},  X₃: {xTi}    (5)

In Eq. (5), x0i and xTi are, respectively, the rating value and the extreme permissible operating value of the i-th character parameter. Using Eq.'s (2)–(4), three grey space relationships ρ₁, ρ₂ and ρ₃ are calculated between the sequences X₁, X₂ and X₃ and the rating parameter sequence. These are called the boundary space relationships and serve as the thresholds between the four condition grades (good/common, common/bad and bad/faulty):

    good:    ρ₀ᵢ > ρ₁
    common:  ρ₁ ≥ ρ₀ᵢ > ρ₂
    bad:     ρ₂ ≥ ρ₀ᵢ > ρ₃
    faulty:  ρ₀ᵢ ≤ ρ₃    (6)

In Eq. (6), ρ₀ᵢ is the grey space relationship between the sequence of actual operating values and the sequence of rating operating values.

3. Condition evaluation of a feed water pump

A feed water pump is taken as the example for equipment evaluation in this paper. A character parameter sequence for the pump, taken from Gu et al. (2004), comprises the vibration of the pump, the export pressure of the pump, the bearing metal temperature of the pump, the export oil temperature of the oil cooler, the temperature of the airproof cooling water and the lubricating oil pressure of the pump, as shown in Table 2. Sequence Y₀ is the rating sequence, composed of the rating operating values of the pump, while Y₁, Y₂ and Y₃ are three sequences of actual operating values.

According to Eq.'s (1)–(5), the three grey boundary space relationships for the feed water pump are calculated as ρ₁ = 0.684, ρ₂ = 0.478 and ρ₃ = 0.395. Similarly, the grey space relationships of the sequences Y₁, Y₂ and Y₃ to Y₀ are calculated as ρ₀₁ = 0.816, ρ₀₂ = 0.367 and ρ₀₃ = 0.547. Finally, the condition evaluation results obtained via Eq. (6) are shown in Table 3. The diagnostic results of the model agree with those of the experts, which supports the accuracy of the model. The case of Y₂ is particularly notable: most of its condition parameters are good, but the vibration is outside the permissible range, and its grey space relationship is ρ₀₂ = 0.367 < ρ₃, so Y₂'s condition grade is judged to be faulty. This example shows that the equipment's condition can be judged correctly even when some parameters deviate severely from their rating values.

    Parameter                                           Permissible range   Weight   Y0     Y1     Y2     Y3
    Vibration of feed water pump (µm)                   0~50                0.20     20     20     52     35
    Export pressure of feed water pump (MPa)            >16                 0.20     16.7   16.5   16.7   16.7
    Bearing metal temperature of feed water pump (℃)    60~70               0.15     65     65     68     65
    Export oil temperature of oil cooler (℃)            35~46               0.15     40     40     40     42
    Temperature of airproof cooling water (℃)           55~75               0.15     65     65     65     70
    Lubricating oil pressure of feed water pump (MPa)   0.1~0.24            0.15     0.18   0.18   0.22   0.18

    Table 2. Character parameters of the feed water pump

                                   Y1      Y2      Y3
    Grey space relationship        0.816   0.367   0.547
    Evaluation result by model     good    faulty  common
    Evaluation result by expert    good    faulty  common

    Table 3. Condition evaluation results for the feed water pump

4. Conclusion

Based on grey system theory, the concepts of the grey distance relationship and the grey space relationship have been proposed and their mathematical models defined.
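As a final illustration, the sketch below wires the pieces together for Eq.'s (5) and (6) using the Table 2 data. It is a toy reconstruction under stated assumptions: the raw parameter values are used directly rather than the dimensionless indices of Eq. (1) (which is not reproduced in this section), and the extreme permissible values xTi are read off Table 2 in the assumed direction of deterioration (e.g. 50 µm for vibration, 16 MPa for export pressure). The resulting numbers will therefore differ from the ρ values reported above; only the mechanics of the grading scheme are being demonstrated.

    # Table 2 data (direction of deterioration for each xTi is an assumption).
    alpha = np.array([0.2, 0.2, 0.15, 0.15, 0.15, 0.15])    # parameter weights
    y0 = np.array([20.0, 16.7, 65.0, 40.0, 65.0, 0.18])     # rating sequence Y0
    yT = np.array([50.0, 16.0, 70.0, 46.0, 75.0, 0.24])     # extreme permissible values (assumed)

    # Boundary sequences of Eq. (5).
    x1 = y0 + 0.3 * (yT - y0)
    x2 = y0 + 0.7 * (yT - y0)
    x3 = yT

    # Boundary space relationships: thresholds between the four grades.
    rho1, rho2, rho3 = (grey_space_relation(y0, x, alpha) for x in (x1, x2, x3))

    def grade(rho_0i):
        """Assign a condition grade per Eq. (6)."""
        if rho_0i > rho1:
            return "good"
        elif rho_0i > rho2:
            return "common"
        elif rho_0i > rho3:
            return "bad"
        return "faulty"

    # Evaluated sequences Y1, Y2, Y3 from Table 2.
    for name, y in (("Y1", [20, 16.5, 65, 40, 65, 0.18]),
                    ("Y2", [52, 16.7, 68, 40, 65, 0.22]),
                    ("Y3", [35, 16.7, 65, 42, 70, 0.18])):
        rho = grey_space_relation(y0, y, alpha)
        print(name, round(rho, 3), grade(rho))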
The grey distance relationship is an index that reflects the degree to which the sequences Xᵢ and X₀ approximate one another in distance, calculated from the absolute differences of the parameters in the two sequences. The grey space relationship is a comprehensive quantitative index that reflects the approximation of the sequences Xᵢ and X₀ in both distance and shape. A condition evaluation model for power plant equipment has been constructed, with a clear mathematical definition: quantitative evaluation is achieved by calculating the grey space relationship between the sequence of actual parameter values and the sequence of rating parameter values. In particular, the effect on the evaluation when some parameters deviate severely from their rating values is also addressed. Three conditions of a feed water pump were evaluated using the model; the diagnostic results are consistent with those of the experts, which validates the model.

References

Chen, J. H., Sheng, D. R. and Li, W. (2002) A model of multi-objective comprehensive evaluation for power plant, Proceedings of the Chinese Society for Electrical Engineering, 22 (12), 152-155.
Chen, S. W. and Li, Z. G. (2005) Application of grey theory in oil monitoring for diesel engine, Transactions of CSICE, 23 (5), 476-480.
Deng, J. L. (1985) Grey Systems, Beijing: National Defense Industry Press.
Gu, Y. J., Dong, Y. L. and Yang, K. (2004) Synthetic evaluation on conditions of equipment in power plant based on fuzzy judgment and RCM analysis, Proceedings of the Chinese Society for Electrical Engineering, 24 (6), 189-194.
Li, J., Sun, C. X. and Liao, R. J. (2004) Study on analysis method about fault diagnosis of transformer and degree of grey incidence based on fuzzy clustering, Chinese Journal of Scientific Instrument, 25 (5), 587-589.
Li, L. P., Zhang, X. L. and Wang, C. M. (2002) Theoretical and systematic study of the comprehensive evaluation of the operation state of a large-sized steam turbine unit, Journal of Engineering for Thermal Energy & Power, 17 (5), 442-444.
Liu, S. F., Guo, T. B. and Dang, Y. G. (1999) Grey System Theory and Its Application, Beijing: Science Press.
Ren, S., Mu, D. J. and Zhu, L. B. (2006) Model of information security evaluation based on grey analytical hierarchy process, Journal of Computer Application, 26 (9), 2111-2113.
Wang, H. Q., Wang, T. and Gu, Z. H. (2005) Grey prediction model and modification for city electric power demand, Proceedings of the Chinese Society of Universities, 17 (2), 73-75.
Xu, W. G., Tian, L. W. and Zhang, Q. Y. (2006) Study on modification and application of grey relation analysis model in evaluation of atmospheric environmental quality, Environmental Monitoring in China, 22 (3), 63-66.
Zhang, C. H. and Liu, Z. G. (2006) Multivariable grey model and its application to prediction of gas from boreholes, China Safety Science Journal, 16 (6), 50-54.