EG2 Evaluating Software Reliability in port equipments a

Evaluating Software Reliability in port equipments:
a case study
Mathew A G
Ministry of Higher Education, CAS-Sohar
Engineering Department, Mechanical Unit
Sohar, Oman
agmathew123@gmail.com
Rizwan S M
Caledonian College of Engineering
Department of Mathematics & Statistics
Muscat, Oman
Abstract—Critical systems such as spacecraft, aircraft, nuclear
power plants and heavy port equipment, most of which are software
controlled, need a very high level of dependability and reliability in
their operations. Two types of techniques are used in the
design and implementation of dependable software systems: fault
avoidance and fault tolerance techniques. Reliability is a very popular
aspect of software dependability, which relies, in particular, on fault
forecasting and fault removal. Real data from a software-based
system controlling critical port equipment are used in this study.
Any single software fault brings the entire port equipment to a
complete halt. The equipment fails due to either of the two types of
software faults categorized in the data. The reliability modeling
methodology is taken from a well-known model, with emphasis on its
application to software problems in critical port equipment. The
paper outlines a modeling strategy that embeds the fault types seen
in the data. Important reliability metrics, namely mean time to
system failure (MTSF) and steady state availability, are obtained
using semi-Markov processes and regenerative point techniques.
Graphs are plotted to interpret the results.
Keywords—software reliability; software dependability; semi-Markov;
regenerative processes; MTSF
I. INTRODUCTION
Different types of industrial systems under varied operating
conditions have been studied and simulated using real and
hypothetical data by a number of researchers: Bhupender et al.
(2007), Rizwan et al. (2011, 2013), and the references
therein. Recently, Mathew et al. (2012) studied the effects of
planned and unplanned maintenance stops of a programmable
logic controller (PLC) controlled quayside container crane,
number 15 (QC 15), which loads and unloads containers at the
docks and is at the heart of dock and port trade. That paper
took a mechanical maintenance perspective and covered
mechanical systems only.
Building on that work and extending the case study to cover
software faults, this paper presents a reliability model and
analysis of the same 65-ton PLC-controlled quayside container
crane (QC 15), currently operational at a large strategic port
in Oman. The port operates a number of PLC-controlled quayside
container cranes for trans-shipment purposes. The data show that
frequent repair and replacement of PLCs due to software faults
is a concern for the port maintenance department, because the
resulting downtime is costly and slows down the container
trans-shipment process. Quayside container crane number 15 is
selected for the reliability modeling and analysis. A software
reliability model incorporating the actual software fault states
and software outages observed in the data is developed, and the
maintenance metrics are estimated. The emphasis of this paper is
on applying a reliability modeling methodology to PLC-based
software reliability problems. It provides a context in which
PLC reliability and availability can be quantified from actual
values of the rates and probabilities, all estimated from real
operating data of the quayside container crane.
Any programmable or non-programmable software fault in
the PLC stops the quayside container crane and halts the
container trans-shipment process, causing a series of
losses. Visual inspection of the fault display screen reveals the
software fault type and determines the maintenance action to be
taken. The quayside container crane fails due to either of the
two types of software faults seen in the data, i.e. programmable
software faults and non-programmable software faults. A software
fault is attended to by the port maintenance department as soon
as it occurs. The system regenerates and works like new after
each programmable or non-programmable software fault removal.
The collected data give the following estimates:
Probability of programmable software fault, p1 = 0.5.
Probability of non-programmable software fault, p2 = 0.5.
Software fault rate, λ = 0.000051408 per hour.
Programmable software fault removal rate, α1 = 0.0833 per hour.
Non-programmable software fault removal rate, α2 = 0.0833 per hour.
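To give a feel for what these estimates imply, the fail/repair cycle can be simulated directly. The sketch below is only illustrative: it assumes exponentially distributed fault and removal times, and the function name `simulate_availability`, the number of cycles and the random seed are all hypothetical choices, not part of the study.

```python
import random


def simulate_availability(lam, p1, alpha1, alpha2, cycles, seed=0):
    """Simulate operate/repair cycles; return (mean uptime per cycle, availability)."""
    rng = random.Random(seed)
    up_total = 0.0
    down_total = 0.0
    for _ in range(cycles):
        # Operating time until the next software fault (exponential, rate lam).
        up_total += rng.expovariate(lam)
        # Fault type: programmable with probability p1, otherwise non-programmable.
        removal_rate = alpha1 if rng.random() < p1 else alpha2
        # Fault removal time; the exponential form is an assumption of this sketch.
        down_total += rng.expovariate(removal_rate)
    return up_total / cycles, up_total / (up_total + down_total)


LAMBDA = 0.000051408  # software fault rate, per hour
P1 = 0.5              # probability of a programmable software fault
ALPHA1 = 0.0833       # programmable fault removal rate, per hour
ALPHA2 = 0.0833       # non-programmable fault removal rate, per hour

mean_up, avail = simulate_availability(LAMBDA, P1, ALPHA1, ALPHA2, cycles=20000)
print(f"mean time to fault ~ {mean_up:.1f} h, availability ~ {avail:.6f}")
```

With enough cycles the simulated mean uptime should approach 1/λ and the simulated availability should approach the analytical steady state value, which gives an independent check on the closed-form results.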
The PLC software system used in the QC 15 is analyzed using a
semi-Markov process and the regenerative point technique, and
the following maintenance performance metrics are obtained:
• Mean time to QC 15 (PLC software) failure.
• QC 15 (PLC software) availability.
II. MODEL DETAILS AND ASSUMPTIONS
The unit is initially operative in state 0 and, depending on the
type of software fault, transits to state 1 or state 2 with
probabilities p1 and p2 respectively (refer figure 1).
1. All programmable and non-programmable software fault
times are assumed to be exponentially distributed with
software fault rate λ, whereas the repair times have general
distributions.
2. After each programmable or non-programmable fault is
rectified in state 1 or 2, the system works as good as new
and returns to state 0.
3. The software (programmable/non-programmable) faults are
self-announcing.
4. The port maintenance department attends to a software
(programmable/non-programmable) fault as soon as the PLC unit fails.
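$$dQ_{01}(t) = p_1 \lambda e^{-\lambda t}\,dt \tag{1}$$
$$dQ_{02}(t) = p_2 \lambda e^{-\lambda t}\,dt \tag{2}$$
$$dQ_{10}(t) = g_1(t)\,dt \tag{3}$$
$$dQ_{20}(t) = g_2(t)\,dt \tag{4}$$
The non-zero elements pij are as given below:
$$p_{01} = p_1 \tag{5}$$
$$p_{02} = p_2 \tag{6}$$
By these transition probabilities it is verified that:
$$p_{01} + p_{02} = 1 \tag{7}$$
$$p_{10} = p_{20} = 1 \tag{8}$$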
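As a short worked check, the transition probabilities follow from the transition densities (1)-(4) if one takes p_ij as the limit of Q_ij(t) as t tends to infinity. That definition is the usual one for semi-Markov models but is an assumption here, since the text does not spell it out.

```latex
\begin{align*}
p_{01} &= \int_0^\infty p_1 \lambda e^{-\lambda t}\,dt = p_1, &
p_{02} &= \int_0^\infty p_2 \lambda e^{-\lambda t}\,dt = p_2, \\
p_{10} &= \int_0^\infty g_1(t)\,dt = 1, &
p_{20} &= \int_0^\infty g_2(t)\,dt = 1.
\end{align*}
```

Because every fault is either programmable or non-programmable, p1 + p2 = 1 (0.5 + 0.5 in the data), which is relation (7). Relation (8) holds because g1(t) and g2(t) are probability density functions.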
III. NOTATIONS USED
O : Operative PLC unit.
λ : Constant software fault rate of the PLC unit.
p1 : Probability of programmable software fault.
p2 : Probability of non-programmable software fault.
SP : Quayside container crane (QC 15) is under programmable software fault removal.
SNP : Quayside container crane (QC 15) is under non-programmable software fault removal.
© : Convolution, i.e., for any f(t) and g(t): $f(t) © g(t) = \int_0^t f(t-u)\,g(u)\,du$.
qij(t), Qij(t) : p.d.f., c.d.f. of first passage time from a regenerative state i to j or to a failed state j in (0, t].
c.d.f. of first passage time from a regenerative state i to a failed state j.
* : Laplace transform (LT).
g1(t), G1(t) : p.d.f., c.d.f. of programmable software fault removal time.
g2(t), G2(t) : p.d.f., c.d.f. of non-programmable software fault removal time.
IV. TRANSITION PROBABILITIES AND MEAN SOJOURN TIMES
A transition diagram showing the different states of the PLC
software system used in the QC 15 is shown in figure 1. The
epochs of entry into states 0, 1 and 2 are regeneration points,
and hence these states are regenerative states. States 1 and 2
are fault states. The transition probabilities are given in
equations (1)-(8).
Figure 1. Transition states of the PLC software system used in the QC 15
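The mean sojourn time (μi) in the regenerative state i is the
expected time spent in that state before transiting to any
other state. If T denotes the sojourn time in the regenerative
state i, then:
$$\mu_i = E(T) = \int_0^\infty \Pr[T > t]\,dt$$
Thus:
$$\mu_0 = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda} \tag{9}$$
$$\mu_1 = \int_0^\infty \overline{G}_1(t)\,dt \tag{10}$$
$$\mu_2 = \int_0^\infty \overline{G}_2(t)\,dt \tag{11}$$
where $\overline{G}_k(t) = 1 - G_k(t)$ is the probability that the
removal of a type k fault takes longer than t.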
The unconditional mean time taken by the PLC software
system to transit to any regenerative state j, counted from
the epoch of entry into state i, is mathematically stated as:
$$m_{ij} = \int_0^\infty t\,dQ_{ij}(t) = -q_{ij}^{*\prime}(0)$$
Thus:
$$m_{01} + m_{02} = \mu_0, \qquad m_{10} = \mu_1, \qquad m_{20} = \mu_2 \tag{12-14}$$
TABLE 1. SUMMARY OF THE DATA
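Relations (12)-(14) can be verified by direct integration of the transition densities (1)-(4). The derivation below is a worked check rather than part of the original analysis; it uses the survival function $\overline{G}_1(t) = 1 - G_1(t)$ and assumes the removal times have finite means.

```latex
\begin{align*}
m_{01} &= \int_0^\infty t\,p_1 \lambda e^{-\lambda t}\,dt = \frac{p_1}{\lambda},
\qquad
m_{02} = \int_0^\infty t\,p_2 \lambda e^{-\lambda t}\,dt = \frac{p_2}{\lambda}, \\
m_{01} + m_{02} &= \frac{p_1 + p_2}{\lambda} = \frac{1}{\lambda} = \mu_0, \\
m_{10} &= \int_0^\infty t\,g_1(t)\,dt = \int_0^\infty \overline{G}_1(t)\,dt = \mu_1,
\qquad
m_{20} = \int_0^\infty t\,g_2(t)\,dt = \mu_2 .
\end{align*}
```

With the estimated fault rate λ = 0.000051408 per hour and p1 = p2 = 0.5, each of m01 and m02 is about 9726 hours and their sum is about 19452 hours.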
V. MATHEMATICAL ANALYSIS
A. Mean time to QC 15 (PLC software) failure (MTSF)
Let Ui denote the time to QC 15 (PLC software) failure when the
QC 15 (PLC software) starts from state i (i = 0). The
reliability of the QC 15 (PLC software) is then given by:
$$R_i(t) = P[U_i > t]$$
Taking the software fault states 1 and 2 as absorbing states
and employing the arguments used for regenerative processes,
we have the following recursive relation for R0(t):
$$R_0(t) = q_{01}(t) + q_{02}(t) \tag{15}$$
Taking the Laplace transform (L.T.) of (15) gives R0*(s), the
L.T. of R0(t). The mean time to QC 15 (PLC software) failure
(MTSF) is then:
$$E(T_0) = \text{MTSF} = \lim_{s \to 0} R_0^*(s) = \frac{N}{D} \tag{16}$$
where N = μ0 and D = 1.
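The value N/D = μ0 in (16) can be checked from the definition of reliability given at the start of this subsection, R0(t) = P[U0 > t], the probability that no fault has occurred by time t. The steps below are a sketch under that reading, using Q01(t) and Q02(t) obtained by integrating (1) and (2).

```latex
\begin{align*}
R_0(t) &= 1 - Q_{01}(t) - Q_{02}(t)
        = 1 - (p_1 + p_2)\left(1 - e^{-\lambda t}\right)
        = e^{-\lambda t}, \\
R_0^*(s) &= \int_0^\infty e^{-st}\,e^{-\lambda t}\,dt = \frac{1}{s + \lambda}, \\
\text{MTSF} &= \lim_{s \to 0} R_0^*(s) = \frac{1}{\lambda} = \mu_0 .
\end{align*}
```

For the estimated fault rate λ = 0.000051408 per hour this gives an MTSF of roughly 19452 hours.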
B. QC 15 (PLC software) availability analysis
Using probabilistic arguments and defining Ai(t) as the
probability that the QC 15 (PLC software) is in the up state at
instant t, given that it entered the regenerative state i at
t = 0, the following recursive relations are obtained:
$$A_0(t) = M_0(t) + q_{01}(t) © A_1(t) + q_{02}(t) © A_2(t) \tag{17}$$
$$A_1(t) = q_{10}(t) © A_0(t) \tag{18}$$
$$A_2(t) = q_{20}(t) © A_0(t) \tag{19}$$
where $M_0(t) = e^{-\lambda t}$.
Taking the Laplace transforms (L.T.) of equations (17)-(19) and
solving them for A0*(s) gives:
$$A_0^*(s) = \frac{N_1(s)}{D_1(s)} \tag{20}$$
The steady state availability of the QC 15 (PLC software) is
given as:
$$A_0 = \lim_{s \to 0} s A_0^*(s) = \frac{N_1}{D_1} \tag{21}$$
where N1 = μ0 and D1 = μ0 + p1μ1 + p2μ2.
VI. PARTICULAR CASE
The following particular case is considered for graphical
analysis:
$$g_1(t) = \alpha_1 e^{-\alpha_1 t}, \qquad g_2(t) = \alpha_2 e^{-\alpha_2 t}$$
$$p_{01} = p_1, \qquad p_{02} = p_2, \qquad p_{10} = 1, \qquad p_{20} = 1$$
$$\mu_0 = \frac{1}{\lambda}, \qquad \mu_1 = \frac{1}{\alpha_1}, \qquad \mu_2 = \frac{1}{\alpha_2}$$
Using the numerical values calculated from the data collected
from the company, as shown in Table 1, and expressions (16) and
(21), the mean time to QC 15 (PLC software) failure and the
QC 15 (PLC software) availability are estimated as:
Mean Time to QC 15 (PLC software) Failure: 19452.22533 hours
QC 15 (PLC software) Availability: 0.999383238
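The MTSF and availability values reported in this paper can be reproduced from expressions (16) and (21) with the estimated rates. The sketch below assumes exponential removal times, so that μ1 = 1/α1 and μ2 = 1/α2. The function names are hypothetical, and `availability_balance` is an added cross-check based on the balance equations of the equivalent three-state Markov chain, not a step taken in the paper.

```python
def mtsf(lam):
    """Expression (16): MTSF = N / D with N = mu0 = 1 / lam and D = 1."""
    return 1.0 / lam


def availability(lam, p1, p2, alpha1, alpha2):
    """Expression (21): A0 = mu0 / (mu0 + p1 * mu1 + p2 * mu2)."""
    mu0 = 1.0 / lam      # mean sojourn time in the operative state 0
    mu1 = 1.0 / alpha1   # mean programmable fault removal time
    mu2 = 1.0 / alpha2   # mean non-programmable fault removal time
    return mu0 / (mu0 + p1 * mu1 + p2 * mu2)


def availability_balance(lam, p1, p2, alpha1, alpha2):
    """Cross-check: steady state probability of state 0 from the balance equations
    pi1 * alpha1 = pi0 * lam * p1, pi2 * alpha2 = pi0 * lam * p2, pi0 + pi1 + pi2 = 1."""
    return 1.0 / (1.0 + lam * p1 / alpha1 + lam * p2 / alpha2)


LAMBDA, P1, P2 = 0.000051408, 0.5, 0.5
ALPHA1, ALPHA2 = 0.0833, 0.0833

mtsf_value = mtsf(LAMBDA)
avail_value = availability(LAMBDA, P1, P2, ALPHA1, ALPHA2)
avail_check = availability_balance(LAMBDA, P1, P2, ALPHA1, ALPHA2)
print(f"MTSF = {mtsf_value:.5f} hours")
print(f"Availability = {avail_value:.9f} (balance-equation check: {avail_check:.9f})")
```

The two availability routes agree because, with exponential removal times, the semi-Markov model reduces to an ordinary continuous-time Markov chain.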
Figure 2. MTSF vs. software fault rate (λ)
The important reliability indices, MTSF and availability, are
estimated numerically. Both decline as the software fault rate
increases, as can be seen in Figure 2 and Figure 3.
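The declining trend can be tabulated by sweeping the fault rate through expressions (16) and (21). In the sketch below the grid of fault rates is hypothetical (the axis ranges of the figures are not given in the text), the removal rates are held at their estimated values, and exponential removal times are assumed.

```python
def reliability_indices(lam, p1=0.5, p2=0.5, alpha1=0.0833, alpha2=0.0833):
    """Return (MTSF, availability) for a software fault rate lam (per hour)."""
    mu0 = 1.0 / lam      # mean sojourn time in the operative state
    mu1 = 1.0 / alpha1   # mean programmable fault removal time
    mu2 = 1.0 / alpha2   # mean non-programmable fault removal time
    mtsf = mu0                                         # expression (16)
    availability = mu0 / (mu0 + p1 * mu1 + p2 * mu2)   # expression (21)
    return mtsf, availability


# Hypothetical grid of fault rates, chosen only to show the trend.
fault_rates = [k * 1e-5 for k in range(1, 11)]
table = [(lam, *reliability_indices(lam)) for lam in fault_rates]

print("fault rate   MTSF (hours)   availability")
for lam, mtsf, avail in table:
    print(f"{lam:.5f}   {mtsf:12.2f}   {avail:.6f}")
```

Each row pairs a fault rate with its MTSF and availability. Both columns fall monotonically as λ grows, which is the behaviour described for Figures 2 and 3.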
Figure 3. Availability (A0) vs. software fault rate (λ)
VII. CONCLUSIONS
Reliability modeling and analysis proves to be an effective
tool in working towards the goal of zero software failure
performance of critical port equipment. Based on the risk factor,
the model predicts or determines the fault mechanisms and causes
of faults, and offers a scientific basis for achieving improved
maintenance metrics.
REFERENCES
[1] A.G. Mathew and S.M. Rizwan, “Maintenance analysis of port Plc’s - A
case study”, Caledonian Journal of Engineering, vol. 8, no. 2, pp. 52-57,
Jul-Dec 2012.
[2] M.N. Al Alawi, A.G. Mathew and J. Collington, “Attaining zero failure
performance for GIV gulfstream aircraft through reliability modeling
and analysis”, International Journal of Advanced Scientific Engineering
and Technological Research, vol. 1, no. 2, pp. 1-9, 2012.
[3] Bhupender Parashar and Gulshan Taneja, “Reliability and profit
evaluation of a PLC hot standby system based on a master-slave concept
and two types of repair facilities”, IEEE Transactions on Reliability,
vol. 56, no. 3, pp. 534-539, 2007.
[4] S.M. Rizwan, N. Padmavathi, A. Pal and G. Taneja, “Reliability
analysis of a seven unit desalination plant with shutdown during winter
season and repair/maintenance on FCFS basis”, International Journal of
Performability Engineering, vol. 9, no. 5, pp. 523-528, 2013.
[5] S.M. Rizwan, A.G. Mathew, M.C. Majumder, K.P. Ramachandran and
G. Taneja, “Reliability analysis of an identical two-unit parallel CC
plant system operative with full installed capacity”, International Journal
of Performability Engineering, vol. 7, no. 2, pp. 179-185, 2011.