Mathematical Model of Corrective Maintenance Based on Operability Checks for Safety Critical Systems

Maintenance based on equipment operability checks is widely used for technical systems of various physical nature. For commercial and military aircraft, such checks are carried out at intervals specified by maintenance programs. Therefore, great attention in the research literature is paid to the mathematical modeling of maintenance based on equipment operability checks. In this study, a mathematical model of corrective maintenance with operability checks at discrete times is considered for safety-critical systems. The proposed criterion of corrective maintenance effectiveness is to provide a given level of operational reliability at minimum maintenance cost. A finite time interval is considered for modeling the moments of the system operability checks. The graph of decision making is analyzed for imperfect operability checks, and the probabilities of the possible decisions are determined. Analytical equations for the operational reliability and the expected maintenance costs are derived for an arbitrary distribution of time to failure. The criteria for determining optimal policies of sequential checks are formulated. Numerical examples illustrate the developed theory. It is shown for the first time that the conditional probabilities of correct and incorrect decisions when checking system operability depend on the time of failure and on the parameters of the degradation model. Numerical calculations show that, when deteriorating systems with different initial time points of operation are mixed, the interval between operability checks converges to a constant value.


Introduction
At present, corrective maintenance based on operability checks is widely used to maintain the operational reliability of various technical systems, as evidenced by the large number of publications on periodic and sequential plans of operability checks. Mathematical models of corrective maintenance based on operability checks can be divided into two groups: models with perfect checks and models with imperfect checks. Models with perfect checks have been considered in many publications, for example, in [1][2][3][4][5]. These studies address the problem of determining the optimal moments of checks. The optimization criterion is the minimum of the expected maintenance costs, which include the cost of checks, the losses due to unrevealed failure, and the cost of system repair. The plans of checks can be sequential or periodic. Let us now turn to maintenance models with imperfect checks. A typical inspection model with two imperfect inspection probabilities is analyzed in [6].
In that model, the system under test may be judged as failed even if it is operable, or its failure may remain undetected because of imperfect inspection. The optimal policy that minimizes the total expected cost up to the detection of system failure is considered. An imperfect-inspection model in which failures can only be detected with probability $p < 1$ is considered in [7]. An exponential distribution of the time to system failure is assumed. The asymptotic distribution of the test statistic is obtained under the null hypothesis as well as under the alternative. In [8], a maintenance model with periodic checks is examined to detect and eliminate unrevealed failures. Imperfect periodic checks are conducted with periodicity $\tau$ in the finite time interval $[0, (n + 1)\tau]$. At each check, a failure of the system is detected with probability $p \in (0, 1)$. After a failure is detected, corrective repair is performed, which is equivalent to replacing the system with a new one. If no failure occurred in the interval $[0, (n + 1)\tau]$ or the failure was not detected, the system is replaced by a new one at time $(n + 1)\tau$. The goal is to determine the optimal frequency of imperfect checks between preventive maintenance actions that minimizes the maintenance cost per unit time. In [9][10][11][12], maintenance models based on imperfect checks are considered with two types of errors: "false alarm" with probability $\alpha$ and "undetected failure" with probability $\beta$, and, accordingly, correct decisions with probabilities $1 - \alpha$ and $1 - \beta$. It should be noted that in [6][7][8][9][10][11][12] the probabilities of correct and incorrect decisions are assumed to be independent of time and of the degradation process parameters; therefore, these models do not fully reflect real maintenance processes.
In this study, a new maintenance model is developed for determining the optimal moments of operability checks for safety-critical systems. The proposed model takes into account the dependence of the probabilities of correct and incorrect decisions on time and on the parameters of the degradation process.

Decision Rule When Checking System Operability
Assume that operation of a new system begins at time $t = 0$ and that scheduled checks are conducted at times $t_1 < t_2 < \dots < t_M < T$, where $T$ is the finite time horizon. At time $T$ the system is not checked. If the system has not been rejected before time $T$, preventive maintenance is performed, renewing its state. It is also assumed that the state of the system is completely determined by the value of the parameter $L(t)$, which is a non-stationary random process with continuous time. If the value of $L(t)$ exceeds the functional failure threshold $FF$, the system goes into an inoperable state. The result of measuring $L(t)$ at time $t_k$ can be represented as
$$X(t_k) = L(t_k) + Y(t_k), \tag{1}$$
where $Y(t_k)$ is the measurement error or noise at time $t_k$. In what follows, $x(t_k)$ and $l(t_k)$ denote realizations of $X(t_k)$ and $L(t_k)$. When checking the system operability at time $t_k$, the following decision rule is used: if $x(t_k) < FF$, the system is judged as operable; if $x(t_k) \ge FF$, the system is judged as inoperable. If $x(t_k) \ge FF$, a repair of the system is performed at time $t_k$: preventive if $l(t_k) < FF$ and corrective if $l(t_k) \ge FF$.
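The decision rule at a single check can be sketched in a few lines of Python. This is a minimal illustration, assuming the additive measurement model $x(t_k) = l(t_k) + y(t_k)$; the function name `check_decision` and the action labels are hypothetical and chosen here only for readability.

```python
def check_decision(l_true, noise, ff):
    """Apply the decision rule at one check time t_k (illustrative sketch).

    l_true -- true value l(t_k) of the state parameter
    noise  -- realization y(t_k) of the measurement error
    ff     -- functional failure threshold FF
    """
    x = l_true + noise            # measured value: x(t_k) = l(t_k) + y(t_k)
    judged_operable = x < ff      # operable if and only if x(t_k) < FF
    if judged_operable:
        action = "continue"            # no repair, operate until the next check
    elif l_true < ff:
        action = "preventive repair"   # rejected although actually operable
    else:
        action = "corrective repair"   # rejected and actually failed
    return {"x": x, "judged_operable": judged_operable, "action": action}


# Example with FF = 25 kV: an operable system (24 kV) is rejected by noise.
result = check_decision(l_true=24.0, noise=1.5, ff=25.0)
```

In this example the noise pushes the measurement over the threshold, so the rule triggers a preventive repair of a system that is in fact operable.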
Depending on the result of the operability check at time $t_k$, the following decisions can be made: if the system is judged as operable, it is allowed to be used in the time interval $(t_k, t_{k+1})$; if the system is judged as inoperable, it is repaired and allowed to be used in the time interval $(t_0, t_1)$, since the repair renews the system and the schedule of checks restarts. For periodic checks, the interval $(0, T)$ is divided equally into $M + 1$ sub-intervals and the system is checked at time points $k\tau$ ($k = 1, 2, \dots, M$), where $M = T/\tau - 1$.
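For the periodic case, the check times follow directly from $T$ and $\tau$. The helper below is a small sketch; the name `periodic_check_times` is hypothetical, and it assumes that $T$ is an integer multiple of $\tau$, as the relation $M = T/\tau - 1$ implies.

```python
def periodic_check_times(horizon, tau):
    """Return the check times k*tau, k = 1..M, with M = T/tau - 1.

    No check is scheduled at the horizon T itself.
    """
    m = round(horizon / tau) - 1
    # Assumption: T must be an integer multiple of tau.
    if m < 0 or abs((m + 1) * tau - horizon) > 1e-9 * horizon:
        raise ValueError("horizon must be a positive integer multiple of tau")
    return [k * tau for k in range(1, m + 1)]


# Example: T = 5000 h and tau = 500 h give M = 9 checks at 500, ..., 4500 h.
times = periodic_check_times(5000.0, 500.0)
```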

Graph of Decision Making
The graph of decision making when checking the system operability is shown in Figure 1. According to Figure 1, the system can a priori be in one of two states at time $t_k$: operable with probability $P(t_k)$ or inoperable with probability $1 - P(t_k)$, where $P(t)$ is the system reliability function.
The event $\Gamma_1(t_k)$ corresponds to a "true positive" at time $t_k$. The event $\Gamma_2(t_k)$ corresponds to a "false alarm" at time $t_k$. The event $\Gamma_3(t_k)$ corresponds to a "missed detection" at time $t_k$. The event $\Gamma_4(t_k)$ corresponds to a "true negative" at time $t_k$. The events $\Gamma_1(t_k)$ and $\Gamma_4(t_k)$ correspond to correct decisions, and the events $\Gamma_2(t_k)$ and $\Gamma_3(t_k)$ to incorrect decisions.
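The four outcomes can be illustrated with a short classifier. This sketch assumes that the true state is determined by comparing the true parameter value with $FF$ and the judged state by comparing the measured value with $FF$; the function names and the string labels `G1` to `G4` (standing for $\Gamma_1$ to $\Gamma_4$) are hypothetical.

```python
from collections import Counter


def classify_event(l_true, x_measured, ff):
    """Map one check to one of the four events of the decision graph."""
    actually_operable = l_true < ff
    judged_operable = x_measured < ff
    if actually_operable:
        return "G1 true positive" if judged_operable else "G2 false alarm"
    return "G3 missed detection" if judged_operable else "G4 true negative"


def tally_events(pairs, ff):
    """Count events over (true value, measured value) pairs."""
    return Counter(classify_event(l, x, ff) for l, x in pairs)


# Four illustrative checks with FF = 25 kV, one for each event.
pairs = [(24.0, 24.5), (24.0, 25.2), (25.5, 24.8), (25.5, 26.0)]
counts = tally_events(pairs, ff=25.0)
```

Note that, in the terminology used here, a "true negative" is a failed system correctly judged as inoperable.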
Let us denote the random time to system failure by $H$, with probability density function (PDF) $\omega(\eta)$. Consider the random variable $H_k$, which represents an estimate of $H$ based on the results of the operability check at instant $t_k$. The random variables $H$ and $H_k$ are the solutions of the stochastic equations (2) and (3), respectively. Since $X(t_k)$ is an additive function of the random variables $L(t_k)$ and $Y(t_k)$, the random error $\Delta_k$ in the evaluation of the time to failure at time $t_k$ can be represented as
$$\Delta_k = H_k - H. \tag{4}$$
Let us formulate the conditional probabilities of the events when checking the system operability at time $t_k$ in terms of the random variables $H$ and $H_k$. Assume that the system failure occurs at time $\eta$, where $t_k < \eta \le t_{k+1}$ ($k = 0, 1, \dots, M$). Equations (5)-(8) then give, under the condition that $H = \eta$, the conditional probabilities of a "true positive" and of a "false alarm" at time $t_\mu$ ($\mu = 1, \dots, k$), and of a "true negative" and of a "missed detection" at time $t_j$ ($j = k + 1, \dots, M$), respectively. The calculation of the conditional probabilities (5)-(8) is equivalent to computing the probability that the random point $\{H_1, \dots, H_k\}$ falls within the $k$-dimensional domain formed by the variation limits of each random variable.
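One way to write these four conditional probabilities in terms of $H$ and $H_i$ is sketched below. The sketch rests on two assumptions that are not stated explicitly in the text: the system is judged operable at check $t_i$ exactly when the estimate $H_i$ exceeds $t_i$, and a decision at a given check is reached only if the system was judged operable at all earlier checks. The labels $P_{TP}$, $P_{FA}$, $P_{TN}$ and $P_{MD}$ are introduced here for illustration, and the expressions are not necessarily identical in form to the original equations.

```latex
% Failure at H = \eta with t_k < \eta \le t_{k+1}; \mu \le k < j.
\begin{align*}
P_{TP}(t_\mu \mid \eta) &= P\Big\{ \bigcap_{i=1}^{\mu} \{H_i > t_i\} \,\Big|\, H = \eta \Big\}, \\
P_{FA}(t_\mu \mid \eta) &= P\Big\{ \bigcap_{i=1}^{\mu-1} \{H_i > t_i\} \cap \{H_\mu \le t_\mu\} \,\Big|\, H = \eta \Big\}, \\
P_{TN}(t_j \mid \eta)   &= P\Big\{ \bigcap_{i=1}^{j-1} \{H_i > t_i\} \cap \{H_j \le t_j\} \,\Big|\, H = \eta \Big\}, \\
P_{MD}(t_j \mid \eta)   &= P\Big\{ \bigcap_{i=1}^{j} \{H_i > t_i\} \,\Big|\, H = \eta \Big\}.
\end{align*}
```

Substituting $H_i = \eta + \Delta_i$ turns each condition $H_i > t_i$ into $\Delta_i > t_i - \eta$, so every probability above becomes an integral of the conditional joint PDF of the errors $\Delta_1, \Delta_2, \dots$ given $H = \eta$ over a rectangular domain.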

Maintenance Key Performance Indicators for Safety Critical Systems
Operational reliability is the most important maintenance key performance indicator for safety-critical systems, such as aircraft engines and the equipment of nuclear power plants. The expected cost of maintenance can be used as a second maintenance effectiveness indicator. The operational reliability $P_{OR}(t_k, t)$ is defined as the probability of failure-free operation of the system in the interval $(t_k, t)$, $t_k < t \le t_{k+1}$, given that scheduled operability checks are performed at time points $t_1, \dots, t_k$ and unscheduled repairs are carried out whenever the system is judged as inoperable.
The operational reliability of a system that is checked at discrete time points is determined by (13), where $P_R(t_j)$ is the probability of repairing the system at time $t_j$. The probability $P_R(t_j)$ is given by
$$P_R(t_j) = P_{PR}(t_j) + P_{CR}(t_j), \tag{14}$$
where $P_{PR}(t_j)$ and $P_{CR}(t_j)$ are the probabilities of preventive and corrective repair, respectively. Preventive repair at time $t_j$ is associated with a "false alarm" event resulting from the operability check at time $t_j$. Corrective repair at time $t_j$ is associated with a "true negative" event at time $t_j$.
The probability of preventive repair at time $t_j$ is given by (15), and the probability of corrective repair at time $t_j$ by (16). Let us begin with the proof of (14). Consider the following events: $R(t_j)$ is the repair of the system at time $t_j$ after the $j$-th operability check, and $PR(t_j)$ and $CR(t_j)$ are the preventive and the corrective repair of the system, respectively. The system is repaired at time $t_j$ if either of the events $PR(t_j)$ or $CR(t_j)$ occurs. Therefore,
$$R(t_j) = PR(t_j) \cup CR(t_j). \tag{17}$$
It follows from Figure 1 that the events $PR(t_j)$ and $CR(t_j)$ are mutually exclusive because they are based on the incompatible events $\Gamma_2(t_j)$ and $\Gamma_4(t_j)$. Applying the addition theorem of probability to (17), we obtain (14).
Let us now prove (13). The probabilistic definition of the operational reliability $P_{OR}(t_k, t)$ is given by (18). Assume that the last repair was at time $t_j$. Since the operability checks are scheduled over a finite time interval $(0, T)$, the random variable $H$ is defined in the interval $(0, T - t_j)$ with the conditional PDF (19). Assume that the system failure occurs after the $j$-th check in the interval from $\eta$ to $\eta + d\eta$. The probability of this event is given by (20). The probability of the corresponding event is obtained by integrating (20) over the region of existence of the random variable $H$, which gives (21).
The joint probability of the events (17) and (21) can be found using the multiplication theorem of probability for independent events, which gives (24). The events $R(t_0), \dots, R(t_k)$ are independent because the system can be repaired at any of the time points $t_0, \dots, t_k$ and after any repair the system becomes as good as new. Therefore, summing the probabilities (24) over $j$ from 0 to $k$ gives (13), where $P_R(t_0) = P[R(t_0)] = 1$. Q.E.D. Equations (15) and (16) are proved analogously.
As already indicated, the expected corrective maintenance costs of the system in the time interval $(0, T)$ can be used as the second indicator of maintenance effectiveness. Using the formula for the mathematical expectation of a discrete random variable, we obtain (25), where $C_{CM}$ is the random cost of corrective maintenance, $C_{PR}$ and $C_{CR}$ are the average costs of preventive and corrective repair, respectively, and $C_{OC}$ is the average cost of an operability check.

Optimization Criteria
Since there are two indicators for assessing the effectiveness of corrective maintenance, two criteria for determining the optimal moments of operability checks can be proposed. The choice of criterion depends on which effectiveness indicator is placed in the constraint. If the minimum allowable value of the operational reliability $P^*$ is set, the expected corrective maintenance cost can be minimized. The optimization criterion in this case is formulated as (26), where $t_1^{opt}, \dots, t_M^{opt}$ are the moments of the operability checks that provide the minimum expected corrective maintenance costs in the interval $(0, T)$ and a value of the operational reliability not less than $P^*$.
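The idea behind the first criterion (the fewest checks that still keep reliability above $P^*$) can be illustrated with a greatly simplified stand-in. The sketch below assumes perfect checks and no renewal after repair, and it replaces the operational reliability of the model with the conditional survival probability $P(t)/P(t_k)$; it is therefore not the criterion of the paper, only a greedy illustration. The function names and the exponential reliability used in the example are hypothetical.

```python
import math


def next_check_time(reliability, t_prev, p_star, horizon, tol=1e-6):
    """Latest t with reliability(t) / reliability(t_prev) >= p_star.

    Assumes `reliability` is positive and non-increasing.
    Returns None if the bound holds up to the horizon.
    """
    target = p_star * reliability(t_prev)
    if reliability(horizon) >= target:
        return None
    lo, hi = t_prev, horizon          # reliability(lo) >= target > reliability(hi)
    while hi - lo > tol:              # bisection on the crossing point
        mid = 0.5 * (lo + hi)
        if reliability(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


def sequential_schedule(reliability, p_star, horizon, tol=1e-6):
    """Greedy schedule: place each check as late as the bound allows."""
    times, t = [], 0.0
    while True:
        t_next = next_check_time(reliability, t, p_star, horizon, tol)
        if t_next is None or t_next - t <= tol:
            return times
        times.append(t_next)
        t = t_next


# Hypothetical example: exponential reliability with rate 1e-3 per hour.
rate = 1e-3
schedule = sequential_schedule(lambda t: math.exp(-rate * t), 0.95, 1000.0)
```

For an exponential time to failure this simplified rule produces equally spaced checks, with spacing $-\ln(P^*)/\lambda$. For a deteriorating system the spacing shrinks with operating time.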
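If the maximum allowable expected corrective maintenance cost $E(C^*)$ is specified, the optimization criterion is formulated as (27): the operational reliability becomes the quantity to be optimized, and the expected corrective maintenance cost, which must not exceed $E(C^*)$, becomes the constraint.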

Deterioration Model
To calculate the probabilities of correct and incorrect decisions (9)-(12), which enter into (13)-(16), it is necessary to know the PDF $\Lambda(\delta_1, \dots, \delta_k \mid \eta)$. This PDF depends on the type of the stochastic degradation process $L(t)$. Assume that the system deterioration process can be described by the linear stochastic equation
$$L(t) = L_0 + L_1 t, \tag{28}$$
where $L_0$ is the random initial value of $L(t)$ and $L_1$ is the random rate of system deterioration, $L_1 \in (0, \infty)$. If the measurement errors $Y(t_1), \dots, Y(t_k)$ are independent random variables with PDF $\Omega(y_i)$, $i = 1, \dots, k$, then $\Lambda(\delta_1, \dots, \delta_k \mid \eta)$ is determined by (29) [13], where $f(l_0)$ is the PDF of the random variable $L_0$ and $\omega(\eta \mid l_0)$ is the conditional PDF of the random variable $H$ under the condition that $L_0 = l_0$. If the initial value of $L(t)$ in (29) is a constant, i.e., $L_0 = l_0$, then $\Lambda(\delta_1, \dots, \delta_k \mid \eta)$ is given by (30). In the case of a normal distribution of $Y(t_i)$, $i = 1, \dots, k$, the PDF (30) can be represented as (31), where $\sigma_y$ is the standard deviation of $Y(t_i)$. For $k = 1$, (31) yields (32).
Example 1. Let the system state parameter $L(t)$ be the output voltage of the power supply of a radar transmitter [14]. Suppose that the random error in measuring $L(t)$ is normally distributed with zero mean and standard deviation $\sigma_y = 1$ kV, and that $l_0 = 20$ kV and $FF = 25$ kV. Figure 2 shows a 3-D plot of the conditional PDF $\Lambda(\delta_1 \mid \eta)$, which was plotted using the 3D Surface Plotter. The values on both horizontal axes are in hours. As can be seen from Figure 2, as the failure time $\eta$ increases, the conditional PDF $\Lambda(\delta \mid \eta)$ flattens, which indicates an increase in the variance of the random error $\Delta$ in the evaluation of the operating time to failure.
Figure 3 shows a 3-D plot of the conditional PDF of the error in the evaluation of the operating time to failure as a function of $\delta$ and $\sigma_y$ for $\eta = 300$ h. As can be seen from Figure 3, as the standard deviation $\sigma_y$ of the measurement error of the system state parameter increases, the conditional PDF $\Lambda(\delta \mid \eta)$ also flattens, which indicates an increase in the variance of the random error $\Delta$ in the evaluation of the operating time to failure.
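The flattening described in this example can be reproduced with a short calculation for a single check. The sketch below is derived here by a change of variables and is not taken from [13], so its form may differ from the original expression. It assumes a linear degradation path with known initial value $l_0$, zero-mean normal measurement noise, and an estimate $H_1$ obtained by extending the straight line through $(0, l_0)$ and the measured point $(t_1, x)$ up to $FF$. The check time of 100 h is a hypothetical value, since the example does not state one.

```python
import math


def error_pdf(delta, eta, t_check, l0, ff, sigma_y):
    """Conditional PDF of the error Delta = H_1 - H given H = eta (one check).

    Sketch under the assumptions named above: with true slope (FF - l0)/eta,
    the measurement is x = l0 + (FF - l0)*t/eta + y and the estimate is
    H_1 = (FF - l0)*t/(x - l0).
    """
    if delta <= -eta:
        return 0.0                     # the estimate H_1 must stay positive
    a = ff - l0
    # Noise value y that produces this error delta.
    y = -a * t_check * delta / (eta * (eta + delta))
    # Jacobian |dy/d(delta)| of the change of variables.
    jac = a * t_check / (eta + delta) ** 2
    normal = math.exp(-0.5 * (y / sigma_y) ** 2) / (sigma_y * math.sqrt(2.0 * math.pi))
    return jac * normal


# Values of Example 1 (l0 = 20 kV, FF = 25 kV); t_check = 100 h is hypothetical.
peak_300 = error_pdf(0.0, 300.0, 100.0, 20.0, 25.0, 1.0)
peak_400 = error_pdf(0.0, 400.0, 100.0, 20.0, 25.0, 1.0)
peak_noisy = error_pdf(0.0, 300.0, 100.0, 20.0, 25.0, 2.0)
```

Under these assumptions the peak of the density at $\delta = 0$ falls as $\eta$ or $\sigma_y$ grows, which matches the flattening seen in Figures 2 and 3.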
This example leads to a clear conclusion: the conditional probabilities of correct and incorrect decisions depend on the time of failure and on the parameters of the degradation model. Therefore, the assumption of constant probabilities of correct and incorrect decisions when checking system operability, which is made in many published papers, is incorrect. The use of constant probabilities of correct and incorrect decisions in maintenance models leads to significant errors in the calculated values of the maintenance effectiveness indicators.
Example 2. Assume, as in Example 1, that the system state parameter is the output voltage of the radar transmitter's power supply. It is necessary to solve problem (26) for $T = 5000$ h, $C_{PR}$ = USD 3,000, $C_{CR}$ = USD 10,000, $C_{OC}$ = USD 500, $FF = 20$ kV, $l_0 = 16$ kV, and $\sigma_y = 0.25$ kV. Let the minimum allowable value of the operational reliability be $P^* = 0.95$.
The PDF of the time to failure for the stochastic process (28) with $L_0 = l_0$ is given in [15] in terms of $m_1$ and $\sigma_1$, the mathematical expectation and the standard deviation of the random variable $L_1$, respectively. Assume that $m_1 = 0.002$ kV/h and $\sigma_1 = 0.00085$ kV/h. Solving problem (26), we determine the following sequence of optimal time points of operability checks: $\{t_k, k = 1, \dots, 43\}$ = {1165, 1285, 1380, 1470, 1555, …, 4840, 4920 h}. The dependence of the operational reliability on the operating time, with the moments of operability checks indicated, is shown in Figure 4. As can be seen from the sequence of operability checks, the time interval between the checks decreases and tends to 80 h. The number of operability checks necessary to ensure the required operational reliability of 0.95 in the interval (0, 5000 h) is 43.
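For a linear degradation path with a fixed initial value, the time-to-failure distribution follows from $H = (FF - l_0)/L_1$. The sketch below assumes that $L_1$ is normally distributed with mean $m_1$ and standard deviation $\sigma_1$ and ignores the truncation to $L_1 > 0$ (for the values of Example 2 the neglected probability mass is about 1%). It is a change-of-variables derivation made here, so its form may differ from the expression in [15]; the function names are hypothetical.

```python
from statistics import NormalDist


def reliability(t, l0, ff, m1, sigma1):
    """P(H > t) for L(t) = l0 + L1*t, L1 ~ Normal(m1, sigma1), no truncation."""
    if t <= 0:
        return 1.0
    # H > t  <=>  L1 < (FF - l0) / t
    return NormalDist(m1, sigma1).cdf((ff - l0) / t)


def failure_pdf(t, l0, ff, m1, sigma1):
    """Density of H obtained from H = (FF - l0) / L1 by change of variables."""
    a = ff - l0
    return NormalDist(m1, sigma1).pdf(a / t) * a / t ** 2


def time_at_reliability(p, l0, ff, m1, sigma1):
    """Operating time at which the reliability of a new system drops to p."""
    return (ff - l0) / NormalDist(m1, sigma1).inv_cdf(p)


# Parameter values of Example 2.
params = dict(l0=16.0, ff=20.0, m1=0.002, sigma1=0.00085)
t_95 = time_at_reliability(0.95, **params)
```

The value `t_95` is the time at which an unchecked new system reaches a reliability of 0.95 under these simplifying assumptions. It does not account for measurement errors or repairs, so it is only a rough reference point for the first check time.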
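The minimum value of the expected corrective maintenance costs in the time interval (0, 5000 h) is obtained by evaluating the expected corrective maintenance cost at the optimal moments of operability checks found above.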

Conclusions
In this study, key performance indicators of corrective maintenance for safety-critical systems have been proposed. General equations for the operational reliability and the expected maintenance costs of a system that is checked at successive times have been derived. Unlike previously published studies, the proposed equations have been obtained for the case of an arbitrary time-to-failure distribution and imperfect checks. To determine the optimal moments of operability checks, two optimization criteria have been formulated. It has been shown for the first time that the conditional probabilities of correct and incorrect decisions when checking system operability depend on the time of failure and on the parameters of the degradation model. Numerical calculations have shown that, when degrading systems with different initial points of operation are mixed, the interval between checks converges to a constant value.