Significant efforts have been recently devoted to the qualitative and quantitative evaluation of resilience in engineering systems. Current resilience evaluation methods, however, have mainly focused on business supply chains and civil infrastructure, and need to be extended for application in engineering design. A new resilience metric is proposed in this paper for the design of mechanical systems to bridge this gap, by investigating the effects of recovery activity and system failure paths on system resilience. The defined resilience metric is connected to design through time-dependent system reliability analysis. This connection enables us to design a system for a specific resilience target in the design stage. Since computationally expensive computer simulations are usually used in design, a surrogate modeling method is developed to efficiently perform time-dependent system reliability analysis. Based on the time-dependent system reliability analysis, dominant system failure paths are enumerated and then the system resilience is estimated. The connection between the proposed resilience assessment method and design is explored through sensitivity analysis and component importance measure (CIM). Two numerical examples are used to illustrate the effectiveness of the proposed resilience assessment method.

Introduction

Resilience refers to the ability of a system to recover to its normal operating condition after the occurrence of disruptive events [1]. Since resilience was first defined in the 1970s, its modeling and definitions have been widely studied in ecology [2], social science [3], and economics [4].

Even though resilience has been intensively studied in the above areas, its development in the engineering field is still at an early stage. Resilience assessment of engineering systems has, however, gained increasing interest in recent years. From the perspective of definition, the American Society of Mechanical Engineers (ASME) defined resilience in 2009 as a system's ability to rapidly recover to full function after disruption [5]; Ouyang and Wang [6] evaluated the annual resilience of a system under multihazard events; Ayyub [7] proposed a resilience metric that considers aging effects and different types of vulnerability and recoverability scenarios; and Reed et al. [8] developed a method to evaluate the resilience of networked infrastructure. Other resilience metrics have also been proposed [7], and a detailed review can be found in Ref. [9]. From the perspective of application, Yodo and Wang [10,11] assessed the resilience of an electric motor supply chain using Bayesian networks (BNs); Panteli and Mancarella [12] assessed the resilience of electrical power infrastructure; Baroud et al. [13] evaluated the resilience of an inland waterway network based on the CIM method proposed by Barker et al. [14]; and Spiegler et al. [15] estimated supply chain resilience using a control engineering approach.

The above literature review shows that current studies of resilience in engineering systems have focused on problems related to supply chains [10,11,15], waterway networks [13], power infrastructure [12], and civil infrastructure systems [6]. The resulting resilience metrics are difficult to apply in engineering design. To fill the gap between resilience assessment and engineering design, Youn et al. [16] made the first quantitative attempt in 2011 to develop a resilience-driven design framework. Subsequently, Mehrpouyan et al. [17] investigated the resilience of complex engineered system design by employing a graph spectral approach in the design of system architecture. The resilience design framework of Youn et al. [16] was developed primarily for prognostics and health management (PHM) and is therefore associated with the detectability of failure events. In the method of Mehrpouyan et al. [17], resilience is affected by the physical connections between components. In addition, Wang and Li [18,19] studied the redundancy allocation of engineering systems while considering failure interactions; this work is also related to resilience, since redundancy can increase the reliability and decrease the vulnerability of a system.

Since self-healing is usually difficult to achieve in traditional mechanical systems, their recovery is often achieved through repair or replacement. Different components differ in recovery probability, recovery ability, and required recovery time. According to the definition of resilience, a resilient mechanical system should therefore have a low quality loss after recovery and require a short time to recover. In addition, a system with multiple components has numerous failure paths, and different failure paths have different recovery properties. Based on these observations, a new resilience metric is proposed in this paper. Since resilience is usually time-dependent and uncertainty is inherent in design, the proposed resilience metric is connected with design through time-dependent system reliability analysis. For realistic systems, time-dependent system reliability analysis requires a large number of model evaluations [20,21]. In this paper, a surrogate model-based method is developed to reduce this computational burden. The connection between the proposed resilience assessment and design optimization is investigated through resilience sensitivity analysis and CIM.

The contributions of this paper are thus summarized as: (1) the definition of a new resilience metric, which connects design with resilience assessment; (2) a new time-dependent system reliability analysis method for resilience assessment; (3) a strategy for the efficient evaluation of resilience based on time-dependent system reliability analysis; and (4) investigation of resilient design through sensitivity analysis and CIM.

The remainder of the paper is organized as follows: Section 2 provides background concepts on resilience and time-dependent reliability analysis. Section 3 presents the proposed resilience assessment method. Two numerical examples are used to illustrate the proposed method in Sec. 4. Concluding remarks are provided in Sec. 5.

Background

In this section, we first briefly review two resilience metrics that have a qualitative connection to design. After that, we summarize the concept of time-dependent reliability analysis.

Resilience of an Engineering System.

Figure 1 illustrates a generalized representation of system resilience, which consists of three key elements, namely, reliability, vulnerability, and recoverability.

The reliability element is associated with the probability that the system performs satisfactorily in the presence of disruptive events; it can be time-independent or time-dependent. High reliability implies a low probability of unsatisfactory performance, but it typically requires a large initial investment. The vulnerability element describes the degraded performance of the system after disruptive events: if a disruption occurs, a system with higher vulnerability suffers a more severe failure consequence than a system with lower vulnerability. The recoverability element quantifies how quickly and how well a system can recover to its normal state after disruption. Inspired by these three elements of system resilience, various models and definitions of resilience have been proposed in recent years. Two representative definitions are the resilience metrics proposed by Youn et al. [16] and Ayyub [7]. (Note also that robustness is an element that overlaps reliability and vulnerability.)

In the resilience metric proposed by Youn et al. [16], resilience is expressed as a function of reliability (R) and restoration (ρ) as follows: 
$$\Psi=\text{Reliability}(R)+\text{Restoration}(\rho)$$
(1)
The restoration is further expressed as the joint probability of system failure, correct diagnosis, correct prognosis, and mitigation/recovery [16]:
$$\Psi=R+\Pr(E_{sf})\Pr(E_{cd}\mid E_{sf})\Pr(E_{cp}\mid E_{cd}E_{sf})\Pr(E_{mr}\mid E_{cp}E_{cd}E_{sf})=R+(1-R)\,P_{Diag}P_{Prog}P_{Corr}$$
(2)

where $E_{sf}$ is the system failure event, $E_{cd}$ is correct diagnosis, $E_{cp}$ is correct prognosis, $E_{mr}$ is the mitigation/recovery event, $P_{Diag}$ is the probability of correct diagnosis [22], $P_{Prog}$ is the probability of correct prognosis [23], and $P_{Corr}$ is the probability of correct recovery.

The resilience metric in Eqs. (1) and (2) focuses on the restoration of the system using PHM methods. Increasing the resilience of a system therefore ultimately becomes a sensor network design problem, which is associated with the probabilities of correct diagnosis and prognosis. However, the metric in Eq. (2) does not include the vulnerability element in Fig. 1. For example, of two systems with identical reliability and identical $P_{Diag}P_{Prog}P_{Corr}$, the system with lower vulnerability should clearly be more resilient, yet Eq. (2) assigns both systems the same resilience.

Another resilience metric is proposed by Ayyub [7] as 
$$\Psi=\frac{T_{in}+F\,\Delta T_f+R_e\,\Delta T_r}{T_{in}+\Delta T_f+\Delta T_r}$$
(3)

where Tin is the time instant of failure initialization, Tf is the time to failure, Tr is the time to recovery, F is the failure profile, Re is the recovery profile, ΔTf is the duration of failure, and ΔTr is the duration of recovery. F measures the robustness and redundancy, and Re measures the resourcefulness and rapidity. As shown in Fig. 2, three failure events and six recovery events have been considered in Ayyub's resilience metric [7].

All three elements of the original resilience definition in Fig. 1 are included in the metric defined in Eq. (3). A review of other definitions and metrics of system resilience is available in Ref. [9]. The literature review shows that most current resilience metrics have not been explicitly connected to engineering design. The purpose of this paper is to develop a resilience metric that can be quantitatively connected to time-dependent reliability analysis and design optimization. In Secs. 2.2 and 3, we first briefly introduce the concept of time-dependent reliability analysis and then propose a new resilience metric that connects engineering design with resilience.

Time-Dependent Reliability Analysis.

For a response function $G(t)=g(X,Y(t),t)$ with inputs of random variables $X=[X_1,X_2,\ldots,X_n]$, stochastic processes $Y(t)=[Y_1(t),Y_2(t),\ldots,Y_m(t)]$, and time $t$, the time-dependent reliability is defined as [24]

$$R(0,t)=\Pr\{G(\tau)=g(X,Y(\tau),\tau)\le 0,\ \forall\tau\in[0,t]\}$$
(4)

where $\Pr\{\cdot\}$ denotes probability, "$\forall$" means "for all," and $[0,t]$ is the time duration of interest. The corresponding time-dependent failure probability is $p_f(0,t)=1-R(0,t)$.
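To make Eq. (4) concrete, the following Python sketch estimates $R(0,t)$ by crude Monte Carlo simulation on a discretized time grid. The limit-state function, the input distributions, and all function names are hypothetical and chosen only for illustration. Because excursions between grid points can be missed, the estimate is an approximation, and this brute-force approach is not the surrogate-based method developed in this paper.

```python
import numpy as np

def time_dependent_reliability(g, sample_inputs, t_grid, n_mcs=100_000, seed=0):
    """Crude MCS estimate of R(0, t_k) for every t_k in t_grid, following Eq. (4).

    A trajectory counts as failed on [0, t_k] once g > 0 at any grid instant
    up to t_k (safe means g <= 0 for all instants, as in Eq. (4)).
    """
    rng = np.random.default_rng(seed)
    x, y = sample_inputs(rng, n_mcs, t_grid)        # x: (n_mcs,), y: (n_mcs, n_t)
    G = g(x[:, None], y, t_grid[None, :])          # responses on the time grid
    failed_by_t = np.logical_or.accumulate(G > 0, axis=1)  # first passage occurred
    return 1.0 - failed_by_t.mean(axis=0)          # R(0, t_k) = 1 - p_f(0, t_k)

# Hypothetical inputs: resistance X ~ N(3, 0.3), load Y(t) = A cos(0.5 t) + B sin(0.5 t)
def sample_inputs(rng, n, t_grid):
    x = rng.normal(3.0, 0.3, size=n)
    a = rng.standard_normal((n, 1))
    b = rng.standard_normal((n, 1))
    y = a * np.cos(0.5 * t_grid) + b * np.sin(0.5 * t_grid)
    return x, y

def g(x, y, t):
    # Hypothetical limit state: failure when the load plus a slow drift exceeds X
    return y + 0.05 * t - x

t_grid = np.linspace(0.0, 10.0, 101)
R = time_dependent_reliability(g, sample_inputs, t_grid, n_mcs=20_000)
```

Because the same trajectories are reused for every $t_k$, the whole curve $R(0,\tau)$, $\tau\in[0,t]$, comes out of a single simulation and is non-increasing in $\tau$ by construction.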

Time-dependent reliability analysis has been studied intensively in recent years [25], leading to a group of methods such as upcrossing rate methods [26], surrogate model-based methods [27,28], sampling-based approaches [24], and composite limit-state function methods [29]. Next, we develop the proposed approach to resilience assessment based on time-dependent reliability analysis.

Resilience Assessment Based on Time-Dependent System Reliability Analysis

In this section, we first propose a new resilience metric for an engineering system. After that, we discuss in detail how to evaluate the resilience based on this metric.

New Definition of Resilience Metric.

Considering the fact that Youn's resilience metric [16] can effectively represent resilience in terms of probability, we propose a new resilience metric by extending Youn's resilience metric [16] to incorporate vulnerability and the effect of uncertainty in recoverability.

We begin explaining the proposed resilience metric by investigating the resilience of a specific system without considering uncertainty. For such a system, as shown in Fig. 3, consider a certain quantity of interest (QoI), which can be system performance, the economic value of the system, or another quantity. Suppose the QoI decreases over time from its original state $Q_0$, and at a certain time instant $t_f$ it suddenly drops from $Q(t_f)$ to $Q_e<Q_0$ (the QoI after failure) due to a disturbance or failure of the system. The quality loss due to the disturbance is $Q_{loss}=Q(t_f)-Q_e$. After the disturbance, recovery begins. Recovery has three elements: (1) whether the system function can be recovered, (2) how much of it can be recovered, and (3) how long the recovery takes. If the system can be recovered, the recovery activity is performed immediately, and the system recovers to $Q_r\le Q_0$ without taking any time (immediate recovery), then the recovered QoI is $Q_{recover}=Q_r-Q_e$. If recovery takes some time (normal recovery) and the system is recovered at time instant $t_r$, the recovered QoI is $Q_{recover}=Q_r-Q_e-\bar{Q}(t_r-t_f)$, where $\bar{Q}$ is the average quality loss per unit time during recovery, used to account for the effort required for recovery.

Based on the above discussion and basic definition of resilience, we define the resilience of such a specific system as follows: 
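$$\Psi=\begin{cases}1, & \text{if the system is safe}\\[4pt] I_{recover}\dfrac{Q_{recover}}{Q_{loss}}=I_{recover}\dfrac{Q_r-Q_e-\bar{Q}(t_r-t_f)}{Q(t_f)-Q_e}=I_{recover}\dfrac{v_r-v_e-\bar{v}(t_r-t_f)}{v(t_f)-v_e}, & \text{otherwise}\end{cases}$$
(5)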

where $I_{recover}=0$ means the system function cannot be recovered and $I_{recover}=1$ means it can be recovered, $Q(t)$ is the QoI at time instant $t$, and $v(t_f)=Q(t_f)/Q_0$, $v_r=Q_r/Q_0$, $v_e=Q_e/Q_0$, and $\bar{v}=\bar{Q}/Q_0$ are, respectively, the remaining performance ratio at $t_f$ before the disturbance, the recovery ratio, the remaining performance ratio after the disturbance, and the average performance loss ratio per unit time during the recovery process. $t_r=t_f$ corresponds to immediate recovery. The three elements of recovery are represented by $I_{recover}$, $v_r$, and $(t_r-t_f)$ in the above equation.
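As a check on Eq. (5), the following sketch evaluates the resilience of a single, fully specified performance curve in its normalized form. The function and argument names are hypothetical, and the numbers in the example are illustrative only.

```python
def specific_resilience(safe, i_recover, v_f, v_e, v_r, v_bar, t_f, t_r):
    """Resilience of one performance curve, normalized form of Eq. (5).

    v_f = Q(t_f)/Q0, v_e = Qe/Q0, v_r = Qr/Q0, and v_bar = Qbar/Q0 is the
    average performance loss ratio per unit time during recovery.
    """
    if safe:
        return 1.0
    if t_r < t_f:
        raise ValueError("recovery cannot be completed before failure occurs")
    recovered = v_r - v_e - v_bar * (t_r - t_f)   # Q_recover / Q0
    loss = v_f - v_e                                # Q_loss / Q0
    return i_recover * recovered / loss

# Illustrative values: failure at t_f = 5, normal recovery completed at t_r = 10
psi_normal = specific_resilience(False, 1, 1.0, 0.4, 0.9, 0.02, 5.0, 10.0)      # 0.4 / 0.6
psi_immediate = specific_resilience(False, 1, 1.0, 0.4, 0.9, 0.02, 5.0, 5.0)    # 0.5 / 0.6
psi_unrecoverable = specific_resilience(False, 0, 1.0, 0.4, 0.9, 0.02, 5.0, 10.0)
```

Shortening the recovery (smaller $t_r-t_f$) or raising $v_r$ increases $\Psi$, which mirrors the three recovery elements $I_{recover}$, $v_r$, and $(t_r-t_f)$.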

The resilience metric in Eq. (5) for a specific system is similar to the resilience metrics defined in Refs. [14,30]. When the metric is applied to the design of a component, uncertainty in the design and in the working environment needs to be considered. Each performance curve such as the one in Fig. 3 corresponds to a single resilience value given by Eq. (5). Because of uncertainty in design, there may be many different realizations of the performance curve, which implies uncertainty in resilience. To quantify the resilience of a design over a time duration of interest $[0,t]$, we use the expected resilience value. After accounting for uncertainty, the proposed new resilience metric for a component is defined as
$$\Psi(t)=\Pr\{\bar{E}_{sf}\}\left(\tilde{\Psi}(t)\mid\bar{E}_{sf}\right)+\Pr\{E_{sf}\}\left(\tilde{\Psi}(t)\mid E_{sf}\right)$$
(6)
where Esf is the event of failure, Pr{E¯sf} is the probability of no failure, Ψ̃(t)|E¯sf is the resilience given that the component is safe, which is 1 according to Eq. (5), and Ψ̃(t)|Esf is the resilience given that failure occurs, which is given by 
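$$\tilde{\Psi}(t)\mid E_{sf}=\Pr\{\text{Recovery}\mid E_{sf}\}\left(\tilde{\Psi}(t)\mid\text{Recovery},E_{sf}\right)=P_{re}\int_0^t\int_\tau^t f_{t_f,t_r}(\tau,\zeta)\,\frac{v_r-v_e-\bar{v}(\zeta-\tau)}{v(\tau)-v_e}\,d\zeta\,d\tau$$
(7)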

where $P_{re}=\Pr\{\text{Recovery}\mid E_{sf}\}$ is the probability of recovery given that the component has failed, which is a probabilistic form of $I_{recover}$ in Eq. (5); $(\tilde{\Psi}(t)\mid\text{Recovery},E_{sf})$ is the resilience given that the component has failed and can be recovered; $\int_0^t\int_\tau^t f_{t_f,t_r}(\tau,\zeta)\,[v_r-v_e-\bar{v}(\zeta-\tau)]/[v(\tau)-v_e]\,d\zeta\,d\tau$ is the expected resilience accounting for the uncertainty in $t_f$ and $t_r$; and $f_{t_f,t_r}(\tau,\zeta)$ is the joint probability density function (PDF) of $t_f$ and $t_r$. The distribution of $t_f$ can be obtained using the time-dependent system reliability analysis method presented in Sec. 3.2.

Combining Eqs. (6) and (7), we have the resilience for a component as 
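$$\Psi(t)=\Pr\{\bar{E}_{sf}\}(\tilde{\Psi}(t)\mid\bar{E}_{sf})+\Pr\{E_{sf}\}(\tilde{\Psi}(t)\mid E_{sf})=R(0,t)+\big(1-R(0,t)\big)P_{re}\int_0^t\int_\tau^t f_{t_f,t_r}(\tau,\zeta)\,\frac{v_r-v_e-\bar{v}(\zeta-\tau)}{v(\tau)-v_e}\,d\zeta\,d\tau$$
(8)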

in which R(0,t) is the time-dependent reliability of the component.

For the sake of illustration and explanation, in Secs. 3.2 to 3.4, we assume that v(tf)=Q(tf)/Q0=1 (i.e., no performance degradation if there is no failure). Equation (8) can then be rewritten as 
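$$\Psi(t)=R(0,t)+\big(1-R(0,t)\big)P_{re}\left(\frac{v_r-v_e}{1-v_e}-\frac{\bar{v}}{1-v_e}\int_0^t\int_\tau^t f_{t_f,t_r}(\tau,\zeta)(\zeta-\tau)\,d\zeta\,d\tau\right)=R(0,t)+\big(1-R(0,t)\big)P_{re}\,\frac{v_r-v_e-\bar{v}\,\Delta t}{1-v_e}$$
(9)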

in which $\Delta t=\int_0^t\int_\tau^t f_{t_f,t_r}(\tau,\zeta)(\zeta-\tau)\,d\zeta\,d\tau$ is the expected recovery time.

Since $Q_e<Q_0$ and $Q_r\le Q_0$, we have $v_e<1$ and $v_r\le 1$. $R(0,t)$, $v_e$, and $\bar{v}$ may be affected by the redundancy of a system, since redundancy reduces the QoI losses due to failure and during recovery. In this paper, the resilience metric is proposed without considering redundancy; its effect can be incorporated in future work by studying how redundancy changes $R(0,t)$, $v_e$, and $\bar{v}$. Analysis of Eq. (9) shows that: (i) when the component is completely reliable ($R(0,t)=1$), the resilience is also unity; (ii) when the reliability is zero ($R(0,t)=0$), $\Psi(t)$ is governed by the recovery probability ($P_{re}$), the recovery time ($\Delta t$), the recovery ratio ($v_r$), and the vulnerability, represented by the remaining performance ratio after failure ($v_e$); and (iii) when the recovery ratio is unity ($v_r=1$) and the reliability is zero, $\Psi(t)$ is mainly affected by the recovery time ($\Delta t$).
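The closed form of Eq. (9) is simple enough to evaluate directly. The sketch below does so for illustrative parameter values (hypothetical, not taken from the paper) and checks properties (i) and (ii) listed above; the function name is also hypothetical.

```python
def component_resilience(R, p_re, v_r, v_e, v_bar, dt):
    """Eq. (9): component resilience, assuming v(t_f) = 1 before failure.

    R: time-dependent reliability R(0, t); p_re: recovery probability;
    v_r, v_e: recovery and remaining performance ratios; v_bar: average
    performance loss ratio per unit time; dt: expected recovery time.
    """
    return R + (1.0 - R) * p_re * (v_r - v_e - v_bar * dt) / (1.0 - v_e)

psi = component_resilience(R=0.9, p_re=0.8, v_r=0.95, v_e=0.5, v_bar=0.01, dt=5.0)

# Property (i): a perfectly reliable component has unit resilience
assert component_resilience(1.0, 0.8, 0.95, 0.5, 0.01, 5.0) == 1.0
# Property (ii): with R = 0, only the recovery terms and v_e remain
assert abs(component_resilience(0.0, 0.8, 0.95, 0.5, 0.01, 5.0) - 0.64) < 1e-12
```

For fixed reliability, $\Psi$ decreases linearly with the expected recovery time $\Delta t$, with slope $-(1-R)P_{re}\bar{v}/(1-v_e)$.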

When resilience is considered for a system of multiple components, the above metric needs to be extended further. The failure of one component may or may not result in the failure of the system; in other words, a system with multiple components has many possible mutually exclusive failure paths. For example, a vehicle may fail due to the failure of its tires, its engine, or its transmission. The failure consequences, the required recovery effort, and the probability of recovery differ among system failure paths. The system failure paths can be taken as mutually exclusive because any failure region can be decomposed into mutually exclusive failure segments. For instance, a series system with two components A and B has three mutually exclusive failure paths ($\bar{A}B$, $A\bar{B}$, and $\bar{A}\bar{B}$, where $A$ and $B$ denote the safe states of the components and the bar denotes failure). Similarly, a parallel or combined system can also be decomposed into mutually exclusive failure paths. Based on the mutually exclusive failure paths, Eq. (9) is rewritten as
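$$\Psi(t)=R_s(0,t)+\big(1-R_s(0,t)\big)\sum_{i=1}^{N_f}\left[\Pr\{\text{Recovery}\mid E_{FS_i},E_{sf}\}\Pr\{E_{FS_i}\mid E_{sf}\}\,\psi_i\right]=R_s(0,t)+\sum_{i=1}^{N_f}\left[F_{S,i}(0,t)\Pr\{\text{Recovery}\mid E_{FS_i},E_{sf}\}\,\psi_i\right]$$
(10)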
where Rs(0,t) is the system reliability over time interval [0,t], EFSi is the event that failure path i occurs, Pr{EFSi|Esf} is the probability that the system fails through failure path i, Pr{Recovery|EFSi,Esf} is the probability that the system recovers from failure path i, Nf is the total number of mutually exclusive failure paths, ψi is the system's resilience to the ith system failure path, and FS,i(0,t) and ψi are given by 
$$F_{S,i}(0,t)=\big(1-R_s(0,t)\big)\Pr\{E_{FS_i}\mid E_{sf}\}$$
(11)
 
$$\sum_{i=1}^{N_f}F_{S,i}(0,t)=1-R_s(0,t)$$
(12)
and 
$$\psi_i=\frac{v_{r,i}-v_{e,i}-\bar{v}\,\Delta t_i}{1-v_{e,i}}$$
(13)

in which vr,i, ve,i, and Δti are the recovery ratio to the system initial performance Q0, remaining ratio to Q0, and the expected required recovery time of the ith failure path.
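The decomposition into mutually exclusive failure paths, and the identity in Eq. (12), can be illustrated by brute-force enumeration of component states. The sketch below assumes, purely for illustration, that component failures over $[0,t]$ are statistically independent with known probabilities; in the proposed method these quantities come from the time-dependent system reliability analysis instead, and enumerating all $2^{N_c}$ states is feasible only for very small systems. The function names and probabilities are hypothetical.

```python
from itertools import product

def failure_paths(p_fail, system_fails):
    """Enumerate mutually exclusive failure paths and their probabilities F_S,i(0, t).

    p_fail[k]: probability that component k fails in [0, t] (independence is
    assumed here only for illustration).
    system_fails(state): True if the state tuple (True = component failed)
    fails the system.
    """
    paths = {}
    for state in product((False, True), repeat=len(p_fail)):
        if not system_fails(state):
            continue                          # not a system failure path
        prob = 1.0
        for failed, p in zip(state, p_fail):
            prob *= p if failed else 1.0 - p
        paths[state] = prob                   # each full state is one exclusive path
    return paths

def series(state):
    return any(state)

p_fail = [0.1, 0.2]                           # hypothetical components A and B
paths = failure_paths(p_fail, series)
R_s = (1 - p_fail[0]) * (1 - p_fail[1])
cond = {s: F / (1.0 - R_s) for s, F in paths.items()}   # Pr{E_FSi | E_sf}, cf. Eq. (11)
```

For the series system, three paths are found and their probabilities sum to $1-R_s$; passing the built-in `all` instead of `series` (a parallel system) leaves a single path in which both components fail.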

Letting $n_f(i)$ denote the number of failed components in system failure path $i$, we have
$$\Pr\{\text{Recovery}\mid E_{FS_i},E_{sf}\}=\prod_{j=1}^{n_f(i)}P_{re}\big(F_{index_i}(j)\big)$$
(14)
 
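$$v_{r,i}=1-\sum_{j=1}^{n_f(i)}\Big(1-v_r\big(F_{index_i}(j)\big)\Big)=1-\Big(n_f(i)-\sum_{j=1}^{n_f(i)}v_r\big(F_{index_i}(j)\big)\Big),\qquad \Delta t_i=\sum_{j=1}^{n_f(i)}\Delta t\big(F_{index_i}(j)\big)$$
(15)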
 
$$v_{e,i}=1-\sum_{j=1}^{n_f(i)}\Big(1-v_e\big(F_{index_i}(j)\big)\Big)=1-\Big(n_f(i)-\sum_{j=1}^{n_f(i)}v_e\big(F_{index_i}(j)\big)\Big)$$
(16)

in which $P_{re}(k)$ is the recovery probability of the $k$th component, $F_{index_i}$ is the vector of failed component indices of the $i$th system failure path, $v_r(k)$ is the recovery ratio to $Q_0$ of the $k$th component, $\Delta t(k)$ is the expected required recovery time of the $k$th component, and $v_e(k)$ is the quality remaining ratio to $Q_0$ of the $k$th component. For the sake of illustration, $v_r(k)$, $\Delta t(k)$, and $v_e(k)$ are treated as constants for component $k$ in this paper, although they can also be treated as random. $P_{re}(k)$ and the quality recovery ratios and times can be obtained from a failure modes and effects analysis (FMEA) of the system. In addition, $P_{re}(k)$ can be expressed as $P_{re}(k)=P_{Diag}(k)P_{Prog}(k)P_{Corr}(k)$ to connect the proposed resilience metric with the metric given in Eq. (2).

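Combining Eqs. (10) through (16), we have the proposed new resilience metric as

$$\Psi(t)=R_s(0,t)+\sum_{i=1}^{N_f}\left[F_{S,i}(0,t)\prod_{j=1}^{n_f(i)}P_{re}\big(F_{index_i}(j)\big)\,\frac{\sum_{j=1}^{n_f(i)}\Big(v_r\big(F_{index_i}(j)\big)-v_e\big(F_{index_i}(j)\big)-\bar{v}\,\Delta t\big(F_{index_i}(j)\big)\Big)}{n_f(i)-\sum_{j=1}^{n_f(i)}v_e\big(F_{index_i}(j)\big)}\right]$$
(17)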

Equation (17) indicates that four main elements are required to evaluate the resilience of a system. The four elements are (i) reliability of the system, (ii) probability of having different system failure paths, (iii) probability of recovery of different system failure paths from failure, and (iv) QoI loss due to different system failure paths.
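Equations (13) to (17) can be assembled in a few lines once the failure-path probabilities are known. The following sketch takes $F_{S,i}(0,t)$ as a dictionary keyed by the tuple of failed component indices; the component data ($P_{re}(k)$, $v_r(k)$, $v_e(k)$, $\Delta t(k)$) are hypothetical constants, as assumed in the text, and the function names are illustrative only.

```python
import math

def path_terms(failed, p_re, v_r, v_e, dt, v_bar):
    """Recovery probability (Eq. (14)) and psi_i (Eqs. (13), (15), (16)) of one path."""
    p_rec = math.prod(p_re[k] for k in failed)               # Eq. (14)
    v_r_i = 1.0 - sum(1.0 - v_r[k] for k in failed)          # Eq. (15)
    dt_i = sum(dt[k] for k in failed)                         # Eq. (15)
    v_e_i = 1.0 - sum(1.0 - v_e[k] for k in failed)          # Eq. (16)
    psi_i = (v_r_i - v_e_i - v_bar * dt_i) / (1.0 - v_e_i)    # Eq. (13)
    return p_rec, psi_i

def system_resilience(R_s, path_probs, p_re, v_r, v_e, dt, v_bar):
    """Eq. (17): R_s plus the recovery-weighted contributions of all failure paths."""
    total = R_s
    for failed, F_i in path_probs.items():
        p_rec, psi_i = path_terms(failed, p_re, v_r, v_e, dt, v_bar)
        total += F_i * p_rec * psi_i
    return total

# Hypothetical two-component series system (component indices 0 and 1)
p_re, v_r, v_e, dt, v_bar = [0.9, 0.7], [0.95, 0.9], [0.6, 0.5], [2.0, 4.0], 0.01
path_probs = {(0,): 0.08, (1,): 0.18, (0, 1): 0.02}   # F_S,i(0, t); they sum to 1 - R_s
psi = system_resilience(0.72, path_probs, p_re, v_r, v_e, dt, v_bar)
```

As a consistency check, the path $(0,1)$ gives $\psi_i=0.69/0.9$, which is exactly the compact numerator $\sum_j(v_r-v_e-\bar{v}\,\Delta t)$ over the denominator $n_f(i)-\sum_j v_e$ in Eq. (17).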

Equation (10) shows that the proposed resilience metric has a form similar to Youn's resilience metric [16] in Eq. (2). However, there are three main differences between Eqs. (2) and (17): (i) resilience is a time-dependent function in Eq. (17), whereas Eq. (2) is time-independent; (ii) the term $P_{Diag}P_{Prog}P_{Corr}$ in Eq. (2) is combined into a single term $P_{re}$ in Eq. (17) and expanded into $\sum_{i=1}^{N_f}\big[F_{S,i}(0,t)\prod_{j=1}^{n_f(i)}P_{re}(F_{index_i}(j))\big]$ by accounting for the effects of different mutually exclusive system failure paths; and (iii) Eq. (17) has an extra term $(v_{r,i}-v_{e,i}-\bar{v}\,\Delta t_i)/(1-v_{e,i})$ that includes the vulnerability element in the resilience assessment. In addition, accounting for the effects of different failure paths on the probability of recovery also incorporates vulnerability into the resilience evaluation.

The resilience defined in Eq. (17) is bounded in the interval [0,1] (provided that $v_{r,i}-v_{e,i}\ge\bar{v}\,\Delta t_i$ for every failure path, so that each $\psi_i$ lies in [0,1]), with 1 indicating high resilience of the system and 0 indicating low resilience. Before the proposed metric can be applied to engineering design, three main challenges need to be solved:

  (1) Computationally expensive simulation models are usually used to predict the system response. Since Eq. (17) requires time-dependent system reliability analysis, the first challenge is how to efficiently estimate $R_s(0,t)$ and $F_{S,i}(0,t)$, $i=1,2,\ldots,N_f$, over $[0,t]$.

  (2) A system with multiple components may have many mutually exclusive failure paths, all of which are required in the proposed resilience assessment. The second challenge is how to enumerate these system failure paths efficiently.

  (3) Given the resilience metric defined in Eq. (17), the third challenge is how to perform resilience assessment efficiently and how to connect the resilience analysis with design.

In this paper, a new time-dependent system reliability analysis method is developed to address the first challenge. Based on the system reliability analysis method, the second and third challenges are solved as well.

Time-Dependent System Reliability Analysis.

Time-dependent system reliability analysis provides $R_s(0,t)$ and $F_{S,i}(0,t)$, $i=1,2,\ldots,N_f$, required in Eq. (17). Over the past decades, only a few methods have been reported for time-dependent system reliability analysis [20,31,32], and most of them rely on the first-order reliability method (FORM). In this paper, to remove the limitations of FORM while remaining computationally efficient, a recently developed single-loop Kriging (SILK) surrogate modeling method [33] is employed and extended for time-dependent system reliability analysis. Note that failure sequences and brittle failure events [34] are important issues in time-dependent system reliability analysis. In the case of ductile failures, the overall system limit state is not affected by the sequence of component failures [35]. In the case of brittle failures, however, the failure of a component changes the limit-state functions of the other components, so the overall system limit state depends on the failure sequence [35]. Consider the two-bar system with brittle failures shown in Fig. 4. Two failure sequences are possible (Fig. 4(c)), and the corresponding reliability block diagram (RBD) is shown in Fig. 4(d) [36]. The time-dependent system reliability method discussed in this paper is applicable once the failure sequences are identified and the RBD is available. In large systems with multiple components, dominant failure sequences may need to be identified using a branch-and-bound technique [34] or adaptive sampling [34].

For series and parallel systems, the time-dependent system failure probabilities are given by 
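$$p_f^{series}(0,t)=\Pr\Big\{\bigcup_i\big\{g_i(X,Y(\tau_i),\tau_i)>0,\ \exists\tau_i\in[0,t]\big\}\Big\}$$
(18)

$$p_f^{parallel}(0,t)=\Pr\Big\{\bigcap_i\big\{g_i(X,Y(\tau_i),\tau_i)>0,\ \exists\tau_i\in[0,t]\big\}\Big\}$$
(19)

where "$\cup$" denotes union, "$\cap$" denotes intersection, "$\exists$" means "there exists," and $g_i(X,Y(\tau_i),\tau_i)$ is the limit-state function of the $i$th component.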

In the context of surrogate model-based reliability analysis, methods have been proposed that construct a single surrogate model of the extreme value of the system response for system reliability analysis [37,38]. Such an extreme value surrogate model may be highly nonlinear, in which case building surrogate models for the individual limit-state functions is a promising alternative. In this paper, we therefore build a surrogate model for each individual limit-state function using the SILK method. The original SILK method focuses only on the estimation of $p_f(0,t)$, which is a point estimate: to obtain the failure probability over other intervals $[0,\tau]$, the surrogate models would need to be constructed repeatedly. In this section, we first briefly review the SILK method and then modify it to efficiently estimate $p_f(0,\tau)$ and $F_{S,i}(0,\tau)$, $i=1,2,\ldots,N_f$, for all $\tau\in[0,t]$. In this way, the resilience can be evaluated for every $\tau\in[0,t]$ without rebuilding the surrogate models.

A Brief Review of SILK.

In SILK [33], a single surrogate model $\hat{G}=\hat{g}(X,Y,t)$ is built with the Kriging surrogate modeling method for time-dependent reliability analysis. An initial surrogate model is first constructed and then refined adaptively based on a learning function and a convergence criterion. To refine the surrogate model, the time interval $[0,t]$ is discretized into $N_t$ time instants, and $N_{MCS}$ samples are generated for $X$ and $N_{MCS}$ trajectories for $Y(t)$. For any sample point $[x^{(i)},y^{(i)}(t^{(j)}),t^{(j)}]$, $i=1,2,\ldots,N_{MCS}$, $j=1,2,\ldots,N_t$, the Kriging surrogate model gives
$$\hat{G}\sim N\Big(\hat{g}\big(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)}\big),\ \sigma_{\hat{g}}^2\big(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)}\big)\Big)$$
(20)

where $N(\cdot,\cdot)$ stands for the normal distribution, $\hat{g}(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})$ and $\sigma_{\hat{g}}^2(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})$ are the mean and variance of the prediction obtained from the Kriging surrogate model [39], and $y^{(i)}(t^{(j)})$ is the $i$th trajectory of $Y(t)$ at time instant $t^{(j)}$.

During iterations of the surrogate model training, the quality of the Kriging surrogate model is checked using the following convergence criterion [33]: 
$$\varepsilon_r^{max}=\max_{N_{f2}^*\in[0,N_{f2}]}\left\{\frac{|N_{f2}-N_{f2}^*|}{N_{f1}+N_{f2}^*}\times 100\%\right\}$$
(21)
where $N_{f1}=\sum_{i=1}^{N_{MCS}}I_1(i)$ and $N_{f2}=\sum_{i=1}^{N_{MCS}}I_2(i)$, and $I_1(i)$ and $I_2(i)$, $i=1,2,\ldots,N_{MCS}$, are given by
$$I_1(i)=\begin{cases}1, & \text{if } \max_{t\in[t_0,t_e]}\{\hat{g}(x^{(i)},y^{(i)}(t),t)\}>0 \text{ and } U_{min}(i)\ge 2\\ 0, & \text{otherwise}\end{cases}$$
(22)
 
$$I_2(i)=\begin{cases}1, & \text{if } \max_{t\in[t_0,t_e]}\{\hat{g}(x^{(i)},y^{(i)}(t),t)\}>0 \text{ and } U_{min}(i)<2\\ 0, & \text{otherwise}\end{cases}$$
(23)
in which 
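$$U_{min}(i)=\begin{cases}u_e, & \text{if } \hat{g}(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})>0 \text{ and } U(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})\ge 2,\ \exists j\in\{1,2,\ldots,N_t\}\\ \min_{j=1,2,\ldots,N_t}\{U(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})\}, & \text{otherwise}\end{cases}$$
(24)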
where $u_e$ is an arbitrary constant larger than 2, and $U(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})$ is given by [33]
$$U(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})=\frac{|\hat{g}(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})|}{\sigma_{\hat{g}}(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})}$$
(25)
If $\varepsilon_r^{max}$ is less than a specified requirement (say 5%), the time-dependent failure probability is then estimated by [33]
$$p_f(t_0,t_e)=\frac{1}{N_{MCS}}\sum_{i=1}^{N_{MCS}}I_f\Big(\max_{t\in[t_0,t_e]}\{\hat{g}(x^{(i)},y^{(i)}(t),t)\}\Big)$$
(26)
where 
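$$I_f\Big(\max_{t\in[t_0,t_e]}\{\hat{g}(x^{(i)},y^{(i)}(t),t)\}\Big)=\begin{cases}1, & \text{if } \max_{t\in[t_0,t_e]}\{\hat{g}(x^{(i)},y^{(i)}(t),t)\}>0\\ 0, & \text{otherwise}\end{cases}$$
(27)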
If $\varepsilon_r^{max}$ is larger than the accuracy requirement (say 5%), a new training point is identified by
$$x_{new}^t=\big[x^{(i_{new})},y^{(i_{new})}(t^{(i_{new}^t)}),t^{(i_{new}^t)}\big]\quad\text{and}\quad\max\{\rho(x_{new}^t,x_s)\}<0.95$$
(28)
where $x_s$ is the matrix of current training points, $\rho(x_{new}^t,x_s)$ are the correlations between the new training point and the current training points, and $i_{new}$ and $i_{new}^t$ are indices obtained by
$$i_{new}=\arg\min_{i=1,2,\ldots,N_{MCS}}\{U_{min}(i)\}$$
(29)
 
$$i_{new}^t=\arg\min_{i=1,2,\ldots,N_t}\{U(x^{(i_{new})},y^{(i_{new})}(t^{(i)}),t^{(i)})\}$$
(30)

A detailed description of SILK is available in Ref. [33].
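The convergence check and learning step of Eqs. (21) to (30) can be sketched with NumPy, given the Kriging mean and standard deviation of every sample at every time instant. In the sketch below, the arrays `mu` and `sigma` are hypothetical stand-ins for those predictions (no Kriging model is fitted), all function names are illustrative, the whole grid is taken as $[t_0,t_e]$, and the correlation filter $\max\{\rho\}<0.95$ of Eq. (28) is omitted because it depends on the Kriging correlation model. This is a sketch of the listed equations, not the reference SILK implementation.

```python
import math
import numpy as np

def u_values(mu, sigma):
    """U learning function, Eq. (25); mu, sigma have shape (n_mcs, n_t)."""
    return np.abs(mu) / sigma

def u_min_original(mu, sigma, u_e=3.0):
    """Eq. (24): a trajectory with a confidently failed instant gets U_min = u_e > 2."""
    U = u_values(mu, sigma)
    confident_fail = np.any((mu > 0) & (U >= 2), axis=1)
    return np.where(confident_fail, u_e, U.min(axis=1))

def convergence_error(mu, u_min):
    """Eqs. (21)-(23): maximum relative error (%) of the estimated failure count."""
    failed = mu.max(axis=1) > 0
    n_f1 = int(np.sum(failed & (u_min >= 2)))    # confidently failed trajectories
    n_f2 = int(np.sum(failed & (u_min < 2)))     # failed but uncertain trajectories
    worst = 0.0
    for n_star in range(n_f2 + 1):              # candidate true count N_f2*
        denom = n_f1 + n_star
        if denom == 0:
            worst = max(worst, math.inf if n_f2 > n_star else 0.0)
        else:
            worst = max(worst, abs(n_f2 - n_star) / denom * 100.0)
    return worst

def failure_probability(mu):
    """Eqs. (26)-(27): fraction of trajectories whose maximum response exceeds 0."""
    return float(np.mean(mu.max(axis=1) > 0))

def next_training_indices(mu, sigma, u_min):
    """Eqs. (29)-(30): trajectory index and time index of the next training point."""
    i_new = int(np.argmin(u_min))
    i_new_t = int(np.argmin(u_values(mu[i_new], sigma[i_new])))
    return i_new, i_new_t

# Hypothetical Kriging predictions for 4 trajectories at 3 time instants
mu = np.array([[-3.0, -2.0, 1.0],
               [-1.0, -0.5, -0.2],
               [-4.0, 0.3, -1.0],
               [-5.0, -6.0, -4.0]])
sigma = np.full_like(mu, 0.5)
u_min = u_min_original(mu, sigma)
eps = convergence_error(mu, u_min)       # compare with the 5% requirement
```

Here $\varepsilon_r^{max}=100\%$ exceeds the 5% requirement, so the point of trajectory 1 at time index 2 (the smallest $U$) would be proposed as the next training point before the Kriging model is refitted.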

Time-Dependent System Reliability Analysis Based on SILK.

As discussed above, the original SILK method focuses only on estimating $p_f(0,t)$ rather than $p_f(0,\tau)$ for all $\tau\in[0,t]$. To estimate $p_f(0,\tau)$ accurately for all $\tau\in[0,t]$, the first-passage boundary needs to be accurately modeled in the surrogate model $\hat{G}=\hat{g}(X,Y,t)$. This implies that, for every trajectory of the response function, the signs of the points close to the first-passage point (Fig. 5) need to be accurately classified.

Assuming that the first passage occurs at time instant $t_{first}$, the event of accurately classifying the sign of the first-passage point in each trajectory can be expressed as
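$$\big\{\hat{g}(x,y(\tau),\tau)\le 0,\ \forall\tau\in[0,t_{first})\ \big|\ \hat{g}(x,y(t_{first}),t_{first})>0\big\}\ \Leftrightarrow\ \big\{\text{the sign of }\hat{g}(x,y(\tau),\tau)\text{ is accurately classified},\ \forall\tau\in[0,t_{first}]\big\}$$
(31)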
Based on Eq. (31), the Umin defined in the original SILK (Eq. (24)) is modified as follows: 
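$$U_{min}(i)=\begin{cases}\min_{k=1,2,\ldots,j}\{U(x^{(i)},y^{(i)}(t^{(k)}),t^{(k)})\}, & \text{if } \hat{g}(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})>0 \text{ and } U(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})\ge 2,\ \exists j\in\{1,2,\ldots,N_t\}\\ \min_{j=1,2,\ldots,N_t}\{U(x^{(i)},y^{(i)}(t^{(j)}),t^{(j)})\}, & \text{otherwise}\end{cases}$$
(32)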
The other steps of SILK remain the same as in the original method. The individual surrogate models constructed with SILK give the failure or safe states of individual components. To perform system reliability analysis based on these surrogate models, the following Boolean functions are defined according to the RBD of the system. For a series system with $N_c$ components, the Boolean function for the $k$th realization of the system over the time interval $[0,\tau]$, $\tau\le t$, is given by [20]
$$I_B(k,\tau)=\sum_{i=1}^{N_c}I_{s,i}(k,\tau)$$
(33)

where $I_B(k,\tau)$ is the system failure indicator for the $k$th realization of the system, with $I_B(k,\tau)>0$ indicating failure and $I_B(k,\tau)=0$ indicating success, and $I_{s,i}(k,\tau)$ is the failure indicator of the $i$th component over the time interval $[0,\tau]$, $\tau\le t$, with $I_{s,i}(k,\tau)=1$ indicating failure and $I_{s,i}(k,\tau)=0$ indicating success.

For a parallel system, the corresponding Boolean function is given by 
$$I_B(k,\tau)=\prod_{i=1}^{N_c}I_{s,i}(k,\tau)$$
(34)

For a combined series and parallel system, the system Boolean function is defined according to the system topology based on Eqs. (33) and (34). For instance, for the kth realization of a combined system as shown in Fig. 6, the Boolean function is defined as

 
$$I_B(k,t)=\prod_{i=1}^{3}I_{s,i}(k,t)+I_{s,2}(k,t)I_{s,3}(k,t)+I_{s,1}(k,t)I_{s,3}(k,t)I_{s,4}(k,t)$$
(35)
Once the Boolean function is defined for the system, the system failure state can be identified based on the states of individual components and the system failure probability is estimated as 
$$p_f(0,\tau)=\frac{1}{N_{MCS}}\sum_{k=1}^{N_{MCS}}I_{sys}\big(I_B(k,\tau)\big),\quad\forall\tau\in[0,t]$$
(36)

where $I_{sys}(I_B(k,\tau))=1$ if $I_B(k,\tau)>0$, and $I_{sys}(I_B(k,\tau))=0$ otherwise.
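Equations (33), (34), and (36) map directly onto array operations once the component responses are available on the time grid. The sketch below assumes an array `G` of (surrogate) responses with shape (realizations, components, time instants), treats a component as failed over $[0,\tau]$ once $g>0$ has occurred at any grid instant up to $\tau$, and returns the whole curve $p_f(0,\tau)$ from one set of realizations. The function names and the tiny example array are hypothetical.

```python
import numpy as np

def component_indicators(G):
    """I_s,i(k, tau): 1 if component i of realization k has failed on [0, tau]."""
    return np.logical_or.accumulate(G > 0, axis=2).astype(int)

def series_boolean(I):
    """Eq. (33): sum over components; a value > 0 means system failure."""
    return I.sum(axis=1)

def parallel_boolean(I):
    """Eq. (34): product over components; a value > 0 means system failure."""
    return I.prod(axis=1)

def system_pf(I_B):
    """Eq. (36): p_f(0, tau) for every tau on the grid, with I_sys = 1 if I_B > 0."""
    return (I_B > 0).mean(axis=0)

# 2 realizations, 2 components, 3 time instants (hypothetical responses)
G = np.array([[[-1.0, 0.5, -1.0], [-1.0, -1.0, -1.0]],
              [[-1.0, -1.0, 2.0], [3.0, -1.0, -1.0]]])
I = component_indicators(G)
pf_series = system_pf(series_boolean(I))
pf_parallel = system_pf(parallel_boolean(I))
```

A combined topology is handled by composing sums and products in the same way as Eq. (35); for a hypothetical system in which component 0 is in series with a parallel pair (1, 2), the indicator would be `I[:, 0] + I[:, 1] * I[:, 2]`.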

Following a similar procedure, we can also estimate $F_{S,i}(0,t)$, $i=1,2,\ldots,N_f$. The challenge, however, is that the number of failure paths ($N_f$) increases exponentially with the number of components, which makes it impractical to compute every $F_{S,i}(0,t)$. In Sec. 3.3, we discuss how resilience assessment can be performed despite this challenge.

Resilience Assessment.

According to the resilience metric defined in Eq. (17), the first step of resilience assessment is to identify all the mutually exclusive system failure paths. One possible approach is the binary decision diagram (BDD)-based method presented in Ref. [40], from which the mutually exclusive failure paths can be identified efficiently. However, even with the BDD, the number of possible failure paths can remain very large. Because the proposed resilience metric requires all the failure paths to be identified, this is impractical for a system with a large number of components, even if the BDD-based method [40] is employed.

Since a surrogate model-based time-dependent system reliability analysis method has been developed in Sec. 3.2, Monte Carlo simulation (MCS) can be performed on the individual surrogate models to evaluate the failure states of individual components, and the dominant system failure paths can then be easily enumerated from the MCS samples. In this paper, the abovementioned challenge is therefore solved with an MCS-based method that takes advantage of the developed surrogate model-based reliability analysis method. The MCS-based method can address this challenge because $F_{S,i}(0,\tau)$, $\forall\tau\in[0,t]$, can be written as
$$F_{S,i}(0,\tau)=\frac{n_{f,i}}{N_{MCS}}=\frac{1}{N_{MCS}}\sum_{j=1}^{N_{MCS}} I_{f,i}(j,\tau) \tag{37}$$

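where $n_{f,i}$ is the number of failed system realizations through the $i$th system failure path, and $I_{f,i}(j,\tau)$ is the failure indicator function of the $i$th failure path at the $j$th random realization. Failure paths that never occur in the samples have $n_{f,i}=0$ and contribute nothing to the estimate, so only the paths actually observed in the MCS need to be enumerated.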
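A minimal sketch of how the observed failure paths and their probabilities in Eq. (37) might be tallied from the MCS samples is given below. It assumes, for illustration only, that a failure path is identified by the failed components ordered by their first-failure time on the time grid; the inputs and the function name `failure_paths` are hypothetical.

```python
import numpy as np
from collections import Counter


def failure_paths(first_failure_idx, system_failed):
    """Tally the failure paths observed in the MCS samples, as in Eq. (37).

    first_failure_idx: int array (N_MCS, n_comp); grid index at which each
        component first fails within [0, tau], or -1 if it survives.
    system_failed: bool array (N_MCS,), True where I_B(j, tau) > 0.
    Returns {path: F_S,i(0, tau)}, where a path is the tuple of failed
    components (1-based) ordered by first-failure time.
    """
    n_mcs = len(system_failed)
    counts = Counter()
    for j in np.flatnonzero(system_failed):
        idx = first_failure_idx[j]
        comps = np.flatnonzero(idx >= 0)
        order = np.argsort(idx[comps], kind="stable")  # ties keep index order
        counts[tuple(int(c) + 1 for c in comps[order])] += 1
    # F_S,i = n_f,i / N_MCS; paths never sampled simply do not appear.
    return {path: n / n_mcs for path, n in counts.items()}
```

Only the paths that occur in the samples are stored, which is what keeps the enumeration tractable as the number of components grows.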

The resilience $\Psi(t)$ defined in Eq. (17) can then be estimated based on the modified SILK method as
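$$\Psi(\tau)=R_s(0,\tau)+\sum_{i=1}^{N_f}\left[\frac{\sum_{j=1}^{N_{MCS}} I_{f,i}(j,\tau)}{N_{MCS}}\prod_{j=1}^{n_f(i)} P_{re}\big(F_{index}^{i}(j)\big)\,\frac{\sum_{j=1}^{n_f(i)}\Big(v_r\big(F_{index}^{i}(j)\big)-v_e\big(F_{index}^{i}(j)\big)-\bar v\,\Delta t\big(F_{index}^{i}(j)\big)\Big)}{n_f(i)-\sum_{j=1}^{n_f(i)} v_e\big(F_{index}^{i}(j)\big)}\right],\quad \tau\in[0,t] \tag{38}$$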
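Defining
$$a_i=\prod_{j=1}^{n_f(i)} P_{re}\big(F_{index}^{i}(j)\big)\,\frac{\sum_{j=1}^{n_f(i)}\Big(v_r\big(F_{index}^{i}(j)\big)-v_e\big(F_{index}^{i}(j)\big)-\bar v\,\Delta t\big(F_{index}^{i}(j)\big)\Big)}{n_f(i)-\sum_{j=1}^{n_f(i)} v_e\big(F_{index}^{i}(j)\big)},$$
Eq. (38) can be rewritten as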
$$\Psi(\tau)=R_s(0,\tau)+\frac{1}{N_{MCS}}\sum_{j=1}^{N_{MCS}}\bar a(j),\quad \tau\in[0,t] \tag{39}$$

where $\bar a(j)=\sum_{i=1}^{N_f} I_{f,i}(j,\tau)\,a_i$.

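Because the system failure paths are mutually exclusive, at most one $I_{f,i}(j,\tau)$ is nonzero in each realization, so the above equation only requires the value of $\bar a(j)$ for each failed random realization. Based on the definitions given in Eqs. (14)–(16), we then have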
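$$\bar a(j)=\begin{cases}0, & \text{if } I_B(j,\tau)=0\\[4pt] \displaystyle\prod_{k=1}^{n_{fail}(j)} P_{re}\big(F_{fail}^{(j)}(k)\big)\left[\sum_{k=1}^{n_{fail}(j)}\Big(v_r\big(F_{fail}^{(j)}(k)\big)-v_e\big(F_{fail}^{(j)}(k)\big)-\bar v\,\Delta t\big(F_{fail}^{(j)}(k)\big)\Big)\right]\frac{1}{n_{fail}(j)-\sum_{k=1}^{n_{fail}(j)} v_e\big(F_{fail}^{(j)}(k)\big)}, & \text{otherwise}\end{cases} \tag{40}$$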

in which $F_{fail}^{(j)}(k)$, $k=1,\ldots,n_{fail}(j)$, is the vector of failed component indices and $n_{fail}(j)$ is the number of failed components in the $j$th failed random realization. Note that failure indicators and failed indices are defined at the component level in this paper; if a component has multiple failure modes, the failure indicators of its failure modes need to be converted into a single failure indicator of the component.
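Equation (39) reduces resilience estimation to a single pass over the samples. The sketch below assumes the per-realization term $\bar a(j)$ of Eq. (40) is supplied as a user-defined callable `restoration_factor`, so it does not commit to the recovery data of any particular example; the function name and the factor used in the usage lines are hypothetical.

```python
import numpy as np


def resilience_estimate(system_failed, failed_sets, restoration_factor):
    """Eq. (39): Psi(tau) = R_s(0, tau) + (1 / N_MCS) * sum_j a_bar(j).

    system_failed: bool array (N_MCS,), True where I_B(j, tau) > 0.
    failed_sets: failed_sets[j] is the tuple of failed components F_fail^(j).
    restoration_factor: user-supplied callable returning a_bar(j) of
        Eq. (40) for a failed realization (hypothetical interface).
    """
    system_failed = np.asarray(system_failed, dtype=bool)
    n_mcs = system_failed.size
    r_s = 1.0 - np.count_nonzero(system_failed) / n_mcs  # R_s(0, tau)
    a_bar = np.array([restoration_factor(failed_sets[j]) if system_failed[j] else 0.0
                      for j in range(n_mcs)])            # a_bar(j) = 0 if I_B = 0
    return r_s + a_bar.mean()


# Hypothetical usage: recovery probabilities of two components and a fixed
# recovered fraction of 0.5 stand in for the data behind Eq. (40).
p_re = {1: 0.9, 2: 0.8}


def factor(comps):
    return 0.5 * float(np.prod([p_re[c] for c in comps]))


psi = resilience_estimate([True, False, True, False], [(1,), (), (1, 2), ()], factor)
```

Keeping $\bar a(j)$ behind a callable separates the sampling logic from the recovery model, so the same loop serves any set of recovery data.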

With the methods proposed in Secs. 3.1–3.3, $\Psi(\tau)$, $\tau\in[0,t]$, of a system can be estimated. Table 1 summarizes the main procedure of the proposed resilience assessment method.

Resilience Sensitivity Analysis and CIM.

In this section, the relationship between design variables and the proposed resilience metric is investigated through resilience sensitivity analysis and CIM.

Resilience Sensitivity Analysis.

Assuming that the design variables are the mean values of the random variables, denoted as $\boldsymbol\mu=[\mu_1,\mu_2,\ldots,\mu_{n_d}]$, the resilience metric given in Eq. (17) can be written as
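$$\Psi(t)=1-\int_{\Omega_S} I_S(\mathbf z,t)\,f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)\,d\mathbf z+\left\{\sum_{i=1}^{N_f} a_i\int_{\Omega_{S_i}} I_{f,i}(\mathbf z,t)\,f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)\,d\mathbf z\right\} \tag{41}$$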

where $\mathbf z$ is a realization of the random variables $\mathbf Z$, $I_S(\mathbf z,t)$ is the system failure indicator over $[0,t]$, $f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)$ is the joint PDF of $\mathbf Z$ under given $\boldsymbol\mu$, $I_{f,i}(\mathbf z,t)$ is the failure indicator of the $i$th system failure path over $[0,t]$, and $\Omega_S$ and $\Omega_{S_i}$ are the failure domains of the system and the $i$th system failure path, respectively.

It should be noted that when stochastic processes $\mathbf Y(t)$ are present in the problem, each stochastic process $Y_i(t)$ needs to be converted into independent random variables first using the Karhunen–Loève (K-L) expansion as follows:
$$Y_i(t)=\mu_{Y_i}(t)+\sigma_{Y_i}(t)\sum_{j=1}^{n_e}\sqrt{\lambda_j}\,\xi_j\,f_j(t) \tag{42}$$

in which $\mu_{Y_i}(t)$ and $\sigma_{Y_i}(t)$ are the mean and standard deviation of $Y_i(t)$, $\xi_j$, $j=1,2,\ldots,n_e$, are independent random variables, $\lambda_j$ and $f_j(t)$ are the eigenvalues and eigenvectors of the correlation function of $Y_i(t)$ (the standard deviation having already been factored out), and $n_e$ is the number of eigenvectors used to represent the stochastic process.
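A discretized version of Eq. (42) can be sketched with NumPy by eigen-decomposing the correlation matrix on a time grid. The sketch assumes a stationary Gaussian process (so the $\xi_j$ are independent standard normal variables) and, purely for illustration, the squared-exponential correlation form used later in Eq. (66); `kl_expansion` is a hypothetical helper, not part of the SILK method.

```python
import numpy as np


def kl_expansion(t, mu, sigma, n_e, rng, n_samples):
    """Draw samples of a stationary Gaussian process via a discretized Eq. (42).

    t: time grid; mu, sigma: constant mean and standard deviation;
    n_e: number of retained eigenpairs. Correlation assumed to be
    rho(t1, t2) = exp(-(t2 - t1)^2), the form of Eq. (66).
    Returns the samples (n_samples, len(t)) and the retained eigenvalues.
    """
    rho = np.exp(-(t[:, None] - t[None, :]) ** 2)   # correlation matrix on the grid
    lam, f = np.linalg.eigh(rho)                    # eigenvalues in ascending order
    lam, f = lam[::-1][:n_e], f[:, ::-1][:, :n_e]   # keep the n_e largest
    lam = np.clip(lam, 0.0, None)                   # remove tiny negative round-off
    xi = rng.standard_normal((n_samples, n_e))      # independent N(0, 1) variables
    return mu + sigma * ((xi * np.sqrt(lam)) @ f.T), lam
```

Since the correlation matrix has unit diagonal, its trace equals `len(t)`, so `lam.sum() / len(t)` is the fraction of the process variance retained; this is one common way to choose $n_e$.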

Based on the K-L expansion, $\mathbf Z=[\mathbf X,\boldsymbol\xi]$ includes not only the random variables $\mathbf X$ but also the expansion random variables $\boldsymbol\xi$. Since a surrogate model is built as a function of $\mathbf X$, $\mathbf Y$, and $t$ in the SILK method, $\mathbf z$ needs to be transformed into values of $\mathbf x$ and $\mathbf y$ through $\mathbf z\xrightarrow{\text{Eq. (42)}}\mathbf x,\mathbf y$ to obtain $I_S(\mathbf z,t)$ and $I_{f,i}(\mathbf z,t)$ in Eq. (41). With the expression given in Eq. (41), the resilience sensitivity with respect to $\mu_i$ is given by
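$$\frac{\partial\Psi(t)}{\partial\mu_i}=-\frac{\partial\int_{\Omega_S} I_S(\mathbf z,t)\,f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)\,d\mathbf z}{\partial\mu_i}+\left\{\sum_{i=1}^{N_f} a_i\left(\frac{\partial\int_{\Omega_{S_i}} I_{f,i}(\mathbf z,t)\,f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)\,d\mathbf z}{\partial\mu_i}\right)\right\} \tag{43}$$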
Since the system failure paths are mutually exclusive, $I_S(\mathbf z,t)=\sum_{i=1}^{N_f} I_{f,i}(\mathbf z,t)$, and Eq. (43) simplifies to
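$$\frac{\partial\Psi(t)}{\partial\mu_i}=-\frac{\partial}{\partial\mu_i}\left\{\int_{\Omega_S}\sum_{i=1}^{N_f} I_{f,i}(\mathbf z,t)\,(1-a_i)\,f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)\,d\mathbf z\right\}=-\sum_{i=1}^{N_f}(1-a_i)\left(\frac{\partial\int_{\Omega_{S_i}} I_{f,i}(\mathbf z,t)\,f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)\,d\mathbf z}{\partial\mu_i}\right) \tag{44}$$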
Because the failure domains depend on $\mathbf z$ but not on $\mu_i$, the derivative term in Eq. (44) can be moved under the integral sign using Leibniz's rule and rewritten as [41,42]
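$$\frac{\partial\int_{\Omega_{S_i}} I_{f,i}(\mathbf z,t)\,f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)\,d\mathbf z}{\partial\mu_i}=E_{\mathbf Z}\left[I_{f,i}(\mathbf z,t)\,\frac{\partial\ln f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)}{\partial\mu_i}\right] \tag{45}$$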
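where $E_{\mathbf Z}[\cdot]$ denotes the expectation with respect to $\mathbf Z$, and $\partial\ln f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)/\partial\mu_i$ has analytical expressions for some distributions. For example, when $Z_i$ is normally distributed and independent of the other random variables, we have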
$$\frac{\partial\ln f_{Z_i}(z_i,\mu_i)}{\partial\mu_i}=\frac{z_i-\mu_i}{\sigma_i^2} \tag{46}$$

in which $\sigma_i$ is the standard deviation of the normal random variable $Z_i$.
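Equations (45) and (46) can be checked on a one-variable toy problem whose exact answer is known. The sketch below assumes a hypothetical limit state in which failure occurs when a normal variable $Z\sim N(\mu,\sigma^2)$ exceeds a threshold $c$; the score-function estimate should then agree with the exact derivative $\phi((c-\mu)/\sigma)/\sigma$ up to MCS noise. The function names are illustrative.

```python
import numpy as np


def score_function_derivative(mu, sigma, c, n_mcs, rng):
    """Eqs. (45)-(46) for one normal variable: d/dmu of Pr{Z > c}.

    The failure indicator is never differentiated; the derivative comes
    from the score (z - mu) / sigma^2 evaluated at the samples.
    """
    z = rng.normal(mu, sigma, n_mcs)
    indicator = (z > c).astype(float)
    return np.mean(indicator * (z - mu) / sigma**2)


def analytic_derivative(mu, sigma, c):
    """Exact value phi((c - mu) / sigma) / sigma for comparison."""
    u = (c - mu) / sigma
    return np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * sigma)
```

The appeal of this estimator is that the indicator, which has no useful derivative, only needs to be evaluated at samples that have already been drawn.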

Since MCS based on surrogate modeling is used to estimate the time-dependent system reliability and the probabilities of the different system failure paths, Eq. (45) is estimated using MCS as
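$$\frac{\partial\int_{\Omega_{S_i}} I_{f,i}(\mathbf z,t)\,f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)\,d\mathbf z}{\partial\mu_i}=\frac{1}{N_{MCS}}\sum_{j=1}^{N_{MCS}} I_{f,i}(\mathbf z^{(j)},t)\left.\frac{\partial\ln f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)}{\partial\mu_i}\right|_{\mathbf z=\mathbf z^{(j)}} \tag{47}$$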
Substituting Eq. (47) into Eq. (44), we have the resilience sensitivity as 
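$$\frac{\partial\Psi(t)}{\partial\mu_i}=-\frac{1}{N_{MCS}}\sum_{j=1}^{N_{MCS}}\sum_{i=1}^{N_f}(1-a_i)\,I_{f,i}(\mathbf z^{(j)},t)\left.\frac{\partial\ln f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)}{\partial\mu_i}\right|_{\mathbf z=\mathbf z^{(j)}} \tag{48}$$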
Defining $\bar a_{new}(j)=\sum_{i=1}^{N_f} I_{f,i}(\mathbf z^{(j)},t)\,(1-a_i)$, we have the resilience sensitivity as
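$$\frac{\partial\Psi(t)}{\partial\mu_i}=-\frac{1}{N_{MCS}}\sum_{j=1}^{N_{MCS}}\bar a_{new}(j)\left.\frac{\partial\ln f_{\mathbf Z}(\mathbf z,\boldsymbol\mu)}{\partial\mu_i}\right|_{\mathbf z=\mathbf z^{(j)}} \tag{49}$$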
and 
$$\bar a_{new}(j)=\begin{cases}0, & \text{if } I_B(\mathbf z^{(j)},\tau)=0\\ 1-\bar a(j), & \text{otherwise}\end{cases} \tag{50}$$

where $\bar a(j)$ is given in Eq. (40).
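The sensitivity estimator of Eqs. (49) and (50) reuses the samples already drawn for resilience assessment. The sketch below assumes independent normal random variables, so that Eq. (46) gives the score, and takes $\bar a(j)$ as given; the function name and its inputs are hypothetical.

```python
import numpy as np


def resilience_sensitivity(z, mu, sigma, system_failed, a_bar):
    """Eqs. (49)-(50) for independent normal variables Z_i ~ N(mu_i, sigma_i^2).

    z: samples (N_MCS, n_d) drawn at the current design mu.
    system_failed: bool array (N_MCS,), True where I_B(z^(j), tau) > 0.
    a_bar: array (N_MCS,) of a_bar(j) from Eq. (40); ignored where safe.
    Returns the vector dPsi/dmu_i, i = 1, ..., n_d.
    """
    a_new = np.where(system_failed, 1.0 - a_bar, 0.0)  # Eq. (50)
    score = (z - mu) / sigma**2                         # Eq. (46), per variable
    return -(a_new[:, None] * score).mean(axis=0)       # Eq. (49)
```

As a toy check (hypothetical numbers), with one standard normal variable, failure when $z>1$, and a constant $\bar a(j)=0.4$, the resilience is $\Psi=1-0.6\Pr\{Z>1\}$, so the estimate should approach $-0.6\,\phi(1)\approx-0.145$.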

Resilience CIM.

CIM [43] was originally developed to measure the importance of a component to system reliability. In this paper, the CIM is extended to the resilience analysis of a system based on the proposed resilience metric. In the resilience CIM, we are interested in the effect of the resilience of component $i$ on the resilience of the system. Since the proposed resilience metric includes two parts, reliability and restoration, as shown in Eq. (17), the resilience CIM is defined as
$$\Delta\Psi_i(t)=\Delta\Psi_i^{s}(t)+\Delta\Psi_i^{r}(t) \tag{51}$$

where $\Delta\Psi_i^{s}(t)$ is the resilience difference given that component $i$ is safe and $\Delta\Psi_i^{r}(t)$ is the resilience difference given that the recoverability of component $i$ is one.

$\Delta\Psi_i^{s}(t)$ and $\Delta\Psi_i^{r}(t)$ are given by
$$\Delta\Psi_i^{s}(t)=\big(\Psi(t)\,|\,I_{s,i}=0\big)-\Psi(t) \tag{52}$$

$$\Delta\Psi_i^{r}(t)=\big\{\Psi(t)\,|\,P_{re}(i)=1,\ v_r(i)=1,\ \Delta t(i)=0\big\}-\Psi(t) \tag{53}$$

Based on the resilience CIM, the importance of each component to the resilience of the system can be analyzed. In design for resilience, we could allocate different resilience levels to different components based on the CIM [44].
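A brute-force reading of Eqs. (51)–(53) is to re-evaluate the resilience estimate with component $i$ forced safe, and again with its recovery parameters set to full recoverability, reusing the same samples. The sketch below assumes component indicators at a fixed $\tau$, a user-supplied Boolean function, and a hypothetical restoration function that stands in for Eq. (40); the parameter keys `p_re`, `v_r`, and `dt` are illustrative names. Zeroing a column of the indicators ignores any change in other components' limit states that the failure of component $i$ would have triggered, so for systems with brittle failure events this is only an approximation.

```python
import numpy as np


def resilience(I, boolean_fn, restoration_fn, params):
    """Psi(tau) via Eq. (39) from component indicators I (N_MCS, n_comp)."""
    failed_sys = boolean_fn(I) > 0
    a_bar = [restoration_fn(tuple(int(c) for c in np.flatnonzero(row)), params)
             if failed else 0.0
             for row, failed in zip(I, failed_sys)]
    return 1.0 - failed_sys.mean() + np.mean(a_bar)


def resilience_cim(I, boolean_fn, restoration_fn, params, i):
    """Eqs. (51)-(53) for component i (0-based column of I)."""
    base = resilience(I, boolean_fn, restoration_fn, params)
    I_safe = I.copy()
    I_safe[:, i] = 0                                    # I_{s,i} = 0, Eq. (52)
    d_s = resilience(I_safe, boolean_fn, restoration_fn, params) - base
    perfect = {k: dict(v) for k, v in params.items()}   # copy; caller's dict untouched
    perfect[i].update(p_re=1.0, v_r=1.0, dt=0.0)        # full recoverability, Eq. (53)
    d_r = resilience(I, boolean_fn, restoration_fn, perfect) - base
    return d_s + d_r, d_s, d_r
```

Ranking components by the first returned value gives the importance ordering that the CIM is meant to provide.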

Numerical Examples

In this section, a roller clutch without brittle failure events and a cantilever beam-bar system with brittle failure events are used to demonstrate the proposed resilience assessment method.

A Roller Clutch.

An automotive roller clutch, shown in Fig. 7, is adopted from Ref. [29] as our first example. For proper operation of the clutch, three performance functions, namely, contact angle, torque capacity, and hoop stress, need to be verified during the clutch design. A proper contact angle ensures that the clutch will not be scraped. The torque requirement avoids the situation in which the clutch is locked. The hoop stress requirement guarantees the fatigue life of the cage [29]. The clutch fails if any of the three requirements is not satisfied.

The relationship between the contact angle α, the torque Th, the hoop stress σh, and the geometry of the clutch is given by [29] 
$$\alpha=\cos^{-1}\left(\frac{D_H+d}{D_{in}-d}\right) \tag{54}$$
 
$$T_h=4L\left(\frac{\sigma_c}{c_1}\right)^2\frac{D_H^2\,d}{4(D_H+d)}\sqrt{1-\left(\frac{D_H+d}{D_{in}-d}\right)^2} \tag{55}$$
and 
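$$\sigma_h=4\left(\frac{\sigma_c}{2\pi c_1}\right)\frac{D_H\,d\,(D_H+d)\left(D_{out}^2+D_{in}^2\right)}{(D_H+d)(D_{in}-d)\,D_{in}\left(D_{out}^2-D_{in}^2\right)} \tag{56}$$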

in which $L=80$ mm, $\sigma_c=3790$ MPa, $c_1=0.25\sqrt{\pi E/[2(1-0.29^2)]}$, and $E=207$ GPa.

Due to surface wear and corrosion, the dimensions of the clutch decrease over time. We have $D_H(t)=D_{H0}(1-kt)$, $d(t)=d_0(1-kt)$, and $D_{in}(t)=D_{in0}(1-kt)$, where $D_{H0}$, $d_0$, and $D_{in0}$ are the initial dimensions of the hub, the roller, and the inner diameter of the cage, respectively, and $k=1\times10^{-4}$ m/year is the rate at which the dimensions decrease. Based on the requirements of contact angle, torque capacity, and hoop stress, the following time-dependent failure probabilities are defined:
$$p_{f1}(0,t)=\Pr\left\{\cos^{-1}\left[\frac{D_H(\tau)+d(\tau)}{D_{in}(\tau)-d(\tau)}\right]-0.11\geq 0.03,\ \tau\in[0,t]\right\} \tag{57}$$
 
$$p_{f2}(0,t)=\Pr\left\{0.11-\cos^{-1}\left[\frac{D_H(\tau)+d(\tau)}{D_{in}(\tau)-d(\tau)}\right]\geq 0.03,\ \tau\in[0,t]\right\} \tag{58}$$
 
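$$p_{f3}(0,t)=\Pr\left\{3500-4L\left(\frac{\sigma_c}{c_1}\right)^2\frac{D_H^2(\tau)\,d(\tau)}{4\big(D_H(\tau)+d(\tau)\big)}\sqrt{1-\left(\frac{D_H(\tau)+d(\tau)}{D_{in}(\tau)-d(\tau)}\right)^2}\geq 0,\ \tau\in[0,t]\right\} \tag{59}$$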
and 
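$$p_{f4}(0,t)=\Pr\left\{\frac{4}{2\pi}\left(\frac{\sigma_c}{c_1}\right)\frac{D_H(\tau)\,d(\tau)}{D_H(\tau)+d(\tau)}\,\frac{D_H(\tau)+d(\tau)}{\big(D_{in}(\tau)-d(\tau)\big)D_{in}(\tau)}\times\frac{D_{out}^2+D_{in}^2(\tau)}{D_{out}^2-D_{in}^2(\tau)}-398\times10^{6}\geq 0,\ \tau\in[0,t]\right\} \tag{60}$$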

In the above time-dependent failure probability expressions, $p_{f1}(0,t)$ and $p_{f2}(0,t)$ are related to the contact angle, $p_{f3}(0,t)$ to the torque capacity, and $p_{f4}(0,t)$ to the hoop stress of the cage. Table 2 gives the random variables of the roller clutch example. The QoI of the clutch is the torque. There are three types of components: roller ($p_{f1}(0,t)$ and $p_{f2}(0,t)$), hub ($p_{f3}(0,t)$), and cage ($p_{f4}(0,t)$). The average quality loss rate during recovery, $\bar v=\bar Q/Q_0$, is assumed to be 0.2/year. Table 3 gives the assumed data of the three types of components for the resilience assessment of the roller clutch.

Following the procedure given in Table 1, we first construct surrogate models for the limit-state functions given in Eqs. (57)–(60) using the modified SILK method. Table 4 gives the number of function evaluations (NOF) required for each limit-state function.

Figure 8 compares the time-dependent system failure probability obtained from the modified SILK method with Kriging surrogate models against that obtained from MCS, showing that the modified SILK method can accurately estimate the time-dependent system failure probability. We then perform resilience assessment for the roller clutch. Figure 9 gives the resilience of the roller clutch over 20 years, along with two realizations of the system performance curves with failure events. In each individual realization, the recovery activity restores the system performance to a particular level after failure. Comparing Figs. 8 and 9 shows that accounting for the recovery activity raises the resilience of the system above its reliability $1-p_f(0,t)$.

We also perform resilience sensitivity analysis with respect to the mean values of $D_{out}$, $D_{in0}$, $D_{H0}$, and $d_0$, as well as resilience CIM analysis, using the methods presented in Sec. 3.4. Figures 10 and 11 plot the results of the resilience sensitivity analysis and the CIM over different time intervals. The results show that the resilience is most sensitive to the mean of $d_0$. As the time duration increases, the sensitivities with respect to $D_{out}$, $D_{H0}$, and $d_0$ become close to each other. The results of the CIM analysis indicate that component 1 (roller) is the most important for the clutch resilience.

A Cantilever Beam-Bar System.

A cantilever beam-bar system, shown in Fig. 12, is modified from Refs. [36,40] as our second example. The system has three components: (1) the bar, (2) the beam, and (3) the joint at the fixed point. The RBD that defines the failure of the system is also given in Fig. 12. This example involves brittle failure events: the failure of component 3 (the joint at the fixed point) changes the limit-state functions of components 1 and 2, and the failure of the bar (component 1) changes the limit-state function of component 3.

The time-dependent failure probabilities of the three components are given by 
$$p_{f1}(0,t)=\Pr\{5F(\tau)/16-S\geq 0,\ \tau\in[0,t]\} \tag{61}$$
 
$$p_{f1|3}(0,t)=\Pr\{LF(\tau)-M-2LS\geq 0,\ \tau\in[0,t]\},\quad\text{if component 3 fails} \tag{62}$$
 
$$p_{f2|3}(0,t)=\Pr\{LF(\tau)/3-M\geq 0,\ \tau\in[0,t]\},\quad\text{if component 3 fails} \tag{63}$$
 
$$p_{f3}(0,t)=\Pr\{3LF(\tau)/8-M\geq 0,\ \tau\in[0,t]\} \tag{64}$$
 
$$p_{f3|1}(0,t)=\Pr\{LF(\tau)-M\geq 0,\ \tau\in[0,t]\},\quad\text{if component 1 fails} \tag{65}$$

Table 5 gives the random variables and the stochastic load process of the cantilever beam-bar system. The QoI of this example is the cost of the system. Table 6 gives the assumed recovery data of the three components for the resilience assessment of the cantilever beam-bar system. The average quality loss rate during recovery, $\bar v=\bar Q/Q_0$, is 0.6/year. In this example, the load $F(t)$ is modeled as a stationary Gaussian stochastic process with the correlation function

 
$$\rho_F(t_1,t_2)=\exp\left(-(t_2-t_1)^2\right) \tag{66}$$

Equations (61)–(65) show that the components have two-stage failure paths. Based on the relationship between the trigger events and the resulting failure modes, the RBD shown in Fig. 12 is modified into the one shown in Fig. 13, which is the same as that presented in Refs. [36,40].
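The two-stage failure logic of Eqs. (61)–(65) can be traced along a single load history. The sketch below is illustrative only: it assumes the system fails along the sequences 1→3, 3→1, or 3→2 (with the second component evaluated through its post-failure limit state), resolves simultaneous failures within one time step in a fixed, arbitrary order, and uses hypothetical inputs rather than the data of Table 5. The function name `beam_bar_path` is hypothetical.

```python
def beam_bar_path(F, S, M, L):
    """Failure path of one load history, or None if the system survives.

    F: load values on the time grid; S: bar strength; M: moment capacity;
    L: length. Components: 1 = bar, 2 = beam, 3 = joint at the fixed point.
    Assumed system failure sequences: (1, 3), (3, 1), (3, 2).
    """
    failed = []
    for f in F:
        changed = True
        while changed:  # brittle cascade: re-check after every new failure
            changed = False
            if 1 not in failed:
                # Eq. (61), or Eq. (62) once the joint has failed
                g1 = (L * f - M - 2 * L * S) if 3 in failed else (5 * f / 16 - S)
                if g1 >= 0:
                    failed.append(1)
                    changed = True
            if 3 not in failed:
                # Eq. (64), or Eq. (65) once the bar has failed
                g3 = (L * f - M) if 1 in failed else (3 * L * f / 8 - M)
                if g3 >= 0:
                    failed.append(3)
                    changed = True
            # Eq. (63): the beam limit state applies only after the joint fails
            if 3 in failed and 2 not in failed and L * f / 3 - M >= 0:
                failed.append(2)
                changed = True
        if tuple(failed[:2]) in {(1, 3), (3, 1), (3, 2)}:
            return tuple(failed[:2])
    return None
```

Applied to many sampled load histories, counting the returned tuples gives the failure-path probabilities of Eq. (37) for this example.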

Based on the modified RBD, we perform time-dependent system reliability analysis and resilience assessment for the system. Figure 14 gives the resilience of the system over twenty years. Figure 15 presents the CIM analysis results.

The results illustrate that the resilience decreases with time and that components 2 (beam) and 3 (joint) are more important than component 1 (bar) for the system resilience.

Conclusion

A new resilience metric is proposed in this paper in order to connect resilience assessment to engineering design, by investigating the effects of failure, recovery, and the system failure paths on system resilience. The proposed resilience metric is expressed as a function of time-dependent system failure paths, reliability, and recovery probability. This builds a bridge between design and the resilience metric. A new time-dependent system reliability analysis method is presented to efficiently evaluate system resilience based on the proposed resilience metric. Resilience sensitivity analysis and CIM are also discussed based on the proposed metric to study the connection between resilience and design. Two numerical examples illustrate the effectiveness of the proposed method.

In the proposed resilience metric, the recovery probability of a component is assumed to be constant. In reality, the recovery probability may be random as well. How to integrate the health monitoring system into the proposed resilience metric needs to be investigated in the future. Other future needs include considering redundancy [18] among components in the system resilience assessment, accounting for the interdependency between different components and multiple failure sequences, considering different types of recovery scenarios, and learning the interdependence between components using BNs.

Acknowledgment

The research reported in this paper was supported by the Air Force Office of Scientific Research (Grant No. FA9550-15-1-0018, Technical Monitor: Dr. David Stargel). The support is gratefully acknowledged.

References

1. Plodinec, M. J., 2009, “Definitions of Resilience: An Analysis,” Oak Ridge: Community and Regional Resilience Institute (CARRI), Report No. accessed Dec. 20, 2015, http://www.resilientus.org/wp-content/uploads/2013/08/definitions-of-community-resilience.pdf
2. Cumming, G. S., 2011, “Spatial Resilience: Integrating Landscape Ecology, Resilience, and Sustainability,” Landscape Ecol., 26(7), pp. 899–909.
3. Norris, F. H., Stevens, S. P., Pfefferbaum, B., Wyche, K. F., and Pfefferbaum, R. L., 2008, “Community Resilience as a Metaphor, Theory, Set of Capacities, and Strategy for Disaster Readiness,” Am. J. Commun. Psychol., 41(1–2), pp. 127–150.
4. Plummer, R., and Armitage, D., 2007, “A Resilience-Based Framework for Evaluating Adaptive Co-management: Linking Ecology, Economics and Society in a Complex World,” Ecol. Econ., 61(1), pp. 62–74.
5. ASME-ITI, 2009, All Hazards Risk and Resilience–Prioritizing Critical Infrastructure Using the RAMCAP Plus SM Approach, ASME Innovative Technology Institute, Washington, DC. http://files.asme.org/ASMEITI/RAMCAP/17978.pdf
6. Ouyang, M., and Wang, Z., 2015, “Resilience Assessment of Interdependent Infrastructure Systems: With a Focus on Joint Restoration Modeling and Analysis,” Reliab. Eng. Syst. Saf., 141, pp. 74–82.
7. Ayyub, B. M., 2014, “Systems Resilience for Multihazard Environments: Definition, Metrics, and Valuation for Decision Making,” Risk Anal., 34(2), pp. 340–355.
8. Reed, D. A., Kapur, K. C., and Christie, R. D., 2009, “Methodology for Assessing the Resilience of Networked Infrastructure,” IEEE Syst. J., 3(2), pp. 174–180.
9. Hosseini, S., Barker, K., and Ramirez-Marquez, J. E., 2016, “A Review of Definitions and Measures of System Resilience,” Reliab. Eng. Syst. Saf., 145, pp. 47–61.
10. Hosseini, S., Yodo, N., and Wang, P., “Resilience Modeling and Quantification for Design of Complex Engineered Systems Using Bayesian Networks,” ASME Paper No. DETC2014-34558.
11. Yodo, N., and Wang, P., 2016, “Resilience Modeling and Quantification for Engineered Systems Using Bayesian Networks,” ASME J. Mech. Des., 138(3), p. 031404.
12. Panteli, M., and Mancarella, P., 2015, “Modeling and Evaluating the Resilience of Critical Electrical Power Infrastructure to Extreme Weather Events,” IEEE Syst. J., PP(99), pp. 1–10.
13. Baroud, H., Barker, K., and Ramirez-Marquez, J. E., 2014, “Importance Measures for Inland Waterway Network Resilience,” Transp. Res., Part E, 62, pp. 55–67.
14. Barker, K., Ramirez-Marquez, J. E., and Rocco, C. M., 2013, “Resilience-Based Network Component Importance Measures,” Reliab. Eng. Syst. Saf., 117, pp. 89–97.
15. Spiegler, V. L., Naim, M. M., and Wikner, J., 2012, “A Control Engineering Approach to the Assessment of Supply Chain Resilience,” Int. J. Prod. Res., 50(21), pp. 6162–6187.
16. Youn, B. D., Hu, C., and Wang, P., 2011, “Resilience-Driven System Design of Complex Engineered Systems,” ASME J. Mech. Des., 133(10), p. 101011.
17. Mehrpouyan, H., Haley, B., Dong, A., Tumer, I. Y., and Hoyle, C., 2015, “Resiliency Analysis for Complex Engineered System Design,” Artif. Intell. Eng. Des., Anal. Manuf., 29(1), pp. 93–108.
18. Wang, J., and Li, M., 2015, “Redundancy Allocation for Reliability Design of Engineering Systems With Failure Interactions,” ASME J. Mech. Des., 137(3), p. 031403.
19. Wang, J., and Li, M., 2015, “Redundancy Allocation Optimization for Multistate Systems With Failure Interactions Using Semi-Markov Process,” ASME J. Mech. Des., 137(10), p. 101403.
20. Hu, Z., and Mahadevan, S., 2015, “Time-Dependent System Reliability Analysis Using Random Field Discretization,” ASME J. Mech. Des., 137(10), p. 101404.
21. Mourelatos, Z. P., Majcher, M., Pandey, V., and Baseski, I., 2015, “Time-Dependent Reliability Analysis Using the Total Probability Theorem,” ASME J. Mech. Des., 137(3), p. 031405.
22. Jeon, B. C., Jung, J. H., Youn, B. D., Kim, Y.-W., and Bae, Y.-C., 2015, “Datum Unit Optimization for Robustness of a Journal Bearing Diagnosis System,” Int. J. Precis. Eng. Manuf., 16(11), pp. 2411–2425.
23. Hu, C., Youn, B. D., Wang, P., and Yoon, J. T., 2012, “Ensemble of Data-Driven Prognostic Algorithms for Robust Prediction of Remaining Useful Life,” Reliab. Eng. Syst. Saf., 103, pp. 120–135.
24. Hu, Z., and Du, X., 2015, “First Order Reliability Method for Time-Variant Problems Using Series Expansions,” Struct. Multidiscip. Optim., 51(1), pp. 1–21.
25. Hu, Z., and Mahadevan, S., “Accelerated Life Testing (ALT) Design Based on Computational Reliability Analysis,” Qual. Reliab. Eng. Int. (in press).
26. Hu, Z., and Du, X., 2013, “Time-Dependent Reliability Analysis With Joint Upcrossing Rates,” Struct. Multidiscip. Optim., 48(5), pp. 893–907.
27. Wang, Z., and Wang, P., 2012, “A Nested Extreme Response Surface Approach for Time-Dependent Reliability-Based Design Optimization,” ASME J. Mech. Des., 134(12), p. 121007.
28. Hu, Z., and Du, X., 2015, “Mixed Efficient Global Optimization for Time-Dependent Reliability Analysis,” ASME J. Mech. Des., 137(5), p. 051401.
29. Singh, A., Mourelatos, Z. P., and Li, J., 2010, “Design for Lifecycle Cost Using Time-Dependent Reliability,” ASME J. Mech. Des., 132(9), p. 091008.
30. Bruneau, M., Chang, S. E., Eguchi, R. T., Lee, G. C., O'Rourke, T. D., Reinhorn, A. M., Shinozuka, M., Tierney, K., Wallace, W. A., and von Winterfeldt, D., 2003, “A Framework to Quantitatively Assess and Enhance the Seismic Resilience of Communities,” Earthquake Spectra, 19(4), pp. 733–752.
31. Song, J., and Der Kiureghian, A., 2006, “Joint First-Passage Probability and Reliability of Systems Under Stochastic Excitation,” J. Eng. Mech., 132(1), pp. 65–77.
32. Hu, Z., Zhu, Z., and Du, X., 2015, “Time-Dependent Reliability Analysis for Bivariate Responses,” ASME Paper No. IMECE2015-53441.
33. Hu, Z., and Mahadevan, S., 2016, “A Single-Loop Kriging Surrogate Modeling for Time-Dependent Reliability Analysis,” ASME J. Mech. Des., 138(6), p. 061406.
34. Mahadevan, S., and Dey, A., 1997, “Adaptive Monte Carlo Simulation for Time-Variant Reliability Analysis of Brittle Structures,” AIAA J., 35(2), pp. 321–326.
35. Melchers, R. E., 1999, Structural Reliability Analysis and Prediction, Wiley, New York.
36. Song, J., and Der Kiureghian, A., 2003, “Bounds on System Reliability by Linear Programming,” ASME J. Eng. Mech., 129(6), pp. 627–636.
37. Bichon, B. J., McFarland, J. M., and Mahadevan, S., 2011, “Efficient Surrogate Models for Reliability Analysis of Systems With Multiple Failure Modes,” Reliab. Eng. Syst. Saf., 96(10), pp. 1386–1395.
38. Fauriat, W., and Gayton, N., 2014, “AK-SYS: An Adaptation of the AK-MCS Method for System Reliability,” Reliab. Eng. Syst. Saf., 123, pp. 137–144.
39. Rasmussen, C. E., 2006, Gaussian Processes for Machine Learning, The MIT Press, Cambridge, MA.
40. Wang, P., Hu, C., and Youn, B. D., 2011, “A Generalized Complementary Intersection Method (GCIM) for System Reliability Analysis,” ASME J. Mech. Des., 133(7), p. 071003.
41. Lee, I., Choi, K., and Zhao, L., 2011, “Sampling-Based RBDO Using the Stochastic Sensitivity Analysis and Dynamic Kriging Method,” Struct. Multidiscip. Optim., 44(3), pp. 299–317.
42. Rahman, S., 2009, “Stochastic Sensitivity Analysis by Dimensional Decomposition and Score Functions,” Probab. Eng. Mech., 24(3), pp. 278–287.
43. Kuo, W., and Zhu, X., 2012, “Some Recent Advances on Importance Measures in Reliability,” IEEE Trans. Reliab., 61(2), pp. 344–360.
44. Li, C., and Mahadevan, S., 2016, “Role of Calibration, Validation, and Relevance in Multi-Level Uncertainty Integration,” Reliab. Eng. Syst. Saf., 148, pp. 32–43.