Random variables are commonly encountered in engineering applications, and their distributions are required for analysis and design, especially for reliability prediction during the design process. Distribution parameters are usually estimated from samples. In many applications, samples are in the form of intervals, and the estimated distribution parameters are then also intervals. Traditional reliability methodologies assume independent interval distribution parameters, but as shown in this study, the parameters are actually dependent since they are estimated from the same set of samples. This study investigates the effect of the dependence of distribution parameters on the accuracy of reliability analysis results. The major approach is numerical simulation and optimization. This study demonstrates that the independent distribution parameter assumption makes the estimated reliability bounds wider than the true bounds. The reason is that the actual combinations of the distribution parameters may not fill the entire box-type domain assumed by the independent interval parameter assumption. The results of this study not only reveal the cause of the imprecision of the independent distribution parameter assumption, but also demonstrate a need to develop new reliability methods that accommodate dependent distribution parameters.

## Introduction

Information used by engineers may not be precise and perfect, and engineers are often surrounded by uncertainty. Uncertainty is the difference between the present state of knowledge and the complete knowledge [1]. Uncertainty is usually classified into two types, aleatory uncertainty and epistemic uncertainty. Aleatory uncertainty describes the inherent variability associated with a physical system or environment. It comes from inherent randomness and irreducible variability in nature. For example, ocean wave loads acting on a ship, properties of a material, and dimensions of a mechanical component are all random variables.

Epistemic uncertainty, on the other hand, is due to the lack of knowledge about a physical system or environment. It could be reduced by acquiring more knowledge [2]. For example, without sufficient information about the coefficient of restitution for an impact simulation, engineers may initially estimate it as an interval with a large width, and this interval is due to the epistemic uncertainty. As more information is collected, the interval becomes narrower, reflecting a lower degree of uncertainty, and may eventually shrink to a point with no uncertainty.

Uncertainty is the major factor with which reliability analysis deals. Reliability is the probability that a system or component performs its intended function within a given period of time under specified conditions [3]. Reliability analysis is important in engineering applications given the catastrophic consequences when a failure occurs, and uncertainty should be considered in reliability analysis [4]. Aleatory uncertainty is commonly modeled by random variables with probability distributions, which are usually estimated from samples. In real applications, however, we may not get precise and complete information due to limitations of testing conditions and instrumentation, as well as experimental uncertainty. Sometimes, the information may be from judgment and experience. In those cases, samples may be bounded within intervals [5–7]. As a result, epistemic uncertainty arises.

Traditional reliability methodologies, such as the first-order reliability method (FORM) and the second-order reliability method [8], require a great amount of information to construct precise distributions of the input variables for a limit-state function, which predicts the state of a component or system, either in a working condition or a failure condition. As mentioned previously, the distributions of the input variables are often obtained from samples. If some of the samples are intervals, the distribution parameters, such as means and standard deviations, are also intervals. This means that the random input variables with aleatory uncertainty also have epistemic uncertainty in their distribution parameters. The latter uncertainty is, therefore, called the second-order uncertainty because it sits on top of the former uncertainty [9–11].

Although there are situations where the input variables include not only random variables but also intervals [12–16], this study focuses only on the second-order uncertainty. In other words, the scope of this study is the reliability prediction involving random input variables with interval distribution parameters. Researchers have studied the distribution parameter uncertainty. Kiureghian [17] introduced an index of reliability based on minimizing a penalty function and developed methods for quantifying the uncertainty in the measure of safety arising from the imperfect state of knowledge of distribution parameters. Elishakoff and Colombi [18] and Zhu and Elishakoff [19] proposed methods to tackle parameter uncertainty when scarce knowledge was present on acoustic excitation parameters. Qiu et al. [20] combined classical reliability theory and interval theory to obtain the system failure probability bounds from the statistical parameter intervals of the basic variables. Jiang et al. [21] developed a hybrid reliability model based on monotonic analysis for random variables with interval distribution parameters. Sankararaman and Mahadevan [22] proposed a computational methodology based on a Bayesian approach to quantify the individual contributions of variability and distribution parameter uncertainty in a random variable. Xie et al. [11] developed a single-loop optimization model, which combines the probability analysis loop and the interval analysis loop, to calculate the reliability bounds with the second-order uncertainty.

The above-mentioned methodologies treat the intervals of distribution parameters as independent. In fact, the parameters of a distribution are dependent because they are estimated from the same set of samples. The independent parameter assumption may make the estimated reliability bounds wider than the true bounds. The purpose of this study is to investigate the effect of dependent distribution parameters on the accuracy of reliability prediction. The major approach is numerical simulation and optimization. The results of this study not only reveal the cause of the inaccuracy of the independent distribution parameter assumption, but also demonstrate a need to develop new reliability methods that accommodate dependent distribution parameters.

The organization of this paper is as follows. Section 2 reviews the existing methods for estimating the distribution parameters of a random variable with mixed point and interval samples. Section 3 discusses a likelihood-based approach to estimate the distribution parameters with mixed point and interval samples; it also presents the investigation of how dependent interval distribution parameters affect the accuracy of reliability prediction. Such an effect is demonstrated by four examples in Sec. 4. Section 5 provides conclusions and the research needs for developing new reliability methods that can accommodate dependent interval distribution parameters.

## Review of Distribution Parameter Estimation

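For a random variable $X$ with $n$ independent point samples $(x_1, x_2, \dots, x_n)$, the likelihood function is the product of the probability density function (PDF) values at the samples:

$$L(\mathbf{p}) = \prod_{i=1}^{n} f(x_i \mid \mathbf{p})$$

where $f(x_i \mid \mathbf{p})$ is the PDF of $X$ at $x_i$ with distribution parameters $\mathbf{p}$.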

In engineering applications, it is also possible that some of the samples are in the form of intervals. For example, if the status of a system is checked periodically, the time to failure will be an interval between the previous instant of time when the system was working and the instant of time when the system is found in a failure state. If the measurement uncertainty is large, then the measured quantity will also be reported as an interval in the form of the best estimate plus and minus the uncertainty term.

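For a random variable $X$ with interval samples $(y_1, y_2, \dots, y_n)$, where $y_i \in [\underline{y}_i, \overline{y}_i]$, $i = 1, 2, \dots, n$, Gentleman and Geyer [25] constructed the following likelihood function using the cumulative distribution function (CDF) of $X$:

$$L(\mathbf{p}) = \prod_{i=1}^{n} \left[ F(\overline{y}_i \mid \mathbf{p}) - F(\underline{y}_i \mid \mathbf{p}) \right]$$

where $F(\overline{y}_i \mid \mathbf{p})$ is the CDF at the upper bound $\overline{y}_i$ of interval sample $y_i$ and $F(\underline{y}_i \mid \mathbf{p})$ is the CDF at the lower bound $\underline{y}_i$ of interval sample $y_i$.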

This likelihood function is constructed using the PDF of point data and the CDF of interval data. Then, the maximum likelihood estimate of **p** can be obtained by maximizing $L(p)$.
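A mixed likelihood of this kind can be sketched in a few lines. The following is a minimal illustration rather than the formulation used in the cited work: it assumes a normal distribution, borrows the load samples listed in Sec. 3.2, and maximizes the log-likelihood with a simple compass (pattern) search that was chosen only to keep the example free of external dependencies. The function names `mixed_log_likelihood` and `maximize_likelihood` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Load samples (kN) from Sec. 3.2: four point samples and six interval samples.
POINTS = [40.486, 31.252, 29.648, 36.285]
INTERVALS = [(23.816, 24.788), (24.78, 25.791), (31.765, 33.061),
             (29.755, 30.969), (39.815, 41.44), (35.797, 37.259)]


def mixed_log_likelihood(mu, sigma, points, intervals):
    """Log-likelihood with PDF terms for points and CDF differences for intervals."""
    dist = NormalDist(mu, sigma)
    ll = sum(math.log(max(dist.pdf(x), 1e-300)) for x in points)
    for lo, hi in intervals:
        prob = dist.cdf(hi) - dist.cdf(lo)   # P(lo <= X <= hi)
        ll += math.log(max(prob, 1e-300))    # guard against log(0)
    return ll


def maximize_likelihood(points, intervals, step=1.0, tol=1e-6):
    """Compass search over (mu, sigma); an illustrative optimizer only."""
    mids = [0.5 * (lo + hi) for lo, hi in intervals]
    mu, sigma = mean(points + mids), stdev(points + mids)  # starting guess
    best = mixed_log_likelihood(mu, sigma, points, intervals)
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_mu, cand_sigma = mu + d_mu, sigma + d_sigma
            if cand_sigma <= 0.0:
                continue  # the standard deviation must stay positive
            value = mixed_log_likelihood(cand_mu, cand_sigma, points, intervals)
            if value > best:
                mu, sigma, best = cand_mu, cand_sigma, value
                improved = True
        if not improved:
            step *= 0.5  # shrink the search pattern when no move helps
    return mu, sigma, best


mu_hat, sigma_hat, ll_hat = maximize_likelihood(POINTS, INTERVALS)
print(f"mu = {mu_hat:.3f} kN, sigma = {sigma_hat:.3f} kN, log-likelihood = {ll_hat:.3f}")
```

Note that this yields a single point estimate of the parameters, which is exactly the behavior discussed in this section: the interval nature of the data is absorbed into the likelihood.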

**p**. With the joint PDF $f_{\mathbf{p}}(\mathbf{p})$, the marginal PDF of each distribution parameter can be obtained. Also, the PDF is then given by

The above methods have the advantage of producing precise distributions even though some samples are intervals, thereby hiding the epistemic uncertainty and making reliability analysis easier. This treatment, however, produces only a single reliability prediction even though interval-type epistemic uncertainty exists. In Sec. 3, we will discuss a likelihood-based approach to estimate the intervals of distribution parameters from the mixed point and interval samples and then investigate the effects of dependent distribution parameters on reliability analysis.

## Effect of Dependent Distribution Parameters on Reliability Prediction

In this section, we first use the maximum likelihood approach to obtain the interval distribution parameters from point and interval samples. Instead of calculating the full likelihood [26], we estimate the lower and upper bounds of distribution parameters by using the interval samples. We then show the dependence of the distribution parameters. Finally, we discuss how the dependent interval distribution parameters affect the reliability analysis result.

### Estimation of Distribution Parameters.

In this subsection, a likelihood-based approach is used to estimate the bounds of distribution parameters of a random variable *X* with the mixed point and interval samples.

where $f(x_i \mid \mathbf{p})$ and $f(y_j \mid \mathbf{p})$ are the PDFs of point sample $x_i$ and interval sample $y_j$, respectively, given distribution parameter **p**.

With the point and interval samples available, the bounds of distribution parameter $[\underline{\mathbf{p}}, \overline{\mathbf{p}}]$ can be obtained.

Changing $\min_{\mathbf{y}} \sigma_{F_0}$ to $\max_{\mathbf{y}} \sigma_{F_0}$, we also obtain the maximum standard deviation $\overline{\sigma}_{F_0}$. Then, the interval distribution parameters $\mu_{F_0} \in [\underline{\mu}_{F_0}, \overline{\mu}_{F_0}]$ and $\sigma_{F_0} \in [\underline{\sigma}_{F_0}, \overline{\sigma}_{F_0}]$ are available.

### Dependence Between Distribution Parameters.

Theoretically, distribution parameters are dependent because they are estimated from the same set of samples. Note that this dependence is unlike the statistical dependence between random variables. The latter can be reflected by the joint distribution or covariance between two random variables [27]. We will continue to use the approach in Sec. 3.1 to reveal the dependent relationship between distribution parameters of a random variable. The method we use is numerical simulation.

As discussed previously, the load $F_0$ follows a normal distribution $F_0 \sim N(\mu_{F_0}, \sigma_{F_0}^2)$ kN. The ten samples of the load include four points $(x_1, x_2, x_3, x_4) =$ (40.486, 31.252, 29.648, 36.285) kN and six intervals $(y_1, y_2, \dots, y_6) =$ ([23.816, 24.788], [24.78, 25.791], [31.765, 33.061], [29.755, 30.969], [39.815, 41.44], [35.797, 37.259]) kN. Using Eqs. (10) and (12), we obtain the bounds of the mean and standard deviation as $\mu_{F_0} \in [\underline{\mu}_{F_0}, \overline{\mu}_{F_0}] = [32.340, 33.098]$ kN and $\sigma_{F_0} \in [\underline{\sigma}_{F_0}, \overline{\sigma}_{F_0}] = [5.358, 6.085]$ kN, respectively.
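The bound calculation can be illustrated with the samples above. The sketch below rests on an assumption: Eqs. (10) and (12) are not reproduced in this text, so the estimators are taken to be the sample mean and the unbiased (divide by $n-1$) sample standard deviation. Under that assumption the computed bounds are consistent with the reported intervals. The optimization itself is replaced by two elementary devices that are not necessarily what the authors used: vertex enumeration for the maximum (the variance is convex in the samples, so its maximum over a box lies at a vertex) and coordinate descent for the minimum. All function names are hypothetical.

```python
import itertools
import statistics

# Load samples (kN): four point samples and six interval samples.
POINTS = [40.486, 31.252, 29.648, 36.285]
INTERVALS = [(23.816, 24.788), (24.78, 25.791), (31.765, 33.061),
             (29.755, 30.969), (39.815, 41.44), (35.797, 37.259)]


def mean_bounds(points, intervals):
    """The sample mean increases with every sample, so its extremes use the interval ends."""
    n = len(points) + len(intervals)
    lower = (sum(points) + sum(lo for lo, _ in intervals)) / n
    upper = (sum(points) + sum(hi for _, hi in intervals)) / n
    return lower, upper


def max_std(points, intervals):
    """Enumerate all 2^m vertices of the interval box and keep the largest std."""
    return max(statistics.stdev(points + list(vertex))
               for vertex in itertools.product(*intervals))


def min_std(points, intervals, sweeps=200):
    """Coordinate descent: move each interval sample to the mean of the others, clipped."""
    y = [0.5 * (lo + hi) for lo, hi in intervals]
    for _ in range(sweeps):
        for j, (lo, hi) in enumerate(intervals):
            others = points + y[:j] + y[j + 1:]
            target = sum(others) / len(others)   # unconstrained minimizer for y[j]
            y[j] = min(max(target, lo), hi)      # project back onto the interval
    return statistics.stdev(points + y)


mu_lo, mu_hi = mean_bounds(POINTS, INTERVALS)
sigma_lo, sigma_hi = min_std(POINTS, INTERVALS), max_std(POINTS, INTERVALS)
print(f"mean in [{mu_lo:.3f}, {mu_hi:.3f}] kN")
print(f"std  in [{sigma_lo:.3f}, {sigma_hi:.3f}] kN")
```

The minimum and maximum of the standard deviation are reached at different realizations of the interval samples than the minimum and maximum of the mean, which is the root of the dependence examined in the rest of this section.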

If we do not consider the dependence between the two distribution parameters, the possible values of the two parameters vary in a box defined by $\mu_{F_0} \in [\underline{\mu}_{F_0}, \overline{\mu}_{F_0}] = [32.340, 33.098]$ kN and $\sigma_{F_0} \in [\underline{\sigma}_{F_0}, \overline{\sigma}_{F_0}] = [5.358, 6.085]$ kN. The box is plotted in Fig. 1.

Since the actual distribution parameters are constrained within the box, the reliability prediction will also reside within an interval. The width of the reliability interval determines the precision of the reliability prediction and the amount of epistemic uncertainty in the prediction, which of course depends on the size of the box of the distribution parameters. As we will see, the actual points of $(\mu_{F_0}, \sigma_{F_0})$ may not occupy the entire area of the box, so the bounds of the reliability prediction using the box constraint will likely be wider than the actual bounds. If the possible points of $(\mu_{F_0}, \sigma_{F_0})$ do not occupy the entire box, they must form another shape instead of a rectangle. In other words, the distribution parameters are dependent.

To study the dependence between distribution parameters, we can analytically derive equations that define the shape (boundary) of the domain of the distribution parameters. If the shape is not rectangular, the distribution parameters are indeed dependent. To find the shape more easily, we actually perform experiments by random sampling. We generate a sufficient number of points in the range of an interval sample, which represent possible values of that interval sample. To make the simulation process efficient, we choose to use a uniform distribution for an interval sample. There are two advantages of doing so. First, the bounds of the uniform distribution are the same as those of the interval sample. The generated points will not be outside the range of the interval sample. Second, we can get good uniformity. Other distributions could also be used. For example, a normal distribution can be used, but it has to be truncated. Note that using a distribution to obtain realizations of an interval sample herein does not mean that the interval sample follows that distribution.

For this example, the four point samples are held constant, while the six interval samples are randomly simulated. The actual value of each interval sample is drawn from a uniform distribution over its interval. In total, $10^{5}$ sets of samples are obtained, and the same number of means and standard deviations are calculated. The results are shown in Fig. 2. For comparison, the box discussed previously is also plotted in Fig. 2. It is seen that the actual domain of the distribution parameters is smaller than the box-type hyper-rectangle. The simulation indicates that the actual points of $(\mu_{F_0}, \sigma_{F_0})$ do not appear at the four corners of the box.
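The sampling experiment described above can be imitated as follows. This is a sketch under stated assumptions, not the authors' script: the estimators are assumed to be the sample mean and the unbiased sample standard deviation, the box is taken from the rounded bounds reported in this section, and the normalized "corner gap" measure is a hypothetical diagnostic introduced only to quantify how far the simulated points stay from the corners.

```python
import math
import random

POINTS = [40.486, 31.252, 29.648, 36.285]
INTERVALS = [(23.816, 24.788), (24.78, 25.791), (31.765, 33.061),
             (29.755, 30.969), (39.815, 41.44), (35.797, 37.259)]
MU_BOX = (32.340, 33.098)     # reported bounds of the mean (kN)
SIGMA_BOX = (5.358, 6.085)    # reported bounds of the standard deviation (kN)


def mean_and_std(sample):
    """Sample mean and unbiased sample standard deviation (assumed estimators)."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    return m, s


def simulate_parameters(n_sets=100_000, seed=0):
    """Draw each interval sample uniformly within its bounds and re-estimate."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_sets):
        realization = [rng.uniform(lo, hi) for lo, hi in INTERVALS]
        pairs.append(mean_and_std(POINTS + realization))
    return pairs


def corner_gaps(pairs):
    """Smallest distance, in units of the box widths, from the cloud to each corner."""
    mu_width = MU_BOX[1] - MU_BOX[0]
    sigma_width = SIGMA_BOX[1] - SIGMA_BOX[0]
    gaps = {}
    for mu_c in MU_BOX:
        for sigma_c in SIGMA_BOX:
            gaps[(mu_c, sigma_c)] = min(
                math.hypot((m - mu_c) / mu_width, (s - sigma_c) / sigma_width)
                for m, s in pairs)
    return gaps


pairs = simulate_parameters()
gaps = corner_gaps(pairs)
for corner, gap in gaps.items():
    print(f"corner {corner}: nearest simulated point at normalized distance {gap:.2f}")
```

Uniform sampling concentrates points near the center of the domain, so the simulated cloud shows the shape of the dependence but underestimates the true extent of the domain; locating the exact boundary requires optimization.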

To investigate how the pattern of dependence changes with respect to the number of interval samples, we also vary the number of interval samples from one to nine while keeping the sample size (total number of point samples and interval samples) as ten. The results are shown in Fig. 3. Figure 3(a) shows the dependent distribution parameter relationship for one interval and nine points; likewise, Fig. 3(h) shows the dependent distribution parameter relationship for nine intervals and one point.

Although no clear patterns could be identified, the results clearly indicate that the actual domains of the distribution parameters are smaller than the box-type domains. In Sec. 3.3, we will discuss reliability analysis with interval samples.

### Reliability Analysis With Interval Distribution Parameters.

To demonstrate the impact of dependent distribution parameters on reliability prediction, we discuss two reliability methods. The first is the traditional reliability analysis that uses the box (bounds) of the distribution parameters directly without accounting for the dependence between the distribution parameters. We call this method *the independence method*. The second method uses the raw sample data of input random variables, including both point and interval samples. We call this method *the dependence method*. Both methods will produce interval reliability because of interval samples. For engineering applications, we always prefer narrower bounds of reliability prediction, that is, a smaller width of the reliability interval. As we will see, the two methods produce different reliability bounds; the latter method accounts for the dependence of distribution parameters, generates narrower reliability bounds, and is therefore preferable.

Note that the purpose of our discussion is not to develop a new reliability methodology with dependent distribution parameters; instead, we are interested in understanding how the dependence of distribution parameter intervals affects the precision of reliability prediction. The findings will demonstrate the need for new reliability methodologies and will provide guidance for developing them.

Let the intervals of distribution parameters of *X* be $\mathbf{p} \in [\underline{\mathbf{p}}, \overline{\mathbf{p}}]$. The probability of failure $p_f$ is a function of **p**, namely, $p_f = p_f(\mathbf{p})$. As a result, $p_f$ is also an interval and $p_f \in [\underline{p}_f, \overline{p}_f]$, where $\underline{p}_f$ and $\overline{p}_f$ are lower and upper bounds of $p_f$, respectively. Next, we discuss how to obtain the bounds of $p_f$.

For the upper bound or maximum probability of failure $\overline{p}_f$, the first line of the optimization model in Eq. (15) is changed to $\max_{\mathbf{p}} p_f(\mathbf{p})$.

For $\overline{p}_f$, the first line of the optimization model in Eq. (16) is changed from $\min_{\mathbf{y}} p_f(\mathbf{y})$ to $\max_{\mathbf{y}} p_f(\mathbf{y})$.

Note that in the independence method, the distribution parameters are assumed independent within box-type constraints. The dependence method using raw data accounts for dependent distribution parameters automatically. As discussed previously, the box-type domain of interval distribution parameters in the former method is larger than and also covers the domain of interval distribution parameters used in the latter method. Roughly speaking, the feasible region of the optimization in the former method is larger than and covers that in the latter method. As a result, the bounds of the probability of failure of the former method are wider than those of the latter method. In Sec. 4, we will demonstrate this with examples.
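The contrast between the two methods can be made concrete with a toy problem. Everything specific in the sketch below is an assumption made for illustration: failure is taken to occur when the normally distributed load $F_0$ of Sec. 3.2 exceeds a hypothetical capacity of 45 kN, the estimators are the sample mean and unbiased sample standard deviation, and the optimization over the interval samples is replaced by vertex enumeration plus random search, which gives only an inner estimate of the dependence bounds. It is not the optimization model of Eqs. (15) and (16).

```python
import itertools
import math
import random
from statistics import NormalDist

POINTS = [40.486, 31.252, 29.648, 36.285]
INTERVALS = [(23.816, 24.788), (24.78, 25.791), (31.765, 33.061),
             (29.755, 30.969), (39.815, 41.44), (35.797, 37.259)]
MU_BOX = (32.340, 33.098)     # reported bounds of the mean (kN)
SIGMA_BOX = (5.358, 6.085)    # reported bounds of the standard deviation (kN)
CAPACITY = 45.0               # kN, hypothetical threshold (not from the paper)


def failure_probability(mu, sigma, capacity=CAPACITY):
    """P(F0 > capacity) for a normal load."""
    return 1.0 - NormalDist(mu, sigma).cdf(capacity)


def estimate(sample):
    """Sample mean and unbiased sample standard deviation (assumed estimators)."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    return m, s


def independence_bounds():
    """pf is monotonic in mu and sigma here (capacity > mu), so check the box corners."""
    values = [failure_probability(m, s) for m in MU_BOX for s in SIGMA_BOX]
    return min(values), max(values)


def dependence_bounds(n_random=20_000, seed=0):
    """Search over realizations of the raw interval samples (crude inner estimate)."""
    rng = random.Random(seed)
    candidates = list(itertools.product(*INTERVALS))
    candidates += [tuple(rng.uniform(lo, hi) for lo, hi in INTERVALS)
                   for _ in range(n_random)]
    values = [failure_probability(*estimate(POINTS + list(y))) for y in candidates]
    return min(values), max(values)


indep_lo, indep_hi = independence_bounds()
dep_lo, dep_hi = dependence_bounds()
print(f"independence method: pf in [{indep_lo:.5f}, {indep_hi:.5f}]")
print(f"dependence method:   pf in [{dep_lo:.5f}, {dep_hi:.5f}]")
```

Because every realization of the interval samples maps to a parameter pair inside the box, the dependence bounds can never fall outside the independence bounds. How much narrower they are depends on the limit state and the data.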

## Examples

In this section, we use four examples to demonstrate the effect of dependent interval distribution parameters on reliability prediction. The four examples cover different situations. Example 1 has only one random variable with interval samples, and it is therefore easy to change the number of interval samples to study its impact on the dependence between distribution parameters. Example 2 involves more random variables with interval samples. While random variables in the first two examples are normally distributed, those in Example 3 are non-normally distributed. The first three examples have analytical solutions, but no analytical solution exists for Example 4. The bounds of the probability of failure are obtained from both the independence and dependence methods. We used sequential quadratic programming in the MATLAB Optimization Toolbox for optimization.

### Example 1: Reliability Prediction With Point and Interval Load Samples.

As shown in Fig. 4, a resultant force $Q = \sum_{i=1}^{k} Q_i$, $k = 1, 2, 3$, is applied to the end of a beam. There are three cases: one force $Q_1$ in case 1 ($k=1$), two forces $Q_1$ and $Q_2$ in case 2 ($k=2$), and three forces $Q_1$, $Q_2$, and $Q_3$ in case 3 ($k=3$). The forces $Q_i$, $i = 1, 2, 3$, are independently and identically distributed with the same distribution as $Q_0$. The samples of $Q_0$ are obtained from experiments. The ten samples include four points $(x_1, x_2, x_3, x_4)$ and six intervals $(y_1, y_2, y_3, y_4, y_5, y_6)$. The samples are given in Table 1. The distribution of $Q_0$ is normal, and the yield strength of the beam is $S_y = kS$ ($k = 1, 2, 3$) for the three cases. All the information available is summarized in Table 2.

where $l$ is the beam length, and $d$ is the beam width and thickness. $G<0$ indicates a failure.

Using the samples of $Q_0$ in Table 1 and Eqs. (10) and (12), we obtain the bounds of the mean and standard deviation of $Q_0$ as follows: $\mu_{F_0} \in [\underline{\mu}_{F_0}, \overline{\mu}_{F_0}] = [32.340, 33.098]$ kN and $\sigma_{F_0} \in [\underline{\sigma}_{F_0}, \overline{\sigma}_{F_0}] = [5.3582, 6.0849]$ kN.

For $\overline{p}_f$, the first line of the model in Eq. (20) is changed to $\max_{\mathbf{y}} p_f(y_1, y_2, \dots, y_6)$.

As seen in Eq. (20), the six intervals are used as constraints. Table 3 shows the bounds of $p_f$ from both methods.

The results indicate that the dependence method produces narrower bounds of $p_f$ than those from the independence method. The average reduction of the bound width from the former method is about 31%. For this problem with a linear limit-state function, the solution to $p_f$ in Eq. (18) is exact, and the bounds obtained from the dependence method are the true bounds. The independence method produces wider bounds, which therefore contain a higher amount of epistemic uncertainty in the predicted probability of failure.

In addition to the study with six interval samples for $F_0$ discussed earlier, we also assume other numbers of interval samples as shown in Table 4. The dependent relationship between the mean and standard deviation with the increasing number of intervals has been shown previously in Fig. 3. Figure 5 shows the bounds of $p_f$ for case 1. The results also indicate that the bounds are much wider than the true bounds if the mean and standard deviation are considered independent.

### Example 2: Reliability Prediction With Point and Interval Samples of Both Strength and Load.

This example is modified from case 3 ($k=3$) in Example 1. The samples of $Q_0$ have already been given in Table 1. The samples of the yield stress $S_y$ also include intervals and are given in Table 5. $S_y$ is normally distributed and is independent of $Q_0$. All the information available is given in Table 6.

Using the samples of $S_y$ in Table 5 and Eqs. (10) and (12), we obtain the following bounds for $S_y$: $\mu_{S_y} \in [\underline{\mu}_{S_y}, \overline{\mu}_{S_y}] = [7.0855, 7.2628] \times 10^{7}$ Pa and $\sigma_{S_y} \in [\underline{\sigma}_{S_y}, \overline{\sigma}_{S_y}] = [5.5012, 7.2305] \times 10^{6}$ Pa.

Using simulation, we find the dependent relationship between the mean and standard deviation of the yield strength $S_y$ as shown in Fig. 6.

where $y_i$ $(i = 1, 2, \dots, 6)$ are interval samples of $Q_0$ and $z_j$ $(j = 1, 2, \dots, 6)$ are interval samples of $S_y$. If we change the first line of the model in Eq. (22) to $\max_{\mathbf{z}, \mathbf{y}} p_f(y_1, \dots, y_6; z_1, \dots, z_6)$, we obtain $\overline{p}_f$. The twelve intervals are used as constraints.

Table 7 shows the bounds of $p_f$ obtained from the two methods. The results indicate that the dependence method with raw data produces much narrower bounds of $p_f$ than those from the independence method with the independent distribution parameter assumption. The reduction of the bound width is about 89%.

### Example 3: Reliability Prediction With Point and Interval Samples of Young's Modulus and Load.

This problem is a modification of the example given in Ref. [28]. Non-normal variables are involved. As shown in Fig. 7, a load *p* is uniformly distributed on a simply supported beam, whose length, width, and height are *l*, *b*, and *h*, respectively. The beam dimensions are given in Table 8. The load *p* and Young's modulus *E* follow lognormal distributions, and their samples are given in Table 9. All the random variables are independent.

where $\delta $ is the allowable deflection, and $\delta =16$ mm.

Using the samples of *E* and *p* in Table 9 and Eqs. (10) and (12), we obtain the bounds of the means and standard deviations of *E* and *p* as shown in Table 10.

Using the means and standard deviations of the random variables, we obtain their distribution parameters given in Table 11.
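For a lognormal variable, the step from a mean and standard deviation to distribution parameters is usually the standard moment relation sketched below. This assumes that the parameters in Table 11 are the mean and standard deviation of $\ln X$, which is the common parameterization but is not stated explicitly in this text. The numerical values in the example call are hypothetical and are not taken from Tables 9 to 11.

```python
import math


def lognormal_parameters(mean, std):
    """Return (mu_ln, sigma_ln), the mean and std of ln X, from the mean and std of X."""
    cov_sq = (std / mean) ** 2                   # squared coefficient of variation
    sigma_ln = math.sqrt(math.log(1.0 + cov_sq))
    mu_ln = math.log(mean) - 0.5 * sigma_ln ** 2
    return mu_ln, sigma_ln


def lognormal_moments(mu_ln, sigma_ln):
    """Inverse map: mean and std of X from the parameters of ln X."""
    mean = math.exp(mu_ln + 0.5 * sigma_ln ** 2)
    std = mean * math.sqrt(math.exp(sigma_ln ** 2) - 1.0)
    return mean, std


# Hypothetical values for illustration only.
mu_ln, sigma_ln = lognormal_parameters(mean=200.0, std=20.0)
back_mean, back_std = lognormal_moments(mu_ln, sigma_ln)
print(f"mu_ln = {mu_ln:.5f}, sigma_ln = {sigma_ln:.5f}")
print(f"round trip: mean = {back_mean:.3f}, std = {back_std:.3f}")
```

Under this parameterization, `mu_ln` increases with the mean and decreases with the standard deviation, while `sigma_ln` does the opposite, so interval bounds on the mean and standard deviation translate into interval bounds on the lognormal parameters.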

where $w_i$ $(i = 1, 2, \dots, 6)$ are interval samples of $E$, and $v_j$ $(j = 1, 2, \dots, 6)$ are interval samples of $p$. $\overline{p}_f$ is obtained by changing the first line of the optimization model in Eq. (31) to $\max_{\mathbf{w}, \mathbf{v}} p_f(w_1, \dots, w_6; v_1, \dots, v_6)$. The twelve interval samples are used as the constraints.

Table 12 shows the bounds of $p_f$ from the two methods. The results indicate that the dependence method produces much narrower bounds than those from the independence method. The reduction of the bound width is about 75%.

### Example 4: Reliability Prediction With Point and Interval Samples of Young's Modulus and Load.

A cantilever tube [7] is shown in Fig. 10. It is subjected to three forces $F_1$, $F_2$, and $P$, as well as a torque $T$.

where $L_1 = 120$ mm and $L_2 = 60$ mm.

All the distributions with precise distribution parameters are given in Table 13. The parameters of $F_1$, $\theta_1$, and $\theta_2$ are estimated from 25 point samples and five interval samples. They are normally distributed. The bounds of their means and standard deviations are given in Table 14.

Unlike the previous three examples, there is no analytical solution to the probability of failure, so the FORM [8] is used. The results are given in Table 15, which again indicate that the consideration of dependent distribution parameters produces narrower bounds of $p_f$.
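For readers unfamiliar with the FORM, the following sketch shows one common variant, the Hasofer-Lind and Rackwitz-Fiessler (HL-RF) iteration, for independent normal inputs. It is an illustration under assumptions and not the implementation used for the cantilever tube: the limit-state function of this example is not reproduced in this text, so a hypothetical linear limit state (strength minus load, with made-up means and standard deviations) is used, for which the FORM result can be checked against the closed-form reliability index.

```python
import math
from statistics import NormalDist


def form_hlrf(g, means, stds, tol=1e-6, max_iter=100, h=1e-6):
    """HL-RF iteration for independent normal inputs; returns (beta, pf).

    pf = Phi(-beta) assumes the mean point is in the safe region, g(means) > 0.
    """
    def g_u(u):
        # Evaluate g at the physical point corresponding to standard normal u.
        return g([m + s * ui for m, s, ui in zip(means, stds, u)])

    u = [0.0] * len(means)
    for _ in range(max_iter):
        g0 = g_u(u)
        grad = []
        for i in range(len(u)):              # forward-difference gradient in u-space
            u_step = list(u)
            u_step[i] += h
            grad.append((g_u(u_step) - g0) / h)
        norm_sq = sum(gi * gi for gi in grad)
        scale = (sum(gi * ui for gi, ui in zip(grad, u)) - g0) / norm_sq
        u_new = [scale * gi for gi in grad]  # HL-RF update toward the limit state
        converged = math.dist(u_new, u) < tol
        u = u_new
        if converged:
            break
    beta = math.sqrt(sum(ui * ui for ui in u))
    return beta, NormalDist().cdf(-beta)


def limit_state(x):
    """Hypothetical limit state: strength x[0] minus load x[1]; negative means failure."""
    return x[0] - x[1]


# Hypothetical means and standard deviations, for illustration only.
beta, pf = form_hlrf(limit_state, means=[70.0, 50.0], stds=[5.0, 6.0])
beta_exact = (70.0 - 50.0) / math.sqrt(5.0 ** 2 + 6.0 ** 2)
print(f"FORM beta = {beta:.4f}, closed-form beta = {beta_exact:.4f}, pf = {pf:.5f}")
```

In a bound calculation of the kind described here, a routine like this would sit inside an outer optimization over the distribution parameters or the interval samples, so it is called once for every candidate evaluated by the outer loop.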

Figure 11 shows the simulated distribution parameter combinations from the raw data and the combinations where the two methods obtain the extreme values of $p_f$. The independence method obtains the extreme values at the lower-left and upper-right corners of the box, but the simulation shows it is impossible to reach the two corners. This is the reason why the box model produces wider bounds of $p_f$.

## Conclusions

When interval samples exist, the distribution parameters of a random input variable are also intervals. The distribution parameters are dependent because they are estimated from the same set of samples. If the dependence is not considered, the domain of the distribution parameters is a box-shaped hyper-rectangle, which is determined by the lower and upper bounds of each distribution parameter. This study shows that the actual domain of the distribution parameters is not a hyper-rectangle and that the pattern depends on the number of interval samples. This study also finds that the actual domain is enclosed by and is smaller than the box-shaped hyper-rectangular domain. In addition, ignoring the dependence of distribution parameters may result in wider reliability bounds than the true ones, making decision-making difficult.

In this study, we use the raw interval samples as constraints to find the extreme values of the probability of failure. The efficiency of this approach, however, may not be high. The optimization requires repeated calls to the reliability analysis for the combinations of possible distribution parameters within the parameter bounds. The FORM itself also needs an optimization process. If FORM is employed, a double-loop procedure will be necessary. As a result, there is a need to develop new reliability methods that can efficiently accommodate dependent distribution parameters.

In many situations, raw point and interval samples are proprietary and may not be available to reliability engineers and design engineers, who know only the simple bounds of distribution parameters. As a result, they can only assume that the distribution parameters are independent. Another future task is to determine how to report distributions and their parameters to reliability engineers and design engineers so that the dependence of the distribution parameters can be conveyed without giving the raw samples.

## Acknowledgment

This material is based upon work supported by the Intelligent Systems Center at the Missouri University of Science and Technology.

## Funding Data

The National Science Foundation (Grant Nos. CMMI 1300870 and CMMI 1562593).