In a Bayesian network (BN), how a node of interest is affected by the observation at another node is a main concern, especially in backward inference. This challenge necessitates the proposed global sensitivity analysis (GSA) for BN, which calculates the Sobol’ sensitivity index to quantify the contribution of an observation node toward the uncertainty of the node of interest. In backward inference, a low sensitivity index indicates that the observation cannot reduce the uncertainty of the node of interest, so that a more appropriate observation node providing higher sensitivity index should be measured. This GSA for BN confronts two challenges. First, the computation of the Sobol’ index requires a deterministic function while the BN is a stochastic model. This paper uses an auxiliary variable method to convert the path between two nodes in the BN to a deterministic function, thus making the Sobol’ index computation feasible. Second, the computation of the Sobol’ index can be expensive, especially if the model inputs are correlated, which is common in a BN. This paper uses an efficient algorithm proposed by the authors to directly estimate the Sobol’ index from input–output samples of the prior distribution of the BN, thus making the proposed GSA for BN computationally affordable. This paper also extends this algorithm so that the uncertainty reduction of the node of interest at given observation value can be estimated. This estimate purely uses the prior distribution samples, thus providing quantitative guidance for effective observation and updating.

Introduction

During the past 30 years, the Bayesian network (BN) has become a key method for representation and reasoning under uncertainty in the fields of engineering [1,2], machine learning [3,4], artificial intelligence [5,6], etc. In a BN, random variables are denoted by nodes and their dependence relationships are denoted by directed arcs (arrows). An arc (arrow) between two nodes indicates that the distribution of the downstream node (child node) is conditioned on the value of the upstream node (parent node), i.e., the arc is associated with a conditional probability distribution (CPD). The entire BN represents the joint probability distribution of the random variables. As long as the BN has been established, two tasks can be pursued: (1) calculate the distribution of a downstream node based on the given observation of an upstream node, i.e., forward propagation and (2) estimate the posterior distribution of an upstream node at the given observation of a downstream node, i.e., backward inference. Usually, the purpose of backward inference is to reduce the uncertainty of the variable (node) of interest, as shown in Fig. 1.

We can see that both the forward propagation and the backward inference concern that whether fixing a node would affect the distribution of another node. And this concern actually requires a sensitivity analysis: in forward propagation, if the node of interest has a low sensitivity with respect to an input node (an upstreaming node which causes uncertainty in the node of interest), we can be simply fix this input node to achieve dimension reduction without affecting the resultant distribution of the node of interest; and in backward propagation, if the node of interest has a low sensitivity with respect to an observation node, then this observation node cannot effectively reduce the uncertainty of the node of interest (as shown in Fig. 1(b)); thus, we should switch to observe a more appropriate node leading to higher sensitivity. Note that this sensitivity analysis is desired before conducting the forward propagation and/or the backward inference.

Various sensitivity analysis methods have been developed in the literature [7], and this paper selected the prominent variance-based Sobol’ index [813], one of the “global sensitivity analysis” (GSA) method that considers the entire distribution of the inputs. The Sobol’ index includes first-order and higher-order indices. While the higher-order index requires uncorrelated model inputs, Saltelli and Tarantola [14] pointed out that the first-order index is still an informed choice to rank the importance of correlated model inputs. This paper focuses on calculating the first-order index Si since the correlation among the nodes of a BN is very common. In a BN, the correlation between any two nodes can be established by the arcs (CPDs) between them. In other words, if the nodes are independent to each other, then they cannot even construct a BN. Consider a computational model in the form of y=f(x) where x={x1,,xk} is the vector of stochastic model inputs. The first-order Sobol’ index Si(i=1,2,,k) is 
Si=Vxi(Exi(y|xi))V(y)=1Exi(Vxi(y|xi))V(y)
(1)

where xi means all the model inputs other than xi, and V() means the variance, and E() means the mean value. For the numerator of the middle term of Eq. (1), VΦl(y)=EΦl(Vxi(y|xi))+VΦl(Exi(y|xi)) denotes the mean value of y across different values of xi at given value of xi, and Vxi() means the variance across different values of xi. Similarly, for the numerator of the right term of Eq. (1), Vxi(y|xi) denotes the variance of y across different values of xi at given value of xi, and Exi() means the variance across different values of xi. Furthermore, the middle term and the right term are equal due to the law of total variance V(y)=Vxi(Exi(y|xi))+Exi(Vxi(y|xi)).

Si quantifies the contribution of input xi by itself to the uncertainty in output y. Specifically, the last term of Eq. (1) also indicates that Si is the average ratio of the output variance reduction by fixing xi. For example, Si=0.3 means that the variance of y will reduce by 30% on average after fixing xi.

The proposed GSA for BN aims to calculate the first-order Sobol’ index to quantity the contribution of node X1 toward the uncertainty of the node of interest XN. (As mentioned earlier, X1 denotes an input node in the case of forward propagation, and an observation node in the case backward inference). However, this GSA for a BN confronts two challenges of feasibility and affordability. First, the computation of the Sobol’ index requires a deterministic function [15]. The term “deterministic function” means that if all the inputs are given specific values, the function outputs a single value. It does not matter whether the function is closed-form or implicit. But a unique input to the model gives a unique value of the output, and the variance-based Sobol’ index [16] needs this assumption. In a previous paper, we have discussed the need for a deterministic function in the Sobol index computation [10]. Our paper emphasizes the term deterministic function to contrast from a “stochastic” prediction; in the latter case, the function output is probabilistic (i.e., it has many possible realizations) even if all inputs are fixed. A simple stochastic model is a Gaussian variable XN(μ, σ2) if we consider μ and σ as model inputs and X as model output. The BN is a stochastic model, i.e., the node of interest still has a distribution even at given values of other nodes. And the required deterministic function mapping X1 (and some other variables) to the node of interest XN is unestablished. Proof of the existence and the establishment of this deterministic function needs to be solved. Note that usually the Sobol’ index is applied to deterministic function of continuous input/output, so that this paper only considers the continuous BN where all the nodes are continuous variables.

Second, the computation of the Sobol’ index can be expensive even if the deterministic function is established. Based on Eq. (1), computing Si analytically is nontrivial, since Exi() in Eq. (1) indicates a multidimensional integral. Computing Si by Monte Carlo simulation (MCS) directly is also expensive. The numerator in Eq. (1) leads to a double-loop MCS [7]: the inner loop Exi(y|xi) computes the mean value of y using Xtm random samples of xi; and the outer loop computes Vxi(Exi(y|xi)) by iterating the inner loop n2 times at different values of xi. Various algorithms have been proposed to reduce the cost in the computation of the first-order Sobol’ index, categorized into analytical methods [1719] and Monte Carlo methods [16,2023]. However, all of these algorithms assume uncorrelated model inputs, while the nodes in a BN are usually correlated. Saltelli's paper [23] in 2002 mentioned that there is no alternative to the expensive double-loop MCS to compute Si with correlated model inputs. This hurdle in computational cost is also to be solved.

The rest of the paper is organized as follows: Section 2 uses the auxiliary variable method to convert the path between node X1 and node XN to a deterministic function, thus making the Sobol’ index computation feasible for a BN. Section 3.1 introduces an efficient algorithm to directly estimate the Sobol’ index from Monte Carlo samples of the prior distribution of the BN, so that the proposed GSA for the BN is affordable. The resultant index is the averaged variance reduction ratio (VRR) of the node of interest across all possible values of the observation node. If a specific value of the observation node is given, the proposed method is extended to give a better estimate of variance reduction ratio at the given observation value. Section 4 illustrates the proposed method using two examples, including a time-independent static BN and a time-dependent dynamic BN.

Note: the sensitivity analysis in this paper is based on “current knowledge,” which means the joint distribution of the Bayesian network. This paper uses the prior distribution samples to conduct the sensitivity analysis since that is our current knowledge before any observation and updating. Of course, after collecting an observation and subsequent calculation of the posterior distribution, we can also use the samples from the posterior distribution to calculate the posterior sensitivity index.

Feasibility of Global Sensitivity Analysis for a Bayesian Network

Auxiliary Variable Method.

The auxiliary variable method was developed by Sankararaman and Mahadevan [24] to quantitatively distinguish the contributions of aleatory natural variability and epistemic distribution parameter uncertainty in a random variable x. The distribution of x is conditioned on the value of its distribution parameter θx. The uncertainty about the value of θx is represented by a probability density π(θx), and this uncertainty is epistemic. The aleatory uncertainty of x is denoted by a conditional distribution π(x|θx), which changes as the value of θx changes.

Based on the probability integral transform theorem [25], random sampling from the π(x|θx) is realized in two steps: (1) define a variable Ux of standard uniform distribution U(0,1) and generate its sample, which is taken as the cumulative density function (CDF) value of x, and (2) obtain a sample of x by inverting the conditional CDF F(x|θx), i.e., 
x=F1(Ux|θx)
(2)

Equation (2) is a deterministic function f:(Ux,θx)x, since a sample of θx and a sample of Ux lead to a single value of x. At a given value of θx, the sample of Ux and the sample of x have a one-to-one mapping, i.e., a single value of x is determined once the value of Ux is decided. Thus the natural variability in x is represented by Ux. This standard uniform random variable Ux, which is the CDF value of π(x|θx), is named as the auxiliary variable.

Although the use of auxiliary variable is a standard procedure in sampling random variables, generally it is used implicitly and only the resultant samples of the random variables are recorded and utilized. However, if we use this auxiliary variable explicitly, it makes the GSA for BN possible. This will be explained in the remainder of this section.

Deterministic Function for a Directed Path in Bayesian Network.

The auxiliary variable method have been extended to any variable whose distribution is conditioned on other variables [10,26], i.e., to any CPD in the BN. Assume that the distribution of a random variable C depends on the value of two other random variables A and B by a CPD π(C|A,B). Then the variability in π(C|A,B) can be captured by a single auxiliary variable UC, which is the CDF value of π(C|A,B). Thus, the uncertainty in variable C is caused by two components: (1) the uncertainty due to the parent nodes A,B; and (2) the uncertainty expressed by the CPD at given values of A and B. The introduced auxiliary variable captures the later part. (With reference to Sec. 2.1, A and B are analogous to distribution parameters for the random variable C). As shown in Fig. 2 (A), (B), and (C) constitute a simple BN. The introduced auxiliary variable UC converts C to be a deterministic node, which means the value of C is fixed once the value of its parent nodes {A,B,UC} is given. Finally, this auxiliary variable build a deterministic function C=F1(UC|A,B), where F1() is the inverse CDF of the CPD π(C|A,B).

The auxiliary variable method can be further extended to a directed path X1X2XN in a BN. Here, XN is the node of interest, and the objective is to compute the sensitivity index of X1 to decide whether XN is sensitive to it. An example of the directed path is ACEG in Fig. 3. As shown in Fig. 4, by introducing auxiliary variables to each CPD in this directed path, a deterministic function mapping X1 to XN is established

 
{X2=F1(UX2|PaX2,X1)X3=F1(UX3|PaX3,X2)XN=F1(UXN|PaXN,XN1)
(3)

where F1(UXi|PaXi, Xi1) for i=2 to N is the inverse CDF of the CPD π(Xi|PaXi,Xi1), and UXi is the auxiliary variable introduced for this CPD, and PaXi represents the parent nodes of Xi that are not in this path (Note that another notation PaV is used later, which means all the parents node of V, i.e., PaXi={PaXi,Xi1} in Fig. 4. The inputs of Eq. (3) are {X1,PXi,UXi} for i=2 to N, thus Eq. (3) can be also denoted as a deterministic function f:{X1,PXi,UXi}XN.

The deterministic function established in Eq. (3) can be illustrated by a simple BN in Fig. 3. For the directed path ACEG, thus the deterministic function mapping A to G is 
{C=F1(UC|A,B)E=F1(UE|C,D,F)G=F1(UG|E,F)
(4)

Deterministic Function for a Trail in Bayesian Network.

The directed path from X1 to XN in Sec. 2.2 requires that all the arcs are directed toward XN, which may NOT be satisfied. It is not usually that some arrows are toward X1. In backward inference, the node of interest XN is usually an upstream node (probably root node) while the observation node is usually a downstream node. Thus, the path between them turns to be X1X2XN, where all the arrows are toward X1. In this case, the function f:X1XN will be still missing.

To solve this problem, let consider a general case named “trail.” A trail X1X2XN (where the arc “ ” is still directed, either “ ” or “ ”) only requires all the adjacent nodes in the path are connected by arcs, regardless of the direction of the arcs. The deterministic function established in Eq. (3) for the directed path can be also extended to the trail based on the theorem of arc reversal [27].

Theorem 1 Arc Reversal. Given that there is an arc(V1,V2)from nodeV1to nodeV2, but no other directed path fromV1toV2, arc(V1,V2)can be replaced by arc(V2,V1). Afterward, both nodes inherit each other's parent nodes.

This theorem is illustrated in Fig. 5. Here, PaV1 indicates the parent nodes of V1, and PaV2 indicates the parent nodes of V2. In addition, PaV1\PaV2 are the nodes which are the parents of V1 but not the parents of V2, and correspondingly PaV2\PaV1 are the nodes which are the parents of V2 but not the parents of V1; and PaV1PaV2 are the shared parents of V1 and V2. Figure 5 shows that after reversing the arc between V1 and V2, extra arcs (PaV1\PaV2,V2) and (PaV2\PaV1,V1) are also derived based on Ref. [27] and added the new BN to guarantee that the new BN after arc reversal is mathematically equivalent to the original BN. The CPDs also need to be redefined, and the derivation of the new CPDs can be also found in Ref. [27]. However, note that the proposed method in this paper do NOT need to derive these new CPDs. The main focus of this section is to illustrate the possibility of arc reversal and prove the existence of the deterministic function mapping X1 to XN even if the path between them is undirected.

With respect to the trail between X1 and XN, Theorem 1 proves that the arc (Xi,Xj) between two adjacent nodes Xi and Xj (i=j+1 or j1 so that they are adjacent) can be reversed, as long as there is no other directed path from Xi to Xj. If all the arcs toward X1 can be reversed, this trail will be converted to a directed path from X1 to XN so that a deterministic function mapping X1 to XN exists based on Eq. (3). In Fig. 3, the trail HEG can be converted to a directed path HEG by reserving the arc (E, H); then, a deterministic function mapping H to G can be constructed using auxiliary variables.

The arc reversal makes the sensitivity analysis regarding the backward Bayesian inference possible. In this case, we aim to compute the Sobol’ index of observation node X1 regarding an inference node of interest XN so that we can quantify the variance reduction of XN by fixing node X1 at an observation value. The existing path in the Bayesian network is X1X2XN, then the arc reversal enables us to reverse all the arcs to build the required deterministic function.

It is arc reversal that makes the sensitivity analysis regarding Bayesian inference possible. In this case, we aim to compute the sensitivity of root node XN with respect to the observation node X1 so that we can quantify the uncertainty reduction of XN by fixing node X1 at an observation value. We need a directed path from X1 to XN. However, the existing path in the Bayesian network is always directed from root node XN toward the observation X1, so that we must reverse all the arcs to build the required deterministic function.

Global Sensitivity for a Path in the Bayesian Network.

For two nodes X1 and XN in the BN, Secs. 2.12.3 explained the possibility to build a deterministic function mapping X1 to XN via: (1) the directed path or trail between them, (2) the theorem of arc reversal, and (3) the introduction of auxiliary variables. Thus, we can conduct the GSA on Eq. (3) and compute the first-order Sobol’ index SX1 for X1. As explained below in Eq. (1), SX1 is the average ratio of the reduced variance of XN by fixing X1. In backward inference, a low value SX1 indicates that measuring X1 to reduce the uncertainty of XN is just a waste of effort, and another node of higher sensitivity index would be more appropriate.

However, the computation of SX1 is nontrivial. First, building the deterministic function explicitly can be complicated for both directed and trail. For the directed path X1XN, the effort to build the deterministic function in Eq. (3) becomes intensive if the path is long so that many nodes and auxiliary variables will be involved. For the trail, more efforts to modify the structure of the BN and derive new CPDs are required. In backward inference, usually the observation node is downstream and the node of interest is the upstream; thus, the corresponding path is X1XN. To build the required deterministic function mapping X1 to XN, all arcs in the path need to be reversed, which brings intensive computational efforts.

Second, even with the deterministic function established, calculating the sensitivity index also needs intensive effort. The inputs of the deterministic function include the nodes in the BN, so the correlation between them is very common. As mentioned in Sec. 1, since current efficient algorithms for Sobol’ index usually require uncorrelated inputs, the expensive double-loop MCS is the only choice.

Actually, the purpose of this section is to mathematically prove the existence of a deterministic function between two nodes (regardless of the computational effort), so that the Sobol’ index, which requires the deterministic function, is applicable to quantify the sensitivity for the Bayesian network. The Sobol’ index computation will be realized by an algorithm proposed by the authors. Details of this algorithm can be found in Ref. [28], and a brief introduction is given in Sec. 3.1. This algorithm directly extracts the first-order Sobol’ index from the input–output samples. Using this algorithm, the Sobol’ index of the observation node quantifying its contribution toward the uncertainty of the node of interest can be computed purely using their samples generated from the joint prior distribution of the BN, where the computation of the inference is NOT needed. In detail, we generate samples of the root nodes from their prior distributions and then generate samples of nonroot nodes based on the samples of their parent nodes and the CPDs. Therefore, explicitly establishing the deterministic function is not necessary, and the expensive double-loop MCS method is also avoided. Section 3.2 extends this algorithm to a better variance reduction estimate at a specific observation value.

Sobol’ Index Computation and Variance Reduction at Given Observation

Directly Estimate the First-Order Sobol’ Index From Input–Output Samples.

Consider a deterministic function of y=f(x) where x={x1,,xk}. This algorithm is regarding first-order Sobol’ index expression of Si=1Exi(Vxi(y|xi))/V(y), whose numerator Exi(Vxi(y|xi)) implies an expensive double-loop Monte Carlo simulation.

First we illustrate stratified sampling, which generates samples in equal probability intervals to represent the distribution of a random variable xi. Figure 6(a) shows one strategy [7] of stratified sampling: (1) divide the CDF of xi into M intervals such that these intervals have the same length and (2) generate one sample ul (the red dots in Fig. 6(a), and l=1 to M) from each CDF interval and obtain samples of xi (the green dots in Fig. 6) by CDF inversion xil=P1(ul), where P1() is the inverse CDF of xi. If we take the bounds of these intervals of the CDF as the inputs of P1(), the sampling space of xi is actually divided into M equally probable intervals Φl(l=1 to M), as shown in Fig. 6, and xil is actually a random sample generated within Φl. In this paper we also denote Φ={Φ1,Φ2,,ΦM}.

Vxi(y|xi) is a function of xi. Based on the extreme value theorem, Vxi(y|xi) must have a maximum value and a minimum value in Φl. The mean value of Vxi(y|xi), i.e., EΦl(Vxi(y|xi)) for xiΦl, is between the maximum and minimum values 
minxiΦl(Vxi(y|xi))EΦl(Vxi(y|xi))maxxiΦl(Vxi(y|xi))
(5)
Then the intermediate value theorem proves that 
xi#Φl s.t. Vxi(y|xi#)=EΦl(Vxi(y|xi))
(6)
With Xtm, the law of total variance is 
VΦl(y)=EΦl(Vxi(y|xi))+VΦl(Exi(y|xi))
(7)
where the subscript Φl means all the terms are constrained to xiΦl. Substituting Eq. (7) into Eq. (6), we can have 
xil#Φls.t.Vxi(y|xil#)=VΦl(y)VΦl(Exi(y|xi))
(8)

where xil# is an unknown but existing point in Φl. Note that now VΦl(y) is the variance of y given xiΦl and VΦl(Exi(y|xi)) is the variance of Exi(y|xi) given xiΦl.

The last term in Eq. (8) can be rewritten as 
VΦl(y)VΦl(Exi(y|xi))=(1VΦl(Exi(y|xi))VΦl(y))VΦl(y)
(9)
In Eq. (9), the term VΦl(Exi(y|xi))/VΦl(y)=SiΦl is nothing but the first-order Sobol’ index of xi as it is restricted to the interval Φl. We always have SiΦl<Si since the uncertainty of xi has been reduced significantly by restricting it in Φl such that its Sobol’ index will be much lower. And the value of SiΦl approaches zero as M increases so that the Φl is narrowed. Then we have VΦl(y)VΦl(Exi(y|xi))VΦl(y) so that Eq. (8) changes to 
Vxi(y|xil#)VΦl(y)
(10)
In the formula of Si=1Exi(Vxi(y|xi))/V(y), the outer loop Exi() requires fixing xi at different locations, and these selected locations are samples from the distribution of xi. Based on stratified sampling, the set of these unknown but existing points xi#={xi1#,,xiM#} from the equally probable intervals Φ={Φ1,,ΦM} can represent the distribution of xi. Based on Eq. (10), computation of Si is expressed as 
Si=1EΦ(Vxi(y|xil#))V(y)1EΦ(VΦl(y))V(y)
(11)

where Φl(l=1 to M) then Eq. (11) can be realized by the following steps:

  1. (1)

    Generate n random samples of x;

  2. (2)

    Obtain corresponding values of y by evaluating y=f(x), and estimate V(y) using all samples of y;

  3. (3)

    Divide the domain of xi into M equally probable intervals, as shown in Fig. 6;

  4. (4)

    Assign the samples of y into divided intervals based on one-to-one mapping between the samples of xi and samples of y;

  5. (5)

    Estimate VΦl(y) as the sampling variance of y in each interval;

  6. (6)

    Estimate EΦ(VΦl(y)) as the sampling mean of t in step 5;

  7. (7)

    The first-order index is Si=1EΦ(VΦl(y))/V(y).

The steps to realize the proposed method are also illustrated in Fig. 7, where samples in different equally probable intervals are represented in different colors. Step 1 in Fig. 7 shows that the proposed algorithm can directly estimate Si from input–output samples without rerunning the underlying model. Moreover, steps 3 and 4 show that the samples of xi are not used in calculating the index Si for xi; therefore, the calculation of Si purely depends on the samples of xi and y, and can be achieved even if the samples of xi are missing. In addition, the derivation of Eq. (11) does not assume uncorrelated inputs; thus, this algorithm can handle both correlated and uncorrelated inputs. These three advantages make this algorithm suitable for the GSA of a BN using the samples generated from the joint prior distribution.

One question is how many samples are needed to reach desirable accuracy of the computed index. This question has been addressed in an earlier paper [28] by the authors. A heuristic rule is: (1) at least 50 intervals and (2) each interval Φl contains at least 50 samples. To guarantee this condition, we simply need to generate 2500 samples.

Variance Reduction Assessment by the Proposed Algorithm.

As explained earlier, for a deterministic function y=f(x) where x={x1,,xk}, the proposed algorithm only requires the samples of input xi and output y to calculate the first-order Sobol’ index Si. And this calculation only needs to assign the samples to equally probable intervals of xi and requires no more functional evaluation. For two nodes X1 and XN in the BN, since the existence of the deterministic function mapping X1 to XN has been proved in Sec. 2, the calculation of the first-order Sobol’ index of node X1 quantifying its contribution toward the uncertainty in node XN only needs the samples of X1 and XN. Based on Eq. (11), the equation to compute the desired index SX1 is 
SX1=1EΦ(VΦl(XN))V(XN)
(12)

Specifically, the desired first-order index SX1 can be calculated by considering the X1 as “xi” and XN as “y” in Fig. 7.

The resultant index SX1 is the average ratio of the variance reduction of XN by fixing X1 at its observation; and this is an average over all possible values of X1. This is an informative estimate before the forward propagation or backward inference if the value of the observation is NOT known.

If the value of the observation of X1 is known, this VRR estimate can be further improved by identifying the equally probably interval where the observation is located and computing the local variance. Denote Φ̂ as the equally probable interval that contains the observation X̂1, i.e., X̂1Φ̂. Section 3.1 has proved that VΦ̂(XN)=V(XN|X1=X1#) where X1#Φ̂. Since X1# and X̂1 are in the same interval so that X̂1X1#, then we have VΦ̂(XN)V(XN|X1=X̂1). As the desired variance reduction ratio is 1V(XN|X1=X̂1)/V(XN), the improved estimate is 
VRR1VΦ̂(y)V(y)
(13)

Compared to Eq. (12) which computes the average VRR of XN over all possible values of X1, Eq. (13) estimates the VRR of XN at a specific value of X1. The accuracy of Eq. (13) will be higher if: (1) Φ̂ is narrower so that X̂1 is closer to X1# and (2) more samples of XN are assigned to Φ̂ so that VΦ̂(XN) is a better estimate of V(XN|X1=X1#).

Summary.

Section 3.1 introduces a new algorithm to efficiently estimate the first-order Sobol’ index from Monte Carlo samples. In the proposed algorithm, the calculation of the first-order Sobol’ index of an input does not need the samples of other inputs. This advantage makes it an ideal algorithm for the GSA of a BN, where generating prior samples are much easier than building the underlying deterministic function in discussed in Sec. 2. The proposed algorithm can also handle correlated inputs, so that it is suitable for a BN where the nodes are generally correlated.

Section 3.2 described the implementation of the proposed algorithm to the BN. Equation (12) can be used to calculate the average ratio of the variance reduction of a node of interest XN over all possible observations of another node X1 ; Eq. (13) provides a better estimate of variance reduction ratio if the value of the observation is known.

In Sec. 4, two numerical examples are provided to illustrate the proposed GSA method for the BN. The first example is a time-independent static BN, and the second example is a time-dependent dynamic BN.

Numerical Examples

Structural Dynamics Problem.

A structural dynamics problem provided by Sandia National Laboratories is used to illustrate the proposed method, and more details on this problem can be found in Refs. [2932]. As shown in Fig. 8, the system of interest contains three mass–spring–damper components in series; and these components are mounted on a beam supported by a hinge at one end and a spring at the other end; and a sinusoidal force input P=3000sin(350t) is applied on the beam.

This system has three model parameters of spring stiffnesses k=(k1,k2,k3) and they are assumed to have unknown true values to be calibrated. The prior distribution of ki is assumed to be Gaussian with a coefficient of variation of 10% and mean values of μk1=5000, μk2=10,000, and μk3=9000.

The quantity to be measured for model calibration is the maximum acceleration A3 in the third mass m3. A computational model A3=F(k) based on finite element analysis has been provided by Sandia National Laboratories [29].

To improve the computational efficiency, a Gaussian process (GP) [33,34] surrogate model A3=GP(k) is constructed to replace the expensive dynamics computational model. The prediction of the GP model is a Gaussian distribution N(μ(k),σ2(k)); thus, a CPD is given by the GP model.

The observation variable is denoted as D and we have D=A3+ϵm where ϵm is the measurement error with a zero-mean Gaussian distribution ϵmN(0,σm2). Thus, another CPD is given by the measurement error. In this example, σm is another parameter to be calibrated and we assign a noninformative uniform prior distribution U(150,250) to it.

A BN is established for this model calibration problem, as shown in Fig. 9. In this example, we are interested in: (1) calculating the first-order Sobol’ index of the observation variable D quantifying its contribution toward the uncertainty in each calibration parameters of interest {k1,k2,k3,σm} and (2) predicting the VRR of the calibration parameters at a given observation.

As samples are generated from the joint prior distribution of this network, the first-order Sobol’ indices of {k1,k2,k3,σm} are obtained by considering the calibration parameter as XN and the observation variable D as X1 in Eq. (12). The results are listed in Table 1. From this table, we conclude that the variance of k1 will reduce by 50% on average due to calibration; the variance reduction of k3 is 11% on average; while the variance of k2 and σm will not be reduced significantly by calibration. This is a very valuable insight, which is also supported by Fig. 10. Thus, if we want to reduce the uncertainty in k2, we need to observe another quantity. In the latter computation of VRR at specific observations of A3, we focus on k1 and k3.

Table 1 shows the average VRR. Now assume that the specific observed value of A3 is known (a synthetic data point). Based on Eq. (13), we predict the VRR of k1 and k3 for this specific observation, as shown in Table 2, where the “by inference” method mean we finish the Bayesian inference and compute the VRR based on the posterior distributions. In this example, the Bayesian inference is finished by the rejection sampling algorithm [35] which results in 2×104 samples from the posterior distributions of the calibration parameters. Figure 10 shows the probability density functions of these posterior distributions at data point of D=4500. We recalculate the VRR by comparing the variances of the posterior samples and the prior samples. As shown in Table 2, our earlier predictions are close to the sample-based results. However, the proposed method only uses the samples from the prior distribution, and no actual Bayesian inference effort is required. In a personal computer of Intel Core i7-4710HQ CPU, the proposed method spends less than 1 s to calculate the sensitivity index at a data point, while the other method by inference spends about 20 s.

In summary, this example verified the effectiveness of the proposed method to predict the variance reduction ratio before conducting the Bayesian inference. Thus, the proposed method provides valuable guidance for selecting observation nodes; for example, in the subsequent updating, nodes such as k2 and σm cannot be updated by observing A3 data.

Example of a Dynamic Bayesian Network.

This example applies the proposed method to a mathematical example of a dynamic BN, as shown in Fig. 11. The CPDs of this dynamic BN are as follows.

The root node C0 has an unknown true value to be calibrated, so that C0 is a static node and C0t=C0t+1. The prior distribution of C0 is N(2,0.52). C1 and C2 are two dynamic state variables, and their states are to be tracked. At t=1 the CPD of the child node C1 is C11N(C012+10, 12); at t>1 the CPD of C1 is C1tN(C0t2+0.9C1t1+1, 12), thus the distribution of C1 depends on its previous value and the value of C0. C2 is the child node of C1 and its CPD is C2tN(C1t2,52). In this problem, the observation node is C3 and its CPD is C3tN(C2t,(C2t/20)2), i.e., the value of C2t plus a measurement error of zero mean Gaussian distribution. This example considers the first 30 steps of this dynamic BN. Assuming the true value of C0 is 2.5, the synthetic data of the observation node C3 is generated at each step, as shown in Fig. 12.

A widely used particle filter method named sequential importance resampling (SIR) algorithm [36] is applied in this example to track the state variables. Here a particle is a sample from the joint distribution of the state variables. This SIR algorithm propagates the particles of the posterior distribution at time step t1 to time step t to obtain the particles of the prior distribution of time step t. The likelihoods of these particles are calculated and normalized as the weights for them. Then the particles are resampled based on the weight terms and the resultant new particles represent the posterior joint distribution of the state variables in time step t. Details of this SIR algorithm can be found in Ref. [36] and the Appendix.

The number of particles in this example is 50,000. The mean value and 95% bounds of the posterior distributions of the state variables are shown in Fig. 13. The uncertainty of C0 reduces and its posterior distribution approximates to its true value 2.5, but this uncertainty reduction is not significant after step 20.

At each step, before the calculation of the likelihoods and the particle resampling, we apply the proposed method using the particles of the prior joint distribution of the state variables. The VRR of each state variable is predicted by the proposed method of Eq. (13) using the prior samples of the state variables. This VRR is also calculated by the prior/posterior samples at each step. Figure 14 shows that the results from these two methods are consistent so that the proposed method is verified. Note that the proposed method uses the prior samples and the observation data, while the sample-based method needs both the prior and posterior sample. In other words, the proposed method can be applied before Bayesian inference, but the sample-based method happens after the Bayesian inference has been done.

In this example, the CPD of C0 is C0t=C0t+1 so the uncertainty of C0 will not be enlarged in the propagation from time step t1 to time step t. However, its uncertainty is reduced by the updating in each time step. Figure 13 shows that this uncertainty reduction is significant for the first five times steps so that the VRR in Fig. 14 has a large value before time step 5; this uncertainty reduction is negligible after time step 20 so that the value of VRR is closer to zero after time step 20.

In comparison, Fig. 13 shows that the uncertainty in the posterior distributions of C1 and C2 are increasing, even if their VRR values in Fig. 14 are always significant. The reason is that the uncertainties of C1 and C2 are enlarged in the propagation from time step t1 to time step t, so their prior distributions at t have more uncertainty than the posterior distribution at t1. The uncertainty in the prior at t is reduced by the updating, but the posterior uncertainty at t may not be smaller than the posterior uncertainty at t1 if the uncertainty reduction by the updating cannot outperform the uncertainty enlargement by the propagation. Note that the VRR in Fig. 14 is the variance reduction with respect to the prior/posterior distribution at the same time step, not the variance reduction for adjacent posterior distributions.

In summary, this example extended the proposed sensitivity analysis to a dynamic BN and verified its validity. The proposed method is capable of predicting the variance reduction of each state variable before updating.

Conclusion

In a BN, how a node of interest is affected by fixing another node at some value is of prominent interest. The proposed GSA for BN calculates the first-order Sobol’ index to quantity the sensitivity of the node of interest XN with respect to an input/observation node X1. In forward propagation where X1 is an upstream node, a low index indicates that XN is not sensitive to the value of X1 so that we can simply fix X1 at a deterministic value. In backward inference where X1 is a downstream node, a low sensitivity index indicates that the uncertainty of XN cannot be reduced by observing X1; thus, we should observe another node of higher Sobol’ index in order to reduce its uncertainty.

The proposed GSA for BN is developed in two steps. First, an auxiliary variable method is used to convert the path between node X1 and node XN to a deterministic function thus making the Sobol’ index computation feasible for a BN. If the path from X1 to XN is not a directed path form, the theorem of arc reversal is used to transform it to the desired directed path so that the auxiliary variable method can still be used to build the deterministic function. Second, this paper uses an efficient algorithm to directly estimate the Sobol’ index from the samples of the prior distribution of the BN, so that the proposed GSA for the BN is computationally affordable. The resultant Sobol’ index is the average variance reduction ratio across all possible observations of X1. This paper also extend the algorithm to give a better estimate of the uncertainty reduction of the node of interest when the value of the observation is known, thus providing an informative guidance before the updating.

The limitation of the proposed method at present is that currently it only considers a single observation; thus, an extension to the case of multiple observations needs to be addressed in future work.

Acknowledgment

The authors thank Dr. Zhen Hu for valuable discussions.

Nomenclature

     
  • F1() =

    inverse CDF

  •  
  • k =

    stiffnesses

  •  
  • PV1,PV2 =

    parent nodes of V1,V2

  •  
  • Si =

    first-order Sobol’ index of xi

  •  
  • Ux =

    auxiliary variable of x

  •  
  • V1,V2 =

    nodes in BN

  •  
  • xi =

    a random variable

  •  
  • xi =

    all random variables other than xi

  •  
  • Xj =

    a node in the BN

  •  
  • θx =

    distribution parameter of random variable x

  •  
  • π() =

    probability density function

  •  
  • Φl =

    a equally probably interval

Funding Data

  • The Air Force Office of Scientific Research (Grant No. FA9550-15-1-0018).

Appendix: Introduction to Particle Filter

Particle filter is a general algorithm to track the evolution of the state variables in a dynamic Bayesian network (DBN). In this section, capital letters denote random variables; lower-case letters denote particles, where the superscript i indicates that it is the ith particle. The subscripts of letters indicate the time step. Thus, the state variables at time step t are denoted as Xt, as shown in the simple DBN of Fig. 15.

Assume that the state variables Xtm at time t evolves from the state variables Xt1m according to the state function 
Xt=f(Xt1,vt1) 
(A1)
where vt1m is the vector of noise terms in the state function. The measurement Ztn is obtained according to the measurement function 
Zt=h(Xt,nt)
(A2)

where ntn is the vector of noise terms in the measurement function.

In case that the DBN represented by Eqs. (A1) and (A2) is not a linear Gaussian DBN, several particle filter algorithms have been developed to track the evolution of Xt and Zt. The most basic particle filter algorithm is sequential importance sampling [36]. The sequential importance sampling considers the full joint posterior distribution at time step t, p(X0:t|Z1:t). This distribution is approximated with a weighted set of particles {x0:ti,ωti}t=1N. These particles approximate the joint posterior distribution p(X0:t|Z1:t) by 
p(X0:t|Z1:t)i=1Nωtiδx0:ti 
(A3)

where δx0:ti is a delta function at x0:ti.

At time step t, the ith particle of Xt is denoted as xti, and it is sampled based on the current state X0:t1i and the observation Z1:t according to a proposal density 
Xtiq(Xt|X0:t1i,Z1:t)
(A4)

In other words, the new state Xti of the ith particle at time step t is sampled from a distribution which takes the current state X0:t1i and the observation Z1:t as parameters.

At time step t, the weight ωti is updated from ωt1i by 
ωtiωt1ip(Zt|Xti)p(Xti|Xt1i)q(Xti|Xt1i,Zt)
(A5)

In addition, the initial state X0i are sampled from the joint prior distribution of the state variables, and the initial weight ω0i for each particle is 1/N.

In practice, iterations of Eqs. (A4) and (A5) over time step t may lead to particle degeneracy problem, i.e., only a few particles have significant weights but most particles have negligible weights. In that case, we are assigning most computational efforts to the particles of nonsignificant contribution to the posterior distribution. This degeneracy problem can be solved by resampling, i.e., generating a new set of N particles based on the particles of Xt. The new particles represent the same posterior distribution as the former particles.

The simplest strategy of resampling is generating new resampled particles based on the discrete approximation shown in Eq. (A3), and the weight of each new particle is set as 1/N again. This resampling is bootstrapping process of N iterations, and each iteration selects one particle from current particles with replacement. In an iteration, the probability that a particle is selected is proportional to its weight.

The resampling strategy mentioned previously based on Eq. (A3) is adopted in a widely used algorithm, the SIR algorithm [36]. The SIR algorithm: (1) takes the state transition distribution p(Xt|Xt1i) as the proposal density distribution q(Xt|X0:t1i,Z1:t), and (2) conducts resampling at each iteration. Thus, Eqs. (A4) and (A5) reduce to 
Xtip(Xt|Xt1i)
(A6)
 
ωtip(Zt|Xti)
(A7)

Note that resampling based on Eq. (A3) is after the calculation of Eqs. (A6) and (A7) at each time step, where new particles of Xti are generated and the weight of each new particle is set as 1/N.

References

References
1.
Ling
,
Y.
, and
Mahadevan
,
S.
,
2012
, “
Integration of Structural Health Monitoring and Fatigue Damage Prognosis
,”
Mech. Syst. Signal Process.
,
28
, pp.
89
104
.
2.
Sankararaman
,
S.
,
Ling
,
Y.
, and
Mahadevan
,
S.
,
2011
, “
Uncertainty Quantification and Model Validation of Fatigue Crack Growth Prediction
,”
Eng. Fract. Mech.
,
78
(
7
), pp.
1487
1504
.
3.
Helman
,
P.
,
Veroff
,
R.
,
Atlas
,
S. R.
, and
Willman
,
C.
,
2004
, “
A Bayesian Network Classification Methodology for Gene Expression Data
,”
J. Comput. Biol.
,
11
(
4
), pp.
581
615
.
4.
Friedman
,
N.
,
Geiger
,
D.
, and
Goldszmidt
,
M.
,
1997
, “
Bayesian Network Classifiers
,”
Mach. Learn.
,
29
(2–3), pp.
131
163
.
5.
Korb
,
K. B.
, and
Nicholson
,
A. E.
,
2010
,
Bayesian Artificial Intelligence
,
CRC Press
,
Boca Raton, FL
.
6.
Poole
,
D.
,
1993
, “
Probabilistic Horn Abduction and Bayesian Networks
,”
Artif. Intell.
,
64
(
1
), pp.
81
129
.
7.
Saltelli
,
A.
,
Ratto
,
M.
,
Andres
,
T.
,
Campolongo
,
F.
,
Cariboni
,
J.
,
Gatelli
,
D.
,
Saisana
,
M.
, and
Tarantola
,
S.
,
2008
,
Global Sensitivity Analysis: The Primer
,
Wiley
,
Chichester, UK
.
8.
Hu
,
Z.
, and
Du
,
X.
,
2015
, “
Mixed Efficient Global Optimization for Time-Dependent Reliability Analysis
,”
ASME J. Mech. Des.
,
137
(
5
), p.
051401
.
9.
Li
,
C.
, and
Mahadevan
,
S.
,
2016
, “
Robust Test Resource Allocation Using Global Sensitivity Analysis
,”
AIAA
Paper No. 2016-0952.
10.
Li
,
C.
, and
Mahadevan
,
S.
,
2016
, “
Relative Contributions of Aleatory and Epistemic Uncertainty Sources in Time Series Prediction
,”
Int. J. Fatigue
,
82
(Part 3), pp.
474
486
.
11.
Nannapaneni
,
S.
, and
Mahadevan
,
S.
,
2014
, “
Uncertainty Quantification in Performance Evaluation of Manufacturing Processes
,”
IEEE International Conference on Big Data
(
Big Data
), Washington, DC, Oct. 27–30, pp.
996
1005
.
12.
Mullins
,
J.
,
Li
,
C.
,
Mahadevan
,
S.
, and
Urbina
,
A.
,
2014
, “
Optimal Selection of Calibration and Validation Test Samples Under Uncertainty
,”
Model Validation and Uncertainty Quantification
, Vol.
3
,
H. S.
Atamturktur
,
B.
Moaveni
,
C.
Papadimitriou
, and
T.
Schoenherr
, eds.,
Springer International Publishing
,
Berlin
, pp.
391
401
.
13.
Li
,
C.
, and
Mahadevan
,
S.
,
2015
, “
Sensitivity Analysis for Test Resource Allocation
,”
Model Validation and Uncertainty Quantification
, Vol.
3
, SE—14,
H. S.
Atamturktur
,
B.
Moaveni
,
C.
Papadimitriou
, and
T.
Schoenherr
, eds.,
Springer International Publishing
,
Berlin
, pp.
143
150
.
14.
Saltelli
,
A.
, and
Tarantola
,
S.
,
2002
, “
On the Relative Importance of Input Factors in Mathematical Models
,”
J. Am. Stat. Assoc.
,
97
(
459
), pp.
702
709
.
15.
Marrel
,
A.
,
Iooss
,
B.
,
Laurent
,
B.
, and
Roustant
,
O.
,
2009
, “
Calculations of Sobol Indices for the Gaussian Process Metamodel
,”
Reliab. Eng. Syst. Saf.
,
94
(
3
), pp.
742
751
.
16.
Sobol’
, I
. M.
,
2001
, “
Global Sensitivity Indices for Nonlinear Mathematical Models and Their Monte Carlo Estimates
,”
Math. Comput. Simul.
,
55
(
1–3
), pp.
271
280
.
17.
Zhang
,
X.
, and
Pandey
,
M. D.
,
2014
, “
An Effective Approximation for Variance-Based Global Sensitivity Analysis
,”
Reliab. Eng. Syst. Saf.
,
121
, pp.
164
174
.
18.
Sudret
,
B.
,
2008
, “
Global Sensitivity Analysis Using Polynomial Chaos Expansions
,”
Reliab. Eng. Syst. Saf.
,
93
(
7
), pp.
964
979
.
19.
Chen
,
W.
,
Jin
,
R.
, and
Sudjianto
,
A.
,
2005
, “
Analytical Variance-Based Global Sensitivity Analysis in Simulation-Based Design Under Uncertainty
,”
ASME J. Mech. Des.
,
127
(
5
), pp.
875
884
.
20.
Homma
,
T.
, and
Saltelli
,
A.
,
1996
, “
Importance Measures in Global Sensitivity Analysis of Nonlinear Models
,”
Reliab. Eng. Syst. Saf.
,
52
(
1
), pp.
1
17
.
21.
Sobol’
, I
. M.
, and
Myshetskaya
,
E. E.
,
2008
, “
Monte Carlo Estimators for Small Sensitivity Indices
,”
Monte Carlo Methods Appl.
,
13
(
5–6
), pp.
455
465
.
22.
Owen
,
A.
,
2013
, “
Better Estimation of Small Sobol Sensitivity Indices
,”
ACM Trans. Model. Comput. Simul.
,
23
(2), pp.
11
17
.
23.
Saltelli
,
A.
,
2002
, “
Making Best Use of Model Evaluations to Compute Sensitivity Indices
,”
Comput. Phys. Commun.
,
145
(
2
), pp.
280
297
.
24.
Sankararaman
,
S.
, and
Mahadevan
,
S.
,
2013
, “
Separating the Contributions of Variability and Parameter Uncertainty in Probability Distributions
,”
Reliab. Eng. Syst. Saf.
,
112
, pp.
187
199
.
25.
Angus
,
J. E.
,
1994
, “
The Probability Integral Transform and Related Results
,”
SIAM Rev.
,
36
(
4
), pp.
652
654
.
26.
Li
,
C.
, and
Mahadevan
,
S.
,
2015
, “
Global Sensitivity Analysis for System Response Prediction Using Auxiliary Variable Method
,”
AIAA
Paper No. 2015-0661.
27.
Shachter
,
R. D.
,
1986
, “
Evaluating Influence Diagrams
,”
Oper. Res.
,
34
(
6
), pp.
871
882
.
28.
Li
,
C.
, and
Mahadevan
,
S.
,
2016
, “
An Efficient Modularized Sample-Based Method to Estimate the First-Order Sobol’ Index
,”
Reliab. Eng. Syst. Saf.
,
153
, pp.
110
121
.
29.
Red-Horse
,
J. R.
, and
Paez
,
T. L.
,
2008
, “
Sandia National Laboratories Validation Workshop: Structural Dynamics Application
,”
Comput. Methods Appl. Mech. Eng.
,
197
(29–32), pp.
2578
2584
.
30.
Li
,
C.
, and
Mahadevan
,
S.
,
2014
, “
Uncertainty Quantification and Output Prediction in Multi-Level Problems
,”
AIAA
Paper No. 2014-0124.
31.
Li
,
C.
, and
Mahadevan
,
S.
,
2016
, “
Role of Calibration, Validation, and Relevance in Multi-Level Uncertainty Integration
,”
Reliab. Eng. Syst. Saf.
,
148
, pp.
32
43
.
32.
Mullins
,
J.
,
Li
,
C.
,
Sankararaman
,
S.
,
Mahadevan
,
S.
, and
Urbina
,
A.
,
2013
, “
Probabilistic Integration of Validation and Calibration Results for Prediction Level Uncertainty Quantification: Application to Structural Dynamics
,”
AIAA
Paper No. 2013-1872.
33.
Xu
,
P.
,
Su
,
X.
,
Mahadevan
,
S.
,
Li
,
C.
, and
Deng
,
Y.
,
2014
, “
A Non-Parametric Method to Determine Basic Probability Assignment for Classification Problems
,”
Appl. Intell.
,
41
(
3
), pp.
681
693
.
34.
Pan
,
F.
,
Zhu
,
P.
,
Chen
,
W.
, and
Li
,
C.
,
2013
, “
Application of Conservative Surrogate to Reliability Based Vehicle Design for Crashworthiness
,”
J. Shanghai Jiaotong Univ.
,
18
(
2
), pp.
159
165
.
35.
Henrion
,
M
.,
1988
, “
Propagation of Uncertainty by Probabilistic Logic Sampling in Bayes' Networks
,”
Uncertainty Artif. Intell.
,
5
, pp.
149
164
.
36.
Arulampalam
,
M. S.
,
Maskell
,
S.
,
Gordon
,
N.
, and
Clapp
,
T.
,
2002
, “
A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking
,”
IEEE Trans. Signal Process.
,
50
(
2
), pp.
174
188
.