Issue Section: Discussion
## 1 Introduction

In Ref. [1], the authors developed a sequential Bayesian optimal design framework to estimate the statistical expectation of a black-box function $f(\mathbf{x}): \mathbb{R}^d \to \mathbb{R}$. Let $\mathbf{x} \sim p(\mathbf{x})$, with $p(\mathbf{x})$ the probability distribution of the input $\mathbf{x}$; the statistical expectation is then defined as

$$q = \int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \tag{1}$$

The function $f(\mathbf{x})$ is not known a priori but can be evaluated at arbitrary $\mathbf{x}$ with Gaussian noise of variance $\sigma^2$:

$$y = f(\mathbf{x}) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2) \tag{2}$$

Based on the Gaussian process surrogate learned from the available samples $\mathcal{D}_n = \{\mathbf{X}_n, \mathbf{Y}_n\}$, i.e., $f(\mathbf{x})|\mathcal{D}_n \sim \mathcal{GP}(m_n(\mathbf{x}), k_n(\mathbf{x}, \mathbf{x}'))$, the next-best sample is chosen by maximizing the information-based acquisition $G(\tilde{\mathbf{x}})$:

$$\mathbf{x}_{n+1} = \operatorname*{argmax}_{\tilde{\mathbf{x}}}\, G(\tilde{\mathbf{x}}) \tag{3}$$

where $G(\tilde{\mathbf{x}})$ computes the information gain of adding a sample $\tilde{y}$ at $\tilde{\mathbf{x}}$, i.e., the expected KL divergence between the current estimation $p(q|\mathcal{D}_n)$ and the hypothetical next-step estimation $p(q|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y})$:

$$G(\tilde{\mathbf{x}}) = \mathbb{E}_{\tilde{y}}\big[\mathrm{KL}\big(p(q|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y})\,\|\,p(q|\mathcal{D}_n)\big)\big] = \int\!\!\int p(q|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y}) \log\frac{p(q|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y})}{p(q|\mathcal{D}_n)}\, dq\; p(\tilde{y}|\tilde{\mathbf{x}}, \mathcal{D}_n)\, d\tilde{y} \tag{4}$$

In Eq. (4), $\tilde{y}$ is chosen based on the surrogate $f(\mathbf{x})|\mathcal{D}_n$ (following the distribution $\mathcal{N}(m_n(\tilde{\mathbf{x}}), k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) + \sigma^2)$); $q$ is considered as a random variable with uncertainties coming from the (current and next-step) surrogates. It is noted that $G(\tilde{\mathbf{x}})$ also depends on the hyperparameters $\theta$ of the learned Gaussian process $f(\mathbf{x})|\mathcal{D}_n$. We neglect this dependence for simplicity, which does not affect the main derivation.

As a major contribution of the discussed paper, the authors simplified the information-based acquisition as Eq. (30) in Ref. [1]:

$$G(\tilde{\mathbf{x}}) = \log\left(\frac{\sigma_1}{\sigma_2(\tilde{\mathbf{x}})}\right) + \frac{1}{2}\frac{\sigma_2^2(\tilde{\mathbf{x}})}{\sigma_1^2} - \frac{1}{2} + \frac{1}{2}\frac{v(\tilde{\mathbf{x}})^2}{\sigma_1^2\,\big(\sigma_n^2(\tilde{\mathbf{x}}) + \sigma^2\big)} \tag{5}$$

where $\sigma_1^2$ and $\sigma_2^2$ are, respectively, the variances of the current estimation and the hypothetical next-step estimation of $q$; $v(\tilde{\mathbf{x}}) = \int k_n(\tilde{\mathbf{x}}, \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$ and $\sigma_n^2(\tilde{\mathbf{x}}) = k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}})$. Furthermore, for the numerical computation of Eq. (5), the authors developed analytical formulas for each involved quantity (important for high-dimensional computation) under a uniform distribution of $\mathbf{x}$.

The purpose of our discussion is to show the following two points: (i) the simplified acquisition (5) can be reduced to an even more compact form that admits intuitive physical interpretations (Sec. 2), and (ii) the analytical formulas can be generalized to an arbitrary input distribution $p(\mathbf{x})$ (Sec. 3).
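To make the setup of Eqs. (1)–(3) concrete, the sketch below fits a zero-mean Gaussian process surrogate to noisy evaluations of a toy function and estimates $\mu_1 = \int m_n(\mathbf{x})p(\mathbf{x})\,d\mathbf{x}$ for a uniform input distribution. This is only an illustrative 1D sketch; the kernel choice, test function, and all variable names are our assumptions, not the implementation of Ref. [1].

```python
import numpy as np

def rbf(a, b, s=1.0, lam=0.2):
    # Squared-exponential kernel (cf. Eq. (22) in 1D): k(x, x') = s^2 exp(-(x-x')^2/(2*lam))
    return s**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lam)

def gp_posterior(Xn, Yn, Xs, sig2=1e-4):
    # Posterior mean m_n and covariance k_n of f | D_n at test points Xs (zero prior mean)
    K = rbf(Xn, Xn) + sig2 * np.eye(len(Xn))
    Ks = rbf(Xs, Xn)
    mn = Ks @ np.linalg.solve(K, Yn)
    kn = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
    return mn, kn

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)              # illustrative black-box function
Xn = rng.uniform(-1, 1, 8)
Yn = f(Xn) + rng.normal(0.0, 1e-2, 8)    # noisy evaluations, Eq. (2)

# mu_1 = ∫ m_n(x) p(x) dx for uniform p(x) on [-1, 1], approximated on a grid
Xs = np.linspace(-1, 1, 401)
mn, _ = gp_posterior(Xn, Yn, Xs)
mu1 = mn.mean()                          # grid average approximates the integral, Eq. (7)
print(mu1)
```

The true expectation here is $q = \frac{1}{2}\int_{-1}^{1}\sin(3x)\,dx = 0$, so `mu1` should be close to zero once the surrogate has seen a few samples.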

## 2 Derivation of the Simplified Acquisition $G(x~)$

To simplify Eq. (4), we first notice that $q|\mathcal{D}_n$ follows a Gaussian distribution with mean $\mu_1$ and variance $\sigma_1^2$:

$$p(q|\mathcal{D}_n) = \mathcal{N}(q;\, \mu_1, \sigma_1^2) \tag{6}$$

$$\mu_1 = \mathbb{E}\left[\int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \,\Big|\, \mathcal{D}_n\right] = \int m_n(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \tag{7}$$

$$\sigma_1^2 = \mathbb{E}\left[\Big(\int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}\Big)^2 \,\Big|\, \mathcal{D}_n\right] - \left(\mathbb{E}\left[\int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \,\Big|\, \mathcal{D}_n\right]\right)^2 = \int\!\!\int k_n(\mathbf{x}, \mathbf{x}')\, p(\mathbf{x})\, p(\mathbf{x}')\, d\mathbf{x}'\, d\mathbf{x} \tag{8}$$

After adding one hypothetical sample $\{\tilde{\mathbf{x}}, \tilde{y}\}$, the function follows an updated surrogate $f(\mathbf{x})|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y} \sim \mathcal{GP}(m_{n+1}(\mathbf{x}), k_{n+1}(\mathbf{x}, \mathbf{x}'))$ with

$$m_{n+1}(\mathbf{x}) = m_n(\mathbf{x}) + \frac{k_n(\tilde{\mathbf{x}}, \mathbf{x})\,\big(\tilde{y} - m_n(\tilde{\mathbf{x}})\big)}{k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) + \sigma^2} \tag{9}$$

$$k_{n+1}(\mathbf{x}, \mathbf{x}') = k_n(\mathbf{x}, \mathbf{x}') - \frac{k_n(\tilde{\mathbf{x}}, \mathbf{x})\, k_n(\mathbf{x}', \tilde{\mathbf{x}})}{k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) + \sigma^2} \tag{10}$$

The quantity $q|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y}$ can then be represented by another Gaussian with mean $\mu_2$ and variance $\sigma_2^2$:

$$p(q|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y}) = \mathcal{N}\big(q;\, \mu_2(\tilde{\mathbf{x}}, \tilde{y}), \sigma_2^2(\tilde{\mathbf{x}})\big) \tag{11}$$

$$\mu_2(\tilde{\mathbf{x}}, \tilde{y}) = \mathbb{E}\left[\int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \,\Big|\, \mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y}\right] = \int m_{n+1}(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} = \mu_1 + \frac{\int k_n(\tilde{\mathbf{x}}, \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}}{k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) + \sigma^2}\,\big(\tilde{y} - m_n(\tilde{\mathbf{x}})\big) \tag{12}$$

$$\sigma_2^2(\tilde{\mathbf{x}}) = \mathbb{E}\left[\Big(\int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}\Big)^2 \,\Big|\, \mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y}\right] - \left(\mathbb{E}\left[\int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} \,\Big|\, \mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y}\right]\right)^2 = \int\!\!\int k_{n+1}(\mathbf{x}, \mathbf{x}')\, p(\mathbf{x})\, p(\mathbf{x}')\, d\mathbf{x}'\, d\mathbf{x} = \sigma_1^2 - \frac{\big(\int k_n(\tilde{\mathbf{x}}, \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}\big)^2}{k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) + \sigma^2} \tag{13}$$
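The rank-one update of Eqs. (9)–(10) can be checked numerically: conditioning the GP on $\mathcal{D}_n$ and then applying the update for one extra noisy sample $\{\tilde{x}, \tilde{y}\}$ must coincide with conditioning directly on all $n+1$ samples at once. The sketch below verifies this in 1D; kernel, data, and names are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, s=1.0, lam=0.3):
    # Squared-exponential kernel in 1D
    return s**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lam)

def posterior(Xd, Yd, Xs, sig2):
    # Zero-mean GP posterior (mean, covariance) at Xs given noisy data (Xd, Yd)
    K = rbf(Xd, Xd) + sig2 * np.eye(len(Xd))
    Ks = rbf(Xs, Xd)
    return Ks @ np.linalg.solve(K, Yd), rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)

rng = np.random.default_rng(1)
sig2 = 0.05
Xn, Yn = rng.uniform(-1, 1, 5), rng.normal(0.0, 1.0, 5)
xt, yt = np.array([0.3]), np.array([0.7])      # hypothetical sample {x~, y~}
Xs = np.linspace(-1, 1, 50)

# Route 1: posterior given D_n, then rank-one update via Eqs. (9)-(10)
Z = np.concatenate([Xs, xt])
mz, kz = posterior(Xn, Yn, Z, sig2)
mn, mt = mz[:-1], mz[-1]                       # m_n on grid, m_n(x~)
kn, knt, ktt = kz[:-1, :-1], kz[:-1, -1], kz[-1, -1]
m_upd = mn + knt * (yt[0] - mt) / (ktt + sig2)          # Eq. (9)
k_upd = kn - np.outer(knt, knt) / (ktt + sig2)          # Eq. (10)

# Route 2: condition directly on the n+1 samples at once
m_dir, k_dir = posterior(np.concatenate([Xn, xt]),
                         np.concatenate([Yn, yt]), Xs, sig2)
print(np.max(np.abs(m_upd - m_dir)), np.max(np.abs(k_upd - k_dir)))
```

Both discrepancies should be at machine-precision level, since sequential and batch Gaussian conditioning are exactly equivalent.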

We note that Eqs. (7), (8), (12), and (13) are, respectively, intermediate steps of Eqs. (19), (21), (26), and (28) in the discussed paper. Substituting Eqs. (6) and (11) into Eq. (4), one can obtain

$$\begin{aligned} G(\tilde{\mathbf{x}}) &= \int\!\!\int p(q|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y}) \log\frac{p(q|\mathcal{D}_n, \tilde{\mathbf{x}}, \tilde{y})}{p(q|\mathcal{D}_n)}\, dq\; p(\tilde{y}|\tilde{\mathbf{x}}, \mathcal{D}_n)\, d\tilde{y} \\ &= \int \left( \log\Big(\frac{\sigma_1}{\sigma_2(\tilde{\mathbf{x}})}\Big) + \frac{\sigma_2^2(\tilde{\mathbf{x}})}{2\sigma_1^2} + \frac{\big(\mu_2(\tilde{\mathbf{x}}, \tilde{y}) - \mu_1\big)^2}{2\sigma_1^2} - \frac{1}{2} \right) p(\tilde{y}|\tilde{\mathbf{x}}, \mathcal{D}_n)\, d\tilde{y} \\ &= \log\Big(\frac{\sigma_1}{\sigma_2(\tilde{\mathbf{x}})}\Big) + \frac{1}{2\sigma_1^2}\left( \int \big(\mu_2(\tilde{\mathbf{x}}, \tilde{y}) - \mu_1\big)^2\, p(\tilde{y}|\tilde{\mathbf{x}}, \mathcal{D}_n)\, d\tilde{y} + \sigma_2^2(\tilde{\mathbf{x}}) - \sigma_1^2 \right) \\ &= \log\Big(\frac{\sigma_1}{\sigma_2(\tilde{\mathbf{x}})}\Big) + \frac{1}{2\sigma_1^2}\left( \frac{\big(\int k_n(\tilde{\mathbf{x}}, \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}\big)^2}{k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) + \sigma^2} + \sigma_2^2(\tilde{\mathbf{x}}) - \sigma_1^2 \right) \end{aligned} \tag{14}$$

$$G(\tilde{\mathbf{x}}) = \log\left(\frac{\sigma_1}{\sigma_2(\tilde{\mathbf{x}})}\right) \tag{15}$$

where Eq. (14) is exactly Eq. (5) (or Eq. (30) in the discussed paper). The fact that the last three terms of Eq. (14) sum to zero is a direct result of Eq. (13).
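The cancellation of the last three terms of Eq. (14) under Eq. (13) is easy to confirm numerically. The snippet below picks arbitrary (illustrative) values of $\sigma_1^2$, $k_n(\tilde{x},\tilde{x}) + \sigma^2$, and $v(\tilde{x})$, enforces Eq. (13), and checks that Eq. (14) collapses to Eq. (15).

```python
import numpy as np

# Arbitrary positive quantities, with sigma_2^2 chosen to satisfy Eq. (13):
# sigma_2^2 = sigma_1^2 - v^2 / (k_nn + sig2)
sigma1_sq = 2.0
k_nn, sig2 = 0.8, 0.1
v = 0.9                      # v(x~) = ∫ k_n(x~, x) p(x) dx
sigma2_sq = sigma1_sq - v**2 / (k_nn + sig2)

# Full acquisition, Eq. (14) (i.e., Eq. (5) / Eq. (30) of Ref. [1])
G_full = (np.log(np.sqrt(sigma1_sq / sigma2_sq))
          + 0.5 * sigma2_sq / sigma1_sq - 0.5
          + 0.5 * v**2 / (sigma1_sq * (k_nn + sig2)))
# Simplified form, Eq. (15)
G_simple = np.log(np.sqrt(sigma1_sq / sigma2_sq))
print(G_full - G_simple)     # ≈ 0 up to round-off
```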

The advantage of having the simplified form (15) is that the optimization (3) admits a much more intuitive physical interpretation. Since $\sigma_1$ does not depend on $\tilde{\mathbf{x}}$, Eq. (3) can be reformulated as

$$\mathbf{x}_{n+1} = \operatorname*{argmin}_{\tilde{\mathbf{x}}}\, \sigma_2^2(\tilde{\mathbf{x}}) \tag{16}$$

which selects the next-best sample as the one minimizing the expected variance of $q$. A similar optimization criterion is also used in Refs. [2,3] for the purpose of computing extreme-event probabilities.

Another alternative interpretation can be obtained by writing (3) as

$$\mathbf{x}_{n+1} \equiv \operatorname*{argmax}_{\tilde{\mathbf{x}}}\; \sigma_1^2 - \sigma_2^2(\tilde{\mathbf{x}}) \equiv \operatorname*{argmax}_{\tilde{\mathbf{x}}} \left(\frac{\int k_n(\tilde{\mathbf{x}}, \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}}{\big(k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) + \sigma^2\big)^{1/2}}\right)^2 \tag{17}$$

$$\equiv \operatorname*{argmax}_{\tilde{\mathbf{x}}} \left(\int \rho_{y|\mathcal{D}_n}(\tilde{\mathbf{x}}, \mathbf{x})\,\big(k_n(\mathbf{x}, \mathbf{x}) + \sigma^2\big)^{1/2}\, p(\mathbf{x})\, d\mathbf{x}\right)^2 \tag{18}$$

where Eq. (17) is a result of Eq. (13), and $\rho_{y|\mathcal{D}_n}(\tilde{\mathbf{x}}, \mathbf{x}) = k_n(\tilde{\mathbf{x}}, \mathbf{x}) / \big((k_n(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) + \sigma^2)(k_n(\mathbf{x}, \mathbf{x}) + \sigma^2)\big)^{1/2}$ is the correlation of the observation $y$ for the two inputs $\tilde{\mathbf{x}}$ and $\mathbf{x}$. Equation (18) can be interpreted as selecting the next sample that has the largest overall (weighted) correlation with all $\mathbf{x}$.

We finally remark that the above derivation is for given hyperparameter values $\theta$ in $f(\mathbf{x})|\mathcal{D}_n$. This is consistent with the Bayesian approach where the optimal values of $\theta$ are chosen by maximizing the likelihood function. However, the discussed paper used a different approach, sampling a distribution of $\theta$ and computing $G(\tilde{\mathbf{x}})$ as an average over the samples. In the latter case, the above analysis should likewise be considered in a slightly different way, i.e., Eq. (16) becomes a minimization of the product of $\sigma_2^2$ over all samples $\{\theta^{(i)}\}_{i=1}^{s}$:

$$\mathbf{x}_{n+1} = \operatorname*{argmax}_{\tilde{\mathbf{x}}} \frac{1}{s}\sum_{i=1}^{s} G(\tilde{\mathbf{x}}, \theta^{(i)}) \equiv \operatorname*{argmax}_{\tilde{\mathbf{x}}} \frac{1}{s}\sum_{i=1}^{s} \log\left(\frac{\sigma_1(\theta^{(i)})}{\sigma_2(\tilde{\mathbf{x}}, \theta^{(i)})}\right) \equiv \operatorname*{argmin}_{\tilde{\mathbf{x}}} \frac{1}{s}\sum_{i=1}^{s} \log \sigma_2(\tilde{\mathbf{x}}, \theta^{(i)}) \equiv \operatorname*{argmin}_{\tilde{\mathbf{x}}} \prod_{i=1}^{s} \sigma_2^2(\tilde{\mathbf{x}}, \theta^{(i)}) \tag{19}$$
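The chain of equivalences in Eq. (19) relies on $\sigma_1(\theta^{(i)})$ being independent of $\tilde{\mathbf{x}}$, so the averaged acquisition and the product criterion share the same optimizer. A quick numerical demonstration with synthetic (randomly generated, purely illustrative) values of $\sigma_1$ and $\sigma_2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n_x, s = 50, 6                               # candidate x~ grid size, number of theta samples
sigma1 = rng.uniform(1.0, 2.0, s)            # sigma_1(theta_i), independent of x~
sigma2 = rng.uniform(0.1, 1.0, (n_x, s))     # sigma_2(x~_j, theta_i)

# Left side of Eq. (19): average acquisition over theta samples
G_avg = np.mean(np.log(sigma1[None, :] / sigma2), axis=1)
# Right side of Eq. (19): product of next-step variances over theta samples
prod_crit = np.prod(sigma2**2, axis=1)

print(np.argmax(G_avg), np.argmin(prod_crit))
```

The two printed indices coincide: maximizing the averaged $G$ and minimizing the product $\prod_i \sigma_2^2(\tilde{\mathbf{x}}, \theta^{(i)})$ select the same candidate.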

## 3 Analytical Computation of $G(\tilde{\mathbf{x}})$ for Arbitrary Input Distribution $p(\mathbf{x})$

In the computation of $G(\tilde{\mathbf{x}})$ in the form of Eq. (17), the heaviest computation involved is the integral $\int k_n(\tilde{\mathbf{x}}, \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$ (which is prohibitive in high-dimensional problems if direct numerical integration is performed). Following the discussed paper, the integral can be reformulated as

$$\int k_n(\tilde{\mathbf{x}}, \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} = K(\tilde{\mathbf{x}}) - k(\tilde{\mathbf{x}}, \mathbf{X}_n)\big(K(\mathbf{X}_n, \mathbf{X}_n) + \sigma^2 \mathbf{I}_n\big)^{-1} K(\mathbf{X}_n) \tag{20}$$

where

$$K(\mathbf{x}) = \int k(\mathbf{x}, \mathbf{x}')\, p(\mathbf{x}')\, d\mathbf{x}' \tag{21}$$

$$k(\mathbf{x}, \mathbf{x}') = s^2 \exp\left(-\frac{1}{2}(\mathbf{x} - \mathbf{x}')^T \Lambda^{-1} (\mathbf{x} - \mathbf{x}')\right) \tag{22}$$

with $s$ and $\Lambda$ the hyperparameters of the kernel function (with either optimized values from training or selected values as in Ref. [1]).

The main computation is then Eq. (21), for which the authors of the discussed paper addressed the situation of uniform $p(\mathbf{x})$. To generalize the formulation to arbitrary $p(\mathbf{x})$, we can approximate $p(\mathbf{x})$ with a Gaussian mixture model (as a universal approximator of distributions [4]):

$$p(\mathbf{x}) \approx \sum_{i=1}^{n_{\mathrm{GMM}}} \alpha_i\, \mathcal{N}(\mathbf{x};\, \mathbf{w}_i, \Sigma_i) \tag{23}$$

Equation (21) can then be formulated as

$$K(\mathbf{x}) \approx \sum_{i=1}^{n_{\mathrm{GMM}}} \alpha_i \int k(\mathbf{x}, \mathbf{x}')\, \mathcal{N}(\mathbf{x}';\, \mathbf{w}_i, \Sigma_i)\, d\mathbf{x}' = \sum_{i=1}^{n_{\mathrm{GMM}}} \alpha_i\, |\Sigma_i \Lambda^{-1} + \mathbf{I}|^{-1/2}\, k(\mathbf{x}, \mathbf{w}_i;\, \Sigma_i + \Lambda) \tag{24}$$

which yields an analytical computation; here $k(\mathbf{x}, \mathbf{w}_i;\, \Sigma_i + \Lambda)$ denotes the kernel (22) evaluated with $\Sigma_i + \Lambda$ in place of $\Lambda$. In practice, the number of mixtures $n_{\mathrm{GMM}}$ is determined by the complexity of the input distribution, but any distribution $p(\mathbf{x})$ can be approximated in such a way.

Finally, in computing $G(\tilde{\mathbf{x}})$ in the form of Eq. (19), the computation of $\sigma_1^2$ in Eq. (8) is necessary. This can also be generalized to arbitrary $p(\mathbf{x})$ using the Gaussian mixture model as follows:

$$\begin{aligned} \sigma_1^2 &= \int\!\!\int k_n(\mathbf{x}, \mathbf{x}')\, p(\mathbf{x})\, p(\mathbf{x}')\, d\mathbf{x}'\, d\mathbf{x} \\ &= \int\!\!\int k(\mathbf{x}, \mathbf{x}')\, p(\mathbf{x})\, p(\mathbf{x}')\, d\mathbf{x}'\, d\mathbf{x} - \int k(\mathbf{x}, \mathbf{X}_n)\, p(\mathbf{x})\, d\mathbf{x}\; \big(K(\mathbf{X}_n, \mathbf{X}_n) + \sigma^2 \mathbf{I}_n\big)^{-1} \int k(\mathbf{X}_n, \mathbf{x}')\, p(\mathbf{x}')\, d\mathbf{x}' \\ &= \sum_{i=1}^{n_{\mathrm{GMM}}} \sum_{j=1}^{n_{\mathrm{GMM}}} \alpha_i \alpha_j\, |\Lambda|^{1/2}\, |\Lambda + \Sigma_i + \Sigma_j|^{-1/2}\, k(\mathbf{w}_i, \mathbf{w}_j;\, \Lambda + \Sigma_i + \Sigma_j) - K(\mathbf{X}_n)^T \big(K(\mathbf{X}_n, \mathbf{X}_n) + \sigma^2 \mathbf{I}_n\big)^{-1} K(\mathbf{X}_n) \end{aligned} \tag{25}$$
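The closed-form Gaussian convolution underlying Eq. (24) can be verified by quadrature. In 1D, with kernel $k(x,x') = s^2 e^{-(x-x')^2/(2\lambda)}$ and one mixture component $\mathcal{N}(x';\,w,\Sigma)$, the identity reads $\int k(x,x')\mathcal{N}(x';w,\Sigma)\,dx' = (\Sigma/\lambda + 1)^{-1/2}\, s^2\, e^{-(x-w)^2/(2(\Sigma+\lambda))}$. The numerical values below are arbitrary illustrative choices.

```python
import numpy as np

s2, lam = 1.5, 0.4          # kernel variance s^2 and lengthscale Λ (scalar in 1D)
w, Sig = 0.3, 0.25          # one Gaussian-mixture component N(x'; w, Σ)
x = 0.8                     # evaluation point

# Left-hand side of Eq. (24): brute-force quadrature on a wide, fine grid
xp = np.linspace(-10, 10, 200001)
k = s2 * np.exp(-0.5 * (x - xp)**2 / lam)
pdf = np.exp(-0.5 * (xp - w)**2 / Sig) / np.sqrt(2.0 * np.pi * Sig)
lhs = np.sum(k * pdf) * (xp[1] - xp[0])

# Right-hand side of Eq. (24): |Σ Λ^{-1} + I|^{-1/2} k(x, w; Σ + Λ)
rhs = (Sig / lam + 1.0)**-0.5 * s2 * np.exp(-0.5 * (x - w)**2 / (Sig + lam))
print(lhs, rhs)
```

The two printed values agree to quadrature accuracy, which is what makes the mixture-based formulas (24) and (25) fully analytical in high dimensions.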

## Conflicts of Interest

There are no conflicts of interest.

## Data Availability Statement

No data, models, or code were generated or used for this paper.

## References

1. Pandita, P., Bilionis, I., and Panchal, J., 2019, "Bayesian Optimal Design of Experiments for Inferring the Statistical Expectation of Expensive Black-Box Functions," ASME J. Mech. Des., 141(10), p. 101404.
2. Hu, Z., and Mahadevan, S., 2016, "Global Sensitivity Analysis-Enhanced Surrogate (GSAS) Modeling for Reliability Analysis," Struct. Multidiscipl. Optim., 53(3), pp. 501–521.
3. Blanchard, A., and Sapsis, T., 2021, "Output-Weighted Optimal Sampling for Bayesian Experimental Design and Uncertainty Quantification," SIAM/ASA J. Uncertainty Quantif., 9(2), pp. 564–592.
4. Goodfellow, I., Bengio, Y., and Courville, A., 2016, Deep Learning, MIT Press, Cambridge, MA.

Copyright © 2021 by ASME