1 Introduction
In Ref. [1], the authors developed a sequential Bayesian optimal design framework to estimate the statistical expectation of a black-box function f(x). Let x ∼ p(x), with p(x) the probability distribution of the input x; the statistical expectation is then defined as

q = E[f(x)] = ∫ f(x) p(x) dx.   (1)

The function f(x) is not known a priori but can be evaluated at arbitrary x with Gaussian noise of variance σε²:

y = f(x) + ε,   ε ∼ N(0, σε²).   (2)
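For concreteness, the following minimal sketch sets up the estimation problem of Eqs. (1) and (2) with an illustrative one-dimensional f(x), a Gaussian p(x), and a brute-force Monte Carlo reference for q; all names and values here are hypothetical and only serve to fix ideas (they are not from Ref. [1]).

import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) setup: input distribution p(x) = N(0, 1) and noise level
sigma_eps = 0.1                    # noise standard deviation; variance sigma_eps**2 in Eq. (2)

def f(x):
    # Stand-in for the expensive black-box function
    return np.sin(3.0 * x) + 0.5 * x**2

def observe(x):
    # Noisy evaluation y = f(x) + eps, Eq. (2)
    return f(x) + sigma_eps * rng.standard_normal(np.shape(x))

# Brute-force Monte Carlo reference for q = E[f(x)], Eq. (1)
x_mc = rng.normal(0.0, 1.0, size=200_000)
print("Monte Carlo reference q =", f(x_mc).mean())

In the sequential design setting, such a large number of evaluations of f(x) is exactly what one cannot afford, which motivates the surrogate-based approach below.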
Based on the Gaussian process surrogate learned from the available samples Dn = {Xn, Yn}, i.e., f(x)|Dn, the next-best sample is chosen by maximizing the information-based acquisition G(x̃):

x_{n+1} = arg max_{x̃} G(x̃),   (3)

where G(x̃) computes the information gain of adding a sample at x̃, i.e., the expected KL divergence between the current estimation p(q|Dn) and the hypothetical next-step estimation p(q|Dn ∪ {x̃, ỹ}):

G(x̃) = E_ỹ[ KL( p(q|Dn ∪ {x̃, ỹ}) || p(q|Dn) ) ].   (4)

In Eq. (4), ỹ is chosen based on the surrogate f(x)|Dn, following a distribution of N(E[f(x̃)|Dn], var[f(x̃)|Dn] + σε²); q is considered as a random variable with uncertainties coming from the (current and next-step) surrogates. It is noted that G(x̃) also depends on the hyperparameters θ in the learned Gaussian process f(x)|Dn. We neglect this dependence for simplicity, which does not affect the main derivation.
As a major contribution of the discussed paper, the authors simplified the information-based acquisition as Eq. (30) in Ref. [1]:

G(x̃) = (1/2) log(σ1²/σ2²) + σ2²/(2σ1²) + E_ỹ[(μ2 − μ1)²]/(2σ1²) − 1/2,   (5)

where σ1² and σ2² are, respectively, the variances of the current estimation and the hypothetical next-step estimation of q; μ1 and μ2 are the corresponding means. Furthermore, for the numerical computation of Eq. (5), the authors developed analytical formulas for each involved quantity (important for high-dimensional computation) under a uniform distribution of x.
The purpose of our discussion is to show the following two critical points:
(1) The acquisition (5) can be further simplified to a much more compact form, which reveals a clearer physical interpretation of the optimization (Sec. 2).
(2) The analytical computation of the acquisition, developed in Ref. [1] for a uniform distribution of x, can be generalized to an arbitrary input distribution p(x) through a Gaussian mixture model approximation (Sec. 3).
2 Derivation of the Simplified Acquisition
To simplify Eq. (4), we first notice that q|Dn follows a Gaussian distribution with mean μ1 and variance σ1²:

q|Dn ∼ N(μ1, σ1²),   (6)
μ1 = ∫ mn(x) p(x) dx,   (7)
σ1² = ∫∫ kn(x, x′) p(x) p(x′) dx dx′,   (8)

where mn(x) and kn(x, x′) denote the posterior mean and covariance of the current surrogate f(x)|Dn. After adding one hypothetical sample (x̃, ỹ), the function follows an updated surrogate f(x)|Dn ∪ {x̃, ỹ} with

mn+1(x) = mn(x) + kn(x, x̃)(ỹ − mn(x̃)) / (kn(x̃, x̃) + σε²),   (9)
kn+1(x, x′) = kn(x, x′) − kn(x, x̃) kn(x̃, x′) / (kn(x̃, x̃) + σε²).   (10)

The quantity q|Dn ∪ {x̃, ỹ} can then be represented by another Gaussian with mean μ2 and variance σ2²:

q|Dn ∪ {x̃, ỹ} ∼ N(μ2, σ2²),   (11)
μ2 = ∫ mn+1(x) p(x) dx = μ1 + [(ỹ − mn(x̃)) / (kn(x̃, x̃) + σε²)] ∫ kn(x, x̃) p(x) dx,   (12)
σ2² = ∫∫ kn+1(x, x′) p(x) p(x′) dx dx′ = σ1² − [∫ kn(x, x̃) p(x) dx]² / (kn(x̃, x̃) + σε²).   (13)
We note that Eqs. (7), (8), (12), and (13) are, respectively, intermediate steps of Eqs. (19), (21), (26), and (28) in the discussed paper. Substituting Eq. (6) and Eq. (11) into Eq. (4), one obtains

G(x̃) = (1/2) log(σ1²/σ2²) + σ2²/(2σ1²) + E_ỹ[(μ2 − μ1)²]/(2σ1²) − 1/2,   (14)

where Eq. (14) is exactly Eq. (5) (or Eq. (30) in the discussed paper). The fact that the last three terms of Eq. (14) sum up to zero is a direct result of Eq. (13), which leads to the simplified acquisition

G(x̃) = (1/2) log(σ1²/σ2²).   (15)
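To make the cancellation explicit (a short verification based on the reconstructed Eqs. (12) and (13) above): under the predictive distribution ỹ|Dn ∼ N(mn(x̃), kn(x̃, x̃) + σε²), one has E_ỹ[(ỹ − mn(x̃))²] = kn(x̃, x̃) + σε², so that

E_ỹ[(μ2 − μ1)²] = [∫ kn(x, x̃) p(x) dx]² / (kn(x̃, x̃) + σε²) = σ1² − σ2²,

and therefore σ2²/(2σ1²) + E_ỹ[(μ2 − μ1)²]/(2σ1²) − 1/2 = [σ2² + (σ1² − σ2²) − σ1²]/(2σ1²) = 0.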
The advantage of having the simplified form (15) is that the optimization (3) yields a much more intuitive physical interpretation. Since σ1² does not depend on x̃, Eq. (3) can be reformulated as

x_{n+1} = arg max_{x̃} (1/2) log(σ1²/σ2²(x̃)) = arg min_{x̃} σ2²(x̃),   (16)

which selects the next-best sample minimizing the expected variance of q. A similar optimization criterion is also used in Refs. [2,3] for the purpose of computing the extreme-event probability.
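As an illustration of the criterion (16), the following minimal sketch (not the implementation of Ref. [1]) approximates the integrals in Eqs. (8) and (13) by Monte Carlo over p(x), assuming a one-dimensional zero-mean GP prior with an RBF kernel; all function names, hyperparameter values, and data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical GP prior: RBF kernel with signal std s, length scale lam, noise std sigma_eps
s, lam, sigma_eps = 1.0, 0.5, 0.1

def k(a, b):
    # Prior kernel matrix between 1-D arrays a and b
    return s**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lam**2)

# Current data D_n (illustrative values)
Xn = np.array([-1.5, -0.3, 0.8, 2.0])
Yn = np.sin(3.0 * Xn) + sigma_eps * rng.standard_normal(Xn.shape)
Kn_inv = np.linalg.inv(k(Xn, Xn) + sigma_eps**2 * np.eye(len(Xn)))

def k_post(a, b):
    # Posterior covariance k_n(a, b) of the surrogate f(x) | D_n
    return k(a, b) - k(a, Xn) @ Kn_inv @ k(Xn, b)

# Monte Carlo samples of p(x) = N(0, 1) to approximate the integrals over p(x)
x_mc = rng.normal(0.0, 1.0, size=2000)
sigma1_sq = k_post(x_mc, x_mc).mean()          # Eq. (8)

def sigma2_sq(x_tilde):
    # Hypothetical next-step variance of q, Eq. (13), for a candidate x_tilde
    xt = np.atleast_1d(x_tilde)
    integral = k_post(x_mc, xt).mean()         # ∫ k_n(x, x̃) p(x) dx
    return sigma1_sq - integral**2 / (k_post(xt, xt)[0, 0] + sigma_eps**2)

# Next-best sample by Eq. (16): minimize sigma_2^2 over a candidate grid
candidates = np.linspace(-3.0, 3.0, 121)
x_next = candidates[np.argmin([sigma2_sq(c) for c in candidates])]
print("sigma_1^2 =", sigma1_sq, " next sample x =", x_next)

Note that Yn is never used in sigma2_sq: as Eq. (13) shows, the next-step variance (and hence the acquisition) depends only on the sampling location x̃, not on the hypothetical observation ỹ.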
We finally remark that the above derivation is for given hyperparameter values θ in f(x)|Dn. This is consistent with the Bayesian approach where the optimal values of θ are chosen by maximizing the likelihood function. However, the discussed paper used a different approach by sampling a distribution of θ and computing G(x̃) as an average over the samples. In the latter case, the above analysis should likewise be considered in a slightly different way, i.e., Eq. (16) should be considered as maximization of the product of σ1²/σ2² over all samples of θ:

x_{n+1} = arg max_{x̃} ∏_{i=1}^{nθ} σ1²(θi) / σ2²(x̃; θi),   (19)

where θi, i = 1, …, nθ, are the hyperparameter samples.
3 Analytical Computation of G(x) for Arbitrary Input Distribution p(x)
In the computation of G(x) in the form of Eq. (17), the heaviest computation involved is the integral of the kernel function against the input distribution, ∫ k(x, x′) p(x) dx (which is prohibitive in high-dimensional problems if direct integration is performed). Following the discussed paper, the integral can be reformulated as

∫ k(x, x′) p(x) dx = s² (2π)^{d/2} |Λ|^{1/2} I(x′),   (20)

where

I(x′) = ∫ N(x; x′, Λ) p(x) dx,   (21)
k(x, x′) = s² exp[−(x − x′)ᵀ Λ⁻¹ (x − x′)/2],   (22)

with N(x; x′, Λ) denoting the d-dimensional Gaussian density with mean x′ and covariance Λ, and with s and Λ involving hyperparameters of the kernel function (with either optimized values from training or selected values as in Ref. [1]).
The main computation is then Eq. (21), for which the authors of the discussed paper addressed the situation of uniform p(x). To generalize the formulation to arbitrary p(x), we can approximate p(x) with a Gaussian mixture model (as a universal approximator of distributions [4]):

p(x) ≈ Σ_{i=1}^{nGMM} αi N(x; μi, Σi),   (23)

where αi, μi, and Σi are, respectively, the weights, means, and covariances of the mixture components. Equation (21) can then be formulated as

I(x′) ≈ Σ_{i=1}^{nGMM} αi N(x′; μi, Λ + Σi),   (24)

which yields an analytical computation. In practice, the number of mixtures nGMM is determined by the complexity of the input distribution, but any distribution p(x) can be approximated in such a way.
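As an illustration (a minimal sketch, not code from Ref. [1]), the following fits the Gaussian mixture model of Eq. (23) to samples of a non-Gaussian one-dimensional p(x) with scikit-learn and evaluates the integral analytically via Eq. (24), checked against a direct Monte Carlo estimate of Eq. (21); the distribution, the kernel covariance Λ, and the evaluation point x′ are illustrative assumptions.

import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Samples of an illustrative bimodal input distribution p(x)
x_samples = np.concatenate([rng.normal(-2.0, 0.5, 3000), rng.normal(1.5, 1.0, 7000)])

# Fit the GMM approximation of p(x), Eq. (23)
gmm = GaussianMixture(n_components=2, random_state=0).fit(x_samples[:, None])
alpha = gmm.weights_              # mixture weights alpha_i
mu = gmm.means_.ravel()           # mixture means mu_i
var = gmm.covariances_.ravel()    # mixture variances Sigma_i (1-D case)

lam = 0.3**2                      # kernel covariance Lambda (hypothetical value)
x_prime = 0.7                     # point x' at which the integral is needed

# Analytical evaluation of Eq. (24): sum_i alpha_i N(x'; mu_i, Lambda + Sigma_i)
I_analytic = np.sum(alpha * norm.pdf(x_prime, loc=mu, scale=np.sqrt(lam + var)))

# Direct Monte Carlo check of Eq. (21): E_{x ~ p(x)}[ N(x; x', Lambda) ]
I_mc = norm.pdf(x_samples, loc=x_prime, scale=np.sqrt(lam)).mean()

print("analytical =", I_analytic, " Monte Carlo =", I_mc)

The two estimates agree up to GMM-fitting and sampling errors; in higher dimensions, Eq. (24) simply replaces the univariate densities by multivariate Gaussians with covariance Λ + Σi.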
Conflicts of Interest
There are no conflicts of interest.
Data Availability Statement
No data, models, or code were generated or used for this paper.
References
1. Pandita, P., Bilionis, I., and Panchal, J., 2019, "Bayesian Optimal Design of Experiments for Inferring the Statistical Expectation of Expensive Black-Box Functions," ASME J. Mech. Des., 141(10), p. 101404.
2. Hu, Z., and Mahadevan, S., 2016, "Global Sensitivity Analysis-Enhanced Surrogate (GSAS) Modeling for Reliability Analysis," Struct. Multidiscipl. Optim., 53(3), pp. 501–521.
3. Blanchard, A., and Sapsis, T., 2021, "Output-Weighted Optimal Sampling for Bayesian Experimental Design and Uncertainty Quantification," SIAM/ASA J. Uncertainty Quantif., 9(2), pp. 564–592.
4. Goodfellow, I., Bengio, Y., and Courville, A., 2016, Deep Learning, MIT Press, Cambridge, MA.