We consider the utilization of a computational model to guide the optimal acquisition of experimental data to inform the stochastic description of model input parameters. Our formulation is based on the recently developed consistent Bayesian approach for solving stochastic inverse problems, which seeks a posterior probability density that is consistent with the model and the data in the sense that the push-forward of the posterior (through the computational model) matches the observed density on the observations almost everywhere. Given a set of potential observations, our optimal experimental design (OED) seeks the observation, or set of observations, that maximizes the expected information gain from the prior probability density on the model parameters. We discuss the characterization of the space of observed densities and a computationally efficient approach for rescaling observed densities to satisfy the fundamental assumptions of the consistent Bayesian approach. Numerical results are presented to compare our approach with existing OED methodologies using the classical/statistical Bayesian approach and to demonstrate our OED on a set of representative partial differential equations (PDE)-based models.

## Introduction

Experimental data are often used to infer valuable information about parameters for models of physical systems. However, the collection of experimental data can be costly and time consuming. For example, exploratory drilling can reveal valuable information about subsurface hydrocarbon reservoirs, but each well can cost upward of tens of millions of U.S. dollars. In such situations, we can only afford to gather some limited number of experimental data; however, not all experiments provide the same amount of information about the processes they are helping inform. Consequently, it is important to design experiments in an optimal way, i.e., to choose some limited number of experimental data to maximize the value of each experiment.

The first experimental design methods employed mainly heuristics, based on concepts such as space-filling and blocking, to select field experiments [15]. While these methods can perform well in some situations, these methods can be improved upon by incorporating any knowledge of the underlying physical processes being inferred or measured. Using physical models to guide, experiment selection has been shown to drastically improve the cost-effectiveness of experimental designs for a variety of models based on ordinary differential equations [68], partial differential equations (PDEs) [9] and differential algebraic equations [10]. When model observables are linear with respect to the model parameters, the alphabetic optimality criteria are often used [1113]. For example, A-optimality to minimize the average variance of parameter estimates, D-optimality to maximize the differential Shannon entropy, or G-optimality to minimize the maximum variance of model predictions. These criteria have been developed in both Bayesian and non-Bayesian settings [11,12,1417].

In this paper, we focus attention on Bayesian methods for optimal experimental design (OED) that can be applied to both linear and nonlinear models [1820]. Specifically, we pursue OEDs, which are optimal for inferring model parameters on finite-dimensional spaces from experimental data observed at a set of sensor locations. In the context of OED for inference, analogs of the alphabetic criterion, for linear models, have also been applied to nonlinear models [9,14,21,22]. In certain situations, for example infinite-dimensional problems (random variables are random fields) or problems with computational expensive models, OED based upon linearizations of the model response, and Laplace (Gaussian) approximations of the posterior distribution have been necessary [14,21]. In other settings, non-Gaussian approximations of the posterior have also been pursued [1720].

This manuscript presents a new approach for OED based upon consistent Bayesian inference, introduced in Ref. [23]. We adopt an approach for OED similar to the approach in Ref. [24] and seek an OED that maximizes the expected information gain from the prior to the posterior over the set of possible observational densities. Although our OED framework is Bayesian in nature, this approach is fundamentally different from the statistical Bayesian methods mentioned previously. The aforementioned Bayesian OED methods use what we will refer to as the classical/statistical Bayesian approach for stochastic inference (e.g., see Ref. [25]) to characterize posterior densities that reflect an assumed error model. In contrast, consistent Bayesian inference assumes a probability density on the observations is given and produces a posterior density that is consistent with the model and the data in the sense that the push-forward of the posterior (through the computational model) matches the observed density almost everywhere. We direct the interested reader to Ref. [23] for a discussion on the differences between the consistent and statistical Bayesian approaches. Consistent Bayesian inference has some connections with measure-theoretic inference [26], which was used for OED in Ref. [27], but the two approaches make different assumptions and therefore typically give different solutions to the stochastic inverse problem.

The consistent Bayesian approach is appealing for OED since it can be used in an offline-online mode. Consistent Bayesian inference requires an estimate of the push-forward of the prior, which although expensive can be computed offline or obtained from archival simulation data. Once the push-forward of the prior is constructed, the posterior density can be approximated cheaply. Moreover, this push-forward of the prior does not depend on the density on the observations, which enables a computationally efficient approach for solving multiple stochastic inverse problems for different densities on the observations. This can significantly reduce the cost of computing the expected information gain if the set of candidate observation is known a priori.

The main objectives in this paper are to derive an OED formulation using the consistent Bayesian framework and to present a computational strategy to estimate the expected information gained for an experimental design. The pursuit of a computationally efficient approach for coupling our OED method with continuous optimization techniques is an intriguing topic that we leave for future work. Here, we consider batch design over a discrete set of possible experiments. Batch design, also known as open-loop design, involves selecting a set of experiments concurrently such that the outcome of any experiment does not affect the selection of the other experiments. Such an approach is often necessary when one cannot wait for the results of one experiment before starting another, but is limited in terms of the number of observations we can consider.

The remainder of this paper is outlined as follows: In Sec. 2, we summarize the consistent Bayesian method for solving stochastic inverse problems. In Sec. 3, we discuss the information content of an experiment and present our OED formulation based upon expected information gain. During the process of defining the expected information gain of a given experimental design, care must be taken to ensure that the model can predict all of our potential observed data. In Sec. 4, we discuss situations for which this assumption is violated and means for avoiding these situations. Numerical examples are presented in Sec. 5 and concluding remarks are provided in Sec. 6.

## A Consistent Bayes Formulation for Stochastic Inverse Problems

We are interested in experimental designs, which are optimal for inferring model parameters from experimental data. Inferring model parameters for a single design and realization of experimental data is a fundamental component of producing such optimal designs. In this section, we summarize the consistent Bayes method for parametric inference, originally presented in Ref. [23]. Although Bayesian in nature, the consistent Bayesian approach differs significantly from its classical Bayesian counterpart [25,28,29], which was used for OED in Refs. [12,14,17,21], [24], and [3032]. We refer the interested reader to Ref. [23] for a full discussion of these differences.

### Notation, Assumptions, and a Stochastic Inverse Problem.

Let M(Y, λ) denote a deterministic model with solution Y (λ) that is an implicit function of model parameters $λ∈Λ⊂ℝn$. The set Λ represents the largest physically meaningful domain of parameter values, and, for simplicity, we assume that Λ is compact. In practice, modelers are often only concerned with computing a relatively small set of quantities of interest (QOI), ${Qi(Y)}i=1m$, where each Qi is a real-valued functional dependent on the model solution Y. Since Y is a function of parameters λ, so are the QOI and we write Qi(λ) to make this dependence explicit. Given a set of QOI, we define the QOI map $Q(λ):=(Q1(λ),…,Qm(λ))⊤:Λ→D⊂ℝm$ where $D:=Q(Λ)$ denotes the range of the QOI map.

Assume $(Λ,BΛ,μΛ)$ and $(D,BD,μD)$ are measure spaces. We assume $BΛ$ and $BD$ are the Borel σ-algebras inherited from the metric topologies on $ℝn$ and $ℝm$, respectively. The measures μΛ and $μD$ are volume measures.

We assume that the QOI map Q is at least piecewise smooth implying that Q is a measurable map between the measurable spaces $(Λ,BΛ)$ and $(D,BD)$. For any $A∈BD$, we then have
$Q−1(A)={λ∈Λ | Q(λ)∈A}∈BΛ, and Q(Q−1(A))=A$

Furthermore, $B⊆Q−1(Q(B))$ for any $B∈BΛ$, although in most cases, $B≠Q−1(Q(B))$ even when n = m.

Finally, we assume that an observed probability measure, $PDobs$, is given on $(D,BD)$ and is absolutely continuous with respect to $μD$, which implies it can be described in terms of an observed probability density, $πDobs$. The stochastic inverse problem is then defined as determining a probability measure, PΛ, described as a probability density, πΛ, such that the push-forward measure agrees with $PDobs$. We use $PDQ(PΛ)$ to denote the push-forward of PΛ through Q(λ), i.e.,
$PDQ(PΛ)(A)=PΛ(Q−1(A))$

for all $A∈BD$. Using this notation, a solution to the stochastic inverse problem is defined formally as follows:

Definition 1(Consistency). Given a probability measure$PDobs$on$(D,BD)$that is absolutely continuous with respect to$μD$and admits a density$πDobs$, the stochastic inverse problem seeks a probability measure PΛ on$(Λ,BΛ)$that is absolutely continuous with respect to μΛ and admits a probability density πΛ such that the subsequent push-forward measure induced by the map, Q(λ), satisfies
$PΛ(Q−1(A))=PDQ(PΛ)(A)=PDobs(A)$
(1)

for any$A∈BD$. We refer to any probability measure PΛ that satisfies Eq.(1)as a consistent solution to the stochastic inverse problem.

Clearly, a consistent solution may not be unique, i.e., there may be multiple probability measures that are consistent in the sense of Definition 1. This is analogous to a deterministic inverse problem where multiple sets of parameters may produce the observed data. A unique solution may be obtained by imposing additional constraints or structure on the stochastic inverse problem. In this paper, such structure is obtained by incorporating prior information to construct a unique Bayesian solution to the stochastic inverse problem.

### A Bayesian Solution to the Stochastic Inverse Problem.

Following the Bayesian philosophy [33], we introduce a prior probability measure $PΛprior$ on $(Λ,BΛ)$ that is absolutely continuous with respect to μΛ and admits a probability density $πΛprior$. The prior probability measure encapsulates the existing knowledge about the uncertain parameters.

Assuming that Q is at least measurable, then the prior probability measure on Λ, $PΛprior$, and the map, Q, induce a push-forward measure $PDQ(prior)$ on $D$, which is defined for all $A∈BD$
$PDQ(prior)(A)=PΛprior(Q−1(A))$
(2)
We utilize the following expression for the posterior:
$PΛpost(B):={PΛprior(B)PDobs(Q(B))PDQ(prior)(Q(B)),if PΛprior(B)>00,otherwise$
(3)
which we describe in terms of a probability density given by
$πΛpost(λ)=πΛprior(λ)πDobs(Q(λ))πDQ(prior)(Q(λ)), λ∈Λ$
(4)

We note that if $πDQ(prior)=πDobs$, i.e., if the prior solves the stochastic inverse problem, then the posterior density will be equal to the prior density.

It was recently shown in Ref. [23] that the posterior given by Eq. (3) defines a consistent probability measure using a contour σ-algebra. When interpreted as a particular iterated integral of Eq. (4), the posterior defines a probability measure on $(Λ,BΛ)$ in the sense of Definition 1, i.e., the push-forward of the posterior matches the observed probability density. Approximating the posterior density using the consistent Bayesian approach only requires an approximation of the push-forward of the prior probability on the model parameters, which is fundamentally a forward propagation of uncertainty. While numerous approaches have been developed in recent years to improve the efficiency and accuracy of the forward propagation of uncertainty using computational models, in this paper, we only consider the most basic of methods, namely Monte Carlo sampling, to sample from the prior. We evaluate the computational model for each of the samples from the prior and use a standard Kernel density estimator (KDE) [34] to approximate the push-forward of the prior.

Given the approximation of the push-forward of the prior, we can evaluate the posterior at any point λ ∈Λ if we compute Q(λ). This provides several possibilities for interrogating the posterior. In Sec. 3.2, we compute Q(λ) on a uniform grid of points to visualize the posterior after we compute the push-forward of the prior. This does require additional model evaluations, but visualizing the posterior is rarely required and only useful for illustrative purposes in 1 or 2 dimensions. More often, we are interested in obtaining samples from the posterior. This is also demonstrated in Sec. 3.2 where the samples from the prior are either accepted or rejected using a standard rejection sampling procedure. For a given λ, we compute the ratio $πΛpost(λ)/(MπΛprior(λ))$, where M is an estimate of the maximum of the ratio over Λ, and compare this value with a sample, η, drawn from a uniform distribution on (0,1). If the ratio is larger than η, then we accept the sample. We apply the accept/reject algorithm to the samples from the prior and therefore the samples from the posterior are a subset of the samples used to compute the push-forward of the prior. Since we have already computed Q(λ) for each of these samples, the computational cost to select a subset of the samples for the posterior is minimal. However, in the context of OED, we are primarily interested in computing the information gained from the prior to the posterior, which only involves integrating with respect to the prior (see Sec. 3.1) and does not require additional model evaluations or rejection sampling.

In practice, we prefer to use data that are sensitive to the parameters since otherwise it is difficult to infer useful information about the uncertain parameters. Specifically, if m ≤ n and the Jacobian of Q is defined almost everywhere in Λ and is full rank almost everywhere, then the push-forward volume measure $μD$ is absolutely continuous with respect to the Lebesgue measure [23].

For the rest of this work, we maintain the following assumptions needed to produce a unique consistent solution to the stochastic inverse problem:

• (A1) We have a mathematical model and a description of our prior knowledge about the model input parameters,

• (A2) The data exhibit sensitivity to the parameters almost everywhere in Λ; hence, we use the Lebesgue measure μ as the volume measure on the data space,

• (A3) The observed density is absolutely continuous with respect to the push-forward of the prior.

The assumption concerning the absolute continuity of the observed density with respect to the prior is essential to define a solution to the stochastic inverse problem [23]. While this assumption may appear rather abstract, it simply assures that the prior and the model can predict, with nonzero probability, any event that we have observed. Since the observed density and the model are assumed to be fixed, this is only an assumption on the prior.

In the remainder of this work, we focus on quantifying the value of these posterior densities. We use the Kullback–Leibler (KL) divergence [35,36] to measure the information gained about the parameters from the prior to the posterior. We compute the expected information gain of a given set of QOI (a given experimental design), and then determine the OED to deploy in the field.

## The Information Content of an Experiment

We are interested in finding the OED for inferring model input parameters. Conceptually, a design is informative if the posterior distribution of the model parameters is significantly different from the prior. To quantify the information gain of a design, we use the Kullback–Leibler divergence [36] as a measure of the difference between a prior and posterior distribution. While the KL divergence is by no means the only way to compare two probability densities, it does provide a reasonable measure of the information gained in the sense of Shannon information [37] and is commonly used in Bayesian OED [30]. In this section, we discuss how to compute the KL divergence and define our OED formulation based upon expected information gain over a specific space of possible observed densities.

### Information Gain: Kullback–Leibler Divergence.

Suppose we are given a description of the uncertainty on the observed data in terms of a probability density $πDobs$. This produces a unique solution to the stochastic inverse problem $PΛpost$ that is absolutely continuous with respect to the Lebesgue measure μΛ [23] and admits a probability density, $πΛpost$. The KL divergence of the posterior from the prior (information gain), denoted IQ, is given by
$IQ(πΛprior:πΛpost):=∫ΛπΛpost log(πΛpostπΛprior)dμΛ$
(5)
Note that because $πΛprior$ is fixed, IQ is simply a function of the posterior
$IQ(πΛprior:πΛpost)=IQ(πΛpost)$
(6)
and from Eq. (4), the posterior is a function of the observed density. Therefore, we write IQ as a function of the observed density
$IQ(πΛpost)=IQ(πDobs)$
(7)

The observation that IQ is a function of only $πDobs$ allows us to define the expected information gain in Sec. 3.3 based on a specific space of observed densities.

Given a high dimensional parameter space, it may be computationally infeasible to accurately approximate the integral in Eq. (5). For example, a multivariate normal density with unit variance in 100-dimensions has a maximum value of $(1/2π)100≈1×10−40$. However, we may write this integral in terms of densities on the data space evaluated at Q(λ) as follows:
$IQ(πΛpost)=∫ΛπΛpost(λ)log(πΛpost(λ)πΛprior(λ))dμΛ=∫ΛπΛprior(λ)πDobs(Q(λ))πDQ(prior)(Q(λ))log(πDobs(Q(λ))πDQ(prior)(Q(λ)))dμΛ=∫ΛπDobs(Q(λ))πDQ(prior)(Q(λ)) log(πDobs(Q(λ))πDQ(prior)(Q(λ)))dPΛprior$
(8)

where the second equality comes from a simple substitution using Eq. (4). Given a set of samples from the prior, we only need to compute the push-forward of the prior in the data space to approximate IQ. This observation provides an efficient method for approximating IQ given a high dimensional parameter space and a low dimensional data space. In fact, we found it convenient to use Eq. (8) whenever the prior is not uniform. In the consistent Bayesian formulation, we evaluate the model at the samples generated from the prior to estimate the push-forward of the prior. It is a computational advantage to also use these samples to integrate with respect to the prior rather than integrating with respect to the volume measure, which would require additional model evaluations.

### A Motivating Nonlinear System.

Consider the following two-component nonlinear system of equations with two parameters introduced in Ref. [26]:
$λ1x12+x22=1x12−λ2x22=1$

The first QOI is the second component, i.e., Q1(λ) = x2(λ). The parameter ranges are given by λ1 ∈ [0.79, 0.99] and $λ2∈[1−4.50.1,1+4.50.1]$, which are chosen as in Ref. [26] to induce an interesting variation in the QOI. We assume the observed density on Q1 is a truncated normal distribution with mean 0.3 and standard deviation of 0.01, see Fig. 1 (right).

Fig. 1
Fig. 1
Close modal

We generate 40,000 samples from the uniform prior and use a Kernel density estimator to construct an approximation to the resulting push-forward density, see Fig. 1 (right). Then we use Eq. (4) to construct an approximation to the posterior density using the same 40,000 samples, see Fig. 1 (left), and a simple accept/reject algorithm to generate a set of samples from the posterior, see Fig. 1 (middle). We propagate this set of samples from the posterior through the model and approximate the resulting push-forward of the posterior density using a KDE. In Fig. 1 (right) we see that the push-forward of the posterior agrees quite well with the observed density. Notice the support of the posterior lies in a relatively small region of the parameter space. The information gain from this posterior is $IQ1(πDobs)≈2.015$.

Next, we consider a different QOI to use in the inverse problem, and compare the support of its posterior to the one we just observed. Specifically consider
$Q2(λ)=x1$

We assume the observed density on Q2 is a truncated normal distribution with mean 1.015 and standard deviation of 0.01. We approximate the push-forward density and the posterior using the same 40,000 samples and again generate a set of samples from the posterior and propagate these samples through the model to approximate the push-forward of the posterior, see Fig. 2.

Fig. 2
Fig. 2
Close modal

Although both Q1 and Q2 have the same standard deviation in their observed densities, clearly the two QOI produce very different posterior densities. The posterior corresponding to data from Q2 has a much larger region of support within the parameter space compared to that of the posterior corresponding to Q1. This is quantified with the information gain from this posterior $IQ2(πDobs)≈0.466$. Given these two maps, Q1 and Q2, and the specified observed data on each of these data spaces, the data Q1 is more informative of the parameters than the data Q2.

Next, we consider using the data from both Q1 and Q2, Q: Λ → (Q1, Q2), with the same means and standard deviations as specified earlier. Again, we approximate the push-forward density and the posterior using the same 40,000 samples, see Fig. 3. With the information from both Q1 and Q2, we see a substantial decrease in the support of the posterior density. Intuitively, the support of the posterior using both Q1 and Q2 is the support of the posterior using Q1 intersected with the support of the posterior using Q2. This is quantified in the information gain of this posterior $IQ(πDobs)≈2.98$.

Fig. 3
Fig. 3
Close modal

In the scenario in which we can afford to gather data on both Q1 and Q2, we benefit greatly in terms of reducing the uncertainties on the model input parameters. However, suppose we could only afford to gather one of these QOI in the field. Based on the information gained from each posterior, Q1 is more informative about the parameters than Q2. However, consider a scenario in which the observed data have different means in both Q1 and Q2. Due to the nonlinearities of the maps, it is not necessarily true that Q1 is still more informative than Q2. If we do not know the mean of the data for either Q1 or Q2, then we want to determine which of these QOI we expect to produce the most informative posterior.

### Expected Information Gain.

Optimal experimental design must select a design before experimental data become available. In the absence of data, we use the simulation model to quantify the expected information gain of a given experimental design. Let $O$ denote the space of densities over $D$. We want to define the expected information gain as some kind of average over this density space in a meaningful way. However, this is far too general of a space to use to define the expected information gain. This space includes densities that are unlikely to be observed in reality. Therefore, we restrict $O$ to be a space more representative of densities that may be observed in reality.

With no experimental data available to specify an observed density on a single QOI, we assume the density is a truncated Gaussian with a standard deviation determined by some estimate of the measurement instrument error. With Gaussians of (possibly) varying standard deviations specified for each QOI, this defines the shape of the observed densities we consider. We let $OD$ denote the space of all densities of this shape centered in $D=Q(Λ)$
$OD={N̂(q,σ2):q∈D}$
(9)

where $N̂(q,σ2)$ is a truncated Gaussian function with mean q and standard deviation σ. More details of this definition of $OD$ are addressed in Sec. 4. We can easily generalize our description of $O$. For example, we could also consider the standard deviation of the observed data to be uncertain, in which case we would also average over some interval of possible values for σ. However, in this work, we only vary the center of the Gaussian densities.

Remark 1. We can restrict $O$ in other ways as well. For example, if we expect the uncertainty in each QOI to be described by a uniform density, then we define the restriction on $O$ accordingly. This choice of characterization of the observed density space is largely dependent on the application. The only limitation is that we require the measure specified on the observed density space to be defined in terms of the push-forward measure, $PDQ(prior)$, as described later. In Sec. 5.2, we describe one approach for defining a restricted observed density space where the observed density of each QOI has a Gaussian profile and the standard deviations are functions of the magnitudes of each QOI.

The restriction of possible $πDobs$ to this specific space of densities allows us to represent each density uniquely with a single point $q∈D$. Based on our prior knowledge of the parameters and the sensitivities of the map Q, the model informs us that some data are more likely to be observed than other data; this is seen in the plot of $πDQ(prior)$ in Fig. 3 (upper left). This implies we do not want to average over $D$ with respect to μ or $μD$, but rather with respect to the push-forward of the prior on D, $PDQ(prior)$. This respects the prior knowledge of the parameters and the sensitivity information provided by the model. We define the expected information gain, denoted E(IQ), as just described
$E(IQ):=∫DIQ(q)πDQ(prior)(q)dμ=∫DIQ(q)dPDQ(prior)$
(10)
From Eq. (5), IQ itself is defined in terms of an integral. The expanded form for E(IQ) is then an iterated integral
$E(IQ)=∫D∫ΛπΛpost(λ;q)log(πΛpost(λ;q)πΛprior(λ))dμΛdPDQ(prior)$
(11)

where we make explicit that $πΛpost$ is a function of the observed density and, by our restriction of the space of observed densities in Eq. (9), therefore a function of $q∈D$. We utilize Monte Carlo sampling to approximate the integral in Eq. (10) as described in Algorithm 1.

Algorithm 1

Approximating the expected information gain of an experiment

 1. Given a set of samples from the prior density: $λ(i), i=1,…,N$⁠; 2. Given a set of samples from the push-forward density: $q(j)=Q(λ(j)), j=1,…,N$ ; 3. Construct an observed density centered at each q(j). 4. For j = 1,…, M approximate IQ(q(j)) using Eq. (8) $IQ(q(j))≈μΛ(Λ)N∑i=1NπDobs(Q(λ(i)))πDQ(prior)(Q(λ(i))) log(πDobs(Q(λ(i)))πDQ(prior)(Q(λ(i))))$ 5. Compute $E(IQ)≈(1/M)∑j=1MIQ(q(j))$⁠;
 1. Given a set of samples from the prior density: $λ(i), i=1,…,N$⁠; 2. Given a set of samples from the push-forward density: $q(j)=Q(λ(j)), j=1,…,N$ ; 3. Construct an observed density centered at each q(j). 4. For j = 1,…, M approximate IQ(q(j)) using Eq. (8) $IQ(q(j))≈μΛ(Λ)N∑i=1NπDobs(Q(λ(i)))πDQ(prior)(Q(λ(i))) log(πDobs(Q(λ(i)))πDQ(prior)(Q(λ(i))))$ 5. Compute $E(IQ)≈(1/M)∑j=1MIQ(q(j))$⁠;

Table 1

(Stationary convection-diffusion: Uncertain source amplitude) The top five experimental designs chosen using the full set of 5000 samples. For each of these designs, we compute $E(IQz)$ for 50, 200, 1000, and 5000 samples. Notice the change in $E(IQz)$ for a given design decreases as we increase to 5000 samples.

Design location5020010005000
(0.558, 0.571)2.7582.7672.8152.826
(0.561, 0.546)2.7522.7622.8092.820
(0.582, 0.574)2.7292.7362.7822.793
(0.549, 0.570)2.7282.7352.7812.792
(0.593, 0.596)2.7262.7332.7792.790
Design location5020010005000
(0.558, 0.571)2.7582.7672.8152.826
(0.561, 0.546)2.7522.7622.8092.820
(0.582, 0.574)2.7292.7362.7822.793
(0.549, 0.570)2.7282.7352.7812.792
(0.593, 0.596)2.7262.7332.7792.790

Algorithm 1 appears to be a computationally expensive procedure since it requires solving M stochastic inverse problems and, as noted in Ref. [23], approximating $πDQ(prior)$ can be expensive. In Ref. [23] and in this paper, we use kernel density estimation techniques to approximate $πDQ(prior)$, which does not scale well as the dimension of $D$ increases [34]. On the other hand, for a given experimental design, we only need to compute this approximation once, as each IQ in step 4 of Algorithm 1 is computed using the same prior and map Q and, therefore, the same $πDQ(prior)$. In other words, the fact that the consistent Bayes method only requires approximating the push-forward of the prior implies that this information can be used to approximate posteriors for different observed densities without requiring additional model evaluations. This significantly improves the computational efficiency of the consistent Bayesian approach in the context of OED. We leverage this computational advantage throughout this paper by considering a discrete set of designs, which allows us to compute the push-forward for all of the candidate designs simultaneously. Utilizing a continuous design space might require computing the push-forward for each iteration of the optimization algorithm, since the designs (locations of the observations) are not known a priori. The additional model simulations required to compute the push-forward of the prior at new design points might be intractable if the number of iterations is large; however, the need for new simulations may be avoided if the new observations can be extracted from archived state-space data. For example, if one stores the finite element solutions of a PDE at all samples of the prior at the first iteration of the design optimization, one can evaluate observations at new design locations, which are functionals of this PDE solution, via interpolation using the finite element basis.

### Defining the Optimal Experimental Design.

We are now in a position to define our OED formulation. Recall that our experimental design is defined as the set of QOI computed from the model and we seek the optimal set of QOI to deploy in the field. Given a physics-based model, prior information on the model parameters, a space of potential experimental designs, and a generic description of the uncertainties for each QOI, we define our OED as follows.

Definition 2(OED). Let$Q$represent the design space, i.e., the space of all possible experimental designs, and$Qz∈Q$be a specific design. Then, the OED is$Qz∈Q$that maximizes the expected information gain
$Qopt:=argmaxQz∈QE(IQz)$
(12)

As previously mentioned, the focus in this paper is on the utilization of the consistent Bayesian methodology within the OED framework, so we do not explore different approaches for solving the optimization problem given by Definition 2 and simply find the optimal design over a discrete set of candidate designs.

Remark 2. Consistent Bayesian inference is potentially well suited to finding OED in continuous design spaces. Typically, OED based upon statistical Bayesian methods uses Markov chain Monte Carlo methods to characterize the posterior distribution. Markov chain Monte Carlo methods do not provide a functional form for the posterior but rather only provide samples from the posterior. Consequently, gradient-free or stochastic gradient-based optimization methods must be used to find the optimal design. In contrast, consistent Bayesian inference provides a functional form for the posterior, which allows the use of more efficient gradient-based optimizers. Exploring the use of more efficient continuous optimization procedures will be the subject of future work.

## Infeasible Data

The OED procedure proposed in this manuscript is based upon consistent Bayesian inference, which requires that the observed measure is absolutely continuous with respect to the push-forward measure induced by the prior and the model (assumption A3). In other words, any event that we observe with nonzero probability will be predicted using the model and prior with nonzero probability. During the process of computing E(IQ), it is possible that we violate this assumption. Specifically, depending on the mean and variance of the observational density, we may encounter $πDobs∈OD$ such that $∫DπDobsdμ<1$, i.e., support of $πDobs$ extends beyond the range of the map Q, see Fig. 4 (upper right). In this section, we discuss the causes of infeasible data and options for avoiding infeasible data when estimating an optimal experimental design.

Fig. 4
Fig. 4
Close modal

### Infeasible Data and Consistent Bayesian Inference.

When inferring model parameters using consistent Bayesian inference, the most common cause for infeasible data is that the model being used to estimate the OED is inadequate. That is, the deviation between the computational model and reality is large enough to prohibit the model from predicting all of the observational data. The deviation between the model prediction and the observational data is often referred to as model structure error and can often be a major source of uncertainty. This is an issue of most if not all inverse parameter estimation problems [29]. Recently, there has been a number of attempts to quantify this error (e.g., see Ref. [28]); however, such approaches are beyond the scope of this paper. In the following, we will assume that the model structure error does not prevent the model from predicting all the observational data.

### Infeasible Data and Optimal Experimental Design.

To estimate an approximate OED, we must quantify the expected information gain of a given experimental design (see Sec. 3.3). The expectation is over all possible normal observation densities with mean $q∈D$ and variance σ, defined by the space (9). When the support of $D$ is bounded, these densities may produce infeasible data. The effect of this violation increases as q approaches the boundary of $D$.

To remedy this violation of (A3), we must modify the set of observational densities. In this paper, we choose to normalize $πDobs$ over $D$. We redefine the observed density space $OD$ so that (A3) holds for each density in the space
$OD={N̂(q,σ2)Cq:q∈D}$
(13)
where $N̂(q,σ2)$ is a truncated Gaussian function with mean q and standard deviation σ, and Cq is the integral of $N̂(q,σ2)$ over $D$ with respect to the Lebesgue measure on $D$
$Cq=∫DN̂(q,σ2)dμ$
(14)

A similar approach for normalizing Gaussian densities over compact domains was taken in Ref. [38].

### A Nonlinear Model With Infeasible Data.

In this section, we use the nonlinear model introduced in Sec. 3.2 to demonstrate that infeasible data can arise from relatively benign assumptions. Suppose the observed density on Q1 is a truncated normal distribution with mean 0.3 and standard deviation of 0.04. In this one-dimensional (1D) data space, this observed density is absolutely continuous with respect to the push-forward of the prior on Q1, see Fig. 5 (left). Next, suppose the observed density on Q2 is a truncated normal distribution with mean 0.982 and standard deviation of 0.01. Again, in this new one-dimensional data space, this observed density is absolutely continuous with respect to the push-forward of the prior on Q2, see Fig. 5 (right). Both of these observe densities are dominated by their corresponding push-forward densities, i.e., the model can reach all of the observed data in each case.

Fig. 5
Fig. 5
Close modal

However, consider the data space defined by both Q1 and Q2 and the corresponding push-forward and observed densities on this space, see Fig. 4. The nonrectangular shape of the combined data space is induced by the nonlinearity in the model and the correlations between Q1 and Q2. As we see in Fig. 4, the observed density using the product of the one-dimensional Gaussian densities is not absolutely continuous with respect to the push-forward density on (Q1, Q2), i.e., the support of $πDobs$ extends beyond the support of $πDQ(prior)$. Referring to Eq. (13), we normalize this observed density over $D$, see Fig. 4 (right). Now that the new observed density obeys the assumptions needed, we could solve the stochastic inverse problem as described in Sec. 2.

### Computational Considerations.

The main computational challenge in the consistent Bayesian approach is the approximation of the push-forward of the prior. Following Ref. [23], we use Monte Carlo sampling for the forward propagation of uncertainty. While the rate of convergence is independent of the number of parameters (dimension of Λ), the accuracy in the statistics for the QOI may be relatively poor unless a large number of samples can be taken. Alternative approaches based on surrogate models can significantly improve the accuracy, but are generally limited to small number of parameters. We also employ kernel density estimation techniques to construct a nonparametric approximation of the push-forward density, but it is well known that these techniques do not scale well with the number of observations (dimension of $D$) [34].

Next, we address the computational issue of normalizing $N̂(q,σ2)$, i.e., $πDobs$, over $D$. From the plot of $πDQ(prior)$ in Fig. 4 (left), it is clear that the data space may be a complex region. Normalizing $πDobs$, as in Fig. 4 (right), over $D$ would be computationally expensive. Fortunately, the consistent Bayesian approach provides a means to avoid this expense. Note that from Eq. (3), we have
$PΛpost(Λ)=PΛprior(Λ)PDobs(Q(Λ))PDQ(prior)(Q(Λ))$
(15)
where $PΛprior(Λ)=PDQ(prior)(Q(Λ))=1$ which implies
$PΛpost(Λ)=PDobs(Q(Λ))$
(16)
Therefore, normalizing $πDobs$ over $D$ is equivalent to solving the inverse problem and then normalizing $π̃Λpost$ (where we use the tilde over π to indicate this function does not integrate to 1 because we have violated (A3)) over Λ. Although Λ may not always be a generalized rectangle, (A1) implies we have a clear definition of Λ and therefore can efficiently integrate $π̃Λpost$ over Λ and then normalize $π̃Λpost$ by
$πΛpost=π̃Λpost∫Λπ̃ΛpostdμΛ$
(17)
In fact, this normalization factor can be estimated without additional model evaluations and without using the values of the prior or the posterior, which may not be usable in high-dimensional spaces. We observe that
$PΛpost(Λ)=∫ΛπΛpost dμΛ=∫ΛπDobs(Q(λ))πDQ(prior)(Q(λ)) dPΛprior$

Thus, we can use the values of $πDobs$ and $πDQ(prior)$ computed for the samples generated from the prior, which were used to estimate the push-forward of prior, to integrate $πDobs/πDQ(prior)$ with respect to the prior.

## Numerical Examples

In this section, we consider several models of physical systems. First, we consider a stationary convection–diffusion model with a single uncertain parameter controlling the magnitude of the source term. Next, we consider a transient transport model with a two-dimensional parameter space determining the location of the source of a contaminant. Then, we consider an inclusion problem in computational mechanics where two uncertain parameters control the shape of the inclusion. Finally, we consider a high-dimensional example of single-phase incompressible flow in porous media where the uncertain permeability field is given by a Karhunen–Loéve expansion [39].

In each example, we have a parameter space Λ, a set of possible QOI, and a specified number of QOI we can afford to gather during the experiment. This in turn defines a design space $Q$ and we let $Qz∈Q$ represent a single experimental design and $Dz=Qz(Λ)$ the corresponding data space. For each experimental design, we let σz represent the standard deviations defined by the uncertainties in each QOI that compose Qz and $ODz$ representing the observed density space.

All of these examples have continuous design spaces, so we approximate the OED by selecting the OED from a large set of candidate designs. This approach was chosen because it is much more efficient to perform the forward propagation of uncertainty using random sampling only once and to compute all of the candidate measurements for each of these random samples. Alternatively, one could pursue a continuous optimization formulation, which would require a full forward propagation of uncertainty for each new design. As mentioned in Sec. 3.4, one could limit the number of designs using a gradient-based or Newton-based optimization approach, but this is beyond the scope of this paper.

### Stationary Convection–Diffusion: Uncertain Source Amplitude.

In this section, we consider a convection–diffusion problem with a single uncertain parameter controlling the magnitude of a source term. This example serves to demonstrate that the OED formulation gives intuitive results for simple problems.

#### Problem Setup.

Consider a stationary convection–diffusion model on a square domain
${−D∇2u+∇·(vu)=S,x∈Ω∇u·n=0,x∈ΓN⊂∂Ωu=0,x∈ΓD⊂∂Ω$
(18)
with
$S(x)=A exp(−||xsrc−x||22h2)$

where Ω = [0, 1]2, u is the concentration field, the diffusion coefficient D = 0.01, the convection vector v = [1, 1], and S is a Gaussian source with the following parameters: xsrc is the location, A is the amplitude, h is the width. We impose homogeneous Neumann boundary conditions on ΓN (right and top boundaries) and homogeneous Dirichlet conditions on ΓD (left and bottom boundaries). For this problem, we choose xsrc = [0.5, 0.5], and h = 0.05. We let A be uncertain within [50, 150]; thus, the parameter space for this problem is Λ = [50, 150]. Hence, our goal is to gather some limited amount of data that provide the best information about the amplitude of the source, i.e., reduces our uncertainty in A. To approximate solutions to the PDE in Eq. (18), given a source amplitude A, we use a finite element discretization with continuous piecewise bilinear basis functions defined on a uniform (25 × 25) spatial grid.

#### Results.

We assume that we have limited resources for gathering experimental data; specifically, we can only afford to place one sensor in the domain to gather a single concentration measurement. Our goal is to place this single sensor in Ω to maximize the expected information gained about the amplitude of the source. We discretize Ω using 2000 uniform random points, which produces a design space with 2000 possible experimental designs. For this problem, we let the uncertainty in each QOI be described by a truncated Gaussian profile with a fixed standard deviation of 0.1. This produces observed density spaces, $ODz$, as described in Eq. (13).

We generate 5000 uniform samples from the prior and simulate measurements of each QOI for each of these 5000 samples. We consider approximate solutions to the OED problem using subsets of the 5000 samples of size 50, 200, 1000, and 5000. For each experimental design, we calculate $E(IQz)$ using Algorithm 1 and plot $E(IQz)$ as a function of the discretized design space in Fig. 6. Notice the expected information gain is greatest near the center of the domain (near the location of the source) and in the direction of the convection vector away from the source. This result matches intuition, as we expect data gathered in regions of the domain that exhibit sensitivity to the parameters to produce high expected information gains.

Fig. 6
Fig. 6
Close modal

We note that, for this example, a sufficiently accurate approximation to the design space and the OED is obtained using only 50 samples corresponding to 50 model evaluations. In Table 1, we show the top five experimental designs (computed using the full set of 5000 samples) and corresponding $E(IQz)$ for each set of samples.

### Time-Dependent Diffusion: Uncertain Source Location.

In this section, we compare results from a statistical Bayesian formulation of OED to the formulation described in this paper. Specifically, we consider the model in Ref. [24] where the author uses a classical Bayesian framework for OED to determine the optimal placement of a single sensor that maximizes the expected information about the location of a contaminant source.

#### Problem Setup.

Consider a contaminant transport model on a square domain
${∂u∂t=∇2u+S,x∈Ω,t>0∇u·n=0,x∈∂Ω,t>0u=0,x∈Ω,t=0$
(19)
with
$S(x)={s2πh2exp(−||xsrc−x||22h2),if 0≤t<τ0,if t≥τ$

where Ω = [0, 1]2, u is the space–time concentration field, we impose homogeneous Neumann boundary conditions along with a zero initial condition, and S is a Gaussian source with the following parameters: xsrc is the location, s is the intensity, h is the width, and τ is the shutoff time.

Our goal is to gather some limited amount of data that provide the best information about the location of the source, i.e., reduce our uncertainty in xsrc. For this problem, we choose s = 2.0, h = 0.05, and τ = 0.3 and let xsrc be uncertain within [0, 1]2 such that Λ = [0, 1]2. To approximate solutions to the PDE in Eq. (19) given a location of S, i.e., a given xsrc, we use a finite element discretization with continuous piecewise bilinear basis functions defined on a uniform (25 × 25) spatial grid and backward Euler time integration with a step size Δt = 0.004 (100 time steps).

#### Results.

We assume that we have limited resources for gathering experimental data, specifically, we can only afford to place one sensor in the domain and can only gather a single concentration measurement at time t = 0.24. Our goal is to place this single sensor in Ω to maximize the expected information gained about the location of the contaminant source. For simplicity, we discretize Ω using an 11 × 11 regular grid of points, which produces a design space with 121 possible experimental designs. We let the uncertainty in each QOI be described by a Gaussian profile with a standard deviation that is a function of the magnitude of the QOI, i.e.,
$σi=0.1+0.1|qi| for i=1…M$
(20)
where M is the dimension of the data space. This produces observed density spaces, $ODz$, that consist of truncated Gaussian functions with varying standard deviations
$ODz={N̂(q,(σ(q))2)Cq:q∈Dz}$
(21)

We generate 5000 uniform samples from the prior and simulate measurements of each QOI for each of these 5000 samples. We consider approximate solutions to the OED problem using subsets of the 5000 samples of size 50, 200, 1000, and 5000. For each experimental design, we use these data to calculate $E(IQz)$ using Algorithm 1 and plot $E(IQz)$ as a function of the discretized design space in Fig. 7. Notice the expected information gain is greatest near the corners of the domain and smallest near the center; this is consistent with Ref. [24]. In Table 2, we show the top five experimental designs, approximated using the full set of 5000 samples, and corresponding $E(IQz)$ for each set of samples.

Fig. 7
Fig. 7
Close modal
Table 2

(Time-dependent diffusion: Uncertain source location) The top five experimental designs chosen using the full set of 5000 samples. For each of these designs, we compute $E(IQz)$ for 50, 200, 1000, and 5000 samples. Notice the change in $E(IQz)$ for a given design decreases as we increase to 5000 samples.

Design location5020010005000
(0, 0)0.6870.8280.7380.741
(1, 1)0.6530.7130.7470.740
(0, 0.1)0.6870.8170.7330.736
(0.1, 0)0.6870.8100.7280.735
(1, 0.9)0.6480.7130.7420.735
Design location5020010005000
(0, 0)0.6870.8280.7380.741
(1, 1)0.6530.7130.7470.740
(0, 0.1)0.6870.8170.7330.736
(0.1, 0)0.6870.8100.7280.735
(1, 0.9)0.6480.7130.7420.735

In Fig. 8 we consider three different posteriors computed using data from the OED approximated using 5000 samples, i.e., data gathered by a sensor placed in the bottom left corner of the domain, where each posterior corresponds to a different possible location of the source. We see varying levels of information gain in these three scenarios, reiterating the point that we choose the OED based on the average of these information gains, E(IQ).

Fig. 8
Fig. 8
Close modal

Remark 3. Although many of the results in this section seem to match our intuition about which measurement locations should produce high expected information gains, this may not always be the case. In particular, we have found that our results can depend on our choice of the variance in the observed densities σ. If σ is chosen to be large relative to the range of a data space, then the posteriors produced as we average over $OD$ are all nearly the same and potentially produce unusually high information gains when the observed densities have substantial support over regions of the data space with very small probability (very small values of the push-forward of the prior). Another way to think of this is the push-forward densities have high entropy and because σ is large, $πDobs$ is very close to uniform and this produces posterior densities with high information gains. If σ is chosen to be small relative to the range of the data space, i.e., if we expect the experiments to be informative, we do not encounter this issue because we are integrating over $D$ with respect to the push-forward measure so most of our potential observed data lie in high probability regions of the data space.

### A Parameterized Inclusion.

In this section, we consider a simple problem in computational mechanics where the precise boundary of an inclusion is uncertain. We parameterize the inclusion and seek to determine the location to place a sensor that will maximize the information gained regarding the shape of the inclusion. We use a linear elastic formulation to model the response of the media to surface forces and measure horizontal stress at each sensor location. We assume that the material properties (Poisson ratio and Young's modulus) are different inside the inclusion and that these properties are known a prior (Fig. 9).

Fig. 9
Fig. 9
Close modal

#### Problem Setup.

Consider a linear elastic plane strain model
${−∇·σ(u)=0,x∈Ω=[−5,5]×[0,2]u=g,x∈ΓD={(x,y)∈Ω | x=0}σ(u)n=t,x∈ΓN=∂Ω∖ΓD$
(22)
where $σ(u)$ is given by the linear elastic constitutive relation
$σ(u)=λ(∇·u)I+μ(∇u+∇uT)$
We express this relation in terms of the Lamé parameters, λ and μ, which are related to the Poisson ratio, ν, and Young's modulus, E, via the following expressions:
$μ=E2(1+ν), λ=Eν(1+ν)(1−2ν)$
Now, assume that there is an inclusion within the media defined by an ellipse
$I={(x,y)∈Ω | 1α(x−x0)2+1β(y−y0)2≤1}$
where x0 = y0 = 0 and α is uniformly distributed on [0.5, 1] and β is uniformly distributed on [0.25, 0.5]. The material properties are assumed to be known and are given by
$ν={0.45,(x,y)∈I,0.3,otherwise,, E={10.0,(x,y)∈I40.0,otherwise$

These material properties were not chosen to emulate any particular materials, just to demonstrate the proposed OED formulation.

Next, let us impose homogeneous Dirichlet boundary conditions on the bottom boundary and stress-free boundary conditions on the sides, and impose a uniform traction in the y-direction along the top boundary ($ttop=(0,−1)T$). And finally assume that we can probe the media and measure the horizontal stress at a given sensor location. We do not want to puncture the inclusion, so we only consider sensor locations outside the bounds on the inclusion. Equation (22) was solved using a finite element discretization with piecewise linear basis functions defined on a uniform 400 × 80 mesh resulting in a system with 64,962 degrees-of-freedom. The computational model is implemented using the Trilinos toolkit [40] and each realization of the model requires approximately 1 s using eight processors.

#### Results.

As in previous examples, we assume that we have limited resources for gathering experimental data; specifically, we can only afford to place one sensor in the domain to gather a single stress measurement. Our goal is to place this single sensor to maximize the expected information gained about the shape of the inclusion. We select 2000 random sensor locations (outside the inclusion bounds), which produce a design space with 2000 possible experimental designs. For this problem, we let the probability density for the QOI be described by a truncated Gaussian profile with a fixed standard deviation of 0.001. We generate 1000 uniform samples from the prior and compute the horizontal stress at each sensor location for each of these 1000 samples.

First, we compare the posterior densities for two sensor locations, (3.5294, 1.3049) and (1.3902, 1.2100), under the assumption that we have already gathered data at these sensor locations. The purpose here is to demonstrate that we obtain different posterior densities and therefore gain different information from each sensor. The first sensor is further from the inclusion; so we expect that the data from the second sensor will constrain the posterior more than the data from the first. In Figs. 10 and 11, we plot the samples from the posterior and the corresponding kernel density estimate of the posterior for the first and second sensor locations, respectively. It is clear that measuring the horizontal stress closer to the inclusion increases the information gained from the prior to the posterior.

Fig. 10
Fig. 10
Close modal
Fig. 11
Fig. 11
Close modal

We consider approximate solutions to the OED problem using subsets of the 1000 samples of size 10, 50, 100, and 1000. For each experimental design, we use this data to calculate $E(IQz)$ using Algorithm 1 and plot $E(IQz)$ as a function of the discretized design space in Fig. 12. Notice the expected information gain is greatest near the bottom of the domain near the inclusion and is reasonably symmetric around the inclusion. Also, note that the expected information gain is relatively large in the bottom corners of the domain. This is due to the choice of boundary conditions for the model, which induces a large amount of stress in these corners. In Table 3, we show the top five experimental designs, approximated using the full set of 1000 samples, and corresponding $E(IQz)$ for each set of samples.

Fig. 12
Fig. 12
Close modal

### A Higher-Dimensional Porous Media Example With Uncertain Permeability.

In this section, we consider an example of single-phase incompressible flow in porous media with a Karhunen–Loéve expansion of the uncertain permeability field. The purpose of this example is to demonstrate the OED formulation on a problem with a high-dimensional parameter space and more than one sensor.

#### Problem Setup.

Consider a single-phase incompressible flow model
${−∇·(K(λ)∇p)=0,x∈Ω=(0,1)2p=1,x=0p=0,x=1K∇p·n=0,y=0 and y=1$
(23)
Here, p is the pressure field and K is the permeability field, which we assume is a scalar field given by a Karhunen–Loéve expansion of the log transformation, $Y=log K$, with
$Y(λ)=Y¯+∑i=1∞ξi(λ)ηifi(x,y)$
where $Y¯$ is the mean field. We assume that the mean removed random media is given by a Gaussian process, which implies that ξi are mutually uncorrelated random variables with zero mean and unit variance [41,42]. The eigenvalues, ηi, and eigenfunctions, fi, are computed numerically using the following covariance function:
$CY(x,x¯)=σY2 exp[−(x1−x¯1)22η1−(x2−x¯2)22η2]$

where σY and ηi denote the variance and correlation length in the ith spatial direction, respectively. We assume a correlation length of 0.01 in each spatial direction and truncate the expansion at 100 terms. This choice of truncation is purely for the sake of demonstration. In practice, the expansion is truncated once a sufficient fraction of the energy in the eigenvalues is retained [39,41]. This truncation gives 100 uncorrelated random variables, ξ1,…, ξ100, with zero mean and unit variance which implies $Λ=ℝ100$. To approximate solutions to the PDE in Eq. (23), we use a finite element discretization with continuous piecewise bilinear basis functions defined on a uniform (50 × 50) spatial grid.

#### Results.

In this section, we present approximate solutions to several different design problems. We begin with the familiar problem of choosing a single sensor location within the physical domain. Then, we consider approximating the optimal location of a second sensor given the location of the first sensor. In this way, we solve the greedy OED problem and determine the greedy optimal locations of 1–8 sensors within the physical domain. We then consider solving the exhaustive OED problem where we limit the sensors to 25 locations and consider determining the optimal location of five available sensors.

First, assume that we have limited resources for gathering experimental data, specifically, we can only afford to place one sensor in the domain to gather a single pressure measurement. Our goal is to place this single sensor in Ω to maximize the expected information gained about the amplitude of the source. We discretize Ω using 1301 points on a grid, which produces a design space with 1301 possible experimental designs. For this problem, we let the uncertainty in each QOI be described by a truncated Gaussian profile with a fixed standard deviation of 0.01. This produces observed density spaces, $ODz$, as described in Eq. (13).

We generate 10,000 samples from the prior and simulate measurements of each QOI. We consider approximate solutions to the OED problem using subsets of the 10,000 samples of size 50, 100, 1000, and 10,000. For each experimental design, we calculate $E(IQz)$ using Algorithm 1 and plot $E(IQz)$ as a function of the discretized design space in Fig. 13. Notice the expected information gain is greatest near the top and bottom of the domain away from the left and right edges. This result matches intuition, as we expect data gathered near the left and right edges to be less informative given the Dirichlet boundary condition imposed on those boundaries. We note that, for this example, a sufficiently accurate approximation to the design space and the OED is obtained using only 1000 samples corresponding to 1000 model evaluations. In Table 4, we show the top five experimental designs (computed using the full set of 10,000 samples) and corresponding $E(IQz)$ for each set of samples.

Fig. 13
Fig. 13
Close modal

Next, we consider the greedy OED problem of placing eight sensors within the physical domain. We choose to use all of the available 10,000 samples to solve this problem. In Fig. 14, we see the design space as a function of the previously determined locations of placed sensors. We observe a strong symmetry to this problem, as is expected due to the symmetry of the physical process defined on this domain with the given boundary conditions. In the bottom right of Fig. 14, notice the very small range of the expected information gained indicating the possible values of the expected information gain. This suggests, for this example, we expect there is a limit on the number of useful sensor locations for informing likely parameter values.

Fig. 14
Fig. 14
Close modal

Finally, we consider the exhaustive OED problem of placing five sensors within the physical domain and, for computational feasibility, restrict the possible locations of these five sensors to 25 points in the physical domain, see Fig. 15. We choose to use 1000 samples to solve this problem. In Fig. 15, we plot the design space for a single sensor location using these 1000 samples and show the optimal location of 1, 2, 3, 4, and 5 sensors. The results are quite similar to the greedy results previously described.

Fig. 15
Fig. 15
Close modal
Table 3

(A parameterized inclusion) The top five experimental designs chosen using the full set of 1000 samples. For each of these designs, we compute $E(IQz)$ for 10, 50, 100 and 1000 samples.

Design location10501001000
(−1.001, 0.961)2.3033.6624.0854.569
(−1.050, 1.035)2.3033.3913.8114.261
(1.041, 0.959)2.3033.4373.8394.256
(−1.050, 1.130)2.3023.4183.6614.096
(1.005, 0.870)2.2773.2463.5443.813
Design location10501001000
(−1.001, 0.961)2.3033.6624.0854.569
(−1.050, 1.035)2.3033.3913.8114.261
(1.041, 0.959)2.3033.4373.8394.256
(−1.050, 1.130)2.3023.4183.6614.096
(1.005, 0.870)2.2773.2463.5443.813
Table 4

(A higher-dimensional porous media example with uncertain permeability) The top five experimental designs chosen using the full set of 10,000 samples. For each of these designs, we compute $E(IQz)$ for 50, 100, 1000, and 10,000 samples. Notice the change in $E(IQz)$ for a given design decreases as we increase to 10,000 samples.

Design location50100100010,000
(0.48, 1)1.9662.0011.9922.008
(0.5, 0.98)1.9521.9631.9932.007
(0.46, 0.98)1.9301.9851.9862.006
(0.52, 0)1.7771.9151.9992.006
(0.56, 0)1.7511.8631.9962.006
Design location50100100010,000
(0.48, 1)1.9662.0011.9922.008
(0.5, 0.98)1.9521.9631.9932.007
(0.46, 0.98)1.9301.9851.9862.006
(0.52, 0)1.7771.9151.9992.006
(0.56, 0)1.7511.8631.9962.006

## Conclusion

In this manuscript, we developed an OED formulation based on the recently developed consistent Bayesian approach for solving stochastic inverse problems. We used the Kullback–Leibler divergence and the posterior obtained using consistent Bayesian inference to measure the information gain of a design and present a discrete optimization procedure for choosing the optimal experimental design that maximizes the expected information gain. The optimization procedure presented in this paper is limited in terms of the number of observations we can consider, but was chosen to focus attention on the definition and approximation of the expected information gained. More efficient strategies, utilizing gradient-based methods on continuous design spaces, will be pursued in a future work. We discussed a characterization of the space of observed densities needed to compute the expected information gain and a computationally efficient approach for rescaling observed densities to satisfy the requirements of the consistent Bayesian approach. Numerical examples were given to highlight the properties and utility of our approach.

## Acknowledgment

J.D. Jakeman's work was supported by DARPA EQUIPS. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under Contract No. DE-NA-0003525.

## Funding Data

• Defense Advanced Research Projects Agency (Grant No. EQUiPS).

## References

1.
Cox
,
D. R.
, and
Reid
,
N.
,
2000
,
The Theory of the Design of Experiments
,
CRC Press
,
Boca Raton, FL
.
2.
Fisher
,
R. A.
,
1966
,
The Design of Experiments
, 8th ed.,
Oliver & Boyd
,
Edinburgh, UK
.
3.
Pazman
,
A.
,
1986
,
Foundations of Optimum Experimental Design
,
D. Reidel Publishing
,
Dordrecht, The Netherlands
.
4.
Pukelsheim
,
F.
,
1993
,
Optimal Design of Experiments
,
Wiley
,
New York
.
5.
Ucinski
,
D.
,
2005
, Optimal Measurement Methods for Distributed Parameter System Identification,
CRC Press
,
Boca Raton, FL
.
6.
Korkel
,
S.
,
Kostina
,
E.
,
Bock
,
H. G.
, and
Schloder
,
J. P.
,
2004
, “
Numerical Methods for Optimal Control Problems in Design of Robust Optimal Experiments for Nonlinear Dynamic Processes
,”
Optim. Methods Software
,
19
(
3–4
), pp.
327
338
.
7.
Chung
,
M.
, and
Haber
,
E.
,
2012
, “
Experimental Design for Biological Systems
,”
SIAM J. Control Optim.
,
50
(
1
), pp.
471
489
.
8.
Bock
,
H. G.
,
Körkel
,
S.
, and
Schlöder
,
J. P.
,
2013
, “
Parameter Estimation and Optimum Experimental Design for Differential Equation Models
,”
Model Based Parameter Estimation
,
Springer
,
Berlin
, pp.
1
30
.
9.
Horesh
,
L.
,
Haber
,
E.
, and
Tenorio
,
L.
,
2010
, “
Optimal Experimental Design for the Large-Scale Nonlinear Ill-Posed Problem of Impedance Imaging
,”
Large-Scale Inverse Problems and Quantification of Uncertainty
,
Wiley
, Chichester, UK, pp.
273
290
.
10.
Bauer
,
I.
,
Bock
,
H. G.
,
Körkel
,
S.
, and
Schlöder
,
J. P.
,
2000
, “
,”
J. Comput. Appl. Math.
,
120
(
1–2
), pp.
1
25
.
11.
Haber
,
E.
,
Magnant
,
Z.
,
Lucero
,
C.
, and
Tenorio
,
L.
,
2012
, “
Numerical Methods for A-Optimal Designs With a Sparsity Constraint for Ill-Posed Inverse Problems
,”
Comput. Optim. Appl.
,
52
(
1
), pp.
293
314
.
12.
Alexanderian
,
A.
,
Petra
,
N.
,
,
G.
, and
Ghattas
,
O.
,
2014
, “
A-Optimal Design of Experiments for Infinite-Dimensional Bayesian Linear Inverse Problems With Regularized ℓ0-Sparsification
,”
SIAM J. Sci. Comput.
,
36
(
5
), pp.
A2122
A2148
.
13.
Haber
,
E.
,
Horesh
,
L.
, and
Tenorio
,
L.
,
2008
, “
Numerical Methods for Experimental Design of Large-Scale Linear Ill-Posed Inverse Problems
,”
Inverse Probl.
,
24
(
5
), p.
055012
.
14.
Alexanderian
,
A.
,
Petra
,
N.
,
,
G.
, and
Ghattas
,
O.
,
2016
, “
A Fast and Scalable Method for A-Optimal Design of Experiments for Infinite-Dimensional Bayesian Nonlinear Inverse Problems
,”
SIAM J. Sci. Comput.
,
38
(
1
), pp.
A243
A272
.
15.
Atkinson
,
A.
, and
Donev
,
A.
,
1992
,
Optimum Experimental Designs
,
Oxford University Press
,
Oxford, UK
.
16.
Chaloner
,
K.
, and
Verdinelli
,
I.
,
1995
, “
Bayesian Experimental Design: A Review
,”
Stat. Sci.
,
10
(
3
), pp.
273
304
.
17.
Long
,
Q.
,
Scavino
,
M.
,
Tempone
,
R.
, and
Wang
,
S.
,
2015
, “
A Laplace Method for Under-Determined Bayesian Optimal Experimental Designs
,”
Comput. Methods Appl. Mech. Eng.
,
285
, pp.
849
876
.
18.
Loredo
,
T. J.
, and
Chernoff
,
D. F.
,
2003
,
,
Springer
,
New York
, pp.
57
70
.
19.
Müller
,
P.
,
Sansó
,
B.
, and
Iorio
,
M. D.
,
2004
, “
Optimal Bayesian Design by Inhomogeneous Markov Chain Simulation
,”
J. Am. Stat. Assoc.
,
99
(
467
), pp.
788
798
.
20.
Solonen
,
A.
,
Haario
,
H.
, and
Laine
,
M.
,
2012
, “
Simulation-Based Optimal Design Using a Response Variance Criterion
,”
J. Comput. Graphical Stat.
,
21
(
1
), pp.
234
252
.
21.
Long
,
Q.
,
Scavino
,
M.
,
Tempone
,
R.
, and
Wang
,
S.
,
2013
, “
Fast Estimation of Expected Information Gains for Bayesian Experimental Designs Based on Laplace Approximations
,”
Comput. Methods Appl. Mech. Eng.
,
259
, pp.
24
39
.
22.
Haber
,
E.
,
Horesh
,
L.
, and
Tenorio
,
L.
,
2010
, “
Numerical Methods for the Design of Large-Scale Nonlinear Discrete Ill-Posed Inverse Problems
,”
Inverse Probl.
,
26
(
2
), p.
025002
.
23.
Butler
,
T.
,
Jakeman
,
J. D.
, and
Wildey
,
T.
,
2016
, “
A Consistent Bayesian Formulation for Stochastic Inverse Problems Based on Push-Forward Measures
,” preprint
arXiv:1704.00680
.https://arxiv.org/abs/1704.00680
24.
Huan
,
X.
,
2015
, “
Numerical Approaches for Sequential Bayesian Optimal Experimental Design
,”
Ph.D. thesis
, Massachusetts Institute of Technology, Cambridge, MAhttps://dspace.mit.edu/handle/1721.1/101442.
25.
Stuart
,
A. M.
,
2010
, “
Inverse Problems: A Bayesian Perspective
,”
Acta Numer.
,
19
(
5
), pp.
451
559
.
26.
Breidt
,
J.
,
Butler
,
T.
, and
Estep
,
D.
,
2012
, “
A Measure-Theoretic Computational Method for Inverse Sensitivity Problems I: Method and Analysis
,”
SIAM J. Numer. Anal.
,
49
(
5
), pp.
1836
1859
.
27.
Butler
,
T.
,
Pilosov
,
M.
, and
Walsh
,
S.
,
2016
, “
Simulation-Based Optimal Experimental Design: A Measure-Theoretic Perspective
.” SIAM J. Sci. Comput. (in press).
28.
Sargsyan
,
K.
,
Najm
,
H. N.
, and
Ghanem
,
R.
,
2015
, “
On the Statistical Calibration of Physical Models
,”
Int. J. Chem. Kinet.
,
47
(
4
), pp.
246
276
.
29.
Kennedy
,
M. C.
, and
O'Hagan
,
A.
,
2001
, “
Bayesian Calibration of Computer Models
,”
J. R. Stat. Soc., Ser. B
,
63
(
3
), pp.
425
464
.
30.
Huan
,
X.
, and
Marzouk
,
Y. M.
,
2013
, “
Simulation-Based Optimal Bayesian Experimental Design for Nonlinear Systems
,”
J. Comput. Phys.
,
232
(
1
), pp.
288
317
.
31.
Chaloner
,
K.
, and
Verdinelli
,
I.
,
1995
, “
Bayesian Experimental Design: A Review
,”
Inst. Math. Stat.
,
10
(
3
), pp.
273
304
.
32.
Huan
,
X.
, and
Marzouk
,
Y.
,
2014
, “
Gradient-Based Stochastic Optimization Methods in Bayesian Experimental Design
,”
Int. J. Uncertainty Quantif.
,
4
(
6
), pp.
479
510
.
33.
Tarantola
,
A.
,
2005
,
Inverse Problem Theory and Methods for Model Parameter Estimation
,
SIAM
,
.
34.
Wand
,
M.
, and
Jones
,
M.
,
1994
, “
Multivariate Plug-In Bandwidth Selection
,”
Comput. Stat.
,
9
(
2
), pp.
97
116
.
35.
Kullback
,
S.
, and
Leibler
,
R. A.
,
1951
, “
On Information and Sufficiency
,”
Ann. Math. Stat.
,
22
(
1
), pp.
79
86
.
36.
van Erven
,
T.
, and
Harremoes
,
P.
,
2014
, “
Rényi Divergence and Kullback–Leibler Divergence
,”
IEEE Trans. Inf. Theory
,
60
(
7
), pp.
3797
3820
.
37.
Cover
,
T. A.
, and
Thomas
,
J. A.
,
2006
,
Elements of Information Theory
,
Wiley
,
Chichester, UK
.
38.
Bisetti
,
F.
,
Kim
,
D.
,
Knio
,
O.
,
Long
,
Q.
, and
Tempone
,
R.
,
2016
, “
Optimal Bayesian Experimental Design for Priors of Compact Support With Application to Shock-Tube Experiments for Combustion Kinetics
,”
Int. J. Numer. Methods Eng.
,
108
(
2
), pp.
136
155
.
39.
Zhang
,
D.
, and
Lu
,
Z.
,
2004
, “
An Efficient, High-Order Perturbation Approach for Flow in Random Porous Media Via Karhunen–Loéve and Polynomial Expansions
,”
J. Comput. Phys.
,
194
(
2
), pp.
773
794
.
40.
Heroux
,
M.
,
Bartlett
,
R.
,
Hoekstra
,
V. H. R.
,
Hu
,
J.
,
Kolda
,
T.
,
Lehoucq
,
R.
,
Long
,
K.
,
Pawlowski
,
R.
,
Phipps
,
E.
,
Salinger
,
A.
,
Thornquist
,
H.
,
Tuminaro
,
R.
,
Willenbring
,
J.
, and
Williams
,
A.
,
2003
, “
An Overview of Trilinos
,” Sandia National Laboratories, Albuquerque, NM, Technical Report No.
SAND2003-2927
.http://www.sandia.gov/~tgkolda/pubs/pubfiles/SAND2003-2927.pdf
41.
Ganis
,
B.
,
Klie
,
H.
,
Wheeler
,
M. F.
,
Wildey
,
T.
,
Yotov
,
I.
, and
Zhang
,
D.
,
2008
, “
Stochastic Collocation and Mixed Finite Elements for Flow in Porous Media
,”
Comput. Methods Appl. Mech. Eng.
,
197
(
43–44
), pp.
3547
3559
.
42.
Wheeler
,
M. F.
,
Wildey
,
T.
, and
Yotov
,
I.
,
2011
, “
A Multiscale Preconditioner for Stochastic Mortar Mixed Finite Elements
,”
Comput. Methods Appl. Mech. Eng.
,
200
(
9–12
), pp.
1251
1262
.