## Abstract

The cost of adaptive sampling for global metamodeling depends on the total number of costly function evaluations and the degree to which these evaluations are performed in parallel. Conventionally, samples are taken through a greedy sampling strategy that is optimal for either a single sample or a handful of samples. The limitation of such approaches is that they compromise optimality when more samples are taken. In this paper, we propose a thrifty adaptive batch sampling (TABS) approach that maximizes a multistage reward function to find an optimal sampling policy containing the total number of sampling stages, the number of samples per stage, and the spatial location of each sample. Consequently, the first batch identified by TABS is optimal with respect to all potential future samples and the available resources, and is consistent with a modeler's preference and risk attitude. Moreover, we propose two heuristic-based strategies that reduce numerical complexity with a minimal reduction in optimality. Through numerical examples, we show that TABS outperforms or is comparable with greedy sampling strategies. In short, TABS provides modelers with a flexible adaptive sampling tool for global metamodeling that effectively reduces sampling costs while maintaining prediction accuracy.

## 1 Introduction

Costly physical experiments have been increasingly replaced by simulation models. These simulations facilitate an otherwise very expensive process for developing complex products and services. However, even simulation models have become too expensive to handle the complexity required by many contemporary engineering applications. To this end, the use of metamodels (a.k.a., surrogate models, models of models, emulators, or response surface approximations) is now ubiquitous across a wide range of applications [13]. Metamodels offer modelers significant monetary and computational advantages as they can efficiently approximate expensive simulations [4] based on a small number of samples.

Allocating the spatial locations of sample data involves a trade-off between maximizing metamodel accuracy (i.e., reducing the discrepancy between the predicted and actual function) and minimizing the total number of evaluations (sampling cost). The entire set of all such spatial locations is commonly known as a “design” of computer experiments. Typically, two strategies are considered when constructing a “design” [5]: static sampling and adaptive sampling [6,7]. A static sampling approach does not use information from the costly simulation model and can therefore create a design in one stage [8,9]. In contrast, adaptive sampling strategies leverage the information obtained from newly evaluated samples [10] and have been of particular interest in a multitude of research efforts in metamodel-based design, including (i) global optimization [11–16], (ii) contour estimation [17–19], and (iii) global metamodeling [20–23].

In this paper, we consider the problem of adaptive sampling for global metamodeling, which by convention consists of three steps: (i) allocate the next sample(s) using a heuristic-based acquisition function, (ii) evaluate the costly simulation model, and (iii) retrain the metamodel after including the newly observed data. These three steps are repeated until either the model is considered accurate enough or all available resources have been exhausted. This stagewise approach has an advantage over static sampling strategies, which are prone to over-sampling or under-sampling, as the required number of samples is not known a priori. Adaptive sampling for global metamodeling helps modelers efficiently find the required number of samples. The required number of samples depends on the desired metamodel accuracy (i.e., higher accuracy requires more samples), the nonlinearity of the function (i.e., higher nonlinearity requires more samples), and the choice of metamodel. Concerning the latter, Harold Kushner [24] was the first to use the error estimation of a Brownian motion spatial random process (SRP) to guide sampling. Much of the succeeding literature, which includes adaptive sampling for global metamodeling, adopts a similar strategy by leveraging the properties of Gaussian processes (GPs) (i.e., a special case of SRPs) to construct a design [25,26].

Despite a rich literature, most of the early work on global metamodeling neglects the potential gain of taking multiple samples simultaneously and does not consider the interrelation between samples at present and future sampling stages [14,27], presumably due to tractability challenges. Regarding the latter, by not considering the influence of samples at future sampling stages, a modeler compromises the optimality of the design when multiple sampling stages are required (as is often the case) [28]. An example of an approach that does account for the interrelation for global metamodeling of noisy objective functions is given in Ref. [29]. However, for reasons of tractability, the approach assumes a maximum time horizon. As the potential monetary or temporal gain of performing simulations in batches (i.e., running multiple costly simulations per sampling stage) is not considered, the obtained design will not optimally utilize available resources. An example of an effort that considers the potential gain of batch sampling is given in Ref. [30], where a bin-based Latin hypercube design method is proposed for global metamodeling using a fixed batch size. To the best of our knowledge, an approach that combines multistage sampling with flexible batch size selection considering available resources does not exist, which is likely due to the complexity in formulating and solving such decision-making problems using optimization.

In this article, we present a novel thrifty adaptive batch sampling (TABS) approach for global metamodeling. At each sampling stage, TABS identifies a batch of samples that is optimal with respect to the available resources and the predicted future sampling batches. Optimality in this regard is with respect to the number of invested resources and improved metamodel accuracy. Because the improved metamodel accuracy is uncertain before sampling a new batch, we employ normative decision theory to make the trade-off between improved prediction accuracy and sampling cost that is consistent with a modeler’s preference and risk attitude. As a result, TABS offers decision support to identify a design that satisfies a predefined metamodel accuracy criterion using minimal resources.

The proposed method adopts a multistage reward function that is analogous to the reward function in dynamic programming [31,32]. However, as previous work on multistage adaptive sampling for global optimization has indicated, dynamic programming scales poorly as the time horizon increases [33]. To circumvent this limitation, we propose two numerical approximations. The first method makes three consecutive decisions (TABS-3) at each stage: (i) the number of samples to consider over all future sampling stages and their spatial locations, (ii) the order in which to sample, and (iii) the number of sampling stages and the batch size at each stage. Note that throughout this paper we consider a decision as the optimization of one objective function, and therefore, a single decision can consist of more than one argument. The second method (TABS-1) considers only one decision per stage: the number of sampling stages and the batch size at each stage. The spatial location and the sampling order for TABS-1 are decided prior to initiating the adaptive sampling procedure from a low-discrepancy sequence (e.g., Sobol, Halton, and Faure [34]). Through simulations, we show that both approaches outperform conventional adaptive sampling strategies in terms of monetary or temporal efficiency.

The format of the remainder of the paper is as follows. Section 2 provides background information on normative decision theory, GP modeling, and dynamic programming. Section 3 introduces the proposed TABS method. Next, we apply TABS to a numerical test problem, explore some of its intricacies with regard to sparse data, risk attitude, and evaluate its performance on an engineering example in Sec. 4. Finally, Sec. 5 concludes the paper.

## 2 Review of Related Work

The fundamental techniques for the development of TABS are presented in this section.

### 2.1 A Normative Approach to Decision-Making Under Uncertainty.

Growing in popularity is the perspective of viewing engineering design (including adaptive sampling) as a decision-making process consisting of two steps: (i) determining all possible design alternatives and (ii) choosing the best alternative [35]. From the perspective of adaptive sampling for global metamodeling, the preference order of alternative sampling batches is given by a preference function $\varphi: \mathcal{X} \mapsto \mathbb{R}$, where $x \in \mathcal{X}$ and $\mathcal{X} \subset \mathbb{R}^d$ is a bounded subset of a d-dimensional Euclidean space. In this context, a preference function rank-orders alternative sampling batches from the most preferred to the least preferred. An example of a preference function in the context of global metamodeling is the integrated mean-squared error, as this is a good proxy for metamodel accuracy (the lower the integrated mean-squared error, the better). As such, the spatial location of additional sample(s) can be found by identifying the sampling locations $z = \{x_1, x_2, \ldots, x_i\}$ for $i \in \mathbb{N}^+$ that minimize the integrated mean-squared error
$z^* = \operatorname*{argmin}_{z \in \mathcal{X}} \varphi(z)$
(1)

This type of function is typically known as an acquisition function. One limitation is that conventional acquisition functions do not provide a fair trade-off between committed resources and improved metamodel accuracy. To this end, normative decision theory introduces a value function $R(\varphi(z))$ that quantifies the value of sampling z. To provide more context, the value can be considered as simulation time or monetary cost. Throughout this work, we use simulation time, which we consider to be wall-clock time. In other words, the value function is a mapping from metamodel accuracy to its equivalent worth in terms of simulation time. In the case of adaptive sampling for global metamodeling, a value function needs to capture a modeler's value for improving metamodel accuracy and the cost of running additional simulations. However, prediction of the value function is subject to various sources of uncertainty, obscuring and compromising the optimality of Eq. (1). An example of such an uncertainty source is the unknown responses at unobserved spatial locations z.

Decisions under uncertainty are often made through robust and reliability-based design optimization procedures [36,37]. However, these philosophies account for neither the entire density of the improvement in metamodel accuracy nor the nonlinearity with which modelers value sampling alternatives [38]. The latter is known as a modeler's risk attitude, a concept rooted in psychological and economic factors that is elicited through an indifference probability between two sampling alternatives [39,40].

A more consistent school of thought is normative decision theory, which suggests taking the expected value of a utility function $U(\cdot)$, where the expectation is taken with respect to the uncertainty in the inputs of the acquisition function. This requires a new type of acquisition function $\psi(z, \theta)$, where $\theta$ are the uncertain inputs. Note that this kind of acquisition function effectively predicts the metamodel accuracy if a batch z were to be sampled. Rewriting Eq. (1) using $\psi(z, \theta)$ yields
$z^* = \operatorname*{argmax}_{z \in \mathcal{X}} E[U(R(\psi(z, \theta)))]$
(2)
A commonly used utility function is the exponential utility given as
$U(R(\cdot)) = \dfrac{1 - \exp\{\gamma(R(\cdot) - a)\}}{1 - \exp\{\gamma(b - a)\}}$
(3)
where a and b are the minimum and maximum values of $R(\psi(z, \theta))$ for all z, and $\gamma$ is a scalar parameter that accounts for the risk attitude. Generally, three risk attitudes can be identified: (i) risk-prone $\gamma < 0$, (ii) risk-averse $\gamma > 0$, and (iii) risk-neutral, in which case no utility function is used.
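The exponential utility in Eq. (3) is straightforward to evaluate numerically. The following is a minimal sketch; the function name and vectorized form are our own, not from the paper:

```python
import numpy as np

def exponential_utility(r, a, b, gamma):
    """Exponential utility of Eq. (3), normalized so that U(a) = 0 and
    U(b) = 1; gamma encodes the risk attitude (gamma < 0 risk-prone,
    gamma > 0 risk-averse, per the paper's convention)."""
    r = np.asarray(r, dtype=float)
    return (1.0 - np.exp(gamma * (r - a))) / (1.0 - np.exp(gamma * (b - a)))

# Utilities of a few rewards between the bounds a = 0 and b = 1
u = exponential_utility([0.0, 0.25, 0.5, 0.75, 1.0], a=0.0, b=1.0, gamma=2.0)
```

For any finite $\gamma \ne 0$, the mapping is monotonically increasing in the reward and pinned to 0 and 1 at the bounds a and b.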

### 2.2 Basic Concepts of Spatial Random Processes.

An SRP consists of a collection of random variables distributed over a spatial or temporal domain [41,42] and is typically used to approximate costly simulation models. A common form of SRP is the GP model, where the collection of random variables is jointly normally distributed. Let $x = [x_1, x_2, \ldots, x_d]^T$ be a d-dimensional input vector $x \in \mathcal{X}$, and let $y: \mathbb{R}^d \to \mathbb{R}$ be a (noise-free) scalar response function. In this way, GPs approximate a function f(x) as
$f(x) \sim \mathcal{GP}(m^T(x)\beta, \; c(x, x'))$
(4)
where m(x) is a column vector of q pre-specified basis functions, β is a column vector of q unknown regression coefficients, and c(x, x′) is the covariance function often chosen to be Gaussian, i.e.,
$c(x, x') = \sigma^2 \exp\left[-\sum_{i=1}^{d} 10^{\omega_i} (x_i - x_i')^2\right] = \sigma^2 r(x, x')$
(5)
where $\omega = [\omega_1, \omega_2, \ldots, \omega_d]^T$ is a vector of correlation parameters that govern the smoothness of the metamodel, $r(x, x')$ is the Gaussian correlation, and $\sigma^2$ is the prior variance. Note that Eq. (5) defines a stationary GP model and has been used throughout this paper. However, the proposed method works with any form of covariance function.
An essential part of training a GP is the estimation of the hyperparameters β, ω, σ2. Assuming we have queried the costly function at n locations x1, x2, …, xn and observed y = [y(x1), y(x2), …, y(xn)]T, the hyperparameters are typically found by maximizing the likelihood (or log-likelihood) function through numerical optimization. Through a procedure known as profiling [43,44], the log-likelihood can be simplified to
$\hat{\omega} = \operatorname*{argmin}_{\omega \in \Omega} \mathcal{L} = \operatorname*{argmin}_{\omega \in \Omega} \left[ n \log(\hat{\sigma}^2) + \log(\det(R)) \right]$
(6)
where $R$ is an $n \times n$ matrix with the $(i, j)$th element given as $R_{i,j} = r(x_i, x_j)$ for $i, j = 1, \ldots, n$, and typically $\Omega = [-10, 10]^d$. Subsequent to estimating $\omega$, the response $y(x)$ can be predicted from the posterior mean $\bar{\mu}(x)$ and mean-squared error $\bar{\sigma}^2(x)$ for any x in the design space.
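As a concrete illustration of Eqs. (5) and (6), the sketch below fits a constant-mean GP to a toy 1D function by grid-searching the profiled log-likelihood over $\omega$. The profiling formulas for $\beta$ and $\sigma^2$ are the standard ones; the grid search and the jitter term are our own simplifications, not the paper's optimizer:

```python
import numpy as np

def gauss_corr(X, omega):
    # Gaussian correlation of Eq. (5): r(x, x') = exp(-sum_i 10^{omega_i}(x_i - x_i')^2)
    theta = 10.0 ** np.asarray(omega)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-(d2 * theta).sum(axis=-1))

def profiled_neg_loglik(omega, X, y):
    # Profiled objective of Eq. (6): n log(sigma2_hat) + log(det(R)),
    # with beta and sigma^2 concentrated out (constant-mean GP).
    n = len(y)
    R = gauss_corr(X, omega) + 1e-8 * np.eye(n)   # jitter for conditioning
    Ri = np.linalg.inv(R)
    one = np.ones(n)
    beta = (one @ Ri @ y) / (one @ Ri @ one)      # generalized least-squares mean
    res = y - beta * one
    sigma2 = (res @ Ri @ res) / n                 # profiled process variance
    return n * np.log(sigma2) + np.linalg.slogdet(R)[1]

# Toy 1D data and a grid search over Omega = [-10, 10]
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(12, 1))
y = np.sin(6.0 * X[:, 0])
grid = np.linspace(-10.0, 10.0, 81)
omega_hat = grid[np.argmin([profiled_neg_loglik([w], X, y) for w in grid])]
```

In practice, the profiled objective is typically minimized with a gradient-based or multistart optimizer rather than a grid, but the grid keeps the sketch self-contained.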

### 2.3 Basic Concepts of Dynamic Programming.

Adaptive sampling lends itself well to the adoption of a dynamic programming perspective as it includes a set of subsequent intercorrelated decisions at discrete sampling stages [45]. Note that in this section we use the term “time-step” instead of “sampling stage,” as this is the convention in the dynamic programming literature. Consider a discrete-time system that at time-step t is characterized by a state $s_t \in \mathcal{S}_t$, where at each step an action $a_t$ from an action space $\mathcal{A}_t(s_t)$ is applied. Note that the action space has the current state as an input argument because not all actions might be available at each time-step. For a specified action $a_t$ at state $s_t$, the system transits into state $s_{t+1} \in \mathcal{S}_{t+1}$ at time-step t + 1.

The state-to-state transition holds when dealing with a fully defined deterministic system. However, this type of modeling is often impractical as random disturbances $\theta_t \in \Theta_t(s_t, a_t)$ impede our ability to deterministically predict the state of the system at future time-steps. Consequently, the system transitions into future states stochastically according to
$s_{t+1} = \Gamma_t(a_t, s_t, \theta_t), \quad \forall t \in \{1, \ldots, \tau\}, \; \forall (a_t, s_t, \theta_t) \in \mathcal{A}_t \times \mathcal{S}_t \times \Theta_t$
(7)
where $\tau$ is the total number of future time-steps (a.k.a., the time horizon), and for a specific time-step t, we have the dynamic $\Gamma_t: \mathcal{A}_t \times \mathcal{S}_t \times \Theta_t \to \mathcal{S}_{t+1}$.
The main objective of dynamic programming is to identify a problem-specific optimal control policy $\mu = [\mu_1, \mu_2, \ldots, \mu_\tau]$ that consists of a sequence of mappings from states $s_t$ to actions $a_t = \mu_t(s_t)$, i.e., $\mu_t: \mathcal{S}_t \to \mathcal{A}_t$. The definition of optimality is rooted in the use of a reward function, a function that quantifies the reward when an action $a_t$ is taken to transit into state $s_{t+1}$ while considering the disturbance $\theta_t$. The general formulation of the reward function is given as $r_t: \mathcal{A}_t \times \mathcal{S}_t \times \Theta_t \mapsto \mathbb{R}$. For an arbitrary decision policy and a finite number of time-steps, the expected cumulative reward is given as
$V_\mu(s_1) = E\left[ r_{\tau+1}(s_{\tau+1}) + \sum_{t=1}^{\tau} r_t(s_t, \mu_t(s_t), \theta_t) \right]$
(8)
where $r_{\tau+1}(s_{\tau+1})$ is the final reward function $r_{\tau+1}: \mathcal{S}_{\tau+1} \mapsto \mathbb{R}$, which accounts for the reward of ending in state $s_{\tau+1}$. Note that the expectation is taken with respect to the random disturbances $\theta_t$. The cumulative reward is commonly referred to as the action-value function and should not be confused with the deterministic value function from normative decision theory that we defined as $R(\cdot)$.
Through dynamic programming, an optimal policy μ* is identified by maximizing the action-value function
$\mu^* = \operatorname*{argmax}_{\mu \in \mathcal{M}} V_\mu(s_1)$
(9)

This optimization maximizes the expected reward over $\tau$ future time-steps considering all admissible policies $\mathcal{M}$. It should be pointed out that this formulation is recursive, so that it can potentially be solved through backward induction starting from $t = \tau$ and proceeding to $t = 1$. Whether Eq. (9) can actually be solved in practice depends on the tractability of the specific problem.
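To make the backward-induction idea concrete, the following toy sketch solves Eqs. (8) and (9) for a small finite problem with random transitions and rewards; all states, actions, and probabilities here are illustrative placeholders, not part of TABS:

```python
import numpy as np

# Toy finite-horizon problem: S states, A actions, tau time-steps, and a
# two-outcome disturbance theta with known probabilities.
S, A, tau = 4, 2, 3
rng = np.random.default_rng(1)
P_theta = np.array([0.5, 0.5])              # P(theta_t = k)
trans = rng.integers(0, S, size=(S, A, 2))  # s_{t+1} = Gamma_t(a_t, s_t, theta_t)
reward = rng.random(size=(S, A, 2))         # r_t(s_t, a_t, theta_t)
final = rng.random(S)                       # final reward r_{tau+1}(s_{tau+1})

V = final.copy()                            # initialize with V_{tau+1}
policy = np.zeros((tau, S), dtype=int)
for t in reversed(range(tau)):              # backward pass over the horizon
    Q = np.zeros((S, A))
    for s in range(S):
        for a in range(A):
            # expectation over the disturbance theta_t
            Q[s, a] = sum(P_theta[k] * (reward[s, a, k] + V[trans[s, a, k]])
                          for k in range(2))
    policy[t] = Q.argmax(axis=1)            # mu_t(s) = argmax_a Q(s, a)
    V = Q.max(axis=1)                       # Bellman backup
```

After the backward pass, `V` holds the optimal expected cumulative reward from each starting state, and `policy[t]` the optimal action map $\mu_t$.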

## 3 Thrifty Adaptive Batch Sampling and Its Implementation

The general problem setting considered in this work is the case where a modeler has to evaluate a costly function $f: \mathcal{X} \mapsto \mathbb{R}$ a large number of times, where $x \in \mathcal{X}$ with $\mathcal{X} \subset \mathbb{R}^d$ a bounded subset of a d-dimensional Euclidean space. To reduce the cost, the modeler wants to replace the costly simulation function f(x) with a metamodel. In this work, we only consider continuously differentiable simulation functions, as the converse cannot be modeled through typical GPs. Note that this is only a minor limitation as many physical phenomena are continuous. This section starts by introducing the general framework of TABS (Sec. 3.1), followed by two numerical approximation techniques, TABS-3 and TABS-1 (Sec. 3.2). More details on assessing the elements of TABS's acquisition function, including the utility-based reward function formulation, the probability assessment of meeting the accuracy criterion, and an example of a potential reward function, are provided in Secs. 3.3–3.5, respectively.

### 3.1 General Framework Thrifty Adaptive Batch Sampling.

TABS is initiated by defining five following inputs:

1. The maximum amount of resources $F_{eq}$ that the modeler is willing to invest (in this work, “resources” refers to simulation time, but resources can also be formulated in terms of money).

2. A cost function C(·) that captures the cost as a function of the number of sampling stages and the batch size at each stage.

3. Given an estimate for the metamodel accuracy ψ(·), the modeler has to specify a stopping criterion ɛ under which the metamodel is considered sufficiently accurate.

4. The modeler’s risk attitude U(·).

5. An initial design $a_0 = \{x_{0,1}, x_{0,2}, \ldots, x_{0,m_0}\}$ containing a small number of samples, where $m_0$ is the total number of initial samples, and $x_{0,i} \in \mathcal{X}, \; i = 1, \ldots, m_0$.

The first three inputs are commonly used in various adaptive sampling methods, while the risk attitude and the cost function are not. The utility function as introduced in Sec. 2.1 enables modelers to make decisions consistent with their risk attitude, and the cost function translates the effort of taking samples into simulation time. Adaptive sampling is an iterative discrete-time process, where each discrete point in time is referred to as a sampling stage, and each sampling stage consists of four phases as marked in Fig. 1. Because optimizing the acquisition function is computationally intensive, we propose two numerical approximations, TABS-1 (blue arrows) and TABS-3 (green arrows). The Phase 0 inputs to these two approaches differ slightly and will be further explained in Sec. 3.2.

In the first phase, we initiate the optimal sampling policy $\pi^* = \{a_0\}$, set the number of sampling stages t = 1, and define the current state $s_0$ as the empty set. Note that a state $s_t \in \mathcal{S}_t$ is defined by all samples $a_t \in \mathcal{A}_t(s_t)$ and their responses $y_t = f(a_t)$; in other words, $s_t = \cup_{i=0}^{t-1} \{a_i, y_i\}$ for $a_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,m_i}\}$ and $y_i = \{y_{i,1}, y_{i,2}, \ldots, y_{i,m_i}\}$. The first batch of the sampling policy is then used in Phase 1 to run the computer simulations and define the current state $s_t = s_{t-1} \cup \{(a_{t-1}, y_{t-1})\}$. Next, in Phase 2, the GP model is updated by re-estimating its hyperparameters through the likelihood function in Eq. (6). The state and the GP model are then passed to Phase 3 to estimate the metamodel accuracy. TABS is terminated if the accuracy requirement is met; otherwise, the current state and GP model are passed to the next phase. In Phase 4, a utility-based action-value function (UBAV) (i.e., the newly proposed acquisition function) denoted by $V_\pi(s_t)$ is maximized, updating the optimal sampling policy $\pi^*$, which contains the recommended optimal number of samples, their spatial locations, the number of stages, and the batch size for each stage. TABS-3 uses three consecutive decisions as given by the green block in Fig. 1, and TABS-1 considers only one decision as given by the blue block in Fig. 1.

After Phase 4, the number of stages is updated t = t + 1 and the procedure is repeated until the accuracy criterion is satisfied. Note that at the beginning of each stage the optimal sampling policy is updated as $π*={π1*,π2*,…,πτ*}$, where $πi*$ is the optimal batch of samples at the ith sampling stage for a total of τ stages. For notational purposes, we have not indexed π* with time but note that it is updated and differs at each sampling stage.
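The four phases of a sampling stage can be summarized in the following hypothetical skeleton, where `f` is the costly simulation and `fit_gp`, `estimate_accuracy`, and `maximize_ubav` are placeholders for the components detailed in Secs. 3.2–3.5 (the function names are ours, chosen for illustration):

```python
def tabs(a0, f, eps, maximize_ubav, fit_gp, estimate_accuracy):
    """Hypothetical skeleton of the TABS loop (Phases 0-4)."""
    policy = [a0]                          # Phase 0: initial policy pi* = {a0}
    state = []                             # the state s_0 is the empty set
    t = 1                                  # sampling-stage counter
    while True:
        batch = policy[0]                  # Phase 1: run the costly simulations
        state = state + [(x, f(x)) for x in batch]
        gp = fit_gp(state)                 # Phase 2: re-estimate hyperparameters
        if estimate_accuracy(gp, state) < eps:
            return gp, state               # Phase 3: accuracy criterion met
        policy = maximize_ubav(gp, state)  # Phase 4: update the sampling policy
        t += 1
```

With trivial stand-ins (e.g., an accuracy estimate proxied by 1/n), the loop terminates once enough samples have accumulated; in TABS proper, Phase 4 solves the optimizations of Secs. 3.2–3.3.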

### 3.2 Implementation and Tractability.

The formulation of the UBAV is likely to be computationally intensive due to the large variability in sampling policies, and thus, we require a heuristic that can mitigate the complexity at a minimal cost of optimality. In this section, we introduce two such methods, TABS-3 and TABS-1.

#### 3.2.1 TABS Through Three Consecutive Decisions (TABS-3).

TABS-3 improves numerical tractability by approximating the UBAV through three consecutive decisions: (i) total number of samples that will be considered and their spatial locations, (ii) the order in which to sample, and (iii) how many batches and the size of each batch.

##### Decision I: Number and spatial location of samples.
The spatial location and the optimal number of samples to consider are found by setting the number of stages equal to one (τ = 1), and maximizing $Vπ(st)$, with respect to the total number of samples
$D_1^* = \operatorname*{argmax}_{n_f \in \mathbb{N}^+} V_\pi(s_1)$
(10)
where the set of admissible sampling policies is defined by a maximin distance design $\pi = X_f^*(n_f, X_p)$ [46]. In this formulation, $D_1^* = \{d_{1,1}^*, d_{1,2}^*, \ldots, d_{1,n_f}^*\}$ is a set of $n_f$ samples, where $d_{1,i}^* \in \mathcal{X}$. The reason for using a maximin distance design is that adaptive sampling strategies for global metamodeling do not outperform static sampling strategies with respect to final metamodel accuracy, as pointed out in Ref. [10]. The explanation for this is that adaptive sampling strategies have compromised space-filling properties, as samples are added without considering the influence of potential future samples.
The formulation of a maximin distance design is as follows: assume that at sampling stage t we have already observed a total of $n_p$ samples $X_p$ (subscript p indicates past), and we consider a total of $n_f = \sum_{i=1}^{t} m_i$ future samples $X_f$ (subscript f indicates future). Introducing $X_{pf} = X_p \cup X_f$ and maximizing the minimum distance between samples gives
$X_f^*(n_f, X_p) = \operatorname*{argmax}_{X_f \in \mathcal{X}} \left[ \min_{\substack{1 \le i \le n_f, \; 1 \le j \le n_p + n_f \\ x_{f,i} \ne x_{pf,j}}} \| x_{f,i} - x_{pf,j} \|_2 \right]$
(11)
where $x_{f,i} \in X_f \; (i = 1, \ldots, n_f)$ and $x_{pf,j} \in X_{pf} \; (j = 1, \ldots, n_f + n_p)$. Note that this implies a nested optimization problem, where the optimization of Eq. (10) requires the spatial locations of the $n_f$ samples as obtained through Eq. (11).
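The continuous optimization in Eq. (11) is hard in general; a common heuristic (a stand-in for the exact solver, not the paper's implementation) selects the $n_f$ points greedily from a dense candidate set, maximizing at each step the minimum distance to all past and already-selected points:

```python
import numpy as np

def greedy_maximin(n_f, X_past, candidates):
    """Greedy candidate-set approximation of the maximin design in Eq. (11).

    Repeatedly picks the candidate whose minimum Euclidean distance to all
    past and already-selected points is largest."""
    anchors = np.asarray(X_past, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    chosen = []
    for _ in range(n_f):
        # distances from every remaining candidate to every anchor point
        d = np.linalg.norm(candidates[:, None, :] - anchors[None, :, :], axis=-1)
        idx = int(d.min(axis=1).argmax())   # candidate farthest from everything
        chosen.append(candidates[idx])
        anchors = np.vstack([anchors, candidates[idx]])
        candidates = np.delete(candidates, idx, axis=0)
    return np.array(chosen)

# Five future samples around one past sample in the unit square
rng = np.random.default_rng(2)
X_f = greedy_maximin(5, [[0.5, 0.5]], rng.uniform(0.0, 1.0, size=(200, 2)))
```

The candidate set bounds the achievable minimum distance, so a denser candidate set yields a better approximation of Eq. (11).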

Simplifying the problem by setting the number of sampling stages equal to one compromises the optimality of the sampling policy when the true optimum spans more than one stage. However, it is reasonable to believe that this compromise is marginal, as the first several sampling stages have the largest contribution to the UBAV: any reasonable reward function is monotonically decreasing with the number of sampling stages. Consequently, the UBAV will maximize the probability of satisfying the accuracy criterion (see details in Sec. 3.4) at the most forthcoming stages and is thus relatively insensitive to additional samples.

##### Decision II: Sampling order.
Identifying the optimal sampling order is of particular interest as this will reduce the variability of the sampling batches $\pi_t^* \in D_1^*$. Again, we fix the number of sampling stages to one ($\tau = 1$), define a new set of samples $D_1^* \setminus d_{1,i}^*$ that includes all but the ith sample of $D_1^*$, and set $h = n_f$. We can then find the least beneficial sample as
$d_{2,k}^* = \operatorname*{argmax}_{d_{1,i}^* \in D_1^*} V_{D_1^* \setminus d_{1,i}^*}(s_1)$
(12)

Next, we remove $d_{2,k}^*$ from $D_1^*$, incrementally decrease h by one, and reinitiate Eq. (12) to find the next least beneficial sample. This process is repeated until all samples in $D_1^*$ have been exhausted, or until the probability of meeting the accuracy criterion is 0. We then obtain the decision vector $D_2^* = \{d_{2,1}^*, d_{2,2}^*, \ldots, d_{2,n_f}^*\}$ containing an ordered set of $n_f$ samples, where $d_{2,i}^* \in D_1^*$. The process of incrementally removing samples is analogous to backward induction and is a common procedure in the field of dynamic programming [47].
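The backward-removal loop of Decision II can be sketched generically; here `subset_value` stands in for the action value $V$ of Eq. (12), and in the example below we use minimum pairwise distance as a cheap illustrative proxy, which is not the paper's criterion:

```python
import numpy as np

def order_by_backward_removal(D, subset_value):
    """Order samples as in Decision II: repeatedly drop the sample whose
    removal leaves the highest-value subset (Eq. (12)), then reverse so
    that the most beneficial samples come first."""
    D = np.asarray(D, dtype=float)
    remaining = list(range(len(D)))
    removal_order = []
    while len(remaining) > 1:
        # least beneficial sample = the one whose removal maximizes the
        # value of what is left
        worst = max(remaining,
                    key=lambda i: subset_value(D[[j for j in remaining if j != i]]))
        remaining.remove(worst)
        removal_order.append(worst)
    removal_order.append(remaining[0])
    return D[removal_order[::-1]]

def min_pairwise(X):
    # Illustrative proxy for V: the minimum pairwise distance of the subset
    if len(X) < 2:
        return 0.0
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return d[np.triu_indices(len(X), 1)].min()

rng = np.random.default_rng(3)
D1 = rng.uniform(0.0, 1.0, size=(6, 2))
D2 = order_by_backward_removal(D1, min_pairwise)
```

Each pass evaluates the value function once per remaining sample, so the ordering costs $O(n_f^2)$ value evaluations in total.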

##### Decision III: Number and size of batches.
Finally, we are left with the decision of how to divide the ordered set of samples $D_2^*$ into sampling batches; this implies that the time horizon is now $\tau \ge 1$. During Decision II, we have already obtained the probabilities of satisfying the accuracy criterion, and we can therefore readily evaluate the UBAV function and approximate the optimal sampling policy as
$\pi^* = \operatorname*{argmax}_{\pi \in D_2^*} V_\pi(s_1), \quad \text{for } \{\pi_1, \pi_2, \ldots, \pi_\tau\} = D_2^*$
(13)

#### 3.2.2 TABS Through One Decision (TABS-1).

Low-discrepancy sequences provide an ordered set of samples $x_c \in \mathbb{R}^{d \times i}$, where $\cup_{j=1}^{i} x_{c,j}$ has good space-filling properties for any $i \in \mathbb{N}^+$. Consequently, we can use such a sequence to decide where to sample and in what order during the initiation Phase 0. The integration of a low-discrepancy sequence forms the basis for the TABS-1 approach as depicted by the blue arrows in Fig. 1. In Phase 0, a candidate set $x_c$ is prescribed, of which the first $m_0$ elements are selected to form the initial design $a_0$. In this work, we used a fully sequential space-filling (FSSF) sequence [48] to create $x_c$. An FSSF sequence is created by sequentially selecting samples from an additional and larger Sobol sequence based on maximin pairwise distance. In this way, no repeated decisions need to be made regarding the order and the locations of samples. The remaining process is identical to TABS-3, except for Phase 4, where the optimal sampling policy is approximated through a single decision that is formulated as
$\pi^* = \operatorname*{argmax}_{n_f, \, \pi_i \in x_c} V_\pi(s_1), \quad \text{for } \{\pi_1, \pi_2, \ldots, \pi_\tau\} = \{x_{c,1}, x_{c,2}, \ldots, x_{c,n_f}\}$
(14)

Note how the decision is nearly identical to Eq. (13), except that in addition to batching the samples we also have to find the optimal number of samples $nf$.

In terms of numerical complexity, TABS-1 grows implicitly with the increase in input dimensionality, as more future samples will be required (i.e., $n_f$ becomes larger). More specifically, the number of combinations in which the additional samples can be batched increases at a squared rate as the total number of future samples increases. Observe that this increase in complexity equals that of Eq. (1) with respect to increasing input dimensionality. Consequently, TABS-1 scales similarly in terms of numerical complexity with the increase in input dimensionality as conventional acquisition functions do.
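For small $n_f$, the candidate policies of Eqs. (13) and (14), i.e., splits of the ordered samples into consecutive batches, can simply be enumerated, as in this sketch (our own helper, for illustration):

```python
from itertools import combinations

def batch_splits(samples):
    """Enumerate all ways to split an ordered sample list into consecutive
    batches, i.e., candidate policies {pi_1, ..., pi_tau} whose union is
    the ordered set, as in Eq. (13)."""
    n = len(samples)
    for k in range(n):                         # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield [list(samples[bounds[i]:bounds[i + 1]])
                   for i in range(len(bounds) - 1)]

splits = list(batch_splits(["x1", "x2", "x3"]))
# one candidate policy is [["x1"], ["x2", "x3"]]: a batch of 1, then a batch of 2
```

Each split corresponds to a choice of cut points in the ordered list; the UBAV is then evaluated for each candidate split and the best one is kept.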

### 3.3 Formulation of the Utility-Based Action Value Function.

Critical to any adaptive sampling approach is the formulation of the acquisition function; in the case of TABS, this is the UBAV that captures the multistage nature of the problem by adopting a dynamic programming perspective. Conventional dynamic programming is likely to run into the following challenges: (i) the decision space is continuous and likely to be intractable, and (ii) an absorbent state can be reached after each state transition, but we do not know a priori which states are absorbent. In the context of adaptive sampling, an absorbent state implies that we have reached a state in the sequential decision-making process where the modeler is satisfied with the metamodel accuracy and no more samples/actions are taken.

Details on the decision process as considered by the UBAV function for the first sampling stage t = 1 are visualized in Fig. 2. We start at state $s_1$ after sampling $a_0$. For an arbitrary sampling policy $\pi = \{\pi_1, \pi_2, \ldots, \pi_\tau\}$, the first batch $\pi_1 = \{x_{1,1}, x_{1,2}, \ldots, x_{1,m_1}\}$ containing $m_1 \in \mathbb{N}^+$ samples would transition the metamodel to a future state $s_2$, where we have a probability $p_1$ of satisfying the metamodel accuracy criterion and gaining a reward $r_1$ (implying that an absorbent state has been reached). On the other hand, we have a probability $p_1' = 1 - p_1$ that our metamodel is not accurate enough and that we require an additional batch of samples $\pi_2$, transitioning us to state $s_3$, where again we have a probability $p_2$ that an absorbent state is reached and would otherwise require additional samples. The final state $s_{\tau+1}$, which is reached after sampling a total of $\tau$ batches, can be absorbent for one of two reasons: (i) the stopping criterion $\psi(s_{\tau+1}) < \varepsilon$ is met, or (ii) the maximum number of sampling stages has been reached while $\psi(s_{\tau+1}) \ge \varepsilon$, for which we incur the rewards $r_\tau$ and $r_{\tau+1}$, respectively.

When we initiate the UBAV function, we know that the current state $s_t$ does not contain enough function evaluations to train an accurate metamodel, and as such $p_0 = 0$. As it is impossible to predict the total number of samples (or stages), we define the random variable T as the number of future sampling stages required to reach an absorbent state. Subsequently, for an arbitrary sampling policy $\pi$, we can identify the probability distribution of T as
$P(T = t; \pi) = P(\psi(s_{t+1}) < \varepsilon \le \psi(s_t))$
(15)
where $t = 0, \ldots, \tau$, $\psi(s_0) = \infty$, and $\psi(s_{\tau+1}) = 0$.

Note that the construction of the probabilities of reaching an absorbent state at future time t matches the probability of obtaining a reward $r_t$ as presented in Fig. 2. For notational purposes, we introduce the probability of meeting the accuracy criterion at sampling stage t as $P_t = P(T = t; \pi)$.

For an arbitrary state $s_t$, the modeler will incur a reward $r_t(s_t, \pi_t(s_t), \theta_t)$ with probability $P_t$. Here, $\theta_t$ is a random disturbance, and $\pi_t$ is the tth batch of sampling policy $\pi = \{\pi_1, \pi_2, \ldots, \pi_\tau\}$, containing $m_t \in \mathbb{N}^+$ samples given as $\pi_t = \{x_{t,1}, x_{t,2}, \ldots, x_{t,m_t}\}$ for $x_{t,j} \in \mathcal{X}$. Following dynamic programming, we can identify an action-value function for an arbitrary sampling policy as
$V_\pi(s_1) = E\left[ \left(1 - \sum_{t=1}^{\tau} P_t\right) r_{\tau+1}(s_{\tau+1}) + \sum_{t=1}^{\tau} P_t \, r_t(s_t, \pi_t, \theta_t) \right]$
(16)
where the expectation is taken with respect to a yet-to-be-defined disturbance $\theta = \{\theta_1, \theta_2, \ldots, \theta_\tau\}$.
The formulation in Eq. (16) lends itself well to the integration of a utility function, allowing us to remain consistent with a modeler's risk attitude. Substituting Eq. (3) into Eq. (16) gives our UBAV function
$V_\pi(s_1) = E\left[ \left(1 - \sum_{t=1}^{\tau} P_t\right) U(r_{\tau+1}(s_{\tau+1})) + \sum_{t=1}^{\tau} P_t \, U(r_t(s_t, \pi_t, \theta_t)) \right]$
(17)
By maximizing Eq. (17) over all admissible sampling policies Π, the optimal sampling policy is obtained as
$V_{\pi^*}(s_1) = \max_{\pi \in \Pi} V_\pi(s_1)$
(18)
Defining $P_{\tau+1} = 1 - \sum_{t=1}^{\tau} P_t$ and $V_{\tau+1}(s_{\tau+1}) = P_{\tau+1} U(r_{\tau+1}(s_{\tau+1}))$, through recursion and Bellman's principle of optimality, we find
$V_{\pi_t}(s_t) = \max_{a_t \in \mathcal{A}_t} E\left[ P_t U(r_t(s_t, a_t, \theta_t)) + P_{t+1} V_{\pi_{t+1}}(\Gamma(s_t, a_t, \theta_t)) \right]$
(19)
where At is the set of all admissible samples that can be taken at the tth stage. Note that in order to predict a future state, we require a system dynamic Γ(·). For this purpose, TABS uses a preposterior analysis of our GP model. More specifically, the system dynamic is obtained by adding samples at and a random realization of the predicted response to the GP model [49].

Through Eq. (19), the optimal value is given by $V1(s1)$, if $at*=πt*(st)$ maximizes the right hand side of Eq. (19) for all t and $st$. As a result, we find the optimal sampling policy as $π*={π1*,π2*,…,πτ*}$.

### 3.4 Formulation of the Probability of Meeting the Accuracy Criterion.

A direct result of identifying a stopping criterion $\psi(\cdot) < \varepsilon$ is the potential existence of a set of samples for which our metamodel is considered accurate enough so that no more samples are required. However, we cannot predict with certainty whether the stopping criterion for a future state will be met, because it is a function of the random variables $y_t = f(a_t)$ and the estimated hyperparameters $\omega$. For the disturbance associated with the unknown response, we use the prediction of our current GP, given as $P_\Lambda(\lambda) = \mathcal{N}(\bar{\mu}(a_t; s_t), \bar{\sigma}^2(a_t; s_t))$. The disturbance associated with the roughness parameters is estimated through the Fisher information matrix $I: \Omega \times \Omega \mapsto \mathbb{R}^{d \times d}$, which captures the variance of the sensitivity of the likelihood function in Eq. (6) with respect to changes in the hyperparameters [50]. The closed form of the Fisher information matrix for a GP is
$Ik,l=tr(R−1∂R∂ωkR−1∂R∂ωl)$
(20)
where tr(·) is the trace operator and k, l = 1, …, d [51,52]. Under the assumption of the Bernstein–von Mises theorem [53], we can approximate $P(\omega)\approx\mathcal{N}(\hat{\omega},I^{-1}(\omega))$, where $\hat{\omega}$ is the maximum likelihood estimate of the roughness parameters.
The conditional probability of meeting the accuracy criterion at future stage t (i.e., $\sum_{i=1}^{t}P_i$) is the predicted probability that the metamodel accuracy falls below threshold ɛ, obtained as
$P\left[\psi(a_t;s_t,w,\lambda)<\varepsilon\right]=\int_{W\times\Lambda}\mathbf{1}\left[\psi(a_t;s_t,w,\lambda)<\varepsilon\right]P_W(w)P_\Lambda(\lambda)\,\mathrm{d}w\,\mathrm{d}\lambda$
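For a Gaussian (squared-exponential) correlation function, Eq. (20) can be evaluated directly because $\partial R/\partial\omega_k$ is available in closed form. A minimal sketch, assuming that kernel:

```python
import numpy as np

def fisher_information(X, omega):
    """Eq. (20): I_kl = tr(R^-1 dR/domega_k R^-1 dR/domega_l).  Assumes the
    Gaussian correlation R_ij = exp(-sum_k omega_k (x_ik - x_jk)^2), whose
    derivative is dR/domega_k = -d2_k * R (elementwise)."""
    n, d = X.shape
    d2 = (X[:, None, :] - X[None, :, :]) ** 2      # pairwise squared distances
    R = np.exp(-d2 @ omega) + 1e-10 * np.eye(n)    # small jitter for stability
    Rinv = np.linalg.inv(R)
    dR = [-d2[:, :, k] * R for k in range(d)]      # closed-form derivatives
    I = np.empty((d, d))
    for k in range(d):
        for l in range(d):
            I[k, l] = np.trace(Rinv @ dR[k] @ Rinv @ dR[l])
    return I
```

Inverting the resulting matrix gives the covariance of the approximate normal distribution over the roughness parameters used in the preposterior analysis.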
(21)
where $\mathbf{1}[\cdot]$ is an indicator function. We can then find the probability of meeting the accuracy criterion at a specific stage through the recursion $P_t=P\left[\psi(a_t;s_t,w,\lambda)<\varepsilon\right]-\sum_{i=1}^{t-1}P_i$ for t = 1, …, τ, with $P_0=p_0=0$.
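A minimal Monte Carlo version of Eq. (21) and the stagewise recursion might look as follows; `psi`, `sample_w`, and `sample_lam` are hypothetical callables wrapping the accuracy metric and the two disturbance distributions:

```python
import random

def prob_accuracy_met(psi, eps, sample_w, sample_lam, n_mc=2000, seed=0):
    """Monte Carlo version of Eq. (21): the indicator 1[psi < eps] averaged
    over draws of the hyperparameter disturbance w ~ P_W and the response
    disturbance lam ~ P_Lambda (both passed in as sampler callables)."""
    rng = random.Random(seed)
    hits = sum(psi(sample_w(rng), sample_lam(rng)) < eps for _ in range(n_mc))
    return hits / n_mc

def stagewise_probs(cumulative):
    """Recursion P_t = P[psi < eps at stage t] - sum_{i<t} P_i, with P_0 = 0."""
    probs, acc = [], 0.0
    for c in cumulative:
        probs.append(c - acc)
        acc += probs[-1]
    return probs
```

Given cumulative probabilities per stage, `stagewise_probs` recovers the increments $P_t$ that weight each stage's utility in the UBAV.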

### 3.5 Formulation of a Reward Function.

Maximizing Eq. (19) requires the formulation of a reward function that enables a modeler to quantify the value (i.e., simulation time or monetary cost) of transitioning from state $s_t$ to $s_{t+1}$ by means of an action at. As previously mentioned, the quantity of interest used to gauge the accuracy of a metamodel is the estimated metamodel accuracy ψ(st). However, this quantity only accounts for the potential gain while neglecting the cost associated with the sampling strategy. As such, a general reward function is defined as
$r_t(s_t,a_t,\theta_t)=F_t(s_t,a_t,\theta_t)-C_t(a_1,a_2,\ldots,a_t)$
(22)
where the potential gain is given by $F_t:S_t\times A_t\mapsto\mathbb{R}$, and the cost is given by $C_t:a_1\times a_2\times\cdots\times a_t\mapsto\mathbb{R}$.
The gain in value is assumed to increase monotonically with metamodel accuracy. However, defining such a function and calibrating it to represent a modeler’s real-world scenario is a difficult task. We therefore adopt the stepwise approximation
$F_t(s_t,a_t,\theta_t)=\begin{cases}F_{eq}, & \text{if }\psi(s_t,a_t,\theta_t)<\varepsilon\\ 0, & \text{otherwise}\end{cases}$
(23)
Here, the modeler has to define the stopping criterion ɛ and the maximum amount of time that they are willing to invest, $F_{eq}$.
The cost function Ct should adequately account for the number of sampling stages t and the number of samples observed during each stage, |at|, where | · | is the cardinality operator (i.e., the number of samples in at). For the examples in this work, we propose the flexible cost function
$C_t(\mathbf{a})=c_b\left(\left\lfloor\frac{|a_t|}{b_s}\right\rfloor+\left(\operatorname{rem}\left(\frac{|a_t|}{b_s}\right)\frac{1}{b_s}\right)^{1/\alpha}\right)+c_{st}+\sum_{i=1}^{t-1}C_i(\mathbf{a})$
(24)
where $b_s\in\mathbb{N}^+$ is the batch size, $c_b\in\mathbb{R}^+$ is the cost of a batch, $c_{st}\in\mathbb{R}^+$ is the cost of initiating a stage, and $\alpha\in\mathbb{R}^+$ is a tuning parameter. Furthermore, ⌊·⌋ rounds a variable down to its nearest integer, and rem(·) is the modulo operator that gives the remainder of a fraction. The choice of α has three distinct effects: (i) when 0 < α < 1, TABS will discount more than one sample per stage; (ii) when α = 1, cost increases linearly with the number of samples per batch; and (iii) for 1 < α < ∞, we obtain a cost function that prefers a batch size that is a multiple of bs.
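Assuming the floor and remainder reading of the cost expression described above, the per-stage and cumulative costs can be sketched as below (the parameter values `bs=4`, `cb=1.0`, `cst=0.1`, `alpha=1.5` are illustrative, not the paper's):

```python
def batch_cost(m, bs=4, cb=1.0, alpha=1.5):
    """Per-stage term of Eq. (24): each full batch costs cb, and a partially
    filled batch contributes cb * ((m mod bs) / bs) ** (1 / alpha)."""
    return cb * (m // bs + ((m % bs) / bs) ** (1.0 / alpha))

def sampling_cost(batches, bs=4, cb=1.0, cst=0.1, alpha=1.5):
    """Cumulative policy cost: batch cost plus a fixed stage-initiation cost
    cst for every stage in the policy."""
    return sum(batch_cost(m, bs, cb, alpha) + cst for m in batches)
```

With `alpha > 1` the fractional term is concave, so the marginal cost of each extra sample shrinks as a batch fills and jumps again when a new batch is opened, reproducing the behavior shown in Fig. 3.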

For the sake of illustration, a visualization of Eq. (24) is presented in Fig. 3. This visualization considers a batch size of four and α > 1, and spans three sampling stages in which seven, five, and two samples are taken, in that order. Note that, for a particular sampling stage, the increase in cost becomes smaller as a batch fills up and rapidly increases again once a batch has been filled (lower two lines). This behavior is analogous to a modeler who is able to run four simulations in parallel and incurs a significant cost increase when one additional simulation is to be run. The purpose of the cost function is to enable us to solve Eq. (19). In practice, modelers are recommended to specify their own cost function considering the gain of improved metamodel accuracy and the sampling cost.

## 4 Numerical and Engineering Examples

In this section, we demonstrate the functionality of TABS by applying it to two numerical functions and one engineering problem.

### 4.1 One-Dimensional Test Problem.

We start with a simple test problem used in Ref. [54], given as
$f(x)=(6x-2)^2\sin(12x-4)$
(25)
where x ∈ [0, 1]. TABS-3 is used to approximate Eq. (25) and is initiated using the settings in Table 1. The selection of the desired metamodel accuracy (ɛ) and the total amount of simulation time that a modeler is willing to invest ($F_{eq}$) originates from two schools of thought. In the first, modelers simply allocate new samples until emulation accuracy ɛ has been achieved. In the second, a modeler determines the amount of simulation time $F_{eq}$ and adds samples until all resources have been exhausted. For the initial number of samples, it is recommended to take between two and five times the input dimension of the problem. This is a conservative number of samples that typically avoids over-sampling (so that no resources are wasted). The risk attitude is best chosen through the selection of an indifference probability, for which we refer the reader to Ref. [40]. Finally, the cost function and its parameters are to be selected according to how the modeler runs simulations. When using Eq. (24), the parameters can be approximated by monitoring the resources used to evaluate the initial set of samples. For example, the batch size bs can be chosen as the maximum number of simulations that can run in parallel, the batch cost cb as the time it takes to simulate a single batch, and a conservative choice for α is 1.5 (because this provides a moderate advantage for sampling in batches).
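Eq. (25) is straightforward to reproduce; the sketch below implements it for reference:

```python
import math

def forrester(x):
    """One-dimensional test function of Eq. (25), defined for x in [0, 1]."""
    return (6.0 * x - 2.0) ** 2 * math.sin(12.0 * x - 4.0)
```

Evaluating this function at the initial design plays the role of the costly simulator f(·) in this example.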

After problem initiation, we evaluated the response of the five initial samples (Phase 1), represented by the five circles in the left panel of Fig. 4. Next, a GP is trained on these five observations, providing a mean estimate and 95% prediction interval (PI), shown by the dashed line and shaded region (Phase 2). We find that the metamodel does not meet the accuracy requirement (Phase 3) and thus update the optimal sampling policy by maximizing the UBAV (Phase 4). The optimal sampling policy at the first sampling stage is given as $\pi^*=\{\pi_1^*,\pi_2^*,\pi_3^*\}$, where the first batch contains five samples (squares), and the second (diamonds) and third (upward-pointing triangles) contain two samples each.

Next, TABS advances to the next sampling stage by evaluating Eq. (25) for $π1*={0.82,0.21,0.28,0.40,0.87}$, updating the metamodel (Phase 2) and concluding that the updated GP model does not meet the accuracy criterion (Phase 3). Consequently, we advance to Phase 4 and update the optimal sampling policy by maximizing the UBAV. This process is repeated for a total of five stages before satisfying the accuracy criterion.

The final set of samples, the metamodel prediction, and the true function are presented in the right panel of Fig. 4. We observe that an accurate metamodel has been created, as it is difficult to distinguish between the original function and the 95% PI. The samples taken after the first stage (squares) were followed by the diamond, upward-pointing triangle, downward-pointing triangle, and left-pointing triangle at subsequent stages. We would like to draw the reader’s attention to the last added sample (left-pointing triangle) and the relatively uniform distance between its neighboring sample points. This is because the decision policy at the first sampling stage had already considered that a sample might be needed in this region, as shown by the diamond on the left. This foresight is an intriguing visualization of how the UBAV positively influences the space-filling property of the final design.

#### 4.1.1 The Influence of Sparse Initial Sampling.

Most adaptive sampling strategies that start with a small number of samples are likely to run into problems associated with sparse data. To exemplify this problem, we present the GP estimation for three samples taken from Eq. (25) in the left panel of Fig. 5. Observe how the mean and the 95% PI quickly converge to a constant. This is a consequence of the negative log-likelihood being unable to accurately estimate the roughness parameter ω, as visualized in the right panel of Fig. 5. Note how the value of the negative log-likelihood is relatively constant over the range ω ∈ [2, 10], indicating the difficulty of estimating ω.
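Such a profile can be reproduced with a minimal negative log-likelihood for a zero-mean GP with a one-dimensional Gaussian correlation (an illustrative simplification of the likelihood in Eq. (6), with assumed `sigma2` and nugget values):

```python
import numpy as np

def neg_log_likelihood(omega, X, y, sigma2=1.0, nugget=1e-10):
    """Negative log-likelihood of a zero-mean GP with the (assumed) Gaussian
    correlation R_ij = sigma2 * exp(-omega * (x_i - x_j)**2); profiling this
    over omega for a sparse data set exposes the flat region."""
    R = sigma2 * np.exp(-omega * (X[:, None] - X[None, :]) ** 2)
    R = R + nugget * np.eye(len(X))
    logdet = np.linalg.slogdet(R)[1]
    return 0.5 * (logdet + y @ np.linalg.solve(R, y) + len(X) * np.log(2 * np.pi))
```

For large ω the correlation matrix approaches the identity, so the profile flattens toward a constant: with only a few observations, widely different roughness values explain the data almost equally well.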

As a result of sparsity, the metamodel cannot identify whether the observed data originate from a linear or a nonlinear function, and TABS will therefore recommend taking only a single sample. This is a compelling decision, as the optimal number of samples could range from zero to infinity. To understand why TABS makes this decision, consider the uncertainty in the roughness parameter, which Eq. (20) estimated as $\hat{\omega}\sim\mathcal{N}(2.6,3.5\times10^{27})$. This distribution is used to estimate the probability of satisfying the accuracy criterion. For such a large variance in the roughness parameter, we find that Eq. (21) is relatively insensitive to the total number of future samples, i.e., $P_t\approx0.5$ regardless of the decision policy, and thus TABS recommends taking only one sample.

#### 4.1.2 The Influence of Risk Attitude.

To illustrate the influence of the risk attitude, we take a closer look at the optimal sampling policies obtained when approximating Eq. (25) using TABS-3 with various risk attitudes. First, we initiate the problem using five different risk attitudes and, otherwise, the same initial settings as presented in Table 1. The recommended batch sizes at each stage are presented in the second to fifth columns of Table 2, and the total number of future samples is presented in the sixth. Note that $|\pi_t^*|$ is used to indicate the batch size of the tth batch in the optimal sampling policy. The risk-averse attitude considers more samples, as the utility function is less sensitive to the increased cost, while the converse is true for the risk-prone attitude. Considering more samples reduces the space-filling property when only a few samples are required, but improves it when more samples are needed. This observation is indicative of normative decision-making, as it captures the conventional trade-off between improved expected reward and reduced variability.

### 4.2 Two-Dimensional Test Function.

We use a two-dimensional test function to compare the results from different sampling strategies. Here, we test five different sampling techniques to approximate the function
$f(x_1,x_2)=5\left((x_1x_2)^2+(x_1-0.8)^2\right)$
(26)
where x1, x2 ∈ [0, 1]. The five sampling techniques considered are the maximum mean-squared error approach (MSE), integrated mean-squared error (IMSE), TABS-3, TABS-1, and optimal Latin hypercube sampling (OLHS). The MSE approach uses a GP model to sequentially add samples at the spatial location that maximizes the mean-squared error $\bar{\sigma}^2(\cdot)$; after each sample has been identified, the metamodel is updated. The IMSE approach is similar to the MSE approach, sequentially adding samples that minimize $\int_X\bar{\sigma}^2(\cdot)\,\mathrm{d}x$. Furthermore, OLHS is a static sampling approach that has been included as a utopian point (as if we had known the total number of samples a priori).
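A minimal sketch of Eq. (26) and the two greedy criteria follows; `var` and `var_if_added` are hypothetical callables that would wrap the GP's predictive variance before and after a candidate is added, respectively:

```python
import numpy as np

def f2d(x1, x2):
    """Two-dimensional test function of Eq. (26) on [0, 1]^2."""
    return 5.0 * ((x1 * x2) ** 2 + (x1 - 0.8) ** 2)

def mse_point(var, candidates):
    """MSE criterion: pick the candidate maximizing the predictive variance."""
    return candidates[int(np.argmax([var(c) for c in candidates]))]

def imse_point(var_if_added, candidates, grid):
    """IMSE criterion: pick the candidate minimizing the grid-averaged
    predictive variance that would remain after adding that candidate."""
    scores = [np.mean([var_if_added(c, g) for g in grid]) for c in candidates]
    return candidates[int(np.argmin(scores))]
```

Both criteria are greedy: each picks a single next sample without looking ahead, which is the limitation the multistage TABS formulation addresses.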

TABS-3 is used to find the total number of samples required to approximate Eq. (26) based on the initial settings presented in Table 3. Initiating the same problem with 20 unique initial designs, TABS-3 found that the optimal number of samples $n$ was 13, 14, or 15. For each of the 20 simulations, TABS-1 selected the first $n$ samples of an FSSF design, exactly replicating the design that would have been obtained had TABS-1 required $n$ samples. In addition, the MSE and IMSE approaches started from the same initial design as TABS-3 and sequentially added samples until the design contained $n$ samples. Finally, OLHS was used to create a static design containing $n$ samples. Note that this implies that all designs have the same number of samples, facilitating a fair comparison of metamodel accuracy.

For comparison, we evaluated Eq. (26) at an additional 100 samples and used those to estimate the normalized root-mean-squared error (NRMSE), as presented in Fig. 6. From this figure, we observe that both TABS-3 and TABS-1 outperform the MSE and IMSE approaches, which do not consider the influence of potential future samples. What is more, we observe that OLHS has an improved median performance over TABS, corroborating the assertion that static sampling approaches have better space-filling properties than adaptive sampling approaches. Finally, we observe that TABS-1 outperforms TABS-3, a result that is likely due to the use of the FSSF sequence, as such designs are known to be similar for low-dimensional problems with few samples (thus resulting in small variation in NRMSE).

### 4.3 Engineering Problem: Design of an Engine Piston.

In this section, we apply TABS to the approximation of a piston design simulation as analyzed in Ref. [55]. This is a six-dimensional dynamic computer simulation that predicts the sound level resulting from the secondary motion of an engine piston.

In addition to the methods used in Sec. 4.2, we also consider a bin-based batch sampling technique that, at each stage, evaluates bs samples from an FSSF sequence until the desired model accuracy is achieved. Using the initial settings presented in Table 4, we created 20 unique designs with each of the six methods. The performance of each method is compared in terms of sampling cost (Fig. 7(a)) (obtained from the cost function), model accuracy (Fig. 7(b)), number of samples (Fig. 7(c)), and computational cost (Fig. 7(d)).

Concerning sampling cost, we find that TABS-1 and TABS-3 perform nearly as well as our utopian design (OLHS). In addition, bin-based batch sampling is slightly worse than TABS-1 and TABS-3, while the MSE and IMSE approaches are significantly worse.

As for model accuracy, we retained 200 samples for validation and used these to estimate the NRMSE. It is found that TABS-1, TABS-3, OLHS, and bin-based batch sampling have similar accuracy and outperform the IMSE and MSE adaptive sampling approaches. The MSE approach, in particular, performs poorly in terms of accuracy, as it has a tendency to allocate more samples at the boundary of the design space. However, this is not strictly a fair comparison, as different approaches require a different number of samples, as can be observed from Fig. 7(c).

The time spent on maximizing the acquisition/UBAV functions (i.e., the computational efficiency of the sampling methods) is presented in Fig. 7(d); TABS-1 has a significant advantage over TABS-3 while performing similarly to conventional one-sample-at-a-time adaptive sampling strategies (i.e., MSE and IMSE). This observation corroborates the assertion made in Sec. 3.2 that TABS-1 has numerical complexity similar to that of adaptive sampling methods that use an acquisition function as defined by Eq. (1). Moreover, bin-based batch sampling and the OLHS approach are the most efficient, as only one optimization problem has to be solved during the entire process, while the MSE and IMSE methods solve an optimization problem at each sampling stage. Because TABS enables samples to be taken in batches, fewer sampling stages are required, and thus fewer optimizations need to be performed than with the MSE or IMSE approach. Consequently, we find that TABS-1 scales slightly better in terms of numerical tractability with respect to problem dimensionality than the MSE and IMSE approaches, while TABS-3 performs significantly worse as it needs to solve a nested optimization problem.

To further explore the difference between TABS-1 and TABS-3, one has to consider the difference in batch size at each sampling stage. Accordingly, the mean and standard deviation of the number of samples taken at each stage by TABS-1, TABS-3, and bin-based batch sampling are presented in Fig. 8. We observe that both TABS-1 and TABS-3 have a propensity to allocate larger batches during the first few sampling stages and gradually decrease the batch size at later stages. This is an appealing property, as the risk of over-sampling becomes larger at later sampling stages (when the design contains more sample points). Moreover, in the worst case, bin-based batch sampling required eight sampling stages, while the worst case for TABS-1 and TABS-3 required only five. In addition, during the first stage, TABS-3 has on average larger sampling batches than TABS-1. This is likely a result of decision II in TABS-3, where the sampling order is selected so as to minimize the incremental improvement (in terms of the probability of satisfying the accuracy criterion) between the latter samples in the sampling policy. Conversely, the incremental improvement in the probability of satisfying the accuracy criterion between earlier samples will be larger, and thus those samples are more likely to be included in the first sampling batch. Furthermore, because TABS-3 includes more samples in the first sampling stage, it requires fewer samples in consecutive stages, and thus the average batch size of TABS-3 at stages 2 and 3 is smaller than that of TABS-1.

We recommend the use of TABS when there is a potential gain for running simulations in parallel, in batches, or both. Moreover, under close observation, we find a slight advantage of TABS-3 over TABS-1 in terms of accuracy and sampling cost. This observation is likely due to the simplification that TABS-1 enforces on the approximation of Eq. (19). Therefore, we recommend the use of TABS-3 when the problem dimension is low (less than four inputs); otherwise, we recommend the use of TABS-1 for its superior temporal performance.

## 5 Conclusion

This paper presents a thrifty adaptive batch sampling (TABS) approach for global metamodeling. The contribution of TABS is threefold: (i) through the integration of normative decision theory, TABS achieves a balance between the cost of sampling and the gain in metamodel accuracy that is consistent with a modeler’s preference and risk attitude; (ii) during each sampling stage, TABS finds an optimal sampling policy that considers samples at the current stage as well as samples at future stages; and (iii) TABS provides notable flexibility in that it finds an optimal sampling policy that includes the total number of sampling stages, the total number of samples per stage, and the spatial location of each sample.

For a specific state (i.e., the currently available data set), TABS identifies an optimal sampling policy by predicting the metamodel accuracy at future states while considering the associated sampling cost. Predicting future states introduces uncertainty and complicates the decision-making process, while the large variability in the decision space of the sampling policy impedes numerical tractability. These issues have been addressed through the development of a dynamic-programming-inspired acquisition function, for which we proposed two approximation techniques, TABS-1 and TABS-3. The most comprehensive approach is TABS-3, which makes three consecutive decisions at each sampling stage: (i) where to sample and how many samples to consider, (ii) what the optimal sampling order is, and (iii) how many sampling stages and how many samples per stage there should be. In contrast, TABS-1 reduces numerical complexity by using a low-discrepancy sequence to decide the optimal sampling location and sampling order for each sample in a candidate set, prior to the initiation of the adaptive sampling technique. Consequently, during each sampling stage, TABS-1 only needs to decide the optimal number of stages and the optimal number of samples to take at each stage.

Through the application of TABS to two test problems and one engineering problem, it is shown that TABS-1 and TABS-3 are comparable with, or outperform, conventional adaptive sampling techniques. In terms of sampling cost, both TABS-1 and TABS-3 hold an advantage over conventional approaches (in particular, “one sample at a time” methods), with TABS-3 performing slightly better than TABS-1. However, in terms of computational cost, we show that TABS-1 scales at a similar rate as conventional adaptive sampling methods with respect to function dimensionality, while TABS-3 scales significantly worse as a result of its nested optimization loop.

TABS has been developed under the broad framework of spatial random processes, in particular Gaussian process (GP) models. Consequently, the proposed method inherits the features of GP models. In general, GP methods are powerful for nonlinear but differentiable functions. Concerning noisy simulation functions, the proposed framework can readily be extended to problems with known homoscedastic and known heteroscedastic noise variance. However, if the noise variance is not known a priori, we need to include it as a hyperparameter in our metamodel and consider its uncertainty in the prediction of future states. This problem is further complicated when a simulation model has unknown heteroscedastic noise. The challenge of global metamodeling of computer simulations with unknown heteroscedastic noise will be considered in future work. Moreover, the use of TABS is only warranted when simulations are time-consuming, it is advantageous to run simulations in parallel, and there is an advantage in minimizing the total number of sampling stages. If running simulations in parallel provides no advantage, TABS will perform similarly to “one sample at a time” techniques. If it is not advantageous to reduce the number of sampling stages, TABS will perform similarly to conventional batch sampling techniques.

## Acknowledgment

The support from AFOSR FA9550-18-1-0381 and NSF CMMI-1537641 is greatly appreciated.

## Nomenclature

- x = {x1, x2, …, xd} = d-dimensional vector of sampling locations
- at = {xt,1, xt,2, …, xt,mt} = a batch of mt samples observed at sampling stage t
- pt = conditional probability of satisfying the metamodel accuracy criterion at sampling stage t, given that it was not satisfied at sampling stage t − 1
- $s_t$ = the state of our metamodel at sampling stage t, containing all samples ai and responses yi for i = 1, …, t
- xc = candidate set of TABS-1, obtained through a fully sequential space-filling (low-discrepancy) sequence
- yt = {yt,1, yt,2, …, yt,mt} = a batch of mt responses observed at sampling stage t (i.e., f(at))
- $P_t$ = probability that future state $s_t$ is the first to satisfy the metamodel accuracy criterion
- $F_{eq}$ = available monetary or temporal budget for sampling
- cb, bs, α = constants in the cost function: cost per batch, batch size, and tuning parameter
- f(·) = costly simulation model
- $n_f, n_p$ = number of total future and past samples, respectively
- rt(·) = reward function at sampling stage t
- w, λ = disturbances of the Gaussian process model hyperparameters and of the unobserved responses, respectively
- Ct(·) = cost function at sampling stage t
- D1, D2 = decision vectors after solving Decisions I and II, respectively
- R(·) = value function as an element of normative decision-making
- U(·) = utility function capturing a modeler’s risk attitude
- $V_\pi(\cdot)$ = action value function of dynamic programming
- I(·) = Gaussian process Fisher information matrix
- $X_f^*(\cdot)$ = a set of $n_f$ future samples dispersed over the sampling space by maximizing the minimum distance
- $L(\cdot)$ = Gaussian process likelihood function
- $V_\pi(\cdot)$ = utility-based action value function (UBAV)
- β, ω, σ2 = Gaussian process hyperparameters
- Γ(·) = system dynamic that predicts a future state
- ɛ = desired metamodel accuracy threshold
- θt = disturbance in the prediction of future state t
- $\bar{\mu}(\cdot), \bar{\sigma}(\cdot)$ = mean and error estimation of the metamodel
- πt = {xt,1, xt,2, …, xt,mt} = a recommended batch of samples at future sampling stage t
- π = {π1, π2, …, πτ} = a sampling policy
- τ = total number of sampling stages in sampling policy π
- Π = the set of all admissible sampling policies
- ψ(·) = a metric for metamodel accuracy

## References

References
1.
Liu
,
R.
,
Yabansu
,
Y. C.
,
Yang
,
Z.
,
Choudhary
,
A. N.
,
Kalidindi
,
S. R.
, and
Agrawal
,
A.
,
2017
, “
Context Aware Machine Learning Approaches for Modeling Elastic Localization in Three-Dimensional Composite Microstructures
,”
Integr. Mater. Manuf. Innov.
,
6
(
2
), pp.
160
171
. 10.1007/s40192-017-0094-3
2.
Zhang
,
S.
,
Zhu
,
P.
, and
Chen
,
W.
,
2013
, “
Crashworthiness-Based Lightweight Design Problem via New Robust Design Method Considering Two Sources of Uncertainties
,”
J. Mech. Eng. Sci.
,
227
(
7
), pp.
1381
1391
. 10.1177/0954406212460824
3.
Bessa
,
M. A.
,
,
R.
,
Liu
,
Z.
,
Hu
,
A.
,
Apley
,
D. W.
,
Brinson
,
C.
,
Chen
,
W.
, and
Liu
,
W. K.
,
2017
, “
A Framework for Data-Driven Analysis of Materials Under Uncertainty: Countering the Curse of Dimensionality
,”
Comput. Methods Appl. Mech. Eng.
,
320
(
1
), pp.
633
667
. 10.1016/j.cma.2017.03.037
4.
Viana
,
F. A. C.
,
Gogu
,
C.
, and
Haftka
,
R. T.
,
2010
, “
Making the Most Out of Surrogate Models: Tricks of the Trade
,”
Proceedings of the ASME 2010 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
,
Aug. 15–18
, pp.
587
598
.
5.
Santner
,
T. J.
,
Williams
,
B. J.
, and
Notz
,
W. I.
,
2003
,
The Design and Analysis of Computer Experiments
,
Springer-Verlag
,
New York
.
6.
Garbo
,
A.
, and
German
,
B.
,
2017
, “
,”
AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference
,
Denver, CO
,
July 5–9
, pp.
1
26
.
7.
Jin
,
R.
,
Chen
,
W.
, and
Simpson
,
T. W.
,
2001
, “
Comparative Studies of Metamodelling Techniques Under Multiple Modelling Criteria
,”
Struct. Multidiscip. Optim.
,
23
(
1
), pp.
1
13
. 10.1007/s00158-001-0160-4
8.
Jin
,
R.
,
Chen
,
W.
, and
Sudjianto
,
A.
,
2005
, “
An Efficient Algorithm for Constructing Optimal Design of Computer Experiments
,”
J. Stat. Plan. Inference
,
134
(
1
), pp.
268
287
. 10.1016/j.jspi.2004.02.014
9.
Olsson
,
A.
,
Sandberg
,
G.
, and
Dahlblom
,
O.
,
2003
, “
On Latin Hypercube Sampling for Structural Reliability Analysis
,”
Struct. Saf.
,
25
(
1
), pp.
47
68
. 10.1016/S0167-4730(02)00039-5
10.
Jin
,
R.
,
Chen
,
W.
, and
Sudjianto
,
A.
,
2002
, “
On Sequential Sampling for Global Metamodeling in Engineering Design
,”
Design Engineering Technical Conferences and Computers and Information in Engineering
,
,
Sept. 29–Oct. 2
, pp.
1
10
.
11.
Lam
,
R.
,
Allaire
,
D. L.
, and
Willcox
,
K. E.
,
2015
, “
Multifidelity Optimization Using Statistical Surrogate Modeling for Non-Hierarchical Information Sources
,”
Am. Inst. Aeronaut. Astronaut.
,
56
(
1
), pp.
1
21
.
12.
Forrester
,
A. I. J.
, and
Keane
,
A. J.
,
2009
, “
,”
Prog. Aerosp. Sci.
,
45
(
1–3
), pp.
50
79
. 10.1016/j.paerosci.2008.11.001
13.
Kandasamy
,
K.
,
Schneider
,
J.
, and
Poczos
,
B.
,
2015
, “
High Dimensional Bayesian Optimisation and Bandits via Additive Models
,”
International Conference on Machine Learning
,
Lille, France
,
July 6–11
, pp.
1
10
.
14.
Jones
,
D.
,
2001
, “
A Taxonomy of Global Optimization Methods Based on Response Surfaces
,”
J. Glob. Optim.
,
21
(
4
), pp.
345
383
. 10.1023/A:1012771025575
15.
Viana
,
F. A. C.
,
Haftka
,
R. T.
, and
Watson
,
L. T.
,
2013
, “
Efficient Global Optimization Algorithm Assisted by Multiple Surrogate Techniques
,”
J. Glob. Optim.
,
56
(
2
), pp.
669
689
. 10.1007/s10898-012-9892-5
16.
Zhao
,
L.
,
Choi
,
K.
, and
Lee
,
I.
,
2010
, “
A Metamodeling Method Using Dynamic Kriging and Sequential Sampling
,”
13th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference
,
Fort Worth, TX
,
Sept. 13–15
, pp.
1
18
.
17.
Marques
,
A. N.
,
Lam
,
R. R.
, and
Willcox
,
K. E.
,
2018
, “
Contour Location via Entropy Reduction Leveraging Multiple Information Sources
,”
Conference on Neural Information Processing Systems
,
,
Dec. 3–8
, pp.
1
11
.
18.
Ranjan
,
P.
,
Bingham
,
D.
, and
Michailidis
,
G.
,
2008
, “
Sequential Experiment Design for Contour Estimation From Complex Computer Codes
,”
Technometrics
,
50
(
4
), pp.
527
541
. 10.1198/004017008000000541
19.
Picheny
,
V.
,
Ginsbourger
,
D.
,
Roustant
,
O.
,
Haftka
, and
RaphaelKim
,
N.
,
2010
, “
Adaptive Designs of Experiments for Accurate Approximation of a Target Region
,”
ASME J. Mech. Des.
,
132
(
7
), pp.
1
13
. 10.1115/1.4001873
20.
Jiang
,
Z.
,
Chen
,
S.
,
Apley
,
D. W.
, and
Chen
,
W.
,
2016
, “
Reduction of Epistemic Model Uncertainty in Simulation-Based Multidisciplinary Design
,”
ASME J. Mech. Des.
,
138
(
8
), p.
8
. 10.1115/1.4033918
21.
Xu
,
S.
,
Liu
,
H.
,
Wang
,
X.
, and
Jiang
,
X.
,
2014
, “
A Robust Error-Pursuing Sequential Sampling Approach for Global Metamodeling Based on Voronoi Diagram and Cross Validation
,”
ASME J. Mech. Des.
,
136
(
7
), pp.
1
10
. 10.1115/1.4027161
22.
Liu
,
Y.
,
Shi
,
Y.
,
Zhou
,
Q.
, and
Xiu
,
R.
,
2016
, “
A Sequential Sampling Strategy to Improve the Global Fidelity of Metamodels in Multi-Level System Design
,”
Struct. Multidiscip. Optim.
,
53
(
6
), pp.
1295
1313
. 10.1007/s00158-015-1379-9
23.
Sacks
,
J.
,
Welch
,
W. J.
,
Mitchell
,
T. J.
, and
Wynn
,
H. P.
,
1989
, “
Design and Analysis of Computer Experiments
,”
Stat. Sci.
,
4
(
4
), pp.
409
423
. 10.1214/ss/1177012413
24.
Kushner
,
H. J.
,
1964
, “
A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise
,”
J. Basic Eng.
,
86
(
1
), pp.
97
106
. 10.1115/1.3653121
25.
Chen
,
S.
,
Jiang
,
Z.
,
Yang
,
S.
, and
Chen
,
W.
,
2016
, “
Multimodel Fusion Based Sequential Optimization
,”
AIAA J.
,
55
(
1
), pp.
241
254
. 10.2514/1.J054729
26.
Huang
,
D.
,
Allen
,
T. T.
,
Notz
,
W. I.
, and
Miller
,
R. A.
,
2006
, “
Sequential Kriging Optimization Using Multiple Fidelity Evaluations
,”
Struct. Multidiscip. Optim.
,
32
(
5
), pp.
369
382
. 10.1007/s00158-005-0587-0
27.
Welch
,
W. J.
,
1983
, “
A Mean Squared Error Criterion for the Design of Experiments
,”
Biometrika
,
70
(
1
), pp.
205
213
. 10.1093/biomet/70.1.205
28.
Pronzato
,
L.
, and
Müller
,
W. G.
,
2012
, “
Design of Computer Experiments: Space Filling and Beyond
,”
Stat. Comput.
,
22
(
3
), pp.
681
701
. 10.1007/s11222-011-9242-3
29.
Binois
,
M.
,
Huang
,
J.
,
Gramacy
,
R. B.
, and
Ludkovski
,
M.
,
2018
, “
Replication or Exploration ? Sequential Design for Stochastic Simulation Experiments
,”
Technometrics
,
61
(
1
), pp.
1
17
.
30.
Loeppky
,
J. L.
,
Moore
,
L. M.
, and
Williams
,
B. J.
,
2010
, “
Batch Sequential Designs for Computer Experiments
,”
J. Stat. Plan. Inference
,
140
(
6
), pp.
1452
1464
. 10.1016/j.jspi.2009.12.004
31.
Lam
,
R.
,
Willcox
,
K.
, and
Wolpert
,
D. H.
,
2016
, “
Bayesian Optimization With a Finite Budget: An Approximate Dynamic Programming Approach
,”
Neural Inf. Process. Syst.
,
30
, pp.
883
891
.
32.
Wiering
,
M.
, and
van Otterlo
,
M.
,
2012
,
Reinforcement Learning
,
Springer-Verlag
,
Berlin
.
33.
Lam
,
R. R.
, and
Willcox
,
K. E.
,
2017
, “
Lookahead Bayesian Optimization With Inequality Constraints
,”
Advances in Neural Information Processing Systems
,
Long Beach, United States
,
Dec. 4–9
, pp.
1
11
.
34.
Kocis
,
L.
, and
Whiten
,
W. J.
,
1997
, “
Computational Investigations of Low-Discrepancy Sequences
,”
ACM Trans. Math. Softw.
,
23
(
2
), pp.
266
294
. 10.1145/264029.264064
35.
Hazelrigg
,
G. A.
,
1998
, “
A Framework for Decision-Based Engineering Design
,”
ASME J. Mech. Des.
,
120
(
4
), pp.
653
658
. 10.1115/1.2829328
36.
Apley
,
D. W.
,
Liu
,
J.
, and
Chen
,
W.
,
2006
, “
Understanding the Effects of Model Uncertainty in Robust Design With Computer Experiments
,”
ASME J. Mech. Des.
,
128
(
4
), pp.
945
958
. 10.1115/1.2204974
37.
Bertsimas
,
D.
, and
Thiele
,
A.
,
2006
, “
Robust and Data-Driven Optimization: Modern Decision Making Under Uncertainty
,”
Model. Methods, Appl. Innov. Decis. Mak.
,
36
, pp.
95
122
.
38.
von Neumann
,
J.
, and
Morgenstern
,
O.
,
1953
,
Theory of Games and Economic Behavior
,
Princeton University Press
,
Princeton
.
39.
Abbas
,
A. E.
,
2006
, “
Maximum Entropy Utility
,”
Oper. Res.
,
54
(
2
), pp.
277
290
. 10.1287/opre.1040.0204
40. Hazelrigg, G. A., 2012, Fundamentals of Decision Making, Neils Corp, Hammond, LA.
41. Gallager, R. G., 2013, Stochastic Processes: Theory for Applications, Cambridge University Press, New York.
42. Rasmussen, C. E., and Williams, C. K. I., 2006, Gaussian Processes for Machine Learning, The MIT Press, Cambridge, Massachusetts.
43. Martin, J. D., and Simpson, T. W., 2004, "On the Use of Kriging Models to Approximate Deterministic Computer Models," Design Engineering Technical Conferences and Computers and Information in Engineering, Salt Lake City, UT, Sept. 28–Oct. 2, pp. 1–12.
44. Tao, S., Shintani, K., , R., Chan, Y., Yang, G., Meingast, H., and Chen, W., 2017, "Enhanced Gaussian Process Metamodeling and Collaborative Optimization for Vehicle Suspension Design Optimization," Design Engineering Technical Conferences and Computers and Information in Engineering, Cleveland, OH, Aug. 6–9, pp. 1–12.
45. Bertsekas, D. P., 1995, Dynamic Programming and Optimal Control, Athena Scientific, Belmont.
46. Johnson, M. E., Moore, L. M., and Ylvisaker, D., 1990, "Minimax and Maximin Distance Designs," J. Stat. Plan. Inference, 26(2), pp. 131–148.
47.
,
S. P.
,
Hax
,
A. C.
, and
Magnanti
,
T. L.
,
1977
,
Applied Mathematical Programming
,
,
Boston
.
48. Shang, B., and Apley, D. W., 2019, "Fully-Sequential Space-Filling Design Algorithms for Computer Experiments," J. Qual. Technol. 10.1080/00224065.2019.1705207
49. Jiang, Z., Apley, D. W., and Chen, W., 2015, "Surrogate Preposterior Analyses for Predicting and Enhancing Identifiability in Model Calibration," Int. J. Uncertain. Quantif., 5(4), pp. 341–359. 10.1615/Int.J.UncertaintyQuantification.2015012627
50. Hogg, R. V., McKean, J., and Craig, A. T., 2014, Introduction to Mathematical Statistics, Pearson, London.
51. Mardia, K. V., and Marshall, R., 1984, "Maximum Likelihood Estimation of Models for Residual Covariance in Spatial Regression," Biometrika, 71(1), pp. 135–146. 10.1093/biomet/71.1.135
52. Abt, M., and Welch, W. J., 1998, "Fisher Information and Maximum-Likelihood Estimation of Covariance Parameters in Gaussian Stochastic Processes," Can. J. Stat., 26(1), pp. 127–137. 10.2307/3315678
53. van der Vaart, A., 1998, Asymptotic Statistics, Cambridge University Press, Cambridge.
54. Forrester, A. I. J., Sobester, A., and Keane, A. J., 2008, Engineering Design via Surrogate Modelling: A Practical Guide, Wiley, New York.
55. Jin, R., Chen, W., and Sudjianto, A., 2004, "Analytical Metamodel-Based Global Sensitivity Analysis and Uncertainty Propagation for Robust Design," Soc. Automot. Eng., 1(429), pp. 1–10.