## Abstract

The cost of adaptive sampling for global metamodeling depends on the total number of costly function evaluations and the degree to which these evaluations are performed in parallel. Conventionally, samples are taken through a greedy sampling strategy that is optimal for either a single sample or a handful of samples. The limitation of such an approach is that it compromises optimality when more samples are taken. In this paper, we propose a thrifty adaptive batch sampling (TABS) approach that maximizes a multistage reward function to find an optimal sampling policy containing the total number of sampling stages, the number of samples per stage, and the spatial location of each sample. Consequently, the first batch identified by TABS is optimal with respect to all potential future samples and the available resources, and is consistent with a modeler’s preference and risk attitude. Moreover, we propose two heuristic-based strategies that reduce numerical complexity with a minimal reduction in optimality. Through numerical examples, we show that TABS outperforms or is comparable with greedy sampling strategies. In short, TABS provides modelers with a flexible adaptive sampling tool for global metamodeling that effectively reduces sampling costs while maintaining prediction accuracy.

## 1 Introduction

Costly physical experiments have been increasingly replaced by simulation models. These simulations facilitate an otherwise very expensive process for developing complex products and services. However, even simulation models have become too expensive to handle the complexity required by many contemporary engineering applications. To this end, the use of metamodels (a.k.a., surrogate models, models of models, emulators, or response surface approximations) is now ubiquitous across a wide range of applications [1–3]. Metamodels offer modelers significant monetary and computational advantages as they can efficiently approximate expensive simulations [4] based on a small number of samples.

Allocating the spatial locations of sample data involves a trade-off between maximizing metamodel accuracy (i.e., reducing the discrepancy between the predicted and actual function) and minimizing the total number of evaluations (sampling cost). The entire set of all such spatial locations is commonly known as a “design” of computer experiments. Typically, two strategies are considered when constructing a “design” [5]: static sampling and adaptive sampling [6,7]. A static sampling approach does not use information from the costly simulation model and can therefore create a design in one stage [8,9]. In contrast, adaptive sampling strategies leverage the information obtained from newly evaluated samples [10] and have been of particular interest in a multitude of research efforts in metamodel based design, including (i) global optimization [11–16], (ii) contour estimation [17–19], and (iii) global metamodeling [20–23].

In this paper, we consider the problem of adaptive sampling for global metamodeling, which by convention consists of three steps: (i) allocate the next sample(s) using a heuristic-based acquisition function, (ii) evaluate the costly simulation model, and (iii) retrain the metamodel after including the newly observed data. These three steps are repeated until either the model is considered accurate enough, or until all available resources have been exhausted. This stagewise approach has an advantage over static sampling strategies that are prone to over-sampling or under-sampling, as the required number of samples is not known a priori. Adaptive sampling techniques for global metamodeling help modelers to efficiently find the required number of samples. The required number of samples depends on the desired metamodel accuracy (i.e., higher accuracy requires more samples), the nonlinearity of the function (i.e., higher nonlinearity requires more samples), and the choice of metamodel. Concerning the latter, Harold Kushner [24] was the first to use the error estimation of a Brownian motion spatial random process (SRP) to guide sampling. Much of the succeeding literature, which includes adaptive sampling for global metamodeling, adopts a similar strategy by leveraging the properties of Gaussian processes (GP) (i.e., a special case of SRP) to construct a design [25,26].

Despite a rich literature, most of the early work on global metamodeling neglects the potential gain of taking multiple samples simultaneously and does not consider the interrelation between samples at present and future sampling stages [14,27], presumably due to tractability challenges. Regarding the latter, by not considering the influence of samples at future sampling stages, a modeler compromises the optimality of the design when multiple sampling stages are required (as is often the case) [28]. An example of an approach that does account for the interrelation for global metamodeling of noisy objective functions is given in Ref. [29]. However, for reasons of tractability, the approach assumes a maximum time horizon. As the potential monetary or temporal gain of performing simulations in batches (i.e., running multiple costly simulations per sampling stage) is not considered, the obtained design will not optimally utilize available resources. An example of an effort that considers the potential gain of batch sampling is given in Ref. [30], where a bin-based Latin hypercube design method is proposed for global metamodeling using a fixed batch size. To the best of our knowledge, an approach that combines multistage sampling with flexible batch size selection considering available resources does not exist, which is likely due to the complexity in formulating and solving such decision-making problems using optimization.

In this article, we present a novel thrifty adaptive batch sampling (TABS) approach for global metamodeling. At each sampling stage, TABS identifies a batch of samples that is optimal with respect to the available resources and the predicted future sampling batches. Optimality in this regard is with respect to the number of invested resources and improved metamodel accuracy. Because the improved metamodel accuracy is uncertain before sampling a new batch, we employ normative decision theory to make the trade-off between improved prediction accuracy and sampling cost that is consistent with a modeler’s preference and risk attitude. As a result, TABS offers decision support to identify a design that satisfies a predefined metamodel accuracy criterion using minimal resources.

The proposed method adopts a multistage reward function that is analogous to the reward function in dynamic programming [31,32]. However, as previous work on multistage adaptive sampling for global optimization has indicated, dynamic programming scales poorly with the increase in time horizon [33]. To circumvent this limitation, we propose two numerical approximations. The first method (TABS-3) makes three consecutive decisions at each stage: (i) the number of samples to consider over all future sampling stages and their spatial locations, (ii) the order in which to sample, and (iii) the number of sampling stages and the batch size at each stage. Note that throughout this paper we consider a decision as the optimization of one objective function, and therefore, a single decision can consist of more than one argument. The second method (TABS-1) considers only one decision per stage: the number of sampling stages and the batch size at each stage. The spatial locations and the sampling order for TABS-1 are decided prior to initiating the adaptive sampling procedure from a low-discrepancy sequence (e.g., Sobol, Halton, and Faure [34]). Through simulations, we show that both approaches outperform conventional adaptive sampling strategies in terms of monetary or temporal efficiency.

The format of the remainder of the paper is as follows. Section 2 provides background information on normative decision theory, GP modeling, and dynamic programming. Section 3 introduces the proposed TABS method. Next, we apply TABS to a numerical test problem, explore some of its intricacies with regard to sparse data, risk attitude, and evaluate its performance on an engineering example in Sec. 4. Finally, Sec. 5 concludes the paper.

## 2 Review of Related Work

The fundamental techniques for the development of TABS are presented in this section.

### 2.1 A Normative Approach to Decision-Making Under Uncertainty.

Consider a bounded design space $\mathcal{X} \subset \mathbb{R}^{d}$, a subset of *d*-dimensional Euclidean space. In this context, a preference function rank-orders alternative sampling batches from the most preferred to the least preferred. An example of a preference function in the context of global metamodeling is the integrated mean-squared error, as this is a good proxy for metamodel accuracy (the lower the integrated mean-squared error, the better). As such, the spatial location of additional sample(s) can be found by identifying the sampling location(s) $\mathbf{z}=\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_i\}$ for $i\in\mathbb{N}^{+}$ that minimize the integrated mean-squared error

$$\mathbf{z}^{*}=\arg\min_{\mathbf{z}\subset\mathcal{X}}\ \varphi(\mathbf{z}) \qquad (1)$$

where $\varphi(\cdot)$ denotes the chosen accuracy metric. This type of function is typically known as an acquisition function. One limitation is that conventional acquisition functions do not provide a fair trade-off between committed resources and improved metamodel accuracy. To this end, normative decision theory introduces a *value function* $R(\varphi(\mathbf{z}))$ that quantifies the value of sampling $\mathbf{z}$. To provide more context, the value can be considered as simulation time or monetary cost. Throughout this work, we use simulation time, which we consider to be wall-clock time. In other words, the value function is a mapping from metamodel accuracy to its equivalent worth in terms of simulation time. In the case of adaptive sampling for global metamodeling, a value function needs to capture a modeler’s value for improving metamodel accuracy and the cost of running additional simulations. However, prediction of the value function is subject to various sources of uncertainty, obscuring and compromising the optimality of Eq. (1). An example of such an uncertainty source is the unknown responses at unobserved spatial locations $\mathbf{z}$.

Decisions under uncertainty are often made through robust and reliability-based design optimization procedures [36,37]. However, these philosophies do not account for the entire density of the improvement in metamodel accuracy, nor the nonlinearity by which modelers value sampling alternatives [38]. The latter is known as a modeler’s *risk attitude*, a concept drawn from psychological and economic factors that are found through an indifference probability between two sampling alternatives [39,40].

According to normative decision theory, a decision consistent with a modeler’s risk attitude maximizes the expected *utility function* $U(\cdot)$, where the expectation is taken with respect to the uncertainty in the inputs of the acquisition function. This requires a new type of acquisition function $\psi(\mathbf{z},\theta)$, where $\theta$ are the uncertain inputs. Note that this kind of acquisition function effectively predicts the metamodel accuracy if a batch $\mathbf{z}$ were to be sampled. Rewriting Eq. (1) using $\psi(\mathbf{z},\theta)$ yields

$$\mathbf{z}^{*}=\arg\max_{\mathbf{z}\subset\mathcal{X}}\ \mathbb{E}_{\theta}\!\left[U\!\left(R\left(\psi(\mathbf{z},\theta)\right)\right)\right] \qquad (2)$$

where the utility is taken to be the normalized exponential form

$$U(R)=\frac{1-\exp\left(-\gamma\,\frac{R-a}{b-a}\right)}{1-\exp\left(-\gamma\right)} \qquad (3)$$

in which *a* and *b* are the minimum and maximum values of $R(\psi(\mathbf{z},\theta))$ for all $\mathbf{z}$, and *γ* is a scalar parameter that accounts for the risk attitude. Generally, three risk attitudes can be identified: (i) risk-prone *γ* < 0, (ii) risk-averse *γ* > 0, and (iii) risk-neutral, in which case no utility function is used.
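To make the role of *γ* concrete, the sketch below uses a normalized exponential utility (a common choice consistent with the *a*, *b*, *γ* description above, though the paper's exact functional form may differ) and compares a certain reward against a fifty-fifty gamble with the same expected reward:

```python
import math

def utility(r, a, b, gamma):
    """Normalized exponential utility on [a, b]; gamma < 0 is risk-prone,
    gamma > 0 risk-averse, and gamma -> 0 approaches risk-neutral (linear)."""
    if abs(gamma) < 1e-12:
        return (r - a) / (b - a)          # risk-neutral limit
    x = (r - a) / (b - a)                 # normalize reward to [0, 1]
    return (1.0 - math.exp(-gamma * x)) / (1.0 - math.exp(-gamma))

# Two sampling alternatives with equal expected reward 0.5 (rewards on [0, 1]):
# A: a certain 0.5;  B: 0 or 1 with probability 1/2 each.
a, b = 0.0, 1.0
for gamma in (-2.0, 1e-13, 2.0):
    eu_A = utility(0.5, a, b, gamma)
    eu_B = 0.5 * utility(0.0, a, b, gamma) + 0.5 * utility(1.0, a, b, gamma)
    # risk-averse (gamma > 0) prefers the certain alternative A,
    # risk-prone (gamma < 0) prefers the gamble B
    print(f"gamma={gamma:+.0e}: EU[A]={eu_A:.3f}, EU[B]={eu_B:.3f}")
```

A risk-neutral modeler is indifferent between the two alternatives, which is why no utility function is needed in that case.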

### 2.2 Basic Concepts of Spatial Random Processes.

Let $\mathbf{x}=[x_1,x_2,\ldots,x_d]^{T}$ be a *d*-dimensional input vector $\mathbf{x}\in\mathcal{X}$, and $y:\mathbb{R}^{d}\rightarrow\mathbb{R}$ be a (noise-free) scalar response function. In this way, GPs approximate a function $f(\mathbf{x})$ as

$$f(\mathbf{x})\approx \mathbf{m}(\mathbf{x})^{T}\boldsymbol{\beta}+Z(\mathbf{x}) \qquad (4)$$

where $\mathbf{m}(\mathbf{x})$ is a column vector of *q* pre-specified basis functions, $\boldsymbol{\beta}$ is a column vector of *q* unknown regression coefficients, and $Z(\mathbf{x})$ is a zero-mean GP whose covariance function $c(\mathbf{x},\mathbf{x}')$ is often chosen to be Gaussian, i.e.,

$$c(\mathbf{x},\mathbf{x}')=\sigma^{2}\,r(\mathbf{x},\mathbf{x}')=\sigma^{2}\exp\left(-\sum_{k=1}^{d}10^{\omega_k}\left(x_k-x'_k\right)^{2}\right) \qquad (5)$$

where $\boldsymbol{\omega}=[\omega_1,\omega_2,\ldots,\omega_d]^{T}$ is a vector of correlation parameters that govern the smoothness of the metamodel, $r(\mathbf{x},\mathbf{x}')$ is the Gaussian correlation, and $\sigma^{2}$ is the prior variance. Note that Eq. (5) defines a stationary GP model and has been used throughout this paper. However, the proposed method works with any form of the covariance function.

The GP model is thus characterized by the hyperparameters $\boldsymbol{\beta}$, $\boldsymbol{\omega}$, and $\sigma^{2}$. Assuming we have queried the costly function at *n* locations $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n$ and observed $\mathbf{y}=[y(\mathbf{x}_1),y(\mathbf{x}_2),\ldots,y(\mathbf{x}_n)]^{T}$, the hyperparameters are typically found by maximizing the likelihood (or log-likelihood) function through numerical optimization. Through a procedure known as profiling [43,44], by which $\boldsymbol{\beta}$ and $\sigma^{2}$ are replaced by their closed-form maximum likelihood estimates, the log-likelihood can be simplified to

$$\hat{\boldsymbol{\omega}}=\arg\min_{\boldsymbol{\omega}\in\Omega}\ n\log\left(\hat{\sigma}^{2}\right)+\log\left|\mathbf{R}\right| \qquad (6)$$

where $\mathbf{R}$ is the $n\times n$ correlation matrix with the $(i,j)$th element given as $R_{i,j}=r(\mathbf{x}_i,\mathbf{x}_j)$ for $i,j=1,\ldots,n$, and typically $\Omega=[-10,10]^{d}$. Subsequent to estimating $\boldsymbol{\omega}$, the response $y(\mathbf{x})$ can be predicted from the posterior mean $\bar{\mu}(\mathbf{x})$ and mean-squared error $\bar{\sigma}^{2}(\mathbf{x})$ for any $\mathbf{x}$ in the design space.
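A minimal sketch of this fitting procedure is given below. It assumes a constant mean ($q$ = 1), the Gaussian correlation with a $10^{\omega}$ length-scale parameterization (an assumption consistent with $\Omega=[-10,10]^d$), and a simple grid search in place of a numerical optimizer:

```python
import numpy as np

def corr(X1, X2, omega):
    # Gaussian correlation r(x, x') = exp(-sum_k 10^omega_k (x_k - x'_k)^2)
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-np.sum(10.0 ** omega * d2, axis=2))

def neg_profiled_loglik(omega, X, y):
    # Eq. (6): n*log(sigma2_hat) + log|R|, with beta and sigma2 profiled out
    n = len(y)
    R = corr(X, X, omega) + 1e-10 * np.eye(n)   # jitter for numerical stability
    Ri = np.linalg.inv(R)
    one = np.ones(n)
    beta = (one @ Ri @ y) / (one @ Ri @ one)    # profiled constant mean
    res = y - beta * one
    s2 = (res @ Ri @ res) / n                   # profiled process variance
    return n * np.log(s2) + np.log(np.linalg.det(R))

def posterior_mean(Xq, X, y, omega):
    n = len(y)
    R = corr(X, X, omega) + 1e-10 * np.eye(n)
    Ri = np.linalg.inv(R)
    one = np.ones(n)
    beta = (one @ Ri @ y) / (one @ Ri @ one)
    return beta + corr(Xq, X, omega) @ Ri @ (y - beta * one)

# 1D toy problem: grid-search the single roughness parameter omega
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(4.0 * np.pi * X[:, 0])
grid = np.linspace(-2.0, 4.0, 61)
w = min(grid, key=lambda w: neg_profiled_loglik(np.array([w]), X, y))
mu = posterior_mean(X, X, y, np.array([w]))
print("max |interpolation error| =", float(np.abs(mu - y).max()))
```

Because the GP is noise-free, the posterior mean interpolates the observed data (up to the jitter), which the printed error confirms.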

### 2.3 Basic Concepts of Dynamic Programming.

Adaptive sampling lends itself well to the adoption of a dynamic programming perspective as it includes a set of subsequent intercorrelated decisions at discrete sampling stages [45]. Note that in this section we use the term “time-step” instead of “sampling stage,” as this is the convention in the dynamic programming literature. Consider a discrete-time dynamic system that at time-step *t* is characterized by a state $s_t\in S_t$, where at each step an action $a_t$ from an action space $A_t(s_t)$ is applied. Note that the action space has the current state as an input argument because not all actions might be available at each time-step. For a specified action $a_t$ at state $s_t$, the system transits into state $s_{t+1}\in S_{t+1}$ at time-step *t* + 1.

Consider a decision policy $\pi=\{\mu_1,\mu_2,\ldots,\mu_\tau\}$, where *τ* is the total number of future time steps (a.k.a. the time horizon), and for a specific time-step *t*, we have the *dynamic* $\Gamma_t: A_t\times S_t\times \Theta_t\rightarrow S_{t+1}$, where $\Theta_t$ is the space of random disturbances. Each policy element maps the current state into an action $a_t=\mu_t(s_t)$, i.e., $\mu_t: S_t\rightarrow A_t$. The definition of optimality is rooted in the use of a *reward function*, a function that quantifies the reward when an action $a_t$ is taken to transit into state $s_{t+1}$ while considering the disturbance $\theta_t$. The general formulation of the reward function is given as $r_t: A_t\times S_t\times\Theta_t\mapsto\mathbb{R}$. For an arbitrary decision policy and a finite number of time steps, the expected cumulative reward is given as

$$V^{\pi}(s_1)=\mathbb{E}\left[\sum_{t=1}^{\tau} r_t\left(a_t,s_t,\theta_t\right)\right] \qquad (8)$$

where the expectation is taken with respect to the disturbances $\theta_t$. The cumulative reward is commonly referred to as the *action-value function* and should not be confused with the deterministic value function from normative decision theory that we defined as $R(\cdot)$. The optimal policy then follows from

$$V^{*}(s_1)=\max_{\pi\in\mathcal{M}}\ \mathbb{E}\left[\sum_{t=1}^{\tau} r_t\left(a_t,s_t,\theta_t\right)\right] \qquad (9)$$

This optimization maximizes the expected reward over *τ* future steps considering all admissible policies $\mathcal{M}$. It should be pointed out that this formulation is recursive so that it can potentially be solved through backward induction starting from *t* = *τ* to *t* = 1. Whether Eq. (9) can actually be solved in practice depends on the tractability of the specific problem.
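A toy instance of Eq. (9) solved by backward induction is sketched below; the states, actions, and reward are hypothetical stand-ins chosen only to mimic diminishing accuracy gain versus sampling cost:

```python
# Toy finite-horizon problem: state = number of samples taken so far,
# action = batch size in {0, 1, 2}; reward = accuracy gain minus cost.
TAU = 3                        # time horizon
STATES = range(0, 10)
ACTIONS = (0, 1, 2)

def reward(s, a):
    gain = 1.0 - 1.0 / (1 + s + a)   # diminishing accuracy improvement
    cost = 0.3 * a                   # sampling cost per sample
    return gain - cost

def transition(s, a):
    return min(s + a, max(STATES))   # deterministic dynamic for simplicity

# Backward induction: V[t][s] = max_a reward(s, a) + V[t+1][s']
V = {TAU + 1: {s: 0.0 for s in STATES}}
policy = {}
for t in range(TAU, 0, -1):
    V[t] = {}
    for s in STATES:
        best_a, best_v = max(
            ((a, reward(s, a) + V[t + 1][transition(s, a)]) for a in ACTIONS),
            key=lambda av: av[1],
        )
        V[t][s] = best_v
        policy[(t, s)] = best_a
print("optimal first action from s=0:", policy[(1, 0)])
```

Note how the first action already accounts for the rewards of all later stages, which is exactly the property TABS seeks for its first sampling batch.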

## 3 Thrifty Adaptive Batch Sampling and Its Implementation

The general problem setting considered in this work is the case where a modeler has to evaluate a costly function $f:\mathcal{X}\mapsto\mathbb{R}$ a large number of times, where $\mathbf{x}\in\mathcal{X}$ with $\mathcal{X}\subset\mathbb{R}^{d}$ a bounded subset of a *d*-dimensional Euclidean space. To reduce the cost, the modeler wants to replace the costly simulation function $f(\mathbf{x})$ with a metamodel. In this work, we only consider continuously differentiable simulation functions, as functions that are not continuously differentiable cannot be modeled through typical GPs. Note that this is only a minor limitation, as many physical phenomena are continuous. This section starts by introducing the general framework of TABS (Sec. 3.1), followed by two numerical approximation techniques, TABS-3 and TABS-1 (Sec. 3.2). More details on assessing the elements of TABS’s acquisition function, including the utility-based reward function formulation, the probability assessment of meeting the accuracy criterion, and an example of a potential reward function, are provided in Secs. 3.3–3.5, respectively.

### 3.1 General Framework Thrifty Adaptive Batch Sampling.

TABS is initiated by defining the following five inputs:

1. The maximum amount of resources $F_{eq}$ that the modeler is willing to invest (in this work, “resources” refers to simulation time, but they can also be formulated in terms of money).
2. A cost function $C(\cdot)$ that captures the cost as a function of the number of sampling stages and the batch size at each stage.
3. Given an estimate for the metamodel accuracy $\psi(\cdot)$, a stopping criterion *ɛ* under which the metamodel is considered sufficiently accurate.
4. The modeler’s risk attitude, encoded through the utility function $U(\cdot)$.
5. An initial design $a_0=\{x_{0,1},x_{0,2},\ldots,x_{0,m_0}\}$ containing a small number of samples, where $m_0$ is the total number of initial samples and $x_{0,i}\in\mathcal{X},\ i=1,\ldots,m_0$.

The first three inputs are commonly used in various adaptive sampling methods, while the risk attitude and the cost function are not. The utility function as introduced in Sec. 2.1 enables modelers to make decisions consistent with their risk attitude, and the cost function translates the effort of taking samples into simulation time. Adaptive sampling is an iterative *discrete time process*, where each discrete point in time is referred to as a *sampling stage,* and each sampling stage consists of four *phases* as marked in Fig. 1. Because optimizing the acquisition function directly is computationally intensive, we propose two numerical approximations, TABS-1 (blue arrows) and TABS-3 (green arrows). The inputs to these two approaches as defined in Phase 0 differ slightly and will be further explained in Sec. 3.2.

In the first phase, we initiate the optimal sampling policy $\pi^*=\{a_0\}$, set the number of sampling stages *t* = 1, and define the current state $s_0$ as the empty set. Note that a state $s_t\in S_t$ is defined by all samples $a_t\in A_t(s_t)$ and their responses $y_t=f(a_t)$; in other words, $s_t=\cup_{i=0}^{t-1}\{a_i,y_i\}$ for $a_i=\{x_{i,1},x_{i,2},\ldots,x_{i,m_i}\}$ and $y_i=\{y_{i,1},y_{i,2},\ldots,y_{i,m_i}\}$. The first batch of the sampling policy is then used in Phase 1 to run the computer simulations and define the current state $s_t=s_{t-1}\cup\{(a_{t-1},y_{t-1})\}$. Next, in Phase 2, the GP model is updated by re-estimating its hyperparameters through the likelihood function in Eq. (6). The state and the GP model are then passed to Phase 3 to estimate the metamodel accuracy. TABS is terminated if the accuracy requirement is met; otherwise, the current state and GP model are passed to the next phase. In Phase 4, a *utility-based action-value function* (UBAV) (i.e., the newly proposed acquisition function) denoted by $V^{\pi}(s_t)$ is maximized, updating the optimal sampling policy $\pi^*$, which contains the recommended optimal number of samples, their spatial locations, the number of stages, and the batch size for each stage. TABS-3 uses three consecutive decisions as given by the green block in Fig. 1, and TABS-1 only considers one decision as given by the blue block in Fig. 1.

After Phase 4, the number of stages is updated *t* = *t* + 1 and the procedure is repeated until the accuracy criterion is satisfied. Note that at the beginning of each stage the optimal sampling policy is updated as $\pi *={\pi 1*,\pi 2*,\u2026,\pi \tau *}$, where $\pi i*$ is the optimal batch of samples at the *i*th sampling stage for a total of *τ* stages. For notational purposes, we have not indexed ** π^{*}** with time but note that it is updated and differs at each sampling stage.
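The stagewise phases described above can be sketched as the following loop. The helper functions are simplified placeholders (hypothetical, not the paper's GP training, accuracy estimator, or UBAV optimizer), so only the control flow is meaningful:

```python
import random

def run_tabs(f, initial_design, eps, max_stages=20, seed=0):
    """Skeleton of the TABS stagewise loop (Phases 1-4); the helpers below
    stand in for GP fitting, accuracy estimation, and UBAV maximization."""
    rng = random.Random(seed)
    state = []                                   # state: list of (x, y) pairs
    policy = [list(initial_design)]              # pi*: batches of x-locations
    for stage in range(1, max_stages + 1):
        batch = policy.pop(0)                    # Phase 1: evaluate next batch
        state += [(x, f(x)) for x in batch]
        model = train_metamodel(state)           # Phase 2: refit metamodel
        if estimate_accuracy(model, state) < eps:  # Phase 3: stopping check
            return state, stage
        if not policy:                           # Phase 4: (re)plan future batches
            policy = [[rng.random() for _ in range(2)]]
    return state, max_stages

def train_metamodel(state):        # placeholder: return the data as the "model"
    return state

def estimate_accuracy(model, state):   # placeholder: accuracy proxy that
    return 1.0 / len(state)            # improves as more samples are taken

pts, stages = run_tabs(lambda x: x * x, [0.1, 0.5, 0.9], eps=0.15)
print(len(pts), stages)
```

In the actual method, Phase 4 replans the entire multistage policy at every stage, so the planned future batches change as new information arrives.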

### 3.2 Implementation and Tractability.

The optimization of the UBAV is likely to be computationally intensive due to the vast number of admissible sampling policies, and thus, we require heuristics that can mitigate this complexity at a minimal cost to optimality. In this section, we introduce two such methods, TABS-3 and TABS-1.

#### 3.2.1 TABS Through Three Consecutive Decisions (TABS-3).

TABS-3 improves numerical tractability by approximating the UBAV through three consecutive decisions: (i) total number of samples that will be considered and their spatial locations, (ii) the order in which to sample, and (iii) how many batches and the size of each batch.

##### Decision I: Number and spatial location of samples.

The first decision fixes the number of sampling stages to one (i.e., *τ* = 1) and maximizes $V^{\pi}(s_t)$ with respect to the total number of samples and their spatial locations. Assume that at stage *t* we have already observed a total of $n_p$ samples $X_p$ (subscript *p* indicates past), and we consider a total of $n_f=\sum_{i=1}^{t} m_i$ future samples $X_f$ (subscript *f* indicates future). Introducing $X_{pf}=X_p\cup X_f$ and maximizing the minimum distance between samples gives

$$D_1^{*}=\arg\max_{X_f\subset\mathcal{X}}\ \min_{\mathbf{x}_i,\mathbf{x}_j\in X_{pf},\ i\neq j}\left\|\mathbf{x}_i-\mathbf{x}_j\right\|_2 \qquad (11)$$

Simplifying the problem by setting the number of sampling stages equal to one compromises the optimality of the sampling policy when the true optimum spans more than one stage. However, it is reasonable to believe that this compromise is marginal: the first several sampling stages have the largest contribution to the UBAV, as any reasonable reward function is monotonically decreasing with the number of sampling stages. Consequently, the UBAV will maximize the probability of satisfying the accuracy criterion (see details in Sec. 3.4) at the most forthcoming stages and is thus relatively insensitive to additional samples.
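The maximin criterion of Decision I can be approximated greedily, as in this sketch; the text implies a joint optimization over all $n_f$ locations, so the one-point-at-a-time loop below is only an illustrative proxy:

```python
import numpy as np

def augment_maximin(X_past, candidates, n_f):
    """Greedily pick n_f future points that maximize the minimum pairwise
    distance of X_pf = X_past U X_future (a cheap proxy for Decision I)."""
    chosen = list(X_past)
    picked = []
    for _ in range(n_f):
        # for each candidate, distance to the closest already-chosen point
        d = [min(np.linalg.norm(c - p) for p in chosen) for c in candidates]
        k = int(np.argmax(d))              # farthest-from-design candidate
        picked.append(candidates[k])
        chosen.append(candidates[k])
        candidates = np.delete(candidates, k, axis=0)
    return np.array(picked)

X_p = np.array([[0.0, 0.0], [1.0, 1.0]])   # already observed samples
cand = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.1, 0.1]])
X_f = augment_maximin(X_p, cand.copy(), 2)
print(X_f)   # picks the two empty corners, avoiding the near-duplicate
```

Note how the near-duplicate candidate close to an observed point is never selected, which reflects why maximin designs avoid wasting costly evaluations.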

##### Decision II: Sampling order.

The second decision determines the order in which the samples of $D_1^{*}$ should be taken. We again consider a single sampling stage (i.e., *τ* = 1), define a new set of samples $D_1^{*}\setminus d_{1,i}^{*}$ that includes all but the *i*th sample of $D_1^{*}$, and set $h=n_f$. We can then find the least beneficial sample as

$$d_{2,h}^{*}=\arg\max_{d_{1,i}^{*}\in D_1^{*}}\ V^{\pi}\!\left(s_t;\ D_1^{*}\setminus d_{1,i}^{*}\right) \qquad (12)$$

Next, we remove $d_{2,h}^{*}$ from $D_1^{*}$, incrementally decrease *h* by one, and reinitiate Eq. (12) to find the next least beneficial sample. This process is repeated until all samples in $D_1^{*}$ have been exhausted, or until the probability of meeting the accuracy criterion is 0. We then obtain the decision vector $D_2^{*}=\{d_{2,1}^{*},d_{2,2}^{*},\ldots,d_{2,n_f}^{*}\}$ containing an ordered set of $n_f$ samples, where $d_{2,i}^{*}\in D_1^{*}$. The process of incrementally removing samples is analogous to backward induction and is a common procedure in the field of dynamic programming [47].
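The backward-removal ordering can be sketched with a generic set-quality score standing in for the UBAV; the minimum-pairwise-distance score used here is a hypothetical placeholder:

```python
import numpy as np

def order_by_backward_removal(D1, score):
    """Repeatedly remove the least beneficial sample: the one whose removal
    degrades the score of the remaining set the least. Returns D1 reordered
    from most to least beneficial (analogous to Decision II)."""
    remaining = list(range(len(D1)))
    order = []
    while len(remaining) > 1:
        # removing index k should leave the best-scoring remaining set
        k = max(remaining,
                key=lambda i: score([D1[j] for j in remaining if j != i]))
        order.append(k)            # k is least beneficial -> sampled last
        remaining.remove(k)
    order.append(remaining[0])
    return [D1[i] for i in reversed(order)]   # most beneficial first

def spread(points):   # placeholder score: minimum pairwise distance
    pts = np.asarray(points)
    n = len(pts)
    if n < 2:
        return 0.0
    return min(np.linalg.norm(pts[i] - pts[j])
               for i in range(n) for j in range(i + 1, n))

D1 = [[0.0], [0.05], [0.5], [1.0]]
print(order_by_backward_removal(D1, spread))
```

With the spread score, the near-duplicate point at 0.05 is correctly ranked as the least beneficial and would be sampled last.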

##### Decision III: Number and size of batches.

The third decision determines the number of sampling stages and the size of each batch, now allowing *t* ≥ 1 future stages. During Decision II, we have already obtained the probabilities of satisfying the accuracy criterion and can therefore readily evaluate the UBAV function and approximate the optimal sampling policy as

$$\pi^{*}=\arg\max_{\tau,\ m_1,\ldots,m_\tau}\ V^{\pi}(s_t)\quad \text{s.t.}\ \sum_{i=1}^{\tau} m_i=n_f \qquad (13)$$

where the *i*th batch $\pi_i^{*}$ contains the next $m_i$ samples of the ordered set $D_2^{*}$.

#### 3.2.2 TABS Through One Decision (TABS-1).

A low-discrepancy sequence provides a candidate set $x_c\in\mathbb{R}^{d\times i}$, where $\cup_{j=1}^{i} x_{c,j}$ has good space-filling properties for any $i\in\mathbb{N}^{+}$. Consequently, we can use such a sequence to decide where to sample and in what order during the initiation Phase 0. The integration of a low-discrepancy sequence forms the basis of the TABS-1 approach, as depicted by the blue arrows in Fig. 1. In Phase 0, a candidate set $x_c$ is prescribed, of which the first $i_0$ elements are selected to form the initial design $a_0$. In this work, we used a fully sequential space-filling (FSSF) sequence [48] to create $x_c$. An FSSF sequence is created by sequentially selecting samples from an additional and larger Sobol sequence using the minimum pairwise distance. In this way, no repeated decisions need to be made regarding the order and the locations of samples. The remaining process is identical to TABS-3, except for Phase 4, where the optimal sampling policy is approximated through a single decision that is formulated as

$$\pi^{*}=\arg\max_{n_f,\ \tau,\ m_1,\ldots,m_\tau}\ V^{\pi}(s_t)\quad \text{s.t.}\ \sum_{i=1}^{\tau} m_i=n_f \qquad (14)$$

Note how the decision is nearly identical to Eq. (13), except that in addition to batching the samples we also have to find the optimal number of samples $nf$.

In terms of numerical complexity, TABS-1 grows implicitly with the increase in input dimensionality as more future samples will be required (i.e., $n_f$ becomes larger). More specifically, the number of combinations in which the additional samples can be batched increases quadratically as the total number of future samples increases. Observe that this increase in complexity equals that of Eq. (1) with respect to increasing input dimensionality. Consequently, TABS-1 scales similarly in terms of numerical complexity with the increase in input dimensionality as conventional acquisition functions.
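An FSSF-style candidate ordering can be sketched by sequentially pulling maximin points from a larger pool; in this sketch a seeded random pool stands in for the Sobol sequence of Ref. [48]:

```python
import math
import random

def fssf_like(pool, n):
    """Order n points from `pool` so that every prefix is space-filling:
    start with the first pool point, then repeatedly take the pool point
    farthest from the points already selected (maximin selection)."""
    pool = list(pool)
    seq = [pool.pop(0)]
    for _ in range(n - 1):
        k = max(range(len(pool)),
                key=lambda i: min(math.dist(pool[i], s) for s in seq))
        seq.append(pool.pop(k))
    return seq

rng = random.Random(1)
pool = [(rng.random(), rng.random()) for _ in range(200)]   # stand-in pool
cand = fssf_like(pool, 10)

# every prefix of the sequence keeps a healthy minimum pairwise distance,
# so the first i_0 points can serve directly as the initial design a_0
prefix = cand[:5]
min_d = min(math.dist(p, q) for i, p in enumerate(prefix) for q in prefix[:i])
print(round(min_d, 3))
```

This is why TABS-1 needs no repeated spatial-location or ordering decisions: any prefix of the precomputed sequence is already well spread over the design space.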

### 3.3 Formulation of the Utility-Based Action Value Function.

Critical to any adaptive sampling approach is the formulation of the acquisition function; in the case of TABS, this is the UBAV that captures the multistage nature of the problem by adopting a dynamic programming perspective. Conventional dynamic programming is likely to run into the following challenges: (i) the decision space is continuous and likely to be intractable, and (ii) an *absorbent* state can be reached after each state transition, but we do not know a priori which states are absorbent. In the context of adaptive sampling, an absorbent state implies that we have reached a state in the sequential decision-making process where the modeler is satisfied with the metamodel accuracy and no more samples/actions are taken.

Details on the decision process as considered by the UBAV function for the first sampling stage *t* = 1 are visualized in Fig. 2. We start at state $s_1$ after sampling $a_0$. For an arbitrary sampling policy $\pi=\{\pi_1,\pi_2,\ldots,\pi_\tau\}$, the first batch $\pi_1=\{x_{1,1},x_{1,2},\ldots,x_{1,m_1}\}$ containing $m_1\in\mathbb{N}^{+}$ samples would transition the metamodel to a future state $s_2$, where we have a probability $p_1$ to satisfy the metamodel accuracy and gain a reward $r_1$ (implying that an absorbent state has been reached). On the other hand, we have a probability $p_1'=1-p_1$ that our metamodel is not accurate enough and that we require an additional batch of samples $\pi_2$, transitioning us to state $s_3$, where again we have a probability $p_2$ that an absorbent state is reached and would otherwise require additional samples. The final state $s_{\tau+1}$, which is reached after sampling a total of *τ* batches, can be absorbent for one of two reasons: (i) the stopping criterion $\Psi(s_{\tau+1})<\epsilon$ is met, or (ii) the maximum number of sampling stages has been reached while $\Psi(s_{\tau+1})\geq\epsilon$, for which we incur the rewards $r_\tau$ and $r_{\tau+1}$, respectively.

To formalize this process, we introduce the random variable *T*, which is the number of future sampling stages required to reach an absorbent state. Subsequently, for an arbitrary sampling policy $\pi$, we can identify the probability distribution of *T* as

$$P(T=t;\pi)=p_t\prod_{i=1}^{t-1}\left(1-p_i\right)$$

for *t* = 0, …, *τ*, with $\Psi(s_0)=\infty$ and $\psi(s_{\tau+1})=0$.

Note that the construction of the probabilities of reaching an absorbent state at future time *t* matches the probability of obtaining a reward *r*_{t} as presented in Fig. 2. For notational purposes, we introduce the *probability of meeting the accuracy criterion* at sampling stage *t* as $Pt=P(T=t;\pi )$.
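The stagewise probabilities of Fig. 2 chain multiplicatively; a short sketch, assuming $p_t$ denotes the conditional probability that the criterion is first met at stage *t* given that it was not met earlier:

```python
def stage_probabilities(p):
    """Given conditional per-stage success probabilities p[0..tau-1],
    return P_t = P(T = t): survive all earlier stages, then succeed."""
    P, survive = [], 1.0
    for pt in p:
        P.append(survive * pt)
        survive *= 1.0 - pt
    return P, survive      # `survive` = probability no stage ever succeeds

p = [0.3, 0.5, 0.8]                     # hypothetical conditional probabilities
P, leftover = stage_probabilities(p)
print([round(x, 3) for x in P], round(leftover, 3))
```

By construction the stage probabilities and the leftover mass sum to one, which is what allows them to weight the stagewise rewards in the UBAV.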

The state transitions are governed by the dynamic $s_{t+1}=\Gamma\left(\pi_t,s_t,\theta_t\right)$, where $\theta_t$ is a random disturbance, and $\pi_t$ is the *t*th batch of sampling policy $\pi=\{\pi_1,\pi_2,\ldots,\pi_\tau\}$, containing $m_t\in\mathbb{N}^{+}$ samples given as $\pi_t=\{x_{t,1},x_{t,2},\ldots,x_{t,m_t}\}$ for $x_{t,m_t}\in\mathcal{X}$. Following dynamic programming, we can identify an action-value function for an arbitrary sampling policy as

$$V^{\pi}(s_t)=\sum_{i=t}^{\tau+1} P_i\,\mathbb{E}\!\left[U\!\left(r_i\right)\right],\quad \pi_i\in A_i$$

where $A_t$ is the set of all admissible samples that can be taken at the *t*th stage. Note that in order to predict a future state, we require a system dynamic $\Gamma(\cdot)$. For this purpose, TABS uses a preposterior analysis of our GP model. More specifically, the system dynamic is obtained by adding samples $a_t$ and a random realization of the predicted response to the GP model [49].

### 3.4 Formulation of the Probability of Meeting the Accuracy Criterion.

The motivation behind the stopping criterion $\psi(\cdot)<\epsilon$ is the potential existence of a set of samples for which our metamodel is considered accurate enough so that no more samples are required. However, we cannot predict with certainty whether the stopping criterion for a future state will be met, because it is a function of the random responses $y_t=f(a_t)$ and the approximated hyperparameters $\boldsymbol{\omega}$. For the disturbance associated with the unknown response, we use the prediction of our current GP given as $P_\Lambda(\lambda)=\mathcal{N}(\bar{\mu}(a_t;s_t),\bar{\sigma}^2(a_t;s_t))$. The disturbance associated with the roughness parameters is estimated through the Fisher information matrix $I:\Omega\times\Omega\mapsto\mathbb{R}^{d\times d}$ that takes the variance of the sensitivity of the likelihood function in Eq. (6) with respect to changes in the hyperparameters [50]. The closed form of the Fisher information matrix for a GP is

$$I_{k,l}\left(\boldsymbol{\omega}\right)=\frac{1}{2}\,tr\!\left(\mathbf{R}^{-1}\frac{\partial \mathbf{R}}{\partial \omega_k}\,\mathbf{R}^{-1}\frac{\partial \mathbf{R}}{\partial \omega_l}\right)$$

where $tr(\cdot)$ is the trace operator, and *k*, *l* = 1, …, *d* [51,52]. Under the assumption of the Bernstein–von Mises theorem [53], we can approximate $P(\omega)\approx\mathcal{N}(\hat{\omega},I^{-1}(\omega))$, where $\hat{\omega}$ is the maximum likelihood estimate of the roughness parameters.
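The trace formula can be evaluated directly. The sketch below assumes the Gaussian correlation with a $10^{\omega}$ length-scale parameterization (an assumption, though consistent with $\Omega=[-10,10]^d$):

```python
import numpy as np

def corr_matrix(X, omega):
    # assumed Gaussian correlation: r = exp(-sum_k 10^omega_k (x_k - x'_k)^2)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-np.tensordot(d2, 10.0 ** omega, axes=([2], [0])))

def fisher_information(X, omega):
    """I_{kl} = 0.5 * tr(R^-1 dR/dw_k R^-1 dR/dw_l), the standard closed
    form for the covariance hyperparameters of a GP."""
    n, d = X.shape
    R = corr_matrix(X, omega) + 1e-10 * np.eye(n)   # jitter for stability
    Ri = np.linalg.inv(R)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    # dR/dw_k = -ln(10) * 10^w_k * (x_k - x'_k)^2 * R  (elementwise product)
    dR = [-np.log(10.0) * 10.0 ** omega[k] * d2[:, :, k] * R for k in range(d)]
    I = np.empty((d, d))
    for k in range(d):
        for l in range(d):
            I[k, l] = 0.5 * np.trace(Ri @ dR[k] @ Ri @ dR[l])
    return I

X = np.random.default_rng(0).random((12, 2))
I = fisher_information(X, np.array([0.0, 0.0]))
print(I.shape, bool(np.all(np.linalg.eigvalsh(I) >= -1e-8)))  # PSD check
```

The resulting matrix is symmetric positive semidefinite, so its inverse is a valid covariance for the approximate normal distribution of $\omega$.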

The cumulative probability of meeting the accuracy criterion at or before stage *t* (i.e., $\sum_{i=1}^{t} P_i$) is the predicted probability that the metamodel accuracy is below threshold *ɛ*, obtained by propagating the two disturbances above through the accuracy estimate, i.e.,

$$p_t=P\left(\Psi\left(s_{t+1}\right)<\epsilon\right),\qquad P_t=p_t\prod_{i=1}^{t-1}\left(1-p_i\right)$$

for *t* = 1, …, *τ* and $P_0=p_0=0$.

### 3.5 Formulation of a Reward Function.

The reward function quantifies the net benefit of observing a batch of samples $a_t$. As previously mentioned, the quantity of interest used to gauge the accuracy of a metamodel is the estimation of the metamodel accuracy $\psi(s_t)$. However, this quantity only accounts for the potential gain, while neglecting the cost associated with the sampling strategy. As such, a general reward function is defined as

$$r_t=R\left(\psi\left(s_{t+1}\right)\right)-C_t$$

where $R(\cdot)$ translates the predicted metamodel accuracy into its equivalent worth in simulation time and $C_t$ is the cumulative sampling cost, bounded by the accuracy threshold *ɛ* and the maximum amount of time that a modeler is willing to invest $F_{eq}$.

The cost function $C_t$ should adequately account for the number of sampling stages *t* and the number of samples observed during each stage $|a_t|$, where $|\cdot|$ is the cardinality operator (i.e., the number of samples in $a_t$). In the examples used in this work, we propose a flexible cost function as

$$C_t=\sum_{i=1}^{t}\left[c_{st}+c_b\left(\left\lfloor\frac{|a_i|}{b_s}\right\rfloor+\left(\frac{rem\left(|a_i|,b_s\right)}{b_s}\right)^{1/\alpha}\right)\right] \qquad (24)$$

where $b_s\in\mathbb{N}^{+}$ is the batch size, $c_b\in\mathbb{R}^{+}$ is the cost of running a batch, $c_{st}\in\mathbb{R}^{+}$ is the cost of initiating a stage, and $\alpha\in\mathbb{R}^{+}$ is a tuning parameter. Furthermore, $\lfloor\cdot\rfloor$ rounds a variable down to its nearest integer, and $rem(\cdot)$ is the modulo operator that gives the remainder of a fraction. Selecting values for *α* can have three different effects: (i) when 0 < *α* < 1, TABS will discount more than one sample per stage; (ii) when *α* = 1, cost increases linearly with the number of samples per batch; and (iii) for 1 < *α* < ∞, we obtain a cost function that prefers a batch size that is a multiple of $b_s$.

For the sake of illustration, a visualization of Eq. (24) is presented in Fig. 3. This visualization considers a batch size of four, *α* > 1, and spans three sampling stages, where seven, five, and two samples are taken in that order. Note that, for a particular sampling stage, the increase in cost becomes smaller as a batch fills up and rapidly increases again once a batch has been filled (lower two lines). This behavior is analogous to a modeler who is able to run four simulations in parallel and incurs a significant cost increase when one additional simulation is to be run. The purpose of the cost function is to enable us to solve Eq. (19). In practice, modelers are recommended to specify their own cost function considering the gain of improved metamodel accuracy and sampling cost.
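The batch-fill behavior described above can be reproduced with a simple cost model; the functional form below (a concave discount on a partially filled batch) is an illustrative assumption rather than the exact Eq. (24):

```python
def stage_cost(m, b_s=4, c_b=1.0, c_st=0.25, alpha=1.5):
    """Illustrative staged cost: full batches cost c_b each; a partially
    filled batch is discounted concavely for alpha > 1 (assumed form)."""
    full, rest = divmod(m, b_s)
    return c_st + c_b * (full + (rest / b_s) ** (1.0 / alpha))

costs = [stage_cost(m) for m in range(1, 9)]
increments = [round(costs[i] - costs[i - 1], 3) for i in range(1, 8)]
print(increments)  # shrinking within a batch, then a jump at m = 5
```

The marginal cost decreases while a batch fills and jumps once a new batch must be opened, mirroring a modeler with four parallel simulation slots.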

## 4 Numerical and Engineering Examples

In this section, we demonstrate the functionality of TABS by applying it to two numerical functions and one engineering problem.

### 4.1 One-Dimensional Test Problem.

The first example is the one-dimensional test function of Eq. (25), defined on *x* ∈ [0, 1]. TABS-3 is used for approximating Eq. (25) and is initiated using the settings in Table 1. The selection of the desired metamodel accuracy (*ɛ*) and the total amount of simulation time that a modeler is willing to invest ($F_{eq}$) originate from two schools of thought. In the first, modelers simply allocate new samples until emulation accuracy *ɛ* has been achieved. In the second, a modeler determines the amount of simulation time $F_{eq}$ and adds samples until all resources have been exhausted. For the selection of the initial number of samples, it is recommended to take between two and five times the input dimension of a problem. This is a conservative number of samples that typically avoids over-sampling (so that no resources are wasted). The risk attitude is best chosen through the selection of an indifference probability, for which we refer the reader to Ref. [40]. Finally, the cost function and its parameters are to be selected according to how simulations are run by the modeler. When using Eq. (24), approximations of the parameters can be made by monitoring the resources used to evaluate the initial set of samples. For example, the batch size $b_s$ can be chosen as the maximum number of simulations that can run in parallel, the batch cost $c_b$ as the time it takes to simulate a single batch, and a conservative choice for *α* is 1.5 (because this provides a moderate advantage for sampling in batches).

After problem initiation, we evaluated the response of the five initial samples (Phase 1), represented by the five circles in the left panel of Fig. 4. Next, a GP is trained on these five observations, providing a mean estimation and 95% prediction interval (PI), as shown by the dashed line and shaded region (Phase 2). We find that the metamodel does not meet the accuracy requirement (Phase 3) and thus update the optimal sampling policy by maximizing the UBAV (Phase 4). The optimal sampling policy at the first sampling stage is given as $\pi^* = \{\pi_1^*, \pi_2^*, \pi_3^*\}$, where the first batch contains five samples (squares), and the second (diamonds) and third (upward-pointing triangles) batches contain two samples each.

Next, TABS advances to the next sampling stage by evaluating Eq. (25) at $\pi_1^* = \{0.82, 0.21, 0.28, 0.40, 0.87\}$, updating the metamodel (Phase 2), and concluding that the updated GP model does not meet the accuracy criterion (Phase 3). Consequently, we advance to Phase 4 and update the optimal sampling policy by maximizing the UBAV. This process is repeated for a total of five stages before the accuracy criterion is satisfied.

The final set of samples, the metamodel prediction, and the true function are presented in the right panel of Fig. 4. We observe that an accurate metamodel has been created, as it is difficult to distinguish between the original function and the 95% PI. The samples taken after the first stage (squares) were followed by the diamond, upward-pointing triangle, downward-pointing triangle, and left-pointing triangle at subsequent stages. We would like to draw the reader's attention to the last added sample (left-pointing triangle) and the relatively uniform distance to its neighboring sample points. This is because the decision policy at the first sampling stage had already considered that a sample might be needed in this region, as shown by the diamond on the left. This foresight is an intriguing visualization of how the UBAV positively influences the space-filling property of the final design.

#### 4.1.1 The Influence of Sparse Initial Sampling.

Most adaptive sampling strategies that start with a small number of samples are likely to run into problems associated with sparse data. To exemplify this problem, we present the GP estimation for three samples taken from Eq. (25) in the left panel of Fig. 5. Observe how the mean and the 95% PI quickly converge to a constant. This is a consequence of the negative log-likelihood being unable to accurately estimate the roughness parameter *ω*, as visualized in the right panel of Fig. 5. Note how the value of the negative log-likelihood is relatively constant over the range *ω* ∈ [2, 10], indicating the difficulty of estimating *ω*.
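The shallowness of the likelihood surface under sparse data can be explored with a minimal sketch. The design `X` and responses `y` below are illustrative stand-ins (not Eq. (25)), and the kernel is a simple squared-exponential with the process variance profiled out at its closed-form maximum-likelihood value:

```python
import numpy as np

def concentrated_nll(omega, X, y):
    """Negative log marginal likelihood of a zero-mean GP with kernel
    exp(-omega * (x - x')**2), with the process variance sigma^2
    profiled out at its closed-form maximum-likelihood value."""
    n = len(X)
    K = np.exp(-omega * (X[:, None] - X[None, :]) ** 2) + 1e-8 * np.eye(n)
    Kinv_y = np.linalg.solve(K, y)
    sigma2_hat = float(y @ Kinv_y) / n          # profiled-out variance
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (n * np.log(sigma2_hat) + logdet + n * (1 + np.log(2 * np.pi)))

# Three sparse observations: evaluating the likelihood over a range of
# omega exposes how little the data constrain the roughness estimate.
X = np.array([0.15, 0.5, 0.85])
y = np.array([0.2, 0.9, 0.4])                   # illustrative responses
nll = np.array([concentrated_nll(w, X, y) for w in np.linspace(2, 10, 50)])
spread = nll.max() - nll.min()
```

Scanning `nll` over *ω* ∈ [2, 10] for such a small design is the numerical analogue of the right panel of Fig. 5.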

As a result of sparsity, the metamodel cannot identify whether the observed data originate from a linear or a nonlinear function, and TABS will therefore recommend taking only a single sample. This is a sensible decision, as the optimal number of samples could range from zero to infinity. To understand why TABS makes this decision, consider the uncertainty in the roughness parameter, which Eq. (20) estimated as $\hat{\omega} \sim N(2.6, 3.5 \times 10^{27})$. This distribution is used to estimate the probability of satisfying the accuracy criterion. In the case of such a large variance in the roughness parameter, we find that Eq. (21) is relatively insensitive to the total number of future samples, i.e., $P_t \approx 0.5$ regardless of the decision policy, and thus TABS recommends taking only one sample.

#### 4.1.2 The Influence of Risk Attitude.

To illustrate the influence of the risk attitude, we take a closer look at the optimal sampling policies obtained for approximating Eq. (25) using TABS-3 with various risk attitudes. First, we initiate the problem using five different risk attitudes and otherwise the same initial settings as presented in Table 1. The recommended batch sizes at each stage are presented in the second to fifth columns of Table 2, and the total number of future samples is presented in the sixth. Note that $|\pi_t^*|$ is used to indicate the batch size of the *t*th batch in the optimal sampling policy. A risk-averse attitude considers more samples, as the utility function is less sensitive to the increased cost, while the converse is true for a risk-prone attitude. Considering more samples reduces the space-filling property when only a few samples are required, but improves it when more samples are needed. This observation is indicative of normative decision-making, as it captures the conventional trade-off between improved expected reward and reduced variability.

### 4.2 Two-Dimensional Test Function.

*x*_{1}, *x*_{2} ∈ [0, 1]. The five sampling techniques considered are the maximum mean-squared error approach (MSE), integrated mean-squared error (IMSE), TABS-3, TABS-1, and optimal Latin hypercube sampling (OLHS). The MSE approach uses a GP model to sequentially add samples at the spatial location that maximizes the mean-squared error $\bar{\sigma}^2(\cdot)$; after each sample has been identified, the metamodel is updated. The IMSE approach is similar to the MSE approach but sequentially adds samples that minimize $\int_X \bar{\sigma}^2(\cdot)\,dx$. Furthermore, OLHS is a static sampling approach that has been included as a utopian reference (as if we had known the total number of samples a priori).
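A minimal sketch of the MSE acquisition step is given below, shown in one dimension for brevity and with a fixed squared-exponential kernel; the hyperparameter refitting that follows each new sample is omitted, and `omega` and `sigma2` are illustrative values, not those used in the paper:

```python
import numpy as np

def gp_posterior_var(X, x_cand, omega=5.0, sigma2=1.0, nugget=1e-8):
    """Posterior variance of a zero-mean GP with squared-exponential
    kernel k(x, x') = sigma2 * exp(-omega * (x - x')**2)."""
    def k(a, b):
        return sigma2 * np.exp(-omega * (a[:, None] - b[None, :]) ** 2)
    K = k(X, X) + nugget * np.eye(len(X))
    Ks = k(X, x_cand)                       # cross-covariance, shape (n, m)
    v = np.linalg.solve(K, Ks)
    return sigma2 - np.sum(Ks * v, axis=0)  # k(x,x) - k(x,X) K^-1 k(X,x)

# MSE-style sequential sampling: repeatedly add the candidate point
# with the largest posterior (mean-squared) prediction error.
X = np.array([0.1, 0.5, 0.9])               # current 1D design
cand = np.linspace(0.0, 1.0, 101)           # candidate grid
for _ in range(3):
    var = gp_posterior_var(X, cand)
    X = np.append(X, cand[np.argmax(var)])
```

The IMSE variant replaces the `argmax` over the pointwise variance with a minimization of the variance integrated (in practice, averaged) over the candidate grid.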

TABS-3 is used to find the total number of samples required to approximate Eq. (26) based on the initial settings presented in Table 3. Initiating the same problem with 20 unique initial designs, TABS-3 found that the optimal number of samples $n$ was either 13, 14, or 15. For each of the 20 simulations, TABS-1 selected the first $n$ samples of an FSSF design, exactly replicating the design that would have been achieved if TABS-1 required $n$ samples. In addition, the MSE and IMSE approaches started from the same initial design as TABS-3 and sequentially added samples until the design contained $n$ samples. Finally, OLHS was used to create a static design containing $n$ samples. Note that this implies that all designs have the same number of samples, facilitating a fair comparison between metamodel accuracy.

For comparison, we evaluated Eq. (26) at an additional 100 samples and used those to estimate the normalized root-mean-squared error (NRMSE), as presented in Fig. 6. From this figure, we observe that both TABS-3 and TABS-1 outperform the MSE and IMSE approaches, which do not consider the influence of potential future samples. What is more, we observe that OLHS has an improved median performance over TABS, corroborating the assertion that static sampling approaches have improved space-filling properties over adaptive sampling approaches. Finally, we observe that TABS-1 outperforms TABS-3, a result that is likely due to the use of the FSSF sequence, as such designs are known to be similar for low-dimensional realizations with few samples (thus resulting in small variation in NRMSE).
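For reference, one common convention for the NRMSE normalizes the root-mean-squared error over the validation set by the range of the true responses, so that accuracy is comparable across test functions; the exact normalization used here is not restated in this section, so the helper below should be read as an assumption:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root-mean-squared error normalized by the range of the true
    responses (one common convention; assumed, not taken from the paper)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (np.max(y_true) - np.min(y_true))
```

A perfect emulator gives an NRMSE of zero, and the normalization makes the metric unitless.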

### 4.3 Engineering Problem: Design of an Engine Piston.

In this section, we apply TABS to the approximation of a piston design simulation as analyzed in Ref. [55]. This is a six-dimensional dynamic computer simulation that predicts the sound level resulting from the secondary motion of an engine piston.

In addition to the methods used in Sec. 4.2, we also consider a bin-based batch sampling technique that, at each stage, evaluates *b*_{s} samples from an FSSF sequence until the desired model accuracy is achieved. Using the initial settings presented in Table 4, we created 20 unique designs with each of the six methods. The performance of each method is compared in terms of sampling cost (Fig. 7(a), obtained from the cost function), model accuracy (Fig. 7(b)), number of samples (Fig. 7(c)), and computational cost (Fig. 7(d)).

Concerning sampling cost, we find that TABS-1 and TABS-3 perform nearly as well as our utopian design (OLHS). In addition, bin-based batch sampling is slightly worse than TABS-1 and TABS-3, while the MSE and IMSE approaches are significantly worse.

As for model accuracy, we retained 200 samples for validation and used these to estimate the NRMSE. It is found that TABS-1, TABS-3, OLHS, and bin-based batch sampling have similar accuracy and outperform the IMSE and MSE adaptive sampling approaches. The MSE approach, in particular, performs poorly in terms of accuracy, as it has a tendency to allocate more samples at the boundary of the design space. However, this is not strictly a fair comparison, as the different approaches require different numbers of samples, as can be observed from Fig. 7(c).

The time spent on maximizing the acquisition/UBAV functions (i.e., the computational efficiency of the sampling methods) is presented in Fig. 7(d): TABS-1 has a significant advantage over TABS-3 while performing similarly to conventional one-sample-at-a-time adaptive sampling strategies (i.e., MSE and IMSE). This observation corroborates the assertion made in Sec. 3.2 that TABS-1 has a numerical complexity similar to that of adaptive sampling methods that use an acquisition function as defined by Eq. (1). Moreover, bin-based batch sampling and the OLHS approach are the most efficient, as only one optimization problem has to be solved during the entire process, while the MSE and IMSE methods solve an optimization problem during each sampling stage. Because TABS enables samples to be taken in batches, fewer sampling stages are required, and thus fewer optimizations need to be performed than with the MSE or IMSE approach. Consequently, we find that TABS-1 scales slightly better in terms of numerical tractability with respect to problem dimensionality than the MSE and IMSE approaches, while TABS-3 performs significantly worse as it needs to solve a nested optimization problem.

To further explore the difference between TABS-1 and TABS-3, one has to consider the difference in batch size at each sampling stage. Accordingly, the mean and standard deviation of the number of samples taken at each stage by TABS-1, TABS-3, and bin-based batch sampling are presented in Fig. 8. We observe that both TABS-1 and TABS-3 have a propensity to allocate larger batches during the first few sampling stages and gradually decrease the batch size at later stages. This is an appealing property, as the risk of over-sampling becomes larger at later sampling stages (when the design contains more sample points). Moreover, in the worst case, bin-based batch sampling required eight sampling stages, while the worst case of TABS-1 and TABS-3 required only five sampling stages. In addition, during the first stage, TABS-3 has, on average, larger sampling batches than TABS-1. This is likely a result of Decision II in TABS-3, where the sampling order is selected so as to minimize the incremental improvement (in terms of the probability of satisfying the accuracy criterion) between the latter samples in the sampling policy. Conversely, the incremental improvement in the probability of satisfying the accuracy criterion between earlier samples will be larger, and thus those samples are more likely to be included in the first sampling batch. Furthermore, because TABS-3 includes more samples in the first sampling stage, it requires fewer samples in consecutive stages, and thus the average batch size of TABS-3 at stages 2 and 3 is smaller than that of TABS-1.

We recommend the use of TABS when there is a potential gain from running simulations in parallel, in batches, or both. Moreover, on closer inspection, we find a slight advantage of TABS-3 over TABS-1 in terms of accuracy and sampling cost. This observation is likely due to the simplification that TABS-1 enforces on the approximation of Eq. (19). Therefore, we recommend the use of TABS-3 when the problem dimension is low (fewer than four inputs); otherwise, we recommend the use of TABS-1 for its superior temporal performance.

## 5 Conclusion

This paper presents a thrifty adaptive batch sampling (TABS) approach for global metamodeling. The contribution of TABS is threefold: (i) through the integration of normative decision theory, TABS achieves a balance between the cost of sampling and the gain in metamodel accuracy that is consistent with a modeler's preference and risk attitude; (ii) during each sampling stage, TABS finds an optimal sampling policy that considers samples at the current stage as well as samples at future stages; and (iii) TABS provides notable flexibility in that it finds an optimal sampling policy that includes the total number of sampling stages, the total number of samples per stage, and the spatial location of each sample.

For a specific state (i.e., the currently available data set), TABS identifies an optimal sampling policy by predicting the metamodel accuracy at future states while considering the associated sampling cost. Predicting future states introduces uncertainty and complicates the decision-making process, while the large variability in the decision space of the sampling policy impedes numerical tractability. These issues have been addressed through the development of a dynamic-programming-inspired acquisition function, for which we proposed two approximation techniques, TABS-1 and TABS-3. The most comprehensive approach is TABS-3, which makes three consecutive decisions at each sampling stage: (i) where to sample and how many samples to consider; (ii) in what order to sample; and (iii) how many sampling stages there should be and how many samples per stage. In contrast, TABS-1 reduces numerical complexity by using a low-discrepancy sequence to decide the optimal sampling location and sampling order for each sample in a candidate set, prior to the initiation of the adaptive sampling technique. Consequently, during each sampling stage, TABS-1 only needs to decide the optimal number of stages and the optimal number of samples to take at each stage.

Through the application of TABS to two test problems and one engineering problem, it is shown that TABS-1 and TABS-3 are comparable with or outperform conventional adaptive sampling techniques. In terms of sampling cost, both TABS-1 and TABS-3 hold an advantage over conventional approaches (in particular, "one sample at a time" methods), with TABS-3 performing slightly better than TABS-1. However, in terms of computational cost, we show that TABS-1 scales at a similar rate as conventional adaptive sampling methods with respect to function dimensionality, while TABS-3 scales significantly worse as a result of its nested optimization loop.

TABS has specifically been developed under the broad framework of spatial random processes, in particular, Gaussian process (GP) models. Consequently, the proposed method inherits similar features as GP models. In general, GP methods are powerful for nonlinear but differentiable functions. Concerning noisy simulation functions, the proposed framework can readily be extended to problems with known homoscedastic and known heteroscedastic noise variance. However, if the noise variance is not known a priori, we need to include it as a hyperparameter in our metamodel and consider its uncertainty in the prediction of future states. This problem is further complicated when a simulation model has unknown heteroscedastic noise. The challenge of global metamodeling of computer simulations with unknown heteroscedastic noise will be considered in future work. Moreover, the use of TABS is only warranted when simulations are time-consuming, it is advantageous to run simulations in parallel, and there is an advantage in minimizing the total number of sampling stages. If running simulations in parallel offers no advantage, TABS will perform similarly to "one sample at a time" techniques. If it is not advantageous to reduce the number of sampling stages, TABS will perform similarly to conventional batch sampling techniques.

## Acknowledgment

The support from AFOSR FA9550-18-1-0381 and NSF CMMI-1537641 is greatly appreciated.

## Nomenclature

- *x* = {*x*_{1}, *x*_{2}, …, *x*_{d}} = *d*-dimensional vector of sampling locations
- *a*_{t} = $\{x_{t,1}, x_{t,2}, \ldots, x_{t,m_t}\}$ = a batch of *m*_{t} samples observed at sampling stage *t*
- *p*_{t} = conditional probability of satisfying the metamodel accuracy criterion at sampling stage *t*, given that it was not satisfied at sampling stage *t* − 1
- $s$ = the state of our metamodel at sampling stage *t*, containing all samples *a*_{i} and responses *y*_{i} for *i* = 1, …, *t*
- *x*_{c} = candidate set of TABS-1, obtained through a low-discrepancy sequence
- *y*_{t} = $\{y_{t,1}, y_{t,2}, \ldots, y_{t,m_t}\}$ = a batch of *m*_{t} responses observed at sampling stage *t* (i.e., *f*(*a*_{t}))
- $P_t$ = probability that future state $s$ is the first to satisfy the metamodel accuracy criterion
- $F_{eq}$ = available monetary or temporal budget for sampling
- *c*_{b}, *b*_{s}, *α* = constants in the cost function: cost per batch, batch size, and tuning parameter
- *f*(·) = costly simulation model
- $n_f, n_p$ = number of total future and past samples, respectively
- *r*_{t}(·) = reward function at sampling stage *t*
- *w*, *λ* = disturbance of Gaussian process model hyperparameters and unobserved responses, respectively
- *C*_{t}(·) = cost function at sampling stage *t*
- *D*_{1}, *D*_{2} = decision vectors after solving Decisions I and II, respectively
- *R*(·) = value function as an element of normative decision-making
- *U*(·) = utility function capturing a modeler's risk attitude
- $V_\pi(\cdot)$ = action value function of dynamic programming
- *I*(·) = Gaussian process Fisher information matrix
- $X_f^*(\cdot)$ = a set of $n_f$ future samples dispersed over the sampling space by maximizing the minimum distance
- $L(\cdot)$ = Gaussian process likelihood function
- $\bar{V}_\pi(\cdot)$ = utility-based action value function
- *β*, *ω*, *σ*^{2} = Gaussian process hyperparameters
- Γ(·) = system dynamic that predicts a future state
- *ɛ* = desired maximum metamodel accuracy
- *θ*_{t} = disturbance in the prediction of future state *t*
- $\bar{\mu}(\cdot), \bar{\sigma}(\cdot)$ = mean and error estimation of the metamodel
- *π*_{t} = $\{x_{t,1}, x_{t,2}, \ldots, x_{t,m_t}\}$ = a recommended batch of samples at future sampling stage *t*
- *π* = $\{\pi_1, \pi_2, \ldots, \pi_\tau\}$ = a sampling policy
- *τ* = total number of sampling stages in sampling policy *π*
- Π = the set of all admissible sampling policies
- *ψ*(·) = a metric for metamodel accuracy