## Abstract

We introduce a novel method for Gaussian process (GP) modeling of massive datasets called globally approximate Gaussian process (GAGP). Unlike most large-scale supervised learners such as neural networks and trees, GAGP is easy to fit and can interpret the model behavior, making it particularly useful in engineering design with big data. The key idea of GAGP is to build a collection of independent GPs that use the same hyperparameters but randomly distribute the entire training dataset among themselves. This is based on our observation that the GP hyperparameter approximations change negligibly as the size of the training data exceeds a certain level, which can be estimated systematically. For inference, the predictions from all GPs in the collection are pooled, allowing the entire training dataset to be efficiently exploited for prediction. Through analytical examples, we demonstrate that GAGP achieves very high predictive power matching (and in some cases exceeding) that of state-of-the-art supervised learning methods. We illustrate the application of GAGP in engineering design with a problem on data-driven metamaterials, using it to link reduced-dimension geometrical descriptors of unit cells and their properties. Searching for new unit cell designs with desired properties is then achieved by employing GAGP in inverse optimization.

## 1 Introduction

Fueled by recent advancements in high-performance computing as well as data acquisition and storage capabilities (e.g., online repositories), data-driven methods are increasingly employed in engineering design [13] to efficiently explore the design space of complex systems by obviating the need for expensive experiments or simulations. For emerging material systems, in particular, large datasets have been successfully leveraged to design heterogeneous materials [48] and mechanical metamaterials [912].

Key to data-driven design is to develop supervised learners that can distill as much useful information from massive datasets as possible. However, most large-scale learners such as deep neural networks (NNs) [13] and gradient boosted trees (GBT) [14] are difficult to interpret and hence less suitable for engineering design. Gaussian process (GP) models (also known as Kriging) have many attractive features that underpin their widespread use in engineering design. For example, GPs interpolate the data, have a natural and intuitive mechanism to smooth the data to address noise (i.e., to avoid interpolation) [15], and are very interpretable (i.e., provide insight into input–output relations) [16,17]. In addition, they quantify prediction uncertainty and have analytical conditional distributions that enable, e.g., tractable adaptive sampling or Bayesian analysis [18]. However, conventional GPs are not readily applicable to large datasets and have been mostly confined to engineering design with small data. The goal of our work is to bridge the gap between big data and GPs while achieving high predictive accuracy.

The difficulty in fitting GPs to big data is rooted in the repetitive inversion of the sample correlation matrix, R, whose size equals the number of training samples, n. Given the practical features and popularity of GPs, considerable effort has been devoted to resolving their scalability shortcoming. One avenue of research has explored partitioning the input space (and hence the training data) via, e.g., trees [19] or Voronoi cells [20], and fitting an independent GP to each partition. While particularly useful for small to relatively large datasets that exhibit the nonstationary behavior, prediction using these methods results in discontinuity (at the partitions’ boundaries) and information loss (because the query point is associated with only one partition). Projected process approximation (PPA) [21] is another method where the information from n samples is distilled into mn randomly (or sequentially) selected samples through conditional distributions. PPA is very sensitive to the m selected samples, however, and overestimates the variance [21]. In Bayesian committee machine (BCM) [22], the dataset is partitioned into p mutually exclusive and collectively exhaustive parts with independent GP priors, and then, the predictions from all the GPs are pooled together in a Bayesian setting. While theoretically very attractive, BCM does not scale well with the dataset size and is computationally very expensive.

Another avenue of research has pursued subset selection. For example, a simple strategy is to only use mn samples to train a GP [23,24], where the m samples are selected either randomly or sequentially based on maximizing some criteria such as information gain or differential entropy score. Reduced-rank approximation of R with mn samples is another option for subset selection and has been used in the Nystrom [25] and subset of regressors [26,27] methods. The m samples in these methods are chosen randomly or in a greedy fashion to minimize some cost function. While the many variants of subset selection may be useful in some applications, they waste information and are not applicable to very large datasets due to the computational and storage costs. Local methods also use subsets of the data because they fit a stationary GP (for each prediction) to a very small number of training data points that are closest to the query point. Locally approximate Gaussian process (LAGP) [28] is perhaps the most widely recognized local method where the subsets are selected either based on their proximity to the query point or to minimize the predictive variance. Despite being useful for nonstationary and relatively large datasets, local methods also waste some information and can be prohibitively expensive for repetitive use since local samples have to be found and a GP must be fitted for each prediction.

Although the recent works have made significant progress in bridging the gap between GPs and big data, GPs still struggle to achieve the accuracy of the state-of-the-art large-scale supervised learners such as NNs and trees. Motivated by this limitation, we develop a computationally stable and inexpensive approach for GP modeling of massive datasets. The main idea of our approach is to build a collection of independent GPs that utilize a converged roughness parameter as their hyperparameters. This is based on an empirical observation that the estimates of the GP hyperparameters change negligibly as the size of the training data exceeds a certain level. While having some common aspects with a few of the abovementioned works, our method is more massively scalable, can leverage multicore or graphical processing unit computations [29,30], and is applicable to very high-dimensional data with or without noise.

As mentioned earlier, big data have enticed new design methods for complex systems such as metamaterials [912], which possess superior properties through their hierarchical structure that consists of repeated unit cells. While traditional methods like topology optimization (TO) provide a systematic computational platform to find metamaterials with unprecedented properties, they have many challenges that are primarily due to the high-dimensional design space (i.e., the geometry of unit cells), computational costs, local optimality, and spatial discontinuities across unit cell boundaries (when multiple unit cells are simultaneously designed). Techniques for TO such as varying the volume fraction or size of one unit cell to maintain continuous boundaries [31,32], adding connectivity constraints [33], and substructuring [34] have recently been proposed but cannot fully address all of the above challenges. Instead, we take a data-driven approach by first building a large training database of many unit cells and their corresponding properties. Unlike previous data-driven works that represent unit cells as signed distance fields [9] or voxels [11], we drastically reduce the input dimension in our dataset by characterizing the unit cells via spectral shape descriptors based on the Laplace–Beltrami (LB) operator. Then, we employ our globally approximate Gaussian process (GAGP) modeling approach to link the LB descriptors of unit cells to their properties and, in turn, efficiently discover new unit cells with desired properties.

The rest of the paper is organized as follows. We first review some preliminaries on GP modeling in Sec. 2 and then introduce our novel idea in Sec. 3. In Sec. 4, we validate the accuracy of our approach by comparing its performance against three popular and large-scale supervised learning methods on five analytical problems. We demonstrate an application of GAGP to our data-driven design method for metamaterials in Sec. 5 and conclude the paper in Sec. 6.

## 2 Review on Gaussian Process Modeling

Below, we describe how GP emulators (also known as surrogates, metamodels, or models) can replace a computer simulator. The procedure is identical if the data are obtained from physical experiments. Let us denote the output and inputs of a computer simulator by, respectively, y and the d dimensional vector x = [x(1), x(2), …, x(d)]T, where x ∈ ℝd. Assume the input–output relation is a realization of the random process η(x):
$η(x)=∑i=1h⁡β(i)fi(x)+ξ(x)$
(1)
where fi(x)’s are some predetermined set of basis functions, β = [β(1), …., β(h)]T are unknown weights, and ξ(x) is a zero-mean GP characterized with its parametric covariance function, c( · , · ):
$cov(ξ(x),ξ(x′))=c(x,x′)=σ2r(x,x′)$
(2)
where r( · ) is the correlation function having the property r(x, x) = 1 and σ2 is the process variance. Various correlation functions have been developed in the literature, with the most widely used one being the Gaussian correlation function:
$r(x,x′)=exp⁡{−(x−x′)TΩ(x−x′)}$
(3)
where $Ω=diag(10ω)$ and ω = [ω(1), ω(2), …, ω(d)]T, − ∞ < ωi < ∞ are the roughness or scale parameters. The collection of σ2 and ω are called the hyperparameters.

With the formulation in Eq. (1) and given the n training pairs of (xi, yi), GP modeling requires finding a point estimate for β, ω, and σ2 via either maximum likelihood estimation (MLE) or cross-validation (CV). Alternatively, Bayes’ rule can be employed to find the posterior distributions if there is prior knowledge on these parameters. Herein, we use a constant process mean (i.e., $∑i=1h⁡βifi(x)=β$) and employ MLE. These choices are widely practiced because a high predictive power is provided while computational costs are minimized [28,3539].

MLE requires maximizing the multivariate Gaussian likelihood function, or equivalently:
$[β^,σ^2,ω^]=argminβ,σ2,ω(n2log⁡(σ2)+12log⁡(|R|)+12σ2(y−1β)TR−1(y−1β))$
(4)
where log( · ) is the natural logarithm, 1 is an n × 1 vector of ones, and R is the n × n correlation matrix with (i, j)th element Rij = r(xi, xj) for i, j = 1, …, n. Setting the partial derivatives with respect to β and σ2 to zero yields:
$β^=[1TR−11]−11TR−1y$
(5)
$σ^2=1n(y−1β^)TR−1(y−1β^)$
(6)
Plugging these values into Eq. (4) and eliminating the constants:
$ω^=argminωnlog⁡(σ^2)+log⁡(|R|)=argminωL$
(7)

By numerically minimizing L in Eq. (7), one can find $ω^$. Many global optimization methods such as genetic algorithm (GA) [40], pattern searches [41,42], and particle swarm optimization [43] have been employed to solve for $ω^$ in Eq. (7). However, gradient-based optimization techniques are commonly preferred due to their ease of implementation and superior computational efficiency [15,16,35]. To guarantee global optimality in this case, the optimization is done numerous times with different initial guesses. It is noted that, in practice, the search space of ωi is generally limited to [− 20, 5] rather than ( − ∞, ∞) since the correlation exponentially changes as a function of ωi (see also Fig. 3).

On completion of MLE, the following closed-form formula can be used to predict the response at any x*:
$y^(x*)=β^+gT(x*)V−1(y−1β^)$
(8)
where $g(x*)$ is an n × 1 vector with ith element $c(xi,x*)=σ^2r(xi,x*)$, V is the covariance matrix with (i, j)th element $σ^2r(xi,xj)$, and y = [y1, …, yn]T are the responses in the training dataset. The posterior covariance between the responses at $x*$ and x′ reads:
$cov(y*,y′)=c(x*,x′)−gT(x*)V−1g(x′)+hT(1TV−11)−1h$
(9)
where h = (11TV−1g(x′)).
If the training dataset has multiple outputs, one may fit either a single-response GP emulator to each response or a multiresponse GP to all the responses. We follow Ref. [44] and extend the above formulations to simulators with q responses by placing a constant mean for each response (i.e., β = [β(1), …., β(q)]T) and employing the separable covariance function:
$cov(ξ(x),ξ(x′))=c(x,x′)=Σ⊗r(x,x′)$
(10)
where ⊗ denotes the Kronecker product and $Σ$ is the q × q process covariance matrix with its off-diagonal elements representing the covariance between the corresponding responses at any fixed x. The MLE approach described above can also be applied to multiresponse datasets in which case σ2 will be replaced with $Σ$ (see Refs. [4548] for details).

Finally, we note that GPs can address noise and smooth the data (i.e., avoid interpolation) via the so-called nugget or jitter parameter, δ, in which case R is replaced with $Rδ=R+δIn×n$. If δ is used, the estimated (stationary) noise variance in the data would be $δσ^2$. We have recently developed an automatic method to robustly detect and estimate noise [35].

## 3 Globally Approximate Gaussian Process

Regardless of the optimization method used to solve for $ω^$, each evaluation of L in Eq. (7) requires inverting the n × n matrix R. For very large n, there are two main challenges associated with this inversion: computational cost of approximately O(αn3) and singularity of R (since the samples get closer as n increases). To address these issues and enable GP modeling of big data, our essential idea is to build a collection of independent GPs that use the same $ω^$ and share the training data among themselves.

To illustrate, we consider the function y = x4x3 − 7x2 + 3x + 5sin(5x) over −2 ≤ x ≤ 3. The associated likelihood profile (i.e., L) is visualized in Fig. 1 as a function of ω for various values of n. Two interesting phenomena are observed in this figure: (i) With large n, the profile of L does not alter as the training samples change. To observe this, for each n, we generate five independent training samples via Sobol sequence [49,50] and plot the corresponding L. As illustrated in Fig. 1, even though a total of 20 curves are plotted, only four are visible since the five curves with the same n are indistinguishable. (ii) As n increases, L is minimized at similar ω’s.

While we visualize the above two points with a simple 1D function, our studies indicate that they hold in general (i.e., irrespective of problem dimensionality and the absence or presence of noise; see Sec. 4) as long as the number of training samples is large. Therefore, we propose the following approach for GP modeling of large datasets.

Assuming a very large training dataset of size n is available, we first randomly select a relatively small subset of size n0 (e.g., n0 = 500) and estimate $ω^0$ with a gradient-based optimization technique. Then, we add ns random samples (e.g., ns = 250) to this subset and estimate $ω^1$ while employing $ω^0$ as the initial guess in the optimization. This process is stopped after s steps when $ω^$ does not change noticeably (i.e., $ω^s≅ω^s−1$) as more training data are used. The latest solution, denoted by $ω^∞$, is then employed to build m GP models, each with nkn0 + s × ns samples chosen randomly from the entire training data such that $n=∑k=1m⁡nk$. Here, we have assumed that the collection of these GPs (who have $ω^∞$ as their hyperparameters) approximate a GP that is fitted to the entire training dataset and, correspondingly, call it GAGP. The algorithm of GAGP is presented in Fig. 2.

We point out the following important features regarding GAGP. First, we recommend using gradient-based optimizations throughout the entire process because (i) if n0 is large enough (e.g., n0 > 500), one would need to select only a few initial guesses to find the global minimizer of Eq. (7), i.e., $ω^0$ (we suggest the method developed in Ref. [35] to estimate $ω^0$); and (ii) we want to use $ω^i−1$ as the initial guess for the optimization in the ith step to ensure fast convergence since the minimizer of L changes slightly as the dataset size increases (see Fig. 1). Regarding the choice on n0, note that it has to be small enough to avoid prohibitive computational time but large enough, so that (i) the global optimum changes slightly if n0 + s data points are used instead of n0 data points, and (ii) most (if not all) of the local optima of L are smoothed out. Second, for predicting the response, Eq. (8) is used for each of the m GP models, and then, the results are averaged. In our experience, we observe very similar prediction results with different averaging (e.g., weighted averaging where the weights are proportional to inverse variance) or pooling (e.g., median) schemes. The best scheme for a particular problem can be found via CV, but we avoid this step to ensure ease of use and generality. The advantages of employing a collection of models (in our case the m GPs) in prediction are extensively demonstrated in the literature [14,22]. Third, the predictive power is not sensitive to n0, s, and ns so long as large enough values are used for them. For novice users, we recommend starting with n0 = 500, s = 6, and ns = 250, and equally distributing the samples among the m resulting GPs (we use these parameters in Sec. 5 and for all the examples in Sec. 4). For more experienced users, we provide a systematic way in Sec. 4 to choose these values based on GP’s inherent ability to estimate noise using the nugget variance. Finally, we point out that GAGP has a high predictive power and is applicable to very large datasets while maintaining a straightforward implementation because it only entails integrating a GP modeling package such as GPM [35] with the algorithm presented in Fig. 2.

## 4 Comparative Studies on Analytical Examples

To validate the performance of GAGP in regression, we compare its predictive power on five examples (Ex1–5) against recognized big data learners: LAGP [28], GBT [14], and feed-forward NNs [51]. As shown in Eqs. (11)(15), the examples cover a wide range of dimensionality and input–output complexity. Ex1 is a 1D function with many local fluctuations due to the sin(5x) term. Ex2 is a 4D function that is primarily sensitive to the first two inputs. Ex3 is a 7D function that models the cycle time of a piston, which mainly depends on the first and fourth inputs. Ex4 is a very complex 10D function relating the stored potential in a material to the applied load and material properties; the inputs interact nonlinearly and all affect the response. Finally, Ex5 is a 20D function where the output is independent of x8 and x16.
• Ex1:
$y=x4−x3−7x2+3x+5sin⁡(5x)−2≤x≤3$
(11)
• Ex2 [52]:
$y=(1−exp⁡(−12x2))(x3x13+1900x12+2092x1+60)x4x13+500x12+4x1+20min(x)=[0,0,2200,85]max(x)=[1,1,2400,110]$
(12)
• Ex3 [53]:
$y=2πx1x4+4x3x4x5x6(x5x2+19.62x1−x4x3x2)2+4x4x5(x6x7)x7min(x)=[30,0.005,0.002,1000,9×104,290,340]max(x)=[60,0.02,0.01,5000,11×104,296,360]$
(13)
• Ex4 [54]:
$y=92x9εm2+x8x101+x7[εeqx10]1+x7ε=(x1x6x5x6x2x4x5x4x3),εm=13Tr(ε),εd=ε−εm1,εeq=23(εd:εd)min(x)=[−0.1,−0.1,−0.1,−0.1,−0.1,−0.1,0.02,1,5,0.1]max(x)=[0.1,0.1,0.1,0.1,0.1,0.1,0.1,5,25,1.5]$
(14)
• Ex5 [53]:
$y=5x121+x1+5(x4−x20)2+x5+40x193−5x1+0.05x2+0.08x3−0.03x6+0.03x7−0.09x9−0.01x10−0.07x11+0.25x132−0.04x14+0.06x15−0.01x17−0.03x18−0.5≤xi≤0.5fori=1,…20$
(15)

For each example, two independent and unique datasets of size 30,000 are generated with Sobol sequence [50], where the first one is used for training and the second for validation. In each example, Gaussian noise is added to both the training and validation outputs. We consider two noise levels to test the sensitivity of the results where the noise standard deviation (SD) is determined based on each example’s output range (e.g., the outputs in Ex1 and Ex4 fall in the [−20, 5] and [0, 1.8] ranges, respectively). As we measure performance by root mean squared error (RMSE), the noise SD should be recovered on the validation dataset (i.e., the RMSE would ideally equal noise SD).

We use CV to ensure the best performance is achieved for LAGP, GBT, and NN. For GAGP, we choose n0 = 500, s = 6, and ns = 250 and equally distribute the samples among the $m=30000/(500+6×250)=15$ GPs (i.e., each GP has 2000 samples). The results are summarized in Table 1 (for small noise SD) and Table 2 (for large noise SD) and indicate that (i) GAGP consistently outperforms LAGP and GBT, (ii) both GAGP and NN recover the true amount of added noise with high accuracy, and (iii) GAGP achieves very similar results to NN. Given the large number of data points, the effect of sample-to-sample randomness on the results is very small and hence not reported.

We highlight that the performance of GAGP in each case could have been improved even further by tuning its parameters via CV (which was done for LAGP, GBT, and NN). Potential parameters include n0, s, ns, and fi(x). However, we intentionally avoid this tuning to demonstrate GAGP’s flexibility, generality, and ease of use.

In engineering design, it is highly desirable to employ interpretable methods and tools that facilitate the knowledge discovery and decision-making processes. Contrary to many supervised learning techniques such as NNs and random forests that are black boxes, the structure of GPs can provide qualitative insights. To demonstrate, we rewrite Eq. (3) as $r(x,x′)=exp⁡{−∑i=1d⁡10ωi(x(i)−x(i)′)2}$. If ωi ≪ 0 (e.g., ωi = −10), then variations along the ith dimension (i.e., x(i)) do not contribute to the summation and, subsequently, to the correlation between x and x′ (see Fig. 3 for a 1D illustration). This contribution increases as the magnitude of ωi increases. In a GP with a constant mean of β, all the effect of inputs on the output is captured through r(x, x′). Hence, as ωi decreases, the effect of xi on the output decreases as well. We illustrate this feature with a 2D example as follows. Assume $y=f(x1,x2;α)=sin⁡(2x1x2)+αcos⁡(αx12)$, − πx1, x2π for α = 2, 4, 6. Three points regarding f are highlighted:

1. x1 is more important than x2 since both sin(2x1x2) and $αcos⁡(αx12)$ depend on x1 (note that α ≠ 0), while x2 only affects the first term.

2. As α increases, the relative importance of x1 (compared with x2) increases because the amplitude of $αcos⁡(αx12)$ increases.

3. As α increases, y depends on x1 with growing nonlinearity because the frequency of $αcos⁡(αx12)$ increases.

The first two points can be verified by calculating Sobol’s total sensitivity indices (SIs) for x1 and x2 in f; see Table 3. These indices range from 0 to 1, with higher values indicating more sensitivity to the input. Here, the SI of x1 is always 1, but the SI of x2 decreases as α increases, indicating that the relative importance of x1 on y increases as α increases.

Note that calculating the Sobol’s SIs involves evaluating f for hundreds of thousands of samples, while a GP can distill similar sensitivities from a dataset. To show this, for each α, we fit two GPs: one with n = 1000 training data and the other with n = 2000. The hyperparameter estimates are summarized in Table 3 and indicate that:

• For each α, $ω^1$ is larger than $ω^2$, implying x1 is more important than x2.

• As α increases, $ω^1$ increases (x1 becomes more important), while $ω^2$ changes negligibly (the underlying functional relation between x2 and y does not depend on α).

• For a given α, the estimates change insignificantly when n is increased.

The above feature is present in GAGP as well and depicted by the convergence histories for Ex3 and Ex5 in Figs. 4 and 5, respectively. Similar to Fig. 1, it is evident that the estimated roughness parameters do not change noticeably as more samples are used in training (only 6 of the 20 roughness parameters are plotted in Fig. 5 for a clearer illustration). The values of these parameters can determine which inputs (and to what extent) affect the output. For instance, in Ex5, ω8 is very small so the output must be almost insensitive to x8. In addition, since ω4ω20, it is expected that the corresponding inputs should affect y similarly. These observations agree with the analytical relation between x and y in Ex5, where y is independent of x8 and symmetric with respect to x4 and x20. By using GAGP, such information can also be extracted from a training dataset whose underlying functional relation is unknown and subsequently used for sensitivity analysis or dimensionality reduction (e.g., in Ex5, x8 and x16 can be excluded from the training data).

In Figs. 4 and 5, the estimated variance, $δσ^2$, varies closely around the true noise variance. It provides a useful quantitative measure for the expected predictive power (e.g., RMSE in future uses of the model). In addition, like $ω^$, its convergence history helps in determining whether sufficient samples have been used in training. First, the number of training samples should be increased until $δσ^2$ does not fluctuate noticeably. Second, via k-fold CV during training, the true noise variance should ideally be recovered by calculating the RMSE associated with predicting the samples in the ith fold (when fold i is not used in training). If these two values differ significantly, s (or ns) should be increased. For instance, if the fluctuations on the right panel in Fig. 5 had been large or far from the noise variance, we would have increased s (from 6 to, e.g., 10) or ns (from 250 to, e.g., 500).

We close this section with some theoretical discussions related to the use of the same hyperparameters within a collection of independent GPs. In Bayesian experimental design [55], multidimensional integrals ubiquitously arise when maximizing the expected information gain, i.e., E[I]. By quantifying I using Kullback–Leibler divergence [56], it can be shown [57] that $E[I]=∫(∫log⁡(p(θ|yi)p(θ))p(θ|yi)dθ)p(yi)dyi$, where θ are the hyperparameters (to be estimated), yi are the observables in the ith experiment, and p( · ) is the probability density function. The nested integral renders maximizing E[I] prohibitively expensive. To address this and facilitate integration, Laplace’s theorem is used to approximate p(θ|yi) via a multivariate Gaussian likelihood or log likelihood, such as in Eq. (4). Following the central limit theorem, the accuracy of this approximation increases as the number of data points increases since the likelihood would more closely resemble a unimodal multivariate Gaussian curve [58]. With GAGP, we essentially make the same approximation, i.e., Eq. (4) approximates a unimodal multivariate Gaussian curve in the log space whose minimizer insignificantly changes when the training data are massive (note that the function value, L, does change; see Fig. 1).

## 5 Data-Driven Design of Metamaterials

To demonstrate the application of GAGP in engineering design, we employ it in a new data-driven method for the optimization of metamaterial unit cells using big data. Although various methods, e.g., TO and GA, have been applied to design metamaterials with prescribed properties, these are computationally intensive and suffer from sensitivity to the initial guess as well as disconnected boundaries when using multiple unit cells. A promising solution is to construct a large database of precomputed unit cells (also known as microstructures or building blocks) for efficient selection of well-connected unit cells from the database as well as inexpensive optimization of new unit cells [912]. However, with the exception of Ref. [12] where unit cells are parameterized via geometric features like beam thickness, research in this area thus far use high-dimensional geometric representations (e.g., signed distance functions [9] or voxels [11]) that increase the memory demand and the complexity of constructing machine learning models that link structures to properties. Therefore, reducing the dimension of the unit cell is a crucial step.

In this work, we reduce the dimension of the unit cells in our metamaterial database with spectral shape descriptors based on the LB operator. We then employ GAGP to learn how the effective stiffness tensor of unit cells changes as a function of their LB descriptors. After the GAGP model is fitted, we use it to discover unit cells with desired properties through inverse optimization. Furthermore, to present the advantages of a large unit cell database and GAGP, we compare the results with those obtained using an NN model fitted to the same dataset and a conventional GP model fitted to a smaller dataset.

### 5.1 Metamaterials Database Generation.

We propose a novel two-stage pipeline inspired by Ref. [11] to generate a large training dataset of unit cells and their corresponding homogenized properties. For demonstration, our primary properties of interest are the components of the stiffness tensor, Ex, Ey, and Exy. As elaborated below, our method starts by building an initial dataset and then proceeds to efficiently cover the input (geometry) and output (property) spaces as widely as possible.

To construct the initial dataset in stage one, we select design targets in the property space (the 3D space spanned by Ex, Ey, and Exy). As the bounds of the property space are unknown a priori, we sample 1000 points uniformly distributed in [0, 1]3. Then, we use the solid isotropic material with penalization TO method [59] to find the orthotropic unit cells corresponding to each target. This stage generates 358 valid structures. The remaining 642 points do not result in feasible unit cells mainly because (i) the uniform sampling places some design targets in theoretically infeasible regions and (ii) the TO method may fail to meet targets due to sensitivity to the initial shape, which is difficult to guess without prior knowledge. The properties of these 358 structures are shown in Fig. 6, where the Poisson’s ratio is used instead of Exy for a better illustration of the space.

In stage two, we expand the initial database via an iterative stochastic shape perturbation algorithm that by-passes TO by generating distorted structures with slightly different properties from the original ones. Specifically, the following radial distortion model is used to perturb an existing shape:
$xnew={xc+rnewrold(xold−xc)ifrold≤R0xoldifrold>R0$
(16)
where xnew and xold are the coordinates of the new and original pixel locations, xc is the coordinate vector of the distortion center, rnew and rold are the new and original distances to the distortion center, respectively, and R0 is the outer distortion radius. rnew can be expressed as follows:
$rnew={12R0(1−cot(γ2)−β)ifγ>012R0(1−cot(γ2)+β)ifγ<0roldotherwise$
(17)
where
$β=2sin2(γ2)−(1+cot(γ2)−2roldR0)2$
(18)
and $γ∈(−π2,π2)$ is the angle that controls the amount of distortion. Considering the orthotropic symmetry of the unit cells, only a quarter of the original structure is distorted and then reassembled to realize the full structure. We adopt the distortion model in Eq. (16) for two reasons. First, its parameters (R0, γ, and xc) have clear interpretations and hence can be easily tuned. In our case, they are all set as random variables with standard uniform distributions to generate a wide range of structures. Second, it preserves the topology of the original unit cell and introduces negligible artifacts (e.g., disconnections and checkerboard patterns) upon perturbation.
To better cover the property space, the database is populated iteratively. In each iteration, we first calculate the following score for all available unit cells:
$Score=1(d+ε)2ρ$
(19)
where d is the Euclidean (L2) distance between the stiffness tensor components of each unit cell to the boundaries of the region enclosing the current property space (see the shaded areas in Figs. 6 and 7), ρ is the number of data points inside the neighborhood within a given radius in the property space (in our experience, sampling is more uniform when the radius equals 0.05), and ɛ ≪ 1 is used to avoid singularity. Then, we select the N points with the highest scores for stochastic perturbation. After each iteration, newly generated unit cells are verified and discarded if they contain infeasible features such as isolated parts. The properties of new feasible structures are then calculated via numerical homogenization [60] and added to the dataset. The perturbation is repeated until the boundary of property space does not expand significantly (Δd < 0.1), and the points inside the boundaries are relatively dense (average ρ > 500). By using this strategy, the database is expanded in our test case from 358 to 88,000 unit cells that cover a wider range of properties (see Fig. 7). We note that the machine learner (GAGP in our case) may also be used to detect the infeasible structures; we do not consider this in the current work but may include it in our future studies.

### 5.2 Unit Cell Dimension Reduction via Spectral Shape Descriptors.

In the previous section, each unit cell in the database is represented by 50 × 50 pixels. For dimension reduction, we use spectral shape descriptors as they retain geometric and physical information. Specifically, we use the LB spectrum, also known as Shape-DNA, which can be directly calculated for any unit cell shape [61,62].

The LB spectrum is an effective descriptor for the metamaterials database for several reasons: (i) It has a powerful discrimination ability and has been successfully applied to shape matching and classification in computer vision despite being one of the simplest spectral descriptors. (ii) All of the complex structures in our orthotropic metamaterials database can be uniquely characterized with the first 10–15 eigenvalues in the LB spectrum. (iii) The spectrum embodies some geometrical information, including perimeter, area, and Euler number. This can be beneficial for the construction of the machine learning model as less training data may be required to obtain an accurate model compared with voxel- or point-based representations. (iv) Similar shapes have close LB spectrum, which may also help the supervised learning task.

The calculation of the LB spectrum for each unit cell is as follows. For a real-valued function f defined on a Riemannian manifold [61], the Helmholtz equation reads as follows:
$Δf=−λf$
(20)
where the Laplace–Beltrami operator Δ is defined as follows:
$Δ:=div(gradf)$
(21)
The eigenvalues of the Helmholtz equation are the LB spectrum and denoted as follows:
$0≤λ1≤λ2≤⋯<∞$
(22)
Our unit cells reside in a 2D planar domain and can be considered as a special case of Riemannian manifolds. Therefore, we focus on the LB spectrum of a 2D shape under Dirichlet boundary conditions, which reduce the Helmholtz equation to a Laplacian eigenvalue problem:
$∂2f∂x2+∂2f∂y2=−λfinΩf=0onτ$
(23)
where Ω and τ are the interior and boundaries of the domain of interest, respectively.

Finally, the finite element method is employed to obtain the LB spectrum of unit cells [63]; see Fig. 8. It is noted that our 88,000 structures can be uniquely determined with only the first 16 non-zero eigenvalues, reducing the input dimension from 50 × 50 = 2500 pixels to 16 scalar descriptors. In general, the computation of the LB spectrum takes only a few seconds per unit cell on a single CPU (Intel(R) Xeon(R) Gold 6144 CPU at 3.50 GHz). Since these computations are performed once and in parallel, the runtime is acceptable.

### 5.3 Machine Learning: Linking LB Representation to Property via GAGP.

Once the dataset is built, we follow the algorithm in Fig. 2 for machine learning, i.e., relating the LB representations of unit cells to their stiffness tensor. We use the same fitting parameters as in Sec. 4 (n0 = 500, s = 6, ns = 250), equally distribute the samples among the $m=88000/(500+6×250)=44$ GPs, and use Eq. (10) to have a multiresponse model that leverages the correlation between the responses to have a higher predictive power. The convergence histories are provided in Fig. 9, where the trends are consistent with those in Sec. 4. It is observed that the 16 estimated roughness parameters do not change noticeably once more than 1000 samples are used in training. In particular, 3 of the 16 roughness estimates, which correspond to λ14, λ15, and λ16 are very small, indicating that those LB descriptors do not affect the responses. The next largest estimate belongs to ω13 ≅ −8, which corresponds to λ13. The rest of the estimates are all between 2.5 and 3, implying that the first 12 eigenvalues (shape descriptors) affect the responses similarly and nonlinearly (since large ωi indicates rough response changes along dimension i). These observations agree well with the fact that the higher order eigenvalues generally explain less variability in the data. The estimated noise variances (one per response) also converge, with Exy having the largest estimated noise variance in the data, which is potentially due to larger numerical errors in property estimation.

To illustrate the effect of expanding the training data from 385 to 88,000, we randomly select 28,000 samples for validation. Then, we evaluate the mean squared error (MSE) of the following two models on this test set: a conventional GP fitted to the initial 385 samples and a GAGP fitted to the rest of the data (i.e., to 60,000 samples, resulting in $m=60,000/(500+6×250)=30$ models). To account for randomness, we repeat this process 20 times. The results are summarized in Table 4 and demonstrate that (i) increasing the dataset size (stage two in Sec. 5.1) creates a supervised learner with a higher predictive power (compare the mean of MSEs for GP and GAGP). (ii) GAGP is more robust to variations than GP (compare the variance of MSEs for GP and GAGP). (iii) With 60,000samples, the predictive power of GAGP is slightly lower than the case where the entire dataset is used in training (compare mean of MSEs for GAGP in Table 4 with the converged noise estimates in Fig. 9).

To assess the robustness of GAGP to data size, we repeat the above procedure but with 20,000 and 40,000 training samples instead of 60,000. For fair comparisons, the same validation sample size of 28,000 is used for each. The results are summarized in Table 5 and, by comparing them with those of GAGP in Table 4, indicate that increasing the sample size from 20,000 to 60,000 increases the predictive power and robustness. Note that, since [n0, ns, s] are not changed when fitting GAGP, using more samples increases m, the number of GPs.

### 5.4 Data-Driven Unit Cell Optimization.

Finally, we illustrate the benefits of the GAGP model in an inverse optimization scheme to realize unit cells with target stiffness tensor components and compare the results with those designed using other techniques. Establishing such an inverse link is highly desirable in structure design as it allows to efficiently achieve target elastic properties by avoiding expensive finite element simulations and tedious trial and error in TO. In addition, although not demonstrated in this work, such a link can provide multiple candidate unit cells with the same properties that, in turn, enable tiling different unit cells into a macrostructure while ensuring boundary compatibility.

Our data-driven optimization scheme has two steps: the search for the optimal LB spectrum and the reconstruction of the unit cell given the LB spectrum. By using the GAGP (or another) model, we directly search for the LB spectrum of the unit cell with the desired properties. The search problem is formulated as follows:
$minλ⁡∥Et−Ep∥∞s.t.λi−1≤λi0.9λi0≤λi≤1.1λi0$
(24)
where Et and Ep are the vectorized forms of, respectively, the target and predicted stiffness tensors. λ = [λ1, …, λ16] and λ0 are the LB spectra of, respectively, candidate unit cell for the target stiffness tensor and the unit cell closest to the prescribed properties in the property space (λi is the ith order eigenvalue). We choose GA for optimization since the GAGP model is cheap to run and GA ensures global optimality for multivariate and constrained problems. Note that, to ensure fast convergence during GA, we limit the search space for GA using the LB spectrum of the unit cell in the training dataset whose properties are closest to Et.

After obtaining the optimal LB spectrum, we use a level set method to reconstruct the corresponding unit cell [64] while employing squared residuals of the LB spectrum as the objective function. For faster convergence, the unit cell closest to the optimal LB spectrum in the spectrum space is taken as the initial guess in the reconstruction process.

In the following two examples, the goal is to design structures with desired Ex, Ey, and Exy (see the target properties in Fig. 10). For each example, three unit cells are designed using different models: GAGP, NN, and GP (GAGP and NN use the entire dataset while GP uses the initial one with 358 structures). The results are visualized in Fig. 10 and demonstrate that the unit cells identified from GAGP and NN are more geometrically diverse than those obtained via GP. This is a direct result of populating the large dataset with perturbed structures and, in turn, providing the GA search process with a wider range of initial seeds. While we utilized our entire database for GAGP and NN in an attempt to provide more diversity for new designs, a smaller or less diverse training dataset could potentially achieve similar results; such a study is left for future work. We also note that the unit cells designed with GP are similar in shape but different in the size of the center hole, which leads to the significant change in properties.

From a quantitative point of view, our data-driven design method with the large database can, compared with the small dataset case, discover unit cells with properties that are closer to the target values. For instance, in Ex1, the GAGP and NN results using the large dataset achieve the target Ex, whereas the GP result with the small dataset differs from the target by around $12%$. Ex2 shows a similar pattern, with the GAGP, NN, and GP results differing from the target Ex by $4%$, $10%$, and $16%$, respectively. When the small dataset is used, the greater deviations from the target properties can be mainly attributed to insufficient training samples and the relatively small search space. This reinforces the need for a large database of unit cells in the data-driven design of metamaterials, along with an expedient machine learning method for big data. Moreover, unit cells designed with GAGP have smaller deviations than those with NN.

## 6 Conclusion and Future Works

In this work, we proposed a novel approach, GAGP, to enable GP modeling of massive datasets. Our method is centered on the observation that the hyperparameter estimates of a GP converge to some limit values, $ω^∞$, as more training samples are added. We introduced an intuitive and straightforward method to find $ω^∞$ and, subsequently, build a collection of independent GPs that all use the converged $ω^∞$ as their hyperparameters. These GPs randomly distribute the entire training dataset among themselves, which allows inference based on the entire dataset by pooling the predictions from the individual GPs. The training cost of GAGP primarily depends on the initial optimization with ns data points and the s optimizations thereafter. The former cost is the same as fitting a conventional GP with ns samples. The latter is an additional cost but is generally manageable since a single initial guess close to the global optimum is available at each iteration. The cost of building m GPs is negligible compared with these optimization costs. The prediction cost, although being m times larger than a conventional GP, is small enough for practical applications.

With analytical examples, we demonstrated that GAGP achieves very high predictive power that matches, and in some cases exceeds, that of state-of-the-art machine learning methods such as NNs and boosted trees. Unlike these latter methods, GAGP is easy to fit and interpret, which makes it particularly useful in engineering design with big data. Although the predictive power of GAGP increases as the size of the training data increases, so does the cost of fitting and training; it may be necessary to choose part of the data if resources are limited. We also note that, throughout, we assumed that the training samples are not ordered or highly correlated. If they are, randomization and appropriate transformations are required. In addition, we assumed stationary noise with an unknown variance. Considering nonstationary noise variance would be an interesting and useful extension for GAGP. Thrifty sample selection for model refinement (instead of randomly taking subsets of training data) can also improve the predictive power of GAGP and is planned for our future works.

As a case study, we applied GAGP to a data-driven metamaterials unit cell design process that attains desired elastic properties by transforming the complex material design problem into a parametric one using spectral descriptors. After mapping reduced-dimensional geometric descriptors (LB spectrum) to properties through GAGP, unit cells with properties close to the target values are discovered by finding the optimal LB spectrum with inverse optimization. This framework provides a springboard for a salient new approach to systematically and efficiently design metamaterials with optimized boundary compatibility, spatially varying properties, and multiple functionalities.

## Acknowledgment

The authors are grateful to Professor K. Svanberg from the Royal Institute of Technology, Sweden, for providing a copy of the MMA code for metamaterial design. Support from the National Science Foundation (NSF) (Grant Nos. ACI 1640840 and OAC 1835782; Funder ID: 10.13039/501100008982) and the Air Force Office of Scientific Research (AFOSR FA9550-18-1-0381; Funder ID: 10.13039/100000181) are greatly appreciated. Ms. Yu-Chin Chan would like to acknowledge the NSF Graduate Research Fellowship Program (Grant No. DGE-1842165).

## Nomenclature

• d =

input dimensionality

•
• n =

number of training samples

•
• q =

output dimensionality

•
• s =

number of times that ns samples are added to n0

•
• x =

vector of d inputs

•
• y =

vector of q outputs

•
• L =

objective function in MLE

•
• R =

sample correlation matrix of size n × n

•
• GP =

Gaussian process

•
• MLE =

maximum likelihood estimation

•
• δ =

Nugget or jitter parameter

•
• n0 =

number of initial random samples

•
• ns =

number of random samples added to n0 per iteration

•
• ω =

roughness parameters of the correlation function

•
• $ω^∞$ =

estimate of ω via MLE with very large training data

## References

1.
Yan
,
W.
,
Lin
,
S.
,
Kafka
,
O. L.
,
Lian
,
Y.
,
Yu
,
C.
,
Liu
,
Z.
,
Yan
,
J.
,
Wolff
,
S.
,
Wu
,
H.
, and
Ndip-Agbor
,
E.
,
2018
, “
Data-Driven Multi-Scale Multi-Physics Models to Derive Process–Structure–Property Relationships for Additive Manufacturing
,”
Comput. Mech.
,
61
(
5
), pp.
521
541
. 10.1007/s00466-018-1539-z
2.
Mozaffar
,
M.
,
Paul
,
A.
,
Al-Bahrani
,
R.
,
Wolff
,
S.
,
Choudhary
,
A.
,
Agrawal
,
A.
,
Ehmann
,
K.
, and
Cao
,
J.
,
2018
, “
Data-Driven Prediction of the High-Dimensional Thermal History in Directed Energy Deposition Processes via Recurrent Neural Networks
,”
Manuf. Lett.
,
18
, pp.
35
39
. 10.1016/j.mfglet.2018.10.002
3.
Ghumman
,
U. F.
,
Iyer
,
A.
,
Dulal
,
R.
,
Munshi
,
J.
,
Wang
,
A.
,
Chien
,
T.
,
Balasubramanian
,
G.
, and
Chen
,
W.
,
2018
, “
A Spectral Density Function Approach for Active Layer Design of Organic Photovoltaic Cells
,”
ASME J. Mech. Des.
,
140
(
11
), p.
111408
. 10.1115/1.4040912
4.
Bessa
,
M. A.
,
,
R.
,
Liu
,
Z.
,
Hu
,
A.
,
Apley
,
D. W.
,
Brinson
,
C.
,
Chen
,
W.
, and
Liu
,
W. K.
,
2017
, “
A Framework for Data-Driven Analysis of Materials Under Uncertainty: Countering the Curse of Dimensionality
,”
Comput. Method Appl. M.
,
320
, pp.
633
667
. 10.1016/j.cma.2017.03.037
5.
,
R.
,
Chen
,
W.
, and
Apley
,
D. W.
,
2016
, “
Characterization and Reconstruction of 3D Stochastic Microstructures via Supervised Learning
,”
J. Microsc.
,
264
(
3
), pp.
282
297
. 10.1111/jmi.12441
6.
,
R.
,
Zhang
,
Y.
,
Li
,
X.
,
Kearney
,
T.
,
Brinson
,
L. C.
,
Apley
,
D. W.
,
Liu
,
W. K.
, and
Chen
,
W.
,
2018
, “
Computational Microstructure Characterization and Reconstruction: Review of the State-of-the-Art Techniques
,”
Prog. Mater. Sci.
,
95
, pp.
1
41
. 10.1016/j.pmatsci.2018.01.005
7.
Li
,
X.
,
Yang
,
Z.
,
Brinson
,
L. C.
,
Choudhary
,
A.
,
Agrawal
,
A.
, and
Chen
,
W.
,
2018
, “
A Deep Adversarial Learning Methodology for Designing Microstructural Material Systems
,”
ASME 2018 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
,
Aug. 26–29
.
8.
,
R.
,
Bui
,
A. T.
,
Xie
,
W.
,
Apley
,
D. W.
, and
Chen
,
W.
,
2016
, “
Stochastic Microstructure Characterization and Reconstruction via Supervised Learning
,”
Acta Mater.
,
103
, pp.
89
102
. 10.1016/j.actamat.2015.09.044
9.
Schumacher
,
C.
,
Bickel
,
B.
,
Rys
,
J.
,
Marschner
,
S.
,
Daraio
,
C.
, and
Gross
,
M.
,
2015
, “
Microstructures to Control Elasticity in 3D Printing
,”
ACM Trans. Graph.
,
34
(
4
), p.
136
. 10.1145/2766926
10.
Panetta
,
J.
,
Zhou
,
Q.
,
Malomo
,
L.
,
Pietroni
,
N.
,
Cignoni
,
P.
, and
Zorin
,
D.
,
2015
, “
,”
ACM Trans. Graph.
,
34
(
4
), p.
135
. 10.1145/2766937
11.
Zhu
,
B.
,
Skouras
,
M.
,
Chen
,
D.
, and
Matusik
,
W.
,
2017
, “
Two-Scale Topology Optimization With Microstructures
,”
ACM Trans. Graph.
,
36
(
5
), p.
164
.10.1145/3072959.3126835
12.
Chen
,
D.
,
Skouras
,
M.
,
Zhu
,
B.
, and
Matusik
,
W.
,
2018
, “
Computational Discovery of Extremal Microstructure Families
,”
,
4
(
1
), p.
eaao7005
13.
LeCun
,
Y.
,
Bengio
,
Y.
, and
Hinton
,
G.
,
2015
, “
Deep Learning
,”
Nature
,
521
(
7553
), pp.
436
444
. 10.1038/nature14539
14.
Chen
,
T.
, and
Guestrin
,
C.
, “
Xgboost: A Scalable Tree Boosting System
,”
2016
,
Proceedings of 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
,
San Francisco, CA
,
Aug. 13–17
, pp.
785
794
.
15.
Hassaninia
,
I.
,
,
R.
,
Chen
,
W.
, and
Mohseni
,
H.
,
2017
, “
Characterization of the Optical Properties of Turbid Media by Supervised Learning of Scattering Patterns
,”
Sci. Rep.
,
7
(
1
), p.
15259
. 10.1038/s41598-017-15601-4
16.
Tao
,
S.
,
Shintani
,
K.
,
,
R.
,
Chan
,
Y.-C.
,
Yang
,
G.
,
Meingast
,
H.
, and
Chen
,
W.
,
2017
, “
Enhanced Gaussian Process Metamodeling and Collaborative Optimization for Vehicle Suspension Design Optimization
,”
ASME 2017 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
Cleveland, OH
,
Aug. 6–9
, Paper No. DETC2017-67976.
17.
,
R.
,
Liang
,
B.
,
Gao
,
J.
,
Liu
,
W. K.
,
Cao
,
J.
,
Zeng
,
D.
,
Su
,
X.
,
Xu
,
H.
,
Li
,
Y.
, and
Chen
,
W.
,
2018
, “
Uncertainty Quantification in Multiscale Simulation of Woven Fiber Composites
,”
Comput. Method Appl. M.
,
338
, pp.
506
532
. 10.1016/j.cma.2018.04.024
18.
Zhang
,
W.
,
,
R.
,
Liang
,
B.
,
Su
,
X.
,
Zeng
,
D.
,
Bessa
,
M. A.
,
Wang
,
Y.
,
Chen
,
W.
, and
Cao
,
J.
,
2019
, “
A Numerical Bayesian-Calibrated Characterization Method for Multiscale Prepreg Preforming Simulations With Tension-Shear Coupling
,”
Compos. Sci. Technol.
,
170
, pp.
15
24
. 10.1016/j.compscitech.2018.11.019
19.
Gramacy
,
R. B.
, and
Lee
,
H. K. H.
,
2008
, “
Bayesian Treed Gaussian Process Models With an Application to Computer Modeling
,”
J. Am. Stat. Assoc.
,
103
(
483
), pp.
1119
1130
. 10.1198/016214508000000689
20.
Kim
,
H.-M.
,
Mallick
,
B. K.
, and
Holmes
,
C.
,
2005
, “
Analyzing Nonstationary Spatial Data Using Piecewise Gaussian Processes
,”
J. Am. Stat. Assoc.
,
100
(
470
), pp.
653
668
. 10.1198/016214504000002014
21.
Rasmussen
,
C. E.
,
2006
,
Gaussian Processes for Machine Learning
,
MIT Press
,
Cambridge, MA
.
22.
Tresp
,
V.
,
2000
, “
A Bayesian Committee Machine
,”
Neural Comput.
,
12
(
11
), pp.
2719
2741
. 10.1162/089976600300014908
23.
Herbrich
,
R.
,
Lawrence
,
N. D.
, and
Seeger
,
M.
,
2003
, “
Fast Sparse Gaussian Process Methods: The Informative Vector Machine
,”
Proc. Advances in Neural Information Processing Systems
, pp.
625
632
.
24.
Seeger
,
M.
,
Williams
,
C.
, and
Lawrence
,
N.
,
2003
, “
Fast Forward Selection to Speed Up Sparse Gaussian Process Regression
,”
Proceedings of Artificial Intelligence and Statistics 9
.
25.
Williams
,
C. K.
, and
Seeger
,
M.
,
2001
, “
Using the Nyström Method to Speed Up Kernel Machines
,”
Proceedings of Advances in Neural Information Processing Systems
, pp.
682
688
.
26.
Rasmussen
,
C. E.
, and
Quinonero-Candela
,
J.
,
2005
, “
Healing the Relevance Vector Machine Through Augmentation
,”
Proceedings of 22nd International Conference on Machine Learning
,
Aug. 7–11
, pp.
689
696
.
27.
Wahba
,
G.
,
1990
,
Spline Models for Observational Data
, Vol.
59
, Siam.
28.
Gramacy
,
R. B.
, and
Apley
,
D. W.
,
2015
, “
Local Gaussian Process Approximation for Large Computer Experiments
,”
J. Comput. Graph. Stat.
,
24
(
2
), pp.
561
578
. 10.1080/10618600.2014.914442
29.
Garcia
,
D. J.
,
Mozaffar
,
M.
,
Ren
,
H. Q.
,
Correa
,
J. E.
,
Ehmann
,
K.
,
Cao
,
J.
, and
You
,
F. Q.
,
2019
, “
Sustainable Manufacturing With Cyber-Physical Discrete Manufacturing Networks: Overview and Modeling Framework
,”
ASME J. Manuf. Sci. Eng.
,
141
(
2
), p.
021013
. 10.1115/1.4041833
30.
Mozaffar
,
M.
,
Ndip-Agbor
,
E.
,
Stephen
,
L.
,
Wagner
,
G. J.
,
Ehmann
,
K.
, and
Cao
,
J.
,
2019
, “
Acceleration Strategies for Explicit Finite Element Analysis of Metal Powder-Based Additive Manufacturing Processes Using Graphical Processing Units
,”
Comput. Mech.
, pp.
1
16
. 10.1007/s00466-019-01685-4
31.
Han
,
Y.
, and
Lu
,
W. F.
,
2018
, “
A Novel Design Method for Nonuniform Lattice Structures Based on Topology Optimization
,”
ASME J. Mech. Des.
,
140
(
9
), p.
091403
. 10.1115/1.4040546
32.
Li
,
D.
,
Dai
,
N.
,
Tang
,
Y.
,
Dong
,
G.
, and
Zhao
,
Y. F.
,
2019
, “
Design and Optimization of Graded Cellular Structures With Triply Periodic Level Surface-Based Topological Shapes
,”
ASME J. Mech. Des.
,
141
(
7
), p.
071402
.10.1115/1.4042617
33.
Du
,
Z.
,
Zhou
,
X.-Y.
,
Picelli
,
R.
, and
Kim
,
H. A.
,
2018
, “
Connecting Microstructures for Multiscale Topology Optimization With Connectivity Index Constraints
,”
ASME J. Mech. Des.
,
140
(
11
), p.
111417
. 10.1115/1.4041176
34.
Fu
,
J.
,
Xia
,
L.
,
Gao
,
L.
,
Xiao
,
M.
, and
Li
,
H.
,
2019
, “
Topology Optimization of Periodic Structures With Substructuring
,”
ASME J. Mech. Des.
,
141
(
7
), p.
071403
. 10.1115/1.4042616
35.
,
R.
,
Kearney
,
T.
,
Tao
,
S.
,
Apley
,
D. W.
, and
Chen
,
W.
,
2018
, “
Leveraging the Nugget Parameter for Efficient Gaussian Process Modeling
,”
Int. J. Numer Meth. Eng.
,
114
(
5
), pp.
501
516
. 10.1002/nme.5751
36.
MacDonald
,
B.
,
Ranjan
,
P.
, and
Chipman
,
H.
,
2015
, “
GPfit: AnRPackage for Fitting a Gaussian Process Model to Deterministic Simulator Outputs
,”
J. Stat. Software
,
64
, p.
12
. 10.18637/jss.v064.i12
37.
Plumlee
,
M.
, and
Apley
,
D. W.
,
2017
, “
Lifted Brownian Kriging Models
,”
Technometrics
,
59
(
2
), pp.
165
177
. 10.1080/00401706.2016.1211555
38.
Ranjan
,
P.
,
Haynes
,
R.
, and
Karsten
,
R.
,
2011
, “
A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data
,”
Technometrics
,
53
(
4
), pp.
366
378
. 10.1198/TECH.2011.09141
39.
Sacks
,
J.
,
Schiller
,
S. B.
, and
Welch
,
W. J.
,
1989
, “
Designs for Computer Experiments
,”
Technometrics
,
31
(
1
), pp.
41
47
. 10.1080/00401706.1989.10488474
40.
Toal
,
D. J. J.
,
Bressloff
,
N. W.
, and
Keane
,
A. J.
,
2008
, “
Kriging Hyperparameter Tuning Strategies
,”
AIAA J.
,
46
(
5
), pp.
1240
1252
. 10.2514/1.34822
41.
Audet
,
C.
, and
Dennis
,
J. E.
, Jr
,
2002
, “
Analysis of Generalized Pattern Searches
,”
SIAM J. Optim.
,
13
(
3
), pp.
889
903
. 10.1137/S1052623400378742
42.
Zhao
,
L.
,
Choi
,
K.
, and
Lee
,
I.
,
2011
, “
Metamodeling Method Using Dynamic Kriging for Design Optimization
,”
AIAA J.
,
49
(
9
), pp.
2034
2046
. 10.2514/1.J051017
43.
Toal
,
D. J.
,
Bressloff
,
N.
,
Keane
,
A.
, and
Holden
,
C.
,
2011
, “
The Development of a Hybridized Particle Swarm for Kriging Hyperparameter Tuning
,”
Eng. Optim.
,
43
(
6
), pp.
675
699
. 10.1080/0305215X.2010.508524
44.
Conti
,
S.
,
Gosling
,
J. P.
,
Oakley
,
J. E.
, and
O'Hagan
,
A.
,
2009
, “
Gaussian Process Emulation of Dynamic Computer Codes
,”
Biometrika
,
96
(
3
), pp.
663
676
. 10.1093/biomet/asp028
45.
Arendt
,
P. D.
,
Apley
,
D. W.
, and
Chen
,
W.
,
2012
, “
Quantification of Model Uncertainty: Calibration, Model Discrepancy, and Identifiability
,”
ASME J. Mech. Des.
,
134
(
10
), p.
100908
. 10.1115/1.4007390
46.
Arendt
,
P. D.
,
Apley
,
D. W.
,
Chen
,
W.
,
Lamb
,
D.
, and
Gorsich
,
D.
,
2012
, “
Improving Identifiability in Model Calibration Using Multiple Responses
,”
ASME J. Mech. Des.
,
134
(
10
), p.
100909
. 10.1115/1.4007573
47.
Bayarri
,
M.
,
Berger
,
J.
,
Cafeo
,
J.
,
Garcia-Donato
,
G.
,
Liu
,
F.
,
Palomo
,
J.
,
Parthasarathy
,
R.
,
Paulo
,
R.
,
Sacks
,
J.
, and
Walsh
,
D.
,
2007
, “
Computer Model Validation With Functional Output
,”
Ann. Stat.
,
35
(
5
), pp.
1874
1906
. 10.1214/009053607000000163
48.
Conti
,
S.
, and
O’Hagan
,
A.
,
2010
, “
Bayesian Emulation of Complex Multi-Output and Dynamic Computer Models
,”
J. Stat. Plann. Inference
,
140
(
3
), pp.
640
651
. 10.1016/j.jspi.2009.08.006
49.
Sobol
,
I. M.
,
1967
, “
On the Distribution of Points in a Cube and the Approximate Evaluation of Integrals
,”
Zh. Vychisl. Mat. Mat. Fiz.
,
7
(
4
), pp.
784
802
.
50.
Sobol
,
I. M.
,
1998
, “
On Quasi-Monte Carlo Integrations
,”
Math. Comput. Simulat.
,
47
(
2–5
), pp.
103
112
. 10.1016/S0378-4754(98)00096-2
51.
Hastie
,
T.
,
Tibshirani
,
R.
,
Friedman
,
J.
,
Hastie
,
T.
,
Friedman
,
J.
, and
Tibshirani
,
R.
,
2009
,
The Elements of Statistical Learning
,
Springer
,
New York
.
52.
Bastos
,
L. S.
, and
O’Hagan
,
A.
,
2009
, “
Diagnostics for Gaussian Process Emulators
,”
Technometrics
,
51
(
4
), pp.
425
438
. 10.1198/TECH.2009.08019
53.
Ben-Ari
,
E. N.
, and
Steinberg
,
D. M.
,
2007
, “
Modeling Data From Computer Experiments: An Empirical Comparison of Kriging With MARS and Projection Pursuit Regression
,”
Qual. Eng.
,
19
(
4
), pp.
327
338
. 10.1080/08982110701580930
54.
Le
,
B.
,
Yvonnet
,
J.
, and
He
,
Q. C.
,
2015
, “
Computational Homogenization of Nonlinear Elastic Materials Using Neural Networks
,”
Int. J. Numer Meth. Eng.
,
104
(
12
), pp.
1061
1084
. 10.1002/nme.4953
55.
Gelman
,
A.
,
Carlin
,
J. B.
,
Stern
,
H. S.
,
Dunson
,
D. B.
,
Vehtari
,
A.
, and
Rubin
,
D. B.
,
2013
,
Bayesian Data Analysis
,
CRC Press
,
Boca Raton, FL
.
56.
Kullback
,
S.
,
1997
, Information Theory and Statistics,
Dover Publications
,
New York
.
57.
Long
,
Q.
,
Scavino
,
M.
,
Tempone
,
R.
, and
Wang
,
S.
,
2013
, “
Fast Estimation of Expected Information Gains for Bayesian Experimental Designs Based on Laplace Approximations
,”
Comput. Method Appl. M.
,
259
(
Supplement C
), pp.
24
39
. 10.1016/j.cma.2013.02.017
58.
Tierney
,
L.
, and
,
J. B.
,
1986
, “
Accurate Approximations for Posterior Moments and Marginal Densities
,”
J. Am. Stat. Assoc.
,
81
(
393
), pp.
82
86
. 10.1080/01621459.1986.10478240
59.
Xia
,
L.
, and
Breitkopf
,
P.
,
2015
, “
Design of Materials Using Topology Optimization and Energy-Based Homogenization Approach in Matlab
,”
Struct. Multidiscip. Optim.
,
52
(
6
), pp.
1229
1241
. 10.1007/s00158-015-1294-0
60.
Andreassen
,
E.
, and
Andreasen
,
C. S.
,
2014
, “
How to Determine Composite Material Properties Using Numerical Homogenization
,”
Comp. Mater. Sci.
,
83
, pp.
488
495
. 10.1016/j.commatsci.2013.09.006
61.
Reuter
,
M.
,
Wolter
,
F.-E.
, and
Peinecke
,
N.
,
2006
, “
Laplace–Beltrami Spectra as ‘Shape-DNA’ of Surfaces and Solids
,”
Comput.-Aided Des.
,
38
(
4
), pp.
342
366
62.
Lian
,
Z.
,
Godil
,
A.
,
Bustos
,
B.
,
Daoudi
,
M.
,
Hermans
,
J.
,
Kawamura
,
S.
,
Kurita
,
Y.
,
Lavoué
,
G.
,
Van Nguyen
,
H.
,
Ohbuchi
,
R.
,
Ohkita
,
Y.
,
Ohishi
,
Y.
,
Porikli
,
F.
,
Reuter
,
M.
,
Sipiran
,
I.
,
Smeets
,
D.
,
Suetens
,
P.
,
Tabia
,
H.
, and
Vandermeulen
,
D.
,
2013
, “
A Comparison of Methods for Non-Rigid 3D Shape Retrieval
,”
Pattern Recognit.
,
46
(
1
), pp.
449
461
. 10.1016/j.patcog.2012.07.014
63.
Su
,
S.
,
2010
, “
Numerical Approaches on Shape Optimization of Elliptic Eigenvalue Problems and Shape Study of Human Brains
,” Ph.D. thesis,
The Ohio State University
,
Columbus, OH
.
64.
Zhu
,
S.
,
2018
, “
Effective Shape Optimization of Laplace Eigenvalue Problems Using Domain Expressions of Eulerian Derivatives
,”
J. Optim. Theory Appl.
,
176
(
1
), pp.
17
34
. 10.1007/s10957-017-1198-9