## Abstract

Bayesian optimization (BO) is an efficient and flexible global optimization framework that is applicable to a very wide range of engineering applications. To leverage the capability of the classical BO, many extensions, including multi-objective, multi-fidelity, parallelization, and latent-variable modeling, have been proposed to address the limitations of the classical BO framework. In this work, we propose a novel multi-objective BO formalism, called srMO-BO-3GP, to solve multi-objective optimization problems in a sequential setting. Three different Gaussian processes (GPs) are stacked together, where each of the GPs is assigned with a different task. The first GP is used to approximate a single-objective computed from the multi-objective definition, the second GP is used to learn the unknown constraints, and the third one is used to learn the uncertain Pareto frontier. At each iteration, a multi-objective augmented Tchebycheff function is adopted to convert multi-objective to single-objective, where the regularization with a regularized ridge term is also introduced to smooth the single-objective function. Finally, we couple the third GP along with the classical BO framework to explore the convergence and diversity of the Pareto frontier by the acquisition function for exploitation and exploration. The proposed framework is demonstrated using several numerical benchmark functions, as well as a thermomechanical finite element model for flip-chip package design optimization.

## 1 Introduction

Global optimization is a common problem that appears in many contexts and particularly important for engineering design. The most used global optimization methods are population based searching algorithms, such as evolutionary algorithms. In contrast, surrogate-based searching methods rely on the guidance of surrogate models to perform sequential sampling. Bayesian optimization (BO) method is such an example. BO is a derivative-free surrogate-based global optimization technique that has started being used in engineering domains. A surrogate of the objective function is constructed and continuously updated during the search. The sequential sampling is based on the maximization of an acquisition function, also constructed in the same searching space. Different acquisition functions such as expected improvement (EI), probability of improvement (PI), upper confidence bound (UCB), and entropy-based approaches have been used. The classical BO methods were developed for single-objective optimization, whereas multi-objective optimization problems are of interest in practice. In real-world applications, objectives are often found to conflict with each other. Design decision-making requires us to enumerate options and make trade-offs between these objectives. Thus, it is important to obtain the set of optimal solutions in the Pareto sense so that the relationships among objectives can be fully explored.

Some efforts [1,2] have been made to develop multi-objective Bayesian optimization (MOBO) methods. Jeong and Obayashi [3] proposed an EI-based criteria and employed genetic algorithm (GA) to search for potential sampling points on the Pareto frontier. Knowles [4] introduced the ParEGO framework with the augmented Tchebycheff function to scalarize the multi-objective functions, which was later extended by Davins-Valldaura et al. [5] by including the probability of Pareto frontier with another GP. Dächert et al. [6] studied the importance of the augmented weighted Tchebycheff regularization parameter between and 1 norms. Zhang et al. [7] proposed the MOEA/D framework with the classical Tchebycheff scalarization function and decomposed a multi-objective (MO) problem into a number of single-objective optimization subproblems. Li et al. [8,9] improved the MO GA methods with the K-MOGA framework, where the posterior mean and posterior variance are considered to assist GA methods. Feliot et al. [10] proposed a multi-objective EI acquisition function considering constraints based on the posterior mean and posterior variance. Yang et al. [11] proposed an expected hypervolume improvement gradient focusing on bi-objective problems and further accelerated the computation of hypervolume using an analytical derivation of gradients. Valladares and Tovar [12] compared MO EI (or MEI) criteria against Tchebycheff in 1 and and concluded that randomized weights indeed help diversifying the Pareto frontier.

Gupta et al. [13] proposed a batch MOBO framework with a weighted linear average objective and demonstrated by optimizing the heat treatment process of an Al-Sc alloy. Shu et al. [14] developed a composite acquisition function for MOBO to efficiently sample with simultaneous improvement of convergence and diversity in constructing the Pareto frontier. Hernández-Lobato et al. [15,16] proposed a predictive entropy search acquisition function for MO problems considering constraints. Abdolshah et al. [17] proposed a MOBO framework where various computational efforts are taken into account when exploring the input domain. Gaudrie et al. [18] proposed a sequential and batch extension for MOBO method by maximizing the expected hypervolume improvement. Palar et al. [19] compared the impact of different covariance functions in MOBO and concluded that Matérn-3/2 is the most robust kernel for design optimization applications. Golovin et al. [20] proposed a hypervolume scalarization algorithm and proved the regret bound scales as $O(T)$. Han and Zheng [21] proposed an EIR2 indicator as a minimax extension for EI acquisition function in the context of MO optimization problems. While hypervolume-based approaches (also known as $S$-metric or Lebesgue measure) aim at both convergence and diversity, it also comes with a computational cost to compute the hypervolume. Let s be the number of objectives and n be the number of data points. The grid-based hypervolume algorithm scales at $O(m⋅nm)$, whereas the Walking Fish Group algorithm has a complexity of $O(s⋅2n)$ [22]. Beume et al. [23] proved the lower bound of $O(nlogn)$ for any number of objectives s > 1 and presented an algorithm that scales particularly so for s = 3. We propose an improved acquisition function that can avoid the cost of computing hypervolume for arbitrarily many objective functions.

In this paper, we propose a new framework, called srMO-BO-3GP, to stack three GPs to solve the MO constrained optimization problems. Furthermore, we explore the possibility of introducing regularization in the Tchebycheff scalarization method. The approach is unique in three different perspectives. First, we employ GP as a machine learning technique to probabilistically search for the Pareto frontier, by stacking this classifier over the objective GP. The acquisition function in the classical BO method is employed to balance the exploitation and the exploration of this Pareto GP, which enhances its richness, diversity, and convergence in the Pareto frontier. Second, we combine the multi-objective functions to a single-objective function with a randomized weight vector and regularize this single-objective function by a ridge regularization to smoothen the single-objective function. The philosophy of our approach is intrinsically similar to that of classical BO methods. Compared to other approach in literature, the heuristic proposed in this research is greatly simplified, yet its efficiency is comparable to other methods. Our approach is more general in the sense that it does not restrict to a particular form of the acquisition function and leave the choice of acquisition functions up to the users. The idea is extended based on the work of Davins-Valldaura et al. [5]; however, our work differs from their work in treating the Pareto GP as well as the acquisition function. In our approach, we opt to include both exploitation and exploration of this Pareto GP by considering uncertainty, whereas in their approach, only the probability of Pareto frontier (i.e., exploitation) is considered. Another novelty in our approach is the ridge regularized term in the acquisition to smooth the acquisition function.

Two significant advantages of the proposed srMO-BO-3GP algorithms are highlighted as follows. First, the regularization terms in the acquisition can significantly mitigate the effects of noise on observations. Second, our approach is not limited by the number of objective functions, as in other approaches. For example, if hypervolume-based approach is considered, the number of objectives would have a scalability effect on the computation of the hypervolume. Our proposed approach avoids limitation by converting the identification of Pareto frontier into an uncertain classification problems in machine learning context, with a flavor of uncertainty quantification. Thus, as the Pareto GP classifier gains more accuracy, it converges to the true Pareto frontier through the mean of classical Bayesian optimization approach.

For the remaining of this paper, Sec. 2 describes the proposed srMO-BO-3GP framework. Section 3 provides the numerical results for several numerical analytical functions, as well as an engineering thermomechanical finite element model (FEM) using the proposed approach. Section 5 provides discussions, and Sec. 6 concludes the paper. Compared to our previous preliminary work in Ref. [24], in this paper, we provide a comprehensive benchmark for the proposed srMO-BO-3GP framework and update the weighted parameters balancing and 0 norms in the augmented Tchebycheff scalarization function.

## 2 Methodology

We follow the formulation of Lin et al. [25] in defining the MOBO problem. For the sake of clarity, we denote $x={xi}i=1d∈X⊆Rd$ as the input of d continuous variables in d-dimensional space, whereas $y={yj}j=1s$ as s outputs. Traditionally, BO solves the single-objective optimization problem
$x*=argminx∈Xf(x)$
(1)
subject to the constraints c(x) ≤ 0. In this paper, we consider the scenario of MO optimization problem,
$x*=argminx∈X(f1(x),…,fs(x))$
(2)
subject to the known constraints g(x) ≤ 0.

### 2.1 Gaussian Process.

Assume that f is a function of x, where $x∈X$ is a d-dimensional input, and y is the observation. Let the dataset $D=(xi,yi)i=1n$, where n is the number of observations. A GP regression assumes that f = f1:n is jointly Gaussian, and the observation y is normally distributed given f,
$f|X∼N(m,K)$
(3)
$y|f,σ2∼N(f,σ2I)$
(4)
where mi := μ(xi) and Ki,j := k(xi, xj).
The covariance kernel K is a choice of modeling covariance between inputs. At an unknown sampling location x, the predicted response is described by a posterior Gaussian distribution, where the posterior mean is
$μn(x)=μ0(x)+k(x)T(K+σ2I)−1(y−m)$
(5)
and the posterior variance is
$σn2=k(x)−k(x)T(K+σ2I)−1k(x)$
(6)
where k(x) is the covariance vector between the query point x and x1:n. The classical GP formulation assumes stationary covariance matrix, which only depends on the distance $r=‖x−x′‖$. Several most common kernels for GP include [26]
$kMate´rn1(x,x′)=θ02exp(−r)kMate´rn3(x,x′)=θ02exp(−3r)(1+3r)kMate´rn5(x,x′)=θ02exp(−5r)(1+5r+53r2)ksq-exp(x,x′)=θ02exp(−12r2)$
The log-likelihood function can be written as
$logp(y|x1:n,θ)=−n2log(2π)−12log|Kθ+σ2I|−12(y−mθ)T(Kθ+σ2I)−1(y−mθ)$
(7)
Optimizing the log-likelihood function yields the hyper-parameter θ at the computational cost of $O(n3)$ due to the cost to compute the inverse of the covariance matrix.

### 2.2 Multi-Objective Function.

We adopt the definition of Pareto-dominant from Rojas-Gonzalez et al. [2] to define the Pareto frontier.

Definition 1

x1is said to dominatex2, denoted as$x1⪯x2$, if and only if$∀j:1≤j≤s$, such thatyj(x1) ≤ yj(x2), and$∃j:1≤j≤s$, such thatyj(x1) < yj(x2).

Definition 2

x1is said to strictly dominatex2, denoted as$x1≺x2$, if and only if$∀j:1≤j≤s$, such thatyj(x1) < yj(x2).

A MO problem can be solved by converting it to a single-objective problem, where the single-objective function is a weighted sum of multiple objectives [2], such as

1. weighted Tchebycheff scalarization function $y=max1≤i≤s$$wi(yi(x)−zi*)$,

2. the weighted sum scalarization function: $y=∑i=1swiyi(x)$,

3. and the augmented (weighted) Tchebycheff scalarization function $y=max1≤i≤swi(yi(x)−zi*)+ρ∑i=1swiyi(x)$,

where $zi*$ denotes the ideal value for the ith objective, the weights 0 ≤ wi ≤ 1, $∑i=1mwi=1$, and ρ is a positive constant. Typically, ρ is set as 0.05 in [4,5]. However, we follow Ref. [27] and set ρ = 0.65 due to its superior performance; the idea of tuning ρ adaptively remains open for future research.

Following the idea of Davins-Valldaura et al. [5] in converting multiple objectives to a single-objective function, we propose a new objective function based on the augmented Tchebycheff scalarization function, with the addition of a regularized term for the single-objective function, which could be interpreted in terms of ridge regression,
$y=max1≤j≤swjyj(x)+ρ∑j=1swjyj(x)+λ‖x‖2$
(8)
where λ is a regularization parameter. The first term can be thought of as an -norm, whereas the second term can be thought of as a 1-norm [6]. Various treatments for the Tchebycheff decomposition exist in literature [28]. The regularization term is used to smoothen the objective function and mitigate the singularity effect when Tchebycheff decomposition is introduced. More specifically, the term max1≤iswjyj(x) does not have a continuous first derivative, while the assumption for fitting GP is that the underlying function needs to be smooth. In that sense, it is necessary to introduce a small ridge regularization term in 2-norm.

### 2.3 Constraints.

We consider the known constraints and hidden constraints, where the known constraints are known before the functional evaluation, whereas the hidden constraints must be learn indirectly through the functional evaluation. For real-world engineering design applications, some constraints are imposed to the design performance which does not have a simple analytical form to evaluate. Simulations such as finite element model and computational fluid dynamics need to be run to check if the constraints are satisfied. Those are called unknown constraints, which typically correspond to unforeseeable and reproducible failure of the applications with low performance for a certain region of inputs. It could also be due to numerical issues in simulations, e.g., mesh failure, numerical instability, and numerical divergence. Such errors must be accounted for, but they are only known after the functional evaluation of f(·) (in Eq. (2)) is invoked with simulations. We refer to the constraints that are related to the functional evaluation f(·) are unknown constraints. On the contrary, the known constraints c(x) are those that can be evaluated or invoked separately based on the available analytical forms without calling the functional evaluation f(·) by simulation.

#### 2.3.1 Known Constraints.

Known constraints can be represented as a set of inequalities g(x) ≤ 0, where g is a relatively cheap set of functions to evaluate, compared to the real objective function f. The known constraints can be easily implemented by directly penalizing the acquisition by setting it to zero if known constraints are violated, while leaving unviolated sampling points as is.

In practice, the penalty scheme is implemented by multiplying the acquisition with another indicator function $I(x)$,
$I(x)={1,if∀k:gk(x)≤00,if∃k:gk(x)>0$
(9)
where gk denotes the kth constraint in the set of known constraints.

#### 2.3.2 Unknown/hidden Constraints.

We adopt our previous strategy by employing a probabilistic binary classifier to learn the unknown or hidden constraints. As a result, feasible and infeasible regions are separated in the input domain $X$. These two regions are mutually exclusive, i.e., disjoint, because a sampling point cannot be both feasible and infeasible at the same time. The labels of feasible/infeasible for sampling points are fixed, in the sense that as the optimization process advances, the labels do not change.

Denote the feasibility dataset as ${xi,ci}i=1n$, where n is the number of data points and c is associated with unknown constraints. We assign ci = 1 if xi is feasible, and ci = 0 if xi is infeasible. At an unknown x, the feasibility classifier provides a probability mass function, with Pr(x|c(x) = 1) as the predicted probability of satisfying the hidden constraints, and Pr(x|c(x) = 0) as the predicted probability of failing the hidden constraints. It is noteworthy that their sum adds up Pr(x|c(x) = 0) + Pr(x|c(x) = 1), as they are mutually exclusive, and there are only two possibilities. The probability of passing the unknown constraints will be used to condition on the acquisition, resulting in the multiplication of Pr(x|c(x) = 1) in the conditioned acquisition.

Even though there are many available probabilistic binary classifier in the context of machine learning, for example, k-NN [29], AdaBoost [30], RandomForest [31], support vector machine [32] (SVM), least squares support vector machine (LSSVM) [33], and convolutional neural network [34]; in this work, we restrict our methodology to GP as a binary probabilistic classifier. Labeling feasibility as described above, the posterior mean of feasibility GP can be used to predict the probability of satisfying unknown constraints, i.e., μfeasible(x) = Pr(x|c(x) = 1).

### 2.4 Pareto Frontier With an Uncertain Gaussian Process Classifier.

At each iteration, we construct the current Pareto frontier, which is subjected to change as the optimization process advances. If a sampling point is currently Pareto-dominant, the point is labeled as 1, and if the sampling point is not Pareto-dominant (i.e., it is dominated by another point in the dataset), the sampling point is labeled as 0. The classification process is thus uncertain, in the sense that the labels change from one iteration to another. This is to contrast with the constraint feasibility classifier μfeasible(x), where the labels are fixed and do not change, as the optimization process advances. The uncertainty in the Pareto frontier classification gradually decreases, as the number of sampling data points increases.

We explore the possibility of using GP as an uncertain Pareto frontier classifier. Surprisingly, GP is one of a few well-established machine learning techniques that considers uncertainty in prediction through its posterior variance function. Because of this particular reason, GP is employed as an uncertain classifier to construct the Pareto frontier, where the uncertainty is quantified by the GP posterior variance function (Eq. (6)).

The main idea is to force the Pareto GP classifier to balance its learning by exploitation and exploration, especially when the Pareto frontier prediction is not accurate by focusing on the unknown region at the beginning of the optimization process. As the optimization advances, if the Pareto frontier can be classified with high accuracy, the BO framework should be exploited to promote convergence and explored to promote the diversity in the MO optimization settings. This is consistent with the philosophy of the classical BO approach, which strikes for the balance of exploration and exploitation. Furthermore, the convergence and diversity, which are the two keys measure of MO optimization problems [2], are promoted by employing common acquisition functions on the Pareto frontier GP. As a result, the acquisition of Pareto GP classifier is included as another in the main acquisition function for the classical BO method (Eq. (10)).

It is noteworthy to point out that the Pareto classification problem is also mutually exclusive, in the sense that a sampling point cannot be both Pareto-dominant and Pareto-non-dominant at the same time. For an unknown location x, the Pareto GP classifier provides both the probability of being Pareto-dominant μPareto(x), which is bounded between 0 and 1, as well as the uncertainty associated with the probability as $σPareto2(x)$. Instead of employing logistic logit function which maps from (−∞, + ∞) to [0, 1] for binary classification problem [35], we simply use the regression formulation for GP and clip the prediction if the responses are beyond the range of [0, 1].

### 2.5 Acquisition Function.

We propose an extended criteria in learning the Pareto frontier considering uncertainty, as opposed to directly couple the Pareto-dominant probability into the EI acquisition function in Davins-Valldaura et al. [5]. The composite acquisition function is defined as
$a(x)=aobj(x)⏟objectiveGP⋅aPareto(x)⏟uncertainPareto⋅Pr(x|c(x)=1)⏟unknowncstr.⋅I(x)⏟knowncstr.$
(10)
Equation (10) is explained as follows. The first term, $aobj(x;μobj(x)$, $σobj2(x))$, is the regularized augmented Tchebycheff single-objective function in Eq. (8). At each time-step, a random weight vector w = (w1, …, ws) is sampled to combine multiple objectives ${yj}j=1s$ to a single-objective function y. A GP model is fitted using the dataset of n data points ${xi,yi}i=1n$. An acquisition function aobj(x) is formed, as a function of posterior mean μobj(x) and posterior variance $σobj2(x)$. The second term, $aPareto(x;μPareto(x)$, $σPareto2(x))$, is the acquisition function based on the Pareto frontier GP classifier. The third term, Pr(x|c(x) = 1) = μfeasible(x), is the probability of passing unknown constraints, provided by posterior mean of the second GP. The fourth term, $I(x)$, is the indicator function describing known constraints as in Eq. (9). The next sampling point x* is obtained by maximizing the acquisition function described in Eq. (10), i.e.,
$x*=argmaxx∈Xa(x)$
(11)

### 2.6 Summary.

The srMO-BO-3GP framework is summarized in Algorithm 1. CMA-ES [36,37] is used as the sub-optimizer to obtain the next sampling point in Eq. (11), where the default settings are retained. An interface between the srMO-BO-3GP optimizer and the engineering applications is constructed by matlab, python, and Shell scripts. Also, since the maximization is set as a default setting, the Tchebycheff function is algebraically manipulated to conform with the default setting.

Algorithm 1

Input: dataset $Dn$ consisting of inputs, observation, feasibility $(x,y,c)i=1n$, multi-objective $(xi,yi)i=1n$, constraint GP $(x,ci)i=1n$,

1: for$n=1,2,…,$do

2:    randomize a weight vector w

3:    combine ${yj}j=1s$ to $y$        ▹ multi- to single- objective - Eq. 8)

4:    construct single-objective GP     ▹ GP #1: $μobj(x)$, $σobj2(x)$

5:    construct Pareto front      ▹ GP #2: $μPareto(x)$, $σPareto2(x)$

6:      find current Pareto front

7:      construct Pareto classifier GP

8:    construct constraints classifier GP ▹ GP #3: $μ\,feasible(x),σ\,feasible2(x)$

9:    locate the next sampling point $xn+1$        ▹ (Eq. 10)

10:    query objectives $yn+1←{yj}j=1s$ and feasibility $cn+1$

11:    augment dataset $Dn+1←{Dn∪(xn+1,yn+1,cn+1)}$

12: end for

## 3 Numerical Benchmarks

In this section, we follow the standard test suite [5,38] to benchmark and investigate the effectiveness and the efficiency of the proposed srMO-BO-3GP algorithm by ZDT [39] and DTLZ [40] test suites, where the geometry of Pareto frontiers are well visualized by Jin et al. [41] as well as Ishibuchi et al. [42]; analytical solutions of Pareto frontiers are provided by the work of Deb et al. [40]. The regularization λ parameter is set to 0.01, even though its effect requires a further benchmark study. We compare the numerical performance between variants of srMO-BO-3GP, particularly for

• variations of acquisition function for the objective GP: PI, EI, and UCB;

• variations of acquisition function for the Pareto GP: PI, EI, and UCB;

• regularized versus non-regularized Tchebycheff objective function.

This results in 18 different variants of the srMO-BO-3GP. The name convention for these variants is Reg/NoReg-{PI,EI,UCB}-{PI,EI,UCB}, which corresponds the option of regularization, the objective GP, and the Pareto GP. Note that in the DTLZ benchmark, d = M + k − 1, and the function g(x) is constructed based on k = dM + 1 variables, $(xM,…,xd)$. The benchmark is repeated five times for each variant BO method and each benchmark function, resulting in almost 1000 runs in total. In all the optimization runs, five initial sampling points are randomly chosen using Monte Carlo sampling. The maximum number of iteration is 1500. For ZDT and DTLZ benchmarks, the dimensions are 3 and 4, respectively. For ZDT benchmark, the number of objectives is 2, whereas for DTLZ benchmark, the number of objectives is 3.

### 3.1 ZDT Test Suite

#### 3.1.1 ZDT1.

For a d-dimensional input x ∈ [0, 1]d, a more general ZDT1 function can be introduced [39]
$f1(x)=x1,f2(x)=gh$
(12)
where
$g=1+9∑i=2dxid−1,h=1−f1g$
(13)

#### 3.1.2 ZDT2.

For a d-dimensional input x ∈ [0, 1]d, the ZDT2 function can be described as [39]
$f1(x)=x1,f2(x)=gh$
(14)
where
$g=1+9∑i=2dxid−1,h=1−(f1g)2$
(15)

#### 3.1.3 ZDT3.

For a d-dimensional input x ∈ [0, 1]d, the ZDT3 function can be described as [39]
$f1(x)=x1,f2(x)=gh$
(16)
where
$g=1+9∑i=2dxid−1,h=1−f1g−(f1gsin(10πf1))$
(17)

#### 3.1.4 ZDT4.

For a d-dimensional input x ∈ [0, 1]1 × [ − 5, 5]d−1, the ZDT4 function can be described as [39]
$f1(x)=x1,f2(x)=gh$
(18)
where
$g=1+10(d−1)+∑i=2d[xi2−10cos(4πxi)],h=1−f1g$
(19)
The Pareto frontier are with g(x) = 1.

#### 3.1.5 ZDT6.

For a d-dimensional input x ∈ [0, 1]d, the ZDT6 function can be described as [39]
$f1(x)=1−exp(−4x1)sin6(6πx1),f2(x)=gh$
(20)
where
$g=1+9(∑i=2dxid−1)0.25,h=1−(f1g)2$
(21)

### 3.2 DTLZ Test Suite

#### 3.2.1 DTLZ1.

A generalized DTLZ1 function for a d-dimensional input x ∈ [0, 1]d with M objectives, f1:M, is defined as [40]
$f1(x)=0.5(1+g)∏i=1M−1xi$
(22)
$fk=2:M−1(x)=0.5(1+g)[∏i=1M−kxi](1−xM−k+1)$
(23)
$fM(x)=0.5(1+g)(1−x1)$
(24)
where $g=100(d+∑i=1d{(xi−0.5)2−cos[20π(xi−0.5)]})$.

#### 3.2.2 DTLZ2.

A generalized DTLZ2 function for a d-dimensional input x ∈ [0, 1]d with M objectives, f1:M, is defined as [40]
$f1(x)=0.5(1+g)∏i=1M−1cos(xiπ2)$
(25)
$fk=2:M−1(x)=0.5(1+g)[∏i=1M−kcos(xiπ2)]sin(xM−k+1π2)$
(26)
$fM(x)=0.5(1+g)sin(x1π2)$
(27)
where $g=∑i=Md(xi−0.5)2$.

#### 3.2.3 DTLZ3.

A generalized DTLZ2 function for a d-dimensional input x ∈ [0, 1]d with M objectives, f1:M, is defined as [40]
$f1(x)=0.5(1+g)∏i=1M−1cos(xiπ2)$
(28)
$fk=2:M−1(x)=0.5(1+g)[∏i=1M−kcos(xiπ2)]sin(xM−k+1π2)$
(29)
$fM(x)=0.5(1+g)sin(x1π2)$
(30)
where $g=100(k+∑i=Md{(xi−0.5)2−cos[20π(xi−0.5)]})$.

#### 3.2.4 DTLZ4.

A generalized DTLZ4 function for a d-dimensional input x ∈ [0, 1]d with M objectives, f1:M, is defined as [40]
$f1(x)=0.5(1+g)∏i=1M−1cos(xiαπ2)$
(31)
$fk=2:M−1(x)=0.5(1+g)[∏i=1M−kcos(xiαπ2)]sin(xM−k+1π2)$
(32)
$fM(x)=0.5(1+g)sin(x1π2)$
(33)
where $g=∑i=Md(xi−0.5)2$.

#### 3.2.5 DTLZ5.

A generalized DTLZ5 function for a d-dimensional input x ∈ [0, 1]d with M objectives, f1:M, is defined as [40]
$f1(x)=0.5(1+g)∏i=1M−1cos(θiπ2)$
(34)
$fk=2:M−1(x)=0.5(1+g)[∏i=1M−kcos(θiπ2)]sin(xM−k+1π2)$
(35)
$fM(x)=0.5(1+g)sin(x1π2)$
(36)
where $θi=π4(1+g)(1+2gxi)$, for 2 ≤ iM, $g=∑i=Md(xi−0.5)2$.

#### 3.2.6 DTLZ6.

A generalized DTLZ6 function for a d-dimensional input x ∈ [0, 1]d with M objectives, f1:M, is defined as [40]
$f1(x)=0.5(1+g)∏i=1M−1cos(θiπ2)$
(37)
$fk=2:M−1(x)=0.5(1+g)[∏i=1M−kcos(θiπ2)]sin(xM−k+1π2)$
(38)
$fM(x)=0.5(1+g)sin(x1π2)$
(39)
where $θi=π4(1+g)(1+2gxi$ for 2 ≤ iM, $g=∑i=Mdxi0.1$.

### 3.3 Performance Metrics.

We followed and adopted the notation from Jin et al. [41] and Hernández-Lobato et al. [15], to measure numerical performance for our proposed method, namely the generational distance (GD), the inverted generational distance (IGD), and the hypervolume (HV). These metrics also require prior knowledge of the true Pareto frontier. For the sake of clarity, we denote PF* as a set of the discretized points on the true Pareto frontier of an arbitrary benchmark function, and PF as the set of obtained Pareto frontier.

#### 3.3.1 Generational Distance (GD).

The generational distance between PF* and PF, denoted as GD, is computed as
$GD(PF,PF*)=∑u∈PFd(u,PF*)|#PF|$
(40)
where d is the minimum Euclidean distance between an obtained Pareto-dominant point and the true Pareto frontier. In other words, $d(u,PF*)=argminv∈PF*d(u,v)$. The denominator, |#PF|, is the number of obtained solution in PF. A smaller value of GD indicates a better convergence to the true Pareto frontier.

#### 3.3.2 Inverted Generational Distance (IGD).

$IGD(PF,PF*)=∑v∈PF*d(PF,v)|#PF*|$
(41)
where $d(PF,v)=argminv∈PF*d(u,v)$ is the minimum Euclidean distance between v on the true Pareto frontier with the obtained Pareto frontier. The denominator, |#PF*|, is the number of discretization Pareto-dominant point. As pointed out by Jin et al. [41], if |#PF*| is sufficiently large, i.e., when the true Pareto frontier is sufficiently discretized, then IGD can measure both convergence and diversity. A smaller of IGD indicates a better convergence and diversity to the true Pareto frontier.

#### 3.3.3 Log of Relative Hypervolume Difference (LRHD).

The hypervolume metric, which is a strictly monotonic measure, also known as Lesbegue measure or $S$-metric, is computed using Walking Fish Group algorithms used [43,44]. We adopt the log of relative hypervolume difference from Hernández-Lobato et al. [15] to measure the performance. The log of relative hypervolume difference is computed as
$LRHD=log(|HV−HVideal|)$
(42)
where HVideal is the ideal hypervolume and HV > HVideal is the computed hypervolume of the obtained Pareto frontier. The origin is chosen as the reference point to compute the hypervolume. A smaller of LRHD indicates a better convergence and diversity to the true Pareto frontier.

### 3.4 Comparison Between Variants.

In Figs. 1 and 2, the label of the x-axis is noted by the objective acquisition function, followed by the Pareto acquisition function. For example, UCB-PI means UCB is used as the objective acquisition function for the scalarized Tchebycheff, and PI is used to search for the Pareto frontier. Seaborn package [45] is used to visualize the performance metrics, measure the log of generational distance, log of inverted generational distance, and log of relative hypervolume difference.

Fig. 1
Fig. 1
Close modal
Fig. 2
Fig. 2
Close modal

Figure 1 shows the comparison between 18 variants of the srMO-BO-3GP methods in ZDT test suite, including regularized versus non-regularized augmented Tchebycheff scalarized single-objective functions. Overall, regularized objective function tends to slightly outperform non-regularized objective function. For the Pareto frontier acquisition function, EI and PI acquisition functions perform better than UCB acquisition function. However, for the objective acquisition function, UCB seems to perform best, then to EI, then to PI.

Figure 2 shows the comparison between 18 variants of the srMO-BO-3GP methods in DTLZ test suite, including regularized versus non-regularized augmented Tchebycheff scalarized single-objective functions. Again, regularization seems to slightly improve the performance in terms of hypervolume. For the Pareto acquisition function, all UCB, EI, and PI acquisition functions seem to perform on par with each other. For the objective acquisition function, UCB outperforms both EI and PI, and EI and PI are of the same performance. Figure 3 compares Pareto frontiers obtained by the proposed algorithm against the true Pareto frontiers, showing an acceptable agreement in ZDT and DTLZ benchmarks. Overall, the proposed algorithm performs relatively well on most benchmarks, except for those highly oscillatory with multiple modes, but the optimum” objectives are reasonably close with the global optimum” objectives.

Fig. 3
Fig. 3
Close modal

## 4 Engineering Applications: Thermomechanical Finite Element for Flip-Chip Package Design Optimization

We demonstrate the applicability of our proposed framework to a thermomechanical FEM model for flip-chip package design, where five objectives are considered. Figure 4 shows the geometric model of the thermomechanical problem, where the mesh density varies for different levels of fidelity. Two design variables are associated with the die, three are associated with the substrate, three more are associated with the stiffener ring, two are with the underfill, and the last one is with the PCB board. Only two levels of fidelity are considered in this example. Table 1 show the design variables, the physical meaning of the design variables, as well as their lower and upper bounds in this case study.

Fig. 4
Fig. 4
Close modal
Table 1

Design variables for the FCBGA design optimization

VariableDesign partLower boundUpper boundOptimal value
x1Die200003000020702
x2Die300750320
x3Substrate300004000035539
x4Substrate10018001614
x5Substrate10 · 10−617 · 10−617 · 10−6
x6Stiffener ring200060004126
x7Stiffener ring10025001646
x8Stiffener ring8 · 10−625 · 10−68.94 · 10−6
x9Underfill1.03.01.52
x10Underfill0.51.00.804
x11PCB board12.0 · 10−616.7 · 10−616.7 · 10−6
VariableDesign partLower boundUpper boundOptimal value
x1Die200003000020702
x2Die300750320
x3Substrate300004000035539
x4Substrate10018001614
x5Substrate10 · 10−617 · 10−617 · 10−6
x6Stiffener ring200060004126
x7Stiffener ring10025001646
x8Stiffener ring8 · 10−625 · 10−68.94 · 10−6
x9Underfill1.03.01.52
x10Underfill0.51.00.804
x11PCB board12.0 · 10−616.7 · 10−616.7 · 10−6

After the numerical solution is obtained, the component warpage at 20°C, 200°C, and the strain energy density of the furthest solder joint are calculated. The five objectives are listed as follows,

• Objective-1: warpage at 20°C,

• Objective-2: warpage at 200°C,

• Objective-3: damage metric for BGA lifetime prediction,

• Objective-4: damage metric for C4 interconnection lifetime prediction,

• Objective-5: first principal stress at C4 corner, where cracking and delamination frequently occurs,

where the goal is to minimize all these objectives.

Figure 5(a) shows the scatter plot matrix between the five objectives and their joint densities, whereas Fig. 5(b) presents the Pearson correlation, which is bounded between (−1) and (+1). 851 simulations are performed, in which only 256 simulations are feasible. Out of 256 feasible simulations, 84 of those represent the current Pareto frontier of this numerical study. The objective 1 is very positively correlated with the objective 2, as both of them corresponds to the warpages at different temperatures. This implies that minimizing the object 1 would also minimize the objective 2, thus no trade-off is found. The same argument applies for objectives 3 and 5, where positive correlation is found. The objective 4 is poorly correlated with other objectives. Trade-offs are found between members of the group of objectives 1 and 2 and those of the group of objective 3 and 5. Overall, it is challenging to plot the Pareto front on high-dimensional space for a practical visualization. However, we have demonstrated that our methodology is applicable to problems with many objective functions. In practice, the number of objectives are often reduced to minimum, before the optimization study is conducted.

Fig. 5
Fig. 5
Close modal

## 5 Discussion

BO is a powerful and flexible framework which allows for many useful extensions. For example, the local GP approach [46,47] could be used to improve the scalability of GP. An extended version of local GP has also been developed [48] to solve the mixed-integer optimization problems, where the number of discrete/categorical variables is relatively small. Accelerated BO methods by exploiting computational resource on high-performance computing platform have been proposed [49,50] to reduce the amount of physical waiting time. Multi-fidelity BO approaches [51,52] have been developed to couple information between different levels of fidelity by exploiting the correlation between low- and high-fidelity models to reduce the computational efforts. Its success has been demonstrated, at least in the field of bioengineering [53] and computational fluid dynamics [54,55].

For multi-objective problems, hypervolume-based approaches remain more popular than the scalarization approaches. Furthermore, there are limitations in combining multiple objectives into single objective as this approach relies on the weights, which are randomly sampled and may lead to convergence and diversity issues. Perhaps, combining the Pareto classifier with a hypervolume-based approach would result in a better numerical performance. This topic remains as an interesting direction in the future.

An advantage of the proposed method is that the inclusion of the uncertain Pareto classifier makes the effort of locating the Pareto frontier easier in the input space $X$, regardless of whether the Pareto frontier is discontinuous, convex, or concave, because this uncertain Pareto classifier only considers the input space. As shown in this paper, the proposed method performs relatively well for multiple objectives in both ZDT and DTLZ test suites.

As usual, the predictive performance of GP is highly dependent on the dimensionality of the problem, i.e. the number of design variables, and the characteristics of the underlying function, which is typically assumed to be smooth. It is worthy to note that the proposed algorithm does not depend on the number of objective functions, as the Pareto classifier is constructed directly on the input space $X$ and the weighted Tchebycheff scheme combines multiple objectives to a single objective. Therefore, the proposed srMO-BO-3GP method has a competitive advantage for problems with high number of objective functions. Furthermore, it does not depend on the characteristics of the Pareto frontiers, i.e., whether they are convex, concave, or discontinuous. The known constraints do not have any effect on the optimization problem, as we assume they can be cheaply evaluated and the functional evaluation f(·) would not be invoked if one of the known constraints gk is violated. The number of unknown constraints also does not have any effect on the optimization problems, as we are primarily concerned with the input space $X$, and the probability of satisfying all unknown constraints is modeled by another GP, which can be considered as a black-box function. In summary, the main factor that may have the most impact on the numerical performance of srMO-BO-3GP is the dimensionality of the optimization problem.

The drawback of Tchebycheff decomposition for MO problems is that it can only identify a unique Pareto solution point for a fixed weight vector at best (cf. Miettinen [56]). The diversity of the Pareto frontier solely depends on the weight vectors, which is uniformly distributed in this case. Furthermore, because the Tchebycheff scheme does not consider the current Pareto frontiers, it does not promote diversity the Pareto frontiers, even though one might argue that the convergence is preserved. The second term in the composite acquisition function (Eq. (10)), which corresponds to the Pareto classifier, is inserted to promote the diversity in Pareto frontiers and bypass the weight adjustment problem.

As theoretically formulated in the BO framework, the acquisition function strikes for the balance of exploration and exploitation, which would probabilistically converge to the global optimum, regardless of the number and locations of initial sampling points, as described in Bull [57]. Strong theoretical guarantees are often provided in the case of single objective function as BO is well known as a no-regret optimization algorithm. Along with the Tchebycheff argument in Miettinen [56], where optimizing with a fixed weight vector would lead to a unique Pareto point, the proposed srMO-BO-3GP method does not depend on the initial sampling points, at least from a theoretical point of view.

The proposed method does not cope well with high-dimensional problems, as they are the usual challenges for GP. While some solutions have been proposed, the scalability problem remains as an active research area and is posed as the future extension to this work.

## 6 Conclusion

In this paper, we propose a multi-objective Bayesian optimization framework, called srMO-BO-3GP in a sequential setting with applications to engineering-based simulations. In this framework, three distinct GPs are coupled together. The first GP is used to approximate a single-objective function, which is converted from the original multi-objective functions using a regularization-augmented Tchebycheff function. The second GP is used to learn the unknown constraints, which are evaluated simultaneously with the set of objective functions, supporting the general case where some performance metrics reflect design aspirations (objectives) while others must satisfy prescribed requirements (constraints). The third GP is used as an uncertain binary classifier to learn the Pareto frontier, where its own acquisition function is embedded in the main acquisition function. The srMO-BO-3GP framework is demonstrated using two numerical benchmarking test suites, ZDT and DTLZ, and an engineering thermomechanical FEM application.

## Acknowledgment

The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.

## Conflict of Interest

There are no conflicts of interest.

## References

1.
Allmendinger
,
R.
,
Emmerich
,
M. T.
,
Hakanen
,
J.
,
Jin
,
Y.
, and
Rigoni
,
E.
,
2017
, “
Surrogate-Assisted Multicriteria Optimization: Complexities, Prospective Solutions, and Business Case
,”
J. Multi-Criteria Decis. Anal.
,
24
(
1–2
), pp.
5
24
.
2.
Rojas-Gonzalez
,
S.
, and
Van Nieuwenhuyse
,
I.
,
2019
, “
A Survey on Kriging-Based Infill Algorithms for Multi-Objective Simulation Optimization
”.
FEB Research Report KBI_1907
.
3.
Jeong
,
S.
, and
Obayashi
,
S.
,
2005
, “
Efficient Global Optimization (EGO) for Multi-Objective Problem and Data Mining
,”
2005 IEEE Congress on Evolutionary Computation
,
Edinburgh, Scotland
, Vol.
3
,
IEEE
, pp.
2138
2145
.
4.
Knowles
,
J.
,
2006
, “
ParEGO: a Hybrid Algorithm With On-Line Landscape Approximation for Expensive Multiobjective Optimization Problems
,”
IEEE Trans. Evol. Comput.
,
10
(
1
), pp.
50
66
.
5.
Davins-Valldaura
,
J.
,
Moussaoui
,
S.
,
Pita-Gil
,
G.
, and
Plestan
,
F.
,
2017
, “
ParEGO Extensions for Multi-Objective Optimization of Expensive Evaluation Functions
,”
J. Global Optim.
,
67
(
1–2
), pp.
79
96
.
6.
Dächert
,
K.
,
Gorski
,
J.
, and
Klamroth
,
K.
,
2012
, “
An Augmented Weighted Tchebycheff Method With Adaptively Chosen Parameters for Discrete Bicriteria Optimization Problems
,”
Comput. Oper. Res.
,
39
(
12
), pp.
2929
2943
.
7.
Zhang
,
Q.
,
Liu
,
W.
,
Tsang
,
E.
, and
Virginas
,
B.
,
2009
, “
Expensive Multiobjective Optimization by MOEA/D with Gaussian Process Model
,”
IEEE Trans. Evol. Comput.
,
14
(
3
), pp.
456
474
.
8.
Li
,
M.
,
Li
,
G.
, and
Azarm
,
S.
,
2008
, “
A Kriging Metamodel Assisted Multi-Objective Genetic Algorithm for Design Optimization
,”
J. Mech. Des.
,
130
(
3
), p.
031401
.
9.
Li
,
G.
,
Li
,
M.
,
Azarm
,
S.
,
Al Hashimi
,
S.
,
Al Ameri
,
T.
, and
Al Qasas
,
N.
,
2009
, “
Improving Multi-Objective Genetic Algorithms With Adaptive Design of Experiments and Online Metamodeling
,”
Struct. Multidiscipl. Optim.
,
37
(
5
), pp.
447
461
.
10.
Feliot
,
P.
,
Bect
,
J.
, and
Vazquez
,
E.
,
2017
, “
A Bayesian Approach to Constrained Single-and Multi-Objective Optimization
,”
J. Global Optim.
,
67
(
1–2
), pp.
97
133
.
11.
Yang
,
K.
,
Emmerich
,
M.
,
Deutz
,
A.
, and
Bäck
,
T.
,
2019
, “
Multi-Objective Bayesian Global Optimization Using Expected Hypervolume Improvement Gradient
,”
Swarm Evol. Comput.
,
44
, pp.
945
956
.
12.
,
H.
, and
Tovar
,
A.
,
2020
, “
A Simple and Effective Methodology to Perform Multi-Objective Bayesian Optimization: An Application in the Design of Sandwich Composite Armors for Blast Mitigation
,”
International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
, Vol.
84003
,
Virtual, Online
,
American Society of Mechanical Engineers
, p.
V11AT11A056
.
13.
Gupta
,
S.
,
Shilton
,
A.
,
Rana
,
S.
, and
Venkatesh
,
S.
,
2018
, “
Exploiting Strategy-Space Diversity for Batch Bayesian Optimization
,”
International Conference on Artificial Intelligence and Statistics
,
Playa Blanca, Lanzarote, Canary Islands
,
PMLR
, pp.
538
547
.
14.
Shu
,
L.
,
Jiang
,
P.
,
Shao
,
X.
, and
Wang
,
Y.
,
2020
, “
A New Multi-Objective Bayesian Optimization Formulation With the Acquisition Function for Convergence and Diversity
,”
ASME J. Mech. Des.
,
142
(
9
), p.
091703
.
15.
Hernández-Lobato
,
D.
,
Hernández-Lobato
,
J.
,
Shah
,
A.
, and
,
R.
,
2016
, “
Predictive Entropy Search for Multi-Objective Bayesian Optimization
,”
International Conference on Machine Learning
,
PMLR
, pp.
1492
1501
.
16.
Garrido-Merchán
,
E. C.
, and
Hernández-Lobato
,
D.
,
2019
, “
Predictive Entropy Search for Multi-Objective Bayesian Optimization With Constraints
,”
Neurocomputing
,
361
, pp.
50
68
.
17.
Abdolshah
,
M.
,
Shilton
,
A.
,
Rana
,
S.
,
Gupta
,
S.
, and
Venkatesh
,
S.
,
2019
, “
Cost-Aware Multi-Objective Bayesian Optimisation
.”
arXiv preprint
.
18.
Gaudrie
,
D.
,
Le Riche
,
R.
,
Picheny
,
V.
,
Enaux
,
B.
, and
Herbert
,
V.
,
2019
, “
Targeting Solutions in Bayesian Multi-Objective Optimization: Sequential and Batch Versions
,”
Ann. Math. Artif. Intell.
,
88
(
1
), pp.
1
26
.
19.
Palar
,
P. S.
,
Zuhal
,
L. R.
,
Chugh
,
T.
, and
Rahat
,
A.
,
2020
, “
On the Impact of Covariance Functions in Multi-Objective Bayesian Optimization for Engineering Design
,”
AIAA Scitech 2020 Forum
,
AIAA
, p.
1867
.
20.
Golovin
,
D.
, and
Zhang
,
Q.
,
2020
, “
Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization
.”
arXiv preprint
.
21.
Han
,
D.
, and
Zheng
,
J.
,
2020
, “
A Kriging Model-Based Expensive Multi-Objective Optimization Algorithm Using R2 Indicator of Expectation Improvement
,”
Math. Probl. Eng.
,
2020
.
22.
Zhao
,
G.
,
Arroyave
,
R.
, and
Qian
,
X.
,
2018
, “
Fast Exact Computation of Expected Hypervolume Improvement.
arXiv preprint
.
23.
Beume
,
N.
,
Fonseca
,
C. M.
,
Lopez-Ibanez
,
M.
,
Paquete
,
L.
, and
Vahrenhold
,
J.
,
2009
, “
On the Complexity of Computing the Hypervolume Indicator
,”
IEEE Trans. Evol. Comput.
,
13
(
5
), pp.
1075
1082
.
24.
Tran
,
A.
,
Eldred
,
M.
,
Wang
,
Y.
, and
McCann
,
S.
,
2020
, “
srMO-BO-3GP: A Sequential Regularized Multi-Objective Constrained Bayesian Optimization for Design Applications
,”
Proceedings of the ASME 2020 IDETC/CIE, Vol. Volume 1: 40th Computers and Information in Engineering Conference of International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
Virtual, Online
,
American Society of Mechanical Engineers
.
25.
Lin
,
X.
,
Zhen
,
H.-L.
,
Li
,
Z.
,
Zhang
,
Q.
, and
Kwong
,
S.
,
2018
, “
A Batched Scalable Multi-Objective Bayesian Optimization Algorithm
.”
arXiv preprint
.
26.
Shahriari
,
B.
,
Swersky
,
K.
,
Wang
,
Z.
,
,
R. P.
, and
de Freitas
,
N.
,
2016
, “
Taking the Human out of the Loop: A Review of Bayesian Optimization
,”
Proc. IEEE
,
104
(
1
), pp.
148
175
.
27.
Ishibuchi
,
H.
,
Sakane
,
Y.
, and
Nojima
,
Y.
,
2010
, “
Use of Multiple Grids with Different Scalarizing Functions in MOEA/D
,”
In SCIS & ISIS SCIS & ISIS 2010
,
Okayama, Japan
,
Japan Society for Fuzzy Theory and Intelligent Informatics
, pp.
898
903
.
28.
Trivedi
,
A.
,
Srinivasan
,
D.
,
Sanyal
,
K.
, and
Ghosh
,
A.
,
2016
, “
A Survey of Multiobjective Evolutionary Algorithms Based on Decomposition
,”
IEEE Trans. Evol. Comput.
,
21
(
3
), pp.
440
462
.
29.
Bentley
,
J. L.
,
1975
, “
Multidimensional Binary Search Trees Used for Associative Searching
,”
Commun. ACM
,
18
(
9
), pp.
509
517
.
30.
Hastie
,
T.
,
Rosset
,
S.
,
Zhu
,
J.
, and
Zou
,
H.
,
2009
, “
,”
Stat. Its Interface
,
2
(
3
), pp.
349
360
.
31.
Breiman
,
L.
,
2001
, “
Random Forests
,”
Mach. Learn.
,
45
(
1
), pp.
5
32
.
32.
Hearst
,
M. A.
,
Dumais
,
S. T.
,
Osuna
,
E.
,
Platt
,
J.
, and
Scholkopf
,
B.
,
1998
, “
Support Vector Machines
,”
IEEE Intell. Syst. Appl.
,
13
(
4
), pp.
18
28
.
33.
Suykens
,
J. A.
, and
Vandewalle
,
J.
,
1999
, “
Least Squares Support Vector Machine Classifiers
,”
Neural Process. Lett.
,
9
(
3
), pp.
293
300
.
34.
LeCun
,
Y.
,
Bengio
,
Y.
, and
Hinton
,
G.
,
2015
, “
Deep Learning
,”
Nature
,
521
(
7553
), p.
436
.
35.
Rasmussen
,
C. E.
,
2004
, “
Gaussian Processes in Machine Learning
,”
Advanced Lectures on Machine Learning
,
Springer
, pp.
63
71
.
36.
Hansen
,
N.
,
Müller
,
S. D.
, and
Koumoutsakos
,
P.
,
2003
, “
Reducing the Time Complexity of the Derandomized Evolution Strategy With Covariance Matrix Adaptation (CMA-ES)
,”
Evol. Comput.
,
11
(
1
), pp.
1
18
.
37.
Hansen
,
N.
, and
Kern
,
S.
,
2004
, “
Evaluating the CMA Evolution Strategy on Multimodal Test Functions
,”
International Conference on Parallel Problem Solving from Nature
,
Birmingham, UK
,
Springer
, pp.
282
291
.
38.
Huband
,
S.
,
Hingston
,
P.
,
Barone
,
L.
, and
While
,
L.
,
2006
, “
A Review of Multiobjective Test Problems and a Scalable Test Problem Toolkit
,”
IEEE Trans. Evol. Comput.n
,
10
(
5
), pp.
477
506
.
39.
Zitzler
,
E.
,
Deb
,
K.
, and
Thiele
,
L.
,
2000
, “
Comparison of Multiobjective Evolutionary Algorithms: Empirical Results
,”
Evol. Comput.
,
8
(
2
), pp.
173
195
.
40.
Deb
,
K.
,
Thiele
,
L.
,
Laumanns
,
M.
, and
Zitzler
,
E.
,
2005
, “
Scalable Test Problems for Evolutionary Multiobjective Optimization
,”
Evolutionary Multiobjective Optimization
,
Springer
, pp.
105
145
.
41.
Jin
,
Y.-F.
,
Yin
,
Z.-Y.
,
Zhou
,
W.-H.
, and
Huang
,
H.-W.
,
2019
, “
Multi-Objective Optimization-Based Updating of Predictions During Excavation
,”
Eng. Appl. Artif. Intell.
,
78
, pp.
102
123
.
42.
Ishibuchi
,
H.
,
Masuda
,
H.
, and
Nojima
,
Y.
,
2015
, “
Pareto Fronts of Many-Objective Degenerate Test Problems
,”
IEEE Trans. Evol. Comput.
,
20
(
5
), pp.
807
813
.
43.
Cox
,
W.
, and
While
,
L.
,
2016
, “
Improving the IWFG Algorithm for Calculating Incremental Hypervolume
,”
2016 IEEE Congress on Evolutionary Computation (CEC)
,
,
IEEE
, pp.
3969
3976
.
44.
While
,
L.
,
2005
, “
A New Analysis of the LebMeasure Algorithm for Calculating Hypervolume
,”
In International Conference on Evolutionary Multi-Criterion Optimization
,
Guanajuato, Mexico
,
Springer
, pp.
326
340
.
45.
Michael Waskom, and the seaborn development team
,
2020
46.
Tran
,
A.
,
He
,
L.
, and
Wang
,
Y.
,
2018
, “
An Efficient First-Principles Saddle Point Searching Method Based on Distributed Kriging Metamodels
,”
ASCE-ASME J. Risk Uncertainty in Eng. Syst., Part B: Mech. Eng.
,
4
(
1
), p.
011006
.
47.
,
R.
,
Chan
,
Y.-C.
,
Wang
,
L.
,
Zhu
,
P.
, and
Chen
,
W.
,
2019
, “
Globally Approximate Gaussian Processes for Big Data With Application to Data-Driven Metamaterials Design
,”
ASME J. Mech. Des.
,
141
(
11
), p.
111402
.
48.
Tran
,
A.
,
Tran
,
M.
, and
Wang
,
Y.
,
2019
, “
Constrained Mixed-Integer Gaussian Mixture Bayesian Optimization and Its Applications in Designing Fractal and Auxetic Metamaterials
,”
Struct. Multidiscipl. Optim.
,
59
, pp.
1
24
.
49.
Tran
,
A.
,
Sun
,
J.
,
Furlan
,
J. M.
,
Pagalthivarthi
,
K. V.
,
Visintainer
,
R. J.
, and
Wang
,
Y.
,
2019
, “
pBO-2GP-3B: A Batch Parallel Known/unknown Constrained Bayesian Optimization With Feasibility Classification and Its Applications in Computational Fluid Dynamics
,”
Comput. Methods. Appl. Mech. Eng.
,
347
, pp.
827
852
.
50.
Tran
,
A.
,
McCann
,
S.
,
Furlan
,
J. M.
,
Pagalthivarthi
,
K. V.
,
Visintainer
,
R. J.
,
Wildey
,
T.
, and
Eldred
,
M.
,
2020
, “
aphBO-2GP-3B: A Budgeted Asynchronous-Parallel Multi-Acquisition for Known/Unknown Constrained Bayesian Optimization on High-Performing Computing Architecture
.”
arXiv preprint
.
51.
Tran
,
A.
,
Wildey
,
T.
, and
McCann
,
S.
,
2019
, “
sBF-BO-2CoGP: A Sequential Bi-Fidelity Constrained Bayesian Optimization for Design Applications
,”
Proceedings of the ASME 2019 IDETC/CIE, Vol. Volume 1: 39th Computers and Information in Engineering Conference of International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
Anaheim, CA
,
American Society of Mechanical Engineers
, p.
V001T02A073
.
52.
Tran
,
A.
,
Wildey
,
T.
, and
McCann
,
S.
,
2020
, “
sMF-BO-2CoGP: A Sequential Multi-fidelity Constrained Bayesian Optimization for Design Applications
,”
ASME J. Comput. Inf. Sci. Eng.
,
20
(
3
), p. 031007.
53.
Travaglino
,
S.
,
Murdock
,
K.
,
Tran
,
A.
,
Martin
,
C.
,
Liang
,
L.
,
Wang
,
Y.
, and
Sun
,
W.
,
2020
, “
Computational Optimization Study of Transcatheter Aortic Valve Leaflet Design Using Porcine and Bovine Leaflets
,”
ASME J. Biomech. Eng.
,
142
(
1
), p.
011007
.
54.
Tran
,
A.
,
Furlan
,
J. M.
,
Pagalthivarthi
,
K. V.
,
Visintainer
,
R. J.
,
Wildey
,
T.
, and
Wang
,
Y.
,
2019
, “
WearGP: A Computationally Efficient Machine Learning Framework for Local Erosive Wear Predictions Via Nodal Gaussian Processes
,”
Wear
,
422
, pp.
9
26
.
55.
Tran
,
A.
,
Wang
,
Y.
,
Furlan
,
J.
,
Pagalthivarthi
,
K. V.
,
Garman
,
M.
,
Cutright
,
A.
, and
Visintainer
,
R. J.
,
2020
, “
WearGP: A UQ/ML Wear Prediction Framework for Slurry Pump Impellers and Casings
,”
ASME 2020 Fluids Engineering Division Summer Meeting
,
Virtual, Online
,
American Society of Mechanical Engineers
.
56.
Miettinen
,
K.
,
2012
,
Nonlinear Multiobjective Optimization
, Vol.
12
.
Springer Science & Business Media
,
New York
.
57.
Bull
,
A. D.
,
2011
, “
Convergence Rates of Efficient Global Optimization Algorithms
,”
J. Mach. Learn. Res.
,
12
, pp.
2879
2904
.