## Abstract

Bayesian optimization (BO) is a low-cost global optimization tool for expensive black-box objective functions, where we learn from prior evaluated designs, update a posterior surrogate Gaussian process model, and select new designs for future evaluation using an acquisition function. This research focuses upon developing a BO model with multiple black-box objective functions. In the standard multi-objective (MO) optimization problem, the weighted Tchebycheff method is efficiently used to find both convex and non-convex Pareto frontiers. This approach requires knowledge of utopia values before we start optimization. However, in the BO framework, since the functions are expensive to evaluate, it is very expensive to obtain the utopia values as a prior knowledge. Therefore, in this paper, we develop a MO-BO framework where we calibrate with multiple linear regression (MLR) models to estimate the utopia value for each objective as a function of design input variables; the models are updated iteratively with sampled training data from the proposed MO-BO. These iteratively estimated mean utopia values are used to formulate the weighted Tchebycheff MO acquisition function. The proposed approach is implemented in two numerical test examples and one engineering design problem of optimizing thin tube geometries under constant loading of temperature and pressure, with minimizing the risk of creep-fatigue failure and design cost, along with risk-based and manufacturing constraints. Finally, the model accuracy with frequentist, Bayesian and without MLR-based calibration are compared to true Pareto solutions.

## 1 Introduction

In the early design phase, it is very important for the designers to be able to identify potential good design decisions in a large design space while the design cost is low. This helps the designers to easily eliminate the undesirable designs and avoid investing in high cost manufacturing and testing of those designs at the later design phase. In practice, most of the design problems are too complex to be handled by simple optimization frameworks due to having constraints in cost, time, formulation, etc. Also, approximating a complex design problem into much simpler problems can lead to the disregard of original complex constraints; thus it can violate such constraints and not be a useful approach for practical decisions. Some practical design problems have been investigated where complex optimization frameworks have been modeled [1–3]. However, in many design optimization problems, it is difficult to numerically formulate an objective function and therefore we consider those problems as having a black-box objective function with high function evaluation cost due to limited resources, tools, time, etc. Thus, a trade-off between learning and expense is likely and therefore, a low-fidelity surrogate model is developed to reduce the cost. When we have no/limited knowledge on the expensive true objective function, we cannot guarantee the maximization of our learning towards an optimal solution without proper guidance or expertise. Also, due to the mentioned high function evaluation cost, exhaustive search is not a valid option. In such black-box engineering design problems, a Bayesian optimization (BO) technique, which eliminates the need of standard formulation of objective functions [4–6], is widely applied in sequential learning to provide better guidance in sampling the designs for expensive experiments or function evaluations in order to find the optimal region of the unknown function at minimal cost of experiments. In this approach, we first build a posterior surrogate model, given the data from the current evaluations. We then use this model to strategically select the best design locations for future evaluations by maximizing the acquisition functions defined from the posterior model.

*Motivation:* In the design of mechanical components under temperature and pressure cycling, such as heat exchangers, the design decisions are defined as optimal geometries which focus on multiple objectives, such as minimizing the risk of creep-fatigue failure and the design cost under continuous cycling of temperature and pressure. The stated risk depends on the location of the structural design in regions such as elastic, shakedown, plastic, or ratcheting. A similar example has been provided for thin tubes where the location of the designs can be numerically represented from a Bree diagram [7] (Fig. 8 in Appendix A) in terms of pressure and thermal stresses. Under cyclic loading, the elastic and shakedown regions in the Bree diagram are considered as the safe region where no strain accumulation occurs or the growth of residual strain is practically diminishing when sufficient loading cycles are applied. However, plastic and ratcheting in the Bree diagram are unsafe designs where the plastic strain accumulates until failure. The cost of the design involves the total cost in the entire design process and thus can be based on material, component, manufacturing, assembly, and quality checking. When the complexity of the design increases like in a diffusion bonded compact heat exchanger [8], we cannot provide a numerical representation of the function which defines the location of designs. Also, considering different costs in the design process makes the cost function hard to formulate. This makes both the objectives such as risk and cost functions to be black-box functions, representing the problem as black-box multi-objective (MO) design problem.

In an optimization problem with multiple objectives, it is a challenging task to pick a best optimization method, as the performance of the method depends upon the problem and its constraints, as well as dimension, functional form, computational cost, etc. Although there are many MO formulations, the weighted Tchebycheff method (WTB) [9,10], a global criterion method, is the focus in this paper to find every Pareto-optimal solution from both convex and non-convex regions of the Pareto frontier. In a minimization problem, a point *x** in the feasible design space *S* is Pareto-optimal if and only if there does not exist another point *x* in the set *S* that gives a lower minimum objective function for at least one of the objectives *f*_{i} < *f*_{i}(*x**) without sacrificing the others *f*_{j} (*x*) ≤ *f*_{j} (*x**). However, this approach requires knowledge of utopia values, i.e., optimal values of each objective independently, before we start optimization. Thus, for any multi-objective optimization (MOO) problem having a minimum of two objectives, we need to solve three optimization problems with three different objective functions: one problem with objective function 1 (finding utopia 1), one problem with objective function 2 (finding utopia 2), and finally with MO function. It is obvious that with *q* objectives, the number of optimization problems increases to *q* + 1. In the BO framework, since the objective functions are black-box and expensive to evaluate, it is very expensive to conduct so many individual BO optimizations to obtain the utopia values as a prior knowledge. One way we can mitigate this issue is by providing a rough approximation of utopia points as global minimum or maximum values of the respective expensive black-box objectives [11,12]. For example, the global minimum values of risk and cost could be zero. However, the value might not be feasible due to different manufacturing and design constraints and therefore are not the true utopia values. In the WTB, at a given vector of weighting factors **w** on two objectives *f*_{1} and *f*_{2}, the WTB forms a rectangle and searches along the diagonal to find the Pareto-optimal solution on the Pareto frontier curve. Let us assume, in Fig. 1, that the true Pareto-optimal solution lies at (*C*) where true utopia (*u*) is shown. If the location of (*u*) is unknown and is estimated to be at position (*u*′) (in *X* or *Y*), we see with the same weighting factor (**w**) that the WTB will now project the Pareto-optimal solution to *C*′ (the diagonal is shifted). Thus, we cannot find the true optimal solution (*C*) for weighting factor (**w**), if we use the incorrect estimate for the utopia of *u*′. This motivates us to predict utopia values using low-cost statistical regression models instead of assuming certain values, such as zero. A regression model, given the existing data, is used as a proxy surrogate model for each expensive black-box utopia optimization. The estimation of utopia values, rather than assuming values, increases the accuracy of the local optimal solution at negligible computational cost. The goal of this paper is to formulate a weighted Tchebycheff multi-objective Bayesian optimization (WTB MO-BO) approach for expensive black-box optimization using adaptive sampling to minimize the overall cost for expensive function evaluations of the black-box objectives to find the optimal solutions. This approach avoids the scalability issue from running multiple BO to solve for each utopia value, properly utilizing sequential adaptive sampled training data (guided by WTB MO-BO) for better estimation of the utopia through calibration from cheap surrogate regression models, rather than having a fixed educated guess for the utopia. We start our work in this paper with a focus on linear regression models to obtain better model accuracy. However, the proposed approach can be extended to implement any linear or non-linear regression models based on the complexity of the problem.

To illustrate our approach, we have simplified a large-scale complex design of a heat exchanger to a thin tube design problem, which will be presented as the case study to solve using the proposed approach and to compare results with the known true solutions. The road-map of this paper is as follows. Section 2 provides an overview on BO and different MOO methods. Section 3 provides the general algorithm of the proposed weighted Tchebycheff MO-BO framework, which can be applicable to any MO problems with any number of objectives. Section 4 presents the implementation of the approach on numerical examples. Section 5 describes the case study of cyclic pressure–temperature loaded thin tube problem. Section 6 shows the results of the application of the proposed design architecture to the case study with detailed comparison. Section 7 concludes the paper with final thoughts.

## 2 Background

In this section, we provide literature reviews on BO in single and MO settings, different methods for formulating MO functions, and developing multiple linear regression (MLR) models in frequentist and Bayesian approaches.

### 2.1 Bayesian Optimization.

BO [13] is an emerging field of study in the sequential design methods. It is considered as a low-cost global optimization tool for design problems having expensive black-box objective functions. The general idea of the BO is to emulate an expensive unknown design space and find the local and global optimal locations while reducing the cost of function evaluation from expensive high-fidelity models. This approach has been widely used in many machine learning problems [14–18]. However, attempts have been made when the response is discrete like in consumer choice modeling problems where the responses are in terms of user preference [13,19]. The idea is to approximate the user preference discrete response function into continuous latent functions using a Binomial-Probit model for two choices [20,21] and a polychotomous regression model for more than two choices where the user can state no preference [22].

BO adopts a Bayesian perspective and assumes that there is a prior on the function; typically, we use a Gaussian process (GP) prior. The overall BO approach has two major components: a predictor or GP model and the acquisition function. As shown in Fig. 2, assuming a Gaussian prior, we first build a posterior GP model, given the current available data from the experiments. The surrogate GP model then predicts the objective or response of the samples generated from a design of experiments (DOE)-based sampling method within a design space. We then use this model to strategically select the best design locations for future experiments by maximizing the acquisition function, defined from the posterior simulations which are obtained from the GP model.

#### 2.1.1 Gaussian Process Model.

Figure 3 shows a simple 1D GP model with one design variable *x* and one response variable *z* = *f*(*x*). The dots are the experimental design variables and the dotted and solid lines are the true and the predictor mean functions or responses in the design space, given the observations. The shaded area along the solid line shows the measure of uncertainty over the surrogate Gaussian process model (GPM) prediction. We can clearly see that the variance near the observations is small and increases as the design samples are farther away from the observational data, thereby related to Kriging models where the errors are not independent. Much research has been ongoing regarding incorporating and quantifying uncertainty of the experimental or training data by using a nugget term in the predictor GPM. It has been found that the nugget provides better a solution and computational stability framework [23,24]. Furthermore, GPM has also been implemented in high dimensional design space exploration [25] and big data problems [26], as an attempt to increase computational efficiency. As an alternative to GPs, random forest regression has been proposed as an expressive and flexible surrogate model in the context of sequential model-based algorithm configuration [27]. Although random forests are good interpolators in the sense that they output good predictions in the neighborhood of training data, they are very poor extrapolators where the training data are far away [28]. This can lead to selecting redundant exploration (more experiments) in the non-interesting region as suggested by the acquisition function in the early iterations of the optimization, due to having additional prediction error of the region far away from the training data. This motivates us to consider the GP model in Bayesian framework. However, in MO settings (multiple responses), the GP model still fits each objective or response variables independently, thus assuming the objectives are uncorrelated. In this paper, we follow this assumption as well. A survey of implementation of different GP packages has been provided in different coding languages like matlab, R, and python [29].

#### 2.1.2 Acquisition Function.

The second major component in BO is the acquisition function whose goal is to guide the search for the optimum and thereby bring the sequential design into the BO. The acquisition function predicts the improvement of each sample. The improvement value depends on exploration (unexplored design spaces) and exploitation (region near high responses). Thus, the acquisition function gives high value of improvement to the samples whose mean prediction is high, variance is high, or both. Thus, by maximizing the acquisition function, we strategically select design points which have the potential to have the optimal (maximum value of the unknown function) and gradually reduce the error to align with the true unknown function with iterations. Over the years, various formulations have been applied to define the acquisition functions. One such method is the *probability of improvement* (PI) [30] which is an improvement-based acquisition function. Jones [31] notes that the performance of PI(.) “is truly impressive;” however, the difficulty is that the PI(.) method is extremely sensitive to the choice of the target. If the desired improvement is too small, the search will be highly local and will only move on to search globally after searching nearly exhaustively around the current best point. On the other hand, if the small-valued tolerance parameter *ζ* in PI(.) equation, as in Ref. [31], is set too high, the search will be excessively global, and the algorithm will be slow to fine-tune any promising solutions. Thus, the *expected improvement* (EI) acquisition function [13] is widely used over PI to provide an efficient trade-off between exploration and exploitation, with a good balance for locating promising solutions while avoiding any redundant evaluations through excessive search. Another acquisition function is the *confidence bound criteria* introduced by Cox and John [32], where the selection of points is based on the upper or lower confidence bound of the predicted design surface for maximization or minimization problem, respectively.

In recent years, two common approaches have been taken for solving black-box MO design problems using BO: (1) formulating a MO (hyper-volume) acquisition function and (2) building a combined (single) MO function and then treat it as a single acquisition function. As for the first approach, the *expected improvement hyper-volume* (EIHV) acquisition function has been formulated to provide better performance [33,34]. However, to increase the computational efficiency, Yang et al. [35] modified the EIHV acquisition function into the *expected hyper-volume gradient-based* (EIHVG) acquisition function and proposed an efficient algorithm to calculate it. To reduce the computational cost, other acquisition functions like the *max-value entropy search* [36] and *predictive entropy search* [37] have been formulated. Abdolshah et al. [38] proposed a MO-BO framework with preference over objectives on the basis of stability and attempting to focus on the Pareto front where preferred objectives are more stable. The first approach can posses significant model complexity in formulating multiple acquisition functions with a large number of objectives. The second approach does not possess such limitations but requires a special consideration of the choice of the method to formulate the MO. As our paper aims to limit challenges for scaling to high dimensional problems, we follow the second approach and choose a popular global criterion, the WTB. Recent work has used this approach, where the WTB has been augmented in MO-BO in Ref. [39] with a ridge regularization term to smoothen the converted single MO function. Our work focuses upon a modeling technique for the weighted Tchebychef-based MO-BO model to estimate the utopia using regression analysis.

### 2.2 Multi-Objective Optimization.

*weighted sum method*[40] where we transform all the objectives into single objective of weighted objectives. The method though simple and fast is inefficient to find the true Pareto-optimal points, mainly in the non-convex region. To increase the performance, a global criterion method, the WTB [9,10,41], is introduced where the MOs are combined into a weighted distance metric that guarantees finding all the Pareto-optimal solution. Another a priori method is the

*ε*-

*constraint*method where the most critical objective is picked and treated as a single objective problem and the other objectives are treated as constraints [42]. Other a priori methods include the

*Lexicographic*method where the objective functions are organized sequentially in order of preferences [43]. Some a posteriori methods used in MOO problems are

*vector evaluated genetic algorithm*[44],

*niche methods*[45],

*particle swarm algorithm*[46], Non-dominated sorting genetic algorithm II (

*NGSA 2)*[47], etc. Readers can refer Refs. [40,48] for application in practical MO design problems using the above-mentioned a priori and a posteriori methods, as well as Ref. [49] for additional methods. It is a challenging task to pick a best method, as the performance of the method depends on the problem and its constraints, as well as dimension, functional form, computational cost, etc. In this paper, we are focused upon a priori methods where we pre-define the weights of multiple black-box objectives (setting a goal for the acquisition function in the BO setting) and then search for the Pareto front using BO with the pre-defined weights. As for the a posteriori methods, it can be difficult to determine a convergence criteria in the BO setting, as the goal is not pre-set for the acquisition function and we need sufficient expensive sampling to search for the Pareto frontier. In this paper, we choose to focus on the WTB in formulating the proposed MO-BO framework due to the simplicity in the formulation of combining several multiple objectives into a combined single MO function, which can be treated with a single acquisition function in the BO architecture.

## 3 Design Methodology

Figure 4 shows the detailed structure of the proposed weighted Tchebycheff MOBO framework (weighted Tchebycheff MO-BO) with the augmentation (highlighted in red) of the regression models for unknown parameter (utopia point) estimation. This proposed design architecture can be applicable to any MO black-box problem, extending to any number of expensive black-box objectives. The steps within the brown bold lines are the low-cost computational framework in the MO-BO setting, whereas the steps within the green bold lines are where the expensive evaluations are conducted outside the previously discussed computational framework. The bold blue line connects the low-fidelity computational framework with the expensive evaluations at each iteration of the MO-BO. Below we provide the detailed explanation of the steps of the weighted Tchebycheff MO-BO.

### 3.1 Step-by-Step Explanation for Proposed Weighted Tchebycheff Multi-Objective Bayesian Optimization.

*Step 0 (Initialization):*Define the design space or the region of interest for a given problem. From the defined design space, generate a grid matrix $X`$ using a DOE approach. In this paper, we used the Latin hypercube sampling method. However, any space filling DOE approach can be used. Load any experimental data pre-sampled from expensive experimental analysis, or this algorithm can be initialized with limited randomly selected experimental data. Calculate the objectives and normalize the values. Normalizing each objective is preferred to avoid any discrepancies due to inconsistent magnitude between the objectives.*Step 1: Build a training data matrix with the sampled designs*: We build a training data matrix with the sampled data. We define*X*as design input variables and*Y*as output functions. Create the training data matrix, assuming at iteration*k*of the BO,**D**= [_{k}**X**,_{k}**Y(X**]._{k})*Step 2: Constraint validation*: Next, we validate the generated design grid matrix $X`$ and the initialized sampled data with any constraints if present for the problem. We select only the designs which do not violate the constraints. Let**X**be the feasible unsampled grid matrix of the input designs for which we did not do any expensive experimental analysis (true output is unknown) and_{f}**D**be the filtered feasible sampled data matrix of inputs and respective outputs from expensive analysis with_{f,k}**D**∈_{f,k}**D**._{k}*Step 3: Predicting the utopia point from regression analysis*: Conduct regression modeling using the feasible sampled data,**D**. The reason for not including infeasible data for our regression modeling is to avoid potentially influential observations which can increase the error in the regression modeling and thus the utopia prediction, having an adverse effect on the proposed MO-BO model. Thus, as general approach, it is better to consider only feasible designs for this regression modeling. Calculate the mean utopia values for all the black-box objectives of the problem, $\mu ^k(u)=[\mu ^k(u1),\mu ^k(u2),\u2026,\mu ^k(uq)]$ (assuming_{f,k}*q*expensive black-box objectives). These values, at iteration*k*+, are used in Step 6.*Step 4: GPM*: Next with the full available sampled data (prior knowledge),**D**, we can develop posterior surrogate GP models for each objectives independently. Thus, we develop a Bayesian framework. It is to be noted to build the GP model, unlike in Step 3 regression modeling, we can use the data matrix_{k}**D**which also contains infeasible data. This is because with more data, the GP model reduces uncertainty in the overall design space and provides better prediction where more data are sampled._{k}*Step 5*: Use the posterior GP model to conduct posterior predictive simulations of the unsampled feasible designs in the grid matrix**X**and predict respective mean and MSE of the objective outputs, forming two matrices of $\mu ^(Yf(Xf))\u2223Mk$ and $\sigma ^2(Yf(Xf))\u2223Mk$, respectively, where_{f}**M**=_{k}**GP**(**D**) represents the surrogate posterior GP model fitted with data_{k}**D**._{k}*Step 6: Define the acquisition function and maximize the acquisition function*: Define the weighted Tchebycheff MO acquisition function and maximize the acquisition function,**U**(.)*U*(.|**M**). The general equation of the weighted Tchebycheff MO function is_{k}where(2)$minX(Ymulti)=minX(maxq=1,2,\u2026,Q[wq(Yq\u2212uq)])$*w*_{q}is the weighting factor of*q*th objective,*Y*_{q}and*u*_{q}are the*q*th objective function values and its respective utopia point. Minimizing the maximum weighted distance from the utopia among the objective functions will provide the non-dominated solutions or Pareto-optimal solutions. In the case of multiple expensive black-box objectives,*Y*_{q}is estimated from the posterior GP model (step 5) and the respective utopia*u*_{q}is estimated from the fitted low computational cost regression model (step 3). Finally, Eq. (2) is transformed into maximizing the acquisition function of*Y*_{multi}, thus selecting samples for expensive function evaluations with higher likelihood of being a Pareto-optimal solution. Thus, Eq. (2) has been changed as follows:We calculated the acquisition function value for each feasible non-sampled designs individually, considering the respective mean and MSE values in matrices generated in step 5. Thus, we develop the vector of the acquisition function values as(3)$maxxfU(\u2212maxq=1,2,\u2026,Q(wq*|\mu ^(yf,q(xf)\u2223Mk)\u2212\mu ^k(uq)|))$**U**(−**Y**(_{multi}**X**)|_{f}**M**). It is to be noted that we transformed the multiple objectives into a single MO function using the Tchebycheff method, thus we can treat Eq. (3) as single objective acquisition function. Since the general setting in BO is the maximization of the acquisition function, we added a negative sign before the weighted Tchebycheff MO function in Eq. (3). In this paper, we applied an EI acquisition function. In the computation of EI for all the unsampled designs, the vectors of the predicted MSE of the respective weighted Tchebycheff MO function,_{k}*σ*^{2}(**Y**(_{multi}**X**)), are the predicted MSE of the individual objectives (from GP in Stage 5) which have maximum absolute weighted Tchebycheff distance. Thus, for a design_{f}*x*_{f}Finally, the EI acquisition function is formulated as Eqs. (5) and (6):(4)$\sigma 2^(ymulti(xf))=\sigma 2^yf,q(maxq=1,2,\u2026,Q(wq*|\mu ^(yf,q(xf)\u2223Mk)\u2212\mu ^k(uq)|)$(5)$U(yf,multi\u2223Mk)=EI(Yf,multi)={(\mu ^(yf,multi)\u2212ymulti(x+)\u2212\xi )*\Phi (Z,0,1)+\sigma ^(yf,multi)*\varphi (Z)if\sigma ^(yf,multi>0)0if\sigma ^(yf,multi=0)$where(6)$Z={\mu ^(yf,multi)\u2212ymulti(x+)\u2212\xi \sigma ^(yf,multi)if\sigma ^(yf,multi>0)0if\sigma ^(Yf,multi=0)$*y*_{multi}(*x*^{+}) is the maximum of the negative actual responses (for minimization problem) among all the sampled data until the current stage which is at*x*=*x*^{+}; $\mu ^(yf,multi)$ and $\sigma 2^(yf,multi)$ are the predicted mean and MSE from GPM for the non-sampled design*x*_{f}∈*X*_{f}; Φ(.) is the cdf;*ϕ*(.) is the pdf;*ξ*≥ 0 which is recommended to be 0.01 [15].A selection criterion is applied to choose new design location for future sampling,*x*_{f,max}∈**X**which will maximize the predicted improvement of the learning of the unknown design space (maximizing acquisition function). Thus we select the design with maximum acquisition function value as_{f}Finally, augment the data, $Dk=[Dk;(xf,max,yf,max)]$.(7)$yf,max(xf,max)=max(U(\u2212Ymulti(Xf)\u2223Mk))$*Step 7*: Check for convergence criteria 1. If met, stop and proceed to step 10. Else, run*j*= 1:*n*loops of steps 4–7; each loop takes one optimal design location*x*_{f,max,j}; to select the best*n*design locations**X**= [_{f,max}*x*_{f,max,1},..,*x*_{f,max,n}] to proceed to the next round of experiments. The reason for this step to provide multiple experimental data in a single round of an experiment is it will be time consuming to provide one experiment at a time. The assumption behind this step is that we believe that the GP prediction of*x*_{f,max,j}is accurate and proceed to the next best location*x*_{f,max,j+1}by minimizing the error in the current selected location*x*_{f,max,j}. We believe this is a fair assumption since with more knowledge, GP prediction will be close to the actual experimented data. Therefore, in the early round of experiments, though we might see deviations from the actual experimented results (not following the assumptions), with the knowledge from those experiments, eventually the GP will improve and provide predictions closer to the actual experimented results as the model goes to convergence.*Step 8: Expensive function evaluations*: Conduct experiments for new design locations**X**=_{k+1}**X**. This step is outside the model environment as actual expensive experiments or function evaluations will be conducted from the original high-fidelity model. Therefore, new experimented data are [_{f,max}**X**,_{k+1}**Y(X**)]._{k+1}*Step 9: Data augmentation*: Update the prior knowledge for the next iteration of the model. Update training data matrix of both the regression (step 3) and the GP models (step 4) with current sampled data as**D**= [_{f,k+1}**D**; (_{f,k}**X**,_{k+1}**Y(X**))] and_{k+1}**D**= [_{k+1}**D**; (_{k}**X**,_{k+1}**Y(X**))]. Repeat steps 3–9 until convergence._{k+1}*Step 10:*If convergence criteria 1 or 2 is met, STOP the model and note the optimal solution,**X**and_{opt}**Y**= [_{opt}*y*_{1}(**X**),_{opt}*y*_{2}(**X**), …,_{opt}*y*_{q}(**X**)]. Convergence criteria 1 and 2 will be explained in the next section._{opt}

### 3.2 Convergence Criteria.

In this section, we discuss the convergence criteria established for the model. From the steps of the proposed MO-BO model, there are two checks for convergence in the model in step 7 and in step 10. Note the convergence criteria in step 7 as convergence 1 and the convergence criteria in step 10 as convergence 2. If either of the convergence checks succeed, the model stops and returns the final solution. Below is the list of convergence criteria which are implemented in the models.

*Convergence 1.*The maximum improvement value of the acquisition function in selecting the first design sample (first iteration in step 7) after conducting actual experiments is less than very small value,*α*. Mathematically, it can be stated as(8)$ifj==1max[U(\u2212Ymulti(Xf)\u2223Mk)]\u2264\alpha $*Convergence 2.*Stopping the model after limiting the budget in terms of maximum number or experiments or function evaluations, i.e., $\u2211nk\u2265S$, where*S*is the maximum number of function evaluations possible and*n*_{k}is the number of samples selected for experiments at*k*th iteration.

## 4 Implementation on Numerical Examples

To see the general application of the proposed design architecture, we first implement using two numerical MO examples. As the scope of the proposed Bayesian framework in this paper is for solving expensive black-box MO problems, we assume the numerical functions are black-box in nature, for which we do not know the true utopia during the optimization process. We have considered here two numerical test problems (maximization): test problem (1): *ZDT1* and test problem (2): *2D six-hump camel back—inversed-Ackley’s path* MO functions. The full mathematical equations of the optimization problems ((22)–(B3)) are provided in the Appendix. For the *ZDT1* problem, we considered two design variables and assume that function 2 is only black-box, where we do not know the true utopia *u*_{2}. Function 1 is very simple and therefore we did not assume that to be a unknown black-box, knowing the true utopia, *u*_{1} = 1 prior to the optimization. Thus, for this problem, we compared the performance of WTB MO-BO with a fixed incorrect assumption of *u*_{2} = 0 and predicting *u*2 using regression analysis with MLR and Bayesian multiple linear regression (BMLR). For the *2D six-hump camel back—inversed-Ackley’s path*, we considered two design variables and assumed both functions are black-box, where we do not know *u*_{1} and *u*_{2}. Thus, for this problem, we compared the performance of WTB MO-BO with a fixed incorrect assumption of *u*_{1} = 0 and *u*_{2} = 0 and estimate those using regression analysis with MLR and BMLR.

Tables 1 and 2 show the overall quantitative measurement of the performance of the MO-BO design architectures across the weighting factors, *w*_{1} = [0, 0.1, …, 1] and *w*_{2} = 1 − *w*_{1}, in terms of overall accuracy (minimum mean Euclidean norm) and the overall variability in accuracy (minimum std. dev. Euclidean norm) among the weights of the true Pareto-optimal solutions. We stop each optimization problem using the convergence criteria 1 (Eq. (7)) with *α* = 10^{−5} met. We use the origin of the Euclidean norms as the respective true Pareto-optimal solutions, calculated numerically. Thus, the better method should have minimum mean Euclidean norms and minimum standard deviations (insensitive) to different weighting parameter values. From both the tables, we can clearly see the proposed WTB MO-BO method with calibrating the model to estimate utopia through regression analysis and provide better accuracy when compared to the true solutions. Furthermore, we do not see any significant difference on the performance considering frequentist MLR or Bayesian BMLR regression models on both the numerical examples. Although the linear model might not be best for estimating multi-modal or non-linear functions (e.g., test problem 2), it performs better than using an incorrect assumption for the utopia. In general, we see a trade-off between accuracy and function evaluations. However, regarding the first test problem (Table 1), the increase of the function evaluations is not significant. We do see higher function evaluations for test problem 2 (Table 2). One interesting observation in both tables is that the regression model which predicted the utopia with better solution accuracy also has lower function evaluations. Thus, with the choice of regression models for better estimating utopia for a black-box problem, we can also potentially lower the function evaluations.

## 5 Case Study

In this section, we describe the engineering problem of a pressure–temperature loaded thin tube design problem which provides the proof of concept for a higher dimension complex expensive black-box design problem, such as a compact heat exchanger. As the tube is assumed to undergo constant loading of temperature and pressure, there will be risk of creep-fatigue failure which depends on the design geometry. Thus, we choose the design variables as radius (*R*), length (*L*), and thickness (*t*) of the tube. Next, we describe the experimentation and the formulation of the multiple objectives and constraints of the thin tube problem, with the regression modeling considered for this case study.

### 5.1 Experimental Analysis.

In this section, we provide a short overview of the experimental procedure. To formulate the objective of risk of creep-fatigue failure, we need to find the location of any design in terms of elastic, plastic, shakedown, and ratcheting, and the respective strain accumulation. We represent these outputs as the responses from the expensive experiments. In this paper, we considered the Bree diagram for a non-work-hardened material whose yield stress remains unchanged by changes in mean temperature as provided in Appendix A. For the sake of simplicity, we have ignored the further division of shakedown (*S*1, *S*2) and ratcheting (*R*1, *R*2) as shown in the figure and assumed a single region of shakedown (*S*) and ratcheting (*R*), because the design risks are equivalent in the *S*1 and *S*2, and *R*1 and *R*2 regions, respectively. The three major steps we follow in the procedure are (1) calculate pressure and temperature stress of the design point, (2) determine the region of the design in terms of elastic, plastic, shakedown, and ratcheting, and (3) calculate the strain accumulation based on the location of the design. The detailed computation of the whole process for the thin tube can be found in our earlier paper (Sec. 3.1, [50]). In the thin tube design, although these computations are not expensive and can be done analytically, we still represent these as expensive function evaluations which will be true in our future problem of considering a diffusion bonded compact heat exchanger where expensive *finite element analysis* is required.

### 5.2 Formulation of Objective Functions and Constraints.

In this section, we provide the formulation of the proposed MO function with the manufacturing and other design constraints.

#### 5.2.1 Multiple Objectives for the Thin Tube Problem.

*Y*

_{1}, which is defined on the location of the design and the respective strain values. The quantification of the distance function was the scope of our earlier paper, and therefore, we would encourage the readers to look into (Sec. 3.2, [50]) for detailed computation and explanation. Since our goal is to minimize the risk of creep-fatigue failure, we minimize the distance function. Thus, our objective 1 distance function is defined as follows:

*k*is the

*k*th iteration of the BO model. It is to be noted that in each iteration, with more experimental or training data (increase prior knowledge), the distance value for all the training data is re-evaluated. The next objective is the cost function,

*Y*

_{2}, which is defined in our problem as the material cost of the tube. For the sake of simplicity and numerical computation, we focus upon the material cost only which intuitively we believe that having more material in the tube generally leads to a stronger design and minimizes the risk of creep-fatigue failure, but increases the cost of the design. Thus, we have a trade-off between risk and material cost. It is to be noted that our proposed MO-BO model has the flexibility to add any other costs as part of our objectives, which can be a black-box function in the complex design. Thus, our objective 2 cost function is defined as follows:

*P*is the material cost per kg,

*ρ*is the density of the material,

*V*is the total volume, and

*t*is the thickness of the tube.

#### 5.2.2 Design and Manufacturing Constraints for the Thin Tube Problem.

Below are some of the manufacturing and design constraints which have been considered in the case study of the thin tube problem to ensure safe and realistic design solutions.

*Creep-fatigue failure:*Though one of our objectives is to minimize the risk of creep-fatigue failure, the minimization of the other objective increases the risk of creep-fatigue failure. Thus, in the trade-off between risk and cost, it is acceptable to have a design which falls in the domain of the safe region (elastic and shakedown) but closer to the transition to the unsafe region (plastic and ratcheting), thus minimizing the cost at the expense of risk of failure. However, it is not acceptable to have a design which is likely to fall into unsafe region, irrespective of the reduction of cost. We termed that design as an unsafe design and therefore not a feasible solution. Thus, we formulate the constraint to bound the objective 1 and ensure we do not have a trade-off where the design will fail. Since the transition boundary will not be known exactly when the objectives are black-box, we can predict the transition boundary as in our earlier paper [50]. In this paper, we assumed that we have already pre-trained the model with the transition region with minimal error rate. Thus, to account for the uncertainty of the prediction of the transition region, we define the boundary as a probabilistic constraint as Eq. (11).

*Y*

_{1}(

*R*,

*L*,

*t*)|

**GP**) is the predicted value of the response distance function,

_{stage1}*Y*

_{1}, for the input design variables, given the converged GP model of the MO-BO framework in the pre-stage (Sec. 4, [50]). The error rate in the feasibility check is taken into account as the probabilistic constraint where Rel = 0.99 is the reliability factor considered in this problem. The value 0.5 is the threshold since we set this distance value at the transition boundary line.

*Normal stress:*The normal stress should not exceed the yield stress. Therefore, constraint equation (12) is defined as

*L*

_{D}= 1 kN is the load exerted on the wall of the thin tube,

*t*is the thickness, and

*σ*

_{y}= 205 MPa is the yield stress.

*Buckling load:*The tube should not buckle. Therefore, the constraint equation (13) is defined as

*E*= 207 GPa is Young’s modulus.

*Aspect ratio:*This constraint equation (14) ensures to avoid any unrealistic design solutions with

*L*≫

*R*.

*δ*= 0.025 in this problem.

### 5.3 Linear Regression Modeling for the Thin Tube Problem.

*R*,

*L*, and

*t*). It is also noted that the models do not need to have the same predictor design variables and the regression models can be different for different objectives to estimate the utopia. However, in this paper, we avoid that complexity and use the same regression models for estimation of both utopia.

*k*of the BO model; $\mu ^(.\u2223R,L,t)$ is the mean of the estimated objective function.

*p*(

*β*)), likelihood function (ℓ(

*Y*|

*X*,

*β*)), and posterior distribution (

*p*(

*β*|

*Y*,

*X*)). For BMLR, we set Gaussian (or normal) priors as Eqs. (17) and (18).

*k*− 1 of the MO-BO model.

*I*

_{p+1}is the (

*p*+ 1) identity matrix where

*p*is the number of input design variables. Thus, the general idea is to compute the posterior distribution of the regression coefficients at the current iteration of the MO-BO model; the respective prior distribution is taken as the posterior distribution of the regression coefficients at the previous iteration. At iteration

*k*= 1, we start with non-informative prior as $p(\beta .,q,1)\u223cMVN(0,106Ip+1)$. Thus, unlike the frequentist approach, in the Bayesian framework, we consider both the knowledge of experimental or training data and the prior knowledge of the estimate of regression coefficients parameters.

**X**= [

_{f}*R*

_{f},

*L*

_{f},

*t*

_{f}]) which are not sampled from expensive experiments or function evaluations. As we want to minimize the objectives (Eqs. (9) and (10)), therefore, the estimated utopia values at iteration

*k*will be the minimum predicted mean values as per Eqs. (21) and (22).

## 6 Results: Thin Tube Design

In this section, we will show the results of the implementation of the proposed WTB MO-BO framework on the engineering problem of pressure–temperature reloaded thin tube design described in Sec. 5. Now, we provide the performance of our proposed WTB MO-BO architecture, implementing on an cyclic temperature–pressure reloaded thin tube problem. With radius, length, and thickness of the tube as decision variables, referring to step 0-Algorithm 3.1, the pre-stage optimization [50] is done to locate the unknown (black-box) constraint boundary (see constraint equation (10)) between safe and unsafe designs in terms of creep-fatigue failure. The feasible bounds for radius, length, and thickness are [4–6.55] mm, [0.1–1] m, and [0.8–1.7] mm, respectively, and this defines our region of interest. Considering the convergence criteria 1 (Eq. (7)), where *α* = 10^{−3}, Table 3 shows the confusion table for mis-classification which gives a low error rate of 1.1%.

*Data analysis:* Before running the MO-BO problem, we conduct some statistical data analysis to visualize the relationship among the design variables and objective functions. It is to be noted that the data sampled from the pre-optimization problem is first filtered to get only the feasible data for this MOO problem. Any data which are infeasible for this problem are discarded since those can affect our analysis and the proposed data-driven MO-BO model by acting as an influential observations or outliers. This is because, in the current MO-BO model, sampling will be only done within the feasible design space and will not be explored in the infeasible region. Figure 5 shows the scatter plots with a correlation matrix among the design variables (radius, length, and thickness) and objectives (distance and cost). From the correlation matrix, we can clearly see the negative association (corr coeff = −0.73) between both objectives, which signifies the trade-offs in our optimization problem. We can see that the design variables are all positively correlated (0.847, 0.74, and 0.168, respectively) with the cost function, which agrees with the intuition that with the increase of radius, length, and thickness, the material cost of the tube increases. Similarly, we see that radius and length are negatively correlated (−0.547 and −0.911) with distance function, which shows the risk of creep-fatigue failure increases with the decrease of these variables. Interestingly, thickness is weakly positively correlated (0.125) with distance function, which is a discrepancy with our intuition. However, since the relationship of radius and length with the objective functions is much stronger than thickness, those variables will primarily drive the trade-offs in our MO-BO problem. Looking at the scatter plots of the design variables with the objective functions, we see some variables better meet linearity assumptions than the others. Similar conclusions can be drawn from the residuals plots as in Figs. 6 and 7, where we see some patterns. However, none of the plots show a strong violation of linearity or the constant variance assumption; therefore, we have focused in this paper with linear models. However, further improvement can be achieved considering more complex models, including non-linear terms and that study is left for future research.

*Comparison between MLR-based calibrated and non-calibrated MO-BO models:* Next, we run the proposed MO-BO model using two scenarios of weighting factors on objectives distance and cost functions, *w*_{11} = [0, 0.1, …, 1] and *w*_{21} = 1 − *w*_{11}; *w*_{12} = [0, 0.025, …, 1] and *w*_{22} = 1 − *w*_{12}, and considering the convergence criteria 1 (Eq. (7)) with *α* = 10^{−5}. We used the design and analysis of computer experiments (DACE) toolboxin matlab for the GP model. The MO-BO model along with the MLR models has been coded in matlab 2018. Figures in Table 4 show the comparison among Pareto-optimal solutions at those weights using different optimization methods and different strategies (calibrating versus non-calibrating) for providing utopia values in the objective criteria space. The red circles in both the figures are the Pareto frontiers obtained numerically from the exhaustive search using the WTB, where the true utopia values [0.1764, 0.3545] are known. Thus, all the red circles can be considered as the *true* Pareto-optimal solutions, which are well spread with the weighting factors (trade-offs) of the objectives, likely giving unique solutions at different weights. This shows the efficiency of the WTB in solving MOO problems. Next, assuming the utopia values are unknown and the objectives are black-box, we conduct the weighted Tchebycheff MO-BO model with model calibration where utopia values are iteratively predicted from MLR models (denoted by blue +); and without model calibration by considering constant global optimal values [0, 0] as suggested in Refs. [11,12], (denoted by black *). We can see that the earlier (blue +) are more spread and closely aligned with true Pareto solutions (red o) than the former (black *).

For better visualizations, we calculated and compared the Euclidean norms of the Pareto-optimal solutions, from calibrated and non-calibrated MO-BO at different weights, using the origins as the respective true Pareto-optimal solutions (red o in Table 4). The better method should have minimum Euclidean norms and the values should have minimum variability (insensitive) to different weighting parameter values. From Table 5 (left figure), we can see the results (blue +-line) from the proposed calibrated MO-BO approach where the Pareto-optimal solutions are much more accurate and the solution accuracy is less variable across the weights or trade-offs between objectives (mean Euclidean norm = 0.1043; std. dev. = 0.0377) than the results (black *-line) from using the non-calibrated model with utopia values of global solutions [0, 0] (mean Euclidean norm = 0.1209; std. dev = 0.0623). Similar conclusions can be drawn from Table 5 (right figure) where Euclidean norms are compared across 41 weights, *w*_{12} and *w*_{22}, where we see minimum mean and std. dev. of the norm for the proposed model as 0.1082 and 0.0349, respectively, when compared the models without calibration as 0.1297 and 0.0537, respectively. Although we see that the proposed approach has relatively higher norm values in the range of weights in *w*_{12}, [0.15 − 0.25], it is much better overall in comparison (following minimum norm and variability to weights).

*Comparison between frequentist and Bayesian MLR-based calibrated MO-BO models:* Finally, we made a comparison of the proposed MO-BO model using frequentist MLR and Bayesian MLR in terms of accuracy (minimum Euclidean norm) and the variability towards the true Pareto-optimal solutions as shown in Table 6. We considered the weighting factor, *w*_{11} = [0, 0.1, …, 1] and *w*_{21} = 1 − *w*_{11}, with two case studies. The first case study (left figure of Table 6) is the 2D problem where the radius (*R*) and length (*L*) of the tube are considered as decision variables, while the thickness (*t*) of the tube is kept as a constant value at 1.7mm. The second case study (right figure of Table 6) is the 3D problem with radius (*R*), length (*L*), and thickness (*t*) as design variables. The Bayesian MLR model has been implemented in stan (a probabilistic programming language) using a *Markov chain Monte Carlo* approach (no. of Markov chain = 4, no. of warmup iter/chain = 1000, max. iter/chain = 4000) and considering *Metropolis–Hastings* sampling algorithm to approximate posterior distribution of regression coefficients. From the comparison in the 2D problem, we can see better accuracy and less variability to different weights in the frequentist MLR-based calibrated MO-BO model (denoted by the blue +-line) than in the Bayesian MLR-based calibrated MO-BO model (denoted by green *-line): the mean Euclidean norm and the standard deviation in frequentist approach are 0.0048 and 0.0037, respectively, compared to those in the Bayesian approach of 0.011 and 0.0078, respectively. This is interesting in that the efficiency of a Bayesian model is either similar or higher than a frequentist model because it contains prior information on data. On this simple 2D problem, it is likely that the increased efficiency from the prior information is not significant. However, both these approaches clearly surpass the performance of a model with non-calibrated fixed utopia values (black x-line) and a model with using the infeasible data for calibration (red o-line). Next, as we move to more complex 3D problem (right figure of Table 6), we see that the Bayesian MLR-based calibrated MO-BO model (denoted by red o-line) now performs better than the frequentist MLR-based calibrated MO-BO model (denoted by blue +-line) with mean and standard deviation of the Euclidean norm as 0.078 and 0.039 and 0.104 and 0.038, respectively. Similarly, the model without calibration performs the worst. We know that one of the criteria for efficiency of the MO-BO models is how accurate we predict the utopia values. We think the Bayesian linear regression model performs better than the frequentist method in reducing the overall residual errors as expected, but not necessarily minimizing the residual error near the utopia region which gives lower accuracy to predict the utopia values. This is why we see the MO-BO model calibrated with the Bayesian MLR provided lower accuracy for the 2D case study problem. However, both approaches show good performance overall. Further improvement will be addressed in future where we will add some trade-off and flexibility in selecting regression models to minimize the residual error near the utopia region along with the overall reduction in the entire feasible design space.

## 7 Conclusion

In this paper, we have proposed the application of BO to account for multiple objectives, formulating the acquisition function of the WTB and calibrating the MO-BO model in which the prior knowledge of utopia values of each objective (optimal values of each objective independently) are iteratively predicted using frequentist or Bayesian MLR model. The computational cost of the proposed approach, using a regression model as the surrogate model, is negligible compared to simulating an expensive black-box model to solve for the utopia point. This proposed weighted Tchebycheff MO-BO model (weighted Tchebycheff MO-BO) is applied to thin tube design, as a proof of concept to find the optimal geometry (radius, length, and thickness), considering two objectives such as distance (risk of failure) and cost function, with manufacturing and design constraints. The data used in the prediction of the utopia point in each iteration are the sampled data from expensive experiments or function evaluations, suggested by maximizing the weighted Tchebycheff MO acquisition function. Thus, in the calibration of the proposed MO-BO model, the MLR models utilize the existing sampled experiment data, while providing at least better local optimal solutions than the non-calibrated model in terms of an overall accuracy and its variability at different trade-offs, as shown in the results. Our approach incorporates a Bayesian framework (using a Gaussian prior) into the optimization, where we strategically select design samples in order to maximize the learning (locating optimal designs) iteratively and reduce the overall cost of training data to achieve the desired level of accuracy, without the need of numerical formulations and having potential to consider for practical design problems. From this proposed approach using a priori WTB, we also lower the complexity of the formulation of the acquisition function as no matter how many objectives we have; it is still a simple formulation to priorily combine all objectives into a single weighted Tchebycheff MO function, and then fit into a single acquisition function which can be easily solved using existing methods like EI, PI, etc. This is unlike in previous work where the complexity of the formulation (hyper-volume) of the MO acquisition function rises significantly with increasing the number of objectives. Though the focus of this paper is on one MOO technique (weighted Tchebycheff), the method is applicable to any other global criterion MOO methods where prior knowledge of utopia is required.

There are few directions that we can proceed in this research as future tasks. As stated, we started our work in this paper with implementing linear regression models and thereby obtained better model accuracy. The future research scope is to add flexibility and further increase the proposed MO-BO model accuracy by developing a model selection (linear or non-linear) criteria based on the measures of cost, complexity, and minimizing errors in model fitting, thereby broadening the utility of the proposed approach. As we discussed, we focused on a priori MO methods; however, we also want to explore an adaptive technique to sample only the best weighting factors on the objectives, and thus minimize the number of runs while avoiding the loss of accuracy on Pareto frontier. Over the years, MO-BO has been handled by formulating the acquisition functions which consider mean vectors of objectives, obtained from independent runs of GP models for each objective. A comparison of the proposed method with existing approaches such as EIHV, EIHVG, etc. in terms of accuracy with true solutions will be considered as future scope.

## Acknowledgment

This research was funded in part by DOE NEUP DE-NE0008533. The opinions, findings, conclusions, and recommendations expressed are those of the authors and do not necessarily reflect the views of the sponsor.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtained from the corresponding author upon reasonable request.

### Appendix A

### Appendix B

*g*= 1 + 9

*x*

_{2}, 0 ≤

*x*

_{1}≤ 1, and 0 ≤

*x*

_{2}≤ 1.

*a*= 20,

*b*= 0.2,

*c*= 2

*π*, −3 ≤

*x*

_{1}≤ 3, −2 ≤

*x*

_{2}≤ 2, and

*x*

_{1},

*x*

_{2}≠ 0.