Abstract

The paper presents a novel approach to applying Bayesian Optimization (BO) in predicting an unknown constraint boundary, also representing the discontinuity of an unknown function, for a feasibility check on the design space, thereby representing a classification tool to discern between a feasible and infeasible region. Bayesian optimization is a low-cost black-box global optimization tool in the Sequential Design Methods where one learns and updates knowledge from prior evaluated designs, and proceeds to the selection of new designs for future evaluation. However, BO is best suited to problems with the assumption of a continuous objective function and does not guarantee true convergence when having a discontinuous design space. This is because of the insufficient knowledge of the BO about the nature of the discontinuity of the unknown true function. In this paper, we have proposed to predict the location of the discontinuity using a BO algorithm on an artificially projected continuous design space from the original discontinuous design space. The proposed approach has been implemented in a thin tube design with the risk of creep-fatigue failure under constant loading of temperature and pressure. The stated risk depends on the location of the designs in terms of safe and unsafe regions, where the discontinuities lie at the transition between those regions; therefore, the discontinuity has also been treated as an unknown creep-fatigue failure constraint. The proposed BO algorithm has been trained to maximize sampling toward the unknown transition region, to act as a high accuracy classifier between safe and unsafe designs with minimal training cost. The converged solution has been validated for different design parameters with classification error rate and function evaluations at an average of <1% and ∼150, respectively. 
Finally, the performance of our proposed approach in terms of training cost and classification accuracy of thin tube design is shown to be better than the existing machine learning (ML) algorithms such as Support Vector Machine (SVM), Random Forest (RF), and Boosting.

1 Introduction

In the early design phase, it is very important for designers to be able to identify the feasible regions in a large design space while the design cost is low. This knowledge can help guide designers to eliminate inferior designs and avoid investing in the high-cost prototyping and testing of those designs in later design phases. With efficient knowledge of the feasible space, designers can also avoid falsely selecting infeasible designs as optimal, which can carry a high risk of failure. In design practice, most design problems are too complex to be handled by simple optimization frameworks because of constraints on cost, time, formulation, etc. Also, approximating a complex design problem by much simpler problems can lead to neglect of the original complex constraints; thus, the design may violate those constraints and not provide a useful choice for practical decisions. Some practical design problems have been investigated where complex optimization frameworks have been modeled [1–3]. However, in many design problems, it is difficult to numerically formulate an objective function or constraint boundaries and, therefore, we consider those as black-box problems, which typically have high function evaluation cost [4,5]. Thus, a trade-off between learning and expense is present, and a low-fidelity surrogate model is often implemented to reduce cost. When we have no or limited knowledge of the expensive true unknown functions, we cannot guarantee the maximization of our learning toward optimizing the functions without proper guidance or expertise. Also, because of the high function evaluation cost, exhaustive search is not a valid option.
In such problems, Bayesian Optimization (BO), which eliminates the need for a standard formulation of the black-box functions, is widely applied in sequential learning and provides better guidance in sampling the designs for expensive experiments, or function evaluations, in order to find the optimal region of the unknown function at minimal experimental cost. In the BO approach, we first build a posterior surrogate model, given the data from the current evaluations. We then use this model to strategically select the best design locations for future evaluations by maximizing the acquisition function, defined from the posterior model. BO can be used to optimize any black-box function in a design problem, either to emulate the unknown objective functions, when the goal is to locate the optimal solutions, or to emulate the unknown constraints, when the goal is the classification and preservation of only potentially good designs [6]. This paper focuses on a BO framework that emulates the unknown constraint boundary as a classification problem for a design feasibility check. However, the motivation behind this classification problem comes from the ultimate goal of design optimization with a complex design space, which is described in Sec. 1.1.

1.1 Research Motivation.

Although BO is a powerful method, it works on the assumption that the true function is continuous [7] and generally fails to converge to the solution if the objective function has a discontinuity. This is because the BO has insufficient knowledge about the nature of the discontinuity of the unknown true function. Figures 1 and 2 provide an example where a BO model fails to converge to the true discontinuous function even after excessive sampling. Figure 1 shows the true response function in terms of design variables x1, x2, where there is a jump discontinuity at x1, x2 = 1. Figure 2 shows how inefficiently the BO model emulates the true function even after 500 sampled function evaluations, denoted by black dots, producing a very nonsmooth surface with many peaks near the discontinuity. Given this limitation of BO for a discontinuous design space, we will present a research example problem to highlight our motivation behind the design feasibility check classification problem.

1.2 Research Contribution.

To address the issues of Sec. 1.1, this paper proposes an approach to Bayesian optimization in solving the stated classification problem. The goal is to predict the location of the discontinuous transition region as a constraint boundary between the safe and unsafe regions by strategically sampling designs, while minimizing the cost of expensive function evaluations and maximizing classification accuracy in predicting the true boundary. In order to achieve the above desired goal, the research contributes the formulation of a new function which projects the original discontinuous design space into an artificial continuous design space, mitigating the limitation of BO. This new function helps us to develop the acquisition function in the proposed BO model, which when maximized, guides our sampling toward the desired unknown constraint boundary (transition region), which ultimately maximizes the classification accuracy. The contribution of this research is to provide a classification method to classify the creep-fatigue failure feasibility of any new designs in the specified design space, without conducting further expensive function evaluations once the model is fully trained (converged). This paper focuses on the proof of concept and therefore we have simplified the large-scale complex design of the diffusion bonded CHX into a simple thin tube where we will be able to compare the results obtained from the proposed model with the known true solution. The proposed approach can be considered as the pre-stage for the optimization of the design geometry of diffusion bonded CHX to minimize the risk of failure with manufacturing and experimental cost, subject to constraints for creep-fatigue failure (from proposed pre-stage) and other manufacturing constraints, which is considered as future research.

The roadmap of this paper is as follows. Section 2 provides an overview of Bayesian optimization and of machine learning for solving classification problems. Section 3 presents the thin tube problem and the projection of the original discontinuous design space onto the artificial continuous design space, using knowledge gained from actual function evaluations or experiments. Section 4 provides a detailed description of the design methodology for fast and adaptive sampling toward predicting the constraint boundary (transition region) and minimizing the error rate in the classification problem. Section 5 shows the results of the proposed approach under different design parameters. Section 6 concludes the paper with final thoughts.

2 Literature Review

2.1 Classification Problem.

A classification problem, in general, belongs to a subset of machine learning problems where the main idea is to partition a region of interest or design space into labels or clusters through proper training of a machine learning tool with existing data. After the designer is satisfied with the training of the model, any new design can be assigned to the category to which it has the maximum probability of belonging, without further expensive evaluations. Classification problems can be subdivided into Binary Classification and Multi-Label Classification problems. To solve these, different machine learning tools have been used, such as Support Vector Machine (SVM), Random Forest (RF), and Boosting [10–12]. Recently, advanced methods like neural networks have been used in both binary and multi-label classification problems [13,14]. Inan et al. [15] proposed a robust neural network based classification method for premature ventricular contractions. Li et al. [16] attempted a hyperspectral image reconstruction method using a convolutional neural network to enhance classification accuracy. Similarly, clustering approaches have been taken, especially for multi-label classification problems with a large number of labels [17]. Barros et al. [18] proposed a probabilistic clustering approach for hierarchical multi-label classification of protein functions. Solving a design classification problem with standard machine learning classifiers depends on the quality or amount of training data and always raises the question of how much data is enough to achieve the maximum learning [19,20]; the sampling cost and the sampling method can therefore be critical. In order to apply these machine learning algorithms, we need to assume that a large amount of data already exists, which is not true in our classification problem. In a black-box problem where the sampled designs undergo expensive evaluations, training data are limited because of the very high cost.
As mentioned earlier, the research objective in this paper is not only to identify an appropriate classifier tool, but also an efficient sampling strategy for training the classifier sequentially toward the desired goal of fast and adaptive learning (minimizing expensive evaluations). With this BO as a design classifier, we attempt to first optimize the location (discontinuity or constraint boundary) with the existing technique of minimizing expensive sampling (data) for fast and adaptive learning (data sampling suggested where there is more likelihood of achieving user-defined good solutions), then classify any new designs (either side of the discontinuity or constraint boundary) by the trained posterior surrogate model of the converged (maximized learning) BO. Thus, this research also contributes by integrating the classification technique into the existing efficient sampling method of BO, without having to rely on pre-existing data.

2.2 Bayesian Optimization.

Bayesian optimization [7] is an emerging field of study in sequential design methods. It is considered a low-cost global optimization tool for design problems with expensive black-box objective functions. The general idea of BO is to emulate an expensive unknown design space and find the local and global optimal locations while reducing the cost of function evaluations from expensive high-fidelity models. This approach has been widely used in many machine learning problems [21–25]. Attempts have also been made to apply it when the response is discrete, such as in consumer modeling problems where the responses are in terms of user preference [7,26]. The idea is to approximate the discrete user preference response function by continuous latent functions, using a Binomial-Probit model for two choices [27,28] and a polychotomous regression model for more than two choices where the user can state no preference [29]. BO has also been implemented in multi-objective [30] and high-dimensional [31,32] engineering design problems.

Bayesian optimization adopts a Bayesian perspective and assumes that there is a prior on the function; typically, we use a Gaussian process prior. The prior is represented from the experiment or training data which is assumed as realizations of the true function. The overall Bayesian optimization approach has two major components: A predictor or Gaussian Process Model (GPM) and an Acquisition Function (AF). As shown in Fig. 3, we first build a posterior GPM, given the data from the current experiments. The surrogate GPM then predicts the objective or response of the samples generated from a design of experiments (DOE) based sampling method within the design space. We then use this model to strategically select the best design locations for future experimentation by maximizing the acquisition function, defined from the posterior simulations obtained from the GPM. However, we need to assume that the objective or response is Lipschitz continuous [7]. As an alternative to a GPM, random forest regression has been proposed as an expressive and flexible surrogate model in the context of sequential model-based algorithm configuration [33]. Although random forests are good interpolators in the sense that they output good predictions in the neighborhood of training data, they are very poor extrapolators where the training data are far away [34]. This can lead to selecting redundant exploration (more experiments) in the noninteresting region as suggested by the acquisition function in the early iterations of the optimization, due to having additional prediction error of the region far away from the training data. This motivates us to consider the GPM in a Bayesian framework while extending the application to discontinuous design response surfaces, which can be represented as complex practical problems in the domain of experimental design. We next describe the GPM and AF.

2.2.1 Gaussian Process Model.

Figure 4 shows a simple 1D Gaussian Process Model with one design variable x and one response variable z = f(x). The dots are the experimental design variables, and the dotted and solid lines are the true and the predictor mean functions or responses in the design space, given the observations. The shaded area along the solid line shows the measure of uncertainty over the surrogate GPM prediction. We can clearly see that the variance near the observations is small and increases as the design samples move farther away from the observational data; the GPM is thereby related to kriging models, where the errors are not independent. Much research has been ongoing on incorporating and quantifying the uncertainty of the experimental or training data by using a nugget term in the predictor GPM. It has been found that the nugget provides a better solution and improved computational stability [35,36]. Furthermore, GPM has also been implemented in high-dimensional design space exploration [37] and big data problems [38], as an attempt to increase computational efficiency. A survey of implementations of different GP packages in languages such as MATLAB, R, and Python has been provided in Ref. [39].
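The behavior described above (small predictive variance near observations, growing variance away from them, and a nugget added for numerical stability) can be illustrated with a minimal NumPy sketch of a 1D GP posterior. It assumes a zero-mean prior and a squared-exponential kernel; the kernel choice, the hyperparameter values, and the function names `sq_exp_kernel` and `gp_posterior` are illustrative assumptions and are not taken from this paper.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two 1D point sets (assumed kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, z_train, x_test, nugget=1e-8,
                 length_scale=1.0, signal_var=1.0):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    k_tt = sq_exp_kernel(x_train, x_train, length_scale, signal_var)
    # The nugget on the diagonal models observation noise and stabilizes
    # the Cholesky factorization.
    k_tt = k_tt + nugget * np.eye(len(x_train))
    k_ts = sq_exp_kernel(x_train, x_test, length_scale, signal_var)

    chol = np.linalg.cholesky(k_tt)
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, z_train))
    mean = k_ts.T @ weights

    v = np.linalg.solve(chol, k_ts)
    var = signal_var - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


if __name__ == "__main__":
    x_obs = np.array([0.0, 1.0, 2.0])
    z_obs = np.sin(x_obs)
    mean, var = gp_posterior(x_obs, z_obs, np.array([1.0, 5.0]))
    print(mean, var)
```

In this sketch the variance returned at an observed location is close to the nugget, while at a location far from all observations it approaches the prior variance `signal_var`.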

2.2.2 Acquisition Function.

The second major component in Bayesian optimization is the Acquisition Function whose goal is to guide the search for future experiments toward the desired goal and thereby bring the sequential design into the BO. The AF predicts an improvement metric for each sample. The improvement metric depends on exploration (unexplored design spaces) and exploitation (region near high responses). Thus, the acquisition function gives high value of improvement to the samples whose mean prediction is high, variance is high, or a combination of both. Thus, by maximizing the acquisition function, we select the best samples to find the optimum solution and reduce the uncertainty of the unknown expensive design space. Various formulations have been applied to define the acquisition functions. One such method is the Probability of Improvement, PI [40] which is improvement based acquisition function. Jones in Ref. [41] notes that the performance of PI(·) “is truly impressive;… however, the difficulty is that the PI(·) method is extremely sensitive to the choice of the target. If the desired improvement is too small, the search will be highly local and will only move on to search globally after searching nearly exhaustively around the current best point. On the other hand, if the small-valued tolerance parameter ξ in PI(.) equation is set too high (see [41]), the search will be excessively global, and the algorithm will be slow to fine-tune any promising solutions.” Thus, the Expected Improvement acquisition function, EI [7], is widely used over PI which is a trade-off between exploration and exploitation. Another Acquisition function is the Confidence bound criteria, CB, introduced by Cox and John [42], where the selection of points is based on the upper or lower confidence bound of the predicted design surface for maximization or minimization problem respectively.
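As a hedged illustration of the three acquisition functions named above, the sketch below uses their common textbook forms for a maximization problem, evaluated at a single candidate with GP posterior mean `mean` and standard deviation `std`. The function names, the default values of `xi` and `kappa`, and the handling of zero variance are assumptions made for this example; the acquisition function actually used in the proposed method is described in Sec. 4.3.

```python
import math


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mean, std, best, xi=0.01):
    """PI: probability that the candidate beats `best` by at least xi."""
    if std <= 0.0:
        return 1.0 if mean > best + xi else 0.0
    return _norm_cdf((mean - best - xi) / std)


def expected_improvement(mean, std, best, xi=0.01):
    """EI: expected amount by which the candidate beats `best` (+ xi)."""
    gain = mean - best - xi
    if std <= 0.0:
        return max(gain, 0.0)
    z = gain / std
    return gain * _norm_cdf(z) + std * _norm_pdf(z)


def upper_confidence_bound(mean, std, kappa=2.0):
    """CB for maximization: optimistic estimate mean + kappa * std."""
    return mean + kappa * std


if __name__ == "__main__":
    print(probability_of_improvement(1.0, 1.0, best=1.0, xi=0.0))
    print(expected_improvement(1.0, 1.0, best=1.0, xi=0.0))
    print(upper_confidence_bound(1.0, 1.0, kappa=2.0))
```

A larger `xi` or `kappa` pushes the search toward exploration, while smaller values favor exploitation around the current best point, which is the sensitivity that Jones describes for PI.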

3 Problem Description

In this section, we describe the thin tube design problem, which represents the proof of concept for the large complex design of the diffusion bonded CHX. As the tube is assumed to undergo constant loading of temperature and pressure, there will be a risk of creep-fatigue failure, which will vary with the design geometry. Fatigue damage is created when one cycles a test specimen at a fixed stress amplitude for enough cycles that it develops microstructural damage and eventually fails. Creep damage is created when one holds a test specimen at a fixed load for a long enough time that it eventually develops microstructural damage and fails. Creep-fatigue damage therefore arises when both occur simultaneously (i.e., a stress-controlled cycle with a hold), and the specimen will generally fail sooner than under the cycling or the hold alone. As mentioned previously, our goal is to predict the transition region between the safe and unsafe regions, as defined in Sec. 1.2. As design variables for the CHX that influence creep-fatigue behavior, we choose the radius (rad) and length (l) of the tube. Next, we describe the experimentation and the formulation of the objective function, which depends on the experimental results and on prior knowledge from the domain of solid mechanics.

3.1 Model Experiments.

In this section, we provide the computation of the location of any design in terms of Elastic, Plastic, Shakedown and Ratchetting, and the respective strain accumulation. We represent these outputs as the responses from the expensive experiments. In our problem of thin tube design, though these computations are not expensive and can be done analytically, we still represent these as expensive function evaluations which will be true for our future problem of considering the actual diffusion bonded CHX geometry where expensive Finite Element Analysis (FEA) is required. The computations have been done based on the formulation of a Bree diagram [8]. In this paper, we considered the Bree diagram for a nonwork-hardening material whose yield stress remains unchanged by changes in mean temperature, as provided in the Supplemental Materials on the ASME Digital Collection. For the sake of simplicity, we have ignored the further division of Shakedown (S1, S2) and Ratchetting (R1, R2) as shown in the figure, and assumed a single region of Shakedown (S) and Ratchetting (R). This is because, for the purpose of our problem, any design in Shakedown is considered safe, while in Ratchetting is considered unsafe.

Below are the steps for computation of the various stresses and strains for the thin tube required for our methodology:

• Step 1: Calculate pressure and temperature stress
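$\sigma_p = \frac{P \cdot \mathrm{rad}}{d}$
(1)
$\sigma_t = \frac{E \alpha \Delta T}{2(1-\rho)}$
(2)
where
$\Delta T = \Delta T_{slop} \cdot l + T_{in}$
(3)
$\Delta T_{slop} = -T_{in} + T_{out} \cdot \left(\frac{\mathrm{rad} - \mathrm{rad}_{min}}{\mathrm{rad}_{max} - \mathrm{rad}_{min}}\right)$
(4)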
(4)
where σp and σt are the pressure and temperature stresses, respectively; P is the internal pressure applied to the tube, taken as 25 MPa; rad is the radius; d is the wall thickness; l is the length; E = 200 GPa is the Young’s modulus; α = 16 × 10⁻⁶ is the coefficient of linear thermal expansion; ρ = 0.27 is the Poisson’s ratio; ΔT is the temperature drop across the wall; Tin and Tout are the inlet and outlet temperatures, taken as 400 °C and 20 °C, respectively; and radmin and radmax are the minimum and maximum radii.
• Step 2: Determine the region of the design:

Case 1:

• If σp ≤ 0.5σy and σt < 2σy, (σy = 205 MPa is the yield stress), the design is in the Elastic or Shakedown (Safe) region;

• else, if σt > 2σy, the design is in the Plastic or Ratchetting region (Unsafe). For a design in Plastic or Ratchetting, if $\sigma_p \sigma_t \le \sigma_y^2$, the design is in Plastic; otherwise, it is in Ratchetting.

• else, if σt = 2σy, the design is at the transition line.

Case 2:

• If σp > 0.5σy and σp + 0.25σt < σy, the design is in Elastic or Shakedown (Safe);

• else, if σp + 0.25σt > σy, the design is in Ratchetting region (Unsafe).

• else, if σp + 0.25σt = σy, the design is at the transition line.

• Step 3: Calculate the strain accumulation:

• If the design is in Elastic/Shakedown, strain ɛs can be calculated as
$\varepsilon_s = \frac{2}{E} \left(\sigma - \frac{x}{d} \sigma_t\right)$
(5)
where
$\sigma = \sigma_p + 2 \frac{x}{d} \sigma_t$
(6)
and x is the section of the tube wall, which varies from 0 at the outer wall to d at the inner wall. Since our problem is subjected to internal pressure, the maximum stress is at the inner wall of the tube. Thus, we consider the worst condition and focus on the stress at the inner wall at x = d.
• If the design is in Plastic, strain ɛp can be calculated as
$\varepsilon_p = \frac{(\sigma_t - 2\sigma_y) \, n}{E}$
(7)
• If the design is in Ratchetting, strain ɛr can be calculated as:
$\varepsilon_r = \frac{2 n \sigma_t}{E} \left(1 - \frac{2(\sigma_y - \sigma_p)}{\sigma_t}\right)$
(8)
where n is the number of cycles. In this problem, we considered n = 50.
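Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in this study: stresses are in MPa, Eq. (5) is read as (2/E)(σ − (x/d)σt) evaluated at the inner wall (x/d = 1), the wall thickness `d` and the radius bounds `rad_min` and `rad_max` are caller-supplied because their values are not stated in this section, and all function names and region labels are hypothetical.

```python
E = 200e3          # Young's modulus in MPa (200 GPa)
ALPHA = 16e-6      # coefficient of linear thermal expansion
RHO = 0.27         # Poisson's ratio
SIGMA_Y = 205.0    # yield stress in MPa
P = 25.0           # internal pressure in MPa
T_IN, T_OUT = 400.0, 20.0   # inlet and outlet temperatures in deg C
N_CYCLES = 50


def stresses(rad, length, d, rad_min, rad_max):
    """Step 1: pressure and temperature stresses, Eqs. (1)-(4)."""
    sigma_p = P * rad / d
    dt_slope = -T_IN + T_OUT * (rad - rad_min) / (rad_max - rad_min)
    delta_t = dt_slope * length + T_IN
    sigma_t = E * ALPHA * delta_t / (2.0 * (1.0 - RHO))
    return sigma_p, sigma_t


def classify(sigma_p, sigma_t):
    """Step 2: region of the design on the simplified Bree diagram."""
    if sigma_p <= 0.5 * SIGMA_Y:                      # Case 1
        if sigma_t < 2.0 * SIGMA_Y:
            return "shakedown"                        # Elastic/Shakedown (safe)
        if sigma_t > 2.0 * SIGMA_Y:
            return "plastic" if sigma_p * sigma_t <= SIGMA_Y**2 else "ratchetting"
        return "transition"
    total = sigma_p + 0.25 * sigma_t                  # Case 2
    if total < SIGMA_Y:
        return "shakedown"
    if total > SIGMA_Y:
        return "ratchetting"
    return "transition"


def strain(region, sigma_p, sigma_t, x_over_d=1.0):
    """Step 3: strain accumulation, Eqs. (5)-(8); None on the transition line."""
    if region == "shakedown":
        sigma = sigma_p + 2.0 * x_over_d * sigma_t
        return (2.0 / E) * (sigma - x_over_d * sigma_t)
    if region == "plastic":
        return (sigma_t - 2.0 * SIGMA_Y) * N_CYCLES / E
    if region == "ratchetting":
        return (2.0 * N_CYCLES * sigma_t / E) * (1.0 - 2.0 * (SIGMA_Y - sigma_p) / sigma_t)
    return None  # strain is not computed on the transition line


if __name__ == "__main__":
    # Hypothetical geometry values, chosen only to exercise the functions.
    s_p, s_t = stresses(rad=10.0, length=0.5, d=2.0, rad_min=5.0, rad_max=15.0)
    region = classify(s_p, s_t)
    print(s_p, s_t, region, strain(region, s_p, s_t))
```

The exact equality tests that detect the transition line mirror the text; with floating-point stresses from a real solver, a small tolerance would likely be needed instead.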

It is to be noted that when a design is at the transition line, as per Step 2, we skip the Step 3 strain calculation for that design because Eqs. (5), (7), and (8) are all equally justified there, and this creates the jump discontinuity (refer to Fig. 3 in the Supplemental Materials on the ASME Digital Collection for a 1D example). In Sec. 3.2, we present the formulation of the distance metric, which mitigates this discontinuity issue and is suitable for the BO framework.

3.2 Formulation of Distance Metric.

In this section, we provide the formulation of the distance metric. Although we can obtain the strain accumulation for a particular design from the model experiments, we do not have a good idea of the strain accumulation for a design close to the transition region, where we do not know which equation (Eq. (5), (7), or (8)) applies. Therefore, with the value of strain only, it is difficult to formulate an objective function where we can either maximize or minimize the strain accumulation in the BO model in order to maximize the accuracy and iteratively get closer to the unknown transition region. Also, the jump discontinuity lies at the transition line between the safe and unsafe regions in the design space of strain accumulation. Therefore, we propose to formulate a new function with the help of the experimental results, which we define as a distance function, Y, by transforming the original discontinuous design space into an artificially created continuous design space. The computation of the distance value for any design is based on the heuristic that, given two designs in the Shakedown (Safe) region, the design with more strain accumulation is closer to the unknown transition region and is therefore assigned a higher value. The reverse occurs in the Plastic or Ratchetting (Unsafe) region: for any two designs in those regions, the design with lower strain accumulation is closer to the unknown transition region and is therefore assigned a lower value. This prior knowledge helps us build our distance function, where we first separate the sampled designs (prior data) by region, which can be evaluated from experiments (Step 2 in Sec. 3.1). It is worth noting that for the complex problem of the diffusion bonded CHX, the determination of the region for a design must be conducted from FEA.
After we separate all the sampled designs into the regions of Elastic/Shakedown, Plastic and Ratchetting, we next assume a linear increment of strain accumulation as increasing the risk of creep-fatigue failure and build our formula for computing the distance value of the ith design at iteration k of BO model, Yk,i as below:

• For design i, in Elastic/Shakedown:
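$Y_{s,k,i} = Y_{S,\min} + \frac{\varepsilon_{s,i} - \min(\varepsilon_{s,k})}{\max(\varepsilon_{s,k}) - \min(\varepsilon_{s,k})} \cdot (Y_{S,\max} - Y_{S,\min})$
(9)
• For design i, in Plastic:
$Y_{p,k,i} = Y_{P,\min} + \frac{\varepsilon_{p,i} - \min(\varepsilon_{p,k})}{\max(\varepsilon_{p,k}) - \min(\varepsilon_{p,k})} \cdot (Y_{P,\max} - Y_{P,\min})$
(10)
• For design i, in Ratchetting:
$Y_{r,k,i} = Y_{R,\min} + \frac{\varepsilon_{r,i} - \min(\varepsilon_{r,k})}{\max(\varepsilon_{r,k}) - \min(\varepsilon_{r,k})} \cdot (Y_{R,\max} - Y_{R,\min})$
(11)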
where ɛs,i, ɛp,i, ɛr,i are the strain accumulation of design i, given the design falls into Elastic/Shakedown, Plastic, or Ratchetting, respectively; min(ɛs,k), max(ɛs,k) are the minimum and the maximum strain accumulation among all the sampled designs (training data) in Shakedown at iteration k; min(ɛp,k), max(ɛp,k) are the minimum and the maximum strain accumulation among all the sampled designs (training data) in Plastic at iteration k; min(ɛr,k), max(ɛr,k) are the minimum and the maximum strain accumulation among all the sampled designs (training data) in Ratchetting at iteration k; YSmin, YSmax are the minimum and maximum distance function bounds for the designs in Elastic/Shakedown and are set as 0 and 0.45, respectively; YPmin, YPmax are the minimum and maximum distance function bounds for the designs in Plastic and are set as 0.55 and 1, respectively; YRmin, YRmax are the minimum and maximum distance function bounds for the designs in Ratchetting and are set as 0.55 and 1, respectively. Changing the values of YSmax, YPmin, YRmin changes the efficiency of the model in terms of accuracy and cost of function evaluations; therefore, a sensitivity analysis has been done within a recommended range of values, which will be described later. However, the values given have been found to produce consistent performance in terms of accuracy.
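The per-region rescaling above amounts to a min-max normalization of the strain values within each region, mapped onto that region's distance bounds. The sketch below is a minimal illustration in plain Python: the region bounds are passed in by the caller, designs on the transition line receive the distance value 0.5, and the treatment of a region with a single sample or identical strains (assigned the midpoint of its bounds) is an assumption made here, since the formulas leave that case undefined. The function names are hypothetical.

```python
def rescale_region(strains, y_min, y_max):
    """Min-max rescale the strains of one region onto [y_min, y_max]."""
    lo, hi = min(strains), max(strains)
    if hi == lo:
        # Assumption: the formula is 0/0 here, so use the midpoint.
        return [0.5 * (y_min + y_max)] * len(strains)
    return [y_min + (s - lo) / (hi - lo) * (y_max - y_min) for s in strains]


def distance_function(regions, strains, bounds, y_transition=0.5):
    """Distance value Y for every sampled design at the current iteration.

    regions : list of region labels, one per design
    strains : list of strain accumulations (None on the transition line)
    bounds  : dict mapping region label -> (y_min, y_max)
    """
    y = [None] * len(regions)
    for region, (y_min, y_max) in bounds.items():
        idx = [i for i, r in enumerate(regions) if r == region]
        if not idx:
            continue
        scaled = rescale_region([strains[i] for i in idx], y_min, y_max)
        for i, value in zip(idx, scaled):
            y[i] = value
    for i, r in enumerate(regions):
        if r == "transition":
            y[i] = y_transition
    return y


if __name__ == "__main__":
    regions = ["shakedown", "shakedown", "shakedown", "plastic", "plastic", "transition"]
    strains = [0.001, 0.002, 0.003, 0.01, 0.03, None]
    bounds = {"shakedown": (0.0, 0.45), "plastic": (0.55, 1.0)}
    print(distance_function(regions, strains, bounds))
```

Because the minimum and maximum strains are taken over the current training data, calling `distance_function` again after new designs are added can change the distance values of earlier designs.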

The idea of the objective is that the design samples, at iteration k, in the Elastic/Shakedown region that are nearest to the predicted transition region will have Ys,k,i = 0.45, and the design samples in the Elastic/Shakedown region that are farthest from the predicted transition region will have Ys,k,i = 0. All the other samples, or training data, in the Elastic/Shakedown region will have values within the range [0–0.45] based on their closeness to the predicted transition region. Similarly, at iteration k, the sample in the Plastic or Ratchetting region that is nearest to the predicted transition region will have Yp,k,i or Yr,k,i = 0.55, and the sample in the Plastic or Ratchetting region that is farthest from the predicted transition region will have Yp,k,i or Yr,k,i = 1. All the other samples, or training data, in the Plastic or Ratchetting region will have values within the range [0.55–1] based on their closeness to the predicted transition region. The width of the transition region is thus set in this case as [0.45–0.55], assuming the true transition or constraint boundary line is at Ys,i = Yp,i = Yr,i = 0.5 for any iteration of the BO model. In our formulation, this setting of a distance value of 0.5 for any design at the transition line builds the continuity in the design space. It is to be noted that, knowing the distance value at the transition line, we attempt to optimize the location of the unknown transition region, given the width of the region. Locating the exact transition line or constraint boundary would require exhaustive experiments or function evaluations and may cause overfitting issues in the prediction used to classify safe and unsafe designs; therefore, we assume that sequentially improving the prediction of the location of the transition region from BO also increases the accuracy of the location of the transition line (constraint boundary line).

To summarize, the reasons to construct the distance function are that (1) an output in the form of a discrete region is not useful in the BO framework, so we need to transform region knowledge into a continuous metric, and (2) it allows us to define our objective in the BO framework in terms of finding the transition region between Elastic/Shakedown versus Plastic and Ratchetting. It is to be noted that this is a sequential design approach and, with more training data (increased prior knowledge), the values of the distance function Y for all the training data, except at the transition line, change and are recomputed at each iteration.

4 Design Methodology

Figure 5 shows the detailed structure of the proposed Bayesian Optimization framework. Below is the algorithm, with an explanation of each step of the proposed Bayesian Optimization classification method to predict the transition region between the safe and unsafe regions for the thin tube problem as described in Sec. 3.1; however, the general algorithm is applicable to larger scale problems such as the diffusion bonded CHX.

Step-by-Step description for proposed Bayesian Optimization to find the transition region between safe and unsafe designs of the thin tube:

• Step 0 (Initialization): Define the design space or the region of interest for the given problem. From the defined design space, generate the grid matrix $\bar{\bar{X}}$ using a DOE approach. Conduct function evaluations or experiments for a very limited number of randomly generated samples in the design space. In our thin tube problem, we choose ten randomly selected designs as starting samples, which are not included in $\bar{\bar{X}}$.

• Step 1: Build a training data matrix with the sampled designs: The data consist of X as design input variables and Y as output functions. In our problem, we define X as the matrix of design geometries as radius (rad) and length (l) of the thin tube and Y as the vector of distance values of the respective sampled designs as described in Sec. 3. Create the training data matrix, assuming at iteration k, Dk = {Xk, Y(Xk)}. It is to be noted Dk contains the distance values for all sampled designs from any region. Also, like in the Bree diagram, it is possible to choose the pressure and temperature stresses as the design variables. However, with design geometry as the design variables, it is more useful for a designer to directly visualize and understand the efficiency of the designs.

• Step 2: GPM: Next, with the knowledge gained from previous experiments (prior knowledge), Dk = {Xk, Y(Xk)}, we can develop a single posterior Gaussian Process model.

• Step 3: Use the posterior GP model $\Delta_k$ to conduct posterior predictive simulations of the nonsampled designs in the grid matrix $\bar{\bar{X}}$ and predict the respective mean and mean squared error (MSE) of the distance values, forming two vectors, $\mu(\bar{\bar{Y}}(\bar{\bar{X}})) \mid \Delta_k$ and $\sigma^2(\bar{\bar{Y}}(\bar{\bar{X}})) \mid \Delta_k$, respectively.

• Step 4: Define objectives to sample designs toward the unknown transition region: Now that we have the vector of predictive posterior means of all the nonsampled designs, we need to define our objective, for which the acquisition function will be formulated. In this classification problem between safe and unsafe designs, our goal is to maximize the sampling of designs toward the unknown transition boundary and thus train the BO model sequentially with higher accuracy toward the transition region. Thus, maximizing the distance function for the set of designs in Elastic or Shakedown (Safe) will create the optimal region toward the transition region, as the designs in Shakedown closest to the transition region will have higher distance values. Similarly, minimizing the distance function for the set of designs in Plastic and Ratchetting (Unsafe) will create the optimal region toward the transition region, as the designs in those regions closest to the transition region will have lower distance function values.

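To convert this into a single-objective maximization problem, we apply an elementwise transformation to the vector $\mu(\bar{\bar{Y}}(\bar{\bar{X}}))|\Delta_k$. Let us define the transformed mean vector as $\mu_T(\bar{\bar{Y}}(\bar{\bar{X}}))$, obtained by the following elementwise operation:
$\mu_T(\bar{\bar{y}}(\bar{\bar{x}}))=\begin{cases}\mu(\bar{\bar{y}}(\bar{\bar{x}})) & \text{if } \bar{\bar{x}}_{\text{region}}\in\text{Elastic/Shakedown}\\ 1-\mu(\bar{\bar{y}}(\bar{\bar{x}})) & \text{if } \bar{\bar{x}}_{\text{region}}\in\text{Plastic/Ratchetting}\end{cases}$
(12)
where $\bar{\bar{y}}$ is the scalar posterior mean value of the nonsampled design $\bar{\bar{x}}$.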
• Step 5: Define and maximize the acquisition function $u(\cdot)$: After the transformation in Step 4, we calculate the acquisition function value elementwise as $u(\bar{\bar{y}}(\bar{\bar{x}})|\Delta_k)$ for each nonsampled design, considering the respective mean and MSE values in the vectors $\mu_T(\bar{\bar{Y}}(\bar{\bar{X}}))$ and $\sigma^2(\bar{\bar{Y}}(\bar{\bar{X}}))$. This gives the vector of acquisition function values $u(\bar{\bar{Y}}(\bar{\bar{X}})|\Delta_k)$. A selection criterion is then applied to choose the new design location for future sampling, $\{\bar{\bar{x}}_{max}\};\ \bar{\bar{x}}_{max}\in\bar{\bar{X}}$, which maximizes the predicted improvement in learning the unknown design space (i.e., maximizes the acquisition function). Thus, we select the design with the maximum acquisition function value as
$\bar{\bar{y}}_{max}(\bar{\bar{x}}_{max})=\max(u(\bar{\bar{Y}}(\bar{\bar{X}})|\Delta_k))$
(13)

Augment the data, $\bar{\bar{D}}_k=\{D_k;(\bar{\bar{x}}_{max},\bar{\bar{y}}_{max})\}$. The methodology used to compute the acquisition function is described in Sec. 4.3.

• Step 6: Check for convergence criteria 1. If not met, run $j=1,\ldots,n$ loops of Steps 2–6, where each loop yields one optimal design location $\{\bar{\bar{x}}_{max,j}\}$, to select the best n design locations $\bar{\bar{X}}_{max}=\{\bar{\bar{x}}_{max,1},\ldots,\bar{\bar{x}}_{max,n}\}$ for the next round of experiments. This step provides multiple design locations for a single round of experiments, since running one experiment at a time would be unrealistic and time consuming. The assumption behind this step is that the GP prediction at $\{\bar{\bar{x}}_{max,j}\}$ is accurate, so that we can proceed to the next best location $\{\bar{\bar{x}}_{max,j+1}\}$ by minimizing the error at the currently selected location $\{\bar{\bar{x}}_{max,j}\}$. We believe this is a fair assumption since, with more knowledge, the GP prediction will be close to the actual experimental data. In the early rounds of experiments the predictions may deviate from the actual experimental results (violating the assumption), but with the knowledge gained from those experiments the GP improves and provides predictions closer to the actual results as the model converges (satisfying the assumption).

• Step 7: Expensive function evaluations: Conduct experiments at the new design locations $X_{k+1}=\bar{\bar{X}}_{max}$. This step lies outside the model environment, as the actual experiments are conducted with the original high-fidelity model to generate the required outputs (strain accumulation and the location of the designs), which are ultimately used to compute the distance metric (Eqs. (9)–(11)). Therefore, the new experiment data are {Xk+1, Y(Xk+1)}.

• Step 8: Data Augmentation: Update the prior knowledge for the next iteration of the model. Update training data matrix with current experimented data Dk+1 = {Dk;(Xk+1, Y(Xk+1))}. Repeat Steps 2–8 until convergence.

• Step 9: If convergence criteria 2 is met, update the GP with the final training data $\Lambda$, augmented with the final sampled data, and stop the model. Convergence criteria 1 and 2 will be explained later in this section.

• Step 10: Feasibility check between safe and unsafe region:

This step is after the optimization is completed and the proposed BO model is fully trained, satisfying convergence criteria. Now, instead of running expensive evaluations, to classify any new designs as safe or unsafe design, we check the feasibility using the trained low-cost BO model:

• If the posterior predictive mean of the new design satisfies $\mu(Y_{new}(X_{new})|\Lambda)\le 0.5$, the new design is in the safe region.

• Otherwise, the new design is in the unsafe region.

The value 0.5 is the threshold as we set this distance value at the transition boundary line.
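
The elementwise transformation of Eq. (12) and the feasibility check of Step 10 can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation used in the paper. The function names and the boolean mask `is_safe_region` (marking the designs assigned to the Elastic/Shakedown region) are assumptions introduced here, and how that region assignment is obtained for a nonsampled design is not shown.

```python
import numpy as np

def transform_mean(mu, is_safe_region):
    """Elementwise transform of Eq. (12).

    mu             : predicted distance values of the nonsampled designs
    is_safe_region : True where a design is assigned to Elastic/Shakedown,
                     False where it is assigned to Plastic/Ratchetting
                     (hypothetical input; its source is assumed, not shown)
    """
    mu = np.asarray(mu, dtype=float)
    mask = np.asarray(is_safe_region, dtype=bool)
    return np.where(mask, mu, 1.0 - mu)

def classify(mu_new, threshold=0.5):
    """Step 10: label a design safe when its predicted mean is <= threshold."""
    mu_new = np.asarray(mu_new, dtype=float)
    return np.where(mu_new <= threshold, "safe", "unsafe")

# Made-up predicted means, for illustration only.
mu = np.array([0.10, 0.45, 0.55, 0.90])
mu_t = transform_mean(mu, mu <= 0.5)  # mask derived from the 0.5 threshold (assumption)
labels = classify(mu)
```

After the transformation, designs on either side of the boundary score highest when they lie closest to the 0.5 threshold, which is what lets a single maximization drive sampling toward the transition region.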

4.1 Gaussian Process Model Formulation of the Thin Tube.

In this section, we present the GPM in our proposed BO model. The general form of the GPM is as follows:
$y(x)=x^{T}\beta+z(x)$
(14)
where $x^{T}\beta$ is the polynomial regression model. In our model, we have used first- and second-order polynomial regression models. The polynomial regression model captures the global trend of the data. In general, first-order polynomial regression is used, which is also known as universal kriging [43]; however, it has also been claimed that it is fine to use a constant mean model [44]. $z(x)$ is a realization of a correlated Gaussian process with mean $E[z(x)]$ and covariance function $cov(x_i, x_j)$, defined as follows:
$z(x)\sim GP(E[z(x)],cov(x_{i},x_{j}))$
(15)
$E[z(x)]=0,\quad cov(x_{i},x_{j})=\sigma^{2}R(x_{i},x_{j})$
(16)
$R(x_{i},x_{j})=\exp\left(-\sum_{m=1}^{d}\theta_{m}(x_{mi}-x_{mj})^{2}\right)$
(17)
$\theta=(\theta_{1},\theta_{2},\ldots,\theta_{d})$
where $\sigma^2$ is the overall scale parameter, $\theta_m$ is the correlation length parameter in dimension m of the d dimensions of x, and $x_{mi}$ is the mth component of design $x_i$. These are termed the hyper-parameters of the GP model. $R(x_i, x_j)$ is the spatial correlation function. In our model, we have used the Gaussian spatial correlation function given by Eq. (17). The objective is to estimate (by maximum likelihood) the hyper-parameters $\sigma^2$ and $\theta$ that give the surrogate model which best explains the training data Dk at iteration k.
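After we build the GP model, its next task is to predict (Step 3) the response at an arbitrary point drawn from the grid matrix in Step 0. Assume Dk = {Xk, Y(Xk)} is the prior information from previous experiments with the high-fidelity model, representing realizations of the prior belief about the unknown true function, and $\bar{\bar{x}}_{k+1}\in\bar{\bar{X}}$ is any new design. The predictive output distribution at $\bar{\bar{x}}_{k+1}$, given the posterior GP model, is given by Eq. (18)
$P(\bar{\bar{y}}_{k+1}|D_{k},\bar{\bar{x}}_{k+1},\sigma_{k}^{2},\theta_{k})=N(\mu(\bar{\bar{y}}_{k+1}(\bar{\bar{x}}_{k+1})),\sigma^{2}(\bar{\bar{y}}_{k+1}(\bar{\bar{x}}_{k+1})))$
(18)
where
$\mu(\bar{\bar{y}}_{k+1}(\bar{\bar{x}}_{k+1}))=cov_{k+1}^{T}\,COV_{k}^{-1}\,Y_{k}$
(19)
$\sigma^{2}(\bar{\bar{y}}_{k+1}(\bar{\bar{x}}_{k+1}))=cov(\bar{\bar{x}}_{k+1},\bar{\bar{x}}_{k+1})-cov_{k+1}^{T}\,COV_{k}^{-1}\,cov_{k+1}$
(20)
$COV_k$ is the kernel matrix of the already sampled designs Xk and $cov_{k+1}$ is the vector of covariances between the new design $\bar{\bar{x}}_{k+1}$ and the sampled designs, defined as follows:
$COV_{k}=\begin{bmatrix}cov(x_{1},x_{1})&\cdots&cov(x_{1},x_{k})\\ \vdots&\ddots&\vdots\\ cov(x_{k},x_{1})&\cdots&cov(x_{k},x_{k})\end{bmatrix}$
$cov_{k+1}=[cov(\bar{\bar{x}}_{k+1},x_{1}),cov(\bar{\bar{x}}_{k+1},x_{2}),\ldots,cov(\bar{\bar{x}}_{k+1},x_{k})]$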
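
As a rough illustration of Eqs. (17), (19), and (20), the sketch below computes the posterior mean and variance with NumPy for fixed hyper-parameters. It follows the zero-mean form of Eqs. (19) and (20), so the regression term of Eq. (14) is omitted. The function names, the toy data, and the small diagonal nugget added for numerical stability are assumptions made here and are not taken from the paper or from the DACE toolbox.

```python
import numpy as np

def gaussian_corr(A, B, theta):
    """Gaussian correlation of Eq. (17) between rows of A (n x d) and B (m x d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(theta * diff**2, axis=2))

def gp_predict(X, y, X_new, sigma2, theta, nugget=1e-10):
    """Posterior mean and variance of Eqs. (19) and (20) for a zero-mean GP.

    The nugget on the diagonal is a numerical-stability assumption added
    here; it does not appear in Eqs. (19) and (20).
    """
    COV = sigma2 * gaussian_corr(X, X, theta) + nugget * np.eye(len(X))
    cov_new = sigma2 * gaussian_corr(X, X_new, theta)  # shape (k, m)
    mean = cov_new.T @ np.linalg.solve(COV, y)
    # cov(x_new, x_new) equals sigma2 because R(x, x) = 1
    var = sigma2 - np.sum(cov_new * np.linalg.solve(COV, cov_new), axis=0)
    return mean, np.maximum(var, 0.0)

# Toy data: three sampled designs in a normalized 2D space (made-up values).
X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
y = np.array([0.2, 0.5, 0.8])
theta = np.array([2.0, 2.0])
X_new = np.array([[0.5, 0.5], [0.25, 0.75]])
mean, var = gp_predict(X, y, X_new, sigma2=1.0, theta=theta)
```

The first query point coincides with a sampled design, so its predicted mean reproduces the observed value and its variance collapses to nearly zero, while the second, unsampled point keeps a clearly positive variance.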

4.2 Generating Grid Points From Unknown Design Spaces of Thin Tube.

In this section, we discuss the generation of grid points within the specified design spaces, where the selected grid points by the acquisition function will be considered as samples for experiments. The goal of generating a grid using a rectangular grid or Latin hypercube is to use the space filling properties to cover the entire design space of the unknown design response surface. More details on the formulation and sampling strategies of these two methods have been provided in the paper [45]. However, the proposed model is not restricted to use these two methods and the user can select a preferred sampling strategy.
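
Both grid types can be generated in a few lines. The sketch below is an assumption-laden illustration in Python with NumPy: the function names and the grid size of 2500 points are chosen here for demonstration, and the bounds are the radius and length ranges reported in Sec. 5. The Latin hypercube routine is a basic textbook variant, not the sampling code referenced in [45].

```python
import numpy as np

def rectangular_grid(bounds, points_per_dim):
    """Full-factorial grid; bounds is a list of (low, high) pairs."""
    axes = [np.linspace(lo, hi, points_per_dim) for lo, hi in bounds]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([m.ravel() for m in mesh])

def latin_hypercube(bounds, n_samples, rng):
    """Basic Latin hypercube: one point per stratum in every dimension."""
    samples = np.empty((n_samples, len(bounds)))
    for j, (lo, hi) in enumerate(bounds):
        strata = rng.permutation(n_samples)                # shuffled stratum indices
        u = (strata + rng.random(n_samples)) / n_samples   # uniform draw inside each stratum
        samples[:, j] = lo + u * (hi - lo)
    return samples

bounds = [(4.0, 6.55), (0.1, 1.0)]    # radius in mm, length in m
grid = rectangular_grid(bounds, 50)   # 50 x 50 = 2500 candidate designs
lhs = latin_hypercube(bounds, 2500, np.random.default_rng(0))
```

Either array can serve as the grid matrix from which the acquisition function picks the next designs; the Latin hypercube avoids the repeated coordinate values of the rectangular grid.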

4.3 Acquisition Function Formulation of the Thin Tube.

In this section, we provide a detailed formulation of the acquisition function used in Step 5 of the proposed model. Three types of acquisition functions have been studied in the model: Probability of Improvement, Expected Improvement, and Full Exploration search. The first acquisition function follows the idea of pure exploitation (selecting design points where the predicted mean is high); the second combines exploitation (selecting design points where the predicted mean is high) with exploration (selecting design points where the predicted variance is high). The final acquisition function is based on exploration only and is very useful when the design space is very flat and the global optimal solution is confined to a very small region. With the first two acquisition functions, we have seen that the model can converge falsely when the design space is flat and only limited samples are available in the early iterations: the acquisition function predicts a very low probability of improvement or expected improvement because the responses are similar for all the experimented design inputs. Thus, when the design surface is unknown and could be very flat in most regions, it is important to use a full exploration acquisition function in the early iterations of the model to ensure that no potential optimal region is missed. Once we find a sample within the confined region of interest, we can switch back to the exploration-exploitation search to avoid unnecessary selection of samples for experiments in the nonoptimal regions. In our model, we have set the switching criterion as follows:

• do Full Exploration search
$\text{if } \max(Y_k)-\min(Y_k)\le\delta,\quad k=1,2,\ldots,K$
• else do Expected Improvement or Probability of Improvement

where δ is a very small value which is set as 0.1. We can also set δ as the percentage (say 1%) of the mean Yk.
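
The switching rule reduces to a range check on the responses sampled so far. The following Python sketch is only illustrative; the function names and the string labels `"FE"` and `"EI"` are hypothetical, while the default δ = 0.1 and the 1% alternative follow the text above.

```python
def choose_acquisition(Y, delta=0.1, default="EI"):
    """Return 'FE' (full exploration) while the sampled responses are nearly flat."""
    if max(Y) - min(Y) <= delta:
        return "FE"
    return default

def relative_delta(Y, fraction=0.01):
    """Alternative threshold: a fraction (e.g., 1%) of the mean response."""
    return fraction * sum(Y) / len(Y)

flat = choose_acquisition([0.90, 0.95, 0.92])   # responses within 0.1 of each other
varied = choose_acquisition([0.10, 0.50])       # responses spread wider than 0.1
```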

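After selecting an appropriate acquisition function, we optimize over the GP to get the next design input location $\{\bar{\bar{x}}_{max}\};\ \bar{\bar{x}}_{max}\in\bar{\bar{X}}$ such that
$\bar{\bar{x}}_{max}=\arg\max_{x_{i}\in\bar{\bar{X}}}u(\bar{\bar{Y}}(\bar{\bar{X}})|\Delta_k)$
(21)
where $u(\bar{\bar{Y}}(\bar{\bar{X}})|\Delta_k)$ is the vector of acquisition function values of all the elements of the vector $\bar{\bar{Y}}(\bar{\bar{X}})$ given the posterior model at iteration k. Below are the equations for the acquisition functions: Probability of Improvement (Eq. (22)), Expected Improvement (Eqs. (23) and (24)), and Full Exploration (Eq. (25))
$u(\bar{\bar{y}}(\bar{\bar{x}})|\Delta_k)=PI(\bar{\bar{y}}(\bar{\bar{x}}))=\begin{cases}\Phi\left(\dfrac{\mu(\bar{\bar{y}}(\bar{\bar{x}}))-y(x^{+})-\xi}{\sigma(\bar{\bar{y}}(\bar{\bar{x}}))},\ \text{mean}=0,\ \text{sd}=1\right) & \text{if } \sigma(\bar{\bar{y}}(\bar{\bar{x}}))>0\\ 0 & \text{if } \sigma(\bar{\bar{y}}(\bar{\bar{x}}))=0\end{cases}$
(22)
$u(\bar{\bar{y}}(\bar{\bar{x}})|\Delta_k)=EI(\bar{\bar{y}}(\bar{\bar{x}}))=\begin{cases}(\mu(\bar{\bar{y}}(\bar{\bar{x}}))-y(x^{+})-\xi)\,\Phi(Z,0,1)+\sigma(\bar{\bar{y}}(\bar{\bar{x}}))\,\phi(Z) & \text{if } \sigma(\bar{\bar{y}}(\bar{\bar{x}}))>0\\ 0 & \text{if } \sigma(\bar{\bar{y}}(\bar{\bar{x}}))=0\end{cases}$
(23)
$Z=\begin{cases}\dfrac{\mu(\bar{\bar{y}}(\bar{\bar{x}}))-y(x^{+})-\xi}{\sigma(\bar{\bar{y}}(\bar{\bar{x}}))} & \text{if } \sigma(\bar{\bar{y}}(\bar{\bar{x}}))>0\\ 0 & \text{if } \sigma(\bar{\bar{y}}(\bar{\bar{x}}))=0\end{cases}$
(24)
$u(\bar{\bar{y}}(\bar{\bar{x}})|\Delta_k)=\sigma^{2}(\bar{\bar{y}}(\bar{\bar{x}}))$
(25)
where $y(x^{+})$ is the maximum actual response among all the experimented data up to the current stage, attained at $x=x^{+}$; $\mu(\bar{\bar{y}})$ and $\sigma^{2}(\bar{\bar{y}})$ are the predicted mean and MSE from the GPM for the nonsampled design $\bar{\bar{x}}\in\bar{\bar{X}}$; $\Phi(\cdot)$ is the cdf; $\phi(\cdot)$ is the pdf; $\xi\ge 0$ is a small value which is recommended to be 0.01 [22] as this works well in most cases, whereas the cooling function of ξ did not. Jones [41] notes that the performance of PI(·) is highly sensitive to the value of ξ, with nonideal values leading to poor performance.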
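
To make Eqs. (21)–(25) concrete, the sketch below evaluates the three acquisition functions over a small vector of candidate designs. It is a hedged illustration in Python with NumPy and the standard library, not the authors' code. The function names and the toy values of `mu` and `sigma` are assumptions, Φ and ϕ are taken to be the standard normal cdf and pdf (consistent with mean = 0, sd = 1 in Eq. (22)), and ξ = 0.01 follows the recommendation cited above.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal pdf."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def _z_score(mu, sigma, y_best, xi):
    """Eq. (24); returns 0 where sigma == 0."""
    safe_sigma = np.where(sigma > 0, sigma, 1.0)  # avoids division by zero
    return np.where(sigma > 0, (mu - y_best - xi) / safe_sigma, 0.0)

def probability_of_improvement(mu, sigma, y_best, xi=0.01):
    """Eq. (22)."""
    z = _z_score(mu, sigma, y_best, xi)
    return np.where(sigma > 0, norm_cdf(z), 0.0)

def expected_improvement(mu, sigma, y_best, xi=0.01):
    """Eq. (23)."""
    z = _z_score(mu, sigma, y_best, xi)
    ei = (mu - y_best - xi) * norm_cdf(z) + sigma * norm_pdf(z)
    return np.where(sigma > 0, ei, 0.0)

def full_exploration(sigma):
    """Eq. (25): the predicted MSE itself."""
    return sigma**2

# Made-up predictions for three candidate designs.
mu = np.array([0.51, 0.30, 0.80])
sigma = np.array([1.0, 0.0, 0.2])
pi = probability_of_improvement(mu, sigma, y_best=0.5)
ei = expected_improvement(mu, sigma, y_best=0.5)
x_max_index = int(np.argmax(ei))  # Eq. (21) over the candidate grid
```

In this toy case the first candidate wins under Expected Improvement because of its large predictive uncertainty, even though the third candidate has the higher predicted mean, which shows the exploration term at work.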

4.4 Convergence Criteria.

In this section, we discuss the convergence criteria established for the model. In the steps of the proposed BO model (Sec. 4), there are two convergence checks, in Step 6 and in Step 9. The check in Step 6 is Convergence 1 and the check in Step 9 is Convergence 2. If either of the convergence checks succeeds, the model stops and returns the final solution:

Convergence 1:

• The maximum improvement value of the acquisition function in selecting the first design sample (first iteration in Step 6) after conducting actual experiments is less than α=0.001. Mathematically, it can be stated as

If j = 1,
$\max\{u(\bar{\bar{Y}}(\bar{\bar{X}})|\Delta_k)\}\le\alpha$
(26)

Convergence 2:

• The absolute difference in the total mean MSE of the predicted responses in m successive iterations is less than $\alpha_1$
$|\mu(\sigma^{2}(\bar{\bar{Y}}_{k}))-\mu(\sigma^{2}(\bar{\bar{Y}}_{k+m}))|\le\alpha_{1}$
(27)
where $\bar{\bar{Y}}_{k}$ is the column vector of all the predicted values for the matrix $\bar{\bar{X}}$ at iteration k.
• The model is stopped once the budget, in terms of the maximum number of experiments or function evaluations, is exhausted, i.e., $\sum_{k}n_{k}\ge S$, where S is the maximum number of function evaluations possible and $n_k$ is the number of samples selected for experiments at the kth iteration.
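
The three stopping rules can be written as small predicates. The Python sketch below is a hypothetical illustration: the function names and argument layout are assumptions, α = 0.001 is taken from the text, and no defaults are given for m, α1, or S because this section does not report their values.

```python
import numpy as np

def convergence_1(acq_values, j, alpha=1e-3):
    """Eq. (26): checked only for the first design of a batch (j == 1)."""
    return j == 1 and float(np.max(acq_values)) <= alpha

def convergence_2(mean_mse_history, m, alpha1):
    """Eq. (27): mean predicted MSE changed by at most alpha1 over m iterations."""
    if len(mean_mse_history) <= m:
        return False  # not enough iterations recorded yet
    return abs(mean_mse_history[-1 - m] - mean_mse_history[-1]) <= alpha1

def budget_exhausted(samples_per_iteration, S):
    """Budget rule: the total number of function evaluations has reached S."""
    return sum(samples_per_iteration) >= S

# Made-up values, for illustration only.
stop_1 = convergence_1([1e-4, 5e-4], j=1)
stop_2 = convergence_2([0.5, 0.2, 0.1001, 0.1], m=1, alpha1=1e-3)
stop_3 = budget_exhausted([10, 5, 5], S=20)
```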

5 Results

In this section, we show the results of the proposed Bayesian optimization framework on the design of the tube, in terms of its performance in finding the transition region between the safe and unsafe regions. We used the DACE package [46] in MATLAB to fit the GP model in the Bayesian optimization. With the radius (rad) and length (l) of the tube as decision variables, two test scenarios have been considered with different tube thicknesses: 1.7 mm and 1.2 mm. The feasible bounds for radius and length are [4–6.55] mm and [0.1–1] m, respectively. Figures 6–9 show the results after the model satisfies convergence criteria 1 for the two tube thicknesses.

5.1 Predicted Transition Region of Converged Model.

The pink and black dots in Figs. 6 and 7 represent the random starting samples and the BO-guided adaptive sample design locations that have been trained from actual function evaluations as described in Sec. 3.1. The final posterior predicted transition region, which also represents the discontinuity or the constraint boundary region, has been developed from those prior training data only; therefore, designers can decide on the feasibility of any new design in the specified design space based on this small sample of data, instead of running further experiments. The green and red highlighted regions represent the final transition region, defined as the predicted distance function value Y ranging from 0.4 to 0.5 and from 0.5 to 0.6, respectively, given the prior training data. The green highlighted region represents the area in the Shakedown region (safe) that is very close to the transition region near the constraint boundary line. The red highlighted region represents the area in the Plastic or Ratchetting region (unsafe) that is very close to the transition region near the constraint boundary line.

From the visualization, we can see that a design that falls above the green region is most likely a safe and good design. A design that falls within the green or red region is very close to the transition region and is therefore recommended for further analysis. Any design that falls below the red region is most likely not a safe design and is susceptible to creep-fatigue failure. The converged results of both scenarios from the proposed BO framework have been compared with the true solution (Bree diagram of the thin tube) in terms of pressure and temperature stresses in Figs. 8 and 9. The grey shadowed part is the region of interest of our test cases, where we show the predicted transition region (denoted by red and green) centered about the true transition line. We know from the Bree diagram that below the solid black and dashed red lines is the true Elastic/Shakedown (safe) region and above them are the Plastic and Ratchetting (unsafe) regions, respectively. The region of interest in Fig. 8 does not cover the Ratchetting region; thus, we see the red region above the solid black line (toward the plastic region) and the green region below the solid black line (toward the elastic region). The region of interest is more complicated in Fig. 9, since it covers the Elastic/Shakedown (below the black and red dashed lines), Plastic (above the black line), and Ratchetting (above the red dashed line) regions. Because of this more complex design response surface, the model needed more training data (black dots) for the 1.2 mm thickness than for the 1.7 mm thickness to reach convergence.

5.2 Classification Error.

Next, we consider 100,000 randomly selected new designs (test data), drawn by Latin Hypercube sampling from the same design space, as validation data to classify between the safe and unsafe regions (Step 10 in Sec. 4). Tables 1 and 2 provide the confusion matrices for both test scenarios, with a classification error rate of 0.42% when YSmax = 0.45 and YPmin = YRmin = 0.55. We found some incorrect classifications because the BO model optimizes for a transition region rather than the true line. However, our assumption appears reasonable, as optimizing the model to locate the transition region provides efficient learning and high accuracy (error rate < 1%) in locating the true constraint boundary line.
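
For readers who want to reproduce this kind of validation, the error rate follows directly from the confusion matrix. The sketch below is a minimal Python illustration with a made-up set of eight labels, not the 100,000-design test data of the paper; the function names are assumptions, and the 0/1 coding (0 unsafe, 1 safe) mirrors the binary coding described in Sec. 5.3.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 counts; rows = true class, columns = predicted class (0 unsafe, 1 safe)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def error_rate(cm):
    """Share of misclassified designs, in percent."""
    return 100.0 * (cm.sum() - np.trace(cm)) / cm.sum()

# Hypothetical labels for eight designs, for illustration only.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
rate = error_rate(cm)
```

Here the single off-diagonal entry is an unsafe design predicted as safe, which is the more critical kind of error for a feasibility check and is worth inspecting separately from the overall rate.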

Table 3 provides a summary of the sensitivity analysis of the values of YSmax, YPmin, and YRmin in Eqs. (9)–(11), in terms of the number of training data sampled and the classification accuracy on new designs, considering the equivalent 100k test data and both values of the thin tube thickness after model convergence (Convergence 1). In this case study, the sensitivity analysis shows the most consistent classification accuracy for both thickness scenarios when $Y_{Smax}=0.45$ and $Y_{Pmin}=Y_{Rmin}=0.55$, with a mean error rate of 0.42%. However, considering the amount of training data sampled, a range of YSmax between 0.45 and 0.48 and of YPmin, YRmin between 0.55 and 0.52 gives a good trade-off between the cost of training data and the classification accuracy. Beyond that range, we see either a significant error rate (mean approx. 5.3%) or a significant cost of training data (mean approx. 150–200) to reach the minimal error rate. Thus, when we attempt to locate the exact transition line (last two rows of Table 3) rather than a region, the model has the highest error rate (∼5.3%) and requires much more sampling to reduce the error, making the model inefficient.

The sensitivity analysis has been presented in this paper as a study to see the effect of gap width on the model performance. Although we have seen some changes in the model classification accuracy, the error rate has generally been less than 1%. This shows the performance is not extremely sensitive to the gap width and therefore, in general application, we can think of a standard value (not too wide or thin gap width) as the values provided in the sensitivity analysis. Further research on the strategy to optimize the band width as a trade-off between model performance and cost for a general problem will be considered in the future.

5.3 Comparison With Existing Classification Methods.

Finally, Table 4 shows a comparison of our proposed method with other methods, namely SVM, Random Forest (RF), and Ada Boosting (ADA) [10–12], for classification between safe and unsafe designs among 100k randomly selected new thin tube designs, considering both thickness values. These existing methods were implemented using built-in functions in R packages [47–49], with a radial kernel in SVM and 2000 trees (iterations) for ADA; the responses are provided as standard binary values (0-unsafe and 1-safe). At first, we use Latin Hypercube sampling to generate a full matrix, X′ (refer to Sec. 4, Step 0), over the design space as the training data (2500 samples) for the SVM, RF, and ADA models. From the results in Table 4, we can see that SVM gives the best classification performance (error rate = 0.325%) and ADA gives the worst (error rate = 3.39%). Although SVM has a lower error rate than our proposed BO method, it required far more samples to train the model (2500 samples versus 67 and 120 samples), causing a significant increase in experimental or function evaluation cost. We therefore made another comparison in which the SVM, RF, and ADA models were trained only on the training data used by the proposed BO models until convergence. Using this minimal BO training data, our proposed method provides the best performance (error rate = 0.42%), while SVM gives a much higher error rate of 1.5% and ADA is the worst (error rate = 9.15%). The detailed confusion matrices for classification using SVM, RF, and ADA are provided in Tables 1–4 in the Supplemental Materials on the ASME Digital Collection.

In this problem, our main objective is to classify designs as safe or unsafe, so the BO model guides us to sample more densely near the unknown transition region. As a result, the surrogate GP model predicts the output with high accuracy for designs close to the transition boundary rather than for designs far away from it. Designs closer to the boundary are the most prone to misclassification and therefore require higher prediction accuracy from the GP model, which is why the BO model recommends denser sampling over that region. This is not true for designs farther from the transition boundary: even with lower prediction accuracy, their predictions are less likely to cross the threshold. More sampling in such regions of little interest would therefore be redundant, considering the trade-off between experimental cost and classification accuracy. Thus, with the strategic and adaptive sampling of the BO model, we see a minimal error rate with minimal training design samples (Table 4). For a subsequent goal of predicting the model output with higher accuracy in only the safe design space (such as for finding the optimal tube design), we can refine the GP model over this region (a future research topic).

6 Conclusion and Future Research Scope

In this paper, we have proposed the application of Bayesian optimization to locate the constraint boundary at the transition between the safe and unsafe regions of a thin tube, in terms of the risk of creep-fatigue failure under constant application of pressure and temperature stresses, and thereby to use it as a classification tool for evaluating new designs as good or bad designs. As we have discussed, the constraint boundary in this problem also represents the discontinuity of the function (discontinuous transition region); the proposed strategy provides a way to tackle the discontinuous design space by projecting it to an artificial continuous design space for better convergence of the BO model. It is worth mentioning that once we obtain the required data (region and strain accumulation) for a design, the formulation of the distance function does not depend on the scale or complexity of the design geometry. The complexity arises from how those required data are obtained (e.g., FEA). At each iteration, the model uses the prior knowledge of the training data sampled in previous iterations to update the posterior predictive model. This informs the acquisition function, which chooses the design to sample in the next iteration so as to maximize learning of the optimal region of the unknown function. However, unlike the standard BO model for maximization or minimization problems, our objective is to locate the unknown constraint boundary. Therefore, we reformulate our objective function as a distance function, which lets us recast our objective as a maximization problem where the maximum objective function value, or the new optimal region, is toward the true constraint boundary.

Our proposed BO approach does not depend on pre-existing training data, since incorporating Bayesian knowledge into the optimization framework allows us to strategically select design samples that maximize the learning iteratively and minimize the overall cost of sampling for expensive function evaluations (training data) to achieve the desired level of accuracy. With the resulting small error rate, there is a high likelihood that the model emulates the true constraint boundary, and this will help us carry the problem to the next stage (future research) of finding the optimal design, as we have higher confidence of preserving only feasible designs in terms of creep-fatigue failure. The next stage of research will focus on implementing the full framework in a complex high-dimensional CHX (Fig. 11) design. In that problem, we will use the results of the classification as an optimization pre-stage, where the design optimization problem considers application and manufacturing constraints.

Acknowledgment

This research was funded in part by DOE NEUP DE-NE0008533. The opinions, findings, conclusions, and recommendations expressed are those of the authors and do not necessarily reflect the views of the sponsor.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request. The authors attest that all data for this study are included in the paper.

References

1.
Huo
,
J.
, and
Liu
,
L.
,
2018
, “
An Optimization Framework of Multiobjective Artificial Bee Colony Algorithm Based on the MOEA Framework
,”
Comput. Intell. Neurosci.
, pp.
1
26
. https://doi.org/10.1155/2018/5865168
Article ID 5865168
.
2.
Feng
,
J.
,
Shen
,
W. Z.
, and
Li
,
Y.
,
2018
, “
,”
Appl. Sci.
,
8
(
11
), p.
2053
. 10.3390/app8112053
3.
Li
,
Y.
,
Chen
,
C.-K.
, and
Cho
,
Y.-Y.
,
2006
, “
A Unified Optimization Framework for Microelectronics Industry
,”
Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation
,
Seattle, WA
,
July
.
4.
Isaac
,
B
, and
Allaire
,
D
,
2019
, “
Expensive Black-Box Model Optimization Via a Gold Rush Policy
,”
ASME. J. Mech. Des.
,
141
(
3
), p.
031401
. https://doi.org/10.1115/1.4042113
5.
Sharif
,
B.
,
Wang
,
G. G.
, and
ElMekkawy
,
T. Y.
,
2008
, “
Mode Pursuing Sampling Method for Discrete Variable Optimization on Expensive Black-Box Functions
,”
ASME J. Mech. Des.
,
130
(
2
), p.
021402
. 10.1115/1.2803251
6.
Tran
,
A.
,
Wildey
,
T.
, and
McCann
,
S.
,
2019
, “
sBF-BO-2CoGP: A Sequential Bi-Fidelity Constrained Bayesian Optimization for Design Applications
,”
The ASME 2019 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
Anaheim, CA
,
Aug. 18–21
. http://dx.doi.org/10.1115/DETC2019-97986
7.
Brochu
,
E.
,
Cora
,
V. M.
, and
de Freitas
,
N.
,
2010
, “
A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
,”
arXivLabs
. arXiv:1012.2599v1
8.
Bree
,
J.
,
1967
, “
Elastic-Plastic Behaviour of Thin Tubes Subjected to Internal Pressure and Intermittent High-Heat Fluxes With Application to Fast-Nuclear-Reactor Fuel Elements
,”
J. Strain Anal.
,
2
(
3
), pp.
226
238
. 10.1243/03093247V023226
9.
Saranam
,
V. R.
, and
Paul
,
B. K.
,
2018
, “
Feasibility of Using Diffusion Bonding for Producing Hybrid Printed Circuit Heat Exchangers for Nuclear Energy Applications
,”
Procedia Manuf.
,
26
, pp.
560
569
. 10.1016/j.promfg.2018.07.066
10.
Musumeci
,
F.
,
Rottondi
,
C.
,
Nag
,
A.
,
Macaluso
,
I.
,
Zibar
,
D.
,
Ruffini
,
M.
, and
Tornatore
,
M.
,
2019
, “
An Overview on Application of Machine Learning Techniques in Optical Networks
,”
IEEE Commun. Surv. Tutor.
,
21
(
2
), pp.
1383
1408
. 10.1109/COMST.2018.2880039
11.
Binkhonain
,
M.
, and
Zhao
,
L.
,
2019
, “
A Review of Machine Learning Algorithms for Identification and Classification of Non-Functional Requirements
,”
Expert Syst. Appl. X
,
1
. https://doi.org/10.1016/j.eswax.2019.100001
Article 100001
.
12.
Sekeroglu
,
B.
,
Hasan
,
S. S.
, and
Abdullah
,
S. M.
,
2020
, “
Comparison of Machine Learning Algorithms for Classification Problems
,”
,
Las Vegas
,
May 2–3
, pp.
491
499
. http://dx.doi.org/10.1007/978-3-030-17798-0_39
13.
Kurata
,
G.
,
Xiang
,
B.
, and
Zhou
,
B.
, 2016, “
Improved Neural Network-Based Multi-Label Classification with Better Initialization Leveraging Label Co-occurrence
,”
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
,
San Diego, CA
,
June
, pp.
521
526
. 10.18653/v1/N16-1063
14.
Kanellopoulos
,
I.
, and
Wilkinson
,
G. G.
,
1997
, “
Strategies and Best Practice for Neural Network Image Classification
,”
Int. J. Remote Sens.
,
18
(
4
), pp.
711
725
. 10.1080/014311697218719
15.
Inan
,
O. T.
,
Giovangrandi
,
L.
, and
Kovacs
,
G. T. A.
,
2006
, “
Robust Neural-Network-Based Classification of Premature Ventricular Contractions Using Wavelet Transform and Timing Interval Features
,”
IEEE Trans. Biomed. Eng.
,
53
(
12
), pp.
2507
2515
. 10.1109/TBME.2006.880879
16.
Li
,
Y.
,
Xie
,
W.
, and
Li
,
H.
,
2017
, “
Hyperspectral Image Reconstruction by Deep Convolutional Neural Network for Classification
,”
Pattern Recognition
,
63
, pp.
371
383
. https://doi.org/10.1016/j.patcog.2016.10.019
17.
Nasierding
,
G.
,
Tsoumakas
,
G.
, and
Kouzani
,
A. Z.
,
2009
, “
Clustering Based Multi-Label Classification for Image Annotation and Retrieval
,”
2009 IEEE International Conference on Systems, Man and Cybernetics
,
San Antonio, TX
,
Oct. 11–14
, pp.
4514
4519
.
18.
Barros
,
R. C.
,
Cerri
,
R.
,
Freitas
,
A. A.
, and
de Carvalho
,
A. C. P. L. F.
,
2013
, “
Probabilistic Clustering for Hierarchical Multi-Label Classification of Protein Functions
,”
Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013, Prague, Czech Republic
,
Sept. 23–27
,
Berlin, Heidelberg
, pp.
385
400
.
19.
Zhu
,
X.
,
Vondrick
,
C.
,
Fowlkes
,
C. C.
, and
Ramanan
,
D.
,
2016
, “
Do We Need More Training Data?
,”
Int. J. Comput. Vis.
,
119
(
1
), pp.
76
92
. 10.1007/s11263-015-0812-2
20.
Cho
,
J.
,
Lee
,
K.
,
Shin
,
E.
,
Choy
,
G.
, and
Do
,
S.
2020
, “
How Much Data is Needed to Train a Medical Image Deep Learning System to Achieve Necessary High Accuracy?
,”
ArXiv151106348 Cs
,
Jan. 2016, Accessed November 20, 2020
. http://arxiv.org/abs/1511.06348
21.
Lizotte
,
D.
,
Wang
,
T.
,
Bowling
,
M.
, and
Schuurmans
,
D.
, 2007, “
Automatic Gait Optimization with Gaussian Process Regression
,”
IJCAI'07: Proceedings of the 20th International Joint Conference on Artifical Intelligence
,
,
Jan. 6–12
.
22.
Lizotte
,
D.
,
2008
,
Practical Bayesian Optimization
,
University of Alberta
,
.
23.
Cora
,
V. M.
,
2008
,
Model-Based Active Learning in Hierarchical Policies
,
University of British Columbia Library
,
. 10.14288/1.0051276
24.
Frean
,
M.
, and
Boyle
,
P.
,
2008
, “
Using Gaussian Processes to Optimize Expensive Functions
,”
AI 2008: Advances in Artificial Intelligence, Auckland, New Zealand
,
Dec. 1–5
,
Berlin, Heidelberg
, pp.
258
267
.
25.
Martinez-Cantin
,
R.
,
de Freitas
,
N.
,
Brochu
,
E.
,
Castellanos
,
J.
, and
Doucet
,
A.
,
Aug. 2009
, “
A Bayesian Exploration-Exploitation Approach for Optimal Online Sensing and Planning With a Visually Guided Mobile Robot
,”
Auton. Robots
,
27
(
2
), pp.
93
103
. 10.1007/s10514-009-9130-2
26.
Chu
,
W.
, and
Ghahramani
,
Z.
,
2005
, “
Extensions of Gaussian Processes for Ranking: Semisupervised and Active Learning
,”
The NIPS 2005 Workshop on Learning to Rank
,
Whistler, BC
,
Dec. 9
.
27.
Thurstone
,
L. L.
,
1927
, “
A Law of Comparative Judgment
,”
Psychol. Rev.
,
34
(
4
), pp.
273
286
. 10.1037/h0070288
28.
Mosteller
,
F.
,
2006
, “Remarks on the Method of Paired Comparisons: I. The Least Squares Solution Assuming Equal Standard Deviations and Equal Correlations,”
Selected Papers of Frederick Mosteller
,
S. E.
Fienberg
, and
D. C.
Hoaglin
, eds.,
Springer
,
New York, NY
, pp.
157
162
.
29.
Holmes
,
C. C.
, and
Held
,
L.
,
2006
, “
Bayesian Auxiliary Variable Models for Binary and Multinomial Regression
,”
Bayesian Anal.
,
1
(
1
), pp.
145
168
. 10.1214/06-BA105
30.
Shu
,
L.
,
Jiang
,
P.
,
Shao
,
X.
, and
Wang
,
Y.
,
2020
, “
A New Multi-Objective Bayesian Optimization Formulation With the Acquisition Function for Convergence and Diversity
,”
ASME J. Mech. Des.
,
142
(
9
), p.
091703
. 10.1115/1.4046508
31.
Sarkar
,
S.
,
Mondal
,
S.
,
Joly
,
M.
,
Lynch
,
M. E.
,
Bopardikar
,
S. D.
,
Acharya
,
R.
, and
Perdikaris
,
P.
,
2019
, “
Multifidelity and Multiscale Bayesian Framework for High-Dimensional Engineering Design and Calibration
,”
ASME J. Mech. Des.
,
141
(
12
), p.
121001
. 10.1115/1.4044598
32.
Sexton
,
T.
, and
Ren
,
M. Y.
,
2017
, “
Learning an Optimization Algorithm Through Human Design Iterations
,”
ASME J. Mech. Des.
,
139
(
10
), p.
101404
. 10.1115/1.4037344
33.
Hutter
,
F.
,
Hoos
,
H. H.
, and
Leyton-Brown
,
K.
, “
Sequential Model-Based Optimization for General Algorithm Configuration
,”
Learning and Intelligent Optimization
,
Berlin, Heidelberg
,
2011
, pp.
507
523
. 10.1007/978-3-642-25566-3_40.
34.
Shahriari
,
B.
,
Swersky
,
K.
,
Wang
,
Z.
,
,
R. P.
, and
de Freitas
,
N.
,
2016
, “
Taking the Human Out of the Loop: A Review of Bayesian Optimization
,”
Proc. IEEE
,
104
(
1
), pp.
148
175
. 10.1109/JPROC.2015.2494218
35.
Andrianakis
,
I.
, and
Challenor
,
P.
,
2012
, “
The Effect of the Nugget on Gaussian Process Emulators of Computer Models
,”
Comput. Stat. Data Anal.
,
56
(
12
), pp.
4215
4228
. 10.1016/j.csda.2012.04.020
36. Pepelyshev, A., 2010, “The Role of the Nugget Term in the Gaussian Process Method,” mODa 9 – Advances in Model-Oriented Design and Analysis, Bertinoro, Italy, June 14–18, Heidelberg, pp. 149–156.
37. Xing, W., Elhabian, S. Y., Keshavarzzadeh, V., and Kirby, R. M., 2020, “Shared-Gaussian Process: Learning Interpretable Shared Hidden Structure Across Data Spaces for Design Space Analysis and Exploration,” ASME J. Mech. Des., 142(8), p. 081707. 10.1115/1.4046074
38. Bostanabad, R., Chan, Y.-C., Wang, L., Zhu, P., and Chen, W., 2019, “Globally Approximate Gaussian Processes for Big Data With Application to Data-Driven Metamaterials Design,” ASME J. Mech. Des., 141(11), p. 111402. 10.1115/1.4044257
39. Erickson, C. B., Ankenman, B. E., and Sanchez, S. M., 2018, “Comparison of Gaussian Process Modeling Software,” Eur. J. Oper. Res., 266(1), pp. 179–192. 10.1016/j.ejor.2017.10.002
40. Kushner, H. J., 1964, “A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise,” ASME J. Basic Eng., 86(1), pp. 97–106. 10.1115/1.3653121
41. Jones, D. R., 2001, “A Taxonomy of Global Optimization Methods Based on Response Surfaces,” J. Glob. Optim., 21(4), pp. 345–383. 10.1023/A:1012771025575
42. Cox, D. D., and John, S., 1992, “A Statistical Method for Global Optimization,” [Proceedings] 1992 IEEE International Conference on Systems, Man, and Cybernetics, Chicago, IL, Oct. 18–21, Vol. 2, pp. 1241–1246.
43. Bastos, L. S., and O’Hagan, A., 2009, “Diagnostics for Gaussian Process Emulators,” Technometrics, 51(4), pp. 425–438. 10.1198/TECH.2009.08019
44. Chen, H., Loeppky, J. L., Sacks, J., and Welch, W. J., 2016, “Analysis Methods for Computer Experiments: How to Assess and What Counts?,” Stat. Sci., 31(1), pp. 40–60. 10.1214/15-STS531
45. Nielsen, H. B., Lophaven, S. N., and Søndergaard, J., 2002, “DACE – A Matlab Kriging Toolbox.” https://orbit.dtu.dk/en/publications/dace-a-matlab-kriging-toolbox
46. Lophaven, S. N., Nielsen, H. B., and Søndergaard, J., 2002, DACE – A Matlab Kriging Toolbox, Version 2.0, DTU Orbit, http://www.imm.dtu.dk/pubdb/p.php?3213
47. Meyer, D., Dimitriadou, E., Hornik, K., Leisch, F., and Weingessel, A., 2020, Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien.
48. Breiman, L., 2001, “Random Forests,” Machine Learning, 45, pp. 5–32. https://doi.org/10.1023/A:1010933404324
49. Greenwell, B., Boehmke, B., Cunningham, J., and GBM Developers, 2020, Generalized Boosted Regression Models, https://github.com/gbm-developers/gbm