Uncertainty quantification (UQ) is an emerging field that focuses on characterizing, quantifying, and potentially reducing, the uncertainties associated with computer simulation models used in a wide range of applications. Although it has been successfully applied to computer simulation models in areas such as structural engineering, climate forecasting, and medical sciences, this powerful research area is still lagging behind in materials simulation models. These are broadly defined as physics-based predictive models developed to predict material behavior, i.e., processing-microstructure-property relations and have recently received considerable interest with the advent of emerging concepts such as Integrated Computational Materials Engineering (ICME). The need of effective tools for quantifying the uncertainties associated with materials simulation models has been identified as a high priority research area in most recent roadmapping efforts in the field. In this paper, we present one of the first efforts in conducting systematic UQ of a physics-based materials simulation model used for predicting the evolution of precipitates in advanced nickel–titanium shape-memory alloys (SMAs) subject to heat treatment. Specifically, a Bayesian calibration approach is used to conduct calibration of the precipitation model using a synthesis of experimental and computer simulation data. We focus on constructing a Gaussian process-based surrogate modeling approach for achieving this task, and then benchmark the predictive accuracy of the calibrated model with that of the model calibrated using traditional Markov chain Monte Carlo (MCMC) methods.

## Introduction

Uncertainty quantification (UQ) is increasingly receiving attention due to advances in computational infrastructure and computer simulation models in multiple applications. These models are used to help understand the behavior of complex systems through representation of the phenomena that govern these systems using mathematical and physical formulations, and subsequently predicting specific quantities of interest (QoIs). It is widely accepted that models are imperfect since they involve modeling very complex phenomena that typically occur across different length and time scales [1]. Even if the correct values of the model inputs are used in the simulation, the output of the model will differ from the true value of the physical phenomenon being simulated due to incomplete understanding of the model parameters, incomplete knowledge of the physics, and/or intrinsic stochastic behavior of the system being simulated [2]. Consequently, the challenging task of characterizing and quantifying the uncertainties associated with these models is crucial toward leveraging their benefits and streamlining their applicability. Applications of UQ span a wide range across several science, engineering, and social disciplines such as climate and weather [3], forestry [4], nuclear engineering [5], computational fluid dynamics [6], biological phenomena [7], medicine [8], and econometrics [9].

One field in which the application of UQ is highly desired is computational materials science in general [10] and the emerging field of *Integrated Computational Materials Engineering* (ICME) in particular [11–13]. ICME is a new approach within the broad field of materials science and engineering that aims to integrate computational materials models to enable the optimization of the materials, manufacturing processes, and component design long before components are fabricated [14]. It has evolved due to the tremendous advances in computational materials models over the past three decades. In a recent roadmapping study by The Minerals, Metals & Materials Society (TMS) [15], UQ of these computational models was identified as a fundamental issue that needs to be addressed within the ICME framework. As stated in many recent reports [10,12,13], all computational materials models have uncertainties due to the stochastic nature of materials structures in addition to the levels of uncertainties associated with simulation methods [14]. To date, the focus in ICME has been on the development of physics-based computational models, but UQ has not been a central question despite its crucial importance [10]. In this work, we conduct one among the early efforts in applying formal UQ to an important class of computational materials models known as precipitation evolution models.

Precipitation evolution models are concerned with understanding and predicting the formation, growth and evolution of secondary phase particles (*precipitates*) that form from an initially supersaturated matrix phase during prescribed thermal treatments. Precipitates can play an important role in enhancing and modifying material properties (e.g., strength and hardness), and hence part performance [16,17]. Accurate modeling of the evolution of precipitate populations requires considerable understanding not only of the thermodynamics (i.e., driving forces) and kinetics (i.e., diffusion, growth rates) of the materials of interest, but also of the coupling of sophisticated large-scale simulations with difficult to acquire experimental data. Consequently, uncertainties are typically associated with these interacting components which results in hampering our understanding of the underlying process and our efforts to develop meaningful predictions [1]. These uncertainties originate from many different sources, some of which are related to the choice of model itself, while others relate to model inputs and parameters [6]. For example, the model may include oversimplified assumptions about some complex and intricate physics. Another possible source of uncertainty might originate from the lack of knowledge about the true value of intrinsic physical properties of the material, such as density or diffusion coefficient, that serve as inputs to the model. Kennedy and O'Hagan [18] provide a classification of the sources of uncertainty in computer simulation models.

The literature on formally characterizing uncertainties associated with ICME computational materials models is sparse. Traditionally, research efforts have utilized Monte Carlo methods to achieve this task [19–23]. Although Monte Carlo methods represent a powerful and well-studied approach, they require a large number of simulation runs, which becomes impractical and often infeasible with ICME models which are typically computationally expensive. In contrast, we present a UQ approach based on surrogate modeling to conduct *Bayesian calibration* of a physics-based computational model used to predict the evolution of precipitates in advanced shape-memory alloys subject to heat treatment. The calibration problem (sometimes known as the inverse UQ problem) refers to the adjustment of a set of model parameters such that the model's predictions are in agreement with experimental data [24].

Shape-memory alloys (SMAs) are a class of advanced metal alloys attractive to the industrial and academic communities due to their ability to *remember* their original shape. In other words, after deformation they have the ability to retain their original shape upon being heated [17]. The application scope of SMAs is broad across various sectors such as the automotive, aerospace, manufacturing, and energy exploration sectors [25]. Recently, it has been shown that control of the distribution of secondary strengthening precipitate phases in some SMAs can enable further control strategies to tune their transformation behavior to specific engineering applications [16,26].

The paper is organized as follows: Sec. 2 presents an overview of the literature. The current work represents a unique intersection of the fields of UQ, Bayesian calibration, physics-based precipitation models, and shape-memory alloys. Next, we present the precipitation model for nickel–titanium shape-memory alloys (NiTi SMAs) and its attributes in Sec. 3. The Bayesian calibration model is described in Sec. 4, and subsequently, Sec. 5 reports the application of the calibration model into the precipitation model and the obtained results. We finalize with conclusions, insights, and directions for future work in Sec. 6.

## Literature Review

Modeling and simulation of precipitate evolution during material processing is of great interest to material scientists and engineers since it can have considerable impact on mechanical properties and part performance. Research on precipitation modeling typically starts with the development of thermodynamic models—using, for example, the CALPHAD framework—in order to identify the thermodynamic conditions that yield specific phase constitution states [27–29]. One of the earliest numerical precipitation models upon which many other studies have relied is the model provided by Wagner et al. [30]. The existing literature on precipitation modeling focuses on microstructural features and mechanical properties. For instance, Robson et al. [31] develop a model to account for the effect of nucleation processes on dislocations, and predict precipitation behavior in aluminum [32]. A different study conducted by Deschamps et al. [33,34] presents research on microstructure evolution and modeling of precipitation kinetics by integrating nucleation growth and coarsening under different heat treatments schemes, and then maps them to mechanical properties. Other works focus on phase field models to assess the morphology of precipitates at different sizes [35] and to analytically predict the evolution of the precipitate volume fraction using calorimetry techniques [36].

While there exists a plethora of research efforts on the topic of computational materials models, there is a corresponding lack of efforts on uncertainty quantification of these models. Chernatynskiy et al. [10] present a review of the few existing works on UQ in computational materials model with emphasis on multiscale simulations. One example includes Cai and Mahadevan [37] who conducted uncertainty propagation in a manufacturing process through leveraging modeling using Gaussian processes. Another example is given in Crews and Smith [38] where the authors use UQ to predict heat transfer parameters of SMA bending actuators.

One possible explanation for the lack of research on UQ in the context of computational materials models is the fact that advances in these models are recent and have focused on aspects relevant to the underlying physics responsible for the phenomena of interest. As these models continue to develop and mature, the need for UQ receives more attention as noted in multiple reports [14,15]. An important note, however, is that despite the lack of this integration, UQ as an independent field is quite mature due to the broad scope of applications that it addresses. Numerous UQ frameworks and methods have been developed to solve different problems such as regression, subset selection, hypothesis testing, and basis expansions [39,40]. In this paper, we focus on *Gaussian process* (GP) models to conduct UQ in ICME computational materials models.

Gaussian processes have a long history of application in mining, agriculture, and forestry since 1920 [41]. Due to their versatility, they have experienced significant growth over the past three decades, especially with the advances in low-cost high-speed computing [42]. GP models are well suited for studying and quantifying the uncertainty of functions and for making predictions given observed data. They have a very wide range of applications that include spatial modeling [43], computer model emulation [44], image analysis [45], and supervised classification and prediction in machine learning [46]. Rasmussen and Williams [46] present the theory and applications of GPs, while Cressie [47] and Gelfand et al. [42] focus on GP applied to spatial statistics (often called *geostatistics*), its implementations and generalization to different problems. Kennedy and O'Hagan [18,48] set foot for GP in the UQ field with their seminal article on calibration of computer models, a study that served as base for many subsequent works, including the current work, such as Refs. [49–51] for surrogate modeling and [8,52,53] for calibration.

The current work is first to conduct systematic UQ of a precipitation evolution computational model in shape-memory alloys. By conducting this task, we improve the predictive capability of the model which is very useful in the design and processing of materials. We focus on NiTi SMAs as a model material due to their unique features highlighted in Sec. 1. Two excellent review articles on SMAs are presented by Lagoudas [54] and Ma et al. [25] summarizing modeling, engineering, and high temperature applications. The nickel–titanium (NiTi) alloy considered in this paper (commercially known as *nitinol*) is a popular class of SMAs that has been broadly investigated for medical applications due to its biocompatibility [55,56], superelasticity, and strain control [57,58].

## Precipitation Model

The focus of the current work is on conducting uncertainty quantification of a physics-based model, rather than the development of the model itself. In other words, the physics-based model is treated as a black box and the focus is on the development of the statistical UQ framework. In this section, we present a high-level overview of the physics-based model of interest but omit technical details since they are out of the paper's scope.

Material modeling tools are generally used to determine the evolution of a material over time based on the environmental conditions imposed upon it. One approach, precipitation modeling, is used to predict the nucleation and growth of a secondary phase within the matrix of a primary phase. These small precipitates (secondary-phase particles dispersed throughout the material) are compositionally and/or structurally different than the surrounding matrix phase, allowing their process-dependent morphologies to be tracked as they evolve in the presence of various environmental conditions. The sizes and shapes of these particles can drastically alter the properties of the overall material [59,60]. In the context of this research, we are interested in the nickel composition of the matrix which is calculated at any step in the process through stoichiometric considerations and a mass-balance equation involving nickel in the precipitates and the initial nickel content of the material before thermal processing. In turn, the composition of nickel in the matrix, the volume fraction of the precipitates, and their size distribution control the final shape-memory behavior in NiTi alloys [16].

*MatCalc*, the microstructural modeling tool used in this research, utilizes the numerical Kampmann Wagner (NKW) approach, coupled with proprietary thermodynamic and kinetic databases, to model the phase transformation of a material [61–63]. The model is unique in that it calculates the evolution of a discrete size distribution of particles in a material system when given a set of times and temperatures representing a thermal processing schedule. This model strikes a balance between the fast but overly simplified Johnson–Mel–Avrami–Kolmogorov (JMAK) approach [64] and the accurate but computationally intensive phase field approach [65]. The size distribution provides a more complete physical representation of the system than JMAK and the mean-field approximation for each size class leads to much faster calculations overall when compared to more detailed microstructural evolution techniques, such as phase field modeling which explicitly simulates the evolution of the entire microstructure, including secondary phases.

The evolution of the precipitate size distribution is tackled by applying the nucleation, growth, and coarsening equations to each bin at each time step, then updating the size distribution bin sizes to reflect the evolution that occurred during said time step. The nucleation of each bin in the distribution is calculated by governing equations based on classical nucleation theory. These nucleated bins are then individually subjected to growth-based governing equations rooted in the thermodynamic extremal principle. Finally, the coarsening process is approximated by establishing a minimum precipitate radius through an analysis of the system as a whole. Precipitates smaller than the minimum size value are then dissolved and redistributed to the next nearest bins larger than the minimum radius. These three evolution calculations are repeated at every time step, leading to the overall evolution of the size distribution of particles. Each of these sections has a different set of governing equations which are not included in this paper for the sake of brevity, but can easily be found at the reference provided [63]. Although each set of governing equations has been derived to explain a different portion of the material's overall evolution, they predominantly rely on the same parameters and variables meaning that all three sets of equations can be calibrated holistically.

Three terms (interfacial energy, diffusion correction prefactor, and diffusion correction exponential factor) significantly influence the kinetic behavior of the overall model and as such can be used to calibrate the model to experimental data. The interfacial energy term influences the behavior of the nucleation portion of the model and the diffusion correction factors equally influence all three stages of precipitate evolution listed above by modifying the substitutional diffusivity of the matrix phase. Values for interfacial energy are known with much less accuracy than they should be, especially when considering how sensitive the entire model is to this term. In fact, MatCalc attempts to calculate an interfacial energy value using the generalized broken bond (GBB) method if no calibrated interfacial energy is given [63,66]. This GBB approach works well for many systems but failed to accurately reproduce experimental results for $Ni4Ti3$ precipitation. Previous parametric calibration attempts lead to an interfacial energy value of $0.0505\u2009J/m2$. The diffusion correction terms can be seen as unknown values because they are essentially fitting parameters for the diffusivity of the matrix. In the remainder of the paper, we focus on conducting statistical calibration of the three parameters described above.

## Statistical Calibration of the Precipitation Model

We start by introducing some definitions and notations. We distinguish between the model presented in this section to conduct statistical calibration and the physics-based precipitation model (Sec. 3) that is to be calibrated. We will refer to the former as the *statistical model* and the latter as the *computer model* to avoid confusion. It is common practice in the literature to refer to computational models as computer models since they involve running a computer simulation code.

*n*-dimensional multivariate normal distribution

with mean vector $\mu $ and covariance matrix **C** defined by a mean function $\mu (\xb7)$ and a covariance function $C(\xb7,\xb7)$ respectively. We will denote this as $f(\xb7)\u223cGP(\mu ,C)$, with $\mu (x)=E[f(x)]$ and $C(xi,xj)=cov(f(xi),f(xj))$. It is evident that the choice of mean and covariance functions will influence the distribution in Eq. (1); therefore, they need to be carefully selected. Useful guidelines for their selection are given in Refs. [46] and [67] for the interested reader.

In the calibration problem, we distinguish between two groups of inputs to the computer model:

*Calibration parameters*denoted by the vector $\theta $ represent physical parameters that can be specified as inputs to the computer model, and whose values are unknown or not measurable [18,49]. Examples include properties of materials (such as conductivity or interfacial energy, among many) which are not easily determined for materials models, or saturation constants in ecosystem models [68]. The goal of the calibration problem is to estimate these parameters such that the computer model simulations agree with the experimental observation of the real process being simulated.*Control inputs*(or design variables) denoted by the vector**x**are variables that are set to known values by the user [18]. Examples of these inputs include temperature, pressure, force, or any other quantities that are known and controlled by the user.

*q*-dimensional study region or domain of interest. Similarly, computer model output is denoted as $Y(x,\theta )\u2208\mathbb{R}$ evaluated at an input vector

_{x}**x**and calibration parameters $\theta \u2208T\u2286\mathbb{R}q\theta $, where $T$ is a $q\theta $-dimensional region within which we believe the true (but unknown) value of the calibration parameters, $\theta \u22c6$, lies. The statistical model is given by

where *ρ* is a scaling factor of the computer model, $\delta (\xb7)$ is a discrepancy function or model inadequacy function whose role is to account for the missing physics in the computer model, and $\epsilon (\xb7)$ is the observation or measurement error.

The calibration problem involves a synthesis of computer simulations and experimental observations. More specifically, to solve the calibration problem, one needs to collect experimental observations, $Z(x)$, and run a sufficiently large number of computer model simulations $Y(x,\theta )$. The main task is to use these to infer the values of the calibration parameters $\theta $ such that the computer model output most closely matches the real process represented by experimental observations. An important challenge within this context is the computational burden associated with generating the large number of computer model simulations needed to effectively conduct calibration. For example, some computer models might require the numerical solution to systems of coupled nonlinear partial differential equations. In this case, even generating a few simulations can be a challenging task.

When the above holds true, it is common practice to construct a *surrogate* model (sometimes known as an emulator or a meta-model). This represents a computationally efficient approximation that encapsulates the behavior of the original computer model and can thus be used to generate a large number of simulations. Of course, when the computer model is not computationally expensive, it can then be directly used to conduct calibration, without the need to construct this computationally efficient surrogate description. We will refer to this as *direct calibration.* It is important to note that although our main focus is on calibrating the precipitation model using a surrogate model, we will also present the direct calibration approach for benchmarking and comparison purposes.

### Direct Calibration.

*i*th control input to the computer model as $xi=[xi,1,xi,2,\u2026,xi,qx]\u22a4$, experimental observation as $Z(xi)=Zi$ for notational convenience, and the computer model output at the same control input $xi$ and some value of the calibration parameters $\theta $ as $Y(xi,\theta )=Yi$. We assume that data sets of size

*n*are collected by the user represented by the following matrices:

We model the discrepancy function defined in Eq. (2) as a Gaussian process, $\delta (\xb7)\u223cGP(\mu ,C)$. The mean function $\mu (x)=H(x)\beta $, where $H(x)=[1,x\u22a4]$ is a row vector of basis functions and $\beta =[\beta 0,\beta 1,\u2026,\beta qx]\u22a4$ is a vector of regressors. In other words, $\mu (x)=\beta 0+\beta 1x1+\cdots +\beta qxxqx$. The covariance function *C* will depend on an additional set of parameters $\psi $ (commonly known as hyperparameters). These will be presented in detail in Sec. 5. Finally, the measurement error terms are modeled as independent and identically distributed (iid) normal random variables, $\epsilon (x)\u223cN(0,\tau 2)$ for all $x\u2208X$. Hence, the statistical model is fully characterized by the set of parameters $\varphi ={\rho ,\theta ,\beta ,\psi ,\tau 2}$.

where $p(Z|\varphi )$ is the likelihood function defined as the probability of observing **Z** conditioned on the model parameters ($\varphi $ in this case) [69]. The term $p(\varphi )$ is the prior distribution of $\varphi $ that captures our prior belief or knowledge, and $p(\varphi |Z)$ is the posterior distribution of $\varphi $ conditioned on experimental observations.

*Likelihood function*: From the statistical model in Eq. (2) and the above definitions and assumptions, it can be shown that the likelihood function is given by

*n*×

*n*identity matrix, and $C\psi $ is a

*n*×

*n*matrix that is calculated by evaluating the covariance function

*C*at different pairs of control inputs given a specific value of the hyperparameters $\psi $

Selection of these prior distributions plays an important role in the parameter estimation process. This selection is driven by our prior knowledge or belief regarding the parameters. It is common practice to assign uninformative prior distributions to parameters for which little knowledge is available (for example, a uniform distribution or normal distribution with large variances). Finally, utilization of conjugate priors such that the prior and posterior distributions of the parameter belong to the same family is usually recommended and offers practical and computational advantages [41].

*Posterior distribution*: The posterior distribution (up to a normalizing constant) follows the next equation:

Several methods can be utilized to obtain this posterior distribution. We rely on Markov chain Monte Carlo (MCMC) methods, specifically *Gibbs* sampler and *Metropolis–Hastings* (MH) algorithm [70,71].

*The prediction step*: Although the formal objective of the calibration problem is to obtain the posterior distribution of the model parameters $p(\varphi |Z)$ as detailed above, the ultimate goal is to use the calibrated computer model to make predictions. Let $x0$ denote a vector of control inputs where no experimental observations have been made. We use the calibrated model to predict the process output at $x0$ using the

*Kriging*estimator (also known as best linear unbiased predictor (BLUP)) [72]. It can be shown that the mean and variance of this estimator are given by

The BLUP predictor will be subsequently used in Sec. 5 to assess the prediction accuracy of the calibrated model using a test set or cross-validation procedures.

### Surrogate Calibration.

*η*is constructed to replace the original computer model

for $x\u2208X$ and $\theta \u2208T$.

where the star notation for the control inputs **x** in the set *D*_{2} indicates that they might be of different values than the control inputs used in *D*_{1}.

where $H1(x,\theta )=[1,x\u22a4]$ and $H2(x)=[1,x\u22a4]$. Similar to the direct calibration case, the covariance functions $C1((\xb7,\xb7),(\xb7,\xb7))$ and $C2(\xb7,\xb7)$ are fully characterized by the sets of hyperparameters $\psi 1,\psi 2$, respectively, and henceforth, all of the parameters from this statistical model can be grouped into two sets: $\beta =[\beta 1\u22a4,\beta 2\u22a4]\u22a4$ and $\varphi ={\rho ,\psi 1,\psi 2,\tau 2}$, which along with calibration parameters $\theta $ represent all the statistical parameters that will be estimated using the Bayesian procedure. Notice that the measurement error is still following a normal distribution $N(0,\tau 2)$.

Bayesian estimation of the parameters is conducted in two stages. The first stage is to estimate the hyperparameters $\psi 1$ by maximizing the posterior probability distribution $p(\psi 1|Y)$. Subsequently, in the second stage, the remainder of the parameters ${\rho ,\psi 2,\tau 2}$ is estimated by maximizing the conditional posterior distribution $p(\rho ,\psi 2,\tau 2|d,\psi 1)$. A detailed procedure to carry out these optimizations is presented in the seminal work by Kennedy and O'Hagan [48].

Following these two stages, the remaining step is to obtain the posterior distribution of the calibration parameters $\theta $. We employ MCMC methods in a similar fashion to the direct calibration case.

*The prediction step*: Again, the last step is to use the calibrated model to make predictions at new control inputs $x0$ using the BLUP (Kriging) estimator

## Case Study: Calibration of the Precipitation Model for NiTi SMAs

In this section, we implement both calibration approaches presented in Sec. 4 to calibrate the computer model of interest. As detailed in Sec. 3, the computer model in our case predicts the evolution of precipitates in NiTi shape-memory alloys subject to heat treatment. More specifically, the model predicts the atomic percent (at. %) of nickel in the matrix following heat treatment for a specified period of time at a preset temperature.

The control inputs and calibration parameters are given as follows:

Three control inputs

**x**:- —
Initial nickel content in the alloy (at. %)

- —
Aging (heat treatment) temperature (°C)

- —
Aging (heat treatment) time (s)

Three calibration parameters $\theta $:

- —
Interfacial energy ($J/m2$)

- —
Diffusion correction prefactor (dimensionless)

- —
Diffusion correction activation ($J/mol$)

The output of the process $Z(x)$ and of the computer model $Y(x,\theta )$ correspond to the final nickel content of the surrounding matrix (at. %). The posterior distribution of the unknown calibration parameters will be derived using a synthesis of experimental observations and computer model simulations.

The execution time of the computer model is highly dependent on its inputs. For example, although simulation under a heat treatment setting with longer aging time is generally expected to take longer than with shorter aging time, some certain values of the calibration parameters can result in a different outcome. From running multiple simulations in the initial model testing phase, execution times ranged between 3 s and 5 min. It is important to point out that although this might not represent significant computational burden from the perspective of obtaining model prediction for a particular input combination, in conducting statistical calibration a large number of computer simulations is necessary to perform MCMC simulation (10,000 or more). For example, the median execution time is 2.5 min, thus generating a sufficient number of simulations to conduct MCMC calibration can take up to 2 weeks. This makes the task of conducting subsequent sensitivity analysis, uncertainty analysis, and what-if scenarios that are typically needed by the user quite challenging. Our focus in this work is thus on conducting surrogate model calibration. We will implement the direct MCMC calibration for benchmarking purposes.

### Experimental Observations.

The experimental observations consisted of 36 data points. Each data point represents the final nickel content of the matrix (at. %) corresponding to a specific heat treatment regime (aging temperature and aging time) for a particular NiTi alloy with initial composition. It is obvious that these data are consistent with the computer model output and control inputs, respectively.

To avoid oxygen contamination, the samples were individually sealed into quartz tubes under a high-purity argon atmosphere. Heat treatments were performed in a muffle furnace followed by subsequent water quenching. A technique called *differential scanning calorimetry* (DSC) was first used to directly measure the resulting martensitic transformation temperature of each heat treated alloy. The martensitic transformation temperature is a property of NiTi alloys that is highly sensitive to the amount of nickel in the surrounding matrix. The formation of nickel-rich precipitates causes a change in the composition of the alloy, which in turn reflects into a change in the martensitic transformation temperatures. In other words, the martensitic transformation temperature determined through DSC can be used to indirectly measure the matrix composition.

The relationship between NiTi composition and the martensitic transformation temperatures has been extensively studied in literature, such as Refs. [73–75]. In the current work, the DSC measurements of the martensitic transformation temperatures were converted to nickel composition (at. %) using curve fitting to data presented in Frenzel et al. [76]. This yielded the 36 experimental observations data set used henceforth. From this set, 31 will be used to calibrate (or train) the precipitation model as detailed next and five will be used for test and validation to assess the predictive performance of the calibrated model. Figure 1 and Table 3 in Appendix presents the whole experimental data set.

### Direct Calibration.

Following the notation of Sec. 4, we have that *q _{x}* = 3 and control inputs $x=[x1,x2,x3]\u22a4$, $q\theta =3$ and calibration parameters $\theta =[\theta 1,\theta 2,\theta 3]\u22a4$. We define region $X$ to include all values from the experimental set; that is the hypercube with bounds $Xmin={50.5,350,35000}$ and $Xmax={52,600,380000}$. Similarly, region $T$ is also defined as a hypercube; however, its bounds are specified by the modelers based on their experience and prior knowledge of the process to a region within which they have reason to believe that the true values of $\theta $ might lie. These bounds were defined to be $Tmin={0,0,0}$ (not included) and $Tmax={0.1,100,1000}$.

*ν*is the smoothness parameter of the covariance function, $K\nu (\xb7)$ is the modified Bessel function of the second kind of order

*ν*, $\Gamma (\xb7)$ is the gamma function, and

is a distance measure between $xi$ and $xj$ known as the *Mahalanobis* distance [42], with *length scale* parameters *ω _{k},* which control the relative influence of each input dimension to the process. It can be seen that the covariance function is fully defined by hyperparameters $\psi ={\sigma 2,\omega 1,\omega 2,\omega 3}$. The smoothness parameter was kept constant at $\nu =1/2$, based on preliminary work by the authors and common practice in the literature [41,42].

#### Selection of Prior Distributions.

The reasoning behind this selection is straightforward. With the exception of the variance parameters $\sigma 2$ and $\tau 2$, uninformative uniform distributions were selected for all the parameters due to the lack of prior information regarding the shape of the distributions. The scaling factor *ρ* is defined to take values between 0 and 1, while *β _{j}* can take on any real value and

*ω*is limited to positive values due to the intrinsics of Mahalanobis distance. The calibration parameters $\theta 1,\theta 2$, and

_{j}*θ*

_{3}take values within the previously defined bounds of the study region $T$. Notice that an indicator function $1{A}(x)$ has been used for the prior distributions of $\theta 1,\theta 2$, and

*θ*

_{3}. This function is equal to 1 if $x\u2208A$ and 0 otherwise. It was used to reject values of $\theta 1,\theta 2$, and

*θ*

_{3}for which the final (output) and initial (input) content of nickel are equal, which indicates that precipitation did not occur. This represents an anomaly that needs to be discarded. Finally, inverse gamma prior distributions were selected for the variance parameters $\sigma 2,\tau 2$. These represent conjugate distributions with respect to the normal likelihood distribution.

#### Model Calibration.

Next, we conduct Bayesian calibration of the model by obtaining the posterior distributions of the parameters above using MCMC simulations. We employ the Metropolis–Hastings algorithm, which requires the evaluation of the posterior distribution given in Eq. (6) at each iteration. This implies that **Y** needs to be evaluated, and therefore the computer model $Y(x,\theta )$ is executed $n\xd7q\theta =31\xd73=93$ times at each of the MCMC iterations. We utilize *parallel programming* in the 862-node Intel x86-64 Linux cluster supercomputer at Texas A&M high performance research computing facilities. MCMC simulations were implemented over ten cores in the supercomputer and run for 15,000 iterations, with a 25% burn-in period and thinned every fifth iteration. Results of the posterior distributions of the calibration parameters are shown in Fig. 2, where each plot displays the histogram with 20 bins and their corresponding kernel density estimate.

In the graphs, we can see that *θ*_{1} and *θ*_{2} have defined informative and unimodal posterior distributions, with *θ*_{1} having a unique peak and *θ*_{2} being skewed to the right. The posterior distribution of *θ*_{3} shows some uniformity, which implies that all the values across its domain are equally likely to characterize the real process. Table 1 presents mean, mode (most frequent value), and standard deviation of the posterior distribution for the calibration parameters.

On its own, the distribution of *θ*_{1} can be attributed to the fact that it appears as a cubic term within an exponential function of the governing equations for precipitate evolution. The narrow nature of its distribution, in comparison to the other two parameters, can be explained by its appearance in the governing equation for precipitate nucleation, which is the materials science equivalent of initial conditions drastically influencing the solution to a differential equation. The interfacial energy is so dominant in precipitation models due to the fact that relatively small values imply almost no barrier to nucleation, leading to the precipitation of all the precipitates that can form in the material (ultimately limited by thermodynamics) right at the beginning of the precipitation process. On the other hand, slightly high values for this property may result in a barrier to nucleation that effectively results in no precipitation of secondary phases at all during reasonable heat treatment times. Thus, only a narrow range of interfacial energy values can yield physically meaningful precipitation predictions. In fact, the estimated mean and mode of *θ*_{1} are well within the range of values reported in the literature and very close to the value calculated through parametric sweep-experimental data coupling.

It is more difficult to ascertain direct significance from specific values of *θ*_{2} and *θ*_{3} distributions since they represent corrections to the diffusivity values taken from the MatCalc NiTi kinetic database. These corrections are necessary as one does not really know the state of the microstructure before the precipitation starts. Of particular effect is the presence, for example, of grain boundaries, dislocations, and other defects that can catalyze (facilitate) the nucleation of the second phase precipitate particles. The ranges for each parameter were selected such that only physically realistic values would be calculated for these two parameters. Specifically, *θ*_{2} and *θ*_{3} correspond to corrections to the diffusion coefficient and activation energy values, respectively. Although they represent correction factors and as such have no direct physical significance themselves, it can be inferred that these two diffusion correction values are less influential (i.e., have broader distributions) than *θ*_{1} for two main reasons. First, the equations that contain them are not as drastic as a cubic in an exponential, and second, it is because they are implemented in a later stage of the precipitate evolution which reduces their influence with respect to *θ*_{1}.

### Surrogate Calibration.

We now calibrate the computer model using the surrogate calibration to account for the computational burden associated with direct calibration. Calibration will be conducted based on the redefined statistical model in Eq. (10).

#### Surrogate Model Construction.

The first step toward surrogate calibration is to construct the computationally efficient surrogate model that can be used in lieu of the original computer model. We start by generating a data set of computer model simulations over different combinations of $(x,\theta )$. A data set *D*_{1} with size *N* = 3025 data points was generated. *Latin Hypercube Sampling* was used to select values of $(x,\theta )$ uniformly over the $X\xd7T$ space at which the computer model is evaluated. This data set was again implemented using parallelization in the supercomputer to reduce the execution time. The same data set of experimental observations used for direct calibration was utilized for surrogate model calibration. We denote this as *D*_{2} with size *n* = 31.

are now squared Mahalanobis distances. Similar to the previous case, the covariance function *C*_{1} is fully defined by hyperparameters $\psi 1={\sigma 12,\omega 1,\u2026,\omega 6}$, and *C*_{2} by $\psi 2={\sigma 22,\omega 7,\omega 8,\omega 9}$. Notice that these covariance functions represent special cases of the Matérn function when $\nu \u2192\u221e$ [42].

#### Selection of Prior Distributions.

*θ*

_{3}were omitted. Recall that in surrogate model calibration, we do not need to evaluate the computer model $Y(x,\theta )$ anymore since we use the Gaussian process (GP) surrogate to replace it. Thus, we do not need to account for the special case where values of $\theta 1,\theta 2$, and

*θ*

_{3}result in no change in the output of the original computer model

#### Model Calibration.

Next, we estimate the hyperparameters of the GP models by maximizing their marginal posterior distributions in stages 1 and 2 as explained in Sec. 4.2. Subsequently, estimation of the calibration parameters was performed using MCMC with the same specified simulation procedure; that is, 15,000 iterations with 25% burn-in period and thinning set to every fifth iteration. The posterior distributions for the calibration parameters are shown in Fig. 3.

By inspecting the obtained posterior distributions, we see some differences from those obtained using direct calibration. The distribution of *θ*_{1} is an informative bimodal behavior with two distinctive peaks. The distribution of *θ*_{2} is skewed to the left, in contrast to the direct calibration case, and the distribution of *θ*_{3} shows similar uninformative behavior to the direct calibration case. Posterior values are presented in Table 2 in a similar manner as with previous case.

As with the previous case, *θ*_{1} is the parameter with direct physical significance while *θ*_{2} and *θ*_{3} represent correction factors for the kinetic equations of the model. The bimodal nature of *θ*_{1} is most likely a result of the inability of the surrogate model to directly compare initial and final nickel content of the matrix, which is used as a sanity check in the direct model. To elaborate on this, note that while the surrogate model is to replace the computer model, it does not capture any physics, and thus does not recognize that cases where the initial and final nickel content are identical should be discarded.

To make this more explicitly clear: in the direct calibration model, an unreasonably small value for the interfacial energy would result in *instantaneous* saturation of the precipitate phase to volume fractions predicted through thermodynamics (lever rule). On the other hand, unreasonably large values for this quantity would result in a practically infinite barrier for nucleation, leading to no precipitate phases regardless of the heat treatment time. The former case would result in complete nickel depletion at times close to *t* = 0, while the latter case would correspond to no nickel depletion at all. In the direct model, these two problematic cases could be detected and discarded from the sampled distribution. That being said, both peaks are within the acceptable range of values seen in the literature.

The explanation of *θ*_{2} and *θ*_{3} distributions remains essentially the same for the surrogate model as it was for the direct case. These two parameters are correction factors and as such have no direct physical connection. All that can be said is that the mean and mode values seen in Fig. 3 will result in physically realistic diffusion coefficient and activation energy values. Again, these two distributions are broader because they influence later stages of the precipitate evolution which have less of an impact on overall precipitate morphology than early stage parameters like *θ*_{1}.

In terms of hyperparameters, the estimated variances are as follows: $\sigma 12=0.1624,\u2009\sigma 22=0.3324,\u2009\tau 2=0.0092$. These results are consistent with what we would expect. For instance, the marginal variance of the surrogate model $\sigma 12$ is smaller than that of the discrepancy function $\sigma 22$. This is justified since the discrepancy function accounts for the missing physics between the computer model and the experimental observations; therefore, it handles more noisy data, which translates to larger uncertainty for the discrepancy function. Additionally, the surrogate model replaces a deterministic computer model; therefore, it is expected that the surrogate will have smaller uncertainty given that the simulation dataset *D*_{1} does not involve any noise besides numerical errors.

Regarding the length scale parameters, a large value translates to low influence of that specific input dimension on the covariance function and vice versa. The estimated values are $\omega 1:9={1.6759,0.2967,12.9291,0.1317,4.7732,12.6946,29.9991,29.9424,5.9599}$ which show several interesting results. Within the surrogate model (Eq. (17)), the most important input dimension corresponds to the interfacial energy (related to *ω*_{4}), which makes sense as it highly influences the precipitation process of nickel. On the opposite side, Aging time (*ω*_{3}) has been estimated to be less impactful on the surrogate model. In Sec. 5.5, we present a variance-based sensitivity analysis of the model, which further explains the dependences of the output of the process on each of the inputs.

### Predictive Performance of the Calibrated Models.

In Secs. 5.2 and 5.3, 31 experimental observations were used to calibrate the precipitation model using both the direct and surrogate modeling approaches. We now use five additional experimental observations from the test set to assess their performance.

*n*data points $X={x1,\u2026,xn}$, the predictive distribution of

*f*is given by

*μ*

_{0}and $\sigma 02$ are calculated using the BLUP (Kriging) estimator given in Eqs. (7) and (8) for the direct calibration case, and Eqs. (14) and (15) for the surrogate calibration case. Notice that these involve conditioning on parameters, $p(Z(x0)|Z,\varphi )$ and $p(Z(x0)|d,\theta ,\varphi )$. We thus integrate these parameters out (marginalize them) to obtain the predictive distribution in the form given in Eq. (19). This is achieved numerically since there are no closed forms for the posterior distributions. We demonstrate this in the case of the predictive mean for the direct calibration case given by Eq. (7)

where $npost$ is the total number of posterior distribution samples calculated during MCMC (after burn-in period and thinning) and $\varphi (i)$ is the *i*th posterior sample. A similar procedure is utilized for the predictive distribution parameters in the remaining Eqs. (8), (14), and (15).

First, we run a formal ten-fold cross-validation procedure solely focused on assessing the performance of the surrogate model as an approximation for the computer model and how accurate of an approximation it represents. Figure 4 shows the results of this cross-validation, where the abscissa in the plot is the simulated values (the computer model output), while the ordinate is the mean of the predictions obtained by the surrogate model; therefore, a point in the plot compares a simulation $Y(x,\theta )$ with a prediction $E[Y(x)|Y,\theta ]$ for a specific input $(x,\theta )$. In this case, the straight line is a reference indicating ideal predictions in the case when simulations and predictions are identical.

The surrogate model appears to adequately capture the behavior of the computer model. This is demonstrated by the points being reasonably scattered close to the ideal red line. It is worth noting that we are comparing 3025 points from dataset *D*_{1} and that the most dense regions in the plot are those close the red line. Numerically, the average deviation from the ideal line is 0.0661 (at. %), which further confirms the results.

Next, we use the predictive distributions from Eq. (20) to assess the performance for both approaches, where we employ the set of data points not used for training the models and compare their predictive distribution to the actual observed values (which might be noisy). Figure 5 displays model predictions for the test set consisting of five points using the direct and surrogate calibration approaches, respectively. The abscissa in the plot is the experimental observation while the ordinate shows the mean of the predicted value, $E[Z(x0)|Z]$, at the squared marker. The confidence interval bars represent ±1 standard deviation, $\sigma 0=var[Z(x0)|Z]$.

where $Zobs,i$ is the *i*th experimental observation from the test set, $Zpred,i$ is the *i*th predicted value (mean of the predictive distribution in this case), and $npred$ is the number of data points in the test set (five in our case). The standard error of prediction was equal to 0.0656 and 0.0675 for direct and surrogate calibrations, respectively. These values are approximately equal to 3% of the full range of experimental observations, which is acceptable given all the experimental uncertainties in these types of studies.

### Sensitivity Analysis.

*u*of variables $xu\u2286x$, by comparing the responses of the function for two different datasets of size

*n*

where $xi(1),xi(2)\u223ciidU(Xmin,Xmax)$. We denote a hybrid point $xu(1):x\u2212u(2)$ where the *j*th element equals $xj(1)$ if $j\u2208u$, and $xj(2)$ if $j\u2209u$.

where a large value of $\kappa \xaf\u0302u2$ represents a higher influence of variables subset *u* on function *f.* For reporting purposes, the indices are frequently normalized by the variance of the function evaluated at dataset 1, $var(f(X(1)))$.

We apply this procedure into our application and first calculate the indices for the computer model itself, in order to analyze which of the six inputs (three control inputs and three calibration parameters) impact it the most. First pane of Fig. 6 shows the calculated total sensitivity indices for the computer model, where the subsets *u* were selected to be single inputs (i.e., leftmost bar represents input *x*_{1}, hence its subset is $u={1}$). As we can see, initial nickel content (*x*_{1}) dominates the variance of the computer model followed by interfacial energy (*θ*_{1}), which is consistent with expert knowledge and results from the hyperparameters estimation in Sec. 4.2.

Similarly, we calculated Sobol indices for the calibrated surrogate model. In this case, we use the posterior distributions of the calibration parameters reported in Fig. 3 to calculate the total sensitivity indices for each of the inputs as well. The results can be visualized in the second pane of Fig. 6 with consistent results once again, where input *x*_{1} is computed as the most controlling input to the surrogate model variance.

## Conclusions

Uncertainty quantification (UQ) in materials simulation models has not been a central question in the UQ literature despite their great importance. In this paper, we have presented an inverse UQ problem (commonly known as the calibration problem) to a materials simulation model that predicts the evolution of precipitates in nickel–titanium shape memory alloys (NiTi SMAs) subject to aging heat treatment. We used a Gaussian process (GP) based surrogate modeling approach to conduct Bayesian calibration. The surrogate modeling approach was used in order to account for the computational burden associated with the precipitation model. The predictive performance of this surrogate modeling approach was benchmarked against the classical Markov chain Monte Carlo (MCMC) approach using a case study that involves real-world experimental observations.

The Bayesian approach used is invaluable both for solving the calibration problem and for quantifying the uncertainties in model predictions. Results show that the calibrated model using both approaches has good prediction accuracy, making it suitable for ICME-based design of shape memory alloys.

There are multiple follow-up studies currently undertaken by the co-authors. Namely, employing adaptive sampling techniques to accelerate and improve the predictive performance of the calibrated model, and coupling the calibrated precipitation model with other physics-based microstructure evolution models to quantify the uncertainty throughout the entire material processing chain, with particular emphasis on laser-based additive manufacturing of shape memory alloys.

## Acknowledgment

This work was supported by an Early Stage Innovations grant from NASA's Space Technology Research Grants Program, Grant No. NNX15AD71G. Portions of this research were conducted with High Performance Research Computing resources provided by Texas A&M University.^{2} L. J. and R. A. also acknowledge the support of NSF through the NSF Research Traineeship (NRT) program under Grant No. NSF-DGE-1545403, “*NRT-DESE: Data-Enabled Discovery and Design of Energy Materials (D ^{3}EM)*.” RA and IK also acknowledge the partial support from NSF through Grant No. NSF-CMMI-1534534.