## Abstract

We applied machine learning models to predict the relationship between the yield stress and the stacking fault energy (SFE) landscape in high entropy alloys. The training data were taken from phase-field dislocation dynamics simulations of partial dislocations in face-centered-cubic metals. This study was motivated by the intensive computation required for phase-field simulations. We adopted three different ways to describe the variations of the SFE landscape as inputs to the machine learning models. Our study showed that the best machine learning model was able to predict the yield stress to within approximately 2% error. In addition, our unsupervised learning study produced a principal component that showed the same trend as a physically meaningful quantity with respect to the critical yield stress.

## 1 Introduction

High entropy alloys (HEAs) are a new class of materials with many favorable mechanical and thermal properties. They show a high yield stress [1,2], low coupling between ductility and temperature [3], excellent specific strength, superior mechanical performance at high temperatures, exceptional ductility and fracture toughness at cryogenic temperatures, super-paramagnetism, and superconductivity [4]. In addition, their high hardness, wear resistance, high-temperature softening resistance, and corrosion resistance make HEAs excellent candidates for structural uses in the transportation and energy industries [5,6].

High entropy alloys are generally classified as alloys that are composed of five or more alloying elements. The crystal structure of HEAs can be complex with heterogeneous phases [7]. Several studies have examined the variability of the stacking fault energy (SFE) landscape in a face-centered cubic (FCC) HEA as a result of the presence of heterogeneous phases [8–10]. The phase-field dislocation method (PFDM) studies conducted by Zeng et al. [8] showed that the yield stress of an FCC HEA increases with larger fluctuations of the SFE, and the maximum strength increase is attained when the characteristic length scale of SFE fluctuations is close to the average equilibrium stacking fault width. While their simulation results illuminate the effect of SFE fluctuations on the strength of FCC HEAs, the computational cost of their simulations is extremely high. In this study, we apply machine learning (ML) models to learn the relationship between the SFE fluctuations and the yield stress of an FCC HEA using the published simulation results from Zeng et al. We show that our ML models can predict the yield stress of HEAs with varying SFE landscapes to within approximately 2% error.

## 2 Data

The data used in this study contain the SFE landscape and the resulting critical yield stress. The yield stress is obtained from PFDM simulations of dislocations moving in this energy landscape under an externally applied shear stress. The details of the simulations can be found in Ref. [8], and a summary of the model is described in the following sections.

### 2.1 Phase-Field Dislocation Model.

$\xi^{\alpha}(\mathbf{x})$:

The evolution of the phase fields is obtained through computing the minimum total energy of the dislocation ensemble [12,13]. This energy consists of two energies: first, the strain energy $E^{e}$, and second, the misfit energy, $E^{m}$. The misfit energy accounts for stacking fault formations through parametrizing the gamma-surface [11,14].

$N$ is the total number of slip systems, $d$ is the distance between slip planes, and $m^{\alpha}$ is the normal to the slip plane $\alpha$. The total distortion can be obtained using the elastic Green’s function $G_{ij}$ [15,16] as follows:

$C_{klmn}$ is the tensor of elastic constants and ($\star$) represents the convolution operator. The strain energy can be calculated as follows:

$\varepsilon_{ij}=\mathrm{sym}(\beta_{ij})$ is the symmetric part of the distortion, $\epsilon_{ij}^{p}=\mathrm{sym}(\beta_{ij}^{p})$ is the plastic strain, and

$\gamma$ and the unstable stacking fault energy $\gamma_{u}$ [11,14,18,19]:

$\gamma$ and $\gamma_{u}$. However, the limitation of Eq. (7) is that it can be used only for displacements from a single phase field.

$\alpha$ coupled equations for the phase fields:

In HEAs, the stacking fault energy varies locally with the local composition of the alloy. In different regions of the slip plane, we allocated different values of the intrinsic stacking fault energy. All remaining material properties are left unchanged. Two straight extended dislocations with Burgers vector in the [110] direction are introduced in the slip plane. At zero applied stress, each dislocation splits into two partials. Subsequently, an external stress is applied. The yield stress is defined as the minimum stress required for the dislocation to slide. All the data used in this article are obtained from the study by Zeng et al. [8].

## 3 Methods

### 3.1 Machine Learning Models.

There are two major classes of ML models: supervised and unsupervised learning. Unsupervised learning looks for patterns in the training data without using target outputs. Supervised learning uses the target outputs to train the ML models.

In this study, we applied six different supervised ML models and one unsupervised ML model to our data to find the optimal ML model. All the ML models used in this study are implemented in the Scikit-learn machine learning library in Python [21]. We used the GridSearchCV feature to select the optimal parameters for each ML model from a predefined set of hyperparameters. Hyperparameters are model-specific settings that are used to tune the learning process. The hyperparameters for each ML model were chosen using a tenfold cross-validation scheme. We briefly describe each ML model used in this study and its hyperparameters.
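The hyperparameter selection described above can be sketched as follows. This is a minimal illustration, not the configuration used in the study: the data are synthetic stand-ins, and the parameter grid is a hypothetical example because the searched values are not listed in the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for three input features and a yield-stress-like target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=60)

# Hypothetical grid; the values searched in the study are not given.
param_grid = {
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=10,  # tenfold cross-validation
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
best_params = search.best_params_
```

`GridSearchCV` refits the best configuration on all the data passed to `fit`, so `search.predict` can be used directly afterward.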

*K-Neighbors Regressor.* The K-neighbors model uses the similarity of the data attributes to predict the value of the test data. This similarity is computed as a distance in the $m$-dimensional feature space $x$ using the Euclidean distance shown in Eq. (9), where $x_i$ denotes a training input and $\hat{x}_i$ denotes a test input. The predicted outcome is the mean of the nearest neighbors' outputs as shown in Eq. (10) [22], where $K$ is the number of nearest neighbors, $y_i$ denotes the output from the training data in the neighborhood, and $\hat{y}$ denotes the predicted output for test input $\hat{x}_i$.

$$d=\sqrt{\sum_{i=1}^{m}(\hat{x}_i-x_i)^{2}} \tag{9}$$

$$\hat{y}=\frac{1}{K}\sum_{i=1}^{K}y_i \tag{10}$$

The hyperparameter of the K-neighbors model is the parameter $K$, which is the number of nearest neighbors.

*Bayesian Ridge Regression.* Bayesian ridge is a probabilistic approach to linear regression. It assumes that the outputs are normally distributed as shown in Eq. (11), where $\beta^{T}$ are the weights of the linear regression and $\sigma$ is the standard deviation. Using Bayes' theorem and a prior distribution for the parameters $\beta$ and $\sigma$, a posterior distribution for the output at an unknown input can be found given the input data [23].

$$\hat{y}\sim\mathcal{N}(\beta^{T}X,\sigma^{2}I) \tag{11}$$

The hyperparameters of the Bayesian ridge regression model are $\sigma$, the standard deviation of the normal distribution about the linear fit $\beta^{T}X$, and $\lambda$, the standard deviation of the distribution of the weights $\beta$.

*Decision Tree Regression.* Decision tree regression breaks the training data into branches where each break point is a decision node. Each node has two or more branches where an attribute of the input can be tested. A prediction is obtained when a leaf (terminal) node is reached. It uses a top-down greedy search through the branches with no backtracking [24].
The hyperparameters used in this model are the criterion (e.g., mean absolute error (MAE) and mean squared error) used to determine where the tree splits and the maximum depth of the tree, which is the length of the longest path from the root of the tree to a leaf.

*Gradient Boosting Regression (GBR).* Gradient boosting regression combines a set of $M$ weak learning models. The weak learners are added one at a time (i.e., boosting), and each new learner is fit to the gradient-descent residual of a predefined loss function so that the loss is reduced sequentially. The most commonly used weak learning models are decision trees with a small number of leaves [25]. The hyperparameters used in this model are the maximum depth of the decision trees; the loss function, which is used to compute the residual for each decision tree; the learning rate, which scales the contribution from each decision tree; and the maximum number of decision trees allowed.

*Kernel Ridge Regression.* Kernel ridge regression is a generalization of ordinary least-squares models to a nonlinear, possibly infinite-dimensional feature space using the kernel trick. The difference between kernel ridge regression and support vector regression lies in the loss function: kernel ridge regression minimizes the squared loss function [26]. The hyperparameters used in this model are the type of kernel (e.g., linear, squared exponential, and polynomial) and the regularization parameter $\alpha$, which prevents the model from overfitting.

*Gaussian Process Regression (GPR).* Gaussian process regression assumes the output to be a Gaussian process. The joint probability distribution of any subset of the outputs forms a multivariate Gaussian distribution. The covariance kernel $k(x,x')$ is assumed to be a function of only the inputs $x$ and $x'$. A commonly used covariance kernel is the radial basis function shown in Eq. (12), where $l$ is a hyperparameter indicating a characteristic length scale of the data.

$$k(x,x')=\exp\left(-\gamma\left\Vert\frac{x-x'}{l}\right\Vert^{2}\right) \tag{12}$$

Using the definition of conditional probability, a posterior distribution for the target $y$ at an unknown input can then be computed [27]. The hyperparameters used in this model are the type of kernel (e.g., squared exponential, rational quadratic, and Matern) and the length-scale hyperparameter that controls each kernel.

*Principal Component Analysis (PCA).* Principal component analysis is an unsupervised ML model in which the principal components of the data are computed. The principal components transform the set of correlated input variables into a set of linearly uncorrelated variables. The components are chosen by picking the directions with the largest variances in the data [28]. In our study, we use principal component analysis to linearly decorrelate the input features. The only hyperparameter for PCA is the number of components selected for the analysis.
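As a concrete illustration of Eqs. (9) and (10), the following sketch implements the K-neighbors prediction by hand and compares it with Scikit-learn's `KNeighborsRegressor`. The helper name `knn_predict` and the random data are assumptions made for the example only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_predict(X_train, y_train, x_test, K):
    """Hypothetical helper: K-neighbors prediction for one test input."""
    # Eq. (9): Euclidean distance from the test input to every training input.
    d = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # Eq. (10): average the outputs of the K nearest training points.
    nearest = np.argsort(d)[:K]
    return y_train[nearest].mean()

rng = np.random.default_rng(1)
X_train = rng.uniform(size=(40, 3))
y_train = X_train.sum(axis=1)
x_test = np.array([0.5, 0.5, 0.5])

manual = knn_predict(X_train, y_train, x_test, K=5)
model = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
reference = model.predict(x_test.reshape(1, -1))[0]
```

With the library defaults (uniform weights, Euclidean metric), the two predictions should agree to floating-point precision.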

### 3.2 Features for Learning.

For our study, we used 83 PFDM simulations of the Ni-Co-Fe-Cr-Mn family of high entropy alloys with different SFE distributions from Zeng et al. [8] to train our ML models. In the PFDM simulations, the regions with constant SFE are produced by a Voronoi tessellation algorithm with the mean region size, $d$, in the range [0.25, 12] nm. The SFEs are distributed uniformly and are randomly assigned to each region. The mean SFE values $\bar{\gamma}$ used in the PFDM simulations are $\bar{\gamma}\in\{72.0, 84.7, 127.1\}\ \mathrm{mJ/m^{2}}$ with a standard deviation of $\sigma = 39\ \mathrm{mJ/m^{2}}$ and $\bar{\gamma}=35\ \mathrm{mJ/m^{2}}$ with a standard deviation of $\sigma = 12\ \mathrm{mJ/m^{2}}$. An example of an input to the PFDM simulations is plotted in Fig. 1.
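To make the construction of such an input concrete, the sketch below builds a discrete Voronoi landscape with NumPy. It is an illustration under stated assumptions rather than the generator used by Zeng et al.: the function name `sfe_landscape`, the box size, the non-periodic boundaries, and the rule for choosing the number of seeds are all hypothetical.

```python
import numpy as np

def sfe_landscape(n_grid, box_nm, d_nm, mean_sfe, std_sfe, seed=0):
    """Hypothetical generator: piecewise-constant SFE field on Voronoi regions."""
    rng = np.random.default_rng(seed)
    # Assumption: pick the seed count so each region covers roughly d_nm**2.
    n_regions = max(1, int(round((box_nm / d_nm) ** 2)))
    seeds = rng.uniform(0.0, box_nm, size=(n_regions, 2))
    # A uniform distribution with standard deviation s has half-width sqrt(3)*s.
    half_width = np.sqrt(3.0) * std_sfe
    values = rng.uniform(mean_sfe - half_width, mean_sfe + half_width, size=n_regions)
    # Assign each pixel to its nearest seed (non-periodic, brute force).
    coords = (np.arange(n_grid) + 0.5) * box_nm / n_grid
    xx, yy = np.meshgrid(coords, coords, indexing="ij")
    pixels = np.column_stack([xx.ravel(), yy.ravel()])
    dist2 = ((pixels[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    owner = dist2.argmin(axis=1)
    return values[owner].reshape(n_grid, n_grid)

landscape = sfe_landscape(n_grid=64, box_nm=32.0, d_nm=4.0, mean_sfe=84.7, std_sfe=39.0)
```

The brute-force nearest-seed search is adequate for small grids; a spatial tree would be preferable at the 256 × 256 resolution.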

To determine which features best describe the PFDM inputs, we proposed three different candidates as the input features. Each machine learning model was trained separately using each of the three candidate input features. The dimensions of each candidate input feature are listed in Table 2.

Feature Type 1. We used the prescribed mean region size $d$, mean SFE $\bar{\gamma}$, and standard deviation of the SFE $\sigma$ as input features to the ML models. We refer to the triplet $(\bar{\gamma},\sigma,d)$ as the *prescribed statistical* features for the remainder of the article.

Feature Type 2. We numerically estimated the mean region size $\hat{d}$, mean SFE $\hat{\gamma}$, and standard deviation $\hat{\sigma}$ of the SFE from the pixel values of each image as shown in Fig. 1. This feature was motivated by SFE landscapes produced by atomic simulations, where the distribution of the SFE landscape is not known a priori. We refer to this triplet, $(\hat{\gamma},\hat{\sigma},\hat{d})$, as the *estimated statistical* features for the remainder of the article.

Feature Type 3. We sampled the SFE of each PFDM input on a 256 × 256 grid, as shown in Fig. 1. We refer to this feature as the *SFE grid* features for the remainder of the article.
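One way the estimated statistical features could be computed from a gridded landscape is sketched below. The estimator for the region size is an assumption made for illustration (the text does not specify how $\hat{d}$ was obtained): it counts distinct SFE values as regions and takes the square root of the mean region area.

```python
import numpy as np

def estimated_features(grid, box_nm):
    """Hypothetical estimator of (mean SFE, SFE standard deviation, region size)."""
    gamma_hat = grid.mean()
    sigma_hat = grid.std()
    # Assumption: every region carries a distinct SFE value, so the number
    # of unique pixel values equals the number of regions.
    n_regions = np.unique(grid).size
    d_hat = np.sqrt(box_nm ** 2 / n_regions)
    return gamma_hat, sigma_hat, d_hat

# Toy landscape: four 2 x 2 pixel regions in a 4 nm box.
grid = np.kron(np.array([[20.0, 60.0], [100.0, 140.0]]), np.ones((2, 2)))
gamma_hat, sigma_hat, d_hat = estimated_features(grid, box_nm=4.0)
```

Note that the pixel-weighted standard deviation differs from the standard deviation over regions whenever the regions have unequal areas.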

### 3.3 Evaluation Metrics.

We used a number of metrics to measure the efficacy of the ML models to predict the yield stress. For the following equations, $y$ denotes the actual output for the test data, $\hat{y}$ is the ML predicted output for the test data, and $n$ is the number of test data points.

*Coefficient of Determination ($R^{2}$)*. Also known as multiple correlation coefficient, the $R^{2}$ is a measure of the explained variability of the dependent variable by a model [29]. This coefficient can take negative values. The metric is shown in Eq. (13):

$$R^{2}(y,\hat{y})=1-\frac{\sum_{i=0}^{n-1}(y_i-\hat{y}_i)^{2}}{\sum_{i=0}^{n-1}(y_i-\bar{y})^{2}} \tag{13}$$

where $\bar{y}$ is expressed as follows:

$$\bar{y}=\frac{1}{n}\sum_{i=0}^{n-1}y_i \tag{14}$$

*Mean Absolute Percentage Error (MAPE)*:

$$\mathrm{MAPE}=\frac{100}{n}\sum_{i=0}^{n-1}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \tag{15}$$

*Mean Absolute Error*:

$$\mathrm{MAE}(y,\hat{y})=\frac{1}{n}\sum_{i=0}^{n-1}\left|y_i-\hat{y}_i\right| \tag{16}$$

*Root-Mean-Square Error (RMSE)*:

$$\mathrm{RMSE}(y,\hat{y})=\sqrt{\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\hat{y}_i)^{2}} \tag{17}$$
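The four metrics in Eqs. (13) to (17) translate directly into NumPy. The sketch below is a plain transcription with made-up numbers; the function names are chosen for the example.

```python
import numpy as np

def r2(y, y_hat):
    # Eq. (13), with the mean of y from Eq. (14).
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def mape(y, y_hat):
    # Eq. (15): mean absolute percentage error, in percent.
    return 100.0 / y.size * np.sum(np.abs((y - y_hat) / y))

def mae(y, y_hat):
    # Eq. (16): mean absolute error.
    return np.sum(np.abs(y - y_hat)) / y.size

def rmse(y, y_hat):
    # Eq. (17): root-mean-square error.
    return np.sqrt(np.sum((y - y_hat) ** 2) / y.size)

# Made-up values standing in for actual and predicted yield stresses.
y = np.array([100.0, 200.0, 400.0])
y_hat = np.array([110.0, 190.0, 400.0])
scores = {"R2": r2(y, y_hat), "MAPE": mape(y, y_hat),
          "MAE": mae(y, y_hat), "RMSE": rmse(y, y_hat)}
```

Because MAPE divides by $y_i$, it is only meaningful when no actual value is zero, which holds for yield stresses.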

### 3.4 K-Fold Cross-Validation.

K-fold cross-validation is an evaluation method in which the training data are split into K parts and the ML model is trained K times. In each round, one of the K parts is held out as validation data, and the remaining K − 1 parts are used for training, so that every part serves as validation data exactly once. The validation error of the ML model is the average of the validation errors over all K rounds. Cross-validation provides a measure of how well the model will generalize to an independent data set [30].
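The procedure can be written out explicitly with Scikit-learn's `KFold` splitter. The sketch uses synthetic data and an arbitrarily chosen K-neighbors model; both are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

fold_errors = []
splitter = KFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, val_idx in splitter.split(X):
    # Train on nine parts, validate on the part that was held out.
    model = KNeighborsRegressor(n_neighbors=3).fit(X[train_idx], y[train_idx])
    residual = model.predict(X[val_idx]) - y[val_idx]
    fold_errors.append(np.mean(np.abs(residual)))

# Validation error averaged over all K rounds.
cv_error = float(np.mean(fold_errors))
```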

### 3.5 Training and Test Data Partition.

The data used in this ML study were split randomly into 75% training and 25% test data. A tenfold cross-validation technique was used on the training data to select the hyperparameters for each of the ML models. The test data are completely unseen by the ML models and are only used to compute the test errors of the ML models reported in Sec. 4.

## 4 Results and Discussion

We trained our machine learning models using the three different types of features and compared their efficacy at learning the phase-field data. We discuss the results using both supervised and unsupervised models in this section.

### 4.1 Training the Machine Learning Models.

In this section, we applied the different ML models listed in Sec. 3 to the training data for each feature type. The *training-testing procedure* of an ML model is as follows: the available data are randomly partitioned into 75% training data and 25% test data. The training data are used to train the ML models, and the test data are unseen by the ML models. A tenfold cross-validation scheme is applied to the training data to select the hyperparameters of each ML model. The hyperparameters are selected using the GridSearchCV routine in the Scikit-learn Python machine learning library [21]. After the training of an ML model is finished, the ML model is used to make predictions on the unseen test data, and errors are computed from the difference between the predicted yield stress and the exact yield stress of the test data. The error measures are listed in Sec. 3.3.

To ensure that the test errors for a given ML model represent a good generalization of the errors on unseen data, we repeated the aforementioned training-testing procedure for the ML model ten times. The test errors reported in the following results are averaged over ten different rounds of training-test procedures for every ML model and for every input feature type.
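The repeated training-testing procedure can be sketched as a loop over random partitions. The data below are synthetic, and the K-neighbors model with its small grid is a hypothetical stand-in for any of the six supervised models.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in with the same number of samples (83) as the study.
rng = np.random.default_rng(0)
X = rng.uniform(size=(83, 3))
y = 300.0 + 50.0 * X[:, 0] - 20.0 * X[:, 1]

test_maes = []
for repeat in range(10):
    # New random 75%/25% partition in every round.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=repeat)
    # Hyperparameters are selected by tenfold CV on the training data only.
    search = GridSearchCV(
        KNeighborsRegressor(),
        {"n_neighbors": [1, 3, 5]},  # hypothetical grid
        cv=10,
        scoring="neg_mean_absolute_error",
    )
    search.fit(X_tr, y_tr)
    test_maes.append(mean_absolute_error(y_te, search.predict(X_te)))

averaged_mae = float(np.mean(test_maes))
```

Averaging over partitions reduces the dependence of the reported error on any single random split, which matters for a data set of this size.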

For the first two types of input, i.e., the prescribed and estimated statistical features, the input data consist of only three dimensions as presented in Table 2. The third type of input, i.e., the SFE grid, consists of 65,536 dimensions. A direct application of the SFE grid input to an ML model would suffer from the “curse of dimensionality,” where an extremely large amount of training data would be necessary to avoid overfitting the ML model and the high test errors that result. The reason why a high-dimensional input requires more training data can be seen from the following heuristic: to fit a line in one dimension, two data points are necessary; to fit a plane in two dimensions, three data points are necessary; to fit a linear function of a 65,536-dimensional input, a minimum of 65,537 data points would be necessary. Due to the limited number of available PFDM data points, we have to reduce the dimension of the SFE grid input using PCA as described in Sec. 3. By using PCA, we reduced the SFE grid input to three dimensions to provide a fair comparison with the first two input types.
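The dimension reduction step can be sketched with Scikit-learn's `PCA`. The grids here are random placeholders, and a 32 × 32 resolution is used instead of 256 × 256 only to keep the example light.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_samples = 83
n_pixels = 32 * 32  # the study samples 256 x 256 = 65,536 pixels

# Each row is one flattened SFE grid (placeholder values).
grids = rng.uniform(20.0, 150.0, size=(n_samples, n_pixels))

# Keep three components to match the dimension of the statistical features.
pca = PCA(n_components=3)
reduced = pca.fit_transform(grids)
```

The reduced array has one row per simulation and one column per retained component, so it can be passed to the supervised models in place of the raw grid.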

For each input type, the training-testing procedure was performed ten times on each of the six ML models in Sec. 3. The ML model that yielded the smallest averaged mean absolute error (MAE) on the test data was chosen as the ML model for the input type. In Table 3, the averaged test errors for each of the input types are reported. We observed that the feature types that use statistical descriptors yielded slightly better accuracy than the SFE grid.

To provide a visual form of the test results from the ML models, we plotted the exact yield stress versus the ML-predicted yield stress in Fig. 2, using a randomly selected 25% of the available data as test data. Since the test data for each of the subplots in Fig. 2 were selected randomly from the available data, different data points are shown in different subplots. Note that the actual yield stresses of the test data are clustered in roughly four groups according to the four different mean SFE values. There is very little spread from the diagonal line, which indicates that the predictions are robust.

#### 4.1.1 Dimension Reduction Using Principal Component Analysis.

Recall that so far, we have only applied PCA to the SFE grid input feature because the number of dimensions was too large to be used directly. In contrast, we used the prescribed and estimated statistical inputs directly to train the ML models. We investigated whether PCA could also be applied to the first two input types: prescribed and estimated statistical features. The goal of applying PCA to the first two input types was to reduce their dimensions to fewer than three. Using the PCA-reduced components from the first two input types, we applied the training-testing procedure ten times and computed the averaged test error. Figure 3 shows the averaged test error as a function of the number of PCA components (i.e., from one to three components) for the mean absolute percentage error. Figure 3 shows that, after applying PCA, two components are sufficient to achieve test errors comparable to those of the three-dimensional raw input data. The reduction from three to two dimensions enables visualization of the relationship between the yield stress and the two PCA components. The reduction of the input data dimension using PCA is justified by the explained variance of each PCA dimension for all three input types shown in Table 4. The first two PCA dimensions capture more than 90% of the variance in the input data.
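The explained-variance argument can be reproduced on a small synthetic example. The three features below are constructed so that the third is nearly a linear combination of the first two; this construction is an assumption for illustration and does not reproduce the values in Table 4.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Third feature is almost redundant given the first two.
X = np.column_stack([a, b, a + b + 0.05 * rng.normal(size=200)])

pca = PCA(n_components=3).fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 90%.
n_needed = int(np.searchsorted(cumulative, 0.90) + 1)
```

When the input features have very different units or scales, standardizing them before PCA is a common choice; whether that was done in the study is not stated.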

Despite the efficacy of PCA in reducing the dimensions of input data, what is lost in the dimension-reduction procedure is the physical interpretability of the PCA dimensions. However, not all is lost in our case. The first PCA component from the SFE grid coincidentally correlates with the estimated mean SFE. Figure 4 shows that the first PCA component extracted from the SFE grid data captured the same trend for the yield stress as the prescribed and estimated mean SFE, even though PCA is an unsupervised process. In other words, the first principal component is equivalent to a scalar multiple of the estimated mean SFE from the grid data.
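A small synthetic experiment suggests why this can happen: when the sample-to-sample variation is dominated by a shift in the mean, the leading principal direction is close to the uniform direction across pixels. The grids below use independent pixel noise, which is a simplifying assumption and not the Voronoi structure of the actual inputs.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Four mean SFE levels as in the study; everything else is synthetic.
means = rng.choice([35.0, 72.0, 84.7, 127.1], size=83)
grids = means[:, None] + rng.uniform(-20.0, 20.0, size=(83, 16 * 16))

# Score of every sample along the first principal component.
pc1 = PCA(n_components=1).fit_transform(grids)[:, 0]

# Correlation with the mean estimated from the pixels of each grid.
corr = np.corrcoef(pc1, grids.mean(axis=1))[0, 1]
```

The sign of a principal component is arbitrary, so only the magnitude of the correlation is meaningful.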

### 4.2 Predictions Using Machine Learning Model.

The availability of a trained ML model allowed us to create plots that elucidate the relationship between the critical yield stress and the different statistical quantities that describe the distribution of the SFE landscape. We applied GBR and GPR to predict the yield stress curve as a function of the SFE region size for different mean SFE values. Both the predicted yield stress and the PFDM yield stress used for training are plotted in Figs. 5 and 6. We see that the GBR model provided nearly piecewise-constant predictions in regions where there are training data. It is important to emphasize that GBR is not simply using a piecewise-constant interpolation of the training data to predict the yield stress. It can be seen in the two lower curves in Figs. 5(a) and 5(b), respectively, that between 4.0 nm and 6.0 nm there is a step in the predicted yield stress even though there are no training data in that region. The piecewise nature of the predicted curves from GBR is a result of the different stages of decision trees used in the model. The GPR model produced smoother predictions than the GBR model. Both ML models suggest that, for a fixed mean SFE, there is a slight increase in the yield stress for region sizes between 1 nm and 4 nm.
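The contrast between the stepped GBR curve and the smooth GPR curve can be reproduced on a one-dimensional toy problem. All numbers below are synthetic, and the fixed RBF kernel (no hyperparameter optimization) is an assumption made to keep the sketch deterministic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-in: region size (nm) versus yield stress (arbitrary units).
d_train = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0]).reshape(-1, 1)
y_train = np.array([300.0, 310.0, 325.0, 330.0, 322.0, 315.0, 312.0, 310.0])
d_query = np.linspace(0.25, 12.0, 200).reshape(-1, 1)

gbr = GradientBoostingRegressor(n_estimators=50, max_depth=2, random_state=0)
gbr.fit(d_train, y_train)

# Fixed kernel so the sketch does not depend on hyperparameter fitting.
gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=2.0), alpha=1e-4, optimizer=None, normalize_y=True)
gpr.fit(d_train, y_train)

# Count distinct predicted values: few for the tree ensemble, many for GPR.
gbr_levels = np.unique(gbr.predict(d_query)).size
gpr_levels = np.unique(gpr.predict(d_query)).size
```

Tree splits are placed between neighboring training inputs, so with eight training points the ensemble can produce at most eight distinct levels, and the steps fall in the gaps between data points.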

Next, we applied the GPR model to predict the critical yield stress as a function of the mean SFE for several SFE region sizes $d$. The results are shown in Fig. 7. It is important to note that, due to the sparsely available training and test data for a given region size, the ML-predicted yield stresses for varying mean SFE are not as robust as those for varying mean region sizes. In particular, for predictions using a standard deviation of 12.0 mJ/m², we only have a few data points, all at a mean SFE of 35 mJ/m². Nevertheless, we filled in the rest of the curve using our ML prediction for completeness. The ML models predict that an increase in the mean SFE will lead to a decrease in the critical yield stress for all region sizes. In addition, the models show that the critical yield stress decreases more rapidly for higher values of mean SFE. At low values of mean SFE, the increase in critical yield stress begins to taper off, with a peak at a mean SFE between 40 mJ/m² and 50 mJ/m².

## 5 Summary and Conclusions

This study demonstrated the capability of ML models to learn the relationship between the yield stress and the variation in the SFE landscape using results from PFDM simulations. The first principal component of the SFE grid data obtained using PCA shows the same trend in its relationship to the yield stress as the mean SFE. In addition, by employing PCA, we were able to reduce the dimension of the statistical features to two, which makes data visualization possible. Of the three feature types we used to train the ML models, those based on statistical descriptors of the SFE variations produced the lowest error. The ML models can be used as surrogate models for the PFDM simulations at a small fraction of their computational cost.

## Acknowledgment

The authors thank the Balsell’s foundation for providing the scholarship for Pau Cutrina Vilalta’s undergraduate thesis at University of Colorado, Colorado Springs. The authors also thank the Women in Mathematics of Materials (WIMM) Organization and Michigan Center for Applied and Interdisciplinary Mathematics at University of Michigan for providing travel and lodging funds for collaborative work on this project.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The raw data required to reproduce these findings are available to download from a GitHub repository.^{2} The processed data required to reproduce these findings cannot be shared at this time due to technical limitations.
