## Abstract

This study presents a cost-effective and high-precision machine learning (ML) method for predicting the melt-pool geometry and optimizing the process parameters in the laser powder-bed fusion (LPBF) process with Ti-6Al-4V alloy. Unlike many ML models, the presented method incorporates five key features, including three process parameters (laser power, scanning speed, and spot size) and two material parameters (layer thickness and powder porosity). The target variables are the melt-pool width and depth that collectively define the melt-pool geometry and give insight into the melt-pool dynamics in LPBF. The dataset integrates information from an extensive literature survey, computational fluid dynamics (CFD) modeling, and laser melting experiments. Multiple ML regression methods are assessed to determine the best model to predict the melt-pool geometry. Tenfold cross-validation is applied to evaluate the model performance using five evaluation metrics. Several data pre-processing, augmentation, and feature engineering techniques are performed to improve the accuracy of the models. Results show that the “Extra Trees regression” and “Gaussian process regression” models yield the least errors for predicting melt-pool width and depth, respectively. The ML modeling results are compared with the experimental and CFD modeling results to validate the proposed ML models. The most influential parameter affecting the melt-pool geometry is also determined by the sensitivity analysis. The processing parameters are optimized using an iterative grid search method employing the trained ML models. The presented ML framework offers computational speed and simplicity, which can be implemented in other additive manufacturing techniques to comprehend the critical traits.

## Introduction

Powder-bed fusion (PBF) is a fusion-based additive manufacturing (AM) technique for creating complex metallic or alloy parts with a high strength-to-weight ratio, especially for aerospace, automotive, biomedical, dental, and electronic applications [1]. One of the most prominent PBF processes in industry today is the laser powder-bed fusion (LPBF) process, which uses a finely focused, monochromatic, coherent beam of photons (i.e., a laser) to melt the powder bed in a closed chamber. When the laser beam scans the top surface, it melts a volume of the powder bed to form a liquid melt pool, which is rapidly cooled and solidified in an inert gas environment [1]. When the melt pool in the first layer is solidified, a new powder layer is spread, and the process is repeated until the whole part is formed. Melt-pool geometry is a significant output of the LPBF process that indicates the width and depth of laser penetration and the heat-affected zone within the workpiece [2]. This output depends on several factors, including the material behavior, environment, and laser parameters. Among these factors, finding the optimized combination of the processing and material parameters, namely the laser power, scanning speed, spot size, porosity, and layer thickness, is crucial for performing the LPBF process effectively [3]. The process can be defective and costly if the processing parameters are not set and optimized properly. Researchers usually rely on trial-and-error methods using experiments and physics-based numerical modeling to determine the correct combination of the processing parameters, which takes substantial effort, time, and cost [1]. This is where machine learning (ML) becomes instrumental, as it allows the desired outputs to be modeled and predicted rapidly from a large dataset of input parameters. Once such a dataset is compiled and provided to the model, the model learns the relationship between inputs and outputs and produces predictions. The digital nature of the PBF process allows ML to identify and resolve manufacturing issues conveniently.

The study focuses on five vital parameters that directly affect the melt-pool geometry in LPBF. The first parameter is the laser power, which must be sufficient to melt the powder layer completely. A laser beam with more power than necessary is costlier and may lead to over-melting of the powder bed. Optimizing the scanning speed is equally crucial: a higher scanning speed makes the process faster, but it may result in insufficient penetration of the laser beam into the powder bed, causing incomplete melting. The third parameter is the laser spot size, also known as the beam diameter. A smaller diameter means a more concentrated laser (i.e., more energy can be transferred deeper into the substrate), whereas a larger diameter makes the laser beam less focused by spreading the energy over a wider surface area. Layer thickness is the fourth feature to be optimized. If the layer is too thin, the laser beam can penetrate deeply into the substrate and cause a keyhole, which can waste power, affect the melt-pool geometry, and make the prototyping unnecessarily slow. In contrast, a layer that is too thick prevents the laser beam from penetrating the layer fully, resulting in incomplete melting of the powder. The final parameter is the porosity of the material, which is directly related to the packing density (a higher packing density means a lower porosity). The porosity considered in this study is the porosity of the powder before it is melted; a higher porosity means that the powder bed contains more void space and less solid material. The compiled dataset consists of these five parameters, which serve as the inputs to the ML models. The depth and width of the melt pool collected from the same dataset serve as the outputs. This dataset is used to train multiple regression-based ML models for predicting the outputs.

The effect of LPBF process parameters on critical outputs has been investigated by researchers using various ML methods in recent years. Wang et al. [4] studied the relationship of the LPBF process conditions with the relative density of the processed parts. A novel dataset was prepared from an extensive literature survey with information on materials such as 316L steel, AlSi10Mg, and Fe60Co15Ni15Cr10. Different types of regression-based ML models were trained and validated using this dataset, and an accuracy of 87% was achieved. Baldi et al. [5] proposed a novel ML framework to predict the melt-pool dimensions as a function of laser power ($P$) and scanning speed ($V$) of the LPBF process with Inconel 718 as the material. The KNIME Analytics Platform was used to train the ML models, and an AutoML tool was used to find the most appropriate model based on the coefficient of determination ($R^2$) and mean absolute error (MAE) scores. Among all tested ML models, the gradient-boosted tree models performed the best. Gorgannejad et al. [6] investigated the keyhole porosity formation in LPBF of Ti-6Al-4V substrates using a data fusion approach. They used off-axis and coaxial photodiode sensors, acoustic emission, and high-speed X-ray imaging to examine the formation of subsurface defects. ML models such as K-nearest neighbor (KNN), support vector machine (SVM), and Gaussian Naive Bayes (GNB) were used to predict keyhole pore formation at different time scales, achieving high accuracy. Mojumder et al. [7] studied the relationship between the processing conditions and lack of fusion (LOF) in the LPBF process with Ti-6Al-4V alloy. They developed a physics-based thermo-fluid model with the integration of an active learning framework to predict the LOF porosity in the LPBF process. A customized neural network (NN)-based symbolic regression tool was utilized to characterize the relationship between process parameters and LOF porosity. Mondal et al. [8] created a surrogate Gaussian process model to predict melt-pool geometries with high accuracy. However, their model used only two parameters as inputs: laser power and scanning speed. Chen et al. [9] presented a method for detecting defects in the laser-directed energy deposition process using acoustic signals. They used a convolutional neural network (CNN) approach to denoise and analyze acoustic signals and detected cracks and keyhole pores in alloys with an overall accuracy of 89%. Ero et al. [10] introduced an ML-driven technique utilizing optical tomography data to detect lack of fusion and keyhole porosity in LPBF. Their methodology incorporated a self-organizing map and a customized U-Net model, resulting in a resilient and computationally efficient system. Jeon et al. [11] developed an online melt-pool depth assessment technique via a coaxial infrared (IR) camera, laser line scanner, and artificial neural network (ANN) for a similar process, known as the directed energy deposition process. They extracted the features (inputs to the ANN model) from the IR camera and laser line scanner and predicted the melt-pool depth under the conduction mode. They compared the estimated results with the data obtained from the optical microscopy inspection. Song et al. [12] proposed a hybrid deep generative prediction network to identify the relationship between the processing parameters and pore microstructure. The proposed framework used a variational autoencoder and a generative adversarial network (GAN) to describe complex microstructures and predict pore morphology under various processing parameters. More ML-assisted studies on part-porosity detection were conducted by Tian et al. [13], Senenayaka et al. [14], and Ren et al. [15], who utilized image data and classification models to correlate the LPBF process parameters with keyhole porosity. Alexander et al. [16] proposed an ML-based methodology for real-time process monitoring to detect complex patterns in the melt-pool geometry. The study showed that the support vector regression and CNN models offered a promising solution for real-time process control, with mean absolute percentage error (MAPE) values of 3.67% and 3.68%, respectively. Ghungrad et al. [17] presented a deep learning technique for predicting the thermal history of the LPBF process, where physics-based thermal modeling was integrated with the CNN model. The proposed methodology could counter the data limitation with the deep unfolding approach. The model achieved a MAPE of 2.8% and an $R^2$ value of 0.936 for 1000 datapoints. Rahman et al. [18] studied the correlation between the LPBF process parameters and the melt-pool geometry (width and depth) using regression-based ML models. However, the study required significant improvement, as the ML models were trained on limited datapoints and lacked variation in the porosity and layer thickness values. Table 1 summarizes the relevant ML studies that covered the LPBF process analysis.

| AM application | ML technique | Reference |
|---|---|---|
| Relative density | Random Forest, XGBoost | Wang et al. [4] |
| Keyhole porosity | SVM, KNN, and GNB based classifier | Gorgannejad et al. [6] |
| | CNN based classifier | Chen et al. [9] |
| | ANN and U-Net semantic segmentation model | Ero et al. [10] |
| | SVM, CNN, and transfer learning CNN based classifiers | Senenayaka et al. [14] |
| | CNN | Ren et al. [15] |
| | Hybrid deep generative network | Tian et al. [13] |
| Layer-wise porosity/Anomaly | NN-based regressor | Mojumder et al. [7] |
| | Recurrent convolutional neural network | Song et al. [12] |
| Temperature | Physics informed deep learning | Ghungrad et al. [17] |
| Thermal history | CNN | Wilkinson et al. [19] |
| | Regression and neural network | Kuehne et al. [20] |
| | Regression and classification | Paulson et al. [21] |
| Melt-pool geometry | Regression models | Baldi et al. [5] |
| | Regression and CNN models | Alexander et al. [16] |
| | Artificial neural network | Jeon et al. [11] |
| | Gaussian process regression | Mondal et al. [8] |

Most ML-based investigations, including regression and classification models, have targeted part-porosity and defect detection, as shown in Table 1. However, melt-pool geometry prediction is crucial, as it correlates the thermal and material properties with the build performance and microstructure. Melt-pool geometry analysis also gives direct insight into the incomplete melting and over-melting of powders. The available studies on melt-pool geometry are complex and computationally expensive, making them challenging for end-users and learners. The literature survey shows that the state-of-the-art ML-assisted melt-pool geometry analyses focus on a small number of features (e.g., only laser power and scanning speed [5,8]) and target variables (e.g., only depth [11]). Incorporating more features and target variables can make the model more robust, but it entails challenging data processing tasks. These circumstances indicate a substantial research gap: a cost-effective yet robust ML framework that renders the complete picture of the LPBF process, encompassing melt-pool dynamics, process parameter optimization, and sensitivity analysis, is still missing. Therefore, the current study bridges the gap by pursuing a comprehensive but feasible ML framework that benefits both technical and non-technical individuals in analyzing the LPBF and similar processes.

This study focuses on a cost-effective prediction of the Ti-6Al-4V melt-pool geometry in the LPBF process using high-precision ML modeling with five features and two target variables. The ML models are trained with five features (laser power, scanning speed, spot size, layer thickness, and porosity of the powder) and cross-validated using the *k*-fold cross-validation procedure to find the best model for predicting the melt-pool geometry. To clarify the correlation between the features and the target variables, a heatmap correlation matrix and a sensitivity analysis are presented. Hyperparameter (HP) tuning is performed to fine-tune the ML models for superior results. The predicted melt-pool width and depth from the proposed ML models are compared with the experimental and computational fluid dynamics (CFD) modeling results to validate the proposed ML models. The material and process parameters are also optimized using the ML analysis, targeting the maximum volume of the melt pool at a given energy density.

## Material and Methods

This study contains a combination of experimental, numerical, and ML analyses to predict the melt-pool geometry based on the compiled dataset. This section covers the material selection and associated methodology related to the study.

### Material.

The material considered for this study is Ti-6Al-4V, an α + β titanium alloy that can withstand remarkably elevated temperatures [22]. It offers a unique blend of physical and mechanical properties, including a high strength-to-weight ratio, low weight, corrosion resistance, and fatigue resistance. However, Ti-6Al-4V is quite expensive compared with other leading industry metals such as stainless steel and carbon steel; therefore, it is imperative to ensure that the LPBF process is conducted correctly the first time. A correct combination of the laser power, scanning speed, spot size, powder porosity, and layer thickness is required to reduce or eliminate material wastage, part defects, and time delays in the LPBF process. The chemical composition of Ti-6Al-4V considered for this study is shown in Table 2. Solid Ti-6Al-4V and its powder form show different thermo-physical properties [23], especially the thermal conductivity, specific heat capacity, melting point, density, and viscosity. Also, when the solid-to-liquid transformation (or vice versa) occurs, the material properties fluctuate severely during the phase change, leaving substantial effects on the melt-pool evolution and part quality. Rahman et al. [1] reported the thermo-physical properties of Ti-6Al-4V in solid, powder, and liquid states, which can be utilized to generate accurate LPBF modeling data.

| N | C | H | Fe | O | Al | V | Ti |
|---|---|---|---|---|---|---|---|
| 0.050 | 0.080 | 0.015 | 0.400 | 0.200 | 5.500–6.750 | 3.500–4.500 | Bal. |

### Experimental Analysis.

A custom-designed laser melting system is used to conduct the LPBF experiments on the solid (zero porosity) and powder-bed Ti-6Al-4V specimens to study the effect of various process parameters on the melt-pool geometry. The laser melting system (shown in Fig. 1) consists of a ytterbium fiber laser (YLR-200-AC-Y11), a collimator (IPG D25), a scan head (Cambridge Technology ProSeries II), and a Jenoptik F-θ lens. The build chamber is filled with argon gas during the laser processing. The solid specimens are polished solid Ti-6Al-4V disks with a diameter of 12.7 mm and a thickness of 2 mm (Fig. 1(b)). The powder-bed specimens are prepared using spherical Ti-6Al-4V powders with an average diameter of 25 μm. A powder layer of 70 μm is created on top of a 25.4 mm × 25.4 mm × 2.0 mm solid Ti-6Al-4V substrate (Fig. 1(c)). The porosity of the Ti-6Al-4V powder is determined manually from the powder density and bulk density and is found to be 58.76%. The powder density is calculated by measuring the mass and volume of the powder several times and averaging the values to get the final reading.

Keeping the laser power and spot size fixed at 200 W and 58 μm, respectively, four single-track laser scans are performed on both specimens with 100 mm/s, 300 mm/s, 750 mm/s, and 1000 mm/s scanning speeds, which are shown in Figs. 1(b) and 1(c). After laser scanning, the specimens are cut using a low-speed saw in the direction perpendicular to the single tracks. The cross sections are then ground with SiC papers (using successive grits), polished with the MetaDi™ Supreme polycrystalline 1-μm diamond suspension, and rinsed ultrasonically in ethanol, acetone, and de-ionized water. Finally, the cross sections are etched with Kroll's reagent to reveal the microstructure. The cross-sectional areas are examined by a Quanta™ 3D Dual Beam™ FEG FIB-scanning electron microscope (SEM) at 20 kV accelerating voltage to see the microstructures. The melt-pool width and depth are measured from the SEM images, and the average values are recorded in the ML dataset.

### Numerical Modeling.

The study utilizes a 3D transient CFD model (originally developed by Rahman et al. [1]) for partial data collection and ML model validation. The physical domain of the CFD model consists of a 14 mm × 4 mm × 4 mm solid Ti-6Al-4V block with a 0.07-mm thick Ti-6Al-4V powder layer on top, as shown in Fig. 2(a). Figure 2(a) also shows the single track of the laser scan in the *y*-direction that starts from (0, 0, 2 mm) and ends at (0, 0, 12 mm) on the top surface. The 3D computational mesh, shown in Fig. 2(b), is formed by refining the powder layer region and biasing the grid around the scanning track of the laser to achieve an exceptionally fine mesh in the target region. As the powder layer is melted by the moving heat source, the molten pool of liquid alloy is assumed to be an incompressible Newtonian fluid with laminar flow. Other assumptions for the CFD model are a flat top surface, fixed nodes, and no powder layer shrinkage during melting. The top surface of the physical domain is exposed to convection and radiation at a temperature of 298 K. As preheating is not required for the LPBF process, the bottom surface and side walls are kept under adiabatic conditions with a temperature of 298 K. The LPBF simulations are conducted in ANSYS 2022 R2, where the Gaussian laser heat source and thermal properties (density, emissivity, thermal conductivity, specific heat capacity, and viscosity) of the material are incorporated as user-defined functions (UDFs) [1]. Results for the melt-pool geometry at the cross section are obtained using ANSYS Fluent and CFD-Post.

### Machine Learning Modeling.

A suite of supervised machine learning models is explored for the prediction of the melt-pool geometry and the optimization of the processing parameters. Training is a prerequisite step before any ML model can be used for output prediction. Typically, the model is trained with experimental data or with simulated data obtained from a finite element or similar model. In supervised learning, the regression model receives inputs (numerical feature values) together with the expected outputs. Conversely, in unsupervised learning, the model is provided only with inputs and must identify patterns in the data without being given the expected outputs. Once the model is trained with the given data, it undergoes testing: unseen data become the input, and the model calculates an output based on its acquired knowledge. This study incorporates single, NN, and ensemble-based ML models. The single models include Gaussian process, linear regression, polynomial regression, and support vector machine. The multi-layer perceptron is the NN-based model used in this study. Ensemble-based models include Random Forest, Gradient Boosting, AdaBoost, Bagging, and Extra Trees.

#### Data Collection.

The dataset is compiled from multiple sources incorporating both experimental and simulated data for single-track laser scans using Ti-6Al-4V alloy, including the works of Rahman et al. [18], Wilkinson et al. [19], Kuehne et al. [20], Dilip et al. [24], Gong et al. [25], Soylemez [26], Kusuma [27], and Cunningham et al. [28]. The representative X-ray video files and images corresponding to both stationary and moving laser melting experiments are accessed to extract datapoints. The dataset considered for this study contains selected data from the above research articles along with some additional experimental and numerical modeling data generated by the authors. Figure 3 shows the data collection scheme for the study.

#### Dataset Preparation.

The dataset is prepared in the form of a CSV file after collecting and sorting data from multiple sources. The melt-pool geometry data are graphically extracted from the video files using accurate frames. The OpenCV Python library is used to extract frames from each video and label them with their corresponding video file number. The extracted frames are then imported into CAD software and scaled according to the specified dimensions, and the melt-pool geometry information is finally extracted in the form of width and depth. Using this process, multiple melt-pool measurements are gathered from each video and averaged to obtain a single data point. The final dataset has 830 samples, each having five features ($x_1$–$x_5$) and two target variables ($y_1$ and $y_2$). The input parameters are laser power, $P$ (W); scanning speed, $v_s$ (mm/s); spot size, $\Phi$ (μm); layer thickness, $l_t$ (μm); and powder porosity, $\phi$ (%). The output parameters are melt-pool width, $w$ (μm), and melt-pool depth, $d$ (μm). Two separate models are created, one capturing the influence of the five input parameters on the melt-pool width and the other capturing their influence on the melt-pool depth; that is, the ML models are trained twice, once for each target variable. The maximum and minimum values of the features and target variables in the dataset are shown in Table 3.

| Parameter | $P$ (W) | $v_s$ (mm/s) | $\Phi$ (μm) | $l_t$ (μm) | $\phi$ (%) | $w$ (μm) | $d$ (μm) |
|---|---|---|---|---|---|---|---|
| Max. | 520 | 3200 | 250 | 140 | 57.600 | 415.500 | 980 |
| Min. | 50 | 0 | 30 | 58 | 0 | 45 | 5 |

The prepared dataset necessitates specific pre-processing steps to mitigate potential biases in the ML models' results. The ensuing discussion outlines the employed data pre-processing and feature engineering techniques.

#### Normalization.

Normalization is a data pre-processing technique that organizes the dataset by generating consistent values across the records. Typically, this is achieved by transforming variables to a common range or scale, thereby eliminating variations in units or distributions. Since the features and target variables exhibit different ranges (as shown in Table 3), normalization is performed by dividing them by their corresponding maximum values. This ensures that all variables are constrained within the same standardized range (from 0 to 1).
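The max-based scaling described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: the function name `normalize_by_max` is hypothetical, and the two example rows are simply the Max. and Min. rows of Table 3 used as stand-in samples. It assumes non-negative values and a positive maximum in every column, which holds for the ranges in Table 3.

```python
# Column order follows Table 3: P, v_s, Phi, l_t, phi, w, d.
# The two rows are the Max./Min. bounds from Table 3, used here only as
# stand-in samples; the real dataset has 830 rows.
rows = [
    [520.0, 3200.0, 250.0, 140.0, 57.6, 415.5, 980.0],
    [50.0, 0.0, 30.0, 58.0, 0.0, 45.0, 5.0],
]


def normalize_by_max(data):
    """Divide every column by its maximum so all values fall in [0, 1].

    Assumes non-negative entries and a strictly positive column maximum.
    Returns the scaled rows and the maxima (needed to undo the scaling).
    """
    col_max = [max(col) for col in zip(*data)]
    scaled = [[value / m for value, m in zip(row, col_max)] for row in data]
    return scaled, col_max


normalized, col_max = normalize_by_max(rows)
```

Keeping `col_max` makes it possible to map normalized predictions back to physical units by multiplying with the corresponding maximum. In scikit-learn, `MaxAbsScaler` performs a comparable column-wise scaling.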

#### Polynomial Feature Expansion.

Polynomial feature expansion (PFE) is the technique of generating polynomial features from the original features in a dataset, where new features are engineered by raising the existing features to higher powers or multiplying them with one another. For example, given a single feature “$x$”, PFE can generate new features such as “$x^2$”, “$x^3$”, “$x^4$”, and so on. This approach helps capture nonlinear patterns in the data, so that the models are better able to fit the underlying relationships. This study incorporates five features and two target variables. Using second- and third-degree PFE, 12 additional features are generated and integrated with the original dataset. They are generated by multiplying each of four terms (laser power, scanning speed, squared laser power, and squared scanning speed) with each of three features (layer thickness, spot size, and porosity). Transforming features into polynomials gives the ML models increased flexibility, enabling them to capture intricate patterns such as curves. Polynomial features can therefore enhance the model performance, particularly for datasets exhibiting nonlinear patterns, and usually allow the ML models to generate more accurate predictions and a better approximation of the underlying data.
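The 12 interaction terms described above (four terms multiplied by three features) can be written out explicitly, as in the minimal sketch below. The function name `expand_features`, the dictionary keys, and the numeric inputs are hypothetical and chosen only for illustration; the inputs are assumed to be already normalized.

```python
from itertools import product


def expand_features(P, v_s, Phi, l_t, phi):
    """Return the 12 second- and third-degree interaction terms.

    Each of (P, v_s, P^2, v_s^2) is multiplied by each of (l_t, Phi, phi).
    Products with P or v_s are second-degree; products with the squared
    terms are third-degree.
    """
    base = {"P": P, "v_s": v_s, "P^2": P ** 2, "v_s^2": v_s ** 2}
    partners = {"l_t": l_t, "Phi": Phi, "phi": phi}
    return {f"{a}*{b}": base[a] * partners[b] for a, b in product(base, partners)}


# Hypothetical normalized sample (solid specimen, so porosity is zero).
features = expand_features(P=0.5, v_s=0.25, Phi=0.4, l_t=0.5, phi=0.0)
```

A general-purpose tool such as scikit-learn's `PolynomialFeatures` would generate every monomial up to the chosen degree, which is more than these 12 terms, so the selected products are built by hand here.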

#### Synthetic Data Generation.

Synthetic data generation is the process of producing additional samples that follow the distribution of the samples in the original dataset. Various techniques are available to generate synthetic samples, among which the GAN is adopted in this study. The conditional tabular generative adversarial network (CTGAN) uses GANs to model the distribution of tabular data and produce synthetic data [31]. Its design enables it to process both numerical and categorical features effectively while dealing with difficulties such as non-Gaussian and multimodal distributions, as well as the sparsity of one-hot-encoded vectors in real-world data. CTGAN implements mode-specific normalization for numeric features and utilizes a conditional vector to specify the condition for sampling training data. It can obscure personally identifiable information and produce artificial data for structured, tabular datasets with diverse characteristics and a sufficient training size. The hyperparameters of CTGAN govern the learning behavior of the model and can influence the quality of the generated data.

#### Data Splitting (*k*-Fold Cross-Validation).

The *k*-fold cross-validation method is used to shuffle the dataset and test the skill of the ML model accordingly. With this method, the original sample data are randomly partitioned into *k* equal parts. In each round, one part serves as the test set (or validation set) and the remaining *k*−1 parts serve as the training set. The procedure is repeated *k* times, so that each data point serves exactly once in the test set and *k*−1 times in the training set [32]. When the cross-validation rounds are complete, the average error is computed to evaluate the model's accuracy [32]. This study uses tenfold cross-validation, as it gives the least error during the evaluation.
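The partitioning logic can be illustrated with a short, dependency-free sketch. The helper names (`k_fold_indices`, `cross_val_mae`, `mean_baseline`) and the synthetic data are hypothetical; the trivial mean predictor merely stands in for a real regression model, and MAE is used as the single error measure for brevity.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal parts
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_val_mae(fit_predict, X, y, k=10, seed=0):
    """Average the per-fold mean absolute error over all k folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        fold_errors.append(mean(abs(p - y[i]) for p, i in zip(preds, test)))
    return mean(fold_errors)


def mean_baseline(X_train, y_train, X_test):
    """Stand-in model: always predict the mean of the training targets."""
    return [mean(y_train)] * len(X_test)


# Synthetic data with the same number of samples as the dataset (830).
X = [[i / 829] for i in range(830)]
y = [2.0 * row[0] for row in X]
cv_mae = cross_val_mae(mean_baseline, X, y, k=10)
```

With 830 samples and *k* = 10, every fold holds 83 test samples and 747 training samples. In practice, scikit-learn's `KFold` and `cross_val_score` provide the same functionality.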

#### Hyperparameter Tuning.

Hyperparameter (HP) tuning is a method of optimizing the HPs of an ML model to improve its performance [33]. HPs are external configuration settings that influence a model's learning process but are not learned from the data. Examples of HPs include the learning rate, the regularization strength, and the depth of a decision tree. HP tuning entails a methodical exploration of various combinations of these settings to identify the optimal configuration. Various techniques can be used for HP tuning, including random search, grid search, and Bayesian optimization. Appropriate tuning is crucial to enhance the model's ability to generalize to unseen data and achieve optimal predictive accuracy. In this study, the grid search method is implemented by iterating over different HP arguments. The tenfold cross-validation technique is used to assess the impact of each parameter on the model's performance. The optimal HPs are selected by comparing the values of the five performance-evaluating metrics. The HP arguments used for the HP tuning process in this study are provided in the supplemental file available in the Supplemental Materials on the ASME Digital Collection.
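A grid search combined with tenfold cross-validation can be sketched as follows. This is a simplified illustration under several assumptions: a small one-dimensional KNN regressor stands in for the models of this study, the HP grid and the synthetic data are hypothetical, and the configuration is selected by MAE alone rather than by the five metrics used in the study.

```python
import random
from itertools import product
from statistics import mean


def knn_predict(X_train, y_train, x, n_neighbors, weights):
    """Predict y at scalar x from its nearest training neighbours."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: abs(p[0] - x))[:n_neighbors]
    if weights == "distance":
        w = [1.0 / (abs(xi - x) + 1e-12) for xi, _ in nearest]
    else:  # "uniform"
        w = [1.0] * len(nearest)
    return sum(wi * yi for wi, (_, yi) in zip(w, nearest)) / sum(w)


def cv_mae(X, y, params, k=10, seed=0):
    """k-fold cross-validated MAE for one HP combination."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        Xtr, ytr = [X[j] for j in train], [y[j] for j in train]
        errors.append(mean(abs(knn_predict(Xtr, ytr, X[j], **params) - y[j])
                           for j in test))
    return mean(errors)


def grid_search(X, y, grid):
    """Evaluate every HP combination and return the best one plus all results."""
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((cv_mae(X, y, params), params))
    return min(results, key=lambda r: r[0]), results


# Synthetic, noisy 1-D data (hypothetical) and a hypothetical HP grid.
rng = random.Random(1)
X = [rng.random() for _ in range(120)]
y = [x ** 2 + rng.gauss(0.0, 0.05) for x in X]
grid = {"n_neighbors": [1, 3, 5, 9], "weights": ["uniform", "distance"]}
(best_error, best_params), all_results = grid_search(X, y, grid)
```

The same exhaustive search is available in scikit-learn as `GridSearchCV`, which accepts an estimator, a parameter grid, and a cross-validation setting.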

## Results and Discussion

Results for data correlation, best ML model selection, sensitivity analysis, and processing parameter optimization found from the ML analysis are presented and discussed in this section.

### Data Correlation.

The dataset compiled for this study has five feature variables (laser power, $P$ (W); scanning speed, $v_s$ (mm/s); spot size, $\Phi$ (μm); layer thickness, $l_t$ (μm); and powder porosity, $\phi$ (%)) and two target variables (melt-pool width, $w$ (μm), and melt-pool depth, $d$ (μm)). To understand the correlation between the feature and target variables, two scatter plots are presented in Figs. 4(a) and 4(b), representing the relationship of the feature variables with the melt-pool width and melt-pool depth. The correlation is not conclusive from Fig. 4, as the data points appear randomly scattered for most variables. Further quantification and ML analysis are necessary to better understand the correlation among the variables.

The relationship between the laser power and melt-pool geometry is roughly discernible in Fig. 4, but the variations of the melt-pool geometry with the other feature variables are difficult to interpret. For a clearer understanding, a heatmap correlation matrix is presented in Fig. 5, which illustrates the magnitudes of the correlation between the features and target variables. A value of +1 represents a perfect positive correlation, a value of −1 represents a perfect negative correlation, and a value of 0 indicates no linear correlation.

Figure 5 shows that the magnitude of correlation is similar for both melt-pool width and depth. Both laser power and layer thickness have a positive correlation with melt-pool width and depth, with the laser power showing a higher magnitude than the layer thickness. The scanning speed, spot size, and porosity are negatively correlated with the melt-pool width and depth. Among the negative correlations, the scanning speed has the highest magnitude, followed by the spot size and porosity.
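A correlation matrix of the kind visualized in a heatmap can be computed as in the sketch below. It assumes the Pearson coefficient, which is a common default for such heatmaps; the text does not state which coefficient underlies Fig. 5. The function names and the five data rows are hypothetical and serve only to show a positive and a negative correlation.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_matrix(table):
    """Pairwise Pearson coefficients for a dict of named columns."""
    names = list(table)
    return {r: {c: pearson(table[r], table[c]) for c in names} for r in names}


# Hypothetical samples: the width grows with power and shrinks with speed.
table = {
    "P": [100.0, 150.0, 200.0, 250.0, 300.0],       # laser power (W)
    "v_s": [1200.0, 1000.0, 800.0, 600.0, 400.0],   # scanning speed (mm/s)
    "w": [80.0, 100.0, 125.0, 140.0, 170.0],        # melt-pool width (um)
}
corr = correlation_matrix(table)
```

Each entry lies between −1 and +1, and the diagonal is always +1 because every variable is perfectly correlated with itself.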

### Performance of the Base Model.

Several ML models are trained and tested with the scikit-learn library [34] in Python using the normalized dataset. The performance of the ML models in predicting melt-pool width and depth is analyzed by evaluating the correlation coefficient and four error metrics: MAE, root mean squared error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE). The results are reported in Table 4.

| Model | Melt-pool geometry | Correlation coefficient | MAE | RMSE | RAE | RRSE |
|---|---|---|---|---|---|---|
| Gaussian process | Depth | −84.269 | 0.080 | 0.657 | 1.756 | 7.936 |
| | Width | −7.497 | 0.054 | 0.373 | 0.593 | 2.821 |
| Linear regression | Depth | 0.632 | 0.028 | 0.052 | 0.625 | 0.626 |
| | Width | 0.745 | 0.034 | 0.066 | 0.370 | 0.502 |
| Polynomial regression | Depth | 0.632 | 0.028 | 0.052 | 0.625 | 0.626 |
| | Width | 0.745 | 0.034 | 0.066 | 0.370 | 0.502 |
| Support vector regression | Depth | 0.357 | 0.053 | 0.062 | 1.174 | 0.754 |
| | Width | 0.764 | 0.043 | 0.063 | 0.472 | 0.480 |
| KNN | Depth | 0.733 | 0.011 | 0.042 | 0.234 | 0.509 |
| | Width | 0.803 | 0.016 | 0.058 | 0.178 | 0.438 |
| Multi-layer perceptron | Depth | 0.574 | 0.025 | 0.057 | 0.540 | 0.683 |
| | Width | 0.669 | 0.040 | 0.075 | 0.442 | 0.570 |
| Random Forest | Depth | 0.799 | 0.009 | 0.035 | 0.192 | 0.419 |
| | Width | 0.917 | 0.011 | 0.038 | 0.125 | 0.288 |
| Gradient Boosting | Depth | 0.900 | 0.008 | 0.026 | 0.186 | 0.318 |
| | Width | 0.924 | 0.016 | 0.036 | 0.171 | 0.274 |
| AdaBoost | Depth | 0.516 | 0.036 | 0.052 | 0.801 | 0.629 |
| | Width | 0.750 | 0.045 | 0.065 | 0.486 | 0.492 |
| Bagging | Depth | 0.789 | 0.009 | 0.036 | 0.204 | 0.434 |
| | Width | 0.906 | 0.012 | 0.040 | 0.132 | 0.305 |
| Extra Trees | Depth | 0.878 | 0.007 | 0.028 | 0.155 | 0.339 |
| | Width | 0.924 | 0.011 | 0.036 | 0.117 | 0.272 |

Note: The bold fonts indicate the best results.

All results reported in Table 4 are generated with the default parameters of the ML models, which are referred to as the “base models.” The Extra Trees regressor gives the lowest errors and the highest correlation coefficient for the melt-pool width. It also shows promising results for the melt-pool depth, but Gradient Boosting outperforms it there, with a higher correlation coefficient and lower RMSE and RRSE values (highlighted in Table 4).
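For reference, the four error metrics in Table 4 can be computed as in the sketch below. It uses the common textbook definitions, in which RAE and RRSE normalize the model error by the error of a predictor that always returns the mean of the observed values; the exact normalization used in this study may differ, and the function name and sample values are hypothetical.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return MAE, RMSE, RAE, and RRSE for one target variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred           # model error per sample
    baseline = y_true - y_true.mean()    # error of a mean-only predictor
    return {
        "MAE": np.mean(np.abs(residual)),
        "RMSE": np.sqrt(np.mean(residual ** 2)),
        "RAE": np.sum(np.abs(residual)) / np.sum(np.abs(baseline)),
        "RRSE": np.sqrt(np.sum(residual ** 2) / np.sum(baseline ** 2)),
    }

# Illustrative values only (not taken from the dataset).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
errors = regression_errors(y_true, y_pred)
print(errors)
```

Under these definitions, RAE or RRSE values above 1 mean that a model performs worse than simply predicting the mean, which is one way to read the large base-model values of the Gaussian process in Table 4.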

### Hyperparameter Tuning Result.

The HP tuning is performed using the grid search approach to obtain the best parameters for the models. These parameters enable the ML models to perform with the least amount of error. The hyperparameters and their optimized values for all ML models are provided in the supplemental file available in the Supplemental Materials on the ASME Digital Collection. After evaluating the error metrics, Extra Trees is found to be among the best-performing ML models. Four parameters are used to tune this model. The first parameter, “n_estimators,” sets the number of decision trees used in the model, where sampling is random for each tree. The second parameter, “max_depth,” limits how many levels of splits an individual tree can have; thus, it directly controls model complexity and helps prevent overfitting. A higher value of max_depth yields a more complex tree that fits the training data more closely but carries a higher risk of overfitting, whereas a lower value yields a simpler tree that generalizes better to unseen data. The third parameter, “min_samples_split,” controls the growth of the tree and prevents overfitting by specifying the minimum number of samples required to split an internal node. The final parameter, “min_samples_leaf,” is the minimum number of samples required at a leaf node. This parameter also controls tree growth and complexity, prevents overfitting, and improves generalization to unseen data. Figure 6 shows four curves depicting the relationship between the HP values and the error metrics for the Extra Trees regressor. The optimum HP values corresponding to the lowest error can be identified from Fig. 6 for each error metric.
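A grid search over the four Extra Trees hyperparameters named above can be sketched with scikit-learn as follows. The candidate values, the synthetic data, and the choice of MAE as the selection score are assumptions made for illustration; the grids actually used are given in the supplemental file.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in for the normalized dataset: five features, one target.
X = rng.uniform(0.0, 1.0, size=(120, 5))
y = X[:, 0] - 0.5 * X[:, 1] + 0.05 * rng.normal(size=120)

# Hypothetical candidate values for the four tuned hyperparameters.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [None, 5],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 2],
}

search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes, so MAE is negated
    cv=10,                              # tenfold cross-validation
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated MAE:", -search.best_score_)
```

Every combination in the grid is scored by cross-validation, and the combination with the lowest mean error is kept and refitted on the full training data.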

Gaussian process is another ML model that shows promising results, and the HP tuning technique is also applied to it. However, to tune the Gaussian process model, several kernels are used as the model argument in place of numerical values. Figure 7 summarizes the kernels used for HP tuning of the Gaussian process model and their associated errors. It can be observed from Fig. 7 that the kernel C(1.0, (1e-3, 1e3)) * Matern(1.0, (1e-2, 1e2), nu=0.5) gives the least amount of error (e.g., the lowest MAE of 0.62%). Hence, this kernel is used as the optimized tuning value of the Gaussian process ML model.
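The selected kernel can be written in scikit-learn as shown below, assuming that C denotes ConstantKernel, as is conventional in scikit-learn examples. The training data, the noise level alpha, and the normalize_y setting are illustrative assumptions, not values reported in this study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel as C, Matern

rng = np.random.default_rng(0)

# Synthetic stand-in for the normalized features and the depth target.
X_train = rng.uniform(0.0, 1.0, size=(60, 5))
y_train = np.sin(3.0 * X_train[:, 0]) - X_train[:, 1] + 0.05 * rng.normal(size=60)

# Kernel reported as best in Fig. 7: constant kernel times Matern with nu = 0.5.
kernel = C(1.0, (1e-3, 1e3)) * Matern(1.0, (1e-2, 1e2), nu=0.5)

# alpha and normalize_y are assumed settings for this sketch.
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-3, normalize_y=True, random_state=0)
gpr.fit(X_train, y_train)

# Predict a mean and a standard deviation for unseen feature combinations.
X_new = rng.uniform(0.0, 1.0, size=(5, 5))
mean, std = gpr.predict(X_new, return_std=True)
print(gpr.kernel_)  # kernel with hyperparameters optimized within the given bounds
```

The tuples passed to each kernel are the bounds within which the constant value and the length scale are optimized during fitting, so choosing a kernel also fixes the search range of its internal parameters.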

Table 4 corresponds to the results generated from the base ML regression models with default parameters. HP tuning is performed to obtain more accurate results than those of the base models, using a grid search to fine-tune the models from a list of parameters. This technique has been shown to provide better accuracy than the conventional methods. Results from the HP-tuned models are reported in Table 5, where the Extra Trees model shows the highest correlation coefficient and the lowest errors for melt-pool width, outperforming all other ML models. The Gaussian process shows the least amount of error and the highest correlation coefficient for melt-pool depth, outperforming all other ML models. A comparison of Tables 4 and 5 shows a significant improvement in accuracy after HP tuning. In predicting the melt-pool width, the Extra Trees model remains the best, and its accuracy increases after HP tuning. This is because the Extra Trees regressor, like Random Forest, is an ensemble of randomized decision trees, and its ensemble nature helps mitigate the impact of individual outliers or noisy samples by averaging the predictions of multiple trees. This robustness allows Extra Trees to maintain reliable performance even in the presence of noisy or outlying data points [35]. As the dataset is compiled from multiple sources, this property minimizes the impact of individual noisy samples, making Extra Trees a better-performing model than the other ML regression models.

| Model | Melt-pool geometry | Correlation coefficient | MAE | RMSE | RAE | RRSE |
|---|---|---|---|---|---|---|
| Gaussian process | Depth | 0.908 | 0.006 | 0.025 | 0.137 | 0.300 |
| | Width | 0.883 | 0.012 | 0.045 | 0.128 | 0.336 |
| Linear regression | Depth | 0.632 | 0.028 | 0.052 | 0.625 | 0.626 |
| | Width | 0.745 | 0.034 | 0.066 | 0.370 | 0.502 |
| Polynomial regression | Depth | −1.219 | 0.020 | 0.094 | 0.434 | 1.130 |
| | Width | 0.849 | 0.021 | 0.051 | 0.230 | 0.385 |
| Support vector regression | Depth | 0.582 | 0.036 | 0.053 | 0.784 | 0.646 |
| | Width | 0.821 | 0.038 | 0.055 | 0.411 | 0.418 |
| KNN | Depth | 0.774 | 0.010 | 0.037 | 0.218 | 0.452 |
| | Width | 0.825 | 0.016 | 0.055 | 0.173 | 0.413 |
| Multi-layer perceptron | Depth | 0.885 | 0.011 | 0.027 | 0.250 | 0.331 |
| | Width | 0.892 | 0.017 | 0.043 | 0.183 | 0.323 |
| Random Forest | Depth | 0.806 | 0.009 | 0.034 | 0.190 | 0.413 |
| | Width | 0.918 | 0.011 | 0.038 | 0.123 | 0.285 |
| Gradient Boosting | Depth | 0.891 | 0.007 | 0.026 | 0.164 | 0.320 |
| | Width | 0.920 | 0.012 | 0.037 | 0.134 | 0.283 |
| AdaBoost | Depth | 0.701 | 0.015 | 0.039 | 0.326 | 0.476 |
| | Width | 0.886 | 0.021 | 0.045 | 0.232 | 0.337 |
| Bagging | Depth | 0.832 | 0.008 | 0.033 | 0.186 | 0.399 |
| | Width | 0.919 | 0.011 | 0.037 | 0.122 | 0.283 |
| Extra Trees | Depth | 0.882 | 0.007 | 0.028 | 0.152 | 0.339 |
| | Width | 0.931 | 0.010 | 0.034 | 0.112 | 0.259 |

Note: The bold fonts indicate the best results.

After HP tuning, a drastic change is observed in predicting the melt-pool depth, where the Gaussian process outperforms all other models (Table 5). The nonlinear nature of the Gaussian process model helps it fit noisy samples better. The Gaussian process model performs better in predicting the melt-pool depth because the depth shows a seemingly random relationship with its corresponding feature variables. The Extra Trees regressor remains the best model for predicting the melt-pool width after HP tuning. Overall, HP tuning reduces the error values significantly and increases the correlation coefficient.

Figure 8 presents the performance of the ML models across all error metrics for both melt-pool width and depth. It clearly shows that model performance improves after HP tuning and that Extra Trees and Gaussian process have the least error for melt-pool width and depth, respectively, across all error metrics.

### Performance of ML Models with Feature Engineering.

In addition to employing HP tuning, various feature engineering techniques are implemented, including PFE and synthetic data generation. The errors stemming from these feature engineering methods are compared against both the base and HP-tuned ML models. Following the application of PFE, notable improvements are observed in specific models, namely Gaussian process, KNN, multi-layer perceptron, and AdaBoost, as illustrated in Fig. 9. Random Forest and Bagging exhibit a marginal decrease in errors after PFE. Conversely, support vector regression does not exhibit any improvement in model performance; in fact, there is a deterioration in performance. By referring to Figs. 8 and 9, it is evident that the Gaussian process and Extra Trees display the least errors for the melt-pool depth and width, respectively. However, employing PFE does not yield further enhancements for these two models. The resulting performance (shown in Fig. 9) is nearly identical to the HP-tuned model. Consequently, the results based on the HP-tuned model are recommended without escalating further feature complexity.
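Polynomial feature expansion of the kind discussed above can be sketched with a scikit-learn pipeline. The polynomial degree, the synthetic data, and the use of linear regression as the downstream model are assumptions for illustration; the source does not state the degree used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic stand-in: five normalized features and a target with
# interaction and squared terms that a purely linear model cannot capture.
X = rng.uniform(0.0, 1.0, size=(200, 5))
y = X[:, 0] * X[:, 1] + X[:, 2] ** 2

# Baseline: linear regression on the five original features.
linear = LinearRegression().fit(X, y)

# PFE: add all degree-2 terms (squares and pairwise products) before fitting.
pfe_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # degree is an assumption
    LinearRegression(),
).fit(X, y)

n_expanded = pfe_model.named_steps["polynomialfeatures"].n_output_features_
r2_linear = linear.score(X, y)
r2_pfe = pfe_model.score(X, y)
print(n_expanded, round(r2_linear, 3), round(r2_pfe, 3))
```

With five inputs, a degree-2 expansion produces 20 features (5 original, 5 squared, and 10 pairwise products). This helps models that cannot represent interactions on their own, which is consistent with tree ensembles gaining little from it.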

Synthetic data generation also does not improve the model performance further; rather, it yields the highest error, as observed in Fig. 10. As the Extra Trees model gives the lowest error in predicting melt-pool width and the Gaussian process produces the least amount of error in predicting melt-pool depth, these two models are selected for predicting the target variables and optimizing the process parameters in the rest of the analysis. Figure 10 shows a comparative model performance evaluation for all the techniques applied to both the Extra Trees and Gaussian process ML models.

### Machine Learning Model Validation.

The predicted results from the ML model are compared with the CFD modeling and experimental results for the melt-pool width and depth. The laser melting experiments are conducted in a custom-designed ytterbium fiber laser processing system, as described in the “Experimental Analysis” section. The cross sections of the processed specimens are examined using SEM to acquire the melt-pool width and depth. The process parameters, i.e., the five features, are kept the same for the experiment and the CFD model to facilitate the comparative study. The values of the simulation and experimental parameters and of the five features considered for the ML model are shown in Table 6, where UDF represents user-defined functions in terms of temperature [22,36].

| Parameters | Values |
|---|---|
| Solidus temperature, $T_S$ (K) | 1878 |
| Liquidus temperature, $T_L$ (K) | 1938 |
| Latent heat of fusion, $L_f$ (kJ/kg) | 440 |
| Spot size of laser beam, $\Phi$ (µm) | 58 |
| Scanning speed, $v_s$ (mm/s) | 300 |
| Laser power, $P$ (W) | 200 |
| Initial temperature, $T_{in}$ (K) | 298 |
| Laser absorption efficiency, $\eta_l$ | 0.865 |
| Powder porosity (%) | 50 |
| Powder layer thickness, $l_t$ (mm) | 0.07 |
| Beam penetration depth, $S$ (µm) | 62 |
| Convection coefficient, $h$ (W/m$^2$-K) | 10 |
| Effective viscosity, $\mu$ (kg/m-s) | UDF |
| Specific heat, $c_p$ (J/kg-K) | UDF |
| Thermal conductivity, $k$ (W/m-K) | UDF |
| Emissivity, $\epsilon$ | UDF |
| Density, $\rho$ (kg/m$^3$) | UDF |

The comparison of the results and the percentage of deviation are given in Table 7. The ML model results for melt-pool width and depth show good agreement with the CFD modeling and experimental results. The CFD model predicts a smaller maximum width and depth because of the conduction mode of the melt pool, which results from the ideal Gaussian heat source assumption. However, the overall cross-sectional area of the melt pool from the CFD modeling is comparable with the experimental data [37]. The ML model offers flexibility and saves a significant amount of time in predicting the melt-pool geometry compared with the other two techniques.

| Variable | ML | CFD model | Experiment | Percentage of deviation from the CFD model | Percentage of deviation from experiment |
|---|---|---|---|---|---|
| Melt-pool width (μm) | 204.300 | 190 | 205 | 7.526 | 0.341 |
| Melt-pool depth (μm) | 189.500 | 176 | 190 | 7.670 | 0.263 |

It is apparent from Table 7 that ML model prediction can be a viable alternative to performing actual experiments. The ML model predicts the melt-pool geometry for a set of random feature variables from the test group of the dataset. The maximum deviations from the experimental melt-pool geometry values are 0.341% for the melt-pool width and 0.263% for the melt-pool depth. Figure 11 compares the actual data (the test set of the dataset) with the ML prediction results. The *x*-axis of Fig. 11 indicates the sample number within the test set, while the *y*-axis represents the magnitude of the melt-pool geometry. The actual curves in Fig. 11 show the 166 melt-pool width or depth values from the test set, and the predicted curves show the corresponding values predicted by the ML model. The comparison indicates good agreement between the actual and predicted results for the melt-pool geometry.
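The deviations listed in Table 7 are consistent with a relative difference taken with respect to the reference value (CFD or experiment). That formula is inferred from the tabulated numbers rather than stated explicitly, and the function name below is hypothetical.

```python
def percent_deviation(predicted, reference):
    """Absolute deviation of a prediction from a reference value, in percent."""
    return abs(predicted - reference) / abs(reference) * 100.0

# Values taken from Table 7 (all in micrometers).
ml = {"width": 204.3, "depth": 189.5}
cfd = {"width": 190.0, "depth": 176.0}
experiment = {"width": 205.0, "depth": 190.0}

for key in ("width", "depth"):
    dev_cfd = percent_deviation(ml[key], cfd[key])
    dev_exp = percent_deviation(ml[key], experiment[key])
    print(f"{key}: {dev_cfd:.3f}% from CFD, {dev_exp:.3f}% from experiment")
```

For the width, for example, the deviation from the CFD model is |204.3 − 190| / 190 × 100 ≈ 7.526%, which matches the tabulated value.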

### Sensitivity Analysis.

The influence of the feature variables on the target variables can also be expressed in terms of global sensitivity analysis (GSA), a statistical technique that assesses the influence of uncertain input variables on the variability of the output of a mathematical or computational model over the entire parameter space [38,39]. GSA employs Monte Carlo sampling, in which a comprehensive collection of parameters (global sample values) is used to investigate the impact of fluctuations in the model parameters on the model's output. This approach entails varying multiple parameters concurrently and applying statistical techniques to measure the significance of each parameter [40]. GSA is used in this study because it considers multiple parameters simultaneously, as opposed to local sensitivity analysis, which considers only a one-to-one relationship. In the LPBF process, the sensitivity of the target variables (e.g., the maximum melt-pool width and maximum melt-pool depth) is determined by the effects of the feature variables (laser power, scanning speed, layer thickness, spot size, and porosity) on them, as presented in Figs. 12(a) and 12(b). It can be observed from Fig. 12 that laser power is the most influential parameter, while layer thickness is the least influential parameter, on the target variables. The sensitivity analysis results are generated using the SALib library in Python [40].
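To illustrate what a Monte Carlo GSA computes, the sketch below estimates first-order, variance-based (Sobol-type) sensitivity indices with plain NumPy. It is not the SALib workflow used in this study, and the source does not state which GSA estimator was applied; the toy model, the input bounds, and the function names are assumptions.

```python
import numpy as np

def first_order_sobol(model, bounds, n_samples=200_000, seed=0):
    """Estimate first-order Sobol indices with two independent sample matrices."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    low, high = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)

    # Two independent Monte Carlo samples of the whole parameter space.
    A = rng.uniform(low, high, size=(n_samples, dim))
    B = rng.uniform(low, high, size=(n_samples, dim))
    f_A, f_B = model(A), model(B)
    variance = np.var(np.concatenate([f_A, f_B]))

    indices = np.empty(dim)
    for i in range(dim):
        # Copy A, but take column i from B, so only input i changes.
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]
        indices[i] = np.mean(f_B * (model(AB_i) - f_A)) / variance
    return indices

def toy_model(X):
    # Hypothetical response: strong effect of x0, weaker x1, no effect of x2.
    return 2.0 * X[:, 0] + 1.0 * X[:, 1]

S1 = first_order_sobol(toy_model, bounds=[(0, 1), (0, 1), (0, 1)])
print(np.round(S1, 2))
```

For this toy model, the analytic first-order indices are 0.8, 0.2, and 0, so the estimates can be checked directly. In the study, the role of the toy model would be played by the trained ML predictor of melt-pool width or depth.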

### Process Parameters Optimization.

The ML models are reconfigured to predict the optimum melt-pool geometry after proper training. In this study, two different ML models are selected for predicting the melt-pool width and depth with the lowest error margin. The optimized LPBF processing parameters can be predicted using these ML models for any combination of input parameters. An iterative grid search approach is used in which the laser power, scanning speed, and spot size are varied across their limits in the original dataset. The two material parameters (porosity and layer thickness) usually remain the same during the LPBF process, as they are determined by packing the powder before laser processing. Therefore, the layer thickness and porosity are kept constant at 70 *µ*m and 50%, respectively. The melt-pool width is required to be at least twice the spot size of the laser. The melt-pool depth is required to lie within 1.5–2.0 times the layer thickness. The depth-to-width ratio is also kept within the range from 1.0 to 1.2 to avoid keyhole shapes. The optimization routine is implemented in Python with all these conditions and yields a total of 99 optimized combinations, four of which are presented in Table 8.

| Parameters | $P$ (W) | $v_s$ (mm/s) | $l_t$ (μm) | $\Phi$ (μm) | $\phi$ (%) | $w$ (μm) | $d$ (μm) |
|---|---|---|---|---|---|---|---|
| Set 01 | 200 | 650 | 70 | 50 | 50 | 118.70 | 129.99 |
| Set 02 | 250 | 650 | 70 | 60 | 50 | 121.35 | 136.54 |
| Set 03 | 280 | 700 | 70 | 40 | 50 | 122.03 | 139.42 |
| Set 04 | 300 | 600 | 70 | 50 | 50 | 135.91 | 139.35 |
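The constrained grid search described in this section can be sketched as follows. The two predictor functions are hypothetical linear stand-ins for the trained Extra Trees and Gaussian process models, and the grid limits and step sizes are assumed; only the three feasibility conditions and the fixed material parameters are taken from the text.

```python
import itertools

LAYER_THICKNESS_UM = 70.0  # kept constant, as stated in the text
POROSITY_PCT = 50.0        # kept constant, as stated in the text

def predict_width(power, speed, spot, layer_thickness, porosity):
    # Hypothetical surrogate; the study uses the tuned Extra Trees model.
    return 83.0 + 0.25 * power - 0.05 * speed + 0.2 * spot + 0.1 * layer_thickness - 0.2 * porosity

def predict_depth(power, speed, spot, layer_thickness, porosity):
    # Hypothetical surrogate; the study uses the tuned Gaussian process model.
    return 93.0 + 0.30 * power - 0.05 * speed - 0.1 * spot + 0.1 * layer_thickness - 0.2 * porosity

def is_feasible(width, depth, spot, layer_thickness):
    """Apply the three geometric conditions stated in the text."""
    return (
        width >= 2.0 * spot                                            # width at least twice the spot size
        and 1.5 * layer_thickness <= depth <= 2.0 * layer_thickness    # depth 1.5 to 2.0 times layer thickness
        and 1.0 <= depth / width <= 1.2                                # ratio limit to avoid keyhole shapes
    )

# Assumed grid limits and steps (W, mm/s, um).
powers = range(100, 401, 50)
speeds = range(300, 1201, 100)
spots = range(40, 101, 10)

feasible = []
for power, speed, spot in itertools.product(powers, speeds, spots):
    w = predict_width(power, speed, spot, LAYER_THICKNESS_UM, POROSITY_PCT)
    d = predict_depth(power, speed, spot, LAYER_THICKNESS_UM, POROSITY_PCT)
    if is_feasible(w, d, spot, LAYER_THICKNESS_UM):
        feasible.append((power, speed, spot, round(w, 2), round(d, 2)))

print(len(feasible), "feasible combinations; first:", feasible[0])
```

Because the surrogates are inexpensive to evaluate, every grid point can be checked exhaustively, which is what makes a plain grid search practical here compared with repeating CFD runs or experiments.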

## Conclusions

A high-precision yet computationally inexpensive ML framework is developed to predict the melt-pool geometry and optimize the processing parameters in the laser powder-bed fusion (LPBF) process. A novel dataset is prepared from an extensive literature survey, experiments, numerical modeling, and image processing techniques. After completing the dataset, ML-based analysis is performed to predict the target variables, optimize the process parameters, and determine the correlations between the features and target variables. Data pre-processing and feature engineering techniques are strategically applied to reduce the error in predictions. Results are obtained for multiple ML models and the best models are proposed for the melt-pool geometry predictions. The following conclusions are drawn from the study:

This study proposes a reproducible ML framework for melt-pool geometry prediction incorporating five features and two target variables, more than are considered in state-of-the-art ML-assisted LPBF process analyses. The consideration of both material and processing parameters and the inclusion of wide-ranging experimental and numerical modeling data enhance the robustness of the melt-pool geometry prediction.

The HP tuning approach taken in this study helps achieve significant improvement in the ML model performance when compared to the base model with default parameters. Other feature engineering techniques such as polynomial feature expansion and data augmentation do not show significant improvement in results for the dataset used in this study.

For the melt-pool width, the Extra Trees model gives the best prediction accuracy with an MAE of 1.028%, while the Gaussian process model produces the best prediction accuracy for the melt-pool depth with an MAE value of 0.623%. The dataset contains noisy data from diverse sources for the same material. Due to the presence of randomness in the dataset, the ensemble tree-based models perform better than the conventional ML models.

It is observed from the heatmap correlation matrix that the laser power has the highest Pearson correlation coefficient followed by scanning speed, layer thickness, spot size, and porosity for melt-pool width prediction. For melt-pool depth, the order is laser power, scanning speed, spot size, layer thickness, and porosity.

The global sensitivity analysis shows that laser power and laser scanning speed are the most dominant factors affecting the melt-pool geometry followed by the porosity, spot size, and layer thickness.

The optimization study conducted by the ML-based approach generated 99 combinations of optimum processing parameters, keeping the material parameters constant. Four representative sets of values are proposed in this study. The conditions for optimization are determined from the concept of achieving the maximum melt-pool volume with respect to the minimum energy density of the laser.

This study presents a robust ML analysis using a dataset comprising 830 samples, acknowledging that this quantity may be increased in the future for a deep learning-based analysis. Subsequent research efforts may benefit from incorporating larger datasets to facilitate the application of deep learning techniques. Future scope may also include image processing-based melt-pool geometry prediction. Though the proposed ML framework is validated for the LPBF process, it can be applied to the analysis (i.e., target variable prediction, defect detection, and characterization) of other additive manufacturing techniques such as fused deposition modeling, direct ink writing, directed energy deposition, and additive friction-stir deposition processes.

## Acknowledgment

The authors acknowledge the support of the US National Science Foundation under Grant No. OIA-1946231, and the Louisiana Board of Regents for the Louisiana Materials Design Alliance (LAMDA).

## Funding Data

This project is mainly funded by the US National Science Foundation under Grant No. OIA-1946231, and the Louisiana Board of Regents through the Louisiana Materials Design Alliance (LAMDA). Partial funding has come from the support of the Louisiana Board of Regents through the Board of Regents Support Fund, Contract No. LEQSF (2023-26)-RD-A-19, Program: R&D, Research Competitiveness Subprogram (RCS).

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The dataset and code used in the paper are available upon request. Information about the hyperparameter tuning is provided in the supplemental material file available in the Supplemental Materials on the ASME Digital Collection.

## Nomenclature

- $h$ = convection coefficient
- $k$ = thermal conductivity
- $P$ = laser power
- $S$ = beam penetration depth
- $c_p$ = specific heat
- $l_t$ = powder layer thickness
- $v_s$ = scanning speed
- $L_f$ = latent heat of fusion
- $T_{in}$ = initial temperature
- $T_S$ = solidus temperature
- $T_L$ = liquidus temperature
- $\epsilon$ = emissivity
- $\eta_l$ = laser absorption efficiency
- $\mu$ = effective viscosity
- $\rho_{powder}$ = apparent density
- $\rho_{solid}$ = solid density
- $\Phi$ = spot size
- $\phi$ = powder porosity