Abstract

Despite the rapid adoption of deep learning models in additive manufacturing (AM), significant quality assurance challenges persist. These challenges are compounded by the limited availability of sample objects for monitoring AM-fabricated builds. Thus, this study advances an emerging diffusion generative model, i.e., the denoising diffusion implicit model (DDIM), for layer-wise image augmentation and monitoring in AM. The generative model can be used to generate potential layer-wise variations, which can be further studied to understand their causation and prevent their occurrence. The proposed models integrate two newly developed kernel-based distance metrics into the DDIM framework for effective layer-wise AM image augmentation. These metrics include a modified version of the kernel inception distance (m-KID) as well as an integration of m-KID and the inception score (IS) termed KID-IS. These novel integrations demonstrate great potential for maintaining both similarity and consistency in AM layer-wise image augmentation, while simultaneously exploring possible unobserved process variations. In the case study, six different cases based on both metal-based and polymer-based fused filament fabrication (FFF) are examined. The results indicate that both the proposed DDIM/m-KID and DDIM/KID-IS models outperform the four benchmark methods, including the popular denoising diffusion probabilistic model (DDPM) and three generative adversarial networks (GANs). Overall, DDIM/KID-IS emerges as the best-performing model with an average KID score of 0.840, m-KID score of 0.1185, peak signal-to-noise ratio (PSNR) of 18.150, and structural similarity index measure (SSIM) of 0.173, demonstrating strong capabilities in generating potential AM process variations in terms of layer-wise images.

Graphical Abstract Figure

1 Introduction

Additive manufacturing (AM) is one of the most revolutionary technologies in advanced manufacturing, capable of printing complex objects layer by layer [1]. Despite the widespread use of AM, its significant process uncertainty leads to tremendous quality assurance challenges in real-world engineering practices [2]. Certain process variations or uncertainties, such as voids, cracks, and underfill layers, can be directly observed and remedied by leveraging appropriate design of experiments as well as online monitoring and control techniques [3]. However, some process variations cannot be easily identified as the AM process might be highly nonstationary. Such variations from various sources may not always be observed in early stages, and thereby their accumulation could greatly impact the quality of the final printed object [4]. One way to improve AM quality assurance is to identify and predict the potential unobserved process variations before they occur. By doing so, those predicted unobserved process variations can be studied to understand their causation and ensure that they can be prevented in future printing.

Furthermore, though advanced data-driven techniques like deep learning are powerful for AM monitoring, control, and prediction, they often require a large training dataset [5], which can be prohibitively difficult to obtain for many AM practitioners. Thus, there is an urgent need for effective AM layer-wise monitoring in instances of limited data availability. Data augmentation is a common solution to address data availability issues, but conventional approaches come with some critical limitations. Basic augmentation, for example, lacks diversity. The popular synthetic minority oversampling technique (SMOTE) [6], which has been used in AM to generate synthetic samples, cannot always completely capture the data distribution [7]. Therefore, this work aims to develop an effective data-driven tool for functional layer-wise image augmentation, while simultaneously capturing possible process variations. Using this tool, such generated variations can be further studied to prevent them in future printing.

To address the data availability issues, the emerging generative AI-based data augmentation framework can be potentially leveraged to advance the capability of online AM layer-wise monitoring. In recent years, generative AI models have witnessed tremendous growth in the fields of image synthesis, temporal modeling, computer vision, and many interdisciplinary applications [8]. They aim to learn the underlying distributions of the training data, and therefore, they can potentially identify unobserved process variations and generate new possible sample sets. In the field of data augmentation, they have long been dominated by generative adversarial networks (GANs) [8]. Recently, diffusion models (DMs) have also emerged as viable alternatives with comparable or even superior performance compared to GANs [8].

While the development of generative AI models has achieved significant advancements in recent years, there are still some remaining challenges when applied to practical AM applications. First, there is a need to deal with the complexity of AM data, which may have high dimensionality, limited sample size, and complex underlying features [9–11]. Such highly complex AM underlying features include the microstructure and macrostructure of printed products, the material used, the process mechanism, the functionality of the part, as well as the design geometry [9]. Therefore, it could be difficult for certain generative AI models to capture the underlying distributions of the AM training data. This leads to some critical issues, such as mode collapse, vanishing gradients [12,13], poor generalization [13], or optimization and memorization issues [14,15]. Another challenge is related to the learning effectiveness of those models. Deep learning models generally rely on large amounts of training data to achieve high accuracy, and limited training data may lead to overfitting [5]. Although learning effectiveness can be improved by developing stronger network architectures and incorporating powerful regularization or optimization techniques, these tasks are not always easily achievable [5,16]. Finally, another challenge for generative AI modeling in AM monitoring lies in the paradox between similarity and diversity. Newly generated samples must maintain a level of similarity or consistency to the original samples, while also balancing this similarity with diversity. Specifically, in this study, it is ideal to introduce new data with possible/realistic variations (i.e., diversity), but it is also necessary to avoid overly focusing on diversity, since too much emphasis on diversity can lead the model to hallucinate. Therefore, it is crucial to determine the appropriate balance between similarity and diversity.

In this study, a new diffusion generative model-based framework is developed by incorporating newly established kernel-based distance metrics to generate effective layer-wise AM images. An effectively generated layer-wise AM image should be similar to the real images, realistic, and also never observed in the training data, i.e., be new, which is further explained in Sec. 3.1. The proposed framework can capture possible variations caused by the highly nonlinear process dynamics. To validate the proposed method, a high-resolution camera was used to capture AM layer-wise surface images during a fused filament fabrication (FFF) process. Subsequently, the AM layer-wise images were used to train the model and generate novel images, with the ability to predict unseen layer-wise variations. The generated samples were validated through a three-step quality assessment routine. The technical contributions of this study can be summarized as follows:

  1. The proposed models improved the denoising diffusion implicit model (DDIM) by incorporating novel kernel-based distance metrics to enable high-quality AM layer-wise image data augmentation, which can address the limited data availability issue in the image-based AM layer-wise monitoring.

  2. These newly proposed distance metrics can also serve as posttraining quality assessment tools for evaluating the generated images against the real images. In addition, the realism of the generated images is validated with a reality detection model.

  3. The proposed methodology, with a more effective balance between similarity and diversity, can predict possible layer-wise variations. Therefore, it can serve as a monitoring tool for AM.

The rest of this article is structured as follows. Section 2 introduces a review of machine learning (ML)-enabled layer-wise monitoring and the applications of generative models for AM. Section 3 presents the proposed research methodology, while Sec. 4 exhibits the various cases used to validate the effectiveness of the proposed approach. Finally, Sec. 5 summarizes the conclusion and discusses the future research directions.

2 Literature Review

This section presents an overview of the research addressing AM quality assurance issues. More specifically, Sec. 2.1 discusses current studies that have utilized ML-enabled models for layer-wise monitoring of AM, while Sec. 2.2 summarizes generative models that have been applied to AM. Finally, Sec. 2.3 presents the existing research gaps.

2.1 Machine Learning-Enabled Additive Manufacturing Layer-Wise Monitoring.

Due to the complexity of the AM process, many conventional methods, such as statistical control charts, are not always an appropriate option. For example, the direct application of statistical control charts usually requires a large sample size to establish the control limits well. Also, due to the complex underlying features of AM data, it is very challenging for a control chart to directly capture such underlying features. Therefore, many ML techniques have been used to address quality assurance issues in various AM technologies [17]. For example, Shi et al. [18] utilized a long short-term memory (LSTM) autoencoder to extract important features from side channels, which were fed into both supervised (classification) and unsupervised process monitoring models for FFF AM process alteration detection. In another study, the authors also used an LSTM autoencoder to extract important features from spectra, which were then used for a K-means clustering classification of deposition quality (qualified, unqualified, or unknown) in the directed energy deposition (DED) process [19]. For process monitoring and control in metal AM, melt pool (the melted zone) characteristics are important features to monitor to establish the quality of the printed parts. To do so, Akbari et al. [20] used several material and processing parameters and various ML models to predict melt pool geometry (length, depth, and width) and classification (balling, desirable, lack of fusion, and keyhole) in the DED or powder bed fusion (PBF) process. The ML models of this study included random forest (RF), support vector machine (SVM), neural network, XGBoost, ridge regression, and a Gaussian process (GP) model. Furthermore, other models integrate feature extraction methods and ML models such as SVM and RF for anomaly detection using melt pool images in the DED AM process [21,22]. One of the most destructive defects to part quality is cracks; therefore, Kononenko et al. [23] proposed to use waveforms from acoustic signals with ML (SVM, RF, GP, logistic regression) to classify between crack and noise in laser-PBF (L-PBF).

In addition, several studies have utilized visual imaging analysis for variation detection within AM layers. Convolutional neural networks (CNNs), suited for grid-like data, have shown great potential for interlayer variation quantification, anomaly detection, and imperfection detection in AM [24,25]. For instance, a study used a CNN model to classify layer-by-layer failures at various temperatures and speeds with an accuracy of 96% over 21 classes [26]. Trinks and Felden [27] performed image mining-based rapid prototyping quality monitoring by comparing seven algorithms for fault detection, where the CNN model ranked among the top performers. Patel et al. [28] performed dross defect detection in the L-PBF process using a YOLO CNN.

Furthermore, extensive studies have also been conducted for monitoring and controlling AM using multi-sensor data. For instance, a recent study by Mahato et al. [29] on porosity detection used pyrometer time series data, k-nearest neighbors (kNN), and dynamic time warping for pore defect classification in the PBF process. Point clouds are point representations of an object in a 3D space and can be used to capture defects caused by small process shifts [4]. Ye et al. [4] used 3D point clouds to determine miniature variations, such as those caused by small process shifts in the FFF process. Using approximately 600 samples and a deep forest model, the obtained accuracy was high enough for AM quality assurance. Similarly, Lyu et al. [30] turned point clouds from a 3D scanner into 2D images and then fed them into several ML models (SVM, kNN, CNN, and a hybrid convolution autoencoder (HCAE)) for anomaly classification based on F-score. Although the HCAE had the best performance, turning the point clouds into 2D images can lead to some key information being lost in the transformation process. Therefore, Yangue et al. [3] utilized 3D point clouds directly (without turning them into images) to capture the process dynamics of the printed layers. Their model consisted of an integrated CNN autoencoder-LSTM network to predict the plane layer-wise surface morphology of the FFF process. Overall, ML has played a significant role in enabling layer-wise monitoring and enhancing quality assurance in diverse applications of AM.

2.2 Generative Models for Additive Manufacturing.

Generative models are algorithms used to study the underlying probabilistic distributions of training data to generate new sample sets [8]. They include models such as energy-based models [31], variational autoencoder (VAE) [32], GAN [33], normalizing flow [34], and DM [35].

In a study on selective laser melting (SLM), the researchers utilized deep belief networks (DBNs) for defect detection [36]. In another SLM process study, the same authors developed a tool for in situ process monitoring based on spatter and plume image signatures. The tool utilized an improved DBN, achieving a classification rate of about 83.4% [37]. DBNs are powerful; however, their complexity is a major barrier to wider adoption [38]. VAE is a powerful tool used for encoding data into a lower-dimensional latent code, which can then be utilized for further study. Song et al. [39] integrated a VAE and a GAN into a hybrid deep generative prediction network for pore morphology prediction in the metal AM process. Ghayoomi Mohammad et al. [40] used a three-phase model for defect detection in L-PBF, where k-means clustering and a neural network were used to label the data. The outputs were then matched to the corresponding defect types of the acoustic signals, and the resulting model successfully detected the acoustic emission signals. Lew and Buehler [41] used a VAE to extract a 2D latent code from cantilever structures and then fed the code to an LSTM to study the trajectories linking it to the optimization procedure.

GAN is by far one of the most used generative models in AM [8]. For instance, many studies have utilized GANs, such as the augmented time regularized-GAN [42], for data augmentation. These models generate additional samples used to train supervised models. Chung et al. [7] used a standard GAN to address the issue of class imbalance in classification due to the rare occurrence of anomalies compared to normal data in the FFF and electron beam melting processes. In addition, Hertlein et al. [43] used a conditional GAN for topology optimization. A variety of studies have leveraged GANs for defect detection, data augmentation due to data imbalance, and pattern identification and segmentation [7,44]. On the other hand, the DM is another type of generative model that has shown great potential for image augmentation in computer vision [45]. DMs are capable of generating high-quality images similar to or even better than those generated by GANs [8]. This technique, which has not been fully applied to AM layer-wise images, holds the potential to revolutionize AM studies requiring the use of generative models. For instance, Zhang et al. [46] proposed a deep learning-based image super-resolution model that maps low-resolution high dynamic range fringe projection measurements to a higher resolution for layer-wise surface topography measurement in L-PBF AM. Comparing a residual dense-based CNN with two denoising diffusion probabilistic models (DDPM and DDPM-SR3 with refinement steps), the DDPM-SR3 turned out to have the best performance. Ogoke et al. [47] used DDPMs for melt pool image super-resolution to map low-fidelity simulation information to a higher fidelity. However, despite the growing popularity of DMs such as DDPM, they still face challenges in terms of speed, data structure diversification, dimensional reduction, and likelihood optimization [8].

2.3 Research Gaps.

Numerous studies have developed accurate deep-learning tools for image-based layer-wise monitoring and control of AM, but they generally require a large training dataset to capture both within- and between-layer patterns and extract features. Though models such as GANs can generate fast and high-quality images, they sometimes demonstrate poor distribution learning and thus generate poor-quality images limited in diversity. This limitation is usually due to issues such as nonconvergence, optimization difficulties, architecture selection, vanishing gradients, or mode collapse [48,49]. Furthermore, many advanced generative models have been trained using natural image data, such as human faces or dogs (see Fig. 1(b)), whereas AM layer-wise images have complex patterns and textures, as shown in Fig. 1(a). In addition, diffusion-based models are insufficiently used in AM, even though they can bypass to some extent the issues of mode collapse and training instability faced by some GAN models, and they can generate high-quality and diverse samples since their likelihood-based training can cover certain regions of the training data [8]. Thus, there is a need to develop a diffusion-based model to generate high-quality image augmentation for layer-wise images of AM with a balance between similarity and diversity. Furthermore, the model should be capable of generating possible layer-wise variations unseen during training.

Fig. 1
(a) An FFF layer-wise image sample and (b) an image of a dog

3 Research Methodology

The research methodology section is subdivided into five subsections. Section 3.1 presents the overall framework of the proposed methodology. Section 3.2 introduces the DM leveraged in this study, followed by Sec. 3.3, which presents the novel distance metrics proposed to advance the DM. Section 3.4 describes the quality assessment steps used to validate the generated images. Finally, Sec. 3.5 describes the hyperparameter tuning and the training procedure in this work.

3.1 Overall Methodological Framework.

This section presents our overall methodological framework. Figure 2 depicts the structure of the proposed model with a supplemental quality assessment approach to validate the reality, similarity, and diversity of the AM layer-wise images generated by our proposed DM. In this work, an FFF 3D printer is utilized to print several AM objects with various patterns, and a high-resolution camera is used to take layer-wise images of each printed object (more details are presented in Sec. 4.1). Afterward, the AM layer-wise images go through a data preprocessing step before being fed into the DM for generating high-quality images. The leveraged DM, the DDIM, is briefly introduced in Sec. 3.2.

Fig. 2
The structural overview of the proposed method

A key contribution of this work is advancing the DDIM by incorporating two newly proposed kernel-based metrics to strike a balance between similarity and diversity while generating images, as presented in Sec. 3.3. Moreover, to validate the generation outcomes, the generated images undergo three sequential quality assessment checks. The first assessment (similarity check) consists of using established image quality metrics to determine the likeness of the generated images against the real images. The second assessment is the reality check to assess the quality of generated images via a reality detection model. Finally, the third assessment is the diversity check, where the proposed model is used for layer-wise monitoring for the identification of possible AM process variations.

3.2 Diffusion Generative Model.

DMs are emerging algorithms that exhibit impressive generative capacities comparable to those of GANs [50,51]. A DM consists of transforming training images into random noise and then reconstructing the data into new samples by studying the transformation steps [50,51]. These steps are called forward and reverse diffusion as displayed in Fig. 3. The reverse diffusion is similar to a denoising autoencoder with no bottlenecks [52]. Popular DMs include the score stochastic differential equation models [53], the score-based generative models [54], and the DDPM [51].

Fig. 3
The overall idea of a diffusion generative model

For this study, the DDIM is utilized, which is similar to a DDPM but implements a non-Markovian diffusion process instead of the Markov chains used in DDPM applications [50], as shown in Fig. 4. In practice, DDPM usually requires a large number of iterations and steps to produce high-quality generated images, which is mainly caused by the employed Markov chain process. This makes DDPM computationally slow and expensive compared to GAN. By overcoming this barrier, the DDIM can achieve much faster computation, more consistent properties, and better image interpolation than the standard DDPM [50]. Therefore, DDIM is a viable candidate to generate high-quality AM layer-wise images. After an image has been mixed with noise, a CNN autoencoder-based segmentation network (namely, the U-Net [55]) is used to denoise the noisy input. The U-Net gradually downsamples and then upsamples the AM layer-wise images while linking layers with skip connections (which allow the model to learn image denoising).

Fig. 4
Markovian versus non-Markovian diffusion process
The DDIM-generated outputs can be derived from three steps: (1) deriving the non-Markovian forward process, (2) defining the generative process and variational inference objective, and (3) generating the DDIM samples. First, DDIM tries to understand the input distribution of the AM training real images $R:(x_1,\ldots,x_t)$. Understanding the input distribution of $R$ can ensure that the generated images capture both similarity for data augmentation and diversity for possible AM process variations. The AM layer-wise images $R$ go through a forward process initiated by $i_\varphi$ in which $R$ are mixed with Gaussian noise at a noise rate of $\omega_t$. The inference distribution $i_\varphi$ is established to define the forward process from Bayes' rule, where $\varphi$ controls the stochasticity of the forward process. Note that this process is non-Markovian (Fig. 4), where each $x_t$ is dependent on $x_0$ and $x_{t-1}$, and this approach should further improve the generated samples' quality with fewer time-steps $t$. $i_\varphi$ is calculated using Eq. (1).
\[ i_\varphi(x_t|x_{t-1},x_0) = \frac{i_\varphi(x_{t-1}|x_t,x_0)\, i_\varphi(x_t|x_0)}{i_\varphi(x_{t-1}|x_0)} \]
(1)
After the noisy input $x_t$ is obtained, a prediction of $x_0$ is used with a reverse conditional distribution $i_\varphi(x_{t-1}|x_t,x_0)$ to determine a sample $x_{t-1}$. The denoised observation prediction $d_\theta^{(t)}(x_t)$ of layer $x_0$ given $x_t$ is drawn by Eq. (2). $d_\theta^{(t)}(x_t)$ is predicted using the noise variable $\tau_\theta^{(t)}$, the noise rate $\omega_t$, and an observation sample $x_t$. The noise variable $\tau$, in some cases, could follow the Gaussian distribution $N$ of mean 0 with the identity matrix $I$ as the covariance matrix, i.e., $\tau_t \sim N(0, I)$. $\theta$ represents the learnable parameters capturing the probabilistic relationship between the noisy and the clean images. Afterward, $d_\theta^{(t)}(x_t)$ of Eq. (2) is used to finalize the generative process $g_\theta^{(t)}(x_{t-1}|x_t)$ established by Eq. (3). The generated layer-wise AM sample $x_{t-1}$ from $x_t$ is established by Eq. (4).
\[ d_\theta^{(t)}(x_t) := \frac{x_t - \sqrt{1-\omega_t}\,\tau_\theta^{(t)}(x_t)}{\sqrt{\omega_t}} \]
(2)
\[ g_\theta^{(t)}(x_{t-1}|x_t) = \begin{cases} N\!\left(d_\theta^{(1)}(x_1),\, \varphi_1^2 I\right) & \text{if } t = 1 \\ i_\varphi\!\left(x_{t-1}\,|\,x_t,\, d_\theta^{(t)}(x_t)\right) & \text{otherwise} \end{cases} \]
(3)
\[ x_{t-1} = \sqrt{\omega_{t-1}}\left(\frac{x_t - \sqrt{1-\omega_t}\,\tau_\theta^{(t)}(x_t)}{\sqrt{\omega_t}}\right) + \sqrt{1-\omega_{t-1}-\varphi_t^2}\,\tau_\theta^{(t)}(x_t) + \varphi_t\tau_t \]
(4)
where $\frac{x_t - \sqrt{1-\omega_t}\,\tau_\theta^{(t)}(x_t)}{\sqrt{\omega_t}}$ is the predicted $x_0$, $\sqrt{1-\omega_{t-1}-\varphi_t^2}\,\tau_\theta^{(t)}(x_t)$ is the direction pointing to $x_t$, and $\varphi_t\tau_t$ is the random noise.

Through these steps, new samples, denoted as $G$, are generated by capturing the underlying distribution of the training samples. The overall training procedure of the DDIM is illustrated by Algorithm 1.

Algorithm 1: General DDIM training procedure

Input: Layer-wise images $R(x_1,\ldots,x_t)$, time-steps $t \in \{1,\ldots,T\}$, noise rate $\omega_t$, parameter $\theta$

Repeat

 Step 1: Initialize forward process iφ

 Step 2: Add Gaussian noise $\tau_t \sim N(0, I)$ to real samples $R(x_1,\ldots,x_t)$

 Step 3: Denoise sample $x_t \rightarrow d_\theta^{(t)}(x_t)$ using Eq. (2)

 Step 4: Initialize the generative process $g_\theta^{(t)}(x_{t-1}|x_t)$ given $x_t$

 Step 5: Calculate the generated sample $x_{t-1}$ using Eq. (4)

 Step 6: Backpropagate and optimize the model parameters based on sample and noise loss

Until converged
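To make Steps 3–5 of Algorithm 1 more concrete, below is a minimal NumPy sketch of a single DDIM reverse update based on Eqs. (2) and (4). It is a non-authoritative illustration: the noise-prediction callable `noise_model` (the U-Net in this study), the rate schedule `omega`, the stochasticity schedule `phi`, and the function name are placeholders rather than the authors' implementation.

```python
import numpy as np

def ddim_reverse_step(x_t, t, noise_model, omega, phi, rng=None):
    """One DDIM reverse update x_t -> x_{t-1}, following Eqs. (2) and (4).

    x_t         : noisy image batch at step t, shape (batch, height, width, channels)
    noise_model : callable predicting the noise tau_theta^(t)(x_t)
    omega       : rate schedule omega_t indexed by step (values in (0, 1])
    phi         : stochasticity schedule phi_t (phi_t = 0 gives the deterministic DDIM)
    """
    rng = rng or np.random.default_rng()
    tau_pred = noise_model(x_t, t)  # predicted noise tau_theta^(t)(x_t)

    # Eq. (2): denoised observation prediction d_theta^(t)(x_t), i.e., the predicted x_0
    x0_pred = (x_t - np.sqrt(1.0 - omega[t]) * tau_pred) / np.sqrt(omega[t])

    # Eq. (4): predicted x_0 term + direction pointing to x_t + random noise
    direction = np.sqrt(1.0 - omega[t - 1] - phi[t] ** 2) * tau_pred
    noise = phi[t] * rng.standard_normal(x_t.shape)
    x_prev = np.sqrt(omega[t - 1]) * x0_pred + direction + noise
    return x_prev, x0_pred
```

Setting $\varphi_t = 0$ yields the deterministic DDIM sampler, while larger $\varphi_t$ injects more randomness, reflecting the stochasticity control of $\varphi$ described above.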

However, notably different from natural images, AM layer-wise images demonstrate very specific patterns. Therefore, certain DDIM-generated distributions could be considered unrealistic if they are physically impossible to appear in AM practice (see Fig. 15). Because of this, the DDIM needs to be further tailored for AM so that it can balance the diversity and similarity of the inference distribution. This motivates the development of appropriate metrics for DDIM training. Such metrics update the model per generated batch of images $G$ to improve the quality of the generated images at the next step $t$. More details about the proposed metrics to advance DDIM for AM are presented in Sec. 3.3.

3.3 Distance Metric to Balance Similarity and Diversity.

Image quality assessment has long been dominated by human visual judgment, which is subjective and difficult to reproduce. Although efforts have been made to develop subjective and objective metrics [56,57], many of these metrics exhibit some limitations. For instance, they sometimes fail to identify obvious visual differences between the real and generated images. This study aims to generate high-quality layer-wise AM images that are both sufficiently similar to the training AM images and diverse enough to capture possible underlying variation patterns. In terms of evaluation metrics, two metrics have been widely used to measure image quality in terms of the balance between similarity and diversity. These metrics are the inception score (IS) [58] and the Fréchet inception distance (FID) [59], displayed in Eqs. (5) and (6), respectively.
\[ IS = \exp\!\left(\mathbb{E}_{y \sim G}\left[D_{KL}\!\left(p(v|y)\,\|\,p(v)\right)\right]\right) \]
(5)
where $\mathbb{E}_{y \sim G}$ represents the expectation over the generated images $G$ and $y$ is an image sampled from $G$. $D_{KL}(\cdot)$ is the Kullback–Leibler (KL) divergence between the conditional class distribution $p(v|y)$ of the generated images and the marginal class distribution $p(v)$ of all generated samples. $v$ is the output class label.
\[ FID = \|\mu_R - \mu_G\|^2 + \mathrm{Tr}\!\left(C_R + C_G - 2\left(C_R C_G\right)^{1/2}\right) \]
(6)
where $\mu_R$ and $\mu_G$ are the means of the extracted features from the real and generated images. $\mathrm{Tr}(\cdot)$ represents the trace operation, and $C_R$ and $C_G$ are the covariance matrices of the extracted features from the real images ($R$) and the generated images ($G$), respectively.
The IS calculates the entropy of the probabilities of generated images and the KL divergence between the conditional and marginal class distributions. IS is a good metric to evaluate the quality and diversity of generated images, but it is not always capable of capturing certain desired distributions of datasets with a variety of different classes [58]. On the other hand, the FID [59] can measure the similarity between the generated and real images and capture distributions that the IS cannot capture. Similar to IS, FID also uses an inception network (a deep CNN-based architecture) and is based on the Wasserstein-2 distance. However, a study by Bińkowski et al. [60] has shown that FID is strongly biased when the sample size is small, meaning that it requires a large number of generated images before it can provide an unbiased score. Moreover, in practice, the FID metric is also computationally expensive. Therefore, the distance metric used in our DDIM model is the kernel inception distance (KID) developed by Bińkowski et al. [60], as shown in Eqs. (7a) and (7b). KID aims to improve upon the limitations of FID by measuring the maximum mean discrepancy (MMD) between the inception features of the real and generated images. KID assumes that kernel-extracted features can capture and balance the similarity and diversity of the generated samples [61], and it is usually unbiased even for small sample sizes.
\[ KID = MMD^2(R, G) \]
(7a)
\[ KID = \mathbb{E}_{x,x' \sim R}\left[k(x,x')\right] - 2\,\mathbb{E}_{x \sim R,\, y \sim G}\left[k(x,y)\right] + \mathbb{E}_{y,y' \sim G}\left[k(y,y')\right] \]
(7b)
where MMD is the maximum mean discrepancy between the extracted features from the real images ($R$) and generated images ($G$). More specifically, Eq. (7a) can also be rewritten with kernel expressions as shown in Eq. (7b), where $\mathbb{E}_{x,x' \sim R}[k(x,x')]$ denotes the expected mean kernel over features extracted from the real images ($R$), $\mathbb{E}_{y,y' \sim G}[k(y,y')]$ represents the expected mean kernel over features extracted from the generated images ($G$), and $\mathbb{E}_{x \sim R,\, y \sim G}[k(x,y)]$ represents the expected cross-mean kernel between the real and generated images. $x$ and $y$ are vector data points or samples from the data distributions of $R$ and $G$, respectively. Section 4.3.2 further presents a comparative study between KID and FID.
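To illustrate Eq. (7b), the following is a minimal NumPy sketch of the kernel MMD² estimate underlying KID, assuming the inception features of the real and generated images have already been extracted into two arrays. The polynomial kernel follows the form $k(x,y) = (x^{T}y/d + 1)^{3}$ reported in Ref. [60]; the feature arrays, function names, and the simple (biased) averaging are illustrative assumptions rather than the exact implementation used in this study.

```python
import numpy as np

def polynomial_kernel(a, b):
    """Original KID kernel k(x, y) = (x^T y / d + 1)^3 from Ref. [60]."""
    d = a.shape[1]  # feature dimension
    return (a @ b.T / d + 1.0) ** 3

def kid_score(feats_real, feats_gen, kernel=polynomial_kernel):
    """Plug-in estimate of KID = MMD^2(R, G) in Eqs. (7a)-(7b) on pre-extracted features."""
    k_rr = kernel(feats_real, feats_real)  # kernel values among real images
    k_gg = kernel(feats_gen, feats_gen)    # kernel values among generated images
    k_rg = kernel(feats_real, feats_gen)   # cross-kernel between real and generated
    # Simple biased estimator of Eq. (7b) (diagonal terms included)
    return k_rr.mean() - 2.0 * k_rg.mean() + k_gg.mean()

# Usage with random stand-in features; in practice the features come from an inception network
rng = np.random.default_rng(0)
feats_real = rng.standard_normal((50, 256))
feats_gen = rng.standard_normal((50, 256))
print(kid_score(feats_real, feats_gen))
```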
In this study, to further fit the scenario of AM layer-wise monitoring, two modified KID metrics are proposed to improve the quality of the generated AM layer-wise images. Note that the distance metrics can also be used post-augmentation to determine image quality, as discussed in Sec. 3.4. The original KID is based on the polynomial kernel in Eq. (8). Specifically, the original authors set their kernel to $k(x,y) = \left(\frac{1}{d}x^{T}y + 1\right)^{3}$ [60]. The proposed modified m-KID is the combination of two kernels: the polynomial kernel $k_p$ (see Eq. (8)) and the Gaussian kernel $k_g$ (Eq. (9)) with a weight coefficient $\alpha$. The final combined kernel $k_c$ is obtained by Eq. (10).
\[ k_p(x,y) = \left(\varphi\, x^{T} y + \varepsilon\right)^{d} \]
(8)
\[ k_g(x,y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\|x-y\|^{2}}{2\sigma^{2}}\right) \]
(9)
\[ k_c(x,y) = \alpha\, k_p(x,y) + (1-\alpha)\, k_g(x,y) \]
(10)
where, in this study, according to preliminary experiments and without loss of generality, we set $\varphi = 1$, $\varepsilon = 0.5$, $d = 2$, $\sigma = 1$, and $\alpha = 0.5$, with the normalization constant $\frac{1}{\sqrt{2\pi}\,\sigma}$ of the Gaussian kernel omitted in the final proposed model. Note that a high $\alpha$ gives more weight to the polynomial kernel, so the effect of a high $\alpha$ is similar to using only a polynomial kernel. On the other hand, a low $\alpha$ gives more weight to the Gaussian kernel. The ablation study in Sec. 4.2 discusses the effect of each kernel on the performance of KID and further demonstrates the impact of $\alpha$.
By incorporating the newly integrated kernel $k_c(x,y)$ of Eq. (10) into Eq. (7b), the first version of the modified KID (m-KID) is represented by Eq. (11).
\[ \text{m-KID} = \mathbb{E}_{x,x' \sim R}\left[k_c(x,x')\right] - 2\,\mathbb{E}_{x \sim R,\, y \sim G}\left[k_c(x,y)\right] + \mathbb{E}_{y,y' \sim G}\left[k_c(y,y')\right] \]
(11)
The second proposed metric, termed KID-IS, is an integration of the m-KID in Eq. (11) and the IS in Eq. (5). Though the baseline KID with the DDIM can produce some realistic images, the generated images still have a relatively high chance of containing unrealistic patterns. The intuition behind KID-IS is that IS can capture more of the diverse distribution of the training dataset; therefore, combining it with m-KID can further help to balance the expected similarity and diversity in the AM process. This addition should make the DDIM more robust in producing AM layer-wise images that are both reasonably realistic and diverse. Specifically, KID-IS is calculated using the harmonic mean. The harmonic mean averages m-KID and IS with an emphasis on their ratio and on lower values. Therefore, KID-IS is not significantly affected by extremely large values and thus presents a balanced representation of the integrated evaluation metrics. Equation (12) presents the calculation of KID-IS.
\[ \text{KID-IS} = \frac{2\,(\text{m-KID} \times IS)}{\text{m-KID} + IS} \]
(12)
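The two proposed metrics can be sketched compactly once kernel features are available. The snippet below is a hedged, non-authoritative sketch: it builds the combined kernel of Eq. (10) with the stated settings ($\varphi = 1$, $\varepsilon = 0.5$, $d = 2$, $\sigma = 1$, $\alpha = 0.5$, Gaussian normalization omitted), evaluates m-KID via Eq. (11) on pre-extracted feature arrays, and combines m-KID with an externally computed inception score through the harmonic mean of Eq. (12). Function names and the plain averaging are illustrative assumptions.

```python
import numpy as np

def combined_kernel(a, b, alpha=0.5, phi=1.0, eps=0.5, d=2, sigma=1.0):
    """Combined kernel k_c of Eq. (10): weighted polynomial (Eq. (8)) plus Gaussian (Eq. (9))."""
    k_poly = (phi * (a @ b.T) + eps) ** d                       # Eq. (8)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise ||x - y||^2
    k_gauss = np.exp(-sq_dists / (2.0 * sigma ** 2))            # Eq. (9), normalization omitted
    return alpha * k_poly + (1.0 - alpha) * k_gauss             # Eq. (10)

def m_kid(feats_real, feats_gen):
    """m-KID of Eq. (11): the MMD^2 of Eq. (7b) computed with the combined kernel k_c."""
    k_rr = combined_kernel(feats_real, feats_real)
    k_gg = combined_kernel(feats_gen, feats_gen)
    k_rg = combined_kernel(feats_real, feats_gen)
    return k_rr.mean() - 2.0 * k_rg.mean() + k_gg.mean()

def kid_is(m_kid_value, inception_score):
    """KID-IS of Eq. (12): harmonic mean of m-KID and the inception score IS."""
    return 2.0 * (m_kid_value * inception_score) / (m_kid_value + inception_score)
```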

Leveraging the DDIM as a base model, the training procedure incorporating the proposed distance metrics (m-KID or KID-IS), i.e., our proposed DDIM/m-KID and DDIM/KID-IS, is illustrated in Algorithm 2. Notably, an ablation study is also presented in Sec. 4.2 to demonstrate the effects of using different kernels and their combinations on the DDIM model.

Algorithm 2: DDIM training incorporated with the distance metrics

Input: Layer-wise images $R(x_1,\ldots,x_t)$, time-steps $t \in \{1,\ldots,T\}$, noise rate $\omega_t$, parameter $\theta$, maximum number of epochs $E$

Step 1: Initialize epoch=1, max and min signal rates, learning rate, and L2 regularization.

while $epoch \le E$ do

Step 2: for $t = 1,\ldots,T$ do

    Obtain $x_{t-1}$ by Algorithm 1

    Backpropagate and optimize the model parameters based on sample and noise loss output.

   end for

   return generated samples $G(y_1,\ldots,y_t)$, image loss, noise loss

Step 3: for $x \in R$ and $y \in G$ do

    Compute combined kernel-extracted features $k_c(x,y)$ using Eq. (10) from $k_p(x,y)$ and $k_g(x,y)$

    Use $k_c(x,y)$ to calculate m-KID($R,G$) with Eq. (11)

    If calculating KID-IS($R,G$), use IS and m-KID with Eq. (12)

   end for

   return KID-IS or m-KID

Step 4: Update exponential moving averages of weights of Algorithm 1

end while

3.4 Post-Training/Generation Quality Assessment.

As discussed in Sec. 3.1, the quality of the generated images after posttraining is assessed and validated through three sequential assessments. The first assessment consists of using several common evaluation metrics to determine the similarity of the generated images against the real images. These metrics include the kernel inception distance (KID) [60], the peak signal-to-noise ratio (PSNR) [56], and the structural similarity index measure (SSIM) [56]. In addition, as a reference, one of the proposed metrics (m-KID) is also used as a posttraining image quality metric.

The second assessment is termed a reality check, which assesses the quality of the generated images via a reality detection model. The reality detection model is developed using an architecture similar to unsupervised anomaly detection. This assessment determines whether the generated samples can be classified as real. An unsupervised autoencoder, namely a CNN-autoencoder (CAE), is leveraged and trained using real images. Subsequently, the trained model is used to evaluate whether our generated images are real or fake. The overall architecture of the reality detection model can be seen in Fig. 5. The layer-wise images pass through the CNN encoder of the CAE, and a Gaussian kernel density function is used to estimate the probability distribution $Z$ of the generated code. This estimated distribution threshold $Z_{th}$ is used as a deterministic parameter for the reality detection threshold. Afterward, the code goes through the CNN decoder of the CAE, and the reconstruction error $\varepsilon$ between the real and reconstructed images is determined. This estimated reconstruction error threshold $\varepsilon_{th}$ is used as the second deterministic parameter of the reality detection threshold.

Fig. 5
Overall architecture of the leveraged reality detection model
Notably, the reality detection threshold is determined by Eq. (13):
\[ \text{A generated image } (G_i)\ \text{is} \begin{cases} \text{fake}, & \text{if } Z_{G_i} < Z_{th} \text{ and } \varepsilon_{G_i} > \varepsilon_{th} \\ \text{real}, & \text{otherwise} \end{cases} \]
(13)
where $Z_{G_i}$ is the estimated density of an encoded generated image ($G_i$), and $\varepsilon_{G_i}$ is the estimated reconstruction error between the generated image ($G_i$) and the decoded generated image. $Z_{th}$ and $\varepsilon_{th}$ represent, respectively, the estimated threshold density of the encoded real images and the estimated threshold error between the real images and the decoded/reconstructed real images.
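As an illustration of how the decision rule in Eq. (13) could be applied in code, the sketch below assumes a trained CAE encoder/decoder, a Gaussian kernel density estimate fitted on the codes of the real images, and thresholds $Z_{th}$ and $\varepsilon_{th}$ already computed from the real images. The function and argument names, the use of `scipy.stats.gaussian_kde`, and the per-image mean squared reconstruction error are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def classify_generated(images_gen, encoder, decoder, kde_real, z_th, eps_th):
    """Apply the reality detection rule of Eq. (13) to a batch of generated images.

    encoder/decoder : the two halves of the trained CAE (callables)
    kde_real        : Gaussian KDE fitted on the flattened codes of the real images
    z_th, eps_th    : density and reconstruction-error thresholds from the real images
    """
    codes = np.asarray(encoder(images_gen))                     # latent codes of G_i
    z_density = kde_real(codes.reshape(len(images_gen), -1).T)  # estimated density Z_{G_i}
    recon = np.asarray(decoder(codes))                          # reconstructed images
    # Reconstruction error eps_{G_i}; mean squared error per image is assumed here
    eps = ((np.asarray(images_gen) - recon) ** 2).mean(axis=(1, 2, 3))

    # Eq. (13): fake if the density is below Z_th AND the error is above eps_th
    is_fake = (z_density < z_th) & (eps > eps_th)
    return np.where(is_fake, "fake", "real")
```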

Finally, the third assessment is the diversity check, in which the proposed model is used for layer-wise monitoring to identify possible AM process variations. These variations are process deviations occurring during printing, which may or may not impact the quality of the printed product. DMs can understand the probability distribution of their training inputs. Therefore, the DDIM needs to be further investigated to determine its capability to foresee possible layer-wise AM variations. To do so, the layer-wise images are separated into training and testing layers. The training layers undergo basic data augmentation and are then fed to the proposed model to generate new images. Through a manual visual search, some of the generated images are selected and compared to the testing layers not seen by the model. The goal is to determine whether the DDIM model has inherently predicted any possible variations.

3.5 Hyperparameter Tuning and Model Training.

The AM layer-wise images go through a specific training procedure. The layer-wise images are first preprocessed and split into training and validation sets to facilitate the evaluation metric calculations. Afterward, the layer-wise images go through the following diffusion process steps for data augmentation:

  1. The network hyperparameters are initialized (see Sec. 4.1 for hyperparameters details)

  2. DDIM training loop until convergence:

    • Forward diffusion: The training layer-wise images are mixed with Gaussian noise at a given noise rate and continuous diffusion time. A diffusion schedule measures the signal and noise levels within the images as the signal rate and noise rate, which are the cosine and sine of the diffusion angle, respectively (see the sketch after this list).

    • Reverse diffusion: The noisy inputs go through the neural network at each diffusion step, which separates the noisy images into noise and the cleaned images.

    • The loss, the mean absolute error (MAE) between the generated and real images, is measured and the gradient is backpropagated through the network. The proposed distance metrics m-KID and KID-IS are also computed between the real and generated images per batch. An exponential moving average rate is used to average changes in the image evaluation metrics per batch during training.

    • The AdamW optimizer [62] updates the network weights. The DDIM iterates through these steps for the selected number of epochs until convergence.
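For reference, below is a minimal sketch of the cosine-based diffusion schedule described in the forward diffusion step above, using the minimum/maximum signal rates of 0.02/0.98 later listed in Table 3. The linear interpolation of the diffusion angle and the function name are illustrative assumptions about the implementation.

```python
import numpy as np

def diffusion_schedule(diffusion_times, min_signal_rate=0.02, max_signal_rate=0.98):
    """Cosine-based diffusion schedule: signal_rate = cos(angle), noise_rate = sin(angle).

    The diffusion angle is linearly interpolated between the angles corresponding to the
    maximum and minimum signal rates, so signal_rate**2 + noise_rate**2 = 1 at every time.
    """
    start_angle = np.arccos(max_signal_rate)
    end_angle = np.arccos(min_signal_rate)
    angles = start_angle + diffusion_times * (end_angle - start_angle)
    return np.cos(angles), np.sin(angles)

# Forward diffusion mixes clean images and Gaussian noise using these rates, e.g.:
# noisy_images = signal_rates * clean_images + noise_rates * gaussian_noise
signal_rates, noise_rates = diffusion_schedule(np.linspace(0.0, 1.0, 5))
```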

4 Case Study

4.1 Experimental Setup and Data Collection.

Six experimental cases are conducted to generate real-world layer-wise images from the FFF process using copper and polylactic acid (PLA) filaments. A Prusa I3 3D printer is used for both metal FFF and polymer FFF printing. For metal FFF, the filament spool is suspended by a PVC-pipe-based support structure (as illustrated in Fig. 6(a)). This allows pre-heating of the metal filament by a filament warmer before it enters the printing head in Fig. 6(b), which avoids breakage of the metal filament.

Fig. 6
The experimental setup on an FFF 3D printer in this study

To facilitate layer-wise imaging, a HAYEAR 4K UHD 12MP microscope camera is positioned so that it overlooks a small section of the bed. Furthermore, a Neoteck height probe is used to make fine adjustments to the focus of the camera. For data collection, 1 cm³ cubes are printed with varying infill patterns, and the G-codes are adapted to pause between layers. Once each layer has been printed, the print bed is moved under the microscope camera and the height probe is used to perform fine adjustments of the camera focus. Subsequently, the layer-wise image (with dimension 1920 × 1080) is captured, and the printing of the next layer is resumed. The process parameters and the infill patterns/designs of each case are summarized in Tables 1 and 2.

Table 1

Printing process parameters

Process parameter | Values
Nozzle temp | 215 °C
Bed temp | 65 °C
Infill printing speed | 80 mm/s
Perimeter printing speed | 60 mm/s
Non-printing speed | 130 mm/s
Infill pattern | Rectilinear
Infill density | 100%
Layer height | 0.3 mm
First layer height | 0.35 mm
Nozzle diameter | 0.6 mm
Table 2

Cases materials and infill pattern

Case | Material | Infill pattern
1 | Copper | Solid, 45 deg/135 deg
2 | PLA | Solid, 45 deg/135 deg
3 | Copper | Solid, 0 deg/90 deg
4 | PLA | Solid, 0 deg/90 deg
5a | PLA | Hollow box, 45 deg/135 deg
5b | PLA | Hollow box, 45 deg/135 deg

Figure 7 shows samples of the six cases. Case 5a can be understood as a combination of cases 2 and 5b, where a hollow box is printed on top of the normal layer, whereas case 5b only consists of hollow box layer-wise images. After the layer-wise images for each case have been collected, they go through a data preprocessing phase. This phase includes image cropping (to focus on the region of interest), basic data augmentation (rotation, width and height shifts, zoom, flip, etc.), resizing, and Gaussian noise injection. Note that this preprocessing noise injection is a data augmentation technique to improve the overall data distribution, and it is different from the noise introduced by the DM itself. Each printed object has either 33 or 34 layers, which can be considered a small dataset for deep learning models. Therefore, there is a significant need for basic data augmentation. The Appendix shows the preliminary study done to determine the appropriate data augmentation (Table 10).

Fig. 7
AM layer-wise sample images for the six cases

To demonstrate the need for data augmentation, the baseline DDIM model is trained on the 32 layer-wise training images of case 1 (Fig. 8(a)). After training for 1000 epochs, the DDIM model could not generate any realistic images, as shown in Fig. 8(b). Therefore, basic augmentation is needed, and the layer-wise images are augmented using data augmentation 5 in Table 10. The proposed DDIM model is trained with the MAE loss instead of the mean square error (MSE), since the MSE loss leads to generating unrealistic images after a certain number of epochs, as seen in Fig. 8(c). The different parameters used and tuned during the model's training are summarized in Table 3. The model is trained using the AdamW optimizer [62]. It is worth noting that setting the width architecture to a higher dimension such as [64, 128, 256, 512] leads to incomplete generated images, as shown in Fig. 8(d). To facilitate the DDIM training, the layer-wise image resolution is resized to 128 × 128. Training is implemented using an NVIDIA RTX A2000 12 GB GPU, an NVIDIA Tesla T4 GPU, and the TensorFlow 2.10 library.

Fig. 8
(a) Original layer-wise image of case 1, (b) generated image using 32 layer-wise AM images with no augmentation after 1000 epochs, (c) generated image with MSE loss after denoising process, (d) generated image while using a higher architecture dimension in U-Net, and (e) image generated with LSGAN
Table 3

DDIM parameters setup in this study

Model parameter | Value
Ridge regularization (L2) | 0.0002
Exponential moving average rate | 0.999
Learning rate | 0.001
Minimum/maximum signal rates | 0.02/0.98
Embedding dimension/block depth/width | 32/2/[32, 64, 96, 128, 256]
Minimum/maximum frequencies | 1/1000

4.2 Ablation Study: Test With Different Kernels.

For this study, five individual kernels are studied and their performances evaluated, including the polynomial kernel $k_p$ (the original baseline kernel of KID in Eq. (8)), the Gaussian kernel $k_g$ (see Eq. (9)), the linear kernel $k_l$ (see Eq. (14)), the sigmoid kernel $k_s$ (see Eq. (15)), and the Laplacian kernel $k_{lp}$ (see Eq. (16)).
\[ k_l(x,y) = x^{T} y + r \]
(14)
\[ k_s(x,y) = \tanh\!\left(x^{T} y + r\right) \]
(15)
where $r = 1$.
\[ k_{lp}(x,y) = \exp\!\left(-\varphi\,\|x-y\|\right) \]
(16)
where $\varphi = 1$. Moreover, since the KID equation is made up of the mean kernels of the real, generated, and cross terms, one of the models in the ablation study uses three different kernels at once. The polynomial, Gaussian, and Laplacian kernels (together abbreviated PGL) are used to extract features from the real images, the generated images, and the cross term between real and generated images, respectively. The KID version for that combination, $KID_{PGL}$, is represented by Eq. (17):
\[ KID_{PGL} = \mathbb{E}_{x,x' \sim R}\left[k_p(x,x')\right] - 2\,\mathbb{E}_{x \sim R,\, y \sim G}\left[k_{lp}(x,y)\right] + \mathbb{E}_{y,y' \sim G}\left[k_g(y,y')\right] \]
(17)

In addition to the hyperparameter tuning, this ablation study is conducted to evaluate the effects of several types of kernel-based KID on the quality of the generated images. The augmented layers (using the finalized augmentation setting in Table 10) of case 1, with approximately 500 layer-wise images, are used as the dataset for this study. In Table 4, the average KID of the images generated within the latest epochs is reported.

Table 4

Evaluation of different kernels in KID

Model | Average m-KID in training
Polynomial (baseline) | 1.43
Gaussian | 2.00
Linear | 644
Sigmoid | 0
Laplacian | 10^−13
PGL | 20.25
Combined kernel (our model) | 0.55

Figure 9 depicts some of the generated images for each component of the study. Based on the results, using the linear kernel leads to extremely high KID values, while the generated images are blurred with either no clear visible patterns or unrealistic patterns. On the other hand, the Laplacian kernel results in extremely low KID values, while the generated images are still blurred and similar to the linear-based generated images. The sigmoid kernel leads to darkened generated images with a constant KID value of 0 throughout training. Such a low KID value usually implies that the generated images should be visually similar to the original, but this is not the case here, implying that this kernel confuses the KID metric and that the score is therefore meaningless. The Gaussian kernel led to the third lowest KID after the polynomial kernel, supporting the intuition of combining the two kernels in the proposed model, although the Gaussian kernel did lead to samples with a different color intensity than the original samples. The results of this ablation study explain why $\alpha$ in Eq. (10) is set to 0.5, providing an equal balance between the polynomial kernel and the Gaussian kernel. Moreover, as depicted in Fig. 10(a), the resulting KID for the PGL combination failed to measure the improvement of the DDIM from the noisy image to the cleared image, whereas our proposed modified kernel could detect the difference (see Fig. 10(b)). In summary, considering factors such as color intensity, patterns, similarity, and average KID, our proposed model produced images with the closest quality to the real images.

Fig. 9
Ablation study results comparing the effect of various kernels on the generated images
Fig. 10
Measured KID score versus epochs on ablation dataset using: (a) PGL and (b) our proposed kernel

4.3 Benchmarks and Evaluation Metrics

4.3.1 Benchmark Methods.

For the six cases used in this study, four benchmark methods are used to validate and demonstrate the effectiveness of our proposed models. These benchmarks include the Wasserstein generative adversarial networks (WGAN) [16], the DDPM [51], deep convolutional generative adversarial networks (DCGANs) [63], and the least-square GAN (LSGAN) [64].

WGAN is a modified version of the original GAN, which uses a critic and the Wasserstein distance [16]. DDPM serves as a benchmark since DDIM addresses DDPM's limitations. DCGAN is a specialized GAN using CNNs as the architecture of the generator and discriminator. Finally, LSGAN employs the least squares loss function [64]. All benchmarks have been evaluated with the same number of epochs as the proposed method except for the DDPM, due to its extremely high computational demands; DDPM is evaluated with fewer epochs, using 500 time-steps. These benchmarks produce layer-wise images for the six cases, which are then compared to those generated by the proposed models.

4.3.2 Evaluation Metrics.

The quality of the generated images is assessed using four evaluation metrics: KID [60], m-KID (the proposed metric), peak signal-to-noise ratio (PSNR) [56], and the structural similarity index measure (SSIM) [56]. Higher image quality leads to lower KID scores. KID is used over FID due to FID's bias. A test comparing KID and FID between real images and generated (bad and good) images using case 1 layer-wise images demonstrated the consistency of KID. Using generated images from augmentations 4 and 1 (see Fig. 16), the test is repeated three times; the results, presented in Table 5, show the inconsistencies of FID compared to KID.

Table 5

Performance of KID and FID

Trial | KID (good images) | KID (bad images) | FID (good images) | FID (bad images)
Trial 1 | 0.22604 | 0.29677 | 1.9 | 2.0
Trial 2 | 0.22604 | 0.29677 | 0 | 1.9
Trial 3 | 0.22604 | 0.29677 | 0 | 0

We also introduce m-KID, which uses a modified combination of kernels and can improve the generated image quality. As depicted in Fig. 11, m-KID decreases as the generated images per epoch improve in quality, transitioning from noisy images to clear images. This implies that m-KID can also be used for post-augmentation image quality assessment.

Fig. 11
Generated image performance per epoch using m-KID (case 1)
PSNR (Eq. (18)) measures the level of noise or distortion per peak signal between the real and the generated images. Therefore, a high PSNR score indicates a high-quality image. On the other hand, SSIM measures the similarity between the real and generated images by comparing structural and visible elements between the two sets of images. SSIM (Eq. (19)) varies between −1 and 1, with a score closer to 1 indicating more similarity. Moreover, SSIM considers elements such as contrast and luminance. It is worth noting that metrics such as PSNR and SSIM are pixel-based and have limitations in that they do not always align with human visual judgment [65]. All four metrics combined can serve as a good image quality assessment to determine the similarity between the generated and real images.
\[ PSNR = 10 \times \log_{10}\!\left(\frac{R^{2}}{MSE}\right) \]
(18)
where $R$ is the maximum pixel fluctuation in the real images, and MSE is the mean square error between the real images ($R$) and generated images ($G$).
\[ SSIM = \frac{\left(2\mu_R\mu_G + \tau_1^{2} l^{2}\right)\left(2\sigma_{RG} + \tau_2^{2} l^{2}\right)}{\left(\mu_R^{2} + \mu_G^{2} + \tau_1^{2} l^{2}\right)\left(\sigma_R^{2} + \sigma_G^{2} + \tau_2^{2} l^{2}\right)} \]
(19)
where $\mu_R$ and $\mu_G$ represent the pixel means of the real images ($R$) and generated images ($G$), respectively. $\sigma_R^{2}$ and $\sigma_G^{2}$ represent the variances of the real and generated images, respectively, and $\sigma_{RG}$ represents the covariance between $R$ and $G$. $l$ is the dynamic range, and $\tau_1$ and $\tau_2$ are constants with values 0.01 and 0.03, respectively.
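As a small illustration of Eqs. (18) and (19), the sketch below computes PSNR and a global (whole-image) form of SSIM with the stated constants $\tau_1 = 0.01$ and $\tau_2 = 0.03$. It assumes 8-bit images ($l = 255$) and single-window statistics; library implementations typically use local sliding windows, so this is an approximation rather than the exact routine used in this study.

```python
import numpy as np

def psnr(real, gen, peak=255.0):
    """PSNR of Eq. (18): 10 * log10(R^2 / MSE), with R the maximum pixel fluctuation."""
    mse = np.mean((real.astype(float) - gen.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(real, gen, dynamic_range=255.0, tau1=0.01, tau2=0.03):
    """Global (single-window) SSIM following the form of Eq. (19)."""
    real = real.astype(float)
    gen = gen.astype(float)
    c1 = (tau1 * dynamic_range) ** 2  # tau_1^2 * l^2
    c2 = (tau2 * dynamic_range) ** 2  # tau_2^2 * l^2
    mu_r, mu_g = real.mean(), gen.mean()
    var_r, var_g = real.var(), gen.var()
    cov_rg = np.mean((real - mu_r) * (gen - mu_g))
    return ((2 * mu_r * mu_g + c1) * (2 * cov_rg + c2)) / (
        (mu_r ** 2 + mu_g ** 2 + c1) * (var_r + var_g + c2))
```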

4.4 Results and Discussion.

The generated images are evaluated using three types of assessments: the evaluation metrics assessment, the reality detection assessment, and the variation assessment.

4.4.1 Evaluation Metrics Assessment.

The results for the six cases using the different evaluation metrics are reported in Table 6. This assessment aims to determine the similarity of the generated images against the real images. It can be observed that the proposed models (DDIM/KID-IS and DDIM/m-KID) generate better images that are visually closer to the real images compared to the benchmarks (Fig. 12).

Fig. 12
Generated images for each model and each case
Table 6

Performance evaluation of the six cases using five models

Models | Metrics | Case 1 | Case 2 | Case 3 | Case 4 | Case 5a | Case 5b | Average computational time (h)
DDIM/KID-IS (proposed) | KID | 0.685 | 0.720 | 1.354 | 0.435 | 0.894 | 0.950 | 1.05
 | m-KID | 0.098 | 0.106 | 0.196 | 0.064 | 0.117 | 0.130 |
 | PSNR | 20.218 | 17.943 | 21.407 | 18.007 | 15.430 | 15.893 |
 | SSIM | 0.254 | 0.158 | 0.256 | 0.133 | 0.090 | 0.148 |
DDIM/m-KID (proposed) | KID | 1.380 | 0.898 | 1.320 | 0.469 | 0.741 | 1.644 | 1.02
 | m-KID | 0.202 | 0.133 | 0.191 | 0.071 | 0.097 | 0.222 |
 | PSNR | 20.09 | 18.133 | 21.365 | 17.730 | 15.270 | 16.282 |
 | SSIM | 0.247 | 0.155 | 0.250 | 0.111 | 0.109 | 0.170 |
DDPM | KID | 1.710 | 0.977 | 1.438 | 0.924 | 1.179 | 2.301 | 2.34
 | m-KID | 0.253 | 0.146 | 0.209 | 0.138 | 0.157 | 0.314 |
 | PSNR | 18.620 | 17.37 | 19.516 | 17.456 | 15.650 | 15.691 |
 | SSIM | 0.222 | 0.123 | 0.209 | 0.103 | 0.090 | 0.110 |
DCGAN | KID | 2.273 | 1.716 | 3.062 | 1.417 | 1.892 | 2.157 | 1.37
 | m-KID | 0.342 | 0.260 | 0.463 | 0.217 | 0.259 | 0.296 |
 | PSNR | 17.055 | 16.237 | 17.855 | 15.970 | 14.386 | 13.863 |
 | SSIM | 0.232 | 0.166 | 0.216 | 0.143 | 0.107 | 0.148 |
WGAN | KID | 1.958 | 1.762 | 2.648 | 1.779 | 2.986 | 2.23 | 0.76
 | m-KID | 0.292 | 0.269 | 0.398 | 0.275 | 0.423 | 0.311 |
 | PSNR | 13.762 | 13.359 | 13.378 | 15.156 | 12.826 | 13.144 |
 | SSIM | 0.222 | 0.172 | 0.220 | 0.121 | 0.099 | 0.133 |

Note: The best evaluation metric score across all models for each case is highlighted in bold. A high-quality generated sample is indicated by a lower KID, lower m-KID, higher PSNR, and higher SSIM.

The proposed models (DDIM/KID-IS and DDIM/m-KID) have the lowest KID and m-KID scores for all cases, showing their high performance over the benchmarks. When comparing the two proposed models, although their visual performance appears similar, DDIM/KID-IS outperforms DDIM/m-KID in four of the six cases and thus has the higher performance according to the evaluation metric assessment. This aligns with our intuition that the combination of the m-KID and IS metrics can enhance the quality of the images generated by the DDIM.

As for SSIM, in two out of the six cases (case 2 with WGAN and case 4 with DCGAN), the metric does not align with the KID and m-KID results indicating that the proposed models have the best performance. This discrepancy arises because the images generated by the WGAN and the DCGAN in those cases have higher contrast, which the metric captures. However, upon examining Fig. 12, it is evident that WGAN and DCGAN do not produce the highest-quality images for these cases. During training, the DDPM model also produces images approaching the quality of the DDIM. However, the color of the generated images is different in some cases due to the DDPM training process. In addition, the DDPM is computationally expensive, so to generate images at the level of DDIM, DDPM has to be trained longer than what is noted in Table 6, using more time-steps.

The LSGAN was not able to generate realistic images for any of the cases, as depicted in Fig. 8(e); therefore, its results are omitted from Table 6. This also supports the observation by Song et al. [50] that the high performance of GANs depends strongly on careful optimization and architectural choices during training. To further evaluate the effectiveness of the proposed models, they were also tested on higher-resolution images (256×256); however, the models could not run due to an out-of-memory issue, which is left to future work. Note that the varied modalities of the six cases (infill patterns, materials, colors, and shapes) support the generalizability of our models. Therefore, the proposed models should deliver similar performance when tested on other AM processes such as L-PBF and DED.

4.4.2 Reality Detection Assessment.

The second assessment evaluates the reality of the generated images against the real images. The hyperparameters of the CAE are presented in Table 7.

Table 7

The hyperparameters of the CAE

Network       Layers                  Value/type
CNN encoder   Convolutional layers    128, 64, 32, 16, 8
              Filters                 (3, 3)
              Activation function     ReLU
              Max pooling             (2, 2)
CNN decoder   Deconvolutional layers  8, 16, 32, 64, 128
              Filters                 (3, 3)
              Activation function     ReLU
              Up-sampling             (2, 2)
Compiler      Optimizer               Adam
              Loss function           Mean squared error
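As a concrete illustration, the following Keras-style sketch builds a CAE consistent with the hyperparameters in Table 7. The input size (128×128), the padding choices, and the final reconstruction layer are assumptions not specified in the table, so this is one plausible realization rather than the authors' exact implementation.

```python
from tensorflow.keras import layers, models

def build_cae(input_shape=(128, 128, 1)):
    """Convolutional autoencoder following the hyperparameters listed in Table 7."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    # Encoder: 3x3 convolutions with ReLU, each followed by 2x2 max pooling.
    for filters in (128, 64, 32, 16, 8):
        x = layers.Conv2D(filters, (3, 3), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D((2, 2), padding="same")(x)
    encoded = x  # latent code whose density is used in the reality check
    # Decoder: mirrored 3x3 deconvolutions with ReLU and 2x2 up-sampling.
    for filters in (8, 16, 32, 64, 128):
        x = layers.Conv2DTranspose(filters, (3, 3), activation="relu", padding="same")(x)
        x = layers.UpSampling2D((2, 2))(x)
    # Final projection back to the input channels (an assumption; not listed in Table 7).
    outputs = layers.Conv2D(input_shape[-1], (3, 3), activation="sigmoid", padding="same")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```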
More than 500 images, split between training and validation sets, are used for cases 1 and 2 to train the reality detection model. The reconstruction error and accuracy for case 1 are presented in Figs. 13(a) and 13(b), with case 2 exhibiting similar performance. Table 8 also presents the average density (μ_density) and average error (μ_error) of the real images, which are used to determine the threshold metrics. The threshold density Z_th is set to the lower bound of the density range of the real images, and the threshold error ε_th is set to the upper bound of the reconstruction error of the real images, as given by Eqs. (20) and (21), respectively.

$\text{Lower bound(density)} = \mu_{density} - z\,\sigma_{density}$ (20)

$\text{Upper bound(error)} = \mu_{error} + z\,\sigma_{error}$ (21)

where μ_density is the mean density of the encoded real images, μ_error is the mean error after reconstructing the real images from the code, σ_density is the standard deviation of the density of the encoded real images, and σ_error is the standard deviation of the reconstruction error. z is the z-score, i.e., the number of standard deviations from the mean of the normal distribution; z = 3 is used in this study (99.7% confidence interval). Note that the standard deviations in Table 8 are so small that using z = 1 or 2 (CI = 68% or 95%) has no major impact on the calculated threshold values.
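A minimal sketch of the threshold computation in Eqs. (20) and (21) is shown below, assuming the code densities and reconstruction errors of the real images have already been collected from the trained CAE; variable names are illustrative.

```python
import numpy as np

def reality_thresholds(real_densities, real_recon_errors, z=3.0):
    """Compute the density and error thresholds of Eqs. (20) and (21)."""
    mu_density = np.mean(real_densities)      # mean density of the encoded real images
    sigma_density = np.std(real_densities)
    mu_error = np.mean(real_recon_errors)     # mean reconstruction error of real images
    sigma_error = np.std(real_recon_errors)
    z_th = mu_density - z * sigma_density     # Eq. (20): lower bound on density
    eps_th = mu_error + z * sigma_error       # Eq. (21): upper bound on error
    return z_th, eps_th
```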
Fig. 13
(a) Reconstruction error and (b) accuracy for case 1
Table 8

Average density and reconstruction error for cases 1 and 2

Cases    μ_density   σ_density       μ_error   σ_error
Case 1   701         1.436 × 10^−6   0.00969   0.000426
Case 2   82.392      2.435 × 10^−6   0.0066    0.00243

Table 9 presents the calculated threshold values as well as the results of the reality test. The threshold condition is kept strict so that only clearly fake images are flagged; this reduces the rate of false positives (classifying a fake image as a real image), although it also increases the rate of false negatives. In this study, a true positive corresponds to predicting a fake image as fake, and a true negative corresponds to predicting a real image as real. Thirty test images per case, 15 fake (generated) and 15 real, are used to evaluate the reality detection model. The model classified all 60 images from cases 1 and 2 as real, implying that the generated images passed the test of reality.
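The strict decision rule suggested by Table 9 can be sketched as follows. Requiring both conditions to hold before flagging an image as fake is our assumption about how the strict condition is applied; it is consistent with the reported effect of fewer false positives and more false negatives.

```python
def is_fake(density, recon_error, z_th, eps_th):
    """Strict flag: treat an image as fake only if its code density falls below the
    density threshold AND its reconstruction error exceeds the error threshold
    (assumed combination of the two conditions; not stated explicitly in the text)."""
    return (density < z_th) and (recon_error > eps_th)
```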

Table 9

Threshold values and results

Cases    Threshold density   Threshold error   Results
Case 1   < 701               > 0.01            All images are considered real
Case 2   < 82                > 0.01            All images are considered real

4.4.3 Layer-Wise Variations Monitoring.

The final assessment involves diversity checks via variations monitoring. Although the proposed models eliminate certain unrealistic patterns (by visual comparison against the baseline DDIM, as seen in the Appendix), they still generate a few unrealistic AM layer-wise images. Four types of unrealistic patterns or variations, depicted in Fig. 14, were observed throughout the training of the different proposed models: (1) repeated patterns, (2) patterns overlapping with each other, (3) patterns going in different directions, and (4) blurred, unclear, or missing patterns.

Fig. 14
Unrealistic patterns in AM layer-wise generated images

Our proposed models reduce the number of generated samples with unrealistic patterns except for case 5a (e.g., Fig. 14(2)), where the models still confuse combinations of distinct pattern styles. The most likely reason is that case 5a has the most complex pattern distribution among the six cases (as shown in Fig. 7); such patterns may confuse the models and lead them to generate many unrealistic layer-wise patterns or variations. Case 5a reveals that the proposed models need further improvement to capture highly complex patterns.

The augmented images from case 2 are used to train the DDIM/KID-IS (the best-performing model) to determine its capacity to generate images very similar to test layer-wise images not seen by the diffusion model. Figure 15(a) depicts an image generated by the DDIM/KID-IS that is highly similar to a test layer (Fig. 15(b)), with their areas of similarity highlighted. The original layer has more contrast than the generated layer. This is expected, since the results in Table 6 show that the DDIM/KID-IS for case 2 produces the best-quality images in terms of KID and m-KID, but its SSIM (0.158) is not the best among all models. To verify that the generated image is not a result of data copying but is a predicted layer image with possible variations, a pixel comparison between the two images is conducted. Out of 32,400 pixel points compared pixel by pixel, 22,858 differ, corresponding to roughly 30% pixel similarity. Figure 15(c) illustrates the differences or variations between the generated and real layers, with the underlying variations overlaid on the real layer. This demonstrates that the DDIM/KID-IS can predict potential future AM layers with possible variations.
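A pixel-by-pixel comparison of the kind described above can be sketched as follows. The tolerance parameter and the pixel scaling are assumptions, since the paper reports only the counts of identical versus differing pixels.

```python
import numpy as np

def pixel_difference_map(generated, real, tol=0.0):
    """Pixel-by-pixel comparison between a generated and a real layer image.
    Returns the number of differing pixels, the similarity ratio, and a difference mask."""
    diff_mask = np.abs(generated.astype(float) - real.astype(float)) > tol
    n_diff = int(diff_mask.sum())                 # e.g., 22,858 of 32,400 pixels in case 2
    similarity = 1.0 - n_diff / diff_mask.size    # roughly 0.30 for the example above
    return n_diff, similarity, diff_mask
```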

Fig. 15
(a) Generated layer-wise image, (b) test layer-wise image, and (c) test layer-wise image with generated variations

Since the similarity between the two images is assessed manually via human perception, future work requires a more robust approach to evaluate the proposed model. This future approach should be based on structural or feature control to better assess the capability of our model in capturing possible feature variations.

5 Conclusions and Future Work

This article proposes DDIM models incorporating novel kernel-based metrics to generate high-quality AM layer-wise images. The models address issues related to layer-wise sampling in AM and can also be used for layer-wise AM monitoring. The generated samples can predict potential realistic layer-wise variations not seen during training; such variations can be studied to understand their causation and prevent their occurrence in future printing. The proposed method combines the developed distance metrics m-KID and KID-IS within the DDIM model to balance the similarity and diversity of the generated images. After careful data augmentation and hyperparameter tuning, the real layer-wise images are fed into the DDIM/m-KID and DDIM/KID-IS models to generate layer-wise images. The proposed methods are evaluated on six cases from the FFF process with various materials, infill patterns, and angles, against four benchmarks (LSGAN, WGAN, DCGAN, and DDPM) and four quality assessment metrics (KID, m-KID, PSNR, and SSIM). The proposed metrics (e.g., m-KID) have also demonstrated their capability as post-training quality assessment metrics. Both DDIM/m-KID and DDIM/KID-IS outperform the benchmarks by both visual judgment and evaluation metrics. The models show great potential for generating AM layer-wise images with possible process variations while passing the test of reality against the real images.

Several future research directions remain open. First, the capacity of the proposed model to capture process variations can be further enhanced by building a feature-control-based DM. Second, the proposed distance metrics will be further improved to help DDIM reduce the number of unrealistic images while generating diverse yet realistic images. Third, many hyperparameters of the DDIM can be further tuned to improve the quality of the generated images. Fourth, an approach to incorporate an advanced reality detection tool within the DDIM model can be investigated. Finally, though the proposed models can be generalized to similar layer-wise AM processes such as L-PBF or DED, case 5a has demonstrated the models' limitation in fully capturing the distribution of complex patterns or shapes. Therefore, it is also crucial to validate the effectiveness of the models in more real-world AM case studies with various complex patterns and shapes.

Acknowledgment

This work was partially supported by the Oklahoma State University CEAT Engineering Research and Seed Funding Program. We would also like to express our appreciation to the anonymous reviewers who provided insightful and constructive feedback that contributed to the improvement of this paper.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

Appendix: Data Augmentation and Hyperparameter Tuning Preliminary Study

This section presents the preliminary study conducted to determine the effect of data augmentation and hyperparameter tuning on the generated images and to select the settings used in our final model. Table 10 shows the details of the different data augmentation settings, whereas the selected hyperparameters are noted in Table 3. The preliminary results after data augmentation using case 1 training images are depicted in Fig. 16. Higher rotation angles and shifts with both horizontal and vertical flips active lead to generated images with unrealistic shapes. The generated image quality improves when the angle and shift values are lowered and only the horizontal flip is activated. The final setting used in the study, data augmentation 5 with noise, ensures that the augmented data remain close to the original data. The selected noise follows a Gaussian distribution with a mean of 0 and a standard deviation of 0.1.

Fig. 16
Generated images after data augmentation for preliminary study
Table 10

Data augmentation settings

Setting          Rotation angle   Width shift   Height shift   Zoom range   Horizontal flip   Vertical flip   Fill mode
Augmentation 1   45 deg           0.1           0.1            0.1          True              True            Constant
Augmentation 2   10 deg           0.1           0.1            0.1          False             False           Constant
Augmentation 3   5 deg            0.05          0.05           0.05         False             False           Constant
Augmentation 4   3 deg            0.005         0.005          0.005        True              False           Constant
Augmentation 5   1 deg            0.005         0.005          0.005        True              False           Constant
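For illustration, Augmentation 5 from Table 10 plus the Gaussian noise described above can be realized with the Keras ImageDataGenerator as sketched below. Whether the authors used this particular API is an assumption; the noise is injected here through the preprocessing hook.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def add_gaussian_noise(image, mean=0.0, std=0.1):
    # Gaussian noise with mean 0 and standard deviation 0.1, as in the final setting.
    return image + np.random.normal(mean, std, size=image.shape)

# Augmentation 5 from Table 10 (the setting adopted in the study), assumed realization.
augmenter = ImageDataGenerator(
    rotation_range=1,            # 1 deg rotation
    width_shift_range=0.005,
    height_shift_range=0.005,
    zoom_range=0.005,
    horizontal_flip=True,
    vertical_flip=False,
    fill_mode="constant",
    preprocessing_function=add_gaussian_noise,
)
```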

References

1. Gao, W., Zhang, Y., Ramanujan, D., Ramani, K., Chen, Y., Williams, C. B., Wang, C. C. L., Shin, Y. C., Zhang, S., and Zavattieri, P. D., 2015, "The Status, Challenges, and Future of Additive Manufacturing in Engineering," Comput. Aided Des., 69, pp. 65–89.
2. Liu, C., Tian, W., and Kan, C., 2022, "When AI Meets Additive Manufacturing: Challenges and Emerging Opportunities for Human-Centered Products Development," J. Manuf. Syst., 64, pp. 648–656.
3. Yangue, E., Ye, Z., Kan, C., and Liu, C., 2023, "Integrated Deep Learning-Based Online Layer-Wise Surface Prediction of Additive Manufacturing," Manuf. Lett., 35, pp. 760–769.
4. Ye, Z., Liu, C., Tian, W., and Kan, C., 2021, "In-Situ Point Cloud Fusion for Layer-Wise Monitoring of Additive Manufacturing," J. Manuf. Syst., 61, pp. 210–222.
5. LeCun, Y., Bengio, Y., and Hinton, G., 2015, "Deep Learning," Nature, 521(7553), pp. 436–444.
6. Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P., 2002, "SMOTE: Synthetic Minority Over-Sampling Technique," J. Artif. Intell. Res., 16, pp. 321–357.
7. Chung, J., Shen, B., and Kong, Z. J., 2023, "Anomaly Detection in Additive Manufacturing Processes Using Supervised Classification With Imbalanced Sensor Data Based on Generative Adversarial Network," J. Intell. Manuf.
8. Yang, L., Zhang, Z., Song, Y., Hong, S., Xu, R., Zhao, Y., Zhang, W., Cui, B., and Yang, M.-H., 2023, "Diffusion Models: A Comprehensive Survey of Methods and Applications," ACM, 56(4).
9. Yang, S., and Zhao, Y. F., 2015, "Additive Manufacturing-Enabled Design Theory and Methodology: A Critical Review," Int. J. Adv. Manuf. Technol., 80(1), pp. 327–342.
10. Fullington, D., Bian, L., and Tian, W., 2023, "Design De-Identification of Thermal History for Collaborative Process-Defect Modeling of Directed Energy Deposition Processes," ASME J. Manuf. Sci. Eng., 145(5), p. 051004.
11. Sanaei, N., Fatemi, A., and Phan, N., 2019, "Defect Characteristics and Analysis of Their Variability in Metal L-PBF Additive Manufacturing," Mater. Des., 182, p. 108091.
12. Bau, D., Zhu, J.-Y., Wulff, J., Peebles, W., Strobelt, H., Zhou, B., and Torralba, A., 2019, "Seeing What a GAN Cannot Generate," pp. 4502–4511.
13. Arora, S., Ge, R., Liang, Y., Ma, T., and Zhang, Y., 2017, "Generalization and Equilibrium in Generative Adversarial Nets (GANs)," Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, Aug. 6–11, PMLR, pp. 224–232.
14. Nagarajan, V., Raffel, C., and Goodfellow, I. J., "Theoretical Insights Into Memorization in GANs."
15. Gidel, G., Berard, H., Vignoud, G., Vincent, P., and Lacoste-Julien, S., 2020, "A Variational Inequality Perspective on Generative Adversarial Networks."
16. Arjovsky, M., Chintala, S., and Bottou, L., 2017, "Wasserstein Generative Adversarial Networks," Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, Aug. 6–11, PMLR, pp. 214–223.
17. Xames, M. D., Torsha, F. K., and Sarwar, F., 2023, "A Systematic Literature Review on Recent Trends of Machine Learning Applications in Additive Manufacturing," J. Intell. Manuf., 34(6), pp. 2529–2555.
18. Shi, Z., Mamun, A. A., Kan, C., Tian, W., and Liu, C., 2022, "An LSTM-Autoencoder Based Online Side Channel Monitoring Approach for Cyber-Physical Attack Detection in Additive Manufacturing," J. Intell. Manuf., 34(4), pp. 1815–1831.
19. Ren, W., Wen, G., Zhang, Z., and Mazumder, J., 2022, "Quality Monitoring in Additive Manufacturing Using Emission Spectroscopy and Unsupervised Deep Learning," Mater. Manuf. Processes, 37(11), pp. 1339–1346.
20. Akbari, P., Ogoke, F., Kao, N.-Y., Meidani, K., Yeh, C.-Y., Lee, W., and Barati Farimani, A., 2022, "MeltpoolNet: Melt Pool Characteristic Prediction in Metal Additive Manufacturing Using Machine Learning," Addit. Manuf., 55, p. 102817.
21. Bappy, M. M., Liu, C., Bian, L., and Tian, W., 2022, "Morphological Dynamics-Based Anomaly Detection Towards In Situ Layer-Wise Certification for Directed Energy Deposition Processes," ASME J. Manuf. Sci. Eng., 144(11), p. 111007.
22. Esfahani, M. N., Bappy, M. M., Bian, L., and Tian, W., 2022, "In-Situ Layer-Wise Certification for Direct Laser Deposition Processes Based on Thermal Image Series Analysis," J. Manuf. Process., 75, pp. 895–902.
23. Kononenko, D. Y., Nikonova, V., Seleznev, M., van den Brink, J., and Chernyavsky, D., 2023, "An In Situ Crack Detection Approach in Additive Manufacturing Based on Acoustic Emission and Machine Learning," Addit. Manuf. Lett., 5, p. 100130.
24. Lyu, J., and Manoochehri, S., 2021, "Online Convolutional Neural Network-Based Anomaly Detection and Quality Control for Fused Filament Fabrication Process," Virtual Phys. Prototyp., 16(2), pp. 160–177.
25. Valizadeh, M., and Wolff, S. J., 2022, "Convolutional Neural Network Applications in Additive Manufacturing: A Review," Adv. Ind. Manuf. Eng., 4, p. 100072.
26. Banadaki, Y., Razaviarab, N., Fekrmandi, H., and Sharifi, S., 2020, "Toward Enabling a Reliable Quality Monitoring System for Additive Manufacturing Process Using Deep Convolutional Neural Networks," arXiv. https://arxiv.org/abs/2003.08749
27. Trinks, S., and Felden, C., 2019, "Image Mining for Real Time Quality Assurance in Rapid Prototyping," 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, Dec. 9–12, pp. 3529–3534.
28. Patel, S., Mekavibul, J., Park, J., Kolla, A., French, R., Kersey, Z., and Lewin, G. C., 2019, "Using Machine Learning to Analyze Image Data From Advanced Manufacturing Processes," 2019 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, Apr. 26, pp. 1–5.
29. Mahato, V., Obeidi, M. A., Brabazon, D., and Cunningham, P., 2022, "Detecting Voids in 3D Printing Using Melt Pool Time Series Data," J. Intell. Manuf., 33(3), pp. 845–852.
30. Lyu, J., Akhavan Taheri Boroujeni, J., and Manoochehri, S., 2021, "In-Situ Laser-Based Process Monitoring and In-Plane Surface Anomaly Identification for Additive Manufacturing Using Point Cloud and Machine Learning," ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Paper No. DETC2021-69436.
31. Song, Y., and Kingma, D. P., 2021, "How to Train Your Energy-Based Models," arXiv. https://arxiv.org/abs/2101.03288
32. Kingma, D. P., and Welling, M., 2019, "An Introduction to Variational Autoencoders," MAL, 12(4), pp. 307–392.
33. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y., 2020, "Generative Adversarial Networks," Commun. ACM, 63(11), pp. 139–144.
34. Zhang, Q., and Chen, Y., 2021, "Diffusion Normalizing Flow," Advances in Neural Information Processing Systems, Curran Associates, Inc., pp. 16280–16291.
35. Lee, K.-H., and Yun, G. J., 2023, "Microstructure Reconstruction Using Diffusion-Based Generative Models," Mech. Adv. Mater. Struct., pp. 1–19.
36. Ye, D., Hong, G. S., Zhang, Y., Zhu, K., and Fuh, J. Y. H., 2018, "Defect Detection in Selective Laser Melting Technology by Acoustic Signals With Deep Belief Networks," Int. J. Adv. Manuf. Technol., 96(5), pp. 2791–2801.
37. Ye, D., Hsi Fuh, J. Y., Zhang, Y., Hong, G. S., and Zhu, K., 2018, "In Situ Monitoring of Selective Laser Melting Using Plume and Spatter Signatures by Deep Belief Networks," ISA Trans., 81, pp. 96–104.
38. Wang, J., Ma, Y., Zhang, L., Gao, R. X., and Wu, D., 2018, "Deep Learning for Smart Manufacturing: Methods and Applications," J. Manuf. Syst., 48, pp. 144–156.
39. Song, Z., Wang, X., Gao, Y., Son, J., and Wu, J., 2023, "A Hybrid Deep Generative Network for Pore Morphology Prediction in Metal Additive Manufacturing," ASME J. Manuf. Sci. Eng., 145(7), p. 071005.
40. Ghayoomi Mohammadi, M., Mahmoud, D., and Elbestawi, M., 2021, "On the Application of Machine Learning for Defect Detection in L-PBF Additive Manufacturing," Opt. Laser Technol., 143, p. 107338.
41. Lew, A. J., and Buehler, M. J., 2021, "Encoding and Exploring Latent Design Space of Optimal Material Structures via a VAE-LSTM Model," Forces Mech., 5, p. 100054.
42. Li, Y., Shi, Z., Liu, C., Tian, W., Kong, Z., and Williams, C. B., 2022, "Augmented Time Regularized Generative Adversarial Network (ATR-GAN) for Data Augmentation in Online Process Anomaly Detection," IEEE Trans. Autom. Sci. Eng., 19(4), pp. 3338–3355.
43. Hertlein, N., Buskohl, P. R., Gillman, A., Vemaganti, K., and Anand, S., 2021, "Generative Adversarial Network for Early-Stage Design Flexibility in Topology Optimization for Additive Manufacturing," J. Manuf. Syst., 59, pp. 675–685.
44. He, X., Chang, Z., Zhang, L., Xu, H., Chen, H., and Luo, Z., 2022, "A Survey of Defect Detection Applications Based on Generative Adversarial Networks," IEEE Access, 10, pp. 113493–113512.
45. Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D. J., and Norouzi, M., 2023, "Image Super-Resolution via Iterative Refinement," IEEE Trans. Pattern Anal. Mach. Intell., 45(4), pp. 4713–4726.
46. Zhang, H., Prasad Vallabh, C. K., and Zhao, X., 2023, "Machine Learning Enhanced High Dynamic Range Fringe Projection Profilometry for In-Situ Layer-Wise Surface Topography Measurement During LPBF Additive Manufacturing," Precis. Eng., 84, pp. 1–14.
47. Ogoke, F., Liu, Q., Ajenifujah, O., Myers, A., Quirarte, G., Beuth, J., Malen, J., and Farimani, A. B., 2023, "Inexpensive High Fidelity Melt Pool Models in Additive Manufacturing Using Generative Deep Diffusion," arXiv. https://arxiv.org/abs/2311.16168
48. Zhao, S., Ren, H., Yuan, A., Song, J., Goodman, N., and Ermon, S., 2018, "Bias and Generalization in Deep Generative Models: An Empirical Study," Advances in Neural Information Processing Systems, Curran Associates, Inc.
49. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C., 2017, "Improved Training of Wasserstein GANs," Advances in Neural Information Processing Systems, Curran Associates, Inc.
50. Song, J., Meng, C., and Ermon, S., 2022, "Denoising Diffusion Implicit Models."
51. Ho, J., Jain, A., and Abbeel, P., 2020, "Denoising Diffusion Probabilistic Models," Advances in Neural Information Processing Systems, Curran Associates, Inc., pp. 6840–6851.
52. Bengio, Y., Yao, L., Alain, G., and Vincent, P., 2013, "Generalized Denoising Auto-Encoders as Generative Models," Advances in Neural Information Processing Systems, Curran Associates, Inc.
53. Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B., 2021, "Score-Based Generative Modeling Through Stochastic Differential Equations."
54. Song, Y., and Ermon, S., 2019, "Generative Modeling by Estimating Gradients of the Data Distribution," Advances in Neural Information Processing Systems, Curran Associates, Inc.
55. Ronneberger, O., Fischer, P., and Brox, T., 2015, "U-Net: Convolutional Networks for Biomedical Image Segmentation."
56. Sheikh, H. R., and Bovik, A. C., 2006, "Image Information and Visual Quality," IEEE Trans. Image Process., 15(2), pp. 430–444.
57. Wang, Z., and Bovik, A. C., 2002, "A Universal Image Quality Index," IEEE Signal Process. Lett., 9(3), pp. 81–84.
58. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., and Chen, X., 2016, "Improved Techniques for Training GANs," Advances in Neural Information Processing Systems, Curran Associates, Inc.
59. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S., 2017, "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium," Advances in Neural Information Processing Systems, Curran Associates, Inc.
60. Bińkowski, M., Sutherland, D. J., Arbel, M., and Gretton, A., 2021, "Demystifying MMD GANs."
61. Betzalel, E., Penso, C., Navon, A., and Fetaya, E., 2022, "A Study on the Evaluation of Generative Models," arXiv. https://arxiv.org/abs/2206.10935
62. Loshchilov, I., and Hutter, F., 2019, "Decoupled Weight Decay Regularization."
63. Radford, A., Metz, L., and Chintala, S., 2016, "Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks."
64. Mao, X., Li, Q., Xie, H., Lau, R. Y. K., Wang, Z., and Smolley, S. P., "Least Squares Generative Adversarial Networks." https://ieeexplore.ieee.org/document/8237566
65. Ridder, H. d., 1998, "Psychophysical Evaluation of Image Quality: From Judgment to Impression," Human Vision and Electronic Imaging III, SPIE, pp. 252–263.