Abstract
Neural networks have gained popularity for modeling complex non-linear relationships. Their computational efficiency has led to their growing adoption in optimization methods, including topology optimization. Recently, there have been several contributions toward improving derivatives of neural network outputs, which can enhance their use in gradient-based optimization. However, a comparative study of the different derivative methods for computing the sensitivity of neural network outputs with respect to their input features has yet to be conducted. This paper aims to evaluate four derivative methods: the analytical neural network Jacobian, the central finite difference method, the complex step method, and automatic differentiation. These methods are implemented into density-based and homogenization-based topology optimization using multilayer perceptrons (MLPs). For density-based topology optimization, the MLP approximates Young’s modulus for the solid isotropic material with penalization (SIMP) model. For homogenization-based topology optimization, the MLP approximates the homogenized stiffness tensor of a representative volume element, e.g., a square cell microstructure with a rectangular hole. The comparative study is performed by solving two-dimensional topology optimization problems using the sensitivity coefficients from each derivative method. The evaluation includes the initial sensitivity coefficients, convergence plots, and the final topologies, compliance values, and design variables. The findings demonstrate that neural network-based sensitivity coefficients are sufficiently accurate for density-based and homogenization-based topology optimization. The neural network Jacobian, complex step method, and automatic differentiation produced identical sensitivity coefficients to working precision. The study’s open-source code is provided through a python repository.
1 Introduction
Over the past decades, neural networks have experienced a rapid surge in popularity as a means to model intricate non-linear relationships. They are particularly well suited for this purpose because they can be trained solely using data without the need for an explicit function. Once trained, neural networks efficiently generate predictions, making them suitable for a wide range of applications. A typical application is for regression problems, in which the neural network is trained to predict the response of an expensive black-box function [1]. Moreover, the neural network can be employed in conjunction with various derivative methods to obtain approximate sensitivity coefficients of this function. Standard methods include analytical derivatives taken through the neural network’s activation functions [2], the central finite difference (CFD) method [3], the complex step method (CSM) [4], and automatic differentiation (AD) [5].
The computational efficiency and flexibility of neural network-based derivatives have contributed to their increasing application to gradient-based optimization algorithms. This work specifically examines their use in structural topology optimization. In that context, neural network-based derivative methods play a valuable role in generating sensitivity coefficients that enable the efficient synthesis of optimal material layouts using gradient-based mathematical programming techniques [6].
1.1 Neural Network-Based Derivatives.
In recent years, a number of published studies have utilized and enhanced neural network-based derivatives and their associated numerical methods. Kissel and Diepold [7] proposed using least-squares approximated derivatives to train a Sobolev norm neural network for functions where target derivatives are not directly available. Meanwhile, Avrutskiy [8] showed that if several orders of target derivatives are known, then feedforward neural networks can be trained with increased precision. This was accomplished by incorporating deviations of the target derivatives from the known derivatives into an extended cost function. Later, Kiran and Naik [9] applied the complex step method to a feedforward neural network to obtain accurate derivatives for a regression task. In contrast, Ledesma et al. [10] used the chain rule to take an analytical derivative of a multilayer perceptron (MLP) neural network and thereby derive a differential neural network. The differential neural network is based on the original network and does not need to be trained. Similarly, Rodini [11] derived and proposed a simple recursive algorithm for computing a deep neural network’s first- and second-order derivatives via analytical methods.
Extending these methods to gradient-based optimization algorithms is straightforward, but its potential has not been fully explored or exploited. Notably, topology optimization is a compelling application for neural network-based derivatives, enabling the comparison of the sensitivity coefficients obtained using standard derivative methods and the corresponding final topologies.
1.2 Application to Topology Optimization.
In recent years, neural networks have been applied to topology optimization in various ways, ranging from approximating the implicit function of a parameterized level set method [12] to generating three-dimensional aircraft models for design exploration [13]. A promising application is the prediction of effective properties of materials or microstructures, i.e., homogenized stiffness tensors [14]. The resulting neural network material models can be applied to reduce the computational costs in density-based topology optimization (DBTO) and homogenization-based topology optimization (HBTO). As a result, these neural network material models enable the efficient evaluation of materials or microstructures that were previously computationally intractable in DBTO and HBTO methods.
DBTO methods use the density or thickness of each finite element as the design variable to optimize the material distribution. Several works have extended traditional DBTO methods with neural networks for predicting sensitivity coefficients or homogenized microstructures’ properties. Takahashi et al. [15] proposed using convolution neural networks in topology optimization to predict the sensitivity coefficients from a discrete material distribution. Meanwhile, Watts et al. [16] deployed surrogate models to predict the homogenized stiffness tensor of open micro-trusses, given their relative densities. In a series of similar works, Zhang et al. [17–19] used the effective density of a microstructure to predict the homogenized properties of shape-interpolated microstructures generated with the parametric level set method.
HBTO methods use the parameters of a homogenized microstructure as the design variables to optimize the multiscale structure. HBTO methods date back to the seminal paper by Bendsøe and Kikuchi [20] and commonly feature parameterizations including unit cells with rectangular holes [20–23], implicit functions [18,24,25], and truss structures [26–28]. These parameterizations facilitate extension (i.e., by providing input features) of HBTO methods with neural networks for predicting the homogenized microstructures’ properties. White et al. [28] utilized single-layer feedforward neural networks to approximate the homogenized stiffness tensor of parametrically sized micro-trusses. Similarly, Black and Najafi [29] presented a deep neural network to approximate the homogenized stiffness tensor of parametrically shaped biotrusses. The homogenized properties of more complex microstructures have also been parameterized with implicit surface functions [30,31] for neural network approximation.
1.3 Objectives and Contributions of This Work.
Although numerous derivative methods have been proposed and examined for neural networks, to the authors’ knowledge, a comparative study to assess their performance has not been conducted. Therefore, the objective of this paper is to implement and evaluate standard methods for obtaining or approximating the derivatives of neural network outputs with respect to their inputs. The methods explored in this study encompass the analytical neural network Jacobian (NNJ), the central finite difference method, the complex step method, and automatic differentiation. These methods are implemented and evaluated in neural networks trained to model material properties for DBTO and HBTO. Specifically, these neural networks are trained to predict the homogenized stiffness tensors of finite elements in a discretized design domain, given the design variables of the elements. The two topology optimization methods are used to solve the Messerschmitt–Bölkow–Blohm (MBB) beam problem using the sensitivity coefficients provided by each neural network-based derivative method. The results include the comparison of the initial sensitivity coefficients, convergence plots, final topologies, compliance values, and design variables. The training of the networks and evaluation of the derivative methods are done using the TensorFlow library in python.
The DBTO method in this work utilizes a neural network material model trained to approximate the solid isotropic material with penalization (SIMP) model [32]. The SIMP model was chosen for several reasons. First, SIMP derivatives have a closed-form analytical expression, enabling a ground truth comparison. Second, it provides a single-input, single-output system that is well suited for training an MLP. Third, the results are easier to verify and provide confidence in implementing the derivative methods. Finally, SIMP provides a familiar benchmark for the study. The HBTO method in this work utilizes a neural network material model trained to approximate the homogenized properties of a square cell microstructure with a rectangular hole. The microstructures are parameterized by the height and width of the hole, and their effective properties are found with numerical homogenization [33]. The HBTO method provides a multivariate input and output neural network application for a more comprehensive comparison of neural network-based derivative methods compared to the single input and output neural network for approximating the SIMP model.
The contributions of this work are the following: first, the implementation of four neural network-based derivative methods in TensorFlow that can be applied to an MLP of arbitrary architecture. To the authors’ knowledge, there is no other implementation of these methods made available to the scientific community via a general, open-source python code. Second, the evaluation of four neural network-based derivative methods in terms of the relative accuracy of their sensitivity coefficients. Third, this work is the first to assess the effect of neural network-based derivative methods on the final topologies generated by DBTO and HBTO. This study’s DBTO and HBTO methods handle two-dimensional layouts of any user-defined size and boundary conditions. See Appendix A for a GitHub® repository of all the python code utilized in this work.
The paper is organized as follows: the four studied derivative methods are presented in Sec. 2. The density-based and homogenization-based topology optimization methods and their neural network material models are detailed in Sec. 3. In Sec. 4, the neural network-based derivative methods are compared in the two topology optimization methods for the MBB beam problem. Lastly, the paper’s conclusions are given in Sec. 5.
2 Neural Network-Based Derivative Methods
Neural networks or artificial neural networks are a type of machine learning model that can be trained to recognize patterns or make predictions. Neural networks consist of interconnected neurons that perform weighted operations (i.e., additions and multiplications) passed through non-linear activation functions [34]. Each neuron’s weight and bias variables can be optimized to minimize the loss or objective function of the model, e.g., the mean squared error (MSE) between the target and the network’s predicted output. Multiple layers of these neurons can be connected to form a deep neural network architecture. The simplest neural network architecture is found in a feedforward neural network where input data are only passed in the forward direction to the next layer. While many types of neural network architectures exist [35], this work employs a class of fully connected feedforward neural networks known as MLPs. This is motivated by the fundamental architecture of MLPs, which provides a traditional benchmark for the novel comparison of neural network-based derivative methods in this work.
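Equation (1), referenced in the figure below, is not reproduced in this excerpt; the element-wise neuron computation it refers to commonly takes the form

$$a_j^{(l)} = \sigma^{(l)}\!\left(\sum_{i} w_{ji}^{(l)}\, a_i^{(l-1)} + b_j^{(l)}\right)$$

where $a_i^{(l-1)}$ are the outputs of the previous layer, $w_{ji}^{(l)}$ and $b_j^{(l)}$ are the neuron’s trainable weights and bias, and $\sigma^{(l)}$ is the layer’s activation function (sigmoid for the hidden layer and identity for the output layer in this work).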

Architecture of a simple multilayer perceptron feedforward neural network. A circle indicates an element-wise computation of Eq. (1). A square indicates an input feature of the network.

2.1 Neural Network Jacobian.
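Although the full derivation is not reproduced in this excerpt, a minimal sketch for the single-hidden-layer, sigmoid-activated architecture used later in Sec. 4.1 (not the paper’s general recursive formulation) illustrates the idea: chain the layer-wise derivatives to obtain the Jacobian of the outputs with respect to the inputs. The weight and bias arrays W1, b1, W2, b2 are assumed to be extracted from the trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_jacobian(x, W1, b1, W2, b2):
    """Analytical Jacobian dy/dx of y = W2 @ sigmoid(W1 @ x + b1) + b2.
    Note: Keras stores Dense kernels as (inputs, units); transpose when extracting."""
    z1 = W1 @ x + b1                 # hidden-layer pre-activations
    a1 = sigmoid(z1)                 # hidden-layer activations
    d_sig = a1 * (1.0 - a1)          # element-wise sigmoid derivative
    # chain rule: dy/dx = W2 @ diag(sigma'(z1)) @ W1, shape (n_outputs, n_inputs)
    return W2 @ (d_sig[:, None] * W1)
```

For the HBTO material model of Sec. 3.3, for example, a network with two inputs and four outputs yields a 4 × 2 Jacobian per element.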
2.2 Central Finite Difference Method.
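As a minimal sketch (assuming a generic vector-valued callable f, e.g., a wrapper around the MLP’s prediction; the step size h = 10⁻⁶ matches the value reported in Sec. 4.2), each input is perturbed in turn:

```python
import numpy as np

def cfd_jacobian(f, x, h=1e-6):
    """Central finite difference Jacobian of f at x.
    f maps an (n,) input vector to an (m,) output vector."""
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    J = np.zeros((m, x.size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        # two-sided difference: O(h^2) truncation error, but subject to round-off
        J[:, i] = (np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2.0 * h)
    return J
```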
2.3 Complex Step Method.
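As a minimal sketch: each input is perturbed along the imaginary axis and the imaginary part of the output is taken, which avoids the subtractive cancellation of finite differences. This assumes the MLP forward pass is re-implemented with operations that accept complex arguments (e.g., the NumPy forward pass sketched in Sec. 2.1), since a compiled Keras model does not accept complex perturbations of real inputs directly:

```python
import numpy as np

def csm_jacobian(f, x, h=1e-30):
    """Complex step Jacobian of f at x.
    f must be built from operations that accept complex arguments."""
    x = np.asarray(x, dtype=complex)
    m = np.atleast_1d(f(x)).size
    J = np.zeros((m, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += 1j * h                          # purely imaginary perturbation
        # no subtraction of nearly equal terms, so h can be made very small
        J[:, i] = np.imag(np.atleast_1d(f(xp))) / h
    return J
```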
2.4 Automatic Differentiation.
AD is a set of techniques to evaluate the derivative of a function defined by a computer program [38]. AD works by overloading standard elementary operators and functions with a derivative rule in addition to their function value. Similar to the previously derived NNJ, the chain rule can be repeatedly applied to these elementary operations, allowing for derivatives of arbitrary order to be computed automatically to the nominal working precision. The downside to AD is that it requires careful implementation into a software package [5], so its availability can be limited. TensorFlow natively supports AD [39], which makes its implementation straightforward.
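For instance, a minimal TensorFlow sketch (the function and variable names are placeholders, not the paper’s code) obtains the Jacobian of the network outputs with respect to its inputs using tf.GradientTape:

```python
import tensorflow as tf

def ad_jacobian(model, x_batch):
    """Jacobian of the model outputs w.r.t. its inputs via reverse-mode AD.
    x_batch has shape (n_samples, n_inputs)."""
    x = tf.convert_to_tensor(x_batch, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)              # inputs are not Variables, so watch them explicitly
        y = model(x)               # forward pass, shape (n_samples, n_outputs)
    # batch_jacobian returns shape (n_samples, n_outputs, n_inputs)
    return tape.batch_jacobian(y, x)
```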
3 Topology Optimization With Neural Networks
To evaluate the various neural network-based derivative methods of Sec. 2, neural networks are trained to approximate the material models in density-based and homogenization-based topology optimization. In the DBTO method, the neural network replaces the SIMP model, while in the HBTO method, the neural network replaces the model of an orthotropic square microstructure cell with a rectangular hole. Although these material models are relatively simple compared to those in other works [29], they provide a suitable context for studying neural network-based derivative methods; see Sec. 1.3 for motivation. To this end, this section presents an overview of the topology optimization method, the optimization problem statement, and the implementation of the neural network into the density-based and homogenization-based topology optimization methods.
3.1 Method Overview.
In topology optimization methods with neural network material models, a neural network is first trained to predict the homogenized stiffness tensor of a finite element e given the design variables of that element. An initial design is provided to start the optimization process. At every optimization iteration, the homogenized stiffness tensor is predicted for every element. After the global stiffness matrix K is assembled from the predicted stiffness tensors, finite element analysis (FEA) is performed to find the resulting displacements in the design domain Ω. The objective function can then be evaluated. Additionally, the derivative of the homogenized stiffness tensor with respect to each design variable is obtained from the neural network with one of the four methods presented in Sec. 2. Following the same process as for K, the derivative of the stiffness matrix with respect to each design variable ∂K/∂θe is computed and assembled. The sensitivity coefficients can then be calculated using this derivative and the FEA results.
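For a minimum-compliance objective (the quantity reported in Sec. 4), the sensitivity coefficients follow from the chain rule through the predicted stiffness tensor. The expressions below are the standard self-adjoint form, with the notation assumed here rather than copied from the paper’s equations:

$$\frac{\partial \phi}{\partial \theta_e} = -\,\mathbf{u}_e^{\top}\,\frac{\partial \mathbf{k}_e}{\partial \theta_e}\,\mathbf{u}_e,
\qquad
\frac{\partial \mathbf{k}_e}{\partial \theta_e} = \int_{\Omega_e} \mathbf{B}^{\top}\,\frac{\partial \mathbf{C}^{H}_e}{\partial \theta_e}\,\mathbf{B}\,\mathrm{d}\Omega_e$$

where $\mathbf{u}_e$ are the element displacements from the FEA solution, $\mathbf{B}$ is the element strain–displacement matrix, and $\partial \mathbf{C}^{H}_e/\partial \theta_e$ is supplied by one of the four neural network-based derivative methods.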
The design variables are then updated with the trust region method from SciPy’s optimize.minimize function [40]. These steps occur at every optimization iteration until a final design (i.e., topology) has converged or a maximum number of iterations is reached. See Fig. 2 for a flowchart of the topology optimization method with a neural network material model.
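A minimal sketch of this update step, assuming SciPy’s "trust-constr" method with box bounds and a linear volume-fraction constraint; the objective and gradient callables shown are placeholder stand-ins for the FEA-based compliance and the neural network-based sensitivities, not the paper’s code:

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint, Bounds

def compliance(theta):            # placeholder objective (real code runs FEA here)
    return float(np.sum(theta))

def compliance_grad(theta):       # placeholder gradient (real code uses a Sec. 2 method)
    return np.ones_like(theta)

n_elem, v_frac = 60 * 30, 0.5
theta0 = np.full(n_elem, v_frac)                        # uniform initial design

# mean element density must not exceed the target volume fraction
volume = LinearConstraint(np.ones((1, n_elem)) / n_elem, -np.inf, v_frac)
bounds = Bounds(1e-3, 1.0)                              # element-wise design bounds

res = minimize(compliance, theta0, jac=compliance_grad,
               method="trust-constr", bounds=bounds, constraints=[volume],
               options={"maxiter": 1000, "xtol": 1e-3})
```

The xtol setting mirrors the convergence criterion on the maximum change in design variables used in Sec. 4.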

Flowchart of the topology optimization method with a neural network material model. Inside the dashed window are subroutines that use the neural network material model.
3.2 Optimization Problem Statement.
3.3 Neural Network Material Models.

The (a) microstructure with a rectangular hole, (b) loading cases for numerical homogenization, and (c) contour plots of the homogenized, orthotropic stiffness tensor properties
4 Evaluation of Neural Network-Based Derivative Methods in Topology Optimization
This section compares the four neural network-based derivative methods: NNJ, CFD, CSM, and AD. To this end, the MBB beam problem is solved using density-based and homogenization-based topology optimization. Additionally, the SIMP model, Eq. (14), and the analytical derivative of the SIMP model, Eq. (15), are utilized for a ground truth comparison when evaluating the DBTO method. Since Eq. (16) is approximated with numerical homogenization [33], its analytical derivative is computationally intractable. Therefore, a ground truth comparison is not available when evaluating the HBTO method. The results of this section compare the initial set of sensitivity coefficients for each method along with the final design variables, topologies, and compliance values.
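For reference, Eqs. (14) and (15) are not reproduced in this excerpt; the SIMP interpolation and its analytical derivative commonly take the following modified-SIMP form, where the lower bound E_min and exponent p = 3 are assumptions rather than values taken from the paper:

$$E(\rho_e) = E_{\min} + \rho_e^{\,p}\left(E_0 - E_{\min}\right),
\qquad
\frac{\partial E}{\partial \rho_e} = p\,\rho_e^{\,p-1}\left(E_0 - E_{\min}\right)$$

where $\rho_e$ is the element density and $E_0$ is the solid material’s Young’s modulus. The closed form of the derivative is what provides the ground truth for the DBTO comparison.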
4.1 Neural Network Training.
First, two neural networks, specifically MLPs, are trained to surrogate the material models of Sec. 3.3. To determine a suitable MLP architecture, the KerasTuner package was utilized to search the hyperparameters (e.g., NL, NN, learning rate, etc.) of the MLP using the hyperband algorithm [42]. After tuning, the optimal MLP architecture for both networks was found to be NL = 1 hidden layer with NN = 64 neurons. An optimal learning rate of 0.001 was found for the adaptive moment estimation (Adam) optimization algorithm [43]. Omitting kernel regularization was also found to produce better-performing MLPs. The hidden and output layers utilize the sigmoid and linear activation functions, respectively. The sigmoid activation function was selected over the relu activation function due to its increased smoothness. Normalization and denormalization layers are added before and after the hidden layers to improve training.
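A minimal Keras sketch consistent with the tuned architecture described above; the layer choices and the use of an inverted Normalization layer for denormalization (available in TensorFlow 2.9+) are assumptions, not the paper’s exact code:

```python
import tensorflow as tf

def build_mlp(n_inputs, n_outputs, x_train, y_train):
    """1 hidden layer with 64 sigmoid neurons and a linear output layer,
    wrapped by input normalization and output denormalization layers."""
    norm_in = tf.keras.layers.Normalization()
    norm_in.adapt(x_train)                                  # learn input mean/variance
    denorm_out = tf.keras.layers.Normalization(invert=True)
    denorm_out.adapt(y_train)                               # map scaled outputs back

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_inputs,)),
        norm_in,
        tf.keras.layers.Dense(64, activation="sigmoid"),
        tf.keras.layers.Dense(n_outputs, activation="linear"),
        denorm_out,
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model
```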
For each material model, the feature space was randomly sampled with Latin hypercube sampling to produce 10,000 feature sets. This feature set size was motivated by the need to balance MLP accuracy and training time. The targets of these feature sets were found using Eq. (14) for the DBTO method’s MLP or through numerical homogenization for the HBTO method’s MLP. Each sampled dataset is split into training, testing, and validation datasets, which are used for training with TensorFlow’s fit function. The training process, which aims to minimize the MSE between the targets and predictions of the MLP, spans 5000 epochs with a batch size of 32. Note that no derivative information on the feature sets was provided for training the MLPs.
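A minimal sketch of the sampling and training loop for the HBTO material model, using SciPy’s Latin hypercube sampler and the build_mlp helper sketched above. The homogenization routine, the normalized parameter bounds, and the split fractions are placeholders, since the exact values are not given in this excerpt:

```python
import numpy as np
from scipy.stats import qmc
from sklearn.model_selection import train_test_split

def evaluate_homogenization(x):
    """Stand-in for the numerical homogenization of [33];
    returns four stiffness-tensor entries per sample (placeholder values)."""
    return np.column_stack([1.0 - x[:, 0] * x[:, 1]] * 4)

# Latin hypercube sample of the 2D feature space (hole width and height, normalized)
sampler = qmc.LatinHypercube(d=2, seed=0)
x = qmc.scale(sampler.random(n=10_000), l_bounds=[0.0, 0.0], u_bounds=[1.0, 1.0])
y = evaluate_homogenization(x)

# illustrative 80/10/10 split (the paper's exact fractions are not stated here)
x_train, x_tmp, y_train, y_tmp = train_test_split(x, y, test_size=0.2, random_state=0)
x_val, x_test, y_val, y_test = train_test_split(x_tmp, y_tmp, test_size=0.5, random_state=0)

model = build_mlp(n_inputs=2, n_outputs=4, x_train=x_train, y_train=y_train)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=5000, batch_size=32, verbose=0)
```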
Training occurred on a workstation equipped with an 8-core Intel® Xeon® E5-2620 v4 @ 2.10 GHz and 64 gigabytes of memory. Table 1 presents the training times tt and testing performance metrics for the DBTO and HBTO MLPs with 10,000 feature sets. An additional DBTO material model is trained with 100 feature sets to study the effect feature set density has on the DBTO method’s results. The DBTO model trained with 100 feature sets has the same feature set density (i.e., 100 feature sets per input dimension) as the HBTO model trained with 10,000 feature sets. The MSE and coefficient of determination (R²) values show good agreement between the MLPs’ predictions and the testing dataset, indicating no overfitting. As expected, a larger number of feature sets is shown to further improve the performance of the DBTO MLP at an additional computational cost. The performance metrics are observed to correspond more closely between the DBTO and HBTO MLPs if the feature set density is kept consistent. The MLPs trained with 10,000 feature sets are used in the following topology optimization problems unless stated otherwise. The GitHub® repository provided in Appendix A includes the datasets, MLP models, and training codes corresponding to this section.
The training times and performance metrics for the DBTO and HBTO MLPs trained with 10,000 feature sets and the DBTO MLP trained with 100 feature sets
Metric | DBTO (10²) | DBTO (10⁴) | HBTO (10⁴)
---|---|---|---
tt | 7.23 min | 39.25 min | 40.81 min
MSE | 1.44 × 10⁻⁴ | 1.87 × 10⁻⁷ | 1.47 × 10⁻³
R² | 0.99 | 1.00 | 0.99
Note: The performance metrics include the mean squared error and coefficient of determination between the MLPs’ predictions and the testing dataset.
4.2 Density-Based Topology Optimization.
The MBB beam’s design domain Ω has a fixed support (u1,2 = 0) applied to the left side and a load (f2 = −1) applied to the bottom corner of the right side (Fig. 4(a)). Ω is discretized into 60 × 30 elements. The sensitivity filter radius is set to 2.7 elements, and a volume fraction constraint of Vt = 0.50 is assigned. The MBB beam problem is solved with the DBTO method from a uniform initial design (Fig. 4(b)) until the maximum change in the design variables is less than 10⁻³.

The (a) boundary conditions of the MBB beam and its initial designs for the (b) DBTO and (c) HBTO methods. The DBTO method’s initial design is defined by . The HBTO method’s initial design is defined by so that the microstructure of Fig. 3(a) has a square hole sized according to Vt.

The final topologies for the MBB beam problem solved with the DBTO method using the sensitivity coefficients from the four neural network-based derivative methods and the ground truth SIMP model are shown in Fig. 5. The final compliance values and the number of iterations until convergence are also provided. For all of the derivative methods, the DBTO method converged to final topologies with a similar compliance value around ϕ = 82. An additional MBB beam problem is solved using the DBTO method with the 100 feature set material model and AD sensitivity coefficients. Figure 6 compares the result of this problem with the result from Fig. 5 using AD sensitivity coefficients. The final topologies are similar, but the more accurate 10,000 feature set material model (Table 1) produced a final topology with better compliance that is closer to the ground truth SIMP result. The DBTO method’s convergence for most derivative methods was also very similar to that obtained with the sensitivity coefficients provided by the ground truth SIMP model (Fig. 7). However, the sensitivity coefficients from the CSM method resulted in a significantly different and more slowly converging optimization problem.

The final topologies for the MBB beam problem were solved with the DBTO method using the sensitivity coefficients from the four neural network-based derivative methods and the ground truth SIMP model. The final compliance and iterations till convergence are also shown.

The final topologies for the MBB beam problem were solved with the DBTO method using the material models trained with the 100 and 10,000 feature sets. Sensitivity coefficients are approximated with AD for both cases. The final compliance and iterations till convergence are also shown.

Convergence plots of the MBB beam’s compliance for the DBTO method when using sensitivity coefficients from the different neural network-based derivative methods and the ground truth SIMP model. See Fig. 5 for the corresponding final topologies.

To compare the very similar but still distinct final topologies in Fig. 5, a matrix of topology difference plots is provided in Fig. 8. Each plot corresponds to the difference between the final topologies of the column’s method and the row’s method. The MSE between the densities of each topology comparison is also provided. The smallest differences between the final topologies are seen between the NNJ, CFD, and AD methods. Due to its significantly different path to convergence, the CSM method produced the most distinct final topology. Compared to using the SIMP model, the NNJ, CFD, and AD methods all had a similar error.

The matrix of topology difference plots between all of the final topologies of the DBTO-solved MBB beam problem (Fig. 5). Each topology difference plot corresponds to the difference between the final topology of the column’s method (bottom row) and the final topology of the row’s method (left column). The MSE between the densities of each comparison (i.e., plot) is provided. Additionally, the MSE between the sensitivity coefficients of the initial design of each comparison is also provided.

All of the comparisons made so far have dealt with differences in the final topologies. The differences between these final topologies stem from initial differences in the neural network-based sensitivity coefficients that are compounded over the numerous optimization iterations. Therefore, while the comparisons between the final results of the DBTO method provide useful insight into how the derivative method influences the convergence and final topology, they do not provide insight into how the neural network-based derivative methods actually differ from one another on a functional level.
For this reason, the MSE between the sensitivity coefficients of the initial design of each topology comparison is also provided in Fig. 8. An interesting observation is that there is zero error between the NNJ, CSM, and AD sensitivity coefficients, at least near the nominal working precision of the computer. In theory, the convergence and results of the DBTO method with these coefficients should be identical; however, this is not the case due to an accumulation of numerical (truncation and round-off) errors over the optimization iterations. These three sets of equal sensitivity coefficients agree with the true sensitivity coefficients from the SIMP model. The CFD method is the only method that produces sensitivity coefficients different from the other derivative methods, most likely due to the approximation that occurs from its small finite step (h = 10−6) in Eq. (7). With respect to the sensitivity coefficients and final topologies from the DBTO method, the most consistent neural network-based derivative methods are NNJ and AD.
4.3 Homogenization-Based Topology Optimization.
The same boundary conditions, optimization settings, and convergence criteria are applied to the HBTO method as were used in the DBTO method. The MBB beam problem is solved accordingly using an initial design (Fig. 4(c)) that assumes equal parameters across the design domain Ω, which satisfy the volume fraction constraint Vt = 0.50.
The multiscale topologies for the MBB beam at 10 iterations, 100 iterations, and the final iteration (i.e., a little over 1000 iterations) are shown in Fig. 9 for the HBTO method using the sensitivity coefficients from the four neural network-based derivative methods. The compliance values for each topology are also provided. The multiscale topologies are assembled from the microstructures with rectangular holes specified by the elements’ design variables. For plotting, each microstructure is discretized by a 100 × 100 element mesh. The compliance values of the topologies at 10 iterations were equal for the NNJ, CSM, and AD methods. As before, the final topologies for all four methods converged to a similar compliance value around ϕ = 76. The convergence of the HBTO method was similar for all four methods, with the greatest similarity occurring between the NNJ and AD methods (see Fig. 10). Despite the different number of iterations for convergence between these two methods, they converged to a topology with the same compliance value ϕ = 75.70. The CFD method had the most different compliance values and convergence plots compared to the other three methods, which were all similar.

The topologies at 10, 100, and the final iteration for the MBB beam problem solved with the HBTO method using the sensitivity coefficients from the four neural network-based derivative methods. The final compliance and iterations till convergence are also shown. Parameters within a small tolerance of their bounds are set to the bounds to avoid very small holes or thin members.


Convergence plots of the MBB beam’s compliance for the HBTO method when using sensitivity coefficients from the different neural network-based derivative methods. See Fig. 9 for the corresponding final topologies.

As with the DBTO results, a matrix of final topology difference plots is provided in Fig. 11. The topology difference plot is computed between the assembled multiscale topologies, not the final parameters used to create the topologies. However, an MSE is computed between the parameters for each topology comparison. The smallest difference between the final topologies is seen between the NNJ, CSM, and AD methods. As seen in the DBTO method, the derivative method with the most different convergence plots had the most different final topology (i.e., the CFD method).

The matrix of topology difference plots between all of the final topologies of the HBTO-solved MBB beam problem (Fig. 9). Each topology difference plot corresponds to the difference between the final topology of the column’s method (bottom row) and the final topology of the row’s method (left column). The MSE between the parameters of each comparison is provided. Additionally, the MSE between the sensitivity coefficients of the initial design of each comparison is also provided.

As before, the MSE between the sensitivity coefficients of the initial design of each topology comparison is provided in Fig. 11. Here, we observe the same phenomenon seen in the DBTO method, namely, that the sensitivity coefficients resulting from NNJ, CSM, and AD are equal. As previously mentioned, the convergence and final results of the HBTO method are different due to an accumulation of numerical errors over the optimization iterations. The effect of this error accumulation can be seen in Fig. 9, where the compliance values for the NNJ, CSM, and AD methods are no longer equal after 100 iterations. The two most consistent neural network-based derivative methods are the NNJ and AD methods when considering the final compliance value, convergence plots, initial sensitivity coefficients, and similarity in final design variables. For further reference, the average computational time needed to evaluate the sensitivity coefficients per optimization iteration is reported in Table 2 for the four neural network-based derivative methods. On average, the fastest derivative method is CSM, while the slowest derivative methods were NNJ and AD for the DBTO and HBTO methods, respectively. That said, the implementations of the four derivative methods (see Appendix A) have not been optimized for efficiency, and therefore, the reported times are not definitive regarding which method is fastest.
The average time spent computing sensitivity coefficients at each optimization iteration for the four neural network-based derivative methods
Derivative method | DBTO | HBTO
---|---|---
NNJ | 2.741 s | 2.988 s
CFD | 0.342 s | 0.682 s
CSM | 0.088 s | 0.180 s
AD | 2.568 s | 8.553 s
Note: The computational cost is greater for the HBTO method due to having double the number of design variables. The times are measured on the same workstation specified in Sec. 4.1.
5 Conclusion
This paper implements and compares four neural network-based derivative methods utilized to produce the sensitivity coefficients for two traditional topology optimization methods. The derivative methods studied include NNJ, CFD, CSM, and AD. The topology optimization applications include DBTO and HBTO. For each topology optimization method, a neural network is trained to approximate the material model of the method. For DBTO, a one-dimensional input and one-dimensional output MLP is trained to approximate the SIMP model’s interpolation of Young’s modulus. For HBTO, a two-dimensional input and four-dimensional output MLP is trained to approximate the homogenized and orthotropic stiffness tensors of microstructures with rectangular holes. These neural network material models are implemented in the topology optimization methods to predict the material properties and their corresponding sensitivity coefficients at each iteration. A general implementation of the four neural network-based derivative methods for an arbitrary MLP architecture has also been included for further reference.
The single-layer MLPs for both material models were tuned, trained, and tested using only sampled data from the models, i.e., no derivative information was provided. The resulting networks’ predictions agreed well with the material models on the testing datasets. After training, the comparative study is carried out by solving the MBB beam problem with the DBTO and HBTO methods using sensitivity coefficients supplied by each derivative method. Comparisons are made between the initial sensitivity coefficients, convergence plots, final topologies, compliance values, and design variables.
Several key observations were made from the comparative studies of both topology optimization applications. First, all four of the derivative methods produced final topologies and design variables that were similar to one another (e.g., MSE ≤ 7.58 × 10⁻²). Second, the derivative methods that produced the most different final topologies and design variables had the most different convergence paths. These derivative methods are CSM and CFD for the DBTO and HBTO, respectively. Third and most surprisingly, the NNJ, CSM, and AD methods produced the exact same sensitivity coefficients for the initial design of the MBB beam, at least near the nominal working precision of the computer. The CFD method produced similar sensitivity coefficients, with a small MSE relative to the other methods. Lastly, the NNJ and AD methods were the most similar methods regarding their convergence paths, final topologies, compliance values, and sensitivity coefficients. As such, these two derivative methods are recommended for optimization applications.
In the context of topology optimization methods with neural network material models, the NNJ, CSM, and AD methods are all capable of approximating the problem’s sensitivity coefficients with a well-trained MLP, regardless of the number of input and output dimensions. These findings differ slightly from our preliminary results [44], which indicated that all of the derivative methods were capable of approximating a network’s derivative to near working precision of the computer; however, that study was done in the context of networks trained to predict one-dimensional analytical functions with a single variable input. Additionally, a more accurate MLP material model was shown in the DBTO method to produce final topologies that were better performing and closer to the ground truth. Lastly, this work demonstrates that neural network-based sensitivity coefficients are sufficient for density-based and homogenization-based topology optimization methods.
From the results of this study, several recommendations can be made regarding neural network-based derivative methods. In most cases, NNJ and AD are the primary derivative methods recommended as they had the lowest error in sensitivity coefficients and showed the closest agreement to one another, even over numerous optimization problems. More specifically, if a software package natively supports AD, it is recommended as its utilization is significantly simpler than implementing NNJ. On the other hand, if AD is not supported, NNJ is recommended as it is far easier for an end-user to implement into an existing software package or an in-house code. The other derivative methods (CFD and CSM) are valuable when computing the analytical derivative through the network involves discontinuous activation functions or when access to the network’s hyperparameters and other details is restricted, e.g., when interfacing with third-party networks. In particular, in the case of holomorphic activation functions, CSM is preferred over CFD.
The trust region algorithm utilized in this work converges slowly in the topology optimization problems studied. Consequently, a large number of iterations are required to solve each problem. This is most noticeable in the HBTO method, where certain parameters (e.g., rectangular hole height) near the boundary result in poorly connected microstructures. Additionally, since a ground truth comparison was unavailable for the HBTO method, the conclusions derived from Sec. 4.3 are based only on the numerical approximations provided by the four derivative methods. Lastly, this work employs dense MLPs as the neural network architecture for the comparative study. Therefore, any conclusions from this paper regarding neural network-based derivatives should be taken with this architecture in mind.
Acknowledgment
This paper is based upon work supported by the National Science Foundation, Candent Technologies, and the U.S. Naval Air Systems Command. The authors gratefully acknowledge their support. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of our supporters.
Funding Data
National Science Foundation (Grant No. 1842164)
U.S. Naval Air Systems Command (STTR Contract No. N68335-22-C-0448)
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The data and information that support the findings of this article are freely available.2
Appendix A: python Codes
This work was primarily developed and deployed in python. To aid the reader in reproducing our results, the source code is provided in the following GitHub® repository.3 The repository includes the MLP models utilized in this work and their training scripts, the DBTO and HBTO methods, and the implementation of the four neural network-based derivative methods for an arbitrary MLP architecture. The specific files and their functions are:
NN_derivatives_examples.py: This script provides a general implementation of the four derivative methods of Sec. 2 with several multivariate examples.
Train_DBTO_NN.py: This script trains the neural network material model for DBTO.
Train_HBTO_NN.py: This script trains the neural network material model for HBTO.
Run_DBTO_NN.py: This script executes DBTO on the MBB beam example using the neural network material model.
Run_DBTO_SIMP.py: This script executes DBTO on the MBB beam example using the SIMP material model.
Run_HBTO_NN.py: This script executes HBTO on the MBB beam example using the neural network material model.
Appendix_B_example.py: This script trains the MLP and performs the NNJ calculations in Appendix B.
Appendix B: Example of Neural Network Jacobian Calculations

Architecture of the multilayer perceptron feedforward neural network used to approximate y = x2 in the Appendix B example
