## Abstract

This work describes neural network surrogate models for calculating the effective mechanical properties of periodic composites. The models achieve good accuracy even when provided with training data sampling only a small portion of the design space. As an example, the surrogate models are applied to the inverse design problem of finding structures with optimal mechanical properties. The surrogate models are sufficiently accurate to recover optimal solutions in general agreement with established topology optimization methods. However, improvements will be required to develop robust, efficient neural network surrogate models, and several directions for future research are highlighted here.

## 1 Introduction

Convolutional neural networks (CNNs) have been widely and successfully applied to image recognition problems, identifying or categorizing images based on raw pixel data [1–5]. This work applies CNNs to the homogenization of periodic microstructures. The goal is a fast surrogate model for calculating the effective properties of periodic structures. Formally, this work casts the identification of effective mechanical properties of periodic structures as an image regression problem—given a description of the structure as a binary bitmap image, the model returns the mechanical properties of the homogenized, periodic system.

Neural networks are composite functions organized into a logical sequence of layers. At the start of the sequence, a neural network applies a layer of functions to the set of input variables, here a description of a periodic cell structure. Subsequent layers take as input the output of some of the functions in the previous layer. Finally, the last layer in the network maps to the output data, here the effective mechanical properties of the periodic cell. The type of function and the connection between layer outputs and inputs defines the neural network topology. One reason for the increased popularity of neural networks in recent years is that modern GPUs can efficiently fit them to data using stochastic gradient descent algorithms. Similarly, once trained, GPUs can quickly evaluate the resulting neural network model. This paper describes a deep neural network surrogate model consisting of many interconnected layers of neurons. For more details of deep neural networks, see Ref. [6] among many other recent surveys.

The current work uses CNNs to represent the effective, homogenized response of a periodic mechanical system. Similar past work includes Papadrakakis et al. [7] who used neural networks as surrogate models for simple, parameterized structures. This work was later extended to the optimization of frame structures [8–11]. These early studies severely limited the number of input variables by heavily parameterizing the structural geometry and loading conditions, likely reflecting the limited computational resources available for training models at the time. Unger and Könke [12] used neural networks to homogenize the response of a mechanical system to a higher scale. However, they did not take advantage of the structure of the governing equations by using a convolutional network topology. Le et al. [13] used neural networks to homogenize nonlinear elastic composites; however, they used a simplified analysis on the mesoscale that did not allow arbitrary topologies.

In addition to quantifying the accuracy of the surrogate modeling approach, this work applies the trained surrogate models to solve an example inverse problem. Design and optimization by surrogate modeling is a well-known technique [14,15]. The CNN surrogate approach, given sufficient training data, produces optimal structures with similar mechanical properties to those produced using SIMP topology optimization methods [16]. While sufficiently accurate to solve this sample inverse problem, additional work will be required to develop deep neural networks that can accurately represent the results of mechanics problems from sparse training data. Several directions for future research are highlighted in the conclusions below.

## 2 Data Set, Convolutional Neural Network Topology, and Training

### 2.1 The Example Problem.

The example problem considered in this work is the homogenization of a 2D, plane stress, periodic structure with square symmetry. Figure 1 shows a single unit cell. The structure of interest is the infinite periodic tiling of the single cell.

The methods described here consider a discretized unit cell that divides the cell into regular, square regions of materials that this work refers to as pixels. Each pixel can either be isotropic solid material with elastic properties Young’s modulus *E* = 1000 and Poisson’s ratio ν = 0.3—colored black in Fig. 1 and subsequent diagrams—or be void—colored white in the diagrams. The number of pixels along each edge of the square unit cell, 2*N*, describes the size of the discretization. The unit cell is then a binary bitmap image with black pixels representing material and white pixels representing void.

Because of the square symmetry of the unit cell, a boolean vector **p** represents each structure, where **p** has length given by the geometric relation *n*_{diag} = *N*(*N* + 1)/2. If *p*_{i} = 1, the corresponding pixel is solid material; if *p*_{i} = 0, the corresponding pixel is void. The operation *Q*(**p**) takes a vector **p** to the *N* × *N* boolean matrix representing the upper-left quadrant of the square unit cell. The operation *F*(**p**) takes a vector **p** to the 2*N* × 2*N* boolean matrix representing the full, periodic structure. The structural property of interest is the effective Young's modulus *E*.

This work considers two methods for evaluating the stiffness of a particular structure **p**: a finite element simulation *E*(**p**) and a surrogate model *E*′(**p**). The surrogate model is a convolutional neural network trained on a large collection of data from finite element simulations. The finite element method takes the full unit cell *F*(**p**) and creates a 2D mesh representing the cell geometry. The FE model uses a regular mesh of the cell, including void regions, applying the bulk material properties to the solid regions and soft material properties *E* = 10^{−3} and ν = 0.3 to the void regions. The soft void material prevents singular structures, so the FE model can return effective properties for any arbitrary structure **p**. The effective Young's modulus of the structure can be deduced by applying a normal stress to the cell, calculating the resulting normal strain, and taking the ratio.
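As a concrete illustration, the mapping from **p** to the quadrant and the full periodic cell can be sketched in a few lines of NumPy. The exact ordering of the entries of **p** is not specified in the text, so this sketch assumes a row-major upper-triangle ordering with the quadrant symmetric about its diagonal (consistent with the length *n*_{diag} = *N*(*N* + 1)/2):

```python
import numpy as np

def quadrant(p, N):
    """Q(p): the N x N upper-left quadrant (assumed symmetric about its diagonal)."""
    Q = np.zeros((N, N), dtype=bool)
    Q[np.triu_indices(N)] = p      # fill the upper triangle from p
    return Q | Q.T                 # mirror to make the quadrant symmetric

def full_cell(p, N):
    """F(p): the 2N x 2N periodic unit cell, built by reflecting the quadrant."""
    Q = quadrant(p, N)
    top = np.hstack([Q, np.fliplr(Q)])
    return np.vstack([top, np.flipud(top)])
```

For *N* = 2, for example, **p** has length 3 and the resulting 4 × 4 cell inherits the square symmetry by construction.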

This study considers this problem with two discretizations: a small grid using *N* = 10 (*n*_{diag} = 55) and a large grid using *N* = 50 (*n*_{diag} = 1275). For each case, the full training database consists of 2,000,000 finite element evaluations of random structures uniformly sampling the design space. For the large grid, the total size of the finite design space is 2^{1275} ≈ 10^{383}, so the database of 2 × 10^{6} simulations samples only a very small fraction of the complete design space.
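The sampled fraction can be checked with a quick log-space computation (working in logarithms, since 10^{383} overflows double-precision floating point):

```python
import math

# N = 50: the design space has 2**1275 configurations.
log10_space = 1275 * math.log10(2)          # ~383.8, so 2**1275 is roughly 10**383
log10_frac = math.log10(2e6) - log10_space  # ~-377.5: the database samples ~10**-378 of the space
```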

Each finite element simulation was completed using WARP3D,^{1} an open-source FEA package, in serial on a single core of an Intel^{®} Xeon^{®} E5-2698 CPU. Each individual simulation is independent of the others, so the database was generated by running individual simulations in parallel across the 20 cores of the processor. Generating the complete database for the large grid took just over 115 h of wall time.

### 2.2 The Convolutional Neural Network Surrogate Model.

The network topology was determined by grid search hyperparameter optimization. Figure 2 shows the final topology arrived at through this process and the general framework used in the hyperparameter study. The form of the network is standard for image recognition problems [1,2]: alternating layers of convolutional filters, convolving the structure over the indicated window with a stride of one and applying a ReLU activation function, each followed by a max pool layer to reduce the spatial dimension of the data. Before each convolution, the images are periodically padded to handle the parts of the convolution windows extending outside the image at the boundaries. After the convolutional filters, the model flattens the image and applies a fully connected dense layer, again with ReLU activation functions. During training, a dropout filter is applied to help prevent overfitting. A final fully connected dense layer reduces the data to the single output—the effective modulus of the structure. This network was implemented in the Keras [17] framework using the TensorFlow [18] backend to train and evaluate the model on a single NVIDIA^{®} Quadro^{®} M6000 GPU with 24 GB of memory.
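The periodic padding step is worth making concrete, since it is not a built-in convolution padding mode in Keras. A minimal NumPy sketch, assuming a pad width of `w // 2` for a window of size `w`:

```python
import numpy as np

def periodic_pad(img, pad):
    """Wrap-around padding: a 'valid' convolution at the boundary then
    sees the pixels from the opposite edge of the periodic cell."""
    return np.pad(img, pad, mode="wrap")

# A 5 x 5 window (pad = 2) applied to a 2N x 2N cell slides over the
# padded (2N + 4) x (2N + 4) image with no edge effects.
```

In a Keras implementation, this padding could sit before each `Conv2D(..., padding="valid")` layer, for example wrapped in a `Lambda` layer.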

Within this general network framework, specific layer parameters were determined by brute force hyperparameter optimization. The complete set of 2,000,000 data points was divided into three categories: test (20% of the total), validation (20% of the remaining 1.6 million), and training (the remainder). Each potential model in the hyperparameter study trains against the training set and its accuracy is assessed against the validation set. Table 1 lists the different parameters and features considered in the hyperparameter study. All combinations of the different options in the table were tested with a grid search. Each network was trained using the Adam stochastic optimizer [19] over 25 epochs with a batch size of 1024. The different hyperparameter options were compared using the validation data set, and the best combination was selected for the final network topology shown in Fig. 2.

| Option | Choices |
|---|---|
| Padding | Symmetric, zero |
| First convolution layer window size | 5, 3 |
| Second convolution layer window size | 5, 3 |
| Third convolution layer window size | 5, 3 |
| First convolution layer depth | 16, 32 |
| Second convolution layer depth | 16, 32 |
| Third convolution layer depth | 16, 32 |
| Dropout parameter | 0.25, 0.5, 0.75 |

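The grid itself is simple to enumerate; a sketch of the candidate set from Table 1 (training and scoring each candidate on the validation set is elided):

```python
from itertools import product

# Hyperparameter grid from Table 1.
grid = {
    "padding": ["symmetric", "zero"],
    "window_1": [5, 3], "window_2": [5, 3], "window_3": [5, 3],
    "depth_1": [16, 32], "depth_2": [16, 32], "depth_3": [16, 32],
    "dropout": [0.25, 0.5, 0.75],
}

# Every combination is trained for 25 epochs and compared on the validation set.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
```

This gives 2 × 2³ × 2³ × 3 = 384 candidate networks to train and compare.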

The discussion below studies the effect of reducing the size of the training database on the accuracy of the surrogate model. Using the large grid case as a timing study, the cost of one forward model evaluation of the trained CNN is 7.5 × 10^{−5} s while the cost of one forward evaluation of the FEA model is 0.207 s. So, once trained, the CNN is three orders of magnitude faster than the direct evaluation. The time required to train the CNN over the full data set is about 3 min. Therefore, the cost of developing the surrogate model is essentially entirely in generating training data (115 h).
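The quoted speedup follows directly from the timings; a rough break-even count (an illustration, not a figure from the text) also falls out of the same numbers:

```python
t_fea = 0.207    # seconds per finite element evaluation
t_cnn = 7.5e-5   # seconds per trained-CNN evaluation

speedup = t_fea / t_cnn                     # ~2760: about three orders of magnitude
# Evaluations needed before the ~115 h of data generation pays for itself:
n_breakeven = 115 * 3600 / (t_fea - t_cnn)  # ~2e6 evaluations
```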

As the model trains using a stochastic optimization process, each run through the fitting process produces a slightly different model. The fitting process was therefore repeated 10 times for both the *N* = 10 and *N* = 50 cases in order to assess the reproducibility of the approach.

## 3 Results

### 3.1 Surrogate Model Accuracy.

Figure 3 plots the accuracy of the surrogate model as a function of the training database size ((*a*): mean squared error $R = (1/n_{\mathrm{test}}) \sum_{i=1}^{n_{\mathrm{test}}} \left( (E_i - \hat{E}_i)/E_0 \right)^2$; (*b*): maximum absolute error $A = \max_i |E_i - \hat{E}_i|/E_0$, where *E*_{i} are model predictions, $\hat{E}_i$ are finite element evaluations, and *E*_{0} is the modulus of the solid material). About 20% of the total database (400,000 simulations) was reserved for testing. Subsets of the remaining 80% of the database were used to train and validate the CNN; for each reduced database, 80% of the data was used for training and 20% for validation. Increasing the size of the database decreases the error between the surrogate and the direct simulation results, measured either globally or locally.

The figure shows the mean error for 10 models constructed using 10 repetitions of the training process, discussed previously, as well as the error for the worst (maximum error) and best (minimum error) model for each case. There is very little difference between the average response of 10 models, the responses of the most and least accurate models, and the response of a randomly selected model, which suggests that the training process is reproducible.

The surrogate model trained over the smaller *N* = 10 domain was somewhat less accurate than the one trained over the larger *N* = 50 domain for smaller training databases. This somewhat counter-intuitive result is discussed below.
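The two error measures defined for Fig. 3 are straightforward to compute; a NumPy sketch:

```python
import numpy as np

def mse_error(E_pred, E_fe, E0):
    """R: mean squared error of the predictions, normalized by the solid modulus E0."""
    return np.mean(((np.asarray(E_pred) - np.asarray(E_fe)) / E0) ** 2)

def max_abs_error(E_pred, E_fe, E0):
    """A: worst-case absolute error, normalized by E0."""
    return np.max(np.abs(np.asarray(E_pred) - np.asarray(E_fe)) / E0)
```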

### 3.2 Optimization Through the Surrogate Model.

The example optimization problem seeks the structure **p** that maximizes the surrogate modulus *E*′(**p**) subject to a relative density constraint and a perimeter constraint, where *P* is the perimeter of the solid part of the structure. Here, *E*′ is evaluated using a randomly selected trained surrogate model from the group of 10 models trained over the largest database. The perimeter constraint is required to regularize the problem. The specific example here imposes a 50% relative density constraint on the general problem and a perimeter constraint of *P*_{0} = 25 for *N* = 10 and *P*_{0} = 250 for *N* = 50, in units of pixel edge lengths.
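The text does not define the perimeter measure precisely; one plausible implementation counts solid–void pixel edges with periodic wrap-around, in units of pixel edge lengths:

```python
import numpy as np

def perimeter(cell):
    """Hypothetical perimeter measure: the number of solid-void edges in the
    periodic cell, counted with wrap-around at the boundaries."""
    c = np.asarray(cell, dtype=bool)
    horizontal = c ^ np.roll(c, -1, axis=1)  # edges between column neighbors
    vertical = c ^ np.roll(c, -1, axis=0)    # edges between row neighbors
    return int(horizontal.sum() + vertical.sum())
```

Under this definition a single isolated solid pixel contributes a perimeter of 4, and a fully solid cell has perimeter 0.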

This work solves the optimization problem with genetic algorithm (GA) optimization. The GA uses standard tournament selection with tournament size *t*_{s}, two-point crossover applied to sequential pairs of individuals with probability *p*_{cx}, and a binary bit-flip mutation operator applied to each individual with probability *p*_{mut}; if an individual is chosen for mutation, each bit in **p** is flipped with probability *p*_{bit}. As with the training data set, each individual in the initial population is generated by first randomly selecting the number of bits to be true, *r*, and then randomly choosing *r* entries of **p** to set to true.
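The GA operators described above are standard; a compact sketch in plain Python (parameter names follow the text, and the fitness evaluation is elided):

```python
import random

def tournament(pop, fit, t_s):
    """Tournament selection: return the fittest of t_s randomly drawn individuals."""
    picks = random.sample(range(len(pop)), t_s)
    return pop[max(picks, key=lambda i: fit[i])]

def two_point_crossover(a, b):
    """Swap the segment between two random cut points of a sequential pair."""
    i, j = sorted(random.sample(range(len(a) + 1), 2))
    a[i:j], b[i:j] = b[i:j], a[i:j]
    return a, b

def mutate(p, p_bit):
    """Flip each bit independently with probability p_bit."""
    return [bit ^ (random.random() < p_bit) for bit in p]

def random_individual(n):
    """Pick a random count r of solid pixels, then choose r positions for them."""
    p = [0] * n
    for i in random.sample(range(n), random.randrange(n + 1)):
        p[i] = 1
    return p
```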

Table 2 lists the optimization parameters for the examples below. These parameters were selected to achieve convergence to a stable structure in less than 50 generations. Beyond that criterion, the GA was not tuned to achieve optimal performance. A simple study was performed to test the GA: over the 10 × 10 domain the GA, when run using direct FE evaluations of the objective function, converges to a correct solution in the prescribed 50 generations. This structure is the same one found using the surrogate model in Fig. 4(a) for *N* = 10. This test does not directly demonstrate the effectiveness of the GA over the larger 50 × 50 domain, but it at least suggests that the optimization errors over this larger domain, discussed below, are caused by the surrogate model and not the GA.

| Parameter | Description | Value |
|---|---|---|
| *n*_{pop} | Population size | 1000 |
| *n*_{gen} | Number of generations | 50 |
| *t*_{size} | Tournament size | 3 |
| *p*_{cx} | Crossover probability | 0.5 |
| *p*_{mut} | Mutation probability | 0.25 |
| *p*_{bit} | Bit flip probability | 0.05 |


The GA enforces the two constraints through penalty functions, where *c*(**p**) is the constraint function, *c*_{0} is the constraint value, and *b* and *k* are the penalty parameters. Each penalty is subtracted from the objective function in defining the GA fitness. Separate penalty parameters were selected for each of the two constraints (*b* = 10 and *k* = 1000 for the density constraint; *b* = 10 and *k* = 1000 for the perimeter constraint).
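The exact penalty expression is not reproduced here; the sketch below assumes a common linear-plus-quadratic form in the constraint violation, consistent with the two parameters *b* and *k* given in the text (an assumption, not the paper's verbatim equation):

```python
def penalty(c, c0, b, k):
    """Hypothetical penalty: linear + quadratic in the constraint violation."""
    violation = abs(c - c0)
    return b * violation + k * violation ** 2

def ga_fitness(E_surrogate, density, perim, N=50):
    """Fitness = objective minus penalties (b = 10, k = 1000 for both constraints)."""
    p0 = 250.0 if N == 50 else 25.0
    return (E_surrogate
            - penalty(density, 0.5, b=10, k=1000)
            - penalty(perim, p0, b=10, k=1000))
```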

Figure 4(a) shows the resulting optimal structure for both the small and large optimization domains. It is a square lattice, which is known to be the correct result [20]. To validate the GA/surrogate method, Fig. 4(b) shows the results of applying the SIMP optimization method [16,21] to the same problems. The SIMP problem was solved in a custom Python code based on the standard 88 line code [22], implemented in open-source Python by Ref. [23] and modified to handle the symmetry constraint and periodic boundary condition. The two solutions are in reasonable agreement, allowing for the randomness introduced into the surrogate model optimization by the stochastic training and optimization methods and for differences in the optimization techniques.

## 4 Discussion and Conclusions

### 4.1 Convergence.

Figure 3 demonstrates that the accuracy of the surrogate models increases with the size of the training database. However, there are diminishing returns to adding more training data—eventually large increases in the database size produce only small increases in the surrogate model accuracy. In the limit of providing a complete training database covering each configuration in the design space, the CNN should be perfectly accurate—all it would have to do is index a design to the corresponding mechanical property. However, even for the *N* = 10 case, 2 × 10^{6} training simulations only sample a tiny fraction of the design space (2 × 10^{6}/2^{55} ≈ 6 × 10^{−11}). Viewed in this context, the CNN surrogate models are remarkably accurate given the relatively small amount of training data provided.

Figure 3 shows that the *N* = 50 model is actually more accurate than the *N* = 10 model for small training database sizes, though both models are comparably accurate for larger training databases. The amount of data in the training database scales linearly with the number of design variables but the number of neural network weights and biases does not—the *N* = 10 model has 21,505 network parameters whereas the *N* = 50 model has 93,185 parameters. So, relative to the number of design variables, the *N* = 50 case actually has fewer parameters to train per design variable than the *N* = 10 case ($p_{10} = 21505/55 \approx 391$; $p_{50} = 93185/1275 \approx 73.1$), which may explain why this model is more accurate for smaller training data sets.
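The parameter-per-design-variable ratios quoted above follow from the counts in the text:

```python
n_params = {10: 21505, 50: 93185}   # network parameters for N = 10 and N = 50
n_design = {10: 55, 50: 1275}       # design variables, n_diag = N(N + 1)/2

ratio = {N: n_params[N] / n_design[N] for N in n_params}
# ratio[10] is about 391; ratio[50] is about 73.1
```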

### 4.2 Choice of Network Topology.

CNNs were originally inspired by the structure and methods of image recognition in the visual cortex [24] and have been successfully applied to a wide range of image recognition problems (cf. Refs. [1,2]). The convolutional topology combines output from small regions of adjacent structure—in the final network topology used here, 5 × 5 and 3 × 3 regions of adjacent pixels. The idea underlying such networks is to train the model to recognize features—local patterns with support over small regions of the image. The idea of local support is fundamental: solid mechanics problems are of this local type. For example, solutions to classical elasticity problems depend only on fourth derivatives of a stress function and, more generally, the stress field depends only on the gradient of the displacements, i.e., the strains, and not on the displacement field directly.

The structural mechanics problem underlying the training database is local, but the actual database consists of homogenized information describing an average property of the complete unit cell. This kind of problem is non-local. For example, a homogenized quantity, such as the average cell Young's modulus, can be described as a weighted integral over the cell volume. A fully connected dense layer of neurons can describe this kind of volume integral over the whole structure: in a dense layer, every neuron is connected to every neuron in the adjacent layers. Strictly, this layer is over-connected for homogenization alone—homogenization requires only local-to-global communication, filtering the results or features of local regions into a global average. However, the dense layers in the final network topology serve two purposes: homogenization and feature recognition. The dense layers not only average properties but also learn which features from the convolutional layers should be combined to represent the simulation results.

However, the surrogate models, particularly models fit with smaller training databases, can conflate a structure and its inverse—i.e., the structure where the solid material in the original structure is void and the void material is solid. A potential cause of this failure is the CNN topology itself, which is designed to recognize features based on contrast. This means, for example, that CNNs can be fooled by optical illusions that also fool the human visual system [25]. This surrogate model failure points to the need for specialized network topologies designed specifically for mechanics problems. One promising area of future work is developing network topologies that respect physical constraints. In this example, the network structure was developed to respect the physical square symmetry of the problem. Other work has developed CNNs that are generically SO(3) invariant [26], which could be directly relevant to mechanics problems. Physics-based networks have been applied successfully in other areas of science [27,28]. Potentially, specialized network topologies could reduce the required size of the training database and make the surrogate modeling method more generally applicable to problems with sparse data. Additionally, physically constrained topologies at least guarantee that the heuristic surrogate model produces physically reasonable results.

### 4.3 Optimization.

Figure 4 demonstrates that the surrogate models have sufficient predictive power to approximate known solutions when solving inverse problems. The choice of problem definition and the selection of an optimization method are orthogonal to the development of the deep neural network surrogate model: different optimizers and problem definitions could be explored while retaining the basic idea of optimization via a CNN surrogate model. The example here uses a hard 0/1 material definition and a genetic algorithm solver. Potentially, the results obtained here could be improved by an alternate representation of the problem or by a different optimization technique. Deep neural networks are, by construction, continuous functions, and so gradient-based optimizers could be applied. An efficient method for computing the appropriate sensitivities would need to be derived, possibly based on the backpropagation algorithm used to train the network parameters.

### 4.4 Costs and Benefits of Surrogate Modeling.

The computational cost of generating the surrogate model is dominated by the cost of building the training database—here, running a large number of FE simulations. Once the model is trained, the forward surrogate model is much faster than the finite element simulation. The cost of generating the training data can be amortized if the same surrogate model can be used in multiple applications. The surrogate approach is therefore best suited to problems of wide interest or common in engineering practice, particularly problems that can be relatively easily parameterized. Any surrogate modeling approach remains a heuristic, as the exact representation of the forward problem is replaced by the inexact surrogate. Additional work on quantifying the influence of the surrogate model accuracy on the final solution to the problem of interest is required to develop robust methods.

### 4.5 Summary and Future Work.

The key results of this study are as follows:

- A convolutional neural network can serve as a fast surrogate model for slower, fully resolved simulations predicting the mechanical properties of periodic composites.
- The resulting surrogate models are sufficiently accurate to solve inverse problems.
- The main barrier to the wider use of neural network surrogate models is collecting or generating sufficient training data.

This study considered a simple mechanical problem, and improvements to the approach will be required to develop a robust surrogate modeling method. The previous discussion highlights three areas where additional work is required:

- Improved network topologies specialized for mechanics problems, which could improve surrogate model accuracy and reduce the size of the required training database.
- Improved methods for interpreting the results and for quantifying and controlling the accuracy of the heuristic surrogate models.
- Improved optimization methods, for example, an efficient scheme to calculate the gradient of the surrogate model so that gradient-based optimization methods could be applied.


## Acknowledgment

This research was sponsored by the U.S. Department of Energy, under Contract No. DE-AC02-06CH11357 with Argonne National Laboratory, managed and operated by UChicago Argonne LLC. The author thanks Prasanna Balaprakash and Sam Sham for providing feedback on early drafts of the manuscript.