## Abstract

This work investigates surrogate modeling techniques for learning to approximate a computationally expensive function evaluation of 3D models. While in the past, 3D point clouds have been a data format that is too high dimensional for surrogate modeling, by leveraging advances in 3D object autoencoding neural networks, these point clouds can be mapped to a one-dimensional latent vector. This leads to the fundamental research question: what surrogate modeling technique is most suitable for learning relationships between the 3D geometric features of the objects captured in the encoded latent vector and the physical phenomena captured in the evaluation software? Radial basis functions (RBFs), Kriging, and shallow 1D analogs of popular deep 2D image classification neural networks are investigated in this work. We find the nonintuitive result that decoding latent representations of 3D objects into performance predictions with a non-neural-network surrogate is far more efficient than using a neural network decoder. In test cases using datasets of aircraft and watercraft 3D models, the non-neural network surrogate models achieve comparable accuracy to the neural network models. We find that an RBF surrogate model is able to approximate the lift and drag coefficients of 234 aircraft models with a mean absolute error of 1.97 × 10^{−3} and trains in only 3 seconds. Furthermore, the RBF surrogate model is able to rank a set of designs with an average percentile error of less than 8%. In comparison, a 1D ResNet achieves an average absolute error of 1.35 × 10^{−3} in 38 min for the same test case. We validate the comparable accuracy of the four techniques through a test case involving 214 3D watercraft models, but we also find that the distribution of the performance values of the data, in particular the presence of many outliers, has a significant negative impact on accuracy.
These results contradict a common perception of neural networks as an efficient “one-size-fits-all” solution for learning black-box functions and suggest that even within systems that utilize multiple neural networks, potentially more efficient alternatives should be considered for each network in the system. Depending on the required accuracy of the application, this surrogate modeling approach could be used to approximate an expensive simulation software, or if the tolerance for error is low, it could serve as a first pass that narrows down the number of candidate designs to be analyzed more thoroughly.

## 1 Introduction

Performance evaluation is a critical component of the design process as it enables designers to receive feedback with respect to performance criteria for candidate designs. However, because building and testing every candidate design is too expensive, computer simulations that approximate the test environment are typically used as an initial performance evaluation that is less expensive than building and testing. When the test environment is sufficiently complex, even computer simulation can prove to be prohibitively costly for a large number of candidate designs. For example, computational fluid dynamics (CFD) is notorious for being computationally expensive. One CFD calculation can take days or even weeks to complete, even with modern computing power [1,2].

To mitigate the challenges of simulation, a further approximation of the testing environment is made by constructing a surrogate model that approximates the simulation environment. This surrogate model typically uses function approximation techniques such as polynomial regression, splines, radial basis functions (RBFs), neural networks (NNs), support vector machines, or Kriging (also called spatial correlation modeling) to approximate the simulation output [3].

Surrogate models are typically able to approximate a simulation with a high degree of accuracy such that they can serve as a substitute for the simulation. However, these existing surrogate modeling techniques often rely on a parameterized design space in which a complex 3D shape is distilled down to dozens or fewer design variables. The purpose of design parameterization is to characterize parameters that control the shape of a geometric model [4]. This frees the designer from the traditionally laborious task of directly manipulating the design geometry. Figure 1(a) shows a parameterized surrogate model, in which a set of human-defined parameters define the design space. This is contrasted in Fig. 1(b), in which the design space is instead represented by a set of points.

Design parameterization is a laborious task that requires a high degree of domain expertise, and design of methods to optimize parameterizations for specific applications is the subject of research in itself [4,6,7]. This makes it prohibitive to use parameterized surrogate modeling techniques when given a database of 3D shapes that are not parameterized such as in online shape repositories like ShapeNet [8] or ModelNet40 [9]. Additionally, the advent of deep generative neural networks (DGNNs) allows the direct generation of non-parameterized 3D designs [10,11]. These DGNN approaches to design have not used objective performance evaluation because of the difficulty in evaluating their large datasets of 3D models.

One popular approach to 3D object generation is autoencoding [12–14]. The idea is to take a 3D point cloud and learn to map it into a latent vector that preserves geometric information about the object. Then, in the case of PointNet [12], features from this latent vector are extracted for classification, and in the case of AtlasNet [13], the latent vector is “decoded” back into the original object. While no notion of performance is embedded in this training process, we propose a surrogate model that can be thought of as replacing the decoders in each of these approaches with one that decodes the latent vector into a performance vector instead of back into a 3D object. This latent vector is typically of length 1024, and it represents a learned parameterization of the 3D space that can be difficult for a designer to intuitively describe. However, the dimensionality reduction of the 3D object provided by the autoencoder allows non-parameterized surrogate modeling techniques to be employed on data in this format to learn form-function relationships. Once trained, a surrogate model would allow for a large number of 3D models to be evaluated in seconds. In cases where the tolerance for error is very low, this approach can serve as a first pass estimate of the performance criteria and eliminate the need to do expensive simulation for a portion of the candidate designs. We postulate that this approach would be especially well-suited to evaluating designs from a non-parameterized generative model, such as a DGNN, as it would allow for many designs to be sampled without incurring the cost of evaluating every design in an expensive simulation environment.

Given that the form of a 3D object can be represented as a 1D latent vector of geometric features, and given that the function that the 3D object performs can be represented as a performance score that is acquired from a physics-based simulation, we investigate the following research questions:

RQ1 Can the relationship between the 1D latent vector and the performance score be learned by a surrogate model?

RQ2 Which surrogate modeling technique is most efficient at learning this form-function relationship?

The knowledge gained from answering these two research questions will enable the efficient evaluation of large datasets of non-parameterized 3D models. The rest of this paper is organized as follows. Section 2 reviews related literature and Sec. 3 details our method. Section 4 then applies this method to two case studies involving 3D models of aircraft and watercraft. Section 5 discusses the results before the conclusions and future work are discussed in Sec. 6.

## 2 Literature Review

In this section, related work in 3D object feature extraction and surrogate modeling is reviewed.

### 2.1 3D Feature Extraction.

Deep neural networks have shown the ability to automatically extract useful features from high-dimensional input data. Architectures such as ShapeNet [8], VoxNet [15], and 3D ShapeNets [9] have achieved high classification accuracy on voxelized 3D objects, but because they rely on a voxelized representation of 3D space, the dimensionality of the network increases cubically with the voxel resolution. Approaches such as OctNet [16] and Vote3D [17] introduce methods to mitigate this problem, but they still have trouble processing large point clouds.

Another strategy is to convert a 3D object into a vector and then perform feature extraction on that vector for classification. Fang et al. [18] combined traditional shape feature extraction techniques with fully connected neural networks to classify 3D objects. Nash et al. [14] developed a variational autoencoder which models both 3D surface points and surface normals to generate surfaces.

Qi et al. [12] developed a method to directly convert unordered point sets into an encoded feature vector, which is invariant under transformations and captures local geometric structures of points. They showed that this feature vector can be used to achieve state-of-the-art 3D object classification, part segmentation, and semantic segmentation. Groueix et al. [13] based the 3D shape encoder of AtlasNet on PointNet and achieved state-of-the-art 3D object generation results by decoding the feature vector back into a 3D point cloud. Our work uses the AtlasNet point cloud encoder and then introduces a novel performance-based decoder that learns features that relate directly to a design's ability to meet a performance objective function and that can be evaluated far more efficiently than a computationally expensive performance evaluation environment. This allows designers to efficiently evaluate thousands of 3D models.

### 2.2 Non-Parameterized Surrogate Modeling.

Metamodels approximate the output of costly simulations to allow a large sampling of the candidate design space to be explored. In their review of metamodeling techniques across many applications, Wang et al. [3] motivated metamodeling through the discussion of computationally expensive simulation. Many commonly used classical function identification techniques used in surrogate modeling are discussed, including RBFs, Kriging, and NNs. These three techniques are also identified by Mullur and Messac [19] as the most commonly used non-parameterized surrogate modeling techniques. We discuss related work for each of these techniques in Secs. 2.2.1–2.2.3.

#### 2.2.1 Radial Basis Functions.

RBF methods use linear combinations of radially symmetric functions based on the Euclidean distance to the training points. RBF methods have the desirable properties of being able to accurately model arbitrary functions and handle scattered data points in multiple dimensions, and they are known for their relatively simple implementation [19]. RBF methods have been demonstrated to be a highly effective surrogate modeling technique in terms of accuracy and robustness, and they perform particularly well given a small number of training samples [20].

Ellbrant et al. [21] used an RBF-based metamodel to approximate the output of a CFD analysis tool that evaluates blade geometries given a design parameterized by ten variables. Their approach was shown to reduce the total design time from approximately 2 weeks to 3.5 days.

#### 2.2.2 Kriging.

Kriging models the response of a computer model as a linear model plus a stochastic systematic departure [22]. Kleijnen [23] discussed the advantages of Kriging over classic linear regression models and expanded the discussion to include metamodeling of random simulation, in which the output of a computer simulation is not deterministic for a given input. Kriging is formulated as a generalized linear regression model that accounts for correlation in the residuals between the regression model and the direct data from the simulation. Simpson et al. [24] showed the advantages of Kriging compared with second-order polynomial response surface models. Jia and Taflanidis [25] applied a Kriging metamodel to approximate simulations of high-dimensional wave and surge responses in real-time storm and hurricane risk assessment. The model takes five parameters and returns a risk estimate across many time instants and over a large coastal area, which is a high-dimensional output.

However, standard Kriging models are not suited to problems with high input dimensionality, because a large covariance matrix must be inverted several times to estimate the model's parameters. We therefore employ Kriging with partial least squares (KPLS) for this application [26]. The PLS method is an established tool for high-dimensional problems that projects the input and output variables into a smaller space using principal components [27]. By incorporating this information into the Kriging correlation matrix, the number of hyper-parameters is reduced, and they can be estimated much more quickly [26].

#### 2.2.3 Neural Networks.

Neural networks have been well explored as non-parameterized surrogate models. For example, Caixeta and Marques [28] use a neural network metamodel-based multidisciplinary design optimization for wing design. Ferreiro et al. [29] developed a neural network metamodel that can make multi-criteria decisions to optimize for low environmental impact in the design of one-way slabs. Sreekanth and Datta [30] employ a neural network-based surrogate model for the multi-objective management of saltwater intrusion in coastal aquifers.

In recent years, neural networks have been improved to be able to achieve state-of-the-art results in classification tasks [31–35]. These improvements have also been leveraged for regression tasks such as speech enhancement [36,37], human pose estimation [38], and object tracking [39]. Kossaifi et al. [40] showed that by augmenting state-of-the-art classification networks such as ResNet [35] and VGG [34], their high performance will transfer to regression tasks.

These image classification networks have also been adapted to 1D data. Solovyev et al. [41] proposed 1D ResNet and VGG architectures for understanding simple speech commands. They classify a 16384 × 1 dimensional waveform into one of 12 categories. Hurtado et al. [42] also adapt ResNet and VGG to one dimension in order to classify protein models. These approaches both utilize datasets comparable in size to image datasets. Our application is unlike these approaches in that 3D model datasets are much smaller than image datasets, on the order of thousands instead of millions. Deep neural networks are non-parameterized models that are able to achieve extremely high accuracy across many applications, but they have the drawback of requiring a large amount of data. To mitigate this problem for our application, we implement our own shallower versions of ResNet and VGG adapted for one-dimensional data.

## 3 Method

This section details our proposed method to use a surrogate model to replace the decoder of a 3D autoencoder neural network to predict the performance of a 3D model from only a point cloud of the model. The surrogate modeling problem that our performance-based decoder solves is significantly different than the classification and reconstruction tasks addressed by the neural network decoders of other approaches [12,13]. This motivates the exploration of alternate decoder models that tie directly to performance. Section 3.1 describes the formulation of the overall performance prediction model, and Sec. 3.2 describes the surrogate modeling techniques explored in this work.

### 3.1 Performance Prediction Model Formulation.

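The surrogate model is tasked with approximating a function *F*(·), which is explicitly defined as follows:

$$\mathbf{y} = F(\mathbf{x})$$

where **x** is a 3D model belonging to a dataset of encoded models **X**, **y** is the performance metric vector, and *F*(·) is the function implied by the evaluation environment. The surrogate model produces the prediction

$$\hat{\mathbf{y}} = F^{*}(\mathbf{x})$$

where *F**(·) is the approximation of *F*(·) implied by the surrogate model, and the task of the performance predictor is the optimization problem given by

$$\min_{F^{*}} \; \sum_{\mathbf{x} \in \mathbf{X}} \left\lVert F(\mathbf{x}) - F^{*}(\mathbf{x}) \right\rVert$$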

As shown in Fig. 2, our performance prediction surrogate model takes as input an autoencoded latent representation of the 3D point cloud, which we represent as $\tilde{\mathbf{x}}$. This latent representation comes from the encoder portion of AtlasNet [13], which itself is based on PointNet [12], a network that provides state-of-the-art point cloud analysis results. This encoder network transforms a point cloud into a latent vector of length 1024 through a series of multilayer perceptrons as well as a single symmetric function, max pooling, to make the model invariant to permutation of the input point cloud data [12]. This transformation can be thought of as a learned way to summarize a shape by a sparse set of keypoints. Once the geometric information of the point cloud has been compressed by the autoencoder, it is of a tractable dimension for surrogate modeling techniques.
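The permutation invariance described above can be illustrated with a minimal NumPy sketch. It is not the AtlasNet or PointNet encoder: the function name `encode_point_cloud`, the layer widths, and the random untrained weights are all assumptions made for illustration. Only the structure (a shared per-point MLP followed by max pooling into a length-1024 vector) follows the text.

```python
import numpy as np

def encode_point_cloud(points, weights, biases):
    """Toy PointNet-style encoder: shared per-point MLP followed by max pooling.

    points  : (n_points, 3) array of xyz coordinates
    weights : list of weight matrices for the shared MLP (hypothetical, untrained)
    biases  : list of bias vectors matching `weights`
    """
    h = points
    for W, b in zip(weights, biases):
        # The same MLP is applied to every point independently (ReLU activation).
        h = np.maximum(h @ W + b, 0.0)
    # Max pooling over the point axis is a symmetric function, so the result
    # does not depend on the order of the rows of `points`.
    return h.max(axis=0)

rng = np.random.default_rng(0)
layer_sizes = [3, 64, 128, 1024]  # illustrative widths; only the final 1024 comes from the text
weights = [rng.normal(scale=0.1, size=(a, b))
           for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(b) for b in layer_sizes[1:]]

cloud = rng.normal(size=(2500, 3))                 # stand-in for a 2500-point cloud
latent = encode_point_cloud(cloud, weights, biases)
shuffled = encode_point_cloud(rng.permutation(cloud), weights, biases)
```

Reordering the points leaves the latent vector unchanged up to floating-point tolerance, which is the property that lets an unordered point set be summarized by a fixed-length vector.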

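The prediction error is measured by the mean absolute error (MAE):

$$\mathrm{MAE}(\mathbf{z}, \hat{\mathbf{z}}) = \frac{1}{n}\sum_{i=1}^{n} \left\lvert z_{i} - \hat{z}_{i} \right\rvert \tag{4}$$

where **z** is the true performance metric vector, $\hat{\mathbf{z}}$ is the predicted performance metric vector, and *n* is the number of entries being compared.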

The prediction model training phase consists of computing the loss function $\mathrm{MAE}(\mathbf{z}, \hat{\mathbf{z}})$ defined in Eq. (4) over a batch of designs from the training dataset and then repeating for many epochs. This process is illustrated in Fig. 2. In Sec. 3.2, we present the formulation of each of the surrogate models that will be used to decode the autoencoded geometry into a performance vector as shown in Fig. 2.
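As a small worked example of the loss, the sketch below assumes the standard definition of MAE (the mean of the element-wise absolute differences). The function name and the sample coefficient values are illustrative and are not taken from the datasets in this work.

```python
import numpy as np

def mean_absolute_error(z_true, z_pred):
    """Mean of |z_i - z_hat_i| over all entries."""
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    return np.mean(np.abs(z_true - z_pred))

# Made-up true and predicted coefficient values for three designs.
z = np.array([0.10, 0.25, -0.05])
z_hat = np.array([0.12, 0.20, 0.00])
mae = mean_absolute_error(z, z_hat)  # (0.02 + 0.05 + 0.05) / 3
```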

### 3.2 Surrogate Modeling Techniques.

In this section, we discuss the surrogate modeling techniques explored for the performance decoder, which learns to transform the latent vector $\tilde{\mathbf{x}}$ into a predicted performance vector. Sections 3.2.1 and 3.2.2 describe the implementations of RBF and KPLS, respectively, used in this work. We use the open-source Surrogate Modeling Toolbox [43] implementations of both of these methods. Section 3.2.3 describes the architectures of our customized 1D implementations of ResNet and VGG. By comparing each of the surrogate modeling techniques in this section, we will be able to answer RQ2 by examining whether any of the proposed models outperform the others with respect to learning form-function relationships from autoencoded latent vectors of 3D point clouds.

#### 3.2.1 Radial Basis Function.

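The RBF surrogate predicts the output for an input $\tilde{\mathbf{x}}$ as a polynomial term plus a weighted sum of basis functions centered on the training points:

$$\hat{y} = \mathbf{p}(\tilde{\mathbf{x}})\,\mathbf{w}_{p} + \sum_{i=1}^{n_t} \phi(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}_{i})\, w_{r,i}$$

where $\tilde{\mathbf{x}}_{i}$ is the *i*th training input vector, $n_t$ is the number of training points, $\mathbf{p}(\tilde{\mathbf{x}})$ maps the polynomial coefficients to the prediction, $\phi$ is the radial basis function, **w**_{p} is the vector of polynomial coefficients, and **w**_{r} is the vector of basis function coefficients, with *i*th entry $w_{r,i}$.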
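To make the roles of the polynomial coefficients and the basis function coefficients concrete, here is a minimal NumPy sketch of an RBF interpolant. It is not the Surrogate Modeling Toolbox implementation used in this work. The Gaussian basis function, the constant polynomial term, the scaling parameter `d0`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_basis(a, b, d0):
    """Pairwise Gaussian basis values exp(-||a_i - b_j||^2 / d0^2)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / d0 ** 2)

def fit_rbf(x_train, y_train, d0=2.0):
    """Solve for basis coefficients w_r and a constant polynomial coefficient w_p."""
    n = len(x_train)
    phi = gaussian_basis(x_train, x_train, d0)
    p = np.ones((n, 1))  # constant polynomial term (assumed)
    # Augmented system: interpolation conditions plus the usual side
    # condition that the basis coefficients sum to zero.
    system = np.block([[phi, p], [p.T, np.zeros((1, 1))]])
    rhs = np.concatenate([y_train, [0.0]])
    solution = np.linalg.solve(system, rhs)
    return solution[:n], solution[n:]

def predict_rbf(x, x_train, w_r, w_p, d0=2.0):
    """Polynomial term plus the weighted sum of basis functions."""
    return gaussian_basis(x, x_train, d0) @ w_r + w_p[0]

rng = np.random.default_rng(0)
x_train = rng.normal(size=(30, 8))       # stand-in for latent vectors
y_train = np.sin(x_train).sum(axis=1)    # stand-in for a performance score
w_r, w_p = fit_rbf(x_train, y_train)
y_fit = predict_rbf(x_train, x_train, w_r, w_p)
```

Fitting reduces to a single linear solve, which is one reason RBF models of this kind can train in seconds on a few hundred samples.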

#### 3.2.2 Kriging With Partial Least Squares.

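Kriging considers the deterministic response **y** = *F*(**x**) to be a realization of the stochastic process:

$$Y(\mathbf{x}) = \sum_{j=1}^{m} \beta_{j} f_{j}(\mathbf{x}) + Z(\mathbf{x})$$

where, for *j* = 1, …, *m*, *f*_{j} is a known independent basis function, β_{j} is an unknown parameter, and *Z* is a zero-mean Gaussian random variable with a stationary covariance function *k*. A common choice of covariance function is the Gaussian kernel

$$k(\mathbf{x}, \mathbf{x}') = \sigma^{2} \prod_{l=1}^{d} \exp\left[-\theta_{l}\left(x_{l} - x'_{l}\right)^{2}\right]$$

where σ^{2} is the process variance, θ_{l} are the hyper-parameters to be estimated, and *d* is the dimension size of $\tilde{\mathbf{x}}$. Estimating these hyper-parameters in cases where *d* is large is prohibitively computationally expensive [26]. KPLS mitigates this problem by projecting input variables onto a new space formed by the principal components of the input variables. This leads to the following kernel for KPLS:

$$k_{1:h}(\mathbf{x}, \mathbf{x}') = \sigma^{2} \prod_{i=1}^{h} \prod_{l=1}^{d} \exp\left[-\theta_{i}\left(w_{i,l}\, x_{l} - w_{i,l}\, x'_{l}\right)^{2}\right]$$

where *h* is the number of principal components and **w**_{i} is the vector of weights for the *i*th principal component, with entries $w_{i,l}$. This leads to considerably fewer hyper-parameters than the standard Kriging case (*h* instead of *d*). The derivation of this kernel can be found in Ref. [26].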
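The reduction in hyper-parameters can be seen by writing the two kernels side by side. The sketch below assumes Gaussian-type kernels of the form described in Ref. [26], with one θ per input dimension for standard Kriging and one θ per principal component for KPLS. The function names, the placeholder weight matrix (random here, not fitted by PLS), and the choice *h* = 4 are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, x_prime, theta, sigma2=1.0):
    """Standard Kriging Gaussian kernel: one theta per input dimension (d values)."""
    return sigma2 * np.exp(-np.sum(theta * (x - x_prime) ** 2))

def kpls_kernel(x, x_prime, theta, weights, sigma2=1.0):
    """KPLS-style kernel: one theta per principal component (h values).

    weights : (h, d) array; row i holds the weights of principal component i.
    """
    diff_sq = (weights * (x - x_prime)) ** 2  # shape (h, d)
    # A product of exponentials equals the exponential of the summed exponents.
    return sigma2 * np.exp(-np.sum(theta[:, None] * diff_sq))

rng = np.random.default_rng(0)
d, h = 1024, 4  # d is the latent vector length; h is an illustrative choice
x_a, x_b = rng.normal(size=d), rng.normal(size=d)
weights = rng.normal(size=(h, d)) / np.sqrt(d)  # placeholder weights, NOT fitted by PLS

k_std = gaussian_kernel(x_a, x_b, theta=np.full(d, 1e-3))              # 1024 hyper-parameters
k_pls = kpls_kernel(x_a, x_b, theta=np.full(h, 1.0), weights=weights)  # 4 hyper-parameters
```

With an identity weight matrix and *h* = *d*, the KPLS-style kernel collapses to the standard kernel, which is a convenient sanity check on the indexing.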

#### 3.2.3 Neural Networks.

This section details the two neural network (NN) architectures we implement for the surrogate model: a 1D ResNet architecture and a 1D VGG architecture. These two NN architectures are considered state-of-the-art for general function regression, and the depths of the networks are chosen in consideration of the small size of 3D model datasets compared with image datasets.

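Both networks are built from 1D convolution layers. For an input of size (*N*, *C*_{in}, *L*), the output of a layer with *C*_{out} filters is

$$\mathrm{out}(N_{i}, C_{\mathrm{out},j}) = \mathrm{bias}(C_{\mathrm{out},j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out},j}, k) \star \mathrm{input}(N_{i}, k)$$

where ⋆ is the cross-correlation operator, *N* is the batch size, *C* is the number of channels, and *L* is the number of input features. Layer activations are normalized according to

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}}\,\gamma + \beta$$

where γ and β are learned scale and shift parameters and *ɛ* is a small positive constant.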

Next, we discuss the architectures of the 1D ResNet and 1D VGG neural network models.

*1D ResNet*: This 1D ResNet implementation is based on the popular 2D ResNet for image classification tasks [35], with modifications for this low data one-dimensional case such as modified filter lengths and shallower network architectures. Figure 3 shows the ResNet building block that is sequentially placed to create networks of varying depths. The left path consists of two sequential convolutions, and the right path is the “identity” connection that adds the unmodified (except for proper scaling) previous layer and improves the ability of neural networks to exploit increased depth even with less training data. Convolution kernel sizes in our 1D case are selected to match the 2D case; so, 3 × 3 2D convolutions become 1D convolutions of length 3.
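The two-path structure of the building block can be sketched in plain NumPy. This is a simplified illustration rather than the network used in this work: it handles a single sample, omits normalization layers and any rescaling on the identity path, and uses hypothetical function names. Only the length-3 kernels and the "two convolutions plus identity" layout follow the description above.

```python
import numpy as np

def conv1d_same(x, kernels, bias):
    """1D cross-correlation with zero padding that preserves the input length.

    x       : (C_in, L) input for a single sample
    kernels : (C_out, C_in, K) filters with odd K
    bias    : (C_out,)
    """
    c_out, _, k = kernels.shape
    pad = k // 2
    x_pad = np.pad(x, ((0, 0), (pad, pad)))
    length = x.shape[1]
    out = np.empty((c_out, length))
    for j in range(length):
        window = x_pad[:, j:j + k]  # (C_in, K) slice centered on position j
        out[:, j] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return out

def residual_block(x, k1, b1, k2, b2):
    """Two sequential length-3 convolutions plus the identity connection."""
    h = np.maximum(conv1d_same(x, k1, b1), 0.0)  # first convolution + ReLU
    h = conv1d_same(h, k2, b2)                   # second convolution
    return np.maximum(h + x, 0.0)                # add the unmodified input, then ReLU

rng = np.random.default_rng(0)
channels, length = 4, 1024  # the channel count is an illustrative assumption
x = rng.normal(size=(channels, length))
k1 = rng.normal(scale=0.1, size=(channels, channels, 3))
k2 = rng.normal(scale=0.1, size=(channels, channels, 3))
b = np.zeros(channels)
block_out = residual_block(x, k1, b, k2, b)
```

If both convolutions output zeros, the block reduces to the identity path followed by the activation, which is why stacking such blocks does not prevent the input signal from reaching deeper layers.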

He et al. [35] observed that the accuracy of ResNet increased with network depth, but with the caveat that a deeper network requires more training data. The image classification domain in which this approach was originally proposed is data-rich, and thus, the shallowest network proposed in Ref. [35] is an 18-layer network. Given that our dataset is considerably smaller than image classification datasets, we extrapolate to a 12-layer case that reduces the depth of the network while maintaining the ResNet structure. Table 1 provides the details of this architecture.

*1D VGG*: This 1D VGG network is based on Simonyan and Zisserman's very deep convolutional neural network structure [34], once again with modifications for a small dataset and a one-dimensional input vector. Figure 4 shows the architecture of a VGG building block, which is a series of 1D convolutions followed by a max-pooling layer. The block size *B* indicates the number of consecutive 1D convolutions before max pooling occurs. Once again, convolutional kernel sizes are selected to match their 2D counterparts. For these networks, a pooling size of 4 is chosen for the first two layers to more quickly reduce the dimensionality of our filters and then subsequent layers use a pooling size of 2. These pooling sizes were chosen such that the flattened layer has a dimension of 4096, which matches the flattened layer size in the original 2D network.

Once again, a deeper network is observed to provide better accuracy given a large enough supporting dataset, and we once again extrapolate to a shallower seven-layer network that preserves the VGG structure, in consideration of our small dataset. Table 2 shows the details of this seven-layer architecture. Dropout regularization of 50% is employed between the first two dense layers to improve generalizability to the testing dataset as in the 2D analog.

In Sec. 4, two case studies in which each of these surrogate modeling techniques is applied to predict the performance of 3D models of both aircraft and watercraft are presented. These two case studies will give us insight to be able to answer RQ1, whether these surrogate modeling techniques can accurately and efficiently predict design performance from a latent vector, which would enable the use of large-scale 3D datasets in engineering design contexts. The case studies will also address RQ2, by allowing us to identify which surrogate modeling technique(s) perform the best in terms of accuracy and efficiency.

## 4 Application

The surrogate modeling techniques described in Sec. 3 are applied to case studies using 3D point clouds of 1250 aircraft and 250 watercraft from the ShapeNet dataset. The objective is to use these surrogate modeling techniques to accurately predict the coefficients of lift and drag from an aerodynamic simulation in OpenFOAM for the aircraft test case, and the coefficient of drag only from a hydrodynamic simulation for the watercraft test case. Section 4.1 describes the ShapeNet dataset, and Sec. 4.2 describes the aerodynamic simulation software OpenFOAM.

### 4.1 3D Model Dataset.

The dataset used for this experiment was derived from the ShapeNet dataset [8]. Specifically, 1250 models from the aircraft category were sampled to create our aircraft dataset, and 250 models from the watercraft categories were sampled to create the watercraft dataset. The models were available in mesh format as well as in point cloud format. The mesh models are required for the OpenFOAM evaluation software, and the point clouds are used for the prediction model. Note that in the general case, any mesh model can be easily converted to a point cloud by stripping away the face connectivity information. The point clouds are normalized so that each contains 2500 points. A pre-trained AtlasNet encoder made publicly available by Groueix et al. [13] was used to convert the 3D models to latent vectors of length 1024. Figure 5 shows some of the objects in each category in mesh format.
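The mesh-to-point-cloud conversion mentioned above can be sketched as follows. The text does not state how the clouds were normalized to 2500 points, so the random vertex resampling used here is an assumption, as is the function name `mesh_to_point_cloud`.

```python
import numpy as np

def mesh_to_point_cloud(vertices, n_points=2500, seed=0):
    """Keep only vertex positions (faces are simply not used) and resample
    them to a fixed-size point cloud.

    vertices : (n_vertices, 3) array of mesh vertex coordinates
    """
    vertices = np.asarray(vertices, dtype=float)
    rng = np.random.default_rng(seed)
    # Sample without replacement when the mesh has enough vertices;
    # otherwise repeat vertices to reach the target size.
    replace = len(vertices) < n_points
    idx = rng.choice(len(vertices), size=n_points, replace=replace)
    return vertices[idx]

rng = np.random.default_rng(1)
dense_mesh_vertices = rng.normal(size=(5000, 3))   # stand-in for a detailed mesh
coarse_mesh_vertices = rng.normal(size=(400, 3))   # stand-in for a coarse mesh
cloud_dense = mesh_to_point_cloud(dense_mesh_vertices)
cloud_coarse = mesh_to_point_cloud(coarse_mesh_vertices)
```

Sampling vertices directly inherits the mesh's vertex density. Sampling points on the triangle surfaces would give more uniform coverage but requires the face information.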

One limitation of this dataset is its relatively small size, which is on the order of thousands of designs. This is small compared with many image datasets.

### 4.2 Model Evaluation Method.

We use the open-source CFD software OpenFOAM [45,46]. OpenFOAM can perform simulations of basic CFD, combustion, turbulence modeling, electromagnetics, heat transfer, multiphase flow, and stress analysis [47]. OpenFOAM is based on the finite volume method, using C++ and object-oriented programming to develop a syntactical model of equation mimicking and scalar-vector-tensor operations [47]. OpenFOAM has been used in research for simulating coastal engineering processes [48], realistic wave generation and active wave absorption [49], boiling fluid flows [50], and many other applications [51–56].

Our simulations are based on the *simpleFoam* motorbike tutorial case for incompressible fluids using Reynolds-averaged simulation turbulence modeling. This tutorial solves for incompressible fluid flow over a 3D object and calculates coefficients of lift and drag as well as pressures on the object surface and flow streamlines by solving a system of Navier–Stokes equations:

$$\nabla \cdot \mathbf{u} = 0$$

$$\rho\left(\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}\right) = -\nabla p + \mu \nabla^{2}\mathbf{u} + \rho\,\mathbf{g}, \qquad \nu = \frac{\mu}{\rho}$$

where **u** is the velocity vector, *p* is the static pressure, **g** is the gravity force vector, *ν* is the kinematic viscosity, *μ* is the dynamic viscosity, and ρ is the fluid density.

For the aircraft test case, we use the kinematic viscosity of air at 20 °C, *ν* = 1.516 × 10^{−5} m^{2}/s. Likewise, for the watercraft test case, we use the kinematic viscosity of water at 20 °C, *ν* = 1.004 × 10^{−6} m^{2}/s. We set the flow velocity to *u* = 250 m/s for the aircraft test case and *u* = 17 m/s for the watercraft test case. We use an angle of attack *α* = 0 for all objects in both simulations. The inlet and outlet dimensions were 12 × 8, and the total size of the domain was a 12 × 8 × 20 rectangular prism. For each test case, 200 simulation time steps were calculated, and the coefficient values at time step 199 were used for the coefficient ground truth data, after being scaled by a factor of 156.25.

## 5 Results and Discussion

In this section, we discuss the results of the OpenFOAM aerodynamic evaluation as well as the prediction model training and regression accuracy.

### 5.1 OpenFOAM Evaluation Results.

Of the 1250 ShapeNet models that were evaluated in OpenFOAM for the aircraft test case, 64 were non-manifold objects, which returned 0 for both coefficients of lift and drag. Furthermore, the 15 models that had one (or both) coefficient's absolute value greater than 6 were excluded from training as extreme outliers. Figure 6 shows the histograms for the coefficients of lift and drag for the remaining 1170 models. The mean coefficient of drag was *μ*_{d} = 0.175, and the standard deviation was σ_{d} = 0.162. For the coefficient of lift, the mean value was *μ*_{l} = 7.75 × 10^{−3}, and the standard deviation was σ_{l} = 0.149.
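The two exclusion rules above can be expressed as boolean masks. This is a sketch under the assumption that non-manifold models are identified by both coefficients being exactly zero and that outliers are models with either coefficient exceeding the stated threshold in absolute value. The function name and the sample values are made up for illustration.

```python
import numpy as np

def keep_mask(c_lift, c_drag, outlier_threshold=6.0):
    """Boolean mask of models retained for training."""
    c_lift = np.asarray(c_lift, dtype=float)
    c_drag = np.asarray(c_drag, dtype=float)
    # Non-manifold meshes returned 0 for both coefficients.
    non_manifold = (c_lift == 0.0) & (c_drag == 0.0)
    # Extreme outliers: one or both coefficients exceed the threshold in magnitude.
    outlier = (np.abs(c_lift) > outlier_threshold) | (np.abs(c_drag) > outlier_threshold)
    return ~non_manifold & ~outlier

# Five made-up models: non-manifold, valid, lift outlier, valid, valid (zero lift only).
c_lift = [0.0, 0.1, 7.0, -0.2, 0.0]
c_drag = [0.0, 0.2, 0.1, 0.3, 0.5]
mask = keep_mask(c_lift, c_drag)
```

A model with zero lift but nonzero drag is kept, since only a simultaneous zero in both coefficients signals a failed evaluation under this assumption.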

For the 250 models evaluated according to *C*_{D} in the watercraft test case, 23 were non-manifold and 8 were outliers according to the aforementioned criteria. Figure 7 shows the histogram of the coefficient of drag for the remaining 214 models. The mean coefficient of drag was *μ*_{d} = 0.746, and the standard deviation was σ_{d} = 0.750. This large mean and standard deviation come from the fact that the watercraft dataset consists of both submersible and non-submersible watercraft. The non-submersible watercraft models return a large coefficient of drag in an underwater flow simulation (Fig. 8).

### 5.2 Prediction Model Regression Results.

In this section, we compare the convergence time and prediction accuracy of each of the surrogate modeling techniques discussed in Sec. 3.2 for the main test case involving 1170 aircraft models, as well as the smaller validation test case involving 214 watercraft models.

Table 3 shows the convergence times for each method for the aircraft test case. All experiments were run using a GeForce GTX 1080 GPU and a 12-core Intel i7-8700 CPU. KPLS and RBF both converge on a time scale of seconds, whereas the neural network approaches take minutes. Both neural network approaches were trained for 250 epochs. Once trained, all methods are able to evaluate the testing models in less than 10 s. By contrast, based on the average OpenFOAM model evaluation time of approximately 11 min, evaluating the 234 testing models for the aircraft test case in OpenFOAM would have taken 43.05 h. Even for the neural network models, training the surrogate model and then using it to evaluate the remaining models reduces the evaluation time for the remaining models by a factor of 70 compared with evaluating them in OpenFOAM.
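The reported speedup can be checked with a short calculation. The sketch assumes that the 38 min ResNet training time quoted in the abstract is the neural network figure being compared and uses the 10 s bound on evaluation time. The variable names are illustrative.

```python
simulation_hours = 43.05   # reported OpenFOAM time for the 234 testing models
training_minutes = 38.0    # 1D ResNet training time quoted in the abstract (assumed to apply here)
inference_seconds = 10.0   # upper bound on surrogate evaluation time

# Total surrogate cost: train once, then evaluate the testing models.
surrogate_minutes = training_minutes + inference_seconds / 60.0
speedup = simulation_hours * 60.0 / surrogate_minutes
```

Under these assumptions, the ratio comes out just under 70, in line with the factor stated above.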

#### 5.2.1 Aircraft Test Case.

The 1170 samples were randomly split 80–20% into training and testing data, respectively. The prediction model was trained on the 936 training models and then evaluated for generalizability on the 234 testing models. Experiments with two mutually exclusive splits of training and testing data were conducted, and the average metrics of the two experiments were recorded. MAE values are reported after being scaled back to the original coefficient value range.
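A random 80–20% split of this kind can be sketched as below. The function name and the seed are assumptions. The sample counts follow the text.

```python
import numpy as np

def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Randomly partition sample indices into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

train_idx, test_idx = split_indices(1170)  # 936 training and 234 testing indices
```

Using different slices of the same permutation as the testing set yields mutually exclusive testing sets across repeated experiments.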

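In addition to the MAE, we report the explained variance score, $1 - \mathrm{Var}(\mathbf{y} - \hat{\mathbf{y}})/\mathrm{Var}(\mathbf{y})$, where **y** and $\hat{\mathbf{y}}$ are the true and predicted coefficient values. The best possible score is 1.0, and lower values indicate a less predictive model. Table 4 shows the MAE and explained variance for *C*_{L} and *C*_{D} in the testing dataset, as well as the *p*-value for statistical difference between the lowest error method and all others.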
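The explained variance metric can be sketched as follows, assuming the common definition of one minus the ratio of the residual variance to the variance of the true values. The function name and sample data are illustrative.

```python
import numpy as np

def explained_variance(y_true, y_pred):
    """1 - Var(y_true - y_pred) / Var(y_true); the best possible score is 1.0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

y_true = np.array([0.05, 0.10, 0.20, 0.40])
score_perfect = explained_variance(y_true, y_true)
score_mean_only = explained_variance(y_true, np.full(4, y_true.mean()))
score_offset = explained_variance(y_true, y_true + 0.5)
```

A constant offset in the predictions does not lower this score because the residual has zero variance, so it complements the MAE rather than replacing it.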

We see that only KPLS has a statistically significant larger error than the other three methods for *C*_{L} and that ResNet does not outperform the other methods by a statistically significant margin with respect to *C*_{D} error. Notably, in all cases, the surrogate model is more accurate with respect to *C*_{D} than with respect to *C*_{L} by a statistically significant margin, as shown in Table 5. This is likely because all *C*_{D} values are greater than zero, whereas the *C*_{L} values can be negative, and in fact, the *C*_{L} distribution has large outlier values in both the positive and negative directions. The fact that the neural network and KPLS methods have a more severe reduction in accuracy with respect to *C*_{L} suggests that they are comparatively less robust to outliers than the RBF model used in this study. We speculate that the input dimension reduction performed in these two methods leads to a degree of overfitting to dense regions of the input space. This goes against the intuition that a neural network would be best suited to decode an object encoded by a neural network, as well as the perception that neural networks are the best at black-box function approximation.

Given that samples with large coefficient values are sparse, we would expect any model to be considerably more accurate at predicting models with coefficient values near the center of the distribution than those at the extremes. However, it would still be useful to a designer if the networks were able to provide insight into how these models with outlier coefficients compare with the more common models. Therefore, in addition to evaluating the accuracy of the performance prediction network with respect to the absolute value, we are also interested in evaluating how well it is able to rank a set of designs. We again calculate the MAE, but now **y** and $\hat{\mathbf{y}}$ are the true and predicted percentile, respectively, of the object coefficients with respect to the entire testing dataset. Table 6 summarizes the resulting errors.
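The ranking metric can be sketched as follows. The exact percentile convention used in this work is not specified, so the rank-based definition here (0 for the smallest value and 100 for the largest within the testing set) is an assumption, as are the function names and sample values.

```python
import numpy as np

def percentile_ranks(values):
    """Percentile (0 to 100) of each value within its own set, based on rank."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values))  # 0 for the smallest, n - 1 for the largest
    return 100.0 * ranks / (len(values) - 1)

def percentile_mae(y_true, y_pred):
    """MAE between the true and predicted percentiles of each design."""
    return np.mean(np.abs(percentile_ranks(y_true) - percentile_ranks(y_pred)))

# Four made-up designs; the prediction swaps the order of the middle two.
y_true = np.array([0.10, 0.50, 0.30, 0.90])
y_pred = np.array([0.20, 0.40, 0.45, 1.50])
rank_error = percentile_mae(y_true, y_pred)
```

This metric is unaffected by any monotonic distortion of the predicted values, so a model can score well on ranking even when its raw coefficient predictions are poorly scaled.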

With this analysis, we see that the neural network methods are superior in terms of ranking designs with respect to both coefficients, and they can achieve accuracy to within 7% of the true percentile. This suggests that while the neural network methods may not have been able to achieve as high regression accuracy on outlier data, they were still able to recognize when a data point was an outlier and rank it accordingly.

Next, we investigate the prediction performance of the surrogate models on outlier data. We separately analyze the accuracy for the two outermost and the two inner quartiles of coefficients for the highest performing network architectures. This allows us to directly see the difference between how the model predicts outlier data and how it predicts data close to the median. Table 7 shows the MAE in terms of both raw coefficient value and percentile for these categories.
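One way to form these two groups is sketched below. The helper names are hypothetical, and we assume that quartile membership is determined by sorting the testing samples on their true coefficient value; the exact boundary handling used for Table 7 is not specified here.

```python
def split_inner_outer(y_true):
    """Return (inner, outer) index lists.

    `outer` holds samples in the lowest and highest quartiles of the true
    values; `inner` holds the two middle quartiles. Assumption: quartiles are
    formed by rank, with n // 4 samples in each outer quartile.
    """
    n = len(y_true)
    order = sorted(range(n), key=lambda i: y_true[i])
    lo, hi = n // 4, n - n // 4
    outer = order[:lo] + order[hi:]
    inner = order[lo:hi]
    return inner, outer


def mae_on(indices, y_true, y_pred):
    """MAE restricted to the samples listed in `indices`."""
    return sum(abs(y_true[i] - y_pred[i]) for i in indices) / len(indices)


if __name__ == "__main__":
    # Toy lift coefficients with one large outlier in each direction.
    true_cl = [-2.0, 0.1, 0.2, 0.3, 0.25, 0.15, 0.05, 3.0]
    pred_cl = [-1.0, 0.1, 0.2, 0.3, 0.25, 0.15, 0.05, 2.0]
    inner, outer = split_inner_outer(true_cl)
    print("inner-quartile MAE:", mae_on(inner, true_cl, pred_cl))
    print("outer-quartile MAE:", mae_on(outer, true_cl, pred_cl))
```

The same index lists can be reused with percentile values in place of raw coefficients to obtain the percentile errors for each group.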

The raw MAE values in Table 7 confirm that for the outer-most quartiles, the model has significantly lower precision with respect to the coefficient values than the inner-most quartiles, but the outlier accuracy is closer to the inlier accuracy in terms of percentile, especially with respect to *C*_{D}. The intuitive interpretation of this result is that the models are better at recognizing outliers than predicting the absolute value of their performance.

This is to be expected given the histograms in Fig. 6, which show that the sample size of models with coefficients of absolute value greater than 100 is small. Likewise, the inverse result in the inner quartiles can be explained by the large concentration of samples with coefficient values less than 0.33. The large number of samples indicates that the prediction models have learned to predict the absolute value of the coefficients better, but because these true values are much closer together than in the outer-most quartiles, the task of ranking them becomes more difficult. Overall, it appears that the improved ability to learn the relationships in the inner quartiles overcomes the increased difficulty of the ranking task, as the ranking accuracy in the inner quartiles is still superior.

Finally, we investigate the reason that the RBF and KPLS models were able to achieve comparable accuracy to the neural network methods despite being comparatively simple computationally. Inspecting the latent vectors produced by the autoencoder reveals that they are sparse, and on average, their ℓ_{0} norm is 370. This means that, on average, 64% of the latent variables are zero. This suggests that the RBF and KPLS methods, which rely on projecting the input vectors to a lower-dimensional space, were able to compress the latent vectors without losing much information and efficiently solve the problem without sacrificing accuracy.
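The sparsity measurement can be reproduced with a few lines of code. The sketch below uses hypothetical helper names and a latent length of 1024, which is an assumption made only for illustration: the two figures quoted above together imply a length of roughly 370/0.36 ≈ 1028, but the exact length is not restated in this section.

```python
def l0_norm(vector, tol=0.0):
    """Number of entries whose magnitude exceeds `tol` (the l0 'norm')."""
    return sum(1 for x in vector if abs(x) > tol)


def mean_sparsity(latents, tol=0.0):
    """Average fraction of zero entries per latent vector."""
    return sum(1.0 - l0_norm(z, tol) / len(z) for z in latents) / len(latents)


if __name__ == "__main__":
    # Toy latent vector: 370 nonzero entries out of an ASSUMED length of 1024.
    z = [1.0] * 370 + [0.0] * 654
    print("l0 norm:", l0_norm(z))
    print("fraction of zeros:", round(mean_sparsity([z]), 2))
```

A small positive `tol` may be appropriate if the encoder output contains values that are numerically close to, but not exactly, zero.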

While this test case validates the performance of the surrogate models with respect to a certain category of geometries, we would like to see if they are robust to multiple types of geometries. In order to examine this, we investigate a second test case using a dataset of watercraft models.

#### 5.2.2 Watercraft Test Case.

As with the aircraft test case, the 214 watercraft samples were randomly split 80–20% into training and testing data, respectively. This gives 171 training models and 43 testing models. Once again, two experiments were conducted, each with a different split of training and testing data.
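For reference, the split sizes quoted above follow from a simple index shuffle. This is a minimal sketch with a hypothetical function name; the rounding convention and the random seeds are assumptions, with a different seed standing in for each repeated experiment.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into training and testing sets.

    Assumption: the training-set size is n_samples * train_fraction rounded
    to the nearest integer.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_samples * train_fraction))
    return indices[:n_train], indices[n_train:]


if __name__ == "__main__":
    for seed in (0, 1):  # one seed per repeated experiment (hypothetical seeds)
        train_idx, test_idx = train_test_split_indices(214, seed=seed)
        print(seed, len(train_idx), len(test_idx))
```

Applied to the 214 watercraft samples, this yields 171 training and 43 testing indices.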

Table 8 shows that, in this test case, all methods perform worse than in the aircraft test case, especially the neural network methods. This raises the question of whether the performance suffers due to the different distribution of the latent vectors (geometry of the data) or due to differences in the distribution of the performance vectors, as the watercraft test case includes significantly more designs with *C*_{D} > 1.5 than the aircraft test case. An additional experiment was performed in which all designs with *C*_{D} > 1.92 were removed from the dataset, leaving 197 watercraft models. The results of this experiment are shown in Table 9.
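The outlier-removal step amounts to a threshold filter applied to the paired latent vectors and drag coefficients before the data are split and the surrogate models are retrained. A minimal sketch, with a hypothetical function name and the cutoff left as a parameter:

```python
def drop_outliers(latents, cd_values, threshold):
    """Keep only designs whose drag coefficient does not exceed `threshold`.

    Returns the filtered latent vectors and coefficients in their original order.
    """
    kept = [(z, cd) for z, cd in zip(latents, cd_values) if cd <= threshold]
    kept_latents = [z for z, _ in kept]
    kept_cd = [cd for _, cd in kept]
    return kept_latents, kept_cd


if __name__ == "__main__":
    # Toy data: three designs with one-dimensional "latent vectors".
    latents = [[0.0], [1.0], [2.0]]
    cd_values = [0.3, 2.5, 1.0]
    filtered_latents, filtered_cd = drop_outliers(latents, cd_values, threshold=2.0)
    print(len(filtered_cd), "designs kept")
```

Filtering before the train-test split keeps the comparison consistent, because both the training and the testing sets are then drawn from the same truncated distribution.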

We see that once these outliers have been removed, the MAEs are more in line with the values in the aircraft test case. ResNet achieves the best accuracy, same as in the aircraft *C*_{D} test case shown in Table 4(b).

We also see that the percentile accuracy is better than in the aircraft test case regardless of whether additional extreme outliers are removed. Looking at the difference in distributions between Figs. 6(a) and 7, we see that the watercraft *C*_{D} values are more evenly distributed, as opposed to the aircraft test case in which most values are clustered between 0 and 0.33 with a small number of outliers. As we found in Sec. 5.2.1, a tight cluster of values presents difficulty in ranking, because the true values are very close together. On the other hand, ranking outliers is an easier task, but the results suffer from the fact that there are, by definition, fewer training examples of outliers. Both of these facts suggest that data that are more spread out, but with sufficient samples for learning in each region of values, would be most conducive to accurate ranking. The watercraft performance value distribution meets both of these criteria when compared with the aircraft distribution.

Modifying the distribution of the performance vectors to be more similar to the aircraft test case's distribution had a large impact on the regression accuracy, but changing the latent vector distribution to an entirely different category of geometries resulted in a similar performance. This suggests that the surrogate models are robust to different encoded geometry representations, but that the regression is sensitive to performance vector distributions with a large number of outliers.

With respect to RQ1, we found that all of the surrogate models investigated in this work are able to learn the form-function relationship between a geometry encoded in a 1D latent vector and a performance score, and they are robust to multiple categories of geometries. However, care must be taken to remove extreme outliers from the performance metric distribution, as we have shown this to have a severe negative impact on the accuracy of all of the surrogate models.

With respect to RQ2, we found that moving away from neural network approaches (i.e., to RBF and KPLS) was far more efficient than using neural network models. However, none of the methods consistently outperformed the others in terms of accuracy. The relative accuracy of the surrogate models seems to be most heavily affected by the distribution of the performance values, and thus RBF and KPLS are both viable candidates for efficient surrogate models to learn form-function relationships from autoencoded latent vectors, enabling large-scale efficient evaluation of 3D model datasets.

## 6 Conclusions and Future Work

We investigated the use of four surrogate modeling techniques for estimating performance features of 3D models from a neural network autoencoded representation of a point cloud, namely RBF, KPLS, a shallow 1D ResNet architecture, and a shallow 1D VGG architecture. We show that departing from a neural network structure for decoding this representation into performance predictions is far more efficient than state-of-the-art neural network techniques, yet comparable in accuracy. In all experiments, the neural network methods converged on the timescale of minutes, and the non-neural network approaches converged on the timescale of seconds. In a test case involving 1170 aircraft models, the RBF surrogate model achieved an absolute performance prediction error of 1.97 × 10^{−3}, on average, and converged in 3 seconds, although all methods were within the margin of statistical significance in terms of accuracy. In a test case with 214 watercraft models, ResNet achieved the highest accuracy, and KPLS was within the margin of statistical significance after eliminating outliers with *C*_{D} > 1.92. We found that while the surrogate models were robust to encoded geometries of a different form (both aircraft and watercraft), they were sensitive to large numbers of extreme outliers with respect to regression accuracy.

For all methods, prediction accuracy in terms of the raw coefficient value is much better for models with coefficients close to the median of the dataset; however, the models are much more consistent at predicting where an object's coefficient values fall relative to other objects in the dataset and achieve percentile rank errors of less than 9%. In particular, the models are most accurate at identifying data with high coefficients of drag with respect to the rest of the dataset. In applications in which performance metrics must be gathered for a large number of 3D models of varying quality, this prediction model can serve as an adequate filter to narrow down which models will be tested more rigorously, and depending on the tolerance for error of the application, it could replace the evaluation environment altogether. This surrogate modeling approach would be best suited for this purpose in cases where the models of interest are near the center of the coefficient distribution, and outliers can safely be eliminated. This surrogate modeling method allows for large 3D model datasets to be efficiently evaluated for performance, enabling their use in engineering design applications.

The neural network results have the potential to be greatly improved by introducing more data for training. While the non-neural network approaches had consistent performance across the two test cases, going from 171 training models to 936 training models greatly improved the performance of the neural network approaches. Nonetheless, even when a very shallow architecture is used to manage the number of trainable weights, 1170 data points is very small compared with the amount of data available in applications where these deep neural network approaches are common, which is usually on the order of millions of data points. While datasets of this size for 3D point clouds are not readily available, an increase in data would enable effective training of deeper network architectures. Given the promising performance of these networks on a comparably small dataset, it is reasonable to speculate that as large 3D model datasets become available, a surrogate model could serve as a substitute for an expensive performance evaluation software with a high degree of accuracy, and the long convergence time compared with the RBF and KPLS methods could prove worthwhile.

Another area that could be explored is the generalization of one surrogate model to another application. For example, the geometric features learned to predict aerodynamic coefficients will have some overlap with the geometric features needed to predict these coefficients in an underwater environment. Therefore, we would expect that some transfer learning would occur when using the aerodynamic prediction model as a starting point, which would reduce the data load required for the new application.

## Acknowledgment

This research is funded in part by DARPA HR0011-18-2-0008 (Funder ID: 10.13039/100000185). Any opinions, findings, or conclusions found in this paper are those of the authors and do not necessarily reflect the views of the sponsors. The authors would like to acknowledge Dule Shu and Haoyuan Meng for their contributions to this work.