## Abstract

Assessing vehicle safety is a challenging, yet fundamental task. In the early phase of development, car manufacturers need to ensure the compliance with strict safety requirements. An interesting task to automate these early-stage operations is to harness information from already developed products. Established designs are largely accessible, with abundant data; novel designs' data are scarce. While established and novel designs are (by definition) different, it is expected nonetheless that there is a degree of correlation between them. Thus, the established design could be regarded as a low-fidelity (LF) model of the novel design, in the sense that it may provide an approximation of the behavior of the novel design. In turn, the novel design could be regarded as a high-fidelity (HF) model, as it represents the true product being designed. This bifidelity character of the problem stands at the basis of this paper. This work explores the application of control variates (CV) to a crashworthiness analysis scenario. Control variates is a variance reduction technique that exploits the low-fidelity information to improve the accuracy of the response statistics of the high-fidelity model. Such an approach could be most useful for industrial applications. Therefore, we apply control variates to a crash box example and compare its performance to its plain Monte Carlo (MC) counterpart. The results of this paper show the benefits of this bifidelity approach, resulting in control variates being a powerful technique to extract valuable information from limited data sets. Indeed, control variates can serve as an innovative solution to support car manufacturers in the early phase of vehicle development and thus improve the performance in crashworthiness scenarios.

## 1 Introduction

The development of a vehicle is a complex task. Car manufacturers need to satisfy strict safety requirements. When assessing vehicle safety in an early stage of development, they need to face the challenge of low data availability.

In the early stage of the vehicle development, indeed, the ultimate geometrical and material data of the product are not entirely defined, and the prototype of the vehicle is not yet available. In this context, it is often more economical for companies to make changes to their existing products rather than creating new ones from scratch. In industry, many new products can indeed be seen as modifications or upgrades of existing ones [1].

A task worthy of pursuit is exploiting the data coming from past development processes to infer knowledge on future situations. This problem can be visualized as a bifidelity one. The numerous pieces of information (e.g., simulations and hardware tests) from the already developed products can be seen as the low-fidelity (LF) data. These are inexpensive to access, as they are readily available from the archives, having been accumulated over years of development; the few data belonging to the product under current development take the role of the high-fidelity (HF) counterpart. These data are scarce and costly to obtain. Each new round of testing, simulation, or prototyping demands significant organizational, financial, and computational resources. Moreover, the urgency imposed by project timelines further escalates the value of these fresh pieces of information.

The concept of learning from larger amounts of available data that belong to a different domain with respect to the one of interest is a topic that has been extensively explored in various fields, e.g., structural health monitoring, manufacturing process planning, etc. In structural health monitoring, information from a population of structures is transferred to the complete population [2]. In manufacturing, process parameters are predicted using a combination of simulations and experimental data [3]. To do this, transfer learning is used, which is a deep learning algorithm that allows to relax the need of having big amounts of HF data.

In the context of crashworthiness, the concept of transfer of knowledge from the LF to the HF data is still relatively unexplored. Our previous work [4] addresses this problem by applying transfer learning, using past development processes to enhance predictions for future designs. A further notable example in the literature related to crash analysis is the development of geodesic convolutional neural networks [5] to optimize a crash box component. However, in the current paper, we propose an alternative approach based on control variates (CV). Differently from the cited works, CV is not a deep learning technique, but a statistical one.

Control variates is a framework that allows to estimate the statistics of the response of a given system [6], particularly useful for unimodal distributions. This is achieved by considering an additional variable that is correlated with the response of interest. In this way, CV reduces the variance of the estimates of the statistics, leading to more precise results. The key advantage of this method is that it allows to aggregate estimates generated using the high- and low-fidelity models to enhance the estimated statistics of the response associated with the HF one [7]. The known-to-the-literature CV applications make use of high- and low-fidelity pieces of information belonging to the same physical system [7,8]. As an example, this situation may occur in crash analysis when a system is, on the one hand, cheaply—from a computational point of view—simulated in finite element (FE) analysis through a coarser mesh, and, on the other hand, with a more dense one. This situation is graphically represented in Fig. 1(a): the same system is evaluated in a cheap and an expensive FE way.

Although CV is generally used in situations as the one just described, in this paper, we attempt to utilize it under different circumstances. We plan to apply it to a bifidelity industrial problem similar to the one of Fig. 1(b). High quantities of data coming from old versions of a crash box will be exploited to gain knowledge on a new one, characterized by low data availability. The innovative contribution of this work with respect to the literature consists in applying CV in a past-to-future configuration, where the past data are the LF and the current ones are the HF counterpart. In doing so, it is assumed that both the old and new configurations depend on the same set of input variables. This implies that, for example, if the behavior of the old configuration depends on two thicknesses, the behavior of the new configuration also depends on the same two thicknesses. However, the range of values associated with the input variables related to the old and new configurations can be different, so as to reflect the typical situation of an updated crashworthiness design.

The goal of this paper is to explore the benefits of CV to harness information from already developed products, thus enhancing the prediction on a new product version. Section 2 describes the investigated crash case scenario, Sec. 3 provides a theoretic overview on CV, and Sec. 4 explains the application of CV to the crash box explanatory example. Finally, Sec. 5 provides the conclusions of the work, with a practical outlook on possible future utilization of CV for crashworthiness analysis.

## 2 The Case Study: Crash Box Drop Tower Test

The mechanical problem selected for this paper is a simplified version of the crash box drop tower test [9]. In the automotive industry, these experimental impact studies are commonly carried out and are crucial to study the influence of variations in geometric shapes or materials on the crashworthiness [10]. Crash boxes indeed cover two main engineering objectives in automotive industry: lightweight design and energy absorption. For this reason, the crash box drop tower test offers an excellent example here.

A crash box is an integral component placed at the front-most portion of the body-in-white of a car. It ensures the structural performance of a car by serving as an energy-absorbing member, together with the bumper beam in case of frontal collisions during car accidents [11], thereby protecting the subsequent structures from higher repair costs. The drop tower test is a type of experiment that is commonly used in mechanical engineering and material science to study the behavior of a structure or material when subjected to impact. The test involves suspending a weight or a rigid barrier at the top of a crash box and then releasing it.

In the field of crashworthiness, the crash box drop tower test is an essential tool. These experiments are often replicated through the use of FE simulations [12]. Crash FE simulations have become essential in recent years for automotive Research and Development. As they are faster, sufficiently repeatable, and more cost-efficient than hardware tests, FE simulations currently support the design process. By enabling the exploration of new concepts and unconventional designs without the need for physical prototypes, simulations drive the innovation in the field of crash safety [13].

When it comes to crash analysis and dynamic events, explicit simulations are generally preferred to implicit ones. Crashes typically involve large deformations, material nonlinearity, multiple interacting bodies with complex contact conditions, and possibly failure. Explicit methods handle these nonlinearities well because they do not require the solution of large sets of simultaneous equations at each time-step [14].

In summary, data coming from FE simulations offer a solid foundation to conduct studies due to the increased accessibility and repeatability compared to hardware tests. In this section, we describe the FE model at the basis of our study; later, we introduce the selected crash box variants.

### 2.1 Finite Element Model.

Figure 2 schematically represents the FE model of a crash box subjected to a drop tower test. The crash box is fully constrained at the rear end, and a rigid barrier moves toward it along the *z*-axis with an initial velocity of −56 km/h. The FE model is realized meshing the crash box with Belytschko–Tsay shell elements and mesh size of 4 mm, while the rigid barrier is meshed with fully integrated shell elements.

The rigid barrier is considered as a rigid body using the MAT_RIGID ls-dyna material model. The material of the crash boxes is instead a steel modeled through an ls-dyna MAT24 model with elastoplastic behavior. The mass of the rigid barrier is set according to the crash box geometry in order to obtain four folds in the fully deformed configuration, resulting in a nicely observable outcome. In the context of crashworthiness, the term folds refers to the deformations that occur as a result of the buckling process in the crash box during the crash event. To provide a visual understanding, Fig. 3 illustrates the undeformed and deformed configuration of a crash box with the four folds highlighted.

The results of this simulation are extracted in terms of forces and displacements: the force is measured on a cross section 10 mm distant from the constrained end of the crash box; and the displacement is instead obtained averaging the displacements of eight equally distant nodes on the unconstrained end of the crash box. We arbitrarily choose eight measurement points on the external perimeter of the crash box to ensure that the displacement of all sides is equally considered. In cases where the crash box has internal ribs, these ribs are typically welded to the external perimeter. Therefore, our choice would still provide a representative average displacement of the upper part of the crash box. Both the cross section and the eight nodes are represented in Fig. 2.

During the impact phase, the force measured on the rigid barrier increases rapidly, reaching a peak, and then decreases. The first peak represents the initial contact between the barrier and the crash box. The peak force for the first fold is higher than the forces occurring later because of higher energy and because of predeformations: one fold triggers the generation of the next fold [15]. Hence, the initial peak force can be considered as one of the main parameters influencing both the energy absorption and the load transmission to the main body of the vehicle. The secondary peaks, which are related to the subsequent foldings, help to enhance the energy absorption [16]. This continuous folding mechanism is desired during the entire crash phenomenon. The reader can refer to Ref. [17] for representations of the force–displacement behavior of crash box structures.

### 2.2 Model Variants.

Six crash boxes are selected for the study. For the sake of simplicity, we refer to these crash boxes as designs A, B, C, D, E, and F. The length *c* along the *z*-direction, see Fig. 2, is the same for all the crash boxes and equal to 348 mm. An artificial trigger has been introduced to ensure that when the load is applied, a progressive deformation takes place in the axial direction. The trigger is placed in the first row of nodes by shrinking the perimeter length by 1%.

Looking at their cross sections, crash boxes A, B, and C are single-cell geometries, while crash box D, E, and F are three-cells ones. In Fig. 2, the dotted lines show a three-cells crash box design. The crash box geometries differ from each other in terms of their dimensions in *x*- and *y*-directions, i.e., *a* and *b* of Fig. 2, and nominal thickness *t _{n}*. Table 1 summarizes the geometrical features of the selected crash boxes. The last column reports the mass of the rigid wall set for each specific crash box design.

Design | a (mm) | b (mm) | t (mm)_{n} | No. of cells | M (kg) |
---|---|---|---|---|---|

A | 122.3 | 62.6 | 2.4 | 1 | 226.1 |

B | 39.5 | 54.7 | 2.5 | 1 | 182.7 |

C | 44.0 | 38.0 | 2.1 | 1 | 92.2 |

D | 120.0 | 63.3 | 2.2 | 3 | 396.0 |

E | 101.0 | 62.1 | 2.3 | 3 | 408.4 |

F | 120.6 | 69.2 | 2.0 | 3 | 408.4 |

Design | a (mm) | b (mm) | t (mm)_{n} | No. of cells | M (kg) |
---|---|---|---|---|---|

A | 122.3 | 62.6 | 2.4 | 1 | 226.1 |

B | 39.5 | 54.7 | 2.5 | 1 | 182.7 |

C | 44.0 | 38.0 | 2.1 | 1 | 92.2 |

D | 120.0 | 63.3 | 2.2 | 3 | 396.0 |

E | 101.0 | 62.1 | 2.3 | 3 | 408.4 |

F | 120.6 | 69.2 | 2.0 | 3 | 408.4 |

## 3 Control Variates

In this paper, we propose a method to apply the bifidelity CV to different physical domains. In the literature, the term bifidelity typically refers to the use of models with different levels of accuracy, e.g., Ref. [18]. The fidelity level usually indicates the degree to which the model captures the complexity of the real-world system it represents. In the context of passive safety development, the use of bifidelity approaches traditionally implies integrating results from both high- and low-fidelity simulations, e.g., detailed FE models and simpler analytical models, to predict the performance of a safety component. However, in this paper, the term LF is being repurposed as described in Sec. 1: we use it to describe data from previous versions of a component (Fig. 1(b)), not a less accurate modeling method (Fig. 1(a)).

In the following, we first explain the basic theory of CV and then propose a methodology for the use of CV in those cases where the cheap and expensive data belong to different physical systems. Finally, we summarize an overview of the whole method.

### 3.1 Classic Control Variates.

Control variates is a technique that can be used to estimate the first and second order statistics, i.e., mean and variance, respectively, of a quantity of interest (QoI), see, e.g., Refs. [6] and [19]. This technique is of aid when one has a large amount of cheap-to-evaluate data from a LF model, and a few data points from a HF model, which are typically more computationally expensive to obtain. The basic idea behind CV is to aggregate the LF and HF data to estimate the first and second order statistics of the QoI. These statistics are particularly useful for describing unimodal statistical distributions, providing valuable insights into the central tendency and spread of the response. In the following, the focus is on presenting CV for estimating the mean of the response. However, the framework discussed here can be extended toward estimation of the variance of the response, see, e.g., Ref. [20].

*n*sample evaluations available. That is, the values $rHF(\theta n(i)),\u2009i=1,\u2026,n$ are known, where $\theta n(i)$ denotes the

*i*th sample of $\theta $ distributed according to $p\Theta (\theta )$. Thus, the estimate for the mean of the response $\mu \u03021(rHF,\Theta n)$ considering these samples is

where $\Theta n$ is the set grouping all samples $\theta n(i),\u2009i=1,\u2026,n$. The above equation corresponds to the classical Monte Carlo (MC) estimator for the mean.

As evaluating the HF model is expensive from a numerical viewpoint, it is assumed that *n* is a small number, and therefore the estimate of the mean generated with Eq. (1) may lack precision. This is illustrated in Fig. 4, where the probability density function associated with the estimator in Eq. (1) is represented by the curve with the lowest peak. We observe that this probability density possesses a large variance.

Here, $\Theta m$ is a set of *m* samples of the uncertain input parameters, *γ* is a real number termed as the *control parameter* to be discussed later, and $\mu \u03021(rLF,\Theta n)$ and $\mu \u03021(rLF,\Theta m)$ are estimates of the mean of the response associated with the LF model considering *n* and *m* samples, respectively. From Eq. (2), one can observe how the LF and HF data are aggregated by means of CV. We assume to have significantly more samples in $\Theta m$ than $\Theta n$, that is, $m\u226bn$.

We understand the CV estimator in Eq. (2) as the summation of the following two terms. The first term $\gamma \mu \u03021(rLF,\Theta m)$ represents the mean response of the LF model amplified by the control parameter *γ*. As the number of samples *m* is large, this estimator is precise. This is represented schematically in Fig. 4, where the probability density associated with this estimator corresponds to the blue curve that possesses a small variance.

The second term $(\mu \u03021(rHF,\Theta n)\u2212\gamma \mu \u03021(rLF,\Theta n))$ in Eq. (2) represents a correction over the previous estimate that forces the overall estimate to converge to the sought mean value. A characteristic of the latter term is that both the low- and high-fidelity models are evaluated considering the same set of samples $\Theta n$, and thus the estimator associated with this second term will usually possess small variance, provided that there is sufficient correlation between the low- and high-fidelity models. Therefore, the CV estimator in Eq. (2) possesses a reduced variance when compared to its counterpart in Eq. (1), as represented schematically in Fig. 4 by the probability distribution in green color.

*γ*. Thus, the control parameter is selected such that this variance is minimized, leading to

where $\gamma \u2217$ denotes the value of the control parameter that minimizes the variance in Eq. (3) and that should be used when estimating the mean of the response when applying CV as in Eq. (2). Equation (4) shows that the computation of the control parameter $\gamma \u2217$ is done automatically on the base of the LF and HF samples. It is interesting to note that the control parameter depends directly on the covariance $\mu \u03021,1$, which is calculated based on the samples. From this observation, it is possible to derive two extreme cases. Whenever this covariance is equal to zero, the control parameter becomes zero, and thus the estimators of the mean and variance in Eqs. (2) and (3) rely solely on the simulations available of the HF model. In other words, given the null covariance, the estimators ignore all information associated with the LF model, as this is not useful at all. On the contrary, when the LF model mimics perfectly well the HF model, the control parameter tends to one, implying that the information provided by the LF model is deemed as good as the HF model. In summary, the optimal control parameter can be interpreted as a means for aggregating the information of the LF and HF models in an automatic way, giving proper weight to the information provided by each of them.

An examination of Eqs. (2)–(4) reveals that for the application of the CV framework, all what is required is the set of samples of the response of the low- and the high-fidelity model, that is, $rLF(\theta n(i)),\u2009i=1,\u2026,n,\u2009rLF(\theta m(j)),\u2009j=1,\u2026,m$, and $rHF(\theta n(i)),\u2009i=1,\u2026,n$, respectively. However, if the samples of the response associated with the low- and high-fidelity models are used to compute both the optimal control parameter in Eq. (4) as well as the mean in Eq. (2), the estimator for the mean becomes biased, as documented in, e.g., Ref. [6]. Fortunately, this issue can be solved by applying a splitting scheme, as proposed in Ref. [21]. The splitting scheme consists, in a nutshell, of estimating both the optimal control parameter and the mean of the response considering subsets of the samples already available. To explain this strategy in detail, consider that each of the sets of samples $\Theta m$ and $\Theta n$ are split into three subsets, that is, $\Theta m\u2217,k$ and $\Theta n\u2217,k$, where *k *=* *1, 2, 3 and $m\u2217=m/3$ and $n\u2217=n/3$. In addition, for each subset *k*, one defines a so-called subset controller $\tau (k)$, which is an integer defined according to Table 2.

In the above equations, $\mu \u03021CVS$ denotes the estimator of the mean considering CV and splitting, while $\gamma \tau (k)\u2217$ denotes the optimal control parameter associated with the subset of samples $\tau (k)$.

It is noted from Eq. (5) that when considering the *k*th subset of samples for calculating the mean responses of the low- and high-fidelity models, the control parameter is calculated with respect to the samples contained in the subset $\tau (k)$. In other words, when the mean is calculated with a certain subset of samples, there is a different subset of samples involved in the calculation of the optimal control parameter. Such a strategy ensures that the estimate $\mu \u03021CVS$ is unbiased, as discussed in detail in Ref. [21].

### 3.2 Control Variates for Different Domains.

Section 3.1 assumes that both the low- and high-fidelity models possess the same input $\theta $. However, this may not be necessarily the case when considering an already developed product versus one which is being currently developed. For example, consider the thickness of a crash box as described in Sec. 2. The assumption of uncertainty associated with this parameter is plausible due to unavoidable manufacturing variability. However, the thicknesses associated with an existing crash box and one under development may possess completely different ranges of variability. For this reason, we propose in this section an approach to handle LF and HF systems that have different nominal values of design parameters and, consequently, different design spaces. This might influence the computation of the correlation between the LF and HF data, as well as the CV implementation itself. To overcome this challenge, we propose to generate samples within the parameter space of each system and then compare these samples in a sequential manner. Generally, the LF data are assumed to be available from past studies. However, if the computational resources for the collection of the LF data are not limited as for the HF ones, one could also think to generate the LF dataset from scratch. The method that we propose would still hold.

*j*represents the number of design variables involved in the study, $j\u22081,2$. We propose to compute the corresponding points for the HF system with respect to the available LF ones following the formula in Eq. (8). In particular, with $val1,HF$ referring to the HF dashed-blue value in Fig. 5 and $val1,LF$ the LF counterpart squared in a blue solid line, the formula allows to compute the first with respect to the second. Equation (8) ensures that the HF value corresponding to the LF one is appropriately adjusted to its specific range

After the collection of the data for the HF system, we propose using an ordinal criterion. We exploit the sequential nature of the collected data to compute the correlation between the LF and HF data. This means that the first data point of the LF model has to be associated with the first HF one and so on. This approach can be extended to cases with multidimensional inputs and can help to overcome the challenge of comparing models with different ranges of design variables.

We take the sample from the first row of the LF model and compare it with the first sample from the HF model, regardless the respective design spaces; then, we take the second sample from each and compare those, and so on. This comparison would be straight forward in the commonly known bifidelity case, as the LF and HF samples would be direct equivalents of each other. One has to remember, however, that in the case under investigation, the assumption of knowing the design variables ranges covers a role of great importance.

*y*

_{LF}is the original output vector, avg

_{LF}is the mean of that output vector, and std

_{LF}is its standard deviation. Consequently, we scale the HF output data using the scale of the LF. Scaling the HF data with respect to the LF ones reflects the assumption that the entire range of the HF data is unknown at the beginning of the study, as these data belong to a domain for which only a limited amount of information is available. The formula, with

*y*

_{HF}being the original output HF vector, clarifies the technique

Once the paired samples are set, and the output data are scaled, we can implement the CV technique described in Sec. 3.1.

### 3.3 Overview of the Method.

Our proposed method can be summarized in six steps: (i) defining the LF and HF systems; (ii) collecting the data from both systems, assuming that it is cheaper to obtain the information for the LF system than for the HF one; and (iii) scaling the data. Finally, (iv) applying CV to reduce the variance in the estimation for the selected metric, and (v) evaluating the results, comparing them to those obtained from plain MC.

When evaluating the CV results, one can focus on various aspects depending on the specific assumptions of the problem at hand. In our study, we are interested in investigating the impact of the number of HF samples *n* on the quality of the prediction. Given the challenges associated with collecting the HF data, i.e., data from the component under current development, we aim to observe how the prediction changes when the availability of HF samples is limited, while a large number of LF samples *m* is readily available. To achieve this, the results are plotted by keeping *m* fixed and high, while varying *n* in a specified vector.

For each value of *n*, the mean and variance of the selected QoI are calculated using CV and MC. While CV exploits both the low- and high-fidelity data, Eq. (6), MC only relies on the *n* HF samples, Eq. (1). These *n* HF samples are randomly selected each time from the original pool of simulation data. To eliminate the effect of random sampling, the results are averaged over ten independent runs.

## 4 Application

With the application of CV to a crash box drop tower test, we achieve two goals. First, we use CV on data coming from different mechanical systems with distinct design spaces. This implies employing the approach proposed in Sec. 3.2; second, we explore the effectiveness of CV in the industrial context of crashworthiness.

Although the original use of CV provides the opportunity to incorporate data of different fidelities, in this paper we uniquely refer to data generated through FE simulations. With the mechanical system under investigation in place as described in Sec. 2, we run the simulations using the R9.3.0 version of the ls-dyna solver. To this aim, we select the inputs and output of the study. We are interested in predicting the maximum value $Fmax$ of the force–displacement curve measured during the test. The input variables chosen for the study are the thickness *t* of the crash box, the overlap in *x*-direction Ω between the crash box and the rigid barrier, and the bending angle *α* of the rigid barrier, see Fig. 6. For proper nomenclature, *t* is a design variable, and Ω and *α* are load case parameters.

Aiming to generate a good space filling, we arbitrarily chose to use the Sobol sequences as sampling technique to apply variations to the parameters. As explained in Sec. 3.2, the accuracy of the results is not influenced by the choice of the sampling technique. The ranges defined in the scheme of Table 3 were also arbitrarily chosen to purely illustrate the applicability of our methodology. An increased *t* is expected to increase the $Fmax$; with values of Ω closer to 0, $Fmax$ increases; with higher *α*, $Fmax$ decreases. In total, we perform 500 FE simulations for each crash box of Table 1, varying the values of *t*, Ω, and *α* within their defined ranges. The results of these simulations form a comprehensive pool of data, serving as the repository from which the data are drawn for the application of CV.

Parameter | Distribution |
---|---|

t (mm) | $N$ [$tn\u22120.05tn$; $tn+0.05tn$] |

Ω (mm) | $U$ [$\u2212(3/10)a$; $+(3/10)a$] |

α (deg) | $U$ [0; 5] |

Parameter | Distribution |
---|---|

t (mm) | $N$ [$tn\u22120.05tn$; $tn+0.05tn$] |

Ω (mm) | $U$ [$\u2212(3/10)a$; $+(3/10)a$] |

α (deg) | $U$ [0; 5] |

The thickness *t* varies with a (truncated) Gaussian $N$ distribution with mean $\mu =tn$ and standard deviation $\sigma =0.074tn$; the overlap Ω and the angle *α* vary with a uniform $U$ distribution. The distributions are defined with a minimum–maximum range.

To apply CV to the six crash box designs that we have selected, i.e., A–F of Table 1, we create pairs of crash box designs to compare each one against all the others. So, we pair A with B, then A with C, and so on until A with F. After that, we pair B with A, then B with C, up to B with F. We continue this process for all designs, creating 30 different pairing combinations. Each time, one design acts as the LF and the other as the HF version. The LF output data are scaled to have zero mean and a unit standard deviation following Eq. (9), and the HF output data are scaled with respect to the LF range by means of Eq. (10).

For each of the 30 combinations investigated, we compute the Pearson's correlation coefficient between the LF and the HF crash box following the sequential method of Sec. 3.2. Table 4 contains the correlation coefficients computed for each LF/HF case. The symmetry of Table 4 indicates that the correlation coefficients remain consistent regardless of whether the same crash box combination is considered as LF/HF or HF/LF.

A | B | C | D | E | F | |
---|---|---|---|---|---|---|

A | 1.00 | 0.79 | 0.83 | 0.89 | 0.88 | 0.89 |

B | 0.79 | 1.00 | 0.92 | 0.75 | 0.76 | 0.68 |

C | 0.83 | 0.92 | 1.00 | 0.81 | 0.78 | 0.76 |

D | 0.89 | 0.75 | 0.81 | 1.00 | 0.88 | 0.90 |

E | 0.88 | 0.76 | 0.78 | 0.88 | 1.00 | 0.89 |

F | 0.89 | 0.68 | 0.76 | 0.90 | 0.89 | 1.00 |

A | B | C | D | E | F | |
---|---|---|---|---|---|---|

A | 1.00 | 0.79 | 0.83 | 0.89 | 0.88 | 0.89 |

B | 0.79 | 1.00 | 0.92 | 0.75 | 0.76 | 0.68 |

C | 0.83 | 0.92 | 1.00 | 0.81 | 0.78 | 0.76 |

D | 0.89 | 0.75 | 0.81 | 1.00 | 0.88 | 0.90 |

E | 0.88 | 0.76 | 0.78 | 0.88 | 1.00 | 0.89 |

F | 0.89 | 0.68 | 0.76 | 0.90 | 0.89 | 1.00 |

The element at location *i*, *j* in the matrix represents the correlation between the LF crash box *i* and the HF crash box *j*.

Control variates is applied for each LF/HF crash box combination. The amount of HF samples *n* assumes the values contained in the set $n\u2208{9,12,15,18,21,24,27,30,33,36}$, while the LF samples *m* are fixed at 200. For each *n* value and each LF/HF combination, the computation is repeated for ten times. Each time, for a given HF crash box design, a new randomly selected set of *n* HF samples is chosen from the original pool of 500 FE simulations. This is meant to minimize the impact of sampling-related chance effects. The computation provides the predicted values of mean and variance of $Fmax$ with CV and MC. The results from the ten repetitions are collected in boxplots. Within the boxplot, the median and the interquartile range (IQR) are the two parameters that we use to draw the conclusions of this study. In addition, we use the CV and MC methods to compute the mean value of the QoI for each LF/HF combination utilizing a high amount of HF samples, i.e., *n *=* *200. This allows us to assess the convergence of the methods in the long horizon.

The median value of the boxplots represents the central tendency of the data. The boxplots' IQR spans from the 25th percentile to the 75th percentile of the results and provides valuable information about the dispersion of the results across the ten repetitions. As similar results hold for all the 30 LF/HF crash box combinations, we decide to show only some representative results. Specifically, those referred to C/B and F/B. Figures 7 and 8 represent the variation of the median and the IQR of the mean value of $Fmax$ along the different *n* values.

In the context of our study, the median value of the boxplots represents the average value of the predicted mean of $Fmax$. The IQR indicates how the results are spread out, suggesting a higher level of consistency among the repetitions with a smaller IQR. Figure 7 shows how fast the methods, i.e., CV and MC, reach convergence with increasing *n*. Moreover, with *n *=* *200 HF samples, the mean of $Fmax$ computed, respectively, with CV and MC is 4.885 and 4.907 for the C/B LF/HF combination, and −1.466 and −1.462 for the F/B LF/HF combination. This confirms that with high *n*, both methods converge to the exact mean value. Figure 8 indicates the consistency of the methods among the random repetitions.

For each LF/HF crash box combination, the HF data have always been scaled with respect to the LF data. This explains the different scales of the subplots of Figs. 7 and 8 and suggests to read them separately. Focus has to be given to the individual comparison between the CV and MC lines, not to the one between their absolute values in the different subplots. When comparing the CV and MC median value of the $Fmax$ mean value predictions, Fig. 7 shows that CV outperforms MC. The CV prediction, shown with blue circles, reaches convergence faster than MC, in red squares. In other words, for lower *n* values, the CV median is closer to the exact $Fmax$ mean value than the MC one. In the figure, one can observe that this result holds for the two different crash box LF/HF combinations.

Figure 8 shows that CV not only better predicts the first order statistics with respect to MC; it is also more repeatable for limited HF datasets. The IQR for CV reaches zero faster than for MC. This means that, for lower *n*, i.e., for fewer HF samples, CV gives less spread predictions of the first order statistics of $Fmax$ over multiple random repetitions. As with the previous finding, this is again true for both the crash boxes combinations considered in the figures. As a further note to the reader, the results obtained for the predicted variance of $Fmax$ in terms of the boxplots' median and IQR are consistent with those of the mean and are therefore not reported here.

The results also confirm the state-of-the-art knowledge about CV. Specifically, CV is shown to reach better results than MC where the correlation between the LF and the HF data is higher. To prove this concept, we focus on the fact that the two situations represented in Fig. 8 correspond to the highest and lowest Pearson's correlation coefficients computed over all the crash box combinations: C/B has a correlation of 0.92, while F/B has 0.68, see Table 4. The left subplot of Fig. 8 shows a stronger prevail of CV over MC than the one visible in the right one. The vertical distance between the CV and the MC lines in the left subplot is larger than the one in the right since the data from crash box C are more correlated to those of crash box B than the ones of crash box F. For this reason, using CV is more efficient in the first case than in the second. Overall, the results demonstrate that the estimation of the control parameter remains reliable even with a very low amount of HF points.

## 5 Conclusions

This paper has presented the application of CV to crash analysis. In the automotive industry, when assessing the crash safety early on in the development, the final characteristics of a product are not yet entirely defined. However, it frequently happens that plenty of past data are stored from previous products and remain unexploited. This paper has shown that CV can be of aid to improve the prediction of the first and second order statistics of a crash performance parameter, exploiting data from past systems' designs.

Through a methodical approach that generates samples within each system's parameter space and compares them in a sequential manner, we have illustrated the use of CV in an industrial scenario. Here, the LF and HF data belong to different physical system, i.e., old and new ones. The addressed study case focuses on a crash box subjected to drop tower test.

The results of the study suggest that CV outperforms plain MC in the prediction of the mean of the maximum crashing force for a limited amount of HF data. Wherever a high correlation is observed between the LF and HF data, CV is shown to more effectively converge to the exact mean value of the quantity of interest. CV also produces more stable solutions with respect to MC over a set of random repetitions.

Despite the promising results of the proposed method, some limitations should be acknowledged. Our method is useful for unimodal distribution, as it estimates the first and second order statistics. It does not suffice for multimodal distributions. In addition, our method is limited to LF and HF systems with the same set of input variables. In more complex cases, however, new versions of the same product may introduce uncertain parameters that were not present in the older versions. To overcome these limitations, alternative approaches need to be considered, e.g., deep learning techniques. Overall, the proposed method is promising but is still in its early stages. In terms of practical application, the translation of our method into industry requires further validations with diverse datasets and the development of integration protocols. Challenges to this aim include ensuring robustness across varied scenarios and properly training the engineers.

The application of CV in crashworthiness analysis provides the opportunity to combine data from different design stages of a component, thereby enhancing the efficiency of crash simulations. Moreover, with CV, it becomes possible in the future to incorporate data from different sources, including FE simulations and hardware tests. This approach allows for a more comprehensive understanding of the crash behavior of structures in the very early phase of the development, leading to improved vehicle safety and occupant protection.

## Funding Data

European Union's Horizon 2020 research and innovation programme (Marie Sklodowska-Curie Grant Agreement GREYDIENT No. 955393).

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

## Nomenclature

*a*=dimension of the crash box in

*x*-direction*b*=dimension of the crash box in

*y*-direction- CV =
control variates

- $Fmax$ =
maximum value of the force–displacement curve

- HF =
high fidelity

- IQR =
interquartile range

- LF =
low fidelity

*m*=number of LF samples

*M*=mass of the rigid wall

- MC =
Monte Carlo

*n*=number of HF samples

- QoI =
quantity of interest

*t*=_{n}nominal thickness of the crash box

*α*=bending angle of the rigid barrier

- Ω =
overlap between the crash box and the rigid barrier

### Appendix: Calculation of Variance and Covariance

*r*is either the LF response $rLF$ or the HF response $rHF$; and $\Theta l$ is a sample set that represents either $\Theta m$ or $\Theta n$. The covariance $\mu \u03021,1$ of the response between the LF and HF models is given by (see, e.g., Ref. [23])

## References

**98**(1), pp. 1–17.https://www.researchgate.net/publication/38176606_Energy_absorption_of_thinwalled_corrugated_crash_box_in_axial_crushing