A statistical approach for computational fluid dynamics (CFD) state-of-the-art (SoA) assessment is presented for specified benchmark test cases and validation variables, based on the combination of solution and N-version verification and validation (V&V). Solution V&V estimates the systematic numerical and modeling errors/uncertainties. N-version verification estimates the random simulation uncertainty. N-version validation estimates the random absolute error uncertainty, which is combined with the experimental and systematic numerical uncertainties into the SoA uncertainties and then used to determine whether or not the individual codes/simulations and the mean code are N-version validated at the $USoAi$ and $USoA$ intervals, respectively. The scatter is due to differences in models and numerical methods, grid types, domains, boundary conditions, and other setup parameters. Differences between codes/simulations and implementations are due to myriad possibilities for modeling and numerical methods and their implementation as CFD codes and simulation applications. Industrial CFD codes are complex software with many user options such that even in solving the same application, different results may be obtained by different users, not necessarily due to user error but rather the variability arising from the selection of various models, numerical methods, and setup options. Examples are shown for ship hydrodynamics applications using results from the Seventh CFD Ship Hydrodynamics and Second Ship Maneuvering Prediction Workshops. The role and relationship of individual code solution V&V versus N-version V&V and SoA assessment are discussed.

## Introduction

The foundations of experimental uncertainty analysis have been well established over the last 75 yr, as explained in Ref. [1]. The individual facility test uncertainty includes both systematic and random components that are estimated and combined as an overall uncertainty estimate. The systematic uncertainty is estimated at the zeroth-order replication level (single realization) based on the inherent measurement system biases. The random uncertainty is estimated at the first-order replication level (multiple realizations) based on statistical analysis. The combination of zeroth- and first-order replication level results provides the N-order replication level overall uncertainty. Normal statistics is often used under the large-sample assumption. Facility biases can only be assessed if individual facility test results and uncertainties are available from multiple facilities for the same geometry and test conditions. Stern et al. [2] proposed a statistical approach for assessing facility biases called M × N-order replication level (multiple facilities) in which the mean facility provides reference values. Industrial design testing is seldom concerned with facility biases, as the focus is often on design tradeoffs, whereas it is of utmost importance for computational fluid dynamics (CFD) validation studies, as the assessment of prediction capability requires a high degree of confidence in the experimental benchmark data as an accurate representation of the true value.

The foundations of CFD uncertainty analysis are not yet well established. Large conceptual and philosophical differences exist, ranging from definitions of errors and uncertainties and verification and validation (V&V) to detailed procedures and perspective followed.

Roache [3] adopts experimental uncertainty analysis definitions of errors and uncertainties; phrases for defining verification (equations solved correctly/mathematics) and validation (correct equations/physics); procedures such as manufactured solutions and the grid convergence index (GCI) for code verification and solution grid-size and time-step verification, respectively; and procedures for validation using experimental data.

The AIAA Committee on Standards for CFD [4] adopts information theory definitions of errors and uncertainties; adopts and expands on Roache phrases and procedures for V&V including the use of GCI; and segregates verification and validation as distinctly separate processes. Similarly, Oberkampf et al. [5] present validation metrics and, although they provide references for verification and uncertainty quantification (UQ), neither is included explicitly in the validation process.

Coleman and Stern [6] adopt experimental uncertainty analysis definitions of errors and uncertainties; define solution verification as a process for assessing simulation numerical errors/uncertainties; and define solution validation as a process for assessing simulation modeling errors/uncertainties. Validation is achieved if the absolute value of the comparison error $|E|$ (the difference between experimental data and simulation value) is less than or equal to the validation uncertainty *U _{V}* (root-sum-square of the simulation numerical and experimental data uncertainties).
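In code form, this validation check is a root-sum-square combination followed by a single comparison; a minimal sketch assuming only the definitions above (function names are illustrative, not from the references):

```python
import math

def validation_uncertainty(u_sn, u_d):
    """Root-sum-square of the simulation numerical (U_SN) and
    experimental data (U_D) uncertainties."""
    return math.sqrt(u_sn**2 + u_d**2)

def is_validated(e, u_sn, u_d):
    """Validation is achieved if |E| <= U_V."""
    return abs(e) <= validation_uncertainty(u_sn, u_d)

# Example: 2% comparison error against 1.5% numerical and 1% data uncertainty
u_v = validation_uncertainty(0.015, 0.01)   # ~0.018
print(is_validated(0.02, 0.015, 0.01))      # |E| = 0.02 > U_V, so False
```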

*U _{V}* sets the interval at which validation can be achieved. Roache [7] criticized this approach since the larger *U _{V}* is, the easier it is to be validated, and argued that a programmatic tolerance should be added to *U _{V}*. The authors point out that validation at large *U _{V}* is clearly undesirable and that programmatic validation requirements/tolerances on $|E|$ and *U _{V}* can easily be incorporated [8].

Stern et al. [9] expand on Coleman and Stern [6] by providing a comprehensive mathematical framework for V&V including the use of numerical benchmarks and programmatic validation requirements/tolerances; details of verification procedures including the correction-factor (CF) verification method; and examples for CFD simulations [10]. Oberkampf [11] criticized this approach as neglecting code verification and claimed that the verification approach used experimental data and the validation approach was based on single-realization experimental data. The authors point out that their focus is on solution verification and rebut his other two claims [12,13]. Roache [14] criticized the correction-factor verification method as failing to provide a 95% confidence level along with several other issues. The authors revised the correction factor to address his 95% confidence level criticism and rebut his other issues [15].

The ASME Standard for Verification and Validation in Computational Fluid Dynamics and Heat Transfer [16] follows Coleman and Stern [6] validation and Roache [3] verification procedures. Also discussed are the effects of important parameter uncertainties and evaluation and interpretation of V&V results with examples.

Xing and Stern [17] developed the factor of safety (FS) verification method, which addressed two deficiencies of the CF and GCI verification methods: unreasonably small uncertainties when the Richardson extrapolation order of accuracy is larger than the theoretical order of accuracy and lack of statistical evidence that the interval of numerical uncertainty is at the 95% confidence level. Roache [18] gave ten items of discussion to which the authors responded [19]. Eça and Hoekstra [20] developed the least square root (LSR) verification method, which does not require monotonic convergence. Xing and Stern [21] gave several criticisms to which the authors responded [22]. Phillips and Roy [23] propose a global deviation uncertainty estimator and compare with other verification methods. Rider et al. [24] propose a robust verification method and provide examples from computational heat transfer, fluid dynamics, and radiation transport. Outstanding issues include coverage factors/factors of safety, evidence for confidence levels, conservatism, and oscillatory convergence.

Code verification is appropriate during code development phases and not considered herein; however, shortcomings have been observed [25] in some cases. Solution V&V provides error/uncertainty estimates for a single user/code/simulation/setup resulting in fixed estimates analogous to experimental systematic uncertainties. This provides only a partial uncertainty estimate, with the risk of relying on too small and nonrealistic uncertainty intervals. Clearly, it would be desirable if consensus could be reached on definitions for errors and uncertainties and verification and validation, including the mathematical framework. While there are pros and cons for the different current verification methods, all are based on Richardson extrapolation, and, in practice, often provide similar results. In contrast, large differences exist on validation procedures. The AIAA approach segregates verification and validation and does not explicitly take into consideration simulation and experimental uncertainties in the validation process, whereas the ASME approach integrates verification and validation and explicitly takes into consideration simulation and experimental uncertainties in the validation process. In the AIAA perspective, the ASME approach is criticized since the larger the validation uncertainty is, the easier it is to be validated, albeit at a large interval. In the ASME perspective, the AIAA approach is criticized since it is unreasonable to presume that experimental, numerical, and other relevant uncertainties are negligible. A shortcoming of all these solution V&V procedures is the lack of a random simulation uncertainty estimate.

Hemsch [26] proposed a statistical approach referred to as N-version testing for the assessment of the scatter in CFD simulations based on results from the Drag Prediction Workshop (most recently, Ref. [27]). The objective is to assess CFD for simplified geometries and conditions relevant to commercial aircraft. The focus is mainly on drag predictions, along with grid convergence and statistical analysis of the CFD outcomes. The assessment of the CFD results is largely based on solution verification and variability, with a limited discussion on quantitative validation versus experiments. The focus is on the collective computational process consisting of all the individual processes, with the dispersion of the results interpreted as the noise in the collective computational process. Although the intention of the GCI method is to provide a numerical uncertainty estimate, neither it nor the experimental uncertainty is used in assessing validation. Grid convergence is assessed following Salas [28,29], including estimates for the numerical benchmarks. The workshop series has had large participation and has been clearly successful. Hemsch claims that N-version testing is similar to the experimental N-order replication level, but in our opinion, it is more similar to the first-order replication level, i.e., it lacks the zeroth-order replication systematic uncertainty estimate, which is needed in combination with the first-order replication level to obtain the N-order replication level. Statistical methods include estimating the mean using the sample average and/or median and assessing the 95% confidence interval using the sample standard deviation, the average moving range, and/or the median absolute deviation, and assuming a uniform distribution. Running record statistics of both simulation results and estimated numerical benchmarks are shown.

Stern et al. [30,31] proposed a statistical approach for certification of CFD codes using concepts and methods from Stern et al. [2] for estimating experimental facility biases, N-version testing [26], and solution V&V [9]. This overall approach includes systematic uncertainties (stemming from solution V&V), reference values (from experimental data and associated uncertainties), and random simulation uncertainty from N-version testing. Certification is defined as a process for assessing probabilistic confidence intervals for CFD codes/simulations for specified benchmark applications and certification variables. Certification of CFD codes/simulations is conducted for multiple codes and users (code level) for the same benchmark test case such that the scatter due to differences in models and numerical methods, grid types, domains, boundary conditions and other setup parameters can be assessed. Differences between codes/simulations are due to myriad possibilities for modeling and numerical methods and their implementation as CFD codes and simulation applications. Industrial CFD codes are complex software with many user options such that, even in solving the same application, different results may be obtained by different users, not necessarily due to user error but rather the variability arising from the selection of various models, numerical methods, and setup options. The perspective is essentially the same as Hemsch [26] that the scatter in the results represents the random component of the CFD simulation uncertainty. Certification broadly refers to meeting specified requirements via testing or similar procedures without legal implications such as licensing; therefore, the terminology certification was used to distinguish from validation in lieu of a better/accepted word. Statistical methods include estimating the mean using the sample mean and assessing the 95% confidence interval using the sample standard deviation and assuming a normal distribution. Running record statistics of simulation results is shown.

Stern et al. [30,31] used solutions from the fourth [32] and fifth [33] CFD Ship Hydrodynamics Workshops, respectively, as examples. The latest Seventh Workshop was held in 2015 in Tokyo (T2015). The next Eighth Workshop will be held in Wageningen in 2020. The objective of these workshops is the assessment of current CFD methods for ship hydrodynamics to aid code development, establish best practices, and guide industry. The benchmark test cases use modern tanker, containership, and combatant hull forms for design Froude number (Fr) and model-scale Reynolds number (Re) conditions, with the focus on V&V for turbulence modeling, self-propulsion, free surface, and ship motions. The test cases use experimental data from towing tanks, wave basins, and limited wind tunnel experimental facilities. The overall V&V approach follows Stern et al. [9]. Verification uses GCI, FS, and LSR methods. Validation follows Coleman and Stern [6], including numerical and experimental uncertainties in assessing the validation interval. Verification and validation is not segregated, i.e., is considered integral in assessing the quality of the CFD solutions. The sixth (held in 2010 [34]) and seventh Workshops have assessed the scatter in the submissions, i.e., the solution standard deviation as an additional assessment metric. The workshop series has had large participation and has been clearly successful.

Herein, an improved, more robust certification approach [35] is presented that uses absolute value statistics, thereby providing a more conservative error estimate and the possibility of certification at a reduced interval of uncertainty. The use of the terminology “certification” has been criticized as it implies/requires an authorizing body. Therefore, we accept this criticism and change the reference to our approach to a statistical approach for CFD state-of-the-art (SoA) assessment: N-version verification and validation. The focus is on assessing the CFD SoA capability in toto (for the codes/simulations as a whole) as opposed to a ranking of the individual codes/simulations. Additionally, the SoA assessment methodology is intended to help identify the allocation of resources for the reduction in errors and/or SoA uncertainties based on error magnitudes and the relative contributions of SoA uncertainty components. It should be emphasized that it is the code solutions for the benchmark test cases that are validated at either the individual code/simulation or multiple code/N-version levels, as opposed to the code itself for general application. The terminology code/simulation is adopted when referring to individual code solutions versus mean code when referring to the codes in toto. Examples are provided for ship hydrodynamics using results from the Seventh CFD Ship Hydrodynamics Workshop and the Second Ship Maneuvering Prediction Workshop held in 2014 (SIMMAN 2014 [36]). The First Ship Maneuvering Prediction Workshop was held in 2008 [37]. The discussion includes the role and relationship of individual code/simulation V&V versus N-version SoA assessment.

## Solution Verification and Validation

Simulation numerical uncertainty $USNi$ estimates for the solution *S _{i}* use the root-sum-square of the iterative $UIi$, grid $UGi$, and time-step $UTi$ uncertainties, where *i* indicates an individual code and *S _{i}* is the solution on the finest grid. ASME [16] advocates adding $UIi2$ with $UGi2$ and $UTi2$. Iterative and grid/time verification studies are difficult and unfortunately often neglected. The FS method requires monotonic convergence and a ratio of the Richardson extrapolation and theoretical orders of accuracy *P* = *P _{RE}*/*P _{th}* ≤ 2 (due to lack of data for *P* > 2 used for estimation of the required factor of safety). The LSR method allows for oscillatory convergence, but there are differences of opinion on some aspects of the procedures. The results of the sixth CFD Ship Hydrodynamics Workshop [34] indicated that both methods provided similar uncertainty estimates.

If $UVi\u226a|Ei|$, the sign and magnitude of $Ei\u2248\delta SM$ can be used to make modeling improvements. $UVi$ includes all estimable uncertainties in the data and the simulations and is the key metric in the validation process, which sets the interval at which validation can be achieved and may or may not meet programmatic requirements/tolerances, as discussed in Ref. [9].

Individual code solution V&V provides metrics for both the error $Ei$ and its uncertainty $UVi$ from which conclusions can be made concerning acceptability or improvement strategies. The experimental uncertainty *U _{D}* usually includes both systematic and random components, whereas the numerical uncertainty $USNi$ is based solely on the systematic error and uncertainty estimates. Sensitivity and UQ studies using random perturbations of CFD code input parameters fail to provide an accurate simulation random uncertainty estimate, as they are not representative of the inherent randomness in the CFD process as applied by different codes and/or users for different applications.

## N-Version Verification and Validation

N multiple solutions from different codes and/or users for specified benchmark test cases provide the necessary data for assessment of CFD SoA capability, including individual solution and mean code errors and estimates for simulation and absolute error random uncertainties. Similar to Hemsch [26], the assumption is made that the scatter of the CFD results represents the reproducibility of the computations. Results from many users of the same code are similar to N-order replication level experiments (individual facility and measurement systems), whereas results from many different codes are similar to M × N-order replication level experiments (multiple facilities and measurement systems). The statistical analysis of the results provides the desired metrics for N-version V&V based on the sample mean and standard deviation assuming a normal distribution of solutions and errors. Outliers are identified/rejected using Chauvenet's criterion [1]. A coverage factor of 2 (twice the standard deviation) is used for evaluating the 95% confidence intervals. Alternatively, the median and the median absolute deviation may be used to estimate the (population) mean and standard deviation; the uniform distribution may be used with a coverage factor of $\sqrt{3}$ (square root of three times the standard deviation). For simplicity, the sample mean value and standard deviation and normal statistics are used under a large-sample assumption; however, the procedure itself can easily accommodate nonparametric statistical methods such as bootstrap techniques [38] if sufficient data (number of multiple solutions) are available to warrant their use.
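The sample statistics and Chauvenet rejection described above can be sketched as follows (a minimal single-pass illustration; the function names are ours, not from the references):

```python
from statistics import NormalDist, mean, stdev

def chauvenet_filter(samples):
    """One pass of Chauvenet's criterion: reject points whose deviation
    from the sample mean has an expected count of less than 0.5, i.e.,
    two-sided tail probability less than 1/(2N)."""
    n = len(samples)
    m, s = mean(samples), stdev(samples)
    k = NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))   # per-tail prob 1/(4N)
    return [x for x in samples if abs(x - m) <= k * s]

def n_version_stats(samples, coverage=2.0):
    """Sample mean and 95% confidence half-width (coverage factor k = 2,
    normal distribution assumed) after outlier rejection."""
    kept = chauvenet_filter(samples)
    return mean(kept), coverage * stdev(kept)

# Seven submissions for one validation variable; the last is an outlier
sols = [1.02, 0.99, 1.01, 1.00, 0.98, 1.03, 1.50]
m, u = n_version_stats(sols)
```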

Industrial CFD codes/simulations and users are the statistical sample population of interest with appropriate benchmark validation test cases. The test cases should represent industrial practice for design, e.g., in ship hydrodynamics resistance and propulsion, seakeeping, and maneuvering tests using towing tanks and maneuvering and wave basins. The test case specifications are the geometry, conditions similar as they would be specified for experiments, and the validation variables are the measurement results used in ship design. The CFD SoA assessment is in the context of industrial applications and the ability of CFD to replace model testing. Thus, the setup of the CFD codes/simulations is not overly specified but rather left to the user best practices such that the variability in the CFD results represents the variability in the CFD SoA for the industrial application of interest.

The scatter in the results from N-version testing represents the random component in the CFD simulation and absolute error uncertainties, analogous to experimental random uncertainty. As with estimating the random uncertainty in an experiment, the variability can only represent that included in the statistical analysis. One must take into consideration appropriate time intervals and other factors between repeat tests such that the variability for the experimental result of interest is properly represented. For example: are repeat tests all done on the same day or over a longer interval of time, do the same or multiple test engineers conduct the tests, is the model re-installed for different repeat tests, and is the same measurement system used in estimating the random uncertainty, as per N-order replication level uncertainties? Such estimates, however, neglect facility biases and scale effects, which require additional results obtained from different facilities using their procedures, same or different models of the same geometry, and same or different scales, as per M × N-order replication level uncertainties. The analogous situation for N-version testing is academic exercises in which the CFD codes are all applied to the exact same initial boundary value problem, use similar models and numerical methods, and satisfy acceptable V&V and UQ such that the variability between solutions will likely be of similar order as the V&V and UQ versus an industrial exercise as envisaged herein.

### N-Version Verification.

where bias uncertainties are estimated at the simulation (single realization) level and precision uncertainties at the code (N-version, multiple realization) level.

Equations (1) and (3) assume correlated modeling and numerical errors are negligible, as a first approximation, as discussed in Refs. [9] and [30]. Thus, the systematic uncertainty should include correlated modeling and numerical errors at a higher order of approximation. In contrast, $PSi$ includes all simulation random uncertainties, including those arising from modeling and numerical errors and their correlations, i.e., represents the random simulation uncertainty.

$\sigma S%S\xaf$ provides a measure of the scatter in the multiple CFD solutions for the specified benchmark test case. $USi$ including $PSi$ (similarly for $US\xaf$ and $PS\xaf$) provides a simulation uncertainty estimate at the N-order replication level. The mean code is a fictitious representation of the average of the N-version population. Outliers can be identified and rejected similarly as with experimental data using, e.g., Chauvenet's criterion. Herein, for simplicity, a solution is rejected if its deviation from the mean is larger than $2\sigma S$, which corresponds to Chauvenet's criterion for *N* ≈ 10.
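The simplified rejection rule and scatter measure can be sketched as (a minimal single-pass illustration with our own function name):

```python
from statistics import mean, stdev

def mean_code_stats(samples):
    """Mean code value and scatter after the simplified 2*sigma_S
    outlier rejection (single pass)."""
    m, s = mean(samples), stdev(samples)
    kept = [x for x in samples if abs(x - m) <= 2.0 * s]
    m, s = mean(kept), stdev(kept)     # recompute on the retained solutions
    return m, s, 100.0 * s / m         # mean, sigma_S, sigma_S as % of mean
```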

The assumption that, for $N\u226510$ and codes/simulations sufficiently similar in modeling, numerical methods, and code development, the $Si$ distribution is approximately normal is reasonable; however, multiple peaks and skewed distributions are also realized and should be expected, e.g., clustering around turbulence models or grid types.

### N-Version Validation.

The CFD SoA assessment is based on N-version validation for the specified benchmark test case.

The average absolute error is always greater than or equal to the absolute value of the signed average error. The previous certification approach used the signed average error with $\sigma S=\sigma E$ for estimating the simulation and error random uncertainties. Bias and precision uncertainties were estimated similarly as for solution validation, i.e., treating *E _{i}* and $E\xaf$ as data reduction equations and using propagation of error analysis. Clearly, the average absolute error is a better indicator of CFD SoA capability, as averaging large positive and negative errors leads to the erroneous result that the errors are small. Herein, the average absolute error and its scatter are used for the CFD SoA assessment.
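A two-line numerical illustration of the point (the error values are invented): errors of opposite sign cancel in the signed mean but not in the absolute mean.

```python
errors = [5.0, -4.0, 6.0, -5.0]   # invented signed comparison errors (% of data)

mean_signed = sum(errors) / len(errors)               # 0.5: deceptively small
mean_abs = sum(abs(e) for e in errors) / len(errors)  # 5.0: true error level
```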

The bias uncertainty is evaluated using $|E|\xaf$ as the data reduction equation and propagation of error analysis, whereas the precision uncertainty uses $|E|\xaf$ in an end-to-end analysis in which the standard deviation is evaluated for $|E|\xaf$ itself. Note that this is the usual practice in experimental uncertainty analysis.

$\sigma |E|%D$ provides a measure of the scatter in the multiple solution absolute errors for the specified benchmark test case.

$USoA$ includes all estimable uncertainties in the data and the simulations and is the key metric in the assessment of the CFD SoA. It sets the interval at which the SoA can be achieved and may or may not meet programmatic requirements/tolerances.
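Per the combination described in the abstract, $USoA$ gathers the experimental, systematic numerical, and random absolute-error contributions. A sketch assuming a root-sum-square combination (the paper's actual equations are not reproduced in this excerpt, so treat this form as an assumption):

```python
import math

def u_soa(u_d, u_sn, sigma_abs_e, k=2.0):
    """SoA uncertainty: root-sum-square of the experimental (U_D),
    systematic numerical (U_SN), and random absolute-error
    (k * sigma_|E|) contributions.  Assumed combination rule."""
    return math.sqrt(u_d**2 + u_sn**2 + (k * sigma_abs_e)**2)

def n_version_validated(abs_e, u_d, u_sn, sigma_abs_e):
    """N-version validation is achieved if |E| <= U_SoA."""
    return abs_e <= u_soa(u_d, u_sn, sigma_abs_e)
```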

The sign of $E\xaf$ may be of value; however, clearly improvements are made at the individual code/simulation level.

Note that the coverage factor *k* in Eq. (29) follows the folded normal distribution quantiles and is asymmetric for the lower and upper bounds. Depending on the mean and standard deviation of the signed error, *k* ranges from 1.3 to 2 for the lower bound and from 2 to 2.4 for the upper bound. For simplicity, hereafter, the approximated value *k* = 2 is used.
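The quoted ranges can be reproduced numerically from the folded normal CDF; a sketch assuming the coverage factors are defined about the folded mean in units of the folded standard deviation (Eq. (29) itself is not reproduced in this excerpt):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def folded_cdf(x, mu, sigma):
    """CDF of |Y| for Y ~ N(mu, sigma^2), valid for x >= 0."""
    return phi((x - mu) / sigma) + phi((x + mu) / sigma) - 1.0

def folded_quantile(p, mu, sigma):
    """Invert the folded normal CDF by bisection."""
    lo, hi = 0.0, abs(mu) + 10.0 * sigma
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if folded_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def folded_moments(mu, sigma):
    """Mean and standard deviation of the folded normal."""
    m = (sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu**2 / (2.0 * sigma**2))
         + mu * (1.0 - 2.0 * phi(-mu / sigma)))
    return m, math.sqrt(mu**2 + sigma**2 - m**2)

def coverage_factors(mu, sigma):
    """Asymmetric 95% coverage factors (k_lo, k_up) about the folded mean,
    for signed error ~ N(mu, sigma^2)."""
    m, s = folded_moments(mu, sigma)
    return ((m - folded_quantile(0.025, mu, sigma)) / s,
            (folded_quantile(0.975, mu, sigma) - m) / s)
```

For a zero-mean signed error (half-normal limit) this gives k_lo ≈ 1.3 and k_up ≈ 2.4, while for a signed-error mean large relative to its standard deviation both factors approach the normal value of about 2, matching the stated ranges.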

N-version validation provides additional confidence compared to individual solution validation since it is additionally based on statistics of the normal distribution of N-versions. State-of-the-art uncertainty is also an improvement over simply identifying outliers based on $\sigma S$ alone since additionally includes considerations of bias uncertainties. As with experimental uncertainty analysis, maximum confidence is achieved if both bias and precision uncertainties are considered. Subgroup analysis procedures can be used for isolating and assessing differences due to the use of different models and/or numerical methods.

*U*_{req} can be considered similarly as for solution validation, but with *U _{V}* replaced by $USoAi$. Since $USoAi$ ≥ *U _{V}*, it will always be a more conservative assessment. There are six possible combinations of $|Ei|$, $USoAi$, and *U*_{req}, assuming none are equal.

In cases 1, 2, and 3, N-version validation is achieved at the $USoAi$ interval, i.e., the comparison error is below the noise level. From an uncertainty perspective, modeling errors cannot be isolated. In cases 4, 5, and 6, the comparison error is larger than the noise level, i.e., $USoAi$ < $|Ei|$ such that from an uncertainty perspective, the sign and magnitude of *E* can be used to estimate *δ*_{SM}. If $USoAi$ ≪ $|Ei|$, *E* ≈ *δ*_{SM}. Only cases 1 and 4 meet the programmatic requirements.
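The six combinations can be enumerated directly; a sketch in which the case ordering is inferred from the discussion in the text (cases 1–3 validated, cases 1 and 4 meeting the requirements), so treat the exact numbering as an assumption:

```python
def nv_case(abs_e, u_soa, u_req):
    """Classify the six possible orderings of |E_i|, U_SoA_i, and U_req
    (none equal).  Cases 1-3 are N-version validated (|E_i| < U_SoA_i);
    cases 1 and 4 meet the programmatic requirement U_req."""
    if abs_e < u_soa < u_req:
        return 1
    if abs_e < u_req < u_soa:
        return 2
    if u_req < abs_e < u_soa:
        return 3
    if u_soa < abs_e < u_req:
        return 4
    if u_soa < u_req < abs_e:
        return 5
    return 6            # u_req < u_soa < abs_e
```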

Consideration of programmatic requirements/tolerances resolves two paradoxes of the Coleman and Stern [6] solution validation approach: (1) only when validation is not achieved is it possible to have confidence that the error equals the modeling error; and (2) validation is easier to achieve for large *U _{V}*, i.e., noisy experiments and/or simulations. These paradoxes are mentioned at the individual code/simulation level but are also true for N-version validation with *U _{V}* replaced by $USoAi$.

The reason for paradox (1) is that only for *U _{V}* = 0 is it true that *E* = *δ*_{SM}, which can occur only for cases 4–6. For case 4, even though validation is not achieved, both $|Ei|$ and *U _{V}* are < *U*_{req} such that programmatic requirements are met and no action is needed. For case 5, *E* = *δ*_{SM} can be used to guide improvements in modeling in order to meet programmatic requirements. For case 6, similar as for case 5, *E* = *δ*_{SM} can be used to guide improvements in modeling, and reduction in *U _{V}*, i.e., *U _{D}* and/or *U _{SN}* (depending on their relative magnitudes), is required in order to meet programmatic requirements.

The reason for paradox (2) is that, without *U*_{req}, *U _{V}* is unrestricted, whereas once restricted by *U*_{req}, there is no possibility for acceptance of the achievement of validation by a large *U _{V}*. For case 1, both $|Ei|$ and *U _{V}* are < *U*_{req} such that programmatic requirements are met and no action is needed. For case 2, reduction in *U _{V}*, i.e., *U _{D}* and/or *U _{SN}* (depending on their relative magnitudes), is required in order to meet programmatic requirements. For case 3, reduction in both $|Ei|$ and *U _{V}*, i.e., *U _{D}* and/or *U _{SN}* (depending on their relative magnitudes), is required in order to meet programmatic requirements. Thus, case 3 is the most difficult, as one cannot discriminate between different models with $|Ei|$ < *U _{V}* from an uncertainty perspective.

The processes for determining *U*_{req} and *U _{V}*/$USoAi$ are very different; therefore, meeting or not meeting *U*_{req} should not be confused with individual code/simulation and multiple codes/N-version validation. Solution validation is a process for assessing simulation modeling errors/uncertainties. N-version validation extends this concept to multiple codes/simulations, which enables inclusion of the random absolute error uncertainty in assessing the CFD SoA. Presumably, the process for determining *U*_{req} is dominated by financial (beyond design testing and simulation), safety, environmental, and other concerns, which may or may not take into consideration *E _{i}* and *U _{V}*/$USoAi$.

## Summary of Solution and N-Version Verification and Validation and State-of-the-Art Assessment Metrics and Procedures

The important metrics for solution V&V at the individual code/simulation level are the simulation numerical uncertainty $USNi$, the comparison error *E _{i}*, the experimental uncertainty *U _{D}*, and the validation uncertainty $UVi$. Verification procedures depend on the verification method, but typically require iterative convergence at least one order of magnitude smaller than grid or time convergence. Grid and time-step convergence estimates use multiple systematically refined grid/time-step sizes for which convergence, the order of accuracy, and error/uncertainty estimates are made. Validation is achieved if $|Ei|$ ≤ $UVi$. Reduction in the validation interval requires reduction in $USNi$ and/or *U _{D}* depending on their relative magnitudes. If $UVi$ ≪ $|Ei|$, the sign and magnitude of $Ei$ ≈ $\delta SM$ can be used to make modeling improvements. Programmatic requirements/tolerances are easily incorporated as discussed previously.

The most important metrics for N-version verification are additionally the scatter in the simulation results $\sigma S$, the code level individual code simulation uncertainty $USi$, and the code level mean code simulation uncertainty $US\xaf$. Outliers can be identified/rejected using Chauvenet's criterion. The combination of solution and N-version verification uncertainties provides an N-order replication level simulation uncertainty estimate.

The most important metrics for N-version validation and CFD SoA assessment are additionally the mean comparison error $E\xaf$, the mean absolute comparison error $|E|\xaf$, the mean absolute comparison error standard deviation $\sigma |E|$, the mean code SoA uncertainty $USoA$, and the individual code/simulation SoA uncertainty $USoAi$. Individual code/simulation N-version validation is achieved if $|Ei|$ ≤ $USoAi$. $USoAi$ sets the interval of the CFD SoA. Reduction in $USoAi$ requires reduction in $USNi$, *U _{D}*, and/or $\sigma |E|$ depending on their relative magnitudes. If $USoAi$ ≪ $|Ei|$, the sign and magnitude of $Ei$ ≈ $\delta SMi$ can be used to make modeling improvements. Programmatic requirements/tolerances can be considered as already discussed. Mean code N-version validation follows similar procedures as for the individual codes.

Submissions from the CFD workshops for benchmark test cases provide the necessary data to assess these important metrics and reach conclusions on the CFD SoA capability.

## Example Computational Fluid Dynamics State-of-the-Art Assessments

Two examples are provided using results from the Seventh CFD Ship Hydrodynamics Workshop and the Second Ship Maneuvering Prediction Workshop. Test cases are selected that include sufficient submissions for statistically relevant sample populations representing many countries, institutes, and codes. Comparisons are made with results from previous workshops to assess advancements in CFD prediction capabilities.

### Test Case 2.10 From Seventh CFD Ship Hydrodynamics Workshop (T2015).

Test case 2.10 conducted V&V and SoA assessment for the KRISO Container Ship (KCS) at Fr = 0.26 and Re = 1.074 × 10^{7} for added resistance (AR) in head waves. Figure 1(a) shows the KCS hull form and coordinate system. Simonsen et al. [39] provided the experimental data and uncertainties. The validation variables for calm water are resistance *X*_{calm}, sinkage *σ*, and trim *τ*; and for head waves are *X* = *X*_{mean in waves} − *X*_{calm}, heave *z*, and pitch *θ*. The variables are nondimensionalized by water density *ρ*, wetted surface area *S*, ship velocity *V*, ship length *L* and beam *B*, and wave amplitude *A* and slope *k*. For calm water, the nondimensional variables are *C _{T}* = *X*_{calm}/0.5*ρSV*^{2}, $\sigma/L$, and *τ* (in deg). For head waves, the nondimensional variables are AR = *X*/(*ρgA*^{2}*B*^{2}/*L*), $z/A$, and $\theta/Ak$. *C _{T}* is based on wetted surface area *S*/*L*^{2} = 0.1818 with the rudder and in calm water. The KCS geometry, mass properties, and Fr and Re were specified along with the wave length/model length ratios *λ*/*L* = 0.65, 0.85, 1.15, 1.37, and 1.95 and the wave height/wave length ratio *H*/*λ* = 1/60. The *λ*/*L* = 1.15 condition is the resonance condition, at which the frequency of encounter equals the natural frequency for heave and pitch motions. Motions for *H*/*λ* ≤ 1/60 are usually considered small-amplitude response. The experimental first harmonic wave amplitude (*A* = *ζ*_{1}) and length (*λ* = *λ*_{1}) were used as the CFD inflow. The experimental waves were largely first-harmonic dominant, i.e., *ζ*_{2} ≤ 1%*ζ*_{1} for short waves and *ζ*_{2} ≤ 10%*ζ*_{1} for long waves. For calm water ship design, *C _{T}* is a first-order term, whereas sinkage and trim are second-order terms. For head wave ship design, the first harmonic amplitude and phase of heave and pitch are first-order terms, whereas AR is a second-order term.

KRISO Container Ship calm water experimental data were available from four facilities/countries using four model sizes ranging from 7.3 to 2.7 m, such that the *C _{T}*, $\sigma/L$, and *τ* uncertainties include facility bias uncertainty due to data scatter. For *C _{T}* and *C _{R}* (residuary resistance coefficient), the scatter was 5.6 and 1.4 times larger than the individual facility uncertainty, respectively, excluding the smallest model. For $\sigma/L$ and *τ*, the individual facility uncertainty was not estimated; therefore, for these variables, the uncertainty estimate is based only on the data scatter. More attention is needed for individual facility uncertainty analysis. Since *C _{R}* partially removes scale effects, its value was used. KRISO Container Ship head wave data and uncertainties were only available from two facilities/countries using three model sizes. The data scatter was large, which may be due to scale effects, as one of the model sizes is relatively small. Therefore, the data for the small model size were not included in the uncertainty estimates used for validation. However, the statistical convergence of the Fourier coefficients was included, which was equal to, three times, and four times larger than the individual facility uncertainty estimates for AR, heave, and pitch, respectively. Thus, for head waves, more data are needed from more facilities, including uncertainties, for more robust validation. In addition, more attention is needed to quantify the CFD input wave condition and uncertainty and to specify the CFD conditions and evaluation locations.

As shown in Table 1, ten institutes from eight countries using nine different CFD codes submitted results for both calm water and waves. The CFD codes included commercial, industrial, and academic software. Two-equation turbulence models were used, with eight submissions using wall functions and two using near-wall models. The solvers were mostly finite volume (FV), except one finite difference (FD). All but two used unstructured grids. Four used level set and six used volume of fluid (VoF) free surface models. Grid motion used dynamic, dynamic overset, and deforming grids. Grid sizes varied from 0.6 to 29 M with an average of 9.8 M. The average number of processors and run time were 60 and 3484 h, respectively.

It is very unfortunate that participants did not submit individual code/simulation grid and time-step verification for the ship motions test cases; not requiring it was an important oversight of the workshop organizers. One submission performed verification for head waves but did not submit the results, whereas others performed verification for different hulls with similar conditions but not for the present test cases. Larsson et al. [34] summarize results from several ship motions verification studies (Table 4.11). For current grid sizes/time steps, individual solution verification $U_{SN_i}$ is estimated at about 4% and 8%*S*_{1} for first- and second-order terms, respectively, which would increase the intervals of solution validation by about 30%*D* with a relatively small effect on the intervals of N-version validation, especially for head waves. Herein, $U_{SN_i}$ is neglected instead of including representative estimates, with the warning that the intervals of validation are optimistic and the strong conclusion that future workshops need to require participants to submit verification for all test cases.

Figure 2 shows the signed and absolute error histograms and running records for calm water resistance *C _{T}*, including sample mean and median values and mean, median, and individual code/simulation uncertainty estimates based on several statistical methods.

Figures 2(a) and 2(b) signed and absolute error histograms include normal/uniform and folded normal/uniform distributions for comparison. Both *C _{T}* signed and absolute error distributions are skewed and do not appear to closely follow the normal/uniform and folded normal/uniform distributions. However, *N* = 10 items are insufficient for accurate assessment of the distribution.

Figure 2(c) shows the signed error running record. The differences between the mean and the bootstrapped mean and between the mean and the median are 0.05 and 0.26%*D*, respectively. The uncertainty of the mean evaluated assuming a normal distribution/large sample is 0.19%*D* larger than the uncertainty of the mean provided by the bootstrap method and 1.4%*D* smaller than the uncertainty of the median obtained using the binomial distribution. The difference between the individual code/simulation uncertainty assuming a normal distribution with a coverage factor of 2 and the individual code/simulation uncertainty using the median and assuming a uniform distribution with a coverage factor of $\sqrt{3}$ is 0.73%*D*.

Figure 2(d) shows the absolute error running record. The differences between the mean and the bootstrapped mean and between the mean and the median are 0.01 and 0.16%*D*, respectively. The uncertainty of the mean evaluated assuming a large sample is 0.06%*D* larger than the uncertainty of the mean provided by the bootstrap method and 0.88%*D* smaller than the uncertainty of the median obtained using the binomial distribution. The difference between the individual code/simulation uncertainty assuming a folded normal distribution with a coverage factor of 2 and the actual 95% confidence interval upper bound for the folded normal distribution is 0.58%*D*. If the median and the folded uniform distribution are used, the individual code/simulation uncertainty is 0.37%*D* smaller than using the mean with a coverage factor of 2.
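The uncertainty-of-the-mean comparison underlying the running records can be reproduced with a short sketch; the coverage factor of 2 for the large-sample normal estimate and the percentile bootstrap are assumptions consistent with standard practice, and the data are illustrative rather than workshop values:

```python
import math
import random
from statistics import mean, stdev

def mean_uncertainty_normal(samples, k=2.0):
    """Large-sample normal estimate: U_mean = k * sigma / sqrt(N)."""
    return k * stdev(samples) / math.sqrt(len(samples))

def mean_uncertainty_bootstrap(samples, n_boot=5000, seed=0):
    """Percentile bootstrap: half-width of the 95% interval of resampled means."""
    rng = random.Random(seed)
    n = len(samples)
    boot = sorted(mean(rng.choices(samples, k=n)) for _ in range(n_boot))
    lo, hi = boot[int(0.025 * n_boot)], boot[int(0.975 * n_boot)]
    return (hi - lo) / 2.0

# Illustrative signed errors (%D) for N = 10 codes
errors = [2.1, -1.3, 0.8, 3.0, -0.5, 1.7, 2.4, -2.0, 1.1, 0.6]
u_norm = mean_uncertainty_normal(errors)
u_boot = mean_uncertainty_bootstrap(errors)
```

For a sample this size the two estimates typically agree to within a few tenths of a percent of *D*, which is the kind of negligible difference the text reports.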

The comparisons show negligible differences and justify the present use of mean values and (large sample) normal and folded normal statistics.

Figures 2(e) and 2(f) show only results using mean/normal and mean/folded-normal distributions for signed and absolute error, respectively. Figure 2(e) uses previous certification method nomenclature, whereas Fig. 2(f) uses present SoA assessment nomenclature. The mean absolute error clearly better represents the CFD prediction capability than the signed error, which due to sign cancelations suggests much smaller error on average. Also, clearly shown is the reduced uncertainty for the mean and individual codes/simulations for the absolute versus signed error.

Table 2 summarizes the SoA assessment results for resistance, sinkage, and trim. There are no outliers for any of the three variables. At the solution level, *E _{i}* shows 60% over-, 100% under-, and 55% overpredictions among all submissions for *C _{T}*, $\sigma/L$, and *τ*, respectively. Three (resistance), five (sinkage), and nine (trim) codes/simulations are validated at $U_{V_i}$ = *U _{D}* = 2, 8, and 14%*D* for *C _{T}*, $\sigma/L$, and *τ*, respectively. The code level scatter $\sigma_E$ = 3, 3, and 6%*D* for *C _{T}*, $\sigma/L$, and *τ*, respectively. At the code level, the mean code $\bar{E}$ shows over-, under-, and overpredictions with $\overline{|E|}$ = 2, 7, and 5%*D* for *C _{T}*, $\sigma/L$, and *τ*, respectively. $U_{SoA_i}$ = 3, 10, and 15%*D* for *C _{T}*, $\sigma/L$, and *τ*, respectively. Thus, most codes/simulations are N-version validated at the $U_{SoA_i}$ intervals. The mean code is (or is not) N-version validated at $U_{SoA}$ = 2, 8, and 14%*D* for *C _{T}*, sinkage, and trim, respectively.

For calm water *C _{T}*, $\sigma/L$, and *τ*, the code/simulation scatter and errors are comparable in magnitude and relatively small (≤6%*D*) such that reducing the intervals of solution and N-version validation requires reducing the errors and experimental uncertainties for $\sigma/L$ and *τ*.

Table 3 compares calm water resistance V&V and SoA assessment results from the last four Ship Hydrodynamics CFD Workshops. The number of submissions is similar. The grid numbers have increased from 1 to 10 M, which has somewhat reduced both *U*_{SN} (not shown here) and the scatter from 5 to 3%*D*. The error has reduced from 4 to 2%*D*, which may be both due to use of finer grids and improved turbulence and free surface modeling.

Figure 3 shows head wave results, presenting on the left: AR and motions $\bar{S}$ with and without outliers, *S _{i}* with outliers identified, and *D* versus *λ*/*L*; and in the middle and on the right: individual and mean code absolute errors and SoA uncertainties and *U _{D}* versus *λ*/*L* with (middle) and without (right) outliers. Table 4 summarizes the SoA assessment results for AR, *z*_{1}/*A*, and *θ*_{1}/*Ak*. Results are averaged over *λ*/*L* = 0.65, 0.85, 1.15, 1.37, and 1.95 and expressed as %*D*.

The solutions show large scatter for all variables, especially AR, i.e., $\sigma_E$ = 34, 11, and 12%*D* for AR, heave, and pitch, respectively. The average signed error closely follows the experimental trends, which supports the perspective that the scatter in solutions represents the random component of the CFD uncertainty. Outliers are found/rejected for all variables, especially the heave and pitch first harmonic phases. $\bar{E}$ is relatively small, 13, 6, and 17%*D*, whereas $\overline{|E|}$ = 25, 9, and 19%*D* for AR, *z*_{1}/*A*, and *θ*_{1}/*Ak*, respectively. The phase errors are small, ≤6%2*π*. Four and five codes/simulations are validated for AR and the first harmonic amplitudes of heave and pitch at $U_{V_i}$ = *U _{D}* = 13, 9, and 14%*D*, respectively. None and one code/simulation are validated for heave and pitch phases at 2%2*π* uncertainty. Seven codes/simulations are N-version validated for AR at 53%*D* uncertainty, and seven and six codes/simulations are N-version validated for the first harmonic amplitudes of heave and pitch at 18 and 26%*D* uncertainty, respectively. Five and four codes/simulations are N-version validated for the heave and pitch first harmonic phases, respectively, at 5%*D* uncertainty. The mean code is only N-version validated for the first harmonic amplitude of heave at 10%*D* uncertainty. The errors and uncertainties are largest for resonance and short-wave conditions.

The error for AR and motions averaged over all wavelength conditions is 11 and two times larger, respectively, than the error for calm water resistance and motions. The experimental uncertainty for AR is seven times that for resistance, whereas the head wave and calm water motions *U _{D}* values are similar. The CFD scatter is small ($\sigma_E$ ≤ 6%*D*) for calm water, whereas it is much larger for head waves, i.e., 11 and 34%*D* for first- (amplitudes) and second-order terms, respectively. The SoA uncertainty is dominated by experimental uncertainties for calm water motions and by CFD scatter for head waves AR. Most codes/simulations are solution and N-version validated for calm water sinkage and trim, whereas for resistance, only three codes/simulations are solution validated and most codes/simulations are N-version validated.

For head waves, four and five codes/simulations are solution validated for AR and motions, respectively, whereas seven are N-version validated for both AR and motions. The mean code is mostly not N-version validated for head wave conditions. Improvements for calm water require reduced errors and experimental uncertainties for sinkage and trim. Improvements for head waves require reduction in CFD scatter, errors, and experimental uncertainties.

### Test Case 1d-2p From Second Ship Maneuvering Prediction Workshop (SIMMAN 2014).

Test case 1d-2p conducted V&V and SoA assessment for the KVLCC2 tanker at Fr = 0.142 and Re = 7.2 × 10^{6} for the zigzag 20/20 port side calm water maneuver. Figure 1(b) shows the KVLCC2 hull form and coordinate system. KVLCC2 has a horn-type rudder and a four-blade clockwise-rotating propeller. MARIN [40] provided the experimental data and uncertainties. The validation variables are the first overshoot angle $\alpha_{20}^{1}$, period $P$, initial turning ability $\ell_{20}$, yaw rate at the first overshoot angle $r_{\alpha_{20}^{1}}$, overshoot time $T_{\alpha_{20}^{1}}$, and second overshoot angle $\alpha_{20}^{2}$, as defined in Fig. 4. The KVLCC2 mass properties and Fr and Re were specified along with the rudder controller algorithm and the maximum rudder rate and angle. The test case was for model scale with constant propeller revolutions per second based on the self-propulsion value at the specified Fr and Re. The International Maritime Organization (IMO) requires that the zigzag 20/20 first overshoot angle be <25 deg [42].

As shown in Table 5, eight institutes from six countries submitted three empirical (EMP), two whole ship model (WHS), five modular model (maneuvering model group (MMG)), and two CFD codes. CFD refers to six degrees-of-freedom (6DOF) simulations of the trajectories using a rudder controller algorithm, maximum rudder rate and angle, and propeller models. One used a body force propeller and the other used the actual rotating propeller. The other codes are 3DOF system-based codes, which use mathematical models with maneuvering coefficients obtained from empirical formulas, planar motion mechanism (PMM) experiments or CFD, and circular motion test (CMT) experiments.

Here, again numerical uncertainties are not available. Presumably, for system-based methods, *U*_{SN} is negligible. The situation for CFD numerical uncertainties is similar to that discussed previously.

Table 6 has a similar format as Tables 2–4. Figure 5 shows the zigzag 20/20 experimental and solution trajectories for rudder angle, heading, yaw rate, drift angle, and roll. Similarly, as for Fig. 3 (left column), the solutions show large scatter with the average signed error closely following the experimental trends.

The solutions show large scatter for all variables, i.e., average $\sigma_E$ = 21%*D*. The average $\bar{E}$ is small, 8%*D*, whereas the average $\overline{|E|}$ is fairly large, 17%*D*, and more representative of the actual accuracy of the predictions. $\sigma_{|E|}$ is 14%*D*; therefore, the error is larger than the absolute error scatter. The errors are comparable for CFD and MMG using CMT maneuvering coefficients (13%*D*) and increase for EMP (16%*D*), MMG using PMM maneuvering coefficients (17%*D*), and WHS (25%*D*). These rankings are not firm, as can be seen from the analysis of the same test case from the earlier workshop SIMMAN 2008 presented in the last column of Table 6. In SIMMAN 2008, the errors were smallest (9%*D*) for CFD, comparable (13%*D*) for WHS and MMG (both PMM and CMT), and largest (18%*D*) for EMP. This is because there are too few submissions in each group and the differences are within the uncertainty; nonetheless, they give an indication of subgroup capabilities. Large errors for the actual versus body force propeller are unexpected. About 40% of the submissions (5 out of 12 codes/simulations) are validated at $U_D$ = 11%*D* ($U_D = U_{V_i}$ with $U_{SN_i}$ = 0) and 75% (9 out of 12 codes/simulations) are N-version validated at $U_{SoA_i}$ = 30%*D*. $U_{SoA}$ = 14%*D* such that the mean code is mostly not N-version validated. Improvements for the zigzag 20/20 deg maneuver require a reduction in code/simulation scatter and errors and experimental uncertainties. The magnitudes of the scatter and errors are between those for head waves AR and motions. The experiments and all code predictions satisfied the IMO requirement, i.e., the first overshoot angle in the zigzag 20/20 deg being <25 deg.

## Conclusions

Multiple code/simulation SoA assessment provides more robust confidence levels/intervals than individual code solution V&V since it includes CFD scatter, i.e., random uncertainty and not just deterministic solution verification uncertainty. Comparison errors alone are insufficient to determine CFD capability with confidence, which would require consideration of error confidence levels/intervals. Validation compares CFD errors with the root-sum-square of the CFD numerical and experimental (systematic and random) uncertainties, but lacks the random component of the CFD uncertainty.

Evaluating accuracy by *E _{i}* without considering experimental and computational uncertainties excludes the possibility of establishing confidence levels/intervals and therefore the actual CFD simulation capability, whereas including uncertainties enables the evaluation of both comparison error and its confidence level/interval. Decomposing the uncertainties into systematic and random components for both experiments and simulations provides robust uncertainty estimates and identifies where to focus resources for reduction in intervals of validation/certification.

The Seventh CFD Ship Hydrodynamics Workshop example has provided fairly robust estimates of CFD capability for both calm water and head waves conditions, including confidence levels/intervals. The SoA assessment intervals are relatively small for calm water resistance (3.4%*D*), sinkage (9.6%*D*), and trim (15%*D*), whereas they are large for head wave motions (22%*D*) and very large for AR (53%*D*). Some and most codes are solution and N-version validated, respectively, for calm water resistance, sinkage, and trim, whereas about half and most are solution and N-version validated for head waves AR and motions. For calm water, reduction in solution and N-version validation intervals requires reduction in errors and experimental uncertainties for sinkage and trim. For head waves, reduction in solution and N-version validation intervals requires reduction in CFD scatter and *U _{D}* for motions. Errors need reduction for calm water motions and head waves AR and motions. Limitations of the current estimates are the relatively small sample size of only ten CFD codes/simulations, the lack of verification for most codes/simulations, and the need for experimental data and uncertainties from multiple facilities. Hopefully, the current SoA estimates will motivate the Eighth CFD Ship Hydrodynamics Workshop (to be held in 2020) to attract more submissions, require verification for all test cases (which will likely reduce CFD scatter), and provide more benchmark experiments from more facilities with reduced uncertainties.

The Second Ship Maneuvering Prediction Workshop example has also provided fairly robust estimates of the system-based and CFD prediction capability for zigzag 20/20 conditions, including confidence levels/intervals. The solution and N-version validation intervals are large, i.e., 11 and 30%*D*, respectively. Reduction in solution and N-version validation intervals requires reduction in code/simulation scatter. Errors are also large, i.e., 17%*D*. Limitations of the current estimates are similar to those for the CFD Workshop. Hopefully, the current SoA estimates will motivate the Third Ship Maneuvering Prediction Workshop (to be held in 2019) to attract more submissions and require verification for all test cases.

Comparing the results for calm water resistance, sinkage, and trim, ship motions, and maneuvering conditions shows that the experimental uncertainties, code/simulation scatter, and errors are often small for calm water and increase with increasing complexity of the conditions. Assuming a programmatic requirement of *U*_{req} = 10%*D* with reference to Eq. (36), resistance meets the case 1 condition and sinkage and trim meet the case 2 condition, whereas AR, heave and pitch, and the zigzag 20/20 maneuver meet the case 3 condition. Thus, N-version validation is achieved in all cases except resistance, whereas the programmatic requirement is only achieved for resistance.

The AIAA and Ship Hydrodynamics and Maneuvering Workshops differ in complexity, base test case, and focus. The base test case for ship hydrodynamics and maneuvering inherently includes the complexity of free surface waves, 2DOF/6DOF motions, and bluff body/rudder/propeller flows, with the focus on V&V of CFD capability and solution scatter. The base test case for AIAA is relatively less complex, with low-Mach number compressible flow and a fixed slender body, with the focus on CFD verification and solution scatter. Both workshops have emphasized verification studies, but at least for ship hydrodynamics not for all test cases. The present results clearly show the importance of requiring verification for all test cases, which will provide more robust solution and N-version confidence levels/intervals. It is hard to compare the verification results between the two workshops since the ship workshops estimate numerical uncertainty, whereas the AIAA workshops evaluate/discuss verification variables versus *N*_{PTS}^{−2/3} (number of grid points). In both cases, it is apparent that the solutions are far from the asymptotic range and even finer grids are required. Interestingly, the scatter in the CFD codes/simulations for resistance is similar for both workshops, i.e., for the most recent AIAA workshop, the standard deviation is 2.1% of the mean value based on 44 CFD submissions, whereas the comparable value for ship hydrodynamics is 2.8% based on ten CFD submissions.

## Acknowledgment

The Office of Naval Research supports the first four authors' research under grants administered by Dr. Thomas Fu, Dr. Woei-Min Lin, and Dr. Ki-Han Kim. The associate editor's and reviewer's comments were helpful in clarifying the presentation of our perspective.

## Funding Data

- Office of Naval Research (Multiple grants).