## Introduction

The Technical Brief by Roache [(1)] raises ten discussion items concerning our factor of safety (FS) method for solution verification [(2)]. Our responses are listed below item by item, using the same numbering as Roache. The nomenclature mostly follows our own rather than Roache’s: we use $p_{RE}$, the order of accuracy calculated using Richardson extrapolation, instead of the observed order of convergence, and we refer to the GCI and GCI2 methods instead of the GCI0 and the real GCI methods. However, we agree with Roache in using $FS$ for the factor of safety in all the verification methods. In response to item (10), we have used our approach to evaluate two new variants of the GCI method and one new variant of the FS method.

## Response

(1) The GCI and FS methods can be written in the following general form:
$$U_{GCI/FS}=FS\left|\frac{\varepsilon_{21}}{r^{p}-1}\right| \tag{1}$$
The FS method is substantially different from, and not a variant of, the GCI method. In the FS method we use $P=p_{RE}/p_{th}$ to determine $FS$ and always use $p=p_{RE}$. Only the FS method, compared with different variants of the GCI method, provides a reliability $R$ larger than 95% and a lower confidence limit (LCL) greater than or equal to 1.2 at the 95% confidence level for the true mean of the parent population of the actual factor of safety. This conclusion is true for different studies, variables, ranges of $P$ values, and single $P$ values where multiple actual factors of safety are available. $FS$ is a smooth linear function of $P$ and has no jumps.
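As an illustration of the general form in Eq. (1), the following minimal sketch computes $p_{RE}$, $P$, the Richardson extrapolation error estimate, and the resulting uncertainty for one grid triplet. It assumes a constant refinement ratio $r$ and monotonic convergence; the function names and argument order are hypothetical and are not taken from Ref. [2].

```python
import math


def richardson_estimate(s1, s2, s3, r, p_th):
    """Return (p_re, P, delta_re) for a fine/medium/coarse triplet (s1, s2, s3).

    Assumes a constant grid refinement ratio r > 1 and monotonic convergence.
    """
    eps21 = s2 - s1  # solution change between medium and fine grids
    eps32 = s3 - s2  # solution change between coarse and medium grids
    if eps21 == 0.0 or eps32 / eps21 <= 1.0:
        raise ValueError("triplet is not monotonically convergent")
    p_re = math.log(eps32 / eps21) / math.log(r)
    delta_re = eps21 / (r ** p_re - 1.0)
    return p_re, p_re / p_th, delta_re


def uncertainty(fs, eps21, r, p):
    """General form of Eq. (1): U = FS * |eps21 / (r**p - 1)|."""
    return fs * abs(eps21 / (r ** p - 1.0))
```

For a manufactured sequence $S=1+0.01h^2$ with $h=1,2,4$ and $r=2$, the sketch recovers $p_{RE}\approx 2$ and an error estimate of about 0.01; which $FS$ and $p$ to pass to `uncertainty` is exactly the choice that distinguishes the methods discussed here.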

There are a few variants of the GCI method. We have used the definition of the GCI method that is arguably the most common version/interpretation applied in the literature [(3,4,5)]. The GCI1 method was proposed by Logan and Nitta [(6)]. The guideline for the GCI2 method, which Roache now refers to as the real GCI method, was communicated to us by Roache [(7)] in his criticisms of an earlier version of our FS method [(8)]; the private communication is available to the public upon request. Roache’s most recent book [(9)] does point out that the choice of $FS$ and $p$ requires user judgment calls; however, no single guideline was provided. The lack of a single guideline clearly has caused considerable confusion. We take no responsibility for this confusion. Statistical analysis showed that none of the GCI variants achieves $R>95\%$ and $LCL>1.2$ for different studies, variables, ranges of $P$ values, and single $P$ values where multiple actual factors of safety are available. As a result, using these GCI variants carries high risk in certain circumstances, especially for $P>1$. Except for the original GCI method, all variants of the GCI method have jumps in $FS$ versus $P$.

Our purpose is not to add to this confusion but rather to evaluate the performance of the outcomes of selecting any of these variants of the GCI method and compare with the FS method using our approach. The correction factor and $pRE$ are used to define the GCI2 and other verification methods as defined by Eqs. (10)–(15) in Ref. [2] in order to compare their relative conservativeness using the same error estimate $δRE$.

(2) We disagree with Roache’s referring to the GCI as the GCI0 method and to the GCI2 as the GCI method, for the reasons given in item (1). The lack of a single guideline for selecting $FS$ and $p$, and for when to use which variant of the GCI method, is highlighted by Roache’s current discussion. Roache accepts using $p_{RE}$ when it is within a 5% difference of $p_{th}$ in item (2), whereas later, in item (10), Roache considers two other judgment calls as reasonable.

The GCI2 method discards the “coarse” grid solution in the uncertainty estimate when $P>1$, which is difficult to justify. For example, four grid solutions from the coarsest grid 4 to the finest grid 1 can build two grid triplet studies, (1, 2, 3) and (2, 3, 4). Grid convergence studies for industrial applications often show the oscillation of $pRE$ such that (1, 2, 3) could estimate $P>1$ but (2, 3, 4) could estimate $P<1$. Based on the GCI2 method, $S3$ should be discarded in the uncertainty estimate for (1, 2, 3) but not for (2, 3, 4). Of course, we agree that ideally one would conduct additional grid triplet studies until the solution is at or as close as possible to the asymptotic range; however, clearly this is not always possible especially for industrial applications [(10)].

We agree that a grid-triplet study with $P=0.08$ is not desirable. However, it is not uncommon for solution verification studies (e.g., local $pRE$ ranges from 0.012 to 8.47 in Ref. [4]). Additionally, Roache’s criticism of using $P=0.08$ is inconsistent with one of his previous conclusions that there is no necessity to discard results with $pRE<1$ ($P<0.5$ for a second order method in Ref. [11]).

(3) The fact that “the use of the GCI1 method is closer to a 68% than a 95% confidence level” was one of the conclusions of Logan and Nitta [(6)]. This conclusion was not based solely on the dataset with an intentional choice of grid studies with oscillations in both the exponent $p$ and the output quantity. As stated on page 367 of Ref. [6], “However, for our contrived and mechanics example $NS=18$ sets (most of which were non-smooth), the use of GCI = 1.25 is much closer to a 68% confidence estimate than 95%.” It was also recommended in Refs. [6] and [2] that a sample with a number of grid convergence studies much larger than 100 is needed to draw general conclusions.

We did not recommend the GCI1 method but rather evaluated it using much larger sample sizes than Ref. [6]. For the largest sample 3 with size $N=329$, the reliability $R$ (Eq. (19) in Ref. [2]) is 90.3% for the GCI1 method.

(4) We disagree with Roache’s evaluation in Ref. [11] where it states that “Briefly, the net result is 14 NC (nonconservative) of 176 entries, or 8.0%.” Only 151 of the 176 grid triplet studies have the actual error $E$. This results in 24 nonconservative of 151 (note there are nine nonconservative grid-triplet studies that estimate $U=E$). So, the reliability for the GCI method [(12)] is actually 84.1%, which agrees very well with the reliability 83.9% estimated using our 329 grid-triplet studies (sample 3 in Ref. [2]).
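The percentages quoted in this item can be checked with a few lines of arithmetic. The helper name below is hypothetical; the counts are the ones stated in the text.

```python
def reliability_percent(n_total, n_nonconservative):
    """Percentage of grid-triplet studies whose uncertainty bounds the error."""
    return 100.0 * (n_total - n_nonconservative) / n_total


# Our count: 24 nonconservative of the 151 studies with a known actual error.
print(round(reliability_percent(151, 24), 1))  # 84.1

# Count quoted from Ref. [11]: 14 nonconservative of 176 entries.
print(round(100.0 * 14 / 176, 1))  # 8.0
```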

Based on our own evaluation above and the fact that Cadafalch et al. [(12)] used $FS=1.25$ for $P>1$, the method they applied was not the GCI2 method and more likely the GCI method. The claim of “an original and reasonable variant of the real GCI” [(1)] again is confusing.

(5) We take 95% coverage as the common uncertainty target for both experiments and computations [(5)]. Although the GCI2 method only misses the overall reliability by 0.8% for sample 3, more importantly it fails to provide sufficient conservatism for other samples including the reliabilities of 91.4%, 90%, and 87.5% for samples 5, 8, and 16, respectively [(2)]. It is possible that another dataset could slightly change our evaluations. Nonetheless, the current sample size is large and the range of $P$ values is wide such that a further increase of the number of samples is not likely to significantly alter the FS method and its results.

(6) The FS method was calibrated/validated against the available dataset. Note that calibration/validation requires that the true error can be evaluated, i.e., the solution numerical benchmark ($SNB$) or solution analytical benchmark ($SAB$) is known. We welcome additional validation of the FS method and if necessary re-calibration and improvement, but again $SNB$ or $SAB$ must be known. The claim of Roache and others of the 95% reliability for the GCI method is undocumented and based on anecdotal information. We doubt that $SNB$ or $SAB$ is available for many of the cases cited by Roache and others. It should be a simple matter to provide proper documentation.

Note that the FS method is more conservative than the GCI2 method except for a range of $P$ just above 1, owing to the jump of the factor of safety at $P=1$ for the GCI2 method. If the FS method is not conservative enough for another dataset, the GCI2 method will likely be worse.

The claim that the GCI2 method has been stable for over 12 years is not well founded. Due to the lack of a single guideline on the choice of $FS$ and $p$, different variants of the GCI method have been used by different users based on their own judgment calls. For example, Cadafalch et al. [(12)] did not use the GCI2 method, and Logan and Nitta [(6)] used the GCI1 method. Furthermore, the GCI method may have been applied to O(1000) cases but no statistical evidence for reliability has been documented.

(7) We disagree with Roache’s suggestion that the FS method has problems in predicting monotonic convergence for fine grids. The uncertainty estimates in Table 6 for the FS method in Ref. [2] for the three finest grid triplets are not monotonically decreasing since $P$ shows large oscillations, and the factor of safety for the second finest grid triplet (2, 3, 4) at $P=1.49$ is much larger than that for the other methods evaluated at the same P. However, the larger factor of safety is required to ensure the reliability for $P>1$. For the three grid triplets discussed, it is interesting to evaluate the convergence ratio $R$ for the fine grid solution $S1$ ($RS1$), $P$ ($RP$), and $UG$ ($RUG$). All the five verification methods have the same $RS1$ and $RP$, which show monotonic convergence. The GCI, GCI1, and CF methods show monotonic convergence for UG, whereas the GCI2 and FS methods show monotonic divergence ($RUG=2.74$) and oscillatory divergence ($RUG=-4.53$), respectively.
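The convergence conditions named above can be sketched as a simple classification of the convergence ratio. The thresholds below are an assumption based on the usual convention (ratio of the change on the finer pair to the change on the coarser pair); they are consistent with the labels attached to $2.74$ and $-4.53$ in the text, and the function name is hypothetical.

```python
def classify_convergence(ratio):
    """Classify a convergence ratio using the assumed conventional thresholds."""
    if 0.0 < ratio < 1.0:
        return "monotonic convergence"
    if -1.0 < ratio < 0.0:
        return "oscillatory convergence"
    if ratio > 1.0:
        return "monotonic divergence"
    if ratio < -1.0:
        return "oscillatory divergence"
    return "indeterminate"  # ratio is exactly 0, 1, or -1


for value in (0.5, -0.3, 2.74, -4.53):
    print(value, classify_convergence(value))
```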

The oscillation of $P$ may be caused by many factors. Grid 4 is still too coarse for the solution to be in the asymptotic range. Additionally, reducing the iterative error to machine zero is very difficult for large-scale computations. With the small grid refinement ratio $r=\sqrt[4]{2}$, solution changes $\varepsilon$ will be small, and the sensitivity to grid spacing and time step may be difficult to identify compared with iterative errors $U_I$. As shown in Fig. 6(b) in Ref. [10], $U_{I,1}/\varepsilon_{12}=61.6\%$ for the cases in Table 6 [(2)]. When $r$ increases, $U_I/\varepsilon$ will likely decrease. For example, the grid uncertainty decreases from 5.04 for (2, 4, 6) to 4.02 for (1, 3, 5) with $U_{I,1}/\varepsilon_{13}=20\%$ for $r=\sqrt{2}$. However, it should be noted that a large $r$ may be problematic, too, as different grids may resolve different flow physics.

There are other cases in which the GCI, GCI1, GCI2, CF, and FS methods show non-monotonic convergence for multiple grid-triplet studies, including the “well-behaved” problems that Cadafalch et al. [(12)] and Roache [(11)] used to evaluate the conservativeness of the GCI method. For the radial velocity using the SMART scheme in the study of a premixed methane/air laminar flat flame on a perforated burner [(13,14,15)], the uncertainty estimates using the FS and GCI2 methods decreased monotonically as the grid was refined, whereas those of the other three methods did not. Another example is the uncertainty estimates for temperature at a monitored location for two-dimensional natural convection in square cavities at $Ra=10^6$, which had five grid-triplet studies with $r=2$ [(16)]. Uncertainty estimates using the five verification methods discussed in Ref. [2] first decreased monotonically as the grid was refined but suddenly increased for the finest grid triplet. Thus, it is unreasonable to blame the FS method for such behavior.

The verification results for our industrial application example are far from the asymptotic range. Although we evaluated the convergence characteristics for the 98 verification variables using $P$ and $|E|$ as functions of $Δxfine/Δxfinest$ [(2)], a standard criterion for achieving the asymptotic range is still lacking. A possible criterion is that monotonic convergence should be established based on evaluation of the convergence ratio $R$ for fine grid solution $S1$ (towards $SC$), $P$ (towards 1), and $U$ (monotonically decreasing) for multiple (at least three) grid-triplets with the same grid refinement ratio $r$ and $UI≪U$. In some cases, oscillatory convergence may be acceptable; however, this would require many grid triplets [(17)]. Although $R$ still needs to be evaluated for all the variables in our dataset, 41.5% of the variables that have more than two grid-triplet studies do show that $S1$ approaches $SC$, $P$ approaches 1, and $U$ monotonically decreases as the grid is refined. For the other 58.5% of the variables, $S1$ also approaches $SC$ as shown by monotonically decreased error magnitude $|E|$, but $P$ and $UG$ often show mixed convergence conditions as the grid is refined.

(8) As discussed in item (6), without statistical evidence, the claim of the conservativeness of the GCI2 method is undocumented. Furthermore, we doubt that many of the applications have $SNB$ or $SAB$ available. If they do, we will be glad to add them to our dataset. The work by Dr. C. J. Freitas and his group is not publicly available [(9)]. Therefore, the claim of achieving the 95% reliability is again undocumented and based on anecdotal information.

(9) We agree that the actual factor of safety is undefined when a solution not in the asymptotic range happens to predict the true value. If this happens, it should be excluded from the dataset used to derive the FS method. However, monotonic convergence ensures that the uncertainty estimate is always greater than zero so that a zero error will be automatically bounded by the uncertainty.

The contrived example created by Roache only proves that the average actual factor of safety ($X¯$) cannot be used alone to determine if a solution verification method is conservative enough. But it can be used to determine the relative conservativeness between different verification methods.

It should be noted that we used both the reliability $R$ and LCL as defined by Eq. (22) in Ref. [2] to develop the FS method and determine if a method is conservative enough. Larger $X¯$ does not necessarily mean larger $R$ (readers can refer to sample 6 in Ref. [2]).

(10) As requested by Roache, we use our approach to evaluate two new variants of the GCI method proposed by Oberkampf and Roy [(18)] (GCIOR) and by Roache [(1)] (GCI3).
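$$U_{GCIOR}=\begin{cases}1.25\left|\dfrac{\varepsilon_{21}}{r^{p_{th}}-1}\right|, & 0.9\le P\le 1.1\\[6pt] 3\left|\dfrac{\varepsilon_{21}}{r^{\min(\max(0.5,\,p_{RE}),\,p_{th})}-1}\right|, & 0<P<0.9 \text{ or } P>1.1\end{cases} \tag{2}$$

$$U_{GCI3}=\begin{cases}1.25\left|\dfrac{\varepsilon_{21}}{r^{\min(p_{RE},\,p_{th})}-1}\right|, & 0.9\le P\le 1.1\\[6pt] 3\left|\dfrac{\varepsilon_{21}}{r^{\min(p_{RE},\,p_{th})}-1}\right|, & 0<P<0.9 \text{ or } P>1.1\end{cases} \tag{3}$$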
To address Roache’s concern of using $pRE$ when $pRE>>pth$, we also evaluate an alternative form of the FS method (FS1 method). The FS1 method is the same as the FS method for $P<1$ but uses $pth$ instead of $pRE$ in the error estimate for $P>1$. Thus, Eq. (14) in Ref. [2] becomes
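$$U_{FS1}=\begin{cases}\left[FS_1P+FS_0(1-P)\right]\left|\dfrac{\varepsilon_{21}}{r^{p_{RE}}-1}\right|, & 0<P\le 1\\[6pt] \left[FS_1P+FS_2(P-1)\right]\left|\dfrac{\varepsilon_{21}}{r^{p_{th}}-1}\right|, & P>1\end{cases} \tag{4}$$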
Following the same procedure described in Sec. 2.4 of Ref. [2], $FS0=2.45$, $FS1=1.6$, and $FS2=6.9$ are recommended, and the final form of the FS1 method is
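$$U_{FS1}=\begin{cases}(2.45-0.85P)\left|\dfrac{\varepsilon_{21}}{r^{p_{RE}}-1}\right|, & 0<P\le 1\\[6pt] (8.5P-6.9)\left|\dfrac{\varepsilon_{21}}{r^{p_{th}}-1}\right|, & P>1\end{cases} \tag{5}$$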
To compare the relative conservativeness between different verification methods, the three new methods are rewritten in terms of the same error estimate $δRE$.
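$$U_{GCIOR}=\begin{cases}1.25\,CF\,|\delta_{RE}|, & 0.9\le P\le 1.1\\[6pt] 3\,\dfrac{r^{p_{RE}}-1}{r^{\min(\max(0.5,\,p_{RE}),\,p_{th})}-1}\,|\delta_{RE}|, & 0<P<0.9 \text{ or } P>1.1\end{cases} \tag{6}$$

$$U_{GCI3}=\begin{cases}1.25\,\dfrac{r^{p_{RE}}-1}{r^{\min(p_{RE},\,p_{th})}-1}\,|\delta_{RE}|, & 0.9\le P\le 1.1\\[6pt] 3\,\dfrac{r^{p_{RE}}-1}{r^{\min(p_{RE},\,p_{th})}-1}\,|\delta_{RE}|, & 0<P<0.9 \text{ or } P>1.1\end{cases} \tag{7}$$

$$U_{FS1}=\begin{cases}(2.45-0.85P)\,|\delta_{RE}|, & 0<P\le 1\\[6pt] (8.5P-6.9)\,CF\,|\delta_{RE}|, & P>1\end{cases} \tag{8}$$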
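The effective factors of safety multiplying $|\delta_{RE}|$ for the three new methods can be sketched as below. This is a minimal illustration, not the authors' code. It assumes $\delta_{RE}=\varepsilon_{21}/(r^{p_{RE}}-1)$ and $CF=(r^{p_{RE}}-1)/(r^{p_{th}}-1)$, and it assumes that the $P>1$ branch of the FS1 method takes the form $[FS_1P+FS_2(P-1)]\,CF$ with the recommended $FS_1=1.6$ and $FS_2=6.9$. All function names and the defaults $r=2$, $p_{th}=2$ are hypothetical.

```python
def correction_factor(P, r=2.0, p_th=2.0):
    """CF = (r**p_re - 1) / (r**p_th - 1), with p_re = P * p_th (assumed form)."""
    return (r ** (P * p_th) - 1.0) / (r ** p_th - 1.0)


def fs_gcior(P, r=2.0, p_th=2.0):
    """Effective factor of safety of the GCIOR method relative to |delta_RE|."""
    p_re = P * p_th
    if 0.9 <= P <= 1.1:
        return 1.25 * correction_factor(P, r, p_th)
    p_used = min(max(0.5, p_re), p_th)  # observed order limited to [0.5, p_th]
    return 3.0 * (r ** p_re - 1.0) / (r ** p_used - 1.0)


def fs_gci3(P, r=2.0, p_th=2.0):
    """Effective factor of safety of the GCI3 method relative to |delta_RE|."""
    p_re = P * p_th
    p_used = min(p_re, p_th)
    factor = 1.25 if 0.9 <= P <= 1.1 else 3.0
    return factor * (r ** p_re - 1.0) / (r ** p_used - 1.0)


def fs_fs1(P, r=2.0, p_th=2.0):
    """Effective factor of safety of the FS1 method (P > 1 branch is assumed)."""
    if P <= 1.0:
        return 2.45 - 0.85 * P
    return (8.5 * P - 6.9) * correction_factor(P, r, p_th)
```

Under these assumptions, `fs_gci3` jumps by a factor of $3/1.25=2.4$ as $P$ crosses 1.1, and `fs_fs1(1.235)` evaluates to roughly 5.44 for $r=2$ and $p_{th}=2$.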
The factors of safety for all the verification methods discussed so far are shown in Fig. 1. One problem with the GCI2 method is the jump in the factor of safety across the asymptotic range at $P=1$. For two grid-triplet studies, one at $P=0.999$ and the other at $P=1.001$, the factor of safety suddenly increases from 1.25 to 3 even though $P$ varies by only about 0.2%. Eça et al. [(19)] gave similar comments on this issue: “However, it is not easy ‘to accept’ a jump of a factor of 2.4 in the uncertainty when the observed order of accuracy may vary by only 0.1.” Similar problems exist for the GCIOR and GCI3 methods when $p_{RE}$ differs from $p_{th}$ by 10%. It should be noted that the GCIOR method sets the lower limit of $p_{RE}$ to 0.5, which corresponds to $P≥0.25$ for a nominally second-order method. Thus, the factor of safety for $P<0.25$ for the GCIOR method shown in Fig. 1 is only a result of the mathematical reformulation. Figure 1 also shows that the GCIOR and GCI3 methods are much more conservative than the other methods for $0.25<P<0.9$ and coincide with the GCI2 method for $P>1.1$. The FS1 method is less conservative than the FS method for $1<P<1.235$ and more conservative for $P>1.235$.

The GCIOR, GCI3, and FS1 methods are evaluated using statistical analysis of the 25 samples following Ref. [2], with focus on samples 3 to 25. Table 1 shows the statistics for samples 3 to 8 [(2)] based on six different $P$ ranges for the three new methods. The FS1 method has the same reliability as the FS method for samples 3 to 8. The GCIOR and GCI3 methods almost have the same reliability, but the GCI3 method is a little more conservative. Compared to the GCI2 method, the GCIOR and GCI3 methods improve the reliability for $P<1$ to be larger than 95% but are not conservative enough for $P≥1$, especially near the asymptotic range. Examination of 18.2% of the data for $1.1≤P<2.0$, which cover samples 7 and 8, shows that only the FS and FS1 methods achieve 95% reliability, but the GCIOR and GCI3 methods achieve only 90%. The largest $X¯$ for samples 3-5, sample 6, sample 7, and sample 8 are the GCI3, GCI2, FS, and FS1 methods, respectively. For all the verification methods, the LCLs are larger than 1.2 for all the $P$ ranges.

Table 2 shows the statistics at the seventeen $P$ values (samples 9 to 25) ranging from 0.705 to 1.205. For samples 9 to 19 ($P<0.99$), all the verification methods achieve reliabilities larger than 95% except 93.1% for the GCIOR method at $P=0.905$, 87.5% for the three GCI methods at $P=0.955$, and 84.6% for the GCIOR and GCI3 methods at $P=1.105$. The largest $X¯$ for samples 9-12, samples 13-20, samples 21-24, and sample 25 are the GCIOR and GCI3, FS and FS1, GCI2, and FS1 methods, respectively. Only the FS and FS1 methods satisfy the requirement that $LCL>1.2$ for samples 9 to 25. The GCI2 method has $LCL<1.2$ for sample 20; the GCIOR method has $LCL<1.2$ for samples 13, 16, 17, 18, 20, and 22; and the GCI3 method has $LCL<1.2$ for samples 20 and 22.

The actual factor of safety for sample 3, sample 3 averaged using $ΔP=0.01$, and the upper and lower bands of the confidence interval $\bar{X}±tS_{\bar{X}}$ for samples 9 to 25 are shown in Fig. 2. Here $t$ is the factor for the Student’s t-distribution and $S_{\bar{X}}$ is the standard deviation of the mean of the sample, as defined in Ref. [2]. The GCIOR and GCI3 methods do not satisfy $LCL>1.2$ near the asymptotic range. Compared to the FS method (Fig. 4(e) in Ref. [2]), the FS1 method shows a larger actual factor of safety when solutions are farther from the asymptotic range for $P>1$.
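The statistics used throughout this item can be sketched as follows. The sketch assumes that the actual factor of safety of a study is $X=U/|E|$, that a study counts as conservative only when $X>1$ (consistent with item (4), where $U=E$ is counted as nonconservative), and that the confidence band is $\bar{X}±tS_{\bar{X}}$ with the Student-t factor `t` supplied by the user from a table. The function name is hypothetical; the exact definitions are Eqs. (19) and (22) of Ref. [2].

```python
import math
import statistics


def actual_fs_statistics(uncertainties, errors, t):
    """Return (reliability %, mean X, lower limit, upper limit) for one sample."""
    x = [u / abs(e) for u, e in zip(uncertainties, errors)]
    n = len(x)
    mean_x = statistics.mean(x)
    # Standard deviation of the mean: sample standard deviation over sqrt(n).
    s_mean = statistics.stdev(x) / math.sqrt(n)
    reliability = 100.0 * sum(1 for xi in x if xi > 1.0) / n
    return reliability, mean_x, mean_x - t * s_mean, mean_x + t * s_mean
```

A sample can therefore have a large mean $\bar{X}$ and still a low reliability if a few studies have very large $X$ while many fall below 1, which is why both quantities are examined.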

## Concluding Remarks

The choice of $FS$ and $p$ in the GCI method requires user judgment calls, for which no single guideline is currently available. We recommend that a single guideline be provided.

The GCIOR and GCI3 methods have almost the same reliability. But the GCI3 method is a little more conservative. Compared to the GCI2 method, the GCIOR and GCI3 methods improve the reliability for $P<1$. However, they are too conservative for $P<0.9$ using a factor of safety 3 and not conservative enough for $P≥1.1$.

The FS1 and FS methods are the same for $P≤1$. For $p_{th}=2$ and $r=2$, the FS1 method is less conservative than the FS method for $1<P<1.235$ and more conservative for $P>1.235$. As a result, the FS1 method may have an advantage for uncertainty estimates when $P>2$, where the FS and other verification methods likely predict unreasonably small uncertainties due to small error estimates. However, since the current dataset is restricted to $P<2$, the pros and cons of using the FS or FS1 method cannot be validated. Thus, until additional data is available for $P>2$, all verification methods should be used with caution for such conditions and, if possible, additional grid-triplet studies should be conducted to obtain $P<2$.

The authors’ statistical approach based on many analytical and numerical benchmarks provides a robust framework for developing solution verification methods. The authors welcome additional validation of the FS method and, if necessary, re-calibration and improvement using additional rigorous verification studies with $SAB$ or $SNB$ available. More research is needed to establish the criterion for achieving the asymptotic range along with its use in providing high quality numerical benchmarks.

## Acknowledgment

This study was sponsored by the Office of Naval Research under Grant No. N000141-01-00-1-7, administered by Dr. Patrick Purtell.

## References

1. Roache, P. J., 2011, “Discussion: ‘Factors of Safety for Richardson Extrapolation’,” ASME J. Fluids Eng., 133(11), p. 115501.
2. Xing, T., and Stern, F., 2010, “Factors of Safety for Richardson Extrapolation,” ASME J. Fluids Eng., 132(6), p. 061403.
3. Roache, P. J., 1998, Verification and Validation in Computational Science and Engineering, Hermosa, Albuquerque, NM.
4. Celik, I. B., Ghia, U., Roache, P. J., Freitas, C. J., Coleman, H., and Raad, P. E., 2008, “Procedure for Estimation and Reporting of Uncertainty Due to Discretization in CFD Applications,” ASME J. Fluids Eng., 130(7), p. 078001.
5. ASME Committee PTC-61, 2008, “ANSI Standard V&V20,” ASME Guide on Verification and Validation in Computational Fluid Dynamics and Heat Transfer, Nov. 30, 2009.
6. Logan, R. W., and Nitta, C. K., 2006, “Comparing 10 Methods for Solution Verification, and Linking to Model Validation,” J. Aerosp. Comput. Inf. Commun., 3(7), pp. 354–373.
7. Roache, P. J., 2009, private communication.
8. Xing, T., and Stern, F., 2009, “Factors of Safety for Richardson Extrapolation for Industrial Applications,” IIHR Report No. 469.
9. Roache, P. J., 2009, Fundamentals of Verification and Validation, Hermosa, Albuquerque, NM.
10. Xing, T., Carrica, P., and Stern, F., 2008, “Computational Towing Tank Procedures for Single Run Curves of Resistance and Propulsion,” ASME J. Fluids Eng., 130(10), p. 101102.
11. Roache, P. J., 2003, “Conservatism of the Grid Convergence Index in Finite Volume Computations on Steady-State Fluid Flow and Heat Transfer,” ASME J. Fluids Eng., 125(4), pp. 731–732.
12. Cadafalch, J., Pérez-Segarra, C. D., Cònsul, R., and Oliva, A., 2002, “Verification of Finite Volume Computations on Steady-State Fluid Flow and Heat Transfer,” ASME J. Fluids Eng., 124(1), pp. 11–21.
13. Pérez-Segarra, C. D., Oliva, A., Costa, M., and Escanes, F., 1995, “Numerical Experiments in Turbulent Natural and Mixed Convection in Internal Flows,” Int. J. Numer. Methods Heat Fluid Flow, 5(1), pp. 13–33.
14. Sommers, L. M. T., 1994, “Simulation of Flat Flames With Detailed and Reduced Chemical Models,” Ph.D. thesis, Technical University of Eindhoven, Eindhoven, The Netherlands.
15. Soria, M., Cadafalch, J., Cònsul, R., and Oliva, A., 2000, “A Parallel Algorithm for the Detailed Numerical Simulation of Reactive Flows,” Proceedings of the 1999 Parallel Computational Fluid Dynamics Conference, Williamsburg, VA, pp. 389–396.
16. Hortmann, M., Perić, M., and Scheuerer, G., 1990, “Multigrid Benchmark Solutions for Laminar Natural Convection Flows in Square Cavities,” Benchmark Test Cases for Computational Fluid Dynamics, I. Celik and C. J. Freitas, eds., ASME, New York, pp. 1–6.
17. Coleman, H. W., Stern, F., Di Mascio, A., and Campana, E., 2001, “The Problem With Oscillatory Behavior in Grid Convergence Studies,” ASME J. Fluids Eng., 123(2), pp. 438–439.
18. Oberkampf, W. L., and Roy, C. J., 2010, Verification and Validation in Scientific Computing, Cambridge University Press, New York.
19. Eça, L., Vaz, G., and Hoekstra, M., 2010, “Code Verification, Solution Verification and Validation in RANS Solvers,” Proceedings of the ASME 2010 29th International Conference on Ocean, Offshore, and Arctic Engineering (OMAE2010), June 6–11, 2010, Shanghai, China.