## Introduction

The Technical Brief by Roache [(1)] presents ten items of discussion of our factor of safety (FS) method for solution verification [(2)]. Our responses are listed below item by item using the same numbering as Roache. The nomenclature mostly follows our own rather than Roache’s: we use $p_{RE}$ for the order of accuracy calculated using Richardson extrapolation, as opposed to the observed order of convergence, and we refer to the GCI and GCI_{2} methods, as opposed to the GCI_{0} and the real GCI methods. However, we agree with Roache to use $FS$ for the factor of safety used in all the verification methods. In response to item (10), we have used our approach to evaluate two new variants of the GCI method and one new variant of the FS method.

## Response

$R$ larger than 95% and a lower confidence limit (LCL) greater than or equal to 1.2 at the 95% confidence level for the true mean of the parent population of the actual factor of safety. This conclusion is true for different studies, variables, ranges of $P$ values, and single $P$ values where multiple actual factors of safety are available. $FS$ is a smooth linear function of $P$ and has no jumps.

There are a few variants of the GCI method. We have used the definition of the GCI method that is arguably the most common version/interpretation applied in the literature [(3,4,5)]. The GCI_{1} method was proposed by Logan and Nitta [(6)]. The guideline for the GCI_{2} method was communicated to us by Roache. The private communication is available to the public upon request.

$R$ > 95% and LCL > 1.2 for different studies, variables, ranges of $P$ values, and single $P$ values where multiple actual factors of safety are available. As a result, there is a high risk in using these GCI variants in certain circumstances, especially for $P>1$. Except for the original GCI method, all variants of the GCI method have jumps of $FS$ versus $P$.

Our purpose is not to add to this confusion but rather to evaluate the outcomes of selecting any of these variants of the GCI method and to compare them with the FS method using our approach. The correction factor and $p_{RE}$ are used to define the GCI_{2} and other verification methods as defined by Eqs. (10)–(15) in Ref. [2] in order to compare their relative conservativeness using the same error estimate $\delta_{RE}$.

(2) We disagree with Roache’s referring to the GCI as the GCI_{0} method and to the GCI_{2} as the GCI method, for the reasons given in item (1). The lack of a single guideline for selecting $FS$ and $p$, and for when to use which variant of the GCI method, is highlighted by Roache’s current discussion. Roache accepts using $p_{RE}$ when it is within a 5% difference of $p_{th}$ in item (2), whereas later, in item (10), Roache considers two other judgment calls as reasonable.

The GCI_{2} method discards the “coarse” grid solution in the uncertainty estimate when $P>1$, which is difficult to justify. For example, four grid solutions from the coarsest grid 4 to the finest grid 1 can build two grid triplet studies, (1, 2, 3) and (2, 3, 4). Grid convergence studies for industrial applications often show the oscillation of $p_{RE}$ such that (1, 2, 3) could estimate $P>1$ but (2, 3, 4) could estimate $P<1$. Based on the GCI_{2} method, $S_3$ should be discarded in the uncertainty estimate for (1, 2, 3) but not for (2, 3, 4). Of course, we agree that ideally one would conduct additional grid triplet studies until the solution is at or as close as possible to the asymptotic range; however, clearly this is not always possible, especially for industrial applications [(10)].

We agree that a grid-triplet study with $P=0.08$ is not desirable. However, it is not uncommon for solution verification studies (e.g., local $p_{RE}$ ranges from 0.012 to 8.47 in Ref. [4]). Additionally, Roache’s criticism of using $P=0.08$ is inconsistent with one of his previous conclusions that there is no necessity to discard results with $p_{RE}<1$ ($P<0.5$ for a second order method in Ref. [11]).
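To make the notation concrete, the following minimal sketch shows how $p_{RE}$, $P=p_{RE}/p_{th}$, and the error estimate $\delta_{RE}$ follow from one grid triplet with a constant refinement ratio $r$, using standard Richardson extrapolation. The function name, argument names, and the manufactured sample values are illustrative assumptions and are not taken from Refs. [2] or [4].

```python
import math

def richardson_triplet(s1, s2, s3, r, p_th):
    """Return (p_re, P, delta_re) for fine (s1), medium (s2), coarse (s3) solutions.

    Assumes a constant grid refinement ratio r and monotonic convergence.
    """
    eps21 = s2 - s1          # solution change, medium minus fine
    eps32 = s3 - s2          # solution change, coarse minus medium
    ratio = eps21 / eps32    # convergence ratio R
    if not 0.0 < ratio < 1.0:
        raise ValueError("monotonic convergence (0 < R < 1) is required")
    p_re = math.log(eps32 / eps21) / math.log(r)   # estimated order of accuracy
    delta_re = eps21 / (r ** p_re - 1.0)           # error estimate for the fine grid
    return p_re, p_re / p_th, delta_re

# Manufactured example: S(h) = 1 + 0.5*h**2 sampled at h = 1, 2, 4 (so r = 2).
p_re, P, delta_re = richardson_triplet(1.5, 3.0, 9.0, r=2.0, p_th=2.0)
print(p_re, P, delta_re, 1.5 - delta_re)
```

For this manufactured second-order case the sketch returns $P=1$, and subtracting $\delta_{RE}$ from the fine-grid solution recovers the exact value. A value such as $P=0.08$ corresponds to consecutive solution changes that shrink far more slowly than the theoretical order predicts.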

(3) The fact that “the use of the GCI_{1} method is closer to a 68% than a 95% confidence level” was one of the conclusions by Logan and Nitta [(6)]. This conclusion was not based only on the dataset with an intentional choice of grid studies with oscillations in both exponent *p* and output quantity. As stated on page 367 of Ref. [6], “However, for our contrived and mechanics example $NS=18$ sets (most of which were non-smooth), the use of GCI = 1.25 is much closer to a 68% confidence estimate than 95%.” It was also recommended in Refs. [6] and [2] that a sample with the number of grid convergence studies much larger than 100 is needed to draw general conclusions.

We did not recommend the GCI_{1} method but rather evaluated it using much larger sample sizes than Ref. [6]. For the largest sample 3 with size $N=329$, the reliability $R$ (Eq. (19) in Ref. [2]) is 90.3% for the GCI_{1} method.

(4) We disagree with Roache’s evaluation in Ref. [11] where it states that “Briefly, the net result is 14 NC (nonconservative) of 176 entries, or 8.0%.” Only 151 of the 176 grid triplet studies have the actual error $E$. This results in 24 nonconservative of 151 (note there are nine nonconservative grid-triplet studies that estimate $U=E$). So, the reliability for the GCI method [(12)] is actually 84.1%, which agrees very well with the reliability 83.9% estimated using our 329 grid-triplet studies (sample 3 in Ref. [2]).
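The arithmetic behind the two percentages can be checked directly. The snippet below is only a bookkeeping sketch: the function name is introduced here for illustration, and the counts are those quoted in the paragraph above.

```python
def reliability_percent(n_nonconservative, n_total):
    """Percentage of grid-triplet studies whose uncertainty bounds the error."""
    return 100.0 * (1.0 - n_nonconservative / n_total)

# Roache's count: 14 nonconservative entries out of all 176.
roache = reliability_percent(14, 176)
# Our count: 24 nonconservative out of the 151 studies with a known error E
# (studies with U = E are counted as nonconservative).
ours = reliability_percent(24, 151)

print(f"nonconservative fraction, 14 of 176: {100.0 - roache:.1f}%")
print(f"reliability, 24 of 151: {ours:.1f}%")
```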

Based on our own evaluation above and the fact that Cadafalch et al. [(12)] used $FS=1.25$ for $P>1$, the method they applied was not the GCI_{2} method and more likely the GCI method. The claim of “an original and reasonable variant of the real GCI” [(1)] again is confusing.

(5) We take 95% coverage as the common uncertainty target for both experiments and computations [(5)]. Although the GCI_{2} method only misses the overall reliability by 0.8% for sample 3, more importantly it fails to provide sufficient conservatism for other samples including the reliabilities of 91.4%, 90%, and 87.5% for samples 5, 8, and 16, respectively [(2)]. It is possible that another dataset could slightly change our evaluations. Nonetheless, the current sample size is large and the range of $P$ values is wide such that a further increase of the number of samples is not likely to significantly alter the FS method and its results.

(6) The FS method was calibrated/validated against the available dataset. Note that calibration/validation requires that the true error can be evaluated, i.e., the solution numerical benchmark ($SNB$) or solution analytical benchmark ($SAB$) is known. We welcome additional validation of the FS method and if necessary re-calibration and improvement, but again $SNB$ or $SAB$ must be known. The claim of Roache and others of the 95% reliability for the GCI method is undocumented and based on anecdotal information. We doubt that $SNB$ or $SAB$ is available for many of the cases cited by Roache and others. It should be a simple matter to provide proper documentation.

Note that the FS method is more conservative than the GCI_{2} method except for $1<P<1.136$ due to the jump of the factor of safety at $P=1$ for the GCI_{2} method. If the FS method is not conservative enough for another dataset, the GCI_{2} method will likely be worse.
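The crossover near $P=1.136$ can be reproduced with a short numerical check. This is a minimal sketch under stated assumptions: the piecewise factors below, expressed as multipliers of $|\delta_{RE}|$, are our reading of the forms in Eqs. (10)–(15) of Ref. [2] and should be checked against that reference; $r=2$ and $p_{th}=2$ are assumed, and all function names are illustrative.

```python
def fs_factor(P):
    """Assumed FS-method multiplier of |delta_RE| (check against Ref. [2])."""
    if P <= 1.0:
        return 2.45 - 0.85 * P
    return 16.4 * P - 14.8

def gci2_factor(P, r=2.0, p_th=2.0):
    """Assumed GCI_2 multiplier of |delta_RE|: 1.25 for P <= 1, 3*CF for P > 1."""
    if P <= 1.0:
        return 1.25
    cf = (r ** (P * p_th) - 1.0) / (r ** p_th - 1.0)   # correction factor
    return 3.0 * cf

def crossover(lo=1.0 + 1e-9, hi=1.5, tol=1e-10):
    """Bisection for the P in (1, 1.5) where the two multipliers are equal."""
    def diff(P):
        return fs_factor(P) - gci2_factor(P)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0.0:
            lo = mid      # FS still below GCI_2: crossover is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

P_cross = crossover()
print(f"FS overtakes GCI_2 at P = {P_cross:.3f}")
```

Under these assumed forms, the FS multiplier is below the GCI_{2} multiplier only in the narrow band just above $P=1$, where the GCI_{2} multiplier jumps from 1.25 to 3.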

The claim that the GCI_{2} method has been stable for over 12 years is not well founded. Due to the lack of a single guideline on the choice of $FS$ and $p$, different variants of the GCI method have been used by different users based on their own judgment calls. For example, Cadafalch et al. [(12)] did not use the GCI_{2} method, and Logan and Nitta [(6)] used the GCI_{1} method. Furthermore, the GCI method may have been applied to *O*(1000) cases but no statistical evidence for reliability has been documented.

(7) We disagree with Roache’s suggestion that the FS method has problems in predicting monotonic convergence for fine grids. The uncertainty estimates in Table 6 for the FS method in Ref. [2] for the three finest grid triplets are not monotonically decreasing since $P$ shows large oscillations, and the factor of safety for the second finest grid triplet (2, 3, 4) at $P=1.49$ is much larger than that for the other methods evaluated at the same $P$. However, the larger factor of safety is required to ensure the reliability for $P>1$. For the three grid triplets discussed, it is interesting to evaluate the convergence ratio $R$ for the fine grid solution $S_1$ ($R_{S_1}$), $P$ ($R_P$), and $U_G$ ($R_{U_G}$). All five verification methods have the same $R_{S_1}$ and $R_P$, which show monotonic convergence. The GCI, GCI_{1}, and CF methods show monotonic convergence for $U_G$, whereas the GCI_{2} and FS methods show monotonic divergence ($R_{U_G}=2.74$) and oscillatory divergence ($R_{U_G}=-4.53$), respectively.

The oscillation of $P$ may be caused by many factors. Grid 4 is still too coarse for the solution to be in the asymptotic range. Additionally, reducing the iterative error to machine zero is very difficult for large-scale computations. With the small grid refinement ratio $r=\sqrt[4]{2}$, solution changes $\varepsilon$ will be small, and the sensitivity to grid spacing and time step may be difficult to identify compared with iterative errors $U_I$. As shown in Fig. 6(*b*) in Ref. [10], $U_{I,1}/\varepsilon_{12}=61.6\%$ for the cases in Table 6 [(2)]. When $r$ increases, $U_I/\varepsilon$ will likely decrease. For example, the grid uncertainty decreases from 5.04 for (2, 4, 6) to 4.02 for (1, 3, 5) with $U_{I,1}/\varepsilon_{13}=20\%$ for $r=\sqrt{2}$. However, it should be noted that a large $r$ may be problematic, too, as different grids may resolve different flow physics.

There are other cases in which the GCI, GCI_{1}, GCI_{2}, CF, and FS methods show non-monotonic convergence for multiple grid-triplet studies, including the “well-behaved” problems that Cadafalch et al. [(12)] and Roache [(11)] used to evaluate the conservativeness of the GCI method. For the radial velocity using the SMART scheme in the study of premixed methane/air laminar flat flame on a perforated burner [(13,14,15)], the uncertainty estimates using the FS and GCI_{2} methods monotonically decreased as the grid was refined, whereas those of the other three methods did not. Another example is the uncertainty estimates for temperature at a monitored location for two-dimensional natural convection in square cavities at $Ra=10^6$, which had five grid-triplet studies with $r=2$ [(16)]. Uncertainty estimates using the five verification methods discussed in Ref. [2] first monotonically decreased as the grid was refined but suddenly increased for the finest grid triplet. Thus, it is unreasonable to blame the FS method for such behavior.

The verification results for our industrial application example are far from the asymptotic range. Although we evaluated the convergence characteristics for the 98 verification variables using $P$ and $|E|$ as functions of $\Delta x_{fine}/\Delta x_{finest}$ [(2)], a standard criterion for achieving the asymptotic range is still lacking. A possible criterion is that monotonic convergence should be established based on evaluation of the convergence ratio $R$ for fine grid solution $S_1$ (towards $S_C$), $P$ (towards 1), and $U$ (monotonically decreasing) for multiple (at least three) grid triplets with the same grid refinement ratio $r$ and $U_I\ll U$. In some cases, oscillatory convergence may be acceptable; however, this would require many grid triplets [(17)]. Although $R$ still needs to be evaluated for all the variables in our dataset, 41.5% of the variables that have more than two grid-triplet studies do show that $S_1$ approaches $S_C$, $P$ approaches 1, and $U$ monotonically decreases as the grid is refined. For the other 58.5% of the variables, $S_1$ also approaches $S_C$, as shown by the monotonically decreasing error magnitude $|E|$, but $P$ and $U_G$ often show mixed convergence conditions as the grid is refined.

(8) As discussed in item (6), without statistical evidence, the claim of the conservativeness of the GCI_{2} method is undocumented. Furthermore, we doubt very much how many applications have $SNB$ or $SAB$. If they do, we will be glad to add them to our dataset. The work by Dr. C. J. Freitas and his group is not publicly available [(9)]. Therefore, the claim of achieving the 95% reliability is again undocumented and based on anecdotal information.

(9) We agree that the actual factor of safety is undefined when a solution not in the asymptotic range happens to predict the true value. If this happens, it should be excluded from the dataset used to derive the FS method. However, monotonic convergence ensures that the uncertainty estimate is always greater than zero so that a zero error will be automatically bounded by the uncertainty.

The contrived example created by Roache only proves that the average actual factor of safety ($\bar{X}$) cannot be used alone to determine if a solution verification method is conservative enough. But it can be used to determine the relative conservativeness between different verification methods.

It should be noted that we used both the reliability $R$ and the LCL as defined by Eq. (22) in Ref. [2] to develop the FS method and determine if a method is conservative enough. Larger $\bar{X}$ does not necessarily mean larger $R$ (readers can refer to sample 6 in Ref. [2]).

GCI_{OR}) and by Roache [(1)] (GCI_{3}).

FS_{1} method). The FS_{1} method is the same as the FS method for $P<1$ but uses $p_{th}$ instead of $p_{RE}$ in the error estimate for $P>1$. Thus, Eq. (14) in Ref. [2] becomes

FS_{1} method is

GCI_{2} method is the jump of factor of safety across the asymptotic range at $P=1$. For two grid-triplet studies with one at $P=0.999$ and the other at $P=1.001$, the factor of safety suddenly increases from 1.25 to 3 even though $P$ only varies by less than 0.2%. Eça et al. [(19)] gave similar comments on this issue: “However, it is not easy ‘to accept’ a jump of a factor of 2.4 in the uncertainty when the observed order of accuracy may vary by only 0.1.” Similar problems exist for the GCI_{OR} and GCI_{3} methods when $p_{RE}$ differs from $p_{th}$ by 10%. It should be noted that the GCI_{OR} method sets the lower limit of $p_{RE}$ to be larger than 0.5, which corresponds to $P\geq 0.25$ for a nominal second order method. Thus, the factor of safety for $P<0.25$ for the GCI_{OR} method shown in Fig. 1 is only a result of the mathematical reformulation. Figure 1 also shows that the GCI_{OR} and GCI_{3} methods are much more conservative than the other methods for $0.25<P<0.9$ and coincide with the GCI_{2} method for $P>1.1$. The FS_{1} method is less and more conservative than the FS method for $1<P\leq 1.235$ and $P>1.235$, respectively.

The GCI_{OR}, GCI_{3}, and FS_{1} methods are evaluated using statistical analysis of the 25 samples following Ref. [2], with focus on samples 3 to 25. Table 1 shows the statistics for samples 3 to 8 [(2)] based on six different $P$ ranges for the three new methods. The FS_{1} method has the same reliability as the FS method for samples 3 to 8. The GCI_{OR} and GCI_{3} methods have almost the same reliability, but the GCI_{3} method is a little more conservative. Compared to the GCI_{2} method, the GCI_{OR} and GCI_{3} methods improve the reliability for $P<1$ to be larger than 95% but are not conservative enough for $P\geq 1$, especially near the asymptotic range. Examination of 18.2% of the data for $1.1\leq P<2.0$, which cover samples 7 and 8, shows that only the FS and FS_{1} methods achieve 95% reliability, whereas the GCI_{OR} and GCI_{3} methods achieve only 90%. The methods with the largest $\bar{X}$ for samples 3-5, sample 6, sample 7, and sample 8 are the GCI_{3}, GCI_{2}, FS, and FS_{1} methods, respectively. For all the verification methods, the LCLs are larger than 1.2 for all the $P$ ranges.

Table 2 shows the statistics at the seventeen $P$ values (samples 9 to 25) ranging from 0.705 to 1.205. For samples 9 to 19 ($P<0.99$), all the verification methods achieve reliabilities larger than 95% except 93.1% for the GCI_{OR} method at $P=0.905$, 87.5% for the three GCI methods at $P=0.955$, and 84.6% for the GCI_{OR} and GCI_{3} methods at $P=1.105$. The methods with the largest $\bar{X}$ for samples 9-12, samples 13-20, samples 21-24, and sample 25 are the GCI_{OR} and GCI_{3}, FS and FS_{1}, GCI_{2}, and FS_{1} methods, respectively. Only the FS and FS_{1} methods satisfy the requirement that $LCL>1.2$ for samples 9 to 25. The GCI_{2} method has $LCL<1.2$ for sample 20; the GCI_{OR} method has $LCL<1.2$ for samples 13, 16, 17, 18, 20, and 22; and the GCI_{3} method has $LCL<1.2$ for samples 20 and 22.

The actual factor of safety for sample 3, sample 3 averaged using $\Delta P=0.01$, and the upper and lower bands of the confidence interval $\bar{X}\pm tS_{\bar{X}}$ for samples 9 to 25 are shown in Fig. 2. Here $t$ is the factor for the Student-*t* distribution and $S_{\bar{X}}$ is the standard deviation of the mean of the sample, as defined in Ref. [2]. The GCI_{OR} and GCI_{3} methods do not satisfy $LCL>1.2$ near the asymptotic range. Compared to the FS method (Fig. 4(*e*) in Ref. [2]), the FS_{1} method shows a larger actual factor of safety when solutions are farther from the asymptotic range for $P>1$.
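The statistics used throughout this discussion can be sketched as follows. This is a minimal illustration, not the evaluation code behind the tables: the function name and the four sample values are hypothetical, the Student-*t* factor is supplied by the caller from a table, and a study with $U=|E|$ is counted as nonconservative, as in item (4).

```python
import math
import statistics

def verification_statistics(U, E, t):
    """Return (reliability %, mean X, lower limit, upper limit).

    X_i = U_i / |E_i| is the actual factor of safety of study i; t is the
    Student-t factor chosen by the caller for len(U) - 1 degrees of freedom.
    """
    if any(e == 0.0 for e in E):
        raise ValueError("actual factor of safety is undefined for E = 0")
    X = [u / abs(e) for u, e in zip(U, E)]
    n = len(X)
    reliability = 100.0 * sum(x > 1.0 for x in X) / n   # share with U > |E|
    mean_x = statistics.mean(X)
    s_mean = statistics.stdev(X) / math.sqrt(n)         # std. deviation of the mean
    return reliability, mean_x, mean_x - t * s_mean, mean_x + t * s_mean

# Hypothetical uncertainties and errors for four grid-triplet studies.
U = [2.0, 3.0, 1.5, 4.0]
E = [1.0, -2.0, 1.5, 2.0]
# 3.182 is the tabulated two-sided 95% Student-t value for 3 degrees of freedom.
R, X_bar, lcl, ucl = verification_statistics(U, E, t=3.182)
print(R, X_bar, lcl, ucl)
```

In this toy sample the mean actual factor of safety exceeds 1 while the reliability is only 75% and the lower limit falls below 1.2, which illustrates why $\bar{X}$ alone is not a sufficient criterion.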

## Concluding Remarks

The choice of $FS$ and $p$ in the GCI method requires user judgment calls, for which no single guideline is currently available. We recommend that a single guideline be provided.

The GCI_{OR} and GCI_{3} methods have almost the same reliability, but the GCI_{3} method is a little more conservative. Compared to the GCI_{2} method, the GCI_{OR} and GCI_{3} methods improve the reliability for $P<1$. However, they are too conservative for $P<0.9$ using a factor of safety of 3 and not conservative enough for $P\geq 1.1$.

The FS_{1} and FS methods are the same for $P\leq 1$. For $p_{th}=2$ and $r=2$, the FS_{1} method is less and more conservative than the FS method for $1<P\leq 1.235$ and $P>1.235$, respectively. As a result, the FS_{1} method may have an advantage for uncertainty estimates when $P>2$, where the FS and other verification methods likely predict unreasonably small uncertainties due to small error estimates. However, since the current dataset is restricted to $P<2$, the pros/cons of using the FS or FS_{1} method cannot be validated. Thus, until additional data are available for $P>2$, all verification methods should be used with caution for such conditions and, if possible, additional grid-triplet studies should be conducted to obtain $P<2$.

The authors’ statistical approach based on many analytical and numerical benchmarks provides a robust framework for developing solution verification methods. The authors welcome additional validation of the FS method and, if necessary, re-calibration and improvement using additional rigorous verification studies with $SAB$ or $SNB$ available. More research is needed to establish the criterion for achieving the asymptotic range along with its use in providing high quality numerical benchmarks.

## Acknowledgment

This study was sponsored by the Office of Naval Research under Grant No. N000141-01-00-1-7, administered by Dr. Patrick Purtell.

## References

*ASME Guide on Verification and Validation in Computational Fluid Dynamics and Heat Transfer*, Nov. 30, 2009.