## Abstract

In nuclear engineering, modeling and simulations (M&Ss) are widely applied to support risk-informed safety analysis. Since nuclear safety analysis has important implications, a convincing validation process is needed to assess simulation adequacy, i.e., the degree to which M&S tools can adequately represent the system quantities of interest. However, due to data gaps, validation becomes a decision-making process under uncertainties. Expert knowledge and judgments are required to collect, choose, characterize, and integrate evidence toward the final adequacy decision. In existing validation frameworks, namely CSAU: code scaling, applicability, and uncertainty (NUREG/CR-5249) and EMDAP: evaluation model development and assessment process (regulatory guide RG 1.203), this decision-making process is largely implicit and obscure. When scenarios are complex, knowledge biases and unreliable judgments can be overlooked, which could increase uncertainty in the simulation adequacy result and the corresponding risks. Therefore, a framework is required to formalize the decision-making process for simulation adequacy in a practical, transparent, and consistent manner. This paper suggests a framework, “Predictive capability maturity quantification using Bayesian network (PCMQBN),” as a quantified framework for assessing simulation adequacy based on information collected from validation activities. A case study is prepared for evaluating the adequacy of a smoothed particle hydrodynamic simulation in predicting the hydrodynamic forces onto static structures during an external flooding scenario. Compared with the qualitative and implicit adequacy assessment, PCMQBN is able to improve confidence in the simulation adequacy result and to reduce expected loss in the risk-informed safety analysis.

## 1 Introduction

An increasing amount of research is being conducted on developing and applying advanced modeling and simulation (M&S) tools in the nuclear discipline. In risk-informed safety analysis [1,2], M&S tools are used to investigate the effects of uncertain scenarios, simulate accident progressions, characterize the reactor safety margin, improve the operational procedures, locate design vulnerabilities, etc. Compared to classical risk analysis, the risk-informed analysis aims to address both aleatory and epistemic uncertainty within a well-defined issue space, rather than trying to work with arbitrarily defined point values of load and capacity. Meanwhile, in complex systems like nuclear power plants (NPPs), since the interactions among systems, components, and external events can be highly nonlinear, risk-informed safety analysis uses advanced simulations to fully represent the generation, progression, and interaction of accident scenarios with the NPPs. However, the classical risk-informed approach does not consider the impacts of simulation adequacy [3,4], which includes model parameter uncertainty, model form uncertainty, numerical approximations, and scaling errors. As a result, a validation framework is needed that not only determines whether the M&S code is adequate for representing the issue space but can also be used directly in the risk-informed safety analysis.

The code scaling, applicability, and uncertainty (CSAU) evaluation methodology was introduced in 1989 [5] to demonstrate a method that “can be used to quantify uncertainties as required by the best-estimate option described in the U.S. Nuclear Regulatory Commission (NRC) 1988 revision to the emergency core cooling systems (ECCS) code of federal regulations (10 CFR 50.46) [6].” A regulatory guide (RG 1.203), evaluation model development and assessment process (EMDAP), was developed in 2005 to “describe a process that the U.S. NRC considers acceptable for use in developing and assessing evaluation models that may be used to analyze transient and accident behavior that is within the design basis of a nuclear power plant [7].” In the CSAU/EMDAP framework, the complexity of physics and phenomena is emphasized, and scaling analysis is suggested to address the lack of data. The objective is to ensure both the sufficiency and necessity of validation data, modeling, and simulation, such that the simulation can adequately describe the scenarios investigated. Although the evidence involved is objective, the assessment process requires subjective information, including phenomena identification and ranking, decisions regarding data applicability (DA), and selection of validation metrics. In CSAU/EMDAP, such subjective evidence is treated implicitly, which causes the validation process to lack transparency. Moreover, without a formalized treatment, it becomes hard for analysts and decision makers to ensure that subjective information is elicited and processed consistently. Therefore, a decision model is needed for integrating all sources of evidence and determining final simulation adequacy. The decision model also needs to be practical, transparent, and consistent such that the simulation adequacy results can be used with sufficient confidence.

The predictive capability maturity model (PCMM) [8] was developed by W.L. Oberkampf et al. in 2007. As a decision model for verification, validation, and uncertainty quantification, PCMM explicitly treats the model credibility/uncertainty assessment as a decision-making process. For designated scenarios, six attributes are designed and assessed qualitatively based on a PCMM matrix, which is designed according to the context and consequence of applications. Since the final decisions are informed by requirements and consequences, PCMM can effectively guide the development and validation of M&S tools. However, since the PCMM matrix is constructed using descriptive statements, the representations of performance standards can be obscure and suggest inconsistent criteria. Meanwhile, although validation and uncertainty quantification are discussed as major attributes, other critical components, including scaling analysis, data applicability, and data quality, are not explicitly discussed. As a result, when there are data gaps induced by differences between the prototypical and experimental systems, such implicitness could result in inconsistent maturity levels.

Other frameworks include the “guide for verification and validation in computational solid mechanics (VV10)” [9] and the “standard for verification and validation in computational fluid dynamics and heat transfer (VV20)” [10] by ASME, which quantify the degree of accuracy by considering the errors and uncertainties in both the solution and the data. Since the adequacy results are used to support nuclear risk analysis, while VV10 and VV20 are designed as general guidance for the V&V of computational models, CSAU/EMDAP and PCMM are more appropriate and relevant to the context of this study.

In this paper, a new decision model named predictive capability maturity quantification using Bayesian network (PCMQBN) is presented. Developed based on argumentation theory and Bayes' theorem, PCMQBN aims to formalize the decision-making process for assessing simulation adequacy such that the process is transparent, consistent, and improvable with new evidence. Figure 1 shows the organization of this paper: Sec. 2 limits the scope of this study by introducing the assumptions, conditions, and limitations of the proposed framework. Section 3 formalizes the interpretation of simulation adequacy based on the nature of validation. Section 4 introduces PCMQBN, where the first two subsections describe the technical basis that characterizes and integrates evidence based on argumentation theory and Bayes' theorem; the last two subsections evaluate the behavior of this framework. Section 5 illustrates the application of PCMQBN in evaluating the adequacy of a smoothed particle hydrodynamic simulation for predicting the hydrodynamic forces onto static structures during an external flooding scenario.

## 2 Assumption, Condition, and Limitation

To properly identify the scope of this study, important conditions and assumptions are listed in Table 1. Category A aims to define the scope and application of this study; category B lists the assumptions in PCMQBN for formalizing the decision-making process in validations; category C suggests the conditions and assumptions used in case studies.

Assumption A1 limits the application to risk-informed safety analysis, and the objective is to determine the error distribution of the quantity of interest predicted by M&S. More specifically, this study focuses on situations with data gaps. As a result, to better characterize the adequacy of M&Ss and to avoid unreliable expert judgments, this study aims to reduce the uncertainty in estimating the simulation adequacy and the corresponding risks induced by such uncertainty. Assumption A3 mainly assumes that the code verification has been performed. Confidence in this assumption is built on the theory manual of NEUTRINO [11], together with code and solution verifications from the literature [12–14].

Assumption B1 suggests formal methods to improve the reliability and robustness of the validation decision-making process. Formal methods have repeatedly proven successful in finance, computer systems, and other fields in reducing major losses due to unverified errors [15]. It is argued that formal methods do not obviate the need for testing, experiments, and other assertion techniques; they are mainly designed to help identify errors in reasoning, which could otherwise be overlooked or left unverified. Assumption B2 aims to formalize the validation process as an argument process and to further represent the validation argument with Bayes' theorem. However, it has been suggested that the prior probability and likelihood cannot be known precisely [16,17]. In this study, a sensitivity analysis is suggested by performing standard Bayesian analysis with a class of prior and likelihood functions. Next, all important parameters, i.e., those with a high impact on the results, are carefully examined. If no significant discrepancy is observed, the result is claimed to be robust. Assumption B3 suggests expected losses for representing the risks of adopting code predictions in risk-informed safety analysis. Since M&Ss are mainly used to support safety decisions and alternatives in designated applications, the corresponding “adequacy” should be defined based on the consequence of adopting the predicted QoIs. This study constructs a table of synthetic monetary losses for each possible consequence, and the expected losses are calculated based on the simulation adequacy result.
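The sensitivity analysis described for Assumption B2 can be sketched as follows. This is only an illustrative sketch, not the procedure used in this study: it assumes a binary claim ("the simulation is adequate"), and the candidate priors, likelihoods, and the robustness tolerance are all hypothetical values chosen for demonstration.

```python
from itertools import product


def posterior(prior_adequate, lik_if_adequate, lik_if_inadequate):
    """Bayes' theorem for the binary claim 'the simulation is adequate'."""
    numerator = prior_adequate * lik_if_adequate
    denominator = numerator + (1.0 - prior_adequate) * lik_if_inadequate
    return numerator / denominator


def posterior_range(priors, liks_adequate, liks_inadequate):
    """Smallest and largest posterior over a class of priors and likelihoods."""
    values = [
        posterior(p, la, li)
        for p, la, li in product(priors, liks_adequate, liks_inadequate)
    ]
    return min(values), max(values)


# Hypothetical class of priors and likelihoods (assumed, not from the study).
priors = [0.4, 0.5, 0.6]
liks_adequate = [0.8, 0.9]    # P(evidence | adequate)
liks_inadequate = [0.2, 0.3]  # P(evidence | inadequate)

low, high = posterior_range(priors, liks_adequate, liks_inadequate)

# Hypothetical tolerance: the result is called robust if the spread is small.
TOLERANCE = 0.25
robust = (high - low) < TOLERANCE
print(f"posterior range: [{low:.3f}, {high:.3f}], robust: {robust}")
```

If the spread exceeds the chosen tolerance, the parameters that drive it would be the ones to examine more carefully.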

Assumptions 1, 2, and 3 in category C are made to define the error distributions of QoIs in the case study based on the simulation adequacy results. It is criticized that the current assembly of simulation adequacy and model predictions is arbitrary. Therefore, the claim that the proposed framework can reduce uncertainty in simulation adequacy results is questionable. However, at the initial developmental phase, it is acceptable to have a crude ensemble method for demonstration purposes. It is stressed that the parameters in the proposed framework are not fixed. As more evidence is gathered, the parameters need to be calibrated and refined. Moreover, since formal methods are designed to avoid or reduce unverified errors, it is argued that the validity of this claim should not be greatly deteriorated by assumptions for simplification purposes. Assumption 4 suggests a rational agent who prefers to have fewer expected monetary losses. It is criticized in Ref. [18] that the expected value cannot fully represent the agents' choices, where subjective and psychological impacts are neglected. It is argued that this study is at the scoping stage, and the objective is to formalize the decision-making process. In the future developmental stage, different decision analysis models can be tested and optimized for validation purposes.

## 3 Simulation Adequacy Interpretation

To formalize the decision-making process in simulation-adequacy assessment, a consistent and transparent interpretation is needed for “simulation adequacy” as a theoretical basis of the proposed framework. This section first reviews definitions from relevant works and identifies requirements for interpreting simulation adequacy. Next, the simulation adequacy is interpreted as a triplet set by answering three key questions. Examples are given to illustrate each of the three elements.

As a result, this study describes simulation adequacy as the degree to which M&S tools can adequately represent the system quantities of interest in the target applications. The objective is not only to determine if an M&S is good or bad but also to describe the uncertainty in the real application, especially when it is understood from nonprototypical data. In this study, simulation adequacy is suggested to be composed of three components: scenario, uncertainty/predictive capability levels, and beliefs. Note the purpose of this interpretation is not to resolve fundamental issues of uncertainty classifications through a sophisticated interpretation. Instead, this study focuses on practical resolutions for deciding simulation adequacy in complex engineering problems with a transparent and consistent framework. In the context of nuclear engineering, the term “transparent” requires a formalized interpretation and representation for simulation adequacy; the term “consistent” requires the formalization to have a mathematical basis, and allows for assumptions that cannot be violated in real applications; the term “practical” requires that the formalized simulation adequacy assessment should be easily applied to risk analysis. Equation (1) shows a representation of simulation adequacy as a triplet set: scenarios, beliefs, and levels of uncertainty or predictive capability for M&S
$\text{simulation adequacy}=\{\text{scenario},\ \text{belief},\ \text{predictive capability}\}$
(1)

The structure of interpretation in this study is similar to the triplet by Kaplan and Garrick [27] for probabilistic risk analysis. The definition for simulation adequacy aims to answer three questions:

1. What scenario does the M&S apply to?

In nuclear accident and transient analysis, results of M&S are used to support system designs and risk management within a range of issue spaces. Meanwhile, since risk-informed safety analysis aims to address the scenario uncertainty, a scenario set $S=[S_1,\ldots,S_i,\ldots]$ is defined, and each element corresponds to one scenario $S_i$ sampled according to designated distributions. Therefore, the selection of computational methods and simulations naturally depends on the investigated scenario. Moreover, in scenarios with minor impacts, the reactor systems can be robust enough to withstand much higher loads than those being exerted. In this circumstance, safety decisions do not heavily rely on M&S results, and the requirements on model predictive capability and confidence do not need to be high. Conversely, when scenario loads are likely to exceed system capacities and the uncertainty of M&S results could alter the safety decisions, the requirements on the predictive capability and confidence will be strict.

2. What is the predictive capability of M&S?

The “predictive capability” refers to the capability of M&Ss in predicting QoIs during accident and transient scenarios. As a major product of classical validation methods, the capability is quantified by errors between simulation results and observations. Techniques such as validation metrics and statistical analysis are usually used. Meanwhile, Oberkampf et al. [19] represent the model's predictive capability by maturity levels, which are further explained by subattributes and descriptive terms. In this case, argumentation theory and corresponding techniques, including goal structure notation (GSN) and claim, argument, and evidence notation, are used.

3. What is the belief in the M&S predictive capability?

Due to imperfect knowledge and insufficient data, predictive capability cannot be precisely estimated, and belief is used to describe a state of knowledge regarding estimations. Although belief is represented by probability, it does not refer to the frequency or statistics in the sense that it does not represent a property of the “real” world. Rather, belief describes our state of knowledge and discusses its effects on decisions. Table 2 shows an example of belief scales in probabilities together with their characteristics. This scale provides the definition of unreasonable model maturity levels as involving the independent combination of an end-of-spectrum condition with a condition that is expected to be outside the main body of the spectrum but cannot be positively excluded. The spectrum in this study refers to the spectrum of physics, scales, data applicability, and prediction errors. For example, when a solid-mechanics code is applied to simulate fluid dynamics, its prediction errors for certain QoIs can occasionally be small at certain locations. However, the belief that this simulation generally has low prediction errors and high maturity should be low since the physics in solid mechanics is outside the spectrum of fluid dynamics. Similarly, when experimental data for validating a simulation in kilometer-scale and multiphysics scenarios are collected from a centimeter-scale facility that focuses on one of the involved phenomena, the belief that the experimental data are applicable to the target scenarios should be low since the scales are different and the phenomena are separate. However, such reduced-scale and separate-effect data cannot be positively excluded from the main body of the spectrum in target applications since the involved physics and phenomena are in the spectrum of target scenarios.

As a result, the objective is to find the belief $P_i(\mathrm{level}_j)$, represented by probabilities, such that Eq. (2) can be satisfied for any investigated scenario $S_i$ within the designated scenario set or issue space $S=[S_1,\ldots,S_i,\ldots]$; $P_s$ is the screening probability for beliefs in the simulation's validation result (VR), data applicability, and simulation adequacy for a given set of scenarios. Its purpose is to ensure consistent belief assignments across the entire issue space in the risk-informed safety analysis. Similar definitions can also be found in the risk-oriented accident analysis methodology by Theofanous [2], which focuses on the scenario spectrum and aims to distinguish unreasonable and small-probability events
$P_i(\mathrm{level}_j)$
(2)

Table 2 shows an example of screening probability assigned by expert knowledge. Examples are also provided assuming that an M&S simulation is applied to predict the generation and progression of surface waves in the flooding scenarios. VR is assessed by comparing simulation predictions against validation databases, while DA is assessed by the scale of facilities, relevancy of phenomena, and quality of data.

The probability values $P_i(\mathrm{level}_j)$ are computed from the probabilistic framework that represents a map of parameters in the causal relationships $\{d_i\}$, prior knowledge $\{\tilde{p}_i\}$, and decision parameters $\{k_i\}$
$P_i(\mathrm{level}_j)=F(d_1,d_2,\ldots,\tilde{p}_1,\tilde{p}_2,\ldots,k_1,k_2,\ldots)$
(3)

The prior knowledge $\{\tilde{p}_i\}$ and corresponding uncertainties are distributions and can be quantified according to the probability scale in Table 2. Causal relationships and decision parameters should not violate well-known physics and laws, and a synthetic model can be developed to support the value assignment. Meanwhile, they are assumed to be well-posed problems in the sense that they are not subject to major discontinuities and the uncertainty can be reduced to the parameter level without major modeling uncertainty. It is argued that the three questions above are sufficient for guiding validation activities and adequacy assessment. However, since simulation results are usually applied in risk analysis and safety decisions, the preferences and consequences of accepting certain simulation adequacy results need to be evaluated, especially when results contain uncertainties. Although such topics are beyond the scope of this study, for completeness, a fourth attribute of simulation adequacy is briefly discussed as an additional concern. A synthetic model, together with a review of other sophisticated options, is also included regarding the application of simulation adequacy results.

Since the simulation adequacy results are mostly applied to support safety-related decisions or alternatives, the adequacy should be judged not only based on model predictions and validation databases but also with the target decisions in mind. For example, in scenarios with severe consequences, requirements on belief and the M&S's predictive capability levels should be more stringent than for those with less severe consequences. In the risk-informed analysis, the predictive capability level and belief should be higher for regions where load distributions and capacity distributions overlap. If the adequacy result satisfies the requirements, a cost-benefit analysis is performed based on the consequence of the simulation's uncertainty and risk. If the adequacy result does not satisfy the requirements, or if improving the predictive capability level and belief is net beneficial, an additional iteration will be conducted to continue developing new models, collecting new data, or updating the validation techniques. By adding risk and performance measurement results, the validation process becomes risk-informed in the sense that the acceptance criteria of simulation adequacy are informed by the risks of target applications, which are caused by both model and scenario uncertainty.
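The cost-benefit reasoning above can be illustrated with a short expected-loss calculation. The sketch below is hypothetical: the four maturity levels, the belief values, the synthetic monetary losses, and the improvement cost are assumptions made for demonstration and are not taken from the case study.

```python
def expected_loss(beliefs, losses):
    """Belief-weighted sum of losses over predictive capability levels."""
    if abs(sum(beliefs.values()) - 1.0) > 1e-9:
        raise ValueError("beliefs must sum to 1")
    return sum(beliefs[level] * losses[level] for level in beliefs)


def improvement_is_net_beneficial(current_loss, improved_loss, improvement_cost):
    """True if paying for more validation work lowers the total expected cost."""
    return improved_loss + improvement_cost < current_loss


# Hypothetical synthetic monetary loss for each maturity level (0 = lowest).
losses = {0: 100.0, 1: 40.0, 2: 10.0, 3: 1.0}

# Hypothetical beliefs before and after an additional validation iteration.
beliefs_now = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}
beliefs_improved = {0: 0.0, 1: 0.1, 2: 0.4, 3: 0.5}

loss_now = expected_loss(beliefs_now, losses)
loss_improved = expected_loss(beliefs_improved, losses)

# Hypothetical cost of the extra iteration, in the same monetary unit.
iterate = improvement_is_net_beneficial(loss_now, loss_improved, improvement_cost=5.0)
print(loss_now, loss_improved, iterate)
```

A rational agent in the sense of the expected-value assumption would choose another iteration only when the function returns true; other decision models would replace this comparison.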

## 4 Predictive Capability Maturity Quantification Using Bayesian Network

To avoid expert biases and unreliable judgment within an implicit decision scheme, this study proposes a quantitative decision-making framework, named PCMQBN, to formalize the assessment of simulation adequacy. Considering the similarity between assurance cases and simulation validation, the simulation adequacy assessment can be described as a “confidence argument” supported by evidence that justifies the claim that the simulation provides reliable predictions in the domain of application. The evidence is collected from the validation framework and characterized mathematically such that it is consistent with the interpretation of simulation adequacy. Moreover, such an argument process can be quantified by probabilities and maturity levels: all evidence is integrated by probabilistic inference and represented graphically by a Bayesian network. At the same time, to integrate evidence from various sources, a synthetic decision model is suggested for determining the relative weights and conditional probabilities in the Bayesian network. Figure 2 shows the scheme for assessing and applying simulation adequacy by PCMQBN. Evidence of validation result and data applicability is first collected from validation activities guided by validation frameworks like CSAU/EMDAP. Section 4.1 discusses in detail how evidence is collected and characterized consistently based on the maturity level assignment (Sec. 4.1.1) and belief assignment (Sec. 4.1.2). Next, the characterized evidence, together with decision parameters regarding the conditional dependencies among different evidence and attributes, is integrated into simulation adequacy results. Section 4.2 discusses how evidence is integrated based on argumentation theory and further quantified by probabilistic inference. To evaluate the sensitivity of decision parameters, Sec. 4.3 suggests a sensitivity analysis for the simulation adequacy result with the same set of evidence. Finally, the simulation adequacy result is applied to safety analysis by assembling the M&S predictions and beliefs. At the same time, the parameter assignment and integration structures are subject to refinement. Section 4.4 discusses different phases of simulation adequacy assessment based on the quality of each step.

### 4.1 Evidence Characterization.

During a validation process like CSAU/EMDAP, different activities and materials, including validation databases, scaling analysis of experimental databases, simulation assessment results, phenomenon identification and ranking process, are used to support the adequacy assessment of a simulation. To make better use of these materials, this study characterizes all related evidence based on the argumentation theory and the triplet definition for simulation adequacy. The characterization is required to be transparent, practical, and consistent. The term “transparency” requires a clear representation of evidence by mathematical forms such that the meaning and substance of evidence are maintained and visible. The term “practice” requires all related evidence to be effective for practical purposes and easily obtainable. In the context of nuclear safety analysis, the evidence should be characterized such that it can be directly used to support safety-related decisions. The term “consistency” requires the characterizations to be theoretically defendable, mathematically sound, and consistent with common knowledge and well-known rules.

There are various ways of characterizing evidence. Sun [28] categorizes evidence as direct evidence, backing evidence or counterevidence, based on its association with confidence. In the context of assurance case that focuses on safety [29], the evidence is defined as “the information that serves as the grounds and starting point of (safety) arguments, based on which the degree of truth of the claims in arguments can be established, challenged and contextualized.” Furthermore, in Toulmin's argument model [30], the evidence is classified into six groups: claim, data, warrant, backing, qualifier, and rebuttal. Since validation shares many common features with the assurance case, Table 3 shows examples in simulation adequacy assessment for each element based on the Toulmin's argument model. In this study, information including simulation predictive capability, validation data, scaling results, data relevance, data uncertainty, assumptions, and conditions, are considered as evidence for assessing simulation adequacy. In addition, although indirect evidence, including process quality assurance, use history, and M&S management [31] will affect the adequacy assessment for M&S, this study mainly investigates direct evidence for validation.

Since simulation adequacy is to estimate the degree that model predictions represent the real values, the errors, referred to as validation result, between model predictions and the validation data should be used to support the adequacy assessment. In some validation methods, simulation adequacy is interpreted as uncertainty distributions of model predictions [22]. However, it is argued that in nuclear applications, the difficulties and costs in collecting data under prototypical conditions are so high that only data from small-scale facilities and separate (or mixed) effect tests are practically obtainable. Therefore, the uncertainty distribution estimated by validation data on different scales can be significantly distorted. To avoid the problem of scaling distortion, it becomes necessary to evaluate the applicability of validation data to the target applications, referred to as data applicability, in addition to the validation result. As a result, the top claim of simulation adequacy is supported by subclaims of validation result and data applicability. The validation result is to determine the errors between simulation results and the validation data, while the data applicability is to determine the applicability of validation data from reduced scales and experimental conditions in the context of applications. Next, the corresponding evidence is collected and evaluated.
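The claim structure above, in which a top claim of simulation adequacy is supported by subclaims of validation result and data applicability, can be sketched as a three-node Bayesian network evaluated by direct enumeration. The sketch rests on simplifying assumptions that are not part of this study: each node is binary, the two subclaims are treated as independent parents, and every probability value is hypothetical.

```python
from itertools import product

# Hypothetical beliefs in the two subclaims.
p_vr = {"high": 0.8, "low": 0.2}  # validation result (VR)
p_da = {"high": 0.6, "low": 0.4}  # data applicability (DA)

# Hypothetical conditional probability table: P(adequate | VR, DA).
cpt_adequate = {
    ("high", "high"): 0.95,
    ("high", "low"): 0.50,
    ("low", "high"): 0.30,
    ("low", "low"): 0.05,
}


def belief_adequate(p_vr, p_da, cpt):
    """Marginal belief in the top claim, summing over both subclaims."""
    return sum(p_vr[v] * p_da[d] * cpt[(v, d)] for v, d in product(p_vr, p_da))


def belief_adequate_given_vr(vr_state, p_da, cpt):
    """Belief in the top claim once the validation result is observed."""
    return sum(p_da[d] * cpt[(vr_state, d)] for d in p_da)


print(belief_adequate(p_vr, p_da, cpt_adequate))
print(belief_adequate_given_vr("high", p_da, cpt_adequate))
```

With a low belief in data applicability, even a favorable validation result leaves the belief in adequacy well below the value in the first row of the table, which is the scaling-distortion concern expressed numerically.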

The following two sections (4.1.1 and 4.1.2) discuss how evidence for validation results and data applicability are characterized. Specifically, the predictive capability is described by maturity levels, while the belief is represented by probability.

#### 4.1.1 Maturity Level Assignment.

Many studies have sought to quantitatively measure the level of predictive capability of an M&S tool. Harmon and Youngblood [32] suggested a five-point maturity ranking scale based on the concepts of credibility, objectivity, and sufficiency of accuracy for the intended use. Long and Nitta [33] suggested a 10-point scale based on the concepts of completeness, credibility, and sufficiency of accuracy for the intended use. Pilch et al. [34] suggested a four-point scale dominated by the level of formality, the degree of risk in the decision based on the M&S effort, the importance of the decision to which the M&S effort contributes, and sufficiency of accuracy for the intended use. Pilch et al. discuss that the maturity level of each element should be assigned based on the risk tolerance of the decision maker. NASA suggested a four-point scale based on the level of believability, formality, and credibility [35]. NASA also suggested that the credibility assessment should be separated from the requirements for a given application of M&S. In this study, the maturity level by Oberkampf [8] is used to represent and rank the predictive capability of M&Ss. It is believed that the maturity assessment and adequacy assessment should be dealt with independently as much as possible to reduce misunderstandings or misuse of an M&S maturity assessment. As a result, the maturity level in this study is defined by the intrinsic and fundamental attributes in the M&S validation and decision-making process. The aim is to track objectively all intellectual artifacts obtained during all related validation activities.

##### 4.1.1.1 Validation result.

In this study, the “validation result” is defined as the comparison between the model predictions and validation data. Based on the comparison, maturity levels can be further defined by descriptive terms in the predictive capability maturity model [8], value bounds from probabilistic or distance metrics, confidence intervals, or hypothesis testing. The results from different validation metrics are in different ranges, and the corresponding interpretations can be distinct. Maupin et al. [36] have reviewed and tested a class of validation metrics with a synthetic example and found that the selection of metrics is problem dependent. For example, when both the experimental measurement uncertainty (EMU) and model uncertainty are available, probabilistic metrics are preferred over distance metrics. Otherwise, for results from deterministic models, distance metrics are more appropriate. The descriptive terms are composed of two elements: model accuracy and performance standards. Performance standards are criteria for measuring the “acceptability” of simulation accuracy, and they are defined according to applications and scenarios. The value bounds are not fixed: the upper and lower bounds can vary within a single application, especially when multiple phenomena and databases are available. At the same time, it is suggested that the designation of value bounds should be consistent with the meaning of the metric outputs. For example, if hypothesis testing is used, higher values suggest a higher confidence level, and the corresponding maturity level should be higher; if distance metrics are used, higher values usually suggest larger errors, and the corresponding maturity level should be lower.
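The mapping from a metric value to a maturity level can be sketched as a simple threshold lookup. The function name, the four-level scale (0 to 3), and the value bounds below are hypothetical and only illustrate the rule that the direction of the mapping must match the meaning of the metric output.

```python
from bisect import bisect_right


def level_from_metric(value, bounds, higher_is_better):
    """Map a metric value to a maturity level using ascending value bounds.

    With n bounds there are n + 1 levels, numbered 0 (lowest) to n (highest).
    """
    if list(bounds) != sorted(bounds):
        raise ValueError("bounds must be sorted in ascending order")
    crossed = bisect_right(bounds, value)  # number of bounds <= value
    # Confidence-like metrics: more bounds crossed means a higher level.
    # Distance-like metrics: more bounds crossed means a lower level.
    return crossed if higher_is_better else len(bounds) - crossed


# Hypothetical bounds for a distance metric (e.g., a normalized error).
distance_bounds = [0.05, 0.15, 0.30]
# Hypothetical bounds for a confidence-like output of a hypothesis test.
confidence_bounds = [0.50, 0.80, 0.95]

print(level_from_metric(0.10, distance_bounds, higher_is_better=False))
print(level_from_metric(0.90, confidence_bounds, higher_is_better=True))
```

Because the bounds are passed in as arguments, they can be adjusted per phenomenon or database, in line with the remark that the bounds are not fixed within a single application.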

When validation data are collected directly from the prototypical system, the validation result can directly support the argument of simulation adequacy. However, when the data are collected from reduced-scale facilities or separate effect tests (SETs), additional evidence is needed for assessing the simulation adequacy in target applications. Different from the maturity level definitions in PCMM, attributes of data applicability and scaling analysis are not included in the validation result. Rather, a separate evidence characterization, data applicability, is prepared to account for the effect of data relevance, scaling analysis, and data uncertainty. Meanwhile, the involvements of expert knowledge and judgment in selecting metrics and designing acceptance criteria are not included, and they will be discussed separately in the belief assessment.

##### 4.1.1.2 Data applicability.

In addition to the levels from validation results, evidence of data applicability is also needed when the data are collected from reduced-scale facilities, SETs, integral effect tests (IETs), etc. “Data applicability” is defined by the similarity between validation facilities and reactor prototypical conditions. In this study, the maturity level of data applicability is characterized by an R/S/U grading system. The R/S/U system was first developed by N. Dinh in 2013 [37] and has been used in Ref. [38] to evaluate the quality of validation data. It categorizes evidence of data applicability into three subattributes: [R]elevance, [S]caling, and [U]ncertainty, each of which is designed according to its intrinsic properties. This study focuses on extreme cases with binary grades for the relevance and scaling attributes. In practical applications, intermediate grades can be introduced for higher resolution. The relevance grade [R] is determined according to the relationships of phenomena and physics between the application and the reduced-scale validation databases. For example, flow data collected from a curved tube are irrelevant to those in a straight tube since the phenomena are different, and channel flows with $Re$ around $100$ are irrelevant to those around $5000$ since the dominating physics is different. The relevance grade is mostly determined by expert opinions. The phenomenon identification and ranking table (PIRT) [39] and the corresponding quantitative version, QPIRT [40], are strategies for identifying and ranking the relevance between validation databases and applications. The (physics) scaling grade [S] measures the degree of similarity between phenomena in the prototypical systems and reduced-scale experiments on the basis of physics scaling. At the same time, the scaling grade aims to determine whether the validation databases are sufficient to justify extending the experimental model assessment results to applications.
A formalized scaling analysis can be found in Ref. [5], and a recent review on scaling methodology can be found in Ref. [41]. In classical scaling analysis [26], dimensionless parameters are used for measuring the similarity between prototypical systems and reduced-scale facilities. If the dimensionless space of the validation databases covers the space of the application, the scaling analysis is claimed to be sufficient, and the database is claimed to be capable of representing behaviors and phenomena in the designated scenarios. For example, it is assumed that the lid-driven cavity flow can be sufficiently characterized by the Reynolds number ($Re$). It is also assumed that behaviors in the prototypical system can be represented by a reduced-scale lid-driven cavity flow, even though the geometries, driving velocity, and fluid properties are different. As a result, the scaling grade for the validation databases can be decided by comparing the range of $Re$ for the reduced-scale database against the range under prototypical conditions. If the $Re$ range of the validation databases covers that of the prototypical system, the scaling is graded as 1. Otherwise, scaling is graded as 0. In addition, the scaling grade can equal 1 only if the relevance grade [R] is not 0. Moreover, considering the effects of measurement errors, the uncertainty grade [U] is suggested for measuring the data uncertainty due to instrumentation errors and limited resolution.

For example, suppose the data applicability assessment is performed for a target application that is a channel flow, where the quantity of interest is the averaged flow velocity $v_0$. It is assumed that the flow can be fully characterized by the Reynolds number ($Re$), and the target $Re$ equals $5\times10^3$. Meanwhile, it is required that the uncertainty in measuring $v_0$, quantified by the L1 relative error norm $\varepsilon_{QoI}$, is less than 50% of the characteristic velocity $v_0$. It is further assumed that four databases are available from four different experiments. Experiment #1 is performed in a curved pipe with $Re\in[10^2,10^4]$ and measurement error $\varepsilon_{QoI}=\pm0.1v_0$. Experiments #2, #3, and #4 are performed in straight pipes. Experiment #2 has $Re\in[10,10^3]$ and $\varepsilon_{QoI}=\pm0.1v_0$; experiments #3 and #4 both have $Re\in[10^2,10^4]$, while experiment #4 has a higher measurement error of $\varepsilon_{QoI}=\pm2v_0$. For case #1, since the phenomenon in the curved pipe is different from that in the straight pipe, the collected data are not relevant to the target application even though the Reynolds number and data uncertainty satisfy the target conditions. The databases of cases #3 and #4 are sufficient since the physical characterization ($Re$) of the validation database covers the same characterization in the target application. However, case #2 does not cover the target application. Therefore, the scaling attribute of cases #3 and #4 is rated as 1, while that of case #2 is rated as 0. The uncertainty of case #4 in measuring the quantity of interest, $\varepsilon_{QoI}$, is higher than the acceptance criterion, and the corresponding attribute is rated as 0. The uncertainty of cases #2 and #3 satisfies the criterion and is rated to be at least 1. As a result, only case #3 is found to be applicable.
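The grading logic of this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Experiment` class and the `grade` function are hypothetical names, relevance is assumed here to be decided by pipe geometry alone, and the measurement error of experiment #3 is assumed to be $\pm0.1v_0$ because the text only states that it satisfies the criterion.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    name: str
    geometry: str      # "straight" or "curved"
    re_min: float      # lower bound of the Reynolds-number range
    re_max: float      # upper bound of the Reynolds-number range
    rel_error: float   # measurement error as a fraction of v0

TARGET_GEOMETRY = "straight"   # assumption: the target channel behaves like a straight pipe
TARGET_RE = 5e3                # target Reynolds number from the example
MAX_REL_ERROR = 0.5            # acceptance criterion: error below 50% of v0

def grade(exp):
    """Return binary (R, S, U) grades for one experiment."""
    r = 1 if exp.geometry == TARGET_GEOMETRY else 0
    # Scaling can be 1 only if the data are relevant and the Re range covers the target.
    s = 1 if (r == 1 and exp.re_min <= TARGET_RE <= exp.re_max) else 0
    u = 1 if exp.rel_error < MAX_REL_ERROR else 0
    return (r, s, u)

experiments = [
    Experiment("#1", "curved", 1e2, 1e4, 0.1),
    Experiment("#2", "straight", 1e1, 1e3, 0.1),
    Experiment("#3", "straight", 1e2, 1e4, 0.1),  # error value assumed, see lead-in
    Experiment("#4", "straight", 1e2, 1e4, 2.0),
]

grades = {e.name: grade(e) for e in experiments}
applicable = [name for name, g in grades.items() if g == (1, 1, 1)]

for name, g in grades.items():
    print(name, g)
print("applicable:", applicable)
```

Under these assumptions, only experiment #3 receives a grade of 1 for all three attributes, in line with the conclusion of the example.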

#### 4.1.2 Beliefs Assessment.

In addition to the maturity, belief in levels of validation results and data applicability needs to be assessed based on the prior knowledge. Considering the subjective and intangible nature of beliefs, a table of belief scales is prepared for the temporary quantification of intangibles, and the results are rendered in qualitative terms by applying this scale in reverse. Table 2 provides an example with an arbitrary assignment of probabilistic values; more sophisticated evaluations might be made by different sources and groups. The objective is to reach an agreement on a single or a class of scales, and the defense in depth is assured with better scrutability and communicability [2]. Also, beliefs can be estimated by metrics, including confidence interval, probability boxes, etc. [16,36]. However, their results cannot violate Eq. (2) such that an adequate margin can be ensured. Meanwhile, the belief can also be assessed based on expert opinions and represented by splinter probabilities. The value assignment in this study is arbitrary, and it is also suggested that the values are problem-dependent. For scenarios with severe consequences and small margins, the belief assessment and the belief scales can be more stringent.

It is suggested that the attributes of data applicability and validation result are not independent. For example, it has been pointed out by Ref. [36] that the selection of validation metrics depends on the uncertainty grades. It is also suggested by Ref. [26] that the gradings for scaling and relevance are also correlated. Meanwhile, the assessment for data applicability and selection of validation metrics relies on expert opinions. Considering the objective nature of maturity level and R/S/U grades by their definitions, an evidence integration process is needed for integrating intercorrelations and dependencies among attributes, subjective and objective information to the final simulation adequacy. Although GSN provides structural representations of validation arguments, no quantitative result can be obtained. To better support risk analysis and guide model selections, additional techniques are needed to quantify evidence and to transform validation arguments into computable networks.

### 4.2 Evidence Integration.

To integrate evidence in a transparent and consistent manner, many studies have employed GSN to integrate evidence to final simulation adequacy with the diagrammatic notation [42]. Based on the evidence characterization, the claim of overall simulation adequacy is supported by subclaims of validation result and data applicability, which is further argued based on the R/S/U grade. Figure 3 depicts the network of simulation adequacy assessment by GSN [43] and defines principal components in GSN. The top objective (goal #1) is to assess the adequacy of M&S for a designated scenario, and it is argued based on subclaims of validation results and data applicability. Furthermore, the data applicability is argued based on three subclaims: relevance, scaling, and uncertainty (R/S/U). The goals at bottom levels are solved by corresponding evidence and corresponding characterizations.

To quantify the validation argument in mathematical language, this work uses probabilities and connects them with logic for quantitative reasoning. Compared with classical logic, which is rigid and binary in character, probabilistic approaches soften the constraints of Boolean logic and allow truth values to be measured on a belief scale [44]. According to Eq. (3), the belief is represented as a function of causal relationships $\{d_i\}$, prior knowledge $\{\tilde{p}_i\}$, and decision parameters $\{k_i\}$. The prior knowledge, represented by probability, has been estimated as belief and collected from the validation framework, together with the evidence of validation result and data applicability. The causal relationships include direct and indirect dependencies among all attributes. Since a dependence can be uncertain, it becomes conditional on all possible states of attributes or intermediate variables. Such a process enables reasoning “by assumption” and decomposes the reasoning task into a set of independent subtasks. It also allows us to use local chunks of information taken from diverse domains and fit them together to form a global inference in stages, using simple, local vector operations. Since the quantification of conditional dependency relies on conceptual relationships and expert opinions, decision models are needed for assessing conditional probabilities. A validation knowledge base is constructed by quantifying the components $\{d_i\}$, $\{\tilde{p}_i\}$, and $\{k_i\}$. In addition to different evidence characterizations, PCMQBN also aims to integrate evidence from different databases, and a synthetic model is needed for assessing the conditional probabilities according to their levels in relevancy, scaling, data uncertainty, data applicability, and validation results.

For better visualization, this study uses the Bayesian network to represent the statistical relationships between different evidence and attributes. A Bayesian network (BN) is a directed acyclic graph (DAG) constructed from nodes (represented by circles), arrows, and conditional probability tables. Each node defines either a discrete or a continuous random variable. Nodes that have arrows directed to other nodes are parent nodes, and nodes that have arrows coming from other nodes are called child nodes. An intermediate node serves as both a parent and a child node. A node that does not have any arrow coming from another node is called a root node, and it does not have any parent node. Arrows represent direct relationships between interconnected parent and child nodes. The conditional probability table assigned to each node describes the quantitative relationships between interconnected nodes. A BN analysis is performed based on the conditional probability as in Eq. (4) and the conditional independence assumption, i.e., $P(x,y|z)=P(x|z)P(y|z)$ if and only if $x\perp y\,|\,z$. The joint probability distribution can be described by conditional probabilities as
$P(X_1,X_2,\ldots,X_n)=\prod_{i=1}^{n}P(X_i|X_1,\ldots,X_{i-1})$
(4)
The conditional independence assumption simplifies Eq. (4) further as
$P(X_1,X_2,\ldots,X_n)=\prod_{i=1}^{n}P(X_i|\mathrm{Parent}(X_i))$
(5)
where $\mathrm{Parent}(X_i)$ denotes the parent nodes of $X_i$; $P(X_i|\mathrm{Parent}(X_i))$ is the conditional probability table of $X_i$; and $n$ is the number of nodes in the network. A Bayesian network can also be used as an inference tool to evaluate beliefs of events when evidence becomes available. For evidence $e$, the joint probability of all the nodes can be inferred as
$P(X_1,X_2,\ldots,X_n|e)=\frac{P(X_1,X_2,\ldots,X_n,e)}{P(e)}=\frac{P(X_1,X_2,\ldots,X_n,e)}{\sum_{X_1,\ldots,X_n}P(X_1,X_2,\ldots,X_n,e)}$
(6)
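As an illustration of Eqs. (5) and (6), the following Python sketch evaluates a posterior by brute-force enumeration of the joint distribution. It is a minimal example under stated assumptions: the three-node chain `A -> B -> C` and all of its probabilities are hypothetical and are not taken from this study, and enumeration is used only because it maps directly onto the equations; practical BN tools use more efficient inference algorithms.

```python
from itertools import product

# Hypothetical binary network A -> B -> C.
# Each entry: node -> (parents, table), where table maps a tuple of parent
# states to P(node = True | parents).
network = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(True,): 0.9, (False,): 0.2}),
    "C": (("B",), {(True,): 0.7, (False,): 0.1}),
}

def joint(assignment):
    """Eq. (5): product of P(X_i | Parent(X_i)) over all nodes."""
    p = 1.0
    for node, (parents, table) in network.items():
        p_true = table[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

def posterior(query, evidence):
    """Eq. (6): P(query = True | evidence) by summing the joint distribution."""
    names = list(network)
    numerator = 0.0
    denominator = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # skip states inconsistent with the evidence
        p = joint(assignment)
        denominator += p          # accumulates P(e)
        if assignment[query]:
            numerator += p        # accumulates P(query = True, e)
    return numerator / denominator

prior_c = posterior("C", {})               # marginal belief in C
post_a = posterior("A", {"C": True})       # belief in A after observing C
print(prior_c, post_a)
```

With no evidence, the denominator sums to one and the function returns the marginal belief; with evidence, the same sum restricted to consistent states plays the role of $P(e)$.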

In this study, the nodes $X_i$ include simulation adequacy (SA), validation result (VR), data applicability (DA), relevancy [R], scaling [S], and uncertainty [U], and each node is further characterized by maturity levels. Based on Eqs. (4) and (5), the joint probability distribution is calculated as a product of the probability distributions of each variable conditional on the other variables. The conditional probability table is determined based on expert knowledge of causal relationships and dependencies among different nodes. Table 4 shows an example of assigning conditional probabilities when the data applicability is assessed based on evidence from R/S/U grades. First, the confidence that the corresponding data are applicable is 0% if the phenomena and the physics involved are 100% not relevant; meanwhile, the data are applicable with 100% confidence only if the data are relevant, scaling is sufficient, and data uncertainty is acceptable with 100% confidence [26]. Second, the confidence level of having applicable data drops to 60% if the data uncertainty becomes unacceptable; the level drops to 20% if the scaling becomes insufficient; and the level further drops to 5% if both scaling and uncertainty are not acceptable. These numbers are required to be less than 100% based on findings by D'Auria [45] that insufficient scaling and low-quality data are expected to have negative impacts on simulation adequacy assessment. However, the values are arbitrarily assigned to quantify the relative impacts due to different root causes, and it is assumed in this study that the negative impact due to insufficient scaling is higher than that due to low-quality data. Similar techniques also apply to the conditional probabilities for simulation adequacy assessment. The simulation is 100% adequate if the data are applicable and the validation result satisfies the acceptance criteria. Moreover, the confidence that the simulation is adequate becomes 30% or less if either the validation result or the data applicability does not satisfy the criteria.

Figure 4 shows examples of the Bayesian network with the conditional probabilities prepared with GeNIe [46]. Although the data are relevant and have good quality, the confidence for applicable validation data is 20% since the dimensionless space of the validation data does not cover the space of the target application. Meanwhile, since the confidence of getting an adequate simulation given an acceptable validation result and inapplicable data is 0.25, the confidence for an adequate simulation is 40% even though the simulation predictions have good accuracy in predicting the validation data.
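The 20% and 40% figures quoted above follow from the law of total probability applied to the conditional probabilities stated in the text. The short Python sketch below reproduces the arithmetic; the dictionary and function names are illustrative, and only the entries needed for this example (validation result acceptable) are included.

```python
# P(DA = yes | R, S, U) for relevant data (R = 1), as stated in the text.
# For R = 0 the confidence is 0 regardless of S and U.
P_DA_GIVEN_RSU = {
    (1, 1, 1): 1.00,
    (1, 1, 0): 0.60,   # data uncertainty unacceptable
    (1, 0, 1): 0.20,   # scaling insufficient
    (1, 0, 0): 0.05,   # both unacceptable
}

# P(SA = yes | DA, VR = yes), as stated in the text.
P_SA_GIVEN_DA = {True: 1.00, False: 0.25}

def belief_data_applicable(r, s, u):
    """Confidence that the data are applicable for fully known R/S/U grades."""
    if r == 0:
        return 0.0
    return P_DA_GIVEN_RSU[(r, s, u)]

def belief_adequate(p_da):
    """Marginalize over DA, with the validation result known to be acceptable."""
    return p_da * P_SA_GIVEN_DA[True] + (1.0 - p_da) * P_SA_GIVEN_DA[False]

# Relevant, good-quality data whose dimensionless space does not cover the application.
p_da = belief_data_applicable(r=1, s=0, u=1)
p_sa = belief_adequate(p_da)
print(p_da, p_sa)
```

That is, $0.2\times1.0+0.8\times0.25=0.4$.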

In practice, since multiple databases are usually used in the validation process, the overall simulation adequacy should account for impacts from multiple nodes that represent the simulation adequacy result from each database. In this study, a synthetic integration model is designed to determine the conditional probability based on the reactor prototypicality parameter (RPP) and EMU.

The concept of validation cubic was first suggested in Ref. [37], and the objective is to measure how close the given test conditions are to the reactor conditions in scenario of interest to the application. The term “cubic” refers to three-dimensional and normalized space, which is filled by a body of validation evidence from validation experiments. At the same time, each “dimension” is normalized to the range of 0 to 1 such that each face has a square shape. Three dimensions include RPP, system decomposition, and physics models. RPP, reactor prototypicality parameter, is defined as the significance of certain evidence in supporting claims in reactor conditions. In this study, a numerical value equal to 1 stands for highly significant evidence, in the sense that the data from validation experiments are relevant, sufficient, and high-quality. 0 means insignificant evidence where the data can be irrelevant, insufficient or low-quality. System decomposition represents the separation of target scenarios into subphenomena and subphysics. As a result, the validation experiments can be classified into separate or mixed effect tests, where separate phenomena and physics are investigated in different facilities. Physics models refer to the microscale closures, equation sets, and macroscale effective-field model for simulating the prototypical system. Figure 5 shows an example of a validation cubic. A body of evidence ($E1,…,Ei,…$) is collected from experiments with different system decomposition, i.e., SET, mixed effect test, small-scale integral effect test (SS-IET), and large-scale integral effect test. Meanwhile, each evidence $Ei$ is to develop the model and to support the validation over a range of models from subgrid-scale models (closures) to macroscale effective-field model (EFM). 
In this study, the RPP value is proposed to integrate the dimensions of system decomposition and physics model, and it represents the relative importance of each piece of evidence from the perspective of the physics model and system decomposition. Also, it is found that the status of evidence collection and simulation adequacy support is correlated with the filling of the cubic's upper layer ($RPP\to1$) across the physics and system decomposition dimensions.

This study suggests a synthetic model for determining the RPP values based on the ratio of scaling parameters (Sc) in the experiments $[Sc_{Mod_K}]_{EXP}$ and in the applications $[Sc_{Mod_K}]_{APP}$
$RPP=[Sc_{Mod_K}]_{EXP}/[Sc_{Mod_K}]_{APP}$
(7)
Here, $Mod_K$ represents the physical process $K$ calculated from test/experimental conditions, which is also a high-ranked physics in the application conditions. $[Sc_{Mod_K}]_{EXP}$ represents the scaling parameters of $Mod_K$ in experimental conditions, while $[Sc_{Mod_K}]_{APP}$ represents the scaling parameters in the application's conditions. In fluid mechanics, $Sc_{Mod_K}$ can be quantified by dimensionless parameters, like the Reynolds number and Mach number, which describe the relative magnitude of fluid and physical system characteristics, such as density, viscosity, speed of sound, and flow speed. To determine the conditional probability, a weight factor $\psi_{E_i}$ for each evidence $E_i$ is first calculated by Eq. (8) based on the EMU and RPP in the validation cubic model [37]
$\psi_{E_i}\sim m\cdot EMU_J+n\cdot RPP_{K,J}$
(8)

EMU is the experimental measurement uncertainty of a certain experiment, and it is determined based on the level of uncertainty characterization of the experimental measurements. A similar characterization for uncertainty levels can be found in Ref. [36]. $m$ and $n$ are grades that represent the significance of experiment $J$ and the physics $K$, respectively. The experimental significance is affected by the quality and relevance of a given experiment, while the physical significance is ranked according to the PIRT process, where highly ranked phenomena and their corresponding physics should have a high significance factor $n$. Table 5 provides an example of parameter selections and their definitions in the validation cubic decision model.

Figure 6 illustrates both 2D and 3D views of the validation cubic. To demonstrate the effects of the significance factors, ranges of weight factors are plotted against the EMU values for three arbitrarily assigned values of $m$ and $n$. The minimum bound is obtained with RPP equal to 0, while the maximum bound is obtained with RPP equal to 1. It is emphasized that the current formulation is intended to illustrate the qualitative correlations between important decision parameters, i.e., the weight of evidence, and validation evidence, including scaling parameters and experimental VUQ qualities.

After determining the weight factor $\psi_{E_i}$ for each evidence $E_i$, they are normalized to $\tilde{\psi}_{E_i}$ according to Eq. (9) and used as the conditional probabilities between the overall simulation adequacy $SA$ and the individual simulation adequacy $SA_{E_i}$ from separate databases
$P(SA|SA_{E_i})=\psi_{E_i}/\sum_{i=1}^{n}\psi_{E_i}$
(9)
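A minimal Python sketch of Eqs. (8) and (9) is given below. Several points are assumptions rather than statements from this study: the proportionality in Eq. (8) is taken as an equality, all input values are hypothetical, and the `combine` function reflects one possible reading in which the overall belief is the weighted sum of the individual beliefs.

```python
def weight_factor(m, emu, n, rpp):
    """Eq. (8), with the proportionality constant assumed to be 1."""
    return m * emu + n * rpp

def normalize(weights):
    """Eq. (9): divide each weight by the sum over all evidence."""
    total = sum(weights)
    return [w / total for w in weights]

def combine(beliefs, normalized_weights):
    """Assumed reading: overall belief as a weighted sum of individual beliefs."""
    return sum(b * w for b, w in zip(beliefs, normalized_weights))

# Hypothetical decision parameters for two pieces of evidence.
psi = [
    weight_factor(m=2, emu=0.1, n=3, rpp=1.0),   # evidence E1
    weight_factor(m=2, emu=0.1, n=1, rpp=0.6),   # evidence E2
]
psi_tilde = normalize(psi)

# Hypothetical individual adequacy beliefs for E1 and E2.
overall = combine([0.9, 0.5], psi_tilde)
print(psi, psi_tilde, overall)
```

The normalization guarantees that the weights sum to one, so the combined belief stays within the range spanned by the individual beliefs.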

Considering the previous discussion on validation result and data applicability, the general standards for simulation adequacy can be identified as:

Adequate—For the high-rank phenomena, the accuracy in predicting the quantity of interest is acceptable. The simulation can also be confidently used in similar applications with relevant, scaling, and high-quality validation databases (high R/S/U grades or answer yes). The accuracy in predicting corresponding quantities of interest should also be acceptable.

Inadequate—For the high-rank phenomena, the accuracy in predicting the quantity of interest is unacceptable. The simulation cannot be confidently used in similar applications with irrelevant, insufficient, or low-quality validation databases (low R/S/U grades or answer no).

The inadequacy can be caused by reasons including an unacceptable validation result, irrelevant or low-quality data, and insufficient validation data. In classical validations, the simulation is inadequate if any one of these conditions is satisfied. In the PCMQBN framework, the simulation becomes “partially” inadequate, and the degree is defined based on beliefs expressed as probabilities.

### 4.3 Sensitivity Analysis.

Sensitivity analysis is the study of how the uncertainty in the output of a system can be divided and allocated to different sources of uncertainty in its inputs [47]. In Bayesian-network applications, sensitivity analysis investigates the effect of small changes in numerical parameters (prior probability, conditional probability) on the output parameters (posterior probabilities). Since the design and parameter selection of PCMQBN require expert inputs, it is necessary to evaluate the induced uncertainty in the PCMQBN framework. A list of uncertain parameters is designed, including beliefs on the levels of evidence, conditional probabilities, and evidence integration structures. Next, a sensitivity analysis is performed to assess the impact of each parameter on any target node. In this study, an algorithm by Kjaerulff and van der Gaag [48] is used for calculating a complete set of derivatives of the posterior probability distributions over the target nodes with respect to each of the uncertain parameters. Figure 7 shows an example of a tornado plot for the Bayesian network in Fig. 4. Twelve variables are sampled, including the beliefs that the validation result is acceptable (VR = Yes), that the validation data are relevant (DA_R = Yes), and that the validation data are sufficient for scaling (DA_S = Yes), as well as the probability of having an adequate simulation given that the data are applicable and validation result is acceptable (SA = Yes|DA = Yes, VR = No). All parameters are sampled from 0 to 1, and the width of each bar represents the range of belief values on the target attribute (Simulation adequacy = Yes). It can be found that the evidence of validation result has the most significant impact on simulation adequacy. This is reasonable since the comparison between model predictions and experimental data directly represents the simulation's degree of accuracy.
The conditional dependencies of simulation adequacy on data applicability and validation result have more impacts on the target belief than other dependencies.
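The idea behind a tornado plot can be illustrated with a simple one-at-a-time sweep, sketched below in Python. This is a swapped-in technique for illustration only and is not the derivative-based algorithm of Ref. [48]. The network structure (R, S, U feed DA; DA and VR feed SA) and the data-applicability probabilities follow the text, whereas the baseline priors of 0.9 and the two simulation-adequacy entries for VR = No are hypothetical placeholders.

```python
from itertools import product

# P(DA = yes | R, S, U) from the text; any state with R = 0 gives 0.
CPT_DA = {(1, 1, 1): 1.00, (1, 1, 0): 0.60, (1, 0, 1): 0.20, (1, 0, 0): 0.05}

# P(SA = yes | DA, VR), keyed by (DA, VR).
# The two entries with VR = 0 are hypothetical placeholders.
CPT_SA = {(1, 1): 1.00, (0, 1): 0.25, (1, 0): 0.10, (0, 0): 0.00}

# Hypothetical baseline beliefs that each piece of evidence is acceptable.
baseline = {"p_vr": 0.9, "p_r": 0.9, "p_s": 0.9, "p_u": 0.9}

def belief_adequate(p):
    """P(SA = yes) obtained by marginalizing over R, S, U, DA, and VR."""
    p_da = 0.0
    for r, s, u in product((0, 1), repeat=3):
        w = 1.0
        for state, key in ((r, "p_r"), (s, "p_s"), (u, "p_u")):
            w *= p[key] if state else 1.0 - p[key]
        p_da += w * CPT_DA.get((r, s, u), 0.0)
    total = 0.0
    for da, vr in product((0, 1), repeat=2):
        w_da = p_da if da else 1.0 - p_da
        w_vr = p["p_vr"] if vr else 1.0 - p["p_vr"]
        total += w_da * w_vr * CPT_SA[(da, vr)]
    return total

def tornado(base, steps=11):
    """Sweep each parameter from 0 to 1 and record the range of P(SA = yes)."""
    bars = {}
    for name in base:
        values = [belief_adequate({**base, name: k / (steps - 1)}) for k in range(steps)]
        bars[name] = (min(values), max(values))
    # Widest bar first, as in a tornado plot.
    return sorted(bars.items(), key=lambda kv: kv[1][1] - kv[1][0], reverse=True)

ranking = tornado(baseline)
for name, (low, high) in ranking:
    print(f"{name}: [{low:.3f}, {high:.3f}]")
```

Each bar is the interval of the target belief obtained while one parameter is varied and the others are held at their baseline values; the ordering depends on the placeholder values chosen here.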

Sensitivity analysis is a unique feature enabled by formalizing and quantifying the decision-making process. It improves the robustness of the assessment results for simulation adequacy in the presence of uncertainty. It also helps in understanding the correlations between different attributes in the validation decision-making process such that the structure can be continuously refined. Moreover, by identifying the most sensitive attribute, simulation adequacy can be improved by collecting evidence of specific phenomena, improving the model performance for local predictions, and refining the conditional-dependency parameters. In addition, the sensitivity analysis offers a simple strategy against the imprecision issue in classical Bayesian analysis, where the uncertainty is required to be measured by a single (additive) probability, and values are required to be measured by a precise utility function [16]. However, such a requirement is very hard to satisfy in validation since the data are too few to make precise estimates of the probability and the distribution. By performing a sensitivity study on various sources of uncertainty, the standard analysis is applied to all possible combinations of decision inputs, including parameters, evidence, and integration structure. Next, a class of simulation adequacy results is determined, and if the decisions in the class are approximately the same, it can be claimed that a robust result is obtained. Otherwise, the range can be taken as an expression of confidence from the analysis. As a “convenient” approach against the imprecision issue, this method is also known as “Robust Bayes” or “Bayesian sensitivity study” [49,50].

### 4.4 Phase of Simulation Adequacy Assessment.

To manage the progress of validation activities, PCMQBN adequacy assessment, sensitivity analysis, and applications, this study defines three phases of development for grading the quality and confidence in the simulation adequacy results based on the sources and levels of uncertainties. Table 6 defines the phases of development based on the sources and levels of uncertainties in simulation adequacy assessment by PCMQBN. At each stage, evidence needs to be collected and characterized accordingly. Meanwhile, the uncertainty in each evidence, parameter, integration structure, and the final simulation adequacy need to be evaluated. Complete documentation and review of this process mark the completion of each phase. Phase 1 is designed for initial adequacy assessment. Although the uncertainty in final adequacy is large, the objective is to agree on the evidence selection, conditional dependencies, acceptance criteria, and qualitative impacts on the target applications. Meanwhile, it serves as the foundation for phase 2. Most validation activities and decision-making efforts will be conducted in phase 2, and the objective is to have a sufficiently adequate simulation that can support designated decisions with confidence. The quality assurance for the simulation is also required to prevent defects and issues in software products. Phase 3 involves licensing and regulatory activities, and the objective is to provide confirmatory results and define a defense-in-depth in evaluation.

To illustrate the process and help the understanding, a case study is prepared for assessing the simulation adequacy for smoothed particle hydrodynamics (SPH) methods in external-flooding scenarios. A validation process has been performed and discussed in Ref. [51]. The current case study is at the scoping stage, and the decision parameters are subject to sensitivity analysis.

## 5 Adequacy Assessment for Smoothed Particle Hydrodynamics Methods by Predictive Capability Maturity Quantification Using Bayesian Network

To demonstrate the capability of PCMQBN in assessing the adequacy of simulation results, this study assesses the adequacy of SPH simulations in predicting the impact forces during an external-flooding scenario. Evidence is collected from the CSAU/EMDAP framework, which is performed and explained in detail by a separate work [52]. Section 5.1 describes the assessment process for simulation adequacy. Section 5.2 evaluates the sensitivity of simulation-adequacy results by sampling decision parameters and evidence characterizations. Section 5.3 describes the application of simulation adequacy from PCMQBN results.

There are different types of flooding scenarios evaluated by the nuclear industry, and each may have multiple criteria for adequacy acceptance. For this external-flooding example, the purpose of the analysis is to assess whether SPH is adequate for modeling impact forces when simulating the scenario of “floods damage the building structures, enter the room, and cause diesel generator (DG) malfunctioning.” The validation framework CSAU and its regulatory guide EMDAP are used for qualitative adequacy assessment. Figure 8 shows the scheme of the CSAU-guided validation process, and results from all activities lead to a qualitative decision of simulation adequacy for SPH methods in designated applications. The SPH methods and the simulation code, Neutrino, are explained in Ref. [52].

The corresponding QoIs include the response time and the structural loads on systems, structures, and components (SSCs) by floods. The response time is the time for the external floods to reach the DG building and to potentially fail the DGs, while the structural loads are the pressure forces acting on the nuclear SSCs by the floods. This study focuses on the adequacy assessment of SPH methods in predicting the structural loads. An SPH-based software, Neutrino [11], is used to simulate an external-flooding scenario.

A PIRT process is performed to rank the importance of separate phenomena for evaluating the simulation adequacy in the designated scenarios. To estimate the structural loads with sufficient accuracy, the adequacy of SPH methods in simulating the hydrodynamic forces on stationary structures is highly important. As a result, a validation database is constructed with a list of numerical benchmarks, and evidence of simulation accuracy (validation result) is collected by comparing simulation predictions against measurements from each benchmark. At the same time, a scaling analysis is performed to evaluate the applicability of all databases. Table 7 shows a list of benchmarks together with qualitative results for each assessment. In both benchmarks, the peak pressure forces are selected as the quantity of interest, and SPH simulations are performed with different simulation parameters for complete uncertainty quantification. Next, simulation results are compared against the experimental measurements, and an L1 metric (L1 relative error norm) described in Eq. (10) is used to evaluate the accuracy of SPH's performance. The accuracy is acceptable if $L_1$ is less than 20%
$L_1=\left|\frac{QoI_{preds}-QoI_{meas}}{QoI_{meas}}\right|$
(10)

where $QoI_{preds}$ represents the predicted quantity of interest by Neutrino, while $QoI_{meas}$ represents the measurements from experiments. More details about the accident scenario, PIRT process, performance measurement standards, accuracy, and scaling analysis can be found in Ref. [51].
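The metric in Eq. (10) and the 20% acceptance criterion can be written as a short Python sketch. The function names and the numerical values in the usage lines are hypothetical and serve only to show how the criterion is applied.

```python
def l1_relative_error(predicted, measured):
    """Eq. (10): absolute relative error of a predicted quantity of interest."""
    if measured == 0:
        raise ValueError("measured value must be nonzero for a relative error")
    return abs((predicted - measured) / measured)

def accuracy_acceptable(predicted, measured, threshold=0.20):
    """Acceptance criterion from the text: L1 below 20%."""
    return l1_relative_error(predicted, measured) < threshold

# Hypothetical peak pressure forces in arbitrary units.
print(l1_relative_error(110.0, 100.0), accuracy_acceptable(110.0, 100.0))
print(l1_relative_error(130.0, 100.0), accuracy_acceptable(130.0, 100.0))
```

A guard against a zero measurement is included because the relative error is undefined in that case.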

It is found from the dam break benchmark that the SPH method is able to adequately predict the hydrodynamic forces on the stationary object with acceptable accuracy and applicable databases. At the same time, an opposite conclusion is obtained from the moving solids in fluid benchmark since the experimental scale is too small to cover the application scenarios. Therefore, based on the collected databases, it is hard to decide whether SPH methods can predict the hydrodynamic force on solid objects with acceptable accuracy since claims from two benchmarks seem to be contradictory. To reduce uncertainty, PCMQBN is applied to assess the simulation adequacy with the validation cubic model.

### 5.1 Predictive Capability Maturity Quantification Using Bayesian Network Adequacy Assessment.

Since evidence from two experimental databases is used, the weight factor needs to be calculated, and Table 8 shows the assignment of decision parameters based on validation activities from CSAU/EMDAP. Parameter $m$ represents the significance of the dam-break and moving-solid-in-fluid experiments. It ranges from 0 to 3, and it is mainly determined by the quality of the experiment and the collected data. Since the dam break data are collected by extracting graphical points from the literature, their experimental significance is rated as low (=1). The moving solid data are collected directly from experimental facilities, and repeated runs are performed to quantify the experimental uncertainties from sensors, equipment, operating conditions, etc. Therefore, the moving-solid experiment is rated as high (=3). Parameter $n$ represents the significance of the physics in the two experiments, and it is rated according to the PIRT process. Since both experiments investigate the phenomenon of hydrodynamic forces on stationary structures, they are rated as high, and the corresponding value is 3. $[Sc_{Mod_K}]_{EXP}$ and $[Sc_{Mod_K}]_{APP}$ are scaling parameters in experimental and prototypical conditions, respectively. A scaling analysis has been performed and discussed in Ref. [51]. A dimensionless number $x^*$ is suggested for the dam break benchmark according to Eq. (11), where $L$ is the distance between the gate and the solid object, and $h$ is the initial depth of the surface wave
$x^*=h/L$
(11)
For the moving-solid benchmark, the scaling analysis shows that the accuracy in predicting the buoyancy force depends on the particle intensity around the solid object. Therefore, for the moving object calculation, the cube density ratio ($\rho^{*}$, defined in Eq. (12)) and the ratio between cube volume and average particle volume ($V^{*}$, defined in Eq. (13)) are selected as the dimensionless parameters. $\bar{V}_{dp}$ is the average particle volume defined by Eq. (14), and $d$ is the initial particle diameter
$\rho^{*}=\rho_{cube}/\rho_{fluid}$
(12)

$V^{*}=V_{cube}/\bar{V}_{dp}$
(13)

$\bar{V}_{dp}=d^{3}$
(14)
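
As a minimal sketch, the dimensionless scaling parameters of Eqs. (11)-(14) can be evaluated as follows. The function names and the numerical inputs are hypothetical and serve only as illustration; they are not values from the benchmarks.

```python
from dataclasses import dataclass


@dataclass
class MovingSolidScaling:
    rho_star: float  # cube-to-fluid density ratio, Eq. (12)
    v_star: float    # cube volume over average particle volume, Eq. (13)


def dam_break_x_star(h: float, L: float) -> float:
    """Eq. (11): initial wave depth h over gate-to-object distance L."""
    return h / L


def moving_solid_scaling(rho_cube: float, rho_fluid: float,
                         v_cube: float, d: float) -> MovingSolidScaling:
    """Eqs. (12)-(14) for the moving-solid benchmark."""
    v_dp = d ** 3  # average particle volume, Eq. (14)
    return MovingSolidScaling(rho_star=rho_cube / rho_fluid,
                              v_star=v_cube / v_dp)


# Hypothetical inputs in SI units (illustration only).
x_star = dam_break_x_star(h=0.3, L=1.2)
scaling = moving_solid_scaling(rho_cube=500.0, rho_fluid=1000.0,
                               v_cube=1.0e-3, d=0.01)
print(x_star, scaling)
```

Comparing the ranges of such parameters in the validation database against those of the application scenario is what feeds the RPP discussed next.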

Based on the scaling parameters, the RPP can be determined according to Eq. (7). The dam break has an RPP equal to 1 since the range of dimensionless parameters in the validation databases covers those in the application scenario. The EMU is rated according to the characterization of experimental uncertainties (Table 5). Since the dam break data do not have any uncertainty information, the uncertainty level is rated as level 1 (EMU = 0.1). The uncertainty of the moving solid measurements is quantified by repeated runs and rated as level 2 (EMU = 0.01). Finally, all parameters are substituted into Eq. (8), and the weight factor $\psi_{E_i}$ for each benchmark can be determined. The weight factors are then normalized to $\bar{\psi}_{E_i}$ so that they can be used in PCMQBN for calculating the conditional probabilities.

Figure 9 shows the Bayesian network for simulation adequacy assessment based on evidence and decision parameters for two numerical benchmarks. It is found that the belief level on the claim that the SPH method is adequate in predicting the hydrodynamic force is 100% when the simulation adequacy is estimated solely by evidence from the dam break benchmark. This finding is consistent with the qualitative result given the simulation accuracy and data applicability for the dam break benchmark. Meanwhile, the belief level on the same claim becomes 36% when the simulation adequacy is estimated by evidence from the moving-solid experiment. This result is similar to the qualitative results where the simulation is not adequate in simulating pressures in the moving-solid benchmarks. Furthermore, it is found that the belief level for an adequate SPH simulation is 83% when evidence from both benchmarks is used. Compared to the qualitative results, there is higher confidence that the SPH simulation is adequate for the designated purposes based on available evidence. Also, the uncertainty of simulation adequacy is less than that from the qualitative assessment since the contradictory results suggest a noninformative adequacy distribution.
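
The combined belief can be illustrated by marginalizing over the two benchmark nodes. The sketch below is one reading of the network, not the authors' GeNIe model: it assumes the two parent nodes are independent, takes the conditional probability 0.73 reported in Sec. 5.2, and fills the remaining table entries with labeled assumptions. Under these assumptions it gives a belief of about 0.83, in line with the reported 83%.

```python
def prob_adequate(p_dam: float, p_moving: float, cpt: dict) -> float:
    """P(SA = adequate), summing over the states of two independent parents."""
    total = 0.0
    for dam_ok, p_d in ((True, p_dam), (False, 1.0 - p_dam)):
        for mov_ok, p_m in ((True, p_moving), (False, 1.0 - p_moving)):
            total += p_d * p_m * cpt[(dam_ok, mov_ok)]
    return total


# Conditional probability table P(SA = adequate | SA_DAM, SA_Moving).
cpt = {
    (True, True): 1.0,    # assumption: both benchmarks adequate
    (True, False): 0.73,  # reported in Sec. 5.2 (RPP-based weight)
    (False, True): 0.27,  # assumption; has no effect because P(SA_DAM) = 1
    (False, False): 0.0,  # assumption: both benchmarks inadequate
}

# Beliefs from the individual benchmarks as reported in the text.
belief = prob_adequate(p_dam=1.0, p_moving=0.36, cpt=cpt)
print(f"P(SA = adequate) = {belief:.2f}")
```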

### 5.2 Sensitivity Analysis.

Considering the uncertainty in assigning decision parameters, a sensitivity study is performed by varying all conditional probabilities by 10% of their current values. Figure 10 shows the sensitivity tornado diagram. The relative importance of the two validation databases, i.e., $P(SA=\text{Adequate}\mid SA_{DAM}=\text{Adequate},\ SA_{Moving}=\text{Inadequate})$, has the highest impact on the final simulation adequacy. When this conditional probability is varied from 0.438 to 1 (currently at 0.73 based on the RPP model), the probability of having an adequate simulation ranges from 0.64 to 1.
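
The reported range can be checked with a short sweep over this conditional probability. The sketch assumes that the dam break node is adequate with probability 1, that the moving-solid node is adequate with probability 0.36, and that the simulation is adequate with certainty when both benchmarks are adequate; the last point is an assumption, not a value stated in the text.

```python
P_MOVING_ADEQUATE = 0.36  # belief from the moving-solid benchmark (Sec. 5.1)


def prob_adequate(p_cond: float, p_both: float = 1.0) -> float:
    """P(SA = adequate) when the dam break node is adequate with certainty.

    p_cond: P(SA = adequate | dam adequate, moving inadequate)
    p_both: P(SA = adequate | dam adequate, moving adequate), assumed 1.0
    """
    return P_MOVING_ADEQUATE * p_both + (1.0 - P_MOVING_ADEQUATE) * p_cond


for p_cond in (0.438, 0.73, 1.0):
    print(f"p_cond = {p_cond:.3f} -> P(adequate) = {prob_adequate(p_cond):.2f}")
```

Under these assumptions the sweep gives about 0.64, 0.83, and 1.00, which matches the range quoted above.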

### 5.3 Application of Predictive Capability Maturity Quantification Using Bayesian Network Adequacy Results.

To further demonstrate how PCMQBN results can be used in risk-informed validation (Fig. 2), a risk-informed safety analysis is performed to evaluate potential damages to SSCs of NPPs by water waves. SPH simulations are performed to determine the structural loads by a wave for 60 cycles. The cycle is defined based on the frequency of hydrodynamic pressures by the surface wave. Figure 11 shows the predicted time transient of hydrodynamic pressure $Pr(t)$ and impulse, and 1 cycle lasts for 9.09 s. The impulse is calculated by
$Im(t)=\int_{T_0}^{T_0+t}Pr(t)\,dt$
(15)

To evaluate damages from structural loads in each wave cycle, the pressure-impulse (P-I) diagram is calculated for each cycle. The pressure-impulse diagram is determined by finding the maximum pressure and maximum impulse in each cycle. In structural engineering, the P-I diagram is used to describe a structure's response to blast load. Depending on the P-I values in each cycle, damages to the structure by surface waves can be characterized by four damage levels as in Fig. 12. This study uses the P-I diagrams for reinforced concrete (RC) structures, and the curve of damage levels is made based on experimental data from Ref. [53].
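
A minimal sketch of how one P-I point per cycle could be extracted from a pressure history is given below. It assumes uniform sampling with a whole number of samples per cycle, takes $T_0$ as the start of each cycle, and uses a synthetic half-sine pressure signal with a hypothetical amplitude in place of the SPH output.

```python
import numpy as np

CYCLE_PERIOD_S = 9.09  # cycle length reported in the text


def cumulative_impulse(t, p):
    """Trapezoidal approximation of Eq. (15), with T0 taken as t[0]."""
    increments = 0.5 * (p[1:] + p[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(increments)))


def pi_points(t, p, samples_per_cycle):
    """Return one (max pressure, max impulse) pair per complete cycle."""
    n_cycles = (len(t) - 1) // samples_per_cycle
    points = []
    for k in range(n_cycles):
        # Include the end point so that each window spans a full period.
        window = slice(k * samples_per_cycle, (k + 1) * samples_per_cycle + 1)
        impulse = cumulative_impulse(t[window], p[window])
        points.append((float(p[window].max()), float(impulse.max())))
    return points


# Synthetic half-sine pressure pulses (hypothetical amplitude, in Pa).
samples_per_cycle = 200
t = np.linspace(0.0, 3 * CYCLE_PERIOD_S, 3 * samples_per_cycle + 1)
p = 1000.0 * np.clip(np.sin(2.0 * np.pi * t / CYCLE_PERIOD_S), 0.0, None)

for k, (p_max, im_max) in enumerate(pi_points(t, p, samples_per_cycle), start=1):
    print(f"cycle {k}: P_max = {p_max:.1f} Pa, I_max = {im_max:.1f} Pa*s")
```

Each pair can then be placed on the damage-level curves of the P-I diagram.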

Based on the adequacy definition, the accuracy is acceptable when the L1 error in predicting hydrodynamic pressure is less than 20%. It is further assumed that when the simulation is not adequate, either due to unacceptable error or inapplicable data, the prediction has an L1 error of at most 100%. As a result, error bands are added to the SPH predictions by
$y=Y_{prd}+\varepsilon_{r}Y_{prd}$
(16)
$Y_{prd}$ is the SPH prediction for the hydrodynamic pressure and impulse, and $\varepsilon_{r}$ is the maximum L1 error set by the requirements: $\varepsilon_{r}$ equals 20% when the simulation is adequate and the accuracy is acceptable; $\varepsilon_{r}$ equals 100% when the simulation is not adequate. When the simulation adequacy is uncertain, the prediction is linearly assembled based on the confidence
$y_{en}=P(adq)\cdot y_{adq}+(1-P(adq))\cdot y_{inadq}$
(17)

$y_{en}$ is the ensemble prediction; $P(adq)$ is the confidence in the claim that the simulation is adequate; $y_{adq}$ is the SPH prediction with error bands when the simulation is adequate ($\varepsilon_{r}=20\%$); $y_{inadq}$ is the prediction with error bands when the simulation is not adequate ($\varepsilon_{r}=100\%$). Figure 13 shows the distribution of P-I values on the damage-level plots for all 60 cycles in four conditions: (1) the simulation is 100% adequate; (2) the simulation is 100% inadequate; (3) the simulation is 50% adequate and 50% inadequate; (4) the simulation is 83% adequate and 17% inadequate.
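
Eqs. (16) and (17) amount to a confidence-weighted scaling of the SPH prediction. The sketch below evaluates that scaling for the four conditions; the function names are hypothetical, and a unit prediction is used so that the output is the multiplier applied to the pressure or impulse.

```python
def with_error_band(y_prd: float, eps_r: float) -> float:
    """Eq. (16): prediction plus its maximum relative L1 error."""
    return y_prd + eps_r * y_prd


def ensemble_prediction(y_prd: float, p_adq: float,
                        eps_adq: float = 0.20, eps_inadq: float = 1.00) -> float:
    """Eq. (17): linear blend of the adequate and inadequate predictions."""
    y_adq = with_error_band(y_prd, eps_adq)
    y_inadq = with_error_band(y_prd, eps_inadq)
    return p_adq * y_adq + (1.0 - p_adq) * y_inadq


# Multipliers on a unit prediction for the four conditions in the text.
for p_adq in (1.0, 0.0, 0.5, 0.83):
    factor = ensemble_prediction(1.0, p_adq)
    print(f"P(adq) = {p_adq:.2f} -> y_en / Y_prd = {factor:.3f}")
```

The resulting multipliers are 1.2, 2.0, 1.6, and about 1.336, so the 83/17 case shifts the P-I points far less than the 50/50 case does.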

The number of cycles in each damage level depends on the simulation adequacy result. Table 9 shows the number of cycles in each damage level for the four distributions of P-I values based on the simulation adequacy results. If no validation decision is made, then, on the one hand, when the simulation is presumed to be 100% inadequate, all damages are predicted to be severe; on the other hand, when the simulation is presumed to be 100% adequate, there are no severe damages, and 21 out of 60 cycles result in light damages. If validation activities are performed and a qualitative validation decision is made with 50/50 adequacy results, 26 out of 60 cycles (43.3%) turn out to be severe. However, when a quantitative validation decision is made with 83/17 adequacy results based on the PCMQBN framework, all cycles turn out to be moderate.

To further demonstrate how these predictions affect the safety analysis, an expected loss $C$ is calculated based on a table of synthetic monetary losses and the probability of each damage level
$C=\int P_{DL}\cdot C\,dC=\sum_{i=1}^{4}P_{DL}(i)\cdot C_{DL}(i)$
(18)

$C_{DL}(i)$ is the consequence in monetary losses for damage level $i$, and a synthetic value is assigned in Table 9; $P_{DL}(i)$ is the probability that the predicted cycles fall into damage level $i$, and it is determined in Table 9. The index $i$ ranges from 1 to 4 and represents the four damage levels from no damage to severe damage. Table 10 shows the value of the expected loss $C$ based on Eq. (18) and the corresponding values in Table 9.
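
The discrete form of Eq. (18) can be sketched as follows. The cycle counts correspond to the two cases that the text describes completely (all 60 cycles severe, all 60 cycles moderate); the monetary losses are hypothetical placeholders and are not the synthetic values of Table 9.

```python
DAMAGE_LEVELS = ("none", "light", "moderate", "severe")  # i = 1..4


def expected_loss(cycle_counts, costs):
    """Eq. (18): sum of P_DL(i) * C_DL(i), with P_DL(i) taken as cycle fractions."""
    if len(cycle_counts) != len(costs):
        raise ValueError("one cost is needed per damage level")
    total = sum(cycle_counts)
    if total == 0:
        raise ValueError("at least one cycle is required")
    return sum(n / total * c for n, c in zip(cycle_counts, costs))


# Hypothetical monetary loss per damage level (NOT the values of Table 9).
hypothetical_costs = (0.0, 10.0, 40.0, 100.0)

# Cycle counts per damage level, ordered as in DAMAGE_LEVELS.
all_severe = (0, 0, 0, 60)    # simulation presumed 100% inadequate
all_moderate = (0, 0, 60, 0)  # 83/17 adequacy result from PCMQBN

print(expected_loss(all_severe, hypothetical_costs))
print(expected_loss(all_moderate, hypothetical_costs))
```

With any cost table that increases with damage level, moving cycles from the severe to the moderate level lowers the expected loss, which is the effect examined in Table 10.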

If the decision maker is willing to accept the potential risks from simulation errors and completely trusts the simulation (100% simulation adequacy), the expected loss is the smallest, which reflects an optimistic attitude toward the simulation prediction errors. However, if the decision maker is not willing to accept any risk from simulation errors, the expected loss is the greatest, which reflects a conservative attitude toward the simulation and its prediction errors. With the simulation adequacy result assessed by PCMQBN (83/17), the expected loss is reduced by 30% compared to the qualitative and implicit decision framework (50/50) in classical validations. Assuming the goal is to keep the expected loss below \$60, the currently available evidence is sufficient to achieve this target. With the qualitative decision framework, however, additional validation efforts are needed to further improve the confidence in simulation adequacy. Therefore, compared to the qualitative decision analysis, the PCMQBN framework is able to reduce costs by effectively conducting and planning validation activities.

## 6 Conclusion

In this study, the PCMQBN framework is developed to formalize and quantify the validation decision-making process in mathematical language. The objective is to support the decision-making process for simulation adequacy in a transparent, consistent, and improvable manner. PCMQBN first formalizes the mathematical representation of simulation adequacy as a triplet of scenario, predictive capability level, and belief. Next, argumentation theory is employed to formalize the decision-making process in validation as an argument for simulation adequacy that is based on evidence from the validation frameworks and activities. In this process, all related evidence is characterized such that its representation is consistent with the definition of simulation adequacy. All evidence is then quantified, where the predictive capability is represented by maturity levels and the belief is quantified by probabilities. Bayes' theorem is used to integrate the quantified evidence, and the Bayesian network is used to represent this integration by directed acyclic graphs. To ensure the consistency of network connections and causal dependence on well-known physics, rules, and knowledge, a synthetic model is also suggested for evaluating the conditional probability among all nodes in the network by calculating the Reactor Prototypicality Parameter. A sensitivity analysis is performed to evaluate the impact of conditional probability and decision parameters. It is found that the conditional dependency between simulation adequacy and validation result has higher impacts than those between [R]elevancy/[S]caling/[U]ncertainty grade and data applicability. It is also found that the relative weights of evidence from different databases have large impacts on the final data adequacy. Therefore, during a validation decision-making process, the correlations and dependencies among different databases and attributes need to be evaluated more carefully than accuracy assessments and scaling analysis for separate models and databases. Based on the sources and levels of uncertainty, three phases of development are defined for documenting and grading the quality of the assessment process and simulation adequacy results.

## Acknowledgment

This work is fully supported by the U.S. Department of Energy via the Integrated Research Project on “Development and Application of a Data-Driven Methodology for Validation of Risk-Informed Safety Margin Characterization Models” under the grant DE-NE0008530 and the ARPA-E MEITNER program on “Development of a Nearly Autonomous Management and Control System for Advanced Reactors” under the grant DE-AR0000976. The author would also like to acknowledge the comments and suggestions by Dr. Robert Youngblood and Mr. Steven Prescott at Idaho National Laboratory, Mr. Ram Sampath and Mr. Niels Montanari at CentroidLAB Inc., and Dr. Matthieu Andre and Dr. Philippe Bardet at George Washington University.

## Funding Data

• Advanced Research Projects Agency—Energy (Grant No. DE-AR0000976; Funder ID: 10.13039/100006133).

• U.S. Department of Energy (Grant Nos. IRP-16-10918, DE-NE0008530; Funder ID: 10.13039/100000015).

## Nomenclature

• BN = Bayesian network

• CFD = computational fluid dynamics

• CSAU = code scaling, applicability, and uncertainty

• DA = data applicability

• DG = diesel generator

• DL = damage level

• EFM = effective-field model

• EMDAP = evaluation model development and assessment process

• EMV = expected monetary value

• GSN = goal structuring notation

• IET = integral effect test

• M&S = modeling and simulation

• NPP = nuclear power plant

• NRC = Nuclear Regulatory Commission

• NRMSE = normalized root mean squared error

• PCMM = predictive capability maturity model

• PCMQ = predictive capability maturity quantification

• PCMQBN = PCMQ using BN

• P-I = pressure-impulse

• PIRT = phenomenon identification and ranking table

• QoI = quantity of interest

• R/S/U = relevancy/scaling/uncertainty

• RPP = reactor prototypicality parameter

• SA = simulation adequacy

• SET = separate effect test

• SPH = smoothed particle hydrodynamics

• SSC = system, structure, and component

• SS-IET = small-scale IET

• VR = validation result

## References

1. Hess, S., Dinh, N., Gaertner, J., and Szilard, R., 2008, “Risk-Informed Safety Margin Characterization,” Proceedings of the 17th International Conference on Nuclear Engineering, Brussels, Belgium, July 12–16, Paper No. ICONE17-75064, pp. 11–17.
2. Theofanous, T., 1996, “On the Proper Formulation of Safety Goals and Assessment of Safety Margins for Rare and High-Consequence Hazards,” Reliab. Eng. Syst. Saf., 54(2–3), pp. 243–257. doi:10.1016/S0951-8320(96)00079-8
3. Smith, C., Rabiti, C., Martineau, R., and Szilard, R., 2015, Risk-Informed Safety Margins Characterization (RISMC) Pathway Technical Program Plan, Idaho National Laboratory, Idaho Falls, ID.
4. Smith, C., Schwieder, D., Phelan, C., Bui, A., and Bayless, P., 2012, Risk Informed Safety Margin Characterization (RISMC) Advanced Test Reactor Demonstration Case Study, Idaho National Laboratory, Idaho Falls, ID.
5. Zuber, N., Wilson, G. E., Ishii, M., Wulff, W., Boyack, B. E., Dukler, A. E., Griffith, P., Healzer, J. M., Henry, R. E., Lehner, J. R., Levy, S., Moody, F. J., Pilch, M., Sehgal, B. R., Spencer, B. W., Theofanous, T. G., and Valente, J., 1998, “An Integrated Structure and Scaling Methodology for Severe Accident Technical Issue Resolution: Development of Methodology,” Nucl. Eng. Des., 186(1–2), pp. 1–21. doi:10.1016/S0029-5493(98)00215-5
6. U.S. NRC, 2017, “50.46 Acceptance Criteria for Emergency Core Cooling Systems for Light-Water Nuclear Power Reactors,” U.S. NRC, Washington, DC, accessed Feb. 18, 2020, https://www.nrc.gov/reading-rm/doc-collections/cfr/part050/part050-0046.html
7. U.S. NRC, 2005, Transient and Accident Analysis Methods, U.S. Nuclear Regulatory Commission, Washington, DC.
8. Oberkampf, W., Pilch, M., and Trucano, T., 2007, Predictive Capability Maturity Model for Computational Modeling and Simulation, Sandia National Laboratories, Albuquerque, NM, Report No. SAND2007-5948.
9. ASME, 2006, “Guide for Verification and Validation in Computational Solid Mechanics,” ASME Standard No. VV10-2006(R2016).
10. ASME, 2009, “Standard for Verification and Validation in Computational Fluid Dynamics and Heat Transfer,” ASME Standard No. VV20-2009(R2016).
11. Sampath, R., 2018, “Neutrino Document (Release 1.0),” Centroid LAB Inc., Los Angeles, CA, accessed Oct. 10, 2018, https://media.readthedocs.org/pdf/neutrinodocs/master/neutrinodocs.pdf
12. Lin, L., 2016, Assessment of the Smoothed Particle Hydrodynamics Method for Nuclear Thermal-Hydraulic Applications, North Carolina State University, Raleigh, NC.
13. Violeau, D., 2012, Fluid Mechanics and the SPH Method: Theory and Applications, Oxford University Press, Oxford, UK.
14. Zhu, Q., Hernquist, L., and Li, Y., 2015, “Numerical Convergence in Smoothed Particle Hydrodynamics,” Astrophys. J., 800(1), p. 6. doi:10.1088/0004-637X/800/1/6
15. Clarke, E., and Wing, J., 1996, “Formal Methods: State of the Art and Future Directions,” ACM Comput. Surv., 28(4), pp. 626–643. doi:10.1145/242223.242257
16. Walley, P., 1990, Statistical Reasoning With Imprecise Probabilities, Chapman and Hall/CRC, Boca Raton, FL.
17. Youngblood, R., 2017, “Recommendations on Validation of Models to Be Developed,” Development and Application of a Data-Driven Methodology for Validation of Risk-Informed Safety Margin Characterization Models, North Carolina State University, Raleigh, NC.
18. Kahneman, D., 2011, Thinking, Fast and Slow, Farrar, Straus and Giroux, New York.
19. Oberkampf, W., and Roy, C., 2010, Verification and Validation in Scientific Computing, Cambridge University Press, Cambridge, UK.
20. Athe, P., 2018, A Framework for Predictive Capability Maturity Assessment of Simulation Codes, North Carolina State University, Raleigh, NC.
21. Kaizer, J., Anzalone, R., Brown, E., Panicker, M., Haider, S., Gilmer, J., Drzewiecki, T., and Attard, A., 2018, Credibility Assessment Framework for Critical Boiling Transition Models, U.S. NRC, Washington, DC.
22. Zhang, R., and , S., 2003, “Bayesian Methodology for Reliability Model Acceptance,” Reliab. Eng. Syst. Saf., 80(1), pp. 95–103. doi:10.1016/S0951-8320(02)00269-7
23. Smith, R., 2013, Uncertainty Quantification: Theory, Implementation, and Applications, Society for Industrial and Applied Mathematics.
24. Liu, Y., and Dinh, N., 2019, “Validation and Uncertainty Quantification for Wall Boiling Closure Relations in Multiphase-CFD Solver,” Nucl. Sci. Eng., 193(1–2), pp. 81–99. doi:10.1080/00295639.2018.1512790
25. Wu, X., Shirvan, K., and Kozlowski, T., 2019, “Demonstration of the Relationship Between Sensitivity and Identifiability for Inverse Uncertainty Quantification,” J. Comput. Phys., 396, pp. 12–30. doi:10.1016/j.jcp.2019.06.032
26. Fletcher, C., Bayless, P., Davis, C., Ortiz, M., Sloan, S., Shaw, R., Shultz, R., Slater, C., Johnsen, G., , J., Ghan, L., and Bessette, D., 1997, Adequacy Evaluation of RELAP5/MOD3, Version 3.2.1.2. For Simulating AP600 Small Break Loss-of-Coolant Accidents, Idaho National Engineering and Environment Laboratory, Idaho Falls, ID.
27. Kaplan, S., and Garrick, B., 1981, “On the Quantitative Definition of Risk,” Risk Anal., 1(1), pp. 11–27. doi:10.1111/j.1539-6924.1981.tb01350.x
28. Sun, L., 2012, Establishing Confidence in Safety Assessment Evidence, Department of Computer Science, University of York, York, UK.
29. U.K. MoD, 1991, The Procurement of Safety Critical Software in Defence Equipment, UK Ministry of Defence, London, UK, Standard No. 00-55.
30. Toulmin, S. E., 2003, The Use of Argument, Cambridge University Press, Cambridge, UK.
31. NASA, 2008, Standard for Models and Simulation (NASA-HDBK-7009), National Aeronautics and Space Administration, Washington, DC.
32. Harmon, S., and Youngblood, S., 2005, “A Proposed Model for Simulation Validation Process Maturity,” J. Defense Model. Simul., 2(4), pp. 179–190. doi:10.1177/154851290500200402
33. Logan, R., and Nitta, C., 2003, “Validation, Uncertainty, and Quantitative Reliability at Confidence (QRC),” AIAA Paper No. 2003-1337. doi:10.2514/6.2003-1337
34. Pilch, M., Trucano, T., Peercy, D., Hodges, A., and Froehlich, G., 2004, Concepts for Stockpile Computing, Sandia National Laboratories, Report No. SAND2004-2479.
35. NASA, 2006, Interim NASA Technical Standard for Models and Simulations, National Aeronautics and Space Administration, Washington, DC, Report No. NASA-STD-(I)-7009.
36. Maupin, K., Swiler, L., and Porter, N. W., 2018, “Validation Metric for Deterministic and Probabilistic Data,” J. Verif. Valid. Uncertainty Quantif., 3(3), p. 031002. doi:10.1115/1.4042443
37. Dinh, N., 2013, “Validation Data to Support Advanced Code Development,” 15th International Topical Meeting on Nuclear Reactor Thermal Hydraulics (NURETH-15), Pisa, Italy, May 12–15, Paper No. 676.
38. Bodda, S., Gupta, A., and Dinh, N., 2020, “Risk Informed Validation Framework for External Flooding Scenario,” Nucl. Eng. Des., 356, p. 110377. doi:10.1016/j.nucengdes.2019.110377
39. Olivier, T., and Nowlen, S., 2008, A Phenomena Identification and Ranking Table (PIRT) Exercise for Nuclear Power Plant Fire Modeling Applications (NUREG/CR-6978), U.S. NRC, Washington, DC.
40. Yurko, J., and Buongiorno, J., 2011, “Quantitative Phenomena Identification and Ranking Table (QPIRT) for Reactor Safety Analysis,” Trans. Am. Nucl. Soc., Vol. 104, Hollywood, FL, June 26–30.
41. OECD Nuclear Energy Agency, 2016, Scaling in System Thermal-Hydraulics Applications to Nuclear Reactor Safety and Design: A State-of-the-Art Report, OECD NEA, Paris, France.
42. Athe, P., and Dinh, N., 2017, A Framework to Support Assessment of Predictive Capability Maturity of Multiphysics Simulation Codes, Consortium for Advanced Simulation of LWRs (CASL), Oak Ridge, TN.
43. Spriggs, J., 2012, GSN—The Goal Structuring Notation, Springer-Verlag, London.
44. Pearl, J., 1988, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Francisco, CA.
45. D'Auria, F., and Galassi, G., 2010, “Scaling in Nuclear Reactor System Thermal-Hydraulics,” Nucl. Eng. Des., 240, pp. 3267–3293. doi:10.1016/j.nucengdes.2010.06.010
46. BayesFusion, LLC, 2019, GeNie Modeler User Manual, BayesFusion, LLC, Pittsburgh, PA.
47. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., and Tarantola, S., 2008, Global Sensitivity Analysis: The Primer, Wiley, Hoboken, NJ.
48. Kjaerulff, U., and Gaag, L. C. V. D., 2000, “Making Sensitivity Analysis Computationally Efficient,” Proceedings of the 16th Annual Conference on Uncertainty in Artificial Intelligence, Stanford, CA, June 30–July 3, pp. 317–325.
49. Berger, J., 1984, “The Robust Bayesian Viewpoint (With Discussion),” Robust. Bayesian Anal., North-Holland, Amsterdam, The Netherlands, pp. 63–144.
50. Berger, J., 1985, Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York.
51. Lin, L., Prescott, S., Montanari, N., Sampath, R., Bao, H., and Dinh, N., 2020, “Adequacy Evaluation of Smoothed Particle Hydrodynamics Methods for Simulating the External-Flooding Scenario,” Nucl. Eng. Des., 365, p. 110720.
52. Lin, L., 2019, Development and Assessment of Smoothed Particle Hydrodynamics Method for Analysis of External Hazards, North Carolina State University, Raleigh, NC.
53. Abedini, M., Mutalib, A., Raman, S., Alipour, R., and Akhlaghi, E., 2019, “Pressure-Impulse (P-I) Diagrams for Reinforced Concrete (RC) Structures: A Review,” Arch. Comput. Methods Eng., 26, pp. 733–767. doi:10.1007/s11831-018-9260-9