## Abstract

Addressing safety concerns in commercial nuclear power plants (NPPs) often requires the use of advanced modeling and simulation (M&S) in association with probabilistic risk assessment (PRA). Advanced M&S are also needed to accelerate the analysis, design, licensing, and operationalization of advanced nuclear reactors. However, before a simulation model can be used for PRA, its validity must be adequately established. The objective of this research is to develop a systematic and scientifically justifiable validation methodology, namely, probabilistic validation (PV), to facilitate the validity evaluation (especially when validation data are not sufficiently available) of advanced simulation models that are used for PRA in support of risk-informed decision-making and regulation. This paper, the first in a two-paper series on PV, provides the theoretical foundation and methodological platform. The second paper applies the PV methodological platform in a case study of fire PRA of NPPs. Although the PV methodology is explained in the context of PRA in the nuclear industry, it is grounded in a cross-disciplinary review of the literature and is thus applicable to the validation of simulation models in general, not necessarily those associated with PRA or nuclear applications.

## 1 Introduction and Statement of Objectives

Emergent safety concerns in nuclear power plants (NPPs) commonly involve complex phenomena whose outcomes are very likely to change as a result of spatiotemporal variations in their compounding events [1]. Addressing these safety concerns in a risk-informed decision-making framework would, therefore, require an adequate degree of spatiotemporal resolution in the risk estimation. As a result, the use of advanced M&S to capture those complex and highly spatiotemporal phenomena more realistically within the probabilistic risk assessment (PRA) of existing NPPs has recently received growing attention from academia, industry, and regulatory agencies. Advanced M&S have also been used to accelerate the analysis, design, and operationalization of advanced nuclear reactors. The validity of a simulation model, however, must be adequately established before its use in PRA to support risk-informed decision-making. To facilitate the validity evaluation of advanced simulation models and to support their use for PRA in support of risk-informed decision-making and regulation, this research develops a systematic and scientifically justifiable validation methodology, i.e., the probabilistic validation (PV) methodology.

Many approaches for validating simulation models use empirical validation where model predictions are directly compared against validation data. The validation data refer to data obtained from model validation experiments whose design and execution allow for capturing the essential physics of interest and measuring all information required by the model to simulate the physics [2]. Such information includes initial and boundary conditions, material properties, and system excitation, as well as measurements of the system response quantities of interest [2]. However, conducting the empirical validation for the simulation models used in PRA at existing NPPs and to support the development of advanced nuclear reactors would be challenging, if not impossible, due to (i) a lack of validation data at the model output level; (ii) available data at lower levels (e.g., model inputs) being subject to various sources of uncertainty; and (iii) available data not being fully applicable for the specific context and conditions under which the model is to be used. When adequate validation data are not available, validity of simulation predictions is commonly assessed using subjective and qualitative metrics such as face validation, believability, plausibility, or reasonableness [3]. Nevertheless, to effectively support the use of simulation models in PRA and risk-informed decision-making applications, quantitative metrics of validity are more useful because they are more informative.

The PV methodology advances the scientific usage of uncertainty and acceptability criteria to facilitate the validity evaluation of simulation predictions, especially when validation data are not sufficiently available. The uncertainty associated with simulation predictions can be categorized into two groups: (i) epistemic uncertainty, which arises from a lack of knowledge regarding the true values of the predicted quantities; and (ii) aleatory uncertainty, rooted in the inherent stochasticity of the physical phenomena underlying the quantities of interest. The PV methodology utilizes the epistemic uncertainty associated with the simulation-predicted quantities as a measure of their “degree of confidence.” This methodology aligns with the National Research Council's recommendation that models should be made “*as useful as possible by quantifying how wrong they are*” [4]. In the PV methodology, the validity of a simulation prediction used for PRA is determined by: (1) the magnitude of epistemic uncertainty (i.e., representing the degree of confidence) in the simulation prediction, calculated using a comprehensive uncertainty analysis that can quantify and aggregate all dominant sources of uncertainty involved in the development and usage of the simulation model; and (2) the result of an acceptability evaluation that determines whether the total uncertainty (including both aleatory and epistemic uncertainties) associated with the simulation prediction is acceptable for the specific application of interest (e.g., PRA).

When using the PV methodology in the PRA context, aleatory and epistemic uncertainties involved in the development of simulation model predictions used in various levels of causality of the PRA model are identified and characterized. These uncertainty sources are then propagated through the PRA model to estimate the epistemic uncertainty and total uncertainty in the plant risk estimates. These uncertainty measures, when combined with an advanced importance measure analysis [5–7], help identify the most critical uncertainty contributors. This then helps inform decision-makers as to how resources can be efficiently utilized to improve the validity of the simulation prediction. Thus, to execute the PV methodology for supporting risk-informed decision-making applications, there is a need for a “unified” platform that connects the underlying simulations to the plant-level risk. For this purpose, the integrated PRA (I-PRA) methodological framework [1,8,9] (Fig. 1) previously developed by the authors is leveraged. I-PRA integrates simulation models of the underlying human and physical failure mechanisms (“b,” “c,” and “d” in Fig. 1) with the existing plant PRA (“f” in Fig. 1) through a probabilistic interface (i.e., interface module; “e” in Fig. 1). This integration helps add more realism to the plant risk estimates while avoiding significant changes to the PRA model structure and its associated costs (e.g., peer review). The PV methodology is an essential feature of the interface module, alongside dependency treatment and Bayesian updating [10], to help validate and justify the use of these simulation modules.

In current nuclear regulatory and industry practices, simulation models are used in various applications to estimate PRA inputs. For instance, Fire PRA uses a fire progression model to estimate fire-induced cable damage probabilities in an “offline” manner, and these are then plugged into the PRA software as inputs to quantify the plant PRA model. In this fashion, the communication between the plant PRA model and the underlying physical and human models is not causal (or explicit) and is only based on the basic event probabilities. This “passive” communication does not provide the capability of tracing the input–output relationships between plant risk metrics and the input parameters of the simulation models. I-PRA, however, creates a “unified” connection between the plant PRA and the underlying physics and human performance simulation models. In this “unified” connection, the communication of data and information among multiple levels of causality (e.g., cable level, component level, and system level) is carried out in a cohesive computational platform. This way, the relationship between the plant risk metrics and the input parameters associated with the underlying physics and human performance can be explicitly traced and captured. This “unified” connection has also allowed for adding a global importance measure analysis method [5,6] into I-PRA to enable the ranking of significant sources of uncertainty at the level of underlying physical failure mechanisms and human performance. This importance ranking is based on the contribution of these sources of uncertainty to the total uncertainty in the plant risk estimates. The global importance measure method can account for (i) uncertainty associated with the input parameters, (ii) nonlinearity and interaction effects in the model, and (iii) uncertainty associated with the model outputs. 
I-PRA has been used in two industry-academia collaborative projects with the South Texas Project Nuclear Operating Company (STPNOC): risk-informed resolution of generic letter 2004-02 [1,9] and Fire PRA [8,11].

The preliminary development of the PV methodology [10,12] only quantified uncertainties associated with the choice of sampling techniques and sample size used for the sampling-based uncertainty quantification (#1 in module “e” in Fig. 1). This current research provides a theoretical foundation for PV, established based on five key characteristics, and develops a methodological platform that operationalizes this theoretical foundation.

In this paper, Sec. 2 covers a cross-disciplinary literature review to generate scientific justification for using epistemic uncertainty as a supporting quantitative measure for validating simulation model prediction, especially when empirical validation is challenging. This literature review supports the PV theoretical foundation established in Sec. 3 as well as the PV methodological platform developed in Sec. 4. Section 5 covers concluding remarks and future work. This paper is the first in a series of two journal papers related to PV. The second paper implements the PV methodological platform for a case study of fire PRA of NPPs.

All the validation approaches reviewed in Sec. 2 leverage uncertainty analysis to some extent for assessing errors and uncertainties inherent in activities of the M&S development process. Compared to these existing approaches, the PV methodology has a unique combination of five key characteristics:

*Characteristic #1*: The PV methodology offers a multilevel, multimodel-form validation analysis that can integrate data and uncertainty analysis at multiple levels of the system hierarchy to support the degree of confidence evaluation.

*Characteristic #2*: The PV methodology separates aleatory and epistemic uncertainties and, when possible, differentiates between two forms of epistemic uncertainty (i.e., statistical variability and systematic bias) while considering their influence on the uncertainty in the simulation prediction.

*Characteristic #3*: The PV methodology uses risk-informed acceptability criteria, along with a predefined guideline, to evaluate the acceptability of the simulation prediction.

*Characteristic #4*: The PV methodology combines uncertainty analysis with a two-layer sensitivity analysis to streamline the validity assessment and to efficiently improve the degree of confidence in the simulation prediction.

*Characteristic #5*: The PV methodology is equipped with a theoretical causal framework that supports the comprehensive identification and traceability of uncertainty sources influencing the uncertainty in the simulation prediction.

Characteristics #3 and #5 are uniquely developed for the PV methodology. Although each of the other characteristics (#1, #2, and #4) exists in some of the current studies, the integration of these characteristics under one methodology is a unique contribution of this research. Characteristic #1 is not available in the PRA domain and is adopted from other disciplines such as computational fluid dynamics (CFD) [13–16] and structural reliability analysis [17,18]. Characteristic #2 is common in PRA of NPPs for uncertainty analysis, but not for validation purposes as seen in other disciplines. Regarding characteristic #4, literature outside the PRA domain acknowledges the importance of sensitivity analysis for validity improvement but lacks a formal, quantitative screening technique to make it computationally feasible [19,20]; this gap is addressed in the PV methodology.

Although the PV methodology is explained in the context of PRA for NPPs, it is grounded in a cross-disciplinary review of the literature and is thus applicable to the validation of simulation models in general, not necessarily those associated with PRA or nuclear applications.

## 2 Review of Existing Studies That Leverage Uncertainty Quantification to Assess the Credibility of Simulation Models

Since the 1970s, owing to the wide variety of views on verification and validation (V&V) principles and procedures, multiple definitions of validation have emerged in various communities. The first definition of validation, together with the definition of the related term “verification,” was developed through the early efforts of the Operations Research community and was first published in 1979 by the Society for Computer Simulation [21]. This first definition, though instructive in referring to validation as “a substantiation that a computerized model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model,” does not provide sufficient clarity on what constitutes a “satisfactory range of accuracy.” In the 1980s, the Institute of Electrical and Electronics Engineers (IEEE) introduced its V&V definitions [22,23]. The IEEE defined validation as “the process of testing a computer program and evaluating the results to ensure compliance with specific requirements.” Restricted to software V&V, this definition contributed little to either the intuitive understanding of validation or the development of validation methods because it lacks explicit statements on the “specific requirements.” The IEEE definition also did not contain any indication of what the requirements for correctness, accuracy, or degree of compliance would be. In the early 1990s, the U.S. Department of Defense (DoD) published its definitions of V&V [24], in which validation was defined as “the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model.” A key feature of the DoD definition, not present in the IEEE definition, is the emphasis on accuracy and the assumption that a *measure of accuracy* can be determined.
In this view, accuracy in model validation can be measured relative to any accepted referent, such as experimentally measured data or expert opinion. The DoD definition [24] was later adopted verbatim by the American Institute of Aeronautics and Astronautics (AIAA) [16] and the American Society of Mechanical Engineers (ASME) [25–27] communities.

Although different interpretations of the definition of validation tend to converge when V&V problems become interdisciplinary, significant differences, as pointed out in Oberkampf and Roy [28] and Beisbart [29], still exist. Indeed, even for the DoD/AIAA/ASME common definition of validation, there are points of disagreement among different communities when it comes to interpreting several aspects of the definition [30]. For example, this common definition emphasizes the need to have “referent” data against which the simulation predictions will be compared; however, in the AIAA interpretation, the “referent” data are strictly defined as experimental outcomes [16], while in the ASME interpretation, the “referent” data are relaxed to include “data, theory, and information” that are applicable [27]. Roache [30] later elaborated on the DoD/AIAA/ASME definition of validation to view the model as a whole with its associated data while clarifying that the “real world” is determined by experimental data: “Validation is the process of determining the degree to which a model, with its associated data, is an accurate representation of the real world as determined by experimental data, the metrics of which are chosen from the perspective of the intended uses of the model.” Other contested issues of the DoD/AIAA/ASME validation definition, as acknowledged in ASME V&V 20-2009 [26], are (i) whether the “degree to which a model is an accurate representation of the real world” implies acceptability criteria (pass/fail) that are normally associated with certification or accreditation; and (ii) whether the “intended uses of the model” are specific or general. The existing commonalities and points of disagreement in different interpretations of the validation concept indicate that more attention should be given to the theoretical perspectives of validation before developing the PV methodology.

Related to the broader domain of Verification, Validation, and Uncertainty Quantification (VVUQ), it is noteworthy that there has been general agreement about the purposes of VVUQ despite the differences in their interpretations/definitions [4]. To establish a technical background and common language for the discussion, Secs. 2.1–2.3 review notable VVUQ frameworks that exist in various engineering fields. It is not the purpose of the authors to provide a complete review of existing VVUQ approaches; such work has been done in various fields and the reader can refer to Refs. [28] and [31–43]. The authors focus, instead, on reviewing quantitative VVUQ frameworks that, to some extent, leverage uncertainty quantification for assessing the credibility of simulation models. This review helps the authors (i) better understand and define the validation problem of interest that motivates the development of the PV methodology, as compared to the practice in other domains; (ii) collect and present scientific evidence to support the use of epistemic uncertainty for validation of simulation models; and (iii) develop the PV theoretical foundation (covered in Sec. 3) and the PV methodological platform (offered in Sec. 4).

### 2.1 American Society of Mechanical Engineers Verification and Validation Standards.

The American Society of Mechanical Engineers plays an important role in the VVUQ community by developing VVUQ standards that provide guidance to help practitioners assess and enhance the credibility of their computational models [44]. As a result, two foundational standards have been published by the ASME V&V Committee to present VVUQ methodologies in the domains of computational fluid dynamics and heat transfer [26] and computational solid mechanics [27,45]. Additionally, the ASME V&V Committee is working on discipline-specific standards that encompass applications^{1} to nuclear system thermal fluids behavior, medical devices, advanced manufacturing, energy systems, and machine learning.

ASME V&V 20-2009 [26] presents a quantitative V&V approach for the computational fluid dynamics and heat transfer engineering domain. The approach is based on the concepts of errors and uncertainties in experimental uncertainty analysis, first initiated by Coleman and Stern [46], and extends their usage for considering errors and uncertainties in simulation output. In this approach, a validation metric, namely, *validation uncertainty*, is calculated from sources of uncertainty in the experimental data, simulation input uncertainty, and numerical approximation uncertainty (calculated from code verification and solution verification processes) [26]. This calculation assumes that the errors due to experiment, simulation input parameters, and numerical approximation (e.g., discretization) are independent of each other and roughly Gaussian. Consequently, the model error, based on the validation uncertainty and a comparison error (which measures the difference between the experimental data and simulation output), can be estimated to be within an uncertainty range [26]. The approach presented in ASME V&V 20-2009 has a specific scope for the validation problem such that the degree of accuracy of the simulation output is quantified only at validation points where conditions of the actual experiment are simulated [26]. In other words, ASME V&V 20-2009 requires experimental data to compare with the simulation output when performing the validation. Thus, this approach is not applicable for assessing the degree of accuracy of the simulation output at application points that differ from the validation set-points defining the validation domain.
The ASME V&V Committee recommends that acceptability criteria not be included within the scope of validation but instead be relegated to certification or accreditation concepts tied to a specific project; hence, the intended use of the model is general [26].
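To make the V&V 20 construction concrete, the sketch below combines standard uncertainties in quadrature and forms the comparison error between simulation and experiment. The numbers and function names are hypothetical illustrations, not taken from the standard, and the combination rests on the independence and near-Gaussian assumptions noted above.

```python
import math

def validation_uncertainty(u_num, u_input, u_D):
    """Combine independent, roughly Gaussian uncertainty contributions
    (numerical approximation, simulation input, experimental data)
    in quadrature, per the ASME V&V 20-2009 formulation."""
    return math.sqrt(u_num**2 + u_input**2 + u_D**2)

def comparison_error(S, D):
    """Comparison error: simulation output S minus experimental data D."""
    return S - D

# Hypothetical standard uncertainties, for illustration only
u_val = validation_uncertainty(u_num=0.5, u_input=1.2, u_D=0.8)
E = comparison_error(S=103.0, D=100.0)
# The model error is then estimated to lie within E +/- k * u_val
# for an appropriate coverage factor k.
```

If the comparison error is small relative to the validation uncertainty, the model error is indistinguishable from the noise of the validation exercise itself, which is why the standard reports both quantities together.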

ASME V&V 10-2019 [27] presents a V&V approach for the computational solid mechanics domain. In this standard, the focus centers on providing a common language, a conceptual framework, and general guidance for implementing a recommended bottom-up validation approach for simulation models of hierarchical systems, which are common subjects of interest in the computational solid mechanics domain. In this bottom-up validation approach, the validation is done for each constituent component/submodel of the system-level model, starting from the lowest system hierarchical level. ASME V&V 10-2019 introduces the concept of predictive capability (when distinguishing between the validation domain and application domain of a simulation model) and recommends that “it is best to specify validation criteria before initiating model-development and experimental activities in order to establish a basis for defining how “good” is good enough” [27]. In assessing the uncertainty associated with the model prediction, ASME V&V 10-2019 requires that all significant sources of uncertainty in the simulation and experimental results be identified and treated to quantify their effects on the model predictions [27]. These sources of uncertainty include input parameter uncertainty, numerical approximation uncertainty, model-form uncertainty, and experimental data uncertainty [27]. These uncertainty sources, depending on their characteristics, can be categorized as aleatory or epistemic uncertainty and can be quantified using recommended/available techniques [27]. After accounting for their associated uncertainty, simulation results and experimental results are quantitatively compared using a validation metric, and the metric results are then compared against predefined accuracy requirements to determine whether an acceptable agreement has been achieved [27]. 
ASME V&V 10.1-2012 [45], which is a supplement to ASME V&V 10-2019, provides a detailed example to illustrate the most important aspects of ASME V&V 10-2019. It is worth noting that ASME V&V 10.1-2012 [45] neither illustrated the separate treatment of aleatory and epistemic uncertainties nor evaluated their distinct impacts on the validation of simulation model predictions. In addition, the case study in ASME V&V 10.1-2012 [45] did not address the accuracy of model predictions for conditions where experimental data are not available, stating that this is out of the scope of the standard. Compared to the ASME V&V 20-2009 approach, a key philosophical difference is that the ASME V&V 10-2019 approach tries to quantify model-form uncertainty, while the ASME V&V 20-2009 approach, when possible, seeks to correct the model with the data.

### 2.2 Verification, Validation, and Uncertainty Quantification Based on Probability Bound Analysis.

Probability bound analysis (PBA) is a combination of interval analysis and probability theory to produce probability bounds (also known as probability boxes or p-boxes) that can be used to represent and distinguish aleatory and epistemic uncertainties, allowing for a comprehensive uncertainty propagation approach. A VVUQ framework that utilizes PBA was introduced by Oberkampf et al. [19,20,47–49]. This framework is consistent with the conceptual framework provided in ASME V&V 10-2019 [27] and applicable for simulation models of hierarchical/multilevel systems. The illustration provided in ASME V&V 10.1-2012 [45], however, falls short of demonstrating the feasibility and benefits of using p-boxes to separately handle aleatory and epistemic uncertainties (as compared to the works in Refs. [19], [20], and [47–49]). In the VVUQ framework based on PBA [19], [20], and [47–49], input, numerical, and model-form uncertainties, and their contributions toward the total uncertainty in the simulation prediction, are separately quantified. The distinction between aleatory and epistemic uncertainties is considered in their studies such that aleatory uncertainties, epistemic uncertainties, and mixed aleatory-epistemic uncertainties are respectively characterized by precise probability distributions, intervals, and probability boxes (*p*-boxes) [19,49]. Nested double-loop Monte Carlo sampling methods are used to propagate input uncertainties through the simulation model [19,20]. For numerical uncertainties, the Richardson extrapolation technique is used to quantify discretization errors that arise due to the spatiotemporal discretization of the mathematical model and its input conditions [19,49]. 
To quantify model-form uncertainty resulting from assumptions and approximations underlying the simulation model, the area between the distributions (either in the form of cdfs or *p*-boxes) obtained from simulation results and experimental observations is utilized, i.e., the area validation metric approach [19,20,49]. In addition, model-form uncertainty associated with the application/prediction conditions is obtained by interpolating or extrapolating the area validation metric obtained from the validation conditions using a regression model. Finally, the total uncertainty in the simulation prediction can be obtained in the form of another *p*-box by appending the epistemic intervals that represent the calculated model-form and numerical uncertainties to both sides of the *p*-box obtained from propagating the input uncertainties.
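The area validation metric for two finite samples can be sketched as the area between their empirical CDFs. The sample values below are invented for illustration; real applications would use the cdfs or p-boxes described above, and the same construction extends to the bounding curves of a p-box.

```python
import numpy as np

def area_validation_metric(sim_samples, exp_samples):
    """Area between the empirical CDFs of simulation outputs and
    experimental observations (the L1 distance between the CDFs)."""
    sim = np.sort(np.asarray(sim_samples, dtype=float))
    exp = np.sort(np.asarray(exp_samples, dtype=float))
    grid = np.union1d(sim, exp)
    # Empirical CDF values evaluated on the merged grid
    F_sim = np.searchsorted(sim, grid, side="right") / sim.size
    F_exp = np.searchsorted(exp, grid, side="right") / exp.size
    # Integrate |F_sim - F_exp| piecewise over the grid intervals
    return float(np.sum(np.abs(F_sim - F_exp)[:-1] * np.diff(grid)))

# Hypothetical samples: the simulation is shifted 0.5 below the experiment
d = area_validation_metric([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```

A pure translation of the distribution by 0.5 yields a metric of exactly 0.5, which is the intuitive appeal of this metric: it carries the units of the system response quantity and can be appended to the prediction p-box as an epistemic interval.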

Existing gaps in the current practice of the PBA-based VVUQ approach [19,20,47–49] include (i) extrapolation of model-form uncertainty from the validation domain to the application domain is not a straightforward task and is questionable especially if the underlying system is operating under abnormal or extreme environmental conditions which involve more geometric and/or physical interaction complexities as compared to the conditions considered in the validation domain; (ii) the approach has limited capability in dealing with situations where most (or all) available experimental data are obtained from systems that are distantly related to the system of interest; (iii) lack of a formal, quantitative screening technique that can screen out insignificant sources of uncertainty in order to efficiently reduce the dimension of the input space and, hence, improve the computational efficiency of the validation process; and (iv) lack of an appropriate sensitivity analysis technique that can separately rank the importance of aleatory versus epistemic uncertainties.
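For reference, the nested double-loop sampling used in this PBA-based approach can be sketched minimally as follows. The interval bounds, distributions, and model are hypothetical; the point is only the loop structure, in which the outer loop samples epistemic quantities and the inner loop samples aleatory ones.

```python
import random

def nested_double_loop(model, n_epistemic=200, n_aleatory=500, seed=0):
    """Nested (double-loop) Monte Carlo: the outer loop samples an
    epistemic parameter (here, an interval-valued bias b); the inner
    loop samples the aleatory variable x.  The family of inner-loop
    results traces out a p-box for the model output; here we summarize
    each inner loop by its mean."""
    rng = random.Random(seed)
    outer_means = []
    for _ in range(n_epistemic):
        b = rng.uniform(-0.1, 0.1)  # epistemic: known only as an interval
        inner = [model(rng.gauss(1.0, 0.2), b) for _ in range(n_aleatory)]
        outer_means.append(sum(inner) / len(inner))
    # Bounds on the output mean induced by the epistemic uncertainty
    return min(outer_means), max(outer_means)

lo, hi = nested_double_loop(lambda x, b: x + b)
```

The width of the resulting interval [lo, hi] reflects only the epistemic contribution; collapsing the outer loop (fixing b) would recover a single precise distribution for the aleatory variability.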

### 2.3 Verification, Validation, and Uncertainty Quantification Based on the Bayesian Approach.

Verification, Validation, and Uncertainty Quantification frameworks based on the Bayesian approach are sometimes called Bayesian model calibration because they convolve calibration and validation. This approach differs from the PBA-based VVUQ approach [19,20,47–49] where, as argued by Oberkampf and Roy [28], calibration and validation should be rigorously separated. The Bayesian approach to VVUQ was first introduced by Kennedy and O'Hagan [50], in which validation and calibration were combined to produce an updated model that is an improved representation of the “true value” in nature. Experimental results at a range of input settings for the model must be available to calculate the observed discrepancy between the experiment and the simulation at these settings. These discrepancies can then be used to update a model discrepancy term through Bayesian updating, and Gaussian processes are used over the input domain to connect the observations to one another to create the updated model. In the Bayesian approach, available data are leveraged to improve the model as much as possible. In contrast, the PBA approach by Oberkampf and Roy reserves the first use of any validation data for assessing the performance of the model (i.e., quantifying the model-form uncertainty) so as to reveal an explicit characterization of its predictive abilities [19]. Examples of work that applied and/or extended the Bayesian model calibration approach by Kennedy and O'Hagan [50] can be found in Refs. [51–56].
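As a deliberately simplified stand-in for the Gaussian-process discrepancy term of Kennedy and O'Hagan, the sketch below performs a conjugate normal Bayesian update of a *constant* discrepancy from observed experiment-minus-simulation differences. All numbers, and the constant-discrepancy assumption itself, are hypothetical simplifications for illustration.

```python
def update_discrepancy(prior_mean, prior_var, observed_discrepancies, noise_var):
    """Conjugate normal Bayesian update of a constant model-discrepancy
    term delta ~ N(prior_mean, prior_var), given noisy observations
    (experiment minus simulation) with known noise variance."""
    n = len(observed_discrepancies)
    ybar = sum(observed_discrepancies) / n
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + n * ybar / noise_var)
    return post_mean, post_var

# Hypothetical observed discrepancies clustered near 1.0
m, v = update_discrepancy(0.0, 1.0, [0.8, 1.1, 0.9, 1.2], 0.25)
```

In the full Kennedy-O'Hagan framework, the discrepancy varies over the input domain and is modeled with a Gaussian process, so the posterior is obtained jointly with the calibration parameters, typically via Markov chain Monte Carlo rather than a closed-form update.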

An advantage of introducing the discrepancy function (a.k.a., model-form error) into the governing equations of the simulation model (in the Bayesian approach to VVUQ) is that it can account for the possible existence of unrecognized uncertainty sources (e.g., missing physics and other inaccuracies of the computer model) while updating the calibration variables. The simultaneous estimation of calibration parameters and model-form error proposed in VVUQ frameworks based on the Bayesian approach, however, requires high-dimensional integrations, whose solution requires complex statistical computational methods such as the Markov chain Monte Carlo method. Additional challenges of the Bayesian approach to VVUQ should also be acknowledged. One is that this approach cannot account for the model-form errors in the prediction system that do not appear in the test systems. If the test and prediction systems differ in boundary conditions, the model-form errors associated with the elements at the boundary of the prediction system are not updated [51]. Extrapolating/projecting the model-form errors from the tested system (or validation domain) to the prediction system (or application domain) is challenging due to the potentially complex nature of the errors, which could be different for models/submodels at different levels of a complex system.

While extending the Bayesian approach for problems that involve simulation models of hierarchical complex systems subject to both aleatory and epistemic uncertainties in the structural analysis domain, Urbina et al. [17] presented a Bayesian network methodology to include both aleatory and epistemic uncertainties independently at each level of a multilevel complex engineering system while quantifying the uncertainty in the system-level response. Originally, pure aleatory uncertainties were represented with probability distribution functions, while pure epistemic uncertainties were represented with intervals [17]. With this Bayesian network approach, system-level and subsystem-level models, their inputs, model parameters, and outputs, as well as various sources of model error and experimental data, are connected through a Bayesian network [17]. Epistemic uncertainty sources are incorporated into the Bayesian network by converting their interval representation into probability density functions using the percentile matching method and kernel density estimators [17]. Consequently, sources of aleatory and epistemic uncertainties are propagated through the Bayesian network to quantify the system-level prediction uncertainty [17]. Building on the work of Urbina et al. [17], Sankararaman and Mahadevan [18] formalized a Bayesian methodology that integrates model verification, validation, and calibration activities into the overall uncertainty quantification for the system-level model response. Their methodology was demonstrated for single-level models and two types of multilevel models [18]. Unlike the Bayesian calibration approach of Kennedy and O'Hagan [50], Sankararaman and Mahadevan [18] separated calibration and validation by using two independent sets of data for these two activities. Neither the study by Urbina et al. [17] nor that by Sankararaman and Mahadevan [18], however, considered mixed sources of uncertainty, e.g., an input quantity or a model parameter that is subject to both aleatory and epistemic uncertainties [17]. Meanwhile, in the domain of nuclear safety analysis, Kwag et al. [57] defined an overlapping coefficient between the simulation and experiment probability distribution functions (pdfs) as the validation metric. This metric is extrapolated through a Bayesian network representation of the complex system from the component level to the system level. The resulting system-level overlapping coefficient is then compared with a threshold value to evaluate the adequacy of the current degree of validation. This validation framework was later enhanced by Bodda et al. [58] to include an additional validation metric, termed the consistency index, to help evaluate the confidence in the predicted system-level overlapping coefficient. However, the validation framework proposed by Kwag et al. [57] and enhanced by Bodda et al. [58] neither explicitly accounted for certain sources of uncertainty in the system model (e.g., model-form uncertainties) nor separated aleatory versus epistemic uncertainties.
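The overlapping-coefficient metric of Kwag et al. [57] can be illustrated with a minimal sketch; the normal pdfs and the 0.5 threshold below are hypothetical stand-ins for the component-level distributions and the adequacy threshold of the cited framework:

```python
import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def overlapping_coefficient(mu1, sd1, mu2, sd2, n=20001):
    # OVL = integral of min(f_sim, f_exp), evaluated here on a fine grid
    lo = min(mu1 - 6 * sd1, mu2 - 6 * sd2)
    hi = max(mu1 + 6 * sd1, mu2 + 6 * sd2)
    x = np.linspace(lo, hi, n)
    f = np.minimum(normal_pdf(x, mu1, sd1), normal_pdf(x, mu2, sd2))
    return float(np.sum(f) * (x[1] - x[0]))

# Identical pdfs overlap fully (OVL ~ 1); shifted pdfs overlap partially.
ovl_full = overlapping_coefficient(0.0, 1.0, 0.0, 1.0)
ovl = overlapping_coefficient(0.0, 1.0, 1.0, 1.0)
threshold = 0.5  # hypothetical adequacy threshold
print(f"OVL = {ovl:.3f} -> {'adequate' if ovl >= threshold else 'inadequate'}")
```

The metric lies in [0, 1] by construction, which is what makes it convenient to compare against a single threshold at the system level.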

## 3 Probabilistic Validation: Theoretical Foundation

As stated in Sec. 2, multiple definitions of validation exist and are used in various fields. The DoD/AIAA/ASME definition of validation [16,24–27] is a widely accepted one among several communities; however, there are still points of disagreement on its interpretation [30]. In addition, this validation definition, and the associated methodologies [16,24–27], are only usable when there is a sufficient amount of “referent” data to compare with the simulation prediction. This section conceptualizes and theorizes the goal and key characteristics of the PV methodology.

The goal of the PV methodology is to assess the validity of a simulation prediction that is tied to a specific application of interest, especially when empirical validation data associated with the system response quantity (SRQ) of interest are limited or unavailable. The simulation-predicted SRQ is obtained from running the simulation code for some application conditions of interest. The rationale underlying the validity assessment in the PV methodology is consistent with the viewpoint prevalent in various scientific communities [16,25,26,28,30,59], i.e., validity should be evaluated using standards that center around truth, accuracy, and credibility. Specifically, considering that uncertainties exist in the simulation prediction and its corresponding true value is usually unknown (or, if empirical data are treated as the truth, there are uncertainties associated with the data), the PV methodology adopts the standard of “credibility of accuracy^{2}.” This is because “what researchers can at most obtain in typical cases are more or less credible claims about the accuracy of results” [29], and that is “the closest we can get in the direction of truth” [29]. In practice, when sufficient validation data associated with the SRQ of interest are available, traditional empirical validation approaches (e.g., classical hypothesis testing, Bayesian hypothesis testing) can be leveraged to evaluate the agreement between the simulated and empirically measured values of the SRQ using an appropriate validation metric. However, in predictive modeling of complex systems, such validation data are either limited or unavailable at the system level. The PV methodology provides a more comprehensive validation approach (as compared to the empirical validation approach) in that it is applicable across the whole spectrum of validation data availability (Fig. 2). This spectrum (Fig. 2) encompasses validation problems that range from situations where no empirical validation data at the simulation output level are available (left end of the spectrum) to situations where a sufficient amount of such data exists (right end of the spectrum). The PV methodology offers a mechanism for locating the validation problem of interest on this spectrum and for selecting a validation approach that suits the characteristics of the given validation problem. To overcome the reliance on empirical data for validation, the PV methodology leverages uncertainty analysis techniques to support the validity assessment of the simulation prediction. In particular, the PV methodology introduces the concept of degree of confidence, which is determined by the magnitude of epistemic uncertainty in the simulation prediction. This degree of confidence concept allows one to make credible claims about the accuracy of the simulation prediction even when validation data are limited/unavailable and, hence, makes the methodology compatible with the standard of “credibility of accuracy.” To complete the validity assessment, the PV methodology also leverages the design and use of acceptability criteria to evaluate whether the total uncertainty (including both aleatory and epistemic uncertainties) in the simulation prediction can be accepted for a specific application of interest. Consequently, by combining the results of the degree of confidence and acceptability evaluations, a conclusion can be made regarding the validity of the simulation prediction of interest.

Note that all the validation approaches reviewed in Sec. 2 leverage uncertainty analysis for assessing errors and uncertainties inherent in the activities of the M&S development process. Compared to the existing approaches reviewed in Sec. 2, the PV methodology has a unique combination of five characteristics, as explained in Secs. 3.1–3.5. Although some of these characteristics exist (partially or completely) in the existing methods reviewed in Sec. 2, their integration under one methodology is a unique contribution of this research.

### 3.1 Key Characteristic #1: The PV Methodology Offers a Multilevel, Multimodel-Form Validation Analysis That Can Integrate Data and Uncertainty Analysis at Multiple Levels of the System Hierarchy to Support the Degree of Confidence Evaluation.

In modeling complex systems, though empirical validation data may be limited or nonexistent at the system level, they are usually available at various subsystem levels. The quantity, quality, and relevance of these subsystem data, however, vary considerably, depending on the degrees of physics coupling and geometric complexity of the real-world system and the associated costs of obtaining such data. In parallel, models built for these complex systems are usually composed of elements/submodels that interact in multiple ways, depending on the detailed structure of the complete system hierarchy. As the data and models at these subsystem levels are subject to various sources of uncertainty that eventually affect the total uncertainty in the system-level prediction, a multilevel validation analysis that can efficiently combine these sources of uncertainty to support the validity assessment is needed. This multilevel approach is not new: it was originally conceptualized by a number of researchers in the fluid dynamics community and referred to as the building block approach [13–16]. The core idea of this approach is that the complete and complex system is divided into simpler tiers to encourage assessment of model accuracy at multiple levels of complexity and physics coupling. Recent efforts in the structural analysis [17,18,60] and nuclear safety analysis [57,58] domains have utilized Bayesian methods to connect data and models at different system levels (as discussed in Sec. 2.3). For example, the quantitative relationships among elements (at different levels) of the system hierarchy were modeled using mechanistic equations and/or response surface models [17,18] or using event tree/fault tree models [57,58]. The models were then mapped into a Bayesian network to facilitate uncertainty propagation [17,18,57,58].
In this regard, the PV methodology offers a systematic, multilevel, multimodel-form validation analysis to efficiently integrate data and uncertainty sources at multiple levels of the system hierarchy into its degree of confidence evaluation. Note that, before the execution of the PV methodology, the hierarchical system of interest needs to be represented using a system-level model that is composed of element/submodels at different hierarchical levels of the overall system and their interrelationships. In addition, multiple plausible model forms associated with each element/submodel may exist and can be considered in the PV methodology. Compared to the previous works [17,18,57,58], the scope of uncertainty analysis in the PV methodology is more comprehensive. Specifically, the identification of all important sources of uncertainty that could eventually affect the uncertainty in the system-level simulation prediction is guided in PV by a theoretical causal framework presented in Sec. 3.5. Regarding the validity assessment, the PV methodology provides a different approach (as compared to that of the previous works [17,18,57,58]) by introducing an algorithm that leverages (i) results of the uncertainty analysis and (ii) suitable predefined criteria for evaluating the validity of the simulation prediction for a specific application (Sec. 4.2).

### 3.2 Key Characteristic #2: The PV Methodology Separates Aleatory and Epistemic Uncertainties and, When Possible, Differentiates Between Two Forms of Epistemic Uncertainty (i.e., Statistical Variability and Systematic Bias) While Considering Their Influence on the Uncertainty in the Simulation Prediction.

The separation of aleatory and epistemic uncertainties has been a common practice for uncertainty analysis in several domains, such as the nuclear [61] and aerospace [62] domains. The differentiation between the two sources of epistemic uncertainty (i.e., statistical variability and systematic bias) has also been considered in the risk analysis field for the treatment of model uncertainties [63,64]; however, it has not been considered under the scope of validation. The PV methodology applies the abovementioned separation and differentiation for its validity assessment because these sources of uncertainty have different impacts on the total uncertainty of the simulation prediction and are addressed differently when seeking uncertainty reduction strategies. In the PV methodology, all important sources of aleatory and epistemic uncertainty, existing in all stages of the M&S process, are identified, characterized separately, and appropriately propagated up to the simulation prediction level (or to the application output level). As a result, the total uncertainty associated with the simulation prediction is a mixed combination of the aleatory and epistemic uncertainties inherited from the M&S process. Mixed aleatory-epistemic uncertainties are represented in the PV methodology using the probability bounds analysis approach, i.e., the uncertainty is represented by a family of probability distributions, known as a probability box (p-box), such as the one in Fig. 3.

In Fig. 3, aleatory uncertainty in the SRQ is reflected in the type of corresponding distribution family (e.g., gamma distribution family). Aleatory uncertainty is rooted in the inherent stochastic nature of the physical phenomena underlying the SRQ and can only be quantified/characterized, not reduced. In contrast, both statistical variability and systematic bias arise from a lack of knowledge and, hence, are categorized as sources of epistemic uncertainty. Statistical variability (e.g., variability in the values of the distributional parameters of a gamma distribution family due to limited supporting data) is reflected in the width of the distribution family, while systematic bias (e.g., bias due to conservative assumptions and/or idealizations introduced into the M&S process) contributes to both the width and the relative location of the p-box within the SRQ domain. A formal quantification of various sources of systematic bias is a challenging task, and common practices in scientific computing often only identify the sources of systematic bias and verify their appropriateness through expert review [65]. In the nuclear domain, NUREG-1855 [61] emphasizes the need to qualitatively identify (but not quantify) conservative biases associated with chosen models and adopted modeling assumptions, and their impact on the conservatism in the PRA. The PV methodology adopts this practice when considering the differentiation between the two abovementioned forms of epistemic uncertainty and utilizes it to facilitate the validity assessment (Secs. 4.2.2 and 4.2.3).
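As a concrete illustration of the p-box representation, the sketch below assumes a hypothetical gamma distribution family for the SRQ with interval-valued shape and scale parameters (loosely mirroring the Fig. 3 example; the intervals, grid, and sample counts are invented). An outer epistemic loop samples the distributional parameters, the pointwise envelope of the resulting CDFs forms the p-box, and the area between the bounds is the epistemic-uncertainty magnitude tied to the degree of confidence:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical epistemic intervals on the parameters of a gamma
# distribution family (statistical variability due to sparse data).
shape_iv = (2.0, 3.0)
scale_iv = (1.0, 1.5)

x = np.linspace(0.0, 15.0, 300)  # SRQ domain
cdfs = []
for _ in range(200):  # outer (epistemic) loop: sample the distributional parameters
    a = rng.uniform(*shape_iv)
    s = rng.uniform(*scale_iv)
    # the inner (aleatory) variability is the gamma CDF itself
    cdfs.append(stats.gamma.cdf(x, a, scale=s))
cdfs = np.asarray(cdfs)

# p-box: pointwise envelope of the ensemble of CDFs
lower, upper = cdfs.min(axis=0), cdfs.max(axis=0)

# Area between the bounds: the magnitude of epistemic uncertainty on
# which the degree of confidence evaluation is based.
area = float(np.sum((upper - lower)[:-1] * np.diff(x)))
print(f"p-box area (epistemic uncertainty measure) ~ {area:.2f}")
```

Shrinking the parameter intervals (i.e., reducing epistemic uncertainty) narrows the envelope and drives the area toward zero, whereas the aleatory spread of the gamma family itself remains.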

### 3.3 Key Characteristic #3: The Probabilistic Validation Methodology Uses Risk-Informed Acceptability Criteria, Along With a Predefined Guideline, to Evaluate the Acceptability of the Simulation Prediction.

In situations where sufficient validation data are available, a validation metric and acceptability criteria can be designed to leverage such data. For example, in the ASME V&V 10-2019 [27] and its supplement ASME V&V 10.1-2012 [45] standards, an area validation metric (i.e., the area between the model-predicted and the experimentally measured cumulative distribution functions of the SRQ of interest) is used to measure the agreement between the model-predicted and experimentally measured SRQ values, i.e., to evaluate the accuracy of the simulation model. This validation metric can then be compared against some predefined acceptability criteria (also termed accuracy requirements or validation requirements) to evaluate the acceptability/adequacy of the simulation prediction for a specific application. In the case study in ASME V&V 10.1-2012, the accuracy requirement was that the calculated area validation metric, normalized by the absolute experimental mean, be less than 10% [45]. This approach of designing and leveraging validation metrics and acceptability criteria is common across engineering domains when sufficient validation data can be obtained. However, in situations where validation data are insufficient or unavailable, this approach can no longer be applied. To overcome this challenge, the PV methodology introduces the concept of degree of confidence, which is determined by the magnitude of epistemic uncertainty in the simulation prediction (i.e., the area within the p-box in Fig. 3), as an analog to the concept of a validation metric. In addition, the acceptability criteria/requirements should be designed to help evaluate the acceptability of the simulation prediction (i.e., of its current degree of confidence) for a specific application even when experimental validation data are not available.
In general, acceptability criteria should be developed based on the needs and goals of the application of interest, and this should be done before performing the validation so that the criteria can reflect the expectations of the model for that particular application.
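For the data-rich end of the spectrum, the area validation metric and the 10%-of-the-experimental-mean accuracy requirement described above can be sketched as follows; the sample sizes, normal distributions, and mean values are illustrative assumptions, not values taken from the standard:

```python
import numpy as np

def area_metric(model_samples, exp_samples, n_grid=10000):
    # Area between the two empirical CDFs, integrated over the pooled range
    xs = np.linspace(min(model_samples.min(), exp_samples.min()),
                     max(model_samples.max(), exp_samples.max()), n_grid)
    f_model = np.searchsorted(np.sort(model_samples), xs, side="right") / model_samples.size
    f_exp = np.searchsorted(np.sort(exp_samples), xs, side="right") / exp_samples.size
    return float(np.sum(np.abs(f_model - f_exp)[:-1] * np.diff(xs)))

rng = np.random.default_rng(2)
y_model = rng.normal(100.0, 5.0, 2000)  # model-predicted SRQ samples (hypothetical)
y_exp = rng.normal(102.0, 5.0, 50)      # experimentally measured SRQ (hypothetical)

d = area_metric(y_model, y_exp)
requirement = 0.10 * abs(y_exp.mean())  # accuracy requirement: 10% of |experimental mean|
print(f"area metric = {d:.2f}, requirement = {requirement:.2f}, "
      f"{'PASS' if d <= requirement else 'FAIL'}")
```

The metric carries the units of the SRQ, which is why the requirement is stated relative to the experimental mean; when the experimental sample is small, the empirical CDF itself contributes sampling noise to the metric.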

The PV methodology is developed to evaluate the validity of simulation models so that they can be used in PRA to support risk-informed decision-making with confidence under the U.S. NRC risk-informed regulatory framework. Results of the validity assessment help determine whether the validity of the simulation model needs to be improved. However, if the risk estimates obtained from PRA, with the simulation prediction being an input, are safely within regulatory risk limits (e.g., the risk acceptance guidelines in Regulatory Guide 1.174 [66]), the validity of the simulation prediction should be accepted for the risk assessment application of interest, provided that the degree of confidence has already been calculated properly and thoroughly (i.e., all important sources of uncertainty affecting the uncertainty in the simulation prediction have been properly accounted for in calculating the degree of confidence). For example, one may obtain an uncertainty interval for the 95th percentile values of the risk estimates from the PRA model by accounting for all important sources of uncertainty in the PRA model, including the uncertainty associated with the simulation prediction of interest. In this case, the uncertainty associated with the simulation prediction of interest has already been characterized properly and thoroughly by accounting for all sources of input parameter, model-form, and numerical approximation uncertainty. Figure 4(a) illustrates general cases where the uncertainty associated with the 95th percentile value of the risk estimate (or any application output of interest) and the regulatory risk acceptance guideline (or any acceptability criteria associated with the application output of interest) can be compared to evaluate the acceptability of the simulation prediction (and its associated uncertainty).
With this design, the simulation prediction can be considered “valid” for the specific application conditions if the total uncertainty associated with the application output of interest (i.e., one of the blue intervals in Fig. 4(a)), obtained with the simulation prediction being an input, falls entirely below the acceptance guideline. Details on the assessment associated with these cases in Fig. 4(a) are further discussed in Sec. 4.2.2.

If acceptability criteria are available at the simulation prediction level (criteria which implicitly consider the system-level risk for a specific use of the simulation model), the acceptability evaluation can be performed at this level. Figure 4(b) illustrates various cases where the uncertainty associated with the simulation prediction, represented with a p-box as it reflects mixed aleatory and epistemic uncertainties, can be compared against corresponding acceptability criteria (i.e., one of the vertical red dashed lines in Fig. 4(b)) to evaluate the acceptability of the simulation prediction. The simulation prediction can be considered “valid” for the specific application conditions if the total uncertainty associated with the predicted SRQ (i.e., the whole p-box in Fig. 4(b)) falls entirely below the corresponding acceptability criteria. Details on the assessment associated with these cases in Fig. 4(b) are further discussed in Sec. 4.2.3. Note that, in both Figs. 4(a) and 4(b), for simplicity, the acceptability criteria are represented as point values; in reality, these criteria may be ranges.

### 3.4 Key Characteristic #4: The Probabilistic Validation Methodology Combines Uncertainty Analysis With a Two-Layer Sensitivity Analysis to Streamline the Validity Assessment and to Efficiently Improve the Degree of Confidence in the Simulation Prediction.

The PV methodology quantifies epistemic uncertainty (key characteristic #1) as a supporting metric that facilitates the validity assessment of the simulation prediction. Consequently, lack of validity can be addressed by reducing the sources of epistemic uncertainty that contribute to the total uncertainty in the simulation prediction. In this regard, Oberkampf and Roy [28] emphasize the need to perform a sensitivity analysis, usually after an uncertainty analysis, to address how the total uncertainty in the simulation prediction changes as a function of the contributing sources of uncertainty and their structural relationships. Results from such a sensitivity analysis can be presented as an importance ranking of the contributing sources of uncertainty with respect to their influence on the total uncertainty in the simulation prediction. The importance ranking results are useful for prioritizing and allocating resources in a project when decision-makers seek efficient and economic uncertainty reduction strategies. Sensitivity analysis is typically a more complex mathematical task than uncertainty analysis and often requires substantially more computational resources. This is especially true when the simulation model includes multiple elements/submodels in a hierarchical structure, a large number of uncertain input parameters, and/or model uncertainty sources. Some VVUQ approaches [19,20] acknowledge the importance of performing sensitivity analysis but lack a formal, quantitative screening technique that can screen out insignificant sources of uncertainty to efficiently reduce the dimension of the input space. To overcome this computational challenge, the PV methodology includes a two-layer sensitivity analysis. The first layer screens out insignificant sources of uncertainty (Sec. 4.1.3) by using computationally efficient sensitivity analysis methods, for example, one-at-a-time (OAT) techniques such as the Morris elementary effect analysis [67]. The second layer performs a comprehensive global sensitivity analysis (Sec. 4.3.1) to rank the contributing sources of epistemic uncertainty.
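The first-layer screening can be illustrated with a minimal sketch of the Morris elementary effect method; the three-input response function, step size, and trajectory count below are hypothetical placeholders, not part of the PV methodology itself:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical response with one dominant, one moderate, and one
# negligible uncertain input, all scaled to the unit cube.
def model(x):
    return 10.0 * x[0] + 2.0 * x[1] ** 2 + 0.01 * x[2]

k, r, delta = 3, 30, 0.5  # number of inputs, trajectories, OAT step

effects = np.zeros((r, k))
for t in range(r):
    x = rng.uniform(0.0, 0.5, k)  # base point chosen so x + delta stays in [0, 1]
    y0 = model(x)
    for i in rng.permutation(k):  # perturb one factor at a time
        x[i] += delta
        y1 = model(x)
        effects[t, i] = (y1 - y0) / delta  # elementary effect of factor i
        y0 = y1

mu_star = np.abs(effects).mean(axis=0)  # Morris mu*: mean absolute elementary effect
ranking = np.argsort(mu_star)[::-1]
print("mu* =", np.round(mu_star, 3), "ranking =", ranking)
```

Each trajectory costs only k + 1 model evaluations, which is what makes this OAT screen cheap enough to run before the second-layer global sensitivity analysis; factors with small mu* can be fixed at nominal values.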

### 3.5 Key Characteristic #5: The Probabilistic Validation Methodology is Equipped With a Theoretical Causal Framework That Supports the Comprehensive Identification and Traceability of Uncertainty Sources Influencing the Uncertainty in the Simulation Prediction.

As the PV methodology relies on uncertainty analysis to assess the validity of the simulation prediction of interest, it is crucial that all, or at least all the dominant, uncertainty sources and their interrelationships be identified, characterized, and propagated. These sources of uncertainty arise during the M&S process and eventually propagate into the simulation prediction. Therefore, examination of the M&S process is useful for understanding where and how the uncertainty contributors arise, as well as how they could influence the uncertainty in the simulation prediction. Efforts to develop general principles and procedures for M&S, including defining the related activities and categorizing them into phases, were initiated by the operations research and systems engineering communities [68–70]. Subsequent efforts in the scientific computing communities developed a more comprehensive framework that includes the main activities of computational simulation model development [71–73]: (a) conceptual modeling of the system of interest, (b) mathematical modeling of the conceptual model, (c) discretization and algorithm selection for the mathematical model, (d) computer programming of the discrete model, (e) numerical solution of the computational model, and (f) representation of the numerical solution. In these efforts to formalize the M&S procedure, three modeling phases were recognized: conceptual, mathematical, and computational modeling. The M&S procedure based on these three phases became common practice in various computational simulation communities (e.g., computational fluid dynamics, computational solid mechanics) and is recommended in the ASME V&V standards [26,27]. These efforts have been leveraged for the identification and analysis of M&S uncertainties in various studies. Oberkampf et al. [73] identified and discussed dominant sources of uncertainty in the aforementioned M&S activities, while Roy and Oberkampf [19] grouped them into three main categories of uncertainty, associated with input parameters, numerical approximations, and model form. Zhu [74] proposed a method for systematically identifying sources of model uncertainty based on reviewing the systematic modeling process and developed a fish-bone diagram to graphically represent at what point in each modeling step these sources of uncertainty might occur. Similar ideas of using the model development process for identifying and analyzing model uncertainties can be found in Refs. [75] and [76]. All these previous studies, however, have two limitations: (i) they lack a formal modeling technique for representing the relationships among the identified sources of uncertainty; and (ii) they lack a formal measuring technique for evaluating the degree of influence of the identified sources of uncertainty on the simulation prediction and its uncertainty.

To overcome these limitations, this research develops a high-level, generic theoretical causal framework (Fig. 5) that aims to systematically guide the identification of uncertainty sources and their relationships. This theoretical causal framework provides a guiding process for (i) decomposing the complex M&S process to facilitate the identification of causal factors influencing the simulation prediction and their paths of influence; and (ii) identifying uncertainty sources associated with the causal influencing factors and the interrelationships among these uncertainty sources that together can influence the uncertainty in the simulation prediction. In this framework (Fig. 5), the causal influencing factors and their associated uncertainties are organized into the three modeling phases of a typical M&S process: the conceptual, mathematical, and computational modeling phases. Sources of uncertainty are organized in this manner because the uncertainties arising from each phase are often treated differently.

The theoretical causal framework (Fig. 5) is a qualitative input to the PV methodology (Sec. 4) to ensure that the uncertainty quantification for the simulation prediction considers all the important causal factors and their associated uncertainties. To provide more details on the status of the proposed theoretical causal framework (Fig. 5), Secs. 3.5.1–3.5.3 below explain the influencing factors, their causal links, and the associated sources of uncertainty included for the three modeling phases.

#### 3.5.1 Influencing Factors and Associated Uncertainty Sources in the Conceptual Modeling Phase.

In an M&S application, one is interested in the behavior of the system of interest $S$ at some specific input conditions $x$, denoted by $g_S(x)$. As the system is assumed to be complex, its response $g_S(x)$ usually cannot be determined directly, and one needs to develop models to obtain such a response. One would start with the conceptual modeling phase to develop a conceptual model, denoted by $g_{\mathrm{conc}}(\cdot)$, which is a qualitative description of the system of interest, $S$.

Before the conceptual modeling, the system of interest and its surroundings should be defined (i.e., defining the conceptual model of reality [77]), and the requirements (or objectives) of the M&S (i.e., application requirements) should be specified [28,71–73]. Subsequently, four main tasks [27,28,71–73,77–81] should be completed. First, the operating environment (e.g., normal, abnormal, and hostile environments [28]) and all possible operating scenarios (where each scenario identifies events or sequences of events that could possibly be considered for a given environment) need to be specified. Second, all phenomena of interest, and possible couplings among them, that need to be incorporated into the modeling should be specified. Third, assumptions that simplify the system of interest, the phenomena of interest, and the couplings among the phenomena need to be derived considering the application requirements. Finally, the system response(s) of interest are specified according to the application requirements. Within these four tasks, the operating environment can influence the operating scenarios. Meanwhile, the operating scenarios can influence the choice of the phenomena of interest, their possible couplings, and the conceptualized system of interest and its surroundings [28,71–73]. Note that the conceptualized system of interest and its surroundings are core to the conceptual model of analysis, which is an “evolved” version of the conceptual model of reality as the modeling process proceeds with simplification assumptions [77]. In fact, simplification assumptions can influence both the conceptualized system of interest and the selected phenomena of interest and their possible couplings [28,71–73,80,81]. By examining the descriptions of these tasks in Refs. [27], [28], [71–73], and [77–81], the influencing factors in the conceptual modeling phase are extracted, and the causal relationships among these factors are identified, as shown in Fig. 5.

Though each of the factors in the conceptual modeling phase has its own uncertainty attached to it, these uncertainties arise mainly from the inability to establish, with a reasonable degree of certainty, the validity of the specifications and (simplifying) assumptions made during this phase (given the application requirements). This is because these specifications and assumptions are usually based on subjective interpretations of limited information. Uncertainties in the conceptual modeling phase are difficult to deal with because they relate not only to the accuracy and appropriateness of these specifications and assumptions but also to their completeness and possible interrelationships. These uncertainties are, therefore, usually represented as alternative sets of specifications and assumptions for the application of interest (each set may represent an alternative conceptual model). At a high level of abstraction, the theoretical causal framework (Fig. 5) helps visualize how the specifications and assumptions in the conceptual modeling phase can impact other influencing factors in the subsequent mathematical modeling phase.

#### 3.5.2 Influencing Factors in the Mathematical Modeling Phase.

In the mathematical modeling phase, the conceptual model is translated into a mathematical model of the general form

$y_{\mathrm{math}} = g_{\mathrm{math}}(x)$    (1)

where $y_{\mathrm{math}}$ denotes the SRQs of interest obtained from solving the mathematical model at the input conditions of interest $x$. Specifications of the initial conditions, the system geometry, and the modeling parameters are influenced by the conceptualized system of interest defined in the conceptual modeling phase [28]. Meanwhile, specifications of the boundary conditions and the system excitation are influenced by the conceptualized system surroundings (also defined in the conceptual modeling phase) [28]. The more detailed the system of interest and its surroundings are when conceptualized, the less ambiguous the specifications of the input conditions of interest in the mathematical modeling phase. In deriving the form of the mathematical model, one would need to first select an appropriate set of scientific principles and laws based on the considerations [4] of (i) the conceptualized system of interest, (ii) the phenomena of interest and their possible couplings, and (iii) the SRQ of interest, all of which are specified during the conceptual modeling phase. One then also needs to consider necessary idealizations and simplifications [4]. For example, a compartment fire model is usually based on the fundamental conservation laws of mass, momentum, and energy that are applied either to the control volumes that make up the fire compartment or to the fire compartment as a whole. CFD fire simulation codes such as the Fire Dynamics Simulator (FDS), however, do not solve the exact underlying Navier–Stokes equations obtained from such considerations; instead, they solve an approximate form of the equations derived by applying some reasonable simplifications [82]. In addition, the form of $g_{\mathrm{math}}(\cdot)$ in Eq. (1) may include multiple submodels if the system of interest is complex, multiphysics, and/or hierarchical. An example of such a submodel in a fire model is a mathematical model for fire growth, i.e., the power-law fire growth model ($\dot{Q} = \alpha t^{p}$). Here, we refer to the power form of the equation as the mathematical model form/structure, $\alpha$ (fire growth constant) and $p$ (positive exponent) as the parameters of the submodel, $t$ as the independent variable, and $\dot{Q}$ as the dependent variable or the SRQ of interest. The modeling parameters (a component of the input conditions discussed above) would be defined by specifying $\alpha$, $p$, and $t$, which are also influenced by the choice of the scientific principles and laws. By applying some simplifications to the power-law fire growth submodel, one obtains the t-squared model ($\dot{Q} = \alpha t^{2}$, where $p = 2$), which has been shown to be applicable for modeling the heat release rate of a wide range of fuels.
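As a numerical illustration of the t-squared submodel, the sketch below uses a growth coefficient representative of a "fast" design fire; the coefficient and the 1055 kW reference fire size are illustrative values commonly used in fire protection engineering, not values prescribed by this paper:

```python
import math

def hrr_tsquared(t, alpha):
    # Heat release rate (kW) from the t-squared growth model: Q = alpha * t^2
    return alpha * t ** 2

alpha_fast = 0.0469  # growth coefficient (kW/s^2) representative of a "fast" fire
q_150 = hrr_tsquared(150.0, alpha_fast)  # HRR reached after 150 s of growth
t_ref = math.sqrt(1055.0 / alpha_fast)   # time to reach a 1055 kW reference fire
print(f"Q(150 s) = {q_150:.0f} kW, t_ref = {t_ref:.0f} s")
```

In this parameterization, the single coefficient $\alpha$ carries all the fuel-specific information, which is what makes the t-squared form so convenient once the exponent $p = 2$ is fixed.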

Sources of uncertainty associated with the influencing factors considered in the mathematical modeling phase can be identified and mapped to two main groups: those that contribute to the uncertainty in the mathematical model inputs $x=\{IC, BC, SG, SE, MP\}$, and those that contribute to the uncertainty in the mathematical model form $g_{\mathrm{math}}(\cdot)$.

It is also important to note that multiple forms of the mathematical model may exist depending on: (i) how the system of interest is conceptualized in the conceptual modeling phase (e.g., on the set of phenomena to be included in each conceptual model alternative or on the interpretation of the phenomena of interest), or (ii) the choice of scientific principles and laws, idealizations, and simplifications that are used to derive the form of the mathematical model. In such cases, one may need to deal with a set of plausible mathematical model forms, $g_{\mathrm{math}}^{(i)}(\cdot)$, where the selection of one model form from that set for the application of interest would introduce an additional source of uncertainty.

#### 3.5.3 Influencing Factors in the Computational Modeling Phase.

The computational model form is influenced by the continuous form of the mathematical model, the chosen numerical algorithm, and its associated discretization, as well as the instantiation of that algorithm into the programing code [27,28,71–73]. Meanwhile, the computational model inputs are influenced by the continuous form of the mathematical model inputs, the discretization process (i.e., spatial and temporal discretization of the input conditions), and the input preparation process [28,71–73]. In solving the computational model, approximate numerical solutions are typically used instead of exact solutions. The numerical approximations are influenced by selection of the numerical algorithm, instantiation of that algorithm into the programing code (e.g., unidentified errors in the code after the code verification process), the discretization process, computer round-offs, and possible iterative convergence errors [49]. Note that the iterative convergence errors may occur when discretization of the mathematical model results in a simultaneous set of algebraic equations that are solved using approximation techniques [49] and, hence, are influenced by the discrete form of the computational model. Finally, the simulation result obtained from running the computer code at the input conditions of interest is influenced by the computational model form, the computational model inputs, and the numerical approximations [26–28,71–73].

Sources of uncertainty associated with the influencing factors in the computational modeling phase are mapped to three main groups, i.e., those that contribute to the uncertainties in (i) the computational model inputs $x=\{IC, BC, SG, SE, MP\}$, (ii) the computational model form $g_{\mathrm{comp}}(\cdot)$, and (iii) the numerical approximations. In addition, due to its discretized nature, the computational model also contains input parameters that define its spatiotemporal grid sizes, number of iterations, and decimal precision. As a computational model is always constrained by limited computational resources, there are inevitable sources of error (e.g., discretization error, iteration error, and precision error) associated with these parameters. These sources of error can either be quantified (and then removed from the model response) or converted into numerical uncertainties (see Sec. 4.1.5 for more details). Multiple forms of the computational model may exist depending on: (a) the presence of multiple forms of the mathematical model as discussed in Sec. 3.5.2, or (b) the choice of numerical algorithm and discretization methods that are used to derive the form of the computational model. In such cases, uncertainty associated with the lack of knowledge as to which computational model form best represents the system of interest needs to be considered.

From Fig. 5, it can be seen that the uncertainties that have arisen during the conceptual and mathematical modeling phases are carried over to the computational modeling phase through the causal relationships among the influencing factors. Thus, to validate the simulation prediction using its associated uncertainty, the theoretical causal framework indicates that the uncertainty can be quantified by characterizing, propagating, and aggregating sources of uncertainty considered in the computational modeling phase. Modeling and measuring techniques for operationalizing this theoretical causal framework are, however, subject to future research.

## 4 Probabilistic Validation: Methodological Platform

This section describes the PV methodological platform (Fig. 6) developed in this research to operationalize the theoretical foundation established in Sec. 3. At this stage of the research, the PV methodology evaluates the validity of simulation predictions of a complex, hierarchical system. This means that the system-level simulation model may include multiple element-/submodels (e.g., each element/submodel may represent a particular element such as a component, a subsystem, or an isolated set of features/physics of the overall system) that are organized in a hierarchical structure. Other types of complex systems, for instance, coupled, multiphysics systems, are outside of the current research scope and left to future research.

The PV methodological steps are organized into three main modules (modules A, B, and C in Fig. 6). Module A is grounded on characteristics #1, #2, and #5 of the PV theoretical foundation in Sec. 3. Modules B and C are grounded on characteristics #3 and #4, respectively. A shared database is created to facilitate the communication among these modules such that it (i) provides necessary input information for execution of the modules, and (ii) receives and stores results/updated information obtained from these modules. Sections 4.1–4.3 provide details on each of the three modules along with the associated methodological steps.

### 4.1 Module A: Uncertainty Screening, Characterization, Propagation, and Aggregation.

Module A provides a comprehensive uncertainty analysis framework for quantifying the uncertainty associated with an SRQ of interest. For simplicity, the uncertainty analysis procedure in Module A is illustrated with an assumption that the SRQ of interest is a single scalar, stationary quantity $y$, predicted by the computational model. However, the procedure shown in module A is also applicable to cases with independent vector-valued (multidimensional) model output. An extension to independent vector-valued model output could simply be attained by applying the proposed uncertainty analysis procedure for each independent component of the model output vector separately. Meanwhile, an extension to time-dependent model output can be done, though computationally demanding, by considering the time-dependent SRQ as a collection of snapshots of the corresponding stationary SRQ at multiple timesteps of interest. Details of this extension are subject to future research.

Assuming that the system of interest $S$ is a hierarchical system with $N_i$ levels, each system level $i$ (with $i=\{1,2,\ldots,N_i\}$ and $i=N_i$ being the highest hierarchical level associated with the system of interest) is composed of $N_{i,j}$ elements. Each system element $j$ (with $j=\{1,2,\ldots,N_{i,j}\}$) in the system level $i$ is subject to an element model, denoted as $M_{i,j}$. In other words, the system-level model $M$ is composed of multiple element models $M_{i,j}$ to represent the complex hierarchical structure of the system of interest. Depending on the configuration of the hierarchical structure of the system, outputs of a lower-level element model (i.e., its point-estimate prediction and the associated uncertainty) may be used as inputs to higher-level element model(s) in the system hierarchy. Note that element models on the same hierarchical level may share some inputs that are obtained from lower-level element models, yet are assumed to be independent of each other, i.e., element models on the same hierarchical level do not interact with each other in a way that output from one model is input to another model or vice versa. Element models on the same hierarchical level that dynamically interact with each other (i.e., coupled models) are out of the scope of this paper.

It is also assumed that each element model $M_{i,j}$ may have $N_{i,j,k}$ plausible, independent model forms, and that each of these model forms can be associated with a corresponding constituent model $M_{i,j,k}$, where $k=\{1,2,\ldots,N_{i,j,k}\}$. Note that this assumption may not always be valid. An important issue in aggregating model predictions obtained with multiple plausible model forms is the possibility of dependence among these model forms. Section 4.1.10 provides more discussion around this assumption.

The uncertainty analysis in module A adopts a bottom-up approach, meaning that the process starts with an element model at the lowest level of the system hierarchy ($i=1, j=1$), quantifies the uncertainty associated with its prediction (while considering its $N_{i,j,k}$ plausible model forms), and then moves to the next element model on the same hierarchical level ($i=1, j=2$). Once all element models on the current level ($i=1$) are quantified, the process continues with an element model on the next hierarchical level ($i=2, j=1$). This process continues until it reaches the system-level model ($i=N_i$) where the model output is the SRQ of interest. In Fig. 6, this bottom-up approach is represented by three logic nodes for $i, j, k$ in the area on the left-hand side of module A.
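The iteration order described above can be sketched as follows. The helper function and the example hierarchy are hypothetical illustrations (not from the paper); indices are 1-based to match the text.

```python
def bottom_up_order(n_levels, elements_per_level):
    """Yield (i, j) pairs in the bottom-up order described in the text:
    all elements on level 1 first, then level 2, and so on, ending at
    the system-level model (i = n_levels). Indices are 1-based."""
    for i in range(1, n_levels + 1):
        for j in range(1, elements_per_level[i - 1] + 1):
            yield (i, j)

# Example: an assumed 3-level hierarchy with 3, 2, and 1 element(s) per level.
order = list(bottom_up_order(3, [3, 2, 1]))
# order == [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)];
# the system-level model (i = 3, j = 1) is visited last.
```

Within each (i, j) visit, the ten analysis steps below would additionally loop over the $k$ plausible model forms of that element model.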

The uncertainty analysis process in module A includes ten steps (steps 1–10 in Fig. 6), conducted in an iterative fashion, to obtain the total uncertainty associated with element model $M_{i,j}$ prediction (considering $N_{i,j,k}$ plausible model forms of $M_{i,j}$). When the iteration reaches the system-level model $M$, i.e., when $i=N_i$ and $j=1$ (since we are only interested in a single scalar system-level model output), the outcome is the total uncertainty associated with the SRQ of interest at the system level. Details of these ten uncertainty analysis steps are provided below.

#### 4.1.1 Step 1 in Figure 6: Theory-Based Analysis and Qualitative Screening of Causal Influencing Factors and Their Sources of Uncertainties.

This step defines the scope of the uncertainty analysis for each constituent model form $M_{i,j,k}$ in the system hierarchy. It consists of two substeps.

*Substep 1.1* is a theory-based analysis in which all the relevant factors that can influence the prediction of $M_{i,j,k}$ are identified by applying the PV theoretical causal framework (Fig. 5) considering the specific details in the development process of that constituent model (e.g., assumptions and approximations made in the three modeling phases, the model structure/form, and input parameters). Depending on the complexity of $M_{i,j,k}$, many influencing factors may be identified during this substep. Ideally, all of these factors and their associated uncertainties should be explicitly considered in the scope of the uncertainty analysis process. Often, though, this full-scope uncertainty analysis is neither practical (e.g., due to computational burdens, limited data, or other cost issues) nor necessary (e.g., due to some factors having insignificant influence on the model prediction) and, hence, the scope should be refined to be more practical.

*Substep 1.2* provides a mechanism that helps define a more practical scope for the uncertainty analysis of each constituent model $M_{i,j,k}$ by considering the unique characteristics and constraints of the M&S problem at hand (e.g., availability of supporting data, budget and time limitations, required level of detail of the analysis). This substep is essentially a qualitative screening (Fig. 7) that focuses on (i) determining factors that would be considered explicitly versus those that would be considered implicitly in the uncertainty analysis; and (ii) determining, among those factors that would be considered explicitly, factors that would be treated as fixed values (i.e., deterministic factors) versus factors that would be treated as uncertain values (i.e., nondeterministic factors) in the uncertainty analysis. In this context, “implicit” consideration of an influencing factor means that analysts would acknowledge the presence of the factor and its potential influence on the model prediction but would not quantify this influence in the uncertainty analysis. In other words, the results obtained from the uncertainty analysis (e.g., the total uncertainty associated with the prediction of $M_{i,j,k}$) would be conditional on the “default” values and influences of those factors that are implicitly considered. This is in contrast to an “explicit” consideration, where the point-estimate value and/or uncertainty associated with the factor and the corresponding influence (of that factor) on the model prediction are quantified in the uncertainty analysis.

To classify influencing factors into “implicit” and “explicit” categories, expert elicitation and/or insights from literature review (e.g., the number and credibility of evidence supporting the importance of the causal factors and their influence on the model prediction) can be leveraged to qualitatively rank the factors based on their importance (e.g., with regard to their influence on the model prediction). Those factors ranked as “important” and “unimportant” can then be categorized into the “explicit” and “implicit” categories, respectively. An example of this qualitative ranking approach is the phenomena identification and ranking table technique, which identifies the phenomena considered in simulation codes and ranks their importance using either expert judgment alone [83] or a combination of expert judgment and an analytic hierarchy process [84,85].

In determining whether an “explicit” factor should be treated as a deterministic factor (i.e., its state is represented by a point-estimate value) or as a nondeterministic factor (i.e., its state is represented by an uncertainty distribution), one would need to consider the nature of the factor, the requirements of the M&S problem, as well as the constraints on supporting resources (e.g., available data for uncertainty characterization). This determination should also leverage experience from domain experts and insights from available literature. The nondeterministic factors identified from this step are called “*potentially important causal influencing factors*.” Based on the structure of the theoretical causal framework, one would be able to classify these “potentially important causal influencing factors” into three categories:

Category 1: factors that contribute to the uncertainty in the computational model inputs.

Category 2: factors that contribute to the uncertainty in the numerical approximations used for estimating the computational model prediction.

Category 3: factors that contribute to the uncertainty in the computational model form.

The qualitative screening and categorizations of causal influencing factors in step 1 are illustrated in Fig. 7. Uncertainties associated with factors of the three nondeterministic categories above are analyzed in the subsequent steps.

#### 4.1.2 Step 2 in Figure 6: Approximate Characterization of Uncertainties Associated With Input Parameters of Model $M_{i,j,k}$.

Nondeterministic factors contributing to the uncertainty in $X_{i,j,k}$ mainly include those associated with initial and boundary conditions, system geometry, system excitations, and other modeling parameters of $M_{i,j,k}$ (Eq. (2)). Depending on the results of the qualitative screening in step 1, other nondeterministic factors such as those associated with the input preparation process and the spatiotemporal discretization of the input conditions may also be included within the scope of approximate uncertainty characterization. The purpose of this approximate uncertainty characterization is to facilitate a quantitative screening, performed in step 3, that aims at reducing the dimension of the input space of $M_{i,j,k}$ to a manageable set of input parameters. Therefore, only available data (e.g., historical and testing data) and expert judgment should be used, instead of spending resources to obtain, for example, additional experimental data. Moreover, the approximate uncertainty characterization in step 2 does not distinguish between aleatory and epistemic uncertainties. Instead, for each of the uncertain input parameters considered in this step, an adequate probability distribution is selected to model its uncertainty, and the parameters of this distribution are estimated. The probability distributions and their parameters for $X_{i,j,k}$ can be determined using appropriate parameter estimation techniques (e.g., the maximum likelihood estimation method, the Bayesian approach) and tested with goodness-of-fit tests (e.g., the Kolmogorov–Smirnov and chi-squared tests). If an optimal probability distribution is required when characterizing the uncertainty in an input parameter, methods such as the Bayesian information criterion [86,87] can be leveraged.
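A hedged sketch of this characterization step is shown below: a gamma candidate family is fitted by maximum likelihood and checked with a Kolmogorov–Smirnov test. The data are synthetic stand-ins for "historical" observations, and `scipy` availability is assumed; the paper itself does not prescribe these particular tools.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.gamma(shape=2.0, scale=3.0, size=500)  # stand-in "historical" data

# Maximum likelihood fit of a candidate distribution family
# (location fixed at zero for a strictly positive physical quantity).
shape, loc, scale = stats.gamma.fit(data, floc=0.0)

# Kolmogorov-Smirnov goodness-of-fit test against the fitted cdf.
ks_stat, p_value = stats.kstest(data, "gamma", args=(shape, loc, scale))
# A small p-value would signal that the candidate family should be rejected.
```

In practice, several candidate families would be fitted and compared (e.g., via the Bayesian information criterion mentioned above) before one is selected.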

In this step, it is important that the uncertainties associated with the data sources are accounted for appropriately. For example, techniques/procedures are available for eliciting, analyzing, and characterizing expert opinion [88–91], as well as for reducing the impacts of common expert elicitation pitfalls such as misinterpretation and misrepresentation [88]. Regarding experimental data, techniques and methods are also available to deal with random measurement errors and systematic/bias errors in experimental measurements [92,93]. These available techniques and procedures for handling data uncertainties should be leveraged in the PV methodology when applicable.
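One common way to treat random and systematic measurement errors together is a root-sum-square combination of their standard uncertainties, in the spirit of the experimental uncertainty references cited above. The function and readings below are illustrative assumptions, not a procedure taken from this paper.

```python
import math

def combined_standard_uncertainty(readings, u_systematic):
    """Combine the random standard uncertainty of the mean with a
    systematic (bias) standard uncertainty in root-sum-square."""
    n = len(readings)
    mean = sum(readings) / n
    var = sum((x - mean) ** 2 for x in readings) / (n - 1)
    u_random = math.sqrt(var / n)  # standard deviation of the mean
    return mean, math.sqrt(u_random ** 2 + u_systematic ** 2)

# Five hypothetical repeated measurements plus an assumed 0.5-unit bias
# standard uncertainty from the instrument calibration certificate.
readings = [101.2, 99.8, 100.5, 100.9, 99.6]
mean, u_c = combined_standard_uncertainty(readings, u_systematic=0.5)
```

Here the systematic term dominates, which is the kind of insight that would later steer uncertainty reduction effort toward recalibration rather than more repeat measurements.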

#### 4.1.3 Step 3 in Figure 6: Quantitative Screening of Uncertainties Associated With Input Parameters of Model $M_{i,j,k}$.

This step uses sensitivity analysis to conduct a quantitative screening of the uncertainties associated with the input parameters $X_{i,j,k}^{(h)}$ of the constituent model $M_{i,j,k}$ (where $h=\{1,2,\ldots,n_{X_{i,j,k}}\}$ and $n_{X_{i,j,k}}$ is the dimension of the input vector $X_{i,j,k}$). This step aims at reducing the dimension of the nondeterministic input space of $M_{i,j,k}$ by identifying those input parameters that have a negligible influence on the model output $Y_{i,j,k}$ and treating them as deterministic parameters (or using their approximate uncertainty distributions). The input parameters that are not screened out from this quantitative screening are kept as nondeterministic variables, and their uncertainties are propagated to the model prediction level in step 6. Hereinafter, these unscreened sources of uncertainty are called “potentially dominant sources of uncertainty.”

The quantitative screening is based on the Morris elementary effects (EE) method [67], in which the range of each input parameter is discretized into a *p*-level grid so that each input parameter $X_{i,j,k}^{(h)}$ can be randomly sampled from its discretized values. Using two sets of input parameters where $X_{i,j,k}^{(h)}$ has two different values, i.e., $x_{i,j,k}^{(h)}$ and $(x_{i,j,k}^{(h)}+\Delta)$, while the other parameters are kept the same, the EE of input parameter $X_{i,j,k}^{(h)}$ is defined as

$$EE_{i,j,k}^{(h)}=\frac{M_{i,j,k}\left(x_{i,j,k}^{(1)},\ldots,x_{i,j,k}^{(h)}+\Delta,\ldots,x_{i,j,k}^{(n_{X_{i,j,k}})}\right)-M_{i,j,k}\left(x_{i,j,k}^{(1)},\ldots,x_{i,j,k}^{(h)},\ldots,x_{i,j,k}^{(n_{X_{i,j,k}})}\right)}{\Delta} \tag{5}$$

where $\Delta$ is a predetermined multiple of $1/(p-1)$ and *p* is assumed to be even [67]. An empirical distribution $F_{i,j,k}^{(h)}$ of $EE_{i,j,k}^{(h)}$ associated with input parameter $X_{i,j,k}^{(h)}$, i.e., $EE_{i,j,k}^{(h)}\sim F_{i,j,k}^{(h)}$, is obtained by randomly sampling different sets of values from the input space of $X_{i,j,k}^{(h)}$ and repeating the calculation of $EE_{i,j,k}^{(h)}$ using Eq. (5). Based on the resultant random samples of $EE_{i,j,k}^{(h)}$, two sensitivity measures [67], i.e., the mean value $\mu_{i,j,k}^{(h)}$ and standard deviation $\sigma_{i,j,k}^{(h)}$ of the distribution $F_{i,j,k}^{(h)}$, can be estimated as follows:

$$\mu_{i,j,k}^{(h)}=\frac{1}{r_{i,j,k}}\sum_{s=1}^{r_{i,j,k}}EE_{i,j,k}^{(h,s)} \tag{6}$$

$$\sigma_{i,j,k}^{(h)}=\sqrt{\frac{1}{r_{i,j,k}-1}\sum_{s=1}^{r_{i,j,k}}\left(EE_{i,j,k}^{(h,s)}-\mu_{i,j,k}^{(h)}\right)^{2}} \tag{7}$$

where $EE_{i,j,k}^{(h,s)}$ is the $s$th sampled elementary effect and $r_{i,j,k}$ is the number of observations of $F_{i,j,k}^{(h)}$ per input parameter for each model output, estimated at a cost of $r_{i,j,k}\times(n_{X_{i,j,k}}+1)$ model evaluations.

$\mu_{i,j,k}^{(h)}$ assesses the overall influence of input parameter $X_{i,j,k}^{(h)}$ on the model output $Y_{i,j,k}$, with higher values of $\mu_{i,j,k}^{(h)}$ implying a larger main effect. Meanwhile, $\sigma_{i,j,k}^{(h)}$ estimates the ensemble of that input parameter's effects and indicates the influences of nonlinearity and interactions among input parameters on the model output. A higher value of $\sigma_{i,j,k}^{(h)}$ implies that the EE of input parameter $X_{i,j,k}^{(h)}$ varies considerably depending on the choice of the values of other input parameters in the input space. The sole use of $\mu_{i,j,k}^{(h)}$ may fail to find a parameter that has considerable influence on the output of model $M_{i,j,k}$ in situations where the EE distribution has both positive and negative elements, i.e., model $M_{i,j,k}$ is nonmonotonic or has interactions. In such situations, some interactions may cancel each other out in the process of computing $\mu_{i,j,k}^{(h)}$, producing a low value of $\mu_{i,j,k}^{(h)}$ for an important parameter. In these situations, both $\mu_{i,j,k}^{(h)}$ and $\sigma_{i,j,k}^{(h)}$ should be considered simultaneously [67].

Alternatively, Campolongo et al. [95] suggest using another measure, $\mu^*$, which is the mean of the absolute values of the EE, to solve the problem of different signs. The drawback is the loss of information on the sign of the EE. To recover this loss of information, a simultaneous examination of both $\mu$ and $\mu^*$ can be performed with no extra computational cost. For example, if $\mu$ is low while $\mu^*$ is high, it indicates that the parameter being considered carries effects of different signs. In practice, the use of $\mu^*$ has been shown to be more convenient when the model output contains several variables [95]. To extract maximum sensitivity information, however, all three of these sensitivity measures may be used [96].
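The three measures can be illustrated numerically on a toy, nonmonotonic stand-in model. The simple one-at-a-time random perturbation below is a didactic simplification, not the full Morris trajectory design; the model and all numbers are assumptions.

```python
import numpy as np

def toy_model(x):
    # Hypothetical stand-in for M_{i,j,k}: nonmonotonic in x[0].
    return (x[0] - 0.5) ** 2 + 0.1 * x[1]

def elementary_effects(model, h, n_inputs, r, delta=0.25, seed=0):
    """Sample r elementary effects of input h (0-based) on the unit hypercube."""
    rng = np.random.default_rng(seed)
    ee = np.empty(r)
    for s in range(r):
        x = rng.uniform(0.0, 1.0 - delta, size=n_inputs)
        x_pert = x.copy()
        x_pert[h] += delta
        ee[s] = (model(x_pert) - model(x)) / delta
    return ee

ee = elementary_effects(toy_model, h=0, n_inputs=2, r=200)
mu = ee.mean()               # near zero: positive and negative EEs cancel
sigma = ee.std(ddof=1)       # large: strong nonlinearity in x[0]
mu_star = np.abs(ee).mean()  # large: flags x[0] as influential despite low mu
```

This reproduces the pathology discussed above: `mu` alone would wrongly screen out `x[0]`, while `mu_star` (and `sigma`) correctly flag it as influential.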

Note that even though the ranking of the input parameters may vary from output to output (when several model outputs are of interest), the Morris EE analysis method may still be useful for identifying a subset of input parameters that is noninfluential for all model outputs.

It is also important to note that the original structure of the Morris method [67] as highlighted above cannot provide accurate screening results if the input parameters are dependent. However, recent efforts such as a study by Ge and Menendez [97] have succeeded in extending the method for screening dependent inputs. In their study, Ge and Menendez proposed two extended elementary effects, i.e., the independent elementary effects $EE^{\mathrm{ind}}$ and the full elementary effects $EE^{\mathrm{full}}$, and offered a qualitative analysis based on these two measures to determine whether the corresponding input parameter is a noninfluential input [97].

As the quantitative screening in this step requires execution of the model $M_{i,j,k}$, it is important that some reasonable efforts at code verification be made in advance to avoid unnecessary numerical errors that could be induced from (i) the selection of the numerical algorithm for solving the corresponding mathematical model and (ii) the instantiation of that algorithm into the programing code underlying $M_{i,j,k}$. These efforts are discussed in more detail in step 5 of the PV methodology.

#### 4.1.4 Step 4 in Figure 6: Detailed Characterization of Unscreened Sources of Input Parameter Uncertainty Identified in Step 3.

This step uses additional data (e.g., data collected from additional experiments and/or system operation, subject domain expert judgment) to refine the uncertainty characterization for the unscreened input parameters resulting from the quantitative screening in step 3. Similar to step 2, parameter estimation methods, goodness-of-fit tests, and techniques for treatment of data uncertainties are all needed for the uncertainty characterization/refinement in this step.

With a p-box (probability box) representation of an uncertain input parameter, the distribution type of $F_{X_{i,j,k}^{(h)}}$ (e.g., a gamma distribution) captures the aleatory uncertainty in $X_{i,j,k}^{(h)}$, while the ranges of the hyperparameters $\theta_{i,j,k}^{(h)}$ capture the epistemic uncertainty in $X_{i,j,k}^{(h)}$ due to the lack of knowledge of their true values. Note that the p-box is a flexible representation that can be used not only when the input parameter is classified as a source of mixed aleatory and epistemic uncertainties but also for sources of purely aleatory uncertainty and purely epistemic uncertainty [28].
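A distributional p-box of this kind can be sketched as follows: a fixed gamma family (aleatory part) whose hyperparameters are only known to lie in intervals (epistemic part). All numerical values are illustrative assumptions, and `scipy` is assumed available.

```python
import numpy as np
from scipy import stats

# Interval-valued hyperparameters (epistemic); numbers are assumptions.
shape_interval = (1.8, 2.4)
scale_interval = (2.5, 3.5)

y = np.linspace(0.0, 30.0, 301)
rng = np.random.default_rng(1)
cdfs = []
for _ in range(50):  # sample the epistemic hyperparameter box
    a = rng.uniform(*shape_interval)
    s = rng.uniform(*scale_interval)
    cdfs.append(stats.gamma.cdf(y, a, scale=s))  # one aleatory cdf per sample
cdfs = np.asarray(cdfs)

lower_cdf = cdfs.min(axis=0)  # lower envelope of the cdf family
upper_cdf = cdfs.max(axis=0)  # upper envelope of the cdf family
```

The pair `(lower_cdf, upper_cdf)` bounds every cdf the parameter could have under the stated epistemic intervals, which is exactly what is propagated in the later steps.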

In some cases, depending on the available information and expert judgment associated with an input parameter, analysts may even be able to distinguish between two types of epistemic uncertainty, i.e., systematic bias and statistical variability, and provide insights as to which type of epistemic uncertainty is the dominating contributor. For example, intentional conservatisms that an analyst introduces into the analysis (e.g., by using only experimental values between the 90th- and 95th-percentile values of a set of measurements to characterize the maximum heat release rate of a fire source), if spotted, should be treated properly as a source of systematic bias (rather than statistical variability). Such distinction insights would be useful for prioritizing uncertainty reduction strategies.

#### 4.1.5 Step 5 in Figure 6: Characterization of Uncertainties Associated With Numerical Approximations for Model $M_{i,j,k}$.

This step characterizes the uncertainties associated with factors belonging to category 2, identified in step 1 (Fig. 7), i.e., the potentially important nondeterministic factors that contribute to uncertainty in the numerical approximations (that are embedded in the solution of the computational model $M_{i,j,k}$). As in Fig. 5, depending on the results of the qualitative screening in step 1, these factors may theoretically include error sources associated with (a) the selection of a numerical algorithm for solving the mathematical model, (b) the instantiation of that algorithm into the programing code, (c) the computer round-offs, (d) the iterative convergence, and (e) the discretization process. In practice, however, most errors associated with sources (a) and (b) are often determined via code verification techniques such as the order-of-accuracy test [26,102] and, once determined, would be located (e.g., via code or algorithm debugging) and removed from the computational model. Note that, ideally, this activity should be done before the quantitative screening in step 3 of the PV methodology since the quantitative screening requires execution of the computational model. Any errors associated with sources (a) and (b) that remain after the code verification, and their associated impacts on the model prediction, are essentially impossible to estimate. Such errors should be treated as “implicitly considered” factors as discussed in step 1 and, therefore, would not be included in category 2 in Fig. 7 (i.e., they are out of the uncertainty characterization scope in step 5). Consequently, the uncertainty characterization in step 5 should focus on the numerical approximation errors induced from sources (c), (d), and (e) listed above, i.e., computer round-off errors, iterative convergence errors, and discretization errors.

Methods are available to estimate computer round-off errors, iterative convergence errors, and discretization errors, as can be found in the ASME V&V 20-2009 [26] and Roy [49]. Computer round-off errors occur due to the fact that only a limited number of significant figures can be used to store floating-point numbers in a computer. This type of error can be estimated by repeating the simulation with higher precision arithmetic and estimating the difference between, for example, single versus double precision arithmetic [49]. Iterative errors can arise from the difference between an approximate solution and the exact solution to the same discretized equations (of the computational model). Iterative error can be assessed and monitored by examining norms of the iterative residuals [49]. Discretization errors arise due to the spatiotemporal discretization of the mathematical model and the input conditions. This type of error can be represented by the difference between the exact solution to the discretized equations (of the computational model) and the exact solution to the mathematical model [26]. Methods for estimating discretization errors are available, as summarized in Oberkampf et al. [28], including recovery methods (e.g., Richardson extrapolation, order extrapolation, recovery methods from finite elements) and residual-based methods (e.g., discretization error of transport equations, defect correction methods, implicit/explicit residual methods in finite elements, adjoint methods for estimating the error in solution functionals).

If these numerical approximation errors can be thoroughly quantified (i.e., both their magnitudes and signs can be estimated) with a reasonable level of confidence, their impact on the model prediction can theoretically be eliminated (with sufficient computing resources). In this case, these precisely quantifiable errors are not subject to the uncertainty characterization in step 5. In reality, however, this is not usually the case, i.e., this error quantification is often impractical and/or the errors can only be roughly estimated. In such cases, a common practice is to convert these errors to epistemic uncertainties where the uncertainties originate from the error estimation process [26,49]. Methods such as the grid convergence index [103] can be used for this conversion, and the resulting numerical approximation uncertainty is usually represented as an interval about the model response, as documented in ASME V&V 20-2009 [26] and Roy [49]. This common practice is consistent with the fact that these numerical approximation errors are epistemic in nature, i.e., they originate from a lack of knowledge and their uncertainties can be reduced with additional information. As proposed by Roy and Oberkampf [19], the total numerical approximation uncertainty associated with the model prediction of interest (i.e., $Y_{i,j,k}$), hereby denoted as $U_{i,j,k}^{\mathrm{num}}$, can be conservatively obtained by summing the “converted” epistemic uncertainties. Inclusion of $U_{i,j,k}^{\mathrm{num}}$ in the total uncertainty associated with the model prediction $Y_{i,j,k}$ is discussed in step 8.
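The conversion of a grid-study error estimate into a numerical uncertainty can be sketched as below, using Richardson extrapolation together with a Roache-style grid convergence index (GCI), in the spirit of [26,49,103]. The grid-study solution values, refinement ratio, and order of accuracy are assumed for illustration only.

```python
def gci_fine(f_fine, f_coarse, r, p_order, fs=1.25):
    """Grid convergence index (relative) for the fine-grid solution.

    f_fine, f_coarse : solutions on the fine and coarse grids
    r                : grid refinement ratio (> 1)
    p_order          : observed or formal order of accuracy
    fs               : safety factor (1.25 is a typical three-grid choice)
    """
    rel_err = abs((f_coarse - f_fine) / f_fine)
    return fs * rel_err / (r ** p_order - 1.0)

# Illustrative (assumed) two-grid study: fine and coarse solutions of an SRQ.
f1, f2, r, p = 100.0, 104.0, 2.0, 2.0

# Richardson-extrapolated estimate of the exact (zero-grid-spacing) solution:
f_exact_est = f1 + (f1 - f2) / (r ** p - 1.0)

# Discretization error converted to an epistemic numerical uncertainty band:
u_num = gci_fine(f1, f2, r, p) * abs(f1)
```

The interval `f1 ± u_num` is then one of the "converted" epistemic contributions summed into $U_{i,j,k}^{\mathrm{num}}$.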

#### 4.1.6 Step 6 in Figure 6: Propagation of Unscreened Input Parameter Uncertainties Through Model $M_{i,j,k}$.

This step propagates all uncertainties associated with the unscreened input parameters, $X_{i,j,k}^{(*)}$, through the model $M_{i,j,k}$. Sources of aleatory and epistemic uncertainties associated with $X_{i,j,k}^{(*)}$ are separately propagated because they have different impacts on the total uncertainty of the model prediction and would be addressed distinctly later if one seeks to reduce the epistemic uncertainty in the model prediction [104]. Note that model $M_{i,j,k}$, used for uncertainty propagation in this step, should already be updated by correcting/eliminating those numerical errors that can be precisely quantified in step 5. The remaining numerical errors that cannot be precisely quantified have been accounted for in the total numerical approximation uncertainty $U_{i,j,k}^{\mathrm{num}}$ (obtained in step 5) and will be aggregated into the total uncertainty of the model prediction in step 8.

Multiple methods/techniques are available in the literature for the propagation of input uncertainties through a simulation model [94], e.g., Monte Carlo sampling methods [105], perturbation methods [106], and stochastic spectral methods [107,108]. Implementation of Monte Carlo sampling methods is simple and straightforward; yet they usually converge slowly and require a large number of samples and, therefore, a large number of model evaluations. For faster convergence rates, advanced sampling strategies/techniques (e.g., Latin hypercube sampling, importance sampling, and quasi-Monte Carlo sampling) can be considered [109,110]. Perturbation methods are based on Taylor expansions and are usually referred to as techniques for propagation of moments, i.e., they provide only an estimate for the mean and variance (or covariance) in contrast to Monte Carlo sampling methods that result in distribution functions. Stochastic spectral methods (e.g., polynomial chaos methods) have the potential to significantly reduce the number of samples required for statistical convergence [107,108]; however, they are usually practical only when the number of uncertain inputs is relatively small. In risk analysis applications, sampling methods are usually preferred for uncertainty propagation because simulation models involved in these applications often have large numbers of uncertain inputs and/or the input parameters are statistically correlated. In addition, Monte Carlo sampling methods can propagate aleatory and epistemic uncertainties even when the simulation model of interest is treated as a “black-box” model. For these reasons, Monte Carlo sampling methods are selected for the PV methodology.

To separately propagate aleatory and epistemic input uncertainties, a double-loop Monte Carlo sampling strategy is performed in a nested iteration fashion. Typically, epistemic uncertain inputs are first sampled on the outer loop and then, for each sample set of uncertain epistemic inputs, aleatory uncertain inputs are sampled on the inner loop. An entire cumulative distribution function (cdf) for the model response of interest can be obtained for each particular set of epistemic uncertain inputs. Consequently, with multiple sample sets of epistemic uncertain inputs, one would obtain an ensemble of cdfs (i.e., a family of distributions or a p-box) for the model response where the upper and lower bounds of the cdf ensemble can be determined and visualized.
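The nested double-loop strategy above can be sketched numerically as follows; the model, parameter ranges, and distributions below are hypothetical placeholders for illustration only, not the $M_{i,j,k}$ of any particular application:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def model(x, theta):
    """Hypothetical stand-in for the simulation model response."""
    return theta[0] * x + theta[1]

N_EP, N_AL = 50, 1000          # outer (epistemic) and inner (aleatory) sample sizes
y_grid = np.linspace(-5.0, 15.0, 201)

cdfs = []
for _ in range(N_EP):
    # Outer loop: one realization of the epistemic inputs (interval-bounded here)
    theta = rng.uniform(low=[0.8, -1.0], high=[1.2, 1.0])
    # Inner loop: propagate aleatory variability for this fixed epistemic sample
    x = rng.normal(loc=5.0, scale=1.0, size=N_AL)
    y = np.sort(model(x, theta))
    # Empirical cdf of the response on a common grid
    cdfs.append(np.searchsorted(y, y_grid, side="right") / N_AL)

cdfs = np.asarray(cdfs)
p_box_lower = cdfs.min(axis=0)   # lower bound of the cdf ensemble
p_box_upper = cdfs.max(axis=0)   # upper bound of the cdf ensemble
```

Each outer iteration yields one cdf of the response; the pointwise extremes of the resulting cdf ensemble bound the p-box described above.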

In the PV methodology, the uncertainty associated with the choice of the uncertainty propagation methods/techniques (i.e., Monte Carlo sampling methods) is also considered as it can be a significant contributor to the total uncertainty associated with the model/simulation prediction. Consequently, the uncertainty arising from the use of finite numbers of samples for the two loops of the nested Monte Carlo sampling strategy can be treated in this step with corresponding convergence studies. This sampling uncertainty can either be assessed using confidence intervals or be characterized probabilistically. For instance, Bui et al. [12] combined replicated Latin hypercube sampling with bootstrap resampling to (i) perform convergence studies for the two loops of the nested Monte Carlo sampling strategy, and (ii) generate confidence intervals for their sampling-based simulation results, representing the uncertainty associated with the selected sampling-based uncertainty propagation method.
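The percentile-bootstrap portion of such a sampling-uncertainty assessment might look like the following sketch (it is not the specific procedure of Ref. [12]; the sample distribution and the chosen statistic, a 95th percentile, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical sampling-based results: N model evaluations of the SRQ
y_samples = rng.lognormal(mean=0.0, sigma=0.5, size=500)

def bootstrap_ci(samples, stat, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap confidence interval representing the sampling
    uncertainty of the statistic `stat` (e.g., a percentile estimate)."""
    n = len(samples)
    boots = np.array(
        [stat(samples[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    )
    return np.quantile(boots, [alpha / 2.0, 1.0 - alpha / 2.0])

# 95% CI on the estimated 95th percentile of the SRQ
lo, hi = bootstrap_ci(y_samples, lambda s: np.quantile(s, 0.95))
```

Narrowing of this interval as the number of model evaluations grows is one practical convergence check for the sampling-based propagation.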

After performing step 6 of the PV methodology, one would be able to obtain a p-box of the model prediction of interest (i.e., $Y_{i,j,k}$), which represents the impact of both aleatory and epistemic input uncertainties on the model prediction/response. This p-box is bounded by two cdfs, i.e., a lower-bound cdf ($\underline{F}_{Y_{i,j,k}}$) and an upper-bound cdf ($\overline{F}_{Y_{i,j,k}}$), and is denoted by $[\underline{F}_{Y_{i,j,k}},\overline{F}_{Y_{i,j,k}}]$.

#### 4.1.7 Step 7 in Figure 6: Characterization of Model-Form Uncertainty Associated With Model $M_{i,j,k}$.

Model-form uncertainty associated with $M_{i,j,k}$ arises due to the assumptions, conceptualizations, abstractions, approximations, and mathematical formulations introduced during the development of $M_{i,j,k}$. These are sources of uncertainty associated with the model $M_{i,j,k}$ itself and are different from those related to the presence of multiple plausible model forms (which will be dealt with in step 10). After step 6, the PV methodology provides three paths to account for the model-form uncertainty associated with model $M_{i,j,k}$. Selection of an appropriate path depends on (i) the availability of validation data associated with the model $M_{i,j,k}$ prediction ($Y_{i,j,k}$) and (ii) the relationship between the existing validation domain (if any) and the application domain where the model $M_{i,j,k}$ will be used. These three paths, as listed below, are represented by the logic node following step 6 in Fig. 6.

*Path 1*: Characterization of the model-form uncertainty when a validation domain exists and the application domain is completely enclosed inside it.

*Path 2*: Characterization of the model-form uncertainty when no validation domain exists.

*Path 3*: Bayesian updating of the uncertainty p-box $[\underline{F}_{Y_{i,j,k}},\overline{F}_{Y_{i,j,k}}]$ obtained in step 6 when a validation domain exists but the application domain is not enclosed within it.

Paths 1 and 2 are within the scope of step 7 and will be addressed in substeps 7A and 7B, respectively. Path 3 will be addressed in step 9.

##### 4.1.7.1 Substep 7A Data-driven characterization of model-form uncertainty.

When validation data associated with the model output level are available (i.e., some validation domain exists) and the application domain is enclosed inside the validation domain, model-form uncertainty can be estimated by comparing the model response with the validation data using a certain validation metric. Various validation metrics are available in the literature and can be classified [111] as to whether (a) the metrics can account for uncertainties in the simulation results and/or empirical data (deterministic versus probabilistic metrics); (b) the comparison between simulation results and empirical data underlying the metrics is made for a single model response or multiple model responses; and (c) the metrics provide a quantitative distance-based measure that can be used to quantify the model-form uncertainty. Notable validation metrics in engineering applications that can account for uncertainties in both simulation results and empirical data include: (i) statistical approaches based on comparing the means, variances, covariances, and other distributional characteristics of the simulation results and corresponding empirical data distributions [112]; (ii) classical hypothesis testing (or significance testing) approaches [113–115]; (iii) Bayesian hypothesis testing approaches [116–119]; (iv) approaches based on an area validation metric such as the area between the simulation result distribution and the empirical data distribution [28,47,48,120]; and (v) a model reliability metric approach that provides a statistical result of the difference (bounded between 0 and 1) between the simulation results and empirical data distributions [116,121].

The PV methodology leverages the area validation metric approach [28,47,48,120] to characterize model-form uncertainty for two reasons: (1) the area validation metric allows for the comparison between simulation results and empirical validation data when these quantities are subject to both aleatory and epistemic uncertainties and, thus, are represented by p-boxes; and (2) the area validation metric can be used even when only a few data points from simulation results and/or validation experiments are available. A thorough review of the validation metrics and evaluation of their applicability to PV is, however, left to future work.

Consider cases with a single scalar model response of interest $Y_{i,j,k}$ and an existing set of $n$ validation data points for $Y_{i,j,k}$, in which both the model response and the validation data are subject to mixed aleatory and epistemic uncertainties. The current scope of PV focuses on cases with a single model response (which can be extended to cases with multiple independent model responses with minor modifications), while PV for cases with multiple correlated responses [122] requires future research.

In Eq. (9), the p-box $[\underline{F}_{Y_{i,j,k}},\overline{F}_{Y_{i,j,k}}]$ obtained in step 6 represents the uncertainty in the model response of interest due to the input parameter uncertainties. The p-box $[\underline{S}_{Y_{i,j,k}},\overline{S}_{Y_{i,j,k}}]$ characterizes the mixed aleatory and epistemic uncertainties in the validation data for $Y_{i,j,k}$, with $S_{Y_{i,j,k}}$ denoting the cdf associated with the data, bounded by $\underline{S}_{Y_{i,j,k}}$ and $\overline{S}_{Y_{i,j,k}}$.

In Eqs. (12) and (13), $y_{i,j,k}^{(p)}$ denotes the available validation data points, i.e., the observations of $y_{i,j,k}$, where $p = 1, 2, \ldots, n$. In Eq. (11), the function $\Delta([\underline{F}_{Y_{i,j,k}},\overline{F}_{Y_{i,j,k}}], S_{Y_{i,j,k}})$ represents the shortest distance between the model response p-box and the validation data empirical cdf.
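Assuming the p-box bounds are tabulated on a common response grid and the data enter via the step-function empirical cdf, the shortest-distance area computation can be sketched as:

```python
import numpy as np

def area_validation_metric(y_grid, f_lower, f_upper, data):
    """Area between the empirical cdf of `data` and the model p-box
    [f_lower, f_upper] (cdf bounds tabulated on y_grid); the contribution
    is zero wherever the empirical cdf lies inside the p-box."""
    data = np.sort(np.asarray(data, dtype=float))
    # Step-function empirical cdf of the validation data on the grid
    s = np.searchsorted(data, y_grid, side="right") / len(data)
    # Excess of the empirical cdf outside the p-box at each grid point
    outside = np.maximum(s - f_upper, 0.0) + np.maximum(f_lower - s, 0.0)
    # Trapezoidal integration over the response grid
    return float(np.sum(0.5 * (outside[1:] + outside[:-1]) * np.diff(y_grid)))
```

If the empirical cdf lies entirely within the p-box, the metric is zero; otherwise it grows with the area of disagreement between the two.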

In Eq. (16), $n$ is the number of available experimental samples, $F_1 = 1.25$, and $F_0 = 4.0$ [123]. As $n$ becomes smaller, the safety factor $F_s(n)$ becomes larger to account for the fact that the behavior of the experimental response has less resolution. Compared with the original area validation metric and its use in Eq. (14), the modified area validation metric in Eq. (15) takes into account the relationship between the model response and the empirical data and leverages a safety factor to provide a more conservative estimate of the model-form uncertainty. The safety factor $F_s(n)$ can be problem dependent and needs to be determined beforehand [123].

With the validation data empirical cdf constructed using a step function as in Eq. (12), statistical information in the dataset is preserved and, hence, facilitates the area validation metric approach even when there are only limited empirical data available. In addition, while dealing with the scarcity of empirical data, the area validation metric approach can pool experimental observations available at different validation sites into a single measure to assess the overall disagreement between the model response and the empirical data. This can be done by combining the area validation metric approach with either the U-pooling technique [48] (for cases with single model response or multiple independent model responses) or the T-pooling technique [122] (for cases with multiple correlated model responses).
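A hedged sketch of the u-pooling idea follows, assuming each validation site's model cdf is available as a callable; the specific grid size and the simple uniform-area measure are illustrative choices, not the exact formulation of Ref. [48]:

```python
import numpy as np

def u_pooling_metric(observations_by_site, model_cdfs):
    """U-pooling sketch: map each site's observations through that site's
    model cdf into u-values on [0, 1], pool them, and measure the area
    between the pooled empirical cdf and the standard uniform cdf."""
    u = np.sort(np.concatenate([
        np.asarray(cdf(np.asarray(obs, dtype=float)))
        for obs, cdf in zip(observations_by_site, model_cdfs)
    ]))
    grid = np.linspace(0.0, 1.0, 501)
    s = np.searchsorted(u, grid, side="right") / len(u)  # pooled empirical cdf
    gap = np.abs(s - grid)  # the uniform(0, 1) cdf on the grid is the identity
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(grid)))
```

If the pooled observations are consistent with the site-specific model cdfs, the u-values look uniformly distributed and the metric stays near zero.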

*Uncertainty sources contributing to the “validation metric” that represents the model-form uncertainty:*

It is important to note that the validation metric used in substep 7A does not mix the effects of model-form uncertainty with input parameter uncertainty (estimated in step 4 of the PV methodology) and numerical approximation uncertainty (estimated in step 5 of the PV methodology). Instead, the “validation metric” that represents model-form uncertainty is a combination of the following uncertainty sources:

(i) *Modeling assumptions, conceptualizations, abstractions, approximations, and mathematical formulations*: These are introduced into the final form of the model during the three phases of the model development process. Ideally, the validation metric calculation in this substep 7A is supposed to measure only these contributing sources of uncertainty as the model-form uncertainty. However, this is almost impossible because the calculation process generates other sources of uncertainty, as discussed in factors (ii), (iii), and (iv) below, that cannot be separated from those of the modeling assumptions, conceptualizations, abstractions, approximations, and mathematical formulations.

(ii) *Experimental measurement input uncertainty (arising from the validation experiments)*: An important requirement for computing the validation metric is that the uncertainty in all model input parameters should be carefully measured during the model validation experiments before being used as input to the model and propagated through the model to obtain the model response. Once this requirement is fulfilled, the impact of the input parameter uncertainty on the experimentally measured ($S_{Y_{i,j,k}}$) and simulated ($F_{Y_{i,j,k}}$) values of the SRQ of interest can be, to some extent, canceled out in the validation metric. However, the uncertainty associated with experimentally measuring the input parameters during the model validation experiments is confounded in the input parameter uncertainty and is expected to be propagated through the simulation model (but not through the physical processes of the experiment) when calculating the validation metric. Thus, this *experimental measurement input uncertainty*, which usually involves bias and random errors in the measurements, has a footprint in the calculated validation metric. This uncertainty can be attributed to *imprecise knowledge of input parameter uncertainty* and should be considered a contributing source of model-form uncertainty since it arises only during the validation metric calculation.

(iii) *Experimental measurement SRQ uncertainty (arising from the validation experiments)*: During the model validation experiments, the SRQ of interest needs to be experimentally measured, leading to *experimental uncertainty in the measurement of the SRQ of interest*. This uncertainty can interfere with the calculation of the validation metric because such SRQ measurement uncertainty cannot be completely accounted for in the process of obtaining the model response through simulation. The SRQ measurement uncertainty can be attributed to *imprecise knowledge of the experimentally measured SRQ* and should be considered a source of model-form uncertainty since it arises only during the validation metric calculation.

(iv) *Simulation predicted SRQ uncertainty (arising from the model calculation at validation points)*: As stated in step 5 of the PV methodology, the numerical calculation of the simulation model response is influenced by five main error sources associated with (a) the selection of a numerical algorithm for solving the mathematical model, (b) the instantiation of that algorithm into the programming code, (c) computer round-offs, (d) iterative convergence, and (e) the discretization process.

- Most errors associated with sources (a) and (b) are often determined via code verification techniques and are then removed from the computational model. This activity should have already been done before the validation metric is computed, which involves execution of the computational model to obtain the simulation model response. *Errors associated with sources (a) and (b) that remain after code verification* should be considered part of the “final form” of the simulation model, and their associated impacts on the model response are essentially impossible to estimate. Consequently, these remaining errors can be attributed to *imprecise knowledge of the simulation predicted SRQ* and should be lumped into the model-form uncertainty.

- For errors associated with sources (c), (d), and (e) listed above, if they can be thoroughly quantified at the validation points (i.e., where the validation experiments are conducted) with a reasonable level of confidence, their impact on the model response (and, thus, the validation metric) can theoretically be eliminated. If these errors cannot be quantified at the validation points, their impact on the model response will interfere with the calculation of the validation metric because they are not accounted for in the process of obtaining the experimentally measured SRQ. In such cases, these errors should be considered a source of model-form uncertainty and can be attributed to *imprecise knowledge of the simulation predicted SRQ*. Note that these numerical errors arise from calculating the model at the validation points and are different from the numerical errors quantified in step 5 of the PV methodology, which arise when solving the model for prediction at the application conditions.

Consequently, any mismatch between the experimentally measured validation data and the simulated model response associated with the SRQ of interest can be attributed to model-form uncertainty, which encompasses the four uncertainty sources discussed above, i.e., sources (i) to (iv). Further research is needed if one wants to separately evaluate these sources of uncertainty. At the moment, it is a challenge to attribute only source (i) to model-form uncertainty because, for example, the experimental measurement uncertainties (associated with the measured input and SRQ in validation experiments) cannot be completely removed; even if bias errors in the measurements can be reduced to a negligible level, there can always be random measurement uncertainty in the measured quantities.

It is finally important to emphasize that the procedure for using a validation metric to characterize model-form uncertainty in substep 7A above is reasonable only when there is an existing validation domain (associated with the model response $Yi,j,k$) and the application domain (where $Mi,j,k$ is used) is enclosed inside that validation domain. When the application domain falls outside the validation domain, substep 7B should be performed.

##### 4.1.7.2 Substep 7B Bottom-up causal model quantification to estimate model-form uncertainty.

Substep 7B is designed for the cases when (i) relevant empirical validation data associated with the model response $Y_{i,j,k}$ are not available and (ii) there is an existing validation domain, but the application domain falls outside the validation domain. In the latter case, a commonly accepted approach is to extrapolate the validation metric (and, hence, the model-form uncertainty) estimated over the validation domain to the application domain [19]. This extrapolation, though conceptually reasonable from a practical engineering viewpoint, can be difficult to implement accurately. Difficulties usually arise from (a) the extrapolation of the model itself to make a prediction under application conditions where the model has not been tested/validated previously (or, even worse, where the model is unknowingly inapplicable), and (b) the extrapolation of the error structure of the model and uncertainty in the empirical data, usually in a high dimensional space [28,33]. In addition, validation metric extrapolation would be questionable if the underlying system/subsystem is operating under abnormal or extreme conditions where there can be more geometric and/or physical interaction complexities as compared to the conditions considered inside the validation domain [28]. Due to these challenges, the use of extrapolation is not recommended; instead, a bottom-up approach for characterizing model-form uncertainty is suggested for this substep.

The underlying idea of this bottom-up approach is to characterize model-form uncertainty by quantifying the causal node “computational model form” in the theoretical causal framework developed specifically for model $M_{i,j,k}$ in step 1. For instance, the causal structure between the $M_{i,j,k}$ “computational model form” node and the influencing factors can be modeled using a Bayesian belief network. Then, existing quantitative data associated with these influencing factors and qualitative data from expert judgment could be used for the causality quantification. Note that these types of quantitative and qualitative data are different from the validation data (from validation experiments) required for executing substep 7A. This substep is included in the PV methodological platform as a placeholder, while the methodology for substep 7B is under development and will be reported in a future publication.

If, for some reason, substep 7B cannot be implemented when the application domain falls outside the validation domain, the authors recommend using Bayesian updating to leverage the available empirical data for providing a best estimate of the model prediction uncertainty as shown in step 9 (Sec. 4.1.9).

#### 4.1.8 Step 8 in Figure 6: Estimating Total Uncertainty Associated With Model $M_{i,j,k}$ Prediction by Aggregating Results Obtained From Steps 6 and 7A/7B.

Once the model-form uncertainty is characterized (in either substep 7A or 7B), step 8 quantifies the total uncertainty associated with the model response/prediction $Y_{i,j,k}$ by aggregating all sources of input, numerical approximation, and model-form uncertainties identified and characterized in the previous steps. This uncertainty aggregation can be done by appending the model-form uncertainty (e.g., $U^{mf}_{i,j,k}$ obtained from step 7A) and the numerical approximation uncertainty ($U^{num}_{i,j,k}$ obtained from step 5) about the uncertainty p-box $[\underline{F}_{Y_{i,j,k}},\overline{F}_{Y_{i,j,k}}]$ obtained in step 6 [19,49], as illustrated in Fig. 8. The uncertainty p-box $[\underline{F}_{Y_{i,j,k}},\overline{F}_{Y_{i,j,k}}]$ (the blue p-box in Fig. 8) results from the propagation of input parameter uncertainties through the model $M_{i,j,k}$, where the input parameter uncertainties contain both aleatory and epistemic components. This aggregation is feasible because both the model-form uncertainty ($U^{mf}_{i,j,k}$, represented by the green areas in Fig. 8) and the numerical approximation uncertainty ($U^{num}_{i,j,k}$, represented by the red areas in Fig. 8) are interval-characterized epistemic uncertainties.

This p-box $TU_{i,j,k}$ represents the family of all possible cdfs within which the true cdf of the model prediction $Y_{i,j,k}$ can exist. The width of the total uncertainty p-box $TU_{i,j,k}$ obtained from Eq. (17) represents the degree of confidence of the simulation model $M_{i,j,k}$ and its prediction $Y_{i,j,k}$ for the application of interest.
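Under the interval-append interpretation illustrated in Fig. 8, the aggregation amounts to a horizontal widening of the input-uncertainty p-box along the SRQ axis; the sketch below is a minimal illustration under that assumption, with a hypothetical grid, cdf bounds, and interval magnitudes:

```python
import numpy as np

def aggregate_total_uncertainty(y_grid, f_lower, f_upper, u_mf, u_num):
    """Append interval-valued model-form (u_mf) and numerical (u_num)
    uncertainties about a p-box of cdf bounds tabulated on y_grid:
    the lower cdf bound shifts right and the upper bound shifts left
    along the SRQ axis by u_mf + u_num, widening the p-box."""
    u = u_mf + u_num
    tu_lower = np.interp(y_grid - u, y_grid, f_lower, left=0.0, right=1.0)
    tu_upper = np.interp(y_grid + u, y_grid, f_upper, left=0.0, right=1.0)
    return tu_lower, tu_upper

# Hypothetical input-uncertainty p-box: a uniform-like cdf band
y_grid = np.linspace(0.0, 10.0, 101)
f_lower = np.clip((y_grid - 4.5) / 2.0, 0.0, 1.0)
f_upper = np.clip((y_grid - 3.5) / 2.0, 0.0, 1.0)
tu_lower, tu_upper = aggregate_total_uncertainty(y_grid, f_lower, f_upper, 0.5, 0.3)
```

The widened band encloses the original p-box, which is consistent with the interval-characterized (epistemic) nature of the appended uncertainties.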

#### 4.1.9 Step 9 in Figure 6: Bayesian Updating of the Uncertainty Associated With Model $M_{i,j,k}$ Prediction.

This step uses Bayesian updating to maximize the use of available empirical data for providing a best estimate of the total uncertainty associated with the model prediction $Y_{i,j,k}$. In the PV methodology, this step serves two goals. The first goal is to help analysts account for the cumulative effects of the sources of uncertainty that have not been addressed in the previous steps (e.g., uncertainty due to errors in the screening processes in steps 1 and 3, uncertainty in selecting user-defined model features). Each of the unaccounted sources of uncertainty may be a small contributor to the total uncertainty $TU_{i,j,k}$, but their cumulative effect may not be insignificant; thus, by leveraging available empirical data, the Bayesian updating approach can account for this cumulative effect. This goal can be achieved by applying Bayesian updating to the total uncertainty p-box $TU_{i,j,k}$ obtained using Eq. (17) in step 8 (paths 1 or 2 in module A, Fig. 6). The second goal of this step is to provide an alternative solution to model-form uncertainty quantification when this uncertainty (associated with $M_{i,j,k}$) cannot be reliably estimated due to: (a) lack of validation data associated with $Y_{i,j,k}$ to facilitate the procedure in substep 7A; and (b) lack of data associated with the causal factors contributing to the “computational model form” node (in the theoretical causal framework developed for $M_{i,j,k}$ in step 1) to facilitate the procedure in substep 7B. This second goal can be achieved by applying Bayesian updating to the initial model prediction p-box $[\underline{F}_{Y_{i,j,k}},\overline{F}_{Y_{i,j,k}}]$ obtained in step 6 without characterizing the model-form uncertainty (path 3 in module A, Fig. 6).

where $L(E|y)$ is the likelihood of observing the empirical data $E$ given the model response $y$. Depending on the characteristics of the available empirical data, a specific form of the likelihood function $L(E|y)$ and, therefore, $L(E|\theta)$, can be obtained for the updating process. For example, while developing a set of Bayesian methods to update model output uncertainty distributions using experimental data, Pourgol-Mohamad et al. [124] considered whether the data can or cannot be paired with model-calculated results and whether the data are fully or partially relevant to the application at hand.

The Bayesian approach allows for incorporating not only “hard” data, such as experimental and testing data, but also “soft” data, such as expert beliefs on the credibility and quality of the model, into the uncertainty updating process [125]. Note that, in any case, the relevant data $E$ used for the Bayesian updating need to be different from the empirical data that have already been used for model development and uncertainty characterization in the previous steps of the PV methodology.
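As a minimal illustration of such an update (not the specific methods of Ref. [124]), a sampling-importance-resampling step over prior prediction samples, under the assumption of a Gaussian measurement likelihood with known noise, might look like:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Prior prediction samples of the SRQ (stand-ins for the step 6/8 results)
y_prior = rng.normal(loc=10.0, scale=2.0, size=20000)

# Hypothetical new empirical observations E with assumed measurement noise
E = np.array([11.2, 10.8, 11.5])
sigma_m = 0.5

# Log-likelihood L(E | y): product of independent Gaussian measurement models
log_like = -0.5 * np.sum((E[None, :] - y_prior[:, None]) ** 2, axis=1) / sigma_m**2
w = np.exp(log_like - log_like.max())
w /= w.sum()

# Posterior samples via a simple sampling-importance-resampling step
y_post = rng.choice(y_prior, size=10000, replace=True, p=w)
```

The posterior concentrates near the observations while staying within the prior's support; richer likelihood forms (paired versus unpaired data, fully versus partially relevant data) would replace the Gaussian assumption used here.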

#### 4.1.10 Step 10 in Figure 6: Aggregating Results Associated With Multiple Model Forms $M_{i,j,k}$ $(k = 1, 2, \ldots, N_{i,j,k})$ to Estimate the Uncertainty Associated With $M_{i,j}$ Model Prediction.

The purpose of this step is to aggregate the multiple p-boxes of $Y_{i,j,k}$ obtained from using multiple model forms $M_{i,j,k}$ $(k = 1, 2, \ldots, N_{i,j,k})$. Among the available approaches for aggregation of multiple p-boxes, two commonly used approaches are the envelope and mixture approaches [110,126,127].

for $y \in D_Y$. This envelope approach is applicable when there is no information on which p-box really encompasses the unknown cdf of $Y_{i,j,k}$, yet it is known that at least one of the p-boxes does encompass it.

where $w_{i,j,k} > 0$ and $W_{i,j,k} = \sum_{k=1}^{N_{i,j,k}} w_{i,j,k}$. In practice, values of $w_{i,j,k}$ are usually agreed upon by experts before the aggregation takes place, and the resulting p-box $[\underline{F}_{Y_{i,j}},\overline{F}_{Y_{i,j}}]$ is a weighted average of the p-boxes obtained from using the $N_{i,j,k}$ model forms.
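Both aggregation rules are straightforward once the per-model-form p-boxes share a common response grid; the following sketch assumes such a grid, with hypothetical bounds and weights:

```python
import numpy as np

def envelope(p_boxes):
    """Envelope aggregation: pointwise extremes of the cdf bounds,
    guaranteed to enclose every input p-box."""
    lowers = np.array([lo for lo, hi in p_boxes])
    uppers = np.array([hi for lo, hi in p_boxes])
    return lowers.min(axis=0), uppers.max(axis=0)

def mixture(p_boxes, weights):
    """Mixture aggregation: weighted average of the cdf bounds,
    with weights w_k > 0 (e.g., agreed upon by experts)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    lowers = np.array([lo for lo, hi in p_boxes])
    uppers = np.array([hi for lo, hi in p_boxes])
    return w @ lowers, w @ uppers
```

The envelope yields the most conservative aggregate, while the mixture p-box always lies within the envelope.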

Note that selection of an appropriate approach for the p-box aggregation is a delicate matter and involves various information sources and their interrelationships [126]. Future work will study this topic in more depth and provide guidance on which aggregation approach would be most appropriate for this step of the PV methodology. It is also noted that complications in aggregating the results of multiple model forms may interfere with the “well-bedded” layers of input, model-form, and numerical approximation uncertainties as visually illustrated in Fig. 8. This is because the input parameters and numerical approximations associated with each model form may differ from the others. This topic is also left to future research.

In addition, the assumption of independent model forms made in the current PV algorithm may not always hold. In fact, an important issue in aggregating multiple model prediction p-boxes obtained with multiple plausible model forms is the possibility of dependence among these model forms. Possible reasons for this dependence may include (but are not limited to): (i) the model forms may have been developed using some common theoretical/physics principles (as the models are supposed to represent the same reality); (ii) the model forms may have been conceptualized and implemented by individuals sharing the same basic training and knowledge; and (iii) the model forms may have been developed using the same data sets and similar modeling processes, resulting in similarities in their structural elements (e.g., forms, parameters). Future work will further evaluate approaches for handling dependent model forms.

where $\underline{F}^{input}_{Y}$ and $\overline{F}^{input}_{Y}$ are the lower-bound and upper-bound cdfs of the uncertainty p-box obtained from propagating input parameter uncertainties through the model $M$, $U^{mf}$ is the model-form uncertainty associated with $M$, and $U^{num}$ is the uncertainty associated with the numerical approximations for $M$. The width of the total uncertainty p-box $[\underline{F}_{Y},\overline{F}_{Y}]$ obtained from Eq. (23) represents the degree of confidence of the system-level simulation model $M$ and its prediction $Y$ for the application of interest. Module B of the PV methodology (Sec. 4.2) evaluates whether this degree of confidence is acceptable for the application of interest.

### 4.2 Module B: Acceptability Evaluation.

When the simulation model is used for a specific application, its degree of confidence may or may not be accepted, depending on the acceptability criteria for that application. Module B of the PV methodology helps evaluate the acceptability of the simulation prediction and its associated degree of confidence for a specific application of interest. This evaluation is done by comparing the total uncertainty associated with either the application output or the simulation prediction against corresponding predefined acceptability criteria. The acceptability evaluation procedure is guided by the algorithm included in module B in Fig. 6.

In the PV methodology, it is preferable that the acceptability evaluation be performed at the application output level because the application is the ultimate goal of developing and using the simulation model. This path follows steps 11 and 12A in Fig. 6, where the simulation prediction is considered “valid” for the specific application of interest if the total uncertainty associated with the application output falls entirely below its corresponding acceptability criteria. If, for any reason, the acceptability evaluation cannot be performed at the application level, it should be done at the simulation prediction level by performing step 12B in Fig. 6, where the simulation prediction (with its associated degree of confidence) is considered “valid” for the specific application of interest if the total uncertainty associated with the simulation prediction (i.e., the whole p-box $[\underline{F}_{Y},\overline{F}_{Y}]$ obtained from Eq. (23) in module A) falls entirely below the corresponding acceptability criteria. Steps 11, 12A, and 12B are described below.

#### 4.2.1 Step 11 in Figure 6: Quantifying the Total Uncertainty Associated With the Application Output of Interest.

To execute the acceptability evaluation at the application output level, this step quantifies the total uncertainty associated with the application output of interest. This can be achieved by (i) integrating the simulation model into the computational platform for the application of interest, and (ii) executing that computational platform to quantify the total uncertainty associated with the application output of interest. In the risk analysis context, the I-PRA methodology (Fig. 1) can be leveraged [1,8]. As can be seen in Fig. 1, the PV methodology is a feature of the interface module in I-PRA. The total uncertainty associated with the simulation prediction, $[\underline{F}_{Y},\overline{F}_{Y}]$, obtained from implementing module A of the PV methodology (Sec. 4.1), is used for the plant risk quantification in the plant-specific PRA module.

#### 4.2.2 Step 12A in Figure 6: Performing Acceptability Evaluation at the Application Output Level.

In this step, the total uncertainty associated with the application output of interest, quantified in Step 11, is used to perform the acceptability evaluation at the application output level. The implementation of this step is based on the concepts shown in Fig. 4(a) (Sec. 3.3).

Figure 4(a) illustrates five cases where the acceptability criteria are available at the application output level (e.g., core damage frequency estimated using PRA with input from the simulation prediction). When the NPP risk is the application output of interest, risk acceptance guidelines (e.g., those provided in Regulatory Guide 1.174 [66]) can be used to derive the acceptability criteria. The five cases considered in Fig. 4(a) demonstrate situations in which the total uncertainty associated with the estimated application output is well below, near, or well above the predefined acceptance criteria. The total uncertainty associated with the application output of interest is represented by the vertical blue lines. For instance, if the risk acceptance criteria are defined in terms of the 95th-percentile value of the application output, the vertical blue lines in Fig. 4(a) represent the uncertainty bounds for the 95th-percentile value of the estimated application output. The length of the blue lines represents the magnitude of the epistemic uncertainty, reflecting the contribution of statistical variabilities and systematic biases, while their vertical location reflects the contribution of aleatory uncertainties and systematic biases. The conclusion on the acceptability evaluation for the five cases in Fig. 4(a) can then be derived as follows:

Cases 1 and 2: The epistemic uncertainty (i.e., degree of confidence) for the application output estimate is either small (case 1) or large (case 2) but the uncertainty intervals fall entirely below the predefined acceptance guideline. Accordingly, the simulation prediction and its associated degree of confidence can be considered acceptable since there is no need for a more realistic estimation of the simulation prediction (or other parts of the application model).

Case 3: The epistemic uncertainty for the application output estimate is small but the acceptance guideline falls within the epistemic uncertainty bounds. One may consider reducing epistemic uncertainty in the estimation of the application output (i.e., improving the degree of confidence), expecting that the updated epistemic uncertainty bounds in the application output would fall below the acceptance guideline. The tight epistemic uncertainty bounds for the application output estimate in case 3, however, indicate limited room for this reduction. If proceeding with the reduction, resources should be prioritized to the most significant sources of epistemic uncertainty (with respect to their impact on the total uncertainty in the application output estimate). The importance ranking analysis (step 13 of the PV methodology; Sec. 4.3.1) is used for identifying those most significant sources. Note that this importance ranking may be misleading if there are systematic biases in the application output estimation and, hence, it should be done only when the biases have been properly treated with a reasonable level of confidence. After a sufficient reduction of epistemic uncertainty, if the updated epistemic uncertainty bounds fall completely below the acceptance guideline, the updated simulation prediction and its updated degree of confidence can be considered acceptable. Otherwise, the simulation prediction cannot be considered “valid” for the application at hand.

Case 4: The epistemic uncertainty for the application output estimate is large and the acceptance guideline falls within the epistemic uncertainty bounds. The situation illustrated in this case is similar to the one in case 3, except that the large epistemic uncertainty bounds indicate more flexibility for the reduction of epistemic uncertainty.

Case 5: This case is representative of situations where the epistemic uncertainty bounds associated with the application output completely exceed the acceptance guideline. This may be due to the remaining systematic bias being too large, and one may want to consider correcting the bias. If the bias correction results in the acceptance guideline being within the bias-corrected uncertainty band, one can then try reducing the most important epistemic uncertainties, similar to the considerations discussed in case 3. However, if the acceptance guideline still falls completely below the bias-corrected uncertainty band, this may indicate that the design of the systems considered in this application (including the simulated system) needs modifications to satisfy the predefined acceptance guideline.
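The five-case logic above can be summarized as a simple screening routine. The following is a minimal sketch (not part of the PV methodology itself); the acceptance guideline and the uncertainty intervals below are hypothetical numbers chosen only to reproduce the five situations in Fig. 4(a):

```python
def classify_case(lower, upper, guideline):
    """Screen an application-output epistemic uncertainty interval
    [lower, upper] against a predefined acceptance guideline."""
    if upper < guideline:
        return "acceptable (interval entirely below guideline)"
    if lower > guideline:
        return "not acceptable (interval entirely above guideline); consider bias correction"
    return "inconclusive (guideline inside interval); consider reducing epistemic uncertainty"

# Hypothetical guideline and 95th-percentile uncertainty intervals for the five cases
guideline = 1e-5  # e.g., an acceptance guideline on core damage frequency (per reactor-year)
cases = {
    1: (1e-7, 3e-7),    # small uncertainty, well below the guideline
    2: (5e-8, 5e-6),    # large uncertainty, still entirely below
    3: (8e-6, 1.5e-5),  # small uncertainty, guideline inside the interval
    4: (1e-6, 8e-5),    # large uncertainty, guideline inside the interval
    5: (3e-5, 2e-4),    # interval entirely above the guideline
}
for k, (lo, hi) in cases.items():
    print(f"Case {k}: {classify_case(lo, hi, guideline)}")
```

Note that cases 3 and 4 both land in the "inconclusive" branch; they differ only in how much room the interval width leaves for epistemic uncertainty reduction, which the routine does not attempt to judge.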

#### 4.2.3 Step 12B in Figure 6: Performing Acceptability Evaluation at the Simulation Prediction Level.

This step facilitates the acceptability evaluation at the simulation prediction level (in case the evaluation cannot be done at the application output level). The implementation of this step is based on the concepts shown in Fig. 4(b) (Sec. 3.3).

Figure 4(b) illustrates three cases where the acceptability criteria are available at the simulation prediction level, i.e., associated with a certain SRQ of interest. The total uncertainty associated with the predicted SRQ is represented by the p-box limited by the two blue cumulative distributions. The red-dashed lines represent thresholds that the SRQ of interest should not exceed. The three thresholds, denoted by *T*_{1}, *T*_{2}, and *T*_{3}, demonstrate three cases (A, B, and C) in which the threshold may fall above, within, and below the predicted SRQ p-box, respectively. The width of the p-box represents the degree of confidence of the predicted SRQ. By comparing the total uncertainty (i.e., the whole p-box) against a corresponding threshold, one can determine whether the simulation prediction and its degree of confidence are acceptable for the specific application. Note that, as much as possible, conservative sources of systematic bias should already have been avoided in the simulation model. The conclusion on the validity of the simulation prediction for cases A, B, and C in Fig. 4(b) can then be derived as follows:

Case A (threshold *T*_{1}): The simulation prediction and its degree of confidence can be considered acceptable because, considering its total uncertainty, the predicted SRQ p-box falls entirely below the corresponding threshold *T*_{1}, and there is no need for a more realistic estimation of the SRQ. The simulation prediction can be considered “valid” for the application conditions being considered.

Case B (threshold *T*_{2}): The threshold falls inside the predicted SRQ p-box, indicating that there is a probability that the predicted SRQ would exceed the threshold *T*_{2}. One may then consider reducing epistemic uncertainty in the simulation model to improve the current degree of confidence of the simulation prediction. This reduction should be prioritized for the most significant sources of epistemic uncertainty (with respect to their impact on the total uncertainty in the predicted SRQ) to efficiently utilize available resources. Insights from an uncertainty importance ranking analysis (step 13 of the PV methodology) would help identify the most significant sources of epistemic uncertainty. This importance ranking analysis may be misleading if there are systematic biases in the simulation prediction; hence, it should be done only when the biases have been properly treated to a reasonable degree. After a sufficient reduction of its epistemic uncertainty, the updated simulation prediction and its associated degree of confidence can be considered acceptable if the updated p-box falls entirely below the threshold *T*_{2}. Otherwise, the simulation prediction cannot be considered “valid” for the application at hand.

Case C (threshold *T*_{3}): The predicted SRQ p-box exceeds the threshold, indicating that the simulation prediction and its degree of confidence are not acceptable. This may be due to the remaining systematic biases being too large, and one may want to consider correcting the biases. If the bias correction results in the threshold *T*_{3} being within the bias-corrected p-box, one can then try reducing the most important epistemic uncertainties, similar to the considerations discussed in case B. However, if the bias-corrected p-box still exceeds the threshold *T*_{3}, this may indicate that the design of the simulated system needs modifications to satisfy the predefined threshold.
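The comparison of a predicted SRQ p-box against a threshold can also be illustrated numerically. In the sketch below, the two bounding CDFs are hypothetical normal distributions (they stand in for the blue curves in Fig. 4(b)), and the classification mirrors cases A, B, and C; the exceedance-probability bounds follow from the usual p-box interpretation, in which the upper bounding CDF yields the lower bound on P(SRQ > *T*):

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal CDF, used only to build an illustrative p-box."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def exceedance_bounds(threshold, cdf_lower, cdf_upper):
    """Bounds on P(SRQ > T) implied by a p-box [cdf_lower, cdf_upper].
    The upper bounding CDF gives the lower exceedance bound, and vice versa."""
    return 1.0 - cdf_upper(threshold), 1.0 - cdf_lower(threshold)

def classify(threshold, cdf_lower, cdf_upper, tol=1e-6):
    p_min, p_max = exceedance_bounds(threshold, cdf_lower, cdf_upper)
    if p_max <= tol:
        return "Case A: acceptable (p-box entirely below threshold)"
    if p_min >= 1.0 - tol:
        return "Case C: not acceptable (p-box entirely above threshold)"
    return "Case B: threshold inside p-box; consider epistemic uncertainty reduction"

# Illustrative p-box for the predicted SRQ (hypothetical units and parameters)
cdf_lo = lambda x: normal_cdf(x, mu=550.0, sigma=30.0)  # lower bounding CDF
cdf_hi = lambda x: normal_cdf(x, mu=500.0, sigma=30.0)  # upper bounding CDF
for T in (700.0, 560.0, 350.0):  # hypothetical thresholds T1, T2, T3
    print(f"T = {T}: {classify(T, cdf_lo, cdf_hi)}")
```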

### 4.3 Module C: Global Importance Ranking and Validity Improvement.

If the simulation prediction and its associated degree of confidence are assessed as “not sufficiently valid” for the application at hand, the validity may need to be improved. Module C is, therefore, included in the PV methodology to improve the validity of the simulation prediction up to a level that is acceptable for the application.

The validity improvement for the simulation prediction should only be done if the uncertainty in the simulation prediction is identified as a significant contributor to the total uncertainty in the application output. In PRA applications, simulation predictions are usually compared against safety/failure thresholds to quantify probabilities of PRA basic events. In this case, uncertainties associated with the basic events can be ranked with respect to their contribution to the total uncertainty in the PRA model output (e.g., plant risk estimates) using global sensitivity analysis (GSA) approaches. Common GSA approaches include correlation-based [128–135], variance-based [136–140], and moment-independent [141–147] methods. For the I-PRA framework (Fig. 1), an advanced importance ranking method based on the cdf-based moment-independent approach was developed [142] and applied to Fire I-PRA to rank uncertain input parameters of fire simulation models [5,6]. Once a simulation-based basic event is identified as an important source of uncertainty contributing to the application output uncertainty, the corresponding simulation prediction used to quantify that basic event can be considered a significant uncertainty contributor. The ranking analysis at these levels, however, is not the focus of module C in the PV methodology. Instead, module C focuses on using advanced global importance ranking analyses to identify sources of epistemic uncertainty that contribute the most to the total uncertainty in the simulation prediction, considering that these sources of uncertainty can be mixed aleatory-epistemic uncertainties. 
These insights would inform decision-makers on how or whether they should prioritize their resources for improving the validity of the simulation prediction, e.g., collecting additional data to reduce epistemic uncertainties associated with input parameters and assumptions and approximations underlying the form/structure of the simulation model, or performing more thorough verification to reduce numerical approximation uncertainty. In module C, the uncertainty importance ranking and validity improvement should be done in an iterative manner to gradually refine the simulation models and inputs and to eventually achieve an acceptable level of validity for the simulation prediction.
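As a concrete illustration of the variance-based family of GSA methods mentioned above, the sketch below estimates first-order Sobol indices with a pick-freeze Monte Carlo estimator for a hypothetical linear model. This is not the cdf-based moment-independent method of [142]; it is only a simple stand-in showing how uncertain inputs can be ranked by their contribution to output variance:

```python
import numpy as np

def sobol_first_order(model, sample, n=50_000, seed=0):
    """Pick-freeze Monte Carlo estimate of first-order Sobol indices.
    model: f(X) for X of shape (n, d); sample: (rng, n) -> (n, d) input draws."""
    rng = np.random.default_rng(seed)
    A, B = sample(rng, n), sample(rng, n)
    yA, yB = model(A), model(B)
    var_y = yA.var()
    indices = []
    for q in range(A.shape[1]):
        AB = B.copy()
        AB[:, q] = A[:, q]  # freeze input q at the A draws
        # Saltelli-type estimator for the first-order index of input q
        indices.append(np.mean(yA * (model(AB) - yB)) / var_y)
    return np.array(indices)

# Hypothetical model: Y = 2*X1 + X2 + 0.1*X3 with independent standard normal inputs;
# the analytical first-order indices are approximately [0.80, 0.20, 0.00]
model = lambda X: 2.0 * X[:, 0] + X[:, 1] + 0.1 * X[:, 2]
sample = lambda rng, n: rng.standard_normal((n, 3))
S = sobol_first_order(model, sample)
print("first-order Sobol indices:", S.round(3))
```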

#### 4.3.1 Step 13 in Figure 6: Global Importance Ranking of the Potentially Dominant Sources of Epistemic Uncertainty.

This step conducts an importance ranking analysis to identify sources of epistemic uncertainty that contribute the most to the epistemic uncertainty in the simulation prediction. Only sources of statistical variability are considered, as it is assumed that systematic biases, once identified, should be avoided if possible. In addition, a presumption for this importance ranking is that the epistemic uncertainty in the simulation prediction is a significant contributor to the application output uncertainty (e.g., by leveraging insights from the importance ranking at the PRA basic events level) and, given limited resources, one wants to efficiently improve the validity of the simulation prediction. Considering that (i) the total uncertainty in the simulation prediction results from appending input parameter, model-form, and numerical approximation uncertainties (similar to the illustration in Fig. 8), (ii) model-form and numerical approximation uncertainties are sources of pure epistemic uncertainty, and (iii) uncertainties associated with the input parameters can include pure aleatory, pure epistemic, and mixed aleatory-epistemic uncertainties, this step is conducted in two substeps:

*Substep 13.1: Ranking the magnitudes of epistemic uncertainty components.* This substep compares the area within the input uncertainty p-box (e.g., the blue p-box illustrated in Fig. 8) against the areas associated with the model-form uncertainty (e.g., the green area illustrated in Fig. 8) and the numerical approximation uncertainty (e.g., the red area illustrated in Fig. 8) to determine the dominant epistemic uncertainty contributor. This substep may result in three alternative conclusions. First, it may indicate that the model-form uncertainty is the dominant contributing source, which would inform decision-makers that more validation data should be collected to better characterize the model-form uncertainty. Second, the result of this substep may indicate that the epistemic uncertainty associated with the input parameters of the simulation model is the dominant contributor. In this case, decision-makers should focus on collecting additional empirical data to reduce this source of epistemic uncertainty. Substep 13.2 can provide the importance ranking of the input parameters based on their epistemic uncertainty contribution to inform the decision-makers of which input parameters require prioritized attention for epistemic uncertainty reduction. Third, the result of substep 13.1 may indicate that numerical approximation uncertainty is the dominant contributor to the total uncertainty. In this case, the decision-makers should direct more resources toward reducing the numerical approximation errors (step 5).

If path 3 in module A is taken for a specific problem (i.e., one performs the Bayesian updating in step 9 instead of characterizing the model-form uncertainty using step 7), the total uncertainty in the simulation prediction cannot be separated into the three uncertainty components. In this case, substep 13.1 should be skipped.
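Substep 13.1 can be sketched numerically. Below, the input-parameter contribution is measured as the area between two hypothetical bounding CDFs, while the model-form and numerical approximation contributions are represented as assumed interval widths in the same output units; all numbers are illustrative placeholders, not values taken from Fig. 8:

```python
import math

def ncdf(x, mu, s):
    """Normal CDF used to build an illustrative input-uncertainty p-box."""
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))

def pbox_area(cdf_lower, cdf_upper, lo, hi, n=2000):
    """Area between the bounding CDFs of a p-box (a scalar magnitude of the
    input-parameter epistemic uncertainty), by trapezoidal integration."""
    area, dx = 0.0, (hi - lo) / n
    for i in range(n + 1):
        x = lo + i * dx
        w = 0.5 if i in (0, n) else 1.0
        area += w * (cdf_upper(x) - cdf_lower(x)) * dx
    return area

# Hypothetical magnitudes (same output units) of the three epistemic components
input_area = pbox_area(lambda x: ncdf(x, 520.0, 25.0),   # lower bounding CDF
                       lambda x: ncdf(x, 490.0, 25.0),   # upper bounding CDF
                       350.0, 700.0)
components = {
    "input-parameter (p-box area)": input_area,
    "model-form (interval width)": 12.0,               # assumed, e.g., from step 7
    "numerical approximation (interval width)": 4.0,   # assumed, e.g., from step 5
}
dominant = max(components, key=components.get)
print(f"dominant epistemic component: {dominant}")
```

For the two normal bounding CDFs above, the area between them equals the shift in their means (here 30), so in this hypothetical setup the input-parameter uncertainty dominates and substep 13.2 would be invoked next.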

*Substep 13.2: Ranking the epistemic input uncertainties.* This ranking is needed when one wants to identify the epistemic input uncertainties that are dominantly contributing to the epistemic uncertainty in the simulation prediction. Since uncertainties associated with the input parameters may include pure aleatory, pure epistemic, and/or mixed aleatory-epistemic uncertainties, this substep requires an importance ranking method that can separate the contributions of aleatory and epistemic uncertainties in each of the input parameters. Common GSA methods such as the variance-based [136–140] and moment-independent [141–147] methods only consider the case where input parameters are characterized with precise probability distributions. Sankararaman and Mahadevan [148] introduced a methodology that provides quantitative estimates of the contributions of the two types of uncertainty (within an input parameter with mixed aleatory-epistemic uncertainty) using variance-based Sobol indices. The challenge with traditional Sobol indices when dealing with mixed aleatory-epistemic input uncertainties is that the model prediction is uncertain (not deterministic) even for a fixed realization of the input parameters. Their work [148] introduced an auxiliary variable method based on the probability integral transform to construct a deterministic relationship between the uncertain input parameters and the model output to facilitate the computation of the Sobol indices. In quantifying the sensitivity metric for an input parameter, their work considered only an expected distribution derived from the p-box (rather than the entire p-box) [148]. This is a resolution tradeoff for having a less computationally demanding GSA approach [148]. Alternatively, Schöbi and Sudret [149] developed “imprecise” Sobol indices as an extension of the traditional indices. Their study [149] used a surrogate model based on the sparse polynomial chaos expansion for computing the imprecise Sobol indices.
This provides an alternative, less expensive computational solution to the brute-force Monte Carlo simulation methods in cases where the polynomial chaos expansion technique is applicable. Other approaches found in the literature include pinching input p-boxes to precise distributions or fixed values to facilitate the sensitivity analysis [150–152] or imprecise GSA using Bayesian multimodel inference and importance sampling [153].

For interpretation, the higher the value of $S_q^{\text{pbox}}$ is, the larger the reduction in the epistemic uncertainty of the simulation output $Y$ that can be expected when reducing the epistemic uncertainty associated with $X_q$.
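The pinching idea can be illustrated with a small monotone model. In this sketch, each input has a normal distribution with an interval-valued mean (a simple p-box), and a pinching-based sensitivity metric, denoted here $S_q^{\text{pbox}}$ in the spirit of the discussion above, is computed as the relative reduction in the output p-box area when input $X_q$ is pinched to the midpoint of its mean interval. All distributions and intervals are hypothetical, and the simple interval propagation below is valid only because the model (a sum of inputs) is monotone:

```python
import math

def ncdf(x, mu, s):
    """Normal CDF used to build the illustrative p-boxes."""
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))

def output_pbox_area(mu_intervals, sigmas, grid):
    """Output p-box area for Y = sum(X_q), X_q ~ Normal(mu_q, sigma_q) with
    interval-valued means; monotonicity lets the mean intervals be summed."""
    mu_lo = sum(m[0] for m in mu_intervals)
    mu_hi = sum(m[1] for m in mu_intervals)
    s = math.sqrt(sum(sg ** 2 for sg in sigmas))
    dx = grid[1] - grid[0]
    # upper bounding CDF uses the smallest mean; lower uses the largest
    return sum((ncdf(x, mu_lo, s) - ncdf(x, mu_hi, s)) * dx for x in grid)

mu_intervals = [(9.0, 13.0), (4.0, 5.0), (1.0, 1.2)]  # assumed epistemic mean intervals
sigmas = [1.0, 0.5, 0.2]                              # assumed aleatory spreads
grid = [i * 0.02 for i in range(2000)]                # covers the output range [0, 40)

area_full = output_pbox_area(mu_intervals, sigmas, grid)
for q in range(3):
    pinched = list(mu_intervals)
    mid = 0.5 * (mu_intervals[q][0] + mu_intervals[q][1])
    pinched[q] = (mid, mid)  # pinch input q to a precise mean value
    s_pbox = 1.0 - output_pbox_area(pinched, sigmas, grid) / area_full
    print(f"S_{q + 1}^pbox = {s_pbox:.2f}")
```

In this toy setup the output p-box area equals the total width of the mean intervals, so pinching the input with the widest interval ($X_1$) yields the largest $S_q^{\text{pbox}}$, consistent with the interpretation above.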

#### 4.3.2 Step 14 in Figure 6: Collect New Data/Revise Model to Improve the Validity.

This step is reserved for validity improvement activities including, for example, collecting new empirical data and revising the simulation model. Insights from the importance ranking in step 13 should help inform decision-makers regarding the prioritization of their limited resources in order to achieve the most out of the validity improvement. Within the scope of this step, one may likely need to perform a cost-benefit analysis to determine the time and monetary value needed for each alternative validity improvement solution. After each major improvement, the validity assessment process needs to be iterated to evaluate the impact of such improvement on the acceptability result (module B) and whether further improvements are necessary. The methodology and procedure for this step are under development and will be reported in a future publication.

## 5 Conclusions

This paper is the first in a series of two papers related to PV that provides the theoretical foundation and methodological platform. The second paper applies the PV methodological platform for a case study of fire PRA of NPPs.

Advanced M&S have been increasingly used in the nuclear domain to improve the realism of PRA for existing NPPs as well as to foster the analysis, design, and operationalization of advanced nuclear reactors. Before being used in PRA to support risk-informed decision-making, simulation models need to be adequately validated. This paper proposes the PV methodology for systematically and scientifically evaluating the validity of a simulation prediction when empirical validation approaches are not feasible, for instance, due to the lack of validation data at the model output level. In the PV methodology, the validity of a simulation prediction for an application of interest is determined by: (1) the magnitude of epistemic uncertainty (i.e., representing the degree of confidence) in the simulation prediction; and (2) the result of an acceptability evaluation that compares the total uncertainty associated with the simulation prediction against some predefined acceptability threshold.

The theoretical foundation in Sec. 3 identified five key characteristics of the PV methodology, as listed below. Characteristic #5 does not exist in any of the existing studies reviewed in Sec. 2. Each of characteristics #1 to #4 (partially or completely) exists in the existing methods; however, their integration under one methodology is a unique contribution of this research.

*Characteristic #1:* PV offers a multilevel validation analysis that can integrate data and uncertainty analysis at multiple levels of the system hierarchy to support the degree of confidence evaluation.

*Characteristic #2:* PV separates aleatory and epistemic uncertainties and, when possible, differentiates between two sources of epistemic uncertainty (i.e., statistical variability and systematic bias) while considering their influence on the simulation prediction uncertainty.

*Characteristic #3:* PV uses risk-informed acceptability criteria, along with a predefined guideline, to evaluate the acceptability of the simulation prediction.

*Characteristic #4:* PV combines uncertainty analysis with two-layer sensitivity analysis to assess the validity of simulation prediction and to address the lack of validity (if needed).

*Characteristic #5:* PV is equipped with a theoretical causal framework (Sec. 3.5) to guide the comprehensive identification of uncertainty sources and their interrelationships. This theoretical causal framework guides the identification of causal factors influencing the simulation prediction, their paths of influence, and the uncertainty sources associated with these causal factors that together can influence the uncertainty in the simulation prediction. This process helps ensure that the uncertainty analysis for the simulation prediction does not miss any important source of uncertainty that could arise from phases of the simulation model development. In addition, this theoretical causal framework can help simplify the traceability of the most important sources of uncertainty with regard to their contributions to the simulation prediction uncertainty.

The PV methodological platform (Sec. 4) is the operationalization of the five key characteristics above and is designed with three modules that feature a comprehensive uncertainty analysis framework, acceptability evaluation, and importance ranking analysis. Key advantages of the PV methodology include the capabilities to utilize all available data and information at different levels of the system hierarchy to (i) identify important sources of uncertainty that contribute to the uncertainty in the simulation model prediction, and (ii) evaluate the uncertainty in the simulation prediction and assess the validity of the simulation model even when system-level validation data associated with the simulation prediction are limited. To perform a comprehensive uncertainty analysis, the PV methodology, however, requires a significant amount of resources and synchronized efforts from a multidisciplinary team of modelers, code developers, system analysts, etc.

The steps of the PV methodology (Fig. 6) constitute a “base algorithm” applicable for simulation models of complex, hierarchical systems where the hierarchical elements of the systems can be subject to multiple plausible (element) model forms. The base PV methodology currently assumes that the element models on the same hierarchical level do not interact with each other in a way that output from one element model is input to another element model and vice versa. Such interactions often appear in simulation models with dynamically coupled elements/submodels. This is a subject that would require further extensions of the current base PV algorithm.

Future work could advance the PV methodology by exploring the following aspects: (a) update the theoretical causal framework and operationalize it by leveraging a data-theoretic methodology previously developed by some of the authors [154] and equipping it with appropriate causal modeling and measuring techniques; (b) extend the base PV algorithm for other types of complex systems (e.g., coupled, multiphysics systems) and for cases where there are multiple correlated model responses; (c) develop a detailed method for substep 7B (Sec. 4.1.7.2), i.e., the bottom-up causal model quantification method to estimate model-form uncertainty.

## Acknowledgment

This work is supported by the U.S. Department of Energy's Office of Nuclear Energy through the Nuclear Energy University Program (NEUP) Project #19-16298: I-PRA Decision-Making Algorithm and Computational Platform to Develop Safe and Cost-Effective Strategies for the Deployment of New Technologies (Federal Grant #DE-NE0008885). The authors would like to thank all members of the Socio-Technical Risk Analysis (SoTeRiA) Laboratory for their feedback on this paper.

## Funding Data

U.S. Department of Energy (Grant No. DE-NE0008885; Funder ID: 10.13039/100000015).

## Footnotes

https://www.asme.org/codes-standards/publications-information/verification-validation-uncertainty (accessed in May 2022).

Here, “accuracy” can be (informally) defined as closeness to truth, and “credibility” can be defined as the worthiness of being taken as true [29].

## References
