Over the past 20 years, Sandia National Laboratories (Sandia) and ASME [1–6] have played major roles in the development of V&V concepts, methods, and processes by building on the work of many other individuals and organizations [7–12]. Along the way, Sandia has organized workshops to demonstrate the state of the art and highlight particular V&V issues. The most recent, the 2014 Sandia V&V Challenge Workshop, was held at the third ASME V&V Symposium in Las Vegas, NV, on May 8, 2014. As a challenge workshop, it was built around a challenge problem crafted to exercise specific ideas and spark discussions. The problem was released to the community prior to the workshop to allow participants time to prepare their responses. Challenge problems provide a common basis upon which a diverse group of participants can demonstrate methods and ideas, and the workshops have become a benchmark of sorts for the V&V community. The 2014 workshop focused on overall V&V strategies to address the challenge problem and also fostered discussion about how a V&V strategy can establish or assess the credibility of a specific prediction.
This special edition of the ASME Journal of Verification, Validation and Uncertainty Quantification documents the workshop and is organized as follows. This introductory paper discusses the history of workshops and how the goals have evolved with the V&V community—leading to the 2014 workshop. The next paper describes the problem statement and how the data provided for the problem were created (the “truth model”). The full problem description, posed to the participants in 2013, is available as a technical report but does not include the truth model. These two papers are followed by five contributions that describe the participants' responses to the challenge problem [15–19]. Two discussion papers are also included, in which the authors do not solve the problem but instead address relevant V&V topics while using the challenge problem as context [20,21]. The special edition concludes with a summary paper that describes the responses and lessons learned.
A History of Challenge Workshops
The 2002 Workshop on Alternative Representations of Epistemic Uncertainty dealt with uncertainty quantification (UQ): how to characterize, model, and propagate uncertainty of both an epistemic and aleatoric nature [23,24]. The workshop highlighted the importance of uncertainty, and the need to quantify uncertainty and understand its impact on model predictions. This workshop's goal was to compare different concepts and algorithms for describing predictions that rely on uncertain information. The participants found that each method had advantages and drawbacks, and they concluded that a healthy diversity of opinions would persist in the community.
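One widely used way to keep epistemic and aleatoric uncertainty separate during propagation is nested ("double-loop") sampling: an outer loop over plausible values of poorly known quantities, and an inner loop over random variability. The sketch below is purely illustrative—the response model, distributions, and all numbers are invented, and it is not the treatment used by any workshop participant:

```python
import numpy as np

rng = np.random.default_rng(0)

n_outer, n_inner = 50, 2000

# Outer loop: epistemic uncertainty, represented here as an interval
# on the mean material strength (bounds are made up for illustration).
candidate_means = rng.uniform(240.0, 260.0, size=n_outer)

failure_probs = []
for mu in candidate_means:
    # Inner loop: aleatoric variability, sampled from distributions.
    strength = rng.normal(mu, 10.0, size=n_inner)
    load = rng.normal(200.0, 25.0, size=n_inner)
    # Failure when the random load exceeds the random strength.
    failure_probs.append(np.mean(load > strength))

# Epistemic uncertainty appears as a *range* of failure probabilities
# rather than a single value.
print(f"P(fail) in [{min(failure_probs):.3f}, {max(failure_probs):.3f}]")
```

The outer loop yields an interval of probabilities instead of one number, which is one way (among several debated at the 2002 workshop) of keeping lack-of-knowledge distinct from inherent variability.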
The 2006 Validation Challenge Workshop posed problems with a hierarchical flow of information (i.e., materials-level calibration, which informs validation at an “exploratory” level and an “accreditation” level) [26–30]. The hierarchy is a visualization of how data and model predictions are used during multiple V&V analyses. It describes at a high level how the data and models relate to each other. This forms the basis for the selection and execution of V&V analyses as part of a larger strategy to improve predictions or assess credibility. The participants in the 2006 workshop had to apply multiple V&V methods across a predetermined hierarchy to make a final prediction and satisfy a regulatory requirement. These challenge problems illustrated that (1) uncertainty is pervasive throughout modeling work and V&V, (2) quantitative methods are needed to assess the uncertainty across the hierarchy, and (3) a wide range of predictions and uncertainty estimates should be expected—even when the hierarchical information flow is specified [32–35]. Several other UQ and V&V workshops have focused on methods and theory, including some that did not contain a challenge problem [11,12,37].
By 2012, the workshop organizers felt that adequate attention was being paid to research on V&V theory and methods, and adoption of V&V methods and principles was spreading rapidly. Many journals from diverse engineering disciplines were publishing V&V-focused papers, and the topic was popular at conferences—notably the ASME V&V Symposium. Both academia and industry were researching and applying V&V methods; however, very little was known about how to utilize the results. Decision makers and customers were struggling to understand and interpret the evidence gathered from V&V methods. Unfortunately, the two ends of this spectrum (research versus applications) are not easily connected for several reasons: (1) the V&V community of practice is relatively small, (2) a huge range of expertise is required to perform all the V&V activities, (3) the incentives of researchers (novel methods and skills, broad and general impact, and publishing ideas) and practicing engineers (delivering results, creating a process tailored to one organization, and maintaining competitive advantage) have limited overlap, and (4) the benefits of V&V are hard to measure and communicate. The organizers saw the need to continue developing V&V methods, provide the community with V&V experience, and investigate the decision-making process. In response, they launched efforts to develop a new problem and hold a third challenge workshop.
The 2014 Sandia Verification and Validation Challenge Workshop
The 2014 Sandia Verification and Validation Challenge Workshop was a daylong event with ten talks and follow-up discussions. After the organizers opened the workshop, nine participants presented responses to the challenge problem or their views on the role of V&V in decision support, followed by an open discussion. The participants came from academia, consulting firms, and U.S. research laboratories, and were predominantly engineers and mathematicians. The workshop was open to all attendees of the ASME V&V Symposium, and an estimated 100 people attended some or all of the talks and discussion.
Hu (Sandia National Laboratories) served as workshop moderator and introduced the challenge problem. This was followed by six responses from: Shields et al. (Johns Hopkins University) [39,40], Roy et al. (Virginia Tech), Chen et al. (Northwestern University), Xi and Pan (University of Michigan—Dearborn), Mahadevan and Mullins (Vanderbilt), and Beghini and Hough (Sandia National Laboratories). For brevity, only the team leads are listed, but each presentation was the work of a group. Three more speakers presented their thoughts on how V&V plays a role in decision making: Paez (consultant) gave an economic analysis of whether modeling, simulation, and V&V are worth the cost; Elele (U.S. Naval Air Warfare Center) discussed how V&V can be used to assess risk levels and impact on projects; and Brodrick (U.S. Naval Surface Warfare Center) discussed how models are currently used to support qualification activities, and a vision for how V&V and UQ should be incorporated into that process. After the workshop, several of the participants continued to refine their solutions and discussion topics, which led to the creation of this special edition.
The Challenge Problem
The scenario described in the challenge problem is an engineering project to predict the behavior of a population of simple storage tanks under a variety of loads. Various experimental studies have been completed, and limited data are available. In addition, a finite-element model has been developed to predict structural response—but not mechanical failure. The scenario ends in a decision about whether the entire population should stay in service or be replaced; however, the challenge is simplified into three tasks: predict the probability of failure along with an uncertainty estimate, assess the credibility of the prediction, and describe the V&V strategy used. This paper reveals the motivations and the important features; the problem statement is described in the following papers [13,14].
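The first task—a failure probability accompanied by an uncertainty estimate—can be illustrated with a minimal Monte Carlo sketch. This is not the tank model or any participant's method; the load and capacity distributions and all numbers below are invented, and the sampling standard error shown is only the narrowest possible uncertainty statement (it ignores model-form and parameter uncertainty entirely):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the structural model: failure when stress exceeds
# strength. Both distributions are hypothetical.
n = 100_000
stress = rng.normal(180.0, 30.0, size=n)
strength = rng.normal(300.0, 40.0, size=n)

failures = stress > strength
p_hat = failures.mean()

# Standard error of the Monte Carlo estimator provides a minimal
# uncertainty statement to accompany the point prediction.
se = np.sqrt(p_hat * (1.0 - p_hat) / n)
print(f"P(fail) ~ {p_hat:.4f} +/- {1.96 * se:.4f} (95% sampling interval)")
```

A defensible response to the challenge would need to widen this interval with the other uncertainty sources discussed below; the point of the sketch is only that the prediction and its uncertainty are reported together.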
A challenge problem must trade off the flexibility to allow a variety of V&V approaches against the constraints required to focus the participants' attention. This problem de-emphasizes individual V&V methodologies and focuses on strategies to gather and integrate V&V evidence. This was accomplished by defining an open-ended set of objectives—in contrast to the more prescriptive nature of previous challenge problems [24,28–30,36]. The principal challenge was to develop a defensible V&V strategy that prioritized which analyses to complete and which methods to use. The participants were free to choose how to use the data and model to achieve the ultimate prediction and had the opportunity—but not the obligation—to perform many analyses: UQ, calibration, solution verification, validation, etc. In addition, participants had to integrate the results and communicate their conclusions. This parallels the projects tackled by engineering organizations; the purpose is to supply credible advice to support technical decisions, and there is no set of instructions to follow. This focus on V&V strategy is a small step toward bridging the gap from theory and methods to the processes needed to support decisions. The question is now: Which V&V analyses are the appropriate analyses to perform, in order to make the desired predictions? The next question will be: How do these analyses support and impact the ultimate decision?
To successfully investigate these questions about V&V strategies, the challenge problem had to offer a great deal of complexity and context. The scope of the problem enabled participants to construct their own hierarchy based on the available data and models. This also raised the question of how to aggregate or integrate uncertainty estimates from different analyses—for example, uncertainty from a mesh convergence study, uncertainty and variability of material properties, plus the effects of measurement error. This requires careful consideration of the “meaning” of each uncertainty and the implications for credibility, in addition to methods for UQ.
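As a concrete (and deliberately simplistic) illustration of the aggregation question, one common approach is a root-sum-square combination of standard uncertainties, as in measurement-uncertainty practice. The component values below are invented, and the method itself embeds the very assumptions the paragraph above warns about—independence, and treating epistemic components as if they were statistical:

```python
import math

# Illustrative uncertainty components for one predicted quantity,
# expressed as standard uncertainties in the same units (values invented).
u_numerical = 2.0    # e.g., from a mesh convergence study
u_parameter = 5.0    # e.g., propagated material property variability
u_measurement = 1.5  # e.g., instrument error in the validation data

# Root-sum-square combination assumes the components are independent
# and statistical in nature; epistemic components often warrant
# interval arithmetic or other treatments instead.
u_combined = math.sqrt(u_numerical**2 + u_parameter**2 + u_measurement**2)
print(f"combined standard uncertainty: {u_combined:.2f}")
```

Whether such a collapse into a single number is ever appropriate—versus carrying the components separately—was exactly the kind of strategic choice left to the participants.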
The large scope also demanded a substantial commitment from the participants, and the expectation was that all participants would be limited by time, resources, and expertise—it is a major challenge to assemble a team that includes experts in verification, validation, UQ, physics domains, and systems engineering. That being said, several compromises were made to ensure the problem could be solved in a reasonable timeframe.
The end decision of whether to replace the tanks is not part of the challenge. The problem lacks the required context, such as the cost of the tanks, the state of the company, risk tolerances, etc. Instead, an arbitrary criterion was set for the probability of failure. As a consequence, participants focused on a specific prediction and did not have to speculate on the intended use of the model. The participants and audience were expected to be V&V analysts rather than systems engineers or decision makers, so this challenge was limited to fit their backgrounds. Nevertheless, the additional context was included to raise awareness of how V&V eventually impacts decisions.
The data, model, and code were supplied. This removed all choices about model development and design of experiments, but greatly reduced the workload. Four finite-element mesh resolutions were available, based on the same geometry, which could be solved using the provided code. The code was described as a finite-element solver, but was actually an algebraic function. This enabled anyone to easily and cheaply run the code, but the model behavior was not the same as a finite-element model—and code verification was not sensible. The data were created from a truth model—not physical experiments.
Only a single structural mechanics example was provided. The organizers wanted a compelling engineering problem rather than a generic mathematical exercise, but needed to keep it simple and accessible for engineers from any physics discipline background. Inevitably, some domain knowledge was helpful in interpreting the results.
The challenge workshop was not a competition, and there is no “best” strategy. The responses were not scored or ranked. In fact, the open-ended problem formulation meant that direct comparison of responses was impossible. Discrepancies could not be traced to a single cause because so many different assumptions and method choices were made by each team. Also, no limits were placed on the time invested in the problem, and the participating teams had a wide range of effort levels and priorities. The only step taken to ensure fairness was to provide all the teams access to the same information and instructions. The organizers did not collaborate with any of the participants or reveal the source of the data prior to this special edition. The source of the data, the truth model, is now available. Again, the organizers stress that scoring the approaches by comparing to the “truth” is not the intention of this activity—indeed, it is not clearly understood how to objectively judge any of the approaches at all. The organizers are not interested in the “correct result”; the goal is to learn about different V&V strategies.
Sandia has used challenge workshops as benchmarks of the V&V community. The community has evolved from developing individual UQ and V&V methods, to investigating how to integrate analyses across a hierarchy, and now to designing the appropriate hierarchy. Development and demonstration of individual methods are still absolutely critical, but this workshop emphasized V&V strategy as the current challenge. The quality of the workshop responses and the discussions at the ASME V&V Symposium indicate that the V&V community is growing, and V&V theory and methods are maturing. Again, the workshop saw a healthy diversity of V&V strategies, which reflect the different priorities and backgrounds of the participants. A summary of the responses and commentary is included in the following paper. The major lesson is that we need more experience with a larger variety of case studies, and we need to learn how to judge the efficacy of V&V strategies.
Looking forward, a major obstacle will be the vast scope of V&V, which relates to the fields of statistics, probability theory, numerical algorithms, systems engineering, and many intermediate topics. It is difficult to merge quantitative and rigorous methods with the qualitative and subjective aspects of credibility. The next challenge may concern the role of V&V with respect to modeling and simulation and the decision-making process.
The workshop organizers hope that the V&V community will build upon the work in this special edition and make use of this challenge problem (and those from prior workshops). These problems are excellent case studies for teaching or demonstrating V&V approaches and concepts and will spark discussions about the role of V&V in engineering work.
This challenge problem and the resulting workshop were made possible with support from the Sandia National Laboratories and ASME. We wish to thank Greg Weirs, Laura Swiler, Walt Witkowski, and the Dakota team at the Sandia National Laboratories, and Ryan Crane and the V&V Standards committees at ASME for helping make the workshop a success. We are especially grateful to the workshop participants, who made this special edition possible, including those whose work does not appear in this issue: Michael Shields, Thomas Brodrick, and James Elele. Sandia National Laboratories is a multiprogram laboratory managed and operated by the Sandia Corporation, a wholly owned subsidiary of the Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under Contract No. DE-AC04-94AL85000.