This paper discusses a framework for designing artificial test problems, the associated evaluation criteria, and two of the benchmark tests developed under a research project initiated by the Canadian Nuclear Safety Commission. The project investigates approaches for qualifying tolerance limit methods and algorithms proposed for optimizing CANDU reactor protection trip setpoints under aged conditions. A significant component of this investigation has been the development of a series of benchmark problems of gradually increasing complexity, from simple “theoretical” problems to complex problems closer to the real application.

The first benchmark problem discussed in this paper is a simplified scalar problem that does not involve the extremal (maximum or minimum) operations typically encountered in real applications. The second benchmark is a high-dimensional, but still simple, problem of statistical inference of the maximum channel power during normal operation.

Bayesian algorithms have been developed for each benchmark problem to provide an independent way of constructing tolerance limits from the same data. They make it possible to assess how well different methods use those data and, depending on the type of application, to evaluate the level of “conservatism”. The Bayesian method is not, however, used as a reference method or “gold standard”, but simply as an independent review method.

The approach and the tests developed can be used as a starting point for developing a generic suite of empirical studies (generic in the sense that it could apply regardless of the proposed statistical method), with clear criteria for passing those tests. Some lessons learned are also discussed, in particular the need to ensure that the description of the application is complete and the role played by the completeness of the input information.

It is concluded that a formal process is needed. This process should include extended and detailed benchmark tests that are targeted to the context of the particular application and aimed at identifying the domain of validity of the proposed tolerance limit method and algorithm. Such a process might provide the necessary confidence in the proposed statistical procedure.

