The modern scientific process often involves developing a predictive computational model that must be assessed through comparison to experiments. This process, called validation, assesses whether the model is an accurate representation of reality. A variety of validation metrics have been developed for this purpose. Some of these metrics have direct physical interpretations and an established history of use, while others, especially those for probabilistic data, are more difficult to interpret. In this work, a variety of validation metrics are used to quantify the accuracy of different calibration methods. Frequentist and Bayesian perspectives are applied to both fixed-effects and mixed-effects statistical models. Through a quantitative comparison of the resulting distributions, the most accurate calibration method can be selected. Two examples are included that compare the results of various metrics for different calibration methods. It is shown quantitatively that, in the presence of significant laboratory biases, a fixed-effects calibration is markedly less accurate than a mixed-effects calibration, because the mixed-effects statistical model properly characterizes the underlying parameter distributions. The results suggest that validation metrics can be used to select the most accurate calibration model for a particular empirical model and its corresponding data.
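To make the idea of quantitatively comparing distributions concrete, the sketch below implements one common probabilistic validation metric, the area metric (the area between the empirical CDFs of model predictions and experimental observations). The abstract does not name a specific metric, so this choice, the function name, and the synthetic data are illustrative assumptions, not the paper's method.

```python
import numpy as np


def area_validation_metric(model_samples, experiment_samples):
    """Area between the empirical CDFs of two samples.

    Illustrative sketch of a probabilistic validation metric:
    smaller values indicate closer model/experiment agreement.
    """
    model_samples = np.sort(np.asarray(model_samples))
    experiment_samples = np.sort(np.asarray(experiment_samples))
    # Common evaluation grid containing every step of both empirical CDFs.
    grid = np.sort(np.concatenate([model_samples, experiment_samples]))
    # Empirical CDF of each sample evaluated on the grid.
    cdf_model = np.searchsorted(model_samples, grid, side="right") / len(model_samples)
    cdf_exp = np.searchsorted(experiment_samples, grid, side="right") / len(experiment_samples)
    # Both CDFs are constant between grid points, so the integral of
    # |F_model - F_exp| is an exact sum of rectangle areas.
    widths = np.diff(grid)
    return float(np.sum(np.abs(cdf_model - cdf_exp)[:-1] * widths))


rng = np.random.default_rng(0)
experiments = rng.normal(0.0, 1.0, 2000)
# A hypothetical calibration with a systematic bias versus an unbiased one.
biased = area_validation_metric(rng.normal(0.5, 1.0, 2000), experiments)
unbiased = area_validation_metric(rng.normal(0.0, 1.0, 2000), experiments)
```

Here the biased model yields a larger metric value than the unbiased one, mirroring how the paper ranks calibration methods by comparing their resulting distributions against experimental data.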