The number of design studies using statistical testing has increased dramatically over the past decade. While this trend has benefits, statistical testing requires scrutiny to guard against common errors and misconceptions. To illuminate how these issues affect design research, this paper provides a comprehensive analysis of the past decade of studies within the DTM community. Specifically, the paper 1) reviews the background of statistical testing across multiple fields, highlighting recommended practices; 2) discusses its use in the Design community; and 3) provides concrete methods for authors and reviewers to evaluate the statistical tests employed in Design Cognition studies.
The analysis identifies recurring issues with: ignoring multiple comparisons; deficiencies in study and result reporting; inadequate defense of modeling assumptions; unavailable plots, data, and analysis files for replication; and lack of interpretation of statistical results with respect to practical outcomes or alternate forms of scientific inquiry. Based on practices already adopted in other research communities, we put forth: 1) checklists that help authors and reviewers verify data reporting, analysis, and statistical assumptions; and 2) guidelines for designing more reproducible experiments. Ultimately, we argue that design researchers, reviewers, and editors should view statistical testing less like a sword and more like a scalpel — a specialized tool best used in concert with other techniques — to gain a more complete picture of Design Cognition.
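To make the multiple-comparisons issue concrete, the sketch below applies Holm's step-down correction to a set of hypothetical p-values (the values and the scenario are illustrative, not drawn from the studies analyzed in this paper). Without correction, four of the five comparisons would appear significant at α = 0.05; after controlling the family-wise error rate, only one survives.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down correction: return a list of booleans indicating
    which null hypotheses to reject while controlling the family-wise
    error rate at level alpha."""
    m = len(p_values)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail
    return reject

# Five hypothetical pairwise comparisons from a design-cognition experiment.
p_vals = [0.004, 0.030, 0.019, 0.250, 0.041]
print(holm_bonferroni(p_vals))  # → [True, False, False, False, False]
```

Holm's procedure is uniformly more powerful than the plain Bonferroni correction while making no additional assumptions, which is why many of the reporting checklists referenced in this paper recommend it as a default.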