In an increasingly interconnected and cyber-physical world, the ability to coherently measure and manage complexity is vital for the engineering design and systems engineering community. While numerous complexity measures (CMs) have been promulgated over the years, they disagree substantially about how complexity should be measured, and to date there has been no systematic comparison across them. In this paper, we propose a framework for benchmarking CMs in terms of their alignment with commonly held beliefs in the literature: that a measure of complexity should increase with system size, increase with the level of interconnection, and decrease with structuring of the architecture. We adopt a design-of-experiments approach and synthetically generate system architectures that vary systematically across these three dimensions. We use this framework as a shared test-bed to document the responses of six CMs that are representative of the predominant perspectives in the literature. We find that none of the measures fully satisfies the commonly held beliefs of the literature. We also find a dichotomy in the literature regarding the archetype of systems considered complex: physics-based (e.g., aircraft) or flow-based (e.g., the power grid), and the intellectual origin of a CM often determines which system characteristics it treats as more complex. Our findings show that we are far from convergence. Our framework provides a path toward better cross-validation as the community progresses to a more complete understanding of the complexity phenomenon.