The high heat flux and strong thermal coupling in 3D ICs have limited the performance gains that would otherwise be feasible in 3D structures. The common practice of adopting worst-case design margins is partly responsible for this limitation, since average-case performance is then bounded by worst-case thermal design margins. The coupling between temperature and leakage power exacerbates this effect. However, worst-case thermal conditions are not the common state across the package at runtime. We argue for the co-design of the package, architecture, and power management based on the multi-physics interactions between temperature, power consumption, and system performance. This approach suggests an adaptive architecture that accommodates the thermal coupling between layers and leads to increased energy efficiency over a wider operating voltage range, and therefore to higher performance.
In this paper, we target a 3D multicore architecture in which the cores reside on one die and the last-level cache (LLC) resides on the other. The DRAM stack may be placed on top of the package (e.g., 3D) or in the same package (e.g., 2.5D). We propose a novel adaptive cache structure, the constant performance model (CPM) cache, based on voltage adaptation to temperature variations. We construct an HSPICE model of the SRAM to explore the relationship between temperature, supply voltage, and circuit delay in the context of the LLC. This model is used to investigate, characterize, and analyze the effect of the temperature-delay dependence of the SRAM LLC configuration on system-level performance and energy efficiency. This analysis gives rise to an intelligent scheme for dynamic voltage regulation in the LLC that is sensitive to the temperature of the individual cache banks. Each cache bank is thermally coupled to its associated cores and is thus sensitive to the local core-level power management. We show that this local adaptation to the temperature-delay dependence leads to a significant power reduction in the LLC and to an improvement in system energy efficiency, measured as energy per instruction (EPI). We evaluate our approach using a cycle-level, full-system simulation model of a 16-core x86 homogeneous microarchitecture in 16 nm technology that boots a full Linux operating system and executes application binaries. The advantages of the proposed adaptive LLC structure illustrate the potential of co-designing the package, architecture, and power management in future 3D multicore architectures.
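The per-bank voltage adaptation described above can be illustrated with a minimal sketch. It is not the HSPICE-derived model used in this work: the alpha-power-law delay expression, every coefficient in `DelayModel`, the voltage grid, the design corner, and the bank temperatures are hypothetical placeholders chosen only to make the example runnable. In particular, whether a hotter bank ends up needing a higher or lower supply voltage in this toy model is an artifact of those assumed coefficients and says nothing about the measured behavior of the SRAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayModel:
    """Toy alpha-power-law delay model. All coefficients are illustrative
    assumptions, not values taken from the HSPICE characterization."""
    vth0: float = 0.35     # threshold voltage at t_ref (V), assumed
    k_vth: float = 1.0e-3  # threshold-voltage drop per kelvin (V/K), assumed
    alpha: float = 1.3     # velocity-saturation exponent, assumed
    m: float = 1.5         # mobility temperature exponent, assumed
    t_ref: float = 300.0   # reference temperature (K), assumed

    def delay(self, vdd: float, temp_k: float) -> float:
        """Normalized access delay (arbitrary units) at a supply and temperature."""
        vth = self.vth0 - self.k_vth * (temp_k - self.t_ref)
        if vdd <= vth:
            return float("inf")  # the cell does not switch in this toy model
        mobility = (temp_k / self.t_ref) ** (-self.m)
        return vdd / (mobility * (vdd - vth) ** self.alpha)


def lowest_voltage(model, temp_k, target_delay, v_levels):
    """Return the lowest available supply level that meets the delay target,
    or None if no level on the grid is fast enough."""
    for v in sorted(v_levels):
        if model.delay(v, temp_k) <= target_delay:
            return v
    return None


# Hypothetical setup: 10 mV regulator steps and a 0.8 V / 360 K design corner.
model = DelayModel()
levels = [round(0.50 + 0.01 * i, 2) for i in range(41)]  # 0.50 V .. 0.90 V
target = model.delay(0.80, 360.0)

# Hypothetical per-bank temperatures (K), e.g. read from on-die sensors.
temps = [320.0, 340.0, 360.0]
chosen = [lowest_voltage(model, t, target, levels) for t in temps]

for t, v in zip(temps, chosen):
    print(f"bank at {t:.0f} K -> Vdd = {v} V")
```

Each bank keeps the same target delay, which is the sense in which performance is held constant, while its supply voltage is set independently from its own temperature. In practice the closed-form `delay` method would be replaced by a table characterized from circuit simulation.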