Abstract
This paper presents an approach that integrates systems engineering principles with reinforcement learning (RL) to solve complex design problems under resource constraints. Product design processes often involve decomposing a large system into interconnected subsystems, with resources allocated across teams responsible for optimizing each subsystem. We formulate this process as a hierarchical multi-armed bandit (MAB) problem, where decisions are made at both the system level (allocating budget across subsystems) and the subsystem level (selecting heuristics for sequential information acquisition). We employ Thompson Sampling, a Bayesian RL technique, to concurrently learn effective decision-making policies at each level. This hierarchical RL agent aims to maximize overall system performance while adhering to fixed budgets for design evaluations. To demonstrate the approach, we present an illustrative example of race car optimization in The Open Racing Car Simulator (TORCS) environment. Results show that the RL agent can learn to allocate resources strategically, prioritize the most impactful subsystems, and identify effective information acquisition heuristics for each subsystem team. The results also indicate that our method converges to high-performing car configurations more efficiently than Bayesian Optimization.
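The two-level structure described above can be illustrated with a minimal sketch: an outer Thompson Sampling bandit allocates each design evaluation to a subsystem, and an inner bandit per subsystem selects an information acquisition heuristic. This is not the paper's implementation; the class name `GaussianThompsonBandit`, the conjugate Gaussian reward model, and the placeholder `evaluate` function (standing in for an expensive simulation such as a TORCS lap) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianThompsonBandit:
    """Thompson Sampling with a Gaussian prior over each arm's mean reward."""

    def __init__(self, n_arms, prior_mean=0.0, prior_var=1.0, obs_var=1.0):
        self.mean = np.full(n_arms, prior_mean)   # posterior means
        self.var = np.full(n_arms, prior_var)     # posterior variances
        self.obs_var = obs_var                    # assumed reward noise

    def select(self):
        # Sample one plausible mean per arm and play the arm with the best sample.
        return int(np.argmax(rng.normal(self.mean, np.sqrt(self.var))))

    def update(self, arm, reward):
        # Conjugate Gaussian update of the chosen arm's posterior.
        precision = 1.0 / self.var[arm] + 1.0 / self.obs_var
        self.mean[arm] = (self.mean[arm] / self.var[arm]
                          + reward / self.obs_var) / precision
        self.var[arm] = 1.0 / precision

# Hierarchy: the outer bandit spends the evaluation budget across subsystems;
# each inner bandit picks an information acquisition heuristic for its team.
n_subsystems, n_heuristics, budget = 3, 4, 200
outer = GaussianThompsonBandit(n_subsystems)
inner = [GaussianThompsonBandit(n_heuristics) for _ in range(n_subsystems)]

def evaluate(subsystem, heuristic):
    # Hypothetical stand-in for an expensive design evaluation.
    true_means = np.linspace(0.1, 0.9, n_subsystems * n_heuristics)
    return rng.normal(true_means[subsystem * n_heuristics + heuristic], 0.1)

for _ in range(budget):
    s = outer.select()      # system level: choose a subsystem to fund
    h = inner[s].select()   # subsystem level: choose a heuristic
    r = evaluate(s, h)      # spend one unit of the fixed evaluation budget
    inner[s].update(h, r)   # both levels learn from the same reward signal
    outer.update(s, r)
```

Under this reward model, both levels concentrate their sampling on the subsystem and heuristic pairs whose posteriors look most promising, which mirrors the strategic allocation behavior reported in the results.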