Machine learning is opening up new ways of optimizing designs, but it requires large datasets for training and verification. The primary focus of this paper is to explain the trade-off between the size of a dataset and the level of idealization required to automate its generation. The paper discusses efforts to curate a large CAD dataset of automotive body structures with the desired variety and validity. A method that uses constraint networks to filter out invalid designs before model generation begins is explained. Since geometric configurations and characteristics need to be correlated with performance (structural integrity), the paper also demonstrates automated workflows for performing finite element analysis on the generated 3D CAD models. Key simulation results can then be associated with the CAD geometry and fed to machine learning algorithms. With increasing computing power and network speed, such datasets could assist in generating better designs, which might be obtained by combining existing ones, or might provide insights into completely new design concepts that meet or exceed the performance requirements. The approach is explained using the hood frame as an example, but it can be adapted to other design components.
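To make the idea of filtering invalid designs before model generation concrete, the following is a minimal sketch and not the method used in the paper. It enumerates candidate parameter combinations and keeps only those that satisfy every constraint. The parameter names, value ranges, and the clearance rule are all hypothetical, and the brute-force predicate check stands in for a real constraint network, which would typically propagate constraints between linked variables instead of testing every combination.

```python
from itertools import product

# Hypothetical hood-frame parameters; the values are illustrative only.
PARAMETER_RANGES = {
    "frame_width_mm": [1200, 1400, 1600],
    "rib_count": [2, 4, 6],
    "rib_width_mm": [100, 200, 300],
}

# Assumed minimum gap between ribs and between ribs and the frame edge.
MIN_CLEARANCE_MM = 50


def ribs_fit_in_frame(design):
    """Hypothetical rule: all ribs plus the required gaps must fit in the frame width."""
    ribs = design["rib_count"] * design["rib_width_mm"]
    gaps = (design["rib_count"] + 1) * MIN_CLEARANCE_MM
    return ribs + gaps <= design["frame_width_mm"]


# Each constraint is a predicate over one candidate design.
CONSTRAINTS = [ribs_fit_in_frame]


def candidate_designs(ranges):
    """Yield every combination of parameter values as a dict."""
    names = list(ranges)
    for values in product(*(ranges[name] for name in names)):
        yield dict(zip(names, values))


def valid_designs(ranges, constraints):
    """Keep only the candidates that satisfy all constraints."""
    return [
        design
        for design in candidate_designs(ranges)
        if all(constraint(design) for constraint in constraints)
    ]


designs = valid_designs(PARAMETER_RANGES, CONSTRAINTS)
print(f"{len(designs)} valid designs would be passed on to CAD model generation")
```

Only the surviving parameter sets would then be handed to the CAD generation and finite element steps, so no modeling or simulation time is spent on configurations already known to be invalid.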