Machine learning is opening up new ways of optimizing designs, but it requires large datasets for training and verification. While such datasets already exist for financial, sales, and business applications, this is not the case for engineering product design data. This paper discusses our efforts in curating a large Computer-Aided Design (CAD) dataset with the desired variety and validity for automotive body structural compositions. Manual creation of 60,000 CAD variants is not viable, so we examine several approaches that can be automated with commercial CAD systems: Parametric Design, Feature-Based Design, Design Tables/Catalogs of Variants, and Macros. We discuss the pros and cons of each method and how we devised a combination of these approaches. This hybrid approach was used in association with Design of Experiments (DOE) tables. Since the geometric configurations and characteristics need to be correlated to performance (structural integrity), the paper also demonstrates automated workflows to perform finite element analysis (FEA) on the generated CAD models. Key simulation results can then be associated with CAD geometry and, for example, processed using machine learning algorithms for both supervised and unsupervised learning. The information obtained from applying such methods to historical CAD models may help explain the reasoning behind experiential design decisions. With the increase in computing power and network speed, such datasets, together with novel machine learning methods, could assist in generating better designs, which could potentially be obtained by combining existing ones, or might provide insights into completely new design concepts that meet or exceed the performance requirements.
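As a rough illustration of how a DOE table might drive the generation of parametric variants, the sketch below builds a full-factorial table in which each row is one candidate design. The parameter names and levels are hypothetical and chosen only for illustration, and full factorial is an assumed sampling scheme; the DOE scheme actually used is not specified here.

```python
from itertools import product

# Hypothetical design parameters and their levels (illustrative only;
# these names and values are assumptions, not taken from the paper).
LEVELS = {
    "rail_thickness_mm": [1.2, 1.6, 2.0],
    "rail_width_mm": [60.0, 80.0],
    "crossmember_count": [2, 3, 4],
}


def full_factorial(levels):
    """Return one dict per design variant, covering every combination of levels."""
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


doe_table = full_factorial(LEVELS)

# Each row could be written to a design table that a CAD system reads
# to regenerate the parametric model for that variant.
for variant_id, row in enumerate(doe_table[:3]):
    print(variant_id, row)
```

A full-factorial table grows multiplicatively with the number of parameters and levels, so for a target on the order of tens of thousands of variants a sparser scheme would likely be needed in practice.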