Abstract

This article introduces a generative model designed for multimodal control over text-to-image foundation generative artificial intelligence (AI) models, such as Stable Diffusion, tailored for engineering design synthesis. Our model supports parametric, image, and text control modalities to enhance design precision and diversity. First, it handles both partial and complete parametric inputs using a diffusion model that acts as a design-autocomplete copilot, coupled with a parametric encoder that processes this information. Second, the model utilizes assembly graphs to systematically assemble input component images, which a component encoder then processes to capture essential visual features. Third, textual descriptions are integrated via CLIP encoding, ensuring a comprehensive interpretation of design intent. These diverse inputs are combined through a multimodal fusion technique into a joint embedding that serves as the input to a module inspired by ControlNet. This integration allows the model to apply robust multimodal control to foundation models, facilitating the generation of complex and precise engineering designs. The approach broadens the capabilities of AI-driven design tools and demonstrates significant advances in precise, multimodal control for enhanced design generation.
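The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the fusion architecture, so everything here is an assumption for illustration: the encoder outputs are stood in for by random vectors, the embedding dimensions (`D_PARAM`, `D_IMG`, `D_TXT`, `D_JOINT`) are arbitrary, and the fusion is a simple per-modality linear projection followed by a sum, not the paper's actual method. The `param_mask` argument hints at how partial parametric inputs (the design-autocomplete case) could be accommodated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding sizes; the paper does not report its dimensions.
D_PARAM, D_IMG, D_TXT, D_JOINT = 16, 32, 24, 64

# Toy projection weights standing in for the parametric encoder,
# the component (image) encoder, and the CLIP text encoder heads.
w_p = rng.standard_normal((D_PARAM, D_JOINT))
w_i = rng.standard_normal((D_IMG, D_JOINT))
w_t = rng.standard_normal((D_TXT, D_JOINT))

def fuse(param_vec, img_vec, txt_vec, param_mask=None):
    """Project each modality into a shared space and sum the projections.

    `param_mask` marks which parametric entries are present, so partial
    parametric inputs can be handled by zeroing the missing entries
    before encoding (a stand-in for the autocomplete behavior).
    """
    if param_mask is not None:
        param_vec = np.where(param_mask, param_vec, 0.0)
    z = param_vec @ w_p + img_vec @ w_i + txt_vec @ w_t
    # Bounded joint embedding, conceptually the conditioning signal
    # handed to the ControlNet-inspired module.
    return np.tanh(z)

joint = fuse(
    rng.standard_normal(D_PARAM),
    rng.standard_normal(D_IMG),
    rng.standard_normal(D_TXT),
    param_mask=rng.random(D_PARAM) > 0.3,  # simulate missing parameters
)
print(joint.shape)  # (64,)
```

In a real model the projections would be learned jointly with the encoders, and the fusion could be attention-based rather than additive; the sketch only shows how three heterogeneous inputs collapse into one fixed-size conditioning vector.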
