Abstract

We introduce a novel actor-critic framework that uses vision-language models (VLMs) and large language models (LLMs) for design concept generation, with a particular focus on producing a diverse array of innovative solutions to a given design problem. Leveraging the extensive data repositories and pattern-recognition capabilities of these models, the framework operates through iterative interactions between two agents: an actor (i.e., a concept generator) and a critic. The actor, a custom VLM (e.g., GPT-4) created using few-shot learning and fine-tuning techniques, generates initial design concepts that are iteratively improved based on guided feedback from the critic, which is either a prompt-engineered LLM or a set of design-specific quantitative metrics. This process aims to optimize the generated concepts with respect to four metrics: novelty, feasibility, problem-solution relevancy, and variety. To examine the efficacy of incorporating images alongside text in conveying design ideas within our actor-critic framework, we experimented with two mediums for the agents: vision-language and language-only. We extensively evaluated the framework through a case study using the AskNature dataset, comparing its performance against benchmarks such as GPT-4 and real-world biomimetic designs across various industrial examples. Our findings show that the framework iteratively refines and enhances the initial design concepts, achieving significant improvements across all four metrics. We conclude by discussing the implications of the proposed framework for various design domains, along with its limitations and several directions for future research.
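
To make the iterative actor-critic loop concrete, the following is a minimal Python sketch of the control flow described above. It is an illustration under stated assumptions, not the authors' implementation: all names (Concept, actor_generate, critic_evaluate, refine), the stub scoring, and the all-metrics-above-threshold stopping rule are hypothetical, and the real actor is a fine-tuned VLM while the real critic is a prompt-engineered LLM or a set of design-specific quantitative metrics.

```python
# Minimal sketch of the actor-critic refinement loop. All function names,
# the scoring stub, and the stopping rule are illustrative assumptions,
# not the paper's implementation.

from dataclasses import dataclass
from typing import Optional

# The four metrics the paper optimizes for.
METRICS = ("novelty", "feasibility", "relevancy", "variety")


@dataclass
class Concept:
    text: str                      # textual description of the design concept
    image: Optional[bytes] = None  # optional image, for the vision-language medium


def actor_generate(problem: str, feedback: Optional[str]) -> Concept:
    # Placeholder for a call to the fine-tuned VLM actor (e.g., GPT-4).
    prompt = problem if feedback is None else f"{problem}\nRevise per: {feedback}"
    return Concept(text=f"[concept for: {prompt}]")


def critic_evaluate(problem: str, concept: Concept) -> tuple[dict, str]:
    # Placeholder for the critic: score the concept on the four metrics
    # and return guided feedback for the next actor iteration.
    scores = {m: 0.5 for m in METRICS}
    feedback = "Increase novelty; ground the mechanism in a feasible material."
    return scores, feedback


def refine(problem: str, max_iters: int = 5, threshold: float = 0.8) -> Concept:
    """Iteratively refine a concept until all four metrics clear a threshold."""
    concept = actor_generate(problem, feedback=None)
    for _ in range(max_iters):
        scores, feedback = critic_evaluate(problem, concept)
        if min(scores.values()) >= threshold:
            break  # critic is satisfied on every metric
        concept = actor_generate(problem, feedback)
    return concept
```

In this reading, the loop terminates either when the critic's lowest metric score clears a threshold or when an iteration budget is exhausted; both stopping conditions are assumptions for the sketch rather than details given in the abstract.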
