Abstract
The rise of generative artificial intelligence (GAI) applications, epitomized by ChatGPT, has reshaped design processes by enhancing idea generation and conceptual depth for designers. However, their facilitating effects on novice designers' thinking remain uncertain, particularly in sustainable service concept generation. This study examines the impacts of ChatGPT on the design thinking process and its outcomes through controlled experiments in which 36 novice designers completed a sustainable service design task supported by ChatGPT, Tiangong AI, or no tool. Through protocol analysis, this study visualizes design thinking with network-based cognitive maps, evaluates design outcomes, and systematically analyzes the characteristics of design thinking development under different tool interventions. Findings indicate that ChatGPT enhances design concept novelty and systematicity but has limited impact on originality and sustainability. Furthermore, ChatGPT plays an active role in fostering thinking divergence and fluency, especially by providing relevant guidance for developed ideas and accelerating the evaluation and creation process. The network-based cognitive maps reveal distinct shifts and styles influenced by ChatGPT, offering references for novice designers using such tools to enhance inspiration and design fluency, and to employ diverse tools effectively during specific concept generation stages. The study also provides insights for enhancing the relevance of educational curricula and enabling bottom-up sustainable service innovations.
1 Introduction
The emergence of generative artificial intelligence (GAI) applications, leveraging large-scale language models to replicate human creativity and produce contextually relevant content, has profoundly reshaped the landscape of design [1]. The unveiling of ChatGPT by OpenAI in November 2022 sparked a surge in inquiries into GAI's ramifications and significance in design processes [2,3]. GAI applications, with ChatGPT as a representative example, have empirically demonstrated their capability to enhance both the efficiency of idea generation and the depth of conceptualization for designers, whether seasoned professionals or novices [4,5]. Particularly in the face of rising challenges in social sustainability, including the provision of inclusive services for marginalized demographics and fostering positive intergenerational relations [6], these applications provide valuable insights into market demands and patterns. They also support scenario simulations, predictive analyses, and risk assessments for innovative service solutions, thus enabling informed design decisions [7,8]. While these applications are recognized as valuable aids in sustainable service design [9], the precise nodes and scenarios affecting the development of design thinking, as well as the intricate characteristics and causal factors driving the evolution of design thinking under tool intervention, remain unclear [10].
Design, fundamentally, is a problem-solving activity [11], with design thinking as the cognitive activity carried out by designers in this process [12]. Due to its complexity and creative essence, understanding the diverse implicit cognitive processes of designers is crucial for effectively measuring and evaluating design thinking [12]. Protocol analysis, a widely recognized and effective method, offers insights into design reasoning, processes, and patterns [13]. In studies employing this method, encoding typically transforms thoughts into graphical representations using frequency data [14]. While it can provide detailed insights into cognitive nodes and relationships, it lacks precision in capturing the dynamic nature of the entire design process and makes it difficult to intuitively compare one process with another [15]. Therefore, considering the wealth of verbal data obtained through protocol analysis, improved encoding and analytical methods are required. These methods should capture the dynamic evolution of thinking under different stimuli, maintain pattern richness, and facilitate the comparison of thinking modes across interventions [16]. Moreover, research has demonstrated that designers' background knowledge and the tools they employ have a significant impact on design thinking patterns [17,18]. Hence, conducting comparative studies is essential for elucidating the complexities of design thinking transformations resulting from different tool interventions [19]. Current research on the effects of GAI applications on design thinking has primarily concentrated on using individual tools as experimental stimuli, often lacking quantitative evaluations and qualitative analyses of thinking patterns from a comparative perspective [20].
This study aims to bridge these gaps by employing network-based cognitive maps. It visualizes the intricate relationships and flow of thinking elements among them through network structures, thereby overcoming the limitation of previous research methods that only understand static rather than dynamic thinking structures. This method investigates the cognitive shifts and decision-making processes of novice designers in the context of sustainable service innovation, exploring specific effects and reasons of ChatGPT on design thinking through experimental stimuli including ChatGPT, Tiangong AI, and no tools. The assessment dimensions include the quality of proposed concepts (design outcomes) and the divergence and fluency of thinking performance (design process). The research findings aid novice designers in overcoming obstacles in complex design tasks by utilizing effective tools tailored to their individual thinking habits to seek inspiration and improve efficiency. Furthermore, this study provides a reference for researchers and practitioners to understand design thinking pathways and the relationships between the development patterns and interactions with support tools. The research questions of this study are as follows:
With network-based cognitive maps, what are the visualized structures of novice designers' sustainable service design thinking supported by different tools?
What are the impacts of ChatGPT on design outcomes under the evaluation indices of novelty, originality, systematicity, and sustainability?
What are the impacts of ChatGPT on the divergence and fluency of design thinking process through the analysis of cognitive maps?
2 Related Works
2.1 Generative Artificial Intelligence-Enabled Design.
In recent years, GAI techniques, propelled by large language models (LLMs), have seen a significant surge of adoption within the design domain. As support tools, they have demonstrated the capability to enhance the efficiency of automated design processes [21] and to optimize repetitive tasks [22]. These tools have also broadened designers' access to inspiration by identifying unconventional associations [23]. Moreover, GAI has assisted in user and context understanding, including the efficient construction of user profiles and the facilitation of rapid user empathy and emotional resonance [24]. In practice, employing one or more AI tools individually led to a greater number of conceptual examples, and AI-driven stimuli were found to be of higher quality compared to those emerging from random inspiration [25]. Undoubtedly, design thinking shapes the outcomes [26]. Regarding the specific impacts of GAI tools on design thinking, they have been demonstrated to augment designers' mental iterations as well as expand the depth, scope, and creative efficacy of design proposals [5]. Nevertheless, research has identified potential risks and limitations associated with GAI tools, including the generation of misleading information, limited output diversity, and potentially diminished user engagement due to excessive reliance; this necessitates a meticulous evaluation of the stages and functions of their intervention within the design process to inform strategic implementation in practical applications [27]. Furthermore, although GAI tools may significantly influence students' creative novelty in the short term [28], this effect may diminish over the long term, requiring multidimensional assessment metrics when evaluating their influence on outcomes and the thinking process [29].
As a widely accepted and utilized representative of GAI tools, ChatGPT positively influences design inspiration acquisition, development efficiency, and outcomes [28]. For example, ChatGPT serves as a virtual collaborator, enhancing personalized learning experiences for students [30], improving knowledge sharing and acquisition efficiency, and enhancing the quality of team collaboration [5]. Design thinking is a dynamic process characterized by overlapping and interacting modes. It is crucial to analyze the impact of tool interventions on individual and group-level thinking variations under different conditions [31]. However, existing literature predominantly focuses on studies employing single design support tools, with limited research using varied tools as intervention conditions. Consequently, the differentiation in effects between ChatGPT and traditional design support tools remains ambiguous. Furthermore, due to the complexity and dynamic nature of design thinking, investigating ChatGPT's impact on design thinking requires an objective analysis of both overall user differences and individual thinking change characteristics [32].
2.2 Understanding and Measuring the Design Thinking.
The United Nations Sustainable Development Goals (SDGs) 3, 10, and 11 underscored the criticality of protecting the social well-being and health of vulnerable populations and of devising inclusive strategies to ensure equitable access to sustainable urban community services [33]. Amidst the rising trend of population ageing, the mental health of the elderly and the innovation of social interaction services emerged as pivotal interdisciplinary research areas [34]. Service design, focused on human needs, capabilities, and experiences [35], designed systemic solutions that activate human agency, thereby promoting the efficient allocation of resources to address social demands and foster social system innovation [36]. Sustainable service design integrates traditional service design principles with sustainability goals, focusing on the strategic inclusion of ecological, social, and cultural factors. It addresses immediate user needs while considering long-term environmental, economic, and societal impacts. This approach encompasses not only environmental and resource sustainability but also social equity, cultural preservation, and economic viability. Its goal is to create a balanced relationship between human development and the environment, meeting present needs without compromising future generations [37]. This approach prioritizes holistic thinking, where user-centered design is integrated with societal and ecological systems, ensuring that solutions are adaptable, resilient, and align with both the balance of benefits among stakeholders and system efficiency [38].
Sustainable service design follows the fundamental processes of deductive, inductive, and creative reasoning inherent to design thinking [39]. Reasonably encoding the dynamic development of this design thinking helps reveal how designers develop or adopt frameworks to address open-ended sustainability problems, and their cognitive characteristics in identifying, reasoning, creating, collaborating, and problem-solving [40]. Previous methods of encoding and measuring design thinking, originating from cognitive design research, include protocol analysis, behavioral analysis, surveys, and interviews [41], all of which have been widely adopted. Given that the outcomes of sustainable service design are systematic and abstract service proposals [42], thinking becomes more hidden; the protocols are often complex, and deriving meaningful insights from the encoded data can be challenging. Therefore, exploring effective coding and analysis methods that can reveal design cognitive processes and summarize meaningful patterns while maintaining the richness of language protocols [43], so as to support comparison and measurement between different thinking modes, becomes an important research topic. Related studies have attempted the combination of network analysis and cognitive maps as a new approach for encoding dynamic thinking structures [44,45]. In this approach, abstract thinking is transformed into elements represented by different shapes, with arrows and boxes describing the relationships between elements, forming a network structure that reveals patterns of idea generation and problem-solving strategies among target groups, as well as different thinking development processes under intervention conditions [46]. Cognitive map examples are shown in Fig. 1. Through calculations and analysis of the thinking elements and flows within the maps, the development of thinking characteristics can be visually examined.
Measurement indicators based on designers' dynamic thinking encoding typically focus on two dimensions: process and outcomes [47]. Within previous investigations, fluency and divergence in the design thinking process [48,49] serve as critical metrics. Fluency represents the ability to transform various forms of reasoning [50], impacting decision-making after designers gain inspiration [51]. Divergence involves the capacity to generate diverse alternative solutions [52], influencing the diversity of design cognition, the effectiveness of conceptualization, and the richness of proposals [53]. Design outcome evaluations typically consider novelty and originality, representing perceived improvements over existing solutions in terms of functionality or experience [25,54], as well as unique forms, functions, or aesthetics not present in current solutions [55]. For tasks related to sustainable service innovation, design outcome measurements should also encompass systemic coordination reflecting the interdependent operation of nested services [56], and sustainability reflecting the degree of goal alignment and continuous value creation [57]. Although above indicators effectively demonstrate and measure changes in design thinking, particularly the cognitive shifts of different users under diverse tasks, the knowledge gap remains. Existing research has primarily emphasized collective characteristics in design thinking assessments, with fewer studies investigating the cognitive features, styles, and changes of representative individuals under tool support [20]. Therefore, analyzing design thinking changes with ChatGPT interventions requires a comprehensive assessment of both group and individual characteristics to provide meaningful insights for practice and teaching.
3 Research Design
3.1 Hypothesis and Research Framework.
This study proposes two hypotheses:
Hypothesis 1: Regarding design outcomes, the utilization of ChatGPT is anticipated to enhance aspects of novelty, originality, systematicity, and sustainability.
Hypothesis 2: In terms of the design process, employing ChatGPT is expected to facilitate idea divergence and thinking fluency.
The research framework is delineated in Fig. 2. The substantiation of Hypothesis 1 focuses on the evaluation of conceptual design outcomes from the experiment. The verification of Hypothesis 2 is based on the construction of network-based cognitive maps that represent participants' design thinking processes. This involves quantifying the thinking elements (nodes) and tracing their trajectories, alongside examining the evolution of inter-element relationships (links), to probe the impact of various tools on the divergence and fluency of the participants' design thinking. Based on accuracy reviews of the cognitive maps by participants and experts, the evaluation and comparative analysis focused on divergent structures and thinking chains.
3.2 Participants.
Due to the limited influence of professional experience and domain knowledge on novice designers [58], they were selected as experimental subjects to investigate how ChatGPT and other tools affect their design thinking. To ensure consistency in design abilities, we recruited 36 second-year product design students (21 females and 15 males) from two parallel classes in the same grade. These students demonstrated comparable academic performance, with an average Grade Point Average (GPA) for design courses ranging from 3.5 to 4.1 on a 5-point scale. The selected students had pursued identical coursework in their prior studies and had no specific training in service design specialization. We ensured that novice designers in both Group A and Group B had minimal experience with GAI tools. This is further supported by the research results discussed later, which show that participants could only pose simple, single-question prompts to the GAI. The 36 participants were divided into three groups of 12 each for the comparative analysis of support tool impacts, and their detailed information is shown in Table 1.
The detailed participant information
| | Group A | Group B | Group C |
|---|---|---|---|
| Average GPA of design courses | 3.67 | 3.65 | 3.66 |
| Average age | 21.42 | 21.58 | 21.5 |
| Gender | 7 females and 5 males | 7 females and 5 males | 7 females and 5 males |
3.3 Experiment Procedure.
The experiment was structured around a design brief written in Chinese. The design task was articulated as follows: “Towards the United Nations Sustainable Development Goals (SDGs 3, 10, and 11), focusing on enhancing the mental well-being of the elderly and fostering positive intergenerational interactions, please design a service within urban residential communities.” Participants were tasked with creating an integrated service proposal, presented through a combination of sketches and textual descriptions. Proposals were required to encompass key elements such as target users, the service journey, the service system (including mechanisms for the flow of resources, finances, and information), touchpoints, and stakeholders.
The primary dependent variables in this study were the quality of the design concepts generated by participants over a 60-min period and the performance of their design thinking process in terms of divergence and fluency. The independent variable was the design support tool: ChatGPT, Tiangong AI (a Chinese GAI application powered by a large language model, designed to provide users with a selection of highly relevant websites and concise summary responses), and no tool support, with the last serving as a baseline for comparison. The parameter information for the two GAI applications is presented in Table 2. Multilingual testing results from large enterprises and research institutions, along with user feedback, confirm that such GAI applications perform very similarly across multiple languages, demonstrating their adaptability to language differences [61]. In the experiment, the 36 participants were divided into 3 groups of 12 each. Group A used ChatGPT (3.5), Group B used Tiangong AI, and Group C did not use any tools to complete the design task. The experimental time, environment, and task setup were the same for all three groups.
The parameter information for two GAI applications
| | Multimodal capabilities | Language | Reference |
|---|---|---|---|
| ChatGPT 3.5 | Primarily focuses on text generation and processing | Supports multiple languages, including but not limited to English and Chinese | [59] |
| Tiangong AI | Supports text-image dialogue, text-to-image capabilities, and text generation and processing | Supports English and Chinese | [60] |
Before the experiment, participants were asked to practice the Think-Aloud method. As an efficacious data collection approach for protocol analysis, it requires participants to vocalize their thought processes while performing tasks or solving problems. Specifically, as participants engage in the design tasks, they are encouraged to articulate their current thoughts accurately in language. This can include the use of logical connectors such as “first,” “next,” “however,” and “therefore” to indicate the ongoing evolution of their thinking. Additionally, providing explanations and clarifications of their thought processes can enhance the authenticity and validity of the data [62]. The method allows researchers to understand how individuals process information, form hypotheses, make decisions, and solve problems [63]. Subsequently, participants were seated individually in a calm and isolated setting equipped with a computer, audio-visual apparatus, blank A4 paper, and a pen (Fig. 3(a)). The design brief, printed on paper, was placed on the experimental table so that participants could consult the specific instructions, design tasks, and limitations.

(a) The experimental environment, (b) sketches of service design proposals drawn by three representative participants
In addition to verbal data, the experiment collected participants' non-verbal data using Behavior Analysis, documenting design proposals and thought processes on paper (Fig. 3(b)), and capturing video recordings of their entire design process. Furthermore, Groups A and B supplemented this with the screen-recording video to document their tool usage. The dataset of this study consists of 36 sets of audio recordings, video recordings, and sketches (from all groups), alongside 24 screen-recording videos (from Groups A and B).
3.4 Data Analysis
3.4.1 Evaluation Dimensions for Design Outcomes.
Based on a review of the related literature, four metrics for evaluating design outcomes were selected in this paper. Novelty refers to the distinctive features, enhancements, or innovative aspects of a design solution, concept, or problem-solving approach relative to prior design practices. This is demonstrated by innovative combinations of existing design elements, novel insights into the design process, and fresh interpretations of user needs [54]. The design solution must not only offer a well-considered plan to address specific intergenerational issues but also present the application of services from an innovative experiential perspective, thereby inspiring users to actively engage [64]. The criterion of originality emphasizes the uniqueness of the design solution, concept, or method, indicating that it has not been previously proposed, examined, or implemented within the existing design domain [65]. This implies that the evaluated solution demonstrates significant differentiation, offering novel ideas and insights to the industry or market [66]. Systematicity concerns the completeness of the service program's structure and the rationality of its interdependent mechanisms [67]. This assesses whether the proposal addresses complex design challenges in a logically coherent and well-structured manner, while effectively integrating various service components and elements [68]. Sustainability reflects the reusability of the design solution and its significant contribution to the SDGs [69]. This criterion particularly emphasizes the promotion of well-being across all age groups (SDG 3), the reduction of disparities and inequalities within communities (SDG 10), and the establishment of sustainable, inclusive communities (SDG 11).
Additionally, the sustainability aspect should be reflected in whether the solution creates mechanisms for effective resource flow, has the potential for long-term impact, and makes positive contributions in economic, social, and cultural domains [70].
The researchers submitted participants' detailed verbal articulations and images of their final design proposals to experts for evaluation. A survey questionnaire using a 1–7 Likert scale was employed for evaluation against the four metrics; for novelty, for example, a rating of 1 indicates an extreme lack of novelty, while a rating of 7 indicates extreme novelty. To illustrate, subject A3's design proposal involved organizing an intergenerational planting exchange centered around space seeds. This initiative emphasized the efficient circulation of social resources and the promotion of a green environment, aligning with sustainability objectives. In terms of novelty, the proposal stands out from conventional community-based green planting projects by providing an innovative experiential approach. It integrates the unique feature of space seeds, merging the theme of space exploration with community-based activities. The incorporation of a blind-box seed selection mechanism enhances interactivity, introducing elements of both engagement and intrigue. As a result, this proposal excels in both sustainability and novelty, with an average score of 6.67 in each category.
The evaluation was conducted by ten design experts, all of whom hold either a master's or a doctoral degree in design, with an average of 5.9 years of experience since obtaining their bachelor's degree (SD = 3.65). The Cronbach's alpha value for the scoring results was 0.967, indicating the high reliability of the data.
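The internal-consistency check reported above can be reproduced with a short script. The following is a minimal sketch of Cronbach's alpha computed over a proposal-by-rater score matrix; the scores shown are illustrative placeholders, not the study's actual ratings.

```python
# Cronbach's alpha for inter-rater consistency: raters are treated as "items".
# The score matrix below is illustrative only, not the study's actual data.

def variance(xs):
    """Sample variance (ddof=1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):
    """scores: one row per proposal; columns are raters (1-7 Likert scores)."""
    k = len(scores[0])  # number of raters
    rater_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)

scores = [  # 4 hypothetical proposals rated by 3 hypothetical raters
    [6, 7, 6],
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 6],
]
print(round(cronbach_alpha(scores), 3))  # → 0.966, i.e., high agreement
```

Values above roughly 0.9 are conventionally read as high reliability, which is the interpretation the paper applies to its reported 0.967.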
3.4.2 Transcription and Coding Scheme for Design Thinking Process.
Two researchers transcribed the three sets of audio recordings to ensure the completeness and accessibility of the data across the design phases of exploration, generation, and development [43]. Subsequently, they analyzed the data to identify key themes and concepts within the design thinking process, segmenting and labeling them based on linguistic content. These nodes included instances such as the participants' initial motivations, initial ideas, episodic and semantic precedents, moments of thinking blockage, assistance from tools, and interpretation of ideas. Finally, screen-recording videos from Groups A and B were integrated as supplementary data into the coding procedure. This served a dual purpose: first, to complement the participants' thinking processes in instances where they had utilized the supportive tool but omitted verbalizing it; second, to examine participants' questioning strategies during tool interactions and the thinking development triggered by tool feedback.
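As a rough illustration of the labeling step, a first-pass segmenter could map utterances to element codes via keyword rules before manual review. The rules below are hypothetical simplifications invented for this sketch; they are not the coders' actual scheme, and the element codes (E1-E6) are those defined in Table 3.

```python
# A naive first-pass labeler mapping utterances to thinking-element codes.
# The keyword rules are hypothetical simplifications of the manual coding
# scheme, intended only to illustrate the segmentation-and-labeling step.

RULES = [
    ("E5 request",        ["gpt", "search", "let me ask"]),
    ("E4 precedent",      ["noticed", "previously", "i remember"]),
    ("E3 developed idea", ["refine", "more detail", "reconstruction"]),
    ("E2 initial idea",   ["idea", "concept"]),
    ("E1 motivation",     ["design brief", "the task"]),
]

def code_utterance(text):
    t = text.lower()
    for code, keywords in RULES:  # first matching rule wins
        if any(k in t for k in keywords):
            return code
    return "E6 interpreter"       # fallback: interpretive commentary

print(code_utterance("I'm wondering if GPT can provide a direct solution"))
```

In the actual study this labeling was done manually and then cross-checked by the two coders; an automated pass like this would at most pre-sort utterances for human arbitration.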
The above data coding procedure adhered rigorously to the established guidelines for coding elements and relationships within network-based cognitive maps in previous studies [45]. As a visual and intuitive method of representing the cognitive processes, it illustrates the comprehensive thinking structure by depicting intricate relationships among various thinking elements. The definitions and explanations for these elements are presented in Table 3 [44].
Classification of thinking elements
| Elements | Definition | Ref. | Feedback from participants |
|---|---|---|---|
| Initial motivation (E1) | Deconstruction of design tasks and facilitation of subsequent cognitive elaboration | [71] | "When I received this Design Brief, I needed to understand what intergenerational interaction is. Then, I would search for similar designs to discover unsatisfied needs from them, and finally refine the idea." (C01) |
| Initial ideas (E2) | An initial design concept that is novel in overall aspects and also satisfies the design brief | [72] | "Based on the search results, I can relate it to a parenting community event in the neighborhood. Is it possible to borrow from the scale and activity of this event?" (B06) |
| Developed ideas (E3) | A design concept with more details or additional features compared to the initial idea | [73] | "After seeing the response I received from ChatGPT, I realized that the infrastructure around this community is poorly built. Perhaps there could be a reconstruction of the fitness equipment in the community." (A03) |
| Precedents (E4) | Episodic precedents: memories related to direct and personal experiences | [74] | "I've noticed that our community often organizes weekend eco-friendly clothing swaps, but it appears that older adults are particularly focused on the sentimental value and nostalgia associated with old clothes." (C08) |
| | Semantic precedents: memories obtained through learning or inference based on episodic memories | | "I previously participated in a networking event in our community, and I noticed that the likelihood of participation by older people was very low. Therefore, one of the main aspects of intergenerational interaction may be to make it engaging enough for older individuals to participate." (C08) |
| Request (E5) | Proactive use of tools for information retrieval and obtaining pertinent responses | [73] | "Since I'm currently at a loss for ideas, I'm wondering if GPT can provide a direct solution." (A06) |
| Interpreters (E6) | A conceptual theme that significantly influences the interpretation of the design problem | [75] | "The idea I just proposed mainly targets children who do not have parents to pick them up after school. It is based on the current situation in my neighborhood that I have observed. This idea can alleviate the pressure on parents who are busy with work and provide some enjoyment for elderly individuals in the neighborhood." (B02) |
| Relationships between the elements | Sequential relationships among elements based on idea contributions | [76] | "I have just considered a specific schedule for intergenerational activities, but it may vary for different communities. Therefore, I believe the timing could be more flexible." (C04) |
To ensure the reliability of the coding outcomes, two researchers independently and carefully coded and inspected the transcriptions before making their judgments. Ambiguous sections were then collaboratively arbitrated and refined to reach a cohesive coding decision (Cohen's Kappa = 0.82, indicating substantial inter-rater agreement). Each researcher has over five years of research experience in design cognition and thinking quantification, sustainable service design, and social innovation. Their closely matched research backgrounds minimize the influence of coder background on the coding results. The coding process is visually detailed in Fig. 4. The results derived from the coding process were used to analyze variations in the participants' thinking under the different tools employed.
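The agreement statistic reported above can be reproduced with a short calculation. The sketch below is illustrative only: the label sequences are hypothetical, reusing the element codes from Table 3 rather than the study's actual transcription data.

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders' parallel label sequences."""
    assert len(coder1) == len(coder2)
    n = len(coder1)
    # Observed agreement: fraction of segments labeled identically
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    # Chance agreement from each coder's marginal label frequencies
    c1, c2 = Counter(coder1), Counter(coder2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codings of eight protocol segments (not from the study)
a = ["E1", "E2", "E2", "E3", "E4", "E2", "E6", "E5"]
b = ["E1", "E2", "E2", "E3", "E2", "E2", "E6", "E5"]
print(round(cohens_kappa(a, b), 2))  # → 0.83
```

A value above 0.80, as here and in the study, is conventionally read as substantial to almost-perfect agreement.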
3.4.3 Constructing the Network-Based Cognitive Maps.
To construct the network-based cognitive maps from the collected data, the online tool Miro was employed. The maps were developed according to the principles for analyzing “nodes” and “links” delineated in social network theory [77]: the elements listed in Table 3 were treated as nodes, and the logical connections between them as links. Each cognitive map began with the Design Brief (DB) as the origin of the design task, with thinking elements symbolized by various graphical markers. Directional arrows depicted sequential relationships among the elements, and closely related elements under the same theme were placed in close succession. An example of a partial coding result, denoted A12, is displayed in Fig. 5.
The primary objective of this study was to explore the impact of tools on design thinking. Consequently, within the “Request” element, two sub-elements, “Prompt” and “Answer,” were identified. The “Prompt” denotes the participant's deliberate intention to pose a question to ChatGPT or the Tiangong AI, which is instrumental in eliciting an effective response from the tool. By analyzing participant responses to the tool's answers and categorizing them as either “Effective” or “Ineffective,” the researchers identified various outcomes for each Prompt. In Groups A and B, the “Prompts” were distinguished as A-Prompt and B-Prompt, respectively. Figure 6 illustrates the complete cognitive maps of design thinking processes for six exemplary subjects from Groups A and B.
To address the participants' limited self-awareness of their cognitive processes as novice designers, six experts, each with an average of five years of experience in the field, were evenly distributed among the groups to review the maps in individual sessions. Following two iterative rounds of adjudication and refinement, the final accuracy rates for the cognitive maps were determined. These procedures raised the mapping accuracy from 94.60% to 99.93%, indicating a high degree of precision in the coding outcomes.
3.4.4 Measuring the Design Thinking Process.
To explore the impacts of support tools on the service design thinking process, the degree of divergence and fluency were selected as core representations. On one hand, the cognitive maps exhibited a typical structure that originated from a single element and expanded outward into multiple elements [44]. These clusters arose from brainstorming and associative thinking about a solution's functionality or the behavior involved in its use. In this study, this structure was defined as a divergent structure. The divergence analysis of the design thinking process was based on counting divergent structures and evaluating participant performance [109]. The defining characteristic of a divergent structure (more than one outgoing divergence line) is illustrated in Fig. 7(a). In the coding procedure, structures developed through the generative information stimulation of support tools were defined as P-type divergent structures.
On the other hand, the fluency of the design thinking process was ascertained by counting and characterizing thinking chains: a thinking chain is a complete, unbroken structure that begins with an initial motivation and persists until no further ideas emerge (each chain comprises a minimum of two elements) [45,46]. Chain length was determined by counting the number of elements within each chain, as exemplified in Fig. 7(b). Where a single element derived multiple thinking chains, these were enumerated individually (e.g., thinking chains Nos. 6, 7, 8, and 9). Conversely, where several elements coalesced into a single element, only one chain was recorded (e.g., thinking chains Nos. 1 and 3). The cognitive map in Fig. 7 illustrates three divergent structures (including the P-type divergent structure) and ten thinking chains, with the longest chain consisting of 12 thinking elements.
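Once a cognitive map is encoded as a directed graph, the two measures above reduce to simple graph statistics. The sketch below is a minimal illustration under assumed node names (not taken from any actual map): a divergent structure corresponds to a node with more than one outgoing link, and chain length to the number of elements along the longest path from an origin element.

```python
# Toy cognitive map as a directed graph; arrows follow the maps' links.
# Node names are hypothetical; "DB" is the Design Brief origin.
edges = {
    "DB": ["E1a"],
    "E1a": ["E2a", "E2b", "E2c"],  # out-degree 3 -> one divergent structure
    "E2a": ["E3a"],
    "E2b": [],
    "E2c": ["E4a"],
    "E4a": ["E3a"],                # converging elements: one chain recorded
    "E3a": ["E6a"],
    "E6a": [],
}

# Divergent structures: nodes with more than one outgoing link
divergent = [node for node, out in edges.items() if len(out) > 1]

def longest_chain(node, edges, length=1):
    """Length (in elements) of the longest chain starting at `node`.

    Assumes the map is acyclic, as the sequential arrows imply.
    """
    out = edges.get(node, [])
    if not out:
        return length
    return max(longest_chain(n, edges, length + 1) for n in out)

print(len(divergent), longest_chain("DB", edges))  # → 1 6
```

On real maps one would additionally flag P-type structures by checking whether the diverging node was stimulated by a tool response, which is an annotation on the node rather than a graph property.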
4 Results
4.1 Comparative Analysis for the Design Outcomes.
Group A generated 19 design proposals, Group B created 17, and Group C produced 14. Where a participant yielded multiple proposals, the final score was determined by averaging them. Based on all participants' individual scores, the average scores for each group across the four indicators were calculated (Fig. 8).
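The aggregation described above is a two-level average: each participant's multiple proposals are first averaged, and only then are participants averaged within the group (rather than pooling all proposals directly). The sketch below makes this explicit with hypothetical novelty scores, not the study's data.

```python
# Hypothetical per-proposal novelty scores for three Group A participants
scores = {
    "A01": [4.0, 3.0],        # two proposals -> participant mean 3.5
    "A02": [5.0],             # single proposal
    "A03": [2.0, 4.0, 3.0],   # three proposals -> participant mean 3.0
}

# Level 1: average within each participant
participant_means = [sum(v) / len(v) for v in scores.values()]

# Level 2: average across participants in the group
group_avg = sum(participant_means) / len(participant_means)
print(round(group_avg, 2))  # → 3.83
```

This weighting gives each participant, not each proposal, equal influence on the group score, so prolific participants cannot dominate the average.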
The average scores for Group A significantly outperformed those of the other groups. Regarding novelty, Group A achieved both the highest average and the peak score, suggesting that novice designers aided by ChatGPT can create proposals that surpass existing solutions in efficiency or user experience. However, in terms of originality, all groups demonstrated comparable levels, with Group A exhibiting both the highest and lowest values. This indicated that ChatGPT's assistance, which iterates on and complements existing initiatives with experience and examples, did not yield transformative innovations that transcend existing solutions. The essence of originality remained significantly contingent upon the participants' individual reasoning, decision-making, and creative approaches when facing generated answers. For systematicity, the scores in Group A demonstrated that ChatGPT assists novice designers in rapidly conceptualizing systemic service configurations and operational mechanisms within a short timeframe, compensating for their limited knowledge and capabilities in service design. The average sustainability score of Group A was slightly higher than those of Groups B and C; however, the lowest score (2.67) appeared in both Groups A and C. Overall, although ChatGPT demonstrated excellent language comprehension and generation capabilities, the evaluation of the design outcomes showed that its performance in the originality and sustainability dimensions did not establish an absolute advantage over Groups B and C.
4.2 Comparative Analysis of the Design Thinking Process.
All elements within the cognitive maps from 36 participants were tallied, with the findings presented in Table 4.
Statistics on the number of elements in the cognitive maps
Group | Quantities | E1 | E2 | E3 | E4 | E6 | Prompts | Effective answers | Ineffective answers |
---|---|---|---|---|---|---|---|---|---|
A | Maximum | 4 | 29 | 5 | 7 | 9 | 12 | 7 | 3 |
Minimum | 2 | 12 | 0 | 1 | 3 | 2 | 2 | 0 | |
Average | 3.17 | 21.17 | 2.08 | 3.92 | 5.00 | 4.92 | 4.17 | 1.58 | |
B | Maximum | 4 | 76 | 7 | 4 | 13 | 15 | 13 | 2 |
Minimum | 2 | 21 | 0 | 0 | 1 | 2 | 2 | 0 | |
Average | 2.92 | 44.83 | 2.58 | 2.17 | 5.42 | 7.42 | 6.17 | 0.92 | |
C | Maximum | 6 | 47 | 3 | 8 | 10 | – | – | – |
Minimum | 1 | 18 | 0 | 1 | 1 | – | – | – | |
Average | 3.67 | 26.50 | 1.25 | 3.25 | 4.83 | – | – | – |
The variation in the number of initial motivations (E1) among the groups was minimal. For initial ideas (E2), Group B was the most prolific, generating more than twice as many ideas as Group A. This suggests that the diverse information accessed through Tiangong AI can facilitate rich imagination at the early design stage. However, the ratio of developed ideas (E3) to initial ideas in Group B was significantly lower than in Group A, indicating that Tiangong AI did not aid in the iteration of initial ideas. The inspirational answers generated by ChatGPT demonstrated a stronger correlation with the development of viable solutions, indicating a greater benefit in helping subjects evaluate and refine their initial ideas. The precedents (E4) analysis revealed that Group A had the richest output of ideas stimulated by personal experience or prior case studies, followed by Group C, with Group B having the least. This implied that ChatGPT can save participants time in invoking prior insights for mentally assessing and iterating on ideas. The results also indicate that the abundance of information presented by Tiangong AI can distract participants' attention and increase the time required to draw upon prior knowledge for idea iteration. In terms of interpreters (E6), performance across the three groups was relatively uniform, indicating that the tools had minimal impact on the elaboration and definition of ideas. Group B posed significantly more questions than Group A, while the latter received a higher proportion of effective answers. Subsequent in-depth analysis of the questions raised by participants and their interactions with the tools revealed that ChatGPT can not only quickly break down task information but also provide adapted solutions as inspiration and action paradigms in fewer question-and-answer interactions (Fig. 9(a)). Moreover, it can offer a pathway for design thinking and deduction (Fig. 9(b)), aiding participants in enhancing the efficiency of concept generation and fostering a continuous sense of recognition and excitement beyond expectations, thereby bolstering their problem-solving confidence.

(a) ChatGPT provides the action reference and (b) the problem-solving confidence during design process
4.2.1 The Thinking Divergence Comparison.
Using the method introduced in Sec. 3.4.4, the measurement of the design thinking process through cognitive maps included statistical calculations and characteristic analyses of both divergent structures and thinking chains. The statistical results for divergent structures within the cognitive maps across all groups are presented in Table 5.
The statistical results of divergent structures
Group | Quantities | Divergent structure | P-type divergent structure | Percentage |
---|---|---|---|---|
A | Maximum | 7 | 5 | – |
Minimum | 2 | 1 | – | |
Average | 4.33 | 2.33 | 53.85% | |
B | Maximum | 13 | 6 | – |
Minimum | 4 | 1 | – | |
Average | 8.08 | 3.17 | 39.18% | |
C | Maximum | 11 | – | – |
Minimum | 2 | – | – | |
Average | 5.42 | – | – |
Overall, Group A exhibited a lower average count of divergent structures than Groups B and C. However, the proportion of P-type structures within the total divergent structures was significantly higher in Group A than in Group B. As depicted in Fig. 10, the prompts that induced P-type divergent thinking among Group A participants included the following:
Who are the stakeholders? (A05)
What are the underlying contradictions in intergenerational issues? (A06)
Could you propose some solutions to the “digital divide” problem? (A06)
What is the process for implementing service design? (A09)
Could you provide an example of an aging-friendly service design solution for a community? (A09)
This solution seems overly simplistic; could you elaborate on the social interactions and learning mechanisms you’ve suggested? (A09)
Could you refine and expand upon the solution I proposed? (A01)
These prompts consist of straightforward inquiries, primarily focusing on the key elements of the service system, the implementation process, and the associated functions and value contributions. They also seek examples of viable service solutions and delve into the application scenarios of specific sub-functions. This pattern suggests that the responses or insights provided by ChatGPT regarding functionality were more effective, prompting participants to concentrate their ideas around the core concept of functionality and leading to detailed refinement in subsequent stages.
To further elucidate the superior divergence effect observed in Group A, we classified the representative divergent structures from Groups A and B into four categories for comparative analysis. As depicted in Fig. 10, Category 01 compared the variety of elements within the divergent structures prompted by effective stimuli. Participants from Group A exhibited greater diversity, suggesting that past experiences were easily triggered to elicit sudden inspiration and that participants articulated their ideas after reaching a satisfactory concept. This outcome was attributed to ChatGPT's capacity to distill relevant solutions into categorized examples, thereby offering a compendium of ideas. Conversely, participants from Group B exemplified a scenario in which diverse idea generation necessitated individual logical deduction for subsequent directional actions, with further tool inquiries instigated only after a disruption in logical thought. This indicates that Tiangong AI's role was limited to furnishing explanatory data without offering a cognitive framework or exemplar solutions, escalating the time cost of reasoning through ideas.
Category 02 focused on the divergent structure of developed idea generation. The findings indicated that both Groups A and B could elicit sudden inspiration following tool stimuli; however, Group A tended to bypass initial ideas in favor of directly formulating developed ideas after an effective stimulus, whereas Group B relied on a succession of initial ideas to facilitate the transition. Furthermore, although ChatGPT sometimes generated ineffective answers, these nonetheless sparked new ideas among the participants, contributing to the critical thinking skills and practicality-assessment capabilities of novice designers. As Fig. 10 shows, Group B's participants produced a single design concept under two different effective answers. This phenomenon was attributed to the high redundancy of valid responses provided by Tiangong AI, which hinders the cultivation of divergent thinking regarding design intricacies.
Category 03 compared the characteristics of divergent structures under multiple consecutive prompt stimuli. In Group A, the divergent structure indicated that after ChatGPT provided an effective response, this information became the content of the second prompt. Through this iterative interaction, which progressively clarified design details, ChatGPT acted as a co-creator, refining the concept together with participants. In contrast, Group B sequentially introduced multiple prompts, indicating a propensity for participants to become overwhelmed by voluminous information, potentially leading to a cessation of progress. Consequently, the capacity to generate ideas that shape the final solution was significantly contingent upon the participants' adeptness in formulating queries and the precision of their information retrieval.
The divergent structure labeled Category 04 was a distinctive pattern that frequently emerged among Group A participants, which exhibited an inductive property after diverging, a feature associated with ChatGPT's capability to synthesize information. For instance, it facilitated a high degree of generalization in response to participants' inquiries for problem clarification and offered a conceptual framework for thought. This enabled participants to efficiently transition from a state of divergent ideation, encompassing multiple initial ideas, to a unified notion, with detailed explanations related to the final solutions.
4.2.2 The Thinking Fluency Comparison.
The results of measuring thinking chain lengths are shown in Table 6, indicating that the maximum chain lengths for Groups A and B were roughly twice those of Group C. This extended sequence of ideas suggests that participants, with the aid of support tools, deeply engaged in dispersing ideas, logical reasoning, concept development, iterative refinement, and interpretation.
The statistical results of Groups A, B, and C with thinking chains
Group | Quantities | Average length of thinking chains | Maximum length of thinking chain |
---|---|---|---|
A | Maximum | 16.00 | 20 |
Minimum | 6.89 | 11 | |
Average | 10.98 | 15.25 | |
B | Maximum | 24.00 | 33 |
Minimum | 7.54 | 12 | |
Average | 12.51 | 20.17 | |
C | Maximum | 7.16 | 10 |
Minimum | 4.07 | 5 | |
Average | 5.62 | 8.17 |
The representative samples from each group were selected for analyzing the specific impact of tools on thinking chains, and their cognitive maps under the stages of exploration, generation, and development are shown in Fig. 11. During the exploration stage, participant B11 frequently utilized the Tiangong AI for searches and inquiries, yielding numerous initial ideas. However, the precedents and relevance of these ideas to the interpreters of the final solution were markedly lower compared to those generated by A12. This suggested that the Tiangong AI provided a plethora of information related to initial ideas but lacked sufficient focus on the core design tasks. This amplified the effort required for their evaluation and necessitated numerous interactions with the tool and iterative refinements of the initial concepts to discern a viable direction. In comparison, ChatGPT significantly facilitated the rapid convergence of initial ideas towards a viable design direction.
During the generation stage, Group C exhibited a coherent chain of thinking, but analysis of participants' verbal data revealed a deficiency in evaluative thinking and mental iteration, resulting in less functional and detailed concepts. Conversely, A12 rapidly stimulated the generation of developed ideas following a single effective response from ChatGPT. Analysis of the verbal data suggested that this was due to ChatGPT providing a comprehensive example of a service design concept tailored to the community context and needs expressed by the participants, proving highly relevant and actionable. For Group B, participant B11 initiated successive inquiries after formulating the developed idea, suggesting that the participant felt the concept did not align closely with the expected context and requirements. Further interactions with the Tiangong AI were necessary to validate, enhance, and refine the concept.
The distinctions between A12 and B11 had a significant impact on cognitive patterns during the development stage. A12 remained focused and swiftly elaborated on the proposal's specifics, providing a comprehensive rationale for the final submission. In contrast, B11 validated and refined the developed idea after an additional search, but then pursued another cluster of initial ideas with the assistance of Tiangong AI. This indicated that Tiangong AI had limitations in evaluating ideas for actionability and accessibility, requiring participants to expend more effort on discernment and thus increasing time costs. By contrast, ChatGPT had a clear advantage in converging developed ideas into final concepts.
5 Discussion
Previous research on stimulating design thinking development with support tools has typically focused on a single tool as the intervention condition, lacking comparative studies across different intervention scenarios [78,79]. Especially in the emerging research area exploring the impact of GAI applications such as ChatGPT on addressing complex design challenges, there is insufficient evidence on their facilitating approaches and mechanisms of action. This study addressed this gap by employing network-based cognitive maps to display and measure dynamic thinking structures under various intervention conditions. In the context of sustainable service design tasks, this study used quantitative statistics of varied thinking structures and chains in the cognitive maps, along with qualitative analysis of typical subjects, to reveal ChatGPT's role in stimulating diverse inspirations, accelerating concept derivation, and enhancing the structured outcomes of service design. These findings are consistent with earlier studies: ChatGPT served as an efficient tool that supported the refinement and enrichment of detailed functions in product or service design proposals [31] and aided novice designers in addressing complex and advanced challenges [80]. Furthermore, this study observed that ChatGPT's performance in diverging initial ideas was not outstanding, which aligns with previous findings indicating the risk of omitting critical information in generated responses and limitations in providing diverse design solutions [27].
Contrary to earlier findings, this study found that ChatGPT's contributions to promoting the originality and sustainability of sustainable service proposals were limited. While prior research acknowledges its role in detecting user behaviors and preferences to enhance personalized service experiences [81], the findings of this study reveal its greater contribution to integrating and evaluating initial ideas and to providing relevant solution paradigms and concept implementation references. However, constrained by its tendency to generate responses by restructuring existing solution frameworks or optimizing functionalities, ChatGPT's impact on fostering innovative breakthroughs remains limited, especially in emerging sustainable design tasks. This highlights the necessity, within the design education environment, of cultivating novice designers' inquiry capabilities attuned to the intricacies of design tasks and user backgrounds so that they can propose innovative solutions with new experiences and meanings. Additionally, prior studies noted ChatGPT's value as an inspiration stimulation tool [27], offering useful suggestions in the manner of a virtual customer. However, this study observed that it did not match Tiangong AI in the number of inspirational stimuli but excelled in enhancing concept assessment. Therefore, in practical and learning environments, optimal utilization of multiple tools is essential during the brainstorming and idea divergence stages. Furthermore, some researchers have highlighted the risk of a reduced ability to discern low-quality information caused by over-reliance on ChatGPT [82]. The cognitive maps in this study indicated that participants approached the generated information with caution, analyzing, assessing, and refining concepts. This was because the experiment task focused on abstract and systematic outputs, prompting novice designers to assess the executability and rationality of responses.
This also suggests that ChatGPT has a promotive value in fostering critical thinking among novice designers, subject to the complexity of design tasks and users' prior experience.
In previous studies, the design thinking heavily relies on contextual characteristics and individual capabilities for information retrieval and processing [83]. The innovation of this study lies in validating that ChatGPT, leveraging novel approaches in natural language generation and algorithm-driven methods, enables novice designers to obtain real-time advice and feedback through interactive exchanges, significantly enhancing the efficiency of design cognition. Additionally, traditional design thinking emphasizes intuition and prior experience to support decision-making [84]. This paper proposed that ChatGPT as a decision support platform, could compensate for the limitations of prior experience by analyzing large datasets and simulating user feedback, thereby improving decision accuracy in design thinking and constructing more comprehensive cognitive models. Furthermore, while traditional design thinking emphasizes collaborative creation, effectiveness is sometimes hindered by varying levels of collaboration and knowledge among co-creators [85]. Serving as a virtual collaborator, ChatGPT facilitates openness and diversity in design thinking and is closely intertwined with fields such as technology and engineering, thus overcoming limitations in divergent thinking imposed by collaborators' capabilities.
6 Conclusion
The results of hypothesis verification suggest that Hypothesis 1 of this study is partially supported and Hypothesis 2 is supported. Regarding design outcomes, ChatGPT enhanced designers' creation of sustainable service concepts, yielding improved structures and more effective mechanisms compared with earlier services. However, its promotional effect did not widen the gap over the other tools in the originality and sustainability of concepts. During the design thinking process, ChatGPT played a positive role in promoting divergence and fluency. First, as an accelerator for controlled divergence and idea evaluation, ChatGPT provided inspirational stimulation highly relevant to the task, saving considerable time in evaluating initial ideas and reasoning. Second, ChatGPT served as a platform for evaluating and testing proposals. Through simulated dialogue, it supported testing the effectiveness and accessibility of design ideas and helped avoid the risk of proposal failure in practical scenarios. This is crucial for service design that emphasizes innovative change in real social environments. Third, ChatGPT, as an interactive learning partner, offered immediate guidance, providing not only examples of solutions but also paths for concept advancement, which effectively enhanced thinking fluency and concept-creation efficiency.
Comprehending GAI's influence on the cognitive processes and mental shifts of novice designers can demystify the “black box” of design thinking, thereby optimizing the efficiency of design practice, enhancing the pertinence of educational curricula, and bolstering the enactment of sustainable service innovations. The contributions of this research were, firstly, to offer guidance for novice designers to use ChatGPT in the context of arousing inspiration and augmenting design fluency, which aided them in employing diverse tools during distinct phases of concept generation, ensuring the maximization of their respective advantages. Secondly, the cognitive mapping displayed the distinct cognitive shifts and styles exhibited by novice designers, contributing to improving design capabilities and thinking depth necessary for students' learning within GAI-supported educational settings. Lastly, this research unveiled the design outcomes of sustainable services facilitated by various tools, along with the interrelationships among outcomes, processes, and tool utilization strategies, which provided a reference for designers' bottom-up actions.
In the future, especially given the rapid development and iterative trends of GAI technology, several directions merit further research on the facilitative role of GAI applications like ChatGPT in design thinking and activities. For instance, research could explore the impact and mechanisms of ChatGPT and similar GAI applications on personalized product customization, optimization of user experiences, automation and intelligence in the design process, and data-driven design decision-making. However, the risks and challenges associated with ChatGPT should not be overlooked, including whether the information and conceptual prototypes generated by GAI can reflect diverse user aesthetics and provide insightful recommendations that address complex user needs within intricate social systems. Moreover, the potential risks of over-reliance on GAI applications leading to a decline in designers' inquiry capabilities, along with social risks associated with job displacement and industry transformation, warrant further investigation. This research also has limitations: the interaction between ChatGPT and subjects, the phrasing of questions, and the coping strategies employed by novice designers all lead to variations in cognitive processing. These aspects will be addressed in future research, thereby providing more robust theoretical foundations for designers and educators on the efficient use of GAI applications.
Acknowledgment
This research was supported by the Fundamental Research Funds for the Central Universities (2232023E-06) and the Chinese Ministry of Education Humanities and Social Sciences Research Youth Fund Project (23YJC760101).
Author Contribution Statement
Conceptualization, C.J. and R.H.; Methodology, C.J. and R.H.; Investigation, C.J. and R.H.; Writing-original draft preparation, C.J. and R.H.; Writing-review and editing, C.J. and T.S.; All authors have read and agreed to the published version of the manuscript.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.