Abstract
Aspect-based sentiment analysis (ABSA) enables systematic identification of user opinions on particular aspects, thus improving idea creation in the initial stages of a product/service design. Large language models (LLMs) such as T5 and GPT have proven powerful in ABSA tasks due to their inherent attention mechanism. However, some key limitations remain. First, existing research mainly focuses on relatively simple ABSA sub-tasks such as aspect-level sentiment classification, while extracting aspects, opinions, and sentiments in a unified model remains largely unaddressed. Second, current ABSA tasks overlook implicit opinions and sentiments. Third, most attention-based LLMs encode position either as a linear projection or through split-position relations in word-distance schemes, which can introduce relation biases during training. This paper addresses these gaps by (1) introducing the ACOSI (aspect, category, opinion, sentiment, implicit indicator) analysis task and developing a unified model capable of extracting all five types of labels simultaneously in a generative manner; (2) designing a new position encoding method that incorporates design domain knowledge into the attention-based model; and (3) introducing a new ROUGE-based benchmark that incorporates design domain knowledge. Numerical experiments on manually labeled data from three major e-Commerce retail stores for apparel and footwear products showcase the performance, scalability, and potential of the domain-knowledge-infused transformer method.
1 Introduction
Extracting user opinions and completing implicit knowledge from online product reviews have become increasingly vital for successful innovative product design. Utilizing design domain expertise synergistically with large language models (LLMs) to augment the design process and facilitate implicit knowledge completion has emerged as a promising methodology in recent years [1–6]. With the exponential growth of online purchasing platforms, a vast amount of user-generated information has accumulated on customer experiences with various products and services, and recent market surveys show that reviews significantly influence customer purchase decisions. Aspect-based sentiment analysis (ABSA) and fine-grained, domain-specific LLMs have emerged as crucial facilitators for finding widespread user opinions across e-Commerce and social media platforms. However, due to the noisy and unstructured nature of review data and the lack of efficient methods for inserting domain knowledge into LLMs to build domain-specific models, extracting valuable information and completing implicit knowledge are often hindered by the limitations of state-of-the-art natural language processing (NLP) methods.
Moreover, there is a notable lack of automated methods for large-scale completion of implicit knowledge from reviews, which could provide valuable information for product designers and improve the probability of successful product development [7]. Effortless completion of implicit knowledge is expected to improve both the quantity and the quality of ideas in the design generation process [8,9]. To address this, we propose a novel approach to implicit knowledge completion. Our dataset, named ACOSI, consists of five labels: aspect (A), category (C), opinion (O), sentiment (S), and implicit indicator (I). The last captures implicit sentiment-related information from various perspectives, which is crucial for comprehensive implicit knowledge completion. The encoding algorithm, the design-knowledge-guided (DKG) position encoding algorithm, is a novel methodology based on transformer models [10,11], designed to enhance implicit knowledge completion. The validity of the methodology is examined using a sizable dataset collected from notable e-Commerce platforms in the apparel and footwear industry. This section outlines the rationale behind the research, the selection of the dataset and methodology, and the goals and contributions of the article in the context of implicit knowledge completion for product design.
The significance of implicit knowledge completion in product design is immense. Unlike explicit knowledge that customers can easily articulate, implicit knowledge represents unexpressed desires or problems that customers may not even be aware of [12]. This hidden knowledge often leads to breakthrough innovations and can provide a competitive advantage in the market [13]. By completing implicit knowledge, designers can create products that not only meet current expectations but also anticipate future demands, potentially revolutionizing entire product categories [14]. This proactive approach to design aligns with the concept of “design-driven innovation” proposed by Verganti [15], where companies lead users rather than simply following their expressed needs. Furthermore, the completion of implicit knowledge can result in higher customer satisfaction and loyalty, as users often experience a sense of delight when a product solves problems they had not previously recognized [16]. In the context of our research, the extraction of implicit knowledge from online reviews represents a critical step toward systematically completing this knowledge, thus improving the potential for truly innovative and user-centered product design [17]. Despite extensive research, a universally accepted definition of implicit knowledge remains elusive. Many researchers approach the concept by focusing on edge users and edge use cases [18–22]. Another perspective defines implicit knowledge as unexpected delighters—features or attributes that, once revealed, significantly enhance user satisfaction [23]. While this latter definition captures the essence of the impact of implicit knowledge, it also makes the concept more challenging to extrapolate and apply consistently across different contexts.
1.1 Knowledge Gaps.
Exploration of user needs is a preliminary step in early-stage new product development processes [24]. Existing need-finding approaches can be divided into two categories, empirical studies and data-driven studies, with attention-based models mostly utilized in recent data-driven research [25]:
Gaps in empirical studies. These approaches rest mainly on the analysis of previous designs [26,27], surveys and focus group studies [28], and Web-based configurators [29,30]. However, these methods carry inherent biases because they target only specific portions of the user population and product instances and are limited to structured inquiries. The lack of direct methods for customers to articulate their requirements [31] and the influence of prior knowledge [32] exacerbate this shortcoming. These restrictions have hindered the widespread adoption of mass customization approaches in industry, given the considerable economic and operational gaps involved [33].
Gaps in data-driven studies. In this category, some researchers have attempted to integrate information from images and text to evaluate generative design in the design concept generation process [34–37]. For pure text-based databases, sentiment analysis has become a key enabler of "large-scale" need finding, allowing the extraction of opinions from myriad users of e-Commerce and social media platforms [38]. Sentiment analysis is the process of identifying the subjective opinion of an opinion holder (e.g., user) for a target (e.g., product attribute) from an unstructured text (e.g., product review) [24,27,39–51]. Among the three levels of sentiment analysis (document level, sentence level, and word level), ABSA provides the most fine-grained information from raw text, namely, aspect, opinion, and sentiment. With the increasing demand for unified analysis, the extraction of aspect–opinion–sentiment triplets has attracted much attention from the community [38,39,42,45,51–54]. Some researchers have expanded the task with a new label, "category," turning it into a quadruple extraction problem, ACOS (aspect, category, opinion term, sentiment) [55–60]. However, ABSA triplets and ACOS quadruples cannot elicit implicit opinions and aspects. Among the proposed methods, implicit opinions have been ignored or simply denoted with a "Null" label. Even the ACOS task is only capable of predicting the four labels: when a review does not mention explicit aspects or opinions, the model outputs "Null." This limitation prevents the model from extracting information that implies or describes the aspect indirectly.
Gaps in position encoding in attention-based language models. In pre-trained language models such as T5 (Text-To-Text Transfer Transformer) [61] and Bidirectional Encoder Representations from Transformers (BERT) [62], position information is integrated into the transformer by processing a position encoding along with the input. Previous studies follow two directions: fixed position encoding and relative position encoding. For fixed position encoding, BERT uses the same method as the original transformer model, a sinusoidal projection of the fixed position. T5 chooses the relative position encoding strategy, with a fixed maximum relative distance of 128, assuming that words farther apart than this range are unrelated. However, both position encodings take only the word position as the index; in domain-specific problems like needs identification, some words are more important than others even when they appear near the edge of the sequence. Researchers have tried various methods to reduce position bias [63–66] in sequence-to-sequence tasks, especially position-sensitive tasks such as machine translation. Yet position bias has been treated as redundancy to be removed rather than as a channel for carrying information.
Gaps in incorporating domain knowledge in LLMs. LLMs have shown their power in several NLP tasks [11,62,67–72]. The success of ChatGPT [73] has attracted more resources and raised expectations for LLMs' abilities. However, even with great text generation ability, ChatGPT cannot answer questions that require domain knowledge well. Moreover, transformer-based models lack a systematic way to include domain knowledge in the training process [25]; most domain knowledge incorporation happens through fine-tuning or few-shot learning [68,74–78]. These methods do not consider domain knowledge during training itself, so the model works purely on the fixed structure without prior knowledge.
1.2 Objectives.
This paper aims to address the absence of systematic approaches to implicit knowledge completion from online reviews by developing a new position encoding algorithm for standard attention-based models [55,79], and it introduces a new annotated dataset and an efficient position encoding algorithm that facilitate the automated extraction of implicit user needs from online reviews, with design domain knowledge integrated into the transformer. Currently, no NLP model is capable of identifying aspects or opinions that are not explicitly mentioned in a review or that do not correspond to a specific aspect, category, or sentiment. To overcome this limitation, the paper proposes a new NLP task called ACOSI, which has a structure similar to ACOS [55] but with a crucial distinction: in ACOSI, when the user does not describe an aspect explicitly, the text that implies the aspect is identified and labeled as an indirect opinion. This new task covers aspect, category, opinion, sentiment, and implicit opinion extraction, hence the acronym ACOSI.
Despite the similarity of their acronyms, ACOS [55] and ACOSI address completely different problems in terms of their outputs. In ACOS, researchers labeled implicit opinions as "Null," while in ACOSI, the model can identify and extract opinion text related to aspects, regardless of whether they are explicit or implicit. This crucial difference means that the ACOSI analysis task has the capability to output opinions in association with aspects, whereas the ACOS task does not. Design-knowledge-guided position encoding is developed to achieve the proposed goal. The key contributions of this paper are summarized below.
A newly created annotated and curated dataset is available to address ACOSI analysis tasks in the product design domain through NLP techniques. This dataset can also be utilized to solve ABSA and ACOS tasks.
A novel NLP model has been developed and trained on the annotated dataset, with the potential to solve problems related to extracting implicit sentiments, aspects, and opinions.
A DKG position encoding algorithm has been developed to overcome the context unawareness problem.
The remainder of this article is organized as follows. Section 2 provides a summary of background work related to the main research topics, including the ACOSI analysis task and position encoding in large language models. Section 3 discusses the details of the proposed ACOSI analysis task, including the DKG position encoding, the model building process, the new loss, the benchmark building process, and the model training process. Section 4 presents the experimental results, analyses, and implications of the developed methodology. Section 5 discusses limitations, and Section 6 provides concluding remarks and several directions for future research.
2 Background
This section provides an overview of related and background work on ACOSI analysis task, ABSA task in the unified model, and position encoding in the attention-based deep learning model.
2.1 ACOSI Analysis Task.
A standard sentiment analysis task involves annotating the user review language into two basic subtasks: aspect extraction and aspect-based sentiment classification. When the two subtasks are integrated, an aspect–sentiment pair of [A–S] is produced where A is the aspect term and S is the sentiment [42,44,45,51]. Expanding on this approach, Qiu et al. [80] produced a triplet of aspect A, sentiment S, and opinion O. With these two approaches, researchers were able to perform A–O pair extraction [81–83] and Aspect, Opinion, Sentiment (AOS) triple extraction [84].
Recently, Cai et al. [55] found that previous research focused solely on extracting explicit aspects and opinion terms from user product reviews, ignoring the fact that 44% of the time these reviews also contained implicit aspects or opinion terms. They introduced a comprehensive task called ACOS that combines explicit and implicit aspects and opinion terms. For example, consider the review: “I like the look and the velvet is great, but the quality of the velvet is not strong.” This review can be broken down into two segments:
“I like the look”: Here, the aspect term is implicit (Null), the category is “Appearance,” the opinion term is “like,” and the sentiment is “Positive.” Thus, the corresponding ACOS label is [Null-Appearance-like-Positive]. “The velvet is great, but the quality of the velvet is not strong”: This segment contains two aspects: (a) “The velvet is great”: The aspect term is “velvet,” the category is “Material,” the opinion term is “great,” and the sentiment is “Positive.” The ACOS label is [velvet-Material-great-Positive]. (b) “The quality of the velvet is not strong”: The aspect term is “velvet,” the category is “Material,” the opinion term is implicit (NULL), and the sentiment is “Negative.” The ACOS label is [velvet-Material-NULL-Negative].
This example demonstrates how the ACOS task can capture both explicit and implicit aspects and opinions within a single review.
While offering advantages, the quadruple extraction task fails to efficiently integrate the four subtasks and capture implicit aspects and opinions. This limits the extraction of useful information, such as the concern raised about the durability of velvet when an opinion is labeled “Null.” To enhance the effectiveness of the ACOS quadruple, we incorporated the implicit “I” tag, resulting in the ACOSI quintet task. It is important to note that the order of these elements (A, C, O, S, I) in the annotation is flexible and can be adapted to specific needs or preferences. In this task, annotators mark the span of the opinion text and identify whether it is a direct or indirect opinion. For instance, the phrase “the velvet quality does not hold up” might be annotated as [velvet-Material, quality does not hold up, Negative, Indirect Opinion] or [velvet-Material, Negative, quality does not hold up, Indirect Opinion]. This approach enables us to maintain the information contained within the opinion text span while indicating its implicit nature. The flexibility in the order of ACOSI elements allows for consistency across annotation formats and model outputs; the key is that all five elements (A, C, O, S, I) are present in the annotation, regardless of their sequence. The task is shown in Fig. 1.
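To make the annotation format concrete, the sketch below shows one possible in-memory representation of ACOSI quintuples in Python; the class and field names are our own illustration and not part of the published annotation schema, and the example values restate the velvet review above.

```python
from dataclasses import dataclass

@dataclass
class ACOSIQuintuple:
    aspect: str      # aspect term, or "NULL" when implicit
    category: str    # e.g., "Material", "Appearance"
    opinion: str     # annotated opinion span (retained even when indirect)
    sentiment: str   # "Positive" | "Negative" | "Neutral"
    indicator: str   # "Direct" | "Indirect" opinion flag

# The velvet review from this section, encoded as quintuples. Note that the
# third tuple keeps the implicit opinion span instead of collapsing it to NULL.
labels = [
    ACOSIQuintuple("NULL", "Appearance", "like", "Positive", "Direct"),
    ACOSIQuintuple("velvet", "Material", "great", "Positive", "Direct"),
    ACOSIQuintuple("velvet", "Material", "quality does not hold up", "Negative", "Indirect"),
]
```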
2.2 ABSA in a Unified Model.
The ACOSI analysis task can be separated into three classification tasks, namely, category classification, sentiment classification, and implicit indicator classification, as well as two sequence generation tasks, aspect extraction and opinion extraction. Previous research focused on specific subsets of these labels as expansions of ABSA [56,59,85], for example, aspect extraction, opinion extraction, sentiment extraction [81], aspect–opinion extraction [84], aspect–sentiment extraction, and opinion–sentiment extraction [86]. However, combining these subtasks is time-consuming and does not capture the mutual dependence of sentiments on both opinions and aspects. To achieve a unified approach that accomplishes multiple tasks within a single model, previous studies have followed three directions:
Developing heuristic models based on syntax rules and algorithms based on lexicons [101].

Building deep learning models using an extraction approach based on named entity recognition that assigns a tag to each word within the text-based dataset [60,102–104].

Generating all necessary labels using a predictive format, where the model outputs labels iteratively based on the previous ones. In other words, the model operates sequentially [58,105,106].

Across these directions, the review text can be categorized into explicit and implicit aspects and opinions. However, previous research has not given sufficient attention to the implicit category, with researchers often denoting implicit aspects and opinions as “Null.” Additionally, there has been a lack of detailed analysis regarding the “implicit” category.
2.3 Large Language Model for Implicit Knowledge Completion.
The use of LLMs for implicit knowledge completion has become an attractive topic in recent years. Researchers have explored incorporating named entity recognition tasks to extract entities and reassemble the results according to user needs [60,87]. One approach involves extracting and summarizing information from user reviews [88]. In sentiment analysis, researchers have used LLMs to analyze user feedback and reviews, extracting sentiment and identifying specific pain points or desired features [89,90]. For requirements elicitation, LLMs have been employed to process and summarize large volumes of user interviews and surveys, helping to identify common themes and requirements [91]. In trend prediction, by analyzing social media posts and online discussions, LLMs have been used to predict emerging user needs and preferences [92]. Regarding persona generation, some studies have explored the use of LLMs to create more accurate and diverse user personas based on aggregated user data [93]. In the natural language processing of user feedback, LLMs have been used to process and categorize open-ended feedback, enabling more efficient analysis of large-scale user studies [94–96]. Additionally, researchers have experimented with LLM-powered chatbots to conduct initial user interviews and gather preliminary insights [97,98]. While LLMs have shown significant promise in these areas, researchers also caution about possible biases and emphasize the need for human oversight in interpreting results [99]. As the technology continues to evolve, it is expected that LLMs will play an increasingly important role in implicit knowledge completion, complementing traditional research methods.
2.4 Position Encoding.
In the classic encoder–decoder structured model [25], the position information of the input and output must also be included in the training process. Raw position information increases linearly with word order, but each word should carry comparable positional information rather than linearly growing values; to solve this problem, the original transformer model uses trigonometric functions to scale positions into the range (−1, 1). However, the trigonometric transformation still feeds fixed information to the model, and the relationships between words are neglected. To capture relational information between words in the input and output, relative position encoding was developed [100]; in relative position encoding, the position information is no longer represented by a single vector but by a square matrix whose dimension equals the length of the input/output. Relative position encoding shows its advantage in the T5 model [61].
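For reference, the sketch below contrasts the two directions in plain NumPy: the fixed sinusoidal encoding of the original transformer and a relative-offset matrix in the spirit of T5. The clipping at a maximum distance of 128 is a simplification (T5 additionally buckets offsets logarithmically).

```python
import numpy as np

def sinusoidal_position_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed encoding from the original transformer; values lie in (-1, 1)."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                 # (seq_len, d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])           # even dims: sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])           # odd dims: cosine
    return enc

def relative_position_matrix(seq_len: int, max_distance: int = 128) -> np.ndarray:
    """Relative scheme: a seq_len x seq_len matrix of pairwise offsets,
    clipped so that words farther apart than max_distance are treated as
    equally (un)related."""
    offsets = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    return np.clip(offsets, -max_distance, max_distance)
```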
3 Methodology
This section introduces the dataset along with the proposed model structure and the DKG algorithm.
3.1 The Dataset.
The unprocessed review dataset was extracted from three websites, namely, Finish Line, ASICS, and New Balance, resulting in a total of 145,430 reviews. Within this dataset, there were 10,700 lengthy reviews that exceeded 60 words, 22,458 concise reviews with less than 10 words, and 1636 reviews that referenced specific product names. To eliminate any innate bias, lengthy and concise reviews were discarded, resulting in a dataset of 59,184 reviews. Within this filtered dataset, 75.89% were rated 5-star, 12.85% 4-star, 4.96% 3-star, 2.64% 2-star, and 4.92% 1-star. In the referenced annotated dataset [107], we utilized basic length heuristics to filter reviews, preserving those with 2–5 sentences, and conducted a uniform sampling based on the star rating of the reviews. The annotation task was performed by native-English-speaking product design students whom we hired. Each example was assigned two annotators, and any conflicts that arose were resolved by a project lead. We used Cohen’s Kappa [108] as the metric to measure inter-annotator agreement, and we observed a 52.8% chance-adjusted agreement on the categorical components and 40.8% on the span extraction tasks, which include aspect and opinion, between the annotators. These agreement scores reflect the inherent complexity of multi-label annotation in product reviews, where multiple valid interpretations often exist for the same text segment. For instance, when a reviewer comments on a shoe’s “uncomfortable fit,” one annotator might focus on the general comfort category while another might specifically note the sizing aspect, both being equally valid interpretations. Each sample was annotated following the ACOSI format, resulting in a set of ground-truth quintuples, which include aspect term, aspect category, opinion, sentiment polarity, and an implicit/explicit opinion flag. To ensure annotation quality despite the inherent subjectivity, we implemented a rigorous review process where any significant disagreements were discussed and resolved by the project lead, who had extensive experience in both product design and annotation tasks. Annotators were instructed to annotate implicit opinion spans by labeling the minimal supporting span that covered the implicit sentiment expression for the current tuple. Future research will expand the dataset with additional annotators and implement more detailed annotation guidelines to further improve consistency while maintaining the richness of multi-perspective interpretation.
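To make the filtering and sampling procedure concrete, the sketch below shows one way to implement the length heuristic and the uniform per-rating sampling; the record fields ("text", "stars") and the crude sentence splitter are our own assumptions, and per_star=400 follows the balanced selection described in Sec. 4.1.

```python
import random
import re

def sentence_count(review: str) -> int:
    # Crude split on terminal punctuation; adequate for a length heuristic.
    return len([s for s in re.split(r"[.!?]+", review) if s.strip()])

def filter_and_sample(reviews, per_star=400, seed=0):
    """Keep 2-5-sentence reviews, then sample uniformly per star rating."""
    kept = [r for r in reviews if 2 <= sentence_count(r["text"]) <= 5]
    rng = random.Random(seed)
    sample = []
    for star in (1, 2, 3, 4, 5):
        pool = [r for r in kept if r["stars"] == star]
        sample += rng.sample(pool, min(per_star, len(pool)))
    return sample
```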
3.2 Unified Model Structure and Implicit Knowledge Completion.
Pre-trained LLMs utilize the likelihood of single words or word spans in real text to encode natural language into numerical values. The concept of a neural language model, introduced in the early 2000s, has since been integrated into many text analysis tasks. With the advent of deep learning, researchers have developed LLMs that are pre-trained on large corpora. T5 [61] is a pre-trained LLM developed by Google AI Language that uses the transformer as its backbone structure and trains on massive datasets, specifically the full version of the C4 dataset [109]. These powerful LLMs can be fine-tuned for specific NLP tasks, for example by adding a single task-specific layer, resulting in state-of-the-art performance on several NLP tasks. In this study, we use T5 as our base LLM and fine-tune it for the ACOSI label prediction task. The identification of implicit knowledge is still in its infancy. Some research divides implicit knowledge into three categories [23], unexpected delighters, lead user needs, and extraordinary user needs, while other work defines it as needs from edge users/latent users [18,20,21,110,111]. Systematic methods for implicit knowledge completion still lack the ability to identify indirect opinions expressed in text, lack sufficiently large databases (leading to increased bias), and face the challenge of obtaining and hierarchically categorizing product attributes without human involvement.
In this article, inspired by the work of Cai et al. [55], we further expand the quadruple task. In their formulation, for example, in the review “I like the look and the velvet is great, but the quality of the velvet is not strong,” the aspect term is “velvet,” the category is “Material,” the opinion term is “great,” and the sentiment is “Positive.” Thus, the corresponding ACOS label is [velvet-Material-great-Positive]. Additionally, we can extract [Null-Appearance-like-Positive], where the aspect term is implicit in the first part of the review, and [velvet-Material-NULL-Negative], where the opinion term is implicit in the third part.
As detailed in Sec. 2.1, we enhance the ACOS quadruple with the implicit “I” tag, which results in the ACOSI quintet task: annotators mark the span of the opinion text and identify whether it is a direct or indirect opinion, e.g., [velvet-Material-quality does not hold up (Indirect Opinion)-Negative] for the phrase “the velvet quality does not hold up.” This approach enables us to maintain the information contained within the opinion text span while indicating its implicit nature. The ACOSI analysis task is completed using a consolidated model that generates output. The T5 model processes all text-related tasks in a sequence-to-sequence fashion, where tasks like sentiment analysis produce outputs such as “positive” and “negative”; even regression tasks are handled this way, with predictions emitted as strings such as “five.” The T5 model uses the standard encoder–decoder transformer architecture [25], and the knowledge-guided position algorithm was tested against this baseline under otherwise identical experimental settings. The pipeline comprises the components described below; Fig. 2 provides a visual representation of the entire structure of the model.
Tokenization. To provide text input to the model, the entire text must be encoded into input IDs. In this experiment, the T5 tokenizer, whose vocabulary contains 32,128 entries, is used in the first step. Given that this problem involves multiple tasks and labels, special tokens are added to the vocabulary: a separator denoting the initiation of a label set and one token for each of the five label types (see Sec. 4.1).
Encoding process. The input data are encoded using the T5 large model encoder, which comprises a conventional transformer encoder with 12 attention layers, where each layer has a hidden dimension of 768.
Knowledge guided encoding. The position encoding of the input is performed before the input is fed into the model; the details of the algorithm are introduced in Sec. 3.2.1.
Decoding process. Using a generative approach, the encoder output is decoded. The decoder employs a six-layer attention mechanism and generates each prediction through an auto-regressive process by which the next prediction is based on the previous output and the current decoder output.
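As a minimal sketch of this encode–decode pipeline, the snippet below runs a T5 checkpoint through the Hugging Face transformers API. The model name is illustrative, and the generation settings mirror the hyperparameters swept in Table 1 rather than a single published configuration.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Model name is an assumption for illustration, not the exact checkpoint used.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

review = ("I like the look and the velvet is great, "
          "but the quality of the velvet is not strong.")
inputs = tokenizer(review, return_tensors="pt")

# Auto-regressive decoding: each step conditions on previously generated tokens.
output_ids = model.generate(**inputs, max_length=128, num_beams=3,
                            no_repeat_ngram_size=2)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```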
3.2.1 Design-Knowledge-Guided Position Encoding.
Algorithm 1: Design-knowledge-guided position encoding. Given a pre-defined design knowledge lexicon and an input sequence of length n, the algorithm tests each word in the input against the lexicon and assigns its position value according to whether the word matches a lexicon entry.
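Because only the lexicon input and the conditional structure of Algorithm 1 are given above, the sketch below is a reconstruction under assumptions of one plausible realization: lexicon words keep a privileged shared index, so they are never penalized for appearing near the edge of the sequence. The lexicon entries shown are illustrative, not the actual sneaker attribute lexicon of Sec. 4.1.

```python
import numpy as np

# Illustrative entries; the actual lexicon is the sneaker attribute lexicon.
DESIGN_LEXICON = {"cushion", "heel", "insole", "sole", "durability", "fit"}

def dkg_positions(tokens, lexicon=DESIGN_LEXICON):
    """Assign position indices with design terms privileged (hypothetical).

    A word found in the design lexicon keeps a small shared index, while
    every other word keeps its ordinary linear index. The resulting indices
    can then be fed to any standard (fixed or relative) position encoding.
    """
    return np.array([0 if tok.lower() in lexicon else i
                     for i, tok in enumerate(tokens)])

# "heel" is pinned to the privileged index regardless of where it occurs:
print(dkg_positions("the outer heel collapsed after a week".split()))
# -> [0 1 0 3 4 5 6]
```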
3.2.2 Loss Function.
The loss is a weighted cross-entropy over the entire vocabulary: vocabulary entries are weighted with the design lexicon, so the loss in each iteration is directly affected by how well the prediction preserves design-related tokens.
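A minimal sketch of such a lexicon-weighted cross-entropy is shown below in PyTorch. The exact weighting scheme and boost factor are not specified above, so the `boost` parameter is a hypothetical placeholder.

```python
import torch
import torch.nn.functional as F

def lexicon_weighted_ce(logits, targets, lexicon_token_ids, vocab_size, boost=2.0):
    """Cross-entropy with design-lexicon vocabulary entries up-weighted.

    A sketch under assumptions: the weighting scheme and `boost` value are
    illustrative, not the published configuration.
    """
    weights = torch.ones(vocab_size, device=logits.device)
    weights[lexicon_token_ids] = boost           # up-weight lexicon entries
    return F.cross_entropy(logits.view(-1, vocab_size),
                           targets.view(-1), weight=weights)
```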
3.2.3 The Design Knowledge Benchmark.
In most classification tasks, accuracy, recall, and F1 score serve as standard benchmarks. However, these three classification benchmarks may not fully address the complexities of the ACOSI analysis task. This inadequacy arises because the primary objective of the ACOSI analysis task is to identify implicit opinions from reviews, often requiring identification of aspects that are similar to, but not exactly matching, the provided labels. Consequently, standard classification benchmarks, which predominantly rely on exact matches, may not suffice.
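To illustrate the idea behind a lexicon-aware, soft-matching benchmark, the sketch below computes a lexicon-weighted ROUGE-1 recall, in which overlapping unigrams from the design lexicon count more than ordinary words. The exact weighting used in the published DKG-ROUGE metric is an assumption here.

```python
from collections import Counter

def dkg_rouge1_recall(prediction: str, reference: str, lexicon, boost=2.0):
    """Lexicon-weighted ROUGE-1 recall (sketch; weighting is an assumption).

    Near-miss predictions that preserve design terms are rewarded more than
    exact-match classification metrics would allow.
    """
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    weight = lambda tok: boost if tok in lexicon else 1.0
    overlap = sum(weight(t) * c for t, c in (pred & ref).items())
    total = sum(weight(t) * c for t, c in ref.items())
    return overlap / total if total else 0.0
```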
4 Experiments and Results
In this work, we choose the T5 model [11] as our baseline, which has the ability to solve all types of NLP tasks in a unified generative manner.
4.1 Data Preparation and Tokenization.
Regarding references to attributes, the most frequently mentioned were “Exterior” and “Fit.” Specifically, among all reviews that referenced attributes, 58.25% referred to “Exterior,” 76.88% to “Fit,” 12.21% to “Shoe parts,” 32.11% to “Durability,” 7.33% to “Permeability,” 15.48% to “Stability,” and 16.59% to “Impact absorption.” The attribute data were scrutinized with a specialized sneaker attribute lexicon [85]. After filtering, 2000 reviews were randomly selected for annotation, with 400 reviews from each rating, to create a more balanced training dataset.
Since there are five different labels in the same task, we introduce special tokens to distinguish them; to be more specific, we introduced six new tokens into the tokenizer used in the experiment. The first token is a separator that indicates the start of a new set of labels, and the remaining five mark the aspect, category, opinion, sentiment, and implicit-indicator labels.
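A minimal sketch of extending the tokenizer is shown below. The literal token strings are placeholders standing in for the six tokens described above, and the model name is illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")

# Placeholder names for the separator plus the five label-type markers;
# the paper's literal token strings are not reproduced here.
special = ["<sep>", "<aspect>", "<category>", "<opinion>", "<sentiment>", "<implicit>"]
tokenizer.add_special_tokens({"additional_special_tokens": special})

# The model's embedding matrix must then be resized to match:
# model.resize_token_embeddings(len(tokenizer))
```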
4.2 Model Performance.
In the experiment, we chose the pure T5 model as our baseline comparison model; for fairness of comparison, we used exactly the same parameter settings and benchmarks to measure performance. For each parameter setting, we chose the best model as the final model. The benchmarks include ROUGE-1, ROUGE-2, ROUGE-L (longest common subsequence), and DKG-ROUGE. Since the small magnitude of DKG-ROUGE scores makes the difference between the baseline model and the DKG model hard to interpret, we provide another metric, the design bag-of-words (DBW) count, which uses a bag-of-words technique to give a more direct comparison. The DBW used in the experiments counts the words that appear in both the test predictions and the attribute lexicon; the results show that, across 13 parameter settings, the DKG algorithm robustly has a higher chance of predicting design-related words in the test set. The result is shown in Fig. 4, and the statistical results of DKG-ROUGE are shown in Figs. 5 and 6. In the following figures and charts, T5 is the base model used in the article and DKG-T5 is the new model we developed (Table 1).
| Model | Learning rate | Max. prediction length | No. of beams | No-repeat n-gram size | ROUGE-1 | DKG-ROUGE | DBW |
|---|---|---|---|---|---|---|---|
| Pure T5 |  | 64 | 3 | 2 | 0.658 | 0.0030 | 507 |
|  |  | 128 | 3 | 2 | 0.686 | 0.0036 | 568 |
|  |  | 256 | 3 | 2 | 0.673 | 0.0031 | 541 |
|  |  | 128 | 3 | 2 | 0.688 | 0.0031 | 502 |
|  |  | 128 | 3 | 2 | 0.689 | 0.0034 | 523 |
|  |  | 128 | 4 | 2 | 0.686 | 0.0033 | 515 |
|  |  | 128 | 5 | 2 | 0.686 | 0.0029 | 488 |
|  |  | 128 | 3 | 3 | 0.692 | 0.0031 | 502 |
|  |  | 64 | 4 | 4 | 0.687 | 0.0034 | 525 |
|  |  | 64 | 3 | 5 | 0.684 | 0.0032 | 524 |
|  |  | 64 | 5 | 4 | 0.683 | 0.0030 | 512 |
|  |  | 128 | 3 | 4 | 0.695 | 0.0029 | 462 |
|  |  | 128 | 3 | 5 | 0.694 | 0.0034 | 542 |
| DKG transformer |  | 64 | 3 | 2 | 0.566 | 0.0037 | 504 |
|  |  | 128 | 3 | 2 | 0.561 | 0.0033 | 524 |
|  |  | 256 | 3 | 2 | 0.561 | 0.0042 | 607 |
|  |  | 128 | 3 | 2 | 0.557 | 0.0038 | 588 |
|  |  | 128 | 3 | 2 | 0.552 | 0.0039 | 562 |
|  |  | 128 | 4 | 2 | 0.515 | 0.0043 | 657 |
|  |  | 128 | 5 | 2 | 0.519 | 0.0034 | 535 |
|  |  | 128 | 3 | 3 | 0.523 | 0.0044 | 615 |
|  |  | 64 | 4 | 4 | 0.561 | 0.0027 | 514 |
|  |  | 64 | 3 | 5 | 0.550 | 0.0044 | 624 |
|  |  | 64 | 5 | 4 | 0.555 | 0.0030 | 506 |
|  |  | 128 | 3 | 4 | 0.557 | 0.0041 | 623 |
|  |  | 128 | 3 | 5 | 0.539 | 0.0026 | 517 |
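For clarity, the sketch below shows how the DBW count in Table 1 can be computed; whitespace tokenization and case folding are assumptions about the published implementation.

```python
def design_bag_of_words(predictions, lexicon):
    """Count predicted tokens that appear in the design attribute lexicon.

    A sketch of the DBW comparison; tokenization details are assumptions.
    """
    return sum(1 for pred in predictions
               for tok in pred.lower().split() if tok in lexicon)
```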
In parallel experiments using the regular ROUGE score as the benchmark, we find that T5 universally scores about 0.1 higher than the DKG algorithm; meanwhile, on the 1k training dataset, we noticed that T5 converges faster than the DKG algorithm. We attribute both results to T5’s much larger pre-training corpus: the prior knowledge in T5 lets the model generate predictions faster, and its outputs are automatically more correct in terms of syntax, which also leads to a higher ROUGE score. It is worth mentioning that DKG attention forces the multi-head mechanism to pay more attention to a specific domain, which inevitably pushes the results further from the general context; this also explains why the pure T5 model has a better plain ROUGE performance than the DKG model.
In terms of domain knowledge performance, our observations indicate that our model outperforms the best T5 models. Specifically, all of the best T5 models exhibit a DKG-ROUGE score near 0.003 (rounded to four decimal places), while in our experiment, 7 of the 13 parameter settings demonstrate superior domain knowledge performance compared to T5.
Conducting a hypothesis test using a Student’s t-test with a 95% confidence level, we obtain a p-value of 0.01. This result suggests that our DKG model performs better than the pure T5 model in domain knowledge extraction. Furthermore, we present the statistics report of the bag-of-words in Fig. 4. From this report, we can readily conclude that the DKG position encoding algorithm consistently predicts more words related to the domain of sneaker design attributes.
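The test can be reproduced from the DKG-ROUGE columns of Table 1, as sketched below; whether the published test was paired or independent, one- or two-sided, is an assumption, and an independent two-sample test is shown here.

```python
from scipy import stats

# DKG-ROUGE scores for the 13 parameter settings, read from Table 1.
t5_scores  = [0.0030, 0.0036, 0.0031, 0.0031, 0.0034, 0.0033, 0.0029,
              0.0031, 0.0034, 0.0032, 0.0030, 0.0029, 0.0034]
dkg_scores = [0.0037, 0.0033, 0.0042, 0.0038, 0.0039, 0.0043, 0.0034,
              0.0044, 0.0027, 0.0044, 0.0030, 0.0041, 0.0026]

t_stat, p_value = stats.ttest_ind(dkg_scores, t5_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```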
Our findings have significant implications for extracting user knowledge, particularly implicit user knowledge, through the ACOSI analysis task. The superior performance of the DKG model in domain knowledge extraction suggests that it is more skilled at identifying and categorizing specific aspects of user feedback that relate to product design and user experience. This aligns with our core aim of developing a more nuanced and context-aware method for completing implicit knowledge from online reviews. The improved ability to capture domain-specific language and concepts is crucial for identifying implicit knowledge, which is often embedded in the subtleties of user expressions and may not be immediately apparent in standard language models.
The ACOSI (aspect, category, opinion, sentiment, and implicit indicator) task benefits from this enhanced domain knowledge in several ways. First, the more accurate identification of aspects and categories ensures that we are capturing the relevant product features and usage contexts. Second, the improved extraction of opinions and sentiments allows for a more nuanced understanding of user experiences. Crucially, the ability to better identify implicit indicators is vital for completing implicit knowledge, as these are often not explicitly stated but inferred from context and domain-specific language. The benchmark advances implicit knowledge completion because it goes beyond simple language understanding to incorporate domain-specific knowledge and context. This allows for a more sophisticated interpretation of user feedback, potentially revealing knowledge that users themselves may not have directly articulated. By combining the strengths of large language models (like T5) with domain-specific guidance (through DKG), we create a system that is both linguistically competent and contextually aware, making it particularly well-suited for the complex task of completing implicit knowledge in specific product domains.
This analysis reinforces our initial research background and core aim to develop a more effective method for completing implicit user knowledge from online reviews in specific product domains. The results demonstrate that our approach offers a significant advance in this direction, providing a more nuanced and context-aware tool for designers and researchers in the field of user-centered product development.
4.3 Example Model Predictions.
The results of the post-analysis with the model output associated with an example user review are presented in this section.
Review
My ankles tend toward supination, so a well-cushioned heel is crucial. Every other pair of NBs I own give great even support, but the outer heels on this pair collapsed after less than a week ! The shoes now slant outward in a very unsafe and totally unacceptable way that makes my supination way worse. They are completely unusable - huge waste of money. Don’t buy this pair unless you have a perfectly even foot strike!
Elicited Labels
Label 1: (“NULL,” “ContextOfUse#Purchase_Context,” “neutral,” “ankles tend toward supination, so a well-cushioned heel is crucial,” “direct”)
Label 2: (“outer,” “Appearance#Shoe Component,” “negative,” “outer,” “direct”)
Label 3: (“outer,” “ContextOfUse#Usage frequency,” “negative,” “IMPLICIT,” “direct”)
Label 4: (“shoes,” “Performance#Sizing/Fit,” “negative,” “makes my supination way worse,” “indirect”)
Label 5: (“NULL,” “Cost/Value,” “negative,” “huge waste of money,” “indirect”)
Label 6: (“this,” “Performance#Sizing/Fit,” “negative,” “unless you have a perfectly even foot strike,” “indirect”)
User Needs
The user has specific needs due to their tendency toward supination (where the foot rolls outward). They require a shoe with a well-cushioned heel that provides even support to counteract their supination tendencies. Labels 4 and 6 indicate that sneaker designers may need to put more effort into designing for users with special needs; even a normal user can have uneven foot strikes in daily life, and a “smart” insole could perhaps help in this situation.
Review
I purchased these less than a month ago and the pattern is so cute but It is so worn that it starts to look trashy. I would not recommend any of the tie dye crocs.
I love the shoe itself, just not how delicate they are with the pattern.
Elicited Labels
Label 1: (“NULL,” “ContextOfUse#Review_Temporality,” “neutral,” “purchased these less than a month ago,” “direct”)
Label 2: (“tie dye crocs,” “Appearance#Color,” “negative,” “I would not recommend any,” “direct”)
Label 3: (“shoe,” “General,” “positive,” “I love the shoe,” “direct”)
Label 4: (“shoe,” “Appearance#Material,” “negative,” “just not how delicate they are with the pattern,” “direct”)
Label 5: (“NULL,” “Appearance#Color,” “negative,” “pattern is so cute,” “direct”)
User Needs
The user shows a pronounced fondness for the ornamental configurations of this particular sneaker. Labels 4 and 5 serve as indicators that sneaker designers should seriously consider durability even in the creation of a seemingly straightforward pattern. The user need in this example is a sneaker that blends esthetic appeal with durability, emphasizing the desire for ornamental patterns that maintain their visual quality over time. They prefer products that offer both style and longevity, suggesting a need for durable or replaceable decorative elements that do not detract from the shoe’s overall quality and appearance. It is important to note that while the above labels are direct outputs from our model performing implicit knowledge completion, the user needs described here represent human designers’ interpretation and analysis of these model-generated labels. The model itself does not directly extract or identify user needs; rather, it provides structured information that human experts can then analyze to identify and articulate specific user needs and design implications.
5 Discussion and Limitations
While our proposed ACOSI analysis task demonstrates promising results in analyzing customer reviews for athletic footwear, it is important to acknowledge several limitations that may affect its broader applicability and generalizability.
Transferability. The current study focused exclusively on athletic footwear reviews from three specific websites (Finish Line, ASICS, and New Balance). This narrow product focus raises questions about the transferability of our approach to other product categories, particularly more complex or technically sophisticated items. For instance: (a) Complex engineering products: The applicability of our ACOSI analysis task to products like electric vehicles, which involve numerous intricate components and systems, remains untested. Such products may require a more nuanced approach to identify and categorize customer needs accurately. (b) Consumer electronics: High-tech consumer products often have rapidly evolving features and capabilities, which may present challenges for our current model in capturing and interpreting user needs effectively. (c) Service-based products: Our ACOSI analysis task’s effectiveness in analyzing reviews for intangible products or services, which may have different evaluation criteria compared to physical goods, is yet to be explored.
Scalability. While our dataset of 59,184 reviews provided a substantial basis for analysis, there are potential scalability concerns: (a) Data volume: As the volume of reviews increases, particularly for popular products or diverse product lines, the computational resources required for processing and analysis may become a limiting factor. (b) Language diversity: Our current model is trained on English-language reviews. Scaling to accommodate multiple languages would require significant additional resources and may introduce new challenges in maintaining consistency across languages. (c) Real-time analysis: The ability of our ACOSI analysis task to handle real-time or near-real-time analysis of incoming reviews, which would be valuable for timely product improvements, has not been assessed.
Implicit knowledge completion. Our approach to identifying implicit knowledge, while innovative, has limitations: (a) Subjectivity: The identification and categorization of implicit knowledge may still be influenced by subjective interpretations, despite our efforts to maintain objectivity. (b) Cultural context: The model’s ability to discern culturally specific implicit knowledge may be limited, potentially missing important nuances in global markets. (c) Temporal dynamics: Customer needs and expectations evolve over time. Our current ACOSI analysis task may not adequately capture these temporal dynamics, potentially leading to outdated insights if not regularly updated.
Lexicon dependency and evaluation metrics. Our method’s reliance on domain-specific lexicons for positional encoding presents both strengths and limitations. Although this approach allows for precise, context-aware analysis in the sneaker domain, it raises questions about generalizability to other product categories. The creation of high-quality lexicons for diverse domains may be challenging and resource-intensive, potentially limiting the method’s broad applicability. For instance: (a) Diverse product categories: Developing comprehensive lexicons for products with rapidly evolving features (e.g., smartphones) or highly specialized technical components (e.g., industrial machinery) may require frequent updates and extensive domain expertise. (b) Abstract concepts: For services or products dealing with intangible qualities, creating an exhaustive and accurate lexicon might be particularly challenging. Additionally, while the ROUGE-based evaluation metric provides a standardized measure of performance, it may not fully capture the nuanced aspects of latent need identification. Future work should explore complementary evaluation methods that can better assess the quality and relevance of extracted implicit knowledge across various domains. This could include human expert evaluations, user studies, or more sophisticated natural language understanding metrics that go beyond surface-level text similarity. It is important to note that while the lexicon used in DKG-ROUGE and DBW metrics was utilized in the DKG algorithm, it was not directly used in the training process of the DKG-T5 model. The gradient descent during the training process did not optimize to align the results with the lexicon. Therefore, while the lexicon played a role in informing the model with some design-related knowledge, it did not inherently bias the model’s prediction performance. This separation ensured that the evaluation of the model remained robust and unbiased. Future research could explore alternative methods for knowledge integration and evaluation to further validate this approach.
Label order in auto-regressive models. One important consideration in our approach is the potential impact of the sequence order in text generation when using auto-regressive models like T5. The order in which we present the ACOSI elements during training and inference (e.g., ACOSI, CSOIA, SOAIC, etc.) may influence the model’s predictions. While our current study did not explicitly investigate this aspect, it is a valuable point for future research. The sequence dependency in auto-regressive models could potentially affect the reliability and consistency of predictions across different element orders. This limitation opens up an interesting avenue for future work, where experiments could be conducted to train and evaluate the model using various ACOSI element sequences. Such studies could provide insights into the robustness of the approach and potentially lead to ensemble methods that leverage multiple sequence orders to enhance prediction reliability. As we continue to refine and expand this research, exploring the impact of element order on model performance will be a priority to ensure the most accurate and dependable results in aspect-based sentiment analysis tasks.
Addressing these limitations will be crucial to enhance the robustness and wide-scale applicability of our proposed ACOSI analysis task. Future research should focus on validating the approach across diverse product categories, improving scalability for larger and more diverse datasets, and refining the identification of implicit knowledge to consider cultural and temporal factors.
6 Conclusions and Future Research Directions
This article introduces a groundbreaking dataset designed to address a novel natural language processing task known as ACOSI (aspect, category, opinion, sentiment, and implicit indicator), the comprehensive extraction of implied requirements from online reviews. It proposes a pioneering approach that combines a consolidated T5 model with a DKG position encoding algorithm to automate the generation of implied opinions and aspects on a wide scale. By leveraging advanced research in natural language processing and language models, this approach aims to significantly reduce the time and effort required for data preparation. Moreover, it seeks to decrease reliance on manually crafted expert systems for extracting aspect–opinion–sentiment from reviews. The strengths of this approach for large-scale ACOSI extraction include:
Ability to incorporate domain knowledge. In current transformer-based language models, domain knowledge is usually integrated by fine-tuning on a smaller domain dataset. Our work provides a novel, effective, and efficient alternative: by changing only the position encoding, we emphasize prior knowledge in the network.
Transformation. The method culminates in the extraction of a comprehensive list of prospective aspects along with their associated opinions. This represents a noteworthy advance in facilitating automatic, wide-ranging elicitation of implicit aspect opinions, transcending user-centric approaches and potentially uncovering more informative and revolutionary insights to underpin the design process.

Scalability. Leveraging pre-trained language models, such as T5, alleviates the requirement for extensive manually annotated data. All methodological components are systematized in a streamlined manner and are amenable to rapid adaptation and application to novel datasets.
The emergence of multi-agent LLMs has garnered significant attention in recent years [113,114]. Unlike traditional single input/output systems [115], multi-agent systems enable models to collaborate or compete with each other within a more intricate environment. This setup closely mirrors real-world systems, offering the potential to simulate human thinking processes more accurately. In many complex tasks, such as the ultimate goal proposed in this paper, addressing implicit knowledge completion proves challenging within a single-model setup. Implicit knowledge completion pertains to extracting requirements not explicitly articulated in review texts, a task that is challenging even for human annotators. Consequently, benchmarking the evaluation of the model’s output becomes exceedingly difficult.
However, by simulating implicit knowledge completion in a manner akin to human thinking or brainstorming processes, each sub-step of the model can be validated more accurately. For instance, breaking implicit knowledge completion down into basic functionality, emotional support, and fashion allure, and further subdividing each into more granular needs, facilitates a finer evaluation. Through such breakdowns of the ultimate goal, multi-agent systems can leverage diverse types of prior knowledge and agents with distinct “personalities.” This diversity enhances the validity and robustness of the system.
Potential implementation: The completion of implicit knowledge from online reviews has the potential to significantly enhance the process of user needs finding in the field of human-centered design. Our approach can serve as a valuable tool for designers and researchers, uncovering insights not immediately apparent or explicitly stated in user feedback, and revealing deeper user needs and desires. This paves the way for more innovative and user-centered product designs. Furthermore, the ability of our approach to accurately identify aspects, categories, opinions, sentiments, and implicit indicators in user feedback provides a comprehensive and nuanced understanding of user experiences. This can aid designers in understanding the context and subtleties of user needs, allowing them to create products that meet current expectations and anticipate future demands. It is important to note that while our approach aids in the completion of implicit knowledge, it should not replace direct user engagement in the design process, but rather serve as a complementary tool enhancing traditional user research methods.
In conclusion, our ACOSI analysis task and the associated T5 model with the DKG positional coding algorithm constitute a significant advancement in the field of natural language processing, specifically for the automated extraction of implicit aspect-opinion-sentiment from reviews. The proposed approach has demonstrated promising results in the athletic footwear domain, and it is expected to make substantial contributions to other domains as well, given its scalability and adaptability. Future research will focus on addressing the limitations identified in this study and further refining the ACOSI analysis task and the associated models for wider and more effective applications.
Acknowledgment
This material is based on work supported by the National Science Foundation (NSF) under the Engineering Design and System Engineering (EDSE) Grant No. 2050052. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.