Abstract

Incorporating style-related objectives into shape design has been centrally important to maximize product appeal. However, algorithmic style capture and reuse have not fully benefited from automated data-driven methodologies because of the challenging nature of design describability. This paper proposes an AI-driven method to fully automate the discovery of brand-related features. First, to tackle the scarcity of vectorized product images, this research proposes two data acquisition workflows: parametric modeling from small curve-based datasets, and vectorization from large pixel-based datasets. Second, this study constructs BIGNet, a two-tier Brand Identification Graph Neural Network, to learn from both curve-level and chunk-level parameters of scalable vector graphics (SVGs). In the first case study, BIGNet not only classifies phone brands but also captures brand-related features across multiple scales, such as the lens location, as confirmed by AI evaluation. In the second study, this paper showcases the generalizability of BIGNet learning from a vectorized car image dataset and validates the consistency and robustness of its predictions across four training scenarios. The results match the differences commonly observed between luxury and economy brands in the automobile market. Finally, this paper visualizes the activation maps generated from a convolutional neural network and shows BIGNet’s advantage of being a more explainable style-capturing agent.

1 Introduction

Recognizing, codifying, and incorporating desired stylistic objectives into shape design has long been a focus of product development [1,2]. While market appeal is important, conveying specific aesthetic styles through design can be challenging and unpredictable due to the need for differentiation from previous designs and the time-variant nature of style [3,4].

Attempting to relate aesthetic attributes to consumer response and market success, Liu et al. [5] studied three aspects of car aesthetics’ impact on the market: segment prototypicality (SP), brand consistency (BC), and cross-segment mimicking. Among the three, BC was shown to have the most consistent positive effect on profit, indicating that BC maintenance is one of the key factors of a successful design. To maintain BC, brand feature encoding is crucial for designers to manage a brand’s essence and to produce consistent and competitive designs. However, codifying and modeling BC-related features is challenging and subjective because they are articulated primarily by humans. Because brand features are often subtle and difficult to quantify systematically, designers often have to go through a laborious process to master the brand features of a product.

Previously, research has shown the possibility of constructing shape grammars—a sequential and systematic shape description system for product design [6]—to capture product brands. While previous research showed that shape grammars can describe a variety of products’ brand features or semantic languages [7–14], the process of finding shape grammars was mostly achieved through human perception, which is time-consuming and hard to transfer. Despite the difficulty of automatic shape grammar induction, studies in this field showed the feasibility of constructing describable and quantifiable systems for brand consistency. During such a process, human designers learn the unique features shared among products in each brand. As this realization process resembles supervised learning, a natural question arises: can an AI agent learn brand consistency through a fully automated process and free humans from laborious shape-to-shape comparison? This paper, therefore, models the brand recognition process as a data-driven fine-grained classification task. By examining the trained neural network classifier, humans are expected to gain brand-related feature knowledge from the AI’s attention.

There are multiple challenges. The first is data scarcity. As most brands annually release only several products that share similar functionality and have distinct exteriors, a class would contain only on the order of tens of samples. Since deep learning models rely heavily on large numbers of samples to learn meaningful content [15], augmentation techniques that expand the dataset must be studied. Second, ensuring the interpretability of the constructed AI imposes great challenges. Since AlexNet [16], convolutional neural networks (CNNs) have seen a dramatic accuracy improvement in classifying pixel images. However, extracting key primitives in a parametric fashion from such CNNs remains difficult. While class activation mapping (CAM) [17] techniques attempt to interpret CNNs pictorially, the resulting heat maps are often fuzzy and lack precision. Since humans tend to learn, reason, and design based on curves and shapes, this type of architecture and workflow may not effectively capture intuitive brand features, nor can such models be expected to adequately quantify or edit these features. This paper proposes a Brand Identification Graph Neural Network (BIGNet), a curve-based AI that can explicitly capture and visualize brand-related geometry features, including intra-curve, inter-curve, and chunk-level features. Using the proposed approach, humans can utilize AI as a communicative and explainable style discovery agent and accelerate the design process.

This research’s main contributions are as follows:

  1. Two vectorized data acquisition approaches for style recognition: parametric modeling from small curve-based datasets, and vectorization from large pixel-based datasets.

  2. BIGNet, a novel hierarchical graph neural network (GNN) that can learn from both curve-level and chunk-level features of scalable vector graphics (SVGs).

  3. Evaluation studies that produce design insights and feature visualizations, showing BIGNet’s capability of perceiving explicit and explainable brand-related features.

2 Related Work

This research aims to accelerate product style design based on deep learning. This section first reviews how previous research attempted to construct systematic approaches to convey stylized ideation. Second, it reviews the progress and limitations of fine-grained image classification and curve-based deep learning methods.

Shape grammars. Shape grammars have been used as a computational tool for explicit feature representation and generation for over five decades. A shape grammar consists of a set of shape rules that sequentially eliminate, edit, or generate design primitives. The fundamental papers and visual examples of shape grammars are comprehensively presented by Stiny [6,18,19], and how shape grammars are used in a case study of this paper can be found in Fig. 3. Because of a shape grammar’s explicit expression of stylized concepts, it later became a feasible method for product designers to capture brand-related features of exterior design [5], which is a crucial subset of branding [20]. Agarwal and Cagan [7] brought shape grammars into industrial products, using them to describe coffee machines’ shape generation rules and find brands’ discriminative features. It was further shown that shape grammars could describe a variety of products’ brand features or semantic languages [8–14]. However, the process of defining shape grammars was mostly done by human perception, which is time-consuming and difficult to transfer from one product to another. This research takes a different approach, identifying differentiating geometric features in a data-driven manner to automate the feature perception task and thus accelerate the design cycle.

Fine-grained classification and attention visualization. Detecting style-oriented features for better object recognition accuracy has been a challenge. However, explicit detectors [21,22] and descriptor design [23–25] paved the way for deep CNNs [26] to reach high accuracy by learning complex and transformation-invariant features [16,27]. To generalize CNNs’ application to a variety of tasks, fine-tuning pretrained networks by training only one fully connected layer from scratch was studied and found to result in much faster convergence and better accuracy [28–30]. These advancements, however, could not guarantee that CNNs offer full interpretability of describable features. While prior work showed the possibility of visualizing localized regions for fine-grained classification [17,31–34], because CNNs learn from pixelated information, it remains unclear which shapes or curves are important, diminishing their usefulness for designers. Therefore, this work proposes training a deep learning model on a curve-based image representation to visualize human-readable features.

Curve-based recognition methods. As representing images with curves is much closer to how humans see an image, research on curve-based recognition models has focused on building AI that can learn descriptive features from human sketches for classification. In the field of sketch recognition, early studies [35–37] used support vector machines as classifiers to differentiate rasterized sketch images. To visualize stroke importance, Schneider and Tuytelaars [36] also adopted a leave-one-feature-out (LOFO) technique, removing one stroke at a time to see how it affects the classification score. More recent studies [38–40] took advantage of CNNs and achieved better performance and robustness on the recognition task. To recognize curve-based images with multiple abstraction levels of features, Yu et al. [38] proposed a multi-scale, multi-channel CNN architecture that learns from partial images segmented by stroke order. However, it still does not incorporate the grouping information of curve-based images, which is useful for learning more descriptive features [41]. To address this limitation, Li et al. [39] proposed sketch-R2CNN, which classifies sequentially rendered images paired with a recurrent neural network (RNN) attention mechanism, enabling better accuracy and feature visualization. Although sketch recognition has made significant progress, most studies focus on recognizing simple human sketches comprising only a few tens of strokes. These strokes are typically straight lines rather than curves, and the sketches are often low resolution, which limits the complexity the task can capture. Because industrial products like cars can easily contain thousands of curves, previous approaches are not applicable due to expensive rendering computation and vanishing-gradient problems in RNN-based architectures. Furthermore, industrial products typically consist of dozens to hundreds of groups of curves (chunks) that may contain higher-level brand-related features, and there has been little research on analyzing inter-chunk relationships at a scale of 10,000 or more. In summary, because CNNs are restricted to pixelated images, SVGs cannot be analyzed without first rasterizing them, which results in sparse images and loses grouping information. As previous methods struggle to recognize features from large curve-based images due to expensive computation and vanishing gradients, a novel deep learning architecture that can efficiently and concurrently process such data is developed in this paper.

Spatial graph neural network. GNNs refer to the domain of deep learning methods designed to deduce information from general non-Euclidean graphs. A graph G is defined as G(V, E), where V is the set of vertices or nodes, and E is the set of edges [42]. Among the branches, spatial-based convolutional GNNs (ConvGNNs) are most analogous to conventional CNNs, while allowing nodes to have an arbitrary number of neighbors and offering flexibility in connectivity strength as well as the aggregation process. Neural Network for Graphs [43] was the first work toward spatial ConvGNNs, performing aggregation by directly summing each node’s neighborhood information. After that, multiple architecture improvements were proposed around flexible aggregation [44–46] and sampling [47] strategies. Among them, Graph Attention Networks [46] adopted attention mechanisms to learn the relative energy (weights) between two connected nodes. By enabling the specification of different weights for different nodes, it achieved state-of-the-art prediction results on the Cora, CiteSeer, and PubMed benchmarks. From an engineering perspective [48], GNNs have shown their capability of tackling various problems, including physical modeling [49], chemical reaction prediction [50], traffic state prediction [51], and engineering drawings’ segmentation [52,53]. However, using a GNN as a classifier to perform style-related feature recognition has not been previously studied. Inspired by these recent successful applications, this research models each SVG as a two-tier graph and builds a spatial GNN with learnable chunk-level attention mechanisms to perform graph-level classification.

3 Methodology

To enhance designers’ ability to edit parametric curves in real time and evaluate the impact on brand consistency, the goal of this research is to build a curve-based AI-driven feature retrieval surrogate that is both explainable and describable. As brand consistency is shown to be important yet abstract for humans to easily identify, the case studies of the proposed methodology are applied to industrial products’ brand recognition. The research workflow is shown in Fig. 1.

Fig. 1
Workflow of this research, illustrating the four key stages of product, brand, and model selection, data acquisition, neural network’s message-passing design, and AI analysis. The logos of phones and cars are shown to represent the brands selected for identification, but the classification task is based on product shapes rather than logos. Two case studies demonstrate the framework’s adaptability to different product domains and scales on the design spectrum.

3.1 Data Representation and Acquisition.

This research focuses on the front view of product models because, just as human beings are most recognizable by their faces, designers tend to place the most recognizable features in products’ front views as well [54]. The SVG format is chosen to represent the products, as it composes an image from chunks of geometry defined explicitly by parametric control points. To maintain data homogeneity, all curves are converted to cubic Bezier curves: each image is represented by an arbitrary number of chunks of curves, and each curve is parameterized by four control points (eight scalar values).
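For concreteness, one possible in-memory form of this representation is sketched below; the class and field names are illustrative rather than taken from the paper’s code.

# Minimal illustration of the SVG representation described above
# (names are illustrative, not from the paper's implementation).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CubicBezier:
    # Four 2D control points -> eight scalar values per curve.
    p0: Tuple[float, float]
    p1: Tuple[float, float]
    p2: Tuple[float, float]
    p3: Tuple[float, float]

    def to_vector(self) -> List[float]:
        # Flatten to the 8-value curve feature used as a node feature.
        return [*self.p0, *self.p1, *self.p2, *self.p3]

@dataclass
class Chunk:
    curves: List[CubicBezier]   # a connected group of curves

@dataclass
class ProductSVG:
    chunks: List[Chunk]         # an arbitrary number of chunks per image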

Facing the scarcity of product data, this research first synthesizes intermediate designs as a data augmentation method. This is achieved by creating and interpolating unified design rules for each brand of products through human observation. Because of their simple geometry and planar design, mobile phones’ exterior features are chosen for this first case study. Second, this research generates SVGs by vectorizing a generic pixel dataset. As images of the same car are taken at slightly different camera angles, this perspective difference contributes part of the data augmentation. This is implemented with an image processing pipeline, including background removal, noise reduction, edge detection, and vectorization. A case study on car brand recognition is run for the second approach.

3.2 AI Architecture.

Analyzing features within SVG images offers a unique set of advantages and challenges in comparison to pixel-based counterparts. On one hand, SVG’s strengths lie in its significantly smaller file size, with vectorized images occupying 10 to 10,000 times less storage space, as observed in the two case studies within this paper, all while encapsulating concise and explicit design features of products’ exteriors. Furthermore, SVGs exhibit scalability resilience, meaning variations in height and width do not compromise the resolution of detail. As a result, this paper standardizes image heights to 1 to promote homogeneity. Notably, height information remains preserved within the bounding box parameters, serving as a valuable input to subsequent stages of the message-passing flow. Conversely, SVG images, with their potential for containing an arbitrary number of curves and chunks, pose formidable challenges for AI architecture. As highlighted in the related work, conventional deep learning models for image-related tasks often rely on CNNs. However, given the unique characteristics of SVG images, including the inability to resize them to a standardized number of features, such an approach becomes infeasible.

To learn the discrepancy among curve-based images in terms of brand-related styles, this paper proposes a Brand Identification Graph Neural Network (BIGNet), a two-tier spatial GNN that can learn from the SVG format dataset (Fig. 2). In the first tier, a chunk of curves is represented as a graph, where each node is a curve and connectivity is determined by its neighboring curves. More precisely, the model first samples and aggregates the neighborhood of each node, feeds each node into fully connected layers (FCs), and then reads out the response by average pooling. In the second tier, an SVG picture is represented as a graph, where each node is a chunk of curves and connectivity strength is determined via weights learned from the bounding box parameters. After aggregation, the hidden layers are concatenated and passed to the last fully connected layer to get the prediction. BIGNet’s forward propagation is summarized in Algorithm 1. The parameters used in the two case studies differ slightly to adjust for the images’ complexity level and are listed in Table 1.

Fig. 2
Schematic diagram of the two-tier BIGNet structure
Table 1

BIGNet’s parameters for the two case studies

Case | Phones | Cars
Activation | LeakyReLU | LeakyReLU
Optimizer | Adam | Adam
Pooling | Average pooling | Average pooling
Loss | Binary cross entropy | Categorical cross entropy
E2 | Loop graph (each node has 2 neighbors) | Loop graph (each node has 2 neighbors)
D2 | Bidirectional, depth: 2 | Bidirectional, depth: 2
A2 | e_v2^d2 = (1 − W2) e_v2^(d2−1) + W2 × E2 e_v2^(d2−1) | e_v2^d2 = (1 − W2) e_v2^(d2−1) + W2 × Linear(8 → 8)(E2 e_v2^(d2−1))
W2 | 1 | 0.5
f4 | Linear(32 → 24 → 12) | Linear(24 → 32 → 24)
f3 | Pass | Linear(24 → 24 → 24)
E1 | Fully connected | Fully connected
D1 | 2 | 2
W1 | N × 5 | N² × 5
f* | Linear(5 → 12) | Linear(5 → 24)
A1 | E_v1^d1 = f*(W1) × E_v1^(d1−1) | E_v1^d1 = f*(W1) × E_v1^(d1−1)
f2 | Pass | Linear(72 → 24 → 24 → 24)
f1 | Linear(36 → 18 → 8 → 2) | Linear(24 → 18 → 12 → brand#)
Learnable parameters | 2000–2076 | 6716–6812

BIGNet forward propagation algorithm

Algorithm 1

Input: SVG graph G(V1, E1);
   graph-level FCs f1;
   aggregated chunk-level FCs f2;
   primitive chunk-level FCs f3;
   curve-level FCs f4;
   chunk attention matrix FC f*;
   chunk-level depth D1;
   curve-level depth D2;
   chunk attention matrix W1;
   curve diffusion weight W2;
   chunk aggregating function A1;
   curve aggregating function A2;
   chunk graphs {G(V2(v1), E2(v1)), ∀ v1 ∈ V1};
   curve features {{x_v2, ∀ v2 ∈ V2(v1)}, ∀ v1 ∈ V1}
Output: Vector representation Z_G

for v1 ∈ V1 do
   for v2 ∈ V2(v1) do
      h_v2^0 ← x_v2
      e_v2^0 ← x_v2
      for d2 = 1 ... D2 do
         e_v2^d2 ← A2(e_v2^(d2−1), E2(v2), W2)
         h_v2^d2 ← concat(h_v2^(d2−1), e_v2^d2)
      end
   end
   H_v1* ← pool({f4(h_v2^D2), ∀ v2 ∈ V2(v1)})
   H_v1^0 ← f3(H_v1*)
   E_v1^0 ← H_v1^0
   for d1 = 1 ... D1 do
      E_v1^d1 ← A1(E_v1^(d1−1), E1, f*(W1))
      H_v1^d1 ← concat(H_v1^(d1−1), E_v1^d1)
   end
end
H_G* ← pool({f2(H_v1^D1), ∀ v1 ∈ V1})
Z_G ← f1(H_G*)
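For readers who prefer framework code, below is a compact PyTorch sketch of the two-tier message passing summarized in Algorithm 1. It is a minimal illustration, not the paper’s released implementation: the layer widths, the loop-graph neighbor diffusion, and the scalar softmax attention over pairwise bounding-box features are simplifications of the settings listed in Table 1.

# Simplified two-tier BIGNet forward pass (illustrative sketch only).
import torch
import torch.nn as nn

class BIGNetSketch(nn.Module):
    def __init__(self, curve_dim=8, hidden=24, n_classes=6, d1=2, d2=2):
        super().__init__()
        self.d1, self.d2 = d1, d2
        self.f4 = nn.Sequential(nn.Linear(curve_dim * (d2 + 1), hidden), nn.LeakyReLU())
        self.f_star = nn.Linear(5, 1)      # pairwise bounding-box features -> attention weight
        self.f2 = nn.Sequential(nn.Linear(hidden * (d1 + 1), hidden), nn.LeakyReLU())
        self.f1 = nn.Linear(hidden, n_classes)

    def forward(self, chunks, bbox_pairs, w2=0.5):
        # chunks: list of (num_curves, 8) tensors, curves ordered along each loop graph
        # bbox_pairs: (N, N, 5) pairwise bounding-box features for the N chunks
        chunk_embs = []
        for x in chunks:                                    # curve-level tier
            e, h = x, [x]
            for _ in range(self.d2):
                nbr = (torch.roll(e, 1, dims=0) + torch.roll(e, -1, dims=0)) / 2
                e = (1 - w2) * e + w2 * nbr                 # diffusion over the loop graph
                h.append(e)
            chunk_embs.append(self.f4(torch.cat(h, dim=1)).mean(dim=0))   # average pooling
        H = torch.stack(chunk_embs)                         # (N, hidden)
        A = torch.softmax(self.f_star(bbox_pairs).squeeze(-1), dim=1)     # (N, N) chunk attention
        E, outs = H, [H]
        for _ in range(self.d1):                            # chunk-level tier
            E = A @ E
            outs.append(E)
        z = self.f2(torch.cat(outs, dim=1)).mean(dim=0)     # average pooling over chunks
        return self.f1(z)                                   # class logits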

3.3 AI Evaluation Overview.

After successfully training the network, a series of evaluation criteria is studied to deduce explainable and quantifiable results, as depicted in the fourth stage of Fig. 1.

The evaluation of the synthetic phone dataset aims to determine whether BIGNet can accurately perceive brand-related features, and whether these details can be transferred through potential human–AI interactions. As the dataset is synthesized using shape rules created by humans through observation of a reference source dataset, the accuracy on the reference source dataset is first examined to verify the validity of the manually constructed shape rules. After that, dimensionality reduction is performed on the network’s latent vectors to see whether the brands are well separated. To further investigate the network’s attention, an ablation study using LOFO is performed at both the curve and chunk levels. This test removes one chunk or one curve from the picture at a time; curves whose removal causes a drop in prediction performance are considered important features. Finally, based on the localized features observed in the ablation study, parameter extrapolation (partial dependence plots [55]) is implemented on the original shape rules to visualize the confidence change. This experiment serves three purposes. First, such validation demonstrates BIGNet’s capability of recognizing the brand identity of unseen data. Second, this step mimics a very likely design process in which humans move geometric features within the domain, wondering how the modifications will change the brand identity. Lastly, this step further checks the importance of highlighted curves to the discrimination task, to understand whether the products’ brand features are successfully extracted by the AI.
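A minimal sketch of the LOFO ablation loop described above is given below; predict_confidence is a hypothetical helper that returns the trained classifier’s confidence for the true brand given a list of chunks (each a list of curve parameter vectors).

# Leave-one-feature-out (LOFO) ablation sketch (helper names are illustrative).
from typing import Callable, List, Tuple

def lofo_importance(chunks: List[List[list]],
                    predict_confidence: Callable[[List[List[list]]], float]
                    ) -> Tuple[List[float], List[List[float]]]:
    base = predict_confidence(chunks)
    # Chunk-level ablation: remove one whole chunk at a time.
    chunk_drop = [base - predict_confidence(chunks[:i] + chunks[i + 1:])
                  for i in range(len(chunks))]
    # Curve-level ablation: remove one curve at a time within each chunk.
    curve_drop = []
    for i, chunk in enumerate(chunks):
        drops = []
        for j in range(len(chunk)):
            ablated = chunks[:i] + [chunk[:j] + chunk[j + 1:]] + chunks[i + 1:]
            drops.append(base - predict_confidence(ablated))
        curve_drop.append(drops)
    # Large positive drops mark chunks/curves the model relies on.
    return chunk_drop, curve_drop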

The evaluation of BIGNet on the vectorized car dataset assesses its ability to recognize brand-related styles in complex, automatically generated real data. This includes testing the model’s robustness and consistency across different tasks and scenarios. First, confusion matrices and dimension-reduced latent vector plots are computed to learn how distinguishable the brands are. Second, the AI’s chunk-level attention is visualized using a CAM [17]–inspired algorithm. By highlighting the chunks that contribute most to correct identification, this CAM-based approach is shown to be much more robust than LOFO on SVGs with far more chunks. Finally, to compare the curve-based approach to the pixel-based approach, a CNN is fine-tuned using a ResNet-50 pretrained with simCLR [29]. The attention of this CNN is then visualized using Grad-CAM [34]. As the CNN is expected to reach better accuracy due to its much larger model size, this study focuses on examining whether BIGNet conveys more explicit and describable design features than the CNN’s class attention visualization.

4 Case Study—Phones

4.1 Brand and Model Selection.

This study compares and differentiates the front views of the products of the two most popular cellphone brands—Apple and Samsung [56,57]. Among the many lines of Samsung phones, the Samsung Galaxy S series has the most similar functionality and price range to Apple’s iPhone and, therefore, is chosen as the iPhone’s competitor. To preserve a reasonable degree of homogeneity, all the phone models chosen are without home buttons, which are also the more contemporary designs.

4.2 Parametric Modeling From Small Curve Datasets

4.2.1 Synthetic Dataset Generation.

Challenges exist in finding an abundant and well-measured dataset. After realizing the need to increase the sample size of an existing dataset collected from Dimensions.com (shown in Table 2), this study observes the patterns of the selected models and, by using parameter interpolation on the manually established shape rules (the number and types of parameters are listed in Table 3, and the shape grammar for Apple is shown in Figs. 3(a) and 3(b)), successfully creates a synthetic SVG dataset of 20,000 synthetic phones, 10,000 each for Apple and Samsung (some results are shown in Fig. 3(c)). More details about the synthetic dataset generation process can be found on the BIGNet-phone GitHub page linked in the “Data Availability Statement” section.

Table 2

The phone models of an existing dataset

Apple iPhone | Submodel names | Total number of submodels: 15
X | XR, X, XS, XS Max |
11 | 11, 11 Pro, 11 Pro Max |
12 | 12 mini, 12, 12 Pro, 12 Pro Max |
13 | 13 mini, 13, 13 Pro, 13 Pro Max |
Samsung Galaxy S | Submodel names | Total number of submodels: 7
S10 | S10e, S10, S10+, S10 5G |
S20 | S20, S20+, S20 Ultra |
Table 3

Number and types of parameters used in shape rules to make the synthetic SVG dataset

Shape rules parameters | Continuous (ex: height, width, fillet) | Discrete (ex: lens position) | Regulation (ex: height–width ratio)
Apple | 28 | 5 | 6
Samsung | 25 | 11 | 2
Fig. 3
(a) Shape rules used to generate Apple phones. (b) By sequentially applying rule 1 to rule 7 with the Apple parameters in Table 3, a synthetic Apple phone is made. (c) Synthetic examples of Apple (A-1 to A-3) and Samsung (B-1 to B-3). For Apple, there are two distinct speaker positions: at the middle (A-1, A-2) or at the top (A-3) of the notch, and the number of circles representing lenses and sensors ranges from zero to four. For Samsung, there are three distinct lens layouts: one at the middle (B-1), one at the upper right corner (B-2), and two at the upper right corner (B-3).

4.2.2 Synthetic Image Preprocessing.

The two brands are synthesized with completely different sets of rules that may contain brand-irrelevant information, for example, the number of curves and chunks. Therefore, after generating the synthetic SVG dataset, the next important step is to adapt it to a more homogeneous format, so that the AI learns the difference between the brands’ shapes instead of the intrinsic difference between the brands’ shape rules. This study therefore rasterizes the images and vectorizes each into a cubic Bezier SVG using Potrace [58]. All the phones’ heights are then normalized to 1, since the synthetic dataset has relatively larger Samsung phones than Apple phones; this enables the AI to learn meaningful design languages. Note that, as explained in Sec. 3.2, bounding box details, including the phone’s height, are extracted prior to the normalization, so the height information is not disregarded in BIGNet’s message passing.
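The normalization step can be sketched as follows, assuming each chunk is an array of control-point coordinates; the helper is illustrative and the SVG parsing itself is omitted.

# Record each chunk's bounding box, then scale all control points so that
# the overall image height becomes 1 (illustrative helper).
import numpy as np

def normalize_svg(chunks):
    # chunks: list of (num_curves, 8) arrays of control points (x0, y0, ..., x3, y3)
    pts = np.concatenate([c.reshape(-1, 2) for c in chunks], axis=0)
    height = pts[:, 1].max() - pts[:, 1].min()
    bboxes = []
    for c in chunks:
        p = c.reshape(-1, 2)
        x0, y0 = p.min(axis=0)
        x1, y1 = p.max(axis=0)
        # Keep pre-normalization bounding box info (center, width, height, area)
        # as input to the chunk-level aggregation.
        bboxes.append(((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0,
                       (x1 - x0) * (y1 - y0)))
    return [c / height for c in chunks], bboxes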

4.3 Results and Discussion of Phone Case Study

4.3.1 Model’s Training Process and Performance.

Using BIGNet’s architecture and parameters from Table 1, column 1, after 105 epochs the model reaches over 99.8% for both training and testing accuracy (Fig. 4) and predicts the reference sources’ brands with 100% accuracy (Table 4). The high reference accuracy demonstrates that the parametric modeling has successfully captured the source data. The small difference between train and test accuracy can be attributed to the fact that both sets are generated using identical interpolated shape rules, and BIGNet’s very high accuracy is likely due to the same reason: the two classes exhibit significant brand consistency. While these findings indicate the possibility of overfitting, the additional evaluation metrics examined in the following sections demonstrate that BIGNet does not compromise interpretability. Table 5 shows the confusion matrices for the three datasets.

Fig. 4
Accuracy converges to 98% within 20 epochs
Table 4

Trained model’s loss and accuracy on the 3 datasets

Dataset | Number of samples | Loss | Accuracy
Train | 18,000 | 0.0053 | 99.83%
Test | 2000 | 0.0038 | 99.85%
Reference | 28 | 0.0046 | 100%
Table 5

Confusion matrices (not normalized)

Train set confusion matrix (truth × prediction)
Truth | Apple | Samsung
Apple | 9000 | 0
Samsung | 30 | 8970

Test set confusion matrix (truth × prediction)
Truth | Apple | Samsung
Apple | 999 | 1
Samsung | 2 | 998

Reference set confusion matrix (truth × prediction)
Truth | Apple | Samsung
Apple | 15 | 0
Samsung | 0 | 13

4.3.2 Dimensional Reduction.

By visualizing the last hidden layer with t-distributed Stochastic Neighbor Embedding (t-SNE) and principal component analysis (PCA) in Fig. 5, Apple phones (left) and Samsung phones (right) are clustered and well separated from each other, demonstrating that the network can very clearly discriminate the two phone brands.

Fig. 5
t-SNE and PCA plots from test set’s latent vectors
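The dimensionality-reduction check itself is a standard procedure; a minimal sketch using scikit-learn, with illustrative argument names, is shown below.

# Project BIGNet's last hidden layer to 2D with t-SNE and PCA, colored by brand.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def plot_latents(latents: np.ndarray, labels: np.ndarray):
    # latents: (num_samples, latent_dim) activations; labels: integer brand labels
    for name, reducer in [("t-SNE", TSNE(n_components=2)), ("PCA", PCA(n_components=2))]:
        z = reducer.fit_transform(latents)
        plt.figure()
        plt.scatter(z[:, 0], z[:, 1], c=labels, s=5)
        plt.title(name)
    plt.show()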

4.3.3 LOFO Visualization Study.

To visualize localized features, important chunks and curves are distinctively colored for each picture (some results are shown in Fig. 6). Discriminative features of the two brands are then observed and summarized in Table 6. Since both brands highlight the lens, fillet, width, and the gaps between the screen and the frame, the study then examines partial dependence plots created by parameter extrapolation on these features in the following section to further understand the network’s attention.

Fig. 6
LOFO results from Apple (a-1 to a-7) and Samsung (b-1 to b-7)
Table 6

A summary of the model’s attention by observing the LOFO visualization results

Brand | Index | Observed features | Figure
Apple | i-1 | Lens | a-2, a-3, a-4, a-5
Apple | i-2 | Corner’s fillet | a-1, a-3, a-5, a-7
Apple | i-3 | Width | a-1, a-2, a-3, a-4, a-6, a-7
Apple | i-4 | Screen–frame gap | a-1, a-2, a-3, a-4, a-5, a-6, a-7
Apple | i-5 | Speaker at the middle | a-1, a-2, a-4, a-6
Apple | i-6 | Notch related features | a-1, a-2, a-4, a-5, a-6, a-7
Apple | i-7 | Mute button | a-1, a-2, a-4, a-5, a-6
Samsung | s-1 | Lens at the right corner | b-2, b-3, b-5
Samsung | s-2 | Corner’s fillet | b-1, b-2, b-3, b-5, b-7
Samsung | s-3 | Width | b-1, b-2, b-3, b-4, b-5, b-6, b-7
Samsung | s-4 | Screen–frame gap | b-1, b-2, b-4, b-6, b-7

Note: i-1 to i-4 are similar parameters to s-1 to s-4; therefore, this study continues with parameter extrapolation experiments on them.

4.3.4 Partial Dependence Plot

  • (1) Lens horizontal position

The goal of this experiment is to see whether prediction confidence drops while extrapolating Apple’s lens horizontal position. Since the trained BIGNet is robust because it looks at multiple features, the result in Fig. 7 is plotted with features i-2, i-4, and i-6 all shifted to Samsung’s dimension range. The confidence curve drops when the lens is at the middle of, or to the right of, the notch, because those are both possible Samsung lens locations (see b-1 and b-2 in Fig. 6).

  • (2) Fillet radius and height–width ratio

Fig. 7
Confidence change while extrapolating Apple’s lens horizontal position

Results similar to (1) are found while extrapolating Apple’s width. Interestingly, although Samsung has a smaller normalized width, the model considers wider phones to be Samsung. This is because the model looks at the length of the frame’s straight segment instead of the whole width, which also takes the fillet radius into consideration (Table 7). In Fig. 8, this explanation is verified, since the crossover width lies between the two brands’ normalized segment length ranges.

  • (3) Screen–frame gaps

Fig. 8
Confidence change while extrapolating Apple phone’s width: the left and right regions represent the range of Apple and Samsung’s normalized segment length
Table 7

Although Samsung has a shorter normalized width, it also has a relatively smaller fillet radius; therefore, its segment length that the model perceives is longer (last column).

Brand | Width (mm) | Fillet (mm) | Height (mm) | Normalized width | Normalized segment length
Apple | 71.15 | 10.75 | 146.15 | 0.49 | 0.34
Samsung | 73.10 | 7.42 | 154.55 | 0.47 | 0.38

Since both brands also highlight the gaps among the screen, the frame’s inner width (plane), and the outer width (edge) in Sec. 4.3.3, a 2D extrapolation experiment on the two gap parameters is performed on Samsung’s shape rules; a sketch of such a parameter sweep is given at the end of this section. In Fig. 9, the results show that phones with smaller gaps between the screen and the frame are more likely to be predicted as Samsung, which matches the interpolation ranges of the two brands.

Fig. 9
Since Samsung has shorter screen-to-plane and plane-to-edge distances, the heatmap shows greater prediction confidence at the lower left corner, meaning this is also a discriminative feature for the model.
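The parameter-extrapolation experiments above all follow the same recipe, which can be sketched as follows; generate_phone and predict_confidence are hypothetical stand-ins for the shape-rule generator and the trained BIGNet.

# Partial-dependence sweep over one shape-rule parameter (illustrative sketch).
import numpy as np
import matplotlib.pyplot as plt

def partial_dependence(base_params: dict, name: str, values,
                       generate_phone, predict_confidence):
    confidences = []
    for v in values:
        params = {**base_params, name: v}   # extrapolate a single shape-rule parameter
        svg = generate_phone(params)        # regenerate the synthetic SVG
        confidences.append(predict_confidence(svg))
    plt.plot(values, confidences)
    plt.xlabel(name)
    plt.ylabel("brand confidence")
    plt.show()
    return np.array(confidences)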

5 Case Study—Cars

5.1 Background.

While the phone case study has shown the viability of a GNN learning from curve-based representations, laborious observation and parameterization were required to synthesize the augmented dataset. This is feasible for two phone brands, but unified, interpolatable parametric expressions would have to be established for every studied model in every brand, so generalizing without human attention is challenging. In addition, for products like cars, with more complex shapes and a greater variety of models, constructing unified shape grammars would be difficult, yet pixel datasets with thousands of images per brand already exist. If SVGs can be acquired from such resources, BIGNet can be applied to learn from these large datasets with complex product geometries, and the workflow of extracting brand-related features can be even further automated.

This case study, therefore, aims to explore the feasibility of converting pixel images to curve images to create a data-driven, hands-free recognition system. Since cars have distinctive functionality and design criteria that differ significantly from those of phones, this study not only demonstrates the potential of fully automated SVG retrieval but also attempts to showcase the adaptability and generalizability of BIGNet across different product domains and design scales. Therefore, the following distinct yet comparable training scenarios are run to examine the model’s flexibility, robustness, consistency, and explainability:

Classifying different numbers of brands. All vectorized images in this case study are generated from the same automated pipeline without the need for parametric modeling. As a result, expanding the number of models and brands for a more comprehensive style classification is possible with little human effort. Yet, one common criticism of deep learning methods is their lack of reproducibility, caused by redundant parameter freedom that can lead to suboptimal convergence. However, a style perception agent is expected to consistently exhibit the same features regardless of the training scheme or data processing nuances, to enable designers to reason and make decisions from its inference. To investigate the generalizability of BIGNet and showcase the ease of dataset regeneration, this study conducts both six- and ten-brand classifications. Although a decrease in overall confidence when moving from six to ten brands is expected, it is also anticipated that BIGNet will still exhibit similar patterns in terms of which brands are easily identifiable and which are not.

Logo removal. While logos aid in brand recognition, the geometry of logos is not necessarily the brand-related features design engineers are attempting to extract. Therefore, identifying logos as part of the learned brand features is not the primary objective, as this may lead to overfitting of the logo and hinder the attention given to other important design features. To assess the effect of logos on brand classification, separate models are trained on cars with logos and without logos.

Comparison to a CNN trained on pixel images. BIGNet trained on SVGs is claimed to offer more explainability, but CNNs trained on pixel images are widely used and offer high identification accuracy. To compare the two approaches, this study fine-tunes a simCLR-pretrained ResNet-50 using the exact train–test split of pixel images before vectorization. Since ResNet-50 has 23 million learnable parameters and pixel images contain richer information, the CNN is expected to have better accuracy than BIGNet. Despite that, as the goal of this research is to extract explicit and usable attention features, this comparison focuses on feature visualization maps and validates whether BIGNet provides more explicit and parametric results.

5.2 Data Selection and Preprocessing.

HK Comp cars [59], the largest brand- and orientation-labeled car dataset, is chosen as the source of pixel images. To ensure a fair representation of the market, this research selects the top six and top ten most abundant car brands, including both luxury and economy brands with international distribution in Asia, Europe, and America. While the resulting selection is expected to be unbiased and diverse, the total number of images per brand is relatively small, only a few hundred. Therefore, to provide enough training data for BIGNet, a 9:1 train–test split is used. Nonetheless, the dataset has an imbalanced number of images among brands (Fig. 10). To address this issue, two steps are taken to ensure that no falsely high accuracy occurs due to bias toward brands with more data: first, a stratified split preserves the same class ratio in both sets; second, the algorithm randomly oversamples minority classes during training so that each brand has equal representation. More statistics on the data distribution, including year and car segment, are provided in Table 8 to confirm the variance within each brand.

Fig. 10
Data distribution among brands shows a nonnegligible imbalance. The largest class Volkswagen (VW) has 73.6% more data than the smallest class KIA. Additionally, with only several hundreds of samples per brand, this study performs a 9:1 stratified train–test split.
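The split-and-balance strategy just described might be implemented as sketched below, assuming integer brand labels; the function name and arguments are illustrative.

# 9:1 stratified split followed by random oversampling of minority brands.
import numpy as np
from sklearn.model_selection import train_test_split

def split_and_balance(paths, brands, seed=0):
    # brands must be non-negative integer labels for np.bincount to apply.
    tr_x, te_x, tr_y, te_y = train_test_split(
        paths, brands, test_size=0.1, stratify=brands, random_state=seed)
    tr_x, tr_y = np.asarray(tr_x), np.asarray(tr_y)
    rng = np.random.default_rng(seed)
    target = max(np.bincount(tr_y))            # size of the largest brand in the train set
    bal_idx = np.concatenate([
        rng.choice(np.flatnonzero(tr_y == b), size=target, replace=True)
        for b in np.unique(tr_y)])
    rng.shuffle(bal_idx)
    return tr_x[bal_idx], tr_y[bal_idx], np.asarray(te_x), np.asarray(te_y)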
Table 8

Data distribution among segments and years

Brand vs. segmentToyotaVWBenzAudiBMWHyundaiNissanFordKIAChevyTotalRatio
Sedan114190119127231171727615394134738.4%
SUV1082315159117875724353269319.7%
Hatchback7480272928405088637455315.8%
MPV34555000109356502587.4%
Others211661101179706038133665818.8%
Labeled %59.0%71.7%69.7%61.3%77.5%63.4%56.6%57.0%62.8%54.8%64.3%
Brand vs. yearToyotaVWBenzAudiBMWHyundaiNissanFordKIAChevyTotalRatio
<20082221868292881101412.6%
200862764312222361571763.3%
2009537234365061174025264147.8%
201058648110261375116317057110.7%
2011611057533108785447676068812.9%
2012111122807463446183644875014.1%
201311011211612912381911515686105519.8%
2014127121155107129788731111100104619.6%
2015397296463151214233324638.7%
>20154125651302290.5%
Labeled %99.3%100.0%99.5%100.0%100.0%100.0%98.9%99.8%100.0%100.0%99.8%

Note: Among all brands, the major segments are sedan, SUV, and hatchback, while most images come from 2009–2015. This demonstrates not only the balanced segment and year ratios within each brand, but also confirms that the brands competed concurrently during 2009–2015.

To achieve successful vectorization, an image preprocessing pipeline, depicted in the “Pixel image to SVG conversion” section of Fig. 1, is proposed. First, background noise is removed by applying detectron2 [60], a Mask R-CNN-based model, to detect and apply a mask on the original image. Second, the Google Cloud Vision API is used to detect and remove the logo for a comparison dataset. Edge detection is then implemented before curve fitting. By comparing state-of-the-art edge detection methods, the transformer-based EDTER [61] is found to perform best at preserving object curves and eliminating reflections on the cars’ glossy surfaces. After applying edge thinning to EDTER’s response, Potrace [58] is used to vectorize the edges into SVG. To maintain a reasonable degree of homogeneity, all vectorized results are converted to cubic Bezier curves. Finally, each SVG has its height normalized to 1, and the bounding box information of center coordinates, width, height, and area is pre-computed to enhance chunk-level aggregation.
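A hedged, high-level sketch of this pixel-to-SVG pipeline is given below. The first three helpers are identity placeholders standing in for detectron2 background masking, Google Cloud Vision logo removal, and EDTER edge detection plus thinning; only the Potrace call is concrete, and its flags should be verified against the installed version.

# High-level orchestration of the pixel-to-SVG conversion (illustrative sketch).
import subprocess
from pathlib import Path

def remove_background(img: Path) -> Path:
    # Placeholder: in the paper this step applies a detectron2 Mask R-CNN mask.
    return img

def remove_logo(img: Path) -> Path:
    # Placeholder: in the paper this uses the Google Cloud Vision logo detector
    # (only for the logo-removed comparison dataset).
    return img

def detect_edges(img: Path) -> Path:
    # Placeholder: in the paper this runs EDTER followed by edge thinning and
    # writes a binary bitmap (e.g., .pbm) that Potrace accepts.
    return img

def pixel_to_svg(img_path: Path, out_svg: Path) -> Path:
    masked = remove_background(img_path)
    no_logo = remove_logo(masked)
    edges = detect_edges(no_logo)
    # Vectorize the thinned edge bitmap into an SVG of Bezier curves.
    subprocess.run(["potrace", "--svg", str(edges), "-o", str(out_svg)], check=True)
    return out_svg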

5.3 AI Modification for Increased Data Complexity.

Since BIGNet is trained on the same type of image format, the message-passing flow in this case study shares many common blocks with the phone study’s architecture. However, cars’ exterior shapes have many more components than phones, and vectorization from generic images also unavoidably leads to redundant curves. These two factors result in around 20 times more chunks and 10 times more curves than phones. To cope with the data’s increased complexity, the BIGNet architecture is modified from the phone case study’s parameters (Table 1, column 2); the changes can be summarized in three aspects:

  1. Increase the size of the GNN. First, the curve-level aggregation policy is made more flexible by adding one FC. Second, chunk-level FCs are added both before and after the chunk-level aggregation for better feature digestion. Lastly, the widths of most FCs are doubled to allow the GNN’s bandwidth to carry more features per node. This is a reasonable modification because the geometry of car parts is much more complicated than that of a phone, with many more organic shapes. Overall, the number of learnable parameters increases from 2000 to 6716.

  2. Better use of bounding box information. One of BIGNet’s key components is the chunk-level connectivity strength learned from bounding box attention. In the phone study, as all data share the same largest chunk, the phone’s outer frame, the connectivity strength matrix is derived from the bounding box relationship normalized by the maximum bounding box. Such homogeneity does not exist in vectorized SVGs of cars. Furthermore, the number of inter-chunk relationships, which scales with the square of the number of chunks, becomes roughly 100 times larger than for synthetic phones and has a much larger variance. All these factors impose great challenges on the previous parameters. To tackle the increased complexity, the ratio values in the correlation matrix, namely area, width, and height, are first normalized to between 0 and 1 by taking logarithmic values. After that, an FC is applied to adapt the shape to the desired features. The pseudo-code is shown in Algorithm 2.

  3. Augmentation. First, horizontal flipping is applied because, although cars’ front views are symmetric, photographs in the dataset are rarely taken from the exact center. Second, another augmentation is applied by running two distinct EDTER models to retrieve slightly different edge detection responses. The two techniques combined enlarge the dataset four times. Lastly, in this dataset, multiple images are often taken of the exact same make and model from slightly different perspectives of the front view. As humans can identify car parts from slightly off perspectives, this research also treats this as a natural perspective augmentation. It is expected to prevent the GNN from overfitting and, therefore, yield a more robust model.

Chunk-level aggregation in the car case study

Algorithm 2

Input: graph with N nodes G(V = {v_1 ... v_N}, E);
   node features E_N×m = {e_1×m(v_1) ... e_1×m(v_N)};
   nodes’ bounding box features (horizontal location, vertical location, width, height, area):
      β_1×5(v) = {x(v), y(v), w(v), h(v), a(v)}, ∀ v ∈ V;
   linear layer f* = f*(5 → m)
Output: Node features after one aggregation:
   E′_N×m = {e′_1×m(v_1) ... e′_1×m(v_N)}

Init B_primitive = 0_(N×N×5)
Init B_adapted = 0_(N×N×m)
Init E′_N×m = {e′_1×m(v_1) ... e′_1×m(v_N)} = 0_(N×m)
Init Ē = {ē_N×m(v_1) ... ē_N×m(v_N)} = 0_(N×N×m)
for i in 1 ... N do
   for j in 1 ... N do
      B_primitive[i, j, 0] ← x(v_i) − x(v_j)
      B_primitive[i, j, 1] ← y(v_i) − y(v_j)
      B_primitive[i, j, 2] ← log(w(v_i) / w(v_j))
      B_primitive[i, j, 3] ← log(h(v_i) / h(v_j))
      B_primitive[i, j, 4] ← log(a(v_i) / a(v_j))
      B_adapted[i, j, :] ← f*(B_primitive[i, j, :])
   end
   ē_N×m(v_i) ← stack(e_1×m(v_i)), N times
   ē_N×m(v_i) ← B_adapted[i, :, :] ⊙ ē_N×m(v_i)
   e′_1×m(v_i) ← pool(ē_N×m(v_i), axis = 0)
end
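A vectorized NumPy sketch of Algorithm 2 follows, with the learnable layer f* represented by a plain weight matrix and bias; names and shapes are illustrative.

# Vectorized form of the chunk-level aggregation in Algorithm 2 (illustrative).
import numpy as np

def chunk_aggregate(E, bbox, W_star, b_star):
    # E: (N, m) chunk features; bbox: (N, 5) = (x, y, w, h, area) per chunk
    # W_star: (5, m), b_star: (m,) stand in for the learnable layer f*
    x, y, w, h, a = (bbox[:, i] for i in range(5))
    B = np.stack([
        x[:, None] - x[None, :],
        y[:, None] - y[None, :],
        np.log(w[:, None] / w[None, :]),
        np.log(h[:, None] / h[None, :]),
        np.log(a[:, None] / a[None, :]),
    ], axis=-1)                                  # (N, N, 5) pairwise bounding-box features
    B_adapted = B @ W_star + b_star              # (N, N, m) adapted attention tensor
    # Element-wise weighting of each node's own feature stacked N times,
    # then mean pooling over the N copies, as in Algorithm 2.
    E_tiled = np.repeat(E[:, None, :], E.shape[0], axis=1)   # (N, N, m)
    return (B_adapted * E_tiled).mean(axis=1)    # (N, m) aggregated chunk features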

5.4 Results and Discussion of Car Case Study.

This section first focuses on the results of BIGNet’s six-brand classification without logo removal, and then provides an extensive comparison with the other training scenarios.

5.4.1 Training.

During training, the batch size is chosen to be 100, and the learning rate is initialized to 0.001. After ∼27,000 iterations, the learning rate is decreased to 0.0001 for fine-tuning, since both train and test accuracy have stagnated (Fig. 11). As the model starts overfitting toward the end of training, the checkpoint with the maximum test accuracy, at the 723rd epoch, is selected for evaluation. It reaches 89.3% training accuracy and 80.6% test accuracy. The testing results demonstrate BIGNet’s capability of recognizing real and unseen data.

Fig. 11
Accuracy and loss during the training process

5.4.2 Performance Evaluation.

First, confusion matrices are examined. In Fig. 12, both train and test sets show a consistent trend of predicting Audi, BMW, and Benz most correctly. The results also show that Toyota and Hyundai, while having the lowest accuracy among the six, also have a relatively higher chance of being confused with each other. Cohen’s kappa coefficient matrices are then calculated to validate this finding, as lower values between 0 and 1 indicate more inconsistency between ground truth and model prediction. Dimensionality reduction of the last hidden layer using 2D t-SNE also shows that Audi, BMW, and Benz form three distinct clusters in latent space, while Hyundai and Toyota are much more entangled with each other. In other words, Hyundai and Toyota have higher SP than the other brands, making them harder for BIGNet to differentiate. These findings align with the conclusions of Liu et al. [5]: although they adopted a different approach to evaluate brand consistency, they also found that luxury brands yield higher BC and that SP has a stronger effect on economy cars than BC.

Fig. 12
BIGNet evaluation on train set (column 1) and test set (column 2) in terms of confusion matrix (row 1), Cohen’s kappa coefficient matrix (row 2), and 2D t-SNE plot of data’s latent vectors (row 3). Both train and test sets show Benz, Audi, and BMW having better recognition rate and that Toyota and Hyundai get confused with each other more often.

5.4.3 Feature Analysis.

As mentioned in Sec. 5.3, the increased data complexity also negatively impacts attempts to visualize BIGNet’s attention using LOFO. LOFO requires running each image as many times as the number of curves and chunks, leading to huge computational costs. Further, the confidence score changes from LOFO often do not reflect consistent, brand-related car features either: the ablation of individual shapes is either too subtle, due to the robustness of larger graphs, or too biased toward chunks with large bounding boxes, so that the contour is always highlighted. This case study, therefore, implements a CAM [17]-inspired algorithm that efficiently runs only one inference per image. The algorithm looks at the contribution of each chunk’s latent vector and visualizes the chunks that contribute the most to a correct prediction.
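Under the assumption that the graph embedding is an average pool of per-chunk vectors followed by the final linear layer (as in Algorithm 1), one possible sketch of this chunk scoring is shown below; variable names are illustrative.

# CAM-inspired chunk scoring: one forward pass, then read off each chunk's
# share of the predicted class logit from the final layer's weights.
import torch

def chunk_cam_scores(chunk_vectors: torch.Tensor, f1: torch.nn.Linear, class_idx: int):
    # chunk_vectors: (N, hidden) per-chunk features right before average pooling
    w = f1.weight[class_idx]                                # (hidden,) class weights
    scores = chunk_vectors @ w / chunk_vectors.shape[0]     # each chunk's contribution
    return torch.argsort(scores, descending=True)           # chunks ranked by contribution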

The visualization results (Fig. 13) show BIGNet’s consistent attention to certain parts of each brand. Notably, the detected features sometimes exhibit asymmetry, primarily due to the inherent asymmetry present in the vectorized images. Photographs are seldom captured from a perfectly centered perspective, and the vectorization algorithm is sensitive to environmental factors such as lighting direction and reflections, which introduce asymmetry and noise in curve fitting; as a result, car parts on the left and right sides frequently exhibit distinct geometries, contributing to the observed asymmetry.

Fig. 13
CAM-based visualization of BIGNet’s brand-related features on the test set. The Grad-CAM visualizations of the CNN are located in the upper right corner of each image. BIGNet clearly captures the luxury segment’s well-distinguishable car parts, including the grille, headlights, and fog lights, while there are far fewer geometric clues on affordable cars (Toyota), so the model has to rely on logo detection.

Table 9 further summarizes the most frequently highlighted attention in the test set and the percentage of images in which each feature is visualized. Among the six brands, the luxury brands (BMW, Benz, and Audi) exhibit more explainable and intuitive attention, and all three tend to highlight the curves related to the grille and headlights. This suggests that luxury brands prioritize preserving brand consistency in the same car parts while incorporating different geometries. Finally, ResNet-50 is fine-tuned as the CNN to compare with BIGNet. Although the CNN reaches almost 100% accuracy for both train and test sets, it fails to show explicit brand features using Grad-CAM. BIGNet’s results, on the other hand, are not only explainable but also editable, as each curve is parameterized by control points. This makes it a much more useful surrogate for analyzing brand-related features.

Table 9

Most highlighted features by BIGNet

Brand | Number of data observed in test set | 1st most highlighted feature (%) | 2nd most highlighted feature (%)
Toyota | 54 | Fog lights, 92.6% | Logo, 79.6%
VW | 70 | Logo, 68.6% | Grille, 64.3%
Benz | 65 | Grille, 89.2% | Headlights, 76.9%
Audi | 63 | Grille, 77.8% | Headlights, 33.3%
BMW | 65 | Headlights, 84.6% | Grille, 72.3%
Hyundai | 49 | Headlights, 69.4% | Fog lights, 24.5%

5.4.4 Generalizability Study.

Compared to the phone case study, the car case study demonstrates a broader utility of BIGNet. It tackles a significantly more complex problem, transitioning from clean, parametrically modeled data to automatically vectorized data and from dozens of curves to thousands of curves. This section aims to impose even more demanding criteria on BIGNet for brand recognition. Specifically, BIGNet’s reliability and consistency are examined by further increasing the classification difficulty, through logo removal and through classifying ten brands (the original six brands plus KIA, Chevy, Ford, and Nissan). For each experiment, the accuracy, well-clustered classes, recognizability ranking, and entanglement pairs are evaluated as described in Sec. 5.4.2 and summarized in Table 10. BMW, Audi, and Benz consistently obtain higher accuracy across all four scenarios, while economy cars consistently obtain lower accuracy and more entanglements, which substantiates the findings in Sec. 5.4.2. It is also found that adding more brands to the classification problem has a bigger effect than removing the logo, which may be because all four additional brands are in the economy segment, which has lower brand consistency. It is also notable that logo removal only slightly decreases accuracy, showing BIGNet’s capability of recognizing higher-level features. Lastly, for a fair comparison, the trained ten-brand classifiers share the same BIGNet architecture designed for the six-brand classification task, except for the final linear layer. Therefore, the lower accuracy is very likely due to underfitting and could be improved by increasing the number of BIGNet’s learnable parameters. Overall, this study not only shows BIGNet’s consistency and robustness in recognizing geometric differences between economy and luxury car brands but also demonstrates its generalizability to larger and harder brand classification problems.

Table 10

Comparison of the four training scenarios

Columns: Brands | Has logo | Acc. | Well-clustered classes (t-SNE) | Well recognizable (confusion matrix accuracy) | Badly recognizable (confusion matrix accuracy) | Entanglement pairs (Cohen’s kappa coefficient matrix)

6 brands, with logo
   Acc.: Train 89.4%, Test 80.6%
   Well-clustered classes (t-SNE): BMW > Audi > Benz > VW
   Well recognizable: Train: BMW > Audi = Benz > VW; Test: Audi > BMW > Benz = VW
   Badly recognizable: Train: Hyundai < Toyota; Test: Toyota < Hyundai
   Entanglement pairs: Toyota-Hyundai

6 brands, without logo
   Acc.: Train 87.1%, Test 77.3%
   Well-clustered classes (t-SNE): BMW > Audi > Benz > VW
   Well recognizable: Train: BMW > Audi > Benz; Test: BMW > VW > Benz
   Badly recognizable: Train: Toyota < Hyundai; Test: Hyundai = Toyota
   Entanglement pairs: Toyota-Hyundai, Toyota-VW

10 brands, with logo
   Acc.: Train 64.9%, Test 63.4%
   Well-clustered classes (t-SNE): BMW > Audi > Benz
   Well recognizable: Train: BMW > Audi > VW; Test: Audi > VW > BMW
   Badly recognizable: Train: Hyundai < Nissan < KIA; Test: Ford < Hyundai < Nissan
   Entanglement pairs: Toyota-Hyundai, Toyota-Nissan, Ford-Hyundai, Ford-KIA

10 brands, without logo
   Acc.: Train 65.8%, Test 59.1%
   Well-clustered classes (t-SNE): BMW > Audi > Benz
   Well recognizable: Train: BMW > Audi > Benz > VW; Test: BMW > Audi > VW > Benz
   Badly recognizable: Train: Nissan < Ford < Hyundai; Test: Nissan < Hyundai < Ford
   Entanglement pairs: Toyota-Hyundai, Toyota-Nissan, Ford-Hyundai, Ford-KIA

Note: All cases consistently show better recognition of luxury brands and more entanglement among economy brands.

6 Limitations and Future Work

Although BIGNet’s classification accuracy is lower than that of the pixel-based CNN, BIGNet currently has fewer than 7000 learnable parameters, far fewer than ResNet-50’s roughly 23 million. On one hand, there is therefore substantial room to improve accuracy through hyperparameter tuning and data processing. While the current BIGNet already produces explicit and explainable visualizations of brand-related features, higher accuracy would be expected to make those visualizations even more informative. On the other hand, because curve-based images encode information more compactly than pixel images, both the dataset and the network require less storage, which may have applications in lightweight AI design.
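As a point of reference, parameter counts of the kind quoted above can be checked in PyTorch. The sketch below counts trainable parameters for the torchvision ResNet-50 and would apply equally to a BIGNet instance; bignet is a hypothetical placeholder, and a recent torchvision (supporting the weights argument) is assumed.

```python
# Count trainable parameters; ResNet-50 from torchvision is the CNN baseline here.
# "bignet" is a hypothetical placeholder for an instantiated BIGNet model.
import torch.nn as nn
from torchvision.models import resnet50

def count_trainable(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

cnn = resnet50(weights=None)  # ~25.6M parameters in total; ~23.5M excluding the 1000-class FC head
print(f"ResNet-50: {count_trainable(cnn):,} trainable parameters")

# print(f"BIGNet: {count_trainable(bignet):,} trainable parameters")  # expected < 7000
```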

This research has shown that explicit and editable features can be extracted by a deep network agent; however, the implications of these findings for brands and their classification, with regard to design processes across companies, remain to be studied in future work. To help practicing designers identify and quantify features automatically, studies are planned to examine how humans can interact and collaborate with such a surrogate system to achieve design objectives. BIGNet’s usage may also be limited to interpolation and extrapolation, as the case studies assume that each brand’s design language is fixed. If a designer wished to extend that language through feature modifications, the brand language could be altered through, for example, rule addition and subtraction followed by additional training of BIGNet; such modification of brands through shape rules was demonstrated by McCormack et al. [11]. Aside from recognition, there is also potential for actively generating or transferring stylized content on top of BIGNet’s framework, which may open a new avenue for research in data-driven, explainable generative models.

The research workflow depicted in Fig. 1 focuses on brand recognition. However, since BIGNet is a generic classifier, it has significant potential to be applied to other stylization problems as well, such as semantics, ergonomics, and even individual designers’ preferences. With modest architectural modifications, BIGNet is anticipated to support a wide range of applications, including image segmentation, engineering design, market positioning, and technical appraisal. Beyond classification tasks that recognize and preserve brand consistency, safety, semantics, and ergonomics, it could also be adapted to regression problems that predict labels such as year or price. With the strong explainability of learning from curve-based representations, this research opens an avenue for deducing information from curve representations and is expected to outperform pixel-based approaches in domains that value interpretability over raw accuracy.

7 Conclusions

This research proposes an automatic workflow to analyze and visualize style content explicitly and performs case studies on product brand classification to recognize and preserve brand consistency. To mimic a human designer’s thought process, the data are represented as SVGs and classified using BIGNet, a two-tier spatial GNN.

The phone study shows that the model is able to learn from parametrically synthesized SVG data. By visualizing attention using LOFO, BIGNet demonstrates the capability of capturing brand-related features at the intra-curve, inter-curve, intra-chunk, and inter-chunk levels. Partial dependence plots of the model’s confidence variation during parameter extrapolation further substantiate that BIGNet learns continuous and meaningful features, including the lens’ location, height–width ratio, and screen–frame gap.
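Conceptually, the LOFO attention referenced here amounts to removing one curve at a time and recording how much the predicted brand confidence drops. The following is a minimal, self-contained sketch of that idea using a toy stand-in model rather than BIGNet itself.

```python
# Illustrative LOFO (leave-one-feature-out) attention sketch: each curve's importance
# is estimated by removing it and measuring the drop in the predicted brand probability.
# "model" is a toy stand-in callable, not the actual BIGNet.
import torch

def lofo_attention(model, curves: torch.Tensor, brand_idx: int) -> torch.Tensor:
    """curves: (N, D) tensor of curve parameters; returns a per-curve importance score."""
    with torch.no_grad():
        base = model(curves)[brand_idx]
        scores = torch.zeros(len(curves))
        for i in range(len(curves)):
            keep = torch.cat([curves[:i], curves[i + 1:]])   # drop curve i
            scores[i] = base - model(keep)[brand_idx]        # confidence drop
    return scores

# Toy stand-in model: mean-pools curve features and applies a linear head + softmax.
head = torch.nn.Linear(8, 6)
model = lambda c: torch.softmax(head(c.mean(dim=0)), dim=-1)
importance = lofo_attention(model, torch.randn(12, 8), brand_idx=0)
print("most brand-indicative curve index:", importance.argmax().item())
```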

The car study further explores the potential of a fully automated recognition system and investigates the generalizability of the workflow. With some architectural modifications relative to the phone study, BIGNet can learn from generic vectorized car images and reach 80.6% test accuracy on a six-brand classification task. Furthermore, even though vectorized SVGs contain more product-irrelevant curves and chunks than the parametrically synthesized ones, BIGNet remains robust and still discovers explainable and meaningful brand-related information. During evaluation, BMW, Benz, and Audi achieve higher recognition rates than the other brands, matching the commonly observed marketing strategy in which luxury car brands value brand consistency more than economy brands. CAM visualization further shows that BIGNet attends consistently to the luxury brands’ grilles and headlights. Finally, a comparison of BIGNet with a CNN baseline demonstrates that the curve-based deep learning model produces more interpretable visualizations, while the curve format is also more editable.

To summarize, BIGNet, as a deep learning model, can identify brand-related features and can be applied to various product categories with distinguishing geometries, enabling designers to use it as a communicative and explainable style-discovery agent that can significantly accelerate the aesthetic design process. Future research will explore a wider range of potential applications in other stylization domains.

Acknowledgment

We would like to thank Ayush Raina and Sheng-yu Wang for their help with problem formulation and for providing insightful feedback. This work was partially funded by the National Science Foundation under Award CMMI-2113301.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The data and information that support the findings of this article are freely available.2

References

1. Orbay, G., Fu, L., and Kara, B., 2015, “Deciphering the Influence of Product Shape on Consumer Judgments Through Geometric Abstraction,” ASME J. Mech. Des., 137(8), p. 081103.
2. Ersin Yumer, M., Chaudhuri, S., Hodgins, J. K., and Burak Kara, L., 2015, “Semantic Shape Editing Using Deformation Handles,” ACM Trans. Graph., 34(4), pp. 1–12.
3. Ravasi, D., and Stigliani, I., 2012, “Product Design: A Review and Research Agenda for Management Studies,” Int. J. Manag. Rev., 14(4), pp. 464–488.
4. Bloch, P. H., 2011, “Product Design and Marketing: Reflections After Fifteen Years,” J. Prod. Innov. Manage., 28(3), pp. 378–380.
5. Liu, Y., Li, K. J., Chen, H. A., and Balachander, S., 2017, “The Effects of Products’ Aesthetic Design on Demand and Marketing-Mix Effectiveness: The Role of Segment Prototypicality and Brand Consistency,” J. Mark., 81(1), pp. 83–102.
6. Stiny, G., 1991, “The Algebras of Design,” Res. Eng. Des., 2(3), pp. 171–181.
7. Agarwal, M., and Cagan, J., 1997, “A Blend of Different Tastes—The Language of Coffeemakers,” Environ. Plann. B Plann. Des., 25(2), pp. 205–226.
8. Ang, M. C., Ng, K. W., and Pham, D. T., 2013, “Combining the Bees Algorithm and Shape Grammar to Generate Branded Product Concepts,” Proc. Inst. Mech. Eng. B, 227(12), pp. 1860–1873.
9. Chau, H. H., Chen, X., McKay, A., and de Pennington, A., 2004, “Evaluation of a 3D Shape Grammar Implementation,” International Conference on Design Computing and Cognition, Cambridge, MA, July 19–21, Springer, Netherlands, pp. 357–376.
10. Pugliese, M. J., and Cagan, J., 2002, “Capturing a Rebel: Modeling the Harley-Davidson Brand Through a Motorcycle Shape Grammar,” Res. Eng. Des., 13(3), pp. 139–156.
11. McCormack, J. P., Cagan, J., and Vogel, C. M., 2004, “Speaking the Buick Language: Capturing, Understanding, and Exploring Brand Identity With Shape Grammars,” Des. Stud., 25(1), pp. 1–29.
12. Aqeel, A. B., 2015, “Development of Visual Aspect of Porsche Brand Using CAD Technology,” Procedia Technol., 20, pp. 170–177.
13. Hsiao, S.-W., and Huang, H. C., 2002, “A Neural Network Based Approach for Product Form Design,” Des. Stud., 23(1), pp. 67–84.
14. Lin, C.-C., and Hsiao, S.-W., 2003, “A Study on Applying Feature-Based Modeling and Neural Network to Shape Generation,” http://140.116.207.99/handle/987654321/262952.
15. Goodfellow, I., Bengio, Y., and Courville, A., 2016, Deep Learning, MIT Press, Cambridge, MA.
16. Krizhevsky, A., Sutskever, I., and Hinton, G. E., 2017, “ImageNet Classification With Deep Convolutional Neural Networks,” Commun. ACM, 60(6), pp. 84–90.
17. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A., 2016, “Learning Deep Features for Discriminative Localization,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, June 26–July 1, pp. 2921–2929.
18. Stiny, G., and Gips, J., 1971, “Shape Grammars and the Generative Specification of Painting and Sculpture,” IFIP Congress, Ljubljana, Yugoslavia, Aug. 23–28, p. 128.
19. Stiny, G., 2006, Shape: Talking About Seeing and Doing, MIT Press, Cambridge, MA.
20. Boatwright, P., Cagan, J., Kapur, D., and Saltiel, A., 2009, “A Step-by-Step Process to Build Valued Brands,” J. Prod. Brand. Manag., 18(1), pp. 38–49.
21. Harris, C., and Stephens, M., 1988, “A Combined Corner and Edge Detector,” Alvey Vision Conference, Manchester, UK, Aug. 31–Sept. 2, pp. 23.1–23.6.
22. Tomasi, C., and Kanade, T., 1992, “Shape and Motion From Image Streams: A Factorization Method,” Int. J. Comput. Vision, 9(2), pp. 137–154.
23. Lowe, D. G., 2004, “Distinctive Image Features From Scale-Invariant Keypoints,” Int. J. Comput. Vision, 60(2), pp. 91–110.
24. Dalal, N., and Triggs, B., 2005, “Histograms of Oriented Gradients for Human Detection,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 20–26, pp. 886–893.
25. Calonder, M., Lepetit, V., Strecha, C., and Fua, P., 2010, “BRIEF: Binary Robust Independent Elementary Features,” European Conference on Computer Vision, Heraklion, Crete, Greece, Sept. 5–11, pp. 778–792.
26. Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P., 1998, “Gradient-Based Learning Applied to Document Recognition,” Proc. IEEE, 86(11), pp. 2278–2324.
27. Simonyan, K., and Zisserman, A., 2015, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” International Conference on Learning Representations, San Diego, CA, May 7–9.
28. Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., and Keutzer, K., 2017, “SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size,” International Conference on Learning Representations, Toulon, France, Apr. 24–26.
29. Chen, T., Kornblith, S., Norouzi, M., and Hinton, G., 2020, “A Simple Framework for Contrastive Learning of Visual Representations,” International Conference on Machine Learning, Vienna, Austria, July 12–18, pp. 1597–1607.
30. Chen, T., Kornblith, S., Swersky, K., Norouzi, M., and Hinton, G., 2020, “Big Self-Supervised Models Are Strong Semi-Supervised Learners,” Adv. Neural Inf. Process. Syst., 33, pp. 22243–22255.
31. Chabot, F., Chaouch, M., Rabarisoa, J., Teuliere, C., and Chateau, T., 2017, “Deep Edge-Color Invariant Features for 2D/3D Car Fine-Grained Classification,” IEEE Intelligent Vehicles Symposium, Los Angeles, CA, June 11–14, Institute of Electrical and Electronics Engineers Inc., pp. 733–738.
32. Zhang, Q., Wu, Y. N., and Zhu, S.-C., 2018, “Interpretable Convolutional Neural Networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, June 18–22, pp. 8827–8836.
33. Chang, D., Ding, Y., Xie, J., Bhunia, A. K., Li, X., Ma, Z., Wu, M., Guo, J., and Song, Y.-Z., 2020, “The Devil Is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification,” IEEE Trans. Image Process., 29, pp. 4683–4695.
34. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D., 2017, “Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization,” Proceedings of the IEEE International Conference on Computer Vision, Honolulu, HI, July 21–26, pp. 618–626.
35. Eitz, M., Hays, J., and Alexa, M., 2012, “How Do Humans Sketch Objects?,” ACM Trans. Graph., 31(4), pp. 1–10.
36. Schneider, R. G., and Tuytelaars, T., 2014, “Sketch Classification and Classification-Driven Analysis Using Fisher Vectors,” ACM Trans. Graph., 33(6), pp. 1–9.
37. Li, Y., Hospedales, T. M., Song, Y.-Z., and Gong, S., 2015, “Free-Hand Sketch Recognition by Multi-Kernel Feature Learning,” Comput. Vision Image Understanding, 137, pp. 1–11.
38. Yu, Q., Yang, Y., Song, Y.-Z., Xiang, T., and Hospedales, T. M., 2017, “Sketch-a-Net: A Deep Neural Network That Beats Humans,” Int. J. Comput. Vision, 122(3), pp. 411–425.
39. Li, L., Zou, C., Zheng, Y., Su, Q., Fu, H., and Tai, C.-L., 2020, “Sketch-R2CNN: An RNN-Rasterization-CNN Architecture for Vector Sketch Recognition,” IEEE Trans. Visual Comput. Graphics, 27(9), pp. 3745–3754.
40. Hu, C., Li, D., Song, Y.-Z., Xiang, T., and Hospedales, T. M., 2018, “Sketch-a-Classifier: Sketch-Based Photo Classifier Generation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, June 18–22, pp. 9136–9144.
41. Xu, P., Huang, Y., Yuan, T., Pang, K., Song, Y.-Z., Xiang, T., Hospedales, T. M., Ma, Z., and Guo, J., 2018, “SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, June 18–22.
42. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S., 2020, “A Comprehensive Survey on Graph Neural Networks,” IEEE Trans. Neural Netw. Learn. Syst., 32(1), pp. 4–24.
43. Micheli, A., 2009, “Neural Network for Graphs: A Contextual Constructive Approach,” IEEE Trans. Neural Networks, 20(3), pp. 498–511.
44. Atwood, J., and Towsley, D., 2016, “Diffusion-Convolutional Neural Networks,” Neural Information Processing Systems, Barcelona, Spain, Dec. 5–10.
45. Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., and Bronstein, M. M., 2017, “Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, July 21–26, pp. 5115–5124.
46. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y., 2017, “Graph Attention Networks,” arXiv preprint arXiv:1710.10903.
47. Hamilton, W. L., Ying, R., and Leskovec, J., 2017, “Inductive Representation Learning on Large Graphs,” Neural Information Processing Systems, Long Beach, CA, Dec. 4–9.
48. Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., and Sun, M., 2020, “Graph Neural Networks: A Review of Methods and Applications,” AI Open, 1, pp. 57–81.
49. Sanchez-Gonzalez, A., Heess, N., Springenberg, J. T., Merel, J., Riedmiller, M., Hadsell, R., and Battaglia, P., 2018, “Graph Networks as Learnable Physics Engines for Inference and Control,” International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, July 10–15, PMLR, pp. 4470–4479.
50. Do, K., Tran, T., and Venkatesh, S., 2019, “Graph Transformation Policy Network for Chemical Reaction Prediction,” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, Aug. 4–8, pp. 750–760.
51. Guo, S., Lin, Y., Feng, N., Song, C., and Wan, H., 2019, “Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting,” AAAI, 33(1), pp. 922–929.
52. Xie, L., Lu, Y., Furuhata, T., Yamakawa, S., Zhang, W., Regmi, A., Kara, L., and Shimada, K., 2022, “Graph Neural Network-Enabled Manufacturing Method Classification From Engineering Drawings,” Comput. Ind., 142, p. 103967.
53. Zhang, W., Joseph, J., Yin, Y., Xie, L., Furuhata, T., Yamakawa, S., Shimada, K., and Kara, L. B., 2023, “Component Segmentation of Engineering Drawings Using Graph Convolutional Networks,” Comput. Ind., 147, p. 103885.
54. Ranscombe, C., Hicks, B., Mullineux, G., and Singh, B., 2012, “Visually Decomposing Vehicle Images: Exploring the Influence of Different Aesthetic Features on Consumer Perception of Brand,” Des. Stud., 33(4), pp. 319–341.
55. Friedman, J. H., 2001, “Greedy Function Approximation: A Gradient Boosting Machine,” Ann. Stat., 29(5), pp. 1189–1232.
56. Akkucuk, U., and Esmaeili, J., 2016, “The Impact of Brands on Consumer Buying Behavior,” Int. J. Acad. Res. Bus. Soc. Sci., 5(4), pp. 1–16.
57. Hussain Shaheed Zulfikar Ali Bhutto, S., and Raheem Ahmed, R., 2020, “Smartphone Buying Behaviors in a Framework of Brand Experience and Brand Equity,” Transform. Bus. Econ., 19(2), pp. 220–242.
58. Selinger, P., 2003, “Potrace: A Polygon-Based Tracing Algorithm,” http://autotrace.sourceforge.net/.
59. Yang, L., Luo, P., Loy, C. C., and Tang, X., 2015, “A Large-Scale Car Dataset for Fine-Grained Categorization and Verification,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, June 7–12, pp. 3973–3981.
60. Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y., and Girshick, R., 2019, “Detectron2,” https://github.com/facebookresearch/detectron2.
61. Pu, M., Huang, Y., Liu, Y., Guan, Q., and Ling, H., 2022, “EDTER: Edge Detection with Transformer,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 21–24, pp. 1402–1412.