Usage context is considered a critical driving factor for customers' product choices. In addition, physical use of a product (i.e., user-product interaction) dictates a number of customer perceptions (e.g., level of comfort). In the emerging internet of things (IoT), this work hypothesizes that it is possible to understand product usage and level of comfort while it is “in-use” by capturing the user-product interaction data. Mining this data to understand both the usage context and the comfort of the user adds new capabilities to product design. There has been tremendous progress in the field of data analytics, but the application in product design is still nascent. In this work, application of feature-learning methods for the identification of product usage context and level of comfort is demonstrated, where usage context is limited to the activity of the user. A novel generic architecture using foundations in convolutional neural network (CNN) is developed and applied to a walking activity classification using smartphone accelerometer data. Results are compared with feature-based machine learning algorithms (neural network and support vector machines (SVM)) and demonstrate the benefits of using the feature-learning methods over the feature-based machine-learning algorithms. To demonstrate the generic nature of the architecture, an application toward comfort level prediction is presented using force sensor data from a sensor-integrated shoe.

Introduction and Motivation

In product development, usage context is considered a critical driver in the marketing [1–3] and engineering [4,5] of a product. Per He et al. [5], usage context is defined as “all aspects describing the context of product use that vary under different use conditions and affect product performance and/or consumer preferences for the product attributes.” Based on this definition, usage context can be further classified into intended usage and actual (in-use) usage context. In engineering design, usage context has been identified and incorporated in choice modeling via surveys, limiting insight on actual use post product launch as only intended usage is considered [5].

Traditionally, consumer information obtained in the form of surveys, focus groups, etc., is mapped onto a single construct, Utility, using methods like discrete choice analysis or choice analysis [6]. This mapping is influenced by various forms of designer bias including prior experience, existing mental models, and industry/team/firm culture and practices [7]. In addition to Utility, other constructs also affect consumer preferences. An alternative approach in understanding these preferences comes from behavioral research in consumer psychology where constructs measuring specific thoughts, perceptions, and attitudes are mapped onto a network of interconnected judgments that predict downstream consumer preferences or perceptions about a product. This network of interconnected judgments has the potential to reveal the reasoning behind a consumer's particular preference/perception at the psychological level. Using the latent constructs, a designer can influence consumers' perceptions in a more targeted manner by designing products accordingly.

An important influence on consumer perceptions for a given product is usage context. Inclusion of usage context in design is typically accomplished through observational studies or self-reporting by end-users. This makes the inclusion of such information subjective in nature, though more recently work has been done to include usage context as a factor in Utility based design [5]. The largely subjective nature of current practice represents an opportunity to develop approaches that quantify the influence of usage contexts in design and map it onto various latent constructs representing specific user perceptions.

Figure 1 shows the relationship between the product usage context and a customer's perception about the product. Latent constructs can assist the designer in understanding the various psychological evaluations of the customers relevant to the product under consideration. For example, a “perceived capability” latent construct can facilitate the determination of whether the product is capable enough to fulfill the needs of the customer. Similarly, there are other latent constructs, which can be considered based on the product context. One of the scenarios in which customers can evaluate a particular product is based on their actual usage of the product. This is represented in Fig. 1, where the actual usage context influences a network of latent factors (e.g., comfort) ultimately influencing a customer's perception about the product. For example, if a customer is wearing a work-shoe and using it for activities like running, the customer will likely be uncomfortable and will not prefer the shoe for running.

This extreme example of a product in use highlights a gap in the identification of usage context and predictions of comfort, which is often a posterior process reliant on user reporting. To understand customer perceptions, it is important to identify and understand actual usage by capturing actual “in-use” context and measuring comfort levels in a manner that is not limited to posterior self-reports (e.g., product reviews). The process envisioned in this work can be automated to support these capture and measure activities. Automated identification of usage context could provide more insight to designers regarding how the product is actually being used which then could lead to the development of new design features and functionality in future generations of the product. Further, automated prediction of comfort level reduces the need for user reporting and mitigates loss of information and potential for reviewer bias.

Identifying usage context using field data can also lead to the discovery of relevant contexts not taken into account earlier in the design process. In addition, it supports understanding of contexts that are not relevant, helping to eliminate corresponding product features in future generations. For example, a subset of intended usage (identified prior to product launch) may not be applicable post launch and hence can be eliminated in future generations of the product. On the other hand, new usage contexts identified post launch using field data can be included in the intended usage scenarios set during the development process for the next generation of the product. In addition, it can be safely hypothesized that if the predicted comfort level for the new usage context is high, then these contexts can create value for the product.

In certain design paradigms like empathic design [8,9], the user–product interaction can be observed, recorded, and used later in the design process. The associated comfort level can also be similarly recorded and used later in the design process. However, such methods are difficult to scale: obtaining information once the product is fielded is time consuming and costly, and in some cases could be infeasible (e.g., observing a customer using running shoes). Further, these methods tend toward qualitative information, whereas capturing actual usage information in a quantitative manner for integration with existing design methodologies may also prove beneficial.

In this work, we refer to in-use or “actual” usage context as the “activity” being performed by the customer and to “level of comfort” as a rating a customer would give. The range of the comfort rating can be defined by the decision maker during data collection. As the internet of things (IoT) paradigm continues to emerge and smart devices pervade, the increasing ubiquity of sensors will provide data associated with a user and the specific activities being performed. We hypothesize that these data are critical in understanding the relationship between usage context, latent factors like comfort, and ultimately the overall perception of the product that influences product valuation and purchase decisions. Establishing relationships between actual usage context and such latent factors is a four-step procedure: (i) collect data using embedded sensors, (ii) recognize activity, (iii) predict latent factors, and (iv) correlate activity with latent factors. This capability is part of the long-term vision of a new paradigm in product design: cyber-empathic design (CED) [10]. In CED, user–product interaction data mapped to specific psychological latent constructs provide insight on how user perceptions are formed over a particular product. For CED to be practically used in design, development of methods that automate identification and presentation of relevant product attributes is critical. In this paper, the issues of recognizing activity and a relevant latent factor (i.e., comfort level) are investigated. In addition, an approach to identify activity and predict comfort level for the same activity simultaneously is proposed, but will be further tested in future work.

In the last decade, there has been a tremendous surge in the area of machine learning, most recently in the subfield of deep learning. Machine learning is a very common method for activity recognition [11–16]. However, there is less work on the application of machine learning methods to the prediction of latent constructs.

The procedure for traditional machine learning methods is shown in Fig. 2(a). Although this approach is widely used, there are three challenges with this methodology:

  1. It relies heavily on human engineered features.

  2. It relies on humans to continuously invent new features or research quality features (i.e., dependence on domain knowledge).

  3. It introduces human bias for feature selection that is then fed into the classification algorithm.

Given these issues, the focus in machine learning shifts toward the automatic generation of quality features where the algorithm itself can discover features from raw data and use it for training and classification, as shown in Fig. 2(b). In early image classification, the machine learning community faced a similar situation. The convolutional neural network (CNN) [17] method addressed this situation and virtually eliminated the requirement of domain expertise in image classification. As a result, the CNN is one of the foundations for deep learning and is widely used today [18].

For this work, we use feature-learning methods for activity recognition and comfort level estimation as shown in Fig. 2(b) by employing a CNN. The CNN is a supervised algorithm that learns feature representations from raw data. The results are compared with two feature-based methods: a neural network and support vector machines (SVM). It should be noted that, for the case studies presented in this paper, the authors do not possess the domain expertise required for analysis of the data source (feature engineering from signal data); the case studies therefore demonstrate the domain-independent usefulness of the “feature-learning” methods in activity recognition and comfort level estimation.

A review of the feature-based and feature-learning methods is presented in Sec. 2. In Sec. 3, how the feature-learning method has been adopted with a modified architecture for this work is described. Case studies, results and discussions are presented in Sec. 4 while conclusions and future work are presented in Sec. 5.

Related Work—Feature-Based and Feature-Learning Methods

This section first presents the basic classification of machine learning algorithms based on learning style, highlighting the need for feature-learning methods. A brief overview of the feature-based methods is presented, and finally, an overview of the feature-learning method, which is adopted for this work, is presented.

As shown in Fig. 3, machine learning can be broadly classified into supervised (human provides pretagged information), unsupervised (nontagged information), and semisupervised methods [19–21]. Each class of supervised and unsupervised method can be further classified into feature-based and feature-learning methods [19–21].

Feature-based methods follow the procedure shown in Fig. 2(a). In these methods, data are first collected, and using domain knowledge, features are extracted from the raw data and serve as input to the classification algorithm. For supervised algorithms, annotated outputs are needed and using a supervised classification algorithm a model is developed to match the inputs (features) with the annotated outputs [20]. The most common feature-based supervised algorithms are neural networks and support vector machines [20]. Although the supervised algorithms themselves are universal approximators [22,23], the classification accuracy is highly dependent on the quality of the inputs (i.e., the extracted features). Similar classification and arguments are applicable for unsupervised feature-based methods. Thus, the quality of the inputs to the classification algorithm is dependent on the domain knowledge of the experts extracting the features.

Automated activity recognition and comfort level estimation are complicated tasks. The accuracy of the recognition and estimation are dependent on high quality features extracted from raw data. However, the feature definition is often heuristic based and not task dependent; hence, not all variations in the data can be captured. In addition, for the feature extraction task, feature selection is complex and iterative. While it is expected that domain experts will extract high quality features, it is not assumed that all variations in the data have been taken into account and the extracted features may not represent the optimal set in complex real-world scenarios. Thus, there is potential for techniques where features are recognized by the algorithm, reducing dependence on domain experts.

Based on this recognized opportunity, Secs. 2.1–2.3 present the algorithms (feature-based and feature-learning) used in this work. This work is only related to supervised algorithms; hence, the review is limited to supervised algorithms.

Multiclass Support Vector Machines (MC-SVM).

Support vector machines can be used when data have exactly two classes. An SVM classifies data by finding the best hyperplane that separates all data points of one class from those of the other class. The best hyperplane for an SVM is the one with the largest margin between the two classes. The margin is the maximal width of the slab parallel to the hyperplane that has no interior data points [24]. The support vectors are the data points that are closest to the separating hyperplane; these points are on the boundary of the slab. Figure 4 illustrates these definitions.

To extend the SVM for more than two classes, consider a dataset consisting of $l$ patterns where each one is a pair of the type $(x_i, y_i)\ \forall i \in [1,\ldots,l]$, $x_i \in \mathbb{R}^m$, and $y_i = \pm 1$. A standard binary SVM can be learned by solving a convex constrained quadratic programing minimization problem which is given by the following formulation [11–15,23,25]:
$$\min_{\alpha}\ \frac{1}{2}\alpha^{T} Q \alpha - r^{T}\alpha$$
(1)

$$0 \le \alpha_i \le C \quad \forall i \in [1,\ldots,l]$$
(2)

$$y^{T}\alpha = 0$$
(3)
where $C$ is the regularization parameter, $r_i = 1\ \forall i$, and $Q$ is the symmetric positive semidefinite $l \times l$ kernel matrix with entries $q_{ij} = y_i y_j K(x_i, x_j)$. After solving the convex constrained quadratic programing problem, the $\alpha_i\ \forall i \in [1,\ldots,l]$ values can be found and used to predict the class of any new pattern using the feed-forward phase formulation of the SVM
$$f(x) = \sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b$$
(4)

where $b$ is the bias term. It should be noted here that $x$ denotes the features that are inputs to the classification algorithm.
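As an illustration of the feed-forward phase in Eq. (4), the following minimal sketch evaluates the decision function for a binary SVM. It assumes a radial basis function kernel with a hypothetical width parameter `gamma`; the support vectors, multipliers, and bias are toy values chosen only to satisfy the constraints in Eqs. (2) and (3), not values obtained in this work.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i y_i * alpha_i * K(x_i, x) + b, as in Eq. (4)."""
    return sum(y * a * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias

# Hypothetical toy values: y^T alpha = 0 (Eq. (3)) and 0 <= alpha_i <= C (Eq. (2)).
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [0.8, 0.8]
bias = 0.0

score = decision_function([1.9, 2.1], support_vectors, labels, alphas, bias)
predicted_class = 1 if score >= 0 else -1
print(predicted_class)
```

The sign of the score gives the predicted class; a pattern equidistant from both support vectors receives a score of zero in this toy setup.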

Neural Networks.

The fundamental building block for neural networks is the single-input neuron as shown in Fig. 5. Three distinct functional operations take place in the neuron model. First, the scalar input p is multiplied by the scalar weight w to form the product wp. Second, the weighted input wp is added to the scalar bias b to form the net input n. Finally, the net input is passed through the activation function f, which produces the scalar output a [26,27].

The simple neuron can be extended to handle inputs that are vectors [27,28] as shown in Fig. 6. The three operations remain the same even in this case, except that the weights w are now represented by a matrix. Common activation functions include the step function, the linear transfer function, and the log-sigmoid function. Two or more neurons can be combined in a layer and a neural network can consist of one or more such layers [13,27,28] as shown in Fig. 7.

In multilayer neural networks, the output of one layer acts as input to the subsequent layer. The layers in a multilayer neural network play different roles. The layer that produces the output is called the output layer while all other layers are called hidden layers. The neural network architecture shown in Fig. 7 is also known as a “fully connected neural network.” The fully connected neural network forms one of the stages in a CNN, which is presented in Sec. 2.3.
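The three neuron operations and their extension to layers can be sketched as follows. This is a minimal illustration that assumes a log-sigmoid activation throughout; the weights and biases are hypothetical values, not trained parameters.

```python
import math

def log_sigmoid(n):
    """Log-sigmoid activation: maps the net input n to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron(p, w, b, f=log_sigmoid):
    """Single-input neuron: weight the input, add the bias, apply the activation."""
    return f(w * p + b)

def layer(p, W, b, f=log_sigmoid):
    """Layer with vector input p: a = f(W p + b), one row of W per neuron."""
    return [f(sum(w_ij * p_j for w_ij, p_j in zip(row, p)) + b_i)
            for row, b_i in zip(W, b)]

def forward(p, layers):
    """Multilayer network: the output of each layer is the input to the next."""
    a = p
    for W, b in layers:
        a = layer(a, W, b)
    return a

# Hypothetical weights: a hidden layer with two neurons and an output layer with one.
hidden = ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0])
output = ([[1.0, -1.0]], [0.0])
print(neuron(0.0, w=2.0, b=0.0))
print(forward([1.0, 1.0], [hidden, output]))
```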

Convolutional Neural Network.

The convolutional neural network originated in the field of computer vision and image classification [17–19,29,30]. The key attribute of a CNN is that different processing units (convolution, pooling, and normalization) are arranged alternately, as shown in Fig. 8. Through stacking of convolution and pooling layers, it is possible to learn features that provide a representation of the input signal. The features extracted by a CNN are task dependent and not reliant on human engineered features. For detailed information readers are referred to Refs. [17–19], [29], and [30]. The critical difference between a general neural network and a CNN is that a CNN learns features from raw data directly while a neural network is dependent on subject matter experts to design and extract features.

In the convolution layer, a filter extracts features by traversing across different regions of the input data. It should be noted that the filter weights are shared across all regions in one layer. Thus, different filters are used to extract different features. Similar to the simple neuron concept shown in Fig. 5, the output of the convolutional operators is passed through the activation function to form a feature map. If we denote the $k$th feature map at a given layer as $h^k$, whose filters are determined by the weights $W^k$ and bias $b_k$, then the feature map $h^k$ is obtained as follows (using tanh as the activation function):
$$h^{k}_{ij} = \tanh\left((W^{k} * x)_{ij} + b_k\right)$$
(5)

In the pooling layers, the resolution of feature maps is reduced to increase the invariance of features to distortions on the inputs. Specifically, feature maps in the previous layers are pooled over local neighborhoods by either max, sum, or mean pooling function. The features in the pooled layer are then used as an input to a fully connected neural network or another convolutional layer as shown in Fig. 8. The error obtained from the feed forward network which consists of the convolutional layer, pooling layer, and the fully connected neural network is then back propagated via the back-propagation algorithm [17–19,29,30]. Using an optimization algorithm, the weights of the convolutional layers are trained by minimizing the error of the feed forward network. In this manner, the optimized features of the input data are learned. The CNN has been modified for this work and is considered for activity classification and comfort level prediction. The architecture of the CNN we use in this work is presented in Sec. 3.
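A one-dimensional analogue of Eq. (5) followed by max pooling can be sketched as below. The sketch assumes a “valid” convolution (no padding), an unflipped filter as is customary in CNN implementations, and non-overlapping pooling windows; the input signal and the edge-detecting filter are hypothetical.

```python
import math

def conv1d_tanh(x, w, b):
    """Feature map h_i = tanh((w * x)_i + b) over all fully overlapping positions.

    The filter is not flipped, i.e., this is the cross-correlation commonly
    called convolution in CNN implementations.
    """
    k = len(w)
    return [math.tanh(sum(w[j] * x[i + j] for j in range(k)) + b)
            for i in range(len(x) - k + 1)]

def max_pool1d(h, size):
    """Non-overlapping max pooling; samples that do not fill a window are dropped."""
    return [max(h[i:i + size]) for i in range(0, len(h) - size + 1, size)]

x = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
w = [-1.0, 1.0]  # hypothetical filter that responds to rising edges
feature_map = conv1d_tanh(x, w, b=0.0)
pooled = max_pool1d(feature_map, size=2)
print(len(feature_map), len(pooled))
```

Because the same filter weights are applied at every position, both rising edges in the toy signal produce the same response, and pooling halves the resolution of the feature map.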

Proposed Architecture

The architecture presented in Sec. 2.3 is suitable for images, which are stationary; that is, unlike sensor data, which can vary from one time to another, images have no temporal component. Thus, there is a need to develop time invariant features. Also, in the case of activity recognition and comfort level estimation, there are potentially multiple sources of signals. For this work, the principles of CNN are adapted for multisourced data. In Sec. 3.1, the architecture used for this work is presented.

CNN Architecture.

In this work, the principles of CNN are leveraged to learn the features of each signal source. The features are learned using convolution and pooling layers. As sensor signals are one-dimensional, only one-dimensional convolution is performed instead of two-dimensional convolution as in the case of images. The convolutions are performed on the temporal axis of the sensor signal. The learned features from each signal source are then concatenated and used as input to a fully connected network. The computing architecture is shown in Fig. 9.
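The data flow described above can be sketched in a few lines: each source passes through its own one-dimensional convolution and pooling stage, and the learned features are concatenated before the fully connected stage. This is only a structural sketch under stated assumptions: a single feature-learning layer, randomly initialized (untrained) filters, a single linear fully connected layer, and hypothetical sizes that are much smaller than those used in the case studies.

```python
import random

def conv1d_relu(x, w, b):
    """Valid 1-D convolution along the temporal axis followed by a rectified linear unit."""
    k = len(w)
    return [max(0.0, sum(w[j] * x[i + j] for j in range(k)) + b)
            for i in range(len(x) - k + 1)]

def max_pool1d(h, size):
    """Non-overlapping max pooling along the temporal axis."""
    return [max(h[i:i + size]) for i in range(0, len(h) - size + 1, size)]

def learn_features(signal, filters, pool_size):
    """One feature-learning layer for one source: each filter yields a pooled feature map."""
    features = []
    for w, b in filters:
        features.extend(max_pool1d(conv1d_relu(signal, w, b), pool_size))
    return features

def fully_connected(a, W, b):
    """Linear fully connected layer producing one score per output unit."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

rng = random.Random(0)
window, n_filters, filter_size, pool_size, n_classes = 32, 3, 5, 4, 6  # hypothetical sizes

# Three hypothetical signal sources (e.g., three sensor axes), each with its own filters.
sources = [[rng.uniform(-1, 1) for _ in range(window)] for _ in range(3)]
filters = [[([rng.uniform(-1, 1) for _ in range(filter_size)], 0.0)
            for _ in range(n_filters)] for _ in sources]

concatenated = []
for signal, source_filters in zip(sources, filters):
    concatenated.extend(learn_features(signal, source_filters, pool_size))

W = [[rng.uniform(-0.1, 0.1) for _ in concatenated] for _ in range(n_classes)]
scores = fully_connected(concatenated, W, [0.0] * n_classes)
print(len(concatenated), len(scores))
```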

The number of feature-learning layers (convolution and pooling layers) varies from task to task and can be treated as a hyper-parameter selected based on cross-validation analysis as demonstrated in Sec. 4. The number of feature-learning layers, if scaled, enables hierarchical feature-learning. However, increasing the number of feature-learning layers also increases the computational cost. It also exponentially increases the potential of a vanishing gradient which occurs when a gradient optimizer and chain-rule derivative is used to train multilayer neural networks [18]. To address this challenge, new activation functions like a rectified linear unit [18] and new architectures like residual networks [31,32] have been developed. The rectified linear unit function is shown in Eq. (6), while the smooth analytical approximation is shown in Eq. (7). In practice, during training of the network, one should start with a smaller network and gradually increase the number of feature-learning layers until there is no improvement in the accuracy. In this work, a rectified linear unit has been considered but the use of residual networks is left for future work. 
$$f(x) = \max(0, x)$$
(6)

$$f(x) = \ln\left(1 + e^{x}\right)$$
(7)
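Equations (6) and (7) can be compared numerically with the short sketch below. The derivative functions are included as an illustrative addition: the rectified linear unit has a unit gradient for positive inputs, while the gradient of the smooth approximation is the logistic sigmoid.

```python
import math

def relu(x):
    """Rectified linear unit, Eq. (6)."""
    return max(0.0, x)

def softplus(x):
    """Smooth approximation of the rectified linear unit, Eq. (7)."""
    return math.log1p(math.exp(x))

def relu_grad(x):
    """Gradient of Eq. (6): 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def softplus_grad(x):
    """Gradient of Eq. (7): the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5.0, 0.0, 5.0):
    print(x, relu(x), softplus(x), relu_grad(x), softplus_grad(x))
```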

Finally, the stacked feature-learning layers act as input to a fully connected network. The number of layers in the fully connected network is also considered as a hyper-parameter. The network is then trained using a gradient based optimization algorithm. There have been many gradient-based algorithms employed in machine learning including stochastic gradient descent [33–35], stochastic gradient descent with Nesterov Momentum [33–36], ADAGRAD [34,35,37], and ADAM [34,35,38]. Based on empirical findings, ADAGRAD and ADAM are widely used in practice because of their speed and accuracy [13,17,18,31], and because of this, ADAGRAD has been implemented in this work. Both ADAGRAD and ADAM are gradient based optimization methods with modifications to handle large amounts of data and promote improved convergence rates. The parameters associated with the optimization algorithm are also considered as hyper-parameters and should be tuned using a cross-validation analysis.
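For readers unfamiliar with ADAGRAD, the sketch below shows the commonly stated form of its update rule, in which each parameter's step is scaled by the square root of its accumulated squared gradients. It is applied to a hypothetical two-parameter quadratic objective rather than to a network; the learning rate, step count, and `eps` constant are illustrative assumptions, not the settings used in this work.

```python
import math

def adagrad(grad, theta, learning_rate=0.5, steps=500, eps=1e-8):
    """Minimize an objective given its gradient function using ADAGRAD-style updates."""
    theta = list(theta)
    accum = [0.0] * len(theta)  # running sum of squared gradients, per parameter
    for _ in range(steps):
        g = grad(theta)
        for i, g_i in enumerate(g):
            accum[i] += g_i ** 2
            theta[i] -= learning_rate * g_i / (math.sqrt(accum[i]) + eps)
    return theta

def grad(theta):
    """Gradient of the hypothetical objective (theta_0 - 3)^2 + 10 * (theta_1 + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 20.0 * (theta[1] + 1.0)]

solution = adagrad(grad, [0.0, 0.0])
print(solution)
```

The per-parameter scaling lets both coordinates converge despite their different curvatures, which is one reason such methods are attractive when gradients differ widely in magnitude across layers.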

These basic building blocks of the modified CNN (convolutional layer, pooling layer and fully connected layer) remain the same for any task including activity recognition and comfort level estimation. Thus, the feature-learning network is more appealing than the feature-based network as it eliminates the requirement of domain expertise in designing the features for each task. The only expertise or experience needed is related to the systematic selection and tuning of the hyper-parameters. A demonstration of the proposed feature-learning method is presented in Sec. 4.

Case Studies and Discussion

To demonstrate the effectiveness of the architecture presented in Sec. 3 and feature-learning methods in general, two case studies are presented in this section—activity recognition and comfort level estimation. For activity recognition, a dataset from the University of California-Irvine (Irvine, CA) is used [39]. The dataset consists of smartphone sensor data collected from a number of users during various activities (walking, running, laying down, etc.). For comfort level estimation, experiments were conducted at the University at Buffalo, where a number of participants used a sensor-integrated shoe and responded to surveys pertaining to their comfort level while the sensors collected data. During the study, the participants performed various tasks (walking, walking upstairs, etc.). For both applications, the accuracy of the feature-learning method developed and described in Sec. 3, and traditional feature-based methods represented by a neural network and SVM are studied to demonstrate that feature-learning methods can be used for context identification without the requirement of domain expertise to extract features.

In product design, to understand the actual usage context and predict the comfort level, designers can avoid having to identify new features if the algorithm and architecture can automatically learn features that characterize the consumer usage context and comfort level from the data. In the remaining sections, first a general procedure used to train and develop the model is presented, followed by the details of the dataset. Descriptions of the features used in the feature-based methods are also presented along with the results of the comparison studies between the feature-learning and feature-based methods.

General Model Training Procedure.

For both feature-learning methods and feature-based methods, there are a number of hyper-parameters which need to be tuned during the model training to obtain the best performance. Hyper-parameters include but are not limited to the number of convolutional and pooling layers (for feature-learning methods), the number of fully connected layers, the size of the convolutional filters and pooling layers (for feature-learning methods), the number of hidden nodes in the fully connected layers, and the parameters related to the gradient based optimization algorithms used for training. While hyper-parameter tuning is still a topic of active research in the machine learning community [40], in this work, to obtain the best model, the procedure shown in Fig. 10 is used for both case studies. This procedure results in an optimized set of hyper-parameters and eliminates under- or over-fitting of the model.

As shown in Fig. 10, as the first step, the original dataset is randomly divided into three categories: a training set, a validation set, and a test set. The training and validation sets are used to perform cross-validation analysis, where algorithms with different hyper-parameters are tested. The training and validation datasets are combined prior to the cross-validation analysis. K-fold cross-validation analysis [41], with K = 5, is performed for both case studies. The model with the best validation accuracy is selected as the final model and is used to obtain the benchmark accuracy. The test dataset is not included in the cross-validation analysis to avoid biasing the model training. In this way, the true accuracy of the model is reflected since the model never uses any information related to the test dataset. Also, overfitting of the model is avoided in the cross-validation analysis, as the model with the best validation accuracy (and not training accuracy) is selected.
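The selection procedure can be sketched as follows. To keep the example self-contained, a one-dimensional k-nearest-neighbor classifier on synthetic data stands in for the actual models, and its number of neighbors plays the role of the hyper-parameter; the data, the 20% test fraction, and the candidate values are all hypothetical.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, n_neighbors):
    """Majority vote among the nearest training samples (stand-in model)."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, held_out, n_neighbors):
    """Fraction of held-out samples classified correctly."""
    return sum(knn_predict(train, x, n_neighbors) == y for x, y in held_out) / len(held_out)

rng = random.Random(0)
data = [(x, int(x > 0.6)) for x in (rng.random() for _ in range(200))]  # synthetic data
rng.shuffle(data)
test, development = data[:40], data[40:]  # the test set is held out of cross-validation

folds = k_fold_indices(len(development), 5, rng)  # K = 5
cv_scores = {}
for n_neighbors in (1, 5, 15):  # candidate hyper-parameter values
    fold_scores = []
    for fold in folds:
        held = set(fold)
        validation = [development[i] for i in fold]
        training = [s for i, s in enumerate(development) if i not in held]
        fold_scores.append(accuracy(training, validation, n_neighbors))
    cv_scores[n_neighbors] = sum(fold_scores) / len(fold_scores)

best = max(cv_scores, key=cv_scores.get)          # best mean validation accuracy
test_accuracy = accuracy(development, test, best)  # benchmark on unseen data
print(best, test_accuracy)
```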

Case Study 1—Activity Recognition.

To implement the architecture presented in Sec. 3 for usage context or activity recognition, a dataset from the University of California-Irvine is used [39].

Data Set Information.

The dataset includes data from 30 volunteers within the age group of 19–48 years [39]. Each performed six activities (walking, walking upstairs, walking downstairs, sitting, standing, and laying) wearing a smartphone on their waist. Using its embedded accelerometer and gyroscope, three-axial linear acceleration and three-axial angular velocity were captured at a constant rate of 50 Hz. The sensor signals were preprocessed by applying noise filters and then sampled in fixed width sliding windows of 2.56 s and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components. Therefore, a filter with 0.3 Hz cutoff frequency was used. For each record in the dataset, the following information is present:

  • raw tri-axial signals from the accelerometer and gyroscope of all the trials with the participants,

  • activity label, and

  • an identifier of the subject who carried out the experiment.

For this work, only the acceleration component obtained from the accelerometer is used. The dataset contained 10,929 total data points. For all algorithms implemented (feature-based and feature-learning), the training and testing procedure presented in Fig. 10 is followed.
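The fixed width sliding-window segmentation described above can be sketched as follows. The function name and the placeholder signal are hypothetical, and the noise filtering and Butterworth separation steps are omitted; only the windowing arithmetic (2.56 s at 50 Hz gives 128 readings, and 50% overlap gives a step of 64 readings) follows from the dataset description.

```python
def sliding_windows(signal, sample_rate_hz, window_seconds, overlap):
    """Split a signal into fixed width windows with fractional overlap."""
    window = int(round(window_seconds * sample_rate_hz))
    step = int(round(window * (1.0 - overlap)))
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, step)]

signal = list(range(50 * 10))  # placeholder for 10 s of one axis sampled at 50 Hz
windows = sliding_windows(signal, sample_rate_hz=50, window_seconds=2.56, overlap=0.5)
print(len(windows), len(windows[0]))
```

Each window then serves as one input sample for a given signal source; trailing readings that do not fill a complete window are discarded in this sketch.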

Feature Extraction (Feature-Based Methods).

For the feature-based methods, Table 1 shows the features and the corresponding feature-based classification multiclass support vector machine, which were programmed using Python. Using the features and the corresponding algorithms, the accuracy results are compared with the feature-learning methods in Sec. 4.2.3.

Accuracy Comparison.

For CNN, the architecture presented in Sec. 3 and represented by Fig. 9 is implemented where source 1, source 2, and source 3 correspond to the X-Axis, Y-Axis, and Z-Axis, respectively. The number of feature-learning layers (convolutional and pooling layer) is two, the number of convolutional filters is 128 with filter size of five samples, and the pooling size is four in both the layers. The feature-learning layer acts as an input to a fully connected network, where for both models two hidden layers are used with 2048 hidden units in each layer. The output layer is modeled as a softmax layer [16,18,23,29,42,43] and the cross-entropy loss function [16,18,23,29,42,43] is minimized using the ADAGRAD [34,35,37] optimization algorithm with a learning rate of 0.005. These hyper-parameters were obtained using the training procedure shown in Fig. 10 and were finalized based on K-Fold cross-validation analysis. The program for CNN is implemented using Python.
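The softmax output layer and cross-entropy loss mentioned above can be illustrated with a minimal sketch. The six class scores are hypothetical values standing in for the output of the last fully connected layer; subtracting the maximum score before exponentiation is a standard numerical-stability measure and does not change the result.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    """Cross-entropy loss for one sample with a single correct class."""
    return -math.log(probabilities[true_class])

scores = [2.0, 0.5, 0.1, -1.0, 0.0, 0.3]  # hypothetical scores for six activities
probs = softmax(scores)
loss = cross_entropy(probs, true_class=0)
print(probs.index(max(probs)), loss)
```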

For the neural networks (feature-based), two hidden layers are used with each layer having 2048 hidden units, and ADAGRAD is used as the optimization algorithm with a learning rate of 0.001. The SVM is modeled using a radial basis function as the kernel function [44]. Again, the hyper-parameters being reported are based on the training procedure shown in Fig. 10. The neural networks and SVM are implemented using MATLAB.

Using the test portion of the dataset, the overall accuracy of all algorithms (feature-based and feature-learning) is shown in Fig. 11. From Fig. 11, the accuracy of the feature-learning method (CNN) is much greater than the accuracy of the feature-based methods. The CNN developed its own features from the raw data directly and as such did not require domain expertise pertaining to human walking and signal processing. This result confirms the hypothesis presented in Sec. 1 that domain expertise required to understand the context of product usage via embedded sensors can be effectively eliminated by feature-learning methods. Since determining actual product usage contexts would be important for product design, these results demonstrate that the application of feature-learning methods can be very effective in product design without compromising the accuracy of context recognition.

The confusion matrices shown in Fig. 12 show the number of samples correctly and incorrectly classified (highlighted in gray and white, respectively). Using these classified samples, the predicted class accuracy, true class accuracy, and the overall accuracy of the algorithms are calculated. For example, in the case of CNN, for the true class “walking,” out of 1792 samples the algorithm correctly classifies 1750 samples as walking (highlighted in gray, true positives) while the rest were misclassified as other classes (highlighted in white, false negatives). Thus, the true class accuracy for walking is 97.66%. On the other hand, out of 1813 samples classified as walking, only 63 were misclassifications (false positives). Therefore, the predicted class accuracy for walking is 96.53%. Using all the correctly classified samples for all classes, the overall classification accuracy for CNN is 95.81%, demonstrating the superiority of the feature-learning algorithm over the feature-based algorithms for all activities.
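The two per-class accuracies can be reproduced from a confusion matrix as sketched below. For brevity, the six-class matrix is collapsed into “walking” versus “other” using only the counts quoted above; the count of correctly classified “other” samples is a placeholder that is not taken from Fig. 12 and does not affect the walking accuracies.

```python
def class_accuracies(confusion, cls):
    """Return (true class accuracy, predicted class accuracy) for one class.

    confusion[t][p] is the number of samples of true class t predicted as class p.
    """
    true_positive = confusion[cls][cls]
    actual_total = sum(confusion[cls])                       # row sum: all true samples
    predicted_total = sum(row[cls] for row in confusion)     # column sum: all predictions
    return true_positive / actual_total, true_positive / predicted_total

PLACEHOLDER = 1000  # hypothetical count, not a reported result
confusion = [
    [1750, 1792 - 1750],  # true walking: correctly classified, misclassified as other
    [63, PLACEHOLDER],    # true other: misclassified as walking, placeholder
]

true_acc, predicted_acc = class_accuracies(confusion, cls=0)
print(f"{true_acc:.4f} {predicted_acc:.4f}")
```

In standard terminology, the true class accuracy corresponds to recall and the predicted class accuracy to precision.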

In traditional feature-based machine learning methods, features must be developed according to the design task (activity recognition in this case). However, feature-learning methods automatically determine feature representations directly from raw data, which is an important property for product design since the same feature-learning algorithm can be used to support various product design objectives. This is demonstrated in Sec. 4.3 by estimating user comfort level using a sensor-integrated shoe.

Case Study 2—Comfort Estimation.

This section presents a case study using a sensor-integrated shoe that demonstrates the effectiveness of the feature-learning methods to estimate comfort ratings of users. Comfort estimation using sensor data is a novel challenge in product design, given the emerging access to product sensors and sensor data. The product description, along with the experimental protocol, dataset description, and comparison results are presented in Secs. 4.3.1–4.3.3.

Device Description, Experimental Protocol, and Data Set Information.

As a part of the case study, standard walking shoes were retrofitted with various sensors including eight force sensitive resistors (FSRs), one accelerometer, one flex sensor, and one temperature sensor. The FSRs collectively target the fore-, mid-, and hind-foot areas. The target areas, sensor layout, and the prototype of the sensor-integrated shoe insert are shown in Fig. 13. An Arduino Mega is used as the microcontroller to collect the data, which are stored on an SD card. The data are collected at a frequency of 22 Hz.

To collect data, student, staff, and faculty participants were recruited from the University at Buffalo. The shoe sizes were limited to women's sizes 7, 8, and 9 (US) and men's sizes 8, 9, and 10 (US). Participants were paid $20 as compensation for their participation. In total, 151 users participated in the study; however, data from only 142 users could be used for analysis. For the study, each participant completed surveys using their smartphone and walked on a designated path across the campus for 1 mile (approximately 25 min). The path included tasks like walking on a flat surface, walking upstairs, walking downstairs, sitting, and standing. At the end of the walking tasks, each participant responded to the survey questions: “How much pressure did you feel on your fore-foot while walking?,” “How much pressure did you feel on your midfoot while walking?,” and “How much pressure did you feel on your hind-foot (heel) while walking?” The ratings range from 1 (Not at all) to 7 (Very much).

As the ratings correspond to how much pressure the user feels (which relates to comfort), only sensor data from the FSRs are used. From Fig. 13, it can be seen that four FSRs correspond to the fore-foot rating, two FSRs correspond to the midfoot rating, and two FSRs correspond to the hind-foot rating. The generic feature-learning network shown in Fig. 9 is modified for each foot area to support the number of sensor sources. The sensor signals were preprocessed by applying noise filters and then sampled in fixed-width sliding windows of 1 min with 50% overlap. The dataset contained 17,773 data points. For all algorithms implemented (feature-based and feature-learning), the training and testing procedure presented in Fig. 10 is followed.
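The fixed-width windowing step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the preprocessing code used in the study: the function name `sliding_windows`, the array layout (samples by channels), and the synthetic recording are hypothetical, and the noise filtering step is omitted. Only the 22 Hz rate, the 1 min window, and the 50% overlap come from the text.

```python
import numpy as np

SAMPLE_RATE_HZ = 22  # sampling frequency reported for the shoe
WINDOW_SECONDS = 60  # window width of 1 min
OVERLAP = 0.5        # 50% overlap between consecutive windows

def sliding_windows(signal, rate_hz=SAMPLE_RATE_HZ,
                    seconds=WINDOW_SECONDS, overlap=OVERLAP):
    """Split a (samples, channels) array into fixed-width overlapping windows."""
    signal = np.asarray(signal)
    width = int(rate_hz * seconds)       # 1320 samples per window
    step = int(width * (1.0 - overlap))  # 660 samples between window starts
    starts = range(0, signal.shape[0] - width + 1, step)
    if len(starts) == 0:
        # Recording shorter than one window: return an empty set of windows.
        return np.empty((0, width) + signal.shape[1:])
    return np.stack([signal[s:s + width] for s in starts])

# Hypothetical recording: 5 min of four fore-foot FSR channels (random values).
rng = np.random.default_rng(0)
recording = rng.random((5 * 60 * SAMPLE_RATE_HZ, 4))
windows = sliding_windows(recording)
print(windows.shape)  # (9, 1320, 4)
```

With 50% overlap, the second half of each window is the first half of the next, so every sample away from the recording boundaries appears in two windows.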

Feature Extraction (Feature-Based Methods).

As the ratings relate to the sensation of pressure (and ultimately comfort ratings), for this case study, the features shown in Table 2 were extracted. It should be noted that these features are not exhaustive enough to represent comfort on their own, and new features ideally should be designed. This need for human intervention to determine adequate features is a disadvantage of feature-based methods.

Each signal window is 1 min long (1320 samples at the 22 Hz sampling frequency). Only neural networks (NN) are used as the feature-based method, and these results are compared to the feature-learning methods in Sec. 4.3.3.

Accuracy Comparison.

For the CNN, the architecture presented in Sec. 3 and represented by Fig. 9 is implemented. The number of feature-learning layers (convolutional and pooling layer) is two, the number of convolutional filters is 128 with a filter size of 25 samples, and the pooling size is five in both layers. The feature-learning layers act as the input to a fully connected network, where for both models one hidden layer with 2048 hidden units is used. The output layer is modeled as a softmax layer [16,18,23,29,42,43], and the cross-entropy loss function [16,18,23,29,42,43] is minimized using the ADAGRAD [34,35,37] optimization algorithm with a learning rate of 0.05. These hyper-parameters were obtained using the training procedure shown in Fig. 10.
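To give a sense of how the two feature-learning layers reduce a 1 min window, the Python sketch below traces the signal length through the layers. Several details are assumptions made only for illustration, because they are not reported in the text: a stride of one with no padding in the convolutions, non-overlapping pooling, and the same filter count and filter size in both layers. The helper names are hypothetical.

```python
def conv_output_length(n, filter_size):
    """Length after a stride-1 convolution with no padding (assumed)."""
    return n - filter_size + 1

def pool_output_length(n, pool_size):
    """Length after non-overlapping pooling (assumed), dropping any remainder."""
    return n // pool_size

FILTER_SIZE = 25   # filter size in samples, from the text
POOL_SIZE = 5      # pooling size, from the text
NUM_FILTERS = 128  # number of convolutional filters, from the text

length = 22 * 60   # 1 min window at 22 Hz = 1320 samples
lengths = [length]
for _ in range(2):  # two feature-learning layers
    length = conv_output_length(length, FILTER_SIZE)
    lengths.append(length)
    length = pool_output_length(length, POOL_SIZE)
    lengths.append(length)

flattened = length * NUM_FILTERS  # inputs to the fully connected layer
print(lengths)    # [1320, 1296, 259, 235, 47]
print(flattened)  # 6016
```

Under these assumptions, each filter summarizes the 1320-sample window in 47 values before the fully connected layer; a different stride or padding choice would change these numbers.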

The neural network (feature-based method) is modeled using two hidden layers with each layer having 2048 hidden units and ADAGRAD as the optimization algorithm with a learning rate of 0.005.
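For readers unfamiliar with ADAGRAD, the following Python sketch shows the per-parameter update rule on a toy quadratic objective. It is a generic illustration rather than the training code used in this work: the function name `adagrad_step`, the stabilizing constant `eps`, and the toy objective are assumptions, while the learning rate of 0.005 is the value reported for the feature-based network.

```python
import numpy as np

def adagrad_step(params, grads, accum, learning_rate=0.005, eps=1e-8):
    """One ADAGRAD update: scale each step by the root of the summed squared gradients."""
    accum = accum + grads ** 2
    params = params - learning_rate * grads / (np.sqrt(accum) + eps)
    return params, accum

# Toy objective f(w) = ||w||^2 with gradient 2w (hypothetical, for illustration).
w0 = np.array([1.0, -2.0])
first_w, first_accum = adagrad_step(w0, 2 * w0, np.zeros_like(w0))

w, accum = w0.copy(), np.zeros_like(w0)
for _ in range(100):
    w, accum = adagrad_step(w, 2 * w, accum)

print(first_w)
print(w)
```

On the first step, every parameter moves by approximately the learning rate regardless of gradient magnitude; later steps shrink as the squared gradients accumulate, which is why the learning rate is tuned separately for each model.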

The overall accuracy of all algorithms (feature-based and feature-learning) for all areas of the foot is shown in Fig. 14. From Fig. 14, it can be seen that the accuracy of the feature-learning methods is significantly higher than the accuracy of the feature-based methods for all areas of the foot.

Similar to the first case study, the results of the feature-based methods are highly dependent on the features that are designed. The accuracy of the feature-based methods could potentially be increased by improving the features designed and obtaining a better feature set to represent the sensor data. However, this process is laborious, requires significant domain expertise, and does not guarantee an improvement in accuracy. On the other hand, for feature-learning methods, domain knowledge is not required. Human input is needed to guide the hyper-parameter optimization, but even that process could be automated using principles from design of experiments [45]. Considering the increase in accuracy, the decrease in required effort, and the decrease in the required domain knowledge, the proposed feature-learning method outperforms the traditional machine learning methods.

In line with the long-term vision of CED presented in Sec. 1, Sec. 4.4 proposes a conceptual integration of machine-learning-based methods for identifying usage context with the current analytical procedure of CED. Testing the validity of such integration is a topic for future investigation.

Conceptual Integration With Cyber-Empathic Design Framework.

The analytical procedure of CED is based on structural equation modeling (SEM) [46], where user–product interaction data (obtained using sensors) are mapped onto a network of latent constructs. Specifically, features are extracted from the sensor data and used as input to the SEM, as marked in Fig. 15. CED has been shown to be effective in modeling users' perception about a product. However, it does not include usage context and its effect on the latent constructs. The conceptual integration of usage context using machine learning and sensor data is shown in Fig. 15.

Identified usage context can be used as an additional input to the SEM that ultimately models the relationship among various latent constructs. In this manner, usage context and its relationship with psychological latent constructs can be studied as product usage context is expected to affect the types of design influences and product attributes important to heterogeneous end-users. The approach presented in this work provides a mechanism for such work but validation of this integration is beyond the scope of this work.

Based on these results, conclusions and opportunities for future work are presented in Sec. 5.

Conclusion and Future Work

Usage context information is important when considering how to redesign consumer products. However, current research only accounts for intended usage context in design activities. Further, there are other latent variables like comfort, which also influence a customer's product preference. User-product interaction can be used to model both actual usage context and user comfort. Existing approaches like empathic design are not suitable to capture the actual usage context and comfort, as challenges in scale and subjective influences limit their applicability. In this work, to capture actual product usage context and comfort, data collected from sensors are leveraged. The collection of this type of data is possible because of the emerging IoT paradigm where sensors can be effectively embedded in products.

Machine learning is one of the most applicable methods to support the analysis of the data from such sensors, but these methods require features (representation of raw data) as inputs. Traditionally, these features are engineered by domain experts, which is a laborious process. In this work, feature-learning methods, traditionally used in image and speech recognition, are used to recognize the product usage context and estimate comfort ratings. Feature-learning methods allow features to be developed automatically and directly from the raw data. This can significantly reduce the time required for the development and analysis of sensor features, providing more effective design decision support information far more efficiently.

An important task for a product designer is to estimate and maximize a customer's comfort while also understanding appropriate use contexts. In this work, we study both context recognition and comfort rating estimation using unrelated datasets. Since these two tasks are related and could be dependent on one another, it is important to be able to modify the feature-learning algorithm in order to perform multiple tasks simultaneously (i.e., predict usage context and its related comfort rating). For feature-based methods, it would be difficult to manually design features that simultaneously predict usage context and comfort because features that are effective for usage context may not be as effective for comfort. However, for feature-learning methods, features could be automatically learned for both scenarios. Usage context and comfort rating data would need to be collected simultaneously and appropriate validation tests applied.

For the comfort estimation case study, the comfort rating was obtained at the end of a series of tasks performed by the users. Therefore, there was a latency period between the task and the comfort rating that could have influenced the responses. This latency period is not captured in the current feature-learning method. In addition, comfort is related to the customization or personalization of a product, which is not currently captured by the approach since the learned features represent the entire population of users. Ideally, the features should be learned for each user separately, especially in the case of comfort rating estimation and product design, which would allow personalized features to support efforts in product family design or mass customization. Finally, the approach should be expanded to include the full set of multimodal sensor data together including, in case study 2, the data from the force sensors and accelerometer. This more comprehensive data could be used to predict broader usage contexts and other user impacts such as fatigue or satisfaction.

2 Figure modified from original by Phulvar (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons.

Acknowledgment

Any findings and opinions reported in this work represent the authors' views and do not represent those of the NSF.

Funding Data

  • Directorate for Engineering, National Science Foundation (NSF) (Grant No. CMMI-1435479).

References

1. Dickson, P. R., 1982, “Person-Situation: Segmentation's Missing Link,” J. Mark., 46(4), pp. 56–64.
2. Belk, R. W., 1974, “An Exploratory Assessment of Situational Effects in Buyer Behavior,” J. Mark. Res., 11(2), pp. 156–163.
3. De la Fuente, J. R., and Guillen, M. J. Y., 2005, “Identifying the Influence of Product Design and Usage Situation on Consumer Choice,” Int. J. Mark. Res., 47(6), pp. 667–686. https://www.mrs.org.uk/ijmr_article/article/80992
4. Van Horn, D., and Lewis, K., 2015, “The Use of Analytics in the Design of Sociotechnical Products,” Artif. Intell. Eng. Des. Anal. Manuf., 29(01), pp. 65–81.
5. He, L., Chen, W., Hoyle, C., and Yannou, B., 2012, “Choice Modeling for Usage Context-Based Design,” ASME J. Mech. Des., 134(3), p. 031007.
6. Louviere, J. J., Hensher, D., Swait, J., and Adamowicz, W., 2000, Stated Choice Methods, Cambridge University Press, Cambridge, UK.
7. Klayman, J., and Ha, Y., 1987, “Confirmation, Disconfirmation, and Information in Hypothesis Testing,” Psychol. Rev., 94(2), pp. 211–228.
8. Burns, A., and Evans, S., 2001, “Empathic Design: A New Approach for Understanding and Delighting Customers,” Int. J. New Prod. Dev. Innovation Manage., 3(4), pp. 313–327.
9. Lin, J., and Seepersad, C. C., 2007, “Empathic Lead Users: The Effects of Extraordinary User Experiences on Customer Needs Analysis and Product Redesign,” ASME Paper No. DETC2007-35302.
10. Ghosh, D., Kim, J., Olewnik, A., Lakshmanan, A., and Lewis, K., 2016, “Cyber-Empathic Design—A Data Driven Framework for Product Design,” ASME Paper No. DETC2016-59642.
11. Ravi, N., Dandekar, N., Mysore, P., and Littman, M. L., 2005, “Activity Recognition From Accelerometer Data,” 17th Conference on Innovative Applications of Artificial Intelligence (IAAI), Pittsburgh, PA, July 9–13, pp. 1541–1546. https://pdfs.semanticscholar.org/20cb/9de9921d7efbc1add2848239d7916bf158b2.pdf
12. Anguita, D., Ghio, A., Oneto, L., Parra, X., and Reyes-Ortiz, J. L., 2012, “Human Activity Recognition on Smartphones Using a Multiclass Hardware-Friendly Support Vector Machine,” Ambient Assisted Living and Home Care, J. Bravo, R. Hervás, and M. Rodríguez, eds., Springer, Berlin, pp. 216–223.
13. Mannini, A., and Sabatini, A. M., 2010, “Machine Learning Methods for Classifying Human Physical Activity From On-Body Accelerometers,” Sensors, 10(2), pp. 1154–1175.
14. Kwapisz, J. R., Weiss, G. M., and Moore, S. A., 2011, “Activity Recognition Using Cell Phone Accelerometers,” ACM SigKDD Explor. Newsl., 12(2), pp. 74–82.
15. Bao, L., and Intille, S. S., 2004, “Activity Recognition From User-Annotated Acceleration Data,” Pervasive Computing, Springer-Verlag, Berlin, pp. 1–17.
16. Tapia, E. M., 2008, “Using Machine Learning for Real-Time Activity Recognition and Estimation of Energy Expenditure,” Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA. https://dspace.mit.edu/handle/1721.1/44913
17. LeCun, Y., and Bengio, Y., 1995, “Convolutional Networks for Images, Speech, and Time Series,” The Handbook of Brain Theory and Neural Networks, M. A. Arbib, ed., MIT Press, Cambridge, MA.
18. Krizhevsky, A., Sutskever, I., and Hinton, G. E., 2012, “ImageNet Classification With Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems, Curran Associates, Red Hook, NY, pp. 1097–1105.
19. Längkvist, M., Karlsson, L., and Loutfi, A., 2014, “A Review of Unsupervised Feature Learning and Deep Learning for Time-Series Modeling,” Pattern Recognit. Lett., 42, pp. 11–24.
20. Kotsiantis, S. B., Zaharakis, I. D., and Pintelas, P. E., 2006, “Machine Learning: A Review of Classification and Combining Techniques,” Artif. Intell. Rev., 26(3), pp. 159–190.
21. Bengio, Y., Courville, A., and Vincent, P., 2013, “Representation Learning: A Review and New Perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., 35(8), pp. 1798–1828.
22. Auer, P., Burgsteiner, H., and Maass, W., 2008, “A Learning Rule for Very Simple Universal Approximators Consisting of a Single Layer of Perceptrons,” Neural Networks, 21(5), pp. 786–795.
23. Huang, G.-B., Wang, D. H., and Lan, Y., 2011, “Extreme Learning Machines: A Survey,” Int. J. Mach. Learn. Cybern., 2(2), pp. 107–122.
24. Cristianini, N., and Shawe-Taylor, J., 2000, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK.
25. Maurer, U., Smailagic, A., Siewiorek, D. P., and Deisher, M., 2006, “Activity Recognition and Monitoring Using Multiple Sensors on Different Body Positions,” International Workshop on Wearable and Implantable Body Sensor Networks (BSN), Cambridge, MA, Apr. 3–5, pp. 113–116.
26. Cireşan, D. C., Meier, U., Gambardella, L. M., and Schmidhuber, J., 2000, “Deep, Big, Simple Neural Nets for Handwritten Digit Recognition,” Neural Comput., 22(12), pp. 3207–3220.
27. Nielsen, M. A., 2015, “Neural Networks and Deep Learning,” Determination Press, accessed Sept. 25, 2015, http://neuralnetworksanddeeplearning.com/
28. Haykin, S., 1998, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, Upper Saddle River, NJ.
29. Schmidhuber, J., 2015, “Deep Learning in Neural Networks: An Overview,” Neural Networks, 61, pp. 85–117.
30. Lawrence, S., Giles, C. L., Tsoi, A. C., and Back, A. D., 1997, “Face Recognition: A Convolutional Neural-Network Approach,” IEEE Trans. Neural Networks, 8(1), pp. 98–113.
31. Levine, S., Finn, C., Darrell, T., and Abbeel, P., 2016, “End-To-End Training of Deep Visuomotor Policies,” J. Mach. Learn. Res., 17(39), pp. 1–40.
32. He, K., Zhang, X., Ren, S., and Sun, J., 2015, “Deep Residual Learning for Image Recognition,” Preprint arXiv:1512.03385. https://arxiv.org/abs/1512.03385
33. Bottou, L., 2012, “Stochastic Gradient Tricks,” Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K. R. Müller, eds., Springer, Berlin, pp. 430–435.
34. LeCun, Y., Bottou, L., Orr, G. B., and Müller, K. R., 1998, “Efficient BackProp,” Neural Networks: Tricks of the Trade, G. B. Orr and K. R. Müller, eds., Springer, Berlin, pp. 9–50.
35. Bengio, Y., 2012, “Practical Recommendations for Gradient-Based Training of Deep Architectures,” Neural Networks: Tricks of the Trade, G. Montavon, G. B. Orr, and K. R. Müller, eds., Springer, Berlin, pp. 437–478.
36. Sutskever, I., Martens, J., Dahl, G. E., and Hinton, G. E., 2013, “On the Importance of Initialization and Momentum in Deep Learning,” Int. Conf. Mach. Learn., 28(3), pp. 1139–1147. http://proceedings.mlr.press/v28/sutskever13.html
37. Hadgu, A. T., Nigam, A., and Diaz-Aviles, E., 2015, “Large-Scale Learning With ADAGRAD on Spark,” IEEE International Conference on Big Data (Big Data), Santa Clara, CA, Oct. 29–Nov. 1, pp. 2828–2830.
38. Kingma, D., and Ba, J., 2014, “ADAM: A Method for Stochastic Optimization,” Preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980
39. Anguita, D., Ghio, A., Oneto, L., Parra, X., and Reyes-Ortiz, J. L., 2013, “A Public Domain Dataset for Human Activity Recognition Using Smartphones,” 21st European Symposium on Artificial Neural Networks, Bruges, Belgium, Apr. 24–26, pp. 437–442.
40. Bergstra, J. S., Bardenet, R., Bengio, Y., and Kégl, B., 2011, “Algorithms for Hyper-Parameter Optimization,” Advances in Neural Information Processing Systems, Granada, Spain, Dec. 12–14, pp. 2546–2554.
41. Refaeilzadeh, P., Tang, L., and Liu, H., 2009, “Cross-Validation,” Encyclopedia of Database Systems, Springer, New York, pp. 532–538.
42. Luštrek, M., and Kaluža, B., 2009, “Fall Detection and Activity Recognition With Machine Learning,” Informatica, 33(2), pp. 197–204. http://www.informatica.si/index.php/informatica/article/view/238/235
43. Bottou, L., 2010, “Large Scale Machine Learning With Stochastic Gradient Descent,” International Conference on Computational Statistics (COMPSTAT), Paris, France, Aug. 22–27, pp. 177–186. https://www.rocq.inria.fr/axis/COMPSTAT2010/slides/slides_17.pdf
44. Scholkopf, B., Sung, K.-K., Burges, C. J., Girosi, F., Niyogi, P., Poggio, T., and Vapnik, V., 1997, “Comparing Support Vector Machines With Gaussian Kernels to Radial Basis Function Classifiers,” IEEE Trans. Signal Process., 45(11), pp. 2758–2765.
45. Montgomery, D. C., 2012, Design and Analysis of Experiments, 8th ed., Wiley, Hoboken, NJ.
46. Anderson, J. C., and Gerbing, D. W., 1988, “Structural Equation Modeling in Practice: A Review and Recommended Two-Step Approach,” Psychol. Bull., 103(3), pp. 411–423.