Sensor signals acquired during the manufacturing process contain rich information that can be used to facilitate effective monitoring of operational quality, early detection of system anomalies, and quick diagnosis of fault root causes. This paper develops a method for effective monitoring and diagnosis of multisensor heterogeneous profile data based on multilinear discriminant analysis. The proposed method operates directly on the multistream profiles and extracts uncorrelated discriminative features through tensor-to-vector projection, thereby preserving the interrelationship of different sensors. The extracted features are then fed into classifiers to detect faulty operations and recognize fault types. The developed method is demonstrated with both simulated and real data from ultrasonic metal welding.

## Introduction

The wide applications of low-cost and smart sensing devices along with fast and advanced computer systems have resulted in a data-rich environment, which makes a large amount of data available in many applications. Sensor signals acquired during the process contain rich information that can be used to facilitate effective monitoring of operational quality, early detection of system anomalies, and quick diagnosis of fault root causes. In discrete manufacturing and many other applications, the sensor measurements provided by online sensing and data capturing technology are time- or spatial-dependent functional data, also called profile data [1,2]. In this paper, we are particularly interested in cycle-based profile data in ultrasonic metal welding [3], which are collected from repetitive operational cycles of the discrete manufacturing process.

Ultrasonic welding is a solid-state bonding process that uses high frequency ultrasonic energy to generate oscillating shears between metal sheets clamped under pressure [4,5], as illustrated in Fig. 1. The advantages of ultrasonic welding in joining dissimilar and conductive materials have been well recognized [6]. As electric car sales have accelerated and production has scaled up in recent years, ultrasonic welding has been increasingly adopted for joining lithium-ion batteries for electric vehicles [4]. Tensile tests are conducted to study the joint's fracture, load-extension relationship, and tensile strength [3,4,6,7]. However, since tensile testing is destructive and can only be performed offline, it is important to develop in situ monitoring and evaluation to provide opportunities for a faster implementation of corrective actions.

To facilitate in-process quality monitoring and fault diagnosis in ultrasonic welding, sensors were installed in the welding machine to collect in situ signals such as the ultrasonic power and the displacement between horn and anvil. Figure 2 shows two signals for four samples. Lee et al. found that there is correlation between online sensor signals and weld attributes [3]. Guo et al. developed an online monitoring algorithm to ensure weld quality and detect bad welds [8]. However, these studies only analyzed certain features from the sensor signals, such as the maximum power and maximum displacement, while the rich information hidden in the real-time signals was not extracted or explored. Moreover, existing studies are limited to either characterizing weld attributes or detecting bad welds, while fault diagnosis for bad welds has not been investigated. Furthermore, multiple signals need to be modeled together since a single signal may not be informative enough for fault identification. Therefore, this paper aims to develop a new method in sensor fusion and fault diagnosis to enable in situ nondestructive evaluation of ultrasonic metal welding.

There is extensive research on the modeling and monitoring of cycle-based profile data in the literature, including both linear profiles and nonlinear profiles. An overview of parametric and nonparametric approaches for profile data as well as application domains can be found in Kuljanic et al. [9]. In recent years, there is a strong industrial interest for multisignal applications, especially in cases where a single signal does not provide enough information for decision making. This leads to an increasing demand for multisensor fusion methods to analyze the multiple signals captured from different sensors for process monitoring and system diagnostics purposes.

There have been many research efforts on multisensor data fusion in manufacturing operations, for example, chatter detection in milling [10], tool condition monitoring [11,12], engine fault diagnosis [13], etc. A large portion of the multisensor data fusion methods is based on extracting a single synthetic index from the monitoring signals, e.g., peak value, a weighted summation of signals, etc. The main limitations of this approach include the loss of information involved in the feature extraction process, the loss of sensor-to-sensor correlations, and the problem-dependent nature of the synthesizing scheme. Although profile monitoring techniques have been demonstrated to be more effective than synthetic index-based methods in monitoring processes characterized by repeating patterns [9], only a few authors have studied profile monitoring approaches in the field of sensor fusion [14–16]. Recently, with the fast development of multilinear methods for face recognition, Paynabar et al. [17] proposed a multichannel profile monitoring and fault diagnosis method based on uncorrelated multilinear principal component analysis (UMPCA) [18], whereas Grasso et al. [19] investigated the problem of multistream profile monitoring using multilinear PCA (MPCA) [20]. Multichannel profiles are homogeneous, in which all sensors measure the same variable, whereas multistream signals are heterogeneous, in which various sensors measure different variables.

In this study, we investigate the use of multilinear extensions of linear discriminant analysis (LDA) to deal with multistream signals for the purpose of process monitoring and fault diagnosis. LDA has been widely used as an effective tool for dimension reduction and discriminant analysis of complex data. Regular LDA is a linear algorithm that can only operate on vectors, thus cannot be directly applied to multistream profiles. To apply LDA to multistream profiles, these profiles need to be combined and reshaped (vectorized) into vectors first. Therefore, this method is referred to as vectorized LDA (VLDA). Applying LDA to this high-dimensional vector creates high computational complexity due to the dimension of scatter matrices. Moreover, vectorization breaks the natural structure and correlation in the original data, e.g., sensor-to-sensor correlation, and potentially loses more useful representations that can be obtained in the original form. Lu et al. [21] introduced an uncorrelated multilinear LDA (UMLDA) framework as an alternative to VLDA. UMLDA is a multilinear dimensionality reduction and feature extraction method that operates directly on the multidimensional objects, known as tensor objects, rather than their vectorized versions. The UMLDA extracts uncorrelated discriminative features directly from tensorial data through solving a tensor-to-vector projection (TVP). Although MPCA and UMPCA are also multilinear subspace feature extraction algorithms operating directly on the tensorial representations, similar to PCA, they are both unsupervised methods that do not make use of the class information. In manufacturing and many other applications, training samples from various classes can be easily collected in an efficient manner. In these applications, supervised multilinear methods like UMLDA take class information into consideration and thus may be more suitable for fault recognition. 
Although there is some exploratory research on the applications of UMLDA to image processing on face and gait recognition tasks [21], very little research could be found in the literature on using the UMLDA technique for analyzing multistream nonlinear profiles for the purpose of fault detection and diagnosis.

Therefore, the main objective of this paper is to develop a UMLDA-based approach for analyzing multistream in situ profiles in ultrasonic welding that considers the interrelationship of sensors. The features extracted by the UMLDA-based method can effectively discriminate different classes and provide fault diagnosis results. The effectiveness of the proposed method is tested on both simulations and a real-world case study in the ultrasonic metal welding process.

The remainder of this paper is organized as follows. Section 2 presents the method for analysis and dimension reduction of multistream profiles using UMLDA. VLDA is also reviewed in this section. Section 3 compares the proposed UMLDA-based method with VLDA and its variants, and other competitor methods including UMPCA-based and MPCA-based methods in the performance of extracting discriminative features and recognizing the type of faults. A case study of an ultrasonic metal welding process is given in Sec. 4. Finally, Sec. 5 concludes the paper with the discussion of broader impacts.

## Dimension Reduction of Multistream Signals: UMLDA and VLDA

Multiway data analysis is the extension of two-way methods to higher-order datasets. This section first reviews the basic notations and concepts in multilinear algebra and then introduces the implementation of UMLDA and VLDA for the purpose of dimensionality reduction in handling multistream signals. More details on the theoretical foundations of the mathematical development of UMLDA can be found in Refs. [22–24]. The algorithm we use in this paper for extracting uncorrelated features from tensor data is based on the theories presented in those articles.

### Basic Multilinear Algebra Concepts and Tensor-to-Vector Projection.

An $L$-way array $\mathcal{A}$ is an $L$th-order tensor object $\mathcal{A}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_L}$ such that $I_l$ represents the dimension of the $l$-mode, $l = 1, \ldots, L$, where the term *mode* refers to a generic set of entities [25]. The $l$-mode vectors of $\mathcal{A}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_L}$ are defined as the $I_l$-dimensional vectors obtained from $\mathcal{A}$ by varying the index $i_l$ ($i_l = 1, \ldots, I_l$) while keeping all the other indices fixed. In multilinear algebra, a matrix $\mathbf{A}$ can be considered to be a second-order tensor. The column vectors and row vectors are considered as the 1-mode and 2-mode vectors of the matrix, respectively. The $l$-mode product of a tensor $\mathcal{A}$ by a matrix $\mathbf{U}\in\mathbb{R}^{J_l\times I_l}$, denoted by $\mathcal{A}\times_l\mathbf{U}$, is a tensor with entries $(\mathcal{A}\times_l\mathbf{U})(i_1,\ldots,i_{l-1},j_l,i_{l+1},\ldots,i_L)=\sum_{i_l}\mathcal{A}(i_1,\ldots,i_L)\cdot\mathbf{U}(j_l,i_l)$.
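As a concrete illustration of the $l$-mode product in the second-order case, the following is a minimal sketch in plain Python. The function name `mode_product` and the nested-list representation are hypothetical choices made only to keep the example dependency-free; they are not part of the cited algorithms. For a matrix $\mathbf{A}$, the 1-mode product with $\mathbf{U}$ reduces to $\mathbf{U}\mathbf{A}$ and the 2-mode product to $\mathbf{A}\mathbf{U}^T$.

```python
def mode_product(A, U, mode):
    """l-mode product of a second-order tensor A (I1 x I2, nested lists)
    with a matrix U (J x I_mode). Illustrative helper, not from the paper."""
    I1, I2 = len(A), len(A[0])
    J = len(U)
    if mode == 1:
        # (A x_1 U)(j, i2) = sum over i1 of A(i1, i2) * U(j, i1)  -> J x I2
        return [[sum(A[i1][i2] * U[j][i1] for i1 in range(I1))
                 for i2 in range(I2)] for j in range(J)]
    if mode == 2:
        # (A x_2 U)(i1, j) = sum over i2 of A(i1, i2) * U(j, i2)  -> I1 x J
        return [[sum(A[i1][i2] * U[j][i2] for i2 in range(I2))
                 for j in range(J)] for i1 in range(I1)]
    raise ValueError("mode must be 1 or 2 for a second-order tensor")


# Toy values, chosen only for illustration.
A = [[1, 2, 3],
     [4, 5, 6]]      # I1 = 2, I2 = 3
U1 = [[1, 0],
      [1, 1]]        # J1 x I1 = 2 x 2
U2 = [[1, 1, 1]]     # J2 x I2 = 1 x 3

B1 = mode_product(A, U1, mode=1)   # equals the matrix product U1 A
B2 = mode_product(A, U2, mode=2)   # equals the matrix product A U2^T
```

The 1-mode product mixes the rows (1-mode index) and leaves the column index untouched, while the 2-mode product does the opposite; this is the property the multilinear projections in this section rely on.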

To project tensorial data into a subspace for better discrimination, there are two general forms of multilinear projection: the tensor-to-tensor projection (TTP) and the tensor-to-vector projection (TVP). The TVP projects a tensor to a vector, and it can be viewed as multiple projections from a tensor to a scalar. A tensor $\mathcal{A}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_L}$ can be projected to a point $y$ through $L$ unit projection vectors $\{\mathbf{u}^{(1)T},\mathbf{u}^{(2)T},\ldots,\mathbf{u}^{(L)T}\}$ as $y=\mathcal{A}\times_1\mathbf{u}^{(1)T}\times_2\mathbf{u}^{(2)T}\times\cdots\times_L\mathbf{u}^{(L)T}$, $\mathbf{u}^{(l)}\in\mathbb{R}^{I_l\times 1}$, $\|\mathbf{u}^{(l)}\|=1$ for $l = 1, \ldots, L$, where $\|\cdot\|$ is the Euclidean norm for vectors. This projection $\{\mathbf{u}^{(1)T},\mathbf{u}^{(2)T},\ldots,\mathbf{u}^{(L)T}\}$ is called an elementary multilinear projection (EMP), which is the projection of a tensor on a single line (resulting in a scalar), and it consists of one projection vector in each mode. The TVP of a tensor object $\mathcal{A}$ to a vector $\mathbf{y}\in\mathbb{R}^{P}$ in a $P$-dimensional vector space consists of $P$ EMPs, which can be written as $\{\mathbf{u}_p^{(1)T},\mathbf{u}_p^{(2)T},\ldots,\mathbf{u}_p^{(L)T}\}_{p=1,\ldots,P}=\{\mathbf{u}_p^{(l)T},l=1,\ldots,L\}_{p=1}^{P}$. The TVP from $\mathcal{A}$ to $\mathbf{y}$ is then written as $\mathbf{y}=\mathcal{A}\times_{l=1}^{L}\{\mathbf{u}_p^{(l)T},l=1,\ldots,L\}_{p=1}^{P}$, where the $p$th component of $\mathbf{y}$ is obtained from the $p$th EMP as $\mathbf{y}(p)=\mathcal{A}\times_1\mathbf{u}_p^{(1)T}\times_2\mathbf{u}_p^{(2)T}\times\cdots\times_L\mathbf{u}_p^{(L)T}$.

In the frame of multistream profile data, the simplest $L$-way array representing the signals is a third-order tensor object $\mathcal{A}\in\mathbb{R}^{I_1\times I_2\times M}$ such that $I_1$ is the number of sensors, $I_2$ is the number of data points collected on each profile, and $M$ is the number of multistream profiles or samples. Note that more articulated datasets may be generated by introducing additional modes, e.g., by adding a further mode to group together different families of sensors.
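To make the TVP concrete for a single multistream sample (a sensors-by-time matrix), the sketch below evaluates each EMP as the scalar $\mathbf{u}^{(1)T}\mathbf{A}\,\mathbf{u}^{(2)}$ and stacks the results into a feature vector. The helper names `emp`, `tvp`, and `is_unit`, and the toy numbers, are assumptions introduced only for illustration.

```python
import math

def emp(A, u1, u2):
    """Elementary multilinear projection of a sensors-by-time sample A:
    y = A x_1 u1^T x_2 u2^T = u1^T A u2 (a scalar)."""
    return sum(u1[i] * A[i][j] * u2[j]
               for i in range(len(u1)) for j in range(len(u2)))

def tvp(A, emps):
    """Tensor-to-vector projection: one scalar feature per EMP."""
    return [emp(A, u1, u2) for (u1, u2) in emps]

def is_unit(u, tol=1e-12):
    """Check the unit-norm requirement on a projection vector."""
    return abs(math.sqrt(sum(v * v for v in u)) - 1.0) < tol

# Toy sample: I1 = 2 sensors, I2 = 3 time points (made-up values).
sample = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
emps = [([1.0, 0.0], [0.0, 1.0, 0.0]),   # reads sensor 1 at time point 2
        ([0.6, 0.8], [1.0, 0.0, 0.0])]   # weights both sensors at time point 1
features = tvp(sample, emps)             # P = 2 scalar features
```

Each EMP carries one weight vector over sensors and one over time points, so a feature can combine information across sensors without first flattening the sample into a vector.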

### The UMLDA Approach.

Multilinear subspace feature extraction algorithms that operate directly on tensor objects without changing their tensorial structure are emerging. Since LDA is a classical algorithm that has been applied successfully in a wide range of applications, several multilinear extensions of it have been proposed, generally named multilinear discriminant analysis (MLDA). Unlike the features produced by classical LDA, however, the projected tensors obtained from MLDA are correlated. To overcome this issue, Lu et al. [21] proposed UMLDA, in which a TVP is used for projection. In this subsection, we review this UMLDA method.

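The derivation of the UMLDA algorithm follows the classic LDA derivation of minimizing the within-class distance and maximizing the between-class distance simultaneously, thus achieving maximum discrimination. A number of EMPs are solved one by one to maximize the discriminant criterion with an enforced zero-correlation constraint. To formulate the UMLDA problem, let $\{y_{m_p}, m=1,\ldots,M\}$ denote the $p$th projected scalar features, where $M$ is the number of training samples and $y_{m_p}$ is the projection of the $m$th sample $\mathcal{A}_m$ by the $p$th EMP $\{\mathbf{u}_p^{(1)T},\mathbf{u}_p^{(2)T}\}$:

$$y_{m_p}=\mathcal{A}_m\times_1\mathbf{u}_p^{(1)T}\times_2\mathbf{u}_p^{(2)T}$$

The between-class scatter and the within-class scatter of the $p$th projected features are

$$S_{B_p}^{y}=\sum_{c=1}^{C}N_c\left(\bar{y}_{c_p}-\bar{y}_p\right)^2,\qquad S_{W_p}^{y}=\sum_{m=1}^{M}\left(y_{m_p}-\bar{y}_{c_{m_p}}\right)^2$$

where $C$ is the number of classes, $N_c$ is the number of samples for class $c$, $c_m$ is the class label for the $m$th training sample, $\bar{y}_p=(1/M)\sum_m y_{m_p}=0$ assuming the training samples are zero-mean, $\bar{y}_{c_p}=(1/N_c)\sum_{m,c_m=c}y_{m_p}$, and $\bar{y}_{c_{m_p}}$ is the mean of the class that $y_{m_p}$ belongs to. Let $\mathbf{g}_p$ denote the $p$th coordinate vector and $\mathbf{g}_p(m)=y_{m_p}$. The objective of UMLDA is to determine a set of $P$ EMPs that maximize the scatter ratio while producing uncorrelated features. The mathematical formulation of UMLDA can be written as

$$\{\mathbf{u}_p^{(1)T},\mathbf{u}_p^{(2)T}\}=\arg\max\frac{S_{B_p}^{y}}{S_{W_p}^{y}}\quad\text{subject to}\quad\frac{\mathbf{g}_p^{T}\mathbf{g}_q}{\|\mathbf{g}_p\|\,\|\mathbf{g}_q\|}=\delta_{pq},\quad p,q=1,\ldots,P$$

where $\delta_{pq}=1$ for $p=q$ and $\delta_{pq}=0$ otherwise.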

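The $P$ EMPs $\{\mathbf{u}_p^{(1)T},\mathbf{u}_p^{(2)T}\}_{p=1}^{P}$ are determined sequentially in $P$ steps, with the $p$th step obtaining the $p$th EMP. The implementation of UMLDA given by Lu et al. [21] for the purpose of face recognition introduces a regularization parameter $\gamma$ (regularized UMLDA (R-UMLDA)). To solve for $\mathbf{u}_p^{(l^*)}$ in the $l^*$-mode, assuming that $\{\mathbf{u}_p^{(l)},l\neq l^*\}$ is given, the tensor samples are projected in these ($L-1$) modes $\{l\neq l^*\}$ to obtain vectors $\tilde{\mathbf{y}}_{m_p}^{(l^*)}=\mathcal{A}_m\times_{l=1,\,l\neq l^*}^{L}\{\mathbf{u}_p^{(l)T},\,l=1,\ldots,l^*-1,l^*+1,\ldots,L\}$. The regularized within-class scatter matrix $\tilde{\mathbf{S}}_{W_p}^{(l^*)}$ is defined as

$$\tilde{\mathbf{S}}_{W_p}^{(l^*)}=\sum_{m=1}^{M}\left(\tilde{\mathbf{y}}_{m_p}^{(l^*)}-\bar{\tilde{\mathbf{y}}}_{c_{m_p}}^{(l^*)}\right)\left(\tilde{\mathbf{y}}_{m_p}^{(l^*)}-\bar{\tilde{\mathbf{y}}}_{c_{m_p}}^{(l^*)}\right)^{T}+\gamma\cdot\lambda_{\max}\left(\check{\mathbf{S}}_{W}^{(l^*)}\right)\cdot\mathbf{I}_{I_{l^*}}$$

where $\bar{\tilde{\mathbf{y}}}_{c_{m_p}}^{(l^*)}$ is the mean of the projected vectors in the class of the $m$th sample, $\gamma\geq 0$ is a regularization parameter, $\mathbf{I}_{I_{l^*}}$ is an identity matrix of size $I_{l^*}\times I_{l^*}$, and $\lambda_{\max}(\check{\mathbf{S}}_{W}^{(l^*)})$ is the maximum eigenvalue of $\check{\mathbf{S}}_{W}^{(l^*)}$, which is the within-class scatter matrix for the $l^*$-mode vectors of the training samples.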

The purpose of introducing the regularization parameter is to improve the UMLDA algorithm under the small sample size scenario, where the dimensionality of the input data is high, but the number of training samples for some classes is too small to represent the true characteristics of their classes. This is a common case in small-scale production such as prototyping or personalized production. This scenario may also occur when a certain type of fault exists but is rare, so that the data from that fault case are limited. If the number of training samples is too small, the iterations tend to minimize the within-class scatter toward zero in order to maximize the scatter ratio. Having a regularization parameter in the within-class scatter ensures that, during the iteration, less focus is put on shrinking the within-class scatter. The basic UMLDA is obtained by setting *γ* = 0.
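The effect of the regularization term can be seen on a toy rank-deficient scatter matrix. The sketch below assumes that the regularizer enters additively as $\gamma\cdot\lambda_{\max}\cdot\mathbf{I}$, following the R-UMLDA formulation of Lu et al. [21]; the function names `regularize_scatter` and `det2`, the 2 × 2 matrix, and the value used for $\lambda_{\max}$ are hypothetical and chosen only for illustration.

```python
def regularize_scatter(S_w, lam_max, gamma):
    """Return S_w + gamma * lam_max * I (additive form, assumed here)."""
    n = len(S_w)
    return [[S_w[i][j] + (gamma * lam_max if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def det2(S):
    """Determinant of a 2 x 2 matrix."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

# Toy within-class scatter from too few samples: rank 1, hence singular.
S_w = [[2.0, 2.0],
       [2.0, 2.0]]
lam_max = 4.0   # ASSUMPTION: stand-in for the largest eigenvalue of the
                # mode-wise within-class scatter of the training samples

S_reg = regularize_scatter(S_w, lam_max, gamma=0.001)
S_basic = regularize_scatter(S_w, lam_max, gamma=0.0)   # basic UMLDA
```

With *γ* = 0 the matrix is unchanged and remains singular, whereas a small positive *γ* lifts every eigenvalue by $\gamma\lambda_{\max}$, so the within-class scatter can no longer be driven to zero along any direction.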

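Let $A$ denote the number of R-UMLDA feature extractors to be aggregated. To classify a test sample $\mathcal{A}$, it is first projected to $A$ feature vectors $\{\mathbf{y}^{(a)}\}_{a=1,\ldots,A}$ using the $A$ TVPs. Next, for the $a$th R-UMLDA feature extractor, the nearest-neighbor distance of the test sample $\mathcal{A}$ to each candidate class $c$ is

$$d(\mathcal{A},c,a)=\min_{m,\,c_m=c}\left\|\mathbf{y}^{(a)}-\mathbf{y}_m^{(a)}\right\|$$

where $\mathbf{y}_m^{(a)}$ is the feature vector of the $m$th training sample obtained from the $a$th extractor. These distances are combined over the $A$ extractors into an aggregated distance $d(\mathcal{A},c)$. Therefore, the test sample $\mathcal{A}$ is assigned the label $c^*=\arg\min_c d(\mathcal{A},c)$.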
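A minimal sketch of this aggregation step is given below in plain Python. It assumes, following the aggregation scheme of Lu et al. [21], that the per-extractor nearest-neighbor distances are rescaled to [0, 1] before being summed; that rescaling, the function names, and the toy features are assumptions for illustration rather than a statement of the exact procedure used here.

```python
import math

def nn_class_distances(y_test, train_feats, train_labels):
    """Nearest-neighbor Euclidean distance from y_test to each class."""
    dists = {}
    for y, c in zip(train_feats, train_labels):
        d = math.dist(y_test, y)
        dists[c] = min(d, dists.get(c, float("inf")))
    return dists

def aggregate_and_classify(test_feats, train_feats, train_labels):
    """test_feats[a] is the test feature vector from extractor a;
    train_feats[a] is the list of training feature vectors from extractor a."""
    total = {}
    for y_test, feats in zip(test_feats, train_feats):
        d = nn_class_distances(y_test, feats, train_labels)
        lo, hi = min(d.values()), max(d.values())
        span = (hi - lo) or 1.0               # avoid division by zero
        for c, v in d.items():
            # ASSUMPTION: rescale to [0, 1] per extractor, then sum
            total[c] = total.get(c, 0.0) + (v - lo) / span
    return min(total, key=total.get)          # label with smallest distance

# Two extractors (A = 2), one-dimensional toy features, two classes.
train_labels = [0, 0, 1, 1]
train_feats = [[[0.0], [0.2], [1.0], [1.2]],
               [[5.0], [5.1], [9.0], [9.2]]]
label_a = aggregate_and_classify([[0.1], [5.3]], train_feats, train_labels)
label_b = aggregate_and_classify([[1.1], [9.1]], train_feats, train_labels)
```

Rescaling before summation keeps an extractor with a large feature range from dominating the vote, which matters when the extractors are regularized differently.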

### The VLDA Approach.

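In VLDA, the multistream profiles are first vectorized: the tensor object $\mathcal{A}\in\mathbb{R}^{I_1\times I_2\times M}$ is reshaped into a matrix $\mathbf{A}\in\mathbb{R}^{I_1I_2\times M}$, where $I_1$ is the number of sensors, $I_2$ is the number of data points collected on each profile, and $M$ is the number of samples. The classical LDA is then performed on matrix $\mathbf{A}$. What we seek is a transformation matrix $\mathbf{W}$ that maximizes the ratio of the between-class scatter to the within-class scatter

$$\mathbf{W}=\arg\max_{\mathbf{W}}\frac{\left|\mathbf{W}^{T}\mathbf{S}_B\mathbf{W}\right|}{\left|\mathbf{W}^{T}\mathbf{S}_W\mathbf{W}\right|}=[\mathbf{w}_1,\mathbf{w}_2,\ldots,\mathbf{w}_{c-1}]$$

where $\mathbf{S}_B$ and $\mathbf{S}_W$ are the between-class scatter and the within-class scatter, respectively, and $c$ is the number of classes. The transformed signal samples can be obtained by $\mathbf{y}=\mathbf{W}^{T}\mathbf{A}$. More details on the calculation of $\mathbf{S}_B$ and $\mathbf{S}_W$ using Fisher linear discriminant can be found in Ref. [27].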
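The cost of vectorization can be illustrated with simple arithmetic. The sketch below flattens one sensors-by-time sample into a vector and compares the number of entries in the VLDA scatter matrix with the mode-wise scatter matrices handled by a multilinear method. The row-major stacking order and the function names `vectorize` and `scatter_sizes` are assumptions for illustration; the sizes 4 and 128 are those of the simulated dataset used later in this paper.

```python
def vectorize(sample):
    """Stack an I1 x I2 sample into a vector of length I1 * I2
    (row-major order, i.e., one sensor after another; assumed here)."""
    return [v for row in sample for v in row]

def scatter_sizes(I1, I2):
    """Number of entries in the scatter matrices each approach works with."""
    return {
        "vlda": (I1 * I2) ** 2,   # one (I1*I2) x (I1*I2) matrix
        "mode_1": I1 ** 2,        # I1 x I1 matrix for the sensor mode
        "mode_2": I2 ** 2,        # I2 x I2 matrix for the time mode
    }

toy = [[1, 2],
       [3, 4]]
flat = vectorize(toy)

sizes = scatter_sizes(4, 128)     # four streams, 128 points per profile
```

For four streams of 128 points, the vectorized scatter matrix has 512 × 512 = 262,144 entries, compared with 16 and 16,384 entries for the two mode-wise matrices, and the flattening also discards which entries belonged to the same sensor.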

## Performance Comparison in Simulation

In this section, the performances of the UMLDA and VLDA methodologies are evaluated and compared by numerical studies via simulation. The purpose of using simulation is to generate profile data from statistical models to mimic the profiles under different out-of-control (OOC) scenarios. We do not intend to replace real data with simulated data, but rather we would like to test the proposed method's performance in a larger and more complex dataset before applying it to real data. Real data may be limited in the types of patterns they can show and in the sample size. Simulation study is common when one wishes to test the effectiveness of a method, to explore how the performance is affected by certain parameters, to compare different methods, or to generate data that are otherwise difficult to obtain.

The multistream signals in simulation are generated in a similar manner as in Ref. [19]: A four-stream profile dataset is generated based on three benchmark signals proposed by Donoho and Johnstone [28]. The complex pattern features in the benchmark signals make it difficult for profile modeling using a parametric approach. Figure 3 illustrates the three benchmark signals: “blocks,” “heavysine,” and “bumps,” and they are denoted as *x*_{1}, *x*_{2}, and *x*_{3}, respectively.

Let $\chi\in\mathbb{R}^{N\times K\times M}$ denote the third-order tensor object that represents the four-stream profile dataset, where $N = 4$ is the number of streams or sensors, $K = 128$ is the number of data points for all the signals, and $M$ is the number of samples. $\chi$ is generated to contain different types of correlation structures: linear correlation (e.g., $\chi_{1,\cdot,m}$ and $x_1$, $\chi_{2,\cdot,m}$ and $x_3$, etc.), curvilinear correlation (e.g., $\chi_{2,\cdot,m}$ and $x_1$, $\chi_{3,\cdot,m}$ and $x_2$, etc.), and no correlation (e.g., $\chi_{3,\cdot,m}$ and $x_1$, $\chi_{4,\cdot,m}$ and $x_3$, etc.). $\chi$ is defined as follows:

Here, $n = 1, \ldots, 4$ and $m = 1, \ldots, M$. Similar to the dataset used in Ref. [19], the following settings are used to generate the dataset: $\mu_b = [0.2, 1, 1.5, 0.5, 1, 0.7, 0.8]^{T}$ and $\Sigma_b=\mathrm{diag}(\sigma_{b_1}^2,\ldots,\sigma_{b_7}^2)=\mathrm{diag}(0.08, 0.015, 0.05, 0.01, 0.09, 0.03, 0.06)$. Figure 4 shows 100 in-control profile samples generated in this setting. As can be seen in Eq. (7), the four streams of signals are not independent, but the correlation structure is complex for profile modeling.

Out-of-control (OOC) scenarios are generated to simulate different kinds of deviations from the natural multistream pattern. Each OOC scenario is associated with an assignable cause. In the context of ultrasonic metal welding (and many other manufacturing processes as well), these assignable causes represent different faults, e.g., mislocated weld, sheet metal distortion, surface contamination, etc. In this paper, we assume that multiple faults do not occur simultaneously on one part, i.e., a single part has no more than one fault. The following OOC scenarios are considered:

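*Scenario (a)*: Mean shift of the reference signal, in which the signal is offset by a multiple of $\sigma_{x_u}$ along $\mathbf{1}_{K\times 1}$, where $\sigma_{x_u}$ is the standard deviation of the $x_u$ reference signal, $u = 1, 2, 3$, and $\mathbf{1}_{K\times 1}$ is a column vector of ones.

*Scenarios (b)–(e)*: These scenarios, specified in detail under case A, are, respectively, superimposition of a sinusoid term on the reference signal, an increase in the standard deviation of the error term, a mean shift of a model parameter, and an increase in the standard deviation of a model parameter.

*Scenario (f)*: Gradual mean shift of the reference signal, where $\delta_f$ is the magnitude of the shift and $\mathbf{1}_{K\times 1}$ is a column vector of ones. This scenario is introduced to represent the effects of tool wear on profile data. As tool wear develops, the reference signal of the $(m+1)$th sample would have a larger mean shift than that of the $m$th sample. Considering the severity of tool wear, let $\delta_{f1}\in[0.01,0.05]\sigma_{x_u}$ represent the deviations caused by a lightly worn tool, $\delta_{f2}\in(0.05,0.1]\sigma_{x_u}$ represent the deviations caused by a tool with an intermediate level of wear, and $\delta_{f3}\in(0.1,0.15]\sigma_{x_u}$ represent the deviations caused by a severely worn tool, $u = 1, 2, 3$.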
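The mean-shift scenarios can be sketched as follows in plain Python. Two points are assumptions rather than statements from the text: the standard deviation is taken as the population standard deviation of the reference signal, and the gradual shift in `gradual_mean_shift` is assumed to grow linearly with the sample index $m$, which is only one way to make the $(m+1)$th shift larger than the $m$th. The function names and the toy reference signal are hypothetical.

```python
import statistics

def mean_shift(x, delta):
    """Add delta * sigma_x to every point of the reference signal x.
    delta is the multiplier of sigma_x (e.g., 0.1 in case A)."""
    sigma_x = statistics.pstdev(x)   # ASSUMPTION: population std of x
    return [v + delta * sigma_x for v in x]

def gradual_mean_shift(x, delta_f, m):
    """ASSUMPTION: the shift applied to the m-th sample is m * delta_f * sigma_x,
    so that later samples drift further from the reference signal."""
    return mean_shift(x, m * delta_f)

# Toy stand-in for a reference signal (its standard deviation is 1).
x_ref = [0.0, 2.0, 0.0, 2.0]

shifted = mean_shift(x_ref, 0.1)                        # sudden shift
drift = [gradual_mean_shift(x_ref, 0.01, m)[0]          # first point of
         for m in (1, 2, 3)]                            # samples 1, 2, 3
```

In the sudden case every sample of the faulty class carries the same offset, whereas in the gradual case the offset differs from sample to sample, which is what makes adjacent wear levels harder to separate.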

### Methods in Comparison.

The general framework of profile monitoring and fault diagnosis using multistream signals is illustrated in Fig. 5. For multilinear methods like UMLDA, the multistream signals can be directly represented in a tensor object, and then, the tensor is normalized so that the training samples are in the same dimension and zero-mean. For linear methods like VLDA, the multistream signals need to be vectorized into a matrix and then normalized. A feature extraction method, e.g., UMLDA or VLDA, then produces vector features that can be fed into standard classifiers for classification. The output is a class label for each tensor sample, which represents “normal” or some fault type.

Performance comparison is conducted in two levels: (1) feature extraction performance and (2) classification performance. To compare feature extraction performance, we use the following four multilinear and three linear methods to extract features: R-UMLDA, R-UMLDA with aggregation (R-UMLDA-A), UMPCA, MPCA, VLDA, uncorrelated LDA (ULDA), and regularized LDA (RLDA). The feature vectors obtained are then fed into the nearest-neighbor classifier (NNC) with the Euclidean distance measure for classification.

In R-UMLDA, the regularization parameter *γ* is empirically set to 0.001. If we let *Q* denote the number of training samples per class, then intuitively, stronger regularization is more desirable for a smaller *Q*, and weaker regularization is recommended for a larger *Q*. Since the tensor object *χ* ∈ ℝ^{4×128×M}, one R-UMLDA will extract up to four features. In R-UMLDA-A, up to *A* = 20 differently initialized and regularized UMLDA extractors are combined with each producing up to 4 features, resulting in a total of 80 features. The *γ* parameter ranges from 10^{−7} to 10^{−2}.

UMPCA and MPCA are unsupervised multilinear methods that seek a set of projections to maximize the variability captured by the projected tensor. UMPCA will produce up to 4 features which are uncorrelated, while MPCA will produce as many as approximately 80 features which are correlated in order to capture at least 99% of the variation in each mode. Details on the theoretical development of UMPCA and MPCA can be found in Refs. [18,20].

In addition to VLDA, two more linear methods are included in comparison, ULDA and RLDA. ULDA and RLDA improve LDA on undersampled problems and small sample size problems, respectively. Each method will project to up to *C* − 1 features with *C* being the number of classes. Details on the theoretical development of ULDA and RLDA can be found in Refs. [29,30].

In order to further improve classification performance, we feed the features extracted by multiple R-UMLDA extractors into random subspace method and compare its performances with the R-UMLDA-A which adopts the simple nearest-neighbor aggregation. Since classification is not the main focus of this work, we will not discuss the ensemble learning methods in detail. Readers interested in random subspace method and ensemble learning are referred to Refs. [31,32].

### Simulation Results.

This subsection discusses simulation results in three main cases A, B, and C.

#### Case A.

Case A focuses on identifying the faults in out-of-control scenarios (a)–(e). A total of 1200 profile samples are generated with 200 samples in each class, as plotted in Fig. 6. The five OOC scenarios are specified as follows: (a) mean shift of the “blocks” reference signal: $x_1\to x_1+0.1\sigma_{x_1}\mathbf{1}_{K\times 1}$, resulting in $\tilde{\chi}_{1,\cdot,m}=b_{1,m}(x_1+0.1\sigma_{x_1}\mathbf{1}_{K\times 1})+b_{2,m}x_2+\epsilon_{1,m}$, $\tilde{\chi}_{2,\cdot,m}=b_{3,m}(x_1+0.1\sigma_{x_1}\mathbf{1}_{K\times 1})^2+b_{4,m}x_3+\epsilon_{2,m}$, and $\tilde{\chi}_{4,\cdot,m}=b_{7,m}(x_1+0.1\sigma_{x_1}\mathbf{1}_{K\times 1})+\epsilon_{4,m}$; (b) superimposition of a sinusoid term on the “blocks” signal: $x_1\to x_1+0.1\sigma_{x_1}y_s$, where $y_s$ is a sine function, resulting in $\tilde{\chi}_{1,\cdot,m}=b_{1,m}(x_1+0.1\sigma_{x_1}y_s)+b_{2,m}x_2+\epsilon_{1,m}$, $\tilde{\chi}_{2,\cdot,m}=b_{3,m}(x_1+0.1\sigma_{x_1}y_s)^2+b_{4,m}x_3+\epsilon_{2,m}$, and $\tilde{\chi}_{4,\cdot,m}=b_{7,m}(x_1+0.1\sigma_{x_1}y_s)+\epsilon_{4,m}$; (c) increase in the standard deviation of the error term $\epsilon_1$: $\sigma_{\epsilon_{1,m}}\to 3\sigma_{\epsilon_{1,m}}$, leading to $\tilde{\chi}_{1,\cdot,m}=b_{1,m}x_1+b_{2,m}x_2+\tilde{\epsilon}_{1,m}$, where $\tilde{\epsilon}_{1,m}\sim N(0,(3\times 0.5)^2)$; (d) mean shift of the model parameter $b_1$: $\mu_{b_1}\to\mu_{b_1}+5\sigma_{b_1}$, yielding $\tilde{\chi}_{1,\cdot,m}=\tilde{b}_{1,m}x_1+b_{2,m}x_2+\epsilon_{1,m}$, where $\tilde{b}_{1,m}\sim N(\mu_{b_1}+5\sigma_{b_1},\sigma_{b_1}^2)$; and (e) increase in the standard deviation of the model parameter $b_1$: $\sigma_{b_1}\to 4\sigma_{b_1}$, giving $\tilde{\chi}_{1,\cdot,m}=\tilde{b}_{1,m}x_1+b_{2,m}x_2+\epsilon_{1,m}$, where $\tilde{b}_{1,m}\sim N(\mu_{b_1},(4\sigma_{b_1})^2)$.

Of the five OOC scenarios above, all profiles in streams 1, 2, and 4 are affected in (a) and (b), while in (c), (d), and (e), only the profiles in stream 1 have out-of-control patterns. Since a large proportion of the $\tilde{\epsilon}_{1,m}$'s generated in fault (c) would overlap with the in-control $\epsilon_{1,m}$'s, and the $\tilde{b}_{1,m}$'s generated by $\tilde{b}_{1,m}\sim N(\mu_{b_1},(4\sigma_{b_1})^2)$ in fault (e) would greatly overlap with the in-control $b_{1,m}$'s, faults (c) and (e) would be very difficult to separate from the in-control class.

Half of these 1200 samples are used for training. Before UMLDA modeling, the generated data are normalized by subtracting the grand mean of all training samples from the original data. Using the procedures described in Secs. 2 and 3.1, regularized UMLDA is applied to the normalized data. In UMLDA, the eigentensors corresponding to the $p$th EMP, $\mathbf{U}_p\in\mathbb{R}^{4\times 128}$, $p = 1, 2, 3, 4$, are obtained by $\mathbf{u}_p^{(1)}\circ\mathbf{u}_p^{(2)}$, where $\mathbf{u}_p^{(1)}\in\mathbb{R}^{4\times 1}$ and $\mathbf{u}_p^{(2)}\in\mathbb{R}^{128\times 1}$. Figure 7 shows $\mathbf{U}_p$ obtained from the training dataset in a single simulation run of case A. As can be seen from Fig. 7, the eigenvectors corresponding to the first EMP show an efficient discrimination against streams 1 and 4, whereas those corresponding to the second EMP show a strong discrimination against stream 2. The eigenvectors corresponding to the third and fourth EMPs show weak discriminations against stream 4, whereas limited useful information is extracted from stream 3 for discriminant analysis. These results are consistent with the data generation model, thus implying that R-UMLDA can effectively extract information for discriminant analysis about multistream profiles.
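The normalization step and the construction of an eigentensor as an outer product can be sketched in plain Python as follows. The function names and the toy 2 × 2 samples are hypothetical and serve only to illustrate the two operations; note that the grand mean is computed from the training samples only and is then subtracted from any sample, training or test.

```python
def grand_mean(train_samples):
    """Entrywise mean of the training samples (each an I1 x I2 nested list)."""
    M = len(train_samples)
    rows, cols = len(train_samples[0]), len(train_samples[0][0])
    return [[sum(s[i][j] for s in train_samples) / M for j in range(cols)]
            for i in range(rows)]

def center(sample, mean):
    """Subtract the training grand mean from one sample."""
    return [[sample[i][j] - mean[i][j] for j in range(len(mean[0]))]
            for i in range(len(mean))]

def outer(u1, u2):
    """Outer product u1 o u2: an I1 x I2 eigentensor for one EMP."""
    return [[a * b for b in u2] for a in u1]

# Toy training set with M = 2 samples of size 2 x 2.
train = [[[1.0, 2.0], [3.0, 4.0]],
         [[3.0, 4.0], [5.0, 6.0]]]
mu = grand_mean(train)
centered_train = [center(s, mu) for s in train]

# Toy projection vectors: only the first stream gets a nonzero weight.
U = outer([1.0, 0.0], [0.5, 0.5])
```

Rows of the eigentensor that are close to zero correspond to streams that contribute little to that feature, which is how a plot such as Fig. 7 can be read stream by stream.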

Using the first $p$ EMPs ($p = 1, 2, 3, 4$), multistream profiles can be projected to $p$ uncorrelated features, which are then fed into the NNC. The classification performance in the test dataset is shown in Fig. 8 and Table 1. Figure 8 plots the following detailed results against the number of features used: correct classification rate: $\sum_{m=1}^{M_{\mathrm{test}}}I(\hat{c}_m=c_m)/M_{\mathrm{test}}$, where $I(\cdot)$ is the indicator function, $\hat{c}_m$ is the predicted class for sample $m$, $c_m$ is the true class, and $M_{\mathrm{test}}$ is the number of test samples; correct passing rate: $\sum_{m=1}^{M_{\mathrm{test}}}I(\hat{c}_m=0\mid c_m=0)/M_{\mathrm{test}}$, where “0” indicates the “normal” class; correct detection rate: $\sum_{m=1}^{M_{\mathrm{test}}}I(\hat{c}_m>0\mid c_m>0)/M_{\mathrm{test}}$, where $c > 0$ indicates a fault class; true fault classification rate: $\sum_{m=1}^{M_{\mathrm{test}}}I(\hat{c}_m=c_m\mid c_m>0)/M_{\mathrm{test}}$; rate of true detection but wrong fault classification: $\sum_{m=1}^{M_{\mathrm{test}}}I(\hat{c}_m\neq c_m\mid\hat{c}_m>0,c_m>0)/M_{\mathrm{test}}$. As can be seen in Fig. 8, the first two features extracted by R-UMLDA are the most powerful features for classification. Adding the third and fourth features improves the correct classification rate slightly.
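These rates can be computed directly from the predicted and true labels. The sketch below reads each conditional indicator as a joint event counted over all $M_{\mathrm{test}}$ samples, which is an interpretation suggested by the common denominator $M_{\mathrm{test}}$ rather than something stated explicitly; the function name `classification_rates` and the toy labels are hypothetical.

```python
def classification_rates(pred, true):
    """Rates over all test samples; class 0 is 'normal', classes > 0 are faults."""
    M = len(true)
    pairs = list(zip(pred, true))
    return {
        "correct_classification": sum(p == t for p, t in pairs) / M,
        "correct_passing": sum(p == 0 and t == 0 for p, t in pairs) / M,
        "correct_detection": sum(p > 0 and t > 0 for p, t in pairs) / M,
        "true_fault_classification": sum(p == t and t > 0 for p, t in pairs) / M,
        "detected_but_wrong_fault": sum(p != t and p > 0 and t > 0
                                        for p, t in pairs) / M,
    }

# Toy test set: two normal samples and two samples from each of two faults.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 2, 2, 0]
rates = classification_rates(pred_labels, true_labels)
```

Under this reading, the correct classification rate equals the sum of the correct passing rate and the true fault classification rate, and the correct detection rate equals the sum of the last two rates, which gives a quick consistency check on reported results.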

More detailed classification results with respect to the number of features fed into the classifier are shown in the confusion matrices in Table 1. From Table 1, we can easily observe an improvement in classification accuracy when two or more EMPs are used instead of using only the first one. We also notice that when two or more features are used, most of the classification errors come from separating the in-control class, fault (c), and fault (e) from each other. This observation is exactly compatible with the data generation model, based on which we have expected that faults (c) and (e) are the most difficult classes to separate from the in-control class.

Applying the competitor methods described in Sec. 3.1, Fig. 9(a) shows the classification performance of NNC for various feature extraction methods in case A test dataset. The plotted results are the average correct classification rates in 100 simulation runs. In Fig. 9(a), the curves with triangle markers correspond to classification performance for UMPCA and MPCA features. It is obvious that these results are significantly worse than LDA-based methods, regardless of the number of features used. This agrees with our understanding of PCA-based feature extractors which do not make use of the class information and only seek projections to maximize the captured variability instead of class discrimination.

The curves with cross, star, and asterisk markers in Fig. 9(a) correspond to vectorized LDA methods (including LDA, ULDA, and RLDA), whereas the curves with square and circle markers correspond to UMLDA methods. It can be seen from Fig. 9(a) that the first two features extracted by R-UMLDA are the most powerful features in classification. Beyond the first two features, the performance from R-UMLDA improves very slowly with an increased number of features used. The first three features extracted by vectorized LDA methods are also powerful, but the improvement from using the first two R-UMLDA features is not significant.

The best correct classification rate is achieved using R-UMLDA-A. Figure 9(a) shows that R-UMLDA-A outperforms all other algorithms. This demonstrates that aggregation is an effective procedure and there is indeed complementary discriminative information from differently regularized R-UMLDA feature extractors.

#### Case B.

Case B focuses on identifying the faults in OOC scenario (f), which mimics the deviations caused by tool wear. We generate a total of 800 profile samples with 200 samples in each of the following four classes: in-control and three OOC (f) scenarios, where three magnitudes of gradual mean shift are added to the “blocks” signal to reflect machine tools with light, medium, and severe wear. Half of these samples are used for training. Table 2 presents the confusion matrix of the nearest-neighbor classifier for R-UMLDA (with *γ* = 0.001) features in the case B test dataset. As more features are fed into the classifier, the classification accuracy improves significantly. We also observe that classification errors only occur in the following three situations: distinguishing between the normal class and fault (f − 1) light tool wear, distinguishing between (f − 1) light wear and (f − 2) medium wear, and distinguishing between (f − 2) medium wear and (f − 3) severe wear.

Figure 9(b) shows the classification performance in terms of average correct classification rate in 100 simulation runs of NNC for various feature extraction methods in the case B test dataset. Similar to case A, the features extracted by UMPCA and MPCA are the weakest features in classification. Although the first few (1–2) features extracted by VLDA, ULDA, and RLDA are the most discriminative, using three or more R-UMLDA features leads to notably enhanced results. Figure 9(b) also shows the significant improvement introduced by aggregation. In all, R-UMLDA and R-UMLDA-A outperform all other algorithms.

#### Case C.

Case C investigates in-control and five OOC scenarios: (d) mean shift of the model parameter *b*_{1}, (e) standard deviation increase of *b*_{1}, and the three (f) OOC scenarios as described in case B. A total of 1200 profile samples with 200 samples in each class are generated, half of which are used for training. Table 3 presents the confusion matrix of NNC for R-UMLDA (with *γ* = 0.001) features on the case C test dataset. As more features are fed into the classifier, the classification accuracy improves significantly. From Table 3, we also observe that almost all classification errors occur in the following four situations: distinguishing between the normal class and fault (f − 1), distinguishing between (f − 1) and (f − 2), distinguishing between (f − 2) and (f − 3), and separating fault (e) from normal. It is very difficult to separate fault (e) from the in-control class because the $\tilde{b}_{1,m}$'s generated in fault (e) greatly overlap with the in-control $b_{1,m}$'s.

Figure 9(c) shows the classification performance in terms of average correct classification rate over 100 simulation runs of NNC for various feature extraction methods on the case C test dataset. Similar to cases A and B, the features extracted by UMPCA and MPCA are not as powerful as the other features in classification. Although the first few (1–2) features extracted by VLDA, ULDA, and RLDA are the most discriminative, using three or more R-UMLDA features leads to notably enhanced results. Figure 9(c) also shows that aggregation can effectively enhance the results, and that R-UMLDA and R-UMLDA-A outperform all other algorithms.

Under the framework of case C, we further investigate how the number of training samples in each class affects feature extraction results. We consider a variant of case C, denoted as C′, in which 20 profile samples are generated in each of the 6 classes. Figure 9(d) shows the correct classification rate of NNC for various feature extraction methods on the case C′ test dataset. Comparing Fig. 9(c) with Fig. 9(d), we notice that although the correct classification rates in Fig. 9(d) are slightly worse than those in Fig. 9(c) due to the smaller sample sizes, the classification performance does not vary significantly with the number of samples. In both cases, the best result is always achieved by R-UMLDA-A. If we want to limit the number of selected features to 3 or 4, then the first 3–4 features extracted by R-UMLDA are always the most powerful ones in classification. The same conclusion can be drawn when the sample size is further reduced to 10 per class. On the other hand, when these four simulation experiments are compared using analysis of variance, the *P*-value is found to be less than 0.01, confirming that these four cases are indeed different. Therefore, the simulation results demonstrate that R-UMLDA-A achieves the best overall performance in all the simulation experiments, and that R-UMLDA-A is a robust and effective feature extraction and dimension reduction algorithm for multistream profiles.

#### Improving Classification Via Ensemble Learning.

This subsection explores the possibility of further improving classification performance in fault diagnosis via ensemble learning. In R-UMLDA-A, 20 differently initialized and regularized UMLDA feature extractors are aggregated at the matching score level using the nearest-neighbor distance. Although R-UMLDA-A achieves the best results in the previous simulation studies, more advanced ensemble-based learning algorithms such as boosting, bagging, and the random subspace method are expected to achieve better results. Investigating alternative combination methods, however, is not the main focus of this paper. Therefore, we only show the classification performance using the random subspace method and leave in-depth studies in this direction to future work.
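One way to picture aggregation at the matching score level is sketched below. The text states only that nearest-neighbor distances from the feature extractors are combined; the min-max normalization, the sum rule, the function names, and the toy numbers are all assumptions of this sketch and may differ from the actual R-UMLDA-A procedure.

```python
import numpy as np

def class_nn_distances(train_F, train_y, test_f, n_classes):
    """Distance from one test feature vector to its nearest training sample per class."""
    d = np.linalg.norm(train_F - test_f, axis=1)
    return np.array([d[train_y == c].min() for c in range(n_classes)])

def aggregate_predict(train_feats, train_y, test_feats, n_classes):
    """Sum normalized class distances over extractors and pick the smallest total.

    train_feats: list with one (n_train, P) array per feature extractor.
    test_feats:  list with one (P,) vector per feature extractor.
    Min-max normalization and the sum rule are assumptions of this sketch.
    """
    total = np.zeros(n_classes)
    for F_train, f_test in zip(train_feats, test_feats):
        d = class_nn_distances(F_train, train_y, f_test, n_classes)
        span = d.max() - d.min()
        if span > 0:
            total += (d - d.min()) / span
    return int(np.argmin(total))

# Toy example: three classes, one training sample each, two extractors
# that each produce a single feature (hypothetical values).
train_y = np.array([0, 1, 2])
feats_a = np.array([[0.0], [1.0], [2.0]])
feats_b = np.array([[0.0], [1.0], [3.0]])
test_a, test_b = np.array([0.4]), np.array([0.6])

single_b = int(np.argmin(class_nn_distances(feats_b, train_y, test_b, 3)))
combined = aggregate_predict([feats_a, feats_b], train_y, [test_a, test_b], 3)
print(single_b, combined)
```

In this toy case the second extractor alone picks class 1, while the combined score picks class 0 because the first extractor favors class 0 by a wider normalized margin. This is the kind of complementary information that aggregation is meant to exploit.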

The random subspace method is an ensemble classifier consisting of several individual classifiers, each operating in a subspace of the original feature space; the ensemble outputs a class based on the outputs of these individual classifiers. The *k*-nearest-neighbor classifiers are used here as individual classifiers. As an example, we consider the dataset from a single simulation run of case A as described in Sec. 3.2.1. Using the same 20 R-UMLDA feature extractors as in R-UMLDA-A, we plot the classification results in Fig. 10. The curves with circle or cross markers correspond to random subspace classification with different numbers of nearest neighbors (*k*). Comparing these results to those of R-UMLDA-A, which are plotted with square markers, we see that the random subspace ensemble significantly increases the accuracy of classification, given a proper choice of *k*. With *k* = 20 to 25, the random subspace ensemble can achieve a relatively high correct classification rate using only 15 features, whereas R-UMLDA-A needs at least 20 features to achieve a similar performance. This also indicates more promising opportunities for using UMLDA for feature extraction and dimension reduction in handling multistream signals.
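The random subspace idea can be sketched in a few lines. The sketch below is illustrative only: the majority-vote combination rule, the subspace size, the number of learners, and the synthetic well-separated data are assumptions, and the paper's experiment uses R-UMLDA features rather than the Gaussian features generated here.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of the k closest samples
    votes = train_y[nearest]                 # shape (n_test, k)
    n_classes = train_y.max() + 1
    return np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes])

def random_subspace_predict(train_X, train_y, test_X, k,
                            n_learners=15, subspace_dim=4, seed=0):
    """Each k-NN learner sees a random subset of features; learners vote on the class."""
    rng = np.random.default_rng(seed)
    n_test, n_classes = test_X.shape[0], train_y.max() + 1
    tally = np.zeros((n_test, n_classes), dtype=int)
    for _ in range(n_learners):
        cols = rng.choice(train_X.shape[1], size=subspace_dim, replace=False)
        pred = knn_predict(train_X[:, cols], train_y, test_X[:, cols], k)
        tally[np.arange(n_test), pred] += 1
    return tally.argmax(axis=1)

# Synthetic, well-separated data (assumption): 3 classes, 10 features.
rng = np.random.default_rng(1)

def make_data(n_per_class):
    X = np.concatenate(
        [rng.normal(5.0 * c, 0.5, size=(n_per_class, 10)) for c in range(3)]
    )
    return X, np.repeat(np.arange(3), n_per_class)

train_X, train_y = make_data(30)
test_X, test_y = make_data(10)
pred = random_subspace_predict(train_X, train_y, test_X, k=5)
print("correct classification rate:", np.mean(pred == test_y))
```

Because each learner sees only part of the feature space, the learners make partly independent errors, which is what allows the vote to improve on a single classifier when *k* is chosen well.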

## Case Study in Multilayer Ultrasonic Metal Welding

In this case study, welding experiments of joining three layers of copper with one layer of nickel-plated copper are investigated. The clamping pressure is 34 psi, and the vibration amplitude is 40 *µ*m. Four sensors are used to collect in situ signals: the power meter records the controller power signal, the force sensor measures the clamping force, the linear variable differential transformer (LVDT) sensor measures the displacement between horn and anvil, and the microphone captures the sound in vibration. Table 4 summarizes the sensors and signals [33,34]. Note that all sensors use the same sampling rate in this case study. If the sensors had different sampling rates, extra data preprocessing steps would be needed.

Figure 11(a) shows the welded tabs from the normal welding process and three faulty processes: (1) surface contamination, (2) abnormal thickness, and (3) mislocated/edge weld. Figure 11(b) shows signals associated with these welds from the four sensors. In general, the normal welding process produces good welds with strong connections, while the faulty processes tend to create poor-quality connections, which may have adverse effects on the performance of the battery pack. If samples are contaminated, for example, with oil, there is less friction between the metal layers, causing insufficient vibration at the beginning of the weld. Therefore, the power signal does not rise as fast as it does in a normal weld. Once the oil is removed by vibration, the power signal picks up. Abnormal welding thickness may be caused by material handling errors, sheet metal distortion, or operation errors. The displacement signal clearly shows how the displacement between horn and anvil is affected by thicker layers. Mislocated/edge weld may be caused by operation errors or alignment errors. With edge weld, all clamping force is applied to a smaller weld region, resulting in more displacement between horn and anvil toward the end of the weld. It can be seen from Fig. 11 that, on the one hand, each signal contains richer information about product quality and process condition than any single point can provide, and on the other hand, a single stream of signals is not informative enough for recognizing the type of faults.

Sample data are organized in a tensor object $\mathcal{A} \in \mathbb{R}^{4 \times 700 \times 17}$, which includes 4 sensors, 700 data points in each profile, and 17 samples. Samples are divided into training and test sets. Both R-UMLDA and VLDA methods are trained using eight normal samples, two samples with fault 1 (oily surface), one sample with fault 2 (abnormal thickness), and one sample with fault 3 (edge weld).

Using one R-UMLDA feature extractor with *γ* = 0.001, the eigentensors corresponding to the four EMPs are shown in Fig. 12. Recall that the eigentensors corresponding to the *p*th EMP are obtained by $\mathbf{U}_p = \mathbf{u}_p^{(1)} \circ \mathbf{u}_p^{(2)}$, where $\mathbf{U}_p \in \mathbb{R}^{4 \times 700}$, $\mathbf{u}_p^{(1)} \in \mathbb{R}^{4 \times 1}$, $\mathbf{u}_p^{(2)} \in \mathbb{R}^{700 \times 1}$, and *p* = 1, 2, 3, 4. It can be seen from this figure that the eigentensors corresponding to the first EMP show an efficient discrimination and strong negative correlation in streams 2 and 3. The eigentensors corresponding to the second EMP show a strong discrimination in stream 1, whereas those corresponding to the third and fourth EMPs deliver similar information on discrimination in stream 4.
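The outer-product construction of the eigentensors, and the way they turn each 4 × 700 sample into a feature vector, can be illustrated with a short NumPy sketch. The data tensor and projection vectors below are random placeholders (an assumption made purely for illustration); in the method they would come from the welding signals and from R-UMLDA training, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_points, n_samples, n_emps = 4, 700, 17, 4

# Placeholder data tensor and projection vectors (random, for shape checking only).
A = rng.normal(size=(n_sensors, n_points, n_samples))
u1 = rng.normal(size=(n_emps, n_sensors))  # row p holds u_p^(1)
u2 = rng.normal(size=(n_emps, n_points))   # row p holds u_p^(2)

# Eigentensor of the p-th EMP: outer product u_p^(1) o u_p^(2), shape (4, 700).
U = np.einsum("pi,pj->pij", u1, u2)

# Feature p of sample m is the inner product <X_m, U_p>.
features = np.einsum("ijm,pij->mp", A, U)  # shape (17, 4)

# Equivalent form that never builds U_p: u_p^(1)^T X_m u_p^(2).
features_alt = np.einsum("pi,ijm,pj->mp", u1, A, u2)

print(U.shape, features.shape, np.allclose(features, features_alt))
```

Each row of `U[p]` is the second-mode vector scaled by one entry of the first-mode vector, which is why an eigentensor plot can be read stream by stream, as in the discussion of Fig. 12.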

After training UMLDA and VLDA, the feature extractors and NNC are applied to five testing samples: two from the normal process, two from fault 1, and one from fault 2. Figure 13 plots the correct classification rate of NNC for UMLDA and VLDA in the testing samples. For the five testing samples, it can be seen that R-UMLDA-A can easily achieve 100% correct classification using only four features while R-UMLDA achieves 80%. The vectorized LDA methods, however, do not perform as well as UMLDA. The features extracted by RLDA achieve the same level of classification accuracy as R-UMLDA, whereas LDA and ULDA extract much weaker features. The results indicate that UMLDA-based methods, especially R-UMLDA-A, outperform VLDA methods (including LDA, ULDA, and RLDA) in detecting abnormal processes and fault diagnosis.

## Conclusion

In this paper, based on UMLDA, we proposed a method for effective analysis of multisensor heterogeneous profile data. With various sensors measuring different variables, information from each sensor, sensor-to-sensor correlation, and class-to-class correlation should all be considered. A simulation study was conducted to evaluate the performance of the proposed method and its superiority over VLDA and other competitor methods. The results showed that the features extracted by VLDA and competitor methods are not as powerful as those extracted by UMLDA in discriminating profiles and classification. The possibility of improving classification performance in fault diagnosis using ensemble learning with UMLDA was further explored. We also applied both UMLDA and VLDA to a multilayer ultrasonic metal welding process for the purpose of process characterization and fault diagnosis. The results indicate that UMLDA outperforms VLDA not only in detecting faulty operations but also in classifying the type of faults.

Since the proposed method employs the tensor notation, all samples need to have the same number of data points so that the measurement data can be organized in a tensor format. In a real manufacturing environment, time variability in the signals can be common. To handle this situation, signal preprocessing would be needed. For example, to deal with inconsistent sampling rates, we can use downsampling or interpolation; to deal with inconsistent time durations of important patterns, we can crop to a specific segment of the profile or take the longest duration. Research on multimodal and multiresolution data fusion is a booming direction, yet it is beyond the scope of this work.
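As an illustration of the interpolation option, the sketch below resamples sensor streams of different lengths onto a common grid so they can be stacked into one tensor slice. It assumes that all streams cover the same time interval and that linear interpolation is adequate; the function names and the example signals are hypothetical.

```python
import numpy as np

def resample_profile(signal, n_points):
    """Linearly interpolate a 1-D profile onto n_points equally spaced positions."""
    signal = np.asarray(signal, dtype=float)
    old_t = np.linspace(0.0, 1.0, num=signal.size)  # normalized original time axis
    new_t = np.linspace(0.0, 1.0, num=n_points)     # normalized common time axis
    return np.interp(new_t, old_t, signal)

def stack_profiles(streams, n_points):
    """Resample every sensor stream of one sample and stack to (n_sensors, n_points)."""
    return np.stack([resample_profile(s, n_points) for s in streams])

# Hypothetical streams recorded at different sampling rates over the same cycle.
streams = [
    np.linspace(0.0, 1.0, 1400),          # faster sensor: downsampled
    np.sin(np.linspace(0.0, 3.0, 700)),   # already at the target length
    np.ones(350),                         # slower sensor: upsampled
    np.linspace(5.0, 6.0, 900),
]
sample = stack_profiles(streams, n_points=700)
print(sample.shape)
```

Plain interpolation does not low-pass filter the signal, so a stream that is downsampled heavily may need anti-alias filtering first.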

In the future, several remaining issues in this framework will be studied in more depth, such as integrating ensemble learning with R-UMLDA, enabling the method to update itself when unseen faults are observed, and adding probability output to classification. A more comprehensive case study will be performed as we collect more samples from welding experiments. Developing tensor-based methods for monitoring manufacturing processes with vision technology will be an interesting topic for future research. Furthermore, extending the developed method to online process monitoring and online learning would be an interesting development.

## Funding Data

General Motors Collaborative Research Lab in Advanced Vehicle Manufacturing at The University of Michigan.