Assembly through mating a pair of machined surfaces plays a crucial role in many manufacturing processes such as automotive powertrain production, and the mating errors during the assembly (i.e., gaps between surfaces) can cause significant internal leakage and functional performance problems. The surface mating errors are difficult to diagnose because they cannot be measured directly once the surfaces are assembled. Current in-plant quality control for surface mating focuses on controlling the surface flatness of each individual part before they are mated, and the mating errors are indirectly evaluated by a pressurized sealing test to check whether any pressure drop occurs. However, this test gives engineers no indication of the origin or root cause of the internal leakage. To address these limitations, this paper presents a pressurized color-tracking method to directly measure internal leak areas. By using the measured leak areas and the profiles of the mated surfaces as training data along with the Hagen–Poiseuille law, this paper develops a novel diagnostic method to predict potential leak areas (leakage paths) given the measured profiles of the mating surfaces. The effectiveness and robustness of the proposed method are verified by a simulation study and an experiment. The approach provides practical guidance for the subsequent assembly process as well as troubleshooting in surface machining processes.

## Introduction

Assembly through surface mating (surface assembly) has been widely adopted in many manufacturing applications. For example, the assembly of engine heads/blocks in automotive powertrain manufacturing and the assembly of flanges in pipeline construction are both achieved by mating the surfaces of two machined parts. Poor mating quality can significantly affect the functional performance of final products. Large variations in each individual mating surface could create significant between-surface gaps after mating, leading to undesired internal leakage across fluid paths. The leakage can cause significant product malfunction such as engine power loss and can require expensive head gasket repair. Therefore, the modeling and diagnosis of leakage during surface assembly play a crucial role in ensuring high product quality.

The major challenge in monitoring and diagnosis of the surface mating problem is that a surface mating error is not directly measurable after the assembly of surfaces. The current surface mating quality control in a manufacturing plant is to inspect the flatness quality of each individual surface independently, aiming to ensure the produced surfaces are “sufficiently” flat. A pressure test for the assembled parts is then conducted based on a certain sampling rate to evaluate the sealing performance by measuring the pressure inside sealed and air-pressurized channels after surface mating. The measured pressure will be compared with a nominal pressure value, and any pressure drop would indicate a potential leakage. Aside from the pressure test after surface assembly, methods for predicting leakage rate were also developed. The predictions are usually made through mathematical prediction models with inputs from quantitative surface features and working environment parameters. For example, a fuzzy prediction model was proposed to predict the leakage rate for gasketed flange joints using inputs including surface roughness and gas pressure [1]. However, the current in-plant surface mating quality framework has significant limitations including:

Independent monitoring, diagnosis, and variation control for each individual surface flatness cannot always guarantee good quality of surface assembly. The “fit” or “match” between two surface shapes plays a more critical role than each individual surface flatness in ensuring the mating quality. For example, Fig. 1 shows a case study where a bottom surface was assembled with one of two upper surfaces with different shapes. Although upper surface 1 has a better flatness value (Note: a smaller value means a better quality) than upper surface 2, its average leakage percentage (measured by a leakage tester) is significantly larger than that of upper surface 2 since its shape is less compliant/fit with the lower surface. In practice, a noticeable amount of powertrain components assembled by surface mating still fail to pass the pressurized leakage test even if all surfaces to be mated are machined to satisfy tolerance specifications.

Increasing the flatness quality for each individual surface would result in significantly high manufacturing cost induced by procuring high-precision machine tools.

The monitoring strategy using the pressurized leakage test does not provide for an effective evaluation of surface mating errors nor feedback on the root cause such as the location of internal leakage paths. Engineers have little information about where internal leakage may occur, and all mating parts must be scrapped or completely reworked, increasing the scrap rate.

There is a strong need to model and characterize the surface mating quality for diagnosis of the leakage paths in between the mated surfaces.

Surface mating modeling and characterization involve two steps: (1) measuring and modeling the profiles of individual surfaces to be mated and (2) estimating surface mating errors and potential internal leakage induced by the variations in each individual surface profile and interfacial contact conditions. For the first step, each individual surface profile can be measured by using multiresolution surface metrology systems. For instance, a high-resolution system employed by many powertrain manufacturing plants in North America is Coherix ShaPix surface metrology based on laser holographic interferometry [2–5] (see Fig. 2(a)), which can generate 4 × 10^{6} data points for a large area of 300 × 300 mm^{2} within seconds. Due to metrology cost considerations, manufacturing plants mostly rely on low-resolution measurement systems for quality checks, such as coordinate measuring machines (CMMs) (Fig. 2(b)), which use a stylus to scan a surface following a preprogrammed path. The measurements can be interpolated using spatial interpolation approaches based on weighted least squares [7], B-splines [8], wavelets [9], Kriging [10–14], Co-Kriging [15,16], and fuzzy-regression-Kriging [17] to reconstruct the profile variations over the entire surface. When the number of measurements is huge, approximate interpolation methods can be used to alleviate the computational cost [18–25].

For the second step (surface mating error estimation), however, research is notably lacking. As shown in Fig. 3, surface mating errors are induced by the interfacial void space after surface mating, and very limited literature was found related to the characterization of such interfacial gaps/void spaces. Malburg [26] proposed a profile filtering-based method that is capable of describing the void area along a given path of the interface. The method rolls a circular element along the original profile and generates a filtered profile that indicates the void space of given paths. The void space has also been estimated using contact mechanics-based analysis and finite element analysis (FEA) [27–29]. As for internal leakage prediction, these studies either considered the predicted void areas as leak areas or used void areas to estimate the leakage rate. The limitations of these methods are summarized as follows:

The modeling of interfacial leakage path cannot be dealt with by only geometric comparison of measured/interpolated surface profiles. The simple geometric analysis ignores the interfacial interaction after two rough surfaces contact each other and the impact of flow pressure loss on creating a potential leakage path.

Contact analysis using FEA and principles of contact mechanics for estimating the interfacial interactions require expensive computation time for rough surfaces. Additionally, the modeling complexity poses a great challenge to making appropriate assumptions such as defining the boundary and contact conditions.

The existing methods considered neither the spatial locations and connectivity of void areas nor their relations to the leakage. For example, a large void space at an isolated location could be unrelated to leakage because liquids have no way to reach it (see Fig. 4).

Current methods mostly propose certain metrics to quantify the leak but lack a direct validation based on real-world data.

This paper aims to address the limitations in the modeling and diagnosis of leak areas. We propose a model-based approach to estimating the interfacial gap between two mated surfaces and the spatial connectivity among these gaps (i.e., leakage path/channel) given the surface profiles. The model is postulated by integrating a stochastic model based on lattice graph representation and the Hagen–Poiseuille law (a law characterizing flow pressure loss [30], discussed in Sec. 2.2.1). The model parameters are learned using training data provided by a novel color-tracking method that measures the leak areas in between assembled surfaces, thus avoiding the need to estimate the interfacial interactions using FEA and contact mechanics. The learned model is capable of predicting the probability of leak areas over the entire mating area given the height profiles of the surfaces to be mated. As such, the approach can be used to diagnose surface mating errors for future surface assembly processes. The diagnostic results would provide practical guidance for the subsequent assembly process as well as troubleshooting in surface machining processes. The method is demonstrated by a simulation study and a surface mating experiment.

## Methodology

This section presents the methodology of stochastic modeling and diagnosis of leakage paths for mating surfaces including the model postulation, parameter estimation, and training data measurement and acquisition.

### Model Postulation for Connected Leakage Paths.

As shown in Fig. 5, the mating area surrounding channels or bores (white area) on a mating surface can be partitioned into a rectangular lattice (Fig. 5(a)), creating a graph that connects the neighboring grid points via directed arcs (Fig. 5(b)). Liquid or gas can propagate along the arcs between neighboring grid points. The grid points in the bore/channel area are potential sources of the leakage and are called “source grids.” The grid points in the remaining mating area are called “target grids.” A leakage path starts from a source grid and ends in a target grid, eventually forming leak areas. The proposed model will predict the probability of leak at every target grid given the height distribution (profile data) of surfaces to be mated. Before the model is applied to the surfaces on incoming parts during production, the parameters of the model need to be trained with the surface profile data and leak area data collected from sampled mating parts in prior production. Acquisition of the training data and estimation of parameters for the leakage prediction model will be discussed in detail in Sec. 2.3.

The lattice grid size should be carefully selected since it determines the resolution of the predicted leak areas. The larger the grid size, the lower the prediction resolution. An appropriate resolution for making the prediction without loss of information is governed by the average or minimum size of interfacial void spaces. Also, the grid size cannot be smaller than the resolution of surface height data and furthermore a small grid size increases the computational complexity. Therefore, the size of the grids is determined based on the resolution requirement of the leakage estimation.

The framework of the prediction model is shown in Fig. 6. The detailed procedures to implement the model are presented in the following five steps, i.e.,

Step 1: Estimate interfacial void space. The input for the model is surface height data (surface profile) at every grid point. The height data can be obtained through surface measurement such as CMM and/or a surface interpolation model based on the measurements. The interfacial void space potentially initiates the leakage and thus needs to be obtained first. However, it cannot be directly measured. In this paper, we adopt a simple model to facilitate the estimation of interfacial void spaces as described below.

The heights of the two surfaces at grid point *i* are denoted by $h_i^1$ and $h_i^2$, respectively, $i \in I$, $|I| = n$, where *I* denotes the set of all grid points, and *n* is the number of grid points. The estimated void space at grid point *i* is calculated by

$v_i = \max(h^1 + h^2) - (h_i^1 + h_i^2)$ (1)

where $\max(h^1 + h^2)$ represents the maximum value of $\{h_i^1 + h_i^2\}, \forall i \in I$, which indicates the assumed contact point of the mating surfaces (see Fig. 3). The interfacial void space estimated via Eq. (1) characterizes an initial contact state of two mating surfaces without deformation. It is used as an indicator of how easily a leak flow can pass through the interfacial void areas, rather than as a representation of the true interfacial void space.

Step 2: Calculate arc cost indicating the pressure loss between two neighboring grids. The cost $w_{ij}$ associated with the arc in the graph should reflect the pressure loss of the leak flow passing from grid point *i* to point *j*. We can evaluate the total pressure loss between two locations by summing up all the arc costs in between. Details will be discussed in Sec. 2.2.2. The arc cost depends on the magnitude of the interfacial void space at the two grids, $v_i$ and $v_j$. The function relating $v_i$ and $v_j$ to the cost $w_{ij}$ is represented by a parametric model *f*, i.e., $w_{ij} = f[v_i, v_j; \Theta]$ (cost function), where Θ is an unknown parameter. The choice of the cost function model is important and will be discussed in Sec. 2.2.1.

Step 3: Find the minimum-cost path between source grids and target grids. In reality, there might exist multiple leakage paths between a source grid and a target grid. We propose to identify the minimum-cost path, which stands for the easiest path (i.e., the path having minimum pressure loss) for leakage propagating from a source to a target. If the leakage flow cannot reach a target grid through the minimum-cost path, it is more likely that the grid is free from leakage problems. Such a process of finding the minimum-cost path also considers the spatial connectivity of void areas. The minimum-cost path can be conveniently found by applying a shortest-path algorithm (e.g., Dijkstra's algorithm [31]) to the built graph model (e.g., Fig. 5).

Step 4: Extract features from the minimum-cost path. The minimum-cost path from source grids to a target grid *j* is denoted by $\mathrm{path}_j$. In order to tell whether a leak exists at grid *j*, features such as the mean and variation of arc cost can be extracted from $\mathrm{path}_j$. These features will greatly influence the performance of the leakage classification model to be developed. A discussion on the feature extraction is presented in Sec. 2.2.2.

Step 5: Conduct probabilistic classification. The extracted feature set at a target grid *j* is denoted by $x_j$. The response of the classifier is a binary variable that equals 1 when a leak occurs at grid *j* and 0 otherwise. Therefore, a binary classifier based on logistic regression can be developed to predict the probability of leak at grid point *j* (see Eq. (2))

$\ln\left(\frac{p(x_j)}{1 - p(x_j)}\right) = g[x_j; B]$ (2)

where $p(x_j)$ is the probability of leak at grid point *j*; $g[x_j; B]$ is a linear function of $x_j$ having parameters $B = \{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$; $m = |x_j|$ is the number of features in feature set $x_j$. The training data to obtain $p(x_j)$ will be discussed in Sec. 2.3.1. The leak areas are formed after calculating the probability of leakage for all target grid points.

The advantages of the proposed method are as follows:

Spatial connectivity of void areas is considered so that isolated void areas are free from leakage.

The pressure loss of leak flow in the connected void channel is considered. As such, the model is able to characterize the scenario when the interfacial void spaces that are far from a leakage source are free from leakage.
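As a minimal sketch of Step 1, the void-space estimate of Eq. (1) can be computed directly from two height maps sampled on the same lattice. The function names, the toy height values, and the circular bore used to separate source grids from target grids are assumptions made for illustration; heights are assumed to be measured so that a larger $h_i^1 + h_i^2$ means the surfaces are closer to contact.

```python
import numpy as np

def estimate_void_space(h1, h2):
    """Eq. (1): v_i = max(h1 + h2) - (h1_i + h2_i), evaluated at every grid point."""
    total = np.asarray(h1, dtype=float) + np.asarray(h2, dtype=float)
    return total.max() - total

def circular_bore_mask(shape, center, radius):
    """Hypothetical helper: True at grid points lying inside a circular bore."""
    rows, cols = np.indices(shape)
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2

# Toy, unitless height maps on a 30 x 30 lattice (values are illustrative only).
rng = np.random.default_rng(0)
h1 = rng.normal(0.0, 0.02, size=(30, 30))
h2 = rng.normal(0.0, 0.02, size=(30, 30))

v = estimate_void_space(h1, h2)   # zero at the assumed contact point
source = circular_bore_mask(v.shape, center=(14.5, 14.5), radius=5.0)
target = ~source                  # grids where the leak probability is predicted
```

The estimate is non-negative by construction and equals zero only where the summed height is largest, which matches the undeformed initial-contact interpretation given for Eq. (1).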

### Discussions on the Proposed Model.

Great care must be exercised on two aspects when implementing the five steps to establish the prediction model and diagnostic classifier including (1) selection of the cost function and (2) feature extraction from the minimum-cost path.

#### Cost Function.

The cost function between neighboring grids *i* and *j* should reflect the pressure loss for leakage propagation from grid *i* to *j*. Since the flow in the leakage path is always creeping, it can be considered as laminar flow and the Hagen–Poiseuille law [30] applies. Equation (3) shows the Hagen–Poiseuille equation, which calculates the pressure loss $\Delta P$ for laminar flow, i.e.,

$\Delta P = \frac{128 \eta L Q}{\pi d^4}$ (3)

where *η* is the viscosity of the fluid, *Q* is the volumetric flow rate, *d* is the leakage path diameter, and *L* is the leakage path length. For the leak flow between two neighboring grids, *η*, *L*, and *Q* can be considered as constant, and *d* can be approximated by $(v_i + v_j)/2$. Therefore, the pressure loss for leak flow between grids *i* and *j* can be estimated by Eq. (4), which is used as the cost function

$w_{ij} = \frac{A}{\left[(v_i + v_j)/2\right]^4}$ (4)

where *A* is a positive constant. Additionally, the minimum-cost path found by Dijkstra's algorithm is not influenced by the value of *A*, because *A* scales every arc cost by the same factor. Thus, *A* can be set to any positive constant.
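The cost function and the minimum-cost path search can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the lattice is taken to be 4-connected, the void space is assumed to be defined at source grids as well, a small `eps` guards against division by zero at the contact point, and the names `arc_cost`, `min_cost_paths`, and `trace_path` are hypothetical.

```python
import heapq
import numpy as np

def arc_cost(vi, vj, A=1.0, eps=1e-12):
    """Pressure-loss cost A / d**4 with d = (vi + vj) / 2; eps guards against d = 0."""
    d = max(0.5 * (vi + vj), eps)
    return A / d ** 4

def min_cost_paths(v, source, A=1.0):
    """Multi-source Dijkstra on a 4-connected lattice (connectivity is an assumption).

    Returns cost[r, c], the minimum total cost from any source grid, and a
    parent dict that maps each reached target grid to its predecessor.
    """
    n_rows, n_cols = v.shape
    cost = np.full(v.shape, np.inf)
    parent = {}
    heap = []
    for r, c in zip(*np.nonzero(source)):
        cost[r, c] = 0.0
        heap.append((0.0, int(r), int(c)))
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > cost[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < n_rows and 0 <= cc < n_cols) or source[rr, cc]:
                continue
            nd = d + arc_cost(v[r, c], v[rr, cc], A)
            if nd < cost[rr, cc]:
                cost[rr, cc] = nd
                parent[(rr, cc)] = (r, c)
                heapq.heappush(heap, (nd, rr, cc))
    return cost, parent

def trace_path(parent, node):
    """Follow predecessors from a target grid back to its source grid."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]

rng = np.random.default_rng(1)
v = rng.uniform(0.01, 0.15, size=(10, 10))   # toy void-space map
source = np.zeros(v.shape, dtype=bool)
source[4:6, 4:6] = True                      # toy bore in the middle

cost, parent = min_cost_paths(v, source, A=1.0)
cost2, parent2 = min_cost_paths(v, source, A=2.0)
path = trace_path(parent, (0, 0))
```

Running the search with two values of *A* gives the same predecessor map and costs that differ only by the scale factor, which illustrates why *A* can be fixed arbitrarily.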

#### Feature Extraction From the Minimum-Cost Path.

There are multiple possible features $x_j$ at a grid point *j* that can be extracted from the minimum-cost path $\mathrm{path}_j$ as found by using Dijkstra's algorithm. This section discusses three common features that characterize the sum, mean, and variation of pressure loss:

- $c_j$, the total cost along the path $\mathrm{path}_j$: $c_j$ reflects the minimum pressure loss for leak flow propagating from the sources to the target grid *j*. A higher value of $c_j$ indicates that a higher pressure is required for leak flow to reach grid *j*.
- $c_j^{mean}$, the mean cost: $c_j^{mean}$ stands for the pressure loss per unit length (average pressure loss) along the minimum-cost path. A higher value of $c_j^{mean}$ means that more pressure is needed for leak flow to reach the target grid.
- $c_j^{std}$, the variation of the cost along the path: $c_j^{std}$ characterizes the dispersion (variability) of the between-grids pressure loss along the minimum-cost path. A large value of $c_j^{std}$ reflects a wide dispersion of between-grids pressure loss. It indicates the possible existence of extremely large pressure loss between some grids, due to very small interfacial void spaces (Eq. (4)), which makes it harder for the leak flow to propagate.

$c_j = \sum_{(p,q) \in \mathrm{path}_j} w_{pq}, \quad c_j^{mean} = \frac{c_j}{k_j}, \quad c_j^{std} = \operatorname{std}\{w_{pq} : (p,q) \in \mathrm{path}_j\}$ (5)

where $k_j$ is the number of between-grids arcs in $\mathrm{path}_j$; *p* and *q* represent two neighboring grids in the minimum-cost path $\mathrm{path}_j$; $w_{pq}$ represents the arc cost between grids *p* and *q*. In this paper, the sets of $c_j$, $c_j^{mean}$, and $c_j^{std}$ are represented by *C*, *C*^{mean}, and *C*^{std}, respectively, i.e., $C = \{c_j\}$, $C^{mean} = \{c_j^{mean}\}$, $C^{std} = \{c_j^{std}\}$. These features are aggregated in a feature vector *X*, i.e., $X = \{C, C^{mean}, C^{std}\}$.

The above three features characterize the most representative statistical properties (including central tendency and dispersion) of pressure loss along the minimum-cost path. For a generic case, these three features are sufficient to make good predictions of leak area. If needed, other features can be introduced to characterize different statistical properties such as the median and the skewness of the $w_{pq}$ values along the minimum-cost path.
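A short sketch of the three path features, given the arc costs along one minimum-cost path, is shown below. The function name `path_features` is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the normalization is a modeling choice. The two toy paths have the same total and mean cost, so only the dispersion feature distinguishes the path with a single narrow gap.

```python
import numpy as np

def path_features(arc_costs):
    """Return (c, c_mean, c_std) for the arc costs w_pq along one minimum-cost path."""
    w = np.asarray(arc_costs, dtype=float)
    c = w.sum()            # total pressure loss along the path
    c_mean = c / w.size    # average loss per arc (w.size plays the role of k_j)
    c_std = w.std()        # population std (ddof=0): an assumption
    return c, c_mean, c_std

uniform_path = path_features([2.0, 2.0, 2.0])   # evenly open channel
pinched_path = path_features([0.5, 0.5, 5.0])   # one narrow gap dominates
```

This is one way to see why the dispersion feature can carry information that the sum and mean do not.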

When multiple features are extracted, statistical feature selection for the logistic regression model (Eq. (2)) needs to be implemented to identify those that most affect leakage classification. The selection can be achieved by using a hypothesis test (i.e., $H_0: \beta = 0; H_a: \beta \neq 0, \beta \in B$) for their parameters in the logistic regression model. The significance levels of these features are indicated by the *p*-values of their parameters. The procedures of feature selection along with parameter estimation for the logistic regression model are presented in Sec. 2.3.2.

### Learning Prediction Model With Data.

The model needs to be trained using historical data before being applied to incoming mating surfaces. This section discusses the acquisition of training data and learning of the parameters in the prediction model from data. A novel color-tracking method is developed to identify leak areas, thus avoiding the need to perform complex computation of FEA and contact mechanics modeling.

#### A Pressurized Color-Tracking Method for Training Data.

The proposed model in this paper needs training data to learn its parameter *B* (see Eq. (2)). The training data contain surface profile data $H = \{h^1, h^2\}$ for two mating surfaces (1 and 2) and leak areas data $Y = \{y_i\}$, where $i \in I$, $y_i \in \{0, 1\}$. As mentioned earlier, surface profile data can be conveniently obtained by surface measurements. In this paper, we use a Coherix ShaPix laser holographic interferometer (Fig. 8(a)), which has a lateral resolution of 80 *μ*m and a vertical resolution of 0.05 *μ*m.

In this paper, we propose a pressurized color-tracking method, by which colored fluid is pressurized to test the leakage of the surface assembly and any leak between the surfaces is marked with color. In this way, leak areas are indicated by colored stains. To develop the pressurized color-tracking method, a testbed is designed that mimics the assembly of an engine head and block. The testbed is composed of (a) a mini engine head made of aluminum alloy 2024 and (b) a mini engine block made of cast iron, which are assembled with six bolts. The design and dimensions of the components and the way of assembly are shown by the three-dimensional model in Fig. 7.

In order to measure the leak areas, red ink is injected into a cylinder by air pressure applied through a regular leakage tester connected with a stable air source, which is shown in Fig. 8(b). The pressure is adjusted and maintained for a certain amount of time (around 30 s for the testbed in this paper). After disassembling the surfaces, leak areas can be revealed by red contaminations on both surfaces. An example of the resultant leak areas indicated by ink is shown in Fig. 8(b). Note: To avoid contamination of the leakage trace marked by ink during disassembly, the testbed is carefully transferred (using a fixture) to a heating oven, which vaporizes the residual ink via pluggable ventilation holes and dries the internal structure before disassembly.

#### Parameter Estimation.

To learn the parameters *B* in the logistic regression model, we make two assumptions as follows:

- There are two possible states for a grid: leak and no leak. A Bernoulli distribution is commonly used to describe such Boolean-valued states and characterize the probability of leak occurrence for every grid. Thus, we assume that the probability distribution of leakage occurrence at each grid point is Bernoulli

  $P(y_i; p_i) = \begin{cases} p_i, & \text{if } y_i = 1 \\ 1 - p_i, & \text{if } y_i = 0 \end{cases}$ (6)

  where $p_i$ is the probability of leakage occurrence at grid point *i*; $y_i = 1$ means that a leak occurs at grid point *i*; and $y_i = 0$ means that no leak occurs at grid point *i*. When there are multiple leakage conditions such as mild, moderate, and severe leaks, a multinomial distribution can be assumed for leakage occurrence $y_i$.

- The leakage occurrences at different grids can be assumed to be conditionally independent given the information about feature *X*. Since the probability distribution of the extracted feature set *X* exhibits spatial dependency, the leakage occurrence should also be spatially dependent as determined by Eq. (2). Such a spatial dependency can be removed by conditioning on the feature set *X*. Thus, the conditional independence between the leakage occurrences at different grids can be represented as follows, i.e.,

  $P(y_i, y_j | x_i, x_j) = P(y_i | x_i) P(y_j | x_j), \quad \forall i, j \in I, i \neq j$ (7)

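Under these two assumptions, the likelihood function of the parameters *B* is the product of the Bernoulli probabilities over all grids, i.e.,

$L(B) = \prod_{i \in I} p_i^{y_i} (1 - p_i)^{1 - y_i}$ (8)

where $p_i = p(x_i)$ is given by Eq. (2). The significance of each feature as described in Eq. (5) needs to be tested by examining the *p*-values of its parameters in *B*. The log-likelihood $\log(L(B))$ will be used as a metric to quantify the prediction accuracy compared with the true leak areas. A larger value of the log-likelihood indicates better prediction accuracy. The procedure of parameter estimation and feature selection can be summarized as Algorithm 1.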
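For reference, the quantities needed by a Newton–Raphson maximization of the log-likelihood can be written out explicitly. The derivation below is the standard one for logistic regression under the Bernoulli and conditional-independence assumptions; the augmented feature vector $\tilde{x}_i$ and the iteration index $t$ are notation introduced here for illustration.

```latex
\begin{align}
  g_i &= g[x_i; B] = \tilde{x}_i^{\top} B, \qquad \tilde{x}_i = (1, x_i^{\top})^{\top},
        \qquad p_i = \frac{e^{g_i}}{1 + e^{g_i}} \\
  \log L(B) &= \sum_{i} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
             = \sum_{i} \left[ y_i g_i - \log\left(1 + e^{g_i}\right) \right] \\
  \nabla_B \log L(B) &= \sum_{i} (y_i - p_i)\, \tilde{x}_i \\
  \nabla_B^2 \log L(B) &= -\sum_{i} p_i (1 - p_i)\, \tilde{x}_i \tilde{x}_i^{\top} \\
  B^{(t+1)} &= B^{(t)} - \left[ \nabla_B^2 \log L\left(B^{(t)}\right) \right]^{-1}
               \nabla_B \log L\left(B^{(t)}\right)
\end{align}
```

Because the Hessian is negative semidefinite, the log-likelihood is concave in *B*, so the iteration converges to the global maximizer whenever one exists (i.e., when the two classes are not perfectly separable).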

## Case Study

A simulation study was conducted to theoretically verify whether the proposed model can identify the significant features and predict the leak areas accurately. The simulation study further tested the robustness of the model against the noise in the surface height data. An experimental study was also conducted to validate the effectiveness of the model under real-world surface manufacturing conditions.

**Algorithm 1** Parameter estimation and feature selection

**Input:** Surface height data *H* and leak areas data *Y*

**Output:** *B*

**Procedures:**

1. Estimate the interfacial void space $V = \{v_i\}$ from *H* using Eq. (1);

2. Calculate the arc costs $\{w_{ij}\}$ using Eq. (4), where *A* is a positive constant;

3. Find the minimum-cost paths for every grid point based on $\{w_{ij}\}$ using Dijkstra's algorithm;

4. Extract feature set *X* from minimum-cost paths, e.g., $X = \{C, C^{mean}, C^{std}\}$ calculated via Eq. (5);

5. Estimate *B* by maximizing *L*(*B*) (Eq. (8)) using dataset {*X*, *Y*} through the Newton–Raphson method (other methods are also applicable, e.g., gradient descent);

6. Calculate the *p*-values of $\beta \in B$ by hypothesis test $H_0: \beta = 0; H_a: \beta \neq 0$ and select significant features based on:

   **IF** *p*-value < 0.05 **THEN** feature is selected,

   **ELSE** feature is deleted;

7. Repeat step 5 using the selected significant features in step 6.
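Steps 5 to 7 of the procedure can be sketched with NumPy alone. This is an illustrative sketch, not the authors' implementation: the function name `fit_logistic` and the synthetic data are hypothetical, and the *p*-values are computed with a two-sided Wald z-test, which is an assumption because the text does not name the test statistic.

```python
import math
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of Eq. (2); returns (beta, p_values, log_likelihood).

    beta[0] is the intercept. p-values come from a two-sided Wald z-test,
    which is an assumption: the text does not name the test statistic.
    """
    Xd = np.column_stack([np.ones(len(y)), X])            # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)                             # gradient of log L(B)
        hess = -(Xd * (p * (1.0 - p))[:, None]).T @ Xd    # Hessian of log L(B)
        step = np.linalg.solve(hess, grad)
        beta = beta - step                                # Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-Xd @ beta)), 1e-12, 1.0 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    cov = np.linalg.inv((Xd * (p * (1.0 - p))[:, None]).T @ Xd)  # inverse Fisher information
    z = beta / np.sqrt(np.diag(cov))
    p_values = np.array([math.erfc(abs(zk) / math.sqrt(2.0)) for zk in z])
    return beta, p_values, loglik

# Synthetic data: feature 0 drives the leak, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-(0.5 - 2.0 * X[:, 0])))).astype(float)

beta, p_values, loglik_full = fit_logistic(X, y)                      # step 5
selected = [k for k in range(X.shape[1]) if p_values[k + 1] < 0.05]   # step 6
beta_sel, p_sel, loglik_sel = fit_logistic(X[:, selected], y)         # step 7
```

The negative coefficient on the informative feature mirrors the modeling assumption that a higher path cost lowers the leak probability. Since the reduced model is nested in the full one, its maximized log-likelihood can never exceed that of the full model; a small drop after removing a feature is what indicates that the feature was not needed.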

### Simulation for Surface Mating Modeling.

In this simulation, we discuss feature selection for modeling, leak areas prediction and verification, and robustness of the prediction in response to measurement noise.

The training and test datasets each consist of interfacial void space data *V* and leak area data *Y* (as ground truth). The training data are used to estimate the model parameters, and the test data are employed to validate the accuracy of leak area prediction based on the model. The leak area data were simulated based on the minimum total cost *C* from sources to every target grid under the assumption that higher cost indicates a smaller probability of leakage. The cost function used in this simulation is Eq. (10). The noise at every grid is assumed to be i.i.d. $N(0, \sigma^2)$, where *σ* represents the standard deviation of the simulated noise, indicating the noise level.

The detailed procedures of simulating one pair of mating surfaces are as follows:

- (1) Specify the surface shape and grid size. Each of the mating surfaces has a square shape (30 × 30) with a bore in the middle (diameter ∅10). The grid size is set to be 1.

- (2) Simulate spatially correlated interfacial void space data at every grid point by the Gaussian process model, making $V = \{v_i\}$, $v_i \in [0, 0.15]$. (Note: we can also simulate the surface height data first for the two mating surfaces separately and then use Eq. (1) to calculate the void space data *V*.)

- (3) Generate the arc cost using *V* based on Eq. (10) and then calculate the minimum-cost path from source grids (the grids surrounding the bore) to all the other target grids by Dijkstra's algorithm.

- (4) Extract the total cost feature *C* from the minimum-cost paths using the first equation in Eq. (5).

- (5) Generate the leak conditions data $y_i \in \{0, 1\}$ for all the target grids based on their feature *c* value. For example, the following rules can be assumed to simulate the leak data:

  $p(y_i = 1) = \begin{cases} 1, & \text{if } c \le 0.8T \\ 0.98, & \text{if } 0.8T < c \le 0.85T \\ 0.95, & \text{if } 0.85T < c \le 0.9T \\ 0.9, & \text{if } 0.9T < c \le 0.95T \\ 0, & \text{if } 0.95T < c \end{cases}$

  where $p(y_i = 1)$ represents the probability of leak for grid *i*; *T* is a specified threshold that determines the probability of the leakage.

- (6) Specify a value of *σ* and add i.i.d. $N(0, \sigma^2)$ noise to the interfacial void space *V* for every grid. In this simulation case, *σ* ranges from 0 to 0.05. After this step, the training or test dataset {*V*, *Y*} can be obtained.
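Steps (2), (5), and (6) of the simulation procedure can be sketched as follows. Several choices here are assumptions made only for illustration: the squared-exponential kernel and its length scale, the min-max rescaling of the sampled field to [0, 0.15], the threshold value `T=1.0`, and all function names. The total cost `c` is passed in directly, so the snippet does not depend on a particular shortest-path implementation, and how negative void values produced by the added noise are handled is left open.

```python
import numpy as np

def simulate_void_space(n=30, length_scale=5.0, v_max=0.15, rng=None):
    """Draw one spatially correlated field from a zero-mean Gaussian process.

    The squared-exponential kernel, its length scale, and the min-max
    rescaling to [0, v_max] are assumptions made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.indices((n, n))
    pts = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
    sq_dist = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-0.5 * sq_dist / length_scale ** 2)
    vals, vecs = np.linalg.eigh(K)   # eigendecomposition tolerates an ill-conditioned K
    field = vecs @ (np.sqrt(np.clip(vals, 0.0, None)) * rng.standard_normal(n * n))
    field = (field - field.min()) / (field.max() - field.min())
    return v_max * field.reshape(n, n)

def leak_probability(c, T):
    """Piecewise rule of step (5): probability of leak given total cost c."""
    edges = np.array([0.8, 0.85, 0.9, 0.95]) * T
    probs = np.array([1.0, 0.98, 0.95, 0.9, 0.0])
    return probs[np.searchsorted(edges, np.asarray(c, dtype=float), side="left")]

def simulate_leak_labels(c, T, rng):
    """Bernoulli draw y_i in {0, 1} with the piecewise leak probability."""
    p = leak_probability(c, T)
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)
V = simulate_void_space(rng=rng)                                   # step (2)
p_demo = leak_probability([0.5, 0.82, 0.88, 0.93, 0.99], T=1.0)    # step (5)
y_demo = simulate_leak_labels(np.array([0.5, 0.99]), T=1.0, rng=rng)
V_noisy = V + rng.normal(0.0, 0.05, size=V.shape)                  # step (6), sigma = 0.05
```

Using `side="left"` in `np.searchsorted` makes each interval closed on the right, which matches the inequalities in the piecewise rule.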

Figure 9 shows an example of the simulation result. For the training data, panel (a) shows the simulated interfacial void space data with *σ* = 0, panel (b) shows the simulated interfacial void space data with $\sigma = 0.05$, and panel (c) shows the simulated leak areas. Figures 9(d)–9(f) show the corresponding test data: the simulated interfacial void space data with *σ* = 0, the simulated interfacial void space data with $\sigma = 0.05$, and the simulated leak areas, respectively. Note: For simplicity of illustration, the values in this simulation are all unitless.

#### Feature Selection.

Three candidate features $X = \{C, C^{mean}, C^{std}\}$ are considered for statistical feature selection. The training (see Figs. 9(a) and 9(c)) and test data (see Figs. 9(d) and 9(f)) were simulated based only on feature *C*. In this study, we tested whether the significant feature *C* and the insignificant features *C*^{mean} and *C*^{std} can be correctly identified. Without loss of generality, the initial measurement noise is set to be *σ* = 0.

The *p*-values of the three candidate features are shown in Table 1. It can be seen that the *p*-values of *C*^{mean} and *C*^{std} are >0.05 (insignificant) and the *p*-value of *C* is <0.05 (significant). Also, after removing the insignificant features *C*^{mean} and *C*^{std}, the log-likelihood does not decrease significantly, as shown in Table 2. These results demonstrate that the prediction model along with the feature selection method can successfully identify the significant features and therefore make accurate predictions of leak areas.

#### The Influence of Noise and Model Verification.

Noise is inevitable during surface data measurement. Therefore, it is necessary to test the robustness of the leakage prediction model against noise. Different levels of noise were simulated and added to both training and test void space data. In this study, the values of *σ* were set to be $\{0.00, 0.01, 0.02, 0.03, 0.04, 0.05\}$, and feature set *X* was chosen to be {*C*}. The simulation–prediction process was repeated 100 times per noise level except for $\sigma = 0.00$. It is worth noting that $\sigma = 0.05$ is a relatively significant noise compared with the magnitude (0.1) of surface height.

The prediction results are shown in Fig. 10. The left panel (Fig. 10(a)) is the leak area prediction when *σ* = 0. By comparing Fig. 10(a) with the simulated true leak areas in Fig. 9(f), it can be concluded that the prediction is very close to the test data. Figure 10(b) is one of the prediction results among the 100 repetitions when $\sigma = 0.05$. It can be seen that the result is still close to the test data in Fig. 9(f) though less accurate than Fig. 10(a). A box plot in Fig. 10(c) shows the prediction accuracy indicated by the value of log-likelihood under different levels of noise. It can be observed that the median of the log-likelihood value shows a slight decreasing trend but the overall ranges are comparable when the noise level increases. Such a pattern in the log-likelihood values indicates that the accuracy of the prediction is not significantly jeopardized. The prediction accuracy is only mildly affected even when the noise level reaches $\sigma = 0.04$ or 0.05, which is a very large noise compared with the surface profile magnitude of 0.1 and rarely occurs in real-world production. As such, we conclude that the proposed method is robust to measurement noise.

Remark: Alternative metrics can also be employed to show how the prediction accuracy is affected by the noise level. For instance, one may calculate the percentage of falsely and/or correctly predicted leakage areas. Such a percentage plays a role very similar to that of the log-likelihood.
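The two kinds of accuracy metric mentioned in the remark can be compared on the same predictions with a few lines of code. This is a minimal sketch: the function names, the toy labels and probabilities, and the 0.5 decision threshold used to turn probabilities into leak/no-leak calls are all assumptions.

```python
import numpy as np

def log_likelihood(y_true, p_pred, eps=1e-12):
    """Bernoulli log-likelihood of the true leak labels under predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1.0 - p)))

def classification_rates(y_true, p_pred, threshold=0.5):
    """Fractions of grids predicted correctly, falsely flagged, or missed.

    The 0.5 decision threshold is an assumption of this sketch.
    """
    y_hat = p_pred >= threshold
    y = y_true.astype(bool)
    return {
        "correct": float(np.mean(y_hat == y)),
        "false_leak": float(np.mean(y_hat & ~y)),
        "missed_leak": float(np.mean(~y_hat & y)),
    }

y_true = np.array([1, 1, 0, 0, 1])
p_pred = np.array([0.9, 0.6, 0.2, 0.7, 0.4])
ll = log_likelihood(y_true, p_pred)
rates = classification_rates(y_true, p_pred)
```

Unlike the percentages, the log-likelihood needs no decision threshold and also penalizes confident wrong predictions more heavily than hesitant ones.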

### A Surface Mating Experiment.

An experiment was conducted using the testbed described in Sec. 2.3.1. The surface profile data measured by the Coherix Shapix surface metrology system and the tested leakage areas are shown in Fig. 11. Several grooves were intentionally generated on the surface surrounding a bore (the leakage source) to introduce leakage (Fig. 11(b)). The lateral resolution (i.e., the size of a grid) for leak area prediction was set to 2 mm.

A five-fold cross validation, a commonly used technique for assessing how well a model generalizes, was conducted to validate the proposed model based on the experimental data shown in Fig. 11. The surface area surrounding the bore was partitioned into five areas (subsamples), as shown in Fig. 12(a). In each round, one subsample was retained as the validation data for testing the prediction model, and the remaining four subsamples were used as training data. This training–validation process was repeated five times so that every subsample was used once as validation data.
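The partition-based cross validation described above can be sketched as follows. The sketch assumes each grid carries an integer label (0 to 4) naming the area it belongs to; the function name `region_folds` and the toy labels with 140 grids per area are hypothetical, and the classifier fitting step is left out because it depends on the model details.

```python
import numpy as np

def region_folds(region_labels):
    """Yield (fold, train_idx, val_idx) with one spatial area held out per fold."""
    labels = np.asarray(region_labels)
    for fold in np.unique(labels):
        val_idx = np.flatnonzero(labels == fold)    # grids in the held-out area
        train_idx = np.flatnonzero(labels != fold)  # grids in the other four areas
        yield int(fold), train_idx, val_idx

# Hypothetical labels: five areas with 140 grids each.
toy_labels = np.repeat(np.arange(5), 140)
folds = list(region_folds(toy_labels))
```

Splitting by area rather than by randomly shuffled grids keeps spatially adjacent grids in the same fold, which matches the partition shown in Fig. 12(a).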

The *p*-values of the three candidate features in each of the five folds are shown in Table 3. As can be seen, *C*^{mean} is not a significant feature in three of the five folds and is therefore excluded from the feature set. *C* and *C*^{std} have small *p*-values and are thus retained in the feature set for leakage prediction. Furthermore, we tested whether excluding *C*^{mean} from the model's feature set affects the prediction results. Leakage prediction was conducted using the feature sets $\{C, C^{\mathrm{mean}}, C^{\mathrm{std}}\}$ and $\{C, C^{\mathrm{std}}\}$, respectively (see Table 4). The sum of the log-likelihood over the five-fold cross validation without *C*^{mean} is even larger than that with *C*^{mean}. Thus, excluding *C*^{mean} does not jeopardize the prediction, which confirms the insignificance of *C*^{mean}. The feature selection results can be explained as follows. A large value of *C* (the total pressure loss) prevents the grid from leaking. Likewise, a large value of *C*^{std}, which represents a wide dispersion of the between-grid pressure loss, indicates that a large between-grid pressure loss is likely present, i.e., a very small interfacial void space (according to Eq. (4)), which also prevents the grid from leaking. When both *C* and *C*^{std} are present, the statistical effect of *C*^{mean} on leakage occurrence becomes less important and thus insignificant.

The predicted leak areas using the feature set $\{C, C^{\mathrm{std}}\}$ are shown in Fig. 12, where the results of the five validation samples are combined in a single figure. Comparing the leak area prediction in Fig. 12(b) with the true leak areas in the experiment shown in Fig. 11(c), we conclude that the prediction results are reasonably accurate and quite informative. Additionally, a log-likelihood value larger than $\log(0.5) \times 140 \approx -97$, where 140 is the sample size of each fold, indicates an informative and precise prediction, since this threshold is the log-likelihood obtained by assigning a leak probability of 0.5 to every grid. The log-likelihood values in Table 5 meet this criterion and thus demonstrate the prediction accuracy of the proposed model.
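The threshold of about −97 can be checked numerically. The sketch below assumes the log-likelihood is the standard Bernoulli log-likelihood of the observed leak labels under the predicted probabilities; the function name `bernoulli_log_likelihood`, the clipping constant, and the toy labels are assumptions for illustration.

```python
import numpy as np

def bernoulli_log_likelihood(y_true, p_pred, eps=1e-12):
    """Sum of log-probabilities assigned to the observed binary labels."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 to avoid log(0).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

n_grids = 140                          # sample size of each fold
baseline = n_grids * np.log(0.5)       # uninformative prediction, about -97.04

# Hypothetical labels: any labeling gives the same value when p = 0.5 everywhere.
toy_labels = np.arange(n_grids) % 2
uninformative = bernoulli_log_likelihood(toy_labels, np.full(n_grids, 0.5))
```

A fold whose log-likelihood exceeds `baseline` is therefore predicted better than by assigning a leak probability of 0.5 to every grid.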

## Conclusions

Surface mating for assembly has been widely adopted in many manufacturing applications such as automotive powertrain production. Leakage, or poor sealing, is a critical issue affecting the quality of machined surface mating. A grand challenge in diagnosing surface mating problems is that the mating error is not directly measurable and is difficult to estimate. Moreover, state-of-the-art methods do not provide effective information about the internal leak areas. This paper proposes a novel method for modeling and diagnosing leak areas in surface assembly, thereby offering a new way of assessing surface mating quality. The outcome of this research can potentially reduce the need to pursue high flatness for each individual surface, allowing the tolerance for surface machining to be loosened and thus reducing manufacturing cost. The diagnosed leak areas provide engineers with valuable feedback on the root cause of sealing problems for troubleshooting.

By integrating surface profile data, leakage measurements, and Hagen–Poiseuille law, a prediction model is developed to estimate the leak areas after surface mating. The model adopts a lattice graph to represent leakage paths created by interfacial void spaces and connectivity among these voids from source to target grids. Among all the possible leakage channels, the minimum-cost path is calculated for every location in the mating zone. The cost is determined by the interfacial void space and calculated based on Hagen–Poiseuille law, which characterizes the pressure loss for leak flow propagation. As such, the minimum-cost path captures the direction with the least pressure loss when liquid/gas flows along the path. A binary classifier is then established to predict the probability of leak occurrence for every location in the mating zone based on the features extracted from minimum-cost paths. Feature extraction and selection are also discussed in this paper. To provide training data for the model, a novel pressurized color-tracking method is proposed, and a leakage testbed is designed for measuring the leak areas. A simulation study verified the accuracy and robustness of the prediction model for leakage paths, and a case study based on a surface mating experiment validated the proposed diagnostic algorithm.

## Acknowledgment

This research was motivated by surface assembly problems encountered in real-world manufacturing plants for automotive powertrains. Dr. Zhenhua Huang from Coherix Inc. provided valuable information about their customers' surface mating problems in the automotive industry. The study was conducted in the High-Performance Materials Institute at Florida State University.

## Funding Data

Division of Civil, Mechanical and Manufacturing Innovation (Grant No. CMMI-1434411).