## Abstract

In recent years, machine learning (ML) techniques have gained popularity in structural health monitoring (SHM), particularly for damage detection in a wide range of engineering applications such as wind turbine blades. The outcomes of previous research in this area have demonstrated the capabilities of ML for robust damage detection. However, a primary challenge facing ML in SHM is the lack of interpretability of the prediction models, which hinders the broader implementation of these techniques. For this purpose, this study integrates the novel Shapley Additive exPlanations (SHAP) method into an ML-based damage detection process as a tool for introducing interpretability, thereby building evidence for reliable decision-making in SHM applications. The SHAP method is based on coalitional game theory and adds global and local interpretability to ML-based models by computing the marginal contribution of each feature. This contribution is used to understand the nature of damage indices (DIs). The applicability of the SHAP method is first demonstrated on a simple lumped mass-spring-damper system with simulated temperature variabilities. The SHAP method is then evaluated on data from an in-operation V27 wind turbine with artificially introduced damage in one of its blades. The results show the relationship between the environmental and operational variabilities (EOVs) and their direct influence on the damage indices. This ultimately helps to distinguish false positives caused by EOVs from true positives resulting from damage in the structure.

## 1 Introduction

Structural health monitoring (SHM) refers to the process of identifying changes in the integrity of a structure based on observations and measurements that describe its current state. SHM has sparked the interest of researchers due to the untapped potential for safety and reliability improvements and operation and maintenance (O&M) cost reductions. Among the diversity of methods adopted by researchers, data-driven vibration-based structural health monitoring (VSHM) is a promising approach and is continuously being studied. The primary goal of VSHM is to detect damage in a structure through the following steps [1]: (1) Measure vibration responses from the structure; (2) Extract damage sensitive features (DSFs), i.e., metrics that contain useful information to characterize the state of the structure; (3) Select a method to process the DSFs and calculate a damage index (DI); and (4) compare the DI with a predefined threshold, thereby identifying outliers that indicate potential damage in the structure. For the last two steps, commonly, a statistical outlier analysis method is implemented such as the Mahalanobis distance (MD) to identify discordant outliers in a feature set [2]. However, one of the prevailing challenges in damage detection is the influence of environmental and operational variabilities (EOVs) on DSFs when monitoring structures and collecting vibration responses. Several studies have demonstrated how EOVs can camouflage the presence of damage and, ultimately, hinder damage detection [3–6].

To address this challenge, scholars have adopted different techniques, including multivariate linear regression models [7]. More recently, researchers have investigated the use of machine learning (ML) techniques to identify and mitigate the influence of EOVs. In the past decade, numerous studies have implemented these techniques, showing promising results for monitoring structures and identifying failures or damage [8,9]. A study by Santos et al. [10] compared several ML algorithms, namely, one-class support vector machine, support vector data description, kernel principal component analysis, and greedy kernel principal component analysis, to detect damage in a three-story aluminum frame structure under the influence of EOVs. The results showed reliable pattern recognition and clustering of the observations from the healthy structure, thereby mitigating the effects of EOVs and ultimately increasing the damage detection accuracy. Abdeljaber et al. [11] presented a computationally inexpensive system using 1D convolutional neural networks (CNNs) for real-time damage detection and localization. The approach enables the automatic extraction of optimal DSFs and was tested on eight structural cases derived from large-scale experiments on a simulator. The results showed the high level of generalization of this approach and the advantage of eliminating the need for manual model or parameter tuning in feature extraction.

In the field of wind energy, Zhang et al. [12] proposed a data-driven framework using random forests (RFs) in combination with extreme gradient boosting (XGBoost) for fault detection in wind turbines. The former is used to rank the DSFs according to their relevance and, later, the top-ranking features are used to train the XGBoost algorithm. The approach was tested on numerical simulations of three different types of wind turbines (onshore and offshore) in different operating conditions. This approach outperformed the support vector machine method when dealing with multidimensional feature sets in terms of avoiding model overfitting. Solimine et al. [13] used principal component analysis (PCA) and K-means clustering to identify outliers, i.e., deviations from the normal operation of a full-scale wind turbine blade (WTB), through the collection of audio signals from the blade cavity. Their method successfully detected structural and acoustic abnormalities when the WTB underwent fatigue testing. Furthermore, the authors concluded that their method facilitated the ability to distinguish between damage-related acoustic aberrations and EOVs. Finally, Mylonas et al. [14] assessed the effect of wakes on down-wind turbines in a park by implementing a state-of-the-art ML technique, namely, a Variational Auto-Encoder (VAE) Neural Network. This technique can quantify the levels of statistical deviation in condition monitoring data such as power production and wind speed, thereby identifying wakes that can potentially reduce a turbine's fatigue life.

As seen, ML techniques have been successfully implemented in experimental and real case applications. Nevertheless, the ML algorithms adopted in these studies for VSHM frameworks are often referred to as black-box models and they provide little to no information about the decision-making process. The lack of interpretability and understanding regarding how these models detect changes in structural integrity and how DSFs are used by these models to make a certain prediction is one of the reasons hindering the widespread adoption of these techniques for industrial applications. Enabling interpretability in ML-based VSHM frameworks can help build evidence for reliable decision-making in SHM applications. In particular, differentiating between novelties caused due to the influence of EOVs or damage in the structure can help reduce false alarms.

The issue of interpretability when using ML techniques has been addressed previously in fields outside engineering [15]. In particular, the Shapley Additive exPlanations (SHAP) approach proposed by Lundberg et al. [16] has lately attracted attention in the research community. The authors adopted a model agnostic representation of feature importance which is estimated by Shapley values [17,18], and developed an algorithm to estimate the Shapley values in a computationally efficient way. SHAP is an additive feature attribution method that defines the output of a model as the sum of the real values attributed to each input feature. In other words, the Shapley values represent the marginal contribution of each feature to the prediction of the model. Lundberg et al. [16] have also shown that the Shapley values are the only way to assign feature importance while maintaining two important properties:

Local accuracy: The sum of the feature attributions is equal to the output of the function one is seeking to explain.

Consistency: Changes in the model where the impact of a feature increases will never decrease the attribution assigned to that feature.

Shapley Additive exPlanations has been implemented in fields such as medicine and finance. For example, Lundberg et al. [19] used the SHAP technique to understand the outputs of an ML-based system predicting the risk of hypoxemia during anesthesia care from electronically recorded intra-operative data. Similarly, in the field of finance, Bussmann et al. [20] employed the SHAP technique to explain the risks associated with peer-to-peer lending credit in regulated financial services. The interpretable model enables lending entities to classify potential borrowers into risky and nonrisky borrowers based on common financial characteristics identified by SHAP as highly influential, including earnings before interest and profit before taxes for nondefaulted companies and total assets and shareholders' funds for defaulted companies. In both cases, the added interpretability helped to understand the predictions made by the ML models and highlighted the potential of interpretable ML for reasoning in predictive models.

Interpretable ML techniques such as SHAP have also been adopted in engineering applications, albeit less frequently. For instance, Parsa et al. [21] used SHAP to analyze the causes for traffic accidents predicted by a XGBoost classification model. By implementing SHAP, the study found that traffic-related features such as the vehicle speed had a relative high impact on the prediction of an accident. Furthermore, nontraffic-related features such as demographic, network, land use, and weather conditions were identified as important predictors for the occurrence of an accident. This information can facilitate policy decisions related to, e.g., speed limits, urban planning, among others.

For SHM applications, Lim et al. [22] estimated different damage levels for several bridges based on monitored conditions of their decks using an XGBoost classification model. Their data was retrieved from the Korean Bridge Management System and corresponded to 142,439 deck inspection records of 2388 bridges. For their analysis, the authors collated 53 variables including identification factors such as region, shape, and vehicle weight limit; structural factors such as deck material, thickness, and strength; traffic factors; and inspection factors such as age, damage type, and condition rating of the damage. Further, SHAP was adopted to capture the influence of the features used for the classification. Their results showed that the major factors influencing damage to bridges include age, average daily truck traffic, and vehicle weight limit. Onchis et al. [23] used local interpretable model-agnostic explanations (LIME) and SHAP in combination to characterize the location and depth of damage in cantilever beams. Through the added interpretability and a novel stability index, the authors provided trust in the models' predictions. The findings emphasized the benefits of using Shapley values for model interpretability in addition to ML-based techniques and their contribution to the decision-making process for preventive maintenance.

These studies have shown that interpretable ML techniques improve the pattern recognition properties of predictive models by providing supplementary information to select the most adequate predictors (i.e., the predictors with the highest contribution). Enabling interpretability in ML-based VSHM frameworks helps build evidence for reliable decision-making in SHM applications. However, to the authors' best knowledge, this issue has not been addressed in the scientific literature, particularly for interpreting the effect of EOVs in continuous VSHM for damage detection.

Damage detection in structures can often not be treated as a supervised classification problem due to the lack of knowledge about the damage. Therefore, this study introduces a novel data-driven framework to understand the cause of novelties identified by an MD-based DI in a semisupervised framework. For this, after novelties are identified, an XGBoost regression model is trained to learn the relationship between DSFs, EOVs, and MD-based DIs. The XGBoost regression is selected due to its superior performance compared to parametric models such as multivariate regressions and its low computational time [24]. Furthermore, the process is complemented with the SHAP approach for an enhanced understanding of the variabilities in the identified DIs. This postdetection interpretability enables explainability in online SHM systems. The proposed data-driven framework yields understanding when detecting outliers that correspond to a false alarm or damage in a structure. The framework is first evaluated on a simple lumped mass-spring-damper system with simulated temperature variabilities. The simulation explores and highlights the capabilities of SHAP to interpret the effect of temperature in a dynamic system. Later, the framework is demonstrated on data gathered from an in-operation V27 WT with artificially introduced damage in one of its blades to evaluate and identify the nature of DIs (e.g., EOVs or damage).

## 2 Methodology

This section elaborates on the steps proposed to determine the cause of novelties identified through an MD-based damage detection approach. First, extracted DSFs are used to calculate the MD-based DI. For this, under a semisupervised framework, only a subset of healthy data is required. Then, an XGBoost regression model is trained considering the DSFs and measured environmental variabilities as predictors and the MD-based DI as the prediction target. Finally, SHAP is included in the analysis to interpret the XGBoost model and thus interpret the contribution of each DSF and measured environmental parameter to the DI. A schematic representation of the proposed framework is illustrated in Fig. 1.

### 2.1 Covariance-Based Damage Sensitive Features.

Using the covariance as a DSF for damage detection has shown promising results in several studies [25–27] and is, therefore, adopted in this study. The covariance of the vibration responses is used to measure the phase and amplitude relationship between the responses at different sensor locations, which characterizes the state of the structure.
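As an illustration, the covariance-based DSF can be sketched as below. The exact vectorization used in the referenced studies is not specified here, so flattening the upper triangle of the sensor covariance matrix (variances on the diagonal, cross-covariances off it) is an assumption:

```python
import numpy as np

def covariance_dsf(responses):
    """Covariance-based DSF: flatten the upper triangle (incl. the
    variances on the diagonal) of the sensor covariance matrix into
    one feature vector per observation."""
    cov = np.cov(responses)              # shape (n_sensors, n_sensors)
    iu = np.triu_indices(cov.shape[0])   # upper-triangle indices
    return cov[iu]                       # length n_sensors*(n_sensors+1)/2

# Example: 6 sensors, 4350 samples of synthetic vibration responses
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4350))
dsf = covariance_dsf(x)
print(dsf.shape)  # (21,)
```

With six sensors this yields a 21-element DSF vector; the entries capture the amplitude and phase relationships between sensor pairs described above.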

### 2.2 Mahalanobis Distance-Based Damage Index.

Consider a DSF matrix $D \in \mathbb{R}^{L \times M}$, where $L$ is the length of the DSF vector and $M$ is the number of observations. The MD is calculated as follows:

$$y_i = \sqrt{(x_i - \mu_D)^{T}\, \Sigma_D^{-1}\, (x_i - \mu_D)} \tag{1}$$

where $y_i$ is the MD between the DSF vector $x_i$ at observation $i$ and the training matrix $X \in \mathbb{R}^{L \times E}$ with $E \leq M$, which is the subset of $D$ used for training. $\mu_D \in \mathbb{R}^{L}$ is the mean of the observations in $X$ and $\Sigma_D \in \mathbb{R}^{L \times L}$ is the covariance between observations in $X$. $y_i$ is also referred to as the "calculated MD" in this paper.
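A minimal sketch of the MD-based DI of Eq. (1) is given below; the random data stands in for a real DSF matrix, and the variable names are illustrative:

```python
import numpy as np

def mahalanobis_di(dsf_matrix, n_train):
    """MD-based DI: y_i = sqrt((x_i - mu)^T Sigma^-1 (x_i - mu)),
    with mu and Sigma estimated on the first n_train (healthy) rows."""
    X = dsf_matrix[:n_train]                       # healthy reference subset
    mu = X.mean(axis=0)                            # mean DSF vector
    sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = dsf_matrix - mu
    # Quadratic form evaluated for every observation at once
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, sigma_inv, diff))

rng = np.random.default_rng(1)
D = rng.standard_normal((2000, 5))                 # M = 2000 observations, L = 5
di = mahalanobis_di(D, n_train=1250)               # one DI value per observation
```

In a continuous monitoring setting, only the healthy reference block is used for `mu` and `sigma_inv`, while the DI is evaluated for all incoming observations.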

### 2.3 XGBoost Decision Trees Regression Model.

XGBoost is a scalable ML technique for tree boosting developed by Chen and Guestrin [24]. The algorithm combines an ensemble of decision trees to improve efficiency and learning in the model [29]. Whilst a single decision tree uses simple conditional statements to map the features to target values based on particular attributes of the features, XGBoost combines multiple decision trees sequentially, thereby optimizing performance and improving the accuracy of the prediction. Each decision tree within the XGBoost algorithm learns from the errors of the previous trees, gradually improving the model and building a strong learner as shown in Fig. 2.

Consider a dataset with $M$ observations containing the DSFs including measured EOVs $\chi_i \in \mathbb{R}^{\ell}$ and the corresponding prediction target $y_i \in \mathbb{R}$. The tree ensemble model is the summation of $K$ additive functions with the estimated prediction $\hat{y}_i$ defined as follows:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(\chi_i), \quad f_k \in \mathcal{F} \tag{2}$$

where $f_k$ is the independent tree structure of the $K$ trees and $\mathcal{F}$ the tree space. Different sets of functions are trained by minimizing the following objective function:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{M} l\big(y_i,\, \hat{y}_i^{(t-1)} + f_t(\chi_i)\big) + \Omega(f_t) \tag{3}$$

where $l$ is a loss function which measures the difference between the prediction $\hat{y}_i$ and the prediction target $y_i$ at iteration step $t$ (or $t$th tree). The second term $\Omega$ is the complexity of the model and is defined as

$$\Omega(f_t) = \gamma P + \frac{1}{2} \lambda \sum_{j=1}^{P} w_j^2 \tag{4}$$

where $P$ is the number of leaves in the tree, $w_j$ the score of a leaf, $\gamma$ the complexity scalar of each leaf, and $\lambda$ the scaling parameter to penalize the complexity. Unlike in general gradient boosting decision trees, for XGBoost a second-order Taylor expansion is applied to the loss function:

$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^{M} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(\chi_i) + \frac{1}{2} h_i f_t^{2}(\chi_i) \Big] + \Omega(f_t) \tag{5}$$

Using the mean squared error (MSE) as a loss function, the objective of the $t$th tree can be derived as follows:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{P} \Big[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \Big] + \gamma P \tag{6}$$

where $w_j$ is the weight of the $j$th leaf. $I_j = \{\, i \mid q(\chi_i) = j \,\}$ represents all the data samples in leaf node $j$; $g_i$ and $h_i$ are the first and second derivatives of the MSE loss function, and data points are assigned to the corresponding leaf with the function $q(\cdot)$. Further, by letting $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, optimizing the objective function can be transformed to finding the minimum of a quadratic function. The optimal weight $w_j^{*}$ for the $j$th leaf is computed by

$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \tag{7}$$

for a given tree structure $q$. To evaluate the best split during the training and determine when the splitting is stopped, the following gain score is calculated:

$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \tag{8}$$

where the first two terms refer to the left and right branches of the candidate split, respectively, and are compared to the score of the original leaf in the third term. The score of the generated leaves is penalized by $\gamma$. If the gain is negative, the tree stops growing and returns to the previous split; otherwise, it continues improving the model.

In other words, after every split the decision tree model is evaluated based on the two terms in Eq. (3): the first term is responsible for prediction accuracy during the training and the second term accounts for the complexity of the decision tree. As high tree complexity can cause over-fitting, it is penalized.
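The leaf-weight and gain arithmetic described above can be checked numerically. The sketch below assumes the squared-error loss $l = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$; the node data and penalty values are hypothetical:

```python
import numpy as np

# Squared-error loss l = 0.5*(y - yhat)^2, so g_i = yhat_i - y_i and h_i = 1
# (an assumption for illustration; the data below is hypothetical).
y    = np.array([1.0, 1.2, 3.0, 3.3])    # targets falling in one node
yhat = np.zeros_like(y)                   # prediction before this tree
g, h = yhat - y, np.ones_like(y)
lam, gamma = 1.0, 0.0                     # complexity penalties

def leaf_weight(G, H):                    # optimal weight w* = -G / (H + lambda)
    return -G / (H + lam)

def structure_score(G, H):                # G^2 / (H + lambda)
    return G ** 2 / (H + lam)

# Candidate split: {1.0, 1.2} to the left leaf, {3.0, 3.3} to the right
GL, HL = g[:2].sum(), h[:2].sum()
GR, HR = g[2:].sum(), h[2:].sum()
gain = 0.5 * (structure_score(GL, HL) + structure_score(GR, HR)
              - structure_score(GL + GR, HL + HR)) - gamma

print(round(leaf_weight(GL, HL), 3))      # 0.733: moves toward the left targets
print(gain > 0)                           # True: the split is kept
```

The positive gain confirms that separating the two clusters of targets improves the regularized objective, so the tree keeps growing at this node.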

### 2.4 Shapley Additive Explanations for XGBoost Decision Trees.

SHAP is an additive feature attribution method with an explanation model $g$ defined as a linear function of binary features as in the following equation:

$$g(z) = \varphi_0 + \sum_{l=1}^{L} \varphi_l z_l \tag{9}$$

where $z \in \{0,1\}^{L}$, $L$ is the length of the DSF vector and $\varphi_l \in \mathbb{R}$ is the feature attribution value. The variables $z_l$ represent a feature being observed ($z_l = 1$) or unknown ($z_l = 0$), and $S$ contains the set of nonzero indexes in $z$. To compute the feature attribution value $\varphi_l$ for each input feature, SHAP uses the classic Shapley values from game theory as in the following equation:

$$\varphi_l = \sum_{S \subseteq N \setminus \{l\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \big[ f_x(S \cup \{l\}) - f_x(S) \big] \tag{10}$$

where $N$ is the set of all input features.
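For a small number of features, the Shapley values can be computed exactly by brute-force enumeration of coalitions, which also makes the local accuracy property easy to verify. The toy model and background point below are illustrative; in practice, the SHAP library's TreeExplainer computes these attributions efficiently for tree ensembles:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley values by enumerating all coalitions S (exponential
    in the number of features). An 'unknown' feature (z_l = 0) is replaced
    by its background value."""
    n = len(x)
    def f_S(S):                           # model output with features in S known
        z = background.copy()
        idx = list(S)
        z[idx] = x[idx]
        return f(z)
    phi = np.zeros(n)
    for l in range(n):
        rest = [j for j in range(n) if j != l]
        for k in range(n):
            for S in combinations(rest, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[l] += w * (f_S(S + (l,)) - f_S(S))
    return phi

f = lambda z: 2.0 * z[0] + z[1] * z[2]    # toy model with an interaction term
x = np.array([1.0, 2.0, 3.0])
bg = np.zeros(3)                           # hypothetical background point
phi = shapley_values(f, x, bg)

# Local accuracy: the attributions sum to f(x) - f(background)
print(np.isclose(phi.sum(), f(x) - f(bg)))  # True
```

Note how the interaction term $z_1 z_2$ is split evenly between features 1 and 2, while feature 0 receives exactly its additive contribution; this symmetry is a defining property of the Shapley values.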

## 3 Simulated Lumped Six Degrees-of-Freedom System

To evaluate the proposed framework in an application for VSHM, let us first consider a simple dynamic system. Figure 3 shows a six degrees-of-freedom (DOF) lumped mass-spring-damper system consisting of six masses *m*_{1} to *m*_{6} connected to each other by springs with stiffnesses *k*_{1} to *k*_{6} and dampers with coefficients *c*_{1} to *c*_{6}. The system is fixed to the ground on one side. All masses are assigned the same value of 2 kg and all damper coefficients the same value of 0.01 Ns/m.

The temperature *T* affects the spring stiffnesses *k*_{i}. For simplicity, the rule in Eq. (11) is used, under which each stiffness is adjusted as a function of the simulated temperature. The system is excited by an impulse force applied to the first mass and, after the impulse, the masses vibrate freely. The resulting time series with 4350 samples at a sampling frequency of 1321 Hz are used for calculating the covariance, which is used as the DSF (see Sec. 2.1).

In total, we simulate 2000 observations. At each observation, the temperature is set according to Fig. 4. It ranges from approximately 7.5 °C to 15.5 °C, where extreme weather events are implemented as sudden temperature drops of 4 °C between observations 1100–1150 and 1400–1450. To simulate damage, after 1950 observations, the stiffness *k*_{3} is arbitrarily chosen and reduced by 5%. For each observation, the stiffnesses are set according to Eq. (11), and the DSF is calculated. The number of observations per simulated state is summarized in Table 1.
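The effect of the simulated damage on such a chain system can be sketched by assembling its mass and stiffness matrices and inspecting the natural frequencies. The stiffness values and the exact spring topology below are assumptions for illustration (not the parameters of the study), and the eigenfrequency view is a complement to the covariance-based DSFs actually used:

```python
import numpy as np

def chain_matrices(ks, m=2.0):
    """Mass and stiffness matrices for a lumped chain fixed to the ground
    at one end (assumed topology: k_1 ties mass 1 to the ground and
    k_{i+1} connects mass i to mass i+1)."""
    n = len(ks)
    M = m * np.eye(n)
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] += ks[i]                           # spring toward ground/previous mass
        if i + 1 < n:
            K[i, i] += ks[i + 1]                   # spring toward next mass
            K[i, i + 1] = K[i + 1, i] = -ks[i + 1]
    return M, K

def natural_frequencies_hz(ks):
    M, K = chain_matrices(ks)
    w2 = np.linalg.eigvalsh(np.linalg.solve(M, K))  # M is diagonal here
    return np.sqrt(w2) / (2 * np.pi)

k_healthy = np.full(6, 1.0e3)        # hypothetical stiffnesses, N/m
k_damaged = k_healthy.copy()
k_damaged[2] *= 0.95                 # 5% reduction of k_3, as in the simulation

f_h = natural_frequencies_hz(k_healthy)
f_d = natural_frequencies_hz(k_damaged)
print(np.all(f_d <= f_h + 1e-9))     # True: the frequencies can only decrease
```

Because the stiffness reduction makes the stiffness matrix smaller in the positive semidefinite sense, every natural frequency either decreases or stays the same, which is the signature the covariance-based DSFs pick up.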

### 3.1 Damage Detection: Mahalanobis Distance-Based Damage Index.

Following the framework for continuous monitoring presented in previous studies [27,31], the MD-based DI defined in Eq. (1) is computed with only a subset of healthy observations; in this case, the first 1250 are used as the reference for the inverse covariance matrix $\Sigma_D$. To identify novelties which are potentially related to damage, a threshold is defined based on the first 1250 observations, allowing for 20% outliers. This threshold is application-specific and can be defined based on the expertise of the operator and other factors such as economics, safety, etc. The DI in Fig. 5 shows an increase for observations 1100–1150 and 1400–1450, where the temperature drops by 4 °C. Further, the DI of observations 1950–2000 increases due to the 5% stiffness reduction of spring *k*_{3}.
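Allowing 20% outliers over the reference set amounts to taking the 80th percentile of the reference DI as the threshold, e.g. (with stand-in DI values):

```python
import numpy as np

# Threshold at the 80th percentile of the DI over the healthy reference
# observations, so that 20% of them are allowed to fall above it.
rng = np.random.default_rng(2)
di_reference = rng.chisquare(df=5, size=1250)        # stand-in reference DI values
threshold = np.percentile(di_reference, 80)
print(round(np.mean(di_reference > threshold), 2))   # 0.2
```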

### 3.2 Interpretability.

To understand the difference between novelties caused by the stiffness reduction and those caused by temperature effects, a regression model is built taking the temperature and DSFs as predictors and the MD-based DI as the prediction target. The XGBoost regression model is trained as described in Sec. 2.3 using observations from the healthy and damaged states of the system. The aim of adopting a regression model in this study is to reproduce the MD-based DI with high prediction accuracy. For this, the hyperparameters (i.e., maximum depth, minimum child weight, gamma, subsample, colsample by tree, alpha for L1 regularization, and learning rate) are identified by a random search. The search is performed until a performance accuracy of $R^2 > 0.99$ is reached, where $R^2$ is the square of the sample correlation coefficient between the predicted MD and the MD calculated as per Eq. (1). The calculated and predicted MD-based novelty indices are shown in Fig. 6, and the parameters are summarized in Table 2. For each randomly selected value in a given range, a maximum of 1000 iterations is set, and the training stops if the mean squared error does not improve for 50 iterations to avoid overfitting.
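The random search over the ranges in Table 2 can be sketched as below; the scoring function is a placeholder, whereas in the study it would train the XGBoost model and return $R^2$ against the calculated MD:

```python
import numpy as np

# Hyperparameter ranges and step sizes taken from Table 2
SEARCH_SPACE = {
    "max_depth":        np.arange(1, 11, 1),
    "min_child_weight": np.arange(1, 6, 1),
    "gamma":            np.arange(0.0, 1.01, 0.1),
    "subsample":        np.arange(0.1, 1.01, 0.1),
    "colsample_bytree": np.arange(0.0, 1.01, 0.1),
    "reg_alpha":        np.arange(0.01, 0.301, 0.01),
    "learning_rate":    np.arange(0.01, 0.301, 0.01),
}

def random_search(score_fn, n_trials=100, target_r2=0.99, seed=0):
    """Randomly sample hyperparameters from SEARCH_SPACE until the score
    (R^2 between predicted and calculated MD) exceeds target_r2."""
    rng = np.random.default_rng(seed)
    best_r2, best_params = -np.inf, None
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        r2 = score_fn(params)
        if r2 > best_r2:
            best_r2, best_params = r2, params
        if best_r2 > target_r2:           # early stop once R^2 > 0.99
            break
    return best_r2, best_params

# Placeholder scorer for illustration only (peaks at max_depth = 7);
# in the study this would fit an XGBoost regressor per sampled setting.
r2, params = random_search(lambda p: 1.0 - 0.01 * abs(p["max_depth"] - 7))
```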

| Hyperparameter | Description | From | To | Interval | 6DOF best parameters |
|---|---|---|---|---|---|
| Maximum depth | The maximum depth to which each tree is built | 1 | 10 | 1 | 7 |
| Minimum child weight | The minimum sum of instance weight of all the observations required in a child | 1 | 5 | 1 | 3 |
| Gamma | Controls the minimum loss reduction required to make a node split | 0 | 1 | 0.1 | 0 |
| Subsample | The fraction of observations randomly sampled for each tree | 0.1 | 1 | 0.1 | 0.4 |
| Colsample by tree | The fraction of features selected to build each tree | 0 | 1 | 0.1 | 0.6 |
| Regularization alpha | L1 regularization value on weights | 0.01 | 0.3 | 0.01 | 0.19 |
| Learning rate | Step size on updating the weights | 0.01 | 0.3 | 0.01 | 0.22 |


Figure 7(a) summarizes the Shapley values obtained for the 6DOF lumped mass-spring-damper system XGBoost model for observations 1–1950, while Fig. 7(b) refers to the summary of observations 1950–2000. The features are ranked in descending order. It can be seen that the variance of sensor 3 has the largest marginal contribution to the model's prediction and the covariance of sensors 3 and 5 the lowest. On the horizontal axis, the contribution of each feature can be observed, indicating a positive or a negative impact on the model's prediction. For example, the temperature feature in Fig. 7(a) shows that low values (colored blue) have a positive impact on the model's prediction, i.e., they increase the model's predicted index, whilst higher values have a negative impact, decreasing the model's predicted index. This suggests that low temperatures cause a high MD-based DI and vice versa. Looking at the last 50 observations in Fig. 7(b), where the stiffness reduction is introduced, the contribution of the temperature to these predictions drops from rank 4 to rank 6. This indicates that the predictions for these observations are not dominated by the influence of temperature on a global scale.

Looking at the local Shapley values $\varphi_{\mathrm{temp}}$ for the temperature in Fig. 8, it can be observed that observations 1100–1150 and 1400–1450 have particularly high Shapley values. This indicates that the novelties at these observations are caused by temperature. Conversely, for observations 1950–2000, the temperature has a negative impact on the model's prediction, indicating that these novelties did not occur due to the influence of temperature.
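This reading can be condensed into a simple post-detection rule. Note that this heuristic is our illustration and not part of the proposed framework; the feature names are hypothetical:

```python
def attribute_novelty(phi_row, feature_names, eov=("temperature",)):
    """Attribute a flagged observation to an EOV when that EOV's Shapley
    value is positive AND the largest contribution; otherwise keep it as
    a potential damage indication (illustrative heuristic only)."""
    phi = dict(zip(feature_names, phi_row))
    top = max(phi, key=phi.get)
    if top in eov and phi[top] > 0:
        return f"likely false alarm ({top})"
    return "potential damage"

names = ["var_s3", "cov_s3_s5", "temperature"]       # hypothetical features
print(attribute_novelty([0.10, 0.05, 0.90], names))  # likely false alarm (temperature)
print(attribute_novelty([0.80, 0.10, -0.20], names)) # potential damage
```

In practice, such a rule would be one input to an operator's decision rather than an automatic verdict, since unmeasured EOVs can also drive the DSF contributions.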

Figures 9(a) and 9(b) show the Shapley values for observations 1100–1150 and 1400–1450, respectively, in a global overview of the feature contributions. In these figures, it is clear that temperature is the most influential feature for the predictions. The temperature drops of 4 °C introduced into the simulated 6DOF system, which were responsible for the increased DI, can thus be identified as false alarms by observing the Shapley values.

## 4 Case Study: Operational Wind Turbine V27

In this section, we demonstrate the capabilities of the proposed framework to detect damage and explain the DI in a continuous monitoring system installed in an operating wind turbine. The experiment was conducted by Tcherniak et al. [27] between November 28, 2014 and March 12, 2015. In the experiment, one blade of the operating Vestas V27 wind turbine in Fig. 11(a) was equipped with 11 accelerometers. For this study, the eight accelerometers shown in Fig. 10, placed on the leading and trailing edges, are considered. Additionally, the electromechanical actuator shown in Fig. 11(c) was mounted close to the root of the blade to excite it. During the monitoring campaign, a total of 24,693 observations were recorded, covering a healthy, a damaged, and a repaired WTB. Artificial damage was introduced by creating a trailing edge opening and gradually expanding it from an initial length of 15 cm in damage scenario 1 (see Fig. 11(b)) to 30 cm in scenario 2 and 45 cm in scenario 3. Finally, the WTB was repaired and further observations were recorded.

The framework is demonstrated on observations corresponding to the 32 rpm operational mode of the turbine. For the purpose of damage detection, the data is divided into two sets, namely, healthy and damaged. The healthy set contains solely observations recorded after the repair (2639 observations), as there are nearly three times more of these than observations recorded prior to the introduction of the artificial damage. The damaged set corresponds to the observations recorded during the three damage scenarios. Further, the observations are filtered by a bandpass filter with cutoff frequencies of 700 and 1200 Hz. A summary of the number of observations in each class is presented in Table 3.
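The band-pass filtering step might look as follows; the filter order and the sampling frequency `fs` are assumptions, as only the cutoff frequencies (700 and 1200 Hz) are given in the text:

```python
import numpy as np
from scipy import signal

fs = 8192.0                               # assumed sampling frequency (Hz)
# Zero-phase Butterworth band-pass between the stated cutoffs
sos = signal.butter(4, [700.0, 1200.0], btype="bandpass", fs=fs, output="sos")

t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 900 * t)  # 100 Hz + 900 Hz
y = signal.sosfiltfilt(sos, x)            # filtfilt avoids phase distortion

# The in-band 900 Hz component survives while the 100 Hz component is removed
print(0.5 < np.std(y) < 0.9)              # True
```

Zero-phase filtering (`sosfiltfilt`) is chosen in this sketch so that the phase relationships between sensor channels, which the covariance DSFs rely on, are not distorted.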

| | Healthy: Reference | Healthy: Testing | Damage: 15 cm | Damage: 30 cm | Damage: 45 cm | Total |
|---|---|---|---|---|---|---|
| Observations | 2000 | 639 | 66 | 117 | 105 | 2927 |


A study on the minimum number of observations required is not performed, as only a limited amount of data is available. Generally, if observations are influenced by EOVs in a similar manner several times, a consistent pattern can be learned, yielding more robust interpretability.

### 4.1 Damage Detection: Mahalanobis Distance-Based Damage Index.

The DI shown in Fig. 12 is computed in the same manner as described in Sec. 2.2 with the obtained covariance-based DSF vectors. The first 2000 observations are used to create a reference to compute the MD as shown in Eq. (1). Outliers in the DI can be identified at around observations 360–675 and 1080–1215 (highlighted in gray). These observations will be the focus of further interpretation, as their DI causes false alarms and could thus be misinterpreted as damage.

### 4.2 Interpretability.

To understand the potential cause of these novelties and how they differ from the identified damage after observation 2639, an XGBoost regression model is built. The DSFs and measured EOVs, namely, temperature, wind direction, and wind speed, are included as predictors. The hyperparameter search procedure is adopted from Sec. 3.2, with the resulting hyperparameters shown in Table 4. The model's predictions, with $R^2 > 0.99$, are shown in Fig. 13.

| Hyperparameter | Description | From | To | Interval | V27 best parameters |
|---|---|---|---|---|---|
| Maximum depth | The maximum depth to which each tree is built | 1 | 10 | 1 | 8 |
| Minimum child weight | The minimum sum of instance weight of all the observations required in a child | 1 | 5 | 1 | 1 |
| Gamma | Controls the minimum loss reduction required to make a node split | 0 | 1 | 0.1 | 0 |
| Subsample | The fraction of observations randomly sampled for each tree | 0.1 | 1 | 0.1 | 0.6 |
| Colsample by tree | The fraction of features selected to build each tree | 0 | 1 | 0.1 | 0.5 |
| Regularization alpha | L1 regularization value on weights | 0.01 | 0.3 | 0.01 | 0.08 |
| Learning rate | Step size on updating the weights | 0.01 | 0.3 | 0.01 | 0.14 |


The global influence, i.e., the Shapley values, of the DSFs and EOVs is shown in Figs. 14(a) and 14(b). The Shapley values presented for the healthy state of the WTB in Fig. 14(a) show that the variance of accelerometer 7 has the highest influence on the model's predictions, followed by temperature. This indicates that the DI is highly influenced by temperature while the turbine has no artificial damage introduced. We can also identify that low temperature values (see Fig. 15(a)) have a positive influence, suggesting that low temperatures cause a high MD-based DI. This is consistent with the SHAP results obtained from the simulation in Sec. 3.2, where the drops in temperature caused an increase in the MD-based DI. EOVs such as wind direction and wind speed have a rather small influence on the model's prediction. Conversely, the significance of the temperature drops for observations 2639–2927, as shown in Fig. 14(b). The majority of temperature Shapley values for observations during the damaged state of the WTB fluctuate around or below 0, indicating a low influence on the model's prediction. This leads to the conclusion that these novelties are not caused by the effect of temperature.

For local interpretability, and to gain insight into the contribution of the temperature measurements shown in Fig. 15(a) to the model's predictions, the corresponding Shapley values for each observation are shown in Fig. 15(b). Looking into the false alarms previously identified between observations 360–675 and 1080–1215, it is clear that these outliers are driven by the positive Shapley value contributions of temperature to the model's outputs. For the observations after 2639, when the artificial damage was introduced, the temperature effect is rather negligible, with Shapley values fluctuating around zero.

Focusing on observations 360–675 in Fig. 16(a) and 1080–1215 in Fig. 16(b), the Shapley values for temperature dominate over all other features, confirming the influence of EOVs on the model's outputs. Low temperature values have a positive impact on the prediction, whereas higher values have a rather low impact.

False alarms are also present around observation 2400, as can be seen in Fig. 12. Figure 16(a) shows that the temperature peak at 12 °C correlates with these false alarms. However, Fig. 16(b) shows that the contribution of temperature to the predictions is rather minor relative to, e.g., the contributions around observations 1200–1300. Figure 17 shows that four other DSFs had higher contributions than temperature for observations 2380–2420. This suggests that EOVs which were not measured or explicitly considered in the model may have influenced the novelty index. This represents a source of uncertainty and, thus, a limitation of the proposed framework.
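The triage applied in the discussion above, i.e., checking whether a flagged observation's positive Shapley contributions are dominated by measured EOVs or by DSFs, can be sketched as a simple rule. The function, feature names, and example rows are hypothetical illustrations, not part of the paper's framework:

```python
def triage_alarm(shap_row, eov_names, dominance=0.5):
    """Classify a flagged observation: 'EOV-driven' when measured EOVs
    account for at least `dominance` of the total positive Shapley
    contribution, otherwise 'potential damage'.
    `shap_row` maps feature name -> Shapley value for one observation."""
    positive = {k: v for k, v in shap_row.items() if v > 0}
    total = sum(positive.values())
    if total == 0:
        return "no positive drivers"
    eov_share = sum(v for k, v in positive.items() if k in eov_names) / total
    return "EOV-driven" if eov_share >= dominance else "potential damage"

eovs = {"temperature", "wind_speed", "wind_direction"}

# Hypothetical per-observation Shapley rows, mimicking the two situations
# discussed above: a temperature-driven false alarm and a DSF-driven novelty.
false_alarm = {"temperature": 1.8, "var_acc7": 0.3, "wind_speed": 0.1}
damage_case = {"temperature": 0.05, "var_acc7": 2.1, "var_acc3": 0.9}

print(triage_alarm(false_alarm, eovs))  # temperature dominates -> EOV-driven
print(triage_alarm(damage_case, eovs))  # DSFs dominate -> potential damage
```

The dominance threshold is a design choice; as the 12 °C case shows, such a rule can only account for EOVs that were actually measured.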

Machine learning techniques are based on pattern recognition, i.e., a particular behavior or instance has to occur multiple times before the algorithm is able to recognize a pattern. A temperature of 12 °C occurred only once during the measurement campaign and, thus, a longer measurement campaign encompassing a wider range of environmental conditions may be necessary to address this uncertainty and provide clearer interpretations of the predictions.

## 5 Conclusions

This study introduced a data-driven framework to distinguish novelties due to the influence of EOVs from those caused by damage in a structure. This framework addresses the lack of interpretability and transparency in ML techniques used for VSHM. The proposed data-driven framework was first evaluated on a six degrees-of-freedom lumped mass-spring-damper system with simulated temperature variabilities and, later, demonstrated in a full-scale experiment on an in-operation wind turbine with artificially introduced damage in one of its blades.

The results indicate that ML models such as XGBoost can successfully learn the relationship between DSFs, EOVs, and Mahalanobis distance-based damage indices. The complementary use of interpretable ML, in this case, the SHAP approach, enabled the determination of the contributions of the DSFs and EOVs to the predictions from the ML model. This process provided transparency and eased the understanding of the influence of EOVs on the damage indices. The SHAP method estimated the marginal contributions of EOVs such as temperature and enabled the identification of false positives. The results suggest that the proposed framework can strengthen confidence in the decision-making process and contribute to the widespread adoption of ML techniques for industrial applications. Furthermore, the proposed framework can increase reliability and safety and reduce operation and maintenance costs in the wind industry.

## Nomenclature

- CNNs = convolutional neural networks
- DI = damage index
- DSF = damage sensitive features
- EOV = environmental and operational variabilities
- LIME = local interpretable model-agnostic explanations
- MD = Mahalanobis distance
- ML = machine learning
- MSE = mean squared error
- O&M = operation and maintenance
- SHAP = Shapley Additive exPlanations
- SHM = structural health monitoring
- WT = wind turbine
- XGBoost = eXtreme gradient boosting

## References

*New Trends in Vibration Based Structural Health Monitoring*