Abstract

A traditional approach to predicting the remaining useful life (RUL) of equipment and other assets has been to construct data-driven and model-based ensembles of identical predictors. This ensemble approach may perform well on quality data collected from laboratory tests but may ultimately fail when deployed in the field because of higher-than-expected noise, missing measurements, and different degradation trends. In such operating environments, the high similarity of the predictors can lead to large under-/overestimates of RUL, where the ensemble is only as accurate as the predictor that under-/overestimated the RUL the least. In response to this, we investigate whether an ensemble of diverse predictors can predict RUL consistently and accurately by dynamically aggregating the predictions of algorithms that perform differently under the same conditions. We propose improving ensemble model performance by (1) using a combination of diverse learning algorithms that are found to perform differently under the same conditions and (2) training a data-driven model to adaptively estimate the prediction weight each predictor receives. The proposed methods are compared to three existing ensemble prognostics methods on open-source run-to-failure datasets from two popular systems of prognostics research: lithium-ion batteries and rolling element bearings. Results indicate that the proposed ensemble method provides the most consistent prediction accuracy and uncertainty estimation quality across multiple test cases, whereas the individual predictors and ensembles of identical predictors tend to provide overconfident predictions.

1 Introduction

Accurately predicting the remaining useful life (RUL) of equipment and other assets deployed in the field is critically important, especially in settings where failure can lead to costly downtime and threaten the safety of operators [1–3]. Determining when maintenance may be needed requires first defining (1) a health index (HI), typically derived from direct measurements (voltage, vibration, sound, power, etc.) of the equipment in question, and (2) a suitable HI threshold that, once exceeded, indicates the equipment has failed [4–6]. Note that we changed the term “health indicator” used in the conference paper [7] to “health index” in this paper for better consistency with earlier relevant studies on ensemble prognostics. The health of the asset is then tracked with respect to the end-of-life (EOL) threshold, and the RUL is generally estimated in units of time for easy interpretation of when maintenance may need to be performed [8,9].

Various model-based and data-driven methods have been proposed to accurately estimate the RUL of equipment operating in the field. However, accurately estimating RUL is challenging because the recorded measurements are often clouded with noise, and the degradation trajectory is highly influenced by the operating conditions. The traditional model-based approach to estimating RUL has been to update and extrapolate a mathematical model, such as a stochastic process model [10,11], empirical model [12,13], or physics-based model [14–16], which represents the observed HI trajectory (or degradation trajectory). This is usually done using a filtering algorithm such as a Kalman filter (KF) [17–19], a particle filter [8,20,21], or one of their variants [22–25]. In this approach, the optimal model hyperparameters (variance terms, empirical model parameters) are inferred from earlier measurements and are updated as more data are collected. Data-driven methods are another popular category of methods for predicting RUL. In this approach, readily available offline run-to-failure data are used to train a machine learning model, which is then used to predict the RUL of an online test unit by leveraging the correlations learned from the offline training data. Machine learning techniques are able to predict RUL by directly mapping from input features to RUL or by extrapolating the historical HI/degradation parameter measurements into the future. Popular data-driven machine learning models used for RUL prediction include artificial neural networks [26–28], support/relevance vector machines (S/RVMs) [29–31], convolutional neural networks [32–34], and recurrent neural networks (RNNs) [35–37]. Each of these models has been shown to provide accurate RUL predictions, and some models have even been trained to estimate RUL uncertainty as well. However, the limited accuracy and adaptability of single-model RUL prediction methods, whether model-based or data-driven, remain a major criticism of these methods. In many cases, the observed degradation trends of equipment and other assets are found to be highly nonlinear and time-varying, making it difficult for a single model to remain accurate over the entire lifespan of the asset.

To address the issues associated with single-model prognostics methods, researchers have proposed forming ensemble predictors, where the final RUL prediction is a combination of the predictions from individual models. Researchers in Refs. [35,36] proposed using an ensemble of long short-term memory (LSTM) RNN models where the final prediction was aggregated using a Bayesian method. Similar to Refs. [35,36], our earlier work in Ref. [38] used two separate ensembles of LSTM RNNs. In this method, the first ensemble predicted the RUL, and the second ensemble applied a correction based on the current value of the HI. The two-step prediction process was found to be more accurate than a single ensemble because the error correction could be learned effectively from available offline run-to-failure data.

Despite the promise shown in our earlier work, we observed that under certain degradation conditions, the use of an ensemble did not improve accuracy, as all the predictors in the ensemble would either under- or overestimate the RUL (depending on conditions). The lack of diversity among the predictors in the ensemble and the simple model-averaging scheme used to combine the predictions sometimes led to errant RUL predictions. To improve model robustness and prevent the aforementioned scenarios, researchers have proposed increasing ensemble diversity by including multiple different types of model-based and data-driven models. Likewise, new adaptive weighting schemes have been proposed to replace the more traditional equal-weight averaging methods. In Ref. [39], researchers tested three different weighting schemes for combining predictors in a prognostic ensemble. The three weighting schemes, namely accuracy-based, diversity-based, and optimization-based, were used to pre-compute a set of weights for each predictor in an ensemble. During online prediction, the weights remain constant, which is better known as “degradation-independent weighting.” These degradation-independent weighting schemes can improve prognostic accuracy when (1) each predictor in an ensemble performs similarly across the lifetime of a training/test unit, and (2) the training and test data have similar and uniform degradation trajectories. In a later study, researchers in Ref. [40] proposed a “degradation-dependent” weighting scheme, designed to update the weights for each ensemble predictor depending on the system's health status. The proposed method categorized the system's health index into three regions and precomputed an optimal set of weights for combining the predictors' predictions in an ensemble. The adaptive three-stage weighting scheme outperformed a traditional ensemble method because the weights were optimized considering the additional dimension of health status. A limitation of this degradation-dependent weighting scheme, or more precisely, degradation stage-dependent weighting scheme, is that it requires dividing the lifetime of a training/test unit into a finite number of discrete degradation stages, which inevitably involves subjectivity in choosing the number of degradation stages and picking the HI thresholds for defining these stages. Additionally, the weight of each predictor in an ensemble is kept constant within each degradation stage, and due to the use of an often small number (≤ 5) of degradation stages, the weight versus HI relationship is limited to a piecewise constant function, which may not be optimal in complex engineering applications.

Most similar to the methods and ideas proposed in this paper are those from Refs. [41,42]. In Ref. [41], researchers investigated an ensemble of LSTM RNN models where each model was trained using a different-length sliding window of historical degradation measurements to increase prediction diversity and improve ensemble accuracy. The method was shown to outperform single models and ensembles of RNNs with an identical window size of input data. In Ref. [42], the researchers investigated whether better RUL prediction performance can be achieved by combining predictions from diverse individual models that overestimate and underestimate the RUL. The authors also implemented a degradation-dependent weighting scheme by assigning model weight based on the current perceived level of degradation.

To further improve prognostic model adaptability and RUL prediction accuracy, we propose using an ensemble of diverse predictors where the final RUL prediction is adaptively combined using a set of weights determined by a data-driven model. Similar to Refs. [40,42], our method aims to consider the effects of degradation on the accuracy of the individual prognostic models in the ensemble. At any given time during prediction, the predictors in an ensemble are designed to produce different RUL predictions, either consistently greater than or less than the true RUL. By producing RUL predictions that are both greater and less than the true RUL, there exists an optimal weighted sum of the predictors’ predictions that exactly equals the true RUL. To exploit this, we propose offline training of a Gaussian process regression (GPR) surrogate model to learn the optimal weight each predictor should receive throughout the duration of the RUL prediction. The use of a GPR surrogate model is one minor difference from the conference paper [7] where we had previously used a feedforward neural network. Then, during online operation, the trained surrogate model is used to estimate the weight each predictor should receive, effectively improving the overall accuracy of the ensemble. We consider three base RUL predictors to study the interaction between model diversity and the health index, where the GPR surrogate model is used to learn the dependence. The three models, an exponential unscented Kalman filter (EUKF), a GPR forecasting model, and an LSTM RNN, differ in how they predict RUL. Note that the GPR forecasting model is different from the GPR surrogate model (from this point on we refer to the GPR surrogate model as just a “surrogate model” to avoid confusion). We would like to note that in the conference version of this paper [7], we proposed the dynamic weighting scheme for two diverse models: EUKF and GPR. In the journal version, we further developed the conference paper [7] by (1) enhancing the generality of the model by including an LSTM RNN as a third diverse model, (2) comparing our dynamic degradation-aware weighting scheme to other weighting schemes from literature, and (3) providing source codes (GitHub link4) for the literature-based weighting schemes evaluated on an open-source battery dataset.

The remainder of the paper is organized as follows. Section 2 covers the methods and implementation of the proposed dynamically weighted ensemble and other literature-based weighting schemes. Section 3 outlines the two open-source datasets used to evaluate the performance of the proposed method. Section 4 discusses the results, and Sec. 5 presents concluding remarks.

2 Methods

In this section, we formulate the proposed dynamic degradation-dependent ensemble (DDDEn) along with three other contemporary ensemble methods.

2.1 Proposed Approach (DDDEn).

The schematic of the proposed dynamically weighted ensemble for RUL prediction is shown in Fig. 1. First, several diverse prognostic models are identified, followed by hyperparameter optimization of the individual models with the goal of minimizing the RUL prediction error on a training dataset. For any model $M_i$, this can be formally stated as
$$\min_{P_i} \ \mathrm{RMSE}(L, \hat{L}_i) = \sqrt{\frac{1}{N_{\mathrm{train}}} \sum_{n=1}^{N_{\mathrm{train}}} \left(L_n - \hat{L}_{i,n}\right)^2}$$
(1)
$$\text{subject to} \quad P_i^{\min} \leq P_i \leq P_i^{\max}$$
(2)
where $L$ and $\hat{L}_i$ are two RUL vectors that respectively consist of the true and predicted RUL of a training set of $N_{\mathrm{train}}$ samples (with the accumulated run-to-failure data of all the entities). The hyperparameter set $P_i$ is bounded by $P_i^{\min}$ and $P_i^{\max}$. Each model $M_i$ provides RUL predictions that are either consistently greater or less than the true RUL. At each prediction instance, a weighted sum of the individual model predictions can theoretically provide perfect RUL prediction accuracy. We then determine the best weights $w_n$ at each prediction instance of the training dataset as follows:
$$\min_{w_n} \ \mathrm{RMSE}(L, \hat{L}^{\mathrm{eff}})$$
(3)
$$\text{where} \quad \hat{L}_n^{\mathrm{eff}} = \sum_{i=1}^{M} w_{i,n} \hat{L}_{i,n} \quad \text{subject to} \quad 0 \leq w_{i,n} \leq 1, \ \ \sum_{i=1}^{M} w_{i,n} = 1$$
(4)
Fig. 1 Schematic of the proposed method

The effective RUL $\hat{L}_n^{\mathrm{eff}}$ at each time instance is obtained by weighting the individual model predictions. Finally, a surrogate model is built to predict $w_n = g(\mathrm{HI}_n, \hat{L}_{1,n}, \ldots, \hat{L}_{M,n})$ at each time instance.

In this paper, we explore the utility of the proposed approach using the EUKF, GPR, and LSTM predictor models. We carry out the hyperparameter optimization formulated in Eq. (1) using a grid search with Latin hypercube sampling. The optimized weights $w_n$ at each time instance are obtained using MATLAB's fmincon function. The surrogate model $g$ is trained as a GPR model. We note that both the EUKF and GPR provide uncertainty in their RUL predictions, leading to an approximately normal distribution $\hat{L} \sim \mathcal{N}(\mu_{\hat{L}}, \sigma_{\hat{L}})$ in Eqs. (1)–(4). The distribution of $\hat{L}_n^{\mathrm{eff}}$ in Eq. (4) can be approximated as $\hat{L}_n^{\mathrm{eff}} \sim \mathcal{N}(\mu_{\hat{L}_n^{\mathrm{eff}}}, \sigma_{\hat{L}_n^{\mathrm{eff}}})$, where
$$\mu_{\hat{L}_n^{\mathrm{eff}}} = \sum_{i=1}^{M} w_{i,n}\, \mu_{\hat{L}_{i,n}}, \qquad \sigma_{\hat{L}_n^{\mathrm{eff}}}^{2} = \sum_{i=1}^{M} w_{i,n} \left(\mu_{\hat{L}_{i,n}}^{2} + \sigma_{\hat{L}_{i,n}}^{2}\right) - \mu_{\hat{L}_n^{\mathrm{eff}}}^{2}$$
(5)

Although the calculation of RMSE depends only on the mean RUL, we evaluate the uncertainty quantification capabilities of all the models. Further, the stochastic nature of the EUKF provides good epistemic uncertainty. Therefore, during weighting, we consider an ensemble of EUKFs as $M_1$, the GPR as $M_2$, and the LSTM as $M_3$.
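To make the offline procedure concrete, the sketch below mirrors Eqs. (3)–(5), assuming SciPy's SLSQP solver in place of MATLAB's fmincon and scikit-learn's GaussianProcessRegressor for the surrogate; it is an illustrative Python substitute rather than the exact implementation used in this work.

```python
# Illustrative Python substitutes for the offline steps of Sec. 2.1: SciPy's
# SLSQP in place of MATLAB's fmincon (Eqs. (3)-(4)), the mixture moments of
# Eq. (5), and a scikit-learn GPR surrogate for w_n = g(HI_n, L_hat_1n, ..., L_hat_Mn).
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor

def optimal_weights(L_true_n, preds_n):
    """Best convex weights at one prediction instance (Eqs. (3)-(4))."""
    M = len(preds_n)
    objective = lambda w: (L_true_n - np.dot(w, preds_n)) ** 2
    constraint = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
    res = minimize(objective, np.full(M, 1.0 / M), method="SLSQP",
                   bounds=[(0.0, 1.0)] * M, constraints=constraint)
    return res.x

def ensemble_moments(w, mu, sigma):
    """Mean and standard deviation of the weighted Gaussian mixture, Eq. (5)."""
    mu_eff = np.dot(w, mu)
    var_eff = np.dot(w, mu ** 2 + sigma ** 2) - mu_eff ** 2
    return mu_eff, np.sqrt(var_eff)

# Offline: stack [HI_n, L_hat_1n, ..., L_hat_Mn] as surrogate inputs X and the
# optimal weights w_n as targets Y, then fit the surrogate g.
# surrogate = GaussianProcessRegressor(normalize_y=True).fit(X, Y)
# Online:  w = surrogate.predict(np.r_[hi_now, preds_now].reshape(1, -1))[0]
```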

2.2 Degradation-Independent Ensemble.

In this method, we take inspiration from Refs. [39,40] and assign constant weights to each model Mi over the entire prediction horizon. These weights are determined by minimizing an RUL prediction error on the training dataset (optimization-based weighting) as follows:
$$\min_{w} \ \mathrm{RMSE}(L, \hat{L}^{\mathrm{eff}}) \quad \text{where} \quad \hat{L}_{1:N_{\mathrm{train}}}^{\mathrm{eff}} = \sum_{i=1}^{M} w_i \hat{L}_{i,1:N_{\mathrm{train}}} \quad \text{subject to} \quad 0 \leq w_i \leq 1, \ \ \sum_{i=1}^{M} w_i = 1$$
(6)

2.3 Degradation Stage-Dependent Ensemble.

Due to differences in individual predictor model architectures, each model can excel at RUL prediction at a different stage of degradation. To this end, Li et al. [40] proposed discretizing the HI into $S$ stages so that the weight vector for the individual predictors in an ensemble can be optimized within each stage individually. In other words, for the weight vector $w = [w^1, w^2, \ldots, w^S]$, the model weights for each stage, $w^s = [w_1^s, \ldots, w_M^s]$, $s = 1, \ldots, S$, can be determined as
$$\min_{w^s} \ \mathrm{RMSE}(L^s, \hat{L}^{s,\mathrm{eff}})$$
(7)

In the two case studies presented later, we divide the HI into a total of S = 3 stages, with stage 3 being the closest to EOL.
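As a rough illustration of this stage-dependent weighting, the sketch below discretizes the HI with two hypothetical cutoffs and solves Eq. (7) separately for each stage; the cutoff values, the SLSQP solver, and the helper names are assumptions made for illustration, not the settings used in the case studies.

```python
# Stage-dependent weighting (DSDEn) sketch: assign each training instance to one
# of S = 3 stages by its HI value, then optimize one weight vector per stage.
import numpy as np
from scipy.optimize import minimize

def stage_of(hi, cutoffs=(0.93, 0.87)):
    # Hypothetical HI cutoffs for S = 3 stages (stage 3 is closest to EOL).
    return 0 if hi > cutoffs[0] else (1 if hi > cutoffs[1] else 2)

def fit_stage_weights(L_true, L_pred, hi, n_stages=3):
    """L_pred: (n_samples, M) matrix of individual predictions; returns an
    (n_stages, M) array of per-stage weight vectors, Eq. (7)."""
    M = L_pred.shape[1]
    stages = np.array([stage_of(h) for h in hi])
    W = np.zeros((n_stages, M))
    for s in range(n_stages):
        idx = stages == s
        rmse = lambda w: np.sqrt(np.mean((L_true[idx] - L_pred[idx] @ w) ** 2))
        cons = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
        res = minimize(rmse, np.full(M, 1.0 / M), method="SLSQP",
                       bounds=[(0.0, 1.0)] * M, constraints=cons)
        W[s] = res.x
    return W
```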

2.4 Diverse Predictors.

The ensemble techniques described in Secs. 2.1–2.3 require diverse individual models for RUL prediction. Although the selection of individual predictors may depend on the type of dataset, appropriate model selection is nevertheless required to ensure model diversity and prediction accuracy. Model diversity can stem from (1) prediction bias (some models often either under- or over-predict RUL) or (2) generalization performance (different models generalize better to different test samples that fall outside the training distribution), either consistently throughout the lifetime or consistently during one or multiple degradation stages. When model diversity comes from prediction bias, it is desirable to have a combination of both over- and under-predicting models so that the weighting schemes can theoretically achieve the true RUL, thereby increasing accuracy. As noted earlier, individual models may not be consistently biased towards under- or over-prediction during the entire lifetime. However, if there is such a bias in one or multiple local degradation stages, the DDDEn and degradation stage-dependent ensemble (DSDEn) models, which take the HI as an input, can assign weights to those local regions and achieve better overall performance. In this section, we briefly present three diverse predictors, namely EUKF, GPR, and LSTM, which we use for the two case studies. However, the applicability of this study's findings is not limited to these three diverse predictors.

2.4.1 EUKF for Remaining Useful Life Prediction.

The KF is a well-known technique for estimating the unknown states of a system based on a history of measurements with statistical noise. The unscented KF uses sigma points to handle nonlinearity when estimating the system's states. For a discrete-time single-state, single-measurement dynamical system, the state and measurement equations at time t can be stated as
$$x_t = f(x_{t-1}) + \xi_{t-1}, \qquad \xi \sim \mathcal{N}(0, Q)$$
(8)
$$z_t = h(x_t) + v_t, \qquad v \sim \mathcal{N}(0, R)$$
(9)
where $f$ and $h$ are the state transition and measurement functions, respectively, $x$ is the a priori state estimate, $z$ is the current measurement, and $\xi$ and $v$ are the process and measurement noise with covariances $Q$ and $R$, respectively. The unscented transformation creates sigma points around $x_{t-1}$ such that their mean is exactly $\bar{x}_{t-1}$ with covariance $P_{t-1}$. Each sigma point is propagated through the nonlinearity, and the transformed points are combined using a set of weights to give the state prediction $x_t$. For brevity, we do not provide a detailed explanation of the unscented Kalman filter (UKF); such explanations can be found in Refs. [43,44]. For the case studies explored in this paper, we use a combination of linear and exponential terms in the state transition equation as follows:
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}_t = \begin{bmatrix} x_1 + x_2 \Delta t + x_3 e^{x_4 \Delta t} \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}_{t-1} + \mathcal{N}(0, Q)$$
(10)
The measurement equation can be stated as
$$z_t = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}_t + \mathcal{N}(0, R)$$
(11)
Once the states are estimated based on the measurements from t = 0 to the current time t, the same states are used to forecast z until a pre-defined threshold value is reached. The ground truth and predicted RUL are, respectively, calculated as follows:
$$L_t = \mathrm{EOL} - t$$
(12)
$$\hat{L}_t = \widehat{\mathrm{EOL}}_t - t$$
(13)
where t is the current index (usually time or another measure), the hat operator ^ denotes an estimated value, and L denotes ground truth RUL.
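The sketch below shows one way to realize the EUKF of Eqs. (10)–(13) in Python using the third-party filterpy library; the library choice, initial state, and noise covariances are illustrative assumptions (the paper's implementation is in MATLAB).

```python
# Sketch of the EUKF in Eqs. (10)-(13) using the third-party filterpy library
# (an assumption; the paper's implementation is in MATLAB). The initial state
# and noise covariances below are hypothetical placeholders.
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

DT = 1.0  # one cycle (battery) or one measurement interval (bearing)

def fx(x, dt):
    # State transition of Eq. (10): HI evolves with linear + exponential terms;
    # the slope x2 and exponential coefficients x3, x4 drift only through Q.
    x1, x2, x3, x4 = x
    return np.array([x1 + x2 * dt + x3 * np.exp(x4 * dt), x2, x3, x4])

def hx(x):
    # Measurement model of Eq. (11): only the HI itself is observed.
    return np.array([x[0]])

points = MerweScaledSigmaPoints(n=4, alpha=1e-3, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=1, dt=DT, hx=hx, fx=fx, points=points)
ukf.x = np.array([1.0, -1e-4, 1e-6, 5e-3])      # illustrative initial state
ukf.P *= 1e-2                                   # initial state covariance
ukf.Q = np.diag([1e-8, 1e-10, 1e-10, 1e-8])     # process noise Q (hypothetical)
ukf.R = np.array([[1e-6]])                      # measurement noise R (hypothetical)

def predict_rul(hi_history, eol_threshold, horizon=3000):
    """Filter the observed HI, then forecast until the EOL threshold (Eq. (13))."""
    for z in hi_history:
        ukf.predict()
        ukf.update(np.array([z]))
    x = ukf.x.copy()
    for k in range(1, horizon + 1):
        x = fx(x, DT)
        if x[0] <= eol_threshold:   # decreasing HI (capacity); flip for an increasing HI (RMS)
            return k                # predicted RUL, in steps ahead of the current time
    return None                     # forecast never crossed the threshold
```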

2.4.2 Model-Based Remaining Useful Life Prediction Using Gaussian Process.

Similar to traditional filtering-based approaches to RUL prediction, GPR can be used to provide a probabilistic prediction of RUL. This is done by first fitting an underlying trend function that models the trend of the HI. Then, a GPR model is trained to represent the residual from the fitting process. To predict RUL, the trend function is extrapolated into the future and the GPR model is evaluated along the entire trajectory. To estimate RUL, as stated in Eq. (13), GPR is used to model the uncertainty around the underlying trend function. The GPR model is formulated as
$$f(t) \sim \mathcal{GP}\left(m(t), k(t, t')\right)$$
(14)
where $m(t)$ is the trend function and $k(t, t')$ is the kernel function of the GPR model. Each is further defined as
$$m(t) = \mathbb{E}[f(t)]$$
(15)
$$k(t, t') = \mathbb{E}\left[(f(t) - m(t))(f(t') - m(t'))\right]$$
(16)
In this study, we employ the popular squared exponential kernel function and use a second-order polynomial and an exponential + linear trend function to model the trend in the HI for the battery and rolling element bearing case studies, respectively. The trend function is represented as
$$m(t) = a t^2 + b t + c \ \ \text{(battery)}, \qquad m(t) = a e^{b t} + c t + d \ \ \text{(bearing)}$$
(17)
where the coefficients a, b, c, and d are determined when fitting the trend function to the available HI data. The new formulation for using GPR with the quadratic trend function is written as follows:
$$p(\mathrm{HI}_{t^*} \mid t^*, \mathcal{D}) = \mathcal{N}\left(\mathrm{HI}_{t^*} \mid M(t^*, \theta_M) + \mu_{t^*}, \ \sigma^2 + \Sigma_{t^*}\right)$$
(18)
where
$$\mu_{t^*} = K(t^*, t)\left(K(t, t) + \sigma^2 I_T\right)^{-1}\left(\mathrm{HI} - M(t, \theta_M)\right)$$
(19)
$$\Sigma_{t^*} = K(t^*, t^*) - K(t^*, t)\left[K(t, t) + \sigma^2 I_T\right]^{-1} K(t, t^*)$$
(20)

Above, $\theta_M$ denotes the parameters of an empirical degradation model (e.g., an empirical capacity fade model for the battery case study) which are to be determined from the training dataset $\mathcal{D}$, $\sigma^2$ is the noise variance of the training data, $\mathrm{HI}$ and $t$ are the HI measurements and their corresponding time indices from the dataset $\mathcal{D}$, and $\mu_{t^*}$ and $\Sigma_{t^*}$ are the predicted mean and variance of the HI at future time index $t^*$, respectively.
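A minimal sketch of this trend-plus-GPR construction is given below, assuming scikit-learn and the quadratic trend of the battery case study; the kernel hyperparameters and helper names are illustrative, not the exact settings used in the paper.

```python
# Fit the quadratic trend m(t) of Eq. (17), model its residuals with a GPR
# (squared exponential kernel), and forecast the HI per Eqs. (18)-(20).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_trend_plus_gpr(t, hi):
    """t, hi: 1D arrays of time indices and HI measurements for one unit."""
    coeffs = np.polyfit(t, hi, deg=2)                    # a, b, c of m(t) = at^2 + bt + c
    trend = np.poly1d(coeffs)
    residual = hi - trend(t)
    kernel = RBF(length_scale=50.0) + WhiteKernel(noise_level=1e-4)  # illustrative values
    gpr = GaussianProcessRegressor(kernel=kernel).fit(t.reshape(-1, 1), residual)
    return trend, gpr

def forecast_hi(trend, gpr, t_future):
    """Extrapolate the trend and add the GPR residual mean and std."""
    mu_res, std_res = gpr.predict(t_future.reshape(-1, 1), return_std=True)
    return trend(t_future) + mu_res, std_res
```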

2.4.3 LSTM for Remaining Useful Life Prediction.

LSTM RNN models excel at time series forecasting because their cell architecture consists of an internal memory gate that stores time-dependent information relevant to future predictions. Additionally, LSTM models can be flexibly scaled to process big data or run locally on a microcontroller [45]. For these reasons, LSTM models have been extensively studied for RUL prediction of lithium-ion batteries [36,46], rolling element bearings [38,47], and other engineered systems [48–50].

In this study, we use a single-layer LSTM architecture with 60 hidden units and a lookback window of ten time-steps. In keeping with the theme of probabilistic RUL predictors, the LSTM model is trained using a negative log-likelihood (NLL) loss function to output a mean and standard deviation that parameterize a Gaussian distribution over the predicted RUL. A detailed description of this probabilistic LSTM model can be found in our recent paper on bearing prognostics [38]. The negative log-likelihood loss function takes the following form:
$$-\log \mathcal{L}(\theta) = \sum_{n=1}^{N_{\mathrm{train}}} \left[ \frac{\log \sigma_{\hat{L}_n}^{2}(x_n; \theta)}{2} + \frac{\left(L_n - \mu_{\hat{L}_n}(x_n; \theta)\right)^2}{2 \sigma_{\hat{L}_n}^{2}(x_n; \theta)} \right]$$
(21)
where $L_n$ is the true RUL value of the $n$th training unit, and $\mu_{\hat{L}_n}(x_n; \theta)$ and $\sigma_{\hat{L}_n}^{2}(x_n; \theta)$ are the mean and variance of the RUL prediction $\hat{L}_n$, respectively, i.e., $\hat{L}_n(x_n; \theta) \sim \mathcal{N}\left(\mu_{\hat{L}_n}(x_n; \theta), \sigma_{\hat{L}_n}^{2}(x_n; \theta)\right)$, with $x_n$ being the input vector to the LSTM model and $\theta$ being the trainable LSTM model parameters.
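A minimal Keras sketch of this probabilistic LSTM is shown below; predicting the log-variance (rather than the standard deviation directly) is an implementation choice made here for numerical stability and is not necessarily how the authors parameterized their network.

```python
# Single-layer LSTM (60 units, 10-step lookback) trained with the Gaussian NLL of Eq. (21).
import tensorflow as tf

LOOKBACK, N_FEATURES = 10, 1

def gaussian_nll(y_true, y_pred):
    # y_pred packs [mu, log_var]; this is Eq. (21) up to an additive constant.
    mu, log_var = y_pred[:, 0:1], y_pred[:, 1:2]
    return tf.reduce_mean(0.5 * log_var + tf.square(y_true - mu) / (2.0 * tf.exp(log_var)))

inputs = tf.keras.Input(shape=(LOOKBACK, N_FEATURES))
h = tf.keras.layers.LSTM(60)(inputs)          # single LSTM layer with 60 hidden units
outputs = tf.keras.layers.Dense(2)(h)         # [mean RUL, log-variance of RUL]
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss=gaussian_nll)
# model.fit(X_train, y_train, ...) with X_train of shape (N_train, LOOKBACK, N_FEATURES)
```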

3 Datasets

3.1 Battery Dataset.

The dataset published in Ref. [51] consists of 124 commercial lithium iron phosphate/graphite (LFP) cells manufactured by A123. The complementary study published in Ref. [52] included an additional 45 LFP cells as part of the same experiments. The authors divided the cumulative dataset into four batches of roughly 45 cells each. We adopted the same data partitioning, and the four batches are denoted: training (41 cells), primary test (43 cells), secondary test (40 cells), and tertiary test (45 cells). One primary test cell experienced extremely fast degradation and was removed from the dataset following a recommendation by the authors in Ref. [51]. In this study, we treated the cell discharge capacity as the HI and tracked RUL with respect to an HI threshold. Prior to use, the discharge capacities of all cells in the dataset were normalized by dividing every measurement by the value of the first measurement. This treatment ensures that every cell starts at a normalized capacity of 1. Prediction of cell RUL begins when the normalized capacity falls below 97% of its initial value and ends when the normalized capacity reaches 80%. Some of the cells do not reach the normalized capacity threshold of 80%. For these cells, we linearly extrapolated the last 50 normalized discharge capacity data points until they reached the EOL threshold (roughly 50 additional cycles). Since many of the cells in this dataset exceed 1000 cycles before EOL, we subsampled the dataset by a factor of five, which effectively equates to performing RUL prediction every fifth cycle. The cells from the dataset are shown in Fig. 2.
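For illustration, the sketch below applies these preprocessing steps to one cell's capacity array; the helper name and the exact extrapolation bookkeeping are assumptions made for this sketch.

```python
# Normalize one cell's discharge capacity, locate the 97% start and 80% EOL
# thresholds, linearly extrapolate cells that never reach EOL (using the last
# 50 points), and subsample every fifth cycle.
import numpy as np

START_FRAC, EOL_FRAC, STRIDE = 0.97, 0.80, 5

def preprocess_cell(capacity):
    hi = np.asarray(capacity, dtype=float)
    hi = hi / hi[0]                                     # normalized capacity starts at 1.0
    if hi[-1] > EOL_FRAC:                               # extrapolate using the last 50 points
        slope = np.polyfit(np.arange(50), hi[-50:], 1)[0]
        extra = hi[-1] + slope * np.arange(1, 500)
        cross = np.argmax(extra <= EOL_FRAC)            # first extrapolated point at/below EOL
        hi = np.concatenate([hi, extra[:cross + 1]])
    start = int(np.argmax(hi < START_FRAC))             # first prediction cycle (below 97%)
    eol = int(np.argmax(hi <= EOL_FRAC))                # EOL cycle (80% of initial capacity)
    cycles = np.arange(start, eol + 1, STRIDE)          # RUL prediction every fifth cycle
    return cycles, hi[cycles], eol
```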

Fig. 2 Cell discharge capacity trajectories for the 169 LFP cells in the two open-source datasets

3.2 Bearing Dataset.

The Xi'an Jiaotong University and Changxing Sumyoung Technology Co., Ltd. (XJTU-SY) bearing dataset consists of run-to-failure vibration data of 15 rolling element bearings [53] (Table 1). The failure of these bearings was accelerated by applying large radial loads. The 15 bearings were divided into three groups of five bearings where each group was subject to a specific radial load and rotational speed. Two accelerometers mounted in the vertical (y) and horizontal (x) directions were used to gather vibration data for each bearing. Data were collected for 1.28 s every minute at a sampling frequency of 25.6 kHz. The first prediction time and EOL of each bearing were calculated using a threshold-based method described in Ref. [38].

4 Results and Discussion

In this study, we compare the RUL prediction accuracy and uncertainty quantification of the following models: (1) single EUKF, (2) single GPR with quadratic trend function, (3) ensemble of EUKF (En-EUKF), (4) standard ensemble of EUKF and GPR (En-EUKF + GPR) with equal importance to both En-EUKF and GPR, (5) DDDEn of En-EUKF and GPR (DDDEn-EUKF + GPR), (6) single LSTM, (7) simple, equally-weighted ensemble of all models, viz. En-EUKF, GPR, and En-LSTM, named as En-all, (8) degradation-independent ensemble (DIEn), (9) DSDEn, and (10) DDDEn. The first five models and results are exactly the same as those in our conference paper [7]. The additional models and results are an extension of the conference paper by adding an LSTM model and comparing our weighting scheme with other optimization-based weighting schemes from the prognostics literature. For all these models, hyperparameter optimization has been carried out as described in Sec. 2.1. These models are compared by evaluating the metrics described in Sec. 4.1 on the test dataset.

4.1 Prognostic Metrics.

First, we use the root-mean-square error (RMSE) metric to assess the mean prediction accuracy of each model. RMSE is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(L_n - \hat{L}_n\right)^2}$$
(22)
where $L_n$ and $\hat{L}_n$ are the true and predicted RUL at the $n$th of the $N$ prediction instances, respectively.

We briefly introduce a few other metrics which are useful in assessing the performance of an RUL prediction algorithm. We first define an accuracy zone around the true RUL using a threshold value, α. Briefly, any prediction within the accuracy zone is considered accurate and would be acceptable in the field. To assess a model's ability to closely predict the true RUL, we define α-accuracy as the percentage of RUL predictions that are within the accuracy zone. Likewise, to assess a model's uncertainty quantification performance, we calculate β as the average probability mass of the predicted RUL probability density function (PDF) which covers the α-accuracy zone. Ideal scores for α-accuracy and β are 100% and 1.0, respectively. These metrics are shown graphically in Fig. 3.
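A small sketch of these two metrics is given below, assuming Gaussian RUL PDFs and an accuracy zone of ±α times the true RUL (one common convention; the exact zone definition used here is the one illustrated in Fig. 3).

```python
# alpha-accuracy: fraction of mean RUL predictions inside the accuracy zone.
# beta: average probability mass of the predicted RUL PDF inside that zone.
import numpy as np
from scipy.stats import norm

def alpha_accuracy(L_true, mu_pred, alpha=0.2):
    lo, hi = L_true * (1 - alpha), L_true * (1 + alpha)
    return 100.0 * np.mean((mu_pred >= lo) & (mu_pred <= hi))

def beta_metric(L_true, mu_pred, sigma_pred, alpha=0.2):
    lo, hi = L_true * (1 - alpha), L_true * (1 + alpha)
    mass = norm.cdf(hi, mu_pred, sigma_pred) - norm.cdf(lo, mu_pred, sigma_pred)
    return float(np.mean(mass))
```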

Fig. 3 Visualization of the α-accuracy region and the β metric
To compare model predictions of RUL mean and uncertainty, we look at the negative log-likelihood, calculated as
$$\mathrm{NLL}(n) = \frac{\log \sigma_{\hat{L}_n}^{2}}{2} + \frac{\left(L_n - \mu_{\hat{L}_n}\right)^2}{2 \sigma_{\hat{L}_n}^{2}}$$
(23)

Although NLL is defined at each prediction instance n, in comparing the models, we show the median of NLL over the entire test set.

4.2 Remaining Useful Life Prediction of Lithium-Ion Batteries.

Results for the proposed method on the open-source battery dataset were obtained via five repeated runs where the model hyperparameters were reoptimized each run. The results were averaged over the repetitions and the test cells in each repetition.

Table 1

Summary of bearings and testing conditions in XJTU-SY dataset

Operating condition | Bearing IDs             | Speed (rpm) | Radial force (kN)
1                   | 1_1, 1_2, 1_3, 1_4, 1_5 | 2100        | 12
2                   | 2_1, 2_2, 2_3, 2_4, 2_5 | 2250        | 11
3                   | 3_1, 3_2, 3_3, 3_4, 3_5 | 2400        | 10

Figure 4 shows a snapshot of the capacity trajectory prediction(s) for primary test cell #6 by each model considered in this study. We also visualize each model's predicted RUL PDF(s). The single models (GPR, EUKF) tend to produce narrow RUL PDFs, while the ensemble models (En-EUKF, En-EUKF + GPR) produce wide PDFs. In particular, the RUL PDF prediction for the proposed method (En-EUKF + GPR) is observed to span the entirety of the two individual predictors’ PDFs. This is because the ensemble performs a weighted sum of the Gaussian PDFs, where the resulting Gaussian mixture maintains the absolute span in the RUL PDF of each model in the mixture. This improves the model's ability to estimate uncertainty, as discussed further below.

Fig. 4 Forecast of battery capacity with a snapshot of the uncertainty in RUL for all the models

Detailed results for the proposed method and similarly comparable methods are shown in Table 2. The results for the proposed method are shown in the far-right column in each grouping (conference paper and journal extension). Each performance metric is shaded for better comparison—the darker the shade, the better the model's performance for that metric. Right away, it is evident the proposed method performed exceptionally well at predicting the RUL of lithium-ion batteries. The use of diverse predictors in the ensemble combined with the adaptive weighting methodology proved to effectively reduce the RMSE over the other methods, particularly the individual models. The LSTM model has significantly better performance than the other individual models; therefore, adding LSTM into the ensemble further improves the performance. Although the RMSEs of the optimization-based weighting models like the DIEn and DSDEn are similar to DDDEn, DDDEn has superior performance in metrics like α–accuracy and NLL. This indicates that the more complex dynamic weighting method significantly improves other aspects of the RUL prediction, like timeliness (α–accuracy) and prediction uncertainty (NLL).

Table 2

Evaluation metrics for various models on training and three test datasets

Metric / dataset | EUKF | GPR | En-EUKF | En-EUKF + GPR | DDDEn-EUKF + GPR | LSTM | En-all | DIEn | DSDEn | DDDEn
(columns 2–6: conference paper models; columns 7–11: journal extension models)

RMSE (cycles)
Training | 30.0 | 29.4 | 28.4 | 27.7 | 25.1 | 33.7 | 26.6 | 28.1 | 27.7 | 24.2
Test 1   | 43.1 | 43.9 | 39.2 | 39.1 | 32.3 | 51.1 | 38.5 | 41.1 | 40.7 | 39.0
Test 2   | 30.7 | 65.2 | 30.4 | 37.8 | 31.2 | 36.9 | 27.4 | 25.5 | 25.1 | 26.2
Test 3   | 20.1 | 37.5 | 24.2 | 26.8 | 22.3 |  8.7 | 19.6 | 15.9 | 15.4 | 15.3

α-Accuracy (%)
Training | 27.0 | 36.6 | 19.0 | 29.1 | 47.8 | 54.3 | 35.2 | 47.1 | 49.9 | 73.0
Test 1   | 24.7 | 36.8 | 19.5 | 26.0 | 45.3 | 45.8 | 31.0 | 44.1 | 49.7 | 60.8
Test 2   | 40.1 | 17.6 | 32.5 | 18.6 | 42.0 | 42.7 | 28.0 | 46.9 | 51.9 | 56.4
Test 3   | 12.6 |  5.8 |  9.5 |  6.1 | 36.0 | 51.9 |  9.7 | 13.7 | 16.8 | 40.5

β-Probability
Training | 0.26 | 0.38 | 0.22 | 0.28 | 0.38 | 0.43 | 0.24 | 0.25 | 0.27 | 0.34
Test 1   | 0.24 | 0.36 | 0.22 | 0.26 | 0.37 | 0.38 | 0.24 | 0.24 | 0.26 | 0.30
Test 2   | 0.39 | 0.17 | 0.30 | 0.22 | 0.34 | 0.35 | 0.23 | 0.24 | 0.26 | 0.24
Test 3   | 0.12 | 0.10 | 0.11 | 0.11 | 0.28 | 0.41 | 0.17 | 0.20 | 0.21 | 0.23

NLL
Training | 28.9 | 3.2 | 4.7 | 3.0 | 2.5 | 0.7 | 2.0 | 1.8 | 1.6 | 1.3
Test 1   | 34.8 | 3.6 | 4.9 | 3.3 | 2.8 | 0.9 | 2.2 | 2.0 | 1.7 | 1.4
Test 2   | 29.2 | 9.6 | 3.6 | 4.0 | 3.1 | 0.9 | 2.4 | 2.2 | 1.9 | 2.1
Test 3   | 71.9 | 9.7 | 6.3 | 4.2 | 3.2 | 1.3 | 2.5 | 2.1 | 1.9 | 1.6

Figure 5 shows the confidence level calibration curves of five select models. The proposed method was slightly underconfident in its predictions, indicated by the majority of its calibration curve falling above the y = x line. In general, under-confidence is preferred, as it builds a margin of safety into reliability and manufacturing engineers' maintenance practice. In contrast, the rest of the models were largely overconfident in their RUL predictions. Another observation from Fig. 5 is that En-EUKF is much less overconfident than a single EUKF, showing that ensembling, in general, improves model uncertainty quantification and the reliability of RUL predictions.
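For reference, a confidence level calibration curve of this kind can be computed as sketched below, assuming Gaussian RUL predictions; this is a generic construction, not the authors' exact plotting code.

```python
# For each nominal confidence level, count how often the true RUL falls inside
# the central prediction interval of that level (ideal curve: observed == nominal).
import numpy as np
from scipy.stats import norm

def calibration_curve(L_true, mu_pred, sigma_pred, levels=np.linspace(0.05, 0.95, 19)):
    observed = []
    for p in levels:
        z = norm.ppf(0.5 + p / 2.0)                 # half-width in standard deviations
        observed.append(np.mean(np.abs(L_true - mu_pred) <= z * sigma_pred))
    return levels, np.array(observed)
```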

Fig. 5 Confidence level calibration curves comparing the uncertainty estimation performance of five models on the battery dataset. Perfect uncertainty quantification follows y = x.

Another notable observation is the consistent prediction accuracy and predictive uncertainty quantification across all three test datasets. The open-source battery dataset used in this study is extremely diverse in the cells' lifetimes, as shown in Fig. 2, resulting from the diverse conditions under which the cells were charged during repeated charge-discharge cycling. It is challenging to build prognostic models that perform well on such diverse degradation trajectories. The proposed method performed well on such a diverse dataset because of the diverse predictors in the ensemble. Additionally, we observe (not depicted in any figure) that the EUKF and LSTM models underestimate RUL most of the time, while the GPR model overestimates RUL. This scenario is ideal because an optimal set of weights exists to accurately predict the true RUL, consistent with what we observe. The surrogate model was able to learn the general trend of the optimal model weights over the cells' lifetime and accurately predict the optimal weights on the three test datasets. This is confirmed by comparing the RMSEs of the DDDEn to the En-all in Table 2.

Figure 6 shows the variation of model weights for the DIEn and DSDEn models. During the early degradation stage (stage 1), DSDEn assigns larger weights to the LSTM and GPR models. In stage 2, the EUKF model receives almost zero importance, and LSTM and GPR have almost equal importance. However, close to EOL, the GPR model gets the highest importance because its underlying trend function can accurately model each cell's capacity trajectory, thus accurately predicting its RUL.

Fig. 6 Variation of the three model weights for DIEn and DSDEn for the battery dataset. The vertical black lines indicate the cutoffs separating one stage from another.

On the other hand, the DIEn model does not adaptively vary the weights over the course of degradation, which is why the weights of the three predictors remain different but constant. Interestingly, the LSTM and GPR weights determined by the DIEn model for stage 1 seem very close to those determined by the DSDEn model. This is because predicting battery RUL in the early stages (stages 1 and 2) is more difficult, and the overall prediction error can be drastically improved by optimizing the ensemble weights for this region. While the RMSEs of both the DIEn and DSDEn methods are similar, the extra flexibility of the DSDEn model to change the model weights across the three regions led to the slightly lower RMSEs and better performance.

Last, we would like to briefly discuss the performance of the proposed dynamically weighted ensemble when asked to predict on a set of test units that are very different from the units used for training. In general, machine learning models perform poorly when a data distribution shift occurs, i.e., when the distribution of the unseen test data differs from that of the training data. Table 3 presents the summary statistics calculated for each of the four datasets that comprise the 169-cell LFP dataset [54]. We report the mean initial capacity and the mean slope of the capacity fade trajectory over the first 200 cycles.

Table 3

Summary statistics for the four datasets comprising 169 LFP battery dataset

Dataset        | Initial capacity (Ah) | Capacity trajectory slope, first 200 cycles (Ah/cycle)
Training       | 1.074                 | −9.9 × 10⁻⁵
Primary test   | 1.074                 | −6.1 × 10⁻⁵
Secondary test | 1.063                 | −3.8 × 10⁻⁵
Tertiary test  | 1.051                 | −2.4 × 10⁻⁵

Both the initial capacity and the slope of the capacity trajectory over the initial 200 cycles greatly affect the performance of the prognostic algorithms used in this work because they significantly alter the forecasted capacity trajectories and, thus, the predicted RUL. The statistics in Table 3 show that there is a significant data distribution shift from the training dataset to the secondary and tertiary test datasets. The effect of this distribution shift on model accuracy and uncertainty is most visible when comparing the NLL of a single model to that of an ensemble model in Table 2. The NLL values for the EUKF and GPR models are significantly higher for the secondary and tertiary test datasets than for the training and primary test datasets. This is due to the distribution shift causing the models to predict inaccurately and be more uncertain in their predictions. These results align with previously reported results where single machine learning models were found to be less accurate on the secondary and tertiary test datasets because of the distribution shift [54]. However, if we look at the NLL values of any of the ensemble models, for example, DDDEn, we see that its NLL values are consistent across all four datasets. The consistent accuracy and uncertainty quantification of the ensemble models, even in the presence of distribution shift, is due to the aggregated effect of combining the predictions from multiple predictors. So, while the accuracy of an ensemble might decrease in the presence of a distribution shift, the power of an ensemble is that the uncertainty is captured accordingly.

4.3 Remaining Useful Life Prediction of Rolling Element Bearings.

The 15 bearings from the XJTU-SY bearing dataset were split into five folds for cross-validation, as shown in Table 4. The root-mean-square (RMS) of the vibration signal, $\mathrm{RMS}(t) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} |a_i|^2}$, was used as the HI, where $a$ is the accelerometer measurement. With a record length of 1.28 s collected at a sampling frequency of 25.6 kHz, the total number of datapoints used for determining the RMS at each vibration measurement is N = 32,768. The RMS represents the energy of vibration and is a commonly used HI in the bearing prognostics community.
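A minimal sketch of this HI computation is shown below; the function name is hypothetical.

```python
# RMS health index for one 1.28 s vibration record of the XJTU-SY dataset.
import numpy as np

FS, DURATION = 25_600, 1.28          # sampling frequency (Hz) and record length (s)
N = int(FS * DURATION)               # 32,768 samples per record

def rms_health_index(a):
    """Root-mean-square of one accelerometer record, used as the bearing HI."""
    a = np.asarray(a[:N], dtype=float)
    return float(np.sqrt(np.mean(np.abs(a) ** 2)))
```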

Table 4

Test bearing IDs and number of test samples for each cross-validation fold

Fold # | Test bearing IDs  | # Test samples
1      | 1_1, 2_1, 3_1     | 221
2      | 1_2, 2_2, 3_2     | 506
3      | 1_3, 2_3, 3_3     | 334
4 (a)  | 1_4 (a), 2_4, 3_4 | 63
5      | 1_5, 2_5, 3_5     | 168

(a) Bearing 1_4 undergoes sudden catastrophic failure and is therefore not considered in this study.

During the cross-validation study, a single-fold was chosen as the test set while the other four folds were used for hyperparameter optimization and surrogate model construction. The cross-validation study was repeated five times, and the average of the performance metrics in those five independent runs is shown in Table 5. The entries for each fold over every metric have been color-coded for easy comparison, where the darker shade indicates better model performance. The overall comparison in Table 5 was achieved by combining all the folds and assigning weights to each fold that are proportional to the number of test samples in the given fold. The proposed DDDEn-EUKF + GPR model performed the best for most of the metrics and nearly all test scenarios. It was closely followed by DDDEn (with LSTM). Unlike the battery dataset results, the LSTM model has much poorer performance than EUKF and GPR. This is most likely due to the vastly smaller amount of training data available with the bearing dataset and the overall higher level of data noise. As a result, DDDEn without LSTM performs slightly better than the ensemble which includes LSTM. This highlights the importance of selecting the right type of predictor by properly balancing accuracy and diversity, particularly in cases where the individual predictors vastly disagree. Ultimately, the benefit of using an ensemble is the consistency of prediction accuracy and uncertainty quantification across the test datasets. Occasionally, a single GPR or EUKF model outperformed the more complex models, but this only happened for certain folds and was not significant across all the tests. The solid overall performance of the proposed method is due to the dynamic weighting scheme which effectively combines the strengths of each model in the ensemble by anticipating which model may provide the best RUL estimate at a given time. We would also like to note that, in this case study, other ensemble methods are not too far from the proposed model, suggesting that the proposed dynamic weighting scheme may need further tuning on this dataset.

Table 5

Evaluation metrics of all the models on the five-fold cross-validation study

Metric / fold | EUKF | GPR | En-EUKF | En-EUKF + GPR | DDDEn-EUKF + GPR | LSTM | En-all | DIEn | DSDEn | DDDEn
(columns 2–6: conference paper models; columns 7–11: journal extension models)

RMSE
Fold 1 | 48.0 | 40.2 | 57.6 | 45.2 | 40.3 | 44.9  | 30.0 | 33.2  | 42.1  | 26.8
Fold 2 | 82.3 | 89.9 | 82.0 | 84.6 | 88.5 | 116.3 | 94.9 | 104.0 | 114.0 | 102.4
Fold 3 | 80.3 | 75.4 | 81.1 | 75.0 | 72.9 | 83.1  | 75.9 | 76.2  | 81.0  | 69.9
Fold 4 | 40.9 | 56.7 | 48.8 | 40.5 | 36.0 | 10.7  | 38.1 | 36.7  | 10.5  | 19.8
Fold 5 | 45.6 | 35.2 | 50.6 | 38.9 | 38.1 | 28.1  | 28.9 | 30.8  | 26.4  | 21.7
Net    | 69.1 | 68.9 | 71.9 | 67.3 | 67.1 | 84.8  | 71.5 | 77.5  | 84.4  | 75.5

α-Accuracy (%)
Fold 1 | 11.8 | 21.7 |  7.2 | 22.2 | 33.0 | 17.2 | 26.7 | 34.7 | 19.9 | 26.7
Fold 2 | 10.1 | 24.3 | 19.0 | 21.1 | 25.9 |  4.2 | 21.3 | 14.9 |  5.3 | 18.2
Fold 3 | 14.1 |  2.1 | 15.0 |  7.8 |  2.7 |  7.5 |  8.1 |  7.8 |  8.4 | 15.3
Fold 4 | 23.8 |  4.8 |  7.9 |  4.9 |  9.5 | 34.9 |  4.8 |  6.9 | 47.6 | 19.0
Fold 5 | 15.5 | 10.1 |  9.5 |  7.7 | 17.2 | 14.3 | 10.1 | 15.3 | 19.3 | 28.0
Net    | 12.8 | 15.3 | 14.2 | 15.3 | 19.2 | 12.4 | 18.2 | 17.6 | 12.7 | 16.9

β-Probability
Fold 1 | 0.14 | 0.22 | 0.15 | 0.29 | 0.31 | 0.16 | 0.25 | 0.26 | 0.16 | 0.26
Fold 2 | 0.14 | 0.22 | 0.19 | 0.20 | 0.25 | 0.04 | 0.18 | 0.14 | 0.05 | 0.17
Fold 3 | 0.13 | 0.02 | 0.14 | 0.09 | 0.10 | 0.07 | 0.09 | 0.06 | 0.09 | 0.12
Fold 4 | 0.09 | 0.03 | 0.10 | 0.04 | 0.09 | 0.31 | 0.07 | 0.09 | 0.39 | 0.17
Fold 5 | 0.15 | 0.11 | 0.08 | 0.10 | 0.15 | 0.11 | 0.11 | 0.14 | 0.15 | 0.20
Net    | 0.14 | 0.14 | 0.15 | 0.17 | 0.20 | 0.11 | 0.17 | 0.15 | 0.11 | 0.16

NLL
Fold 1 | 22.7 |  3.1 | 3.7 | 2.7 | 2.7 |  1.7 | 2.9 |  2.8 | 1.5 | 2.6
Fold 2 | 26.2 |  2.2 | 2.9 | 2.7 | 2.4 | 13.7 | 2.6 |  2.7 | 7.8 | 2.6
Fold 3 |  6.2 | 11.7 | 3.1 | 2.9 | 3.3 |  5.5 | 2.9 | 11.9 | 3.7 | 2.8
Fold 4 |  2.6 |  1.9 | 2.5 | 2.5 | 1.8 |  1.2 | 2.3 |  2.5 | 1.1 | 1.8
Fold 5 |  6.9 |  3.4 | 2.8 | 2.7 | 3.2 |  1.5 | 2.6 |  2.7 | 1.4 | 2.3
Net    | 16.8 |  5.0 | 3.1 | 2.8 | 2.8 |  2.1 | 2.7 |  2.7 | 2.0 | 2.6

Unlike the battery dataset explored in Sec. 4.2, the bearing dataset is much noisier and less monotonic, making it challenging to forecast the HI accurately. The non-monotonic nature of the HI caused issues for the single models. For almost 10% of the test samples, the single GPR and EUKF models did not produce an RUL prediction because the predicted HI trajectories did not cross the EOL threshold. In contrast, the ensemble model provided an RUL prediction for most test samples where there was at least one RUL prediction from the EUKF, GPR, or LSTM. This has to be accounted for when comparing models in Table 5.

Figure 7 shows the confidence level calibration curves for select models trained and tested on the bearing dataset. Similar to results for the battery dataset, the individual models are found to be overconfident in their predictions while the ensemble models produce more reliable uncertainty estimates, indicated by their closeness to the ideal calibration line. The proposed DDDEn model is the least overconfident of the group.

Fig. 7 Confidence level calibration curves comparing the uncertainty estimation performance of each model on the bearing dataset. Perfect uncertainty quantification follows y = x.

As stated previously, the bearing dataset is very noisy and non-monotonic, making the lines between the different degradation stages blurry. In Fig. 8, we show the ensemble weights from the DIEn and DSDEn models over the range of the HI. Due to the non-monotonic nature of the dataset, the HI can decrease in value back into the range of a lower degradation stage but still be classified as a higher stage. This is why some symbols from a higher degradation stage appear in a lower stage. As shown in Fig. 8, the LSTM model is given very low priority in stages 2 and 3 due to its poor performance. This is because the model-based prediction methods (EUKF and GPR) are much more accurate at estimating RUL in the late aging stage, owing to the use of an underlying mathematical model (exponential and quadratic) that accurately captures the observed degradation trajectories. Essentially, the comparison of the individual models (EUKF versus GPR versus LSTM) from Tables 2 and 5 and their relative importance obtained from the DIEn and DSDEn models (Figs. 6 and 8) can be used as a precursor to model selection for creating an ensemble. Although adding a vanilla LSTM RNN seems to negatively affect the ensemble's performance for the bearing case study, this does not rule out the potential benefits of other curated LSTM models that are established to work on the same dataset, such as those in Ref. [38].

Fig. 8 Variation of the three model weights for DIEn and DSDEn for the bearing dataset. The vertical black lines indicate the cutoffs separating one stage from another.

5 Conclusion

In this paper, we have explored an ensemble of models with diverse architectures to provide robust and consistent RUL predictions. The models are combined using a dynamic weighting scheme that assigns model importance based on the predictions of the individual models and the health index at the current time instance. Using two open-source datasets, one pertaining to battery capacity fade and the other to rolling element bearing failure, we show the superiority of the proposed ensemble model in its ability to both reduce RUL prediction error and improve the quality of the estimated RUL predictive uncertainty relative to comparable methods. The dynamic weighting algorithm helps to reduce the degree of model uncertainty and overconfidence. We also compared our method to other state-of-the-art optimization-based ensemble weighting techniques that estimate either degradation-independent model weights (DIEn) or degradation stage-dependent model weights (DSDEn). The ensemble models show superior performance when the health index is less noisy and monotonic. On the other hand, a noisy and non-monotonic health index leads to strong disagreements among the diverse predictors, causing the ensemble methods to perform similarly to the individual predictors. However, in either case, ensembles of diverse predictors were found to be reliable and consistent across test cases. Although we use the EUKF, LSTM, and GPR to form the ensemble, the concept of choosing models that are found to predict RUL differently under the same conditions can be extended to include other model-based and data-driven methods. In our future work, we aim to investigate the proper selection of diverse models depending on the dataset. We also aim to improve the dynamic weighting surrogate model.

Replication of Results

The individual model predictors described in Sec. 2.4 (EUKF, GPR, and LSTM) have been implemented in MATLAB and Python (TensorFlow/Keras) on the battery dataset. These diverse models and the implementation of the three weighted ensemble methods described in Secs. 2.2 and 2.3 (En-all, DIEn, and DSDEn) are available for download on our GitHub page.5

Acknowledgment

This work was supported in part by Vermeer Corporation. Any opinions, findings, or conclusions in this paper are those of the authors and do not necessarily reflect the sponsor's views.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The data and information that support the findings of this article are freely available.6

References

1. Kordestani, M., Saif, M., Orchard, M. E., Razavi-Far, R., and Khorasani, K., 2021, "Failure Prognosis and Applications—A Survey of Recent Literature," IEEE Trans. Reliab., 70(2), pp. 728–748.
2. Lei, Y., Li, N., Guo, L., Li, N., Yan, T., and Lin, J., 2018, "Machinery Health Prognostics: A Systematic Review From Data Acquisition to RUL Prediction," Mech. Syst. Signal Process., 104, pp. 799–834.
3. Sikorska, J. Z., Hodkiewicz, M., and Ma, L., 2011, "Prognostic Modelling Options for Remaining Useful Life Estimation by Industry," Mech. Syst. Signal Process., 25(5), pp. 1803–1836.
4. Wu, J., Wu, C., Cao, S., Or, S. W., Deng, C., and Shao, X., 2019, "Degradation Data-Driven Time-to-Failure Prognostics Approach for Rolling Element Bearings in Electrical Machines," IEEE Trans. Ind. Electron., 66(1), pp. 529–539.
5. Behzad, M., Arghan, H. A., Bastami, A. R., and Zuo, M. J., 2017, "Prognostics of Rolling Element Bearings With the Combination of Paris Law and Reliability Method," 2017 Prognostics and System Health Management Conference (PHM-Harbin), IEEE, pp. 1–6.
6. Hu, X., Xu, L., Lin, X., and Pecht, M., 2020, "Battery Lifetime Prognostics," Joule, 4(2), pp. 310–346.
7. Nemani, V. P., Thelen, A., Hu, C., and Daining, S., 2022, "Dynamically Weighted Ensemble of Diverse Learners for Remaining Useful Life Prediction," ASME 2022 International Design Engineering Technical Conference, St. Louis, MO, Aug. 14–17, pp. 1–8.
8. Saha, B., Goebel, K., Poll, S., and Christophersen, J., 2009, "Prognostics Methods for Battery Health Monitoring Using a Bayesian Framework," IEEE Trans. Instrum. Meas., 58(2), pp. 291–296.
9. Wang, D., and Tsui, K. L., 2017, "Statistical Modeling of Bearing Degradation Signals," IEEE Trans. Reliab., 66(4), pp. 1331–1344.
10. Tang, S., Yu, C., Wang, X., Guo, X., and Si, X., 2014, "Remaining Useful Life Prediction of Lithium-Ion Batteries Based on the Wiener Process With Measurement Error," Energies, 7(2), pp. 520–547.
11. Zhai, Q., and Ye, Z. S., 2017, "RUL Prediction of Deteriorating Products Using an Adaptive Wiener Process Model," IEEE Trans. Ind. Informatics, 13(6), pp. 2911–2921.
12. Attia, P. M., Chueh, W. C., and Harris, S. J., 2020, "Revisiting the t0.5 Dependence of SEI Growth," J. Electrochem. Soc., 167(9), p. 090535.
13. Miao, Q., Xie, L., Cui, H., Liang, W., and Pecht, M., 2013, "Remaining Useful Life Prediction of Lithium-Ion Battery With Unscented Particle Filter Technique," Microelectron. Reliab., 53(6), pp. 805–810.
14. Muetze, A., and Strangas, E. G., 2015, "The Useful Life of Inverter-Based Drive Bearings," pp. 63–73.
15. Yu, W. K., and Harris, T. A., 2001, "A New Stress-Based Fatigue Life Model for Ball Bearings," Tribol. Trans., 44(1), pp. 11–18.
16. Brown, J., Wang, G., Scott, E., Brown, J., Schmidt, C., and Howard, W., 2005, "A Practical Longevity Model for Lithium-Ion Batteries: De-Coupling the Time and Cycle-Dependence of Capacity Fade," ECS Meet. Abstr., MA2005-02(4), pp. 1–2.
17. Wang, Y., Peng, Y., Zi, Y., Jin, X., and Tsui, K. L., 2016, "A Two-Stage Data-Driven-Based Prognostic Approach for Bearing Degradation Problem," IEEE Trans. Ind. Informatics, 12(3), pp. 924–932.
18. Qian, Y., Yan, R., and Hu, S., 2014, "Bearing Degradation Evaluation Using Recurrence Quantification Analysis and Kalman Filter," IEEE Trans. Instrum. Meas., 63(11), pp. 2599–2610.
19. Walker, E., Rayman, S., and White, R. E., 2015, "Comparison of a Particle Filter and Other State Estimation Methods for Prognostics of Lithium-Ion Batteries," J. Power Sources, 287, pp. 1–12.
20. He, W., Williard, N., Osterman, M., and Pecht, M., 2011, "Prognostics of Lithium-Ion Batteries Based on Dempster-Shafer Theory and the Bayesian Monte Carlo Method," J. Power Sources, 196(23), pp. 10314–10321.
21. Wang, D., Miao, Q., and Pecht, M., 2013, "Prognostics of Lithium-Ion Batteries Based on Relevance Vectors and a Conditional Three-Parameter Capacity Degradation Model," J. Power Sources, 239, pp. 253–264.
22. Singleton, R. K., Strangas, E. G., and Aviyente, S., 2015, "Extended Kalman Filtering for Remaining-Useful-Life Estimation of Bearings," IEEE Trans. Ind. Electron., 62(3), pp. 1781–1790.
23. Cui, L., Wang, X., Xu, Y., Jiang, H., and Zhou, J., 2019, "A Novel Switching Unscented Kalman Filter Method for Remaining Useful Life Prediction of Rolling Bearing," Meas. J. Int. Meas. Confed., 135, pp. 678–684.
24. Plett, G. L., 2004, "Extended Kalman Filtering for Battery Management Systems of LiPB-Based HEV Battery Packs—Part 3. State and Parameter Estimation," J. Power Sources, 134(2), pp. 277–292.
25. Plett, G. L., 2006, "Sigma-Point Kalman Filtering for Battery Management Systems of LiPB-Based HEV Battery Packs. Part 2: Simultaneous State and Parameter Estimation," J. Power Sources, 161(2), pp. 1369–1384.
26. Gebraeel, N., Lawley, M., Liu, R., and Parmeshwaran, V., 2004, "Residual Life Predictions From Vibration-Based Degradation Signals: A Neural Network Approach," IEEE Trans. Ind. Electron., 51(3), pp. 694–700.
27. Huang, R., Xi, L., Li, X., Richard Liu, C., Qiu, H., and Lee, J., 2007, "Residual Life Predictions for Ball Bearings Based on Self-Organizing Map and Back Propagation Neural Network Methods," Mech. Syst. Signal Process., 21(1), pp. 193–207.
28. Guo, L., Li, N., Jia, F., Lei, Y., and Lin, J., 2017, "A Recurrent Neural Network Based Health Indicator for Remaining Useful Life Prediction of Bearings," Neurocomputing, 240, pp. 98–109.
29. Benkedjouh, T., Medjaher, K., Zerhouni, N., and Rechak, S., 2013, "Remaining Useful Life Estimation Based on Nonlinear Feature Reduction and Support Vector Regression," Eng. Appl. Artif. Intell., 26(7), pp. 1751–1760.
30. Loutas, T. H., Roulias, D., and Georgoulas, G., 2013, "Remaining Useful Life Estimation in Rolling Bearings Utilizing Data-Driven Probabilistic E-Support Vectors Regression," IEEE Trans. Reliab., 62(4), pp. 821–832.
31. Di Maio, F., Tsui, K. L., and Zio, E., 2012, "Combining Relevance Vector Machines and Exponential Regression for Bearing Residual Life Estimation," Mech. Syst. Signal Process., 31, pp. 405–427.
32. Wang, B., Lei, Y., Li, N., and Yan, T., 2019, "Deep Separable Convolutional Network for Remaining Useful Life Prediction of Machinery," Mech. Syst. Signal Process., 134, p. 106330.
33. Hinchi, A. Z., and Tkiouat, M., 2018, "Rolling Element Bearing Remaining Useful Life Estimation Based on a Convolutional Long-Short-Term Memory Network," Procedia Comput. Sci., 127, pp. 123–132.
34. Li, X., Zhang, W., and Ding, Q., 2019, "Deep Learning-Based Remaining Useful Life Estimation of Bearings Using Multi-Scale Feature Extraction," Reliab. Eng. Syst. Saf., 182, pp. 208–218.
35. Park, K., Choi, Y., Choi, W. J., Ryu, H. Y., and Kim, H., 2020, "LSTM-Based Battery Remaining Useful Life Prediction With Multi-Channel Charging Profiles," IEEE Access, 8, pp. 20786–20798.
36. Liao, L., and Köttig, F., 2016, "A Hybrid Framework Combining Data-Driven and Model-Based Methods for System Remaining Useful Life Prediction," Appl. Soft Comput. J., 44, pp. 191–199.
37. Zhang, Y., Xiong, R., He, H., and Pecht, M. G., 2018, "Long Short-Term Memory Recurrent Neural Network for Remaining Useful Life Prediction of Lithium-Ion Batteries," IEEE Trans. Veh. Technol., 67(7), pp. 5695–5705.
38. Nemani, V. P., Lu, H., Thelen, A., Hu, C., and Zimmerman, A. T., 2022, "Ensembles of Probabilistic LSTM Predictors and Correctors for Bearing Prognostics Using Industrial Standards," Neurocomputing, 491, pp. 575–596.
39. Hu, C., Youn, B. D., Wang, P., and Taek Yoon, J., 2012, "Ensemble of Data-Driven Prognostic Algorithms for Robust Prediction of Remaining Useful Life," Reliab. Eng. Syst. Saf., 103, pp. 120–135.
40. Li, Z., Wu, D., Hu, C., and Terpenny, J., 2019, "An Ensemble Learning-Based Prognostic Approach With Degradation-Dependent Weights for Remaining Useful Life Prediction," Reliab. Eng. Syst. Saf., 184, pp. 110–122.
41. Xia, T., Song, Y., Zheng, Y., Pan, E., and Xi, L., 2020, "An Ensemble Framework Based on Convolutional Bi-Directional LSTM With Multiple Time Windows for Remaining Useful Life Estimation," Comput. Ind., 115, p. 103182.
42. Shi, J., Yu, T., Goebel, K., and Wu, D., 2021, "Remaining Useful Life Prediction of Bearings Using Ensemble Learning: The Impact of Diversity in Base Learners and Features," ASME J. Comput. Inf. Sci. Eng., 21(2), p. 021004.
43. Terejanu, G. A., 2011, "Unscented Kalman Filter Tutorial," Univ. Buffalo, Dep. Comput. Sci. Eng., NY, pp. 1–6.
44. Wan, E. A., and Van Der Merwe, R., "The Unscented Kalman Filter for Nonlinear Estimation," Technology, 7(3), pp. 3–8.
45. Pau, D., Denaro, D., Gruosso, G., and Sahnoun, A., 2021, "Microcontroller Architectures for Battery State of Charge Prediction With Tiny Neural Networks," IEEE International Conference on Consumer Electronics (ICCE-Berlin), Berlin, Germany, Nov. 15–18.
46. Cheng, Y., Wu, J., Zhu, H., Or, S. W., and Shao, X., 2021, "Remaining Useful Life Prognosis Based on Ensemble Long Short-Term Memory Network," IEEE Trans. Instrum. Meas., 70, pp. 1–12.
47. Lu, H., Barzegar, V., Nemani, V. P., Hu, C., Laflamme, S., and Zimmerman, A. T., 2022, "Joint Training of a Predictor Network and a Generative Adversarial Network for Time Series Forecasting: A Case Study of Bearing Prognostics," Expert Syst. Appl., 203, p. 117415.
48. Malhotra, P., 2016, "Multi-Sensor Prognostics Using an Unsupervised Health Index Based on LSTM Encoder-Decoder," available at http://arxiv.org/abs/1608.06154
49. Yuan, M., Wu, Y., and Lin, L., 2016, "Fault Diagnosis and Remaining Useful Life Estimation of Aero Engine Using LSTM Neural Network," pp. 135–140.
50. Liu, Y., Zhao, G., and Peng, X., 2019, "Deep Learning Prognostics for Lithium-Ion Battery Based on Ensembled Long Short-Term Memory Networks," IEEE Access, 7, pp. 155130–155142.
51. Severson, K. A., Attia, P. M., Jin, N., Perkins, N., Jiang, B., Yang, Z., Chen, M. H., et al., 2019, "Data-Driven Prediction of Battery Cycle Life Before Capacity Degradation," Nat. Energy, 4(5), pp. 383–391.
52. Attia, P. M., Grover, A., Jin, N., Severson, K. A., Markov, T. M., Liao, Y.-H., Chen, M. H., et al., 2020, "Closed-Loop Optimization of Fast-Charging Protocols for Batteries With Machine Learning," Nature, 578(7795), pp. 397–402.
53. Wang, B., Lei, Y., Li, N., and Li, N., 2020, "A Hybrid Prognostics Approach for Estimating Remaining Useful Life of Rolling Element Bearings," IEEE Trans. Reliab., 69(1), pp. 401–412.
54. Thelen, A., Li, M., Hu, C., Bekyarova, E., Kalinin, S., and Sanghadasa, M., 2022, "Augmented Model-Based Framework for Battery Remaining Useful Life Prediction," Appl. Energy, 324, p. 119624.