Abstract

Surrogate models play a vital role in overcoming the computational challenge in designing and analyzing nonlinear dynamic systems, especially in the presence of uncertainty. This paper presents a comparative study of different surrogate modeling techniques for nonlinear dynamic systems. Four surrogate modeling methods, namely, Gaussian process (GP) regression, a long short-term memory (LSTM) network, a convolutional neural network (CNN) with LSTM (CNN-LSTM), and a CNN with bidirectional LSTM (CNN-BLSTM), are studied and compared. All these model types can predict the future behavior of dynamic systems over long periods based on training data from relatively short periods. The multi-dimensional inputs of surrogate models are organized in a nonlinear autoregressive exogenous model (NARX) scheme to enable recursive prediction over long periods, where current predictions replace inputs from the previous time window. Three numerical examples, including one mathematical example and two nonlinear engineering analysis models, are used to compare the performance of the four surrogate modeling techniques. The results show that the GP-NARX surrogate model tends to have more stable performance than the other three deep learning (DL)-based methods for the three particular examples studied. The tuning effort of GP-NARX is also much lower than its deep learning-based counterparts.

1 Introduction

Modeling of dynamic systems plays a vital role in the analysis, design, health monitoring, and control of various dynamic engineering systems, such as chemical processes [1], civil infrastructure [2], battery systems [3], and vibratory mechanical systems [4]. A dynamic system can be linear or nonlinear. The modeling of nonlinear dynamic systems, in general, is more complicated than that of linear systems.

In order to describe the complex nonlinear behaviors of dynamic systems, various approaches have been developed for different types of systems using analytical methods or sophisticated computer simulations. Analytical models obtained through theoretical modeling based on simplifications and assumptions usually have a low requirement on the computational effort. However, the prediction accuracy is usually sacrificed due to model simplifications [5]. With the development of high-performance computing and advanced computational methods, high-fidelity computer simulations are becoming more common in the design and analysis of various dynamic systems. They play an essential role in the design of reliable complex nonlinear engineering systems, such as hypersonic aircraft [6], off-road vehicles [7], and large civil infrastructure [8]. While the sophisticated computer simulation models significantly increase the prediction accuracy, the required computational effort is also drastically increased, which poses challenges to the associated analysis, design, model updating (e.g., a digital twin), and model predictive control [9].

Aiming to overcome the computational challenges introduced by sophisticated computer simulation models, surrogate models constructed using machine learning techniques are commonly used to substitute the original computer simulation models or experiments. As data-driven techniques, surrogate models approximate the original model by establishing a relationship between the inputs and outputs of the complex dynamic system based on data. They provide a promising way of efficiently predicting the nonlinear dynamic behavior of various systems without sacrificing accuracy. They also allow the nonlinear dynamic behavior to be forecast over a reasonable time horizon in the future based on a certain amount of historical data. Due to these advantages of dynamic surrogate models, their applications can be found not only in engineering design but also in many other fields where dynamic predictive models play an essential role, including disease transmission modeling [10] and medical and environmental science [11], among others.

In the past decades, various surrogate modeling techniques have been developed to model nonlinear dynamic behavior. According to fundamental differences in the modeling forms, current approaches can be classified into two groups, namely, input–output models and state-space models [12]. Surrogate models in input–output form are constructed by directly describing the relationship between the inputs and outputs based on observation data [12], while state-space surrogate models are represented in state-space form [12]. The state-space model-based methods can be further classified into two categories: linear state-space models and nonlinear state-space models. Various mature techniques have been developed for learning linear state-space models in the field of dynamic system identification [13]. Learning nonlinear state-space models usually requires a nonlinear transformation of a vector of past input/output samples into a state vector [14]. Due to the importance of state-space models in control, numerous approaches have been developed in recent years using machine learning and/or optimization-based methods to learn nonlinear state-space models. For instance, Masti and Bemporad developed an autoencoder-based approach for learning nonlinear state-space models [15]; Deshmukh and Allison proposed a derivative function-based surrogate modeling method to learn nonlinear state-space models [16]; Gedon et al. proposed a deep state-space model method for learning nonlinear temporal models in state-space form [17]; and an optimization-based approach was suggested in Ref. [18] for system identification of nonlinear state-space models. Both the input–output and state-space forms of surrogate models are widely used in the analysis and modeling of nonlinear dynamic systems, and each has its own advantages and disadvantages. The input–output form does not require an explicit definition of a Markovian state or any information about the internal states. It is applicable to any type of dynamic system of varying complexity since it directly builds a function relating the inputs and outputs. The disadvantage of the input–output form is that the order of the model is high due to the number of lags required to capture the relationship between inputs and outputs [18]. The nonlinear state-space form usually has a lower order (i.e., a smaller number of inputs) and is more efficient than its input–output counterpart, but it typically requires the definition of internal states, and the corresponding surrogate modeling methods are much more complicated to implement in practice [18].

This paper focuses on surrogate models of nonlinear dynamic systems in the input–output form. In the past decades, a large group of methods has been developed for modeling dynamic systems in this form. For instance, the autoregressive integrated moving average (ARIMA) model [19] has gained considerable attention due to its predictive capability for time series. ARIMA is derived from autoregressive models [20], moving average (MA) models [21], and autoregressive moving average (ARMA) models [22]. In general, the main limitations of these model classes are the linear regression they impose on the past history and their inability to account for input variables other than time. Nonlinear autoregressive models with exogenous input (NARX), which can overcome these limitations, have been proposed in the dynamics field based on artificial neural networks (ANN) [23] and Gaussian process (GP) regression [24]. For example, one class of approaches builds the surrogate model with a nonlinear function approximator, such as an ANN, support vector regression [25], or a kernel support vector machine [26], while another class relies on tree-based ensemble learning algorithms [27], such as decision trees [28] and random forests [28,29]. In recent years, ANNs with robust nonlinear functions, such as the multi-layer perceptron network [30], have been extensively used in nonlinear dynamic predictive models for various applications. However, the lack of dependencies between inputs in sequence processing affects the accuracy in long-term sequence prediction tasks. In contrast, GP models overcome the problems mentioned above and provide more accurate predictions in the NARX scheme. In the new era of deep learning (DL), various approaches have been proposed recently to construct surrogate models of nonlinear dynamic systems. DL surrogate models include, but are not limited to, the recurrent neural network (RNN) [31], long short-term memory (LSTM), which is a type of RNN [32], the convolutional neural network (CNN) [33], and hybrid structures that combine different deep neural networks (DNNs) [34].

Even though the methods mentioned earlier have shown good performance in different applications, there is no generic surrogate modeling method that is applicable to all nonlinear dynamic systems and across different domains. Selecting an appropriate surrogate modeling method for a specific nonlinear dynamic system remains an issue that needs to be addressed. With a focus on data-driven approaches, this paper performs a comprehensive comparative study of different surrogate modeling methods under the NARX scheme to investigate their predictive capability. Four widely used approaches, namely, GP-NARX, LSTM, a convolutional neural network combined with long short-term memory (CNN-LSTM), and a CNN with bidirectional LSTM (CNN-BLSTM), are extensively compared using three numerical examples to investigate the advantages and disadvantages of the different approaches. The three examples include a simple mathematical model, a Duffing oscillator model, and a Bouc-Wen model. It is expected that the findings from this research will provide guidance and reference for the selection of surrogate models to reduce the computational effort in building predictive models for the design and analysis of nonlinear dynamic systems.

The remainder of this paper is organized as follows. Section 2 presents a generalized description of nonlinear dynamic models. Section 3 introduces surrogate models studied in this paper. Section 4 presents the comparative studies of different surrogate models. Finally, Sec. 5 summarizes the results and draws the conclusions.

2 Background

This section first presents the generalized dynamic predictive models in the NARX form. Then, various forecasting strategies, including one-step or multi-step methods, are briefly discussed.

2.1 Nonlinear Dynamic Predictive Model.

A generalized nonlinear dynamic predictive model is defined as
$y_i = f(\mathbf{x}_i, \boldsymbol{\theta}) + \nu_i$
(1)
where $y_i$ is the output at time-step $t_i$, $f(\cdot)$ is a nonlinear mapping from the regression vector $\mathbf{x}_i$ to the output space, $\boldsymbol{\theta}$ is an $s$-dimensional parameter vector included in $f(\cdot)$, and the noise $\nu_i$ accounts for the fact that the output $y_i$ will not, in practice, be a deterministic function of past data. The regression vector $\mathbf{x}_i$ may be written as [24]
$\mathbf{x}_i = \varphi(u_i, u_{i-1}, \ldots, u_{i-q}, y_{i-1}, y_{i-2}, \ldots, y_{i-p})$
(2)
where $y_{i-j},\ j = 1, 2, \ldots, p$ are the delayed samples of the measured output signal at time-steps $t_{i-1}, t_{i-2}, \ldots, t_{i-p}$; $u_{i-k},\ k = 1, 2, \ldots, q$ are the delayed samples of the measured input signal at time-steps $t_{i-1}, t_{i-2}, \ldots, t_{i-q}$; $u_i$ is the measured input value at the current time-step; $\varphi(\cdot)$ is a nonlinear mapping from the measured input and output values to the regression vector; and $p$ and $q$ are the numbers of lags in the outputs and inputs, respectively.
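For illustration, a minimal sketch of assembling the regression vector of Eq. (2) is given below; the function name and the lag orders $p = q = 2$ are hypothetical choices, not values prescribed by the paper.

```python
import numpy as np

def narx_regressor(u, y, i, p=2, q=2):
    """Assemble the NARX regression vector x_i of Eq. (2).

    u, y : 1D arrays of measured input and output samples.
    i    : current time-step index (must satisfy i >= max(p, q)).
    p, q : number of output and input lags, respectively.
    """
    u_part = u[i - q:i + 1][::-1]   # [u_i, u_{i-1}, ..., u_{i-q}]
    y_part = y[i - p:i][::-1]       # [y_{i-1}, y_{i-2}, ..., y_{i-p}]
    return np.concatenate([u_part, y_part])

# Example usage with synthetic signals
u = np.sin(0.1 * np.arange(20))
y = 0.05 * np.cumsum(u)
x_5 = narx_regressor(u, y, i=5)     # regression vector at time-step t_5
```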

According to the choice of regressors, current nonlinear dynamic predictive models in Eq. (1) can be classified into six main categories: (1) nonlinear finite impulse response models, which use only delayed input values $u_{i-k}$ as regressors and are always stable due to the deterministic input values [35]; (2) NARX models, which use both delayed output signals $y_{i-j}$ and input signals $u_{i-k}$ as regressors and are also called series-parallel models [36]; (3) nonlinear output error models, which use the estimates $\hat{y}_{i-j}$ of the output values $y_{i-j}$ and the input signals $u_{i-k}$ as regressors [37]; (4) nonlinear autoregressive and moving average models with exogenous input (NARMAX), which use $y_{i-j}$, $u_{i-k}$, and the prediction error as regressors [38]; (5) nonlinear Box-Jenkins models [39]; and (6) nonlinear state-space models [18].

2.2 Predictive Model Forecast Strategies.

In order to obtain accurate predictions of future nonlinear dynamic behavior, various strategies have been developed to build dynamic predictive models. According to the number of future prediction time-step(s), the current strategies can be classified into two groups: one-step approaches and multi-step approaches. The one-step approach predicts only the next time-step of a time series, while the multi-step approach may predict any number of forward time-steps. In general, the multi-step prediction approach is more likely to be adopted due to its capability of providing longer-term predictions, especially for reliability analysis and prognostics purposes.

The traditional direct multi-step forecasting is usually replaced by recursive strategies for multi-step forecasting to improve the prediction accuracy over a long-time horizon. The forecasting models with recursive strategy, in general, are better than the direct multi-step forecasting in terms of prediction accuracy and stability [40]. However, to overcome the limitation of error accumulation, some new techniques have been developed by integrating different types of dynamic models, such as multiple output strategies for multi-step forecasting [41] and the direct-recursive hybrid methods [42]. In what follows, we briefly review four classical multi-step time series forecasting methods.

2.2.1 Direct Multi-Step Forecast Strategy.

The direct multi-step strategy considers a single-output model as follows [43]:
$\hat{y}_i = f_h(\mathbf{u}_i, y_{i-1}, y_{i-2}, \ldots, y_{i-p}, \boldsymbol{\theta})$
(3)
where $\mathbf{u}_i = [u_i, u_{i-1}, \ldots, u_{i-q}]$, $\hat{y}_i$ is the predicted output at time-step $t_i$, $f_h(\cdot),\ h = 1, 2, \ldots, H$ is the $h$th single-output model from a total of $H$ models, $y_{i-j},\ j = 1, 2, \ldots, p$ are the measured output signals at time-steps $t_{i-j},\ j = 1, 2, \ldots, p$, and $p$ is the number of time lags, as mentioned above. Through Eq. (3), predictions up to $H$ time-steps ahead are obtained by combining the predictions of the individual single-output models. For example, to obtain predictions for the following two steps $t_{i+1}$ and $t_{i+2}$, we may develop one model for forecasting at $t_{i+1}$ and a separate model for forecasting at $t_{i+2}$ as
$\hat{y}_{i+1} = f_1(\mathbf{u}_{i+1}, y_i, y_{i-1}, \ldots, y_{i-p+1}, \boldsymbol{\theta})$
$\hat{y}_{i+2} = f_2(\mathbf{u}_{i+2}, y_{i+1}, y_i, \ldots, y_{i-p+2}, \boldsymbol{\theta})$
(4)

The drawback of the direct multi-step forecast strategy in Eq. (3) is that the $H$ independent models produce predictions $\hat{y}_{i+1}, \hat{y}_{i+2}, \ldots, \hat{y}_{i+H}$ without statistical dependencies. It is also more complicated than the recursive strategy and requires more computational time because of the number of predictive models required.

2.2.2 Recursive Multi-Step Forecast Strategy.

The recursive multi-step forecast strategy learns a one-step model multiple times, where the prediction of the previous time-step will be considered as one of the inputs at the next time-step as follows [43]:
$\hat{y}_{i+1} = f(\mathbf{u}_{i+1}, y_i, y_{i-1}, \ldots, y_{i-p+1}, \boldsymbol{\theta})$
$\hat{y}_{i+2} = f(\mathbf{u}_{i+2}, \hat{y}_{i+1}, y_i, y_{i-1}, \ldots, y_{i-p+2}, \boldsymbol{\theta})$
(5)
where $y_{i-j},\ j = 0, 1, \ldots, p-1$ are the samples of the measured output signal at time-steps $t_{i-j},\ j = 0, 1, \ldots, p-1$. In Eq. (5), in order to predict the next two time-steps $t_{i+1}$ and $t_{i+2}$, the recursive multi-step strategy develops a one-step forecasting model $f(\cdot)$ at time-step $t_{i+1}$ and then uses it iteratively by adding $\hat{y}_{i+1}$ to the input vector for the next forecast at $t_{i+2}$.

The recursive multi-step forecast strategy eliminates the limitations of the direct multi-step forecast method. However, the estimation error will accumulate over time since the predicted values are used as inputs instead of the actual ones. The error accumulation would generally degrade the accuracy and amplify the uncertainty with the size of the prediction time horizon.
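To make the recursive scheme of Eq. (5) concrete, the sketch below rolls a generic one-step model forward over a prediction horizon, feeding each prediction back into the output-lag window. The `model.predict` interface, the helper name, and the requirement that the exogenous inputs be known over the horizon are assumptions for illustration, not a specific implementation from this paper.

```python
import numpy as np

def recursive_forecast(model, u, y_hist, p, q, start, horizon):
    """Recursive multi-step forecasting following Eq. (5).

    model  : any one-step regressor exposing predict(X) (e.g., scikit-learn style).
    u      : array of exogenous input values, assumed known over the horizon.
    y_hist : measured outputs up to the forecast start (at least p values).
    start  : index of the first time-step to forecast (must satisfy start >= q).
    """
    y_lags = list(y_hist[-p:])                        # oldest -> newest
    preds = []
    for i in range(start, start + horizon):
        u_lags = u[i - q:i + 1][::-1]                 # [u_i, u_{i-1}, ..., u_{i-q}]
        x = np.r_[u_lags, y_lags[::-1]].reshape(1, -1)
        y_hat = float(np.ravel(model.predict(x))[0])  # one-step prediction
        preds.append(y_hat)
        y_lags = y_lags[1:] + [y_hat]                 # prediction replaces the oldest output lag
    return np.array(preds)
```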

2.2.3 Direct-Recursive Hybrid Strategies.

The direct-recursive hybrid strategies combine the underlying principles of the direct and recursive forecasting strategies. When this strategy is used, high-dimensional data may be considered as input variables, and with a proper optimization algorithm, redundant information can be removed to avoid overfitting. By building on both the direct and recursive strategies, it can reduce the prediction loss of the dynamic model [29]. This strategy expands the input variables by replacing the true values with predicted ones, as in the recursive strategy, and it builds separate models for distinct time horizons, as in the direct strategy. Thus, this method overcomes the shortcomings of the previous two strategies by avoiding error accumulation and maintaining the dependency of each estimated output. It is worth mentioning that this technique requires higher-fidelity analysis and more complex formulations because the embedding size must differ across the range of prediction horizons. The hybrid strategy can be described as follows [43]:
$\hat{y}_{i+1} = f_1(\mathbf{u}_{i+1}, y_i, y_{i-1}, \ldots, y_{i-p+1}, \boldsymbol{\theta})$
$\hat{y}_{i+2} = f_2(\mathbf{u}_{i+2}, \hat{y}_{i+1}, y_i, \ldots, y_{i-p+2}, \boldsymbol{\theta})$
(6)
where $y_{i-j},\ j = 0, 1, \ldots, p-1$ are samples of the measured output signal at time-step $t_{i-j}$, $\hat{y}_{i+1}$ and $\hat{y}_{i+2}$ are the predictions at time-steps $t_{i+1}$ and $t_{i+2}$, and $f_1(\cdot)$ and $f_2(\cdot)$ are separate predictive models.

2.2.4 Multiple Output Strategy.

When the prediction of a time horizon that is longer than that of training data is considered, one-step output mapping may ignore the dependency between future predictions (e.g., between y^i+1 and y^i+2) and degrade the prediction accuracy. Therefore, a multiple output strategy has the potential to overcome this drawback. The multiple output strategy trains one emulator to predict the forecast sequence in a single surrogate model [43]. The multiple output model is given by
$[\hat{y}_{i+1}, \hat{y}_{i+2}, \ldots, \hat{y}_{i+r}] = f(\mathbf{u}_{i+1}, \mathbf{u}_{i+2}, \ldots, \mathbf{u}_{i+r}, y_i, y_{i-1}, \ldots, y_{i-p+1}, \boldsymbol{\theta})$
(7)
where $\hat{y}_{i+1}, \hat{y}_{i+2}, \ldots, \hat{y}_{i+r}$ are the predictions at time-steps $t_{i+1}, t_{i+2}, \ldots, t_{i+r}$, and $r$ is the total number of predicted time-steps. Practical applications show that multi-output models are complex and lack flexibility since the stochastic dependency behavior must be learned.

3 Surrogate Modeling for Nonlinear Dynamic Systems

This section first discusses the generation of training data for surrogate modeling of nonlinear dynamic systems. Following that, we briefly review four surrogate modeling techniques for nonlinear dynamic systems, including GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM.

3.1 Training Data Generation for Surrogate Modeling of Nonlinear Dynamic Systems.

After considering the benefits of the different forecasting strategies and nonlinear dynamic model structures, this paper investigates dynamic surrogate modeling for long-term prediction using the recursive multi-step forecasting strategy based on short-term training data. The dynamic output is characterized by the controllable inputs and the outputs of multiple previous time-steps, and the prediction is iterated forward one step at a time. All data in this paper are collected synthetically through simulations for verification and validation purposes. This paper uses uniform time-steps for the predictive model and the controllable inputs; however, the time-step sizes differ across the case studies.

Figure 1 shows an overview of surrogate modeling for nonlinear dynamic systems. As shown in this figure, there are five main steps, namely: (a) generation of input training data, which generates training data for dynamic system model parameters and excitations; (b) training data collection, which obtains output data based on simulations of the dynamic systems; (c) data preprocessing, which processes the data into the format needed for training the various surrogate models; (d) surrogate model training using the processed data; and (e) prediction and validation, which uses the trained surrogate model for prediction under new conditions and excitations. Next, we briefly explain the first three steps, and following that, we will discuss the training of the various surrogate models in Secs. 3.2–3.5.

Fig. 1: Overview of surrogate modeling for nonlinear dynamic systems: (a) sample generation, (b) data collection, (c) data preprocessing, (d) surrogate modeling, and (e) model prediction and validation

As shown in Fig. 1, training data generation is essential in training surrogate models for nonlinear dynamic systems. This paper first generates $\vartheta$ training samples for the model parameters $\boldsymbol{\theta}$ using Latin hypercube sampling [44]. Let the generated training samples be $\boldsymbol{\theta}_i,\ i = 1, 2, \ldots, \vartheta$, where $\boldsymbol{\theta}_i$ represents the $i$th training sample. Then, for each sample $\boldsymbol{\theta}_i$, as shown in Fig. 1(b), we obtain a time series response $y_{i,j},\ j = 1, 2, \ldots, N_i$ with controllable time series inputs $u_{i,j},\ j = 1, 2, \ldots, N_i$, in which $N_i$ is the length of the controllable input excitation of the $i$th training sample.

After that, we process the data of inputs and outputs into training data for surrogate modeling by following the NARX scheme as follows (i.e., Fig. 1(c)):
$\mathbf{Z} = \begin{bmatrix} \mathbf{z}^{(1)} \\ \vdots \\ \mathbf{z}^{(\vartheta)} \end{bmatrix} \in \mathbb{R}^{N_T \times (p+q+s)}, \quad \mathbf{Y} = \begin{bmatrix} \mathbf{y}^{(1)} \\ \vdots \\ \mathbf{y}^{(\vartheta)} \end{bmatrix} \in \mathbb{R}^{N_T \times 1}$
where
$\mathbf{z}^{(i)} = \begin{bmatrix} \mathbf{u}_{i,p+1} & \mathbf{y}_{i,p+1} & \boldsymbol{\theta}_i \\ \mathbf{u}_{i,p+2} & \mathbf{y}_{i,p+2} & \boldsymbol{\theta}_i \\ \vdots & \vdots & \vdots \\ \mathbf{u}_{i,N_i} & \mathbf{y}_{i,N_i} & \boldsymbol{\theta}_i \end{bmatrix} \in \mathbb{R}^{(N_i - p) \times (p+q+s)}, \quad \mathbf{y}^{(i)} = \begin{bmatrix} y_{i,p+1} \\ y_{i,p+2} \\ \vdots \\ y_{i,N_i} \end{bmatrix} \in \mathbb{R}^{(N_i - p) \times 1}, \quad i = 1, \ldots, \vartheta$
$\mathbf{u}_{i,p+m} = [u_{i,m+p}, u_{i,m+p-1}, \ldots, u_{i,m+p+1-q}] \in \mathbb{R}^{1 \times q}$
$\mathbf{y}_{i,p+m} = [y_{i,m+p-1}, y_{i,m+p-2}, \ldots, y_{i,m}] \in \mathbb{R}^{1 \times p}, \quad m = 1, \ldots, N_i - p$
(8)

in which $N_T = \sum_{i=1}^{\vartheta} (N_i - p)$ is the total number of training points and $s$ is the number of dimensions of $\boldsymbol{\theta}$.
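A minimal sketch of the data-assembly step of Eq. (8) is shown below; the helper name and the looping strategy are illustrative rather than the exact implementation used in this work.

```python
import numpy as np

def build_narx_training_data(u_series, y_series, theta_samples, p, q):
    """Stack NARX training pairs (Z, Y) from multiple trajectories, following Eq. (8)."""
    Z_rows, Y_rows = [], []
    for u, y, theta in zip(u_series, y_series, theta_samples):
        N = len(y)
        # start once both p output lags and q input lags are available
        for m in range(max(p, q - 1), N):
            u_lags = u[m - q + 1:m + 1][::-1]   # q most recent inputs, newest first
            y_lags = y[m - p:m][::-1]           # p most recent outputs, newest first
            Z_rows.append(np.r_[u_lags, y_lags, theta])
            Y_rows.append(y[m])
    return np.array(Z_rows), np.array(Y_rows)
```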

The above-generated training data are then used to train various surrogate models for the nonlinear dynamic system in Secs. 3.2–3.5. Figure 2 presents a schematic of a single-step univariate dynamic predictive model after surrogate modeling (i.e., Fig. 1(e)). As indicated in this figure, the emulator predicts future values while observed values are replaced by the forecasts from the previous period. Thus, for example, $\hat{y}_{i+2}$ is the predicted output in the second iteration ($\hat{y}_{i+2} = f(u_{i+2}, u_{i+1}, \ldots, u_{i-q+3}, \hat{y}_{i+1}, y_i, \ldots, y_{i-p+2}) + \nu_{i+2}$), where the first-iteration output $\hat{y}_{i+1}$ takes the place of $y_{i+1}$. All the output lags switch from the actual values to the estimated ones under the incremental recursive scheme. Meanwhile, predictions are iterated in the recursive procedure together with the input variables, which limits error accumulation to a certain extent.

Fig. 2: Schematic of recursive single-step prediction for a univariate dynamic predictive model

Next, we will briefly review the four types of surrogate modeling techniques studied in this comparative paper, including GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM.

3.2 GP-NARX

3.2.1 Brief Review of GP-NARX.

As discussed above, a NARX model can be represented as
$y_i = f(\mathbf{x}_i, \boldsymbol{\theta}) + \nu_i$
$\mathbf{x}_i = [u_i, u_{i-1}, \ldots, u_{i-q}, y_{i-1}, y_{i-2}, \ldots, y_{i-p}]$
(9)
where $\nu_i \sim N(0, \sigma_v^2)$ is white Gaussian noise following a normal distribution with zero mean and standard deviation $\sigma_v$, and $p$ and $q$ are, respectively, the numbers of lags in the output and input values.

Since the nonlinear mapping function f(·) is usually a simulation model that requires substantial computational effort to evaluate, the direct application of the NARX model to dynamic system analysis, design, and control is computationally prohibitive. To address the computational challenge, f(·) can be substituted with a computationally “cheaper” surrogate model. In the GP-NARX surrogate model, the probabilistic and nonparametric Gaussian process (GP) model is used to approximate the nonlinear mapping function f(·).

In the GP model, the response over the prediction space is assumed to be a Gaussian process with mean function $m(\mathbf{z})$, where $\mathbf{z} = [\mathbf{x}, \boldsymbol{\theta}] \in \mathbb{R}^{p+q+s}$, and covariance function $k(\mathbf{z}, \mathbf{z}')$ given by
$m(\mathbf{z}) = \mathbb{E}[f(\mathbf{z})]$
$k(\mathbf{z}, \mathbf{z}') = \mathbb{E}[(f(\mathbf{z}) - m(\mathbf{z}))(f(\mathbf{z}') - m(\mathbf{z}'))]$
(10)
For a measured input matrix $\mathbf{Z} = [\mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(N_T)}]^T \in \mathbb{R}^{N_T \times (p+q+s)}$, where $\mathbf{z}^{(i)}$ is the $i$th input vector among the $N_T$ discrete time-steps, the prior GP for the function $f(\mathbf{Z})$ is
$f(\mathbf{Z}) \sim GP(\mathbf{m}_f, \boldsymbol{\Sigma}_f + \sigma_n^2 \mathbf{I}_{N_T})$
(11)
where $\mathbf{m}_f$ is the mean vector and $\boldsymbol{\Sigma}_f$ is the covariance matrix obtained by evaluating Eq. (10) for all inputs $\mathbf{Z}$, $\sigma_n^2$ is the variance of the white noise $\nu_i$, and $\mathbf{I}_{N_T}$ is an $N_T \times N_T$ identity matrix. We can generally set $\mathbf{m}_f \equiv \mathbf{0}$, especially when no prior information is available. After observing the measured output data of $\mathbf{Z}$, i.e., $\mathcal{D} = \{(\mathbf{Z}, \mathbf{y})\}$, where $\mathbf{y} = [y_1, \ldots, y_i, \ldots, y_{N_T}]^T$, the posterior distribution of the function $f(\mathbf{z})$ given the measured data $\mathcal{D}$ and hyperparameters $\boldsymbol{\Theta}$ is given by
$p(f(\mathbf{z}) \mid \mathbf{Z}, \mathbf{y}, \boldsymbol{\Theta}) = \dfrac{L(\mathbf{y} \mid \mathbf{Z}, f(\mathbf{z}), \boldsymbol{\Theta})\, p(f(\mathbf{z}) \mid \boldsymbol{\Theta})}{p(\mathbf{y} \mid \mathbf{Z}, \boldsymbol{\Theta})}$
(12)
where $L(\mathbf{y} \mid \mathbf{Z}, f(\mathbf{z}), \boldsymbol{\Theta})$ is the likelihood, $p(f(\mathbf{z}) \mid \boldsymbol{\Theta})$ is the prior given the hyperparameters $\boldsymbol{\Theta}$, $p(\mathbf{y} \mid \mathbf{Z}, \boldsymbol{\Theta})$ is the evidence, and $p(f(\mathbf{z}) \mid \mathbf{Z}, \mathbf{y}, \boldsymbol{\Theta})$ is the posterior distribution over $f(\mathbf{z})$.
The implementation of the Bayesian inference in Eq. (12) may be analytically intractable due to the evaluation of multiple integrals; instead, one can estimate the hyperparameters $\boldsymbol{\Theta}$ by maximizing the marginal likelihood [45]. After estimating the hyperparameters $\boldsymbol{\Theta}$, for any unobserved input $\mathbf{z}^*$, we have the following joint distribution
$\begin{pmatrix} \mathbf{y} \\ y^* \end{pmatrix} \sim N\left(\mathbf{0}, \begin{bmatrix} \boldsymbol{\Sigma}_f + \sigma_n^2 \mathbf{I}_{N_T} & \mathbf{K}(\mathbf{Z}, \mathbf{z}^*) \\ \mathbf{K}(\mathbf{Z}, \mathbf{z}^*)^T & k(\mathbf{z}^*, \mathbf{z}^*) \end{bmatrix}\right)$
(13)
where $\mathbf{K}(\mathbf{Z}, \mathbf{z}^*)$ is the covariance vector between $\mathbf{z}^*$ and the input matrix $\mathbf{Z}$.
By transforming the joint distribution into the Gaussian conditional distribution $p(y^* \mid \mathcal{D}, \mathbf{z}^*, \boldsymbol{\Theta})$, we obtain the following mean and variance predictions for $\mathbf{z}^*$
$m(\mathbf{z}^*) = \mathbf{K}(\mathbf{Z}, \mathbf{z}^*)^T (\boldsymbol{\Sigma}_f + \sigma_n^2 \mathbf{I}_{N_T})^{-1} \mathbf{y}$
$\mathrm{cov}(\mathbf{z}^*, \mathbf{z}^*) = k(\mathbf{z}^*, \mathbf{z}^*) - \mathbf{K}(\mathbf{Z}, \mathbf{z}^*)^T (\boldsymbol{\Sigma}_f + \sigma_n^2 \mathbf{I}_{N_T})^{-1} \mathbf{K}(\mathbf{Z}, \mathbf{z}^*)$
(14)
After the GP model is constructed, the GP-NARX model is written as
$\hat{y}_i = f_{GP}(u_i, u_{i-1}, \ldots, u_{i-q}, y_{i-1}, y_{i-2}, \ldots, y_{i-p}, \boldsymbol{\theta}) + \nu_i$
$\hat{y}_{i+1} = f_{GP}(u_{i+1}, u_i, \ldots, u_{i-q+1}, \hat{y}_i, y_{i-1}, \ldots, y_{i-p+1}, \boldsymbol{\theta}) + \nu_{i+1}$
(15)
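Since the GP-NARX model in this paper is built on the Scikit-learn package (see Sec. 3.6), a minimal sketch of training the GP of Eqs. (10)-(14) and performing the recursive prediction of Eq. (15) could look as follows; the kernel choice, noise level, and function interface are illustrative assumptions rather than the exact settings used in the study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def train_gp_narx(Z, Y):
    """Fit a GP to the NARX-format training data (Z, Y) assembled per Eq. (8)."""
    # Anisotropic RBF kernel plus a white-noise term (illustrative choice)
    kernel = RBF(length_scale=np.ones(Z.shape[1])) + WhiteKernel(noise_level=1e-4)
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(Z, Y)

def gp_narx_forecast(gp, u, y_init, theta, p, q, horizon):
    """Recursive GP-NARX prediction, Eq. (15): each posterior mean replaces a measured output lag.

    u      : array of controllable excitation values covering the forecast horizon.
    y_init : the last p measured outputs before the forecast starts.
    theta  : model parameter vector (length s).
    """
    y_lags = list(y_init[-p:])                 # oldest -> newest
    preds = []
    for k in range(q - 1, q - 1 + horizon):
        u_lags = u[k - q + 1:k + 1][::-1]      # q most recent inputs, newest first
        z = np.r_[u_lags, y_lags[::-1], theta].reshape(1, -1)
        mu = gp.predict(z).item()              # posterior mean of Eq. (14)
        preds.append(mu)
        y_lags = y_lags[1:] + [mu]             # prediction replaces the oldest output lag
    return np.array(preds)
```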

3.2.2 Subsampling for GP-NARX Construction With a Large Volume of Data.

For high-rate time series/dynamic systems, a large volume of training data will be collected using Eq. (8). This will drastically increase the computational effort for both the training and prediction of the GP-NARX model. Aiming to address this challenge for GP-NARX surrogate modeling, this paper proposes a sub-selection method based on the max-min algorithm [46,47]. In the max-min method, training points are selected by maximizing the minimum distance between a new point and all currently available training points as
$\mathbf{z}_{new} = \max_{\mathbf{z} \in \mathbf{Z}} \left\{ \min_{\mathbf{z}^t} \left[ \| \mathbf{z} - \mathbf{z}^t \|_2 \right] \right\}$
(16)
where $\mathbf{z}^t$ denotes all currently available training points and $\| \cdot \|_2$ is the $l_2$-norm of a vector. This algorithm therefore fills the design domain evenly with a small number of points that are representative of the original training data.
In order to sub-select $N_S$ samples from the $N_T$ available samples given in Eq. (8), we first randomly select $N_{in}$ samples and denote them as $\mathbf{z}^t = [\mathbf{z}_1^t, \mathbf{z}_2^t, \ldots, \mathbf{z}_{N_{in}}^t]$, where $\mathbf{z}_j^t$ is the $j$th selected point. To sub-select the remaining $N_S - N_{in}$ samples from the $N_T - N_{in}$ available samples, we first compute the minimum distance between the samples in $\mathbf{z}^t$ and each sample in $\mathbf{Z}$ (i.e., Eq. (8)) as follows:
$l_k = \min_j \left\{ \| \mathbf{z}_j^t - \mathbf{z}^{(k)} \|_2 \right\}, \quad k = 1, 2, \ldots, N_T - N_{in}$
(17)
where $\mathbf{z}^{(k)}$ is the $k$th sample in $\mathbf{Z}$.
With the minimum distances obtained from Eq. (17), the index of a new sample is then identified as
$i^* = \arg\max_k \{ l_k \}$
(18)

Based on the identified index, the new sample is then added to the selected training samples zt. This process continues iteratively until NS samples are sub-selected. Using the sub-selected samples, a GP-NARX surrogate model can be trained by following Eqs. (10) through (14).
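A minimal sketch of the max-min subselection of Eqs. (16)-(18) is given below; the vectorized distance update and the function interface are illustrative.

```python
import numpy as np

def maxmin_subsample(Z, n_select, n_init=10, seed=0):
    """Greedy max-min subsampling of Eqs. (16)-(18).

    Starts from n_init randomly chosen points and repeatedly adds the candidate
    whose minimum distance to the already-selected set is largest.
    """
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(Z), size=n_init, replace=False))
    # minimum distance from every candidate to the current selected set (Eq. (17));
    # already-selected points have distance 0 and are never re-selected
    d_min = np.min(np.linalg.norm(Z[:, None, :] - Z[idx][None, :, :], axis=2), axis=1)
    while len(idx) < n_select:
        i_star = int(np.argmax(d_min))                  # Eq. (18)
        idx.append(i_star)
        d_new = np.linalg.norm(Z - Z[i_star], axis=1)   # distances to the new point only
        d_min = np.minimum(d_min, d_new)
    return Z[idx], np.array(idx)
```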

3.3 Long Short-Term Memory.

Recently, DNNs, such as LSTM and CNN, have emerged as new methods for modeling temporal or sequential characteristics. They have been widely applied in various domains, such as product search [48], advertising [49], and image recognition [50].

Even though LSTM, CNN-LSTM, and CNN-BLSTM belong to different types of RNN-based models, they all resemble a feedforward neural network that maps inputs to outputs using similar algorithms. For different databases, the models have different hyperparameters and weights in the different layers. This paper uses a many-to-one RNN for prediction. Figure 3 shows an example of a many-to-one RNN with one hidden layer. In this figure, $a = \sigma(wx + b)$, where $w$ is a weight, $b$ is a bias, and $\sigma(\cdot)$ is the activation function (i.e., the sigmoid function in this example). $[x_1, x_2, \ldots, x_{N_T}]$ are the inputs to the input layer and $\hat{y}$ is the output of the output layer. As mentioned above, this paper investigates various methods to construct accurate surrogate models for nonlinear dynamic systems. In the following, we briefly introduce the LSTM method, which is one of the most commonly used variants of the RNN.

Fig. 3: Many-to-one recurrent neural network

LSTM, as a variant of the RNN, is widely used for solving classification and regression problems, such as PM2.5 prediction [51] and traffic flow prediction [52]. It not only performs well in capturing short-term data dependencies for prediction but also adequately accounts for the explicit dependencies of multiple outputs over a long time horizon. Compared with the traditional RNN, LSTM overcomes the shortcoming of insufficient long-term dependence in RNNs caused by vanishing or exploding gradients during gradient propagation. Figure 4 shows the structure of an LSTM cell. As shown in this figure, each LSTM cell contains an input gate, a forget gate, and an output gate. The forget gate determines which information should be discarded; the input gate determines the new information, which consists of the actual observed values after iterations of the dynamic model; and the output gate decides the values passed to the next neuron, which are computed using the activation function.

Fig. 4: Diagram of an LSTM cell
A standard LSTM architecture comprises an input layer, one or more hidden LSTM layers, and an output layer. For more complex LSTM models, additional layers such as dropout layers can be added to reduce the computational time. Each LSTM layer contains a series of LSTM cells, and each cell follows the same internal algorithm. In an LSTM network, let $t$ denote the time-step ($t = 1, 2, \ldots, n$, where $n$ is the number of time-steps) and $L$ the network layer ($L = 1, 2, \ldots, N_L$, where $N_L$ is the number of LSTM layers). At time-step $t_i$, denote the input gate, forget gate, and output gate as $\mathbf{i}_i$, $\mathbf{f}_i$, and $\mathbf{o}_i$, respectively, the hidden layer output as $\mathbf{h}_i$, and the memory cell state as $\mathbf{c}_i$. The relationships among these variables are given as follows:
$\mathbf{i}_i = \sigma(\mathbf{W}_{xi} \mathbf{z}_i + \mathbf{W}_{hi} \mathbf{h}_{i-1} + \mathbf{b}_i)$
(19)
$\mathbf{f}_i = \sigma(\mathbf{W}_{xf} \mathbf{z}_i + \mathbf{W}_{hf} \mathbf{h}_{i-1} + \mathbf{b}_f)$
(20)
$\mathbf{o}_i = \sigma(\mathbf{W}_{xo} \mathbf{z}_i + \mathbf{W}_{ho} \mathbf{h}_{i-1} + \mathbf{b}_o)$
(21)
$\hat{\mathbf{c}}_i = \tanh(\mathbf{W}_{xc} \mathbf{z}_i + \mathbf{W}_{hc} \mathbf{h}_{i-1} + \mathbf{b}_c)$
(22)
$\mathbf{c}_i = \mathbf{f}_i \odot \mathbf{c}_{i-1} + \mathbf{i}_i \odot \hat{\mathbf{c}}_i$
(23)
$\mathbf{h}_i = \mathbf{o}_i \odot \tanh(\mathbf{c}_i)$
(24)
where $\odot$ represents the elementwise (Hadamard) product, $\tanh$ represents the hyperbolic tangent function, $\mathbf{W}_{\alpha\beta}$ are the weight matrices associated with the different gates, and $\mathbf{b}_\beta$ are the bias vectors, where $\alpha = \{x, h\}$ and $\beta = \{f, i, c, o\}$. For example, $\mathbf{W}_{hf}^{(l)}$ represents the weight matrix of the $l$th LSTM layer applied to the hidden state $\mathbf{h}_{i-1}$ within the forget gate $\mathbf{f}_i$, and $\mathbf{b}_f^{(l)}$ denotes the bias vector of the $l$th LSTM layer corresponding to the forget gate.

In the first step, in Eq. (20), the input data $\{\mathbf{z}_i^l, \mathbf{h}_{i-1}^l\}$ pass through the forget gate (the sigmoid function $\sigma$), which outputs a vector with values between 0 and 1 to determine which information is retained: the value $\mathbf{c}_{i-1}^l$ is forgotten when the vector value is 0 and passed through when the vector value is 1. The next step determines what information can be added to $\hat{\mathbf{c}}_i$ by using the hyperbolic tangent function in Eq. (22). Afterward, in Eq. (23), the elementwise Hadamard product combines $\hat{\mathbf{c}}_i$ and $\mathbf{i}_i$ to update the new memory state output $\mathbf{c}_i^l$. Lastly, the output $\mathbf{h}_i^l$ of the LSTM cell is obtained through Eq. (24) by taking the Hadamard product of the updated memory output $\mathbf{c}_i^l$ (scaled to values between −1 and 1 by $\tanh$) and the output gate result $\mathbf{o}_i^l$.

Figure 5 depicts a three-layer LSTM architecture. In the three layers of Fig. 5, the input states are denoted as $\mathbf{z} = [x_1, x_2, \ldots, x_j, \ldots, x_{N_T}]$ and the output is $\hat{y}$. $p_1$, $p_2$, and $p_3$ are, respectively, the numbers of neurons in layers one, two, and three. In each LSTM layer, the LSTM cell output $\{\mathbf{c}_i^l, \mathbf{h}_i^l\}$ passes through $p_l$ nodes according to the LSTM cell algorithm, and the output $y_i^l$ equals the $l$th hidden state response $\mathbf{h}_i^l$, which is transferred to the next LSTM layer as its input state. Finally, a fully connected layer connects the last LSTM layer and the output layer to obtain the output features.

Fig. 5: A three-layer LSTM architecture

In addition, dropout layers can be added after each recurrent layer to reduce the training time and avoid overfitting. The main idea of dropout is to randomly exclude inputs and nodes with a certain probability. Figures 6(a) and 6(b) illustrate a standard neural network and the same network after dropout is applied.

Fig. 6: Illustration of dropout in LSTM: (a) standard neural net and (b) after applying dropout
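As an illustration of the stacked LSTM with dropout layers described above, a minimal Keras sketch is given below. The layer sizes, dropout rate, and learning rate follow the tuned values reported in Sec. 4, but the choice of the Adam optimizer and the overall interface are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm_surrogate(n_steps, n_features):
    """Three stacked LSTM layers with dropout and a dense output layer (cf. Figs. 5 and 6)."""
    model = keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),
        layers.LSTM(80, return_sequences=True),
        layers.Dropout(0.01),
        layers.LSTM(80, return_sequences=True),
        layers.Dropout(0.01),
        layers.LSTM(80),     # final LSTM layer returns only the last hidden state
        layers.Dense(1),     # fully connected layer for the single-step prediction
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model

# Typical usage: reshape the NARX matrix of Eq. (8) to (samples, time-steps, features)
# and fit with, e.g., model.fit(Z3d, Y, epochs=200, batch_size=32)
```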

3.4 Convolutional Neural Network Combined With Long Short-Term Memory.

CNN-LSTM is a hybrid model that combines two different deep learning models and can integrate large data features and even spatial data structures with more than two dimensions. CNNs are designed to perform predictions from inputs such as images and videos, which cannot be modeled directly by a standard LSTM [53]. In CNN-LSTM, the jointly constructed model works in three steps to perform dynamic surrogate modeling. First, the CNN extracts features of the dynamic model. The features collected by the CNN are then transferred to the subsequent layers, such as LSTM layers or a deeper network with bidirectional LSTM (BLSTM) layers for feature learning in the forward and backward directions. Finally, a dense layer represents the features and transfers them to the prediction layer.

Unlike the standard RNN, CNNs are constructed with neurons that, like LSTM neurons, capture features through learnable biases and weights. As shown in Fig. 7, the difference between a CNN and a traditional ANN is that the input of a CNN can have three dimensions: width, height, and depth, which is widely exploited when images are the inputs. In this paper, the CNN transforms the inputs from one dimension into two or three dimensions to obtain a higher prediction accuracy. Moreover, the final output layer of the CNN reduces the multiple vectors into a single vector, so all the temporal features of the dynamic models are preserved for the following neural network (i.e., the LSTM).

Fig. 7: ConvNet architecture
Figure 8 shows an example of a CNN-LSTM architecture. The input state of the model is $\mathbf{Z} \in \mathbb{R}^{1 \times (p+q+s)}$. $O$ filters $\{\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_O\}$ are applied in the convolution operations. The convolution learning process is written as
$\mathbf{W}_L = \varpi(\mathbf{Z} * \{\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_O\} + \mathbf{b}_c)$
(25)
where $\mathbf{W}_L$ is the feature map of the convolutional layer, $\varpi$ is the activation function (the ReLU function $\max(0, x)$), $\mathbf{b}_c$ is the bias of the convolutional layer, and $*$ represents the convolution operation.
Fig. 8: CNN-LSTM architecture

Then, all the collected convolutional feature maps $\mathbf{W}_L \in \mathbb{R}^{\kappa}$ (through $\kappa$ time-steps) are transferred into the LSTM layers discussed in Sec. 3.3. As shown in Fig. 8, a CNN model is adopted for feature extraction in the first part, while an LSTM model is adopted for prediction in the second part, with the dataset extracted by the CNN layers as inputs. Through this hybrid structure, the prediction for the current step is added to the observed values as part of the following control input according to the recursive prediction scheme discussed in Sec. 2, so as to perform long-term prediction over future time-steps.
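A minimal Keras sketch of such a CNN-LSTM hybrid is given below; the kernel size and filter count are illustrative assumptions, while the pooling size of 2 follows the value selected in Sec. 4.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_lstm_surrogate(n_steps, n_features, n_filters=64):
    """1D convolution and pooling for feature extraction, followed by an LSTM predictor (cf. Fig. 8)."""
    model = keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),
        layers.Conv1D(n_filters, kernel_size=3, padding="same", activation="relu"),  # Eq. (25)
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(n_filters, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(80),   # extracted feature maps are passed to the LSTM
        layers.Dense(1),   # single-step prediction layer
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```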

3.5 CNN-Bidirectional Long Short-Term Memory.

Bidirectional long short-term memory (BLSTM) is an extension of traditional LSTM. Instead of having only one forward direction like LSTM, it establishes two training directions, as shown in Fig. 9, where one is in the input order, and the other one is in a reversed order of the initial one.

Fig. 9: Architecture of bidirectional LSTM
Figure 10 illustrates the structure of CNN-BLSTM. The only difference between CNN-LSTM and CNN-BLSTM is that the LSTM is replaced with a bidirectional LSTM. In this paper, two bidirectional LSTM layers are stacked together to obtain the hidden gate output as follows:
$h_t^l = \begin{cases} \rho\left(\iota(\mathbf{W}_L, \overrightarrow{h}_{t-1}^l),\ \iota(\mathbf{W}_L, \overleftarrow{h}_{t+1}^l)\right), & l = 1 \\ \rho\left(\iota(h_t^{l-1}, \overrightarrow{h}_{t-1}^l),\ \iota(h_t^{l-1}, \overleftarrow{h}_{t+1}^l)\right), & l = 2 \end{cases}$
(26)
where $h_t^l$ is the hidden gate output of the $l$th bidirectional layer at time-step $t$, $\overrightarrow{h}$ is the forward direction of the bidirectional layer, $\overleftarrow{h}$ is the backward direction of the bidirectional layer, $\iota$ is the LSTM cell (Eqs. (19)–(24)), and $\rho$ is the function that combines the forward ($\overrightarrow{h}$) and backward ($\overleftarrow{h}$) calculation sequences.
Fig. 10: Illustration of CNN-BLSTM architecture

The surrogate model constructed using CNN-BLSTM is similar to that from CNN-LSTM.
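In Keras terms, this change amounts to wrapping the recurrent layers of the CNN-LSTM sketch in Sec. 3.4 in a Bidirectional wrapper; the layer sizes below are illustrative.

```python
from tensorflow.keras import layers

# Two stacked bidirectional LSTM layers replacing the single LSTM layer
# of the CNN-LSTM sketch (cf. Eq. (26))
blstm_layers = [
    layers.Bidirectional(layers.LSTM(80, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(80)),
]
```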

3.6 Summary of Models and Implementation Procedure.

In this paper, all four models described above are implemented in a Python environment. The GP-NARX model is developed based on the Scikit-learn package, while the LSTM, CNN-LSTM, and CNN-BLSTM models are implemented using the TensorFlow and Keras packages. In the spirit of the Occam's razor principle, when presented with competing models of similar predictive power, the "optimal" model should be the one with the least complexity by some measure, which we interpret to mean the number of tunable parameters for the purpose of this study. To this end, we summarize the tunable parameters, advantages, and disadvantages of GP-NARX and the deep learning-based models in Table 1 to guide model selection.

Table 1

Summary of tunable parameters and features of different model classes

Model | Tunable parameters | Interpretability | Dataset size
GP-NARX | Covariance function, likelihood covariance noise, and number of lags | Easy to interpret | Computational complexity is O(n³)
LSTM, CNN-LSTM, CNN-BLSTM | Number of layers, number of neurons, batch size, number of epochs, dropout ratio, activation function, weight decay ratio, and number of lags | Ongoing research topic to develop interpretable models | Can handle "big data"/very large datasets

Next, we will use three numerical examples to perform a comprehensive comparative study of the aforementioned surrogate modeling techniques.

4 Comparative Studies

Three examples are used in this section to compare the performance of the different surrogate modeling methods. The first one is a mathematical example with a two-step lag input and one-step-ahead prediction. The second one is a Duffing oscillator model. The last one is the Bouc-Wen nonlinear dynamic model. The complexity of the examples increases from Example 1 to Example 3.

In the past decades, various approaches have been developed to check the accuracy of dynamic model predictions, such as the mean square error (MSE), correlation and median [54], probabilistic error measures using reliability theory [55], and dynamic time warping [56]. Since the MSE is the most widely used measure in surrogate modeling, the MSE defined below is adopted in this paper to quantify the prediction accuracy of the different surrogate modeling methods
$\mathrm{MSE} = \dfrac{1}{N_d} \sum_{i=1}^{N_d} (y_i - \hat{y}_i)^2$
(27)
where $N_d$ is the total number of test data, and $y_i$ and $\hat{y}_i$, $i = 1, \ldots, N_d$, are, respectively, the observed and predicted values.

In addition, the deep learning models (i.e., LSTM, CNN-LSTM, and CNN-BLSTM) in all three examples are tuned to achieve their best performance. The dropout rate is tuned in the range of [0.001, 0.1] by comparing the performance of four different dropout rates: 0.001, 0.005, 0.01, and 0.1. Four different batch sizes (16, 32, 64, and 128) are also compared to determine the best batch size for the different examples. The performance of LSTM models with different numbers of layers (2, 3, 4, and 5) is also compared in each example to select the best number of layers. The learning rate is tuned using 0.0001, 0.001, 0.01, and 0.1. The number of neurons is tuned with 30, 40, 60, 80, and 100 to select the optimal number of neurons. For the CNN model, two layers are used for convolution and pooling. The pooling size for the CNN layer is selected as 2 after comparing the performance of three different sizes, namely, 2, 3, and 6. Even though the hyperparameters of the deep learning models might differ for different problems, the optimal hyperparameters for the three studied problems turn out to be the same: a dropout rate of 0.01, a batch size of 32, three hidden layers, a learning rate of 0.001, and 80 neurons. The number of epochs is also tuned for each example to ensure the convergence of the loss functions. The required tuning effort for the deep learning models is, in general, much higher than that of GP-NARX.

4.1 Example 1: A Mathematical Example.

A mathematical nonlinear dynamic predictive model is given by
$y_i = \cos(y_{i-1} y_{i-2} + 2) + 0.8 \sin(y_{i-2}^2 + y_{i-1}) + v_i$
(28)
where $v_i$ is noise following the normal distribution $v_i \sim N(0, \sigma_v^2)$ with $\sigma_v = 0.1$, and $y_i$ is the response at the $i$th time-step.

As shown in Eq. (28), it is a two-step lag nonlinear dynamic model. Since the input only includes outputs from previous time-steps, it can be considered as a one-dimensional NARX model. In order to compare the performance of the four different surrogate modeling methods, the predictive model given in Eq. (28) is assumed to be unknown. Therefore, surrogate models are constructed using training data generated using the predictive model. In this case study, five trajectories of time series data are generated using Eq. (28) as the training data. The first two time-step responses of the five time series are, respectively, [0.6, 1.2], [1.2, −0.6], [0, 0], [−1.2, −1.2], and [−0.6, 0.6]. The time series lengths are 20, 50, 50, 30, and 50, respectively. Through the five training time series, 195 training samples are obtained. Figure 11 presents the five time series training data.

Fig. 11: Five time series training data
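For reference, the five training trajectories of Eq. (28) can be reproduced with a short script such as the sketch below; the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(y0, y1, n_steps, sigma_v=0.1):
    """Simulate Eq. (28) starting from the two initial responses [y0, y1]."""
    y = [y0, y1]
    for _ in range(n_steps - 2):
        y_next = (np.cos(y[-1] * y[-2] + 2)
                  + 0.8 * np.sin(y[-2] ** 2 + y[-1])
                  + rng.normal(0.0, sigma_v))
        y.append(y_next)
    return np.array(y)

# Five training trajectories used in Example 1
initial_pairs = [(0.6, 1.2), (1.2, -0.6), (0.0, 0.0), (-1.2, -1.2), (-0.6, 0.6)]
lengths = [20, 50, 50, 30, 50]
train_series = [simulate(a, b, n) for (a, b), n in zip(initial_pairs, lengths)]
```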

Based on the training data given in Fig. 11, four surrogate models are constructed using GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM, respectively, following the methods discussed in Sec. 3. Two scenarios, namely, with subsampling and without subsampling, are considered to compare the performance of the different surrogate models. For surrogate modeling with subsampling, 150 training points are sub-selected from the available data. Since the total number of training data is small, it is possible to train the GP-NARX model using all the training data for the scenario without subsampling. In order to quantitatively assess the accuracy of the different surrogate models, Table 2 gives the MSE comparison of the four surrogate models with subsampling for 14 testing cases. Following that, Table 3 lists the MSE comparison for the scenario of surrogate modeling without subsampling.

Table 2

MSE comparison of GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM (surrogate modeling with subsampling)

Case | Data dimension | GP-NARX | LSTM | CNN-LSTM | CNN-BLSTM | BEST
1 | (40,2) | 0.15142 | 0.02912 | 0.02915 | 0.02147 | 0.02147
2 | (70,2) | 0.00620 | 0.01923 | 0.01288 | 0.01113 | 0.00620
3 | (66,2) | 0.02138 | 0.01661 | 0.01479 | 0.01278 | 0.01278
4 | (80,2) | 0.09544 | 0.01809 | 0.01409 | 0.01303 | 0.01303
5 | (30,2) | 0.01953 | 0.02891 | 0.01390 | 0.01288 | 0.01288
6 | (190,2) | 0.00344 | 0.01414 | 0.01486 | 0.01359 | 0.00344
7 | (150,2) | 0.01348 | 0.00885 | 0.01007 | 0.00923 | 0.00885
8 | (200,2) | 0.01510 | 0.01208 | 0.01358 | 0.01309 | 0.01208
9 | (150,2) | 0.00694 | 0.01402 | 0.01628 | 0.01513 | 0.00694
10 | (300,2) | 0.00321 | 0.01367 | 0.01317 | 0.01270 | 0.00321
11 | (50,2) | 0.01329 | 0.01227 | 0.01510 | 0.01516 | 0.01227
12 | (30,2) | 0.00588 | 0.01020 | 0.01003 | 0.00781 | 0.00588
13 | (45,2) | 0.24415 | 0.04063 | 0.03120 | 0.01823 | 0.01823
14 | (60,2) | 0.00668 | 0.01395 | 0.01108 | 0.01012 | 0.00668
Average | (104,2) | 0.04330 | 0.01798 | 0.01573 | 0.01331 | 0.01331

Note: The best model, which has the lowest MSE, is listed in the BEST column.

Table 3

MSE comparison of GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM (surrogate modeling using all available data without subsampling)

Case | Data dimension | GP-NARX | LSTM | CNN-LSTM | CNN-BLSTM | BEST
1 | (40,2) | 0.0354 | 0.0094 | 0.0162 | 0.0110 | 0.0094
2 | (70,2) | 0.0200 | 0.0096 | 0.0130 | 0.0092 | 0.0092
3 | (66,2) | 0.0222 | 0.0112 | 0.0128 | 0.0118 | 0.0112
4 | (80,2) | 0.0231 | 0.0115 | 0.0145 | 0.0100 | 0.0100
5 | (30,2) | 0.0071 | 0.0104 | 0.0120 | 0.0120 | 0.0071
6 | (190,2) | 0.0116 | 0.0127 | 0.0145 | 0.0120 | 0.0116
7 | (150,2) | 0.0106 | 0.0139 | 0.0156 | 0.0125 | 0.0106
8 | (200,2) | 0.0038 | 0.0145 | 0.0136 | 0.0122 | 0.0038
9 | (150,2) | 0.0123 | 0.0118 | 0.0141 | 0.0111 | 0.0111
10 | (300,2) | 0.0076 | 0.0118 | 0.0144 | 0.0103 | 0.0076
11 | (50,2) | 0.0047 | 0.0104 | 0.0144 | 0.0090 | 0.0047
12 | (30,2) | 0.0520 | 0.0156 | 0.0197 | 0.0140 | 0.0140
13 | (45,2) | 0.0870 | 0.0105 | 0.0104 | 0.0086 | 0.0086
14 | (60,2) | 0.0273 | 0.0083 | 0.0125 | 0.0081 | 0.0081
Average | (104,2) | 0.0232 | 0.0115 | 0.0141 | 0.0108 | 0.0108

Note: The best model, which has the lowest MSE, is listed in the BEST column.

Comparing the results in Tables 2 and 3 shows that increasing the amount of training data in general increases the accuracy of all four types of surrogate models. Since the GP-NARX model can be trained using all the training data, we focus on the results in Table 3 for the comparative study. The results in Table 3 indicate that GP-NARX generally has a higher prediction accuracy than the RNN models (i.e., LSTM, CNN-LSTM, and CNN-BLSTM) when the period of dynamic prediction is long. For example, the MSE of the GP-NARX prediction is 0.0038 while that of the RNN models is over 0.01 when the prediction period is 200 time-steps (i.e., Case 8). When the prediction period is 40 time-steps (i.e., Case 1), the MSE of GP-NARX is over 0.03 while that of the RNN models is less than 0.02, as shown in Table 3.

Figure 12 presents the comparison of surrogate model prediction and true response for one testing case for the scenario of surrogate modeling without subsampling. Following that, Fig. 13 gives the corresponding prediction errors of the four surrogate models for the testing case. The results plotted in Figs. 12 and 13 show that the predictive model constructed using LSTM has the highest prediction accuracy for this particular testing case.

Fig. 12: Comparison of prediction and true response for Case 1 for GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM
Fig. 13: Comparison of prediction errors for the case given in Fig. 12

It can be concluded from this particular mathematical example that the GP-NARX model constructed based on the time series given in Fig. 11 is more suitable than the RNN models for predictions over long periods. On the other hand, the RNN models, including the LSTM, CNN-LSTM, and CNN-BLSTM models, are more suitable for predictions whose data have a shape or distribution similar to the training data. For instance, the lengths of the training time series are between 20 and 50, and the RNN models have a higher prediction accuracy than GP-NARX when the number of prediction time-steps is close to that range. In addition, the results in Table 3 show that the prediction capability of the RNN models is more robust than that of GP-NARX for this particular example, which is manifested as a lower average MSE value and a more stable performance.

4.2 Example 2: An Asymmetric Duffing Oscillator.

An asymmetric Duffing oscillator is adopted from Ref. [24] as our second example. It is a nonlinear second-order differential equation used to model certain damped and driven oscillators; it exhibits chaotic behavior and describes the dynamics of a point mass in a double-well potential [57]. The equation of motion is given by
$\upsilon \ddot{y} + c\dot{y} + ky + k_2 y^2 + k_3 y^3 = u(t)$
$y_i^{(obs)} = y_i + v_i, \quad v_i \sim N(0, \sigma_v^2)$
(29)
in which $u(t)$ is the input excitation given by
$u(t) = a\cos(bt) + \left(\sin((b+3)t) + 2\right) + 2a\cos(bt + 1) + \sin(2bt)$
(30)
where $\sigma_v = 2 \times 10^{-6}$, $\upsilon = 1$, $c = 10$, $k = 2 \times 10^4$, $k_2 = 10^7$, and $k_3 = 5 \times 10^9$. For demonstration purposes, $a$, $b$, and $c$ are treated as controllable model parameters that can be changed for different experiments, and thus we have $\boldsymbol{\theta} = [a, b, c]$.

In order to generate training data for the surrogate models, we first generate 60 training samples for $\boldsymbol{\theta}$ with a lower bound $\boldsymbol{\theta}_L = [2, 0.1, 0.1]$ and an upper bound $\boldsymbol{\theta}_U = [25, 10, 15]$ using the Latin hypercube sampling method, and thus we have $\vartheta = 60$ in Eq. (8). The range of $\boldsymbol{\theta}$ is used only for demonstration purposes in this paper. In probabilistic analysis or design of nonlinear dynamic systems, the range needs to be determined according to the distributions of $\boldsymbol{\theta}$. For each training sample of $\boldsymbol{\theta}$, excitations $u(t)$ are generated after adding noise to Eq. (30) based on the values of $a$ and $b$. The length of each $u(t)$ is around 2000, and therefore we have $N_i = \{n_i \mid 1200 \le n_i \le 2500\},\ i = 1, 2, \ldots, 60$ in Eq. (8). For each excitation and training value of $c$, we solve Eq. (29) using the fourth-order fixed-step Runge-Kutta algorithm and obtain the responses $y$. Figure 14 shows one of the 60 training excitations and the corresponding response $y$.

Fig. 14: An example of training excitation and corresponding dynamic response
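A minimal sketch of the response generation for Eq. (29) with a fixed-step fourth-order Runge-Kutta integrator is given below; the zero-order hold on the excitation within each step, the function interface, and the default arguments are simplifying assumptions for illustration.

```python
import numpy as np

def duffing_response(u, dt, mass=1.0, c=10.0, k=2e4, k2=1e7, k3=5e9, sigma_v=2e-6, seed=0):
    """Integrate the Duffing oscillator of Eq. (29) with fixed-step RK4 and add observation noise."""
    rng = np.random.default_rng(seed)

    def f(state, force):
        y, ydot = state
        yddot = (force - c * ydot - k * y - k2 * y**2 - k3 * y**3) / mass
        return np.array([ydot, yddot])

    state = np.zeros((len(u), 2))            # columns: displacement y, velocity ydot
    for i in range(len(u) - 1):
        # zero-order hold on the excitation within one step (an assumption)
        s1 = f(state[i], u[i])
        s2 = f(state[i] + 0.5 * dt * s1, u[i])
        s3 = f(state[i] + 0.5 * dt * s2, u[i])
        s4 = f(state[i] + dt * s3, u[i])
        state[i + 1] = state[i] + dt / 6.0 * (s1 + 2 * s2 + 2 * s3 + s4)
    # observed displacement with measurement noise, Eq. (29)
    return state[:, 0] + rng.normal(0.0, sigma_v, size=len(u))
```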

Through cross-validation and after comparing the performance of surrogate models with the number of lags ranging from 5 to 15, it is found that a time lag of nine for both the input excitation u(t) and the output y(t) gives the best prediction accuracy for all four types of surrogate models. We therefore have p = 9 and q = 9. Following the procedure discussed in Sec. 3.1, we obtain the training data for surrogate modeling. Through the 60 training time series, 100,211 training samples are obtained. The large volume of training data makes the training of GP-NARX computationally very challenging. We therefore perform subsampling of the training data using the approach discussed in Sec. 3.2.2. From the subsampling, 900 training data are sub-selected for the training of GP-NARX while maintaining the space-filling property. Figure 15 shows a comparison between the sub-selected samples and the original samples for two dimensions of the training data (i.e., u1 versus y1 and u3 versus y3). As shown in this figure, the subsampling method maintains the space-filling property well while reducing the number of training data.

Fig. 15: A comparison of sub-selected training data and the original training data

The sub-selected training data are used to train the GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM models. Additionally, another set of LSTM, CNN-LSTM, and CNN-BLSTM models is trained using all the training data. We therefore have one GP-NARX model trained based on subsampling and two sets of LSTM, CNN-LSTM, and CNN-BLSTM models trained, respectively, with and without subsampling. After training the surrogate models, we randomly generate eight samples for θ = [a, b, c] according to the lower and upper bounds. Based on these, eight random input excitations are obtained using Eq. (30). We then assume that the original dynamic model is unknown and compare the predictions of the surrogate models with the responses of the original dynamic model for the eight testing datasets. Since the results of the LSTM, CNN-LSTM, and CNN-BLSTM models trained with subsampling are much worse than those of the deep learning models trained without subsampling, the GP-NARX model with subsampling is directly compared with the LSTM, CNN-LSTM, and CNN-BLSTM models trained using all available training data. Table 4 gives the MSE comparison of the four surrogate models for all eight testing cases. The average MSE for these eight test datasets is 2.29 × 10⁻⁸ for GP-NARX, the best among the four studied surrogate models. In addition, the deep learning models (LSTM, CNN-LSTM, and CNN-BLSTM) have similar performances. Thus, the results in Table 4 show that GP-NARX has a higher prediction accuracy than the other methods for this particular example. In addition, GP-NARX has a stable performance even when the data change drastically over long periods.

Table 4

Comparison of GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM for Example 2

MSE (×10⁻⁸):
Case | Data dimension | GP-NARX(a) | LSTM | CNN-LSTM | CNN-BLSTM | BEST
1 | (3000,9,3) | 1.29 | 28.3 | 9.59 | 18.3 | 1.29
2 | (2500,9,3) | 0.31 | 22.5 | 5.02 | 12.5 | 0.31
3 | (2700,9,3) | 3.44 | 31.8 | 10.6 | 32.6 | 3.44
4 | (2000,9,3) | 2.27 | 45.8 | 10.4 | 32.3 | 2.27
5 | (3100,9,3) | 2.75 | 42.7 | 10.4 | 36.3 | 2.75
6 | (1800,9,3) | 2.69 | 44.5 | 10.1 | 32.2 | 2.69
7 | (1900,9,3) | 2.68 | 44.9 | 11.5 | 32.1 | 2.68
8 | (2100,9,3) | 2.93 | 42.5 | 10.1 | 34.7 | 2.93
Average | (2750,9,3) | 2.29 | 37.9 | 9.7 | 28.9 | 2.29

Note: The best model, which has the lowest MSE, is listed in the BEST column.

(a) GP-NARX is trained based on subsampling.

Figure 16 presents the comparison of predictions and the true responses for one of the eight testing cases. Following that, Fig. 17 shows the corresponding kernel density estimates of error distribution for the four surrogate models. The results in Figs. 16 and 17 show that the predictive model constructed using GP-NARX has the highest prediction accuracy for this testing case (i.e., Case 1 in Table 4).

Fig. 16: Comparison of prediction and true response for Case 1 for GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM
Fig. 17: Comparison of prediction error distribution for a testing case

4.3 Example 3: The Bouc-Wen Model.

In structural engineering, many nonlinear inelastic material behaviors are of interest in structures such as reinforced concrete, steel, base isolation systems, and damping devices [58]. The Bouc-Wen model, proposed in Refs. [59,60] and extended in Ref. [61], is used in this work as the third example due to its flexibility to capture the behavior of many inelastic material models by simply changing its tunable parameters.

There are different variants of the Bouc-Wen model. However, the model and notation shown in Ref. [62] have been adopted in this work. To describe this model, consider the single degree of freedom system:
$\upsilon \ddot{x}(t) + \psi \dot{x}(t) + F(\dot{x}(t), x(t)) = \alpha(t)$
(31)
where $x(t)$, $\dot{x}(t)$, and $\ddot{x}(t)$ are the displacement, velocity, and acceleration responses of the SDOF system, respectively. Also, $\upsilon$, $\psi$, and $\alpha(t)$ represent the mass, damping coefficient, and excitation force, respectively. The term $F(t) \equiv F(\dot{x}(t), x(t))$ is the restoring force, which, according to the Bouc-Wen model, can be expressed as follows:
$F(t) = \eta k_i x(t) + (1 - \eta) k_i z(t)$
(32)
where $\eta$ denotes the ratio between the post-yield stiffness and the pre-yield stiffness (i.e., $k_i$), and $z(t)$ denotes a nonobservable hysteretic displacement. As noted, Eq. (32) is modeled as the sum of a linear and a nonlinear function.
The nonlinear function is obtained by solving the following nonlinear differential equation
$\dot{z}(t) = \dot{x}(t) - \beta |\dot{x}(t)| |z(t)|^{\varsigma - 1} z(t) - \gamma \dot{x}(t) |z(t)|^{\varsigma}$
(33)
which may be solved using iterative methods to approximate its solution. The variables $\beta$, $\gamma$, and $\varsigma$ are the Bouc-Wen tunable parameters that are used to model several different materials. For more details on applying the Bouc-Wen model to an MDOF system, the reader can refer to Ref. [4]. For this work, these values are set as constants. Figure 18 shows an example of an input excitation and the corresponding dynamic responses for a two-storey building nonlinear dynamic system.
Fig. 18
An example of random excitation and corresponding dynamic responses

In this example, we first generate 100 random excitations as the training time series. Based on these training excitations, 29,011 training samples are obtained. Using the subsampling approach discussed in Sec. 3.1.2, 2000 training points are sub-selected. Similar to Example 2, a GP-NARX model is trained using the sub-selected training data, while two sets of LSTM, CNN-LSTM, and CNN-BLSTM models are trained with the sub-selected data and with all available training data, respectively. The performance of the LSTM, CNN-LSTM, and CNN-BLSTM models trained with subsampling is again much worse than that of the models trained with all available data. We therefore again compare only the GP-NARX model trained using the sub-selected data with the deep learning models trained using all available data.
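The overall GP-NARX workflow can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation; the lag order, kernel, synthetic training series, and subset size are all assumptions, and plain random selection stands in for the subsampling scheme of Sec. 3.1.2.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def narx_features(u, y, lags):
    # Stack lagged excitations and responses as NARX inputs; the target is y[t].
    X, T = [], []
    for t in range(lags, len(y)):
        X.append(np.concatenate([u[t - lags:t + 1], y[t - lags:t]]))
        T.append(y[t])
    return np.array(X), np.array(T)

# Hypothetical training series standing in for the responses to the random excitations.
rng = np.random.default_rng(0)
u_train = rng.standard_normal(3000)
y_train = np.convolve(u_train, np.exp(-0.05 * np.arange(50)), mode="same")

lags = 4
X, T = narx_features(u_train, y_train, lags)

# Sub-select a manageable subset for GP training (2000 points in the paper).
idx = rng.choice(len(X), size=500, replace=False)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X[idx], T[idx])

# Recursive multi-step prediction: predicted responses replace measured ones
# in the lagged inputs of the next step.
u_test = rng.standard_normal(500)
y_pred = list(y_train[-lags:])  # seed the recursion with the last known responses
for t in range(lags, len(u_test)):
    x_t = np.concatenate([u_test[t - lags:t + 1], np.array(y_pred[-lags:])])
    y_pred.append(float(gp.predict(x_t.reshape(1, -1))[0]))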

Tables 5 and 6 give the MSE comparison of the four surrogate models for all eight testing cases for the drifts of degrees of freedom (DOF) 1 and 2. The average MSE of GP-NARX over the eight test datasets is 0.011 and 0.017 for the drifts of the two DOFs, respectively. This implies that GP-NARX is the best of the four studied surrogate models for this particular example with this group of training data. In addition, LSTM has the best performance for test case 1, while GP-NARX has a better overall performance than the other methods.

Table 5

MSE comparison of the four surrogate modeling methods for the drift of DOF 1

Case     Data dimension    GP-NARX    LSTM     CNN-LSTM    CNN-BLSTM    Best
1        (520, 2)          0.025      0.015    0.178       0.160        0.015
2        (455, 2)          0.022      0.023    0.079       0.085        0.022
3        (470, 2)          0.007      0.027    0.116       0.124        0.007
4        (438, 2)          0.006      0.021    0.074       0.107        0.006
5        (289, 2)          0.009      0.010    0.070       0.071        0.009
6        (407, 2)          0.005      0.013    0.085       0.103        0.005
7        (518, 2)          0.006      0.013    0.089       0.110        0.006
8        (393, 2)          0.008      0.029    0.060       0.049        0.008
Average  (436, 2)          0.011      0.019    0.094       0.101        0.011

Note: The best model (i.e., the one with the lowest MSE) for each case is listed in the last column.

Table 6

MSE comparison of the four surrogate modeling methods for the drift of DOF 2

Case     Data dimension    GP-NARX    LSTM     CNN-LSTM    CNN-BLSTM    Best
1        (520, 2)          0.031      0.013    0.110       0.169        0.013
2        (455, 2)          0.034      0.060    0.259       0.207        0.034
3        (470, 2)          0.016      0.084    0.151       0.168        0.016
4        (438, 2)          0.012      0.067    0.121       0.130        0.012
5        (289, 2)          0.014      0.020    0.209       0.194        0.014
6        (407, 2)          0.010      0.012    0.073       0.108        0.010
7        (518, 2)          0.012      0.022    0.124       0.120        0.012
8        (393, 2)          0.006      0.023    0.075       0.228        0.006
Average  (436, 2)          0.017      0.038    0.140       0.166        0.017

Note: The best model (i.e., the one with the lowest MSE) for each case is listed in the last column.

Figure 19 shows the random input excitation of a case study that is used to verify the accuracy of the various surrogate models. Figures 20 and 21 then compare the surrogate model predictions with the true responses (i.e., the force–displacement hysteresis loops) for this testing case (i.e., Case 4 in Tables 5 and 6). Figures 22–25 compare the drifts predicted by the surrogate models with the true responses. To compare the accuracy of the different methods more intuitively, Fig. 26 depicts the prediction error distributions of the drifts for the four surrogate models.

Fig. 19
Random input excitation of a case study that is used to verify the accuracy of various surrogate models
Fig. 20
Comparison of prediction and true response of a testing case for GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM (force–displacement hysteresis of story 1)
Fig. 21
Comparison of prediction and true response of a testing case for GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM (force–displacement hysteresis of story 2)
Fig. 22
Comparison of prediction and true response of drifts for GP-NARX
Fig. 23
Comparison of prediction and true response of drifts for LSTM
Fig. 24
Comparison of prediction and true response of drifts for CNN-LSTM
Fig. 25
Comparison of prediction and true response of drifts for CNN-BLSTM
Fig. 26
Comparison of probability density functions of prediction errors for a testing case

The results in Figs. 20–26 show that the predictive model constructed using GP-NARX has the highest prediction accuracy for this particular testing case. Among the three deep learning methods, LSTM has the best performance, while CNN-LSTM and CNN-BLSTM show similar performance.

5 Discussion and Conclusion

This paper presents a comparative study of surrogate modeling for nonlinear dynamic systems using four types of surrogate models, namely, GP-NARX, LSTM, CNN-LSTM, and CNN-BLSTM. The performance of the different surrogate models for multi-step-ahead prediction is compared using three examples. The results show that (1) GP-NARX in general performs better than the other three types of surrogate modeling methods for the three particular examples; (2) the performance of GP-NARX is also relatively more robust than that of the other methods; (3) for some cases, the deep learning-based surrogate modeling methods (i.e., LSTM, CNN-LSTM, and CNN-BLSTM) perform better than GP-NARX, but their required tuning effort is higher than that of GP-NARX; (4) the deep learning-based surrogate modeling methods are more suitable for predictions whose data patterns are similar to those of the training datasets; and (5) all the studied surrogate modeling methods can perform long-term multi-step prediction based on training data from short periods. The study also demonstrates the promising potential of various machine learning methods for surrogate modeling of nonlinear dynamic systems.

Even though deep learning methods are becoming increasingly popular in many fields, they do not provide a universal solution to all problems. For instance, the deep learning-based methods outperform the conventional GP-NARX for several testing cases in the first example, but GP-NARX in general performs better than the deep learning-based methods in the second and third examples. It is therefore not advisable to substitute conventional machine learning methods with deep learning methods for surrogate modeling in engineering applications without detailed investigation. For problems that can be solved using conventional methods such as GP-NARX, following the “Occam's razor” principle, such a method should be used, since its required tuning effort is much lower. Deep learning methods (i.e., LSTM, CNN-LSTM, and CNN-BLSTM) have a significant advantage over GP-NARX in dealing with big data, such as videos, images, and other high-dimensional data. Even for big data problems, however, the GP-NARX method is still worth investigating after applying dimension reduction techniques, such as autoencoders, or the subsampling method discussed in this paper. The suggested guidance for choosing surrogate models is to start from the classical GP-NARX model whenever it is applicable, since it tends to be more robust and easier to tune than deep learning methods. In some situations, deep learning models can be used in conjunction with GP-NARX to improve the predictive capability. For instance, a CNN model can be used to reduce a high-dimensional problem to a low-dimensional one in a latent space, within which GP-NARX models can capture the temporal variability over time.
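As a rough illustration of this hybrid idea (an assumed architecture sketched for illustration only, not a method developed in this paper), a small one-dimensional CNN encoder can compress high-dimensional response snapshots into a low-dimensional latent code, and a GP can then model the one-step dynamics of the latent trajectory:

import numpy as np
import torch
import torch.nn as nn
from sklearn.gaussian_process import GaussianProcessRegressor

class Encoder(nn.Module):
    # Small 1-D CNN mapping a (channels, window) response snapshot to a latent code.
    def __init__(self, channels=8, latent=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, latent))

    def forward(self, x):
        return self.net(x)

# Hypothetical high-dimensional response snapshots (200 windows, 8 channels, 64 steps).
rng = np.random.default_rng(0)
snapshots = torch.tensor(rng.standard_normal((200, 8, 64)), dtype=torch.float32)

encoder = Encoder()  # in practice the encoder would be trained, e.g., as part of an autoencoder
with torch.no_grad():
    latents = encoder(snapshots).numpy()  # latent trajectory of shape (200, 3)

# A GP models the one-step latent dynamics; recursion as in GP-NARX would then
# propagate the latent state over time.
gp = GaussianProcessRegressor(normalize_y=True).fit(latents[:-1], latents[1:])
next_latent = gp.predict(latents[-1:])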

This paper only compared the performance of the various surrogate modeling methods for deterministic predictions. Since the confidence of dynamic predictions is usually of interest to decision makers, the capability of the various methods to provide probabilistic predictions is worth investigating in the future. In addition, because the quality of a surrogate model is affected by the data used for training, how to collect the most effective data for training surrogate models of nonlinear dynamic systems is also an interesting research topic for further study.

Acknowledgment

The US Army Corps of Engineers provided funding for this work through the U.S. Army Engineer Research and Development Center Research Cooperative Agreement W912HZ-17-2-0024. The support is gratefully acknowledged. LA-UR-21-31231 approved for public release; distribution is unlimited.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request. The authors attest that all data for this study are included in the paper.

References

1. Irizarry, R., 2005, "A Generalized Framework for Solving Dynamic Optimization Problems Using the Artificial Chemical Process Paradigm: Applications to Particulate Processes and Discrete Dynamic Systems," Chem. Eng. Sci., 60(21), pp. 5663–5681.
2. Xu, B., He, J., and Masri, S. F., 2015, "Data-Based Model-Free Hysteretic Restoring Force and Mass Identification for Dynamic Systems," Comput.-Aided Civ. Infrastruct. Eng., 30(1), pp. 2–18.
3. Pravin, P., Bhartiya, S., and Gudi, R. D., 2019, "Modeling and Predictive Control of an Integrated Reformer–Membrane–Fuel Cell–Battery Hybrid Dynamic System," Ind. Eng. Chem. Res., 58(26), pp. 11392–11406.
4. Tahmasian, S., 2021, "Dynamic Analysis and Optimal Control of Drag-Based Vibratory Systems Using Averaging," Nonlinear Dyn., 104(3), pp. 2201–2217.
5. Li, M., and Lai, A. C. K., 2015, "Review of Analytical Models for Heat Transfer by Vertical Ground Heat Exchangers (GHEs): A Perspective of Time and Space Scales," Appl. Energy, 151(1), pp. 178–191.
6. Butt, W. A., Yan, L., and Amezquita S, K., 2013, "Adaptive Integral Dynamic Surface Control of a Hypersonic Flight Vehicle," Int. J. Syst. Sci., 46(10), pp. 1717–1728.
7. Yoon, S., 2003, "A Study on Terrain-Surface Modeling and Searching Algorithms for Real-Time Simulation of Off-Road Vehicles," Veh. Syst. Dyn., 39(5), pp. 353–363.
8. Ouyang, M., 2014, "Review on Modeling and Simulation of Interdependent Critical Infrastructure Systems," Reliab. Eng. Syst. Saf., 121(1), pp. 43–60.
9. Gerdes, J., Bruck, H. A., and Gupta, S. K., 2019, "A Simulation-Based Approach to Modeling Component Interactions During Design of Flapping Wing Aerial Vehicles," Int. J. Micro Air Veh., 11(1), p. 1756829318822325.
10. Tavecchia, G., Miranda, M. A., Borras, D., Bengoa, M., Barcelo, C., Paredes-Esquivel, C., and Schwarz, C., 2017, "Modelling the Range Expansion of the Tiger Mosquito in a Mediterranean Island Accounting for Imperfect Detection," Front. Zool., 14(1), p. 39.
11. Thomas, N., Portyankina, G., Hansen, C. J., and Pommerol, A., 2011, "HiRISE Observations of Gas Sublimation-Driven Activity in Mars' Southern Polar Regions: IV. Fluid Dynamics Models of CO2 Jets," Icarus, 212(1), pp. 66–85.
12. Rivals, I., and Personnaz, L., 1996, "Black-Box Modeling With State-Space Neural Networks," Neural Adaptive Control Technol., 15(1), pp. 237–264.
13. Pillonetto, G., and De Nicolao, G., 2010, "A New Kernel-Based Approach for Linear System Identification," Automatica, 46(1), pp. 81–93.
14. Hong, X., Mitchell, R. J., Chen, S., Harris, C. J., Li, K., and Irwin, G. W., 2008, "Model Selection Approaches for Non-Linear System Identification: A Review," Int. J. Syst. Sci., 39(10), pp. 925–946.
15. Masti, D., and Bemporad, A., 2021, "Learning Nonlinear State–Space Models Using Autoencoders," Automatica, 129(1), p. 109666.
16. Deshmukh, A. P., and Allison, J. T., 2017, "Design of Dynamic Systems Using Surrogate Models of Derivative Functions," ASME J. Mech. Des., 139(10), p. 101402.
17. Gedon, D., Wahlström, N., Schön, T. B., and Ljung, L., 2021, "Deep State Space Models for Nonlinear System Identification," IFAC-PapersOnLine, 54(7), pp. 481–486.
18. Schön, T. B., Wills, A., and Ninness, B., 2011, "System Identification of Nonlinear State-Space Models," Automatica, 47(1), pp. 39–49.
19. van Der Voort, M., Dougherty, M., and Watson, S., 1996, "Combining Kohonen Maps With Arima Time Series Models to Forecast Traffic Flow," Transp. Res. C: Emerg. Technol., 4(5), pp. 307–318.
20. Ing, C.-K., and Wei, C.-Z., 2005, "Order Selection for Same-Realization Predictions in Autoregressive Processes," Ann. Stat., 33(5), pp. 2423–2474. DOI: 10.1214/009053605000000525
21. Li, D., 2012, "A Note on Moving-Average Models With Feedback," J. Time Ser. Anal., 33(6), pp. 873–879.
22. Chang, S., Wei, X., Su, F., Liu, C., Yi, G., Wang, J., Han, C., and Che, Y., 2020, "Model Predictive Control for Seizure Suppression Based on Nonlinear Auto-Regressive Moving-Average Volterra Model," IEEE Trans. Neural Syst. Rehabil. Eng., 28(10), pp. 2173–2183.
23. Mustapa, R. F., Dahlan, N. Y., Yassin, A. I., and Nordin, A. H., 2020, "Quantification of Energy Savings From an Awareness Program Using NARX-ANN in an Educational Building," Energy Build., 215(1), p. 109899.
24. Worden, K., Becker, W. E., Rogers, T. J., and Cross, E. J., 2018, "On the Confidence Bounds of Gaussian Process NARX Models and Their Higher-Order Frequency Response Functions," Mech. Syst. Signal Process, 104(1), pp. 188–223.
25. Zhang, Y., Kimberg, D. Y., Coslett, H. B., Schwartz, M. F., and Wang, Z., 2014, "Multivariate Lesion-Symptom Mapping Using Support Vector Regression," Hum. Brain Mapp., 35(12), pp. 5861–5876.
26. Tao, J.-W., and Wang, S.-T., 2012, "Kernel Support Vector Machine for Domain Adaptation," Zi dong hua xue bao, 38(5), pp. 797–811. DOI: 10.3724/SP.J.1004.2012.00797
27. Papadopoulos, S., Azar, E., Woon, W.-L., and Kontokosta, C. E., 2017, "Evaluation of Tree-Based Ensemble Learning Algorithms for Building Energy Performance Estimation," J. Build. Perform. Simul., 11(3), pp. 322–332.
28. Chen, W., Zhang, S., Li, R., and Shahabi, H., 2018, "Performance Evaluation of the GIS-Based Data Mining Techniques of Best-First Decision Tree, Random Forest, and Naïve Bayes Tree for Landslide Susceptibility Modeling," Sci. Total Environ., 644(1), pp. 1006–1018.
29. Mantas, C. J., Castellano, J. G., Moral-García, S., and Abellán, J., 2019, "A Comparison of Random Forest Based Algorithms: Random Credal Random Forest Versus Oblique Random Forest," Soft Comput., 23(21), pp. 10739–10754.
30. Loukeris, N., and Eleftheriadis, I., 2015, "Further Higher Moments in Portfolio Selection and A Priori Detection of Bankruptcy, Under Multi-Layer Perceptron Neural Networks, Hybrid Neuro-Genetic MLPs, and the Voted Perceptron," Int. J. Finance Econ., 20(4), pp. 341–361.
31. Botvinick, M. M., and Plaut, D. C., 2006, "Short-Term Memory for Serial Order: A Recurrent Neural Network Model," Psychol. Rev., 113(2), pp. 201–233.
32. Song, X., Liu, Y., Xue, L., Wang, J., Zhang, J., Wang, J., Jiang, L., and Cheng, Z., 2020, "Time-Series Well Performance Prediction Based on Long Short-Term Memory (LSTM) Neural Network Model," J. Pet. Sci. Eng., 186(1), p. 106682.
33. Chen, H., Zhang, Y., Kalra, M. K., Lin, F., Chen, Y., Liao, P., Zhou, J., and Wang, G., 2017, "Low-Dose CT With a Residual Encoder-Decoder Convolutional Neural Network," IEEE Trans. Med. Imaging, 36(12), pp. 2524–2535.
34. Pak, U., Kim, C., Ryu, U., Sok, K., and Pak, S., 2018, "A Hybrid Model Based on Convolutional Neural Networks and Long Short-Term Memory for Ozone Concentration Prediction," Air Qual. Atmos. Health, 11(8), pp. 883–895.
35. Filiński, M., Wachel, P., and Tiels, K., 2021, Low-Dimensional Decompositions for Nonlinear Finite Impulse Response Modeling, Springer International Publishing, Cham, pp. 352–359.
36. Shokry, A., Baraldi, P., Zio, E., and Espuña, A., 2020, "Dynamic Surrogate Modeling for Multistep-Ahead Prediction of Multivariate Nonlinear Chemical Processes," Ind. Eng. Chem. Res., 59(35), pp. 15634–15655.
37. Piga, D., and Tóth, R., 2014, "A Bias-Corrected Estimator for Nonlinear Systems With Output-Error Type Model Structures," Automatica, 50(9), pp. 2373–2380.
38. Rahrooh, A., and Shepard, S., 2009, "Identification of Nonlinear Systems Using NARMAX Model," Nonlinear Anal. Theory Methods Appl., 71(12), pp. e1198–e1202.
39. ElSaid, A., El Jamiy, F., Higgins, J., Wild, B., and Desell, T., 2018, "Optimizing Long Short-Term Memory Recurrent Neural Networks Using Ant Colony Optimization to Predict Turbine Engine Vibration," Appl. Soft Comput., 73(1), pp. 969–991.
40. Xue, P., Jiang, Y., Zhou, Z., Chen, X., Fang, X., and Liu, J., 2019, "Multi-Step Ahead Forecasting of Heat Load in District Heating Systems Using Machine Learning Algorithms," Energy, 188(1). DOI: 10.1016/j.energy.2019.116085
41. Zhan, X., Zhang, S., Szeto, W. Y., and Chen, X., 2019, "Multi-Step-Ahead Traffic Speed Forecasting Using Multi-Output Gradient Boosting Regression Tree," Intell. Transp. Syst., 24(2), pp. 125–141.
42. Wang, J., Jiang, H., Wu, Y., and Dong, Y., 2015, "Forecasting Solar Radiation Using an Optimized Hybrid Model by Cuckoo Search Algorithm," Energy, 81(1), pp. 627–644.
43. Brownlee, J., 2017, "4 Strategies for Multi-Step Time Series Forecasting," https://machinelearningmastery.com/multi-step-time-series-forecasting/, Accessed August 15, 2021.
44. Yu, H., Chung, C. Y., Wong, K. P., Lee, H. W., and Zhang, J. H., 2009, "Probabilistic Load Flow Evaluation With Hybrid Latin Hypercube Sampling and Cholesky Decomposition," IEEE Trans. Power Syst., 24(2), pp. 661–667.
45. Rasmussen, C. E., and Williams, C. K., 2006, Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA.
46. Johnson, M. E., Moore, L. M., and Ylvisaker, D., 1990, "Minimax and Maximin Distance Designs," J. Stat. Plan. Inference, 26(2), pp. 131–148.
47. Hu, Z., Hu, C., Mourelatos, Z. P., and Mahadevan, S., 2019, "Model Discrepancy Quantification in Simulation-Based Design of Dynamical Systems," ASME J. Mech. Des., 141(1), p. 011401.
48. Lee, H. I., Choi, I. Y., Moon, H. S., and Kim, J. K., 2020, "A Multi-Period Product Recommender System in Online Food Market Based on Recurrent Neural Networks," Sustainability (Basel, Switzerland), 12(3), p. 969. DOI: 10.3390/su12030969
49. Zhou, L., 2020, "Product Advertising Recommendation in e-Commerce Based on Deep Learning and Distributed Expression," Electron. Commer. Res., 20(2), pp. 321–342.
50. Jiang, X., Sun, J., Li, C., and Ding, H., 2018, "Video Image Defogging Recognition Based on Recurrent Neural Network," IEEE Trans. Industr. Inform., 14(7), pp. 3281–3288.
51. Li, S., Xie, G., Ren, J., Guo, L., Yang, Y., and Xu, X., 2020, "Urban PM 2.5 Concentration Prediction via Attention-Based CNN–LSTM," Appl. Sci., 10(6), p. 1953.
52. Tian, Y., Zhang, K., Li, J., Lin, X., and Yang, B., 2018, "LSTM-Based Traffic Flow Prediction With Missing Data," Neurocomputing, 318(1), pp. 297–305.
53. Li, Y., Su, H., Qi, C. R., Fish, N., Cohen-Or, D., and Guibas, L. J., 2015, "Joint Embeddings of Shapes and Images via CNN Image Purification," ACM Trans. Graph., 34(6), pp. 1–12. DOI: 10.1145/2816795.2818071
54. Sarin, H., Kokkolaras, M., Hulbert, G., Papalambros, P., Barbat, S., and Yang, R.-J., 2008, "A Comprehensive Metric for Comparing Time Histories in Validation of Simulation Models With Emphasis on Vehicle Safety Applications," Proc. International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Brooklyn, NY, Aug. 3–6, pp. 1275–1286.
55. Ao, D., Hu, Z., and Mahadevan, S., 2017, "Dynamics Model Validation Using Time-Domain Metrics," J. Verif. Valid. Uncertain. Quantif., 2(1), p. 011004.
56. Sarin, H., Kokkolaras, M., Hulbert, G., Papalambros, P., Barbat, S., and Yang, R.-J., 2010, "Comparing Time Histories for Validation of Simulation Models: Error Measures and Metrics," J. Dyn. Syst. Meas. Control, 132(6), p. 061401.
57. Wiggins, S., 1987, "Chaos in the Quasiperiodically Forced Duffing Oscillator," Phys. Lett. A, 124(3), pp. 138–142.
58. Pei, J.-S., Gay-Balmaz, F., Luscher, D. J., Beck, J. L., Todd, M. D., Wright, J. P., Qiao, Y., Quadrelli, M. B., Farrar, C. R., and Lieven, N. A., 2021, "Connecting Mem-Models With Classical Theories," Nonlinear Dyn., 103(2), pp. 1321–1344.
59. Bouc, R., 1967, "Forced Vibrations of Mechanical Systems With Hysteresis," Proceedings of the Fourth Conference on Nonlinear Oscillations, Prague, Sept. 5–9, p. 315.
60. Bouc, R., 1971, "A Mathematical Model for Hysteresis," Acta Acust. United Acust., 24(1), pp. 16–25.
61. Wen, Y.-K., 1976, "Method for Random Vibration of Hysteretic Systems," J. Eng. Mech., 102(2), pp. 249–263.
62. Lei, Y., Wu, Y., and Li, T., 2012, "Identification of Non-Linear Structural Parameters Under Limited Input and Output Measurements," Int. J. Non Linear Mech., 47(10), pp. 1141–1146.