Abstract
In recent years, identification of nonlinear dynamical systems from data has become increasingly popular. Sparse regression approaches, such as sparse identification of nonlinear dynamics (SINDy), have fostered the development of novel governing equation identification algorithms, assuming the state variables are known a priori and the governing equations lend themselves to sparse, linear expansions in a (nonlinear) basis of the state variables. In the context of identifying governing equations of nonlinear dynamical systems, one faces the problem of identifiability of model parameters when state measurements are corrupted by noise. Measurement noise affects the stability of the recovery process, yielding incorrect sparsity patterns and inaccurate estimation of the coefficients of the governing equations. In this work, we investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements and numerically estimate the state-time derivatives, with the goal of improving the accuracy and robustness of two sparse regression methods for recovering governing equations: the sequentially thresholded least squares (STLS) and weighted basis pursuit denoising (WBPDN) algorithms. We empirically show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point. We additionally compare generalized cross-validation (GCV) and Pareto curve criteria as model selection techniques to automatically estimate near-optimal tuning parameters, and conclude that Pareto curves yield better results. The performance of the denoising strategies and sparse regression methods is empirically evaluated through well-known benchmark problems of nonlinear dynamical systems.
1 Introduction
Engineers rely heavily on mathematical models of real systems. In the design process, engineers seek the best models of the physical system they aim to design and optimize. However, physical processes are often highly complex and inadequately understood, and the sources of noise depend critically on the nature of the nonlinearities. System identification allows any system, however complex or obscure, to be modeled solely from measurements.
Recent advances in data acquisition systems, along with modern data science techniques, have fostered the development of accurate data-driven approaches [1] for modeling complex nonlinear dynamical systems with the aim of understanding their behavior and predicting future states. A current trend in identifying dynamical system models relies on approximating the governing equations in an over-complete basis of the state variables and eliminating the expansion terms that do not carry significant signal energy. Examples of this approach include polynomial NARMAX [2], symbolic polynomial regression [3,4], and sparse (polynomial) regression [5,6], dubbed sparse identification of nonlinear dynamics (SINDy) in Ref. [6].
System identification via sparse (polynomial) regression employs techniques from compressed sensing, specifically regularization via sparsity-promoting norms such as the ℓ0- and ℓ1-norms, to identify an a priori unknown subset of the basis describing the dynamics. The identified dynamics may then be further analyzed to understand the physical behavior of the dynamical system and can be integrated in time to predict future states of the system. Stage-wise regression [7], matching pursuits [8–10], least absolute shrinkage and selection operator (LASSO) [11], least angle regression (LARS) [12], sequentially thresholded least squares (STLS) [6], and basis pursuit denoising (BPDN) [13] are some sparsity-promoting algorithms that may be used for model recovery. For the interested reader, Ref. [14] provides a thorough comparison of different state-of-the-art sparse regression algorithms. For high-dimensional systems and large data sets, sparse regression methods often suffer from the curse of dimensionality, since they can be prohibitively expensive in terms of computational cost and storage capacity. Even though this work explores tractable low-dimensional systems, several methods have been proposed to mitigate the curse of dimensionality for high-dimensional systems, such as coordinate reduction via auto-encoders [15,16] or efficient array manipulation via low-rank tensor decompositions [17].
In practice, analysts only have access to state measurements from real sensors that are invariably corrupted by noise. Although sparse regression methods perform well for low levels of measurement noise, the sparsity pattern and accuracy of the recovered coefficients are usually very sensitive to higher noise levels. Several techniques have been proposed to mitigate the effect of noise when recovering governing equations from state measurements. Denoising strategies for governing equation recovery can be divided into two categories: simultaneous denoising and equation recovery, and a priori state denoising. The former strategy generally involves nonlinear optimization procedures where the loss function usually contains a measurement data loss, which measures the mismatch between state measurements and state predictions, and a state derivative error, which measures the deviation between the state-time derivatives and the estimated governing equation. Some of the simultaneous methods employ neural networks [18] and automatic differentiation [19] to decompose the noisy measurements into deterministic and random noise components and thereby recover the governing equations more accurately. Others use smoothing splines [20] and deep neural networks [21] as a surrogate model to approximate the state variables for the data loss, which can then be differentiated to obtain the state-time derivatives required for the state derivative error. However, simultaneous methods often require solving challenging nonlinear optimization problems and fine-tuning the hyperparameters of the algorithm. The more recent a priori denoising strategy, adopted in this article, focuses on estimating accurate and smooth state measurements and their derivatives as a preprocessing step, so that sparse regression algorithms produce significantly more accurate identified dynamics. We adopt the SINDy framework, as proposed in Ref. [6], and compare different approaches to denoise the state measurements and estimate accurate time derivatives with no prior knowledge of the process that corrupted the measurements. Typically, numerical differentiation is carried out using finite differences. However, since data measured by digital equipment are inherently discrete and contain noise due to device properties and quantization effects, naive finite differences produce inaccurate derivative estimates [22–24]. A common approach to obtain derivative estimates is to approximate the underlying signal by a fitting function from the measured data. One then hopes that the derivative computed from this noise-filtered approximation is accurate enough.
There exists extensive work on nonparametric regression smoothing and numerical differentiation of noisy data [24–30], also referred to as denoising techniques in this article. The underlying assumption of nonparametric smoothing techniques is that the signal is smooth and its derivatives up to a certain order exist. Then, to filter out undesired noisy components from the signal, one poses the smoothing task as a tradeoff between the accuracy and the smoothness of the filtered solution. Denoising strategies can be split into local and global techniques. Local denoising methods fit a low-degree polynomial to the nearest neighbors of each data point, with weights proportional to the inverse distance from each neighbor to the data point [28]. Global denoising methods use all the data points to estimate a denoised version of the original data. Some of the common denoising techniques include local kernel smoothers [31,32], smoothing splines [33], Tikhonov regularization [34,35], total variation regularization [36,37], Fourier-based smoothing [29,38], wavelet-based denoising [39,40], and singular spectrum analysis [41,42]. More recent trends use deep neural networks for signal denoising [43–45]. However, these methods often require large amounts of data and several trajectory realizations, making them unsuitable for the present application, where we focus on a single trajectory. This article compares the performance of the Savitzky–Golay filter and LOWESS as local methods, and smoothing splines, the Tikhonov smoother, and ℓ1-trend filtering as global smoothers.
A key challenge of smoothing methods is the selection of the smoothing hyperparameter that controls the fidelity of the filtered solution relative to the original, noisy measurements, and its smoothness after removing undesired (high-frequency) noise components. In the statistics literature, the compromise between accuracy and smoothness is often referred to as the bias–variance tradeoff [46]. In general, the filtered solution is highly sensitive to the smoothing hyperparameter, and a suitable selection criterion is of paramount importance. Several methods have been proposed to select appropriate smoothing hyperparameters, such as the Akaike information criterion (AIC) [47], the Bayesian information criterion (BIC) [48], Mallows' Cp [49], the discrepancy principle [50,51], cross-validation (CV) [52,53], and the Pareto curve (also L-curve) criterion [54,55], among others. An ideal model selection criterion should yield a solution as close as possible to the unknown dynamical process that generated the exact (i.e., noiseless) data. Hence, the choices of smoothing algorithm and model selection criterion are inherently linked and should be studied simultaneously to select a smoothing parameter that is near optimal with respect to a suitable error measure. Due to their favorable properties, this article focuses on two robust model selection criteria: generalized cross-validation (GCV) and the Pareto curve criterion.
1.1 Contribution of this Work.
The purpose of this article is to present an overview of major techniques for measurement denoising and numerical differentiation as a preprocessing step to recover the governing equations of nonlinear dynamical systems. Our aim is to provide an empirical assessment of several best-performing filtering techniques for system identification rather than delivering a thorough theoretical analysis.
In this work, we study and assess the performance of local and global filtering regression techniques to denoise state measurements and accurately estimate state-time derivatives. To the best of our knowledge, no prior work has addressed both denoising and numerical differentiation tailored to data-driven governing equation recovery. We extend the findings in Refs. [24,28,29,56,57] and apply smoothing techniques to dynamical systems, where the state trajectories are assumed to be smooth on the Poincaré map. One of the critical aspects of filtering and regularization techniques is the automatic selection of a suitable hyperparameter that controls the smoothness of the solution, for filtering, and its sparsity, for equation recovery. We also compare simple and robust model selection techniques to automatically select appropriate smoothing and regularization parameters. Although this article focuses on the differential form of sparse identification algorithms, the findings can also be applied to any identification algorithm that requires accurate state measurement data and their time derivatives. For example, the smoothed data could be fed into equation-error methods, typical in aircraft system identification [58,59].
We begin, in the next section, by presenting a background on recovering dynamical system equations from state measurements using sparsity promoting regression techniques. In Sec. 3, we present an overview of different local and global smoothing techniques to denoise state measurements and estimate accurate state-time derivatives. In Sec. 4, we introduce two hyperparameter selection techniques, namely generalized cross-validation and Pareto curve, to automatically select near optimal regularization parameters for smoothing and sparse regression methods. In Sec. 5, we compare the performance of the different smoothing, numerical differentiation, and model selection techniques and assess the performance of sparse regression methods to recover the governing equations of three well-known dynamical systems. Finally, in Sec. 6, we draw conclusions and discuss relevant aspects of the various smoothing methods and offer directions for improvements of denoising techniques.
2 Problem Statement and Solution Strategies

Schematic of the linear system in Eq. (4). Solid lines represent exact signals, whereas dots are their noisy counterparts. Measurement noise affects the state-time derivative accuracy and perturbs the space spanned by the basis functions represented by the columns of the measurement matrix.
The problem in Eq. (5) can be seen as an errors-in-variables model, where perturbations are present in both the dependent and independent variables. There are specialized methods to solve this problem, such as total least squares [60,61]. Instead, this work focuses on reducing the size of the perturbations via filtering, where no specific structure is assumed in ej and E.
Notation. In the interest of simpler notation, we henceforth drop the subscript j from the quantities in Eq. (4). Unless otherwise stated, they refer to the measurements and not the dynamics in Eq. (1).
2.1 Sparse Regression.
2.2 Sequential Thresholded Least Squares.
The sufficient conditions for general convergence, rate of convergence, conditions for one-step recovery, and a recovery result with respect to the condition number and noise are addressed in Ref. [66]. Algorithm 1 summarizes a practical implementation of STLS.
1: procedure STLS(Φ, ẋ, λ)
2: Solve Φξ = ẋ for ξ using least squares.
3: Apply thresholding: set ξi = 0 wherever |ξi| < λ.
4: while not converged or the maximum number of iterations is not reached do
5: Delete the columns of Φ whose corresponding component of ξ is 0, obtaining Φ̃.
6: Solve Φ̃ξ = ẋ for the surviving components of ξ using least squares.
7: Apply thresholding: set ξi = 0 wherever |ξi| < λ.
8: Update ξ.
9: end while
10: end procedure
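For illustration, the following is a minimal NumPy sketch of Algorithm 1 for a single state variable. The symbols follow the pseudocode above (Φ for the basis matrix, ẋ for the derivative estimates, λ for the threshold), while the function name `stls`, the iteration cap, and the convergence test on the sparsity pattern are our implementation choices, not the article's.

```python
import numpy as np

def stls(Phi, dxdt, threshold, max_iter=10):
    """Sketch of Algorithm 1 (STLS) for one state variable.

    Phi       : (m, p) matrix of basis functions evaluated on the measurements
    dxdt      : (m,)   estimated state-time derivatives
    threshold : hard-thresholding parameter (lambda in the text)
    """
    # Step 2: initial least-squares solve.
    xi = np.linalg.lstsq(Phi, dxdt, rcond=None)[0]
    pattern = None
    for _ in range(max_iter):
        small = np.abs(xi) < threshold  # steps 3 and 7: hard thresholding
        xi[small] = 0.0
        if small.all() or (pattern is not None and np.array_equal(small, pattern)):
            break  # everything pruned, or the sparsity pattern has converged
        pattern = small
        # Steps 5-6: re-solve least squares on the surviving columns only.
        xi[~small] = np.linalg.lstsq(Phi[:, ~small], dxdt, rcond=None)[0]
    return xi
```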
2.3 Weighted Basis Pursuit Denoising.
Iteratively reweighted basis pursuit denoising (WBPDN), adopted from Ref. [68]
1: procedure WBPDN(Φ, ẋ, λ, ε, kmax)
2: Set the iteration counter k = 0 and the initial weights to 1.
3: while not converged or k < kmax do
4: Solve the weighted BPDN problem with the current weights for ξ(k).
5: Update the weights: wi(k+1) = 1/(|ξi(k)| + ε).
6: k ← k + 1.
7: end while
8: end procedure
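As a concrete sketch, the loop below implements the reweighting scheme of Ref. [68] on top of the unconstrained (Lagrangian) form of BPDN using CVXPY, which is also listed in Table 1. The Lagrangian formulation, the function name `wbpdn`, and the default ε and iteration count are our assumptions and may differ from the exact problem statement used in the article.

```python
import numpy as np
import cvxpy as cp

def wbpdn(Phi, dxdt, lam, eps=1e-4, k_max=5):
    """Sketch of iteratively reweighted BPDN (Algorithm 2).

    Solves xi = argmin 0.5*||Phi xi - dxdt||_2^2 + lam*||w .* xi||_1,
    then reweights w_i = 1/(|xi_i| + eps), following Ref. [68].
    """
    p = Phi.shape[1]
    w = np.ones(p)                 # k = 0: uniform weights (plain BPDN)
    xi_val = np.zeros(p)
    for _ in range(k_max):
        xi = cp.Variable(p)
        loss = 0.5 * cp.sum_squares(Phi @ xi - dxdt)
        penalty = lam * cp.norm1(cp.multiply(w, xi))
        cp.Problem(cp.Minimize(loss + penalty)).solve()
        xi_val = xi.value
        w = 1.0 / (np.abs(xi_val) + eps)  # penalize small coefficients more
    return xi_val
```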
3 Measurement Denoising and Derivative Estimation
In the present work, the aim of denoising techniques is to obtain an approximate model of the state variables that is as accurate as possible with respect to an appropriate error measure while remaining as smooth as possible. All the methods considered first approximate the state variables x(t) and then estimate the time derivatives by differentiating the state variable approximations. The underlying assumption is that if the approximation x̂(t) is close to the original, unknown variables x(t), then the estimated time derivatives are also close to ẋ(t). Mathematically, this can be expressed as ‖dx̂/dt − dx/dt‖ ≤ C⁻¹‖x̂ − x‖ for some positive constant C ∼ O(Δt), where Δt is the sampling rate, in a suitable norm, usually the ℓ2-norm.
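The following minimal sketch illustrates this smooth-then-differentiate workflow using a cubic smoothing spline (one of the global methods of Sec. 3.2) via SciPy; the signal, noise level, and smoothing factor `s` are arbitrary illustrative choices, not values used in the article.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Noisy samples of a single state variable (illustrative signal).
t = np.linspace(0.0, 10.0, 1001)
x_noisy = np.sin(t) + 0.05 * np.random.randn(t.size)

# Fit a cubic smoothing spline; s trades off fidelity against smoothness
# (the hyperparameter whose automatic selection is discussed in Sec. 4).
spline = UnivariateSpline(t, x_noisy, k=3, s=t.size * 0.05**2)

x_hat = spline(t)                  # denoised state estimate
dxdt_hat = spline.derivative()(t)  # state-time derivative of the fitted model
```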
One can divide smoothing techniques into local and global methods. On the one hand, local methods fit a regression function locally in time using neighboring measurements around a specific data point and ignore data outside the neighborhood. On the other hand, global methods employ all the data available to estimate a smooth approximation over the entire time domain. Inspired by the findings in Refs. [24,28,29,56,57], this article compares several best-performing local and global techniques for data denoising and numerical time differentiation.
3.1 Local Methods.
Illustration of the local polynomial regression method using a local quadratic polynomial (solid line) over a window of size 30% of the entire data range. The shaded regions correspond to the rectangular and Epanechnikov [81] kernel functions.
The quality of the local estimates is highly dependent on the local bandwidth h(t) and the degree of the polynomial. Large bandwidths tend to under-parametrize the regression function and increase the estimation bias because of curvature effects [82]. Small bandwidths, on the other hand, over-parametrize the unknown function, increase the variance, and produce noisy estimates. A similar tradeoff occurs with the degree of the polynomial, although the resulting estimates are less sensitive to it than to the bandwidth [31]. For a fixed bandwidth h, a large degree d reduces the bias but increases the variance, whereas a low degree reduces the variance but increases the bias due to curvature effects. As recommended in Ref. [83], and since we are interested in first-order derivatives, we use a fixed polynomial degree d = 2 in the numerical experiments of this work. How to select an appropriate bandwidth h is discussed in Sec. 4. In this article, we focus on the Savitzky–Golay (S–G) filter [84] and a locally weighted polynomial regression [85].
Reference [83] suggests a polynomial degree d = ν + 1, where ν is the order of the derivative (ν = 0 being the regression function estimate). Since we expect a higher error in the derivative estimates than in the function estimates, we focus on first-order derivatives and set d = 2 for computing both the function and derivative estimates.
3.1.1 Savitzky–Golay Filter.
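A minimal sketch of Savitzky–Golay smoothing and differentiation with SciPy's `savgol_filter` (cf. Table 1), using the local polynomial degree d = 2 adopted in this work. The window length plays the role of the bandwidth h and is fixed by hand here for illustration, rather than selected automatically as in Sec. 4.

```python
import numpy as np
from scipy.signal import savgol_filter

dt = 0.01
t = np.arange(0.0, 10.0, dt)
x_noisy = np.sin(t) + 0.05 * np.random.randn(t.size)

window = 31  # odd window length (the bandwidth h); illustrative choice
x_hat = savgol_filter(x_noisy, window_length=window, polyorder=2)
dxdt_hat = savgol_filter(x_noisy, window_length=window, polyorder=2,
                         deriv=1, delta=dt)  # first-derivative estimate
```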
3.1.2 Locally Weighted Polynomial Regression.
3.2 Global Methods.
Schematic of data filtering via global smoothing methods. The regularization parameter can be varied from λ = 0, where the data are interpolated, to infinity, which yields the best linear fit. The optimal λ lies between the two extreme cases. The λ parameter was normalized to the [0, 1] range.
3.2.1 Smoothing Splines.
3.2.2 Tikhonov Smoother.
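A minimal sketch of a second-difference Tikhonov smoother, consistent with the Hodrick–Prescott-type `hpfilter` routines listed in Table 1; the sparse closed-form solve and the function name `tikhonov_smooth` are our illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def tikhonov_smooth(y, lam):
    """Solve x_hat = argmin ||y - x||_2^2 + lam * ||D2 x||_2^2, where D2 is
    the second-order finite difference matrix (penalizing curvature)."""
    n = y.size
    D2 = sp.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    A = sp.eye(n) + lam * (D2.T @ D2)  # normal equations: (I + lam*D2'D2) x = y
    return spsolve(sp.csc_matrix(A), y)
```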
3.2.3 ℓ1-Trend Filtering.
We note that one can directly estimate derivatives by reformulating ℓ1-trend filtering in terms of an integral operator and regularizing the derivative estimates using the ℓ1-norm. As with the Tikhonov smoother, we did not perceive significant differences in the results compared to differentiation via spline interpolation of the denoised states.
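For reference, a minimal CVXPY sketch of ℓ1-trend filtering in the spirit of the CVXPY example listed in Table 1: replacing the squared ℓ2 penalty of the Tikhonov smoother with an ℓ1 penalty on the second differences yields piecewise-linear, locally adaptive estimates. The derivatives can then be obtained, as above, by spline interpolation of the denoised states.

```python
import numpy as np
import cvxpy as cp
import scipy.sparse as sp

def l1_trend_filter(y, lam):
    """Solve x_hat = argmin 0.5*||y - x||_2^2 + lam * ||D2 x||_1."""
    n = y.size
    D2 = sp.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    x = cp.Variable(n)
    objective = 0.5 * cp.sum_squares(y - x) + lam * cp.norm1(D2 @ x)
    cp.Problem(cp.Minimize(objective)).solve()
    return x.value
```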
4 Hyperparameter Selection Methods
Model selection is the task of optimally selecting model hyperparameters from a set of candidates, given a set of data. In the context of state variable denoising, the selection criterion aims to optimize the predictive quality of the model, i.e., the denoised states, by selecting the candidate that best balances closeness of fit and complexity. All the methods described in Secs. 2.1 and 3 contain a hyperparameter that controls this tradeoff: a sparsity-inducing regularization parameter in the case of the sparse regression algorithms, and a smoothing parameter, the bandwidth h for local smoothing and λ for global smoothing, in the case of the denoising methods. A considerable number of selection methods have been proposed, following different philosophies and exhibiting varying performances; see Refs. [46,95,96] for a review. In this work, we focus on two data-driven selection methods widely used in regularized regression without prior knowledge of the noise process: GCV and the Pareto curve criterion.
4.1 Generalized Cross-Validation.
For local methods, the GCV functions are defined in terms of the kernel bandwidth h. There are two alternatives to select suitable bandwidths using GCV: varying the bandwidth locally or globally. In the former case, the local bandwidth is varied within a range h(t0) ∈ [hmin(t0), hmax(t0)] for each data point t0, and the one that minimizes the local GCV is selected. In the latter case, a local regression for all data points with a fixed bandwidth is performed and repeated with different bandwidths within a range h ∈ [hmin, hmax], and the one that minimizes the global GCV function is chosen. In our numerical experiments, we did not observe significant differences in performance between the two approaches. We therefore only present the global bandwidth approach in Sec. 5.
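A sketch of the global-bandwidth approach for the Savitzky–Golay filter follows. It evaluates GCV(h) = (RSS/n)/(1 − tr(H)/n)² over a set of candidate windows, approximating the trace of the hat matrix H by n times the central S–G convolution coefficient, which is exact for interior points and ignores boundary effects. This approximation and the helper name `gcv_savgol` are our choices; the GCV definition used in the article is Eq. (32).

```python
import numpy as np
from scipy.signal import savgol_filter, savgol_coeffs

def gcv_savgol(y, windows, polyorder=2):
    """Select a global Savitzky-Golay window (bandwidth) by minimizing GCV."""
    n = y.size
    scores = []
    for w in windows:
        y_hat = savgol_filter(y, window_length=w, polyorder=polyorder)
        rss = np.sum((y - y_hat) ** 2)
        # Diagonal entry H_ii at interior points equals the central S-G
        # kernel weight, so tr(H) ~ n * c0 (boundary rows neglected).
        c0 = savgol_coeffs(w, polyorder)[w // 2]
        scores.append((rss / n) / (1.0 - c0) ** 2)
    return windows[int(np.argmin(scores))]
```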
4.2 Pareto Curve Criterion.
The Pareto curve is a graph that traces the tradeoff between the residual, i.e., the SSE in Eq. (21), and the regularization constraint as the regularization parameter λ is varied. Typically, the Pareto curve is an L-shaped graph generated by plotting the norm of the regularization constraint in Eq. (21) versus the norm of the residual on a log–log scale for different values of λ. The Pareto curve we refer to in this work is often called the L-curve for ℓ2-regularization [54] and the Pareto frontier for ℓ1-regularization [102]. We will speak of the Pareto curve when the ℓ2-norm is employed for the residual and either the ℓ1- or ℓ2-norm is used for the regularization constraint, and we will be specific in cases where confusion may arise. The Pareto curve often has a corner located where the solution changes from being dominated by the regularization error, corresponding to the steepest part of the curve, to being dominated by the residual error, where the curve becomes flat. The Pareto curve criterion employs the heuristic based on this observation: the optimal value of λ corresponds to the curve's corner, where a good balance of the regularization and residual errors is achieved.
For continuous Pareto curves, defined for all λ ∈ [0, ∞), Hansen [54] suggests defining the corner as the point of maximum curvature. For discrete curves, however, locating the corner becomes more complicated. Hansen et al. [103] highlight the difficulties in computing the corner point from discrete L-curves built using a finite number of λ values. The discrete curves may contain small local corners other than the global one, which may give suboptimal regularization parameters. To alleviate this issue, they propose an adaptive pruning algorithm, where the best corner point is computed from candidate corners extracted from the curve at different resolutions. Since the L-curves must be sampled at different scales, the pruning algorithm can be computationally expensive. We instead compute the corner points using a simpler method proposed by Cultrera et al. [104]. The algorithm iteratively employs the Menger curvature [105] of a circumcircle and the golden section search method to efficiently locate the point on the curve with maximum curvature.
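For illustration, the sketch below locates the corner of a discrete Pareto curve by an exhaustive Menger-curvature scan in the log–log plane. The algorithm of Ref. [104] additionally uses a golden section search so that only a few points of the curve need to be evaluated, which we omit here for brevity; the helper names are ours.

```python
import numpy as np

def menger_curvature(p1, p2, p3):
    """Menger curvature of three 2D points: 4 * area / (product of sides)."""
    a = np.linalg.norm(p2 - p1)
    b = np.linalg.norm(p3 - p2)
    c = np.linalg.norm(p3 - p1)
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1])
    return 2.0 * abs(cross) / (a * b * c + 1e-30)  # |cross| = 2 * triangle area

def pareto_corner(residual_norms, constraint_norms):
    """Index of the maximum-curvature (corner) point of a discrete L-curve."""
    P = np.column_stack([np.log10(residual_norms), np.log10(constraint_norms)])
    curv = [menger_curvature(P[i - 1], P[i], P[i + 1])
            for i in range(1, len(P) - 1)]
    return int(np.argmax(curv)) + 1
```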
As compared to the GCV criterion, the Pareto curve criterion for ℓ2-regularization is often more robust to correlated errors, since it combines information about the residual norm and the norm of the solution, whereas GCV only uses information about the residual norm [54]. The Pareto curve criterion also presents some limitations. The L-curve is likely to fail for problems with very smooth solutions [106], and nonconvergence may occur when the generalized Fourier coefficients decay at the same rate or less rapidly than the singular values of the basis matrix used in this article (see Ref. [107] for details). Additionally, the location of the corner depends on the scale on which the Pareto curve is considered, and the corner may not appear at certain scales [108]. In the numerical examples section, we compare the performance and discuss limitations of GCV and the Pareto curve for the smoothing and sparse regression methods considered in this article.
5 Numerical Examples
In this section, we present numerical examples to assess the performance of the smoothing and numerical differentiation methods presented in Sec. 3, as well as the model selection methods in Sec. 4. We then compare the accuracy of the sparse regression algorithms in Sec. 2, specifically WBPDN and STLS, in recovering the governing equations of three nonlinear dynamical systems using the preprocessed data. In all cases, we assume no prior knowledge of the noise values and the dynamics that generated the data, except that the governing equations can be sparsely expressed in a multivariate monomial basis in the state variables. We only have access to the noisy state measurements at discrete times, represented by the measurement vector and sampled every Δt units of time. The exact state variables are computed by integrating the nonlinear ordinary differential equations (ODEs) using the fourth-order explicit Runge–Kutta (RK4) integrator with a tolerance of 10−10. In this work, we assume that the state variables are contaminated with independent zero-mean additive noise with variance σ2. Specifically, for all the numerical examples, we employ a white Gaussian noise model and vary σ to study different noise levels. We additionally study the effect of Gaussian distributed colored noise generated with a power law spectrum [109].
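As an illustration of this data-generation setup, the sketch below simulates the Lorenz 63 system with SciPy and adds white Gaussian noise. We use the adaptive `solve_ivp` integrator with tight tolerances as a stand-in for the integrator described above; the classical parameters, initial condition, and time span are our illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz63(t, x, sig=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system (classical parameters)."""
    return [sig * (x[1] - x[0]),
            x[0] * (rho - x[2]) - x[1],
            x[0] * x[1] - beta * x[2]]

dt = 0.01
t_eval = np.arange(0.0, 10.0, dt)
sol = solve_ivp(lorenz63, (0.0, 10.0), [1.0, 1.0, 1.0],
                t_eval=t_eval, rtol=1e-10, atol=1e-10)

sigma = 0.1                 # noise level, as in the text
X_noisy = sol.y.T + sigma * np.random.randn(*sol.y.T.shape)
```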
While here we work with a Gaussian noise model, we note that more general anisotropic and correlated noise can be easily incorporated through the generalized least squares formulation of the smoothing algorithms. We refer the interested reader to Ref. [110] for a thorough discussion of generalized least squares, and to Section 4.6.2 of Ref. [97] for estimating the weighting matrix involved in the method. To measure the signal magnitude relative to the noise level, we provide, in Appendix B, the signal-to-noise ratio (SNR) for each state j.
The considered nonlinear dynamical systems are the Lorenz 63 system as a base model for identifying chaotic dynamics, and the Duffing and Van der Pol oscillators as nonlinear stiffness and damping models. These dynamical systems are well-known reference problems and have been extensively studied in the literature.
We provide in Table 1 a compendium of available codes in matlab, python, and R for each of the smoothing methods. We highlighted in bold the codes we used in this article for smoothing splines and ℓ1-trend filtering. For local methods and Tikhonov smoother, we used our own implementation.
Compendium of filtering software codes and packages available in matlab, python, and R
Filter | matlab | python | R
---|---|---|---
Savitzky–Golay | sgolayfilt (signal processing) | savgol_filter (scipy) | sgolayfilt (signal)
LOWESS | lowess (curve fitting) | lowess (lowess) | lowess (stats)
Tikhonov | hpfilter (econometrics) | hpfilter (statsmodels) | hpfilter (mFilter)
Smoothing splines | csaps (curve fitting) | UnivariateSpline (scipy) | smooth.spline (stats)
ℓ1-trend filter | l1_tf^a | ℓ1-trend filtering (CVXPY)^b | trendfilter (genlasso)
Note: We provide routine names and packages/toolboxes (enclosed in parentheses).
^a Code available at https://stanford.edu/∼boyd/l1_tf/
^b Implementation available at https://www.cvxpy.org/examples/applications/l1_trend_filter.html
5.1 Lorenz 63 System.
Figure 4 compares the relative state (top row) and state-time derivative (bottom row) errors defined in Eq. (34) after running each of the local and global smoothing methods using GCV (left column) and the Pareto curve criterion (right column) to automatically select the regularization parameter λ. The error trend with respect to different noise levels behaves as expected, with a smooth error increase as the noise level grows. We notice that, in general, global methods outperform local methods at all noise levels. This observation is aligned with the conclusions drawn in Ref. [29]. Specifically, LOWESS with the optimal Epanechnikov kernel is superior to the Savitzky–Golay filter with a uniform kernel. For the estimation of the state variables, the difference in accuracy between local and global methods is not as evident as for the state-time derivative estimates. We also observe that more accurate state estimates yield more accurate state-time derivatives for the different filters presented. In this example, the global methods perform similarly, with ℓ1-trend filtering yielding the best results in general. As discussed in Ref. [90], the local adaptivity of ℓ1-trend filtering produces better estimates as compared to cubic smoothing splines. However, we do not notice significant superiority in terms of overall accuracy.

Comparison of the relative state (top row) and state-time derivative (bottom row) errors for the Lorenz 63 system using local and global smoothers. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).
Figure 5 compares the GCV function defined in Eq. (32) for the Savitzky–Golay filter and LOWESS as a function of the global bandwidth h at different noise levels. The solid circles and crosses correspond to the minimum error, i.e., the optimal bandwidth, and the minimum of the GCV function, respectively. The general trend for both filters is consistent across noise levels: the bandwidth h increases as the noise level increases. The interpretation is that as noise levels increase, the local features of the signal are progressively lost and a larger neighborhood is needed to estimate the signal. Reference [113] provides an explanation of this phenomenon from a geometric viewpoint. However, GCV tends to select larger bandwidths than the optimal ones, thus yielding over-smoothed estimates.

GCV functions for each state variable of Lorenz 63 system using Savitzky–Golay and LOWESS filters at different noise levels for an arbitrary realization. Solid circles represent the global bandwidth h that yield the minimum estimation error defined in Eq. (34). Crosses represent the minima of the GCV functions for the GCV selection strategy.

For the global methods, we also illustrate the behavior of GCV and Pareto curves for selecting near-optimal smoothing parameters λ. Figure 6 shows the Pareto curves (top panels) and GCV functions (bottom panels) for smoothing splines at different noise levels for one state trajectory realization. For the Pareto curve criterion, the crosses represent the corners of the L-shaped curves found via the algorithm in Ref. [104]. For the GCV criterion, the crosses represent the minimum of the GCV function computed via Eq. (32). For both criteria, the dots represent the λ that yields the lowest relative coefficient error eξ. The first observation is that the Pareto curves consistently select larger λ (smoothing increases as we move along the Pareto curve from top to bottom) as we increase the noise level. We noticed that the L-shaped structure becomes clearer and sharper in regions between the low and high noise level extremes. Overall, for all noise cases, the λ values corresponding to the corner points of the Pareto curves almost coincide with the optimal ones. In the GCV case, we observe a flat region, where the minimum is located, and a steep region, where over-smoothing errors become more noticeable. As with the Pareto curves, GCV is also a good model selection strategy for selecting the smoothing parameter λ. We omit the plots for the Tikhonov smoother since it yields similar trends and observations.

The Pareto curves for ℓ1-trend filtering, shown in Fig. 7 (top panels), are better defined than in the smoothing splines and Tikhonov smoother cases, yielding smoothing parameters closer to the optimal ones. However, in the case of GCV, we notice that the behavior of the GCV function in the region of small λ is inconsistent and may yield minima resulting in under-smoothed state estimates. We also observed that by removing the small-λ region corresponding to the noisy GCV behavior, the λ corresponding to the new minima matched the optimal ones well (see, for example, the σ = 0.1 case for state x1 in Fig. 7). The results reported for ℓ1-trend filtering using GCV in Fig. 4 were obtained after truncating the region of small λ with irregularities in the GCV function.

Next, we present the performance of WBPDN and STLS for estimating the governing equation coefficients, using the data filtered by each of the global smoothing methods. For each method, we used the data filtered via the Pareto curve criterion, since it slightly outperformed GCV. We omit the analysis of local methods since we observed that global methods generally offer better accuracy for the state and state-time derivative estimates in the examples of this article. Figure 8 compares the relative solution error, given in Eq. (36), for WBPDN (top panels) and STLS (bottom panels) at different noise levels, using GCV (dashed lines) and Pareto curves (solid lines) as automatic hyperparameter selection techniques. Generally, as noted in Ref. [64], WBPDN significantly outperforms STLS for the hyperparameter selection methods presented in this work. The error trend consistently increases with the noise level for both sparse regression methods. Overall, WBPDN is more robust to noise and yields more accurate coefficient estimates. As in the filtering stage, the Pareto curve criterion performs better than GCV for the sparse regression algorithms presented in this article.
We also note that even though ℓ1-trend filtering yields slightly better state and state-time derivative estimates than smoothing splines and the Tikhonov smoother, the difference is not noticeable in the error of the coefficient estimates. Thus, the global smoothing methods, together with the Pareto curve criterion, give similar results for the recovery of the coefficients. Figure 9 shows the Pareto curves and GCV functions for the zeroth iteration, i.e., no weighting, of WBPDN with data filtered by ℓ1-trend filtering. We emphasize that good coefficient estimates at the zeroth iteration of WBPDN are crucial for the subsequent iterations to enhance sparsity, reduce the error, and converge. As can be observed, the corner point criterion of the Pareto curves results in accurate estimation of near-optimal regularization parameters λ, whereas the minima of the GCV functions generally tend toward smaller λ than the optimal ones. Similar results were observed for the Tikhonov smoother and smoothing splines.
Pareto curves (top panels) and GCV functions (bottom panels) for each state variable of the Lorenz 63 system using smoothing splines at different noise levels for an arbitrary realization. Solid circles represent the regularization parameters λ that yield the minimum estimation error defined in Eq. (34). Crosses represent the converged corner points for Pareto curves (using the iterative algorithm in Ref. [104]) and the minima of the GCV functions for GCV. Note that the corner on the Pareto curves for small noise levels is not visible because of the scale of the plot; however, the corner exists if one magnifies the plot around that region.

Pareto curves (top panels) and GCV functions (bottom panels) for each state variable of Lorenz 63 systems using ℓ1-trend filtering at different noise levels for an arbitrary realization. Solid circles represent the regularization parameters λ that yield the minimum estimation error defined in Eq. (34). Crosses represent the converged corner points for Pareto curves and the minima of the GCV functions for GCV.


Relative coefficient errors of Lorenz 63 for WBPDN (top panels) and STLS (bottom panels) with respect to different noise levels σ with measurements preprocessed via smoothing splines, Tikhonov smoother, and ℓ1-trend filtering. The dashed and solid lines correspond to the solution using GCV and Pareto curves for selecting λ, respectively.


Pareto curves (top panels) and GCV functions (bottom panels) for each state variable of Lorenz 63 system for WBPDN using ℓ1-trend filtering at different noise levels for an arbitrary realization. Solid circles represent the regularization parameters λ that yield the minimum estimation error defined in Eq. (34). Crosses represent the converged corner points for Pareto curves and the minima of the GCV functions for GCV. Similar trends were observed for smoothing splines and Tikhonov smoother.

For STLS, we noticed that both the GCV functions and the Pareto curves were nonsmooth compared to the WBPDN case, possibly yielding suboptimal λ estimates. We also observed that measuring the complexity of the model, i.e., the vertical axis of the Pareto curve log–log plots, using the ℓ0- and ℓ1-norms for STLS did not produce well-defined L-shaped curves and corners, resulting in poor estimates of the optimal λ. Instead, we used 1/λ. Figure 10 shows the GCV functions and Pareto curves for STLS with data filtered using the Tikhonov smoother. As seen, the ℓ0-Pareto curve (top panels) is inconsistent with the typical L shape of Pareto curves. Similar results were observed for the ℓ1-Pareto curve. Both the 1/λ-Pareto curves and the GCV functions present discontinuities for STLS, which may be caused by the hard-thresholding step of the algorithm.

Next, we assess the prediction accuracy of the identified governing equations for the best-performing procedure, i.e., ℓ1-trend filtering for denoising the data and estimating derivatives, and WBPDN for recovering the Lorenz 63 dynamics. For the lowest noise case, σ = 0.001, even though the coefficients were recovered with high accuracy, we noticed an abrupt departure of the predicted state trajectory from the exact one at around eight time units. Therefore, we only report the prediction of the identified dynamics up to eight time units for all noise cases. Figure 11 illustrates the measurements used for training (left panel) and the state predictions of the mean recovered model (right panel). We generated 100 state trajectory realizations of training measurements and computed the mean of the estimated coefficients over the realizations. We can see that the butterfly-shaped behavior of the Lorenz 63 system is captured for all noise cases. However, the σ = 1 case results in poor prediction performance, as the state trajectory diverges significantly from the exact one. Table 2 provides a quantitative assessment of the prediction accuracy up to eight time units for all noise cases.

ℓ0- and 1/λ-Pareto curves (top and central panels) and GCV functions (bottom panels) for each state variable of Lorenz 63 system for STLS using Tikhonov smoother at different noise levels for an arbitrary realization. Solid circles represent the regularization parameters λ that yield the minimum estimation error defined in Eq. (34). Crosses represent the converged corner points for Pareto curves and the minima of the GCV functions for GCV. Similar trends were observed for smoothing splines and ℓ1-trend filtering.


Predicted states for Lorenz 63 from the recovered governing equations for different noise levels using ℓ1-trend filter/Pareto for filtering and WBPDN/Pareto for sparse regression. Left: The true dynamics are highlighted in a solid line, and discrete observations used for training are represented by dots. The cross indicates the initial condition. Right: State predictions for different noise levels up to eight time units.

Mean prediction errors and their standard deviations (enclosed in parentheses) up to eight time units for Lorenz 63 for different noise levels using ℓ1-trend filter/Pareto for filtering and WBPDN/Pareto for sparse regression
Noise | σ = 0.001 | σ = 0.01 | σ = 0.1 | σ = 1
---|---|---|---|---
ex1 | 3.46 × 10^−2 (3.05 × 10^−2) | 4.24 × 10^−1 (3.90 × 10^−1) | 8.47 × 10^−1 (2.69 × 10^−1) | 1.27 × 10^0 (1.89 × 10^−1)
ex2 | 4.60 × 10^−2 (4.07 × 10^−2) | 4.67 × 10^−1 (3.85 × 10^−1) | 8.68 × 10^−1 (2.58 × 10^−1) | 1.25 × 10^0 (1.54 × 10^−1)
ex3 | 2.19 × 10^−2 (1.89 × 10^−2) | 1.18 × 10^−1 (7.88 × 10^−2) | 2.39 × 10^−1 (7.71 × 10^−2) | 4.03 × 10^−1 (7.66 × 10^−2)
eX | 2.19 × 10^−2 (1.93 × 10^−2) | 2.18 × 10^−1 (1.78 × 10^−1) | 4.10 × 10^−1 (1.19 × 10^−1) | 5.93 × 10^0 (7.87 × 10^−2)

Notes: For each of the recovered models from the 100 noisy trajectory realizations, we predicted the state trajectories by integrating the identified governing equations. We then computed the mean and standard deviation of the prediction errors defined in Eq. (34).
In the σ = 0.1 and σ = 1 cases, especially the latter, we noticed that the coefficients recovered for some of the 100 realizations yielded unstable dynamical models that could not be integrated for state prediction. We removed those cases when computing the average and the standard deviation used to generate the plot in Fig. 11 (right panel) and to assess the prediction accuracy in Table 2.
Finally, we study the performance of local and global smoothing techniques for additive Gaussian colored noise. We generated noise εj for each state variable j and different noise levels using a power spectral density (PSD) per unit of bandwidth proportional to 1/|f|d, where f denotes frequency. Specifically, we simulated pink, blue, and brown noise corresponding to exponents d = 1, − 1, − 2, respectively, and added it to the true state trajectory.
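A minimal sketch of one standard way to generate such noise by spectrally shaping white Gaussian noise follows; the helper name `colored_noise` and the rescaling to the target level σ are our choices, and the exponent convention follows the text (PSD ∝ 1/|f|^d, with d = 1, −1, −2 for pink, blue, and brown noise).

```python
import numpy as np

def colored_noise(n, d, sigma, seed=None):
    """Gaussian noise with power spectral density proportional to 1/|f|^d."""
    rng = np.random.default_rng(seed)
    F = np.fft.rfft(rng.standard_normal(n))  # spectrum of white noise
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                              # avoid division by zero at DC
    F *= f ** (-d / 2.0)                     # PSD ~ |shaping|^2 ~ 1/|f|^d
    x = np.fft.irfft(F, n)
    return sigma * x / np.std(x)             # rescale to the target noise level
```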
Figure 12 shows the relative state error against the noise level for all local and global smoothers using GCV and Pareto curves as hyperparameter selection methods. As illustrated, the color of the noise does not affect the performance of the smoothing methods, yielding almost identical results. The same conclusion as in the white noise case can be drawn: global smoothers outperform local smoothers, and ℓ1-trend filtering is slightly more accurate than the rest of the global methods. For completeness, the error performance on the time derivatives can be found in Appendix D. We omit the analysis of the governing equation recovery performance since the smoothing results with colored noise are similar to those in the white noise case (Fig. 4).

Comparison of the relative state errors for the Lorenz 63 system using local and global smoothers with pink, blue, and brown noise (from left to right). The regularization parameters were computed using GCV (top row) and the Pareto curve criterion (bottom row).
5.2 Duffing and Van der Pol Oscillators.
We present the performance of the different filtering strategies for denoising the states and estimating time derivatives in Fig. 13 for the Duffing oscillator and Fig. 14 for the Van der Pol oscillator. As in the Lorenz 63 example, global methods outperform local methods for the model selection techniques used in this article. Again, ℓ1-trend filtering yields slightly better results than smoothing splines and the Tikhonov smoother, especially when estimating derivatives. Both GCV and Pareto curves robustly select suitable λ and perform similarly for the global methods.

Similar observations can be made for the identification of both nonlinear oscillators. Figure 15 shows the accuracy of WBPDN in recovering the governing equation coefficients for the Duffing (top panels) and Van der Pol (bottom panels) oscillators. State measurements preprocessed with the global filtering strategies yield similar recovery performance using WBPDN, and Pareto curves outperform GCV in estimating near-optimal regularization parameters. We do not show the plots for the Pareto curves and GCV functions, and omit the comparison with STLS for these examples, since we observed similar trends as in the Lorenz 63 case.

Finally, we report the prediction accuracy qualitatively in Fig. 16 for the Duffing system and Fig. 17 for the Van der Pol system. We also report the prediction errors up to 20 time units in Tables 3 and 4 for the Duffing and Van der Pol oscillators, respectively. For these examples, we integrated the identified governing equations up to 20 time units, about 10 times the time span used for training. As highlighted in Remark 3, we removed the realizations that produced unstable dynamics at higher noise levels. As shown, the predictions match the exact trajectory well for noise levels up to σ = 0.01, and diverge for the worst noise case, σ = 0.1. It is remarkable that the recovered models capture the dynamical behavior of the Duffing and Van der Pol oscillators using a small number of discrete state samples. In the Duffing case, both the damping rate and the cubic nonlinearity are well identified, except for the highest noise case. For the Van der Pol example, the identified models recover the limit cycle behavior even though the training set does not include any sample along the limit cycle trajectory.

Comparison of the relative state (top row) and state-time derivative (bottom row) errors for the Duffing oscillator using local and global smoothers. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).

Comparison of the relative state (top row) and state-time derivative (bottom row) errors for the Van der Pol oscillator using local and global smoothers. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).

Relative coefficient errors for Duffing (top panels) and Van der Pol (bottom panels) oscillators via WBPDN with measurements preprocessed using smoothing splines, Tikhonov smoother, and ℓ1-trend filtering. The dashed and solid lines correspond to the solution using GCV and Pareto curves for selecting λ, respectively.


Predicted states for the Duffing oscillator from the recovered governing equations for different noise levels using ℓ1-trend filter/Pareto for filtering and WBPDN/Pareto for sparse regression. Left: The true dynamics are highlighted in a solid line, and discrete observations used for training are represented by dots. The cross indicates the initial condition. Right: State predictions for different noise levels up to 20 time units.


Predicted states for the Van der Pol oscillator from the recovered governing equations for different noise levels using ℓ1-trend filter/Pareto for filtering and WBPDN/Pareto for sparse regression. Left: The true dynamics are highlighted in a solid line, and discrete observations used for training are represented by dots. The cross indicates the initial condition. Right: State predictions for different noise levels up to 20 time units.

Mean prediction errors and their standard deviations (enclosed in parentheses) up to 20 time units for the Duffing oscillator for different noise levels using ℓ1-trend filter/Pareto for filtering and WBPDN/Pareto for sparse regression
| Noise | σ = 0.0001 | σ = 0.001 | σ = 0.01 | σ = 0.1 |
|---|---|---|---|---|
| ex1 | 9.06 × 10−4 (6.96 × 10−4) | 8.39 × 10−3 (5.79 × 10−3) | 3.07 × 10−1 (3.78 × 10−1) | 9.76 × 10−1 (2.89 × 10−1) |
| ex2 | 7.34 × 10−4 (5.57 × 10−4) | 6.80 × 10−3 (4.62 × 10−3) | 2.76 × 10−1 (3.55 × 10−1) | 9.33 × 10−1 (4.59 × 10−1) |
| eX | 7.35 × 10−4 (5.58 × 10−4) | 6.81 × 10−3 (4.63 × 10−3) | 2.76 × 10−1 (3.54 × 10−1) | 9.33 × 10−1 (4.60 × 10−1) |
Notes: For each of the models recovered from the 100 noisy trajectory realizations, we predicted the state trajectories by integrating the identified governing equations and then computed the mean and standard deviation of the prediction errors defined in Eq. (34).
Mean prediction errors and their standard deviations (in parentheses) up to 20 time units for the Van der Pol oscillator at different noise levels, using the ℓ1-trend filter/Pareto for filtering and WBPDN/Pareto for sparse regression
| Noise | σ = 0.0001 | σ = 0.001 | σ = 0.01 | σ = 0.1 |
|---|---|---|---|---|
| ex1 | 2.01 × 10−4 (1.51 × 10−4) | 1.88 × 10−3 (1.52 × 10−3) | 1.56 × 10−2 (1.11 × 10−2) | 1.62 × 100 (1.47 × 100) |
| ex2 | 4.40 × 10−4 (3.30 × 10−4) | 4.12 × 10−3 (3.31 × 10−3) | 3.45 × 10−2 (2.41 × 10−2) | 7.46 × 10−1 (3.68 × 10−1) |
| eX | 4.35 × 10−4 (3.27 × 10−4) | 4.07 × 10−3 (3.28 × 10−3) | 3.42 × 10−2 (2.38 × 10−2) | 1.64 × 100 (1.35 × 100) |
Notes: For each of the models recovered from the 100 noisy trajectory realizations, we predicted the state trajectories by integrating the identified governing equations and then computed the mean and standard deviation of the prediction errors defined in Eq. (34).
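For concreteness, the following is a minimal sketch of this prediction-error computation. The function and variable names (`basis`, `models`, etc.) are hypothetical, and Eq. (34) is assumed here to be a relative Frobenius-norm error; this is an illustration, not the authors' code.

```python
import numpy as np
from scipy.integrate import solve_ivp

def prediction_error(coeffs, basis, x0, t_eval, X_true):
    """Integrate the identified model dx/dt = basis(x) @ coeffs from x0 and
    return the relative error against the true trajectory X_true
    (assumed form of Eq. (34))."""
    rhs = lambda t, x: basis(x) @ coeffs
    sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), x0, t_eval=t_eval)
    X_pred = sol.y.T  # shape (len(t_eval), n_states)
    return np.linalg.norm(X_pred - X_true) / np.linalg.norm(X_true)

# Mean and standard deviation over the 100 recovered models:
# errs = [prediction_error(c, basis, x0, t_eval, X_true) for c in models]
# print(np.mean(errs), np.std(errs))
```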
6 Conclusion
The motivation of this work was to investigate and compare the performance of several filtering strategies and model selection techniques to improve the accuracy and robustness of sparse regression methods for recovering governing equations of nonlinear dynamical systems from noisy state measurements. We presented several best-performing local and global noise filtering techniques for a priori denoising of state measurements and estimation of state time-derivatives without stringent assumptions on the noise statistics. In general, we observed that global smoothing methods, combined with either GCV or Pareto curves for selecting near optimal regularization parameters λ, outperform local smoothers with GCV as the bandwidth selector. Of the global methods, ℓ1-trend filtering yields the best accuracy for state measurement denoising and estimation of state time-derivatives. However, the superior accuracy of ℓ1-trend filtering did not translate into significantly better coefficient recovery via WBPDN or STLS compared to smoothing splines and the Tikhonov smoother. WBPDN produces smoother GCV functions and Pareto curves than STLS, and the selected regularization parameters lie closer to the optimal ones; we conjecture that this is why WBPDN yields more accurate governing equation coefficients. Consistent with the findings in Ref. [64], the corner point criterion of Pareto curves often yields better estimates of near optimal regularization parameters than cross-validation strategies.
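To make the denoising step concrete, the following is a minimal sketch of ℓ1-trend filtering posed as a convex program. It is not the implementation used in this work; the solver (cvxpy) and the grid-based selection of λ at the Pareto corner are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

def l1_trend_filter(y, lam):
    """Solve min_x 0.5*||x - y||_2^2 + lam*||D2 @ x||_1, where D2 is the
    second-order difference matrix, so the fitted x is piecewise linear."""
    n = y.size
    D2 = np.diff(np.eye(n), n=2, axis=0)  # rows apply x[i] - 2*x[i+1] + x[i+2]
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(x - y)
                           + lam * cp.norm1(D2 @ x))).solve()
    return x.value

# A near optimal lam would then be picked at the corner of the Pareto curve
# traced by (||x - y||_2, ||D2 @ x||_1) over a grid of lam values.
```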
We emphasize that the filtering strategies and model selection techniques presented in this article can serve as guidelines for other identification algorithms that require measurement smoothing or numerical differentiation to enhance their performance. For example, integral-based approaches, e.g., Refs. [116–118], which do not require numerical differentiation to recover governing equations, can also benefit from the filtering strategies presented here.
Footnote
MATLAB routines for STLS can be found at https://faculty.washington.edu/kutz/
Acknowledgment
This work was supported by the National Science Foundation (Grant CMMI-1454601).
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated in and supporting the findings of this article are available from the corresponding author upon reasonable request.
Appendix A: Difference Matrices
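As a sketch of the matrices this appendix refers to, the first- and second-order difference matrices used by the Tikhonov smoother and the ℓ1-trend filter are assumed here to follow the standard convention for n samples (a reconstruction, not the paper's verbatim definitions):

```latex
D^{(1)} = \begin{bmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{bmatrix} \in \mathbb{R}^{(n-1)\times n},
\qquad
D^{(2)} = \begin{bmatrix}
1 & -2 & 1 & & & \\
 & 1 & -2 & 1 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1 & -2 & 1
\end{bmatrix} \in \mathbb{R}^{(n-2)\times n}.
```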
Appendix B: Signal-to-Noise Ratios
Signal-to-noise ratios (in dB) of each state variable for the Lorenz 63 system

| Noise | State x1 | State x2 | State x3 |
|---|---|---|---|
| σ = 0.001 | 101.06 | 102.76 | 110.53 |
| σ = 0.01 | 81.06 | 82.76 | 90.53 |
| σ = 0.1 | 61.06 | 62.76 | 70.53 |
| σ = 1 | 41.06 | 42.76 | 50.53 |
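The tabulated values are consistent with the standard definition of SNR as a power ratio in decibels (each tenfold increase in σ lowers every entry by exactly 20 dB). A sketch of that computation, assuming zero-mean Gaussian noise of standard deviation σ:

```python
import numpy as np

def snr_db(x, sigma):
    """SNR (dB) of a state trajectory x corrupted by zero-mean Gaussian
    noise of standard deviation sigma: 10*log10(signal power / noise power)."""
    return 10.0 * np.log10(np.mean(x**2) / sigma**2)
```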
Appendix C: State, State-Time Derivative, and Coefficient Error Variances
This section compares the variances of the relative state and state time-derivative errors defined in Eq. (34) after running each of the local and global smoothing methods on 100 noisy trajectory realizations, using GCV (left column) and the Pareto curve criterion (right column) to automatically select the regularization parameter λ (Figs. 18–23).
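As an illustration of how λ can be selected automatically by GCV for a linear smoother, the following sketch applies GCV to a Tikhonov smoother; the exact operators used in this work may differ, and the grid of λ values is an assumption.

```python
import numpy as np

def tikhonov_gcv(y, lambdas):
    """Pick lambda for the Tikhonov smoother x_hat = (I + lam*D2.T@D2)^{-1} y
    by minimizing GCV(lam) = n*||y - S y||^2 / trace(I - S)^2."""
    n = y.size
    D2 = np.diff(np.eye(n), n=2, axis=0)  # second-order difference matrix
    best_lam, best_gcv, best_fit = None, np.inf, None
    for lam in lambdas:
        S = np.linalg.inv(np.eye(n) + lam * (D2.T @ D2))  # smoother matrix
        resid = y - S @ y
        gcv = n * np.sum(resid**2) / (n - np.trace(S))**2
        if gcv < best_gcv:
            best_lam, best_gcv, best_fit = lam, gcv, S @ y
    return best_lam, best_fit
```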

Variances of the relative state (top row) and state time-derivative (bottom row) errors for the Lorenz 63 system using local and global smoothers. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).

Variances of the relative coefficient errors for the Lorenz 63 system using global smoothers and WBPDN. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).

Variances of the relative state (top row) and state time-derivative (bottom row) errors for the Duffing oscillator using local and global smoothers. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).

Variances of the relative coefficient errors for the Duffing oscillator using global smoothers and WBPDN. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).

Variances of the relative state (top row) and state time-derivative (bottom row) errors for the Van der Pol oscillator using local and global smoothers. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).

Variances of the relative coefficient errors for the Van der Pol oscillator using global smoothers and WBPDN. The regularization parameters were computed using GCV (left column) and the Pareto curve criterion (right column).
Appendix D: State-Time Derivative Error Performance With Colored Noise
See Fig. 24.

Comparison of the relative state time-derivative errors for the Lorenz 63 system using local and global smoothers with pink, blue, and brown noise (from left to right). The regularization parameters were computed using GCV (top row) and the Pareto curve criterion (bottom row).
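One common way to synthesize the pink, blue, and brown noise used in this appendix is to shape the spectrum of white Gaussian noise; the sketch below does this via the FFT. The paper's exact noise-generation procedure is not specified here, so this is an illustrative assumption.

```python
import numpy as np

def colored_noise(n, beta, seed=None):
    """Generate length-n noise with power spectrum S(f) ~ 1/f**beta by
    spectrally shaping white Gaussian noise.
    beta = 1: pink, beta = -1: blue, beta = 2: brown."""
    rng = np.random.default_rng(seed)
    white = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]  # avoid division by zero at the DC bin
    x = np.fft.irfft(white / f**(beta / 2.0), n)
    return x / np.std(x)  # rescale to unit standard deviation
```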