## Abstract

In this work, we consider the problem of learning a reduced-order model of a high-dimensional stochastic nonlinear system with control inputs from noisy data. In particular, we develop a hybrid parametric/nonparametric model that learns the “average” linear dynamics in the data using dynamic mode decomposition with control (DMDc) and the nonlinearities and model uncertainties using Gaussian process (GP) regression and compare it with total least-squares dynamic mode decomposition (tlsDMD), extended here to systems with control inputs (tlsDMDc). The proposed approach is also compared with existing methods, such as DMDc-only and GP-only models, in two tasks: controlling the stochastic nonlinear Stuart–Landau equation and predicting the flowfield induced by a jet-like body force field in a turbulent boundary layer using data from large-scale numerical simulations.

## 1 Introduction

Control of high-dimensional nonlinear dynamical systems, such as turbulent flows, which are expensive to model due to their large state spaces, has led to the need for approximate models that are computationally tractable and amenable to standard control algorithms. Simulations and experiments of such complex and uncertain dynamical systems typically yield large spatiotemporal data that can be used to learn tractable reduced-order models suitable for control.

Data-driven model reduction of systems with high-dimensional state spaces has typically been performed using proper orthogonal decomposition (POD) [1]. POD computes the singular value decomposition (SVD) of snapshots of the high-dimensional state (e.g., flowfield) and identifies an optimal subspace containing the most energetic modes on which the high-dimensional state space is projected [2]. For systems with control inputs, POD variants, such as balanced proper orthogonal decomposition [3], have also been proposed, although balanced proper orthogonal decomposition requires knowledge of the underlying equations of the system. Dynamic mode decomposition (DMD) [4,5] is an attractive alternative for modeling the spatiotemporal evolution of data, such as fluid flows, from time-resolved snapshots of the high-dimensional system. Extensions of DMD have included systems with control inputs [6,7], noise-aware [8,9], and sparsity-promoting variants [10], among others.

Dynamic mode decomposition-based approaches often fail to capture the nonlinearities in the underlying dynamics due to the assumption of linearity of the DMD models. For that reason, Koopman operator theory—which is viewed as a generalization of DMD—has been extensively explored both in the fluid dynamics [11] and the controls community [12,13], with DMD-based [14,15], kernel-based [16], and deep learning-based [17,18] approximations of the Koopman operator proposed. Similar modeling efforts have also led to new methods such as sparse identification of nonlinear dynamics [19] and operator inference for model reduction [20].

Gaussian process (GP) regression [21] is a type of nonparametric regression model describing distributions over functions conditioned on the training data. They are ideal for learning arbitrary nonlinear stochastic dynamics due to their flexibility and inherent ability to provide uncertainty estimates capturing both process and measurement noise, as well as model uncertainties. GP regression and its scalable variants [22–25] have been successfully used for a number of dynamical systems modeling [26,27] and control tasks [28–30]. In the context of high-dimensional systems, GP regression has been used in modeling the POD coefficients for systems with varying parameters [31–34] as well as the reduced-order dynamics after POD or auto-encoder-based order reduction [35,36].

In this brief, we propose a hybrid dynamic mode decomposition with control (DMDc) + GP model for learning the nonlinearities and model uncertainties in the reduced-order data that DMDc alone fails to capture. In addition, total least-squares DMD [9] is extended to systems with control inputs (tlsDMDc) to fairly compare the proposed method with a noise-aware, linear-only DMD-based approach that accounts for control inputs.

In Sec. 2, we first introduce DMDc and extend total least-squares DMD [9] to systems with control inputs. Then, we introduce the general framework of Gaussian process regression and the process of training a DMDc + GP reduced-order model. In Sec. 3, we demonstrate the advantages of the proposed models on two tasks: modeling and controlling the nonlinear stochastic Stuart–Landau equation from noisy data and predicting the wall-normal velocity field induced by a jet-like body force field in a turbulent boundary layer.

## 2 Method

### 2.1 Dynamic Mode Decomposition With Control.

where $y\u2208\mathbb{R}ny$ is the state, $u\u2208\mathbb{R}nu$ is the control input, $w\u2208\mathbb{R}ny$ is the independent and identically distributed Gaussian white noise, and $f(\xb7)$ is the nonlinear operator that propagates the state $y(k)$ by one time-step.

Our goal is to identify a reduced-order model of the underlying dynamics when Eq. (1) is unknown and only a limited number of noisy experimental or numerical data is available, as is typical in fluid dynamics applications.

*p*training data tuples ${(y(i),u(i),f(y(i),u(i),w(i)))}1p$, where $y(i),\u2009i=1,\u2026,p$, are not necessarily sequential, the data can be arranged as

#### 2.1.1 Model Order Reduction.

A common way to reduce the dimensionality of the data when $ny\u226b1$ is to project the high-dimensional state $y(k)$ onto the POD modes given by the SVD of the data matrix $Y=U\Sigma VT$, where the columns of matrix $U\u2208\mathbb{R}ny\xd7p$ are the orthonormal eigenvectors of $YYT$ or POD modes arranged by their energy content, i.e., their singular value, the columns of $V\u2208\mathbb{R}p\xd7p$ are the orthonormal eigenvectors of $YTY$, and $\Sigma \u2208\mathbb{R}p\xd7p$ is the diagonal matrix containing the singular values of **Y** arranged by their magnitude.

*n*POD modes corresponding to the

_{x}*n*largest singular values of

_{x}**Y**is a common choice in model reduction methods that focuses the modeling efforts on the most important modes of the high-dimensional system and ignores the least energetic (and, typically, noisy) ones. The high-dimensional state can then be approximated as

*k*. In general, $x(k)$ is approximated in the least-squares sense as $x(k)=UPODTy(k)$. The snapshot matrices (2

*a*) and (2

*b*) are also reduced as

where $X,X\u2032$ are the POD mode amplitude matrices for the training data (2*a*) and (2*b*).

#### 2.1.2 Dynamic Mode Decomposition With Control.

*x*(

*k*) are modeled as a discrete-time linear state space system of the form

#### 2.1.3 Total Least-Squares Dynamic Mode Decomposition With Control.

*a*) and (2

*b*) can be decomposed in a mean and noise part as

**X**and $X\u2032$. The approximation of the dynamics can be expressed as

*A*and

*B*

The above is an extension of tlsDMD to systems with control inputs (tlsDMDc).

### 2.2 Gaussian Process Regression.

*A*and

*B*, we can then learn the nonlinear and process noise terms $e(\xb7)$ of the dynamics (4) that DMDc (and tlsDMDc) fails to capture. With

*A*and

*B*known, we can estimate the linear approximation error of each snapshot as

While DMDc and tlsDMDc assume that the error term $e(\xb7)$ in Eq. (4) is zero, here we attempt to model the error with GP regression trained on the data in Eq. (11). In particular, we seek to model the error term $e(x(k),u(k),w(k))$ in Eq. (4) with a nonparametric Gaussian process regression model as follows.

#### 2.2.1 Exact Gaussian Process Inference.

We start by considering a single component of the vector function $e(\xb7)$ which, for clarity, we will refer to as $e(\xb7)$. The input to this function at time-step *k* is the concatenation of the state $x(k)$ and control input $u(k)$ at that time-step, i.e., $z(k)=[xT(k)\u2009uT(k)]T\u2208\mathbb{R}nz$, with $nz=nx+ny$. If we have noisy observations *ϵ _{i}* of the unknown scalar-valued function $e(\xb7):\mathbb{R}nz\u2192\mathbb{R}$ at known inputs $zi$, for all $Z={zi}i=1p$, we collect these observations in a vector $\u025b$.

Let the measurement likelihood $p(\u025bi|e(zi))$ be zero-mean Gaussian and let $e$ be the (unknown) vector containing the values of $e(\xb7)$ at the points $Z$. We introduce a Gaussian prior $e(z)\u223cN(e(z)|m(z),k(z,z))$, where $m(\xb7):\mathbb{R}n\u2192\mathbb{R}$ is the *mean function* (typically chosen to be the zero function) and $k(\xb7,\xb7):\mathbb{R}nz\xd7\mathbb{R}nz\u2192\mathbb{R}$ is the *kernel function* (typically, a squared exponential) that measures the closeness between two input points and specifies the smoothness and continuity properties of the underlying unknown function $e(\xb7)$.

with the mean vector defined as $[m(Z)]i=m(zi)$ and the covariance $[k(Z,Z)]ij=k(zi,zj)$.

*hyperparameters*$\Theta ={\theta m,\theta k,\sigma \u025b}$ that define the Gaussian process mean, kernel, and likelihood functions can be found by minimizing the negative log-likelihood of the training data

#### 2.2.2 Multiple Outputs.

So far, the error $ei\u2208\mathbb{R}$ we have considered has been a scalar. In the general case, the error in Eq. (4) will be a vector $ei\u2208\mathbb{R}nx$. We can train the hyperparameters of an exact GP using the estimated error snapshots $\u025b$ in Eq. (11) and defining the matrix $E$ as the matrix containing the function values $e(zi)$ at the *i*th row. The latent functions are now $ed(\xb7):\mathbb{R}nz\u2192\mathbb{R}$, for $d=1,\u2026,nx$, and an independent exact GP is learned for each component of the error vector $e(\xb7)$ in Eq. (4). Alternatively, one could train a multitask GP regression model [37] to learn similarities in the outputs of the GP or use scalable variational GP regression [25] to decouple the inference cost from the size of the training data.

### 2.3 Dynamic Mode Decomposition With Gaussian Process Correction.

In the hybrid DMDc + GP method, the error term $e(\xb7)$ in Eq. (4) is modeled with GP regression. The training steps are:

One could use either DMDc or tlsDMDc for learning the linear part of the dynamics. However, as it is demonstrated in the numerical experiments, both choices perform similarly, with DMDc + GP having a slight advantage over tlsDMDc + GP. The use of GP regression for correcting the DMDc predictions offers a number of advantages, such as flexibility and uncertainty awareness. In particular, if we evaluate the dynamics away from the training dataset, the DMDc model will take over, while the uncertainty of the GP inference will increase, indicating less confidence in the GP predictions.

## 3 Numerical Experiments

### 3.1 Stuart–Landau Equation.

where *r*(*k*) is the radius and $\theta (k)$ the angle at time-step *k*, $ur(k)$ and $u\theta (k)$ are the control inputs, and $wr(k),w\theta (k)\u223cN(0,\sigma )$ are independent and identically distributed Gaussian noise terms. The parameters in this experiment are *dt* = 0.01, $\mu =0.1$, *β* = 1, and *γ* = 1, while *σ* varies from 0 to 0.1.

#### 3.1.1 Collecting Data.

The state space is encoded as $x(k)=[r(k)\u2009sin(\theta )\u2009cos(\theta )]T\u2208\mathbb{R}3$ in order to capture the periodic behavior of the angle $\theta (k)$. We assume that we have perfect state measurements and the only noise source is the process noise. We collect data for 13,000 time-steps and use the first 10,000 time-steps for training and the rest 3000 for testing. For the training split of the data, the control inputs are periodic with increasing frequency, in order to excite the different modes of the system. For the test part, the inputs are again periodic with increasing, but lower, frequency. The train/test split is shown in Fig. 1. The GP uses a squared exponential kernel and is trained on every fifth data point, in order to keep the inference cost low.

#### 3.1.2 Model Evaluation.

*σ*= 0 (no noise), $\sigma =0.05$ (low noise), and $\sigma =0.1$ (high noise). Evaluation is performed as follows: first, we select 64 uniformly distributed states from the test split as initial conditions; then, we simulate the state dynamics forward for

*N*= 256 time-steps; finally, we compute the average

*L*

_{2}norm of the state prediction error as a percentage of the actual state, i.e.,

The results are shown in Table 1. First, we notice that tlsDMDc tends to perform better than DMDc in all noise levels. Second, we see a big improvement in the prediction errors when a GP is introduced, with DMDcGP performing slightly better than tlsDMDcGP in all noise levels. Third, a GP-only model (without a linear DMDc or tlsDMDc part) has good performance in the presence of noise, with the caveat that when the GP approaches a location of the state and input space that is away from the training data, the predictions collapse to zero, pushing the rest of the predictions off (as seen in the zero-noise case, where the error is significantly large). For demonstration purposes, we also experiment with a more challenging 3000 time-step prediction using tlsDMDc and DMDcGP, as shown in Fig. 2.

Model | σ | DMDc | tlsDMDc | DMDcGP | tlsDMDcGP | GP |
---|---|---|---|---|---|---|

$eL2$% | 0 | 19.2 | 6.6 | 0.3 | 0.3 | 52.6 |

0.05 | 19.5 | 6.5 | 2.5 | 2.6 | 13.5 | |

0.1 | 19.8 | 6.6 | 5.1 | 7.0 | 6.3 |

Model | σ | DMDc | tlsDMDc | DMDcGP | tlsDMDcGP | GP |
---|---|---|---|---|---|---|

$eL2$% | 0 | 19.2 | 6.6 | 0.3 | 0.3 | 52.6 |

0.05 | 19.5 | 6.5 | 2.5 | 2.6 | 13.5 | |

0.1 | 19.8 | 6.6 | 5.1 | 7.0 | 6.3 |

*L*_{2} error for a prediction horizon of *N* = 256, averaged over 100 initial conditions.

Minimum errors per case are indicated with bold.

#### 3.1.3 Model Predictive Control.

where the terms $\Delta Ak,\u2009\Delta Bk$, and $dk$ result from the linearization of the Gaussian process [30] at time-step *k*, the inputs are constrained to be between –0.2 and 0.2, and the initial condition $xk$ at each solution of the optimal control problem is an exact measurement of the state (alternatively, it could be a state estimate derived from Kalman filtering a noisy partial state measurement). The actuation and state costs are chosen to be $R=10\u22123I2\xd72$ and $Q=I3\xd73$, respectively, the receding horizon is *N* = 50, and the optimal control problem is solved at each time-step as a quadratic program with inequality constraints [39].

We run the model predictive controller for 2000 time-steps, with a desired trajectory as shown in Fig. 3. The control task is executed with high noise ($\sigma =0.1$) in order to demonstrate the robustness of the learned DMDcGP model. Using the tlsDMDc model leads to large tracking errors (Fig. 3(a)), compared to DMDcGP where, after a few time-steps, the state *x*(*k*) ends up close to the desired trajectory (Fig. 3(b)).

### 3.2 Near-Wall Jet in a Turbulent Boundary Layer.

Next, we test the proposed tlsDMDc and DMDcGP methods on a more challenging model order reduction task for a high-dimensional system. In particular, we want to model the wall-normal velocity field that a jet in a turbulent boundary layer induces. Such a model is useful for model-based turbulent flow control tasks, where the cost of simulations is prohibitively large (e.g., 36,000 CPU hours to collect the present dataset) for real-time control [40,41]. We perform large eddy simulations of a turbulent boundary layer at a Reynolds number based on the momentum thickness of about $Re\theta =2000$. The near-wall jet is modeled as a body force with Gaussian distribution in space, a $45\u2009deg$ pitch angle toward the wall and in the direction of the flow, and magnitude that is controlled by a scalar control input, $u(k)\u2208[0,1]$. We perform a set of ten different large eddy simulations for pulsed inputs as shown in Fig. 4 and ensemble-average the data collected from these simulations, in order to minimize the effect of the background turbulence on the data and focus on the mean effect that the jet has on the flow. The high-dimensional state is the wall-normal velocity at a grid of size $61\xd716\xd715$ around the force field, yielding a high-dimensional state $y$ of size *n _{y}* = 14,640.

#### 3.2.1 Training.

First, we reduce the dimensionality of the data by projecting the high-dimensional states *y* onto the first *n _{x}* = 5 POD modes, which have been observed to capture the main flow structures in the data. Then, we model the POD mode amplitudes using both tlsDMDc and DMDcGP, computed on the train split shown in Fig. 4.

#### 3.2.2 Prediction.

The task here is to predict the flowfield over the next 250 time-steps for a pulse input that lasts a different amount of time (800 time-steps) than the pulses that were used in the training split (400 and 1200 time-steps). This problem is particularly challenging since the dynamics are transient. If we look at the POD mode amplitudes predicted by the tlsDMDc model in Fig. 5, we notice that although the predicted amplitudes for the dominant modes are closely tracking their corresponding best POD approximation, the less dominant modes are not well approximated. The latter leads to tlsDMDc underestimating both the downwash (flow toward the wall) and the upwash (flow away from the wall) downstream of the domain (Fig. 6(a)). On the contrary, DMDcGP not only approximates the amplitude of these weaker modes better but its predicted flowfield (Fig. 6(b)) closely matches both the closest POD approximation (Fig. 6(c)) as well as the exact (ensemble-averaged) flowfield (Fig. 6(d)). The mean state prediction errors (in the *L*_{2}-norm sense) are given in Table 2.

## 4 Conclusion

We presented an extension of the noise-aware total least-squares dynamic mode decomposition to systems with control inputs and a hybrid approach combining dynamic mode decomposition with control and Gaussian process regression for learning reduced-order models for high-dimensional stochastic nonlinear systems. Both approaches were shown to yield improved results in prediction and control tasks over existing methods. Future work will leverage these hybrid models in flow control applications, such as in Refs. [40] and [41].

## Acknowledgment

The authors would like to acknowledge support by the National Science Foundation Award Nos. 2129494 and 2052811 and the Texas Advanced Computing Center. Alexandros Tsolovikos acknowledges support by the A. Onassis Foundation Scholarship.

## Funding Data

Division of Chemical, Bioengineering, Environmental, and Transport Systems (Award No. 2129494; Funder ID: 10.13039/100000146).

Division of Civil, Mechanical and Manufacturing Innovation (Award No. 2052811; Funder ID: 10.13039/100000147).

## References

**2**(2), pp.