Abstract

This paper addresses the problem of algorithmic prediction of protein folding pathways, namely, the transient three-dimensional conformations of protein molecules during folding, under constrained rates of entropy change. We formulate the physics-based prediction of folding pathways as a control synthesis problem, where the control inputs guide the protein folding simulations. These folding control inputs are obtained from large-scale trust-region subproblems (TRS) utilizing a computationally efficient algorithm with no need for outer iterations. The proposed control synthesis approach, which leverages the solutions obtained from a special generalized eigenvalue problem, avoids potentially cumbersome and unpredictable iterative computations at each protein conformation. Moreover, the TRS-based control inputs align the closed-loop dynamics closely with the kinetostatic compliance method (KCM) reference vector field while satisfying ellipsoidal constraints on the folding control inputs. Finally, we provide conditions for existence and uniqueness of the resulting closed-loop solutions, which are the protein folding pathways under constraints on the rate of entropy change. Numerical simulations utilizing the KCM approach on protein backbones confirm the effectiveness of the proposed framework.

1 Introduction

The protein folding problem can be understood in two key parts [1,2], which build upon Anfinsen's seminal work [3] and Levinthal's proposition [4]. In the former, the aim is to predict a protein's three-dimensional (3D) structure solely from its amino acid sequence. In the latter, the objective is to compute transient conformations of proteins during folding (see Fig. 1).

Fig. 1
The protein folding problem is concerned with finding the transient and final folded conformations of protein molecules from their denatured unfolded state. Proteins adopt pathways to avoid high entropy loss routes during folding.
Fig. 1
The protein folding problem is concerned with finding the transient and final folded conformations of protein molecules from their denatured unfolded state. Proteins adopt pathways to avoid high entropy loss routes during folding.
Close modal

Understanding protein folding as a guided process involving specific pathways or mechanisms stems from Levinthal's paradox [4]. According to this paradox, protein folding through a random search of every possible protein conformation is implausible. This is because the number of potential protein configurations is astronomically vast and a random search would lead to an incredibly long folding time (Levinthal's time). Despite the inherent challenge of Levinthal's paradox, incorporating a minimal energy bias against energetically unfavorable protein conformations can significantly reduce the folding time scale to biologically relevant regimes [5]. Furthermore, the difference in entropy between denatured and folded proteins contributes to the complexity of folding. In particular, denatured proteins exhibit a markedly higher conformational entropy compared to their folded native state. Moreover, two-state proteins preferentially adopt pathways that minimize entropy loss during folding [6].

1.1 Importance of Protein Folding Simulations.

The limited temporal resolution of protein structural data presents a significant challenge to experimentally elucidating protein folding pathways [7]. Consequently, algorithmic prediction of protein folding pathways has emerged as a powerful complementary approach, finding applications in computer-aided drug discovery [8] and studies of protein misfolding diseases [9]. Furthermore, folding simulations hold promise for engineering applications, enabling quantitative prediction of mobility and range of motion in protein-based nanomachines [10,11].

1.2 Knowledge-Based Versus Physics-Based Approaches.

One category of widely used algorithms for predicting the final folds of proteins is provided by what is known as knowledge-based approaches. These algorithms, which are based on pattern recognition and machine learning, infer the 3D structures of folded protein shapes by harnessing the predictive power of the amino acid sequences (akin to Anfinsen's dogma) and leveraging extensive datasets of existing folds [12]. For instance, AlphaFold's core component is a neural network trained on the protein data bank (PDB) structures. This network predicts probability distributions of inter-residue atom distances and leverages the patterns derived from the vast protein sequence landscape through the integration of sequence databases and multiple sequence alignments [13]. However, knowledge-based approaches cannot describe the interactions between proteins and nucleic acids. Furthermore, they fail to take into account the stability of the protein molecule and the rates at which the folding process occurs, i.e., the folding kinetics (see, e.g., Ref. [14]).

In contrast to knowledge-based strategies, physics-based methods utilize basic physical principles to simulate the process of folding while computing the transient and final 3D structures of protein molecules [15]. While physics-based strategies supply trustworthy data concerning the temporary structures in the folding process, these simulations require significant processing time and specialized hardware (e.g., Anton family of supercomputers for atomic-level folding simulations [16]).

1.3 The Kinetostatic Compliance Method Framework.

To overcome the computational burden of physics-based protein folding algorithms, the innovative kinetostatic compliance method (KCM) framework has been developed in the literature [1720]. KCM models protein molecules as a multitude of rigid nanolinkages forming a nanokinematic mechanism subject to movement limitations imposed by chemical bonds. To address the significant computational expense associated with all-atom molecular dynamics, KCM iteratively refines protein structures by varying their dihedral angles in a kinetostatic manner. These angles, which define the relative orientations of bonded atoms, are varied under the influence of nonlinear interatomic force fields through an iterative process driving the protein molecule conformation toward a minimized energy state corresponding to the final folded structure.

Kinetostatic compliance method has been a valuable tool for understanding hydrogen bond formation and its influence on protein mobility since its introduction [21]. Moreover, KCM has facilitated the design of peptide-based nanomachines [11]. However, a critical knowledge gap exists: the conventional KCM does not consider how it perturbs the rate of entropy change in kinetostatic protein folding.

1.4 Control Theory and Protein Folding.

The idea of constraining protein folding dynamics using a guiding control input finds its roots in targeted molecular dynamics (TMD) [22,23]. In TMD, a proportional control input regulates the distance between a protein molecule and its target conformation, aiming to expedite the transition to the final folded state.

One of the few attempts to constrain the folding simulations by synthesizing proper guiding control inputs is due to Arkun and collaborators [24,25]. They utilized a bead-spring model of the protein molecule, assuming a linear time-invariant system. They subsequently employed a linear quadratic regulator (LQR) control scheme to guide the folding process. Their controller aimed to navigate away from high-energy regions of the folding landscape while simultaneously regulating the rate of molecule entropy change. However, the simplistic force fields used by Refs. [24] and [25] cannot capture the nonlinear and complicated dynamics of proteins under the electrostatic and van der Waals interatomic effects.

To address the limitations of the protein bead-spring LQR-based folding control framework due to Arkun and collaborators [24,25], we recently proposed a control-theoretic interpretation of KCM [26]. We demonstrated that constraints on the rate of change of entropy can be incorporated into the KCM folding dynamics through control inputs that steer the folding simulation. These control inputs are obtained by solving a large-scale quadratic program (QP) with constraints on a physically relevant norm of the control input. While our initial work utilized interior point algorithms to solve box-constrained QPs for computing the folding control inputs, it encountered a significant limitation. Specifically, it relied on polytopic inequalities, as noted in Ref. [26], which are compatible with rapid numerical solvers (see, e.g., Ref. [27]). This approach contrasts with more realistic quadratic constraints on the inputs. Notably, standard methods in biochemistry, which represent protein configurational entropy, employ quadratic representations based on ellipsoidal norms [28].

1.5 Contributions of the Paper.

This paper makes two key contributions to the KCM-based protein folding framework. First, we address a significant gap in KCM-based protein folding literature. While our previous work constrained the rates of entropy change in the KCM paradigm for the first time [26], it suffers the shortcoming of encoding these constraints via polytopic inequalities with an inevitable loss of accuracy in predicting the folding pathways. To overcome this challenge, we propose a novel approach that employs a computationally efficient trust-region subproblem (TRS) algorithm for constraining the rates of entropy change via ellipsoidal inequalities (Algorithm 1). Moreover, this algorithm offers a significant advantage by eliminating the need for outer iterations, thereby enhancing computational efficiency. This computationally efficient control synthesis algorithm, which is taken from the TRS optimization literature [29], is based on solving a generalized eigenvalue problem (GEP). This approach ensures that the TRS-based folding control inputs induce a closed-loop dynamics faithful to the KCM reference vector field while satisfying the ellipsoidal constraints on the folding control input.

Our second contribution focuses on leveraging the large-scale GEP-based TRS solver developed by Adachi et al. [29] to generate optimization-based control inputs. While Adachi et al. primarily targeted the optimization literature, where solution continuity was not a primary concern [29], control systems considerations necessitate investigating continuity properties of the generated control inputs (see, e.g., Ref. [30] for humanoid robots). Remarkably, the application of Adachi et al.'s solver and its continuity properties within the context of nonlinear optimization-based control remains unexplored. We address this gap by investigating the piecewise continuity of pointwise optimal control laws obtained from the GEP-based TRS solver. We also establish conditions for existence and uniqueness of closed-loop trajectories under these control inputs (Propositions 5.2 and 5.4). Other contributions of this paper include the novel derivation of KCM protein folding dynamics from a Lagrange–Rayleigh formalism (Sec. 3.1) and the first formal proofs of convergence for the KCM folding scheme to folded conformations in the literature (Sec. 3.2).

This paper significantly expands upon a preliminary version presented as a conference poster [31]. In this paper, we delve deeper into the control synthesis algorithm by providing a detailed description, discussing its computational complexity, and exploring its numerical properties. Additionally, we present a comprehensive analysis (absent from the conference poster) investigating the piecewise continuity of control inputs and establishing conditions for existence and uniqueness of closed-loop protein folding trajectories. Finally, the algorithm effectiveness is rigorously assessed through more extensive numerical simulations.

The remainder of this paper is structured as follows. First, in Sec. 2, we briefly present the nanokinematic structure of protein molecules and the main underpinnings of the KCM framework for simulating the kinetostatic folding of proteins. We further demonstrate how the KCM-based approach to protein folding can be formulated as a control synthesis problem. Thereafter, we formulate the main control problem in Sec. 3 and present a pointwise optimal control scheme for solving it in Sec. 4. We subsequently present conditions for existence and uniqueness of solutions to the resulting closed-loop protein folding dynamics in Sec. 5. After presenting the simulation results in Sec. 6, we conclude the paper with final remarks, the impact of the current work beyond protein folding, and future directions in Sec. 7.

1.6 Notation.

Throughout this paper, R+ and Z0+ denote the set of all non-negative real numbers and non-negative integers, respectively. The set S1 denotes the unit circle. The set SO(3) denotes the special orthogonal group of order three. The matrix InRn×n denotes the identity matrix. Given the vector xRn and symmetric positive definite matrix YRn×n, |x|2:=xx denotes the vector Euclidean norm, |x|:=maxi|xi| denotes the vector -norm, and |x|Y:=xYx denotes the vector ellipsoidal norm. Given the integer n, In denotes the identity matrix of size n. Hence, |x|In=|x|2. Given a matrix X, R(X) and N(X) denote the range and null space of the matrix, respectively. Additionally, X0 (X0) indicate that X is a positive definite (resp., semidefinite) matrix. Given a set SRn, co(S) (co¯(S)) denote the convex hull (respectively, convex closure) of the set. Given a smooth map h(·) from an open subset of Rm to an open subset of Rn, dhx denotes the Jacobian of h(·) at x. If h(·) is a real-valued function, dhx=xh(x), where xh(x) denotes the gradient of h(·) at x. Given a vector field Θ:RnRn, FI[Θ](·) is the Filippov set-valued map associated with the vector field.

2 Background

In this section, we provide a summary of the KCM framework for modeling the kinetostatic folding of proteins in vacuo.

2.1 Nano-Linkage-Based Kinematic Model of Protein Molecules.

Proteins are macromolecules with complex structures and intricate dynamics governing their folding process. These long molecular chains consist of peptide planes that are interconnected by chemical bonds. The functionality of protein molecules depends critically on their 3D structure (i.e., their conformation), which can be directly deduced from the linear sequence of amino acids forming their polypeptide chain.

A protein backbone chain schematic is depicted in Fig. 2, where each individual peptide plane is formed by six coplanar atoms that are covalently bonded (visualized as red lines in Fig. 2). The coplanarity assumption about the α-carbon Cα, carbonyl CO, amide nitrogen NH, and another α-carbon is motivated by the wealth of high-resolution X-ray crystallographic data [2]. This fundamental assumption underpins the representation of protein molecules as intricate nanomechanisms with numerous degrees-of-freedom (DOFs) (see, e.g., Refs. [18], [26], and [3234]). Accordingly, the key building blocks of the protein kinematic chain are the peptide planes that function as nanoscale rigid linkages.

Fig. 2
Modeling protein backbone chain as a nanokinematic mechanism where peptide planes act as rigid links that are interconnected by revolute joints centered at the alpha carbon atoms
Fig. 2
Modeling protein backbone chain as a nanokinematic mechanism where peptide planes act as rigid links that are interconnected by revolute joints centered at the alpha carbon atoms
Close modal

2.1.1 The Protein Configuration Vector.

The rotation of the nanolinkages in the protein kinematic chain with respect to each other is facilitated by the alpha-Carbon atoms (Fig. 2), which play the role of hinges connecting peptide planes together. The first and the last Cα atoms are bonded to an N-terminus–peptide plane and a C-terminus–peptide plane, respectively.

The protein molecule nanokinematics comprising N1 peptide planes is determined by a set of bond lengths and dihedral angle pairs. Considering the ith peptide plane, 1iN1, each pair of these dihedral angles represent rotations about the covalent bonds NCα, namely, θ2i+1, and CαC, namely, θ2i+2 (Fig. 2). Additionally, the vector
(1)

Represents the configuration of the backbone chain. The dihedral angle vector θ resides within the 2N-dimensional configuration space Q=S1××S1, where each individual unit circle S1 corresponds to a single dihedral angle within the backbone chain.

Each dihedral angle in the conformation vector θ in Eq. (1) corresponds to a single DOF in the protein molecule's kinematic chain. For each DOF, we can define a unit vector, uj, 1j2N. These vectors align with the axes of rotation around which the nanokinematic chain can rotate. As illustrated in Fig. 2, the unit vectors u2i and u2i+1 represent the directions along two crucial bonds within the ith peptide plane: the CαC bond and the NCα bond. These bonds define the primary rotation axes within the chain's segments. Serving as key structural references, u1 and u2N denote the unit vectors associated with the amino terminus (N-terminus) and carboxy terminus (C-terminus), respectively. The reference conformations are established by setting all dihedral angles to zero, i.e., θi=0 for 1i2N. This configuration results in the protein adopting a linear coil conformation, representing a denatured state. In this state, the protein is stretched or completely unfolded, lacking the specific three-dimensional structure that is characteristic of its functional, native form. Such a state is typically induced by various physical or chemical means, including heat, changes in pH, Chaotropic agents like urea, or mechanical stretching.

While dihedral angles define the rotational DOFs within a protein's kinematic chain, additional information is necessary to fully capture the spatial orientation of the rigid peptide nanolinkages. This information comes in the form of body vectors, denoted by bj, 1j2N. These vectors comprehensively define the relative spatial arrangement of coplanar atoms within each peptide plane. Consequently, any relative positional relationship between two atoms within a plane can be mathematically expressed as a linear combination of their corresponding body vectors, i.e., k1mb2i+k2mb2i+1, where the coefficients k1m and k2m, 1m4, are constants specific to the atom pair and are identical throughout the peptide chain (see, e.g., Refs. [18] and [19] for more details). The body vector b2i is oriented along the line connecting the alpha Carbon atom to the Nitrogen atom on the ith peptide plane, while b2i+1 aligns with the chemical bond linking the Nitrogen atom of the ith peptide plane to the alpha Carbon of the following peptide plane. Due to their distinct directional alignments within the peptide plane structure, b2i and b2i+1 are inherently linearly independent (Fig. 2). The body vectors bj and unit vectors uj act in concert to provide a comprehensive depiction of the protein molecule's conformation as it dynamically varies with respect to the vector θ, which encompasses the peptide dihedral angles.

2.1.2 The Protein Forward Kinematics.

The intricate relationship between the dihedral angle vector, θ, and the protein's kinematic structure can be captured through a framework of rotational matrix transformations [18]. This formalism provides a comprehensive and computationally efficient method for describing protein conformation as a function of its internal torsional DOFs. We start with a right-handed coordinate system originating at the fixed N atom at the amino-terminus. The body vectors and unit vectors are derived from the zero-position (ZP) conformation, which is defined as the conformation where all peptide planes are coplanar in the polypeptide serial linkage. The reference conformation, which is given by θ=0, with reference unit vectors uj0 and reference body vectors bj0, 1j2N, serves as the starting point for
(2)
which compute the mapping from dihedral angle space to protein kinematic configuration. Each of the transformation matrices X(θ,u10,,uj0), 1j2N, in Eq. (2) is defined using the sequential matrix multiplication
(3)
where each of the rotation matrices R(θr,ur0)SO(3) describes the rotation about the vector ur0 with angle θr. After computing the body vectors bj(θ) according to Eq. (2) and anchoring the N-terminal atom at the origin, the position vectors of the nitrogen and Cα atoms are then computed relative to the N atom of the amino-terminus using
(4)

2.2 Kinetostatic Compliance Method.

Kinetostatic compliance method framework leverages the established experimental observation that it is possible to capture the essence of protein folding dynamics while neglecting inertial forces (see, e.g., Refs. [24], [25], and [35]). According to KCM, protein dihedral angles evolve kinetostatically under the influence of interatomic force fields.

2.2.1 Interatomic Force Fields Responsible for Kinetostatic Folding.

Consider a peptide chain composed of Na atoms and N1 peptide planes. The dihedral angle vector, denoted by θ defined in Eq. (1), encodes the conformational state of the chain. The Cartesian coordinates of any two atoms ai, aj within the chain are represented by ri(θ), rj(θ), respectively (Eq. (4)). Their Euclidean distance is then calculated as dij(θ):=|ri(θ)rj(θ)|2. The specific parameters pertaining to atom charges, van der Waals radii, interatomic distances, dielectric constant, potential well depths, and force weights are readily available in Ref. [19] and the provided references.

The total free energy responsible for protein folding, denoted by G(θ), decomposes into electrostatic and van der Waals contributions, as expressed in
(5)

where Gelec(θ) and Gvdw(θ) represent the protein's electrostatic potential energy and van der Waals interatomic potential energy, respectively (detailed expressions can be found in Ref. [18]). Consequently, the net forces acting on each atom ai, 1iNa, due to both Coulombic and van der Waals interactions, can be obtained from the negative gradients of the respective individual energy terms: Fielec(θ)=riGelec and Fivdw(θ)=riGvdw. It is remarked that the molecular dynamics formulations begin with Cartesian coordinates of the atoms given by ri for 1iNa, before transitioning to dynamics described in generalized coordinates (see, e.g., Ref. [36]). In the KCM framework, the generalized coordinates are given by the dihedral angles θ1,,θ2N. Hence, ri=ri(θ1,,θ2N) for 1iNa. The free energy of the molecule G(θ1,,θ2N) is indeed a function of θ, and thus, should explicitly relate to the Cartesian coordinates as G(r1(θ1,,θ2N),,rNa(θ1,,θ2N)).

2.2.2 Kinetostatic Folding Torque Vector.

The KCM-based modeling framework necessitates the calculation of the resultant forces and torques acting on each of the N1 peptide planes within the protein molecule. These computed forces and torques are subsequently concatenated into a 6N-dimensional vector F(θ), which serves as the generalized force vector driving the protein folding process. However, for relating the actual changes in the protein's configuration, the vector F(θ) needs to be mapped to a 2N-dimensional torque vector, denoted as τ(θ). The kinetostatic folding torque vector is given by
(6)

where J(θ)R6N×2N represents the molecule chain Jacobian at conformation θ. This matrix translates the 6N-dimensional force vector into the corresponding 2N-dimensional torque vector governing dihedral angle evolution (see Ref. [18] for detailed derivations). Consequently, the changes in dihedral angles at each protein conformation take place under the direct influence of the kinetostatic folding torque vector acting on the peptide backbone. Specifically, the vector τ(θ)R2N, which aligns with the steepest-descent direction of the total free energy G(θ) in the protein conformation landscape, guides the dihedral angle variations during folding.

2.2.3 Folded Protein Conformations.

Within the context of protein folding, at each local minimum θ of the aggregate free energy function G(θ) defined in Eq. (5), the corresponding torque vector τ(θ) vanishes identically. This signifies the protein molecule's kinetostatic stationarity at folded conformations, where the absence of net kinetostatic torque reflects the balanced internal forces within the protein structure.

2.2.4 Successive Kinetostatic Folding Iteration.

Within the KCM framework, dihedral angles evolve kinetostatically under the influence of the kinetostatic folding torque vector resulting from the interatomic force fields. In particular, given an unfolded protein molecule conformation θ0, the following numerical scheme relates joint torques to dihedral angle changes (see, e.g., Refs. [18] and [19])
(7)

where τ(θk)=J(θk)F(θk) due to Eq. (6). The positive constant κ0 is chosen sufficiently small to prevent excessive angle variations and is heuristically optimized.

A flowchart of the successive kinetostatic folding iteration is depicted in Fig. 3. In the dihedral angle update iterations given by (7), the normalized torque vector τ(θk)|τ(θk)| governs the incremental updates to the dihedral angles at each conformation θk (see Ref. [20] for a sign gradient descent-based version of this iteration). The iterative process governed by Eq. (7) continues until the molecule's aggregated free energy G(θ) converges to the vicinity of a local minimum within the free-energy landscape. Convergence is achieved numerically when the kinetostatic folding torque vector norm falls below a predefined tolerance τtol>0, i.e., when |τ(θk)|2<τtol, for some positive integer k. The protein folding pathway transient conformations are given by {θk}k=0k.

Fig. 3
Flowchart of the conventional kinetostatic folding iteration. The variation of dihedral angles in the flowchart (within dashed borders) is governed by Eq. (7).
Fig. 3
Flowchart of the conventional kinetostatic folding iteration. The variation of dihedral angles in the flowchart (within dashed borders) is governed by Eq. (7).
Close modal

2.3 Properties of the Aggregated Free Energy of the Protein Molecule.

In this section we state the smoothness properties of the aggregated energy of the protein molecule, G(θ), in the vicinity of folded conformations (see, e.g., Ref. [37] for further details). In particular, in a vicinity of a folded conformation θ=θ, G(θ) is a smooth function of the dihedral angle vector θ and assumes a local minimum in this folded conformation. Therefore, θ is a critical point of G(·) and θG(θ)=0. Finally, in the so-called protein metastable states including the folded conformations, the Hessian of G(·) is positive definite.

Additionally, the relationship
(8)

links the gradient of the aggregated free energy to the generalized torques acting on dihedral angles, as demonstrated by Noguti and Gō [38]. Equation (8) is applicable to noncyclic molecules including proteins.

3 A Control Systems-Oriented Interpretation of Kinetostatic Compliance Method-Based Folding

In this section, we recast the KCM-based folding numerical scheme presented in Sec. 2.2 as a nonlinear affine control system. Our point of departure is the Lagrange–Rayleigh formalism resulting in an Euler–Lagrange control system. This controls-oriented framing, which first appeared in our earlier work [26,34], draws inspiration from the field of targeted molecular dynamics (TMD) [22,23] and Arkun et al.'s optimal control framework, where they employed linear spring-like forces described by a connectivity matrix [24,25,39]. In contrast to Arkun et al., our KCM-based line of work utilizes a nonlinear dynamical system to capture the dynamics within the molecule's dihedral angle space.

3.1 Deriving the Kinetostatic Compliance Method Dynamics From a Lagrange-Rayleigh Formalism.

We define a Lagrangian according to L(θ,θ˙):=G(θ), where we disregard the kinetic energy of the molecule per the KCM modeling approach. Additionally, we consider the Rayleigh dissipation function D(θ˙):=12θ˙Dθ˙, where D is a symmetric and positive definite matrix. This formulation leads us to the Euler–Lagrange control system subject to dissipative generalized forces (see, e.g., [40])
(9)
These equations can be rewritten as Dθ˙=θG(θ)+uc, where θG(θ)=J(θ)F(θ) due to Eq. (8). Setting D=I2N, we arrive at the control system
(10)

where the Jacobian matrix, J(θ), and the generalized force vector, F(θ), are identical to those defined in Eq. (6). Equation (10) describes the dynamics of the dihedral angles of a protein molecule subject to interatomic forces and a folding control input, uc. In the KCM dynamics, the folding control input, uc, can represent either physical forces, such as those exerted by optical tweezers or solvation effects, or nonphysical fictitious forces designed to enhance the convergence rate toward folded conformations in simulations performed in vacuo. In the case of the latter, which is the subject of this paper, the folding control input functions similarly to the forces used in targeted molecular dynamics.

As it has been demonstrated in Ref. [26], it is possible to retrieve protein folding pathway conformation samples, which are generated by the KCM difference equation in (7), from the nonlinear control-affine system in Eq. (10). In particular, one can consider the configuration-dependent feedback control law uc:θuKCM(θ), where
(11)
and utilize it in Eq. (10) to arrive at the following KCM reference vector field
(12)

Notably, employing a forward Euler scheme with an integration step equal to κ0 recovers the KCM iteration in Eq. (7) from the reference vector field in Eq. (12). In summary, the successive kinetostatic folding iteration in Eq. (7) is related to the reference vector field given by Eq. (12). This vector field is induced by applying the folding control input (11) to the KCM-based nonlinear control affine system in Eq. (10). Furthermore, the trajectories of the vector field in Eq. (12) provide the protein folding pathways.

Remark 1. While the controls-oriented perspective in Ref. [26] was not explicitly addressed in the original KCM-based protein folding works (e.g., Refs. [18] and [19]), it offers a valuable foundation for extending the KCM framework. As discussed later, this control-theoretic interpretation allows us to constrain the molecule entropy changes introduced by the folding control input, leading to more realistic pathway predictions.

Remark 2. The KCM reference vector field described by Eq. (12), originally developed through the pioneering efforts of Kazerounian, Ilieş, and their collaborators (e.g., Refs. [18] and [41]), has been widely utilized in the KC esponds to a veM literature. This vector field has demonstrated remarkable success in numerical protein folding simulations and has been effectively utilized in significant engineering applications, such as design of peptide-based nanomechanisms [42]. Unfortunately, its use has lacked formal theoretical guarantees and proofs of convergence to folded conformations. In this paper, we address this shortcoming by considering a very similar reference vector field (see Sec. 3.2) for which convergence and smoothness can be easily established. Our simulations will include comparisons of both reference vector fields.

3.2 Convergence of the Kinetostatic Compliance Method Reference Vector Field.

In this section, we consider the reference vector field
(13)
instead of the original reference vector field ΘKCM(θ) in Eq. (12) (Remark 2). To obtain the revised KCM reference vector field (13) from the KCM dynamics (10), we can choose the folding control input uc according to
(14)

In an open neighborhood of a specific folded conformation—excluding the conformation itself—the new KCM vector field given by Eq. (13) is smooth, thanks to the smoothness properties of the aggregated free energy of protein molecules (Sec. 3.2). The gradient flow induced by the reference vector field (13) has well-defined absolutely continuous solutions. Furthermore, using the results in Cortés' work [43, Proposition 7 and Theorem 8], it can be shown that the KCM flow θ˙=ΘKCM2(θ) with initial conformations in a neighborhood of a given folded conformation is guaranteed to converge to it in finite time due to positive definiteness of the Hessian of the aggregated free energy of the molecule (Sec. 3.2).

Figures 4 and 5 compare the folding pathways of a protein backbone chain with N=5 peptide planes, starting near a β-pleated sheet conformation (with simulation numerical parameters detailed in the β-pleated sheet simulations in Sec. 6), under two different reference vector fields. The KCM reference vector fields ΘKCM(θ) in Eq. (12) and ΘKCM2(θ) in Eq. (13) are derived from the control inputs uKCM(θ) and uKCM2(θ), as defined in Eqs. (11) and (14), respectively, within the KCM dynamics (10). As illustrated in the figures, although the original reference vector field commonly used in KCM folding studies converges more quickly, it does not possess the continuity properties of the modified vector field. Indeed, the conventional KCM control input starts oscillating after around 6000 iterations. Therefore, we continue our theoretical derivations using the modified reference vector field as outlined in Remark 2 and present the numerical simulation results for both vector fields in the sequel.

Fig. 4
The protein backbone chain free energy during folding under the conventional and the modified KCM reference vector fields given by Eqs. (12) and (13), respectively
Fig. 4
The protein backbone chain free energy during folding under the conventional and the modified KCM reference vector fields given by Eqs. (12) and (13), respectively
Close modal
Fig. 5
The norm of the folding control inputs under the conventional and the modified KCM control inputs (11) and (14), respectively
Fig. 5
The norm of the folding control inputs under the conventional and the modified KCM control inputs (11) and (14), respectively
Close modal

3.3 Encoding Entropy Loss Constraints in Kinetostatic Compliance Method Protein Folding.

In this section we consider the controls-oriented protein folding KCM dynamics and investigate how excessive entropy changes due to the folding control input can be constrained.

We first derive an expression for the change of the molecule entropy during folding. This derivation is motivated by a key observation in the field of structural biochemistry, which states that excessively rapid changes of entropy should be avoided during the guided folding process [6,24,39]. Multiple established methodologies exist to calculate the entropy changes experienced by a protein molecule during folding. These methodologies can be broadly classified into two categories: those based on configurational entropy computations (e.g., Ref. [44]) and those rooted in thermodynamics-derived operational definitions (e.g., Refs. [45] and [46]). In this work, we will employ the Clausius thermodynamic expression outlined in Refs. [45] and [46]. This approach shares similarities with the derivations presented by Arkun et al. [24], as both methods leverage the principles of classical thermodynamics.

The molecule entropy change according to Clausius equation can be expressed as
(15)
where T0 is the temperature and G(θ) is the aggregated free energy of the molecule given by Eq. (5). Hence
(16)
where we have expressed the rate of change of dihedral angle vector using the dynamics of the nonlinear affine control system in Eq. (10). Using Eq. (8), we can rewrite (16) as
(17)
Invoking the Cauchy-Schwartz inequality, we see
(18)
Since |θG(θ)|2·|uc|2|θG(θ)|22+|uc|22, we have
(19)
The above inequality indicates that to ensure boundedness of the rate of change of entropy, |S˙|, it is necessary to constrain the magnitude of the folding control input |uc|2. Due to the topological equivalence of vector norms, it follows that the folding control input should satisfy
(20)

where for some symmetric positive definite matrix B and constant Ω0. When B=I2N, we arrive at |uc|2Ω0.

As it can be seen from Eq. (20), we establish an ellipsoidal constraint on the control input uc. This constraint directly limits the total rate of entropy change S˙, rather than just the ‘extra’ or ‘deviating’ component θG(θ)uc. This approach is justified by Eq. (16), where the ‘natural’ torque term J(θ)F(θ) is influenced by the KCM free-energy gradients. By bounding |uc|B, we ensure that S˙ remains within a permissible range. Consequently, controlling the magnitude of uc not only restricts θG(θ)uc but also imposes a direct limit on S˙.

Remark 3. As demonstrated in Eq. (17), the gradient of the aggregated free energy, θG(θ), is crucial for determining the physics-based rate of entropy change. Additionally, the control input uc is employed to expedite convergence to folded conformations and to model effects such as solvation or optical tweezer forces, though these are not the focus of this study. The magnitude of this control input can artificially alter the rate of entropy change. To mitigate this, our folding control input design aims to closely align the reference vector field with ΘKCM2(·) in Eq. (13), induced by the control input uKCM2(θ) in Eq. (14), while adhering to norm constraints to prevent significant deviations in the natural rate of entropy change.

We are now ready to state the main control problem addressed in this paper.

3.3.1 Problem Statement (Protein Folding Control Problem).

Consider the protein folding nonlinear control affine system given by Eq. (10). Furthermore, consider the KCM reference vector field ΘKCM2(·) in Eq. (13) induced by the folding control input in Eq. (14). The Protein Folding Control Problem Subject to Entropy Loss Constraints (PFCP) is concerned with two objectives: (1) Synthesize a folding control input uc(·) such that the resulting closed-loop dynamics is closely aligned with the KCM reference vector field while adhering to the ellipsoidal constraint in Eq. (20); and (2) Find conditions for existence and uniqueness of the trajectories of the closed-loopy dynamics, which are the folding pathways subject to the constraint given by Eq. (20).

4 Solution to the Protein Folding Control Problem

To achieve the first objective of the PFCP formulated in Sec. 3.3, we first utilize a so-called optimal decision strategy (ODS) to derive a pointwise optimal control law (see, e.g., Refs. [47] and [48] for the pioneering work on ODS), which happens to be the solution to a TRS. We then present an efficient numerical algorithm for computing this optimal control law at each protein conformation.

4.1 The Optimal Decision Strategy Framework for Protein Folding Control Problem.

The proposed control scheme in this section for solving the PFCP leverages an ODS framework to achieve pointwise optimality.

We first introduce the concept of an ODS-based control scheme for KCM-driven protein folding. Let θ(t,t0,θ0,uc(t)), or θ(t) for brevity, represent the solution to Eq. (10) starting from the conformation θ0 at time t0 under the input function tuc(t). For each solution θ(t), we define the set of free vectors
(21)

The free vectors in S(θ(t)) give rise to the set of admissible dihedral angle folding rate vectors, denoted by St(θ(t)). Specifically, every element within St(θ(t)) corresponds to a vector in S(θ(t)) after its starting point is translated by θ(t). Hence, under any input uc(t) satisfying the constraints, the feasible tangent vectors along the folding trajectory reside within St(θ(t)) (Fig. 6).

Fig. 6
ODS and KCM-based protein folding. The vector field ΘODS(·) is the “nearest” to ΘKCM2(·) while the folding control inputs satisfy the ellipsoidal constraint (20)
Fig. 6
ODS and KCM-based protein folding. The vector field ΘODS(·) is the “nearest” to ΘKCM2(·) while the folding control inputs satisfy the ellipsoidal constraint (20)
Close modal
We now design the ODS-based control law uc(t) such that, at any given time t, the instantaneous closed-loop vector field exhibits the closest proximity to the KCM-based reference vector field ΘKCM2(θ(t)) given by Eq. (13). Considering the norm induced by a positive definite matrix QR2N×2N for capturing proximity of vector fields, we arrive at the following optimization problem
(22)
which can be shown to be equivalent to the TRS
(23)
where
(24)

The following lemma provides the necessary and sufficient conditions for existence of solutions to Eq. (23).

Lemma 4.1 [29, Theorem 1.1]. A vectoruc(θ)is an optimal solution to the TRS in (23) if and only if there existsλ(θ)0such that
(25a)
(25b)
(25c)
(25d)

The pair(λ(θ),uc(θ))satisfying the conditions of Proposition 4.1 is called the TRS solution atθ. We provide an efficient GEP-based algorithm for computing this solution in Sec. 4.2.

Remark 4. In the limit where the constraints in Eq. (23) become sufficiently relaxed, or equivalently, as Ω0, the generated control inputs converge to the conventional KCM control torque vector uKCM2(·) given by Eq. (14).

4.2 Efficient Computation of Trust-Region Subproblems Solutions Using a Generalized Eigenvalue Problem.

To solve the TRS in Eq. (23) for folding control input synthesis, we employ the approach due to Adachi et al. [29], which relies on constructing a generalized eigenvalue problem (GEP).

We start by constructing the matrix pencil
(26)
associated with the TRS in Eq. (23). We further consider the matrix pencil
(27)

and denote its largest eigenvalue by λmax.

Algorithm 1 taken from Ref. [29] solves the TRS in Eq. (23). The algorithm can be explained as follows. The solutions to the TRS in Eq. (23) at each protein conformation θ can be separated into two types: (i) interior solutions with |uc(θ)|B<Ω0; and, (ii) boundary solutions with |uc(θ)|B=Ω0.

If we have an interior solution at a protein conformation θ, then uc(θ) can be obtained by solving the linear system Quc(θ)=G(θ) because Q0. Accordingly, given a protein molecule conformation θ, we first compute the solution candidate p=Q1G(θ). If it satisfies |p|B<Ω0, then it is indeed the solution to the TRS in Eq. (23) at conformation θ. Additionally, due to Eq. (25c) in Lemma 4.1, λ(θ)=0.

On the other hand, if |p|BΩ0, then we are dealing with a boundary case at θ. In this case, one needs to solve for the rightmost eigenvalue of the matrix pencil M˜θ(λ), i.e., λ(θ), and its associated eigenvector.

Algorithm 1
if|Q1G(θ)|B<Ω0then     Interior solution
uc(θ)Q1G(θ)
λ(θ)0
else          Boundary solution computes the rightmost eigenvalue λ of the matrix pencil M˜θ(λ) in Eq. (26) and an eigenvector [y1y2] such that M˜θ(0)[y1y2]=λ[0BB0][y1y2].
uc(θ)sign(G(θ)y2)Ω0y1|y1|B
λ(θ)λ
endif
if|Q1G(θ)|B<Ω0then     Interior solution
uc(θ)Q1G(θ)
λ(θ)0
else          Boundary solution computes the rightmost eigenvalue λ of the matrix pencil M˜θ(λ) in Eq. (26) and an eigenvector [y1y2] such that M˜θ(0)[y1y2]=λ[0BB0][y1y2].
uc(θ)sign(G(θ)y2)Ω0y1|y1|B
λ(θ)λ
endif

Remark 5 (Complexity of Algorithm 1). The computational cost of Algorithm 1 for computing the interior solutions is that of solving a linear system (e.g., the standard method of using the LU decomposition of Q), which is O(N3) in the worst case [49]. As demonstrated by Ref. [29], the computational complexity of the algorithm for boundary solution calculations scales as O(N3) floating-point operations (flops) in the case of dense matrices Q and B. However, for large and sparse matrices Q and B, the cost becomes effectively equivalent to that of an Arnoldi-type method for computing a single rightmost eigenpair. This latter cost is typically proportional to a constant factor multiplied by the cost of a single matrix–vector product.

Finally, we summarize some of the key findings from Ref. [29] regarding the matrix pencils (26) and (27) in the next lemma. We will leverage these results to prove the important properties of our proposed TRS-based control law in Sec. 5.

Lemma 4.2 [29]. Consider a givenθQand assume that|Q1G(θ)|BΩ0. Consider the matrix pencilsM˜θ(λ)andP(λ)in Eqs. (26) and (27), respectively. The following properties hold.

  1. M˜θ(λ)is a regular matrix pencil, namely, the determinant ofM˜θ(·)is non-zero for someλand the number of its eigenvalues is equal to its size.

  2. The eigenvalues ofM˜θ(λ)are the roots of the rational equation
    (28)
    wherefi(·)are smooth functions ofθ, andμi, 1i2N, are the eigenvalues of the matrix pencilP(λ). Furthermore,hθ(λ)is strictly decreasing on(μ2N,), whereμ2N=λmaxis the largest eigenvalue of the matrix pencilP(λ).
  3. The largest real eigenvalue ofM˜θ(λ)isλ(θ), where(λ(θ),uc(θ))is the solution pair to the TRS in Eq. (23).

  4. λ(θ)>λmax. Additionally,|y1|B0andG(θ)y20inAlgorithm 1.

5 Existence and Uniqueness of Protein Folding Trajectories

In this section, we address the second objective of PFCP formulated in Sec. 3.3. Specifically, we investigate the properties of the TRS-based folding control input and provide sufficient conditions for existence and uniqueness of the folding trajectories under this input.

We start by defining the entropy constraint surface at each protein conformation θ according to
(29)
and the following two disjoint, open, and connected sets:
(30a)
(30b)

To facilitate subsequent analysis within this section, we leverage the closed-form solution provided by the following proposition. While Algorithm 1 offers a numerically stable scheme for computing the TRS-based pointwise optimal control law in Eq. (23), the closed-form solution is more suitable for further analysis.

Proposition 5.1. The control inputuc(θ), which is the solution to TRS (23), has the closed-form
(31)

Proof. The case for θEθ, i.e., the interior solution, is trivial. For θEθ+Eθ, Lemma 4.2 (Property 4) implies that λ(θ) is not an eigenvalue of the matrix pencil in (27). Therefore, P(λ(θ))=Q+λ(θ)B is invertible. Consequently, multiplying both sides of Eq. (25b) by P1(λ(θ)).▪

We have the following proposition regarding the continuity of the TRS-based pointwise optimal control law given by Eq. (31) on Eθ+ and Eθ.

Proposition 5.2. The control input given by Eq. (31) is continuous on the open setEθ+Eθfor all protein conformationsθQ. Moreover, this control input admits a continuous extension to the closureE¯θ=EθEθ.

Proof. We prove continuity of uc(θ) on each of the sets Eθ and Eθ+, respectively. Proposition 5.1 implies that uc(θ)=Q1G(θ) on the open set Eθ. Hence, the control input is continuous on Eθ for all θ, where G(θ) is a smooth function. On Eθ+, which is given by (30b), it suffices to prove that λ(θ) is a continuous function of θ. For each θ, Lemma 4.2 (Property 3) states that λ(θ) is the largest real eigenvalue of the regular matrix pencil M˜θ(λ), which possesses 4N eigenvalues. Lemma 4.2 (Property 2) indicates that these eigenvalues are the roots of the rational equation hθ(λ)=0 given by Eq. (28). Also, due to Lemma 4.2 (Property 3), the function hθ(·) is strictly decreasing beyond λmax. Since λ(θ)>λmax according to Lemma 4.2 (Property 4), λ(θ) is an isolated root of hθ(λ). Since hθ(λ) depends smoothly on θ, we conclude that the isolated root λ(θ) is a continuous function of θ. Finally, the existence of continuous extension of uc(θ) to E¯θ is trivial from Eq. (31).▪

The TRS-based pointwise optimal control law in Eq. (31) can be applied to Eq. (10) for inducing
(32)

which, due to Proposition 5.2, is a piecewise smooth vector field with potential discontinuities on the surface Eθ. In particular, whenever λ(θ)0 for some θEθ, the control input continuity will be lost.

To investigate the existence and uniqueness of protein folding trajectories induced by the piecewise continuous vector field (32), we need to invoke the notion of Filippov solutions [50]. In particular, we need to compute the Filippov set-valued map of the vector field in (32) and investigate the associated differential inclusion solutions.

Lemma 5.3. The Filippov set-valued map associated with the vector field in (32), which is induced by the TRS-based folding control input in Eq. (31), is
(33)

Proof. For θEθ, Proposition 5.2 implies that ΘODS is continuous. At each of these points, we have FI[ΘODS](θ)={ΘODS(θ)} due to the consistency property of Filippov set-valued maps (see, e.g., Ref. [50]). At points of discontinuity of ΘODS, that is, for θEθ, FI[ΘODS](θ) is the convex hull co{J(θ)F(θ)Q1G(θ),J(θ)F(θ)[Q+λ(θ)B]1G(θ)}. This convex hull is equal to J(θ)F(θ)+co{Q1G(θ),[Q+λ(θ)B]1G(θ)}.▪

The folding pathway trajectory is a Filippov solution to the differential inclusion
(34)

where FI[ΘODS](·) is given by Lemma 5.3. If this solution exists, then it is an absolutely continuous trajectory θ(·) satisfying (34) for almost all time instances in its maximal interval of existence.

The following proposition provides sufficient conditions for existence and uniqueness of protein folding pathway trajectories.

Proposition 5.4. Consider the entropy constraint surfaceEθin Eq. (29) and the differential inclusion given by Eq. (34). If the transversality condition
(35)

holds onEθ, whereΓ:=Q1BQ1andvθ:=J(θ)F(θ)[Q+λ(θ)B]1G(θ), then there exists a unique Filippov solution of (34) starting from each initial condition.

Proof. Consider the surface Eθ and define f(θ):=G(θ)ΓG(θ)Ω02. Accordingly,
(36)

Hence, the condition in Eq. (35) is equivalent to dfθvθ<0 on Eθ. This implies that vθ points into Eθ. According to Ref. [50, Proposition 5], we conclude that there exists a unique Filippov solution of Eq. (34) starting from each initial condition.▪

6 Simulation Results

This section presents numerical simulations to assess the efficacy of our proposed TRS-based pointwise optimal control law for kinetostatic folding. The simulations are carried out by considering the KCM-based nonlinear control affine system in Eq. (10) with two different control inputs. The first folding control input uKCM(·) given by Eq. (11) induces the vector field (12), whose forward Euler integration with time-step κ0 results in the conventional KCM iteration in Eq. (7). The second control input uc(·) is the solution to the TRS (23), which are computed using Algorithm 1 at each conformation. All our simulation results in this section adhere to the principles of Protofold I [19,41] and have been run on an Intel Core i7-6770HQ CPU at 2.60 GHz.

Our simulations focus on convergence of unfolded protein conformations toward two prevalent secondary structures: α-helices and β-pleated sheets. The α-helix (Fig. 7) is a tightly coiled right-handed arrangement of the amino acid chain, a common structural motif in proteins [2]. Similarly, β-pleated sheets (Fig. 7) represent another essential class of protein structures, which are instrumental in protein folding due to their ability to facilitate reorientation of the polypeptide chain [51]. Notably, small proteins like chignolin, a synthetic mini-protein with only 10 amino acids, can exhibit robust beta-hairpin folding [52].

Fig. 7
Schematic depiction of α-helix (left) and β-pleated sheet (right) secondary structures
Fig. 7
Schematic depiction of α-helix (left) and β-pleated sheet (right) secondary structures
Close modal

In all our simulations we consider four different folding control inputs labeled as KCM2 (resulting from the modified KCM approach introduced in Sec. 3.2), KCM (resulting from the conventional KCM approach), TRS-based KCM2 (resulting from the TRS in Eqs. (23) and (24)), and TRS-based KCM (resulting from solving the TRS for the conventional KCM reference vector field though with no theoretical guarantees of convergence). The KCM and KCM2 folding control inputs are given by Eqs. (11) and (14). These control inputs introduce artificial deviations in the natural rate of entropy change as discussed in Remark 3. On the other hand, the norm-constrained TRS-based control inputs prevent significant deviations in the natural rate of entropy change.

6.1 Simulations for Convergence to α-Helix Structures.

The simulations for convergence to α-helix structures were conducted on a protein backbone chain with a 22-dimensional dihedral angle space, namely, a backbone chain with N=10 peptide planes. To ensure uniform starting conditions for the two studied control inputs, the initial protein molecule conformation in all simulations was chosen as the same precoiled backbone chain in the vicinity of an α-helix configuration. The parameter κ0 is chosen to be 2.9×104 for the conventional KCM iteration. The parameters for the TRS-based KCM folding input are chosen to be Q=2NI22, B=I22, Ω(θ)=Ω0=ζc02N, ζ=0.35, and c0 is a constant chemical conversion factor.

Figures 810 depict the numerical simulation results. The protein backbone chain free energy profiles during folding are demonstrated in Fig. 9. The norm of the folding control inputs during folding are shown in Fig. 10. While the conventional KCM folding schemes cannot guarantee a constrained control input satisfying the ellipsoidal constraint given by Eq. (20), the TRS-based schemes succeed in doing so.

Fig. 8
The initial, transient, and final conformations for convergence to an α-helix structure using the proposed TRS-based folding control input
Fig. 8
The initial, transient, and final conformations for convergence to an α-helix structure using the proposed TRS-based folding control input
Close modal
Fig. 9
The protein backbone chain free energy during folding
Fig. 9
The protein backbone chain free energy during folding
Close modal
Fig. 10
The norm of the folding control input
Fig. 10
The norm of the folding control input
Close modal

6.2 Simulations for Convergence to β-Pleated Sheet Structures.

The simulations for convergence to β-pleated sheet structures (Fig. 7) were conducted on a protein backbone chain with a 22-dimensional dihedral angle space (N=10 peptide planes).

To ensure uniform starting conditions for the two studied control inputs, the initial protein molecule conformation in all simulations was chosen as the same, located within the proximity of a β-pleated sheet motif. The parameter κ0 is chosen to be 3.7×104 for the conventional KCM iteration. The parameters for the TRS-based KCM folding input are chosen to be Q=2NI22, B=I22, Ω(θ)=Ω0=ζc02N, ζ=0.35, and c0 is a constant chemical conversion factor.

Figures 1113 depict the numerical simulation results. The protein backbone chain free energy profiles during folding are demonstrated in Fig. 12. The norm of the folding control inputs during folding are shown in Fig. 13. While the conventional KCM folding schemes cannot guarantee a constrained control input satisfying the ellipsoidal constraint given by (20), the TRS-based schemes succeed in doing so.

Fig. 11
The initial, transient, and final conformations for convergence to a β-pleated sheet structure using the proposed TRS-based folding control input
Fig. 11
The initial, transient, and final conformations for convergence to a β-pleated sheet structure using the proposed TRS-based folding control input
Close modal
Fig. 12
The protein backbone chain free energy during folding
Fig. 12
The protein backbone chain free energy during folding
Close modal
Fig. 13
The norm of the folding control input
Fig. 13
The norm of the folding control input
Close modal

6.3 Role of Design Parameters Q and B.

In our simulations, the matrices Q and B play central roles in shaping both the objective function for the trust-region subproblem (TRS) and the ellipsoidal constraint that bounds the folding control input. Specifically, Q is a positive-definite matrix that appears in the quadratic cost term measuring “closeness” of the closed-loop vector field to the reference KCM flow. The choice of Q thus governs how aggressively (or conservatively) the control inputs deviate from the standard kinetostatic torque update while influencing the overall folding speed and its alignment with the KCM reference vector field. In contrast, B is another positive-definite matrix that defines the ellipsoidal norm used for constraining |uc|B.

From a physical perspective, the matrix B encodes how different joint variations affect the entropy of the protein molecule; it thus helps capture subtle details such as anisotropic couplings among torsional degrees-of-freedom suggested by force-field analyses (see, e.g., Ref. [24]). Meanwhile, the matrix Q balances how tightly the system should follow the steepest-descent direction of the interatomic force field, tied to minimizing the free-energy function. Larger off-diagonal entries in Q, for instance, can emphasize correlated angle updates consistent with certain bonded interactions, whereas diagonal-dominant structures may bias the control to focus on angles individually. Choosing Q and B further allows researchers to incorporate prior knowledge from experiments on protein folding rates or known structural motifs (e.g., α-helices versus β-sheets). If, for instance, the biochemistry literature indicates that certain bonds are significantly more sensitive to changes in conformational entropy, increasing the corresponding diagonal entries in B can reflect a stricter constraint on those angles. This flexibility ensures that the simulated pathways do not introduce entropy changes at unphysical rates—an important consideration recognized in studies of kinetically constrained folding [6,24].

7 Conclusion

This work addressed the challenge of predicting protein folding pathways using the KCM framework while constraining the rate of entropy change. Solutions are computed from a GEP-based TRS algorithm, which is computationally efficient and does not require outer iterations. The synthesized folding control inputs induce a closed-loop vector field close to the original KCM reference vector field while ensuring a limited rate of molecule entropy change introduced by the folding control input.

7.1 Impact Beyond Protein Folding.

While motivated by the specific challenges of protein folding, the GEP-based TRS algorithm presented in this paper has broader applicability within control systems and robotics. Unlike traditional TRS algorithms (e.g., semidefinite programming) that rely on iterative computational routines with unpredictable convergence, the GEP-based TRS solver in the paper offers superior efficiency along with guarantees for piecewise continuity of the generated control inputs. The computational efficiency is particularly relevant to the robotics community, where Boyd and Rimon identified the need for GEP-based TRS solvers for robot collision avoidance (e.g., point-ellipsoid distance calculations) [53]. However, scalability limitations due to the lack of efficient large-scale eigensolvers hindered the adoption of these solvers at the time. Additionally, the development of fast iterative solvers for embedded applications (e.g., Refs. [5456]) further shifted the focus away from GEP-based TRS methods. Notably, the prevalent approach in real-time nonlinear MPC by Diehl, Allgöwer, and collaborators leverages linearization and sequential QP techniques [57,58]. Recently, Pappas et al. Recognized the need for explicit TRS solvers in multiparametric nonlinear MPC and initiated research on solving TRSs with iterative active-set strategies [59]. Our work aligns with this renewed interest and offers a powerful alternative for control applications requiring efficient and scalable TRS solutions with guarantees of continuity for the generated control inputs.

Future research includes incorporating solvation effects, investigating protein unfolding subject to entropy loss constraints, and exploring the applicability of the current framework in studying the range of motion and mobility of peptide-based nanomachines.

Acknowledgment

We would like to express our gratitude to the Associate Editor and the reviewers for their constructive feedback, which has enabled us to refine several technical details in the paper.

Funding Data

  • National Science Foundation (NSF) (Award Nos. CMMI-2153744 and CMMI-2153901; Funder ID: 10.13039/100000084).

References

1.
Thirumalai
,
D.
,
O'Brien
,
E. P.
,
Morrison
,
G.
, and
Hyeon
,
C.
,
2010
, “
Theoretical Perspectives on Protein Folding
,”
Annu. Rev. Biophys.
,
39
(
1
), pp.
159
183
.10.1146/annurev-biophys-051309-103835
2.
Finkelstein
,
A. V.
, and
Ptitsyn
,
O.
,
2016
,
Protein Physics: A Course of Lectures
,
Academic Press
,
Amsterdam, The Netherlands
.
3.
Anfinsen
,
C. B.
,
1973
, “
Principles That Govern the Folding of Protein Chains
,”
Science
,
181
(
4096
), pp.
223
230
.10.1126/science.181.4096.223
4.
Levinthal
,
C.
,
1968
, “
Are There Pathways for Protein Folding?
,”
J. Chim. Phys.-Chim. Biol.
,
65
, pp.
44
45
.10.1051/jcp/1968650044
5.
Zwanzig
,
R.
,
Szabo
,
A.
, and
Bagchi
,
B.
,
1992
, “
Levinthal's Paradox
,”
Proc. Natl. Acad. Sci.
,
89
(
1
), pp.
20
22
.10.1073/pnas.89.1.20
6.
Weikl
,
T. R.
, and
Dill
,
K. A.
,
2003
, “
Folding Rates and Low-Entropy-Loss Routes of Two-State Proteins
,”
J. Mol. Biol.
,
329
(
3
), pp.
585
598
.10.1016/S0022-2836(03)00436-4
7.
Lapidus
,
L. J.
,
2021
, “
The Road Less Traveled in Protein Folding: Evidence for Multiple Pathways
,”
Curr. Opin. Struct. Biol.
,
66
, pp.
83
88
.10.1016/j.sbi.2020.10.012
8.
Bergasa-Caceres
,
F.
, and
Rabitz
,
H. A.
,
2020
, “
Interdiction of Protein Folding for Therapeutic Drug Development in SARS CoV-2
,”
J. Phys. Chem. B
,
124
(
38
), pp.
8201
8208
.10.1021/acs.jpcb.0c03716
9.
Hartl
,
F. U.
,
2017
, “
Protein Misfolding Diseases
,”
Annu. Rev. Biochem.
,
86
(
1
), pp.
21
26
.10.1146/annurev-biochem-061516-044518
10.
Hamdi
,
M.
, and
Ferreira
,
A.
,
2009
, “
Multiscale Design and Modeling of Protein-Based Nanomechanisms for Nanorobotics
,”
Int. J. Rob. Res.
,
28
(
4
), pp.
436
449
.10.1177/0278364908099888
11.
Mundrane
,
C.
,
Chorsi
,
M.
,
Vinogradova
,
O.
,
Ilieş
,
H.
, and
Kazerounian
,
K.
,
2022
, “
Exploring Electric Field Perturbations as the Actuator for Nanorobots and Nanomachines
,”
International Symposium on Advances in Robot Kinematics
,
Bilbao
, Spain, June 26--30, pp.
257
265
.10.1007/978-3-031-08140-8_28
12.
Moult
,
J.
,
Fidelis
,
K.
,
Kryshtafovych
,
A.
,
Schwede
,
T.
, and
Tramontano
,
A.
,
2014
, “
Critical Assessment of Methods of Protein Structure Prediction (CASP)—Round x
,”
Proteins: Struct., Function, Bioinformatics
,
82
(
S2
), pp.
1
6
.10.1002/prot.24452
13.
Senior
,
A. W.
,
Evans
,
R.
,
Jumper
,
J.
,
Kirkpatrick
,
J.
,
Sifre
,
L.
,
Green
,
T.
,
Qin
,
C.
, et al.,
2020
, “
Improved Protein Structure Prediction Using Potentials From Deep Learning
,”
Nature
,
577
(
7792
), pp.
706
710
.10.1038/s41586-019-1923-7
14.
Fersht
,
A. R.
,
2021
, “
AlphaFold–a Personal Perspective on the Impact of Machine Learning
,”
J. Mol. Biol.
,
433
(
20
), p.
167088
.10.1016/j.jmb.2021.167088
15.
Heo
,
L.
, and
Feig
,
M.
,
2018
, “
Experimental Accuracy in Protein Structure Refinement Via Molecular Dynamics Simulations
,”
Proc. Natl. Acad. Sci.
,
115
(
52
), pp.
13276
13281
.10.1073/pnas.1811364115
16.
Shaw
,
D. E.
,
Adams
,
P. J.
,
Azaria
,
A.
,
Bank
,
J. A.
,
Batson
,
B.
,
Bell
,
A.
,
Bergdorf
,
M.
,
Bhatt
,
J.
,
Butts
,
J. A.
,
Correia
,
T.
, et al.,
2021
, “
Anton 3: Twenty Microseconds of Molecular Dynamics Simulation Before Lunch
,”
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
, St. Louis, MO, Nov. 14--19, pp.
1
11
.10.1145/3458817.3487397
17.
Kazerounian
,
K.
,
Latif
,
K.
,
Rodriguez
,
K.
, and
Alvarado
,
C.
,
2005
, “
Nano-Kinematics for Analysis of Protein Molecules
,”
ASME J. Mech. Des.
,
127
(
4
), pp.
699
711
.10.1115/1.1867956
18.
Tavousi
,
P.
,
Behandish
,
M.
,
Ilieş
,
H. T.
, and
Kazerounian
,
K.
,
2015
, “
Protofold II: Enhanced Model and Implementation for Kinetostatic Protein Folding
,”
ASME J. Nanotechnol. Eng. Med.
,
6
(
3
), p. 034601.10.1115/1.4032759
19.
Tavousi
,
P.
,
2016
, “
On the Systematic Design and Analysis of Artificial Molecular Machines
,”
Ph.D. thesis
,
University of Connecticut
, Storrs, CT.https://digitalcommons.lib.uconn.edu/cgi/viewcontent.cgi?article=7301&context=dissertations
20.
Mohammadi
,
A.
, and
Al Janaideh
,
M.
,
2023
, “
Sign Gradient Descent Algorithms for Kinetostatic Protein Folding
,”
International Conference on Manipulation, Automation and Robotics at Small Scales
(
MARSS
), Abu Dhabi, UAE, Oct. 9--13, pp.
1
6
.10.1109/MARSS58567.2023.10294128
21.
Shahbazi
,
Z.
,
Ilieş
,
H. T.
, and
Kazerounian
,
K.
,
2010
, “
Hydrogen Bonds and Kinematic Mobility of Protein Molecules
,”
ASME J. Mech. Rob.
,
2
(
2
), p.
021009
.10.1115/1.4001088
22.
Wolf
,
S.
,
Amaral
,
M.
,
Lowinski
,
M.
,
Vallée
,
F.
,
Musil
,
D.
,
Güldenhaupt
,
J.
,
Dreyer
,
M. K.
,
Bomke
,
J.
,
Frech
,
M.
,
Schlitter
,
J.
, and
Gerwert
,
K.
,
2019
, “
Estimation of Protein–Ligand Unbinding Kinetics Using Non-Equilibrium Targeted Molecular Dynamics Simulations
,”
J. Chem. Inf. Model.
,
59
(
12
), pp.
5135
5147
.10.1021/acs.jcim.9b00592
23.
Ferrara
,
P.
,
Apostolakis
,
J.
, and
Caflisch
,
A.
,
2000
, “
Computer Simulations of Protein Folding by Targeted Molecular Dynamics
,”
Proteins: Struct., Function, Bioinform.
,
39
(
3
), pp.
252
260
.10.1002/(SICI)1097-0134(20000515)39:3<252::AID-PROT80>3.0.CO;2-3
24.
Arkun
,
Y.
, and
Erman
,
B.
,
2010
, “
Prediction of Optimal Folding Routes of Proteins That Satisfy the Principle of Lowest Entropy Loss: Dynamic Contact Maps and Optimal Control
,”
PLoS One
,
5
(
10
), p.
e13275
.10.1371/journal.pone.0013275
25.
Arkun
,
Y.
, and
Gür
,
M.
,
2012
, “
Combining Optimal Control Theory and Molecular Dynamics for Protein Folding
,”
PLoS One
,
7
(
1
), p.
e29628
.10.1371/journal.pone.0029628
26.
Mohammadi
,
A.
, and
Spong
,
M. W.
,
2022
, “
Quadratic Optimization-Based Nonlinear Control for Protein Conformation Prediction
,”
IEEE Control Syst. Lett.
,
6
, pp.
373
378
.10.1109/LCSYS.2021.3076869
27.
Wang
,
Y.
, and
Boyd
,
S.
,
2011
, “
Fast Evaluation of Quadratic control-Lyapunov Policy
,”
IEEE Trans. Control Syst. Technol.
,
19
(
4
), pp.
939
946
.10.1109/TCST.2010.2056371
28.
Levy
,
R. M.
,
Karplus
,
M.
,
Kushick
,
J.
, and
Perahia
,
D.
,
1984
, “
Evaluation of the Configurational Entropy for Proteins: Application to Molecular Dynamics Simulations of an α-Helix
,”
Macromolecules
,
17
(
7
), pp.
1370
1374
.10.1021/ma00137a013
29.
Adachi
,
S.
,
Iwata
,
S.
,
Nakatsukasa
,
Y.
, and
Takeda
,
A.
,
2017
, “
Solving the Trust-Region Subproblem by a Generalized Eigenvalue Problem
,”
SIAM J. Optim.
,
27
(
1
), pp.
269
291
.10.1137/16M1058200
30.
Morris
,
B.
,
Powell
,
M. J.
, and
Ames
,
A. D.
,
2013
, “
Sufficient Conditions for the Lipschitz Continuity of QP-Based Multi-Objective Control of Humanoid Robots
,”
52nd IEEE Conference on Decision and Control
, Firenze, Italy, Dec. 10--13, pp.
2920
2926
.
31.
Mohammadi
,
A.
, and
Spong
,
M. W.
,
2023
, “
Prediction of Protein Folding Pathways Under Entropy-Loss Constraints Using Quadratic Programming-Based Nonlinear Control
,” Poster Session Presented at American Control Conference (
ACC
),
San Diego, CA
, May 31–June 2, p.
1
.https://par.nsf.gov/servlets/purl/10427744
32.
Diez
,
M.
,
Petuya
,
V.
,
Martínez-Cruz
,
L. A.
, and
Hernández
,
A.
,
2013
, “
Biokinematic Protein Simulation by an Adaptive Dihedral Angle Approach
,”
Mech. Mach. Theory
,
69
, pp.
105
114
.10.1016/j.mechmachtheory.2013.05.007
33.
Ekenna
,
C.
,
Thomas
,
S.
, and
Amato
,
N. M.
,
2016
, “
Adaptive Local Learning in Sampling Based Motion Planning for Protein Folding
,”
BMC Syst. Biol.
,
10
(
S2
), pp.
165
179
.10.1186/s12918-016-0297-9
34.
Mohammadi
,
A.
, and
Spong
,
M. W.
,
2022
, “
Chetaev Instability Framework for Kinetostatic Compliance-Based Protein Unfolding
,”
IEEE Control Syst. Lett.
,
6
, pp.
2755
2760
.10.1109/LCSYS.2022.3176433
35.
Adolf
,
D. B.
, and
Ediger
,
M. D.
,
1991
, “
Brownian Dynamics Simulations of Local Motions in Polyisoprene
,”
Macromolecules
,
24
(
21
), pp.
5834
5842
.10.1021/ma00021a018
36.
Scheraga
,
H. A.
,
Khalili
,
M.
, and
Liwo
,
A.
,
2007
, “
Protein-Folding Dynamics: Overview of Molecular Simulation Techniques
,”
Annu. Rev. Phys. Chem.
,
58
(
1
), pp.
57
83
.10.1146/annurev.physchem.58.032806.104614
37.
Neumaier
,
A.
,
1997
, “
Molecular Modeling of Proteins and Mathematical Prediction of Protein Structure
,”
SIAM Rev.
,
39
(
3
), pp.
407
460
.10.1137/S0036144594278060
38.
Noguti
,
T.
, and
,
N.
,
1983
, “
A Method of Rapid Calculation of a Second Derivative Matrix of Conformational Energy for Large Molecules
,”
J. Phys. Soc. Jpn.
,
52
(
10
), pp.
3685
3690
.10.1143/JPSJ.52.3685
39.
Arkun
,
Y.
, and
Gür
,
M.
,
2011
, “
Protein Folding Using Coarse-Grained Optimal Control and Molecular Dynamics
,”
IFAC Proc. Vols.
,
44
(
1
), pp.
14213
14216
.10.3182/20110828-6-IT-1002.02489
40.
Ortega
,
R.
,
Loria
,
A.
,
Kelly
,
R.
, and
Praly
,
L.
,
1994
, “
On Passivity-Based Output Feedback Global Stabilization of Euler-Lagrange Systems
,”
Proceedings of 33rd IEEE Conference on Decision and Control
(
CDC
), Lake Buena Vista, FL, Dec. 14--16, pp.
381
386
.10.1109/CDC.1994.410898
41.
Kazerounian
,
K.
,
Latif
,
K.
, and
Alvarado
,
C.
,
2005
, “
PROTOFOLD: A Successive Kinetostatic Compliance Method for Protein Conformation Prediction
,”
ASME J. Mech. Des.
,
127
(
4
), pp.
712
717
.10.1115/1.1867502
42.
Chorsi
,
M. T.
,
Tavousi
,
P.
,
Mundrane
,
C.
,
Gorbatyuk
,
V.
,
Kazerounian
,
K.
, and
Ilieş
,
H.
,
2021
, “
Kinematic Design of Functional Nanoscale Mechanisms From Molecular Primitives
,”
ASME J. Micro- Nano-Manuf.
,
9
(
2
), p.
021005
.10.1115/1.4051472
43.
Cortés
,
J.
,
2006
, “
Finite-Time Convergent Gradient Flows With Applications to Network Consensus
,”
Automatica
,
42
(
11
), pp.
1993
2000
.10.1016/j.automatica.2006.06.015
44.
Chirikjian
,
G. S.
,
2011
, “
Modeling Loop Entropy
,”
Methods Enzymol.
,
487
, pp.
99
132
.10.1016/B978-0-12-381270-4.00004-4
45.
Hikiri
,
S.
,
Yoshidome
,
T.
, and
Ikeguchi
,
M.
,
2016
, “
Computational Methods for Configurational Entropy Using Internal and Cartesian Coordinates
,”
J. Chem. Theory Comput.
,
12
(
12
), pp.
5990
6000
.10.1021/acs.jctc.6b00563
46.
Halperin
,
A.
, and
Goldbart
,
P. M.
,
2000
, “
Early Stages of Homopolymer Collapse
,”
Phys. Rev. E
,
61
(
1
), pp.
565
573
.10.1103/PhysRevE.61.565
47.
Thomas
,
R.
,
Thorp
,
J.
, and
Pottle
,
C.
,
1976
, “
A Model-Referenced Controller for Stabilizing Large Transient Swings in Power Systems
,”
IEEE Trans. Autom. Control
,
21
(
5
), pp.
746
750
.10.1109/TAC.1976.1101343
48.
Spong
,
M.
,
Thorp
,
J.
, and
Kleinwaks
,
J.
,
1986
, “
The Control of Robot Manipulators With Bounded Input
,”
IEEE Trans. Autom. Control
,
31
(
6
), pp.
483
490
.10.1109/TAC.1986.1104314
49.
Golub
,
G. H.
, and
Van Loan
,
C. F.
,
1996
,
Matrix Computations
, 3rd ed.,
Johns Hopkins University Press
,
Baltimore, MD
.
50.
Cortes
,
J.
,
2008
, “
Discontinuous Dynamical Systems
,”
IEEE Control Syst. Mag.
,
28
(
3
), pp.
36
73
.10.1109/MCS.2008.919306
51.
Kountouris
,
P.
, and
Hirst
,
J. D.
,
2010
, “
Predicting β-Turns and Their Types Using Predicted Backbone Dihedral Angles and Secondary Structures
,”
BMC Bioinform.
,
11
(
1
), pp.
1
11
.10.1186/1471-2105-11-407
52.
Maruyama
,
Y.
, and
Mitsutake
,
A.
,
2018
, “
Analysis of Structural Stability of Chignolin
,”
J. Phys. Chem. B
,
122
(
14
), pp.
3801
3814
.10.1021/acs.jpcb.8b00288
53.
Rimon
,
E.
, and
Boyd
,
S. P.
,
1997
, “
Obstacle Collision Detection Using Best Ellipsoid Fit
,”
J. Intell. Rob. Syst.
,
18
(
2
), pp.
105
126
.10.1023/A:1007960531949
54.
Domahidi
,
A.
,
Chu
,
E.
, and
Boyd
,
S.
,
2013
, “
ECOS: An SOCP Solver for Embedded Systems
,” European Control Conference (
ECC
), Zurich, Switzerland, July 17--19, pp.
3071
3076
.10.23919/ECC.2013.6669541
55.
Bemporad
,
A.
,
2016
, “
A Quadratic Programming Algorithm Based on Nonnegative Least Squares With Applications to Embedded Model Predictive Control
,”
IEEE Trans. Autom. Control
,
61
(
4
), pp.
1111
1116
.10.1109/TAC.2015.2459211
56.
Takapoui
,
R.
,
Moehle
,
N.
,
Boyd
,
S.
, and
Bemporad
,
A.
,
2020
, “
A Simple Effective Heuristic for Embedded Mixed-Integer Quadratic Programming
,”
Int. J. Control
,
93
(
1
), pp.
2
12
.10.1080/00207179.2017.1316016
57.
Gros
,
S.
,
Zanon
,
M.
,
Quirynen
,
R.
,
Bemporad
,
A.
, and
Diehl
,
M.
,
2020
, “
From Linear to Nonlinear MPC: Bridging the Gap Via the Real-Time Iteration
,”
Int. J. Control
,
93
(
1
), pp.
62
80
.10.1080/00207179.2016.1222553
58.
Diehl
,
M.
,
Findeisen
,
R.
,
Allgöwer
,
F.
,
Bock
,
H. G.
, and
Schlöder
,
J. P.
,
2005
, “
Nominal Stability of Real-Time Iteration Scheme for Nonlinear Model Predictive Control
,”
IEE Proc.–Control Theory Appl.
,
152
(
3
), pp.
296
308
.10.1049/ip-cta:20040008
59.
Pappas
,
I.
,
Diangelakis
,
N. A.
, and
Pistikopoulos
,
E. N.
,
2021
, “
Multiparametric/Explicit Nonlinear Model Predictive Control for Quadratically Constrained Problems
,”
J. Process Control
,
103
, pp.
55
66
.10.1016/j.jprocont.2021.05.001