Abstract
This paper addresses the problem of algorithmic prediction of protein folding pathways, namely, the transient three-dimensional conformations of protein molecules during folding, under constrained rates of entropy change. We formulate the physics-based prediction of folding pathways as a control synthesis problem, where the control inputs guide the protein folding simulations. These folding control inputs are obtained from large-scale trust-region subproblems (TRS) utilizing a computationally efficient algorithm with no need for outer iterations. The proposed control synthesis approach, which leverages the solutions obtained from a special generalized eigenvalue problem, avoids potentially cumbersome and unpredictable iterative computations at each protein conformation. Moreover, the TRS-based control inputs align the closed-loop dynamics closely with the kinetostatic compliance method (KCM) reference vector field while satisfying ellipsoidal constraints on the folding control inputs. Finally, we provide conditions for existence and uniqueness of the resulting closed-loop solutions, which are the protein folding pathways under constraints on the rate of entropy change. Numerical simulations utilizing the KCM approach on protein backbones confirm the effectiveness of the proposed framework.
1 Introduction
The protein folding problem can be understood in two key parts [1,2], which build upon Anfinsen's seminal work [3] and Levinthal's proposition [4]. In the former, the aim is to predict a protein's three-dimensional (3D) structure solely from its amino acid sequence. In the latter, the objective is to compute transient conformations of proteins during folding (see Fig. 1).

The protein folding problem is concerned with finding the transient and final folded conformations of protein molecules from their denatured unfolded state. Proteins adopt pathways to avoid high entropy loss routes during folding.
Understanding protein folding as a guided process involving specific pathways or mechanisms stems from Levinthal's paradox [4]. According to this paradox, protein folding through a random search of every possible protein conformation is implausible. This is because the number of potential protein configurations is astronomically vast and a random search would lead to an incredibly long folding time (Levinthal's time). Despite the inherent challenge of Levinthal's paradox, incorporating a minimal energy bias against energetically unfavorable protein conformations can significantly reduce the folding time scale to biologically relevant regimes [5]. Furthermore, the difference in entropy between denatured and folded proteins contributes to the complexity of folding. In particular, denatured proteins exhibit a markedly higher conformational entropy compared to their folded native state. Moreover, two-state proteins preferentially adopt pathways that minimize entropy loss during folding [6].
1.1 Importance of Protein Folding Simulations.
The limited temporal resolution of protein structural data presents a significant challenge to experimentally elucidating protein folding pathways [7]. Consequently, algorithmic prediction of protein folding pathways has emerged as a powerful complementary approach, finding applications in computer-aided drug discovery [8] and studies of protein misfolding diseases [9]. Furthermore, folding simulations hold promise for engineering applications, enabling quantitative prediction of mobility and range of motion in protein-based nanomachines [10,11].
1.2 Knowledge-Based Versus Physics-Based Approaches.
One category of widely used algorithms for predicting the final folds of proteins is provided by what is known as knowledge-based approaches. These algorithms, which are based on pattern recognition and machine learning, infer the 3D structures of folded protein shapes by harnessing the predictive power of the amino acid sequences (akin to Anfinsen's dogma) and leveraging extensive datasets of existing folds [12]. For instance, AlphaFold's core component is a neural network trained on the protein data bank (PDB) structures. This network predicts probability distributions of inter-residue atom distances and leverages the patterns derived from the vast protein sequence landscape through the integration of sequence databases and multiple sequence alignments [13]. However, knowledge-based approaches cannot describe the interactions between proteins and nucleic acids. Furthermore, they fail to take into account the stability of the protein molecule and the rates at which the folding process occurs, i.e., the folding kinetics (see, e.g., Ref. [14]).
In contrast to knowledge-based strategies, physics-based methods utilize basic physical principles to simulate the process of folding while computing the transient and final 3D structures of protein molecules [15]. While physics-based strategies supply trustworthy data concerning the temporary structures in the folding process, these simulations require significant processing time and specialized hardware (e.g., Anton family of supercomputers for atomic-level folding simulations [16]).
1.3 The Kinetostatic Compliance Method Framework.
To overcome the computational burden of physics-based protein folding algorithms, the innovative kinetostatic compliance method (KCM) framework has been developed in the literature [17–20]. KCM models protein molecules as a multitude of rigid nanolinkages forming a nanokinematic mechanism subject to movement limitations imposed by chemical bonds. To address the significant computational expense associated with all-atom molecular dynamics, KCM iteratively refines protein structures by varying their dihedral angles in a kinetostatic manner. These angles, which define the relative orientations of bonded atoms, are varied under the influence of nonlinear interatomic force fields through an iterative process driving the protein molecule conformation toward a minimized energy state corresponding to the final folded structure.
Kinetostatic compliance method has been a valuable tool for understanding hydrogen bond formation and its influence on protein mobility since its introduction [21]. Moreover, KCM has facilitated the design of peptide-based nanomachines [11]. However, a critical knowledge gap exists: the conventional KCM does not consider how it perturbs the rate of entropy change in kinetostatic protein folding.
1.4 Control Theory and Protein Folding.
The idea of constraining protein folding dynamics using a guiding control input finds its roots in targeted molecular dynamics (TMD) [22,23]. In TMD, a proportional control input regulates the distance between a protein molecule and its target conformation, aiming to expedite the transition to the final folded state.
One of the few attempts to constrain the folding simulations by synthesizing proper guiding control inputs is due to Arkun and collaborators [24,25]. They utilized a bead-spring model of the protein molecule, assuming a linear time-invariant system. They subsequently employed a linear quadratic regulator (LQR) control scheme to guide the folding process. Their controller aimed to navigate away from high-energy regions of the folding landscape while simultaneously regulating the rate of molecule entropy change. However, the simplistic force fields used by Refs. [24] and [25] cannot capture the nonlinear and complicated dynamics of proteins under the electrostatic and van der Waals interatomic effects.
To address the limitations of the protein bead-spring LQR-based folding control framework due to Arkun and collaborators [24,25], we recently proposed a control-theoretic interpretation of KCM [26]. We demonstrated that constraints on the rate of change of entropy can be incorporated into the KCM folding dynamics through control inputs that steer the folding simulation. These control inputs are obtained by solving a large-scale quadratic program (QP) with constraints on a physically relevant norm of the control input. While our initial work utilized interior point algorithms to solve box-constrained QPs for computing the folding control inputs, it encountered a significant limitation. Specifically, it relied on polytopic inequalities, as noted in Ref. [26], which are compatible with rapid numerical solvers (see, e.g., Ref. [27]). This approach contrasts with more realistic quadratic constraints on the inputs. Notably, standard methods in biochemistry, which represent protein configurational entropy, employ quadratic representations based on ellipsoidal norms [28].
1.5 Contributions of the Paper.
This paper makes two key contributions to the KCM-based protein folding framework. First, we address a significant gap in KCM-based protein folding literature. While our previous work constrained the rates of entropy change in the KCM paradigm for the first time [26], it suffers the shortcoming of encoding these constraints via polytopic inequalities with an inevitable loss of accuracy in predicting the folding pathways. To overcome this challenge, we propose a novel approach that employs a computationally efficient trust-region subproblem (TRS) algorithm for constraining the rates of entropy change via ellipsoidal inequalities (Algorithm 1). Moreover, this algorithm offers a significant advantage by eliminating the need for outer iterations, thereby enhancing computational efficiency. This computationally efficient control synthesis algorithm, which is taken from the TRS optimization literature [29], is based on solving a generalized eigenvalue problem (GEP). This approach ensures that the TRS-based folding control inputs induce a closed-loop dynamics faithful to the KCM reference vector field while satisfying the ellipsoidal constraints on the folding control input.
Our second contribution focuses on leveraging the large-scale GEP-based TRS solver developed by Adachi et al. [29] to generate optimization-based control inputs. While Adachi et al. primarily targeted the optimization literature, where solution continuity was not a primary concern [29], control systems considerations necessitate investigating continuity properties of the generated control inputs (see, e.g., Ref. [30] for humanoid robots). Remarkably, the application of Adachi et al.'s solver and its continuity properties within the context of nonlinear optimization-based control remains unexplored. We address this gap by investigating the piecewise continuity of pointwise optimal control laws obtained from the GEP-based TRS solver. We also establish conditions for existence and uniqueness of closed-loop trajectories under these control inputs (Propositions 5.2 and 5.4). Other contributions of this paper include the novel derivation of KCM protein folding dynamics from a Lagrange–Rayleigh formalism (Sec. 3.1) and the first formal proofs of convergence for the KCM folding scheme to folded conformations in the literature (Sec. 3.2).
This paper significantly expands upon a preliminary version presented as a conference poster [31]. In this paper, we delve deeper into the control synthesis algorithm by providing a detailed description, discussing its computational complexity, and exploring its numerical properties. Additionally, we present a comprehensive analysis (absent from the conference poster) investigating the piecewise continuity of control inputs and establishing conditions for existence and uniqueness of closed-loop protein folding trajectories. Finally, the algorithm effectiveness is rigorously assessed through more extensive numerical simulations.
The remainder of this paper is structured as follows. First, in Sec. 2, we briefly present the nanokinematic structure of protein molecules and the main underpinnings of the KCM framework for simulating the kinetostatic folding of proteins. We further demonstrate how the KCM-based approach to protein folding can be formulated as a control synthesis problem. Thereafter, we formulate the main control problem in Sec. 3 and present a pointwise optimal control scheme for solving it in Sec. 4. We subsequently present conditions for existence and uniqueness of solutions to the resulting closed-loop protein folding dynamics in Sec. 5. After presenting the simulation results in Sec. 6, we conclude the paper with final remarks, the impact of the current work beyond protein folding, and future directions in Sec. 7.
1.6 Notation.
Throughout this paper, and denote the set of all non-negative real numbers and non-negative integers, respectively. The set denotes the unit circle. The set denotes the special orthogonal group of order three. The matrix denotes the identity matrix. Given the vector and symmetric positive definite matrix , denotes the vector Euclidean norm, denotes the vector -norm, and denotes the vector ellipsoidal norm. Given the integer n, denotes the identity matrix of size n. Hence, . Given a matrix , and denote the range and null space of the matrix, respectively. Additionally, () indicate that is a positive definite (resp., semidefinite) matrix. Given a set , () denote the convex hull (respectively, convex closure) of the set. Given a smooth map from an open subset of to an open subset of , denotes the Jacobian of at . If is a real-valued function, , where denotes the gradient of at . Given a vector field , is the Filippov set-valued map associated with the vector field.
2 Background
In this section, we provide a summary of the KCM framework for modeling the kinetostatic folding of proteins in vacuo.
2.1 Nano-Linkage-Based Kinematic Model of Protein Molecules.
Proteins are macromolecules with complex structures and intricate dynamics governing their folding process. These long molecular chains consist of peptide planes that are interconnected by chemical bonds. The functionality of protein molecules depends critically on their 3D structure (i.e., their conformation), which can be directly deduced from the linear sequence of amino acids forming their polypeptide chain.
A protein backbone chain schematic is depicted in Fig. 2, where each individual peptide plane is formed by six coplanar atoms that are covalently bonded (visualized as red lines in Fig. 2). The coplanarity assumption about the -carbon , carbonyl CO, amide nitrogen NH, and another -carbon is motivated by the wealth of high-resolution X-ray crystallographic data [2]. This fundamental assumption underpins the representation of protein molecules as intricate nanomechanisms with numerous degrees-of-freedom (DOFs) (see, e.g., Refs. [18], [26], and [32–34]). Accordingly, the key building blocks of the protein kinematic chain are the peptide planes that function as nanoscale rigid linkages.

Modeling protein backbone chain as a nanokinematic mechanism where peptide planes act as rigid links that are interconnected by revolute joints centered at the alpha carbon atoms
2.1.1 The Protein Configuration Vector.
The rotation of the nanolinkages in the protein kinematic chain with respect to each other is facilitated by the alpha-Carbon atoms (Fig. 2), which play the role of hinges connecting peptide planes together. The first and the last atoms are bonded to an N-terminus–peptide plane and a C-terminus–peptide plane, respectively.
Represents the configuration of the backbone chain. The dihedral angle vector resides within the 2N-dimensional configuration space , where each individual unit circle corresponds to a single dihedral angle within the backbone chain.
Each dihedral angle in the conformation vector in Eq. (1) corresponds to a single DOF in the protein molecule's kinematic chain. For each DOF, we can define a unit vector, , . These vectors align with the axes of rotation around which the nanokinematic chain can rotate. As illustrated in Fig. 2, the unit vectors and represent the directions along two crucial bonds within the ith peptide plane: the bond and the bond. These bonds define the primary rotation axes within the chain's segments. Serving as key structural references, and denote the unit vectors associated with the amino terminus (N-terminus) and carboxy terminus (C-terminus), respectively. The reference conformations are established by setting all dihedral angles to zero, i.e., for . This configuration results in the protein adopting a linear coil conformation, representing a denatured state. In this state, the protein is stretched or completely unfolded, lacking the specific three-dimensional structure that is characteristic of its functional, native form. Such a state is typically induced by various physical or chemical means, including heat, changes in pH, Chaotropic agents like urea, or mechanical stretching.
While dihedral angles define the rotational DOFs within a protein's kinematic chain, additional information is necessary to fully capture the spatial orientation of the rigid peptide nanolinkages. This information comes in the form of body vectors, denoted by , . These vectors comprehensively define the relative spatial arrangement of coplanar atoms within each peptide plane. Consequently, any relative positional relationship between two atoms within a plane can be mathematically expressed as a linear combination of their corresponding body vectors, i.e., , where the coefficients and , , are constants specific to the atom pair and are identical throughout the peptide chain (see, e.g., Refs. [18] and [19] for more details). The body vector is oriented along the line connecting the alpha Carbon atom to the Nitrogen atom on the ith peptide plane, while aligns with the chemical bond linking the Nitrogen atom of the ith peptide plane to the alpha Carbon of the following peptide plane. Due to their distinct directional alignments within the peptide plane structure, and are inherently linearly independent (Fig. 2). The body vectors and unit vectors act in concert to provide a comprehensive depiction of the protein molecule's conformation as it dynamically varies with respect to the vector , which encompasses the peptide dihedral angles.
2.1.2 The Protein Forward Kinematics.
2.2 Kinetostatic Compliance Method.
Kinetostatic compliance method framework leverages the established experimental observation that it is possible to capture the essence of protein folding dynamics while neglecting inertial forces (see, e.g., Refs. [24], [25], and [35]). According to KCM, protein dihedral angles evolve kinetostatically under the influence of interatomic force fields.
2.2.1 Interatomic Force Fields Responsible for Kinetostatic Folding.
Consider a peptide chain composed of atoms and peptide planes. The dihedral angle vector, denoted by defined in Eq. (1), encodes the conformational state of the chain. The Cartesian coordinates of any two atoms , within the chain are represented by , , respectively (Eq. (4)). Their Euclidean distance is then calculated as . The specific parameters pertaining to atom charges, van der Waals radii, interatomic distances, dielectric constant, potential well depths, and force weights are readily available in Ref. [19] and the provided references.
where and represent the protein's electrostatic potential energy and van der Waals interatomic potential energy, respectively (detailed expressions can be found in Ref. [18]). Consequently, the net forces acting on each atom , , due to both Coulombic and van der Waals interactions, can be obtained from the negative gradients of the respective individual energy terms: and . It is remarked that the molecular dynamics formulations begin with Cartesian coordinates of the atoms given by for , before transitioning to dynamics described in generalized coordinates (see, e.g., Ref. [36]). In the KCM framework, the generalized coordinates are given by the dihedral angles . Hence, for . The free energy of the molecule is indeed a function of , and thus, should explicitly relate to the Cartesian coordinates as .
2.2.2 Kinetostatic Folding Torque Vector.
where represents the molecule chain Jacobian at conformation . This matrix translates the 6N-dimensional force vector into the corresponding 2N-dimensional torque vector governing dihedral angle evolution (see Ref. [18] for detailed derivations). Consequently, the changes in dihedral angles at each protein conformation take place under the direct influence of the kinetostatic folding torque vector acting on the peptide backbone. Specifically, the vector , which aligns with the steepest-descent direction of the total free energy in the protein conformation landscape, guides the dihedral angle variations during folding.
2.2.3 Folded Protein Conformations.
Within the context of protein folding, at each local minimum of the aggregate free energy function defined in Eq. (5), the corresponding torque vector vanishes identically. This signifies the protein molecule's kinetostatic stationarity at folded conformations, where the absence of net kinetostatic torque reflects the balanced internal forces within the protein structure.
2.2.4 Successive Kinetostatic Folding Iteration.
where due to Eq. (6). The positive constant is chosen sufficiently small to prevent excessive angle variations and is heuristically optimized.
A flowchart of the successive kinetostatic folding iteration is depicted in Fig. 3. In the dihedral angle update iterations given by (7), the normalized torque vector governs the incremental updates to the dihedral angles at each conformation (see Ref. [20] for a sign gradient descent-based version of this iteration). The iterative process governed by Eq. (7) continues until the molecule's aggregated free energy converges to the vicinity of a local minimum within the free-energy landscape. Convergence is achieved numerically when the kinetostatic folding torque vector norm falls below a predefined tolerance , i.e., when , for some positive integer . The protein folding pathway transient conformations are given by .

Flowchart of the conventional kinetostatic folding iteration. The variation of dihedral angles in the flowchart (within dashed borders) is governed by Eq. (7).

Flowchart of the conventional kinetostatic folding iteration. The variation of dihedral angles in the flowchart (within dashed borders) is governed by Eq. (7).
2.3 Properties of the Aggregated Free Energy of the Protein Molecule.
In this section we state the smoothness properties of the aggregated energy of the protein molecule, , in the vicinity of folded conformations (see, e.g., Ref. [37] for further details). In particular, in a vicinity of a folded conformation , is a smooth function of the dihedral angle vector and assumes a local minimum in this folded conformation. Therefore, is a critical point of and . Finally, in the so-called protein metastable states including the folded conformations, the Hessian of is positive definite.
3 A Control Systems-Oriented Interpretation of Kinetostatic Compliance Method-Based Folding
In this section, we recast the KCM-based folding numerical scheme presented in Sec. 2.2 as a nonlinear affine control system. Our point of departure is the Lagrange–Rayleigh formalism resulting in an Euler–Lagrange control system. This controls-oriented framing, which first appeared in our earlier work [26,34], draws inspiration from the field of targeted molecular dynamics (TMD) [22,23] and Arkun et al.'s optimal control framework, where they employed linear spring-like forces described by a connectivity matrix [24,25,39]. In contrast to Arkun et al., our KCM-based line of work utilizes a nonlinear dynamical system to capture the dynamics within the molecule's dihedral angle space.
3.1 Deriving the Kinetostatic Compliance Method Dynamics From a Lagrange-Rayleigh Formalism.
where the Jacobian matrix, , and the generalized force vector, , are identical to those defined in Eq. (6). Equation (10) describes the dynamics of the dihedral angles of a protein molecule subject to interatomic forces and a folding control input, . In the KCM dynamics, the folding control input, , can represent either physical forces, such as those exerted by optical tweezers or solvation effects, or nonphysical fictitious forces designed to enhance the convergence rate toward folded conformations in simulations performed in vacuo. In the case of the latter, which is the subject of this paper, the folding control input functions similarly to the forces used in targeted molecular dynamics.
Notably, employing a forward Euler scheme with an integration step equal to recovers the KCM iteration in Eq. (7) from the reference vector field in Eq. (12). In summary, the successive kinetostatic folding iteration in Eq. (7) is related to the reference vector field given by Eq. (12). This vector field is induced by applying the folding control input (11) to the KCM-based nonlinear control affine system in Eq. (10). Furthermore, the trajectories of the vector field in Eq. (12) provide the protein folding pathways.
Remark 1. While the controls-oriented perspective in Ref. [26] was not explicitly addressed in the original KCM-based protein folding works (e.g., Refs. [18] and [19]), it offers a valuable foundation for extending the KCM framework. As discussed later, this control-theoretic interpretation allows us to constrain the molecule entropy changes introduced by the folding control input, leading to more realistic pathway predictions.
Remark 2. The KCM reference vector field described by Eq. (12), originally developed through the pioneering efforts of Kazerounian, Ilieş, and their collaborators (e.g., Refs. [18] and [41]), has been widely utilized in the KC esponds to a veM literature. This vector field has demonstrated remarkable success in numerical protein folding simulations and has been effectively utilized in significant engineering applications, such as design of peptide-based nanomechanisms [42]. Unfortunately, its use has lacked formal theoretical guarantees and proofs of convergence to folded conformations. In this paper, we address this shortcoming by considering a very similar reference vector field (see Sec. 3.2) for which convergence and smoothness can be easily established. Our simulations will include comparisons of both reference vector fields.
3.2 Convergence of the Kinetostatic Compliance Method Reference Vector Field.
In an open neighborhood of a specific folded conformation—excluding the conformation itself—the new KCM vector field given by Eq. (13) is smooth, thanks to the smoothness properties of the aggregated free energy of protein molecules (Sec. 3.2). The gradient flow induced by the reference vector field (13) has well-defined absolutely continuous solutions. Furthermore, using the results in Cortés' work [43, Proposition 7 and Theorem 8], it can be shown that the KCM flow with initial conformations in a neighborhood of a given folded conformation is guaranteed to converge to it in finite time due to positive definiteness of the Hessian of the aggregated free energy of the molecule (Sec. 3.2).
Figures 4 and 5 compare the folding pathways of a protein backbone chain with peptide planes, starting near a -pleated sheet conformation (with simulation numerical parameters detailed in the -pleated sheet simulations in Sec. 6), under two different reference vector fields. The KCM reference vector fields in Eq. (12) and in Eq. (13) are derived from the control inputs and , as defined in Eqs. (11) and (14), respectively, within the KCM dynamics (10). As illustrated in the figures, although the original reference vector field commonly used in KCM folding studies converges more quickly, it does not possess the continuity properties of the modified vector field. Indeed, the conventional KCM control input starts oscillating after around 6000 iterations. Therefore, we continue our theoretical derivations using the modified reference vector field as outlined in Remark 2 and present the numerical simulation results for both vector fields in the sequel.
3.3 Encoding Entropy Loss Constraints in Kinetostatic Compliance Method Protein Folding.
In this section we consider the controls-oriented protein folding KCM dynamics and investigate how excessive entropy changes due to the folding control input can be constrained.
We first derive an expression for the change of the molecule entropy during folding. This derivation is motivated by a key observation in the field of structural biochemistry, which states that excessively rapid changes of entropy should be avoided during the guided folding process [6,24,39]. Multiple established methodologies exist to calculate the entropy changes experienced by a protein molecule during folding. These methodologies can be broadly classified into two categories: those based on configurational entropy computations (e.g., Ref. [44]) and those rooted in thermodynamics-derived operational definitions (e.g., Refs. [45] and [46]). In this work, we will employ the Clausius thermodynamic expression outlined in Refs. [45] and [46]. This approach shares similarities with the derivations presented by Arkun et al. [24], as both methods leverage the principles of classical thermodynamics.
where for some symmetric positive definite matrix and constant . When , we arrive at .
As it can be seen from Eq. (20), we establish an ellipsoidal constraint on the control input . This constraint directly limits the total rate of entropy change , rather than just the ‘extra’ or ‘deviating’ component . This approach is justified by Eq. (16), where the ‘natural’ torque term is influenced by the KCM free-energy gradients. By bounding , we ensure that remains within a permissible range. Consequently, controlling the magnitude of not only restricts but also imposes a direct limit on .
Remark 3. As demonstrated in Eq. (17), the gradient of the aggregated free energy, , is crucial for determining the physics-based rate of entropy change. Additionally, the control input is employed to expedite convergence to folded conformations and to model effects such as solvation or optical tweezer forces, though these are not the focus of this study. The magnitude of this control input can artificially alter the rate of entropy change. To mitigate this, our folding control input design aims to closely align the reference vector field with in Eq. (13), induced by the control input in Eq. (14), while adhering to norm constraints to prevent significant deviations in the natural rate of entropy change.
We are now ready to state the main control problem addressed in this paper.
3.3.1 Problem Statement (Protein Folding Control Problem).
Consider the protein folding nonlinear control affine system given by Eq. (10). Furthermore, consider the KCM reference vector field in Eq. (13) induced by the folding control input in Eq. (14). The Protein Folding Control Problem Subject to Entropy Loss Constraints (PFCP) is concerned with two objectives: (1) Synthesize a folding control input such that the resulting closed-loop dynamics is closely aligned with the KCM reference vector field while adhering to the ellipsoidal constraint in Eq. (20); and (2) Find conditions for existence and uniqueness of the trajectories of the closed-loopy dynamics, which are the folding pathways subject to the constraint given by Eq. (20).
4 Solution to the Protein Folding Control Problem
To achieve the first objective of the PFCP formulated in Sec. 3.3, we first utilize a so-called optimal decision strategy (ODS) to derive a pointwise optimal control law (see, e.g., Refs. [47] and [48] for the pioneering work on ODS), which happens to be the solution to a TRS. We then present an efficient numerical algorithm for computing this optimal control law at each protein conformation.
4.1 The Optimal Decision Strategy Framework for Protein Folding Control Problem.
The proposed control scheme in this section for solving the PFCP leverages an ODS framework to achieve pointwise optimality.
The free vectors in give rise to the set of admissible dihedral angle folding rate vectors, denoted by . Specifically, every element within corresponds to a vector in after its starting point is translated by . Hence, under any input satisfying the constraints, the feasible tangent vectors along the folding trajectory reside within (Fig. 6).

ODS and KCM-based protein folding. The vector field is the “nearest” to while the folding control inputs satisfy the ellipsoidal constraint (20)

ODS and KCM-based protein folding. The vector field is the “nearest” to while the folding control inputs satisfy the ellipsoidal constraint (20)
The following lemma provides the necessary and sufficient conditions for existence of solutions to Eq. (23).
The pairsatisfying the conditions of Proposition 4.1 is called the TRS solution at. We provide an efficient GEP-based algorithm for computing this solution in Sec. 4.2.
4.2 Efficient Computation of Trust-Region Subproblems Solutions Using a Generalized Eigenvalue Problem.
To solve the TRS in Eq. (23) for folding control input synthesis, we employ the approach due to Adachi et al. [29], which relies on constructing a generalized eigenvalue problem (GEP).
and denote its largest eigenvalue by .
Algorithm 1 taken from Ref. [29] solves the TRS in Eq. (23). The algorithm can be explained as follows. The solutions to the TRS in Eq. (23) at each protein conformation can be separated into two types: (i) interior solutions with ; and, (ii) boundary solutions with .
If we have an interior solution at a protein conformation , then can be obtained by solving the linear system because . Accordingly, given a protein molecule conformation , we first compute the solution candidate . If it satisfies , then it is indeed the solution to the TRS in Eq. (23) at conformation . Additionally, due to Eq. (25c) in Lemma 4.1, .
On the other hand, if , then we are dealing with a boundary case at . In this case, one needs to solve for the rightmost eigenvalue of the matrix pencil , i.e., , and its associated eigenvector.
ifthen Interior solution |
else Boundary solution computes the rightmost eigenvalue of the matrix pencil in Eq. (26) and an eigenvector such that . |
endif |
ifthen Interior solution |
else Boundary solution computes the rightmost eigenvalue of the matrix pencil in Eq. (26) and an eigenvector such that . |
endif |
Remark 5 (Complexity of Algorithm 1). The computational cost of Algorithm 1 for computing the interior solutions is that of solving a linear system (e.g., the standard method of using the LU decomposition of ), which is in the worst case [49]. As demonstrated by Ref. [29], the computational complexity of the algorithm for boundary solution calculations scales as floating-point operations (flops) in the case of dense matrices and . However, for large and sparse matrices and , the cost becomes effectively equivalent to that of an Arnoldi-type method for computing a single rightmost eigenpair. This latter cost is typically proportional to a constant factor multiplied by the cost of a single matrix–vector product.
Finally, we summarize some of the key findings from Ref. [29] regarding the matrix pencils (26) and (27) in the next lemma. We will leverage these results to prove the important properties of our proposed TRS-based control law in Sec. 5.
Lemma 4.2 [29]. Consider a givenand assume that. Consider the matrix pencilsandin Eqs. (26) and (27), respectively. The following properties hold.
is a regular matrix pencil, namely, the determinant ofis non-zero for someand the number of its eigenvalues is equal to its size.
- The eigenvalues ofare the roots of the rational equationwhereare smooth functions of, and, , are the eigenvalues of the matrix pencil. Furthermore,is strictly decreasing on, whereis the largest eigenvalue of the matrix pencil.(28)
The largest real eigenvalue ofis, whereis the solution pair to the TRS in Eq. (23).
. Additionally,andinAlgorithm 1.
5 Existence and Uniqueness of Protein Folding Trajectories
In this section, we address the second objective of PFCP formulated in Sec. 3.3. Specifically, we investigate the properties of the TRS-based folding control input and provide sufficient conditions for existence and uniqueness of the folding trajectories under this input.
To facilitate subsequent analysis within this section, we leverage the closed-form solution provided by the following proposition. While Algorithm 1 offers a numerically stable scheme for computing the TRS-based pointwise optimal control law in Eq. (23), the closed-form solution is more suitable for further analysis.
We have the following proposition regarding the continuity of the TRS-based pointwise optimal control law given by Eq. (31) on and .
Proposition 5.2. The control input given by Eq. (31) is continuous on the open setfor all protein conformations. Moreover, this control input admits a continuous extension to the closure.
Proof. We prove continuity of on each of the sets and , respectively. Proposition 5.1 implies that on the open set . Hence, the control input is continuous on for all , where is a smooth function. On , which is given by (30b), it suffices to prove that is a continuous function of . For each , Lemma 4.2 (Property 3) states that is the largest real eigenvalue of the regular matrix pencil , which possesses 4N eigenvalues. Lemma 4.2 (Property 2) indicates that these eigenvalues are the roots of the rational equation given by Eq. (28). Also, due to Lemma 4.2 (Property 3), the function is strictly decreasing beyond . Since according to Lemma 4.2 (Property 4), is an isolated root of . Since depends smoothly on , we conclude that the isolated root is a continuous function of . Finally, the existence of continuous extension of to is trivial from Eq. (31).▪
which, due to Proposition 5.2, is a piecewise smooth vector field with potential discontinuities on the surface . In particular, whenever for some , the control input continuity will be lost.
To investigate the existence and uniqueness of protein folding trajectories induced by the piecewise continuous vector field (32), we need to invoke the notion of Filippov solutions [50]. In particular, we need to compute the Filippov set-valued map of the vector field in (32) and investigate the associated differential inclusion solutions.
Proof. For , Proposition 5.2 implies that is continuous. At each of these points, we have due to the consistency property of Filippov set-valued maps (see, e.g., Ref. [50]). At points of discontinuity of , that is, for , is the convex hull . This convex hull is equal to .▪
where is given by Lemma 5.3. If this solution exists, then it is an absolutely continuous trajectory satisfying (34) for almost all time instances in its maximal interval of existence.
The following proposition provides sufficient conditions for existence and uniqueness of protein folding pathway trajectories.
holds on, whereand, then there exists a unique Filippov solution of (34) starting from each initial condition.
6 Simulation Results
This section presents numerical simulations to assess the efficacy of our proposed TRS-based pointwise optimal control law for kinetostatic folding. The simulations are carried out by considering the KCM-based nonlinear control affine system in Eq. (10) with two different control inputs. The first folding control input given by Eq. (11) induces the vector field (12), whose forward Euler integration with time-step results in the conventional KCM iteration in Eq. (7). The second control input is the solution to the TRS (23), which are computed using Algorithm 1 at each conformation. All our simulation results in this section adhere to the principles of Protofold I [19,41] and have been run on an Intel Core i7-6770HQ CPU at 2.60 GHz.
Our simulations focus on convergence of unfolded protein conformations toward two prevalent secondary structures: -helices and -pleated sheets. The -helix (Fig. 7) is a tightly coiled right-handed arrangement of the amino acid chain, a common structural motif in proteins [2]. Similarly, -pleated sheets (Fig. 7) represent another essential class of protein structures, which are instrumental in protein folding due to their ability to facilitate reorientation of the polypeptide chain [51]. Notably, small proteins like chignolin, a synthetic mini-protein with only 10 amino acids, can exhibit robust beta-hairpin folding [52].
In all our simulations we consider four different folding control inputs labeled as KCM2 (resulting from the modified KCM approach introduced in Sec. 3.2), KCM (resulting from the conventional KCM approach), TRS-based KCM2 (resulting from the TRS in Eqs. (23) and (24)), and TRS-based KCM (resulting from solving the TRS for the conventional KCM reference vector field though with no theoretical guarantees of convergence). The KCM and KCM2 folding control inputs are given by Eqs. (11) and (14). These control inputs introduce artificial deviations in the natural rate of entropy change as discussed in Remark 3. On the other hand, the norm-constrained TRS-based control inputs prevent significant deviations in the natural rate of entropy change.
6.1 Simulations for Convergence to -Helix Structures.
The simulations for convergence to -helix structures were conducted on a protein backbone chain with a 22-dimensional dihedral angle space, namely, a backbone chain with peptide planes. To ensure uniform starting conditions for the two studied control inputs, the initial protein molecule conformation in all simulations was chosen as the same precoiled backbone chain in the vicinity of an -helix configuration. The parameter is chosen to be for the conventional KCM iteration. The parameters for the TRS-based KCM folding input are chosen to be , , , , and is a constant chemical conversion factor.
Figures 8–10 depict the numerical simulation results. The protein backbone chain free energy profiles during folding are demonstrated in Fig. 9. The norm of the folding control inputs during folding are shown in Fig. 10. While the conventional KCM folding schemes cannot guarantee a constrained control input satisfying the ellipsoidal constraint given by Eq. (20), the TRS-based schemes succeed in doing so.

The initial, transient, and final conformations for convergence to an -helix structure using the proposed TRS-based folding control input
6.2 Simulations for Convergence to -Pleated Sheet Structures.
The simulations for convergence to -pleated sheet structures (Fig. 7) were conducted on a protein backbone chain with a 22-dimensional dihedral angle space ( peptide planes).
To ensure uniform starting conditions for the two studied control inputs, the initial protein molecule conformation in all simulations was chosen as the same, located within the proximity of a -pleated sheet motif. The parameter is chosen to be for the conventional KCM iteration. The parameters for the TRS-based KCM folding input are chosen to be , , , , and is a constant chemical conversion factor.
Figures 11–13 depict the numerical simulation results. The protein backbone chain free energy profiles during folding are demonstrated in Fig. 12. The norm of the folding control inputs during folding are shown in Fig. 13. While the conventional KCM folding schemes cannot guarantee a constrained control input satisfying the ellipsoidal constraint given by (20), the TRS-based schemes succeed in doing so.

The initial, transient, and final conformations for convergence to a -pleated sheet structure using the proposed TRS-based folding control input
6.3 Role of Design Parameters and .
In our simulations, the matrices and play central roles in shaping both the objective function for the trust-region subproblem (TRS) and the ellipsoidal constraint that bounds the folding control input. Specifically, is a positive-definite matrix that appears in the quadratic cost term measuring “closeness” of the closed-loop vector field to the reference KCM flow. The choice of thus governs how aggressively (or conservatively) the control inputs deviate from the standard kinetostatic torque update while influencing the overall folding speed and its alignment with the KCM reference vector field. In contrast, is another positive-definite matrix that defines the ellipsoidal norm used for constraining .
From a physical perspective, the matrix encodes how different joint variations affect the entropy of the protein molecule; it thus helps capture subtle details such as anisotropic couplings among torsional degrees-of-freedom suggested by force-field analyses (see, e.g., Ref. [24]). Meanwhile, the matrix balances how tightly the system should follow the steepest-descent direction of the interatomic force field, tied to minimizing the free-energy function. Larger off-diagonal entries in , for instance, can emphasize correlated angle updates consistent with certain bonded interactions, whereas diagonal-dominant structures may bias the control to focus on angles individually. Choosing and further allows researchers to incorporate prior knowledge from experiments on protein folding rates or known structural motifs (e.g., -helices versus -sheets). If, for instance, the biochemistry literature indicates that certain bonds are significantly more sensitive to changes in conformational entropy, increasing the corresponding diagonal entries in can reflect a stricter constraint on those angles. This flexibility ensures that the simulated pathways do not introduce entropy changes at unphysical rates—an important consideration recognized in studies of kinetically constrained folding [6,24].
7 Conclusion
This work addressed the challenge of predicting protein folding pathways using the KCM framework while constraining the rate of entropy change. Solutions are computed from a GEP-based TRS algorithm, which is computationally efficient and does not require outer iterations. The synthesized folding control inputs induce a closed-loop vector field close to the original KCM reference vector field while ensuring a limited rate of molecule entropy change introduced by the folding control input.
7.1 Impact Beyond Protein Folding.
While motivated by the specific challenges of protein folding, the GEP-based TRS algorithm presented in this paper has broader applicability within control systems and robotics. Unlike traditional TRS algorithms (e.g., semidefinite programming) that rely on iterative computational routines with unpredictable convergence, the GEP-based TRS solver in the paper offers superior efficiency along with guarantees for piecewise continuity of the generated control inputs. The computational efficiency is particularly relevant to the robotics community, where Boyd and Rimon identified the need for GEP-based TRS solvers for robot collision avoidance (e.g., point-ellipsoid distance calculations) [53]. However, scalability limitations due to the lack of efficient large-scale eigensolvers hindered the adoption of these solvers at the time. Additionally, the development of fast iterative solvers for embedded applications (e.g., Refs. [54–56]) further shifted the focus away from GEP-based TRS methods. Notably, the prevalent approach in real-time nonlinear MPC by Diehl, Allgöwer, and collaborators leverages linearization and sequential QP techniques [57,58]. Recently, Pappas et al. Recognized the need for explicit TRS solvers in multiparametric nonlinear MPC and initiated research on solving TRSs with iterative active-set strategies [59]. Our work aligns with this renewed interest and offers a powerful alternative for control applications requiring efficient and scalable TRS solutions with guarantees of continuity for the generated control inputs.
Future research includes incorporating solvation effects, investigating protein unfolding subject to entropy loss constraints, and exploring the applicability of the current framework in studying the range of motion and mobility of peptide-based nanomachines.
Acknowledgment
We would like to express our gratitude to the Associate Editor and the reviewers for their constructive feedback, which has enabled us to refine several technical details in the paper.
Funding Data
National Science Foundation (NSF) (Award Nos. CMMI-2153744 and CMMI-2153901; Funder ID: 10.13039/100000084).