Abstract

External and internal convertible (EIC) form-based motion control is one of the effective designs for simultaneous trajectory tracking and balance of underactuated balance robots. Under certain conditions, the EIC-based control design is shown to lead to uncontrolled robot motion. To overcome this issue, we present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC modeling structure. Two GP-based learning controllers are presented by using the EIC property. The partial EIC (PEIC)-based control design partitions the robotic dynamics into a fully actuated subsystem and a reduced-order underactuated subsystem. The null-space EIC (NEIC)-based control compensates for the uncontrolled motion in a subspace, while the other closed-loop dynamics are not affected. Under the PEIC- and NEIC-based controls, the tracking and balance tasks are guaranteed, and a convergence rate and bounded errors are achieved without the uncontrolled motion caused by the original EIC-based control. We validate the results and demonstrate the GP-based learning control design using two inverted pendulum platforms.

1 Introduction

An underactuated balance robot possesses fewer control inputs than the number of degrees-of-freedom (DOFs) [1,2]. Motion control of underactuated balance robots requires both the trajectory tracking of the actuated subsystem and balance control of the unactuated, unstable subsystem [3–5]. Inverting the nonminimum phase unactuated nonlinear dynamics brings additional challenges in causal feedback control design. Several modeling and control methods have been proposed for these robots and their applications [4–10]. The orbital stabilization method was used for balancing underactuated robots [1,11–13], with applications to a bipedal robot [14] and a cart-inverted pendulum [1]. Energy shaping-based control was also designed for underactuated balance robots [15,16]. One feature of those methods is that the achieved balance-enforced trajectory is not unique and cannot be prescribed explicitly [1,11–13]. In Refs. [5] and [17], a simultaneous trajectory tracking and balance control of underactuated balance robots was proposed by using the property of the external and internal convertible (EIC) form of the robot dynamics. The EIC-based control has been demonstrated as one of the effective approaches to achieve fast convergence with guaranteed performance.

The above-mentioned control designs require an accurate model of the robot dynamics, and the control performance deteriorates under model uncertainties or external disturbances. Machine learning-based methods provide an efficient tool for robot modeling and control [18,19]. In particular, Gaussian process (GP) regression is an effective learning approach that generates a nearly analytical structure and bounded prediction errors [7,19–21]. Development of GP-based performance-guaranteed control for underactuated balance robots has been reported in Refs. [4], [20], and [22]. In Ref. [4], the control design was conducted in two steps. A GP-based inverse dynamics controller was used for the unactuated subsystem to achieve balance, and a model predictive control (MPC) was used to simultaneously track the given reference trajectory and estimate the balance equilibrium manifold (BEM). The GP prediction uncertainties were incorporated into the control design to enhance the control robustness. The work in Ref. [5] followed the sequential control design in the EIC-based framework, and the controller was adaptive to the prediction uncertainties. The training data were selected to reduce the computational complexity.

This work takes advantage of the structured GP modeling approach in Refs. [5] and [7] and presents an integration of EIC-based control with GP models. We first present the conditions under which uncontrolled motions exist under the original EIC-based control design for underactuated balance robots. We identify these conditions and design the stable GP-based learning control with the properly selected nominal robot dynamic model. Two different controllers, called partial- and null-space-EIC (i.e., PEIC- and NEIC), are presented to improve the closed-loop performance. The PEIC-based control constructs a virtual inertia matrix to reshape the dynamics coupling between the actuated and unactuated subsystems. The EIC-induced uncontrolled motion is eliminated, and the robotic system behaves as a combined fully actuated subsystem and a reduced-order unactuated subsystem. Alternatively, the compensation effect in the NEIC-based control is applied to the uncontrolled coordinates in the null space, while the other part of the stable system motion stays unchanged. The PEIC- and NEIC-based controls achieve guaranteed robust performance with a fast convergence of the closed-loop tracking errors.

The control tasks considered in this work include both trajectory tracking for the actuated subsystem and platform balance for the unstable subsystem. The interconnection between these two subsystems lies in an implicit dynamic relationship that needs to be estimated in real time. The control problem considered here differs from those in the literature. Most existing approaches, such as orbital stabilization and energy shaping, focus on stabilization only, that is, the trajectory of the actuated subsystem is not prescribed, and the main control task is to stabilize the unstable subsystem. The main contribution of this work lies in the new GP-based learning control of underactuated balance robots using the EIC structural properties. Compared with the approaches in Refs. [5] and [17], this work reveals underlying design properties and limitations of the original EIC-based control for underactuated balance robots. Compared with the work in Refs. [4] and [23], the proposed method takes advantage of the attractive EIC modeling properties for control design and does not use MPC, which is computationally demanding. Compared with other learning control methods such as reinforcement learning, the proposed control integrates the robot's dynamics property (i.e., EIC structure) and the GP-based model learning. By integrating physics knowledge into model learning, we identify the conditions for nominal model selection, and the proposed control is designed with guaranteed performance. This paper is an extension of the previous conference submission [24] with new design, analysis, and experiments. In particular, the NEIC-based control design and experiments were not presented in Ref. [24].

The rest of the paper is outlined as follows. We introduce the EIC-based control and present the problem statement in Sec. 2. Section 3 presents the GP-based robot dynamics. The PEIC- and NEIC-based controls are presented in Sec. 4. The stability analysis is discussed in Sec. 5. The experimental results are presented in Sec. 6, and finally Sec. 7 summarizes the concluding remarks.

2 External and Internal Convertible-Based Robot Control and Problem Statement

2.1 Robot Dynamics and External and Internal Convertible-Based Control.

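We consider an underactuated balance robot with $(n+m)$ DOFs, $n, m \in \mathbb{N}$, and the generalized coordinates are denoted as $q \in \mathbb{R}^{n+m}$. The robot dynamics $\mathcal{S}$ is expressed as
$$\mathcal{S}: D(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = Bu \tag{1}$$
where $D(q)$, $C(q,\dot{q})$, and $G(q)$ are the inertia matrix, Coriolis matrix, and gravity vector, respectively. $B$ denotes the input matrix, and $u \in \mathbb{R}^n$ is the control input. The coordinates are partitioned as $q = [q_a^T \; q_u^T]^T$, with actuated coordinate $q_a \in \mathbb{R}^n$ and unactuated coordinate $q_u \in \mathbb{R}^m$. We focus on the case $n \ge m$, and without loss of generality, we assume that $B = [I_n \; 0]^T$, where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix with dimension $n$. The robot dynamic model in Eq. (1) is rewritten as
$$\mathcal{S}_a: D_{aa}\ddot{q}_a + D_{au}\ddot{q}_u + C_a\dot{q} + G_a = u \tag{2a}$$
$$\mathcal{S}_u: D_{ua}\ddot{q}_a + D_{uu}\ddot{q}_u + C_u\dot{q} + G_u = 0 \tag{2b}$$

for the actuated ($\mathcal{S}_a$) and unactuated ($\mathcal{S}_u$) subsystems, respectively. Subscripts “aa (uu)” and “ua (au)” indicate the variables related to the actuated (unactuated) coordinates and coupling effects, respectively. For presentation convenience, we introduce $H = C\dot{q} + G$, $H_a = C_a\dot{q} + G_a$, and $H_u = C_u\dot{q} + G_u$, and the dependence of $D$, $C$, and $G$ on $q$ and $\dot{q}$ is dropped. Subsystems $\mathcal{S}_a$ and $\mathcal{S}_u$ are also referred to as the external and internal subsystems, respectively [4,17].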

The control goal is to steer the actuated coordinate $q_a$ to follow a given desired trajectory $q_a^d$ for $\mathcal{S}_a$, while the unactuated, unstable subsystem $\mathcal{S}_u$ is balanced at the unknown equilibrium $q_u^e$. Therefore, we need to estimate $q_u^e$ in real time to achieve simultaneous trajectory tracking (for $\mathcal{S}_a$) and platform balance (for $\mathcal{S}_u$). Note that not every trajectory can be followed given the underactuated dynamics and the balance requirement. Such a property has been explicitly discussed for the autonomous bikebot example in Ref. [25]. In this work, we assume that the given trajectory $q_a^d$ is well planned and the control exists. Designing and planning a feasible trajectory $q_a^d$ is out of the scope of this work.

The original EIC-based control design proceeds in two steps [5,17]. As shown in the top figure in Fig. 1(a), the first step is to identify and estimate the unknown equilibrium $q_u^e$ under an external trajectory tracking control. With the estimated $q_u^e$, the external control design is updated to achieve simultaneous trajectory tracking and balancing. Following this concept, we first design the external input $u^{ext}$ to follow $q_a^d$ by temporarily neglecting $\mathcal{S}_u$, namely,
$$u^{ext} = D_{aa}v^{ext} + D_{au}\ddot{q}_u + H_a \tag{3}$$
where $v^{ext} = \ddot{q}_a^d - k_{d1}\dot{e}_a - k_{p1}e_a$ is the auxiliary input under which the tracking error $e_a = q_a - q_a^d$ converges to the origin, and $k_{p1}, k_{d1}$ are diagonal matrices with positive elements. Assuming that $u^{ext}$ is applied to $\mathcal{S}$ and $\mathcal{S}_a$ follows $q_a^d$, $q_u$ should keep balance around its equilibrium, which is, however, unknown. The BEM is then introduced to capture the equilibrium of $q_u$ under $\ddot{q}_a = v^{ext}$, namely,
$$\mathcal{E} = \{q_u^e : \Gamma(q_u; v^{ext}) = 0, \; \dot{q}_u = \ddot{q}_u = 0\} \tag{4}$$

where $\Gamma(q_u; v^{ext}) = D_{uu}\ddot{q}_u + D_{ua}v^{ext} + H_u$. $q_u^e$ is obtained by inverting $\Gamma_0 = \Gamma(q_u; v^{ext})|_{\dot{q}_u = \ddot{q}_u = 0} = 0$. Obtaining $q_u^e$ requires accurate system dynamics and needs to invert the nonminimum phase dynamics $\mathcal{S}_u$, which is challenging for noncausal control design.

Fig. 1
Illustrative diagrams for (a) the original EIC-based control, (b) the PEIC-control design, and (c) the NEIC-based control design. The top row shows the general idea for the control design, and the bottom row illustrates the information flow in the design. In (a), the dashed line indicates the design flow, and the solid line indicates the control flow.
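To stabilize $q_u$ onto $\mathcal{E}$, the $q_a$ motion is updated as
$$v^{int} = -D_{ua}^+(H_u + D_{uu}v_u^{int}) \tag{5}$$
where $D_{ua}^+ = (D_{ua}^T D_{ua})^{-1}D_{ua}^T$ is the generalized inverse of $D_{ua}$, $v_u^{int} = \ddot{q}_u^e - k_{d2}\dot{e}_u - k_{p2}e_u$ is the auxiliary control that drives the error $e_u = q_u - q_u^e$ toward zero, and $k_{p2}, k_{d2}$ are diagonal matrices with positive elements. The final control is obtained by replacing $v^{ext}$ in Eq. (3) with $v^{int}$ in Eq. (5), that is,
$$u^{int} = D_{aa}v^{int} + D_{au}\ddot{q}_u + H_a \tag{6}$$

where $v^{int}$ is used as the virtual control input in $\mathcal{S}_u$, that is, under $\ddot{q}_a = v^{int}$, $\ddot{q}_u = v_u^{int}$.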
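The two-step construction above can be checked numerically. The following is a minimal sketch, assuming that Eq. (5) reads $v^{int} = -D_{ua}^+(H_u + D_{uu}v_u^{int})$ and Eq. (6) reads $u^{int} = D_{aa}v^{int} + D_{au}\ddot{q}_u + H_a$, which is what results from setting $\ddot{q}_a = v^{int}$ and $\ddot{q}_u = v_u^{int}$ in the partitioned dynamics. All numerical values and function names are hypothetical, and `np.linalg.pinv` (the Moore-Penrose generalized inverse) stands in for $D_{ua}^+$.

```python
import numpy as np

def eic_virtual_input(D_ua, D_uu, H_u, v_u_int):
    # Assumed form of Eq. (5): v_int = -D_ua^+ (H_u + D_uu v_u_int).
    return -np.linalg.pinv(D_ua) @ (H_u + D_uu @ v_u_int)

def eic_input(D_aa, D_au, H_a, v_int, qdd_u):
    # Assumed form of Eq. (6): u_int = D_aa v_int + D_au qdd_u + H_a.
    return D_aa @ v_int + D_au @ qdd_u + H_a

# Hypothetical numbers with n = 2 actuated and m = 1 unactuated coordinates.
n = 2
D = np.array([[2.0, 0.3, 0.5],
              [0.3, 1.5, 0.4],
              [0.5, 0.4, 1.0]])
D_aa, D_au = D[:n, :n], D[:n, n:]
D_ua, D_uu = D[n:, :n], D[n:, n:]
H_a, H_u = np.array([0.1, -0.2]), np.array([0.7])
v_u_int = np.array([-1.2])  # auxiliary balance input for q_u

v_int = eic_virtual_input(D_ua, D_uu, H_u, v_u_int)
# Here the unactuated acceleration is taken equal to v_u_int; on hardware
# it would come from measurements.
u_int = eic_input(D_aa, D_au, H_a, v_int, v_u_int)

# Forward dynamics D qdd + H = B u under u_int.
qdd = np.linalg.solve(D, np.concatenate([u_int - H_a, -H_u]))
print(qdd[:n] - v_int, qdd[n:] - v_u_int)  # both differences should be ~0
```

In this toy case the closed loop reproduces $\ddot{q}_a = v^{int}$ and $\ddot{q}_u = v_u^{int}$, which is the sense in which $v^{int}$ acts as a virtual input for the unactuated subsystem.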

Figure 1(a) illustrates the above sequential EIC-based control design. It has been shown in Ref. [17] that the control $u^{int}$ guarantees that both $e_a$ and $e_u$ converge exponentially to a neighborhood of the origin if the high-order approximation terms of the closed-loop system are affine in the error $e$. Therefore, the EIC-based control achieves trajectory tracking for $\mathcal{S}_a$ and the balancing task for $\mathcal{S}_u$ simultaneously.

2.2 Motion Property Under External and Internal Convertible-Based Control.

Control design (5) uses a mapping from a low-dimensional ($m$) to a high-dimensional ($n$) space (i.e., $n \ge m$). Under control (6) with properly selected control gains, it has been shown in Ref. [17] that there exists a finite time $T > 0$ such that, for a small number $\epsilon > 0$, $\|q_u(t) - q_u^e(t)\| < \epsilon$ for $t > T$. Therefore, given the negligible error, we obtain $D_{ua}(q_a, q_u) \approx D_{ua}(q_a, q_u^e)$.

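For $\mathcal{S}$ in Eq. (2), if $\mathrm{rank}(D_{au}) = m$ for all $q$, applying singular value decomposition (SVD) to $D_{ua}$ and $D_{ua}^+$, we obtain
$$D_{ua} = U\Lambda V^T, \quad D_{ua}^+ = V\Lambda^+ U^T \tag{7}$$

where $U = [u_1, \ldots, u_m] \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices, $\Lambda = [\Lambda_m \; 0] \in \mathbb{R}^{m \times n}$, $\Lambda^+ = [\Lambda_m^{-1} \; 0]^T \in \mathbb{R}^{n \times m}$, and $\Lambda_m = \mathrm{diag}(\sigma_1, \ldots, \sigma_m)$ with singular values $\sigma_i > 0$, $i = 1, \ldots, m$. We partition $V$ into the block matrix $V = [V_m \; V_n]$, $V_m \in \mathbb{R}^{n \times m}$ and $V_n \in \mathbb{R}^{n \times (n-m)}$. Since $\mathrm{rank}(D_{au}) = m$, the null space of $D_{ua}$ is $\ker(D_{ua}) = \mathrm{span}(V_n)$.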
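The decomposition can be illustrated with a short numerical sketch. The matrix below is a hypothetical $D_{ua}$ with $m = 1$ and $n = 3$; the variable names are chosen here for illustration. The sketch builds $\Lambda$, $\Lambda^+$, $V_m$, and $V_n$ from `numpy.linalg.svd` and shows that $V_n$ spans the null space of $D_{ua}$ and is annihilated by $D_{ua}^+$.

```python
import numpy as np

# Hypothetical coupling block with m = 1 unactuated and n = 3 actuated DOFs.
D_ua = np.array([[0.5, 0.4, -0.2]])
m, n = D_ua.shape

# Full SVD: U is m x m, s holds the m singular values, Vt is n x n.
U, s, Vt = np.linalg.svd(D_ua, full_matrices=True)
V = Vt.T
V_m, V_n = V[:, :m], V[:, m:]          # V = [V_m  V_n]

Lam = np.zeros((m, n))
Lam[:, :m] = np.diag(s)                # Lambda = [Lambda_m  0]
Lam_plus = np.zeros((n, m))
Lam_plus[:m, :] = np.diag(1.0 / s)     # Lambda^+ = [Lambda_m^{-1}  0]^T

D_ua_plus = V @ Lam_plus @ U.T         # generalized inverse via the SVD

print(np.abs(D_ua @ V_n).max())        # ~0: span(V_n) is ker(D_ua)
print(np.abs(V_n.T @ D_ua_plus).max()) # ~0: D_ua^+ has no component in span(V_n)
```

The second printed quantity is the reason any input of the form $D_{ua}^+ x$ leaves the coordinates along $V_n$ without a control component.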

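The column vectors of matrix $V$ serve as a complete basis of $\mathbb{R}^n$, and we introduce a coordinate transformation $\Upsilon: x \mapsto V^T x$ for $x \in \mathbb{R}^n$. Clearly, $\Upsilon$ is a linear, time-varying, smooth map. Applying $\Upsilon$ to $q_a$ and $v^{ext}$, we have
$$p_a = \Upsilon(q_a) = V^T q_a, \quad \nu^{ext} = \Upsilon(v^{ext}) = V^T v^{ext} \tag{8}$$

where $p_a = [p_{am}^T \; p_{an}^T]^T$, $\nu^{ext} = [(\nu_m^{ext})^T \; (\nu_n^{ext})^T]^T$, and $p_{am}, \nu_m^{ext} \in \mathbb{R}^m$, $p_{an}, \nu_n^{ext} \in \mathbb{R}^{n-m}$. Note that $[p_a^T \; q_u^T]^T$ still serves as a complete set of generalized coordinates for $\mathcal{S}$. Using the new coordinate $p_a$, we have the following motion property under the original EIC-based control for $\mathcal{S}$, and the proof is given in Appendix A1.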

Lemma 1. For $\mathcal{S}$ in Eq. (2), if $\mathrm{rank}(D_{au}) = m$ holds for $q$ and all $n$ control inputs appear in the $\mathcal{S}_u$ dynamics (through $\ddot{q}_a$), under the EIC-based control (6), the BEM in Eq. (4) is associated with only $\nu_m^{ext}$, and the robot dynamics can be written into
(9a)
(9b)
(9c)

No control input appears for coordinates in $\ker(D_{ua})$ as shown in Eq. (9b), and only $m$ actuated coordinates in $\mathrm{span}(V_m)$ are under active control, as shown in Eq. (9a). The results in Lemma 1 reveal the motion property of $\mathcal{S}$ under the original EIC-based control design. The uncontrolled motion happens to a special set of underactuated balance robots under the conditions in Lemma 1. If the unactuated motion is only related to $m$ (out of $n$) control inputs, the motion (9b) vanishes, and the EIC-based control works well. In Ref. [5], the EIC-based control worked properly for the rotary inverted pendulum with $n = m = 1$. In Refs. [4] and [25], the EIC-based control also worked well for the bikebot with $n = 2$ (planar motion) and $m = 1$ (roll motion), but the roll motion depends on the steering control only, that is, not on the velocity control, and therefore the bikebot does not satisfy the condition for Lemma 1. We will show an example of the three-link inverted pendulum platform that demonstrates the uncontrolled motion under the original EIC-based control in Sec. 6.

With the above-discussed motion property under the EIC-based control, we consider the following problem.

Problem Statement: The goal of robot control is to design an enhanced EIC-based learning control that drives the actuated coordinate $q_a$ to follow a given profile $q_a^d$ and simultaneously stabilizes the unactuated coordinate $q_u$ on the estimated $q_u^e$. The uncontrolled motion presented in Lemma 1 should be avoided for robot dynamics (2).

3 Gaussian Process-Based Robot Dynamics Model

We build a GP-based robot dynamics model that will be used for control design in Sec. 4.

3.1 Gaussian Process-Based Robot Dynamics Model.

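To keep the presentation self-contained, we briefly review the GP regression model. We consider a multivariate smooth function $y = f(x) + w$, $x \in \mathbb{R}^{n_x}$, where $w$ is zero-mean Gaussian noise and $n_x$ is the dimension of $x$. Denote the training data as $\mathcal{D} = \{X, Y\} = \{x_i, y_i\}_{i=1}^N$, where $X = \{x_i\}_{i=1}^N$, $Y = \{y_i\}_{i=1}^N$, and $N$ is the number of data points. The GP model is trained by maximizing the posterior probability $p(Y; X, \Theta)$ over the hyperparameters $\Theta$, that is, $\Theta$ is obtained by solving (up to an additive constant in the negative log-likelihood)
$$\min_{\Theta} \; \tfrac{1}{2}Y^T K^{-1} Y + \tfrac{1}{2}\log\det(K)$$
where $K = (K_{ij})$, $K_{ij} = k(x_i, x_j) = \sigma_f^2 \exp(-\tfrac{1}{2}(x_i - x_j)^T W (x_i - x_j)) + \vartheta^2 \delta_{ij}$, $W = \mathrm{diag}\{W_1, \ldots, W_{n_x}\} > 0$, $\delta_{ij} = 1$ for $i = j$ and $0$ otherwise, and $\Theta = \{W, \sigma_f, \vartheta\}$ are the hyperparameters.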

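The GP agent builds the joint distribution of the output $y^*$ at a new measurement $x^*$ and the training data as
$$\begin{bmatrix} Y \\ y^* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K & k^T \\ k & k^* \end{bmatrix}\right) \tag{10}$$
where $k = k(x^*, X)$ and $k^* = k(x^*, x^*)$, and $\mathcal{N}(\mu, \Sigma)$ denotes the Gaussian distribution with mean $\mu$ and variance $\Sigma$. The mean value and variance for input $x^*$ are
$$\mu(x^*) = kK^{-1}Y, \quad \Sigma(x^*) = k^* - kK^{-1}k^T \tag{11}$$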
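As an illustration of the regression step, the sketch below evaluates the standard GP posterior mean $kK^{-1}Y$ and variance $k^* - kK^{-1}k^T$ with the squared-exponential kernel and noise term described above. It assumes fixed hyperparameters (no likelihood optimization) and a scalar output; the function names, data, and hyperparameter values are hypothetical.

```python
import numpy as np

def se_kernel(A, B, W, sigma_f):
    # k(a, b) = sigma_f^2 * exp(-0.5 * (a - b)^T diag(W) (a - b)), noise excluded.
    diff = A[:, None, :] - B[None, :, :]
    return sigma_f**2 * np.exp(-0.5 * np.einsum("ijk,k,ijk->ij", diff, W, diff))

def gp_predict(X, Y, x_star, W, sigma_f, noise_std):
    # Gram matrix with the noise term vartheta^2 * delta_ij on the diagonal.
    K = se_kernel(X, X, W, sigma_f) + noise_std**2 * np.eye(len(X))
    k = se_kernel(x_star[None, :], X, W, sigma_f)   # row vector, shape (1, N)
    k_ss = sigma_f**2 + noise_std**2                # k(x*, x*)
    mean = k @ np.linalg.solve(K, Y)                # k K^{-1} Y
    var = k_ss - k @ np.linalg.solve(K, k.T)        # k* - k K^{-1} k^T
    return mean.item(), var.item()

# Hypothetical one-dimensional training set sampled from sin(x).
X = np.linspace(0.0, 3.0, 8)[:, None]
Y = np.sin(X[:, 0])
W = np.array([4.0])

mean_near, var_near = gp_predict(X, Y, X[3], W, 1.0, 1e-3)
mean_far, var_far = gp_predict(X, Y, np.array([100.0]), W, 1.0, 1e-3)
print(mean_near, var_near)  # close to sin(X[3]) with a small variance
print(mean_far, var_far)    # reverts to the zero prior mean, variance near sigma_f^2
```

The variance grows away from the training data, which is the quantity the later control gains are scheduled on.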
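We integrate the GP regression with a nominal model. For $\mathcal{S}$ in Eq. (1), we first build a nominal model
$$\mathcal{S}^n: \bar{D}\ddot{q} + \bar{H} = Bu \tag{12}$$
where $\bar{D}$ and $\bar{H}$ are the nominal inertia and nonlinear matrices, respectively. Generally, the nominal dynamic model does not hold for the data sampled from the physical robot system. The GP models are built to capture the difference between $\mathcal{S}^n$ and $\mathcal{S}$, namely,
$$H^e = D\ddot{q} + H - \bar{D}\ddot{q} - \bar{H} = Bu - \bar{D}\ddot{q} - \bar{H}.$$
We build GP models to estimate $H^e = [(H_a^e)^T \; (H_u^e)^T]^T$, where $H_a^e$ and $H_u^e$ are for $\mathcal{S}_a$ and $\mathcal{S}_u$, respectively. The training data $\mathcal{D} = \{X, Y\}$ are sampled from $\mathcal{S}$ as $X = \{q, \dot{q}, \ddot{q}\}$ and $Y = \{H^e\}$.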

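The GP predicted mean and variance are denoted as $(\mu_i(x), \Sigma_i(x))$ for $H_i^e$, $i = a, u$. The GP-based robot dynamics models $\mathcal{S}_a^{gp}$ and $\mathcal{S}_u^{gp}$ are given as
$$\mathcal{S}_a^{gp}: \bar{D}_{aa}\ddot{q}_a + \bar{D}_{au}\ddot{q}_u + H_a^{gp} = u \tag{13a}$$
$$\mathcal{S}_u^{gp}: \bar{D}_{ua}\ddot{q}_a + \bar{D}_{uu}\ddot{q}_u + H_u^{gp} = 0 \tag{13b}$$
where $H_i^{gp} = \bar{H}_i + \mu_i(x)$, $i = a, u$. The GP-based model prediction error is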
(14)

To quantify the GP prediction error, the following property for Δ is obtained directly from Theorem 6 in Ref. [26].

Lemma 2. Given training dataset $\mathcal{D}$, if the kernel function $k(x_i, x_j)$ is chosen such that $H_a^e$ for $\mathcal{S}_a$ has a finite reproducing kernel Hilbert space norm $\|H_a^e\|_k < \infty$, for given $0 < \eta_a < 1$,
(15)

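where $\Pr\{\cdot\}$ denotes the probability of an event, $\kappa_a \in \mathbb{R}^n$, and its $i$th entry is $\kappa_{ai} = \sqrt{2\|H_{a,i}^e\|_k^2 + 300\varsigma_i \ln^3((N+1)/(1 - \eta_a^{1/n}))}$, $\varsigma_i = \max_{x, x' \in X} \tfrac{1}{2}\ln|1 + \vartheta_i^{-2}k_i(x, x')|$. A similar conclusion holds for $\Delta_u$ with $0 < \eta_u < 1$.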

3.2 Nominal Model Selection.

The nominal model plays an important role in the EIC control. We consider the following conditions for choosing the nominal model Sn to overcome the uncontrolled motion under the learning control.

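C1: $\bar{D} = \bar{D}^T$ is positive definite, $\|\bar{D}\| \le d$, $\|\bar{H}\| \le h$, where constants $0 < d, h < \infty$;

C2: $\mathrm{rank}(\bar{D}_{aa}) = n$, $\mathrm{rank}(\bar{D}_{uu}) = \mathrm{rank}(\bar{D}_{ua}) = m$; and

C3: nonconstant kernel of $\bar{D}_{ua}$.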

With C1 and C2, the generalized inverses of $\bar{D}_{aa}$, $\bar{D}_{uu}$, and $\bar{D}_{au}$ exist, which are used to compute the auxiliary controls. We can select $\bar{D} = \bar{D}^T$ to ensure $\bar{D}_{au} = \bar{D}_{ua}^T$. To see the requirement of C3, we rewrite $q_a = \sum_{i=1}^{n} p_{ai}v_i$. By Eq. (9), under the updated control $v^{int}$, $\ddot{q}_a = \sum_{i=1}^{m} \ddot{p}_{ai}v_i + \sum_{i=m+1}^{n} \ddot{p}_{ai}v_i$, where $v_i$ is the $i$th column of $V$. Note that the part $\sum_{i=m+1}^{n} \ddot{p}_{ai}v_i$ of the $\mathcal{S}_a$ dynamics is free of control if $V$ is constant. Although $q_u$ is stabilized on $q_u^e$, $q_a$ converges to $q_a^d$ only in an $m$-dimensional subspace, and the other $(n-m)$-dimensional motion is uncontrolled. If the system is stable, the uncontrolled motion cannot be fixed in the configuration space throughout the entire control process. Therefore, a nonconstant kernel of $\bar{D}_{ua}$ is needed.

Conditions C1–C3 provide sufficient nominal model selection criteria. The commonly used nominal model in Refs. [5] and [7] is $\bar{D}\ddot{q} = Bu$ with $\bar{H} = 0$. The constant nominal model is used in Ref. [7] as the system is fully actuated. It is not difficult to satisfy the nominal model conditions in practice. First, the nonlinear term is canceled by feedback linearization, and $\bar{H} = 0$ can be used. Matrix $\bar{D}$ captures the robot's inertia property. The mass and length of robot links are usually available or can be measured. Meanwhile, the dynamics coupling for revolute joints shows up in the inertia matrix as trigonometric functions of the relative joint angles. Therefore, the diagonal elements can be filled with mass or inertia estimates, and the off-diagonal entries can be constructed with trigonometric functions multiplying inertia constants.
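The construction recipe in this paragraph can be sketched for a hypothetical robot with $n = 2$ actuated and $m = 1$ unactuated revolute joints. The inertia constants, the choice of relative angles, and the helper names below are illustrative assumptions, not the nominal models used in the experiments. The sketch checks symmetry and positive definiteness (part of C1), the rank conditions (C2), and that the kernel direction of $\bar{D}_{ua}$ changes with the configuration (C3).

```python
import numpy as np

def nominal_inertia(q, J=(1.0, 1.0, 0.5), b=(0.3, 0.2)):
    # Diagonal: rough inertia estimates; off-diagonal: inertia constants times
    # cosines of relative joint angles (hypothetical structure).
    q1, q2, q3 = q
    c1, c2 = np.cos(q3 - q1), np.cos(q3 - q2)
    return np.array([[J[0], 0.0, b[0] * c1],
                     [0.0, J[1], b[1] * c2],
                     [b[0] * c1, b[1] * c2, J[2]]])

def check_c1_c2(D_bar, n):
    m = D_bar.shape[0] - n
    sym_pd = np.allclose(D_bar, D_bar.T) and np.all(np.linalg.eigvalsh(D_bar) > 0)
    ranks_ok = (np.linalg.matrix_rank(D_bar[:n, :n]) == n
                and np.linalg.matrix_rank(D_bar[n:, n:]) == m
                and np.linalg.matrix_rank(D_bar[n:, :n]) == m)
    return bool(sym_pd), bool(ranks_ok)

def kernel_basis(D_ua):
    # Orthonormal basis of ker(D_ua) from the trailing right singular vectors.
    _, _, Vt = np.linalg.svd(D_ua)
    return Vt[D_ua.shape[0]:].T

D1 = nominal_inertia((0.0, 0.0, 0.2))
D2 = nominal_inertia((0.5, -0.3, 0.2))
print(check_c1_c2(D1, 2), check_c1_c2(D2, 2))

# C3: the kernel direction differs between the two configurations
# (|cosine of the angle between them| is below 1).
cos_angle = abs((kernel_basis(D1[2:, :2]).T @ kernel_basis(D2[2:, :2])).item())
print(cos_angle)
```

With a constant $\bar{D}$ the last quantity would equal one for every pair of configurations, which is the case C3 rules out.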

4 Gaussian Process-Enhanced External and Internal Convertible-Based Control

In this section, we propose two enhanced controllers using the GP model Sgp, i.e., PEIC- and NEIC-based control. The PEIC-based control aims to eliminate uncontrolled motion under the original EIC-based control by reassigning the dynamics coupling, while the NEIC-based control directly manages the uncontrolled motion in a transformed space; see Figs. 1(b) and 1(c).

4.1 Robust Auxiliary Control.

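With $\mathcal{S}^{gp}$, we incorporate the variance from $\mathcal{S}_a^{gp}$ into the tracking control as
$$\hat{v}^{ext} = \ddot{q}_a^d - \hat{k}_{p1}e_a - \hat{k}_{d1}\dot{e}_a \tag{16}$$

where $\hat{k}_{p1} = k_{p1} + k_{n1}\Sigma_a$ and $\hat{k}_{d1} = k_{d1} + k_{n2}\Sigma_a$ are control gains with parameters $k_{n1}, k_{n2} \ge 0$. The variance of the GP prediction $\Sigma_a$ captures the uncertainty in the robot dynamics and is updated online with sensor measurements.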
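A minimal sketch of the variance-dependent gain scheduling is given below. It assumes that the auxiliary input keeps the PD form of $v^{ext}$ in Sec. 2.1 with the gains replaced by $\hat{k}_{p1}$ and $\hat{k}_{d1}$; the function name, error values, and variance value are hypothetical, while the gain constants mirror those reported for the rotary pendulum in Sec. 6.1.

```python
import numpy as np

def robust_aux_input(qdd_ad, e_a, de_a, Sigma_a, kp1, kd1, kn1, kn2):
    # Gains grow with the GP predictive variance Sigma_a.
    kp_hat = kp1 + kn1 * Sigma_a
    kd_hat = kd1 + kn2 * Sigma_a
    # Assumed PD form: v_ext_hat = qdd_a^d - kp_hat e_a - kd_hat de_a.
    v_ext_hat = qdd_ad - kp_hat @ e_a - kd_hat @ de_a
    return v_ext_hat, kp_hat, kd_hat

# Single actuated joint (n = 1): kp_hat = 10 + 50*Sigma_a, kd_hat = 3 + 10*Sigma_a.
kp1, kd1 = np.array([[10.0]]), np.array([[3.0]])
kn1, kn2 = 50.0, 10.0
Sigma_a = np.array([[0.1]])            # hypothetical predictive variance

v_hat, kp_hat, kd_hat = robust_aux_input(
    np.array([0.5]), np.array([0.2]), np.array([-0.1]),
    Sigma_a, kp1, kd1, kn1, kn2)
print(kp_hat, kd_hat, v_hat)
```

When the prediction is confident ($\Sigma_a \to 0$) the gains fall back to their nominal values, and they stiffen as the model uncertainty increases.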

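Given the GP-based dynamics, the BEM is estimated by solving the following optimization problem rather than by inverting the system dynamics:
$$\hat{q}_u^e = \arg\min_{q_u} \|\Gamma_0(q_u; \hat{v}^{ext})\| \tag{17}$$
The balance control is then designed as
$$\hat{v}_u^{int} = \ddot{\hat{q}}_u^e - \hat{k}_{p2}\hat{e}_u - \hat{k}_{d2}\dot{\hat{e}}_u \tag{18}$$

where $\hat{e}_u = q_u - \hat{q}_u^e$ is the unactuated subsystem tracking error relative to the estimated BEM. Similar to $\hat{k}_{p1}$ and $\hat{k}_{d1}$, $\hat{k}_{p2} = k_{p2} + k_{n3}\Sigma_u$ and $\hat{k}_{d2} = k_{d2} + k_{n4}\Sigma_u$ depend on $\Sigma_u$ with the parameters $k_{n3}, k_{n4} \ge 0$.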
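The BEM estimation step can be illustrated with a toy problem. The sketch below minimizes $\|\Gamma_0(q_u)\|$ by Gauss-Newton iterations with a finite-difference Jacobian, warm-started from an initial guess. The solver choice and the cart-pole-like $\Gamma_0$ are assumptions made here for illustration (the source does not specify the optimization algorithm), and in the learning controller $\Gamma_0$ would be evaluated with the GP-based model.

```python
import numpy as np

def estimate_bem(gamma0, q0, iters=20, eps=1e-6):
    # Gauss-Newton iterations for min_q ||gamma0(q)||, warm-started at q0.
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(iters):
        r = gamma0(q)
        # Forward-difference Jacobian, one column per coordinate of q.
        J = np.column_stack([(gamma0(q + eps * e) - r) / eps
                             for e in np.eye(q.size)])
        q = q - np.linalg.lstsq(J, r, rcond=None)[0]
    return q

# Hypothetical unactuated dynamics at rest (cart-pole-like, m = 1):
# Gamma_0(q_u; v) = A cos(q_u) v - g sin(q_u).
G_ACC, A, V_EXT = 9.81, 1.0, 2.0

def gamma0(q_u):
    return A * np.cos(q_u) * V_EXT - G_ACC * np.sin(q_u)

q_u_e = estimate_bem(gamma0, q0=np.zeros(1))
print(q_u_e, np.arctan(A * V_EXT / G_ACC))  # numerical vs. closed-form equilibrium
```

For this toy model the equilibrium has the closed form $\arctan(A v / g)$, which gives a reference value for the numerical estimate.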

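Let $\Delta q_u^e = q_u^e - \hat{q}_u^e$ denote the BEM estimation error, so that the actual BEM is $q_u^e = \hat{q}_u^e + \Delta q_u^e$. The control design based on the actual BEM should be $v_u^{int} = \ddot{q}_u^e - \hat{k}_{p2}e_u - \hat{k}_{d2}\dot{e}_u$, and therefore, using $e_u = \hat{e}_u - \Delta q_u^e$, we have
$$v_u^{int} = \hat{v}_u^{int} + \Delta v_u^{int}$$
where $\Delta v_u^{int} = \Delta\ddot{q}_u^e + \hat{k}_{p2}\Delta q_u^e + \hat{k}_{d2}\Delta\dot{q}_u^e$. There are two sources of the BEM estimation error. First, the learned dynamics $\mathcal{S}_u^{gp}$ deviates from the actual one due to the prediction error $\Delta_u$. Therefore, the exact BEM solution using $\mathcal{S}_u^{gp}$ deviates from that obtained in Eq. (4). Second, there exist differences between the BEM solved from $\mathcal{S}_u^{gp}$ and that obtained from Eq. (17) due to the optimization algorithm. Given the bounded GP prediction error and limited optimization error, it is reasonable to assume that $\Delta q_u^e$ is bounded. Because of the bounded Gaussian kernel function, the GP prediction variances are also bounded, i.e.,
$$\|\Sigma_a\| \le (\sigma_a^{max})^2, \quad \|\Sigma_u\| \le (\sigma_u^{max})^2 \tag{19}$$
where $\sigma_a^{max} = \max_i(\sigma_{f_{ai}}^2 + \vartheta_{ai}^2)^{1/2}$, $\sigma_u^{max} = \max_i(\sigma_{f_{ui}}^2 + \vartheta_{ui}^2)^{1/2}$, and $\sigma_f$ and $\vartheta$ are the hyperparameters in each channel. Furthermore, we require the control gains to satisfy the following bounds: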

for constants $k_{pj}, k_{dj} > 0$, $j = 1, \ldots, 4$, where $\lambda(\cdot)$ denotes the eigenvalue operator.

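The control design should follow two guidelines: (1) the $p_{am}$ and $q_u$ dynamics are preserved (since they are stable under the original EIC-based control), and (2) the uncontrolled motion (in $\mathcal{S}_a^{gp}$) is either eliminated or placed under active control. The second requirement also implies that the motion of $q_u$ should depend on only $m$ control inputs. To see this, solving $\ddot{q}_a = \bar{D}_{aa}^{-1}(u - \bar{D}_{au}\ddot{q}_u - H_a^{gp})$ from $\mathcal{S}_a^{gp}$ and plugging it into $\mathcal{S}_u^{gp}$ yields
$$(\bar{D}_{uu} - \bar{D}_{ua}\bar{D}_{aa}^{-1}\bar{D}_{au})\ddot{q}_u + H_u^{gp} - \bar{D}_{ua}\bar{D}_{aa}^{-1}H_a^{gp} = -\bar{D}_{ua}\bar{D}_{aa}^{-1}u.$$
Note that $\bar{D}_{ua} \in \mathbb{R}^{m \times n}$, $\bar{D}_{aa}^{-1} \in \mathbb{R}^{n \times n}$, and $q_u$ is overactuated given $n = \dim(u) \ge m = \dim(q_u)$. If $q_u$ is to depend on the same number of control inputs, $(n-m)$ column vectors in $\bar{D}_{ua}\bar{D}_{aa}^{-1}$ should be zero. The EIC-based control is then applied between the same number of actuated and unactuated coordinates, and the uncontrolled motion is avoided.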

4.2 Partial External and Internal Convertible-Based Control Design.

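The control design $v^{int}$ in Eq. (5) updates the input $v^{ext}$, and $\ddot{q}_a$ acts as a virtual control to steer $q_u$ to $q_u^e$. The $\mathcal{S}_u$ dynamics is rewritten into
$$D_{uu}\ddot{q}_u + H_u = -D_{ua}\ddot{q}_a$$
where $q_u$ is overactuated with respect to $\ddot{q}_a$. We instead reallocate the coupling between $q_a$ and $q_u$ and assign $m$ control inputs for the unactuated subsystem; see Fig. 1(b). To achieve such a goal, we partition the actuated coordinates as $q_a = [q_{aa}^T \; q_{au}^T]^T$, $q_{au} \in \mathbb{R}^m$, $q_{aa} \in \mathbb{R}^{n-m}$, and $u = [u_a^T \; u_u^T]^T$. The $\mathcal{S}^{gp}$ dynamics in Eq. (13) is rewritten as
$$\begin{bmatrix} \bar{D}_{aa}^{a} & \bar{D}_{aa}^{au} & \bar{D}_{au}^{a} \\ \bar{D}_{aa}^{ua} & \bar{D}_{aa}^{u} & \bar{D}_{au}^{u} \\ \bar{D}_{ua}^{a} & \bar{D}_{ua}^{u} & \bar{D}_{uu} \end{bmatrix} \begin{bmatrix} \ddot{q}_{aa} \\ \ddot{q}_{au} \\ \ddot{q}_u \end{bmatrix} + \begin{bmatrix} H_{aa}^{gp} \\ H_{au}^{gp} \\ H_u^{gp} \end{bmatrix} = \begin{bmatrix} u_a \\ u_u \\ 0 \end{bmatrix} \tag{20}$$
where all block matrices are in proper dimensions. We rewrite Eq. (20) into three groups as
$$\mathcal{S}_{aa}^{gp}: \bar{D}_{aa}^{a}\ddot{q}_{aa} + H_{an}^{a} = u_a \tag{21a}$$
$$\mathcal{S}_{au}^{gp}: \bar{D}_{aa}^{u}\ddot{q}_{au} + H_{an}^{u} = u_u \tag{21b}$$
$$\mathcal{S}_{u}^{gp}: \bar{D}_{ua}^{u}\ddot{q}_{au} + \bar{D}_{uu}\ddot{q}_u + H_{un} = 0 \tag{21c}$$

where $H_{an}^{a} = \bar{D}_{aa}^{au}\ddot{q}_{au} + \bar{D}_{au}^{a}\ddot{q}_u + H_{aa}^{gp}$, $H_{an}^{u} = \bar{D}_{aa}^{ua}\ddot{q}_{aa} + \bar{D}_{au}^{u}\ddot{q}_u + H_{au}^{gp}$, and $H_{un} = \bar{D}_{ua}^{a}\ddot{q}_{aa} + H_u^{gp}$. In this form, $\mathcal{S}_u^{gp}$ is virtually independent of $\mathcal{S}_{aa}^{gp}$, and the dynamics coupling exists only between $\mathcal{S}_u^{gp}$ and $\mathcal{S}_{au}^{gp}$.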

Let $\hat{v}^{ext}$ in Eq. (16) be partitioned into $\hat{v}_a^{ext}$ and $\hat{v}_u^{ext}$ corresponding to $q_{aa}$ and $q_{au}$, respectively. $\hat{v}_a^{ext}$ is directly applied to $\mathcal{S}^{gp}$, and $\hat{v}_u^{ext}$ is updated for the balance control purpose. As mentioned above, the condition to eliminate the uncontrolled motion in $\mathcal{S}_a$ is that $q_u$ depends on only $m$ inputs. The task of driving $q_u$ to $q_u^e$ is assigned to the $q_{au}$ coordinates only. With this observation, the PEIC-based control takes the form of $\hat{u}^{int} = [\hat{u}_a^T \; \hat{u}_u^T]^T$ with
(22)

where $\hat{v}^{int} = -(\bar{D}_{ua}^{u})^{-1}(H_{un} + \bar{D}_{uu}\hat{v}_u^{int})$. Clearly, the unactuated subsystem depends only on $\hat{u}_u$ (or $q_{au}$) under the PEIC design, as illustrated in Fig. 1(b). The following lemma presents the qualitative assessment of the PEIC-based control, and the proof is given in Appendix A2.

Lemma 3. If conditions C1 to C3 are satisfied and $\mathcal{S}^{gp}$ is stable under the EIC-based control design, $\mathcal{S}^{gp}$ is stable under the PEIC-based control $\hat{u}^{int}$.

4.3 Null-Space External and Internal Convertible-Based Control Design.

Besides the PEIC-based control, we propose an alternative method in which the control input for $p_{an}$ is explicitly designed. Noting that $p_{am} \in \mathrm{span}(V_m)$ and $p_{an} \in \ker(\bar{D}_{ua}) = \mathrm{span}(V_n)$, subspaces $\mathrm{span}(V_m)$ and $\mathrm{span}(V_n)$ are orthogonal, and the motion of $p_{an}$ is independent of $p_{am}$. Therefore, a compensation is designed in $\mathrm{span}(V_n)$ for $p_{an}$, which leaves the motion in $\mathrm{span}(V_m)$ unchanged. Based on this observation, the NEIC-based control takes the form
(23)

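where $\tilde{v}_a^{int} = \tilde{v}^{int} + \tilde{v}_a^{n}$, $\tilde{v}_a^{n} = V_n\nu_n$, $\tilde{v}^{int} = -\bar{D}_{ua}^+(H_u^{gp} + \bar{D}_{uu}\hat{v}_u^{int})$, $\nu_n$ is the control design that drives $p_{ai}$ to $p_{ai}^d$, $i = m+1, \ldots, n$, and $p_a^d = \Upsilon(q_a^d)$ is the transformed reference trajectory. The design of $\nu_n$ drives $e_a$ to the origin in $\ker(\bar{D}_{ua})$. A straightforward yet effective design of $\nu_n$ is $\nu_n = \alpha\hat{\nu}_n^{ext}$, where $\alpha > 0$. Compared to the PEIC-based control, $p_{an}$ plays a role similar to that of the $q_{aa}$ coordinates. In the new coordinates, $q_u$ is associated with $p_{am}$ only.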
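The null-space compensation can be sketched numerically. The example below assumes $\tilde{v}^{int} = -\bar{D}_{ua}^+(H_u^{gp} + \bar{D}_{uu}\hat{v}_u^{int})$ and $\nu_n = \alpha V_n^T\hat{v}^{ext}$ (the null-space part of the transformed tracking input); all numbers and helper names are hypothetical. It shows that adding $V_n\nu_n$ leaves the term $\bar{D}_{ua}\ddot{q}_a$ seen by the unactuated subsystem unchanged while assigning an input to the otherwise uncontrolled coordinate.

```python
import numpy as np

def null_space_basis(D_ua):
    # Columns span ker(D_ua): trailing right singular vectors.
    m = D_ua.shape[0]
    _, _, Vt = np.linalg.svd(D_ua)
    return Vt[m:].T

def neic_virtual_input(D_ua, D_uu, H_u_gp, v_u_int, v_ext_hat, alpha=1.0):
    V_n = null_space_basis(D_ua)
    v_int = -np.linalg.pinv(D_ua) @ (H_u_gp + D_uu @ v_u_int)  # balance part
    nu_n = alpha * (V_n.T @ v_ext_hat)                         # null-space tracking part
    return v_int + V_n @ nu_n, v_int, nu_n, V_n

# Hypothetical values with n = 2 and m = 1.
D_ua = np.array([[0.5, 0.4]])
D_uu = np.array([[1.0]])
H_u_gp = np.array([0.7])
v_u_int = np.array([-1.2])
v_ext_hat = np.array([0.3, -0.5])

v_a_int, v_int, nu_n, V_n = neic_virtual_input(D_ua, D_uu, H_u_gp, v_u_int, v_ext_hat)
print(D_ua @ v_a_int - D_ua @ v_int)  # ~0: the q_u dynamics see the same input
print(V_n.T @ v_a_int - nu_n)         # ~0: the null-space coordinate receives nu_n
```

The product $V_n\nu_n = \alpha V_nV_n^T\hat{v}^{ext}$ does not depend on the sign convention that the SVD routine picks for $V_n$.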

The following result gives the property of the NEIC-based control, and the proof is given in Appendix A3.

Lemma 4. For $\mathcal{S}$, if $\mathcal{S}^{gp}$ satisfies conditions C1 to C3 and $\mathcal{S}^{gp}$ is stable under the original EIC-based control, $\mathcal{S}^{gp}$ under the NEIC-based control $\tilde{v}_a^{int}$ is also stable. Meanwhile, $\mathcal{S}_u^{gp}$ is unchanged compared to that under the EIC-based control.

The proofs of Lemmas 3 and 4 show that the inputs ûaint and u˜aint follow the control design guidelines. Both the PEIC- and NEIC-based controllers preserve the structured form of the EIC design. Figures 1(b) and 1(c) illustrate the overall flowchart of the PEIC- and NEIC-based control design, respectively. To take advantage of the EIC-based structure, we follow the design guideline to make sure that motion of unactuated coordinates only depends on m inputs in configuration space (PEIC-based control) or transformed space (NEIC-based control). The input νnext is re-used for uncontrolled motion under the NEIC-based control. The PEIC-based control assigns the balance task to a partial group of the actuated coordinates.

5 Control Stability Analysis

5.1 Closed-Loop Dynamics.

To investigate the closed-loop dynamics, we consider the GP prediction error and the BEM estimation error. The GP prediction error in Eq. (14) is extended to $\Delta_{aa}$, $\Delta_{au}$, and $\Delta_u$ for the $q_{aa}$, $q_{au}$, and $q_u$ dynamics, respectively. Under the PEIC-based control, the dynamics of $\mathcal{S}$ becomes

Obtaining BEM with Eq. (17) under (q¨aa,v̂uext) is equivalent to inverting Eq. (21c). Thus, v̂uext=(D¯uau)1Hun|qu=q̂ue,q˙u=q¨u=0. Substituting the above equation into the qau dynamics yields q¨au=v̂uext+Oau, where Oau=(D¯uau)1D¯uuv̂uint(D¯aau)1Δau+o1 and o1 denotes the higher order terms.

Defining the total error eq=[eaTeuT]T and e=[eqTe˙qT]T, the closed-loop error dynamics becomes
(24)

with Otot=[OaTOuT]T,Oa=[OaaTOauT]T,Oaa=(D¯aaa)1Δaa,Ou=D¯uu1(ΔuD¯uau(D¯aau)1Δau)Δvuint,k̂p=diag(k̂p1,k̂p2), and k̂d=diag(k̂d1,k̂d2).

Because $\bar{D}$ is bounded, there exist constants $0 < d_{a1}, d_{a2}, d_{u1}, d_{u2} < \infty$ such that $d_{a1} \le \|\bar{D}_{aa}\| \le d_{a2}$ and $d_{u1} \le \|\bar{D}_{uu}\| \le d_{u2}$. The perturbation terms are further bounded as
The perturbation $o_1$ is due to approximation, and $\Delta v_u^{int}$ is the control difference caused by the BEM calculation with the GP prediction. They are both assumed to be affine in $e$, i.e.,
(25)
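with $0 < c_i < \infty$, $i = 1, \ldots, 4$. From Eq. (19), we have $\|\kappa_a^T\Sigma_a^{1/2}\| \le \sigma_a^{max}\|\kappa_a\|$ and $\|\kappa_u^T\Sigma_u^{1/2}\| \le \sigma_u^{max}\|\kappa_u\|$. Thus, for $0 < \eta = \eta_a\eta_u < 1$, we can show that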
(26)

where $d_1 = c_2 + (1 + d_{u2}/\sigma_1)c_4$, $d_2 = c_1 + (d_{u2}/\sigma_1)c_3$, $l_{a1} = \sigma_a^{max}(d_{u1} + \sigma_m)/(d_{u1}d_{a1})$, and $l_{u1} = \sigma_u^{max}/d_{u1}$.

To obtain the closed-loop dynamics under the NEIC-based control, plugging the NEIC-based control into Sgp, we obtain
(27a)
(27b)
(27c)
To obtain the error dynamics, we take advantage of the definition of BEM. From Eq. (A3), we have νaext=Λm1UTHugp|qu=q̂ue,q˙u=q¨u=0. Then, we rewrite Eq. (27a) into
(28)

where o2 is the residual that contains higher order terms. Oam=o2Λm1UTD¯uuv̂uintΛm1UTΔuVmTD¯aa1Δa denotes the total perturbations.

The Sugp dynamics keeps the same form as that in the PEIC-based control. We write the error dynamics under the NEIC-based control as
(29a)
(29b)
(29c)
where eam=pampamd,ean=panpand, and Oan=VnTD¯aa1Δa. Applying inverse mapping Υ1 to Eqs. (29a) and (29b), the error dynamics in q is obtained as
(30)
where O2 is the transformed perturbations of [OanTOamTOuT]T. Following the same steps to obtain Eq. (26), we have
(31)

where $l_{u2} = \sigma_u^{max}(\sigma_1 + d_{u1})/(\sigma_1 d_{u1})$ and $l_{a2} = \sigma_a^{max}(\sigma_m + d_{u1})/(d_{a1}d_{u1})$.

5.2 Stability Results.

To show the stability, we consider the Lyapunov function candidate $V = e^TPe \ge 0$, where the positive definite matrix $P = P^T$ is the solution of
$$A_0^TP + PA_0 = -Q \tag{32}$$

for a given positive definite matrix $Q = Q^T$, where $A_0$ is the constant part of $A$ in Eq. (24) and does not depend on the variances $\Sigma_a$ or $\Sigma_u$, with $k_p = \mathrm{diag}(k_{p1}, k_{p2})$ and $k_d = \mathrm{diag}(k_{d1}, k_{d2})$.
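To make the construction of $P$ concrete, the sketch below solves a Lyapunov equation of the form $A_0^TP + PA_0 = -Q$ by vectorization. It assumes that $A_0$ has the companion structure $[0, I; -k_p, -k_d]$ implied by the PD-type error dynamics, uses the constant parts of the gains reported in Sec. 6.1 as example values, and takes $Q = I$; the helper name and the decay-rate estimate $\lambda_{\min}(Q)/\lambda_{\max}(P)$ (a standard textbook bound for the unperturbed system) are choices made here for illustration.

```python
import numpy as np

def solve_lyapunov(A, Q):
    # Solve A^T P + P A = -Q using row-major vectorization:
    # vec(A^T P) = kron(A^T, I) vec(P), vec(P A) = kron(I, A^T) vec(P).
    n = A.shape[0]
    I = np.eye(n)
    M = np.kron(A.T, I) + np.kron(I, A.T)
    P = np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # remove round-off asymmetry

# Assumed structure A0 = [[0, I], [-k_p, -k_d]] with example gains.
k_p = np.diag([10.0, 1000.0])
k_d = np.diag([3.0, 100.0])
A0 = np.block([[np.zeros((2, 2)), np.eye(2)],
               [-k_p, -k_d]])
Q = np.eye(4)

P = solve_lyapunov(A0, Q)
gamma = np.linalg.eigvalsh(Q).min() / np.linalg.eigvalsh(P).max()
print(np.linalg.eigvalsh(P))  # all positive: P is positive definite
print(gamma)                  # decay-rate estimate for V = e^T P e without perturbations
```

A library routine such as SciPy's continuous Lyapunov solver could replace the Kronecker-product solve; the explicit form is used here only to keep the sketch dependent on NumPy alone.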

We denote the corresponding Lyapunov function candidates for the NEIC- and PEIC-based controls as V1 and V2, respectively. The stability results are summarized as follows with the proof given in Appendix A4.

Theorem 1. For robot dynamics (2), using the GP-based model (13) that satisfies conditions C1–C3, under the PEIC- and NEIC-based control, the Lyapunov function under each controller satisfies
(33)

and the error $e$ converges to a small ball around the origin, where $\gamma_i$ is the convergence rate, $\rho_i$ and $\varpi_i$ are the perturbation terms, and $0 < \eta = \eta_a\eta_u < 1$.

6 Experimental Results

Two inverted pendulum platforms are used to conduct experiments to validate the control design. The results from each platform demonstrate different aspects of the control design.

6.1 Two Degree-of-Freedom Rotary Inverted Pendulum

Figure 2(a) shows a 2-DOF rotary inverted pendulum that was fabricated by Quanser Inc., Markham, ON, Canada. The base joint ($\theta_1$) is actuated by a DC motor, and the inverted pendulum joint ($\theta_2$) is unactuated, i.e., $n = m = 1$. We use this platform to illustrate the original EIC-based control and also compare the performance under different nominal models and controllers. The robot dynamic model is given in Ref. [27] and is also found in Appendix B1.

Fig. 2
(a) A Furuta pendulum. The base link joint θ1 is actuated, and the pendulum link joint θ2 is unactuated. (b) A three-link inverted pendulum with actuated joints θ1 and θ2 and unactuated joint θ3. The rotation axis of link is perpendicular to that of link 2 and link 3.
Since m=n=1, there is no uncontrolled motion when the original EIC-based control is applied. Therefore, either a constant or time-varying nominal model would work for the GP-based learning control. We created the following two nominal models:

where $c_i = \cos\theta_i$, $s_i = \sin\theta_i$ for angle $\theta_i$, $i = 1, 2$. The training data were sampled and obtained by applying the control input $u = k^T[\theta_1 - \theta_1^t \;\; \theta_2 \;\; \dot{\theta}_1 - \dot{\theta}_1^t \;\; \dot{\theta}_2]^T$, where $k \in \mathbb{R}^{4 \times 1}$ and $\theta_1^t$ was a combination of sinusoidal waves with different amplitudes and frequencies. We chose this input to excite the system, and the gain $k$ was selected without the need to balance the platform. It is difficult to guarantee that the system is fully excited. However, we changed the frequency of the sinusoidal waves and obtained the motion data around the target trajectory.

We trained the GP regression models using a total of 500 data points randomly selected from a large dataset. We designed the control gains as $\hat{k}_{p1} = 10 + 50\Sigma_a$, $\hat{k}_{d1} = 3 + 10\Sigma_a$, $\hat{k}_{p2} = 1000 + 500\Sigma_u$, and $\hat{k}_{d2} = 100 + 200\Sigma_u$. The variances $\Sigma_a$ and $\Sigma_u$ were updated online with new measurements in real time. The reference trajectory was $\theta_1^d = 0.5\sin t + 0.3\sin 1.5t$ rad. The control was implemented at 400 Hz in the MATLAB/Simulink real-time system. Both the velocity and acceleration are needed for control design and for GP training and prediction. To reduce the influence of measurement noise on control design, BEM estimation, and GP agent training, a sliding window was used to filter the velocity measurement online. The acceleration was obtained through real-time differentiation. The same technique was also used for the three-link inverted pendulum in Sec. 6.2.
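The signal conditioning described above can be sketched as follows. The source states only that a sliding window filtered the velocity and that the acceleration came from real-time differentiation; the moving-average filter, the window length of five samples, the backward difference, and the class name are assumptions made here. The sketch runs the filter on the noise-free derivative of the reference trajectory at 400 Hz to show the small lag it introduces.

```python
import numpy as np
from collections import deque

class SlidingWindowDifferentiator:
    """Moving-average velocity filter followed by a backward difference."""

    def __init__(self, dt, window=5):
        self.dt = dt
        self.vel_buf = deque(maxlen=window)  # most recent raw velocity samples
        self.prev_vel = None

    def update(self, raw_velocity):
        self.vel_buf.append(raw_velocity)
        vel = float(np.mean(self.vel_buf))   # filtered velocity
        acc = 0.0 if self.prev_vel is None else (vel - self.prev_vel) / self.dt
        self.prev_vel = vel
        return vel, acc

dt = 1.0 / 400.0                             # 400 Hz control loop
t = np.arange(0.0, 5.0, dt)
# Derivatives of theta_1^d = 0.5 sin t + 0.3 sin 1.5 t.
vel_true = 0.5 * np.cos(t) + 0.45 * np.cos(1.5 * t)
acc_true = -0.5 * np.sin(t) - 0.675 * np.sin(1.5 * t)

filt = SlidingWindowDifferentiator(dt, window=5)
out = np.array([filt.update(v) for v in vel_true])
vel_err = np.abs(out[10:, 0] - vel_true[10:]).max()
acc_err = np.abs(out[10:, 1] - acc_true[10:]).max()
print(vel_err, acc_err)                      # lag-induced errors after warm-up
```

A longer window suppresses more measurement noise at the cost of a larger delay, which is the trade-off to tune against the 400 Hz loop rate.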

Figures 3(a) and 3(b) show the tracking of θ1 and balance of θ2 under the EIC-based control. With either Sn1 or Sn2, the base link joint θ1 closely followed the reference trajectory θ1d, and the pendulum link joint θ2 was stabilized around its equilibrium θ2e as well. The tracking error was reduced further, and the pendulum closely followed the small variation under Sn1. With Sn2, the tracking errors became large when the base link changed rotation direction; see Fig. 3(c) at t =10, 17, and 22 s. Both the time-varying and constant nominal models worked for the EIC-based learning control.

Fig. 3
Experiment results with guaranteed performance: (a) arm rotation angle, (b) pendulum rotation angle, (c) tracking control error under GP-based control, (d) pendulum motion profile, (e) profile of Lyapunov function, and (f) trajectory error motion. At t = 17 s, an impact disturbance is applied. The dashed arrow in (f) indicates the direction in which the error grows after disturbance is applied.

Table 1 further lists the tracking errors (mean and one standard deviation) under both GP models. For comparison purposes, we also conducted additional experiments to implement the original EIC-based control and the GP-based MPC design in Ref. [4]. The tracking and balance errors under the EIC-based learning control with model Sn1 are the smallest. In particular, with the time-varying model Sn1, the mean values of tracking errors e1 and e2 were reduced by 75% and 65%, respectively, in comparison with those under the original EIC-based control. Compared with the MPC method in Ref. [4], the tracking errors with nominal model Sn2 are at the same level.

Table 1

Tracking errors comparison under various controllers ($\times 10^{-1}$ rad)

| Error | $S^{n_1}$ | $S^{n_2}$ | GP-based MPC [4] | Physical EIC |
|---|---|---|---|---|
| $\lvert e_1\rvert$ | 0.24 ± 0.17 | 0.96 ± 0.34 | 0.87 ± 0.52 | 1.09 ± 0.40 |
| $\lvert e_2\rvert$ | 0.09 ± 0.05 | 0.09 ± 0.39 | 0.07 ± 0.06 | 0.26 ± 0.15 |

Figure 3(d) shows the control performance with nominal model $S^{n_1}$ under disturbance. At $t=17$ s, an impact disturbance (manually pushing the pendulum link) was applied, and the joint angles changed rapidly with $\Delta\theta_1=0.7$ rad and $\Delta\theta_2=0.3$ rad. The control gains increased ($\hat k_{p2}=1215$, $\hat k_{d2}=143$) to respond to the disturbance. As a result, the pendulum motion tracked the BEM closely, and the pendulum balance was maintained after the impact disturbance. Figure 3(e) shows the calculated Lyapunov function candidate $V(t)$ and its envelope (i.e., $V(t)=V(0)e^{-\gamma t}$, $\gamma=0.1898$) during the experiment. Figure 3(f) shows the error trajectory in the $\|e_q\|$–$\|\dot e_q\|$ plane. The solid/dashed line shows the error trajectory before/after the impact disturbance. The tracking error converged quickly into the error bound. After the disturbance was applied at $t=17$ s, both the Lyapunov function and the errors grew sharply. As the control gains increased, the errors quickly converged back to the estimated bound.
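To make the exponential envelope concrete, the sketch below evaluates $V(0)e^{-\gamma t}$ with the reported rate $\gamma=0.1898$ and flags samples that exceed it, as would be expected right after an impact disturbance. The sample values in the usage note are hypothetical, and the helper names are introduced only for this illustration.

```python
import math

GAMMA = 0.1898  # convergence rate reported for the rotary pendulum experiment

def envelope(v0, t, gamma=GAMMA):
    """Exponential envelope V(0) * exp(-gamma * t)."""
    return v0 * math.exp(-gamma * t)

def time_to_fraction(fraction, gamma=GAMMA):
    """Time at which the envelope has decayed to the given fraction of V(0)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    return -math.log(fraction) / gamma

def envelope_violations(samples, v0, gamma=GAMMA):
    """Return the (t, V) samples that lie strictly above the envelope."""
    return [(t, v) for t, v in samples if v > envelope(v0, t, gamma)]
```

For instance, `time_to_fraction(0.5)` gives the half-life of the envelope (about 3.65 s for this rate), and `envelope_violations([(0.0, 1.0), (1.0, 0.5), (2.0, 0.9)], 1.0)` returns only the last sample, which sits above the envelope.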

6.2 Three Degree-of-Freedom Rotary Inverted Pendulum.

Figure 2(b) shows a 3DOF inverted pendulum with two actuated joints ($\theta_1$ and $\theta_2$) and one unactuated joint ($\theta_3$), namely, $n=2$ and $m=1$. The physical model of the robot dynamics was obtained using the Lagrangian method and is given in Appendix B2. All controllers were implemented at an updating frequency of 200 Hz through the Robot Operating System. The time-varying nominal model was selected as

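where $c_{i\pm j}=\cos(\theta_i\pm\theta_j)$. The control gains were $\hat k_{p1}=15I_2+20\Sigma_a$, $\hat k_{d1}=3I_2+10\Sigma_a$, $\hat k_{p2}=25+20\Sigma_u$, and $\hat k_{d2}=5.5+10\Sigma_u$, where the GP variances $\Sigma_a$ and $\Sigma_u$ were updated online in real time. The reference trajectories were chosen as $\theta_1^d=0.5\sin 1.5t$ and $\theta_2^d=0.4\sin 3t$ rad.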

For the PEIC-based control, we chose $q_{aa}=\theta_1$ and $q_{au}=\theta_2$, and the NEIC-based control was $\nu_n=\hat\nu_n^{ext}$. Figure 4 shows the experimental results under the PEIC- and NEIC-based control. Under both controllers, the actuated joints ($\theta_1$ and $\theta_2$) followed the given reference trajectories ($\theta_1^d$ and $\theta_2^d$) closely, and the unactuated joint ($\theta_3$) was balanced around the BEM ($\theta_3^e$), as shown in Figs. 4(a) and 4(b). The pendulum link motion displayed a similar pattern for both controllers. However, the tracking error $e_1$ under the PEIC-based control (from −0.05 to 0.05 rad) was much smaller than that under the NEIC-based control (from −0.15 to 0.15 rad); see Figs. 4(c) and 4(d). In the PEIC-based control, the balance task was assigned to joint $\theta_2$, and joint $\theta_1$ was treated as virtually independent of $\theta_2$ and $\theta_3$. Joint $\theta_1$ therefore achieved almost-perfect tracking regardless of the errors for $\theta_2$ and $\theta_3$. In the NEIC-based control, by contrast, the compensation effect in the null space appeared in the entire configuration space, and any motion error in the unactuated joint affected the motion of all actuated joints. Similar to the previous example, Fig. 4(e) shows the error trajectory profile in the $\|e_q\|$–$\|\dot e_q\|$ plane. Figure 4(f) shows the Lyapunov function profiles under the PEIC- and NEIC-based controls.

Fig. 4
Experiment results with the 3DOF inverted pendulum: (a) and (b) Motion profiles under the PEIC- and NEIC-based control, (c) and (d) tracking errors under the PEIC- and NEIC-based control, (e) error trajectory in the ||eq||–||e˙q|| plane, and (f) comparison of the estimated Lyapunov function profile with the actual one

Figure 5 shows the motion of the actuated coordinates in the transformed coordinates $p_a$ under various controllers. Under the PEIC- and NEIC-based controls, the $p_a$ variables followed the reference profile $p_a^d$, as shown in Figs. 5(a) and 5(b). Figure 5(c) shows the motion profile under the original EIC-based control. In the first 2 s, joint $\theta_3$ followed the BEM under the EIC-based control, and the $p_{a1}$ coordinate displayed a similar motion pattern. However, the $p_{a2}$ coordinate diverged, and the control failed completely. Therefore, as analyzed previously, the system became unstable under the EIC-based control even though conditions C1 to C3 were satisfied.

Fig. 5
Motion profiles of robotic leg in transformed space under (a) PEIC-based control, (b) NEIC-based control, and (c) EIC-based control

In the NEIC-based control, $\nu_n$ drives the uncontrolled motion variable to its reference trajectory. To further reduce the tracking error, we can increase the $\alpha$ value. Figure 6 shows the experimental results of the $p_a$ error profiles for $\alpha$ values from 0.5 to 1.5. With a large $\alpha$ value, the tracking error of the actuated coordinates was reduced. Table 2 further lists the steady-state errors (in joint angles) under the NEIC-based control with various $\alpha$ values, the PEIC-based control, and the physical model-based control design. Under the NEIC-based control with $\alpha=0.5$, the system was stabilized; when $\alpha$ was increased to 1 and 1.5, the mean tracking errors were reduced by about 50% and 70% for $\theta_1$, respectively, and by about 40% for $\theta_2$. Since the control input $\nu_n$ did not affect the balance task of the unactuated subsystem, the tracking errors for $\theta_3$ remained at the same level. Notably, the control effort (last column in Table 2) changed only slightly across the tested $\alpha$ values.

Fig. 6
The tracking errors in coordinate pa under the NEIC-based control with various α values
Table 2

Statistical analysis of tracking performance (mean and standard deviation for errors) under different controllers

| Controller | $\lvert e_1\rvert$ (rad) | $\lvert e_2\rvert$ (rad) | $\lvert e_3\rvert$ (rad) | $\lVert e\rVert$ | $\int u^Tu\,dt$ |
|---|---|---|---|---|---|
| PEIC (GP) | 0.0302 ± 0.0178 | 0.0566 ± 0.0685 | 0.1182 ± 0.0160 | 0.1343 ± 0.0166 | 5.7659 |
| NEIC (GP, $\alpha=0.5$) | 0.1395 ± 0.0946 | 0.1166 ± 0.0512 | 0.0303 ± 0.0209 | 0.2001 ± 0.0770 | 5.9022 |
| NEIC (GP, $\alpha=1.0$) | 0.0651 ± 0.0416 | 0.0756 ± 0.0481 | 0.0195 ± 0.0152 | 0.1101 ± 0.0499 | 5.7089 |
| NEIC (GP, $\alpha=1.5$) | 0.0376 ± 0.0302 | 0.0792 ± 0.0482 | 0.0207 ± 0.0169 | 0.0972 ± 0.0470 | 5.7305 |
| PEIC (model) | 0.2168 ± 0.1165 | 0.2398 ± 0.1649 | 0.0179 ± 0.0140 | 0.3587 ± 0.1307 | 5.7978 |
| NEIC (model, $\alpha=1.0$) | 0.1374 ± 0.0922 | 0.1237 ± 0.0597 | 0.0455 ± 0.0385 | 0.2095 ± 0.0769 | 5.8452 |
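The relative error reductions discussed for the $\alpha$ sweep can be reproduced approximately from the mean values in Table 2. The sketch below copies the NEIC (GP) means for $|e_1|$ and $|e_2|$ from the table and computes the percentage reduction relative to the $\alpha=0.5$ case; the variable and function names are illustrative only.

```python
# Mean errors (rad) copied from the NEIC (GP) rows of Table 2.
NEIC_GP_MEAN_ERRORS = {
    0.5: {"e1": 0.1395, "e2": 0.1166},
    1.0: {"e1": 0.0651, "e2": 0.0756},
    1.5: {"e1": 0.0376, "e2": 0.0792},
}

def percent_reduction(baseline, value):
    """Relative reduction of value with respect to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline

def reductions_vs_baseline(table, baseline_alpha=0.5):
    """Percent reduction of each mean error relative to the baseline alpha row."""
    base = table[baseline_alpha]
    return {
        alpha: {name: percent_reduction(base[name], row[name]) for name in row}
        for alpha, row in table.items()
        if alpha != baseline_alpha
    }

REDUCTIONS = reductions_vs_baseline(NEIC_GP_MEAN_ERRORS)
```

With these table values, the $|e_1|$ reductions come out slightly above 50% and 70%, and the $|e_2|$ reductions are in the range of roughly 30% to 35%, in line with the rounded figures quoted in the text.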

6.3 Discussion.

For the rotary pendulum example, we have $n=m$, and the null space $\ker(D_{au})$ vanishes. The compensation effect is no longer needed by the NEIC-based control, i.e., $\tilde v_a^{int}=\tilde v^{int}$ and $\tilde u^{int}=\bar D_{aa}\tilde v_a^{int}+\bar D_{au}\ddot q_u+H_a^{gp}=u^{int}$. In this case, the PEIC- and NEIC-based controls degenerate to the EIC-based control. For the 3DOF inverted pendulum, the control inputs $u_1$ and $u_2$ act on the $\theta_3$ joint through $\ddot\theta_1$ and $\ddot\theta_2$. Therefore, as shown in Lemma 1, the uncontrolled motion exists since all controls show up in the $S_u$ dynamics. This observation explains why the original EIC-based control failed to balance the three-link inverted pendulum. If the $S_u$ dynamics is related to only $m$ control inputs (through $\ddot q_a$) for $n>m$, such as the bikebot dynamics in Refs. [4] and [25], only $m$ external controls were updated, and the EIC-based control worked well without any uncontrolled motion.
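The dimension argument above follows from the rank-nullity theorem: an $m\times n$ coupling block of rank $m$ has an $(n-m)$-dimensional null space. The sketch below checks this numerically for small matrices. It uses Gaussian elimination to compute the rank instead of the SVD used in the paper, and the numerical entries of the coupling blocks are hypothetical.

```python
def matrix_rank(matrix, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    rows = [[float(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, n_rows):
            factor = rows[i][col] / rows[rank][col]
            for j in range(col, n_cols):
                rows[i][j] -= factor * rows[rank][j]
        rank += 1
    return rank

def uncontrolled_dimension(d_ua):
    """Dimension of the null space of an m-by-n coupling block: n - rank."""
    return len(d_ua[0]) - matrix_rank(d_ua)

# Hypothetical coupling blocks, used only to illustrate the dimension count.
ROTARY_DIM = uncontrolled_dimension([[0.8]])          # n = m = 1
THREE_LINK_DIM = uncontrolled_dimension([[0.3, 0.7]])  # n = 2, m = 1
```

Under these assumptions the rotary pendulum case has a zero-dimensional null space, while the three-link case leaves one direction of the actuated motion outside the reach of the balance-embedded control.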

For the PEIC-based control, the robot dynamics were partitioned into $S^{gp}=\{S_{aa}^{gp},\{S_{au}^{gp},S_u^{gp}\}\}$, which contains a fully actuated system $S_{aa}^{gp}$ and a reduced-order underactuated system $\{S_{au}^{gp},S_u^{gp}\}$. The EIC-based control is applied to $S_{au}^{gp}$ and $S_u^{gp}$ only. The dynamics of $q_u$ in general does not depend on any specific $m$ actuated coordinates, since the mapping $\Upsilon$ is time-varying across different control cycles. In the NEIC-based control design, $p_{am}$ and $q_u$ become an underactuated subsystem, and $p_{an}$ is fully actuated.

In practice, no specific rules are defined to select $q_{au}$ out of the $q_a$ coordinates, and therefore, there are a total of $C_n^m=n!/(m!(n-m)!)$ options to select different coordinates. We take advantage of this property to optimize tracking performance for selected coordinates. In the 3DOF pendulum case, we assigned the balance task of $\theta_3$ to the $\theta_2$ motion. The length of link 1 was only 0.09 m, much shorter than the length of link 2 (0.23 m). The coupling effect between $\theta_2$ and $\theta_3$ was much stronger than that between $\theta_1$ and $\theta_3$; see $D_{13}$ and $D_{23}$ in Appendix B2. Thus, it was efficient to use the motion of $\theta_2$ as a virtual control input to balance $\theta_3$. When the PEIC-based controller was implemented with $q_{au}=\theta_1$, the system could not achieve the desired performance and became unstable. We also implemented the proposed controller with the physical model. The control errors are listed in Table 2. Compared with the learning-based controllers, the model-based control resulted in larger errors. Since mechanical friction and other unstructured effects were not considered, the physical model might not capture the robot dynamics accurately. The results confirmed the advantages of the proposed learning-based control approaches.
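The count of candidate partitions can be enumerated directly. The following sketch lists every way of choosing $m$ of the $n$ actuated coordinates as $q_{au}$, with the remaining coordinates forming $q_{aa}$; the coordinate labels and function names are placeholders for illustration.

```python
from itertools import combinations
from math import comb

def number_of_options(n, m):
    """Binomial coefficient n! / (m! (n - m)!)."""
    return comb(n, m)

def qau_candidates(actuated, m):
    """List all (q_au, q_aa) partitions of the actuated coordinates."""
    return [
        (chosen, tuple(q for q in actuated if q not in chosen))
        for chosen in combinations(actuated, m)
    ]

# 3DOF pendulum: two actuated joints, one unactuated joint (n = 2, m = 1).
OPTIONS = qau_candidates(["theta1", "theta2"], 1)
```

For the 3DOF pendulum this yields two options, `q_au = theta1` or `q_au = theta2`; the experiments used the second, since $\theta_2$ couples more strongly to $\theta_3$.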

The unique feature of the proposed control lies in the integration of the robot's inherent dynamics property (the EIC structure) with GP-based model learning, in contrast to other learning-based control approaches [18,22]. By integrating physics knowledge into model learning, we identified the conditions for nominal model selection. The overall model learning and control design framework forms a white-box-like control design that incorporates physics knowledge, which differs from the reinforcement learning-based policy search approach [18]. The solution also has the potential to further incorporate the bounded GP prediction error for a robust control [4].

7 Conclusion

This paper presented a new learning-based modeling and control framework for underactuated balance robots. The proposed design was an extension and improvement of the EIC-based control with GP-enabled robot dynamics. The proposed new robot controllers preserved the structural design of the original EIC-based control and achieved both tracking and balance tasks. The PEIC-based control reshaped the coupling between the actuated and unactuated coordinates. The robot dynamics were transformed into a fully actuated subsystem and one reduced-order underactuated balance subsystem. The NEIC-based control compensated for uncontrolled motion in a subspace. We validated and demonstrated the new control design on two experimental platforms and confirmed that stability and balance were guaranteed. The comparison with the physical model-based EIC control and the MPC design confirmed superior performance in terms of the error bound. Extension of the GP-based learning control design for highly underactuated balance robots is one of the ongoing research directions.

Funding Data

  • U.S. National Science Foundation (NSF) (Award No. CNS-1932370; Funder ID: 10.13039/100000001).

Data Availability Statement

No data, models, or code were generated or used for this paper.

Nomenclature

$e_a, e_u, e$ = tracking, balance, and overall errors

$p_a, \nu^{ext}$ = transformed $q_a$ and $v^{ext}$ in $p$ coordinates

$p_{am}, p_{an}$ = controlled and uncontrolled coordinates

$q_a, q_u$ = coordinates for actuated and unactuated subsystems

$q_{aa}, q_{au}$ = partitioned actuated coordinates in $(n-m)$- and $m$-dimensions

$q_u^e, \hat q_u^e$ = actual and estimated BEMs

$S$ = robot dynamics

$S^n, S^{gp}$ = nominal and GP-based robot dynamics

$u^{int}, \hat u^{int}, \tilde u^{int}$ = EIC-, PEIC-, NEIC-based control inputs

$v^{ext}, v^{int}$ = trajectory tracking and balanced-embedded control inputs

$v_u^{int}$ = BEM stabilization control input

$\hat v_a^{ext}, \hat v_u^{ext}$ = trajectory tracking control inputs for $q_{aa}$ and $q_{au}$

$\tilde v_a^{int}$ = control input for $p_{am}$

$\gamma, r$ = convergence rate and error bound

$\Delta_a, \Delta_u$ = estimation errors of actuated and unactuated dynamics

Appendix A: Proofs

A1 Proof of Lemma 1.
The system dynamics $S$ under control $u^{ext}$ is
(A1)
When $\mathrm{rank}(D_{au})=m$ holds for $q$, the SVD in Eq. (7) exists and all $m$ singular values are greater than zero, i.e., $\sigma_i>0$. Thus, $\ker(D_{au})=V_n$ contains $(n-m)$ column vectors. Plugging Eq. (7) into Eq. (A1) and considering the coordinate transformation, we obtain
(A2)

where $U\Lambda V^Tv^{ext}=U\Lambda_m\nu_m^{ext}$ is used based on the fact that $\Lambda\in\mathbb{R}^{m\times n}$ is a rectangular diagonal matrix.

Given the definition of $\mathcal{E}$, $q_u^e$ is obtained by solving the algebraic equation $\Gamma_0(q_u;v^{ext})=0$. We substitute $D_{ua}(q_u^e)$ with $D_{ua}(q_u)$ in $\Gamma_0$, and therefore, using Eq. (7), $\Gamma_0=0$ is rewritten into
(A3)

The BEM $\mathcal{E}$ depends only on $\nu_m^{ext}$; that is, the control effect in $\ker(D_{ua})$ is not used when obtaining the BEM.

Furthermore, since all controls show up in the $S_u$ dynamics, the control inputs should be updated, and the EIC-based control in Eq. (6) exists. We substitute Eqs. (7) and (6) into the $S_a$ dynamics and obtain

Multiplying both sides of the above equation by $V^T$ and considering Eq. (8), $S$ under the EIC-based control becomes Eq. (9), and the $(n-m)$ coordinates are free of control.

A2 Proof of Lemma 3.
Under input $u_u$, $\ddot q_{au}=v_{au}^{int}$, we solve $\ddot q_u$ by Eq. (21c)
Clearly, the unperturbed subsystem $S_u^{gp}$ remains the same as that under the original EIC-based control. With the designed control, the $q_{aa}$ dynamics is unchanged, and $\ddot q_{aa}=\hat v_a^{ext}$ holds regardless of $\hat v_u^{int}$. For $q_{aa}$ and $q_{au}$, we obtain $\ddot q_{aa}=\hat v_a^{ext}$ and $\ddot q_{au}=\hat v_u^{int}$. The relationship in Eq. (9) indicates that if the unactuated subsystem dynamics is written into $\ddot q_u=v_u^{int}$, the dynamics $\ddot q_a$ under the transformation $\Upsilon$ must contain the portion (9a). Similarly, we obtain
(A4a)
(A4b)
(A4c)

where $\hat v_a^{int}=[(\hat v_a^{ext})^T\;(\hat v_u^{int})^T]^T$. Since $\hat v_a^{int}$ is not obtained in the way as in Eq. (5), it generally has a nonzero component in $\ker(\bar D_{ua})$, i.e., $v_{m+j}^T\hat v_a^{int}\neq 0$, and $p_{an}$ is under active control. Meanwhile, $v_{m+j}^T\hat v_a^{int}$ drives $q_a\to q_a^d$ in $\ker(\bar D_{au})$, given that $\hat v_a^{ext}$ and $\hat v_u^{int}$ are designed to drive $q_a\to q_a^d$. Therefore, if the unperturbed system under the original EIC-based control is stable, it is also stable under the PEIC-based control.

A3 Proof of Lemma 4.
Under the NEIC-based control input, $S_a^{gp}$ becomes
(A5)
Plugging the above equation into $S_u^{gp}$, we obtain
Using the SVD form of $\bar D_{ua}$ in Eq. (7) and $\Lambda V^TV_n=0$, the above equation is further simplified as
(A6)

Clearly, the $S_u^{gp}$ dynamics is unchanged compared to Eq. (9).

We further apply the transformation $\Upsilon$ to $q_a$ and the SVD to $\bar D_{ua}^{+}$. The $S_u$ dynamics (A5) and (A6) become
(A7a)
(A7b)
(A7c)

Compared to Eq. (9), we add control $\tilde v_a^{ext}$ to drive $q_a\to q_a^d$ in the subspace $\ker(\bar D_{ua})$. Therefore, if the system (9) is stable, the system (A7) is also stable, as the $p_{am}$ and $q_u$ dynamics are unchanged.

A4 Proof for Theorem 1.

We present the stability proof for the PEIC- and NEIC-based controls using the Lyapunov method.

PEIC-Based Control: Plugging Eq. (24) into $V_1=V$ and considering Eq. (32), we obtain $\dot V_1=e^T(A^TP+PA)e+2e^TPO_1=-e^TQe+e^TQ_\Sigma e+2e^TPO_1$, where $Q_\Sigma=(A-A_0)^TP+P(A-A_0)$. The bounded variance leads to bounded eigenvalues of matrix $Q_\Sigma$. Given the fact that $Q_\Sigma=Q_\Sigma^T$, the eigenvalues of $Q_\Sigma$ are real numbers.

Noting that $Q_\Sigma$ is bounded and $P$, $Q$ are constant, the perturbation term $O_1$ is bounded as shown in Eq. (26). Then, $\dot V_1$ is rewritten as
where $\omega_1=l_{u1}\|\kappa_u\|+l_{a1}\|\kappa_a\|$ denotes the uncertainties related to GP prediction errors, and $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the smallest and greatest eigenvalues of a matrix, respectively. Considering $\lambda_{\min}(P)\|e\|^2\le V_1\le\lambda_{\max}(P)\|e\|^2$, we define

$\rho_1=2d_1\lambda_{\max}(P)\|e\|$, $\varpi_1=2\omega_1\lambda_{\max}(P)\|e\|$. With the bounded perturbations $\rho_1$ and $\omega_1$, the closed-loop system dynamics can be shown stable in probability as $\Pr\{\dot V_1\le-\gamma_1V_1+\rho_1+\varpi_1\}>\eta$. With further analysis, we obtain a nominal estimate of the error convergence as $\Pr\{V_1\le V_1(0)e^{-\gamma_1t}\}>\eta$ and the error bound estimate $\Pr\{\|e\|\le r_1\}>\eta$ with $r_1=(2d_1\lambda_{\max}(P))/(\lambda_{\min}(Q)-\lambda_{\max}(Q_\Sigma)-2d_2\lambda_{\max}(P))$.

NEIC-Based Control: Without loss of generality, we select $\nu_n=V_n^T\hat v^{ext}$. We take $V_2=V$ as the Lyapunov function candidate for $S_{e,\mathrm{NEIC}}$. If the control gains are the same as those in the PEIC-based control and $\alpha=1$ for the compensation effect, $\gamma_2=\gamma_1$. We choose the control gains properly such that $\gamma_2>0$. The system can be shown stable as $\Pr\{\dot V_2\le-\gamma_2V_2+\rho_2+\varpi_2\}>\eta$, where $\rho_2=2d_1\lambda_{\max}(P)\|e\|$, $\varpi_2=2\omega_2\lambda_{\max}(P)\|e\|$, and $\omega_2=l_{u2}\|\kappa_u\|+l_{a2}\|\kappa_a\|$ is defined in the same way as $\omega_1$ and contains the GP prediction uncertainties. A nominal estimate of the error convergence and the final error bound can also be obtained.

To show $\gamma_i>0$, $i=1,2$, the control gains should be properly selected. With a small predefined error limit as a stop criterion in BEM estimation, the $c_i$ values can be shown to satisfy $c_i\ll 1$. Given the explicit form, $d_i$ are estimated for $A_0$ and $Q$, and $P$ is obtained by solving Eq. (32). The matrix $Q_\Sigma$ depends on the control gains associated with the prediction variance. Since the variance is bounded, we design $k_{ni}$ such that $\lambda_{\max}(Q_\Sigma)$ satisfies the inequality $\lambda_{\min}(Q)-\lambda_{\max}(Q_\Sigma)-2d_2\lambda_{\max}(P)>0$, and then $\gamma_1>0$. Thus, the stability is obtained.
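The gain condition and the error bound estimate can be checked numerically once the eigenvalue bounds are available. The sketch below evaluates the stability margin $\lambda_{\min}(Q)-\lambda_{\max}(Q_\Sigma)-2d_2\lambda_{\max}(P)$ and, when it is positive, the bound $r_1$. All numerical inputs in the example are hypothetical, and the function names are introduced only for illustration.

```python
def stability_margin(lam_min_q, lam_max_qsigma, lam_max_p, d2):
    """Margin lambda_min(Q) - lambda_max(Q_Sigma) - 2 d2 lambda_max(P)."""
    return lam_min_q - lam_max_qsigma - 2.0 * d2 * lam_max_p

def error_bound(d1, d2, lam_max_p, lam_min_q, lam_max_qsigma):
    """Error bound r1 = 2 d1 lambda_max(P) / margin, or None if the margin is not positive."""
    margin = stability_margin(lam_min_q, lam_max_qsigma, lam_max_p, d2)
    if margin <= 0.0:
        return None  # gain condition violated; no bound is claimed
    return 2.0 * d1 * lam_max_p / margin

# Hypothetical numbers: margin = 1.0 - 0.3 - 2 * 0.05 * 2.0 = 0.5.
R1_EXAMPLE = error_bound(d1=0.1, d2=0.05, lam_max_p=2.0,
                         lam_min_q=1.0, lam_max_qsigma=0.3)
```

In this example the bound evaluates to approximately 0.8. Increasing `lam_max_qsigma`, which corresponds to larger variance-dependent gain terms, shrinks the margin and enlarges the bound until the condition fails.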

Appendix B: Dynamics Model of Underactuated Balance Robots

B1 Rotary Inverted Pendulum.
The dynamics model for the rotary pendulum is in the form of Eq. (1) with $q_a=\theta_1$ and $q_u=\theta_2$. The model parameters are $B=[1\;\;0]^T$ and

where $l_r$, $J_r$, and $d_r$ are the length, mass moment of inertia, and viscous damping coefficient of the base link; $l_p$, $J_p$, and $d_p$ are the corresponding parameters of the pendulum; $m_p$ is the pendulum mass; $g$ is the gravitational constant; and $k_t$, $k_m$, $K_G$, $R_m$, and $C$ are robot constants. The values of these parameters can be found in Ref. [27]. The control input is the motor voltage, i.e., $u=V_m$.

B2 Three-Link Inverted Pendulum.
The model parameters in Eq. (1) are

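where $m_i$, $l_i$, and $J_i$ are the mass, length, and mass moment of inertia of each link, and $s_{i+j}=\sin(\theta_i+\theta_j)$. Matrix $C$ is obtained as $C_{ij}=\sum_{k=1}^{3}c_{ijk}\dot\theta_k$, where the Christoffel symbols are $c_{ijk}=\frac{1}{2}\left(\frac{\partial D_{ij}}{\partial\theta_k}+\frac{\partial D_{ik}}{\partial\theta_j}-\frac{\partial D_{jk}}{\partial\theta_i}\right)$. The physical parameters are $m_1=0.7$ kg, $m_2=1.3$ kg, $m_3=0.3$ kg, $l_1=0.065$ m, $l_2=0.23$ m, $l_3=0.25$ m, $J_1=0.0008$ kg·m², $J_2=0.005$ kg·m², and $J_3=0.003$ kg·m².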
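The Christoffel-symbol construction of $C$ can be sketched numerically for any inertia matrix given as a function of the joint angles. The example below approximates the partial derivatives of $D$ by central differences, which is an assumption made here for brevity (a symbolic derivation would serve equally well). The two-link inertia matrix used for illustration is a generic textbook form with hypothetical constants, not the three-link model of this appendix.

```python
import math

def inertia_derivative(D, theta, k, h=1e-6):
    """Central-difference estimate of dD/dtheta_k, returned as a nested list."""
    up, down = list(theta), list(theta)
    up[k] += h
    down[k] -= h
    d_up, d_down = D(up), D(down)
    n = len(theta)
    return [[(d_up[i][j] - d_down[i][j]) / (2.0 * h) for j in range(n)]
            for i in range(n)]

def coriolis_matrix(D, theta, theta_dot, h=1e-6):
    """C_ij = sum_k c_ijk * theta_dot_k with Christoffel symbols of the first kind."""
    n = len(theta)
    dD = [inertia_derivative(D, theta, k, h) for k in range(n)]  # dD[k][i][j] = dD_ij/dtheta_k
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c_ijk = 0.5 * (dD[k][i][j] + dD[j][i][k] - dD[i][j][k])
                C[i][j] += c_ijk * theta_dot[k]
    return C

# Hypothetical two-link inertia matrix with constants a = 3, b = 1, d = 1.
def two_link_inertia(theta, a=3.0, b=1.0, d=1.0):
    c2 = math.cos(theta[1])
    return [[a + 2.0 * b * c2, d + b * c2],
            [d + b * c2, d]]

C_EXAMPLE = coriolis_matrix(two_link_inertia, [0.3, 0.8], [1.2, -0.5])
```

For this two-link form the result can be compared against the closed-form matrix with entries $h\dot\theta_2$, $h(\dot\theta_1+\dot\theta_2)$, $-h\dot\theta_1$, and $0$, where $h=-b\sin\theta_2$, which provides a quick sanity check of the index ordering.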

Footnotes

2 The video of the experiment is available at https://www.youtube.com/watch?v=ZOYb0UW3KS8

References

1. Kant, N., and Mukherjee, R., 2020, “Orbital Stabilization of Underactuated Systems Using Virtual Holonomic Constraints and Impulse Controlled Poincaré Maps,” Syst. Control Lett., 146, p. 104813. DOI: 10.1016/j.sysconle.2020.104813
2. Han, F., and Yi, J., 2023, “On the Learned Balance Manifold of Underactuated Balance Robots,” IEEE International Conference on Robotics and Automation (ICRA), London, UK, May 29–June 2, pp. 12254–12260. DOI: 10.1109/ICRA48891.2023.10161088
3. Han, F., Jelvani, A., Yi, J., and Liu, T., 2022, “Coordinated Pose Control of Mobile Manipulation With an Unstable Bikebot Platform,” IEEE/ASME Trans. Mechatron., 27(6), pp. 4550–4560. DOI: 10.1109/TMECH.2022.3157787
4. Chen, K., Yi, J., and Song, D., 2023, “Gaussian-Process-Based Control of Underactuated Balance Robots With Guaranteed Performance,” IEEE Trans. Rob., 39(1), pp. 572–589. DOI: 10.1109/TRO.2022.3203625
5. Han, F., and Yi, J., 2021, “Stable Learning-Based Tracking Control of Underactuated Balance Robots,” IEEE Rob. Autom. Lett., 6(2), pp. 1543–1550. DOI: 10.1109/LRA.2021.3056324
6. Turrisi, G., Capotondi, M., Gaz, C., Modugno, V., Oriolo, G., and Luca, A. D., 2022, “On-Line Learning for Planning and Control of Underactuated Robots With Uncertain Dynamics,” IEEE Rob. Autom. Lett., 7(1), pp. 358–365. DOI: 10.1109/LRA.2021.3126899
7. Beckers, T., Kulić, D., and Hirche, S., 2019, “Stable Gaussian Process Based Tracking Control of Euler–Lagrange Systems,” Automatica, 103, pp. 390–397. DOI: 10.1016/j.automatica.2019.01.023
8. Chen, K., and Yi, J., 2015, “On the Relationship Between Manifold Learning Latent Dynamics and Zero Dynamics for Human Bipedal Walking,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, Sept. 28–Oct. 2, pp. 971–976. DOI: 10.1109/IROS.2015.7353488
9. Grizzle, J. W., Chevallereau, C., Sinnet, R. W., and Ames, A. D., 2014, “Models, Feedback Control, and Open Problems in 3D Bipedal Robotic Walking,” Automatica, 50(8), pp. 1955–1988. DOI: 10.1016/j.automatica.2014.04.021
10. Han, F., Huang, X., Wang, Z., Yi, J., and Liu, T., 2022, “Autonomous Bikebot Control for Crossing Obstacles With Assistive Leg Impulsive Actuation,” IEEE/ASME Trans. Mechatron., 27(4), pp. 1882–1890. DOI: 10.1109/TMECH.2022.3172909
11. Shiriaev, A. S., Perram, J. W., and Canudas-de-Wit, C., 2005, “Constructive Tool for Orbital Stabilization of Underactuated Nonlinear Systems: Virtual Constraints Approach,” IEEE Trans. Autom. Control, 50(8), pp. 1164–1176. DOI: 10.1109/TAC.2005.852568
12. de Wit, C. C., Espiau, B., and Urrea, C., 2002, “Orbital Stabilization of Underactuated Mechanical Systems,” IFAC Proc. Vol., 35(1), pp. 527–532. DOI: 10.3182/20020721-6-ES-1901.00900
13. Maggiore, M., and Consolini, L., 2013, “Virtual Holonomic Constraints for Euler–Lagrange Systems,” IEEE Trans. Autom. Control, 58(4), pp. 1001–1008. DOI: 10.1109/TAC.2012.2215538
14. Chevallereau, C., Grizzle, J. W., and Shih, C.-L., 2009, “Asymptotically Stable Walking of a Five-Link Underactuated 3-D Bipedal Robot,” IEEE Trans. Rob., 25(1), pp. 37–50. DOI: 10.1109/TRO.2008.2010366
15. Fantoni, I., Lozano, R., and Spong, M. W., 2000, “Energy Based Control of Pendubot,” IEEE Trans. Autom. Control, 45(4), pp. 725–729. DOI: 10.1109/9.847110
16. Xin, X., and Kanedai, M., 2005, “Analysis of the Energy-Based Control for Swinging Up Two Pendulums,” IEEE Trans. Autom. Control, 50(5), pp. 679–684. DOI: 10.1109/TAC.2005.846598
17. Getz, N., 1995, “Dynamic Inversion of Nonlinear Maps With Applications to Nonlinear Control and Robotics,” Ph.D. thesis, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA.
18. Lambert, N. O., Schindler, C. B., Drew, D. S., and Pister, K. S. J., 2021, “Nonholonomic Yaw Control of an Underactuated Flying Robot With Model-Based Reinforcement Learning,” IEEE Rob. Autom. Lett., 6(2), pp. 455–461. DOI: 10.1109/LRA.2020.3045930
19. Beckers, T., and Hirche, S., 2022, “Prediction With Approximated Gaussian Process Dynamical Models,” IEEE Trans. Autom. Control, 67(12), pp. 6460–6473. DOI: 10.1109/TAC.2021.3131988
20. Lederer, A., Yang, Z., Jiao, J., and Hirche, S., 2023, “Cooperative Control of Uncertain Multiagent Systems Via Distributed Gaussian Processes,” IEEE Trans. Autom. Control, 68(5), pp. 3091–3098. DOI: 10.1109/TAC.2022.3205424
21. Deisenroth, M., and Ng, J. W., 2015, “Distributed Gaussian Processes,” International Conference on Machine Learning, Lille, France, July 6–11, pp. 1481–1490. https://proceedings.mlr.press/v37/deisenroth15.pdf
22. Helwa, M. K., Heins, A., and Schoellig, A. P., 2019, “Provably Robust Learning-Based Approach for High-Accuracy Tracking Control of Lagrangian Systems,” IEEE Rob. Autom. Lett., 4(2), pp. 1587–1594. DOI: 10.1109/LRA.2019.2896728
23. Chen, K., Yi, J., and Song, D., 2019, “Gaussian Processes Model-Based Control of Underactuated Balance Robots,” International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, May 20–24, pp. 4458–4464. DOI: 10.1109/ICRA.2019.8794097
24. Han, F., and Yi, J., 2024, “Gaussian Process-Enhanced, External and Internal Convertible (EIC) Form-Based Control of Underactuated Balance Robots,” Proceedings of the IEEE International Conference on Robotics and Automation, Yokohama, Japan, May 13–17, pp. 8937–8933.
25. Wang, P., Han, F., and Yi, J., 2023, “Gyroscopic Balancer-Enhanced Motion Control of an Autonomous Bikebot,” ASME J. Dyn. Syst., Meas., Control, 145(5), p. 101002. DOI: 10.1115/1.4063014
26. Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. W., 2012, “Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting,” IEEE Trans. Inf. Theory, 58(5), pp. 3250–3265. DOI: 10.1109/TIT.2011.2182033
27. Apkarian, J., Karam, P., and Levis, M., 2011, Instructor Workbook: Inverted Pendulum Experiment for Matlab/Simulink Users, Quanser Inc., Markham, ON, Canada.