This article explores the concept of inferring intent during human-in-the-loop robot learning for output tracking. The human-response dynamics can affect the precision achieved by the human during human-in-the-loop operation. Preview-based online inversion is a viable technique that allows for stable online inversion of complex linear controlled systems, even those for which stable causal inverses do not exist, such as non-minimum phase systems. When the robot motion is taken as the average of multiple demonstrations, the averaged motion can still be affected by the human-response dynamics and can therefore still differ from the user's intent. Therefore, inferring the human intent is important and necessary in the context of human-robot shared control. The results of applying the inversion-based iterative learning scheme to the human-in-the-loop trajectory tracking task are also presented in the article. The figures show that the output tracking performance improves with respect to the manual tracking performance when inverse control is applied. When the iterative learning control law is applied, further tracking improvement is achieved. Thus, the learned control input can successfully emulate the human intent. An advantage of the robot-learning framework is that it allows novice human operators, who may be experts in the task but not in teaching a robot, to successfully achieve the task objectives, which can expand the usage and acceptability of robots in society.
Various studies over the years, focused on understanding the mechanisms of learning in humans, agree that new skills are learned primarily by watching successful demonstrations by a teacher, followed by imitation and practice until the skill is mastered. This is very commonly seen in infants and toddlers, as illustrated in Figure 1. Moreover, even in the animal world, such as in capuchin monkeys, observations of a more proficient individual may benefit novices when subsequently acting on their own. Formally, such a learning scheme is referred to as Learning from Demonstration (LfD) or Teaching by Demonstration (TbD), and is sometimes colloquially referred to by the endearing term, “monkey see; monkey do”. Often the teacher might also physically hold the student's hand and demonstrate how to perform a task. For example, a tennis instructor might hold a student's hand and go through a tennis stroke, as opposed to just displaying the stroke. This is referred to as kinesthetic teaching, where the student's attention is directed to the effects of their movements while the learning is reinforced by real-time teacher feedback.
Learning is also important as robots are being envisioned as social partners in a range of applications, for example as home caregivers for the elderly and as co-workers in manufacturing. In such scenarios, kinesthetic teaching of the robot partners may allow social robots to better adapt to their dynamic surroundings in response to guidance from their human partners. Humans might also be tele-teaching robots, such as service robots, from afar to enable fewer operators to manage a fleet of service robots. However, high-precision tele-operation tasks require human operators to undergo extensive training to achieve suitable levels of expertise to operate the robot, e.g., tele-operated precision surgery and tele-operation of equipment in confined spaces in manufacturing/assembly lines. Nevertheless, such human-in-the-loop control exploits the ability of human operators to perform complex tasks (such as object detection in cluttered environments) while simultaneously ensuring safety and stability.
An advantage of the robot-learning framework is that it allows novice human operators, who may be experts in the task but not in teaching a robot, to successfully achieve the task objectives, which can expand the usage and acceptability of robots in society. This is especially important in areas such as active prostheses/orthoses, where the primary reason impeding widespread adoption is difficulty of use for first-time users. Most often, problems in learning from the human arise not from the complexity of controlling the robot per se, but from the inherent limitations of the human response dynamics. These dynamics distort the intent of the human-in-the-loop, so the human actions are not sufficiently good reference trajectories for the robot to follow to achieve the intended goal. Therefore, inferring the intent behind the human operator's actions becomes important for human-in-the-loop robot learning, which is evident in human-robot collaborative tasks, e.g., when the human-robot team is collaboratively moving furniture. Finally, model-based approaches have been studied to capture the human response dynamics, in which case inverting such models yields an estimate of the human intent. The estimated intent can then be used with conventional iterative learning control to achieve the desired task.
In the following, the effect of the human response dynamics on output-tracking performance is presented, followed by approaches to correct these effects. Issues in the human-response-model estimation when designing the iterative learning control are presented along with experimental results.
Impact of Human-Response Dynamics on Output Tracking
with Kp = 1/Kc, τI = 0.2 s, τe = 0.55 s, for a reference trajectory with bandwidth fBW = 0.5 Hz. Additional details of this human-response dynamics modeling procedure and experimental results using this model are available in .
The plots in Figure 3 show that the tracking error is minimal at low frequencies and grows to be significant at high frequencies, mainly in terms of phase errors. Thus, imitation learning based on the human input (i.e., u = uh) will not be sufficient, and there is a need to infer the intended goal yd from the human operator's actions uh. The lack of access to the desired output yd can be problematic for iterative learning control (ILC), since the input update in ILC depends on the tracking error, yd - y. Thus, the primary challenge of robot learning with the human-in-the-loop is to overcome the limitation that the intended goal yd is not directly available.
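The frequency dependence described above can be illustrated with a minimal sketch. It assumes a generic crossover-type open loop, L(jω) = ωc·exp(-jωτ)/(jω), for the combined human and controlled system; this is not the parameterized model of this article, and the values `omega_c = 3.0` rad/s and `tau = 0.2` s are illustrative assumptions only. The sketch evaluates the closed-loop response from yd to y and reports its gain, phase, and relative tracking error.

```python
import cmath
import math


def crossover_closed_loop(omega, omega_c=3.0, tau=0.2):
    """Closed-loop response y/yd at frequency omega (rad/s).

    Assumed open loop (hypothetical values, not from the article):
    L(jw) = omega_c * exp(-j*w*tau) / (j*w).
    """
    s = 1j * omega
    open_loop = omega_c * cmath.exp(-s * tau) / s
    return open_loop / (1.0 + open_loop)


def tracking_report(freq_hz):
    """Gain, phase (degrees), and relative error |1 - y/yd| at freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    response = crossover_closed_loop(omega)
    return {
        "gain": abs(response),
        "phase_deg": math.degrees(cmath.phase(response)),
        "rel_error": abs(1.0 - response),
    }


for freq_hz in (0.02, 0.1, 0.5):
    report = tracking_report(freq_hz)
    print(
        f"{freq_hz:5.2f} Hz  gain={report['gain']:.3f}  "
        f"phase={report['phase_deg']:7.2f} deg  "
        f"rel_error={report['rel_error']:.3f}"
    )
```

Under these assumptions the gain stays close to one across the band while the phase lag grows with frequency, so most of the error at the higher frequencies comes from phase rather than amplitude.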
Correcting Human-Response Dynamics
The human-response dynamics can affect the precision achieved by the human during human-in-the-loop operation. For example, during kinesthetic teaching of collaborative robots, the performance can be affected by the human operator's inability to achieve the required movements. Conventionally, the robot motion is taken to be the average of multiple demonstrations for a particular task. The averaged motion can still be affected by the human-response dynamics and can therefore still differ from the user's intent. Therefore, inferring the human intent is important and necessary in the context of human-robot shared control.
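A small numerical sketch shows why averaging alone does not recover the intent. It assumes, purely for illustration, that each demonstration is the intended trajectory scaled by a fixed gain, delayed by a fixed lag, and corrupted by independent noise; the gain, lag, noise level, and number of demonstrations are hypothetical values, not data from this article.

```python
import math
import random


def demonstrate(intent, gain, lag_steps, noise_std, rng):
    """One demonstration: scaled, delayed copy of the intent plus noise.

    The delay wraps around, which is consistent here because the intent
    is exactly one period of a sinusoid.
    """
    n = len(intent)
    return [
        gain * intent[(i - lag_steps) % n] + rng.gauss(0.0, noise_std)
        for i in range(n)
    ]


def rms(values):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rng = random.Random(0)
n = 200
intent = [math.sin(2.0 * math.pi * i / n) for i in range(n)]

# Hypothetical human response: 10% amplitude loss and a 20-sample lag.
demos = [demonstrate(intent, 0.9, 20, 0.2, rng) for _ in range(50)]
average = [sum(d[i] for d in demos) / len(demos) for i in range(n)]

# Noise-free response, i.e. the systematic part of every demonstration.
systematic = demonstrate(intent, 0.9, 20, 0.0, rng)

err_vs_intent = rms([a - y for a, y in zip(average, intent)])
err_vs_systematic = rms([a - s for a, s in zip(average, systematic)])
print(f"RMS error of average vs intent:     {err_vs_intent:.3f}")
print(f"RMS error of average vs systematic: {err_vs_systematic:.3f}")
```

Averaging suppresses the random variation between demonstrations, but the averaged motion converges to the systematically lagged and attenuated response rather than to the intended trajectory.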
where G is the controlled system, and GH is the known human-response model in Figure 2 considering only the compensatory feedback channel of the human response, i.e., uh(·) = GH(·) e(·) = GH(·) (yd(·) - y(·)). Briefly, the derivation of the update law in (6) follows from finding the control input uc,k+1 that results in making the error small in the next iteration step, see  for details. The convergence of such ILC algorithms for human-in-the-loop trajectory tracking depends on the modeling error of the closed-loop system.
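The reasoning behind such an update can be sketched as follows. This is an illustrative derivation, not necessarily the update law (6) of this article. It assumes a frequency-domain description, exact and invertible models G and GH at the frequencies of interest, and that the learned input uc,k and the human input uh,k add at the input of the controlled system (an assumption about the loop structure).

```latex
\begin{align}
  % Assumed loop structure at iteration k
  y_k &= G\,\bigl(u_{c,k} + u_{h,k}\bigr), &
  u_{h,k} &= G_H\,\bigl(y_d - y_k\bigr), \\
  % Inverting the human-response model recovers the intent
  \hat{y}_d &= y_k + G_H^{-1}\,u_{h,k}, \\
  % Input that would achieve y_{k+1} = \hat{y}_d without human correction
  u_{c,k+1} &= G^{-1}\,\hat{y}_d
             = u_{c,k} + u_{h,k} + G^{-1} G_H^{-1}\,u_{h,k}.
\end{align}
```

With exact models, this choice would remove the tracking error in one step. With model error, the error contracts only when the modeled loop is close enough to the true one, which is consistent with the dependence of convergence on modeling error noted above.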
As discussed earlier, the human-response model GH depends on the type of system G being controlled.
1) When the system being controlled G is a constant, or a first or second order system, parameterized approaches such as the Crossover Pilot models are effective in developing iterative control approaches as shown in . Nominal parameter models describe human feedback performance well up to a limited frequency range.
2) For more complex controlled systems G, typical parametric human models are not available. In this case, two options are available: (i) use more general human models that are valid for complex controlled systems, or (ii) modify the apparent controlled system GA seen by the human operator to be of the type (such as a constant or a first or second order system) for which typical parametric human models are still available, as illustrated in Figure 4. The latter option can be achieved with conventional methods such as model-reference control, inverse control, or even feedback control. Preview-based online inversion is a viable technique that allows for stable online inversion of complex linear controlled systems, even those for which stable causal inverses do not exist, such as non-minimum phase systems. Details of this system inversion technique for human-in-the-loop trajectory tracking are described in .
3) Parametric human models are restricted to a small range of frequencies, above which they deviate from actual human response. This limits the range of frequencies that can be tracked when using such models. Recent work has focused on extending the range of tracking frequencies, mainly by using online data-based modeling approaches. Additionally, such data-based approaches also make it possible to estimate more general human response models, i.e., involving all possible input channels shown in Figure 2, such as the feed-forward (yd as input) and internal loop (y as input) channels in addition to the compensatory feedback channel (e = yd - y as input). Moreover, the estimated models can be improved when more data becomes available during the robot operations, allowing for improved personalization over time.
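The effect of human-model error on the learning iterations can be illustrated with a single-frequency sketch. It assumes an inversion-type update uc,k+1 = uc,k + uh,k + uh,k/(G_model·GH_model), which is an illustrative choice and not necessarily the update law of this article. The controlled system, the human response, and the model errors below are hypothetical values chosen only to contrast an accurate human model with one whose phase is far off at the tracked frequency.

```python
import cmath
import math


def run_ilc(G, GH, G_model, GH_model, yd=1.0 + 0.0j, iterations=6):
    """Iterate a one-frequency ILC loop and return |yd - y_k| per iteration.

    All signals are complex amplitudes at a single frequency.
    """
    uc = 0.0j
    errors = []
    for _ in range(iterations):
        # Closed loop with the human: y = G * (uc + GH * (yd - y)).
        y = G * (uc + GH * yd) / (1.0 + G * GH)
        uh = GH * (yd - y)
        errors.append(abs(yd - y))
        # Infer the intent error by inverting the human-response model.
        e_hat = uh / GH_model
        # Assumed update: absorb the human input and invert the system model.
        uc = uc + uh + e_hat / G_model
    return errors


omega = math.pi                         # 0.5 Hz, in rad/s
G = 1.0 / (1.0 + 1j * omega)            # hypothetical first-order system
GH = 2.0 * cmath.exp(-0.2j * omega)     # hypothetical gain-plus-delay human
G_model = 1.1 * G                       # 10% gain error in the system model

# Accurate human model: small phase error at this frequency.
good = run_ilc(G, GH, G_model, GH * cmath.exp(0.1j))
# Poor human model: large phase error, as outside its valid frequency range.
bad = run_ilc(G, GH, G_model, GH * cmath.exp(2.5j))

print("accurate model:", [f"{e:.2e}" for e in good])
print("poor model:    ", [f"{e:.2e}" for e in bad])
```

In this sketch the error shrinks at every iteration with the accurate model and grows with the poor one, which mirrors the point that the frequency range over which the human model is valid bounds the range that can be tracked.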
Results from Iterative Learning
The results of applying the inversion-based iterative learning scheme to the human-in-the-loop trajectory tracking task in Figure 2 are presented next. Figure 5 shows that the output tracking performance improves with respect to the manual tracking performance when inverse control is applied (about 70% reduction in tracking error). When the iterative learning control law is applied, further tracking improvement is achieved (about 20% additional reduction in tracking error). Thus, the learned control input can successfully emulate the human intent.