This article explores the concept of inferring intent during human-in-the-loop robot learning for output tracking. The human-response dynamics can affect the precision achieved by the human during human-in-the-loop operation. Preview-based online inversion is a viable technique that allows for stable online inversion of complex linear controlled systems, even those for which stable causal inverses do not exist, such as non-minimum phase systems. The averaged motion can still be affected by the human-response dynamics and can therefore still differ from the user’s intent. Therefore, inferring the human intent is important and necessary in the context of human-robot shared control. The results of applying the inversion-based iterative learning scheme to the human-in-the-loop trajectory tracking task are also presented in the article. Figures show that the output tracking performance improves with respect to the manual tracking performance when inverse control is applied. When the iterative learning control law is applied, further tracking improvement is achieved. Thus, the learned control input can successfully emulate the human intent. An advantage of the robot-learning framework is that it allows novice human operators, who may be experts in the task, but not in teaching a robot, to successfully achieve the task objectives, which can expand the usage and acceptability of robots in society.

## Article

Various studies over the years, focused on understanding the mechanisms of learning in humans, have agreed on the notion that primarily, new skills are learned by watching successful demonstrations by a teacher, followed by imitation and practice until the skill is mastered [4]. This is very commonly seen in infants and toddlers [1] as illustrated in Figure 1. Moreover, even in the animal world, such as in Capuchin monkeys, observations of a more proficient individual may benefit novices when subsequently acting on their own [2]. Formally, such a learning scheme is referred to as Learning from Demonstration (LfD) or Teaching by Demonstration (TbD) [5], [3], and is sometimes colloquially referred to by the endearing term, “monkey see; monkey do”. Often the teacher might also physically hold the student's hand and demonstrate how to perform a task. For example, a tennis instructor might hold a student's hand and go through a tennis stroke as opposed to the instructor just displaying the stroke. This is referred to as kinesthetic teaching where the student's attention is directed to the effects of their movements while the learning is reinforced by real-time teacher feedback [7].

Learning is also important as robots are being envisioned as social partners in a range of applications, as home caregivers to the elderly and as co-workers in manufacturing. In such scenarios, kinesthetic teaching of the robot partners may allow social robots to better adapt to their dynamic surroundings in response to the guidance from their human partners. Humans might also be tele-teaching robots, such as service robots [8], from afar to enable fewer operators to manage a fleet of service robots. However, high precision tele-operation tasks require human operators to undergo extensive training to achieve suitable levels of expertise to operate the robot, e.g., tele-operated precision surgery [9], tele-operation of equipment in confined spaces in manufacturing/assembly lines [10]. Nevertheless, such human-in-the-loop control exploits the ability of human operators to perform complex tasks (such as object detection in cluttered environments) while simultaneously ensuring safety and stability.

An advantage of the robot-learning framework is that it allows novice human operators, who may be experts in the task, but not in teaching a robot, to successfully achieve the task objectives, which can expand the usage and acceptability of robots in society. This is especially important in areas such as active prostheses/orthoses where the primary reason impeding widespread adoption is difficulty of use for first-time users [11]. Most often, problems in learning from the human arise not due to the complexity of controlling the robot per se, but due to the inherent limitations of the human response dynamics, which modify the intent of the human-in-the-loop so that the human actions are not sufficiently good reference trajectories that the robot should follow to achieve the intended goal. Therefore, inferring the intent behind the human operator's actions becomes important for human-in-the-loop robot learning, which is evident in human-robot collaborative tasks, e.g., when the human-robot team is collaboratively moving furniture [12]. Finally, model-based approaches have been studied to capture the human response dynamics [13], [14], in which case inverting such models results in estimating the human intent [15]. The estimated intent can then be used with conventional iterative learning control to achieve the desired task.

In the following, the effect of the human response dynamics on output-tracking performance is presented, followed by approaches to correct these effects. Issues in the human-response-model estimation when designing the iterative learning control are presented along with experimental results.

## Impact of Human-Response Dynamics on Output Tracking

To understand the effect of the human response dynamics on robot learning, consider a human-in-the-loop trajectory-tracking task as shown in Figure 2, where the human operator performs the role of a feedback controller in operating a robotic controlled system G to get the output y of the controlled system to match the intended goal yd. This is a fairly typical human-in-the-loop configuration that is commonly observed, e.g., with aircraft pilots, vehicle drivers, and heavy machinery operators. The human response dynamics (GH in Figure 2) in such a scenario have been well studied in the literature, and various empirical and analytical models exist that describe the human operator's feedback characteristics. For example, the Crossover Model, first introduced by McRuer et al. [16], describes an analytical transfer function model where the parameter values and the specific model structure depend on both the specific human's characteristics and the controlled system G. It is sometimes referred to as an analytical-verbal model, where the analytical part of the model, GH in Figure 2, is specified as,
$G_H(s) = \frac{u_h(s)}{e(s)} = K_p \left[ \frac{\tau_L s + 1}{\tau_I s + 1} \right] e^{-\tau_e s}$
(1)
for s = jω and $j=\sqrt{-1}$. The parameters in the model are: (1) Kp : operator static gain (including the human-machine interface gain K1), (2) τe : effective time delay, (3) τL : lead-time constant, (4) τI : lag-time constant. The term inside the brackets is called the equalization characteristic, whose form depends on the type of controlled system G and the bandwidth of the reference signal yd. The verbal part of the model refers to empirically validated adjustment rules to determine the specific form of the equalization characteristic for different task conditions. As an example, consider a controlled system,
$G(s) = K_c$
(2)
where Kc is some constant, and suppose the reference trajectory yd has a bandwidth fBW =0.5 Hz (which is also the typical bandwidth of human visual smooth pursuit tracking [17]). The human-machine interface can be designed to set the operator static gain, Kp≈ 1/Kc . For a constant controlled system, G = Kc , the equalization characteristic has the form 1/(τIs + 1). Then, the human-feedback transfer function model for the constant controlled system becomes,
$G_H(s) = \frac{K_p}{\tau_I s + 1} e^{-\tau_e s}$
(3)

with Kp = 1/Kc , τI = 0.2 s, τe =0.55 s, for a reference trajectory with bandwidth fBW =0.5 Hz. Additional details of this human-response dynamics modeling procedure and experimental results using this model are available in [18].
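To make the size of the human-induced lag concrete, the model in (3) can be evaluated along s = jω. The following is a minimal sketch, assuming Kc = 1 (so Kp = 1) and the nominal values τI = 0.2 s and τe = 0.55 s quoted above; the function name `human_response` is illustrative and does not come from [15] or [18].

```python
import numpy as np

def human_response(freq_hz, K_p=1.0, tau_I=0.2, tau_e=0.55):
    """Evaluate the human-response model of Eq. (3) at s = j*2*pi*f.

    Defaults are the nominal values quoted in the text, with K_c = 1
    assumed so that K_p = 1/K_c = 1.
    """
    s = 1j * 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    return K_p / (tau_I * s + 1.0) * np.exp(-tau_e * s)

# Gain and phase over the tracking band discussed in the article.
freqs_hz = np.array([0.05, 0.1, 0.25, 0.5])
response = human_response(freqs_hz)
gain = np.abs(response)
phase_deg = np.degrees(np.unwrap(np.angle(response)))

for f, g, p in zip(freqs_hz, gain, phase_deg):
    print(f"f = {f:4.2f} Hz   gain = {g:5.3f}   phase = {p:7.1f} deg")
```

Under these assumptions, the lag term contributes arctan(2π f τI) and the delay contributes 2π f τe of phase lag, which at 0.5 Hz add up to roughly 131 degrees, while the gain stays near 0.85. The delay, rather than the gain roll-off, is therefore the dominant distortion in this sketch.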

A simulation shows that the effect of the human operator on the closed-loop tracking performance is significant even when the controlled system in Figure 2 is unity, i.e., G = 1, and the control input is zero, i.e., uc = 0. This represents the situation when the output y of the controlled system G, such as a robot, is able to exactly track the commanded signal u, i.e., y = u. The desired output trajectory yd is chosen to mimic a reach-retract movement with sinusoidal acceleration components of frequency f0 as,
$\frac{d^2 y_d}{dt^2}(t) = \begin{cases} A \sin\left(2\pi f_0 (t - 0.1T)\right), & \text{for } t \in [0.1T,\, 0.4T], \\ -A \sin\left(2\pi f_0 (t - 0.6T)\right), & \text{for } t \in [0.6T,\, 0.9T], \\ 0, & \text{otherwise.} \end{cases}$
(4)
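The reach-retract reference in (4) can be generated numerically by integrating the acceleration twice. The sketch below is illustrative only: it assumes zero initial position and velocity, and it assumes f0 = 1/(0.3T) so that each sinusoid completes exactly one period within its window. That relation between f0 and T is an assumption made here, not something stated in the text, and the values of T and A are arbitrary.

```python
import numpy as np

def reach_retract(T=10.0, A=1.0, n=10001):
    """Build y_d from the piecewise acceleration of Eq. (4).

    Assumptions (not from the article): f0 = 1/(0.3*T), zero initial
    position and velocity, trapezoidal integration on a uniform grid.
    """
    f0 = 1.0 / (0.3 * T)
    t = np.linspace(0.0, T, n)
    acc = np.zeros_like(t)

    reach = (t >= 0.1 * T) & (t <= 0.4 * T)
    retract = (t >= 0.6 * T) & (t <= 0.9 * T)
    acc[reach] = A * np.sin(2.0 * np.pi * f0 * (t[reach] - 0.1 * T))
    acc[retract] = -A * np.sin(2.0 * np.pi * f0 * (t[retract] - 0.6 * T))

    dt = t[1] - t[0]
    # Cumulative trapezoidal integration: acceleration -> velocity -> position.
    vel = np.concatenate(([0.0], np.cumsum(0.5 * (acc[1:] + acc[:-1]) * dt)))
    pos = np.concatenate(([0.0], np.cumsum(0.5 * (vel[1:] + vel[:-1]) * dt)))
    return t, acc, vel, pos

t, acc, vel, y_d = reach_retract()
print(f"peak reach = {y_d.max():.3f}, final position = {y_d[-1]:.2e}")
```

With this choice of f0, the velocity returns to zero at the end of each window, so the output rises during the reach, holds between 0.4T and 0.6T, and returns to the start during the retract. A higher f0 for the same amplitude of motion corresponds to a faster movement, which is the regime where the human-response lag matters most.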

The plots in Figure 3 show that at low frequencies, the tracking error is minimal and grows to be significant at high frequencies, mainly in terms of phase errors. Thus, imitation learning based on the human input (i.e., u = uh) will not be sufficient and there is a need to infer the intended goal yd from the human operator's actions uh. The lack of access to the desired output yd can be problematic for iterative learning control (ILC), since the input update in ILC depends on the tracking error, yd - y. Thus, the primary challenge of robot learning with the human-in-the-loop is to overcome the limitation that the intended goal yd is not directly available.

## Correcting Human-Response Dynamics

The human-response dynamics can affect the precision achieved by the human during human-in-the-loop operation. For example, during kinesthetic teaching of collaborative robots, the performance can be affected by the human operator's inability to achieve the required movements. Conventionally, the robot motion is taken to be the average of multiple demonstrations for a particular task. However, the averaged motion can still be affected by the human-response dynamics and can therefore still differ from the user's intent. Therefore, inferring the human intent is important and necessary in the context of human-robot shared control.

The main idea in the intent-inference scheme is that a known human model GH may be inverted to obtain the intended goal yd from the human input uh and the measured output y as,
$y_d(\cdot) = G_H^{-1}(\cdot)\, u_h(\cdot) + y(\cdot)$
(5)
as shown in [15]. However, practical application of this model-inversion technique suffers from modeling inaccuracies. In such situations, iterative learning control (ILC) [19] has been shown to result in improved tracking. Specifically, the proposed ILC update law for human-in-the-loop tracking is given point-wise at each frequency ω as,
$u_{c,k+1}(\omega) = u_{c,k}(\omega) + \rho(\omega)\, \hat{G}_{fb}^{-1}(\omega)\, u_{h,k}(\omega)$
(6)
for the k-th iteration step, where uc is the learned control input, uh is the human input, ρ is the iteration gain, and $\hat{G}_{fb}$ is the known model of the closed-loop system Gfb, given at each frequency ω as [15]
$G_{fb}(\omega) = \frac{G(\omega)\, G_H(\omega)}{1 + G(\omega)\, G_H(\omega)}$
(7)

where G is the controlled system, and GH is the known human-response model in Figure 2 considering only the compensatory feedback channel of the human response, i.e., uh (·) = GH (·)e(·) = GH (·)(yd (·)-y(·)). Briefly, the derivation of the update law in (6) follows from finding the control input uc,k+1 that results in making the error small in the next iteration step, see [15] for details. The convergence of such ILC algorithms for human-in-the-loop trajectory tracking depends on the modeling error of the closed-loop system [15], [18].
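The intent inference in (5) and the update law in (6) and (7) can be exercised point-wise in frequency with a few lines of code. The sketch below is a toy illustration rather than the implementation of [15]. It assumes that the human and learned inputs add at the input of the controlled system (u = uh + uc), that G = 1, and that the "actual" operator differs from the nominal model by hypothetical parameter offsets chosen only for this example. The iteration gain ρ = 0.5 is likewise an arbitrary choice.

```python
import numpy as np

def human_model(omega, K_p, tau_I, tau_e):
    """Compensatory human-response model of Eq. (3) at s = j*omega."""
    s = 1j * omega
    return K_p / (tau_I * s + 1.0) * np.exp(-tau_e * s)

def closed_loop(y_d, u_c, G, G_H):
    """Assumed loop: y = G*(u_h + u_c) with u_h = G_H*(y_d - y)."""
    y = G * (G_H * y_d + u_c) / (1.0 + G * G_H)
    u_h = G_H * (y_d - y)
    return y, u_h

def infer_intent(u_h, y, G_H_hat):
    """Eq. (5): invert the human model to estimate the intended output."""
    return u_h / G_H_hat + y

def ilc_update(u_c, u_h, G_fb_hat, rho):
    """Eq. (6): point-wise update of the learned input."""
    return u_c + rho * u_h / G_fb_hat

omega = 2.0 * np.pi * np.linspace(0.05, 0.5, 10)   # band up to 0.5 Hz
G = np.ones_like(omega, dtype=complex)             # G = 1, as in the text
# Hypothetical "actual" operator versus the nominal model used for learning.
G_H_true = human_model(omega, K_p=0.9, tau_I=0.22, tau_e=0.60)
G_H_hat = human_model(omega, K_p=1.0, tau_I=0.20, tau_e=0.55)
G_fb_true = G * G_H_true / (1.0 + G * G_H_true)
G_fb_hat = G * G_H_hat / (1.0 + G * G_H_hat)       # Eq. (7) with the model

y_d = np.ones_like(omega, dtype=complex)           # unit-amplitude intent
u_c = np.zeros_like(omega, dtype=complex)
rho = 0.5

# Manual tracking (u_c = 0) and the intent estimates it yields.
y0, u_h0 = closed_loop(y_d, u_c, G, G_H_true)
intent_exact_model = infer_intent(u_h0, y0, G_H_true)
intent_nominal_model = infer_intent(u_h0, y0, G_H_hat)

error_history = []
for k in range(10):
    y, u_h = closed_loop(y_d, u_c, G, G_H_true)
    error_history.append(np.abs(y_d - y).max())
    u_c = ilc_update(u_c, u_h, G_fb_hat, rho)
y, u_h = closed_loop(y_d, u_c, G, G_H_true)
error_history.append(np.abs(y_d - y).max())

# Per-frequency factor by which the residual error shrinks each iteration.
contraction = np.abs(1.0 - rho * G_fb_true / G_fb_hat)
print("worst-case contraction factor:", contraction.max())
print("max tracking error per iteration:", np.round(error_history, 4))
```

Under the assumed loop structure, substituting uh = GH (yd - y) into (6) shows that the residual yd - G uc is multiplied by 1 - ρ Gfb / Ĝfb at each frequency in every iteration, so the sketch converges wherever that factor has magnitude below one. This is consistent with the statement that convergence depends on the modeling error of the closed-loop system. With an exact human model the intent is recovered exactly from (5), whereas the nominal model leaves a residual that the iterations then reduce.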

## Model Selection

As discussed earlier, the human-response model GH depends on the type of system G being controlled.

1) When the system being controlled G is a constant, or a first or second order system, parameterized approaches such as the Crossover Pilot models are effective in developing iterative control approaches as shown in [15]. Nominal parameter models describe human feedback performance well up to a limited frequency range.

2) For more complex controlled systems G, typical parametric human models are not available. In this case, two options are available: (i) use more general human models that are valid for complex controlled systems [20], [13], [21], or (ii) modify the apparent controlled system GA seen by the human operator to be of the type (such as a constant or a first or second order system) for which typical parametric human models are still available, as illustrated in Figure 4. The latter option can be achieved with conventional methods such as model-reference control, inverse control, or even feedback control. Preview-based online inversion is a viable technique that allows for stable online inversion of complex linear controlled systems, even those for which stable causal inverses do not exist, such as non-minimum phase systems. Details of this system inversion technique for human-in-the-loop trajectory tracking are described in [18].

3) Parametric human models are restricted to a small range of frequencies, above which they deviate from actual human response. This limits the range of frequencies that can be tracked when using such models. Recent work has focused on extending the range of tracking frequencies, mainly by using online data-based modeling approaches, e.g., [22]. Additionally, such data-based approaches also make it possible to estimate more general human response models, i.e., involving all possible input channels shown in Figure 2, such as the feed-forward (yd as input) and internal loop (y as input) channels in addition to the compensatory feedback channel (e = yd - y as input) [23]. Moreover, the estimated models can be improved when more data becomes available during the robot operations, allowing for improved personalization over time.

## Results from Iterative Learning

The results of applying the inversion-based iterative learning scheme to the human-in-the-loop trajectory tracking task in Figure 2 are presented next. Figure 5 shows that the output tracking performance improves with respect to the manual tracking performance when inverse control is applied (about 70% reduction in tracking error). When the iterative learning control law is applied, further tracking improvement is achieved (about 20% additional reduction in tracking error). Thus, the learned control input can successfully emulate the human intent.

## References

1. Andrew N Meltzoff, Patricia K Kuhl, Javier Movellan, and Terrence J Sejnowski. “Foundations for a new science of learning.” Science, 325(5938), 2009. Pages 284-288.
2. DM Fragaszy, Dora Biro, Y Eshchar, Tatyana Humle, P Izar, B Resende, and E Visalberghi. “The fourth dimension of tool use: temporally enduring artefacts aid primates learning to use tools.” Phil. Trans. R. Soc. B, 368(1630):20120410, 2013.
3. Stefan Schaal. “Dynamic movement primitives-a framework for motor control in humans and humanoid robotics.” In Adaptive Motion of Animals and Machines, Springer, 2006, pages 261-280.
4. Aude Billard, Sylvain Calinon, Ruediger Dillmann, and Stefan Schaal. “Robot programming by demonstration.” In Springer handbook of robotics, Springer, 2008, pages 1371-1394.
5. Chrystopher L Nehaniv and Kerstin Dautenhahn. Imitation and social learning in robots, humans and animals: behavioural, social and communicative dimensions. Cambridge University Press, 2007.
6. Brenna D Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. “A survey of robot learning from demonstration.” Robotics and autonomous systems, 57(5), 2009. Pages 469-483.
7. Gabriele Wulf and Wolfgang Prinz. “Directing attention to movement effects enhances learning: A review.” Psychonomic bulletin & review, 8(4), 2001. Pages 648-660.
8. David Gouaillier and Pierre Blazevic. “A mechatronic platform, the aldebaran robotics humanoid robot.” In IEEE Industrial Electronics, IECON 2006-32nd Annual Conference on, IEEE, 2006, pages 4049-4053.
9. Allison M Okamura. “Methods for haptic feedback in teleoperated robot-assisted surgery.” Industrial Robot: An International Journal, 31(6), 2004. Pages 499-508.
10. Rob Buckingham, Vilas Chitrakaran, Rosalind Conkie, Geoff Ferguson, Andrew Graham, Alex Lazell, Mariusz Lichon, Nick Parry, Fred Pollard, Amir Kayani, et al. “Snake-arm robots: a new approach to aircraft assembly.” Technical report, SAE Technical Paper, 2007.
11. Elaine A Biddiss and Tom T Chau. “Upper limb prosthesis use and abandonment: a survey of the last 25 years.” Prosthetics and orthotics international, 31(3), 2007. Pages 236-257.
12. Stefanos Nikolaidis, David Hsu, and Siddhartha Srinivasa. “Human-robot mutual adaptation in collaborative tasks: Models and experiments.” The International Journal of Robotics Research, doi:10.1177/0278364917690593, 2017.
13. Richard J Wasicko, Duane T McRuer, and Raymond E Magdaleno. “Human pilot dynamic response in single-loop systems with compensatory and pursuit displays.” Technical report, DTIC Document, 1966.
14. Bo Yu. Interaction dynamics in oscillator and human-in-the-loop systems. PhD thesis, The University of Michigan, 2014.
15. Rahul B Warrier and Santosh Devasia. “Iterative learning from novice human demonstrations for output tracking.” IEEE Transactions on Human-Machine Systems, 46(4), 2016. Pages 510-521.
16. Duane T McRuer and Ezra S Krendel. “The human operator as a servo system element.” Journal of the Franklin Institute, 267(6), 1959. Pages 511-536.
17. A Terry Bahill and Jack D McDonald. “Smooth pursuit eye movements in response to predictable target motions.” Vision research, 23(12), 1983. Pages 1573-1583.
18. Rahul B Warrier and Santosh Devasia. “Inferring intent for novice human-in-the-loop iterative learning control.” IEEE Transactions on Control Systems Technology, DOI: 10.1109/TCST.2016.2628769, 2016.
19. Suguru Arimoto, Kawamura, and Fumio Miyazaki. “Bettering operation of robots by learning.” Journal of Field Robotics, 1(2), 1984. Pages 123-140.
20. DL Kleinman, S Baron, and WH Levison. “An optimal control model of human response part i: Theory and validation.” Automatica, 6(3), 1970. Pages 357-369.
21. Mitsuo Kawato. “Feedback-error-learning neural network for supervised motor learning.” 6(3), 1990. Pages 365-372.
22. Jonathan Realmuto, Rahul B Warrier, and Santosh Devasia. “Iterative learning control for human-robot collaborative output tracking.” In Mechatronic and Embedded Systems and Applications (MESA), 2016 12th IEEE/ASME International Conference on, IEEE, 2016, pages 1-6.
23. Rahul B Warrier and Santosh Devasia. “Data-based iterative human-in-the-loop robot-learning for output tracking.” In Preprint accepted to 20th IFAC World Congress.