## Abstract

Computational human body models (HBMs) are important tools for predicting human biomechanical responses in automotive crash environments. In many scenarios, the prediction of the occupant response will be improved by incorporating active muscle control into the HBMs to generate biofidelic kinematics during different vehicle maneuvers. In this study, we have proposed an approach to develop an active muscle controller based on reinforcement learning (RL). The RL muscle activation control (RL-MAC) approach is a shift from using traditional closed-loop feedback controllers, which can mimic accurate active muscle behavior only under the limited range of loading conditions for which the controller has been tuned. Conversely, the RL-MAC uses an iterative training approach to generate active muscle forces for desired joint motion and is analogous to how a child develops gross motor skills. In this study, the ability of a deep deterministic policy gradient (DDPG) RL controller to generate accurate human kinematics is demonstrated using a multibody model of the human arm. The arm model was trained to perform goal-directed elbow rotation by activating the responsible muscles, and two muscle recruitment schemes were investigated: actuation of independent muscles and actuation of antagonistic muscle groups. Simulations with the trained controller show that the arm can move to the target position in the presence or absence of externally applied loads. The RL-MAC trained under constant external loads was able to maintain the desired elbow joint angle under a simplified automotive impact scenario, implying the robustness of the motor control approach.

## Introduction

Motor control mechanisms in humans manage and modify the stiffness of skeletal joints by generating active muscle forces as determined by the central nervous system (CNS). Active muscle forces enable the human body to maintain posture and balance, perform motor tasks, and react to external perturbations. In automotive loading scenarios such as pre-impact bracing in low-severity crashes, active muscle forces may alter the occupant response and injury modes by changing the body kinematics during the loading phase or by stiffening the joints. Computational human body models (HBMs) are extensively used in the automotive industry during the vehicle design phase to predict the response of occupants and pedestrians in injurious motor vehicle collisions (MVCs). Incorporating active motor control mechanisms into HBMs will help improve our understanding of the mechanisms and tolerances of injury and will help accelerate the development of injury countermeasures.

Previous computational studies in vehicle safety research have widely used two approaches for modeling muscle control in HBMs. The first approach involves the application of a predetermined activation-time history to the muscle groups responsible for carrying out specific motions about a joint. This simplified approach has been utilized in both multibody (MB) models [1,2] and finite element (FE) models [3–5] and has demonstrated the effect of muscle activation on the response of HBMs under external loads. Some studies have also derived an optimized activation scheme for muscles to maintain the corresponding joint at a predefined position [6]. Unfortunately, these predetermined activation schemes have limited utility for general-use applications of HBMs, and these approaches cannot be used for loading cases where the activation patterns or targeted kinematics are not known a priori.

The second control approach, which has become state-of-the-art in active muscle control for HBMs used in automotive applications, is based on closed-loop feedback (PID) mechanisms that drive muscle activation levels using an error signal based on current and target joint positions or muscle lengths. These controllers are designed to output muscle forces or activation levels that restore the HBM to a desired position. Kistemaker integrated a PID controller into a MB model of the upper arm to reproduce fast human-like elbow motion [7,8], using the error calculated as the difference between muscle length and a target muscle length (equilibrium point control). The number of muscles was reduced by combining different muscle units into four lumped muscles, with two of the lumped muscles responsible for elbow extension and the remaining two for flexion. The output from the PID controller was the muscle stimulation for each group. One limitation of using target muscle length for control is that it is difficult to correlate the muscle length with different values of the target joint angle. Östh used a similar approach to model musculoskeletal control in an FE model of the arm [9]. In the FE model, nine different muscle units were modeled, and the control approach grouped the muscles as extensors and flexors. The PID controller was developed using the joint angle error as the input, and the control signal was used to derive the muscle activation level for each group. PID controllers have also been used on whole-body HBMs to predict the occupant response during motor vehicle load cases. Östh et al. used three PID controllers with head, neck, and lumbar spine angles as the error signals [10] for posture control during a frontal crash. Iwamoto et al. used an FE model with PID controllers to control the kinematics of an occupant in a low-severity side impact scenario [11].
Twenty-two PID controllers were modeled to control the joint motion, with the joint angles as error signals and the muscles classified into right or left control groups. Martynenko et al. used PID controllers with 370 active muscles of the neck, torso, and upper extremities [12]. The error signal for each PID was based on target muscle lengths to predict the body kinematics of the occupant under a combination of emergency braking and lane change maneuvers. Inkol et al. used nonlinear torque generators at HBM joints in place of the control of Hill-type muscles [13]. Parameter and control optimization was performed to simulate athlete performance in golf, cycling, and wheelchair propulsion. Walter et al. [14] developed a hierarchical control architecture with PID controllers to maintain posture and simulate squat movements using a full-body musculoskeletal model with 36 muscles.

The feedback-based PID control mechanisms require precise tuning of the controllers to fit validation data generated by volunteer testing. Although useful under many circumstances, the PID control approach suffers from two major limitations: (1) the controllers are tuned for a limited range of possible external loads and may not output accurate responses beyond the loading scenarios they are tuned for, and (2) the feedback controllers require a predefined muscle recruitment strategy specifying how muscles are organized into agonist or antagonist groups for the preferred joint kinematics. The human musculoskeletal system is redundant in nature; there are more actuators (muscles) than kinematic degrees-of-freedom (DOF) at the joints. For simple joints, assumptions can be made regarding extensor and flexor groups; however, for more complex body regions, it is not feasible to isolate individual muscles responsible for motion along any standard joint direction for a generalized response. Rather, movements are carried out by intricate coordination of different muscles activated at different points in time by the CNS [15–17], which is difficult to replicate using linear feedback gains [18,19].

The present study aims to use reinforcement learning (RL) for motor control in HBMs. Deep RL algorithms are recent advances in the field of machine learning that use an iterative approach to train a controller to generate desired outputs through a system of rewards and penalties [20]. Reinforcement learning is a biologically inspired learning routine that allows the controller (called the agent) to identify the optimal sequence of actions to take from a given state of the control environment to achieve a predefined goal [21]. State refers to the parameters that define the control environment. Reinforcement learning enables the control model to learn how muscle actuation affects joint kinematics by rewarding good responses and penalizing bad ones. This process is analogous to how children learn to use and coordinate their muscles to eventually interact with the environment around them. Deep RL combines the performance of neural networks with reinforcement learning algorithms that enable the agent to reach the desired objective [22]. Neural networks (NNs) can be considered function approximators that map a state-action pair to its corresponding value. In an RL problem, the neural networks are trained to predict the efficacy of each state-action pair and take the best possible action. The efficacy of actions is quantified using a reward function, which rewards desirable actions (actions that move the system closer to its objective) and penalizes undesirable ones.

In this study, a deep deterministic policy gradient (DDPG) agent is used, which belongs to the class of RL algorithms called actor-critic methods [23]. One advantage of DDPG is that it can be used in continuous action spaces [24,25]. DDPG concurrently trains two neural networks to identify and output the best action. The actor network is the policy network that maps the state parameters to actions. The critic network updates the Q-value based on the state parameters and actions from previous time-steps. The Q-value of a state-action pair is the cumulative reward that the agent is expected to receive when it takes the given action from the present state. The actor network outputs the action that maximizes the expected reward.
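As an illustrative sketch (not the implementation used in this study), the critic's learning target in DDPG can be written as y = r + γ·Q′(s′, μ′(s′)), where μ′ and Q′ are the target actor and critic networks. The toy linear "networks" and weights below are placeholders chosen only to make the example self-contained:

```python
import numpy as np

def td_target(reward, next_state, target_actor, target_critic, gamma=0.99, done=False):
    """Critic training target for one transition: y = r + gamma * Q'(s', mu'(s'))."""
    if done:
        return reward
    next_action = target_actor(next_state)                    # mu'(s')
    return reward + gamma * target_critic(next_state, next_action)

# Toy linear stand-ins for the target networks (hypothetical weights)
w_actor = np.array([0.1, -0.2, 0.05, 0.0])       # 4 state parameters -> 1 action
w_critic = np.array([0.3, 0.1, -0.1, 0.2, 0.5])  # (state, action) -> Q-value

actor = lambda s: float(np.clip(w_actor @ s, 0.0, 1.0))   # stimulation in [0, 1]
critic = lambda s, a: float(w_critic @ np.append(s, a))

s_next = np.array([1.2, 0.0, -0.3, 0.1])
y = td_target(reward=-0.5, next_state=s_next, target_actor=actor, target_critic=critic)
```

In the full algorithm, the critic is regressed toward y and the actor is updated along the gradient of the critic's Q-value with respect to the action [23].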

Deep RL algorithms have demonstrated human-level performance in video games [22] and in learning complex games such as Go from scratch [26]. RL control has also been used in robotics for navigation in complex environments [27–30]. HBMs present a further control challenge due to the physiological redundancy and nonlinearity of the musculoskeletal system.

Previously, RL control schemes have been used for simulating arm reaching tasks [31–34], for generating motion about the shoulder joint [35], and for maintaining the stability of joints under gravity [36,37]. Deep RL algorithms have also been used for synthesizing locomotion with multibody models of humans [38,39] and animals [40], for controlling the kinematics of a neck FE model in the sagittal plane under a rear impact scenario [41], and for aiding the design and analysis of limb-assistive exoskeletons [42]. Human beings can adapt to changes in external loads [18,19], and the ability of RL agents to replicate such adaptive responses has been studied previously [43]. While previous RL musculoskeletal control studies for motion-related tasks have been performed in detail [31–33,39,44,45], the ability of RL control mechanisms to extend to dynamic events such as automotive impacts, where the response time is much faster and the environment is more chaotic, remains to be studied. The biofidelity of muscle activation patterns corresponding to joint stability in such cases and the effect of changes in the external environment on muscle synergy also need to be investigated.

The current study makes use of the DDPG algorithm to model and integrate active muscle control in HBMs, with the aim of evaluating the biofidelity of the controller and verifying its adaptability in automotive environments. For this purpose, a MB model of the human arm with the anatomy of an average (50th percentile) male was developed to demonstrate the utility and implementation of this active muscle modeling approach using a simplistic anatomical model. The reinforcement-learned muscle activation controller (henceforth referred to as the RL-MAC) developed in this study was used to simulate desired motion at the elbow or maintain the stability of the joint in the presence of external impact loads.

## Methodology

The human arm MB model was developed in MATLAB R2020b using the Simscape Multibody toolbox. The multibody model was integrated with the MATLAB Reinforcement Learning Toolbox to develop the RL-MAC and carry out training and simulation of the control model.

### Development of the Arm Multibody Model.

A simplified model of a 50th percentile human arm was developed, comprising the scapula, humerus, radius, and ulna (Fig. 1(a)). The bones were modeled as rigid bodies. In this study, only the extension–flexion motion about the elbow was considered; thus, the glenohumeral joint was constrained. The radius and ulna were combined into one rigid body, to which the mass and inertial properties of the lower arm (below the elbow) were applied. Because the focus was on reproducing the elbow extension–flexion motion, a revolute joint was defined between the radius–ulna and the humerus at the elbow with a joint stiffness of 0.6 N·m/rad [9,12]. A joint damping of 0.4 N·ms/rad was also used to prevent minor oscillations at the neutral position in the passive condition. Previous studies have measured the damping of the elbow joint between 0.2 N·ms/rad and 1 N·ms/rad [46,47]. Popescu et al. also determined that the damping value was almost constant during the entire elbow rotation [48]. The elbow extension–flexion angle was restricted to between 0 deg and 160 deg, measured from full extension of the arm.
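For illustration, the passive revolute-joint behavior described above can be sketched as a linear torsional spring-damper, using the paper's stiffness (0.6 N·m/rad) and damping (0.4 N·ms/rad). The 90 deg neutral angle and the hard range check are simplifying assumptions for this sketch:

```python
import math

K_JOINT = 0.6    # N*m/rad, revolute joint stiffness from the text
C_JOINT = 0.4    # N*m*s/rad, joint damping from the text
THETA_MIN, THETA_MAX = 0.0, math.radians(160.0)   # elbow range of motion

def passive_joint_torque(theta, theta_dot, theta_neutral=math.radians(90.0)):
    """Restoring torque: joint stiffness plus velocity-proportional damping.

    theta is the elbow angle measured from full extension, in radians.
    """
    if not (THETA_MIN <= theta <= THETA_MAX):
        raise ValueError("elbow angle outside the 0-160 deg range of motion")
    return -K_JOINT * (theta - theta_neutral) - C_JOINT * theta_dot
```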

Fig. 1

The muscles in the model were defined with suitable origin and insertion points [49,50] along the lines of action of the muscle forces. Force magnitudes were calculated according to the Hill-type muscle model, considering the muscle length and contraction velocity during the simulation.

The Hill-type muscle model provides for the calculation of muscle forces in numerical analyses [51,52] (Fig. 2). A Hill-type muscle consists of a contractile element (CE) simulating the active forces generated by the muscle (FCE) and a passive element (PE), in parallel with the contractile element, which computes the forces due to passive muscle stiffness (FPE).

Fig. 2

The forces generated by the muscles are nonlinear in nature and depend on the muscle length and velocity. The active muscle forces are stimulated by the CNS, which generates the muscle activation levels (at). The activation level varies between 0 (fully passive) and 1 (fully active), and its value is determined by the CNS depending on the external loads and the current joint stability.

The Hill-type muscle parameters like normalized force-length (Fl) and force-velocity (Fv) relationships are shown in Fig. 3. The total forces generated by the active part of the muscle are dependent on Fl, Fv, and the activation level (at) [52,53]
$F_{CE} = a_t \times F_{max} \times F_l(L) \times F_v(V)$
(1)
Fig. 3

Fmax is the maximum force a muscle can generate, which is a characteristic property of the muscle and depends on its anatomical cross section area. The active tension–length curve Fl describes the relationship between the active muscle force and the normalized muscle length (L), with the maximum force occurring at the optimal length (Lopt). The Fv curve describes the relationship between the contraction velocity and FCE. When the velocity is positive, i.e., the muscle elongates, the force asymptotes at a value near Fmax.
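A minimal sketch of Eq. (1) follows. The paper defines Fl and Fv as tabulated curves [5]; the Gaussian force-length and hyperbolic force-velocity shapes below are common textbook approximations, used here only to make the example self-contained and runnable:

```python
import math

def f_length(L):
    """Normalized active force-length curve; assumed Gaussian, peak 1.0 at L = 1."""
    return math.exp(-((L - 1.0) ** 2) / 0.45)

def f_velocity(V, v_max=10.0):
    """Normalized force-velocity curve; assumed Hill hyperbola for shortening
    (V < 0) and a saturating branch for lengthening (V > 0)."""
    if V <= 0:   # shortening limb: force drops toward 0 at -v_max
        return max(0.0, (1.0 + V / v_max) / (1.0 - 4.0 * V / v_max))
    return 1.3 - 0.3 * math.exp(-5.0 * V / v_max)   # lengthening limb

def active_force(a, f_max, L, V):
    """Eq. (1): F_CE = a * Fmax * Fl(L) * Fv(V)."""
    return a * f_max * f_length(L) * f_velocity(V)
```

At the optimal length and zero velocity, a fully activated muscle produces exactly Fmax; away from the optimum the force falls off, as the force-length curve dictates.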

For the passive element, the force was calculated using an exponential function of length. The passive force (FPE) only starts acting when the length of the muscle exceeds the optimum length [54]
$F_{PE} = \frac{1}{\exp(K_{sh}) - 1}\left\{\exp\left[\frac{K_{sh}}{L_{max}}(L - 1)\right] - 1\right\} \quad \text{for } L > 1$
(2)

Ksh is a dimensionless parameter influencing the rise of the passive force with length. The total force generated by the muscle is the sum of the magnitudes of passive force and active force.
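Eq. (2) translates directly into code. Here Ksh = 6.15 (the flexor value from Table 2) and the normalized Lmax = 1.5 are assumptions for this sketch; the returned force is normalized and would be scaled by Fmax:

```python
import math

def passive_force(L, k_sh=6.15, l_max=1.5):
    """Eq. (2): normalized passive force of the parallel element.

    Zero at or below the optimal length (L <= 1), rising exponentially
    beyond it with shape parameter Ksh.
    """
    if L <= 1.0:   # passive element engages only beyond the optimal length
        return 0.0
    return (math.exp(k_sh / l_max * (L - 1.0)) - 1.0) / (math.exp(k_sh) - 1.0)
```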

The MB model incorporated muscles responsible for both flexion and extension (Fig. 1(b)). The muscles included in the model are the biceps brachii long head and short head, the brachialis, the brachioradialis, the pronator teres, the extensor carpi radialis longus, and the triceps long head, lateral head, and medial head. Some of the muscles with large cross sections and wide bony insertion regions were divided into several strands to distribute the muscle forces. The normalized force-length (Fl) and force-velocity (Fv) relationships were defined as curves and were identical for each arm muscle [5,55].

The muscle origin and insertion points in the model were approximated from various sources of available anatomical data [49,50]. The optimum muscle length is the length of the muscle at the neutral position, when the humerus and radius are at right angles to each other [56]. The muscle properties included in the model are tabulated in Table 1.

Table 1

Properties of muscles in MB model

| Muscles | Fmax (N) [9] | No. of strands in model | Lopt (mm) |
| --- | --- | --- | --- |
| Brachialis | 568 | 2 | 166, 156 |
| Pronator teres | 320 | 1 | 157 |

Note: Optimum length (Lopt) of the muscles is obtained at the neutral position (90 deg flexion angle).

### Muscle Control Framework.

As discussed above, the CNS actuates different muscles in coordination to carry out any motion about a joint. Muscle forces are highly nonlinear in nature and are affected by the delay between the neural stimulation generated by the CNS and the actuation of the muscle. Humans can also adapt to changes in the external environment during their movements, and Smeets et al. argued that human movement patterns cannot be explained using simple feedback mechanisms [18]. This study explores the feasibility of using the RL-MAC mechanism for muscle control under varied external environments. The RL-MAC used in this paper employs deep NNs, which can efficiently approximate nonlinear behaviors from known predictors [57]. Trained RL agents have also been found to adapt to changes in their environments in various applications [58]. All these factors make the RL-MAC potentially suitable for HBMs.

Figure 4 shows the RL-MAC framework for controlling arm motion by incorporating a DDPG agent. The RL-MAC reads the state parameters from the arm multibody model. The controller outputs neural stimulations (ut), which are converted to muscle activations (at) using activation dynamics. The resultant activations are applied to the corresponding Hill-type muscles in the MB model, which also obtain the muscle length and velocity from the model to compute the muscle forces (Eq. (1)). The active muscle forces are tensile in nature, i.e., the forces developed pull the origin and insertion points toward each other. During the motion, different sets of muscles are activated by the RL-MAC to carry out the relevant joint motion.

Fig. 4
The state of the controller is defined by the elbow joint angle, the joint velocity, the error (difference between the target angle and the current angle), and the muscle activations. The activation (a) builds up in the muscles as a result of the neural stimulation (u) according to the activation dynamics proposed by Zajac [52]
$\frac{da}{dt} = \frac{1}{\tau_{act}}\left[u - (1 - \delta)au - \delta a\right]$
(3)

τact is the time constant for generating muscle activity from neural stimulation, which represents the time delay between the neural stimulus (u) and the commencement of muscle activity (a). δ is the ratio $(τact/τdeact)$, where τdeact is the deactivation time constant that controls the time lag of the drop in activation level (a) after the stimulation (u) is reduced. The muscle length and velocity were determined during the simulation from the Euclidean distance between the origin and insertion points of the muscle. The parameters used to calculate the muscle forces are tabulated in Table 2. The total muscle force was calculated as the sum of the forces generated by the contractile element and the passive element.
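A forward-Euler sketch of the activation dynamics of Eq. (3), using the time constants from Table 2 (τact = 0.02 s, τdeact = 0.06 s). The step-input stimulation profiles and the integration step are illustrative choices, not taken from the paper:

```python
TAU_ACT, TAU_DEACT = 0.02, 0.06   # s, activation/deactivation time constants
DELTA = TAU_ACT / TAU_DEACT       # ratio delta = tau_act / tau_deact

def activation_rate(a, u):
    """Eq. (3): da/dt = (1/tau_act) * [u - (1 - delta)*a*u - delta*a]."""
    return (u - (1.0 - DELTA) * a * u - DELTA * a) / TAU_ACT

def integrate(a0, stim, dt=0.001, steps=200):
    """Forward-Euler integration of activation under a stimulation signal stim(t)."""
    a = a0
    for i in range(steps):
        a += dt * activation_rate(a, stim(i * dt))
    return a

# Step stimulation u = 1: activation rises toward 1 with time constant tau_act
a_on = integrate(a0=0.005, stim=lambda t: 1.0)
# Stimulation removed: activation decays with the slower time constant tau_deact
a_off = integrate(a0=1.0, stim=lambda t: 0.0, steps=300)
```

With a step input of u = 1, the dynamics reduce to da/dt = (1 − a)/τact, so the activation approaches 1 within roughly 100 ms, while deactivation is three times slower.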

Table 2

Muscle and activation parameters used in MB model

| Muscle parameters | Value |
| --- | --- |
| τact | 0.02 [59] |
| τdeact | 0.06 [59] |
| Minimum activation (ao) | 0.005 |
| FL curve | Graph input [5] |
| FV curve | Graph input [5] |
| Ksh | 6.15 (flexors) [9]; 3 (extensors) |

Table 3

Training and simulation scenarios

| Scenario | External loads | Response evaluated |
| --- | --- | --- |
| Evaluation of passive structural response | Torque at the elbow revolute joint | Moment-angle response of the joint |
| Training scenario 1: targeted motion of the forearm with RL-MAC integrated MB model | No external loads | Angle-time response of the elbow joint |
| Training scenario 2: targeted motion of the forearm under external loads | Point mass attached to the radius and gravity | Angle-time response of the elbow joint |
| Testing scenario: response to novel loads | Simplified crash pulse applied to the humerus proximal end | Angle-time response of the elbow joint |

The overall architecture of the RL-MAC, which was based on the DDPG agent (Fig. 5), was similar to the one proposed by Lillicrap et al. [23]. The actor network consisted of a feedforward neural network with one hidden layer between the input and the final layer. The inputs to the hidden layer and the final layer were activated with a rectified linear unit (ReLU) function. The output of the final layer was activated using a sigmoid function, as the action space varies uniformly between 0 and 1, representing the neural stimulation (ut). The number of nodes in the final layer was equal to the number of muscles to be stimulated. The critic network comprised three layers with a ReLU transfer function after the hidden layer. The input of the critic network included the state parameters and the actions from the actor network. The state observations were activated with a ReLU function before connecting to the hidden layer. The actions from the actor network were connected to the hidden layer, skipping the ReLU activation [23]. The critic network outputs the Q-value associated with the state-action pair. Since the Q-value is a scalar, the critic network has one node in the final layer.
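The shape of the actor network can be sketched as follows; the hidden-layer width and the random weights are assumptions (the paper does not report them), so this illustrates only the input/output structure, not trained behavior:

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATE = 12      # elbow angle, velocity, error, and nine muscle activations
N_HIDDEN = 64     # hypothetical hidden-layer width (not reported in the paper)
N_MUSCLES = 9     # IAMR scheme: one stimulation output per muscle

# Randomly initialized weights stand in for the trained parameters
W1 = rng.standard_normal((N_HIDDEN, N_STATE)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_MUSCLES, N_HIDDEN)) * 0.1
b2 = np.zeros(N_MUSCLES)

def actor(state):
    """Map state parameters to neural stimulations u_t, one per muscle, in [0, 1]."""
    hidden = np.maximum(0.0, W1 @ state + b1)          # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))   # sigmoid output layer

u = actor(rng.standard_normal(N_STATE))
```

The sigmoid in the final layer is what guarantees every output stimulation lies strictly between 0 and 1, matching the admissible activation range.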

Fig. 5
In this study, the objective of the RL-MAC was to perform goal-directed motion of the elbow joint within its range of motion and to maintain its stability in the presence of external perturbations. The agent was trained to move the forearm to a target position from any given starting position and to stabilize it at the final position. During the training phase, the controller learns to minimize the error between the current angle and the target angle. In the reward function, the controller is penalized proportionally to the magnitude of the error. The reward function also rewards the agent if it manages to stabilize the elbow angle within 0.1 rad (5.7 deg) of the target value. Due to the redundant nature of the human musculoskeletal system, there are many combinations of muscle activation patterns that can produce the same desired movement. One method commonly used in muscle biomechanics to reduce the ambiguity of the muscle activation scheme is to minimize the metabolic cost of active muscle activity [60]. Two metabolic cost functions are most commonly used. The first is the energy cost, which seeks to minimize the total force generated in the muscles or the work done by the muscles [61,62]. The second is the muscle fatigue or muscle effort cost function, which minimizes the muscle activation over the duration of motion [63]. In the control model, we have considered minimizing the activation, but alternative energy costs can also be implemented
$\text{Reward} = -\alpha\,|\text{Error}| - \beta \sum a(t) + \gamma\,\left(|\text{Error}| < 0.1\ \text{rad}\right)$
(4)

The reward is calculated using Eq. (4) at each time-step of the simulation, while the agent tries to maximize the cumulative reward over the simulation time. α, β, and γ are positive constants that calibrate the different components of the reward function. Equal weight was assigned to the activation of each muscle in the reward function. During the training of the RL-MAC, an Ornstein–Uhlenbeck (OU) process was used to add noise with a standard deviation of 0.09 for adequate exploration of the action space. OU noise was found to explore the action space better for coordinated actuation of muscles [40]. The RL training was considered to have converged when the average cumulative reward over the most recent 250 iterations reached a predetermined value.
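A sketch of the reward of Eq. (4) and the OU exploration noise. The weights α, β, γ and the OU mean-reversion rate are hypothetical (only the noise standard deviation of 0.09 is stated in the text); the absolute error is used in line with the statement that the penalty is proportional to the error magnitude:

```python
import math
import random

ALPHA, BETA, GAMMA = 10.0, 1.0, 5.0   # hypothetical calibration weights

def reward(error, activations, tol=0.1):
    """Eq. (4): error penalty + activation (effort) penalty + stabilization bonus."""
    r = -ALPHA * abs(error) - BETA * sum(activations)
    if abs(error) < tol:          # within 0.1 rad (5.7 deg) of the target
        r += GAMMA
    return r

def ou_noise(n_steps, sigma=0.09, theta=0.15, dt=0.005, seed=1):
    """Ornstein-Uhlenbeck noise; sigma = 0.09 from the text, theta and dt assumed."""
    random.seed(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        x += theta * (0.0 - x) * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
        samples.append(x)
    return samples
```

Because OU noise is temporally correlated, consecutive exploratory stimulations change smoothly, which suits the coordinated actuation of muscles better than independent white noise.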

### Model Evaluation, Training, and Validation.

Before training the control model using RL-MAC, the passive structural behavior of the arm MB model was evaluated. A moment with a magnitude varying between −1 N·m and 1 N·m was applied at the elbow revolute joint with the humerus fixed, and the rotation of the forearm was measured. The resultant stiffness of the joint was calculated from the moment-angle data, and the magnitude was verified with published literature [56,64].
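The passive stiffness check can be sketched as a linear fit of the applied moment against the resulting rotation. The synthetic moment-angle data below stand in for the multibody simulation output and assume the 0.955 N·m/rad stiffness reported in the Results section:

```python
import numpy as np

true_stiffness = 0.955                    # N*m/rad, value reported in Results
moments = np.linspace(-1.0, 1.0, 21)      # applied elbow torque sweep, N*m
angles = moments / true_stiffness         # rotation away from neutral, rad
                                          # (stand-in for the simulated response)

# Slope of the moment-angle curve is the resultant joint stiffness
k_fit = np.polyfit(angles, moments, 1)[0]
```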

After verifying the structural stiffness of the MB model, the control training was carried out with the system of rewards mentioned in Eq. (4). The model was trained in two different scenarios described in the Training Scenario 1: Bare Arm Motion Control and Training Scenario 2: Arm Motion Control Under External Load sections (also summarized in Table 3), and the ability of the trained model to synthesize motion under a novel environment was also evaluated (testing scenario, summarized in Table 3). In both cases, the humerus and the scapula were fixed, and the forearm was free to rotate about the elbow revolute joint. The elbow angle was measured from the arm's extension limit (Fig. 6(a)).

Fig. 6

### Training Scenario 1: Bare Arm Motion Control.

In the first scenario, the MB arm model with the integrated RL-MAC was trained to perform a fast goal-directed motion of the elbow joint from a given initial position to a target position. For the training, two different muscle recruitment strategies were used. In the first recruitment strategy, called group-activated muscle recruitment (GAMR), the arm muscles were grouped as extensors and flexors (Figs. 6(b) and 6(c)), and identical activation was applied to the muscles belonging to the same group. The flexor group included the biceps brachii long head and short head, the brachialis, the brachioradialis, the pronator teres, and the extensor carpi radialis longus. The triceps long head, lateral head, and medial head made up the extensor group. For this case, the actor network had two nodes in the output layer, prescribing the stimulation for the extensor and flexor groups. The second recruitment strategy, called individually activated muscle recruitment (IAMR), actuated each muscle individually, regardless of whether it was an extensor or a flexor, with an activation level independent of the activation levels of the other muscles. For this activation scheme, the actor network had nine nodes in the final layer, one for the neural stimulation of each muscle. During the training process, the initial and target angles were randomly varied within the elbow rotation space so that the trained agent would not overfit to any particular set of input data.
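The difference between the two recruitment schemes amounts to how the controller outputs map to muscle stimulations, which can be sketched as follows (muscle names follow the grouping above):

```python
FLEXORS = ["biceps_long", "biceps_short", "brachialis",
           "brachioradialis", "pronator_teres", "extensor_carpi"]
EXTENSORS = ["triceps_long", "triceps_lateral", "triceps_medial"]
MUSCLES = FLEXORS + EXTENSORS

def gamr(u_flexor, u_extensor):
    """Group-activated recruitment: 2 controller outputs -> 9 stimulations."""
    return {m: (u_flexor if m in FLEXORS else u_extensor) for m in MUSCLES}

def iamr(u):
    """Individually activated recruitment: 9 independent controller outputs."""
    return dict(zip(MUSCLES, u))
```

GAMR constrains every muscle in a group to the same stimulation, while IAMR leaves the coordination entirely to the learned policy.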

After the activation models were trained to convergence, the arm was repositioned to test the model's response to a series of initial and target angles based on independent human volunteer data from the available literature [8]. In the validation case used in this study, six male volunteers performed fast targeted extension or flexion of the arm, and the elbow angle-time history of the motion was recorded. Due to the simple nature of the test setup, a similar motion could be replicated in the MB model. Depending on the rotation case, the arm was stabilized at the initial angle for the first 100–125 ms before the target angle was set to the desired value for the rotation. The angle-time data generated by the trained RL-MAC were compared with the volunteer kinematics data to verify the controller's accuracy.

### Training Scenario 2: Arm Motion Control Under External Load.

Training was performed using the targeted arm motion detailed in scenario 1, but in this case, a mass was attached to the radius (Fig. 7(a)), and gravity was introduced in the multibody model. The model was trained using the IAMR scheme with the reward function described in Eq. (4). Only changes to the multibody environment were made, and no change was made to the control model for this scenario; i.e., no additional information on the magnitude of the added mass or the direction of gravity was provided to the controller. This enabled the RL-MAC to formulate a response pattern that was independent of the external loads applied. During the training, the added mass was randomly varied between 1 kg and 5 kg (in increments of 1 kg), and the direction of gravity was adjusted every iteration so that the arm was trained to perform the flexion–extension motion both along and against gravity. The results from the trained model in scenario 2 enabled an assessment of the utility of the RL-MAC framework for training the HBM using the same architecture under the varied sets of loads generally associated with HBMs.

Fig. 7

### Testing Scenario: Response to Novel Environments.

The ability of the RL-MAC framework to produce a robust response in an environment for which it was not explicitly trained was also evaluated in this study. The model trained in the presence of an acceleration field (scenario 2) was subjected to a simplified crash pulse representative of the inertial load experienced by the upper extremity in a typical frontal MVC (Fig. 7(b)) [65]. The radius was pinned with a revolute joint representing the occupant's hand position on the steering wheel. The scapula was free to move in the planar direction. A stiffness of 1000 N/m was applied to the scapula in the vertical direction to effectively reproduce the effect of the lower body weight. The crash pulse was applied to the scapula in the horizontal direction. At the start of the simulation, the arm was set at the neutral position (flexion angle = 90 deg). The period of the pulse was similar to that used by Happee et al. to design a linear controller for two-dimensional arm motion [66]. However, the pulse magnitude applied to the MB scapula (Fig. 7(b)) was measured in the thoracic region in automotive frontal tests [65] and was higher than the constant force field under which the agent was trained. The RL-MAC trained in scenario 2 was evaluated for its ability to stabilize the arm under forward loads (representing a frontal collision) and rearward loads (representing a rear collision). This testing step is important to verify the range of utility of the trained RL-MAC agent.
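For illustration, a simplified crash pulse of the kind described can be approximated by a half-sine acceleration profile; the 100 ms duration and 20 g peak below are placeholder assumptions for this sketch, not the published pulse from [65]:

```python
import math

def crash_pulse(t, duration=0.100, peak_g=20.0):
    """Half-sine acceleration pulse (m/s^2) at time t; zero outside [0, duration].

    duration and peak_g are illustrative placeholders, not measured values.
    """
    if t < 0.0 or t > duration:
        return 0.0
    return peak_g * 9.81 * math.sin(math.pi * t / duration)
```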

## Results

The response of the passive part of the MB arm model was measured and compared with previously published data. The stiffness of the model with inactivated muscles was found to be 0.955 N·m/rad. The measured stiffness magnitude was close to the values reported by Hayes and Hatze [56], Wiegner and Watts [64], and Howell et al. [67]. The resultant stiffness of the elbow joint with unactuated muscles is due to the combination of the prescribed revolute joint stiffness and the passive muscle stiffness. Previous studies have added damping to the muscles to improve the passive and active behavior at the elbow joint [9,68]. In this study, muscle damping was not considered; thus, all the damping at the joint was the result of the prescribed joint damping.

For the training cases, the MB model with the RL-MAC was simulated for 600 ms in each episode of the training. The training was distributed over 20 CPUs using the MATLAB Parallel Computing Toolbox. Each episode took 20–30 s to simulate, and the DDPG agent was updated every 5 ms of the training simulation.
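The episode structure above (600 ms of simulated time, one agent decision and update every 5 ms, i.e., 120 control steps per episode) can be sketched as follows. This is a generic stepping scheme under an assumed `env`/`agent` interface, not the study's MATLAB implementation.

```python
CONTROL_DT = 0.005   # agent action/update interval: 5 ms
EPISODE_T = 0.600    # simulated time per episode: 600 ms

def run_episode(env, agent):
    """One training episode: the agent selects a new muscle activation vector
    every 5 ms of simulated time and is updated after each environment step."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(int(round(EPISODE_T / CONTROL_DT))):  # 120 control steps
        action = agent.act(state)             # muscle activations in [0, 1]
        next_state, reward = env.step(action)
        agent.update(state, action, reward, next_state)
        total_reward += reward
        state = next_state
    return total_reward
```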

### Training Scenario 1: Bare Arm Motion Control.

The arm model was trained to reach a randomized target joint angle from any randomized initial angle within the joint range of motion for extension/flexion. The training was carried out with both the IAMR and GAMR schemes. The variation of average reward with each iteration is displayed in Fig. 8. The average reward value considered for convergence was 1500, although this threshold can vary depending on the control objective and the reward function. The IAMR converged faster than the GAMR scheme. A possible explanation is that, because the muscles are actuated individually, IAMR provides greater overall control of the arm motion. Also, given the randomized nature of the training algorithms, the reward-versus-episode response varies slightly each time the model is trained.

Fig. 8

During the testing and validation phase of this scenario, the angle time histories of the movements produced by the trained RL-MAC were compared with the volunteer angle-time histories and showed excellent agreement with the experimental data (Fig. 9). Both the IAMR and the GAMR schemes could reproduce the motion of the volunteers in both flexion and extension after completion of training. In some cases, the simulated angle was lower or higher than the target angle due to the nature of the reward function, which allowed an error of ±0.1 rad.
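The ±0.1 rad tolerance band can be folded directly into the reward. The function below is an illustrative sketch only: the weighting coefficients and the quadratic activation penalty are assumptions, not the study's tuned reward (Eq. (4)).

```python
def reward(angle_err_rad, activations, w_err=1.0, w_act=0.1, tol=0.1):
    """Illustrative reward: reward being within the +/-0.1 rad tolerance band,
    penalize the remaining angle error, and penalize total muscle activation
    so that antagonists do not cocontract at the stabilized position."""
    err = abs(angle_err_rad)
    on_target = 1.0 if err <= tol else 0.0
    act_cost = sum(a * a for a in activations)   # assumed quadratic penalty
    return on_target - w_err * err - w_act * act_cost
```

Because any angle within the band earns the on-target bonus, the trained agent may settle slightly above or below the exact target, which matches the behavior observed in the simulations.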

Fig. 9

The muscle activity level for the duration of arm motion was also measured for both activation strategies (Fig. 10). The IAMR model predicted different activation levels for each muscle during the simulation, and these responses differed from those produced by the GAMR scheme. In the IAMR scheme, some of the muscles remained inactivated throughout the duration of motion, and activation decreased rapidly once the joint had been stabilized at the intended position. The GAMR scheme, which prescribed identical activation to all the muscles of the same group, resulted in long-term, low-magnitude activation of both extensors and flexors balancing each other even after the target position was reached.

Fig. 10

The muscle activity patterns in the arm simulations are biphasic or triphasic in nature, similar to those expected in fast goal-directed movements [69,70]. At the onset of the error signal, the agonist muscles are activated, and after an initial burst, the activation declines. The initial activity of the agonist muscles is directly related to the amount the elbow is required to rotate [71]. Near the completion of the motion, the antagonist muscles are activated, causing deceleration of the limb. In some of the simulations, we also saw a second burst of muscle activity in the agonists [72]. Happee argued that this third phase of muscle activation is essential in some goal-directed movements where the antagonist activations have not reduced significantly once the target position is reached [73]. In our simulations, the third activation phase was more prominent in cases where the initial error was low or when a mass was attached to the distal forearm (scenario 2).

### Training Scenario 2: Arm Motion Control Under External Loads.

The training of the RL-MAC using the IAMR scheme with the added random loading under scenario 2 converged in 8947 episodes. The trained RL-MAC was tested for its ability to produce extension–flexion motion and adapt to a range of masses attached at the radius (Fig. 11).

Fig. 11

The trained model was able to perform the desired motion in the presence of external loads in both extension and flexion of the elbow, and the elbow angle remained stable at the end of the simulation for both loading modes. During the flexion motion, the arm was able to carry a weight of up to 4.8 kg, and the activation patterns for each muscle differed for different values of applied load (Fig. 12). For the 4.8 kg flexion case, the activations of the arm flexor muscles were nearly at maximum, indicating that the 4.8 kg weight was the structural load limit of the MB arm model during flexion. The arm could carry up to 10 kg against gravity in extension, even though the mass was limited to 5 kg in training, showing that the trained RL-MAC can actuate muscles for loads different from those for which it has been trained. For extension, the arm extensor muscles reached a maximum level of actuation while carrying the 10 kg mass.

Fig. 12

The muscle activation profiles show a similar triphasic pattern. With the increase in mass against gravity, the antagonist activity period was shorter, followed by a noticeable burst of agonist activity in the third phase. The drop in agonist activity after reaching the target angle during flexion, even with the maximum load, was due to a decrease in the moment arm. In contrast, the agonists in the extension motion remained actuated throughout.

The RL-MAC trained in scenario 2 was also simulated in the absence of external loads, and its angle-time response was compared with the volunteer tests and with the scenario 1 RL-MAC trained using the IAMR scheme. The peak velocity of the scenario 2 RL-MAC during the goal-directed motion was lower than that seen in the experiments (Fig. 13). Nevertheless, for both trained RL agents, the elbow stabilized at the target angle around the same time, approximately 300 ms after the onset of the error signal.

Fig. 13

### Testing Scenario: Response to Novel Loads.

The model trained in scenario 2 was used to control the arm motion under a simplified impact scenario designed to replicate a driver holding on to the steering wheel and bracing for a frontal MVC. The response generated with the IAMR trained in scenario 2 under the simplified MVC pulse was compared to the response of a completely passive arm model without the active muscles (Fig. 13). The arm was kept at a neutral position (90 deg) at the start of the simulation, followed by application of the MVC crash pulse (Fig. 7(b)) after stabilization of the joint. The crash loading was applied horizontally at the scapula and the proximal end of the humerus for 100 ms. During the application of the crash pulse, the objective of the active arm model was to maintain the neutral position. The active model was able to maintain a stable elbow joint position for both loading conditions, whereas the passive model was unstable for the duration of the applied load (Fig. 14).

## Discussion

The present study demonstrated the development and implementation of an active muscle control framework for a simplified human arm model based on recent RL algorithms, with the intention of simulating untrained impulsive loading responses typically seen in the automotive crash environment. RL algorithms have been used in other active human body simulations, typically associated with the control of body motion [31,34,38,42,74,75], but the applicability of RL-MAC to the automotive environment has not been studied.

In this study, training of the RL-MAC was carried out with two different activation schemes. The RL-MAC was trained to minimize the error between the current and the target angle using the minimum possible activation. The activation minimization criterion was included to ensure that the extensor and flexor muscles do not cocontract at the stabilized final position. Previous arm movement control studies have used the contractile element (CE) length error in the development of feedback controllers [8,12]. Kistemaker et al. developed a muscle controller based on feedback of the CE length (λ control) to simulate the same volunteer tests [7,8], and Martynenko et al. used a similar approach to model fast arm movements [12]. In both studies, a presimulation with the passive arm was performed to determine the target CE lengths of the muscles. In a more complicated anatomical system with more DOFs, the CE length corresponding to a given joint position is complex to determine, and a strategy using CE length as the controller feedback to move from one arbitrary position to another may be challenging to implement. In the current study, we used the elbow angle error, representing a proprioceptive signal, as the input to the RL-MAC to reach a randomly defined target position from a randomly defined initial position [9]. Happee et al. developed a controller for the stability of a head and neck multibody model with both head kinematics feedback and muscle length feedback to maintain the head at the neutral position [76]. However, in that study, the initial and targeted positions (and thus muscle lengths) were identical.

The trained agent could produce the required motion in both extension and flexion from any given start angle, which indicates that the controller was trained to generate arm kinematics independent of the initial position of the forearm. Both activation schemes resulted in trained models with similar kinematics when tested under unloaded motion scenarios, although evaluation of the activation patterns showed that the individual activation scheme actuates only those muscles required for the motion while imparting low activations to the others. In the grouped muscle activation framework, all the muscles belonging to the same group were activated together at the same level, which may result in a higher energy cost of motion. Grouping muscles together is a simplification that may lead to accurate external kinematics but incorrect internal loading on the tissues; the separate muscle scheme is likely more biofidelic. However, further investigation is required to clearly identify the effects of the activation frameworks on the HBM.
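The practical difference between the two schemes is the mapping from the agent's action vector to muscle activations. The sketch below illustrates this mapping; the muscle names and the exact group membership are illustrative assumptions, not the study's muscle set.

```python
# Illustrative antagonistic groups for an elbow model (assumed names).
FLEXORS = ["biceps_long", "biceps_short", "brachialis", "brachioradialis"]
EXTENSORS = ["triceps_long", "triceps_lat", "triceps_med"]

def iamr_actions(action_vector):
    """Independent scheme (IAMR): one action channel per muscle, so the
    agent can leave unneeded muscles inactivated."""
    names = FLEXORS + EXTENSORS
    return dict(zip(names, action_vector))

def gamr_actions(flexor_level, extensor_level):
    """Grouped scheme (GAMR): a single shared activation per antagonistic
    group, applied identically to every muscle in that group."""
    acts = {m: flexor_level for m in FLEXORS}
    acts.update({m: extensor_level for m in EXTENSORS})
    return acts
```

IAMR therefore has a 7-dimensional action space in this sketch, while GAMR reduces it to 2 at the cost of forcing identical activation within each group.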

The control architecture could also train the arm model when the MB model was modified with an added randomized point mass in a constant gravity field. This information was not provided to the RL agent; instead, the controller had to formulate a generalized response of the muscles using the same states as in training scenario 1, based on how the model was performing relative to its objective. Apart from the reward function, the RL-MAC framework required negligible user input on the performance of the arm model or on how the various muscles should behave as a system to accomplish an objective. Adding mass and gravity direction to the RL-MAC state could improve the learning process and overall kinematics of the arm MB model, as both parameters are important for the control process [17,19]. However, expanding the states to include mass and gravity could overfit the agent to the training case. Instead, the RL-MAC devised a muscle control strategy to respond to the sudden change in mass from kinematics-based feedback. This kind of muscle activity modification is not possible using linear closed-loop controllers, as multiple muscles need to be actuated simultaneously [18,19]. It has also been found that linear models for neuromuscular control overfit their training datasets [66]. In the RL-MAC training process, both the initial kinematic parameters and the environmental variables can be modified in each iteration, which can reduce overfitting. The generalized RL-MAC (scenario 2) in the absence of an external force field was found to undershoot the response of the volunteer dataset, but the overall kinematics was similar during the time frame of evaluation (Fig. 13). The initial disparity in response arises because the RL-MAC in scenario 2 was not provided with complete information about the environment and had to rely on the kinematics feedback from the arm MB model to decipher the constant force acting on the forearm in each iteration.
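This per-episode randomization of initial angle, target angle, and attached mass is a form of domain randomization. A minimal sketch follows; the 5 kg mass cap matches the training description, but the uniform distributions and the assumed 0 to 180 deg joint range are illustrative assumptions.

```python
import math
import random

def randomized_episode_params(rng, max_mass_kg=5.0):
    """Per-episode randomization used to discourage overfitting: the initial
    elbow angle, target angle, and the point mass attached at the distal
    radius are all redrawn at the start of each episode. None of these
    values are exposed to the agent's state vector."""
    return {
        "initial_angle": rng.uniform(0.0, math.pi),   # rad, assumed range
        "target_angle": rng.uniform(0.0, math.pi),    # rad, assumed range
        "added_mass": rng.uniform(0.0, max_mass_kg),  # kg, capped at 5 kg
    }
```

Because only the kinematic feedback (not the mass itself) reaches the agent, the trained policy must infer the load from how the arm responds, which is what allowed generalization to the 10 kg extension case.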

The two training scenarios have demonstrated the ability of the RL framework to actuate independent muscles within a system to achieve a response. This feature will be critical for body regions more complex than the arm, where it is difficult to identify and associate the different muscles working as a system to achieve motion in many different DOFs. The RL-MAC efficiency was found to depend primarily on how the reward function was defined. Crowder et al. reported that an inaccurate balance between the coefficients of the reward function may fail to train the musculoskeletal system even with an increase in training time [33]. Hence, to improve the control framework, it is desirable to explore how the different components of the reward function affect the controller response and to potentially inform the reward function development with various sources of new or existing volunteer data.

Fig. 14

Reinforcement learning algorithms have the potential to train active HBMs to control their motion using scenarios for which abundant human volunteer data are available (e.g., lifting weights or exercising) [43], while providing good generalization and response in untrained cases where data are scarce (e.g., muscle response during impact). This is analogous to how humans learn to control their body motion through everyday activities but must still react in some fashion to scenarios they have never encountered (such as an automotive crash). In this study, it was shown that the RL-MAC was able to maintain arm stability under conditions comparable to a driver bracing on the steering wheel before an automotive frontal crash pulse. This is notable because the muscle control model with arm kinematics as input was only trained to control its motion in a simple weightlifting scenario (training scenario 2), and the nature of the loads in an impact scenario is substantially different, with higher magnitude forces acting on the elbow joint for a shorter time span. Although this case is a simplified representation of a human's response to a crash, this study demonstrates the potential for an RL controller to respond appropriately under a set of loads for which it has not been explicitly trained. This potentially makes for a more generalizable muscle control scheme than traditional PID approaches and eliminates the need for retraining or calibrating the model under multiple load cases, which can be computationally expensive. With RL-MACs, training can be accomplished on a simplified, fast-running model, and the trained controller can be transferred to a more complex and detailed HBM capable of simulating impact and injury.
The ability of a trained RL controller to generate responses under novel scenarios is also useful for the development of injury countermeasures and assistive devices [42] and for the training and deployment of biomimetic devices [44,77], for which the nature and magnitude of the loads acting on the devices in real-world scenarios can differ from the training environment.

The purpose of this study was to implement and demonstrate the utility of the RL-MAC for training active muscle responses in a human body model for automotive scenarios, and to accomplish this, simplifications were made to the human body model. A multibody model of the human arm was developed using Hill-type line muscles, and the series element in the Hill-type muscle has been assumed to be rigid because the focus was the development of the controller. Some previous studies have included a series element in Hill's muscle model to represent the stiffness of tendons [78]. However, Bayer et al. showed that the series element of the Hill-type model contributes around 7.6% of the total muscle force [59]. In this study, the passive structural properties of the model were verified against the stiffness data available in the literature, demonstrating that the rigid-tendon assumption did not affect the overall response of the arm MB model or the objective of the study.
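With a rigid series element, the contractile-element length equals the muscle-tendon length and the muscle force reduces to the usual Hill-type product of activation, force-length, and force-velocity terms plus a passive element. The sketch below uses generic textbook curve shapes (Gaussian force-length, hyperbolic force-velocity, quadratic passive element); these shapes and their constants are assumptions, not the study's exact muscle curves.

```python
import math

def hill_muscle_force(activation, lce_norm, vce_norm, f_max):
    """Hill-type muscle force with a rigid series element (tendon), so the
    contractile element spans the full muscle-tendon length.

    activation : muscle activation in [0, 1]
    lce_norm   : CE length normalized by optimal fiber length
    vce_norm   : shortening velocity normalized by max shortening velocity
    f_max      : maximum isometric force (N)
    """
    fl = math.exp(-((lce_norm - 1.0) ** 2) / 0.45)      # active force-length
    if vce_norm >= 0.0:                                  # shortening branch
        fv = (1.0 - vce_norm) / (1.0 + 4.0 * vce_norm)
    else:                                                # crude lengthening plateau
        fv = 1.3
    f_pe = 3.0 * (lce_norm - 1.0) ** 2 if lce_norm > 1.0 else 0.0  # passive
    return f_max * (activation * fl * fv + f_pe)
```

At optimal length and zero velocity this reduces to `activation * f_max`, which is the isometric case used when checking the passive stiffness of the model.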

The MB model of the arm was simplified with a revolute joint at the elbow; pronation, supination, and other motions about the elbow were not considered. Only the flexion–extension motion of the arm was modeled, mainly to compare the current RL-MAC with volunteer data [8] and with similar arm models using PID controllers [8,9,12]. The potential of deep RL controllers will be better understood in body regions like the head and neck, which have multiple nonlinear joints and complex coordination of muscles, making the control system more complicated [76,79,80]. The development of a control model for such body regions, with its input kinematic parameters and reward functions, will be assessed in future studies. In this study, we penalized muscle activation to minimize fatigue (Eq. (4)). The effect of implementing other cost functions, such as muscle work [62] or joint energy expenditure [81], also needs to be studied. Further, whether a trained RL-MAC can adapt to changes in anthropometry also needs to be investigated.

## Conclusions

This study implemented a methodology for integrating a robust muscle control mechanism at the elbow joint. The RL-MAC could produce goal-directed arm movement, synthesize the same motion in the presence of a constant force field, and react to high-magnitude impulse loads, providing evidence of its potential for controlling HBMs in commonly encountered chaotic scenarios. Such a control mechanism is important for motor vehicle impact scenarios, which can be injurious and for which collecting volunteer data is therefore not possible.

The current control methodology can be extended to study the response of other more complex body regions with numerous muscles in an automotive environment and can be incorporated at the whole-body level. Active HBMs will be important tools for the development of improved injury countermeasures and protective gear in the future. Furthermore, the RL-MAC framework can also be used in biomechanics applications such as gait and occupational health research.

## References

1.
De Jager
,
M.
,
Sauren
,
A.
,
Thunnissen
,
J.
, and
Wismans
,
J.
,
1996
, “
A Global and a Detailed Mathematical Model for Head-Neck Dynamics
,”
SAE
Paper No. 962430.10.4271/962430
2.
Shewchenko
,
N.
,
Withnall
,
C.
,
Keown
,
M.
,
Gittens
,
R.
, and
Dvorak
,
J.
,
2005
, “
,”
Br. J. Sports Med.
,
39
(
Suppl. 1
), pp.
i26
i32
.10.1136/bjsm.2005.019042
3.
Iwamoto
,
M.
, and
Nakahira
,
Y.
,
2014
, “
A Preliminary Study to Investigate Muscular Effects for Pedestrian Kinematics and Injuries Using Active THUMS
,”
Proceedings of the IRCOBI Conference, IRC-14–53
, Berlin, Germany, Sept. 10–12, pp.
444
460
4.
Brolin
,
K.
,
Halldin
,
P.
, and
Leijonhufvud
,
I.
,
2005
, “
The Effect of Muscle Activation on Neck Response
,”
Traffic Inj. Prev.
,
6
(
1
), pp.
67
76
.10.1080/15389580590903203
5.
Panzer
,
M. B.
,
Fice
,
J. B.
, and
Cronin
,
D. S.
,
2011
, “
Cervical Spine Response in Frontal Crash
,”
Med. Eng. Phys.
,
33
(
9
), pp.
1147
1159
.10.1016/j.medengphy.2011.05.004
6.
Chancey
,
V. C.
,
Nightingale
,
R. W.
,
Van Ee
,
C. A.
,
Knaub
,
K. E.
, and
Myers
,
B. S.
,
2003
, “
Improved Estimation of Human Neck Tensile Tolerance: Reducing the Range of Reported Tolerance Using Anthropometrically Correct Muscles and Optimized Physiologic Initial Conditions
,”
SAE
Paper No. 2003-22-0008.10.4271/2003-22-0008
7.
Kistemaker
,
D. A.
,
2006
, “Control of Fast Goal-Directed Arm Movements,” Ph.D. thesis, Printpartners Ipskamp B.V., Enschede, The Netherlands.
8.
Kistemaker
,
D. A.
,
Van Soest
,
A. K. J.
, and
Bobbert
,
M. F.
,
2006
, “
Is Equilibrium Point Control Feasible for Fast Goal-Directed Single-Joint Movements?
,”
J. Neurophysiol.
,
95
(
5
), pp.
2898
2912
.10.1152/jn.00983.2005
9.
Östh
,
J.
,
Brolin
,
K.
, and
Happee
,
R.
,
2012
, “
Active Muscle Response Using Feedback Control of a Finite Element Human Arm Model
,”
Comput. Methods Biomech. Biomed. Eng.
,
15
(
4
), pp.
347
361
.10.1080/10255842.2010.535523
10.
Östh
,
J.
,
Brolin
,
K.
,
Carlsson
,
S.
,
Wismans
,
J.
, and
Davidsson
,
J.
,
2012
, “
The Occupant Response to Autonomous Braking: A Modeling Approach That Accounts for Active Musculature
,”
Traffic Inj. Prev.
,
13
(
3
), pp.
265
277
.10.1080/15389588.2011.649437
11.
Iwamoto
,
M.
,
Nakahira
,
Y.
, and
Kimpara
,
H.
,
2015
, “
Development and Validation of the Total Human Model for Safety (THUMS) Toward Further Understanding of Occupant Injury Mechanisms in Precrash and During Crash
,”
Traffic Inj. Prev.
,
16
(
Suppl. 1
), pp.
S36
S48
.10.1080/15389588.2015.1015000
12.
Martynenko
,
O. V.
,
Neininger
,
F. T.
, and
Schmitt
,
S.
,
2019
, “
Development of a Hybrid Muscle Controller for an Active Finite Element Human Body Model in LS-DYNA Capable of Occupant Kinematics Prediction in Frontal and Lateral Maneuvers
,” Proceedings of the 26th International Technical Conference on the Enhanced Safety of Vehicles (
ESV
), Eindhoven, The Netherlands, June 10–13, pp.
1
12
.https://www-nrd.nhtsa.dot.gov/departments/esv/26th/
13.
Inkol
,
K. A.
,
Brown
,
C.
,
McNally
,
W.
,
Jansen
,
C.
, and
McPhee
,
J.
,
2020
, “
Muscle Torque Generators in Multi-Body Dynamic Simulations of Optimal Sports Performance
,”
Multibody Syst. Dyn.
,
50
(
4
), pp.
435
452
.10.1007/s11044-020-09747-9
14.
Walter
,
J. R.
,
Günther
,
M.
,
Haeufle
,
D. F.
, and
Schmitt
,
S.
,
2021
, “
A Geometry-and Muscle-Based Control Architecture for Synthesising Biological Movement
,”
Biol. Cybern.
,
115
(
1
), pp.
7
37
.10.1007/s00422-020-00856-4
15.
Roh
,
J.
,
Cheung
,
V. C.
, and
Bizzi
,
E.
,
2011
, “
Modules in the Brain Stem and Spinal Cord Underlying Motor Behaviors
,”
J. Neurophysiol.
,
106
(
3
), pp.
1363
1378
.10.1152/jn.00842.2010
16.
Ma
,
S.
, and
Feldman
,
A. G.
,
1995
, “
Two Functionally Different Synergies During Arm Reaching Movements Involving the Trunk
,”
J. Neurophysiol.
,
73
(
5
), pp.
2120
2122
.10.1152/jn.1995.73.5.2120
17.
Lacquaniti
,
F.
,
Bosco
,
G.
,
Gravano
,
S.
,
Indovina
,
I.
,
La Scaleia
,
B.
,
Maffei
,
V.
, and
Zago
,
M.
,
2015
, “
Gravity in the Brain as a Reference for Space and Time Perception
,”
Multisensory Res.
,
28
(
5–6
), pp.
397
426
.10.1163/22134808-00002471
18.
Smeets
,
J. B. J.
,
Erkelens
,
C. J.
, and
van der Gon Denier
,
J. J.
,
1990
, “
Adjustments of Fast Goal-Directed Movements in Response to an Unexpected Inertial Load
,”
Exp. Brain Res.
,
81
(
2
), pp.
303
312
.10.1007/BF00228120
19.
Happee
,
R.
,
1993
, “
Goal-Directed Arm Movements. III: Feedback and Adaptation in Response to Inertia Perturbations
,”
J. Electromyogr. Kinesiol.
,
3
(
2
), pp.
112
122
.10.1016/1050-6411(93)90006-I
20.
Mnih
,
V.
,
Kavukcuoglu
,
K.
,
Silver
,
D.
,
Rusu
,
A. A.
,
Veness
,
J.
,
Bellemare
,
M. G.
, and
Graves
,
A.
, et al.,
2015
, “
Human-Level Control Through Deep Reinforcement Learning
,”
Nature
,
518
(
7540
), pp.
529
533
.10.1038/nature14236
21.
Sutton
,
R. S.
, and
Barto
,
A. G.
,
2018
,
Reinforcement Learning: An Introduction
,
MIT Press
, Cambridge, MA.
22.
Mnih
,
V.
,
Kavukcuoglu
,
K.
,
Silver
,
D.
,
Graves
,
A.
,
Antonoglou
,
I.
,
Wierstra
,
D.
, and
Riedmiller
,
M.
,
2013
, “
Playing Atari With Deep Reinforcement Learning
,” Technical Report Deepmind Technologies,
arXiv:1312.5602
.https://arxiv.org/abs/1312.5602
23.
Lillicrap
,
T. P.
,
Hunt
,
J. J.
,
Pritzel
,
A.
,
Heess
,
N.
,
Erez
,
T.
,
Tassa
,
Y.
,
Silver
,
D.
, and
Wierstra
,
D.
,
2015
, “
Continuous Control With Deep Reinforcement Learning
,” Proceedings 6th International Conference on Learning Representations, pp. 1–14,
arXiv:1509.02971
.https://arxiv.org/abs/1509.02971
24.
Wu
,
X.
,
Liu
,
S.
,
Zhang
,
T.
,
Yang
,
L.
,
Li
,
Y.
, and
Wang
,
T.
,
2018
, “
Motion Control for Biped Robot Via DDPG-Based Deep Reinforcement Learning
,” 2018 WRC Symposium on Advanced Robotics and Automation (
WRC SARA
), Beijing, China, Aug. 16, pp.
40
45
.10.1109/WRC-SARA.2018.8584227
25.
Islam
,
R.
,
Henderson
,
P.
,
Gomrokchi
,
M.
, and
Precup
,
D.
,
2017
, “
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
,” CoRR,
arXiv:1708.04133
.
26.
Silver
,
D.
,
Schrittwieser
,
J.
,
Simonyan
,
K.
,
Antonoglou
,
I.
,
Huang
,
A.
,
Guez
,
A.
,
Hubert
,
T.
, et al.,
2017
, “
Mastering the Game of Go Without Human Knowledge
,”
Nature
,
550
(
7676
), pp.
354
359
.10.1038/nature24270
27.
Phaniteja
,
S.
,
Dewangan
,
P.
,
Guhan
,
P.
,
Sarkar
,
A.
, and
Krishna
,
K. M.
,
2017
, “
A Deep Reinforcement Learning Approach for Dynamically Stable Inverse Kinematics of Humanoid Robots
,” 2017 IEEE International Conference on Robotics and Biomimetics (
ROBIO
), Macau, Macao, Dec. 5–8, pp.
1818
1823
.10.1109/ROBIO.2017.8324682
28.
Lobos-Tsunekawa
,
K.
,
Leiva
,
F.
, and
Ruiz-del-Solar
,
J.
,
2018
, “
Visual Navigation for Biped Humanoid Robots Using Deep Reinforcement Learning
,”
IEEE Rob. Autom. Lett.
,
3
(
4
), pp.
3247
3254
.10.1109/LRA.2018.2851148
29.
Abreu
,
M.
,
Reis
,
L. P.
, and
Lau
,
N.
,
2019
, “
Learning to Run Faster in a Humanoid Robot Soccer Environment Through Reinforcement Learning
,”
Robot World Cup
,
Springer
,
Cham, Switzerland
, pp.
3
15
.
30.
Xu
,
D.
,
Zhang
,
Y.
,
Tan
,
W.
, and
Wei
,
H.
,
2021
, “
Reinforcement Learning Control of a Novel Magnetic Actuated Flexible-Joint Robotic Camera System for Single Incision Laparoscopic Surgery
,” 2021 IEEE International Conference on Robotics and Automation (
ICRA
), Xi'an, China, May 30–June 5, pp.
1236
1241
.10.1109/ICRA48506.2021.9560927
31.
Fischer
,
F.
,
Bachinski
,
M.
,
Klar
,
M.
,
Fleig
,
A.
, and
Müller
,
J.
,
2021
, “
Reinforcement Learning Control of a Biomechanical Model of the Upper Extremity
,”
Sci. Rep.
,
11
(
1
), pp.
1
15
.10.1038/s41598-021-93760-1
32.
Jagodnik
,
K. M.
,
Thomas
,
P. S.
,
van den Bogert
,
A. J.
,
Branicky
,
M. S.
, and
Kirsch
,
R. F.
,
2016
, “
Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement
,”
IEEE Trans. Hum.-Mach. Syst.
,
46
(
5
), pp.
723
733
.10.1109/THMS.2016.2558630
33.
Crowder
,
D. C.
,
Abreu
,
J.
, and
Kirsch
,
R. F.
,
2021
, “
Hindsight Experience Replay Improves Reinforcement Learning for Control of a MIMO Musculoskeletal Model of the Human Arm
,”
IEEE Trans. Neural Syst. Rehabil. Eng.
,
29
, pp.
1016
1025
.10.1109/TNSRE.2021.3081056
34.
Tahami
,
E.
,
Jafari
,
A. H.
, and
Fallah
,
A.
,
2014
, “
Learning to Control the Three-Link Musculoskeletal ARM Using Actor–Critic Reinforcement Learning Algorithm During Reaching Movement
,”
Biomed. Eng.: Appl., Basis Commun.
,
26
(
5
), p.
1450064
.10.4015/S1016237214500641
35.
Joos
,
E.
,
Péan
,
F.
, and
Goksel
,
O.
,
2020
, “
Reinforcement Learning of Musculoskeletal Control From Functional Simulations
,”
International Conference on Medical Image Computing and Computer-Assisted Intervention
, Lima, Peru, Oct. 4–8, pp.
135
145
.10.1007/978-3-030-59716-0_14
36.
Min
,
K.
,
Iwamoto
,
M.
,
Kakei
,
S.
, and
Kimpara
,
H.
,
2018
, “
Muscle Synergy–Driven Robust Motion Control
,”
Neural Comput.
,
30
(
4
), pp.
1104
1131
.10.1162/neco_a_01063
37.
Iwamoto
,
M.
, and
Kato
,
D.
,
2021
, “
Efficient Actor-Critic Reinforcement Learning With Embodiment of Muscle Tone for Posture Stabilization of the Human Arm
,”
Neural Comput.
,
33
(
1
), pp.
129
156
.10.1162/neco_a_01333
38.
Kidziński
,
Ł.
,
Mohanty
,
S. P.
,
Ong
,
C. F.
,
Hicks
,
J. L.
,
Carroll
,
S. F.
,
Levine
,
S.
,
Salathé
,
M.
, and
Delp
,
S. L.
,
2018
, “
Learning to Run Challenge: Synthesizing Physiologically Accurate Motion Using Deep Reinforcement Learning
,”
The NIPS'17 Competition: Building Intelligent Systems
,
Springer
,
Cham, Switzerland
, pp.
101
120
.
39.
Song
,
S.
,
Kidziński
,
Ł.
,
Peng
,
X. B.
,
Ong
,
C.
,
Hicks
,
J.
,
Levine
,
S.
,
Atkeson
,
C. G.
, and
Delp
,
S. L.
,
2021
, “
Deep Reinforcement Learning for Modeling Human Locomotion Control in Neuromechanical Simulation
,”
J. Neuroeng. Rehabil.
,
18
(
1
), pp.
1
17
.10.1186/s12984-021-00919-y
40.
La Barbera
,
V.
,
Pardo
,
F.
,
Tassa
,
Y.
,
Daley
,
M.
A.,
Richards
,
C.
,
Kormushev
,
P.
, and
Hutchinson
,
J. R.
,
2021
, “
OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-Mechanical Locomotion
,” CoRR,
arXiv:2112.06061
.https: //arxiv.org/abs/2112.06061
41.
Iwamoto
,
M.
,
Nakahira
,
Y.
,
Kimpara
,
H.
,
Sugiyama
,
T.
, and
Min
,
K.
,
2012
, “
Development of a Human Body Finite Element Model With Multiple Muscles and Their Controller for Estimating Occupant Motions and Impact Responses in Frontal Crash Situations
,”
Stapp Car Crash J.
,
56
, pp.
231
268
.10.4271/2012-22-0006
42.
Luo
,
S.
,
Androwis
,
G.
,
,
S.
,
Nunez
,
E.
,
Su
,
H.
, and
Zhou
,
X.
,
2021
, “
Robust Walking Control of a Lower Limb Rehabilitation Exoskeleton Coupled With a Musculoskeletal Model Via Deep Reinforcement Learning
,”
Research Square
.10.21203/rs.3.rs-1212542/v1
43.
Denizdurduran
,
B.
,
Markram
,
H.
, and
Gewaltig
,
M. O.
,
2022
, “
Optimum Trajectory Learning in Musculoskeletal Systems With Model Predictive Control and Deep Reinforcement Learning
,”
Biol. Cybern.
, epub, pp.
1
16
.10.1007/s00422-022-00940-x
44.
Driess
,
D.
,
Zimmermann
,
H.
,
Wolfen
,
S.
,
Suissa
,
D.
,
Haeufle
,
D.
,
Hennes
,
D.
,
Toussaint
,
M.
, and
Schmitt
,
S.
,
2018
, “
Learning to Control Redundant Musculoskeletal Systems With Neural Networks and SQP: Exploiting Muscle Properties
,” 2018 IEEE International Conference on Robotics and Automation (
ICRA
), Brisbane, QLD, Australia, May 21–25, pp.
6461
6468
.10.1109/ICRA.2018.8463160
45.
Qin
,
W.
,
Tao
,
R.
,
Sun
,
L.
, and
Dong
,
K.
,
2022
, “
Muscle‐Driven Virtual Human Motion Generation Approach Based on Deep Reinforcement Learning
,”
Comput. Animation Virtual Worlds
,
33
(
3–4
), p.
e2092
.10.1002/cav.2092
46.
Cannon
,
S. C.
, and
Zahalak
,
G. I.
,
1982
, “
The Mechanical Behavior of Active Human Skeletal Muscle in Small Oscillations
,”
J. Biomech.
,
15
(
2
), pp.
111
121
.10.1016/0021-9290(82)90043-4
47.
Rack
,
P. M.
,
2011
, “
Limitations of Somatosensory Feedback in Control of Posture and Movement
,”
Compr. Physiol.
, R. Terjung, ed., pp.
229
256
.10.1002/cphy.cp010207
48.
Popescu
,
F.
,
Hidler
,
J. M.
, and
Rymer
,
W. Z.
,
2003
, “
Elbow Impedance During Goal-Directed Movements
,”
Exp. Brain Res.
,
152
(
1
), pp.
17
28
.10.1007/s00221-003-1507-4
49.
Moore
,
K. L.
, and
Dalley
,
A. F.
,
2018
,
Clinically Oriented Anatomy
, Wolters Kluwer India Pvt Ltd., Gurugram Haryana, India.
50.
Lieber
,
R. L.
,
Jacobson
,
M. D.
,
Fazeli
,
B. M.
,
Abrams
,
R. A.
, and
Botte
,
M. J.
,
1992
, “
Architecture of Selected Muscles of the Arm and Forearm: Anatomy and Implications for Tendon Transfer
,”
J. Hand Surg.
,
17
(
5
), pp.
787
798
.10.1016/0363-5023(92)90444-T
51.
Hill
,
A. V.
,
1938
, “
The Heat of Shortening and the Dynamic Constants of Muscle
,”
Proc. R. Soc. London, Ser. B
,
126
(
843
), pp.
136
195
.10.1098/rspb.1938.0050
52.
Zajac
,
F. E.
,
1989
, “
Muscle and Tendon: Properties, Models, Scaling, and Application to Biomechanics and Motor Control
,”
Crit. Rev. Biomed. Eng.
,
17
(
4
), pp.
359
411
.https://pubmed.ncbi.nlm.nih.gov/2676342/
53.
Bahler
,
A. S.
,
Fales
,
J. T.
, and
Zierler
,
K. L.
,
1967
, “
The Active State of Mammalian Skeletal Muscle
,”
J. Gen. Physiol.
,
50
(
9
), pp.
2239
2253
.10.1085/jgp.50.9.2239
54.
Winters
,
J. M.
,
1995
, “
An Improved Muscle-Reflex Actuator for Use in Large-Scale Neuromusculoskeletal Models
,”
Ann. Biomed. Eng.
,
23
(
4
), pp.
359
374
.10.1007/BF02584437
55.
Panzer
,
M.
,
2006
, “
Numerical Modelling of the Human Cervical Spine in Frontal Impact
,” Master's. thesis,
University of Waterloo
56.
Hayes
,
K. C.
, and
Hatze
,
H.
,
1977
, “
Passive Visco-Elastic Properties of the Structures Spanning the Human Elbow Joint
,”
Eur. J. Appl. Physiol. Occup. Physiol.
,
37
(
4
), pp.
265
274
.10.1007/BF00430956
57.
Lewis
,
F. W.
,
Jagannathan
,
S.
, and
Yesildirak
,
A.
,
2020
,
Neural Network Control of Robot Manipulators and Non-Linear Systems
,
CRC Press
, Boca Raton, FL.
58. Padakandla, S., 2021, "A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments," ACM Comput. Surv. (CSUR), 54(6), pp. 1–25. DOI: 10.1145/3459991
59. Bayer, A., Schmitt, S., Günther, M., and Haeufle, D. F. B., 2017, "The Influence of Biophysical Muscle Properties on Simulating Fast Human Arm Movements," Comput. Methods Biomech. Biomed. Eng., 20(8), pp. 803–821. DOI: 10.1080/10255842.2017.1293663
60. Koelewijn, A. D., Heinrich, D., and van den Bogert, A. J., 2019, "Metabolic Cost Calculations of Gait Using Musculoskeletal Energy Models, a Comparison Study," PLoS One, 14(9), p. e0222037. DOI: 10.1371/journal.pone.0222037
61. Minetti, A. E., and Alexander, R. M., 1997, "A Theory of Metabolic Costs for Bipedal Gaits," J. Theor. Biol., 186(4), pp. 467–476. DOI: 10.1006/jtbi.1997.0407
62. Umberger, B. R., Gerritsen, K. G., and Martin, P. E., 2003, "A Model of Human Muscle Energy Expenditure," Comput. Methods Biomech. Biomed. Eng., 6(2), pp. 99–111. DOI: 10.1080/1025584031000091678
63. De Groote, F., Kinney, A. L., Rao, A. V., and Fregly, B. J., 2016, "Evaluation of Direct Collocation Optimal Control Problem Formulations for Solving the Muscle Redundancy Problem," Ann. Biomed. Eng., 44(10), pp. 2922–2936. DOI: 10.1007/s10439-016-1591-9
64. Wiegner, A. W., and Watts, R. L., 1986, "Elastic Properties of Muscles Measured at the Elbow in Man: I. Normal Controls," J. Neurol., Neurosurg. Psychiatry, 49(10), pp. 1171–1176. DOI: 10.1136/jnnp.49.10.1171
65. Shaw, G., Parent, D., Purtsezov, S., Lessley, D., Crandall, J., Kent, R., Guillemot, H., Ridella, S. A., Takhounts, E., and Martin, P., 2009, "Impact Response of Restrained PMHS in Frontal Sled Tests: Skeletal Deformation Patterns Under Seat Belt Loading," Stapp Car Crash J., 53, pp. 1–48. DOI: 10.4271/2009-22-0001
66. Happee, R., de Vlugt, E., and van Vliet, B., 2015, "Nonlinear 2D Arm Dynamics in Response to Continuous and Pulse-Shaped Force Perturbations," Exp. Brain Res., 233(1), pp. 39–52. DOI: 10.1007/s00221-014-4083-x
67. Howell, J. N., Chleboun, G., and Conatser, R., 1993, "Muscle Stiffness, Strength Loss, Swelling and Soreness Following Exercise-Induced Injury in Humans," J. Physiol., 464(1), pp. 183–196. DOI: 10.1113/jphysiol.1993.sp019629
68. Wochner, I., Endler, C. A., Schmitt, S., and Martynenko, O. V., 2019, "Comparison of Controller Strategies for Active Human Body Models With Different Muscle Materials," IRCOBI Conference Proceedings, Florence, Italy, Sept. 11–13, pp. 133–135. https://www.semanticscholar.org/paper/Comparison-of-Controller-Strategies-for-Active-Body-Wochner-Endler/c3e7329a5ffb9e12bc068fcdc5e87eb7c13e3960
69. Marsden, C. D., Obeso, J. A., and Rothwell, J. C., 1983, "The Function of the Antagonist Muscle During Fast Limb Movements in Man," J. Physiol., 335(1), pp. 1–13. DOI: 10.1113/jphysiol.1983.sp014514
70. Flament, D., Hore, J., and Vilis, T., 1984, "Braking of Fast and Accurate Elbow Flexions in the Monkey," J. Physiol., 349(1), pp. 195–202. DOI: 10.1113/jphysiol.1984.sp015152
71. Wadman, W. J., Denier, J. J., Geuze, R. H., and Mol, C. R., 1979, "Control of Fast Goal-Directed Arm Movements," J. Hum. Mov. Stud., 5, pp. 3–17. https://www.researchgate.net/publication/233391758_Control_of_fast_goaldirected_arm_movements
72. Hannaford, B., and Stark, L., 1985, "Roles of the Elements of the Triphasic Control Signal," Exp. Neurol., 90(3), pp. 619–634. DOI: 10.1016/0014-4886(85)90160-8
73. Happee, R., 1992, "Time Optimality in the Control of Human Movements," Biol. Cybern., 66(4), pp. 357–366. DOI: 10.1007/BF00203672
74. Kolesnikov, S., and Khrulkov, V., 2020, "Sample Efficient Ensemble Learning With Catalyst.RL," e-print arXiv:2003.14210. https://arxiv.org/abs/2003.14210
75. Akimov, D., 2019, "Distributed Soft Actor-Critic With Multivariate Reward Representation and Knowledge Distillation," e-print arXiv:1911.13056. https://arxiv.org/abs/1911.13056
76. Happee, R., de Bruijn, E., Forbes, P. A., and van der Helm, F. C., 2017, "Dynamic Head-Neck Stabilization and Modulation With Perturbation Bandwidth Investigated Using a Multisegment Neuromuscular Model," J. Biomech., 58, pp. 203–211. DOI: 10.1016/j.jbiomech.2017.05.005
77. Diamond, A., and Holland, O. E., 2014, "Reaching Control of a Full-Torso, Modelled Musculoskeletal Robot Using Muscle Synergies Emergent Under Reinforcement Learning," Bioinspiration Biomimetics, 9(1), p. 016015. DOI: 10.1088/1748-3182/9/1/016015
78. Millard, M., Uchida, T., Seth, A., and Delp, S. L., 2013, "Flexing Computational Muscle: Modeling and Simulation of Musculotendon Dynamics," ASME J. Biomech. Eng., 135(2), p. 021005. DOI: 10.1115/1.4023390
79. Mukherjee, S., Perez-Rapela, D., Forman, J., Virgilio, K., and Panzer, M. B., 2021, "," IRCOBI Conference Proceedings, Online, Sept. 8–10, pp. 697–698.
80. Ólafsdóttir, J. M., Östh, J., and Brolin, K., 2019, "Modelling Reflex Recruitment of Neck Muscles in a Finite Element Human Body Model for Simulating Omnidirectional Head Kinematics," IRCOBI Conference Proceedings, Florence, Italy, Sept. 11–13, pp. 308–323.
81. Berret, B., Darlot, C., Jean, F., Pozzo, T., Papaxanthis, C., and Gauthier, J. P., 2008, "The Inactivation Principle: Mathematical Solutions Minimizing the Absolute Work and Biological Implications for the Planning of Arm Movements," PLoS Comput. Biol., 4(10), p. e1000194. DOI: 10.1371/journal.pcbi.1000194