Abstract

In this work, three reinforcement learning algorithms (Proximal Policy Optimization, Soft Actor-Critic, and Twin Delayed Deep Deterministic Policy Gradient) are employed to control a two-link selective compliance articulated robot arm (SCARA). The robot has three cables attached to its end-effector, which create a triangular workspace. Positioning the end-effector inside the workspace is a relatively simple kinematic problem, but moving outside this region, although possible, requires a nonlinear dynamic model and a state-of-the-art controller. To address this problem in a simple manner, the reinforcement learning algorithms are used to find feasible trajectories to three targets outside the workspace. Additionally, the SCARA mechanism offers two possible configurations for each end-effector position. The algorithms are compared in terms of displacement error, velocity, and standard deviation among ten trajectories produced by the trained networks. The results indicate that Proximal Policy Optimization is the most consistent in the situations analyzed, while Soft Actor-Critic produced better solutions and Twin Delayed Deep Deterministic Policy Gradient yielded interesting, more unusual trajectories.
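As a minimal sketch of the two-configuration property mentioned above (elbow-up and elbow-down solutions of the planar two-link inverse kinematics), the snippet below uses unit link lengths chosen for illustration only, not values from the paper:

```python
import numpy as np

def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Return the two joint-angle solutions (elbow-down, elbow-up)
    for a planar two-link arm reaching the point (x, y).

    Link lengths l1 and l2 are illustrative placeholders, not the
    dimensions used in the paper."""
    r2 = x**2 + y**2
    # Law of cosines for the elbow angle; clip guards against round-off.
    c2 = np.clip((r2 - l1**2 - l2**2) / (2.0 * l1 * l2), -1.0, 1.0)
    solutions = []
    for elbow in (+1, -1):  # +1: elbow-down, -1: elbow-up
        q2 = elbow * np.arccos(c2)
        q1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(q2),
                                           l1 + l2 * np.cos(q2))
        solutions.append((q1, q2))
    return solutions

# Both configurations reach the same end-effector point.
for q1, q2 in two_link_ik(1.2, 0.8):
    print(np.degrees([q1, q2]))
```

Points outside the cable-constrained triangular workspace have no static solution of this kind, which is why the paper resorts to learned dynamic trajectories instead.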
