Abstract

This study explores the design issues of a learning-based approach to a tri-finger robotic manipulation task that requires complex movements and coordination among the fingers. We employ reinforcement learning to train an agent to acquire the skills needed for proficient manipulation. To enhance learning efficiency, effectiveness, and robustness, two knowledge transfer strategies, fine-tuning and curriculum learning, are applied and compared within the soft actor-critic architecture. Fine-tuning allows the agent to leverage pre-trained knowledge and adapt it to new tasks; several task- and learning-related factors are investigated and evaluated, such as model versus policy transfer and within- versus across-task transfer. Curriculum learning, which eliminates the need for pretraining, decomposes the advanced task into simpler, progressive stages, mirroring how humans learn; here the number of learning stages, the context of the subtasks, and the timing of stage transitions are examined as critical design parameters. The key design parameters of the two strategies and their effects are explored in context-aware and context-unaware scenarios, allowing us to identify where each method performs best, derive conclusive insights, and contribute to a broader range of learning-based engineering applications.
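To make the staged-curriculum idea concrete, the sketch below shows one way such a pipeline could be wired up around an off-the-shelf soft actor-critic implementation (stable-baselines3). It is a minimal illustration under stated assumptions, not the authors' implementation: the stage environment list, the per-stage timestep budget, and the use of a placeholder environment are all hypothetical, and the paper's tri-finger task would take the place of the placeholder.

```python
# Minimal sketch (not the paper's implementation) of staged curriculum training
# with soft actor-critic, using the stable-baselines3 SAC implementation.
# STAGE_ENV_IDS and TIMESTEPS_PER_STAGE are illustrative assumptions; each stage
# would normally wrap a progressively harder variant of the manipulation task
# (e.g., reach -> grasp -> reposition).
import gymnasium as gym
from stable_baselines3 import SAC

STAGE_ENV_IDS = ["Pendulum-v1", "Pendulum-v1", "Pendulum-v1"]  # placeholders
TIMESTEPS_PER_STAGE = 50_000  # assumed per-stage training budget


def train_curriculum():
    # Initialize SAC on the first (simplest) stage.
    env = gym.make(STAGE_ENV_IDS[0])
    model = SAC("MlpPolicy", env, verbose=0)

    for stage, env_id in enumerate(STAGE_ENV_IDS):
        # Swap in the next stage's environment while keeping the learned
        # actor and critics, so knowledge is carried across stages.
        env = gym.make(env_id)
        model.set_env(env)
        model.learn(total_timesteps=TIMESTEPS_PER_STAGE,
                    reset_num_timesteps=False)
        model.save(f"sac_stage_{stage}")  # checkpoint for later fine-tuning

    return model


if __name__ == "__main__":
    train_curriculum()
```

The fine-tuning strategy compared in the study can be sketched in the same framework: a saved checkpoint would be reloaded on a new task environment (e.g., `SAC.load("sac_stage_2", env=new_task_env)`) and training continued with further `learn()` calls, rather than starting from scratch.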
