Abstract

This paper presents two approaches to mobile manipulation for factory robots using deep neural networks. The networks are trained on synthetic datasets specific to the factory environment. Approach I feeds depth and red-green-blue (RGB) images of objects to a convolutional neural network (CNN). Approach II uses computer-aided design (CAD) models of the objects together with RGB images for a deep object pose estimation (DOPE) network and a perspective-n-point (PnP) algorithm. The two approaches are compared in terms of complexity, resources required for training, robustness, pose estimation accuracy, and run-time characteristics. Recommendations are given on which approach is suitable under which circumstances. Finally, the most suitable approach is implemented on a real mobile factory robot, which executes a series of manipulation tasks to validate the approach.
