Abstract
Robotic technology can benefit disassembly operations by reducing human operators' workload and assisting them with handling hazardous materials. Safety considerations and the prediction of human movement are priorities in close collaboration between humans and robots. Point-by-point forecasting of human hand motion, which predicts a single point at each time-step, does not provide enough information on human movement because of the errors between the actual movement and the predicted value. This study instead provides a range of possible hand movements to increase safety. It applies three machine learning techniques, long short-term memory (LSTM), gated recurrent unit (GRU), and Bayesian neural network (BNN), combined with bagging and Monte Carlo dropout (MCD), namely, LSTM-bagging, GRU-bagging, and BNN-MCD, to predict the possible movement range. The study uses an inertial measurement unit (IMU) dataset collected from the disassembly of desktop computers by several participants to demonstrate the proposed method.
1 Introduction
Human–robot collaboration in disassembly operations has been receiving attention in recent years. Several topics such as disassembly sequence planning, object detection, human activity recognition, and human motion prediction are important when it comes to the disassembly operation in human–robot collaboration.
The aforementioned topics facilitate human–robot collaboration from different angles. Disassembly sequence planning determines the most suitable sequence for dismantling a product and sometimes specifies the task allocation between the human and the robot. Previous studies considered factors such as cost and safety when allocating disassembly tasks between humans and robots [1]. The idea is to use the capabilities of robots for handling hazardous tasks and improving operator safety. Object detection helps the robot identify objects for grasping, picking, and holding actions [2]. Human activity recognition improves the operator's safety and helps the robot operate autonomously while increasing work productivity [3].
Besides object detection and human activity recognition, human motion prediction plays an important role in enhancing the operator's safety [4]. Kaipa et al. [5] designed a hybrid cell to study the operator's safety and robotic operation to facilitate assembly in a jet engine case study. Morato et al. [6] studied the multiple Kinect setup to track the real-time human body joints such as elbows, wrists, and shoulders to increase the operator's safety in real-time collaboration with the robot. One of the main challenges in human motion prediction is the complexity of considering uncertainties in human motion [7].
Previous studies have used a wide range of methods to address human motion prediction in different applications. To name a few, Wang et al. [8] used long short-term memory (LSTM) and convolutional neural networks (CNN) to predict human motion for objects such as a cup, stone, sponge, spoon, and knife under different actions. Li et al. [9] applied a directed acyclic graph neural network to predict human motion in the CMU MOCAP and H3.6M datasets for actions such as walking and eating. Martinez et al. [10] used the gated recurrent unit (GRU) for human motion prediction on the H3.6M dataset, and Pavllo et al. [11] combined the QuaterNet framework with GRU to predict human motion on the same dataset. Zheng et al. [12] applied LSTM to forecast human arm motion on data generated from a Franka Emika Panda cobot. Wang and Shen [4] used neural networks combined with Kalman filtering to predict human hand motion for picking actions. Wang et al. [13] applied LSTM to hand motion in surface grinding. Zhang et al. [14] built a recurrent neural network (RNN) model to predict motion trajectories in the assembly process. Gril et al. [15] adopted a linear tensor regression model to predict human motion in the repetitive assembly and disassembly of six pins, springs, and ball bearings. Liao et al. [16] combined convolutional long short-term memory (ConvLSTM) and you only look once (YOLO) to predict human hand motion in the disassembly of desktop computers.
Previous studies also have investigated the uncertainty of tasks and human motion in human–robot collaboration. To name a few, Burks et al. [17] proposed an assisted robotic planning and sensing framework and applied the online partially observable Markov decision process for semantic sensing and planning under uncertain environments. Sajedi et al. [18] applied the Bayesian neural networks to quantify the uncertainty for semantic segmentation of hands in human–robot collaboration. Furnari et al. [19] discussed the loss function incorporating uncertainty for the egocentric action anticipation and recognition methods. Abu Farha and Gall [20] developed a framework for modeling the uncertainty of future activities and predicted the probability distribution of activities. Casalino et al. [21] developed a fuzzy approach for scheduling assembly tasks considering uncertain durations of tasks in a human–robot collaboration setting.
Although previous studies have extensively addressed human motion prediction, the literature on predicting the movement interval is still limited. This study aims to investigate the performance of three machine learning models—LSTM, GRU, and Bayesian neural network (BNN)—in combination with bagging and Monte Carlo dropout (MCD) techniques for estimating the potential range of human motion. Specifically, we examine the performance of three model variants: LSTM-bagging, GRU-bagging, and BNN-MCD. We also explore the unique application of electronic waste (e-waste) disassembly. Table 1 provides a comparison of this study with the prior work.
Comparison of literature and this study
Reference | Methodology | Type of forecast | Experimental process | Human–robot collaboration | E-waste |
---|---|---|---|---|---|
[4] | Neural network with Kalman filtering | Point | Pick up tasks | ||
[8] | LSTM with CNN | Point | Objects manipulation | √ | |
[9] | Directed acyclic graph neural network | Point | CMU MOCAP and H3.6M | ||
[10] | GRU | Point | Human 3.6M | ||
[11] | QuaterNet with GRU | Point | Human 3.6M | ||
[12] | LSTM | Point | Arm motion | √ | |
[13] | LSTM | Point | Surface grinding | √ | |
[14] | RNN | Point | Assembly | √ | |
[15] | Linear tensor regression | Interval | Assembly and disassembly | √ | |
[16] | ConvLSTM with YOLO | Point | Disassembly | √ | √ |
This study | LSTM-bagging, GRU-bagging, BNN-MCD | Interval | Disassembly | √ | √ |
The dynamic and nonlinear characteristics of human hand movement, due to rapid changes in speed, direction, and spatial trajectory, create significant challenges for accurate real-time prediction [22]. To manage the spatiotemporal uncertainty in human motion and predict subsequent movements, RNN variants, specifically the LSTM and GRU models, are well suited to sequence prediction tasks because of their gate mechanisms [8]. Moreover, the BNN, as a probabilistic model, can handle the uncertainty in human motion [23]. The LSTM and GRU models address the vanishing gradient problem, a limitation of traditional RNNs in learning long-term dependencies [24]. Recent studies show that LSTM [25], GRU [26], and BNN [23] models are capable of predicting human motion owing to their capacity to handle spatial and temporal uncertainties. However, the existing literature mainly focuses on point prediction rather than interval prediction, as summarized in Table 1. Point predictions, while useful, carry the risk of collision in human–robot interactions. To address this risk, our study shifts from point predictions to interval predictions.
E-waste is becoming a serious environmental and economic problem. In 2019, 53.6 million tons of e-waste were generated around the globe, with a growth rate of 21% [27]. Product recovery solutions such as eco-design policies and facilitating disassembly operations are important for e-waste recovery [28]. E-waste disassembly is particularly unique since it involves the separation and recovery of a complex mix of materials, ranging from metals to hazardous substances, and it often requires handling small, intricate parts that are difficult to dismantle. Further, the high variability in consumer electronics design makes disassembly challenging for the remanufacturing workforce, and chemical exposure as well as physical and ergonomic hazards increase the risk of disassembly operations for human workers. Thus, the disassembly of e-waste requires further investigation.
The focus of this study is on disassembling desktop computers. This article is organized as follows. Section 2 provides an overview of LSTM, GRU, and BNN models. Section 3 describes the dataset and data collection experiment. Section 4 provides the prediction results. Finally, Sec. 5 concludes the article.
2 Methodology
This section describes the three machine learning models combined with bagging and MCD.
2.1 Long Short-Term Memory With Bagging.
We used PyTorch to construct the LSTM network [30]. The hidden size refers to the number of cells in each hidden layer, while the number of hidden layers indicates how many LSTM layers are stacked. The learning rate is set to 1 × 10⁻³ with a weight decay of 1 × 10⁻⁶, and the number of epochs is 100. We used the Adam optimizer and the squared l2 norm (i.e., the mean squared error) as the loss function.
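For reference, the following is a minimal sketch of this setup in PyTorch. The layer sizes match the hyperparameters later selected in Sec. 4.1; names such as `HandLSTM` are illustrative rather than the authors' actual code.

```python
import torch
import torch.nn as nn

class HandLSTM(nn.Module):
    """Minimal LSTM regressor: a window of past positions in, one position out."""
    def __init__(self, input_size=1, hidden_size=16, num_layers=1, dropout=0.1):
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                        # x: (batch, window, input_size)
        out, _ = self.rnn(x)                     # out: (batch, window, hidden_size)
        return self.head(self.drop(out[:, -1, :]))  # last time-step -> next position

model = HandLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
loss_fn = nn.MSELoss()                           # squared l2 norm
```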
Furthermore, we applied bagging to LSTM. Bagging, also known as bootstrap aggregation [31], is an ensemble learning method that reduces variance [32] and helps avoid local optima by repeating the training process [33]. Bagging has already shown its promise in the previous literature [34–36]. In this article, we trained the LSTM 30 times; the ensemble prediction is the mean of the 30 predicted values.
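A sketch of the bagging loop follows, under the assumption that each ensemble member is trained on a bootstrap resample of the training windows; `X_train` and `y_train` are placeholder tensors of shape `(n, window, 1)` and `(n, 1)`.

```python
def train_bagged(X_train, y_train, n_models=30, epochs=100):
    """Train n_models LSTMs, each on a bootstrap resample of the data."""
    models = []
    n = len(X_train)
    for _ in range(n_models):
        idx = torch.randint(0, n, (n,))            # sample with replacement
        Xb, yb = X_train[idx], y_train[idx]
        m = HandLSTM()
        opt = torch.optim.Adam(m.parameters(), lr=1e-3, weight_decay=1e-6)
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(m(Xb), yb)
            loss.backward()
            opt.step()
        models.append(m)
    return models

def bagged_predict(models, X):
    """Stack the member predictions; the mean is the ensemble point forecast."""
    with torch.no_grad():
        preds = torch.stack([m(X) for m in models])   # (n_models, batch, 1)
    return preds.mean(dim=0), preds
```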
2.2 Gated Recurrent Unit With Bagging.
The GRU is formulated with an update gate and a reset gate, expressed by the vectors $z_t$ and $r_t$, respectively. The $x_t$, $\tilde{h}_t$, and $h_t$ represent the input vector, candidate activation vector, and output vector, respectively. The $W$ and $b$ terms are the weight matrices and bias vectors, and $\sigma$ and $\tanh$ are the sigmoid function and hyperbolic tangent function, respectively:

$$z_t = \sigma(W_z[h_{t-1}, x_t] + b_z)$$
$$r_t = \sigma(W_r[h_{t-1}, x_t] + b_r)$$
$$\tilde{h}_t = \tanh(W_h[r_t \odot h_{t-1}, x_t] + b_h)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
The GRU simplifies the architecture of the LSTM: it has two gates instead of the LSTM's three, with the update gate combining the functions of the LSTM's input and forget gates. With fewer parameters to adjust, the GRU can be trained faster than the LSTM. It has a simple structure for training and can address the issues of memory use, gradient vanishing, and gradient explosion [39]. As shown in previous studies [40,41], bagging can improve the performance of both LSTM and GRU; therefore, we applied bagging to both models.
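Because `nn.GRU` shares the interface of `nn.LSTM` in PyTorch, the illustrative model above becomes its GRU counterpart with a one-line swap, and bagging then proceeds exactly as before:

```python
class HandGRU(HandLSTM):
    """Same interface as HandLSTM, with the recurrent core swapped for a GRU."""
    def __init__(self, input_size=1, hidden_size=16, num_layers=1, dropout=0.4):
        super().__init__(input_size, hidden_size, num_layers, dropout)
        # nn.GRU has two gates (update, reset) instead of the LSTM's three
        self.rnn = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
```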
2.3 Bayesian Neural Network With Monte Carlo Dropout.
The activation functions of the first and second layers are ReLU and linear, respectively, and each layer has a dropout with a probability of 20%. The default loss function of the package combines the squared l2 norm with the Kullback–Leibler divergence. The remaining settings, such as the optimizer, learning rate, and other hyperparameters, are the same as for the LSTM.
In addition, we applied MCD to generate repeated predictions from the BNN model. MCD was proposed by Gal and Ghahramani [46]; it keeps dropout active in the testing phase [47], so the effective model architecture changes each time a prediction is made. This article runs MCD on the BNN 30 times, after which the ensemble prediction is computed. Monte Carlo dropout combined with Bayesian inference has received attention in different fields due to its simplicity, scalability, and computational efficiency [48].
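A minimal sketch of MCD at inference time follows, assuming a model whose only train-mode-dependent layers are dropout (the helper name `mc_dropout_predict` is illustrative):

```python
def mc_dropout_predict(model, X, n_samples=30):
    """Run n_samples stochastic forward passes with dropout kept active."""
    model.train()    # keep dropout live at test time; avoid if the model has batch norm
    with torch.no_grad():
        preds = torch.stack([model(X) for _ in range(n_samples)])
    model.eval()
    return preds.mean(dim=0), preds      # ensemble mean and the raw 30 samples
```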
The main difference between the BNN and RNN variants such as LSTM and GRU is that the BNN treats its parameters as probability distributions inferred from the observed dataset, whereas LSTM and GRU learn point estimates of their parameters from the data. In this study, we conduct a hyperparameter experiment for LSTM, GRU, and BNN.
2.4 Possible Hand Movement Area.
After estimating the possible hand movement areas, the information can be provided to the robot control algorithm to avoid collisions. Figure 1 illustrates point estimation versus range estimation. The outlined circles represent the true observations, while the filled circles show the predictions. A single-point prediction risks misalignment between the predicted point and the actual movement, which may lead to a collision.
The objective of this study is movement range prediction rather than point prediction. The input to the bagging or MCD model is the previous movement, and the output is the predicted range of movement from t + 1 to t + 3. Bagging and MCD let the LSTM, GRU, and BNN models perform multiple iterations of the hand movement prediction and thereby form a predicted movement range. The two techniques target different aspects: bagging addresses the dataset structure and is applied during training, while MCD modifies the model architecture and is applied at inference. In the training stage, bagging resamples the dataset to introduce data diversity, which helps reduce variance [31] and avoid local optima [33]. MCD, in contrast, is applied in both the training and testing stages to handle the two types of uncertainty in the BNN, namely, epistemic and aleatoric. Aleatoric uncertainty captures data noise, while epistemic uncertainty reflects uncertainty in the model parameters [49]. We exploit these attributes of bagging and MCD to alleviate the collision risk between the human operator and the robot by predicting the possible areas of hand movement.
The upper and lower bounds can be calculated from multiple runs of bagging and MCD. Each model is run 30 times by applying bagging or MCD, yielding 30 predicted values at each time-step. Among these 30 predictions, the maximum and minimum values are taken as the upper and lower boundaries.
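Given the 30 stacked predictions returned by either sketch above, the movement band reduces to an elementwise min/max:

```python
def movement_bounds(preds):
    """preds: (n_samples, batch, 1) stacked predictions from bagging or MCD."""
    return preds.min(dim=0).values, preds.max(dim=0).values   # lower, upper bound
```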
3 The Disassembly Dataset
This section describes the data collection procedure and the disassembly experiment for a desktop computer.
3.1 Dataset of Dell OptiPlex 7050 Micro Desktop for Disassembling.
The required dataset was collected using inertial measurement unit (IMU) sensors. Six sensors were deployed on one participant (P1), as shown in Fig. 2. The product under disassembly is a Dell OptiPlex 7050 Micro desktop computer. Six components were dismantled from the desktop in the following order: (1) screw, (2) cover, (3) hard disk drive, (4) fan, (5) heat sink, and (6) RAM (Fig. 3).
The participants completed informed consent forms, and the experiment was authorized by the University of Florida Institutional Review Board (IRB 202200211). The sampling frequency of the IMU sensors is 60 Hz, i.e., 60 samples per second, so the duration between samples is 16.67 ms. In total, 6686 samples were collected over a disassembly time of around 111 s. The numbers of collected samples for the (1) screw, (2) cover, (3) hard disk drive, (4) fan, (5) heat sink, and (6) RAM are 607 (10.1 s), 335 (5.6 s), 464 (7.7 s), 734 (12.2 s), 3773 (62.9 s), and 775 (12.9 s), respectively. The proportion of training, validation, and testing data is 70%, 20%, and 10% for each component; for example, the numbers of training, validation, and testing samples for the screw are 425 (70%), 121 (20%), and 61 (10%), respectively. The durations between the current time t and the predicted times t + 1, t + 2, and t + 3 are 16.67 ms, 33.34 ms, and 50.01 ms, respectively.
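A sketch of the per-component split, assuming a chronological ordering, which the time-series setting suggests (the helper name is illustrative):

```python
def split_component(samples, train=0.7, val=0.2):
    """Chronological 70/20/10 split; for the 607 screw samples this yields
    425/121/61, matching the counts reported above."""
    n = len(samples)
    i = round(n * train)             # 607 -> 425
    j = round(n * (train + val))     # 607 -> 546
    return samples[:i], samples[i:j], samples[j:]
```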
3.2 Time Length for Input and Output.
The hand's X, Y, and Z positions are collected by the sensors throughout the entire disassembly operation. The movement of the collected samples is shown in Fig. 4. The unit for hand positions is mm.
According to Fig. 4, the hand movement differs for each component. The length of the input window is decided based on the Pearson correlation coefficient (PCC) between the next time-step t + 1 and the previous time-steps t, t−1, …, t−n. One advantage of the PCC is that it quantifies the degree to which variables are linearly related and measures the proportion of variance shared between them [50]. The input time length is selected such that the PCC between the next time-step and each included previous time-step is above 0.99; a longer input window would increase complexity and computation time, so we only selected lag features with a PCC of at least 0.99.
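A sketch of this lag selection for a 1D position series, using numpy's correlation routine (the helper name and the `max_lag` cap are assumptions):

```python
import numpy as np

def select_window(series, threshold=0.99, max_lag=30):
    """Length of the longest run of lags t, t-1, ... whose Pearson correlation
    with the t+1 target stays at or above the threshold."""
    y = series[max_lag:]                                    # values at t + 1
    window = 0
    for k in range(max_lag):
        x = series[max_lag - 1 - k : len(series) - 1 - k]   # values at t - k
        if np.corrcoef(x, y)[0, 1] < threshold:
            break
        window = k + 1
    return window
```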
The concept of the input window for input and output is described in Fig. 5. For predicting the hand's X position at time t + 1 as output, the input window spans 7 time points, from t−6 to t. Similarly, for predicting the Y position at time t + 1, we used a longer window of the previous 10 time points, from t−9 to t, and for the Z position, a window of the previous 9 time points, from t−8 to t. The PCC between each input time-step and the output time-step is at least 0.99. When forecasting t + 2, the input window is shifted by one time lag without changing its size; for example, when forecasting t + 2 for hand position X, the input runs from t−5 to t + 1.
Figure 5 shows that the LSTM model uses a window of the previous 10 time points, from t−9 to t, as input for predicting the hand's Y position at time t + 1. The predicted value for Y at time t + 1, along with the input window from t−8 to t, is then used as input for predicting the hand's position at time t + 2, and this process is repeated to predict the position at time t + 3. In Fig. 6, the maximum hand movement between time t and t + 3 is 53.26 mm (about 5.3 cm), meaning the hand can move more than 5 cm in roughly 50 ms. Such rapid movement must be predicted carefully to avoid any collision.
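A sketch of this recursive forecasting scheme, feeding each prediction back into the sliding window (an illustrative helper that reuses the model interface above):

```python
def recursive_forecast(model, window, horizon=3):
    """Predict t+1..t+horizon by sliding the window over its own outputs.
    window: tensor of shape (1, window_len, 1) with the latest observations."""
    preds = []
    with torch.no_grad():
        for _ in range(horizon):
            nxt = model(window)                                   # shape (1, 1)
            preds.append(nxt.item())
            # drop the oldest point, append the new prediction
            window = torch.cat([window[:, 1:, :], nxt.view(1, 1, 1)], dim=1)
    return preds    # [t+1, t+2, t+3]
```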
4 The Results of Human Hand Motion Prediction
This section discusses the results and compares the findings of the three models.
4.1 Hyperparameter Experiment.
This study conducted a hyperparameter experiment to find proper parameters for LSTM, GRU, and BNN. We applied the grid search algorithm to exhaustively inspect each combination of parameters; grid search is a well-recognized hyperparameter optimization method for machine learning [51]. Following Refs. [30,52], the LSTM and GRU have three hyperparameters, namely, hidden size, number of hidden layers, and dropout rate, while the BNN has five: number of hidden layers, number of neurons, mean of the prior, sigma of the prior (in a normal distribution), and dropout rate [45]. The range of each parameter is shown in Table 2. For LSTM and GRU, we tried 225 parameter combinations (5 hidden sizes × 5 layer counts × 9 dropout rates); for BNN, we first tuned the number of hidden layers and the number of neurons separately before trying 729 combinations (9 × 9 × 9) of the mean of the prior, sigma of the prior, and dropout rate. We conducted the experiment on P1's hand X dataset; a minimal sketch of the search loop follows Table 2.
The list of parameters and ranges for the hyperparameter experiment in grid search
Model | Parameter | Range |
---|---|---|
LSTM/GRU | Hidden size | 16, 32, 64, 128, 256 |
LSTM/GRU | Hidden layers | 1, 2, 3, 4, 5 |
BNN | Number of neurons | 25, 50, 75, 100, 125, 150 |
BNN | Hidden layers | 1, 2, 3, 4, 5 |
BNN | Mean of prior | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 |
BNN | Sigma of prior | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 |
LSTM/GRU/BNN | Dropout rate | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 |
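As referenced above, a minimal sketch of the grid search over the LSTM/GRU ranges in Table 2, selecting by validation RMSE; `train_and_validate` is an assumed helper that trains a model and returns its validation RMSE.

```python
from itertools import product

hidden_sizes  = [16, 32, 64, 128, 256]
hidden_layers = [1, 2, 3, 4, 5]
dropout_rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

best_rmse, best_cfg = float("inf"), None
for hs, hl, dr in product(hidden_sizes, hidden_layers, dropout_rates):
    rmse = train_and_validate(HandLSTM(hidden_size=hs, num_layers=hl, dropout=dr))
    if rmse < best_rmse:                     # 5 x 5 x 9 = 225 combinations
        best_rmse, best_cfg = rmse, (hs, hl, dr)
```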
After conducting the hyperparameter experiments, the best parameters are shown in Table 3. For LSTM, the best hidden size, number of hidden layers, and dropout rate are 16, 1, and 0.1, respectively; for GRU, the best hidden size and number of hidden layers are the same as for LSTM but with a dropout rate of 0.4; for BNN, the number of neurons, number of hidden layers, mean of the prior, sigma of the prior, and dropout rate are 100, 1, 0.5, 0.1, and 0.2, respectively. Figures 7 and 8 show the grid search process in the hyperparameter experiment.
Grid search process for (a) LSTM at 0.1 dropout rate, (b) GRU at 0.4 dropout rate, and (c) BNN at 0.2 dropout rate. The optimal parameters with the lowest RMSE are marked with a dot.
The best parameters after conducting the hyperparameter experiment
Model | Parameter settings |
---|---|
LSTM | Hidden size: 16; Hidden layers: 1; Dropout rate: 0.1 |
GRU | Hidden size: 16; Hidden layers: 1; Dropout rate: 0.4 |
BNN | Number of neurons: 100; Hidden layers: 1; Mean of prior: 0.5; Sigma of prior: 0.1; Dropout rate: 0.2 |
4.2 Human Hand Motion Prediction.
The mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) are used to evaluate the performance of the three models for predicting t + 1 to t + 3 (errors are in mm; MSE in mm²). After training and validation, the testing results are listed in Tables 4–6. According to Table 4, the GRU-bagging model outperforms the other models in predicting the hand's X position: its MAE, MSE, and RMSE range from 5.0 to 9.8, 44.2 to 195.5, and 6.6 to 14.0, respectively.
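For reference, the three metrics reduce to a few lines of numpy (a sketch):

```python
import numpy as np

def metrics(y_true, y_pred):
    """MAE, MSE, and RMSE between observed and predicted positions (mm)."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    return mae, mse, np.sqrt(mse)
```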
The ensemble prediction results of each model for hand position X
Model | Time | MAE | MSE | RMSE |
---|---|---|---|---|
LSTM-bagging | t + 1 | 5.7 | 57.1 | 7.6 |
GRU-bagging | t + 1 | 5.0 | 44.2 | 6.6 |
BNN-MCD | t + 1 | 7.4 | 113.0 | 10.6 |
LSTM-bagging | t + 2 | 8.5 | 139.2 | 11.8 |
GRU-bagging | t + 2 | 8.0 | 115.3 | 10.7 |
BNN-MCD | t + 2 | 10.2 | 145.6 | 12.1 |
LSTM-bagging | t + 3 | 10.4 | 216.8 | 14.7 |
GRU-bagging | t + 3 | 9.8 | 195.5 | 14.0 |
BNN-MCD | t + 3 | 12.5 | 249.5 | 15.8 |
Note: The models with the best performance are highlighted in bold.
The ensemble prediction results of each model for hand position Y
Model | Time | MAE | MSE | RMSE |
---|---|---|---|---|
LSTM-bagging | t + 1 | 8.2 | 116.2 | 10.8 |
GRU-bagging | t + 1 | 8.3 | 115.2 | 10.7 |
BNN-MCD | t + 1 | 26.2 | 1268.3 | 35.6 |
LSTM-bagging | t + 2 | 14.0 | 325.2 | 18.0 |
GRU-bagging | t + 2 | 14.9 | 333.4 | 18.3 |
BNN-MCD | t + 2 | 32.4 | 1688.5 | 41.1 |
LSTM-bagging | t + 3 | 18.6 | 628.9 | 25.1 |
GRU-bagging | t + 3 | 19.3 | 643.1 | 25.4 |
BNN-MCD | t + 3 | 36.8 | 2088.2 | 45.7 |
Note: The models with the best performance are highlighted in bold.
The ensemble prediction results of each model for hand position Z
Model | Time | MAE | MSE | RMSE |
---|---|---|---|---|
LSTM-bagging | t + 1 | 9.0 | 141.4 | 11.9 |
GRU-bagging | t + 1 | 10.2 | 182.0 | 13.5 |
BNN-MCD | t + 1 | 16.9 | 439.3 | 21.0 |
LSTM-bagging | t + 2 | 6.2 | 77.3 | 8.8 |
GRU-bagging | t + 2 | 7.7 | 117.1 | 10.8 |
BNN-MCD | t + 2 | 19.7 | 609.9 | 24.7 |
LSTM-bagging | t + 3 | 15.8 | 335.5 | 18.3 |
GRU-bagging | t + 3 | 15.0 | 312.1 | 17.7 |
BNN-MCD | t + 3 | 46.4 | 2395.7 | 48.9 |
Note: The models with the best performance are highlighted in bold.
According to Table 5, LSTM-bagging outperforms the other models in predicting the Y position. The MAE of the LSTM-bagging model increases from 8.2 to 18.6 from t + 1 to t + 3, and the MSE increases from 116.2 to 628.9. LSTM-bagging also has better results in terms of RMSE.
According to the results presented in Table 6, LSTM-bagging outperforms the other models in forecasting the hand's Z position at t + 1 and t + 2, while GRU-bagging is better at t + 3. Specifically, over t + 1 and t + 2, the MAE, MSE, and RMSE of the LSTM-bagging model range from 6.2 to 9.0, 77.3 to 141.4, and 8.8 to 11.9, respectively. The MAE, MSE, and RMSE for GRU-bagging at t + 3 are 15.0, 312.1, and 17.7, respectively.
Figures 9–11 show the prediction results for the X position by BNN-MCD, the Y position by GRU-bagging, and the Z position by LSTM-bagging. The x-axis shows time, with 16.67 ms between samples, along with the order of the six disassembly tasks described in Sec. 3.1. Figure 12 shows the prediction performance over time for hand position X by GRU-bagging and for hand positions Y and Z by LSTM-bagging.
The prediction performance over time for hand position X by GRU-bagging and for hand positions Y and Z by LSTM-bagging
Although the predicted trends are similar to the observations, errors remain between the predicted and observed values.
4.3 Prediction Results of the Potential Range of Motion for the Human Hand.
The upper and lower bounds of the hand movement can be determined from the 30 prediction results of each model. Figures 13–15 show the results for GRU-bagging, LSTM-bagging, and BNN-MCD, respectively. In Figs. 13 and 14, the boundaries miss the hand position for some disassembly tasks, e.g., the heat sink, because the models overestimate or underestimate the observed values. In contrast, the boundaries defined by BNN-MCD provide a reasonable movement area for the hand position, as shown in Fig. 15. MCD approximates a Gaussian process by randomly dropping out neurons, while the BNN is a probabilistic model that treats weights as random variables. The combination of these two approaches, both of which model uncertainty, appears better suited for defining the possible movement area than deterministic models such as GRU-bagging and LSTM-bagging.
The testing results of GRU-bagging on the possible movement area of the hand's Y position in time t + 1
The testing results of LSTM-bagging on the possible movement area of the hand's Y position in time t + 1
Table 7 presents partial testing samples of the BNN-MCD model, while Table 8 shows the number of observations that fall outside the boundary. Although some observations in Table 7 still do not fall within the boundary, the BNN-MCD model performs better than LSTM-bagging and GRU-bagging, with fewer out-of-bounds observation points, as shown in Table 8. Specifically, for t + 1, BNN-MCD has an error rate of only around 16% (107/669), while LSTM-bagging and GRU-bagging have error rates of 53% (356/669) and 56% (373/669), respectively. These results demonstrate that the BNN-MCD model is more effective at forecasting the possible areas of hand movement; a counting sketch follows Table 8.
Partial testing results of BNN-MCD, showing the first two predictions for each task for the hand's Y position at time t + 1
Task | Obs. | Pred. | Errors | Max. boundary | Min. boundary |
---|---|---|---|---|---|
1 | −213 | −151 | 62 | −127 | −193 |
1 | −215 | −148 | 67 | −113 | −207 |
2 | −249 | −200 | 49 | −146 | −255 |
2 | −249 | −192 | 57 | −149 | −289 |
3 | −431 | −416 | 14 | −274 | −467 |
3 | −434 | −418 | 15 | −300 | −470 |
4 | −71 | −48 | 23 | −21 | −80 |
4 | −82 | −53 | 28 | −10 | −98 |
5 | −38 | −45 | 7 | −9 | −71 |
5 | −38 | −53 | 15 | −15 | −86 |
6 | −578 | −622 | 44 | −491 | −700 |
6 | −578 | −593 | 16 | −428 | −694 |
The number of observation testing samples not within the upper and lower bounds for hand position Y (total test samples: 669)
Time | LSTM-bagging | GRU-bagging | BNN-MCD |
---|---|---|---|
t + 1 | 356 | 373 | 107 |
t + 2 | 527 | 499 | 170 |
t + 3 | 532 | 491 | 205 |
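As referenced above, the out-of-bounds counts in Table 8 reduce to a comparison against the per-sample bounds (a sketch; the array names are placeholders):

```python
import numpy as np

def count_outside(obs, lower, upper):
    """Number of observations falling outside the predicted movement band."""
    obs, lower, upper = map(np.asarray, (obs, lower, upper))
    return int(((obs < lower) | (obs > upper)).sum())   # e.g., 107 of 669
```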
The results imply that practical applications can achieve strong predictions by combining advanced machine learning methods: diverse techniques improve accuracy and help manage the uncertainties and complexities of real-world scenarios, thereby reducing risk.
Figure 16 displays the probability density functions (PDFs) of the Gaussian distributions generated by BNN-MCD, which was run 30 times at each time point. Each PDF is a normal distribution whose mean and standard deviation are calculated from the 30 samples at that time. The distributions are narrower in the range of approximately 200–400, indicating a smaller range of possible movement and lower uncertainty, whereas other ranges show wider distributions, implying higher uncertainty and a larger range of possible movement.
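A sketch of how each per-time-point distribution can be formed from the 30 MCD samples, assuming a Gaussian fit as in Fig. 16 (the helper name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def fit_pdf(samples):
    """Gaussian fit to the 30 MCD predictions at one time point."""
    mu, sigma = np.mean(samples), np.std(samples, ddof=1)
    xs = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
    return xs, norm.pdf(xs, loc=mu, scale=sigma)   # x-grid and density values
```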
The Gaussian distributions of testing results for hand position X at time t + 1, plotted for every 25th sample, using the BNN-MCD model
It should be noted that since the PDFs are drawn from simulation samples generated by BNN-MCD rather than from real data, actual observations may still fall outside the boundary, as shown in Table 8. How to improve the forecasting accuracy further remains a question for future research.
4.4 Implementation for Additional Participants.
This study further includes three additional participants, labeled P2, P3, and P4, performing the computer disassembly tasks. The three models, LSTM-bagging, GRU-bagging, and BNN-MCD, are used to predict the possible movement of hand position X, and the results for each participant are presented in Tables 9–11. The findings show that the optimal model for point prediction varies among participants. This suggests that in real-world applications, applying all three models can provide mutual validation and manage the variability in hand movements across individuals.
The ensemble prediction results of each model for hand position X for participant P2
Model | Time | MAE | MSE | RMSE |
---|---|---|---|---|
LSTM-bagging | t + 1 | 0.9 | 1.5 | 1.2 |
GRU-bagging | t + 1 | 0.9 | 1.7 | 1.3 |
BNN-MCD | t + 1 | 5.7 | 64.9 | 8.1 |
LSTM-bagging | t + 2 | 1.3 | 2.6 | 1.6 |
GRU-bagging | t + 2 | 1.1 | 2.4 | 1.5 |
BNN-MCD | t + 2 | 9.4 | 190.1 | 13.8 |
LSTM-bagging | t + 3 | 6.7 | 62.4 | 7.9 |
GRU-bagging | t + 3 | 6.2 | 51.4 | 7.2 |
BNN-MCD | t + 3 | 11.1 | 270.4 | 16.4 |
Note: The models with the best performance are highlighted in bold.
The ensemble prediction results of each model for hand position X for participant P3
Model | Time | MAE | MSE | RMSE |
---|---|---|---|---|
LSTM-bagging | t + 1 | 3.0 | 16.9 | 4.1 |
GRU-bagging | t + 1 | 2.5 | 11.3 | 3.4 |
BNN-MCD | t + 1 | 10.1 | 171.7 | 13.1 |
LSTM-bagging | t + 2 | 4.4 | 40.7 | 6.4 |
GRU-bagging | t + 2 | 3.3 | 24.4 | 4.9 |
BNN-MCD | t + 2 | 11.8 | 176.0 | 13.3 |
LSTM-bagging | t + 3 | 7.6 | 113.1 | 10.6 |
GRU-bagging | t + 3 | 6.3 | 76.2 | 8.7 |
BNN-MCD | t + 3 | 9.5 | 180.4 | 13.4 |
Note: The models with the best performance are highlighted in bold.
The ensemble prediction results of each model for hand position X for participant P4
Model | Time | MAE | MSE | RMSE |
---|---|---|---|---|
LSTM-bagging | t + 1 | 1.4 | 3.2 | 1.8 |
GRU-bagging | t + 1 | 1.7 | 4.7 | 2.2 |
BNN-MCD | t + 1 | 4.0 | 24.8 | 5.0 |
LSTM-bagging | t + 2 | 2.1 | 6.8 | 2.6 |
GRU-bagging | t + 2 | 2.3 | 9.1 | 3.0 |
BNN-MCD | t + 2 | 5.3 | 47.5 | 6.9 |
LSTM-bagging | t + 3 | 2.8 | 14.1 | 3.8 |
GRU-bagging | t + 3 | 3.0 | 17.5 | 4.2 |
BNN-MCD | t + 3 | 6.2 | 75.9 | 8.7 |
Note: The models with the best performance are highlighted in bold.
The Kruskal–Wallis test is applied to determine whether the performance of the three models differs significantly for participants P1, P2, P3, and P4. This nonparametric statistical test evaluates differences in model performance based on the errors between predicted and actual values. The null hypothesis states that all models perform the same, with a significance level of α = 0.05. The resulting P-value is less than 0.05, so the null hypothesis is rejected: there are significant differences in the performance of the LSTM-bagging, GRU-bagging, and BNN-MCD models. Dunn's test is then applied to identify the pairwise differences, as shown in Table 12; a sketch of the procedure follows the table. The matrix values represent P-values, where a P-value less than 0.05 indicates a significant difference in performance between two models.
The Dunn's test matrix (P-values) after conducting the Kruskal–Wallis test for hand position X at t + 1 for participant P1
| BNN | LSTM | GRU |
---|---|---|---|
BNN | 1.0 | | |
LSTM | | 1.0 | |
GRU | | | 1.0 |
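As referenced above, a sketch of the testing procedure using `scipy` and the `scikit-posthocs` package; the error arrays here are synthetic placeholders standing in for each model's per-sample absolute errors.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal
import scikit_posthocs as sp

# placeholder error arrays; in practice, |observed - predicted| per model
rng = np.random.default_rng(0)
err_bnn, err_lstm, err_gru = (rng.gamma(2.0, 3.0, 669) for _ in range(3))

stat, p = kruskal(err_bnn, err_lstm, err_gru)
if p < 0.05:                                   # performances differ somewhere
    df = pd.DataFrame({
        "error": np.concatenate([err_bnn, err_lstm, err_gru]),
        "model": ["BNN"] * len(err_bnn) + ["LSTM"] * len(err_lstm)
                 + ["GRU"] * len(err_gru),
    })
    dunn = sp.posthoc_dunn(df, val_col="error", group_col="model")
    print(dunn)                                # pairwise P-value matrix (Table 12)
```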
This study found that the optimal model for point prediction varies among participants, whose movements exhibit different levels of uncertainty, as demonstrated in Tables 9–11. For movement range prediction, however, BNN-MCD performs best, as described in Sec. 4.3: Table 8 shows that most true movement values fall within its prediction range. Thus, all three models are recommended for point prediction, while BNN-MCD is the best model for movement range prediction.
5 Conclusion
This article investigated the capability of three machine learning techniques, LSTM-bagging, GRU-bagging, and BNN-MCD, to predict the range of hand motion in the disassembly of consumer electronics. A case study of disassembling a desktop computer was used to show the application, and IMU sensors were utilized to collect the required movement data. The bagging and MCD procedures were performed 30 times, and the resulting ensemble predictions were calculated. The findings show that the optimal model for point prediction varies among individual participants. The possible movement range is defined to improve the safety of the human operator, and in terms of defining the upper and lower bounds, BNN-MCD outperforms LSTM-bagging and GRU-bagging. The BNN, a probabilistic model, is combined with MCD, which approximates a Gaussian process, to vary the model's architecture and account for uncertainty. This study suggests that in practical applications, all three models can be used for point prediction, while BNN-MCD is the best model for predicting the possible movement range.
The study can be extended in several ways. The current study analyzed each hand position separately to provide a detailed comparison of each model's performance. While this separation offered different perspectives, it is computationally expensive; to reduce computation, future research may consider all three positions together, for instance, by inputting positions X, Y, and Z into each model and outputting forecasts for all three positions at once. Also, each disassembly operation was conducted once by a human operator, and future work is needed to collect more samples across participants and across more complex disassembly tasks.
Moreover, the data collection in this study focused on the upper extremity and hand motion in disassembly tasks; the study can be extended to consider whole-body motion. Besides IMU sensors, other modalities such as RGB video can be combined with IMU data to define different possible movement areas. Further, machine learning models can be integrated with computer vision techniques to equip robots with more accurate scene monitoring.
Acknowledgment
This material was based upon work supported by the National Science Foundation–USA under grants #2026276 and #2422826. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.