## Abstract

In this paper, the feasibility of applying transfer learning to the modeling of robot manipulators is examined. A neural network-based transfer learning approach to the inverse displacement analysis of robot manipulators is studied. Neural networks with different structures are trained using data from different configurations of a manipulator. Transfer learning is then conducted between manipulators with different geometric layouts. Training is performed both on neural networks with pretrained initial parameters and on neural networks with random initialization. To investigate the rate of convergence of the data fitting comprehensively, different performance targets are defined, and the corresponding training epochs and performance measures are compared. The results show that, depending on the structure of the neural network, the proposed transfer learning can accelerate the training process and achieve higher accuracy, and that the improvement varies across datasets.

## 1 Introduction

A neural network (NN) is based on a collection of neural nodes. A node receives and processes an input signal and then transfers its output signal to the nodes in the next layer, mimicking the function of a neuron in a biological brain. There is an input layer where input data are presented and an output layer giving a response to the input data; the layers between them are called hidden layers. The weights on the connections between nodes determine the strength of influence between connected nodes. As the most popular type of neural network, the backpropagation neural network adjusts the weights and biases between layers to minimize the mean squared error (MSE) of its output predictions [1]. Through iterative computation, neural networks have proven effective for nonlinear regression [2]. To optimize the training process, different training algorithms have been developed [3].

The inverse displacement analysis (IDA) of a robot manipulator is the calculation of the joints’ motions that provide the required position and orientation of the end effector. Three common methods are applied for the IDA: geometric, algebraic, and iterative. NNs can be regarded as a feasible approach for solving the IDA, and research on NN-based IDA for manipulators has shown relatively satisfying accuracy [4,5]. In Ref. [6], the inverse displacement of a 6-degrees-of-freedom (DOF) manipulator was simulated using a multilayer backpropagation neural network; to find a suitable NN configuration for solving the IDA of a 6-DOF manipulator, NNs with different numbers of output nodes were compared, and an improved method of designing the NN was proposed. In Ref. [7], an NN was developed to solve the IDA for a SCARA robot manipulator and reached satisfying accuracy. For the IDA of a 7-DOF manipulator with more joints than the SCARA manipulator, an NN also achieved acceptable position errors along a defined path [8]. A genetic algorithm and an adaptive-learning algorithm based on NNs were presented in Refs. [9] and [10], respectively, both of which significantly reduced the errors of the joints’ motions.

It has been demonstrated that, in some machine learning cases, knowledge transferred from one task can speed up the learning of another task; this is known as transfer learning. For different target tasks, transfer learning can be conducted through different approaches. In Ref. [11], transfer learning in a reinforcement learning setting was demonstrated on a physical robot to complete the learning process faster. A study conducted on a 7-DOF manipulator also showed that transfer learning decreases the training time of both stationary and non-stationary pointing tasks [12]. In Ref. [13], alignment-based transfer learning from a secondary manipulator to the target manipulator was proposed.

With increasingly available computational resources, NNs have become an important vehicle for transfer learning. Beyond robotics, NN-based transfer learning has been studied in areas such as language learning, prediction systems, and facial recognition [14–16]. Several robotics studies of NN-based transfer learning have also been conducted. For visual and non-visual tasks, transfer learning has been shown to reduce the burden of data collection by using data from a similar robot [17]. For detecting the robot body in a two-dimensional (2D) image, transfer learning has proven to be an efficient method for improving accuracy with a smaller training dataset [18]. Transfer learning based on a convolutional neural network (CNN) was attempted in Ref. [19] for grasping operations, indicating the potential of transfer learning to address the limited availability of suitably sized training datasets. To compute the positions of the end effectors’ operations on fabric, so as to change the state of an uncertain fabric manifold, CNN models using red-green-blue (RGB)-based and depth-based images from source fabrics were trained and then transferred to target fabrics for comparison [20]. In mobile robots, transfer learning has been proposed to speed up visual defect inspection [21]. To estimate the collision points between a robot and an external object, transfer learning of object isolation for a 7-DOF manipulator was investigated in Ref. [22] based on CNNs, and in Ref. [23], a dataset from simulation and a dataset from a real robot were generated for transfer learning in collision localization. These robotics applications conduct transfer learning on a target dataset by transferring the knowledge learned from the source environment, and a positive influence of transfer learning can be observed.

To the best knowledge of the authors, transfer learning has not been applied to the kinematic modeling of manipulators. In this paper, the feasibility of implementing NN-based transfer learning for the kinematic modeling of robot manipulators is examined. The investigation includes workspace analysis based on the inverse displacement formulation of a serial robot (nonlinear relations). The main purpose of investigating transfer learning is to improve the final training errors of the NNs and to increase computing efficiency. To investigate whether transfer learning can positively impact the NN training process for the IDA of manipulators, analytical and NN models are developed for a serial robot manipulator in Sec. 2. Following a comprehensive investigation of NNs with various numbers of nodes in their hidden layers, the implementation of transfer learning on datasets from different configurations and geometric layouts is presented in Sec. 3 for workspace analysis and path planning. Discussions and conclusions are reported in Sec. 4.

## 2 Modeling

In this section, the analytical formulations for kinematic modeling and the neural network model of a robot manipulator are briefly discussed. For serial robot manipulators, the inverse displacement problem can be solved using a closed-form formulation or iterative methods. To eliminate effects such as the convergence error of an iterative process, a 3-DOF revolute–revolute–prismatic jointed serial manipulator (SCARA type), with a closed-form formulation for the joint displacements, is considered. Because the end effector pose is linearly related to the prismatic joint’s displacement, only the revolute-jointed part is examined for NN modeling. In the following subsections, the inverse displacement of the manipulator is evaluated by the closed-form formulation and by a backpropagation neural network.

### 2.1 Displacement Analysis.

A 3-DOF manipulator with two revolute joints and a prismatic joint is considered as a case study to generate the input–output datasets for the NN. A configuration of the first two joints of the 3-DOF manipulator is shown in Fig. 1 and is referred to as the manipulator in the following discussion. Inverse displacement analysis (IDA) generates the motion of the joints according to the position and orientation (pose) of the end effector. Forward displacement analysis (FDA) is the determination of the pose based on the given displacement of every joint.

Here, *p*_{x} and *p*_{y} are the horizontal positions of the operation point of the end effector, *l*_{1} and *l*_{2} are the lengths of the first and second links, respectively, and *θ*_{1} and *θ*_{2} are the rotations of the first and second revolute joints.

The horizontal position of the end effector, (*p*_{x}, *p*_{y}), is calculated in terms of the revolute joints’ displacements and the lengths of the links. The orientation (*α*) of the end effector is the sum of the rotations of the two revolute joints. Because the vertical position of the end effector, *p*_{z}, is linearly related to the prismatic joint’s displacement, it is not considered in the IDA.
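The forward displacement relations can be sketched in code. The snippet below assumes the standard planar 2R forms for Eqs. (1)–(3) and uses illustrative link lengths; it is a sketch in Python, not the paper's MATLAB implementation.

```python
import numpy as np

def forward_displacement(theta1, theta2, l1=1.0, l2=0.8):
    """Pose of the planar 2R arm (assumed forms of Eqs. (1)-(3)).

    Angles are in radians; l1 and l2 are illustrative link lengths.
    """
    px = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    py = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    alpha = theta1 + theta2  # orientation is the sum of the joint rotations
    return px, py, alpha
```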

When *p*_{x} and *α* are the independent motions of the end effector, their values are known for the IDA. Then, Eqs. (1) and (3) can be combined into one equation in terms of the single unknown *θ*_{1}, which results in up to two solutions for *θ*_{1}.

When *p*_{x} and *p*_{y} are the independent motions of the end effector, Eqs. (1) and (2) are solved for the two unknowns *θ*_{1} and *θ*_{2}, which results in up to two solutions for the IDA.
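For reference, the two closed-form IDA branches of a planar 2R arm can be sketched as follows. The explicit formulas here are the standard textbook forms and are assumed (not confirmed by the excerpt) to correspond to the paper's Eqs. (5) and (6); link lengths are illustrative.

```python
import numpy as np

def inverse_displacement(px, py, l1=1.0, l2=0.8):
    """Both closed-form IDA solutions of the planar 2R arm.

    Returns a list of (theta1, theta2) pairs, one per branch
    (commonly called elbow-up and elbow-down).
    """
    c2 = (px**2 + py**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if abs(c2) > 1:
        raise ValueError("pose outside the reachable workspace")
    solutions = []
    for theta2 in (np.arccos(c2), -np.arccos(c2)):
        theta1 = np.arctan2(py, px) - np.arctan2(
            l2 * np.sin(theta2), l1 + l2 * np.cos(theta2))
        solutions.append((theta1, theta2))
    return solutions
```

A round trip through the forward relations recovers the original joint angles, which is a quick way to validate either branch.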

### 2.2 Backpropagation Neural Network.

A neural network is a computational model that mimics the human brain. A backpropagation NN is a layered network consisting of an input layer, an output layer, and at least one hidden layer that contains nonlinear processing elements. The nonlinear processing elements are referred to as neurons because they are similar to neurons in a brain. They sum the incoming signals and generate output signals according to predefined functions.

The basic unit of computation in a NN is the neural node. A node receives input from other nodes or external sources and then computes an output, which is transferred to the next hidden layer or the output layer. Each input has an associated weight, assigned according to its relative importance among the inputs of that neuron. The node applies the activation function, also called the transfer function, to the weighted sum of its inputs. A model of a NN with four hidden layers, displaying the activation flow and error flow, is depicted in Fig. 2.

The first computation of a node is to determine the net input *a*, which is the inner product of the node’s weight vector $w$ and input vector $x$ plus a bias:

$a = \sum_{i=1}^{m} w_i x_i + b$

where *x*_{i} represents the inputs to the node; *w*_{i} represents the weights applied to those inputs; *b* is the offset or bias term for the node; and *m* is the number of synapses for the node. To determine the output, a transfer function is applied to the net input. Three transfer functions are common in NNs: the hyperbolic tangent function, the sigmoid function, and the linear function. To measure the performance of trained NNs, the metric is the MSE, defined as:

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$

where *n* is the number of data in the corresponding dataset, and *Y*_{i} and $\hat{Y}_i$ are the observed and predicted values of the output data, respectively.
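A minimal sketch of the node computation and the MSE metric, assuming a hyperbolic tangent transfer function for the node:

```python
import numpy as np

def neuron_output(x, w, b):
    """Net input a = sum_i w_i * x_i + b, passed through a tanh transfer function."""
    a = np.dot(w, x) + b
    return np.tanh(a)

def mse(y_obs, y_pred):
    """Mean squared error between observed and predicted outputs."""
    y_obs, y_pred = np.asarray(y_obs), np.asarray(y_pred)
    return np.mean((y_obs - y_pred) ** 2)
```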

In order to minimize the errors given by error flows, as shown in Fig. 2, training algorithms have been developed to further adjust the weights. The training process involves a number of epochs, each containing numerous iterations. One epoch means that all samples in the training dataset are trained for a forward pass and a backward feedback through the NN. The weights and biases can be adjusted after each epoch to achieve a better accuracy for data regression until the training is terminated.

## 3 Transfer Learning Methodology

In this section, an approach to transfer learning is proposed and simulation results are presented. The transfer learning process is to transfer the weights and biases from the NN of the source dataset to the NN of the target dataset. The transferred weights and biases are used for initializing the target NN.

Two types of datasets are generated for simulations, one for workspace and one for path planning. Utilizing the workspace dataset, the transfer learning is implemented between the datasets of different inverse displacement solutions (configurations). In addition, the transfer learning is investigated for the manipulator with different geometric settings. For path datasets, the investigation of transfer learning is first conducted on the datasets of the same manipulator with different paths and then on the datasets of different manipulators but the same path.

### 3.1 Dataset Generation.

The proposed transfer learning methodology is applied to workspace investigation, with the dataset generated based on the FDA, and to path planning, with the required end effector path being a logarithmic spiral. For the transfer learning of a configuration switch on workspace datasets, the source datasets are generated from a selected robot configuration, and the workspace datasets of the counterpart configuration (IDA solution), based on the same robot geometry, are used as the target datasets. For the geometry change, the target workspace datasets are generated with different geometric settings (link lengths) than the source workspace datasets but with the same robot configuration.

For the transfer learning of different paths, both the source and target path datasets are based on the same geometric setting. For the transfer learning of the same path, a target geometric setting different from that of the source manipulator is applied to generate the target path dataset.

#### 3.1.1 Workspace Datasets.

Transfer learning of the IDA of the 2-DOF robotic arm was conducted on datasets generated from different IDA configurations of the same manipulator. Additionally, the transfer learning was applied on target manipulators with different geometric layouts. The rotation range for the two joints of each configuration is summarized in Table 1, and the geometric parameters of datasets from different configurations and geometries are reported in Table 2.

##### 3.1.1.1 Inverse displacement analysis configurations.

First, the independent motions of the end effector are chosen as the translation *p*_{x} and the rotation *α* (input), which reduces the number of unknowns to the single unknown *θ*_{1} (output), i.e., Case 1. As noted in Sec. 2.1, there are up to two solutions for *θ*_{1}. To generate different configurations of the manipulator, datasets for different ranges of the joints’ rotations are generated.

The first dataset is the workspace (mesh-generated) data where the motion ranges of the first and second revolute joints are 0–180 deg and 0–360 deg, respectively; this range of motion is referred to as the first configuration here. The second dataset is generated based on a manipulator with the same layout but different ranges of revolute joints motions. The first joint rotates within the range of 180–360 deg, while the defined motion of the second joint is full rotation, 0–360 deg, referred to here as the second configuration.

Since a full rotation is defined for the second revolute joint in both datasets, the difference between the workspaces of the two manipulators is caused by the defined displacement limit of the first revolute joint. Because the first link rotates within 0–180 deg in the first dataset, the configuration of the first dataset is called “arm-up.” Similarly, the configuration of the second dataset is denoted as “arm-down.”

##### 3.1.1.2 Geometric layouts.

Besides applying the transfer learning to datasets from different configurations, transfer learning is also implemented for the IDA of target manipulators whose geometric layouts are different from those of the source manipulator. In this case, the source and target datasets generated from the same IDA configuration are applied by transfer learning; i.e., only the robotic arm’s geometric parameters are changed during data generation.

For transfer learning between different geometric settings, three sets of link lengths are used, with increasing difference between the target and source manipulators. As noted in Table 2, compared to the source manipulator, the Target 1 manipulator has the smallest geometric difference, and Target 3 shows the largest. Target 1 can be considered a manipulator with a slight geometry change due to the repair or replacement of some components; it could also be a prototyped manipulator with slightly different geometric parameters than its simulated counterpart (the source). Targets 2 and 3 correspond to situations in which the trained NN of a robot manipulator is used for a different manipulator from the same series, or for the same manipulator when the model size is altered.

Before dataset generation, a two-degree increment is used to generate the two vectors of joints’ rotations. The elements of each vector start at the minimum of the joint’s limit and increase to the maximum; thus, the elements form an arithmetic sequence. After the two vectors are saved, a Cartesian product is applied to generate a matrix of samples, in which the first and second rows comprise *θ*_{1} and *θ*_{2}, respectively. Then, the pose of the end effector is calculated using Eqs. (1)–(3), in the form of a matrix with three rows corresponding to *p*_{x}, *p*_{y}, and *α*. When the dataset with one unknown parameter (*θ*_{1}) of the IDA is requested (Case 1), the output and input data are selected and organized from those two matrices. For the input, the first row represents *α*, the orientation of the end effector, while the second row denotes *p*_{x}, the horizontal position of the end effector. A vector containing the corresponding rotation of the first joint (*θ*_{1}) is used as the output dataset.
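The dataset-generation steps can be sketched as follows. The link lengths and the handling of the range endpoints are illustrative assumptions, so the sample count here need not match the paper's; the structure (joint mesh, Cartesian product, Case 1 input/output selection) follows the description above.

```python
import numpy as np

l1, l2 = 1.0, 0.8  # illustrative link lengths

# Joint vectors with a two-degree increment ("arm-up": theta1 in 0-180 deg,
# theta2 in 0-360 deg), endpoints included here for illustration.
theta1 = np.deg2rad(np.arange(0, 180 + 2, 2))   # 91 values
theta2 = np.deg2rad(np.arange(0, 360 + 2, 2))   # 181 values

# Cartesian product of the two joint vectors: one column per sample,
# first row theta1, second row theta2.
T1, T2 = np.meshgrid(theta1, theta2, indexing="ij")
joints = np.vstack([T1.ravel(), T2.ravel()])

# Pose of the end effector via the forward relations (assumed Eqs. (1)-(3)).
px = l1 * np.cos(joints[0]) + l2 * np.cos(joints[0] + joints[1])
py = l1 * np.sin(joints[0]) + l2 * np.sin(joints[0] + joints[1])
alpha = joints[0] + joints[1]

# Case 1: inputs are alpha and px; the output is theta1.
X = np.vstack([alpha, px])
y = joints[0]
```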

To build a more comprehensive model, a dataset with two unknown parameters in its output data (Case 2) was generated for transfer learning. Similar to Case 1, both configuration and geometric settings are examined. For the transfer learning of the configuration switch, according to Eqs. (5) and (6), when both *θ*_{1} and *θ*_{2} are considered as the output of the NNs, for each pose *p*_{x} and *p*_{y}, there are generally two configurations. For the simulations, a full rotation of the first joint is chosen, and the displacement of the second joint is divided into two ranges, 0–180 deg and 180–360 deg. These two configurations for the IDA with two unknown parameters (*θ*_{1} and *θ*_{2}) are denoted as “elbow-up” and “elbow-down,” respectively. The settings of the joints’ displacements for the named configurations are listed in Table 1. For the transfer learning of the geometric change, the datasets based on the source and target geometric settings (link lengths) in Table 2 for the same configuration (elbow-up or elbow-down) are generated for NN training.

In summary, to conduct a comprehensive study of transfer learning, the IDA of the manipulator is based on two different structures of workspace dataset. The first structure comprises two known parameters (*α* and *p*_{x}) as the input and one unknown parameter (*θ*_{1}) as the output. For this case with one unknown parameter (Case 1), data corresponding to positive *p*_{x}, i.e., within the first and fourth quadrants of the Cartesian coordinate system, are selected for the transfer learning of configuration switch and geometric change. For the second structure, the positions of the end effector (*p*_{x} and *p*_{y}) are the input and the corresponding joint displacements (*θ*_{1} and *θ*_{2}) are the output. For this case with two unknown parameters (Case 2), data with positive *p*_{y}, i.e., from the first and second quadrants, are used for the transfer learning of configuration switch and geometric change.

#### 3.1.2 Spiral Path Dataset.

In this section, transfer learning is implemented for path generation. A logarithmic spiral within the manipulator workspace is generated. In polar coordinates (*r*, *φ*), the logarithmic spiral can be written as *r* = *de*^{kφ}, with *p*_{x} = *de*^{kωt} cos *ωt* and *p*_{y} = *de*^{kωt} sin *ωt*, where *d* (*d* > 0) and *k* (*k* ≠ 0) are real constants defining the polar slope angle and curvature, respectively; *t* is the time, and *ω* is the angular velocity with which the end effector moves along the spiral. The detailed information of the two spirals (each with two loops) used in transfer learning is listed in Table 3, and the paths are shown in Fig. 3.

For the target data, a spiral with the start and end points reported in Table 3 is built. To cover a larger area of the whole workspace and increase the number of data in the dataset, both target and source paths are logarithmic spirals with two full rotations, i.e., an angular displacement of 4*π*.

To ensure the same configuration of the robotic arm at every point along the path, the configuration defined as “elbow-down” is used for calculating the revolute joints’ displacements, which indicates that the rotation of the second joint is limited from 0 deg to 180 deg. The rotation of each revolute joint can be calculated by Eqs. (5) and (6). The starting point and the end point of the two spirals are summarized in Table 3. A step size of $\omega t = 0.5$ deg is selected to discretize the path and generate a matrix for the training dataset; spiral 1 and spiral 2 each contain 1440 points (end effector positions). The path selection and geometric settings of the transfer learning are listed in Table 4.
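The spiral discretization can be sketched as below. The constants *d*, *k*, and *ω* are illustrative placeholders for the Table 3 values, which are not reproduced in this excerpt; with a 0.5-deg step over two full loops (4*π*), the path contains 1440 points, matching the count stated above.

```python
import numpy as np

d, k = 0.2, 0.05           # illustrative spiral constants (placeholders for Table 3)
step = np.deg2rad(0.5)     # step size of omega*t = 0.5 deg

# Two full rotations: omega*t spans 0 to 4*pi in 1440 steps (end point excluded).
n = int(round(2 * 360 / 0.5))
wt = np.arange(n) * step

# Logarithmic spiral: r = d * exp(k * omega * t)
px = d * np.exp(k * wt) * np.cos(wt)
py = d * np.exp(k * wt) * np.sin(wt)
```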

The transfer learning of IDA of different paths of the same manipulator and different manipulators of the same path can be valuable in industrial applications. For example, if the layout of an automotive assembly line is slightly changed, the path of the manipulators will be changed. As well, the transfer learning between datasets of different manipulators can be considered from a source manipulator to a manipulator whose components are replaced or altered due to wear and tear.

### 3.2 Neural Network Settings.

In the default settings, the weights and biases of the neural networks are randomly generated by the random number generator in MATLAB. To have the same initial conditions for the two datasets, the same matrix of random weights and biases is applied. Other training parameters are listed in Table 5.

The validation check is an effective method for deciding when to stop adjusting the weights and biases. The MSE of the output data is calculated on the validation dataset after each epoch of training. As long as training improves the fit on both the training dataset and the validation dataset, the validation performance keeps decreasing. Because the weights and biases of the NNs are updated after each training epoch, the performance on the validation and testing datasets varies during training. The validation check computes the performance on the validation dataset and terminates the training process if the number of successive epochs with increasing validation performance reaches the validation check limit. Once training stops because of the validation check, the weights and biases are returned to their values at the epoch before the validation performance (i.e., MSE) started increasing.
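The validation-check rule can be sketched as follows. The `train_epoch` and `val_mse` callables are hypothetical stand-ins for a real training loop; the sketch only illustrates the stopping logic (patience on rising validation MSE, restore-best-epoch) described above.

```python
def train_with_validation_check(train_epoch, val_mse,
                                max_fail=25000, epoch_limit=60000, goal=0.0):
    """Sketch of the validation-check stopping rule.

    train_epoch(): runs one epoch and returns the training MSE.
    val_mse():     returns the current validation MSE.
    Both are hypothetical callables standing in for a real training loop.
    """
    best_val, best_epoch, fails = float("inf"), 0, 0
    history = []
    for epoch in range(1, epoch_limit + 1):
        tr = train_epoch()
        v = val_mse()
        history.append((epoch, tr, v))
        if v < best_val:
            best_val, best_epoch, fails = v, epoch, 0  # snapshot weights here
        else:
            fails += 1  # one more epoch without validation improvement
        if tr <= goal or fails >= max_fail:
            break  # goal reached, or patience exhausted: restore best epoch
    return best_epoch, best_val, history
```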

The epoch limit defines the upper bound on training epochs. If the training has not achieved the defined performance (MSE) when the number of epochs reaches this limit, the training process is stopped. Defining the epoch limit is necessary to avoid superfluous training. For training on the workspace datasets, because of the additional unknown parameter in the dataset of Case 2, the computation time of each epoch for Case 2 is normally longer than that for Case 1. To complete the training within the computation time limit of the server, the epoch limit of Case 2 (with two unknowns) is set smaller than that of Case 1; after extensive simulations, 60,000 and 30,000 proved to be suitable limits for Case 1 and Case 2, respectively. Increasing the validation check limit reduces the chance that training stops early, making the NN more likely to reach a training MSE of smaller magnitude. As reported in Table 5, the validation check limit of Case 2 (10,000) is set lower than that of Case 1 (25,000). For the training on the spiral data, the same epoch limit as for Case 2 (30,000) is defined.

The MSE is the average squared difference between the estimated and actual values. For training on the workspace dataset, the performance goal (MSE = 0) for Case 1 means the training terminates only when the prediction errors of all training samples reach zero. Analysis of the trained NNs of Case 1 shows that the lowest final training performance of all tested NNs is on the order of 10^{−10}, so this performance goal cannot actually be reached for Case 1. For Case 2, the MSEs of *θ*_{1} and *θ*_{2} are calculated by Eqs. (6) and (5), respectively. After numerous simulations of Case 2, the performance goal is set to MSE ≤ 10^{−8}. There are 6480 samples in the dataset. For an MSE of 10^{−8}, the largest possible error of a joint’s motion would be 1.138 × 10^{−2} deg (when the error in all remaining data points is zero). If the training reaches this performance goal, the results are considered highly accurate.
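The worst-case error figure can be checked with a short calculation, under the assumption that the Case 2 MSE is averaged over both joint outputs (2 × 6480 values); with that assumption, a single nonzero error *e* satisfying *e*²/(2 × 6480) = 10^{−8} reproduces the quoted value.

```python
import math

n_samples = 6480
n_values = 2 * n_samples   # assumption: MSE averages over both joint outputs
mse_goal = 1e-8

# If every value but one has zero error, the single error e satisfies
# e**2 / n_values = mse_goal, so:
e_max = math.sqrt(mse_goal * n_values)   # about 1.138e-2 deg
```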

The entire dataset is divided into three subsets: 80% training, 10% validation, and 10% testing. An identical random number generator is utilized for the data division. The total number of data for “arm-up” and “arm-down” is the same; therefore, the three matrices that determine the categorization of the three subsets contain identical elements when the training is repeated.
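A reproducible 80/10/10 split can be sketched as follows, with a fixed seed playing the role of the identical random number generator; this is a generic sketch, not the MATLAB division routine used in the paper.

```python
import numpy as np

def split_indices(n, seed=0, frac=(0.8, 0.1, 0.1)):
    """Reproducible train/validation/test split: the same seed yields
    identical subsets every time the training is repeated."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(frac[0] * n)
    n_val = int(frac[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```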

For the training algorithm, the Levenberg–Marquardt (LM) algorithm, which is also called the damped least-squares method, is selected for solving the minimization problem for MSE.
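For context, the LM algorithm blends the Gauss–Newton method with gradient descent through a damping factor; a common form of its weight update (notation assumed, not taken from the paper) is:

```latex
% Levenberg-Marquardt (damped least-squares) weight update:
% J is the Jacobian of the network errors e with respect to the weights w,
% and mu is the damping factor adapted during training
% (large mu ~ gradient descent, small mu ~ Gauss-Newton).
\Delta w = -\left(J^{\mathsf{T}} J + \mu I\right)^{-1} J^{\mathsf{T}} e
```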

After some trial and error in simulations on the workspace datasets, NNs with two hidden layers were selected for the case with one unknown parameter (Case 1), and NNs with four hidden layers (Case 2) were developed for the data with two unknown parameters. For each case and each NN, the number of nodes is the same in every hidden layer. For transfer learning on the spiral path dataset, the training is conducted with NNs with three hidden layers containing equal numbers of nodes. The transfer function of the nodes is the hyperbolic tangent function in each hidden layer and the linear function in the output layer of each NN.

### 3.3 Transfer Learning Implementation.

The investigation of transfer learning covers the workspace (mesh-generated) datasets and path planning. For the workspace datasets, the transfer process is first implemented between the datasets of different configurations and then between the datasets of manipulators with different geometric settings. To gain a better understanding of how transfer learning impacts the IDA for path planning, transfer learning is also tested on the dataset of a spiral path: first by transferring the weights and biases between two different spirals of the same manipulator, and then by simulating the transfer learning with datasets of the same path for two different manipulators.

#### 3.3.1 Implementation of Workspace Dataset.

For data with one unknown parameter (Case 1), the training process starts with the same initial weights and biases for both the “arm-up” and “arm-down” configurations (the two solutions of the IDA). After the training is completed, the trained weights and biases are saved in the workspace and then extracted. The extracted weights and biases from the neural network of one configuration are applied as the initial weights and biases of the neural network of its counterpart. For example, once the training for “arm-up” is completed, the weights and biases of its NN are extracted and used as the initial weights and biases for the “arm-down” configuration, and vice versa. Transferring the weights and biases from one robot model to a related dataset is how transfer learning is applied here. The training of the NNs initialized with random weights and biases for one configuration is then compared with that of the NNs for the same configuration initialized with the trained weights and biases from the other configuration.
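Mechanically, the transfer step amounts to copying the trained source parameters as the target network's initialization. The sketch below uses a plain list of (weights, biases) pairs as a hypothetical parameter structure; the paper's MATLAB networks store parameters differently, and the training loop itself is omitted.

```python
import copy
import numpy as np

def init_params(layer_sizes, rng):
    """Random initialization: one (W, b) pair per layer transition."""
    return [(rng.standard_normal((m, n)), rng.standard_normal(m))
            for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def transfer_params(source_params):
    """Use the trained source weights and biases to initialize the target NN.

    Requires identical layer structure in the source and target networks;
    a deep copy keeps later target training from mutating the source.
    """
    return copy.deepcopy(source_params)

rng = np.random.default_rng(0)
source = init_params([2, 30, 30, 1], rng)  # e.g., Case 1: two inputs, one output
# ... train `source` on the "arm-up" dataset (training loop omitted) ...
target_init = transfer_params(source)      # initialization for "arm-down"
```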

For both workspace datasets (Case 1 and Case 2), transfer learning is implemented for the IDA of manipulators with various geometric parameters and different configurations. The weights and biases are extracted after NNs with random initialization are trained on the dataset of the source manipulator, and the extracted weights and biases are used as the initialization of the NNs for the three target manipulators. NNs initialized with random weights and biases for the three target manipulators are also trained with the same setup for comparison with those using the proposed transfer learning.

As mentioned earlier, for Case 1, an NN with two hidden layers is defined for simulation. The number of nodes in the second hidden layer remains the same as that of the first hidden layer. Testing various numbers of nodes indicated that NNs having more than 20 nodes in each hidden layer can generate a relatively high accuracy. To conduct a more thorough investigation, the number of nodes in each hidden layer is increased from 20 to 40, which means 21 different NNs are trained.

Similarly, for Case 2, the IDA with two unknown parameters, transfer learning is applied using the same process for geometry change and configuration switch. Neural networks with two hidden layers cannot provide results with small errors. After extensive simulations, it is shown that NNs with four hidden layers can be suitable for the dataset with two unknown parameters. The same number of nodes is used for each hidden layer. The number of nodes of each hidden layer is varied from 22 to 40 (even numbers only due to increased simulation time).

During the simulation, the training is stopped when the value of the performance function (MSE) reaches the performance target. To examine the initial training speed and the final MSE after training, different MSE values are selected for comparing the initial training performance, such as epochs and time. In this case, the training is terminated immediately when any one of the three parameters, the epoch limit, the validation check, or the performance goal, reaches the corresponding value defined in the NN settings. A decrease in training time and epochs indicates less computation, i.e., more efficient training.

If the epochs of the training with predefined (transferred) weights and biases (for initialization) are lower than those from the training with random initial weights and biases, the proposed approach can improve the convergence of training. The flowchart of the transfer learning process, for both workspace and path generation, is shown in Fig. 4.

The performance goal is set as MSE = 0 for Case 1 and MSE ≤ 10^{−8} for Case 2. Ideally, the training MSE keeps decreasing and reaches that goal without being terminated by the validation check. To develop a deeper understanding of the training process of transfer learning, the MSE and epochs are additionally recorded during training when the MSE reaches 10^{−2} and 10^{−5}, and the epoch counts at these two targets are compared. For example, if the number of epochs at which the training MSE reaches 10^{−5} with transferred initialization is smaller than that with random initialization, the transfer learning is shown to accelerate the training until the training MSE reaches 10^{−5}. Finally, the recorded information of the training process, such as the final training performance, validation performance, testing performance, and epochs, is analyzed.

For the case with one unknown parameter (Case 1), the epochs for the configuration switch are shown in Fig. 5. It is evident that all of the tested NNs for both the “arm-up” and “arm-down” datasets reach MSE ≤ 10^{−2} in fewer epochs. The transfer learning also speeds up the training for most NNs when the MSE target is tightened to 10^{−5}.

Epochs for geometry changes are also compared. Results for the three target manipulators of Table 2 are shown in Figs. 6–8. The datasets from the same configurations are utilized for training purposes. As shown in Figs. 6–8, when the MSE reaches a relatively large magnitude (10^{−2}), all NNs for Target 1 and Target 2 show strong improvement from transfer learning. For Target 3, 19 out of 20 NNs also show accelerated training. When the training MSE reaches 10^{−5}, most NNs for the target manipulators require fewer epochs as well. It is worth mentioning that the training MSE of all NNs for Target 1 is below 10^{−5} after the first epoch (with the epoch limit set to 60,000), which can be considered a significant improvement.

#### 3.3.2 Implementation of Path Dataset.

For the simulations of different paths, similar to the workspace examination, the dataset of the source manipulator is trained with random initialization. Then, the weights and biases of the trained NNs are extracted and transferred to initialize the NNs for the target dataset. After training with pretrained initialization, random initialization is applied to the target dataset to provide a baseline for comparison. For the path datasets, NNs containing three hidden layers with nodes distributed equally among the hidden layers are selected. For a thorough investigation, the number of nodes in each layer increases from 10 to 40, so a total of 31 NNs are tested. As in the workspace investigation, the two performance targets (MSE of 10^{−2} and 10^{−5}) are used for the analysis of the path datasets, i.e., the same path with two different manipulators and the same manipulator with two different paths, in Figs. 13 and 14, respectively.

As shown in Figs. 13 and 14, the numbers of epochs with transfer learning, for reaching the two MSE values, are lower than those with random initialization. It is worth mentioning that most NN configurations for the dataset of different paths (same manipulator) achieve a training MSE ≤ 10^{−5} at the first epoch. This indicates that pretrained initialization already yields a low MSE for the two spirals of the same manipulator, essentially without additional training.

## 4 Discussion

In this section, the training results of the manipulator workspace and path datasets are discussed.

### 4.1 Results of Workspace Dataset.

For Case 1 (with one unknown), according to the data shown in Figs. 5–8, it is clear that NNs with the same configuration but different initializations reach the MSE levels of 10^{−2} and 10^{−5} at different numbers of epochs. For some NN configurations, only one of the two NNs (with different initialization approaches) reaches these two MSE values. Only the NN configurations whose MSE decreases to 10^{−2} and 10^{−5} with both random and pretrained initialization are considered in the following comparison. The percentage of NNs requiring fewer epochs can be used as an index of the positive impact of the proposed transfer learning. The percentage of improved NNs, over all tested NNs, as a result of transfer learning, is listed in Table 6.
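The improvement index described above, the share of NN configurations that reach a target in fewer epochs with pretrained initialization, counting only configurations where both runs reached it, can be sketched as follows; the function name and sample data are hypothetical.

```python
def improvement_ratio(epochs_random, epochs_pretrained):
    """Share of NN configurations that reach the MSE target in fewer epochs
    with pretrained initialization. Only configurations where both runs
    reached the target (an epoch recorded for both, not None) are counted."""
    pairs = [(r, p) for r, p in zip(epochs_random, epochs_pretrained)
             if r is not None and p is not None]
    improved = sum(1 for r, p in pairs if p < r)
    return improved / len(pairs)

# Hypothetical epoch counts for five NN configurations; None = target not reached
random_ep = [120, 90, None, 200, 150]
pre_ep    = [40,  95, 70,   60,  30]
print(improvement_ratio(random_ep, pre_ep))   # 0.75 (3 of 4 comparable NNs improved)
```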

As reported in Table 6, when the training performance target (MSE here) is relatively large, such as 10^{−2}, most tested NNs require fewer epochs with transferred weights and biases, except for 5 NNs for Target 3 with the arm-up configuration. When the training MSE target is refined to 10^{−5}, more than 85% of the NNs in each simulation, except for the “arm-up” configuration of Target 3, show accelerated computation. This indicates that transferred initialization yields a smaller MSE during the initial epochs of the training process and thus has a significantly positive effect on accuracy.

The comparison based on the number of epochs is also performed on the NNs of Case 2; the detailed information is reported in Table 7. During the extensive simulations of Case 2 (with two unknowns), it is observed that overfitting may occur randomly on any NN during training, which generates large errors in the predicted results for some tested NNs. When the regression equations are more complicated, after extensive simulations, fewer NN configurations are feasible for solving the IDA. Some tested NNs using different initialization approaches for the same scenario (configuration and geometry) cannot reach the two defined MSE values. As with the statistics of Case 1, the percentage of improved NNs remains higher than 80% in every circumstance of Case 2.

In summary, the ratios of improved NNs at MSE ≤ 10^{−2} are greater than or equal to those at MSE ≤ 10^{−5}, which suggests that transferred weights and biases reduce the computation required in the early stage of training. According to Table 2, the geometric settings of Target 1 and Target 2 represent smaller geometric differences from the source manipulator, while Target 3 shows the largest geometric difference. For both Case 1 and Case 2, the percentages of improved NNs for Targets 1 and 2 remain 100% from MSE ≤ 10^{−2} to MSE ≤ 10^{−5}, whereas the percentage for Target 3 is reduced.

The detailed information about the training process of the tested NNs is extracted from the recorded training data. Due to the defined performance goals and the validation check, the training MSE is likely to reach a value larger than the performance goal when the training process is terminated. To investigate the final training MSE of the tested NNs, an MSE threshold of 10^{−5} is defined for performance comparison. The final MSE value (4.65 × 10^{−6}) of a pretrained NN for Case 2 with 32 nodes in each hidden layer is close to 10^{−5}. The error histogram of this NN indicates that 99.76% of the data samples fall into the bin with an error of 3.201 × 10^{−3} deg. For Case 2, according to Eq. (8), the MSE threshold (10^{−5}) can be considered acceptable accuracy, with a largest possible error of 0.255 deg (with zero joint error in the remaining 6479 data points). If the final training MSE is smaller than the threshold, the NN can be regarded as an effective configuration for the IDA.
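The quoted worst-case error can be checked with a quick calculation: assuming the MSE is a mean of squared joint errors over N = 6480 samples (6479 of which have zero error, per the text), concentrating the full 10^{−5} budget in a single sample gives an error of sqrt(10^{−5} × 6480).

```python
import math

# Back-of-envelope check of the worst-case error quoted in the text:
# MSE = sum(e_i^2) / N, so if all error sits in one sample, e = sqrt(MSE * N).
mse_threshold = 1e-5
n_samples = 6480          # 6479 zero-error samples plus the one outlier
worst_error_deg = math.sqrt(mse_threshold * n_samples)
print(round(worst_error_deg, 3))   # 0.255
```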

The final MSE values of the randomly initialized NNs and the pretrained NNs are compared against the MSE threshold (10^{−5}). The comparative studies of the final MSE on the workspace and path datasets indicate that transfer learning has the potential to improve the final training MSE (for a detailed discussion, refer to Ref. [24]).

### 4.2 Results of Path Dataset.

Considering Figs. 13 and 14, compared with NNs initialized with random weights and biases, the numbers of improved NNs whose MSE reaches 10^{−2} and 10^{−5} in fewer epochs with pretrained initialization are listed in Table 8.

It can be concluded that the training process utilizing pretrained weights and biases is more likely to achieve a low training MSE. If an NN initialized with random weights and biases cannot reach a satisfying regression precision, pretrained initialization increases the possibility of achieving better training performance with the same NN configuration.

## 5 Conclusion

In this paper, the neural network-based transfer learning for modeling robot manipulators was implemented. A transfer learning approach was proposed for the workspace generation and path planning of robot manipulators. The epochs and the performance (MSE) of two datasets (training and validation) were investigated.

After data generation and comparative analysis, the following observations on transfer learning were noted. First, in terms of epochs, it was concluded that transfer learning has a positive impact at the early stage of training. For the workspace datasets, the improvement of NNs for the “arm-down” and “elbow-up” configurations utilizing transfer learning was more apparent, with more of the examined NNs requiring fewer epochs than their counterparts. This also indicated that the effect of the proposed transfer learning on the MSE differed between datasets. The comparison of transfer learning approaches showed that the epochs of data regression for different configurations could not be decreased equally by transfer learning. For the path datasets, most tested NNs utilizing transfer learning reached a satisfying MSE (10^{−5}) in fewer epochs than with random initialization, which indicates that transfer learning can speed up the training process toward a specific MSE. The reduction in epochs is most noticeable on the datasets of the same path with different manipulators.

In terms of the final training information, for the workspace datasets, it is noteworthy that the training MSE of the target manipulators started decreasing from a relatively small magnitude. Because of the transferred parameters, an improvement of the training MSE at the first epoch could be achieved. The positive impact on the IDA of the target manipulators was higher when the geometric difference between the source and the target manipulators was smaller. At the larger MSE target (10^{−2}), all simulated NNs for the two configurations reached the target on the training dataset, while the MSE of the corresponding validation datasets from transfer learning was similar to that of the randomly initialized NNs. Comparison of the final training MSE indicated that the proposed transfer learning method can increase the possibility of obtaining suitable NNs. For the path datasets, the final MSE decreased to a smaller magnitude with transferred initialization. These results indicate that the pretrained weights and biases can be used as an initial guess to improve the regression precision.

It can be concluded that, for the NN-based robot kinematic modeling, transfer learning is a powerful method to speed up the training process and increase the computational efficiency when the source and the target manipulators are close. This effect can be beneficial for NN modeling in industrial applications and robotics research, e.g., transferring the NN model of a simulated robot manipulator to a prototyped one or updating the NN model of a robot manipulator after part replacement. By transfer learning, a proper NN can also be identified with less experimentation, thus improving the efficiency of NN-based modeling.

## Acknowledgment

This research was enabled in part by support provided by Compute Canada and the Centre for Advanced Computing at Queen’s University.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request. The authors attest that all data for this study are included in the paper. Data provided by a third party are listed in the Acknowledgment.