This paper presents a framework to build hybrid cells that support safe and efficient human–robot collaboration during assembly operations. Our approach allows asynchronous collaborations between human and robot. The human retrieves parts from a bin and places them in the robot's workspace, while the robot picks up the placed parts and assembles them into the product. We present the design details of the overall framework comprising three modules—plan generation, system state monitoring, and contingency handling. We describe system state monitoring and present a characterization of the part tracking algorithm. We report results from human–robot collaboration experiments using a KUKA robot and a three-dimensional (3D)-printed mockup of a simplified jet-engine assembly to illustrate our approach.

## Introduction

Factories of the future will be expected to produce increasingly complex products, demonstrate flexibility by rapidly accommodating changes in products or volumes, and remain cost competitive by controlling capital and operational costs. Networked machines with built-in intelligence will become the backbone of these factories. Realizing complete automation that meets all three of these requirements does not appear to be feasible in the foreseeable future. Achieving flexibility at low cost therefore means that humans will continue to play a vital role in the operation of the factories of the future. Their role will change from doing routine tasks to performing challenging tasks that are difficult to automate.

Humans and robots have complementary strengths in performing assembly tasks. Humans offer versatility, dexterity, in-process inspection, contingency handling, and error recovery. However, they have limitations in terms of consistency, labor cost, payload size/weight, and operational speed. In contrast, robots can perform tasks at high speeds while maintaining precision and repeatability, operate for long periods of time, and handle high payloads. However, current robots have the limitations of high capital cost, long programming times, and limited dexterity.

Owing to the reasons discussed above, small batch and custom production operations predominantly use manual assembly. The National Association of Manufacturers estimates that the U.S. has close to 300,000 small and medium manufacturers (SMM), representing a very important segment of the manufacturing sector. As we move toward shorter product life cycles and customized products, the future of manufacturing in the U.S. will depend upon the ability of SMM to remain cost competitive. High labor costs make it difficult for SMM to remain cost competitive in high-wage markets, so they need to find a way to reduce labor cost. Setting up purely robotic cells is not an option for them because such cells do not provide the necessary flexibility. Creating hybrid cells where humans and robots can collaborate in close physical proximity is a potential solution. However, current-generation industrial robots pose safety risks to humans, so physical separation has to be maintained between humans and robots. This is typically accomplished by installing the robot in a cage. For the robot to be operational, the cage door has to be locked and an elaborate safety protocol has to be followed to ensure that no human operator is present in the cage. This makes it very difficult to design assembly cells where humans and robots can collaborate effectively.

In this paper, we design and develop a framework for hybrid cells that support safe and efficient human–robot collaboration during assembly operations. Our prior work on this topic focused on the problem of ensuring safety during human–robot collaborations inside a hybrid cell by developing a human-monitoring system and precollision robot control strategies [1]. The specific contributions of this work include:

1. Details on the interaction between different system components of the human–robot collaboration framework.
2. New part-tracking system that augments the state-monitoring capability of the hybrid cell significantly. The part-tracking system enables efficient monitoring of the assembly operations by detecting whether the correct part is being picked by the human and whether it is placed at the correct location/orientation in front of the robot.
3. New experimental results consisting of a collaboration between a human and a KUKA robot to assemble a three-dimensional (3D)-printed mockup of a simplified jet-engine. These experiments also demonstrate how the part-tracking system, combined with the human-instruction module, enables replanning of assembly operations on-the-fly.

Preliminary works related to this paper were presented in Refs. [2] and [3]. Several works in the human–robot collaboration literature have compared different modes of collaboration [4–7]. Since this paper is mainly focused on the part estimation system, we present quantitative results on this topic. More system-level comparative results are outside the scope of this paper. Recent advances in safer industrial robots [8–10] and exteroceptive safety systems [1,11] create a potential for hybrid cells where humans and robots can work side-by-side, without being separated from each other by physical cages. However, realizing this goal is challenging. Humans might accidentally come in the way of the robot. Therefore, the robot must be able to execute appropriate collision avoidance strategies. Humans are also prone to making errors and performing operations differently. Therefore, the robot must be able to replan in response to unpredictable human behavior and modify its motion accordingly. The robot must be able to communicate the error to the human as well.

We consider a one-robot one-human model that exploits the complementary strengths of both agents. The human identifies a part from a bin of multiple parts, picks it, and places it in front of the robot. The part is then picked up and assembled by the robot. The human also assists the robot in critical situations by performing dexterous fine manipulation tasks required during part-placing. A state monitoring system maintains knowledge about the progress of the assembly tasks and provides additional information to the human operator if needed. After placing the part in front of the robot, the human can proceed with executing the next task instruction, rather than waiting until the robot finishes its intended task. The robot also replans and adaptively responds to different human actions (e.g., the robot pauses if the human accidentally comes very close to it, waits if the human places an incorrect part in front of it, etc.). All these features result in asynchronous collaboration between the robot and the human. An overview of the hybrid cell is shown in Fig. 1.

## Related Work

### Support Human Operations in the Assembly Cell.

Recent advances in information visualization and human–computer interaction have given rise to different approaches to automated generation of instructions that aid humans in assembly, maintenance, and repair. Heiser et al. [12] derived principles for generating assembly instructions based on insights into how humans perceive the assembly process. They compare the instructions generated by their system with factory-provided and hand-designed instructions to show that instruction generation informed by cognitive design principles reduces assembly time significantly. Dalal et al. [13] developed a knowledge-based system that generates temporal multimedia presentations. The content included speech, text, and graphics. Zimmerman et al. [14] developed web-based delivery of instructions for inherently 3D construction tasks. They tested the instructions generated by their approach by using them to build paper-based origami models. Kim et al. [15] used recent advances in information visualization to evaluate the effectiveness of visualization techniques for schematic diagrams in maintenance tasks.

Several research efforts have indicated that instruction presentation systems can benefit from augmented reality techniques. Kalkofen et al. [16] integrated exploded view diagrams into augmented reality. The authors developed algorithms to compose visualization images from exploded/nonexploded real world data and virtual objects. Henderson and Feiner [17] developed an augmented reality system for a mechanic performing maintenance and repair tasks in a field setting. The authors carried out a qualitative survey to show that the system enabled easier task handling. Dionne et al. [18] developed a model of automatic instruction delivery to guide humans in virtual 3D environments. Brough et al. [19] developed Virtual Training Studio, a virtual environment-based system that allows (i) training supervisors to create instructions and (ii) trainees to learn assembly operations in a virtual environment. A survey of virtual environments-based assembly training can be found in Ref. [20].

### Assembly Part Recognition.

The increasing availability of 3D sensors such as laser scanners, time-of-flight cameras, stereo cameras, and depth cameras has stimulated research in the intelligent processing of 3D data. Object detection and pose estimation is a vast area of research in computer vision. In the past decade, researchers focused on designing robust and discriminative 3D features to find reliable correspondences between 3D point sets [21–24]. Very few approaches are available for object detection based on feature correspondences when scenes are characterized by clutter and occlusions [25–27]. In addition, these methods cannot deal with the presence of multiple instances of a given model, which is also the case with bag-of-3D features methods [28–31] (refer to Ref. [32] for a survey on this topic). Feature-free approaches have also been developed based on the information available from depth cameras. The use of depth cameras became popular after the introduction of the low-cost Kinect technology. The Kinect camera provides good-quality depth sensing by using a structured light technique [33] to generate 3D point clouds in real time. Approaches based on local shape descriptors are expected to perform better [25,26] in environments with many objects that have different shapes. However, these approaches do not work in the presence of symmetries and objects with similar shapes.

## System Overview

The hybrid cell will operate in the following manner:

1. The cell planner will generate a plan that will provide instructions for the human and the robot in the cell.
2. Instructions for the human operator will be displayed on a screen in the assembly cell.
3. The human will be responsible for retrieving parts from bins and bringing them within the robot's workspace.
4. The robot will pick up parts from its workspace and assemble them into the product.
5. If needed, the human will perform the dexterous fine manipulation to secure the part in place in the product.
6. The human and robot operations will be asynchronous.
7. The cell will be able to track the human, the locations of parts, and the robot at all times.
8. If the human operator makes a mistake in executing an assembly instruction, replanning will be performed to recover from that mistake. Appropriate warnings and error messages will be displayed in the cell.
9. If the human comes close enough to the robot to cause a collision, the robot will perform a collision avoidance strategy.

The overall framework used to achieve the above list of hybrid cell operations consists of the following three modules:

Plan generation. We should be able to automatically generate plans in order to ensure efficient cell operation. This requires generating feasible assembly sequences and instructions for the robot and the human operator. Automated planning poses the following two challenges. First, generating precedence constraints for complex assemblies is challenging. The complexity can arise from the combinatorial explosion caused by the size of the assembly or from the complex paths needed to perform the assembly. Second, generating feasible plans requires accounting for robot and human motion constraints. In Sec. 4, we present methods for automatically generating plans for the operation of hybrid cells.

System state monitoring. We need to monitor the state of the assembly operations in the cell to ensure error-free operations. We present methods for real-time tracking of the parts, the human operator, and the robot in Sec. 5.

Contingency handling. Contingency handling involves collision avoidance between robot and human, replanning, and warning generation. In Sec. 6.1, we describe how the state information is used to take appropriate measures to ensure human safety when the planned move by the robot may compromise safety. If the human makes an error in part selection or placement, and the error goes undetected, it can lead to a defective product and inefficient cell operation. Human error can occur due to either confusion caused by poor instructions or the human not paying adequate attention. In Sec. 6.2, we describe how the part tracking information is used to automatically generate instructions for taking corrective actions if a human operator deviates from the selected plan. Corrective actions involve either replanning, if it is possible to continue assembly from the current state, or issuing warning instructions to undo the task.

## Plan Generation

### Assembly Sequence Generation.

Careful planning is required to assemble complex products [34–36]. Precedence constraints among assembly operations must be used to guide feasible assembly sequence generation. We utilize a method developed in our earlier works [37,38] that automatically detects part interaction clusters that reveal the hierarchical structure in a product. This allows the assembly sequencing problem to be applied to part sets at multiple levels of hierarchy. A 3D CAD model of the product, with the individual parts in their assembled configuration, is used as an input to the algorithm. Our approach described in Ref. [38] combines motion planning and part interaction clusters to generate assembly precedence constraints. We assume that the largest part $\mathrm{Part}_L$ of the assembly guides the assembly process. Therefore, this part is extracted from the CAD model and kept aside. Next, spatial $k$-means clustering is used to group the remaining parts into $k$ part sets. Accordingly, the assembly comprises $k + 1$ part sets $(\mathrm{Part}_L, \mathrm{PartSet}_1, \mathrm{PartSet}_2, \ldots, \mathrm{PartSet}_k)$ in the first step. Now, the assemblability of this new assembly is verified. This is achieved by using motion planning to find the part sets that can be removed from the assembly. These part sets are removed from the assembly and added to a new disassembly layer. Again, we find the part sets that can be removed from the simplified assembly. These part sets are removed from the assembly and added to the second disassembly layer. If this process halts before all part sets are removed, the method goes back to the first step, where the number of clusters is incremented by one. This results in a different grouping of $k + 1$ new clusters. This cycle is repeated until all disassembly layers are identified. Next, the above process is recursively applied to find disassembly layers for each part set identified in the previous step.

The information extracted in this way is used to generate a list of assembly precedence constraints among part sets, which can be used to generate feasible assembly sequences for each part set and the whole assembly. More details on the principal techniques (motion planning, generation of disassembly layers, and spatial partitioning-based part interaction cluster extraction), the corresponding algorithms used to implement the above approach, and test results on a wide variety of assemblies can be found in Ref. [38].
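
The layer-peeling loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from Ref. [38]: the motion-planning removability test is replaced by a hypothetical `blocked_by` lookup table, and the names `disassembly_layers` and `assembly_sequence`, as well as the toy blocking relations, are assumptions made for the example.

```python
from typing import Dict, List, Set


def disassembly_layers(blocked_by: Dict[str, Set[str]]) -> List[List[str]]:
    """Peel removable part sets off the assembly, one layer at a time.

    blocked_by[p] is a hypothetical stand-in for the motion-planning test:
    the part sets that must be removed before p can be extracted.
    """
    remaining = set(blocked_by)
    layers: List[List[str]] = []
    while remaining:
        # A part set is removable if nothing still in the assembly blocks it.
        layer = sorted(p for p in remaining if not (blocked_by[p] & remaining))
        if not layer:
            # The process halted before all part sets were removed.
            raise ValueError("no removable part set; regroup with more clusters")
        layers.append(layer)
        remaining -= set(layer)
    return layers


def assembly_sequence(layers: List[List[str]]) -> List[str]:
    """Reverse the disassembly layers to obtain one feasible assembly order."""
    return [p for layer in reversed(layers) for p in layer]


# Toy example with made-up blocking relations (not the jet engine data).
blocked_by = {
    "PartL": {"PartSet1", "PartSet2", "PartSet3"},
    "PartSet1": {"PartSet2"},
    "PartSet2": {"PartSet3"},
    "PartSet3": set(),
}
layers = disassembly_layers(blocked_by)
sequence = assembly_sequence(layers)
```

For the toy data, the part sets come off in the order PartSet3, PartSet2, PartSet1, PartL, and reversing the layers gives the assembly order PartL, PartSet1, PartSet2, PartSet3. Where the sketch raises an error, the method described above would instead increment the number of clusters and regroup.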

The assembly model used to illustrate the concepts developed in this paper is a jet engine assembly as shown in Figs. 2(a) and 2(b). The result of applying the above-mentioned method on this assembly model is a feasible assembly sequence as shown in Fig. 2(c).

### Instruction Generation.

The human worker inside the hybrid cell follows a list of instructions to perform assembly operations. However, poor instructions lead to the human committing mistakes related to the assembly. We address this issue by utilizing an instruction generation system developed in our previous work [39] that creates effective and easy-to-follow assembly instructions for humans. A linearly ordered assembly sequence (result of Sec. 4.1) is given as input to the system. The output is a set of multimodal instructions (text, graphical annotations, and 3D animations) that are displayed on a screen. Text instructions are composed using simple verbs such as Pick, Place, Position, Attach, etc. As mentioned in Sec. 4.1, we compute a feasible assembly sequence directly from the given 3D CAD model of the jet engine assembly. Therefore, the following assembly sequence is input to the instruction generation system:

1. Pick up FRONT SHROUD SAFETY
2. Place FRONT SHROUD SAFETY on ASSEMBLY TABLE
3. Pick up MAIN FAN
4. Place MAIN FAN on ASSEMBLY TABLE
5. Pick up SHROUD
6. Place SHROUD on ASSEMBLY TABLE
7. Pick up FRONT SHAFT
8. Place FRONT SHAFT on ASSEMBLY TABLE
9. Pick up FIRST COMPRESSOR
10. Place FIRST COMPRESSOR on ASSEMBLY TABLE
11. Pick up SECOND COMPRESSOR
12. Place SECOND COMPRESSOR on ASSEMBLY TABLE
13. Pick up REAR SHAFT
14. Place REAR SHAFT on ASSEMBLY TABLE
15. Pick up SHELL
16. Place SHELL on ASSEMBLY TABLE
17. Pick up REAR BEARING
18. Place REAR BEARING on ASSEMBLY TABLE
19. Pick up EXHAUST TURBINE
20. Place EXHAUST TURBINE on ASSEMBLY TABLE
21. Pick up COVER
22. Place COVER on ASSEMBLY TABLE
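
The 22 steps above follow a regular pattern: each part in the sequence contributes one pick step and one place step. A minimal Python sketch of that expansion is given here. The function name `pick_place_instructions` and its parameters are hypothetical, and the sketch covers only the text modality, not the graphical annotations or 3D animations produced by the system of Ref. [39].

```python
from typing import List


def pick_place_instructions(parts: List[str],
                            location: str = "ASSEMBLY TABLE") -> List[str]:
    """Expand an ordered part list into alternating pick and place steps."""
    steps: List[str] = []
    for part in parts:
        name = part.upper()
        steps.append(f"Pick up {name}")
        steps.append(f"Place {name} on {location}")
    return steps


# Part order taken from the assembly sequence listed in the text.
PARTS = [
    "front shroud safety", "main fan", "shroud", "front shaft",
    "first compressor", "second compressor", "rear shaft", "shell",
    "rear bearing", "exhaust turbine", "cover",
]
instructions = pick_place_instructions(PARTS)
```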

Figure 3 shows the instructions used by the system for some of the assembly steps. Humans may get confused about which part to pick when two parts look similar to each other. To address this problem, we utilize a part identification tool developed in Ref. [39] that automatically detects such similarities and presents the parts in a manner that enables the human worker to select the correct part. For this purpose, a similarity metric between two parts was constructed based on attributes like part volume, surface area, types of surfaces, and curvature [40,41].

## System State Monitoring

Monitoring the system state inside the hybrid cell involves tracking of the states of the robot, the human, and the part currently being manipulated by the human. We assume that the robot will be able to execute motion commands given to it, so that the assembly cell will know the state of the robot.

A human tracking system was developed in our previous works [1,11] by using multiple Microsoft Kinect sensors. The system is capable of building an explicit model of the human in near real time. Human activity is captured by the Kinect sensors that reproduce the human's location and movements virtually in the form of a simplified animated skeleton. Occlusion problems are resolved by using multiple Kinects. The output of each Kinect is a 20-joint human model. Data from all the Kinects are combined in a filtering scheme to obtain the human motion estimates. A systematic experimental analysis of factors like shape of the workspace, number of sensors, placement of sensors, and presence of dead zones was carried out in Ref. [1].

The assembly cell state monitoring uses a discrete state-to-state part monitoring system that was designed to be robust and to reduce possible robot motion errors. A failure to correctly recognize the part and estimate its pose can lead to significant errors in the system. To ensure that such errors do not occur, the monitoring system consists of two control points: the first control point detects the part selected by the human, and the second control point detects the part's spatial transformation when it is placed in the robot's workspace. Detecting the selected part at the first control point lets the system track the changes introduced by the human in real time and trigger assembly replanning and robot motion replanning based on the new sequence. Detecting the pose of the assembly part relative to the robot at the second control point sends feedback to the robot in the form of a “pick and place” or “wait” flag.
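
The decision logic at the two control points can be summarized in code. This is a sketch under stated assumptions: the planar pose representation, the tolerance values, the "continue" and "replan" labels, and the names `Pose`, `control_point_one`, and `control_point_two` are all hypothetical. Only the “pick and place” and “wait” flags come from the description above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical planar pose of a part on the table (meters, degrees)."""
    x: float
    y: float
    yaw_deg: float


def control_point_one(detected_part: str, expected_part: str) -> str:
    """First control point: was the part the plan expects selected?"""
    return "continue" if detected_part == expected_part else "replan"


def control_point_two(measured: Pose, nominal: Pose,
                      pos_tol: float = 0.01,
                      yaw_tol_deg: float = 5.0) -> str:
    """Second control point: is the placed part where the robot expects it?

    The tolerances are illustrative values, not taken from the paper.
    """
    offset = math.hypot(measured.x - nominal.x, measured.y - nominal.y)
    # Wrap the heading difference into [-180, 180) degrees.
    yaw_err = (measured.yaw_deg - nominal.yaw_deg + 180.0) % 360.0 - 180.0
    within = offset <= pos_tol and abs(yaw_err) <= yaw_tol_deg
    return "pick and place" if within else "wait"


flag_ok = control_point_two(Pose(0.0, 0.0, 359.0), Pose(0.0, 0.0, 0.0))
flag_off = control_point_two(Pose(0.05, 0.0, 0.0), Pose(0.0, 0.0, 0.0))
```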

The part monitoring system is based on a 3D mesh matching algorithm, which uses real-time 3D part registration and 3D mesh interactive refinement [42]. To register the assembly part in 3D, multiple acquisitions of the surface are necessary because a single acquisition is not sufficient to describe the object. These views are obtained by the Kinect sensors and represented as dense point clouds. The point clouds are refined in real time by dense projective data association and point-plane iterative closest point, both embedded in KinectFusion [43–46]. KinectFusion is used to acquire refined point clouds from both control points and for every assembly part. To perform 3D mesh-to-mesh matching, an interactive refinement revises the transformations composed of scale, rotation, and translation. Such transformations are needed to minimize the distance between the refined point cloud at time $t_i$ and the refined point cloud at the origin $t_0$, also called the mesh model. Point correspondences were extracted from both meshes using a variation of Procrustes analysis [47–49] and then compared with an iterative closest point algorithm [50]. Details of the 3D mesh matching algorithm follow.

### Three-Dimensional Mesh Matching Algorithm.

Three-dimensional vision measurements produce 3D coordinates of the relevant object or scene with respect to a local coordinate system. 3D point cloud registration transforms multiple data sets into the same coordinate system. Currently, there is no standard method for the registration problem and the performance of the algorithms is often related to preliminary assumptions.

Consider a point cloud representation of a rigid object with a set of $n$ points $X=\{x_1,\ldots,x_n\}$, $x_i\in\mathbb{R}^3$, that is subject to an orthogonal rotation $R\in\mathbb{R}^{3\times 3}$ and a translation $t\in\mathbb{R}^3$. Then the goal is to fit the set of points $X$ into a given point cloud representation of the same object or scene with $n$ points $Y=\{y_1,\ldots,y_n\}$ under the choice of an unknown rotation $R$, an unknown translation $t$, and an unknown scale factor $s$. We can represent several configurations of the same object in a common space by maximizing the goodness-of-fit criterion. We do this with the aid of three high-level transformations: (1) translation (move the centroids of each configuration to a common origin), (2) isotropic scaling (shrink or stretch each configuration isotropically to make them as similar as possible), and (3) rotation/reflection (turn or flip the configurations in order to align the point clouds).

The set of transformations of the rigid object can be represented by $sXR + jt^T = Y$, where $X$ and $Y$ are the $n \times 3$ matrices whose $i$th rows are $x_i^T$ and $y_i^T$, and $j$ is an $n \times 1$ vector of ones. The optimization problem of finding $R$, $t$, and $s$ that minimize the fitting error is often called extended orthogonal Procrustes analysis [51]. We cast our matching/registration problem as a weighted extended orthogonal Procrustes analysis (WEOPA). The rotation $R$ can be computed by solving
$\min_R \|sXR + jt^T - Y\|_F^2 \quad \text{subject to} \quad R^T R = I_3,\ \det(R) = 1$

where $\|\cdot\|_F$ is the Frobenius matrix norm. The pseudocode of the WEOPA algorithm is given in Algorithm 1 to compute a solution to the orthogonal Procrustes problem.
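
For reference, the unweighted version of this problem has a well-known closed-form solution based on the singular value decomposition. The derivation below is a sketch of that standard result, not a reproduction of Algorithm 1. It assumes that $X$ and $Y$ are $n \times 3$ matrices whose rows are the points, so that the model reads $Y \approx sXR + jt^T$ with $j$ an $n \times 1$ vector of ones. The symbols $\bar{x}$, $\bar{y}$, $X_c$, $Y_c$, $U$, $\Sigma$, $V$, and $D$ are introduced here for illustration only.

```latex
\begin{aligned}
\bar{x} &= \tfrac{1}{n} X^{T} j, \qquad \bar{y} = \tfrac{1}{n} Y^{T} j
  && \text{(centroids)} \\
X_c &= X - j\,\bar{x}^{T}, \qquad Y_c = Y - j\,\bar{y}^{T}
  && \text{(centered point sets)} \\
X_c^{T} Y_c &= U \Sigma V^{T}
  && \text{(singular value decomposition)} \\
D &= \operatorname{diag}\bigl(1,\ 1,\ \det(U V^{T})\bigr) \\
R &= U D V^{T} \\
s &= \frac{\operatorname{tr}(D\,\Sigma)}{\operatorname{tr}\bigl(X_c^{T} X_c\bigr)} \\
t &= \bar{y} - s\,R^{T}\bar{x}
\end{aligned}
```

The matrix $D$ flips the sign of the last singular direction when needed so that $\det(R) = 1$, which is how the determinant constraint in the minimization is enforced.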

The WEOPA algorithm depends on a good initialization of $R_0$, $t_0$, and $s_0$; therefore, the algorithm is not stable. To solve the stability problem, a heuristic method was designed in Ref. [51], which we call the heuristic iterative-WEOPA (Algorithm 2). Initially, $R_0$ is the identity, $s_0$ is one, and $t_0$ is zero. This initialization is sufficient for noise-free point clouds, but most of the point clouds generated by the sensor contain noise, which shifts the centroid of the 3D point cloud far from the true position. Our algorithm deals with this problem by randomly generating orthogonal rotations, translations, and scalings as part of the initialization process. The heuristic combines these initializations with the WEOPA fitting algorithm to compute and store additional minima. When no new minimum is found after a fixed number of iterations (150), the algorithm terminates. The total number of minima is later used to draw conclusions. Moreover, experimentation showed that in most cases the algorithm found the minimum with fewer than 35 initializations. The system was developed in C++ and uses a prebuilt PCL visualization package. In addition, the sensing and reconstruction of point clouds was customized from the original manufacturer's version to allow quasi-real-time reconstruction and processing.
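
The restart-and-stop rule of the heuristic can be illustrated independently of the fitting step. The sketch below is not Algorithm 2: the local fit is a stand-in (gradient descent on a toy one-dimensional cost with two local minima) rather than WEOPA, and the names `multi_start_fit` and `toy_fit`, the tolerance `tol`, and the sampling range are assumptions. Only the patience value of 150 comes from the text.

```python
import random
from typing import Callable, List, Tuple


def multi_start_fit(
    fit: Callable[[float], Tuple[float, float]],
    sample_init: Callable[[random.Random], float],
    patience: int = 150,
    tol: float = 1e-6,
    seed: int = 0,
) -> Tuple[Tuple[float, float], List[float]]:
    """Restart a local fit from random initializations.

    Stops once `patience` consecutive restarts yield no new local minimum.
    Returns the best (params, error) pair and the distinct minima found.
    """
    rng = random.Random(seed)
    minima: List[float] = []
    best = None
    stale = 0
    while stale < patience:
        params, err = fit(sample_init(rng))
        if all(abs(err - m) > tol for m in minima):
            minima.append(err)  # a new local minimum was found
            stale = 0
        else:
            stale += 1
        if best is None or err < best[1]:
            best = (params, err)
    return best, minima


def toy_fit(x0: float) -> Tuple[float, float]:
    """Stand-in for the WEOPA fit: gradient descent on a 1D double well."""
    x = x0
    for _ in range(2000):
        # Derivative of the cost (x^2 - 1)^2 + 0.3 x returned below.
        x -= 0.01 * (4.0 * x * (x * x - 1.0) + 0.3)
    return x, (x * x - 1.0) ** 2 + 0.3 * x


best, minima = multi_start_fit(toy_fit, lambda rng: rng.uniform(-2.0, 2.0))
```

In this toy setting the wrapper records both local minima and returns the lower one. In the registration setting, `fit` would be replaced by a WEOPA run started from a randomly generated rotation, translation, and scale.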

### Part Tracking Results.

We have created a 3D-printed jet engine replica, which is composed of eleven assembly parts. We selected five representative parts (shown as inputs in Fig. 4) that afford different recognition complexities to illustrate the challenges encountered during an assembly task. A block diagram of the part tracking system is shown in Fig. 4. The first step is to perform segmentation on the point cloud in order to retrieve all assembly parts. In this case, we performed a plane segmentation to find any table in the scene and considered only clusters sitting on it. We then removed all clusters that were too small or too big in order to reduce the number of clusters and therefore the noise in the scene. After the human places the part, it is ready to be picked by the robot. Uncertainties related to pose estimation are reduced to a small variation in the final location. That is, any attempt by the robot to pick up the part results in a successful grasp (Fig. 5(c)).

Regardless of the control point, the algorithm uses the point cloud generated from the 3D CAD model as a target and compares this target against the $N$ point clouds or clusters extracted from the scanned scene. This approach allows the system to evaluate the alignment error for each assembly part, under the assumption that the minimum error belongs to the matching cluster. Once this analysis is completed, the system identifies the best matching cluster and thereby recognizes the part. Experiments showed that our iterative-WEOPA algorithm successfully detected the correct match between point clouds obtained from scanning and point clouds generated from 3D CAD models. Cluster identification and scene labeling provide the system with a tracking mechanism to detect and report changes in the scene.

We compared the results with the classical iterative closest point algorithm. Our algorithm performs better for every part. To evaluate and compare the performance of our approach, a residual error was computed as the mean square distance between the points of the current mesh and their closest points on the model mesh. After 100 iterations, only very small changes were observed in this error. Therefore, we set 150 as a fixed number of iterations for this specific experiment. The objects considered in this study are assumed to be rigid bodies. Therefore, rotation, translation, and scaling transformations do not deform their corresponding point clouds. This allows the algorithm to use scaling as a compensatory transformation between a noisy point cloud and the point cloud generated by the CAD model. In addition, the scaling transformation evaluated at step one is also used as a termination flag. This is valid under the assumption that if the scaling transformation is above a specific threshold, then there is a high probability that the scanned part is actually different from the CAD model used for the query.
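
A residual of this kind can be computed with a brute-force nearest-neighbor search. The sketch below is one reading of the definition (one-directional, from the scanned points to the model points) and is meant only as an illustration; the names `squared_distance` and `residual_mse` and the toy point sets are assumptions. A practical implementation would use a spatial index such as a k-d tree instead of the quadratic search shown here.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def residual_mse(scan: Sequence[Point], model: Sequence[Point]) -> float:
    """Mean squared distance from each scan point to its closest model point."""
    if not scan or not model:
        raise ValueError("both point clouds must be non-empty")
    total = sum(min(squared_distance(p, q) for q in model) for p in scan)
    return total / len(scan)


# Toy data: the corners of a unit square and a copy shifted by 0.1 along x.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
scan = [(x + 0.1, y, z) for (x, y, z) in model]
error = residual_mse(scan, model)  # each point is 0.1 from its neighbor
```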

### Algorithm Characterization.

A complex problem in computer vision is detecting and identifying a part in a subset of parts that are similar. To test our model, we analyzed five parts that are geometrically similar. Due to the intrinsic noise and resolution of the sensor, the generated point cloud has many irregularities that can eventually affect the performance of the algorithm. Figure 6 shows the mean square error (MSE) on point correspondence between five parts, three of which are very similar to each other. Despite these irregularities, the algorithm was able to identify the correct part. Any MSE on point correspondence below 0.09 can be considered a true positive. Figure 6 shows that the MSE of each of the three most similar parts is below the threshold. To reduce the uncertainty, our algorithm uses a local comparison between parts that belong to a specific assembly. This step helps to sort the parts based on the MSE and identify the one with minimum MSE as the matched part. Experimental results showed that increasing the density of the point cloud improved the performance of the algorithm, in terms of MSE, up to a point after which there was no visible improvement. However, the processing time increased exponentially (Fig. 7).
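
The selection rule described above (threshold first, then minimum MSE among the remaining candidates) can be written compactly. In this sketch the function names `rank_parts` and `identify_part` are hypothetical and the error values are made up for illustration; only the 0.09 threshold is taken from the text.

```python
from typing import Dict, List, Optional, Tuple

MSE_THRESHOLD = 0.09  # threshold quoted in the text


def rank_parts(mse_by_part: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate parts by increasing MSE."""
    return sorted(mse_by_part.items(), key=lambda item: item[1])


def identify_part(mse_by_part: Dict[str, float],
                  threshold: float = MSE_THRESHOLD) -> Optional[str]:
    """Return the part with minimum MSE if it is below the threshold."""
    ranked = rank_parts(mse_by_part)
    if not ranked or ranked[0][1] >= threshold:
        return None  # no candidate is close enough to the scanned cluster
    return ranked[0][0]


# Made-up errors: three similar parts fall below the threshold.
errors = {
    "FIRST COMPRESSOR": 0.031,
    "SECOND COMPRESSOR": 0.054,
    "EXHAUST TURBINE": 0.072,
    "SHELL": 0.41,
    "COVER": 0.63,
}
match = identify_part(errors)
```

With these illustrative numbers, three parts pass the threshold, and the minimum-MSE rule resolves the ambiguity in favor of the first compressor.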

## Contingency Handling

### Collision Avoidance Between Robot and Human.

Ensuring safety in the hybrid cell via appropriate control of the robot motion is related to traditional robot collision avoidance. However, interaction scenarios in shared work cells differ significantly from classical settings. For instance, safety cannot always be ensured if the robot reacts to a sensed imminent collision by moving along alternative paths. This is primarily due to the randomness of human motion, which is difficult to estimate in advance, and the dynamics of the robot implementing such a collision avoidance strategy. Also, these methods increase the computational burden, as collision-free paths must be computed in real time. Velocity scaling [52] can be used to overcome these issues by operating the robot in a tri-modal state: the robot is in a clear (normal operation) state when the human is far away from it. When the distance between them falls below a user-specified threshold, the robot changes into a slow (same path, but reduced speed) state. When the distance falls below a second threshold (whose value is lower than that of the first threshold), the robot changes to a pause (stop) state.
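
The tri-modal rule amounts to comparing the human–robot distance against two thresholds. A minimal sketch follows; the function names, the threshold values in the usage lines, and the speed factor assigned to the slow state are assumptions made for illustration, while the state names clear, slow, and pause come from the description above.

```python
from typing import Dict


def velocity_scaling_state(distance: float,
                           slow_threshold: float,
                           pause_threshold: float) -> str:
    """Map the human-robot separation to clear, slow, or pause."""
    if pause_threshold >= slow_threshold:
        raise ValueError("pause threshold must be lower than slow threshold")
    if distance < pause_threshold:
        return "pause"
    if distance < slow_threshold:
        return "slow"
    return "clear"


# Illustrative speed factors; the value for the slow state is an assumption.
SPEED_FACTOR: Dict[str, float] = {"clear": 1.0, "slow": 0.5, "pause": 0.0}


def commanded_speed(nominal_speed: float, state: str) -> float:
    """Scale the nominal speed along the unchanged path."""
    return nominal_speed * SPEED_FACTOR[state]


# Hypothetical thresholds in meters.
states = [velocity_scaling_state(d, slow_threshold=1.5, pause_threshold=0.5)
          for d in (2.0, 1.0, 0.3)]
```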

Our approach to ensuring safety in the hybrid cell is based on the precollision strategy developed in Ref. [11]: the robot pauses whenever an imminent collision between the human and the robot is detected. This is a simpler bi-modal strategy, in which the robot directly changes from clear to pause when the estimated distance is below a threshold. This stop-go safety approach conforms to the recommendations of the ISO standard 10218 [53,54]. To monitor the human–robot separation, the human model generated by the tracking system is augmented by fitting all pairs of neighboring joints with spheres that move as a function of the human's movements in real time. A roll-out strategy is used, in which the robot's trajectory into the near future is precomputed to create a temporal set of robot postures for the next few seconds. We then verify whether any of the postures in this set collides with one of the spheres of the augmented human model. The method is implemented in a virtual simulation engine developed based on the Tundra software. More details on this safety system can be found in Ref. [11].
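
The roll-out check can be sketched as a loop over precomputed postures and human spheres. The code below is an illustration under simplifying assumptions, not the system of Ref. [11]: each sphere is assumed to span its two joints (centered at their midpoint), each robot posture is approximated by a few sample points on the arm, and the names `sphere_between`, `first_unsafe_step`, and `robot_state` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Sphere = Tuple[Point, float]  # (center, radius)


def sphere_between(joint_a: Point, joint_b: Point) -> Sphere:
    """Sphere assumed to span two neighboring joints of the skeleton."""
    center = tuple((a + b) / 2.0 for a, b in zip(joint_a, joint_b))
    return center, math.dist(joint_a, joint_b) / 2.0


def first_unsafe_step(rollout: Sequence[Sequence[Point]],
                      spheres: Sequence[Sphere],
                      margin: float = 0.0) -> Optional[int]:
    """Index of the first future posture that touches a human sphere."""
    for step, posture in enumerate(rollout):
        for point in posture:
            for center, radius in spheres:
                if math.dist(point, center) <= radius + margin:
                    return step
    return None


def robot_state(rollout: Sequence[Sequence[Point]],
                spheres: Sequence[Sphere]) -> str:
    """Bi-modal rule: pause if any rolled-out posture collides."""
    return "pause" if first_unsafe_step(rollout, spheres) is not None else "clear"


# Toy data: one forearm sphere and a robot point approaching it.
spheres: List[Sphere] = [sphere_between((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
rollout = [[(2.0, 0.0, 0.5)], [(1.0, 0.0, 0.5)], [(0.4, 0.0, 0.5)]]
unsafe_step = first_unsafe_step(rollout, spheres)
state = robot_state(rollout, spheres)
```

In the toy data the third posture enters the sphere, so the check reports step index 2 and the robot state becomes pause.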

### Replanning and Warning Generation.

If a deviation from the plan is detected, the system automatically generates plans to handle the contingency. We propose a design for a contingency handling architecture for a hybrid assembly cell that can replan its sequence in real time. This design permits a human operator to introduce adjustments or improvements into the assembly sequence in real time with little delay to the assembly cell output.

From the disassembly layers generated from the CAD model of the jet engine assembly, we can extract the following assembly sequence: (1) front shroud safety, (2) main fan, (3) shroud, (4) front shaft, (5) first compressor, (6) second compressor, (7) rear shaft, (8) shell, (9) rear bearing, (10) exhaust turbine, and (11) cover. This assembly sequence also defines the plans for the human and the motion planning for the robot. Although the human operator and the robot handle the same assembly parts, their kinematic constraints are different and must be considered in the assembly planning.

We first describe a scenario in which the human operator follows the system-generated assembly plan with no errors or requested adjustments. Figure 8 shows the complete process of the assembly operation. An initial assembly plan is generated before the operations begin in the hybrid assembly cell. The plan generates the sequence for the human pick-and-place operations and the motion plan for the robot assembly operations. Full integration among the assembly plan, the human tracking system, and the robot significantly reduces the probability of errors introduced by the robot in the cell, and we ignore those errors in this work. This configuration leaves the human operator as the only agent with the capacity to introduce errors in the assembly cell. We define a deviation in the assembly cell as a modification to the predefined plan. These modifications can be classified into three main categories: (1) deviations that lead to process errors, (2) deviations that lead to improvements in the assembly speed or output quality, and (3) deviations that lead to adjustments in the assembly sequence.

#### Deviations That Lead to Process Errors.

Deviations that lead to process errors are modifications introduced by the human operator from which no feasible assembly plan can be generated. Such modifications can put the assembly cell into a state that requires costly recovery. To prevent this type of error, the system must detect the modification through registration of the assembly parts. Once the system has identified the selected assembly part, it evaluates the error in real time by propagating the modification through the assembly plan and provides multimodal feedback (e.g., text, visual, and audible annotations). We have hand-coded several examples to illustrate this type of deviation. Following the assembly plan in our example, after the rear bearing is placed, the next part to be assembled is the “exhaust turbine.” Rather than following the assembly sequence, the human operator may decide to use a different sequence. For example, the human picks the “compressor” part instead of the exhaust turbine, as shown in Fig. 9(a). To find a feasible plan, the new assembly sequence with the compressor as a second step is evaluated in real time. Using the exploration matrix, the system determines that no feasible assembly sequence can follow this step. Therefore, the system raises an alarm and generates appropriate feedback using text annotations. This forces the human operator to fall back on the predefined assembly sequence.

#### Deviations That Lead to Improvements.

Every modification to the master assembly plan is detected and evaluated in real time. The initial assembly plan is only one of many feasible plans. A modification that yields another feasible plan is classified as an improvement. Such modifications are accepted, which gives human operators the ability and authority to use their experience to produce better plans. This process helps the system to evolve and adapt quickly using the contributions made by the human agent. Following the assembly sequence, the next part to be assembled is the “front shaft.” The human operator decides, based on previous experience, that placing the “first compressor” next will improve the performance of the assembly process. The first compressor is selected and the step is evaluated in real time. The system determines that the modified sequence is also feasible. Therefore, the step is accepted and the human is prompted to continue with the assembly operation. The updated assembly sequence becomes: (1) front shroud safety, (2) main fan, (3) shroud, (4) first compressor, (5) front shaft, (6) second compressor, (7) rear shaft, (8) shell, (9) rear bearing, (10) exhaust turbine, and (11) cover.
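The classification of a picked part as a process error or an improvement can be sketched with a precedence-based feasibility check. This sketch does not reproduce the exploration matrix used in our system. The `REQUIRES` table below is a hypothetical set of precedence constraints chosen only so that the example runs, and `evaluate_pick` is an illustrative name. Under these assumptions, a pick is feasible when every part it requires has already been placed, and the remaining master plan is then reordered around it.

```python
MASTER_PLAN = [
    "front shroud safety", "main fan", "shroud", "front shaft",
    "first compressor", "second compressor", "rear shaft", "shell",
    "rear bearing", "exhaust turbine", "cover",
]

# Hypothetical precedence constraints (part -> parts that must precede it).
# These are NOT extracted from the CAD model; they are assumed for illustration.
REQUIRES = {
    "main fan": {"front shroud safety"},
    "shroud": {"main fan"},
    "front shaft": {"shroud"},
    "first compressor": {"shroud"},
    "second compressor": {"front shaft", "first compressor"},
    "rear shaft": {"second compressor"},
    "shell": {"rear shaft"},
    "rear bearing": {"shell"},
    "exhaust turbine": {"rear bearing"},
    "cover": {"exhaust turbine"},
}


def evaluate_pick(placed, picked, plan=MASTER_PLAN, requires=REQUIRES):
    """Classify a picked part and return (verdict, remaining sequence).

    verdict is "as_planned", "improvement" (another feasible plan exists),
    or "process_error" (no feasible plan can follow this step).
    """
    remaining = [part for part in plan if part not in placed]
    if picked not in remaining:
        return "process_error", remaining      # unknown or already placed
    if picked == remaining[0]:
        return "as_planned", remaining
    missing = requires.get(picked, set()) - set(placed)
    if missing:
        return "process_error", remaining      # keep the predefined plan
    # Feasible deviation: move the picked part to the front of the plan.
    replanned = [picked] + [part for part in remaining if part != picked]
    return "improvement", replanned
```

With the first three parts placed, picking the first compressor is classified as an improvement and yields the updated sequence listed above, whereas picking the cover at that point is classified as a process error under the assumed constraints.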

The most important feature of the framework is that the hybrid assembly cell not only accepts the modification in the assembly sequence, but also adapts its configuration in order to complete the assembly process.

#### Deviations That Lead to Adjustments.

Adjustments in the assembly process occur when the assembly cell can easily recover from an error introduced by the human by requesting additional interaction to fix it. Assume that the human operator is following the predefined assembly sequence and that the next part to be assembled is the front shaft. The system recognizes the assembly part and validates the step, so the part can be moved and placed in the intermediate location. A common mistake in part placement is a wrong pose, that is, a rotation or translation that diverges from the required pose. The system informs the human about the mistake and prompts him/her to correct it. The system verifies the poses of the assembly parts in the intermediate location in real time and requires the human operator to place the part in the right location before the assembly process can continue. Once the assembly part is in the right position and orientation, the assembly process resumes.
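The pose verification step can be sketched as a comparison of the measured pose against the required pose, with separate tolerances for translation and rotation. The tolerance values, the pose representation (a translation vector plus a 3×3 rotation matrix), and the function names are assumptions made for this example; they do not describe the actual part registration pipeline.

```python
import math


def rotation_z(angle_rad):
    """Rotation matrix about the z axis (helper for building test poses)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_error_rad(r_required, r_measured):
    """Angle of the relative rotation between two rotation matrices.

    Uses trace(R_req^T R_meas) = sum_ij R_req[i][j] * R_meas[i][j]
    and cos(theta) = (trace - 1) / 2.
    """
    trace = sum(r_required[i][j] * r_measured[i][j]
                for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def check_pose(required, measured,
               max_offset_m=0.01, max_angle_rad=math.radians(5.0)):
    """Return a list of corrections to show the operator (empty if pose is OK).

    Each pose is (translation, rotation_matrix); tolerances are hypothetical.
    """
    (t_req, r_req), (t_meas, r_meas) = required, measured
    corrections = []
    offset = math.dist(t_req, t_meas)
    if offset > max_offset_m:
        corrections.append(f"reposition part: offset {offset * 1000:.1f} mm")
    angle = rotation_error_rad(r_req, r_meas)
    if angle > max_angle_rad:
        corrections.append(f"rotate part: error {math.degrees(angle):.1f} deg")
    return corrections
```

The assembly process would resume only when `check_pose` returns an empty list for the part in the intermediate location.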

## Conclusions

We presented the design details of a framework for hybrid cells that support safe and efficient human–robot collaboration during assembly operations. We presented an approach for monitoring the state of the hybrid assembly cell during assembly operations. The discrete state-to-state part monitoring was designed to be robust and to reduce possible robot motion errors. While the assembly operations are performed by the human and the robot, the system constantly sends feedback to the human operator about the performed tasks. This constant feedback, in the form of 3D animations, text, and audio, is intended to reduce the training time and prevent assembly errors. We will conduct experiments to quantitatively demonstrate these benefits of the proposed method in the future.

A Microsoft Kinect sensor, which has an effective range of approximately 1 to 4 m, was used for both part monitoring and human monitoring. Therefore, the monitoring equipment can be placed sufficiently far from the robot without affecting its normal working process. We carried out a detailed sensor placement analysis with respect to the human-monitoring system in Ref. [1]. We will carry out a similar placement analysis of the part-monitoring system in the future.

The proposed method uses a precollision strategy to predict the human's impending collision with the robot and pause the robot's motion. We will complement this capability in the future by exploiting the KUKA robot's built-in force sensing and impedance control features to implement compliant control for handling postcollision scenarios. In our previous work, we developed other modules, including an ontology for task partitioning in human–robot collaboration for kitting operations [55] and methods for resolving perception uncertainties [56] and occlusions [57] in robotic bin-picking in hybrid cells. Future work will investigate how to integrate them into the development of hybrid work cells for assembly applications.

## Funding Data

• National Science Foundation (Grant Nos. 1634431 and 1713921).

## References

1. Morato, C., Kaipa, K. N., Zhao, B., and Gupta, S. K., 2014, “Toward Safe Human Robot Collaboration by Using Multiple Kinects Based Real-Time Human Tracking,” ASME J. Comput. Inf. Sci. Eng., 14(1), p. 011006.
2. Morato, C., Kaipa, K. N., Liu, J., and Gupta, S. K., 2014, “A Framework for Hybrid Cells That Support Safe and Efficient Human-Robot Collaboration in Assembly Operations,” ASME Paper No. DETC2014-34671.
3. Morato, C., Kaipa, K. N., and Gupta, S. K., 2017, “System State Monitoring to Facilitate Safe and Efficient Human-Robot Collaboration in Hybrid Assembly Cells,” ASME Paper No. DETC2017-68269.
4. Bauer, A., Wollherr, D., and Buss, M., 2008, “Human-Robot Collaboration: A Survey,” Int. J. Humanoid Rob., 5(1), pp. 47–66.
5. Shi, J., Jimmerson, G., Pearson, T., and Menassa, R., 2012, “Levels of Human and Robot Collaboration for Automotive Manufacturing,” Workshop on Performance Metrics for Intelligent Systems (PerMIS), College Park, MD, Mar. 20–22, pp. 95–100.
6. Cherubini, A., Passama, R., Crosnier, A., Lasnier, A., and Fraisse, P., 2016, “Collaborative Manufacturing With Physical Human-Robot Interaction,” Rob. Comput.-Integr. Manuf., 40, pp. 1–13.
7. , B., and Wang, Y., 2017, “Collaborative Assembly in Hybrid Manufacturing Cells: An Integrated Framework for Human-Robot Interaction,” IEEE Trans. Autom. Sci. Eng., PP(99), pp. 1–15.
8. Baxter, 2010, “Rethink Robotics,” Rethink Robotics, accessed Jan. 29, 2018, http://www.rethinkrobotics.com/baxter
9. KUKA, 2010, “KUKA LBR IV,” KUKA Robotics Corporation, Shelby Charter Township, MI, accessed Jan. 29, 2018, https://www.kuka.com/en-us/products/robotics-systems/industrial-robots/lbr-iiwa
10. ABB, 2013, “ABB Friendly Robot for Industrial Dual Arm FRIDA,” ABB, accessed Jan. 29, 2018, http://new.abb.com/products/robotics/industrial-robots/yumi
11. Morato, C., Kaipa, K. N., Zhao, B., and Gupta, S. K., 2013, “Safe Human Robot Interaction by Using Exteroceptive Sensing Based Human Modeling,” ASME Paper No. DETC2013-13351.
12. Heiser, J., Phan, D., Agrawala, M., Tversky, B., and Hanrahan, P., 2004, “Identification and Validation of Cognitive Design Principles for Automated Generation of Assembly Instructions,” Working Conference on Advanced Visual Interfaces (AVI), Gallipoli, Italy, May 25–28, pp. 311–319.
13. Dalal, M., Feiner, S., McKeown, K., Pan, S., Zhou, M., Höllerer, T., Shaw, J., Feng, Y., and Fromer, J., 1996, “Negotiation for Automated Generation of Temporal Multimedia Presentations,” Fourth ACM International Conference on Multimedia (MULTIMEDIA), Boston, MA, Nov. 18–22, pp. 55–64.
14. Zimmerman, G., Barnes, J., and Leventhal, L., 2003, “A Comparison of the Usability and Effectiveness of Web-Based Delivery of Instructions for Inherently-3D Construction Tasks on Handheld and Desktop Computers,” Eighth International Conference on 3D Web Technology (Web3D), Saint Malo, France, Mar. 9–12, pp. 49–54.
15. Kim, S., Woo, I., Maciejewski, R., Ebert, D. S., Ropp, T. D., and Thomas, K., 2010, “Evaluating the Effectiveness of Visualization Techniques for Schematic Diagrams in Maintenance Tasks,” Seventh Symposium on Applied Perception in Graphics and Visualization (APGV), Los Angeles, CA, July 23–24, pp. 33–40.
16. Kalkofen, D., Tatzgern, M., and Schmalstieg, D., 2009, “Explosion Diagrams in Augmented Reality,” IEEE Virtual Reality Conference (VR), Lafayette, LA, Mar. 14–18, pp. 71–78.
17. Henderson, S., and Feiner, S., 2011, “Exploring the Benefits of Augmented Reality Documentation for Maintenance and Repair,” IEEE Trans. Visualization Comput. Graph., 17(10), pp. 1355–1368.
18. Dionne, D., de la Puente, S., León, C., Hervás, R., and Gervás, P., 2009, “A Model for Human Readable Instruction Generation Using Level-Based Discourse Planning and Dynamic Inference of Attributes Disambiguation,” 12th European Workshop on Natural Language Generation, Athens, Greece, Mar. 30–31, pp. 66–73. http://www.aclweb.org/anthology/W09-0610
19. Brough, J. E., Schwartz, M., Gupta, S. K., Anand, D. K., Kavetsky, R., and Pettersen, R., 2007, “Towards the Development of a Virtual Environment-Based Training System for Mechanical Assembly Operations,” Virtual Reality, 11(4), pp. 189–206.
20. Gupta, S. K., Anand, D., Brough, J. E., Kavetsky, R., Schwartz, M., and Thakur, A., 2008, “A Survey of the Virtual Environments-Based Assembly Training Applications,” Virtual Manufacturing Workshop, Turin, Italy, pp. 1–10.
21. Ohbuchi, R., , K., Furuya, T., and Banno, T., 2008, “Salient Local Visual Features for Shape-Based 3D Model Retrieval,” IEEE International Conference on Shape Modeling and Applications (SMI), Stony Brook, NY, June 4–6, pp. 93–102.
22. Chen, H., and Bhanu, B., 2007, “3D Free-Form Object Recognition in Range Images Using Local Surface Patches,” Pattern Recognit. Lett., 28(10), pp. 1252–1262.
23. Liu, Y., Zha, H., and Qin, H., 2006, “Shape Topics: A Compact Representation and New Algorithms For 3D Partial Shape Retrieval,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, June 17–22, pp. 2025–2032.
24. Frome, A., Huber, D., Kolluri, R., Bulow, T., and Malik, J., 2004, “Recognizing Objects in Range Data Using Regional Point Descriptors,” European Conference on Computer Vision (ECCV), Prague, Czech Republic, May 11–14, pp. 224–237. https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/shape/frome-sc3d.pdf
25. Mian, A., Bennamoun, M., and Owens, R., 2009, “On the Repeatability and Quality of Keypoints for Local Feature-Based 3D Object Retrieval From Cluttered Scenes,” Int. J. Comput. Vision, 89(2–3), pp. 348–361.
26. Mian, A., Bennamoun, M., and Owens, R., 2009, “A Novel Representation and Feature Matching Algorithm for Automatic Pairwise Registration of Range Images,” Int. J. Comput. Vision, 66(1), pp. 19–40.
27. Zhong, Y., 2009, “Intrinsic Shape Signatures: A Shape Descriptor for 3D Object Recognition,” IEEE 12th International Conference on Computer Vision Workshops (ICCV), Kyoto, Japan, Sept. 27–Oct. 4, pp. 689–696.
28. Johnson, A., and Hebert, M., 1999, “Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes,” IEEE Trans. Pattern Anal. Mach. Intell., 21(5), pp. 433–449.
29. Chua, C., and Jarvis, R., 1997, “Point Signatures: A New Representation for 3D Object Recognition,” Int. J. Comput. Vision, 25(1), pp. 63–85.
30. Stein, F., and Medioni, G., 1992, “Structural Indexing: Efficient 3-D Object Recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 14(2), pp. 125–145.
31. Hetzel, G., Leibe, B., Levi, P., and Schiele, B., 2001, “3D Object Recognition From Range Images Using Local Feature Histograms,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, HI, Dec. 8–14, pp. II-394–II-399.
32. Tangelder, J., and Veltkamp, R., 2004, “A Survey of Content Based 3D Shape Retrieval Methods,” IEEE International Conference on Shape Modeling Applications, Genova, Italy, June 7–9, pp. 145–156.
33. Freedman, A., Shpunt, B., Machline, M., and Arieli, Y., 2008, “Depth Mapping Using Projected Patterns,” Prime Sense Ltd., Israel, Patent No. WO 2008/120217 A2.
34. Gupta, S. K., Regli, W. C., Das, D., and Nau, D. S., 1997, “Automated Manufacturability Analysis: A Survey,” Res. Eng. Des., 9(3), pp. 68–190.
35. Gupta, S. K., Paredis, C., Sinha, R., Wang, C., and Brown, P. F., 1998, “An Intelligent Environment for Simulating Mechanical Assembly Operation,” ASME Design Engineering Technical Conferences (DETC), Atlanta, GA, Sept. 13–16, pp. 1–12. https://www.ri.cmu.edu/pub_files/pub2/gupta_satyandra_1998_1/gupta_satyandra_1998_1.pdf
36. Gupta, S. K., Paredis, C., Sinha, R., and Brown, P. F., 2001, “Intelligent Assembly Modeling and Simulation,” Assem. Autom., 21(3), pp. 215–235.
37. Morato, C., Kaipa, K. N., and Gupta, S. K., 2012, “Assembly Sequence Planning by Using Multiple Random Trees Based Motion Planning,” ASME Paper No. DETC2012-71243.
38. Morato, C., Kaipa, K. N., and Gupta, S. K., 2013, “Improving Assembly Precedence Constraint Generation by Utilizing Motion Planning and Part Interaction Clusters,” J. Comput.-Aided Des., 45(11), pp. 1349–1364.
39. Kaipa, K. N., Morato, C., Zhao, B., and Gupta, S. K., 2012, “Instruction Generation for Assembly Operations Performed by Humans,” ASME Paper No. DETC2012-71266.
40. Cardone, A., Gupta, S. K., and Karnik, M., 2003, “A Survey of Shape Similarity Assessment Algorithms for Product Design and Manufacturing Applications,” ASME J. Comput. Inf. Sci. Eng., 3(2), pp. 109–118.
41. Cardone, A., and Gupta, S. K., 2006, “Similarity Assessment Based on Face Alignment Using Attributed Applied Vectors,” Comput.-Aided Des. Appl., 3(5), pp. 645–654.
42. Petitjean, S., 2002, “A Survey of Methods for Recovering Quadrics in Triangle Meshes,” ACM Comput. Surv., 34(2), pp. 211–262.
43. Newcombe, R., and Davison, A., 2010, “Live Dense Reconstruction With a Single Moving Camera,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, June 13–18, pp. 1498–1505.
44. Newcombe, R., Lovegrove, S., and Davison, A., 2011, “DTAM: Dense Tracking and Mapping in Real-Time,” International Conference on Computer Vision (ICCV), Barcelona, Spain, Nov. 6–13, pp. 2320–2327.
45. Newcombe, R., , S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A., Pushmeet, K., Shoton, J., Hodges, S., and Fitzgibbon, A., 2011, “Kinectfusion: Real-Time Dense Surface Mapping and Tracking,” Tenth IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Basel, Switzerland, Oct. 26–29, pp. 127–136.
46. , S., Kim, D., Hilliges, O., Newcombe, R., Molyneaux, D., Newcombe, R., Kohli, P., Shoton, J., Hodges, S., Freeman, D., Davison, A., and Fitzgibbon, A., 2011, “Kinectfusion: Real-Time 3D Reconstruction and Interaction Using a Moving Depth Camera,” 24th Annual ACM Symposium on User Interface Software and Technology (UIST), Santa Barbara, CA, Oct. 16–19, pp. 559–568.
47. Toldo, R., Beinat, A., and Crosilla, F., 2010, “Global Registration of Multiple Point Clouds Embedding the Generalized Procrustes Analysis Into an ICP Framework,” International Conference on 3D Data Processing, Visualization, and Transmission (DPVT), Paris, France, May 17–20, pp. 1–8. https://www.researchgate.net/publication/228959196_Global_registration_of_multiple_point_clouds_embedding_the_Generalized_Procrustes_Analysis_into_an_ICP_framework
48. Goodall, C., 1991, “Procrustes Methods in the Statistical Analysis of Shape,” J. R. Stat. Soc. Ser. B, 53(2), pp. 285–339. http://www.jstor.org/stable/2345744
49. Krishnan, S., Lee, P., Moore, J., and Venkatasubramanian, S., 2005, “Global Registration of Multiple 3D Point Sets Via Optimization-on-a-Manifold,” Third Eurographics Symposium on Geometry Processing (SGP), Vienna, Austria, July 4–6, pp. 1–11. https://dl.acm.org/citation.cfm?id=1281952
50. Rusinkiewicz, S., and Levoy, M., 2001, “Efficient Variants of the ICP Algorithm,” IEEE Third International Conference on 3D Digital Imaging and Modeling, Quebec City, QC, Canada, May 28–June 1, pp. 145–152.
51. Wedin, P. A., and Viklands, T., 2006, “Algorithms for 3-Dimensional Weighted Orthogonal Procrustes Problems,” Umea University, Umeå, Sweden, Technical Report No. UMINF-06.06. http://www8.cs.umu.se/~viklands/PhDpaper1.pdf
52. Davies, S., 2007, “Watching Out for the Workers [Safety Workstations],” IET Manuf., 86(4), pp. 32–34.
53. Andrieu, C., and Doucet, A., 2011, “Robots and Robotic Devices: Safety Requirements or Industrial Robots—Part 1: Robot,” International Organization for Standardization, Geneva, Switzerland, Standard No. ISO 10218-1:2011. https://www.iso.org/standard/51330.html
54. ISO, 2011, “Robots and Robotic Devices: Safety Requirements for Industrial Robots—Part 2: Robot Systems and Integration,” International Organization for Standardization, Geneva, Switzerland, Standard No. ISO/FDIS 10218-2:2011. https://www.iso.org/standard/73934.html
55. Banerjee, A. G., Barnes, A., Kaipa, K. N., Liu, J., Shriyam, S., Shah, N., and Gupta, S. K., 2015, “An Ontology to Enable Optimized Task Partitioning in Human-Robot Collaboration for Warehouse Kitting Operations,” Proc. SPIE, 9494, p. 94940H.
56. Kaipa, K. N., Kankanhalli-Nagendra, A. S., Kumbla, N. B., Shriyam, S., Thevendria-Karthic, S. S., Marvel, J. A., and Gupta, S. K., 2016, “Addressing Perception Uncertainty Induced Failure Modes in Robotic Bin-Picking,” Rob. Comput.-Integr. Manuf., 42, pp. 17–38.
57. Kaipa, K. N., Shriyam, S., Kumbla, N. B., and Gupta, S. K., 2016, “Resolving Occlusions Through Simple Extraction Motions in Robotic Bin-Picking,” ASME Paper No. MSEC2016-8661.