Ball catching by a robot is a challenging and complex control task that has been extensively studied to achieve human-like skills in robots. Over the last decade, several ball-catching robot designs have set benchmarks in visual tracking and control algorithms. However, the coordination between the ball’s path tracking and the robot’s motion planning remains highly sensitive to changes in environmental parameters. In general, ball-catching robots require a noise-free background with good lighting and multiple off-board tracking cameras. A common failure point of these systems is the short flight time (or high speed) of the ball and the uncertainty of the throwing direction. To address these issues, in this study, we propose a ball-catching platform system that can rapidly orient the platform towards the throwing direction by utilizing two onboard cameras with multi-threading. A graphical user interface platform has been developed to implement the orientation algorithm and mask the ball with high accuracy. Our experimental results show that the proposed orientation platform system can be used against a low-light, noisy background, and the overall ball-catching rate increases from 50% to 90% compared to the baseline design. The new system also avoids erratic platform movements when masking is done in a noisy environment.
Playing throw and catch with a robot is not only intriguing but also suitable for learning different types of feedback control systems. Ball-catching by a robot demonstrates a coordinated feedback control operation between visual sensing and motion tracking. This operation also needs to be coupled with the robot mechanics to rapidly move the catching platform. The angle and velocity at which a thrown ball approaches the platform can affect the catching performance significantly, and for successful catching, the platform needs to be at the right position at the right time. Due to the complexity of the catching task, wide-ranging ball-catching designs have been considered in recent years. The ability of robots to catch thrown objects has been explored through varying platform designs, such as a humanoid hand with three or multiple fingers [2–4], cup- or oval-shaped catching platforms [5–7], and flat-shaped surfaces [8,9]. Along with the catching platform, active vision systems with multiple high-speed cameras have also been studied in dynamic environments to predict the trajectory of a flying ball, its falling angle, and the catching point [10–12]; and to play interactive games such as table tennis [13–15] and gesture-based robot games. Some of these designs use mechanical arrangements such as low-bounciness-index balls and deformable gloves to grip the ball effectively. However, more challenging flat-plate-like catching surfaces for catching high-bouncing (e.g., ping pong) balls have also been studied, as they abstract some key aspects of vision-based feedback control performance and motion tracking in a complex environment.
In this study, we explored a flat catching surface for catching a ping pong ball through image sensing. Most similar robots (with a flat surface to hold the ball) are used for ball-balancing instead of ball-catching, as catching a ping pong ball is a difficult task due to its high bounciness. A few similar commercial robots for ball-balancing can be found in Refs. [17,18], and these are used for teaching vision- or touch-sensor-based proportional-integral-derivative (PID) experiments in undergraduate control labs. In this study, we have used an open-source three degrees-of-freedom (3DOF) PID-controlled platform robot with three legs to develop the platform orientation system and to make comparative tests of the overall catching performance.
The primary objective of this platform orientation system is to enable effective catching of a ping pong ball. In addition, a graphical user interface (GUI) platform is developed for effective masking of the ball in a noisy environment. This research will be highly useful in developing computer vision-based inexpensive sports simulators (catching baseball or softball) and intelligent swivel platforms for solar panels or satellite dishes. The novel contributions of this work are that (1) it presents a visual feedback algorithm with multi-threading to dynamically adjust the platform orientation by changing the legs’ postures and angles and that (2) it presents a new GUI interface (with hue saturation value (HSV) range selections) for effective masking procedures, which facilitates complex maneuvers in a noisy environment.
Our proposed algorithm can effectively orient the platform towards the ball-throwing direction before executing the catching operation. Also, using the new GUI interface, the robot can be utilized with low-cost web cameras (30 fps) in poor lighting conditions (30–50 lumen per square meter) as well as in a cluttered or noisy environment.
2 Materials and Methods
In this section, we present the development of a platform orientation system to effectively catch a thrown ball. A 3DOF PID-controlled robot designed by JohanLink is utilized to implement the orientation algorithm and test the catching performance. All the 3D-printed (.stl) parts of this project were made available by the designer on the Thingiverse site. The original system has one overhead camera that “sees” the ball as it is being dropped onto the platform. If the ball bounces too much or the throwing speed is high, the system performs poorly in catching the ball. This study builds on the original design, adding an orientation algorithm and two cameras operating through multi-threading.
As shown in Fig. 1(a), using one overhead camera, the original system can track and hold a ball at the center of the plate. The catching platform has a total of eight links, six revolute joints, and three spherical joints. Therefore, as a spatial mechanism, it has 3DOF motion. Three servo motors (Futaba S3003) placed at 120 deg intervals are used to control three 2-link leg mechanisms. The servos use pulse width control, and the partial rotational movements (revolute joints) of these motors can create various tilting angles of the top surface. The bottom link of each leg is 2.75 in. (70 mm), and the top link of each leg is 3.34 in. (85 mm). The distance from the tip of each arm to the base is fixed at 4.52 in. (115 mm), and the top catching plate has a diameter of 10.63 in. (270 mm).
2.1 Inverse Kinematics.
The tilt angles (α and β) (Fig. 2) correspond to three motor angles (θA, θB, and θC) at the base, which can be determined using the “fsolve” function from Python’s scipy.optimize module. The range of tilt angle α is from 0 deg to 35 deg, and the range for angle β is from 0 deg to 360 deg, with an increment of 0.5 deg. The motor angle constants (k, m, and t) are related to the tilt angles through the kinematic equations shown in Eqs. (1)–(3).
Here, d is the height of the plate above the motors when the plate is parallel to the base, and D denotes the absolute distance between arm ends, where DAB, DBC, and DCA are the absolute distances between the end positions of motor arm A and motor arm B, motor arm B and motor arm C, and motor arm C and motor arm A, respectively.
Here, (X, Y, Z) is the leg-end coordinate, L is the distance from the base center to the motor (85 mm), r is the length of the lower link of a leg (70 mm), and l is the length of the upper link of a leg (85 mm).
The “fsolve” function is highly beneficial because it finds approximate values of the angles instead of exact ones, which helps avoid singularities. A data file (data.txt) was generated for different angle combinations, and it can be read by the Python program (interface.py) to pick appropriate motor angle parameters (θA, θB, and θC).
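As a rough illustration of this lookup-table approach, the sketch below solves a simplified one-leg kinematic equation with “fsolve.” The residual function and the planar two-link model here are our own stand-ins, not the paper’s Eqs. (1)–(3), which couple all three motor angles.

```python
import numpy as np
from scipy.optimize import fsolve

R_LOWER = 70.0   # lower-link length r, mm (from the hardware description)
L_UPPER = 85.0   # upper-link length l, mm

def leg_residual(theta, z_target):
    # Height reached by the leg end for motor angle theta (radians), using a
    # simplified planar two-link model -- NOT the paper's actual Eqs. (1)-(3)
    z = R_LOWER * np.sin(theta[0]) + np.sqrt(
        L_UPPER**2 - (R_LOWER * np.cos(theta[0]))**2)
    return [z - z_target]

def solve_motor_angle(z_target, guess=0.3):
    # fsolve returns an approximate root, which is what makes the lookup-table
    # approach robust near singular configurations
    sol = fsolve(leg_residual, [guess], args=(z_target,))
    return float(sol[0])
```

A table like data.txt can then be built by sweeping α from 0 deg to 35 deg and β from 0 deg to 360 deg in 0.5-deg steps and storing the solved motor angles for each combination.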
An overhead camera (2MP 1080P USB Camera, CMOS OV2710) operating at 30 frames per second is used to capture live video for image analysis. A Python program analyzes the video frame by frame and determines the error (the distance from the center to the ball) to implement the PID algorithm. Both the camera and the Arduino UNO board are connected to a computer for software integration.
2.2 Proportional-Integral-Derivative Control.
In this study, the empirical Ziegler–Nichols method is used to determine initial PID gains, followed by a trial-and-error approach for further fine-tuning. It can be seen from the experiment that if the integral and derivative gains (KI and KD) are set to zero and a very low proportional gain (KP = 0.1) is introduced to the system, its response becomes extremely slow. This test was done by placing the ping pong ball approximately 60 mm (2.36 in.) away from the center. The proportional gain (KP) was then increased gradually in increments of 0.1, and the responses of the system were observed experimentally. For low proportional gains (KP < 2), the ball falls off the plate because the catching platform does not demonstrate enough agility to reverse the rolling ball’s trajectory towards the center. After a significant number of experiments with different proportional gains, it has been observed that at KP = 13.9, the ball oscillates continuously about the center without approaching a stable position. At this proportional gain, the ball is also on the verge of instability (i.e., falling off the surface). The period of this oscillation is approximately 3.5 s. From this experimental investigation, the initial PID gains were determined using the Ziegler–Nichols PID tuning laws presented in Ref. Figure 3 shows a flow diagram demonstrating this gain selection method.
In our experiment, the ultimate gain is KU = KP = 13.9, and the oscillation period is PU = 3.5 s. Using the Ziegler–Nichols method, the proportional gain (KP = 8.34), integral gain (KI = 4.77), and derivative gain (KD = 3.65) were selected.
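These values follow directly from the classic Ziegler–Nichols tuning rules applied to the measured ultimate gain and oscillation period:

```python
# Classic Ziegler-Nichols PID rules applied to the measured values above
Ku, Pu = 13.9, 3.5     # ultimate gain and oscillation period (s)
Kp = 0.6 * Ku          # proportional gain: 8.34
Ki = 2.0 * Kp / Pu     # integral gain (Kp / Ti with Ti = Pu / 2): ~4.77
Kd = Kp * Pu / 8.0     # derivative gain (Kp * Td with Td = Pu / 8): ~3.65
```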
The PID gains found from the Ziegler–Nichols method needed further fine-tuning. Although the system can sometimes catch balls using these gains, additional manual tuning was helpful in dealing with different lighting conditions and ball-throwing speeds. One of the challenges of the system is that it uses visual feedback video from the camera to determine the ball’s position, velocity, and current error. This calculation is done frame by frame and is highly dependent on the frame rate of the camera, which changes with slight variations in lighting conditions. By tweaking the initial PID gains, better system performance is found across different lighting conditions. These gains are very close to the values found with the Ziegler–Nichols method, except for the integral gain. It has been observed that decreasing the integral gain leaves a relatively small steady-state error but significantly helps to balance the ball when it is thrown from a distance. Also, if only proportional and derivative gains are used, a small steady-state error (about 30 mm or 1.18 in. from the center) remains. After multiple trials, the most balanced PID gains for the system are found to be KP = 8.7, KI = 0.075, and KD = 4. These gains are used in all of the experiments presented in this study.
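The per-frame feedback loop described above can be sketched as a minimal discrete PID update (the class and variable names are ours, not the project’s; the error is the pixel distance from the plate center to the ball, and dt is the inter-frame interval):

```python
# Minimal discrete PID controller, updated once per video frame
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate the integral term and approximate the derivative by
        # finite differences between consecutive frames
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=8.7, ki=0.075, kd=4.0)  # final gains reported in this study
```

Because the derivative term divides by dt, a camera frame-rate drop in dim lighting directly changes the control output, which is why the gains required re-tuning across lighting conditions.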
To implement a platform orientation system, we have used two cameras with multi-threading. A brief description of the newly designed system hardware and its operational principle is given in Sec. 2.3.
2.3 Hardware Setup for Two-Camera System.
The newly designed platform orientation system uses two cameras. One camera is used to “see” the ball’s throwing direction and orient the surface accordingly, and the other camera is used to balance the ball on the platform. The challenges of using a two-camera system are synchronizing the video feedback from the two sources and the freezing of the GUI application when a single main thread runs continuously. To avoid these problems, we have used the Python binding library “PyQt5” to enable cross-platform applications [24,25]. The “PyQt5” library provides its own infrastructure for creating multithreaded applications using QThread, and along with the main thread, it can also run worker threads. We have used the Qt interface to run an extra class (QRunnable) and passed the task to another thread. A more technical outline of multi-threading applications can be found in Ref.
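The actual application uses PyQt5’s QThread/QRunnable infrastructure; the standard-library sketch below illustrates the same worker-thread pattern, with each camera loop running on its own thread so that neither one blocks the main (GUI) thread. The loop body is a stand-in for frame capture and processing.

```python
import threading
import queue

def camera_loop(name, frames_out, n_frames=5):
    # Stand-in for a per-camera capture/processing loop; a real worker would
    # grab frames and push detected ball positions to the main thread
    for i in range(n_frames):
        frames_out.put((name, i))

frames = queue.Queue()  # thread-safe channel back to the main thread
workers = [
    threading.Thread(target=camera_loop, args=("overhead", frames)),
    threading.Thread(target=camera_loop, args=("front", frames)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

In the PyQt5 version, the same task object would subclass QRunnable and be submitted to QThreadPool, with signals/slots playing the role of the queue.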
In the catching application, first, the front camera (camera 2 in Fig. 4) is utilized to orient the platform towards the throwing direction. If the user moves the ball, the platform continuously tracks the ball and reorients itself. After a short tracking period (we used 3 s), the PID balancing operation by the overhead camera (camera 1 in Fig. 4) starts running. A picture of this hardware system with two cameras is shown in Fig. 4(a). In this system, because the platform is already oriented towards the throwing direction, unnecessary bouncing of the ball is eliminated and catching performance improves. A layout of the two-camera system with multi-threading is shown in Fig. 4(b). The catching performance of the platform orientation robot system is compared with that of the original one-camera robot, and the comparison is presented in the Results and Discussion (Sec. 3) of this paper.
2.4 Image Processing and Orientation Algorithm.
Image processing is one of the critical components for successful catching and balancing operations. We have used Python’s OpenCV library to convert the RGB images into HSV images and select all the pixels that are part of the ping pong ball. OpenCV’s “inRange” function is used to pick out pixels based on their values and create a mask. A new GUI platform has been developed to correctly identify the upper and lower hue-saturation boundaries. Although both the original and new GUIs use the “inRange” function, the newly developed one has sliders for selecting the upper and lower HSV values, as shown in Fig. 5(b). This approach is more effective for masking than directly adjusting the size of a range around certain HSV values, as in the original GUI program. The user interface of the original GUI platform is shown in Fig. 5(a), and the newly developed GUI platform is shown in Fig. 5(b). To improve the usability of this GUI, functionalities such as camera selection, manually changing the tilt angles (α and β), and catching mode selection (with or without multi-threading) are also added to the platform. The GUI platform mode “Standard + User Track” (Fig. 5(b)) activates the multi-threading operation for the platform orientation system with two cameras.
Overall, the masking operation using the GUI is significantly improved, and it can correctly mask an item and track it in a noisy environment. An example of masking the ball using HSV ranges is shown in Fig. 6.
Generally, in a cluttered background environment, some noise remains in the frame even after a very good masking operation. Moreover, some areas of the ball may not show up under different lighting conditions due to the hue range. To address these issues, we apply morphological operations such as Gaussian blur, erosion, and dilation. Blurring is applied to remove random noise and smooth the edges. The eroding operation is applied to contract the foreground, and finally, the dilating operation is applied to expand the foreground. Figure 7 demonstrates the results after these operations, and as shown in Fig. 7(c), the GUI platform can find thresholds and mask the ball with high accuracy using this approach. This is an important requirement, as proper masking helps to track the ball and continuously orient the platform toward the throwing direction. After a successful masking operation for an environment, the GUI program can be executed for tracking and catching operations.
The selected HSV range values are passed to the Python program for leg control operations. Individual frames from the real-time video of the front camera are used for the tracking algorithm. First, the program takes the RGB and HSV frame data and uses a range function to blacken anything outside the HSV range. After that, OpenCV’s blur, erode, and dilate functions are used to get a noise-free frame that only “sees” the ball. To properly outline the ball area, the “cv2.findContours()” function is used, and the program selects the biggest contour and encloses the ball’s boundary in a circle. The center of the circle is detected and drawn using Python’s “imutils” library, as described by Rosebrock.
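The thresholding step at the heart of this pipeline can be reproduced with plain numpy. The real program uses OpenCV’s cv2.inRange; the sketch below applies the same per-channel range test to a tiny synthetic HSV frame (the frame contents and function name are ours, for illustration only).

```python
import numpy as np

def in_range(hsv_img, lower, upper):
    # Binary mask: 255 where every HSV channel lies inside [lower, upper],
    # 0 elsewhere -- the same contract as cv2.inRange
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((hsv_img >= lower) & (hsv_img <= upper), axis=-1)
    return (inside * 255).astype(np.uint8)

# Tiny synthetic frame: two "ball-colored" pixels (low hue, high saturation)
# among background pixels
frame = np.array([[[5, 200, 200], [90, 50, 50]],
                  [[8, 220, 180], [120, 30, 240]]], dtype=np.uint8)
mask = in_range(frame, lower=[0, 100, 100], upper=[15, 255, 255])
```

After this mask is built, the blur/erode/dilate steps described above clean up stray pixels before contour detection.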
Once the ball’s center is determined, the program computes the error function of the platform orientation system. Ideally, for tracking, the platform center should be aligned with the ball’s center. The main GUI continuously calculates this error and adjusts the platform orientation using the PID parameter values. The platform uses the same inverse kinematics data (from data.txt), which produces the required servo angles and maintains the two tilt angles (α and β). α is the pitch angle, which was held at 35 deg in our experiment, and β is the yaw angle, which tracks the ball based on the center line (Fig. 7(c)). As a result, when the ball is in a particular direction, the platform rotates in that direction and continuously moves along with the ball’s position. A video demonstration of this orientation system can be accessed online.3
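The yaw-tracking step can be sketched as follows: given the detected ball center, compute a candidate yaw angle β on the same 0.5-deg grid used by data.txt. The geometry here is our simplification for illustration; the actual program looks up precomputed servo angles for (α, β) rather than commanding β directly.

```python
import math

def yaw_toward(ball_center, frame_center):
    # Angle of the ball relative to the platform center in image coordinates
    # (note: image y grows downward, so the sign convention is a choice)
    dx = ball_center[0] - frame_center[0]
    dy = ball_center[1] - frame_center[1]
    beta = math.degrees(math.atan2(dy, dx)) % 360.0
    # Quantize to the 0.5-deg grid used by the precomputed data.txt table
    return round(beta * 2) / 2.0
```

With α held at 35 deg, each frame’s β value selects one row of the lookup table, and the three servo angles from that row reorient the plate toward the ball.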
The overall approach of the newly developed platform orientation algorithm with multi-threading is shown in Fig. 8.
3 Results and Discussion
In this section, we evaluate the catching performance of the proposed platform orientation system (two-camera system) and compare it with that of the original one-camera system. Three types of catching tests are performed: (1) catching from a ramp, (2) catching in a noisy environment, and (3) catching under different lighting conditions. Constant PID gains (KP = 8.7, KI = 0.075, and KD = 4) are used for all tests. The catching environments, statistics, and related discussions are presented in Secs. 3.1–3.3.
3.1 Ramp Test.
In this test, the robot catches a ping pong ball falling freely from a ramp. The ball travels a fixed distance on the ramp and then falls onto the catching surface, creating equal throwing speeds and bouncing effects across trials. The testing ramp is made of PVC pipes, and the traveling distance for the ball on the ramp is 254 mm (10 in.). The test setup releases the ball from 228.6 mm (9 in.) above the catching surface. A constant ramp inclination of 30 deg from the horizontal axis is used. No additional force was applied to the ball at release, and therefore only the ramp distance and inclination angle determine the release velocity, making it equal in all cases. The testing environment for catching the ball falling from the ramp is shown in Fig. 9. For the catching operation, the new system uses the additional camera to orient the platform towards the ball before it is released down the ramp, while the old system starts with a horizontal plate (α = 0) position.
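As a rough check (ours, not reported in the paper), the release speed implied by this geometry can be estimated from an energy balance, assuming a thin-walled hollow ball rolling without slipping and negligible friction losses:

```python
import math

g = 9.81                   # gravitational acceleration, m/s^2
d = 0.254                  # ramp travel distance, m (10 in.)
theta = math.radians(30)   # ramp inclination from the horizontal

# Energy balance: m*g*d*sin(theta) = (1/2)*m*v**2 + (1/2)*I*omega**2,
# with I = (2/3)*m*r**2 for a thin-walled hollow sphere and omega = v/r
# for rolling without slipping
v = math.sqrt(2 * g * d * math.sin(theta) / (1 + 2 / 3))
```

Under these assumptions, the ball leaves the ramp at roughly 1.2 m/s, which together with the 9-in. drop sets the impact speed the platform must absorb.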
A video demonstration of the ramp test for the original one-camera system can be accessed online,4 and a video demonstration of the ramp test for the system with platform orientation can be accessed online.5
Over ten trials, the new platform orientation system successfully caught the ping pong ball nine times. On the other hand, the one-camera (overhead camera) system caught the ball five times out of ten. Examples of failed and successful catching trials are given in Figs. 10 and 11. Figure 10 shows examples of four successful trials in which the ball dropped about 142.2 mm (5.6 in.) away from the disk center and advanced towards the center. In some of the trials, the ball stabilizes or reaches a steady state between 40 mm and 100 mm away from the center. This is because of the waviness and crevices of the catching surface. Since the ping pong ball is very light, it sometimes stops in a small crack very close to the center, and the motors move very little for further adjustments because the proportional voltage to move the surface becomes very small.
The V-shaped plot in Fig. 11 shows a failed trial to catch the ball by the robot with the platform orientation system. This figure shows that the ball moves relatively fast toward the center and overshoots in the other direction. As a result, the absolute distance from the surface center increases again, and finally, the ball falls off the surface. The distances from the surface center to the ball center are shown in millimeters and are obtained by converting pixel areas in the frame into areas in square millimeters.
3.2 Noise Test.
This section evaluates the catching performance in a noisy environment by applying the new thresholding techniques. Using the new GUI tools, masking can be done by picking appropriate minimum and maximum HSV values through slider bars. Our experiment shows that the new GUI interface can mask the ball very effectively in a noisy environment. The old GUI can also mask the ball, but it leaves some background noise, as shown in Fig. 14.
To create noise, we attached four rectangular pieces of orange tape to the catching surface. These tapes create a noisy environment and make the masking procedure very challenging. Figure 15 shows the thresholding capabilities of the new GUI interface. Here, the masking can eliminate all the background noise.
The testing setup in the noise test was similar to the ramp test setup, as shown in Fig. 9. However, the traveling distance for the ball on the ramp is 152.5 mm (6 in.), and the setup releases the ball from 152.5 mm (6 in.) above the catching surface without any additional force. The inclination angle of the ramp is 30 deg from the horizontal axis. A video demonstration of a failed catch by the original system in a noisy environment can be accessed online.6 Here, the ball was masked using the old GUI interface, and the catching surface shows erratic behavior due to background noise.
A video demonstration of the catching performance where the masking is done using the newly developed GUI interface can be accessed online.7
Using the new approach, the masking of the ball was very successful, and the robot was able to catch the ball in ten out of ten trials. The old one-camera system was able to catch it eight out of ten times. The failures of the old system are due to the noise increasing to the point where the robot could not tell the difference between the ball and the tape. In all of the trials using the old GUI thresholding, the catching surface becomes very erratic. Figure 16 shows successful catching performance in four trials where thresholding is done by the new GUI system.
Figure 17 shows four successful catching examples when thresholding is done by the old GUI. Although catching is successful in these cases, the catching surface vibrates significantly. By comparing Figs. 16 and 17, we can see that after the initial drop in the old system, the ball moves away from the center to a point where it may fall off the surface.
The erratic behavior of the old GUI in the noisy environment is shown in Fig. 18, which demonstrates two failing examples. Here, the distance from the ball center to the surface center continuously changes, i.e., the surface constantly vibrates and finally fails.
3.3 Light Level Test.
To determine the catching performance in different lighting conditions, we carried out a series of tests. In general, the light level directly affects the frame rate of the camera. An iOS app (lux light meter) for measuring the illuminance (lux level) is used in this test. Lux is defined as the illumination of a one-square-meter surface that is one meter away from a single candle (1 lumen). A typical bright office room has a 500 lux intensity level. We used different lux intensity levels (30–500 lux) in the light test by adjusting the room light with a dimmable lamp. The setup for releasing the ping pong ball was the same as in the noise test. In a significantly dimmed environment, when the lux intensity is less than 30, the frame rate of the overhead camera drops so low that both catching systems fail.
In this experiment, we found that both systems perform similarly at different light levels. Table 1 shows the successful catches by each system.
| LUXa level | Successful catches in five trials | |
| | Two-camera platform orientation system | One overhead camera system |
aLUX = illumination of a 1 m2 surface that is one meter away from a single candle (1 lumen).
4 Conclusions
The platform orientation system and the new GUI for thresholding presented in this study will facilitate further development of ball-catching robots. A catching robot can perform better if it can orient the platform towards the throwing direction before the ball is thrown. In this research, we have successfully used two onboard cameras with multi-threading to improve the catching performance. This approach significantly reduces the probability of the ball dropping off the surface and effectively catches high-bouncing-index balls. In our ramp test, we found that the new orientation platform can catch a ping pong ball nine out of ten times, while the old one can catch five out of ten times, i.e., the catching success rate improves from 50% to 90%. The new GUI uses minimum and maximum HSV values and demonstrates good thresholding and masking capabilities in a noisy environment, as shown in Figs. 6, 7, 14, and 15. This method is found to be more effective in picking out the small ball and filtering out the noise for two camera views with different backgrounds.
We found nearly identical performance in the light level test, i.e., the catching rates for the new and old systems are the same. This may be due to the low number of trials (five) in the light test.
In the future, we plan to improve the platform orientation system by adding 3D depth-sensing cameras. Another important route to explore is the ability to balance the ball through deep reinforcement learning-based control instead of PID control. We also plan to increase the number of trials for the light level test to obtain more statistically significant data.
Acknowledgment
The authors would like to thank Johan Link for making his ball-balancing PID project available on GitHub.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.
Nomenclature
- d = plate height above the motors, mm
- α = tilt angle (pitch), deg
- D = absolute distance between arm ends, mm
- vector perpendicular to the catching plate
- KP = proportional gain
- KI = integral gain
- KD = derivative gain
- k, m, t = motor angle constants
- A, B, C = end positions of motor arms A, B, and C (three-by-one position matrices), respectively
- u(t) = input to the system in the time domain
- e(t) = error in the time domain
- β = tilt angle (yaw), deg
- θA, θB, θC = motor angles, deg