## Abstract

Swarm robotic search aims at searching targets using a large number of collaborating simple mobile robots, with applications to search and rescue and hazard localization. In this regard, decentralized swarm systems are touted for their coverage scalability, time efficiency, and fault tolerance. To guide the behavior of such swarm systems, two broad classes of approaches are available, namely, nature-inspired swarm heuristics and multi-robotic search methods. However, the ability to simultaneously achieve efficient scalability and provide fundamental insights into the exhibited behavior (as opposed to exhibiting a black-box behavior) remains an open problem. To address this problem, this paper extends the underlying search approach in batch-Bayesian optimization to perform search with embodied swarm agents operating in a (simulated) physical 2D arena. Key contributions lie in (1) designing an acquisition function that not only balances exploration and exploitation across the swarm but also allows modeling knowledge extraction over trajectories and (2) developing its distributed implementation to allow asynchronous task inference and path planning by the swarm robots. The resulting collective informative path planning approach is tested on target-search case studies of varying complexity, where the target produces a spatially varying (measurable) signal. Notably, superior performance, in terms of mission completion efficiency, is observed compared to exhaustive search and random walk baselines as well as a swarm optimization-based state-of-the-art method. Favorable scalability characteristics are also demonstrated.

## 1 Introduction

Swarm robotic search is concerned with searching for or localizing targets in unknown environments with a large number of collaborative robots. There exists a class of search problems in which the goal is to find the source or target with maximum strength (often in the presence of weaker sources) and where each source emits a spatially varying signal. Potential applications include source localization of gas leakage [1], nuclear meltdown tracking [2], chemical plume tracing [3], and magnetic field and radio source localization [4,5]. In such applications, decentralized swarm robotic systems have been touted to provide mission efficiency, fault tolerance, and scalable coverage advantages [6–8] compared to sophisticated standalone systems. Decentralized search subject to a signal with unknown spatial distribution usually requires both task inference and planning, which must be undertaken in a manner that maximizes search efficiency and mitigates inter-robot conflicts. This in turn demands decision algorithms that are computationally light-weight (i.e., amenable to onboard execution) [9], preferably explainable [10], and scalable [11]—it is particularly challenging to meet these characteristics simultaneously.

In this paper, we perceive the swarm robotic search process to consist of creating/updating a model of the signal environment and deciding future waypoints thereof, so as to collectively find the target source (location with maximum signal strength) as fast as possible. Specifically, we design, implement, and test a novel decentralized algorithm founded on a Bayesian search formalism. This algorithm tackles the exploration/exploitation balance over trajectories (as opposed to over points, which is typical in non-embodied search) while allowing asynchronous decision-making. In this context, we also explicitly consider other constraints attributed to the embodiment of the search process, e.g., each robot’s speed and range constraints. The remainder of this section briefly surveys the literature on swarm search algorithms and converges on the contributions of this paper.

### 1.1 Swarm Robotic Search.

In time-sensitive search applications under complex signal distributions, a team of robots can broaden the scope of operational capabilities through distributed remote sensing, scalability, and parallelism (in terms of task execution and information gathering) [12]. The *multi-robot search* paradigm [11] uses concepts such as cooperative control, model-driven strategies [13], Bayesian filter by incorporating mutual information [14], strategies based on local cues [15], and uncertainty reduction methods [16]. Scaling these methods from the multi-robotic (<10 agents [11]) to the swarm-robotic level (10–100 agents) often becomes challenging in terms of online computational tractability.

A different class of approaches, dedicated to guiding the search behavior of larger teams, is based on nature-inspired *swarm intelligence* (SI) principles [17–19]. SI-based heuristics have been used to design algorithms both for search in non-embodied *n*-dimensional space (e.g., particle swarm optimization) and for swarm robotic search [20,21]. The majority of the latter methods are targeted at localizing a single source [9,22]. A notable exception is the Glowworm optimization-based algorithm reported by Krishnanand et al. [18]. This approach was shown to handle multi-modal source localization by assuming robots are initially distributed in the search space, with its effectiveness relying on the usage of adaptive parameters (e.g., changing inertia weight) [22]. The localization of the maximum strength source in the presence of other weaker sources (i.e., given a multi-modal spatial signal-distribution), without making assumptions such as distributed starting points, remains a challenging problem.

*Translating optimization processes*: Similar in principle to some SI approaches, here we aim to translate an optimization strategy [23], namely, Bayesian optimization, to perform search in the physical 2D environment. In doing so, it is important to appreciate two critical differences between these processes: (1) *Movement cost:* unlike optimization, in swarm robotic search, moving from one point to another may require a different energy/time cost depending upon the environment (distance, barriers, etc.) separating the current and next waypoints. (2) *Sampling over paths:* robots usually gather multiple samples (signal measurements) over the path from one waypoint to the next (as sampling frequency $\gg$ waypoint frequency), unlike in optimization, where samples are taken only at the next planned point. This “sampling over paths” characteristic has received minimal attention in existing SI-based approaches.

Moreover, with SI-based methods, the resulting *emergent* behavior, although often competitive, raises questions of dependability (due to the use of heuristics) and mathematical explainability [24]. The search problem can be thought of as comprising two main steps: task inference (identifying/updating the signal spatial model) and task selection (waypoint planning). In SI methods, the two steps are not separable, and a spatial model is not explicit. In our proposed approach, the processes are inherently decoupled—robots exploit Gaussian processes (GPs) to model the signal distribution knowledge (task inference) and solve a 2D optimization over a special acquisition function to decide waypoints (task selection). Such an approach is expected to provide explainability, while preserving computational tractability.

### 1.2 Objective of This Paper.

This paper is an extension of our recent work presented in the ASME 2019 IDETC/CIE conference [25]. In this paper, we develop an explainable, decentralized, and asynchronous swarm robotic search algorithm, subject to the following assumptions: (i) all robots are equipped with precise localization and (ii) each robot can communicate its knowledge, state, and decisions with all neighbors (full observability) at waypoints. In asynchronous decision-making, agents/robots make decisions in an event-driven scheme, as opposed to a synchronous approach where all robots need to make decisions at fixed intervals. Asynchronous decision-making is critical in most real-world settings, due to the presence of stochastic action effects and imperfect and unreliable communication [26]. In addition, it has been shown that asynchronous parallel sampling in Bayesian optimization (the motivating algorithm behind our proposed search method) can improve the optimization progress in comparison to synchronous implementations [27]. Decentralized decision-making here relates to how each swarm robotic agent independently plans its immediate future waypoint.

Within this context, the primary contributions of this paper lie in the following developments: (1) a novel decentralized algorithm (*Bayes-Swarm*) that extends Gaussian process modeling (to update over trajectories) and integrates physical robot constraints and other robots’ decisions to perform informative path planning—simultaneously mitigating knowledge uncertainty and getting closer to the source—and (2) a simulated parallelized implementation of *Bayes-Swarm* to allow asynchronous search planning over complex multi-modal signal distributions.

The remainder of the paper is organized as follows: the next section presents the problem definition and GP modeling. Then, our proposed decentralized algorithm (*Bayes-Swarm*) is described. Numerical experiments and results, covering the performance of these methods on swarms of different sizes and a parametric analysis of the proposed decentralized method, are then presented. The paper ends with concluding remarks.

## 2 Background

### 2.1 Gaussian Process Model.

Given *n* observations of an environment, $D = \{\mathbf{x}_i, y_i \mid i = 1, \ldots, n\}$, we assume that the observed values *y* differ from the function values *f*(**x**) by an additive noise $\epsilon$, where **x** denotes an input vector, i.e., $y = f(\mathbf{x}) + \epsilon$.

The function *f*(**x**) can be estimated by a GP with mean function *μ*(**x**) and covariance kernel $\sigma^2(\mathbf{x})$, which are defined in terms of the kernel matrix (**K**)_{ij} = *k*(**x**_{i}, **x**_{j}) and the vector **k**_{n}(**x**) = [*k*(**x**_{1}, **x**), …, *k*(**x**_{n}, **x**)]^{T}. In this paper, the hyper-parameters of the GP model are optimized by maximizing the log-likelihood *P* as a function of $\beta$, $\theta$, and $\sigma_n^2$.
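As an illustration of how the quantities **K** and **k**_{n}(**x**) enter a GP prediction, the following is a minimal sketch of the textbook GP posterior. It assumes a zero prior mean and a squared-exponential kernel with a fixed length scale; both are assumptions made for illustration, and the function names `sq_exp_kernel` and `gp_posterior` are hypothetical. It is not the model used in this paper, whose hyper-parameters ($\beta$, $\theta$, $\sigma_n^2$) are fitted by maximum likelihood.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel k(a, b); the kernel choice is an assumption."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(X, y, X_query, noise_var=1e-4, length_scale=1.0):
    """Textbook zero-prior-mean GP posterior mean and variance at X_query."""
    K = sq_exp_kernel(X, X, length_scale)          # (K)_ij = k(x_i, x_j)
    k_n = sq_exp_kernel(X, X_query, length_scale)  # column j holds k_n(x_j)
    A = K + noise_var * np.eye(len(X))             # noisy kernel matrix
    mean = k_n.T @ np.linalg.solve(A, y)
    # k(x, x) = 1 for this kernel, so the prior variance is 1 everywhere.
    var = 1.0 - np.sum(k_n * np.linalg.solve(A, k_n), axis=0)
    return mean, np.maximum(var, 0.0)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    y = np.array([1.0, 2.0, 3.0])
    mean, var = gp_posterior(X, y, np.array([[0.5, 0.5]]))
    print(mean, var)
```

Near the observed locations the predicted variance shrinks toward the noise level, while far from all observations it returns to the prior variance; this is the uncertainty that an exploration term can target.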

## 3 Swarm Bayesian Algorithm

### 3.1 Bayes-Swarm: Overview.

The robot behaviors including its motion, communication, and decision-making are illustrated in Fig. 1 and the pseudocode of our proposed decentralized *Bayes-Swarm* algorithm is depicted in Algorithm 1. Each robot in a team of size *N*_{r} is assumed to run the *Bayes-Swarm* algorithm at each decision-making step (i.e., after reaching its waypoint) to take the best action by maximizing an acquisition function that guides the team to the source location over the course of the operation. Importantly, these decision-making instances need not be synchronized across robots, unlike several other existing decentralized implementations. Before elaborating on the mathematical formulation of each component of the Bayes-Swarm algorithm, we provide here a brief description of how the overall algorithm works, using Fig. 1 as reference. At the beginning of the mission, the robots do not have any observations from the environment, and thus no belief model to follow (as the default setting); prior knowledge, if available in a practical application, can however be readily incorporated as a prior belief model in our formulation. By default, in the “Select First Waypoint” block in Fig. 1, each robot chooses a waypoint such that the heading directions of the team are somewhat uniformly distributed over the domain of interest. Then, each robot shares its decision with its peers and starts moving toward the planned waypoint. During its movement, each robot takes location-tagged signal measurements (observations) at a fixed sampling rate. Once the robot reaches the planned waypoint, it runs a check for ending the mission (based on algorithm termination criteria), and if unsatisfied, it proceeds to decide its next waypoint. This generic planning process involves four major sub-steps, represented by the four blocks inside the “Waypoint Planning” block in Fig. 1. 
First, the robot combines its own recent observations with the recent observations received (over a wireless network) from its peers, and then down-samples the new data set for preserving the tractability of onboard updating of the GP model. Subsequently, it uses the new data set to update its GP-model-based acquisition function. The next waypoint is then determined by maximizing this acquisition function, subject to certain range constraints. Each robot then creates an information packet (“Prepare Packet” block in Fig. 1) comprising a downsampled version of its recent observations and its decided next waypoint, and broadcasts that information to all peers, before moving to its next waypoint. This procedure is then repeated until the target of interest is found or other mission termination criteria (e.g., maximum endurance of robots) are reached.

### 3.2 Acquisition Function.

For swarm robotic search, it is important to design an acquisition function that accounts for the characteristics of the embodied search process, i.e., where (i) data are collected and uncertainty is reduced over trajectories and not over separate points in the domain of interest and (ii) robots can only travel finite distances constrained by their maximum speed over a given time-step. Here, we design our own acquisition function partly motivated by the work of Morere et al. [30]. However, in the future, there remains opportunity to translate other well-known acquisition functions such as GP-upper confidence bound (GP-UCB) [31] and q-expected improvement (q-EI) [32] from the Bayesian optimization (BO) and batch-BO domain [23] to suit the needs of swarm robotic search. Below, we describe our unique design of the acquisition function, and how it is used by each robot for waypoint planning.

Each robot *r* solves an optimization problem based on its information ($D^{1:k_r}$ and $\hat{X}_{-r}^{k_r}$), including its self-observations and its peers’ shared observations from the beginning of the mission until the decision time *k*_{r} ($D^{1:k_r} = \bigcup_{r=1}^{N_r} \bigcup_{i=1}^{k_r} D_r^i$; $D_r^i = [X_r^i, y_r^i]$) and the current local peers’ next waypoints ($\hat{X}_{-r}^{k_r} = \bigcup_{p=1; p \neq r} \hat{X}_{-r_p}^{k_r}$). For the *r*th robot, our mathematical formulation of the acquisition function (Eq. (8)) combines two terms: the first term, *h*_{r}(.), leads robot *r* to the expected location of the source (exploitation), and the second term, *g*_{r}(.), minimizes the knowledge uncertainty of robot *r*. The coefficient α ∈ [0, 1] represents the exploitation weight, i.e., α = 1 would lead to purely exploitative behavior. The length (*l*_{s}) of the path *s* is bounded based on the decision-horizon *T* and the nominal velocity of the robots (*V*). The individual terms of the acquisition function are described next.
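The role of the exploitation weight α and of the path-length bound can be illustrated with a small sketch. It assumes that the two terms are combined as a weighted sum, α·*h* + (1 − α)·*g*, which matches the description of α but is stated here as an assumption. It also assumes straight-line paths and a finite set of candidate waypoints in place of a continuous optimizer. The callables `h` and `g` are hypothetical stand-ins for the source-seeking and knowledge-gain terms, and `select_waypoint` is a hypothetical name.

```python
import numpy as np


def select_waypoint(x_current, candidates, h, g, alpha, V, T):
    """Return the reachable candidate maximizing alpha*h + (1 - alpha)*g.

    h, g : hypothetical callables mapping a 2D point to a scalar score.
    V, T : nominal velocity and decision horizon; path length must be <= V*T.
    """
    x_current = np.asarray(x_current, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Straight-line path length l_s from the current location (assumption).
    path_len = np.linalg.norm(candidates - x_current, axis=1)
    feasible = candidates[path_len <= V * T]
    if feasible.shape[0] == 0:
        return x_current  # nothing reachable: stay in place
    scores = np.array([alpha * h(x) + (1.0 - alpha) * g(x) for x in feasible])
    return feasible[int(np.argmax(scores))]


if __name__ == "__main__":
    source_guess = np.array([1.0, 0.0])
    h = lambda x: -np.linalg.norm(x - source_guess)  # closer to the guess is better
    g = lambda x: x[1]                               # placeholder knowledge gain
    cands = [[0.3, 0.0], [0.0, 0.2], [2.0, 0.0]]
    print(select_waypoint([0.0, 0.0], cands, h, g, alpha=1.0, V=0.1, T=4))
    print(select_waypoint([0.0, 0.0], cands, h, g, alpha=0.0, V=0.1, T=4))
```

With α = 1 the sketch picks the reachable candidate nearest the current source estimate, and with α = 0 it follows the placeholder knowledge-gain score alone; the candidate at distance 2 is never selected because it lies beyond the reach *VT*.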

#### Algorithm 1: The Bayes-Swarm algorithm

**Input:** $GP_r$, $x_r$, $X_{-r}^{k_r}$: the current location and recent observations of the robot ($x$), and the next waypoints of its peers ($X_{-r}^{k_r}$).

**Output:** $x_r^{k_r+1}$: the next waypoint of robot-*r* at its iteration $k_r$.

1: **procedure** takeDecision$(r, k_r, N_r, \Delta\theta)$

2: **if** $k_r = 0$ **then**

3: $x_r^{k_r} \leftarrow$ takeFirstDecision$(r, k_r, N_r, \Delta\theta)$

4: **else**

5: **if** size of $D_r^{k_r} > N_{\max}$ **then** ⊳ $N_{\max} = 400$

6: Down-sample $D_r^{k_r}$ to $N_{\max}$ observations

7: $x_r^{k_r} \leftarrow$ solution of the optimization, Eq. (8)

8: $k_r \leftarrow k_r + 1$

9: **return** $x_r^{k_r}, k_r$

10: **procedure** takeFirstDecision$(r, N_r, \Delta\theta, V, T)$

11: $d \leftarrow VT$

12: **if** $\Delta\theta = 360$ **then** ⊳ $\Delta\theta$: initial feasible direction range

13: $\theta \leftarrow r\Delta\theta / N_r$

14: **else**

15: $\theta \leftarrow r\Delta\theta / (N_r + 1)$

16: $x_r^1 \leftarrow [d\cos\theta, d\sin\theta]$

17: **return** $x_r^1$
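The takeFirstDecision procedure is simple enough to sketch directly. The version below assumes that Δθ is given in degrees (suggested by the comparison with 360), that robots are indexed *r* = 1, …, *N*_{r}, and that the returned waypoint is expressed relative to the common starting point; the function name `take_first_decision` is hypothetical.

```python
import math


def take_first_decision(r, n_robots, delta_theta_deg, V, T):
    """First waypoint of robot r: headings spread over the feasible range.

    delta_theta_deg is the initial feasible direction range, assumed to be
    in degrees. V is the nominal velocity and T the decision horizon.
    """
    d = V * T  # distance covered within one decision horizon
    if delta_theta_deg == 360:
        # Full circle: N_r headings evenly spaced around the start.
        theta_deg = r * delta_theta_deg / n_robots
    else:
        # Partial range: keep headings strictly inside the range.
        theta_deg = r * delta_theta_deg / (n_robots + 1)
    theta = math.radians(theta_deg)
    return (d * math.cos(theta), d * math.sin(theta))


if __name__ == "__main__":
    for r in range(1, 5):
        print(r, take_first_decision(r, 4, 360, V=0.1, T=4))
```

With four robots and a 360 deg range, the headings are 90, 180, 270, and 360 deg. With a 90 deg range and two robots, the headings are 30 and 60 deg, so no robot travels along the boundary of the feasible range.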

### 3.3 Source Seeking Formulation.

### 3.4 Knowledge-Gain Formulation.

### 3.5 Information Sharing.

Inter-robot communication is a key element of any swarm system, and robots often need to communicate with each other over an ad hoc wireless network in outdoor applications. However, given the bandwidth limitations and the energy footprint of ad hoc wireless communication [33], it is typically desirable to reduce the communication burden. To this end, in our proposed method, the decision-making is allowed to be asynchronous and robots share only a down-sampled set of observations. Table 1 provides a quick overview of the type and frequency of the information shared by each robot with all its peers across the swarm. Algorithm 2 lists two procedures that each robot uses to share or receive information. Robots then proceed to individually update their respective knowledge models based on their own information and the future plans of their peers. Having presented an overview of the *Bayes-Swarm* method, the next section introduces its distributed virtual implementation, the case studies developed to test the performance of *Bayes-Swarm*, and the corresponding implementation settings that we used.

Property | Description |
---|---|
Inter-robot communication frequency | After each waypoint planning instance |
Content of transmitted data | (1) Its next location to visit ($x_r^{k_r}$) and (2) its observations over the last path ($D_r^{k_r}$) |
Average size of outgoing data packets (with time-horizon 1 min) | 364 bytes |


### Algorithm 2: Communication procedures

1: **procedure** receiveInformation$(r, p, x_p^{k_p}, D_p^{k_p})$

2: $D_r^{1:k_r} \leftarrow D_r^{1:k_r} \bigcup D_p^{k_p}$

3: $\hat{X}_{-r_p}^{k_r}(1:2) \leftarrow \hat{X}_{-r_p}^{k_r}(3:4)$

4: $\hat{X}_{-r_p}^{k_r}(3:4) \leftarrow x_p^{k_p}$

5: **return** $D_r^{1:k_r}, \hat{X}_{-r_p}^{k_r}$

6: **procedure** sendInformation$(r, x_r^{k_r}, D_r^{k_r})$

7: **if** $k_r = 0$ **then**

8: Broadcast $x_r^{k_r}$ ⊳ 4 bytes

9: **else**

10: Broadcast $\{x_r^{k_r}; D_r^{k_r}\}$ ⊳ $4 + 6T$ bytes
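The two communication procedures can be sketched in a few lines. The sketch below assumes that observations are kept in a plain Python list and that the peer’s plan is stored as a four-element list holding the previous waypoint followed by the latest one (mirroring the 1:2 and 3:4 index ranges in the listing). The names `packet_size_bytes` and `receive_information` are hypothetical.

```python
def packet_size_bytes(k_r, T):
    """Outgoing packet size as annotated in the listing: 4 or 4 + 6*T bytes."""
    return 4 if k_r == 0 else 4 + 6 * T


def receive_information(D_r, peer_plan, x_p, D_p):
    """Merge a peer's observations and shift its stored waypoints.

    D_r       : list of this robot's accumulated observations
    peer_plan : [prev_x, prev_y, next_x, next_y] stored for peer p
    x_p, D_p  : peer's newly planned waypoint and its recent observations
    """
    merged = D_r + list(D_p)  # union of data sets
    # Entries (3:4) move to (1:2); the new waypoint is stored in (3:4).
    shifted = [peer_plan[2], peer_plan[3], x_p[0], x_p[1]]
    return merged, shifted


if __name__ == "__main__":
    print(packet_size_bytes(k_r=1, T=60))  # 364, matching the 1 min entry in Table 1
    data, plan = receive_information(
        [(0.0, 0.0, 1.2)], [0.0, 0.0, 0.4, 0.0], (0.8, 0.1), [(0.4, 0.0, 1.5)]
    )
    print(data, plan)
```

For a 1 min horizon (*T* = 60), the packet size 4 + 6*T* evaluates to 364 bytes, which agrees with the average packet size reported in Table 1.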

### 3.6 Downsampling Collective Swarm Observations.

In order to keep the Bayes-Swarm algorithm scalable and computationally tractable, we need to downsample the collective data set of the swarm (i.e., the observations made by the agents). This is because updating the GP models has a cubic time complexity (*O*(*n*^{3})) with respect to the size (*n*) of the data set. In this work, we use a simple downsampling approach, known as sample rate compression by an integer factor *M* [34]. This approach reduces the data set by keeping the first sample and then every *M*-th sample after the first, where $M = \lceil \mathrm{size}(D^{1:k_r}) / N_{\max} \rceil$.
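The integer-factor compression described above can be sketched as follows, assuming the observations are held in a time-ordered Python sequence; the function name `downsample` is hypothetical.

```python
import math


def downsample(observations, n_max=400):
    """Keep the first sample and every M-th sample after it.

    M = ceil(len(observations) / n_max), so at most n_max samples remain.
    """
    n = len(observations)
    if n <= n_max:
        return list(observations)
    M = math.ceil(n / n_max)
    return list(observations[::M])


if __name__ == "__main__":
    kept = downsample(list(range(1000)), n_max=400)
    print(len(kept), kept[:4])
```

Because *M* is rounded up, the retained set can be noticeably smaller than $N_{\max}$: with 1000 observations and $N_{\max} = 400$, the factor is *M* = 3 and 334 samples are kept.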

## 4 Numerical Experiments and Case Studies

### 4.1 Distributed Virtual Implementation of Bayes-Swarm.

In order to enable a better representation of the distributed planning process embodied by a physical swarm of robots, we develop a simulated environment that provisions a parallel computing deployment of *Bayes-Swarm*. This uses MATLAB’s parallel programming capabilities to invoke 40 dedicated threads. Each robot operates (the behavior illustrated in Fig. 1) in parallel with respect to the rest of the swarm, updating its own knowledge model after each waypoint and deciding its own next waypoint. The entire process is simulated in a virtual environment developed with MATLAB R2017b and is executed on a workstation with an Intel^{®} Xeon Gold 6148 processor (27.5M cache, 2.40 GHz, 20 cores) and 196 GB RAM. The simulation time-step is set at 1 ms. *Robot settings*: we set the velocity of each swarm robot at 10 cm/s based on the specifications of e-puck 2 [35]. The observation frequency is set at 1 Hz. To keep the computational complexity of refitting the GP low, the size of the data set ($D_r^{1:k_r}$) used by each robot is downsampled to 400 (i.e., when it grows beyond 400 in the latter stages of the mission).

### 4.2 Case Studies.

We design and execute a set of numerical experiments to investigate the performance of the proposed decentralized *Bayes-Swarm* approach. To provide insight into the *Bayes-Swarm* algorithm, several types of tests are conducted across the case studies, and the results are evaluated and compared in terms of completion time, cost incurred by robots, knowledge-gain per robot, and mapping error. Mapping error measures how far the response surface estimated using the GP deviates from the actual response surface of the source, in terms of the root-mean-square-error metric. The experiments are described next. *Experiment 1:* a parametric analysis is conducted to study how the exploitation coefficient of *Bayes-Swarm* affects its performance. *Experiment 2:* a scalability analysis is conducted to investigate the performance of *Bayes-Swarm* across multiple swarm sizes. *Experiment 3:* *Bayes-Swarm* is run using the default values listed in Table 2 to analyze its performance in response to different single- and multi-modal spatial distributions of signal strength, and the results are compared with those of standard *exhaustive search* and *random walk* methods. *Experiment 4:* the performance of *Bayes-Swarm* is compared with that of the glowworm optimization algorithm proposed by Krishnanand et al. [18], tested on case study 5.
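The mapping-error metric can be made concrete with a short sketch. It assumes that the estimated and actual response surfaces are compared on a finite set of evaluation points (for example, a regular grid over the arena); the evaluation set and the function name `mapping_error` are assumptions made for illustration.

```python
import numpy as np


def mapping_error(estimated, actual, points):
    """Root-mean-square error between two surfaces over evaluation points.

    estimated, actual : callables mapping a 2D point to a signal value
    points            : iterable of 2D points (e.g., a grid over the arena)
    """
    est = np.array([estimated(p) for p in points], dtype=float)
    act = np.array([actual(p) for p in points], dtype=float)
    return float(np.sqrt(np.mean((est - act) ** 2)))


if __name__ == "__main__":
    grid = [(x, y) for x in np.linspace(-1, 1, 5) for y in np.linspace(-1, 1, 5)]
    true_signal = lambda p: np.exp(-(p[0] ** 2 + p[1] ** 2))
    flat_model = lambda p: 0.5
    print(mapping_error(flat_model, true_signal, grid))
```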

Case study | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Bayes-Swarm | 500 | 100 | 500 | 700 | 100 |
Random-walk | 4000 | 50,000 | 60,000 | 60,000 | 10,000 |


To conduct the first three experiments stated above, five distinct case studies are defined, each corresponding to a different spatial distribution of the signal strength, as shown in Fig. 2. The first case study is a large convex source signal distribution and the rest of the case studies are non-convex multi-modal signal distributions (involving multiple signal sources). Case study 4 is expected to be the most challenging case as it contains one global maximum (target source) and five local maxima (weaker sources) in a large arena. Case study 5, adopted from Ref. [36], contains one global maximum (target source) and two local maxima (weaker sources).

In this paper, *Bayes-Swarm* utilizes two termination criteria during operation. The primary criterion terminates the search if any robot arrives within the $\epsilon$-vicinity of the source signal location. In addition, *Bayes-Swarm* terminates if the operation reaches a maximum allowed search time (*T*_{max}). The distance threshold $\epsilon$ is set at 5 cm and the maximum search time *T*_{max} is outlined for each case study in Table 2. The decision-time horizon (*T*) is set at 4 s for the first decision-making step; then it changes to 10 s for the later decision-making steps.
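The two termination criteria can be expressed as a single check, sketched below. It assumes positions in meters (so the 5 cm threshold becomes 0.05) and that the simulator, which knows the true source location, performs the check; the function name `mission_finished` and the default `t_max` value are illustrative assumptions.

```python
import math


def mission_finished(robot_positions, source, t, eps=0.05, t_max=500.0):
    """True if any robot is within eps of the source or the time budget is spent.

    robot_positions : iterable of (x, y) positions in meters
    source          : (x, y) of the target source (known to the simulator)
    t, t_max        : elapsed and maximum allowed search time in seconds
    """
    reached = any(math.dist(p, source) <= eps for p in robot_positions)
    return reached or t >= t_max


if __name__ == "__main__":
    print(mission_finished([(0.0, 0.0), (1.0, 1.02)], (1.0, 1.0), t=10.0))
    print(mission_finished([(0.0, 0.0)], (1.0, 1.0), t=10.0))
```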

### 4.3 Demonstrating Bayes-Swarm: Case Study 2.

Figure 3 depicts four snapshots of the *Bayes-Swarm* algorithm for case study 2 with four robots and α = 0.4. The figure shows how the estimated knowledge model and its uncertainty improve as the robots explore the search space. The top figures show the uncertainty map ($\sigma(x)$) and the bottom figures show the robot location and its knowledge state (dashed contours). In the bottom figures, the gray solid contours represent the actual source signal (ground truth) and the gray dashed contours represent the source signal (knowledge) model of a robot at the stated time point. Blue solid lines show the paths that robots have already traveled and whose observations have been shared with all peers, assisting the refitting of their knowledge models. Red solid lines show the paths that have been traveled but whose observations have not yet been shared with peers. Red dashed lines represent the paths that have been planned but not yet traveled.

From Figs. 3(a)–3(e), it can be seen that when robot 1 reaches its first waypoint, only four self-observations are available to it; hence, it is able to build only a relatively inaccurate knowledge model (one that gives the expected location of the source at (1.6, 1.0), which is in reality far away from both of the actual sources). When the last robot (robot 4) makes its decision, it has its peers’ observations at *t* = 4^{+} s. The knowledge model (Fig. 3(f)) is still inaccurate, but the uncertainty map (Fig. 3(b)) is improved. After 26 s (Figs. 3(c) and 3(g)), the robots are able to converge to a fairly accurate knowledge model of the signal distribution, and their future updates and planning (seen in Figs. 3(d) and 3(h)) put two robots in the team within the threshold of the source location at time *t* = 54 s.

## 5 Results and Discussion

### 5.1 Experiment 1: Parameter Analysis of Bayes-Swarm.

In the proposed decentralized method, one major parameter needs to be prescribed or tuned: the exploitation coefficient α, which regulates the balance between exploration and exploitation. We run an experiment to study how this parameter (α varying from 0 to 1) affects the performance of *Bayes-Swarm* for case studies 2 and 4, across multiple swarm sizes. Snapshots of the final state of the robots for three values of α in case study 2 with four robots are depicted in Fig. 4. The performance outcomes in terms of completion time and mapping error are summarized in Figs. 5 and 6.

*Pure source seeking (α = 1):* One extreme case occurs when the knowledge-gain term is eliminated from the objective function by setting the exploitation coefficient at α = 1. In this mode, robots try to reach the expected source location as fast as possible without exploring the area to gain enough knowledge, which amounts to a purely greedy approach. Figure 4(c) illustrates the behavior of robots under this setting: the estimated source signal or knowledge model is quite inaccurate due to the lack of explorative search.

*Only knowledge-gain term (α = 0):* By setting α = 0, the objective function (Eq. (8)) is reduced to the knowledge-gain term (Eq. (11)), which results in purely explorative search. As expected, under this setting, robots are able to estimate a relatively accurate model of signal distribution (Fig. 4(a)). This mode is suited for mapping applications, such as mapping offshore oil spills [12].

*Combined source seeking and knowledge-gain terms—different trade-offs (0 < α < 1):* By setting the exploitation coefficient α at values between 0 and 1, we can tune the degree of exploration and exploitation of the swarm search. Figures 5(b) and 6(b) show that by increasing the exploitation coefficient from 0 to 1, the mapping error increases, especially for α values beyond 0.3. Figure 4(b) depicts the search behavior of the swarm for α = 0.4. In this setting, one robot successfully reaches the source location while other robots are still exploring the search area. Depending on the complexity of the source signal distribution, the effect of exploitation coefficient parameter on the estimation of the knowledge model will vary.

In terms of completion time, the complexity of the source signal distribution and the initial path of robots play important roles. In case study 2, the impact of α on completion time varies with the size of the robot team (Fig. 5(a)). In case study 4, we can see from Fig. 6(a) that Bayes-Swarm with α > 0.04 is not able to lead the robots to find the target/source within the maximum allowed time (700 s). In order to get the best performance, the exploitation coefficient (α) needs to be less than 0.02. This is attributed to the need for greater exploration in a multimodal environment. In summary, for choosing the correct value of α to get the best performance, we need to consider the number of robots, the complexity of the source signal distributions, and the robots’ capabilities.

### 5.2 Experiment 2: Scalability Analysis of Bayes-Swarm.

In this experiment, we use case study 4 to perform an analysis of how the size of the robot swarm impacts *Bayes-Swarm*’s performance. To this end, we run *Bayes-Swarm* simulations with α = 0.4 and swarm sizes varying from 2 to 100. Figure 7 illustrates the results of this analysis in terms of the completion time, averaged knowledge-gain of each robot ($\bar{g}(x)$), averaged number of decisions per robot ($\bar{N}_d$), and mapping error. The results show that the performance improves by increasing the size of the swarm from 2 to 100, with completion time reducing by ∼41.3%. Moreover, the averaged number of decisions (waypoint planning instances) per robot and the averaged knowledge-gain per robot, respectively, decrease by about 64% and 83.3% when the swarm size grows from 2 to 100. Although the mapping error with 100 robots is 16.6% less than the mapping error with 2 robots, increasing the number of robots does not universally improve the mapping error, as evident from the non-monotonic trend seen in the top right plot of Fig. 7 (unless α is tuned based on the size of swarm).

To summarize the observations made from Fig. 7, increasing the size of swarms becomes increasingly effective for complex signal distribution environments. However, beyond a certain swarm size (∼20 in this analysis), there is a decreasing rate of improvement. These observations provide strong evidence of the scalability of the *Bayes-Swarm* method. At the same time, they highlight the importance of identifying suitable team sizes for suitable mission profiles, given resource constraints and time sensitivity of the mission.

### 5.3 Experiment 3: Comparative Analysis With Baselines.

*Exhaustive search* and *random-walk* algorithms are implemented along with *Bayes-Swarm* for comparative analysis. We test these algorithms to find the source location in the five case studies, illustrated in Fig. 2. The settings of *Bayes-Swarm* are not individually tuned for each case, in order to allow fair comparison; the exploitation coefficient is set at 0.4 and *T* at 4 s. Table 3 summarizes the results of this experiment in terms of the completion time. In this experiment, the maximum allowed search time for random-walk is adjusted to 1.5 times of what is needed by exhaustive search for each case study environment. In case study 4, we partition the arena into four parts and each robot searches one part using the exhaustive search method. Note that, in this table, we only report the best performance across five runs of the random-walk method for each case.

Case study | Algorithm | Total time^{a} (s) | Success rate |
---|---|---|---|
1 | Bayes-Swarm | 246.1 | 1/1 |
1 | Random-walk | 20,394 | 1/5 |
1 | Exhaustive search | 22,174 | 1/1 |
2 | Bayes-Swarm | 42.5 | 1/1 |
2 | Random-walk | 227.6 | 5/5 |
2 | Exhaustive search | 225.3 | 1/1 |
3 | Bayes-Swarm | 260.1 | 1/1 |
3 | Random-walk | – | 0/5 |
3 | Exhaustive search | 22,174 | 1/1 |
4 | Bayes-Swarm | 373.2 | 1/1 |
4 | Random-walk | – | 0/5 |
4 | Exhaustive search | 9163^{b} | 1/1 |
5 | Bayes-Swarm | 31.9 | 1/1 |
5 | Random-walk | – | 0/5 |
5 | Exhaustive search | 992^{b} | 1/1 |


^{a} As not all random-walk runs are able to find the source, we only report the total time of the best solution obtained using the random-walk.

^{b} For this case, we divide the search space into four equal quarters and each robot does an exhaustive search in each portion (two in the global portion).

The results show that the *Bayes-Swarm* algorithm performs significantly better than the exhaustive search and random-walk approaches in all the five case studies. Due to the complexity of some of the search environments, the random-walk method often fails to find the source location within the allowed maximum search time, as evident from its poor success rate in cases 1, 3, 4, and 5. Table 3 shows that *Bayes-Swarm* finds the primary source location about 5–100 times faster than exhaustive search in all five cases. As the random-walk reaches the goal only in the first two case studies, we compare *Bayes-Swarm* with the random-walk method only in these case studies; *Bayes-Swarm* is observed to perform 83 and 5 times faster than the random-walk method in case studies 1 and 2.
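The speed-up factors quoted above follow directly from the completion times in Table 3. The short script below reproduces that arithmetic; the dictionary layout and the `speedup` helper are only an illustrative way of organizing the tabulated values.

```python
# Completion times in seconds, copied from Table 3.
bayes_swarm = {1: 246.1, 2: 42.5, 3: 260.1, 4: 373.2, 5: 31.9}
exhaustive = {1: 22174, 2: 225.3, 3: 22174, 4: 9163, 5: 992}
random_walk = {1: 20394, 2: 227.6}  # only the cases with a successful run


def speedup(baseline, case):
    """Ratio of a baseline's completion time to that of Bayes-Swarm."""
    return baseline[case] / bayes_swarm[case]


exhaustive_speedups = {c: speedup(exhaustive, c) for c in bayes_swarm}
random_walk_speedups = {c: speedup(random_walk, c) for c in random_walk}

if __name__ == "__main__":
    for case, s in exhaustive_speedups.items():
        print(f"case {case}: {s:.1f}x faster than exhaustive search")
    for case, s in random_walk_speedups.items():
        print(f"case {case}: {s:.1f}x faster than random-walk")
```

The exhaustive-search ratios range from roughly 5 (case study 2) to roughly 90 (case study 1), and the random-walk ratios round to 83 and 5, consistent with the figures stated in the text.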

### 5.4 Experiment 4: Comparing Bayes-Swarm With a Swarm Intelligence Method.

To further compare Bayes-Swarm with a state-of-the-art method, the well-known glowworm-based swarm search algorithm [18] is chosen. Specifically, we use the implementation of the glowworm algorithm available at Ref. [37]. For this analysis, both algorithms are run on case study 5 (the first problem in Ref. [36], further described in Appendix A, Case Study 5), with the same robot specifications and environment settings as in Ref. [36]. It should be noted that there are two main differences between the generic mission objectives of the two algorithms: (1) Bayes-Swarm is designed to find the source with maximum signal strength in the presence of other weaker (say, decoy) sources, while the glowworm algorithm is designed to find all local and global sources (both mission objectives can translate to important practical applications in the emergency response and defense domains). (2) The glowworm algorithm assumes the robots to be initially distributed in the search arena, while Bayes-Swarm makes no such assumption. With regard to the first difference, we compare the time Bayes-Swarm takes to find the global target source (the one with maximum signal strength) with the time the glowworm algorithm takes to find the first source (i.e., any source, local or global); Bayes-Swarm's task is thus at least as difficult, and likely more so. The second difference, concerning starting points, is readily handled, since Bayes-Swarm is agnostic to the initial location of the robots.

We assume 50-robot teams and randomly generate the initial locations of the robots in the arena, to be used by both methods, within −3 ≤ *x*_{1} ≤ −1.2 and −3 ≤ *x*_{2} ≤ 3. Based on the settings used in the reported glowworm algorithm, the robot velocity is set at *V* = 1 m/s (see Footnote). Since the glowworm algorithm employs a stochastic search approach, it is run ten times on this problem. The robot team under *Bayes-Swarm* finds the source with maximum strength in 1.86 time units. In contrast, the robot team under the glowworm algorithm takes 3.04 ± 0.4 time units (mean ± std. dev. over ten runs) to find the first (any) source, and 4.44 ± 0.55 time units to find the source with maximum signal strength. These results show that Bayes-Swarm not only needs 58% less time than the glowworm algorithm to find the global target source, but also finds the global target source in less time than the glowworm method takes to find any source.
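As a minimal sketch of the random deployment described above, the snippet below draws 50 initial robot positions from the stated rectangle. A uniform distribution and a fixed seed are assumptions made here for illustration; the text only states that the locations are randomly generated, and `sample_initial_positions` is a hypothetical helper.

```python
import random

N_ROBOTS = 50
X1_RANGE = (-3.0, -1.2)  # bounds on x_1 for the initial deployment
X2_RANGE = (-3.0, 3.0)   # bounds on x_2 for the initial deployment


def sample_initial_positions(n_robots, x1_range, x2_range, seed=0):
    """Draw n_robots (x1, x2) positions uniformly from the given rectangle.

    Uniform sampling and the seed are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    return [
        (rng.uniform(*x1_range), rng.uniform(*x2_range))
        for _ in range(n_robots)
    ]


initial_positions = sample_initial_positions(N_ROBOTS, X1_RANGE, X2_RANGE)
```

Fixing the seed lets both search methods start from the same set of positions, which is what the comparison requires.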

It is important to note that the performance of both the Bayes-Swarm and glowworm algorithms is affected by their respective prescribed parameters. For the glowworm implementation, we used the parameter settings recommended by Kaipa and Ghose [37] (i.e., the source from which we adopt the implementation of this algorithm). For Bayes-Swarm, the earlier parametric analysis (Fig. 6) indicated that, for complex multimodal signal environments, a value of α < 0.4 works well. Hence, we explored how Bayes-Swarm compares to the glowworm algorithm when implemented with different values of α < 0.4. Bayes-Swarm found the target with maximum strength in 2.43, 3.32, and 2.40 time units for α set at 0.05, 0.1, and 0.2, respectively; these mission completion times are still better than the 4.44 time units that the glowworm implementation needs to find the same target.
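The comparison in this experiment reduces to simple arithmetic on the reported completion times. The sketch below reproduces it; the constant names and the `time_reduction` helper are hypothetical, and only the mean glowworm times are used (the reported standard deviations are ignored).

```python
# Reported completion times, in the time units used in Experiment 4.
BAYES_SWARM_TIME = 1.86        # Bayes-Swarm, global source
GLOWWORM_FIRST_SOURCE = 3.04   # glowworm mean, first (any) source
GLOWWORM_GLOBAL_SOURCE = 4.44  # glowworm mean, global source
ALPHA_SWEEP = {0.05: 2.43, 0.1: 3.32, 0.2: 2.40}  # alpha -> Bayes-Swarm time


def time_reduction(reference, candidate):
    """Fraction of the reference time saved by the candidate."""
    return 1.0 - candidate / reference


reduction = time_reduction(GLOWWORM_GLOBAL_SOURCE, BAYES_SWARM_TIME)
print(f"Time saved on the global source: {reduction:.0%}")

# Which alpha settings beat each glowworm reference time?
beats_global = {a: t < GLOWWORM_GLOBAL_SOURCE for a, t in ALPHA_SWEEP.items()}
beats_first = {a: t < GLOWWORM_FIRST_SOURCE for a, t in ALPHA_SWEEP.items()}
```

All three α settings beat the glowworm's global-source time of 4.44; the α = 0.1 run (3.32) is slower than the glowworm's mean time to reach any source (3.04), so the claim holds with respect to the global-source time.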

### 5.5 Discussion of Bayes-Swarm Performance.

The various empirical analyses performed here show that the proposed Bayes-Swarm algorithm is scalable with respect to the number of robots and is able to localize targets in complex multimodal signal environments. The algorithm also requires minimal heuristics, depending on only a single tunable parameter, α, that regulates the balance between exploration and exploitation. While a value of α < 0.4 was found to yield promising performance in the numerical experiments presented in this paper, in the future it would be important to pursue approaches that automatically adapt α to the environment during the mission. Another important advantage of the Bayes-Swarm algorithm over model-free algorithms is that it provides a belief model of the signal environment during the mission. This makes the emergent behavior of the robotic swarm system, from agent-level micro-planning to team-level macro search dynamics, relatively interpretable (as opposed to a black-box phenomenon).
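To illustrate how a single weight can trade off exploitation against exploration, the sketch below scores candidate waypoints with a convex combination of a source-seeking value and a knowledge-gain value. The convex-combination form is an assumption chosen to match the nomenclature (α = 1 being purely exploitative); the actual acquisition function is the one defined earlier in the paper, and the candidate scores here are made-up numbers.

```python
def acquisition_value(source_seeking, knowledge_gain, alpha):
    """Assumed weighted sum: alpha weights exploitation, (1 - alpha) exploration."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * source_seeking + (1.0 - alpha) * knowledge_gain


def pick_waypoint(candidates, alpha):
    """Return the label of the candidate with the highest acquisition value.

    candidates: list of (label, source_seeking, knowledge_gain) tuples.
    """
    best = max(candidates, key=lambda c: acquisition_value(c[1], c[2], alpha))
    return best[0]


# Hypothetical candidates: "A" looks close to the believed source,
# "B" lies in a poorly explored region.
candidates = [("A", 0.9, 0.1), ("B", 0.2, 0.8)]
choice_exploit = pick_waypoint(candidates, alpha=1.0)
choice_explore = pick_waypoint(candidates, alpha=0.0)
choice_mixed = pick_waypoint(candidates, alpha=0.4)
```

In this toy setting, α = 0.4 still favors the exploratory candidate, which is consistent with low α values suiting multimodal environments where early exploitation risks locking onto a weaker source.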

The current form of the Bayes-Swarm method has a few crucial limitations that need to be overcome in the future. The need for downsampling is one of them. Since updating the GP-based belief model has cubic time complexity with respect to data-set size, we are required to downsample the collective data set of the swarm (the observations made by agents). Without such downsampling, the cost of updating the belief model, and thus of decentralized waypoint planning onboard simple robotic agents (with frugal computing capacity), would become burdensome. Currently, we use a simple downsampling approach based on sample rate compression. While this approach is simple to implement, it is far from optimal, especially for large swarm systems with hundreds of robots. In such scenarios, choosing the most informative samples (to update the belief model) out of the entire set of observations remains a critical question, which will need to be addressed in future research on Bayes-Swarm. Another limitation of the current approach is the assumption of full observability, i.e., a fully connected wireless network in which each swarm-robotic agent can communicate with all team members. In practice, it is more common to experience partial observability across the team due to communication range restrictions or communication intermittency. The allowance of asynchronous decision-making (currently offered by Bayes-Swarm) does help to some extent in mitigating the impact of such communication network limitations. However, to minimize potential conflict between agents’ decisions under partial observability, further advancements are needed in the formulations of the acquisition function and the constraints guiding the waypoint planning of swarm agents.
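A minimal sketch of downsampling by sample rate compression is given below. It assumes that the approach amounts to keeping every *k*-th observation once the data set exceeds a threshold `n_max` (playing the role of *N*_{max}); the exact scheme used by Bayes-Swarm may differ. The second helper shows why the threshold matters under cubic GP update cost.

```python
import math


def downsample_by_rate(observations, n_max):
    """Keep every k-th observation so that at most n_max samples remain.

    Assumed interpretation of "sample rate compression"; not the paper's code.
    """
    if n_max < 1:
        raise ValueError("n_max must be at least 1")
    n = len(observations)
    if n <= n_max:
        return list(observations)
    stride = math.ceil(n / n_max)  # smallest stride that respects n_max
    return list(observations[::stride])


def gp_update_cost_ratio(n_full, n_reduced):
    """Relative cost of a cubic-time GP update on n_full vs n_reduced samples."""
    return (n_full / n_reduced) ** 3


history = list(range(10))               # stand-in for 10 time-ordered samples
reduced = downsample_by_rate(history, 4)
cost_ratio = gp_update_cost_ratio(1000, 100)
```

Uniform striding ignores how informative each sample is, which is the shortcoming noted above; an informativeness-based selection would replace the slicing step.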

## 6 Conclusion

In this paper, we proposed an asynchronous and decentralized algorithm to guide the path planning of a team or swarm of robots searching for the source of a spatially distributed signal in 2D arenas. This algorithm is founded on an extension of the batch Bayesian optimization method, with advancements made for application to embodied swarm systems. A new acquisition function is designed to incorporate the following: (1) modeling knowledge gain over trajectories, as opposed to at points; (2) implicitly mitigating overlapping trajectories among robots to maximize unique knowledge gain; and (3) incentivizing robots to reach (closest to) the expected source location, while accounting for constraints on the robot’s motion and the cost it incurs in reaching a candidate waypoint. A heuristic parameter, α, is currently used to balance the source-seeking and knowledge-gain components of the acquisition function, and a parametric analysis is therefore performed to understand its impact. It is found that suitable values of this parameter depend on both the size of the swarm and the complexity of the signal’s spatial distribution. An important direction of future research will be to build on this understanding to formulate a situation-adaptive variation (instead of user prescription) of the weighting coefficient.

To evaluate the performance of the proposed algorithm, *Bayes-Swarm*, it is compared against exhaustive search and random-walk baselines. These algorithms are tested on five distinct case studies, with varying arena size and complexity (non-convexity) of the spatial distribution of the signal. Performance is analyzed in terms of completion time and mapping error. The *Bayes-Swarm* approach clearly outperforms the exhaustive search and random-walk approaches, achieving completion times up to 90 times shorter. In addition, we compared our algorithm with the state-of-the-art SI-based glowworm algorithm on a benchmark (multi-source) problem, with the outcomes clearly demonstrating the search efficiency benefits of Bayes-Swarm.

Scalability of the *Bayes-Swarm* algorithm is also analyzed, with significant performance gain (in terms of superlinear reduction in completion time) observed as the swarm size is increased from 2 to 20, after which the gain mostly saturates owing to the bounds on the size of the arena. Increased swarm size, while beneficial to the mission, also increases the rate at which signal data are collected; this in turn increases the online computational cost of updating the GP model of the signal environment by every robot during the mission. Thus, future work will look at advanced downsampling-based update approaches (e.g., using active learning techniques) or direct sharing of model updates across robots (instead of sharing of data), especially for applications where hundreds to thousands of robots are needed, or where longer mission times are needed. This, along with physical demonstration and the consideration of partial observability due to communication constraints, will allow us to more comprehensively explore the scalability of the *Bayes-Swarm* algorithm in the future.

## Footnote

These settings are purely computational and are used here to preserve the integrity of the glowworm implementation and allow a fair comparison.

## Acknowledgment

Support from the National Science Foundation Award IIS-1927462 is gratefully acknowledged. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF.

## Nomenclature

*r* = robot index, a value between 0 and *N*_{r}

*T* = decision-horizon time of robots

*V* = nominal velocity of robots

*l*_{s} = length of path *s*

*N*_{max} = downsample threshold, which defines the maximum allowed samples for fitting the GP model by each robot

*N*_{r} = number of robots (swarm size)

$D^{1:k_r}$ = observation history of robot-*r*, including self-observations and those shared by its peers, from the beginning of the mission until finishing its *i*th waypoint

$x_r^{k+1}$ = next waypoint of robot-*r* at the decision-time *k*_{r}

$y_r^{i}$ = source signal measurements made by robot-*r* while it is moving from waypoint-(*i* − 1) to waypoint-*i*

$D_r^{i}$ = set of observations of the environment made by robot-*r* after finishing its *i*th waypoint; i.e., $D_r^{i}=[X_r^{i},y_r^{i}]$

$\hat{X}_{-r,p}^{k_r}$ = current local peer-*p*’s next waypoint of robot-*r* at the decision-time *k*_{r}

$X_r^{i}$ = locations of the observations made by robot-*r* while it is moving from waypoint-(*i* − 1) to waypoint-*i*

$\hat{X}_{-r}^{k_r}$ = current local peers’ next waypoints of robot-*r* at the decision-time *k*_{r}; i.e., $\hat{X}_{-r}^{k_r}=\bigcup_{p=1;\,p\neq r}\hat{X}_{-r,p}^{k_r}$

*h*_{r}(.) = source-seeking term of robot-*r* in Bayes-Swarm

*g*_{r}(.) = knowledge-gain term of robot-*r* in Bayes-Swarm

GP_{r} = GP model trained and used by robot-*r*

α = exploitation weight, where α = 1 would be purely exploitative

$\Delta \theta $ = initial feasible direction


### Appendix A: Definition of Case Studies

#### Case Study 1: Large Arena, Convex Signal Distribution

**x** = (*x*_{1}, *x*_{2}), where 0 ≤ *x*_{i} ≤ 24, and **c**_{1} = (5, 23). The initial feasible direction, $\Delta \theta $, is set at 90.

#### Case Study 2: Small Arena, Non-Convex Signal Distribution

**x** = (*x*_{1}, *x*_{2}), where 0 ≤ *x*_{i} ≤ 2.4, **c**_{1} = (1.9, 2.3), and **c**_{2} = (1.5, 0.5). The initial feasible direction, $\Delta \theta $, is set at 90.

#### Case Study 3: Large Arena, Non-Convex Signal Distribution

**x** = (*x*_{1}, *x*_{2}), where 0 ≤ *x*_{i} ≤ 24, **c**_{1} = (10, 23), and **c**_{2} = (15, 5). The initial feasible direction, $\Delta \theta $, is set at 90.

#### Case Study 4: Large Arena, Highly Multi-Modal Signal Distribution

**x** = (*x*_{1}, *x*_{2}), where −24 ≤ *x*_{i} ≤ 24. Moreover, **c**_{1} = (21, 19), **c**_{2} = (21, −19), **c**_{3} = (0, −15), **c**_{4} = (0, 15), **c**_{5} = (−19, 10), **c**_{6} = (21, 19), and **c**_{7} = (−15, −15). The initial feasible direction, $\Delta \theta $, is set at 360.

#### Case Study 5: Multi-Modal Source

*x*_{i} ≤ 3. A set of 50 robots is randomly deployed in a two-dimensional region such that −3 ≤ *x*_{1} ≤ −1.2 and −3 ≤ *x*_{2} ≤ 3. The function consists of a set of three peaks at locations (−0.0093, 1.5814), (1.2857, −0.0048), and (−0.46, −0.6292). The source with maximum strength is located at (−0.0093, 1.5814).
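The three peak locations listed for Case Study 5 coincide with the local maxima of the widely used two-dimensional "peaks" benchmark function. Assuming that this is the signal function behind the case study (an assumption; its definition does not appear in this section), the sketch below evaluates it at the listed locations and confirms which one is strongest.

```python
import math


def peaks(x1, x2):
    """Standard two-dimensional 'peaks' benchmark function.

    Assumed here to be the Case Study 5 signal; not confirmed by this section.
    """
    term1 = 3.0 * (1.0 - x1) ** 2 * math.exp(-x1 ** 2 - (x2 + 1.0) ** 2)
    term2 = -10.0 * (x1 / 5.0 - x1 ** 3 - x2 ** 5) * math.exp(-x1 ** 2 - x2 ** 2)
    term3 = -(1.0 / 3.0) * math.exp(-(x1 + 1.0) ** 2 - x2 ** 2)
    return term1 + term2 + term3


# Peak locations as listed for Case Study 5.
PEAK_LOCATIONS = [(-0.0093, 1.5814), (1.2857, -0.0048), (-0.46, -0.6292)]

peak_values = [peaks(x1, x2) for x1, x2 in PEAK_LOCATIONS]
strongest = PEAK_LOCATIONS[peak_values.index(max(peak_values))]

for location, value in zip(PEAK_LOCATIONS, peak_values):
    print(location, round(value, 3))
```

Under this assumption, the peak at (−0.0093, 1.5814) is more than twice as strong as either of the other two, so they act as weaker decoy sources.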

### Appendix B: Bayes-Swarm and Glowworm Algorithm Settings

Table 4 summarizes all settings that have been used for Bayes-Swarm for all experiments and case studies. Table 5 lists the settings that have been used for the glowworm algorithm in Experiment 4.

Experiment | Case study | N_{r} | V (m/s) | T (s) | α |
---|---|---|---|---|---|
1 | 2 | 4 | 0.1 | 4 | [0–1] |
1 | 4 | 10 | 0.1 | 4 | [0–1] |
2 | 4 | [2–100] | 0.1 | 4 | 0.4 |
3 | 1–4 | 5 | 0.1 | 4 | 0.4 |
3 | 5 | 5 | 0.1 | 4 | 0.99 |
4 | 5 | 50 | 1 | 0.1 | 0.05, 0.1, 0.2, 0.4 |

Note: *N*_{r}, number of robots; *V*, velocity of robots; *T*, decision-horizon length; α, exploitation coefficient.

N_{r} | $\rho $ | $\gamma $ | $\beta $ | r_{s} | Δs (m) |
---|---|---|---|---|---|
50 | 0.4 | 0.6 | 0.08 | 3 | 0.03 |

Note: *N*_{r}, number of robots; $\rho $, luciferin decay constant; $\gamma $, luciferin enhancement constant; $\beta $, decision range gain; *r*_{s}, sensor range of robots; Δ*s*, distance moved by each glowworm when a decision is taken.
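To make the roles of the Table 5 parameters concrete, the sketch below applies the luciferin and decision-range update rules as they are commonly stated for glowworm swarm optimization in the literature. Whether the implementation from Ref. [37] uses exactly these rules is an assumption, and the desired neighbor count `DESIRED_NEIGHBORS` is a hypothetical value that does not appear in Table 5.

```python
# Values taken from Table 5.
RHO = 0.4           # luciferin decay constant
GAMMA = 0.6         # luciferin enhancement constant
BETA = 0.08         # decision range gain
SENSOR_RANGE = 3.0  # r_s, sensor range of robots
STEP_SIZE = 0.03    # distance moved per decision (m); listed for completeness

DESIRED_NEIGHBORS = 5  # hypothetical; not listed in Table 5


def update_luciferin(luciferin, signal_value, rho=RHO, gamma=GAMMA):
    """Decay the previous luciferin level and add the scaled signal measurement."""
    return (1.0 - rho) * luciferin + gamma * signal_value


def update_decision_range(decision_range, n_neighbors, beta=BETA,
                          sensor_range=SENSOR_RANGE,
                          desired_neighbors=DESIRED_NEIGHBORS):
    """Grow or shrink the decision range toward the desired neighbor count,
    clipped to the interval [0, sensor_range]."""
    proposed = decision_range + beta * (desired_neighbors - n_neighbors)
    return min(sensor_range, max(0.0, proposed))


new_luciferin = update_luciferin(luciferin=5.0, signal_value=2.0)
new_range = update_decision_range(decision_range=2.0, n_neighbors=3)
```

In this reading, $\rho $ and $\gamma $ control how quickly a robot's luciferin level tracks the measured signal, while $\beta $ and *r*_{s} bound how far it looks for brighter neighbors.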