## Abstract

System of systems (SoS) are networked integrations of constituent systems that together achieve new capabilities not possible through the operation of any single system. SoS can be found across all aspects of modern life, such as power grids, supply chains, and disaster monitoring and tracking services. Their resilience (the ability to withstand and recover from disruptions) is a critical attribute whose evaluation is nontrivial and requires detailed disruption models. Engineers rely on heuristics (such as redundancy and localized capacity) to achieve resilience. However, excessive reliance on these qualitative guidelines can result in unacceptable operation costs, erosion of profits, over-consumption of natural resources, or unacceptable levels of waste or emissions. Graph-theoretic approaches offer a potential solution to this challenge because they can evaluate architectural characteristics without detailed performance simulations, supporting their use in early stage SoS architecture selection. However, no consensus exists on which graph-theoretic metrics are most valuable for SoS design or how they should be included in the design process. In this work, multiple graph-theoretic approaches are analyzed and compared on a common platform for their use as design tools for resilient SoS. The metrics central point dominance, modularity, specialized predator ratio, generalization, vulnerability, and degree of system order are found to be viable options for developing early stage decision-support tools for resilient SoS design.

## 1 Introduction

Complex engineered networks are critical to the successful functioning of modern society. These networks (such as power grids, supply chains, and disaster monitoring and tracking services) are best described as *system of systems* (SoS): networked integration of *heterogeneous* and *independent* constituent systems that together achieve new capabilities not possible through the operation of any single system [1–3]. The constituent systems in these networks have operational and/or managerial independence and are usually developed independently. The behavior of the overall SoS depends largely on how these constituent systems interact with each other and cannot be determined only by knowing the behaviors of the systems in isolation (a property called *emergence*) [2,4]. These characteristics make their design and evaluation extremely challenging.

*Resilience* is an essential attribute for successful SoS operation and has received considerable attention in recent years. While multiple definitions of resilience exist in the literature (see Refs. [5–8]), the two aspects most commonly used to describe resilience are *survival* and *recovery* (Fig. 1) [9]. Since 1980, the United States has sustained more than 285 weather and climate disasters with infrastructure damages of $1 billion or more (CPI adjusted to 2020), with a total cost exceeding 1.875 trillion USD [10]. The effects of the COVID-19 crisis on the global supply chain and of the February 2021 winter storm on the Texas power grid [11] are recent examples of the impact a lack of resilience can have on critical infrastructure SoS.

Quantifying resilience involves multifaceted considerations, such as the impact severity of a disruptive event, the state after recovery efforts, and the speed of recovery [12–15]. Assessing these aspects requires simulation of SoS performance under specific disruption scenarios using detailed disruption models. Such information is not readily available in the early stages of the design process. In addition to the challenges of assessing SoS behavior, large-scale, complex, and geographically dispersed SoS can have a large number of possible disruption and recovery scenarios. Assessing every architecture alternative for all possible disruptions is not a feasible design method. These challenges prevent SoS resilience from being reliably quantified in early design stages, leading to a dependence on qualitative guidelines such as physical and functional redundancy, localized capacity, and inter-node communications [16]. Feasible SoS architectures designed using these guidelines must then be evaluated and compared to select those that best meet stakeholder requirements. While these guidelines are useful, they are qualitative in nature and do not provide insight into *how much* redundancy or distribution of capacity is *enough* or *too much*. Excessive investment in resilience measures can hamper development objectives and concern stakeholders, causing, for example, unacceptable operation costs, erosion of profits, over-consumption of natural resources, or unacceptable levels of waste or emissions.

Quantifiable architectural characteristics associated with desirable balances between opposing objectives, such as efficiency and resilience, could dramatically improve the impact of the early network design stages. These types of characteristics can potentially be quantified using graph-theoretic approaches, which have the advantage that they can be evaluated while architectural decisions are still being made [17]. A variety of graph-theoretic approaches have been proposed in the literature (details in Sec. 2) as promising indicators for SoS design for resilience. However, there is no consensus on which graph-theoretic metrics are most valuable for SoS design or how they should be included in the design process. Additionally, graph-theory-based frameworks from nonengineering disciplines may also be of interest. Ecologists use ecological network analysis (ENA) to quantitatively measure the architectural characteristics of resilient and sustainable biological ecosystems. Prior work has investigated whether ENA, and the patterns it has revealed in biological ecosystems, can guide the design of engineering networks and SoS for resilience and sustainability [18–22].

This work surveys popular graph-theoretic metrics from the engineering design, social network analysis (SNA), and ecology literature to identify a selection of relevant metrics that can be integrated as tools in the SoS design-for-resilience process. Metrics are tested against resilience and resilience–affordability tradeoff indicators under various disruption scenarios to identify those that have meaningful correlations. Two hypothetical yet realistic SoS operation scenarios, one representative of manufacturing and resource distribution networks and the second representative of disaster monitoring and tracking services, are used to test the metrics. The results provide a common-platform comparison and potential pathways for the combined use of various graph-theoretic approaches for resilient SoS design. The initial phase of this work was presented at the 2022 ASME International Design Engineering Technical Conferences and Computers & Information in Engineering Conference [23].

## 2 Related Work

This section briefly reviews graph-theoretic approaches used for SoS and complex engineering network resilience to position this work. Readers interested in a broader review of applications of graph theory in SoS engineering are encouraged to refer to Harrison [24]. Graph-theoretic approaches have proven valuable in transportation networks [25], where metrics such as the size of the largest connected component and the average shortest path length have been effective performance indicators during both normal operations and disturbances [26,27]. Node centrality measures have been shown to provide effective strategies to guide the quick restoration of air-transportation networks after disruptions [28]. These metrics, however, are not directly applicable as performance indicators for nontransportation SoS. Metrics such as customer interruption hours in energy or water infrastructure, profit losses in supply chains, and success rates for the detection and tracking of extreme weather events in disaster monitoring and tracking SoS are more relevant indicators for stakeholders than graph-theoretic measures. Nevertheless, recent literature has shown value in graph-theoretic approaches for resilient SoS design. Network node and link centrality measures have been used to identify critical constituent systems and interactions in military organizations [29], and group betweenness centrality combined with line outage distribution factors has been used to quickly identify multiple contingencies that cause power grid violations [30]. Modularity is another graph-theoretic metric used in systems design, but the literature is divided on the correlation between a system's modularity and its ability to survive disruptions. On the one hand, some studies support a positive correlation between modular architectures and disruption survival: higher modularity in multisensor target tracking SoS architectures was found to be related to better disruption survivability [17]. On the other hand, a negative correlation has also been observed: lowering the modularity of water distribution networks was found to improve resilience to disruptions [31], and for three engineering systems (a bicycle drive train, an automobile drive train, and an aircraft), high modularity negatively impacted system robustness [32].

Graph-theoretic approaches have also enabled the transfer of biological design principles to engineering for improved response to disruptions. Bio-inspiration applied to product architectures was found to improve robustness to random failures [33]. ENA [34], a graph-theoretic framework, has been shown to be a useful design tool for improving the sustainability of industrial networks [18,35–37] and the resilience of power grids [19] and supply chains [38]. ENA has also been studied in the context of improving SoS resilience. Specifically, the ENA metric degree of system order (DoSO) has been found to be a valuable tool for early stage SoS design when resilience and affordability tradeoffs are important [22,39,40]. Another study found that adding ecology-inspired detrital actors can benefit the resilience of SoS such as the forestry industry [41].

While previous studies have identified multiple graph-theoretic metrics that are potentially valuable for resilient SoS design, no metric (or set of metrics) has been accepted as the “best” approach. This is because SoS attributes are complex and emergent, and it is unlikely that a single low-fidelity model or metric can reliably predict SoS behavior. Resilience is also a highly context-dependent attribute; therefore, the applicability of any graph-theoretic metric to SoS resilience may also be context-dependent. Graph-theoretic metrics, including those outlined above, are generally tested on and against different systems/SoS and disruptions, making their comparison challenging. These limitations are addressed here by conducting a common-platform comparison of promising metrics and identifying future research avenues for an early stage SoS resilience prediction framework.

## 3 Methods

This work uses two representative SoS case studies to test the graph-theoretic metrics: a manufacturing and resource distribution network and a disaster monitoring and tracking service. This section provides an overview of the case studies, defines the graph-theory metrics tested in this study, and describes the correlation tests conducted with resilience and resilience–affordability indicators.

### 3.1 Description of Notional System of Systems Case Studies.

The first case study is a *manufacturing/resource distribution SoS* (MRD-SoS) network from previous work by the authors [22]. Notional SoS architectures were generated using a hypothetical operation scenario requiring the completion of three tasks. Five systems (with different capabilities) were assumed to be available to perform each of the three tasks, giving a total of 15 available systems for the SoS design. The operations of Task-3 depended on the completion of Task-2, and the operations of Task-2 depended on the completion of Task-1. Notional architectures were generated by selecting combinations of the constituent systems and different sets of interactions between the selected systems. The overall SoS performance was based on the successful completion of the three tasks and accounted for interoperability differences between constituent systems. A total of 38,592 notional architectures were generated (two examples are shown in Fig. 2). Details regarding the architecture generation process and all architecture evaluation procedures for this case study are available in Ref. [22].
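As an illustration of the network representation used for the MRD-SoS, the following sketch builds the structural matrix of one candidate architecture and checks that its interactions follow the Task-1 → Task-2 → Task-3 chain (the only interaction groups allowed in this case study, as discussed in Sec. 5.1). The system indices, task labels, and helper names are hypothetical; the actual architecture generation procedure is described in Ref. [22].

```python
def structural_matrix(n, links):
    """Return the n x n structural matrix F with F[i][j] = 1 if a link exists from i to j."""
    F = [[0] * n for _ in range(n)]
    for i, j in links:
        F[i][j] = 1
    return F


def respects_task_chain(tasks, links):
    """True if every link goes from a Task-k system to a Task-(k + 1) system."""
    return all(tasks[j] == tasks[i] + 1 for i, j in links)


# Hypothetical architecture: systems 0 and 1 perform Task-1, system 2 Task-2, system 3 Task-3.
tasks = [1, 1, 2, 3]
links = [(0, 2), (1, 2), (2, 3)]
F = structural_matrix(len(tasks), links)
print(respects_task_chain(tasks, links))  # True
```

A link such as `(0, 3)` would skip Task-2 and make `respects_task_chain` return `False`.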

The second case study, a *disaster monitoring and tracking service SoS* (DMT-SoS), is based on work related to Earth observation sensor webs [42]. The SoS can contain the following constituent systems: one geostationary orbit satellite, three low-earth orbit satellites, two high-altitude airborne platforms, four low-altitude airborne platforms, and two ground stations. All selected constituent systems, except the ground stations, are assigned specific observation tasks. Each constituent system can perform a specific set of tasks, and each task it can perform has an associated performance score that accounts for the observation area and the resolution of the information captured. The task allocation for the selected systems in any architecture is modeled as an assignment problem. These systems conduct the assigned observation tasks and provide the information to the ground stations (through the communication routes in their architecture), which then pass it on to the user or stakeholder. The performance of the SoS is evaluated based on the scores of the observation tasks completed and the latency of the information pathways through which the user/stakeholder receives the observation information. A total of 4596 feasible architectures were tested for this operation, with different selections of constituent systems and interactions between selected systems. Due to the required-performance constraints in this case study, the architectures are primarily distinguished by their unique sets of interactions between systems, and the number of selected systems is similar in most of them. An example architecture is shown in Fig. 3. The full details of this case study, including architecture evaluation procedures, are provided in Supplementary Material A on the ASME Digital Collection.
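The task allocation step can be illustrated with a small exhaustive solver. This is a sketch only: it assumes a one-to-one assignment (each task to a distinct system, every task covered) that maximizes the total performance score, and the score values are hypothetical. The source does not state which solver was used; for larger instances, `scipy.optimize.linear_sum_assignment(cost, maximize=True)` solves the one-to-one assignment problem in polynomial time.

```python
from itertools import permutations


def best_assignment(scores):
    """Exhaustively solve a small one-to-one assignment problem.

    scores[s][t] is the (hypothetical) performance score of system s on task t,
    or None if system s cannot perform task t. Returns (total score, {task: system});
    returns (-inf, None) when no assignment covers every task.
    """
    n_systems, n_tasks = len(scores), len(scores[0])
    best_total, best = float("-inf"), None
    # Each permutation assigns a distinct system to task 0, 1, ..., n_tasks - 1.
    for systems in permutations(range(n_systems), n_tasks):
        if any(scores[s][t] is None for t, s in enumerate(systems)):
            continue  # some system cannot perform its assigned task
        total = sum(scores[s][t] for t, s in enumerate(systems))
        if total > best_total:
            best_total, best = total, dict(enumerate(systems))
    return best_total, best


# Three hypothetical systems, two observation tasks.
print(best_assignment([[5, 1], [4, 3], [None, 2]]))  # (8, {0: 0, 1: 1})
```

Exhaustive search grows factorially with the number of systems, so it is only practical for toy instances like this one.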

The following features were evaluated for all the architectures in both case studies:

Measure of performance (MoP) under normal operation.

Architecture development cost (DC).

Expected MoP after disruption: Losses of one, two, and three constituent systems (the *N* − 1, *N* − 2, and *N* − 3 scenarios) were investigated, with every possible disruption tested for all architectures in each scenario. The ability of an architecture to respond to a specific disruption scenario was measured using the *expected (mean) performance after disruption*, assuming that all disruptions in a given scenario were equally probable.

Selected graph-theoretic metrics (see Sec. 3.2).
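The expected-performance indicator can be sketched as an average over every removal set of size *k*, with all removal sets weighted equally as stated above. The performance function below is a hypothetical stand-in (an MRD-style architecture scores 1 only if every task still has a surviving system); the actual MoP models of both case studies are more detailed.

```python
from itertools import combinations
from statistics import mean


def expected_mop_after_disruption(systems, mop, k):
    """Mean MoP over all equally probable ways of losing k constituent systems (N - k)."""
    return mean(mop(set(systems) - set(lost)) for lost in combinations(systems, k))


# Hypothetical architecture: two Task-1 systems, one Task-2 system, one Task-3 system.
task_of = {"A1": 1, "A2": 1, "B1": 2, "C1": 3}


def all_tasks_covered(survivors):
    """Toy MoP: 1.0 if every task keeps at least one surviving system, else 0.0."""
    return 1.0 if {task_of[s] for s in survivors} == {1, 2, 3} else 0.0


print(expected_mop_after_disruption(list(task_of), all_tasks_covered, 1))  # 0.5
```

Here only the loss of A1 or A2 is survivable, so half of the four *N* − 1 disruptions keep full performance.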

### 3.2 Graph-Theoretic Metrics for System of Systems Assessment.

Structural evaluations are completed using the structural matrix (**F**) of the SoS network representation, where $F_{ij} = 1$ if a link exists from node $i$ to node $j$ and $F_{ij} = 0$ otherwise. Flow-based evaluations are completed using a flow matrix (**T**), where $T_{ij}$ represents the amount of flow from node $i$ to node $j$. A brief description of each tested metric is provided in this section. Readers interested in more detailed descriptions are encouraged to consult the cited sources.

*Density*: A measure of how many links (edges) exist in a network compared to the maximum possible number of links (Eq. (1)). This metric is also called connectance in the ENA literature [34].

In Eq. (1), $L$ represents the number of links and $N$ represents the number of nodes in the digraph.
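Because Eq. (1) is not reproduced here, the sketch below assumes the common directed-graph normalization $L/[N(N-1)]$ (no self-loops); the optional `self_loops=True` branch uses $L/N^2$, the directed-connectance form found in some ecological work. Either choice is an assumption about Eq. (1).

```python
def density(F, self_loops=False):
    """Fraction of possible directed links present in structural matrix F."""
    n = len(F)
    # Count links, ignoring the diagonal unless self-loops are allowed.
    links = sum(F[i][j] for i in range(n) for j in range(n) if self_loops or i != j)
    possible = n * n if self_loops else n * (n - 1)
    return links / possible


F = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # hypothetical chain 0 -> 1 -> 2
print(density(F))  # 0.3333333333333333
```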

*Central point dominance*: A measure of the average difference in betweenness centrality (the number of shortest paths between pairs of nodes that traverse a node) between the most central node ($B_{\max}$) and the rest of the nodes in the network (Eq. (2)) [43]. In Eq. (2), $B_i$ represents the betweenness centrality of the $i$th node in the network.
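Central point dominance can be sketched on top of Brandes' algorithm for betweenness centrality on an unweighted digraph. This sketch assumes unnormalized betweenness counted over ordered node pairs and takes the mean of $B_{\max} - B_i$ over the $N - 1$ nodes other than the most central one, following the verbal definition above; Eq. (2) and normalized variants of the metric may scale differently. Library routines such as `networkx.betweenness_centrality` can replace the hand-written betweenness function.

```python
from collections import deque


def betweenness(F):
    """Unnormalized betweenness of every node (Brandes' algorithm, unweighted digraph)."""
    n = len(F)
    adj = [[j for j in range(n) if F[i][j]] for i in range(n)]
    bc = [0.0] * n
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma, dist = [0] * n, [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS: count shortest paths from s to every node
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):  # back-propagate pair dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def central_point_dominance(F):
    """Mean gap between the largest betweenness and that of the other N - 1 nodes."""
    bc = betweenness(F)
    b_max = max(bc)
    return sum(b_max - b for b in bc) / (len(bc) - 1)


star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]  # hub 0, symmetric links
print(central_point_dominance(star))  # 6.0
```

A fully connected network has no intermediary nodes, so its central point dominance is zero.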

*Heterogeneity*: The variance of the node degrees ($d$) in a network [44] (Eq. (3)). For undirected graph models, either the in-degree or the out-degree of nodes can be used. For directed graph models, this work uses the total degree (in-degree + out-degree) of the nodes in the network.
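For directed models, the heterogeneity described above can be computed from the total degree of each node. The sketch assumes that Eq. (3) is the population variance (dividing by $N$); if the sample variance is intended, `statistics.variance` would replace `pvariance`.

```python
from statistics import pvariance


def degree_heterogeneity(F):
    """Population variance of total degree (in-degree + out-degree) from structural matrix F."""
    n = len(F)
    total_degree = [sum(F[i]) + sum(F[j][i] for j in range(n)) for i in range(n)]
    return pvariance(total_degree)


F = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]  # hypothetical: node 0 links to nodes 1 and 2
print(round(degree_heterogeneity(F), 4))  # 0.2222
```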

*Modularity*: A measure of the division/organization of the network nodes into modules. A highly modular network has dense intra-module connections and few inter-module links. Modularity is evaluated using Eq. (4), where $e_{ii}$ is the fraction of edges in module $i$ and $a_i$ is the fraction of edges with at least one end in module $i$. A popular algorithm for this calculation was proposed by Newman [45,46]. This work uses an implementation of Newman's algorithms provided by Zuo [47].
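Given a partition of the nodes into modules, Newman's modularity can be evaluated directly. The sketch below is not the community-detection algorithm used in this work (Newman's algorithm as implemented by Zuo [47]); it only scores a given, hypothetical partition of an undirected graph. It uses Newman's standard definition of $a_i$ as the fraction of edge ends attached to module $i$, with $Q = \sum_i (e_{ii} - a_i^2)$.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Newman's Q = sum_i (e_ii - a_i**2) for an undirected edge list and node partition."""
    m = len(edges)
    e = defaultdict(float)  # e[c]: fraction of edges with both ends in module c
    a = defaultdict(float)  # a[c]: fraction of edge ends attached to module c
    for u, v in edges:
        if module_of[u] == module_of[v]:
            e[module_of[u]] += 1 / m
        a[module_of[u]] += 1 / (2 * m)
        a[module_of[v]] += 1 / (2 * m)
    return sum(e[c] - a[c] ** 2 for c in a)


# Two hypothetical triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
module_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, module_of), 4))  # 0.3571
```

Placing all nodes in one module gives $Q = 0$, so positive values indicate more intra-module structure than that trivial partition.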

*Nestedness*: A measure of the tendency of nodes in a network to interact with subsets of the interaction partners of better-connected nodes [48]. A perfectly nested network would have one node connected to all other ($N - 1$) nodes, the next node connected to ($N - 2$) nodes, and so on, with the last node having only one interaction. This study uses the unipartite nestedness based on overlap and decreasing fill (UNODF) metric (see Eq. (5)) proposed by Cantor et al. [49]. In Eq. (5), $N$ is the total number of nodes in the network, $F_{ij}$ is the structural matrix, $d_i$ represents the degree of node $i$, and the operator $\delta_{d_i d_j}$ checks for node overlap of the same degree: $\delta_{d_i d_j} = 1$ if $d_i = d_j$, and $\delta_{d_i d_j} = 0$ in all other cases. UNODF ranges from 0 to 1, with perfectly nested networks at the upper bound. This metric provides two values: nestedness based on rows and nestedness based on columns (determined by applying Eq. (5) to the transpose of **F**). For undirected graph models (symmetric adjacency matrices), the two nestedness values are equal.

*Algebraic connectivity*: The second smallest eigenvalue of the Laplacian matrix of the network [50]. The Laplacian matrix is the difference between the degree matrix and the structural (adjacency) matrix (Eq. (6)), given a connected undirected graph (square symmetric adjacency matrix). Under these conditions, the eigenvalues of the associated Laplacian matrix (given by the roots of Eq. (7)) are real numbers greater than or equal to zero (Eqs. (8)–(9)). The second smallest eigenvalue ($\lambda_2$) gives the algebraic connectivity. Note: to evaluate the algebraic connectivity of the networks in the manufacturing/resource distribution case study, an undirected graph representation was created to satisfy this connected undirected graph requirement.
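Following the note above, the sketch first symmetrizes the structural matrix into an undirected graph, builds the Laplacian $D - A$, and returns its second smallest eigenvalue $\lambda_2$. To stay dependency-free, it uses a cyclic Jacobi eigenvalue iteration for symmetric matrices; `numpy.linalg.eigvalsh` or `networkx.algebraic_connectivity` would be the usual choices in practice.

```python
import math


def laplacian(F):
    """Laplacian D - A of the undirected graph obtained by symmetrizing F."""
    n = len(F)
    A = [[1.0 if i != j and (F[i][j] or F[j][i]) else 0.0 for j in range(n)] for i in range(n)]
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]


def symmetric_eigenvalues(M, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    a = [row[:] for row in M]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def algebraic_connectivity(F):
    """Second smallest Laplacian eigenvalue (lambda_2) of the symmetrized network."""
    return symmetric_eigenvalues(laplacian(F))[1]


path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # hypothetical chain 0 -> 1 -> 2
print(round(algebraic_connectivity(path), 6))  # 1.0
```

A disconnected network yields $\lambda_2 = 0$, which is why the metric is only informative for connected graphs.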

*Prey-to-predator ratio*: The ratio of the number of prey (producers) to the number of predators (consumers) (see Eq. (10)).

*Specialized predator ratio*: The fraction of predators (consumers) that consume from only one prey (producer) (see Eq. (11)).

*Generalization*: The average number of prey (producers) for each predator (consumer) (see Eq. (12)).

*Vulnerability*: The average number of predators (consumers) for each prey (producer) (see Eq. (13)). Note: this ENA metric is not the same as the vulnerability metric used in engineering to measure the impact of a disruption on system performance [5].
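The four food-web metrics above can be computed from the structural matrix alone. The sketch assumes that a link from $i$ to $j$ ($F_{ij} = 1$) means $j$ consumes from $i$, so prey are nodes with at least one outgoing link and predators are nodes with at least one incoming link; generalization and vulnerability then reduce to links per predator and links per prey. The exact forms of Eqs. (10)–(13) and (17)–(20) are not reproduced in this section, so treat this as an approximation of them.

```python
def ena_structural_metrics(F):
    """Prey-to-predator ratio, specialized predator ratio, generalization, and vulnerability."""
    n = len(F)
    links = [(i, j) for i in range(n) for j in range(n) if F[i][j]]
    prey = {i for i, _ in links}       # producers: supply at least one node
    predators = {j for _, j in links}  # consumers: receive from at least one node
    prey_count = {j: sum(1 for _, k in links if k == j) for j in predators}
    return {
        "prey_to_predator": len(prey) / len(predators),
        "specialized_predator_ratio": sum(c == 1 for c in prey_count.values()) / len(predators),
        "generalization": len(links) / len(predators),  # mean prey per predator
        "vulnerability": len(links) / len(prey),         # mean predators per prey
    }


# Hypothetical MRD-style network: Task-1 {0, 1} -> Task-2 {2, 3} -> Task-3 {4}.
F = [[0] * 5 for _ in range(5)]
for i, j in [(0, 2), (0, 3), (1, 2), (2, 4), (3, 4)]:
    F[i][j] = 1
print(ena_structural_metrics(F)["specialized_predator_ratio"])  # 0.3333333333333333
```

For a symmetric (bidirectional) matrix every node is both prey and predator, so this sketch returns a prey-to-predator ratio of 1 and equal generalization and vulnerability.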

*Degree of system order*: A measure of the relative amount of pathway constraints in a flow network [51]. It is calculated as the ratio of two information theory-based indices: (a) average mutual information (AMI, Eq. (21)), which quantifies the constraints in flow network pathways, and (b) the Shannon index ($H$, Eq. (22)), which provides the upper limit on AMI. DoSO = 1 indicates an extremely pathway-constrained network, and DoSO = 0 indicates a network with extreme pathway flexibility.

*Ecological fitness*: A predictor of the fitness of ecosystems to grow while simultaneously surviving/recovering from perturbations ($R_{\text{eco}}$, Eq. (15)) [52,53]. Ecological fitness is a function of the ENA metric DoSO (Eq. (14)). Ecological literature has shown that biological ecosystems exist within a narrow range of DoSO values (called the Window of Vitality), suggesting that they have evolved to avoid both extreme pathway constraints (DoSO = 1) and extreme pathway redundancy (DoSO = 0). The ecological fitness function $R_{\text{eco}}$ was developed as a mathematical indicator of this fitness as a function of DoSO, with $R_{\text{eco}} = 0$ at both DoSO = 1 and DoSO = 0. This function peaks at DoSO = $1/e \approx 0.3679$ because the studied biological ecosystems were found to cluster around this DoSO value, indicating that such architectures would be evolutionarily favorable/fit for biological ecosystems. Ulanowicz [52] also recognized that different DoSO ranges could prove more favorable for other complex systems operating under distinct conditions and proposed a generalized version of the fitness function ($R_{\text{gen}}$, Eq. (16)). In Eq. (16), the $\beta$ parameter can be varied to adjust the fitness function to the DoSO values associated with the peak fitness of specific systems.
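Degree of system order and ecological fitness can be sketched from a flow matrix **T** using the standard information-theoretic definitions $\text{AMI} = \sum_{ij} \frac{T_{ij}}{T_{\cdot\cdot}} \log \frac{T_{ij} T_{\cdot\cdot}}{T_{i\cdot} T_{\cdot j}}$ and $H = -\sum_{ij} \frac{T_{ij}}{T_{\cdot\cdot}} \log \frac{T_{ij}}{T_{\cdot\cdot}}$, where a dot denotes summation over that index. Two assumptions apply: the sketch uses only internal flows (ENA studies often augment **T** with boundary imports, exports, and dissipation), and it takes $R_{\text{eco}} = -e \cdot \text{DoSO} \cdot \ln(\text{DoSO})$, a normalized form that matches the properties stated above (zero at DoSO = 0 and 1, maximum of 1 at DoSO = $1/e$). The log base cancels in the DoSO ratio.

```python
import math


def degree_of_system_order(T):
    """DoSO = AMI / H for flow matrix T (T[i][j] = flow from i to j).

    Undefined when T has a single nonzero flow, because H = 0.
    """
    n = len(T)
    total = sum(map(sum, T))
    out_flow = [sum(T[i]) for i in range(n)]
    in_flow = [sum(T[i][j] for i in range(n)) for j in range(n)]
    ami = h = 0.0
    for i in range(n):
        for j in range(n):
            if T[i][j] > 0:
                p = T[i][j] / total
                ami += p * math.log2(T[i][j] * total / (out_flow[i] * in_flow[j]))
                h -= p * math.log2(p)
    return ami / h


def ecological_fitness(doso):
    """Assumed form R_eco = -e * DoSO * ln(DoSO): 0 at DoSO = 0 and 1, peak of 1 at 1/e."""
    if doso <= 0.0:
        return 0.0
    return -math.e * doso * math.log(doso)


chains = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]  # two separate pathways
mesh = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]    # every source feeds every sink
print(degree_of_system_order(chains), degree_of_system_order(mesh))  # 1.0 0.0
```

The two toy networks bracket the DoSO scale: isolated one-to-one pathways are fully constrained, while a uniform all-to-all exchange is fully flexible.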

The metrics $P_r$, $P_s$, $G$, and $V$ require the prior calculations shown in Eqs. (17)–(20). The metrics DoSO, $R_{\text{eco}}$, and $R_{\text{gen}}$ require the prior calculations shown in Eqs. (21)–(23).

### 3.3 Correlation Analysis.

SoS resilience was measured using the *expected performance after disruption* for the *N* − 1, 2, and 3 scenarios (see Sec. 3.1), treating resilience as the ability of the SoS to adapt and maintain/recover functionality and achieve mission objectives despite disruptions [54]. The resilience–affordability tradeoff was measured using the *resilience-cost function* (RC) shown in Eq. (24). This metric was calculated as the weighted sum of the normalized expected loss of performance for a given disruption scenario and the normalized *architecture development cost*. Since lower values of both expected performance loss and development cost are desirable, a lower RC value indicates a better SoS architecture. In Eq. (24), $E[\text{MoP}_{N-X}]$ is the *expected performance after disruption* in the *N* − *X* scenario, $\text{MoP}_{\text{req}}$ is the required performance value from the SoS, DC is the development cost of the architecture, and $\lambda$ is the weight parameter. Loss of performance and development cost are both normalized by their maximum values ($\max(\text{MoP}_{\text{loss}})$ and $\max(\text{DC})$, respectively). Three values of the weight parameter are used for the analysis (leading to three resilience–affordability tradeoff indicators):

*λ* = 0.75: Greater weight on lowering expected performance loss.

*λ* = 0.5: Equal weight on both objectives.

*λ* = 0.25: Greater weight on lowering development cost.
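Eq. (24) is not reproduced in this section. A sketch consistent with the description above assumes that the performance loss is $\text{MoP}_{\text{req}} - E[\text{MoP}_{N-X}]$, that $\lambda$ weights the normalized loss and $1 - \lambda$ weights the normalized development cost (so $\lambda = 0.75$ favors lowering performance loss), and that the largest loss across the architecture set is positive. The architecture values are hypothetical.

```python
def resilience_cost(expected_mop, mop_req, dev_cost, lam):
    """Assumed RC = lam * loss / max(loss) + (1 - lam) * DC / max(DC); lower is better."""
    loss = [mop_req - m for m in expected_mop]  # assumed performance-loss definition
    max_loss, max_cost = max(loss), max(dev_cost)  # both assumed positive
    return [lam * loss_i / max_loss + (1 - lam) * cost_i / max_cost
            for loss_i, cost_i in zip(loss, dev_cost)]


# Three hypothetical architectures evaluated for one N - X scenario.
for lam in (0.75, 0.5, 0.25):
    rc = resilience_cost([0.9, 0.6, 0.8], 1.0, [10.0, 5.0, 20.0], lam)
    print(lam, [round(v, 3) for v in rc])
```

With three weights per disruption scenario, the three *N* − *X* scenarios yield the nine RC indicators used in the correlation analysis.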

The three resilience indicators (expected performance after disruption in the *N* − 1, *N* − 2, and *N* − 3 scenarios) and the nine RC metrics (three weighted sums for each scenario) were documented. To test the usefulness of the graph-theoretic metrics discussed in Sec. 3.2, a correlation analysis was conducted. Correlations were tested between all pairs of resilience/resilience–affordability indicators and graph-theoretic metrics using the coefficient of linear correlation ($\rho$), following a similar analysis conducted for water distribution networks by Meng et al. [31]. In the present study, correlation coefficient magnitudes between 0.3 and 0.7 are considered moderate, and magnitudes greater than 0.7 are considered strong, based on Ratner [55]. Scatter plots of the data used for the correlation analyses are provided in Supplementary Material B on the ASME Digital Collection.
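The correlation screening can be sketched with the Pearson coefficient and the strength bands used in this study. The label "weak" for magnitudes below 0.3 is an assumption, since only the moderate and strong bands are named above; `statistics.correlation` (Python 3.10+) computes the same coefficient. The metric and performance values are hypothetical.

```python
import math


def pearson(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(rho):
    """Classify |rho|: > 0.7 strong, 0.3 to 0.7 moderate, otherwise weak (assumed label)."""
    r = abs(rho)
    if r > 0.7:
        return "strong"
    if r >= 0.3:
        return "moderate"
    return "weak"


# Hypothetical metric values (e.g., DoSO) vs. expected N - 3 performance for five architectures.
doso = [0.25, 0.4, 0.55, 0.7, 0.9]
mop_n3 = [0.82, 0.75, 0.61, 0.58, 0.40]
rho = pearson(doso, mop_n3)
print(round(rho, 2), strength(rho))
```

The classification uses the magnitude of $\rho$, so strongly negative correlations are reported as strong as well.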

## 4 Results

### 4.1 Manufacturing and Resource Distribution System of Systems Case Study.

Figures 4 and 5 are heat maps illustrating the linear correlation coefficients for the manufacturing and resource distribution SoS case study. With respect to *SoS response to disruptions*, it is observed that: (1) Density, total degree heterogeneity, algebraic connectivity, nestedness, modularity, specialized predator ratio, generalization, vulnerability, degree of system order, and ecological fitness had *moderate-to-strong* correlations with the expected performance after disruptions. (2) Of these metrics, modularity, specialized predator ratio, and degree of system order had a *negative* correlation with the expected performance after disruptions. (3) However, strong correlations (|*ρ*| > 0.7) were observed only for specialized predator ratio, generalization, vulnerability, and degree of system order, especially in the *N* − 3 disruption scenario. (4) Central point dominance and prey-to-predator ratio were *not* found to have useful correlations with the expected performance after disruptions.

With respect to the *SoS resilience–affordability tradeoff indicators*, it is observed that: (1) Density, total degree heterogeneity, algebraic connectivity, modularity, specialized predator ratio, generalization, vulnerability, degree of system order, and ecological fitness had *moderate-to-strong* correlations with the resilience–affordability tradeoff indicators (under most conditions). (2) Of these, strong correlations (|*ρ*| > 0.7) were observed for modularity, specialized predator ratio, generalization, vulnerability, degree of system order, and ecological fitness, but only under specific conditions. (3) The correlation magnitudes with resilience–affordability indicators were highly context-dependent. For example, generalization and vulnerability had strong correlations in the *N* − 3 and *N* − 2 disruption scenarios when equal or greater weight was placed on resilience; however, the same metrics had negligible correlations in the *N* − 1 scenario when greater weight was placed on affordability. (4) Central point dominance, prey-to-predator ratio, and nestedness were *not* found to have meaningful correlations with the resilience–affordability indicators (in most contexts).

### 4.2 Disaster Monitoring and Tracking System of Systems Case Study.

Figures 6 and 7 are heat maps illustrating the linear correlation coefficients for the disaster monitoring and tracking SoS case study. With respect to *SoS response to disruptions*, it is observed that: (1) Density, central point dominance, algebraic connectivity, nestedness, modularity, specialized predator ratio, generalization (or vulnerability), degree of system order, and ecological fitness had *moderate-to-strong* correlations with the expected performance after disruptions. (2) Of these metrics, modularity, central point dominance, specialized predator ratio, degree of system order, and ecological fitness had a *negative* correlation with the expected performance after disruptions. (3) Strong correlations (|*ρ*| > 0.7) were observed for central point dominance, specialized predator ratio, generalization (or vulnerability), degree of system order, and ecological fitness, but only in the *N* − 3 disruption scenario. (4) Degree heterogeneity was *not* found to have meaningful correlations with the expected performance after disruptions. (5) No metric was found to have a meaningful correlation with the resilience indicators in the *N* − 1 disruption scenario.

With respect to the *SoS resilience–affordability tradeoff indicators*, it is observed that: (1) Density, central point dominance, algebraic connectivity, nestedness, modularity, specialized predator ratio, generalization (or vulnerability), degree of system order, and ecological fitness had *moderate-to-strong* correlations with the resilience–affordability tradeoff indicators in the *N* − 3 disruption scenario, and in the *N* − 2 disruption scenario with greater weight on resilience. (2) Of these, strong correlations (|*ρ*| > 0.7) were observed for modularity, degree of system order, and ecological fitness, only in the *N* − 3 disruption scenario. Central point dominance, generalization (or vulnerability), and specialized predator ratio also had |*ρ*| ≈ 0.7 under this condition. (3) The correlation magnitudes with resilience–affordability indicators were substantially weaker in the *N* − 2 and *N* − 1 disruption scenarios. A reversal of the correlation signs (negative to positive or vice versa) was observed in the *N* − 1 disruption scenario with greater weight on architecture development cost. (4) Degree heterogeneity was *not* found to have meaningful correlations with the resilience–affordability indicators.

## 5 Discussion

### 5.1 Important Differences Between the Two Case Studies.

The two case studies used in this work present significant differences in terms of available architecture design and evaluation options, and operational context. In the MRD-SoS case study, there were more options for constituent system selection but relatively limited options for selecting interactions compared to the disaster monitoring and tracking SoS (DMT-SoS). In the MRD-SoS, there were only three tasks and multiple systems available for each task, enabling the selection of multiple redundant systems for each task. In the DMT-SoS, there were ten observation tasks and a total of ten available systems (with different task-related abilities), forcing the selection of all or most available systems to fulfill minimum performance requirements. However, designers of the DMT-SoS would have many more options for selecting interactions between constituent systems, since almost all systems could be set up to communicate (bidirectionally) with every other system. In contrast, interaction options in the MRD-SoS were constrained to two groups only: from Task-1 systems to Task-2 systems, and from Task-2 systems to Task-3 systems.

The case studies also differed in their operational contexts. The MRD-SoS had a much greater range of architecture costs than the DMT-SoS. This is because the cost of developing constituent systems (such as satellites) is generally greater than that of setting up communication between them, and the DMT-SoS architectures were primarily distinguished by the selection of interactions (not constituent systems). Most of the DMT-SoS architectures were able to maintain the minimum required performance under *N* − 1 disruptions. A relatively greater fraction of MRD-SoS architectures were significantly impacted by *N* − 1 disruptions, especially the architectures with one-to-one interactions between systems performing different tasks. This difference explains the observation that no metric had meaningful correlations with the resilience indicators in the *N* − 1 disruption scenario for the DMT-SoS. It also explains the reversal of the sign of the correlations with the *N* − 1 resilience–affordability indicator when greater weight was placed on architecture cost: in this context, the architectures had little variation in expected *N* − 1 performance, and the correlations were driven almost entirely by architecture cost. The distributions of the architecture costs and expected performance under the three disruption scenarios for both case studies are presented in Supplementary Material B on the ASME Digital Collection.

The differences in design options between the two case studies also affect the ranges and distributions of the graph-theoretic metric values for the alternative architectures. The adjacency matrices of the graph representations of the DMT-SoS architectures are symmetric due to the bidirectional nature of the communication interactions. Because of this feature, the prey-to-predator ratio of all architectures is equal to 1, the row and column nestedness values are equal, and the generalization and vulnerability metrics have the same values. For these reasons, there are fewer metrics in the correlation figures for the DMT-SoS. The specialized predator ratio is zero for the majority of the DMT-SoS architectures. The relatively low density of interactions in the MRD-SoS also limits the range of possible nestedness values in those architectures and could be the reason for the weaker correlations, suggesting that nestedness may not be the most suitable metric for such applications. Finally, the DoSO values of the MRD-SoS architectures ranged between 0.2 and 1, and those of the DMT-SoS architectures ranged between 0 and 0.6, with most DMT-SoS architectures having DoSO ≤ 0.3. This enabled the exploration of two types of SoS architectures: one that can reach extreme levels of pathway constraint (MRD-SoS) and one that can reach extreme levels of pathway redundancy (DMT-SoS). The distributions of the graph-theoretic metrics for both case studies are presented in Supplementary Material B on the ASME Digital Collection.

### 5.2 Key Take-Aways.

Based on the results of this study, the following metrics are identified as promising tools for designing resilient SoS architectures: *central point dominance, modularity, specialized predator ratio, generalization, vulnerability, and degree of system order.*

Central point dominance measures the centralization of a network: a high value indicates that one or a few nodes are central and most nodes are peripheral, so communication between a pair of peripheral nodes must pass through a central node. The strong negative correlation between central point dominance and the *N* − 3 disruption scenario resilience indicator, together with its moderately strong correlations with the other resilience and resilience–affordability indicators in the DMT-SoS, suggests that greater centralization is detrimental to resilience in communication- or information-transfer SoS. However, no such correlation was observed in the MRD-SoS case study.
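Central point dominance is often computed with Freeman's betweenness-based definition: the average amount by which the most central node's relative betweenness exceeds that of every other node. The minimal Python sketch below assumes that definition and an unweighted, undirected graph (as for the bidirectional DMT-SoS communication links). The exact normalization used in this study is not restated in this section, and the function names `betweenness`, `central_point_dominance`, `star`, and `complete` are illustrative only.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of each node in an unweighted, undirected
    graph given as a symmetric 0/1 adjacency matrix (Brandes' algorithm)."""
    n = len(adj)
    cb = [0.0] * n
    for s in range(n):
        order = []                      # nodes in non-decreasing BFS distance
        preds = [[] for _ in range(n)]  # predecessors on shortest paths from s
        sigma = [0] * n                 # number of shortest s-v paths
        dist = [-1] * n                 # BFS distance from s (-1 = unvisited)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in range(n):
                if not adj[v][w]:
                    continue
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inward
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # In an undirected graph each unordered pair is counted from both ends
    return [c / 2.0 for c in cb]


def central_point_dominance(adj):
    """Freeman's central point dominance: 1 for a star, 0 when all nodes
    have equal betweenness (e.g., a complete graph)."""
    n = len(adj)
    if n < 3:
        raise ValueError("central point dominance needs at least 3 nodes")
    max_pairs = (n - 1) * (n - 2) / 2.0  # most pairs one node can lie between
    rel = [b / max_pairs for b in betweenness(adj)]
    b_max = max(rel)
    return sum(b_max - b for b in rel) / (n - 1)


def star(n):
    """Node 0 linked to every other node (a fully centralized network)."""
    return [[int((i == 0) != (j == 0)) for j in range(n)] for i in range(n)]


def complete(n):
    """Every node linked to every other node (no central node)."""
    return [[int(i != j) for j in range(n)] for i in range(n)]


print(central_point_dominance(star(5)))      # → 1.0
print(central_point_dominance(complete(5)))  # → 0.0
```

A star-shaped network, where every pair of peripheral nodes must communicate through the hub, reaches the maximum value of 1, while a fully connected network with no hub scores 0.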

The modularity of the notional SoS architectures had moderate-to-strong negative correlations with the tested resilience and resilience–affordability indicators in both case studies. A possible explanation is as follows: If the systems performing a critical role in the SoS can communicate with, or deliver their output to, the rest of the SoS only through a single system in their module, each module becomes a potential single point of failure. A more integrated architecture could allow the SoS to maintain at least a fraction of the functionality of an impacted task. This result aligns with a topological study of water distribution network resilience [31], which suggested that higher link density and lower modularity improve resilience. Another network theory-based study of modularity as a systems design rule found that modularity was negatively correlated with system robustness [32]. However, there is also opposing evidence in the literature. A study of a multisensor target tracking SoS found that more modular architectures (measured with a different metric, the *singular modularity index* [56]) were better able to withstand the tested disruption scenarios [17]. This contrasting evidence suggests that higher modularity may improve resilience only under specific types of operating scenarios. For example, modular architectures may be better at preventing a cyber-attack in one module from affecting other modules.
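Modularity can be illustrated with Newman's *Q*, which compares the fraction of links that fall inside modules with the fraction expected in a random network with the same node degrees. This is a sketch assuming that definition for *Q*_{N} and a known partition of the nodes into modules (e.g., operational groups); it is not necessarily the exact formulation used in this study, and it differs from the singular modularity index mentioned in the paragraph above. The function `modularity` and the example network are hypothetical.

```python
def modularity(adj, communities):
    """Newman's modularity Q of a node partition of an undirected,
    unweighted graph given as a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = float(sum(degree))  # twice the number of edges
    label = {}
    for c, members in enumerate(communities):
        for node in members:
            label[node] = c
    q = 0.0
    for i in range(n):
        for j in range(n):
            if label[i] == label[j]:
                # observed link minus the links expected in a random graph
                # with the same degree sequence (configuration model)
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Two tightly knit triangles (modules) joined by a single bridging link 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))  # ≈ 0.357 (= 5/14)
print(modularity(adj, [{0, 1, 2, 3, 4, 5}]))    # ≈ 0.0 (a single module)
```

Here the bridging link 2-3 is exactly the kind of single inter-module path the paragraph above identifies as a potential point of failure; adding more links between the triangles would lower *Q* and make the architecture more integrated.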

A lower value of generalization (or vulnerability), or a higher value of the specialized predator ratio, indicates that the network has specialized consumers (or producers). The results showed moderate-to-strong positive correlations between generalization (and vulnerability) and the tested resilience and resilience–affordability indicators, and moderate-to-strong negative correlations between the specialized predator ratio and the same indicators. These results were consistent across both case studies and indicate that specialization of consumers/producers is *detrimental* to the ability of the SoS to respond to unexpected disruptions. Although specialized producer/consumer pairs may be more efficient (for best interoperability), they increase the dependence on each specialized producer/consumer and make the SoS less able to adapt to disruptions. A relevant example is the lack of resilience in single-source supply chains [57].
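The specialization metrics can be sketched from a 0/1 structural adjacency matrix **F**. The sketch assumes definitions common in ecological food-web analysis: generalization *G* is the mean number of prey (producers) per predator (consumer), vulnerability *V* is the mean number of predators per prey, *P*_{r} is the ratio of prey count to predator count, and the specialized predator ratio *P*_{s} is the fraction of predators with exactly one prey. The orientation `F[i][j] == 1` when actor *i* supplies actor *j*, and the helper names, are assumptions for illustration.

```python
def ena_structural_metrics(F):
    """Food-web style metrics from a 0/1 structural matrix F, assuming
    F[i][j] == 1 when actor i provides flow to actor j (i is prey, j is predator)."""
    n = len(F)
    links = sum(sum(row) for row in F)
    prey = [i for i in range(n) if any(F[i])]                            # providers
    predators = [j for j in range(n) if any(F[i][j] for i in range(n))]  # consumers
    prey_per_predator = [sum(F[i][j] for i in range(n)) for j in predators]
    return {
        "P_r": len(prey) / len(predators),  # prey-to-predator ratio
        "G": links / len(predators),        # generalization: mean prey per predator
        "V": links / len(prey),             # vulnerability: mean predators per prey
        # specialized predator ratio: share of predators with exactly one prey
        "P_s": sum(k == 1 for k in prey_per_predator) / len(predators),
    }


def from_links(n, links, bidirectional=False):
    """Build an n x n structural matrix from (provider, consumer) pairs."""
    F = [[0] * n for _ in range(n)]
    for i, j in links:
        F[i][j] = 1
        if bidirectional:
            F[j][i] = 1
    return F


# Two specialized producer/consumer pairs: 0 -> 1 and 2 -> 3
specialized = from_links(4, [(0, 1), (2, 3)])
# Two producers that can each supply both consumers
generalized = from_links(4, [(0, 2), (0, 3), (1, 2), (1, 3)])
# Bidirectional links (symmetric F), as for communication interactions
symmetric = from_links(4, [(0, 1), (1, 2), (0, 2), (2, 3)], bidirectional=True)

for name, F in [("specialized", specialized), ("generalized", generalized),
                ("symmetric", symmetric)]:
    print(name, ena_structural_metrics(F))
```

The one-to-one pairs give the lowest *G* and *V* and the highest *P*_{s}, the specialized pattern associated with lower resilience in both case studies. The symmetric matrix also shows why bidirectional links force *P*_{r} = 1 and *G* = *V*: every node that provides a link also receives one.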

Finally, the DoSO metric had moderate-to-strong negative correlations with the resilience and resilience–affordability indicators in both case studies. DoSO measures how constrained the flow pathways in a network are: lower values indicate more flexible (redundant) pathways, and higher values indicate more constrained ones. Greater flexibility of pathways for information sharing or material flow is expected to improve the ability of an SoS to adapt to disruptions. Pathway flexibility in an SoS can be increased in several ways, such as utilizing physical/functional redundancy of constituent systems, adding localized capacity, and increasing inter-node interactions. However, these measures can also hamper efficiency-related objectives such as affordability. Therefore, a very low DoSO is not expected to be favorable except when resilience/adaptability is the most important attribute or when pathway flexibility measures can be implemented inexpensively. For example, some of the notional architectures in the second case study, the DMT-SoS, had DoSO values approaching 0. This extreme pathway flexibility was achieved using a high density of (bidirectional) communication interactions, and adding communication pathways was relatively inexpensive compared to adding constituent systems. Therefore, the low DoSO values (greater pathway flexibility) of the DMT-SoS architectures improved adaptability for resilience without hampering affordability.
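DoSO can be sketched with the information-theoretic quantities of ecological network analysis: the ratio of the network's ascendency to its development capacity (the latter is unrelated to the development cost DC in the Nomenclature), computed from the flow matrix **T**. This sketch assumes those standard definitions, uses natural logarithms (the base cancels in the ratio), and assumes **T** already contains whatever boundary (import/export) compartments the analyst includes. The form `ecological_fitness` uses for *R*_{eco} is also an assumption, included only to show why both extremes of DoSO can be unfavorable.

```python
import math


def degree_of_system_order(T):
    """DoSO = ascendency / development capacity for a flow matrix T, where
    T[i][j] >= 0 is the flow from compartment i to compartment j."""
    n = len(T)
    total = sum(sum(row) for row in T)                            # T.. (total throughput)
    out_flow = [sum(row) for row in T]                            # T_i.
    in_flow = [sum(T[i][j] for i in range(n)) for j in range(n)]  # T_.j
    ascendency = 0.0  # how constrained (organized) the flows are
    capacity = 0.0    # upper bound on ascendency (diversity of flows)
    for i in range(n):
        for j in range(n):
            t = T[i][j]
            if t > 0:
                ascendency += t * math.log(t * total / (out_flow[i] * in_flow[j]))
                capacity -= t * math.log(t / total)
    return ascendency / capacity


def ecological_fitness(doso):
    """Assumed form R_eco = -e * a * ln(a): 1 at a = 1/e, 0 at a = 0 or 1."""
    if doso <= 0.0:
        return 0.0
    return -math.e * doso * math.log(doso)


# Single pathway 0 -> 1 -> 2: every unit of flow has exactly one route
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
# Every compartment sends equal flow to every compartment, itself included
mesh = [[1, 1],
        [1, 1]]

print(degree_of_system_order(chain))   # ≈ 1.0
print(degree_of_system_order(mesh))    # → 0.0
print(ecological_fitness(1 / math.e))  # ≈ 1.0
```

A single chain of flows reaches the fully constrained extreme (DoSO = 1), while equal exchange among all compartments reaches the fully redundant extreme (DoSO = 0); under the assumed *R*_{eco} form, both extremes score 0.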

The authors acknowledge the limitations of using the linear correlation coefficient as a measure of the usefulness of graph-theoretic metrics for SoS resilience. These correlations could be biased by the range of the tested variables, by potential nonlinear associations, and by outliers in the data. However, this does not negate the value of this study as a starting point for identifying promising graph-theoretic approaches that can serve as decision-support tools in the early stages of SoS architecture design for resilience.

While the SoS case studies tested here are simplified, they are representative of real-life SoS applications. The use of realistic design constraints and operational requirements limited the range/distribution of some graph-theoretic metrics over the generated architectures in both case studies (discussed in Sec. 5.1). While these factors can affect the strength of the correlations tested here, similar constraints will arise when graph-theoretic evaluations are applied to real-life SoS applications. Testing the correlations under these constraints therefore provides a more realistic assessment of the usefulness of the various graph-theoretic metrics in the process of designing SoS for resilience.

## 6 Concluding Remarks

This study highlights the potential for a number of graph-theoretic metrics to be used in designing resilient SoS. The metrics central point dominance, modularity, specialized predator ratio, generalization, vulnerability, and degree of system order were observed to have *moderate-to-strong* correlations with the resilience and resilience–affordability tradeoff indicators for the two SoS case studies. Graph-theoretic approaches can be used to quantify accepted resilience heuristics [9,16]. For instance, the specialized predator ratio, generalization, and vulnerability can quantify physical redundancy in SoS. The degree of system order can quantify physical and functional redundancy and localized capacity in SoS architectures. More traditional engineering metrics, such as modularity and density, can quantify inter-node communication within operational groups and across the overall SoS architecture.

An interesting avenue for future work is to develop indicators based on combinations of two or more metrics. For example, central point dominance and generalization (or vulnerability) could be combined to account for both centrality and producer/consumer specialization. Such combinations could potentially provide better resilience–affordability indications than any single metric. It would also be prudent to consider whether the metrics to be combined are highly correlated with each other for a given SoS application: If two metrics are highly correlated over a given SoS design space, a combination is unlikely to provide unique/additional insight into the architectures’ fitness for resilience or affordability. Different approaches to combining metrics could also affect their usefulness as resilience–affordability indicators.

It is worth noting that the graph-theoretic metrics tested in this study are *static* analyses; they do not capture the *dynamic* behaviors of the constituent systems. However, they are promising tools for a multifidelity SoS architecture design and optimization process. Graph-theoretic metrics can be calculated early in the design process without detailed disruption scenario models. Used as indicators of resilience heuristics, these metrics support the development of a preliminary set of *favorable* SoS architectures for SoS engineers. Given the context dependency of the correlation strengths observed in this work, the authors recommend calibrating graph-theoretic metrics against historical data on successful and unsuccessful architectures for specific applications. In the later stages of design and deployment, when detailed disruption models and simulation/testing capabilities are available, high-fidelity SoS evaluation approaches can be applied to the reduced design space to select the final design. This process can significantly reduce the expense of testing and evaluating SoS architecture alternatives, and potentially transform the early stages of the SoS design process while *complementing* and *advancing* existing techniques.
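The multifidelity process described above can be outlined as a two-stage screen. In this hypothetical sketch, cheap static metrics (here DoSO and generalization) filter the candidates against thresholds that would, per the recommendation above, be calibrated on historical data, and a placeholder function stands in for the expensive disruption simulation. All names, thresholds, and scores are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Architecture:
    name: str
    doso: float            # degree of system order from the static graph model
    generalization: float  # mean producers per consumer from the graph model


def screen(candidates: List[Architecture], max_doso: float,
           min_generalization: float) -> List[Architecture]:
    """Stage 1 (cheap): keep architectures whose static graph metrics fall
    within thresholds (hypothetically calibrated on historical data)."""
    return [a for a in candidates
            if a.doso <= max_doso and a.generalization >= min_generalization]


def select_final(shortlist: List[Architecture],
                 high_fidelity_eval: Callable[[Architecture], float]) -> Architecture:
    """Stage 2 (expensive): run the detailed evaluation only on the shortlist
    and return the highest-scoring architecture."""
    return max(shortlist, key=high_fidelity_eval)


calls = []  # records which architectures reached the expensive stage


def placeholder_simulation(arch: Architecture) -> float:
    """Stand-in for a detailed disruption simulation (NOT a resilience model)."""
    calls.append(arch.name)
    return arch.generalization - arch.doso


candidates = [Architecture("A", 0.90, 1.0), Architecture("B", 0.35, 2.0),
              Architecture("C", 0.25, 1.8), Architecture("D", 0.70, 2.5)]
shortlist = screen(candidates, max_doso=0.5, min_generalization=1.5)
best = select_final(shortlist, placeholder_simulation)
print([a.name for a in shortlist], best.name, calls)  # ['B', 'C'] B ['B', 'C']
```

Only two of the four candidates reach the expensive stage, which is the cost reduction the multifidelity process aims for; in practice the high-fidelity evaluation would replace the placeholder score.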

## Acknowledgment

The authors would like to thank Dr. Daniel Selva, Assistant Professor in the Department of Aerospace Engineering at Texas A&M University, for his time and guidance in setting up the disaster monitoring and tracking service case study. A.C. gratefully acknowledges the financial support from the Texas A&M Energy Institute’s Graduate Fellowship, and the J. Mike Walker ’66 Department of Mechanical Engineering’s Graduate Summer Research Grant.

## Nomenclature

- *D* = density
- **F** = structural adjacency matrix
- *G* = generalization
- **T** = flow matrix
- *V* = vulnerability
- *H*_{D} = degree heterogeneity
- *P*_{r} = prey-to-predator ratio
- *P*_{s} = specialized predator ratio
- *Q*_{N} = modularity
- *R*_{eco} = ecological fitness function
- DC = development cost
- RC = resilience-cost function
- CPD = central point dominance
- ENA = ecological network analysis
- MoP = measure of performance
- SoS = system of systems
- DoSO = degree of system order
- DMT-SoS = disaster monitoring and tracking system of systems
- MRD-SoS = manufacturing/resource distribution system of systems
- UNODF = unipartite nestedness based on overlap and decreasing fill
- *λ*_{2} = algebraic connectivity

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

## References

*System of Systems Engineering Innovations for the 21st Century*