Optimizing the topology of complex infrastructure systems can minimize the impact of cascading failures due to an initiating failure event. This paper presents a model-based design approach for the concept-stage robust design of complex infrastructure systems, as an alternative to modern network analysis methods. This approach focuses on system performance after cascading has occurred and examines design tradeoffs of the resultant (or degraded) system state. In this research, robustness is classically defined as the invariability of system performance under uncertain failure events, implying that a robust network has the ability to meet minimum performance requirements despite the impact of cascading failures. This research is motivated by catastrophic complex infrastructure system failures such as the August 14, 2003 blackout, which highlighted the vulnerability of systems such as the North American power grid (NAPG). A mathematical model was developed using an adjacency matrix, where removing network connections simulates uncertain failure events. Performance degradation is iteratively calculated as failures cascade throughout the system, and robustness is measured by the lack of performance variability over multiple cascading failure scenarios. Two case studies are provided: an extrapolated IEEE 14 test bus and the Oregon State University (OSU) campus power network. The overarching goal of this research is to understand key system design tradeoffs between robustness, performance objectives, and cost, and to explore the benefits of optimizing network topologies during the concept-stage design of these systems (e.g., microgrids).

## Introduction

As complex infrastructure systems become increasingly distributed and heterogeneous (e.g., microgrids, data networks), the likelihood of uncontrollable cascading failure grows, despite ongoing research in this field and advances in domain-specific technology [1–7]. Because these systems exist in highly stochastic environments, it is difficult (or even impossible) for designers to achieve complete immunity (i.e., robustness) to failures from uncertain events [8]. This work asserts that concept-stage complex system designs must attempt to capture robustness by examining the system-level impact of failure propagation, and must explore topological design tradeoffs by evaluating degraded system performance after a cascading failure has occurred.

The intent of this work is to expand the current understanding and knowledge base of system-level design tradeoffs (e.g., performance and robustness) during the concept-stage design phase. For example, if a system's design requirements were entirely performance-based, a deterministic design optimization approach would suffice. Alternatively, if a system is required to operate continually at a predetermined performance level and be immune to all potential failure scenarios, a robust (and expensive, or practically impossible) design approach could be explored. In this paper, the robust topology design approach examines concept-stage design tradeoffs between system performance and robustness while considering the negative consequences of cascading failure from uncertain events. However, since many complex infrastructure systems already exist, and it may be impractical (or even impossible) to change existing topologies, this work is geared toward the design of new systems (e.g., microgrids).

This paper introduces a model-based topology optimization approach for the concept-stage robust design of complex infrastructure systems, as an alternative to modern network analysis methods. Two power system case studies, the IEEE 14 and the OSU campus power system, are presented to demonstrate the vulnerability of these systems. The motivation for examining these cases comes from the August 14, 2003 blackout of the NAPG, where a single fault triggered an uncontrollable cascading failure [9,10].

## Background

The literature defines system robustness as the ability of a system to operate as designed, despite the impacts of internal and external sources of uncertainty [11,12]. While the impact of uncertainty can be predicted in some systems (e.g., manufacturing), many complex systems are particularly susceptible to failures from uncertain events, since they are finely tuned to meet a specific objective (or set of objectives) and often do not directly address failure. To understand these cascading issues, current methods have employed graph theory and network analysis for evaluating emergent behavior [2,13–18]. While significant progress has been made to measure system performance and robustness using metrics such as node degree and centrality, there are still additional opportunities to explore the impact of cascading failures due to uncertain events in complex systems using model-based optimization.

### Robust Design.

There are many existing approaches for capturing robustness in complex system design; however, these methods often focus on either network analysis metrics, or component-level interactions, and do not specifically address robustness as the minimization of performance variability [19–30]. Robust design methods have been used historically to minimize unintended product manufacturing variability stemming from uncontrollable environmental effects [11]. Chang et al. [28] have expanded on Taguchi's fundamental robust design approach, and have explored scaling these design principles to complex systems, where each subsystem can be optimized independent of each other with limited knowledge of top-level system requirements. This work supports the need for a complex system robust design approach that accounts for system-level sources of uncertainty, without the need to understand or reduce the source. However, the challenge is creating designs that are robust to various types of system uncertainty (e.g., extreme weather, directed attacks) common in distributed systems, and it is difficult to predict the impact of cascading failures resulting from an initiating fault event.

This paper suggests utilizing a model-based topology optimization approach as a means of increasing robustness during the concept-stage design phase, building on existing network theory approaches, which are discussed in Sec. 2.2.

### Network Theory and Topological Graph Models.

Based on the distributed nature of many complex infrastructure systems, understanding the relationship between different topology designs and a system's response to external faults or perturbations is key when designing for system robustness. The literature recognizes the importance of considering topology when examining complex infrastructure systems, using network theory to mathematically represent the system topology, often with an adjacency matrix [3,13,15–17,31,32].

where $n_G$ is the number of generation nodes, $n_D$ is the number of demand nodes at the unperturbed network state, and $n_{G_i}$ is the number of generation units able to supply flow to distribution (demand) vertex $i$ after disruptions take place. Subsequent averaging is done over every demand node $i$ of the network.

where $\epsilon_{ij}$ denotes the efficiency of the most efficient path between $i$ and $j$. In this definition, the undirected graph $G$ is represented by an $N \times N$ adjacency matrix $\{e_{ij}\}$, where $0 < e_{ij} \le 1$ if there is an arc between node $i$ and node $j$, and $e_{ij} = 0$ otherwise.
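The equations that these definitions accompany are not reproduced in this section. As a minimal sketch, assuming the common forms $C_L = 1 - \frac{1}{n_D}\sum_i n_{G_i}/n_G$ for the generator-reachability metric and $E(G) = \frac{1}{N(N-1)}\sum_{i \neq j} \epsilon_{ij}$ with $\epsilon_{ij} = 1/d_{ij}$ for efficiency, both can be computed with breadth-first search on a symmetric adjacency matrix. Here $d_{ij}$ is the hop count of the shortest path, so the weights $e_{ij}$ only mark whether an arc exists (a simplification); the function names are illustrative.

```python
from collections import deque


def bfs_hops(adj, source):
    """Hop distance from `source` to every reachable node (adj: symmetric matrix)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, weight in enumerate(adj[u]):
            if weight > 0 and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connectivity_loss(adj, generators, demands):
    """Assumed form: C_L = 1 - (1/n_D) * sum_i n_Gi / n_G over the original demand nodes."""
    gens = set(generators)
    n_g = len(gens)
    shares = [len(gens & set(bfs_hops(adj, i))) / n_g for i in demands]
    return 1.0 - sum(shares) / len(demands)


def global_efficiency(adj):
    """Assumed form: E = 1/(N(N-1)) * sum_{i != j} 1/d_ij; unreachable pairs add 0."""
    n = len(adj)
    total = sum(1.0 / d
                for i in range(n)
                for j, d in bfs_hops(adj, i).items() if j != i)
    return total / (n * (n - 1))
```

To evaluate a disruption, zero the affected entries of the matrix and call `connectivity_loss` with the generator and demand lists of the unperturbed network, matching the definitions of $n_G$ and $n_D$ above.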

Braha [15] has extensively examined the statistical structural properties of complex, man-made systems, and shows that topology has a major impact on their functionality, dynamics, robustness, and fragility. While this work is directly related to complex product development networks, the approach is generalizable to other “small world” networks such as power systems, where topology characteristics are typically a function of the average distance between two nodes and a clustering coefficient of the topology. Chinellato et al. [4] have moved beyond static systems, and investigated the dynamic response of different networks (including small world) to external perturbations. Hill and Braha [37] have also examined dynamic systems, noting that centrality metrics from static or slow-moving networks may need to be adapted for dynamic networks. However, since the research presented in this paper focuses on concept-stage topology design, dynamic network analysis is not addressed here, and will be explored in future work.

While the above network analysis approaches provide critical design information, it should be noted that these mathematical models are still abstractions of complex infrastructure systems, and could result in misleading information. Hines et al. [3] have explored these concerns, comparatively evaluating topology-based network analysis metrics within standard test cases (e.g., IEEE 14) to predict the magnitude of failures based on initiating faults. Their work concluded that while topological network metrics can provide general information and predict trends regarding a system's robustness, they should be used in conjunction with a physics-based model to improve the level of system abstraction, and avoid misleading information.

## Contributions

As previously stated, the complex infrastructure systems literature presents several approaches for evaluating robustness; however, an opportunity exists to bridge the gap between network analysis and robust design at the component level. A generalized approach is needed to couple classical robust design strategies at the component level (i.e., minimizing performance variance) with topological strategies at the network level (i.e., maximizing connectivity). This is particularly challenging, as these two approaches are often presented as mutually exclusive, and each evaluates system robustness at a different level of system abstraction. For example, component-level models are based on physical system properties (e.g., size, current, and voltage), whereas network models rely on topology relationships (e.g., node degree and distance between nodes). In addition, many current approaches focus on failure prevention instead of performance degradation, i.e., the ability of a system to still operate at reduced capacity. Since most system failures occur due to uncertain initiating fault events, minimizing both the magnitude of performance degradation and its variability across different possible fault scenarios is essential when evaluating and predicting system robustness.

This paper combines elements of classical robust design and model-based design with network topology information to minimize the degraded performance variability of a system after an uncontrollable cascading failure. A key element of this approach is the use of both physical parameters and topological relationships to create a balanced system abstraction. This approach presents design tradeoffs between resulting system performance and performance variability of a degraded infrastructure system *after* a cascading failure has occurred. Using this method, concept-stage static network topologies (e.g., microgrids) can be designed to be robust to uncertain events (e.g., natural disasters and directed attacks) that often affect highly distributed systems. Specifically, this research recognizes stochastic failure events and accounts for the ability to meet minimum performance requirements while also considering cost.

Motivated by the NAPG case study, two case studies are presented. The first is an extrapolated version of the IEEE 14 test case and is represented by an adjacency matrix, where various system attributes such as power generation, regional demand, and system topology are included [38]. The second case study is the OSU Corvallis campus power system [39]. MATPOWER, an open source power system analysis toolbox designed to operate within the MATLAB environment, is used to calculate the quasi-steady-state direct current power flow (DCPF) for both cases [40,41]. This yields two model-based optimization models that provide alternative system topology designs based on specific test case attributes such as power generation and demand. Topology optimization is performed with a multi-objective simulated annealing (SA) algorithm. Since it is not cost effective (or possible) to change existing power system topologies, applications of this research approach include concept-stage microgrid designs and reworks of existing systems; the approach is not intended as a dynamic control solution.

## Motivating Case Study: The North American Power Grid

The magnitude and frequency of large-scale and uncontrollable cascading failures within the NAPG have not declined during the past few decades, highlighting the need to consider robustness when designing or reworking these types of complex infrastructure systems [10,42,43]. The North American Electric Reliability Council (NERC), the Electric Power Research Institute (EPRI), and the Edison Electric Institute attribute these failures to federal reorganization and deregulation of the NAPG, citing growing discontinuity between transmission and distribution systems [44]. Major system failures, where a single power line fault caused an uncontrollable cascading failure (e.g., the blackout of 2003), highlight the immediate vulnerability of complex systems such as the NAPG, and motivate the need for concept-stage design strategies that could minimize the impact and variability of these catastrophic events [9,10]. Although the literature shows a large breadth of research in the power systems domain, data provided by the NERC show that the frequency of blackouts has not decreased over the past 25 years [42]. Since the NAPG was originally constructed ad hoc in response to sprawling population and increasing demand, topology optimization was not a primary consideration [45].

### Power System Optimization.

The literature on power system optimization encompasses a vast array of strategies for achieving system objectives and meeting design/operating requirements. Probabilistic risk assessment (PRA) methods are currently considered a best practice for identifying the likelihood that a fault will ultimately lead to a cascading system failure. The Long Island Power Authority uses a PRA tool, developed by EPRI, to determine the likelihood and magnitude of occurrences within its local power system [46]. However, this tool does not account for cascading failures between different utilities, or for the optimal subsystem design with respect to system-level (e.g., regional, national) objectives. This is a challenge for researchers developing accurate computer models intended to predict subsystem/system-level interactions.

In terms of dynamic power system response to failure, several accepted hardware solutions are used, such as the flexible AC transmission system (FACTS), introduced by Hingorani [47]. FACTS devices enable control of transmission line power flow to optimize loading [48]. Lininger et al. [23] incorporated the FACTS device into a computer model using a maximum flow algorithm to detect failure types in various outage scenarios. Carreras et al. [21] built on this approach by creating a computer model to replicate power outages due to excessive transmission line loading. Pinar et al. [49] have also addressed power grid vulnerability by outlining optimization strategies for power line failure prevention. Pahwa et al. [2] used the IEEE 300 test bus to examine power system failure modes that lead to cascading system failures. Cascading failure mitigation strategies often include targeted range-based load shedding and intentional islanding [50].

The examination of partial power system failures (i.e., degraded power systems) is also explored in the literature. Talukdar et al. [51] have focused on increasing the accuracy of failure predictions and the partial functionality of a power system after a fault event instead of focusing exclusively on failure prevention. This approach addresses system uncertainty, during dynamic switching operations, when attempting to completely recover power system performance. Fairley [9] supports this approach, noting that uncontrollable cascading failures are inherent in such large complex systems, and researchers should focus on mathematical models for failure mitigation, instead of elimination, when trying to increase system robustness.

### Power System Parameter Design.

In order to apply classical robust design techniques to power system modeling, Taguchi's parameter design method is used to visualize how different sources of uncertainty affect the system's response to different inputs (Fig. 1) [11,52]. As seen in Fig. 1, control factors are elements that can be varied within the system, and noise factors are environmental elements that cannot be controlled. Based on historical failure events, both types of factors should be considered. Failures from external events are addressed in the context of Type I (parameter) robust design, which minimizes performance loss from external noise (e.g., weather). Type II (tolerance) robust design reduces performance losses due to uncertainty in internal control variables within the system (e.g., topology).

Applying Type I and Type II robust design to complex systems was explored by Lewis et al. [53], with the intention of identifying the system-level impact of uncertainty from both internal system attributes and the external environment. The goal of this research was to formulate a design approach that could meet mean system-level performance requirements while simultaneously minimizing performance variation about the mean.

This relationship is illustrated in Fig. 2, which shows how performance-based solutions may lie close to the boundary (i.e., constraint) of a design objective, where the performance variance is largest [52,53]. Although Fig. 2 shows that the value of the function to be minimized is slightly larger at the robust solution point, there is significantly less performance variation about the mean. This approach is relevant to complex system design, as uncertainties from both control and noise factors exist. Accordingly, this research aims to understand how the application of classical robust design can benefit concept-stage complex system design.
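The tradeoff described for Fig. 2 can be reproduced with a toy one-dimensional example (hypothetical; it is not the function plotted in the figure). The objective `f` has a narrow minimum at $x = 1$, standing in for a performance-optimal design near a constraint, and a broad, slightly higher minimum at $x = 5$. Shifting each design by the same fixed noise values shows that the nominally better point has a far larger variance.

```python
from statistics import mean, pvariance


def f(x):
    """Illustrative objective: sharp minimum at x = 1, flatter and slightly higher one at x = 5."""
    return min(50.0 * (x - 1.0) ** 2, 0.5 * (x - 5.0) ** 2 + 0.2)


# Fixed perturbations standing in for uncontrollable noise on the design variable.
NOISE = [-0.3, -0.15, 0.0, 0.15, 0.3]


def perturbed_stats(x):
    """Mean and population variance of f when design x is shifted by each noise value."""
    values = [f(x + d) for d in NOISE]
    return mean(values), pvariance(values)


for x in (1.0, 5.0):
    m, v = perturbed_stats(x)
    print(f"x = {x}: nominal f = {f(x):.3f}, mean = {m:.3f}, variance = {v:.5f}")
```

Replacing the fixed `NOISE` list with random draws turns this into a simple Monte Carlo estimate of the same statistics.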

## Methodology

### Solving Quasi-Steady State System Performance in matpower.

For this work, MATPOWER is used to calculate the quasi-steady-state DCPF for both test cases [40,41,54]. The DCPF is a linearized, simplified version of an alternating current power flow (ACPF): it examines only active power and neglects transmission line losses and reactive power management. The DCPF calculation assumes static system conditions and numerically analyzes key system attributes such as topology, generator limits, transmission line specifications, and power injection at each bus (i.e., node). The DCPF analysis provides valuable information on active power limits and on transmission line failures due to overloading. System attributes are input to MATPOWER using specifically formatted case files.
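MATPOWER performs this calculation internally (e.g., through its `rundcpf` function). To show what the DCPF computes, the following is an independent sketch rather than MATPOWER's implementation: assuming flat voltage magnitudes, small angle differences, and lossless lines, the active power injections $P$ and bus angles $\theta$ satisfy $B\theta = P$, where $B$ is built from line reactances, the slack bus angle is fixed at zero, and the flow on line $(i, j)$ is $(\theta_i - \theta_j)/x_{ij}$. The helper names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular B matrix: the network is probably islanded")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dc_power_flow(n_bus, lines, injections, slack=0):
    """DC power flow: B * theta = P with theta[slack] = 0.

    lines: (from_bus, to_bus, reactance) tuples in per unit.
    injections: net active power per bus (generation minus demand), per unit;
    the slack entry is ignored because the slack bus balances the system.
    Returns (bus voltage angles in radians, line flows in per unit).
    """
    b = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, x in lines:
        b[i][i] += 1.0 / x
        b[j][j] += 1.0 / x
        b[i][j] -= 1.0 / x
        b[j][i] -= 1.0 / x
    keep = [k for k in range(n_bus) if k != slack]  # drop the slack row and column
    theta_red = solve_linear([[b[r][c] for c in keep] for r in keep],
                             [injections[k] for k in keep])
    theta = [0.0] * n_bus
    for k, t in zip(keep, theta_red):
        theta[k] = t
    flows = [(theta[i] - theta[j]) / x for i, j, x in lines]
    return theta, flows
```

For example, with three identical lines of reactance 0.1 p.u. in a triangle and loads of 1.0 and 0.5 p.u. at buses 1 and 2, the slack bus supplies 1.5 p.u., split as roughly 0.83 and 0.67 p.u. on its two lines. Comparing the magnitude of each flow with the line rating is what identifies overloaded lines.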

In this work, only the DCPF is considered, since the presented design approach is intended for concept-stage steady-state design only and does not consider dynamic effects. For some designs (e.g., microgrids), however, the DCPF simplification may affect the results of the presented method [55,56]. Solving an ACPF adds several reactive power variables to the analysis, which increases model complexity and computational time. In practice, the reactive power required to support an actual microgrid could potentially increase individual line loading, and future work is needed to examine this hypothesis. However, the concept of a supporting performance flow (i.e., reactive power) is specific to power systems and does not generalize to other complex system domains such as communication and traffic networks.

### Case 1: Extrapolated Three-IEEE 14 Test Bus System.

The first case study presented is an extrapolated version of the IEEE 14 test case (Fig. 3). The original IEEE 14 test case has 14 nodes, including two power generators and 12 demand nodes [38]. In this test case, physical topology and line lengths are directly related to the transmission line capacities. These lengths were estimated based on research from the Power Systems Engineering Research Center [57]. In addition, line lengths are directly proportional to connectivity costs:

where $A_{ij}$ is the adjacency matrix representing the case study topology, $L_{ij}$ is the unit line length between all pairs of nodes, $C_{ij}^{\text{Length}}$ is the nominal unit line length cost, and $T_{ij}$ is the cost coefficient based on the type of transmission line (e.g., low voltage and high voltage).
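The cost equation itself is not reproduced in this section. A minimal sketch, assuming the topology cost is the sum of $A_{ij} L_{ij} C_{ij}^{\text{Length}} T_{ij}$ over the upper triangle of the adjacency matrix (so each line is counted once), is:

```python
def topology_cost(adj, length, unit_cost, type_coeff):
    """Assumed form: sum over i < j of A_ij * L_ij * C_ij^Length * T_ij.

    All arguments are n x n matrices (nested lists); only the upper triangle
    is read, matching an upper triangular adjacency matrix.
    """
    n = len(adj)
    return sum(
        adj[i][j] * length[i][j] * unit_cost[i][j] * type_coeff[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )
```

For example, a 10-unit line and a 5-unit line at a nominal unit cost of 2, with a type coefficient of 1.5 on the second line, cost $10 \cdot 2 \cdot 1 + 5 \cdot 2 \cdot 1.5 = 35$.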

In the extrapolated model presented, three IEEE 14 test bus systems (3-IEEE 14) are combined into a single synthetic power system with a total of 60 lines. The three IEEE 14 subnetworks are connected to each other with three *interconnection* lines with matching line voltages. The interconnection line cost is based on a fixed line length of 241 km. This length was chosen because it is similar to the longest line length of the IEEE 14 test case, and it represents a feasible interconnection distance between subnetworks for the 3-IEEE 14 synthetic case (e.g., microgrid) study. Figure 4 displays a visual representation of the 3-IEEE 14 subnetworks, with the three interconnection lines between them. Figure 5 shows an example of the corresponding upper triangular adjacency matrix.
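The upper triangular adjacency matrix of the combined system can be assembled by placing one copy of the subnetwork matrix per block on the diagonal and adding the interconnection lines off the diagonal. This is a sketch; which buses the interconnection lines join is not specified in this section, so the `interconnections` argument is a hypothetical input.

```python
def combine_subnetworks(sub_adj, copies, interconnections):
    """Upper triangular adjacency matrix for `copies` identical subnetworks.

    sub_adj: n x n adjacency matrix of one subnetwork (upper triangular or symmetric).
    interconnections: ((copy_a, bus_a), (copy_b, bus_b)) pairs (hypothetical input).
    """
    n = len(sub_adj)
    size = n * copies
    a = [[0] * size for _ in range(size)]
    for k in range(copies):
        offset = k * n  # global index of bus i in copy k is k * n + i
        for i in range(n):
            for j in range(i + 1, n):
                if sub_adj[i][j] or sub_adj[j][i]:
                    a[offset + i][offset + j] = 1
    for (copy_a, bus_a), (copy_b, bus_b) in interconnections:
        u, v = copy_a * n + bus_a, copy_b * n + bus_b
        a[min(u, v)][max(u, v)] = 1  # keep the matrix upper triangular
    return a
```

For three IEEE 14 copies the result is a 42 × 42 matrix, and the global index of bus `k` in copy `c` is `c * 14 + k`.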

### Case 2: Oregon State University Corvallis Campus Power System.

The OSU Corvallis campus power system is the second case study presented. OSU's local microgrid receives its power from the local power utility, a power cogeneration facility, and two large photovoltaic arrays, which are all modeled as generation nodes [39]. The OSU case study information is briefly summarized in Table 1.

### Topology Generation and Connectivity Testing.

The optimization begins by generating an *initial* topology that is “close” to an *optimal* topology. For this task, a modified Waxman topology generation algorithm is used, which places points in Cartesian two-dimensional space [58]. Lines are added between two nodes *u* and *v* with a probability that decreases with the distance between them:

$$P(u,v) = \alpha \, e^{-d_{u,v}/(\beta L)}$$

where *α* and *β* are parameters, $d_{u,v}$ is the distance between *u* and *v*, and *L* is the maximum internode distance. In the 3-IEEE 14 case study, the distance matrix (*d*) was created using the Power Systems Engineering Research Center's estimates, in addition to the *interconnection* line distances connecting the three IEEE 14 standard cases [57]. For the OSU campus power system, the node locations are taken from the actual university topology. Since this generation approach uses a modified Waxman topology algorithm, the nodes *u* and *v* are selected by sampling one line at a time from a normalized distribution (so the value of *α* cancels and is not relevant):

$$P(u,v) = \frac{e^{-d_{u,v}/(\beta L)}}{\sum_{(u',v')} e^{-d_{u',v'}/(\beta L)}}$$

where *u* and *v* can only be selected from the set of lines that do not yet exist. Since both case study topologies represent an actual (or a representation of an actual) microgrid, the starting topology is required to be connected, meaning every node can be reached from every other node; in particular, each node must be connected to at least one other node. In addition, this line generation process further restricts *u* and *v* to lines that connect an unconnected node to the main topology, forming a spanning tree. After the graph is connected and there are no unconnected nodes, additional lines are added until the desired number (based on the line density of the original case study topology) is reached.

While the value of *α* is not significant when using a modified Waxman topology generator, the choice of *β* directly impacts the mix of short and long lines (and subsequently the cost) of the system. As *β* approaches zero, the probability of choosing minimum-distance edges increases. Since distance is directly related to cost, one may in a sense “choose” the desired cost of the graph by tuning *β*. Since *β* controls the relative density of short and long links (smaller values favor short links more strongly), the *β* value for each case study is “tuned” so that the ratio of short to long links is similar to that of the original case topology.

Using this iterative approach, *β* = 0.1 was chosen for the 3-IEEE 14 case and *β* = 0.03 for the OSU campus case. In both case studies, the number of lines in the Waxman-generated topology is equal to the number of lines in the original topology (60 for the 3-IEEE 14 case and 288 for the OSU campus case).
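The two-phase generation process can be sketched as follows, assuming the normalized Waxman weight $e^{-d_{u,v}/(\beta L)}$ (the prefactor $\alpha$ cancels on normalization) and Python's `random.choices` for weighted sampling. Starting the spanning tree from a random node is an assumption; the section does not say how the first node is chosen.

```python
import math
import random


def modified_waxman(dist, n_lines, beta, rng=None):
    """Sketch of a modified Waxman generator (spanning tree first, then extra lines).

    dist: symmetric matrix of internode distances d_uv.
    n_lines: target number of lines (at least n - 1).
    beta: Waxman parameter; smaller values favor shorter lines more strongly.
    Returns a set of (u, v) pairs with u < v.
    """
    rng = rng or random.Random()
    n = len(dist)
    if not n - 1 <= n_lines <= n * (n - 1) // 2:
        raise ValueError("n_lines must allow a connected simple graph")
    max_dist = max(max(row) for row in dist)  # L, the maximum internode distance

    def weight(u, v):
        # alpha is omitted: it cancels when the weights are normalized.
        return math.exp(-dist[u][v] / (beta * max_dist))

    edges = set()
    connected = {rng.randrange(n)}  # assumption: start from a random node
    # Phase 1: only lines that attach an unconnected node (builds a spanning tree).
    while len(connected) < n:
        candidates = [(u, v) for u in connected for v in range(n) if v not in connected]
        u, v = rng.choices(candidates, weights=[weight(a, b) for a, b in candidates])[0]
        edges.add((min(u, v), max(u, v)))
        connected.add(v)
    # Phase 2: add lines that do not yet exist until the target count is reached.
    while len(edges) < n_lines:
        candidates = [(u, v) for u in range(n) for v in range(u + 1, n)
                      if (u, v) not in edges]
        edges.add(rng.choices(candidates, weights=[weight(a, b) for a, b in candidates])[0])
    return edges
```

Calling `modified_waxman(d, 60, 0.1)` with the 3-IEEE 14 distance matrix would correspond to the settings above; passing a seeded `random.Random` makes runs reproducible.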

### Physical System Properties.

Each demand and generation node in both the 3-IEEE 14 and the OSU campus case studies has a load value (in megawatts) that must be satisfied for that node to perform nominally. Demand node values are a function of the total power required to supply a given area, and power generation values are based on energy production at that node. This model assumes that, if connected, power can be transported without losses between the demand and generation nodes.

System faults are replicated by removing a single line from a given topology. While many power system topologies are designed to handle a single element fault (i.e., *n* − 1), the topology design approach presented is intended to extend to other domains, where it is assumed that a single fault can result in uncontrollable cascading failure [59]. This assumption is based on real-world events such as the 2003 blackout of the North American power grid, where a single line failure led to overloading in adjacent lines, causing power substation circuit breakers to trip (as designed), which ultimately led to uncontrollable cascading failure. In practice, however, the transmission lines of a microgrid are likely to have varying capacities, and a line's failure may be conditional on the overall capacity of a given system. This will be explored in future work.

Currently, each line has an equal probability of being removed. Lines are removed iteratively, where $L_{\text{Load}}(t)$ is the initial line load at time $t$, and the loading value is based on the demand node values associated with the line. In this model, it is assumed that power can flow freely between all nodes if a line is present, so multiple generators can satisfy a single node's power demand. Future work will consider incorporating system-specific failure probabilities based on historical component reliability data. This will be implemented by examining the magnitude and frequency of failure occurrences and using a goodness-of-fit test to create a representative set of component failure probabilities. Lines will then be removed according to the resulting distribution of failure likelihood using a Monte Carlo sampling approach.

After a line is removed, load redistribution occurs among the remaining lines. This event can cause other lines to exceed their maximum capacity and subsequently fail as well. Load redistribution and failure may repeat several times before the system reaches a steady (degraded) state, replicating a cascading failure. $D_f$ is the resultant demand satisfied after a cascading failure has ended and the system is operating in a degraded condition.
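The cascade loop can be sketched as below. In the paper, loads come from the DCPF; here, as a hypothetical stand-in that keeps the example self-contained, the load of a failed line is split equally across surviving lines that share an endpoint with it. Consistent with the lossless assumption above, each island then serves the smaller of its total generation and total demand, and the sum over islands gives $D_f$.

```python
def islands(n, lines):
    """Connected components (lists of nodes) of an n-node graph, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in lines:
        parent[find(u)] = find(v)
    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())


def satisfied_demand(n, lines, gen, dem):
    """Lossless islands: each island serves min(its generation, its demand)."""
    return sum(min(sum(gen[k] for k in isl), sum(dem[k] for k in isl))
               for isl in islands(n, lines))


def cascade(n, load, capacity, gen, dem, initial_fault):
    """Remove `initial_fault`, redistribute load, repeat until no line is overloaded.

    load, capacity: dicts keyed by line tuples (u, v).
    Returns (surviving lines, D_f).
    """
    load = dict(load)  # do not mutate the caller's loads
    alive = set(load)
    failed = [initial_fault]
    while failed:
        for line in failed:
            alive.discard(line)
            # Hypothetical rule: split the lost load equally over surviving adjacent lines.
            adjacent = [other for other in alive if set(other) & set(line)]
            for other in adjacent:
                load[other] += load[line] / len(adjacent)
        failed = [other for other in alive if load[other] > capacity[other]]
    return alive, satisfied_demand(n, alive, gen, dem)
```

Swapping the equal-share rule for a DCPF recomputation on the surviving lines would bring this closer to the described method without changing the loop structure.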

For both case studies, only the ten highest capacity lines are removed for a given topology design. In this ten-line removal method, the ten highest capacity lines have an equal probability of failing, and the remaining lines (in a given test case) have a 0% initial failure probability. In order to provide the designer with more detailed information about the cascading failure sequence for a particular test case, future work will include a line failure distribution as part of the model output. After the set of Pareto optimal solutions is created with the ten-line method, an exhaustive line removal is performed to explore the cascading behavior for every possible fault scenario, for each of the Pareto optimal solutions. This exhaustive scenario assumes that each line has an equal probability of failure.

Table 2 compares the results of the ten-line removal method with those of the exhaustive line removal. Additional comments on these results are given in Sec. 7. When computation time is less of an issue, the exhaustive search approach will likely produce more accurate results. The average steady-state demand value over the line removal scenarios is denoted the expected demand ($D_E$), which is calculated from the resultant demand values ($D_f$) of the individual scenarios.

| Case study | Solution type | Topology cost | Expected demand, ten-line | Demand variance, ten-line | Expected demand, exhaustive | Demand variance, exhaustive | Number of lines | Change in expected demand (%) | Change in demand variance (%) |
|---|---|---|---|---|---|---|---|---|---|
| 3-IEEE 14 | Original topology | 2036 | 622 | 17,888 | 751 | 6140 | 60 | 20.7 | 65.6 |
| 3-IEEE 14 | Optimized solution | 1818 | 764 | 1758 | 770 | 427 | 49 | 0.7 | 75.7 |
| OSU campus | Original topology | 108 | 20.7 | 56.8 | 23.3 | 4.21 | 288 | 14.5 | 92.5 |
| OSU campus | Optimized solution | 83.9 | 23.7 | $1.40 \times 10^{-29}$ | 22.9 | 13.1 | 287 | 3.4 | NA |


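The expected demand is the mean resultant demand over all fault scenarios,

$$D_E = \frac{1}{n}\sum_{k=1}^{n} D_{f,k}$$

where $n$ is the total number of lines removed (one fault scenario per removed line) and $D_{f,k}$ is the resultant demand for scenario $k$. Figure 6 outlines the general method for the topology design, the implementation of system faults, and the calculation of expected demand.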

The computation time of the ten-line removal method relative to the exhaustive removal scales approximately as $n/L_N$, where $n$ is the total number of lines removed and $L_N$ is the total number of lines in a given test case. For example, calculating the expected demand for the OSU campus case with 287 lines takes about 10/287 (roughly 3.5%) of the time with ten-line removal as with exhaustive removal.
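The scenario bookkeeping can be sketched as follows: rank lines by capacity, keep the ten largest (or all of them for the exhaustive case), run one cascade per removed line, and average. Each scenario is weighted equally, as described above; the `simulate` callable (hypothetically, a wrapper around a cascade routine) and the choice of population variance are assumptions.

```python
from statistics import mean, pvariance


def fault_scenarios(capacity, top_k=None):
    """Lines to remove: the top_k highest-capacity lines, or every line if top_k is None."""
    ranked = sorted(capacity, key=capacity.get, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def expected_demand(simulate, scenarios):
    """D_E = mean of D_f over equally likely scenarios, plus the variance of D_f.

    simulate(line) -> D_f after the cascade triggered by removing `line`.
    Population variance is an assumption; the estimator is not stated in the text.
    """
    d_f = [simulate(line) for line in scenarios]
    return mean(d_f), pvariance(d_f)


def relative_cost(n_removed, n_lines):
    """Approximate run time of the reduced method relative to exhaustive removal: n / L_N."""
    return n_removed / n_lines
```

`expected_demand(simulate, fault_scenarios(capacity, top_k=10))` gives the ten-line estimate, and `expected_demand(simulate, fault_scenarios(capacity))` the exhaustive one.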

## Implementation

### Multi-Objective Optimization.

The multi-objective optimization has three objectives: *minimize* cost, *maximize* expected demand, and *minimize* the variability of expected demand between fault scenarios. The optimization problem is formulated as follows:

where $A_{ij}$ represents the adjacency matrix of the topology design and $N_{\text{Comp}}$ is the number of disconnected components (e.g., isolated nodes) in the topology. $\sigma^2_{D_E}$ is the variance of expected demand and is intended to represent system robustness. In this formulation, no weights are assigned to any of the three objectives, so all Pareto optimal solutions can be evaluated. The problem is solved using the multi-objective simulated annealing search algorithm discussed in Sec. 6.2.
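Because no weights are assigned, candidate designs must be compared by Pareto dominance. A minimal sketch of that test, for designs described by the tuple (cost, $D_E$, $\sigma^2_{D_E}$), is:

```python
def dominates(a, b):
    """True if design a Pareto-dominates design b.

    Each design is (cost, expected_demand, demand_variance): cost and variance
    are minimized, expected demand is maximized.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] < b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(designs):
    """Keep the designs that no other design dominates (no objective weights needed)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]
```

Using the ten-line values from Table 2, the optimized 3-IEEE 14 design (1818, 764, 1758) dominates the original topology (2036, 622, 17,888), so only the optimized design survives the filter.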

### Multi-Objective Simulated Annealing Algorithm.

For this research, a modified SA algorithm is used to search for a global optimum; it avoids getting trapped in local optima by always accepting improved solutions and accepting deteriorated solutions with a probability of less than one [60]. This acceptance probability is controlled by the “annealing temperature” and decreases as the temperature drops during the annealing process. Czyżak and Jaszkiewicz [61] expanded on SA and developed Pareto simulated annealing (PSA) to enable this search method to work for multi-objective optimization problems. PSA uses a set of interacting solutions (i.e., the *generating set*) at each iteration, instead of a single candidate, to represent the final solution [62].

where the adjacency matrix $A_y$ is the solution obtained by perturbing the adjacency matrix $A_x$, $N_{SA}$ is the total number of objective functions, and $T$ is the temperature at each iteration. Details of the SA algorithm used for this model are as follows:

- Initial temperature = 100
- Stop temperature = $1 \times 10^{-3}$
- Cooling rate = 0.95
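A cooling rate of 0.95 suggests a geometric schedule, $T_{k+1} = 0.95\,T_k$, although this section does not state it explicitly. It also does not give the number of perturbations per temperature level. Under that geometric assumption, the sketch below lists the temperature levels between the initial and stop temperatures. The function name is hypothetical.

```python
def temperature_schedule(t_init=100.0, t_stop=1e-3, rate=0.95):
    """Temperatures of a geometric cooling schedule, T_{k+1} = rate * T_k.

    Returns every level strictly above t_stop.
    """
    temps = []
    T = t_init
    while T > t_stop:
        temps.append(T)
        T *= rate
    return temps


levels = temperature_schedule()
print(len(levels))  # 225
```

This matches the closed form $\lceil \ln(10^{-5}) / \ln(0.95) \rceil = \lceil 224.45 \rceil = 225$ temperature levels. The total number of candidate evaluations is 225 multiplied by however many perturbations are tried at each level.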

## Case Study Results

Table 2 summarizes the results. For both the synthetic 3-IEEE 14 and the OSU campus case studies, it shows a Pareto optimal solution for the ten-line removal method and the corresponding solution for exhaustive line removal. The objective function results are shown for the lowest-variance (i.e., most robust) topology design and include topology cost (as calculated in Eq. (3)), expected demand, and expected demand variance. For comparison, the objective function results for the unmodified case topologies (i.e., the original topologies) are also presented. The percent change between the ten-line removal method and exhaustive removal is reported to evaluate the assumption that faults on power system lines with high capacities have the largest impact on system performance. Based on these initial results, the ten-line removal method appears to be a viable alternative to exhaustive removal.

Beginning with the 3-IEEE 14 case, Table 2 shows that the optimized solution outperforms the original topology for all three optimization objectives. It also significantly reduces expected demand variance for both the ten-line removal method and exhaustive removal. In this case study, expected demand variance is about 65% lower when calculated using exhaustive removal than when calculated using the ten-line approach. This implies that a potentially more robust solution is available if the designer is willing to exhaustively explore each fault scenario for a given topology. In addition, the optimized solution topology reduces the number of lines from 60 to 49. Beyond lowering cost, the smaller number of system components may also reduce the likelihood of an initiating failure; this hypothesis will be examined in future work. Figure 7 displays the Pareto frontier of design solutions obtained with exhaustive removal, in terms of expected demand, cost, and expected demand variance.

For the OSU campus case study, the optimized solution topology was reduced from 288 to 287 lines, and cost was reduced from 108 (units) to 83.9. Although the original topology lost only one line, the cost reduction implies that the optimized solution uses more lines that are shorter and operate at lower voltages (Eq. (3)). The optimized solution performed better than the original topology under the ten-line approach; however, this was not the case under exhaustive removal. For the exhaustive removal solution, expected demand variance increased slightly relative to the original topology, implying that the current OSU campus power network is already a robust design. In this paper, we did not compare the two approaches (i.e., ten-line removal and exhaustive removal) as competing search methods, because exhaustive removal would be computationally infeasible for networks with more lines. Instead, we evaluated the accuracy of the solution found by the ten-line removal method. Therefore, these results do not show that exhaustive removal performed worse than ten-line removal, only that the exhaustive evaluation is more accurate.

In practice, it would be up to the designer to decide whether the cost reduction is worth a slight loss of robustness relative to the current system (i.e., the original topology). However, future work is required to further validate the assumption that minimizing expected demand variability implies system robustness. Figure 8 displays the Pareto frontier of design solutions in terms of expected demand, cost, and expected demand variance. The optimized solution topology for the OSU campus case is shown in Fig. 9, where thicker lines indicate higher line loads.

Neither case study currently accounts for geographic constraints between nodes (e.g., mountains, rivers, and preservation areas); both assume that a connection between any two nodes is always possible. However, both networks are constrained by the physical distances between nodes, and the original networks were constructed around such topographic restrictions. To increase model fidelity, these geographic constraints can be added for specific applications by penalizing node connections that would cross such features.
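One way to penalize connections that cross geographic features is to scale each candidate line's cost by a penalty multiplier. The sketch below assumes a generic per-connection cost matrix that stands in for the Eq. (3) terms, which are not reproduced here. A multiplier of 1 means no constraint, values above 1 mean difficult terrain, and `np.inf` forbids a connection outright. The function name, matrices, and values are hypothetical illustrations, not part of the published model.

```python
import numpy as np


def penalized_line_cost(A, base_cost, penalty):
    """Total cost of the lines present in A after applying geographic penalties.

    A         : symmetric 0/1 adjacency matrix of a candidate topology
    base_cost : symmetric matrix of positive per-connection costs (hypothetical)
    penalty   : symmetric multiplier matrix; 1 = no constraint, > 1 = difficult
                terrain, np.inf = connection not allowed
    """
    A = np.asarray(A, dtype=float)
    cost = np.asarray(base_cost, dtype=float) * np.asarray(penalty, dtype=float)
    iu = np.triu_indices_from(A, k=1)   # each undirected line counted once
    present = A[iu] > 0
    return float(np.sum(cost[iu][present]))


A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
base = [[0, 2, 3], [2, 0, 4], [3, 4, 0]]
penalty = [[1, 1, 1.5], [1, 1, np.inf], [1.5, np.inf, 1]]  # line (1, 2) crosses a river
print(penalized_line_cost(A, base, penalty))  # 6.5
```

An infinite cost makes any topology that uses a forbidden connection dominated on cost, so the annealing search would tend to discard it without any change to the other objectives.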

## Conclusions and Future Work

As infrastructure systems operate in highly stochastic environments, they must be designed for robustness by minimizing performance variability in the resultant degraded system state. A mathematical model was created that integrates model-based robust design with network topology information to iteratively test various network topology designs against uncertain failure events. The key contributions are quantifying how topology design affects the behavior of cascading failures in complex infrastructure systems and identifying important design tradeoffs between performance and robustness during concept-stage, steady-state design. Applications of this approach include concept-stage microgrid design and reworks of existing systems.

The synthetic 3-IEEE 14 test bus and the OSU campus microgrid case studies demonstrate the effectiveness of the approach by comparing objective values between the original topology and the robust solution topology. These case studies highlight the significance of using model-based design and considering subsystem/system topology when optimizing complex infrastructure systems, and they examine the influence of cascading failures from one subsystem to another.

One challenge in this research is validating the method as an accurate abstraction of modern complex infrastructure systems. While the case studies presented show merit, scaling the method to larger networks will help determine solution accuracy. Future work will include modeling larger power grid networks (e.g., the Polish grid) and comparing the results of this approach to other solutions in the literature. In addition, the approach for determining robustness in this research will be compared to the robustness metrics used in existing network analysis methods.

Despite these concerns, this research contributes to the field of complex infrastructure system design by directly addressing the fundamental issue of uncontrollable cascading failures due to existing topological configurations. Designing for robustness increases the predictability of failure effects by incorporating uncertainty into the system model and minimizing the variability of degraded performance. In addition, this approach captures important system topology information while maintaining critical physical relationships that increase system model accuracy. Future work will also focus on further validating the approach by comparing case study results from this and other methods for complex infrastructure system design. Specifically, additional research is required to formulate increasingly accurate system model abstractions that capture optimal tradeoffs between physical properties, simulation assumptions, and topological relationships. By understanding the effects of these tradeoffs, designers can create context-specific simulations that balance accuracy, efficiency, and scalability.

## Nomenclature

- $A_{ij}$ = adjacency matrix
- $A_x$ = initial adjacency matrix in simulated annealing algorithm
- $A_y$ = solution obtained by perturbing the adjacency matrix $A_x$
- $C_L$ = network performance
- $C_{\text{Tot}}$ = transmission line cost
- $C_{ij}^{\text{Length}}$ = nominal length cost between all pairs of nodes
- $D_E$ = average of resultant demand values that are satisfied after a failure has occurred
- $D_f$ = resultant demand that is satisfied after a failure has occurred
- $d_{\mu,\upsilon}$ = distance between $\mu$ and $\upsilon$
- $E$ = average network path efficiency
- $e_{ij}$ = value of an adjacency matrix element
- $G$ = interaction matrix
- $i$ = row node in an adjacency matrix
- $\text{IEEE14}_n$ = objective value from the original IEEE 14 test bus
- $j$ = column node in an adjacency matrix
- $L$ = maximum internode distance
- $L_{\text{Cap}}$ = maximum power that can flow through an individual line
- $L_{ij}$ = unit length between all pairs of nodes
- $L_{\text{Load}}$ = amount of power flowing through an arc
- $L_{\text{Load}}(t)$ = initial line load at a given time $t$
- $L_N$ = total number of lines in a test case
- $N$ = number of elements in a specific row or column of an adjacency matrix
- $n_D$ = number of demand nodes
- $n_G$ = number of generation nodes
- $N_{\text{Comp}}$ = number of disconnected components of a network
- $N_{\text{SA}}$ = number of objective functions in simulated annealing algorithm
- $n_{G_i}$ = number of generation units able to supply flow to distribution vertex
- $\text{Obj}_n$ = objective function value
- $P$ = probability a solution will be selected for the continuation of the simulated annealing algorithm
- $t$ = instantaneous time associated with an arc load
- $T$ = temperature at each iteration of the simulated annealing algorithm
- $T_{ij}$ = cost coefficient based on the type of line
- $\alpha$ = parameter of the Waxman topology generator
- $\beta$ = parameter of the Waxman topology generator
- $\gamma$ = parameter for line factor of safety
- $\mu$ = point in Cartesian two-space
- $\upsilon$ = point in Cartesian two-space
- $\sigma_{D_E}^2$ = expected demand variance
- $\epsilon_{ij}$ = network path efficiency