This paper deals with the distributed fault detection and isolation problem of uncertain, nonlinear large-scale systems. The proposed method targets applications where the computation requirements of a full-order failure-sensitive filter would be prohibitively demanding. The original process is subdivided into low-order interconnected subsystems with, possibly, overlapping states. A network of diagnostic units is deployed to monitor, in a distributed manner, the low-order subsystems. Each diagnostic unit has access to a local and noisy measurement of its assigned subsystem's state, and to processed statistical information from its neighboring nodes. The diagnostic algorithm outputs a filtered estimate of the system's state and a measure of statistical confidence for every fault mode. The layout of the distributed failure-sensitive filter achieves significant overall complexity reduction and design flexibility in both the computational and communication requirements of the monitoring network. Simulation results demonstrate the efficiency of the proposed approach.
Introduction
The majority of contemporary industrial and commercial control systems are composed of a large number of spatially distributed feedback modules with heterogeneous sensors, actuators, and controllers that exchange information over a band-limited communication network embedded within the system. These large-scale systems are characterized by high-dimensional state-spaces and nonlinear dynamics. Typical applications include water distribution networks, power grids, automated highway systems, swarms of unmanned aerial vehicles, and environmental control systems. Large-scale systems are especially vulnerable to faults, since the effects of a single malfunction in an individual part may rapidly propagate to the entire system through the interconnection of the various subcomponents.
Availability, dependability, and resiliency are becoming major design goals for large-scale technological systems due to stringent economic, ecological, and safety demands. These attributes are of major importance, primarily for safety-critical systems, e.g., airplanes, automobiles, and nuclear reactors, since they ensure public safety. Therefore, there is a growing need for reliable real-time monitoring and supervision, especially in the case of safety-critical systems. Fault diagnosis (FD) describes the dual objective of detecting the occurrence of a fault (detection) and identifying it (identification or isolation). A timely diagnosis of a fault mode may improve the system's availability and maintainability by avoiding downtime, breakdowns, and catastrophic failures.
Research in the field of FD has attracted significant attention since the beginning of the 1970s. The large majority of existing FD methods [1–6] have a centralized architecture, in the sense that the sensor measurements are collected and the diagnostic algorithm is executed by a single processing unit. Centralized FD is considered a mature field that has established reliable solutions for many engineering applications. However, the applicability of this traditional approach is limited to concentrated low-order systems.
Modern processes involve high-dimensional state-spaces as well as highly nonlinear dynamics. In the case of large-scale and spatially distributed systems, centralized FD becomes ill-suited. Every monitoring system has certain limitations in terms of computational power and communication bandwidth. When the dimensionality and complexity of the system increases, it is likely that these limitations will not be satisfied by a centralized configuration. The online monitoring of a high-dimensional system would require extensive computations from the central processing unit. Processes with geographically remote subcomponents necessitate long distance and energy demanding broadcasts or complex multihop routing protocols to transmit information to the central fusion center. In both cases, a centralized architecture exhibits poor scalability.
In the literature, the majority of distributed fault detection and identification methods are developed for discrete-event systems and for multiprocessor computing applications [7–9]. A growing interest in the development of distributed fault detection algorithms has also been reported by the wireless sensor networks community. In this work, our attention is focused on model-based FD methods for dynamic systems, which utilize a mathematical model of the process. A rudimentary classification of existing distributed model-based FD methodologies can be made based on the type of data exchanged between the nodes of the diagnostic system. The diagnostic nodes (DNs) can exchange raw measurements of the interconnected states [11–13], state estimates [14,15], or fault signatures. The most prominent work on distributed, observer-based FD for nonlinear dynamic systems has been reported by Ferrari et al. and Boem et al. The authors apply overlapping decomposition techniques to subdivide the monolithic process into a set of reduced-order subsystems. Each subsystem is monitored by a local nonlinear observer. Seminal work in distributed estimation-based FD using Kalman filtering is reported in Ref. . The algorithm is based on the distributed version of the Kalman filter (KF) established by Olfati-Saber [19,20]. The KF is restricted to linear systems, while linearization (extended KF) leads to high false alarm rates.
Foundational work on estimation-based FD for nonlinear systems that employ a large number of correlated sensors is introduced in Refs. [22,23]. The author combines a distributed particle filtering algorithm for state estimation with fused hypothesis testing through likelihood tests to determine the occurrence of fault modes. The proposed method is mainly geared toward relatively low-order systems monitored by a high number of interconnected sensors. The layout of the algorithm does not accommodate subdivisions of the original process. The applied classical likelihood tests require a bank of estimators equal in number to the fault modes. Such replication of the state is unsuitable for large-scale systems. A full-order distributed failure-sensitive filter has also been introduced by Noursadeghi and Raptis [24,25]. In this scheme, a detection network is assigned to monitor the entire state of the monolithic system using only local measurements. Similarly to the work of Cheng, the authors did not consider any form of subdivision in the dynamics of the original process. Instead, the algorithm introduced in Refs. [24,25] provides an estimate of the entire state of the monolithic system.
The particle filter (PF) is well suited as an estimator for fault diagnosis since it avoids linearity and Gaussian noise assumptions. A comprehensive analysis of distributed PF algorithms is given by Hlinka [26,27] and Mohammadi [28,29]. A distributed PF scheme for FD that accounts for system decomposition is reported in Ref. . The authors propose a hybrid modeling approach where every potential fault is treated as a system mode. This approach assumes that the transition probabilities between the fault modes are known a priori. This probabilistic information may not be available in most real-life applications.
In this work, we present a distributed, model-based, and sequential fault diagnosis methodology for large-scale, stochastic nonlinear systems that are subject to multiple fault modes. This approach targets systems where the state dimension is large ($10^2$ states and higher). A distributed version of the particle filtering method serves as the foundation of the derived diagnostic algorithms. We introduce a reduced-order fault diagnostic algorithm that allows the subdivision of the original process dynamics into low-order interconnected subsystems with state overlap. A DN is assigned to monitor every partition of the monolithic system and triggers alarm indicators based on its local observations and the information exchanged between neighboring units. Each local failure-sensitive filter outputs an estimate of the subsystem's state vector and the probabilities of failure of the local fault modes.
Our reduced-order FD technique achieves a dramatic decrease in the computational complexity of the original problem and provides significant design flexibility in the layout of the algorithm. The PF is a convenient estimator since it avoids the complex Lyapunov arguments that observer-based methods require to guarantee convergence. A binary update rule is used to repopulate the particles and estimate the system modes without the need for transition probabilities. The failure-sensitive filter can simultaneously detect and identify faults without the need for a bank of estimators. The proposed algorithm takes advantage of the decentralized architecture and computational strength of modern embedded systems such as wireless sensor networks and multicore processors.
This paper is organized as follows: A brief description of the PF algorithm is presented in Sec. 2. The synthesis of a centralized PF fault diagnosis algorithm is outlined in Sec. 3. The centralized algorithm serves as a benchmark framework for its distributed counterpart. The reduced-order distributed version of the failure sensitive filter is presented in Sec. 4. The performance of the proposed methodology is evaluated in Sec. 5 via numerical simulations. Finally, concluding remarks are given in Sec. 6.
Centralized Particle Filtering
Different variations of the PF algorithm exist depending on the choice of the importance density function and the resampling step. The most standard form of the PF algorithm is the sequential importance resampling (SIR) filter. The SIR filter forms the foundation for some well-known PFs, including the bootstrap filter, the auxiliary PF, and the regularized PF. These PFs are derived using a suboptimal choice of the proposal pdf.
where is the normal distribution with zero mean and covariance matrix evaluated at the points , where is the prediction error of the ith particle. For a detailed description of various PF algorithms and resampling techniques, the reader is referred to Ref. . The pseudocode of the bootstrap filter and resampling are provided in Tables 1 and 2, respectively. The block diagram of the bootstrap algorithm is shown in Fig. 1.
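The bootstrap filter described above can be sketched as follows. This is a minimal scalar illustration, not a reproduction of Tables 1 and 2: the names `bootstrap_step` and `demo`, the dynamics `f`, the measurement model `h`, and all noise levels are hypothetical, and multinomial resampling is only one of several possible resampling schemes.

```python
import math
import random


def gaussian_likelihood(error, variance):
    """Zero-mean normal density evaluated at a scalar prediction error."""
    return math.exp(-0.5 * error * error / variance) / math.sqrt(2.0 * math.pi * variance)


def bootstrap_step(particles, z, f, h, process_std, meas_var, rng):
    """One predict/weight/resample cycle of a scalar bootstrap (SIR) filter."""
    # Prediction: the state transition density is used as the proposal.
    predicted = [f(x) + rng.gauss(0.0, process_std) for x in particles]
    # Weighting: likelihood of each particle's prediction error e_i = z - h(x_i).
    weights = [gaussian_likelihood(z - h(x), meas_var) for x in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # State estimate: weighted mean of the predicted particles.
    estimate = sum(w * x for w, x in zip(weights, predicted))
    # Multinomial resampling: draw Ns particles with probability equal to their weight.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def demo(steps=50, ns=500, seed=0):
    """Track a hypothetical scalar system and return (truth, estimate, particles)."""
    rng = random.Random(seed)
    f = lambda x: 0.9 * x + 1.0  # hypothetical dynamics (assumption)
    h = lambda x: x              # hypothetical measurement model (assumption)
    x_true, estimate = 0.0, 0.0
    particles = [rng.gauss(0.0, 2.0) for _ in range(ns)]
    for _ in range(steps):
        x_true = f(x_true) + rng.gauss(0.0, 0.1)
        z = h(x_true) + rng.gauss(0.0, 0.2)
        particles, estimate = bootstrap_step(particles, z, f, h, 0.1, 0.04, rng)
    return x_true, estimate, particles
```

Calling `demo()` runs 50 filter cycles with 500 particles. The weighted mean is computed before resampling so that the estimate does not inherit the extra variance introduced by the resampling draw.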
Centralized Particle Filtering Fault Diagnosis
This work extends the methodology introduced in Ref.  from one-dimensional fault-growth models to dynamic state-space systems of nonlinear processes, introducing the centralized particle filtering fault diagnosis (CPFFD) algorithm. The CPFFD algorithm generates two outputs. The first is the system's state estimate, obtained from a sequence of noise-corrupted measurements. The second is a statistical characterization of the occurrence of each fault mode, which can trigger fault alarms.
where the terms , , and refer to the state, input, and measurement vector, respectively; , and denote the known nonlinear functions of the system's healthy dynamics and measurement model, while and stand for the process and measurement noise sequences, respectively.
It is assumed that the system is initiated from the healthy mode ( at k = 0). Due to the random occurrence of the possible faults, the monolithic system may be viewed as a hidden Markov model, where the transition probabilities between the different system modes are unknown.
The proposed failure-sensitive filter embeds the dynamics of the monolithic system S given in Eq. (8), as well as a binary variable (for every potential fault) that identifies the changes in the process dynamics expressed by the terms . Hence, the binary state vector , with and , is introduced to estimate the occurrence of each fault mode. More specifically, indicates the absence of failure mode j, while denotes that fault mode j is detected in the system. The continuous-valued states are coupled with the discrete-valued binary fault occurrence estimates, resulting in a hybrid model.
where and are approximations of the failure-sensitive filter's process and measurement noise, respectively. These noise sequences should be as close as possible to the actual ones ( and ). The nonlinear function , represents the evolution of the binary states driven by the independent and identically distributed (i.i.d.) uniform white noise . The function is defined such that the previous state is randomly excited at each time step by . This random vector of is assigned to one of the binary states (normal/faulty operating condition) based on the distance metric of the perturbed vector to the coordinates and .
where , and are nonlinear functions of appropriate dimensions and structure. The aforementioned definition will be used to ease the notation in subsequent parts of the analysis. The outputs of the CPFFD module are an estimate of the system's state vector and the probabilities of failure of each fault mode. These probabilities are the expectations of the binary states . This measure is used to trigger alarm indicators if its value exceeds a certain threshold that marks the probability of detection (i.e., indicates that the system is in healthy operating condition). With this layout, two or more co-existing fault modes can be detected simultaneously. The pseudocode of the CPFFD algorithm is given in Table 3.
The probability of failure is a much more computationally attractive measure than classical change detection methods such as hypothesis testing. In the context of fault isolation, detection algorithms that rely on hypothesis testing through log-likelihood ratio tests require the execution of a bank of estimators equal in number to the fault modes. For large-scale systems, this computational load is prohibitive. The proposed CPFFD algorithm is significantly more efficient, since it augments the dynamics of the detector by only M binary state vectors.
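The binary update described above can be illustrated with a deliberately simplified sketch. It assumes a scalar binary state per fault mode with coordinates 0 (healthy) and 1 (faulty), uniform noise on the interval [-a, a], and a nearest-coordinate assignment; the source describes binary vectors and a distance metric whose exact form is not reproduced here, so the function names and the scalar layout are assumptions.

```python
import random


def update_binary_state(b_prev, a, rng):
    """Perturb a binary state with U(-a, a) noise and snap it to the nearer of 0 and 1."""
    perturbed = b_prev + rng.uniform(-a, a)
    # Distance to the healthy coordinate (0) versus the faulty coordinate (1).
    return 0 if abs(perturbed - 0.0) <= abs(perturbed - 1.0) else 1


def propagate_binary_particles(binary_particles, a, rng):
    """Apply the update to every fault mode of every particle."""
    return [[update_binary_state(b, a, rng) for b in particle]
            for particle in binary_particles]


def flip_fraction(a, draws=6000, seed=1):
    """Empirical fraction of healthy particles that jump to the faulty state."""
    rng = random.Random(seed)
    return sum(update_binary_state(0, a, rng) for _ in range(draws)) / draws
```

Under these assumptions no transition prior is needed: the noise proposes occasional mode changes, and the measurement likelihood decides which proposals survive resampling.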
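As a small illustration of how the probabilities of failure could be read off the particle set, the sketch below takes the (weighted) mean of each binary state across particles and compares it with an alarm threshold. The function names and the threshold value of 0.5 are hypothetical; the source does not state the threshold used.

```python
def failure_probabilities(binary_particles, weights=None):
    """Expectation of each binary state over the particle set."""
    ns = len(binary_particles)
    if weights is None:
        # After resampling, all particles carry equal weight.
        weights = [1.0 / ns] * ns
    n_modes = len(binary_particles[0])
    return [sum(w * particle[j] for w, particle in zip(weights, binary_particles))
            for j in range(n_modes)]


def alarms(probabilities, threshold=0.5):
    """Raise an alarm for every fault mode whose probability exceeds the threshold."""
    return [p > threshold for p in probabilities]


# Four particles, two fault modes: mode 0 is mostly healthy, mode 1 mostly faulty.
particles = [[0, 1], [0, 1], [1, 1], [0, 0]]
probabilities = failure_probabilities(particles)
flags = alarms(probabilities)
```

Because every fault mode has its own binary state, several alarms can be active at once, which matches the simultaneous detection of co-existing faults described above.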
Distributed Particle Filtering Fault Diagnosis
The CPFFD algorithm described in Sec. 3 is not scalable or robust to complex large-scale dynamical systems that employ scattered measurement sensors over large geographical regions. For high-dimensional large-scale systems, this methodology becomes impractical due to limitations in the observation range of sensors, communication bandwidth, and computation power of the centralized computing node.
In this section, we present a reduced-order distributed particle filtering fault diagnosis (DPFFD) algorithm for large-scale nonlinear systems. The original diagnostic problem is subdivided into a number of lower-order interconnected fault-sensitive filters. With this technique, each low-order filter can balance its computation power requirements and volume of data transfers. Similar to Ref. , we take into account subdivisions with state overlap. The states that are common to two or more subsystems are referred to as shared states. Shared states appear when state variables are monitored by sensors that belong to different subsystems.
Here, we briefly illustrate the three most characteristic types of decomposition based on a similar description given in Ref. . The most communication-intensive decomposition involves nonoverlapping subsystems of order one (Fig. 2(a)). This partition carries the lowest computational load per node; however, the communication limits will most likely be reached. In contrast, the decomposition depicted in Fig. 2(b) provides a balanced compromise between computational load and communication broadcasts. It is important to note that there is a trade-off between computation power and communication capacity for the nodes of the network. The third case (Fig. 2(c)) is similar to the previous scenario, with the difference that there is overlap between the subsystems. In principle, overlapping dynamics increase both the complexity and the communication requirements of the overall design. This additional overhead is due to the fusion of the common measurements and shared state estimates between the nodes. The overlap can increase the fidelity of states and measurements that are exposed to higher uncertainty, since they are monitored by more than one sensor. This benefit can justify the additional complexity and communication effort that stems from overlapping states. Decomposition techniques are beyond the scope of this paper, and interested readers are referred to Refs. [38,39].
Graph-theoretical tools are used to represent the dynamical interdependence of the system's states. A directed graph, or digraph, provides a pictorial representation of the system's structure. The digraph of system S is defined as the pair , where is the set of vertices consisting of the system states , the noise inputs , and the scalar terms . The set represents the oriented edges defined by the ordered pairs , where and . An oriented edge exists between the state Xl (or vl and βl) and state Xm if the former appears in the dynamic equation of the latter. If an edge exists between vertices νl and νm, we call them adjacent and denote this relationship by . We define the neighborhood of the vertex as the set of all states adjacent to Xm. The digraph is also referred to as the structural graph of the system S.
From a graph theoretical perspective, each subsystem SI of the monolithic process is represented by a cut-point set of vertices , where . Each cut-point set includes states that are observed locally by sensors of its corresponding subsystem. The components of χ that belong to the cut-point set comprise the local states of subsystem SI. States from subsystems with departing directed edges that enter the vertices of a cut-point set determine the interconnection variables or forcing terms.
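The graph definitions above can be made concrete with a small sketch. The three-state edge list and the two cut-point sets below are hypothetical and are not the example of Fig. 3; noise and fault-profile vertices are omitted for brevity, and the function names are assumptions.

```python
def in_neighbors(edges, vertex):
    """States with an oriented edge entering `vertex` (self-loops excluded)."""
    return {src for src, dst in edges if dst == vertex and src != vertex}


def forcing_terms(edges, cut_point_set):
    """States outside a subsystem whose edges enter its cut-point set."""
    entering = set()
    for vertex in cut_point_set:
        entering |= in_neighbors(edges, vertex)
    return entering - set(cut_point_set)


def shared_states(cut_point_sets):
    """States that belong to two or more cut-point sets."""
    seen, shared = set(), set()
    for states in cut_point_sets.values():
        shared |= seen & set(states)
        seen |= set(states)
    return shared


# Hypothetical chain X1 <-> X2 <-> X3, split into two overlapping subsystems.
EDGES = [("X1", "X2"), ("X2", "X1"), ("X2", "X3"), ("X3", "X2")]
CUT_POINT_SETS = {"S1": {"X1", "X2"}, "S2": {"X2", "X3"}}

forcing = {name: forcing_terms(EDGES, states) for name, states in CUT_POINT_SETS.items()}
overlap = shared_states(CUT_POINT_SETS)
```

In this hypothetical layout, subsystem S1 needs an estimate of X3 from its neighbor, S2 needs an estimate of X1, and X2 is the shared state that both nodes estimate.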
where and refer to the control input and measurement vector of subsystem I, respectively. The nonlinear functions and denote the local subsystem and measurement dynamics, while and stand for the subsystem and measurement noise, respectively.
As in the centralized approach, the formulation of the reduced-order local PF for fault diagnosis includes a vector of binary states to represent the absence or presence of each fault mode. The binary vector of failure mode j at subsystem I is represented by with the local fault function .
The aforementioned definitions are illustrated with a simple example. Consider a three-dimensional system with the global state vector , the noise vector , and the set of change step functions . The digraph of this example is shown in Fig. 3. Each sensor set monitors one subsystem. The monolithic system of this example is decomposed into two subsystems represented by the cut-point sets and . The local dynamic models of these two overlapping subsystems are
where , , , and are nonlinear functions with appropriate dimensions and structure. The diagnostic algorithm includes the design of one DN for each subsystem SI. Each DN consists of a processing unit that executes the local PF algorithm. The nodes can measure their own local states and communicate with their neighbors to obtain a processed estimate of their forcing term vector. The layout of the proposed DPFFD algorithm is depicted in Fig. 4. The algorithm can be separated into three main parts.
Particles update: In the first part, each DN executes a local bootstrap PF. For every subsystem, Ns particles are drawn according to the state transition propagation given in Eq. (15). This action requires estimates of the forcing terms obtained by the neighbors of DN I. At time k − 1 all subsystems have already generated an estimate of their own states.
The local PF is concluded after the weight normalization and resampling steps of the bootstrap filter (Sec. 2). The outputs of each DN are an estimate of the subsystem's state vector , and the probabilities of failure .
Shared states fusion: The last part of the reduced-order DPFFD algorithm involves the fusion of state variables that belong to overlapping cut-point sets. At each time step, the estimates of all DNs are collected by a central fusion center that assembles the final global output of the diagnostic network. The data transmitted to the central unit contain only postprocessed information. This is the only centralized processing action in the algorithm; it takes place at the fusion center and does not add significant computational overhead. A running average filter is applied to the shared states to calculate a common estimate of their value.
The reduced-order DPFFD algorithm results in a significant reduction in the computational complexity and communication bandwidth requirements of each DN. Suppose that the large-scale system has states. The computational complexity of the centralized architecture, according to Ref.  and considering Ns particles, is approximated by floating point operations (flops). By decomposing the system into N subsystems, the number of state variables at each subsystem decreases to . Assuming that Ns particles are generated in every reduced-order estimator node and that there are no shared states, the total computational complexity of the reduced-order DPFFD algorithm reduces to .
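The particles update can be sketched for two scalar subsystems that force each other. Everything specific here is an assumption made for illustration: the coupled dynamics `f_local`, the identity measurement model, the noise levels, and the function names. The point being shown is only that each DN propagates its particles using the neighbor's estimate from time k − 1 as the forcing term.

```python
import math
import random


def dn_step(particles, z, forcing, f_local, process_std, meas_std, rng):
    """Local bootstrap step of one DN; `forcing` is the neighbor's previous estimate."""
    predicted = [f_local(x, forcing) + rng.gauss(0.0, process_std) for x in particles]
    # The local state is assumed to be measured directly (identity observation).
    weights = [math.exp(-0.5 * ((z - x) / meas_std) ** 2) for x in predicted]
    total = sum(weights)
    if total > 0.0:
        weights = [w / total for w in weights]
    else:
        weights = [1.0 / len(predicted)] * len(predicted)
    estimate = sum(w * x for w, x in zip(weights, predicted))
    return rng.choices(predicted, weights=weights, k=len(predicted)), estimate


def run_network(steps=40, ns=300, seed=0):
    """Simulate two coupled subsystems, each monitored by its own DN."""
    rng = random.Random(seed)
    f_local = lambda x, d: 0.8 * x + 0.1 * d + 1.0  # hypothetical coupled dynamics
    truth = [0.0, 5.0]
    particles = [[rng.gauss(t, 1.0) for _ in range(ns)] for t in truth]
    estimates = [sum(p) / ns for p in particles]
    for _ in range(steps):
        truth = [f_local(truth[0], truth[1]) + rng.gauss(0.0, 0.05),
                 f_local(truth[1], truth[0]) + rng.gauss(0.0, 0.05)]
        previous = list(estimates)  # estimates broadcast at time k - 1
        for i in (0, 1):
            z = truth[i] + rng.gauss(0.0, 0.1)
            particles[i], estimates[i] = dn_step(
                particles[i], z, previous[1 - i], f_local, 0.05, 0.1, rng)
    return truth, estimates
```

Copying the estimates into `previous` before the inner loop keeps the exchange synchronous, so neither node uses information that its neighbor produced during the current time step.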
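The fusion step can be sketched as a plain average over every DN that reports a given state. This is one simple reading of the averaging filter mentioned above and is an assumption; the dictionary layout and the name `fuse_estimates` are hypothetical.

```python
def fuse_estimates(local_estimates):
    """Average per-state estimates over every DN that reports the state."""
    sums, counts = {}, {}
    for estimate in local_estimates.values():
        for state, value in estimate.items():
            sums[state] = sums.get(state, 0.0) + value
            counts[state] = counts.get(state, 0) + 1
    return {state: sums[state] / counts[state] for state in sums}


# Two DNs with the shared state X2; unshared states pass through unchanged.
local = {
    "DN1": {"X1": 1.0, "X2": 2.0},
    "DN2": {"X2": 4.0, "X3": 5.0},
}
fused = fuse_estimates(local)
```

Only postprocessed estimates enter the function, so the fusion center never handles raw measurements or particle sets.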
Simulation Results
This section provides an evaluation of the proposed DPFFD algorithm via extensive numerical simulations. Two systems of different dimensionality are analyzed to validate the efficiency of the algorithm. In both cases, the process model under investigation is a water tank system. This process was selected since its dynamics are nonlinear and its physical subcomponents (water tanks) are clearly identified.
The first case study involves the water tank system illustrated in Fig. 6. This process consists of nine identical cylindrical tanks of cross-sectional area Sc. The tanks are connected with pipes of cross-sectional area Sp. The flow rate between tank i and tank j is defined by means of Torricelli's rule as
where refers to the neighboring tanks of tank i. The nominal values of the process parameters are given in Table 5 and are based on the benchmark process described in Ref. . The fault modes under consideration are abrupt leaks in the water tanks. The leakage dynamics of tank j are given by
where is the sampling period, and the process noise is drawn from the normal distribution . The goal of this simulation scenario is to investigate the case of decomposition with overlapping states. To this end, the monolithic process is decomposed into two reduced-order subsystems, namely S1 and S2, as shown in Fig. 6. Figure 7(a) depicts the structural graph of the monolithic process and its partitions. The local observation vectors of the two DNs are expressed by
where , , and the measurement noise sequences ω1 and ω2 are generated by the multivariate normal distribution . The subgraphs of the observation fusion are depicted in Fig. 7(b). As shown in this figure, the states are shared between the two DNs. Three failure modes are seeded at tanks 1, 4, and 5 at the time steps , respectively. The time horizon of the simulation is set to 360 time steps. The number of particles at each DN is set to Ns = 200.
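To make the tank dynamics concrete, the sketch below uses the textbook form of Torricelli's rule and a forward-Euler discretization. The discharge coefficient, the leak model, the parameter values, and the function names are all assumptions and are not taken from Table 5 or from the equations of this paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def torricelli_flow(h_i, h_j, pipe_area, coeff=1.0):
    """Signed flow from tank i to tank j; positive when tank i is higher."""
    dh = h_i - h_j
    return coeff * pipe_area * math.copysign(math.sqrt(2.0 * G * abs(dh)), dh)


def step_levels(levels, neighbors, tank_area, pipe_area, dt, leak_areas=None):
    """Advance all tank levels by one sampling period with forward Euler."""
    leak_areas = leak_areas or {}
    new_levels = []
    for i, h_i in enumerate(levels):
        # Net outflow toward every neighboring tank.
        outflow = sum(torricelli_flow(h_i, levels[j], pipe_area) for j in neighbors[i])
        # Leak modeled as an orifice of area leak_areas[i] at the tank bottom.
        leak = leak_areas.get(i, 0.0) * math.sqrt(2.0 * G * max(h_i, 0.0))
        new_levels.append(max(h_i - dt * (outflow + leak) / tank_area, 0.0))
    return new_levels


# Two connected tanks; tank 0 starts higher, so water flows toward tank 1.
NEIGHBORS = {0: [1], 1: [0]}
healthy = step_levels([1.0, 0.5], NEIGHBORS, tank_area=1.0, pipe_area=0.01, dt=0.1)
leaking = step_levels([1.0, 0.5], NEIGHBORS, tank_area=1.0, pipe_area=0.01, dt=0.1,
                      leak_areas={0: 0.005})
```

In the healthy call the total volume is conserved, whereas the leak removes water from tank 0. This kind of change in the local dynamics is what a fault function would need to capture.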
During the execution of the reduced-order DPFFD algorithm, the i.i.d noise that drives the binary states is generated by the distribution . Figure 8 shows the population of the particles on the plane during the healthy and faulty operating condition of the system at a given time instant. The selection of this noise range plays a crucial role in the performance of the algorithm. The effect of the i.i.d uniform white noise is illustrated in Fig. 9. When the noise is with a = 0.5 (too small), there is no overlap between the two regions; thus, the particles remain trapped in the healthy state even in the presence of a fault. On the contrary, when the overlap increases (), the particles keep transitioning between the states and the output of the failure filter is indecisive.
The binary state's update function of Eq. (10) is essentially a “data-driven feedback” for the failure-sensitive filter. This way, when the process is healthy, the filter will diminish the particles that correspond to fault modes indirectly through the likelihood function . A compromise value that was shown to work well for the diagnostic filter is a = 0.6.
The probabilities of failure for each fault mode are illustrated in Fig. 10. Due to the overlap of the two DNs, the estimates of common states are fused using the central averaging step described in Sec. 4. As can be seen, both DNs detect and isolate their respective fault modes in a timely manner.
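The role of the noise range can be checked with a short calculation under a simplified model. The assumptions, which are not taken from the source, are a scalar binary state with coordinates 0 and 1, noise n drawn uniformly from [-a, a], and assignment of the perturbed value to the nearer coordinate.

```latex
% Assumptions (illustrative only): b \in \{0,1\}, n \sim \mathcal{U}(-a,a),
% and b + n is assigned to the nearer of the coordinates 0 and 1.
\begin{align*}
P(0 \to 1) &= P\!\left(n > \tfrac{1}{2}\right)
            = \max\!\left(0,\ \frac{a - \tfrac{1}{2}}{2a}\right), \\
a = 0.5 &:\quad P(0 \to 1) = 0, \\
a = 0.6 &:\quad P(0 \to 1) = \frac{0.1}{1.2} \approx 0.083, \\
a = 1.0 &:\quad P(0 \to 1) = \frac{0.5}{2} = 0.25 .
\end{align*}
```

By symmetry, the reverse transition has the same probability. Under this model, a = 0.5 never proposes a mode change, which is consistent with particles staying trapped in the healthy state, while larger values propose changes so often that the output keeps switching.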
where nrow and ncol are the row and column number of the tank in the grid, and Nrows denotes the total rows in the array.
Abrupt leaks are seeded randomly in nine tanks at the time instances listed in Table 6. The nominal values of the system parameters are identical to those of the first scenario (Table 5). The measurement and process noise are drawn from the normal distribution . The time horizon of the simulation is set to 190 time steps. The tuning guidelines for the failure-sensitive filters are the same as in the first example. The initial values of the estimated tank water levels are set to .
Due to the high dimensionality of the system's states, the simulation results are presented with respect to both time and space. The probabilities of failure and the actual and estimated values of the water tank levels are shown in the first, second, and third rows of Fig. 11, respectively. The output values of the DNs are depicted as color-coded pixels based on their location in the lattice, for different time instances. The probabilities of failure with respect to time, only for the leaking tanks, are shown in Fig. 12. When a leak is seeded in one of the tanks, its water level gradually drops. For a transient interval, the neighboring tanks compensate for this loss due to the pressure difference, until their levels also start to decrease. The diagnostic performance is deemed satisfactory, since each DN promptly detects and isolates its own fault mode despite having access only to local information. This case study involves only nonoverlapping subsystems; therefore, state fusion was not necessary. The computational reduction compared to the CPFFD algorithm is dramatic. Instead of processing 100 states, each node is responsible for monitoring a one-dimensional system.
Conclusion
We have presented a reduced-order distributed implementation of a fault detection and isolation algorithm for nonlinear large-scale systems. A network of interconnected DNs is employed to monitor the entire process. Each node monitors a lower-order subdivision of the monolithic system. The DNs have access to partial local measurements and can communicate with adjacent nodes of the monitoring network. The layout of the scheme is driven by the two main constraints of networked systems: the available communication bandwidth and the processing capabilities of the nodes. An online hypothesis testing module embedded in each failure-sensitive filter triggers alarm indicators in the presence of a fault. This inference component eliminates the need for the entire system's state at each DN and the need for a bank of estimators to isolate the occurring faults. A simple state fusion step takes place between nodes that monitor common states. This approach simplifies the filter design analysis by replacing the complex stability proofs required by observer-based methods with Monte Carlo simulations that are conveniently applicable to real-life sensor networks.
National Science Foundation (NSF) (Award No. CMMI-1662742).
Nomenclature
- b = binary vector of Sf
- B = set of time profile functions
- bI = binary vector of subsystem
- dI = interconnection variables (forcing terms) of subsystem SI
- ei = prediction error of particle i
state transition function
compact state transition function
state transition function of subsystem SI
compact state transition function of subsystem
structural graph (digraph) of system S
function of fault mode j
fault function of failure mode j at subsystem SI
observation function of subsystem SI
compact local observation function
- HI = local state matrix
- k =
time occurrence of failure mode j
- M = number of failure modes
- n = uniform white noise
- N = number of subsystems
dimension of subsystem's SI forcing vector dI
- nu = dimension of input vector
dimension of subsystem's SI input vector uI
- nv = dimension of process noise vector
dimension of subsystem's SI system noise vector vI
- nx = dimension of state vector
dimension of subsystem's SI state vector xI
- nz = dimension of observation vector
dimension of subsystem's SI measurement vector zI
dimension of measurement noise vector
dimension of subsystem's SI measurement noise vector ωI
- Ns = number of particles
posterior density function
state transition density function
likelihood density function
proposal distribution function
set of real numbers
- S =
- SI = subsystem I of monolithic system S
- Sf = failure sensitive filter
subsystem I of failure sensitive filter Sf
- u =
- uI = input vector of subsystem SI
compact input vector of subsystem (combination of uI and dI)
- v = process noise vector
approximate process noise vector
- V = set of noise inputs
compact noise vector of system Sf (combination of and n)
- vI = process noise vector of subsystem SI
compact noise vector of subsystem (combination of vI and nI)
neighborhood set of vertex m
set of vertices of the graph
- wi =
particles' weight of DN I
- x = state vector of monolithic system S
global fused state vector
compact state vector of Sf (combination of xc and )
- xI = state vector of subsystem SI
- Xi = state variable i
compact state vector of subsystem (combination of and )
- xi = particles of x
- xc = continuous valued state vector of Sf
particles generated by DN I
nx order estimate of state xI
estimate of Xj by DN I
- z =
compact measurement vector of system Sf
- zI = observation vector of subsystem SI
compact observation vector of subsystem
set of positive integers
- β = time profile function of a fault's occurrence
- εs = edges of the graph
covariance matrix of measurement noise
update function of the binary states
- χ = set of system states
- ω = measurement noise vector
approximate measurement noise vector
- ωI = measurement noise vector of subsystem SI