This paper deals with the distributed fault detection and isolation problem of uncertain, nonlinear large-scale systems. The proposed method targets applications where the computation requirements of a full-order failure-sensitive filter would be prohibitively demanding. The original process is subdivided into low-order interconnected subsystems with, possibly, overlapping states. A network of diagnostic units is deployed to monitor, in a distributed manner, the low-order subsystems. Each diagnostic unit has access to a local and noisy measurement of its assigned subsystem's state, and to processed statistical information from its neighboring nodes. The diagnostic algorithm outputs a filtered estimate of the system's state and a measure of statistical confidence for every fault mode. The layout of the distributed failure-sensitive filter achieves significant overall complexity reduction and design flexibility in both the computational and communication requirements of the monitoring network. Simulation results demonstrate the efficiency of the proposed approach.

## Introduction

The majority of contemporary industrial and commercial control systems are composed of a large number of spatially distributed feedback modules with heterogeneous sensors, actuators, and controllers that exchange information over a band-limited communication network that is embedded within the system. These large-scale systems are characterized by high-dimensional state-spaces and nonlinear dynamics. Typical applications are water distribution networks, power grids, automated highway systems, swarms of unmanned aerial vehicles, and environmental control systems, just to name a few. Large-scale systems are much more vulnerable to faults since the effects of a single malfunction to an individual part may rapidly diffuse to the entire system due to the interconnection of the various subcomponents.

Availability, dependability, and resiliency are becoming major design goals for large-scale technological systems due to stringent economic, ecological, and safety demands. These attributes are of major importance, primarily for safety-critical systems, e.g., airplanes, automobiles, and nuclear reactors, since they ensure public safety. Therefore, there is a growing need for reliable real-time monitoring and supervision especially in the case of safety-critical systems. Fault diagnosis (FD) describes the dual objective of detecting the occurrence of a fault (detection) and identifying it (identification or isolation). A timely diagnosis of a fault mode may improve the system's availability and maintainability by avoiding down-times, breakdowns, and catastrophic failures rates.

Research in the field of FD has attracted significant attention since the beginning of the 1970s. The significant majority of existing FD methods [1–6] have a centralized architecture in the sense that the sensor measurements and the diagnostic algorithm are collected and executed by a singleton processing unit. Centralized FD is considered a matured field that has established reliable solutions to many engineering applications. However, the applicability of this traditional approach is limited to concentrated low-order systems.

Modern processes involve high-dimensional state-spaces as well as highly nonlinear dynamics. In the case of large-scale and spatially distributed systems, centralized FD becomes ill-suited. Every monitoring system has certain limitations in terms of computational power and communication bandwidth. When the dimensionality and complexity of the system increases, it is likely that these limitations will not be satisfied by a centralized configuration. The online monitoring of a high-dimensional system would require extensive computations from the central processing unit. Processes with geographically remote subcomponents necessitate long distance and energy demanding broadcasts or complex multihop routing protocols to transmit information to the central fusion center. In both cases, a centralized architecture exhibits poor scalability.

In the literature, the majority of distributed fault detection and identification methods are developed for discrete-event systems and for multiprocessor computing applications [7–9]. A growing interest in the development of distributed fault detection algorithms has also been reported by the wireless sensor networks community [10]. In this work our attention is spotlighted to model-based FD methods for dynamic systems that utilize a mathematical model of the process. A rudimentary classification of existing distributed FD model-based methodologies can take place based on the data type that is exchanged between the nodes of the diagnostic system. The diagnostic nodes (DNs) can exchange: raw measurements of the interconnected states [11–13], state estimates [14,15], or fault signatures [16]. The most prominent work on distributed, observed-based FD for nonlinear dynamic systems has been reported by Ferrari et al. [11] and Boem et al. [17]. The authors apply overlapping decomposition techniques to subdivide the monolithic process to a set of reduced order subsystems. Each subsystem is monitored by a local nonlinear observer. Seminal work in distributed estimation-based FD using Kalman filtering is reported in Ref. [18]. The algorithm is based on the distributed version of Kalman filter (KF) established by Olfati-Saber [19,20]. The KF is restricted to linear system, while linearization (extended KF) leads to high false alarms rates [21].

Foundational work on estimation-based FD for nonlinear systems that employ large number of correlated sensors is introduced in Refs. [22,23]. The author combines a distributed particle filtering algorithm for state estimation with fused hypothesis testing through likelihood tests, to determine the occurrence of fault modes. The proposed method is mainly geared toward relative low-order systems monitored by a high number of interconnected sensors. The layout of the algorithm does not accommodate subdivisions of the original process. The applied classical likelihood tests require a bank of estimators equal to the number of fault modes. Such replication of the state is unsuitable for large-scale systems. A full-order distributed failure-sensitive filter has also been introduced by Noursadeghi and Raptis [24,25]. In this scheme, a detection network is assigned to monitor the entire state of the monolithic system using only local measurements. Similarly to the work of Cheng, the authors did not considered any form of subdivision in the dynamics of the original process. Instead, the algorithm introduced in Refs. [24,25] provides an estimate of the entire state of the monolithic system.

The particle filter (PF) is an ideal estimator for fault diagnosis since it avoids linearity and Gaussian noise assumptions. A comprehensive analysis on distributed PF algorithms is given by Hlinka [26,27] and Mohammadi [28,29]. A distributed PF scheme for FD diagnosis that accounts for system decomposition is reported in Ref. [30]. The authors propose a hybrid modeling approach where every potential fault is treated as a system mode. This approach assumes that the transition probability between the fault modes is known a priori. This probabilistic information may not be available in most real-life applications.

In this work, we present a distributed, model-based and sequential fault diagnosis methodology for large-scale, stochastic nonlinear systems that are subject to multiple fault modes. This approach targets systems where the state dimension is significantly large (10^{2} states and higher). A distributed version of the particle filtering method will serve as the foundation of the derived diagnostic algorithms. We introduce a reduced-order fault diagnostic algorithm that allows the subdivision of the original process dynamics to low-order interconnected subsystems with state overlap. A DN is assigned to monitor every partition of the monolithic system and triggers alarm indicators based on its local observations and information exchange between neighboring units. Each local failure sensitive filter outputs an estimate of the subsystem's state vector and the probabilities of failure of the local fault modes.

Our reduced-order FD technique achieves a dramatic decrease to the computational complexity of the original problem and provides significant design flexibility to the layout of the algorithm. The PF is an ideal estimator since it eliminates complex Lyapunov arguments that are required by the observed-based methods to guarantee convergence. A binary update rule is used to repopulate the particles and estimate the system modes without the need for transition probabilities. The failure sensitive filter can simultaneously detect and identify faults without the need for a bank of estimators. The proposed algorithm takes advantage of the decentralized architecture and computational strength of modern embedded systems such as wireless sensor networks and multicore processors.

This paper is organized as follows: A brief description of the PF algorithm is presented in Sec. 2. The synthesis of a centralized PF fault diagnosis algorithm is outlined in Sec. 3. The centralized algorithm serves as a benchmark framework for its distributed counterpart. The reduced-order distributed version of the failure sensitive filter is presented in Sec. 4. The performance of the proposed methodology is evaluated in Sec. 5 via numerical simulations. Finally, concluding remarks are given in Sec. 6.

## Centralized Particle Filtering

*x*(

*k*) is defined according to

*k*, the measurement equation of the state

*x*(

*k*) is expressed by

*v*(

*k*) and measurement noise $\omega (k)$ are white and independent with known probability density functions (pdf). From a Bayesian perspective, the objective is to recursively quantify some degree of belief in the state

*x*(

*k*), given the measurement data $z(1:k)$ up to time

*k*. The belief is expressed by the calculation of the

*posterior*pdf $p(x(k)|z(1:k))$. The calculation of the posterior density $p(x(k)|z(1:k))$ allows the computation of various measures of the state

*x*(

*k*), such as the minimum mean square error

*x*(

*k*). For example, the minimum mean square error estimate is approximated as

Different variations of the PF algorithm exist depending on the choice of the importance density function and the resampling step. The most standard form of the PF algorithm is the sequential importance resampling filter (SIR). The SIR filter forms the foundation for some well-known PFs including the bootstrap filter [32], the auxiliary PF [33], and the regularized PF [33]. These PFs are derived using a suboptimal choice of the proposal pdf $q(x(k)|z(k\u22121),z(k))$.

*ω*is considered as a Gaussian distribution with zero mean and covariance matrix $\Sigma \omega $, the likelihood function for each particle is calculated by

where $N(ei(k),0,\Sigma \omega )$ is the normal distribution with zero mean and covariance matrix $\Sigma \omega $ evaluated at the points $ei(k)=z(k)\u2212h(xi(k))$, where $ei(k)$ is the prediction error of the *i*th particle. For a detailed description of various PF algorithms and resampling techniques, the reader is referred to Ref. [33]. The pseudocode of the bootstrap filter and resampling are provided in Tables 1 and 2, respectively. The block diagram of the bootstrap algorithm is shown in Fig. 1.

function BOOTSTRAP PF |

$Inputs:\u2009xi(k\u22121),wi(k\u22121),z(k)$ |

$Outputs:\u2009xi(k),wi(k)$ |

$Required:\u2009RESAMPLE$ |

1: for$i=1:Ns$do |

2: $xi(k)=f(xi(k\u22121),vi(k\u22121))$ ▷ Particles update |

3: $wi(k)=wi(k\u22121)\xb7p(z(k)|xi(k))$ ▷ Weights update |

4: end for |

5: $wi(k)=wi(k)\u2211j=1Nswj(k)$ ▷ Weight normalization |

6: $[{xi(k),wi(k)}i=1Ns]=RESAMPLE[{xi(k),wi(k)}i=1Ns]$ ▷ Resampling |

7: $x\u0302(k)=\u2211i=1Nswi(k)xi(k)$ ▷ Minimum mean-square error |

function BOOTSTRAP PF |

$Inputs:\u2009xi(k\u22121),wi(k\u22121),z(k)$ |

$Outputs:\u2009xi(k),wi(k)$ |

$Required:\u2009RESAMPLE$ |

1: for$i=1:Ns$do |

2: $xi(k)=f(xi(k\u22121),vi(k\u22121))$ ▷ Particles update |

3: $wi(k)=wi(k\u22121)\xb7p(z(k)|xi(k))$ ▷ Weights update |

4: end for |

5: $wi(k)=wi(k)\u2211j=1Nswj(k)$ ▷ Weight normalization |

6: $[{xi(k),wi(k)}i=1Ns]=RESAMPLE[{xi(k),wi(k)}i=1Ns]$ ▷ Resampling |

7: $x\u0302(k)=\u2211i=1Nswi(k)xi(k)$ ▷ Minimum mean-square error |

function RESAMPLE |

$Inputs:\u2009xi(k),wi(k)$ |

$Outputs:\u2009xj*(k),wj(k)$ |

1: $c(1)=0$ ▷ Initialize the cumulative distribution function (CDF) |

2: for$i=2:Ns$do |

3: $c(i)=c(i\u22121)+wi(k)$ ▷ Construct CDF |

4: end for |

5: i = 1 ▷ Start at the bottom of the CDF |

6: $u(1)=U[0,Ns\u22121]$ ▷ Draw a starting point |

7: for$j=1:Ns$do |

8: $u(j)=u(1)+Ns\u22121(j\u22121)$ ▷ Move along the CDF |

9: while$u(j)>c(i)$do |

10: $i=i+1$ |

11: end while |

12: $xj*(k)=xi(k)$ ▷ Assign samples |

13: $wj(k)=Ns\u22121$ ▷ Assign weights |

14: end for |

function RESAMPLE |

$Inputs:\u2009xi(k),wi(k)$ |

$Outputs:\u2009xj*(k),wj(k)$ |

1: $c(1)=0$ ▷ Initialize the cumulative distribution function (CDF) |

2: for$i=2:Ns$do |

3: $c(i)=c(i\u22121)+wi(k)$ ▷ Construct CDF |

4: end for |

5: i = 1 ▷ Start at the bottom of the CDF |

6: $u(1)=U[0,Ns\u22121]$ ▷ Draw a starting point |

7: for$j=1:Ns$do |

8: $u(j)=u(1)+Ns\u22121(j\u22121)$ ▷ Move along the CDF |

9: while$u(j)>c(i)$do |

10: $i=i+1$ |

11: end while |

12: $xj*(k)=xi(k)$ ▷ Assign samples |

13: $wj(k)=Ns\u22121$ ▷ Assign weights |

14: end for |

## Centralized Particle Filtering Fault Diagnosis

This work extends the methodology introduced in Ref. [34] from one-dimensional fault-growth models to dynamic state-space systems of nonlinear processes introducing the centralized particle filtering fault diagnosis (CPFFD) algorithm. The CPFFD algorithms generate two outputs. The first is the system's state estimate from a sequence of noise infested measurements. The second output is a statistical characterization for the occurrence of each fault mode that can trigger fault alarms.

function CPFFD |

$Inputs:\u2009Xi(k\u22121),wi(k\u22121),ui(k\u22121),Z(k)$ |

$Outputs:\u2009Xi(k),wi(k),\u2009{b\u03022j(k)}j=1M$ |

$Required:\u2009RESAMPLE$ |

1: for$i=1:Ns$do |

2: $Xi(k)=F(Xi(k\u22121),ui(k\u22121),V(k\u22121))$ ▷ Particles update |

3: $wi(k)=wi(k\u22121)\xb7p(Z(k)|Xi(k))$ ▷ Weights update |

4: end for |

5: $wi(k)=wi(k)\u2211j=1Nswj(k)$ ▷ Weight normalization |

6: $[{Xi(k),wi(k)}i=1Ns]=RESAMPLE[{Xi(k),wi(k)}i=1Ns]$ ▷ Resampling |

7: $X\u0302(k)=\u2211i=1Nswi(k)Xi(k)$ ▷ Minimum mean-square error |

8: for$j=1:M$do |

9: $b\u03022j(k)=E[b2j(k)|Z(k)]$ ▷ Probabilities of failure |

10: end for |

function CPFFD |

$Inputs:\u2009Xi(k\u22121),wi(k\u22121),ui(k\u22121),Z(k)$ |

$Outputs:\u2009Xi(k),wi(k),\u2009{b\u03022j(k)}j=1M$ |

$Required:\u2009RESAMPLE$ |

1: for$i=1:Ns$do |

2: $Xi(k)=F(Xi(k\u22121),ui(k\u22121),V(k\u22121))$ ▷ Particles update |

3: $wi(k)=wi(k\u22121)\xb7p(Z(k)|Xi(k))$ ▷ Weights update |

4: end for |

5: $wi(k)=wi(k)\u2211j=1Nswj(k)$ ▷ Weight normalization |

6: $[{Xi(k),wi(k)}i=1Ns]=RESAMPLE[{Xi(k),wi(k)}i=1Ns]$ ▷ Resampling |

7: $X\u0302(k)=\u2211i=1Nswi(k)Xi(k)$ ▷ Minimum mean-square error |

8: for$j=1:M$do |

9: $b\u03022j(k)=E[b2j(k)|Z(k)]$ ▷ Probabilities of failure |

10: end for |

*S*described by the following state-space model:

where the terms $x(k)\u2208\mathbb{R}nx$, $u(k)\u2208\mathbb{R}nu$, and $z(k)\u2208\mathbb{R}nz$ refer to the state, input, and measurement vector, respectively; $f(k):\mathbb{R}nx\u2192\mathbb{R}nx$, and $h(k):\u2009\mathbb{R}nx\u2192\mathbb{R}nz$ denote the known nonlinear functions of the system's healthy dynamics and measurement model, while $v(k)\u2208\mathbb{R}nx$ and $\omega (k)\u2208\mathbb{R}nz$ stand for the process and measurement noise sequences, respectively.

*M*potential fault modes described by the nonlinear functions ${gj(x(k),u(k))}j=1M$ with $gj(k):\u2009\mathbb{R}nx\xd7nu\u2192\u2009\mathbb{R}nx$. The term $\beta (k\u2212k0j)$ is a scalar function representing the time profile of the fault mode

*j*occurring at some unknown time $k0j$. We can consider both

*abrupt*(step-like) or

*incipient*(exponential-like) fault modes, defined as

It is assumed that the system is initiated from the healthy mode ($\beta (\xb7)=0$ at *k* = 0). Due to the random occurrence of the possible faults, the monolithic system may be viewed as a hidden Markov model, where the transition probabilities between the different system modes are unknown.

The proposed failure sensitive filter embeds the dynamics of the monolithic system *S* given in Eq. (8), as well as a binary variable (for every potential fault), that identifies the changes in the process dynamics expressed by the terms $\beta (k\u2212k0j)$. Hence, the binary state vector $bj(k)=[b1j(k)b2j(k)]T$, with $b1j,b2j\u2208{0,1}$ and $j=1,\u2026,M$, is introduced to estimate the occurrence of each fault mode. More specifically, $b1j(k)=1$ indicates that the absence of failure mode *j*, while $b2j(k)=1$ denotes that the fault mode *j* is detected to the system. The continuous-valued states are coupled with the discrete-valued binary fault occurrence estimates resulting in a hybrid model.

where $v\u0303(k)\u2208\mathbb{R}nx$ and $\omega \u0303(k)\u2208\mathbb{R}nz$ are approximations of the failure sensitive filter's process and measurement noise, respectively. These noise sequences should be as close as possible to the actual ones ($v(k)$ and $\omega (k)$). The nonlinear function $\Phi :\mathbb{R}2\u2192{[0\u20091]T,[1\u20090]T}$, represents the evolution of the binary states driven by the identically independent distributed (i.i.d) uniform white noise $nj(k)$. The function $\Phi (\xb7)$ is defined such that the previous state $bj(k\u22121)$ is randomly excited at each time step by $nj(k\u22121)$. This random vector of $\mathbb{R}2$ is assigned to one of the binary states (normal/faulty operating condition) based on the distance metric of the perturbed vector $bj(k\u22121)+nj(k\u22121)$ to the coordinates $[0\u20091]T$ and $[1\u20090]T$.

where $Z(k)=z(k),\u2009V(k)=[v\u0303(k)\u2009n1(k)\u2026nM(k)]$, and $F(\xb7),\u2009H(\xb7)$ are nonlinear functions of appropriate dimensions and structure. The aforementioned definition will be used to ease the notation in subsequent parts of the analysis. The outputs of the CPFFD module are *the estimation of the systems's state vector* and *the probabilities of failure of each fault mode*. These probabilities are the expectations of the binary states $b\u03022j(k)=E[b2j(k)|Z(k)]$. This measure is used to trigger alarm indicators if its value exceeds a certain threshold $\alpha \u2208(0;1)$ that marks the probability of detection (i.e., $b\u03022j(k)<\alpha $ indicates that the system is in healthy operating condition). With this layout, two or more different co-existing fault modes can be simultaneously detected. The pseudocode of the CPFFD algorithm is given in Table 3.

The probability of failure is a much more computationally attractive measure compared to classical change detection methods such as hypothesis testing. In the context of fault isolation, detection algorithms using hypothesis testing through logarithm likelihood ratio test requires the execution of a bank of estimators that is equal to the fault modes. For large-scale systems, this computational load is prohibited. The proposed CPFFD algorithm is significantly more efficient, since it increases the dynamics of the detector by only *M* binary state vectors.

## Distributed Particle Filtering Fault Diagnosis

The CPFFD algorithm described in Sec. 3 is not scalable or robust to complex large-scale dynamical systems that employ scattered measurement sensors over large geographical regions. For high-dimensional large-scale systems, this methodology becomes impractical due to limitations in the observation range of sensors, communication bandwidth, and computation power of the centralized computing node.

In this section, we present a reduced-order distributed particle filtering fault diagnosis (DPFFD) algorithm for large-scale nonlinear systems. The original diagnostic problem is subdivided to a number of lower-order interconnected fault sensitive filters. With this technique, each low-order filter can balance its computation power requirements and volume of data transfers. Similar to Ref. [37], we take into account subdivisions with state overlap. The states that are common between two or more subsystems are referred to as shared states. Shared states between subsystems appear when state variables are mutually monitored by sensors that correspond to different subsystems.

Here, we briefly illustrate the three most characteristic types of decomposition based on a similar description given in Ref. [37]. The most communication intensive decomposition involves nonoverlapping subsystems of order one (Fig. 2(a)). This fragmenting is the most computationally effective, however, most likely the communication limitations will be reached. On the contrary, the decomposition depicted in Fig. 2(b) provides a balanced compromise between computational labor and communication broadcasts. It is important to note that there exists a trade-off between computation power and communication capacity for the nodes of the network. The third case (Fig. 2(c)) is similar to the previous scenario with the difference that there is overlap between the subsystems. In principal, overlapping dynamics increase both the complexity and the communication requirements of the overall design. This additional complexity overhead is due to the fusion of the common measurements and shared state estimates between the nodes. The overlap can increase the fidelity of states and measurements that are exposed to higher uncertainty, since they are monitored by more than one sensor. This fact can justify the additional effort in terms of complexity and communications that stems from overlapping states. Decomposition techniques are out of the scope of this paper, and the interested readers are referred to Refs. [38,39].

Graph theoretical tools are deployed to represent the dynamical interdependence of the system's states [37]. A *directed graph* or *digraph* provides a pictorial representation of the system's structure [40]. The digraph of system *S* is defined as the pair $Gs=(Vs,Es)$, where $Vs=\chi \u222aV\u222aB$ is the set of vertices consisted of the system states $\chi ={Xj}j=1,\u2026,nx$, the noise inputs $V={vj}j=1,\u2026,nx$, and the scalar terms $B={\beta j}j=1,\u2026,M$. The set $Es$ represents the oriented edges defined by the ordered pairs ${\nu l,\nu m}$, where $\nu l\u2208Vs$ and $\nu m\u2208\chi $. An oriented edge exists between the state *X _{l}* (or

*v*and

_{l}*β*) and state

_{l}*X*, if the former appears at the dynamic equation of the latter. If an edge exists between vertices

_{m}*ν*and

_{l}*ν*, we call them

_{m}*adjacent*and denote this relationship by $\nu l\u223c\nu m$. We define the neighborhood $Vm\u2286Vs$ of the vertex $Xm\u2208\chi $ as the set $Vm={Xl\u2208\chi |{Xl,Xm}\u2208Es}$ of all adjacent states to

*X*. The digraph $Gs$ is also referred to as

_{m}*structural graph*of the system

*S*.

From a graph theoretical perspective, each subsystem *S _{I}* of the monolithic process is represented by a cut-point set of vertices $Vs(I)$, where $Vs(I)\u2286\chi $. Each cut-point set includes states that are observed locally by sensors of its corresponding subsystem. The components of

*χ*that belong to the cut-point set $Vs(I)$ comprise the local states of subsystem

*S*. States from subsystems with departing directed edges that enter the vertices of a cut-point set determine

_{I}*the interconnection variables*or

*forcing terms*.

*S*has a local state vector $xI\u2208\mathbb{R}nxI$, local interconnection variables vector $dI\u2208\mathbb{R}ndI$, and local process noise input vector $vI\u2208\mathbb{R}nvI$. Each of the

_{I}*M*fault modes will have their own presence in every subsystem. Let $gIJ(k)$ denote the appearance of the fault mode $J\u2009(J=0,1,\u2026,M)$ at subsystem $I\u2009(I=1,\u2026,N)$. Based on the earlier definition, the state space model of each subsystem

*S*is described by

_{I}where $uI(k)\u2208\mathbb{R}nuI$ and $zI(k)\u2208\mathbb{R}nzI$ refer to the control input and measurement vector of subsystem *I*, respectively. The nonlinear functions $fI(\xb7,\xb7,\xb7)\u2009:\u2009\mathbb{R}nxI\xd7\mathbb{R}ndI\xd7\mathbb{R}nuI\u2192\mathbb{R}nxI$ and $hI(\xb7)\u2009:\u2009\mathbb{R}nxI\u2192\mathbb{R}nzI$ denote to the local subsystem and measurement dynamics; while $vI(k)\u2208\mathbb{R}nxI$ and $\omega I(k)\u2208\mathbb{R}nzI$ stand for the subsystem and measurement noise, respectively.

Likewise to the centralized approach, the formulation of the reduced order local PF for fault diagnosis will include a vector of binary states to represent the absence or presence of each fault mode. The binary vector of failure mode *j* at subsystem *I* is represented by $bIj(k)=[b1,Ij(k)\u2009b2,Ij(k)]T$ with the local fault function $gIj(\xb7)$.

The aforementioned definitions are illustrated with a simple example. Consider a three-dimensional system with the global state vector $x(k)=[X1(k),X2(k),X3(k)]T$, the noise vector $v(k)=[v1(k),v2(k),v3(k)]T$, and the set of change step functions $B={\beta 1(k),\beta 2(k)}$. The digraph of this example is shown in Fig. 3. Each sensor set monitors one subsystem. The monolithic system of the this example is decomposed into two subsystems represented by the cut-point sets $Vs(1)={X1,X2}$ and $Vs(2)={X2,X3}$. The local dynamic models of these two overlapping subsystems are

where $XI=[(xI)T(bI1)T\u2026(bI2)T\u2026(bIM)T]T\u2208\mathbb{R}nxI+2\xb7M$, $UI=[(uI)T\u2009(dI)T]T$, $ZI=zI,\u2009VI=[v\u0303IT\u2009nIT]T$, and $FI(\xb7),\u2009HI(\xb7)$ are nonlinear functions with appropriate dimensions and structure. The diagnostic algorithm includes the design of one DN for each subsystem *S _{I}*. Each DN consists of a processing unit that executes the local PF algorithm. The nodes can measure their own local states and communicate with their neighbors to obtain a processed estimate of their forcing term vector. The layout of the proposed DPFFD algorithm is depicted in Fig. 4. The algorithm can be separated into three main parts.

*Particles update*: In the first part, each DN executes a local bootstrap PF. For every subsystem, *N _{s}* particles are drawn according to the state transition propagation given in Eq. (15). This action requires estimates of the forcing terms $d\u0302I$ obtained by the neighbors of DN

*I*. At time

*k*− 1 all subsystems have already generated an estimate of their own states.

*Weights update*: In this step, each node uses its local observation and updates its particles' weights based on $p(ZI(k)|XIi(k))$. By taking into account the estimates of the neighboring states $d\u0302I$, and by choosing the proposal distribution similar to the local state transition function, the weight update is given by

The local PF is concluded after the weight normalization and resampling steps of the bootstrap filter (Sec. 2). The outputs of each DN are an estimation of the subsystems' state vector $x\u0302I(k)$, and the probabilities of failure $b\u03022,Ij(k)=E[b2,Ij(k)|ZI(k)];j=1,\u2026,M$.

*Shared states fusion*: The last part of the reduced-order DPFFD algorithm involves the fusion of state variables that belong to overlapping cut-point sets. At each time step, the estimates of all DN are collected by a central fusion center that assembles the final global output of the diagnostic network. The data transmitted to the central unit contain only postprocessed information. This is the only centralized processing action that takes place on the fusion center and does not add significant computational overhead to the algorithm. A running average filter is executed between the shared states to calculate a common estimate of their value.

*n*. The vector $x\u0302Ig$ has nonzero entries, equal to the components of $x\u0302I$, at the states that correspond to the elements of the cut-point set $VsI$. The rows of

_{x}*H*correspond to the components of the global continuous valued state vector

_{I}*x*, while its rows to the components of

*x*. The binary matrix

_{I}*H*has a nonzero entry (equal to one) to its

_{I}*i–j*element, if the global state variable

*X*appears to the

_{i}*j*th component of the local state

*x*. For the local state vectors one has

_{I}*H*is defined as follows:

_{I}*i–j*element, if the state variable

*X*appears to subsystem

_{j}*S*. The StS matrix is defined as

_{i}*n*dimensional local state vectors $x\u0302Ig$ are added together. Every element of the resulting summation vector is divided by the number of appearances of the corresponding state variable to the cut-point sets. Denote by $\sigma =\u2211I=1NHIx\u0302Ig$ the sum of the

_{x}*n*dimensional state vectors. The

_{x}*i*th element of the global fused state vector $x\u0302$ is given by

The block diagram, with the breakdown of the reduced-order DPFFD algorithm's steps, is depicted in Fig. 5. The corresponding pseudocode of the proposed algorithm is outlined in Table 4.

function REDUCED-ORDER DPFFD |

$Inputs:\u2009xI(k\u22121),HI,{bIj(k\u22121)}j=1M,wIi(k\u22121),uI(k\u22121),$$ZI(k),Gs$ |

$Outputs:\u2009XIi(k),wIi(k),{b\u03022,Ij(k)}j=1M,x\u0303$ |

Required:CPFFD |

1: for$I=1:N$do |

2: for$j=1:nxI$do |

3: $Find\u2009Vj\u2009based\u2009on\u2009Gs$ ▷ Find the neighborhood set of node j |

4: $dI=\u222aj\u2208Vj{Xj}\u2212\u222aj=1nxI{Xj}$ ▷ Calculation of forcing term vector |

5: end for |

6: $UI=[(uI)T(dI)T]T$ ▷ Control input at DN I |

7: $XI=[(xI)T(bI1)T\u2026(bI2)T\u2026(bIM)T]T$ ▷ State vector of DN I |

8: $[XIi(k),wIi(k){b\u03022,Ij(k)}j=1M]=CPFFD[XIi(k\u22121),wIi(k\u22121),UI(k\u22121),ZI(k)]$ ▷ Calculation of the particles, particles weights and probabilities of failure |

9: $x\u0302Ig=HIx\u0302I$ ▷ Conversion of the reduced-order vector x to the full-order vector $xIg$_{I} |

10: end for |

11: $\sigma =\u2211INx\u0302Ig$ ▷ Summation of the full-order local state vectors |

12: for$i=1:nx$do |

13: $[x\u0303]i=[\sigma ]i\u2211j=1N[T]ji$ ▷ Calculation of the global state fusion vector's elements |

14: end for |

function REDUCED-ORDER DPFFD |

$Inputs:\u2009xI(k\u22121),HI,{bIj(k\u22121)}j=1M,wIi(k\u22121),uI(k\u22121),$$ZI(k),Gs$ |

$Outputs:\u2009XIi(k),wIi(k),{b\u03022,Ij(k)}j=1M,x\u0303$ |

Required:CPFFD |

1: for$I=1:N$do |

2: for$j=1:nxI$do |

3: $Find\u2009Vj\u2009based\u2009on\u2009Gs$ ▷ Find the neighborhood set of node j |

4: $dI=\u222aj\u2208Vj{Xj}\u2212\u222aj=1nxI{Xj}$ ▷ Calculation of forcing term vector |

5: end for |

6: $UI=[(uI)T(dI)T]T$ ▷ Control input at DN I |

7: $XI=[(xI)T(bI1)T\u2026(bI2)T\u2026(bIM)T]T$ ▷ State vector of DN I |

8: $[XIi(k),wIi(k){b\u03022,Ij(k)}j=1M]=CPFFD[XIi(k\u22121),wIi(k\u22121),UI(k\u22121),ZI(k)]$ ▷ Calculation of the particles, particles weights and probabilities of failure |

9: $x\u0302Ig=HIx\u0302I$ ▷ Conversion of the reduced-order vector x to the full-order vector $xIg$_{I} |

10: end for |

11: $\sigma =\u2211INx\u0302Ig$ ▷ Summation of the full-order local state vectors |

12: for$i=1:nx$do |

13: $[x\u0303]i=[\sigma ]i\u2211j=1N[T]ji$ ▷ Calculation of the global state fusion vector's elements |

14: end for |

The reduced-order DPFFD algorithm results to a significant reduction in the computational complexity and communication bandwidth requirements of each DN. Suppose that the large-scale system has $nx+2\xb7M$ states. The computational complexity of the centralized architecture, according to Ref. [42] and by considering *N _{s}* particles is approximated by $O((nx+2\xb7M)2\xb7Ns)$ floating point operations (flops). By decomposing the system into

*N*subsystems, the number of the state variables is decreased to $(nx+2\xb7M)/N$ at each subsystem. Assuming that

*N*particles are generated in every reduced-order estimator node, and with the assumption of no shared states, the total computational complexity of the reduced-order DPFFD algorithm reduces to $O((nx+2\xb7M)2\xb7Ns/N)$.

_{s}## Numerical Results

This section provides an evaluation of the proposed DPFFD algorithm via extensive numerical simulations. Two systems of different dimensionality are analyzed to validate the efficiency of the algorithm. In both cases, the process model under investigation is a water tank system. This process was selected since its dynamics are nonlinear and its physical subcomponents (water tanks) are clearly identified.

The first case study involves the water tank system illustrated in Fig. 6. This process consists of nine identical cylindrical tanks of cross-sectional area *S _{c}*. The tanks are connected with pipes of cross section area

*S*. The flow rate $Qi,j$ between tank

_{p}*i*and tank

*j*is defined by means of Torricelli's rule as

*μ*is the flow correction term of tank

_{i}*i*,

*g*is the gravity constant, and

*X*is the water level of the

_{i}*i*th tank. The water level dynamics of tank

*i*is determined by means of mass balance equation as

where $Ni$ refers to the neighboring tanks of tank *i*. The nominal values of the process's parameters are given in Table 5 and are based on the benchmark process described in Ref. [43]. The fault modes under consideration are abrupt leaks to the water tanks. The leakage dynamics ot tank *j* are given by

Parameter | Meaning | Value |
---|---|---|

S_{c} | Tank cross-sectional area | $0.0154\u2009m2$ |

S_{p} | Cross section of interacting pipes | $2\xd710\u22125\u2009m2$ |

μ_{i} | Flow correction term | 1 |

g | Gravity constant | $9.81\u2009m/s2$ |

Parameter | Meaning | Value |
---|---|---|

S_{c} | Tank cross-sectional area | $0.0154\u2009m2$ |

S_{p} | Cross section of interacting pipes | $2\xd710\u22125\u2009m2$ |

μ_{i} | Flow correction term | 1 |

g | Gravity constant | $9.81\u2009m/s2$ |

*i*th tank takes the form

where $Ts=0.1\u2009s$ is the sampling period, and the process noise $vi\u2208\mathbb{R}$ is drawn from the normal distribution $N(0,0.05)$. The goal of this simulation scenario is to investigate the case of decomposition with overlapping states. To this end, the monolithic process is decomposed into two reduced-order subsystems, namely *S*_{1} and *S*_{2}, as shown in Fig. 6. Figure 7(a) depicts the structural graph of the monolithic process and its partitions. The local observation vectors of the two DNs are expressed by

where $x1=[X1,\u2026,X6]T,\u2009x2=[X4,\u2026,X9]T$, $d1=[X7,X8,X9]T,\u2009d2=[X1,X2,X3]T$, and the measurement noise sequences *ω*_{1} and *ω*_{2} are generated by the multivariate normal distribution $N(0,0.1)$. The subgraphs of the observation fusion are depicted in Fig. 7(b). As shown in this figure, the states ${X4,X5,X6}$ are shared between the two DNs. Three failure modes are seeded at tanks 1, 4, and 5 at the time steps $T=290,250,\u2009and\u2009200\u2009s$, respectively. The time horizon of the simulation is set to 360 time steps. The number of particles at each DN is set to *N _{s}* = 200.

During the execution of the reduced-order DPFFD algorithm, the i.i.d noise that drives the binary states is generated by the distribution $U(\u22120.6,0.6)$. Figure 8 shows the population of the particles on the $b1\u2212b2$ plane during the healthy and faulty operating condition of the system at a given time instant. The selection of this noise range plays a crucial role in the performance of the algorithm. The effect of the i.i.d uniform white noise is illustrated in Fig. 9. When the noise is $U(\u2212a,a)$ with *a* = 0.5 (too small), there is no overlap between the two regions; thus, the particles remain trapped in the healthy state even in the presence of a fault. On the contrary, when the overlap increases ($a\u22650.8$), the particles keep transitioning between the states and the output of the failure filter is indecisive.

The binary state's update function $\Phi (\xb7)$ of Eq. (10) is essentially a “data-driven feedback” for the failure sensitive filter. This way, when the process is healthy, the filter will diminish the particles that correspond to fault modes indirectly through the likelihood function [34]. A compromising value that ensures the optimal operation of the diagnostic filter was shown to be *a* = 0.6.

The probabilities of failure for each fault mode are illustrated in Fig. 10. Due to the overlap of the two DNs, the estimates of common states ${X4,X5,X6}$ are fused using the central averaging step described in Sec. 4. As it can be seen, both DNs can timely detect and isolate their respective fault modes.

*x*=

_{i}*X*and $i=1,\u2026,100$. The two-dimensional location of the tank in the grid array is converted to a single index. The one-dimensional index

_{i}*i*is calculated by

where *n*_{row} and *n*_{col} are the row and column number of the tank in the grid, and *N*_{rows} denotes the total rows in the array.

Abrupt leaks are seeded randomly to nine tanks at the time instances listed in Table 6. The nominal values of the system parameters are identical to the first scenario (Table 5). The measurement/process noise are drawn from the normal distribution $N(0,1)$. The time horizon of the simulation is set to 190 time steps. The same tuning guidelines for the failure sensitive filters hold with the first example. The initial values of the estimated tank water levels are set to $14\u2009m$.

Fault mode number | Time of occurrence |
---|---|

Mode 82 | 10 |

Mode 91 | 30 |

Mode 13 | 50 |

Mode 92 | 70 |

Mode 64 | 90 |

Mode 10 | 110 |

Mode 28 | 130 |

Mode 55 | 150 |

Mode 96 | 170 |

Fault mode number | Time of occurrence |
---|---|

Mode 82 | 10 |

Mode 91 | 30 |

Mode 13 | 50 |

Mode 92 | 70 |

Mode 64 | 90 |

Mode 10 | 110 |

Mode 28 | 130 |

Mode 55 | 150 |

Mode 96 | 170 |

Due to the high dimensionality of the system's states, the simulation results are presented with respect to both time and space. The illustration of the probabilities of failure, the actual and estimated values of the water tank levels are shown in the first, second, and the third row of Fig. 11, respectively. The output values of the DNs are depicted as color-coded pixels based on their location in the lattice, for different time instances. The probabilities of failure with respect to time, only for the leaked tanks, are shown in Fig. 12. When a leak is seeded in one of the tanks, its water level will gradually reduce. For a transient interval, the neighboring tanks will try to compensate for this loss due to the pressure difference until their level will also start to decrease as well. The diagnostic performance is deemed satisfactory since each DN can promptly detect and isolate its own fault mode in spite of having access to local information. This case study involves only nonoverlapping subsystems; therefore, state fusion was not necessary. The computational reduction compared to the CPFFD algorithm is dramatic. Instead of processing 100 states, each node is responsible of monitoring a one-dimensional system.

## Conclusions

We have presented a reduced-order distributed implementation of a fault detection and isolation algorithm for nonlinear large-scale systems. A network of interconnected DNs is employed to monitor the entire process. Each node monitors lower-order subdivisions of the monolithic system. The DNs have access to partial local measurements and can communicate with adjacent nodes of the monitoring network. The layout of the scheme is driven by the two main constraints of networked systems: the available communication bandwidth and processing capabilities of the nodes. An on-line hypothesis testing module is embedded at each failure sensitive filter that triggers alarm indicators in the presence of a fault. This inference component eliminates the need for the entire system's state at each DN and the necessity of a bank of estimators to isolate the occurring faults. A simplistic state fusion step takes place between nodes that monitor common states. This approach relieves the filter design analysis by substituting the complex stability proofs that are required by observed-based methods, with Monte Carlo simulations that are conveniently applicable to real-life sensor networks.

## Funding Data

National Science Foundation (NSF) (Award No. CMMI-1662742).

## Nomenclature

*b*=binary vector of

*S*^{f}*B*=set of time profile functions

*b*=_{I}binary vector of subsystem $SIf$

*d*=_{I}interconnection variables (forcing terms) of subsystem

*S*_{I}*e*=^{i}prediction error of particle

*i*- $f(\xb7)$ =
state transition function

- $F(\xb7)$ =
compact state transition function

- $fI(\xb7)$ =
state transition function of subsystem

*S*_{I} - $FI(\xb7)$ =
compact state transition function of subsystem $SIf$

- $Gs$ =
structural graph (digraph) of system

*S* - $gj(\xb7)$ =
function of fault mode

*j* - $gIj(k)$ =
fault function of failure mode

*j*at subsystem*S*_{I} - $h(\xb7)$ =
observation function

- $hI(\xb7)$ =
observation function of subsystem

*S*_{I} - $HI(\xb7)$ =
compact local observation function

*H*=_{I}local state matrix

*k*=time index

- $k0j$ =
time occurrence of failure mode

*j* *M*=number of failure modes

*n*=uniform white noise

*N*=number of subsystems

- $ndI$ =
dimension of subsystem's

*S*forcing vector_{I}*d*_{I} *n*=_{u}dimension of input vector

- $nuI$ =
dimension of subsystem's

*S*input vector_{I}*u*_{I} *n*=_{v}dimension of process noise vector

- $nvI$ =
dimension of subsystem's

*S*system noise vector_{I}*v*_{I} *n*=_{x}dimension of state vector

- $nxI$ =
dimension of subsystem's

*S*state vector_{I}*x*_{I} *n*=_{z}dimension of observation vector

- $nzI$ =
dimension of subsystem's

*S*measurement vector_{I}*z*_{I} - $n\omega $ =
dimension of measurement noise vector

- $n\omega I$ =
dimension of subsystem's

*S*measurement noise vector_{I}*ω*_{I} *N*=_{s}number of particles

- $N(\xb7,\xb7,\xb7)$ =
normal distribution

- $p(x(k)|z(1:k))$ =
posterior density function

- $p(x(k)|x(k\u22121))$ =
state transition density function

- $p(z(k)|x(k))$ =
likelihood density function

- $q(x(k)|z(1:k))$ =
proposal distribution function

- $\mathbb{R}$ =
set of real numbers

*S*=monolithic system

*S*=_{I}subsystem

*I*of monolithic system*S**S*=^{f}failure sensitive filter

- $SIf$ =
subsystem

*I*of failure sensitive filter*S*^{f} - $T$ =
StS matrix

*u*=input vector

*u*=_{I}input vector of subsystem

*S*_{I}- $UI$ =
compact input vector of subsystem $SIf$ (combination of

*u*and_{I}*d*)_{I} - $U(\xb7,\xb7)$ =
uniform distribution

*v*=process noise vector

- $v\u0303$ =
approximate process noise vector

*V*=set of noise inputs

- $V$ =
compact noise vector of system

*S*(combination of $v\u0303$ and^{f}*n*) *v*=_{I}process noise vector of subsystem

*S*_{I}- $VI$ =
compact noise vector of subsystem $SIf$ (combination of

*v*and_{I}*n*)_{I} - $Vm$ =
neighborhood set of vertex

*m* - $VsI$ =
set of vertices of the graph $Gs$

*w*=^{i}particles' weight

- $wIi$ =
particles' weight of DN

*I* *x*=state vector of monolithic system

*S*- $x\u0303$ =
global fused state vector

- $X$ =
compact state vector of

*S*(combination of^{f}*x*and $bj|j=1,\u2026,M$)^{c} *x*=_{I}state vector of subsystem

*S*_{I}*X*=_{i}state variable

*i*- $XI$ =
compact state vector of subsystem $SIf$ (combination of $xIc$ and $bIj|j=1,\u2026,M$)

*x*=^{i}particles of

*x**x*=^{c}continuous valued state vector of

*S*^{f}- $XIi$ =
particles generated by DN

*I* - $x\u0302Ig$ =
*n*order estimate of state_{x}*x*_{I} - $X\u0302jSIf$ =
estimate of

*X*by DN_{j}*I* *z*=measurement vector

- $Z$ =
compact measurement vector of system

*S*^{f} *z*=_{I}observation vector of subsystem

*S*_{I}- $ZI$ =
compact observation vector of subsystem $SIf$

- $\mathbb{Z}+$ =
set of positive integers

*β*=time profile function of a fault's occurrence

- $\delta (\xb7)$ =
Dirac function

*ε*=_{s}edges of the graph $Gs$

- $\Sigma \omega $ =
covariance matrix of measurement noise

- $\Phi (\xb7)$ =
update function of the binary states

*χ*=set of system states

*ω*=measurement noise vector

- $\omega \u0303$ =
approximate measurement noise vector

*ω*=_{I}measurement noise vector of subsystem

*S*_{I}