We propose an information-theoretic framework for modeling a complex system as a communication network in which physical devices are organized into subsystems, and the subsystems communicate through an information channel governed by the dynamics of the system.
A complex system can be viewed as a System of Systems (SoS), where the constituent systems include process elements, sensors, actuators and control elements. A Process and Instrumentation Diagram defines how physical equipment is connected, and a Process and Information Diagram determines how the elements of the SoS communicate with one another through physical interactions such as heat and mass transfer, motion, etc. The behavior of one element can be observed in the behaviors of others, and the information flow between the physical elements defines the intrinsic communication topology of the SoS. Mutual Information can be used to illuminate how the intrinsic communication topology of the system changes over time, and it provides an indicator of changes in the dynamic behavior of the SoS.
Information Theory in Control Systems
Mutual Information (MI) is a measure of the statistical dependence between two messages or signals. A communication channel and a closed-loop control system are shown in Figure 1, where the relationship between the input and the observed output defines the information transfer through the system. Several earlier studies [1-3] investigated the role of information theory in estimation and control. Entropy and mutual information can be used as criteria in controller and observer design, and to characterize properties such as controllability and observability in linear systems. Advancements in modern communication systems are the foundation for cyber-physical systems, where aspects of estimation and control can no longer be separated from information theory [5, 9, 10].
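For reference, the textbook definitions of Shannon entropy and mutual information for two discrete random variables X and Y are given below. The symbols p(x), p(y) and p(x, y) for the marginal and joint probability mass functions are notation assumed here for illustration.

```latex
H(X) = -\sum_{x} p(x)\,\log p(x)

I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
       = H(X) + H(Y) - H(X,Y)
```

I(X;Y) is zero when X and Y are independent and grows as knowledge of one signal reduces the uncertainty about the other.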
Computation of Mutual Information for Practical Problems
The computations in Eq. (1) and (2) can also be used to measure uncertainty and statistical dependence between two stationary time-series X and Y. However, most applications involve non-stationary time-series data from nonlinear and/or stochastic systems, and computational measures such as Approximate Entropy (ApEn) have been widely used to quantify the predictability or regularity of a time-series. The computation of Mutual Information between non-stationary time-series data is still an open problem, which we briefly discuss next.
The most common approach to the estimation of entropy (and mutual information) is to discretize continuous-valued time-series data and use empirical estimation measures with bias adjustment to achieve accurate estimation. This “naive” mutual information estimator is capable of tracking the trend of changing coupling strength in deterministic chaotic systems, especially in coupled Rössler systems (Figure 2), and is often sufficient to monitor critical changes or degradation within dynamical systems. The computation of mutual information using quantization depends on the resolution of the binning process. By the data processing inequality, the entropy and mutual information estimates obtained with coarser binning provide lower bounds for estimates computed with finer resolution. It is desirable to use the highest possible resolution for estimation; however, the number of points in the dataset limits the level of quantization that can be used in the estimation process. Our study of mutual information estimation in chaotic dynamical systems shows that the estimates are consistent across different levels of binning resolution as long as the number of bins is sufficient to represent the dynamics of the system.
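The binning approach can be illustrated with a minimal sketch that uses only the Python standard library. It assumes equal-width bins over the observed range of each signal, and it uses the Miller-Madow adjustment as one possible bias correction; the text does not state which adjustment is used. The function names, the bin counts and the synthetic Gaussian test signals are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to an equal-width bin index in 0..n_bins-1."""
    if hi == lo:
        return 0
    t = (value - lo) / (hi - lo)
    return min(int(t * n_bins), n_bins - 1)


def binned_mutual_information(x, y, n_bins=8, bias_correct=False):
    """Plug-in MI estimate (in nats) from an equal-width 2-D histogram.

    bias_correct=True applies the Miller-Madow adjustment, which is an
    assumed choice of bias correction, not one named in the text.
    """
    n = len(x)
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    xi = [bin_index(v, x_lo, x_hi, n_bins) for v in x]
    yi = [bin_index(v, y_lo, y_hi, n_bins) for v in y]
    cx, cy, cxy = Counter(xi), Counter(yi), Counter(zip(xi, yi))

    # Sum over occupied joint cells of p(x,y) * log(p(x,y) / (p(x) p(y))).
    mi = 0.0
    for (i, j), c in cxy.items():
        mi += (c / n) * math.log(c * n / (cx[i] * cy[j]))

    if bias_correct:
        # Miller-Madow adds (occupied_bins - 1) / (2n) to each entropy, so
        # I = H(X) + H(Y) - H(X,Y) shifts by the amount below.
        mi += (len(cx) + len(cy) - len(cxy) - 1) / (2 * n)
    return mi


# Synthetic check: a strongly coupled pair versus an independent pair.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(4000)]
y_dep = [a + 0.3 * rng.gauss(0, 1) for a in x]
y_ind = [rng.gauss(0, 1) for _ in range(4000)]

mi_dep = binned_mutual_information(x, y_dep)
mi_ind = binned_mutual_information(x, y_ind)
mi_coarse = binned_mutual_information(x, y_dep, n_bins=4)
print("coupled:", mi_dep, "independent:", mi_ind, "coupled, 4 bins:", mi_coarse)
```

Because each of the 4 coarse bins is the union of two of the 8 fine bins, the uncorrected 4-bin estimate cannot exceed the 8-bin estimate, which is the lower-bound behavior described above.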
Physical and Communication Layers
A complex system, like a power generation plant used as the exemplar in this paper, can be represented with two layers that include a physical layer and a communication layer as shown in Figure 3. The physical layer contains the power generation plant with all its devices. The communication layer represents the communication topology extracted from the physical layer by deploying autonomous (computational) agents that gather data and provide information based on the foraging behavior of ants. For details, see “What is Foraging Behavior?”. This topology of the communication network identified by the agents is an information-based representation of the dynamic interactions between devices and subsystems in the power generation plant. This two-layer representation of a complex system is the foundation of our information-theoretic framework for monitoring and analysis of complex systems. An extensive literature on biologically-inspired routing algorithms exists and the reader is referred to [11,12] and the references therein for further details.
Discovering the Communication Layer of a Complex System
The approach taken is based on the foraging behavior of swarm insects for the discovery of the shortest path from a nest to a food source. Consider a system consisting of 4 nodes, where a node corresponds to a physical location in a complex system where data can be collected (as shown in Figure 4). Autonomous agents with a limited lifespan are introduced at the nodes of the system at the start of each time window. Each agent obtains a window of data from its home node and then navigates through the system, visiting other nodes. At each visited node, the agent calculates the mutual information between the data it obtained at the home node and the data available at the node it is visiting. If two nodes have “high” mutual information, then the information exchange, or information connectivity, between these two nodes is “high”. The mutual information here is considered as an analog to food in the foraging behavior, and agents traveling between two nodes that have “high” mutual information deposit a stronger layer of pheromone on this route as compared to another edge connecting two nodes with “lower” mutual information.
The agents dispersed at the next time window, based on their foraging behavior, take the route with stronger pheromone, i.e., with stronger information connectivity between the nodes. Therefore, more agents will take this path while exploring the network. The agents retain their window of home node data until the end of their life. After that, they die along with their data but the pheromone they deposited during their lifespan remains on the paths they traveled and dissipates over time. This allows quick adaptation of the discovery algorithm to detect the dynamic changes in the system. The intrinsic communication topology is then determined by analyzing the weighted network with nodes and edges discovered by the autonomous agents based on the current pheromone levels on the edges of the network graph. The strength of the communications (i.e., connectivity) between two nodes in the network is related to the pheromone level on the edge connecting these nodes. The specifics of the algorithm that govern the evolution of the agents and the change detection method that is used to quantify a measure of node similarity are described in detail in [9,10].
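The deposit-and-dissipate mechanism described above can be sketched as a toy update rule. This is not the algorithm of [9,10]. It assumes that each agent makes a few direct hops from its home node, that the mutual information for the current window is supplied as a precomputed table, and that pheromone dissipates by a fixed fraction per window. The function name, parameters and default values are hypothetical.

```python
import random


def update_pheromone(pheromone, mi, n_agents_per_node=20, lifespan=3,
                     evaporation=0.2, explore=0.05, rng=None):
    """Run one time window of a toy ant-foraging update.

    pheromone: dict mapping frozenset({i, j}) -> pheromone level (mutated).
    mi: dict mapping frozenset({i, j}) -> MI estimate for this window.
    All parameter names and default values are illustrative assumptions.
    """
    rng = rng or random.Random()
    nodes = sorted({n for edge in mi for n in edge})

    # Pheromone left in earlier windows dissipates before new deposits.
    for edge in pheromone:
        pheromone[edge] *= (1.0 - evaporation)

    deposits = {}
    for home in nodes:
        others = [n for n in nodes if n != home]
        for _ in range(n_agents_per_node):
            for _ in range(lifespan):  # limited lifespan: a few visits
                # Prefer edges with stronger pheromone; `explore` keeps
                # edges with no pheromone reachable.
                weights = [pheromone.get(frozenset((home, n)), 0.0) + explore
                           for n in others]
                target = rng.choices(others, weights=weights, k=1)[0]
                edge = frozenset((home, target))
                # Deposit is proportional to the MI between the home data
                # and the data at the visited node (the "food" quality).
                deposits[edge] = deposits.get(edge, 0.0) + mi.get(edge, 0.0)

    for edge, amount in deposits.items():
        pheromone[edge] = pheromone.get(edge, 0.0) + amount
    return pheromone


# Four nodes: strong coupling on edges 0-1 and 2-3, weak coupling elsewhere.
node_ids = range(4)
mi_table = {frozenset((i, j)): 0.05 for i in node_ids for j in node_ids if i < j}
mi_table[frozenset((0, 1))] = 1.0
mi_table[frozenset((2, 3))] = 0.8

levels = {}
demo_rng = random.Random(1)
for _ in range(10):
    update_pheromone(levels, mi_table, rng=demo_rng)
strongest = max(levels, key=levels.get)
print(sorted(strongest), levels[strongest])
```

The resulting `levels` dictionary plays the role of the weighted network. The edge weights are the current pheromone levels, and the dissipation step lets the weights follow changes in the MI table from one window to the next.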
A Case Study
The proposed ant foraging behavior algorithm is applied to the problem of detecting changes in the operational status of a power generation plant as an exemplar. Figure 5 shows the schematic of Alstom’s 1000 MWe ultra-supercritical pulverized coal-fired steam plant model. The steam generator produces steam flow to a turbine generator with boiler outlet conditions for main steam flow of 600°C at 58 bar g. The plant net heat rate is 9045 kJ/kWh. To study how the information topology of a complex system can provide information related to changes in system operating conditions, we consider an operating condition where the final desuperheater valve is saturated (out-of-range actuator fault). Process variables, controller outputs, and setpoints for the outer loop of the main steam controllers in the boiler are recorded every 5 seconds from the simulation model. A sliding window of length 50 minutes is passed over the data stream and the intrinsic communication topology for each window is discovered. The results are then passed through the change detection algorithm to detect the presence of the fault. The following fault scenarios are considered: (1) single sensor faults, (2) multiple sensor, actuator and process faults.
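With a 5-second sampling period, a 50-minute window holds 600 samples per signal. The sketch below shows this arithmetic and a simple window generator. The step of 12 samples (one minute) between successive windows is an assumed value, since the text does not state how far the window advances, and the function and constant names are hypothetical.

```python
SAMPLE_PERIOD_S = 5    # from the text: data recorded every 5 seconds
WINDOW_MINUTES = 50    # from the text: 50-minute sliding window
WINDOW_SAMPLES = WINDOW_MINUTES * 60 // SAMPLE_PERIOD_S


def sliding_windows(series, window=WINDOW_SAMPLES, step=12):
    """Yield (start_index, window_slice) pairs over a sampled series.

    step=12 samples (one minute) is an assumption, not a value from the text.
    """
    for start in range(0, len(series) - window + 1, step):
        yield start, series[start:start + window]


# A 24-hour record sampled every 5 seconds (placeholder values only).
day = [0.0] * (24 * 3600 // SAMPLE_PERIOD_S)
starts = [start for start, _ in sliding_windows(day)]
print(WINDOW_SAMPLES, len(day), len(starts))
```

Each yielded slice would be handed to the topology discovery step, so the number of windows sets how often the topology is re-estimated.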
Single Sensor Fault Scenario
Temperature sensor 1 of the primary desuperheater is subject to a positive bias of 5°C at 30 minutes, as shown in Figure 6. It can be seen that the measurements at other locations also change after the bias is applied to sensor 1. However, after a transient period, all of the other sensors return to their original values. The power plant is modeled as a network of 10 nodes and the algorithm using the ant foraging model and time windows of sensor data is used to determine the intrinsic communication topology for the system. The results are then processed using a node similarity change detection algorithm to detect statistically different operating conditions and, in this case, detect the sensor fault.
Figure 6 shows the information connectivity structures for different time windows. The normalized simulation data is shown on the left side of these figures with the extracted connectivity structure on the right side. The thickness of the line between two nodes in the discovered topology is proportional to the communication strength between its terminal nodes as determined using mutual information. The system starts at a stable connectivity structure as shown in Figure 6(a). After the introduction of the sensor bias at 30 minutes, temporary communication links appear between the nodes as shown in Figure 6(b). After around 40 minutes, the information connectivity structure converges to a stable state as shown in Figure 6(c). Introducing a bias to the output of the sensor does not change the physical structure of the system, and so after the transient effects of the sensor bias at location 7 (sensor 1) diminish, the connectivity structures of Figure 6(a) and Figure 6(c) are the same.
A measure of node similarity, applied to the extracted communication topologies, is used to detect the point where the fault occurs. The node similarity measure for a positive step bias is shown in Figure 7. The sudden change in the node similarity scores indicates a change in the topology of the connectivity structure, and hence a change in the operating state of the system.
As the connectivity structures during the stable states are similar, the node similarity scores are also similar. At around 30 minutes, when the sensor bias is applied, the node similarity scores of almost all of the nodes change. This indicates a change in the intrinsic communication structure of the system, and therefore a change in the status of the system. After around 40 minutes, the system converges to a stable operating condition and the node similarity scores return to their previous values. This shows that the connectivity structure also converges to its previous stable structure. The presence of the fault can then be detected by looking for sudden, statistically meaningful changes in the similarity scores, which reflect changes in the communication structures extracted from the available sensor data.
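The idea of scoring each node by how much its connectivity changes between windows can be sketched with a stand-in measure. The node similarity measure actually used is defined in [9,10]. The code below instead assumes the cosine similarity between a node's edge-weight vectors in two consecutive windows, chosen only for illustration, and all function names and edge weights are hypothetical.

```python
import math


def node_profile(weights, node, nodes):
    """Edge-weight vector from `node` to every other node."""
    return [weights.get(frozenset((node, other)), 0.0)
            for other in nodes if other != node]


def cosine_similarity(a, b):
    """Cosine of the angle between two weight vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    if norm == 0:
        # An isolated node is unchanged only if it stays isolated.
        return 1.0 if a == b else 0.0
    return dot / norm


def node_similarity_scores(prev_weights, curr_weights, nodes):
    """Per-node similarity between two weighted topologies (stand-in measure)."""
    return {n: cosine_similarity(node_profile(prev_weights, n, nodes),
                                 node_profile(curr_weights, n, nodes))
            for n in nodes}


# Hypothetical topologies: a stable one, and one with a temporary 1-2 link.
nodes = [0, 1, 2, 3]
stable = {frozenset((0, 1)): 1.0, frozenset((2, 3)): 0.8}
transient = dict(stable)
transient[frozenset((1, 2))] = 0.6

scores_stable = node_similarity_scores(stable, stable, nodes)
scores_fault = node_similarity_scores(stable, transient, nodes)
print(scores_stable)
print(scores_fault)
```

In this toy case, nodes 1 and 2 gain a temporary link and their scores drop, while nodes 0 and 3 keep a score of 1. A detector would then look for drops of this kind that are statistically significant relative to the scores seen in stable operation.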
Multiple Fault Scenarios
Different fault scenarios are introduced to the plant in a 24-hour simulation. The faults are applied as discussed below and are shown in Figure 8. The faults, in order, are:
1) Sensor Fault: Applied to the temperature sensor of the primary desuperheater: a positive bias of 5°C that starts at the 1 hour time point, takes around 30 minutes to reach its final value and ends at the 3 hour time point. Then, a positive bias of 5°C that starts at the 5 hour time point, takes around an hour to reach its final value and ends at the 7 hour time point.
2) Actuator Fault: Applied to the controller output of the final desuperheater valve that provides the valve position demand for the final spray: a negative bias of 0.05 that starts at the 9 hour time point, takes around 30 minutes to reach its final value and ends at the 11 hour time point. Then, a negative bias of 0.05 that starts at the 13 hour time point, takes around 1 hour to reach its final value and ends at the 15 hour time point.
3) Process Fault: Applied to the heat transfer coefficient for all superheater heat exchangers: a negative 5% change starting at the 17 hour time point with a ramp duration of 30 minutes and an end time at the 21 hour time point.
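The fault schedule above can be written as a set of ramped bias profiles. The sketch below assumes that each bias rises linearly to its final value and is removed as a step at its end time; the text gives only the start time, the approximate rise time and the end time. The function and variable names are hypothetical.

```python
def ramped_bias(t_hours, start, ramp, end, magnitude):
    """Bias at time t_hours: 0 before `start`, linear ramp over `ramp`
    hours up to `magnitude`, held until `end`, then removed.

    The linear ramp and the step removal at `end` are assumptions.
    """
    if t_hours < start or t_hours >= end:
        return 0.0
    if ramp <= 0:
        return magnitude
    return magnitude * min((t_hours - start) / ramp, 1.0)


# (start, ramp, end, magnitude), times in hours, transcribed from the list.
SENSOR_FAULTS = [(1.0, 0.5, 3.0, 5.0), (5.0, 1.0, 7.0, 5.0)]   # deg C
ACTUATOR_FAULTS = [(9.0, 0.5, 11.0, -0.05), (13.0, 1.0, 15.0, -0.05)]
PROCESS_FAULTS = [(17.0, 0.5, 21.0, -0.05)]   # -5% as a fractional change


def total_bias(t_hours, schedule):
    """Sum of all fault contributions in `schedule` at time t_hours."""
    return sum(ramped_bias(t_hours, *event) for event in schedule)


for t in (0.5, 1.25, 2.0, 3.0, 5.5):
    print(t, total_bias(t, SENSOR_FAULTS))
```

Adding `total_bias` to the corresponding simulated signal, or scaling the heat transfer coefficient by one plus the process fault value, would reproduce the timing of the three fault types in a 24-hour run.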
The information-theoretic framework is used to model the intrinsic communication network of the process to extract the information connectivity structures from the measurement data using sliding windows of 50 minutes. The results are similar to those shown for the sensor fault in the previous section and are not provided here. The node similarity measure is then used to quantify connectivity changes in the extracted network and the results are examined to determine how effective the method is for fault detection and diagnosis. Figure 9 shows the node similarity measures for the system. The black and red lines represent the start and end of the faults, respectively. We can see that after each change, either the start or end of a fault, a similar pattern in the node similarity values occurs: there is a transient phase and after that the system has an internal change and it then stabilizes.
In this paper, we have proposed an information-theoretic framework where Mutual Information is used to quantify the statistical dependence between two non-stationary data streams. The Mutual Information computations use a Swarm Intelligence approach based on an ant foraging model for discovering the intrinsic communication topology of complex systems. The proposed approach was applied to a power generation plant simulation model under different fault scenarios, and data streams from the process were used to detect the presence of a fault (or multiple faults) by extracting the intrinsic communication topology of the power plant model and observing the changes in the connectivity topology. When a fault occurs in the system, the communication topology will change and changes in the topology are observed using a change detection algorithm such as the node similarity measure to detect and diagnose the fault type.
First, we considered a single sensor fault scenario. We applied the proposed approach to the simulation data and extracted the communication topology of the system. We observed that after the fault occurs in the system, the extracted communication topology changes. We used the node similarity measure to quantify the changes in the information connectivity and to observe how changes in the node similarity measures are related to the fault occurrences.
Finally, we considered multiple fault scenarios where sensor, actuator and process faults occurred in a single simulation run. We used the proposed approach and demonstrated that changes in the extracted communication topology quantified by the node similarity measure were observed, and these changes provided a means for detecting and diagnosing the faults.
About the Authors
Hanieh Agharazi is a Research Associate with the Department of Electrical Engineering and Computer Science at Case Western Reserve University, Cleveland, USA. Her research interests include systems and control, signal processing, modern optimization, swarm intelligence, machine learning and power systems.
Wanchat Theeranaew is a Research Associate at Case Western Reserve University. His research interests include the application of information theory in control systems, feature extraction in time-series data and the development of advanced signal processing for information extraction in biological signals.
Richard M. Kolacinski is a Consultant. His research interests include nonlinear dynamical systems, stochastic systems, information theory, and complexity theory and their application to monitoring, event detection, model identification and estimation, and decision and control systems for energy systems.
Kenneth A. Loparo is the Arthur L. Parker Professor at Case Western Reserve University. His research interests include stability and control of nonlinear and stochastic systems; nonlinear filtering with applications to monitoring, fault detection, diagnosis, prognosis and reconfigurable control; and information theory aspects of stochastic and quantized systems.