Abstract
Unmanned engineering systems that execute various operations are becoming increasingly complex, relying on a large number of components and their interactions. The reliability, maintainability, and performance optimization of these systems are critical due to their intricate nature and inaccessibility during operations. This paper introduces a new reliability-based optimization framework for planning operational profiles for unmanned systems. The proposed method employs deep learning techniques for subsystem health monitoring, dynamic Bayesian networks for system reliability analysis, and multi-objective optimization schemes for optimizing system performance. The proposed framework systematically integrates these schemes to enable their application to a wide range of tasks, including offline reliability-based optimization of system operational profiles. This framework is the first in the literature that incorporates health monitoring of multi-component systems with causal relationships. Using this hybrid scheme on unmanned systems can improve their reliability, extend their lifespan, and enable them to execute more challenging missions. The proposed framework is implemented and executed using a simulation model for the engine cooling and control system of an unmanned surface vessel.
1 Introduction
Unmanned systems are gaining popularity for their efficient performance of certain tasks without human intervention. These systems are composed of multiple heterogeneous components and subsystems that interact with each other in complex ways, which can cause inter-dependencies in the event of system failure. Unmanned systems have unique operating conditions and degradation characteristics for each subsystem and component. This degradation leads to differing needs for system reliability and risk assessment. Due to their complexity and possible inaccessibility during operation, ensuring reliability, maintainability, and performance optimization of unmanned systems is crucial. Extensive research is necessary to develop reliable and robust maintenance and optimization schemes for these systems.
Many systems are now integrated with advanced sensors and computing technology, resulting in a continuous flow of abundant monitoring data. These real-time condition monitoring data are used to assess the health conditions of components as well as system performance, utilizing the power of prognostics and health management (PHM) technology. Deep learning-based PHM technology is currently on the rise, especially due to its ability to model large data sets and complex nonlinear relationships between different data streams. For instance, several studies have developed and validated novel deep learning-based PHM schemes for systems such as turbofan engines [2], wind turbines [3], and marine diesel engines [4,5]. However, the current PHM technology is mostly concentrated at the component level. The current literature clearly lacks a comprehensive set of applications for the system-level PHM schemes with considerations of PHM at subsystem and component levels. Recent studies by Lewis and Groth [6], Moradi et al. [7], and Guo et al. [8] showed how system-level PHM schemes can be developed and applied to systems with varying levels of complexity, such as nuclear power plants, copper mine stone crushers, and marine electrical propulsion systems, respectively. Moradi and Groth [9] proposed and validated a novel framework that integrates PHM and probabilistic risk assessment tools for health monitoring of complex engineering systems. Additionally, Moradi et al. [10] demonstrated how deep learning-assisted PHM tools integrated with risk assessment techniques can be systematically implemented for health monitoring of a real-world vapor recovery unit at an offshore oil production platform. The authors discussed the advantages of such schemes for health monitoring and failure risk analysis of systems in the context of dynamic risk assessment [11].
Bayesian networks, a type of probabilistic graphical model, are often used for modeling causal relationships in system risk assessment and decision analysis [12]. Numerous applications of Bayesian networks for reliability analysis of multi-component systems can be found in the literature [13,14]. Bayesian networks with dynamic modeling capabilities have also been extensively utilized for time-dependent reliability analysis of engineering systems. Example applications of dynamic Bayesian networks (DBNs) include reliability and risk analysis of nuclear power plants [6], unmanned surface vessels [15], chemical infrastructures [16], and power distribution systems [17], to name a few. DBNs are essentially system-level models that come with the ability to model time-dependent behavior of complex engineering systems. However, the current literature contains only limited studies that combine a system DBN model with PHM techniques for dynamically updating the model with continuous condition data streams [6].
The operational profiles of unmanned systems can be designed in such a way that they are able to successfully perform certain tasks with high reliability throughout the duration of the mission. In order to enable this, this paper presents a new framework for designing reliability-based optimized operational profiles for unmanned systems in an offline mode for planning. The proposed approach combines deep learning-assisted PHM techniques, Bayesian network-based system reliability monitoring tools, and reliability-based optimization schemes into a single framework. Although numerous applications of reliability-based system performance optimization approaches can be found in the current literature (e.g., tunnel boring machines [18], healthcare systems [19], microelectromechanical systems [20], and structural systems [21]), these approaches do not explore the health monitoring and the causal relationship or dependency aspects of the problem in a systematic manner. To the best of the authors’ knowledge, the proposed framework in this paper is the first of its kind that can be utilized in various ways, including designing (planning) offline operational profiles for unmanned systems prior to a mission; optimizing system operational profiles during mission execution; scheduling maintenance activities, such as determining whether maintenance is necessary immediately or can be postponed until the mission completion; estimating the remaining operational lifespan of systems based on either total mission time or the number of similar missions; and identifying critical components that require spare parts to improve system reliability for extended missions. However, this paper focuses on the problem of offline design of system operational profiles.
In order to demonstrate the proposed approach, a case study is developed using a simulation model for the engine cooling and control system (ECCS) of an unmanned surface vessel (USV). The ECCS controls and cools the marine diesel engine that gives the USV the power it requires for propulsion and to complete its mission. USVs have numerous commercial and military applications because they offer many benefits over traditional manned ships. On the commercial side, USVs are (and will be) used for cargo and supply chain management, offshore oil and gas exploration, marine research and oceanography, fisheries, and environmental monitoring [22]. In military contexts, USVs are employed for intelligence gathering, surveillance and reconnaissance, maritime security and patrol, search and rescue, mine countermeasures, anti-submarine warfare, and fire suppression support [22,23]. They use advanced technology to operate in challenging environments. High operational reliability for USVs allows for faster and longer deployment, enhancing their capability to perform complex missions while reducing the risk to human life. Thus, USVs are versatile platforms for a wide range of tasks, appealing to both civilian and military organizations.
The contributions of this study are summarized as follows:
The paper proposes a new and general framework that integrates multiple techniques, such as deep learning for subsystem health monitoring, dynamic Bayesian networks for system reliability analysis, and optimization for reliability-based design of unmanned systems’ operational profiles, which has potential applications in a wide range of fields.
The proposed hybrid reliability-based optimization approach is the first in the literature that has an embedded health monitoring of multi-component (or multi-subsystem) systems with their causal (interdependency) relationships.
The case study offers a practical demonstration of the proposed method and its effectiveness in optimizing the operational profile of a critical system in a USV.
The remainder of the paper is organized as follows: Sec. 2 presents the problem aim and definition considered in this study. Section 3 presents the proposed approach and details of the implementation steps. Section 4 presents a case study on the performance optimization of the ECCS of a USV. Finally, Sec. 5 presents the conclusion of the study.
2 Problem Aim and Definition
3 Proposed Approach
The framework for the proposed approach consists of six main parts (or Steps 1–6): the inputs (Step 1), subsystem and system models (Step 2), subsystem-level health analysis (Step 3), system-level reliability analysis (Step 4), the optimizer (Step 5), and the outputs (Step 6). The framework architecture is presented in Fig. 1, with the step numbers for different parts. Briefly, to diagnose the health of systems, input information is collected from various sources in Step 1, followed by the development of subsystem and system models in Step 2. Using the deep learning-based trained diagnostic models, Step 3 identifies the health states of individual subsystems. Step 4 assigns failure probabilities and evaluates the system reliability profile, while Step 5 optimizes system performance. Finally, in Step 6, the optimized operational profiles and corresponding reliability profiles are collected and stored for future use. A detailed description covering all steps of the framework is presented in the following subsections. This formulation can be subject to constraints, such as a lower bound on the reliability at the system and/or subsystem levels.
3.1 Step 1: Input Information.
In order to execute the proposed approach, the first task is to collect input information from various sources including system description (e.g., list of components and subsystems, their functions and interactions, and system logic), engineering knowledge (e.g., physics-based models), reliability data (e.g., failure modes and associated failure rates), and environmental data. The inputs serve as the basis of developing subsystem- and system-level models.
3.2 Step 2: Subsystem and System Models.
The following subsections provide details on how to develop the system simulation model, the deep learning-based diagnostics models, and the DBN model from the input information in order to implement the proposed performance optimization scheme for unmanned systems.
3.2.1 System Simulation Model.
A physics-based system simulation model needs to be developed in order to generate system operational profiles with simulated condition monitoring data from sensors under uncertain environmental conditions. The Simulink environment provided by MathWorks® [24] can be used for this purpose. These operational profiles will be used to identify the reliability-based optimum profile.
3.2.2 Deep Learning Models for Subsystems’ Health Diagnosis.
Deep learning models can be useful for subsystems’ health diagnosis when the underlying physics of a subsystem is either unknown or difficult to derive. An array of sensor measurements can be used as inputs for a deep learning model, and then a model is trained to learn the relationships between data streams from sensors to efficiently classify different health states (e.g., healthy, degraded, and critical) of a component. In the proposed approach, a number of deep learning models need to be developed for each subsystem. The deep learning models can be trained by utilizing real monitoring data and maintenance logs, if available. The advantage of having separate trained deep learning models for each subsystem is computational efficiency, as it allows for the development of deep learning models with simpler architectures and the utilization of small data batches from the pool of condition monitoring data for analysis [9,10]. An illustration of the steps typically involved in data processing and deep learning model training for systems’ health diagnosis is presented in Fig. 2.
Deep learning uses artificial neural networks to model complex data sets. The process involves the learning of nonlinear relationships between input and output layers. One needs to carefully select and fine-tune its hyperparameters, such as model architecture, activation function, batch size, learning rate, and number of epochs, in order to achieve optimal performance. This allows the deep learning models to achieve higher accuracy and efficiency. Convolutional neural networks (CNNs) are commonly used in the diagnosis of mechanical components. They automatically extract features from layer inputs and control over-fitting using operations like pooling and dropout. The basic building block of a CNN is the convolution operation, which applies a filter to each local region of the input data to produce a feature map. The output of a CNN classifier is a probability distribution on predefined health classes that can be further utilized for system-level analysis [7]. For more details on the deep learning-based PHM approaches, the reader may refer to Ref. [25].
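The convolution, pooling, and softmax operations mentioned above can be illustrated with a minimal NumPy sketch. It is not the architecture used in this study: the signal values, the filter, and the logits are hypothetical, and the function names are introduced here only for illustration.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide a filter over a 1-D signal (stride 1, no padding).

    As in most CNN libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    n_out = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel)
                     for i in range(n_out)])

def max_pool1d(feature_map, size):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    n_out = len(feature_map) // size
    return feature_map[:n_out * size].reshape(n_out, size).max(axis=1)

def softmax(logits):
    """Numerically stable softmax returning class probabilities."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Hypothetical window of one scaled sensor channel.
signal = np.array([0.1, 0.2, 0.4, 0.8, 0.7, 0.3, 0.2, 0.1])
# Hypothetical filter that responds to rising values.
kernel = np.array([-1.0, 0.0, 1.0])

feature_map = conv1d_valid(signal, kernel)  # one value per local region
pooled = max_pool1d(feature_map, 2)         # down-sampled feature map

# Hypothetical output-layer logits for (healthy, degraded, critical).
class_probs = softmax(np.array([2.0, 0.5, -1.0]))
```

In a trained CNN the filter weights are learned rather than fixed, and many filters are stacked across several layers; the sketch only shows how a single filter produces a feature map and how the output layer yields a probability distribution over health classes.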
3.2.3 Dynamic Bayesian Networks for Uncertainty Propagation.
A DBN is a type of Bayesian network that models the time dependencies between variable nodes. A Bayesian network shows the conditional probabilities between variables using a directed acyclic graph. It has parent and child nodes that represent discrete or continuous random variables, connected by directed edges. The conditional probability of a child node given its parent nodes is represented using a conditional probability table (CPT). The network uses Bayes’ theorem to combine prior beliefs and observed evidence to generate a posterior probability, supporting decision-making and providing valuable insights. A DBN consists of an initial network at t = 0 that consists of prior distributions of the node variables, and a two-slice network that defines state transitions through arcs and temporal CPTs between the nodes. Figure 3 shows an illustration of the two-slice transition network. To calculate the posterior distribution of the network at any time t, the temporal states of the nodes are updated using node evidence and prior information from the time slice at t = 0. For more details on DBNs, the reader may refer to Ref. [12]. DBNs are popular for system reliability and risk analysis as they can combine information from different sources and identify challenging-to-observe system states. They are often constructed from system-level fault trees that are used for failure modeling of multi-component systems.
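As a rough illustration of how a two-slice network and Bayes’ theorem work together, the sketch below performs one predict-and-update cycle for a single binary node. The transition CPT and the evidence likelihoods are hypothetical values chosen for the example; a full DBN applies the same idea across many connected nodes.

```python
import numpy as np

# States are ordered as [working, failed].
prior = np.array([1.0, 0.0])           # initial network at t = 0

# Hypothetical temporal CPT: row = state at t-1, column = state at t.
transition = np.array([[0.99, 0.01],
                       [0.00, 1.00]])  # a failed node stays failed

# Hypothetical likelihood of observing an alarm in each state.
likelihood = np.array([0.1, 0.9])

def predict(belief, transition):
    """Propagate the belief one time slice forward."""
    return belief @ transition

def update(belief, likelihood):
    """Apply Bayes' theorem: posterior is proportional to likelihood x prior."""
    unnormalized = belief * likelihood
    return unnormalized / unnormalized.sum()

predicted = predict(prior, transition)     # belief before seeing evidence
posterior = update(predicted, likelihood)  # belief after the alarm is observed
```

With these assumed numbers, observing the alarm raises the failure belief from 0.01 to about 0.083, which shows how evidence shifts a prior that is otherwise driven only by the transition model.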
3.3 Step 3: Subsystems’ Health Analysis.
Once subsystems’ health diagnosis models are developed and trained based on their operational behavior, they can be deployed for online health diagnosis of the components and subsystems. These models need to be updated by retraining them from time to time. This retraining process addresses changes in subsystem behavior due to changes in the operating environment.
The subsystems’ health state classification process using deep learning models is illustrated in Fig. 4. Usually, for components, the health states are classified into three categories: healthy, degraded, and critical. A trained deep learning model uses input condition data from sensors and predicts the current health state of a component or subsystem. These models can also be used to detect early signs of degradation or failure, which can be used to schedule maintenance or repairs before a catastrophic failure occurs. The proposed approach takes the prediction class probabilities into account, and propagates the information to the nodes of a system DBN model for system-level analysis.
3.4 Step 4: System Reliability Analysis.
This step uses a system-level DBN created using system failure logic to evaluate the system reliability at specific time-steps. The prior failure rates of different components can be obtained from maintenance logs and reliability data handbooks that contain failure rates for different commercial and military equipment. A parameterized DBN model needs to be updated based on health information obtained from deep learning-based diagnostic models. In order to connect the subsystem health diagnosis step with the system reliability evaluation step, the identified health states from different subsystems are assigned different failure probabilities to use the health information as temporal evidence for the system DBN. The health state information from different subsystems at different time-steps is then used to update the network in real-time. The integration process of the health information and the system DBN is illustrated in Fig. 5. The updated reliability profiles at each time-step can then be used for monitoring system health and performance optimization.
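One simple way to turn classifier output into evidence for a DBN node is to weight an assumed per-state failure probability by the predicted class probabilities. The sketch below illustrates that idea only; the per-state failure probabilities and the function name are hypothetical, and the study's actual assignment of failure probabilities and its handling of evidence in the DBN software may differ.

```python
HEALTH_STATES = ("healthy", "degraded", "critical")

# Hypothetical failure probability per time-step for each health state.
ASSUMED_FAILURE_PROB = {"healthy": 0.001, "degraded": 0.05, "critical": 0.5}

def evidence_failure_probability(class_probs):
    """Expected failure probability under the classifier's class distribution."""
    total = sum(class_probs[state] for state in HEALTH_STATES)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    return sum(class_probs[state] * ASSUMED_FAILURE_PROB[state]
               for state in HEALTH_STATES)

# Hypothetical softmax output of one subsystem's diagnostic model.
cnn_output = {"healthy": 0.70, "degraded": 0.25, "critical": 0.05}
p_fail = evidence_failure_probability(cnn_output)
```

Using the full class distribution rather than only the most likely class keeps the classifier's uncertainty in the value passed to the system-level model.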
3.5 Step 5: System Performance Optimization.
The system performance model evaluates the state of the USV system by using measures such as the time-dependent speed and reliability of the USV. The system performance optimization scheme entails a predictive model and an optimizer for the USV system’s operational profile (e.g., USV speed over time) and reliability.
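The trade-off between speed and reliability can be sketched with a small example that filters candidate operational profiles to the non-dominated set and then picks the fastest profile that satisfies a lower bound on end-of-mission reliability. This is an illustrative stand-in, not the optimizer used in this study; the candidate values, the threshold, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    mean_speed_knots: float   # performance objective (maximize)
    end_reliability: float    # reliability objective (maximize)

def dominates(a, b):
    """True if a is at least as good as b in both objectives and better in one."""
    no_worse = (a.mean_speed_knots >= b.mean_speed_knots
                and a.end_reliability >= b.end_reliability)
    better = (a.mean_speed_knots > b.mean_speed_knots
              or a.end_reliability > b.end_reliability)
    return no_worse and better

def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]

def select_profile(candidates, min_reliability):
    """Fastest candidate meeting the reliability constraint, or None."""
    feasible = [c for c in candidates if c.end_reliability >= min_reliability]
    return max(feasible, key=lambda c: c.mean_speed_knots, default=None)

# Hypothetical candidate profiles with pre-computed objective values.
candidates = [
    Candidate("A", 10.0, 0.80),
    Candidate("B", 12.0, 0.72),
    Candidate("C", 14.0, 0.60),
    Candidate("D", 11.0, 0.70),
]
front = pareto_front(candidates)
best = select_profile(front, min_reliability=0.70)
```

In the proposed framework, the reliability value of each candidate would come from the system DBN rather than being supplied by hand.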
3.6 Step 6: Output Information.
The proposed optimization scheme provides valuable outputs, such as the optimal system operational profiles and the corresponding system reliability profile for the entire mission. During the mission, these outputs can be deployed in real-time to further enhance the system’s performance. In addition, the generated reliability profile may serve as a decision-making tool for scheduling maintenance by determining if the system’s reliability at the end of the mission falls below a certain threshold. By utilizing this optimization scheme, mission-critical systems can achieve maximum performance and efficiency while maintaining the highest levels of reliability and safety.
4 Case Study
In order to demonstrate the proposed approach, a case study is developed using a simulation model for one of the critical subsystems of a USV. A USV consists of five major subsystems, namely the ECCS, the power system, the navigation system, the data acquisition system, and the communication system [15]. The ECCS is a critical subsystem in the USV that controls and provides cooling for the marine diesel engine. The ECCS consists of the track control devices, the situational awareness devices, and the propulsion subsystem. The propulsion subsystem includes the marine diesel engine, lube oil pump, seawater pump, and freshwater pump, among other components that are monitored by sensors attached to them.
For this case study, the USV is assumed to sail approximately 2400 nautical miles in the North Pacific Ocean for surveillance, with a sailing duration of around 200 h. The environmental data with corresponding uncertainty are collected from open-source information available on the web [26,27]. The USV is expected to complete the given task without human intervention and with high reliability for the entire duration of the task. In order to achieve that goal, the problem is divided into several sub-problems that are solved in a sequential fashion as described in Sec. 3. This case study shows how the proposed approach can optimize the ECCS, a critical system in a USV, in a real-world example that demonstrates its effectiveness.
4.1 System Operational Profile.
In order to obtain large quantities of data for training and using the previously discussed diagnosis models, the input data had to be developed for environmental and engine throttle or loading profiles. For each input category, several profiles were created. By executing a given throttle profile through the model while it is subjected to various environmental profiles with uncertainty considerations, sufficient synthetic sensor data were generated for use in the performance-optimizing framework.
The environmental profiles were created incorporating four parameters: air temperature, seawater temperature, air pressure, and seawater current, as shown in Table 1. The “Start” term refers to the environmental conditions that the vessel would experience at the beginning of its journey, and the “Finish” term refers to the environmental conditions that the vessel would experience at the end of its journey. Each of the four parameters is assumed to be uniformly distributed. The air and seawater temperature bounds change linearly with time, whereas the air pressure and sea current bounds remain constant. For example, following Table 1, the air temperature is uniformly distributed between 10 °C and 22 °C at the start of the simulation, and the bounds linearly increase to 20 °C and 34 °C, respectively, at the end of the mission. These values were based on published data [26,27] for the environmental conditions, assuming the vessel is traveling from San Diego to Hawaii during the summer. These four parameters comprise the conditions that a marine diesel engine would be subjected to during operation.
Table 1 Environment profile uncertainties
Environment | Lower bound (start) | Lower bound (finish) | Upper bound (start) | Upper bound (finish) |
---|---|---|---|---|
Air temperature (°C) | 10 | 20 | 22 | 34 |
Seawater temperature (°C) | 19 | 22 | 25 | 28 |
Air pressure (MPa) | 0.1006 | 0.1006 | 0.102 | 0.102 |
Sea current (m/s) | 0.5 | 0.5 | 1 | 1 |
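The uniform distributions with linearly varying bounds can be sampled as in the following sketch. The bounds are taken from Table 1; the 10-hour time grid, the independent draw at each time point, and the function names are assumptions made for illustration and are not taken from the study's data generation script.

```python
import numpy as np

MISSION_HOURS = 200.0

# (lower start, lower finish, upper start, upper finish), from Table 1.
BOUNDS = {
    "air_temperature_C": (10.0, 20.0, 22.0, 34.0),
    "seawater_temperature_C": (19.0, 22.0, 25.0, 28.0),
    "air_pressure_MPa": (0.1006, 0.1006, 0.102, 0.102),
    "sea_current_m_per_s": (0.5, 0.5, 1.0, 1.0),
}

def bounds_at(t_hours, lo_start, lo_finish, up_start, up_finish):
    """Linearly interpolate the lower and upper bounds at mission time t."""
    frac = t_hours / MISSION_HOURS
    lower = lo_start + frac * (lo_finish - lo_start)
    upper = up_start + frac * (up_finish - up_start)
    return lower, upper

def sample_profile(times, rng):
    """Draw one environmental profile, independently at each time point."""
    profile = {}
    for name, limits in BOUNDS.items():
        lower, upper = bounds_at(times, *limits)
        profile[name] = rng.uniform(lower, upper)
    return profile

rng = np.random.default_rng(0)
times = np.arange(0.0, MISSION_HOURS + 1.0, 10.0)  # assumed 10-hour grid
profile = sample_profile(times, rng)
```

Repeating `sample_profile` with different random draws produces a set of environmental profiles that can be paired with throttle profiles in a Monte Carlo study.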
The throttle profiles represent the desired loading that the operator places on the engine. The throttle is expressed as a percentage, from 0% to 100%. The throttle profiles were manually generated using multi-level step functions in the MATLAB Simulink environment [24] to assess the performance under varying uncertain environmental conditions. The profiles span 200 h of operation, which was based on the potential mission time and the USV fuel capacity.
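A multi-level step function of this kind can be written compactly in Python, as in the sketch below. The breakpoints and throttle levels are hypothetical; the profiles in this study were built in Simulink, so this is only an equivalent illustration.

```python
import numpy as np

def step_throttle(times, breakpoints, levels):
    """Piecewise-constant throttle (%) over time.

    breakpoints: ascending start times (h) of each level, beginning at 0.
    levels: throttle percentage held from each breakpoint onward.
    """
    levels = np.asarray(levels, dtype=float)
    if levels.min() < 0.0 or levels.max() > 100.0:
        raise ValueError("throttle levels must lie between 0 and 100 percent")
    # Index of the last breakpoint that is <= each time value.
    idx = np.searchsorted(breakpoints, times, side="right") - 1
    return levels[idx]

times = np.arange(0.0, 200.0, 1.0)   # hourly grid over a 200 h mission
breakpoints = [0.0, 50.0, 120.0, 180.0]   # hypothetical switching times (h)
levels = [60.0, 80.0, 70.0, 50.0]         # hypothetical throttle levels (%)
throttle = step_throttle(times, breakpoints, levels)
```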
4.2 System Description.
The ECCS of a USV consists of all the components needed to control the vessel’s propulsion capabilities. Propulsion is dependent on the marine diesel engine which is dependent on its cooling and control components. As the engine operates, it generates a large amount of heat. The heat generation will cause temperatures to rise, potentially becoming detrimental to performance and component health. The cooling support subsystems remove the heat and expel it to the environment. The ECCS contains four cooling subsystems: air, freshwater, lube oil, and seawater. The air, freshwater, and lube oil subsystems all pull heat directly from the engine. The freshwater subsystem also cools the lube oil subsystem. The air and freshwater are then cooled by the seawater where the heat is finally dispelled back to the environment. The flow of these systems can be seen in the simplified block diagram shown in Fig. 6.
During operation, these subsystems are dependent on each other to ensure the proper functionality of the overall system. Any degradation of one subsystem will result in greater stress on the others and possibly immediate or delayed failure. Sensors, and the conditions they monitor, indicate how well a system is operating. The ECCS model contains 44 sensors monitoring a variety of conditions throughout the system, such as temperatures and pressures. If one or several sensors begin reading abnormal temperatures for the current loading and environmental conditions, this can be a sign that a component has failed or is failing. This is the information passed on to the previously discussed models, which are then tasked with determining the new system reliability.
4.3 ECCS Simulation Model and Data Generation.
A physics-based simulation model for a USV is required to simulate multiple conditions that the USV could experience. The simulation model was created using MATLAB’s Simulink and Simscape libraries [24]. These proved to be effective tools due to the built-in physics considerations that are programmed into these libraries and the relatively simple approach to running them in parallel. A Python script was used to generate, control, and organize the input data for the model. The use of the Python script (using Refs. [28–30]) also allowed the integration of the other components of this software framework, such as the DBN structure.
The focus of the simulation model was the ECCS subsystems for the USV. In the ECCS, temperature and pressure sensors are placed throughout to monitor the health and performance of various systems. The engine (Fig. 7) is the primary heat source for the vehicle and is responsible for providing the mechanical power to the other subsystems. This is modeled by considering two aspects: the environmental impact on the engine and the physical model for how the heat is dissipated from the engine through the pistons (based on a six-cylinder engine design). The cooling subsystems (e.g., the lube oil subsystem in Fig. 8) are used to dissipate the heat from the engine to the environment.
To generate the training data for the diagnosis of the USV, air temperature, sea temperature, and air pressure profiles were used for each simulation. Each simulation was given a starting health state for each component as well as a control profile. The Monte Carlo simulation method was used to generate random samples of environmental profiles, initial health states, and the control profile.
4.4 Deep Learning Model Training for Subsystems’ Health Diagnosis.
The CNN models were trained for health state classification of four crucial subsystems of the ECCS: the marine diesel engine, the freshwater cooling subsystem, the seawater cooling subsystem, and the lube oil subsystem. The diagnostic models were trained using synthetic sensor data from the system simulation model. The CNN models were designed to automatically extract relevant features from the raw input data through convolutional and pooling layers. In order to minimize the possibility of encountering input data outside the training boundary, the CNN models were trained using an exhaustive and balanced synthetic data set (generated by the ECCS simulator) that includes most of the possible conditions that the ECCS may experience due to environmental uncertainties. Twenty percent of the synthetic data were selected randomly for testing, and the rest were used for model training. The data set consists of sensor readings primarily representing temperatures, pressures, and other measurements from different locations of the system in operation (see Fig. 6). To improve the modeling process, the data were scaled between 0 and 1. The data set comes with a total of ten health state classes for different subsystems depending on their efficiencies, starting from 100% and gradually decreasing in steps of 10%. These classes are also associated with three main health categories defined by efficiency thresholds: healthy, degraded, and critical. The classification task was carried out by activating the neurons of the output layer using the softmax function, which outputs a probability distribution over the classes that can be further utilized as virtual evidence for updating the system DBN [37]. The performance of the CNN-based diagnostic models was measured using the accuracy and the confusion matrices in predicting the health state classes. All four CNN models showed accuracy exceeding 97% for both training and testing data sets.
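The data preparation and evaluation steps described above (scaling to the range 0 to 1, a random 80/20 split, and accuracy with a confusion matrix) can be sketched as follows. The random data, the labels, and the function names are hypothetical and stand in for the simulator output and CNN predictions, which are not reproduced here.

```python
import numpy as np

def min_max_scale(x):
    """Scale each column (sensor channel) to [0, 1]; constant columns map to 0."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

def split_indices(n_samples, test_fraction, rng):
    """Random split of sample indices; returns (train_idx, test_idx)."""
    perm = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return perm[n_test:], perm[:n_test]

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

rng = np.random.default_rng(0)
readings = rng.normal(size=(100, 44))  # hypothetical: 100 samples x 44 channels
scaled = min_max_scale(readings)
train_idx, test_idx = split_indices(len(scaled), 0.2, rng)

y_true = [0, 0, 1, 2, 2, 2]            # hypothetical true health classes
y_pred = [0, 1, 1, 2, 2, 0]            # hypothetical predicted classes
cm = confusion_matrix(y_true, y_pred, 3)
accuracy = np.trace(cm) / cm.sum()     # fraction of correct predictions
```

For brevity the sketch scales the whole array at once; a common practice is to compute the scaling limits on the training portion only and reuse them for the test portion.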
These models can be used to output real-time health information, such as determining whether a subsystem is functioning normally, experiencing degradation, or is in a critical state, given the environmental and operational uncertainties. Note that continuous evaluation and retraining of deep learning models with real-world data are necessary. This ensures sustained performance improvement, adaptability, and reduction of the impact of real-life inputs beyond the design space. Additionally, to handle moderate changes to design variables, transfer learning can be used to fine-tune existing models, reducing training needs. For more significant changes, online learning can be employed to continuously update models with new data. These strategies can effectively manage modeling costs while maintaining accuracy and efficiency in evolving system configurations within our proposed optimization framework.
4.5 Dynamic Bayesian Network for ECCS.
In order to create the structure of the DBN and parameterize it for system reliability analysis, we first created a fault tree (FT) for the ECCS. FTs are developed for reliability modeling and analysis of complex systems that exhibit component degradation over time [38]. Fault trees are useful to understand the combinations of component and subsystem failures (failure paths) that lead to a system-level failure. A fault tree primarily consists of two types of nodes: event nodes and gate nodes. Events represent failures of subsystems and components. The system failure is modeled using the top event of a fault tree. The gates are used to represent dependencies between components. The AND gate represents a parallel system with independent components, and the OR gate represents a series system with dependent components.
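The gate logic can be made concrete with a short sketch that computes the output failure probability of AND and OR gates. The sketch assumes statistically independent input events, which is a simplifying assumption of this example, and the input probabilities are hypothetical.

```python
from math import prod

def and_gate(p_inputs):
    """Output fails only if every input fails (independent inputs assumed)."""
    return prod(p_inputs)

def or_gate(p_inputs):
    """Output fails if at least one input fails (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in p_inputs)

# Hypothetical failure probabilities of two redundant inputs.
p_redundant_pair = and_gate([0.02, 0.05])

# Hypothetical higher-level event: the redundant pair OR a single component.
p_top = or_gate([p_redundant_pair, 0.03])
```

Under these assumptions an AND gate multiplies small probabilities and so lowers the output failure probability, while an OR gate accumulates them.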
The ECCS failure is modeled using a FT shown in Fig. 9; the model is adapted from Ref. [15]. The corresponding hourly failure rates of various components are presented in Table 2.
Failure event | Failure rate (h⁻¹) |
---|---|
Air cooling subsystem failure | 8.526 × 10⁻⁵ |
Contra-rotating rudder propeller failure | 8.933 × 10⁻⁴ |
Coupling failure | 1.086 × 10⁻⁶ |
Diesel engine failure | 1.537 × 10⁻⁴ |
Freshwater pump failure | 2.388 × 10⁻⁵ |
Fuel pump failure | 1.429 × 10⁻⁴ |
Gear box failure | 1.268 × 10⁻⁶ |
Lube oil pump failure | 2.403 × 10⁻⁵ |
Motor controller failure | 4.753 × 10⁻⁶ |
Propulsion monitoring subsystem failure | 1.738 × 10⁻⁴ |
Propulsion protection subsystem failure | 1.738 × 10⁻⁴ |
Rudder angle controller failure | 3.102 × 10⁻⁶ |
Rudder angle sensor failure | 6.239 × 10⁻⁶ |
Seawater pump failure | 1.002 × 10⁻⁵ |
Shafts failure | 1.248 × 10⁻⁵ |
Signal amplifier failure | 1.095 × 10⁻⁵ |
Speed sensor failure | 1.248 × 10⁻⁵ |
The failure rates were obtained from NPRD [39], OREDA [36], and NSWC handbooks [35]. The ECCS is made of track control devices, situational awareness devices, and the propulsion subsystem—all connected through an OR gate. The failure of the propulsion device is modeled using a subtree that has the marine diesel engine and the lube oil pump among other components. Under the engine cooling subsystem, the seawater and freshwater subsystems are connected via an AND gate since the failure of both subsystems will result in the failure of the cooling water system.
The FT model shown in Fig. 9 was used to develop the structure of the ECCS DBN model. The FT gates were converted to node CPTs in order to parameterize the DBN. Each FT event was mapped to a DBN node with two states: “working” and “failed.” The directed edges between nodes were created by connecting the input events of each FT gate to its output event. The next step was to establish the transition network by adding directed edges from parent nodes to child nodes across successive time slices. A DBN node can also depend on its own previous state, which makes it its own parent node in the transition network.
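As an illustration of how an FT gate can be converted to a node CPT, the following sketch enumerates all parent state combinations for a two-state node and assigns a deterministic failure probability according to the gate type. The function `gate_cpt` and its dictionary representation are assumptions made for this example; they do not reflect the internal format of any particular DBN tool.

```python
from itertools import product

STATES = ("working", "failed")

def gate_cpt(gate, n_parents):
    """Return {parent_state_tuple: P(child = failed)} for a deterministic FT gate."""
    cpt = {}
    for parents in product(STATES, repeat=n_parents):
        n_failed = parents.count("failed")
        if gate == "OR":
            child_failed = n_failed >= 1          # any failed input fails the output
        elif gate == "AND":
            child_failed = n_failed == n_parents  # all inputs must fail
        else:
            raise ValueError(f"unsupported gate type: {gate}")
        cpt[parents] = 1.0 if child_failed else 0.0
    return cpt

# Example: a two-input AND gate, as used for the seawater and freshwater subsystems.
for parent_states, p_failed in gate_cpt("AND", 2).items():
    print(parent_states, "-> P(failed) =", p_failed)
```

A CPT for a gate with n two-state parents has 2^n rows, which is why the number of independent parameters grows quickly with the number of inputs per gate.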
The final step was to create the node CPTs that model the causal relationships between different components and subsystems. The CPTs of the child nodes were determined based on the failure probabilities of their parent nodes and the associated FT gate type. The failure time of individual components was assumed to follow the exponential distribution [15]. For a component with failure rate $r$, the failure time density can be written as $f(t) = r e^{-rt}$. Thus, given that a component was working at the previous time-step, the probability that it has failed at the current time-step is $1 - e^{-r\Delta t}$, where $\Delta t$ is the length of the time-step. Similarly, if a component has already failed, its failure probability is unity. The final DBN structure for the ECCS is shown in Fig. 10, which was developed using the GeNIe Modeler by BayesFusion [37]. The DBN consists of 37 nodes, 62 arcs, and 158 independent parameters. With new health state information, this DBN can be updated to evaluate the reliability profile of the ECCS of a USV.
Using prior information obtained from the reliability data handbooks, the DBN model of the ECCS was evaluated for 200 h with 10-hour time-steps. The system reliability was found to be around 0.84 after 100 h of operation and 0.71 after 200 h. After 200 h, the propulsion device showed a prior reliability of 0.8 and the cooling subsystem showed 0.98, which makes the propulsion device the less reliable of the two.
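The transition CPT of a single component follows directly from the exponential assumption: the probability of moving from working to failed over one time-step of length Δt is 1 − exp(−rΔt), and the failed state is absorbing. The sketch below builds this CPT and propagates it over a 200 h mission with 10 h time-steps for one component, with no health evidence entered. It is a single-node illustration under these assumptions, not the full 37-node ECCS DBN, and the function names are hypothetical.

```python
import math

def transition_cpt(rate_per_hour, dt_hours):
    """Two-state transition CPT: P(state at t | state at t - dt)."""
    p_fail = 1.0 - math.exp(-rate_per_hour * dt_hours)
    return {
        "working": {"working": 1.0 - p_fail, "failed": p_fail},
        "failed": {"working": 0.0, "failed": 1.0},  # failed is absorbing
    }

def reliability_profile(rate_per_hour, dt_hours, n_steps):
    """P(working) at each time slice, starting from a working component."""
    cpt = transition_cpt(rate_per_hour, dt_hours)
    p_working = 1.0
    profile = [p_working]
    for _ in range(n_steps):
        p_working *= cpt["working"]["working"]
        profile.append(p_working)
    return profile

# Diesel engine failure rate (per hour) from the failure rate table.
DIESEL_ENGINE_RATE = 1.537e-4

profile = reliability_profile(DIESEL_ENGINE_RATE, dt_hours=10, n_steps=20)
for step in (0, 10, 20):
    print(f"t = {step * 10:3d} h, reliability = {profile[step]:.4f}")
```

Because the exponential distribution is memoryless, the stepwise product reproduces the closed-form value exp(−rt) at every time slice, independent of the chosen time-step length.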
4.6 Optimization of ECCS Operational Profiles.
Because the surrogate model decreased the computational expense of evaluating the throttle profiles, it allowed the use of optimizers, such as NSGA2, that are not specialized for computationally costly problems. NSGA2 is a gradient-free method for solving multi-objective optimization problems that is commonly used in the existing literature [42]. It was implemented using a Python multi-objective optimization library [43].
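At the core of NSGA2 is the notion of Pareto dominance, which is used to rank candidate solutions. The sketch below shows only this dominance check and the extraction of the non-inferior set for maximization objectives; it is not an implementation of NSGA2 and does not use the library cited above. The candidate (speed, reliability) values are hypothetical.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (expected speed, expected reliability) pairs for four throttle profiles.
candidates = [
    (10.0, 0.7365),
    (12.0, 0.7350),
    (11.0, 0.7340),  # dominated by (12.0, 0.7350)
    (13.0, 0.7330),
]

pareto_set = non_dominated(candidates)
print(pareto_set)
```

In a full NSGA2 run, this ranking is combined with crowding distance, selection, crossover, and mutation over many generations.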
In addition, a Bayesian optimizer was used to verify the solutions obtained from NSGA2. As established by Frazier [44], Bayesian optimization is designed for problems that take significant time to evaluate and have fewer than 20 dimensions (decision variables). Since the engine throttle profile had only five stages (i.e., five decision variables), this was an appropriate fit. These stages represent the engine throttle levels during different parts of a 200-hour mission, with each stage (design variable) setting the throttle level for 40 h of the mission. This was implemented using an open-source Python Bayesian optimization library [30].
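The mapping from the five decision variables to a 200-hour throttle schedule can be sketched as follows. The normalization of the throttle level to the range [0, 1] and the name `expand_profile` are assumptions made for this example.

```python
STAGE_HOURS = 40  # each stage covers 40 h of the mission
N_STAGES = 5      # five decision variables

def expand_profile(stage_levels, stage_hours=STAGE_HOURS):
    """Expand five stage-wise throttle levels into an hourly throttle schedule."""
    if len(stage_levels) != N_STAGES:
        raise ValueError(f"expected {N_STAGES} stage levels, got {len(stage_levels)}")
    hourly = []
    for level in stage_levels:
        # Assumption: throttle is expressed as a fraction between idle (0) and full (1).
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"throttle level out of range: {level}")
        hourly.extend([level] * stage_hours)
    return hourly

# Hypothetical profile: full throttle for 160 h, then reduced throttle for the last 40 h.
hourly_throttle = expand_profile([1.0, 1.0, 1.0, 1.0, 0.2])
print(len(hourly_throttle), hourly_throttle[159], hourly_throttle[160])
```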
The Bayesian optimizer started by randomly sampling a few throttle profiles and evaluating their performance. By exploring the design space in this way, the optimizer learned about the shape of the objective function and used Bayes’ theorem to update its uncertainty about the objective at any given throttle profile. The acquisition function built into the Bayesian optimizer then selected the next throttle profile to sample as the one expected to yield the largest increase in the objective value (for maximization). For more details about Bayesian optimization, the reader may refer to Ref. [44].
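To make the role of the acquisition function concrete, the sketch below evaluates expected improvement (EI), a widely used acquisition function for maximization. The source does not state which acquisition function was used, so EI is an assumption here, as are the posterior means and standard deviations of the two hypothetical candidate profiles.

```python
import math

def expected_improvement(mu, sigma, best_so_far):
    """Expected improvement over `best_so_far` for a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best_so_far) * cdf + sigma * pdf

# Hypothetical posterior (mean, standard deviation) of the objective at two profiles.
posterior = {
    "profile_A": (10.0, 0.1),  # well explored, predicted equal to the current best
    "profile_B": (9.8, 1.0),   # slightly lower prediction but highly uncertain
}
best_observed = 10.0

scores = {name: expected_improvement(mu, sigma, best_observed)
          for name, (mu, sigma) in posterior.items()}
next_profile = max(scores, key=scores.get)
print(scores, "->", next_profile)
```

In this example the uncertain profile is selected even though its predicted mean is lower, which illustrates how the acquisition function balances exploitation of good predictions against exploration of uncertain regions.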
A single-objective version of the problem was also considered and solved. Its objective function was a weighted sum that maximizes the expected speed while minimizing the variability in the speed across all uncertain environments tested. For the single-objective optimization, the system reliability was constrained to remain above a critical system health value set by the user. An external penalty approach penalized any engine throttle profile that resulted in a system reliability below the critical system health value, which was set to 0.735 (i.e., $R_{critical} = 0.735$). The expected value and the standard deviation of the speed of the vessel were determined using the ECCS simulation model, considering the uncertain environmental parameters (p) (from Sec. 4.1) and the throttle profile.
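A penalized weighted-sum objective of this kind can be sketched as follows. Only the critical reliability of 0.735 comes from the text; the quadratic penalty form, the penalty coefficient, the equal weighting, and the function name are assumptions made for illustration.

```python
R_CRITICAL = 0.735  # critical system health value from the text

def penalized_objective(mean_speed, std_speed, min_reliability,
                        weight=0.5, penalty=1.0e3):
    """Objective to MINIMIZE: negative weighted score plus an external penalty.

    weight  : relative importance of expected speed versus speed variability (assumed)
    penalty : penalty coefficient for violating the reliability constraint (assumed)
    """
    score = weight * mean_speed - (1.0 - weight) * std_speed
    violation = max(0.0, R_CRITICAL - min_reliability)  # zero when feasible
    return -score + penalty * violation ** 2

# Hypothetical evaluations of two throttle profiles with identical speed statistics.
feasible = penalized_objective(mean_speed=10.0, std_speed=1.0, min_reliability=0.740)
infeasible = penalized_objective(mean_speed=10.0, std_speed=1.0, min_reliability=0.725)
print(feasible, infeasible)
```

With an external penalty, infeasible profiles remain evaluable but become less attractive as the constraint violation grows, which lets an unconstrained optimizer handle the reliability constraint.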
4.7 Results and Discussion.
At the start of the experiment, the simulated USVs were given an initial health state and various uncertain environmental profiles. The initial health state was established to be a random value within the healthy state for each component. With these initial health state conditions, the ECCS simulation model would operate for 200 h (mission time) over the randomly generated environment and throttle profiles. Since the throttle profile is the only controllable variable in the system, the optimizer would adjust the throttle profiles based on the information it received from the sensors and system reliability prediction.
The optimizer seeks to maximize the speed, which reduces the travel time, while preventing the component wear from exceeding a specific limit (extending the lifespan of the components as much as possible). Component wear is unavoidable in any system in operation. However, with the addition of the optimizer, the system can reduce component wear during more severe operational conditions, such as warmer seawater temperatures. Conversely, the optimizer can take advantage of beneficial conditions, such as low current. Thus, the system still completes its mission as desired but does so with higher reliability at completion.
The ECCS model was validated against results obtained from physical equations, while the CNN models were validated by splitting the entire data set into training (80%) and testing (20%) data sets.
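The 80/20 hold-out split used for validating the CNN models can be sketched as follows. The shuffling step, the fixed seed, and the function name are assumptions made for this example; the source specifies only the split proportions.

```python
import random

def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 100 sample indices.
train_set, test_set = train_test_split(range(100))
print(len(train_set), len(test_set))
```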
The relationship between the expected speed and the expected reliability was explored through the bi-objective optimization formulation shown in Eq. (8). As explained earlier, increasing the expected speed places more stress on the components, which decreases the reliability of the system. Maximizing both the expected speed and the expected reliability therefore sets the two objectives in competition with each other. For the bi-objective optimization, the NSGA2 optimizer was run with a population size of 100 individuals for 42 generations. This configuration was chosen because the throttle profiles appeared to converge after 42 generations, with less than a 5% difference between profiles. To verify these results, a Bayesian optimizer was used with a structure similar to that of the optimization problem shown in Eq. (1). Its objective was to maximize the expected speed (i.e., minimize $-E_{speed}(X, P)$) subject to a constraint on the expected system reliability. The lower limit for the expected system reliability was varied between 0.733 and 0.7365 in increments of 0.0001, since this was the range of reliability values in the NSGA2 non-inferior solution set. This resulted in a population of 35 members. To keep the two optimization setups comparable, both needed to use the same total number of function calls. NSGA2 used 100 × 42 = 4200 function calls, so the Bayesian optimizer was run for 120 iterations for each of the 35 members (35 × 120 = 4200). The resulting non-inferior solution sets from these optimizers are shown in Fig. 11, which shows that the results from both optimizers are consistent with each other.
To confirm the accuracy of these results against the simulation model, 20 points were randomly selected from the NSGA2 non-inferior solution set. The corresponding environmental uncertainty parameters and throttle conditions were run through the simulation model. Taking the results from the simulation model as the actual values, the normalized percent errors for the expected speed and the expected system reliability were 0.057% and 0.044%, respectively.
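The error metric can be sketched as follows. The source does not define the normalized percent error, so this example assumes the mean absolute error relative to the simulated (actual) values, expressed as a percentage. The sample values are hypothetical and are not the results reported above.

```python
def normalized_percent_error(predicted, actual):
    """Mean absolute relative error, in percent (assumed definition)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)

# Hypothetical surrogate predictions versus simulation results for system reliability.
surrogate_reliability = [0.7352, 0.7341, 0.7360]
simulated_reliability = [0.7350, 0.7345, 0.7358]

error = normalized_percent_error(surrogate_reliability, simulated_reliability)
print(f"normalized percent error: {error:.3f}%")
```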
Based on the throttle profiles from the bi-objective optimization of the expected speed and the expected system reliability, the lowest system reliability is most affected by the throttle level at the end of the mission. This can be seen in the three throttle profiles shown in Fig. 11. Throttle profile 1, which had the highest system reliability but the lowest expected speed, ran at full throttle for the first 80 h of the mission and then slowed to nearly idle for the rest of the mission. Throttle profile 3, which had the lowest system reliability but the highest expected speed, remained at full throttle throughout the mission. Throttle profile 2 balanced system reliability and expected speed by running at full throttle for most of the mission and idling during the final stage (the last 40 h). These results are reasonable because the margin between the system reliability and the critical system reliability is largest at the beginning of the mission, which leaves the most room for high throttle levels early on.
A single-objective optimization problem was considered and solved as well. For this formulation, the optimizer found a throttle profile (Fig. 12) with a 50/50 balance between maximizing the expected speed of the USV and minimizing the variation in speed, while maintaining the system reliability above 0.735 for the duration of the mission, as demonstrated by the system and subsystem reliability curves shown in Fig. 13. This operating condition is also represented in Fig. 11 to verify the results from the optimization formulations. The trade-off between the expected speed and the variability in the speed was explored further through another bi-objective optimization problem. NSGA2 was used to find the non-inferior solutions for this relationship, with a population size of 100 individuals run for 100 generations. It was found that these objectives did not significantly conflict with each other, as there was no significant difference between the highest and lowest variability. Because this difference is so small, it can likely be attributed to noise or inaccuracies in the surrogate model and in these objectives.
In our approach, several models are employed to perform specific tasks, each of which introduces additional uncertainty into the analysis. These uncertainties are addressed through Monte Carlo simulations, validation of the models against test data, and robust optimization. Although these additional uncertainties cannot be eliminated completely, the proposed framework accounts for them, assesses their impact, and provides strategies for robust decision-making.
5 Conclusion
This paper contributes a new framework that integrates state-of-the-art techniques for reliability-based optimization of unmanned systems’ operational profiles, which has many potential applications. The proposed method is the first to integrate health monitoring of multi-component or multi-subsystem systems with causal relationships.
Unmanned systems have become popular because they can carry out tasks efficiently without human intervention. However, this efficiency relies on a multitude of components that must work together during various operations. Given the complex nature of unmanned systems and the fact that they may be inaccessible during operations, ensuring their reliability and performance is of utmost importance. The proposed reliability-based framework combines state-of-the-art simulation techniques, health diagnosis methods, system reliability analysis, and optimization techniques to enable operational planning for unmanned systems. In particular, this approach leverages deep learning techniques for subsystem health monitoring, dynamic Bayesian networks for system reliability analysis, and optimization schemes to optimize system performance and reliability for mission planning.
To demonstrate the effectiveness of the proposed approach, the framework was implemented and executed using a simulation model for the engine cooling and control system of an unmanned surface vessel. The study shows promising results in identifying system operational profiles with high expected reliability under uncertain environmental conditions. Beyond this specific example, the framework has the potential to improve the reliability and lifespan of unmanned systems. By enabling longer periods of deployment and the execution of more complex missions, this approach represents a significant step forward in the reliability-based optimization of unmanned systems’ operations. The proposed concept is also compatible with altering the systems’ operational profiles during a mission by implementing the framework in real time. Offline operational profile planning can be conducted beforehand, followed by an online implementation of the proposed framework, which can lead to improved performance of unmanned systems by using additional knowledge about the uncertain environment gained during a mission. Future work will focus on combining offline and online operational profile planning for such systems.
Acknowledgment
This study was supported by the Office of Naval Research (ONR) under Grant No. N000142212459. This support does not constitute an endorsement by the funding agency of the opinions expressed in the paper.
Funding Data
U.S. Department of the Navy, Office of Naval Research (DON-ONR Grant No. N000142212459)
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The data sets generated and supporting the findings of this article can be obtained from the authors upon reasonable request.