Abstract
Condition monitoring plays a crucial role in improving system failure resilience, preventing the tragic consequences of unexpected system failures, and avoiding the resulting high operation and maintenance costs. Continuous condition monitoring systems have been applied to diverse engineering systems to support well-informed operational decision-making. Although research has been devoted to predicting system states from the continuous data flow, there is still no systematic decision-making framework with which system designers can assess the value of such monitoring systems at the design stage and thereby decide whether to adopt a monitoring system to maximize the benefits. This paper constructs such a decision-making framework based on the value of information, with which system designers can evaluate expected operating cost reductions under specific operation modes while considering the effectiveness of continuous monitoring systems in predicting system failures. Two case studies, on a battery energy storage system and a mechanical system, respectively, illustrate the value evaluation of the monitoring information and the system maintenance process with the aid of different prognostic results based on the monitoring data. The case study results show that the value of monitoring systems is influenced by the deviation among the equipment group, the accuracy of system-state prediction, and the different types of costs involved in the operating process. Adjusting maintenance actions based on monitoring and prognosis information helps improve the value of monitoring systems.
1 Introduction
The capability of effectively ensuring high system reliability throughout a product’s lifecycle is critical to both the design and the operation of practical engineering systems. To take into account system performance degradation over time, time-dependent probabilistic constraints have been introduced into design decision-making frameworks, e.g., reliability-based design optimization (RBDO) [1–7], and periodic inspections and maintenance activities are conducted until the end of life to ensure the time-dependent performance of engineering products. Furthermore, with the development of the Internet of Things (IoT), considering only time-based maintenance will overestimate the operating cost, thus leading to a conservative design. To accurately estimate the lifecycle performance of a product, it is important to include the effects of monitoring systems, health management, and predictive and condition-based maintenance in the early-design phase [8–14]. Meanwhile, as sustainable design draws increasing attention, there is also strong motivation to integrate information from subsequent life-cycle stages into the early-design phase [15]. For example, product usage data can be applied to form clusters that indicate abnormal fields causing severe and rapid product function degradation [16]; the best sequence of disassembly operations for maintenance and component upgrade can be identified through graph-based integer linear programming together with multi-attribute utility analysis [17]; and lifecycle cost and energy consumption in a closed-loop supply chain have been considered in the design of product modular architecture [18].
Monitoring systems play a crucial role in preventing the tragic consequences of unexpected system failure and avoiding the resulting high cost. An effective forecasting ability, realized through the proper application of monitoring systems, enables customers, product manufacturers, and original equipment manufacturers to monitor system health, estimate the remaining useful life of systems, and take corrective actions [19]. Various engineering systems have benefited from monitoring systems that improve system safety, increase operational reliability and mission availability [20], decrease unnecessary maintenance actions, and reduce system life-cycle costs [21]. Such benefits are realized through a better understanding of the system degradation process, which relies on the quality of the designed sensor networks, the properties of the prognostic methods, and a better-informed maintenance decision-making process based on the knowledge inferred from monitoring data. Sensor network design optimization has been widely explored in Refs. [22,23], in which genetic algorithms, (mixed) integer programming, and heuristics based on specific system structure properties have been used.
As the value of monitoring systems, especially continuous monitoring systems, is recognized, plenty of research has been done from diversified aspects to fully utilize the information provided by monitoring in various engineering systems. First, data transmitted from monitoring systems are employed to establish the system degradation pattern. System degradation data obtained under normal and accelerated degradation processes can be employed to train degradation models with or without stress factors [24,25], with which system owners are able to prognose future component states more accurately. Among the degradation models, physics-based models (e.g., Arrhenius model, Eyring model) and statistics-based models (e.g., proportional hazard model) are primarily used. Then, the state of the whole system can be predicted by considering failure interactions between the components and the system behavior affected by such interactions [26]. Second, studies have been conducted to optimize the maintenance process based on the predicted system state. Renewal theory [27] and Monte Carlo simulation [28] can be applied to find the optimal maintenance policy for a single-unit system or a multi-unit system with independent components. An optimal maintenance policy for a general system can be built by considering economic, structural, and stochastic dependency between the components [29]. Guillen et al. [30] have proposed a framework for managing condition-based maintenance programs, focusing on the optimized usage of condition monitoring systems in the operating stage. Third, research on evaluating the value of information provided by the monitoring system [31] has been done largely based on Bayesian updating and utility-based decision theory. 
The value of the monitoring system and the information it provides lies in helping to make timely and effective maintenance decisions. This value is calculated by comparing the expected operating cost with monitoring against the expected operating cost in the scenario without any up-to-date monitoring data. An example of using the value of information to optimize the condition-based maintenance policy can be found in Ref. [32].
Although studies have been conducted from comprehensive aspects to develop effective condition monitoring systems (e.g., sensor networks) or to utilize the condition monitoring information for maintenance, they have largely focused on the operating phase of the system. However, it is a different problem for system designers to make a design decision on whether to adopt a condition monitoring system out of different design alternatives at the early-design stage. Therefore, this paper proposes a systematic framework and design tools for system designers to evaluate the cost and benefits of adopting a certain monitoring system, for critical system components or subsystems, by considering various key factors such as the system structure and performance degradation, failure criticality, monitoring system performance for system-state prediction, and the condition-based maintenance processes.
The paper is organized as follows. In Sec. 2, the definition and analysis of VoI are reviewed. Then, the decision-making framework and the assessment of VoI for continuous monitoring systems are further explained in Sec. 3. In Sec. 4, two case studies involving different operating conditions are conducted to illustrate how the framework can be used to evaluate and maximize the value of continuous monitoring systems in different engineering applications. Finally, the discussion on how the analysis of VoI of monitoring systems can benefit system design and conclusions are given in Sec. 5.
2 Value of Information
This section provides a brief introduction to the value of information (VoI) definition and the quantitative measures under perfect and imperfect information scenarios.
2.1 Value of Information Definition.
From the decision-analytic perspective, information has no value when it leads to an action we would have taken without the information, and is valuable when it leads to action different from the one we would have taken without the information [33]. Due to uncertain scenarios in the future, the expected value of information is most frequently adopted, which is determined by the information’s impact on future actions. Further, we make a distinction between perfect information and imperfect information. We often call perfect information clairvoyance, because it is what we would learn from a clairvoyant: a person who perfectly and truthfully reports on any observable event that is not affected by our actions [34]. Whereas clairvoyance about an uncertain parameter eliminates uncertainty on that parameter completely, imperfect information reduces but does not eliminate the uncertainty.
2.2 Value of Information Analysis.
The analysis of VoI is rooted in Bayesian updating and utility-based decision theory. As described in Ref. [35], it assigns a value to a piece of information as the difference between the expected utilities or costs of the optimum decisions with and without that information. As shown in the decision tree in Fig. 1, the decision-maker needs to select the option with the higher expected value between A and B. Suppose that, without any information, the highest expected value that can be achieved is EVbase. If perfect information about the consequence of option B is known, then the uncertainty at each decision point is completely eliminated, leading to an expected value EVperfect. On the other hand, if only imperfect information is available, then the probability of each consequence of B is conditioned on the information. With the imperfect information, an expected value EVimperfect can be achieved. Then, the expected value of perfect information (EVPI) and the expected value of imperfect information (EVII) can be calculated using Eqs. (1) and (2):
EVPI = EVperfect − EVbase (1)
EVII = EVimperfect − EVbase (2)
A similar method can be applied when the goal is to achieve the lowest cost or the maximum utility.
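To make the calculation in Eqs. (1) and (2) concrete, the following is a minimal sketch for a two-option decision of the kind described above. The function name, the payoff values, and the symmetric signal accuracy are illustrative assumptions and are not taken from Fig. 1.

```python
def expected_values(p_good, payoff_a, payoff_b_good, payoff_b_bad, accuracy):
    """Return (EV_base, EV_perfect, EV_imperfect) for a two-option decision.

    Hypothetical setup: option A pays payoff_a for certain; option B pays
    payoff_b_good with probability p_good and payoff_b_bad otherwise.
    accuracy = P(signal says good | good) = P(signal says bad | bad).
    """
    p_bad = 1.0 - p_good

    # Without information: pick the option with the higher expected value.
    ev_b = p_good * payoff_b_good + p_bad * payoff_b_bad
    ev_base = max(payoff_a, ev_b)

    # Perfect information: the outcome of B is known before choosing.
    ev_perfect = (p_good * max(payoff_a, payoff_b_good)
                  + p_bad * max(payoff_a, payoff_b_bad))

    # Imperfect information: Bayesian update on the signal, then choose.
    ev_imperfect = 0.0
    for says_good in (True, False):
        like_good = accuracy if says_good else 1.0 - accuracy
        like_bad = 1.0 - accuracy if says_good else accuracy
        p_signal = like_good * p_good + like_bad * p_bad
        if p_signal == 0.0:
            continue
        post_good = like_good * p_good / p_signal
        ev_b_post = post_good * payoff_b_good + (1.0 - post_good) * payoff_b_bad
        ev_imperfect += p_signal * max(payoff_a, ev_b_post)

    return ev_base, ev_perfect, ev_imperfect


base, perfect, imperfect = expected_values(0.5, 40.0, 100.0, 0.0, 0.8)
print("EVPI =", perfect - base, "EVII =", imperfect - base)
```

With these hypothetical inputs, the sketch gives EVPI = 20 and EVII = 10 (up to floating-point rounding). An accuracy of 0.5 makes the signal uninformative and drives EVII to zero, while an accuracy of 1.0 recovers EVPI.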
3 Value of Information for Continuous Monitoring Systems
As introduced in Sec. 1, the value of the monitoring system and the information it offers depends on how the system is modeled and how the maintenance actions are guided by the information. The methods of system degradation modeling, maintenance decision-making, and value of information quantification can be applied systematically to help engineering system owners to maximize their benefits of adopting monitoring systems. A framework for evaluating the value of continuous monitoring systems and making optimal decisions on monitoring is shown in Fig. 2. Since the framework is designed to help system owners make optimal decisions on monitoring, it first requires a specific design with system structure (e.g., series, parallel, and mixed), reliability target (expected service life), and redundancy allocation as the input. Such a design is associated with certain development costs, an expected degradation performance (e.g., physics-based model and simulation-based model), and variation among individual assets or components. Then, a selection of monitoring systems and prognostic algorithms should be provided, and this selection can be one of the monitoring alternatives for the system owner to compare. The selection of the monitoring system will determine a set of monitoring sensors and a specific design of the sensor network while the selection of prognostic algorithm will specify how the monitoring data should be processed and modeled to predict the system's health state. Monitoring and prognosis systems with different quality and accuracy require different investments, and the trade-off between cost and quality can be analyzed following the proposed framework by considering a system design together with its operating process. 
The third block of the framework requires information from the operating and maintenance process, including the decision on maintenance policy (e.g., period for time-based maintenance, threshold for condition-based maintenance), the cost of the maintenance activities (e.g., cost of preventive maintenance, penalty for system failure, and cost of recovery), and the effect of different maintenance activities (e.g., same as new, same as used, or extended usage length). With the specification of system degradation, monitoring, prognosis, and maintenance, we can then evaluate the value of monitoring systems and the value of each piece of monitoring information. By analyzing the breakeven points under different scenarios, system owners can select the optimal monitoring system based on the system degradation pattern. The value assessment of each piece of monitoring information can also be used to improve the effect of condition-based maintenance, thus maximizing the benefit of a specific monitoring system.
3.1 Maintenance Without Monitoring Information.
3.2 Maintenance With Monitoring Information.
3.2.1 State Assessment With Fixed Accuracy.
3.2.2 State Assessment With Improved Accuracy.
4 Case Studies and Results
In this section, two case studies are conducted following the decision-making framework shown in Fig. 2. The first case study, on a battery energy storage system, considers state assessment with fixed accuracy, while the second case study, on general mechanical equipment, incorporates improved accuracy in the state assessment. The results of the case studies have been obtained by following each step outlined in the proposed decision-making framework: specifying the degradation characteristic of the system, selecting a sensor network design or monitoring scenario, applying a certain condition-based maintenance policy, evaluating the value of monitoring systems, and then providing feedback to system designers on the performance of the system design. Since the proposed design decision-making framework is used to evaluate the benefit of adopting a certain monitoring system for a system design, the case study results mainly report the influence of the parameters of the designed system and the monitoring system on the operating cost. Finally, insights can be drawn from using the framework so that the decision-maker knows which monitoring system should be adopted for which system design to maximize the benefit.
4.1 Case Study I: A Battery Energy Storage System.
In the Battery Energy Storage System, battery assets are charged when the electricity price is low and discharged at a high price so that periodical energy demand can be satisfied and utility companies can make profits. Different from other equipment whose failure is self-announcing and explicit, the failure of a battery asset can be defined based on different criteria. One of the common practices is to replace battery assets whose capacity is less than 80%. Thus, the battery asset needs to be monitored continuously, and meanwhile, its state of health has to be predicted to maintain a sufficient power supply. The use of monitoring systems in the Battery Energy Storage System can help ensure a timely replacement of the degraded assets, thus satisfying the energy demand. Since demand unfulfillment will lead to a potential reduction in the market share and various damage from the end-users, it is critical to select sufficient monitoring systems and adopt them in an efficient manner to save the total operating cost and provide the adequate power supply. The specifications of the decision-making framework in Fig. 2 for case study I are summarized in Table 1.
Framework setup | Case study I specification |
---|---|
System structure | Energy storage system (parallel assets) |
Performance expectation | Energy provided satisfies the periodical demand |
Degradation model | Capacity loss based on the Arrhenius model as in Eq. (11) |
Monitoring and prognosis | Monitored individually and capacity predicted with different accuracy |
Maintenance | Replacement at 80% capacity |
4.1.1 Problem Description for Case Study I.
Suppose sensors are installed in each battery asset for the decision-maker to infer its health state. In this case study, we consider two types of state inference results. In the first scenario, later referred to as the unbiased scenario, the predicted state of health follows a normal distribution whose mean is the true remaining capacity and whose standard deviation is a certain percentage of the mean. When the predicted capacity exactly matches the true capacity, the information is perfect. In the second scenario, the predicted result is biased, meaning that the predicted state of health is always larger or always smaller than the true remaining capacity. Replacement actions are performed periodically for those assets whose predicted remaining capacity is below 80%. In addition, based on the periodic prediction result, the total capacity of the current assets is compared with the demand to initiate the purchase or usage of additional new assets.
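The two state inference scenarios and the 80% replacement rule can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, capacities are expressed as fractions of nominal capacity, and the bias is modeled as a constant additive offset, which is one possible reading of "always larger or smaller".

```python
import random


def predict_capacity(true_capacity, error_pct=0.0, bias=0.0, rng=random):
    """Predicted state of health for one asset (hypothetical helper).

    Unbiased scenario: bias = 0 and the prediction is normally distributed
    around the true capacity with standard deviation error_pct * mean.
    Biased scenario (assumed form): a constant offset is added to the mean.
    """
    mean = true_capacity + bias
    return rng.gauss(mean, error_pct * mean)


def assets_to_replace(true_capacities, threshold=0.8, error_pct=0.0,
                      bias=0.0, rng=random):
    """Indices of assets whose *predicted* capacity is below the threshold."""
    return [i for i, c in enumerate(true_capacities)
            if predict_capacity(c, error_pct, bias, rng) < threshold]


capacities = [0.95, 0.82, 0.79, 0.70]  # illustrative true capacities
print(assets_to_replace(capacities))              # perfect information
print(assets_to_replace(capacities, bias=0.05))   # positive bias
print(assets_to_replace(capacities, bias=-0.05))  # negative bias
```

For these illustrative capacities, perfect information flags the last two assets, a positive bias of 0.05 flags only the last asset (a degraded asset is missed), and a negative bias of 0.05 flags the last three (a healthy asset is replaced early).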
To evaluate the value of monitoring systems, one-year management processes are simulated and compared for two operating scenarios: operation without monitoring information and operation with monitoring information. The operating cost considered includes the purchase cost, the monitoring investment, and the penalty for unsatisfied power demand. The breakeven point, at which the operating cost with a monitoring system equals that without the monitoring system, is then explored under different scenarios. The purchase cost of a new battery asset with 100 kWh capacity is assumed to be $100k. To explain the influence of the relative values of the penalty and the monitoring cost, we examine the relationship between the P/B ratio (the penalty per unit of unsatisfied demand divided by the purchase cost of a battery asset) and the M/B ratio (the cost of a unit monitoring sensor divided by the purchase cost of a battery asset). The replacement decision is made monthly, and the periodic energy demand in the presented study is assumed to be as shown in Fig. 4.
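As a rough illustration of the breakeven idea, the operating cost can be normalized by the battery purchase cost B, so that it depends only on the P/B and M/B ratios. The sketch below is a simplification under assumed inputs: the function names and the asset, sensor, and unmet-demand counts are hypothetical, and the study itself obtains these quantities from simulation.

```python
def operating_cost(n_purchased, unmet_demand, n_sensors, pb_ratio, mb_ratio):
    """Operating cost in units of B (purchase cost of one battery asset).

    purchase + penalty for unsatisfied demand + monitoring investment.
    """
    return n_purchased + pb_ratio * unmet_demand + mb_ratio * n_sensors


def breakeven_mb_ratio(without, with_monitoring, n_sensors, pb_ratio):
    """M/B ratio at which both operating scenarios cost the same.

    `without` and `with_monitoring` are (n_purchased, unmet_demand) tuples,
    here taken as given (hypothetical) simulation outcomes.
    """
    cost_without = operating_cost(*without, 0, pb_ratio, 0.0)
    cost_with_no_sensors = operating_cost(*with_monitoring, 0, pb_ratio, 0.0)
    return (cost_without - cost_with_no_sensors) / n_sensors


# Hypothetical outcomes: monitoring avoids 2 purchases and 45 units of unmet demand.
mb_star = breakeven_mb_ratio((12, 50), (10, 5), n_sensors=20, pb_ratio=0.1)
print("breakeven M/B ratio:", mb_star)
```

Under these assumed numbers, monitoring pays off only if a sensor costs less than about 0.325 of a battery asset; a non-positive breakeven ratio would mean monitoring cannot pay off at any sensor cost.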
4.1.2 Results of Case Study I.
For the first scenario with an unbiased state inference result, we first compare the values of perfect information and imperfect information, as shown in Fig. 5. The comparison shows that the larger the deviation among the battery assets’ capacity degradation processes, the more value monitoring and prognosis information can bring to the decision-maker. In Fig. 5, only one scenario with imperfect information is displayed, but the difference between EVPI and EVII changes with the prediction accuracy level. Furthermore, when the monitoring cost (i.e., the cost to collect the information) is taken into consideration, breakeven points may occur. Following the proposed decision-making framework, we therefore explored the breakeven points for different combinations of prediction error percentage and standard deviation of battery capacity under different penalty, purchase, and monitoring costs. The result is shown in Fig. 6. The result for each parameter setting is the average performance of 1000 simulation runs. Figure 6 indicates that, at the breakeven points, the prediction error percentage is always smaller than the battery standard deviation. When the monitoring cost is high and the battery variance is small, monitoring systems will not bring benefits.
For the second scenario with a biased state inference result, both positive bias and negative bias are considered. When the prediction has a positive bias, the result is shown in Fig. 7. With positively biased information, battery capacity is overestimated, which makes the decision-maker purchase fewer battery assets, so the monitoring information always brings a high penalty. In this scenario, a breakeven point is a balance between purchase cost savings and penalties within the case with monitoring systems. Even when the bias is small, the monitoring cost cannot be high; otherwise, the purchase savings cannot compensate for the penalty, and introducing a monitoring system will not bring benefits at all. When the prediction has a negative bias, the result is shown in Fig. 8. With negatively biased information, battery capacity is underestimated, which makes the decision-maker purchase more battery assets, so the monitoring information leads to almost no penalty. In this scenario, a breakeven point is the balance between the monitoring cost and the penalty of the case without monitoring systems. When the bias is small and the monitoring cost is low, monitoring systems can always bring benefits.
4.2 Case Study II: A General Mechanical Equipment.
General mechanical equipment like motors, fans, and turbines is frequently used in industry. Instead of the complete degradation process, the interest in analyzing the mechanical equipment lies in estimating its failure time. Weibull distributions are often applied to model the lifetime of a type of mechanical equipment, and sensor data (e.g., vibration) can be used to train prognostic models on the failure time. As the monitoring process continues, a more accurate prediction can be achieved from the accumulated data. In this case study, we consider the improvement of the prediction accuracy and the maintenance policy driven by the value of information.
4.2.1 Problem Description for Case Study II.
Framework setup | Case study II specification |
---|---|
System type | General mechanical equipment |
Performance expectation | Avoid failure and provide a cost-effective functional service duration |
Degradation model | Weibull distribution |
Monitoring and prognosis | Monitored with embedded sensors and failure time predicted with improved accuracy |
Maintenance | Condition-based maintenance using the state assessment process as described in Sec. 3.2.2 |
The maintenance processes with and without monitoring systems are compared to calculate the value of information and then improve the decision-making. Without the monitoring systems, the time to perform a replacement is optimized to minimize the expected cost rate EC1, which is calculated using Eq. (4). The best replacement time (tr*) based on the lifetime distribution is the one that minimizes the cost rate. For the scenarios with monitoring systems, state assessment can be performed multiple times as long as the predicted result provides positive value. The default parameters used in this case study are summarized in Table 3. These parameters characterize the operating scenario to be evaluated. If such information is not available, a sensitivity analysis may be conducted to see how the maintenance decision and the value of information change with these parameters. In this study, when the effect of a certain type of parameter is explored, the other parameter values are kept the same as indicated in this table.
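The search for the best replacement time without monitoring can be sketched numerically. Because Eq. (4) is not reproduced in this section, the sketch assumes the classic age-replacement cost rate, i.e., the expected cost per cycle, cc·F(t) + cp·(1 − F(t)), divided by the expected cycle length, the integral of the reliability from 0 to t. The Weibull and cost parameters below are hypothetical and are not the Table 3 defaults.

```python
import math


def weibull_reliability(t, alpha, beta):
    """R(t) = exp(-(t/alpha)^beta) for scale alpha and shape beta."""
    return math.exp(-((t / alpha) ** beta))


def cost_rate(t_r, alpha, beta, cc, cp, steps=1000):
    """Assumed age-replacement cost rate for replacement at age t_r."""
    # Expected cycle length: trapezoidal integral of R(u) over [0, t_r].
    h = t_r / steps
    area = 0.5 * (1.0 + weibull_reliability(t_r, alpha, beta))
    for k in range(1, steps):
        area += weibull_reliability(k * h, alpha, beta)
    expected_length = area * h
    # Expected cycle cost: corrective if failed before t_r, else preventive.
    r = weibull_reliability(t_r, alpha, beta)
    expected_cost = cc * (1.0 - r) + cp * r
    return expected_cost / expected_length


def best_replacement_time(alpha, beta, cc, cp, t_max=None, grid=300):
    """Grid search for the replacement age that minimizes the cost rate."""
    t_max = t_max or 3.0 * alpha
    candidates = [t_max * (k + 1) / grid for k in range(grid)]
    return min(candidates, key=lambda t: cost_rate(t, alpha, beta, cc, cp))


t_star = best_replacement_time(alpha=100.0, beta=3.0, cc=10.0, cp=1.0)
print("best replacement time:", t_star,
      "cost rate:", cost_rate(t_star, 100.0, 3.0, 10.0, 1.0))
```

A grid search is used here for transparency rather than efficiency. Under this assumed model, a larger cc/cp ratio moves the best replacement time earlier, which matches the intuition that costlier failures justify earlier preventive action.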
4.2.2 Results for Case Study II.
In this case study, there are three types of parameters that will influence the number of predictions and the minimum maintenance cost rate. The first type of parameters includes different types of cost in the maintenance process, including corrective maintenance cost (cc), preventive maintenance cost (cp) and the cost to make a prediction (ci). The second type includes the scale and shape parameter of the lifetime distribution of the equipment. The third type is about the model to describe the improvement of the prediction accuracy. The following results show how the value of information and the maintenance decisions change as the three types of parameters vary.
For the first type, the cost parameters, what matters is the ratio between the cost of the failure consequence and the cost of taking preventive actions. We therefore vary the two cost ratios cc/cp and cc/ci to explore the change in maintenance decisions and the value of information. The optimal number of predictions to be made before replacement is shown in Fig. 10, and the cumulative value of information from the corresponding number of predictions is shown in Fig. 11. As the results show, changes in these two cost ratios influence both the optimal number of predictions and the cumulative value of information. When the corrective maintenance cost is only twice the preventive maintenance cost, the replacement time calculated using the original lifetime distribution is good enough that additional predictions (which require cost and involve uncertainty) do not help reduce the cost rate. That is why the numbers in the first row of Fig. 10 are all zero. When performing a preventive action saves more cost compared with corrective maintenance, we choose to make one or two predictions in order to perform maintenance actions before the failure happens. The relationship between the preventive cost and the prediction cost then determines the number of predictions. When the cost of preventive maintenance is extremely low, the information from one prediction result is useful enough to save operating cost. Figure 11 shows a consistent change in the cumulative value of information: the lower the cost that the two preventive actions require, the more operating cost can be saved.
For the second type, the distribution parameters, we explore how changes in variance and skewness influence the decision and the VoI. The results are shown in Figs. 12 and 13. As the shape parameter β increases, the distribution changes from right-skewed to left-skewed. As the scale parameter α decreases, the distribution becomes more and more concentrated. As shown in Fig. 12, for the two right-skewed cases with α = 80 and α = 100, making only one prediction is the best choice, but it cannot achieve the highest cumulative value as compared with the other two right-skewed cases with α = 120 and α = 140. So, there is a boundary condition at which the optimal decision changes from making one prediction to making two predictions.
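The effect of the two Weibull parameters on spread and skewness can be checked with the standard closed-form moments. The sketch below uses only the textbook gamma-function expressions; the parameter values are illustrative and are not those behind Figs. 12 and 13.

```python
import math


def weibull_moments(alpha, beta):
    """Mean, standard deviation, and skewness of a Weibull(alpha, beta).

    alpha is the scale parameter and beta the shape parameter; the
    expressions follow from E[T^k] = alpha^k * Gamma(1 + k/beta).
    """
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    g3 = math.gamma(1.0 + 3.0 / beta)
    mean = alpha * g1
    std = alpha * math.sqrt(g2 - g1 ** 2)
    # Skewness depends on the shape parameter only.
    skew = (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / (g2 - g1 ** 2) ** 1.5
    return mean, std, skew


for beta in (1.0, 2.0, 5.0):          # illustrative shape values
    for alpha in (80.0, 140.0):       # illustrative scale values
        mean, std, skew = weibull_moments(alpha, beta)
        print(f"alpha={alpha:5.0f} beta={beta:3.1f} "
              f"mean={mean:7.2f} std={std:6.2f} skew={skew:+.3f}")
```

The skewness is positive for small β and becomes negative for large β (the sign change occurs at roughly β ≈ 3.6), while the standard deviation scales linearly with α, so a smaller α gives a more concentrated lifetime distribution.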
The third type, the prediction-related parameters, influences how we model the improvement of the prediction results obtained using the monitoring data. First, the bounds for the bias term Be and the standard deviation term S are considered; their influence on the number of failure time predictions and the cumulative value of information is shown in Figs. 14 and 15, respectively. The figures show that the replacement decision based on the prediction result shortens the usage length when the prediction result has a negative bias. Moreover, when the saving from preventive maintenance is neutralized by the shortened usage length, it is not beneficial to determine the replacement time based on the prediction result. On the other hand, when the prediction result has a positive bias, the replacement time based on the prediction result is later than the original plan. Thus, the optimal strategy is to use one predicted result to distinguish the individual from the population but not to trust additional predictions, since the predicted information may lead to actual failure of the equipment. Furthermore, the quality of the prediction result influences the time and quantile at which predictions are made, as shown in Figs. 16 and 17, respectively. When the predicted distribution has a negative bias, there is a longer waiting time before the second prediction is made. However, when the predicted failure time distribution has a positive bias, the second prediction follows closely to best capture the actual failure time.
Another prediction-related parameter is µ, which represents the speed of accuracy improvement. The maintenance actions under different µ values are summarized in Table 4. When µ is large, which indicates fast improvement in the predictions based on the monitoring information, only one prediction is made in the case study. In addition, as µ increases, the time and the quantile of the failure time distribution at which predictions are made also change accordingly.
μ | 0.6 | 0.7 | 0.8 | b | 1.2 |
---|---|---|---|---|---|
Number of predictions | 2 | 2 | 2 | 2 | 1 |
First prediction time | 14 | 13 | 12 | 11 | 10 |
Second prediction quantile | 0.09 | 0.15 | 0.21 | 0.21 | – |
Cumulative VoI | 0.581 | 0.626 | 0.737 | 0.648 | 0.636 |
Thus, with the cumulative value of information calculated under different cost, equipment reliability, monitoring, and prognosis scenarios, decision-makers can select an operations management setting that can maximize the value of monitoring. Also, by following the optimal number of predictions and the best time to make a state assessment, the system can be maintained appropriately to achieve the largest monitoring benefit.
5 Conclusion
This paper establishes a decision-making framework, consisting of degradation modeling, monitoring system design, and maintenance planning, to evaluate the value of continuous monitoring systems in recurrent decision scenarios. Two case studies have been conducted to illustrate the efficacy of the framework applied to engineering systems with different operating scenarios, system modeling, and application of monitoring data. The results show that the value of monitoring systems will be influenced by the deviation among the equipment group, the accuracy of system-state prediction, and different types of costs involved in the operating process. And the adjustment of maintenance actions based on monitoring and prognosis information will help improve the value of monitoring systems. With the proposed framework, decision-makers will be able to evaluate the cost and benefits of adopting a certain monitoring system to a certain system design. This way, a cost-effective monitoring system can be selected to guarantee sufficient time-dependent reliability for the system, and in turn, the more accurate operating performance estimation can help improve the system design.
The current case studies focus on only one failure mode and use replacement with a new asset or component as the only maintenance action. In real applications, more failure modes and other types of maintenance activities can be incorporated to estimate the operating cost with monitoring systems more accurately. In addition, the framework simplifies data processing and state prognosis by considering one unified monitoring system, whereas in practice one engineering system may be monitored by multiple types of sensors, and there can be interactions among different monitoring systems. Another direction for future work is to incorporate the maintenance cost evaluation, considering the usage of monitoring systems, into the design decisions. By adopting the proposed framework, the operating and maintenance cost can be estimated more accurately and used as one of the evaluation criteria for system or product designs.
Acknowledgment
This research is partially supported by the National Science Foundation (NSF) through the Faculty Early Career Development (CAREER) award CMMI-1813111 and the NSF Engineering Research Center for Power Optimization of Electro-Thermal Systems (POETS) with cooperative agreement EEC-1449548.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The authors attest that all data for this study are included in the paper.
Nomenclature
- B = purchase cost of a battery asset
- L = time to perform a maintenance action
- M = cost of a unit monitoring sensor
- P = penalty from unit unsatisfied demand
- R = gas constant
- S = deviation bound of the predicted lifetime
- T = actual lifetime
- cc = corrective maintenance cost
- ci = state inference cost
- qr = quantile of a distribution to perform a maintenance action
- tr = time to make a replacement
- Ah = Ah throughput
- Be = bias bound of the predicted lifetime
- F1 = cumulative distribution function of the true lifetime distribution
- Fp = cumulative distribution function of the predicted lifetime distribution
- Qloss = battery capacity loss
- Tk = Kelvin temperature
- = allocated monitoring cost
- tr* = best time to make a replacement
- cp = preventive maintenance cost
- CM = total monitoring cost
- α = scale parameter of Weibull distribution
- β = shape parameter of Weibull distribution
- τ = time to perform health state prognosis
- γl = health state prognosis result at time l
- θl = a pre-determined threshold to initiate maintenance actions
- µl = improvement of the prediction accuracy