Abstract

Condition monitoring plays a crucial role in improving system failure resilience, preventing the tragic consequences of unexpected system failures, and avoiding the resulting high operation and maintenance costs. Continuous condition monitoring systems have been applied to a wide range of engineering systems to support well-informed operational decision-making. Although considerable research has been devoted to predicting system states from continuous data flows, a systematic decision-making framework is still lacking for system designers to assess the value of such monitoring systems at the design stage and thereby decide whether to adopt them so as to maximize the benefits. This paper constructs such a decision-making framework based on the value of information, with which system designers can evaluate the expected operating cost reductions under specific operation modes, considering the effectiveness of continuous monitoring systems in predicting system failures. Two case studies, on a battery energy storage system and a mechanical system, illustrate the evaluation of the value of monitoring information and the system maintenance process aided by different prognostic results based on the monitoring data. The case study results show that the value of monitoring systems is influenced by the deviation among the equipment group, the accuracy of system-state prediction, and the different types of costs involved in the operating process. Adjusting maintenance actions based on monitoring and prognosis information helps improve the value of monitoring systems.

1 Introduction

The capability of effectively ensuring high system reliability throughout a product’s lifecycle is critical to both the design and operation of practical engineering systems. To account for system performance degradation over time, time-dependent probabilistic constraints have been introduced into design decision-making frameworks, e.g., reliability-based design optimization (RBDO) [1–7], and periodic inspections and maintenance activities are conducted until the end of life to ensure the time-dependent performance of engineering products. Furthermore, with the development of the Internet of Things (IoT), considering only time-based maintenance will overestimate the operating cost, thus leading to a conservative design. To accurately estimate the lifecycle performance of a product, it is important to include the effects of monitoring systems, health management, and predictive and condition-based maintenance in the early-design phase [8–14]. Meanwhile, as sustainable design attracts increasing attention, there is also a strong case for integrating information from subsequent life-cycle stages into the early-design phase [15]. For example, product usage data can be applied to form clusters that indicate abnormal fields causing severe and rapid product function degradation [16]; the best sequence of disassembly operations for maintenance and component upgrade can be identified through graph-based integer linear programming together with multi-attribute utility analysis [17]; and lifecycle cost and energy consumption in a closed-loop supply chain have been considered in the design of product modular architecture [18].

Monitoring systems play a crucial role in preventing the tragic consequences of unexpected system failure and avoiding the resulting high cost. An effective forecasting ability, realized through the proper application of monitoring systems, enables customers, product manufacturers, and original equipment manufacturers to monitor system health, estimate the remaining useful life of systems, and take corrective actions [19]. Various engineering systems have benefited from monitoring systems that improve system safety, increase operational reliability and mission availability [20], decrease unnecessary maintenance actions, and reduce system life-cycle costs [21]. Such benefits are realized through a better understanding of the system degradation process, which relies on the quality of the designed sensor networks, the properties of the prognostic methods, and a better-informed maintenance decision-making process based on the knowledge inferred from monitoring data. Sensor network design optimization has been widely explored in Refs. [22,23], where genetic algorithms, (mixed) integer programming, and heuristics based on specific system structure properties have been used.

As the value of monitoring systems, especially continuous monitoring systems, is recognized, plenty of research has been done from diversified aspects to fully utilize the information provided by monitoring in various engineering systems. First, data transmitted from monitoring systems are employed to establish the system degradation pattern. System degradation data obtained under normal and accelerated degradation processes can be employed to train degradation models with or without stress factors [24,25], with which system owners are able to prognose future component states more accurately. Among the degradation models, physics-based models (e.g., Arrhenius model, Eyring model) and statistics-based models (e.g., proportional hazard model) are primarily used. Then, the state of the whole system can be predicted by considering failure interactions between the components and the system behavior affected by such interactions [26]. Second, studies have been conducted to optimize the maintenance process based on the predicted system state. Renewal theory [27] and Monte Carlo simulation [28] can be applied to find the optimal maintenance policy for a single-unit system or a multi-unit system with independent components. An optimal maintenance policy for a general system can be built by considering economic, structural, and stochastic dependency between the components [29]. Guillen et al. [30] have proposed a framework for managing condition-based maintenance programs, focusing on the optimized usage of condition monitoring systems in the operating stage. Third, research on evaluating the value of information provided by the monitoring system [31] has been done largely based on Bayesian updating and utility-based decision theory. 
The value of a monitoring system and the information it provides lies in supporting timely and effective maintenance decisions; it is calculated by comparing the expected operating cost with the monitoring information against that of the scenario without any up-to-date monitoring data. An example of using the value of information to optimize the condition-based maintenance policy can be found in Ref. [32].

Although studies have been conducted from comprehensive aspects to develop effective condition monitoring systems (e.g., sensor networks) or to utilize the condition monitoring information for maintenance, they have largely focused on the operating phase of the system. However, it is a different problem for system designers to make a design decision on whether to adopt a condition monitoring system out of different design alternatives at the early-design stage. Therefore, this paper proposes a systematic framework and design tools for system designers to evaluate the cost and benefits of adopting a certain monitoring system, for critical system components or subsystems, by considering various key factors such as the system structure and performance degradation, failure criticality, monitoring system performance for system-state prediction, and the condition-based maintenance processes.

The paper is organized as follows. In Sec. 2, the definition and analysis of the value of information (VoI) are reviewed. The decision-making framework and the assessment of VoI for continuous monitoring systems are then explained in Sec. 3. In Sec. 4, two case studies involving different operating conditions illustrate how the framework can be used to evaluate and maximize the value of continuous monitoring systems in different engineering applications. Finally, a discussion of how the VoI analysis of monitoring systems can benefit system design, together with the conclusions, is given in Sec. 5.

2 Value of Information

This section provides a brief introduction to the value of information (VoI) definition and the quantitative measures under perfect and imperfect information scenarios.

2.1 Value of Information Definition.

From the decision-analytic perspective, information has no value when it leads to an action we would have taken without the information, and is valuable when it leads to action different from the one we would have taken without the information [33]. Due to uncertain scenarios in the future, the expected value of information is most frequently adopted, which is determined by the information’s impact on future actions. Further, we make a distinction between perfect information and imperfect information. We often call perfect information clairvoyance, because it is what we would learn from a clairvoyant: a person who perfectly and truthfully reports on any observable event that is not affected by our actions [34]. Whereas clairvoyance about an uncertain parameter eliminates uncertainty on that parameter completely, imperfect information reduces but does not eliminate the uncertainty.

2.2 Value of Information Analysis.

The analysis of VoI is rooted in Bayesian updating and utility-based decision theory. As described in Ref. [35], it assigns a value to a piece of information as the difference between the expected utilities or costs of the optimum decisions with and without that information. In the decision tree shown in Fig. 1, the decision-maker needs to select the option with the higher expected value between A and B. Suppose that, without any information, the highest expected value that can be achieved is EVbase. If perfect information about the consequence of option B is known, then the uncertainty at each decision point is completely eliminated, leading to an expected value EVperfect. On the other hand, if only imperfect information is available, then the probability of each consequence of B is conditioned on the information, and an expected value EVimperfect can be achieved. The expected value of perfect information (EVPI) and the expected value of imperfect information (EVII) can then be calculated using Eqs. (1) and (2). A similar method can be applied when the goal is to achieve the lowest cost or the maximum utility.

Fig. 1 Decision tree for VoI analysis

Despite the fact that almost all real-world information about the future is imperfect, the analysis of EVPI can set an upper limit on the value of imperfect information in the same decision context. When the EVPI is low, the decision-maker may consider not getting any additional information as it will not help much in improving the decision. Generally, the calculated VoI has to be compared with the cost of collecting the considered information [32], and the information is worth collecting only when the calculated VoI exceeds its cost.
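$$\mathrm{EVPI} = EV_{\mathrm{perfect}} - EV_{\mathrm{base}} \tag{1}$$
$$\mathrm{EVII} = EV_{\mathrm{imperfect}} - EV_{\mathrm{base}} \tag{2}$$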
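
As a small numerical illustration of the quantities above, the following sketch evaluates EVbase, EVperfect, and EVimperfect for a two-option decision and derives EVPI and EVII from them. The payoffs, the prior probabilities, and the signal accuracy are hypothetical values chosen only for illustration, and the imperfect information is assumed to be a binary signal that reports the true outcome of option B with a fixed probability.

```python
# Hypothetical two-option decision: A has a known payoff, B is uncertain.
# All numbers are illustrative assumptions, not values from the paper.

def expected_value(probs, payoffs):
    """Probability-weighted payoff of an uncertain option."""
    return sum(p * v for p, v in zip(probs, payoffs))

payoff_a = 50.0               # option A: certain payoff
payoffs_b = [100.0, 0.0]      # option B: payoff for the good / bad outcome
prior_b = [0.4, 0.6]          # prior P(good), P(bad)

# No information: pick the option with the higher expected value.
ev_base = max(payoff_a, expected_value(prior_b, payoffs_b))

# Perfect information: the outcome of B is known before choosing.
ev_perfect = sum(p * max(payoff_a, v) for p, v in zip(prior_b, payoffs_b))

# Imperfect information: a signal that is correct with probability `accuracy`.
accuracy = 0.8
ev_imperfect = 0.0
for signal in (0, 1):         # 0 = signal says "good", 1 = signal says "bad"
    likelihood = [accuracy if signal == k else 1.0 - accuracy for k in (0, 1)]
    p_signal = sum(lk * p for lk, p in zip(likelihood, prior_b))
    # Bayesian update of the outcome probabilities given the signal.
    posterior = [lk * p / p_signal for lk, p in zip(likelihood, prior_b)]
    best = max(payoff_a, expected_value(posterior, payoffs_b))
    ev_imperfect += p_signal * best

evpi = ev_perfect - ev_base       # value of perfect information
evii = ev_imperfect - ev_base     # value of imperfect information
print(f"EVPI = {evpi:.2f}, EVII = {evii:.2f}")
```

With these assumed numbers, EVII is positive but smaller than EVPI, which is the upper-bound relationship described above.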

3 Value of Information for Continuous Monitoring Systems

As introduced in Sec. 1, the value of the monitoring system and the information it offers depends on how the system is modeled and how the maintenance actions are guided by the information. The methods of system degradation modeling, maintenance decision-making, and value of information quantification can be applied systematically to help engineering system owners to maximize their benefits of adopting monitoring systems. A framework for evaluating the value of continuous monitoring systems and making optimal decisions on monitoring is shown in Fig. 2. Since the framework is designed to help system owners make optimal decisions on monitoring, it first requires a specific design with system structure (e.g., series, parallel, and mixed), reliability target (expected service life), and redundancy allocation as the input. Such a design is associated with certain development costs, an expected degradation performance (e.g., physics-based model and simulation-based model), and variation among individual assets or components. Then, a selection of monitoring systems and prognostic algorithms should be provided, and this selection can be one of the monitoring alternatives for the system owner to compare. The selection of the monitoring system will determine a set of monitoring sensors and a specific design of the sensor network while the selection of prognostic algorithm will specify how the monitoring data should be processed and modeled to predict the system's health state. Monitoring and prognosis systems with different quality and accuracy require different investments, and the trade-off between cost and quality can be analyzed following the proposed framework by considering a system design together with its operating process. 
The third block therefore requires information from the operating and maintenance process, including the maintenance policy (e.g., the period for time-based maintenance or the threshold for condition-based maintenance), the cost of the maintenance activities (e.g., the cost of preventive maintenance, the penalty for system failure, and the cost of recovery), and the effect of different maintenance activities (e.g., same as new, same as used, or extended usage length). With the system degradation, monitoring, prognosis, and maintenance specified, we can then evaluate the value of monitoring systems and the value of each piece of monitoring information. By analyzing the breakeven points under different scenarios, system owners can select the optimal monitoring system based on the system degradation pattern. The value assessment of each piece of monitoring information can also be used to improve the effect of condition-based maintenance, thus maximizing the benefit of a specific monitoring system.

Fig. 2 Decision-making framework

A decision tree that models the maintenance decision-making process is shown in Fig. 3. The final decision is whether to adopt a monitoring system, collect monitoring information, prognose the system state, and perform condition-based maintenance, or simply to use reliability models and perform time-based maintenance. The former case requires sequential actions based on the prognosis results (e.g., γl obtained at time l). The state prognosis can be made periodically, or continuously with the time interval δ → 0. Next, the expected cost following each decision path is explained. With the expected costs determined without (EC1) and with (EC2) monitoring systems and prognostic results, the value of monitoring information can be quantified using Eq. (3). Note that EC2 can be quantified using either perfect or imperfect information, leading to EVPI or EVII, respectively.
$$\mathrm{VoI} = EC_1 - EC_2 \tag{3}$$
Fig. 3 Decision tree for the maintenance process

3.1 Maintenance Without Monitoring Information.

Without monitoring systems, a time-based maintenance policy is often applied, under which the target system is preventively replaced at a cost cp when its usage length reaches a threshold tr. If the system fails before the planned preventive action, corrective maintenance has to be completed at a higher cost cc. The expected cost rate, which considers both the expected maintenance cost and the expected usage length, can be calculated using Eq. (4), where F1 is the cumulative distribution function of the system lifetime distribution. Similar quantifications of the long-term cost rate, considering the expected lifetime cost and length, are well accepted in the field of maintenance decision-making and can be found in Refs. [29,32,36]. A similar approach is applied in this paper when quantifying the expected cost rate of maintenance processes with monitoring information.
(4)
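
A cost rate of this kind can be evaluated numerically as sketched below. The sketch assumes the classical age-replacement form, i.e., the expected cost of one replacement cycle divided by the expected cycle length, which may differ in detail from Eq. (4). The Weibull lifetime and all parameter values in the usage lines are hypothetical.

```python
import math

def weibull_cdf(t, scale, shape):
    """Lifetime CDF F1(t) for a Weibull distribution (assumed lifetime model)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def cost_rate_time_based(t_r, c_p, c_c, cdf, steps=2000):
    """Expected cost per unit time when replacing at age t_r or at failure.

    Assumed form: [c_p * (1 - F1(t_r)) + c_c * F1(t_r)] / integral_0^t_r (1 - F1(t)) dt
    """
    p_fail = cdf(t_r)
    # Expected cycle cost: preventive if the unit survives to t_r, else corrective.
    expected_cost = c_p * (1.0 - p_fail) + c_c * p_fail
    # Expected cycle length: integral of the survival function (trapezoidal rule).
    h = t_r / steps
    surv = [1.0 - cdf(i * h) for i in range(steps + 1)]
    expected_length = h * (sum(surv) - 0.5 * (surv[0] + surv[-1]))
    return expected_cost / expected_length

# Hypothetical example: scale 200, shape 3, c_p = 1, c_c = 10.
lifetime_cdf = lambda t: weibull_cdf(t, 200.0, 3.0)
for t_r in (50.0, 100.0, 150.0):
    print(t_r, round(cost_rate_time_based(t_r, 1.0, 10.0, lifetime_cdf), 5))
```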

3.2 Maintenance With Monitoring Information.

The maintenance process guided by monitoring information and prognostic results is often referred to as condition-based maintenance. Using the monitoring data and health state prognosis algorithms, the system degradation condition can be assessed and the remaining useful life or failure time can be estimated. As explained in Sec. 2.1, there is a distinction between perfect information and imperfect information. In evaluating the VoI for continuous monitoring systems, perfect information can completely prevent failure by enabling preventive actions right before the failure occurs, leading to the expected cost rate in Eq. (5). Namely, in Fig. 3, if the perfect information γl is below the threshold θ, then the system will remain in normal operation with probability 1. In real applications, however, monitoring systems and prognostic results can only provide imperfect information. Depending on how the prognostic results are modeled and used, there are two possible methods to calculate the expected cost rate given imperfect information, which are explained in Secs. 3.2.1 and 3.2.2, respectively.
(5)

3.2.1 State Assessment With Fixed Accuracy.

When the prognostic model is triggered only by sufficient monitoring data, the state assessment accuracy can be regarded as fixed. Thus, the sequential prognostic results (e.g., γl obtained at time l) can be treated equally. Whenever a result reaches the pre-determined maintenance threshold θ or the operating length reaches the predicted failure time, preventive action is performed on the target system. Since the prognostic result cannot exactly reflect the truth, there will be cases in which the system fails before the predicted maintenance activity is conducted. In this scenario, both the preventive cost and the corrective cost are determined by the accuracy of the prognostic results. Considering an allocated monitoring cost, the expected cost rate for this scenario can be calculated using Eq. (6), where L is the time when a maintenance activity is conducted and T is the true life of the system.
(6)
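
The effect of a fixed prediction accuracy on the cost rate can be explored with a simple Monte Carlo simulation, sketched below. This is not the closed-form expression of Eq. (6). The sketch assumes a Weibull true life T, an unbiased normal prediction whose standard deviation is a fixed fraction of T, a planned maintenance time L set to a hypothetical safety factor times the predicted failure time, and a monitoring cost c_m charged once per replacement cycle.

```python
import random

def simulate_cbm_fixed_accuracy(n_cycles, sigma_frac, safety_factor,
                                c_p, c_c, c_m, scale, shape, seed=0):
    """Monte Carlo estimate of the cost rate under noisy failure-time predictions.

    Returns (cost per unit time, number of cycles that ended in failure).
    """
    rng = random.Random(seed)
    total_cost = 0.0
    total_time = 0.0
    failures = 0
    for _ in range(n_cycles):
        true_life = rng.weibullvariate(scale, shape)            # T
        # Unbiased prediction with a fixed relative error level (assumption).
        predicted = rng.gauss(true_life, sigma_frac * true_life)
        planned = max(0.0, safety_factor * predicted)           # L
        if planned < true_life:
            total_cost += c_p + c_m      # preventive action at time L
            total_time += planned
        else:
            total_cost += c_c + c_m      # the system fails first, at time T
            total_time += true_life
            failures += 1
    return total_cost / total_time, failures

# Hypothetical parameters: perfect versus noisy predictions.
rate_perfect, fail_perfect = simulate_cbm_fixed_accuracy(
    2000, 0.0, 0.95, 1.0, 10.0, 0.05, 200.0, 3.0)
rate_noisy, fail_noisy = simulate_cbm_fixed_accuracy(
    2000, 0.3, 0.95, 1.0, 10.0, 0.05, 200.0, 3.0)
print(rate_perfect, fail_perfect, rate_noisy, fail_noisy)
```

Under these assumptions, a zero prediction error never leads to failure, whereas a noisy prediction produces failures before the planned action and therefore a higher cost rate.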

3.2.2 State Assessment With Improved Accuracy.

When the system is continuously monitored, the initial prognostic results have low accuracy due to the lack of monitoring data. As the system operates and more monitoring data accumulate, the prognostic results become more accurate. This scenario accounts for the accuracy improvement across sequential prognostic results brought by the accumulated monitoring data. The prognostic output is often a normal distribution describing the predicted failure time, an assumption also used in other studies in the literature [37–39]. Considering the prediction error, the prediction result obtained using the monitoring data collected until time τ can be modeled as N(T + e(τ), σ(τ)), where T is the true lifetime, and e(τ) and σ(τ) are the bias term and the standard deviation, respectively. As τ approaches the true lifetime, the prediction accuracy gradually increases. Following the representation of improved prediction accuracy in Ref. [37], we use Eqs. (7) and (8) to model the evolution of the bias and the standard deviation, where Be and S are the bounds for the bias term and the standard deviation term, and the parameters µ and b characterize the speed of accuracy improvement. The parameter values in Eqs. (7) and (8) can be determined based on the historical prediction performance of similar monitoring systems and prognostic algorithms. If the exact values are hard to obtain, a sensitivity analysis can be conducted to explore the decision space and estimate the potential range of the value of monitoring information. When the predicted failure time is represented using other distributions (e.g., lognormal and Weibull), the bias term and standard deviation can still be applied to the mean and standard deviation of the corresponding distribution.
(7)
(8)
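
The sketch below illustrates how a prediction of the form N(T + e(τ), σ(τ)) tightens as τ grows. The exponential decay used for e(τ) and σ(τ), the rate parameters rate_e and rate_s, and all numeric values are assumptions chosen for illustration only; they are not the functional forms or parameter values of Eqs. (7) and (8).

```python
import math

def bias(tau, bound_e, rate_e):
    """Assumed bias term e(tau): starts at the bound and decays toward zero."""
    return bound_e * math.exp(-rate_e * tau)

def std_dev(tau, bound_s, rate_s):
    """Assumed standard deviation sigma(tau): starts at the bound and decays."""
    return bound_s * math.exp(-rate_s * tau)

def predicted_failure_distribution(tau, true_life, bound_e, bound_s,
                                   rate_e, rate_s):
    """Mean and standard deviation of the prediction N(T + e(tau), sigma(tau))."""
    mean = true_life + bias(tau, bound_e, rate_e)
    return mean, std_dev(tau, bound_s, rate_s)

# Hypothetical example with true lifetime T = 200.
for tau in (0, 50, 100, 150):
    mean, sd = predicted_failure_distribution(tau, 200.0, 30.0, 40.0, 0.02, 0.02)
    print(f"tau={tau:3d}  mean={mean:7.2f}  sd={sd:6.2f}")
```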
Since the accuracy of the prognostic results improves over time, there is a trade-off between a higher risk of potential failure and a more accurate prediction. Each time, the prognostic information with the highest value is used to determine whether more data should be collected and used to forecast the system's health state. The value of the prognostic information obtained at time τ (where τ is later than the time at which Fp is obtained) is quantified using Eq. (9), where Fp is the cumulative distribution function of the most recent predicted lifetime distribution. If no more data are needed, the most recent prognostic result with the highest value is used to optimize the maintenance threshold (e.g., usage length, degradation amount, or quantile of the predicted lifetime distribution). The expected cost for this scenario can be calculated using Eq. (10), where t_end is the best time to make the last health state prognosis, the time to make a replacement is t_r = F_{t_end}^{-1}(q_r), and ci is the cost of making a prediction and inspecting the system state. This cost is considered because data collection, feature extraction, and the construction of deterioration indicators are needed to predict the failure time [40]. Since the true lifetime of each piece of equipment may be different, after finding the best time τ* to perform the first prediction, it is more meaningful to decide the best quantile (qi*) of the first predicted distribution at which to perform the next prediction, meaning that the time to perform the second prediction is t_2 = F_{τ*}^{-1}(qi*). The best times for the following predictions can then be found in the same way using the most recent prognostic result.
(9)
(10)

4 Case Studies and Results

In this section, two case studies are conducted following the decision-making framework shown in Fig. 2. The first case study, on a battery energy storage system, considers state assessment with fixed accuracy, while the second, on general mechanical equipment, incorporates improved accuracy in the state assessment. The results have been obtained by following each step of the proposed decision-making framework: specifying the degradation characteristics of the system, selecting a sensor network design or monitoring scenario, applying a certain condition-based maintenance policy, evaluating the value of monitoring systems, and then providing feedback to system designers on the performance of the system design. Since the proposed framework is used to evaluate the benefit of adopting a certain monitoring system for a system design, the case study results mainly report the influence of the parameters of the designed system and the monitoring system on the operating cost. Finally, insights can be drawn from using the framework so that the decision-maker knows which monitoring system should be adopted for which system design to maximize the benefit.

4.1 Case Study I: A Battery Energy Storage System.

In the battery energy storage system, battery assets are charged when the electricity price is low and discharged when it is high, so that the periodical energy demand can be satisfied and utility companies can make profits. Unlike other equipment whose failure is self-announcing and explicit, the failure of a battery asset can be defined based on different criteria. One common practice is to replace battery assets whose capacity has dropped below 80%. Thus, the battery asset needs to be monitored continuously, and its state of health has to be predicted, to maintain a sufficient power supply. The use of monitoring systems in the battery energy storage system can help ensure the timely replacement of degraded assets, thus satisfying the energy demand. Since unfulfilled demand will lead to a potential reduction in market share and various damages from end-users, it is critical to select adequate monitoring systems and use them efficiently to reduce the total operating cost and provide an adequate power supply. The specifications of the decision-making framework in Fig. 2 for case study I are summarized in Table 1.

Table 1 Problem setting for case study I

Framework setup | Case study I specification
System structure | Energy storage system (parallel assets)
Performance expectation | Energy provided satisfies the periodical demand
Degradation model | Capacity loss based on the Arrhenius model as in Eq. (11)
Monitoring and prognosis | Monitored individually and capacity predicted with different accuracy
Maintenance | Replacement at 80% capacity

4.1.1 Problem Description for Case Study I.

Lithium-ion batteries are generally used in energy storage systems, and their degradation can be modeled based on the Arrhenius model. The method proposed in Ref. [41], which considers the influence of temperature, depth of discharge (DOD), and discharge rate, is applied to develop degradation models under different discharge rates. Since C/2 is an appropriate discharge rate in practice, this paper uses the model under the C/2 discharge rate to describe the degradation of battery assets in the system. The capacity loss model is expressed as Eq. (11), where R is the gas constant, Tk = 313.15 K is the absolute temperature, and Ah is the Ah-throughput, which is the product of the cycle number, the DOD, and the full capacity. In practice, the degradation process of each individual asset may deviate slightly from the nominal degradation path. Therefore, parameter randomization is applied to simulate individual realizations of the theoretical degradation process, in which each parameter of the model follows a normal distribution whose mean is the true parameter value and whose standard deviation is a percentage of the mean.
(11)
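
The parameter randomization described above can be sketched as follows. The sketch assumes a generic Arrhenius power-law form for the capacity loss (a pre-exponential factor times exp(-Ea/(R Tk)) times the Ah-throughput raised to a power); the nominal coefficients are placeholders and are not the coefficients of Eq. (11). The names capacity_loss and randomized_parameters are hypothetical.

```python
import math
import random

GAS_CONSTANT = 8.314   # J/(mol K)
TEMP_K = 313.15        # absolute temperature stated in the text

def capacity_loss(ah, pre_factor, activation_energy, exponent):
    """Generic Arrhenius power-law capacity loss (percent) versus Ah-throughput."""
    arrhenius = math.exp(-activation_energy / (GAS_CONSTANT * TEMP_K))
    return pre_factor * arrhenius * ah ** exponent

def randomized_parameters(nominal, rel_std, rng):
    """Draw each parameter from a normal distribution centered on its nominal
    value, with a standard deviation equal to rel_std times that value."""
    return {name: rng.gauss(value, rel_std * abs(value))
            for name, value in nominal.items()}

# Placeholder nominal values, NOT the coefficients of Eq. (11).
nominal = {"pre_factor": 3.0e4, "activation_energy": 3.15e4, "exponent": 0.55}

rng = random.Random(1)
assets = [randomized_parameters(nominal, 0.02, rng) for _ in range(5)]
losses = [capacity_loss(2000.0, **params) for params in assets]
print([round(x, 2) for x in losses])
```

Each dictionary in assets represents one individual realization of the degradation model; a larger rel_std corresponds to a larger deviation among the equipment group.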

Suppose sensors are installed in each battery asset so that the decision-maker can infer its health state. In this case study, we consider two types of state inference results. In the first scenario, later referred to as the unbiased scenario, the predicted state of health follows a normal distribution whose mean is the true remaining capacity and whose standard deviation is a certain percentage of the mean. When the predicted capacity exactly matches the true capacity, the information is perfect. In the second scenario, the predicted result is biased, meaning that the predicted state of health is always larger or always smaller than the true remaining capacity. Replacement actions are performed periodically for those assets whose predicted remaining capacity is below 80%. In addition, based on the periodical prediction results, the total capacity of the current assets is compared with the demand to initiate the purchase or usage of additional new assets.
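
The two inference scenarios and the 80% replacement rule can be sketched as follows. The additive form of the bias, the function names, and the fleet values are assumptions made for illustration; the noise is set to zero in the usage lines so that the effect of the bias alone is visible.

```python
import random

def predict_capacity(true_capacity, rel_std, bias, rng):
    """Noisy state-of-health estimate; bias = 0 gives the unbiased scenario."""
    return rng.gauss(true_capacity, rel_std * true_capacity) + bias

def assets_to_replace(true_capacities, rel_std, bias, rng, threshold=0.80):
    """Indices of assets whose predicted remaining capacity is below the threshold."""
    return [i for i, capacity in enumerate(true_capacities)
            if predict_capacity(capacity, rel_std, bias, rng) < threshold]

fleet = [0.95, 0.83, 0.79, 0.70]    # hypothetical true remaining capacities

perfect = assets_to_replace(fleet, 0.0, 0.0, random.Random(0))
positive_bias = assets_to_replace(fleet, 0.0, 0.05, random.Random(0))
negative_bias = assets_to_replace(fleet, 0.0, -0.05, random.Random(0))
print(perfect, positive_bias, negative_bias)
```

In this toy fleet, a positive bias delays the replacement of a degraded asset, while a negative bias triggers the early replacement of a healthy one, which mirrors the overestimation and underestimation effects examined in the case study.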

To evaluate the value of monitoring systems, one-year management processes under two operating scenarios, operation without monitoring information and operation with monitoring information, are simulated and compared. The operating cost includes the purchase cost, the monitoring investment, and the penalty for unsatisfied power demand. The breakeven point, at which the operating cost with a monitoring system equals that without it, is then explored under different scenarios. The purchase cost of a new battery asset with 100 kWh capacity is assumed to be $100k. To explain the influence of the relative values of the penalty and the monitoring cost, we examine the relationship between the P/B ratio (the penalty per unit of unsatisfied demand divided by the purchase cost of a battery asset) and the M/B ratio (the cost of a unit monitoring sensor divided by the purchase cost of a battery asset). The replacement decision is made monthly, and the periodical energy demand assumed in this study is shown in Fig. 4.
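
Given simulated outcomes for the two operating scenarios, the breakeven M/B ratio follows from equating the two operating costs, as sketched below. The cost structure (purchase plus penalty plus sensor cost), the function names, and the scenario numbers are illustrative assumptions rather than results from the case study.

```python
def operating_cost(n_purchased, unmet_demand, n_sensors,
                   battery_cost, pb_ratio, mb_ratio):
    """Purchase cost + penalty for unmet demand + monitoring investment."""
    penalty_per_unit = pb_ratio * battery_cost
    sensor_cost = mb_ratio * battery_cost
    return (n_purchased * battery_cost
            + unmet_demand * penalty_per_unit
            + n_sensors * sensor_cost)

def breakeven_mb_ratio(without_mon, with_mon, n_sensors, pb_ratio):
    """M/B ratio at which both scenarios have the same operating cost.

    The battery purchase cost cancels out, so only the ratios matter.
    """
    purchase_diff = without_mon["n_purchased"] - with_mon["n_purchased"]
    penalty_diff = pb_ratio * (without_mon["unmet_demand"] - with_mon["unmet_demand"])
    return (purchase_diff + penalty_diff) / n_sensors

# Hypothetical one-year outcomes for the two scenarios.
without_mon = {"n_purchased": 6, "unmet_demand": 40.0}
with_mon = {"n_purchased": 5, "unmet_demand": 5.0}

mb_star = breakeven_mb_ratio(without_mon, with_mon, n_sensors=20, pb_ratio=0.01)
print(f"breakeven M/B ratio: {mb_star:.4f}")
```

A sensor whose M/B ratio is below the breakeven value makes the monitored scenario cheaper under these assumed outcomes; above it, monitoring does not pay off.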

Fig. 4 Periodical energy demand

4.1.2 Results of Case Study I.

For the first scenario with unbiased state inference results, we first compare the values of perfect and imperfect information, as shown in Fig. 5. The comparison shows that the larger the deviation among the battery assets’ capacity degradation processes, the more value monitoring and prognosis information can bring to the decision-maker. Only one scenario with imperfect information is displayed in Fig. 5, but the difference between EVPI and EVII changes with the prediction accuracy level. Furthermore, when the monitoring cost (i.e., the cost to collect the information) is taken into consideration, breakeven points may occur. Following the proposed decision-making framework, we therefore explored the breakeven points for different combinations of the prediction error percentage and the standard deviation of battery capacity under different penalty, purchase, and monitoring costs. The results are shown in Fig. 6, where the result for each parameter setting is the average of 1000 simulation runs. Figure 6 indicates that, at the breakeven points, the prediction error percentage is always smaller than the battery standard deviation. When the monitoring cost is high and the battery variance is small, monitoring systems will not bring benefits.

Fig. 5 Comparison between EVPI and EVII

Fig. 6 Breakeven points for scenarios with unbiased prediction

For the second scenario with biased state inference results, both positive and negative biases are considered. The result for predictions with a positive bias is shown in Fig. 7. With positively biased information, the battery capacity is overestimated, which makes the decision-maker purchase fewer battery assets, so the monitoring information always brings a high penalty. In this scenario, a breakeven point is a balance between the purchase cost savings and the penalties within the case with monitoring systems. When the bias is small, the monitoring cost cannot be high; otherwise, the purchase savings cannot compensate for the penalty, and introducing a monitoring system will not bring any benefit. The result for predictions with a negative bias is shown in Fig. 8. With negatively biased information, the battery capacity is underestimated, which makes the decision-maker purchase more battery assets, so the monitoring information leads to almost no penalty. In this scenario, a breakeven point is a balance between the monitoring cost and the penalty of the case without monitoring systems. When the bias is small and the monitoring cost is low, monitoring systems can always bring benefits.

Fig. 7 Breakeven points for scenarios with positively biased predictions

Fig. 8 Breakeven points for scenarios with negatively biased predictions

Thus, with the results in Figs. 6–8 as references, decision-makers can determine the minimum quality of the monitoring system required for managing a group of battery assets under a specific energy demand profile.

4.2 Case Study II: A General Mechanical Equipment.

General mechanical equipment like motors, fans, and turbines is frequently used in industry. Instead of the complete degradation process, the interest in analyzing the mechanical equipment lies in estimating its failure time. Weibull distributions are often applied to model the lifetime of a type of mechanical equipment, and sensor data (e.g., vibration) can be used to train prognostic models on the failure time. As the monitoring process continues, a more accurate prediction can be achieved from the accumulated data. In this case study, we consider the improvement of the prediction accuracy and the maintenance policy driven by the value of information.

4.2.1 Problem Description for Case Study II.

The specifications of the decision-making framework in Fig. 2 for case study II are summarized in Table 2. The lifetime distribution of a specific type of mechanical equipment is represented using the Weibull distribution, whose probability density function is given in Eq. (12) with scale parameter α and shape parameter β. As described in Sec. 3.2.2, the prediction result is a normal distribution whose bias and standard deviation decrease over time. In this case study, the constant terms in Eq. (8) are a = 0.125 and b = 0.1, and an example of how the prediction result evolves as time approaches the true end-of-life is shown in Fig. 9.
$$f(t)=\frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right],\quad t\ge 0 \tag{12}$$
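
As a quick numerical check of the lifetime model, the following minimal sketch assumes that Eq. (12) is the standard two-parameter Weibull density, f(t) = (β/α)(t/α)^(β−1) exp[−(t/α)^β]. The function names are illustrative and are not taken from the paper; the parameter values are the defaults listed in Table 3.

```python
import math


def weibull_pdf(t, alpha, beta):
    """Weibull density with scale alpha and shape beta (assumed form of Eq. (12))."""
    if t <= 0.0:
        return 0.0  # the single point t = 0 does not affect the integral below
    x = t / alpha
    return (beta / alpha) * x ** (beta - 1.0) * math.exp(-(x ** beta))


def weibull_cdf(t, alpha, beta):
    """Probability that the equipment has failed by time t."""
    if t <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_mean(alpha, beta):
    """Mean lifetime, alpha * Gamma(1 + 1/beta)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


alpha, beta = 120.0, 2.0  # default values from Table 3
step = 0.05
n_steps = round(1000.0 / step)
values = [weibull_pdf(i * step, alpha, beta) for i in range(n_steps + 1)]
# Trapezoid rule on [0, 1000]: the density should integrate to (almost) one.
area = step * (sum(values) - 0.5 * (values[0] + values[-1]))
mean_life = weibull_mean(alpha, beta)

print(f"area under pdf = {area:.4f}")
print(f"mean lifetime  = {mean_life:.1f}")
print(f"P(T <= 100)    = {weibull_cdf(100.0, alpha, beta):.3f}")
```

The mean lifetime computed here is only a property of the assumed lifetime distribution; it does not involve the prediction model of Eq. (8).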
Fig. 9
An example of the evolvement of the predicted failure time distribution (T = 200)
Table 2

Problem Setting for Case Study II

| Framework setup | Case study II specification |
| --- | --- |
| System type | General mechanical equipment |
| Performance expectation | Avoid failure and provide a cost-effective functional service duration |
| Degradation model | Weibull distribution |
| Monitoring and prognosis | Monitored with embedded sensors and failure time predicted with improved accuracy |
| Maintenance | Condition-based maintenance using the state assessment process as described in Sec. 3.2.2 |

The maintenance processes with and without monitoring systems are compared to calculate the value of information and thereby improve decision-making. Without monitoring systems, the replacement time is optimized to minimize the expected cost rate EC1, which is calculated using Eq. (4); the best replacement time (tr*) based on the lifetime distribution is the one that minimizes this cost rate. With monitoring systems, state assessment can be performed multiple times as long as the predicted result provides positive value. The default parameters for this case study are summarized in Table 3. These parameters characterize the operating scenario to be evaluated. If such information is not available, a sensitivity analysis can be conducted to see how the maintenance decision and the value of information change with these parameters. In this study, when exploring the effect of one type of parameter, the other parameter values are kept as listed in Table 3.

Table 3

Parameter values in general mechanical equipment application

| Parameter | Value |
| --- | --- |
| α | 120 |
| β | 2 |
| Be | 30 |
| S | 300 |
| μ | 1 |
| cc | 200 |
| cp | 50 |
| ci | 1 |
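
To make the no-monitoring baseline concrete, the sketch below searches for the replacement time that minimizes the expected cost rate under the Table 3 defaults. It assumes that Eq. (4) has the classical age-replacement form, EC1(tr) = [cc F(tr) + cp (1 − F(tr))] / ∫ (1 − F(t)) dt with the integral taken from 0 to tr; the exact expression used in this paper may differ. The function and variable names are illustrative.

```python
import math

ALPHA, BETA = 120.0, 2.0  # Weibull scale and shape (Table 3)
CC, CP = 200.0, 50.0      # corrective and preventive maintenance costs (Table 3)


def reliability(t, alpha=ALPHA, beta=BETA):
    """Survival probability 1 - F(t) of the Weibull lifetime."""
    return math.exp(-((t / alpha) ** beta))


def cost_rate_curve(t_max, step, cc=CC, cp=CP, alpha=ALPHA, beta=BETA):
    """Return [(tr, cost rate)] for tr = step, 2*step, ..., t_max.

    Assumed cost rate: expected cost per cycle / expected cycle length.
    """
    curve = []
    cycle_length = 0.0  # running integral of the reliability from 0 to tr
    prev_r = 1.0
    for i in range(1, round(t_max / step) + 1):
        tr = i * step
        r = reliability(tr, alpha, beta)
        cycle_length += 0.5 * (prev_r + r) * step  # trapezoid rule
        expected_cost = cc * (1.0 - r) + cp * r
        curve.append((tr, expected_cost / cycle_length))
        prev_r = r
    return curve


curve = cost_rate_curve(t_max=400.0, step=0.1)
tr_star, ec1_star = min(curve, key=lambda point: point[1])

# Reference policy: never replace preventively, pay cc at every failure.
run_to_failure_rate = CC / (ALPHA * math.gamma(1.0 + 1.0 / BETA))

print(f"best replacement time tr* = {tr_star:.1f}")
print(f"minimum cost rate         = {ec1_star:.3f}")
print(f"run-to-failure cost rate  = {run_to_failure_rate:.3f}")
```

A grid search is used instead of a derivative-based optimizer so that the sketch stays dependency-free and works for any shape parameter.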

4.2.2 Results for Case Study II.

In this case study, three types of parameters influence the number of predictions and the minimum maintenance cost rate. The first type covers the costs in the maintenance process: the corrective maintenance cost (cc), the preventive maintenance cost (cp), and the cost of making a prediction (ci). The second type covers the scale and shape parameters of the equipment's lifetime distribution. The third type covers the model that describes the improvement of the prediction accuracy. The following results show how the value of information and the maintenance decisions change as these three types of parameters vary.

For the first type, the cost parameters, what matters is the ratio between the cost of the failure consequence and the cost of taking preventive actions. We therefore vary the two cost ratios cc/cp and cc/ci to explore the change in maintenance decisions and the value of information. The optimal number of predictions to be made before replacement is shown in Fig. 10, and the cumulative value of information from the corresponding number of predictions is shown in Fig. 11. The results show that both cost ratios influence the optimal number of predictions and the cumulative value of information. When the corrective maintenance cost is only twice the preventive maintenance cost, the replacement time calculated from the original lifetime distribution is good enough, so an additional prediction (which costs money and involves uncertainty) does not help reduce the cost rate. That is why the numbers in the first row of Fig. 10 are all zero. When a preventive action saves more cost relative to corrective maintenance, one or two predictions are made so that maintenance can be performed before the failure happens. The relationship between the preventive cost and the prediction cost then determines the number of predictions. When the cost of preventive maintenance is extremely low, the information from a single prediction is useful enough to save operating cost. Fig. 11 shows a consistent change in the cumulative value of information: the lower the costs of the two preventive actions (preventive maintenance and prediction), the more operating cost can be saved.
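
The dependence on cc/cp can be illustrated with a simple upper bound. The sketch below is not the paper's procedure: it assumes the classical age-replacement cost rate for the no-monitoring case, assumes that the value of information is measured as a reduction in long-run expected cost rate, and compares the baseline with a hypothetical perfect and free prediction, under which every unit is replaced just before failure at cost cp. Varying cc with cp fixed at its Table 3 value is also an assumption.

```python
import math

ALPHA, BETA, CP = 120.0, 2.0, 50.0  # Table 3 defaults


def best_age_replacement_rate(cc, cp, alpha, beta, t_max=600.0, step=0.1):
    """Minimum of the assumed age-replacement cost rate over a time grid."""
    best = float("inf")
    cycle_length = 0.0  # running integral of the reliability function
    prev_r = 1.0
    for i in range(1, round(t_max / step) + 1):
        r = math.exp(-((i * step / alpha) ** beta))
        cycle_length += 0.5 * (prev_r + r) * step  # trapezoid rule
        best = min(best, (cc * (1.0 - r) + cp * r) / cycle_length)
        prev_r = r
    return best


def perfect_information_bound(cc, cp, alpha, beta):
    """Baseline cost rate minus the cost rate with perfect, free predictions.

    With perfect knowledge of the failure time, each cycle costs cp and lasts
    the full lifetime, so the long-run cost rate is cp / E[T].
    """
    mean_life = alpha * math.gamma(1.0 + 1.0 / beta)
    baseline = best_age_replacement_rate(cc, cp, alpha, beta)
    return baseline - cp / mean_life


bounds = {
    ratio: perfect_information_bound(ratio * CP, CP, ALPHA, BETA)
    for ratio in (2, 4, 8)
}
for ratio, bound in bounds.items():
    print(f"cc/cp = {ratio}: cost-rate reduction is at most {bound:.3f}")
```

Under these assumptions the room for improvement grows with cc/cp, which is in line with the observation that predictions are not worthwhile when corrective maintenance is only twice as expensive as preventive maintenance. Any real prediction has a cost and an error, so its value stays below this bound.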

Fig. 10
Number of predictions before replacement under different cost settings
Fig. 11
Cumulative value of information under different cost settings

For the second type, the distribution parameters, we explore how changes in variance and skewness influence the decision and the value of information (VoI). The results are shown in Figs. 12 and 13. As the shape parameter β increases, the distribution changes from right-skewed to left-skewed. As the scale parameter α decreases, the distribution becomes more concentrated. As shown in Fig. 12, for the two right-skewed cases with α = 80 and α = 100, making only one prediction is the best choice, but the resulting cumulative value is lower than in the other two right-skewed cases with α = 120 and α = 140. Hence, there is a boundary at which the optimal decision changes from making one prediction to making two predictions.
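
The effect of α and β on spread and skewness follows from the closed-form Weibull moments. The short sketch below tabulates them; the α values are those mentioned above, while the β values are illustrative choices and are not taken from Figs. 12 and 13.

```python
import math


def weibull_moments(alpha, beta):
    """Return (mean, standard deviation, skewness) of a Weibull lifetime."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    g3 = math.gamma(1.0 + 3.0 / beta)
    variance_factor = g2 - g1 ** 2
    mean = alpha * g1
    std = alpha * math.sqrt(variance_factor)
    # Skewness depends only on the shape parameter beta.
    skewness = (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / variance_factor ** 1.5
    return mean, std, skewness


print("beta  skewness (alpha = 120)")
for beta in (1.5, 2.0, 3.6, 5.0):  # illustrative shape values
    _, _, skew = weibull_moments(120.0, beta)
    print(f"{beta:4.1f}  {skew:+.3f}")

print("alpha  std (beta = 2)")
for alpha in (80.0, 100.0, 120.0, 140.0):
    _, std, _ = weibull_moments(alpha, 2.0)
    print(f"{alpha:5.0f}  {std:.1f}")
```

The standard deviation scales linearly with α, so a smaller scale parameter gives a more concentrated lifetime distribution, while the sign of the skewness is governed by β alone.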

Fig. 12
Number of predictions before replacement under different lifetime distributions
Fig. 13
Cumulative value of information under different lifetime distributions

The third type, the prediction-related parameters, determines how the improvement of the prediction results obtained from the monitoring data is modeled. First, the bounds for the bias term Be and the standard deviation term S are considered; their influence on the number of failure time predictions and on the cumulative value of information is shown in Figs. 14 and 15, respectively. The figures show that a replacement decision based on the prediction result shortens the usage length when the prediction result has a negative bias. Moreover, when the saving from preventive maintenance is offset by the shortened usage length, it is not beneficial to determine the replacement time from the prediction result. On the other hand, when the prediction result has a positive bias, the replacement time obtained from the prediction result is later than the original plan. The optimal decision is then to use one predicted result to distinguish the individual from the population, but not to trust additional predictions, since the predicted information may lead to an actual failure of the equipment. Furthermore, the quality of the prediction result influences the time and the quantile at which predictions are made, as shown in Figs. 16 and 17, respectively. When the predicted distribution has a negative bias, there is a longer waiting time before the second prediction. When the predicted failure time distribution has a positive bias, the second prediction follows closely to best capture the actual failure time.
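
The opposite effects of negative and positive bias can be reproduced with a small Monte Carlo experiment. The sketch below is a deliberately simplified stand-in for the prediction model of Sec. 3.2.2: it assumes a single prediction per unit, a constant bias and standard deviation (the values −20, 0, 20, and 10 are arbitrary), and replacement at a fixed quantile of the predicted normal distribution. None of these choices are taken from the paper.

```python
import random
from statistics import NormalDist

ALPHA, BETA = 120.0, 2.0  # Weibull scale and shape (Table 3)
CC, CP = 200.0, 50.0      # corrective and preventive maintenance costs (Table 3)


def simulate(bias, sd=10.0, quantile=0.1, n=20000, seed=0):
    """Simulate n replacement cycles driven by one biased prediction each."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(quantile)  # standard normal quantile (negative for 0.1)
    total_cost = 0.0
    total_time = 0.0
    failures = 0
    for _ in range(n):
        true_life = rng.weibullvariate(ALPHA, BETA)
        # Predicted failure time distribution: Normal(predicted_mean, sd).
        predicted_mean = true_life + bias + rng.gauss(0.0, sd)
        replace_at = max(predicted_mean + z * sd, 0.0)
        if replace_at < true_life:
            total_cost += CP          # preventive replacement before failure
            total_time += replace_at
        else:
            total_cost += CC          # the unit fails before the planned time
            total_time += true_life
            failures += 1
    return {
        "failure_fraction": failures / n,
        "mean_usage": total_time / n,
        "cost_rate": total_cost / total_time,
    }


results = {bias: simulate(bias) for bias in (-20.0, 0.0, 20.0)}
for bias, res in results.items():
    print(
        f"bias {bias:+5.1f}: failures {res['failure_fraction']:.3f}, "
        f"mean usage {res['mean_usage']:.1f}, cost rate {res['cost_rate']:.3f}"
    )
```

In this toy setting a negative bias trades usage length for safety, whereas a positive bias pushes the planned replacement past the true failure time for most units.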

Fig. 14
Number of predictions before replacement under different prediction quality
Fig. 15
Cumulative value of information under different prediction quality
Fig. 16
Time to make the first prediction under different prediction quality
Fig. 17
Quantile to make the second prediction under different prediction quality

Another prediction-related parameter is µ, which represents the speed of accuracy improvement. The maintenance actions under different µ values are summarized in Table 4. When µ is large, which indicates fast improvement in the predictions based on the monitoring information, only one prediction is made in this case study. In addition, as µ increases, the time and the quantile of the failure time distribution at which predictions are made change accordingly.

Table 4

Influence of improvement speed

| μ | 0.6 | 0.7 | 0.8 | b | 1.2 |
| --- | --- | --- | --- | --- | --- |
| Number of predictions | 2 | 2 | 2 | 2 | 1 |
| First prediction time | 14 | 13 | 12 | 11 | 10 |
| Second prediction quantile | 0.09 | 0.15 | 0.21 | 0.21 | |
| Cumulative VoI | 0.581 | 0.626 | 0.737 | 0.648 | 0.636 |

Thus, with the cumulative value of information calculated under different cost, equipment reliability, monitoring, and prognosis scenarios, decision-makers can select an operations management setting that can maximize the value of monitoring. Also, by following the optimal number of predictions and the best time to make a state assessment, the system can be maintained appropriately to achieve the largest monitoring benefit.

5 Conclusion

This paper establishes a decision-making framework, consisting of degradation modeling, monitoring system design, and maintenance planning, to evaluate the value of continuous monitoring systems in recurrent decision scenarios. Two case studies have been conducted to illustrate the efficacy of the framework when applied to engineering systems with different operating scenarios, system models, and uses of monitoring data. The results show that the value of monitoring systems is influenced by the deviation among the equipment group, the accuracy of system-state prediction, and the different types of costs involved in the operating process. Adjusting maintenance actions based on monitoring and prognosis information helps improve the value of monitoring systems. With the proposed framework, decision-makers can evaluate the costs and benefits of adopting a given monitoring system for a given system design. In this way, a cost-effective monitoring system can be selected to guarantee sufficient time-dependent reliability for the system, and in turn, the more accurate estimate of operating performance can help improve the system design.

The current case studies focus on only one failure mode and use replacement with a new asset or component as the only maintenance action. In real applications, more failure modes and other types of maintenance activities can be incorporated to estimate the operating cost with monitoring systems more accurately. In addition, the framework simplifies data processing and state prognosis by considering one unified monitoring system, whereas in practice one engineering system may be monitored by multiple types of sensors, and different monitoring systems may interact. Another direction for future work is to integrate the maintenance cost evaluation, which accounts for the use of monitoring systems, into the design decisions. By adopting the proposed framework, the operating and maintenance cost can be estimated more accurately and used as one of the evaluation criteria for system or product designs.

Acknowledgment

This research is partially supported by the National Science Foundation (NSF) through the Faculty Early Career Development (CAREER) award CMMI-1813111, and by the NSF Engineering Research Center for Power Optimization of Electro-Thermal Systems (POETS) under cooperative agreement EEC-1449548.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The authors attest that all data for this study are included in the paper.

Nomenclature

B = purchase cost of a battery asset
L = time to perform a maintenance action
M = cost of a unit monitoring sensor
P = penalty from unit unsatisfied demand
R = gas constant
S = deviation bound of the predicted lifetime
T = actual lifetime
cc = corrective maintenance cost
ci = state inference cost
qr = quantile of a distribution to perform a maintenance action
tr = time to make a replacement
Ah = Ah throughput
Be = bias bound of the predicted lifetime
F1 = cumulative distribution function of the true lifetime distribution
Fp = cumulative distribution function of the predicted lifetime distribution
Qloss = battery capacity loss
Tk = Kelvin temperature
c̄m = allocated monitoring cost
tr* = best time to make a replacement
cp = preventive maintenance cost
CM = total monitoring cost
α = scale parameter of Weibull distribution
β = shape parameter of Weibull distribution
τ = time to perform health state prognosis
γl = health state prognosis result at time l
θl = a pre-determined threshold to initiate maintenance actions
µl = improvement of the prediction accuracy
