Energy efficiency improvement and timely preventive maintenance (PM) are critical in the manufacturing industry due to rising energy costs, environmental concerns, and increasing requirements on system reliability. By strategically switching appropriate machines to the down state, the corresponding energy consumption can be reduced and, in the meantime, the necessary PM work can be carried out to increase the PM completion rate and reduce the potential extra expense of PM during nonproduction shifts. However, there is usually a tradeoff between time dedicated to production and time available for energy saving and PM. In this paper, a systematic method is developed to identify opportunity windows (OWs) during which certain machines can be shut down to save energy and PM tasks can be performed while maintaining a desired production throughput. The method is based on stochastic serial production lines and real-time production data. A profit function is formulated to illustrate the tradeoff between energy cost savings and potential throughput loss. The profit function is used to justify the cost savings achieved by utilizing the proposed OWs during production operation.

## Introduction

Energy consumption, which used to receive relatively little attention, is now an important factor in evaluating a production system. With rising energy costs, increasing global competitiveness, environmental concerns, and more government regulations, energy consumption plays a more critical role in modern manufacturing operation. However, most of the studies in manufacturing system modeling and analysis have mainly focused on improving the production efficiency, flexibility, and responsiveness [1–3]. In most current manufacturing execution systems, there is no module or function to deal with energy management during production operation. Researchers have discovered that 85% of energy consumption in the production plant is used on functions not related to the production of parts [4], and a huge amount of energy is wasted while machines are blocked or starved and not producing [5]. This emphasizes the importance of identifying energy saving opportunities through the analysis of stochastic production systems.

Another significant factor to be considered in manufacturing systems is PM. For manufacturing systems with unreliable machines and finite internal buffers, PM is critical to reduce machine failures and increase system reliability and efficiency [6]. Usually, PM tasks require the equipment to be stopped in order for work to be safely performed. Consequently, there is a tradeoff between time dedicated to production and time available for PM. This is a particularly important issue in three-shift operations, where any downtime, planned or unplanned, directly impacts the amount of time available for production. In such a situation, planned downtime for maintenance, i.e., PM, must be balanced against the avoidance of unplanned downtime resulting from equipment failures. Some PM tasks are often performed late or not at all, resulting in a long-term decline in equipment reliability and system capability [6]. Even in plants operating on a one-shift or two-shift schedule, where planned downtime is readily available, the cost of staffing nonproduction shifts and/or scheduled overtime to perform PM may be prohibitive [6]. Therefore, it is desirable to find opportunities during the production shifts to do more PM works.

It is noted that to acquire opportunities for energy saving and maintenance, real-time dynamic analysis of manufacturing systems is needed. However, traditional methods in modeling manufacturing systems have focused on steady-state performance analysis. The studies can be grouped into analytical methods and simulation methods. While simulation models are widely adopted to evaluate performance of complex manufacturing systems [7–10], simulation modeling and analysis can be time-consuming and expensive, and simulation results can be difficult to interpret [11]. For analytical methods, decomposition and aggregation methods are utilized with Markov chain models to estimate the important performance metrics [12,13]. For example, Xia et al. [14] developed a decomposition method for the performance evaluation of serial production lines with unreliable machines and finite transfer-delay buffers. These methods are all based on long-term steady-state assumptions and are no longer applicable to the real-time dynamic analysis.

Over the past decade, there has been increasing interest in the study of the transient behavior of production systems [15–18]. Chen et al. derived a mathematical model for transient performance evaluation of synchronized serial production lines with geometric machines [16]. Transient system performance, system-theoretic properties, workforce allocation, and bottlenecks of serial production lines with Bernoulli machines and finite buffers are studied in Ref. [18]. However, the method only provides some basic properties such as settling time monotonicity, while the transient throughput and work-in-process are derived by approximation.

The OW concepts and evaluation methods have been developed based on deterministic analysis [5,6,19–21]. The methods bring explicit and systematic solutions to estimate the energy saving potential and maintenance opportunity. However, the deterministic-based analysis only provides a loose bound solution of OW. A real production system is stochastic. Therefore, it is necessary to develop a methodology to evaluate energy saving and maintenance opportunities in stochastic production systems. This paper is dedicated to this end.

The contribution of this research is twofold. First, an innovative approach is developed to combine stochastic modeling with a data-driven method. The Markov property is adopted and real-time data (e.g., machine up–down status and buffer levels) are used to evaluate system status and OW. This is fundamentally different from traditional stochastic modeling techniques. The advantage of this modeling approach is that it enables a real-time prediction capability to more accurately estimate system performance by utilizing real-time information. Such a method has not been explored before. Second, the OW concepts are extended from deterministic analysis to stochastic analysis.

The remainder of this paper is organized as follows. In Sec. 2, we provide manufacturing system assumptions and notations. Section 3 estimates the status of a discrete-time stochastic production system and evaluates the corresponding OW and recovery time. Case studies are given in Sec. 4 to validate the performance of the estimation and evaluation results. Conclusions and future work are summarized in Sec. 5.

## Assumptions and Notations

We consider the discrete-time (or discrete event system) continuous flow models to analyze the impact of OW on the production process. The continuous flow model is used in this paper because the production dynamics can be conveniently described by integral or differential equations [13,19]. The continuous flow model assumes that the quantity of jobs in the buffer varies continuously from zero to its capacity. For a serial production line with $M$ machines (represented by the rectangles) and $M-1$ buffers (represented by the circles) as shown in Fig. 1, the following notations are used in this paper:

$M_i$ denotes the *i*th machine, where $1\le i\le M$.

$B_i$ denotes the *i*th buffer; each buffer $B_2,B_3,\ldots,B_M$ has a finite capacity. With abuse of notation, $B_2,B_3,\ldots,B_M$ are also used to denote the capacity of the buffer.

$b_i(n)$ denotes the buffer level of $B_i$, $i=2,\ldots,M$, at the *n*th unit cycle.

$s_i(n)$ denotes the actual processing speed measured in unit cycles for machine $M_i$ at the *n*th unit cycle.

The production process is described in the discrete-time model with unit cycle duration $\Delta t$. Each machine $M_i$ has a constant rated speed $1/(T_i\Delta t)$, $i=1,2,\ldots,M$, where $T_i\Delta t$ is the cycle time for machine $M_i$, and $T_i$ denotes the number of unit cycles in one work cycle of machine $M_i$. Therefore, in the discrete-time model, the rated speed measured in unit cycles for machine $M_i$ is $1/T_i$. An operational machine runs at its rated speed when it is neither starved nor blocked.

$M_{M_k^*}$ denotes the slowest machines, i.e., $M_k^*=\arg\max_{i=1,\ldots,M}(T_i)$, $1\le k\le M$.

$M_{M^*}$ denotes the slowest machine that is closest to the end-of-line machine $M_M$, since there might be one or multiple slowest machines in a system as described in Eq. (3). It is also denoted as the last slowest machine.

$\vec{e}_i=(j,n_i,d_i)$, $i=1,2,\ldots$, $j=1,2,\ldots,M$, denotes a downtime event in which machine $M_j$ is down at the $n_i$th unit cycle for $d_i$ unit cycles.

$E=\{\vec{e}_1,\vec{e}_2,\ldots,\vec{e}_k\}$ denotes a sequence of downtime events of the line.

$PL$ denotes the permanent production loss of the whole line.

$p_i^{10}$, $i=1,2,\ldots,M$, denotes the transition probability that machine $M_i$ transits from the up state to the down state at the *n*th unit cycle.

$p_i^{01}$, $i=1,2,\ldots,M$, denotes the transition probability that machine $M_i$ transits from the down state to the up state at the *n*th unit cycle.

$P_i^1(n)$, $i=1,2,\ldots,M$, $n=1,2,3,\ldots$, denotes the probability that machine $M_i$ is up at the *n*th unit cycle.

$P_i^0(n)$, $i=1,2,\ldots,M$, $n=1,2,3,\ldots$, denotes the probability that machine $M_i$ is down at the *n*th unit cycle.

$\theta_i(n)$, $i=1,2,\ldots,M$, $n=1,2,3,\ldots$, denotes the processing probability of machine $M_i$ at the *n*th unit cycle.

We make the following assumptions:

- (1)
A machine is blocked if it is up and its downstream buffer is full. The last machine is never blocked.

- (2)
A machine is starved if it is up and its upstream buffer is empty. The first machine is never starved.

- (3)
Machine failures are time-dependent. This means that the machine breakdowns may occur even while it is blocked or starved.

- (4)
$MCBF_i$ and $MCTR_i$ denote the mean-cycle-between-failure and the mean-cycle-to-repair of machine $M_i$, respectively. They are assumed to be geometric random variables [22]. Let $\lambda_i=1/MCBF_i$ and $\mu_i=1/MCTR_i$ [12].

- (5)
The power consumption of machine $M_i$ is reduced to a certain level if it is turned off.

- (6)
Each machine will run at its power rating $P_m$ when up and will consume no power when turned off.

- (7)
The machine warm up time is not considered for the ease of mathematical expression.
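As an illustration of assumptions (3) and (4), the following sketch samples the up/down trajectory of a single machine under the geometric reliability model. It assumes that $\lambda_i=1/MCBF_i$ acts as the per-cycle failure probability and $\mu_i=1/MCTR_i$ as the per-cycle repair probability, applied in every unit cycle regardless of blockage or starvation. The function names, the seed, and any numeric values passed in are hypothetical and serve only as an example.

```python
import random


def simulate_machine_states(mcbf, mctr, n_cycles, seed=0, start_up=True):
    """Sample an up (1) / down (0) state for each unit cycle of one machine.

    Assumption: failures are time-dependent, so in every unit cycle an up
    machine fails with probability 1/mcbf and a down machine is repaired
    with probability 1/mctr. Up and down durations are then geometric.
    """
    rng = random.Random(seed)
    lam = 1.0 / mcbf  # failure probability per unit cycle
    mu = 1.0 / mctr   # repair probability per unit cycle
    state = 1 if start_up else 0
    states = []
    for _ in range(n_cycles):
        states.append(state)
        if state == 1:
            if rng.random() < lam:
                state = 0
        elif rng.random() < mu:
            state = 1
    return states


def stationary_availability(mcbf, mctr):
    """Long-run fraction of unit cycles spent in the up state."""
    lam, mu = 1.0 / mcbf, 1.0 / mctr
    return mu / (lam + mu)
```

Over a long horizon the sampled fraction of up cycles should approach `stationary_availability(mcbf, mctr)`, which equals $MCBF_i/(MCBF_i+MCTR_i)$.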

## Discrete-Time Markov Chain Stochastic Serial Production Line Model

We will first consider a stochastic two-machine-one-buffer system. Then, we extend the analysis to the general stochastic serial production systems.

### Stochastic Two-Machine-One-Buffer System.

*n*th unit cycle, respectively. The system possesses the Markov property due to the geometric reliability assumption [13,23–27]. Therefore, the relationship between $P_i(n)$ and $P_i(n+1)$ for $M_1$ and $M_2$ is

$$P_i(n+1)=A_iP_i(n),\quad i=1,2,$$

where $A_1=\begin{bmatrix}1-p_1^{10} & p_1^{01}\\ p_1^{10} & 1-p_1^{01}\end{bmatrix}$ and $A_2=\begin{bmatrix}1-p_2^{10} & p_2^{01}\\ p_2^{10} & 1-p_2^{01}\end{bmatrix}$ are the probability transition matrices for $M_1$ and $M_2$.
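A minimal sketch of this recursion is given below. It assumes the state vector is ordered as up probability first and down probability second, so that the next-cycle vector is the transition matrix applied to the current one, and it assumes the vector is initialized from the machine state observed in real time. The function names and the numeric values in the usage note are illustrative and not taken from the paper.

```python
def transition_matrix(p10, p01):
    """Column-stochastic matrix acting on the vector [P_up, P_down]."""
    return [[1.0 - p10, p01],
            [p10, 1.0 - p01]]


def step(A, P):
    """One unit cycle of the Markov recursion P(n+1) = A P(n)."""
    return [A[0][0] * P[0] + A[0][1] * P[1],
            A[1][0] * P[0] + A[1][1] * P[1]]


def propagate(p10, p01, observed_up, n_steps):
    """Predict [P_up, P_down] for n_steps cycles after an observation.

    observed_up is the machine status read from the shop floor at the
    current unit cycle; it fixes the initial vector to [1, 0] or [0, 1].
    """
    A = transition_matrix(p10, p01)
    P = [1.0, 0.0] if observed_up else [0.0, 1.0]
    history = [P]
    for _ in range(n_steps):
        P = step(A, P)
        history.append(P)
    return history
```

For example, `propagate(0.1, 0.5, True, 50)` starts at `[1.0, 0.0]` and settles near an up probability of `0.5 / (0.1 + 0.5)`, the steady-state value, while the early entries of the history carry the effect of the real-time observation.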

When $M_i$ is not being blocked or starved at the *n*th unit cycle and $u_i(n)=1$, its processing probability equals the probability that it is up. Otherwise, the processing probability is zero.

### Stochastic OW and Recovery Time.

The OW of machine $M_i$ is defined as the longest possible downtime of $M_i$ that does not result in permanent production loss at the end-of-line machine [6,12,13].

*where* $\sum_{n=1}^{N}s_M(n)$ *and* $\sum_{n=1}^{N}\tilde{s}_M(n;\vec{e})$ *are the production counts of the end-of-line station* $M_M$ *within* $[1,N]$, *without and with the inserted downtime event* $\vec{e}=(i,n_d,d)$, *respectively.* $N^*(d)$ *signifies the potential dependency of* $N^*$ *on* $d$.
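The idea of comparing end-of-line production counts with and without an inserted downtime event can be illustrated with a small deterministic simulation. This is a simplified sketch and not the paper's model: it assumes a two-machine-one-buffer continuous flow line with perfectly reliable machines, a fixed update order within each unit cycle, and a horizon long enough for any temporary loss to be recovered. All names and parameter values are hypothetical.

```python
def simulate_line(rate1, rate2, capacity, b0, n_cycles, downtime=None):
    """Cumulative output of the end-of-line machine M2, cycle by cycle.

    downtime = (start, duration) stops M1 during unit cycles
    start .. start + duration - 1; None means no inserted event.
    """
    b = b0
    produced = 0.0
    counts = []
    for n in range(n_cycles):
        m1_down = (downtime is not None
                   and downtime[0] <= n < downtime[0] + downtime[1])
        s1 = 0.0 if m1_down else rate1
        s2 = min(rate2, b + s1)          # M2 is starved when nothing is available
        s1 = min(s1, capacity - b + s2)  # M1 is blocked when the buffer is full
        b += s1 - s2
        produced += s2
        counts.append(produced)
    return counts


def permanent_loss(rate1, rate2, capacity, b0, n_cycles, downtime):
    """Difference in final production count, undisturbed minus disturbed."""
    base = simulate_line(rate1, rate2, capacity, b0, n_cycles)
    hit = simulate_line(rate1, rate2, capacity, b0, n_cycles, downtime)
    return base[-1] - hit[-1]
```

With `rate1=0.5`, `rate2=0.25`, a full buffer of capacity 4, and a 100-cycle horizon, stopping the first machine for 16 cycles drains the buffer exactly and gives zero loss, whereas a 20-cycle stop starves the slower end-of-line machine for 4 cycles and the lost part is never recovered. In this toy setting the longest loss-free downtime is therefore 16 unit cycles.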

*n*th unit cycle. Based on the estimation results from Sec. 3.1 and Definition 1, $W_i^S(n)$ can be evaluated as

Another quantity of practical interest is how long it takes for the system to recover after a downtime event [19,28,29]. We can define the recovery time of each machine in the discrete-time case as follows.

*where* $\sum_{k=1}^{N}s_i(k)$ *and* $\sum_{k=1}^{N}\tilde{s}_i(k;\vec{e})$ *are the production counts of* $M_i$ *within* $[1,N]$, *without and with the downtime event* $\vec{e}=(i,T_d,d)$, *respectively.*

Since the performance measures of the end-of-line machine are commonly considered as performance measures of the whole line, it is reasonable to consider the recovery time of the whole line as the time it takes for the end-of-line station to recover.

*where* $T_r(M,\vec{e})$ *is the recovery time of the end-of-line machine* $M_M$ *after the downtime event* $\vec{e}=(i,T_d,d)$.

where $W_1(T_d)$ and $W_2(T_d)$ denote the OWs in deterministic analysis for $M_1$ and $M_2$, respectively.

In stochastic systems, the above calculation needs to be revised. For a two-machine-one-buffer production system, let $T_r^e(i,(i,T_d,d))$ denote the recovery time for $M_i$ after an inserted downtime event $\vec{e}=(i,T_d,d)$ in the stochastic scenario. By definition, we have

### Stochastic Serial Production Line Model With Multimachine Multibuffer.

*n*th unit cycle, and the relationship between $P_i(n)$ and $P_i(n+1)$ is

Similar to the analysis for Eqs. (5) and (6) in Sec. 3.1, we have the processing probability of $M_i$, $i=1,\ldots,M$, as follows:

The relationship between the production loss of the entire production line and the production loss of the slowest machine for the discrete-time model is presented in Lemma 1.

Lemma 1. *Let machine* $M_{M^*}$ *be the unique slowest station in a serial transfer line consisting of* $M$ *machines, as shown in Fig.* 1. *Suppose a single isolated downtime event* $\vec{e}'=(j,n'_d,d')$ *at* $M_j$ *leads to stoppage event* $\vec{e}=(M^*,n_d,d)$ *at* $M_{M^*}$. *Then for any given machine* $M_i$ *in the line,* $\exists\,n^*\ge n_d+d$, *which may depend on the location of* $M_i$, *such that* [19]

*where* $\sum_{n=1}^{N}s_i(n)$ *and* $\sum_{n=1}^{N}\tilde{s}_i(n;\vec{e}')$ *are the production counts of* $M_i$ *within* $[1,N]$, *without and with the inserted downtime event* $\vec{e}'=(j,n'_d,d')$, *respectively.*

*Proof.* Due to the fact that the slowest machine can only operate at two different speeds, we have $\tilde{s}_{M^*}(n;\vec{e}')=1/T_{M^*}$, $n\notin[n_d,n_d+d-1]$, and $\tilde{s}_{M^*}(n;\vec{e}')=0$, $n\in[n_d,n_d+d-1]$. Clearly, $n'_d+d'=n_d+d$ since the stoppage event $(M^*,n_d,d)$ results from $\vec{e}'=(j,n'_d,d')$. If the slowest machine is $M_i$, i.e., $M^*=i$, let $n^*=n_d+d$, then $\forall N>n^*$

Similar to the analysis for the two-machine-one-buffer system case, Definitions 1 and 2 are still applicable for the general stochastic systems. Let $W_i^S(n)$ denote the OW for $M_i$ at the *n*th unit cycle in the stochastic scenario. We have the following theorem:

*Let machine* $M_{M^*}$ *be the unique slowest machine in a serial transfer line consisting of* $M$ *machines, as shown in Fig.* 1, *let* $M_i$ *stop from the n*th *unit cycle, and let all other machines operate without inserted downtime; the OW of* $M_i$ *at the n*th *unit cycle is*

*Proof.* The case $i=M^*$ is proved by contradiction. Suppose the OW of the slowest machine $M_{M^*}$ at the *n*th unit cycle is not zero, i.e., $W_{M^*}^S(n)>0$. Then, for any downtime event $(M^*,n,d)$ with $0<d\le W_{M^*}^S(n)$, the difference between the undisturbed and disturbed production count trajectories of the end-of-line machine $M_M$ is nonzero, which contradicts the definition of OW.

We will prove the case when $i<M^*$. An equivalent condition of Definition 1 requires that the duration of the stoppage event at the slowest machine $M_{M^*}$ be zero, since any nontrivial stoppage event $(M^*,n_s,s)$ resulting from an inserted downtime event $\vec{e}=(i,n,d)$ at another machine $M_i$ eventually leads to a nontrivial discrepancy between $\sum_{k=1}^{N}s_M(k)$ and $\sum_{k=1}^{N}\tilde{s}_M(k;\vec{e}')$.

Considering the line segment between $M_i$ and $M_{M^*}$, as shown in Fig. 2, we insert a downtime event $\vec{e}=(i,n,d)$ at $M_i$ with duration $d$ at the *n*th unit cycle. Immediately after $M_i$ is down, there is no flow into this line segment until the $(n+d)$th unit cycle. Applying the conservation of flow during the interval $[n,n+d-1]$ yields

where $s_i(k;\vec{e})=0$, $k\in[n,n+d-1]$. In the case when $i<M^*$, the slowest machine will not be starved by $M_i$ until $\sum_{m=i+1}^{M^*}b_m(n;\vec{e})$ becomes zero. In the stochastic model, the buffer content between $M_i$ and the slowest machine $M_{M^*}$ will be gradually drained. Therefore, the number of unit cycles it takes for all the buffers between $M_i$ and the slowest machine $M_{M^*}$ to become empty is $d^*=\inf\{W^S\ge 0:\ \text{s.t.}\ \sum_{k=i+1}^{M^*}b_k(n+W^S)=0\}$. If the downtime duration $d$ is greater than $d^*$, the slowest machine $M_{M^*}$ will be starved by $M_i$. In light of Lemma 1, this will eventually lead to a permanent production loss between $\sum_{k=1}^{N}s_M(k)$ and $\sum_{k=1}^{N}\tilde{s}_M(k;\vec{e}')$. Therefore, in the case when $i<M^*$, the OW of $M_i$ at the *n*th unit cycle is $W_i^S(n)=d^*$. Analogously, one can also prove the case when $i>M^*$.
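The drain-to-empty condition that defines $d^*$ for a machine upstream of the slowest machine can be sketched numerically as follows. This is a simplified illustration rather than the paper's estimator: it assumes the segment receives no inflow while the stopped machine is down and that the caller supplies the expected outflow of the slowest machine for each future unit cycle (for instance its processing probability divided by its cycle length, which is an assumption made here). The function name and arguments are hypothetical.

```python
def opportunity_window_upstream(buffer_levels, drain_rates, tol=1e-12):
    """First number of unit cycles after which the segment is empty.

    buffer_levels: current levels of the buffers between the stopped
        machine and the slowest machine (real-time data).
    drain_rates: expected parts removed by the slowest machine in each
        successive unit cycle (a prediction, supplied by the caller).
    Returns None if the segment does not empty within the horizon
    covered by drain_rates.
    """
    remaining = float(sum(buffer_levels))
    if remaining <= tol:
        return 0  # nothing buffered: any stop starves the slowest machine
    for cycles, rate in enumerate(drain_rates, start=1):
        remaining -= rate
        if remaining <= tol:
            return cycles
    return None
```

For example, with buffer levels `[2.0, 1.0]` and a constant expected outflow of 0.5 parts per unit cycle, the function returns 6, so a stop of up to 6 unit cycles would not starve the slowest machine under these assumptions.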

For the convenience of evaluating the upper bound of recovery time in the stochastic scenario, we first introduce Lemmas 2 and 3.

Lemma 2. *Let machine* $M_{M^*}$ *be the unique slowest machine in a serial transfer line consisting of* $M$ *machines as shown in Fig.* 1, *and then when* $M_i$ *and* $M_{M^*}$ *are up at the n*th *unit cycle*

- (1)
*for* $i>M^*$, *if* $s_i(n)=s_{M^*}(n)=1/T_{M^*}$, *then all the buffers between* $M_{M^*}$ *and* $M_i$ *are empty, i.e.,* $\sum_{j=M^*+1}^{i}b_j(n)=0$

- (2)
*for* $i<M^*$, *if* $s_i(n)=s_{M^*}(n)=1/T_{M^*}$, *then all the buffers between* $M_{M^*}$ *and* $M_i$ *are full, i.e.,* $\sum_{j=i+1}^{M^*}b_j(n)=\sum_{j=i+1}^{M^*}B_j$

*Proof.* (1) For $i>M^*$. If $s_i(n)=s_{M^*}(n)=1/T_{M^*}=\min\{1/T_m,\ m=1,\ldots,M\}$, machine $M_i$ is partially starved by $M_{M^*}$. This means that all the buffers between $M_{M^*}$ and $M_i$ are empty, i.e., $\sum_{j=M^*+1}^{i}b_j(n)=0$. Similarly, one can also prove case (2).

Lemma 3. *Let machine* $M_{M^*}$ *be the unique slowest machine in a serial transfer line consisting of* $M$ *machines as shown in Fig.* 1. *The recovery time of machine* $M_i$, $i\ne M^*$, *after a single isolated downtime event* $\vec{e}=(i,T_d,d)$ *with* $d\le W_i^S(T_d)$ *is bounded from above by the amount of time for all buffers between* $M_{M^*}$ *and* $M_i$ *to become empty (when* $k>M^*$*) or full (when* $k<M^*$*).*

*Proof.* In the case when $k>M^*$, for a single isolated downtime event $\vec{e}=(i,T_d,d)$ with $d\le W_i^S(T_d)$, consider the following sets:

This completes the proof for the case when $k>M^*$. Similarly, one can also prove the case when $k<M^*$.

Let $T_r^e(i,(i,n,d))$ denote the recovery time in the stochastic scenario for $M_i$ after an inserted downtime event $\vec{e}=(i,n,d)$. We have the following theorem:

*Let machine* $M_{M^*}$ *be the unique slowest machine in a serial transfer line consisting of* $M$ *machines, as shown in Fig.* 1, *and let* $\vec{e}=(i,n,d)$, $d\le W_i^S(n)$, *be the only inserted downtime; the upper bound of recovery time for* $M_i$ *is*

*Proof.* As discussed previously, if the downtime duration is larger than the OW of any individual machine, then the recovery time is infinite. Similar to the analysis on OW, the recovery time of each machine depends on the location of $M_i$ in relation to the slowest machine in the line. We start with $i=M^*$. Based on the statement in Theorem 2, we have $d\le W_{M^*}^S(n)=0$, namely, no downtime is allowed to be inserted on $M_{M^*}$. Therefore, the recovery time $T_r^e(i,(i,n,d))=\infty$.

which gives the number of unit cycles required for all buffers between $M_{M^*}$ and $M_i$ to first become empty.

According to Lemma 3, $N^*$ is an upper bound for the recovery time of $M_i$ after a downtime event $\vec{e}=(i,n,d)$ with $d\le W_i^S(n)$. And it is apparent that $T2i=N1$. Analogously, one can also prove the case when $i<M^*$.

*Remark.* The relationships between the estimated OWs/recovery time and the system parameters such as machines' MCBFs and MCTRs can be explicitly examined. According to the machine probability distribution analysis and Eq. (13), the probability $P_i^1(n)$, $\forall i\ne M^*$, becomes larger if $MCBF_i$ increases or $MCTR_i$ decreases. Therefore, from the OW and recovery time evaluation analysis in Eqs. (19) and (21), under the same initial conditions, the corresponding OW of machine $M_i$, $\forall i\ne M^*$, will remain unchanged, while the corresponding recovery time will decrease. This means that the OW evaluation is relatively insensitive to machines' MCBFs and MCTRs.

### Profit Analysis.

where PC is the system production count, CE is the total cost of energy used by the production line, and $c_p$ is the profit per part.

We will assume that the expenses to produce the part in terms of material and labor are already considered in the profit per part calculation. In this research, inventory cost is ignored, since the tradeoff between throughput and energy savings is the main concern.
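The tradeoff described here can be sketched in a few lines of code. The sketch assumes that profit is the profit per part times the production count minus the total energy cost, and that energy cost is the energy price times each machine's rated power times the time it is powered on, with zero consumption when turned off as in assumption (6). The paper's exact profit and energy equations are not reproduced; the function names, argument layout, and numbers in the usage note are hypothetical.

```python
def energy_cost(power_kw, on_cycles, cycle_minutes, price_per_kwh):
    """Total energy cost of the line over a simulation horizon.

    power_kw: rated power of each machine in kW.
    on_cycles: number of unit cycles each machine is powered on.
    cycle_minutes: length of one unit cycle in minutes.
    """
    total_kwh = sum(p * n * cycle_minutes / 60.0
                    for p, n in zip(power_kw, on_cycles))
    return price_per_kwh * total_kwh


def profit(production_count, profit_per_part, total_energy_cost):
    """Profit as revenue margin on parts minus energy cost."""
    return profit_per_part * production_count - total_energy_cost
```

For instance, two machines rated at 10 kW and 20 kW that are powered on for 60 and 30 one-minute cycles consume 20 kWh in total; at a price of 0.2 per kWh the energy cost is 4, and 100 parts at a profit of 300 per part yield a profit of 29,996. Taking an OW lowers `on_cycles` for the stopped machine, while any throughput impact shows up in `production_count`.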

Note that OWs provide opportunities for energy saving and opportunities to perform more PM works during production hours, so that more PM can be accomplished, and after-production shift PM can be reduced for cost savings. This scenario of maintenance cost is out of the scope of this paper, since we focus on the real-time estimation of OWs and their potential impact on throughput. The maintenance analysis and simulation can be found in previous research [6,19].

## Case Studies

Extensive numerical experiments are executed to verify the performance of the proposed method. The system analyzed is composed of 15 machines and 14 buffers as shown in Fig. 3, which is based on a portion of an engine block line. By changing the system parameters, hundreds of different lines can be obtained. We evaluate the profit through applying a control scheme using the OW and recovery time calculation. Compared to the deterministic-based method as described in Eqs. (18) and (20), the stochastic-based method is numerically shown to be more effective, which results in an average of 10–20% profit increase depending on the energy price and profit of each part as described in the profit analysis in Sec. 3.4. As an illustration, one case study is presented using the system parameters shown in Tables 1 and 2. The profit of each part is assumed to be $c_p=\$300$, and the energy price per kWh is assumed to be $c_e=\$0.2$.

The unit cycle length is set as $\Delta t=1\,\text{min}$, and the total simulation length is set as $25{,}000\,\text{min}$, which is approximately two and a half weeks. For demonstration purposes, all the OWs are taken at $M_6$, which is the machine with the highest speed. Whenever $M_6$ takes an OW, it has to wait for a period of recovery time before taking the next OW. Three scenarios are compared: (1) deterministic-based method as in Eqs. (18) and (20), (2) stochastic analysis as described in Eqs. (19) and (21), and (3) baseline production system with no OWs. The corresponding OW schedules, i.e., the time to insert OWs and the durations of OWs, are shown in Tables 3 and 4. Using Eqs. (22) and (23) in Sec. 3.4, we can calculate the corresponding profits for those three scenarios. Multiple simulation iterations are carried out. The profit results are shown in Table 5 with 95% confidence intervals included.

It can be observed that profit increment can be achieved by applying the OWs based on either deterministic method or stochastic analysis. More importantly, the profit is more effectively increased by utilizing the stochastic analysis which provides a more accurate OW estimation. Note that there might be a minor production throughput impact by applying OWs due to the stochastic nature of production systems and estimation errors. However, based on the concepts of the OW, the throughput impact is minimal and the overall profit increments are significant.

To further illustrate the sensitivity of the estimated OW with respect to machines' MCBFs and MCTRs, we decrease the value of $MCBF_6$ from $14{,}400$ to $10{,}000$ for demonstration purposes. The results show that the estimated OW of machine $M_6$ remains unchanged, which is exactly as expected from the discussion in the aforementioned remark.

## Discussion and Conclusion

This paper investigates the energy saving and maintenance opportunities in the stochastic serial production systems. The geometric reliability model and Markov property are assumed, and the real-time data are utilized in the analysis. A systematic method is developed to calculate the OW and recovery time in stochastic scenario. It is concluded that the OW concepts are valid in various scenarios. More importantly, the revised calculation methods based on the stochastic analysis can lead to more accurate prediction for production control schedule, which can result in bigger profit increment. In the future, we will further extend the application of OW and recovery time to more complicated systems such as parallel structures and develop optimal control scheme for more efficient production control.

## Acknowledgment

This work was supported by the U.S. National Science Foundation (NSF) Grant No. 1351160. The authors would like to thank Dr. Michael Brundage for his assistance to this work.