Reducing the total cost of a manufacturing system through real-time production control is attractive. The total cost consists mainly of the production cost, the penalty for permanent production loss, and the Work-In-Process (WIP) inventory cost. However, an analytical model of a manufacturing system is difficult to derive because of the complexity of machine starvation and blockage and of the random failure and maintenance processes. A real-time control policy that does not require an exact analytical model is therefore needed. In this paper, a novel reinforcement-learning-based control policy is proposed in which the action is to switch the machines on or off at the start of each time slot. First, a simulation model is developed, with the mean time between failures (MTBF) and mean time to repair (MTTR) estimated from historical data, and is used to collect samples. Then, a reinforcement learning method, specifically Least-Squares Policy Iteration (LSPI), is applied to obtain a sub-optimal policy. The simulation results show that the proposed method performs well in reducing the total cost.
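
The two steps described above (collect samples from a simulator, then run Least-Squares Policy Iteration on them) can be illustrated with a minimal sketch. It is not the paper's model: the single machine with one buffer, the per-slot up probability standing in for MTBF/MTTR, the demand probability, all cost rates, and the names `step`, `features`, `greedy`, and `lspi` are assumptions made only for illustration.

```python
import numpy as np

N_LEVELS, N_ACTIONS, GAMMA = 4, 2, 0.9   # buffer levels 0..3; actions: 0 = off, 1 = on

def step(buffer, action, rng):
    """Hypothetical one-slot simulator; returns (cost, next_buffer)."""
    cost = 0.5 * buffer                      # WIP inventory cost (assumed rate)
    if action == 1:
        cost += 1.0                          # production cost of running (assumed)
        # Machine is up with probability 0.9 (assumed) and not blocked by a full buffer.
        if rng.random() < 0.9 and buffer < N_LEVELS - 1:
            buffer += 1
    if rng.random() < 0.7:                   # downstream demand arrives (assumed rate)
        if buffer > 0:
            buffer -= 1
        else:
            cost += 5.0                      # permanent production loss penalty (assumed)
    return cost, buffer

def features(buffer, action):
    """One-hot basis over (buffer level, action) pairs."""
    phi = np.zeros(N_LEVELS * N_ACTIONS)
    phi[buffer * N_ACTIONS + action] = 1.0
    return phi

def greedy(w, buffer):
    """Action with the lowest estimated discounted cost."""
    return int(np.argmin([features(buffer, a) @ w for a in range(N_ACTIONS)]))

def lspi(samples, n_iter=20, tol=1e-6):
    """Repeat LSTD-Q policy evaluation and greedy improvement on fixed samples."""
    k = N_LEVELS * N_ACTIONS
    w = np.zeros(k)
    for _ in range(n_iter):
        A = 1e-6 * np.eye(k)                 # small ridge term keeps A invertible
        b = np.zeros(k)
        for s, a, cost, s_next in samples:
            phi = features(s, a)
            phi_next = features(s_next, greedy(w, s_next))
            A += np.outer(phi, phi - GAMMA * phi_next)
            b += phi * cost
        w_new = np.linalg.solve(A, b)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Step 1: collect samples from the simulator under uniformly random states and actions.
rng = np.random.default_rng(0)
samples = []
for _ in range(5000):
    s = int(rng.integers(N_LEVELS))
    a = int(rng.integers(N_ACTIONS))
    cost, s_next = step(s, a, rng)
    samples.append((s, a, cost, s_next))

# Step 2: learn the weights and read off the on/off decision for each buffer level.
w = lspi(samples)
policy = [greedy(w, s) for s in range(N_LEVELS)]
print(policy)
```

Because the sketch minimizes cost rather than maximizing reward, the greedy step takes the arg-min of the estimated action values. With one-hot features the procedure reduces to sample-based tabular policy iteration; a larger system would presumably need a more compact basis.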

