The increasing complexity of engineering systems has motivated continuing research on computational learning methods for building autonomous intelligent systems that learn to improve their performance over time while interacting with their environment. Such systems must not only sense their environment but also integrate the information they gather into all decision making. The evolution of such a system is modeled as an unknown controlled Markov chain. In previous research, the predictive optimal decision-making (POD) model was developed to learn, in real time, the unknown transition probabilities and associated costs over a varying finite time horizon. In this paper, the convergence of POD to the stationary distribution of a Markov chain is proven, thus establishing POD as a robust model for building autonomous intelligent systems. The paper provides the conditions under which POD is valid and an interpretation of its underlying structure.
