The growing demand for autonomous intelligent systems that learn to improve their performance while interacting with their environment has spurred significant research on computational cognitive models. Computational intelligence, or rationality, can be achieved by modeling a system and its interaction with the environment in terms of actions, perceptions, and associated costs. A widely adopted paradigm for modeling this interaction is the controlled Markov chain. In this context, the problem is formulated as a sequential decision-making process in which an intelligent system must select control actions over successive time steps to achieve long-term goals. This paper presents a rollout control algorithm that aims to build an online decision-making mechanism for a controlled Markov chain. The algorithm yields a suboptimal lookahead control policy. Under certain conditions, a theoretical bound on its performance can be established.
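
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the generic one-step rollout idea, not the paper's method. It assumes a finite controlled Markov chain, a discounted expected-cost criterion with factor alpha, and a known model. The base policy's cost-to-go is evaluated by fixed-policy iteration here; an online implementation would more typically estimate it by simulation. At each state the rollout controller picks the control minimizing the one-stage cost plus the discounted expected base-policy cost-to-go. The names `evaluate_policy` and `make_rollout_policy`, the model encoding, and the two-state example are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model encoding (assumption, not from the paper):
# P[x][u] is a list of (next_state, probability) pairs and
# g[x][u] is the expected one-stage cost of applying control u in state x.
Transitions = Dict[int, Dict[int, List[Tuple[int, float]]]]
Costs = Dict[int, Dict[int, float]]
Policy = Callable[[int], int]


def evaluate_policy(P: Transitions, g: Costs, policy: Policy, alpha: float,
                    tol: float = 1e-10, max_iter: int = 100_000) -> Dict[int, float]:
    """Approximate the discounted cost-to-go of a fixed policy by iterating
    J <- g_pi + alpha * P_pi J until successive iterates differ by < tol."""
    J = {x: 0.0 for x in P}
    for _ in range(max_iter):
        J_new = {
            x: g[x][policy(x)] + alpha * sum(p * J[y] for y, p in P[x][policy(x)])
            for x in P
        }
        if max(abs(J_new[x] - J[x]) for x in P) < tol:
            return J_new
        J = J_new
    return J


def make_rollout_policy(P: Transitions, g: Costs, base_policy: Policy,
                        alpha: float) -> Policy:
    """One-step lookahead on top of the base policy's cost-to-go."""
    J_base = evaluate_policy(P, g, base_policy, alpha)

    def rollout(x: int) -> int:
        # Q-factor of control u: one-stage cost plus discounted expected
        # base-policy cost-to-go from the next state.
        def q(u: int) -> float:
            return g[x][u] + alpha * sum(p * J_base[y] for y, p in P[x][u])
        return min(P[x], key=q)

    return rollout


# Hypothetical two-state example: control 0 = "stay", control 1 = "switch".
P: Transitions = {
    0: {0: [(0, 1.0)], 1: [(1, 0.8), (0, 0.2)]},
    1: {0: [(1, 0.9), (0, 0.1)], 1: [(1, 1.0)]},
}
g: Costs = {0: {0: 2.0, 1: 2.5}, 1: {0: 0.0, 1: 1.0}}
alpha = 0.9


def base(x: int) -> int:
    return 0  # naive base policy: always "stay"


rollout = make_rollout_policy(P, g, base, alpha)
J_base = evaluate_policy(P, g, base, alpha)
J_roll = evaluate_policy(P, g, rollout, alpha)
print({x: rollout(x) for x in P}, J_base, J_roll)
```

In this toy model the base policy stays in the costly state 0 forever, while the rollout policy switches out of it. The rollout policy's cost-to-go is lower in every state, which illustrates the cost-improvement property of rollout over its base policy for discounted problems. Whether and how this relates to the performance bound stated in the paper is not implied.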
