The increasing complexity of engineering systems has motivated continuing research on computational learning methods for building autonomous intelligent systems that can improve their performance over time while interacting with their environment. Such systems must not only sense their environment but also integrate information from it into all of their decision making. The evolution of these systems is modeled as an unknown controlled Markov chain. In previous research, the predictive optimal decision-making (POD) model was developed to learn in real time the unknown transition probabilities and their associated costs over a varying finite time horizon. In this paper, the convergence of the POD model to the stationary distribution of a Markov chain is proven, establishing POD as a robust model for building autonomous intelligent systems. The paper provides the conditions under which the POD model is valid, along with an interpretation of its underlying structure.
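The two ingredients of the result above, learning unknown transition probabilities from observed state transitions and convergence of the visit frequencies to the chain's stationary distribution, can be illustrated with a minimal sketch. This is not the POD model itself; the three-state matrix `P` is a hypothetical example used only to generate observations, and the learner sees nothing but the state sequence.

```python
import numpy as np

# Hypothetical 3-state chain; P is unknown to the learner and is used
# only to simulate the environment.
rng = np.random.default_rng(0)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
n = P.shape[0]

# Estimate transition probabilities from visit counts while the chain runs.
counts = np.zeros((n, n))
state = 0
for _ in range(200_000):
    nxt = rng.choice(n, p=P[state])
    counts[state, nxt] += 1
    state = nxt
P_hat = counts / counts.sum(axis=1, keepdims=True)

# Stationary distribution of the true chain: the left eigenvector of P
# for eigenvalue 1, normalized to sum to one (pi P = pi).
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

# For an ergodic chain, empirical visit frequencies converge to pi.
visits = counts.sum(axis=1) / counts.sum()
print(np.round(P_hat, 3))
print(np.round(visits, 3), np.round(pi, 3))
```

For this small ergodic chain, both the estimated matrix `P_hat` and the empirical visit frequencies settle close to their true values after a long run, mirroring the convergence behavior the paper establishes for the POD model.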
