The Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment, characterized by a time-invariant transition probability matrix. In many real-world scenarios, however, this assumption is not justified, so the computed optimal strategy may not deliver the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP in nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-varying transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm as a function of the rate of change of the underlying environment, where performance is measured by the rate of convergence to the optimal average reward. We present two examples of queuing systems that make use of our analysis framework.
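To make the setting concrete, the following is a minimal sketch of value iteration running while the environment drifts. It is not the paper's construction: the abstract does not specify the adiabatic model, so the sketch assumes a linear interpolation between two transition kernels `P0` and `P1` over `T` steps, and it uses relative value iteration as one standard average-reward variant of value iteration. The two-state, two-action MDP and all names (`bellman_backup`, `interpolate`, `drifting_value_iteration`) are hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP. P[a][s][t] is the probability of moving
# from state s to state t under action a; R[a][s] is the one-step reward.
P0 = [[[0.9, 0.1], [0.5, 0.5]],
      [[0.2, 0.8], [0.1, 0.9]]]
P1 = [[[0.5, 0.5], [0.9, 0.1]],
      [[0.6, 0.4], [0.3, 0.7]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]


def bellman_backup(P, R, h):
    """One Bellman backup: max over actions of r(s, a) + sum_t P(t | s, a) h(t)."""
    n_actions, n_states = len(P), len(h)
    return [
        max(
            R[a][s] + sum(P[a][s][t] * h[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def interpolate(P0, P1, lam):
    """Assumed drift model: the convex combination (1 - lam) * P0 + lam * P1."""
    return [
        [[(1.0 - lam) * p0 + lam * p1 for p0, p1 in zip(row0, row1)]
         for row0, row1 in zip(mat0, mat1)]
        for mat0, mat1 in zip(P0, P1)
    ]


def drifting_value_iteration(P0, P1, R, T):
    """Relative value iteration with one backup per time step while the
    kernel moves from P0 to P1 over T steps. Returns the final estimate
    of the average reward (gain), read off at reference state 0."""
    h = [0.0] * len(R[0])  # relative values, kept normalized so h[0] == 0
    gain = 0.0
    for t in range(1, T + 1):
        P_t = interpolate(P0, P1, t / T)
        v = bellman_backup(P_t, R, h)
        gain = v[0]                 # equals v[0] - h[0] because h[0] == 0
        h = [x - gain for x in v]   # renormalize against the reference state
    return gain


if __name__ == "__main__":
    # Stationary baseline: the environment is P1 throughout.
    reference = drifting_value_iteration(P1, P1, R, 500)
    for T in (5, 50, 5000):
        estimate = drifting_value_iteration(P0, P1, R, T)
        print(T, abs(estimate - reference))
```

The script prints, for several drift horizons `T`, the gap between the gain estimate obtained under drift and the stationary baseline for `P1`. A larger `T` means a slower change per step, which is the quantity the analysis relates to the convergence rate.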
