In traditional optimal control and design problems, the control gains and design parameters are usually derived to minimize a cost function that reflects system performance and control effort. One major challenge of such approaches is the selection of the weighting matrices in the cost function, which are usually determined by trial and error and human intuition. Although various techniques have been proposed to automate weight selection, they either cannot address complex design problems or suffer from slow convergence and high computational cost. We propose a layered approach in which Q-learning, a reinforcement learning technique, operates on top of genetic algorithms (GA) to determine the best weightings for optimal control and design problems. The layered approach allows knowledge to be reused: knowledge obtained via Q-learning in one design problem can be used to speed up convergence on a similar design problem. Moreover, the layered approach can solve optimization problems that cannot be solved by GA alone. To test the proposed method, we perform numerical experiments on a sample active-passive hybrid vibration control problem, namely adaptive structures with active-passive hybrid piezoelectric networks. These numerical experiments show that the proposed Q-learning scheme is a promising approach for automating weight selection in complex design problems.
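To make the layered idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tabular Q-learning agent that moves a single scalar state weight up or down a log-spaced grid, and it replaces the GA inner solve with a closed-form scalar LQR design. The weight grid, the target decay rate, the reward definition, and all function names (`inner_design`, `reward`, `step`, `train`, `greedy_rollout`) are hypothetical choices made for illustration.

```python
import math
import random

# Candidate state weights q on a log-spaced grid (hypothetical choice).
WEIGHTS = [10 ** (k / 2) for k in range(-4, 9)]
ACTIONS = (-1, 0, 1)   # decrease, keep, or increase the weight index
TARGET_RATE = 10.0     # hypothetical design specification


def inner_design(q, a=1.0, b=1.0, r=1.0):
    """Stand-in for the inner optimizer (a GA in the abstract).

    Scalar LQR for dx/dt = a*x + b*u with cost integral of q*x^2 + r*u^2:
    the optimal closed-loop pole is -sqrt(a^2 + b^2*q/r).
    Returns the corresponding decay rate.
    """
    return math.sqrt(a * a + b * b * q / r)


def reward(index):
    """Penalty for missing the target decay rate (0 would be a perfect match)."""
    return -abs(inner_design(WEIGHTS[index]) - TARGET_RATE)


def step(index, action):
    """Move along the weight grid, clipping at both ends."""
    return min(max(index + action, 0), len(WEIGHTS) - 1)


def train(episodes=2000, horizon=20, alpha=0.5, gamma=0.9, epsilon=0.2,
          q_init=None, seed=0):
    """Tabular Q-learning over weight indices with epsilon-greedy exploration."""
    rng = random.Random(seed)
    if q_init is not None:
        q_table = [row[:] for row in q_init]   # reuse earlier knowledge
    else:
        q_table = [[0.0] * len(ACTIONS) for _ in WEIGHTS]
    for _ in range(episodes):
        s = rng.randrange(len(WEIGHTS))        # random starting weight
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q_table[s][i])
            s_next = step(s, ACTIONS[a])
            # Standard Q-learning update toward reward plus discounted best value.
            target = reward(s_next) + gamma * max(q_table[s_next])
            q_table[s][a] += alpha * (target - q_table[s][a])
            s = s_next
    return q_table


def greedy_rollout(q_table, start, steps=30):
    """Follow the learned greedy policy and return the final weight index."""
    s = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q_table[s][i])
        s = step(s, ACTIONS[a])
    return s
```

Calling `q_table = train()` and then `WEIGHTS[greedy_rollout(q_table, start=0)]` returns the weight that the learned policy settles on. Passing a previously learned table as `q_init` when training on a similar problem is one simple way to mimic the knowledge reuse described above, under the assumption that both problems share the same state and action grid.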
