IEEE/CAA Journal of Automatica Sinica
Citation: Sumit Kumar Jha and Shubhendu Bhasin, "Adaptive Linear Quadratic Regulator for Continuous-Time Systems With Uncertain Dynamics," IEEE/CAA J. Autom. Sinica, vol. 7, no. 3, pp. 833-841, May 2020. doi: 10.1109/JAS.2019.1911438