A journal of the IEEE and the Chinese Association of Automation (CAA), publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 7, Issue 3
May 2020

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1% (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Sumit Kumar Jha and Shubhendu Bhasin, "Adaptive Linear Quadratic Regulator for Continuous-Time Systems With Uncertain Dynamics," IEEE/CAA J. Autom. Sinica, vol. 7, no. 3, pp. 833-841, May 2020. doi: 10.1109/JAS.2019.1911438

Adaptive Linear Quadratic Regulator for Continuous-Time Systems With Uncertain Dynamics

doi: 10.1109/JAS.2019.1911438
Abstract
  • In this paper, an adaptive linear quadratic regulator (LQR) is proposed for continuous-time systems with uncertain dynamics. The dynamic state-feedback controller uses input-output data along the system trajectory to continuously adapt and converge to the optimal controller. The result differs from previous results in that the adaptive optimal controller is designed without knowledge of the system dynamics and without an initial stabilizing policy. Further, the controller is updated continuously using input-output data, as opposed to the commonly used switched/intermittent updates, which can potentially lead to stability issues. An online state derivative estimator facilitates the design of a model-free controller. Gradient-based update laws are developed for online estimation of the optimal gain. Uniform exponential stability of the closed-loop system is established using a Lyapunov-based analysis, and a simulation example is provided to validate the theoretical contribution.
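For readers who want the baseline problem in symbols, the following is the standard continuous-time LQR formulation that the adaptive design targets. This is textbook material in generic notation, not the paper's own symbols:

```latex
% Standard continuous-time LQR formulation (generic textbook notation,
% not the paper's own symbols).
\begin{align}
  \dot{x}(t) &= A x(t) + B u(t)
    \quad \text{(LTI plant; $A$, $B$ uncertain)} \\
  J(u) &= \int_{0}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) dt,
    \quad Q \succeq 0,\; R \succ 0 \\
  u^{*}(t) &= -K^{*} x(t), \qquad K^{*} = R^{-1} B^{\top} P^{*} \\
  0 &= A^{\top} P^{*} + P^{*} A - P^{*} B R^{-1} B^{\top} P^{*} + Q
    \quad \text{(algebraic Riccati equation)}
\end{align}
```

The adaptive scheme in the paper estimates the optimal gain $K^{*}$ online from input-output data, without solving the algebraic Riccati equation or knowing $A$ and $B$.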

  • Recommended by Associate Editor Qinglai Wei.



    Highlights

    • In this paper, an adaptive linear quadratic regulator (LQR) is proposed for continuous-time systems with uncertain dynamics. The dynamic state-feedback controller uses input-output data along the system trajectory to continuously adapt and converge to the optimal controller. Lyapunov analysis is used to prove uniform exponential stability of the overall system.
    • The contribution of this paper is the design of an adaptive LQR with a time-varying state-feedback gain for continuous-time LTI systems with uncertain dynamics; the gain is shown to converge exponentially to the optimal gain.
    • The novelty of the proposed result lies in the computation- and memory-efficient algorithm used to solve the optimal control problem for uncertain dynamics without requiring an initial stabilizing control policy, unlike previous results, which rely either on an initial stabilizing policy with switched policy updates or on storage of past data. Further, the controller is updated continuously using input-output data while ensuring exponential convergence to the optimal gain, as opposed to the commonly used switched/intermittent updates, which may potentially lead to stability issues. A minimal simulation sketch of the gradient-based idea follows below.
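To make the gradient-based estimation idea concrete, here is a minimal numerical sketch. It is emphatically not the authors' algorithm: their update laws are driven by measured input-output data with A and B unknown, whereas this sketch assumes a known toy plant (the values of A, B, Q, R, gamma, and dt below are all illustrative choices) purely to show a gradient flow on the algebraic Riccati residual converging to the optimal gain.

```python
# Hypothetical sketch: gradient flow on the algebraic Riccati residual.
# NOT the authors' algorithm: their update laws use only input-output
# data with A, B unknown; here A, B (and Q, R, gamma, dt) are assumed
# illustrative values so the gradient-descent idea can be shown directly.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, 1.0]])  # toy unstable plant (assumed)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
S = B @ np.linalg.inv(R) @ B.T

P_star = solve_continuous_are(A, B, Q, R)   # reference ARE solution
K_star = np.linalg.inv(R) @ B.T @ P_star    # true optimal gain

P = np.eye(2)            # initial guess; not derived from a stabilizing policy
gamma, dt = 0.05, 1e-3   # adaptation gain and Euler step (assumed values)
for _ in range(200_000):
    E = A.T @ P + P @ A - P @ S @ P + Q              # ARE residual E(P)
    grad = A @ E + E @ A.T - E @ P @ S - S @ P @ E   # gradient of 0.5*||E||_F^2
    P -= gamma * dt * grad                           # Euler-discretized continuous update

K = np.linalg.inv(R) @ B.T @ P
print("gain error ||K - K*|| =", np.linalg.norm(K - K_star))
```

The sketch assumes the initial guess lies in the basin of the stabilizing ARE solution. Conceptually, in the paper's model-free setting the residual cannot be formed from A and B; the online state derivative estimator and the measured input-output data play that role, and the continuous update avoids the switched policy iterations discussed above.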
