Citation: W. Fan and J. Xiong, “A homotopy method for continuous-time model-free LQR control based on policy iteration,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2025.125132
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., Cambridge, USA: The MIT Press, 2018.
[2] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, USA: Athena Scientific, 1996.
[3] L. Cui, S. Wang, J. Zhang, et al., “Learning-based balance control of wheel-legged robots,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7667–7674, 2021. doi: 10.1109/LRA.2021.3100269
[4] X. Gao, J. Si, Y. Wen, et al., “Reinforcement learning control of robotic knee with human-in-the-loop by flexible policy iteration,” IEEE Trans. Neural Networks and Learning Systems, vol. 33, no. 10, pp. 5873–5887, 2022. doi: 10.1109/TNNLS.2021.3071727
[5] T. Liu, L. Cui, B. Pang, et al., “A unified framework for data-driven optimal control of connected vehicles in mixed traffic,” IEEE Trans. Intelligent Vehicles, vol. 8, no. 8, pp. 4131–4145, 2023. doi: 10.1109/TIV.2023.3287131
[6] M. Huang, Z. P. Jiang, and K. Ozbay, “Learning-based adaptive optimal control for connected vehicles in mixed traffic: Robustness to driver reaction time,” IEEE Trans. Cybern., vol. 52, no. 6, pp. 5267–5277, 2022. doi: 10.1109/TCYB.2020.3029077
[7] Q. Wei, Z. Yang, H. Su, et al., “Online adaptive dynamic programming for optimal self-learning control of VTOL aircraft systems with disturbances,” IEEE Trans. Autom. Science and Engineering, vol. 21, no. 1, pp. 343–352, 2024. doi: 10.1109/TASE.2022.3217539
[8] F. L. Lewis and K. G. Vamvoudakis, “Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data,” IEEE Trans. Systems, Man, and Cybern., Part B (Cybern.), vol. 41, no. 1, pp. 14–25, 2011. doi: 10.1109/TSMCB.2010.2043839
[9] T. Bian and Z. P. Jiang, “Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design,” Automatica, vol. 71, pp. 348–360, 2016. doi: 10.1016/j.automatica.2016.05.003
[10] Y. Jiang and Z. P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012. doi: 10.1016/j.automatica.2012.06.096
[11] Y. Yang, Y. Pan, C.-Z. Xu, and D. C. Wunsch, “Hamiltonian-driven adaptive dynamic programming with efficient experience replay,” IEEE Trans. Neural Networks and Learning Systems, vol. 35, no. 3, pp. 3278–3290, 2024. doi: 10.1109/TNNLS.2022.3213566
[12] W. Gao, M. Mynuddin, D. C. Wunsch, et al., “Reinforcement learning-based cooperative optimal output regulation via distributed adaptive internal model,” IEEE Trans. Neural Networks and Learning Systems, vol. 33, no. 10, pp. 5229–5240, 2022. doi: 10.1109/TNNLS.2021.3069728
[13] Y. Jiang, W. Gao, J. Na, et al., “Value iteration and adaptive optimal output regulation with assured convergence rate,” Control Engineering Practice, vol. 121, p. 105042, 2022.
[14] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control,” Automatica, vol. 43, no. 3, pp. 473–481, 2007. doi: 10.1016/j.automatica.2006.09.019
[15] Y. Yang, H. Modares, K. G. Vamvoudakis, and F. L. Lewis, “Cooperative finitely excited learning for dynamical games,” IEEE Trans. Cybern., vol. 54, no. 2, pp. 797–810, 2024. doi: 10.1109/TCYB.2023.3274908
[16] Y. Bao and J. M. Velni, “Model-free control design using policy gradient reinforcement learning in LPV framework,” in Proc. European Control Conf., Delft, Netherlands, 2021, pp. 150–155.
[17] S. Mukherjee and T. L. Vu, “Reinforcement learning of structured stabilizing control for linear systems with unknown state matrix,” IEEE Trans. Autom. Control, vol. 68, no. 3, pp. 1746–1752, 2023. doi: 10.1109/TAC.2022.3155384
[18] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009. doi: 10.1109/MCAS.2009.933854
[19] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009. doi: 10.1016/j.automatica.2008.08.017
[20] B. Gravell, P. M. Esfahani, and T. Summers, “Learning optimal controllers for linear systems with multiplicative noise via policy gradient,” IEEE Trans. Autom. Control, vol. 66, no. 11, pp. 5283–5298, 2021. doi: 10.1109/TAC.2020.3037046
[21] D. Kleinman, “On an iterative technique for Riccati equation computations,” IEEE Trans. Autom. Control, vol. 13, no. 1, pp. 114–115, 1968. doi: 10.1109/TAC.1968.1098829
[22] Y. Yang, B. Kiumarsi, H. Modares, and C. Xu, “Model-free λ-policy iteration for discrete-time linear quadratic regulation,” IEEE Trans. Neural Networks and Learning Systems, vol. 34, no. 2, pp. 635–649, 2023. doi: 10.1109/TNNLS.2021.3098985
[23] W. Fan and J. Xiong, “Q-learning methods for LQR control of completely unknown discrete-time linear systems,” IEEE Trans. Autom. Science and Engineering, 2024. doi: 10.1109/TASE.2024.3434533
[24] A. Lamperski, “Computing stabilizing linear controllers via policy iteration,” in Proc. 59th IEEE Conf. Decision and Control, Jeju, Korea (South), 2020, pp. 1902–1907.
[25] S. Richter and R. DeCarlo, “A homotopy method for eigenvalue assignment using decentralized state feedback,” IEEE Trans. Autom. Control, vol. 29, no. 2, pp. 148–158, 1984. doi: 10.1109/TAC.1984.1103471
[26] B. C. Eaves, “Homotopies for computation of fixed points,” Mathematical Programming, vol. 3, no. 1, pp. 1–22, 1972.
[27] T. M. Wu, “A study of convergence on the Newton-homotopy continuation method,” Applied Math. and Computation, vol. 168, no. 2, pp. 1169–1174, 2005. doi: 10.1016/j.amc.2003.10.068
[28] M. Mariton and P. Bertrand, “A homotopy algorithm for solving coupled Riccati equations,” Optimal Control Applications and Methods, vol. 6, no. 4, pp. 351–357, 1985. doi: 10.1002/oca.4660060404
[29] B. Pan, P. Lu, X. Pan, et al., “Double-homotopy method for solving optimal control problems,” J. Guidance, Control, and Dynamics, vol. 39, pp. 1–15, 2016. doi: 10.2514/1.G000274
[30] J. Zhang, Q. Xiao, and L. Li, “Solution space exploration of low-thrust minimum-time trajectory optimization by combining two homotopies,” Automatica, vol. 148, p. 110798, 2023.
[31] C. Chen, F. L. Lewis, and B. Li, “Homotopic policy iteration-based learning design for unknown linear continuous-time systems,” Automatica, vol. 138, p. 110153, 2022.
[32] Y. Sun and M. Fazel, “Learning optimal controllers by policy gradient: Global optimality via convex parameterization,” in Proc. 60th IEEE Conf. Decision and Control, Austin, USA, 2021, pp. 4576–4581.
[33] M. Giegrich, C. Reisinger, and Y. Zhang, “Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems,” SIAM J. Control and Optimization, vol. 62, no. 2, pp. 1060–1092, 2024. doi: 10.1137/22M1533517
[34] F. L. Lewis and V. L. Syrmos, Optimal Control. New York, USA: Wiley, 1995.
[35] B. P. Molinari, “The stabilizing solution of the algebraic Riccati equation,” SIAM J. Control, vol. 11, no. 2, pp. 262–271, 1973. doi: 10.1137/0311021
[36] J. M. Ortega and W. C. Rheinboldt, Iterative Solutions of Nonlinear Equations in Several Variables. New York, USA: Academic Press, 1970, pp. 230–235.
[37] A. S. Nemirovsky, Interior Point Polynomial Methods in Convex Programming, Atlanta, USA: Georgia Institute of Technology, 1996.
[38] G. C. Walsh, H. Ye, and L. G. Bushnell, “Stability analysis of networked control systems,” IEEE Trans. Control Systems Technology, vol. 10, no. 3, pp. 438–446, 2002. doi: 10.1109/87.998034
[39] H. Xu, S. Jagannathan, and F. L. Lewis, “Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses,” Automatica, vol. 48, no. 6, pp. 1017–1030, 2012. doi: 10.1016/j.automatica.2012.03.007
[40] T. Horibe and N. Sakamoto, “Optimal swing up and stabilization control for inverted pendulum via stable manifold method,” IEEE Trans. Control Systems Technology, vol. 26, no. 2, pp. 708–715, 2018. doi: 10.1109/TCST.2017.2670524