Citation: W. Fan and J. Xiong, “A homotopy method for continuous-time model-free LQR control based on policy iteration,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2025.125132
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., Cambridge, USA: The MIT Press, 2018.
[2] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, USA: Athena Scientific, 1996.
[3] L. Cui, S. Wang, J. Zhang, et al., “Learning-based balance control of wheel-legged robots,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7667–7674, 2021. doi: 10.1109/LRA.2021.3100269
[4] X. Gao, J. Si, Y. Wen, et al., “Reinforcement learning control of robotic knee with human-in-the-loop by flexible policy iteration,” IEEE Trans. Neural Networks and Learning Systems, vol. 33, no. 10, pp. 5873–5887, 2022. doi: 10.1109/TNNLS.2021.3071727
[5] T. Liu, L. Cui, B. Pang, et al., “A unified framework for data-driven optimal control of connected vehicles in mixed traffic,” IEEE Trans. Intelligent Vehicles, vol. 8, no. 8, pp. 4131–4145, 2023. doi: 10.1109/TIV.2023.3287131
[6] M. Huang, Z. P. Jiang, and K. Ozbay, “Learning-based adaptive optimal control for connected vehicles in mixed traffic: Robustness to driver reaction time,” IEEE Trans. Cybern., vol. 52, no. 6, pp. 5267–5277, 2022. doi: 10.1109/TCYB.2020.3029077
[7] Q. Wei, Z. Yang, H. Su, et al., “Online adaptive dynamic programming for optimal self-learning control of VTOL aircraft systems with disturbances,” IEEE Trans. Autom. Science and Engineering, vol. 21, no. 1, pp. 343–352, 2024. doi: 10.1109/TASE.2022.3217539
[8] F. L. Lewis and K. G. Vamvoudakis, “Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data,” IEEE Trans. Systems, Man, and Cybern., Part B (Cybern.), vol. 41, no. 1, pp. 14–25, 2011. doi: 10.1109/TSMCB.2010.2043839
[9] T. Bian and Z. P. Jiang, “Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design,” Automatica, vol. 71, pp. 348–360, 2016. doi: 10.1016/j.automatica.2016.05.003
[10] Y. Jiang and Z. P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012. doi: 10.1016/j.automatica.2012.06.096
[11] Y. Yang, Y. Pan, C.-Z. Xu, and D. C. Wunsch, “Hamiltonian-driven adaptive dynamic programming with efficient experience replay,” IEEE Trans. Neural Networks and Learning Systems, vol. 35, no. 3, pp. 3278–3290, 2024. doi: 10.1109/TNNLS.2022.3213566
[12] W. Gao, M. Mynuddin, D. C. Wunsch, et al., “Reinforcement learning-based cooperative optimal output regulation via distributed adaptive internal model,” IEEE Trans. Neural Networks and Learning Systems, vol. 33, no. 10, pp. 5229–5240, 2022. doi: 10.1109/TNNLS.2021.3069728
[13] Y. Jiang, W. Gao, J. Na, et al., “Value iteration and adaptive optimal output regulation with assured convergence rate,” Control Engineering Practice, vol. 121, p. 105042, 2022.
[14] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control,” Automatica, vol. 43, no. 3, pp. 473–481, 2007. doi: 10.1016/j.automatica.2006.09.019
[15] Y. Yang, H. Modares, K. G. Vamvoudakis, and F. L. Lewis, “Cooperative finitely excited learning for dynamical games,” IEEE Trans. Cybern., vol. 54, no. 2, pp. 797–810, 2024. doi: 10.1109/TCYB.2023.3274908
[16] Y. Bao and J. M. Velni, “Model-free control design using policy gradient reinforcement learning in LPV framework,” in Proc. European Control Conf., Delft, Netherlands, 2021, pp. 150–155.
[17] S. Mukherjee and T. L. Vu, “Reinforcement learning of structured stabilizing control for linear systems with unknown state matrix,” IEEE Trans. Autom. Control, vol. 68, no. 3, pp. 1746–1752, 2023. doi: 10.1109/TAC.2022.3155384
[18] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009. doi: 10.1109/MCAS.2009.933854
[19] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009. doi: 10.1016/j.automatica.2008.08.017
[20] B. Gravell, P. M. Esfahani, and T. Summers, “Learning optimal controllers for linear systems with multiplicative noise via policy gradient,” IEEE Trans. Autom. Control, vol. 66, no. 11, pp. 5283–5298, 2021. doi: 10.1109/TAC.2020.3037046
[21] D. Kleinman, “On an iterative technique for Riccati equation computations,” IEEE Trans. Autom. Control, vol. 13, no. 1, pp. 114–115, 1968. doi: 10.1109/TAC.1968.1098829
[22] Y. Yang, B. Kiumarsi, H. Modares, and C. Xu, “Model-free λ-policy iteration for discrete-time linear quadratic regulation,” IEEE Trans. Neural Networks and Learning Systems, vol. 34, no. 2, pp. 635–649, 2023. doi: 10.1109/TNNLS.2021.3098985
[23] W. Fan and J. Xiong, “Q-learning methods for LQR control of completely unknown discrete-time linear systems,” IEEE Trans. Autom. Science and Engineering, 2024. doi: 10.1109/TASE.2024.3434533
[24] A. Lamperski, “Computing stabilizing linear controllers via policy iteration,” in Proc. 59th IEEE Conf. Decision and Control, Jeju, Korea (South), 2020, pp. 1902–1907.
[25] S. Richter and R. DeCarlo, “A homotopy method for eigenvalue assignment using decentralized state feedback,” IEEE Trans. Autom. Control, vol. 29, no. 2, pp. 148–158, 1984. doi: 10.1109/TAC.1984.1103471
[26] B. C. Eaves, “Homotopies for computation of fixed points,” Mathematical Programming, vol. 3, no. 1, pp. 1–22, 1972.
[27] T. M. Wu, “A study of convergence on the Newton-homotopy continuation method,” Applied Math. and Computation, vol. 168, no. 2, pp. 1169–1174, 2005. doi: 10.1016/j.amc.2003.10.068
[28] M. Mariton and P. Bertrand, “A homotopy algorithm for solving coupled Riccati equations,” Optimal Control Applications and Methods, vol. 6, no. 4, pp. 351–357, 1985. doi: 10.1002/oca.4660060404
[29] B. Pan, P. Lu, X. Pan, et al., “Double-homotopy method for solving optimal control problems,” J. Guidance, Control, and Dynamics, vol. 39, pp. 1–15, 2016. doi: 10.2514/1.G000274
[30] J. Zhang, Q. Xiao, and L. Li, “Solution space exploration of low-thrust minimum-time trajectory optimization by combining two homotopies,” Automatica, vol. 148, p. 110798, 2023.
[31] C. Chen, F. L. Lewis, and B. Li, “Homotopic policy iteration-based learning design for unknown linear continuous-time systems,” Automatica, vol. 138, p. 110153, 2022.
[32] Y. Sun and M. Fazel, “Learning optimal controllers by policy gradient: Global optimality via convex parameterization,” in Proc. 60th IEEE Conf. Decision and Control, Austin, USA, 2021, pp. 4576–4581.
[33] M. Giegrich, C. Reisinger, and Y. Zhang, “Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems,” SIAM J. Control and Optimization, vol. 62, no. 2, pp. 1060–1092, 2024. doi: 10.1137/22M1533517
[34] F. L. Lewis and V. L. Syrmos, Optimal Control. New York, USA: Wiley, 1995.
[35] B. P. Molinari, “The stabilizing solution of the algebraic Riccati equation,” SIAM J. Control, vol. 11, no. 2, pp. 262–271, 1973. doi: 10.1137/0311021
[36] J. M. Ortega and W. C. Rheinboldt, Iterative Solutions of Nonlinear Equations in Several Variables. New York, USA: Academic Press, 1970, pp. 230–235.
[37] A. S. Nemirovsky, Interior Point Polynomial Methods in Convex Programming, Atlanta, USA: Georgia Institute of Technology, 1996.
[38] G. C. Walsh, H. Ye, and L. G. Bushnell, “Stability analysis of networked control systems,” IEEE Trans. Control Systems Technology, vol. 10, no. 3, pp. 438–446, 2002. doi: 10.1109/87.998034
[39] H. Xu, S. Jagannathan, and F. L. Lewis, “Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses,” Automatica, vol. 48, no. 6, pp. 1017–1030, 2012. doi: 10.1016/j.automatica.2012.03.007
[40] T. Horibe and N. Sakamoto, “Optimal swing up and stabilization control for inverted pendulum via stable manifold method,” IEEE Trans. Control Systems Technology, vol. 26, no. 2, pp. 708–715, 2018. doi: 10.1109/TCST.2017.2670524