Citation: M. Zhao, D. Wang, S. Song, and J. Qiao, “Safe Q-learning for data-driven nonlinear optimal control with asymmetric state constraints,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 12, pp. 2408–2422, Dec. 2024. doi: 10.1109/JAS.2024.124509