IEEE/CAA Journal of Automatica Sinica
Citation: G. Y. Zhu, X. L. Li, R. R. Sun, Y. Y. Yang, and P. Zhang, "Policy iteration for optimal control of discrete-time time-varying nonlinear systems," IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 781–791, Mar. 2023. doi: 10.1109/JAS.2023.123096
[1] F. Song, Y. Liu, J.-X. Xu, X. Yang, He, and Z. Yang, "Iterative learning identification and compensation of space-periodic disturbance in PMLSM systems with time delay," IEEE Trans. Industrial Electronics, vol. 65, no. 9, pp. 7579–7589, Sept. 2018. doi: 10.1109/TIE.2017.2777387
[2] T. Haidegger, L. Kovács, R.-E. Precup, S. Preitl, B. Benyó, and Z. Benyó, "Cascade control for telerobotic systems serving space medicine," IFAC Proceedings Volumes, vol. 44, no. 1, pp. 3759–3764, Jan. 2011. doi: 10.3182/20110828-6-IT-1002.02482
[3] Z. Cao, Q. Xiao, R. Huang, and M. Zhou, "Robust neuro-optimal control of underactuated snake robots with experience replay," IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 1, pp. 208–217, Jan. 2018. doi: 10.1109/TNNLS.2017.2768820
[4] R.-C. Roman, R.-E. Precup, and E. M. Petriu, "Hybrid data-driven fuzzy active disturbance rejection control for tower crane systems," European Journal of Control, vol. 58, pp. 373–387, Mar. 2021. doi: 10.1016/j.ejcon.2020.08.001
[5] J. Hu, J. Duan, H. Ma, and M.-Y. Chow, "Distributed adaptive droop control for optimal power dispatch in DC microgrid," IEEE Trans. Industrial Electronics, vol. 65, no. 1, pp. 778–789, Jan. 2018. doi: 10.1109/TIE.2017.2698425
[6] S. Wu, X. Zhao, Z. Jiao, C.-K. Luk, and C. Jiu, "Multi-objective optimal design of a toroidally wound radial-flux Halbach permanent magnet array limited angle torque motor," IEEE Trans. Industrial Electronics, vol. 64, no. 4, pp. 2962–2971, Apr. 2017. doi: 10.1109/TIE.2016.2632067
[7] J. Qiu, T. Wang, S. Yin, and H. Gao, "Data-based optimal control for networked double-layer industrial processes," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4179–4186, May 2017. doi: 10.1109/TIE.2016.2608902
[8] C. Zhang, S. Zhang, G. Han, and H. Liu, "Power management comparison for a dual-motor-propulsion system used in a battery electric bus," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 3873–3882, May 2017. doi: 10.1109/TIE.2016.2645166
[9] K. G. Vamvoudakis and J. Hespanha, "Online optimal operation of parallel voltage-source inverters using partial information," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4296–4305, May 2017. doi: 10.1109/TIE.2016.2630658
[10] Y. Yang, D. Wunsch, and Y. Yin, "Hamiltonian-driven adaptive dynamic programming for continuous nonlinear dynamical systems," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1929–1940, Aug. 2017. doi: 10.1109/TNNLS.2017.2654324
[11] A. Sahoo, H. Xu, and S. Jagannathan, "Approximate optimal control of affine nonlinear continuous-time systems using event-sampled neurodynamic programming," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 639–652, Mar. 2017. doi: 10.1109/TNNLS.2016.2539366
[12] X. Xu, H. Yang, C. Lian, and J. Liu, "Self-learning control using dual heuristic programming with global Laplacian eigenmaps," IEEE Trans. Industrial Electronics, vol. 64, no. 12, pp. 9517–9526, Dec. 2017. doi: 10.1109/TIE.2017.2708002
[13] T. Bian, Y. Jiang, and Z.-P. Jiang, "Decentralized adaptive optimal control of large-scale systems with application to power systems," IEEE Trans. Industrial Electronics, vol. 62, no. 4, pp. 2439–2447, Apr. 2015. doi: 10.1109/TIE.2014.2345343
[14] P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Systems Yearbook, vol. 22, pp. 25–38, 1977.
[15] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge: MIT Press, 1991, pp. 67–95.
[16] X. Yang, H. He, and X. Zhong, "Adaptive dynamic programming for robust regulation and its application to power systems," IEEE Trans. Industrial Electronics, vol. 65, no. 7, pp. 5722–5732, Jul. 2018. doi: 10.1109/TIE.2017.2782205
[17] A. Heydari, "Feedback solution to optimal switching problems with switching cost," IEEE Trans. Neural Networks and Learning Systems, vol. 27, no. 10, pp. 2009–2019, Oct. 2016. doi: 10.1109/TNNLS.2015.2388672
[18] X. Zhong and H. He, "An event-triggered ADP control approach for continuous-time system with unknown internal states," IEEE Trans. Cybernetics, vol. 47, no. 3, pp. 683–694, Mar. 2017. doi: 10.1109/TCYB.2016.2523878
[19] Y. Zhu, D. Zhao, H. He, and J. Ji, "Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4101–4109, May 2017. doi: 10.1109/TIE.2016.2597763
[20] C. Mu, Y. Tang, and H. He, "Improved sliding mode design for load frequency control of power system integrated an adaptive learning strategy," IEEE Trans. Industrial Electronics, vol. 64, no. 8, pp. 6742–6751, Aug. 2017. doi: 10.1109/TIE.2017.2694396
[21] X. Yang, H. He, and X. Zhong, "Adaptive dynamic programming for robust regulation and its application to power systems," IEEE Trans. Industrial Electronics, vol. 65, no. 7, pp. 5722–5732, Jul. 2018. doi: 10.1109/TIE.2017.2782205
[22] L. Dong, X. Zhong, C. Sun, and H. He, "Event-triggered adaptive dynamic programming for continuous-time systems with control constraints," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1941–1952, Aug. 2017. doi: 10.1109/TNNLS.2016.2586303
[23] Y. Wen, J. Si, X. Gao, S. Huang, and H. H. Huang, "A new powered lower limb prosthesis control framework based on adaptive dynamic programming," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 9, pp. 2215–2220, Sept. 2017. doi: 10.1109/TNNLS.2016.2584559
[24] B. Talaei, S. Jagannathan, and J. Singler, "Boundary control of 2-D Burgers' PDE: An adaptive dynamic programming approach," IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3669–3681, Aug. 2018. doi: 10.1109/TNNLS.2017.2736786
[25] X. Xu, Z. Huang, L. Zuo, and H. He, "Manifold-based reinforcement learning via locally linear reconstruction," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 4, pp. 934–947, Apr. 2017. doi: 10.1109/TNNLS.2015.2505084
[26] Z. Wang, R. Lu, F. Gao, and D. Liu, "An indirect data-driven method for trajectory tracking control of a class of nonlinear discrete-time systems," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4121–4129, May 2017. doi: 10.1109/TIE.2016.2617830
[27] F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, "Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers," IEEE Control Systems, vol. 32, no. 6, pp. 76–105, Dec. 2012. doi: 10.1109/MCS.2012.2214134
[28] D. Bertsekas, "Value and policy iterations in optimal control and adaptive dynamic programming," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 500–509, Mar. 2017. doi: 10.1109/TNNLS.2015.2503980
[29] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, "Optimal and autonomous control using reinforcement learning: A survey," IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, Jun. 2018. doi: 10.1109/TNNLS.2017.2773458
[30] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
[31] J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 32, no. 2, pp. 140–153, May 2002. doi: 10.1109/TSMCC.2002.801727
[32] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779–791, May 2005. doi: 10.1016/j.automatica.2004.11.034
[33] Y. Zhu, D. Zhao, X. Yang, and Q. Zhang, "Policy iteration for
[34] B. Luo, D. Liu, T. Huang, and J. Liu, "Output tracking control based on adaptive dynamic programming with multistep policy evaluation," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 49, no. 10, pp. 2155–2165, 2019.
[35] P. Yan, D. Wang, H. Li, and D. Liu, "Error bound analysis of Q-function for discounted optimal control problems with policy iteration," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 47, no. 7, pp. 1207–1216, Jul. 2017. doi: 10.1109/TSMC.2016.2563982
[36] H. Zhang, Y. Liu, G. Xiao, and H. Jiang, "Data-based adaptive dynamic programming for a class of discrete-time systems with multiple delays," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. PP, no. 99, pp. 1–10, 2018.
[37] R. Song, F. L. Lewis, Q. Wei, and H. Zhang, "Off-policy actor-critic structure for optimal control of unknown systems with disturbances," IEEE Trans. Cybernetics, vol. 46, no. 5, pp. 1041–1050, May 2016. doi: 10.1109/TCYB.2015.2421338
[38] J. Skach, B. Kiumarsi, F. L. Lewis, and O. Straka, "Actor-critic off-policy learning for optimal control of multiple-model discrete-time systems," IEEE Trans. Cybernetics, vol. 48, no. 1, pp. 29–40, Jan. 2018. doi: 10.1109/TCYB.2016.2618926
[39] R. Song, F. L. Lewis, and Q. Wei, "Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 704–713, Mar. 2017. doi: 10.1109/TNNLS.2016.2582849
[40] H. Zhang, H. Jiang, Y. Luo, and G. Xiao, "Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4091–4100, May 2017. doi: 10.1109/TIE.2016.2542134
[41] H. Zhang, H. Liang, Z. Wang, and T. Feng, "Optimal output regulation for heterogeneous multiagent systems via adaptive dynamic programming," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 1, pp. 18–29, Jan. 2017. doi: 10.1109/TNNLS.2015.2499757
[42] H. Zhang, H. Jiang, C. Luo, and G. Xiao, "Discrete-time nonzero-sum games for multiplayer using policy-iteration-based adaptive dynamic programming algorithms," IEEE Trans. Cybernetics, vol. 47, no. 10, pp. 3331–3340, Oct. 2017. doi: 10.1109/TCYB.2016.2611613
[43] C. Li, D. Liu, and D. Wang, "Data-based optimal control for weakly coupled nonlinear systems using policy iteration," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 48, no. 4, pp. 511–521, Apr. 2018. doi: 10.1109/TSMC.2016.2606479
[44] J. Zhang, H. Zhang, and T. Feng, "Distributed optimal consensus control for nonlinear multiagent system with unknown dynamic," IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3339–3348, Aug. 2018. doi: 10.1109/TNNLS.2017.2728622
[45] Q. Wei and D. Liu, "A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems," Science China Information Sciences, vol. 58, no. 12, pp. 1–15, Dec. 2015.
[46] D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," IEEE Trans. Neural Networks and Learning Systems, vol. 25, no. 3, pp. 621–634, Mar. 2014. doi: 10.1109/TNNLS.2013.2281663
[47] J. Si and Y.-T. Wang, "On-line learning control by association and reinforcement," IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 264–276, Mar. 2001. doi: 10.1109/72.914523
[48] M. Kelly, "An introduction to trajectory optimization: How to do your own direct collocation," SIAM Review, vol. 59, no. 4, pp. 849–904, 2017.
[49] J. Koenemann, G. Licitra, M. Alp, and M. Diehl, "OpenOCL–Open optimal control library," Robotics: Science and Systems, Jun. 2019.
[50] A. Wächter and L. T. Biegler, "On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006. doi: 10.1007/s10107-004-0559-y
[51] R. Beard, Improving the Closed-Loop Performance of Nonlinear Systems, Ph.D. thesis, Rensselaer Polytechnic Institute, Troy, NY, 1995.