IEEE/CAA Journal of Automatica Sinica
Citation: G. Y. Zhu, X. L. Li, R. R. Sun, Y. Y. Yang, and P. Zhang, "Policy iteration for optimal control of discrete-time time-varying nonlinear systems," IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 781–791, Mar. 2023. doi: 10.1109/JAS.2023.123096
[1] F. Song, Y. Liu, J.-X. Xu, X. Yang, He, and Z. Yang, "Iterative learning identification and compensation of space-periodic disturbance in PMLSM systems with time delay," IEEE Trans. Industrial Electronics, vol. 65, no. 9, pp. 7579–7589, Sept. 2018. doi: 10.1109/TIE.2017.2777387
[2] T. Haidegger, L. Kovács, R.-E. Precup, S. Preitl, B. Benyó, and Z. Benyó, "Cascade control for telerobotic systems serving space medicine," IFAC Proceedings Volumes, vol. 44, no. 1, pp. 3759–3764, Jan. 2011. doi: 10.3182/20110828-6-IT-1002.02482
[3] Z. Cao, Q. Xiao, R. Huang, and M. Zhou, "Robust neuro-optimal control of underactuated snake robots with experience replay," IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 1, pp. 208–217, Jan. 2018. doi: 10.1109/TNNLS.2017.2768820
[4] R.-C. Roman, R.-E. Precup, and E. M. Petriu, "Hybrid data-driven fuzzy active disturbance rejection control for tower crane systems," European Journal of Control, vol. 58, pp. 373–387, Mar. 2021. doi: 10.1016/j.ejcon.2020.08.001
[5] J. Hu, J. Duan, H. Ma, and M.-Y. Chow, "Distributed adaptive droop control for optimal power dispatch in DC microgrid," IEEE Trans. Industrial Electronics, vol. 65, no. 1, pp. 778–789, Jan. 2018. doi: 10.1109/TIE.2017.2698425
[6] S. Wu, X. Zhao, Z. Jiao, C.-K. Luk, and C. Jiu, "Multi-objective optimal design of a toroidally wound radial-flux Halbach permanent magnet array limited angle torque motor," IEEE Trans. Industrial Electronics, vol. 64, no. 4, pp. 2962–2971, Apr. 2017. doi: 10.1109/TIE.2016.2632067
[7] J. Qiu, T. Wang, S. Yin, and H. Gao, "Data-based optimal control for networked double-layer industrial processes," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4179–4186, May 2017. doi: 10.1109/TIE.2016.2608902
[8] C. Zhang, S. Zhang, G. Han, and H. Liu, "Power management comparison for a dual-motor-propulsion system used in a battery electric bus," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 3873–3882, May 2017. doi: 10.1109/TIE.2016.2645166
[9] K. G. Vamvoudakis and J. Hespanha, "Online optimal operation of parallel voltage-source inverters using partial information," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4296–4305, May 2017. doi: 10.1109/TIE.2016.2630658
[10] Y. Yang, D. Wunsch, and Y. Yin, "Hamiltonian-driven adaptive dynamic programming for continuous nonlinear dynamical systems," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1929–1940, Aug. 2017. doi: 10.1109/TNNLS.2017.2654324
[11] A. Sahoo, H. Xu, and S. Jagannathan, "Approximate optimal control of affine nonlinear continuous-time systems using event-sampled neurodynamic programming," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 639–652, Mar. 2017. doi: 10.1109/TNNLS.2016.2539366
[12] X. Xu, H. Yang, C. Lian, and J. Liu, "Self-learning control using dual heuristic programming with global Laplacian eigenmaps," IEEE Trans. Industrial Electronics, vol. 64, no. 12, pp. 9517–9526, Dec. 2017. doi: 10.1109/TIE.2017.2708002
[13] T. Bian, Y. Jiang, and Z.-P. Jiang, "Decentralized adaptive optimal control of large-scale systems with application to power systems," IEEE Trans. Industrial Electronics, vol. 62, no. 4, pp. 2439–2447, Apr. 2015. doi: 10.1109/TIE.2014.2345343
[14] P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Systems Yearbook, vol. 22, pp. 25–38, 1977.
[15] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge: MIT Press, 1991, pp. 67–95.
[16] X. Yang, H. He, and X. Zhong, "Adaptive dynamic programming for robust regulation and its application to power systems," IEEE Trans. Industrial Electronics, vol. 65, no. 7, pp. 5722–5732, Jul. 2018. doi: 10.1109/TIE.2017.2782205
[17] A. Heydari, "Feedback solution to optimal switching problems with switching cost," IEEE Trans. Neural Networks and Learning Systems, vol. 27, no. 10, pp. 2009–2019, Oct. 2016. doi: 10.1109/TNNLS.2015.2388672
[18] X. Zhong and H. He, "An event-triggered ADP control approach for continuous-time system with unknown internal states," IEEE Trans. Cybernetics, vol. 47, no. 3, pp. 683–694, Mar. 2017. doi: 10.1109/TCYB.2016.2523878
[19] Y. Zhu, D. Zhao, H. He, and J. Ji, "Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4101–4109, May 2017. doi: 10.1109/TIE.2016.2597763
[20] C. Mu, Y. Tang, and H. He, "Improved sliding mode design for load frequency control of power system integrated an adaptive learning strategy," IEEE Trans. Industrial Electronics, vol. 64, no. 8, pp. 6742–6751, Aug. 2017. doi: 10.1109/TIE.2017.2694396
[21] X. Yang, H. He, and X. Zhong, "Adaptive dynamic programming for robust regulation and its application to power systems," IEEE Trans. Industrial Electronics, vol. 65, no. 7, pp. 5722–5732, Jul. 2018. doi: 10.1109/TIE.2017.2782205
[22] L. Dong, X. Zhong, C. Sun, and H. He, "Event-triggered adaptive dynamic programming for continuous-time systems with control constraints," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1941–1952, Aug. 2017. doi: 10.1109/TNNLS.2016.2586303
[23] Y. Wen, J. Si, X. Gao, S. Huang, and H. H. Huang, "A new powered lower limb prosthesis control framework based on adaptive dynamic programming," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 9, pp. 2215–2220, Sept. 2017. doi: 10.1109/TNNLS.2016.2584559
[24] B. Talaei, S. Jagannathan, and J. Singler, "Boundary control of 2-D Burgers' PDE: An adaptive dynamic programming approach," IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3669–3681, Aug. 2018. doi: 10.1109/TNNLS.2017.2736786
[25] X. Xu, Z. Huang, L. Zuo, and H. He, "Manifold-based reinforcement learning via locally linear reconstruction," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 4, pp. 934–947, Apr. 2017. doi: 10.1109/TNNLS.2015.2505084
[26] Z. Wang, R. Lu, F. Gao, and D. Liu, "An indirect data-driven method for trajectory tracking control of a class of nonlinear discrete-time systems," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4121–4129, May 2017. doi: 10.1109/TIE.2016.2617830
[27] F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, "Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers," IEEE Control Systems, vol. 32, no. 6, pp. 76–105, Dec. 2012. doi: 10.1109/MCS.2012.2214134
[28] D. Bertsekas, "Value and policy iterations in optimal control and adaptive dynamic programming," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 500–509, Mar. 2017. doi: 10.1109/TNNLS.2015.2503980
[29] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, "Optimal and autonomous control using reinforcement learning: A survey," IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, Jun. 2018. doi: 10.1109/TNNLS.2017.2773458
[30] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
[31] J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 32, no. 2, pp. 140–153, May 2002. doi: 10.1109/TSMCC.2002.801727
[32] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779–791, May 2005. doi: 10.1016/j.automatica.2004.11.034
[33] Y. Zhu, D. Zhao, X. Yang, and Q. Zhang, "Policy iteration for
[34] B. Luo, D. Liu, T. Huang, and J. Liu, "Output tracking control based on adaptive dynamic programming with multistep policy evaluation," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 49, no. 10, pp. 2155–2165, 2019.
[35] P. Yan, D. Wang, H. Li, and D. Liu, "Error bound analysis of Q-function for discounted optimal control problems with policy iteration," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 47, no. 7, pp. 1207–1216, Jul. 2017. doi: 10.1109/TSMC.2016.2563982
[36] H. Zhang, Y. Liu, G. Xiao, and H. Jiang, "Data-based adaptive dynamic programming for a class of discrete-time systems with multiple delays," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. PP, no. 99, pp. 1–10, 2018.
[37] R. Song, F. L. Lewis, Q. Wei, and H. Zhang, "Off-policy actor-critic structure for optimal control of unknown systems with disturbances," IEEE Trans. Cybernetics, vol. 46, no. 5, pp. 1041–1050, May 2016. doi: 10.1109/TCYB.2015.2421338
[38] J. Skach, B. Kiumarsi, F. L. Lewis, and O. Straka, "Actor-critic off-policy learning for optimal control of multiple-model discrete-time systems," IEEE Trans. Cybernetics, vol. 48, no. 1, pp. 29–40, Jan. 2018. doi: 10.1109/TCYB.2016.2618926
[39] R. Song, F. L. Lewis, and Q. Wei, "Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 704–713, Mar. 2017. doi: 10.1109/TNNLS.2016.2582849
[40] H. Zhang, H. Jiang, Y. Luo, and G. Xiao, "Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method," IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4091–4100, May 2017. doi: 10.1109/TIE.2016.2542134
[41] H. Zhang, H. Liang, Z. Wang, and T. Feng, "Optimal output regulation for heterogeneous multiagent systems via adaptive dynamic programming," IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 1, pp. 18–29, Jan. 2017. doi: 10.1109/TNNLS.2015.2499757
[42] H. Zhang, H. Jiang, C. Luo, and G. Xiao, "Discrete-time nonzero-sum games for multiplayer using policy-iteration-based adaptive dynamic programming algorithms," IEEE Trans. Cybernetics, vol. 47, no. 10, pp. 3331–3340, Oct. 2017. doi: 10.1109/TCYB.2016.2611613
[43] C. Li, D. Liu, and D. Wang, "Data-based optimal control for weakly coupled nonlinear systems using policy iteration," IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 48, no. 4, pp. 511–521, Apr. 2018. doi: 10.1109/TSMC.2016.2606479
[44] J. Zhang, H. Zhang, and T. Feng, "Distributed optimal consensus control for nonlinear multiagent system with unknown dynamic," IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3339–3348, Aug. 2018. doi: 10.1109/TNNLS.2017.2728622
[45] Q. Wei and D. Liu, "A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems," Science China Information Sciences, vol. 58, no. 12, pp. 1–15, Dec. 2015.
[46] D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," IEEE Trans. Neural Networks and Learning Systems, vol. 25, no. 3, pp. 621–634, Mar. 2014. doi: 10.1109/TNNLS.2013.2281663
[47] J. Si and Y.-T. Wang, "On-line learning control by association and reinforcement," IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 264–276, Mar. 2001. doi: 10.1109/72.914523
[48] M. Kelly, "An introduction to trajectory optimization: How to do your own direct collocation," SIAM Review, vol. 59, no. 4, pp. 849–904, 2017.
[49] J. Koenemann, G. Licitra, M. Alp, and M. Diehl, "OpenOCL–Open optimal control library," Robotics: Science and Systems, Jun. 2019.
[50] A. Wächter and L. T. Biegler, "On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006. doi: 10.1007/s10107-004-0559-y
[51] R. Beard, Improving the Closed-Loop Performance of Nonlinear Systems, Ph.D. thesis, Rensselaer Polytechnic Institute, Troy, NY, 1995.