A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 10, Issue 3, Mar. 2023

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: G. Y. Zhu, X. L. Li, R. R. Sun, Y. Y. Yang, and P. Zhang, “Policy iteration for optimal control of discrete-time time-varying nonlinear systems,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 781–791, Mar. 2023. doi: 10.1109/JAS.2023.123096

Policy Iteration for Optimal Control of Discrete-Time Time-Varying Nonlinear Systems

doi: 10.1109/JAS.2023.123096
Funds: This work was supported in part by the Fundamental Research Funds for the Central Universities (2022JBZX024) and in part by the National Natural Science Foundation of China (61872037, 61273167).
Abstract: In this paper, a new iterative adaptive dynamic programming (ADP) algorithm, the discrete-time time-varying (DTTV) policy iteration algorithm, is developed for infinite-horizon optimal control problems of discrete-time time-varying nonlinear systems. The iterative control law is designed to update the iterative value function, which approximates the optimal performance index function. The admissibility of the iterative control law is analyzed, and it is shown that the iterative value function converges non-increasingly to the optimal solution of the Bellman equation. To implement the algorithm, neural networks are employed and a new implementation structure is established that avoids solving the generalized Bellman equation in each iteration. Finally, the DTTV policy iteration algorithm is used to obtain the optimal control laws for torsional pendulum and inverted pendulum systems, where the mass and pendulum bar length are permitted to be time-varying parameters. Numerical results and comparisons illustrate the effectiveness of the developed method.
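
To make the iteration concrete, the sketch below illustrates the two alternating steps of policy iteration (evaluation of the current control law, then one-step greedy improvement) on a toy time-varying plant. It is not the paper's DTTV algorithm: the paper uses neural-network approximation and an undiscounted cost, whereas this tabular toy assumes the time variation is periodic with period T, so that (k mod T, x) is a finite augmented state, and adds a discount factor gamma < 1 so the evaluation sweeps converge. The plant, grids, and all names are illustrative assumptions.

    import numpy as np

    # Tabular policy-iteration sketch for a time-varying plant (toy only).
    # Assumptions: periodic dynamics with period T, nearest-grid-point
    # transitions, and a discount gamma < 1 for convergence of the sweeps.
    T, gamma = 8, 0.95
    xs = np.linspace(-2.0, 2.0, 41)                  # state grid
    us = np.linspace(-2.0, 2.0, 41)                  # control grid

    def a(k):                                        # hypothetical periodic gain
        return 0.8 + 0.3 * np.sin(2.0 * np.pi * k / T)

    def step(ix, iu, k):                             # nearest-grid-point dynamics
        x_next = np.clip(a(k) * xs[ix] + us[iu], xs[0], xs[-1])
        return int(np.abs(xs - x_next).argmin())

    def utility(ix, iu):                             # U(x, u) = x^2 + u^2
        return xs[ix] ** 2 + us[iu] ** 2

    V = np.zeros((T, xs.size))
    policy = np.full((T, xs.size), us.size // 2)     # initial law: u = 0 everywhere

    for _ in range(50):                              # outer policy iteration
        # Policy evaluation: sweep the fixed-policy Bellman equation to a fixed point.
        for _ in range(200):
            for k in range(T):
                for ix in range(xs.size):
                    iu = policy[k, ix]
                    V[k, ix] = utility(ix, iu) + gamma * V[(k + 1) % T, step(ix, iu, k)]
        # Policy improvement: one-step greedy minimization against the new value.
        new_policy = policy.copy()
        for k in range(T):
            for ix in range(xs.size):
                q = [utility(ix, iu) + gamma * V[(k + 1) % T, step(ix, iu, k)]
                     for iu in range(us.size)]
                new_policy[k, ix] = int(np.argmin(q))
        if np.array_equal(new_policy, policy):       # stop when the law is stable
            break
        policy = new_policy

    print("approximate optimal control at x = 1, k = 0:",
          us[policy[0, int(np.abs(xs - 1.0).argmin())]])

The periodicity and discounting assumptions serve only to keep this toy finite and convergent; the paper's neural-network implementation is precisely what removes the need for such restrictions.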

  • [1]
    F. Song, Y. Liu, J.-X. Xu, X. Yang, He, and Z. Yang, “Iterative learning identification and compensation of space-periodic disturbance in PMLSM systems with time delay,” IEEE Trans. Industrial Electronics, vol. 65, no. 9, pp. 7579–7589, Sept. 2018. doi: 10.1109/TIE.2017.2777387
    [2]
    T. Haidegger, L. Kovács, R.-E. Precup, S. Preitl, B. Benyó, and Z. Benyó, “Cascade control for telerobotic systems serving space medicine,” IFAC Proceedings Volumes, vol. 44, no. 1, pp. 3759–3764, Jan. 2011. doi: 10.3182/20110828-6-IT-1002.02482
    [3]
    Z. Cao, Q. Xiao, R. Huang, and M. Zhou, “Robust neuro-optimal control of underactuated snake robots with experience replay,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 1, pp. 208–217, Jan. 2018. doi: 10.1109/TNNLS.2017.2768820
    [4]
    R.-C. Roman, R.-E. Precup, and E. M. Petriu, “Hybrid data-driven fuzzy active disturbance rejection control for tower crane systems,” European Journal of Control, vol. 58, pp. 373–387, Mar. 2021. doi: 10.1016/j.ejcon.2020.08.001
    [5]
    J. Hu, J. Duan, H. Ma, and M.-Y. Chow, “Distributed adaptive droop control for optimal power dispatch in DC microgrid,” IEEE Trans. Industrial Electronics, vol. 65, no. 1, pp. 778–789, Jan. 2018. doi: 10.1109/TIE.2017.2698425
    [6]
    S. Wu, X. Zhao, Z. Jiao, C.-K. Luk, and C. Jiu, “Multi-objective optimal design of a toroidally wound radial-flux halbach permanent magnet array limited angle torque motor,” IEEE Trans. Industrial Electronics, vol. 64, no. 4, pp. 2962–2971, Apr. 2017. doi: 10.1109/TIE.2016.2632067
    [7]
    J. Qiu, T. Wang, S. Yin, and H. Gao, “Data-based optimal control for networked double-layer industrial processes,” IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4179–4186, May 2017. doi: 10.1109/TIE.2016.2608902
    [8]
    C. Zhang, S. Zhang, G. Han, and H. Liu, “Power management comparison for a dual-motor-propulsion system used in a battery electric bus,” IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 3873–3882, May 2017. doi: 10.1109/TIE.2016.2645166
    [9]
    K. G. Vamvoudakis and J. Hespanha, “Online optimal operation of parallel voltage-source inverters using partial information,” IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4296–4305, May 2017. doi: 10.1109/TIE.2016.2630658
    [10]
    Y. Yang, D. Wunsch, and Y. Yin, “Hamiltonian-driven adaptive dynamic programming for continuous nonlinear dynamical systems,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1929–1940, Aug. 2017. doi: 10.1109/TNNLS.2017.2654324
    [11]
    A. Sahoo, H. Xu, and S. Jagannathan, “Approximate optimal control of affine nonlinear continuous-time systems using event-sampled neurodynamic programming,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 639–652, Mar. 2017. doi: 10.1109/TNNLS.2016.2539366
    [12]
    X. Xu, H. Yang, C. Lian, and J. Liu, “Self-learning control using dual heuristic programming with global Laplacian eigenmaps,” IEEE Trans. Industrial Electronics, vol. 64, no. 12, pp. 9517–9526, Dec. 2017. doi: 10.1109/TIE.2017.2708002
    [13]
    T. Bian, Y. Jiang, and Z.-P. Jiang, “Decentralized adaptive optimal control of large-scale systems with application to power systems,” IEEE Trans. Industrial Electronics, vol. 62, no. 4, pp. 2439–2447, Apr. 2015. doi: 10.1109/TIE.2014.2345343
    [14]
    J. Werbos, “Advanced forecasting methods for global crisis warning and models of intelligence,” General Systems Yearbook, vol. 22, pp. 25–38, 1977.
    [15]
    P. J. Werbos, “A menu of designs for reinforcement learning over time,” in Neural Networks for Control, W. T. Miller, R. S. Sutton and P. J. Werbos, Eds. Cambridge: MIT Press, 1991, pp. 67–95.
    [16]
    X. Yang, H. He, and X. Zhong, “Adaptive dynamic programming for robust regulation and its application to power systems,” IEEE Trans. Industrial Electronics, vol. PP, no. 99, pp. 1–10, 2017.
    [17]
    A. Heydari, “Feedback solution to optimal switching problems with switching cost,” IEEE Trans. Neural Networks and Learning Systems, vol. 27, no. 10, pp. 2009–2019, Oct. 2016. doi: 10.1109/TNNLS.2015.2388672
    [18]
    X. Zhong and H. He, “An event-triggered ADP control approach for continuous-time system with unknown internal states,” IEEE Trans. Cybernetics, vol. 47, no. 3, pp. 683–694, Mar. 2017. doi: 10.1109/TCYB.2016.2523878
    [19]
    Y. Zhu, D. Zhao, H. He, and J. Ji, “Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming,” IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4101–4109, May 2017. doi: 10.1109/TIE.2016.2597763
    [20]
    C. Mu, Y. Tang, and H. He, “Improved sliding mode design for load frequency control of power system integrated an adaptive learning strategy,” IEEE Trans. Industrial Electronics, vol. 64, no. 8, pp. 6742–6751, Aug. 2017. doi: 10.1109/TIE.2017.2694396
    [21]
    X. Yang, H. He, and X. Zhong, “Adaptive dynamic programming for robust regulation and its application to power systems,” IEEE Trans. Industrial Electronics, vol. 65, no. 7, pp. 5722–5732, Jul. 2018. doi: 10.1109/TIE.2017.2782205
    [22]
    L. Dong, X. Zhong, C. Sun, and H. He, “Event-triggered adaptive dynamic programming for continuous-time systems with control constraints,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1941–1952, Aug. 2017. doi: 10.1109/TNNLS.2016.2586303
    [23]
    Y. Wen, J. Si, X. Gao, S. Huang, and H. H. Huang, “A new powered lower limb prosthesis control framework based on adaptive dynamic programming,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 9, pp. 2215–2220, Sept. 2017. doi: 10.1109/TNNLS.2016.2584559
    [24]
    B. Talaei, S. Jagannathan, and J. Singler, “Boundary control of 2-D Burgers’ PDE: An adaptive dynamic programming approach,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3669–3681, Aug. 2018. doi: 10.1109/TNNLS.2017.2736786
    [25]
    X. Xu, Z. Huang, L. Zuo, and H. He, “Manifold-based reinforcement learning via locally linear reconstruction,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 4, pp. 934–947, Apr. 2017. doi: 10.1109/TNNLS.2015.2505084
    [26]
    Z. Wang, R. Lu, F. Gao, and D. Liu, “An indirect data-driven method for tajectory tracking control of a class of nonlinear discrete-time systems,” IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4121–4129, May 2017. doi: 10.1109/TIE.2016.2617830
    [27]
    F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,” IEEE Control Systems, vol. 32, no. 6, pp. 76–105, Dec. 2012. doi: 10.1109/MCS.2012.2214134
    [28]
    D. Bertsekas, “Value and policy iterations in optimal control and adaptive dynamic programming,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 500–509, Mar. 2017. doi: 10.1109/TNNLS.2015.2503980
    [29]
    B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, Jun. 2018. doi: 10.1109/TNNLS.2017.2773458
    [30]
    D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
    [31]
    J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, “Adaptive dynamic programming,” IEEE Trans. Systems,Man,and Cybernetics-Part C: Applications and Reviews, vol. 32, no. 2, pp. 140–153, May 2002. doi: 10.1109/TSMCC.2002.801727
    [32]
    M. Abu-Khalaf and F. L. Lewis, “Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach,” Automatica, vol. 41, no. 5, pp. 779–791, May 2005. doi: 10.1016/j.automatica.2004.11.034
    [33]
    Y. Zhu, D. Zhao, X. Yang, and Q. Zhang, “Policy iteration for $H_\infty$ optimal control of polynomial nonlinear systems via sum of squares programming” IEEE Trans. Cybernetics, vol. 48, no. 2, pp. 500–509, Feb. 2018. doi: 10.1109/TCYB.2016.2643687
    [34]
    B. Luo, D. Liu, T. Huang, and J. Liu, “Output tracking control based on adaptive dynamic programming with multistep policy evaluation,” IEEE Trans. Systems,Man,and Cybernetics: Systems, vol. 49, no. 10, pp. 2155–2165, 2019.
    [35]
    P. Yan, D. Wang, H. Li, and D. Liu, “Error bound analysis of Q-function for discounted optimal control problems with policy iteration,” IEEE Trans. Systems,Man,and Cybernetics: Systems, vol. 47, no. 7, pp. 1207–1216, Jul. 2017. doi: 10.1109/TSMC.2016.2563982
    [36]
    H. Zhang, Y. Liu, G. Xiao, and H. Jiang, “Data-based adaptive dynamic programming for a class of discrete-time systems with multiple delays,” IEEE Trans. Systems,Man,and Cybernetics: Systems, vol. PP, no. 99, pp. 1–10, 2018.
    [37]
    R. Song, F. L. Lewis, Q. Wei, and H. Zhang, “Off-policy actor-critic structure for optimal control of unknown systems with disturbances,” IEEE Trans. Cybernetics, vol. 46, no. 5, pp. 1041–1050, May 2016. doi: 10.1109/TCYB.2015.2421338
    [38]
    J. Skach, B. Kiumarsi, F. L. Lewis, and O. Straka, “Actor-critic off-policy learning for optimal control of multiple-model discrete-time systems,” IEEE Trans. Cybernetics, vol. 48, no. 1, pp. 29–40, Jan. 2018. doi: 10.1109/TCYB.2016.2618926
    [39]
    R. Song, F. L. Lewis, and Q. Wei, “Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 704–713, Mar. 2017. doi: 10.1109/TNNLS.2016.2582849
    [40]
    H. Zhang, H. Jiang, Y. Luo, and G. Xiao, “Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method,” IEEE Trans. Industrial Electronics, vol. 64, no. 5, pp. 4091–4100, May 2017. doi: 10.1109/TIE.2016.2542134
    [41]
    H. Zhang, H. Liang, Z. Wang, and T. Feng, “Optimal output regulation for heterogeneous multiagent systems via adaptive dynamic programming,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 1, pp. 18–29, Jan. 2017. doi: 10.1109/TNNLS.2015.2499757
    [42]
    H. Zhang, H. Jiang, C. Luo, and G. Xiao, “Discrete-time nonzero-sum games for multiplayer using policy-iteration-based adaptive dynamic programming algorithms,” IEEE Trans. Cybernetics, vol. 47, no. 10, pp. 3331–3340, Oct. 2017. doi: 10.1109/TCYB.2016.2611613
    [43]
    C. Li, D. Liu, and D. Wang, “Data-based optimal control for weakly coupled nonlinear systems using policy iteration,” IEEE Trans. Systems,Man,and Cybernetics: Systems, vol. 48, no. 4, pp. 511–521, Apr. 2018. doi: 10.1109/TSMC.2016.2606479
    [44]
    J. Zhang, H. Zhang, and T. Feng, “Distributed optimal consensus control for nonlinear multiagent system with unknown dynamic,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3339–3348, Aug. 2018. doi: 10.1109/TNNLS.2017.2728622
    [45]
    Q. Wei and D. Liu, “A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems,” Science China Information Sciences, vol. 58, no. 12, pp. 1–15, Dec. 2015.
    [46]
    D. Liu and Q. Wei, “Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems,” IEEE Trans. Neural Networks and Learning Systems, vol. 25, no. 3, pp. 621–634, Mar. 2014. doi: 10.1109/TNNLS.2013.2281663
    [47]
    J. Si and Y.-T. Wang, “On-line learning control by association and reinforcement,” IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 264–276, Mar. 2001. doi: 10.1109/72.914523
    [48]
    M. Kelly. “An introduction to trajectory optimization: How to do your own direct collocation,” SIAM Review, vol. 59, no. 4, pp. 849–904, 2017.
    [49]
    J. Koenemann, G. Licitra, M. Alp, M. Diehl, “OpenOCL–Open optimal control library,” Robotics Science and Systems, Jun. 2019.
    [50]
    A. Wächter and L. T. Biegler, “On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming,” Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006. doi: 10.1007/s10107-004-0559-y
    [51]
    R. Beard, Improving the Closed-Loop Performance of Nonlinear Systems, Ph.D. Thesis, Rensselaer Polytechnic Institute, Troy, NY, 1995.


    Highlights

    • A novel iterative adaptive dynamic programming method is presented for the infinite-horizon optimal control problem of discrete-time time-varying nonlinear systems
    • The properties of the discrete-time time-varying policy iteration method, including monotonicity, convergence, and optimality, are analyzed in detail
    • A critic neural network and an actor neural network are introduced to implement the presented method (a generic actor-critic sketch follows this list)
    • Simulation results show that the presented method obtains the optimal control law and the optimal performance index function, verifying its correctness
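
The sketch below mirrors the critic/actor structure named above using linear function approximation: radial-basis features stand in for the critic and actor neural networks, and the plant, features, sampling scheme, and added discount factor are assumptions made for a self-contained toy, not the authors' network design.

    import numpy as np

    # Illustrative critic/actor loop with linear-in-features approximators.
    # RBF features stand in for the paper's neural networks (toy setup only).
    rng = np.random.default_rng(0)
    T = 8                                            # assumed period of the dynamics
    centers = np.linspace(-2.0, 2.0, 9)              # RBF centers over the state range
    u_grid = np.linspace(-2.0, 2.0, 41)              # candidate controls for the actor

    def phi(x, k):
        # Features: RBFs in x crossed with a one-hot encoding of k mod T.
        rbf = np.exp(-(x - centers) ** 2)
        return np.outer(np.eye(T)[k % T], rbf).ravel()

    def f(x, u, k):                                  # hypothetical time-varying plant
        return (0.8 + 0.3 * np.sin(2.0 * np.pi * k / T)) * x + u

    def U(x, u):                                     # quadratic utility
        return x ** 2 + u ** 2

    w_c = np.zeros(T * centers.size)                 # critic weights: V(x,k) ~ w_c . phi
    w_a = np.zeros(T * centers.size)                 # actor weights:  u(x,k) ~ w_a . phi

    for _ in range(30):
        X = rng.uniform(-2.0, 2.0, 256)              # sampled states
        K = rng.integers(0, T, 256)                  # sampled time indices
        Phi = np.array([phi(x, k) for x, k in zip(X, K)])
        # Critic update: least-squares fit of fixed-policy Bellman targets.
        u = Phi @ w_a
        Phi_next = np.array([phi(f(x, ui, k), k + 1) for x, ui, k in zip(X, u, K)])
        w_c = np.linalg.lstsq(Phi, U(X, u) + 0.95 * Phi_next @ w_c, rcond=None)[0]
        # Actor update: regress toward the greedy control found by grid search.
        greedy = [u_grid[np.argmin([U(x, v) + 0.95 * phi(f(x, v, k), k + 1) @ w_c
                                    for v in u_grid])] for x, k in zip(X, K)]
        w_a = np.linalg.lstsq(Phi, np.array(greedy), rcond=None)[0]

    print("learned control at x = 1, k = 0:", phi(1.0, 0) @ w_a)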
