A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation.

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
    CiteScore: 23.5, Top 2% (Q1)
    Google Scholar h5-index: 77, TOP 5
Citation: Y. Zhang, Y. Wang, and Y. Cai, “Value iteration-based distributed adaptive dynamic programming for multi-player differential game with incomplete information,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 2, pp. 1–12, Feb. 2025.

Value Iteration-Based Distributed Adaptive Dynamic Programming for Multi-Player Differential Game With Incomplete Information

Funds: This work was supported by the Aeronautical Science Foundation of China (20220001057001) and an Open Project of the National Key Laboratory of Air-based Information Perception and Fusion (202437).
  • In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to other players' system parameters or control laws. Each player adopts an on-policy value iteration algorithm as its basic learning framework. To deal with the incomplete information structure, each player collects system trajectory data over a period of time to compensate for the missing information. The policy update step is implemented by solving a nonlinear optimization problem that searches for the proximal admissible policy. Theoretical analysis shows that, under the proximal policy search rules, the approximated policies converge to a neighborhood of the equilibrium policies. The efficacy of the method is illustrated by three examples, which also demonstrate that the proposed method accelerates learning compared with the centralized learning framework.
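To make the term "value iteration" concrete, the following is a minimal sketch for a much simpler setting than the one in the paper: a single player, a scalar linear system dx/dt = a x + b u with cost integrand q x^2 + r u^2, and fully known parameters. It is not the authors' distributed, data-driven algorithm. The function name, the step size, and the numerical values are assumptions chosen only for illustration. The value function is taken as V(x) = p x^2, and p is updated by small Euler steps along the Riccati residual, starting from p = 0.

```python
import math


def value_iteration_scalar_lq(a, b, q, r, step=0.01, iters=5000):
    """Illustrative value iteration for a scalar continuous-time LQ problem.

    Hypothetical helper, not taken from the paper. The update
        p <- p + step * (2*a*p + q - (b*p)**2 / r)
    is an Euler step of the Riccati differential equation started at p = 0,
    so p approximates the finite-horizon value coefficient as the horizon grows.
    """
    p = 0.0
    for _ in range(iters):
        # Riccati residual: zero exactly at the stationary value function.
        residual = 2.0 * a * p + q - (b * p) ** 2 / r
        p += step * residual
    return p


# Assumed example values: an open-loop unstable plant with unit weights.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p = value_iteration_scalar_lq(a, b, q, r)
gain = b * p / r  # feedback law u = -gain * x
print(p, gain)
```

For these assumed values the stationary equation reduces to p^2 - 2p - 1 = 0, whose positive root is 1 + sqrt(2), and the resulting feedback gain makes a - b * gain negative, so the closed loop is stable. In the incomplete-information game described above, a, b, and the other players' policies are not available, which is why the paper replaces the model-based residual with quantities estimated from trajectory data.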

     

  • 1 a.e. is an abbreviation for "almost everywhere", meaning that a property holds everywhere except on a set of measure zero.
