Value Iteration-Based Distributed Adaptive Dynamic Programming for Multi-Player Differential Game With Incomplete Information

Yun Zhang; Yuqi Wang; Yunze Cai

doi:10.1109/JAS.2024.124950

Volume 12 Issue 2

Feb. 2025

IEEE/CAA Journal of Automatica Sinica


Y. Zhang, Y. Wang, and Y. Cai, “Value iteration-based distributed adaptive dynamic programming for multi-player differential game with incomplete information,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 2, pp. 436–447, Feb. 2025. doi: 10.1109/JAS.2024.124950


Funds: This work was supported by the Aeronautical Science Foundation of China (20220001057001) and an Open Project of the National Key Laboratory of Air-based Information Perception and Fusion (202437)


Abstract


In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to other players' system parameters or control laws. Each player adopts an on-policy value iteration algorithm as the basic learning framework. To deal with the incomplete information structure, players collect system trajectory data over a period of time to compensate for the missing information. The policy update step is implemented by solving a nonlinear optimization problem that searches for the proximal admissible policy. Theoretical analysis shows that, by adopting proximal policy searching rules, the approximated policies can converge to a neighborhood of the equilibrium policies. The efficacy of the method is illustrated by three examples, which also demonstrate that the proposed method can accelerate the learning process compared with the centralized learning framework.
Keywords
- Distributed adaptive dynamic programming
- Incomplete information
- Multi-player differential game (MPDG)
- Value iteration
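
For orientation, the block below writes out one common textbook formulation of an N-player nonzero-sum differential game and its Nash equilibrium condition. It is a hedged sketch only: the input-affine dynamics, the quadratic control penalty, and all symbols ($f$, $g_j$, $Q_i$, $R_{ij}$, $u_{-i}$) are assumptions for illustration and are not taken from the paper, whose exact model may differ.

```latex
% Assumed input-affine dynamics driven by N players
\dot{x}(t) = f(x(t)) + \sum_{j=1}^{N} g_j(x(t))\, u_j(t), \qquad x(0) = x_0

% Assumed cost of player i (to be minimized)
J_i(x_0, u_1, \dots, u_N) = \int_{0}^{\infty}
  \Big( Q_i(x(t)) + \sum_{j=1}^{N} u_j(t)^{\top} R_{ij}\, u_j(t) \Big)\, dt

% Nash equilibrium: no player can lower its own cost by deviating alone
J_i(x_0, u_i^{*}, u_{-i}^{*}) \le J_i(x_0, u_i, u_{-i}^{*})
  \quad \text{for all admissible } u_i, \; i = 1, \dots, N
```

Under incomplete information as described in the abstract, player $i$ would not know the other players' dynamics terms or their policies $u_{-i}$, which is why trajectory data are used in their place.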


¹ a.e. is an abbreviation for almost everywhere, meaning that a property holds everywhere except on a set of measure zero.




Highlights

This paper presents a novel distributed value iteration (VI)-based adaptive dynamic programming (ADP) method for multi-player differential game models with incomplete information.
The incomplete information structure is characterized by the limited information available from neighbors: each player has no access to its neighbors' objective functions or control policies.
Each player learns asynchronously and independently, without knowing which of the other players update their policies or when those updates occur.
The distributed VI is implemented by an on-policy ADP algorithm in which the approximator weights of the iterated value functions are updated by nonlinear programming that seeks the nearest admissible policy.
The theoretical analysis shows that the estimated value functions obtained by distributed VI can converge, in the L1-norm, to a neighborhood of the equilibrium value functions. Furthermore, with appropriate approximating basis functions, the estimated policies can converge in the Euclidean norm to the equilibrium policies.
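
To make the idea of asynchronous, per-player value iteration concrete, here is a minimal Python sketch on a toy problem. It is not the authors' algorithm: it assumes a scalar linear-quadratic game with dynamics dx/dt = a·x + Σ b_j·u_j, costs ∫(q_i·x² + r_i·u_i²)dt, and value functions V_i(x) = p_i·x², and it evaluates the coupling term from the other players' parameters directly. In the paper's setting that information is unavailable and is replaced by trajectory data, and the update is a nonlinear program rather than this simple step. All names (`riccati_residual`, `async_value_iteration`, `step`, `iters`) are hypothetical.

```python
import random


def riccati_residual(i, p, a, s, q):
    """Residual of player i's coupled scalar Riccati equation at p.

    Assumed toy model: with u_j = -(b_j p_j / r_j) x and s_j = b_j^2 / r_j,
    player i's HJB equation reduces to
        0 = q_i + 2 a p_i - s_i p_i^2 - 2 p_i * sum_{j != i} s_j p_j.
    """
    coupling = sum(s[j] * p[j] for j in range(len(p)) if j != i)
    return q[i] + 2.0 * a * p[i] - s[i] * p[i] ** 2 - 2.0 * p[i] * coupling


def async_value_iteration(a, b, r, q, step=0.05, iters=2000, seed=0):
    """Each iteration, one randomly chosen player takes a VI-style step.

    The others keep their current value estimates, mimicking (very loosely)
    asynchronous, independent learning. Model knowledge is used here only
    for illustration; the paper's method is data-driven.
    """
    rng = random.Random(seed)
    n = len(b)
    s = [b[i] ** 2 / r[i] for i in range(n)]
    p = [0.0] * n  # start from the zero value function
    for _ in range(iters):
        i = rng.randrange(n)  # which player updates is random
        p[i] += step * riccati_residual(i, p, a, s, q)
    gains = [b[i] * p[i] / r[i] for i in range(n)]  # u_i = -gains[i] * x
    return p, gains


# Symmetric two-player example; the equilibrium solves 3p^2 + 2p - 1 = 0.
p, gains = async_value_iteration(a=-1.0, b=[1.0, 1.0], r=[1.0, 1.0], q=[1.0, 1.0])
print(p, gains)
```

For this symmetric example both value coefficients approach 1/3, the positive root of 3p² + 2p − 1 = 0, regardless of the order in which the players happen to update. The small fixed step plays the role of the VI step size; the convergence behavior shown here is a property of this toy problem only and says nothing about the general nonlinear case treated in the paper.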
