Citation: Z. Peng, G. Wu, B. Luo, and L. Wang, “Multi-UAV cooperative pursuit strategy with limited visual field in urban airspace: A multi-agent reinforcement learning approach,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2024.124965
[1] Z. Zuo, C. Liu, Q.-L. Han, and J. Song, “Unmanned aerial vehicles: Control methods and future challenges,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 4, pp. 601–614, 2022. doi: 10.1109/JAS.2022.105410
[2] C. de Souza Junior, “Hunter drones: Drones cooperation for tracking an intruder drone,” Ph.D. dissertation, University of Technology of Compiègne, Compiègne, France, 2021.
[3] T. Lefebvre and T. Dubot, “Conceptual design study of an anti-drone drone,” in Proc. 16th AIAA Aviation Technology, Integration, and Operations Conf., 2016, p. 3449.
[4] T. H. Chung, G. A. Hollinger, and V. Isler, “Search and pursuit-evasion in mobile robotics: A survey,” Autonomous Robots, vol. 31, pp. 299–316, 2011. doi: 10.1007/s10514-011-9241-4
[5] R. Zhang, Q. Zong, X. Zhang, L. Dou, and B. Tian, “Game of drones: Multi-UAV pursuit-evasion game with online motion planning by deep reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, vol. 34, no. 10, pp. 7900–7909, 2023.
[6] Y.-C. Lai and T.-Y. Lin, “Vision-based mid-air object detection and avoidance approach for small unmanned aerial vehicles with deep learning and risk assessment,” Remote Sensing, vol. 16, no. 5, p. 756, 2024. doi: 10.3390/rs16050756
[7] C. De Souza, R. Newbury, A. Cosgun, P. Castillo, B. Vidolov, and D. Kulić, “Decentralized multi-agent pursuit using deep reinforcement learning,” IEEE Robotics and Automation Lett., vol. 6, no. 3, pp. 4552–4559, 2021. doi: 10.1109/LRA.2021.3068952
[8] T. Olsen, A. M. Tumlin, N. M. Stiffler, and J. M. O’Kane, “A visibility roadmap sampling approach for a multi-robot visibility-based pursuit-evasion problem,” in Proc. IEEE Int. Conf. Robotics and Automation, 2021, pp. 7957–7964.
[9] I. E. Weintraub, M. Pachter, and E. Garcia, “An introduction to pursuit-evasion differential games,” in Proc. American Control Conf., 2020, pp. 1049–1066.
[10] N. Chen, L. Li, and W. Mao, “Equilibrium strategy of the pursuit-evasion game in three-dimensional space,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 2, pp. 1–13, 2024. doi: 10.1109/JAS.2024.124239
[11] J. Chen, W. Zha, Z. Peng, and D. Gu, “Multi-player pursuit–evasion games with one superior evader,” Automatica, vol. 71, pp. 24–32, 2016. doi: 10.1016/j.automatica.2016.04.012
[12] X. Qu, W. Gan, D. Song, and L. Zhou, “Pursuit-evasion game strategy of USV based on deep reinforcement learning in complex multi-obstacle environment,” Ocean Engineering, vol. 273, p. 114016, 2023. doi: 10.1016/j.oceaneng.2023.114016
[13] B. P. L. Lau, B. J. Y. Ong, L. K. Y. Loh, R. Liu, C. Yuen, G. S. Soh, and U.-X. Tan, “Multi-AGV’s temporal memory-based RRT exploration in unknown environment,” IEEE Robotics and Automation Lett., vol. 7, no. 4, pp. 9256–9263, 2022. doi: 10.1109/LRA.2022.3190628
[14] Y. Wang, L. Dong, and C. Sun, “Cooperative control for multi-player pursuit-evasion games with reinforcement learning,” Neurocomputing, vol. 412, pp. 101–114, 2020. doi: 10.1016/j.neucom.2020.06.031
[15] S. F. Desouky and H. M. Schwartz, “Self-learning fuzzy logic controllers for pursuit–evasion differential games,” Robotics and Autonomous Systems, vol. 59, no. 1, pp. 22–33, 2011. doi: 10.1016/j.robot.2010.09.006
[16] X. Fu, H. Wang, and Z. Xu, “Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm,” Acta Aeronautica et Astronautica Sinica, vol. 43, no. 5, p. 325311, 2022.
[17] K. Zhang, Z. Yang, and T. Başar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” in Handbook of Reinforcement Learning and Control, vol. 325. Springer, 2021, pp. 321–384.
[18] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. Int. Conf. Machine Learning, 2018, pp. 1861–1870.
[19] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of PPO in cooperative multi-agent games,” in Proc. Advances in Neural Information Processing Systems, 2022, pp. 24611–24624.
[20] B. Peng, T. Rashid, C. Schroeder de Witt, P.-A. Kamienny, P. Torr, W. Böhmer, and S. Whiteson, “FACMAC: Factored multi-agent centralised policy gradients,” in Proc. Advances in Neural Information Processing Systems, 2021, pp. 12208–12221.
[21] J. Ackermann, V. Gabler, T. Osa, and M. Sugiyama, “Reducing overestimation bias in multi-agent domains using double centralized critics,” arXiv preprint arXiv:1910.01465, 2019.
[22] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
[23] R. Isaacs, Differential Games I: Introduction. Santa Monica, USA: Rand Corporation, 1954.
[24] T. D. Parsons, “Pursuit-evasion in a graph,” in Theory and Applications of Graphs, Lecture Notes in Mathematics, vol. 642. Springer, 1978, pp. 426–441.
[25] D. W. Oyler, P. T. Kabamba, and A. R. Girard, “Pursuit-evasion games in the presence of obstacles,” Automatica, vol. 65, pp. 1–11, 2016. doi: 10.1016/j.automatica.2015.11.018
[26] I. Suzuki and M. Yamashita, “Searching for a mobile intruder in a polygonal region,” SIAM J. Computing, vol. 21, no. 5, pp. 863–888, 1992. doi: 10.1137/0221051
[27] B. P. Gerkey, S. Thrun, and G. Gordon, “Visibility-based pursuit-evasion with limited field of view,” The Int. J. Robotics Research, vol. 25, no. 4, pp. 299–315, 2006. doi: 10.1177/0278364906065023
[28] B. Tovar and S. M. LaValle, “Visibility-based pursuit-evasion with bounded speed,” The Int. J. Robotics Research, vol. 27, no. 11–12, pp. 1350–1360, 2008.
[29] S. Sachs, S. M. LaValle, and S. Rajko, “Visibility-based pursuit-evasion in an unknown planar environment,” The Int. J. Robotics Research, vol. 23, no. 1, pp. 3–26, 2004. doi: 10.1177/0278364904039610
[30] A. Dumitrescu, H. Kok, I. Suzuki, and P. Żyliński, “Vision-based pursuit-evasion in a grid,” SIAM J. Discrete Mathematics, vol. 24, no. 3, pp. 1177–1204, 2010. doi: 10.1137/070700991
[31] X. Liang, B. Zhou, L. Jiang, G. Meng, and Y. Xiu, “Collaborative pursuit-evasion game of multi-UAVs based on Apollonius circle in the environment with obstacle,” Connection Science, vol. 35, no. 1, p. 2168253, 2023. doi: 10.1080/09540091.2023.2168253
[32] E. Lozano, U. Ruiz, I. Becerra, and R. Murrieta-Cid, “Surveillance and collision-free tracking of an aggressive evader with an actuated sensor pursuer,” IEEE Robotics and Automation Lett., vol. 7, no. 3, pp. 6854–6861, 2022. doi: 10.1109/LRA.2022.3178799
[33] G. Sartoretti, J. Kerr, Y. Shi, G. Wagner, T. S. Kumar, S. Koenig, and H. Choset, “PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning,” IEEE Robotics and Automation Lett., vol. 4, no. 3, pp. 2378–2385, 2019. doi: 10.1109/LRA.2019.2903261
[34] Z. Feng, M. Huang, D. Wu, E. Q. Wu, and C. Yuen, “Multi-agent reinforcement learning with policy clipping and average evaluation for UAV-assisted communication Markov game,” IEEE Trans. Intelligent Transportation Systems, vol. 24, no. 12, pp. 14281–14293, 2023. doi: 10.1109/TITS.2023.3296769
[35] K. Xue, J. Xu, L. Yuan, M. Li, C. Qian, Z. Zhang, and Y. Yu, “Multi-agent dynamic algorithm configuration,” in Proc. Advances in Neural Information Processing Systems, 2022, pp. 20147–20161.
[36] J. Wang, Y. Hong, J. Wang, J. Xu, Y. Tang, Q.-L. Han, and J. Kurths, “Cooperative and competitive multi-agent systems: From optimization to games,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 5, pp. 763–783, 2022. doi: 10.1109/JAS.2022.105506
[37] T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multi-agent reinforcement learning,” J. Machine Learning Research, vol. 21, no. 178, pp. 1–51, 2020.
[38] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” arXiv preprint arXiv:2105.14491, 2021.
[39] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
[40] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu, “Heterogeneous graph attention network,” in Proc. World Wide Web Conf., 2019, pp. 2022–2032.
[41] Y. Ye and S. Ji, “Sparse graph attention networks,” IEEE Trans. Knowledge and Data Engineering, vol. 35, no. 1, pp. 905–916, 2021.
[42] Z. Feng, D. Wu, M. Huang, and C. Yuen, “Graph attention-based reinforcement learning for trajectory design and resource assignment in multi-UAV assisted communication,” IEEE Internet of Things J., 2024.
[43] W. Du, T. Guo, J. Chen, B. Li, G. Zhu, and X. Cao, “Cooperative pursuit of unauthorized UAVs in urban airspace via multi-agent reinforcement learning,” Transportation Research Part C: Emerging Technologies, vol. 128, p. 103122, 2021. doi: 10.1016/j.trc.2021.103122
[44] V. Isler, S. Kannan, and S. Khanna, “Randomized pursuit-evasion in a polygonal environment,” IEEE Trans. Robotics, vol. 21, no. 5, pp. 875–884, 2005. doi: 10.1109/TRO.2005.851373
[45] M. Riedmiller, R. Hafner, T. Lampe, M. Neunert, J. Degrave, T. Wiele, V. Mnih, N. Heess, and J. T. Springenberg, “Learning by playing - solving sparse reward tasks from scratch,” in Proc. Int. Conf. Machine Learning, 2018, pp. 4344–4353.
[46] W. Dabney, M. Rowland, M. Bellemare, and R. Munos, “Distributional reinforcement learning with quantile regression,” in Proc. AAAI Conf. Artificial Intelligence, 2018, vol. 32, p. 1.
[47] D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in Proc. Int. Conf. Machine Learning, 2015, pp. 1530–1538.
[48] V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236
[49] J. Liu, Y. Zhong, S. Hu, H. Fu, Q. Fu, X. Chang, and Y. Yang, “Maximum entropy heterogeneous-agent mirror learning,” arXiv preprint arXiv:2306.10715, 2023.
[50] M. Wen, J. G. Kuba, R. Lin, W. Zhang, Y. Wen, J. Wang, and Y. Yang, “Multi-agent reinforcement learning is a sequence modeling problem,” in Proc. Advances in Neural Information Processing Systems, 2022, pp. 16509–16521.