Citation: Z. Peng, G. Wu, B. Luo, and L. Wang, “Multi-UAV cooperative pursuit strategy with limited visual field in urban airspace: A multi-agent reinforcement learning approach,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2024.124965
[1] Z. Zuo, C. Liu, Q.-L. Han, and J. Song, “Unmanned aerial vehicles: Control methods and future challenges,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 4, pp. 601–614, 2022. doi: 10.1109/JAS.2022.105410
[2] C. de Souza Junior, “Hunter drones: Drones cooperation for tracking an intruder drone,” Ph.D. dissertation, University of Technology of Compiègne, Compiègne, France, 2021.
[3] T. Lefebvre and T. Dubot, “Conceptual design study of an anti-drone drone,” in Proc. 16th AIAA Aviation Technology, Integration, and Operations Conf., 2016, p. 3449.
[4] T. H. Chung, G. A. Hollinger, and V. Isler, “Search and pursuit-evasion in mobile robotics: A survey,” Autonomous Robots, vol. 31, pp. 299–316, 2011. doi: 10.1007/s10514-011-9241-4
[5] R. Zhang, Q. Zong, X. Zhang, L. Dou, and B. Tian, “Game of drones: Multi-UAV pursuit-evasion game with online motion planning by deep reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, vol. 34, no. 10, pp. 7900–7909, 2023.
[6] Y.-C. Lai and T.-Y. Lin, “Vision-based mid-air object detection and avoidance approach for small unmanned aerial vehicles with deep learning and risk assessment,” Remote Sensing, vol. 16, no. 5, p. 756, 2024. doi: 10.3390/rs16050756
[7] C. De Souza, R. Newbury, A. Cosgun, P. Castillo, B. Vidolov, and D. Kulić, “Decentralized multi-agent pursuit using deep reinforcement learning,” IEEE Robotics and Automation Lett., vol. 6, no. 3, pp. 4552–4559, 2021. doi: 10.1109/LRA.2021.3068952
[8] T. Olsen, A. M. Tumlin, N. M. Stiffler, and J. M. O’Kane, “A visibility roadmap sampling approach for a multi-robot visibility-based pursuit-evasion problem,” in Proc. IEEE Int. Conf. Robotics and Automation, 2021, pp. 7957–7964.
[9] I. E. Weintraub, M. Pachter, and E. Garcia, “An introduction to pursuit-evasion differential games,” in Proc. American Control Conf., 2020, pp. 1049–1066.
[10] N. Chen, L. Li, and W. Mao, “Equilibrium strategy of the pursuit-evasion game in three-dimensional space,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 2, pp. 1–13, 2024. doi: 10.1109/JAS.2024.124239
[11] J. Chen, W. Zha, Z. Peng, and D. Gu, “Multi-player pursuit–evasion games with one superior evader,” Automatica, vol. 71, pp. 24–32, 2016. doi: 10.1016/j.automatica.2016.04.012
[12] X. Qu, W. Gan, D. Song, and L. Zhou, “Pursuit-evasion game strategy of USV based on deep reinforcement learning in complex multi-obstacle environment,” Ocean Engineering, vol. 273, p. 114016, 2023. doi: 10.1016/j.oceaneng.2023.114016
[13] B. P. L. Lau, B. J. Y. Ong, L. K. Y. Loh, R. Liu, C. Yuen, G. S. Soh, and U.-X. Tan, “Multi-AGV’s temporal memory-based RRT exploration in unknown environment,” IEEE Robotics and Automation Lett., vol. 7, no. 4, pp. 9256–9263, 2022. doi: 10.1109/LRA.2022.3190628
[14] Y. Wang, L. Dong, and C. Sun, “Cooperative control for multi-player pursuit-evasion games with reinforcement learning,” Neurocomputing, vol. 412, pp. 101–114, 2020. doi: 10.1016/j.neucom.2020.06.031
[15] S. F. Desouky and H. M. Schwartz, “Self-learning fuzzy logic controllers for pursuit–evasion differential games,” Robotics and Autonomous Systems, vol. 59, no. 1, pp. 22–33, 2011. doi: 10.1016/j.robot.2010.09.006
[16] X. Fu, H. Wang, and Z. Xu, “Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm,” Acta Aeronautica et Astronautica Sinica, vol. 43, no. 5, p. 325311, 2022.
[17] K. Zhang, Z. Yang, and T. Başar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” in Handbook of Reinforcement Learning and Control, vol. 325. Springer, 2021, pp. 321–384.
[18] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. Int. Conf. Machine Learning, 2018, pp. 1861–1870.
[19] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of PPO in cooperative multi-agent games,” in Proc. Advances in Neural Information Processing Systems, 2022, pp. 24611–24624.
[20] B. Peng, T. Rashid, C. Schroeder de Witt, P.-A. Kamienny, P. Torr, W. Böhmer, and S. Whiteson, “FACMAC: Factored multi-agent centralised policy gradients,” in Proc. Advances in Neural Information Processing Systems, 2021, pp. 12208–12221.
[21] J. Ackermann, V. Gabler, T. Osa, and M. Sugiyama, “Reducing overestimation bias in multi-agent domains using double centralized critics,” arXiv preprint arXiv:1910.01465, 2019.
[22] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
[23] R. Isaacs, Differential Games I: Introduction. Santa Monica, USA: Rand Corporation, 1954.
[24] T. D. Parsons, “Pursuit-evasion in a graph,” in Theory and Applications of Graphs, Lecture Notes in Mathematics, vol. 642. Springer, 1978, pp. 426–441.
[25] D. W. Oyler, P. T. Kabamba, and A. R. Girard, “Pursuit-evasion games in the presence of obstacles,” Automatica, vol. 65, pp. 1–11, 2016. doi: 10.1016/j.automatica.2015.11.018
[26] I. Suzuki and M. Yamashita, “Searching for a mobile intruder in a polygonal region,” SIAM J. Computing, vol. 21, no. 5, pp. 863–888, 1992. doi: 10.1137/0221051
[27] B. P. Gerkey, S. Thrun, and G. Gordon, “Visibility-based pursuit-evasion with limited field of view,” The Int. J. Robotics Research, vol. 25, no. 4, pp. 299–315, 2006. doi: 10.1177/0278364906065023
[28] B. Tovar and S. M. LaValle, “Visibility-based pursuit-evasion with bounded speed,” The Int. J. Robotics Research, vol. 27, no. 11–12, pp. 1350–1360, 2008.
[29] S. Sachs, S. M. LaValle, and S. Rajko, “Visibility-based pursuit-evasion in an unknown planar environment,” The Int. J. Robotics Research, vol. 23, no. 1, pp. 3–26, 2004. doi: 10.1177/0278364904039610
[30] A. Dumitrescu, H. Kok, I. Suzuki, and P. Żyliński, “Vision-based pursuit-evasion in a grid,” SIAM J. Discrete Mathematics, vol. 24, no. 3, pp. 1177–1204, 2010. doi: 10.1137/070700991
[31] X. Liang, B. Zhou, L. Jiang, G. Meng, and Y. Xiu, “Collaborative pursuit-evasion game of multi-UAVs based on Apollonius circle in the environment with obstacle,” Connection Science, vol. 35, no. 1, p. 2168253, 2023. doi: 10.1080/09540091.2023.2168253
[32] E. Lozano, U. Ruiz, I. Becerra, and R. Murrieta-Cid, “Surveillance and collision-free tracking of an aggressive evader with an actuated sensor pursuer,” IEEE Robotics and Automation Lett., vol. 7, no. 3, pp. 6854–6861, 2022. doi: 10.1109/LRA.2022.3178799
[33] G. Sartoretti, J. Kerr, Y. Shi, G. Wagner, T. S. Kumar, S. Koenig, and H. Choset, “PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning,” IEEE Robotics and Automation Lett., vol. 4, no. 3, pp. 2378–2385, 2019. doi: 10.1109/LRA.2019.2903261
[34] Z. Feng, M. Huang, D. Wu, E. Q. Wu, and C. Yuen, “Multi-agent reinforcement learning with policy clipping and average evaluation for UAV-assisted communication Markov game,” IEEE Trans. Intelligent Transportation Systems, vol. 24, no. 12, pp. 14281–14293, 2023. doi: 10.1109/TITS.2023.3296769
[35] K. Xue, J. Xu, L. Yuan, M. Li, C. Qian, Z. Zhang, and Y. Yu, “Multi-agent dynamic algorithm configuration,” in Proc. Advances in Neural Information Processing Systems, 2022, pp. 20147–20161.
[36] J. Wang, Y. Hong, J. Wang, J. Xu, Y. Tang, Q.-L. Han, and J. Kurths, “Cooperative and competitive multi-agent systems: From optimization to games,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 5, pp. 763–783, 2022. doi: 10.1109/JAS.2022.105506
[37] T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multi-agent reinforcement learning,” J. Machine Learning Research, vol. 21, no. 178, pp. 1–51, 2020.
[38] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” arXiv preprint arXiv:2105.14491, 2021.
[39] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
[40] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu, “Heterogeneous graph attention network,” in Proc. World Wide Web Conf., 2019, pp. 2022–2032.
[41] Y. Ye and S. Ji, “Sparse graph attention networks,” IEEE Trans. Knowledge and Data Engineering, vol. 35, no. 1, pp. 905–916, 2021.
[42] Z. Feng, D. Wu, M. Huang, and C. Yuen, “Graph attention-based reinforcement learning for trajectory design and resource assignment in multi-UAV assisted communication,” IEEE Internet of Things J., 2024.
[43] W. Du, T. Guo, J. Chen, B. Li, G. Zhu, and X. Cao, “Cooperative pursuit of unauthorized UAVs in urban airspace via multi-agent reinforcement learning,” Transportation Research Part C: Emerging Technologies, vol. 128, p. 103122, 2021. doi: 10.1016/j.trc.2021.103122
[44] V. Isler, S. Kannan, and S. Khanna, “Randomized pursuit-evasion in a polygonal environment,” IEEE Trans. Robotics, vol. 21, no. 5, pp. 875–884, 2005. doi: 10.1109/TRO.2005.851373
[45] M. Riedmiller, R. Hafner, T. Lampe, M. Neunert, J. Degrave, T. Wiele, V. Mnih, N. Heess, and J. T. Springenberg, “Learning by playing - solving sparse reward tasks from scratch,” in Proc. Int. Conf. Machine Learning, 2018, pp. 4344–4353.
[46] W. Dabney, M. Rowland, M. Bellemare, and R. Munos, “Distributional reinforcement learning with quantile regression,” in Proc. AAAI Conf. Artificial Intelligence, 2018, vol. 32, p. 1.
[47] D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in Proc. Int. Conf. Machine Learning, 2015, pp. 1530–1538.
[48] V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236
[49] J. Liu, Y. Zhong, S. Hu, H. Fu, Q. Fu, X. Chang, and Y. Yang, “Maximum entropy heterogeneous-agent mirror learning,” arXiv preprint arXiv:2306.10715, 2023.
[50] M. Wen, J. G. Kuba, R. Lin, W. Zhang, Y. Wen, J. Wang, and Y. Yang, “Multi-agent reinforcement learning is a sequence modeling problem,” in Proc. Advances in Neural Information Processing Systems, 2022, pp. 16509–16521.