A journal of the IEEE and the Chinese Association of Automation (CAA), publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Z. Peng, G. Wu, B. Luo, and L. Wang, “Multi-UAV cooperative pursuit strategy with limited visual field in urban airspace: A multi-agent reinforcement learning approach,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2024.124965

Multi-UAV Cooperative Pursuit Strategy With Limited Visual Field in Urban Airspace: A Multi-Agent Reinforcement Learning Approach

doi: 10.1109/JAS.2024.124965
Funds:  This work was supported in part by the National Natural Science Foundation of China (62373380)
  • The application of multiple unmanned aerial vehicles (UAVs) to pursue and capture unauthorized UAVs has emerged as a novel approach to ensuring the safety of urban airspace. However, the pursuit UAVs must rely on their own onboard sensors to actively gather information about the unauthorized UAV. Considering the restricted sensing range of such sensors, this paper formulates the multi-UAV with limited visual field pursuit-evasion (MUV-PE) problem. Each pursuer has a visual field characterized by a limited perception distance and viewing angle, which may be obstructed by buildings. Only when the unauthorized UAV, i.e., the evader, enters the visual field of some pursuer can its position be acquired. The objective of the pursuers is to capture the evader as quickly as possible without collision. To address this problem, we propose the normalizing flow actor with graph attention critic (NAGC) algorithm, a multi-agent reinforcement learning (MARL) approach. NAGC applies normalizing flows to increase the flexibility of the policy network, enabling agents to sample actions from more expressive distributions rather than simple standard ones. To better capture the spatial relationships among multiple UAVs and environmental obstacles, NAGC integrates “obstacle-target” graph attention networks, which substantially aid the pursuers in both search and pursuit. Extensive experiments conducted in a high-precision simulator validate the promising performance of the NAGC algorithm.
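The page itself contains no code, but since the abstract hinges on the normalizing-flow actor, a minimal PyTorch sketch of that idea may help: a state-conditioned Gaussian base distribution is pushed through invertible coupling layers, so actions are drawn from a more expressive distribution while the log-probability stays tractable via the change-of-variables formula. This is only an illustration under assumed names and sizes (AffineCoupling, FlowActor, obs_dim, act_dim, layer widths), not the authors' NAGC implementation.

```python
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One RealNVP-style affine coupling layer: transforms half of the latent
    vector conditioned on the other half, so the map stays invertible.
    (A full implementation would alternate which half is transformed.)"""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, (dim - self.half) * 2),
        )

    def forward(self, z):
        z1, z2 = z[..., :self.half], z[..., self.half:]
        scale, shift = self.net(z1).chunk(2, dim=-1)
        scale = torch.tanh(scale)                   # bound the scale for stability
        z2 = z2 * torch.exp(scale) + shift          # invertible affine transform
        log_det = scale.sum(dim=-1)                 # log|det Jacobian| of this layer
        return torch.cat([z1, z2], dim=-1), log_det


class FlowActor(nn.Module):
    """Hypothetical flow-based actor: a state-conditioned Gaussian base
    distribution reshaped by coupling layers into a richer action distribution."""

    def __init__(self, obs_dim, act_dim, n_flows=2):
        super().__init__()
        self.base = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim * 2),
        )
        self.flows = nn.ModuleList([AffineCoupling(act_dim) for _ in range(n_flows)])

    def forward(self, obs):
        mean, log_std = self.base(obs).chunk(2, dim=-1)
        base_dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
        z = base_dist.rsample()                         # reparameterized base sample
        log_prob = base_dist.log_prob(z).sum(dim=-1)
        for flow in self.flows:                         # push the sample through the flow
            z, log_det = flow(z)
            log_prob = log_prob - log_det               # change-of-variables correction
        action = torch.tanh(z)                          # squash into the UAV's action bounds
        log_prob = log_prob - torch.log(1 - action.pow(2) + 1e-6).sum(dim=-1)
        return action, log_prob


# Example: sample an action and its log-probability for one pursuer
# from an assumed 32-dimensional observation and 3-dimensional action.
actor = FlowActor(obs_dim=32, act_dim=3)
action, log_prob = actor(torch.randn(1, 32))
```

In an actor-critic setup such as the one the abstract describes, the returned log_prob is what the critic-side update would consume; the sketch above only covers the action-sampling side.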

  • 1 The supplementary material for this paper can be found at https://github.com/DrPengZhe/SM-for-MUV-PE.git.
    2 More details at https://www.unrealengine.com
    3 It can be downloaded at https://microsoft.github.io/AirSim/
  • [1]
    Z. Zuo, C. Liu, Q.-L. Han, and J. Song, “Unmanned aerial vehicles: Control methods and future challenges,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 4, pp. 601–614, 2022. doi: 10.1109/JAS.2022.105410
    [2]
    C. de Souza Junior, “Hunter drones: Drones cooperation for tracking an intruder drone,” Ph.D. dissertation, University of Technology of Compiègne, Compiègne, France, 2021.
    [3]
    T. Lefebvre and T. Dubot, “Conceptual design study of an anti-drone drone,” in Proc. 16th AIAA Aviation Technology, Integration, and Operations Conf., 2016, p. 3449.
    [4]
    T. H. Chung, G. A. Hollinger, and V. Isler, “Search and pursuit-evasion in mobile robotics: A survey,” Autonomous Robots, vol. 31, pp. 299–316, 2011. doi: 10.1007/s10514-011-9241-4
    [5]
    R. Zhang, Q. Zong, X. Zhang, L. Dou, and B. Tian, “Game of drones: Multi-UAV pursuit-evasion game with online motion planning by deep reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, vol. 34, no. 10, pp. 7900–7909, 2022.
    [6]
    Y.-C. Lai and T.-Y. Lin, “Vision-based mid-air object detection and avoidance approach for small unmanned aerial vehicles with deep learning and risk assessment,” Remote Sensing, vol. 16, no. 5, p. 756, 2024. doi: 10.3390/rs16050756
    [7]
    C. De Souza, R. Newbury, A. Cosgun, P. Castillo, B. Vidolov, and D. Kulić, “Decentralized multi-agent pursuit using deep reinforcement learning,” IEEE Robotics and Automation Lett., vol. 6, no. 3, pp. 4552–4559, 2021. doi: 10.1109/LRA.2021.3068952
    [8]
    T. Olsen, A. M. Tumlin, N. M. Stiffler, and J. M. O’Kane, “A visibility roadmap sampling approach for a multi-robot visibility-based pursuit-evasion problem,” in Proc. IEEE Int. Conf. Robotics and Automation, 2021, pp. 7957–7964.
    [9]
    I. E. Weintraub, M. Pachter, and E. Garcia, “An introduction to pursuit-evasion differential games,” in Proc. American Control Conf., 2020, pp. 1049–1066.
    [10]
    N. Chen, L. Li, and W. Mao, “Equilibrium strategy of the pursuit-evasion game in three-dimensional space,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 2, pp. 1–13, 2024. doi: 10.1109/JAS.2024.124239
    [11]
    J. Chen, W. Zha, Z. Peng, and D. Gu, “Multi-player pursuit–evasion games with one superior evader,” Automatica, vol. 71, pp. 24–32, 2016. doi: 10.1016/j.automatica.2016.04.012
    [12]
    X. Qu, W. Gan, D. Song, and L. Zhou, “Pursuit-evasion game strategy of USV based on deep reinforcement learning in complex multi-obstacle environment,” Ocean Engineering, vol. 273, p. 114016, 2023. doi: 10.1016/j.oceaneng.2023.114016
    [13]
    B. P. L. Lau, B. J. Y. Ong, L. K. Y. Loh, R. Liu, C. Yuen, G. S. Soh, and U.-X. Tan, “Multi-AGV’s temporal memory-based RRT exploration in unknown environment,” IEEE Robotics and Automation Lett., vol. 7, no. 4, pp. 9256–9263, 2022. doi: 10.1109/LRA.2022.3190628
    [14]
    Y. Wang, L. Dong, and C. Sun, “Cooperative control for multi-player pursuit-evasion games with reinforcement learning,” Neurocomputing, vol. 412, pp. 101–114, 2020. doi: 10.1016/j.neucom.2020.06.031
    [15]
    S. F. Desouky and H. M. Schwartz, “Self-learning fuzzy logic controllers for pursuit–evasion differential games,” Robotics and Autonomous Systems, vol. 59, no. 1, pp. 22–33, 2011. doi: 10.1016/j.robot.2010.09.006
    [16]
    X. Fu, H. Wang, and Z. Xu, “Cooperative pursuit strategy for multiUAVs based on DE-MADDPG algorithm,” Acta Aeronautica Astronautica Sinica, vol. 43, no. 5, pp. 325311–325311, 2022.
    [17]
    K. Zhang, Z. Yang, and T. Başar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” in Handbook of Reinforcement Learning and Control, Springer, 2021, vol. 325, pp. 321–384.
    [18]
    T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Offpolicy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. Int. Conf. Machine Learning, 2018, pp. 1861–1870.
    [19]
    C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of PPO in cooperative multi-agent games,” in Advances in Neural Information Processing Systems, 2022, pp. 24611–24624.
    [20]
    B. Peng, T. Rashid, C. Schroeder de Witt, P.-A. Kamienny, P. Torr, W. Böhmer, and S. Whiteson, “FACMAC: Factored multi-agent centralised policy gradients,” in Proc. Advances in Neural Information Processing Systems, 2021, pp. 12208–12221.
    [21]
    J. Ackermann, V. Gabler, T. Osa, and M. Sugiyama, “Reducing overestimation bias in multi-agent domains using double centralized critics,” arXiv preprint arXiv: 1910.01465, 2019.
    [22]
    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv: 1509.02971, 2015.
    [23]
    R. Isaacs, Differential Games I: Introduction. Santa Monica, USA: Rand Corporation, 1954.
    [24]
    T. D. Parsons, “Pursuit-evasion in a graph,” in Proc. Theory and Applications of Graphs: Proceedings, 2006, pp. 426–441.
    [25]
    D. W. Oyler, P. T. Kabamba, and A. R. Girard, “Pursuit-evasion games in the presence of obstacles,” Automatica, vol. 65, pp. 1–11, 2016. doi: 10.1016/j.automatica.2015.11.018
    [26]
    I. Suzuki and M. Yamashita, “Searching for a mobile intruder in a polygonal region,” SIAM J. Computing, vol. 21, no. 5, pp. 863–888, 1992. doi: 10.1137/0221051
    [27]
    B. P. Gerkey, S. Thrun, and G. Gordon, “Visibility-based pursuit-evasion with limited field of view,” The Int. J. Robotics Research, vol. 25, no. 4, pp. 299–315, 2006. doi: 10.1177/0278364906065023
    [28]
    B. Tovar and S. M. LaValle, “Visibility-based pursuit-evasion with bounded speed,” The Int. J. Robotics Research, vol. 27, no. 11–12, pp. 1350–1360, 1350.
    [29]
    S. Sachs, S. M. LaValle, and S. Rajko, “Visibility-based pursuit-evasion in an unknown planar environment,” The Int. J. Robotics Research, vol. 23, no. 1, pp. 3–26, 2004. doi: 10.1177/0278364904039610
    [30]
    A. Dumitrescu, H. Kok, I. Suzuki, and P. Żyliński, “Vision-based pursuit-evasion in a grid,” SIAM J. Discrete Mathematics, vol. 24, no. 3, pp. 1177–1204, 2010. doi: 10.1137/070700991
    [31]
    X. Liang, B. Zhou, L. Jiang, G. Meng, and Y. Xiu, “Collaborative pursuit-evasion game of multi-UAVs based on Apollonius circle in the environment with obstacle,” Connection Science, vol. 35, no. 1, p. 2168253, 2023. doi: 10.1080/09540091.2023.2168253
    [32]
    E. Lozano, U. Ruiz, I. Becerra, and R. Murrieta-Cid, “Surveillance and collision-free tracking of an aggressive evader with an actuated sensor pursuer,” IEEE Robotics and Automation Lett., vol. 7, no. 3, pp. 6854–6861, 2022. doi: 10.1109/LRA.2022.3178799
    [33]
    G. Sartoretti, J. Kerr, Y. Shi, G. Wagner, T. S. Kumar, S. Koenig, and H. Choset, “PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning,” IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 2378–2385, 2019. doi: 10.1109/LRA.2019.2903261
    [34]
    Z. Feng, M. Huang, D. Wu, E. Q. Wu, and C. Yuen, “Multi-agent reinforcement learning with policy clipping and average evaluation for uav-assisted communication markov game,” IEEE Trans. Intelligent Transportation Systems, vol. 24, no. 12, pp. 14281–14293, Dec. 2023. doi: 10.1109/TITS.2023.3296769
    [35]
    K. Xue, J. Xu, L. Yuan, M. Li, C. Qian, Z. Zhang, and Y. Yu, “Multi-agent dynamic algorithm configuration,” in Proc. Advances in Neural Information Processing Systems, 2022, pp. 20147–20161.
    [36]
    J. Wang, Y. Hong, J. Wang, J. Xu, Y. Tang, Q.-L. Han, and J. Kurths, “Cooperative and competitive multi-agent systems: From optimization to games,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 5, pp. 763–783, 2022. doi: 10.1109/JAS.2022.105506
    [37]
    T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multiagent reinforcement learning,” J. Machine Learning Research, vol. 21, no. 178, pp. 1–51, 2020.
    [38]
    S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” arXiv preprint arXiv: 2105.14491, 2021.
    [39]
    P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv: 1710.10903, 2017.
    [40]
    X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu, “Heterogeneous graph attention network,” in Proc. World Wide Web Conf., 2019, pp. 2022–2032.
    [41]
    Y. Ye and S. Ji, “Sparse graph attention networks,” IEEE Trans. Knowledge and Data Engineering, vol. 35, no. 1, pp. 905–916, 2021.
    [42]
    Z. Feng, D. Wu, M. Huang, and C. Yuen, “Graph attention-based reinforcement learning for trajectory design and resource assignment in multi-uav assisted communication,” IEEE Internet of Things J., 2024.
    [43]
    W. Du, T. Guo, J. Chen, B. Li, G. Zhu, and X. Cao, “Cooperative pursuit of unauthorized UAVs in urban airspace via multi-agent reinforcement learning,” Transportation Research Part C: Emerging Technologies, vol. 128, p. 103122, 2021. doi: 10.1016/j.trc.2021.103122
    [44]
    V. Isler, S. Kannan, and S. Khanna, “Randomized pursuit-evasion in a polygonal environment,” IEEE Trans. Robotics, vol. 21, no. 5, pp. 875–884, 2005. doi: 10.1109/TRO.2005.851373
    [45]
    M. Riedmiller, R. Hafner, T. Lampe, M. Neunert, J. Degrave, T. Wiele, V. Mnih, N. Heess, and J. T. Springenberg, “Learning by playing solving sparse reward tasks from scratch,” in Proc. Int. Conf. Machine Learning, 2018, pp. 4344–4353.
    [46]
    W. Dabney, M. Rowland, M. Bellemare, and R. Munos, “Distributional reinforcement learning with quantile regression,” in Proc. AAAI Conf. Artificial Intelligence, 2018, vol. 32, p. 1.
    [47]
    D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in Proc. Int. Conf. Machine Learning, 2015, pp. 1530–1538.
    [48]
    V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236
    [49]
    J. Liu, Y. Zhong, S. Hu, H. Fu, Q. Fu, X. Chang, and Y. Yang, “Maximum entropy heterogeneous-agent mirror learning,” arXiv preprint arXiv: 2306.10715, 2023.
    [50]
    M. Wen, J. G. Kuba, R. Lin, W. Zhang, Y. Wen, J. Wang, and Y. Yang, “Multi-agent reinforcement learning is a sequence modeling problem,” in Proc. Advances in Neural Information Processing Systems, 2022, pp. 16 509–16 521.
