A journal of the IEEE and the CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: J. Carvalho and A. Aguiar, “Deep reinforcement learning for zero-shot coverage path planning with mobile robots,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2024.125064

Deep Reinforcement Learning for Zero-Shot Coverage Path Planning With Mobile Robots

doi: 10.1109/JAS.2024.125064
Funds:  This work was partially supported by project RELIABLE (PTDC/EEI-AUT/3522/2020), the R&D Unit SYSTEC through Base (UIDB/00147/2020) and Programmatic (UIDP/00147/2020) funds, and the Associate Laboratory Advanced Production and Intelligent Systems ARISE (LA/P/0112/2020), funded by national funds through FCT/MCTES (PIDDAC)
Abstract
  • The ability of mobile robots to plan and execute a path is foundational to many path-planning challenges, particularly Coverage Path Planning. While this task has typically been tackled with classical algorithms, these often struggle with flexibility and adaptability in unknown environments. Recent advances in Reinforcement Learning offer promising alternatives, yet a significant gap remains in the literature when it comes to generalization across a large number of parameters. This paper presents a unified, generalized framework for coverage path planning that leverages value-based deep reinforcement learning techniques. The novelty of the framework lies in the design of an observation space that accommodates different map sizes, an action masking scheme that guarantees safety and robustness while also serving as a learning-from-demonstration technique during training, and a reward function that yields size-invariant value functions. These are coupled with a curriculum learning-based training strategy and parametric environment randomization, enabling the agent to tackle complete or partial coverage path planning with perfect or incomplete knowledge while generalizing to different map sizes, configurations, sensor payloads, and sub-tasks. Our empirical results show that the algorithm performs at a near-optimal level in zero-shot scenarios in environments drawn from a distribution similar to that used during training, outperforming a greedy heuristic by a factor of six. Furthermore, in out-of-distribution environments, our method surpasses existing state-of-the-art algorithms in most zero-shot and all few-shot scenarios, paving the way for generalizable and adaptable path-planning algorithms.
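To make the action-masking idea concrete, the sketch below shows how invalid-action masking is commonly applied in a value-based (DQN-style) agent: Q-values of actions that would cause a collision are set to negative infinity before the greedy argmax, so unsafe moves can never be selected. This is a minimal illustration under assumed names (CoverageQNet, masked_greedy_action, valid_mask), not the paper's implementation.

    import torch
    import torch.nn as nn

    class CoverageQNet(nn.Module):
        """Tiny Q-network over a flattened local coverage/obstacle map."""
        def __init__(self, obs_dim: int, n_actions: int = 4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, n_actions),
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.net(obs)

    def masked_greedy_action(q_net: nn.Module, obs: torch.Tensor,
                             valid_mask: torch.Tensor) -> torch.Tensor:
        # valid_mask: bool tensor of shape (batch, n_actions); False marks
        # actions that would cause a collision, so they can never win the argmax.
        q_values = q_net(obs)
        masked_q = q_values.masked_fill(~valid_mask, float("-inf"))
        return masked_q.argmax(dim=-1)

    if __name__ == "__main__":
        net = CoverageQNet(obs_dim=64, n_actions=4)
        obs = torch.randn(1, 64)                            # dummy observation
        valid = torch.tensor([[True, False, True, True]])   # action 1 blocked
        print(masked_greedy_action(net, obs, valid))        # never returns 1

The same mask can also be applied to the target Q-values during training, which is one common way to keep unsafe actions out of both acting and bootstrapping.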

     

  • 1 In this work, we consider the algorithm to be safe as long as it is collision-free and complete as long as there is a finite upper bound on the number of time steps it takes to finish the task.
    2 https://youtu.be/ZockV7Nul28
  • [1]
    E. Galceran and M. Carreras, “A survey on coverage path planning for robotics,” Robot. Auton. Syst., vol. 61, no. 12, pp. 1258–1276, Dec. 2013. doi: 10.1016/j.robot.2013.09.004
    [2]
    D. K. Noh, W. J. Lee, H. R. Kim, I. S. Cho, I. B. Shim, and S. M. Baek, “Adaptive coverage path planning policy for a cleaning robot with deep reinforcement learning,” in Proc. IEEE Int. Conf. Consumer Electronics, Las Vegas, USA, 2022, pp. 1–6.
    [3]
    B. Nasirian, M. Mehrandezh, and F. Janabi-Sharifi, “Efficient coverage path planning for mobile disinfecting robots using graph-based representation of environment,” Front. Robot. AI, vol. 8, p. 624333, Mar. 2021. doi: 10.3389/frobt.2021.624333
    [4]
    D. Albani, D. Nardi, and V. Trianni, “Field coverage and weed mapping by UAV swarms,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Vancouver, Canada, 2017, pp. 4319–4325.
    [5]
    A. J. Moshayedi, A. Sohail Khan, Y. Yang, J. Hu, and A. Kolahdooz, “Robots in agriculture: Revolutionizing farming practices,” EAI Endorsed Trans. AI Robot., vol. 3, pp. 1–23, Jun. 2024.
    [6]
    T. M. Cabreira, C. Di Franco, P. R. Ferreira, and G. C. Buttazzo, “Energy-aware spiral coverage path planning for UAV photogrammetric applications,” IEEE Robot. Autom. Lett., vol. 3, no. 4, pp. 3662–3668, Oct. 2018. doi: 10.1109/LRA.2018.2854967
    [7]
    D. Baldazo, J. Parras, and S. Zazo, “Decentralized multi-agent deep reinforcement learning in swarms of drones for flood monitoring,” in Proc. 27th European Signal Processing Conf., A Coruna, Spain, 2019, pp. 1–5.
    [8]
    S. Y. Luis, D. G. Reina, and S. L. T. Marín, “A deep reinforcement learning approach for the patrolling problem of water resources through autonomous surface vehicles: The ypacarai lake case,” IEEE Access, vol. 8, pp. 204076–204093, Nov. 2020. doi: 10.1109/ACCESS.2020.3036938
    [9]
    C. Piciarelli and G. L. Foresti, “Drone patrolling with reinforcement learning,” in Proc. 13th Int. Conf. Distributed Smart Cameras, Trento, Italy, 2019, pp. 4.
    [10]
    H. Choset, “Coverage for robotics - a survey of recent results,” Ann. Math. Artif. Intell., vol. 31, no. 1, pp. 113–126, Oct. 2001.
    [11]
    T. M. Cabreira, L. B. Brisolara, and P. R. Jr. Ferreira, “Survey on coverage path planning with unmanned aerial vehicles,” Drones, vol. 3, no. 1, p. 4, Jan. 2019. doi: 10.3390/drones3010004
    [12]
    C. S. Tan, R. Mohd-Mokhtar, and M. R. Arshad, “A comprehensive review of coverage path planning in robotics using classical and heuristic algorithms,” IEEE Access, vol. 9, pp. 119310–119342, Aug. 2021. doi: 10.1109/ACCESS.2021.3108177
    [13]
    V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015. doi: 10.1038/nature14236
    [14]
    D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv preprint arXiv: 1712.01815, 2017.
    [15]
    C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, R. Józefowicz, S. Gray, C. Olsson, J. Pachocki, M. Petrov, H. P. D. O. Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J. Tang, F. Wolski, and S. Zhang, “Dota 2 with large scale deep reinforcement learning,” arXiv preprint arXiv: 1912.06680, 2019. (查阅网上资料,未找到作者中ę字母的代码,请确认)
    [16]
    O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. Mckinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, and D. Silver, “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, Oct. 2019. doi: 10.1038/s41586-019-1724-z
    [17]
    P. R. Wurman, S. Barrett, K. Kawamoto, J. Macglashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs, L. Gilpin, P. Khandelwal, V. Kompella, H. Lin, P. Macalpine, D. Oller, T. Seno, C. Sherstan, M. D. Thomure, H. Aghabozorgi, L. Barrett, R. Douglas, D. Whitehead, P. Dürr, P. Stone, M. Spranger, and H. Kitano, “Outracing champion gran turismo drivers with deep reinforcement learning,” Nature, vol. 602, no. 7896, pp. 223–228, Feb. 2022. doi: 10.1038/s41586-021-04357-7
    [18]
    A. Kanervisto, C. Scheller, and V. Hautamaki, “Action space shaping in deep reinforcement learning,” in Proc. IEEE Conf. Games, Osaka, Japan, 2020, pp. 479–486.
    [19]
    J. Heydari, O. Saha, and V. Ganapathy, “Reinforcement learning-based coverage path planning with implicit cellular decomposition,” arXiv preprint arXiv: 2110.09018, 2021.
    [20]
    M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, “Rainbow: Combining improvements in deep reinforcement learning,” in Proc. AAAI Conf. Artificial Intelligence, New Orleans, USA, 2018.
    [21]
    A. Mannan, M. S. Obaidat, K. Mahmood, A. Ahmad, and R. Ahmad, “Classical versus reinforcement learning algorithms for unmanned aerial vehicle network communication and coverage path planning: A systematic literature review,” Int. J. Commun. Syst., vol. 36, no. 5, p. e5423, Mar. 2023. doi: 10.1002/dac.5423
    [22]
    Z. Li, S. Li, A. Francis, and X. Luo, “A novel calibration system for robot arm via an open dataset and a learning perspective,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 69, no. 12, pp. 5169–5173, Dec. 2022.
    [23]
    L. Piardi, J. Lima, A. I. Pereira, and P. Costa, “Coverage path planning optimization based on Q-learning algorithm,” AIP Conf. Proc., vol. 2116, no. 1, p. 220002, Jul. 2019.
    [24]
    J. Xiao, G. Wang, Y. Zhang, and L. Cheng, “A distributed multi-agent dynamic area coverage algorithm based on reinforcement learning,” IEEE Access, vol. 8, pp. 33511–33521, Jan. 2020. doi: 10.1109/ACCESS.2020.2967225
    [25]
    J. P. Carvalho and A. P. Aguiar, “A reinforcement learning based online coverage path planning algorithm,” in Proc. IEEE Int. Conf. Autonomous Robot Systems and Competitions, Tomar, Portugal, 2023, pp. 81–86.
    [26]
    M. Theile, H. Bayerlein, R. Nai, D. Gesbert, and M. Caccamo, “UAV coverage path planning under varying power constraints using deep reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Las Vegas, USA, 2020, pp. 1444–1449.
    [27]
    M. Theile, H. Bayerlein, R. Nai, D. Gesbert, and M. Caccamo, “UAV path planning using global and local map information with deep reinforcement learning,” in Proc. 20th Int. Conf. Advanced Robotics, Ljubljana, Slovenia, 2021, pp. 539–546.
    [28]
    H. Bayerlein, M. Theile, M. Caccamo, and D. Gesbert, “UAV path planning for wireless data harvesting: A deep reinforcement learning approach,” in Proc. IEEE Global Communications Conf., Taipei, China, 2020, pp. 1–6.
    [29]
    M. Theile, H. Bayerlein, M. Caccamo, and A. L. Sangiovanni-Vincentelli, “Learning to recharge: UAV coverage path planning through deep reinforcement learning,” arXiv preprint arXiv: 2309.03157, 2023.
    [30]
    O. Saha, G. Ren, J. Heydari, V. Ganapathy, and M. Shah, “Deep reinforcement learning based online area covering autonomous robot,” in Proc. 7th Int. Conf. Automation, Robotics and Applications, Prague, Czech Republic, 2021, pp. 21–25.
    [31]
    O. Saha, G. Ren, J. Heydari, V. Ganapathy, and M. Shah, “Online area covering robot in unknown dynamic environments,” in Proc. 7th Int. Conf. Automation, Robotics and Applications, Prague, Czech Republic, 2021, pp. 38–42.
    [32]
    A. Ianenko, A. Artamonov, G. Sarapulov, A. Safaraleev, S. Bogomolov, and D. K. Noh, “Coverage path planning with proximal policy optimization in a grid-based environment,” in Proc. 59th IEEE Conf. Decision and Control, Jeju, Korea, 2020, pp. 4099–4104.
    [33]
    R. Kirk, A. Zhang, E. Grefenstette, and T. Rocktäschel, “A survey of zero-shot generalisation in deep reinforcement learning,” J. Artif. Int. Res., vol. 76, pp. 201–264, Jan. 2023.
    [34]
    M. Hessel, H. Van Hasselt, J. Modayil, and D. Silver, “On inductive biases in deep reinforcement learning,” arXiv preprint arXiv: 1907.02908, 2019.
    [35]
    T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, I. Osband, G. Dulac-Arnold, J. Agapiou, J. Z. Leibo, and A. Gruslys, “Deep Q-learning from demonstrations,” in Proc. 32nd AAAI Conf. Artificial Intelligence, New Orleans, USA, 2018. (查阅网上资料,未找到页码信息,请确认补充)
    [36]
    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in Proc. IEEE Int. Conf. Robotics and Automation, Brisbane, Australia, 2018, 3803–3810.
    [37]
    S. Narvekar, B. Peng, M. Leonetti, J. Sinapov, M. E. Taylor, and P. Stone, “Curriculum learning for reinforcement learning domains: A framework and survey,” J. Mach. Learn. Res., vol. 21, no. 1, p. 181, Jan. 2020.
    [38]
    A. Ecoffet, J. Huizinga, J. Lehman, K. O. Stanley, and J. Clune, “First return, then explore,” Nature, vol. 590, no. 7847, pp. 580–586, Feb. 2021. doi: 10.1038/s41586-020-03157-9
    [39]
    J. E. Bresenham, “Algorithm for computer control of a digital plotter,” IBM Syst. J., vol. 4, no. 1, pp. 25–30, Dec. 1965. doi: 10.1147/sj.41.0025
    [40]
    L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artif. Intell., vol. 101, no. 1-2, pp. 99–134, May 1998. doi: 10.1016/S0004-3702(98)00023-X
    [41]
    F. Pardo, A. Tavakoli, V. Levdik, and P. Kormushev, “Time limits in reinforcement learning,” in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 4042–4051.
    [42]
    S. Huang and S. Ontañón, “A closer look at invalid action masking in policy gradient algorithms,” in Proc. 35th Int. Florida Artificial Intelligence Research Society Conf., Hutchinson Island, USA, 2022.
    [43]
    R. Stolz, H. Krasowski, J. Thumm, M. Eichelbeck, P. Gassert, and M. Althoff, “Excluding the irrelevant: Focusing reinforcement learning through continuous action masking,” arXiv preprint arXiv: 2406.03704, 2024.
    [44]
    Y. Hou, X. Liang, J. Zhang, Q. Yang, A. Yang, and N. Wang, “Exploring the use of invalid action masking in reinforcement learning: A comparative study of on-policy and off-policy algorithms in real-time strategy games,” Appl. Sci., vol. 13, no. 14, p. 8283, Jul. 2023. doi: 10.3390/app13148283
    [45]
    D. Zhong, Y. Yang, and Q. Zhao, “No prior mask: Eliminate redundant action for deep reinforcement learning,” in Proc. 38th AAAI Conf. Artificial Intelligence, Vancouver, Canada, 2024, pp. 17078–17086.
    [46]
    A. Y. Ng, D. Harada, and S. J. Russell, “Policy invariance under reward transformations: Theory and application to reward shaping,” in Proc. 16th Int. Conf. Machine Learning, San Francisco, USA: ACM, 1999, pp. 278–287.
    [47]
    M. Fortunato, M. G. Azar, B. Piot, J. Menick, M. Hessel, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg, “Noisy networks for exploration,” in Proc. 6th Int. Conf. Learning Representations, Vancouver, Canada: ICLR, 2018.
    [48]
    Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, “Dueling network architectures for deep reinforcement learning,” in Proc. 33rd Int. Conf. Machine Learning, New York, USA: ICML, 2016, pp. 1995–2003.
    [49]
    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” in Proc. 4th Int. Conf. Learning Representations, San Juan, Puerto Rico: ICLR, 2016.
    [50]
    D. Schmidt and T. Schmied, “Fast and data-efficient training of rainbow: An experimental study on Atari,” arXiv preprint arXiv: 2111.10247, 2021.
    [51]
    A. Stooke and P. Abbeel, “Accelerated methods for deep reinforcement learning,” arXiv preprint arXiv: 1803.02811, 2019.
    [52]
    L. Jiang, H. Huang, and Z. Ding, “Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1179–1189, Jul. 2020. doi: 10.1109/JAS.2019.1911732
