A journal of the IEEE and the CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: J. Carvalho and A. Aguiar, “Deep reinforcement learning for zero-shot coverage path planning with mobile robots,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2024.125064

Deep Reinforcement Learning for Zero-Shot Coverage Path Planning With Mobile Robots

doi: 10.1109/JAS.2024.125064
Funds:  This work was partially supported by project RELIABLE (PTDC/EEI-AUT/3522/2020), the R&D Unit SYSTEC through Base (UIDB/00147/2020) and Programmatic (UIDP/00147/2020) funds, and the Associate Laboratory Advanced Production and Intelligent Systems ARISE (LA/P/0112/2020), funded by national funds through FCT/MCTES (PIDDAC)
Abstract
  • The ability of mobile robots to plan and execute a path is foundational to many path-planning challenges, particularly Coverage Path Planning. While this task has typically been tackled with classical algorithms, these often struggle with flexibility and adaptability in unknown environments. Recent advances in Reinforcement Learning offer promising alternatives, yet a significant gap remains in the literature when it comes to generalization across a large number of parameters. This paper presents a unified, generalized framework for coverage path planning that leverages value-based deep reinforcement learning techniques. The novelty of the framework lies in the design of an observation space that accommodates different map sizes, an action masking scheme that guarantees safety and robustness while also serving as a learning-from-demonstration technique during training, and a reward function that yields size-invariant value functions. These are coupled with a curriculum learning-based training strategy and parametric environment randomization, enabling the agent to tackle complete or partial coverage path planning with perfect or incomplete knowledge while generalizing to different map sizes, configurations, sensor payloads, and sub-tasks. Our empirical results show that the algorithm performs at a near-optimal level in zero-shot scenarios in environments drawn from a distribution similar to that used during training, outperforming a greedy heuristic by a factor of six. Furthermore, in out-of-distribution environments, our method surpasses existing state-of-the-art algorithms in most zero-shot and all few-shot scenarios, paving the way for generalizable and adaptable path-planning algorithms.
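To make the action-masking idea concrete, the sketch below shows how invalid-action masking is commonly applied in a value-based (DQN-style) agent: Q-values of actions that would cause a collision are set to negative infinity before the greedy argmax, so unsafe moves can never be selected. This is a minimal illustration under assumed names (CoverageQNet, masked_greedy_action, valid_mask), not the paper's implementation.

    import torch
    import torch.nn as nn

    class CoverageQNet(nn.Module):
        """Tiny Q-network over a flattened local coverage/obstacle map."""
        def __init__(self, obs_dim: int, n_actions: int = 4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, n_actions),
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.net(obs)

    def masked_greedy_action(q_net: nn.Module, obs: torch.Tensor,
                             valid_mask: torch.Tensor) -> torch.Tensor:
        # valid_mask: bool tensor of shape (batch, n_actions); False marks
        # actions that would cause a collision, so they can never win the argmax.
        q_values = q_net(obs)
        masked_q = q_values.masked_fill(~valid_mask, float("-inf"))
        return masked_q.argmax(dim=-1)

    if __name__ == "__main__":
        net = CoverageQNet(obs_dim=64, n_actions=4)
        obs = torch.randn(1, 64)                            # dummy observation
        valid = torch.tensor([[True, False, True, True]])   # action 1 blocked
        print(masked_greedy_action(net, obs, valid))        # never returns 1

The same mask can also be applied to the target Q-values during training, which is one common way to keep unsafe actions out of both acting and bootstrapping.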

     

  • 1 In this work, we consider the algorithm to be safe as long as it is collision-free and complete as long as there is a finite upper bound on the number of time steps it takes to finish the task.
    2 https://youtu.be/ZockV7Nul28
  • [1]
    E. Galceran and M. Carreras, “A survey on coverage path planning for robotics,” Robot. Auton. Syst., vol. 61, no. 12, pp. 1258–1276, Dec. 2013. doi: 10.1016/j.robot.2013.09.004
    [2]
    D. K. Noh, W. J. Lee, H. R. Kim, I. S. Cho, I. B. Shim, and S. M. Baek, “Adaptive coverage path planning policy for a cleaning robot with deep reinforcement learning,” in Proc. IEEE Int. Conf. Consumer Electronics, Las Vegas, USA, 2022, pp. 1–6.
    [3]
    B. Nasirian, M. Mehrandezh, and F. Janabi-Sharifi, “Efficient coverage path planning for mobile disinfecting robots using graph-based representation of environment,” Front. Robot. AI, vol. 8, p. 624333, Mar. 2021. doi: 10.3389/frobt.2021.624333
    [4]
    D. Albani, D. Nardi, and V. Trianni, “Field coverage and weed mapping by UAV swarms,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Vancouver, Canada, 2017, pp. 4319–4325.
    [5]
    A. J. Moshayedi, A. Sohail Khan, Y. Yang, J. Hu, and A. Kolahdooz, “Robots in agriculture: Revolutionizing farming practices,” EAI Endorsed Trans. AI Robot., vol. 3, pp. 1–23, Jun. 2024.
    [6]
    T. M. Cabreira, C. Di Franco, P. R. Ferreira, and G. C. Buttazzo, “Energy-aware spiral coverage path planning for UAV photogrammetric applications,” IEEE Robot. Autom. Lett., vol. 3, no. 4, pp. 3662–3668, Oct. 2018. doi: 10.1109/LRA.2018.2854967
    [7]
    D. Baldazo, J. Parras, and S. Zazo, “Decentralized multi-agent deep reinforcement learning in swarms of drones for flood monitoring,” in Proc. 27th European Signal Processing Conf., A Coruna, Spain, 2019, pp. 1–5.
    [8]
    S. Y. Luis, D. G. Reina, and S. L. T. Marín, “A deep reinforcement learning approach for the patrolling problem of water resources through autonomous surface vehicles: The ypacarai lake case,” IEEE Access, vol. 8, pp. 204076–204093, Nov. 2020. doi: 10.1109/ACCESS.2020.3036938
    [9]
    C. Piciarelli and G. L. Foresti, “Drone patrolling with reinforcement learning,” in Proc. 13th Int. Conf. Distributed Smart Cameras, Trento, Italy, 2019, pp. 4.
    [10]
    H. Choset, “Coverage for robotics - a survey of recent results,” Ann. Math. Artif. Intell., vol. 31, no. 1, pp. 113–126, Oct. 2001.
    [11]
    T. M. Cabreira, L. B. Brisolara, and P. R. Jr. Ferreira, “Survey on coverage path planning with unmanned aerial vehicles,” Drones, vol. 3, no. 1, p. 4, Jan. 2019. doi: 10.3390/drones3010004
    [12]
    C. S. Tan, R. Mohd-Mokhtar, and M. R. Arshad, “A comprehensive review of coverage path planning in robotics using classical and heuristic algorithms,” IEEE Access, vol. 9, pp. 119310–119342, Aug. 2021. doi: 10.1109/ACCESS.2021.3108177
    [13]
    V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015. doi: 10.1038/nature14236
    [14]
    D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv preprint arXiv: 1712.01815, 2017.
    [15]
    C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, R. Józefowicz, S. Gray, C. Olsson, J. Pachocki, M. Petrov, H. P. D. O. Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J. Tang, F. Wolski, and S. Zhang, “Dota 2 with large scale deep reinforcement learning,” arXiv preprint arXiv: 1912.06680, 2019. (查阅网上资料,未找到作者中ę字母的代码,请确认)
    [16]
    O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. Mckinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, and D. Silver, “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, Oct. 2019. doi: 10.1038/s41586-019-1724-z
    [17]
    P. R. Wurman, S. Barrett, K. Kawamoto, J. Macglashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs, L. Gilpin, P. Khandelwal, V. Kompella, H. Lin, P. Macalpine, D. Oller, T. Seno, C. Sherstan, M. D. Thomure, H. Aghabozorgi, L. Barrett, R. Douglas, D. Whitehead, P. Dürr, P. Stone, M. Spranger, and H. Kitano, “Outracing champion gran turismo drivers with deep reinforcement learning,” Nature, vol. 602, no. 7896, pp. 223–228, Feb. 2022. doi: 10.1038/s41586-021-04357-7
    [18]
    A. Kanervisto, C. Scheller, and V. Hautamaki, “Action space shaping in deep reinforcement learning,” in Proc. IEEE Conf. Games, Osaka, Japan, 2020, pp. 479–486.
    [19]
    J. Heydari, O. Saha, and V. Ganapathy, “Reinforcement learning-based coverage path planning with implicit cellular decomposition,” arXiv preprint arXiv: 2110.09018, 2021.
    [20]
    M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, “Rainbow: Combining improvements in deep reinforcement learning,” in Proc. AAAI Conf. Artificial Intelligence, New Orleans, USA, 2018.
    [21]
    A. Mannan, M. S. Obaidat, K. Mahmood, A. Ahmad, and R. Ahmad, “Classical versus reinforcement learning algorithms for unmanned aerial vehicle network communication and coverage path planning: A systematic literature review,” Int. J. Commun. Syst., vol. 36, no. 5, p. e5423, Mar. 2023. doi: 10.1002/dac.5423
    [22]
    Z. Li, S. Li, A. Francis, and X. Luo, “A novel calibration system for robot arm via an open dataset and a learning perspective,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 69, no. 12, pp. 5169–5173, Dec. 2022.
    [23]
    L. Piardi, J. Lima, A. I. Pereira, and P. Costa, “Coverage path planning optimization based on Q-learning algorithm,” AIP Conf. Proc., vol. 2116, no. 1, p. 220002, Jul. 2019.
    [24]
    J. Xiao, G. Wang, Y. Zhang, and L. Cheng, “A distributed multi-agent dynamic area coverage algorithm based on reinforcement learning,” IEEE Access, vol. 8, pp. 33511–33521, Jan. 2020. doi: 10.1109/ACCESS.2020.2967225
    [25]
    J. P. Carvalho and A. P. Aguiar, “A reinforcement learning based online coverage path planning algorithm,” in Proc. IEEE Int. Conf. Autonomous Robot Systems and Competitions, Tomar, Portugal, 2023, pp. 81–86.
    [26]
    M. Theile, H. Bayerlein, R. Nai, D. Gesbert, and M. Caccamo, “UAV coverage path planning under varying power constraints using deep reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Las Vegas, USA, 2020, pp. 1444–1449.
    [27]
    M. Theile, H. Bayerlein, R. Nai, D. Gesbert, and M. Caccamo, “UAV path planning using global and local map information with deep reinforcement learning,” in Proc. 20th Int. Conf. Advanced Robotics, Ljubljana, Slovenia, 2021, pp. 539–546.
    [28]
    H. Bayerlein, M. Theile, M. Caccamo, and D. Gesbert, “UAV path planning for wireless data harvesting: A deep reinforcement learning approach,” in Proc. IEEE Global Communications Conf., Taipei, China, 2020, pp. 1–6.
    [29]
    M. Theile, H. Bayerlein, M. Caccamo, and A. L. Sangiovanni-Vincentelli, “Learning to recharge: UAV coverage path planning through deep reinforcement learning,” arXiv preprint arXiv: 2309.03157, 2023.
    [30]
    O. Saha, G. Ren, J. Heydari, V. Ganapathy, and M. Shah, “Deep reinforcement learning based online area covering autonomous robot,” in Proc. 7th Int. Conf. Automation, Robotics and Applications, Prague, Czech Republic, 2021, pp. 21–25.
    [31]
    O. Saha, G. Ren, J. Heydari, V. Ganapathy, and M. Shah, “Online area covering robot in unknown dynamic environments,” in Proc. 7th Int. Conf. Automation, Robotics and Applications, Prague, Czech Republic, 2021, pp. 38–42.
    [32]
    A. Ianenko, A. Artamonov, G. Sarapulov, A. Safaraleev, S. Bogomolov, and D. K. Noh, “Coverage path planning with proximal policy optimization in a grid-based environment,” in Proc. 59th IEEE Conf. Decision and Control, Jeju, Korea, 2020, pp. 4099–4104.
    [33]
    R. Kirk, A. Zhang, E. Grefenstette, and T. Rocktäschel, “A survey of zero-shot generalisation in deep reinforcement learning,” J. Artif. Int. Res., vol. 76, pp. 201–264, Jan. 2023.
    [34]
    M. Hessel, H. Van Hasselt, J. Modayil, and D. Silver, “On inductive biases in deep reinforcement learning,” arXiv preprint arXiv: 1907.02908, 2019.
    [35]
    T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, I. Osband, G. Dulac-Arnold, J. Agapiou, J. Z. Leibo, and A. Gruslys, “Deep Q-learning from demonstrations,” in Proc. 32nd AAAI Conf. Artificial Intelligence, New Orleans, USA, 2018. (查阅网上资料,未找到页码信息,请确认补充)
    [36]
    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in Proc. IEEE Int. Conf. Robotics and Automation, Brisbane, Australia, 2018, 3803–3810.
    [37]
    S. Narvekar, B. Peng, M. Leonetti, J. Sinapov, M. E. Taylor, and P. Stone, “Curriculum learning for reinforcement learning domains: A framework and survey,” J. Mach. Learn. Res., vol. 21, no. 1, p. 181, Jan. 2020.
    [38]
    A. Ecoffet, J. Huizinga, J. Lehman, K. O. Stanley, and J. Clune, “First return, then explore,” Nature, vol. 590, no. 7847, pp. 580–586, Feb. 2021. doi: 10.1038/s41586-020-03157-9
    [39]
    J. E. Bresenham, “Algorithm for computer control of a digital plotter,” IBM Syst. J., vol. 4, no. 1, pp. 25–30, Dec. 1965. doi: 10.1147/sj.41.0025
    [40]
    L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artif. Intell., vol. 101, no. 1-2, pp. 99–134, May 1998. doi: 10.1016/S0004-3702(98)00023-X
    [41]
    F. Pardo, A. Tavakoli, V. Levdik, and P. Kormushev, “Time limits in reinforcement learning,” in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 4042–4051.
    [42]
    S. Huang and S. Ontañón, “A closer look at invalid action masking in policy gradient algorithms,” in Proc. 35th Int. Florida Artificial Intelligence Research Society Conf., Hutchinson Island, USA, 2022.
    [43]
    R. Stolz, H. Krasowski, J. Thumm, M. Eichelbeck, P. Gassert, and M. Althoff, “Excluding the irrelevant: Focusing reinforcement learning through continuous action masking,” arXiv preprint arXiv: 2406.03704, 2024.
    [44]
    Y. Hou, X. Liang, J. Zhang, Q. Yang, A. Yang, and N. Wang, “Exploring the use of invalid action masking in reinforcement learning: A comparative study of on-policy and off-policy algorithms in real-time strategy games,” Appl. Sci., vol. 13, no. 14, p. 8283, Jul. 2023. doi: 10.3390/app13148283
    [45]
    D. Zhong, Y. Yang, and Q. Zhao, “No prior mask: Eliminate redundant action for deep reinforcement learning,” in Proc. 38th AAAI Conf. Artificial Intelligence, Vancouver, Canada, 2024, pp. 17078–17086.
    [46]
    A. Y. Ng, D. Harada, and S. J. Russell, “Policy invariance under reward transformations: Theory and application to reward shaping,” in Proc. 16th Int. Conf. Machine Learning, San Francisco, USA: ACM, 1999, pp. 278–287.
    [47]
    M. Fortunato, M. G. Azar, B. Piot, J. Menick, M. Hessel, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg, “Noisy networks for exploration,” in Proc. 6th Int. Conf. Learning Representations, Vancouver, Canada: ICLR, 2018.
    [48]
    Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, “Dueling network architectures for deep reinforcement learning,” in Proc. 33rd Int. Conf. Machine Learning, New York, USA: ICML, 2016, pp. 1995–2003.
    [49]
    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” in Proc. 4th Int. Conf. Learning Representations, San Juan, Puerto Rico: ICLR, 2016.
    [50]
    D. Schmidt and T. Schmied, “Fast and data-efficient training of rainbow: An experimental study on Atari,” arXiv preprint arXiv: 2111.10247, 2021.
    [51]
    A. Stooke and P. Abbeel, “Accelerated methods for deep reinforcement learning,” arXiv preprint arXiv: 1803.02811, 2019.
    [52]
    L. Jiang, H. Huang, and Z. Ding, “Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1179–1189, Jul. 2020. doi: 10.1109/JAS.2019.1911732
