Citation: J. Carvalho and A. Aguiar, “Deep reinforcement learning for zero-shot coverage path planning with mobile robots,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2024.125064
[1] E. Galceran and M. Carreras, “A survey on coverage path planning for robotics,” Robot. Auton. Syst., vol. 61, no. 12, pp. 1258–1276, Dec. 2013. doi: 10.1016/j.robot.2013.09.004
[2] D. K. Noh, W. J. Lee, H. R. Kim, I. S. Cho, I. B. Shim, and S. M. Baek, “Adaptive coverage path planning policy for a cleaning robot with deep reinforcement learning,” in Proc. IEEE Int. Conf. Consumer Electronics, Las Vegas, USA, 2022, pp. 1–6.
[3] B. Nasirian, M. Mehrandezh, and F. Janabi-Sharifi, “Efficient coverage path planning for mobile disinfecting robots using graph-based representation of environment,” Front. Robot. AI, vol. 8, p. 624333, Mar. 2021. doi: 10.3389/frobt.2021.624333
[4] D. Albani, D. Nardi, and V. Trianni, “Field coverage and weed mapping by UAV swarms,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Vancouver, Canada, 2017, pp. 4319–4325.
[5] A. J. Moshayedi, A. Sohail Khan, Y. Yang, J. Hu, and A. Kolahdooz, “Robots in agriculture: Revolutionizing farming practices,” EAI Endorsed Trans. AI Robot., vol. 3, pp. 1–23, Jun. 2024.
[6] T. M. Cabreira, C. Di Franco, P. R. Ferreira, and G. C. Buttazzo, “Energy-aware spiral coverage path planning for UAV photogrammetric applications,” IEEE Robot. Autom. Lett., vol. 3, no. 4, pp. 3662–3668, Oct. 2018. doi: 10.1109/LRA.2018.2854967
[7] D. Baldazo, J. Parras, and S. Zazo, “Decentralized multi-agent deep reinforcement learning in swarms of drones for flood monitoring,” in Proc. 27th European Signal Processing Conf., A Coruña, Spain, 2019, pp. 1–5.
[8] S. Y. Luis, D. G. Reina, and S. L. T. Marín, “A deep reinforcement learning approach for the patrolling problem of water resources through autonomous surface vehicles: The Ypacarai Lake case,” IEEE Access, vol. 8, pp. 204076–204093, Nov. 2020. doi: 10.1109/ACCESS.2020.3036938
[9] C. Piciarelli and G. L. Foresti, “Drone patrolling with reinforcement learning,” in Proc. 13th Int. Conf. Distributed Smart Cameras, Trento, Italy, 2019, p. 4.
[10] H. Choset, “Coverage for robotics - a survey of recent results,” Ann. Math. Artif. Intell., vol. 31, no. 1, pp. 113–126, Oct. 2001.
[11] T. M. Cabreira, L. B. Brisolara, and P. R. Ferreira Jr., “Survey on coverage path planning with unmanned aerial vehicles,” Drones, vol. 3, no. 1, p. 4, Jan. 2019. doi: 10.3390/drones3010004
[12] C. S. Tan, R. Mohd-Mokhtar, and M. R. Arshad, “A comprehensive review of coverage path planning in robotics using classical and heuristic algorithms,” IEEE Access, vol. 9, pp. 119310–119342, Aug. 2021. doi: 10.1109/ACCESS.2021.3108177
[13] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015. doi: 10.1038/nature14236
[14] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv preprint arXiv: 1712.01815, 2017.
[15] C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, R. Józefowicz, S. Gray, C. Olsson, J. Pachocki, M. Petrov, H. P. D. O. Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J. Tang, F. Wolski, and S. Zhang, “Dota 2 with large scale deep reinforcement learning,” arXiv preprint arXiv: 1912.06680, 2019.
[16] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, and D. Silver, “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, Oct. 2019. doi: 10.1038/s41586-019-1724-z
[17] P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs, L. Gilpin, P. Khandelwal, V. Kompella, H. Lin, P. MacAlpine, D. Oller, T. Seno, C. Sherstan, M. D. Thomure, H. Aghabozorgi, L. Barrett, R. Douglas, D. Whitehead, P. Dürr, P. Stone, M. Spranger, and H. Kitano, “Outracing champion Gran Turismo drivers with deep reinforcement learning,” Nature, vol. 602, no. 7896, pp. 223–228, Feb. 2022. doi: 10.1038/s41586-021-04357-7
[18] A. Kanervisto, C. Scheller, and V. Hautamaki, “Action space shaping in deep reinforcement learning,” in Proc. IEEE Conf. Games, Osaka, Japan, 2020, pp. 479–486.
[19] J. Heydari, O. Saha, and V. Ganapathy, “Reinforcement learning-based coverage path planning with implicit cellular decomposition,” arXiv preprint arXiv: 2110.09018, 2021.
[20] M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, “Rainbow: Combining improvements in deep reinforcement learning,” in Proc. AAAI Conf. Artificial Intelligence, New Orleans, USA, 2018.
[21] A. Mannan, M. S. Obaidat, K. Mahmood, A. Ahmad, and R. Ahmad, “Classical versus reinforcement learning algorithms for unmanned aerial vehicle network communication and coverage path planning: A systematic literature review,” Int. J. Commun. Syst., vol. 36, no. 5, p. e5423, Mar. 2023. doi: 10.1002/dac.5423
[22] Z. Li, S. Li, A. Francis, and X. Luo, “A novel calibration system for robot arm via an open dataset and a learning perspective,” IEEE Trans. Circuits Syst. II: Express Briefs, vol. 69, no. 12, pp. 5169–5173, Dec. 2022.
[23] L. Piardi, J. Lima, A. I. Pereira, and P. Costa, “Coverage path planning optimization based on Q-learning algorithm,” AIP Conf. Proc., vol. 2116, no. 1, p. 220002, Jul. 2019.
[24] J. Xiao, G. Wang, Y. Zhang, and L. Cheng, “A distributed multi-agent dynamic area coverage algorithm based on reinforcement learning,” IEEE Access, vol. 8, pp. 33511–33521, Jan. 2020. doi: 10.1109/ACCESS.2020.2967225
[25] J. P. Carvalho and A. P. Aguiar, “A reinforcement learning based online coverage path planning algorithm,” in Proc. IEEE Int. Conf. Autonomous Robot Systems and Competitions, Tomar, Portugal, 2023, pp. 81–86.
[26] M. Theile, H. Bayerlein, R. Nai, D. Gesbert, and M. Caccamo, “UAV coverage path planning under varying power constraints using deep reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Las Vegas, USA, 2020, pp. 1444–1449.
[27] M. Theile, H. Bayerlein, R. Nai, D. Gesbert, and M. Caccamo, “UAV path planning using global and local map information with deep reinforcement learning,” in Proc. 20th Int. Conf. Advanced Robotics, Ljubljana, Slovenia, 2021, pp. 539–546.
[28] H. Bayerlein, M. Theile, M. Caccamo, and D. Gesbert, “UAV path planning for wireless data harvesting: A deep reinforcement learning approach,” in Proc. IEEE Global Communications Conf., Taipei, China, 2020, pp. 1–6.
[29] M. Theile, H. Bayerlein, M. Caccamo, and A. L. Sangiovanni-Vincentelli, “Learning to recharge: UAV coverage path planning through deep reinforcement learning,” arXiv preprint arXiv: 2309.03157, 2023.
[30] O. Saha, G. Ren, J. Heydari, V. Ganapathy, and M. Shah, “Deep reinforcement learning based online area covering autonomous robot,” in Proc. 7th Int. Conf. Automation, Robotics and Applications, Prague, Czech Republic, 2021, pp. 21–25.
[31] O. Saha, G. Ren, J. Heydari, V. Ganapathy, and M. Shah, “Online area covering robot in unknown dynamic environments,” in Proc. 7th Int. Conf. Automation, Robotics and Applications, Prague, Czech Republic, 2021, pp. 38–42.
[32] A. Ianenko, A. Artamonov, G. Sarapulov, A. Safaraleev, S. Bogomolov, and D. K. Noh, “Coverage path planning with proximal policy optimization in a grid-based environment,” in Proc. 59th IEEE Conf. Decision and Control, Jeju, Korea, 2020, pp. 4099–4104.
[33] R. Kirk, A. Zhang, E. Grefenstette, and T. Rocktäschel, “A survey of zero-shot generalisation in deep reinforcement learning,” J. Artif. Intell. Res., vol. 76, pp. 201–264, Jan. 2023.
[34] M. Hessel, H. Van Hasselt, J. Modayil, and D. Silver, “On inductive biases in deep reinforcement learning,” arXiv preprint arXiv: 1907.02908, 2019.
[35] T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, I. Osband, G. Dulac-Arnold, J. Agapiou, J. Z. Leibo, and A. Gruslys, “Deep Q-learning from demonstrations,” in Proc. 32nd AAAI Conf. Artificial Intelligence, New Orleans, USA, 2018.
[36] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in Proc. IEEE Int. Conf. Robotics and Automation, Brisbane, Australia, 2018, pp. 3803–3810.
[37] S. Narvekar, B. Peng, M. Leonetti, J. Sinapov, M. E. Taylor, and P. Stone, “Curriculum learning for reinforcement learning domains: A framework and survey,” J. Mach. Learn. Res., vol. 21, no. 1, p. 181, Jan. 2020.
[38] A. Ecoffet, J. Huizinga, J. Lehman, K. O. Stanley, and J. Clune, “First return, then explore,” Nature, vol. 590, no. 7847, pp. 580–586, Feb. 2021. doi: 10.1038/s41586-020-03157-9
[39] J. E. Bresenham, “Algorithm for computer control of a digital plotter,” IBM Syst. J., vol. 4, no. 1, pp. 25–30, Dec. 1965. doi: 10.1147/sj.41.0025
[40] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artif. Intell., vol. 101, no. 1-2, pp. 99–134, May 1998. doi: 10.1016/S0004-3702(98)00023-X
[41] F. Pardo, A. Tavakoli, V. Levdik, and P. Kormushev, “Time limits in reinforcement learning,” in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 4042–4051.
[42] S. Huang and S. Ontañón, “A closer look at invalid action masking in policy gradient algorithms,” in Proc. 35th Int. Florida Artificial Intelligence Research Society Conf., Hutchinson Island, USA, 2022.
[43] R. Stolz, H. Krasowski, J. Thumm, M. Eichelbeck, P. Gassert, and M. Althoff, “Excluding the irrelevant: Focusing reinforcement learning through continuous action masking,” arXiv preprint arXiv: 2406.03704, 2024.
[44] Y. Hou, X. Liang, J. Zhang, Q. Yang, A. Yang, and N. Wang, “Exploring the use of invalid action masking in reinforcement learning: A comparative study of on-policy and off-policy algorithms in real-time strategy games,” Appl. Sci., vol. 13, no. 14, p. 8283, Jul. 2023. doi: 10.3390/app13148283
[45] D. Zhong, Y. Yang, and Q. Zhao, “No prior mask: Eliminate redundant action for deep reinforcement learning,” in Proc. 38th AAAI Conf. Artificial Intelligence, Vancouver, Canada, 2024, pp. 17078–17086.
[46] A. Y. Ng, D. Harada, and S. J. Russell, “Policy invariance under reward transformations: Theory and application to reward shaping,” in Proc. 16th Int. Conf. Machine Learning, San Francisco, USA: ACM, 1999, pp. 278–287.
[47] M. Fortunato, M. G. Azar, B. Piot, J. Menick, M. Hessel, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg, “Noisy networks for exploration,” in Proc. 6th Int. Conf. Learning Representations, Vancouver, Canada: ICLR, 2018.
[48] Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, “Dueling network architectures for deep reinforcement learning,” in Proc. 33rd Int. Conf. Machine Learning, New York, USA: ICML, 2016, pp. 1995–2003.
[49] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” in Proc. 4th Int. Conf. Learning Representations, San Juan, Puerto Rico: ICLR, 2016.
[50] D. Schmidt and T. Schmied, “Fast and data-efficient training of Rainbow: An experimental study on Atari,” arXiv preprint arXiv: 2111.10247, 2021.
[51] A. Stooke and P. Abbeel, “Accelerated methods for deep reinforcement learning,” arXiv preprint arXiv: 1803.02811, 2019.
[52] L. Jiang, H. Huang, and Z. Ding, “Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1179–1189, Jul. 2020. doi: 10.1109/JAS.2019.1911732