A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 7, Issue 4
June 2020

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Lan Jiang, Hongyun Huang and Zuohua Ding, "Path Planning for Intelligent Robots Based on Deep Q-learning With Experience Replay and Heuristic Knowledge," IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1179-1189, July 2020. doi: 10.1109/JAS.2019.1911732

Path Planning for Intelligent Robots Based on Deep Q-learning With Experience Replay and Heuristic Knowledge

doi: 10.1109/JAS.2019.1911732
Funds:  This work was supported by the National Natural Science Foundation of China (61751210, 61572441)
Abstract
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network is used to resolve the “curse of dimensionality” issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data, which are used to train a neural network; such a process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, in comparison with the existing methods, our method can converge to an optimal action strategy in less time and can explore a path in an unknown environment with fewer steps and a larger average reward.
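As a rough illustration of replacing the Q-table with a function approximator, the following sketch maps a robot state to one Q-value per discrete action. It is written in PyTorch for illustration only; the state encoding, action set, and layer sizes are assumptions, not the network used in the paper.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Small fully connected network that plays the role of the Q-table.
    def __init__(self, state_dim=4, num_actions=4, hidden=64):
        super().__init__()
        # state_dim and num_actions are illustrative choices, e.g., a planar
        # state (x, y, dx_to_goal, dy_to_goal) and four movement directions.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        # One estimated cumulative reward (Q-value) per discrete action.
        return self.net(state)

q_net = QNetwork()
q_values = q_net(torch.zeros(1, 4))          # shape (1, num_actions)
greedy_action = int(q_values.argmax(dim=1))  # action with the largest Q-value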

     



    Highlights

    • Fast convergence and better strategy
      The simulation results show that, in comparison with the existing methods, our method can converge to an optimal action strategy in less time and can explore a path in an unknown environment with fewer steps and a larger average reward.
    • Deep Q-learning
      We use deep Q-learning (DQL) to process the state information of the intelligent robot and obtain the cumulative reward value of each action, so as to replace the Q-table of reinforcement learning and resolve the “curse of dimensionality”.
    • Experience replay
      Training a neural network requires a lot of data, but when the robot explores an unknown environment, it is impossible to prepare enough training samples for it in advance. We therefore let the robot collect the experience data generated while it moves and store them in a replay memory; the data in the replay memory are then used to train the neural network (a minimal replay-memory sketch follows this list).
    • Heuristic knowledge
      On the one hand, heuristic knowledge guides the behavior of the robot and makes the selected actions more purposeful; on the other hand, it increases the effectiveness of the data used to train the neural network. With the help of heuristic knowledge, the neural network converges to an optimal action strategy faster (a sketch of heuristic-guided action selection follows this list).
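A minimal sketch of the replay mechanism described in the “Experience replay” highlight, written in PyTorch for illustration only: the transition format (state, action, reward, next_state, done), the buffer capacity, the batch size, and the discount factor are assumptions, not the paper's settings.

import random
from collections import deque

import torch
import torch.nn.functional as F

replay_memory = deque(maxlen=10000)   # placeholder capacity

def store(state, action, reward, next_state, done):
    # Each interaction with the environment is kept as one transition tuple.
    replay_memory.append((state, action, reward, next_state, done))

def train_step(q_net, optimizer, batch_size=32, gamma=0.9):
    # One gradient step on a random mini-batch drawn from the replay memory.
    if len(replay_memory) < batch_size:
        return
    batch = random.sample(replay_memory, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    states = torch.tensor(states, dtype=torch.float32)
    actions = torch.tensor(actions, dtype=torch.int64)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    next_states = torch.tensor(next_states, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)
    # Current Q-value estimate for the actions that were actually taken.
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrapped target: reward plus the discounted best value of the next state.
    with torch.no_grad():
        target = rewards + gamma * q_net(next_states).max(dim=1).values * (1.0 - dones)
    loss = F.mse_loss(q_pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()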
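One possible way to inject heuristic knowledge into action selection, given only as an illustrative assumption (the paper's actual heuristic rule may differ): during exploration, instead of moving uniformly at random, the robot prefers the move that brings it closer to the goal, which is the kind of purposeful exploration the last highlight describes.

import math
import random

import torch

# Four hypothetical movement actions on a grid: up, down, left, right.
ACTIONS = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}

def heuristic_action(position, goal):
    # Pick the move whose resulting position has the smallest straight-line
    # distance to the goal.
    def dist_after(a):
        dx, dy = ACTIONS[a]
        return math.hypot(goal[0] - (position[0] + dx), goal[1] - (position[1] + dy))
    return min(ACTIONS, key=dist_after)

def select_action(q_net, state, position, goal, epsilon=0.1):
    if random.random() < epsilon:
        # Heuristic-guided exploration instead of a blind random move.
        return heuristic_action(position, goal)
    with torch.no_grad():
        state_t = torch.tensor(state, dtype=torch.float32).unsqueeze(0)
        return int(q_net(state_t).argmax(dim=1))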

