A journal of the IEEE and the Chinese Association of Automation (CAA), publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 9, Issue 1, Jan. 2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Y. X. Yang, Z. H. Ni, M. Y. Gao, J. Zhang, and D. C. Tao, “Collaborative pushing and grasping of tightly stacked objects via deep reinforcement learning,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 135–145, Jan. 2022. doi: 10.1109/JAS.2021.1004255

Collaborative Pushing and Grasping of Tightly Stacked Objects via Deep Reinforcement Learning

doi: 10.1109/JAS.2021.1004255
Funds: This work was supported by the National Natural Science Foundation of China (61873077, 61806062), the Zhejiang Provincial Major Research and Development Project of China (2020C01110), and the Zhejiang Provincial Key Laboratory of Equipment Electronics.
  • Directly grasping tightly stacked objects may cause collisions and result in failures, degrading the functionality of robotic arms. Inspired by the observation that first pushing objects to a state of mutual separation and then grasping them individually can effectively increase the success rate, we devise a novel deep Q-learning framework to achieve collaborative pushing and grasping. Specifically, an efficient non-maximum suppression policy (PolicyNMS) is proposed to dynamically evaluate pushing and grasping actions by enforcing a suppression constraint on unreasonable actions. Moreover, a novel data-driven pushing reward network, PR-Net, is designed to effectively assess the degree of separation or aggregation between objects. To benchmark the proposed method, we establish a common household items dataset (CHID) in both simulation and real scenarios. Although trained on simulation data only, experimental results validate that our method generalizes well to real scenarios, achieving a 97% grasp success rate with fast object separation in real-world environments.
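The action-selection loop suggested by the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the pixel-wise Q-map inputs, the function names (policy_nms, select_action), the suppression radius, and the peak-picking heuristic are all assumptions made for illustration.

```python
import numpy as np

def policy_nms(q_map, valid_mask, top_k=5, radius=8):
    """NMS-style filtering over a pixel-wise Q-value map: mask out
    actions outside the valid workspace, then keep only local peaks
    so near-duplicate or unreasonable actions are suppressed."""
    q = np.where(valid_mask, q_map, -np.inf)
    picks = []
    for _ in range(top_k):
        idx = np.unravel_index(np.argmax(q), q.shape)
        if not np.isfinite(q[idx]):
            break  # no valid candidates left
        picks.append((idx, q[idx]))
        r0, c0 = max(idx[0] - radius, 0), max(idx[1] - radius, 0)
        q[r0:idx[0] + radius + 1, c0:idx[1] + radius + 1] = -np.inf  # suppress neighborhood
    return picks

def select_action(push_q, grasp_q, workspace_mask):
    """Compare the best surviving push and grasp candidates and return
    the primitive with the higher Q-value (assumes the mask admits at
    least one valid action for each primitive)."""
    best_push = policy_nms(push_q, workspace_mask)[0]
    best_grasp = policy_nms(grasp_q, workspace_mask)[0]
    return ("grasp", best_grasp) if best_grasp[1] >= best_push[1] else ("push", best_push)

# Toy usage with random Q maps over a 64x64 action grid.
rng = np.random.default_rng(0)
mask = np.ones((64, 64), dtype=bool)
primitive, ((row, col), q_val) = select_action(rng.random((64, 64)), rng.random((64, 64)), mask)
```

In this reading, pushing is chosen only while its best Q-value exceeds that of the best grasp, so the arm pushes to separate objects and switches to grasping once a grasp becomes more promising.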




    Highlights

    • A novel collaborative pushing and grasping method is proposed for handling tightly stacked objects
    • An efficient non-maximum suppression policy (PolicyNMS) is devised to suppress unreasonable actions
    • A novel pushing reward network (PR-Net) is devised to assess the degree of aggregation or separation between objects (see the reward sketch after this list)
    • A common household items dataset (CHID) is established to train and evaluate the model
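As a rough, hypothetical stand-in for the quantity a learned pushing reward might capture, the sketch below scores a scene by the mean pairwise distance between object centroids and rewards a push by the increase in that score. The actual PR-Net is a trained network; the segmentation-map input, the function names, and the scoring heuristic here are assumptions for illustration only.

```python
import numpy as np

def separation_score(labels):
    """Heuristic proxy for the degree of separation: mean pairwise
    distance between object centroids in a labeled segmentation map
    (0 = background, 1..N = object ids). Higher means more separated."""
    ids = [i for i in np.unique(labels) if i != 0]
    cents = np.array([np.argwhere(labels == i).mean(axis=0) for i in ids])
    if len(cents) < 2:
        return 0.0
    d = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=-1)
    n = len(cents)
    return d.sum() / (n * (n - 1))  # mean over ordered pairs; diagonal is zero

def push_reward(labels_before, labels_after):
    """Reward a push by how much it increased object separation."""
    return separation_score(labels_after) - separation_score(labels_before)

# Toy usage: object 2 starts flush against object 1, then is pushed right.
before = np.zeros((8, 8), dtype=int)
before[2:4, 2:4] = 1
before[2:4, 4:6] = 2
after = before.copy()
after[2:4, 4:6] = 0
after[2:4, 6:8] = 2
print(push_reward(before, after))  # positive: the push increased separation
```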
