A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 9, Issue 1, Jan. 2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Y. L. Yang, Z. H. Ding, R. Wang, H. Modares, and D. C. Wunsch, “Data-driven human-robot interaction without velocity measurement using off-policy reinforcement learning,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 47–63, Jan. 2022. doi: 10.1109/JAS.2021.1004258

Data-Driven Human-Robot Interaction Without Velocity Measurement Using Off-Policy Reinforcement Learning

doi: 10.1109/JAS.2021.1004258
Funds: This work was supported in part by the National Natural Science Foundation of China (61903028), the Youth Innovation Promotion Association, Chinese Academy of Sciences (2020137), the Lifelong Learning Machines Program from DARPA/Microsystems Technology Office, and the Army Research Laboratory (W911NF-18-2-0260).
Abstract
  • In this paper, we present a novel data-driven design method for the human-robot interaction (HRI) system, where a given task is achieved through cooperation between the human and the robot. The HRI controller is designed via a two-level approach consisting of a task-oriented performance optimization design and a plant-oriented impedance controller design. The task-oriented design minimizes the human effort and guarantees perfect task tracking in the outer loop, while the plant-oriented design achieves the desired impedance from the human to the robot manipulator end-effector in the inner loop. Data-driven reinforcement learning techniques are used for performance optimization in the outer loop to assign the optimal impedance parameters. In the inner loop, a velocity-free filter is designed to avoid the requirement of end-effector velocity measurement. On this basis, an adaptive controller is designed to achieve the desired impedance of the robot manipulator in the task space. Simulations and experiments on a robot manipulator are conducted to verify the efficacy of the presented HRI design framework.
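
To make the inner-loop objective concrete, the following Python sketch simulates a generic prescribed impedance model of the standard form M_d*xdd + D_d*xd + K_d*(x - x_ref) = f_h, i.e., target dynamics from the human force to the end-effector motion of the kind the plant-oriented controller is asked to render. The parameter values, the force profile, and the forward-Euler integration are illustrative assumptions, not the paper's implementation; the triple (M_d, D_d, K_d) stands in for the impedance parameters that the outer-loop design assigns.

# Illustrative sketch only: a generic prescribed impedance model
#   M_d*xdd + D_d*xd + K_d*(x - x_ref) = f_h
# mapping a human force f_h to end-effector motion. All values below are
# assumptions for illustration and are not taken from the paper.

M_d, D_d, K_d = 1.0, 8.0, 16.0   # assumed desired inertia, damping, stiffness
x_ref = 0.0                       # task-space reference position
dt, T = 1e-3, 3.0                 # integration step and horizon

def human_force(t):
    """Assumed human input: a 1 N push during the first second."""
    return 1.0 if t < 1.0 else 0.0

x, xd = 0.0, 0.0                  # end-effector position and velocity
trace = []
for k in range(int(T / dt)):
    t = k * dt
    # Acceleration demanded by the prescribed impedance model.
    xdd = (human_force(t) - D_d * xd - K_d * (x - x_ref)) / M_d
    xd += dt * xdd                # forward-Euler integration
    x += dt * xd
    trace.append((t, x))

print(f"peak deflection ~ {max(p for _, p in trace):.4f} m, "
      f"final position ~ {trace[-1][1]:.4f} m")

Whatever impedance parameters the outer loop assigns, the inner-loop adaptive controller is responsible for making the manipulator reproduce this target behavior in the task space without measuring the end-effector velocity.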

     



    Highlights

    • We present a novel two-level HRI controller design framework consisting of a task-oriented performance optimization design and a plant-oriented impedance controller design
    • In the outer loop, a data-driven reinforcement learning technique is used for performance optimization to assign the optimal impedance parameters (see the simplified off-policy sketch after this list)
    • In the inner loop, a velocity-free filter is designed to avoid the requirement of end-effector velocity measurement
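
The outer-loop idea of improving a feedback law purely from data generated under a different, exploratory behavior policy can be illustrated with a deliberately simplified example: off-policy Q-learning for a discrete-time LQR problem. This is not the paper's algorithm, which is continuous-time and targets impedance parameters; the double-integrator plant, cost weights, exploration noise, and initial gain below are all assumptions made only to show the off-policy policy-iteration idea.

# Minimal off-policy policy-iteration sketch (discrete-time LQR Q-learning).
# The plant, weights, and noise levels are illustrative assumptions; the
# paper's continuous-time, impedance-oriented formulation differs in detail.
import numpy as np

rng = np.random.default_rng(0)
dt = 0.05
A = np.array([[1.0, dt], [0.0, 1.0]])    # assumed double-integrator plant
B = np.array([[0.0], [dt]])
Q, R = np.diag([10.0, 1.0]), np.array([[1.0]])

def svec(z):
    """Features such that z^T H z = svec(z) @ theta for symmetric H."""
    f = []
    for i in range(len(z)):
        for j in range(i, len(z)):
            f.append(z[i] * z[j] * (1.0 if i == j else 2.0))
    return np.array(f)

def unsvec(theta, n):
    """Rebuild the symmetric matrix H from its parameter vector theta."""
    H, k = np.zeros((n, n)), 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

K = np.array([[1.0, 1.0]])               # assumed initial stabilizing gain
for _ in range(8):                       # policy-iteration sweeps
    Phi, cost = [], []
    # Behavior policy = target policy + exploration noise (off-policy data).
    for _ in range(60):
        x = rng.uniform(-1.0, 1.0, size=(2, 1))
        for _ in range(5):
            u = -K @ x + rng.normal(scale=0.5, size=(1, 1))
            xn = A @ x + B @ u
            un = -K @ xn                 # action the *target* policy would take
            z = np.vstack([x, u]).ravel()
            zn = np.vstack([xn, un]).ravel()
            Phi.append(svec(z) - svec(zn))            # Bellman-residual features
            cost.append(float(x.T @ Q @ x + u.T @ R @ u))
            x = xn
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
    H = unsvec(theta, 3)                 # learned Q-function: [x;u]^T H [x;u]
    K = np.linalg.solve(H[2:, 2:], H[2:, :2])         # policy improvement u = -K x
print("learned gain K ~", K.round(3))

The learned gain here plays the role that the optimized impedance parameters play in the outer loop: it is improved from logged trajectories alone, without running the policy being evaluated on the plant during learning.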
