A journal of the IEEE and the CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 9, Issue 9, Sep. 2022

IEEE/CAA Journal of Automatica Sinica

Citation: Z. Y. Zhang, Z. B. Mo, Y. T. Chen, and J. Huang, “Reinforcement learning behavioral control for nonlinear autonomous system,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 9, pp. 1561–1573, Sept. 2022. doi: 10.1109/JAS.2022.105797

Reinforcement Learning Behavioral Control for Nonlinear Autonomous System

doi: 10.1109/JAS.2022.105797
Funds: This work was supported in part by the National Natural Science Foundation of China (61603094)
  • Abstract: Behavior-based autonomous systems rely on human intelligence to resolve multi-mission conflicts by designing mission priority rules and nonlinear controllers. In this work, a novel two-layer reinforcement learning behavioral control (RLBC) method is proposed to reduce such dependence through trial-and-error learning. Specifically, in the upper layer, a reinforcement learning mission supervisor (RLMS) is designed to learn the optimal mission priority. Compared with existing mission supervisors, the RLMS improves the dynamic performance of mission priority adjustment by maximizing cumulative rewards and reduces hardware storage demand when using neural networks. In the lower layer, a reinforcement learning controller (RLC) is designed to learn the optimal control policy. Compared with existing behavioral controllers, the RLC reduces the control cost of mission priority adjustment by balancing control performance and consumption. All error signals are proved to be semi-globally uniformly ultimately bounded (SGUUB). Simulation results show that the number of mission priority adjustments and the control cost are significantly reduced compared with those of some existing mission supervisors and behavioral controllers, respectively.
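    The upper-layer RLMS is only summarized here. As a rough illustration of the idea of learning a mission priority by maximizing cumulative reward rather than by hand-written priority rules, the sketch below uses plain tabular Q-learning. The state encoding, the set of priority orderings, the reward signal, and all hyperparameters are assumptions made for illustration; they are not the paper's implementation.

    ```python
    import numpy as np

    # Minimal sketch (assumed setup, not the authors' RLMS): a tabular
    # Q-learning mission supervisor that picks a mission-priority ordering
    # (the "action") for the current conflict situation (the "state"),
    # learning from a scalar reward instead of hand-crafted rules.

    N_STATES = 16        # discretized mission-conflict situations (assumed)
    N_ORDERINGS = 6      # e.g., the 3! priority orderings of 3 missions (assumed)
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

    Q = np.zeros((N_STATES, N_ORDERINGS))

    def select_priority(state: int) -> int:
        """Epsilon-greedy choice of a mission-priority ordering."""
        if np.random.rand() < EPS:
            return np.random.randint(N_ORDERINGS)
        return int(np.argmax(Q[state]))

    def update(state: int, ordering: int, reward: float, next_state: int) -> None:
        """One-step Q-learning update toward maximum cumulative reward."""
        td_target = reward + GAMMA * np.max(Q[next_state])
        Q[state, ordering] += ALPHA * (td_target - Q[state, ordering])
    ```

    In such a scheme the table (or, as in the paper, a neural approximation of it) is learned off-line, so that the on-line supervisor only performs a cheap lookup when adjusting mission priorities.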

     

  • [1]
    H. Wang, H. Zhao, J. Zhang, D. Ma, J. Li, and J. Wei, “Survey on unmanned aerial vehicle networks: A cyber physical system perspective,” IEEE Communications Surveys &Tutorials, vol. 22, no. 2, pp. 1027–1070, 2019.
    [2]
    Y. Cao, W. Yu, W. Ren, and G. Chen, “An overview of recent progress in the study of distributed multi-agent coordination,” IEEE Trans. Industrial informatics, vol. 9, no. 1, pp. 427–438, 2012.
    [3]
    K. K. Oh, M. C. Park, and H. S. Ahn, “A survey of multi-agent formation control,” Automatica, vol. 53, pp. 424–440, 2015. doi: 10.1016/j.automatica.2014.10.022
    [4]
    H. Yang and J. Liu, “An adaptive rbf neural network control method for a class of nonlinear systems,” IEEE/CAA J. Autom. Sinica, vol. 5, no. 2, pp. 457–462, 2018. doi: 10.1109/JAS.2017.7510820
    [5]
    J. Lu, Q. Wei, and F. Wang, “Parallel control for optimal tracking via adaptive dynamic programming,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 6, pp. 1662–1674, 2020. doi: 10.1109/JAS.2020.1003426
    [6]
    M. Tipaldi and L. Glielmo, “A survey on model-based mission planning and execution for autonomous spacecraft,” IEEE Systems Journal, vol. 12, no. 4, pp. 3893–3905, 2017.
    [7]
    L. Garattoni and M. Birattari, “Autonomous task sequencing in a robot swarm,” Science Robotics, vol. 3, no. 20, 2018.
    [8]
    H. Ueno and Y. Saito, “Model-based vision and intelligent task scheduling for autonomous human-type robot arm,” Robotics and Autonomous Systems, vol. 18, no. 1-2, pp. 195–206, 1996. doi: 10.1016/0921-8890(95)00077-1
    [9]
    C. Ott, A. Dietrich, and A. Albu-Schäffer, “Prioritized multi-task compliance control of redundant manipulators,” Automatica, vol. 53, pp. 416–423, 2015. doi: 10.1016/j.automatica.2015.01.015
    [10]
    R. Brooks, “A robust layered control system for a mobile robot,” IEEE Journal on Robotics and Automation, vol. 2, no. 1, pp. 14–23, 1986. doi: 10.1109/JRA.1986.1087032
    [11]
    R. C. Arkin, “Motor schema based mobile robot navigation,” The Int. Journal of Robotics Research, vol. 8, no. 4, pp. 92–112, 1989. doi: 10.1177/027836498900800406
    [12]
    T. Balch and R. C. Arkin, “Behavior-based formation control for multirobot teams,” IEEE Trans. Robotics and Automation, vol. 14, no. 6, pp. 926–939, 1998. doi: 10.1109/70.736776
    [13]
    G. Antonelli and S. Chiaverini, “Kinematic control of platoons of autonomous vehicles,” IEEE Trans. Robotics, vol. 22, no. 6, pp. 1285–1292, 2006. doi: 10.1109/TRO.2006.886272
    [14]
    G. Antonelli, F. Arrichiello, and S. Chiaverini, “The null-space-based behavioral control for autonomous robotic systems,” Intelligent Service Robotics, vol. 1, no. 1, pp. 27–39, 2008. doi: 10.1007/s11370-007-0002-3
    [15]
    A. Marino, L. E. Parker, G. Antonelli, and F. Caccavale, “A decentralized architecture for multi-robot systems based on the null-space-behavioral control with application to multi-robot border patrolling,” Journal of Intelligent &Robotic Systems, vol. 71, no. 3, pp. 423–444, 2013.
    [16]
    L. Moreno, E. Moraleda, M. Salichs, J. Pimentel, and A. de la Escalera, “Fuzzy supervisor for behavioral control of autonomous systems,” in Proc. IECON’93-19th Annu. Conf. IEEE Industrial Electronics, pp. 258–261, 1993.
    [17]
    A. Marino, F. Caccavale, L. E. Parker, and G. Antonelli, “Fuzzy behavioral control for multi-robot border patrol,” in Proc. IEEE 17th Mediterranean Conf. Control and Automation, pp. 246–251, 2009.
    [18]
    Y. Chen, Z. Zhang, and J. Huang, “Dynamic task priority planning for null-space behavioral control of multi-agent systems,” IEEE Access, vol. 8, pp. 149643–149651, 2020. doi: 10.1109/ACCESS.2020.3016347
    [19]
    J. Chen, M. Gan, J. Huang, L. Dou, and H. Fang, “Formation control of multiple Euler-Lagrange systems via null-space-based behavioral control,” Science China Information Sciences, vol. 59, no. 1, pp. 1–11, 2016.
    [20]
    M. C. P. Santos, C. D. Rosales, M. Sarcinelli-Filho, and R. Carelli, “A novel null-space-based UAV trajectory tracking controller with collision avoidance,” IEEE/ASME Trans. Mechatronics, vol. 22, no. 6, pp. 2543–2553, 2017. doi: 10.1109/TMECH.2017.2752302
    [21]
    J. Huang, N. Zhou, and M. Cao, “Adaptive fuzzy behavioral control of second-order autonomous agents with prioritized missions: Theory and experiments,” IEEE Trans. Industrial Electronics, vol. 66, no. 12, pp. 9612–9622, 2019. doi: 10.1109/TIE.2019.2892669
    [22]
    N. Zhou, X. Cheng, Z. Sun, and Y. Xia, “Fixed-time cooperative behavioral control for networked autonomous agents with second-order nonlinear dynamics,” IEEE Trans. Cybernetics, 2021. DOI: 10.1109/TCYB.2021.3057219.
    [23]
    F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. John Wiley & Sons, 2012.
    [24]
    B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, 2017.
    [25]
    D. Liu, S. Xue, B. Zhao, B. Luo, and Q. Wei, “Adaptive dynamic programming for control: A survey and recent advances,” IEEE Trans. Systems, Man, and Cybernetics: Systems, 2020.
    [26]
    V. G. Lopez and F. L. Lewis, “Dynamic multiobjective control for continuous-time systems using reinforcement learning,” IEEE Trans. Automatic Control, vol. 64, no. 7, pp. 2869–2874, 2018.
    [27]
    M. Mazouchi, Y. Yang, and H. Modares, “Data-driven dynamic multiobjective optimal control: An aspiration-satisfying reinforcement learning approach,” IEEE Trans. Neural Networks and Learning Systems, 2021. DOI: 10.1109/TNNLS.2021.3072571.
    [28]
    K. Baizid, G. Giglio, F. Pierri, M. A. Trujillo, G. Antonelli, F. Caccavale, A. Viguria, S. Chiaverini, and A. Ollero, “Behavioral control of unmanned aerial vehicle manipulator systems,” Autonomous Robots, vol. 41, no. 5, pp. 1203–1220, 2017. doi: 10.1007/s10514-016-9590-0
    [29]
    A. Mustafa, N. K. Dhar, and N. K. Verma, “Event-triggered sliding mode control for trajectory tracking of nonlinear systems,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 307–314, 2019.
    [30]
    C. Silvestre, R. Cunha, N. Paulino, and A. Pascoal, “A bottom-following preview controller for autonomous underwater vehicles,” IEEE Trans. Control Systems Technology, vol. 17, no. 2, pp. 257–266, 2008.
    [31]
    J. Funke, M. Brown, S. M. Erlien, and J. C. Gerdes, “Collision avoidance and stabilization for autonomous vehicles in emergency scenarios,” IEEE Trans. Control Systems Technology, vol. 25, no. 4, pp. 1204–1216, 2016.
    [32]
    B. Wang and Y. Zhang, “An adaptive fault-tolerant sliding mode control allocation scheme for multirotor helicopter subject to simultaneous actuator faults,” IEEE Trans. Industrial Electronics, vol. 65, no. 5, pp. 4227–4236, 2017.
    [33]
    J. N. Franklin, Matrix Theory. Courier Corporation, 2012.
    [34]
    G. Wen, C. P. Chen, and B. Li, “Optimized formation control using simplified reinforcement learning for a class of multiagent systems with unknown dynamics,” IEEE Trans. Industrial Electronics, vol. 67, no. 9, pp. 7879–7888, 2019.
    [35]
    G. Wen, C. P. Chen, J. Feng, and N. Zhou, “Optimized multi-agent formation control based on an identifier-actor-critic reinforcement learning algorithm,” IEEE Trans. Fuzzy Systems, vol. 26, no. 5, pp. 2719–2731, 2017.
    [36]
    S. S. Ge and C. Wang, “Adaptive neural control of uncertain mimo nonlinear systems,” IEEE Trans. Neural Networks, vol. 15, no. 3, pp. 674–692, 2004. doi: 10.1109/TNN.2004.826130
    [37]
    V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236
    [38]
    T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” arXiv preprint arXiv: 1511.05952, 2015.
    [39]
    Z. Wang, T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas, “Dueling network architectures for deep reinforcement learning,” in Proc. Int. Conf. Machine Learning, pp. 1995–2003, PMLR, 2016.
    [40]
    R. W. Beard, G. N. Saridis, and J. T. Wen, “Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation,” Automatica, vol. 33, no. 12, pp. 2159–2177, 1997. doi: 10.1016/S0005-1098(97)00128-3
    [41]
    B. Kosko, “Fuzzy systems as universal approximators,” IEEE Trans. Computers, vol. 43, no. 11, pp. 1329–1333, 1994. doi: 10.1109/12.324566
    [42]
    W. He, S. S. Ge, Y. Li, E. Chew, and Y. S. Ng, “Impedance control of a rehabilitation robot for interactive training,” in Proc. Int. Conf. Social Robotics, pp. 526–535, Springer, 2012.
    [43]
    H. Lin, B. Zhao, D. Liu, and C. Alippi, “Data-based fault tolerant control for affine nonlinear systems through particle swarm optimized neural networks,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 954–964, 2020. doi: 10.1109/JAS.2020.1003225
    [44]
    M. A. Johnson and M. H. Moradi, PID Control. Springer, 2005.
    [45]
    G. Wen, C. P. Chen, S. S. Ge, H. Yang, and X. Liu, “Optimized adaptive nonlinear tracking control using actor-critic reinforcement learning strategy,” IEEE Trans. Industrial Informatics, vol. 15, no. 9, pp. 4969–4977, 2019. doi: 10.1109/TII.2019.2894282
    [46]
    Y. Liu, X. Liu, Y. Jing, and Z. Zhang, “A novel finite-time adaptive fuzzy tracking control scheme for nonstrict feedback systems,” IEEE Trans. Fuzzy Systems, vol. 27, no. 4, pp. 646–658, 2018.


    Highlights

    • Behavior-based autonomous systems rely on human intelligence to resolve multi-mission conflicts by designing mission priority rules and nonlinear controllers. In this work, a novel two-layer reinforcement learning behavioral control (RLBC) method is proposed to reduce such dependence through trial-and-error learning.
    • In the upper layer, a novel reinforcement learning mission supervisor (RLMS) is proposed to learn the optimal mission priority. Compared with existing mission supervisors, the proposed RLMS avoids manually designed mission priority adjustment rules and improves dynamic adjustment performance through trial-and-error learning. Moreover, the RLMS reduces hardware requirements by shifting the heavy computational burden to the off-line training process.
    • In the lower layer, a reinforcement learning controller (RLC) with an identifier-actor-critic structure is designed to track the reference trajectory optimally. Compared with existing behavioral controllers, the RLC ensures robustness and optimality by learning the nonlinear model and the optimal tracking control policy, respectively. In addition, the RLC effectively reduces the control cost of mission priority switching (a minimal actor-critic sketch is given after this list).
    • The tracking error and the network weight errors of the identifier, actor, and critic are all proved to be semi-globally uniformly ultimately bounded (SGUUB) using Lyapunov theory. The general paradigm of mission stability with different priorities is given through mathematical induction. Both mission completion and control objective achievement are guaranteed theoretically.
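
    As a loose illustration of the lower-layer idea, an actor-critic pair that trades tracking performance against control effort, the sketch below runs on a toy scalar system. The dynamics, quadratic critic, Gaussian exploration policy, reward weights, and step sizes are all assumptions for this demo; the paper's RLC additionally uses a neural identifier and treats general nonlinear dynamics.

    ```python
    import numpy as np

    # Illustrative sketch (assumed setup, not the paper's RLC): a one-step
    # actor-critic learning a state-feedback tracking policy for the toy
    # system x_{k+1} = x_k + dt * u_k, rewarding small tracking error and
    # small control effort.

    rng = np.random.default_rng(0)
    dt, ref = 0.05, 1.0                  # step size and constant reference (assumed)
    w_c = 0.0                            # critic weight, V(e) = w_c * e**2
    k_a, sigma = 0.0, 0.5                # actor: u ~ N(-k_a * e, sigma**2)
    alpha_c, alpha_a, gamma = 0.05, 0.01, 0.95

    x = 0.0
    for step in range(20000):
        e = x - ref                              # tracking error
        u = -k_a * e + sigma * rng.standard_normal()
        x_next = x + dt * u                      # assumed simple dynamics
        e_next = x_next - ref
        reward = -(e**2 + 0.1 * u**2)            # balance performance and effort
        # TD error with the quadratic critic V(e) = w_c * e**2
        delta = reward + gamma * w_c * e_next**2 - w_c * e**2
        w_c += alpha_c * delta * e**2            # critic: semi-gradient TD(0)
        # actor: policy-gradient step; d/dk_a of log N(u; -k_a*e, sigma^2)
        # equals -(u + k_a*e) * e / sigma^2
        k_a += alpha_a * delta * (-(u + k_a * e) * e / sigma**2)
        x = x_next

    print(f"learned feedback gain k_a ≈ {k_a:.2f}")
    ```

    The reward's error and effort terms play the role of the control-performance/consumption trade-off described above; in the paper this trade-off is optimized for the full nonlinear system rather than a hand-picked toy model.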
