Citation: A. Perrusquía and W. Guo, “Optimal control of nonlinear systems using experience inference human-behavior learning,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 1, pp. 90–102, Jan. 2023. doi: 10.1109/JAS.2023.123009
[1] D. Liberzon, Calculus of Variations and Optimal Control Theory. Princeton, NJ, USA: Princeton University Press, 2011.
[2] F. L. Lewis, Optimal Control. New York, NY, USA: Wiley, 2012.
[3] A. Perrusquía and W. Yu, “Discrete-time H2 neural control using reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, pp. 1–11, 2020.
[4] J.-H. Kim and F. L. Lewis, “Model-free H∞ control design for unknown linear discrete-time systems via Q-learning with LMI,” Automatica, vol. 46, pp. 1320–1326, 2010. doi: 10.1016/j.automatica.2010.05.002
[5] F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and feedback control using natural decision methods to design optimal adaptive controllers,” IEEE Control Systems Magazine, vol. 32, no. 6, pp. 76–105, 2012.
[6] Z.-P. Jiang, T. Bian, and W. Gao, “Learning-based control: A tutorial and some recent results,” Foundations and Trends® in Systems and Control, vol. 8, no. 3, 2020.
[7] S. Tu and B. Recht, “The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint,” in Proc. Conf. Learning Theory, PMLR, 2019, pp. 3036–3083.
[8] M. Palanisamy, H. Modares, F. L. Lewis, and M. Aurangzeb, “Continuous-time Q-learning for infinite-horizon discounted cost linear quadratic regulator problems,” IEEE Trans. Cybernetics, vol. 45, no. 2, pp. 165–176, 2015. doi: 10.1109/TCYB.2014.2322116
[9] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, 2018. doi: 10.1109/TNNLS.2017.2773458
[10] B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour, and M.-B. Naghibi-Sistani, “Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics,” Automatica, vol. 50, pp. 1167–1175, 2014. doi: 10.1016/j.automatica.2014.02.015
[11] H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, “Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems,” Automatica, vol. 50, no. 1, pp. 193–202, 2014. doi: 10.1016/j.automatica.2013.09.043
[12] H. Modares and F. L. Lewis, “Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning,” IEEE Trans. Automatic Control, vol. 59, no. 11, pp. 3051–3056, 2014. doi: 10.1109/TAC.2014.2317301
[13] A. Perrusquía and W. Yu, “Robot position/force control in unknown environment using hybrid reinforcement learning,” Cybernetics and Systems, vol. 51, no. 4, pp. 542–560, 2020. doi: 10.1080/01969722.2020.1758466
[14] Q. Xie, B. Luo, and F. Tan, “Discrete-time LQR optimal tracking control problems using approximate dynamic programming algorithm with disturbance,” in Proc. IEEE 4th Int. Conf. Intelligent Control and Information Processing, 2013, pp. 716–721.
[15] D. Vrabie and F. L. Lewis, “Neural networks approach for continuous-time direct adaptive optimal control for partially unknown nonlinear systems,” Neural Networks, vol. 22, pp. 237–246, 2009. doi: 10.1016/j.neunet.2009.03.008
[16] L. Buşoniu, R. Babuška, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press, 2010.
[17] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
[18] A. Perrusquía and W. Yu, “Neural H2 control using continuous-time reinforcement learning,” IEEE Trans. Cybernetics, pp. 1–10, 2020.
[19] M. Wiering and M. van Otterlo, Reinforcement Learning: State-of-the-Art. Springer, 2012.
[20] I. Grondman, L. Buşoniu, G. A. Lopes, and R. Babuška, “A survey of actor-critic reinforcement learning: Standard and natural policy gradients,” IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 42, no. 6, pp. 1291–1307, 2012. doi: 10.1109/TSMCC.2012.2218595
[21] A. Perrusquía and W. Yu, “Continuous-time reinforcement learning for robust control under worst-case uncertainty,” Int. Journal of Systems Science, vol. 52, no. 4, pp. 770–784, 2021. doi: 10.1080/00207721.2020.1839142
[22] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof,” IEEE Trans. Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 4, pp. 943–949, 2008. doi: 10.1109/TSMCB.2008.926614
[23] A. Gheibi, A. Ghiasi, S. Ghaemi, and M. Badamchizadeh, “Designing of robust adaptive passivity-based controller based on reinforcement learning for nonlinear port-Hamiltonian model with disturbance,” Int. Journal of Control, vol. 93, no. 8, pp. 1754–1764, 2020. doi: 10.1080/00207179.2018.1532607
[24] M. Tomás-Rodríguez and S. P. Banks, Linear, Time-Varying Approximations to Nonlinear Dynamical Systems: With Applications in Control and Optimization. Springer Science & Business Media, 2010, vol. 400.
[25] C. Wang, Y. Li, S. S. Ge, and T. H. Lee, “Optimal critic learning for robot control in time-varying environments,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2301–2310, 2015. doi: 10.1109/TNNLS.2014.2378812
[26] R. Kamalapurkar, P. Walters, and W. Dixon, “Model-based reinforcement learning for approximate optimal regulation,” Automatica, vol. 64, pp. 94–104, 2016. doi: 10.1016/j.automatica.2015.10.039
[27] A. Perrusquía, W. Yu, and A. Soria, “Position/force control of robot manipulators using reinforcement learning,” Industrial Robot: The International Journal of Robotics Research and Application, vol. 46, no. 2, pp. 267–280, 2019. doi: 10.1108/IR-10-2018-0209
[28] H. Zhang, D. Liu, Y. Luo, and D. Wang, Adaptive Dynamic Programming for Control. London, U.K.: Springer-Verlag, 2013.
[29] B. Kiumarsi and F. L. Lewis, “Actor-critic based optimal tracking for partially unknown nonlinear discrete-time systems,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 1, pp. 140–151, 2015. doi: 10.1109/TNNLS.2014.2358227
[30] C. Mu, Z. Ni, C. Sun, and H. He, “Air-breathing hypersonic vehicle tracking control based on adaptive dynamic programming,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 3, pp. 584–598, 2016.
[31] C. Mu, Z. Ni, C. Sun, and H. He, “Data-driven tracking control with adaptive dynamic programming for a class of continuous-time nonlinear systems,” IEEE Trans. Cybernetics, vol. 47, no. 6, pp. 1460–1470, 2016.
[32] B. Pang and Z.-P. Jiang, “Robust reinforcement learning: A case study in linear quadratic regulation,” arXiv preprint arXiv:2008.11592, 2020.
[33] K. G. Vamvoudakis, “Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach,” Systems & Control Letters, pp. 14–20, 2017.
[34] F. L. Lewis, S. Jagannathan, and A. Yeşildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems. Taylor & Francis, 1999.
[35] A. Perrusquía and W. Yu, “Identification and optimal control of nonlinear systems using recurrent neural networks and reinforcement learning: An overview,” Neurocomputing, vol. 438, pp. 145–154, 2021. doi: 10.1016/j.neucom.2021.01.096
[36] J. Y. Lee, J. B. Park, and Y. H. Choi, “Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 5, 2015.
[37] A. Perrusquía, W. Yu, and X. Li, “Multi-agent reinforcement learning for redundant robot control in task-space,” Int. Journal of Machine Learning and Cybernetics, vol. 12, no. 1, pp. 231–241, 2021. doi: 10.1007/s13042-020-01167-7
[38] V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236
[39] A. Perrusquía and W. Yu, “Robust control under worst-case uncertainty for unknown nonlinear systems using modified reinforcement learning,” Int. Journal of Robust and Nonlinear Control, vol. 30, no. 7, pp. 2920–2936, 2020. doi: 10.1002/rnc.4911
[40] J. Ramírez, W. Yu, and A. Perrusquía, “Model-free reinforcement learning from expert demonstrations: A survey,” Artificial Intelligence Review, pp. 1–29, 2021.
[41] A. Perrusquía, W. Yu, and X. Li, “Nonlinear control using human behavior learning,” Information Sciences, vol. 569, pp. 358–375, 2021. doi: 10.1016/j.ins.2021.03.043
[42] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, “Building machines that learn and think like people,” Behavioral and Brain Sciences, vol. 40, 2017.
[43] D. Kumaran, D. Hassabis, and J. L. McClelland, “What learning systems do intelligent agents need? Complementary learning systems theory updated,” Trends in Cognitive Sciences, vol. 20, no. 7, pp. 512–534, 2016. doi: 10.1016/j.tics.2016.05.004
[44] A. Perrusquía, “A complementary learning approach for expertise transference of human-optimized controllers,” Neural Networks, vol. 145, pp. 33–41, 2021.
[45] R. C. O’Reilly, R. Bhattacharyya, M. D. Howard, and N. Ketz, “Complementary learning systems,” Cognitive Science, vol. 38, no. 6, pp. 1229–1248, 2014. doi: 10.1111/j.1551-6709.2011.01214.x
[46] H. F. Ólafsdóttir, D. Bush, and C. Barry, “The role of hippocampal replay in memory and planning,” Current Biology, vol. 28, no. 1, pp. R37–R50, 2018. doi: 10.1016/j.cub.2017.10.073
[47] M. G. Mattar and N. D. Daw, “Prioritized memory access explains planning and hippocampal replay,” Nature Neuroscience, vol. 21, no. 11, pp. 1609–1617, 2018. doi: 10.1038/s41593-018-0232-z
[48] A. Perrusquía, “Human-behavior learning: A new complementary learning perspective for optimal decision making controllers,” Neurocomputing, 2022.
[49] A. Vilà-Balló, E. Mas-Herrero, P. Ripollés, et al., “Unraveling the role of the hippocampus in reversal learning,” Journal of Neuroscience, vol. 37, no. 28, pp. 6686–6697, 2017. doi: 10.1523/JNEUROSCI.3212-16.2017
[50] K. L. Stachenfeld, M. M. Botvinick, and S. J. Gershman, “The hippocampus as a predictive map,” Nature Neuroscience, vol. 20, no. 11, pp. 1643–1653, 2017. doi: 10.1038/nn.4650
[51] S. Blakeman and D. Mareschal, “A complementary learning systems approach to temporal difference learning,” Neural Networks, vol. 122, pp. 218–230, 2020. doi: 10.1016/j.neunet.2019.10.011
[52] W. Schultz, P. Apicella, E. Scarnati, and T. Ljungberg, “Neuronal activity in monkey ventral striatum related to the expectation of reward,” Journal of Neuroscience, vol. 12, no. 12, pp. 4595–4610, 1992. doi: 10.1523/JNEUROSCI.12-12-04595.1992
[53] J. L. McClelland, B. L. McNaughton, and R. C. O’Reilly, “Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory,” Psychological Review, vol. 102, no. 3, p. 419, 1995.
[54] K. G. Vamvoudakis and F. L. Lewis, “On-line actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica, vol. 46, pp. 878–888, 2010. doi: 10.1016/j.automatica.2010.02.018
[55] T. Cimen, “Survey of state-dependent Riccati equation in nonlinear optimal feedback control synthesis,” Journal of Guidance, Control, and Dynamics, vol. 35, no. 4, pp. 1025–1047, 2012. doi: 10.2514/1.55821
[56] N. Babaei and M. U. Salamci, “State dependent Riccati equation based model reference adaptive control design for nonlinear systems,” in Proc. IEEE XXIV Int. Conf. Information, Communication and Automation Technologies, 2013, pp. 1–8.
[57] N. Babaei and M. U. Salamci, “State dependent Riccati equation based model reference adaptive stabilization of nonlinear systems with application to cancer treatment,” IFAC Proceedings Volumes, vol. 47, no. 3, pp. 1296–1301, 2014. doi: 10.3182/20140824-6-ZA-1003.02282
[58] S. R. Nekoo, “Model reference adaptive state-dependent Riccati equation control of nonlinear uncertain systems: Regulation and tracking of free-floating space manipulators,” Aerospace Science and Technology, vol. 84, pp. 348–360, 2019. doi: 10.1016/j.ast.2018.10.005
[59] N. T. Nguyen, “Model-reference adaptive control,” in Model-Reference Adaptive Control. Springer, 2018, pp. 83–123.
[60] J. R. Cloutier, D. T. Stansbery, and M. Sznaier, “On the recoverability of nonlinear state feedback laws by extended linearization control techniques,” in Proc. IEEE American Control Conf. (Cat. No. 99CH36251), vol. 3, 1999, pp. 1515–1519.
[61] H. Modares, F. L. Lewis, and Z.-P. Jiang, “H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2550–2562, 2015. doi: 10.1109/TNNLS.2015.2441749
[62] C.-Y. Lee and J.-J. Lee, “Adaptive control for uncertain nonlinear systems based on multiple neural networks,” IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 34, no. 1, pp. 325–333, 2004. doi: 10.1109/TSMCB.2003.811520
[63] D. Luviano and W. Yu, “Continuous-time path planning for multi-agents with fuzzy reinforcement learning,” Journal of Intelligent & Fuzzy Systems, vol. 33, pp. 491–501, 2017.
[64] W. Yu and A. Perrusquía, “Simplified stable admittance control using end-effector orientations,” Int. Journal of Social Robotics, vol. 12, no. 5, pp. 1061–1073, 2020. doi: 10.1007/s12369-019-00579-y
[65] M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control. John Wiley & Sons, 2020.