A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 1 Issue 4
Oct.  2014

IEEE/CAA Journal of Automatica Sinica

Qiming Zhao, Hao Xu and Sarangapani Jagannathan, "Near Optimal Output Feedback Control of Nonlinear Discrete-time Systems Based on Reinforcement Neural Network Learning," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 4, pp. 372-384, 2014.

Near Optimal Output Feedback Control of Nonlinear Discrete-time Systems Based on Reinforcement Neural Network Learning

Funds:

This work was supported by the National Science Foundation (NSF), Division of Electrical, Communications and Cyber Systems (ECCS) (1128281), and the Missouri S&T University Intelligent System Center.

  • This paper considers output-feedback-based finite-horizon near-optimal regulation of nonlinear affine discrete-time systems with unknown system dynamics, using neural networks (NNs) to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. First, an NN-based Luenberger observer is proposed to reconstruct both the system states and the control coefficient matrix. Next, a reinforcement learning methodology with an actor-critic structure uses an NN to approximate the time-varying solution of the HJB equation, referred to as the value function. To properly satisfy the terminal constraint, a new error term is defined and incorporated in the NN update law so that the terminal constraint error is also minimized over time. An NN with constant weights and a time-dependent activation function approximates the time-varying value function, which is subsequently used to generate the finite-horizon control policy; the policy is near optimal rather than optimal because of NN reconstruction errors. The proposed scheme operates forward in time without an offline training phase. Lyapunov analysis is used to investigate the stability of the overall closed-loop system. Simulation results are given to show the effectiveness and feasibility of the proposed method.
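The idea of a critic with constant weights and a time-dependent activation function, fitted against both a Bellman error and a terminal constraint error, can be illustrated with a small toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar linear plant with known dynamics (`A`, `B`), the fixed feedback gain `K`, the cost weights `Q`, `R`, `S`, the polynomial-in-time-to-go basis `sigma`, and the use of batch gradient descent in place of the paper's online update law. The observer and the actor network are omitted entirely.

```python
# Illustrative sketch only: toy plant and hand-picked basis, not the paper's design.

A, B = 0.9, 0.5          # assumed known toy dynamics: x_{k+1} = A*x_k + B*u_k
Q, R, S = 1.0, 1.0, 2.0  # assumed stage-cost weights and terminal-cost weight
N = 10                   # horizon length
K = 0.4                  # assumed fixed feedback gain: u_k = -K*x_k


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sigma(x, k):
    """Time-dependent activation: depends on the state and the time-to-go."""
    tau = (N - k) / N
    return [x * x, x * x * tau, x * x * tau * tau]


def value(w, x, k):
    """Value estimate V(x, k) = w^T sigma(x, k) with constant weights w."""
    return dot(w, sigma(x, k))


def rollout(x0):
    """Simulate the closed loop for N steps and return x_0 .. x_N."""
    xs = [x0]
    for _ in range(N):
        x = xs[-1]
        xs.append(A * x + B * (-K * x))
    return xs


def residual_rows(xs):
    """Build (regressor, target) pairs so that residual = regressor^T w - target."""
    rows = []
    for k in range(N):
        x, x_next = xs[k], xs[k + 1]
        u = -K * x
        cost = Q * x * x + R * u * u
        # Bellman residual: V(x_k, k) - V(x_{k+1}, k+1) - cost
        phi = [a - b for a, b in zip(sigma(x, k), sigma(x_next, k + 1))]
        rows.append((phi, cost))
    # Terminal-constraint residual: V(x_N, N) - S*x_N^2
    x_final = xs[N]
    rows.append((sigma(x_final, N), S * x_final * x_final))
    return rows


def loss(w, rows):
    """Half the sum of squared Bellman and terminal residuals."""
    return 0.5 * sum((dot(phi, w) - y) ** 2 for phi, y in rows)


def fit(rows, steps=500):
    """Batch gradient descent on the combined residual loss."""
    w = [0.0, 0.0, 0.0]
    # Step size bounded by 1/trace of the Gram matrix, so the loss cannot increase.
    eta = 1.0 / sum(dot(phi, phi) for phi, _ in rows)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for phi, y in rows:
            e = dot(phi, w) - y
            for i in range(3):
                grad[i] += phi[i] * e
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    rows = residual_rows(rollout(1.0))
    w = fit(rows)
    print("weights:", w)
    print("combined loss:", loss(w, rows))
```

Because the time-to-go is zero at the final step, only the first basis term survives there, so the terminal residual acts directly on the first weight while the Bellman residuals shape the remaining ones. Treating both residuals in one loss is a simplification chosen for this sketch.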

     

