Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games

Rushikesh Kamalapurkar; Justin R. Klotz; Warren E. Dixon

Volume 1 Issue 3

Jul. 2014

IEEE/CAA Journal of Automatica Sinica

JCR Impact Factor: 15.3, Top 1 (SCI Q1)

CiteScore: 23.5, Top 2% (Q1)
Google Scholar h5-index: 77， TOP 5

Turn off MathJax

Article Contents

Article Navigation > IEEE/CAA Journal of Automatica Sinica > 2014 > 1(3): 239-247

Rushikesh Kamalapurkar, Justin R. Klotz and Warren E. Dixon, "Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 239-247, 2014.

Citation:

Rushikesh Kamalapurkar, Justin R. Klotz and Warren E. Dixon, "Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 239-247, 2014.

Citation:

Rushikesh Kamalapurkar, Justin R. Klotz and Warren E. Dixon, "Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 239-247, 2014.

PDF( 1497 KB)

Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games

1. Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville 32608, USA;
2. Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville 32608, USA;
3. Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville 32608, USA

Funds:

This work was supported by National Science Foundation Award (1161260, 1217908), Office of Naval Research (N00014-13-1-0151), and a contract with the Air Force Research Laboratory Mathematical Modeling and Optimization Institute. Recommended by Associate Editor Zhongsheng Hou

Abstract

Abstract

This paper presents a concurrent learning-based actor-critic-identifier architecture to obtain an approximate feedback-Nash equilibrium solution to an infinite horizon N-player nonzero-sum differential game. The solution is obtained online for a nonlinear control-affine system with uncertain linearly parameterized drift dynamics. It is shown that under a condition milder than persistence of excitation (PE), uniformly ultimately bounded convergence of the developed control policies to the feedback-Nash equilibrium policies can be established. Simulation results are presented to demonstrate the performance of the developed technique without an added excitation signal.
- Nonlinear system,
- optimal adaptive control,
- dynamic programming,
- data driven control

FullText(HTML)

References(29)

References

[1]	Kirk D E. Optimal Control Theory: An Introduction. New York: Dover Publications, 2004.
[2]	Lewis F L, Vrabie D, Syrmos V L. Optimal Control (Third edition). New York: John Wiley & Sons, 2012.
[3]	Isaacs R. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. New York: Dover Publications, 1999.
[4]	Tijs S. Introduction to Game Theory. Hindustan Book Agency, 2003.
[5]	Basar T, Olsder G. Dynamic Noncooperative Game Theory (Second edition). Philadelphia, PA: SIAM, 1999.
[6]	Nash J. Non-cooperative games. The Annals of Mathematics, 1951, 54(2): 286-295
[7]	Case J H. Toward a theory of many player differential games. SIAM Journal on Control, 1969, 7(2): 179-197
[8]	Starr A W, Ho C Y. Nonzero-sum differential games. Journal of Optimization Theory and Applications, 1969, 3(3): 184-206
[9]	Starr A, Ho C Y. Further properties of nonzero-sum differential games. Journal of Optimization Theory and Applications, 1969, 3(4): 207-219
[10]	Friedman A. Differential Games. New York: John Wiley and Sons, 1971.
[11]	Littman M. Value-function reinforcement learning in Markov games. Cognitive Systems Research, 2001, 2(1): 55-66
[12]	Wei Q L, Zhang H G. A new approach to solve a class of continuoustime nonlinear quadratic zero-sum game using ADP. In: Proceedings of the IEEE International Conference on Networking Sensing and Control. Sanya, China: IEEE, 2008. 507-512
[13]	Vamvoudakis K, Lewis F. Online synchronous policy iteration method for optimal control. In: Proceedings of the Recent Advances in Intelligent Control Systems. London: Springer, 2009. 357-374
[14]	Zhang H G, Wei Q L, Liu D R. An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica, 2010, 47: 207-214
[15]	Zhang X, Zhang H G, Luo Y H, Dong M. Iteration algorithm for solving the optimal strategies of a class of nonaffine nonlinear quadratic zero-sum games. In: Proceedings of the IEEE Conference Decision and Control. Xuzhou, China: IEEE, 2010. 1359-1364
[16]	Vrabie D, Lewis F. Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games. In: Proceedings of the 49th IEEE Conference Decision and Control. Atlanta, GA: IEEE, 2010. 3066-3071
[17]	Vamvoudakis K G, Lewis F L. Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica, 2011, 47(8): 1556-1569
[18]	Zhang H, Liu D, Luo Y, Wang D. Adaptive Dynamic Programming for Control-Algorithms and Stability (Communications and Control Engineering). London: Springer-Verlag, 2013
[19]	Johnson M, Bhasin S, Dixon W E. Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. In: Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference. Orlando, FL, USA: IEEE, 2011. 142-147
[20]	Mehta P, Meyn S. Q-learning and pontryagin's minimum principle. In: Proceedings of the IEEE Conference on Decision and Control. Shanghai, China: IEEE, 2009. 3598-3605
[21]	Vrabie D, Abu-Khalaf M, Lewis F L, Wang Y Y. Continuous-time ADP for linear systems with partially unknown dynamics. In: Proceedings of the IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. Honolulu, HI: IEEE, 2007. 247-253
[22]	Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998. 4
[23]	Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis K, Lewis F L, Dixon W. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica, 2013, 49(1): 89-92
[24]	Chowdhary G, Yucelen T, Mühlegg M, Johnson E N. Concurrent learning adaptive control of linear systems with exponentially convergent bounds. International Journal of Adaptive Control and Signal Processing, 2012, 27(4): 280-301
[25]	Chowdhary G V, Johnson E N. Theory and flight-test validation of a concurrent-learning adaptive controller. Journal of Guidance, Control, and Dynamics, 2011, 34(2): 592-607
[26]	Ioannou P, Sun J. Robust Adaptive Control. New Jersey: Prentice Hall, 1996. 198
[27]	Khalil H K. Nonlinear Systems (Third edition). New Jersey: Prentice Hall, 2002. 172
[28]	Nevistić V, Primbs J A. Constrained nonlinear optimal control: a converse HJB approach. California Institute of Technology, Pasadena, CA 91125, Techical Report CIT-CDS 96-021, 1996. 5
[29]	Savitzky A, Golay M J E. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 1964, 36(8): 1627-1639

Supplements(0)

Cited By

Proportional views

Proportional views

通讯作者: 陈斌, bchen63@163.com

1.
沈阳化工大学材料科学与工程学院沈阳 110142

Get Citation

PDF

XML

Article Metrics

Article views (1242) PDF downloads(29)

Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games

Abstract

References

Proportional views

Catalog

通讯作者: 陈斌, bchen63@163.com

Article Metrics

Export File

Citation

Format

Content