A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 5 Issue 1
Jan.  2018

IEEE/CAA Journal of Automatica Sinica

Citation: Lei Xue, Changyin Sun, Donald Wunsch, Yingjiang Zhou and Fang Yu, "An Adaptive Strategy via Reinforcement Learning for the Prisoner's Dilemma Game," IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 301-310, Jan. 2018. doi: 10.1109/JAS.2017.7510466

An Adaptive Strategy via Reinforcement Learning for the Prisoner's Dilemma Game

doi: 10.1109/JAS.2017.7510466
Funds:

the National Natural Science Foundation (NNSF) of China 61603196

the National Natural Science Foundation (NNSF) of China 61503079

the National Natural Science Foundation (NNSF) of China 61520106009

the National Natural Science Foundation (NNSF) of China 61533008

the Natural Science Foundation of Jiangsu Province of China BK20150851

China Postdoctoral Science Foundation 2015M581842

Jiangsu Postdoctoral Science Foundation 1601259C

Nanjing University of Posts and Telecommunications Science Foundation (NUPTSF) NY215011

Priority Academic Program Development of Jiangsu Higher Education Institutions, the open fund of Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education MCCSE2015B02

the Research Innovation Program for College Graduates of Jiangsu Province CXLX1309

  • The iterated prisoner's dilemma (IPD) is an ideal model for analyzing interactions between agents in complex networks. It has attracted wide interest in the development of novel strategies since the success of tit-for-tat in Axelrod's tournament. This paper studies a new adaptive strategy for the IPD in different complex networks, where agents can learn and adapt their strategies through reinforcement learning. A temporal difference learning method is applied to design the adaptive strategy and optimize the agents' decision-making process. Previous studies indicated that mutual cooperation rarely emerges in the IPD. Therefore, three examples based on a square lattice network and a scale-free network are provided to show two features of the adaptive strategy. First, mutual cooperation can be achieved by a group of adaptive agents in a scale-free network, and once evolution has converged to mutual cooperation, it is unlikely to shift. Second, the adaptive strategy can earn a better payoff than other strategies in the square lattice network. The analytical properties are discussed to verify the evolutionary stability of the adaptive strategy.
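
To make the idea of a temporal difference learner in the IPD concrete, the following is a minimal sketch, not the paper's algorithm. It assumes Axelrod's classic payoff values (T=5, R=3, P=1, S=0), a tabular one-step Q-learning update as the temporal difference rule, a state made of the previous round's two moves, and a single tit-for-tat opponent instead of a network of neighbors. The names `PAYOFF`, `td_update`, and `play_vs_tit_for_tat` are hypothetical and chosen for illustration only.

```python
import random

# Assumed payoffs (Axelrod's classic values), not taken from the paper.
# PAYOFF[(my_move, opp_move)] -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def td_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply a one-step temporal difference (Q-learning) update in place."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


def play_vs_tit_for_tat(rounds=5000, epsilon=0.1, seed=0):
    """Train an epsilon-greedy TD agent against a tit-for-tat opponent."""
    rng = random.Random(seed)
    # State = (agent's last move, opponent's last move).
    states = [(a, b) for a in ACTIONS for b in ACTIONS]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    state = ("C", "C")  # assumed starting point: both cooperated last round
    for _ in range(rounds):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        opp_action = state[0]  # tit-for-tat copies the agent's previous move
        reward = PAYOFF[(action, opp_action)]
        next_state = (action, opp_action)
        td_update(q, state, action, reward, next_state)
        state = next_state
    return q


if __name__ == "__main__":
    q_table = play_vs_tit_for_tat()
    for key in sorted(q_table):
        print(key, round(q_table[key], 2))
```

Under these assumed payoffs and a discount factor of 0.9, always cooperating with tit-for-tat has a discounted return of 3/(1-0.9) = 30, while defecting from the start yields 5 + 0.9·1/(1-0.9) = 14, so a learner of this kind has an incentive to sustain cooperation. The learning rate, exploration rate, and state encoding here are illustrative choices and may differ from those used in the paper.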

     

