A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 9 Issue 5
May 2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 6.171, Top 11% (SCI Q1)
    CiteScore: 11.2, Top 5% (Q1)
    Google Scholar h5-index: 51, Top 8
J. R. Wang, Y. T. Hong, J. L. Wang, J. P. Xu, Y. Tang, Q.-L. Han, and  J. Kurths,  “Cooperative and competitive multi-agent systems: From optimization to games,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 5, pp. 763–783, May 2022. doi: 10.1109/JAS.2022.105506

Cooperative and Competitive Multi-Agent Systems: From Optimization to Games

doi: 10.1109/JAS.2022.105506
Funds:  This work was supported in part by the National Natural Science Foundation of China (Basic Science Center Program: 61988101), the Sino-German Center for Research Promotion (M-0066), the International (Regional) Cooperation and Exchange Project (61720106008), the Programme of Introducing Talents of Discipline to Universities (the 111 Project) (B17017), and the Program of Shanghai Academic Research Leader (20XD1401300)
  • Through mutual collaboration and cooperative optimization, multi-agent systems can address problems in complex systems that are difficult or impossible for a single agent to solve. In a multi-agent system, agents with a degree of autonomy generate complex interactions through correlation and coordination, which manifest as cooperative or competitive behavior. This survey focuses on multi-agent cooperative optimization and on cooperative and non-cooperative games. Starting from cooperative optimization, studies on distributed optimization and federated optimization are summarized: the survey concentrates on distributed online optimization and its application to privacy protection, and reviews federated optimization from the perspective of privacy-protection mechanisms. Cooperative and non-cooperative games are then introduced to extend cooperative optimization in two directions: minimizing global costs and minimizing individual costs, respectively. Multi-agent cooperative and non-cooperative behaviors are modeled as games from both static and dynamic perspectives, according to whether each player can make decisions based on the information of other players. Finally, future directions for cooperative optimization, cooperative/non-cooperative games, and their applications are discussed.

  • Jianrui Wang and Yitian Hong contributed equally to this work.
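As a toy illustration of the distributed-optimization setting the abstract summarizes (a sketch for intuition only, not an algorithm from the paper): each agent holds a private local cost, repeatedly averages its neighbors' estimates through a doubly stochastic mixing matrix, and then takes a gradient step on its own cost. The network, costs, and step size below are all hypothetical.

```python
import numpy as np

# Toy consensus-based distributed optimization: agent i privately holds
# f_i(x) = 0.5 * (x - a_i)^2, so the minimizer of sum_i f_i is mean(a).
def distributed_gradient_step(targets, mixing, steps=200, lr=0.1):
    """Each agent mixes neighbors' estimates via a doubly stochastic
    matrix, then descends its own local gradient."""
    x = np.zeros(len(targets))        # per-agent estimates of the optimum
    for _ in range(steps):
        x = mixing @ x                # consensus step with neighbors
        x = x - lr * (x - targets)    # local gradient step on f_i
    return x

# Four agents on a ring with Metropolis-style doubly stochastic weights.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
a = np.array([1.0, 2.0, 3.0, 4.0])
estimates = distributed_gradient_step(a, W)
# With a constant step size the agents agree only up to a small bias
# around the global minimizer mean(a) = 2.5; a diminishing step size
# removes the bias at the cost of slower convergence.
print(estimates)
```

Distributed online and privacy-preserving variants surveyed in the paper replace the static local gradients above with time-varying losses or perturbed/encrypted exchanges, but keep this mix-then-descend structure.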
  • [1]
    M. Mazouchi, M. B. Naghibi-Sistani, and S. K. H. Sani, “A novel distributed optimal adaptive control algorithm for nonlinear multi-agent differential graphical games,” IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 331–341, Jan. 2018. doi: 10.1109/JAS.2017.7510784
    [2]
    M. Q. Xue, Y. Tang, W. Ren, and F. Qian, “Practical output synchronization for asynchronously switched multi-agent systems with adaption to fast-switching perturbations,” Automatica, vol. 116, p. 108917, Jun. 2020.
    [3]
    X. Jin, Y. Shi, Y. Tang, and X. T. Wu, “Event-triggered attitude consensus with absolute and relative attitude measurements,” Automatica, vol. 122, p. 109245, Dec. 2020.
    [4]
    Y. Tang, J. Kurths, W. Lin, E. Ott, and L. Kocarev, “Introduction to Focus Issue: When machine learning meets complex systems: Networks, chaos, and nonlinear dynamics,” Chaos, vol. 30, no. 6, p. 063151, Jun. 2020.
    [5]
    X. Jin, Y. Shi, Y. Tang, H. Werner, and J. Kurths, “Event-triggered fixed-time attitude consensus with fixed and switching topologies,” IEEE Trans. Autom. Control, 2021.
    [6]
    L. Ding, Q. L. Han, X. H. Ge, and X. M. Zhang, “An overview of recent advances in event-triggered consensus of multi-agent systems,” IEEE Trans. Cybern., vol. 48, no. 4, pp. 1110–1123, Apr. 2018. doi: 10.1109/TCYB.2017.2771560
    [7]
    M. Veres and M. Moussa, “Deep learning for intelligent transportation systems: A survey of emerging trends,” IEEE Trans. Intell. Transport. Syst., vol. 21, no. 8, pp. 3152–3168, Aug. 2020. doi: 10.1109/TITS.2019.2929020
    [8]
    S. Mao, Z. W. Dong, P. Schultz, Y. Tang, K. Meng, Z. Y. Dong, and F. Qian, “A finite-time distributed optimization algorithm for economic dispatch in smart grids,” IEEE Trans. Syst. Man Cybern. Syst., vol. 51, no. 4, pp. 2068–2079, Apr. 2021. doi: 10.1109/TSMC.2019.2931846
    [9]
    A. Singh, T. Jain, and S. Sukhbaatar, “Learning when to communicate at scale in multi-agent cooperative and competitive tasks,” in Proc. 7th Int. Conf. Learning Representations, New Orleans, USA, 2019, pp. 1–16.
    [10]
    J. S. Chen and A. H. Sayed, “Diffusion adaptation strategies for distributed optimization and learning over networks,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4289–4305, Aug. 2012. doi: 10.1109/TSP.2012.2198470
    [11]
    Y. M. Wang, S. X. Wang, and L. Wu, “Distributed optimization approaches for emerging power systems operation: A review,” Electr. Power Syst. Res., vol. 144, pp. 127–135, Mar. 2017. doi: 10.1016/j.jpgr.2016.11.025
    [12]
    L. Ding, L. Y. Wang, G. Y. Yin, W. X. Zheng, and Q. L. Han, “Distributed energy management for smart grids with an event-triggered communication scheme,” IEEE Trans. Control Syst. Tech- nol., vol. 27, no. 5, pp. 1950–1961, Sep. 2019. doi: 10.1109/TCST.2018.2842208
    [13]
    J. Konečnỳ, B. McMahan, and D. Ramage, “Federated optimization: Distributed optimization beyond the datacenter,” arXiv preprint arXiv: 1511.03575, 2015.
    [14]
    T. Bașar and G. J. Olsder, Dynamic Noncooperative Game Theory. 2nd ed. Philadelphia, USA: SIAM, 1998.
    [15]
    E. Semsar-Kazerooni and K. Khorasani, “A game theory approach to multi-agent team cooperation,” in Proc. American Control Conf., St. Louis, USA, 2009, pp. 4512–4518.
    [16]
    T. L. Vincent and G. Leitmann, “Control-space properties of cooperative games,” J. Optim. Theory Appl., vol. 6, no. 2, pp. 91–113, Aug. 1970. doi: 10.1007/BF00927045
    [17]
    L. S. Shapley, “Stochastic games,” Proc. Natl. Acad. Sci., vol. 39, no. 10, pp. 1095–1100, Oct. 1953. doi: 10.1073/pnas.39.10.1095
    [18]
    I. E. Weintraub, M. Pachter, and E. Garcia, “An introduction to pursuit-evasion differential games,” in Proc. American Control Conf., Denver, USA, 2020, pp. 1049–1066.
    [19]
    Z. Zhou, J. H. Huang, J. P. Xu, and Y. Tang, “Two-phase jointly optimal strategies and winning regions of the capture-the-flag game,” in Proc. 47th Annu. Conf. IEEE Industrial Electronics Society, Toronto, Canada, 2021, pp. 1–6.
    [20]
    J. Wang, J. Huang, and Y. Tang, “Swarm intelligence capture-the-flag game with imperfect information based on deep reinforcement learning,” Sci. Sin. Technol., 2021.
    [21]
    S. Zamir, “Bayesian games: Games with incomplete information,” in Complex Social and Behavioral Systems, M. Sotomayor, D. Pérez-Castrillo, F. Castiglione, Eds. New York, USA: Springer, 2020.
    [22]
    D. Y. Sun, X. Huang, Y. H. Liu, and H. Zhong, “Predictable energy aware routing based on dynamic game theory in wireless sensor networks,” Comput. Electric. Eng., vol. 39, no. 6, pp. 1601–1608, Aug. 2013. doi: 10.1016/j.compeleceng.2012.05.007
    [23]
    S. Hart and A. Mas-Colell, “A simple adaptive procedure leading to correlated equilibrium,” Econometrica, vol. 68, no. 5, pp. 1127–1150, Sep. 2000. doi: 10.1111/1468-0262.00153
    [24]
    J. Heinrich, M. Lanctot, and D. Silver, “Fictitious self-play in extensive-form games,” in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 805–813.
    [25]
    S. W. Wang, X. Jin, S. Mao, A. V. Vasilakos, and Y. Tang, “Model-free event-triggered optimal consensus control of multiple Euler-Lagrange systems via reinforcement learning,” IEEE Trans. Netw. Sci. Eng., vol. 8, no. 1, pp. 246–258, Jan.–Mar. 2021. doi: 10.1109/TNSE.2020.3036604
    [26]
    C. Z. Zhang, J. R. Wang, G. G. Yen, C. Q. Zhao, Q. Y. Sun, Y. Tang, F. Qian, and J. Kurths, “When autonomous systems meet accuracy and transferability through AI: A survey,” Patterns, vol. 1, no. 4, p. 100050, Jul. 2020.
    [27]
    H. Tembine, Q. Y. Zhu, and T. Bașar, “Risk-sensitive mean-field games,” IEEE Trans. Autom. Control, vol. 59, no. 4, pp. 835–850, Apr. 2014. doi: 10.1109/TAC.2013.2289711
    [28]
    D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, Dec. 2018. doi: 10.1126/science.aar6404
    [29]
    A. Blair and A. Saffidine, “AI surpasses humans at six-player poker,” Science, vol. 365, no. 6456, pp. 864–865, Aug. 2019. doi: 10.1126/science.aay7774
    [30]
    O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Y. Wang, T. Pfaff, Y. H. Wu, R. Ring, D. Yogatama, D. Wünsch, K. Mckinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, and D. Silver, “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, Oct. 2019. doi: 10.1038/s41586-019-1724-z
    [31]
    T. Yang, X. L. Yi, J. F. Wu, Y. Yuan, D. Wu, Z. Y. Meng, Y. G. Hong, H. Wang, Z. L. Lin, and K. H. Johansson, “A survey of distributed optimization,” Ann. Rev. Control, vol. 47, pp. 278–305, May 2019. doi: 10.1016/j.arcontrol.2019.05.006
    [32]
    M. Zhu, A. H. Anwar, Z. L. Wan, J. H. Cho, C. A. Kamhoua, and M. P. Singh, “A survey of defensive deception: Approaches using game theory and machine learning,” IEEE Commun. Surv. Tut., vol. 23, no. 4, pp. 2460–2493, Oct.–Dec. 2021. doi: 10.1109/COMST.2021.3102874
    [33]
    K. Sohrabi and H. Azgomi, “A survey on the combined use of optimization methods and game theory,” Arch. Comput. Methods Eng., vol. 27, no. 1, pp. 59–80, Jan. 2020. doi: 10.1007/s11831-018-9300-5
    [34]
    Q. J. Shi, C. He, H. Y. Chen, and L. G. Jiang, “Distributed wireless sensor network localization via sequential greedy optimization algorithm,” IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3328–3340, Jun. 2010. doi: 10.1109/TSP.2010.2045416
    [35]
    S. Mao, Y. Tang, Z. W. Dong, K. Meng, Z. Y. Dong, and F. Qian, “A privacy preserving distributed optimization algorithm for economic dispatch over time-varying directed networks,” IEEE Trans. Indust. Inf., vol. 17, no. 3, pp. 1689–1701, Mar. 2021. doi: 10.1109/TII.2020.2996198
    [36]
    D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, “A survey of distributed optimization and control algorithms for electric power systems,” IEEE Trans. Smart Grid, vol. 8, no. 6, pp. 2941–2962, Nov. 2017. doi: 10.1109/TSG.2017.2720471
    [37]
    S. Shahrampour and A. Jadbabaie, “Distributed online optimization in dynamic environments using mirror descent,” IEEE Trans. Autom. Control, vol. 63, no. 3, pp. 714–725, Mar. 2017. doi: 10.1109/TAC.2017.2743462
    [38]
    N. Eshraghi and B. Liang, “Distributed online optimization over a heterogeneous network with any-batch mirror descent,” in Proc. 37th Int. Conf. Machine Learning, 2020, pp. 2933–2942.
    [39]
    X. X. Li, X. L. Yi, and L. H. Xie, “Distributed online convex optimization with an aggregative variable,” IEEE Trans. Control Netw. Syst., 2021.
    [40]
    J. Y. Li, C. Y. Gu, Z. Y. Wu, and T. W. Huang, “Online learning algorithm for distributed convex optimization with time-varying coupled constraints and bandit feedback,” IEEE Trans. Cybern., vol. 52, no. 2, pp. 1009–1020, Feb. 2022. doi: 10.1109/TCYB.2020.2990796
    [41]
    S. Shahrampour, A. Rakhlin, and A. Jadbabaie, “Distributed estimation of dynamic parameters: Regret analysis,” in Proc. American Control Conf., Boston, USA, 2016, pp. 1066–1071.
    [42]
    M. Akbari, B. Gharesifard, and T. Linder, “Distributed online convex optimization on time-varying directed graphs,” IEEE Trans. Control Netw. Syst., vol. 4, no. 3, pp. 417–428, Sep. 2017. doi: 10.1109/TCNS.2015.2505149
    [43]
    Y. Zhang, R. J. Ravier, M. M. Zavlanos, and V. Tarokh, “A distributed online convex optimization algorithm with improved dynamic regret,” in Proc. IEEE 58th Conf. Decision and Control, Nice, France, 2019, pp. 2449–2454.
    [44]
    S. M. Fosson, “Centralized and distributed online learning for sparse time-varying optimization,” IEEE Trans. Autom. Control, vol. 66, no. 6, pp. 2542–2557, Jun. 2020. doi: 10.1109/TAC.2020.3010242
    [45]
    S. Hosseini, A. Chapman, and M. Mesbahi, “Online distributed convex optimization on dynamic networks,” IEEE Trans. Autom. Control, vol. 61, no. 11, pp. 3545–3550, Nov. 2016. doi: 10.1109/TAC.2016.2525928
    [46]
    N. Mazzi, B. S. Zhang, and D. S. Kirschen, “An online optimization algorithm for alleviating contingencies in transmission networks,” IEEE Trans. Power Syst., vol. 33, no. 5, pp. 5572–5582, Sep. 2018. doi: 10.1109/TPWRS.2018.2808456
    [47]
    K. H. Lu, G. S. Jing, and L. Wang, “Online distributed optimization with strongly pseudoconvex-sum cost functions,” IEEE Trans. Autom. Control, vol. 65, no. 1, pp. 426–433, Jan. 2019. doi: 10.1109/TAC.2019.2915745
    [48]
    X. L. Yi, X. X. Li, L. H. Xie, and K. H. Johansson, “Distributed online convex optimization with time-varying coupled inequality constraints,” IEEE Trans. Signal Process., vol. 68, pp. 731–746, Jan. 2020. doi: 10.1109/TSP.2020.2964200
    [49]
    D. M. Yuan, Y. G. Hong, D. W. C. Ho, and S. Y. Xu, “Distributed mirror descent for online composite optimization,” IEEE Trans. on Autom. Control, vol. 66, no. 2, pp. 714–729, Feb. 2020. doi: 10.1109/TAC.2020.2987379
    [50]
    Q. G. Lu, X. F. Liao, T. Xiang, H. Q. Li, and T. W. Huang, “Privacy masking stochastic subgradient-push algorithm for distributed online optimization,” IEEE Trans. Cybern., vol. 51, no. 6, pp. 3224–3237, Jun. 2020.
    [51]
    K. H. Lu and L. Wang, “Online distributed optimization with nonconvex objective functions: Sublinearity of first-order optimality condition-based regret,” IEEE Trans. Autom. Control, 2021.
    [52]
    L. Ding, P. Hu, Z. W. Liu, and G. H. Wen, “Transmission lines overload alleviation: Distributed online optimization approach,” IEEE Trans. Ind. Inf., vol. 17, no. 5, pp. 3197–3208, May 2021. doi: 10.1109/TII.2020.3009749
    [53]
    J. L. Raisaro, G. Choi, S. Pradervand, R. Colsenet, N. Jacquemont, N. Rosat, V. Mooser, and J. P. Hubaux, “Protecting privacy and security of genomic data in i2b2 with homomorphic encryption and differential privacy,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 15, no. 5, pp. 1413–1426, Sep.–Oct. 2018.
    [54]
    D. L. Oberski and F. Kreuter, “Differential privacy and social science: An urgent puzzle,” Harv. Data Sci. Rev., vol. 2, no. 1, pp. 1–21, Feb. 2020.
    [55]
    M. Y. Hong, D. Hajinezhad, and M. M. Zhao, “Prox-PDA: The proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 1529–1538.
    [56]
    D. Hajinezhad, M. Y. Hong, and A. Garcia, “ZONE: Zeroth-order nonconvex multi-agent optimization over networks,” IEEE Trans. Autom. Control, vol. 64, no. 10, pp. 3995–4010, Oct. 2019. doi: 10.1109/TAC.2019.2896025
    [57]
    Y. J. Tang, J. S. Zhang, and N. Li, “Distributed zero-order algorithms for nonconvex multi-agent optimization,” IEEE Trans. Control Netw. Syst., vol. 8, no. 1, pp. 269–281, Mar. 2021. doi: 10.1109/TCNS.2020.3024321
    [58]
    Z. Y. He, J. P. He, C. L. Chen, and X. P. Guan, “Distributed nonconvex optimization: Gradient-free iterations and globally optimal solution,” arXiv preprint arXiv: 2008.00252, 2020.
    [59]
    Z. Y. He, J. P. He, C. L. Chen, and X. P. Guan, “Dependable distributed nonconvex optimization via polynomial approximation,” arXiv preprint arXiv: 2101.06127, 2021.
    [60]
    B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proc. 20th Int. Conf. Artificial Intelligence and Statistics, Fort Lauderdale, USA, 2017, pp. 1273–1282.
    [61]
    Q. Yang, Y. Liu, T. J. Chen, and Y. X. Tong, “Federated machine learning: Concept and applications,” ACM Trans. Intell. Syst. Technol., vol. 10, no. 2, p. 12, Mar. 2019.
    [62]
    S. Hardy, W. Henecka, H. Ivey-Law, R. Nock, G. Patrini, G. Smith, and B. Thorne, “Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption,” arXiv preprint arXiv: 1711.10677, 2017.
    [63]
    Y. L. Lu, X. H. Huang, Y. Y. Dai, S. Maharjan, and Y. Zhang, “Differentially private asynchronous federated learning for mobile edge computing in urban informatics,” IEEE Trans. Ind. Inf., vol. 16, no. 3, pp. 2134–2143, Mar. 2020. doi: 10.1109/TII.2019.2942179
    [64]
    M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, N. Hoang, and Y. Khazaeni, “Bayesian nonparametric federated learning of neural networks,” in Proc. 36th Int. Conf. Machine Learning, Long Beach, USA, 2019, pp. 7252–7261.
    [65]
    H. Y. Wang, M. Yurochkin, Y. K. Sun, D. Papailiopoulos, and Y. Khazaeni, “Federated learning with matched averaging,” in Proc. 8th Int. Conf. Learning Representations, Addis Ababa, Ethiopia, 2020.
    [66]
    S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, “SCAFFOLD: Stochastic controlled averaging for federated learning,” in Proc. 37th Int. Conf. Machine Learning, 2020, pp. 5132–5143.
    [67]
    J. Jiang, S. X. Ji, and G. D. Long, “Decentralized knowledge acquisition for mobile internet applications,” World Wide Web, vol. 23, no. 5, pp. 2653–2669, Mar. 2020. doi: 10.1007/s11280-019-00775-w
    [68]
    H. T. Nguyen, V. Sehwag, S. Hosseinalipour, C. G. Brinton, M. Chiang, and H. V. Poor, “Fast-convergent federated learning,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 201–218, Jan. 2021. doi: 10.1109/JSAC.2020.3036952
    [69]
    R. Shokri, M. Stronati, C. Z. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in Proc. IEEE Symp. Security and Privacy, San Jose, USA, 2017, pp. 3–18.
    [70]
    J. L. Hou, R. Xi, P. Liu, and T. L. Liu, “The switching fractional order chaotic system and its application to image encryption,” IEEE/CAA J. Autom. Sinica, vol. 4, no. 2, pp. 381–388, Apr. 2017. doi: 10.1109/JAS.2016.7510127
    [71]
    H. Y. Zhao, J. Yan, X. Y. Luo, and X. Gua, “Privacy preserving solution for the asynchronous localization of underwater sensor networks,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 6, pp. 1511–1527, Nov. 2020. doi: 10.1109/JAS.2020.1003312
    [72]
    C. Gentry, “Fully homomorphic encryption using ideal lattices,” in Proc. 41st Ann. ACM Symp. Theory of Computing, Bethesda, USA, 2009, pp. 169–178.
    [73]
    C. L. Zhang, S. Y. Li, J. Z. Xia, W. Wang, F. Yan, and Y. Liu, “BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning,” in Proc. USENIX Ann. Tech. Conf., 2020, pp. 493–506.
    [74]
    B. Jia, X. S. Zhang, J. W. Liu, Y. Zhang, K. Huang, and Y. Q. Liang, “Blockchain-enabled federated learning data protection aggregation scheme with differential privacy and homomorphic encryption in IIoT,” IEEE Trans. Ind. Inf., vol. 18, no. 6, pp. 4049–4058, Jun. 2022. doi: 10.1109/TII.2021.3085960
    [75]
    C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Proc. 3rd Theory of Cryptography Conf., New York, USA, 2006, pp. 265–284.
    [76]
    Y. Koda, K. Yamamoto, T. Nishio, and M. Morikura, “Differentially private aircomp federated learning with power adaptation harnessing receiver noise,” in Proc. IEEE Global Communications Conf., Taipei, China, 2020, pp. 1–6.
    [77]
    I. Curiel, Cooperative Game Theory and Applications: Cooperative Games Arising from Combinatorial Optimization Problems. Berlin, Germany: Springer Science & Business Media, 2013.
    [78]
    Y. Tang, H. J. Gao, and J. Kurths, “Multiobjective identification of controlling areas in neuronal networks,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 10, no. 3, pp. 708–720, May 2013. doi: 10.1109/TCBB.2013.72
    [79]
    Y. C. Jin, Multi-Objective Machine Learning. Berlin, Germany: Springer, 2006.
    [80]
    W. Du, Y. Tang, S. Y. S. Leung, L. Tong, A. V. Vasilakos, and F. Qian, “Robust order scheduling in the discrete manufacturing industry: A multiobjective optimization approach,” IEEE Trans. Ind. Inf., vol. 14, no. 1, pp. 253–264, Jan. 2018. doi: 10.1109/TII.2017.2664080
    [81]
    K. Q. Zhang, Z. R. Yang, and T. Bașar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” in Handbook of Reinforcement Learning and Control, K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, Eds. Cham, Germany: Springer, 2021, pp. 321–384.
    [82]
    Z. Y. Zuo, Q. L. Han, B. D. Ning, X. H. Ge, and X. M. Zhang, “An overview of recent advances in fixed-time cooperative control of multi-agent systems,” IEEE Trans. Ind. Inf., vol. 14, no. 6, pp. 2322–2334, Jun. 2018. doi: 10.1109/TII.2018.2817248
    [83]
    X. H. Ge, Q. L. Han, D. R. Ding, X. M. Zhang, and B. D. Ning, “A survey on recent advances in distributed sampled-data cooperative control of multi-agent systems,” Neurocomputing, vol. 275, pp. 1684–1701, Jan. 2018. doi: 10.1016/j.neucom.2017.10.008
    [84]
    R. H. Crites and A. G. Barto, “Improving elevator performance using reinforcement learning,” in Proc. Advances in Neural Information Proc. Systems, Denver, USA, 1995, pp. 1017–1023.
    [85]
    G. E. Monahan, “State of the art-a survey of partially observable Markov decision processes: Theory, models, and algorithms,” Manag. Sci., vol. 28, no. 1, pp. 1–16, Jan. 1982. doi: 10.1287/mnsc.28.1.1
    [86]
    J. N. Foerster, Y. M. Assael, N. De Freitas, and S. Whiteson, “Learning to communicate with deep multi-agent reinforcement learning,” in Proc. 30th Advances in Neural Information Proc. Systems, Barcelona, Spain, 2016, pp. 2137–2145.
    [87]
    P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, “Value-decomposition networks for cooperative multi-agent learning based on team reward,” in Proc. 17th Int. Conf. Autonomous Agents and Multi-Agent Systems, Stockholm, Sweden, 2018, pp. 2085–2087.
    [88]
    R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Proc. 31st Int. Conf. Neural Information Proc. Systems, Long Beach, USA, 2017, pp. 6382–6393.
    [89]
    S. Sukhbaatar, A. Szlam, and R. Fergus, “Learning multi-agent communication with backpropagation,” in Proc. 30th Int. Conf. Neural Information Proc. Systems, Barcelona, Spain, 2016, pp. 2252–2260.
    [90]
    A. Das, T. Gervet, J. Romoff, D. Batra, D. Parikh, M. Rabbat, and J. Pineau, “TarMAC: Targeted multi-agent communication,” in Proc. 36th Int. Conf. Machine Learning, Long Beach, USA, 2019, pp. 1538–1546.
    [91]
    G. Chen, “A new framework for multi-agent reinforcement learning–centralized training and exploration with decentralized execution via policy distillation,” in Proc. 19th Int. Conf. Autonomous Agents and Multi-Agent Systems, Auckland, New Zealand, 2020, pp. 1801–1803.
    [92]
    T. Rashid, M. Samvelyan, C. S. De Wit, G. Farquhar, J. Foerster, and S. Whiteson, “QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning,” in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 4295–4304.
    [93]
    K. Son, D. Kim, W. J. Kang, D. E. Hostallero, and Y. Yi, “QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning,” in Proc. 36th Int. Conf. Machine Learning, Long Beach, USA, 2019, pp. 5887–5896.
    [94]
    T. Rashid, G. Farquhar, B. Peng, and S. Whiteson, “Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning,” in Proc. 34th Ann. Conf. Neural Information Proc. Systems, Vancouver, Canada, 2020, pp. 10199–10210.
    [95]
    J. H. Wang, Z. Z. Ren, T. Liu, Y. Yu, and C. J. Zhang, “QPLEX: Duplex dueling multi-agent Q-learning,” in Proc. 9th Int. Conf. Learning Representations, 2021, pp. 1–27.
    [96]
    Y. D. Yang, J. Y. Hao, B. Liao, K. Shao, G. Y. Chen, W. L. Liu, and H. Y. Tang, “Qatten: A general framework for cooperative multi-agent reinforcement learning,” arXiv preprint arXiv: 2002.03939, 2020.
    [97]
    J. Y. Su, S. C. Adams, and P. A. Beling, “Value-decomposition multi-agent actor-critics,” in Proc. 35th AAAI Conf. Artificial Intelligence, 2021, pp. 11352–11360.
    [98]
    E. Winter, “The Shapley value,” Handbook Game Theory Econom. Appl., vol. 3, pp. 2025–2054, Aug. 2002.
    [99]
    R. E. Wang, M. Everett, and J. P. How, “R-MADDPG for partially observable environments and limited communication,” in Proc. Workshop in the 36th Int. Conf. Machine Learning, Long Beach, USA, 2020, pp. 1–9.
    [100]
    K. Liu, Y. Y. Zhao, G. Wang, and B. Peng, “Self-attention-based multi-agent continuous control method in cooperative environments,” Inf. Sci., vol. 585, pp. 454–470, Mar. 2021. doi: 10.1016/j.ins.2021.11.054
    [101]
    T. P. Yang, W. X. Wang, H. Y. Tang, J. Y. Hao, Z. P. Meng, H. Y. Mao, D. Li, W. L. Liu, Y. F. Chen, Y. J. Hu, C. J. Fan, and C. W. Zhang, “An efficient transfer learning framework for multi-agent reinforcement learning,” in Proc. 35th Advances in Neural Information Proc. Systems, 2021.
    [102]
    C. H. Liu, Z. P. Dai, Y. N. Zhao, J. Crowcroft, D. P. Wu, and K. K. Leung, “Distributed and energy-efficient mobile crowdsensing with charging stations by deep reinforcement learning,” IEEE Trans. Mobile Comput., vol. 20, no. 1, pp. 130–146, Jan. 2021. doi: 10.1109/TMC.2019.2938509
    [103]
    D. Bauso, Game Theory with Engineering Applications. Philadelphia, USA: SIAM, 2016.
    [104]
    M. Tan, “Multi-agent reinforcement learning: Independent vs. cooperative agents,” in Proc. 10th Int. Conf. Machine Learning, Amherst, USA, 1993, pp. 330–337.
    [105]
    P. Liu, W. Y. Zang, and M. Yu, “Incentive-based modeling and inference of attacker intent, objectives, and strategies,” ACM Trans. Inf. Syst. Secur., vol. 8, no. 1, pp. 78–118, Feb. 2005. doi: 10.1145/1053283.1053288
    [106]
    O. D. Altan, G. Y. Wu, M. J. Barth, K. Boriboonsomsin, and J. A. Stark, “Glidepath: Eco-friendly automated approach and departure at signalized intersections,” IEEE Trans. Intell. Veh., vol. 2, no. 4, pp. 266–277, Dec. 2017. doi: 10.1109/TIV.2017.2767289
    [107]
    K. Kang and H. A. Rakha, “Game theoretical approach to model decision making for merging maneuvers at freeway on-ramps,” Transp. Res. Rec., vol. 2623, no. 1, pp. 19–28, Jan. 2017. doi: 10.3141/2623-03
    [108]
    H. White, H. Q. Xu, and K. Chalak, “Causal discourse in a game of incomplete information,” J. Econometrics, vol. 182, no. 1, pp. 45–58, Sep. 2014. doi: 10.1016/j.jeconom.2014.04.007
    [109]
    S. N. Xia, F. L. Lin, Z. Y. Chen, C. B. Tang, Y. J. Ma, and X. H. Yu, “A Bayesian game based vehicle-to-vehicle electricity trading scheme for blockchain-enabled internet of vehicles,” IEEE Trans. Veh. Technol., vol. 69, no. 7, pp. 6856–6868, Jul. 2020. doi: 10.1109/TVT.2020.2990443
    [110]
    H. W. Zhang, J. D. Wang, D. K. Yu, J. H. Han, and T. Li, “Active defense strategy selection based on static Bayesian game,” in Proc. 3rd Int. Conf. Cyberspace Technology, Beijing, China, 2015, pp. 1–7.
    [111]
    J. Huang, C. C. Xing, Y. Qian, and Z. J. Haas, “Resource allocation for multicell device-to-device communications underlaying 5G networks: A game-theoretic mechanism with incomplete information,” IEEE Trans. Veh. Technol., vol. 67, no. 3, pp. 2557–2570, Mar. 2018. doi: 10.1109/TVT.2017.2765208
    [112]
    S. S. Hasanabadi, A. H. Lashkari, and A. A. Ghorbani, “A memorybased game-theoretic defensive approach for digital forensic investigators,” Forensic Sci. Int. Digital Invest., vol. 38, p. 301214, Sep. 2021.
    [113]
    R. Yan, Z. Y. Shi, and Y. S. Zhong, “Task assignment for multiplayer reach–avoid games in convex domains via analytical barriers,” IEEE Trans. Rob., vol. 36, no. 1, pp. 107–124, Feb. 2020. doi: 10.1109/TRO.2019.2935345
    [114]
    H. M. Huang, J. Ding, W. Zhang, and C. J. Tomlin, “Automation-assisted capture-the-flag: A differential game approach,” IEEE Trans. Control Syst. Technol., vol. 23, no. 3, pp. 1014–1028, Mar. 2015. doi: 10.1109/TCST.2014.2360502
    [115]
    M. Mitchell, A. M. Bayen, and C. J. Tomlin, “A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games,” IEEE Trans. Autom. Control, vol. 50, no. 7, pp. 947–957, Jul. 2005. doi: 10.1109/TAC.2005.851439
    [116]
    R. Yan, Z. Y. Shi, and Y. S. Zhong, “Reach-avoid games with two defenders and one attacker: An analytical approach,” IEEE Trans. Cybern., vol. 49, no. 3, pp. 1035–1046, Mar. 2019. doi: 10.1109/TCYB.2018.2794769
    [117]
    E. Ben-Porath, “Rationality, Nash equilibrium and backwards induction in perfect-information games,” Rev. Econom. Stud., vol. 64, no. 1, pp. 23–46, Jan. 1997. doi: 10.2307/2971739
    [118]
    S. D. Levitt, J. A. List, and S. E. Sadoff, “Checkmate: Exploring backward induction among chess players,” Am. Econom. Rev., vol. 101, no. 2, pp. 975–990, Apr. 2011. doi: 10.1257/aer.101.2.975
    [119]
    M. Bowling, N. Burch, M. Johanson, and O. Tammelin, “Heads-up limit Hold’em poker is solved,” Science, vol. 347, no. 6218, pp. 145–149, Jan. 2015. doi: 10.1126/science.1259433
    [120]
    S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in Proc. 14th Int. Conf. Artificial Intelligence and Statistics, Fort Lauderdale, USA, 2011, pp. 627–635.
    [121]
    M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione, “Regret minimization in games with incomplete information,” in Proc. Advances in Neural Information Proc. Systems, Vancouver, Canada, 2007, pp. 1729–1736.
    [122]
    N. Brown and T. Sandholm, “Solving imperfect-information games via discounted regret minimization,” in Proc. 33rd AAAI Conf. Artificial Intelligence, Honolulu, USA, 2019, pp. 1829–1836.
    [123]
    H. Li, K. L. Hu, S. H. Zhang, Y. Qi, and L. Song, “Double neural counterfactual regret minimization,” in Proc. 7th Int. Conf. Learning Representations, Addis Ababa, Ethiopia, 2020, pp. 1–20.
    [124]
    N. Brown, A. Lerer, S. Gross, and T. Sandholm, “Deep counterfactual regret minimization,” in Proc. 36th Int. Conf. Machine Learning, Long Beach, USA, 2019, pp. 793–802.
    [125]
    E. Steinberger, “Single deep counterfactual regret minimization,” arXiv preprint arXiv: 1901.07621, 2019.
    [126]
    S. X. Li, Y. Z. Zhang, X. R. Wang, W. Q. Xue, and B. An, “CFR-MIX: Solving imperfect information extensive-form games with combinatorial action space,” in Proc. 30th Int. Joint Conf. Artificial Intelligence, Montreal, Canada, 2021, pp. 3663–3669.
    [127]
    D. Monderer and L. S. Shapley, “Fictitious play property for games with identical interests,” J. Econ. Theory, vol. 68, no. 1, pp. 258–265, Jan. 1996. doi: 10.1006/jeth.1996.0014
    [128]
    J. Heinrich and D. Silver, “Deep reinforcement learning from self-play in imperfect-information games,” in Proc. 3rd Workshops at Advances Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 1–10.
    [129]
    P. Hernandez-Leal, B. Kartal, and M. E. Taylor, “A survey and critique of multi-agent deep reinforcement learning,” Auton. Agents Multi-Agent Syst, vol. 33, no. 6, pp. 750–797, Oct. 2019. doi: 10.1007/s10458-019-09421-1
    [130]
    W. Q. Xue, Y. Z. Zhang, S. X. Li, X. R. Wang, B. An, and C. K. Yeo, “Solving large-scale extensive-form network security games via neural fictitious self-play,” in Proc. 30th AAAI Conf. Artificial Intelligence, Montreal, Canada, 2021, pp. 3713–3720.
    [131]
    W. B. Li, J. L. Xu, J. Huo, L. Wang, Y. Gao, and J. B. Luo, “Distribution consistency based covariance metric networks for few-shot learning,” in Proc. 33rd AAAI Conf. Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conf. and Ninth AAAI Symp. Educational Advances in Artificial Intelligence, Honolulu, USA, 2019, pp. 8642–8649.
    [132]
    M. Y. Huang, R. P. Malhamé, and P. E. Caines, “Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle,” Commun. Inf. Syst., vol. 6, no. 3, pp. 221–252, Jan. 2006. doi: 10.4310/CIS.2006.v6.n3.a5
    [133]
    M. Y. Huang, P. E. Caines, and R. P. Malhame, “Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria,” IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1560–1571, Sep. 2007. doi: 10.1109/TAC.2007.904450
    [134]
    J. Moon and T. Bașar, “Linear quadratic mean field Stackelberg differential games,” Automatica, vol. 97, pp. 200–213, Nov. 2018. doi: 10.1016/j.automatica.2018.08.008
    [135]
    J. Moon and T. Başar, “Linear quadratic risk-sensitive and robust mean field games,” IEEE Trans. Autom. Control, vol. 62, no. 3, pp. 1062–1077, Mar. 2017. doi: 10.1109/TAC.2016.2579264
    [136]
    T. Li and J. F. Zhang, “Decentralized tracking-type games for multi-agent systems with coupled ARX models: Asymptotic Nash equilibria,” Automatica, vol. 44, no. 3, pp. 713–725, Mar. 2008. doi: 10.1016/j.automatica.2007.07.007
    [137]
    D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. doi: 10.1038/nature16961
    [138]
    D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. T. Chen, T. Lillicrap, F. Hui, L. Sifre, G. Van Den driessche, T. Graepel, and D. Hassabis, “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, Oct. 2017. doi: 10.1038/nature24270
    [139]
    J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. Lillicrap, and D. Silver, “Mastering Atari, Go, chess and shogi by planning with a learned model,” Nature, vol. 588, no. 7839, pp. 604–609, Dec. 2020. doi: 10.1038/s41586-020-03051-4
    [140]
    N. Brown and T. Sandholm, “Superhuman AI for heads-up no-limit poker: Libratus beats top professionals,” Science, vol. 359, no. 6374, pp. 418–424, Dec. 2017.
    [141]
    N. Brown and T. Sandholm, “Superhuman AI for multiplayer poker,” Science, vol. 365, no. 6456, pp. 885–890, Jul. 2019. doi: 10.1126/science.aay2400
    [142]
    X. J. Wang, J. X. Song, P. H. Qi, P. Peng, Z. K. Tang, W. Zhang, W. M. Li, X. J. Pi, J. J. He, C. Gao, H. T. Long, and Q. Yuan, “SCC: An efficient deep reinforcement learning agent mastering the game of StarCraft II,” in Proc. 38th Int. Conf. Machine Learning, 2021, pp. 10905–10915.
    [143]
    Y. X. Chen, L. Zhang, S. J. Li, and G. Pan, “Optimize neural fictitious self-play in regret minimization thinking,” arXiv preprint arXiv: 2104.10845, 2021.
    [144]
    S. Mao, Z. W. Dong, W. Du, Y. C. Tian, C. Liang, and Y. Tang, “Distributed non-convex event-triggered optimization over time-varying directed networks,” IEEE Trans. Ind. Inf., 2021.
    [145]
    A. Ghosh, J. Hong, D. Yin, and K. Ramchandran, “Robust federated learning in a heterogeneous environment,” arXiv preprint arXiv: 1906.06629, 2019.
    [146]
    U. Mahadev, “Classical homomorphic encryption for quantum circuits,” in Proc. IEEE 59th Ann. Symp. Foundations of Computer Science, Paris, France, 2018, pp. 332–338.
    [147]
    M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” J. Comput. Phys., vol. 378, pp. 686–707, Feb. 2019. doi: 10.1016/j.jcp.2018.10.045
    [148]
    A. E. Roth and M. A. O. Sotomayor, Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis, Cambridge, UK: Cambridge University Press, 1992, pp. 485–541.
    [149]
    S. Kumabe and T. Maehara, “Convexity of b-matching games.” in Proc. 29th Int. Joint Conf. Artificial Intelligence, Yokohama, Japan, 2020, pp. 261–267.
    [150]
    I. Gemp, B. McWilliams, C. Vernade, and T. Graepel, “Eigengame unloaded: When playing games is better than optimizing,” arXiv preprint arXiv: 2102.04152, 2021.
    [151]
    S. Pateria, B. Subagdja, A. H. Tan, and C. Quek, “Hierarchical reinforcement learning: A comprehensive survey,” ACM Comput. Surv., vol. 54, no. 5, p. 109, Jun. 2022.
    [152]
    H. Ryu, H. Shin, and J. Park, “Multi-agent actor-critic with hierarchical graph attention network,” in Proc. 34th AAAI Conf. Artificial Intelligence, New York, USA, 2020, pp. 7236–7243.
    [153]
    L. X. Wang, Z. R. Yang, and Z. R. Wang, “Breaking the curse of many agents: Provable mean embedding Q-iteration for mean-field reinforcement learning,” in Proc. 37th Int. Conf. Machine Learning, 2020, pp. 10092–10103.
    [154]
    F. Schubert, M. Awiszus, and B. Rosenhahn, “TOAD-GAN: A flexible framework for few-shot level generation in token-based games,” IEEE Trans. Games, 2021.
    [155]
    Y. Chen, J. M. Liu, and B. Khoussainov, “Maximum entropy inverse reinforcement learning for mean field games,” arXiv preprint arXiv: 2104.14654, 2021.
    [156]
    A. R. Tilman, J. B. Plotkin, and E. Akçay, “Evolutionary games with environmental feedbacks,” Nat. Commun., vol. 11, no. 1, p. 915, Feb. 2020.
    [157]
    D. Mandal, S. Medya, B. Uzzi, and C. Aggarwal, “Meta-learning with graph neural networks: Methods and applications,” arXiv preprint arXiv: 2103.00137, 2021.
    [158]
    Y. Shu, Z. J. Cao, C. Y. Wang, J. M. Wang, and M. S. Long, “Open domain generalization with domain-augmented meta-learning,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Nashville, USA, 2021, pp. 9619–9628.
    [159]
    DI-engine Contributors, “DI-engine: OpenDILab decision intelligence engine,” 2021. [Online]. Available: https://github.com/opendilab/DI-engine.

    Figures(5)  / Tables(5)


    Highlights

    • For cooperative optimization, we focus on recent work on distributed online optimization and on the application of distributed online optimization and federated optimization to privacy protection
    • For games, we review cooperative games and non-cooperative games from the perspectives of static games and dynamic games, respectively
    • We bridge the transition from cooperative optimization to games through cooperative games. Like cooperative optimization, cooperative games coordinate multiple individuals to achieve a unified goal; unlike cooperative optimization, they also consider the emergence of cooperative behaviors
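The first highlight centres on distributed online optimization, where networked agents alternate consensus averaging with local gradient steps on private, time-varying costs. The following is a minimal illustrative sketch only, not an algorithm from the survey: the ring topology, scalar quadratic costs, step size, and function names are all assumptions chosen for brevity.

```python
import numpy as np

# Illustrative sketch (not from the survey): distributed online gradient
# descent on a fixed ring network. Agent i holds a private time-varying
# quadratic cost f_{i,t}(x) = (x - a_{i,t})^2, mixes its estimate with its
# neighbours' (consensus), then takes a local gradient step.

def distributed_online_gd(targets, num_rounds=200, step=0.1):
    """targets: per-agent callables t -> a_{i,t} defining the local costs."""
    n = len(targets)
    x = np.zeros(n)  # each agent's current scalar estimate
    # Doubly stochastic mixing matrix for a ring: self-weight plus two neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    for t in range(num_rounds):
        x = W @ x  # consensus: average with neighbours
        grads = np.array([2.0 * (x[i] - targets[i](t)) for i in range(n)])
        x = x - step * grads  # local online gradient step
    return x

# Four agents with constant targets 1..4: with a constant step size the
# estimates settle in a neighbourhood of the global minimiser (the mean, 2.5).
final = distributed_online_gd([lambda t, a=a: float(a) for a in (1, 2, 3, 4)])
print(final)
```

With constant step size the iterates reach only a neighbourhood of the global optimum; diminishing step sizes, as commonly analysed in the distributed online optimization literature, would shrink this residual disagreement.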
