A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 9, Issue 5
May 2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: J. R. Wang, Y. T. Hong, J. L. Wang, J. P. Xu, Y. Tang, Q.-L. Han, and J. Kurths, “Cooperative and competitive multi-agent systems: From optimization to games,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 5, pp. 763–783, May 2022. doi: 10.1109/JAS.2022.105506

Cooperative and Competitive Multi-Agent Systems: From Optimization to Games

doi: 10.1109/JAS.2022.105506
Funds:  This work was supported in part by the National Natural Science Foundation of China (Basic Science Center Program: 61988101), the Sino-German Center for Research Promotion (M-0066), the International (Regional) Cooperation and Exchange Project (61720106008), the Programme of Introducing Talents of Discipline to Universities (the 111 Project) (B17017), and the Program of Shanghai Academic Research Leader (20XD1401300)
  • Multi-agent systems can solve scientific issues related to complex systems that are difficult or impossible for a single agent to solve, through mutual collaboration and cooperative optimization. In a multi-agent system, agents with a certain degree of autonomy generate complex interactions through correlation and coordination, which manifest as cooperative or competitive behavior. This survey focuses on multi-agent cooperative optimization and on cooperative/non-cooperative games. Starting from cooperative optimization, studies on distributed optimization and federated optimization are summarized. The survey concentrates on distributed online optimization and its application in privacy protection, and overviews federated optimization from the perspective of privacy protection mechanisms. Then, cooperative games and non-cooperative games are introduced to extend cooperative optimization problems in two directions: minimizing global costs and minimizing individual costs, respectively. Multi-agent cooperative and non-cooperative behaviors are modeled as games from both static and dynamic perspectives, according to whether each player can make decisions based on the information of other players. Finally, future directions for cooperative optimization, cooperative/non-cooperative games, and their applications are discussed.
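    To make the distributed optimization setting described above concrete, the following is a minimal sketch, not taken from the survey: each agent alternates a consensus step over a ring network with a gradient step on a private quadratic loss. The topology, losses, and step sizes are illustrative assumptions; in the genuinely online setting each per-round loss would change over time.

    ```python
    # Minimal sketch of distributed gradient descent with consensus mixing,
    # assuming a ring network and fixed quadratic losses (illustrative only).
    import numpy as np

    n, T, dim = 4, 200, 2

    # Doubly stochastic mixing matrix for a ring of n agents.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i + 1) % n] = 0.25
        W[i, (i - 1) % n] = 0.25

    rng = np.random.default_rng(0)
    targets = rng.normal(size=(n, dim))   # each agent's private data
    x = np.zeros((n, dim))                # local decision variables

    for t in range(T):
        x = W @ x                               # consensus: mix with neighbors
        grad = x - targets                      # grad of f_i(x) = 0.5*||x - target_i||^2
        x -= (0.5 / np.sqrt(t + 1)) * grad      # diminishing-step gradient step

    print("agents' consensus estimate:", x.mean(axis=0))
    print("minimizer of sum_i f_i    :", targets.mean(axis=0))
    ```

    The abstract also overviews federated optimization through the lens of privacy protection. The second sketch below is a toy federated-averaging loop in which each client perturbs its local update with Gaussian noise before sending it to the server, a simplified stand-in for the differential-privacy-style mechanisms the survey discusses; the data, noise scale, and training schedule are all assumptions.

    ```python
    # Toy federated averaging with noisy client updates (not the survey's
    # algorithm): clients train locally, add Gaussian noise to their updates,
    # and the server averages only the noisy updates.
    import numpy as np

    rng = np.random.default_rng(1)
    clients = [rng.normal(loc=m, size=(50, 1)) for m in (1.0, 2.0, 3.0)]
    w = np.zeros(1)                       # shared model: scalar mean estimate

    for _round in range(20):
        updates = []
        for data in clients:
            w_local = w.copy()
            for _ in range(5):            # local SGD on f(w) = 0.5*E||w - x||^2
                w_local -= 0.1 * (w_local - data.mean())
            noise = rng.normal(scale=0.05, size=w.shape)
            updates.append((w_local - w) + noise)   # only noisy update leaves client
        w = w + np.mean(updates, axis=0)            # server-side averaging

    print("federated estimate:", w, "(true global mean is about 2.0)")
    ```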

     

  • Jianrui Wang and Yitian Hong contributed equally to this work.


    Highlights

    • For cooperative optimization, we focus on recent work on distributed online optimization, and on privacy protection in both distributed online optimization and federated optimization
    • For games, we focus on cooperative games and non-cooperative games from the perspectives of static games and dynamic games, respectively
    • We bridge the transition from cooperative optimization to games through cooperative games. Like cooperative optimization, cooperative games coordinate multiple individuals toward a unified goal; unlike cooperative optimization, they consider the emergence of cooperative behaviors (a toy example contrasting the two solution concepts follows this list)
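    The toy example referenced in the last highlight, a sketch of our own rather than anything from the paper: two players with coupled quadratic costs J_i(x_i, x_j) = x_i^2 + x_i*x_j - x_i (chosen for illustration). Best-response dynamics converge to the Nash equilibrium, where each player minimizes only its individual cost, which differs from the cooperative optimum that minimizes the global cost J_1 + J_2.

    ```python
    # Contrast of the two solution concepts on a toy two-player game:
    # Nash equilibrium (individual costs) vs. cooperative optimum (global cost).

    def best_response(x_other):
        # argmin over x_i of J_i: dJ_i/dx_i = 2*x_i + x_other - 1 = 0
        return (1.0 - x_other) / 2.0

    x1 = x2 = 0.0
    for _ in range(50):                   # best-response dynamics
        x1, x2 = best_response(x2), best_response(x1)

    global_cost = lambda a, b: (a*a + a*b - a) + (b*b + a*b - b)
    print(f"Nash equilibrium   : x1 = x2 = {x1:.3f}, global cost = {global_cost(x1, x2):.3f}")
    print(f"Cooperative optimum: x1 = x2 = 0.250, global cost = {global_cost(0.25, 0.25):.3f}")
    ```

    The run converges to x1 = x2 = 1/3 with global cost -2/9, while the cooperative choice x1 = x2 = 1/4 achieves the lower global cost -1/4, illustrating the gap between minimizing individual costs and minimizing the global cost.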
