IEEE/CAA Journal of Automatica Sinica
Citation: Zhen Zhang and Dongbin Zhao, "Clique-based Cooperative Multiagent Reinforcement Learning Using Factor Graphs," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 248-256, 2014.
[1] Russell S J, Norvig P. Artificial Intelligence: A Modern Approach. Englewood Cliffs: Prentice Hall, 1995
[2] Zhao D B, Dai Y J, Zhang Z. Computational intelligence in urban traffic signal control: a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2012, 42(4): 485-494
[3] Li T, Zhao D B, Yi J Q. Adaptive dynamic programming for multi-intersections traffic signal intelligent control. In: Proceedings of the 11th IEEE International Conference on Intelligent Transportation Systems. Beijing, China: IEEE, 2008. 286-291
[4] Zhao D B, Bai X R, Wang F Y, Xu J, Yu W S. DHP for coordinated freeway ramp metering. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(4): 990-999
[5] Vlassis N. A concise introduction to multiagent systems and distributed artificial intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2007, 1(1): 1-71
[6] Khan Z, Glisic S, DaSilva L A, Lehtomaki J. Modeling the dynamics of coalition formation games for cooperative spectrum sharing in an interference channel. IEEE Transactions on Computational Intelligence and AI in Games, 2011, 3(1): 17-30
[7] Zhao D B, Zhang Z, Dai Y J. Self-teaching adaptive dynamic programming for Go-Moku. Neurocomputing, 2012, 78(1): 23-29
[8] Tan C H, Tan K C, Tay A. Dynamic game difficulty scaling using adaptive behavior-based AI. IEEE Transactions on Computational Intelligence and AI in Games, 2011, 3(4): 289-301
[9] Zeng L, Hu G D. Consensus of linear multi-agent systems with communication and input delays. Acta Automatica Sinica, 2013, 39(7): 1133-1140
[10] Li T, Fu M Y, Xie L H, Zhang J F. Distributed consensus with limited communication data rate. IEEE Transactions on Automatic Control, 2011, 56(2): 279-292
[11] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998
[12] Wiering M A. Multiagent reinforcement learning for traffic light control. In: Proceedings of the 17th International Conference on Machine Learning. Stanford University, US: Morgan Kaufmann, 2000. 1151-1158
[13] Claus C, Boutilier C. The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the 15th National Conference on Artificial Intelligence and 10th Conference on Innovative Applications of Artificial Intelligence. Menlo Park, CA: AAAI Press/MIT Press, 1998. 746-752
[14] Tuyls K, Nowé A. Evolutionary game theory and multi-agent reinforcement learning. The Knowledge Engineering Review, 2005, 20(1): 63-90
[15] Gomes E R, Kowalczyk R. Dynamic analysis of multiagent Q-learning with ε-greedy exploration. In: Proceedings of the 26th International Conference on Machine Learning. New York, USA: ACM, 2009. 369-376
[16] Kianercy A, Galstyan A. Dynamics of Boltzmann Q-learning in two-player two-action games. Physical Review E, 2012, 85(4): 1145-1154
[17] Littman M L. Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th International Conference on Machine Learning. New Brunswick, US: Morgan Kaufmann, 1994. 157-163
[18] Owen G. Game Theory (Second Edition). Orlando, Florida: Academic Press, 1982
[19] Littman M L. Friend-or-foe Q-learning in general-sum games. In: Proceedings of the 18th International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, 2001. 322-328
[20] Hu J L, Wellman M P. Multiagent reinforcement learning: theoretical framework and an algorithm. In: Proceedings of the 15th International Conference on Machine Learning. Madison, Wisconsin, US: Morgan Kaufmann, 1998. 242-250
[21] Singh S, Kearns M, Mansour Y. Nash convergence of gradient dynamics in general-sum games. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. Stanford University, Stanford, California, US: Morgan Kaufmann, 2000. 541-548
[22] Bowling M, Veloso M. Multiagent learning using a variable learning rate. Artificial Intelligence, 2002, 136(2): 215-250
[23] Zhang Hua-Guang, Zhang Xin, Luo Yan-Hong, Yang Jun. An overview of research on adaptive dynamic programming. Acta Automatica Sinica, 2013, 39(4): 303-311 (in Chinese)
[24] Werbos P J. A menu of designs for reinforcement learning over time. In: Neural Networks for Control. Cambridge: MIT Press, 1990
[25] Liu D R, Wang D, Zhao D B, Wei Q L, Jin N. Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming. IEEE Transactions on Automation Science and Engineering, 2012, 9(3): 628-634
[26] Huang Y Z, Liu D R. Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm. Neurocomputing, 2014, 125: 46-56
[27] Song Rui-Zhuo, Xiao Wen-Dong, Sun Chang-Yin. Optimal tracking control for a class of unknown discrete-time systems with actuator saturation via data-based ADP algorithm. Acta Automatica Sinica, 2013, 39(9): 1413-1420 (in Chinese)
[28] Wang D, Liu D R. Neuro-optimal control for a class of unknown nonlinear dynamic systems using SN-DHP technique. Neurocomputing, 2013, 121: 218-225
[29] Zhang Ji-Lie, Zhang Hua-Guang, Luo Yan-Hong, Liang Hong-Jing. Nearly optimal control scheme using adaptive dynamic programming based on generalized fuzzy hyperbolic model. Acta Automatica Sinica, 2013, 39(2): 142-148 (in Chinese)
[30] Liu D R, Javaherian H, Kovalenko O, Huang T. Adaptive critic learning techniques for engine torque and air-fuel ratio control. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 2008, 38(4): 988-993
[31] Zhao D B, Hu Z H, Xia Z P, Alippi C, Zhu Y H, Wang D. Full range adaptive cruise control based on supervised adaptive dynamic programming. Neurocomputing, 2014, 125: 57-67
[32] Zhao D B, Yi J Q, Liu D R. Particle swarm optimized adaptive dynamic programming. In: Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. Honolulu, Hawaiian Islands, US: IEEE, 2007. 32-37
[33] Crites R H, Barto A G. Improving elevator performance using reinforcement learning. In: Proceedings of Advances in Neural Information Processing Systems 8. Denver, US: MIT Press, 1996. 1017-1023
[34] Bazzan A L, de Oliveira D, da Silva B C. Learning in groups of traffic signals. Engineering Applications of Artificial Intelligence, 2010, 23(4): 560-568
[35] Bazzan A L. Coordinating many agents in stochastic games. In: Proceedings of the 2012 International Joint Conference on Neural Networks. Brisbane, Australia: IEEE, 2012. 1-8
[36] Kok J R, Vlassis N. Sparse cooperative Q-learning. In: Proceedings of the 21st International Conference on Machine Learning. Banff, Canada: ACM, 2004. 481-488
[37] Kok J R, Vlassis N. Using the max-plus algorithm for multiagent decision making in coordination graphs. In: Proceedings of Robot Soccer World Cup IX, Lecture Notes in Computer Science. Berlin: Springer, 2005. 1-12
[38] Syed A, Koenig S, Tambe M. Preprocessing techniques for accelerating the DCOP algorithm ADOPT. In: Proceedings of the 4th International Joint Conference on Autonomous Agents and Multiagent Systems. Utrecht, The Netherlands: ACM, 2005. 1041-1048
[39] Kok J R, Vlassis N. Collaborative multiagent reinforcement learning by payoff propagation. Journal of Machine Learning Research, 2006, 7: 1789-1828
[40] Schneider J, Wong W K, Moore A, Riedmiller M. Distributed value functions. In: Proceedings of the 16th International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, 1999. 371-378
[41] Chen G, Cao W H, Chen X, Wu M. Multi-agent Q-learning with joint state value approximation. In: Proceedings of the 30th Chinese Control Conference. Yantai, China: IEEE, 2011. 4878-4882
[42] Guestrin C, Lagoudakis M G, Parr R. Coordinated reinforcement learning. In: Proceedings of the 19th International Conference on Machine Learning. Sydney, Australia: Morgan Kaufmann, 2002. 227-234
[43] Panait L, Luke S. Cooperative multi-agent learning: the state of the art. Autonomous Agents and Multi-Agent Systems, 2005, 11(3): 387-434
[44] Luke S. Genetic programming produced competitive soccer softbot teams for RoboCup97. In: Proceedings of the 3rd Annual Conference on Genetic Programming. Madison, Wisconsin, US: Morgan Kaufmann, 1998. 214-222
[45] Kschischang F R, Frey B J, Loeliger H A. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 2001, 47(2): 498-519
[46] Pearl J. Probabilistic Reasoning in Intelligent Systems. San Mateo: Morgan Kaufmann, 1988
[47] Weiss Y, Freeman W T. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 2001, 47(2): 736-744
[48] Lan X, Roth S, Huttenlocher D, Black M. Efficient belief propagation with learned higher-order Markov random fields. In: Proceedings of the 2006 European Conference on Computer Vision. Berlin: Springer, 2006. 269-282
[49] Som P, Chockalingam A. Damped belief propagation based near-optimal equalization of severely delay-spread UWB MIMO-ISI channels. In: Proceedings of the 2010 IEEE International Conference on Communications. Cape Town, South Africa: IEEE, 2010. 1-5
[50] Frey B J, Kschischang F R. Probability propagation and iterative decoding. In: Proceedings of the 34th Annual Allerton Conference on Communication Control and Computing. Illinois, US: IEEE, 1996. 1-4