Volume 1, Issue 2
						IEEE/CAA Journal of Automatica Sinica
Citation: Xin Chen, Bo Fu, Yong He and Min Wu, "Timesharing-tracking Framework for Decentralized Reinforcement Learning in Fully Cooperative Multi-agent System," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 2, pp. 127-133, 2014.
 