| [1] | D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. I. 4th ed. Belmont, USA: Athena Scientific, 2017. |
| [2] | D. P. Bertsekas, Reinforcement Learning and Optimal Control. Belmont, USA: Athena Scientific, 2019. |
| [3] | D. P. Bertsekas, Rollout, Policy Iteration, and Distributed Reinforcement Learning. Belmont, USA: Athena Scientific, 2020. |
| [4] | D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, "Mastering chess and Shogi by self-play with a general reinforcement learning algorithm, " arXiv preprint arXiv: 1712.01815, 2017. |
| [5] | D. P. Bertsekas, "Multiagent value iteration algorithms in dynamic programming and reinforcement learning, " arXiv: 2005.01627, 2020. |
| [6] | D. P. Bertsekas, "Constrained multiagent rollout and multidimensional assignment with the auction algorithm, " arXiv: 2002.07407, 2020. |
| [7] | D. P. Bertsekas, "Distributed dynamic programming, " IEEE Trans. Autom. Control, vol. 27, no. 3, pp. 610-616, Jun. 1982. |
| [8] | D. P. Bertsekas, "Distributed asynchronous computation of fixed points, " Math. Programming, vol. 27, no. 1, pp. 107-120, Sep. 1983. |
| [9] | D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Englewood Cliffs, USA: Prentice-Hall, 1989. |
| [10] | D. P. Bertsekas and H. Z. Yu, "Distributed asynchronous policy iteration in dynamic programming, " in Proc. 48th Annu. Allerton Conf. Communication, Control, and Computing, Allerton, USA, 2010, pp. 1368-1374. |
| [11] | D. P. Bertsekas and H. Z. Yu, "Q-learning and enhanced policy iteration in discounted dynamic programming, " Math. Oper. Res., vol. 37, pp. 66-94, Feb. 2012. |
| [12] | H. Z. Yu and D. P. Bertsekas, "Q-learning and policy iteration algorithms for stochastic shortest path problems, " Ann. Oper. Res., vol. 208, no. 1, pp. 95-132, Sep. 2013. |
| [13] | D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II. 4th ed. Belmont, USA: Athena Scientific, 2012. |
| [14] | D. P. Bertsekas, Abstract Dynamic Programming. Belmont, USA: Athena Scientific, 2018. |
| [15] | S. Bhattacharya, S. Badyal, T. Wheeler, S. Gil, and D. P. Bertsekas, "Reinforcement learning for POMDP: Partitioned rollout and policy iteration with application to autonomous sequential repair problems, " IEEE Rob. Autom. Lett., vol. 5, no. 3, pp. 3967-3974, Jul. 2020. |
| [16] | H. S. Witsenhausen, "A counterexample in stochastic optimum control, " SIAM J. Control, vol. 6, no. 1, pp. 131-147, 1968. http://www.ams.org/mathscinet-getitem?mr=231649 |
| [17] | H. S. Witsenhausen, "Separation of estimation and control for discrete time systems, " Proc. IEEE, vol. 59, no. 11, pp. 1557-1566, Nov. 1971. |
| [18] | J. Marschak, "Elements for a theory of teams, " Manage. Sci., vol. 1, no. 2, pp. 127-137, Jan. 1955. |
| [19] | R. Radner, "Team decision problems, " Ann. Math. Statist., vol. 33, no. 3, pp. 857-881, Sep. 1962. |
| [20] | H. S. Witsenhausen, "On information structures, feedback and causality, " SIAM J. Control, vol. 9, no. 2, pp. 149-160, 1971. http://www.researchgate.net/publication/250955878_On_Information_Structures_Feedback_and_Causality |
| [21] | J. Marschak and R. Radner, Economic Theory of Teams. New Haven, USA: Yale University Press, 1972. |
| [22] | N. Sandell, P. Varaiya, M. Athans, and M. Safonov, "Survey of decentralized control methods for large scale systems, " IEEE Trans. Autom. Control, vol. 23, no. 2, pp. 108-128, Apr. 1978. |
| [23] | T. Yoshikawa, "Decomposition of dynamic team decision problems, " IEEE Trans. Autom. Control, vol. 23, no. 4, pp. 627-632, Aug. 1978. |
| [24] | Y. C. Ho, "Team decision theory and information structures, " Proc. IEEE, vol. 68, no. 6, pp. 644-654, Jun. 1980. |
| [25] | D. Bauso and R. Pesenti, "Generalized person-by-person optimization in team problems with binary decisions, " in Proc. American Control Conf., Seattle, USA, 2008, pp. 717-722. |
| [26] | D. Bauso and R. Pesenti, "Team theory and person-by-person optimization with binary decisions, " SIAM J. Control Optim., vol. 50, no. 5, pp. 3011-3028, Jan. 2012. |
| [27] | A. Nayyar, A. Mahajan, and D. Teneketzis, "Decentralized stochastic control with partial history sharing: A common information approach, " IEEE Trans. Autom. Control, vol. 58, no. 7, pp. 1644-1658, Jul. 2013. |
| [28] | A. Nayyar and D. Teneketzis, "Common knowledge and sequential team problems, " IEEE Trans. Autom. Control, vol. 64, no. 12, pp. 5108-5115, Dec. 2019. |
| [29] | Y. Y. Li, Y. J. Tang, R. Y. Zhang, and N. Li, "Distributed reinforcement learning for decentralized linear quadratic control: A derivative-free policy optimization approach, " arXiv: 1912.09135, 2019. |
| [30] | G. Qu and N. Li, "Exploiting fast decaying and locality in multi-agent MDP with tree dependence structure, " in Proc. IEEE 58th Conf. Decision and Control, Nice, France, 2019. |
| [31] | A. Gupta, "Existence of team-optimal solutions in static teams with common information: A topology of information approach, " SIAM J. Control Optim., vol. 58, no. 2, pp. 998-1021, Apr. 2020. |
| [32] | F. Bullo, J. Cortes, and S. Martinez, Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms. Princeton, USA: Princeton University Press, 2009. |
| [33] | M. Mesbahi and M. Egerstedt, Graph Theoretic Methods in Multiagent Networks. Princeton, USA: Princeton University Press, 2010. |
| [34] | M. S. Mahmoud, Multiagent Systems: Introduction and Coordination Control. Boca Raton, USA: CRC Press, 2020. |
| [35] | R. Zoppoli, M. Sanguineti, G. Gnecco, and T. Parisini, Neural Approximations for Optimal Control and Decision, Springer, 2020. http://www.researchgate.net/publication/338322921_Neural_Approximations_for_Optimal_Control_and_Decision |
| [36] | F. A. Oliehoek and C. Amato, A Concise Introduction to Decentralized POMDPs, Springer International Publishing, 2016. doi: 10.1007/978-3-319-28929-8 |
| [37] | P. Hernandez-Leal, M. Kaisers, T. Baarslag, and E. M. de Cote, "A survey of learning in multiagent environments: Dealing with non-stationarity, " arXiv: 1707.09183, 2017. |
| [38] | K. Q. Zhang, Z. R. Yang, and T. Başar, "Multi-agent reinforcement learning: A selective overview of theories and algorithms, " arXiv: 1911.10635, 2019. |
| [39] | L. S. Shapley, "Stochastic games, " Proc. Natl. Acad. Sci., vol. 39, no. 10, pp. 1095-1100, 1953. |
| [40] | M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning, " in Machine Learning Proceedings 1994, W. W. Cohen and H. Hirsh, Eds. Amsterdam, The Netherlands: Elsevier, 1994, pp. 157-163. |
| [41] | K. P. Sycara, "Multiagent systems, " AI Mag., vol. 19, no. 2, pp. 79-92, Jun. 1998. |
| [42] | P. Stone and M. Veloso, "Multiagent systems: A survey from a machine learning perspective, " Auton. Rob., vol. 8, no. 3, pp. 345-383, Jun. 2000. |
| [43] | L. Panait and S. Luke, "Cooperative multi-agent learning: The state of the art, " Auton. Agent. Multi-Agent Syst., vol. 11, no. 3, pp. 387-434, Nov. 2005. |
| [44] | L. Busoniu, R. Babuška, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning, " IEEE Trans. Syst., Man, Cybern., Part C, vol. 38, no. 2, pp. 156-172, Mar. 2008. |
| [45] | L. Busoniu, R. Babuška, and B. De Schutter, "Multi-agent reinforcement learning: An overview, " in Innovations in Multi-Agent Systems and Applications-1, D. Srinivasan and L. C. Jain, Eds. Berlin, Germany: Springer, 2010, pp. 183-221. |
| [46] | L. Matignon, G. J. Laurent, and N. Le Fort-Piat, "Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems, " Knowl. Eng. Rev., vol. 27, no. 1, pp. 1-31, Feb. 2012. |
| [47] | P. Hernandez-Leal, B. Kartal, and M. E. Taylor, "A survey and critique of multiagent deep reinforcement learning, " Auton. Agent. Multi-Agent Syst., vol. 33, no. 6, pp. 750-797, Oct. 2019. |
| [48] | A. OroojlooyJadid and D. Hajinezhad, "A review of cooperative multi-agent deep reinforcement learning, " arXiv: 1908.03963, 2019. |
| [49] | T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, "Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications, " IEEE Trans. Cybern., vol. 50, no. 9, pp. 3826-3839, Sep. 2020. |
| [50] | G. Tesauro, "Extending Q-learning to general adaptive multi-agent systems, " in Proc. 16th Int. Conf. Neural Information Processing Systems, 2004, pp. 871-878. |
| [51] | F. A. Oliehoek, J. F. P. Kooij, and N. Vlassis, "The cross-entropy method for policy search in decentralized POMDPs, " Informatica, vol. 32, no. 4, pp. 341-357, 2008. http://www.ams.org/mathscinet-getitem?mr=2481391 |
| [52] | P. Pennesi and I. C. Paschalidis, "A distributed actor-critic algorithm and applications to mobile sensor network coordination problems, " IEEE Trans. Autom. Control, vol. 55, no. 2, pp. 492-497, Feb. 2010. |
| [53] | I. C. Paschalidis and Y. W. Lin, "Mobile agent coordination via a distributed actor-critic algorithm, " in Proc. 19th Mediterranean Conf. Control Automation, Corfu, Greece, 2011, pp. 644-649. |
| [54] | S. Kar, J. M. F. Moura, and H. V. Poor, "QD-Learning: A collaborative distributed strategy for multi-agent reinforcement learning through consensus + innovations, " IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1848-1862, Apr. 2013. |
| [55] | J. N. Foerster, Y. M. Assael, N. De Freitas, and S. Whiteson, "Learning to communicate with deep multi-agent reinforcement learning, " in Proc. 30th Int. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 2137-2145. |
| [56] | S. Omidshafiei, A. A. Agha-Mohammadi, C. Amato, S. Y. Liu, J. P. How, and J. Vian, "Graph-based cross entropy method for solving multi-robot decentralized POMDPs, " in Proc. IEEE Int. Conf. Robotics and Automation, Stockholm, Sweden, 2016, pp. 5395-5402. |
| [57] | J. K. Gupta, M. Egorov, and M. Kochenderfer, "Cooperative multi-agent control using deep reinforcement learning, " in Proc. Int. Conf. Autonomous Agents and Multiagent Systems, Best Papers, Brazil, 2017, pp. 66-83. |
| [58] | R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments, " in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, USA, 2017, pp. 6379-6390. |
| [59] | M. Zhou, Y. Chen, Y. Wen, Y. D. Yang, Y. F. Su, W. N. Zhang, D. Zhang, and J. Wang, "Factorized Q-learning for large-scale multi-agent systems, " arXiv: 1809.03738, 2018. |
| [60] | K. Q. Zhang, Z. R. Yang, H. Liu, T. Zhang, and T. Başar, "Fully decentralized multi-agent reinforcement learning with networked agents, " arXiv: 1802.08757, 2018. |
| [61] | Y. Zhang and M. M. Zavlanos, "Distributed off-policy actor-critic reinforcement learning with policy consensus, " in Proc. IEEE 58th Conf. Decision and Control, Nice, France, 2019, pp. 4674-4679. |
| [62] | C. S. de Witt, J. N. Foerster, G. Farquhar, P. H. S. Torr, W. Boehmer, and S. Whiteson, "Multi-agent common knowledge reinforcement learning, " in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 9927-9939. |
| [63] | D. P. Bertsekas, "Multiagent rollout algorithms and reinforcement learning, " arXiv: 1910.00120, 2019. |
| [64] | S. Bhattacharya, S. Kailas, S. Badyal, S. Gil, and D. P. Bertsekas, "Multiagent rollout and policy iteration for POMDP with application to multi-robot repair problems, " in Proc. Conf. Robot Learning, 2020; also arXiv preprint, arXiv: 2011.04222. |
| [65] | D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, USA: Athena Scientific, 1996. |
| [66] | G. Tesauro and G. R. Galperin, "On-line policy improvement using Monte-Carlo search, " in Proc. 9th Int. Conf. Neural Information Processing Systems, Denver, USA, 1996, pp. 1068-1074. |
| [67] | D. P. Bertsekas, Nonlinear Programming. 3rd ed. Belmont, USA: Athena Scientific, 2016. |
| [68] | M. G. Lagoudakis and R. Parr, "Reinforcement learning as classification: Leveraging modern classifiers, " in Proc. 20th Int. Conf. Machine Learning, Washington, USA, 2003, pp. 424-431. |
| [69] | C. Dimitrakakis and M. G. Lagoudakis, "Rollout sampling approximate policy iteration, " Mach. Learn., vol. 72, no. 3, pp. 157-171, Jul. 2008. |
| [70] | A. Lazaric, M. Ghavamzadeh, and R. Munos, "Analysis of a classification-based policy iteration algorithm, " in Proc. 27th Int. Conf. Machine Learning, Haifa, Israel, 2010. |
| [71] | P. Abbeel and A. Y. Ng, "Apprenticeship learning via inverse reinforcement learning, " in Proc. 21st Int. Conf. Machine Learning, Banff, Canada, 2004. |
| [72] | B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration, " Rob. Auton. Syst., vol. 57, no. 5, pp. 469-483, May 2009. |
| [73] | G. Neu and C. Szepesvari, "Apprenticeship learning using inverse reinforcement learning and gradient methods, " arXiv: 1206.5264, 2012. |
| [74] | H. Ben Amor, D. Vogt, M. Ewerton, E. Berger, B. Jung, and J. Peters, "Learning responsive robot behavior by imitation, " in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Tokyo, Japan, 2013, pp. 3257-3264. |
| [75] | J. Lee, "A survey of robot learning from demonstrations for human-robot collaboration, " arXiv: 1710.08789, 2017. |
| [76] | M. K. Hanawal, H. Liu, H. H. Zhu, and I. C. Paschalidis, "Learning policies for Markov decision processes from data, " IEEE Trans. Autom. Control, vol. 64, no. 6, pp. 2298-2309, Jun. 2019. |
| [77] | D. Gagliardi and G. Russo, "On a probabilistic approach to synthesize control policies from example datasets, " arXiv: 2005.11191, 2020. |
| [78] | T. T. Xu, H. H. Zhu, and I. C. Paschalidis, "Learning parametric policies and transition probability models of Markov decision processes from data, " Eur. J. Control, 2020. |
| [79] | R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. 2nd ed. Cambridge, USA: MIT Press, 2018. |
| [80] | D. P. Bertsekas, "Feature-based aggregation and deep reinforcement learning: A survey and some new implementations, " IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 1-31, Jan. 2019. |
| [81] | D. P. Bertsekas, "Approximate policy iteration: A survey and some new methods, " J. Control Theory Appl., vol. 9, no. 3, pp. 310-335, Jul. 2011; expanded version appears as Lab. for Info. and Decision Systems Report LIDS-2833, MIT, 2011. |
| [82] | J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning, " Mach. Learn., vol. 16, no. 3, pp. 185-202, Sep. 1994. |
| [83] | V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning, " Nature, vol. 518, no. 7540, pp. 529-533, 2015. http://europepmc.org/abstract/med/25719670 |