IEEE/CAA Journal of Automatica Sinica
Citation: Girish Chowdhary, Miao Liu, Robert Grande, Thomas Walsh, Jonathan How and Lawrence Carin, "Off-Policy Reinforcement Learning with Gaussian Processes," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 227-238, 2014.
[1] Sutton R, Barto A. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[2] Engel Y, Mannor S, Meir R. Reinforcement learning with Gaussian processes. In: Proceedings of the 22nd International Conference on Machine Learning. New York: ACM, 2005. 201-208
[3] Ernst D, Geurts P, Wehenkel L. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 2005, 6: 503-556
[4] Maei H R, Szepesvári C, Bhatnagar S, Sutton R S. Toward off-policy learning control with function approximation. In: Proceedings of the 2010 International Conference on Machine Learning, Haifa, Israel, 2010.
[5] Rasmussen C E, Williams C K I. Gaussian Processes for Machine Learning. Cambridge, MA: The MIT Press, 2006.
[6] Engel Y, Szabo P, Volkinshtein D. Learning to control an octopus arm with Gaussian process temporal difference methods. In: Proceedings of the Advances in Neural Information Processing Systems 18. Cambridge, MA: The MIT Press, 2005. 347-354
[7] Melo F S, Meyn S P, Ribeiro M I. An analysis of reinforcement learning with function approximation. In: Proceedings of the 2008 International Conference on Machine Learning. New York: ACM, 2008. 664-671
[8] Deisenroth M P. Efficient Reinforcement Learning Using Gaussian Processes [Ph.D. dissertation], Karlsruhe Institute of Technology, Germany, 2010.
[9] Jung T, Stone P. Gaussian processes for sample efficient reinforcement learning with RMAX-like exploration. In: Proceedings of the 2012 European Conference on Machine Learning (ECML). 2012. 601-616
[10] Rasmussen C, Kuss M. Gaussian processes in reinforcement learning. In: Proceedings of the Advances in Neural Information Processing Systems. 2004. 751-759
[11] Kolter J Z, Ng A Y. Regularization and feature selection in least-squares temporal difference learning. In: Proceedings of the 26th Annual International Conference on Machine Learning. New York: ACM, 2009. 521-528
[12] Liu B, Mahadevan S, Liu J. Regularized off-policy TD-learning. In: Proceedings of the Advances in Neural Information Processing Systems 25. Cambridge, MA: The MIT Press, 2012. 845-853
[13] Sutton R S, Maei H R, Precup D, Bhatnagar S, Silver D, Szepesvári C, Wiewiora E. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: Proceedings of the 26th Annual International Conference on Machine Learning. New York: ACM, 2009. 993-1000
[14] Csató L, Opper M. Sparse on-line Gaussian processes. Neural Computation, 2002, 14(3): 641-668
[15] Singh S P, Yee R C. An upper bound on the loss from approximate optimal-value functions. Machine Learning, 1994, 16(3): 227-233
[16] Watkins C J C H, Dayan P. Q-learning. Machine Learning, 1992, 8(3-4): 279-292
[17] Baird L C. Residual algorithms: reinforcement learning with function approximation. In: Proceedings of the 12th International Conference on Machine Learning. Morgan Kaufmann, 1995. 30-37
[18] Tsitsiklis J N, van Roy B. An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control, 1997, 42(5): 674-690
[19] Parr R, Painter-Wakefield C, Li L, Littman M L. Analyzing feature generation for value-function approximation. In: Proceedings of the 2007 International Conference on Machine Learning. New York: ACM, 2007. 737-744
[20] Ormoneit D, Sen S. Kernel-based reinforcement learning. Machine Learning, 2002, 49(2-3): 161-178
[21] Farahmand A M, Ghavamzadeh M, Szepesvári C, Mannor S. Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. In: Proceedings of the 2009 American Control Conference. St. Louis, MO: IEEE, 2009. 725-730
[22] Geist M, Pietquin O. Kalman temporal differences. Journal of Artificial Intelligence Research (JAIR), 2010, 39(1): 483-532
[23] Strehl A L, Littman M L. A theoretical analysis of model-based interval estimation. In: Proceedings of the 22nd International Conference on Machine Learning. New York: ACM, 2005. 856-863
[24] Krause A, Guestrin C. Nonmyopic active learning of Gaussian processes: an exploration-exploitation approach. In: Proceedings of the 24th International Conference on Machine Learning. New York: ACM, 2007. 449-456
[25] Desautels T, Krause A, Burdick J W. Parallelizing exploration-exploitation tradeoffs with Gaussian process bandit optimization. In: Proceedings of the 29th International Conference on Machine Learning. ICML, 2012. 1191-1198
[26] Chung J J, Lawrance N R J, Sukkarieh S. Gaussian processes for informative exploration in reinforcement learning. In: Proceedings of the 2013 IEEE International Conference on Robotics and Automation. Karlsruhe: IEEE, 2013. 2633-2639
[27] Barreto A D M S, Precup D, Pineau J. Reinforcement learning using kernel-based stochastic factorization. In: Proceedings of the Advances in Neural Information Processing Systems 24. Cambridge, MA: The MIT Press, 2011. 720-728
[28] Chen X G, Gao Y, Wang R L. Online selective kernel-based temporal difference learning. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(12): 1944-1956
[29] Kveton B, Theocharous G. Kernel-based reinforcement learning on representative states. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. AAAI, 2012. 977-983
[30] Xu X, Hu D W, Lu X C. Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 2007, 18(4): 973-992
[31] Snelson E, Ghahramani Z. Sparse Gaussian processes using pseudo-inputs. In: Proceedings of the Advances in Neural Information Processing Systems 18. Cambridge, MA: The MIT Press, 2006. 1257-1264
[32] Lázaro-Gredilla M, Quiñonero-Candela J, Rasmussen C E, Figueiras-Vidal A R. Sparse spectrum Gaussian process regression. The Journal of Machine Learning Research, 2010, 11: 1865-1881
[33] Varah J M. A lower bound for the smallest singular value of a matrix. Linear Algebra and Its Applications, 1975, 11(1): 3-5
[34] Lizotte D J. Convergent fitted value iteration with linear function approximation. In: Proceedings of the Advances in Neural Information Processing Systems 24. Cambridge, MA: The MIT Press, 2011. 2537-2545
[35] Kingravi H. Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods [Ph.D. dissertation], Georgia Institute of Technology, Atlanta, GA, 2013.
[36] Boyan J, Moore A. Generalization in reinforcement learning: safely approximating the value function. In: Proceedings of the Advances in Neural Information Processing Systems 7. Cambridge, MA: The MIT Press, 1995. 369-376
[37] Engel Y, Mannor S, Meir R. The kernel recursive least-squares algorithm. IEEE Transactions on Signal Processing, 2004, 52(8): 2275-2285
[38] Krause A, Singh A, Guestrin C. Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 2008, 9: 235-284
[39] Ehrhart E. Géométrie diophantienne: sur les polyèdres rationnels homothétiques à n dimensions. Comptes rendus hebdomadaires des séances de l'Académie des sciences, 1962, 254(4): 616
[40] Benveniste A, Priouret P, Métivier M. Adaptive Algorithms and Stochastic Approximations. New York: Springer-Verlag, 1990.
[41] Haddad W M, Chellaboina V. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton: Princeton University Press, 2008.
[42] Khalil H K. Nonlinear Systems (3rd edition). Upper Saddle River, NJ: Prentice Hall, 2002.
[43] Lagoudakis M G, Parr R. Least-squares policy iteration. Journal of Machine Learning Research (JMLR), 2003, 4: 1107-1149