IEEE/CAA Journal of Automatica Sinica
Citation: Quan Liu, Xin Zhou, Fei Zhu, Qiming Fu and Yuchen Fu, "Experience Replay for Least-Squares Policy Iteration," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 274-281, 2014.