IEEE/CAA Journal of Automatica Sinica
Citation: Hao Wang, Shunguo Fan, Jinhua Song, Yang Gao and Xingguo Chen, "Reinforcement Learning Transfer Based on Subgoal Discovery and Subtask Similarity," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 257-266, 2014.
[1] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 1998.
[2] Fernández F, García J, Veloso M. Probabilistic policy reuse for intertask transfer learning. Robotics and Autonomous Systems, 2010, 58(7): 866-871
[3] Lazaric A, Restelli M, Bonarini A. Transfer of samples in batch reinforcement learning. In: Proceedings of the 25th Annual International Conference on Machine Learning. New York, NY: ACM, 2008. 544-551
[4] McGovern A, Barto A G. Automatic discovery of subgoals in reinforcement learning using diverse density. In: Proceedings of the 18th International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers, 2001. 361-368
[5] Pan S J, Tsang I W, Kwok J T, Yang Q. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 2011, 22(2): 199-210
[6] Taylor M E, Stone P. An introduction to intertask transfer for reinforcement learning. AI Magazine, 2011, 32(1): 15-34
[7] Zhu Mei-Qiang, Cheng Yu-Hu, Li Ming, Wang Xue-Song, Feng Huan-Ting. A hybrid transfer algorithm for reinforcement learning based on spectral method. Acta Automatica Sinica, 2012, 38(11): 1765-1776 (in Chinese)
[8] Watkins C, Dayan P. Q-learning. Machine Learning, 1992, 8(3): 279-292
[9] Rummery G A, Niranjan M. On-line Q-learning Using Connectionist Systems, Technical Report CUED/F-INFENG/TR 166, Department of Engineering, Cambridge University, Cambridge, UK, 1994.
[10] Sutton R S, Precup D, Singh S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999, 112(1): 181-211
[11] Stolle M, Precup D. Learning options in reinforcement learning. In: Proceedings of the 5th International Symposium on Abstraction, Reformulation and Approximation. Kananaskis, Alberta, Canada: Springer, 2002. 212-223
[12] Şimşek Ö, Wolfe A P, Barto A G. Identifying useful subgoals in reinforcement learning by local graph partitioning. In: Proceedings of the 22nd International Conference on Machine Learning. Bonn, Germany: ACM, 2005. 816-823
[13] Taylor M E, Kuhlmann G, Stone P. Autonomous transfer for reinforcement learning. In: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems. Estoril, Portugal: ACM, 2008. 283-290
[14] Chen F, Gao F, Chen S F, Ma Z D. Connect-based subgoal discovery for options in hierarchical reinforcement learning. In: Proceedings of the 3rd International Conference on Natural Computation (ICNC 2007). Haikou, China: IEEE, 2007. 698-702
[15] Ravindran B. An Algebraic Approach to Abstraction in Reinforcement Learning [Ph. D. dissertation], University of Massachusetts Amherst, USA, 2004
[16] Soni V, Singh S. Using homomorphisms to transfer options across continuous reinforcement learning domains. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006). Boston, Massachusetts: AAAI Press, 2006. 494-499
[17] Wolfe A P, Barto A G. Decision tree methods for finding reusable MDP homomorphisms. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006). Boston, Massachusetts: AAAI Press, 2006. 530-535
[18] Goel S, Huber M. Subgoal discovery for hierarchical reinforcement learning using learned policies. In: Proceedings of the 16th International FLAIRS Conference. St. Augustine, Florida, USA: AAAI Press, 2003. 346-350
[19] Ferns N, Panangaden P, Precup D. Metrics for finite Markov decision processes. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. Banff, Canada: AUAI Press, 2004. 162-169
[20] Mehta N, Natarajan S, Tadepalli P, Fern A. Transfer in variable-reward hierarchical reinforcement learning. Machine Learning, 2008, 73(3): 289-312